A Novel Approach for Robust Perceptual Image Hashing

A perceptual image hashing system generates a short signature, called a perceptual hash, that is attached to an image before transmission and acts as side information for analyzing the trustworthiness of the received image. In this paper, we propose a novel approach to improve the robustness of a perceptual image hashing scheme, generating a perceptual hash that is resistant to content-preserving manipulations, such as JPEG compression and Additive white Gaussian noise (AWGN), while still differentiating a maliciously tampered image from its original version. Our algorithm first constructs a robust image, derived from the original input by analyzing the stability of the extracted features and improving their robustness. From the robust image, which perceptually resembles the original input, we then extract the final robust features. Next, the robust features are suitably quantized, allowing the generation of the final perceptual hash using the cryptographic hash function SHA1. The main idea of this paper is to transform the original image into a more robust one that allows the extraction of robust features. The generation of the robust image turns out to be quite important, since it introduces further robustness into the perceptual image hashing system. The paper can be seen as an attempt to propose a general methodology for more robust perceptual image hashing. The experimental results presented in this paper reveal that the proposed scheme offers good robustness against JPEG compression and Additive white Gaussian noise.


Introduction
The widespread use of multimedia technology has made it relatively easy to manipulate and tamper with visual data. In particular, digital image processing and image manipulation tools offer facilities to intentionally alter image content without leaving perceptual traces (Qureshi & Deriche, 2015). Therefore, there should be some mechanism, other than human vision, to prove the authenticity of the image in question. A simple way to authenticate digital data is to calculate the data hash using standard cryptographic hash functions like MD5 (Rivest, 1992) or SHA1 (National Institute of Standards and Technology [NIST], 1995) and form a digital signature. However, cryptographic hash functions are designed to be strongly dependent on every single bit of the input data (Menezes, Oorschot, Vanstone & Rivest, 1997). This property of cryptographic hash functions is not suitable for multimedia data, since the carried information is mostly retained even when the multimedia data have undergone various content-preserving operations such as compression or filtering. Content-preserving manipulations change the bits of the multimedia data while leaving the image perception unaltered. Multimedia image authentication, therefore, requires techniques which authenticate not the digital representation of the visual data but its visual appearance. Perceptual image hashing schemes have been proposed as solutions to overcome the above problems by establishing the "perceptual equality" of image content. Such schemes extract features from the image and generate a hash value (usually just a few bytes) based on those features. Perceptual hashes are expected to survive acceptable content-preserving manipulations and reject malicious manipulations. In an ideal image hashing scheme, visually similar images should have the same perceptual hashes (i.e., perceptual robustness) and visually distinct images should have totally different hashes (i.e., discrimination).
Hence, perceptual image hashing should be resistant to content-preserving manipulations, such as JPEG compression and Additive white Gaussian noise, and should also differentiate a maliciously tampered image from its original version. A perceptual image hashing system is also expected to be secure. This means that it should be impossible to keep the same perceptual hash value for a given image when its perceptual/visual content is modified.
The performance of a perceptual image hashing system primarily consists of robustness, discrimination and security. Robustness means that the perceptual image hashing system always generates the same perceptual hash values for perceptually similar images. Discrimination means that visually different image inputs must result in totally different hash values. A perceptual image hashing system is secure when it is impossible for an adversary to keep the same perceptual hash value in case the image content is perceptually modified. To authenticate the received image, the receiver needs only to compare its hash value with that of the original image, since the reference image is not available at the receiver side.
Based on the statistical analysis of the behavior of extracted features (Hadmi, Puech, Ait Essaid, & Ait ouahman, 2011), we propose in this paper a novel approach to perceptual image hashing for image authentication that simultaneously attempts to address these core issues. Unlike most existing schemes, which only focus on extracting robust visual features to generate the final perceptual hash, we propose a new approach that enhances the robustness of the extracted features themselves. We propose to transform the original image into a robust one that allows the extraction of features that remain robust through the quantization stage. Thus, after applying the cryptographic hash function SHA1, the final robust and secure perceptual hashes are generated.
The rest of the paper is organized as follows: in Section 2, we give an overview of perceptual image hashing schemes published in the literature. In Section 3, we present our proposed method for generating a robust perceptual hash. Section 4 presents experimental results and Section 5 concludes this paper with future directions.

Previous Work
In recent years, with the rapid development of digital signal processing techniques, digital images have become indispensable in our daily life. Moreover, through many image editing software packages, it has become easy to create or modify images conveniently. This poses a serious problem when a digital content is to be used as evidence. To address this issue, perceptual hashing schemes have been proposed. Most of the existing perceptual hashing studies mainly focus on extracting robust visual features and then using them during the authentication step. They assume that the most important goal in a perceptual image hashing framework is to ensure robustness by extracting a set of visual features that resist (or stay relatively constant under) content-preserving manipulations while, at the same time, detecting malicious manipulations that modify the image content. Since the selected robust visual features can usually be calculated publicly, an adversary can adjust them maliciously to match those of another, perceptually different image. In this case, the security of the perceptual image hashing scheme is threatened. Current schemes in the literature can be classified into two categories. Some works focus on nearest neighbor search and content-based image retrieval, such as (Wang, Kumar, & Chang, 2012; Kulis & Grauman, 2012; Gorisse, Cord, & Precioso, 2012; Liu, Wu, Yang, Zhuang, & Hauptmann, 2012; Song, Yang, Li, Huang, & Yang, 2014); others are hash methods used for image content authentication (Li, Lu, Zhu, & Niu, 2012; Lv & Wang, 2012; Zhao, Wang, Zhang, & Yao, 2013; Lin, Varodayan, & Girod, 2012). Regarding the latter, according to the type of extracted feature, existing methods in the literature can be classified into three categories: global-feature-based methods, local-feature-based methods, and hybrid-feature-based methods.
The method proposed in (Zhang, Tang, & Li, 2007) is a block-based hash method in which the hash is generated from the statistical values of the DCT coefficients of image blocks. The schemes proposed in (Lu & Lia, 2003; Sumalatha, Venkata, & Vijaya, 2012) are also block-based image hash methods. In these methods, hash codes are generated from statistical features of the DWT coefficients of image blocks. The scheme in (Qin, Chen, Dong, & Zhang, 2016) integrated principal DCT coefficients of the sampled blocks and their corresponding position information to generate robust features. After compressing the concatenated features with dimensionality reduction, the final image hash was obtained. An image hashing method using the statistics of wavelet coefficients is presented in (Venkatesan, Koon, Jakubowski, & Moulin, 2000). The scheme in (Swaminathan, Mao, & Wu, 2006) achieved good perceptual robustness against several digital operations, including moderate geometric transforms and filtering; however, its discrimination performance was not good enough. A robust mesh-based hashing method is proposed in (Lu C. S., Hsu C. Y., Sun S. W. & Chang P. C., 2004) that aims at resisting more geometrical distortions by first extracting a robust mesh, then extracting a mesh-based robust hash, and finally matching hashes for similarity measurement. However, it still offers only limited resistance to geometrical distortions.
Previous schemes have also considered the security of the system, i.e., the use of a crypto-compression stage to generate the final perceptual hash. In (Sun & Chang, 2005), for example, the authors proposed to use error correction coding (ECC). In (Fawad, Siyal, & Abbas, 2010), the authors proposed to send additional information alongside the perceptual hash in order to adjust contaminated extracted features during the image verification stage, before performing quantization. The main disadvantage of such schemes is that they need to send or store additional information in order to correct errors in the extracted features, which is too costly in storage space. In the perceptual image hashing field, the robustness and security of a perceptual image hashing system that generates a signature of fixed length (just a few bytes) are very important and must be taken seriously into account. To avoid wasting storage space, and for more efficiency while preserving security, the system should send only the final perceptual hash to the receiver via a secure channel, without sending any additional information.

Proposed Method
In this Section, we describe our proposed perceptual hashing scheme. Our aim is to develop a perceptual image hashing system that encompasses the two core components, i.e., robustness and security, the latter obtained by the use of a crypto-compression function such as SHA1. To meet these requirements, a transformation of the original image into a robust one is presented. In previous work, a statistical analysis of the behavior of extracted features under attacks was carried out (Hadmi, Puech, Ait Essaid, & Ait ouahman, 2011). This analysis has motivated the presented approach. In Section 3.1, we describe how we generate the robust image, and in Section 3.2 we present the proposed perceptual image hashing system.

Robust Image Generation
Based on the idea adopted in (Puech, Montesinos, & Dumas, 2002), we propose a new method to enhance the robustness of the extracted features used by SHA1 to generate the final 160-bit hash. Since the exact pixel values are insignificant with regard to the human visual system, we propose to modify the pixel values of the original image to generate a robust image that tolerates changes modeled by a threshold B. The proper selection of B defines the boundary between non-malicious distortion and malicious tampering. The procedure of robust image generation (Fig. 1) is as follows:
• Step 1: In the transformation stage T(.), the input image undergoes a spatial and/or frequency transformation to generate a transformed image allowing the extraction of the proper image features.
• Step 4: Assuming that the quantization step size Q verifies Q > 2B, B is the threshold that defines the boundary between non-malicious distortion and malicious tampering. We locate the original feature points that fall in the dead zones [kQ, kQ + B] and [(k+1)Q − B, (k+1)Q] of each quantization interval [kQ, (k+1)Q]. We determine the spatially sensitive zones of the image that produce each continuous feature lying in a dead zone. The modification of each sensitive zone in the original image is based on changing the pixel values of that zone. If an original continuous feature xo lies in a dead zone [kQ, kQ + B] or [(k+1)Q − B, (k+1)Q], we increase or decrease the grey levels of the corresponding pixel zone in order to push xo into the confidence zone [kQ + B, (k+1)Q − B]. The distances d+ = (kQ + B) − xo, when xo ∈ [kQ, kQ + B], and d− = xo − ((k+1)Q − B), when xo ∈ [(k+1)Q − B, (k+1)Q], determine the number of pixels, and which ones, to modify in each sensitive zone.
• Step 5: Once each sensitive zone of the original image has been modified to yield robust features, we finally generate the robust image that ensures the extraction of robust features. The robust image will be used as the input image of the perceptual image hashing system to generate the final hash value.
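The dead-zone test of Step 4 can be sketched as follows. This is a hypothetical helper, not the authors' implementation; `dead_zone_shift`, `Q`, and `B` are illustrative names, and Q > 2B is assumed as above:

```python
def dead_zone_shift(x, Q, B):
    """Return the shift that pushes a continuous feature x into the
    confidence zone [kQ + B, (k+1)Q - B] of its quantization interval;
    0.0 if x is already inside the confidence zone."""
    k = int(x // Q)                       # index of the quantization interval
    if x < k * Q + B:                     # lower dead zone [kQ, kQ + B)
        return (k * Q + B) - x            # distance d+ : increase x
    if x > (k + 1) * Q - B:               # upper dead zone ((k+1)Q - B, (k+1)Q]
        return -(x - ((k + 1) * Q - B))   # distance d- : decrease x
    return 0.0
```

A feature at 1.0 with Q = 10 and B = 2 would be pushed up by 1.0 into the confidence zone [2, 8], while a feature at 9.5 would be pushed down by 1.5.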
The threshold B draws the boundary between robustness and security of the authentication system.

Robust Perceptual Image Hashing System
The proposed perceptual image hashing system contains a cryptographic hash function, i.e., SHA1, to generate a final perceptual hash of 160 bits, as shown in Fig. 3. During the authentication procedure, when the robust image undergoes content-preserving manipulations, the extracted features will not exceed the dead zones fixed by B. Thus, we get the same discrete hash vector as for the original image during the Quantization stage, which guarantees the robustness of the perceptual hash. When the changes that the robust image undergoes are significant (malicious attacks), the distorted features will exceed the dead zones even after the B adjustment and drop into the neighboring quantization intervals. Hence, the final perceptual hash will definitely change, because the quantized values have changed.
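The quantization and crypto-compression stages can be sketched as follows (a minimal illustration, not the authors' code; `perceptual_hash` and the feature serialization are assumptions):

```python
import hashlib

def perceptual_hash(features, Q):
    """Uniformly quantize the continuous features with step Q and hash
    the resulting discrete vector with SHA1 (160-bit digest)."""
    quantized = [int(x // Q) for x in features]
    data = ",".join(str(q) for q in quantized).encode()
    return hashlib.sha1(data).hexdigest()
```

Two feature vectors that stay inside the same quantization intervals (e.g., after a content-preserving manipulation of the robust image) yield identical digests, while a feature that crosses into a neighboring interval changes the digest entirely.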

Experimental Results
In this section, we present a number of experiments on which the proposed perceptual image hashing approach has been tested. The tests have been performed considering image block mean features. The original input image I of size M × N pixels is split into non-overlapping blocks of size b × b pixels that we denote by I(i,j), where 1 ≤ i ≤ M/b and 1 ≤ j ≤ N/b. The mean of each block is computed and stored in a one-dimensional vector whose indices range over {1, …, (M/b) × (N/b)}. The elements of this vector represent the continuous intermediate hash.
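The block mean features described above can be computed as follows (a short sketch; `block_means` is an illustrative name, and the image size is assumed to be a multiple of b):

```python
import numpy as np

def block_means(img, b):
    """Split an M x N image into non-overlapping b x b blocks and return
    their means as a 1-D vector (the continuous intermediate hash)."""
    M, N = img.shape
    blocks = img[:M - M % b, :N - N % b].reshape(M // b, b, N // b, b)
    return blocks.mean(axis=(1, 3)).ravel()
```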

Generation of the robust image in the case of block mean features:
• When the block mean lies in the upper dead zone [(k+1)Q − B, (k+1)Q], the pixels of the block I(i,j) are sorted in decreasing order. Let P be the number of pixels to modify in the block I(i,j). P is given by P = ⌈d− × b × b⌉ (Eq. 1), where ⌈.⌉ is the ceiling function. In the case of P ≤ b × b, we select the P biggest pixel values in the block I(i,j) and decrease their corresponding grey levels by 1, i.e., p′(m,n) = p(m,n) − 1, where p′(m,n) is the new pixel value at the spatial location (m, n). When P is bigger than b × b, we modify the P′ selected biggest pixel values by more than one grey level, where P′ is the new number of pixels to modify, which satisfies P′ ≤ b × b.

• When the block mean lies in the lower dead zone [kQ, kQ + B], the pixels of the block I(i,j) are sorted in increasing order. Let P be the number of pixels to modify in the block I(i,j). P is given by P = ⌈d+ × b × b⌉ (Eq. 2), where ⌈.⌉ is the ceiling function. In the case of P ≤ b × b, we select the P smallest pixel values in the block I(i,j) and increase their corresponding grey levels by 1, i.e., p′(m,n) = p(m,n) + 1, where p′(m,n) is the new pixel value at the spatial location (m, n). When P is bigger than b × b, we modify the P′ selected smallest pixel values by more than one grey level, where P′ is the new number of pixels to modify, which satisfies P′ ≤ b × b. For example, if b × b < P ≤ 2(b × b), the number of pixels to modify will be P′ = ⌈P/2⌉ and the P′ new pixel values will be p′(m,n) = p(m,n) + 2.
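The lower-dead-zone case above can be sketched as follows, assuming the reconstruction P = ⌈d × b²⌉ (since modifying P pixels by one grey level shifts the block mean by P/b²). This is an illustrative helper, not the authors' code; the upper-dead-zone case is symmetric, with the largest pixels decreased instead:

```python
import math
import numpy as np

def raise_block_mean(block, d):
    """Increase the mean of a b x b block by at least d by adding grey
    levels to its smallest pixels (lower-dead-zone case)."""
    out = block.copy()
    b2 = out.size
    P = math.ceil(d * b2)               # total one-grey-level increments needed
    step = max(1, math.ceil(P / b2))    # grey levels added per modified pixel
    n = math.ceil(P / step)             # number of pixels to modify (n <= b2)
    flat = out.ravel()
    idx = np.argsort(flat)[:n]          # indices of the n smallest pixels
    flat[idx] += step
    return out
```

For d ≤ 1 this adds one grey level to ⌈d·b²⌉ pixels; for larger d it spreads the change over fewer pixels with a larger step, matching the P′ adjustment described above.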
Finally, after suitably modifying all selected pixels by increasing or decreasing their corresponding grey levels, we guarantee that all features later extracted from the robust image belong to the confidence zone [kQ + B, (k+1)Q − B]. We note that increasing B increases the system's robustness while decreasing the robust image quality. Indeed, when B is high, the number of pixels to modify (Eq. 1 or Eq. 2) is also high, which decreases the quality of the robust image in comparison to the original one. The intermediate binary perceptual hash of the JPEG compressed original image (Fig. 4.(b4)) contains 1.18% erroneous quantized features relative to that of the original image (Fig. 4.(a4)), which would cause a false authentication during the crypto-compression stage. Fig. 4.(c1) shows the robust image generated from the original image (Fig. 4.(a1)) in the case of B = 2. The distribution of the continuous intermediate hash of the robust image is shown in Fig. 4.(c3), where we can see that all the continuous features are taken away from the dead zones of each quantization interval. When JPEG compression with QF = 90 is applied to the robust image, we are sure that all elements of the continuous intermediate hash will not exceed the dead zones of each quantization interval after the B = 2 adjustment, as shown in Fig. 4.(d3). This allows us, after uniform quantization, to get two identical binary intermediate hashes for the robust image and its JPEG compressed version, as shown in Fig. 4.(c4) and Fig. 4.(d4). Thus, the JPEG compressed image (QF = 90) will be positively authenticated.
In the case of Additive white Gaussian noise, Fig. 5 presents a comparison in terms of robustness between the generation of a perceptual hash directly from the Gaussian noisy original image (Fig. 5(a)) and the generation of a perceptual hash from the robust Gaussian noisy image with B = 3 (Fig. 5(b)), for noise with standard deviation σ = 3.
As observed in Fig. 5(a), when Additive white Gaussian noise of standard deviation σ = 3 is applied, the Gaussian noisy image remains perceptually identical to the original image (Fig. 4.(a1)), but the noise causes changes in the distribution of the extracted features, as we can see in Fig. 5(e). Thus, the intermediate binary perceptual hash of the Gaussian noisy image (Fig. 5(g)) contains 3.84% erroneous quantized features relative to that of the original image (Fig. 4.(a4)), which would cause a false authentication during the crypto-compression stage. When extracting features from the Gaussian noisy robust image (Fig. 5(b)), we get two identical binary intermediate hashes for the Gaussian noisy robust image (Fig. 5(h)) and the original image (Fig. 4.(a4)). Thus, the Gaussian noisy image (σ = 3) will be positively authenticated. It is important to note that robustness characteristics vary from image to image. To explore this fact, we formed a database (BOSSBase v1.00, available at http://agents.cz/boss/BOSSFinal/) of 100 different grayscale images of size 512 × 512 pixels. Fig. 6 shows the percentage of correct perceptual hashes generated from the original images and the robust images in the cases of B = 4 and B = 6, as a function of the JPEG compression quality factor (QF). The original perceptual hashes are generated directly from the original images without compression. When we apply JPEG compression with QF = 100 directly, we get 0% correct perceptual hashes. When we generate perceptual hashes from the robust images, we increase the robustness of the system; for example, with QF = 100 and QF = 90, 100% of the extracted perceptual hashes are correct. Note that increasing B increases the system robustness while decreasing the robust image quality (in terms of PSNR), as shown in Fig. 7. Note also that the robust image always keeps good quality: PSNR = 43.08 dB in the case of B = 4 and PSNR = 38 dB in the case of B = 6.

Conclusion
In this paper, we have presented a new method for perceptual image hashing that generates a more robust perceptual hash. In our scheme, we construct a robust image allowing the extraction of features that are robust to the desired content-preserving manipulations. The choice of the threshold B depends on the feature extraction method used, the applied quantization step size Q, and the type of content-preserving manipulation. In the experimental results, we presented our method for block mean features and tested its robustness against JPEG compression. Our future research will explore other types of content-preserving manipulations and other feature extraction methods.