Multi-scale segmentation strategies in PRNU-based image tampering localization

With the rapid development of advanced media technology, especially the popularization of digital cameras and image editing software, digital images can be easily forged without leaving visible clues. Therefore, image forensics technology for identifying the accuracy, integrity, and originality of digital images has become increasingly important. Photo-response non-uniformity (PRNU) noise, a unique fingerprint of imaging sensors, is a valuable forgery detection tool because of its consistently good detection performance. All kinds of forgeries, including copy-move and splicing, can be dealt with in a uniform manner. This paper addresses the problem of forgery localization based on PRNU estimation and aims to improve the resolution of PRNU-based algorithms. Different from traditional overlapping and sliding window-based methods, in which PRNU correlations are estimated on overlapped patches, the proposed scheme is analyzed based on nonoverlapping and irregular patches. First, the test image is segmented into nonoverlapped patches with multiple scales. Second, correlations of PRNU are estimated on nonoverlapped patches to obtain the real-valued candidate tampering probability map for each individual scale. Then, all of the candidate maps are fused into a single and more reliable probability map through an adaptive window strategy. In the final step, the final decision map is obtained by adopting a conditional random field (CRF) to model neighborhood interactions. The contributions of this work include the following: a novel PRNU-based forgery localization scheme using multi-scale nonoverlapping segmentation is proposed for the first time. Furthermore, the adaptive fusion strategy involves selecting the best candidate tampering probability individually for each location in the image. 
Experimental results show that the proposed scheme achieves better detection results and greater robustness than existing state-of-the-art PRNU-based methods.


Introduction
A PRNU-based technique was initially developed for image forgery detection and camera identification. The camera PRNU noise is estimated by averaging noise residues extracted from images acquired by the camera. Given an image, the pattern noise is obtained from the image using a smoothing filter, and the camera model is identified by comparison with candidate reference patterns. In [7], the maximum likelihood estimator (MLE) was used to estimate the camera PRNU. In view of the good performance of the PRNU-based algorithm, many studies have improved it in several respects. Since denoising filtering contributes significantly to the accuracy of PRNU estimation, denoising filters, such as predictors based on the eight-neighbor context-adaptive interpolation (PCAI) algorithms [23] and block matching and 3D filtering (BM3D) algorithms [15], have been discussed. Since the PRNU is a very weak signal, Lin et al. [28] argued that some components of the sensor pattern noise (SPN) are severely contaminated by the errors introduced by denoising filters and that the quality of the PRNU can be improved by abandoning those components. To reduce undesirable nonunique noise components, Lin et al. [29] proposed equalizing the magnitude spectrum of the reference SPN to decrease the false identification rate. Later, a three-stage enhancement of the PRNU was proposed in [26]. More recently, PRNU has been used to detect forgeries caused by hue modification [20]. However, these methods mainly aim to discriminate whether a given image is pristine or fake. In practice, we are more interested in determining the tampered regions, a task called tampering localization.
In this paper, we focus on tampering localization based on the PRNU algorithm. The core of PRNU-based tampering localization involves the correlation of a known noise pattern with its estimate from the investigated image. The operation is often performed in a sliding window manner. To detect small-sized manipulations, Chierchia et al. proposed segmentation-based analysis [9] and a spatially adaptive filtering technique [8]. In addition, the authors in [10] cast the problem in terms of Bayesian estimation and adopted a Markovian prior to model the strong spatial dependences of the source, which allows for the propagation of reliable decisions into ambiguous areas. More recently, a multi-scale analysis was adopted to improve the localization resolution in [25]. Although the above methods can improve the resolution dramatically, these methods are all based on overlapped sliding windows, which can lead to many false decisions when the sliding window falls near the boundary between tampered and authentic regions. Lower localization accuracy near the boundary of tampered objects is still a major problem to be solved in tampering localization. Therefore, obtaining accurate image segmentation is necessary to improve the localization resolution.
Image segmentation aims to partition an image into several parts automatically or with simple interactions. It is a key step in image analysis. Graph cut technology is one of the leading algorithms for interactive segmentation [2], which is suitable for delineating a boundary of one or multiple objects from images. A multilevel banded heuristic for computation of graph cuts is proposed in [31] for fast image segmentation. Recently, an efficient hierarchical graph cut method was proposed for interactive RGB-D image segmentation, which can generate high-quality segmentation results and real-time interactions [19]. In recent years, researchers have focused on superpixels in the field of image segmentation. A new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels, was proposed [1].

Contributions
To address the abovementioned problems, we propose a novel PRNU-based forgery localization scheme using multi-scale nonoverlapping segmentation. The main contributions of this work are as follows: 1) The test image is segmented into nonoverlapping superpixels on multiple scales by the SLIC algorithm. This is the first time that a multi-scale SLIC strategy has been proposed in the framework of PRNU-based algorithms. 2) On each individual scale, unlike existing sliding window-based algorithms, in which PRNU correlations are estimated on overlapping sliding windows, our algorithm computes correlations directly on nonoverlapping irregular patches, which can accurately delineate the boundaries of contrasting objects with lower complexity. 3) An adaptive fusion strategy is used to combine the multi-scale tampering probability maps. 4) Compared with existing state-of-the-art PRNU-based methods, the proposed algorithm achieves better experimental results in diverse situations.
The rest of this paper is organized as follows. Section 2 introduces the PRNU-based localization method. Section 3 describes the proposed strategy in detail. Section 4 presents a series of experiments and comparisons with state-of-the-art methods. Finally, conclusions are drawn in Section 5.

Background
This section introduces the basic analysis strategies in PRNU-based tampering localization. Let $y \in \mathbb{R}^N$ be a digital image taken by a given camera, where $y_i$ denotes the value at site $i$, either the grayscale value or a single color component of a color image. Consider the simplified model [32] in which $y$ can be written as
$$y = (1 + k)x + \theta = xk + x + \theta, \tag{1}$$
where $x$ is the acquired noise-free image, $\theta$ is an additive noise term, and $k$ is the camera PRNU. For the purpose of forgery detection, $k$ is the signal of interest, while all the rest can be considered undesired disturbances. Therefore, to eliminate the original signal $x$, the noise residual $r$ is estimated as
$$r = y - \hat{x} = yk + n, \tag{2}$$
where $\hat{x} = D(y)$ is an estimate of the noise-free image $x$ obtained by applying a denoising filter $D$, and $n$ is the ensemble of all disturbances.
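The step from the sensor model to the noise residual can be made explicit. The following short derivation assumes the multiplicative model $y = (1 + k)x + \theta$ that is commonly used in the PRNU literature, with all products taken element-wise; the grouping of the disturbance terms into $n$ is a sketch and not necessarily the authors' exact formulation.

```latex
\begin{align}
r &= y - \hat{x} \\
  &= (x + xk + \theta) - \hat{x} \\
  &= yk + (x - y)k + (x - \hat{x}) + \theta \\
  &= yk + n, \qquad n = (x - y)k + (x - \hat{x}) + \theta .
\end{align}
```

Under this reading, $n$ collects the denoising error $x - \hat{x}$, the additive noise $\theta$ and a small cross term, which is why low-contrast images, where the denoiser leaves little content behind, are preferred for estimating $k$.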
The main steps of the PRNU-based algorithm are as follows.
As a preliminary step, the camera PRNU is estimated from a large number of photos taken by the target camera. The noise residuals are extracted using Eq. (2) from a number of low-contrast images taken by the target camera; then, the camera PRNU $k$ is obtained by maximum likelihood estimation from the noise residuals [7], that is,
$$\hat{k} = \frac{\sum_{i=1}^{m} W_i I_i}{\sum_{i=1}^{m} I_i^2}, \tag{3}$$
where $m$ is the number of images involved in the calculation, $I_i$ is the $i$th image taken by the target camera, and $W_i$ is the corresponding noise residual extracted from $I_i$. Note that the multiplication in Eq. (3) is element-wise. In the second step, the image PRNU is estimated by Eq. (2). Since there is only one image to be examined, the noise residual $r$ is often used to approximate its image PRNU.
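As an illustration of the preliminary step, the following is a minimal sketch of the per-pixel maximum likelihood estimate, assuming the standard form $\hat{k} = \sum_i W_i I_i / \sum_i I_i^2$. The function name, the flat-list image layout and the `eps` guard are hypothetical choices made for brevity; a practical implementation would operate on image arrays.

```python
def estimate_camera_prnu(images, residuals, eps=1e-12):
    """Per-pixel ML estimate k_hat = sum_i(W_i * I_i) / sum_i(I_i ** 2).

    images, residuals: lists of equally long flat lists of floats,
    one flat list per image (a hypothetical layout chosen for brevity).
    """
    if not images or len(images) != len(residuals):
        raise ValueError("need the same non-zero number of images and residuals")
    n_pixels = len(images[0])
    numerator = [0.0] * n_pixels
    denominator = [0.0] * n_pixels
    for image, residual in zip(images, residuals):
        for p in range(n_pixels):
            numerator[p] += residual[p] * image[p]    # element-wise W_i * I_i
            denominator[p] += image[p] * image[p]     # element-wise I_i ** 2
    # eps guards against division by zero at pixels that are black in every image
    return [num / (den + eps) for num, den in zip(numerator, denominator)]


# Toy check: noise-free residuals W_i = k * I_i should give back k.
true_k = [0.01, -0.02, 0.005]
images = [[100.0, 120.0, 90.0], [80.0, 60.0, 200.0]]
residuals = [[k * v for k, v in zip(true_k, image)] for image in images]
k_hat = estimate_camera_prnu(images, residuals)
```

In the toy check the residuals contain no disturbance, so the estimate matches `true_k` up to the negligible effect of `eps`.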
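Third, tamper detection is based on sliding window analysis. Let $w_i$ denote the sliding analysis window of size $w \times w$ centered on pixel $i$. For each analysis window $w_i$, the normalized cross-correlation $q_i$ in Eq. (4) is used to compare the image PRNU against the camera PRNU:
$$q_i = \operatorname{corr}(r_{w_i}, z_{w_i}) = \frac{(r_{w_i} - \bar{r}_{w_i}) \cdot (z_{w_i} - \bar{z}_{w_i})}{\lVert r_{w_i} - \bar{r}_{w_i} \rVert \, \lVert z_{w_i} - \bar{z}_{w_i} \rVert}, \tag{4}$$
where the bar denotes the mean over the window.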
Note that in Eq. (4), $r_{w_i}$ is the noise residual and $z_{w_i} = y_{w_i} \cdot k_{w_i}$ is an estimate of the camera PRNU. Given $k$, the detection problem can be formulated as a binary hypothesis test between hypothesis $H_0$ that the camera PRNU is absent and hypothesis $H_1$ that the PRNU is present:
$$H_0: q_i \sim \mathcal{N}(0, \sigma_0), \qquad H_1: q_i \sim \mathcal{N}(\hat{q}_i, \sigma_1), \tag{5}$$
where the expected correlation predictor $\hat{q}_i$ accounts for special situations such as saturated image regions, where the PRNU cannot be detected, and $\sigma_0$, $\sigma_1$ are the variances of the detection statistic under $H_0$ and $H_1$, respectively. Korus et al. [25] then converted the measured correlation $q_i$ into a tampering probability $c_i$ (Eq. (6)). In the last step, the final decision map is obtained by a conditional random field (CRF) model [25].
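The detection step can be sketched in a few lines: a normalized cross-correlation over one window, followed by a conversion of the correlation into a tampering probability. The conversion shown here is the Bayesian posterior of $H_0$ under equal priors, which is an assumption of this sketch; the exact mapping used in [25] should be taken from that reference. The function names are hypothetical, and `var0`, `var1` are variances, as in the text.

```python
import math


def normalized_correlation(r, z):
    """Normalized cross-correlation of two equally long flat windows."""
    n = len(r)
    mean_r = sum(r) / n
    mean_z = sum(z) / n
    dr = [a - mean_r for a in r]
    dz = [b - mean_z for b in z]
    norm = math.sqrt(sum(a * a for a in dr)) * math.sqrt(sum(b * b for b in dz))
    if norm == 0.0:
        return 0.0  # flat window: no usable PRNU evidence
    return sum(a * b for a, b in zip(dr, dz)) / norm


def gaussian_pdf(q, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(q - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def tampering_probability(q, q_hat, var0, var1):
    """Posterior P(H0 | q) under equal priors (an assumption of this sketch)."""
    p0 = gaussian_pdf(q, 0.0, var0)    # H0: PRNU absent, region tampered
    p1 = gaussian_pdf(q, q_hat, var1)  # H1: PRNU present, region authentic
    if p0 + p1 == 0.0:
        return 0.5  # both densities underflow: no evidence either way
    return p0 / (p0 + p1)
```

For example, with a predicted correlation of 0.1 and both variances set to 1e-4, a measured correlation of 0 maps to a probability close to 1 (tampered), a measured correlation of 0.1 maps to a probability close to 0, and the midpoint 0.05 maps to 0.5.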

The proposed PRNU-based multi-scale tampering localization algorithm
This section describes the proposed multi-scale segmentation strategies in PRNU-based image tampering localization. Figure 1 shows the framework of the proposed algorithm. First, a multi-scale segmentation method segments the test image on successive scales. The segmentation result for each scale is composed of nonoverlapping, irregular patches. For each scale, PRNU correlations are computed on the nonoverlapping patches to obtain a real-valued candidate tampering probability map. Subsequently, the candidate tampering probability maps of all scales are fused into a single, more reliable map by the adaptive fusion method. Finally, we use CRF modeling to obtain the final decision map.
The SLIC algorithm is applied to segment the input image on multiple scales. SLIC adopts a k-means clustering approach to generate superpixels efficiently and adheres to boundaries as well as or better than other similar segmentation methods [1]. At the same time, it is fast, memory efficient and simple to use. In most cases, a 1920 × 1080 image can be segmented into thousands of patches in 2 s on a personal computer with a 3.60 GHz CPU and 16 GB of RAM. By default, the only parameter of the algorithm is J, the desired number of superpixels.
Assume that the test image has size M × N. In the segmentation stage, the computational complexity of overlapping (sliding window) segmentation is O(MN), whereas that of nonoverlapping segmentation is O(J), which is much lower. Compared with the existing sliding window-based analysis, superpixel segmentation by SLIC can therefore significantly reduce the complexity of the subsequent image processing. Furthermore, irregular and meaningful regions adhere to object boundaries better than regular blocks. Figure 2 gives an example of image segmentation obtained by SLIC. The man dressed in red in the middle of the image is the tampered region. We segment the image by means of the SLIC algorithm with the initial number of superpixels J = 100 (left) and J = 1000 (right).
However, the initial number of superpixels in SLIC is difficult to determine, and tampered objects of various sizes are difficult to detect and locate with one fixed initial number. Different initial numbers of superpixels can produce different forgery detection results. When J is too small, the average superpixel is large. Reliable statistics can be obtained for large superpixels, but a small tampered region occupies only a small part of a superpixel, which may cause false detection in the following steps. In contrast, if J is large enough to separate all possible tampered areas, the average superpixel is small. Small tampered regions can then be located accurately, but small superpixels yield noisier and more uncertain statistics. Hence, to combine the benefits of small-scale and large-scale analysis, we propose to segment the test image on multiple scales. The proposed multi-scale segmentation method therefore plays an important role in detecting forgeries of various sizes.

Multi-scale segmentation
In the first step of the proposed algorithm, we segment the test image $I$ into $S$ scales, and the segments satisfy
$$I = \bigcup_{j=1}^{J_s} T_j^s, \qquad s = 1, \dots, S, \tag{7}$$
where $T_j^s$ denotes the $j$th patch on the $s$th scale, $J_s$ is the total number of segments on the $s$th scale, and $S$ is the total number of scales.
Note that the segments on each scale $s$ ($s = 1, \dots, S$) are nonoverlapping, and the segmentations of different scales should not be intentionally the same, that is,
$$T_i^s \cap T_j^s = \emptyset \quad \text{for } i \neq j. \tag{8}$$

Obtaining a tampering probability map based on PRNU analysis across each scale
In contrast to sliding window-based analysis [25,32], the proposed nonoverlapping regions of irregular shape are expected to delineate the boundaries of contrasting objects accurately and with lower complexity. The image PRNU and the camera PRNU are estimated by Eq. (2) and Eq. (3), respectively. Then, a PRNU-based analysis is conducted on each scale $s$ ($s = 1, \dots, S$) to obtain a tampering probability map. In the rest of this section, unless otherwise specified, all operations are performed on the same scale $s$. For each patch $T_j^s$ ($j = 1, \dots, J_s$), the correlation between the image PRNU and the camera PRNU is computed only over the pixels that belong to the $j$th patch, that is,
$$q_j = \operatorname{corr}(r_{T_j^s}, z_{T_j^s}). \tag{9}$$
The binary hypothesis test then becomes
$$H_0: q_j \sim \mathcal{N}(0, \sigma_0(\omega_j)), \qquad H_1: q_j \sim \mathcal{N}(\hat{q}_j(\omega_j), \sigma_1(\omega_j)), \tag{10}$$
where $\sigma_0(\omega_j)$, $\hat{q}_j(\omega_j)$ and $\sigma_1(\omega_j)$ are obtained by cubic spline interpolation between the original values $\sigma_0(\omega_s)$, $\hat{q}_j(\omega_s)$ and $\sigma_1(\omega_s)$ used in multi-scale square window analysis. Note that the square windows $\{\omega_s\}$ ($s \in \{1, \dots, S\}$) used for spline interpolation are the same as in [25].
To prevent excessive degradation of the correlation statistic, at least $\omega_{\min}^2$ pixels are required for the computation in the proposed scheme. If the segmentation yields a smaller region, we expand it by morphological dilation.
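The expansion of undersized patches can be illustrated as follows. This is a minimal sketch that assumes a 4-connected (cross-shaped) structuring element and a patch stored as a set of pixel coordinates; neither choice is stated in the text, and the function name is hypothetical.

```python
def expand_patch(pixels, min_pixels, height, width):
    """Grow a patch by 4-connected dilation until it holds at least min_pixels.

    pixels: set of (row, col) tuples. The cross-shaped structuring element
    is an assumption; the text does not state which one is used.
    """
    patch = set(pixels)
    while len(patch) < min_pixels:
        grown = set(patch)
        for row, col in patch:
            for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                n_row, n_col = row + d_row, col + d_col
                if 0 <= n_row < height and 0 <= n_col < width:
                    grown.add((n_row, n_col))
        if len(grown) == len(patch):
            break  # nothing left to add: the patch already covers the image
        patch = grown
    return patch
```

Here `min_pixels` plays the role of $\omega_{\min}^2$. Note that an expanded patch may extend into neighboring superpixels, so in this sketch the expansion affects only the pixels used for computing the correlation, not the segmentation itself.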
Then, candidate tampering probability maps c s for each scale s(s = 1, ⋯, S) are obtained by Eq. (6). The detailed steps of the proposed PRNU-based nonoverlapping segmentation algorithm are shown in Algorithm 1.
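The patch-wise analysis for one scale can be sketched as follows, assuming the segmentation is available as a label map with one superpixel label per pixel. The flat-list layout and the function name are hypothetical, and the interpolation of the hypothesis-test parameters is omitted.

```python
import math
from collections import defaultdict


def patch_correlations(labels, residual, prnu_term):
    """Return {patch label: normalized correlation} for one segmentation scale.

    labels, residual, prnu_term: equally long flat lists, one entry per pixel.
    prnu_term holds z = y * k, the camera PRNU modulated by the image content.
    """
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)  # collect the pixel indices of each patch

    correlations = {}
    for label, indices in groups.items():
        r = [residual[i] for i in indices]
        z = [prnu_term[i] for i in indices]
        mean_r = sum(r) / len(r)
        mean_z = sum(z) / len(z)
        dr = [a - mean_r for a in r]
        dz = [b - mean_z for b in z]
        norm = math.sqrt(sum(a * a for a in dr) * sum(b * b for b in dz))
        # a patch with constant values carries no correlation evidence
        correlations[label] = sum(a * b for a, b in zip(dr, dz)) / norm if norm else 0.0
    return correlations


# Toy example with two patches of three pixels each.
labels = [0, 0, 0, 1, 1, 1]
residual = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
prnu_term = [2.0, 4.0, 6.0, 3.0, 2.0, 1.0]
q = patch_correlations(labels, residual, prnu_term)
```

Each pixel is visited once per scale, in contrast to sliding window analysis, where every pixel contributes to many overlapping windows.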

Fusion of the multi-scale tampering probability maps
With the PRNU-based multi-scale nonoverlapping segmentation analysis, a set of tampering probability maps $c^s$ ($s = 1, \dots, S$) of the test image is obtained. The next task is to fuse these multi-scale maps with an adaptive fusion approach into a single, more reliable tampering probability map that combines the benefits of both small-scale and large-scale analyses. The analysis starts by evaluating the tampering probability $c_i^s$ according to Eq. (6) for the smallest scale (e.g., $s = 1$ in our experiment), where $i$ denotes the location of the $i$th pixel. If the patch is too small and a confident decision cannot be reached, the patch size is increased to the next available scale $s + 1$. Such an approach uses smaller patches in more confident, bright and flat areas and larger patches in darker, more textured regions of the image. In our experiments, we proceed to the next patch size if $|c_i^s - 0.5| < 0.5 - \Delta c_1$. The new tampering probability estimate is accepted if it is more confident than the previous one. If the next (larger) scale reinforces a previous, reasonably confident detection ($|c_i^s - 0.5| > \Delta c_2$), we stop increasing the scale. The described algorithm is summarized as pseudocode in Algorithm 2.
An example of the effect of using the adaptive fusion algorithm can be seen in Fig. 3, with color bars from 1 to 7 representing 7 scales. In this case, noisier and more uncertain regions are replaced by another region taken from a different scale.
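The adaptive fusion rule described in this section can be sketched per pixel as follows. This is one reading of the prose description and not a transcription of Algorithm 2: in particular, "reinforces" is interpreted as the larger scale falling on the same side of 0.5 as a previous estimate whose confidence exceeds $\Delta c_2$. The default values of `delta_c1` and `delta_c2` are hypothetical placeholders, not the values of Table 1.

```python
def fuse_pixel(candidates, delta_c1, delta_c2):
    """Fuse the tampering probabilities of one pixel, ordered from the
    smallest scale to the largest scale."""

    def confidence(c):
        return abs(c - 0.5)

    best = candidates[0]
    for candidate in candidates[1:]:
        if confidence(best) >= 0.5 - delta_c1:
            break  # already confident: no larger patch is needed
        # The larger scale agrees with a reasonably confident earlier estimate.
        reinforced = (
            confidence(best) > delta_c2
            and (candidate - 0.5) * (best - 0.5) > 0
        )
        if confidence(candidate) > confidence(best):
            best = candidate  # accept only a more confident estimate
        if reinforced:
            break
    return best


def fuse_maps(maps, delta_c1=0.1, delta_c2=0.2):
    """maps: list of flat probability maps, one per scale (smallest first)."""
    return [fuse_pixel(list(column), delta_c1, delta_c2) for column in zip(*maps)]
```

With two scales, `fuse_maps([[0.5, 0.98], [0.9, 0.1]])` keeps 0.98 for the second pixel, which is already confident at the smallest scale, and moves to the larger scale for the first pixel, where the smallest scale is undecided.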

Obtaining the final decision map
Based on the obtained tampering probability map, the final decision map is obtained with a CRF model. The problem is formulated in terms of a CRF and amounts to finding the optimal labeling of the authentication units (label $t_i = 1$ denotes a tampered unit) that minimizes the energy function of [25] (Eq. (11)), in which $N$ is the number of pixels in the test image. The decision is controlled by a decision threshold $\tau$ and parameterized by the tampering penalty $\alpha$ and the interaction parameter $\beta$.
Readers can refer to [25] for more details. To speed up processing, the tamper probability map is resized to a smaller size (e.g., 240 × 135 in our experiment) before using CRF in the proposed scheme.

Experimental results
In this section, we discuss the performance of the proposed technique. The proposed method was implemented in MATLAB 2015a on a computer with a 3.4 GHz CPU and 16 GB of RAM. The forgery localization performance is evaluated with the $F_1$-score:
$$F_1 = \frac{2\,TP}{2\,TP + FN + FP}, \tag{12}$$
where TP, FN and FP denote the numbers of detected true positives, false negatives and false positives, respectively. In addition, we generate the corresponding receiver operating characteristic (ROC) curve by sweeping the decision threshold $\tau$ over 24 values uniformly distributed in (0, 1).
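The evaluation measure can be illustrated with a short sketch. The $F_1$-score follows its standard definition; the threshold grid `step / 25` is only one possible choice of 24 uniformly spaced values in (0, 1), and plain thresholding of the probability map stands in here for the CRF-based decision, so the helper names and the sweep are assumptions of this example.

```python
def f1_score(tp, fn, fp):
    """F1 = 2 TP / (2 TP + FN + FP); defined as 0 when nothing is counted."""
    denominator = 2 * tp + fn + fp
    return 2 * tp / denominator if denominator else 0.0


def f1_at_threshold(probabilities, truth, tau):
    """Pixel-wise F1 after thresholding a flat tampering probability map.

    truth holds 1 for tampered pixels and 0 for authentic pixels.
    """
    tp = fn = fp = 0
    for probability, tampered in zip(probabilities, truth):
        detected = probability > tau
        if detected and tampered:
            tp += 1
        elif tampered:
            fn += 1
        elif detected:
            fp += 1
    return f1_score(tp, fn, fp)


# 24 thresholds spread uniformly over the open interval (0, 1)
thresholds = [step / 25 for step in range(1, 25)]
```

Evaluating `f1_at_threshold` for every value in `thresholds` and averaging over the images of one camera gives a curve of the kind plotted against $\tau$ in the experiments.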

Dataset selection
Experiments are conducted on a realistic tampering dataset proposed by Korus et al. [25], which contains a total of 136 tampered images originating from four cameras: a Sony α57, a Canon 60D, a Nikon D90, and a Nikon D7000, with 52, 27, 31 and 26 tampered images, respectively. All images are 1920 × 1080 pixel RGB uint8 bitmaps stored in TIFF format. The forgeries are of various sizes and characters and include object insertion, object removal and more subtle changes to existing content, such as subtle shadows or reflections, which are unlikely to be detected with PRNU analysis. The experiment was performed separately for each camera.

Parameter selection
Table 1 shows the parameter values used in the experiments. The square windows $\omega_s$ ($s = 1, \dots, S$) used for interpolation are {32, 48, 64, 96, 128, 192, 256}. Parameter $J$ is related to the number of segmentation patches. In our experiment, the relationship between $\omega_s$ and $J_s$ satisfies Eq. (13):
$$J_s = \left[\frac{M \times N}{\omega_s^2}\right], \tag{13}$$
where $M \times N$ represents the size of the test image and $[\cdot]$ indicates the rounding function. The parameters used in the camera model and CRF decision are the same as in [25].
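The relation between window size and superpixel count can be checked numerically. One relation consistent with the window sizes listed above and with the single-scale superpixel counts used in the comparison below is $J_s = [MN / \omega_s^2]$; the following sketch evaluates it for a 1920 × 1080 image. The function name is hypothetical.

```python
def superpixel_counts(height, width, window_sizes):
    """Number of superpixels whose average area matches each square window."""
    return [round(height * width / (w * w)) for w in window_sizes]


counts = superpixel_counts(1080, 1920, [32, 48, 64, 96, 128, 192, 256])
print(counts)  # [2025, 900, 506, 225, 127, 56, 32]
```

Each superpixel on scale $s$ therefore covers, on average, the same number of pixels as the corresponding $\omega_s \times \omega_s$ square window, which is what makes the interpolated hypothesis-test parameters comparable across the two kinds of analysis.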

Localization performance and comparisons
To validate the effectiveness of the proposed multi-scale SLIC (M-SLIC) algorithm, we compare the proposed scheme with 7 single-scale SLIC methods (J ∈ {2025, 900, 506, 225, 127, 56, 32}). ROC curves and the average $F_1$-score obtained when changing the decision threshold τ are plotted in Fig. 4 for each of the camera datasets separately. Figure 4a shows the ROC curves for the four cameras. To improve readability, we show only a close-up of the most relevant region. Compared with all 7 single scales, the proposed M-SLIC strategy delivers superior performance for all four cameras. Similar results can be observed in Fig. 4b, where the average $F_1$-score is plotted against the decision threshold τ for each of the datasets. The maximum average $F_1$-score of the M-SLIC method is higher than that of all 7 single scales for all cameras. This confirms that the proposed fusion strategy can effectively combine the benefits of both small-scale and large-scale analyses.
To assess the performance of the proposed PRNU-based M-SLIC algorithm, we also compare it with two other PRNU-based methods. One is the sliding window-based segmentation-guided (SW-SG) strategy [25], and the other is the sliding window-based single-scale (SW-SS) detector with the standard 128 × 128 pixel window [32]. For these two methods, we use the source code provided by the authors with default parameters to generate the results. In our previous study, it was confirmed through experiments that the average $F_1$-scores all reached their maximum with the parameter J = 700 for all four cameras. Therefore, in the following comparison, J is fixed to 700 in the single-scale SLIC (S-SLIC) algorithm. The obtained results are shown in Figs. 5 and 6. For the Sony α57, Canon 60D and Nikon D7000 cameras, the proposed M-SLIC strategy gives the most stable improvement and performs better than the SW-SG, SW-SS and S-SLIC methods. For the Nikon D90 dataset, M-SLIC performs similarly to the SW-SG method and better than the other two methods. Similar tendencies can be observed from the average $F_1$-score plotted in Fig. 6.
To show clearly which methods perform better, the maximum average $F_1$-score is given in Table 2. The maximum value for each camera is highlighted in bold. As shown in Table 2, most of the bold numbers appear for the proposed scheme (the last column), indicating the effectiveness of the proposed M-SLIC algorithm. A slight performance decline appears for the Nikon D90 camera, because its dataset contains many subtle object removal forgeries.
We also present some examples of tampering localization results in Fig. 7 for the strategies mentioned above. It can be observed that the proposed algorithm detects both small and large forgeries. Moreover, it detects not only additive tampering (2nd and 4th rows) but also object removal tampering (1st and 3rd rows). In addition, compared with the SW-SG and SW-SS algorithms, the proposed scheme achieves better localization accuracy.

JPEG compression robustness test
The tampered image may undergo JPEG compression after manipulation. The following experimental results demonstrate the performance of the proposed method when the images are JPEG compressed. We used the Nikon D7000 dataset for this experiment. Photoshop is used to compress the TIFF images into JPEG images with quality factors varying from 100 to 70 in steps of −10. That is, each picture in the original forgery dataset yields four compressed versions. We used the same camera models and predictors as in the previous experiments. The impact of JPEG compression on tampering localization performance is shown in Fig. 8. As the quality factor decreases, the detection result deteriorates. The proposed multi-scale fusion strategy delivers the best maximum average $F_1$-score for all JPEG quality factors.

Conclusion
This paper introduced a novel PRNU-based multi-scale fusion method to expose copy-move and splicing forgery in digital images. Different from existing sliding window-based algorithms, in which correlations of PRNU are estimated on overlapped patches, the proposed scheme estimates them on nonoverlapping, irregular patches. The merits of the proposed approach are as follows. The proposed algorithm is particularly good at identifying the location and shape of object insertion forgeries. It uses nonoverlapping irregular segmentation, which can accurately delineate the boundaries of contrasting objects with lower computational complexity. In addition, multi-scale analysis allows forgeries of various sizes to be detected.
Despite the present advances, there is still considerable room for improvement. Although subtle object removal forgeries can be detected by the proposed scheme, the localization accuracy needs to be improved further in future work. Since the PRNU can be contaminated mainly by image content and nonunique artefacts of JPEG compression, improving the quality of the estimated PRNU is one of the major directions for future research. In addition, a better and more robust correlation predictor for the PRNU detector should be designed in future work.