1 Introduction

Today’s advanced media technologies, such as digital image processing, video coding [38], high-efficiency video coding (HEVC) [24, 35], the Internet of Things (IoT) [36, 45], and cloud computing (CC) [39], considerably affect daily life. In particular, digital images are used in many applications, such as military imaging, medical diagnosis, art, and photography. The reliability of digital images is thus becoming an important issue. However, photo editing software now makes it very easy to manipulate digital images without leaving visible traces, which motivates research in image forensics. One of the principal problems in image forensics is determining whether a particular image is authentic and, if it has been manipulated, localizing the parts that have been altered. Since forgery localization requires pixel-level rather than image-level analysis, it is more challenging than forgery detection.

1.1 Related works

Instead of using digital watermarks [3] and signatures [47], many passive methods have been proposed for image forgery detection. Copy-move and splicing are the most common forms of digital image manipulation. For copy-move forgery, there are mainly two classes of detection algorithms [11]. One is based on blockwise division, such as discrete wavelet transform (DWT) [43], principal component analysis (PCA) [34], and Zernike moments [42], and the other is based on keypoint extraction, such as scale-invariant feature transform (SIFT) [4, 27, 41] and speeded-up robust features (SURF) [37]. For splicing forgery, the region spliced in from another image has a significantly different intrinsic noise variance. The method in [33] exposes region splicing by revealing inconsistencies in local noise levels. However, the spliced region and the target image differ in many more respects than just noise. In [13], a feature-based algorithm to detect image splicing was proposed. Local features were computed from the co-occurrence of image residuals and used to extract synthetic feature parameters. The authors in [14] regarded features coming from the spliced area as anomalies and iterated autoencoder-based modeling and discriminative labeling to distinguish them. Multi-scale noise discrepancies are used for image splicing forgery detection in [40]. Similarly, Yao et al. [46] explored possible noise level inconsistency using a noise level function (NLF) to detect image splicing.

Recently, a multitask fully convolutional network (MFCN) was proposed to localize image splicing attacks [44]. Since JPEG format is widely used and image splicing usually involves the operation of double JPEG compression, Bianchi et al. [5] exploited the artifacts arising from double JPEG compression. The illumination environment in pictures also presents some consistency: directions of lights [21], shadows [30] and illumination colors [6] can be estimated and used as cues. However, the methods mentioned above are intrinsically sensitive only to specific manipulations.

In addition, some methods rely on machine learning [12, 16, 18] and have reported good performance. However, these methods essentially depend on the availability and quality of training data, which is not always guaranteed.

An interesting approach to forgery detection relies on the characteristics of the digital camera, such as color filter array (CFA) interpolation artifacts [17], lens aberration [22] and sensor pattern noise (SPN) [32]. This approach has drawn considerable attention because such characteristics are unique to individual cameras and stable under varying environmental conditions. Photo-response non-uniformity (PRNU) noise is the dominant component of SPN. PRNU is the result of imperfections caused by the manufacturing process and the inhomogeneity of silicon wafers. Lukas et al. [32] initially developed a PRNU-based technique for image forgery detection and camera identification. The camera PRNU noise is estimated by averaging noise residues extracted from images acquired by the camera. Given an image, they obtained the pattern noise from the image using a smoothing filter and identified the camera model by comparing it with candidate reference patterns. In [7], the maximum likelihood estimator (MLE) was used to estimate the camera PRNU. In view of the good performance of the PRNU-based algorithm, many studies have improved it in several respects. Since denoising contributes significantly to the accuracy of PRNU estimation, denoising filters, such as predictors based on the eight-neighbor context-adaptive interpolation (PCAI) algorithms [23] and block matching and 3D filtering (BM3D) algorithms [15], have been discussed. Since the PRNU is a very weak signal, Lin et al. [28] argued that some components of SPN are severely contaminated by the errors introduced by denoising filters and that the quality of PRNU can be improved by abandoning those components. To reduce undesirable nonunique noise components, Lin et al. [29] proposed equalizing the magnitude spectrum of the reference SPN to decrease the false identification rate. Later, a three-stage enhancement of the PRNU was proposed in [26]. More recently, PRNU has been used to detect forgeries caused by hue modification [20]. However, these methods mainly aim to discriminate whether a given image is pristine or fake. In practice, we are more interested in determining the tampered regions, a task called tampering localization.

In this paper, we focus on tampering localization based on the PRNU algorithm. The core of PRNU-based tampering localization involves the correlation of a known noise pattern with its estimate from the investigated image. The operation is often performed in a sliding window manner. To detect small-sized manipulations, Chierchia et al. proposed segmentation-based analysis [9] and a spatially adaptive filtering technique [8]. In addition, the authors in [10] cast the problem in terms of Bayesian estimation and adopted a Markovian prior to model the strong spatial dependences of the source, which allows for the propagation of reliable decisions into ambiguous areas. More recently, a multi-scale analysis was adopted to improve the localization resolution in [25]. Although the above methods can improve the resolution dramatically, these methods are all based on overlapped sliding windows, which can lead to many false decisions when the sliding window falls near the boundary between tampered and authentic regions. Lower localization accuracy near the boundary of tampered objects is still a major problem to be solved in tampering localization. Therefore, obtaining accurate image segmentation is necessary to improve the localization resolution.

Image segmentation aims to partition an image into several parts automatically or with simple interactions. It is a key step in image analysis. Graph cut technology is one of the leading algorithms for interactive segmentation [2], which is suitable for delineating a boundary of one or multiple objects from images. A multilevel banded heuristic for computation of graph cuts is proposed in [31] for fast image segmentation. Recently, an efficient hierarchical graph cut method was proposed for interactive RGB-D image segmentation, which can generate high-quality segmentation results and real-time interactions [19]. In recent years, researchers have focused on superpixels in the field of image segmentation. A new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels, was proposed [1].

1.2 Contributions

To address the abovementioned problems, in this study, we propose a novel PRNU-based forgery localization scheme using multi-scale nonoverlapping segmentation. The main contributions of this work are as follows: 1) The test image is segmented into nonoverlapping superpixels of multiple scales by the SLIC algorithm. This is the first time that a multi-scale SLIC strategy has been proposed in the framework of PRNU-based algorithms. 2) In each individual scale, unlike existing sliding window-based algorithms in which PRNU correlations are estimated on overlapped sliding windows, our algorithm directly computes correlations on nonoverlapped irregular patches, which can accurately delineate boundaries of contrasting objects with lower complexity. 3) An adaptive fusion strategy is used to combine multi-scale tampering probability maps. 4) Compared with existing state-of-the-art PRNU-based methods, the proposed algorithm achieves better experimental results in diverse situations.

The rest of this paper is organized as follows. Section 2 introduces the PRNU-based localization method. Section 3 describes the proposed strategy in detail. Section 4 presents a series of experiments and comparisons with state-of-the-art methods. Finally, conclusions are drawn in Section 5.

2 Background

This section introduces the basic analysis strategies in PRNU-based tampering localization. Let \( y\in {\mathbb{R}}^N \) be a digital image taken by a given camera, where yi denotes the value at site i, either the grayscale value or a single color component of a color image. Consider a simplified model [32] in which y can be written as:

$$ y=x+ kx+\theta $$
(1)

where x is the acquired noise-free image, θ an additive noise term, and k is the camera PRNU. For the purpose of forgery detection, k is the signal of interest, while all the rest can be considered undesired disturbances. Therefore, to eliminate the original signal x, the noise residual r is estimated as follows:

$$ r=y-\widehat{x}= yk+\left(x-\widehat{x}\right)+\left(x-y\right)k+\theta = yk+n $$
(2)

where \( \widehat{x}=D(y) \) is an estimate of the noise-free image x by applying a denoising filter D and n is the ensemble of all disturbances.
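
The chain of equalities in Eq. (2) can be checked term by term. The expansion below only restates Eqs. (1) and (2) with the intermediate step written out; it introduces no new symbols. Substituting Eq. (1) into the residual gives

$$ r=y-\widehat{x}=\left(x+ kx+\theta \right)-\widehat{x}=\left(x-\widehat{x}\right)+ kx+\theta $$

and rewriting the PRNU term as \( kx= ky+\left(x-y\right)k \) yields

$$ r= yk+\underset{n}{\underbrace{\left(x-\widehat{x}\right)+\left(x-y\right)k+\theta }} $$

so n collects the denoising error \( x-\widehat{x} \), the term \( \left(x-y\right)k \), and the additive noise θ.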

The main steps of the PRNU-based algorithm are as follows.

As a preliminary step, the camera PRNU is estimated from a large number of photos taken by the target camera. The noise residuals are extracted using Eq. (2) from a number of low-contrast images taken by the target camera; then, the camera PRNU k is obtained by maximum likelihood estimation from the noise residuals [7], that is,

$$ \widehat{k}=\frac{\sum \limits_{i=1}^m{W}_i{I}_i}{\sum \limits_{i=1}^m{I}_i^2}\kern0.5em i=1,\cdots, m $$
(3)

where m is the number of images involved in the calculation, Ii is the ith image taken by the target camera, and Wi is the corresponding noise residual extracted from Ii. Note that the multiplication operation in Eq. (3) is element wise.
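
As a rough illustration of Eq. (3), the following sketch accumulates the element-wise numerator and denominator over a set of flattened images. The function name `estimate_camera_prnu`, the list-of-lists representation and the toy data are assumptions made for this example; the denoising step that produces the residuals is not shown.

```python
def estimate_camera_prnu(images, residuals):
    """Element-wise estimate following Eq. (3): sum(W_i * I_i) / sum(I_i ** 2).

    `images` and `residuals` are lists of equally long flattened images
    (hypothetical representation chosen for this sketch).
    """
    if not images or len(images) != len(residuals):
        raise ValueError("need one residual per image")
    n_pixels = len(images[0])
    numerator = [0.0] * n_pixels
    denominator = [0.0] * n_pixels
    for image, residual in zip(images, residuals):
        for p in range(n_pixels):
            numerator[p] += residual[p] * image[p]
            denominator[p] += image[p] * image[p]
    # A pixel that is zero in every image carries no PRNU information;
    # returning 0 there is a convention of this sketch, not of the paper.
    return [n / d if d > 0 else 0.0 for n, d in zip(numerator, denominator)]


# Toy check: if every residual equals k * I exactly, the estimator returns k.
true_k = [0.01, -0.02, 0.005]
images = [[100.0, 120.0, 90.0], [80.0, 60.0, 200.0]]
residuals = [[k * v for k, v in zip(true_k, img)] for img in images]
k_hat = estimate_camera_prnu(images, residuals)
```

In this noise-free toy case the estimate coincides with `true_k` up to floating-point error; with real residuals the averaging over m images is what suppresses the disturbance n.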

Then, the image PRNU is estimated by Eq. (2) in the second step. Since there is only one image to be detected, the noise residual r is often used to approximate its image PRNU.

Third, tampering detection is based on sliding window analysis. Let wi denote the sliding analysis window of size w × w centered around pixel i. For each analysis window wi, the normalized cross-correlation qi in Eq. (4) is used to compare the image PRNU against the camera PRNU.

$$ {q}_i= corr\left({r}_{w_i},{z}_{w_i}\right)=\frac{\left({r}_{w_i}-{\overline{r}}_{w_i}\right)\odot \left({z}_{w_i}-{\overline{z}}_{w_i}\right)}{\left\Vert {r}_{w_i}-{\overline{r}}_{w_i}\right\Vert \cdot \left\Vert {z}_{w_i}-{\overline{z}}_{w_i}\right\Vert } $$
(4)

Note that \( {r}_{w_i} \) is the noise residual and \( {z}_{w_i}={y}_{w_i}\cdot {k}_{w_i} \) is an estimate of the camera PRNU in Eq. (4).
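
A minimal sketch of the correlation in Eq. (4) is given below. It assumes that ⊙ denotes the inner product of the two mean-removed windows, which is the usual reading of a normalized cross-correlation, and that each window is passed as a flat list; the function name is hypothetical.

```python
import math


def normalized_correlation(r, z):
    """Normalized cross-correlation of two equally long windows (Eq. (4))."""
    if len(r) != len(z) or not r:
        raise ValueError("windows must be non-empty and of equal length")
    mean_r = sum(r) / len(r)
    mean_z = sum(z) / len(z)
    dr = [a - mean_r for a in r]
    dz = [b - mean_z for b in z]
    norm = math.sqrt(sum(a * a for a in dr)) * math.sqrt(sum(b * b for b in dz))
    if norm == 0.0:
        # A constant window has no variation to correlate; returning 0 is a
        # convention of this sketch.
        return 0.0
    return sum(a * b for a, b in zip(dr, dz)) / norm
```

The value lies in [−1, 1]: it is close to 1 when the residual follows the expected PRNU term and close to 0 when the two are unrelated.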

Given k, the detection problem can be formulated as a binary hypothesis test between hypothesis H0 that the camera PRNU is absent and hypothesis H1 that the PRNU is present:

$$ \Big\{{\displaystyle \begin{array}{c}{H}_0:\kern0.5em {q}_i\sim N\left(0,{\sigma}_0\right)\\ {}\begin{array}{cc}{H}_1:& {q}_i\end{array}\sim N\left({\widehat{q}}_i,{\sigma}_1\right)\end{array}} $$
(5)

where the expected correlation predictor \( {\widehat{q}}_i \) accounts for special situations, such as saturated image regions, in which the PRNU cannot be detected, and σ0 and σ1 are the standard deviations of the detection statistic under H0 and H1, respectively (the corresponding variances \( {\sigma}_0^2 \) and \( {\sigma}_1^2 \) appear in Eq. (6)).

Then, Korus et al. [25] converted the measured correlation qi into tampering probability map ci:

$$ {c}_i=P\left({q}_i|{\sigma}_0,{\sigma}_1,{\widehat{q}}_i\right)={\left(1+{e}^{-\log \left({\sigma}_1/{\sigma}_0\right)-\frac{{\left({q}_i-{\widehat{q}}_i\right)}^2}{2{\sigma}_1^2}+\frac{q_i^2}{2{\sigma}_0^2}}\right)}^{-1} $$
(6)
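
Eq. (6) is the posterior probability of H0 under equal priors, written as a logistic function of the log-likelihood ratio. The sketch below evaluates it for scalar inputs, treating σ0 and σ1 as standard deviations (consistent with the squared terms in the exponent). The function name and the numerical values in the usage lines are illustrative assumptions, not values from the paper.

```python
import math


def tampering_probability(q, q_hat, sigma0, sigma1):
    """Tampering probability of Eq. (6) for one measured correlation q."""
    exponent = (-math.log(sigma1 / sigma0)
                - (q - q_hat) ** 2 / (2.0 * sigma1 ** 2)
                + q ** 2 / (2.0 * sigma0 ** 2))
    if exponent > 700.0:
        # math.exp would overflow; the probability tends to 0 in this limit.
        return 0.0
    return 1.0 / (1.0 + math.exp(exponent))


# Hypothetical numbers: a correlation matching the predictor looks authentic,
# a correlation near zero looks tampered.
p_authentic = tampering_probability(0.1, 0.1, 0.02, 0.03)
p_tampered = tampering_probability(0.0, 0.1, 0.02, 0.03)
```

With equal standard deviations, a correlation halfway between 0 and the predicted value gives a probability of 0.5, which is the expected point of maximum uncertainty.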

In the last step, the final decision map is obtained by a conditional random field (CRF) model [25].

3 The proposed PRNU-based multi-scale tampering localization algorithm

This section describes the proposed multi-scale segmentation strategies in PRNU-based image tampering localization. Figure 1 shows the framework of the proposed algorithm. First, a multi-scale segmentation method is proposed to segment the test image into successive scales. The segmentation result for each scale is composed of nonoverlapping and irregular patches. For each scale, PRNU correlations are computed on nonoverlapped patches to obtain a real-valued candidate tampering probability map. Subsequently, the candidate tampering probability maps of all scales are fused into a single, more reliable map by the adaptive fusion method. Finally, we use CRF modeling to obtain the final decision map.

Fig. 1 Framework of the proposed multi-scale tampering localization scheme

The SLIC algorithm is applied to segment the input image on multiple scales. SLIC adopts a k-means clustering approach to efficiently generate superpixels and adheres to boundaries as well as or better than other similar segmentation methods [1]. It is also fast, memory efficient and simple to use. In most cases, an image with a size of 1920 × 1080 can be segmented into thousands of patches in 2 s using a personal computer with a 3.60 GHz CPU and 16 GB of RAM. By default, the only parameter of the algorithm is J, the desired number of superpixels.

Assume that the size of the test image is M × N. In the segmentation stage, the computational complexity of overlapping (sliding window) segmentation is O(MN), whereas that of nonoverlapping segmentation is O(J), which is much lower. Compared with the existing sliding window-based analysis, superpixel segmentation by SLIC can significantly reduce the complexity of the subsequent image processing. Furthermore, the irregular and meaningful regions can adhere to object boundaries better than regular blocks.

Figure 2 gives an example of image segmentation obtained by SLIC. The man dressed in red in the middle of the image is in the tampered regions. We segment the image by means of the SLIC algorithm with the initial number of superpixels J = 100 (left) and 1000 (right).

Fig. 2 An example of image segmentation using SLIC. The initial number of superpixels is a J = 100 and b J = 1000

However, the initial number of superpixels in SLIC is difficult to determine. It is difficult to detect and locate tampered objects of various sizes with one fixed initial segmentation number. Different initial numbers of superpixels can produce different forgery detection results. When J is too small, the average size of superpixels is too large. Reliable statistics can be obtained for large superpixels. However, the small tampered regions will occupy only a small part of the superpixels, and thus it may cause false detection in the following steps. In contrast, if J is large enough to decompose all possible tampered areas, the average size of superpixels is too small. Smaller tampered regions can be accurately detected on smaller scales, while smaller superpixels yield more noise and uncertainty. Hence, to combine the benefits of small-scale and large-scale analysis, we propose to segment the test image into multiple scales. Therefore, the proposed multi-scale segmentation method plays an important role in detecting various size forgeries.

3.1 Multi-scale segmentation

In the first step of the proposed algorithm, we segment the test image I into S scales, and the segments satisfy:

$$ \Big\{{\displaystyle \begin{array}{c}\underset{j=1}{\overset{J_s}{\cup }}{T}_j^s=I\\ {}{T}_i^s\cap {T}_j^s=\varPhi \left(i\ne j,i,j=1,\cdots, {J}_s\right)\end{array}} for\kern0.5em s=1,\cdots, S $$
(7)

where \( {T}_j^s \) indicates the jth patch, and there are Js total segments on the sth scale. S is the total number of scales.

Note that segments for each scale s (s = 1,…, S) are nonoverlapped and segments from different scales should not be intentionally the same, that is:

$$ {T}^p\ne {T}^q\kern0.5em \mathrm{for}\kern0.5em p\ne q $$
(8)
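
The two conditions in Eq. (7) and the requirement in Eq. (8) can be checked mechanically. The sketch below represents each patch as a set of pixel indices, which is a representation chosen for this example only; the function names are hypothetical.

```python
def is_partition(patches, n_pixels):
    """Check Eq. (7): patches are pairwise disjoint and cover all pixels."""
    covered = set()
    for patch in patches:
        if covered & patch:
            return False  # two patches share a pixel
        covered |= patch
    return covered == set(range(n_pixels))


def partitions_differ(patches_a, patches_b):
    """Check Eq. (8): two scales do not use the same set of patches."""
    return {frozenset(p) for p in patches_a} != {frozenset(p) for p in patches_b}


# Toy image with 8 pixels segmented at a coarse and a fine scale.
coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]
fine = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
```

A superpixel label map satisfies Eq. (7) by construction, since every pixel receives exactly one label; the check is mainly useful after post-processing steps that modify patches.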

3.2 Obtaining a tampering probability map based on PRNU analysis across each scale

In contrast to sliding window-based analysis [25, 32], the proposed nonoverlapped regions of irregular shape are expected to accurately delineate boundaries of contrasting objects with lower complexity. The image PRNU and the camera PRNU are estimated by Eq. (2) and Eq. (3), respectively. Then, a PRNU-based analysis is conducted across each scale s (s = 1,…, S) to obtain a tampering probability map. In the rest of this section, unless specified, all operations are performed on the same scale s. For each patch \( {T}_j^s\left(j=1,\cdots, {J}_s\right) \), the correlation between the image PRNU and the camera PRNU is computed only for pixels that belong to the jth patch, that is:

$$ {q}_j= corr\left({R}_j^s,{Z}_j^s\right)=\frac{\left({R}_j^s-{\overline{R}}_j^s\right)\odot \left({Z}_j^s-{\overline{Z}}_j^s\right)}{\left\Vert {R}_j^s-{\overline{R}}_j^s\right\Vert \cdot \left\Vert {Z}_j^s-{\overline{Z}}_j^s\right\Vert}\kern0.5em \left(j=1,\cdots, {J}_s\right) $$
(9)

where \( {R}_j^s \) and \( {Z}_j^s={T}_j^s\cdot {K}_j^s \) are the image PRNU and camera PRNU, respectively, of the patch \( {T}_j^s \).
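
The patch-wise correlation of Eq. (9) can be sketched as follows, assuming that the segmentation is available as a flat label map with one integer label per pixel and that ⊙ denotes the inner product of the mean-removed signals. All names are hypothetical, and the residual and PRNU term are assumed to be precomputed.

```python
import math
from collections import defaultdict


def _ncc(r, z):
    """Normalized cross-correlation of two equally long lists."""
    mean_r = sum(r) / len(r)
    mean_z = sum(z) / len(z)
    dr = [a - mean_r for a in r]
    dz = [b - mean_z for b in z]
    norm = math.sqrt(sum(a * a for a in dr)) * math.sqrt(sum(b * b for b in dz))
    return sum(a * b for a, b in zip(dr, dz)) / norm if norm > 0.0 else 0.0


def patch_correlations(labels, residual, prnu_term):
    """Return {label: q_j}, using only the pixels that belong to each patch."""
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)
    return {
        label: _ncc([residual[i] for i in idx], [prnu_term[i] for i in idx])
        for label, idx in members.items()
    }


# Toy example with two patches of three pixels each.
labels = [0, 0, 0, 1, 1, 1]
residual = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
prnu_term = [2.0, 4.0, 6.0, 3.0, 2.0, 1.0]
q = patch_correlations(labels, residual, prnu_term)
```

Each pixel is visited once per scale, in contrast to sliding window analysis, where every pixel contributes to many overlapping windows.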

The problem can be cast as a binary test between hypothesis H0 that the camera PRNU is absent and hypothesis H1 that the camera PRNU is present. We define \( \sum {T}_j^s \) as the number of actual pixels of the patch \( {T}_j^s \). Thus, \( {\omega}_j=\left[\sqrt{\sum {T}_j^s}\right] \) represents the equivalent square window size, where [⋅] represents the rounding function. Since the test image is divided into nonoverlapped regions of irregular shape, the distribution models for the hypotheses in Eq. (5) are adjusted according to the number of actual pixels used in the correlation calculation:

$$ \Big\{{\displaystyle \begin{array}{ll}{H}_0:& {q}_j\sim N\left(0,{\sigma}_0\left({\omega}_j\right)\right)\\ {}{H}_1:& {q}_j\sim N\left({\widehat{q}}_j\left({\omega}_j\right),{\sigma}_1\left({\omega}_j\right)\right)\end{array}} $$
(10)

where σ0(ωj), \( {\widehat{q}}_j\left({\omega}_j\right) \) and σ1(ωj) are obtained by cubic spline interpolation between the original values σ0(ωs), \( {\widehat{q}}_j\left({\omega}_s\right) \) and σ1(ωs) used in multi-scale square window analysis. Note that the set of square windows {ωs}(s ∈ {1, ⋯, S}) used for spline interpolation is the same as in [25].

To prevent excessive degradation of the correlation statistic, at least \( {\omega}_{\mathrm{min}}^2 \) pixels are required for the computation in the proposed scheme. If the segmentation yields a smaller region, we expand it with morphological dilation.
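
The sketch below illustrates how a patch's pixel count maps to an equivalent window size and how a model parameter could then be looked up between the square window sizes listed in Section 4.2. Two points are assumptions of this example rather than the paper's method: piecewise-linear interpolation is swapped in for the cubic spline to keep the code dependency-free, and the parameter values in `SIGMA0` are hypothetical placeholders.

```python
import math

WINDOWS = [32, 48, 64, 96, 128, 192, 256]   # square window sizes (Section 4.2)
SIGMA0 = [1.0 / w for w in WINDOWS]          # hypothetical placeholder values


def equivalent_window(n_pixels):
    """omega_j: rounded square root of the patch's pixel count."""
    return int(math.floor(math.sqrt(n_pixels) + 0.5))


def interpolate(omega, knots, values):
    """Piecewise-linear stand-in for the cubic spline; clamps outside the knots."""
    if omega <= knots[0]:
        return values[0]
    if omega >= knots[-1]:
        return values[-1]
    for k in range(len(knots) - 1):
        x0, x1 = knots[k], knots[k + 1]
        if x0 <= omega <= x1:
            t = (omega - x0) / (x1 - x0)
            return values[k] + t * (values[k + 1] - values[k])


def needs_expansion(n_pixels, omega_min):
    """True if the patch has fewer than omega_min ** 2 pixels."""
    return n_pixels < omega_min ** 2


omega_j = equivalent_window(1500)            # a patch with 1500 pixels
sigma0_j = interpolate(40, WINDOWS, SIGMA0)  # halfway between the 32 and 48 knots
```

The morphological dilation applied to undersized patches is not shown; `needs_expansion` only flags the patches that would require it.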

Then, candidate tampering probability maps cs for each scale s(s = 1, ⋯, S) are obtained by Eq. (6). The detailed steps of the proposed PRNU-based nonoverlapping segmentation algorithm are shown in Algorithm 1.

Algorithm 1 PRNU-based nonoverlapping segmentation algorithm

3.3 Fusion of the multi-scale tampering probability maps

The PRNU-based multi-scale nonoverlapping analysis yields a set of tampering probability maps cs(s = 1, ⋯, S) for the test image. The next task is to fuse these multi-scale tampering probability maps using an adaptive fusion approach to obtain a single, more reliable tampering probability map that combines the benefits of both small-scale and large-scale analyses. The analysis starts by evaluating the tampering probability \( {c}_i^s \) according to Eq. (6) for the smallest scale (e.g., s = 1 in our experiment). Note that i denotes the location of the ith pixel. If the patch is too small and a confident decision cannot be reached, the patch size is increased to the next available scale s + 1. Such an approach uses smaller patches in more confident, bright and flat areas and larger patches in darker, more textured regions of the image. In our experiments, we proceed to the next patch size if \( \mid {c}_i^s-0.5\mid <0.5-\varDelta {c}_1 \). The new tampering probability estimate is accepted if it is more confident than the previous one. If the next (larger) scale reinforces a previous, reasonably confident detection (\( \mid {c}_i^s-0.5\mid >\varDelta {c}_2 \)), we stop increasing the scale. The described algorithm is summarized as pseudocode in Algorithm 2.

Algorithm 2 Adaptive fusion of the multi-scale tampering probability maps
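
The fusion rule described in this section can be sketched for a single pixel as follows. This is one possible reading of the textual description, not a transcription of Algorithm 2: the interpretation of "reinforces" (same side of 0.5 and at least as confident) is an assumption, and the default thresholds `dc1` and `dc2` are hypothetical values, since the settings of Table 1 are not reproduced here.

```python
def fuse_pixel(probs, dc1=0.05, dc2=0.3):
    """Fuse per-scale tampering probabilities of one pixel (smallest scale first).

    dc1 and dc2 play the roles of the thresholds written as Delta c_1 and
    Delta c_2 in the text; their default values here are hypothetical.
    """
    best = probs[0]
    for nxt in probs[1:]:
        conf_best = abs(best - 0.5)
        if conf_best >= 0.5 - dc1:
            break  # already confident, no need for a larger patch
        conf_next = abs(nxt - 0.5)
        same_side = (nxt - 0.5) * (best - 0.5) > 0
        # Assumed meaning of "reinforces": the larger scale agrees with a
        # reasonably confident previous estimate and is at least as confident.
        reinforces = same_side and conf_best > dc2 and conf_next >= conf_best
        if conf_next > conf_best:
            best = nxt  # accept the more confident estimate
        if reinforces:
            break
    return best


def fuse_maps(maps, dc1=0.05, dc2=0.3):
    """Apply fuse_pixel independently at every pixel of equally sized flat maps."""
    return [fuse_pixel(list(column), dc1, dc2) for column in zip(*maps)]
```

For example, `fuse_pixel([0.85, 0.9, 0.02])` keeps 0.9: the second scale reinforces a reasonably confident first estimate, so the contradictory third scale is never consulted.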

An example of the effect of using the adaptive fusion algorithm can be seen in Fig. 3, with color bars from 1 to 7 representing 7 scales. In this case, noisier and more uncertain regions are replaced by another region taken from a different scale.

Fig. 3 An example of the adaptive fusion method. Color bars from 1 to 7 represent the 7 scales

3.4 Obtaining the final decision map

Based on the obtained tampering probability map, the final decision map is obtained with a CRF model. The problem is formulated in terms of a CRF over the tampering probability map c and amounts to finding the optimal labeling of authentication units (where the label ti = 1 denotes a tampered unit) that minimizes the following energy function [25]:

$$ E\left(t|c\right)=\sum \limits_{i=1}^N{E}_{\tau}\left({c}_i,{t}_i\right)+\alpha \sum \limits_{i=1}^N{t}_i+\sum \limits_{i=1}^N\sum \limits_{j\in {\varDelta}_i}{\beta}_{ij}\mid {t}_i-{t}_j\mid $$
(11)

where N is the number of pixels in the test image. The decision is controlled by a decision threshold τ and parameterized by tampering penalty α and interaction parameter β. Readers can refer to [25] for more details. To speed up processing, the tamper probability map is resized to a smaller size (e.g., 240 × 135 in our experiment) before using CRF in the proposed scheme.
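
To make the structure of Eq. (11) concrete, the sketch below evaluates the energy of a given labeling on a small grid. Several choices are assumptions of this example: the unary term \( {E}_{\tau } \) is passed in as a callable because its exact form is defined in [25], the neighborhood Δi is taken to be 4-connected, and βij is a constant. The code only evaluates the energy; the minimization itself is not shown.

```python
def crf_energy(labels, probs, unary, alpha, beta, width):
    """Evaluate Eq. (11) for a flat, row-major labeling of a grid.

    unary(c_i, t_i) stands in for E_tau; a constant beta and a 4-connected
    neighborhood are assumptions of this sketch.
    """
    n = len(labels)
    height = n // width
    energy = 0.0
    for i in range(n):
        energy += unary(probs[i], labels[i]) + alpha * labels[i]
        row, col = divmod(i, width)
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = row + d_row, col + d_col
            if 0 <= rr < height and 0 <= cc < width:
                j = rr * width + cc
                # The double sum in Eq. (11) visits every neighboring pair
                # from both sides, and so does this loop.
                energy += beta * abs(labels[i] - labels[j])
    return energy


# 2 x 2 toy grid with one pixel labeled as tampered and a zero unary term.
toy_energy = crf_energy([1, 0, 0, 0], [0.9, 0.1, 0.1, 0.1],
                        lambda c, t: 0.0, alpha=1.0, beta=0.5, width=2)
```

In the toy case the tampering penalty contributes α = 1 and the two label discontinuities, each counted from both sides, contribute 4 × 0.5 = 2.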

4 Experimental results

In this section, we discuss the performance of the proposed technique. The proposed method was implemented in MATLAB R2015a on a computer with a 3.4 GHz CPU and 16 GB of RAM. The forgery localization performance is evaluated with the F1-score:

$$ {F}_1=\frac{2\cdot TP}{2\cdot TP+ FN+ FP} $$
(12)

where TP, FN and FP denote the numbers of true positives, false negatives, and false positives, respectively. In addition, we generate the corresponding receiver operating characteristic (ROC) curves by sweeping the decision threshold τ over 24 values uniformly distributed in (0, 1).
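
Eq. (12) can be computed directly from a binary decision map and its ground truth, as in the sketch below. Masks are flat lists of 0/1 values with 1 denoting a tampered pixel. The convention of returning 0 when the denominator vanishes and the particular threshold grid τ = k/25 are assumptions of this example; the text only states that 24 uniformly distributed values in (0, 1) are used.

```python
def f1_score(predicted, truth):
    """F1-score of Eq. (12) for two flat binary masks (1 = tampered)."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    denominator = 2 * tp + fn + fp
    # Returning 0 for an empty prediction on an untampered image is a
    # convention of this sketch.
    return 2 * tp / denominator if denominator else 0.0


# One possible grid of 24 thresholds strictly inside (0, 1).
thresholds = [k / 25 for k in range(1, 25)]

score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])  # TP = 1, FP = 1, FN = 1
```

Here the score is 2 / (2 + 1 + 1) = 0.5.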

4.1 Dataset selection

Experiments are conducted on a realistic tampering dataset proposed by Korus et al. [25], which contains a total of 136 tampered images originating from four cameras: a Sony α57, a Canon 60D, a Nikon D90, and a Nikon D7000. The four cameras contribute 52, 27, 31 and 26 tampered images, respectively. All images are 1920 × 1080 pixel RGB uint8 bitmaps stored in the TIFF format. The forgeries are of various sizes and characters and include object insertion, object removal and more subtle changes to existing content, such as subtle shadows or reflections, which are unlikely to be detected with PRNU analysis. The experiment was performed separately for each camera.

4.2 Parameter selection

Table 1 shows the parameter values used in the experiments. The square window ωs(s = 1, ⋯, S) used for interpolation includes {32, 48, 64, 96, 128, 192, 256}. Parameter J is related to the number of segmentation patches. Note that, in our experiment, the relationship between ωs and Js satisfies Eq. (13):

$$ {\omega}_s=\left[\sqrt{\frac{M\times N}{J_s}}\right] $$
(13)

where M × N represents the size of the test image and [⋅] indicates the rounding function. The parameters used in the camera model and CRF decision are the same as in [25].
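
The relation in Eq. (13) can be checked numerically for the 1920 × 1080 images of the dataset. The sketch below inverts Eq. (13) to obtain a superpixel count from each square window size; the function names and the round-half-up convention are assumptions, since the exact rounding function is not specified.

```python
import math


def round_half_up(value):
    """Round a non-negative number to the nearest integer."""
    return int(math.floor(value + 0.5))


def superpixels_for_window(width, height, omega):
    """Invert Eq. (13): superpixel count whose average area is omega ** 2."""
    return round_half_up(width * height / (omega * omega))


def window_for_superpixels(width, height, count):
    """Eq. (13): equivalent square window size for a superpixel count."""
    return round_half_up(math.sqrt(width * height / count))


WINDOWS = [32, 48, 64, 96, 128, 192, 256]
counts = [superpixels_for_window(1920, 1080, w) for w in WINDOWS]
print(counts)  # [2025, 900, 506, 225, 127, 56, 32]
```

These counts coincide with the single-scale values of J used in Section 4.3. Applying Eq. (13) in the forward direction returns the same window sizes except at the coarsest scale, where J = 32 gives 255 under round-to-nearest, so the correspondence with 256 is approximate there.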

Table 1 Parameters used in the experiments

4.3 Localization performance and comparisons

To validate the effectiveness of the proposed multi-scale SLIC (M-SLIC) algorithm, we compare the proposed scheme with the 7 single-scale (J ∈ {2025, 900, 506, 225, 127, 56, 32}) SLIC methods. ROC curves and average F1-scores obtained by varying the decision threshold τ are plotted in Fig. 4 for each of the camera datasets separately. Figure 4a shows the ROC curves for the four cameras. To improve readability, we show only a close-up of the most relevant region. Compared with all 7 single scales, the proposed M-SLIC strategy delivered superior performance for all four cameras. Similar results can be observed in Fig. 4b, where the average F1-score is plotted against the decision threshold τ for each of the datasets. The maximum average F1-score of the M-SLIC method is higher than that of all 7 single scales for all cameras. This confirms that the proposed fusion strategy can effectively combine the benefits of both small-scale and large-scale analyses.

Fig. 4 Comparison of the proposed M-SLIC algorithm with the individual single-scale SLIC algorithms. a ROC curve comparison. b Average F1-score comparison

To assess the performance of the proposed PRNU-based M-SLIC algorithm, we also compare it with two other PRNU-based methods. One is the sliding window-based segmentation-guided (SW-SG) strategy [25], and the other is the sliding window-based single scale (SW-SS) detectors with the standard 128 × 128 pixel window [32]. For these two methods, we use the source codes provided by the authors with default parameters to generate the results. In addition, we also compare the proposed M-SLIC algorithm with the single-scale SLIC (S-SLIC) algorithm proposed in our previous paper.

In our previous study, experiments confirmed that the average F1-scores reached their maximum with the parameter J = 700 for all four cameras. Therefore, in the following comparison, J is fixed to 700 in the S-SLIC algorithm. The obtained results are shown in Figs. 5 and 6. For the Sony α57, Canon 60D and Nikon D7000 cameras, the proposed M-SLIC strategy shows the most stable improvement and performs better than the SW-SG, SW-SS and S-SLIC methods. For the Nikon D90 dataset, M-SLIC performs similarly to the SW-SG method and better than the other two methods. Similar tendencies can be observed from the average F1-scores plotted in Fig. 6.

Fig. 5 Comparison of the ROC curves between the proposed M-SLIC algorithm and the SW-SG, SW-SS and S-SLIC (J = 700) algorithms for the four cameras

Fig. 6 Comparison of the average F1-scores between the proposed M-SLIC algorithm and the SW-SG, SW-SS and S-SLIC (J = 700) algorithms for the four cameras

To clearly show which methods perform better, the maximum average F1-scores are listed in Table 2. The maximum value for each camera is highlighted in bold. As shown in Table 2, most of the bold numbers appear in the column of the proposed scheme (the last column), indicating the effectiveness of the proposed M-SLIC algorithm. A slight performance decline appears for the Nikon D90 camera. The reason is that the Nikon D90 dataset contains many subtle object removal forgeries.

Table 2 Maximum average F1-scores for the four cameras

We also present some examples of tampering localization results in Fig. 7 for the strategies mentioned above. It can be observed that the proposed algorithm can detect both small and large forgeries. Moreover, the proposed algorithm can detect not only additive tampering (2nd and 4th rows) but also object removal tampering (1st and 3rd rows). In addition, compared with the SW-SG and SW-SS algorithms, the proposed scheme achieves much better localization accuracy.

Fig. 7 Example of tampering localization results. The pixels in white, black, red, and green indicate true positive, true negative, false positive, and false negative, respectively. Here, positive means fake pixels, while negative means pristine pixels. a Original image. b Tampered image. c Ground truth. d SW-SS [32] method. e SW-SG [25] method. f Proposed M-SLIC method

4.4 JPEG compression robustness test

The tampered image may undergo JPEG compression after manipulation. The following experimental results demonstrate the performance of the proposed method when the images are JPEG compressed. We used the Nikon D7000 dataset for this experiment. Photoshop is used to compress the TIFF images into JPEG images with quality factors decreasing from 100 to 70 in steps of 10. That is, each picture in the original forgery dataset is converted into four compressed versions. We used the same camera models and predictors as in the previous experiments.

The impact of JPEG compression on tampering localization performance is shown in Fig. 8. As the quality factor decreases, the detection result deteriorates. The proposed multi-scale fusion strategy delivers the best maximum average F1-score for all JPEG quality factors.

Fig. 8 JPEG compression test for the Nikon D7000 dataset. Note that 70, 80, 90 and 100 are the compression quality factors and TIFF denotes the uncompressed TIFF images. a Comparison of the average F1-scores of the proposed algorithm and the SW-SS and SW-SG algorithms. b Comparison of the average F1-scores of the proposed algorithm and the 7 single-scale (J ∈ {2025, 900, 506, 225, 127, 56, 32}) SLIC algorithms

5 Conclusion

This paper introduced a novel PRNU-based multi-scale fusion method to expose copy-move and splicing forgery in digital images. Different from existing sliding window-based algorithms, in which correlations of PRNU are estimated on overlapped patches, the proposed algorithm directly segments the test image into nonoverlapping and irregular blocks of multiple scales and computes correlations on nonoverlapped segmentation patches.

The merits of the proposed approach are as follows. The proposed algorithm is particularly good at identifying the location and shape of the object insertion forgeries. It uses nonoverlapped irregular segmentation, which can accurately delineate boundaries of contrasting objects with lower computational complexity. In addition, multi-scale analysis can detect as many types of forgery as possible.

Despite the present advances, there is still considerable room for improvement. Although subtle object removal forgeries can be detected by the proposed scheme, the localization accuracy needs to be further improved in future work. Since PRNU can be contaminated mainly by image content and non-unique artefacts of JPEG compression, improving the quality of the estimated PRNU is one of the major research directions for the future. In addition, a better and more robust correlation predictor for the PRNU detector should be designed in future work.