1 Introduction

Digital forensics is a branch of forensic science which deals with the problem of determining the authenticity of digital data. Digital data such as images play a significant role in digital forensics since they are a main source of information. With the advent of image editing software, digital images can easily be manipulated. Tampered images can be found everywhere and this has eroded confidence in the reliability of digital images.

Digital image forensics can be categorized into two groups: active and passive. In the active approach [1], a digital watermark or signature is embedded into the image to verify its integrity and authenticity. A digital watermark can be visually undetectable but can be used to detect changes in image pixels and locate where the changes occurred [2]. However, watermark removal software is readily available at no cost and the images are still vulnerable to forgery. For this reason, passive methods have been introduced which require no prior information to detect tampering [3].

Images can be modified in several ways such as image splicing, retouching, and copy-move forgery [4]. Image retouching alters an image in order to change the look of a subject [5]. Image splicing refers to copying a part of an image and pasting it onto another image. Most techniques to detect image splicing rely on the sharp edges and corners of the pasted region and the inconsistency in the color of the pasted region compared to the original image [6]. In contrast, copy-move forgery refers to copying a part of an image and pasting it onto the same image. Since the pasted region comes from the same image, the color is typically consistent which makes it hard to detect. Thus, copy-move forgery is more difficult to detect than other types and is the focus of this paper.

In this paper, a new copy-move forgery detection technique is introduced which is based on singular vector decomposition (SVD) for features extraction and Kolmogorov Smirnov (KS) test for decomposition. The proposed method can detect copy-move forgery in images with accuracy up to the pixel level by using only 4 features per image block of dimension \(16 \times 16\). Four different post-processing techniques are considered, namely brightness change (BC), contrast adjustment (CA), color reduction (CR), and image blurring (IB). Comparisons are conducted with several well-known techniques in terms of accuracy, time complexity, and image post-processing using the well-known CoMoFoD dataset [7].

The remainder of this paper is organized as follows. Section 2 provides a review of existing approaches for copy-move forgery detection. Section 3 explains the structure of block-based methods and a review of the existing block-based methods. Section 4 discusses the proposed method. The experimental results, comparisons and analysis are given in Sect. 5. Finally, the conclusions are drawn in Sect. 6.

2 Copy-move forgery

Copy-move forgery has a long history beginning in the early 1900s when Alexander Malchenko (standing, on the left) was edited out from the image as shown in Fig. 1. This task was complicated due to the limited tools available at that time. In contrast, there are now tools that can easily perform this task such as Adobe Photoshop which is available at no cost.

Copy-move forgery is usually used to maliciously hide or add information as in Fig. 1. Since the forged region comes from the same image, the resulting image has at least one duplicated region. The goal of detection methods is to determine these regions. Searching for duplicated regions by comparing pixels in the image is a direct solution but is slow and computationally expensive. Further, post-processing techniques such as brightness change (BC), color reduction (CR), contrast adjustments (CA), and image blurring (IB) can be used on the image or just the copied region to make the forgery harder to detect. This makes copy-move forgery detection a challenging task.

Fig. 1
figure 1

Alexander Malchenko has been edited out: a original image, and b forged image

Copy-move forgery detection can be keypoint-based or block-based. In block-based methods, an image is partitioned into fixed size overlapping or nonoverlapping rectangular or circular blocks. A feature vector is extracted for each block and these vectors are matched by calculating the distance between them. This distance can be Euclidean distance [8, 9], Hamming distance [10], Hausdorff distance [11], logical distance [12], correlation coefficient [13, 14], phase correlation [15, 16], or local sensitive hashing [17, 18]. The main concern with block-based methods is their computational complexity.

Keypoint-based methods detect and describe local features in an image using techniques such as the scale invariant feature transform (SIFT) and speeded up robust features (SURF) [19]. These features are used to find matching regions in the image and if two regions have similar keypoint features, one is assumed to be forged. However, these methods can fail when the forged regions have been modified using techniques such as image retouching [20]. For this reason, block-based methods are employed here. The next section presents a review of copy-move forgery detection using these methods.

3 Block-based methods

A number of block-based methods have been proposed to detect copy-move forgery [21, 22]. The goal is to find similar regions in a forged image.

3.1 Preprocessing

Some block-based methods require the image to be preprocessed before any further analysis. In [5, 15, 18, 22, 23] an image was converted to grayscale using the luminance formula \(Y=0.299R+0.587G+0.114B\), where R, G, and B represent the red, green, and blue luminance respectively, of the colour image. Each color is represented by a vector which provides the brightness intensity, for example 0 to 255 for 8-bit images.

3.2 Image partitioning

In block-based methods, an image is partitioned into fixed size overlapping or non-overlapping blocks. Square block partitioning was used in [5, 17, 21, 22, 24]. Circular partitioning was used in [9, 14] to reduce the dimension of each block compared to square blocks which lowers the computational complexity. After an image is partitioned into blocks, features can be extracted from each block to find those with similar features. Other methods operate directly on the blocks without feature extraction such as those in [25].

3.3 Feature extraction and matching techniques

Copy-move forgery detection is used to find replicated regions in an image. However, the copied region can be processed prior to pasting into another part of the image to make it harder to detect. The detection accuracy depends on finding matching features in a copy-move pair of blocks in an image even if the image has undergone post-processing such as blurring, color reduction, brightness change, or contrast adjustment [26]. To accomplish this goal, several procedures have been proposed.

In [21], a method for copy-move forgery detection was given which is based on the discrete cosine transform (DCT). In this approach, DCT coefficients are extracted for each block. To reduce the complexity, the coefficients are sorted lexicographically and then quantized to reduce the effects of noise, JPEG compression, brightness change or other post-processing. After quantization, the coefficients for each block are compared and a block is said to be copied if these coefficients are comparable. To reduce the false positive rate, a region is detected as forged only if there is a cluster of copied blocks in the region. However, this may result in small forged regions going undetected.

In [27], the DCT coefficients were considered as eigenvalues for each block. Then, the distances between the eigenvectors of all blocks were calculated and lexicographically sorted to reduce the false positive rate [21]. If the distance between two blocks is less than a certain threshold, the block is considered to be a duplicate. This method is robust to noise and JPEG compression but not to other post-processing operations. Comparing block features makes block-based partitioning methods complex. As a result, several techniques have been developed to lower the complexity by reducing the length of the feature vectors. Cao et al. [24, 28] reduced the feature vector length by using circular blocks instead of square blocks. Each block is divided into four regions. DCT is applied to each region so each is represented by DCT coefficients. The mean of the coefficients in each region is used as a feature resulting in 4 features per block. This method is robust to noise and blurring, and can detect multiple forgeries in the same image. However, it is not robust against rotation or scaling.

Several block-based algorithms have been proposed to reduce the feature vector length while being robust to post-processing operations. A log-polar transform was used in [14, 27, 29]. This transform maps the image blocks to log-polar coordinates [30]. This approach is robust to post-processing operations such as rotation, scaling, mirroring, illumination modification, and chrominance modification, but not against other techniques such as noise, JPEG compression or blurring.

SVD is a matrix factorization technique that can be used to extract features from an image. Li et al. [31] proposed an algorithm based on SVD and the Discrete Wavelet Transform (DWT). The DWT is used to decompose an image into a series of coefficients corresponding to the image spatio-frequency subbands and SVD is employed to extract the feature vectors. Then these vectors are sorted lexicographically to detect duplicated regions. Zhang and Wang [32] introduced a method which combines SVD with a k-dimensional (KD) tree. A KD tree is commonly used to search for nearest neighbours. First, SVD is used to extract a feature vector for each block, and these features are organized using a KD tree for fast searching. Similar blocks are matched using the Euclidean distance

$$D(u,v) = \left( {\sum\limits_{{i = 1}}^{r} {((} u(i) - v(i))^{2} } \right)^{{\frac{1}{2}}} ,$$

where \(u=(u_1, u_2, \ldots , u_r)^T\) and \(v=(v_1, v_2,\ldots , v_r )^T\) are length r feature vectors. Xiaobing and Shengmin [1] used an approach similar to that in [32], but only the k largest singular values were considered and the remaining values discarded to reduce the feature vector size. Kang and Cheng [1] also retained only the largest k values. Regions were matched using the cumulative contribution which is defined as

$$\eta = {{\left( {\sum\limits_{{i = 1}}^{k} {(\lambda _{i} )} } \right)} \mathord{\left/ {\vphantom {{\left( {\sum\limits_{{i = 1}}^{k} {(\lambda _{i} )} } \right)} {\left( {\sum\limits_{{i = 1}}^{r} ( \lambda _{i} )} \right)}}} \right. \kern-\nulldelimiterspace} {\left( {\sum\limits_{{i = 1}}^{r} ( \lambda _{i} )} \right)}},$$

where \(\lambda\) is the feature vector sorted from largest to smallest and r is the number of singular values. However, this method is not robust to post-processing operations.

The size of the feature vectors is an important factor in the complexity of detection algorithms. Steerable pyramid is a technique used for image decomposition. In this method, an image is smoothed using a smoothing filter and subsampled by a factor of 2 along each coordinate. This process is repeated multiple times which looks like a pyramid if illustrated graphically. In [5], steerable pyramid decomposition was applied to image blocks to reduce the size from \(16 \times 16\) to \(4 \times 4\). Then, similar blocks were extracted using the Gaussian copula which is a distribution constructed from a multivariate normal distribution.

3.3.1 Kolmogorov–Smirnov (KS) test

Kolmogorov–Smirnov test is used to compare two samples based on a distribution function [33]. Given samples \(x_1,\ldots , x_n\) of a random variable with cumulative distribution function (CDF) F, consider the problem of testing \(H_0:F=F_n\) versus \(H_1:F \ne F_n\), where \(F_n\) is some specified distribution function. For example, in the univariate case \(H_0\) can be tested using the KS test statistic \(sup_{x \in R} \mid F_n(x)-F(x)\mid\), where \(sup_{x\in R}\) is the supremum of the set of distances, F is the CDF of the reference distribution, and \(F_n\) is the empirical distribution of the samples  [34]. The test statistic measures the similarity between distributions and equals 0 when they are identical.

3.3.2 Steerable pyramid

Steerable pyramid is a mathematical tool that is multi-orientation, multi scale image decomposition technique. This transform was first introduced in the literature in early 1990s. It is a wavelet-based representation [35]. Steerability refers to the ability of the wavelets to rotate to any orientation by forming suitable linear combinations of a primary set of equiangular directional wavelet components [36, 37]. In steerable pyramid decomposition, an image is decomposed into lowpass and highpass subbands, using steerable filters \(L_0\) and \(H_0\). Then, the lowpass band breaks down into a set of bandpass subbands \(B_0, \ldots , B_k\) and lower lowpass subband \(H_1\). The resulting lower lowpass subband is subsampled by a factor of 2 along the x and y directions. This process is repeated until we arrive at the desired scale of decomposition. Figure 2 shows an image of size \(128 \times 128\) pixels decomposed to its steerable pyramid subbands. This example shows 4 orientations and 3 scales. Each scale has 4 orientations. The first scale is \(128 \times 128\) where the second and third scales are \(64 \times 64\) and \(32 \times 32\) respectively. Finally, the last subband is \(16 \times 16\) pixels [36].

Fig. 2
figure 2

Lena Image and its decomposed steerable pyramid subbands

3.3.3 Singular value decomposition (SVD)

Singular value decomposition has been used in several fields such as data compression, signal processing and pattern analysis [1, 38]. An \(N \times M\) matrix \(A \in R^{N \times M}\) of rank j can be factored in the form

$$\begin{aligned} A=P \Sigma Q^T \end{aligned}$$
(1)

where

$$\begin{aligned} \Sigma = \begin{bmatrix} \Sigma _j &{} \quad 0 \\ 0 &{} \quad 0 \end{bmatrix} \end{aligned}$$
(2)

is an \(N\times M\) diagonal matrix and \(\Sigma _r=diag(\sigma _1,\sigma _2,\ldots ,\sigma _j)\) is a square diagonal matrix with positive diagonal entries called the singular values of A, and orthogonal matrices \(P \in R^{N \times N}\) and \(Q \in R^{M \times M}\).

4 The proposed method

In the proposed method, an image is partitioned into blocks of size \(16 \times 16\) using a \(16 \times 16\) sliding window which is shifted by one pixel per step as shown in Fig. 3. This results in 246, 016 blocks for an image of size \(512 \times 512\). To reduce the computational complexity, each block is decomposed into a \(4 \times 4\) block using steerable pyramid decomposition. A \(4 \times 4\) block has 4 nonoverlapping \(2 \times 2\) sub-blocks. SVD is applied to each \(2 \times 2\) sub-block to extract a single singular value which is the corresponding feature as shown in Fig. 4. This results in 4 features per \(4 \times 4\) block. Thus, each original \(16 \times 16\) block is represented by a vector of only 4 features. Figure 5 shows a block diagram of the feature extraction process. The indices of the original blocks are stored with the feature vectors so that each pixel is associated with a feature vector.

Fig. 3
figure 3

Image partitioning into \(16 \times 16\) overlapping blocks.

Fig. 4
figure 4

Feature extraction from \(2 \times 2\) sub-blocks using SVD

Fig. 5
figure 5

The overall feature extraction process

The feature vectors are sorted lexicographically so that similar vectors are close which simplifies the search process. The KS test is applied to the sorted feature vectors. The empirical and reference CDFs \(F_n\) and F are represented by the two feature vectors being compared. If the test statistic is below or equal to the threshold, the two feature vectors are said to match and belong to the same distribution. Conversely, i it is above the threshold, the vectors are said to differ and belong to different distributions. Pixels belonging to the same distribution share the same features and so one group is said to be forged.

Pixels in the same region often share the same features such as brightness and color. Thus, applying the KS test to the corresponding feature vectors may increase the false positive rate. To avoid this problem, a minimum distance between pixels is employed. The distance between two pixels is defined as

$$\begin{aligned} D = |x_2-x_1|+|y_2-y_1| \end{aligned}$$

where \((x_1, y_1)\) and \((x_2, y_2)\) are the coordinates of the pixels. If \(D \ge d\) where d is a minimum distance threshold, then the pixels are said to be in different regions. This threshold should be high enough to avoid detecting pixels in the same region, but if it is too high forged pixels in different regions may be missed. A good threshold should have a low false positive rate and a low false negative rate. For this reason, several values were evaluated considering the false positive and false negative rates and the best value was found to be 50 pixels. A block diagram of the proposed algorithm is shown in Fig. 6.

Fig. 6
figure 6

Block diagram of the proposed algorithm

5 Results and discussion

In this section, experimental results are presented to evaluate the effectiveness of the proposed method for image forgery detection. This method is compared with state of the art image forgery detection methods in terms of complexity and robustness to post-processed techniques.

5.1 Image dataset

The CoMoFoD image database [7] is used here to evaluate the image forgery methods. This database consists of two groups of images, one with 200 images of size \(512 \times 512\) and the other with 60 images of size \(3000 \times 2000\). The small scale images are classified into 5 categories: translation, rotation, scaling, distortion, and combination of all previous. Each category consists of 40 images and all images are processed using 6 post-processing techniques. These techniques are brightness change (BC), contrast adjustment (CA), color reduction (CR), and image blurring (IB). The proposed method has been evaluated on the 40 images of the translation category and 4 post-processing techniques BC, CA, CR, and IB. The proposed method has also been evaluated on one high resolution image of size \(3000 \times 2000\). All images have 8 bit color vectors. Forged regions have been added to these images and modified using 4 different post-processing techniques [7]. For BC, the range of the color intensity values was changed by mapping the intensity to values between lower and upper bounds. Intensity values that fall outside these bounds are set to the corresponding minimum or maximum value. For CA, the image contrast is adjusted by mapping the range of color intensity values to a new interval bounded by lower and upper bounds. For BC and CA, the bounds are (0.01, 0.95), (0.01, 0.9), and (0.01, 0.8), denoted by 1, 2 and 3, respectively. For CR, the intensity values for each color are quantized to a smaller range. The ranges are (0, 32), (0, 64), and (0, 128) denoted by CR1, CR2 and CR3, respectively. For IB, the images are blurred by adding Gaussian noise to the intensity values. The mean of the noise is \(\mu = 0\) and the variances are \(\sigma = [0.009,0.005,0.0005]\) denoted by IB1, IB2 and IB3, respectively [7]. This is known as Gaussian blur. Note that IB1 is the highest level of blurring, whereas BC3, CA3, and CR3 are the highest levels of change for the other techniques. Figure 7 shows examples of forged images from CoMoFoD database. Column 1 is showing the original image, column 2 is showing the binary mask to show the duplicated regions, and columns 3 to 6 are showing examples of forged images post-processed with techniques BC, CA, CR, and IB respectively. The post-processing techniques showing in this table are at the highest level.

Fig. 7
figure 7

Examples of forged images from CoMoFoD database.

5.2 Performance of the proposed method

Figure 8 shows the performance of the proposed algorithm on 5 images from the CoMoFoD database. These images were selected because they show the performance for different sizes and numbers of forged regions. The first two columns show the original and forged images. Columns 3 to 6 show the results with the images after post-processing has been applied to the forged region, i.e. the detection results after BC1, CA1, CR1 and IB1 post-processing. The original region is circled in green and the forged region is circled in red. Thus in the first image, the bird was copied along with part of the adjacent area to make the color more consistent with the surrounding area after pasting. This makes the forged region hard to detect. In all cases, the proposed algorithm was able to detect the forged regions. The proposed method was able to locate and mask the forged region accurately even with a small region as in image 4 and multiple regions as in image 5.

Fig. 8
figure 8

The results obtained using the CoMoFoD database.

To evaluate the robustness of the proposed algorithm against post-processing, all 3 levels of each technique are considered. Figure 9 shows the results for four images from the database. This shows that the forged regions were detected in all 12 cases. The accuracy of the proposed algorithm for the 200 images in the database is evaluated using precision and recall. Typically, precision and recall are calculated at the image level, i.e. based on how many images are correctly classified as forged, but to provide more accurate results, the precision and recall are calculated per pixel here. In other words, precision and recall are calculated based on the percentage of pixels in the image that are correctly classified.

Fig. 9
figure 9

Four images altered using different levels of post-processing.

The precision, recall and F1_score are given by

$$\begin{aligned} \text{ Precision }=\frac{\text{ Forged } \text{ region } \cap \text{ Detected } \text{ region }}{\text{ Detected } \text{ region }}=\frac{TP}{TP+FP},\end{aligned}$$
(3)
$$\begin{aligned} \text{ Recall }=\frac{\text{ Forged } \text{ region } \cap \text{ Detected } \text{ region }}{\text{ Forged } \text{ region }}=\frac{TP}{TP+FN},\end{aligned}$$
(4)
$$\begin{aligned} \text {F1}\_\text {score}=\frac{2 \times \text{ Precision } \times \text{ Recall }}{\text{ Precision } + \text{ Recall }}, \end{aligned}$$
(5)

where TP is the true positive rate which is the number of forged pixels detected as forged, FP is the false positive rate which is the number of pixels incorrectly detected as forged, and FN is the false negative rate which is the number of forged pixels incorrectly detected as original. Precision is a good measure of the false positive rate whereas recall is a good measure of the false negative rate. F1_score is the weighted average of precision and recall which means it takes both false positives and false negatives into account. A good detection technique should have high precision, high recall, and high F1_score.

To calculate TP, FP and FN, the results are binarized using the Otsu method [39] for comparison with the binary masks from the database. In a binary mask, the intensity values of the forged pixels are set to 1 and the intensity values of the original pixels are set to 0. Figure 10 illustrates the precision and recall for the three levels of each of the four post-processing techniques. The results obtained are calculated by averaging over all 40 images chosen for evaluating this method. These results show that the precision and recall rates are almost constant over the levels with a maximum variation of \(\pm \;2\%\) except for blurring which reduces the precision to \(34\%\) at the highest level. This consistency reflects the robustness of the proposed algorithm even with image post-processing. The precision achieved is more than \(95\%\) for three of the post-processing techniques, and except for the highest level of blurring it is greater than 75%.

Table 1 presents FP for the three levels of each post-processing technique. This shows that FP increases as the level of post-processing increases and reaches a maximum of just over \(1\%\) except for image blurring which is \(4\%\) for the highest level of blurring (\(\mu = 0\), \(\sigma = 0.0005\)).

Fig. 10
figure 10

Precision and recall with image post-processing a brightness change, b contrast adjustment, c color reduction, and d image blurring

Table 1 FP for different levels of post-processing
Table 2 Copy-move forgery detection results for the proposed and three published methods [5, 21, 22].

Table 2 presents the results for one image with the proposed and three other methods [5, 21, 22]. The binary mask is shown along with the result for the other three methods for comparison with the proposed method. The forged region and the region where it was copied from are shown in white and the reminder of the image is shown in black. The feature size per block for each method is given in the last column. Table 3 presents the results for one image of size \(3000 \times 2000\) from the CoMoFoD high resolution database. The proposed method was able to detect the forged regions with an F1_score of 68.5%. The high precision and low recall mean that the proposed method has a low false positive as well as a high false negative ratio. Conversely, the method [21] was not able to detect the forgeries in this image due to the high complexity of the feature vectors used per block (i.e. 64 features) which exceeded the maximum array size of the software used.

The size of the feature vectors has a significant effect on the complexity. Thus, the main goal of the methods in [5, 21, 22] is to reduce the feature size while providing good detection performance. However, applying post-processing to an image may result in losing important features that can be used to find similar image blocks. For this reason, reducing feature size may result in the loss of more information which will affect the robustness against post-processing techniques. The last column in Table. 3 shows that the proposed method was able to accurately detect forged regions using only 4 features per block compared to the other methods which require 64, 32, and 16 features per block. The performance using the 200 small size images in the CoMoFoD database is given in Tables 4 and 5. In terms of precision, recall, and F1_score, the proposed method is the best while the technique in [21] is better in terms of the computation time. This is because the Big-O complexity of computing the DCT coefficients is O(N) while the Big-O complexity of SVD is \(O(N^2)\). However, the SVD features are better suited to image post-processing as the proposed method performed better than the other techniques for all four post-processing techniques using the lowest number of features.

Table 3 The results obtained for a \(3000 \times 2000\) image from the CoMoFoD high resolution database.
Table 4 Comparison of the proposed and two published methods [5, 21]
Table 5 Comparison of the proposed and two published methods [5, 21]

6 Conclusion

In this paper, a new copy-move forgery detection algorithm was presented which is based on SVD and the KS test. The singular values are used as features. The feature vectors are sorted lexicographically to reduce the false matching rate. The KS test is then used to extract similar features. The performance of this method was tested using the CoMoFoD database. The results obtained show that the proposed method can detect forgeries at a pixel level and has a minimal false positive rate of less than 4%. Furthermore, this method is robust to the post-processing techniques such as brightness change, contrast adjustment, color reduction, and image blurring. Although this method had an insignificant longer computation time compared to one method we used, but the proposed method scored the highest precision, recall, and F1_score which averaged 95% for brightness change, contrast adjustment, and color reduction. For image blurring, it achieved the highest precision, recall, and F1_score which are 70%, 82.7%, and 75%, respectively. The proposed method uses only 4 features per block which is smaller than those used in the other methods. We also evaluated our method using a high resolution image of size \(3000 \times 2000\) and we obtained an F1_score of 68.5% which is promising.