1 Introduction

The life-cycle of a digital image is extremely complicated nowadays: images are acquired by smartphones or digital cameras, edited, shared through Instant Messaging platforms [1], etc. At each step, the image may undergo a modification that changes the data without altering (in most cases) the semantic content. This makes the forensic analysis needed to reconstruct the history of an image, from the first acquisition device through each subsequent processing step, really difficult ([2, 3]). Even detecting whether an investigated image has been compressed only twice, namely Double Compression Detection, is a challenging task ([4–6]). The problem is further complicated by the possibility of cropping and/or resizing images (e.g., aligned and non-aligned scenarios [7, 8]). State-of-the-art image forensics techniques usually rely on different underlying assumptions specifically tailored to the task ([7–10]). This becomes particularly relevant when dealing with multiple compressions [11]. The robust inference of how many times an image has been compressed is a problem investigated with techniques working mainly in the aligned scenario ([12–15]). In particular, [15] pushes the detection up to triple compression by defining a three-class classification problem demonstrated to work only for multiply compressed images with the same Quality Factor.

Once an image has been detected to be multiply compressed, the reconstruction of the history of the image itself becomes challenging. First Quantization Estimation (FQE) has been widely investigated for both the aligned and non-aligned cases w.r.t. different datasets in the double compressed scenario.

A first technique for FQE was proposed by Bianchi et al. ([16–18]). They proposed a method based on the Expectation Maximization algorithm to predict the most probable quantization factors of the primary compression over a set of candidates. Other techniques based on statistical considerations of Discrete Cosine Transform (DCT) histograms were proposed by Galvan et al. [4]. Their technique works effectively in specific conditions on double compressed images, exploiting the a-priori knowledge of the monotonicity of the DCT coefficients through iterative histogram refinement. Strategies related to histogram analysis and filtering similar to Galvan et al. [4] have continued to be introduced up to the present day ([19–21]). Still, they lack robustness, tend to work only in the double compression scenario and only under specific conditions, showing many limits. Recently, Machine Learning has been employed for the prediction task, producing black-box models trained on statistical data w.r.t. specific datasets. For instance, Lukáš and Fridrich [22] introduced a first attempt exploiting neural networks, further improved in [23] with considerations on errors similar to [4]. Finally, Convolutional Neural Networks (CNN) were also introduced in some works ([24–26]). CNNs have proven to be incredibly powerful in finding hidden correlations among data, specifically in images, but they are also very prone to overfitting, making all such techniques extremely dependent on the dataset used for training ([27]). This drawback is in some way mitigated by employing as much training data as possible in wild conditions; in this way, Niu et al. [28] achieved top-rated results for both aligned and non-aligned double compressions.

All the techniques reported above try to estimate the first quantization matrix in a double compression scenario, although estimating just the previous quantization matrix of multiply compressed images could be of extreme importance for investigation, in order to understand intermediate processing. When it comes to multiple compressions, the number of compression parameters involved in each step for every single image becomes huge. Machine Learning techniques need to see and consider almost all combinations during the training phase, and are therefore not easily viable for this specific task. In this paper, an FQE technique based on simulations of multiple compression processes is proposed in order to detect the most similar DCT histogram computed in the previous compression step. The method is based only on information coming from a single image, thus it does not need a training phase.

The proposed technique starts from the information of the (known) last quantization matrix (easily readable from the image file itself) in conjunction with simulations of compressions applied to the image itself with proper matrices. Experiments on images compressed 2, 3 and 4 times show the robustness of the technique, providing useful insights for investigators at specific combinations of compression parameters. The remainder of this paper is organized as follows: Sections 2 and 3 describe the proposed approach and the datasets, Section 4 reports experimental results in different scenarios, and Section 5 concludes the paper.

2 Proposed Approach

Given a JPEG m-compressed (compressed m times) image I, the main objective of this work is the estimation of a reliable number k of quantization factors (in zig-zag order) of the 8×8 quantization matrix Qm−1 (i.e., the quantization table of the (m−1)-th compression), which can be defined as qm−1={q1,q2,…,qk}. The only information available about I is the last quantization matrix qm, which can be one of the standard JPEG quantization tables or a custom one ([29, 30]) and is available by accessing the JPEG file, together with the DCT coefficients of each 8×8 block (Dref), extracted for instance with the LibJpeg C library (Footnote 1). No inverse-DCT operation is performed at this step, thus no further rounding and truncation errors are introduced. The obtained DCT blocks and the respective coefficients (multiplied by qm) are collected to compute a histogram for each of the first k coefficients in classic zig-zag order, denoted by href,k(Dref) with k∈{1,2,…,64}.

A square patch CI of size d×d is cropped from the previously decompressed image I (e.g., with the Python Pillow library, Footnote 2), leaving out 4 pixels in each direction in order to break the JPEG block structure [22]. CI is then used as input to simulate JPEG compressions, carried out with a certain number n>0 of constant 8×8 matrices Mi with i∈{1,2,…,n}. The parameter n is simply set to the greatest value that can be assumed by the quantization factors employed in the previous quantization step in the worst scenario (i.e., the lowest Quality Factor). Once n is defined, the simulation of the compression of CI is arranged as follows: for i=1,2,…,n, an 8×8 quantization matrix Mi with each element equal to i is defined and used to generate the compressed images C′I,i. The second compression of the simulation is then performed by employing the known qm on each of the n images C′I,i, thus generating n new compressed images C″I,i. Each C″I,i represents a simulation of compression with known previous and last quantization parameters.
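
As a purely illustrative aid (not the authors' implementation), the following Python sketch shows how a single simulation step could be carried out with the Pillow library mentioned above, passing the constant matrix Mi through the qtables parameter; q_m is assumed to be a list of 64 integers read from the investigated file, and the function name is ours.

```python
import io
from PIL import Image

def simulate_double_compression(patch: Image.Image, i: int, q_m: list) -> bytes:
    """Compress the patch C_I with a constant 8x8 table M_i (all entries = i),
    then re-compress the result with the known last table q_m (64 integers).
    Assumes Pillow applies the provided qtables as-is to the luminance channel."""
    const_table = [i] * 64                                          # constant matrix M_i
    buf1 = io.BytesIO()
    patch.convert("L").save(buf1, "JPEG", qtables=[const_table])    # simulated previous compression
    buf1.seek(0)
    buf2 = io.BytesIO()
    Image.open(buf1).save(buf2, "JPEG", qtables=[q_m])              # last compression with known q_m
    return buf2.getvalue()
```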

As done with I, the DCT coefficients Di are extracted from C″I,i and the distributions hi,k(Di) are computed, with i∈{1,…,n}, which form a set of n distributions for the k-th coefficient, k∈{1,…,64}. Each hi,k(Di) is then analytically compared with the real distribution href,k(Dref) through the χ2 distance defined as follows:

$$ \chi^{2}(x,y)=\sum_{i=1}^{m} \frac{\left(x_{i} - y_{i}\right)^{2}}{x_{i} + y_{i}} $$
(1)

where x and y represent the distributions to be compared.
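
For reference, a minimal NumPy version of this distance could be written as follows; skipping bins where xi+yi=0 is our own choice to avoid division by zero and is not specified in the text.

```python
import numpy as np

def chi2_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Chi-square distance between two histograms of equal length, as in Eq. (1)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    denom = x + y
    mask = denom > 0                      # ignore empty bins (assumption, see lead-in)
    return float(np.sum((x[mask] - y[mask]) ** 2 / denom[mask]))
```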

Finally, the estimation of qm−1={q1,q2,…,qk} can be carried out for every quantization factor qk as follows:

$$ q_{k}=\underset{i=1,\ldots,n}{\operatorname{arg\,min}}\ \chi^{2}\left(h_{i,k}\left(D_{i}\right),h_{ref,k}\left(D_{ref}\right)\right) $$
(2)

For the sake of clarity, the pseudo-code of the process is reported in Algorithm 1.
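
Algorithm 1 itself is not reproduced here; the sketch below only summarizes how the pieces introduced above could fit together, reusing simulate_double_compression and chi2_distance from the previous snippets. extract_dct_histograms is a hypothetical helper (it could be built on a LibJpeg binding) that is assumed to return, for a JPEG byte stream, the de-quantized DCT histograms of the first k coefficients over a common, fixed binning; the crop size d=64 is only an example.

```python
import numpy as np
from PIL import Image

def estimate_previous_quantization(jpeg_path: str, q_m: list, n: int, k: int = 15) -> list:
    """Estimate the first k factors of q_{m-1} by exhaustive simulation (Eq. 2)."""
    # Reference histograms h_ref,k from the DCT coefficients of the file itself.
    with open(jpeg_path, "rb") as f:
        h_ref = extract_dct_histograms(f.read(), k)     # hypothetical helper (see lead-in)

    # Patch C_I, shifted by 4 pixels to break the 8x8 block grid (Section 2).
    d = 64
    patch = Image.open(jpeg_path).crop((4, 4, 4 + d, 4 + d))

    # One simulated double compression per candidate value i = 1..n.
    h_sim = [extract_dct_histograms(simulate_double_compression(patch, i, q_m), k)
             for i in range(1, n + 1)]

    # Eq. (2): per-coefficient argmin of the chi-square distance.
    q_prev = []
    for c in range(k):
        dists = [chi2_distance(h_sim[i][c], h_ref[c]) for i in range(n)]
        q_prev.append(1 + int(np.argmin(dists)))
    return q_prev
```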

3 Datasets

The effectiveness of the proposed approach was demonstrated through experiments performed on four datasets (BOSSBase [31], RAISE [32], Dresden [33] and UCID [34]) for the first quantization estimation in the double compression scenario: patches of different dimensions were obtained by extracting a proper region from the central part of the original images. A new set of doubly compressed images was then created starting from the cropped images with a certain number of combinations of parameters in terms of crop size and compression quality factors (employing only standard quantization tables [29]).

Other experimental datasets were similarly created from RAISE, using custom quantization tables employed by Photoshop and the collection shared by Park et al. [35]. The first dataset was obtained from all RAISE images cropped into 64×64 patches, employing the 8 highest-quality Photoshop custom quantization tables (out of 12 in total) for the first compression (where higher values correspond to better quality) and QF2={80,90}. The second dataset was built from 500 randomly picked full-size RAISE images by considering, for the first and second compression, a collection of 1070 custom tables with substantial differences from the standard ones, split into 3 quality clusters (LOW, MID, HIGH) computed from the mean of the first 15 DCT coefficients and selected randomly from the clusters in the compression phase.

Finally, a dataset for the multiple compression scenario was created starting from UCID [34], compressing patches of different sizes two, three and four times, with the last compression using QFm∈{80,90} and all previous compression steps using QF∈{60,65,70,75,80,85,90,95,100}.
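
As a hedged illustration only (file handling and quality values here are examples, not the exact generation script used for the datasets), repeated JPEG compressions with given quality factors can be simulated with Pillow as follows, ignoring details such as chroma subsampling settings:

```python
import io
import itertools
from PIL import Image

def recompress_chain(patch: Image.Image, qf_chain: list) -> bytes:
    """Apply successive JPEG compressions with the given quality factors."""
    img, data = patch, None
    for qf in qf_chain:
        buf = io.BytesIO()
        img.save(buf, "JPEG", quality=qf)
        data = buf.getvalue()
        buf.seek(0)
        img = Image.open(buf)                 # decode and feed into the next compression
    return data

# Example: all triple-compression chains with QF1, QF2 in {60,...,100} and QF3 in {80, 90}.
prev_qfs = [60, 65, 70, 75, 80, 85, 90, 95, 100]
chains = [(qf1, qf2, qf3)
          for qf1, qf2 in itertools.product(prev_qfs, prev_qfs)
          for qf3 in (80, 90)]
```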

4 Experimental Results

To properly assess the performance of the proposed solution, a series of tests was conducted on the datasets described in the previous Section, in multiple compression scenarios. Four approaches were considered for comparison: Bianchi et al. [17], which is a milestone among analytical methods and closely related to the proposed approach; Galvan et al. [4] and Dalmia et al. [19], which achieve state-of-the-art results when QF1<QF2; and Niu et al. [28], which represents the state of the art among CNN-based methods, with the best results to date. It is worth noting that Niu et al. [28] uses a different trained neural model for each QF2 (80 and 90), while the proposed solution works for any QF2 with the same technique. Although [28] was designed to work in a more general scenario and the related CNN was trained considering also non-aligned double compression, it achieves the best results among CNN-based approaches also in the aligned scenario.

Regarding the implementations used to test the above-mentioned techniques: the publicly available (Footnote 3) Matlab implementation was employed for Bianchi et al. [17]; code from the ICVGIP-2016.RAR archive available on Dr. Manish Okade's website (Footnote 4) was employed for Dalmia et al. [19]; models and implementation available on Github (Footnote 5) were employed for Niu et al. [28]; finally, an implementation from scratch was employed for Galvan et al. [4].

Experiments were carried out for both standard and custom tables, all employing 64×64 patches extracted from the RAISE dataset [32]. As reported in Table 1 and Figs. 1, 2, 3 and 4, the proposed method almost always outperforms the state-of-the-art when the first quantization is performed with standard tables, while the results obtained on images employing Photoshop custom tables show a much greater gap in accuracy values (see Table 2 and Figs. 5, 6). The results on custom tables show better generalization capabilities w.r.t. [28] which, being CNN-based, appears to be dependent on the tables used for training.

Fig. 1

Average accuracy of the estimation for each DCT coefficient (first is DC) employing standard tables with QF1={55,60,65,75} and QF2=80. Plot shows results of our method, Bianchi et al. [17], Dalmia et al. [19], Galvan et al. [4] and Niu et al. [28]

Fig. 2

Average accuracy of the estimation for each DCT coefficient (first is DC) employing standard tables with QF1={60,65,75,80,85} and QF2=90. Plot shows results of our method, Bianchi et al. [17], Dalmia et al. [19], Galvan et al. [4] and Niu et al. [28]

Fig. 3

Average accuracy of the estimation for each DCT coefficient (first is DC) employing standard tables with QF1={55,60,65,75,80,85,90,95} and QF2=80. Plot shows results of our method, Bianchi et al. [17] and Niu et al. [28]

Fig. 4

Average accuracy of the estimation for each DCT coefficient (first is DC) employing standard tables with QF1={60,65,75,80,85,90,95} and QF2=90. Plot shows results of our method, Bianchi et al. [17] and Niu et al. [28]

Fig. 5

Average accuracy of the estimation for each DCT coefficient (first is DC) employing custom tables with QF2=80

Fig. 6

Average accuracy of the estimation for each DCT coefficient (first is DC) employing custom tables with QF2=90

Table 1 Accuracy obtained by the proposed approach compared to Bianchi et al. [17], Galvan et al. [4], Dalmia et al. [19] and Niu et al. [28] for different combinations of QF1/QF2, considering the standard quantization tables
Table 2 Accuracy obtained by the proposed approach compared to Bianchi et al. [17], Galvan et al. [4] and Niu et al. [28], employing custom tables for the first compression

Further tests have been performed to demonstrate the robustness of the proposed solution w.r.t. image content and acquisition conditions (e.g., different devices). Specifically, three datasets have been considered: Dresden [33], UCID [34] and BOSSBase [31]. The results reported in Tables 3–5 confirm the effectiveness of the proposed solution. The impact of the resolution/crop pair is evident when observing the results of a single dataset (Table 4), where each increase in crop size corresponds to an improvement in accuracy. At the same time, considering the same crop size across different datasets (64×64 in Tables 1, 3–5), the best results are obtained for crops extracted from the dataset with the lowest resolution. A d×d crop extracted from a high-resolution image contains less information than one extracted from a smaller image, delivering a flatter histogram that is more difficult to discriminate.

Table 3 Accuracy obtained by the proposed approach on the Dresden dataset [33] with different patch sizes and QF1/QF2
Table 4 Accuracy obtained by the proposed approach on the UCID dataset [34] with different patch sizes and QF1/QF2
Table 5 Accuracy obtained by the proposed approach on the BOSSBase dataset [31] with different patch sizes and QF1/QF2

A final test on double compressed images has been performed in a much more challenging scenario: a dataset of 500 full-size RAISE images was compressed twice by using the 1070 custom tables collected by Park et al. [35] (as described in Section 3). For this test, the parameter of the proposed approach was set to n=136, which is the maximum value among the first 15 coefficients of the 1070 quantization tables considered in this context. The results obtained, in terms of accuracy, are reported in Table 6 and definitively demonstrate the robustness of the technique even in a wild scenario of non-standard tables.

Table 6 Accuracy of the proposed approach using RAISE full-size images compressed with custom tables from Park et al. [35]

4.1 Experiments with Multiple Compressions

The hypothesis that only one compression was performed before the last one could be a strong limitation. Thus, a method able to extract information about previous quantization matrices in a multiple compression scenario may be a considerable contribution. For this reason, the proposed approach was tested in a triple JPEG compression scenario, where the new goal was the estimation of the quantization factors related to the second compression matrix. Figure 7 shows the accuracy obtained employing different crop sizes (64×64, 128×128, 256×256) on all the combinations QF1/QF2/QF3 with QF1/QF2∈{60,65,70,75,80,85,90,95,100} and QF3∈{80,90}, with the method predicting the first 15 quantization factors of Q2.

Fig. 7

Overall accuracy of the proposed method on JPEG triple compressed images when trying to estimate the Qm−1 quantization factors. The first row corresponds to patch sizes 64×64, 128×128, 256×256 with QF3=80 [(a), (b), (c)], while the second row corresponds to the same patch sizes with QF3=90 [(d), (e), (f)]

As shown in Fig. 7, the method in general achieves satisfactory results. Some limits are visible when the first compression is strong (low QF) and the second one has been performed with a high quality factor QF2∈{90,95,100}. By analyzing the results in these particular cases, it is worth noting that the method tends to estimate the factors of Qm−2 instead of Qm−1. Figure 8 shows the accuracies obtained in these cases (QF2∈{90,95,100}) considering as correct estimations the quantization factors related to Qm−1 (a), Qm−2 (b) and both (c). The results shown in (c) demonstrate that the method is able to return information about previous quantization factors (not only those of the (m−1)-th compression) even in this challenging scenario. Starting from this phenomenon, in order to discriminate whether a predicted factor qk belongs to Qm−2 or Qm−1, a simple test has been carried out on 100 triple compressed images with QF1=65, QF2=95 and QF3=90. Starting from the cropped image CI (see Section 2), we simulated, similarly to the double compression case of the proposed approach, all the possible triple compressions taking into account only two hypotheses (i.e., qk belongs to Q2 or to Q1) and considering a constant matrix built from qk as Q1 or Q2, respectively. The obtained simulated distributions were then compared with the real one through the χ2 distance (1). In this scenario, the proposed solution correctly estimated the Q1 quantization factors with an accuracy of 95.5%. Moreover, as a side effect of the triple compression, Q2 factors were also predicted with 76.6% accuracy.
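
The following sketch reflects one possible reading of this two-hypothesis test; the sweep over the unknown intermediate compression step, the function names, and the use of constant tables for both unknown steps are our assumptions, and the snippet reuses the hypothetical helpers introduced for Section 2 (extract_dct_histograms, chi2_distance).

```python
import io
from PIL import Image

def simulate_triple_compression(patch: Image.Image, q1_val: int, q2_val: int, q_m: list) -> bytes:
    """Compress the patch with constant tables q1_val and q2_val, then with the known q_m."""
    img = patch.convert("L")
    for table in ([q1_val] * 64, [q2_val] * 64, q_m):
        buf = io.BytesIO()
        img.save(buf, "JPEG", qtables=[table])
        buf.seek(0)
        img = Image.open(buf)
    return buf.getvalue()

def classify_factor_origin(patch, q_k, coeff, q_m, n, h_ref):
    """Decide whether the estimated factor q_k more plausibly belongs to Q1 or Q2:
    the hypothesis whose best simulation is closest to the real histogram wins."""
    best = {}
    for hypo in ("Q1", "Q2"):
        dists = []
        for j in range(1, n + 1):                       # sweep the unknown step (assumption)
            first, second = (q_k, j) if hypo == "Q1" else (j, q_k)
            data = simulate_triple_compression(patch, first, second, q_m)
            h_sim = extract_dct_histograms(data, coeff + 1)   # hypothetical helper
            dists.append(chi2_distance(h_sim[coeff], h_ref[coeff]))
        best[hypo] = min(dists)
    return min(best, key=best.get)
```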

Fig. 8

Overall accuracy of the proposed method on JPEG triple compressed images with high QF2 (90,95,98), patch size 256×256 and QF3=90, considering as ground truth (i.e., correct estimations) the quantization factors related to QF2 (a), QF1 (b) and both (c)

The insights found in the triple compression experiments were confirmed on images JPEG compressed 4 times (Fig. 9). Even in this scenario, if high QFs are employed in the third compression (e.g., 90, 95, 100), the Q2 factors are actually the ones being predicted, similarly to what was described before. Moreover, if both QF3 and QF2 are high, the Q1 elements can be estimated, confirming that the method obtains information about previous compressions in every case.

Fig. 9

Accuracy of the proposed method on JPEG 4-compressed images employing all the combinations QF1,QF2,QF3∈{60,65,70,75,80,85,90,95,100} and QF4=90, considering QF3 as ground truth (a). Further analyses have been conducted with QF3∈{90,95,100} (low accuracy regions): (b) and (c) show the results employing QF2 and QF1 as ground truth, respectively

The proposed method estimates the strongest previous compression, which is basically the behavior of most FQE methods. For this reason, a comparison was made with [28] on triple compressed images, considering Qm−1 as the correct estimation. Figure 10 reports the accuracy in the QF3=90 scenario, showing how our method (left graph) maintains good results even under triple compression, while [28] shows a significant performance drop compared to the double compression case.

Fig. 10

Accuracy of our method (left) and [28] (right) on JPEG triple compressed images employing all the combinations QF1,QF2∈{60,65,70,75,80,85,90,95,100} and QF3=90 considering QF2 as ground truth

4.2 Cross JPEG Validation

Recent works in the literature demonstrate that different JPEG implementations may employ different Discrete Cosine Transform implementations and mathematical operators to perform the floating-point to integer conversion of DCT coefficients [36].

In order to further validate the proposed method, a cross JPEG implementation test was conducted, using two different libraries (Pillow and libjpeg-turbo) and 2 DCT configurations (Footnote 6) to compress the input images, and Pillow to simulate the double compression described in the pipeline. The test was performed using the same 8156 RAISE images cropped to 64×64 and double compressed by means of the aforementioned JPEG implementations with QF1={60,65,70,75,80,85,90,95} and QF2=90. The results reported in Table 7 confirm the overall robustness of the proposed solution with respect to different JPEG implementations.

Table 7 Accuracy obtained employing different JPEG implementations with QF2=90

5 Conclusions

In this paper, a novel method for the estimation of previous quantization factors was proposed. The technique outperforms the state-of-the-art in the aligned double compressed JPEG scenario, specifically in the challenging cases where custom JPEG quantization tables are involved. The good results obtained even in the multiple compression scenarios (up to 4 compressions) highlight that previous compressions leave traces in the DCT coefficient distributions that can be exploited to estimate the corresponding quantization factors. Furthermore, the use of these distributions for previous quantization estimation keeps the proposed technique simple and its computational effort relatively low, avoiding extremely computation-hungry techniques while maintaining comparable accuracy. The strengths of the proposed method compared to machine learning approaches are its simplicity and the fact that it does not require a training set.