1 Introduction

The intensive use of images and, more generally, of multimedia content makes them crucial for investigative purposes but, at the same time, complicates investigations from several points of view: the large amount of data to process and analyse; the assessment of the authenticity of image content; the identification of the source that generated the image; and robustness to image manipulation and modification due to transmission, storage, posting on social networks, and so on. Photo Response Non-Uniformity pattern noise (PRNU in the sequel) is a camera fingerprint that can be used in several digital forensics procedures, as it allows a correspondence to be established between a device and the images it has acquired [2, 20, 24]. This kind of information can be useful, for example, in source camera identification, source camera verification, and image integrity and manipulation assessment [10, 11, 21, 29].

Specifically, PRNU is a noise component of the image caused by sensor imperfections. Due to its specific characteristics and its ability to uniquely identify the device, this feature has attracted great interest for addressing some critical issues in the image forensics field. Unfortunately, because of its nature, i.e. a noise component hidden in the image content, its extraction is not trivial. Several methods have been proposed in the literature in recent years, and they mainly depend on the application purpose. In fact, the working scenario can often change, requiring different operative procedures to obtain the final result with the expected/required accuracy [8, 12, 14, 21, 25, 36, 37]. The most common working scenarios are:

  1. given one or more images acquired by an unknown device and a set of devices, establish which of the available devices took the images;

  2. given one or more images acquired by an unknown device and a set of devices, establish if one of the available devices took the images;

  3. given a set of images, group them according to the corresponding source.

In some “relaxed” scenarios, it could be enough to identify the brand of the source device.

Source camera identification/verification methods are based on two main phases: i) PRNU extraction and ii) PRNU classification. The literature on this topic focuses on one of the two phases, or on both, depending on the final task. In the first case, denoising procedures are applied, as PRNU is considered a noise component whose contribution can be found in a proper residual image [22, 23, 27, 34]. In the second case, classification methods, mainly based on clustering techniques, are used [17, 19, 25]. In both cases, the adopted methods are required to be robust to PRNU extraction procedures as well as to data coming from different sources (notebook, social network [8], smartphone [37], etc.). As a matter of fact, in forensic investigations very little data is often available, which makes the source camera identification problem even more delicate. The pioneering work in this field is [24], where a model for the acquired image was proposed and discussed. Using this model, the extraction algorithm is derived by applying a proper denoising filter; classification is performed using the normalized correlation as similarity metric, and a proper statistical study is conducted to estimate the device-based thresholds used in the final identification—see Fig. 1. With regard to the denoising step, several denoisers have been proposed, including methods dealing with multiplicative noise [28]; similarly, several similarity metrics have been introduced for the classification step, even though correlation-based metrics have proven effective for this purpose. That is why normalized cross-correlation (NCC) is usually employed, even though many papers show the better performance provided by peak-to-correlation energy (PCE) [16, 38]. Interesting reviews of the current state of the art can be found in [1, 2, 23, 31, 35]. As mentioned above, the main problem in PRNU extraction is the denoising method.
In fact, residual edges (i.e., structured information that is not preserved in the denoised image) in the residual image (i.e., the difference between the noisy and the denoised image) can contribute to misclassifications or can alter the classification procedure. That is why some approaches try to estimate PRNU only on flat regions, while others apply a proper weighting in the denoising process that distinguishes between edges/textures and smooth regions [33, 39, 40]. The latter approaches show a considerable improvement in PRNU extraction even though they make it computationally intensive; this is the reason why faster denoising procedures are preferable. A different strategy has been proposed in [32] and [6]. Instead of refining and improving PRNU extraction procedures, the contribution of each pixel to the similarity metric is weighted according to its probability of being corrupted by sources other than the original PRNU. As in [32], in this paper the role of edges and textures in the denoising process is taken into account without neglecting the role of the enhancement processing that is applied whenever the reference PRNU for a given device is extracted from a set of images taken by it. This paper is thus an extension and generalization of [6], where the initial idea and preliminary results were presented. Specifically, the first two working scenarios previously mentioned are considered, and two feature vectors are defined for each candidate image. Each feature vector refers to a different averaging-based method for device PRNU estimation, and it is composed of three correlation values: one computed using the whole PRNU image; one computed using only the region containing edges; and one computed using only flat regions.
Due to the random nature of noise, it is expected that in the matching case (i.e., the analysed image has been acquired by the device under study) the two feature vectors are more correlated than in the non-matching case (i.e., the analysed image has not been taken with the device under study). Therefore, this paper studies the role played by the coherence between NCC values derived from different PRNU estimates in the source identification process. The proposed method has been extensively tested on different publicly available databases. Experimental results show that it improves the basic correlation-based source identification method, reaching and often outperforming the classification results provided by selected competing state-of-the-art methods. In particular, the proposed method seems to provide:

  • fewer ambiguities in the case of images acquired by different devices of the same model;

  • more robustness to reference PRNU estimation from natural images (NI) instead of flat-field (FF) images, i.e. images whose subject is a uniform and constant background;

  • robustness to PRNU estimation from images coming from social networks.

Fig. 1

Scheme of the source identification method in [24]. \(J_{1,d_{k}}, ..., J_{M_{d_{k}},d_{k}}\) are \(M_{d_{k}}\) images acquired by the device dk, whose PRNU is \(K_{d_{k}}\). The latter is estimated by suitably combining the residual images obtained by applying a denoiser to the \(M_{d_{k}}\) images. JC is the candidate image, i.e. the one whose source has to be found; R is the estimate of the PRNU of the device that took JC, extracted from JC by means of a denoising filter. The comparison between R and \(K_{d_{k}}\) allows us to establish whether dk is the source device for JC.

The remainder of the paper is organized as follows. The next section presents the proposed method and its theoretical and practical motivations. Section 3 presents experimental results, comparative studies and discussions. The last section draws conclusions and provides guidelines for future work.

2 The proposed method

PRNU extraction must be based on precise image modeling in which the role of the noise sources is defined; on the other hand, the model must not be too complicated, in order to keep PRNU estimation feasible. The common model adopted in the literature, after some simplifications and assumptions, is the following [10, 24]

$$ J(\mathbf{x}) = I(\mathbf{x}) + I(\mathbf{x}) K(\mathbf{x}) + N(\mathbf{x}), $$
(1)

where J is the acquired image, x is the pixel location, I is the original image content, K is the PRNU noise component, and N includes other noise sources that are independent of K. Hence, K uniquely identifies the device that took the image J; it is zero-mean and pointwise independent of I. From now on, the dependence on x will be omitted for simplicity.
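Model (1) is easy to instantiate synthetically; the sketch below (image size, intensity range and noise levels are arbitrary illustrative choices, not values from the paper) generates an image J from a fixed multiplicative fingerprint K:

```python
import numpy as np

# Synthetic acquisition following model (1): J = I + I*K + N,
# with K a fixed zero-mean multiplicative fingerprint and N
# independent additive noise.
rng = np.random.default_rng(0)
shape = (64, 64)
K = 0.01 * rng.normal(size=shape)
K -= K.mean()                          # PRNU is zero-mean by assumption
I = 100 + 50 * rng.random(shape)       # scene content
N = rng.normal(0, 1, shape)            # other noise sources, independent of K
J = I + I * K + N
print(J.shape)
```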

As can be observed, if a denoising filter F is applied to J, the residual image, i.e.

$$ R = J-F(J) , $$
(2)

preserves the K component.
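With a denoiser F available, (2) is a one-liner. In the sketch below a Gaussian filter stands in for the wavelet-based Mihcak filter used later in the paper, purely as a minimal placeholder denoiser:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def residual(J, sigma=1.0):
    """Residual image R = J - F(J) as in (2).

    A Gaussian filter is used here as a placeholder for the
    wavelet-based denoiser adopted in the paper."""
    J = J.astype(np.float64)
    return J - gaussian_filter(J, sigma)

# A flat image plus synthetic noise: the residual retains mostly
# the (approximately zero-mean) noise component.
rng = np.random.default_rng(0)
J = 128.0 + rng.normal(0, 2, (64, 64))
R = residual(J)
print(abs(R.mean()) < 1.0)
```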

By considering the first two scenarios described in the introduction, when several images (especially FF images) are available for each device, the reference pattern K of the device can be estimated by combining the single estimates of K derived from each available image. Hence, whenever another image from the same device is available, a high correlation is expected between its residual R and the reference pattern K of the device. That is why the normalized correlation is used to assess the similarity between K and R, i.e.

$$ \rho(R,K) = \frac{<(R-\bar{R}),(K-\bar{K})>}{\|(R-\bar{R})\| \| (K-\bar{K})\|}, $$
(3)

where < ⋅,⋅ > denotes the inner product and \(\bar {*}\) is the mean value of ∗.
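The normalized correlation (3) can be transcribed directly; below is a toy check on a synthetic fingerprint (all sizes and amplitudes are illustrative, not taken from the paper):

```python
import numpy as np

def ncc(R, K):
    """Normalized correlation rho(R, K) as in (3)."""
    r = R.ravel() - R.mean()
    k = K.ravel() - K.mean()
    return float(np.dot(r, k) / (np.linalg.norm(r) * np.linalg.norm(k)))

rng = np.random.default_rng(1)
K = rng.normal(size=(64, 64))                 # synthetic device fingerprint
R_match = 0.3 * K + rng.normal(size=K.shape)  # residual containing (part of) K
R_other = rng.normal(size=K.shape)            # residual from another device
print(ncc(R_match, K), ncc(R_other, K))
```

As expected, the matching residual yields a markedly higher correlation than the independent one.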

More precisely, denoting by \(J_{i,d_{j}}\) the i-th image acquired by the j-th device, by \(M_{d_{j}}\) the number of images available for that device and by \(R_{i,d_{j}}\) the corresponding residual image estimated as in (2), the reference PRNU for the device dj, namely \(K_{d_{j}}\), is

$$ K_{d_{j}} \approx \frac{1}{M_{d_{j}}}\underset{i}{\sum} R_{i,d_{j}}. $$
(4)

The previous equation refers to the ideal case, i.e. when \(R_{i,d_{j}} \approx I_{i,d_{j}} K_{d_{j}} + N_{i,d_{j}}\), \(M_{d_{j}}\) is large, \(K_{d_{j}}\) and \(I_{i,d_{j}}\) are independent and \(I_{i,d_{j}}\) resembles a flat-field image [2].

In order to better suppress possible error sources, a maximum likelihood estimate of the reference PRNU for the dj-th device [2, 24] can be derived as follows

$$ K_{d_{j}, MLE} \approx \frac{{\sum}_{i=1}^{M_{d_{j}}} R_{i,d_{j}} J_{i,d_{j}}}{{\sum}_{i=1}^{M_{d_{j}}} J_{i,d_{j}}^{2}}. $$
(5)

In this case [24], the following model for the residual image is considered: \(R_{i,d_{j}} \approx J_{i,d_{j}} K_{d_{j}} + {\varTheta }_{i,d_{j}}\), where \({\varTheta }_{i,d_{j}}\) denotes noise sources that are assumed to be independent of \(K_{d_{j}}\). This kind of estimate holds even in the case of natural (non-FF) images.
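Estimator (5) also translates directly into code; the following is a sanity-check sketch on synthetic residuals obeying the model R ≈ JK + Θ, not a forensic-grade implementation:

```python
import numpy as np

def prnu_mle(residuals, images):
    """Maximum-likelihood reference PRNU estimate as in (5):
    K = sum_i(R_i * J_i) / sum_i(J_i ** 2)."""
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(num)
    for R, J in zip(residuals, images):
        J = J.astype(np.float64)
        num += R * J
        den += J ** 2
    return num / den

# Synthetic check: residuals built as R = J*K + noise should let
# the estimator recover K up to the noise level.
rng = np.random.default_rng(2)
K_true = 0.01 * rng.normal(size=(32, 32))
imgs = [100 + 50 * rng.random((32, 32)) for _ in range(50)]
res = [J * K_true + rng.normal(0, 0.5, J.shape) for J in imgs]
K_hat = prnu_mle(res, imgs)
print(np.corrcoef(K_hat.ravel(), K_true.ravel())[0, 1])
```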

It is worth observing that, whenever FF images are considered for the single device, \(I_{i,d_{j}}\) is almost constant, i.e. \(I_{i,d_{j}} (\mathbf {x}) = C_{i,d_{j}} , \forall \mathbf {x}\). As a result,

$$J_{i,d_{j}} = C_{i,d_{j}} + C_{i,d_{j}}K + N_{i,d_{j}} $$

and \(K_{d_{j}}\) can be estimated directly from \(J_{i,d_{j}}\) as follows

$$ K_{d_{j},FF} = \frac{\bar{J} - \bar{C}}{\bar{C}}, $$
(6)

where \(\bar{J} = \frac{1}{M_{d_{j}}} {\sum}_{i} J_{i,d_{j}} = \bar{C} + \bar{C} K + \bar{N}\), \(\bar{C}\) is the mean value of the \(C_{i,d_{j}}\) and \(\bar{N} \approx 0\).
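The flat-field estimate (6) can be sketched as follows; here the constant level is approximated by the global mean of the averaged image, which relies on K being zero-mean (an assumption of the model, made explicit in the comment):

```python
import numpy as np

def prnu_ff(ff_images):
    """Reference PRNU from flat-field images as in (6):
    K = (mean(J) - C) / C, with C the constant content level.

    C is approximated by the global mean of the averaged image,
    which is valid because K is assumed zero-mean."""
    Jbar = np.mean(np.stack([J.astype(np.float64) for J in ff_images]), axis=0)
    C = Jbar.mean()
    return (Jbar - C) / C

# Synthetic flat fields J_i = C_i*(1 + K) + N_i.
rng = np.random.default_rng(3)
K_true = 0.01 * rng.normal(size=(32, 32))
K_true -= K_true.mean()
ffs = [C * (1 + K_true) + rng.normal(0, 0.2, K_true.shape)
       for C in 100 + 20 * rng.random(50)]
K_hat = prnu_ff(ffs)
print(np.corrcoef(K_hat.ravel(), K_true.ravel())[0, 1])
```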

Better or different variants of the aforementioned estimates of the reference PRNU can be considered in order to prevent possible denoising artifacts, errors in noise source modelling and assumptions, and so on. This kind of operation is commonly referred to in the literature as the PRNU enhancement process [2]. In this paper, we focus on the basic estimates described above, as we expect a certain amount of coherence between PRNU estimates, independently of the adopted (but consistent) procedure, as will become clearer in the sequel.

Despite the variety of distortion sources, the denoising procedure represents a crucial step, as the subsequent analyses are based on the residual image. The better the model hypotheses are met, the more consistent the PRNU estimate. In particular, the residual R has a noise component and a structural component, due to the fact that part of the edges and structures are smoothed in the denoising procedure and thus leave traces in the residual image; on the other hand, some noise component remains in the denoised image, so that R may contain only part of the PRNU image—see Fig. 2. As a result, without loss of generality, we can split the residual into two components as follows

$$ R = I_{S} + K_{N}, $$
(7)

where IS is the structural part still present in R, while KN is the PRNU component in R. The better the denoiser, the smaller IS and the closer KN is to K. This requirement is crucial especially for the single residual R (PRNU image) that has to be compared with the device PRNU (reference PRNU or reference pattern) in order to establish the origin of a given image (candidate image). One way to address this issue is to use only the image regions where the aforementioned statement holds true, i.e. smooth and almost flat regions (those that do not contain edges or textures) [34]; an alternative solution is to properly weight the similarity measure adopted in the classification process according to the local density of edges or textures [32]. Unfortunately, in the latter case some settings, such as the threshold to adopt and the best weighting function, remain open questions that can influence the final result. In any case, the selection of the denoiser, as well as of the K estimation procedure, can considerably change the final classification, as many papers in the literature have demonstrated [9, 22].

Fig. 2

Example of denoising artifacts in the residual image: top) candidate image; bottom) residual image computed as in (2)

2.1 Conditioning of cross-correlation

It is worth noticing that if R denotes the residual extracted from an image whose origin (device dj) has to be assessed, then, by the arguments used in [3] for the denoising problem, (7) and the independence between IS and KN provide

$$ \rho(R,K_{d_{k}}) = \frac{<(I_{S} + K_{N})-\overline{(I_{S} + K_{N})},(K_{d_{k}}-\bar{K}_{d_{k}})>}{\sigma_{R} \sigma_{K_{d_{k}}}} = \frac{\sigma_{K_{N},K_{d_{k}}} }{\sigma_{R} \sigma_{K_{d_{k}}}} \quad \forall k, $$
(8)

where \(\sigma _{K_{N},K_{d_{k}}}\) is the covariance between KN and \(K_{d_{k}}\), while σR and \(\sigma _{K_{d_{k}}}\) are the standard deviations of R and \(K_{d_{k}}\), respectively. This equation holds both in the matching case (j = k), i.e. whenever KN is part of \(K_{d_{j}}\), and in the non-matching case (j ≠ k), i.e. whenever KN is completely independent of \(K_{d_{j}}\). This way of writing ρ is interesting, as it clearly shows the two different error sources in source camera identification:

  • the denoiser, i.e. the term KN;

  • PRNU enhancement, i.e. the term \(K_{d_{k}}\).

In addition, it allows us to make a simple but crucial observation. Without loss of generality, we consider only the numerator of (8). Since the two terms are expected to be zero-mean, it corresponds to the inner product between KN and \(K_{d_{k}}\), i.e. \(<K_{N},K_{d_{k}}>\). It is straightforward to observe that the inner product between two vectors is badly conditioned if the two vectors are orthogonal, while it is well conditioned if the two vectors are linearly dependent. In fact, denoting by p = < y,x >= yTx the inner product between the two vectors x and y, it follows that

$$\frac{|\delta p|}{|p|} \leq \frac{\|y^{T}\| \|x\|}{|y^{T} x|} \frac{\|\delta x\|}{\|x\|},$$

where δp is the absolute error in p caused by the absolute error δx in the vector x. The quantity \(\frac {\|y^{T}\| \|x\|}{|y^{T} x|}\) plays the role of the condition number for the computation of p when y is fixed, and it is exactly the inverse of the cosine of the angle between y and x. As a result, with reference to the numerator of (8), if denoising is accurate and j = k, i.e. \(K_{N} \sim K_{d_{j}}\), then the problem is well conditioned; on the contrary, if denoising is accurate but j ≠ k, the problem is badly conditioned, as KN is expected to share nothing with \(K_{d_{k}}\). This property still holds whenever the reference PRNU for a given device is estimated using different but consistent estimation strategies. As a result, independently of the way K is estimated, we expect the computation of \(\sigma _{R,K_{d_{j}}}\) (matching case) to be better conditioned and more stable than that of \(\sigma _{R,K_{d_{k}}}\) (non-matching case).
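This conditioning argument can be checked numerically on synthetic vectors (the 1/|cos| factor below is the bound derived above; vector sizes and noise levels are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
y = rng.normal(size=n)                   # plays the role of K_dk

def cond(y, x):
    # ||y|| ||x|| / |<y, x>| = 1 / |cos(angle between y and x)|:
    # the amplification factor for relative errors in <y, x>
    return float(np.linalg.norm(y) * np.linalg.norm(x) / abs(np.dot(y, x)))

x_match = y + 0.1 * rng.normal(size=n)   # K_N shares content with K_dk
x_nomatch = rng.normal(size=n)           # K_N independent of K_dk
print(cond(y, x_match), cond(y, x_nomatch))
```

The matching case yields a condition number close to 1, while the near-orthogonal (no-match) case yields a much larger one.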

In addition, assuming the estimate of \(K_{d_{k}}\) is accurate enough, if j = k then \(K_{d_{j}}\) is contained in the residual image R; since R = IS + KN, we can write \(K_{d_{j}} = K_{N} + {K_{N}^{c}}\) and then

$$ \begin{array}{@{}rcl@{}} \rho(R,K_{d_{k}}) &=& \rho(R,K_{d_{j}}) = \frac{\sigma_{K_{N},K_{d_{j}}}}{\sigma_{R} \sigma_{K_{d_{j}}}} = \frac{\sigma_{K_{N}}^{2} + \sigma_{K_{N},{K_{N}^{c}}}}{\sigma_{R} \sigma_{K_{d_{j}}}} \\ &=&\frac{\sigma_{K_{N}}^{2} + \sigma_{K_{N},{K_{N}^{c}}}}{\sqrt{\sigma^{2}_{I_{S}} \sigma^{2}_{K_{d_{j}}} + (\sigma^{2}_{K_{N}} + \sigma_{K_{N},{K_{N}^{c}}} )^{2} + \sigma^{2}_{K_{N}}\sigma^{2}_{{K_{N}^{c}}} - \sigma^{2}_{K_{N},{K_{N}^{c}}} }}\\ &=&\frac{1}{\sqrt{1+\frac{\sigma^{2}_{I_{S}} \sigma^{2}_{K_{d_{j}}} + \sigma^{2}_{K_{N}}\sigma^{2}_{{K_{N}^{c}}} - \sigma^{2}_{K_{N},{K_{N}^{c}}} }{(\sigma_{K_{N}}^{2} + \sigma_{K_{N},{K_{N}^{c}}})^{2}}}}, \end{array} $$

where \(\sigma ^{2}_{K_{N}}\sigma ^{2}_{{K_{N}^{c}}} - \sigma ^{2}_{K_{N},{K_{N}^{c}}} \geq 0\) from the Cauchy-Schwarz inequality.

Hence, in the case of a perfect denoiser, \(\sigma ^{2}_{I_{S}} = 0\) and \({K_{N}^{c}}= 0\), so that ρ approaches 1. Even though this is not feasible in real situations, we expect it to be nearly true in correspondence with flat regions. In this case, since the contribution of IS should be minor (as well as that of \({K_{N}^{c}}\)), the argument of the square root in the last line of the previous equation is close to 1. On the contrary, for textured/edge regions we expect a greater contribution from IS as well as from \({K_{N}^{c}}\); the argument of the square root is then much greater than 1, so that the value of ρ decreases. As a result, in the matching case we are able to predict, in some sense, the behaviour of ρ when estimated in specific regions of the image, i.e. with or without edges or textures. On the contrary, in the non-matching case (j ≠ k) we cannot say anything more about \(\sigma _{K_{N},K_{d_{k}}}\), except that we expect values close to zero in all image regions.

2.2 The proposed source identification method

The aforementioned observations further and more formally motivate the preliminary work presented in [6]. Specifically, in the source camera identification problem it can be advantageous to exploit the fact that if an image is acquired by a given sensor, then with high probability we are able to measure this match with almost all consistent estimates of the reference pattern (Fig. 3); on the contrary, if the image comes from another device, we expect more variable and less predictable correlation values when different estimates of the reference pattern are considered. In addition, as mentioned in the previous section, in the matching case the relation between the ρ values computed on flat regions and on textured regions is expected to be almost insensitive to different but coherent estimates of the reference PRNU. As a result, it is convenient to exploit this coherence in the identification process. In particular, in this work the coherence between ρ values computed on the whole image, on flat image regions and on textured/edge regions has been analysed and adopted for source identification purposes—see Fig. 4.

Fig. 3

Left column) Two different images from Dresden database; Middle column) \(\rho (R,K_{d_{k}})\) versus device number k computed for \(K_{d_{k}}\) estimated as in (6); Right column) \(\rho (R,K_{d_{k}})\) versus device number k computed for \(K_{d_{k}}\) estimated as in (5). Even though the reference pattern is differently estimated, in both cases the maximum value is in correspondence to the device that took the image

Fig. 4

Left column) Match case j = k. \(\rho (R,K_{d_{j}})\) value evaluated for i) the whole image residual R, ii) only the flat regions, iii) only the textured regions. Two different estimates of \(R_{d_{j}}\) have been considered. Each row refers to a different candidate image. Right column) No-match case j ≠ k. \(\rho (R,K_{d_{k}})\) values for the same images when compared with the reference PRNU of a different source. Measures in the left column are more coherent than those in the right column

More formally, let \(K_{d_{k},1}\) and \(K_{d_{k},2}\) be two different estimates of \(K_{d_{k}}\) for a fixed device dk and let {Pi}i= 1,2 be the feature vectors computed with respect to the i-th estimate of the camera fingerprint \(K_{d_{k}}\). Denoting by J the candidate image, Pi is the three-component vector whose components are described below:

  1. \(\rho (R,K_{d_{k},i})\), i.e. the correlation between the candidate image residual R and the i-th estimate of the camera fingerprint \(K_{d_{k}}\);

  2. \(\rho (R_{flat},K_{d_{k},i,flat})\), i.e. the correlation between the candidate image residual R restricted to the flat regions of J and \(K_{d_{k},i}\) restricted to the same regions;

  3. \(\rho (R_{edge},K_{d_{k},i,edge})\), i.e. the correlation between the candidate image residual R restricted to the edge regions of J and \(K_{d_{k},i}\) restricted to the same regions.

Independently of the inner dependencies between the similarity metric values evaluated in the image subregions, we expect these dependencies to be better preserved in the matching case whenever the estimate of \(K_{d_{k}}\) changes slightly.

The coherence between the two feature vectors is measured by means of their inner product, i.e.

$$ \tau_{J,K_{d_{k}}} = <P_{1},P_{2}>. $$
(9)

The larger \(\tau _{J,K_{d_{k}}}\), the higher the coherence between the normalized correlations evaluated in different image regions with respect to different estimates of the camera fingerprint, and thus the higher the probability that J comes from dk—see Fig. 5.
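A compact sketch of the two feature vectors and the coherence measure (9) on synthetic data follows; the edge mask here is an arbitrary half-image split (in the method it comes from an edge detector), and all amplitudes are illustrative:

```python
import numpy as np

def ncc(a, b):
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def feature_vector(R, K_i, edge_mask):
    """P_i = [rho(whole), rho(edges only), rho(flat only)] for one
    fingerprint estimate K_i; edge_mask is a boolean edge/texture map."""
    flat = ~edge_mask
    return np.array([ncc(R, K_i),
                     ncc(R[edge_mask], K_i[edge_mask]),
                     ncc(R[flat], K_i[flat])])

def coherence(R, K1, K2, edge_mask):
    """tau = <P1, P2> as in (9): large when the three correlations
    agree across the two fingerprint estimates."""
    return float(np.dot(feature_vector(R, K1, edge_mask),
                        feature_vector(R, K2, edge_mask)))

rng = np.random.default_rng(5)
K = rng.normal(size=(64, 64))                        # true fingerprint
K1 = K + 0.2 * rng.normal(size=K.shape)              # two noisy estimates
K2 = K + 0.2 * rng.normal(size=K.shape)
mask = np.zeros((64, 64), bool)
mask[:, :32] = True                                  # arbitrary region split
R_match = 0.5 * K + rng.normal(size=K.shape)
R_other = rng.normal(size=K.shape)
print(coherence(R_match, K1, K2, mask), coherence(R_other, K1, K2, mask))
```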

Fig. 5

\(\tau _{J,K_{d_{k}}}\) computed with respect to k = 1,...,21 for 25 different images acquired by d6 (top left), d11 (top right), d13 (bottom left) and d16 (bottom right)

2.3 The algorithm

The source camera identification algorithm is summarized below.

  1. For each device dk in the database, estimate the reference pattern \(K_{d_{k}}\) using the first predefined estimation mode and let \(K_{d_{k},1}\) denote the resulting estimate;

  2. For each device dk in the database, estimate the reference pattern \(K_{d_{k}}\) using the second predefined estimation mode and let \(K_{d_{k},2}\) denote the resulting estimate;

  3. For each candidate image J,

    • apply a predefined denoising filter and estimate the residual image R as in (2);

    • apply an edge detection filter for extracting edges/textured regions; extract flat regions as edges/textures complementary regions;

    • compute \(P_{1} = [\rho (R,K_{d_{k},1}),\rho (R_{edge},K_{d_{k},1,edge}),\rho (R_{flat},K_{d_{k},1,flat})]\) and \(P_{2} = [\rho (R,K_{d_{k},2}),\rho (R_{edge},K_{d_{k},2,edge}),\rho (R_{flat},K_{d_{k},2,flat})]\);

    • compute \(\tau _{J,K_{d_{k}}}\) as in (9);

    • compare \(\tau _{J,K_{d_{k}}}\) with a predefined threshold value: if \(\tau _{J,K_{d_{k}}}\) exceeds the threshold, then the image has been acquired by the device dk with high probability.
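The steps above can be condensed into a toy end-to-end sketch. This is an illustration, not the paper's implementation: a Gaussian filter replaces the Mihcak wavelet denoiser, a simple gradient threshold replaces the Canny detector, and all parameter values are placeholders:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, binary_dilation

def ncc(a, b):
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def edge_mask(J, thresh=10.0, dilate=7):
    # crude stand-in for the Canny step: threshold the gradient
    # magnitude, then dilate the edge map (window size 7 as in Sec. 3)
    gy, gx = np.gradient(J.astype(np.float64))
    mask = np.hypot(gx, gy) > thresh
    return binary_dilation(mask, np.ones((dilate, dilate), bool))

def tau(J, K1, K2, sigma=1.0):
    # step 3: residual (2), region split, feature vectors P1, P2, and (9)
    R = J - gaussian_filter(J.astype(np.float64), sigma)
    m = edge_mask(J)
    f = ~m
    P1, P2 = (np.array([ncc(R, K), ncc(R[m], K[m]), ncc(R[f], K[f])])
              for K in (K1, K2))
    return float(np.dot(P1, P2))
```

A candidate image is then attributed to device dk when tau exceeds a predefined threshold.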

Remark

PRNU extraction requires the use of a denoising filter and, as pointed out in the previous sections, this filter must be sufficiently accurate. In this paper we do not focus on the denoising filter, and the widely used wavelet-based Mihcak filter [26], as suggested in the pioneering paper [24], has been adopted in the experimental results. As a matter of fact, as also shown in the literature, different and better-performing methods could be used, for example non-local methods [4, 5, 7, 13], but they may turn out to be too computationally expensive. The wavelet-based denoising method represents a good trade-off between accuracy and required computational effort.

2.4 Weighted coherence measure

As mentioned in the previous section, the inner product defined in (9) conveys information concerning the relation between the candidate image and a given device. This is the reason why, in this paper, the inner product has been considered as a discriminative measure in the source identification process. In order to further investigate this index, the following corrections to \(\tau _{J,K_{d_{k}}}\) have also been considered

  1. \(w(J,K_{d_{k}}) = \rho (R,K_{d_{k},1}) \tau _{J,K_{d_{k}}}\);

  2. \(v(J,K_{d_{k}}) = \rho (R_{flat},K_{d_{k},1,flat}) \tau _{J,K_{d_{k}}} \).

In the first case, the classical correlation measure (the basic algorithm in [24]) is used as a corrective term for the proposed coherence measure; in the second case, the classical correlation measure restricted to flat image regions is selected as the corrective term. In the latter case, better results are expected, since measures on flat regions should be more accurate, being less affected by denoising artifacts.
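The two corrective terms can be sketched as a small helper; the numeric vectors below are hypothetical correlation values, used only for illustration:

```python
import numpy as np

def weighted_coherence(P1, P2, mode="whole"):
    """Corrective weighting of tau = <P1, P2>:
    w = rho(whole) * tau  (mode="whole"),
    v = rho(flat)  * tau  (mode="flat"),
    with P_i = [rho(whole), rho(edges), rho(flat)]."""
    tau = float(np.dot(P1, P2))
    return float(P1[0] * tau) if mode == "whole" else float(P1[2] * tau)

# Hypothetical match-case correlation values (not measured data).
P1 = np.array([0.41, 0.12, 0.55])
P2 = np.array([0.38, 0.10, 0.52])
print(weighted_coherence(P1, P2, "whole"), weighted_coherence(P1, P2, "flat"))
```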

3 Experimental results

The proposed method has been tested on publicly available forensic databases: the Dresden Image Database [18] and the Vision database [30]. The former includes hundreds of images (natural and flat-field) captured by several camera models and devices—a subset of uncompressed images and devices, listed in Tables 1 and 2, has been used in our tests. The latter is composed of videos and images, both in native format and in their social versions (Facebook, YouTube and WhatsApp are considered), from 35 portable devices of 11 major brands. In this paper, Facebook images have been analysed, as listed in Table 3.

Table 1 Selected images and devices from Dresden database
Table 2 Selected images and devices from Dresden database
Table 3 Selected images and devices from Vision database

For the estimation of camera fingerprint, i.e. the reference PRNU \(K_{d_{k}}\), we use

$$ K_{d_{k},1} = \frac{1}{M_{d_{k}}}\sum\limits_{i=1}^{M_{d_{k}}} H_{i,d_{k}} $$
(10)

where \( H_{i,d_{k}}(\mathbf {x}) = J_{i,d_{k}}(\mathbf {x}) - C_{i,x_{2}} - C_{i,x_{1}}\), with \(C_{i,x_{2}} = \frac {1}{Nrows} {\sum }_{x_{1}} J_{i,d_{k}}(\mathbf {x}) \) and \(C_{i,x_{1}} = \frac {1}{Ncols} {\sum }_{x_{2}} (J_{i,d_{k}}(\mathbf {x}) - C_{i,x_{2}}) \), in agreement with [24]; for \(K_{d_{k},2}\), a different equation is used according to the available images, in order to be more consistent with the image model. Specifically, if FF images are available, \(K_{d_{k},2}\) is set equal to (6), as it refers to constant images. On the contrary, if only NI are available, \(K_{d_{k},2}\) is set equal to (5), as it is more robust to possible distortions introduced during the denoising step.

For edge extraction, the standard Canny edge detection algorithm has been selected with Matlab default parameters, and a dilation with window size equal to 7 has been applied to the output edge map. Also in this case, a classical edge detector has been considered for simplicity. The size of the dilation window has been empirically set as the one providing the best results on average. As a matter of fact, the dilation factor should be set according to the candidate image content. However, the estimation of the best dilation parameter would require additional computational effort and is out of the scope of this paper.
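The effect of the 7×7 dilation window on a binary edge map can be illustrated with scipy's binary_dilation (standing in here for the Matlab call used in the experiments):

```python
import numpy as np
from scipy.ndimage import binary_dilation

# A single edge pixel grows to a full 7x7 block under dilation with
# a 7x7 structuring element, widening the band excluded from the
# flat-region mask around each detected edge.
edges = np.zeros((9, 9), bool)
edges[4, 4] = True
grown = binary_dilation(edges, np.ones((7, 7), bool))
print(int(grown.sum()))  # 49
```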

For comparative studies, the pioneering reference method in [24] (basic method) and the one in [32] have been considered. The former has been selected since we aim to measure to what extent the proposed method improves the original work; in other words, we are interested in quantifying the benefit of using the proposed coherence measure as an alternative to the absolute correlation (plain mode) or as a corrective term for it (weighted mode). It is worth observing that the plain mode corresponds to the preliminary work presented in [6]. The method in [32] has been selected as it shares the same strategy, i.e. weighting the correlation measure according to image regions. However, the method in [32] defines the weight as the local density of textured/edge regions, whereas the proposed method focuses on the inner dependencies of the correlation metric between more or less textured image regions rather than on their contribution to the similarity metric. The results have been compared using standard classification indices, such as specificity, sensitivity, precision, F1-score and accuracy [15].
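The reported indices all derive from the confusion-matrix counts; a minimal helper (the counts below are made up for illustration):

```python
def indices(tp, tn, fp, fn):
    """Standard classification indices from confusion-matrix counts."""
    prec = tp / (tp + fp)
    sens = tp / (tp + fn)              # sensitivity = true positive rate
    spec = tn / (tn + fp)
    f1 = 2 * prec * sens / (prec + sens)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return prec, sens, spec, f1, acc

print(indices(90, 80, 20, 10))
```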

The first test compares the proposed method, i.e. the inner product \(\tau _{J,K_{d_{k}}}\) and its weighted version \(w(J,K_{d_{k}})\), with the basic algorithm, i.e. \(\rho (R,K_{d_{k},1})\), at a fixed sensitivity value, i.e. with the same number of true positives (and hence of false negatives) for all methods. As a result, in this case a unique threshold is used for all candidate images and devices, fixed as the one yielding the predefined sensitivity value; consequently, a different threshold has been used for each method. As can be observed in Tables 4 and 5, both \(\tau _{J,K_{d_{k}}}\) and \(w(J,K_{d_{k}})\) outperform the basic \(\rho (R,K_{d_{k},1})\) in terms of a reduced number of false positives. This means that the inner product provides a reduced number of false positive assignments in the second working scenario (the image source device may not be in the available set of devices), and the weighted inner product further improves this result. It is also worth observing that the same considerations hold whether the device PRNU is estimated from FF images or from NI.

Table 4 Results in terms of number of true positives (TP), true negatives (TN), false negatives (FN), false positives (FP), precision (Prec), F1 score (F1), specificity (Spec) and accuracy (Acc) provided by the proposed method (inner product, \(\tau _{J,K_{d_{k}}}\)), its weighted version (\(w(J,K_{d_{k}})\)) and the basic algorithm in [24] (\(\rho (R,K_{d_{k},1})\)) at three fixed sensitivity values
Table 5 Results in terms of number of true positives (TP), true negatives (TN), false negatives (FN), false positives (FP), precision (Prec), F1 score (F1), specificity (Spec) and accuracy (Acc) provided by the proposed method (inner product, \(\tau _{J,K_{d_{k}}}\)), its weighted version (\(w(J,K_{d_{k}}))\) and the basic algorithm in [24] (\(\rho (R,K_{d_{k},1})\)) at three fixed sensitivity values

In order to stress this point, the ROC (sensitivity vs. 1 − specificity) curves of the three methods are depicted in Figs. 6 and 7. As can be observed, the proposed procedure considerably improves the basic method in terms of True Positive Rate (TPR = sensitivity), especially in correspondence with high specificity values, i.e. low False Positive Rate (FPR = 1 − specificity). In fact, as also outlined in [6], the use of the inner product allows us to reduce the number of false positive assignments.

Fig. 6

Reference PRNU from FF images. Left) ROC curve of the basic algorithm compared with the one of the inner product based algorithm. Right) ROC curve of the basic algorithm compared with the one of the weighted inner product based algorithm

Fig. 7

Reference PRNU from NI images. Left) ROC curve of the basic algorithm compared with the one of the inner product based algorithm. Right) ROC curve of the basic algorithm compared with the one of the weighted inner product based algorithm

It is worth observing that the same conclusions are reached if all candidate images are analysed and a single decision threshold is applied to each metric for source identification purposes. More precisely, two different thresholds have been selected as follows:

  • the one corresponding to the Mth highest metric value, with M equal to the total number of candidate images used in the test;

  • the one selecting the top 5% of metric values (the right tail of the metric value distribution).
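The two threshold choices above can be sketched on the pooled metric values; the code below is illustrative only (names are assumptions, not from the paper):

```python
import numpy as np

def candidate_thresholds(metric_values, n_candidates):
    """Two empirical thresholds on the pooled metric values: the M-th
    highest value, with M the number of candidate images, and the value
    cutting the top 5% of the distribution's right tail."""
    v = np.sort(np.asarray(metric_values, dtype=float))[::-1]  # decreasing rearrangement
    thr_m = v[n_candidates - 1]                                # M-th highest value
    thr_5pct = v[int(np.ceil(0.05 * len(v))) - 1]              # top-5% cut
    return thr_m, thr_5pct
```

Both rules place the threshold near the knee of the sorted-value curve, which is why they land close to the separation point discussed next.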

As Fig. 8 shows, the two thresholds are close to the optimal separation point of the monotonically decreasing rearrangement of metric values, i.e. the point that separates the distribution into two groups having different characteristics. Quantitative results are reported in Tables 6 and 7. As it can be observed, the inner product based method performs better than the basic cross-correlation one at this optimal point.

Fig. 8 Top) Sorted cross-correlation (left) and inner product (right) evaluated for all the candidate images listed in Table 1 when compared with all the devices listed in the same table. Bottom) The same plots restricted to the first 1100 values. The marker indicates the threshold value corresponding to 5% of the whole set of metric evaluations and to the first 525 values. Device PRNU has been estimated using FF

Table 6 Results in terms of number of true positives (TP), true negatives (TN), false negatives (FN), false positives (FP), precision (Prec), sensitivity (Sens), F1 score (F1), specificity (Spec) and accuracy (Acc) provided by the proposed method (inner product, \(\tau _{J,K_{d_{k}}}\)) and the basic algorithm in [24] (\(\rho (R,K_{d_{k},1})\)) at the threshold level corresponding to 5% of the distribution and the one corresponding to the number of candidate images
Table 7 Results in terms of number of true positives (TP), true negatives (TN), false negatives (FN), false positives (FP), precision (Prec), sensitivity (Sens), F1 score (F1), specificity (Spec) and accuracy (Acc) provided by the proposed method (inner product, \(\tau _{J,K_{d_{k}}}\) ) and the basic algorithm in [24] (\(\rho (R,K_{d_{k},1})\)) at the threshold level corresponding to 5% of the distribution and the one corresponding to the number of candidate images

The previous test shows that the inner product conveys information concerning the image source, while the global weighting procedure improves the classification results provided by the basic algorithm. To further confirm this, the value \(\rho (R_{flat},K_{d_{k},1,flat})\) has been considered as weighting coefficient for the inner product, as it is expected to be more accurate than the global correlation value. The result is also compared with the method in [32], where a pointwise weighted correlation is employed for classification purposes. The methods are compared in Table 8, where fixed specificity values have been considered.
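A minimal sketch of the two ingredients discussed here, assuming the weight simply scales the inner product value by the flat-region correlation (the exact combination rule used in the paper may differ, and all names below are illustrative):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between a residual and a fingerprint
    (or between their restrictions to the flat regions)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def weighted_metric(tau, rho_flat):
    """Global weighting: the inner product value `tau` for a candidate/device
    pair is scaled by the correlation `rho_flat` computed on flat regions only.
    This combination rule is an assumption for illustration."""
    return rho_flat * tau
```

The idea is that a reliable weight (here, the flat-region correlation) reinforces pairs where both metrics agree and damps spurious high inner product values, reducing false assignments.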

Table 8 Comparisons in terms of sensitivity for fixed specificity values: 0.990, 0.994 and 0.995

As it can be observed, the weighting method always guarantees an improvement, and the more reliable the weight, the greater the improvement. It is also worth observing that the proposed global weighting procedure reaches results comparable to those of the local weighting procedure in [32]. However, the main contribution of the proposed inner product consists in better addressing the separation problem for devices for which the correlation measure is noisier, such as the Olympus devices in the Dresden database. As Fig. 9 shows, there is no threshold that allows each candidate image to be assigned to the analysed device, so many false assignments can occur. However, the inner product turns out to be less noisy than the cross-correlation. By repeating the previous comparative test device per device, with results reported in Table 9, it is evident that the inner product greatly improves on the basic algorithm.

Fig. 9 1st row) \(\rho (R,K_{d_{k},1}), \quad k=1,...,21\) computed for 25 candidate images acquired by Olympus Device 1 (x-indices 1–25), Olympus Device 2 (x-indices 26–50) and Olympus Device 3 (x-indices 51–75). 2nd row) \(\tau _{J,K_{d_{k},1}}, \quad k=1,...,21\) computed for the same candidate images. 3rd row) Same values as in the first row but restricted to the first 5 devices of the same Olympus model. 4th row) Same values as in the second row but restricted to the first 5 devices of the same Olympus model

Table 9 First three devices for Olympus. Comparisons in terms of sensitivity for fixed specificity values: 0.990, 0.994 and 0.998

The same considerations hold if NI are used for device PRNU estimation instead of FF, as shown in Tables 10 and 11, confirming the robustness of the proposed method to less precise estimations of the source camera PRNU. In order to stress this fact, Table 10 also contains the values of \(\rho (R_{flat},K_{d_{k},1,flat})\), i.e. the normalized correlation restricted to the flat regions of the candidate image. As it can be observed, in this case the correction provided by the inner product increases the discrimination power of the metric, making it more robust to the different error sources, especially at low false positive rates.

Table 10 Comparisons in terms of sensitivity for fixed specificity values: 0.990, 0.994, 0.995, 0.998, 0.999 and 0.9995
Table 11 First three devices for Olympus

The proposed procedure also proved robust to the analysis of candidate images that have been downloaded from a social network, as shown in Table 12. In this case, the proposed index improves on the basic one whenever it is used as a corrective term for the basic measure.

Table 12 Vision database

With regard to the first scenario, i.e. the set of devices contains the one that took the image, the benefit of using the weighted inner product is evident. In this case, for the candidate image J the aim is to have

$$j = \arg \max _{k} f(R,K_{d_{k}})$$

with \(d_{j}\) the device that took the image, R the residual image associated with J and f the adopted source identification metric, i.e. \(\rho (R,K_{d_{k}})\), \(\tau _{J,K_{d_{k}}}\), \(w(J,K_{d_{k}})\) and \(v(J,K_{d_{k}})\) respectively for the basic method [24], the proposed inner product and its two weighted versions. As Tables 13 and 14 show, the four metrics provide comparable results and reach a 100% acceptance rate for most of the candidate images in the database. However, it is worth observing that the basic and inner product metrics benefit from the weighting operations in resolving some ambiguities and instabilities for critical brands such as Olympus and Panasonic. In particular, the weighting operation increases the number of correct assignments.
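The assignment rule above amounts to a row-wise argmax over a matrix of metric values; a minimal sketch, assuming metric_matrix[j, k] holds \(f(R,K_{d_{k}})\) for image j and device k (an illustrative layout, not from the paper):

```python
import numpy as np

def identify_source(metric_matrix):
    """First working scenario: the matching device is assumed to be in the
    set, so each candidate image is assigned to the device maximizing the
    chosen metric.  Rows index candidate images, columns index devices."""
    return np.argmax(np.asarray(metric_matrix, dtype=float), axis=1)
```

Because only the relative ordering within each row matters here, any metric that ranks the true device first gives the same assignment; the weighting helps precisely when two devices produce nearly tied metric values.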

Table 13 Global source identification results (left) for the basic method, the inner product based method, the weighted inner product with the basic method and the weighted inner product restricted to flat regions, in the second working scenario and for the brands listed in the rightmost columns
Table 14 Confusion matrix for brand classification in the second working scenario for the basic method, the inner product based method, the weighted inner product with the basic method and the weighted inner product restricted to flat regions

Finally, with regard to the computational effort, the proposed method inherits the properties of the basic algorithm in [24]. As a result, the most expensive procedure is the denoising process; the remaining operations, i.e. the enhancement methods for reference PRNU estimation and the selective correlation values, are inexpensive, real-time and user independent.

4 Conclusion

In this paper the source camera identification problem has been addressed. Even though denoising and enhancement procedures play a crucial role in the whole identification process, the role of the metric used for source assessment is not negligible, especially in classification procedures. This paper focused on this task. Specifically, the coherence between metric values computed in different but specific regions of the image has been considered and its dependence on camera fingerprint estimation has been studied. The main result is the use of the inner product as a measure of this coherence and the observation that this quantity is better conditioned whenever the analysed candidate image has been acquired by the reference device. The proposed coherence measure has been used both as an absolute metric for the source identification process and as a corrective term for basic existing methods. Experimental results show that, even in its preliminary version, the proposed coherence contributes to improving the identification process, especially by decreasing the number of false assignments. In addition, it shows some robustness to PRNU estimation from natural images and to candidate images coming from social networks. Future research will further investigate this coherence, with particular reference to its dependence on each single component of the whole identification process, such as the denoising method, PRNU estimation mode, similarity metric and image region extraction. Finally, a more intensive study concerning its dependence on image manipulation will also be one of the topics of future work.