Chapters 3 and 4 have shown that PRNU is a very effective solution for source camera verification, i.e., linking a visual object to its source camera (Lukas et al. 2006). However, verification requires a suspect camera to be in the possession of a forensic analyst, which is often not the case. In the source camera identification problem, on the other hand, the goal is to match a PRNU fingerprint against a large database. This capability is needed when one or more images from an unknown camera must be attributed to images in a large repository in order to find other images that may have been taken by the same camera. For example, consider that a legal entity acquires illegal visual objects (such as child pornography) without any suspect camera available. Consider also that the owner of the camera shares images or videos on social media such as Facebook, Flickr, or YouTube. In such a scenario, it becomes crucial to link the illegal content to those social media accounts. Given that these social media platforms contain billions of accounts, camera identification at large scale becomes a very challenging task.

This chapter will focus on how databases that support camera identification can be created and on the methodologies for searching query images in those databases. The time complexities, strengths, and drawbacks of the different methods will also be presented.

1 Introduction

As established in Chaps. 3 and 4, once a fingerprint, K, is obtained from a camera, the PRNU noise obtained from other query visual objects can be tested against the fingerprint to determine if they are captured by the same camera. Each of these comparisons is a verification task (i.e., 1-to-1).

On the other hand, in the identification task, the source of a query image or video is searched within a fingerprint collection. The collection could be compiled from social media such as Facebook, Flickr, and YouTube. Visual objects from each camera in such a collection can be clustered together, and their fingerprints extracted to create a known camera fingerprint database. This chapter will discuss identification on a large scale which can be seen as a sequence of multiple verification tasks (i.e., 1-to-many). Specifically, we will focus on structuring large databases of images or camera fingerprints and the search efficiency of different methods using such databases.

Notice that search can mainly be done in two ways: querying one or more images from a camera against a database of camera fingerprints or querying a fingerprint against a database of images. This chapter will focus on the first case where we will assume there is only a single query image whose source camera is under question. However, the workflow is the same for the latter case as well.

Over the past decade, various approaches have been proposed to speed up camera identification using large databases of camera fingerprints. These methods can be grouped under two categories: (i) techniques for decreasing the cost of pairwise comparisons and (ii) techniques for decreasing the number of comparisons made. A third category aims at combining the strengths of the two approaches to create a superior method.

We will assume that neither the query images nor the images used for computing the fingerprints are geometrically transformed unless otherwise stated. As described in Chap. 3, some techniques can be used to reverse geometric transformations before searching the database; however, not all algorithms presented in this chapter tolerate such transformations, and they may cause the methodology to fail. We will also assume that no camera brand/model information is available, as metadata is typically easy to forge and often does not exist when visual objects are obtained from social media. However, when reliable metadata is available, it can further speed up the search by restricting it to the relevant models in the obvious manner.

2 Naive Methods

This section will present the most basic methods for source camera identification. Consider that images or videos from N known cameras are obtained. Their fingerprints are extracted as a one-time operation and stored in a fingerprint dataset, \(D = \{K_1, K_2, \dots , K_N\}\). This step is common to all the camera identification methods presented in this chapter.

2.1 Linear Search

The most straightforward search method for source camera identification is linear search (brute force). A query visual object with n pixels, whose fingerprint estimate is \(K_q\), is compared with all the fingerprints until (i) it matches one of them (i.e., at least one \(H_1\) case) or (ii) no fingerprints are left (i.e., all comparisons are \(H_0\) cases):

$$\begin{aligned} \begin{array}{l} H_0: K_i \ne K_q \text { (non-matching fingerprints)}\\ H_1: K_i = K_q \text { (matching fingerprints)} \end{array}\end{aligned}$$
(6.1)

So the complexity of the linear search is O(nN) for a single query image. Since modern cameras typically have more than 10 M pixels, and the number of cameras in a comprehensive database could be many millions, the computational cost of this method becomes exceedingly large.
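As a point of reference, a minimal sketch of this brute-force loop is shown below, assuming fingerprints are stored as flattened NumPy arrays and using plain normalized correlation as the test statistic; the function names and the threshold `tau` are illustrative, and a practical implementation would use the PCE detector described in Chap. 3.

```python
import numpy as np

def corr(a, b):
    """Normalized correlation between two flattened fingerprints."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def linear_search(K_q, database, tau):
    """Brute-force search: O(n * N) for N fingerprints of n elements each.
    Returns the index of the first match (H1 case), or -1 if every test is H0."""
    for i, K_i in enumerate(database):
        if corr(K_q, K_i) >= tau:
            return i
    return -1
```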

2.2 Sequential Trimming

Another simple method is to trim all fingerprints to a fixed length k, where \(k \le n\), so that they all have the same length (Goljan et al. 2010). The main benefits of this method are that it speeds up the search by a factor of n/k and that the memory and disk usage drop by the same ratio.

Given a camera fingerprint, \(K_i\), and the fingerprint of a query object, \(K_q\), their correlation \(c_q\) is independent of the number of remaining pixels after trimming (i.e., k). However, as k drops, the probability of false alarm, \(P_{FA}\), increases and the probability of detection, \(P_D\), decreases for a fixed decision threshold (Fridrich and Goljan 2010). Therefore, the sequential trimming method can provide only a limited speedup if the goal is to retain performance.

Moreover, this method still requires the query fingerprint to be correlated with all N fingerprints in the database (i.e., the cost remains linear in the number of cameras). However, the computational complexity drops to O(kN) as each fingerprint is trimmed from n to k elements.

3 Efficient Pairwise Correlation

As presented in Chap. 3, the complexity of a single correlation is proportional to the number of pixels in a fingerprint pair. Since modern cameras contain millions of pixels and a large database can contain billions of images, using naive search methods is prohibitively expensive for source camera identification.

The previous section shows that the time complexity of a brute-force search is proportional to the number of fingerprints and number of pixels in each fingerprint (i.e., \(O(n\times N)\)). Similarly, sequential trimming can only speed up the search a few times. Hence, further improvements for identification are required.

This section presents the first approach mentioned in Sect. 6.1 that decreases the time complexity of a pairwise correlation.

3.1 Search over Fingerprint Digests

The first method that addresses the above problem is searching over fingerprint digests (Goljan et al. 2010). This method reduces the dimensionality of each fingerprint by selecting the k largest (in terms of magnitude) fingerprint elements along with their spatial location from a fingerprint F.

Fingerprint Digest

Given a fingerprint, F, with n fingerprint elements, the proposed method picks the top k largest of them as well as their indices. Suppose that the values of these k elements are \(V_F = \{v_1, v_2, \dots v_k\}\) and their locations are \(L_{F} = \{l_1, l_2,\dots l_k\}\) for \(1 \le l_i \le n\) where \(V_F = F[L_F]\).

The digest of each fingerprint can be extracted and stored along with the original fingerprint. An important aspect is determining the number of pixels, k, in a digest. Picking a large number of pixels will not yield a high speedup, whereas a low number will result in many false positives.
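A minimal sketch of the digest extraction is given below; it assumes the fingerprint is already flattened into a one-dimensional NumPy array, and the function name is illustrative.

```python
import numpy as np

def fingerprint_digest(F, k):
    """Keep the k largest-magnitude elements of a flattened fingerprint F
    (V_F) together with their locations in F (L_F)."""
    L_F = np.argsort(np.abs(F))[-k:]   # indices of the k largest |F| values
    V_F = F[L_F]                       # V_F = F[L_F]
    return V_F, L_F
```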

Search step

Suppose that the fingerprint of the query image is X and it is correlated with one of the digests in the database \(V_F\) whose indices are \(L_F\). In the first step, the digest of X is estimated as

$$\begin{aligned} V_X = X[L_F] \end{aligned}$$
(6.2)

Then, \(corr(V_F, V_X)\) is obtained. If the correlation is lower than a preset threshold, \(\tau \), then X and F are said to be from different sources. Otherwise, the original fingerprints X and F are correlated to decide whether they originate from the same source camera. Overall, this method reduces the cost of most pairwise comparisons by a factor of approximately n/k, where \(k \ll n\). Therefore, a significant speedup can be achieved.
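The two-stage comparison can be sketched as follows, reusing the hypothetical `corr` helper from the linear-search sketch above; the two thresholds are assumptions and would be set from the desired error rates.

```python
def digest_match(X, F, V_F, L_F, tau_digest, tau_full):
    """Two-stage test: a cheap k-element digest correlation first; the full
    n-element correlation runs only when the digest test passes."""
    V_X = X[L_F]                       # Eq. (6.2): sample X at F's digest locations
    if corr(V_F, V_X) < tau_digest:    # most non-matching pairs are rejected here
        return False
    return corr(X, F) >= tau_full      # confirm with the full-length fingerprints
```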

3.2 Pixel Quantization

The previous sections showed that the cost of search in large databases is proportional to the number of fingerprints in the database, N, and the number of elements in each fingerprint, n. To be more precise, the number of bits used for each fingerprint element also plays a role in the time complexity of the search. Taking the number of bits into account, the time complexity of a brute-force search becomes \(O(b \times n\times N)\), where b is the number of bits per element. Therefore, another avenue for speeding up pairwise fingerprint comparison is quantizing the fingerprint elements.

Bayram et al. (2012) proposed a fingerprint binarization method where each fingerprint element is represented only by its sign, and the magnitude is discarded. Hence, each element is represented by either 0 or 1. Given a camera fingerprint, X, with n elements, its binary form, \(X^B\), can be represented as

$$\begin{aligned} X^B_i = {\left\{ \begin{array}{ll} 0, &{} X_i < 0\\ 1, &{} X_i \ge 0\\ \end{array}\right. } \end{aligned}$$
(6.3)

where \(X^B_i\) is the ith element in binary form for \(i \in \{1, \dots n\}\).

This way, the efficiency of pairwise fingerprint matching can be improved by a factor of b. The original work (Lukas et al. 2006) uses 64-bit double precision for each fingerprint element; therefore, binary quantization reduces storage usage and computation by a factor of 64 compared to Lukas et al. (2006). However, binary quantization comes at the expense of a decrease in the correct detection rate.
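A minimal sketch of binary quantization and of a Hamming-style comparison over the packed bits is shown below; the function names are illustrative, and in practice the binarized fingerprints would be compared against a properly calibrated threshold.

```python
import numpy as np

def binarize(X):
    """Eq. (6.3): keep only the sign of each element, packed 8 per byte."""
    return np.packbits(X >= 0)

def binary_similarity(a_bits, b_bits, n):
    """Fraction of agreeing bits between two packed binary fingerprints of
    n elements; roughly 0.5 is expected for unrelated fingerprints."""
    disagreements = np.unpackbits(a_bits ^ b_bits, count=n).sum()
    return 1.0 - disagreements / n
```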

Alternatively, one can apply the same quantization approach with more than a single bit. This way, the detection performance can be largely preserved. When the bit depth of two fingerprints is reduced from the 64-bit to the 8-bit format, the pairwise correlation changes only marginally (i.e., by less than \(10^{-4}\)), but a nearly 8 times speedup can be achieved, as the complexity of the Pearson correlation is proportional to the number of bits.

3.3 Downsizing

Another simple yet effective method is creating multiple downsized versions of a fingerprint, where L additional copies are stored along with the original fingerprint (Yaqub et al. 2018; Tandoğan et al. 2019). In these methods, a camera fingerprint whose resolution is \(r \times c\) is downsized by a scaling factor s at the first level. Then, at the next level, the scaled fingerprint is further resized by s, and so on. Hence, at the Lth level the resolution becomes \(\frac{r}{s^L} \times \frac{c}{s^L}\).

Since multiple copies of a fingerprint are stored in this method, the storage usage increases by \(r \times c \left( \frac{1}{s^2} + \frac{1}{s^4} + \dots + \frac{1}{s^{2L}}\right) \) elements. The authors show that the optimal value for s is 2, as it causes the least interpolation artefacts, and that L = 3 levels suffice. Moreover, Lanczos-3 interpolation results in the best accuracy as it best approximates the sinc kernel (Lehman et al. 1999). With these settings, the storage requirement increases by \({\approx }33\%\).

Matching

Given a query fingerprint and a dataset of N fingerprints (for each one of them 3 other resized versions are stored), a pairwise comparison can be done as follows:

The query fingerprint is resized to the L3 size (i.e., by 1/8) and compared with the L3 version of the ith fingerprint. If the comparison of the L3 query and reference fingerprint pair is below a preset threshold, the query image is deemed not to have been captured by the ith camera, and the query fingerprint is then compared with the \((i+1)\)th reference fingerprint. Otherwise, the L2 fingerprints are compared. The same process continues for the remaining levels. The preset PCE threshold is set to 60, as presented in Goljan et al. (2009).
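A rough sketch of this coarse-to-fine matching is shown below. It assumes a `pce_fn` comparison function is available (Chap. 3) and uses SciPy's cubic-spline `zoom` as a stand-in for the Lanczos-3 resizing discussed above; the pyramid layout and parameter names are illustrative.

```python
import numpy as np
from scipy.ndimage import zoom   # cubic-spline stand-in for Lanczos-3 resizing

def build_pyramid(K, levels=3, s=2):
    """Store the original fingerprint plus `levels` downsized copies,
    each level shrinking both dimensions by the factor s."""
    pyramid = [K]
    for _ in range(levels):
        pyramid.append(zoom(pyramid[-1], 1.0 / s, order=3))
    return pyramid                                  # [L0 (full), L1, L2, L3]

def coarse_to_fine_match(query_pyr, ref_pyr, pce_fn, tau=60.0):
    """Test the coarsest level first and move to finer levels only while the
    statistic stays above the threshold."""
    for level in range(len(ref_pyr) - 1, -1, -1):   # L3 -> L2 -> L1 -> L0
        if pce_fn(query_pyr[level], ref_pyr[level]) < tau:
            return False                            # rejected at a cheap, low-resolution level
    return True
```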

The results show that when cropping is not involved, this method can achieve up to 53 times speed up with a small drop in the true positive rate.

3.4 Dimension Reduction Using PCA and LDA

It is known that sensor noise is high-dimensional data (i.e., more than millions of pixels). This data contains redundant and interfering components caused by demosaicing, JPEG compression, and other in-camera image processing operations. Because of these artifacts, the probability of a match may decrease, and matching slows down significantly. Therefore, removing those redundant or interfering signals helps improve both the matching accuracy and the efficiency of pairwise comparison.

Li et al. propose to use Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to compress camera fingerprints (Li et al. 2018). The proposed method first uses PCA-based denoising (Zhang et al. 2009) to create a better representation of camera fingerprints. Then LDA is utilized to decrease the dimensionality further as well as further improve matching accuracy.

Fingerprint Extraction

Suppose that there are n images, \(\{I_1, \dots I_{n}\}\), captured by a camera. These images are cropped from their centers so that each has a resolution of \(N \times N\). First, the PRNU noise, \(x_i\), is extracted from these images using one of the extraction methods (Lukas et al. 2006; Dabov et al. 2009). In this step, the noise signals are flattened to create column vectors; hence, the size of \(x_i\) becomes \(N^2 \times 1\). They are then horizontally stacked to create the training matrix, X, as follows:

$$\begin{aligned} X = [x_1, x_2, \dots x_n] \end{aligned}$$
(6.4)

Feature extraction via PCA

The mean of X, \(\mu \), becomes

$$\mu = \frac{1}{n} \sum _{i=1}^{n} x_i$$

The training set is normalized by subtracting the mean from each noise vector, i.e., \(\bar{x}_i = x_i - \mu \). Hence, the normalized training set \(\bar{X}\) becomes

$$\begin{aligned} \bar{X} = [\bar{x}_1, \bar{x}_2, \dots \bar{x}_n] \end{aligned}$$
(6.5)

The covariance matrix, \(\Sigma \) of \(\bar{X}\) can be calculated as \(\Sigma = E [\bar{X} \bar{X}^T] \approx \frac{1}{n} \bar{X} \bar{X}^T\).

PCA is performed to find orthonormal vectors, \(u_k\), and their eigenvalues, \(\lambda _k\). The dimensionality of \(\bar{X}\) is very high (i.e., \(N^2 \times n\)); hence, the direct computation of PCA is prohibitively expensive. Instead, singular value decomposition (SVD) is applied, which yields the principal components more efficiently. The decomposition can be written as

$$\begin{aligned} \Sigma = \Phi \Lambda \Phi ^T \end{aligned}$$
(6.6)

where \(\Phi = [\phi _1, \phi _2 \dots \phi _m]\) is the \(m\times m\) orthonormal eigenvector matrix and \(\Lambda = diag\{\lambda _1, \lambda _2 \dots \lambda _m\}\) contains the eigenvalues.

Along with creating decorrelated eigenvectors, PCA allows reducing the dimensionality. Using the first d of the m eigenvectors, the transformation matrix is \(M_{pca} = \{\phi _1, \phi _2 \dots \phi _d\}\) for \(d < m\). Then the transformed dataset becomes \(\hat{Y} = M_{pca}^T \bar{X}\), whose size is \(d\times n\). Generally speaking, most of the discriminative information of the SPN signal concentrates in the first few eigenvectors. In this work, the authors kept only the d eigenvectors needed to preserve \(99\%\) of the variance, which are used to create a compact version of the PRNU noise. Moreover, thanks to this dimension reduction, the “contamination” caused by other in-camera operations can be decreased. The authors call this the “PCA feature”.
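The PCA feature extraction can be sketched as follows; the SVD of \(\bar{X}\) is used so that the \(N^2 \times N^2\) covariance matrix is never formed explicitly. The function name and the details beyond the 99% variance criterion described above are assumptions.

```python
import numpy as np

def pca_feature_matrix(X_bar, variance_kept=0.99):
    """X_bar: (N^2 x n) matrix of mean-subtracted, flattened PRNU residuals.
    Returns M_pca whose d columns are the leading eigenvectors that retain
    `variance_kept` of the total variance."""
    # Left singular vectors of X_bar are the eigenvectors of X_bar X_bar^T.
    U, S, _ = np.linalg.svd(X_bar, full_matrices=False)
    explained = np.cumsum(S ** 2) / np.sum(S ** 2)
    d = int(np.searchsorted(explained, variance_kept)) + 1
    return U[:, :d]                       # M_pca, shape (N^2, d)

# PCA features of the training set: Y = M_pca.T @ X_bar   (d x n)
```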

Feature extraction via LDA

LDA is used for two purposes: (i) dimension reduction and (ii) increasing the distance between different classes. Since the training set is already labeled, LDA can be used to reduce the dimension and increase the separability of different classes. This is done by finding a transformation matrix, \(M_{lda}\), that maximizes the ratio of the determinant of the between-class scatter matrix, \(S_b\), to that of the within-class scatter matrix, \(S_w\):

$$\begin{aligned} M_{lda} = \underset{J}{\arg \max } \; \frac{\left| J^T S_b J \right| }{\left| J^T S_w J \right| } \end{aligned}$$
(6.7)

where \(S_w = \sum _{j=1}^{c} \sum _{i=1}^{L_j} (y_i - \mu _j) (y_i - \mu _j)^T\). There are c distinct classes, the jth class has \(L_j\) samples, \(y_i\) is the ith sample of class j, and \(\mu _j\) is the mean of class j. On the other hand, \(S_b\) is defined as \(S_b = \frac{1}{c} \sum _{j=1}^{c} \sum _{i=1}^{L_j} (\mu _j - \mu ) (\mu _j - \mu )^T\), where \(\mu \) is the average of all samples.

Using the “LDA feature” extractor, \(M_{lda}\), a \(c-1\) dimensional vector can be obtained as

$$\begin{aligned} \begin{aligned} z&= M_{lda}^T [(M^d_{pca})^T X] \\&= (M^d_{pca} M_{lda})^T X \end{aligned} \end{aligned}$$
(6.8)

So, the PCA and LDA operations can be combined into a single transformation, i.e., \(M_e = M^d_{pca} M_{lda}\), so that \(z = M_e^T X\). Typically \((c-1)\) is lower than d, which further compresses the training data.
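A sketch of the LDA step and the combined projection is given below, using a common Fisher formulation of the scatter matrices (its normalization differs slightly from the equations above); the function names and the use of a pseudo-inverse are assumptions.

```python
import numpy as np

def lda_matrix(Y, labels):
    """Fisher LDA on the PCA features Y (d x n): returns M_lda with up to
    c-1 columns separating the c camera classes."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    d, c = Y.shape[0], len(classes)
    mu = Y.mean(axis=1, keepdims=True)
    S_w = np.zeros((d, d))                      # within-class scatter
    S_b = np.zeros((d, d))                      # between-class scatter
    for j in classes:
        Y_j = Y[:, labels == j]
        mu_j = Y_j.mean(axis=1, keepdims=True)
        S_w += (Y_j - mu_j) @ (Y_j - mu_j).T
        S_b += Y_j.shape[1] * (mu_j - mu) @ (mu_j - mu).T
    # directions from the generalized eigenproblem S_b v = lambda S_w v
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:c - 1]].real       # M_lda, shape (d, c-1)

# Combined single projection: M_e = M_pca @ M_lda, so z = M_e.T @ x
```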

Reference Fingerprint Estimation

Consider the jth camera, \(c_j\), which has captured the images \(I_1, I_2, \dots I_{L_j}\), whose PRNU noise patterns are \(x_1, x_2, \dots x_{L_j}\), respectively.

The reference fingerprint for \(c_j\) can then be obtained using (6.8), i.e., its LDA features are extracted as \(z_j = M_e^T X\).

Source Matching

The authors present two different methodologies, Algorithms 1 and 2, which make use of \(y^d\) (PCA-SPN) and z (LDA-SPN), respectively. Both of these compressed fingerprints have a much lower dimensionality than the original fingerprint, x.

The source camera identification process is straightforward for both algorithms. After the one-time fingerprint compression step used to set up the databases, the PCA-SPN of the query fingerprint, \(y^d_q\), is correlated with each fingerprint in the database. Similarly, for LDA-SPN, \(z_q\) is correlated with each fingerprint in the more compressed dataset. The decision thresholds \(\tau _{y}\) and \(\tau _{z}\) can be set based on the desired false acceptance rate (Fridrich and Goljan 2010).

3.5 PRNU Compression via Random Projection

Valsesia et al. (2015a, 2015b) utilized the idea of applying random projections (RP) followed by binary quantization to reduce fingerprint dimension for camera identification on large databases.

Dimension Reduction

Given a dataset, D, which contains N n-dimensional fingerprints, the goal is to reduce each fingerprint to m dimensions with little or no information loss, where \(m \ll n\). RP reduces the original fingerprints using a random matrix \(\Phi \in R^{m\times n}\) so that the original dataset, \(D \in R^{n\times N}\), can be compressed to a lower dimensional subspace, \(A \in R^{m\times N}\), as follows:

$$\begin{aligned} A = \Phi D \end{aligned}$$
(6.9)

The Johnson–Lindenstrauss lemma states that n-dimensional data can be mapped into an m-dimensional Euclidean space by a transformation that preserves all pairwise distances within a factor of \(1+\epsilon \) (Lindenstrauss 1984).

A linear mapping, f, can be represented by a random matrix \(\Phi \in R^{m\times n}\). The elements of \(\Phi \) can be drawn from certain probability distributions such as Gaussian or Rademacher (Achlioptas 2003). However, one of the main drawbacks is the need to generate \(m\times n\) random numbers. Given that n is on the order of several million, generating and storing such a random matrix is not feasible as it would require too much memory. Moreover, a full matrix multiplication would have to be carried out for the dimension reduction of each fingerprint. This problem can be overcome by using circulant matrices. Instead of generating an entire \(m\times n\) matrix, only the first row is drawn from a Gaussian distribution, and the rest of the matrix is obtained by circular shifts. The multiplication by \(\Phi \) can then be done using the Fast Fourier Transform (FFT). Thanks to the FFT, the dimension reduction can be done efficiently with a complexity of \(O(N\times n \times \log n)\) as opposed to \(O(N\times m \times n)\).

Further improvement can be achieved by binarization. Instead of a floating value, each element can be represented by a single bit as

$$\begin{aligned} A = sign(\Phi D) \end{aligned}$$
(6.10)

The compressed dataset A is now a matrix consisting of N vectors, each of which is a separate fingerprint, \(K_i\) as

$$\begin{aligned} A = \{y_i: y_i = sign(\Phi K_i)\}, i = \{1, \dots , N\} \end{aligned}$$
(6.11)
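A minimal sketch of the circulant projection followed by sign quantization is shown below; `first_col` stands for the single random vector that defines \(\Phi \), and the other names are illustrative.

```python
import numpy as np

def compress_fingerprint(K, m, first_col):
    """Project a flattened n-element fingerprint with the circulant matrix
    defined by `first_col` (the product is a circular convolution, computed
    via the FFT), then keep only the signs of the first m outputs (Eq. 6.10)."""
    n = K.size
    projected = np.fft.irfft(np.fft.rfft(first_col) * np.fft.rfft(K), n)
    return (projected[:m] >= 0).astype(np.uint8)   # m-bit compressed fingerprint

# The same random vector must be shared by the whole database and the query:
# first_col = np.random.default_rng(0).standard_normal(n)
# A = [compress_fingerprint(K_i, m, first_col) for K_i in D]
```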

Search

Now that the database of N fingerprints is set up, the authors present two methods to identify the source of a query fingerprint: linear search and hierarchical search. In both, a query fingerprint, \(\hat{K} \in R^n\), is first transformed in the same manner as the database (i.e., random projection followed by binary quantization) as follows:

$$\begin{aligned} \hat{y} = sign(\Phi \hat{K}) \end{aligned}$$
(6.12)

Since the query fingerprint and the N fingerprints in the dataset are compressed to their binary form, pairwise correlation can be done using Hamming distance instead of Pearson correlation coefficient.

Linear Search

The most straightforward approach is brute-force search, in which the compressed query fingerprint, \(\hat{y}\), is compared against each fingerprint \(y_i\) in A. Therefore, similar to the search over fingerprint digests (Sect. 6.3.1), the computational complexity of the method is proportional to the number of fingerprints, N (i.e., O(N)). Moreover, this method also provides the benefit achieved by binary quantization (Sect. 6.3.2). However, it is crucial to note that the number of pixels used for a fingerprint digest is significantly lower than for the random projection: while fingerprint digests typically require up to a few tens of thousands of pixels, this method typically requires about 0.5 M.

Hierarchical Search

A first improvement over linear search is a hierarchical search scheme in which two compressed versions of the dataset are created. In the first step, a coarse version of each compressed fingerprint with \(m_1\) elements (\(m_1 \ll m\)) is correlated with the coarse version of the query fingerprint. This step works as a pre-filter and reduces the search space to \(N^\prime \) candidates (\(N^\prime \ll N\)). In the second step, the m random projections of the \(N^\prime \) candidate fingerprints are loaded into memory and correlated with the m random projections of the query fingerprint. This way, as opposed to loading \(m\times N\) bits into memory, only \(m_1 \times N + m \times N^\prime \) bits are loaded.

One way to obtain the coarse version is to pick a fixed subset of \(m_1\) indices, such as the first \(m_1\) of them. However, this choice is not robust enough to achieve any improvement over the linear search presented above. An alternative is to pick the indices with the largest magnitudes, similar to fingerprint digests (Goljan et al. 2010).
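The two-stage search can be sketched as follows, assuming a shared index set `coarse_idx` (e.g., chosen as described above) and using the fraction of agreeing bits as the similarity measure; the structure and thresholds are illustrative.

```python
import numpy as np

def hierarchical_search(y_q, coarse_idx, coarse_db, full_db, tau_coarse, tau_full):
    """Stage 1: bit agreement on a small subset of indices against all N
    entries; stage 2: full m-bit comparison only for surviving candidates."""
    q_coarse = y_q[coarse_idx]
    survivors = [i for i, y_c in enumerate(coarse_db)
                 if np.mean(q_coarse == y_c) >= tau_coarse]    # cheap pre-filter
    return [i for i in survivors
            if np.mean(y_q == full_db[i]) >= tau_full]         # full comparison
```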

3.6 Preprocessing, Quantization, Coding

Another method proposed to compress PRNU fingerprints is through preprocessing, quantization, and finally entropy coding (Bondi et al. 2018). The preprocessing step consists of decimation and random projection and reduces the number of fingerprint elements. The quantization step aims to reduce the number of bits per fingerprint element. Finally, entropy coding, similar in spirit to Huffman coding, further decreases the number of bits.

Preprocessing Step

The first step in this method is decimation. Given a fingerprint K with r rows and c columns, this step decimates the fingerprint by a factor of d along each dimension. The columns are grouped in sets of d elements to create \(\lceil \frac{c}{d} \rceil \) columns; the same operation is then applied along the rows. Thus, the resolution of the output fingerprint becomes \(\lceil \frac{r}{d} \rceil \times \lceil \frac{c}{d} \rceil \).

For decimation, the authors propose to use a \(1 \times 5\) bicubic interpolation kernel defined as

$$\begin{aligned} K_d = {\left\{ \begin{array}{ll} 1.5|x|^3 - 2.5|x|^2 + 1, &{} \text {if } |x| \le 1 \\ -0.5|x|^3 + 2.5|x|^2 - 4|x| + 2, &{} \text {if } 1 < |x| \le 2\\ 0, &{} \text {otherwise} \\ \end{array}\right. } \end{aligned}$$
(6.13)

\(K_d\) is flattened to create a column vector. Then, Random Projection (RP) is applied to \(K_d\) to produce \(r_P^*\) which contains P elements.

Dead-Zone Quantization

Although binary quantization has been shown to be an effective way to reduce the bitrate, the authors present a slightly different approach to quantize \(r_P^*\). Instead of checking only the sign of a fingerprint element, the value range is divided into three regions, and depending on where an element lies, it is assigned one of \(\{-1, 0, +1\}\) as follows:

$$\begin{aligned} r^\delta (i) = {\left\{ \begin{array}{ll} +1, &{} \text {if } r^*_P(i) > \sigma \delta \\ 0, &{} \text {if } -\sigma \delta \le r^*_P(i) \le \sigma \delta \\ -1, &{} \text {if } r^*_P(i) < -\sigma \delta \\ \end{array}\right. } \end{aligned}$$
(6.14)

where \(\sigma \) is the standard deviation of \(r^*_{P}\) and \(\delta \) is a factor determining the sparsity of the vector. Then, each \(+1\) is encoded as the bit pair 10 and each \(-1\) as 11 to create a bit-stream, \(r^\delta _P\), from \(r^*_{P}\).
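The dead-zone quantizer itself amounts to a few array comparisons; a minimal sketch, with illustrative names, is shown below.

```python
import numpy as np

def dead_zone_quantize(r_star, delta):
    """Eq. (6.14): map each projected element to {-1, 0, +1} according to the
    dead zone [-sigma*delta, +sigma*delta]."""
    sigma = r_star.std()
    r_q = np.zeros(r_star.shape, dtype=np.int8)
    r_q[r_star > sigma * delta] = 1
    r_q[r_star < -sigma * delta] = -1
    return r_q          # a larger delta gives more zeros, i.e., a sparser stream
```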

Entropy Coding

The last step of this pipeline is to apply arithmetic entropy coding to the bit-stream \(r^\delta _P\). Here, \(\delta \) is a tunable parameter; increasing it yields a sparser vector and thus a more compressible bit-stream.

4 Decreasing the Number of Comparisons

The methodologies presented in the previous section aim to speed up pairwise comparisons. Although they can significantly decrease the cost of a fingerprint search, this improvement is not sufficient given the sheer number of cameras available. A large database can contain many millions of cameras, and alternative methods may be required to carry out searches over such datasets. This section presents methodologies that improve searching in large databases by reducing the number of pairwise comparisons.

4.1 Clustering by Cameras

An obvious way to reduce the search space is by clustering camera fingerprints by their brands or models. A query fingerprint can be compared with only the camera fingerprints from the same brand/model.

As presented in Chap. 4, a fingerprint dataset, \(D = \{K_1, K_2, \dots K_N\}\), can be divided into smaller groups as \(D = \{D_1, D_2, \dots D_m\}\), where \(D_i\) is a set of fingerprints from a single camera model.

This way, a query fingerprint from the same camera model as those in \(D_i\) can be searched within that group only, and a speedup of \(\frac{|D |}{|D_i |}\) can be achieved, where \(|\cdot |\) denotes the number of fingerprints in a set.

One of the main strengths of this approach is that it can be combined with any other search methodology presented in this chapter.

4.2 Composite Fingerprints

Bayram et al. (2012) proposed a group testing approach to organize large databases and utilize it to decrease the number of comparisons when searching for the source of a camera fingerprint (or an image). In group testing methods, multiple objects are combined to create a composite object. When a composite object is compared with a query object, a negative decision indicates that none of the objects in the group matches with the query object. On the other hand, a positive decision indicates one or more objects in the composite may match with the query object.

Building a Composite Tree

The proposed method generates composite fingerprints organized as unordered binary trees. A leaf node in such a tree is a single fingerprint, whereas each internal node is the composition of its descendant nodes.

Fig. 6.1 Composite Fingerprint-based Search Tree (CFST): the query is matched against the tree of composite fingerprints with individual fingerprints at the leaves

Suppose that there are N fingerprints, \(\{F_1, F_2 \dots F_N\}\) which will be used to generate a composite tree as in Fig. 6.1. The composite tree in the figure is formed of 4 fingerprints (i.e., \(\{F_1, F_2, F_3, F_4\}\)), and the composite fingerprints are defined as

$$\begin{aligned} C_{1:N} = \frac{1}{\sqrt{N}} \sum _{j=1}^{N} F_j \end{aligned}$$
(6.15)

The reason for normalizing the composite fingerprint by \(\frac{1}{\sqrt{N}}\) is to ensure the composite has unit variance.

Notice that one of the main drawbacks of this approach is that it nearly doubles the storage usage (i.e., for N fingerprints, a composite tree contains \(2N-1\) nodes).

Source Identification

Suppose that a database D consists of N fingerprints of length n, i.e., \(D=\{F_1, F_2 \dots F_N\}\). Now consider a query fingerprint, \(F_q\), whose source is in question. To determine whether \(F_q\) was captured by any camera in the composite tree formed from \(\{F_1, F_2 \dots F_N\}\), \(F_q\) and \(C_{1:N}\) are correlated.

Given a preset threshold, \(\tau \), the correlation of \(F_q\) and \(C_{1:N}\) can result in the following:

  • \(H^c_0\): \(corr(F_q, C_{1:N}) < \tau \): none of the fingerprints in \(C_{1:N}\) match with \(F_q\)

  • \(H^c_1\): \(corr(F_q, C_{1:N}) \ge \tau \), one or more fingerprints in \(C_{1:N}\) may match with \(F_q\).

The decision of the preset threshold, \(\tau \) depends on the number of fingerprints in a composite (i.e., N in this case).

If the decision of the correlation with \(C_{1:N}\) is positive, \(F_q\) is correlated with its children (i.e., \(C_{1:N/2}\) and \(C_{N/2+1:N}\)). This process is recursively done until a match is found or all pairwise comparisons are found to be negative.
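A recursive sketch of this search is given below, reusing the hypothetical `corr` helper from Sect. 6.2. In practice, the composites at the internal nodes are computed once when the tree is built rather than per query, and the node threshold would depend on the group size as noted above; a single `tau_node` is used here for brevity.

```python
import numpy as np

def composite(fingerprints):
    """Eq. (6.15): sum of the group, normalized by the square root of its size."""
    return np.sum(fingerprints, axis=0) / np.sqrt(len(fingerprints))

def tree_search(F_q, F, tau_leaf, tau_node, lo=0, hi=None):
    """Test the composite of F[lo:hi]; recurse into the two halves only when
    the composite test is positive (H1^c). Returns indices of matching leaves."""
    hi = len(F) if hi is None else hi
    if hi - lo == 1:                                   # leaf: a single fingerprint
        return [lo] if corr(F_q, F[lo]) >= tau_leaf else []
    if corr(F_q, composite(F[lo:hi])) < tau_node:      # H0^c: prune the whole subtree
        return []
    mid = (lo + hi) // 2
    return (tree_search(F_q, F, tau_leaf, tau_node, lo, mid) +
            tree_search(F_q, F, tau_leaf, tau_node, mid, hi))
```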

5 Hybrid Methods

This section presents methodologies that combine several of the previous methods in order to benefit from both ways of speeding up the search: cheaper pairwise comparisons and fewer comparisons.

5.1 Search over Composite-Digest Search Tree

In this work, Taspinar et al. (2017) leverage the individual strengths of the search over fingerprint digests (Sect. 6.3.1) and composite fingerprints (Sect. 6.4) to make the search in large databases more efficient.

In this method, two-level composite-digest search trees are created as in Fig. 6.2. Each root node is a composite fingerprint, which is the mean of the original fingerprints in the tree, and each leaf node is a digested version of an individual fingerprint.

Similar to Sect. 6.4, as the number of fingerprints in a composite grows, the share of each fingerprint decreases, which results in less reliable decisions (i.e., lower true positive and higher false positive rates). Therefore, creating a composite from too many fingerprints is not viable, and a more practical approach is to divide the large dataset into smaller subgroups.

Fig. 6.2 Composite-Digest Search Tree (CDST): one-level CFST with fingerprint digests at the leaves instead of fingerprints

The search stage for this approach is performed in two steps. First, the query fingerprint is compared with the composite fingerprints; a negative decision indicates that the entire subset does not contain the source camera of the query image. Otherwise, a linear search is done over the fingerprint digests (the leaf nodes) as opposed to a linear search over the full-length versions of the fingerprints.
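A compact sketch of this two-level search is shown below, reusing the hypothetical `corr` and `digest_match` helpers from the earlier sketches; the grouping structure and thresholds are assumptions.

```python
def cdst_search(X_q, groups, tau_comp, tau_digest, tau_full):
    """Two-level CDST search: reject whole subgroups via their composite, then
    run the digest test (Sect. 6.3.1) only inside surviving groups.
    `groups` holds (composite, [(V_F, L_F, F), ...]) pairs."""
    for C, members in groups:
        if corr(X_q, C) < tau_comp:
            continue                        # entire subgroup eliminated at once
        for V_F, L_F, F in members:
            if digest_match(X_q, F, V_F, L_F, tau_digest, tau_full):
                return F                    # candidate source camera found
    return None
```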

5.2 Search over Full Digest Search Tree

To further improve the efficiency of the previous method, Taspinar et al. present the search over a full digest search tree, where the tree structure and search method are the same except that the composite fingerprints are also digested, as in Fig. 6.3 (Taspinar et al. 2017).

Fig. 6.3 Full Digest Search Tree (FDST): one-level CFST comprised of composite digests and fingerprint digests

In this method, composite fingerprint digests contain more fingerprint elements than individual fingerprint digests. The reason is that a composite obtained from n fingerprints is contaminated by the \(n-1\) other fingerprints when it is correlated with a fingerprint from one of its child cameras; hence, a higher number of fingerprint elements is typically used for composite fingerprint digests.

6 Conclusion

This chapter presented source camera identification on large databases, which is a crucial task when a crime image or video such as child pornography is found, and there is no suspect camera available. To tackle this problem, visual objects can be collected from social media, and a fingerprint estimate can be extracted from each camera in the set.

Assuming that there is no geometric transformation on visual objects, or the transformations are reverted through a registration step, this chapter focused on determining if one of the cameras in the dataset has captured the query visual objects.

The methods are grouped under three main categories: reducing the complexity of pairwise fingerprint correlation, decreasing the number of correlations, and hybrid methods that combine the two to achieve both kinds of speedup.

Various techniques have been presented in this chapter that can help speed up the search.