A provenance graph depicts the story of edits and manipulations undergone by a media asset. This section focuses on the provenance graphs of images, whose vertices represent the image variants and whose edges represent the direct pairwise image relationships. Depending on the transformations applied to one image to obtain another, the two connected images can share anywhere from partial to full visual content. In the case of partial content sharing, the source images of the shared content are called the donor images (or simply donors), while the resulting manipulated image is called the composite image. In the case of full content sharing, we have near-duplicate variants, where one image is created from another through a series of transformations such as cropping, blurring, and color changes. Once a set of related images is collected in the first stage of content retrieval (see Sect. 15.2), a fine-grained analysis of pairwise relationships is required to obtain the full provenance graph. This analysis involves two major steps, namely (1) image similarity computation and (2) graph building.
Similarity computation quantifies the degree of similarity between two images and is a fundamental task in visual recognition. Image matching methods, ranging from handcrafted approaches to modern deep-learning-based solutions, are at the core of vision-based applications. A matching method yields a similarity (or dissimilarity) score that can be used for further decision-making and classification. For provenance analysis, computing pairwise image similarity helps distinguish between direct and indirect relationships, and the selection of a feasible set of pairwise relationships creates the provenance graph. To find the closest provenance match to an image in the provenance graph, pairwise matching is performed for all possible image pairs in the set of k retrieved images. The similarity scores are then recorded in a matrix \(\mathbf {M}\) of size \(k \times k\), where each cell \(\mathbf {M}(i, j)\) represents the similarity between image \(I_i\) and image \(I_j\).
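To make this step concrete, the following Python sketch fills \(\mathbf {M}\) for a list of retrieved images; the scoring function `similarity` is a placeholder for any of the matchers discussed in the remainder of this section:

```python
import numpy as np

def build_similarity_matrix(images, similarity):
    """Fill the k x k matrix M with pairwise similarity scores.

    `images` is the list of k retrieved images (query included);
    `similarity` is a placeholder for any pairwise scoring function.
    """
    k = len(images)
    M = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:  # self-similarity is not needed for graph building
                M[i, j] = similarity(images[i], images[j])
    return M
```

If the chosen scoring function is symmetric, only the upper triangle needs to be computed and mirrored; asymmetric scores require the full double loop, as above.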
Graph building, in turn, comprises constructing the provenance graph after similarity computation. The matrix containing the similarity scores for all image pairs involved in each provenance case can be interpreted as an adjacency matrix, implying that each similarity score is the weight of an edge in a complete graph of k vertices. Extracting a provenance graph requires selecting a minimal set of edges that spans the entire set of relevant images or vertices (which can differ from k). If the similarity measure used in the previous stage is symmetric, the final graph will be undirected, whereas an asymmetric similarity measure will lead to a directed provenance graph. The provenance cases considered in the literature so far are spanning trees, implying that the graphs contain no cycles and there is at most one path between any two vertices.
Approaches
A provenance graph has multiple aspects. Vertices represent the different variants of an image or visual subject; pairwise relationships between images (i.e., undirected edges) represent the atomic manipulations that led to the evolution of the manipulated image; directions for these relationships provide more precise information about the change; and the final details are the specific operations performed on one image to create the other. The fundamental task for image-based provenance analysis is, thus, image comparison. This stage requires describing an image using a global descriptor or a set of local descriptors. Depending on the methods used for image description and matching, the similarity computation stage can create different types of adjacency weight matrices, and the edge selection algorithm then depends on the nature of the computed image similarity. In the rest of this section, we present six graph construction techniques that have been proposed in the literature and represent the current state of the art in image provenance analysis.
Undirected Graphs: A simple yet effective graph construction solution was proposed by Bharati et al. (2017). It takes the top k retrieved images for the given query and computes the similarity between the two elements of every image pair, including the query, through keypoint extraction, description, and matching. Keypoint extractors and descriptors, such as SIFT (see Lowe 2004) or SURF (see Bay et al. 2008), offer a way to highlight the important regions within the images (such as corners and edges) and to describe their content in a manner that is robust to several of the transformations manipulated images might have been through (such as scaling, rotation, and blurring). The number of keypoint matches that are geometrically consistent with the others in the match set can act as the similarity score for each image pair.
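A minimal sketch of such a similarity score follows, using OpenCV's SIFT implementation, Lowe's ratio test, and RANSAC-based homography fitting to keep only the geometrically consistent matches; the ratio and reprojection thresholds below are illustrative defaults, not the values used by Bharati et al. (2017):

```python
import cv2
import numpy as np

def keypoint_similarity(img_a, img_b, ratio=0.75):
    """Count the geometrically consistent SIFT matches between two images."""
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)
    if desc_a is None or desc_b is None:
        return 0
    # Lowe's ratio test over the two nearest neighbors of each descriptor.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = [pair[0] for pair in matcher.knnMatch(desc_a, desc_b, k=2)
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
    if len(matches) < 4:  # homography estimation needs at least 4 matches
        return len(matches)
    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Keep only the matches that agree with a single homography (RANSAC inliers).
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return int(mask.sum()) if mask is not None else 0
```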
As depicted in Fig. 15.15, two images that share visual content will present more keypoint matches (see Fig. 15.15a) than the ones that have nothing in common (see Fig. 15.15b). Consequently, a symmetric pairwise image adjacency matrix can be built by simply using the number of keypoint matches between every image pair. Ultimately, a maximum spanning tree algorithm, such as Kruskal’s (1956) or Prim’s (1957), can be used to generate the final undirected image provenance graph.
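With the symmetric matrix at hand, the maximum spanning tree can be extracted by a Kruskal-style pass over the candidate edges, as in the sketch below, which uses a simple union-find structure to reject cycle-forming edges:

```python
def maximum_spanning_tree(M):
    """Kruskal's algorithm over the symmetric similarity matrix M.

    Returns the undirected provenance edges as (i, j) pairs,
    preferring the strongest similarities first.
    """
    k = len(M)
    parent = list(range(k))

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Sort all candidate edges by decreasing similarity.
    edges = sorted(((M[i][j], i, j)
                    for i in range(k) for j in range(i + 1, k)),
                   reverse=True)
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```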
Directed Graphs: The previously described method has the limitation of generating only symmetric adjacency matrices, and therefore does not provide enough information to compute the direction of the provenance graph's edges. As explained in Sect. 15.1, within the problem of provenance analysis, the direction of an edge within a provenance graph expresses the important information of which asset gave rise to the other.
Aiming to mitigate this limitation and inspired by the early work of Dias et al. (2012), Moreira et al. (2018) proposed an extension to the keypoint-based image similarity computation approach. After finding the geometrically consistent keypoint matches for each pair of images \((I_i, I_j)\), the obtained keypoints can be used for estimating the homography \(H_{ij}\) that guides the registration of image \(I_i\) onto image \(I_j\), as well as the homography \(H_{ji}\) that analogously guides the registration of image \(I_j\) onto image \(I_i\).
In the particular case of \(H_{ij}\), after obtaining the transformation \(T_j(I_i)\) of image \(I_i\) towards \(I_j\), \(T_j(I_i)\) and \(I_j\) are properly registered, with \(T_j(I_i)\) presenting the same size as \(I_j\) and the matched keypoints lying at the same positions. One can, thus, compute the bounding boxes that enclose all the matched keypoints within each image, obtaining two corresponding patches: \(R_1\), within \(T_j(I_i)\), and \(R_2\), within \(I_j\). With the two aligned patches at hand, the distribution of the pixel values of \(R_1\) can be matched to the distribution of \(R_2\), before calculating the similarity (or dissimilarity) between them.
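This registration step can be sketched with OpenCV as follows, assuming `pts_i` and `pts_j` hold the coordinates of the geometrically consistent keypoint matches found earlier; the pixel-distribution (color) matching mentioned above is omitted for brevity:

```python
import cv2
import numpy as np

def register_and_crop(img_i, img_j, pts_i, pts_j):
    """Warp img_i onto img_j via homography H_ij and crop the matched patches.

    `pts_i` and `pts_j` are N x 2 arrays with the coordinates of the
    geometrically consistent keypoint matches.
    """
    H_ij, _ = cv2.findHomography(pts_i, pts_j, cv2.RANSAC, 5.0)
    h, w = img_j.shape[:2]
    warped = cv2.warpPerspective(img_i, H_ij, (w, h))  # T_j(I_i)
    # Bounding box enclosing the matched keypoints, in I_j's coordinates.
    x0, y0 = np.floor(pts_j.min(axis=0)).astype(int)
    x1, y1 = np.ceil(pts_j.max(axis=0)).astype(int)
    R1 = warped[y0:y1, x0:x1]  # patch within T_j(I_i)
    R2 = img_j[y0:y1, x0:x1]   # corresponding patch within I_j
    return R1, R2
```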
Considering that patches \(R_1\) and \(R_2\) have the same width W and height H after content registration, one possible method of patch dissimilarity computation is the pixel-wise mean squared error (MSE):
$$\begin{aligned} MSE(R_1, R_2) = \frac{\sum _{w=1}^{W}{\sum _{h=1}^{H}{\left( R_1(w, h) - R_2(w, h)\right) ^2}}}{H \times W}, \end{aligned}$$
(15.1)
where \(R_1(w, h) \in [0, 255]\) and \(R_2(w, h) \in [0, 255]\) are the pixel values of \(R_1\) and \(R_2\) at position (w, h), respectively.
As an alternative to MSE, one can express the similarity between \(R_1\) and \(R_2\) as the mutual information (MI) between them. From the perspective of information theory, MI is the amount of information that one random variable contains about another. From the point of view of probability theory, it measures the statistical dependence of two random variables. In practical terms, taking the aligned and color-corrected patches \(R_1\) and \(R_2\) as the two (discrete) random variables, the value of MI is given by:
$$\begin{aligned} MI(R_1, R_2) = \sum _{x \in R_1}{\sum _{y \in R_2}{p(x, y) \log \left( \frac{p(x, y)}{\sum _{x} {p(x, y)} \sum _{y} {p(x, y)}} \right) }}, \end{aligned}$$
(15.2)
where \(x \in [0, 255]\) refers to the pixel values of \(R_1\), and \(y \in [0, 255]\) refers to the pixel values of \(R_2\). The term p(x, y) denotes the joint probability distribution function of \(R_1\) and \(R_2\). As explained by Costa et al. (2017), it can be satisfactorily approximated by
$$\begin{aligned} p(x, y) = \frac{h(x, y)}{\sum _{x, y} {h(x, y)}}, \end{aligned}$$
(15.3)
where h(x, y) is the joint histogram that counts the number of occurrences for each possible value of the pair (x, y), evaluated on the corresponding pixels for both patches \(R_1\) and \(R_2\).
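Both measures follow directly from Eqs. (15.1) to (15.3). In the sketch below, the joint histogram h(x, y) is computed with NumPy, assuming the patches are already registered and share the same shape:

```python
import numpy as np

def mse(R1, R2):
    """Pixel-wise mean squared error, Eq. (15.1)."""
    diff = R1.astype(np.float64) - R2.astype(np.float64)
    return (diff ** 2).mean()

def mutual_information(R1, R2, bins=256):
    """Mutual information from the joint histogram, Eqs. (15.2) and (15.3)."""
    h, _, _ = np.histogram2d(R1.ravel(), R2.ravel(), bins=bins)
    p_xy = h / h.sum()                     # joint distribution, Eq. (15.3)
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal over y
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal over x
    nz = p_xy > 0                          # avoid log(0)
    return (p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum()
```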
As a consequence of their respective natures, MSE is inversely proportional to the two patches' similarity, while MI is directly proportional to it. Aware of this, one can use either (i) the inverse of the MSE scores or (ii) the MI scores directly as the similarity elements \(s_{ij}\) within the pairwise image adjacency matrix \(\mathbf {M}\), to represent the similarity between image \(I_j\) and the transformed version of image \(I_i\) towards \(I_j\), namely \(T_j(I_i)\).
The homography \(H_{ji}\) is calculated analogously to \(H_{ij}\), with the difference that \(T_i(I_j)\) is obtained by transforming \(I_j\) towards \(I_i\). Because of this, the size of the registered images, the shape of the matched patches, and the matched color distributions will be different, leading to unique MSE (or MI) values for setting \(s_{ji}\). Since \(s_{ij}\ne s_{ji}\), the resulting similarity matrix \(\mathbf {M}\) will be asymmetric. Figure 15.16 depicts this process.
Upon computing the full matrix, the assumption introduced by Dias et al. (2012) is that, if \(s_{ij} > s_{ji}\), it would be easier to transform image \(I_i\) towards image \(I_j\) than the contrary (i.e., \(I_j\) towards \(I_i\)). Analogously, \(s_{ij} < s_{ji}\) would mean the opposite. This information can, thus, be used for edge selection. The oriented Kruskal solution (Dias et al. 2012), with a preference for higher adjacency weights, helps construct the final directed provenance graph.
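A simplified reading of this edge selection is sketched below: for each unordered image pair, the stronger of the two directions is kept as a directed candidate edge, and a Kruskal-style pass (as in the earlier undirected sketch) assembles the tree; the actual oriented Kruskal of Dias et al. (2012) is more elaborate:

```python
def oriented_kruskal(M):
    """Directed edge selection over the asymmetric similarity matrix M."""
    k = len(M)
    parent = list(range(k))

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # For each unordered pair, keep the stronger direction as candidate.
    candidates = []
    for i in range(k):
        for j in range(i + 1, k):
            src, dst = (i, j) if M[i][j] >= M[j][i] else (j, i)
            candidates.append((M[src][dst], src, dst))
    candidates.sort(reverse=True)  # prefer higher adjacency weights

    edges = []
    for _, src, dst in candidates:
        if find(src) != find(dst):  # acyclicity, ignoring edge direction
            parent[find(src)] = find(dst)
            edges.append((src, dst))  # directed: src gives rise to dst
    return edges
```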
Clustered Graph Construction: As an alternative to the oriented Kruskal, Moreira et al. (2018) introduced a method of directed provenance graph building, which leverages both symmetric keypoint-based and asymmetric mutual-information-based image similarity matrices.
Inspired by Oikawa et al. (2015) and dubbed clustered graph construction, the idea behind such a solution is to group the available retrieved images such that only near-duplicates of a common image are added to the same cluster. Starting from the image query \(I_q\) as the initial expansion point, the remaining images are sorted according to the number of geometrically consistent matches shared with \(I_q\), from the largest to the smallest. The solution then clusters probable near-duplicates around \(I_q\), as long as they share enough content, which is decided based upon the number of keypoint matches (see Moreira et al. 2018). Once the query's cluster is finished (i.e., the remaining images do not share enough keypoint matches with the query), a new cluster is computed over the remaining unclustered images, taking another image from the query's cluster as the new expansion point. This process is repeated iteratively, trying different images as the expansion point, until every image belongs to a near-duplicate cluster.
Once all images are clustered, it is time to establish the graph edges. Images belonging to the same cluster are sequentially connected into a single path without branches. This makes sense in scenarios containing sequential image edits, where one near-duplicate is obtained on top of the other. As a consequence of the iterative execution and the selection of different images as expansion points, the successful ones (i.e., the images that were helpful in the generation of new image clusters) necessarily belong to more than one cluster, hence serving as graph bifurcation points. Edges between clusters are established in such cases, allowing every near-duplicate image branch to be connected to the final provenance graph through an expansion-point image acting as a joint. To determine the direction of every single edge, Moreira et al. (2018) suggested using the mutual information similarity asymmetry in the same way as depicted in Fig. 15.16.
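The clustering stage can be summarized, at a high level, by the sketch below; `kp_matches` and `threshold` stand in for the geometrically consistent match counts and the content-sharing criterion of Moreira et al. (2018), whose actual decision rule is more elaborate than this simplified version:

```python
def cluster_near_duplicates(num_images, query_idx, kp_matches, threshold):
    """Simplified sketch of near-duplicate cluster expansion.

    `kp_matches[i][j]` holds the number of geometrically consistent
    keypoint matches between images i and j; `threshold` decides
    whether two images share enough content (both are placeholders).
    """
    unclustered = set(range(num_images)) - {query_idx}
    clusters = []
    expansion_points = [query_idx]  # start expanding from the query
    while unclustered and expansion_points:
        center = expansion_points.pop(0)
        # Sort the remaining images by match count with the expansion point.
        ranked = sorted(unclustered,
                        key=lambda i: kp_matches[center][i], reverse=True)
        cluster = [center]
        for i in ranked:
            if kp_matches[center][i] < threshold:
                break  # ranked by match count, so the rest share even less
            cluster.append(i)
            unclustered.discard(i)
        if len(cluster) > 1:
            clusters.append(cluster)
            # Members of this cluster become candidate expansion points.
            expansion_points.extend(cluster[1:])
    return clusters
```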
Leveraging Metadata: Image comparison techniques may be limited depending upon the transformations involved in a given image's provenance. In cases where the transformations are reversible or collapsible, visual content analysis may not suffice for edge selection during graph building. Specifically, the homography estimation and color mapping steps involved in asymmetric matrix computation for edge direction inference could be noisy. To make this process more robust, it is pertinent to utilize other evidence sources to determine connections. As can be seen from the example in Fig. 15.17, it is difficult to point out the plausible direction of manipulation from visual correspondence alone, but auxiliary information related to the image, mostly accessible within the image files (a.k.a. image metadata), can increase confidence in predicting the directions.
Image metadata, when available, can provide additional evidence for directed edge inference. Bharati et al. (2019) identify tags highly relevant to the task: tags that provide the time of image acquisition and editing, location, editing operations, etc. can be used for metadata analysis that corroborates visual evidence for provenance analysis. The authors propose an asymmetric heuristic-based metadata comparison in parallel to a symmetric visual comparison. The metadata comparison scores are higher for ordered image pairs whose tags are consistent across more sets of metadata tags. The resulting visual adjacency matrix is used for edge selection, while the metadata-based comparison scores are used for edge direction inference. As explained for clustered graph construction, there are three parts to the graph building method, namely node cluster expansion, edge selection, and assigning directions to edges. Metadata information can supplement the last two, depending on the specific stage at which it is incorporated. As metadata tags can be volatile in the world of intelligent forgeries, a conservative approach is to use them to improve the confidence of the provenance graph obtained through visual analysis. The proposed design enables the usage of metadata when available and consistent.
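As a purely hypothetical illustration of such a heuristic (the tag names and rules below are illustrative and not those of Bharati et al. 2019), a direction score could count how many metadata cues are consistent with image \(I_i\) preceding image \(I_j\):

```python
from datetime import datetime

def metadata_direction_score(meta_i, meta_j):
    """Hypothetical heuristic: count metadata cues consistent with
    I_i preceding I_j. `meta_i` and `meta_j` are tag dictionaries
    (e.g., parsed EXIF); the tag names below are illustrative only.
    """
    score = 0
    # An earlier modification time suggests I_i came first.
    fmt = "%Y:%m:%d %H:%M:%S"  # common EXIF timestamp format
    t_i, t_j = meta_i.get("ModifyDate"), meta_j.get("ModifyDate")
    if t_i and t_j and datetime.strptime(t_i, fmt) < datetime.strptime(t_j, fmt):
        score += 1
    # An editing-software tag present only in I_j suggests I_j was edited.
    if "Software" in meta_j and "Software" not in meta_i:
        score += 1
    return score
```

In this spirit, a higher score for the ordered pair \((I_i, I_j)\) than for \((I_j, I_i)\) would support orienting the edge from \(I_i\) to \(I_j\).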
Transformation-Aware Embeddings: While metadata analysis can improve the fidelity of edge directions in provenance graphs when available and not tampered with, local keypoint matching for visual correspondence faces challenges in image ordering for provenance analysis. Local matching is efficient and robust at finding shared regions between related images. This works well for connecting donors with composite images, but can be insufficient for capturing the subtle differences between near-duplicate images that affect the ordering of long chains of operations. Establishing sequences of images that vary slightly according to the applied transformations requires differentiating between slightly modified versions of the same content.
Towards improving the reconstruction of globally-related image chains in provenance graphs, Bharati et al. (2021) proposed encoding awareness of the transformation sequence in the image comparison stage. Specifically, the devised method learns transformation-aware embeddings to better order related images in an edit sequence or provenance chain. The framework uses a patch-based siamese structure trained with an Edit Sequence Loss (ESL) using sets of four image patches. Each set is expressed as a quadruplet or edit sequence, namely (i) the anchor patch, which represents the original content, (ii) the positive patch, a near-duplicate of the anchor after M image processing transformations, (iii) the weak positive patch, the positive patch after N further transformations, and (iv) the negative patch, a patch that is unrelated to the others. The quadruplets of patches are obtained for training using a specific set of image transformations that are of interest to image forensics, particularly image phylogeny and provenance analysis, as suggested in Dias et al. (2012). For each anchor patch, random unit transformations are sequentially applied, each on top of the previous result, allowing the generation of positive and weak positive patches from the anchor after M and \(M+N\) transformations, respectively. The framework aims at providing distance scores to pairs of patches, where the output score between the anchor and the positive patch is smaller than that between the anchor and the weak positive, which, in turn, is smaller than the score between the anchor and the negative patch (as shown in Fig. 15.18).
Given a feature vector for an anchor image patch a, two transformed derivatives of the anchor patch, p (positive) and \(p^\prime \) (weak positive), where \(p=T_M(a)\) and \(p^\prime =T_N(T_M(a))\), and an unrelated image patch n from a different image, ESL is a pairwise margin ranking loss computed as follows:
$$\begin{aligned} \begin{aligned} ESL(a,p,p^\prime ,n)\, =&\,{\text {max}}(0, - y \times (d(a,p^\prime ) - d(a,n)) + \mu _1) + \\&\,{\text {max}}(0, - y \times (d(p,p^\prime ) - d(p,n)) + \mu _2) + \\&\,{\text {max}}(0, - y \times (d(a,p) - d(a,p^\prime )) + \mu _3) \end{aligned} \end{aligned}$$
(15.4)
Here, y is the truth function that determines the rank order (see Rudin and Schapire 2009), and \(\mu _1\), \(\mu _2\), and \(\mu _3\) are margins corresponding to each pairwise distance term, treated as hyperparameters. When a distance difference and y have the same sign, the ordering is correct and the corresponding term is zero; when they have opposite signs, the ordering is wrong and a positive loss is accumulated.
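Assuming the patch embeddings are available as tensors, ESL can be sketched with PyTorch's built-in margin ranking loss, which computes the same \(\max (0, -y(\cdot ) + \mu )\) terms of Eq. (15.4), averaged over the batch; the margin values below are illustrative hyperparameters:

```python
import torch
import torch.nn.functional as F

def edit_sequence_loss(a, p, p_weak, n, mu=(0.1, 0.1, 0.05)):
    """Sketch of Eq. (15.4) over embedding batches a, p, p_weak, n.

    The margins in `mu` are hyperparameters (illustrative values here);
    y = -1 encodes that the first distance of each pair should be smaller.
    """
    d = lambda u, v: F.pairwise_distance(u, v)  # l2-distance per row
    y = -torch.ones(a.size(0), device=a.device)
    loss = F.margin_ranking_loss(d(a, p_weak), d(a, n), y, margin=mu[0])
    loss += F.margin_ranking_loss(d(p, p_weak), d(p, n), y, margin=mu[1])
    loss += F.margin_ranking_loss(d(a, p), d(a, p_weak), y, margin=mu[2])
    return loss
```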
The above loss is optimized, and the model with the best validation performance is used for feature extraction from patches of test images. Features learned with the proposed technique are used to provide pairwise image similarity scores. The value \(d_{ij}\) between images \(I_i\) and \(I_j\) is computed by matching the set of features (extracted from patches) of one image to the other's using an iterative greedy brute-force matching strategy. At each iteration, the best match is selected as the pair of patches between image \(I_i\) and image \(I_j\) whose l2-distance is the smallest and whose patches did not participate in a match in previous iterations. This guarantees deterministic behavior regardless of the order of the images, meaning that comparing the patches of \(I_i\) against those of \(I_j\), or vice-versa, will lead to the same consistent set of patch pairs. Once all patch pairs are selected, the average l2-distance is calculated and set as \(d_{ij}\). The inverse of \(d_{ij}\) is then used to set both \(s_{ij}\) and \(s_{ji}\) within the pairwise image similarity matrix \(\mathbf {M}\), which in this case is symmetric. Upon computing all values within \(\mathbf {M}\) for all possible image pairs, a greedy algorithm (such as Kruskal's 1956) is employed to order these pairwise values and create an optimally connected undirected graph of images.
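The greedy brute-force matching can be sketched as follows, assuming each image is represented by a NumPy array of patch features; note how selecting the globally smallest distance at each iteration makes the result independent of the order in which the two images are presented:

```python
import numpy as np

def greedy_patch_distance(feats_i, feats_j):
    """Greedy brute-force matching between two sets of patch features."""
    # All pairwise l2-distances between the two feature sets.
    dists = np.linalg.norm(
        feats_i[:, None, :] - feats_j[None, :, :], axis=-1)
    selected = []
    for _ in range(min(len(feats_i), len(feats_j))):
        # Globally smallest distance among the still unmatched patches.
        r, c = np.unravel_index(np.argmin(dists), dists.shape)
        selected.append(dists[r, c])
        dists[r, :] = np.inf  # both patches leave the candidate pool
        dists[:, c] = np.inf
    return float(np.mean(selected))  # d_ij; its inverse sets s_ij = s_ji
```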
Leveraging Manipulation Detectors: A challenging aspect of image provenance analysis is establishing high-confidence direct relationships between images that share a small portion of content. Keypoint-based approaches may not suffice as there may not be enough keypoints in the shared regions, and global matching approaches may not appropriately capture the matching region’s importance. To improve analysis of composite images where source images have only contributed a small region and determine the source image among a group of image variants, Zhang et al. (2020) proposed to combine a pairwise ancestor-offspring classifier with manipulation detection approaches. They build the graph by combining edges based on both local feature matching and pixel similarity.
Their proposed algorithm attempts to balance global and local features and matching scores to boost performance. It starts by using a weighted combination of the matched SIFT keypoints and the matched pixel values for image pairs that can be aligned, and null for the ones that cannot. A hierarchical clustering approach is then used to group images coming from the same source. For graph building within each determined cluster, the authors combine the likelihood of images being manipulated, extracted from a holistic image manipulation detector (see Zhang et al. 2020), with the pairwise ancestor score extracted by an L2-Net (see Tian et al. 2017). The image manipulation detector uses a patch-based convolutional neural network (CNN) to predict manipulations from a median-filtered residual image. For ambiguous cases where the integrity score may not be assigned accurately, a lightweight CNN-based ancestor-offspring network takes patch pairs as input and predicts the likelihood of one being derived from the other. The similarity scores used as edge weights are the average of the integrity and ancestor scores from the two networks. The image with the highest score within each cluster is considered the source. All incoming links to this vertex are removed to reduce confusion in directions, and it is then treated as the root of the arborescence built by applying the Chu-Liu/Edmonds algorithm (see Chu 1965; Edmonds 1967) on the pairwise image similarities.
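The per-cluster arborescence step can be sketched with networkx's Edmonds implementation, assuming `scores` holds the combined integrity/ancestor similarities for the images of one cluster and `root` is the image selected as the source:

```python
import networkx as nx

def build_arborescence(scores, root):
    """Sketch of the per-cluster step: Chu-Liu/Edmonds on pairwise scores.

    `scores[i][j]` is the combined integrity/ancestor similarity and
    `root` is the highest-scoring image of the cluster. Edges into the
    root are dropped beforehand, as described above.
    """
    G = nx.DiGraph()
    k = len(scores)
    for i in range(k):
        for j in range(k):
            if i != j and j != root:  # no incoming edges to the root
                G.add_edge(i, j, weight=scores[i][j])
    # networkx's Edmonds-based maximum spanning arborescence.
    return nx.maximum_spanning_arborescence(G)
```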
The different arborescences are connected by finding the best-matched image pair among the image clusters. If the number of matched keypoints is above a threshold, these images are connected, indicating a splicing or composition possibility. As reported in the following section, this method obtains state-of-the-art results on the NIST challenges (MFC18 2018 and MFC19 2019), and it significantly improves the computation of the edges of the provenance graphs over the Reddit Photoshop Battles dataset (see Brogan 2021).
Datasets and Evaluation
With respect to the step of provenance graph construction, four datasets stand out as publicly available and helpful benchmarks, namely NC17 (2017), MFC18 (2018) (both discussed in Sect. 15.2.2), MFC19 (2019), and the Reddit Photoshop Battles dataset (2021).
NC17 (2017): As mentioned in Sect. 15.2.2, this dataset contains an interesting development partition (Dev1-Beta4), which presents 65 image queries, each one belonging to a particular manually curated provenance graph. As expected, these provenance graphs are provided within the partition as ground truth. The number of images per provenance graph ranges from 2 to 81, with an average graph order of 13.6 images.
MFC18 (2018): Besides providing images and ground truth for content retrieval (as explained in Sect. 15.2.2), the Eval-Ver1-Part1 partition of this dataset also provides provenance graphs and 897 queries aimed at evaluating graph construction. For this dataset, the average graph order is 14.3 images, and the resolution of its images is, on average, larger than that of NC17. Moreover, its provenance cases encompass a larger set of applied image manipulations.
MFC19 (2019): A more recent edition of the NIST challenge released a larger set of provenance graphs. The Eval-Part1 partition within the MFC19 (2019) dataset has 1,027 image queries, and the average order of the provided ground-truth provenance graphs is 12.7 image vertices. In this group, the number of types of image manipulations used to generate the edges of the graphs was almost twice that of MFC18.
Reddit Photoshop Battles (2021): Aiming at testing the image provenance analysis solutions over more realistic scenarios, Moreira et al. (2018) introduced the Reddit Photoshop Battles dataset. This dataset was collected from images posted to the Reddit community known as r/photoshopbattles (2012), where professional and amateur image manipulators share doctored images. Each “battle” starts with a teaser image posted by a user. Subsequent users post modifications of either the teaser or previously submitted manipulations in comments to the related posts. By using the underlying tree comment structure, Moreira et al. (2018) were able to infer and collect 184 provenance graphs, which together contain 10,421 original and composite images. Figure 15.19 illustrates this provenance graph inference process.
To evaluate the available graph construction solutions, two configurations are proposed by the NIST challenge (2017). In the first one, named the “oracle” scenario, there is a strong focus on the graph construction task. It assumes that a flawless content retrieval solution is available, thus starting from the ground-truth content retrieval image ranks to build the provenance graphs, with neither missing images nor distractors. In the second one, named the “end-to-end” scenario, content retrieval must be performed before graph construction, thus delivering imperfect image ranks (with missing images or distractors) to the graph construction step. We rely on both configurations and on the aforementioned datasets to report results of provenance graph construction in the following section.
Metrics: As suggested by NIST (2017), given a provenance graph G(V, E) generated by a solution whose performance we want to assess, we compute the F1-measure (i.e., the harmonic mean of precision and recall) of (i) the retrieved image vertices V and (ii) the established edges E, when compared to the ground truth graph \(G'(V', E')\), with its homologous components \(V'\) and \(E'\). The first metric is named vertex overlap (VO), and the second, edge overlap (EO):
$$\begin{aligned} VO(G', G) = 2\times \frac{|V'\cap V|}{|V'|+|V|}, \end{aligned}$$
(15.5)
$$\begin{aligned} EO(G', G) = 2\times \frac{|E'\cap E|}{|E'|+|E|}. \end{aligned}$$
(15.6)
Moreover, we compute the vertex and edge overlap (VEO), which is the F1-measure of retrieving both vertices and edges simultaneously:
$$\begin{aligned} VEO(G', G) = 2\times \frac{|V'\cap V|+|E'\cap E|}{|V'|+|V|+|E'|+|E|}. \end{aligned}$$
(15.7)
Table 15.4 Results of provenance graph construction over the NC17 dataset. Reported here are the average vertex overlap (VO), edge overlap (EO), and vertex and edge overlap (VEO) values of 65 queries. These experiments were executed in the “oracle” scenario, where the image ranks fed to the graph construction step are perfect (i.e., with neither distractors nor missing images)
In a nutshell, these metrics aim at assessing the overlap between G and \(G'\). The higher the values of VO, EO, and VEO, the better the performance of the solution. Finally, in the particular case of EO and VEO, when they are both assessed for an approach that does not generate directed graphs (such as Undirected Graphs and Transformation-Aware Embeddings, presented in Sect. 15.3.1), an edge within E is considered a hit (i.e., a correct edge) when there is a homologous edge within \(E'\) that connects equivalent image vertices, regardless of the edges’ directions.
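The three metrics reduce to set-overlap computations, as in the sketch below, where vertices are image identifiers and edges are (source, destination) tuples; for undirected solutions, edges should be normalized (e.g., with `tuple(sorted(edge))`) on both sides before the comparison:

```python
def vertex_edge_overlap(V_true, E_true, V_pred, E_pred):
    """VO, EO, and VEO as in Eqs. (15.5) to (15.7).

    Vertices are hashable image identifiers; edges are (src, dst)
    tuples, normalized beforehand for undirected comparisons.
    """
    V_true, V_pred = set(V_true), set(V_pred)
    E_true, E_pred = set(E_true), set(E_pred)
    vo = 2 * len(V_true & V_pred) / (len(V_true) + len(V_pred))
    eo = 2 * len(E_true & E_pred) / (len(E_true) + len(E_pred))
    veo = (2 * (len(V_true & V_pred) + len(E_true & E_pred))
           / (len(V_true) + len(V_pred) + len(E_true) + len(E_pred)))
    return vo, eo, veo
```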
Results
Table 15.4 puts in perspective the different provenance graph construction approaches explained in Sect. 15.3.1, when executed over the NC17 dataset. The provided results were all collected in oracle mode, hence the high values of VO (above 0.9), since there are neither distractors nor missing images in the rank lists used to build the provenance graphs. A comparison between rows 1 and 2 within this table shows the efficacy of leveraging image metadata as additional information to compute the edges of the provenance graphs. The values of EO (and consequently VEO) increase significantly (from 0.12 to 0.45, and from 0.55 to 0.70, respectively) when metadata is available. In addition, by comparing rows 3 and 4, one can observe the contribution of the data-driven Transformation-Aware Embeddings approach in the scenario where only undirected graphs are generated. In both cases, the generated edges have no direction by design, making their edge overlap conditions easier to achieve (since the order of the vertices within the edges becomes irrelevant for the computation of EO and VEO, justifying the higher values when compared to rows 1 and 2). Nevertheless, contrary to the first two approaches, these solutions are not able to define which image gives rise to the other within the established provenance edges.
Table 15.5 Results of provenance graph construction over the MFC18 and MFC19 datasets. Reported here are the average vertex overlap (VO), edge overlap (EO), and vertex and edge overlap (VEO) values of 897 queries, in the case of MFC18, and of 1,027 queries, in the case of MFC19. These experiments were executed in the “end-to-end” scenario, thus, building graphs upon imperfect image ranks (i.e., with distractors or missing images)
Table 15.5 compares the current state-of-the-art solution (Leveraging Manipulation Detectors by Zhang et al. 2020) with the official NIST challenge participation results of the Purdue-Notre Dame team (2018), for both the MFC18 and MFC19 datasets. In both cases, the reported results refer to the more realistic end-to-end scenario, where performers must execute content retrieval prior to building the provenance graphs. As a consequence, the image ranks fed to the graph construction step are noisy, since they contain both missing images and distractors. For all the reported cases, the image ranks had 50 images and presented an average R@50 of around 90% (i.e., nearly 10% of the needed images are missing). Moreover, nearly 35% of the 50 images available in a rank are distractors, on average. The best solution (rows 2 and 4 within Table 15.5) still delivers low values of EO when compared to VO, revealing an important limitation of the available approaches.
Table 15.6 Results of provenance graph construction over the Reddit Photoshop Battles dataset. Reported here are the average vertex overlap (VO), edge overlap (EO), and vertex and edge overlap (VEO) values of 184 queries. These experiments were executed in the “oracle” scenario, where the image ranks fed to the graph construction step are perfect. “N.R.” stands for not-reported values
Table 15.6, in turn, reports results on the Reddit Photoshop Battles dataset. As one might observe, especially in terms of EO, this set is a more challenging one for the graph construction approaches, except for the state-of-the-art solution (Leveraging Manipulation Detectors by Zhang et al. 2020). While methods that worked fairly well on the NC17, MFC18, and MFC19 datasets drastically fail on the Reddit dataset (see EO values below 0.10 in rows 1 and 2), the state-of-the-art approach (in the last row of the table) more than doubles the EO results. Even with this improvement, increasing the values of EO within graph construction solutions remains an open problem that deserves attention from researchers.