A New Database and Protocol for Image Reuse Detection
Abstract
The use of visual elements from an existing image while creating new ones is a commonly observed phenomenon in digital artworks. This practice, referred to as image reuse, is not easy to detect even with the human eye, let alone with computational methods. In this paper, we study automatic image reuse detection in digital artworks as an image retrieval problem. First, we introduce a new digital art database (BODAIR) that consists of a set of digital artworks that reuse stock images. Then, we evaluate a set of existing image descriptors for image reuse detection, providing a baseline for the detection of image reuse in digital artworks. Finally, we propose an image retrieval method tailored for reuse detection, by combining saliency maps with the image descriptors.
Keywords
Image database · Digital art · Image retrieval · Feature extraction · DeviantArt · Image reuse · BODAIR

1 Introduction
One of the main focuses of art historical research is the detection of stylistic similarities between works of art. As early as 1915, Heinrich Wölfflin, regarded by many as the “father” of art history as a discipline, introduced the notion of comparing artworks to define the style of a period [1]. Art historians, connoisseurs, and art critics are trained to detect whether certain features of an artwork are apparent in another one, and whether two artworks belong to the same artist. The experts not only use their visual understanding for such detection, but also rely heavily on historical records and archival information, which are not always sufficiently clear or available. Hence, for decades, art historical research has applied scientific methods such as infrared and X-ray photographic techniques (among others) to help in instances where the trained eye faltered. Using computational approaches to detect stylistic traditions of artworks is a relatively new addition to the field [2]. In this paper, we introduce a new digital image database that consists of original artworks that are re-used to create new artworks. We use this database to examine approaches for image reuse detection. In the long run, detecting which image is reused with computational methods will help in detecting stylistic similarities between artworks in general [3].
In the Western tradition, artists learned their trade by joining the ateliers of masters as apprentices. The introduction of the printing press and the wider availability of paper, and especially the replacement of woodblock etchings with metal engravings, proliferated art education in ateliers. Metal engravings came to be widely used to teach apprentices drawing by copying known forms and designs. Novices used these models as the basis of their new artworks, and in that sense, these designs might be the first ones that were massively re-used in visual art. Today, a similar tendency is to use so-called “stock images” for the same purpose: to facilitate the design of a new artwork. These images are made freely available online, and can be found in repositories and dedicated websites. With the help of multimedia technologies and digital drawing tools, as well as the availability of free stock images, it has become a common approach in digital image creation to reuse existing images. The digital reuse scenarios are, on the one hand, quite different from their forerunners from centuries ago: they heavily rely on photo manipulation tools to generate a desired effect or design. On the other hand, certain photo manipulation tools offer the same (basic) design changes that were commonly used centuries ago. Unlike early archives with erroneous and missing data, today we may have access to precise information about who has reused which image for which artwork. Social networks and online communities for digital artworks, such as DeviantArt and 500px, help us follow the interaction between artists in minute detail, and build a reliable database of artworks which have re-used other images.
Image reuse detection in digital art is a high-level semantic task, which can be challenging even for humans. Despite the advances in image retrieval and image copy detection techniques, automatic detection of image reuse remains a challenge due to the lack of annotated data and of tools specifically designed for the analysis of reuse in digital artworks. Image reuse detection differs from general-purpose image retrieval in the scale and amount of the reused pictorial elements: a small object in one artwork can constitute a major part of another composition, and an image can be featured in another image in a variety of forms. Developing a global method that addresses all types of image reuse is challenging, as the types of image reuse and modifications vary greatly among different artists and genres of digital art. Another challenge in reuse detection is that images can have similar content without actually reusing parts from each other; for example, a famous architectural structure can be depicted by several artists. An ideal image reuse detection system should be able to detect even a small amount of reuse without retrieving false positives. To develop a robust framework for image reuse detection, it is essential to build tools and datasets that are designed for the task.
The prolific expansion in the reuse of pictorial elements introduces problems related to the detection and analysis of image reuse. Automatic detection of image reuse would be useful for numerous tasks, including source image location, similar image retrieval [4, 5], popularity and influence analysis [6], image manipulation and forgery detection [7, 8, 9], and copyright violation detection [8, 10, 11]. Information about the sources of elements in an image could be used in image search as a semantic variable in addition to low-level image features. Furthermore, such information would be useful for image influence analysis, for discovering the relationships between different genres of (digital) artworks, measuring the popularity of a specific piece of art, and detecting possible copyright violations.
In this paper, we first introduce a novel database called BODAIR (Bogazici-DeviantArt Image Reuse Database). The BODAIR database is open for research use under a license agreement. To annotate BODAIR, we introduce a taxonomy of image reuse types and techniques. Next, we evaluate a set of baseline image retrieval methods on this database, discussing their strengths and weaknesses. Finally, we propose a saliency-based image retrieval approach to detect reuse in images.
Fig. 1. Example images from the BODAIR database in the animal, food, nature, place, plant, and premade background categories.
2 The Bogazici-DeviantArt Image Reuse Database
Fig. 2. Examples of the types of reuse: source images (left), destination images (right).
The etiquette of DeviantArt requires members to leave each other comments if they reuse an image. This tradition helped us track down which stock images are used in which new works, by performing link and text analysis on the stock image comments. We used regular expressions on the comments to detect any reference to another artwork (a minimal sketch of this comment parsing is given after the list below) and crawled more than 16,000 images in the following six subcategories of stock images: animals, food, nature, places, plants, and premade backgrounds (e.g., Fig. 1). Our image crawler used a depth-limited recursive search to download the reused images (children images) and related them to their source images (parent images). In addition to the automatically extracted parent-child relationships between images, we manually annotated a total of 1,200 images for four reuse types and nine manipulation types:
- Partial reuse: superimposition of a selected area in an image on another one.
- Direct reuse: use of an image as a whole, such as insertion or removal of objects, addition of frames or captions, color and texture filters, and background manipulations.
- Remake: remake or inspirational use of an image, such as paintings, sketches, and comics based on another artwork.
- Use as a background: use of an artwork as a part of the background in another image.
- Color manipulations: brightness and contrast change, color replacement, hue and saturation shift, tint and shades, and color balance change.
- Translation: moving the visual elements in an image.
- Texture manipulations: altering the texture of the image, such as excessive blurring/sharpening, overlaying a texture, or tiling a pattern.
- Text overlay: image captions, motivational posters, and flyer designs.
- Rotation: rotation of elements in an image.
- Aspect ratio change: non-proportional scaling of images.
- Alpha blending: partially transparent overlay of visual elements.
- Mirroring: horizontal or vertical flipping of images.
- Duplicative use: using a visual element more than one time.
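The comment parsing referenced above can be illustrated with a minimal sketch. The pattern and function names below are illustrative assumptions; the exact regular expressions and crawler code used to build BODAIR are not reproduced here.

```python
import re

# Illustrative pattern for detecting references to other deviations in a comment.
# Old-style DeviantArt artwork URLs have the form
# http://<artist>.deviantart.com/art/<Title-With-Id>.
DEVIATION_LINK = re.compile(
    r"https?://(?:www\.)?([\w-]+)\.deviantart\.com/art/([\w-]+)",
    re.IGNORECASE,
)

def find_referenced_artworks(comment_text):
    """Return (artist, deviation) pairs referenced in a comment."""
    return DEVIATION_LINK.findall(comment_text)

# A stock provider's comment such as
# "Thanks! I used your stock here: http://someartist.deviantart.com/art/My-Piece-123456"
# yields [('someartist', 'My-Piece-123456')], linking the child image to its parent.
```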
Each image in the database has an ID and a reference to the ID of the original work if the image is a reused one. The manually annotated images also include information about the aforementioned types of reuse and manipulation. The manually annotated images include 200 original images, selected among the most popular posts, and their derivatives in each of the six subcategories. The distribution of partial reuse, direct reuse, use as a background, and remake/inspiration among the manually annotated images is 27 %, 47 %, 44 %, and 6 %, respectively. The direct reuse and background categories have a considerable overlap, since background images are generally used as a whole without excessive cropping. In this classification, only the direct and partial reuse categories are considered mutually exclusive.
Fig. 3. Overlaps between reuse types in the BODAIR database.
Fig. 4. Examples of manipulations.
3 Methods
In this section, we describe the methods we apply for image reuse detection. We first summarize several image description methods that are used in matching-based tasks in computer vision, such as content-based image retrieval, image copy detection, and object recognition. Then, we discuss how saliency maps could be combined with image descriptors to improve matching accuracy and reduce computation time in image reuse detection.
Representing an image by its most discriminant properties is an important factor in achieving higher accuracy. Different feature descriptors extract different features from the images to achieve invariance to certain conditions, such as color, illumination, or viewpoint changes. Traditional image recognition methods usually involve sampling keypoints, computing descriptors from the keypoints, and matching the descriptors [12]. The image descriptors can also be computed over the entire image without sampling keypoints. However, these global features usually perform poorly in detecting partial correspondences between images, where a small portion of one image constitutes a major part of another. Local descriptors, such as SIFT [13] and its color variants C-SIFT [14] and OpponentSIFT [14], on the other hand, are more robust in the detection of partial matches. Although local descriptors usually perform better than global approaches, they can be computationally expensive, as they usually produce a high-dimensional representation of the image, which may create a bottleneck in large-scale image retrieval tasks. As suggested by Bosch et al. [12], bag-of-visual-words (BoW) methods reduce this high-dimensional representation to a fixed-size feature vector, sacrificing some accuracy [14].
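As a point of reference for the global approaches, the sketch below computes the two global descriptors evaluated later in this work, a color histogram and HOG, over the entire image. The bin counts, resize resolution, and HOG cell geometry are illustrative assumptions rather than the exact settings used in our experiments.

```python
import cv2
import numpy as np

def color_histogram(image_bgr, bins=8):
    """Joint 3-D color histogram over the whole image, L1-normalized."""
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    hist = hist.flatten().astype(np.float32)
    return hist / max(hist.sum(), 1.0)

def global_hog(image_bgr):
    """HOG computed on a resized window; note it is not robust to translations."""
    gray = cv2.cvtColor(cv2.resize(image_bgr, (128, 128)), cv2.COLOR_BGR2GRAY)
    hog = cv2.HOGDescriptor((128, 128), (32, 32), (16, 16), (16, 16), 9)
    return hog.compute(gray).flatten()
```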
More recent approaches make use of convolutional neural networks (CNNs) to learn powerful models from the data itself [15, 16, 17, 18]. Training these models usually requires a large dataset, such as ImageNet [19], which consists of over 15 million images in more than 22,000 categories. However, it has been shown that models trained on a set of natural images can generalize to other datasets [20], and that the features learned by a model can be transferred to another model with another task [21].
In this work, we evaluate five image descriptors that are commonly used in image matching and content-based image retrieval for image reuse detection: color histograms, Histogram of Oriented Gradients (HOG) [22], Scale Invariant Feature Transform (SIFT) [13], and the SIFT variants OpponentSIFT and C-SIFT, which have been shown to have a better overall performance than the original SIFT and many other color descriptors [14]. In addition, we use a CNN model [15] pretrained on ImageNet [19] as a feature extractor, taking the fully connected layer outputs (FC6 and FC7) as feature vectors.
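The CNN features can be extracted as sketched below. The model of Krizhevsky et al. [15] is an AlexNet-style network; here a torchvision pretrained AlexNet stands in for the model used in our experiments, so the framework, preprocessing, and layer indexing are assumptions for illustration.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Pretrained AlexNet used purely as a feature extractor (no fine-tuning).
net = models.alexnet(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def fc_features(image_path):
    """Return the FC6 and FC7 activations of a pretrained AlexNet as feature vectors."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        x = net.avgpool(net.features(x)).flatten(1)      # convolutional features
        fc6 = net.classifier[2](net.classifier[1](x))    # Linear + ReLU (FC6)
        fc7 = net.classifier[5](net.classifier[4](fc6))  # Linear + ReLU (FC7)
    return fc6.squeeze(0).numpy(), fc7.squeeze(0).numpy()
```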
Fig. 5. Proposed framework for image reuse detection.
Image saliency can help narrow down the areas of interest in image reuse detection. In our earlier work [24], we showed the effectiveness of using saliency maps in image description for image reuse detection. The purpose of the saliency map is to represent the conspicuity, or saliency, at every spatial location in the visual field by a scalar quantity, and to guide the selection of attended locations [25]. Many stock images feature a foreground object that is more likely to be used in other artworks. Therefore, features can be extracted only from the salient regions, which reduces processing time and improves matching accuracy. We use the saliency maps only on the stock images, assuming that each stock image provides such a region of interest to the composition images. We extract features from the query images as a whole, as the use of saliency maps could exclude some references completely.
The overall proposed framework (see Fig. 5) consists of four modules: salient region detection, salient object segmentation, feature extraction, and feature matching. For saliency map estimation, we use a recently proposed Boolean Map based Saliency (BMS) model [26], which is an efficient and simple-to-implement estimator of saliency. Despite its simplicity, BMS achieved state-of-the-art performance on five eye tracking datasets. To segment salient objects, we threshold the saliency maps at their mean intensity to create a binary segmentation mask.
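The segmentation step and the restriction of feature extraction to the salient region can be sketched as follows, assuming the BMS saliency map has already been computed. OpenCV is used here for illustration and is not necessarily the library used in our implementation.

```python
import cv2
import numpy as np

def salient_region_mask(saliency_map):
    """Binarize a saliency map at its mean intensity (8-bit mask for OpenCV)."""
    return (saliency_map > saliency_map.mean()).astype(np.uint8) * 255

def masked_sift_features(stock_image_bgr, saliency_map):
    """Extract SIFT keypoints and descriptors only inside the salient region."""
    sift = cv2.SIFT_create()
    gray = cv2.cvtColor(stock_image_bgr, cv2.COLOR_BGR2GRAY)
    mask = salient_region_mask(saliency_map)
    return sift.detectAndCompute(gray, mask)  # query images are processed without a mask
```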
4 Experimental Results
Fig. 6. Cumulative matching accuracies for the BoW model with different parameters.
4.1 Tuning the Model Parameters
We chose the keypoint sampling strategy and the number of visual words experimentally. For the 144 stock and 1,056 query images in the database, we ran the SIFT descriptor with two sampling strategies: sparse salient keypoint detection and dense sampling. For sparse sampling, we used the default keypoint detector of SIFT, and for dense sampling, we sampled every 8th pixel. Then, we generated BoW codebooks with different vocabulary sizes, setting the number of clusters to 160, 320, 640, 1,280, and 2,560. The retrieval accuracies for the first 20 ranks with these parameters are shown in Fig. 6.
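A minimal sketch of this dense-sampling BoW setup is given below, using an 8-pixel grid stride and the 1,280-word vocabulary selected above; the keypoint scale and the clustering implementation (MiniBatchKMeans) are illustrative assumptions.

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

STRIDE, KEYPOINT_SIZE, NUM_WORDS = 8, 16, 1280
sift = cv2.SIFT_create()

def dense_sift(gray):
    """SIFT descriptors computed at every 8th pixel on a regular grid."""
    keypoints = [cv2.KeyPoint(float(x), float(y), KEYPOINT_SIZE)
                 for y in range(0, gray.shape[0], STRIDE)
                 for x in range(0, gray.shape[1], STRIDE)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors

def build_vocabulary(gray_images):
    """Cluster dense SIFT descriptors from training images into visual words."""
    stacked = np.vstack([dense_sift(g) for g in gray_images])
    return MiniBatchKMeans(n_clusters=NUM_WORDS, random_state=0).fit(stacked)

def bow_histogram(gray, vocabulary):
    """Represent an image as a normalized histogram of visual word occurrences."""
    words = vocabulary.predict(dense_sift(gray))
    hist = np.bincount(words, minlength=NUM_WORDS).astype(np.float32)
    return hist / max(hist.sum(), 1.0)
```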
When the BoW framework is used, dense sampling worked better, as also shown in Nowak et al.'s evaluation of sampling strategies [27]. Thus, we selected uniform dense sampling as our default sampling strategy for the BoW methods. However, in the experiments where we use SIFT with RANSAC without the BoW framework, we selected sparse sampling as our default strategy after preliminary experiments: dense sampling increases the outliers in the matching results, which in turn increases the complexity of finding inliers with RANSAC, and sparse sampling results in a smaller set of features, reducing the computational cost.
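The SIFT-with-RANSAC matching (without BoW) can be sketched as follows, using OpenCV and Lowe's ratio test; the ratio and reprojection thresholds are illustrative, not the values tuned on BODAIR.

```python
import cv2
import numpy as np

def ransac_inlier_score(query_gray, stock_gray, ratio=0.75):
    """Match sparse SIFT keypoints and return the RANSAC inlier count as a similarity score."""
    sift = cv2.SIFT_create()
    kp_q, des_q = sift.detectAndCompute(query_gray, None)
    kp_s, des_s = sift.detectAndCompute(stock_gray, None)
    if des_q is None or des_s is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des_q, des_s, k=2)
            if m.distance < ratio * n.distance]  # Lowe's ratio test
    if len(good) < 4:                            # a homography needs at least 4 correspondences
        return 0
    src = np.float32([kp_q[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return int(inliers.sum()) if inliers is not None else 0
```

Gallery images are then ranked by this score for each query.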
The accuracy increased with the number of clusters, i.e., visual words, but did not improve significantly beyond a point of saturation (Fig. 6). Therefore, we set the number of visual words to 1,280 in the rest of the experiments.
4.2 Evaluation of the Methods
Fig. 7. Top-1 retrieval accuracies on the BODAIR database for the four types of reuse.
Fig. 8. Top-5 retrieval accuracies on the BODAIR database for the four types of reuse.
Fig. 9. Examples of matching with RANSAC. (Color figure online)
Fig. 10. Top-1 retrieval accuracies on the BODAIR database for nine different types of manipulations.
Fig. 11. Top-5 retrieval accuracies on the BODAIR database for nine different types of manipulations.
We also evaluated how the methods perform on the nine annotated image manipulation types: color manipulation, translation, texture manipulation, text overlay, rotation, aspect ratio change, alpha blending, mirroring, and duplication. Overall, the use of saliency maps improved the Top-1 accuracies, although it caused a small decrease in the Top-5 accuracies. HOG features showed poor performance on cropped and translated images, since HOG is not robust to translations when computed globally. All descriptors performed poorly on images involving rotations, alpha blending, mirroring, and duplication. However, these types of manipulations are frequently observed in tandem with other manipulations in our database; therefore, the performance of the descriptors is likely to be affected by more than a single type of manipulation. Experimental results for each of the nine types of manipulations are shown in Figs. 10 and 11.
Overall, SIFT and its color-based variants achieved higher accuracy without the BoW framework. Saliency-based approaches provided better Top-1 retrieval accuracy in almost all types of reuse and manipulation, when applied to the original images only. Even though the CNN-based approaches failed to outperform SIFT and its color-based variants, the results are promising. Given its overall high performance, we recommend using the OpponentSIFT descriptors with RANSAC as a baseline model for future use of the BODAIR database.
To investigate the poor performance on rotation manipulations, we took 12 rotated versions of each query image, extracted SIFT descriptors, and used them in comparisons with the gallery, taking the best-matching rotation for each gallery image. The Top-1 accuracy improved slightly (from 0.09 to 0.11), but the Top-5 accuracy did not change. We attribute this to the fact that rotation is often used together with other manipulation and reuse types: the database contains 74 images with rotation, of which 71 contain a translation, 62 contain partial reuse, and 51 contain color manipulation.
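As a sketch of this rotation sweep, each query is matched at twelve orientations (30-degree steps) against a gallery image and the best score is kept; ransac_inlier_score refers to the matching sketch in Sect. 4.1, and the details are illustrative.

```python
import cv2

def rotate(gray, angle_degrees):
    """Rotate an image about its center without rescaling."""
    h, w = gray.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_degrees, 1.0)
    return cv2.warpAffine(gray, M, (w, h))

def best_rotation_score(query_gray, stock_gray):
    """Best RANSAC inlier count over 12 rotated versions of the query."""
    return max(ransac_inlier_score(rotate(query_gray, 30 * k), stock_gray)
               for k in range(12))
```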
5 Conclusions
In this work, we focused on how to detect image reuse in digitally created artworks. To that end, we first collected stock images from DeviantArt, a website where digital artworks are posted by users, and built the BODAIR database. Using automatic link and text analysis on the images' comment sections, as well as manual labeling, we made available a database that has two sets of images: stock images, and images that reuse those stock images. We furthermore made the distinction between “type of reuse” and “type of manipulation”, i.e., we highlighted the difference between the contextual and the technical approach to reuse, identifying four types of reuse scenarios and nine types of manipulation. We evaluated methods for image reuse detection that are widely used in related tasks, such as image retrieval and object recognition. Lastly, we improved the performance of these methods by using saliency maps. The methods we evaluated provide a baseline for future research on image reuse detection.
References
- 1. Wölfflin, H.: Kunstgeschichtliche Grundbegriffe: das Problem der Stilentwicklung in der neueren Kunst. Hugo Bruckmann, Munich (1915)
- 2. Stork, D.G.: Computer vision and computer graphics analysis of paintings and drawings: an introduction to the literature. In: Jiang, X., Petkov, N. (eds.) CAIP 2009. LNCS, vol. 5702, pp. 9–24. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03767-2_2
- 3. Akdag Salah, A.A., Salah, A.A.: Flow of innovation in deviantart: following artists on an online social network site. Mind Soc. 12(1), 137–149 (2013)
- 4. Smeulders, A., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Mach. Intell. 22(12), 1349–1380 (2000)
- 5. Liu, Y., Zhang, D., Lu, G., Ma, W.Y.: A survey of content-based image retrieval with high-level semantics. Pattern Recogn. 40(1), 262–282 (2007)
- 6. Buter, B., Dijkshoorn, N., Modolo, D., Nguyen, Q., van Noort, S., van der Poel, B., Akdag Salah, A.A., Salah, A.A.: Explorative visualization and analysis of a social network for arts: the case of deviantART. J. Convergence 2(2), 87–94 (2011)
- 7. Bayram, S., Avcibas, I., Sankur, B., Memon, N.: Image manipulation detection. J. Electron. Imaging 15(4), 041102 (2006)
- 8. Ke, Y., Sukthankar, R., Huston, L.: Efficient near-duplicate detection and sub-image retrieval. In: ACM Multimedia, pp. 869–876 (2004)
- 9. Fridrich, A.J., Soukal, B.D., Lukáš, A.J.: Detection of copy-move forgery in digital images. In: Proceedings of Digital Forensic Research Workshop (2003)
- 10. Kim, C.: Content-based image copy detection. Signal Process. Image Commun. 18(3), 169–184 (2003)
- 11. Zhao, W.L., Ngo, C.W.: Scale-rotation invariant pattern entropy for keypoint-based near-duplicate detection. IEEE Trans. Image Process. 18(2), 412–423 (2009)
- 12. Bosch, A., Muñoz, X., Martí, R.: Which is the best way to organize/classify images by content? Image Vis. Comput. 25(6), 778–791 (2007)
- 13. Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
- 14. Van De Sande, K., Gevers, T., Snoek, C.: Evaluating color descriptors for object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1582–1596 (2010)
- 15. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems (2012)
- 16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of IEEE CVPR, pp. 770–778 (2016)
- 17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
- 18. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
- 19. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
- 20. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10590-1_53
- 21. Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)
- 22. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 886–893. IEEE (2005)
- 23. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
- 24. Isikdogan, F., Salah, A.: Affine invariant salient patch descriptors for image retrieval. In: International Workshop on Image and Audio Analysis for Multimedia Interactive Services, Paris, France (2013)
- 25. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20(11), 1254–1259 (1998)
- 26. Zhang, J., Sclaroff, S.: Saliency detection: a Boolean map approach. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 153–160 (2013)
- 27. Nowak, E., Jurie, F., Triggs, B.: Sampling strategies for bag-of-features image classification. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 490–503. Springer, Heidelberg (2006). doi:10.1007/11744085_38
- 28. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1615–1630 (2005)