Abstract
Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer vision and machine learning have led to the rise of reverse image search engines. Where a conventional search engine accepts a text query and returns a set of document results, including images, a reverse image search engine accepts an image as a query and returns a set of images as results. This paper evaluates how well common reverse image search engines discover abstract images. We conducted an experiment leveraging images from Wikimedia Commons, a website known to be well indexed by Baidu, Bing, Google, and Yandex. We measure how difficult an image is to find again (retrievability), what percentage of the images returned are relevant (precision), and how early in the results a visitor finds the submitted image, averaged as the reciprocal of its rank (mean reciprocal rank). When trying to discover the same image again among similar images, Yandex performs best. When searching for pages containing a specific image, Google and Yandex outperform the others when discovering photographs, with precision scores of 0.8191 and 0.8297, respectively. In both of these cases, Google and Yandex perform better with natural images than with abstract ones, with a difference in retrievability as high as 54% between images in these categories. These results affect anyone applying common web search engines to search for technical documents that use abstract images.
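The three measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the input shapes, and the cutoff-based reading of retrievability (the share of query images that reappear within the top `cutoff` results) are assumptions made for illustration, following the standard textbook definitions of precision and mean reciprocal rank.

```python
from typing import List, Sequence, Tuple


def precision(relevant_flags: Sequence[bool]) -> float:
    """Fraction of returned results that were judged relevant."""
    if not relevant_flags:
        return 0.0
    return sum(relevant_flags) / len(relevant_flags)


def reciprocal_rank(result_ids: Sequence[str], target_id: str) -> float:
    """1 / rank of the first result matching the target, or 0.0 if absent."""
    for rank, result_id in enumerate(result_ids, start=1):
        if result_id == target_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], str]]) -> float:
    """Average reciprocal rank over (results, target) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, t) for r, t in runs) / len(runs)


def found_within_cutoff(runs: List[Tuple[Sequence[str], str]], cutoff: int) -> float:
    """Assumed retrievability proxy: share of queries whose target image
    appears among the first `cutoff` results."""
    if not runs:
        return 0.0
    hits = sum(1 for results, target in runs if target in results[:cutoff])
    return hits / len(runs)


# Hypothetical result lists: each query image is submitted and we look for
# the same image (by identifier) in the ranked results.
runs = [
    (["a", "b"], "a"),            # found at rank 1
    (["x", "y", "z", "q"], "q"),  # found at rank 4
    (["m"], "n"),                 # never found
]
print(mean_reciprocal_rank(runs))
print(found_within_cutoff(runs, cutoff=3))
```

A reciprocal rank of 1.0 means the submitted image came back as the first result, while 0.0 means it was never returned, so a higher mean indicates that visitors review fewer results before finding the image.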
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jones, S.M., Oyen, D. (2023). Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13808. Springer, Cham. https://doi.org/10.1007/978-3-031-25085-9_12
DOI: https://doi.org/10.1007/978-3-031-25085-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25084-2
Online ISBN: 978-3-031-25085-9
eBook Packages: Computer Science, Computer Science (R0)