Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine

  • Conference paper
Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13808)

Abstract

Much computer vision research has focused on natural images, but technical documents typically contain abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer vision and machine learning have led to the rise of reverse image search engines. A conventional search engine accepts a text query and returns a set of documents, including images. A reverse image search engine instead accepts an image as a query and returns a set of images. This paper evaluates how well common reverse image search engines discover abstract images. We conducted an experiment using images from Wikimedia Commons, a website known to be well indexed by Baidu, Bing, Google, and Yandex. We measure three things. Retrievability is how difficult an image is to find again. Precision is what percentage of the returned images are relevant. Mean reciprocal rank is how close to the top of the results the submitted image appears, averaged over all queries. When trying to discover the same image again among similar images, Yandex performs best. When searching for pages containing a specific image, Google and Yandex outperform the others at discovering photographs, with precision scores of 0.8191 and 0.8297, respectively. In both cases, Google and Yandex perform better with natural images than with abstract ones, with a difference in retrievability as high as 54% between the two categories. These results affect anyone who uses common web search engines to find technical documents containing abstract images.
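The three measures named in the abstract can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' evaluation code.

- Each search is treated as a ranked list of image identifiers, and the submitted image is the single target for mean reciprocal rank.
- A query whose target never appears scores 0.
- The relevant set passed to `precision` is supplied by the caller.
- Retrievability follows Azzopardi and Vinay's cumulative measure with every query weighted equally and a hypothetical rank cutoff `cutoff`.
- The image names in `runs` are invented toy data.

```python
from typing import Dict, Sequence, Set


def precision(results: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of returned results that are relevant; 0.0 for an empty result list."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if r in relevant)
    return hits / len(results)


def reciprocal_rank(results: Sequence[str], target: str) -> float:
    """1/rank of the first occurrence of `target` (ranks start at 1); 0.0 if absent."""
    for rank, r in enumerate(results, start=1):
        if r == target:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Dict[str, Sequence[str]]) -> float:
    """Average reciprocal rank over queries; each key is the submitted image ID
    and is treated as the only target for its own result list (an assumption)."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(res, q) for q, res in runs.items()) / len(runs)


def retrievability(doc: str, runs: Dict[str, Sequence[str]], cutoff: int) -> int:
    """Cumulative retrievability r(d) = sum_q o_q * f(k_dq, c) with every o_q = 1:
    counts the queries whose top-`cutoff` results contain `doc`."""
    return sum(1 for res in runs.values() if doc in res[:cutoff])


# Toy example (invented data): each key is a submitted image, each value its ranked results.
runs = {
    "chart.png": ["diagram.svg", "chart.png", "photo.jpg"],
    "photo.jpg": ["photo.jpg", "photo_crop.jpg"],
    "schematic.png": ["photo.jpg", "diagram.svg"],
}
print(mean_reciprocal_rank(runs))                       # 0.5, i.e. (1/2 + 1/1 + 0) / 3
print(retrievability("photo.jpg", runs, cutoff=1))      # 2
print(retrievability("schematic.png", runs, cutoff=3))  # 0
```

In this toy run, the photograph surfaces at rank 1 for two different queries. The schematic is never returned at all, so its retrievability is 0. Retrievability depends on the chosen cutoff and query set, so scores are only comparable across engines when both are held fixed.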

Notes

  1. https://tineye.com/

Author information

Correspondence to Shawn M. Jones.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jones, S.M., Oyen, D. (2023). Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13808. Springer, Cham. https://doi.org/10.1007/978-3-031-25085-9_12

  • DOI: https://doi.org/10.1007/978-3-031-25085-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25084-2

  • Online ISBN: 978-3-031-25085-9

  • eBook Packages: Computer Science, Computer Science (R0)
