
GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval

  • Conference paper
  • Published in: MultiMedia Modeling (MMM 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13141)

Abstract

Even though it has been shown extensively that retrieval-specific training of deep neural networks improves nearest-neighbor image search quality, most of these models are trained and tested on landmark images. However, many applications use images from a variety of other domains and therefore need a network with good generalization properties: a general-purpose CBIR model. To the best of our knowledge, no testing protocol has so far been introduced to benchmark models with respect to general image retrieval quality. After analyzing popular image retrieval test sets, we decided to manually curate GPR1200, an easy-to-use and accessible yet challenging benchmark dataset with a broad range of image categories. This benchmark is subsequently used to evaluate pretrained models of various architectures on their generalization qualities. We show that large-scale pretraining significantly improves retrieval performance and present experiments on how to further increase these properties through appropriate fine-tuning. With these promising results, we hope to increase interest in the research topic of general-purpose CBIR.



Author information

Correspondence to Konstantin Schall.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Schall, K., Barthel, K.U., Hezel, N., Jung, K. (2022). GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_17


  • DOI: https://doi.org/10.1007/978-3-030-98358-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98357-4

  • Online ISBN: 978-3-030-98358-1

  • eBook Packages: Computer Science (R0)
