Advertisement

Multimedia Tools and Applications

, Volume 77, Issue 23, pp 30729–30748 | Cite as

Exploiting tf-idf in deep Convolutional Neural Networks for Content Based Image Retrieval

  • Nikolaos Kondylidis
  • Maria Tzelepi
  • Anastasios Tefas
Article
  • 189 Downloads

Abstract

In this paper, a novel term frequency-inverse document frequency (tf-idf) based method that utilizes deep Convolutional Neural Networks (CNN) for Content Based Image Retrieval (CBIR) is proposed. That is, we treat the learned filters of the convolutional layers of a CNN model as detectors of visual words. Each of these filters has been trained to be activated in different visual patterns. Thus, since the activations of each filter provide information about the degree of presence of the visual pattern that the filter has learned during the training procedure, we consider the activations of these filters as the tf part. Subsequently, we propose three approaches of computing the idf part. Finally, we propose a query expansion technique on top of the formulated descriptors. The proposed approach interconnects the standard tf-idf method with the modern CNN analysis for visual content, providing a very powerful image retrieval technique with improved results as it is highlighted by extensive experiments in four challenging image datasets.

Keywords

Content Based Image Retrieval Deep Convolutional Neural Networks TF-IDF 

Notes

Acknowledgments

Maria Tzelepi was supported by the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI) (PhD Scholarship No. 2826).

References

  1. 1.
    Arandjelovic R, Zisserman A (2013) All about vlad. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1578–1585Google Scholar
  2. 2.
    Babenko A, Lempitsky V (2015) Aggregating deep convolutional features for image retrieval. arXiv:1510.07493
  3. 3.
    Babenko A, Slesarev A, Chigorin A, Lempitsky V (2014) Neural codes for image retrieval. In: Computer Vision–ECCV 2014. Springer, pp 584–599Google Scholar
  4. 4.
    Baeza-Yates R, Ribeiro-Neto B et al (1999) Modern information retrieval, vol 463. ACM Press, New YorkGoogle Scholar
  5. 5.
    Chum O, Philbin J, Sivic J, Isard M, Zisserman A (2007) Total recall: automatic query expansion with a generative feature model for object retrieval. In: 2007 IEEE 11th international conference on computer vision. IEEE, pp 1–8Google Scholar
  6. 6.
    Ciresan D, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 3642–3649Google Scholar
  7. 7.
    Csurka G, Dance C, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Workshop on statistical learning in computer vision, ECCV, vol 1. Prague, pp 1–2Google Scholar
  8. 8.
    Datta R, Li J, Wang JZ (2005) Content-based image retrieval: approaches and trends of the new age. In: Proceedings of the 7th ACM SIGMM international workshop on multimedia information retrieval. ACM, pp 253–262Google Scholar
  9. 9.
    Deng L (2014) A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Trans Signal Inf Process 3:e2CrossRefGoogle Scholar
  10. 10.
    Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T (2013) Decaf: a deep convolutional activation feature for generic visual recognition. arXiv:1310.1531
  11. 11.
    Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587Google Scholar
  12. 12.
    Gordo A, Almazán J, Revaud J, Larlus D (2016) Deep image retrieval: learning global representations for image search. In: European conference on computer vision. Springer, pp 241–257Google Scholar
  13. 13.
    Hinami R, Matsui Y, Satoh S (2017) Region-based image retrieval revisited. arXiv:1709.09106
  14. 14.
    Iscen A, Tolias G, Avrithis Y, Furon T, Chum O (2016) Efficient diffusion on region manifolds: recovering small objects with compact cnn representations. arXiv:1611.05113
  15. 15.
    Jégou H, Zisserman A (2014) Triangulation embedding and democratic aggregation for image search. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3310–3317Google Scholar
  16. 16.
    Jégou H, Douze M, Schmid C (2008) Hamming embedding and weak geometric consistency for large scale image search. In: Zisserman A, Forsyth D, Torr P (eds) European conference on computer vision, volume I of LNCS. Springer, Berlin, pp 304–317Google Scholar
  17. 17.
    Jégou H, Perronnin F, Douze M, Sanchez J, Perez P, Schmid C (2012) Aggregating local image descriptors into compact codes. IEEE Trans Pattern Anal Mach Intell 34(9):1704–1716CrossRefGoogle Scholar
  18. 18.
    Kato T (1992) Database architecture for content-based image retrieval. In: SPIE/IS&T 1992 symposium on electronic imaging: science and technology. International Society for Optics and Photonics, pp 112–123Google Scholar
  19. 19.
    Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105Google Scholar
  20. 20.
    Le Cun B B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1990) Handwritten digit recognition with a back-propagation network. In: Advances in neural information processing systems. CiteseerGoogle Scholar
  21. 21.
    LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324CrossRefGoogle Scholar
  22. 22.
    Li Z, Liu J, Tang J, Lu H (2015) Robust structured subspace learning for data representation. IEEE Trans Pattern Anal Mach Intell 37(10):2085–2098CrossRefGoogle Scholar
  23. 23.
    Li Z, Tang J (2015) Weakly supervised deep metric learning for community-contributed image retrieval. IEEE Trans Multimed 17(11):1989–1999CrossRefGoogle Scholar
  24. 24.
    Liu Z, Wang S, Tian Q (2016) Fine-residual vlad for image retrieval. Neurocomputing 173:1183–1191CrossRefGoogle Scholar
  25. 25.
    Lowe DG (1999) Object recognition from local scale-invariant features. In: The proceedings of the seventh IEEE international conference on computer vision, vol 2. IEEE, pp 1150–1157Google Scholar
  26. 26.
    Mayron LM (2008) Image retrieval using visual attention. Florida Atlantic UniversityGoogle Scholar
  27. 27.
    Mohedano E, Salvador A, McGuinness K, Marques F, O’Connor N E, Nieto X G (2016) Bags of local convolutional features for scalable instance search. arXiv:1604.04653
  28. 28.
    Ng J, Yang F, Davis L (2015) Exploiting local features from deep networks for image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 53–61Google Scholar
  29. 29.
    Nister D, Stewenius H (2006) Scalable recognition with a vocabulary tree. In: 2006 IEEE computer society conference on computer vision and pattern recognition, vol 2. IEEE, pp 2161–2168Google Scholar
  30. 30.
    Perronnin F, Liu Y, Sánchez J, Poirier H (2010) Large-scale image retrieval with compressed fisher vectors. In: 2010 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 3384–3391Google Scholar
  31. 31.
    Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: IEEE conference on computer vision and pattern recognition, 2007. CVPR’07. IEEE, pp 1–8Google Scholar
  32. 32.
    Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2008) Lost in quantization: improving particular object retrieval in large scale image databases. In: IEEE conference on computer vision and pattern recognition, 2008. CVPR 2008. IEEE, pp 1–8Google Scholar
  33. 33.
    Razavian AS, Sullivan J, Carlsson S, Maki A (2016) Visual instance retrieval with deep convolutional networks. ITE Trans Media Technol Appl 4(3):251–258CrossRefGoogle Scholar
  34. 34.
    Sermanet P, Kavukcuoglu K, Chintala S, LeCun Y (2013) Pedestrian detection with unsupervised multi-stage feature learning. In: 2013 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 3626–3633Google Scholar
  35. 35.
    Sivic J, Zisserman A (2003) Video google: a text retrieval approach to object matching in videos. In: Ninth IEEE international conference on computer vision. Proceedings. IEEE, pp 1470–1477Google Scholar
  36. 36.
    Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380CrossRefGoogle Scholar
  37. 37.
    Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9Google Scholar
  38. 38.
    Taigman Y, Yang M, Ranzato MA, Wolf L (2014) Deepface: closing the gap to human-level performance in face verification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1701–1708Google Scholar
  39. 39.
    Tolias G, Sicre R, Jégou H (2015) Particular object retrieval with integral max-pooling of cnn activations. arXiv:1511.05879
  40. 40.
    Toshev A, Szegedy C (2014) Deeppose: human pose estimation via deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660Google Scholar
  41. 41.
    Tzelepi M, Tefas A (2016) Exploiting supervised learning for finetuning deep cnns in content based image retrieval. In: 2016 23rd international conference on pattern recognition (ICPR). IEEE, pp 2918–2923Google Scholar
  42. 42.
    Tzelepi M, Tefas A (2018) Deep convolutional learning for content based image retrieval. Neurocomputing 275:2467–2478CrossRefGoogle Scholar
  43. 43.
    Voorhees EM (1985) The cluster hypothesis revisited. In: Proceedings of the 8th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 188–196Google Scholar
  44. 44.
    Wan J, Wang D, Hoi SC H, Wu P, Zhu J, Zhang Y, Li J (2014) Deep learning for content-based image retrieval: a comprehensive study. In: Proceedings of the ACM international conference on multimedia. ACM, pp 157–166Google Scholar
  45. 45.
    Yu W, Yang K, Yao H, Sun X, Xu P (2017) Exploiting the complementary strengths of multi-layer cnn features for image retrieval. Neurocomputing 237:235–241CrossRefGoogle Scholar
  46. 46.
    Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, Berlin, pp 818–833Google Scholar
  47. 47.
    Zhao W-L, Jégou H, Gravier G (2013) Oriented pooling for dense and non-dense rotation-invariant features. In: BMVC-24th British machine vision conferenceGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Department of InformaticsAristotle University of ThessalonikiThessalonikiGreece

Personalised recommendations