An Image Retrieval System for Video

  • Paolo Bolettieri
  • Fabio Carrara
  • Franca DeboleEmail author
  • Fabrizio Falchi
  • Claudio Gennaro
  • Lucia Vadicamo
  • Claudio Vairo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11807)


Since the 1970’s the Content-Based Image Indexing and Retrieval (CBIR) has been an active area. Nowadays, the rapid increase of video data has paved the way to the advancement of the technologies in many different communities for the creation of Content-Based Video Indexing and Retrieval (CBVIR). However, greater attention needs to be devoted to the development of effective tools for video search and browse. In this paper, we present Visione, a system for large-scale video retrieval. The system integrates several content-based analysis and retrieval modules, including a keywords search, a spatial object-based search, and a visual similarity search. From the tests carried out by users when they needed to find as many correct examples as possible, the similarity search proved to be the most promising option. Our implementation is based on state-of-the-art deep learning approaches for content analysis and leverages highly efficient indexing techniques to ensure scalability. Specifically, we encode all the visual and textual descriptors extracted from the videos into (surrogate) textual representations that are then efficiently indexed and searched using an off-the-shelf text search engine using similarity functions.


Content-based image indexing Neural networks Multimedia retrieval Similarity search Object detection 


  1. 1.
    Amato, G., et al.: VISIONE at VBS2019. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., Vrochidis, S. (eds.) MMM 2019, Part II. LNCS, vol. 11296, pp. 591–596. Springer, Cham (2019). Scholar
  2. 2.
    Amato, G., Falchi, F., Gennaro, C., Rabitti, F.: Searching and annotating 100M images with YFCC100M-HNfc6 and MI-File. In: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, CBMI 2017, pp. 26:1–26:4. ACM (2017)Google Scholar
  3. 3.
    Amato, G., Falchi, F., Gennaro, C., Vadicamo, L.: Deep permutations: deep convolutional neural networks and permutation-based indexing. In: Amsaleg, L., Houle, M.E., Schubert, E. (eds.) SISAP 2016. LNCS, vol. 9939, pp. 93–106. Springer, Cham (2016). Scholar
  4. 4.
    Awad, G., Snoek, C.G.M., Smeaton, A.F., Quénot, G.: Trecvid semantic indexing of video: a 6-year retrospective. ITE Trans. Media Technol. Appl. 4(3), 187–208 (2016)CrossRefGoogle Scholar
  5. 5.
    Fellbaum, C., Miller, G.: WordNet: An Electronic Lexical Database. Language, Speech, and Communication. MIT Press, Cambridge (1998)CrossRefGoogle Scholar
  6. 6.
    Gennaro, C., Amato, G., Bolettieri, P., Savino, P.: An approach to content-based image retrieval based on the lucene search engine library. In: Lalmas, M., Jose, J., Rauber, A., Sebastiani, F., Frommholz, I. (eds.) ECDL 2010. LNCS, vol. 6273, pp. 55–66. Springer, Heidelberg (2010). Scholar
  7. 7.
    Gordo, A., Almazán, J., Revaud, J., Larlus, D.: End-to-end learning of deep visual representations for image retrieval. Int. J. Comput. Vis. 124(2), 237–254 (2017)MathSciNetCrossRefGoogle Scholar
  8. 8.
    Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)
  9. 9.
    Jiang, Y.G., Wu, Z., Wang, J., Xue, X., Chang, S.F.: Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 40(2), 352–364 (2018)CrossRefGoogle Scholar
  10. 10.
    Lokoč, J., Kovalčík, G., Souček, T.: Revisiting SIRET video retrieval tool. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10705, pp. 419–424. Springer, Cham (2018). Scholar
  11. 11.
    Lokoč, J., Bailer, W., Schöffmann, K., Münzer, B., Awad, G.: On influential trends in interactive video retrieval: Video browser showdown 2015–2017. IEEE Trans. Multimedia 20(12), 3361–3376 (2018)CrossRefGoogle Scholar
  12. 12.
    Lokoč, J., et al.: Interactive search or sequential browsing? A detailed analysis of the video browser showdown 2018. ACM Trans. Multimedia Comput. Commun. Appl. 15(1), 29:1–29:18 (2019)CrossRefGoogle Scholar
  13. 13.
    Niraimathi, D.S.: Color based image segmentation using classification of k-NN with contour analysis method. Int. Res. J. Eng. Technol. 3, 1169–1177 (2016)Google Scholar
  14. 14.
    Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
  15. 15.
    Rossetto, L., Schuldt, H., Awad, G., Butt, A.A.: V3C – a research video collection. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., Vrochidis, S. (eds.) MMM 2019. LNCS, vol. 11295, pp. 349–360. Springer, Cham (2019). Scholar
  16. 16.
    Thomee, B., et al.: YFCC100M: the new data in multimedia research. Commun. ACM 59(2), 64–73 (2016)CrossRefGoogle Scholar
  17. 17.
    Tolias, G., Sicre, R., Jégou, H.: Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879 (2015)
  18. 18.
    Truong, T.-D., et al.: Video search based on semantic extraction and locally regional object proposal. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10705, pp. 451–456. Springer, Cham (2018). Scholar
  19. 19.
    Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40(6), 1452–1464 (2017)CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Paolo Bolettieri
    • 1
  • Fabio Carrara
    • 1
  • Franca Debole
    • 1
    Email author
  • Fabrizio Falchi
    • 1
  • Claudio Gennaro
    • 1
  • Lucia Vadicamo
    • 1
  • Claudio Vairo
    • 1
  1. 1.Institute of Information Science and TechnologiesItalian National Research Council (CNR)PisaItaly

Personalised recommendations