Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)


This paper presents VISIONE, a tool for large-scale video search. The tool supports both known-item and ad-hoc video search tasks because it integrates several content-based analysis and retrieval modules, including keyword search, spatial object-based search, and visual similarity search. Our implementation relies on state-of-the-art deep learning approaches for content analysis and leverages highly efficient indexing techniques to ensure scalability. Specifically, we encode all the visual and textual descriptors extracted from the videos into (surrogate) textual representations, which are then efficiently indexed and searched using an off-the-shelf text search engine.
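The idea of a surrogate textual representation can be illustrated with a minimal sketch. The abstract does not specify the encoding, so everything here is an assumption made for illustration: each component of a non-negative feature vector is mapped to a synthetic token that is repeated in proportion to its value, so that a term-frequency dot product approximates the inner product of the original vectors. The names `surrogate_text` and `tf_score`, the `scale` parameter, and the toy vectors are hypothetical and not taken from the paper.

```python
from collections import Counter


def surrogate_text(vector, scale=10, prefix="f"):
    """Encode a non-negative feature vector as a space-separated token string.

    Assumption (not from the paper): component i becomes the token
    f"{prefix}{i}", repeated round(scale * value) times, so that term
    frequencies approximate the scaled vector components.
    """
    tokens = []
    for i, value in enumerate(vector):
        repeats = int(round(scale * max(value, 0.0)))
        tokens.extend([f"{prefix}{i}"] * repeats)
    return " ".join(tokens)


def tf_score(query_text, doc_text):
    """Dot product of term-frequency vectors.

    A simplified stand-in for the scoring of a text search engine.
    """
    q = Counter(query_text.split())
    d = Counter(doc_text.split())
    return sum(q[t] * d[t] for t in q)


# Hypothetical toy descriptors for two video frames and one query.
frame_a = surrogate_text([0.5, 0.0, 0.2])
frame_b = surrogate_text([0.0, 0.4, 0.1])
query = surrogate_text([0.4, 0.0, 0.3])

# Rank the frames by their score against the query, highest first.
ranking = sorted(
    [("frame_a", tf_score(query, frame_a)), ("frame_b", tf_score(query, frame_b))],
    key=lambda pair: pair[1],
    reverse=True,
)
```

In a real deployment the token strings would be handed to a full-text engine rather than scored in Python; the sketch only shows why textual term frequencies can stand in for numeric feature values.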


Keywords: Content-based video retrieval · Video search · Known item search · Convolutional neural networks



This work was partially funded by “Smart News: Social sensing for breaking news”, CUP CIPE D58C15000270008, by VISECH ARCO-CNR, CUP B56J17001330004, and by “Automatic Data and documents Analysis to enhance human-based processes” (ADA), CUP CIPE D55F17000290009. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute of Information Science and Technologies (ISTI), Italian National Research Council (CNR), Pisa, Italy
