VIRET Tool Meets NasNet

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)


The results of the last Video Browser Showdown, held in Bangkok in 2018, show that multimodal search with interactive query reformulation is a competitive search strategy across all evaluated task categories. We therefore aim to improve the effectiveness of the underlying retrieval models by employing recent deep network architectures in the new version of our interactive video retrieval tool, VIRET. Specifically, we apply the NasNet deep convolutional neural network architecture for automatic annotation and similarity search over the set of frames selected from the provided video collection. In addition, we implement temporal sequence queries and subimage similarity search to give users greater flexibility in query formulation.
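As a rough illustration of similarity search and temporal sequence queries over frame descriptors, the following minimal sketch ranks frames by cosine similarity and then looks for a pair of frames matching two query descriptors in order. It is not the VIRET implementation: the function names, the toy 3-dimensional vectors (standing in for descriptors that a network such as NasNet might produce), the summed score, and the `window` parameter are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, frames, k):
    """Return the indices of the k frames most similar to the query, best first."""
    ranked = sorted(range(len(frames)),
                    key=lambda i: cosine(query, frames[i]),
                    reverse=True)
    return ranked[:k]

def temporal_query(first, second, frames, window):
    """Return the pair (i, j), i < j <= i + window, with the highest summed similarity.

    Frame i is scored against `first` and frame j against `second`.
    Summing the two similarities is an assumption of this sketch.
    """
    best, best_score = None, float("-inf")
    for i in range(len(frames)):
        s1 = cosine(first, frames[i])
        last_j = min(i + window, len(frames) - 1)
        for j in range(i + 1, last_j + 1):
            score = s1 + cosine(second, frames[j])
            if score > best_score:
                best, best_score = (i, j), score
    return best

# Hypothetical descriptors for five consecutive selected frames of one video.
frames = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]

nearest = top_k([1.0, 0.0, 0.0], frames, 2)
pair = temporal_query([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], frames, window=4)
```

The brute-force scan is only meant to make the two query types concrete; a real collection would call for an index or batched vector operations.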


Known-item search, Deep learning, NasNet, Interactive video retrieval



This paper has been supported in part by the Czech Science Foundation (GAČR) project No. 17-22224S and by Charles University grant SVV-260451.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. SIRET Research Group, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
