SOM-Hunter: Video Browsing with Relevance-to-SOM Feedback Loop

  • Miroslav Kratochvíl
  • Patrik Veselý
  • František Mejzlík
  • Jakub Lokoč
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11962)


This paper presents a prototype video retrieval engine focused on a simple known-item search workflow: users initialize the search with a query and then iteratively explore a larger candidate set. Specifically, users observe a sequence of displays and provide feedback to the system. The displays are created dynamically by a self-organizing map that uses relevance scores derived from the collected feedback, so that each display matches the user's preferences. In addition, once promising candidates are found, users can inspect several other specialized displays for exploitation purposes.
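The feedback-driven display described above can be illustrated with a minimal sketch. It is not the SOM-Hunter implementation: the abstract gives no algorithmic details, so the sampling scheme, the decay schedules, and every name here (`som_display`, `features`, `scores`, `grid`, `iters`) are assumptions made for illustration. The sketch trains a small self-organizing map on frames sampled in proportion to their relevance scores, then picks one representative frame per grid cell.

```python
import numpy as np


def som_display(features, scores, grid=(4, 4), iters=2000, seed=0):
    """Hypothetical sketch: train a small SOM on frames sampled in
    proportion to their relevance scores and return one representative
    frame index per grid cell."""
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    rows, cols = grid

    # Grid coordinates of every SOM node, shape (rows * cols, 2).
    coords = np.array(
        [(r, c) for r in range(rows) for c in range(cols)], dtype=float
    )
    # Initialise node weights from randomly chosen frames.
    weights = features[rng.choice(n, size=rows * cols)].copy()
    # Assumption: scores are non-negative and not all zero.
    probs = scores / scores.sum()

    for t in range(iters):
        frac = t / iters
        lr = 0.5 * (1.0 - frac)                            # learning rate decays to 0
        sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # neighbourhood shrinks
        # Frames with higher scores are drawn more often.
        x = features[rng.choice(n, p=probs)]
        # Best-matching unit: the node closest to x in feature space.
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Gaussian neighbourhood measured on the 2D grid.
        grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)

    # For each node, display the frame whose features are nearest to it.
    d2 = ((weights[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(rows, cols)


# Toy usage with random data standing in for deep features and scores.
demo_rng = np.random.default_rng(42)
demo_features = demo_rng.normal(size=(200, 8))
demo_scores = demo_rng.random(200) + 1e-3
print(som_display(demo_features, demo_scores))
```

In this sketch, user feedback would enter only through `scores`: after each round of feedback the scores are updated and the map is retrained, so regions of feature space the user prefers receive more grid cells in the next display.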


Keywords: Interactive video retrieval, Deep features, Relevance feedback, Self-organizing maps



This paper was supported by the Czech Science Foundation (GAČR) project 19-22071Y and by Charles University grant SVV-260451. M.K. was supported by ELIXIR CZ (MEYS), grant number LM2015047.

We are extremely grateful to Vladimír Vondruš for his helpful advice on using the Magnum engine, and to Tomáš Souček and Gregor Kovalčík for their help with frame selection and feature extraction.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. SIRET Research Group, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
