An Interactive Video Search Platform for Multi-modal Retrieval with Advanced Concepts

  • Nguyen-Khang Le
  • Dieu-Hien Nguyen
  • Minh-Triet Tran
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11962)


The previous version of our retrieval system achieved notable results in retrieval tasks such as the Lifelog moment retrieval tasks. In this paper, we adapt our platform to the Known-Item Search (KIS) and Ad-hoc Video Search (AVS) tasks of the Video Browser Showdown and present how the system performs in video search. In addition to the smart features of our retrieval system that take advantage of the provided analysis data, we enrich the data with object color detection by combining Mask R-CNN with clustering. In this version, we also extract the location of the entities appearing in the videos so that we can exploit the spatial relationships between them. Finally, we focus on designing efficient user interaction and a high-performance data transfer mechanism within the system to minimize retrieval time.
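The abstract names two enrichment steps, object color detection through clustering and spatial relationships between detected entities, without giving details. The sketch below is only one plausible reading: it assumes a binary instance mask (such as one produced by Mask R-CNN) is already available, uses plain k-means as the clustering method, and compares bounding-box centres for the spatial relation. The function names `masked_pixels`, `dominant_color`, and `horizontal_relation` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]            # (R, G, B)
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def masked_pixels(image: List[List[Pixel]], mask: List[List[int]]) -> List[Pixel]:
    """Collect the pixels of `image` at positions where `mask` is truthy."""
    return [px
            for img_row, mask_row in zip(image, mask)
            for px, keep in zip(img_row, mask_row) if keep]


def dominant_color(pixels: Sequence[Pixel], k: int = 2, iters: int = 10) -> Pixel:
    """Assumed approach: k-means over RGB values, returning the largest cluster's centre."""
    if not pixels:
        raise ValueError("mask selects no pixels")
    # Deterministic initialisation: pick k distinct colours spread over the sorted set.
    distinct = sorted(set(pixels))
    k = max(1, min(k, len(distinct)))
    centres = [distinct[i * (len(distinct) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters: List[List[Pixel]] = []
    for _ in range(max(1, iters)):
        # Assignment step: attach every pixel to its nearest centre.
        clusters = [[] for _ in centres]
        for px in pixels:
            nearest = min(range(len(centres)),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(px, centres[c])))
            clusters[nearest].append(px)
        # Update step: move each centre to the mean of its members.
        new_centres = [
            tuple(sum(channel) / len(members) for channel in zip(*members)) if members else centre
            for members, centre in zip(clusters, centres)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    largest = max(range(len(centres)), key=lambda c: len(clusters[c]))
    return tuple(round(v) for v in centres[largest])


def horizontal_relation(box_a: Box, box_b: Box) -> str:
    """Describe where box_a lies relative to box_b, comparing box centres on the x axis."""
    centre_a = (box_a[0] + box_a[2]) / 2
    centre_b = (box_b[0] + box_b[2]) / 2
    if centre_a < centre_b:
        return "left of"
    if centre_a > centre_b:
        return "right of"
    return "aligned with"


# Toy 2x3 frame: the mask covers a mostly red object with one blue pixel.
image = [[(255, 0, 0), (255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 0, 0), (0, 255, 0)]]
mask = [[1, 1, 0],
        [1, 1, 0]]
object_color = dominant_color(masked_pixels(image, mask))
relation = horizontal_relation((0, 0, 10, 10), (20, 0, 30, 10))
```

In a real pipeline the mask and boxes would come from the detector's output, and the RGB centre would likely be mapped to a color name before indexing; both steps are left out here because the abstract does not describe them.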


Keywords: Retrieval system, User interaction, Concept detection



This research is supported by the Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA19. We thank AIOZ Pte Ltd for supporting our research team with computing infrastructure.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Nguyen-Khang Le (1)
  • Dieu-Hien Nguyen (1)
  • Minh-Triet Tran (1)
  1. University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
