IVIST: Interactive VIdeo Search Tool in VBS 2020

  • Sungjune Park
  • Jaeyub Song
  • Minho Park
  • Yong Man Ro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11962)


This paper presents a new video retrieval tool, the Interactive VIdeo Search Tool (IVIST), which participates in the 2020 Video Browser Showdown (VBS). IVIST is equipped with several high-performing retrieval functionalities: object detection, dominant-color finding, scene-text recognition, and text-image retrieval, all built on deep neural networks. With these functionalities, IVIST performs well at finding the videos users are looking for. Furthermore, thanks to its user-friendly interface, IVIST is easy to use even for novice users. Although IVIST was developed to participate in VBS, we hope it will also serve as a practical video retrieval tool in the future, handling real video data on the Internet.
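The paper does not detail how each module is implemented, but dominant-color finding is commonly done by clustering pixel colors and reporting the centroid of the largest cluster. A minimal sketch of that idea (not IVIST's actual code) using a tiny k-means over a synthetic image:

```python
import numpy as np

def dominant_color(pixels, k=3, iters=10, seed=0):
    """Return the RGB centroid of the largest color cluster.

    pixels: (N, 3) float array of RGB values in [0, 255].
    A small hand-rolled k-means keeps the sketch dependency-free.
    """
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign every pixel to its nearest center (Euclidean distance).
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned pixels.
        for j in range(k):
            if (labels == j).any():
                centers[j] = pixels[labels == j].mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    return centers[counts.argmax()]

# Synthetic "frame": 900 red pixels and 100 blue pixels.
img = np.vstack([np.tile([255.0, 0.0, 0.0], (900, 1)),
                 np.tile([0.0, 0.0, 255.0], (100, 1))])
print(dominant_color(img))  # close to [255, 0, 0]
```

In a real tool the clustering would run over downsampled video keyframes, and the resulting color would be indexed so users can filter shots by dominant color.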


Keywords: Video Browser Showdown (VBS) · Video retrieval tool · Text-image retrieval



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Image and Video Systems Lab, School of Electrical Engineering, KAIST, Daejeon, South Korea