VERGE in VBS 2020

  • Stelios Andreadis
  • Anastasia Moumtzidou
  • Konstantinos Apostolidis
  • Konstantinos Gkountakos
  • Damianos Galanopoulos
  • Emmanouil Michail
  • Ilias Gialampoukidis
  • Stefanos Vrochidis
  • Vasileios Mezaris
  • Ioannis Kompatsiaris
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11962)

Abstract

This paper demonstrates VERGE, an interactive video retrieval engine for browsing a collection of images or videos and searching for specific content. The engine integrates a multitude of retrieval methodologies, including visual and textual search, as well as further capabilities such as fusion and reranking of results. All search options and results are presented in a web application that aims to offer a user-friendly experience.
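
As a rough illustration of what fusion and reranking of multimodal results can look like in practice, the sketch below merges the ranked outputs of a visual and a textual search module using reciprocal rank fusion. This is a generic baseline under assumed names (reciprocal_rank_fusion, hypothetical shot IDs), not the fusion method actually implemented in VERGE, which the abstract does not specify.

    # Illustrative sketch only: VERGE's abstract mentions fusing and reranking
    # results from multiple search modules, but does not describe the method.
    # Reciprocal rank fusion (RRF) is shown here as one common, simple way to
    # merge ranked lists; all identifiers below are hypothetical.

    def reciprocal_rank_fusion(ranked_lists, k=60):
        """Merge several best-first ranked lists of shot IDs into one list."""
        scores = {}
        for ranking in ranked_lists:
            for rank, shot_id in enumerate(ranking, start=1):
                # Each list contributes 1/(k + rank); the constant k dampens
                # the influence of top ranks from any single list.
                scores[shot_id] = scores.get(shot_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    visual_results = ["shot_12", "shot_07", "shot_33"]   # hypothetical visual search output
    textual_results = ["shot_07", "shot_41", "shot_12"]  # hypothetical textual search output
    print(reciprocal_rank_fusion([visual_results, textual_results]))
    # -> shot_07 and shot_12 rise to the top, since both modalities retrieved them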

Acknowledgements

This work was supported by the EU’s Horizon 2020 research and innovation programme under grant agreements H2020-779962 V4Design, H2020-786731 CONNEXIONs and H2020-780656 ReTV.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
