
VERGE in VBS 2020

Conference paper in MultiMedia Modeling (MMM 2020)

Abstract

This paper demonstrates VERGE, an interactive video retrieval engine for browsing a collection of images or videos and searching for specific content. The engine integrates a multitude of retrieval methodologies, including visual and textual search, along with further capabilities such as fusion and reranking. All search options and results are presented in a web application designed for a user-friendly experience.
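As a rough illustration of the fusion and reranking capability, the sketch below fuses the ranked result lists returned by two search modalities into a single reranked list. It is a minimal sketch, not VERGE's documented method: reciprocal rank fusion is assumed here purely as a stand-in, and the names reciprocal_rank_fusion, visual_hits, and textual_hits are hypothetical.

    # Minimal late-fusion sketch (an assumption, not VERGE's actual method).
    # Each modality (e.g., visual similarity, textual search) is assumed to
    # return a best-first ranked list of shot IDs.
    from collections import defaultdict

    def reciprocal_rank_fusion(ranked_lists, k=60):
        """Fuse several ranked lists into one reranked list via standard RRF."""
        scores = defaultdict(float)
        for ranking in ranked_lists:
            for rank, item in enumerate(ranking, start=1):
                scores[item] += 1.0 / (k + rank)  # better ranks contribute more
        return sorted(scores, key=scores.get, reverse=True)

    # Hypothetical usage: items retrieved by both modalities rise to the top.
    visual_hits = ["shot_42", "shot_7", "shot_13"]
    textual_hits = ["shot_7", "shot_99", "shot_42"]
    print(reciprocal_rank_fusion([visual_hits, textual_hits]))
    # -> ['shot_7', 'shot_42', 'shot_99', 'shot_13']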



Acknowledgements

This work was supported by the EU’s Horizon 2020 research and innovation programme under grant agreements H2020-779962 V4Design, H2020-786731 CONNEXIONs and H2020-780656 ReTV.

Author information

Correspondence to Stelios Andreadis.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Andreadis, S., et al. (2020). VERGE in VBS 2020. In: Ro, Y., et al. (eds.) MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol. 11962. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_69


  • DOI: https://doi.org/10.1007/978-3-030-37734-2_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37733-5

  • Online ISBN: 978-3-030-37734-2

  • eBook Packages: Computer Science (R0)
