Multi-modal Video Retrieval in Virtual Reality with vitrivr-VR

Spiess, Florian; Gasser, Ralph; Heller, Silvan; Parian-Scherb, Mahnaz; Rossetto, Luca; Sauter, Loris; Schuldt, Heiko

doi:10.1007/978-3-030-98355-0_45

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13142)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

In multimedia search, appropriate user interfaces (UIs) are essential for effectively specifying the user’s information needs and for presenting search results in a user-friendly way. vitrivr-VR addresses these challenges by providing a novel Virtual Reality-based UI on top of the multimedia retrieval system vitrivr. In this paper, we present the version of vitrivr-VR participating in the Video Browser Showdown (VBS) 2022. We describe our visual-text co-embedding feature and new query interfaces, namely text entry, pose queries, and temporal queries.

Acknowledgements

This work was partly supported by the Swiss National Science Foundation (project “Participatory Knowledge Practices in Analog and Digital Image Archives”, contract no. CRSII5_193788).

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Basel, Basel, Switzerland
Florian Spiess, Ralph Gasser, Silvan Heller, Mahnaz Parian-Scherb, Loris Sauter & Heiko Schuldt
Department of Informatics, University of Zurich, Zurich, Switzerland
Luca Rossetto

Corresponding author

Correspondence to Florian Spiess.

Editor information

Editors and Affiliations

IT University of Copenhagen, Copenhagen, Denmark
Björn Þór Jónsson
Dublin City University, Dublin, Ireland
Cathal Gurrin
University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
Minh-Triet Tran
University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
National Tsing Hua University, Hsinchu, Taiwan
Anita Min-Chun Hu
Hanoi University of Science and Technology, Hanoi, Vietnam
Binh Huynh Thi Thanh
Median Technologies, Valbonne, France
Benoit Huet

Cite this paper

Spiess, F. et al. (2022). Multi-modal Video Retrieval in Virtual Reality with vitrivr-VR. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_45

DOI: https://doi.org/10.1007/978-3-030-98355-0_45
Published: 15 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98354-3
Online ISBN: 978-3-030-98355-0
eBook Packages: Computer Science, Computer Science (R0)
