Abstract
In multimedia search, appropriate user interfaces (UIs) are essential to enable effective specification of the user’s information needs and the user-friendly presentation of search results. vitrivr-VR addresses these challenges and provides a novel Virtual Reality-based UI on top of the multimedia retrieval system vitrivr. In this paper we present the version of vitrivr-VR participating in the Video Browser Showdown (VBS) 2022. We describe our visual-text co-embedding feature and new query interfaces, namely text entry, pose queries and temporal queries.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Cer, D., et al.: Universal sentence encoder. CoRR (2018)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: ImageNet: a large-scale hierarchical image database. In: Conference on Computer Vision and Pattern Recognition (2009)
Faghri, F., Fleet, D.J., Kiros, J.R., Fidler, S.: VSE++: improving visual-semantic embeddings with hard negatives. In: British Machine Vision Conference 2018 (2018)
Gasser, R., Rossetto, L., Heller, S., Schuldt, H.: Cottontail DB: an open source database system for multimedia retrieval and analysis. In: International Conference on Multimedia (2020)
Heller, S., et al.: Multi-modal interactive video retrieval with temporal queries. In: International Conference on Multimedia Modeling (2022)
Heller, S., Sauter, L., Schuldt, H., Rossetto, L.: Multi-stage queries and temporal scoring in vitrivr. In: International Conference on Multimedia & Expo Workshops (2020)
Li, X., Xu, C., Yang, G., Chen, Z., Dong, J.: W2VV++: fully deep learning for ad-hoc video search. In: International Conference on Multimedia (2019)
Li, Y., Song, Y., Cao, L., Tetreault, J.R., Goldberg, L., Jaimes, A., Luo, J.: TGIF: A new dataset and benchmark on animated GIF description. In: Conference on Computer Vision and Pattern Recognition (2016)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Lokoč, J., et al.: Is the reign of interactive search eternal? Findings from the video browser showdown 2020. In: ACM TOMM (2021)
Rossetto, L., Giangreco, I., Schuldt, H.: Cineast: a multi-feature sketch-based video retrieval engine. In: IEEE International Symposium on Multimedia (2014)
Rossetto, L., Schuldt, H., Awad, G., Butt, A.A.: V3C - a research video collection. In: International Conference on Multimedia Modeling (2019)
Sidorov, O., Hu, R., Rohrbach, M., Singh, A.: TextCaps: a dataset for image captioning with reading comprehension. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12347, pp. 742–758. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58536-5_44
Spiess, F., Gasser, R., Heller, S., Rossetto, L., Sauter, L., Schuldt, H.: Competitive interactive video retrieval in virtual reality with vitrivr-VR. In: Lokoč, J., et al. (eds.) MMM 2021. LNCS, vol. 12573, pp. 441–447. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-67835-7_42
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: AAAI Conference on Artificial Intelligence (2017)
Tran, L., et al.: A VR interface for browsing visual spaces at VBS2021. In: International Conference on Multimedia Modeling (2021)
Wang, X., Wu, J., Chen, J., Li, L., Wang, Y., Wang, W.Y.: Vatex: a large-scale, high-quality multilingual dataset for video-and-language research. In: International Conference on Computer Vision (2019)
Xu, J., Mei, T., Yao, T., Rui, Y.: MSR-VTT: a large video description dataset for bridging video and language. In: Conference on Computer Vision and Pattern Recognition (2016)
Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguistics 2, 67–78 (2014)
Acknowledgements
This work was partly supported by the Swiss National Science Foundation (project “Participatory Knowledge Practices in Analog and Digital Image Archives”, contract no. CRSII5_193788).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Spiess, F. et al. (2022). Multi-modal Video Retrieval in Virtual Reality with vitrivr-VR. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_45
Download citation
DOI: https://doi.org/10.1007/978-3-030-98355-0_45
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98354-3
Online ISBN: 978-3-030-98355-0
eBook Packages: Computer ScienceComputer Science (R0)