Abstract
In this paper, we present the newest version of our interactive video browser tool Vibro. For this iteration, we focused on four changes: improving the user interface to make temporal search more accessible, upgrading the shot-detection algorithm, replacing the keyword-based search with rich text input, and reducing query times by applying a graph-based approximate nearest neighbor search method. With these updates, we expect Vibro to handle the large data volumes of upcoming Video Browser Showdown (VBS) competitions and to achieve competitive results.
Keywords
- Content-based video retrieval
- Exploration
- Visualization
- Image browsing
- Visual and textual co-embeddings
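The abstract does not say which graph-based approximate nearest neighbor method Vibro uses, so the following is only a generic sketch of the idea, not the authors' implementation. It builds a small brute-force neighbor graph and answers a query with a best-first walk over that graph instead of scanning every vector. The names `build_knn_graph` and `graph_search`, the `beam` parameter, and the toy 2D data are all assumptions made for illustration.

```python
import heapq
import math


def build_knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph (O(n^2), illustration only)."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)  # add the reverse edge so the graph is undirected
    return graph


def graph_search(points, graph, query, entry, k=1, beam=8):
    """Best-first search over the graph; returns up to k (distance, index) pairs."""
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: best `beam` nodes so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= beam and d > -results[0][0]:
            break  # closest candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, points[nb])
            if len(results) < beam or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > beam:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-d, i) for d, i in results)[:k]


# Toy data (hypothetical): 50 points on a line, searched from entry node 0.
points = [(float(i), 0.0) for i in range(50)]
graph = build_knn_graph(points, k=2)
nearest = graph_search(points, graph, query=(37.2, 0.0), entry=0, k=3)
print([index for _, index in nearest])
```

The search only computes distances for nodes it reaches through graph edges, which is where the reduction in query time comes from on large collections. The result is approximate in general: a larger `beam` trades speed for recall.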
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Hezel, N., Schall, K., Jung, K., Barthel, K.U. (2022). Efficient Search and Browsing of Large-Scale Video Collections with Vibro. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_43
DOI: https://doi.org/10.1007/978-3-030-98355-0_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98354-3
Online ISBN: 978-3-030-98355-0
eBook Packages: Computer Science, Computer Science (R0)