Abstract
The volume of video content has grown at an unprecedented rate in recent years, creating a need for better tools that give efficient access to specific content of interest. Within video content management, hyperlinking aims to identify video segments in a collection that are related to an input video anchor. This paper describes the system we designed to address feature selection for the video hyperlinking challenge, as defined by TRECVID, one of the top worldwide venues for multimedia benchmarking. The proposed solution is based on different combinations of textual and visual features, enriched to capture the various facets of the videos: automatically generated transcripts, visual concepts, video metadata, named-entity recognition, and concept-mapping techniques. The different combinations of monomodal queries are experimentally evaluated, and the impact of both parameters and single features is discussed to identify their contributions. The best-performing approach at the TRECVID 2017 video hyperlinking challenge was the ensemble feature selection, which includes three different monomodal queries based on enriched feature sets.
Cite this article
Kavoosifar, M.R., Apiletti, D., Baralis, E. et al. Effective video hyperlinking by means of enriched feature sets and monomodal query combinations. Int J Multimed Info Retr 9, 215–227 (2020). https://doi.org/10.1007/s13735-019-00173-y
DOI: https://doi.org/10.1007/s13735-019-00173-y