Effective video hyperlinking by means of enriched feature sets and monomodal query combinations

Kavoosifar, Mohammad Reza; Apiletti, Daniele; Baralis, Elena; Garza, Paolo; Huet, Benoit

doi:10.1007/s13735-019-00173-y

Regular Paper
Published: 10 June 2019

Volume 9, pages 215–227 (2020)

International Journal of Multimedia Information Retrieval

Mohammad Reza Kavoosifar¹,
Daniele Apiletti ORCID: orcid.org/0000-0003-0538-9775¹,
Elena Baralis¹,
Paolo Garza¹ &
Benoit Huet²

Abstract

Video content has been increasing at an unprecedented rate in recent years, creating the need for improved tools that provide efficient access to specific content of interest. Within video content management, hyperlinking aims to identify video segments in a collection that are related to an input video anchor. This paper describes the system we designed to address feature selection for the video hyperlinking challenge, as defined by TRECVID, one of the top worldwide venues for multimedia benchmarking. The proposed solution is based on different combinations of textual and visual features, enriched to capture the various facets of the videos: automatically generated transcripts, visual concepts, video metadata, named-entity recognition, and concept-mapping techniques. The different combinations of monomodal queries are experimentally evaluated, and the impact of both parameters and single features is discussed to identify their contributions. The best performing approach at the TRECVID 2017 video hyperlinking challenge was the ensemble feature selection, which includes three different monomodal queries based on enriched feature sets.
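As an illustration only, the idea of combining several monomodal queries can be sketched as a simple late fusion of ranked result lists. The abstract does not specify how the combination is performed, so everything below is an assumption made for the example: the function names, the min-max normalization, the weighted sum, and the sample segment identifiers and scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def normalize(scores):
    """Min-max normalize a {segment_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every segment as equally relevant.
        return {seg: 1.0 for seg in scores}
    return {seg: (s - lo) / (hi - lo) for seg, s in scores.items()}


def combine_monomodal(results, weights=None, top_k=10):
    """Fuse per-query results into a single ranked list.

    results: {query_name: {segment_id: raw_score}}, one entry per monomodal
             query (hypothetical structure, assumed for this sketch).
    weights: optional {query_name: weight}; defaults to equal weights.
    """
    weights = weights or {name: 1.0 for name in results}
    fused = defaultdict(float)
    for name, scores in results.items():
        for seg, s in normalize(scores).items():
            fused[seg] += weights.get(name, 0.0) * s
    # Sort by fused score (descending); break ties by segment id.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Made-up scores for three monomodal queries against one anchor.
example_results = {
    "transcript": {"v1_s3": 12.0, "v2_s1": 8.0, "v4_s2": 4.0},
    "visual_concepts": {"v2_s1": 0.9, "v3_s5": 0.6, "v1_s3": 0.3},
    "metadata": {"v2_s1": 2.0, "v1_s3": 1.0},
}

if __name__ == "__main__":
    for segment, score in combine_monomodal(example_results):
        print(segment, round(score, 3))
```

Normalizing each list before summing keeps a query whose engine returns large raw scores from dominating the others; the weights then offer one possible way to study the contribution of each monomodal query.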

Author information

Authors and Affiliations

Dipartimento di Automatica e Informatica, Politecnico di Torino, Turin, Italy
Mohammad Reza Kavoosifar, Daniele Apiletti, Elena Baralis & Paolo Garza
EURECOM, Sophia Antipolis, France
Benoit Huet

Corresponding author

Correspondence to Daniele Apiletti.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kavoosifar, M.R., Apiletti, D., Baralis, E. et al. Effective video hyperlinking by means of enriched feature sets and monomodal query combinations. Int J Multimed Info Retr 9, 215–227 (2020). https://doi.org/10.1007/s13735-019-00173-y

Received: 23 January 2019
Revised: 25 April 2019
Accepted: 31 May 2019
Published: 10 June 2019
Issue Date: September 2020
DOI: https://doi.org/10.1007/s13735-019-00173-y
