Effective video hyperlinking by means of enriched feature sets and monomodal query combinations

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

Video content has grown at an unprecedented rate in recent years, creating a need for better tools that give efficient access to specific content of interest. In video content management, hyperlinking aims to identify, for a given input video anchor, the related video segments in a collection. This paper describes the system we designed to address feature selection for the video hyperlinking challenge as defined by TRECVID, one of the leading international venues for multimedia benchmarking. The proposed solution is based on different combinations of textual and visual features, enriched to capture the various facets of the videos: automatically generated transcripts, visual concepts, video metadata, named-entity recognition, and concept-mapping techniques. The different combinations of monomodal queries are evaluated experimentally, and the impact of both parameters and individual features is discussed to identify their contributions. The best-performing approach at the TRECVID 2017 video hyperlinking challenge was ensemble feature selection, which combines three different monomodal queries based on enriched feature sets.
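To make the idea of combining monomodal queries concrete, the following is a minimal sketch of one common late-fusion scheme: each monomodal query returns scored candidate segments, the scores are min-max normalized per query, and a weighted sum produces the final ranking. The abstract does not state how the system actually merges its query results, so the fusion rule, the function names (`normalize`, `combine_monomodal`), the query names, the segment identifiers, and the scores are all illustrative assumptions.

```python
from collections import defaultdict


def normalize(scores):
    """Min-max normalize a {segment_id: score} mapping to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie: give each the same full score.
        return {seg: 1.0 for seg in scores}
    return {seg: (s - lo) / (hi - lo) for seg, s in scores.items()}


def combine_monomodal(results, weights=None, top_k=10):
    """Fuse per-query result lists by a weighted sum of normalized scores.

    `results` maps a (hypothetical) query name to {segment_id: raw_score}.
    This weighted-sum rule is an assumption, not the paper's stated method.
    """
    weights = weights or {name: 1.0 for name in results}
    fused = defaultdict(float)
    for name, scores in results.items():
        for seg, s in normalize(scores).items():
            fused[seg] += weights[name] * s
    # Sort by descending fused score, breaking ties by segment id.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Made-up scores for three monomodal queries over one anchor.
results = {
    "transcript": {"seg_a": 12.0, "seg_b": 7.0, "seg_c": 2.0},
    "visual_concepts": {"seg_b": 0.9, "seg_d": 0.4},
    "metadata": {"seg_a": 3.0, "seg_d": 3.0},
}
ranking = combine_monomodal(results, top_k=3)
for segment, score in ranking:
    print(segment, score)
```

Normalizing per query before summing keeps a query with a large raw score range (for example, text relevance scores) from dominating one whose scores lie in [0, 1]; the `weights` argument is where per-query importance could be tuned.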

Notes

  1. http://lucene.apache.org/solr.

  2. https://www.ranks.nl/stopwords.

Author information

Corresponding author

Correspondence to Daniele Apiletti.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Kavoosifar, M.R., Apiletti, D., Baralis, E. et al. Effective video hyperlinking by means of enriched feature sets and monomodal query combinations. Int J Multimed Info Retr 9, 215–227 (2020). https://doi.org/10.1007/s13735-019-00173-y

  • DOI: https://doi.org/10.1007/s13735-019-00173-y
