Hierarchical Late Fusion for Concept Detection in Videos

Strat, Sabin Tiberius; Benoit, Alexandre; Lambert, Patrick; Bredin, Hervé; Quénot, Georges

doi:10.1007/978-3-319-05696-8_3

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Current research shows that detecting semantic concepts (e.g., animal, bus, person, dancing) in multimedia documents such as videos requires several types of complementary descriptors to achieve good results. In this work, we explore strategies for efficiently combining dozens of complementary content descriptors (or “experts”) through late fusion. We explore two fusion approaches that share a common structure: both start by clustering the experts, continue with an intra-cluster fusion, and finish with an inter-cluster fusion. We also experiment with other state-of-the-art methods. The first fusion approach relies on a priori knowledge about the internals of each expert to group the available experts by similarity. The second approach automatically derives similarity measures between experts from their outputs, groups the experts using agglomerative clustering, and then combines the results of this fusion with those from other methods. Finally, we show that an additional performance boost can be obtained by also considering the context of multimedia elements.
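The three-stage structure described in the abstract (clustering of experts, intra-cluster fusion, inter-cluster fusion) can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The abstract does not specify the similarity measure, the linkage, the score normalization, or the fusion operators, so the sketch assumes Pearson correlation between expert outputs, average linkage with a hypothetical `min_similarity` stopping threshold, min-max normalization, and plain arithmetic means at both fusion levels. All function names and the toy scores are illustrative.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant expert carries no ranking information
    return cov / (var_x * var_y) ** 0.5


def min_max(scores):
    """Rescale one expert's scores to [0, 1] so experts are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def cluster_experts(scores, min_similarity):
    """Greedy average-linkage agglomerative clustering of experts.

    scores maps an expert name to its list of per-shot scores.
    Merging stops when no pair of clusters reaches min_similarity
    (an assumed stopping rule, not taken from the chapter).
    """
    clusters = [[name] for name in sorted(scores)]

    def linkage(a, b):
        return mean(pearson(scores[i], scores[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(a, b) < min_similarity:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)


def average(score_lists):
    """Element-wise mean of several equal-length score lists."""
    return [mean(column) for column in zip(*score_lists)]


def hierarchical_late_fusion(scores, min_similarity=0.8):
    normalized = {name: min_max(s) for name, s in scores.items()}
    clusters = cluster_experts(normalized, min_similarity)
    # Intra-cluster fusion: one fused score list per cluster.
    per_cluster = [average([normalized[n] for n in c]) for c in clusters]
    # Inter-cluster fusion: every cluster gets equal weight.
    return clusters, average(per_cluster)


# Toy data: three hypothetical experts scoring four video shots.
toy_scores = {
    "color_a": [0.9, 0.8, 0.2, 0.1],
    "color_b": [0.8, 0.9, 0.1, 0.2],
    "motion": [0.2, 0.9, 0.8, 0.1],
}
clusters, fused = hierarchical_late_fusion(toy_scores)
print(clusters)
print(fused)
```

In this toy run the two strongly correlated colour experts end up in one cluster and the motion expert stays alone. The inter-cluster average therefore gives the motion expert the same weight as the colour pair combined, whereas a flat average over all three experts would let the two near-duplicate experts dominate.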

Notes

1. TREC Video Retrieval Evaluation, http://trecvid.nist.gov/
2. http://mrim.imag.fr/irim/
3. http://mrim.imag.fr/georges.quenot/freesoft/knnlsb/index.html

Acknowledgments

This work was supported by the Quaero Program and the QCompere project, respectively funded by OSEO (French State agency for innovation) and ANR (French national research agency). The authors would also like to thank the members of the IRIM consortium for the expert scores used throughout the experiments described in this paper.

Author information

Authors and Affiliations

LISTIC—University of Savoie, Annecy, France
Sabin Tiberius Strat, Alexandre Benoit & Patrick Lambert
LAPI—University “POLITEHNICA” of Bucharest, Bucharest, Romania
Sabin Tiberius Strat
CNRS-LIMSI, Orsay, France
Hervé Bredin
UJF-Grenoble 1 / UPMF-Grenoble 2 / Grenoble INP / CNRS, LIG UMR 5217, 38041, Grenoble, France
Georges Quénot

Corresponding author

Correspondence to Sabin Tiberius Strat.

Editor information

Editors and Affiliations

University Politehnica of Bucharest, Romania
Bogdan Ionescu
University of Bordeaux, Talence, France
Jenny Benois-Pineau
Queen Mary University of London, London, United Kingdom
Tomas Piatrik
Lab. of Informatics of Grenoble, France
Georges Quénot

About this chapter

Cite this chapter

Strat, S.T., Benoit, A., Lambert, P., Bredin, H., Quénot, G. (2014). Hierarchical Late Fusion for Concept Detection in Videos. In: Ionescu, B., Benois-Pineau, J., Piatrik, T., Quénot, G. (eds) Fusion in Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-05696-8_3

DOI: https://doi.org/10.1007/978-3-319-05696-8_3
Published: 26 March 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05695-1
Online ISBN: 978-3-319-05696-8
eBook Packages: Computer Science, Computer Science (R0)
