
Local Invariant Feature Tracks for High-Level Video Feature Extraction

Analysis, Retrieval and Delivery of Multimedia Content

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 158))

Abstract

In this work, the use of feature tracks for the detection of high-level features (concepts) in video is proposed. Extending previous work on local interest point detection and description in images, feature tracks are defined as sets of local interest points that are found in different frames of a video shot and exhibit spatio-temporal and visual continuity, thus defining a trajectory in the 2D+time space. These tracks jointly capture the spatial attributes of 2D local regions and their corresponding long-term motion. The extraction of feature tracks, together with the selection and representation of an appropriate subset of them, allows the generation of a Bag-of-Spatiotemporal-Words model for the shot, which captures the dynamics of video content. Experimental evaluation of the proposed approach on two challenging datasets (TRECVID 2007, TRECVID 2010) highlights how the selection, representation and use of such feature tracks enhance the results of traditional keyframe-based concept detection techniques.
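The pipeline the abstract describes can be sketched in a toy form: link interest points across consecutive frames into tracks using both a spatial-continuity and a visual-similarity criterion, represent each track by a joint appearance-and-motion vector, and quantize those vectors against a codebook into a Bag-of-Spatiotemporal-Words histogram. The sketch below is illustrative only, not the authors' implementation; the thresholds, the synthetic frame data, and the random codebook are all hypothetical stand-ins.

```python
# Illustrative sketch (not the chapter's actual method): feature-track
# linking and Bag-of-Spatiotemporal-Words quantization on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

def link_tracks(frames, max_move=5.0, max_desc_dist=0.5):
    """Greedily link points of consecutive frames into tracks.

    frames: list of (positions Nx2, descriptors NxD) per frame.
    A point extends a track only if it is both spatially close
    (spatio-temporal continuity) and visually similar (descriptor
    distance), mirroring the two continuity criteria described above.
    Thresholds are hypothetical toy values.
    """
    tracks = [[(0, i)] for i in range(len(frames[0][0]))]
    open_tracks = list(range(len(tracks)))
    for t in range(1, len(frames)):
        pos_prev, desc_prev = frames[t - 1]
        pos, desc = frames[t]
        still_open, used = [], set()
        for ti in open_tracks:
            _, j = tracks[ti][-1]
            best, best_d = None, np.inf  # nearest unused valid point
            for k in range(len(pos)):
                if k in used:
                    continue
                d_sp = np.linalg.norm(pos[k] - pos_prev[j])
                d_vis = np.linalg.norm(desc[k] - desc_prev[j])
                if d_sp < max_move and d_vis < max_desc_dist and d_sp < best_d:
                    best, best_d = k, d_sp
            if best is not None:
                tracks[ti].append((t, best))
                used.add(best)
                still_open.append(ti)
        open_tracks = still_open
    return tracks

def track_vector(track, frames):
    """Represent a track by its mean descriptor plus its average
    per-frame displacement, jointly encoding appearance and motion."""
    descs = np.stack([frames[t][1][j] for t, j in track])
    start = frames[track[0][0]][0][track[0][1]]
    end = frames[track[-1][0]][0][track[-1][1]]
    motion = (end - start) / max(len(track) - 1, 1)
    return np.concatenate([descs.mean(axis=0), motion])

def bag_of_st_words(track_vecs, codebook):
    """Assign each track vector to its nearest codeword and return the
    normalized histogram (the Bag-of-Spatiotemporal-Words model)."""
    hist = np.zeros(len(codebook))
    for v in track_vecs:
        hist[np.argmin(np.linalg.norm(codebook - v, axis=1))] += 1
    return hist / max(hist.sum(), 1)

# Synthetic shot: 6 points drifting ~1 px/frame over 5 frames,
# with small positional and descriptor noise.
D = 4  # toy descriptor dimensionality
base_pos = rng.uniform(0, 100, size=(6, 2))
base_desc = rng.normal(size=(6, D))
frames = [
    (base_pos + t * np.array([1.0, 0.0]) + rng.normal(0, 0.2, base_pos.shape),
     base_desc + rng.normal(0, 0.01, base_desc.shape))
    for t in range(5)
]
tracks = link_tracks(frames)
vecs = [track_vector(tr, frames) for tr in tracks if len(tr) == len(frames)]
codebook = rng.normal(size=(8, D + 2))  # hypothetical stand-in for a trained codebook
hist = bag_of_st_words(vecs, codebook)
```

In practice the codebook would be learned by clustering track vectors from training shots, and the resulting shot-level histograms would feed a concept classifier; the random codebook above merely stands in for that step.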


Notes

  1. http://www-nlpir.nist.gov/projects/trecvid/



Acknowledgments

This work was supported by the European Commission under contract FP7-248984 GLOCAL.

Author information


Corresponding author

Correspondence to Vasileios Mezaris.


Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Mezaris, V., Dimou, A., Kompatsiaris, I. (2013). Local Invariant Feature Tracks for High-Level Video Feature Extraction. In: Adami, N., Cavallaro, A., Leonardi, R., Migliorati, P. (eds) Analysis, Retrieval and Delivery of Multimedia Content. Lecture Notes in Electrical Engineering, vol 158. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3831-1_10


  • DOI: https://doi.org/10.1007/978-1-4614-3831-1_10


  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-3830-4

  • Online ISBN: 978-1-4614-3831-1

  • eBook Packages: Engineering, Engineering (R0)
