
Local Invariant Feature Tracks for High-Level Video Feature Extraction

Analysis, Retrieval and Delivery of Multimedia Content

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 158))

Abstract

In this work, the use of feature tracks for the detection of high-level features (concepts) in video is proposed. Extending previous work on local interest point detection and description in images, feature tracks are defined as sets of local interest points that are found in different frames of a video shot and exhibit spatio-temporal and visual continuity, thus defining a trajectory in the 2D+time space. These tracks jointly capture the spatial attributes of 2D local regions and their corresponding long-term motion. The extraction of feature tracks, together with the selection and representation of an appropriate subset of them, allows the generation of a Bag-of-Spatiotemporal-Words model for the shot, which captures the dynamics of video content. Experimental evaluation of the proposed approach on two challenging datasets (TRECVID 2007, TRECVID 2010) highlights how the selection, representation and use of such feature tracks enhance the results of traditional keyframe-based concept detection techniques.
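The pipeline the abstract describes can be sketched in a toy form: link interest points across consecutive frames into tracks using both a spatial-continuity and a visual-similarity criterion, represent each track by a joint appearance-and-motion vector, and quantize those vectors against a codebook into a Bag-of-Spatiotemporal-Words histogram. The sketch below is illustrative only, not the authors' implementation; the thresholds, the synthetic frame data, and the random codebook are all hypothetical stand-ins.

```python
# Illustrative sketch (not the chapter's actual method): feature-track
# linking and Bag-of-Spatiotemporal-Words quantization on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

def link_tracks(frames, max_move=5.0, max_desc_dist=0.5):
    """Greedily link points of consecutive frames into tracks.

    frames: list of (positions Nx2, descriptors NxD) per frame.
    A point extends a track only if it is both spatially close
    (spatio-temporal continuity) and visually similar (descriptor
    distance), mirroring the two continuity criteria described above.
    Thresholds are hypothetical toy values.
    """
    tracks = [[(0, i)] for i in range(len(frames[0][0]))]
    open_tracks = list(range(len(tracks)))
    for t in range(1, len(frames)):
        pos_prev, desc_prev = frames[t - 1]
        pos, desc = frames[t]
        still_open, used = [], set()
        for ti in open_tracks:
            _, j = tracks[ti][-1]
            best, best_d = None, np.inf  # nearest unused valid point
            for k in range(len(pos)):
                if k in used:
                    continue
                d_sp = np.linalg.norm(pos[k] - pos_prev[j])
                d_vis = np.linalg.norm(desc[k] - desc_prev[j])
                if d_sp < max_move and d_vis < max_desc_dist and d_sp < best_d:
                    best, best_d = k, d_sp
            if best is not None:
                tracks[ti].append((t, best))
                used.add(best)
                still_open.append(ti)
        open_tracks = still_open
    return tracks

def track_vector(track, frames):
    """Represent a track by its mean descriptor plus its average
    per-frame displacement, jointly encoding appearance and motion."""
    descs = np.stack([frames[t][1][j] for t, j in track])
    start = frames[track[0][0]][0][track[0][1]]
    end = frames[track[-1][0]][0][track[-1][1]]
    motion = (end - start) / max(len(track) - 1, 1)
    return np.concatenate([descs.mean(axis=0), motion])

def bag_of_st_words(track_vecs, codebook):
    """Assign each track vector to its nearest codeword and return the
    normalized histogram (the Bag-of-Spatiotemporal-Words model)."""
    hist = np.zeros(len(codebook))
    for v in track_vecs:
        hist[np.argmin(np.linalg.norm(codebook - v, axis=1))] += 1
    return hist / max(hist.sum(), 1)

# Synthetic shot: 6 points drifting ~1 px/frame over 5 frames,
# with small positional and descriptor noise.
D = 4  # toy descriptor dimensionality
base_pos = rng.uniform(0, 100, size=(6, 2))
base_desc = rng.normal(size=(6, D))
frames = [
    (base_pos + t * np.array([1.0, 0.0]) + rng.normal(0, 0.2, base_pos.shape),
     base_desc + rng.normal(0, 0.01, base_desc.shape))
    for t in range(5)
]
tracks = link_tracks(frames)
vecs = [track_vector(tr, frames) for tr in tracks if len(tr) == len(frames)]
codebook = rng.normal(size=(8, D + 2))  # hypothetical stand-in for a trained codebook
hist = bag_of_st_words(vecs, codebook)
```

In practice the codebook would be learned by clustering track vectors from training shots, and the resulting shot-level histograms would feed a concept classifier; the random codebook above merely stands in for that step.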


Notes

  1. http://www-nlpir.nist.gov/projects/trecvid/



Acknowledgments

This work was supported by the European Commission under contract FP7-248984 GLOCAL.

Author information


Corresponding author

Correspondence to Vasileios Mezaris.


Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Mezaris, V., Dimou, A., Kompatsiaris, I. (2013). Local Invariant Feature Tracks for High-Level Video Feature Extraction. In: Adami, N., Cavallaro, A., Leonardi, R., Migliorati, P. (eds) Analysis, Retrieval and Delivery of Multimedia Content. Lecture Notes in Electrical Engineering, vol 158. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3831-1_10


  • DOI: https://doi.org/10.1007/978-1-4614-3831-1_10


  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-3830-4

  • Online ISBN: 978-1-4614-3831-1

  • eBook Packages: Engineering, Engineering (R0)
