Local Feature Trajectories for Efficient Event-Based Indexing of Video Sequences

  • Nicolas Moënne-Loccoz
  • Eric Bruno
  • Stéphane Marchand-Maillet
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4071)


We address the problem of indexing video sequences according to the events they depict. While a number of approaches have been proposed to describe events, none is sufficiently generic and computationally efficient to be applied to event-based retrieval of video sequences within large databases. In this paper, we propose a novel index of video sequences that describes their dynamic content. This index relies on local feature trajectories estimated from the spatio-temporal volume of the video sequences. Its computation is efficient and makes no assumptions about either the represented events or the video sequences. Experiments on a standard corpus of video sequences show that this index classifies complex human activities as efficiently as state-of-the-art methods while being far more efficient at retrieving generic classes of events.


Video Sequence · Interest Point · Camera Motion · Equal Error Rate · Event Representation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nicolas Moënne-Loccoz (1)
  • Eric Bruno (1)
  • Stéphane Marchand-Maillet (1)
  1. University of Geneva, Geneva 4, Switzerland
