Local Descriptors for Spatio-temporal Recognition

  • Ivan Laptev
  • Tony Lindeberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3667)

Abstract

This paper presents and investigates a set of local space-time descriptors for representing and recognizing motion patterns in video. Following the idea of local features in the spatial domain, we use the notion of space-time interest points and represent video data in terms of local space-time events. To describe such events, we define several types of image descriptors over local spatio-temporal neighborhoods and evaluate these descriptors in the context of recognizing human activities. In particular, we compare motion representations in terms of spatio-temporal jets, position dependent histograms, position independent histograms, and principal component analysis computed for either spatio-temporal gradients or optic flow. An experimental evaluation on a video database with human actions shows that high classification performance can be achieved, and that there is a clear advantage of using local position dependent histograms, consistent with previously reported findings regarding spatial recognition.
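
The difference between position-dependent and position-independent histograms of spatio-temporal gradients can be illustrated with a minimal sketch. It assumes a local neighborhood given as a small `(T, H, W)` array; the function name `gradient_histograms`, the 2×2×2 grid, the 8 bins, and the normalization are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def gradient_histograms(patch, grid=(2, 2, 2), bins=8):
    """Histogram descriptor of spatio-temporal gradients over a (T, H, W) patch.

    grid=(1, 1, 1) gives one position-independent histogram per gradient
    component; a finer grid gives position-dependent histograms (one set
    per cell, concatenated in a fixed order).
    """
    patch = np.asarray(patch, dtype=float)
    # First-order derivatives along time, y and x: shape (3, T, H, W).
    grads = np.stack(np.gradient(patch), axis=0)
    # Rescale so that all values lie in [-1, 1] (an assumed normalization).
    scale = np.abs(grads).max()
    if scale > 0:
        grads = grads / scale
    edges = np.linspace(-1.0, 1.0, bins + 1)

    # Cell boundaries along each axis of the patch.
    bounds = [np.linspace(0, n, g + 1).astype(int)
              for n, g in zip(patch.shape, grid)]
    parts = []
    for t0, t1 in zip(bounds[0][:-1], bounds[0][1:]):
        for y0, y1 in zip(bounds[1][:-1], bounds[1][1:]):
            for x0, x1 in zip(bounds[2][:-1], bounds[2][1:]):
                cell = grads[:, t0:t1, y0:y1, x0:x1]
                for component in cell:  # Lt, Ly, Lx in turn
                    hist, _ = np.histogram(component, bins=edges)
                    parts.append(hist)

    desc = np.concatenate(parts).astype(float)
    total = desc.sum()
    return desc / total if total > 0 else desc
```

Calling the function with `grid=(1, 1, 1)` discards where in the neighborhood each gradient value occurred, whereas the default grid keeps a coarse record of position in the descriptor.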

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ivan Laptev (1)
  • Tony Lindeberg (1)
  1. Computational Vision and Active Perception Laboratory (CVAP), Dept. of Numerical Analysis and Computing Science, KTH, Stockholm, Sweden
