International Journal of Computer Vision, Volume 64, Issue 2–3, pp 107–123

On Space-Time Interest Points

Abstract

Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for interpretation of spatio-temporal events.

To detect spatio-temporal events, we build on the idea of the Harris and Förstner interest point operators and detect local structures in space-time where the image values have significant local variations in both space and time. We estimate the spatio-temporal extents of the detected events by maximizing a normalized spatio-temporal Laplacian operator over spatial and temporal scales. To represent the detected events, we then compute local, spatio-temporal, scale-invariant N-jets and classify each event with respect to its jet descriptor. For the problem of human motion analysis, we illustrate how a video representation in terms of local space-time features allows for detection of walking people in scenes with occlusions and dynamic cluttered backgrounds.
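The detection step described above can be illustrated with a minimal sketch. It assumes the response takes the form H = det(μ) − k·trace(μ)³, the three-dimensional analogue of the Harris corner measure, where μ is the Gaussian-weighted second-moment matrix of the spatio-temporal gradient. The function names, the default scales, the integration factor `s`, and the constant `k = 0.005` are illustrative assumptions and are not taken from the abstract.

```python
import numpy as np


def gaussian_kernel(std):
    """Sampled 1-D Gaussian, truncated at about 3 standard deviations."""
    radius = max(1, int(np.ceil(3.0 * std)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / std) ** 2)
    return kernel / kernel.sum()


def smooth(volume, sigma, tau):
    """Separable Gaussian smoothing of a (t, y, x) volume.

    sigma is the spatial and tau the temporal standard deviation.
    Borders are handled by edge replication.
    """
    out = np.asarray(volume, dtype=float)
    for axis, std in ((0, tau), (1, sigma), (2, sigma)):
        kernel = gaussian_kernel(std)
        radius = len(kernel) // 2
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        out = np.apply_along_axis(np.convolve, axis, padded, kernel, mode="valid")
    return out


def spacetime_harris(volume, sigma=1.5, tau=1.5, s=2.0, k=0.005):
    """Harris-like response for a (t, y, x) volume (assumed form, see lead-in)."""
    smoothed = smooth(volume, sigma, tau)
    # np.gradient returns derivatives in axis order: t, y, x.
    Lt, Ly, Lx = np.gradient(smoothed)
    grads = (Lx, Ly, Lt)
    # Integration scales are taken as s times the local variances (assumption).
    sigma_i, tau_i = np.sqrt(s) * sigma, np.sqrt(s) * tau
    mu = np.empty(smoothed.shape + (3, 3))
    for i in range(3):
        for j in range(i, 3):
            mu[..., i, j] = mu[..., j, i] = smooth(grads[i] * grads[j], sigma_i, tau_i)
    det = np.linalg.det(mu)
    trace = np.trace(mu, axis1=-2, axis2=-1)
    return det - k * trace ** 3


def blinking_blob(size=21, std=2.0):
    """Synthetic volume: a Gaussian dot that appears and then vanishes."""
    axis = np.arange(size) - size // 2
    t, y, x = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.exp(-(t ** 2 + y ** 2 + x ** 2) / (2.0 * std ** 2))


response = spacetime_harris(blinking_blob())
peak = np.unravel_index(np.argmax(response), response.shape)
```

Candidate events would be the positive local maxima of the returned response. A structure that does not change over time has no temporal gradient, so μ is rank-deficient there and the response is not positive. The scale selection with the normalized Laplacian and the jet descriptors are outside the scope of this sketch.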

Keywords

interest points, scale-space, video interpretation, matching, scale selection

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. IRISA/INRIA, Rennes Cedex, France
