An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector

  • Geert Willems
  • Tinne Tuytelaars
  • Luc Van Gool
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5303)


Over the years, several spatio-temporal interest point detectors have been proposed. Some extract only a sparse set of scale-invariant features, while others detect a larger number of features, but only at user-defined scales. This paper presents, for the first time, spatio-temporal interest points that are both scale-invariant (spatially and temporally) and densely cover the video content. Moreover, in contrast to earlier work, the features can be computed efficiently. Applying scale-space theory, we show that this can be achieved by using the determinant of the Hessian as the saliency measure. Computations are sped up further by using approximate box-filter operations on an integral video structure. A quantitative evaluation and experimental results on action recognition show the strengths of the proposed detector in terms of repeatability, accuracy and speed, in comparison with previously proposed detectors.
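
The two computational ingredients named in the abstract, an integral video and the determinant of the Hessian, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`integral_video`, `box_sum`, `hessian_determinant`), the (T, H, W) axis order, and the assumption that the Hessian is the 3×3 matrix of second derivatives in x, y and t are all illustrative choices. The paper's actual box-filter layouts and scale normalisation are not given in the abstract and are not reproduced here.

```python
import numpy as np


def integral_video(video):
    """Zero-padded 3D summed-volume table of a (T, H, W) array.

    iv[t, y, x] holds the sum of video[:t, :y, :x]. The extra leading
    zero plane on each axis avoids special cases at the borders.
    """
    iv = np.zeros(tuple(s + 1 for s in video.shape), dtype=np.float64)
    iv[1:, 1:, 1:] = video.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return iv


def box_sum(iv, t0, t1, y0, y1, x0, x1):
    """Sum of video[t0:t1, y0:y1, x0:x1] from eight table lookups.

    Uses 3D inclusion-exclusion, so the cost does not depend on the
    size of the box.
    """
    return (iv[t1, y1, x1]
            - iv[t0, y1, x1] - iv[t1, y0, x1] - iv[t1, y1, x0]
            + iv[t0, y0, x1] + iv[t0, y1, x0] + iv[t1, y0, x0]
            - iv[t0, y0, x0])


def hessian_determinant(dxx, dyy, dtt, dxy, dxt, dyt):
    """Determinant of the symmetric 3x3 Hessian
    [[dxx, dxy, dxt], [dxy, dyy, dyt], [dxt, dyt, dtt]].

    Arguments may be scalars or arrays of second-derivative responses
    (assumed here to come from box-filter approximations).
    """
    return (dxx * (dyy * dtt - dyt ** 2)
            - dxy * (dxy * dtt - dyt * dxt)
            + dxt * (dxy * dyt - dyy * dxt))
```

Because every box sum costs eight lookups regardless of box size, filter responses at large scales are as cheap as at small ones, which is what makes a dense multi-scale search affordable in a scheme of this kind.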


Keywords

Action Recognition, Interest Point, Interest Point Detector, Scale Selection, Saliency Measure



Supplementary material

978-3-540-88688-4_48_MOESM1_ESM.avi (6.8 MB)
Supplementary material (14,036 KB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Geert Willems (1)
  • Tinne Tuytelaars (1)
  • Luc Van Gool (1, 2)

  1. ESAT-PSI, K.U. Leuven, Belgium
  2. ETH, Zürich, Switzerland
