Evaluation of Local Descriptors for Action Recognition in Videos

  • Piotr Bilinski
  • Francois Bremond
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6962)

Abstract

Local descriptors have recently drawn considerable attention as a representation for action recognition. They capture both appearance and motion, are robust to viewpoint and scale changes, are easy to implement and fast to compute, and have been shown to achieve good performance for action classification in videos. Over the last years, many different local spatio-temporal descriptors have been proposed, but they are usually tested on different datasets and with different experimental protocols; moreover, the experiments often rely on assumptions that prevent a full evaluation of the descriptors. In this paper, we present a full evaluation of local spatio-temporal descriptors for action recognition in videos. We select four descriptors widely used in state-of-the-art approaches (HOG, HOF, HOG-HOF and HOG3D) and four video datasets, and test them within a framework based on the bag-of-words model and Support Vector Machines.
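
The abstract describes a standard bag-of-words pipeline: local spatio-temporal descriptors are quantized against a learned codebook, each video is represented by a histogram of visual words, and a Support Vector Machine performs the classification. The sketch below illustrates that pipeline only; it is not the authors' code, and the function names, codebook size, and SVM kernel are illustrative assumptions (scikit-learn's KMeans and SVC stand in for whatever clustering and SVM implementations were actually used).

```python
# Minimal bag-of-words + SVM sketch for action classification.
# Assumes local spatio-temporal descriptors (e.g. HOG/HOF or HOG3D vectors)
# have already been extracted for each video; all names below are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_codebook(train_descriptors, codebook_size=4000, seed=0):
    """Cluster training descriptors into a visual-word codebook."""
    kmeans = KMeans(n_clusters=codebook_size, random_state=seed, n_init=10)
    kmeans.fit(np.vstack(train_descriptors))
    return kmeans

def bow_histogram(video_descriptors, kmeans):
    """Quantize one video's descriptors and build a normalized word histogram."""
    words = kmeans.predict(video_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_and_classify(train_videos, train_labels, test_videos, codebook_size=4000):
    """train_videos / test_videos: lists of (n_i x d) descriptor arrays, one per video."""
    kmeans = build_codebook(train_videos, codebook_size)
    X_train = np.array([bow_histogram(v, kmeans) for v in train_videos])
    X_test = np.array([bow_histogram(v, kmeans) for v in test_videos])
    clf = SVC(kernel="rbf", C=10.0)  # non-linear SVM; kernel choice is an assumption
    clf.fit(X_train, train_labels)
    return clf.predict(X_test)
```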

Keywords

Support Vector Machine, Action Recognition, Local Descriptor, Codebook Size, Video Dataset

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Piotr Bilinski (1)
  • Francois Bremond (1)
  1. INRIA Sophia Antipolis - PULSAR group, Sophia Antipolis Cedex, France
