Human Action Recognition by SOM Considering the Probability of Spatio-temporal Features

  • Yanli Ji
  • Atsushi Shimada
  • Rin-ichiro Taniguchi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6444)


This paper presents an action recognition system that uses a compact 3D descriptor to represent action information and a self-organizing map (SOM) to learn and recognize actions. Among the descriptors currently used for action recognition, Histogram Of Gradient 3D (HOG3D) performs comparatively well. However, it is costly to compute, and it uses a 960-element vector to describe a single interest point. We therefore propose a compact descriptor that shortens the support region of each interest point and combines symmetric bins after orientation quantization. In addition, only the top-valued bin of the quantized vector is kept, which avoids setting a threshold experimentally. Compared with HOG3D, our descriptor uses 80 bins to describe a point, which substantially reduces the computational complexity. The compact descriptor is used to learn and recognize actions in the SOM while taking the probability of local features into account, and the results show that our system outperforms others on both the KTH and Hollywood datasets.
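The two descriptor ideas named in the abstract, combining symmetric orientation bins and keeping only the top-valued bin, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes HOG3D-style quantization against the 20 face-center axes of a regular icosahedron, merges each antipodal pair into one bin (20 to 10), and assumes a layout of 8 cells per interest point so that 8 × 10 = 80 bins. The function names, the cell layout, the magnitude-weighted vote, and the final L2 normalization are all hypothetical choices made for illustration.

```python
import numpy as np

PHI = (1 + np.sqrt(5)) / 2  # golden ratio


def icosahedron_face_axes():
    """Return 20 unit vectors (face centers of a regular icosahedron)."""
    pts = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
    for a in (-1 / PHI, 1 / PHI):
        for b in (-PHI, PHI):
            pts += [(0, a, b), (a, b, 0), (b, 0, a)]
    pts = np.array(pts, dtype=float)
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)


def merge_symmetric_axes(axes):
    """Keep one axis from each antipodal pair (20 -> 10 bins)."""
    kept = []
    for v in axes:
        if not any(np.allclose(v, -u) for u in kept):
            kept.append(v)
    return np.array(kept)


def quantize_gradient(g, axes10):
    """Vote for the single strongest merged bin (no threshold needed)."""
    g = np.asarray(g, dtype=float)
    hist = np.zeros(len(axes10))
    mag = np.linalg.norm(g)
    if mag == 0:
        return hist
    proj = np.abs(axes10 @ g)        # |.| treats g and -g identically
    hist[int(np.argmax(proj))] = mag  # assumption: vote weighted by magnitude
    return hist


def compact_descriptor(cell_gradients, axes10):
    """Concatenate per-cell histograms; 8 cells x 10 bins = 80 values."""
    hists = [
        sum((quantize_gradient(g, axes10) for g in cell), np.zeros(len(axes10)))
        for cell in cell_gradients
    ]
    desc = np.concatenate(hists)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc


# Usage with synthetic gradients: 8 hypothetical cells, 5 gradients each.
axes10 = merge_symmetric_axes(icosahedron_face_axes())
rng = np.random.default_rng(0)
cells = [rng.normal(size=(5, 3)) for _ in range(8)]
desc = compact_descriptor(cells, axes10)
print(desc.shape)  # (80,)
```

Taking the absolute value of the projection is what merges opposite directions into one bin, and selecting the arg-max bin replaces the experimentally tuned threshold with a parameter-free rule.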


Computer vision, Human action recognition, SOM





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yanli Ji (1)
  • Atsushi Shimada (1)
  • Rin-ichiro Taniguchi (1)
  1. Department of Advanced Information Technology, Kyushu University, Fukuoka, Japan
