An Unsupervised Framework for Action Recognition Using Actemes

  • Kaustubh Kulkarni
  • Edmond Boyer
  • Radu Horaud
  • Amit Kale
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6495)


In speech recognition, phonemes have proven effective for modeling the words of a language. While they are well defined for languages, their extension to human actions is not straightforward. In this paper, we study such an extension and propose an unsupervised framework to find phoneme-like units for actions, which we call actemes, using 3D data and without any prior assumptions. To this end, we build on a framework proposed earlier in the speech literature to automatically find actemes in the training data. We show experimentally that actions defined in terms of actemes and actions defined as whole units give similar recognition results. We also define actions outside the training set in terms of these actemes to test whether the actemes generalize to unseen actions. The results show that although the acteme definitions of the actions are not always semantically meaningful, they yield optimal recognition accuracy and constitute a promising direction of research for action modeling.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Kaustubh Kulkarni (1)
  • Edmond Boyer (1)
  • Radu Horaud (1)
  • Amit Kale (2)
  1. INRIA, Grenoble, France
  2. Siemens Corporate Technology, Bangalore, India
