An Unsupervised Framework for Action Recognition Using Actemes

  • Conference paper
Computer Vision – ACCV 2010 (ACCV 2010)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6495)

Abstract

In speech recognition, phonemes have proven effective for modeling the words of a language. While they are well defined for languages, their extension to human actions is not straightforward. In this paper, we study such an extension and propose an unsupervised framework to find phoneme-like units for actions, which we call actemes, using 3D data and without any prior assumptions. To this end, we build on a framework proposed earlier in the speech literature to automatically find actemes in the training data. We show experimentally that actions defined in terms of actemes and actions defined as whole units give similar recognition results. We also define actions outside the training set in terms of these actemes to test whether the actemes generalize to unseen actions. The results show that although the acteme definitions of the actions are not always semantically meaningful, they yield optimal recognition accuracy and constitute a promising direction of research for action modeling.
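To make the idea of phoneme-like units concrete, the following is a minimal illustrative sketch and not the method of the paper, whose speech-derived framework is not detailed in this abstract. It assumes fixed-length segmentation and plain k-means clustering as stand-ins, toy 1-D features in place of 3D data, and hypothetical helper names (`segment_means`, `kmeans`, `to_actemes`). Cluster indices play the role of actemes, and each action becomes a sequence of those indices.

```python
# Illustrative sketch only: fixed-length segments + k-means are assumptions,
# not the framework used in the paper.

def segment_means(frames, seg_len):
    """Split a frame sequence into fixed-length segments; return each segment's mean vector."""
    means = []
    for start in range(0, len(frames) - seg_len + 1, seg_len):
        seg = frames[start:start + seg_len]
        dim = len(seg[0])
        means.append([sum(f[d] for f in seg) / seg_len for d in range(dim)])
    return means

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))

def kmeans(points, k, iters=20):
    """Lloyd's k-means, initialised deterministically with the first k distinct points."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster is empty
                centers[i] = [sum(c) / len(g) for c in zip(*g)]
    return centers

def to_actemes(frames, centers, seg_len):
    """Describe a frame sequence as a list of acteme (cluster) indices."""
    return [nearest(m, centers) for m in segment_means(frames, seg_len)]

# Toy training data (hypothetical): two actions over a 1-D feature.
raise_arm = [[0.0], [0.1], [1.0], [0.9]]
lower_arm = [[1.0], [0.9], [0.0], [0.1]]
SEG_LEN = 2

# Learn the acteme inventory from all training segments, without labels.
train_segments = segment_means(raise_arm, SEG_LEN) + segment_means(lower_arm, SEG_LEN)
centers = kmeans(train_segments, k=2)

raise_def = to_actemes(raise_arm, centers, SEG_LEN)
lower_def = to_actemes(lower_arm, centers, SEG_LEN)

# An action not in the training set, described with the same actemes.
unseen = [[0.0], [0.1], [1.0], [0.9], [0.05], [0.0]]
unseen_def = to_actemes(unseen, centers, SEG_LEN)
```

In this toy setting the two training actions share the same two units in opposite order, and the unseen action is expressed with the same inventory, which mirrors the generalization question raised in the abstract.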



Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kulkarni, K., Boyer, E., Horaud, R., Kale, A. (2011). An Unsupervised Framework for Action Recognition Using Actemes. In: Kimmel, R., Klette, R., Sugimoto, A. (eds) Computer Vision – ACCV 2010. ACCV 2010. Lecture Notes in Computer Science, vol 6495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19282-1_47

  • DOI: https://doi.org/10.1007/978-3-642-19282-1_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19281-4

  • Online ISBN: 978-3-642-19282-1

  • eBook Packages: Computer Science, Computer Science (R0)
