Abstract
In speech recognition, phonemes have proven effective for modeling the words of a language. While phonemes are well defined for spoken languages, their extension to human actions is not straightforward. In this paper, we study such an extension and propose an unsupervised framework to find phoneme-like units for actions, which we call actemes, using 3D data and without any prior assumptions. To this end, we build on a framework previously proposed in the speech literature to automatically find actemes in the training data. We show experimentally that actions defined in terms of actemes and actions modeled as whole units give similar recognition results. We then define actions outside the training set in terms of these actemes to test whether the actemes generalize to unseen actions. The results show that although the acteme definitions of the actions are not always semantically meaningful, they yield good recognition accuracy and constitute a promising direction of research for action modeling.
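To illustrate the kind of automatic unit discovery the abstract alludes to, the sketch below segments a feature sequence into contiguous pieces by minimum-variance dynamic programming, in the spirit of maximum-likelihood segmentation used for automatically derived units in speech. This is an illustrative stand-in, not the authors' actual algorithm: the function name `segment_sequence` and the variance-based segment cost are assumptions made for the example.

```python
import numpy as np

def segment_sequence(features, num_segments):
    """Split a (T, D) feature sequence into `num_segments` contiguous
    segments minimizing total within-segment variance, via dynamic
    programming. Returns the end index of each segment."""
    T, D = features.shape
    # Prefix sums of x and x^2 let us compute any segment's cost in O(D).
    prefix = np.vstack([np.zeros(D), np.cumsum(features, axis=0)])
    prefix2 = np.vstack([np.zeros(D), np.cumsum(features ** 2, axis=0)])

    def seg_cost(i, j):
        # Sum of squared deviations of frames i..j-1 from their mean.
        n = j - i
        s = prefix[j] - prefix[i]
        s2 = prefix2[j] - prefix2[i]
        return float(np.sum(s2 - s ** 2 / n))

    INF = float("inf")
    dp = np.full((num_segments + 1, T + 1), INF)   # dp[k][j]: best cost of
    back = np.zeros((num_segments + 1, T + 1), int)  # k segments over frames 0..j-1
    dp[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, T + 1):
            for i in range(k - 1, j):
                c = dp[k - 1][i] + seg_cost(i, j)
                if c < dp[k][j]:
                    dp[k][j] = c
                    back[k][j] = i
    # Backtrack to recover segment boundaries.
    bounds, j = [], T
    for k in range(num_segments, 0, -1):
        bounds.append(j)
        j = back[k][j]
    return sorted(bounds)
```

In a full pipeline along these lines, segments pooled from all training sequences would then be clustered (for instance with k-means on per-segment statistics) so that each cluster plays the role of one acteme, and actions are rewritten as sequences of cluster labels.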
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Kulkarni, K., Boyer, E., Horaud, R., Kale, A. (2011). An Unsupervised Framework for Action Recognition Using Actemes. In: Kimmel, R., Klette, R., Sugimoto, A. (eds) Computer Vision – ACCV 2010. Lecture Notes in Computer Science, vol 6495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19282-1_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19281-4
Online ISBN: 978-3-642-19282-1