Abstract
In this paper, we study the problem of recognizing human actions in the presence of a single egocentric camera and multiple static cameras. Some actions are better captured by static cameras, where the whole body of an actor and the context of the action are visible. Other actions are better recognized in egocentric cameras, where subtle hand movements and complex object interactions are visible. We introduce a model that benefits from the best of both worlds by learning to predict the importance of each camera for recognizing the action in each frame. By jointly and discriminatively learning latent camera-importance variables and action classifiers, our model achieves strong results on the challenging CMU-MMAC dataset. Our experiments show a significant gain from learning to use the cameras according to their predicted importance. The learned latent variables provide a level of scene understanding that enables automatic cinematography: smoothly switching between cameras so as to maximize the amount of relevant information in each frame.
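To make the latent camera-importance idea concrete, the sketch below shows one way per-frame camera selection could be inferred once per-camera action classifiers are available. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear classifiers, the array shapes, and the switch_cost penalty (which encourages the smooth camera transitions used for automatic cinematography) are all assumed for the example.

```python
import numpy as np

# Hypothetical setup: T frames, C cameras, D-dim features per camera view.
# w[c] is an assumed per-camera linear classifier for one action class; the
# latent variable z_t in {0..C-1} selects which camera scores frame t.

def infer_camera_sequence(feats, w, switch_cost=1.0):
    """Viterbi-style inference of per-frame camera choices.

    feats: (T, C, D) array of per-frame, per-camera features.
    w:     (C, D) array of per-camera classifier weights.
    Returns the total score and the latent camera index for each frame,
    penalizing camera switches so the resulting "cut" stays smooth.
    """
    T, C, _ = feats.shape
    unary = np.einsum('tcd,cd->tc', feats, w)   # frame-by-camera scores
    best = unary[0].copy()                      # best path score ending at camera c
    back = np.zeros((T, C), dtype=int)          # backpointers for decoding
    for t in range(1, T):
        # trans[i, j]: score of being at camera i, then moving to camera j.
        trans = best[:, None] - switch_cost * (1 - np.eye(C))
        back[t] = trans.argmax(axis=0)
        best = trans.max(axis=0) + unary[t]
    # Backtrack the highest-scoring sequence of camera choices.
    z = np.zeros(T, dtype=int)
    z[-1] = best.argmax()
    for t in range(T - 1, 0, -1):
        z[t - 1] = back[t, z[t]]
    return best.max(), z
```

In such a formulation, training would treat the camera indices z as latent variables optimized jointly with the classifier weights, in the spirit of latent discriminative learning; at test time the same inference yields both an action score and a smooth camera cut for automatic cinematography.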