
Action Recognition in the Presence of One Egocentric and Multiple Static Cameras

  • Conference paper
  • Computer Vision – ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science, volume 9007

Abstract

In this paper, we study the problem of recognizing human actions in the presence of a single egocentric camera and multiple static cameras. Some actions are better captured by static cameras, where the actor's whole body and the context of the action are visible. Other actions are better recognized in egocentric cameras, where subtle hand movements and complex object interactions are visible. We introduce a model that benefits from the best of both worlds by learning to predict the importance of each camera for recognizing the action in each frame. By jointly and discriminatively learning latent camera-importance variables and action classifiers, our model achieves strong results on the challenging CMU-MMAC dataset. Our experiments show a significant gain from learning to use the cameras according to their predicted importance. The learned latent variables provide a level of scene understanding that enables automatic cinematography: smoothly switching between cameras to maximize the amount of relevant information in each frame.
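The paper's joint model is not specified in this preview, but the core idea of per-frame latent camera selection with smooth switching can be sketched as a Viterbi search over cameras. Everything below is an illustrative assumption, not the authors' formulation: `score_sequence`, the linear per-view scores, and the fixed `switch_penalty` are all hypothetical.

```python
import numpy as np

def score_sequence(features, w, switch_penalty=0.1):
    """Score an action hypothesis over a sequence, choosing per frame the
    camera (latent variable) that maximizes the score, with a penalty on
    camera switches to encourage smooth cinematography-style transitions.

    features : (T, C, D) array -- T frames, C cameras, D-dim feature per view.
    w        : (D,) weight vector for one action class.
    Returns the best total score and the chosen camera index per frame.
    """
    T, C, _ = features.shape
    unary = features @ w                       # (T, C) per-frame, per-camera scores
    best = unary[0].copy()                     # best running score ending at each camera
    back = np.zeros((T, C), dtype=int)         # backpointers for the camera path
    for t in range(1, T):
        # Staying on the same camera is free; switching costs switch_penalty.
        trans = best[:, None] - switch_penalty * (1 - np.eye(C))
        back[t] = trans.argmax(axis=0)
        best = trans.max(axis=0) + unary[t]
    cams = np.zeros(T, dtype=int)              # backtrack the argmax camera path
    cams[-1] = int(best.argmax())
    for t in range(T - 1, 0, -1):
        cams[t - 1] = back[t, cams[t]]
    return float(best.max()), cams
```

In a discriminative latent-variable setting, a score like this would be computed per action class, with `w` updated so the correct action outscores the others under the best latent camera assignment.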



Author information

Correspondence to Bilge Soran.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material (zip 24,768 KB)


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Soran, B., Farhadi, A., Shapiro, L. (2015). Action Recognition in the Presence of One Egocentric and Multiple Static Cameras. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds) Computer Vision – ACCV 2014. Lecture Notes in Computer Science, vol. 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_12

  • DOI: https://doi.org/10.1007/978-3-319-16814-2_12
  • Publisher Name: Springer, Cham
  • Print ISBN: 978-3-319-16813-5
  • Online ISBN: 978-3-319-16814-2
  • eBook Packages: Computer Science (R0)
