Camera Motion and Surrounding Scene Appearance as Context for Action Recognition

  • Fabian Caba Heilbron
  • Ali Thabet
  • Juan Carlos Niebles
  • Bernard Ghanem
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9006)


This paper describes a framework for recognizing human actions in videos by incorporating a new set of visual cues that represent the context of the action. We develop a weak foreground-background segmentation approach to robustly extract not only foreground features focused on the actors, but also global camera motion and contextual scene information. Using dense point trajectories, our approach separates the foreground motion from the background and describes it, represents the appearance of the extracted static background, and encodes the global camera motion, which, interestingly, proves to be discriminative for certain action classes. Our experiments on four challenging benchmarks (HMDB51, Hollywood2, Olympic Sports, and UCF50) show that our contextual features enable a significant performance improvement over state-of-the-art algorithms.
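To make the idea of a weak foreground-background split over point trajectories concrete, the following is a minimal sketch and not the method of the paper. The abstract does not state which camera motion model is used, so the sketch swaps in a deliberately simple one: the per-frame camera motion is assumed to be a pure translation, estimated as the median displacement of all trajectories. The function name `split_trajectories` and the `threshold` parameter are hypothetical and introduced only for illustration.

```python
from statistics import median


def split_trajectories(trajectories, threshold=2.0):
    """Weakly label point trajectories as foreground or background.

    trajectories: list of tracks, each a list of (x, y) tuples of equal length.
    threshold: hypothetical cutoff (pixels) on the mean residual motion.
    Returns (camera_motion, foreground_indices, background_indices).

    Assumption: camera motion is a per-frame translation, taken as the
    median displacement over all tracks. This is a stand-in model only.
    """
    num_steps = len(trajectories[0]) - 1
    camera_motion = []
    residuals = [0.0] * len(trajectories)

    for t in range(num_steps):
        # Frame-to-frame displacement of every track.
        dxs = [traj[t + 1][0] - traj[t][0] for traj in trajectories]
        dys = [traj[t + 1][1] - traj[t][1] for traj in trajectories]

        # Median displacement serves as the global camera motion estimate.
        cam_dx, cam_dy = median(dxs), median(dys)
        camera_motion.append((cam_dx, cam_dy))

        # Accumulate how far each track deviates from the camera motion.
        for i, (dx, dy) in enumerate(zip(dxs, dys)):
            residuals[i] += ((dx - cam_dx) ** 2 + (dy - cam_dy) ** 2) ** 0.5

    foreground, background = [], []
    for i, total in enumerate(residuals):
        if total / num_steps > threshold:
            foreground.append(i)
        else:
            background.append(i)
    return camera_motion, foreground, background


# Toy data: three tracks follow a rightward pan, one also moves vertically.
tracks = [
    [(0, 0), (1, 0), (2, 0)],
    [(5, 5), (6, 5), (7, 5)],
    [(9, 1), (10, 1), (11, 1)],
    [(3, 3), (4, 8), (5, 13)],
]
camera_motion, fg, bg = split_trajectories(tracks)
print(camera_motion, fg, bg)
```

In this toy setup, the three outputs loosely mirror the three cues named in the abstract: the `camera_motion` sequence could feed a camera motion descriptor, the foreground tracks an actor-focused motion descriptor, and the background tracks mark regions from which static scene appearance could be described. A real pipeline would likely need a richer motion model than a translation.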



Research reported in this publication was supported by competitive research funding from King Abdullah University of Science and Technology (KAUST). F.C.H. was also supported by a COLCIENCIAS Young Scientist and Innovator Fellowship. J.C.N. is supported by a Microsoft Research Faculty Fellowship.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Fabian Caba Heilbron (1, 2)
  • Ali Thabet (1)
  • Juan Carlos Niebles (2)
  • Bernard Ghanem (1), email author
  1. King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  2. Universidad del Norte, Barranquilla, Colombia
