International Journal of Computer Vision, Volume 111, Issue 2, pp 229–248

Pose Adaptive Motion Feature Pooling for Human Action Analysis

  • Bingbing Ni
  • Pierre Moulin
  • Shuicheng Yan
Abstract

Ineffective spatial–temporal motion feature pooling has long been a fundamental bottleneck for human action recognition and detection. Previous pooling schemes, such as global, spatial–temporal pyramid, or human- and object-centric pooling, fail to capture discriminative motion patterns because informative movements occur only in specific regions of the human body, which depend on the type of action being performed. Global (holistic) motion feature pooling therefore often yields an action representation with limited discriminative capability. To address this limitation, we propose an adaptive motion feature pooling scheme that uses human poses as side information. Such poses can be detected reliably in, for instance, assisted living and indoor smart-surveillance scenarios. Treating both the video sub-volumes used for pooling and the human pose types as hidden variables, we formulate motion feature pooling as a latent structural learning problem in which the relationship between discriminative pooling sub-volumes and pose types is learned. The resulting pose-adaptive motion feature pooling scheme is extensively tested on assisted living and smart surveillance datasets as well as on general action recognition benchmarks, demonstrating improved action recognition and detection performance.
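To make the pooling idea concrete, the following is a minimal sketch, not the authors' implementation: it pools bag-of-words motion features only from the video sub-volumes associated with a detected pose type. The `POSE_TO_SUBVOLUMES` mapping and both pose labels are hypothetical stand-ins for the pose-to-sub-volume relationship that the paper learns via latent structural learning.

```python
import numpy as np

# Hypothetical mapping from a detected pose type to the sub-volumes
# (x0, y0, t0, x1, y1, t1, as fractions of the clip) where informative
# motion is expected. In the paper this relationship is *learned*;
# here it is hard-coded purely for illustration.
POSE_TO_SUBVOLUMES = {
    "standing": [(0.0, 0.0, 0.0, 1.0, 0.5, 1.0)],  # upper half of the body
    "bending":  [(0.0, 0.3, 0.0, 1.0, 1.0, 1.0)],  # lower body and hands
}

def pose_adaptive_pool(codewords, positions, pose_type, codebook_size):
    """Pool local motion features that fall inside the sub-volumes
    associated with the detected pose type.

    codewords : (N,) int array of codeword indices for N local descriptors
    positions : (N, 3) float array of normalized (x, y, t) locations
    pose_type : key into POSE_TO_SUBVOLUMES
    Returns an L1-normalized bag-of-words histogram.
    """
    hist = np.zeros(codebook_size)
    for (x0, y0, t0, x1, y1, t1) in POSE_TO_SUBVOLUMES[pose_type]:
        inside = (
            (positions[:, 0] >= x0) & (positions[:, 0] < x1) &
            (positions[:, 1] >= y0) & (positions[:, 1] < y1) &
            (positions[:, 2] >= t0) & (positions[:, 2] < t1)
        )
        hist += np.bincount(codewords[inside], minlength=codebook_size)
    return hist / max(hist.sum(), 1.0)

# Usage: 500 random local features in a clip with a detected "bending" pose.
rng = np.random.default_rng(0)
h = pose_adaptive_pool(rng.integers(0, 100, 500),
                       rng.random((500, 3)), "bending", 100)
print(h.shape)  # (100,)
```

The design point is that the histogram is computed only over pose-dependent regions rather than the whole clip, which is what gives the representation its discriminative power; in the paper, the choice of sub-volumes per pose is itself optimized jointly with the action classifier.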

Keywords

Adaptive feature pooling · Human pose · Action recognition

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Advanced Digital Sciences Center, Singapore, Singapore
  2. University of Illinois at Urbana-Champaign, Urbana, USA
  3. National University of Singapore, Singapore, Singapore