Action Recognition Robust to Background Clutter by Using Stereo Vision

  • Jordi Sanchez-Riera
  • Jan Čech
  • Radu Horaud
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7583)


An action recognition algorithm that works with binocular videos is presented. The proposed method uses a standard bag-of-words approach, in which each action clip is represented as a histogram of visual words. However, instead of using classical monocular HoG/HoF features, we construct features from the scene flow computed by a matching algorithm on the sequence of stereo images. In a controlled setup with a single actor present in the scene, the resulting algorithm achieves recognition accuracy comparable to or slightly better than the standard monocular solution. Under strong background clutter caused by other people moving freely behind the actor, however, we show that it performs significantly better.
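The bag-of-words representation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that local descriptors have already been extracted for a clip (for example from scene flow) and that a vocabulary of visual-word centres is available (for example from k-means). The function names `assign_words` and `clip_histogram`, the toy 2-D descriptors, and the L1 normalisation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def assign_words(descriptors, vocabulary):
    """Return the index of the nearest vocabulary centre for each descriptor."""
    # Pairwise squared Euclidean distances, shape (n_descriptors, n_words).
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(dists, axis=1)

def clip_histogram(descriptors, vocabulary):
    """L1-normalised histogram of visual words for one action clip."""
    k = vocabulary.shape[0]
    if descriptors.shape[0] == 0:
        # A clip without any descriptors yields an all-zero histogram.
        return np.zeros(k)
    words = assign_words(descriptors, vocabulary)
    counts = np.bincount(words, minlength=k).astype(float)
    return counts / counts.sum()

# Toy data (hypothetical): three visual words and four descriptors in 2-D.
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
descriptors = np.array([[0.1, -0.2], [9.5, 10.2], [0.3, 0.1], [0.2, 9.9]])

hist = clip_histogram(descriptors, vocabulary)
print(hist)  # [0.5  0.25 0.25]
```

Histograms of this kind, one per clip, would then be passed to a classifier; the choice of classifier is outside the scope of this sketch.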





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jordi Sanchez-Riera (1)
  • Jan Čech (1)
  • Radu Horaud (1)
  1. INRIA Grenoble Rhône-Alpes, Montbonnot Saint-Martin, France
