Machine Vision and Applications, Volume 23, Issue 1, pp 135–150

Human action recognition based on aggregated local motion estimates

  • M. Lucena (Email author)
  • N. Pérez de la Blanca
  • J. M. Fuertes
Original Paper

Abstract

This paper addresses human action recognition from optical flow. The task is interesting in its own right, given the inaccuracy and noise of optical flow estimates. Optical flow is one of the most popular descriptors for characterizing motion but, because of its instability, it is usually used in combination with parametric models. In this work, we develop a non-parametric motion model using only the image region surrounding the actor performing the action. Specifically, for every pair of consecutive frames, a local motion descriptor is calculated from the optical flow orientation histograms collected inside the actor’s bounding box. An action descriptor is then built by weighting and aggregating the estimated histograms along the temporal axis. The proposed approach obtains a promising trade-off between complexity and performance compared with state-of-the-art approaches. Action recognition can also be performed in real time by accumulating evidence from each new incoming image. Experiments on two well-known video sequence databases are carried out to evaluate the behavior of the proposal.
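
The pipeline described in the abstract (per-frame-pair orientation histograms inside the actor's bounding box, then weighted aggregation over time) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the number of bins, how flow vectors are weighted, or the temporal weighting scheme, so the 8 bins, the magnitude weighting, the single-cell bounding box, and the function names `orientation_histogram` and `aggregate_histograms` are all assumptions made for illustration. The flow field is assumed to come from any dense optical flow estimator and is represented here as a plain grid of `(u, v)` pairs.

```python
import math


def orientation_histogram(flow, bbox, n_bins=8, min_mag=1e-6):
    """Magnitude-weighted histogram of flow orientations inside a box.

    flow: grid (list of rows) of (u, v) displacements for one frame pair.
    bbox: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    n_bins and the magnitude weighting are illustrative assumptions.
    """
    x0, y0, x1, y1 = bbox
    hist = [0.0] * n_bins
    for row in flow[y0:y1]:
        for u, v in row[x0:x1]:
            mag = math.hypot(u, v)
            if mag < min_mag:
                continue  # ignore (near-)static pixels
            # Angle from the +u axis, wrapped into [0, 2*pi).
            angle = math.atan2(v, u) % (2.0 * math.pi)
            # Clamp guards against rounding that lands exactly on 2*pi.
            b = min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1)
            hist[b] += mag
    total = sum(hist)
    # Normalize to sum 1; an all-static box yields an all-zero histogram.
    return [h / total for h in hist] if total > 0 else hist


def aggregate_histograms(histograms, weights=None):
    """Weighted average of per-frame-pair histograms along the time axis.

    Uniform weights are used when none are given (an assumption; the
    abstract only says the histograms are weighted and aggregated).
    """
    if weights is None:
        weights = [1.0] * len(histograms)
    w_sum = sum(weights)
    n_bins = len(histograms[0])
    return [
        sum(w * h[i] for w, h in zip(weights, histograms)) / w_sum
        for i in range(n_bins)
    ]


# Synthetic example: two frame pairs with uniform motion in opposite directions.
flow_1 = [[(2.0, 1.0)] * 4 for _ in range(4)]
flow_2 = [[(-2.0, -1.0)] * 4 for _ in range(4)]
box = (1, 1, 3, 3)
per_pair = [orientation_histogram(f, box) for f in (flow_1, flow_2)]
action_descriptor = aggregate_histograms(per_pair)
```

Because the aggregate is a running weighted average, it can be updated as each new frame pair arrives, which is consistent with the abstract's remark about accumulating evidence from incoming images. The resulting descriptor would then be passed to a classifier of the reader's choice.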

Keywords

Human action recognition, Motion characterization, Optical flow descriptors, Video sequence processing

Supplementary material

ESM 1: 138_2010_305_MOESM1_ESM.pdf (PDF, 21 kb)

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • M. Lucena (1), email author
  • N. Pérez de la Blanca (2)
  • J. M. Fuertes (1)
  1. Department of Computer Science, University of Jaén, Jaén, Spain
  2. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
