Modeling and Recognition of Complex Human Activities

  • Nandita M. Nayak
  • Ricky J. Sethi
  • Bi Song
  • Amit K. Roy-Chowdhury

Abstract

Activity recognition is a field of computer vision that has made great progress over the past decade. Starting from simple single-person activities, research in activity recognition is moving toward more complex scenes involving multiple objects and natural environments. The main challenges include localizing and recognizing events in a video and coping with large variations in viewpoint, speed of movement, and scale. This chapter gives the reader an overview of work in activity recognition, especially in the domain of complex activities involving multiple interacting objects. We begin with a description of the challenges in activity recognition and give a broad overview of the different approaches. We then describe in detail some of the feature descriptors and classification strategies commonly regarded as the state of the art in this field. Next, we move to more complex recognition systems, discussing the challenges in complex activity recognition and some of the work that has addressed them. Finally, we provide some examples of recent work in complex activity recognition. Recognizing complex behaviors involving multiple interacting objects remains a very challenging problem, and future work needs to study its many aspects, including features, recognition strategies, models, robustness, and context.
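To make one of the challenges above concrete (variation in speed of movement), here is a minimal sketch of dynamic time warping (DTW), a classic technique for comparing two feature sequences that unfold at different rates. It is an illustration only, not the chapter's own method. The function name `dtw_distance` and the toy one-dimensional "per-frame motion feature" sequences are hypothetical stand-ins for real video features.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Hypothetical illustration: each sequence stands in for a per-frame
    motion feature (e.g. a joint angle) extracted from a video.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local frame-to-frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances while b holds its frame
                cost[i][j - 1],      # b advances while a holds its frame
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# The same hypothetical gesture performed slowly and quickly.
slow = [0, 0, 1, 1, 2, 2, 3, 3]
fast = [0, 1, 2, 3]
print(dtw_distance(slow, fast))  # 0.0
```

Because the warping path may repeat frames of either sequence, the slow and fast versions align with zero cost even though they have different lengths, whereas a frame-by-frame comparison would not even be defined.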

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Nandita M. Nayak (1)
  • Ricky J. Sethi (2, 3)
  • Bi Song (1)
  • Amit K. Roy-Chowdhury (1)

  1. University of California, Riverside, USA
  2. University of California, Los Angeles, USA
  3. Information Sciences Institute, University of Southern California, Marina del Rey, USA