Learning Human Interaction by Interactive Phrases

  • Yu Kong
  • Yunde Jia
  • Yun Fu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7572)


In this paper, we present a novel approach for human interaction recognition from videos. We introduce high-level descriptions called interactive phrases to express binary semantic motion relationships between interacting people. Interactive phrases naturally exploit human knowledge to describe interactions and allow us to construct a more descriptive model for recognizing human interactions. We propose a novel hierarchical model to encode interactive phrases based on the latent SVM framework, in which interactive phrases are treated as latent variables. The interdependencies between interactive phrases are explicitly captured in the model to deal with motion ambiguity and partial occlusion in interactions. We evaluate our method on a newly collected BIT-Interaction dataset and on the UT-Interaction dataset. Promising results demonstrate the effectiveness of the proposed method.
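To make the idea of treating interactive phrases as binary latent variables more concrete, the following is a minimal sketch of latent-variable scoring and inference in the spirit of a latent SVM. It is not the paper's formulation: the potential tables (`unary`, `class_phrase`, `pairwise`), the function names, the toy numbers, and the brute-force search over phrase assignments are all assumptions made for illustration, and learning the weights is not shown.

```python
from itertools import product

def score(unary, class_phrase, pairwise, y, h):
    """Score of class y under a binary phrase assignment h (tuple of 0/1)."""
    s = 0.0
    for j, hj in enumerate(h):
        s += unary[j][hj]            # video evidence for phrase j taking value hj
        s += class_phrase[y][j][hj]  # compatibility of class y with phrase j = hj
    for (j, k), table in pairwise.items():
        s += table[h[j]][h[k]]       # dependency between phrases j and k
    return s

def best_assignment(unary, class_phrase, pairwise, y):
    """Maximize over all latent phrase assignments by exhaustive search."""
    candidates = product((0, 1), repeat=len(unary))
    return max(((score(unary, class_phrase, pairwise, y, h), h) for h in candidates),
               key=lambda pair: pair[0])

def predict(unary, class_phrase, pairwise):
    """Return the class with the highest score and its best phrase assignment."""
    results = {y: best_assignment(unary, class_phrase, pairwise, y)
               for y in range(len(class_phrase))}
    y_hat = max(results, key=lambda y: results[y][0])
    return y_hat, results[y_hat][1]

# Hypothetical toy potentials: two phrases, two interaction classes.
unary = [[0.0, 1.0],   # phrase 0: strong evidence that it is present
         [0.2, 0.0]]   # phrase 1: weak evidence of absence (e.g. occluded)
class_phrase = [
    [[0.0, 0.5], [0.0, 0.5]],  # class 0 favors both phrases present
    [[0.5, 0.0], [0.5, 0.0]],  # class 1 favors both phrases absent
]
pairwise = {(0, 1): [[0.3, 0.0], [0.0, 0.3]]}  # phrases 0 and 1 tend to agree

label, phrases = predict(unary, class_phrase, pairwise)
print(label, phrases)
```

In this toy setting the evidence for phrase 1 alone slightly favors "absent", but the class compatibility and the pairwise agreement term pull it to "present" once phrase 0 is confidently detected. That is the kind of effect the abstract attributes to modeling interdependencies between phrases; exhaustive search is used only because the example is tiny.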



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yu Kong (1, 3)
  • Yunde Jia (1)
  • Yun Fu (2)

  1. Beijing Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing, P.R. China
  2. Department of ECE and College of CIS, Northeastern University, Boston, USA
  3. Department of CSE, State University of New York, Buffalo, USA
