Dyadic Interaction Detection from Pose and Flow

  • Coert van Gemeren
  • Robby T. Tan
  • Ronald Poppe
  • Remco C. Veltkamp
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8749)

Abstract

We propose a method for detecting dyadic interactions: fine-grained, coordinated interactions between two people. Our model is capable of recognizing interactions such as a hand shake or a high five, and of locating them in time and space. At the core of our method is a pictorial structures model that additionally takes into account the fine-grained movements around the joints of interest during the interaction. Compared to a bag-of-words approach, our method not only detects the specific type of interaction more accurately, but also provides its location in the video. The model is trained with both video data and body joint estimates obtained from Kinect. During testing, only video data is required. To demonstrate the efficacy of our approach, we introduce the ShakeFive dataset, which consists of videos and Kinect data of hand shake and high five interactions. On this dataset, we obtain a mean average precision of 49.56%, outperforming a bag-of-words approach by 23.32%. We further demonstrate that the model can be learned from just a few interactions.
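
The abstract only sketches the model at a high level. As a rough illustration of the scoring idea, the Python sketch below combines the three ingredients it names: per-part appearance responses, pairwise deformation costs between parts (the pictorial-structures term), and an optical-flow term around the joints of interest. All function names, feature shapes, and weights here are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch: pictorial-structures score extended with a flow
# term around the joints of interest (shapes and weights are assumptions).
import numpy as np

def part_score(appearance_map, loc):
    """Appearance (e.g. filter) response of one body part at a location."""
    y, x = loc
    return appearance_map[y, x]

def deformation_cost(loc_child, loc_parent, rest_offset, weights):
    """Quadratic 'spring' cost between a part and its parent part."""
    dy = loc_child[0] - (loc_parent[0] + rest_offset[0])
    dx = loc_child[1] - (loc_parent[1] + rest_offset[1])
    return weights[0] * dy**2 + weights[1] * dx**2

def flow_score(flow_hist, template_hist):
    """Similarity of an optical-flow histogram around a joint to a
    learned template (here: a plain dot product)."""
    return float(np.dot(flow_hist, template_hist))

def window_score(parts, edges, joint_flow, alpha=1.0):
    """Score one spatio-temporal candidate window.

    parts:      list of (appearance_map, loc) per body part
    edges:      list of (child_idx, parent_idx, rest_offset, weights)
    joint_flow: list of (flow_hist, template_hist) for joints of interest
    """
    s = sum(part_score(a, loc) for a, loc in parts)
    s -= sum(deformation_cost(parts[c][1], parts[p][1], off, w)
             for c, p, off, w in edges)
    s += alpha * sum(flow_score(f, t) for f, t in joint_flow)
    return s

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = [rng.random((48, 64)) for _ in range(3)]   # head, torso, hand
    locs = [(5, 30), (20, 30), (22, 45)]
    parts = list(zip(maps, locs))
    edges = [(0, 1, (-15, 0), (0.01, 0.01)),          # head above torso
             (2, 1, (2, 15), (0.01, 0.01))]           # hand beside torso
    joint_flow = [(rng.random(8), rng.random(8))]     # flow around the hand
    print("window score:", window_score(parts, edges, joint_flow))

A sliding-window detector in the spirit of the paper would evaluate such a score for every candidate window in space and time, and report windows whose score exceeds a threshold, which yields both the interaction label and its location.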

Keywords

Action Recognition · Mean Average Precision · Dyadic Interaction · Sliding Window Approach · Dense Trajectories

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Coert van Gemeren (1)
  • Robby T. Tan (2)
  • Ronald Poppe (1)
  • Remco C. Veltkamp (1)
  1. Interaction Technology Group, Department of Information and Computing Sciences, Utrecht University, The Netherlands
  2. School of Science and Technology, SIM University, Singapore