Dyadic Interaction Detection from Pose and Flow

van Gemeren, Coert; Tan, Robby T.; Poppe, Ronald; Veltkamp, Remco C.

doi:10.1007/978-3-319-11839-0_9

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8749)

Included in the following conference series:

International Workshop on Human Behavior Understanding

Abstract

We propose a method for detecting dyadic interactions: fine-grained, coordinated interactions between two people. Our model is capable of recognizing interactions such as a hand shake or a high five, and locating them in time and space. At the core of our method is a pictorial structures model that additionally takes into account the fine-grained movements around the joints of interest during the interaction. Compared to a bag-of-words approach, our method not only allows us to detect the specific type of actions more accurately, but it also provides the specific location of the interaction. The model is trained with both video data and body joint estimates obtained from Kinect. During testing, only video data is required. To demonstrate the efficacy of our approach, we introduce the ShakeFive dataset that consists of videos and Kinect data of hand shake and high five interactions. On this dataset, we obtain a mean average precision of 49.56%, outperforming a bag-of-words approach by 23.32%. We further demonstrate that the model can be learned from just a few interactions.
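The abstract reports results as mean average precision (mAP) but does not spell out the evaluation protocol. As a rough illustration only, the sketch below computes one common, non-interpolated form of average precision per interaction class and then averages over classes. The function names, the `(score, is_true_positive)` detection format, and the toy numbers are all assumptions made for this example; they are not taken from the paper or from the ShakeFive dataset.

```python
from typing import Dict, List, Tuple

# One detection is (confidence score, whether it matched a ground-truth interaction).
# How a detection is matched to ground truth is NOT specified here; that is assumed
# to have been decided beforehand.
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], n_positives: int) -> float:
    """Non-interpolated AP: mean of the precision at the rank of each true positive."""
    if n_positives == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            hits += 1
            precision_sum += hits / rank
    # Ground-truth instances that were never detected contribute zero precision.
    return precision_sum / n_positives


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Unweighted mean of the per-class AP values."""
    aps = [average_precision(dets, n_pos) for dets, n_pos in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical toy data, made up purely for illustration.
toy = {
    "hand shake": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "high five": ([(0.6, False), (0.5, True)], 1),
}

if __name__ == "__main__":
    print(f"toy mAP: {mean_average_precision(toy):.4f}")
```

Other definitions of AP, such as interpolated variants, give slightly different values, so scores from this sketch should not be compared directly with the figures quoted in the abstract.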

Author information

Authors and Affiliations

Interaction Technology Group, Department of Information and Computing Sciences, Utrecht University, The Netherlands
Coert van Gemeren, Ronald Poppe & Remco C. Veltkamp
School of Science and Technology, SIM University, Singapore
Robby T. Tan

Editor information

Editors and Affiliations

University of Pennsylvania, Philadelphia, PA, USA
Hyun Soo Park
Department of Computer Engineering, Boğaziçi University, Bebek, 34342, Istanbul, Turkey
Albert Ali Salah
University of California, Davis, CA, USA
Yong Jae Lee
University of Southern California, Playa Vista, CA, USA
Louis-Philippe Morency
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Yaser Sheikh
Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Modena, Modena, Italy
Rita Cucchiara

About this paper

Cite this paper

van Gemeren, C., Tan, R.T., Poppe, R., Veltkamp, R.C. (2014). Dyadic Interaction Detection from Pose and Flow. In: Park, H.S., Salah, A.A., Lee, Y.J., Morency, LP., Sheikh, Y., Cucchiara, R. (eds) Human Behavior Understanding. HBU 2014. Lecture Notes in Computer Science, vol 8749. Springer, Cham. https://doi.org/10.1007/978-3-319-11839-0_9

DOI: https://doi.org/10.1007/978-3-319-11839-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11838-3
Online ISBN: 978-3-319-11839-0
eBook Packages: Computer Science, Computer Science (R0)
