Abstract
We introduce a novel spatio-temporal deformable part model for the offline detection of fine-grained interactions in video. A key novelty of the model is that the part detectors for both interacting individuals are combined in a single graph that can contain different combinations of feature descriptors. This allows us to use both body pose and movement to model the coordination between two people in space and time. We evaluate the performance of our approach on novel and existing interaction datasets. When testing only on the target class, we achieve a mean average precision of 0.82. When presented with distractor classes, the additional modelling of the motion of specific body parts significantly reduces the number of confusions. Cross-dataset tests demonstrate that our trained models generalize well to other settings.
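The evaluation metric reported above is mean average precision. As a minimal sketch of how average precision is typically computed from ranked detections (the function name and the interpolation-free formulation are illustrative; the paper's exact evaluation protocol may differ):

```python
import numpy as np

def average_precision(scores, labels):
    """Average precision for one class over ranked detections.

    scores: detector confidences; labels: 1 if the detection matches
    a ground-truth interaction (true positive), 0 otherwise.
    AP here is the mean of the precision values attained at the rank
    of each true positive.
    """
    order = np.argsort(scores)[::-1]              # sort by descending confidence
    labels = np.asarray(labels)[order]
    if labels.sum() == 0:
        return 0.0
    tp = np.cumsum(labels)                        # true positives up to each rank
    precision = tp / np.arange(1, len(labels) + 1)
    return float(precision[labels == 1].mean())
```

Mean average precision is then the mean of this score over all interaction classes; scores near 0.82, as reported, indicate that most high-confidence detections are correct.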
This publication was supported by the Dutch national program COMMIT.
Notes
1. ShakeFive2 is publicly available from https://goo.gl/ObHv36.
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
van Gemeren, C., Poppe, R., Veltkamp, R.C. (2016). Spatio-Temporal Detection of Fine-Grained Dyadic Human Interactions. In: Chetouani, M., Cohn, J., Salah, A. (eds.) Human Behavior Understanding. HBU 2016. Lecture Notes in Computer Science, vol. 9997. Springer, Cham. https://doi.org/10.1007/978-3-319-46843-3_8
DOI: https://doi.org/10.1007/978-3-319-46843-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46842-6
Online ISBN: 978-3-319-46843-3