Abstract
The detection of human-object interactions is a key component in many applications, such as activity recognition, human intention understanding, and the prediction of human movements. In this paper, we propose a novel framework to detect such interactions in RGB-D video streams based on spatio-temporal and pose information. Our system first detects possible human-object interactions using position and pose data of humans and objects. To counter false positive and false negative detections, we calculate the likelihood that such an interaction actually occurs by tracking it over subsequent frames. Previous work mainly focused on detecting specific activities with manipulated objects in short prerecorded video clips. In contrast, our framework finds arbitrary interactions with 510 different objects by exploiting the detection capabilities of R-CNNs together with the Open Images dataset, and it operates on online video streams. Our experimental evaluation demonstrates the robustness of the approach on various published videos recorded in indoor environments. The system achieves precision and recall rates of 0.82 on this dataset. Furthermore, we show that our system can be used for online human motion prediction in robotic applications.
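The two-stage idea sketched in the abstract, per-frame candidate detection from human pose and object positions followed by temporal filtering of the interaction likelihood, can be illustrated roughly as follows. All names, thresholds, and the smoothing rule here are hypothetical simplifications for illustration, not the authors' actual formulation:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One frame's observation (hypothetical format): a hand keypoint from a
    pose estimator and the center of a detected object's bounding box."""
    hand: tuple        # (x, y) pixel position of a hand keypoint
    obj_center: tuple  # (x, y) pixel center of the object's bounding box

def is_candidate(d: Detection, dist_thresh: float = 40.0) -> bool:
    """Flag a candidate interaction when the hand is close to the object."""
    dx = d.hand[0] - d.obj_center[0]
    dy = d.hand[1] - d.obj_center[1]
    return (dx * dx + dy * dy) ** 0.5 <= dist_thresh

def track_interaction(frames, alpha: float = 0.3, confirm: float = 0.8) -> bool:
    """Accumulate a smoothed interaction likelihood over subsequent frames and
    confirm the interaction only once it exceeds a threshold. The smoothing
    suppresses single-frame false positives and false negatives."""
    likelihood = 0.0
    for d in frames:
        obs = 1.0 if is_candidate(d) else 0.0
        # Exponential smoothing: one noisy frame barely moves the estimate.
        likelihood = (1 - alpha) * likelihood + alpha * obs
        if likelihood >= confirm:
            return True
    return False
```

With these parameters, a hand resting near an object for several consecutive frames confirms an interaction, while a one-frame spurious detection decays away without ever reaching the confirmation threshold.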
This work has been supported by the DFG Research Unit FOR 2535 Anticipating Human Behavior.
Notes
- 2. Videos from our dataset are available at https://www.hrl.uni-bonn.de/icsr2019.
- 3. A video showing the capabilities of this approach can be found at https://www.hrl.uni-bonn.de/icsr_application_demo.mp4.
Acknowledgments
We would like to thank Nils Dengler, Sandra Höltervennhoff, Sophie Jenke, Saskia Rabich, Jenny Mack, Marco Pinno, Mosadeq Saljoki and Dominik Wührer for their help during our experiments.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Bruckschen, L., Amft, S., Tanke, J., Gall, J., Bennewitz, M. (2019). Detection of Generic Human-Object Interactions in Video Streams. In: Salichs, M., et al. (eds.) Social Robotics. ICSR 2019. Lecture Notes in Computer Science, vol. 11876. Springer, Cham. https://doi.org/10.1007/978-3-030-35888-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35887-7
Online ISBN: 978-3-030-35888-4
eBook Packages: Computer Science, Computer Science (R0)