Abstract
The detection of human-object interactions is a key component in many applications, such as activity recognition, human intention understanding, and the prediction of human movements. In this paper, we propose a novel framework to detect such interactions in RGB-D video streams based on spatio-temporal and pose information. Our system first detects possible human-object interactions using position and pose data of humans and objects. To counter false positive and false negative detections, we calculate the likelihood that such an interaction actually occurs by tracking it over subsequent frames. Previous work mainly focused on detecting specific activities with manipulated objects in short prerecorded video clips. In contrast, our framework finds arbitrary interactions with 510 different objects by exploiting the detection capabilities of R-CNNs together with the Open Images dataset, and it operates on online video streams. Our experimental evaluation demonstrates the robustness of the approach on various published videos recorded in indoor environments. The system achieves precision and recall rates of 0.82 on this dataset. Furthermore, we show that our system can be used for online human motion prediction in robotic applications.
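The two-stage idea sketched in the abstract, per-frame candidate detection from human pose and object positions followed by temporal filtering of the interaction likelihood, can be illustrated roughly as follows. All names, thresholds, and the smoothing rule here are hypothetical simplifications for illustration, not the authors' actual formulation:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One frame's observation (hypothetical format): a hand keypoint from a
    pose estimator and the center of a detected object's bounding box."""
    hand: tuple        # (x, y) pixel position of a hand keypoint
    obj_center: tuple  # (x, y) pixel center of the object's bounding box

def is_candidate(d: Detection, dist_thresh: float = 40.0) -> bool:
    """Flag a candidate interaction when the hand is close to the object."""
    dx = d.hand[0] - d.obj_center[0]
    dy = d.hand[1] - d.obj_center[1]
    return (dx * dx + dy * dy) ** 0.5 <= dist_thresh

def track_interaction(frames, alpha: float = 0.3, confirm: float = 0.8) -> bool:
    """Accumulate a smoothed interaction likelihood over subsequent frames and
    confirm the interaction only once it exceeds a threshold. The smoothing
    suppresses single-frame false positives and false negatives."""
    likelihood = 0.0
    for d in frames:
        obs = 1.0 if is_candidate(d) else 0.0
        # Exponential smoothing: one noisy frame barely moves the estimate.
        likelihood = (1 - alpha) * likelihood + alpha * obs
        if likelihood >= confirm:
            return True
    return False
```

With these parameters, a hand resting near an object for several consecutive frames confirms an interaction, while a one-frame spurious detection decays away without ever reaching the confirmation threshold.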
This work has been supported by the DFG Research Unit FOR 2535 Anticipating Human Behavior.
Notes
- 2. Videos from our dataset are available at https://www.hrl.uni-bonn.de/icsr2019.
- 3. A video showing the capabilities of this approach can be found at https://www.hrl.uni-bonn.de/icsr_application_demo.mp4.
Acknowledgments
We would like to thank Nils Dengler, Sandra Höltervennhoff, Sophie Jenke, Saskia Rabich, Jenny Mack, Marco Pinno, Mosadeq Saljoki and Dominik Wührer for their help during our experiments.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Bruckschen, L., Amft, S., Tanke, J., Gall, J., Bennewitz, M. (2019). Detection of Generic Human-Object Interactions in Video Streams. In: Salichs, M., et al. (eds.) Social Robotics. ICSR 2019. Lecture Notes in Computer Science, vol. 11876. Springer, Cham. https://doi.org/10.1007/978-3-030-35888-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35887-7
Online ISBN: 978-3-030-35888-4
eBook Packages: Computer Science, Computer Science (R0)