
Detection of Generic Human-Object Interactions in Video Streams

  • Conference paper
Social Robotics (ICSR 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11876)

Abstract

The detection of human-object interactions is a key component in many applications, including activity recognition, human intention understanding, and the prediction of human movements. In this paper, we propose a novel framework to detect such interactions in RGB-D video streams based on spatio-temporal and pose information. Our system first detects possible human-object interactions using position and pose data of humans and objects. To counter false positive and false negative detections, we estimate the likelihood that such an interaction actually occurs by tracking it over subsequent frames. Previous work mainly focused on detecting specific activities with the objects involved in short prerecorded video clips. In contrast, our framework can find arbitrary interactions with 510 different objects, exploiting the detection capabilities of R-CNNs as well as the Open Images dataset, and can operate on online video streams. Our experimental evaluation demonstrates the robustness of the approach on various published videos recorded in indoor environments. The system achieves precision and recall rates of 0.82 on this dataset. Furthermore, we show that our system can be used for online human motion prediction in robotic applications.
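The temporal filtering step described in the abstract, in which per-frame candidate detections are confirmed only once they persist over subsequent frames, can be illustrated with a minimal sketch. Note that this is an assumption-laden illustration, not the authors' implementation: the class name, the sliding-window formulation, and the thresholds are all hypothetical.

```python
from collections import defaultdict, deque

class InteractionFilter:
    """Hypothetical sketch of temporal consistency filtering: a candidate
    (person, object) interaction is confirmed only if it appears in a
    sufficient fraction of the most recent frames."""

    def __init__(self, window=15, min_ratio=0.6):
        self.window = window        # number of recent frames considered
        self.min_ratio = min_ratio  # fraction of hits required to confirm
        # per-pair history of booleans (detected in frame or not)
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, frame_detections):
        """frame_detections: set of (person_id, object_id) candidate pairs
        detected in the current frame. Returns the confirmed pairs."""
        confirmed = set()
        # update both currently detected pairs and previously seen ones,
        # so that absent pairs accumulate misses and eventually drop out
        for pair in set(self.history) | frame_detections:
            self.history[pair].append(pair in frame_detections)
            hits = sum(self.history[pair])
            if hits / self.window >= self.min_ratio:
                confirmed.add(pair)
        return confirmed
```

Dividing by the full window size rather than the current history length gives a deliberate warm-up: a pair must be observed in several frames before it can be confirmed, which suppresses single-frame false positives, while brief detection dropouts do not immediately revoke a confirmed interaction.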

This work has been supported by the DFG Research Unit FOR 2535 Anticipating Human Behavior.


Notes

  1. https://www.hrl.uni-bonn.de/icsr_interaction_demo.mp4.

  2. Videos from our dataset are available at https://www.hrl.uni-bonn.de/icsr2019.

  3. A video showing the capabilities of this approach can be found at https://www.hrl.uni-bonn.de/icsr_application_demo.mp4.


Acknowledgments

We would like to thank Nils Dengler, Sandra Höltervennhoff, Sophie Jenke, Saskia Rabich, Jenny Mack, Marco Pinno, Mosadeq Saljoki and Dominik Wührer for their help during our experiments.

Author information

Corresponding author

Correspondence to Lilli Bruckschen.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Bruckschen, L., Amft, S., Tanke, J., Gall, J., Bennewitz, M. (2019). Detection of Generic Human-Object Interactions in Video Streams. In: Salichs, M., et al. Social Robotics. ICSR 2019. Lecture Notes in Computer Science, vol 11876. Springer, Cham. https://doi.org/10.1007/978-3-030-35888-4_11

  • DOI: https://doi.org/10.1007/978-3-030-35888-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35887-7

  • Online ISBN: 978-3-030-35888-4

  • eBook Packages: Computer Science, Computer Science (R0)
