
Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Human-Object Interaction (HOI) detection plays a crucial role in activity understanding. Though significant progress has been made, interactiveness learning remains a challenging problem in HOI detection: existing methods usually generate redundant negative H-O pair proposals and fail to effectively extract interactive pairs. Though interactiveness has been studied at both the whole-body and part levels and facilitates H-O pairing, previous works focus on only one target person at a time (i.e., from a local perspective) and overlook the information carried by the other persons in the image. In this paper, we argue that comparing the body-parts of multiple persons simultaneously affords more useful and supplementary interactiveness cues. That is, we learn body-part interactiveness from a global perspective: when classifying a target person's body-part interactiveness, visual cues are explored not only from that person but also from the other persons in the image. We construct body-part saliency maps based on self-attention to mine cross-person informative cues and learn the holistic relationships between all the body-parts. We evaluate the proposed method on the widely used benchmarks HICO-DET and V-COCO. With this new perspective, our holistic global-local body-part interactiveness learning achieves significant improvements over the state-of-the-art. Our code is available at https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness.
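The core idea of mining cross-person cues can be sketched as self-attention over the pooled set of all persons' body-part features, so that each part token attends to every part of every person in the image. The sketch below is a rough illustration only, not the paper's actual architecture: the shapes, the identity Q/K/V projections, and the function name `cross_person_part_attention` are all assumptions for demonstration (a real model would use learned projections and the paper's body-part saliency maps).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_person_part_attention(part_feats):
    """Toy self-attention over all persons' body-part features.

    part_feats: array of shape (P, B, D) -- P persons, B body parts per
    person, D-dimensional features. Every part token attends to every part
    of every person, i.e. cross-person cues are mixed in, and an updated
    feature map of the same shape is returned.
    """
    P, B, D = part_feats.shape
    tokens = part_feats.reshape(P * B, D)      # flatten persons x parts
    # Identity projections for Q, K, V keep the sketch minimal;
    # a trained model would learn these as linear layers.
    scores = tokens @ tokens.T / np.sqrt(D)    # (P*B, P*B) similarities
    attn = softmax(scores, axis=-1)            # rows sum to 1
    out = attn @ tokens                        # convex mix of all parts
    return out.reshape(P, B, D)

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 6, 16))            # 3 persons, 6 parts, 16-dim
updated = cross_person_part_attention(feats)
print(updated.shape)                           # (3, 6, 16)
```

Because attention runs over the flattened person-part axis, a target person's part representation is refined by visually similar parts of other people, which is the "global perspective" the abstract contrasts with per-person (local) interactiveness learning.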

X. Wu and Y.-L. Li contributed equally to this work.

C. Lu is a member of the Qing Yuan Research Institute and the Shanghai Qi Zhi Institute.




Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2021ZD0110700), Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), Shanghai Qi Zhi Institute, and SHEITC (2018-RGZN-02046).

Author information


Corresponding author

Correspondence to Cewu Lu.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4217 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wu, X., Li, Y.-L., Liu, X., Zhang, J., Wu, Y., Lu, C. (2022). Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13664. Springer, Cham. https://doi.org/10.1007/978-3-031-19772-7_8


  • DOI: https://doi.org/10.1007/978-3-031-19772-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19771-0

  • Online ISBN: 978-3-031-19772-7

  • eBook Packages: Computer Science (R0)
