Abstract
Humans regularly interact with their surrounding objects, and such interactions often produce strongly correlated motions between the human and the interacting object. We thus ask: “Is it possible to infer object properties from skeletal motion alone, even without seeing the interacting object itself?” In this paper, we present a fine-grained action recognition method that learns to infer such latent object properties from human interaction motion alone. This inference allows us to disentangle the motion from the object property and to transfer object properties to a given motion. We collected a large number of videos and 3D skeletal motions of performing actors using an inertial motion capture device. By analyzing similar actions and learning the subtle differences between them, we reveal latent properties of the interacting objects. In particular, we learned to identify the interacting object, estimate its weight, and assess its spillability. Our results clearly demonstrate that motions and interacting objects are highly correlated, and that the related latent object properties can be inferred from 3D skeleton sequences alone, leading to new synthesis possibilities for motions involving human interaction. Our dataset is available at http://vcc.szu.edu.cn/research/2020/IT.html.
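The core idea of the abstract can be illustrated with a minimal sketch: extract simple kinematic statistics from a 3D skeleton sequence and classify a latent object property (here, a coarse “weight” class) from motion alone. The feature choice, the nearest-centroid classifier, and the centroid values below are illustrative assumptions for exposition, not the authors' actual learned model.

```python
# Hypothetical sketch: inferring a latent object property (coarse weight class)
# from a 3D skeleton sequence alone. Features and classifier are assumptions,
# not the paper's network.
from statistics import mean

def kinematic_features(seq):
    """seq: list of frames; each frame is a list of (x, y, z) joint positions.
    Returns simple motion statistics: (mean, peak) per-frame joint speed."""
    speeds = []
    for prev, cur in zip(seq, seq[1:]):
        frame_speed = mean(
            sum((c - p) ** 2 for c, p in zip(cj, pj)) ** 0.5
            for pj, cj in zip(prev, cur)
        )
        speeds.append(frame_speed)
    return (mean(speeds), max(speeds))

def nearest_centroid(features, centroids):
    """Classify by Euclidean distance to per-class feature centroids."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(centroids, key=lambda label: dist(features, centroids[label]))

# Hypothetical usage: a fast arm trajectory suggests a light object.
lift = [[(0.0, 0.0, 0.0)], [(0.2, 0.1, 0.0)], [(0.4, 0.2, 0.0)]]
centroids = {"light": (0.25, 0.3), "heavy": (0.04, 0.06)}  # assumed, from training
print(nearest_centroid(kinematic_features(lift), centroids))  # prints: light
```

In the paper itself, hand-crafted statistics like these are replaced by a learned representation over the full skeleton sequence, but the pipeline shape is the same: motion in, latent object property out.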
Acknowledgements
We sincerely thank the reviewers for their valuable comments. This work was supported in part by Shenzhen Innovation Program (JCYJ20180305125709986), National Natural Science Foundation of China (61861130365, 61761146002), GD Science and Technology Program (2020A0505100064, 2015A030312015), and DEGP Key Project (2018KZDXM058).
Author information
Qian Zheng received her doctoral degree in computer science from Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences, in 2015. She is an assistant professor in the College of Computer Science and Software Engineering, Shenzhen University. Her interests include computer graphics and information visualization.
Weikai Wu is a software engineer at TCL. He received his M.S. degree in computer science from Shenzhen University in 2020.
Hanting Pan is a software engineer at Orbbec. He received his M.S. degree in computer science from Shenzhen University in 2020.
Niloy Mitra leads the Smart Geometry Processing Group in the Department of Computer Science at University College London. He received his Ph.D. degree from Stanford University under the guidance of Leonidas Guibas. His research interests include shape analysis, creativeAI, and computational design and fabrication. Niloy received the Eurographics Outstanding Technical Contributions Award in 2019, the BCS Roger Needham Award in 2015, and the ACM Siggraph Significant New Researcher Award in 2013.
Daniel Cohen-Or is a professor in the School of Computer Science, Tel Aviv University. He received his Ph.D. degree from the State University of New York at Stony Brook in 1991. He was the recipient of a Eurographics Outstanding Technical Contributions Award in 2005, and an ACM SIGGRAPH Computer Graphics Achievement Award in 2018. In 2019 he won a Kadar Family Award for Outstanding Research. In 2020 he received a Eurographics Distinguished Career Award. His research interests are in computer graphics, in particular, synthesis, processing, and modeling techniques.
Hui Huang is a Distinguished TFA Professor at Shenzhen University, where she directs the Visual Computing Research Center. She received her Ph.D. degree in applied math from The University of British Columbia in 2008. Her research interests span computer graphics, 3D vision, and visualization. She is currently a Senior Member of IEEE/ACM/CSIG, a Distinguished Member of CCF, and is on the editorial board of ACM Trans. on Graphics and Computers & Graphics.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zheng, Q., Wu, W., Pan, H. et al. Inferring object properties from human interaction and transferring them to new motions. Comp. Visual Media 7, 375–392 (2021). https://doi.org/10.1007/s41095-021-0218-8