QMGR-Net: quaternion multi-graph reasoning network for 3D hand pose estimation

Ni, Haomin; Xie, Shengli; Xu, Pingping; Fang, Xiaozhao; Sun, Weijun; Fang, Ribo

doi:10.1007/s13042-023-01879-6

Original Article
Published: 21 June 2023

Volume 14, pages 4029–4045, (2023)
International Journal of Machine Learning and Cybernetics

Haomin Ni¹,
Shengli Xie¹,
Pingping Xu²,
Xiaozhao Fang¹,
Weijun Sun¹ (ORCID: orcid.org/0000-0002-2342-4434) &
Ribo Fang³

Abstract

3D hand pose estimation (HPE) is the task of locating the 3D joints of the hand from static images or video frames, and it is an important topic in human-computer interaction and computer vision. Although deep learning has substantially improved accuracy, previous CNN-based methods still suffer from two inherent drawbacks. First, real-valued CNNs break the original structure of RGB images and learn color feature dependencies and hidden relations jointly, which makes prediction difficult when skin appearance is similar. Second, the feature representation of each pixel is learned only by local convolution, so global context information is ignored. To alleviate these problems, we propose a deep learning architecture named Quaternion Multi-Graph Reasoning Network (QMGR-Net), which contains two novel modules: the Quaternion Separation Learning Module (RSLM) and the Global Quaternion Reasoning Module (GQRM). The RSLM learns color feature dependencies and hidden relations, such as edges or shapes, at different levels, with features learned in the quaternion space. The GQRM explicitly captures the interactions among hand joints and models the relationships between non-adjacent nodes. Compared with other state-of-the-art (SOTA) methods, we achieve the best results on the STB and RHD datasets, with EPEs of 7.27 and 18.59, respectively, which demonstrates the effectiveness of our method.
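
To make the idea of learning RGB features "in the quaternion space" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common convention of encoding a pixel as a pure quaternion (zero real part, with R, G, B as the three imaginary parts) and shows the Hamilton product on which quaternion-valued layers are generally built. The names `rgb_to_quaternion` and `hamilton_product` are hypothetical and chosen only for illustration.

```python
from typing import Tuple

# A quaternion a + b*i + c*j + d*k stored as (a, b, c, d).
Quaternion = Tuple[float, float, float, float]


def rgb_to_quaternion(r: float, g: float, b: float) -> Quaternion:
    """Encode one RGB pixel as a pure quaternion 0 + r*i + g*j + b*k.

    Assumption: this is the usual encoding in quaternion image
    processing, not a detail confirmed by the abstract.
    """
    return (0.0, r, g, b)


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Return the Hamilton product p * q (note: not commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )


if __name__ == "__main__":
    pixel = rgb_to_quaternion(0.5, 0.25, 0.125)
    weight = (1.0, 0.0, 0.0, 0.0)  # hypothetical weight: the identity quaternion
    # Multiplying by the identity leaves the pixel unchanged.
    print(hamilton_product(weight, pixel))
```

Because a single quaternion weight multiplies all three color channels at once, the channels are mixed through one shared set of four parameters rather than through independent real-valued weights, which is the usual motivation for treating a color pixel as one entity.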

Acknowledgements

This work was supported in part by the National Key Research and Development Project under Grant 2018YFB1802400, and by the Key-Area Research and Development Program of Guangdong Province under Grants 2019B010121001 and 2019B010118001.

Author information

Haomin Ni, Shengli Xie, Pingping Xu, Xiaozhao Fang, and Ribo Fang are contributing authors.

Authors and Affiliations

School of Automation, Guangdong University of Technology, No. 100 Waihuan Xi Road, Guangzhou, 510000, Guangdong, China
Haomin Ni, Shengli Xie, Xiaozhao Fang & Weijun Sun
School of Computer Science and Technology, Guangdong University of Technology, No. 100 Waihuan Xi Road, Guangzhou, 510000, Guangdong, China
Pingping Xu
Huizhou Innovation Research Institute for Next Generation Industrial Internet, Tonghu ecological wisdom District, Huizhou, 512200, Guangdong, China
Ribo Fang

Corresponding author

Correspondence to Weijun Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ni, H., Xie, S., Xu, P. et al. QMGR-Net: quaternion multi-graph reasoning network for 3D hand pose estimation. Int. J. Mach. Learn. & Cyber. 14, 4029–4045 (2023). https://doi.org/10.1007/s13042-023-01879-6

Received: 25 July 2022
Accepted: 23 May 2023
Published: 21 June 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s13042-023-01879-6
