QMGR-Net: quaternion multi-graph reasoning network for 3D hand pose estimation

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

3D Hand Pose Estimation (HPE) is the task of recognizing the 3D joints of the hand from static images or dynamic video frames, and it is an important topic in human-computer interaction and computer vision. Although the rise of deep learning has improved accuracy considerably, previous CNN-based methods still suffer from two inherent drawbacks. On the one hand, real-valued CNNs break the original structure of RGB images and learn color feature dependencies and hidden relations in a mixed manner, which makes prediction difficult when skin is similar. On the other hand, the feature representation of each pixel is learned only by local convolution, so global context information is ignored. To alleviate these problems, we propose a deep learning architecture named Quaternion Multi-Graph Reasoning Network (QMGR-Net), which contains two novel modules: the Quaternion Separation Learning Module (RSLM) and the Global Quaternion Reasoning Module (GQRM). The RSLM learns color feature dependencies and hidden relations, such as edges or shapes, at different levels, with features learned in the quaternion space. The GQRM explicitly captures the interactions among hand joints and models the relationships between non-adjacent nodes. Compared with other state-of-the-art (SOTA) methods on the STB and RHD datasets, we achieve the best results, with 7.27 EPE and 18.59 EPE, respectively, which demonstrates the effectiveness of our method.
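The abstract does not spell out the quaternion operations or the EPE metric, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that quaternion features are combined with the standard Hamilton product, that an RGB pixel can be embedded as a pure quaternion with a zero real part, and that EPE is the mean Euclidean distance between predicted and ground-truth 3D joints. The names `hamilton_product`, `rgb_to_quaternion`, and `mean_epe` are hypothetical and chosen for illustration.

```python
import math

def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )

def rgb_to_quaternion(r, g, b):
    """Assumed encoding: an RGB pixel as a pure quaternion 0 + r*i + g*j + b*k."""
    return (0.0, float(r), float(g), float(b))

def mean_epe(pred_joints, gt_joints):
    """Assumed EPE definition: mean Euclidean distance over corresponding 3D joints."""
    if len(pred_joints) != len(gt_joints) or not pred_joints:
        raise ValueError("joint lists must be non-empty and of equal length")
    distances = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(distances) / len(distances)

if __name__ == "__main__":
    i = (0, 1, 0, 0)
    j = (0, 0, 1, 0)
    print(hamilton_product(i, j))  # i * j
    print(hamilton_product(j, i))  # j * i, showing non-commutativity
    pixel = rgb_to_quaternion(0.8, 0.6, 0.5)
    print(hamilton_product((1, 0, 0, 0), pixel))  # identity weight leaves the pixel unchanged
    pred = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    gt = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
    print(mean_epe(pred, gt))
```

Because the Hamilton product mixes all four components of both operands, a single quaternion weight acts on the three color channels jointly rather than as independent scalars, which is the usual motivation for treating an RGB pixel as one quaternion entity.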

Acknowledgements

This work was supported in part by the National Key Research and Development Project under Grant 2018YFB1802400 and by the Key-Area Research and Development Program of Guangdong Province under Grants 2019B010121001 and 2019B010118001.

Author information

Corresponding author

Correspondence to Weijun Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ni, H., Xie, S., Xu, P. et al. QMGR-Net: quaternion multi-graph reasoning network for 3D hand pose estimation. Int. J. Mach. Learn. & Cyber. 14, 4029–4045 (2023). https://doi.org/10.1007/s13042-023-01879-6

  • DOI: https://doi.org/10.1007/s13042-023-01879-6
