Point-to-Point Regression PointNet for 3D Hand Pose Estimation

  • Liuhao Ge
  • Zhou Ren
  • Junsong Yuan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11217)

Abstract

Convolutional Neural Network (CNN)-based methods for 3D hand pose estimation with depth cameras usually take 2D depth images as input and directly regress the holistic 3D hand pose. Unlike these methods, our proposed Point-to-Point Regression PointNet directly takes the 3D point cloud as input and outputs point-wise estimations, i.e., heat-maps and unit vector fields on the point cloud, representing the closeness and direction from every point in the point cloud to each hand joint. These point-wise estimations are used to infer the 3D joint locations by weighted fusion. To better capture 3D spatial information in the point cloud, we apply a stacked network architecture for PointNet with intermediate supervision, which is trained end-to-end. Experiments show that our method achieves outstanding results compared with state-of-the-art methods on three challenging hand pose datasets.
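The weighted-fusion step described above can be sketched as follows. This is an illustrative approximation, not the authors' implementation: the linear heat-map encoding `h = max(0, 1 - d/radius)`, the `radius` value, the top-k selection, and all function names are assumptions made for the example.

```python
import numpy as np

def fuse_joint_location(points, heat, unit_vec, radius=0.08, top_k=64):
    """Infer one 3D joint location from point-wise estimations.

    points:   (N, 3) input point cloud
    heat:     (N,)   heat-map value per point (closeness to the joint)
    unit_vec: (N, 3) unit vector per point, pointing toward the joint
    """
    # Keep only the points with the highest heat values, i.e. those
    # predicted to lie closest to the joint.
    idx = np.argsort(heat)[-top_k:]
    p, h, v = points[idx], heat[idx], unit_vec[idx]
    # Assumed inverse of a linear heat encoding h = max(0, 1 - d/radius):
    # recover each point's predicted distance d to the joint.
    d = radius * (1.0 - h)
    # Each point "votes" for the joint at its own position plus d along
    # its predicted direction; votes are fused with heat values as weights.
    votes = p + d[:, None] * v
    w = np.clip(h, 1e-6, None)
    return (w[:, None] * votes).sum(axis=0) / w.sum()
```

With accurate point-wise predictions, every selected point votes for nearly the same location, so the weighted average is robust to a few noisy estimates.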

Keywords

3D hand pose estimation 

Acknowledgment

This research is supported by the BeingTogether Centre, a collaboration between NTU Singapore and UNC at Chapel Hill. The BeingTogether Centre is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its International Research Centres in Singapore Funding Initiative. This work is also supported in part by Singapore Ministry of Education Academic Research Fund Tier 2 MOE2015-T2-2-114, start-up grants from University at Buffalo, and a gift grant from Snap Inc.

Supplementary material

Supplementary material 1: 474201_1_En_29_MOESM1_ESM.pdf (9.1 MB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Institute for Media Innovation, Interdisciplinary Graduate School, Nanyang Technological University, Singapore, Singapore
  2. Snap Inc., Venice, USA
  3. Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, USA