G-SAM: A Robust One-Shot Keypoint Detection Framework for PnP Based Robot Pose Estimation

  • Regular paper
Journal of Intelligent & Robotic Systems

Abstract

Robot pose estimation plays a fundamental role in many applications of service and industrial robots. Among the methods that estimate robot pose from a single image, the Perspective-n-Point (PnP) based approach is widely used because of its efficiency. Keypoint detection is a central component of this framework. However, the keypoint detection modules currently used for PnP suffer from two problems: too few input keypoints and large errors in the input keypoints. This paper proposes a Grouping and Soft-ArgMax (G-SAM) framework to address both problems. First, a simple yet powerful Soft-ArgMax module, followed by point subset selection, is designed to address the shortage of input keypoints. Second, a grouping module that takes into account the texture and spatial structure of the robot is introduced to reduce the error of the input keypoints. Extensive experiments comparing the proposed framework with existing state-of-the-art methods on several public datasets demonstrate that it can provide more reliable, more accurate and faster pose estimation for robotic applications.
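The general Soft-ArgMax idea mentioned in the abstract can be illustrated with a short sketch. It is a minimal illustration only and is not the G-SAM implementation. It assumes a single-channel keypoint heatmap stored as nested Python lists, and the function name `soft_argmax_2d` and the temperature `beta` are hypothetical. A hard argmax returns the integer location of the highest heatmap value. Soft-ArgMax instead normalizes the heatmap with a softmax and returns the expected coordinate. That coordinate is differentiable and can fall between pixels.

```python
import math
from typing import List, Tuple


def soft_argmax_2d(heatmap: List[List[float]], beta: float = 1.0) -> Tuple[float, float]:
    """Return the expected (x, y) location under a softmax over a 2D heatmap.

    Hypothetical helper for illustration: `beta` is an assumed temperature;
    larger values push the result toward the hard argmax.
    """
    if not heatmap or not heatmap[0]:
        raise ValueError("heatmap must be a non-empty 2D list")
    # Subtract the maximum value before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    # Expected column index (x) and row index (y) under the normalized weights.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


if __name__ == "__main__":
    # Two equal, horizontally adjacent peaks: the result lies between them.
    hm = [[0, 0, 0],
          [0, 5, 5],
          [0, 0, 0]]
    x, y = soft_argmax_2d(hm, beta=10.0)
    print(round(x, 3), round(y, 3))  # prints: 1.5 1.0
```

In the usage example, a hard argmax would have to pick one of the two tied cells. The sketch returns a sub-pixel point halfway between them. Raising `beta` sharpens the softmax, so a single dominant peak is recovered almost exactly.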

Data Availability

Not Applicable.

Code Availability

Not Applicable.

Acknowledgements

We would like to thank the anonymous reviewers.

Funding

This work received funding from the National Key R&D Program of China (2020AAA0108304), the National Natural Science Foundation of China (No. 62171288), the Shenzhen University 2035 Program for Excellent Research (00000224) and the Open Research Fund from the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).

Author information

Contributions

All authors contributed to the study conception and design, material preparation, and the writing and review of the paper. The first draft of the manuscript was written by Xiaopin Zhong and revised by Weixiang Liu. The experimental work was mainly carried out by Wenxuan Zhu. Jianye Yi, Chengxiang Liu and Zongze Wu contributed to the experimental design and the writing of the manuscript by participating in discussions and providing valuable insights. All authors commented on previous versions of the manuscript and approved the final manuscript. Supervision was performed mainly by Xiaopin Zhong and Weixiang Liu.

Corresponding author

Correspondence to Weixiang Liu.

Ethics declarations

Ethics Approval

This is purely a review paper. The research team involved in this study confirms that no ethical approval is required.

Consent to Participate

Not Applicable.

Consent for Publication

Not Applicable.

Conflict of Interest

Not Applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiaopin Zhong and Wenxuan Zhu contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhong, X., Zhu, W., Liu, W. et al. G-SAM: A Robust One-Shot Keypoint Detection Framework for PnP Based Robot Pose Estimation. J Intell Robot Syst 109, 28 (2023). https://doi.org/10.1007/s10846-023-01957-5

  • DOI: https://doi.org/10.1007/s10846-023-01957-5
