Abstract
Learning to play table tennis is a challenging task for robots, as a wide variety of strokes is required. Recent advances have shown that deep reinforcement learning (RL) can successfully learn optimal actions in a simulated environment. However, the applicability of RL in real scenarios remains limited due to the high exploration effort. In this work, we propose a realistic simulation environment in which multiple models are built for the dynamics of the ball and the kinematics of the robot. Instead of training an end-to-end RL model, we propose a novel policy-gradient approach with a TD3 backbone that learns the racket strokes from the predicted state of the ball at hitting time. In the experiments, we show that the proposed approach significantly outperforms existing RL methods in simulation. Furthermore, to cross the domain gap from simulation to reality, we adopt an efficient retraining method and test it in three real scenarios. The resulting success rate is 98%, the distance error is around 24.9 cm, and the total training time is about 1.5 hours.
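To make the idea of "the predicted state of the ball at hitting time" concrete, the sketch below integrates a simple ball-flight model (gravity plus quadratic air drag, forward-Euler steps) until the ball crosses a fixed hitting plane. This is only an illustrative stand-in, not the paper's actual dynamics model: the drag coefficient `KD`, the step size, and the hitting-plane convention are all assumed values chosen for the example.

```python
import math

# Illustrative constants (not taken from the paper)
G = 9.81   # gravitational acceleration, m/s^2
KD = 0.1   # lumped quadratic-drag coefficient, 1/m (assumed)

def predict_state_at_plane(pos, vel, x_hit, dt=1e-3, t_max=2.0):
    """Integrate the ball's flight until it reaches the plane x = x_hit.

    pos, vel -- initial (x, y, z) position [m] and velocity [m/s]
    Returns (pos, vel, t) at the first step where x >= x_hit,
    or the state at t_max if the plane is never reached.
    """
    x, y, z = pos
    vx, vy, vz = vel
    t = 0.0
    while x < x_hit and t < t_max:
        speed = math.sqrt(vx * vx + vy * vy + vz * vz)
        # Quadratic drag opposes the velocity; gravity acts along -z.
        vx -= KD * speed * vx * dt
        vy -= KD * speed * vy * dt
        vz -= (KD * speed * vz + G) * dt
        x += vx * dt
        y += vy * dt
        z += vz * dt
        t += dt
    return (x, y, z), (vx, vy, vz), t

# Example: ball launched toward a hitting plane 1 m away.
pos, vel, t_hit = predict_state_at_plane(
    pos=(0.0, 0.0, 0.3), vel=(3.0, 0.0, 0.5), x_hit=1.0)
```

A stroke-learning policy of the kind described in the abstract would take the returned position and velocity at the plane as its observation and output racket pose and speed at contact.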
Acknowledgements
We acknowledge the support of the Vector Stiftung and the KUKA Robotics Corporation.
Cite this article
Gao, Y., Tebbe, J. & Zell, A. Optimal stroke learning with policy gradient approach for robotic table tennis. Appl Intell 53, 13309–13322 (2023). https://doi.org/10.1007/s10489-022-04131-w