
Optimal stroke learning with policy gradient approach for robotic table tennis

Published in: Applied Intelligence

Abstract

Learning to play table tennis is a challenging task for robots, as a wide variety of strokes is required. Recent advances have shown that deep reinforcement learning (RL) can successfully learn optimal actions in a simulated environment. However, the applicability of RL in real scenarios remains limited by the high cost of exploration. In this work, we propose a realistic simulation environment in which multiple models are built for the dynamics of the ball and the kinematics of the robot. Instead of training an end-to-end RL model, we propose a novel policy gradient approach with a TD3 backbone that learns racket strokes from the predicted state of the ball at hitting time. In our experiments, the proposed approach significantly outperforms existing RL methods in simulation. Furthermore, to cross the domain gap from simulation to reality, we adopt an efficient retraining method and test it in three real scenarios, achieving a success rate of 98% and a distance error of around 24.9 cm, with a total training time of about 1.5 hours.
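To make the setup concrete, the following is a minimal sketch, not the authors' implementation, of how a TD3-style stroke policy conditioned on the predicted ball state could look in PyTorch. The state and action dimensions, network sizes, and the single-step return target are illustrative assumptions; the paper's actual state and stroke parameterization may differ.

```python
# Minimal sketch (assumptions, not the authors' code): a TD3-style actor
# maps the predicted ball state at hitting time to racket stroke parameters.
import torch
import torch.nn as nn

BALL_STATE_DIM = 6  # assumed: predicted ball position (x, y, z) and velocity at hit time
STROKE_DIM = 5      # assumed: racket orientation (pitch, yaw) and velocity (vx, vy, vz)

class Actor(nn.Module):
    """Deterministic policy: predicted ball state -> racket stroke."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(BALL_STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, STROKE_DIM), nn.Tanh(),  # stroke scaled to [-1, 1]
        )

    def forward(self, ball_state):
        return self.net(ball_state)

class Critic(nn.Module):
    """Q(s, a): scores a candidate stroke for a given predicted ball state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(BALL_STATE_DIM + STROKE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, ball_state, stroke):
        return self.net(torch.cat([ball_state, stroke], dim=-1))

# TD3 keeps twin critics and takes the element-wise minimum to curb
# Q-value overestimation. In full TD3 the critics would be trained on a
# replay buffer and frozen during the actor update; this is omitted here.
actor, critic1, critic2 = Actor(), Critic(), Critic()

ball_state = torch.randn(32, BALL_STATE_DIM)  # dummy batch of predicted states
stroke = actor(ball_state)
q = torch.min(critic1(ball_state, stroke), critic2(ball_state, stroke))
actor_loss = -q.mean()  # deterministic policy gradient: ascend Q
actor_loss.backward()
```

If, as the abstract suggests, a stroke is chosen once per incoming ball, each transition is effectively single-step, so the critic target reduces to the immediate reward (e.g., a function of landing position error), which is what makes this formulation far more sample-efficient than an end-to-end control policy.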



Acknowledgements

We acknowledge the support of the Vector Stiftung and the KUKA Robotics Corporation.

Author information

Corresponding author: Yapeng Gao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gao, Y., Tebbe, J. & Zell, A. Optimal stroke learning with policy gradient approach for robotic table tennis. Appl Intell 53, 13309–13322 (2023). https://doi.org/10.1007/s10489-022-04131-w

