Abstract
Learning to play table tennis is a challenging task for robots, as a wide variety of strokes is required. Recent advances have shown that deep reinforcement learning (RL) can successfully learn optimal actions in a simulated environment. However, the applicability of RL in real scenarios remains limited due to the high exploration effort. In this work, we propose a realistic simulation environment in which multiple models are built for the dynamics of the ball and the kinematics of the robot. Instead of training an end-to-end RL model, we propose a novel policy-gradient approach with a TD3 backbone that learns the racket strokes from the predicted state of the ball at hitting time. In the experiments, we show that the proposed approach significantly outperforms existing RL methods in simulation. Furthermore, to cross the domain gap from simulation to reality, we adopt an efficient retraining method and test it in three real scenarios. The resulting success rate is 98%, the distance error is around 24.9 cm, and the total training time is about 1.5 hours.
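To make the idea of "the predicted state of the ball at hitting time" concrete, the sketch below integrates a simple ball-flight model (gravity plus quadratic air drag, forward-Euler steps) until the ball crosses a fixed hitting plane. This is only an illustrative stand-in, not the paper's actual dynamics model: the drag coefficient `KD`, the step size, and the hitting-plane convention are all assumed values chosen for the example.

```python
import math

# Illustrative constants (not taken from the paper)
G = 9.81   # gravitational acceleration, m/s^2
KD = 0.1   # lumped quadratic-drag coefficient, 1/m (assumed)

def predict_state_at_plane(pos, vel, x_hit, dt=1e-3, t_max=2.0):
    """Integrate the ball's flight until it reaches the plane x = x_hit.

    pos, vel -- initial (x, y, z) position [m] and velocity [m/s]
    Returns (pos, vel, t) at the first step where x >= x_hit,
    or the state at t_max if the plane is never reached.
    """
    x, y, z = pos
    vx, vy, vz = vel
    t = 0.0
    while x < x_hit and t < t_max:
        speed = math.sqrt(vx * vx + vy * vy + vz * vz)
        # Quadratic drag opposes the velocity; gravity acts along -z.
        vx -= KD * speed * vx * dt
        vy -= KD * speed * vy * dt
        vz -= (KD * speed * vz + G) * dt
        x += vx * dt
        y += vy * dt
        z += vz * dt
        t += dt
    return (x, y, z), (vx, vy, vz), t

# Example: ball launched toward a hitting plane 1 m away.
pos, vel, t_hit = predict_state_at_plane(
    pos=(0.0, 0.0, 0.3), vel=(3.0, 0.0, 0.5), x_hit=1.0)
```

A stroke-learning policy of the kind described in the abstract would take the returned position and velocity at the plane as its observation and output racket pose and speed at contact.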
Acknowledgements
We acknowledge the support of the Vector Stiftung and the KUKA Robotics Corporation.
Cite this article
Gao, Y., Tebbe, J. & Zell, A. Optimal stroke learning with policy gradient approach for robotic table tennis. Appl Intell 53, 13309–13322 (2023). https://doi.org/10.1007/s10489-022-04131-w