Biological Cybernetics, Volume 108, Issue 5, pp 603–619

Learning strategies in table tennis using inverse reinforcement learning

  • Katharina Muelling
  • Abdeslam Boularias
  • Betty Mohler
  • Bernhard Schölkopf
  • Jan Peters
Original Paper


Learning a complex task such as table tennis is a challenging problem for both robots and humans. Even after acquiring the necessary motor skills, a strategy is needed to choose where and how to return the ball to the opponent’s court in order to win the game. The data-driven identification of basic strategies in interactive tasks, such as table tennis, is a largely unexplored problem. In this paper, we suggest a computational model for representing and inferring strategies, based on a Markov decision problem, where the reward function models the goal of the task as well as the strategic information. We show how this reward function can be discovered from demonstrations of table tennis matches using model-free inverse reinforcement learning. The resulting framework allows us to identify the basic elements on which the selection of striking movements is based. We tested our approach on data collected from players with different playing styles and under different playing conditions. The estimated reward function was able to capture expert-specific strategic information that sufficed to distinguish the expert among players with different skill levels as well as different playing styles.
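The model-free inverse reinforcement learning setting described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes a linear reward R(s) = w·φ(s) and fits the weights w by gradient ascent so that the expert's discounted feature expectations are matched against a softmax distribution over sampled trajectories, in the spirit of maximum-entropy / relative-entropy IRL. The function names (`traj_features`, `relent_irl`) and the synthetic data are illustrative assumptions, not from the paper.

```python
import numpy as np

def traj_features(traj, gamma=0.9):
    # Discounted sum of per-state feature vectors along one trajectory.
    return sum((gamma ** t) * phi for t, phi in enumerate(traj))

def relent_irl(expert_trajs, sample_trajs, n_iters=200, lr=0.1, gamma=0.9):
    """Model-free IRL sketch: fit weights w of a linear reward
    R(s) = w . phi(s) so that the expert's feature expectations match
    a softmax distribution over sampled trajectories."""
    mu_e = np.mean([traj_features(t, gamma) for t in expert_trajs], axis=0)
    f_s = np.array([traj_features(t, gamma) for t in sample_trajs])
    w = np.zeros_like(mu_e)
    for _ in range(n_iters):
        scores = f_s @ w
        p = np.exp(scores - scores.max())   # numerically stable softmax
        p /= p.sum()
        grad = mu_e - p @ f_s               # gradient of the log-likelihood
        w += lr * grad
    return w
```

Under this sketch, features that occur more often in expert demonstrations than in the sampled trajectories receive larger reward weights, which is how strategic preferences (e.g., preferred target regions on the table) could be read off the learned reward.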


Keywords: Computational models of decision processes · Table tennis · Inverse reinforcement learning



We would like to thank Ekaterina Volkova for her support with the calibration and her advice on the motion suits and the VICON system, as well as Volker Grabe for his technical support in integrating the Kinect and VICON with ROS. We would also like to thank Dr. Tobias Meilinger for helpful comments on the psychological part of this experiment and Oliver Kroemer for proofreading this paper.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Katharina Muelling (1, 2, 4)
  • Abdeslam Boularias (1, 2)
  • Betty Mohler (3)
  • Bernhard Schölkopf (1)
  • Jan Peters (1, 4)

  1. Max Planck Institute for Intelligent Systems, Tuebingen, Germany
  2. Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
  3. Max Planck Institute for Biological Cybernetics, Tuebingen, Germany
  4. FG Intelligente Autonome Systeme, Technische Universität Darmstadt, Darmstadt, Germany
