Reinforcement Learning in Order to Control Biomechanical Models

  • Simon Gottschalk
  • Michael Burger
Conference paper
Part of the Mathematics in Industry book series (MATHINDUSTRY, volume 30)


These days, techniques from the research field of Artificial Intelligence (AI) are widely applied. Researchers increasingly recognize the potential of these techniques both for new types of tasks and for problems that have been studied for years and, so far, tackled with well-established solution methods. We focus on Reinforcement Learning (RL) [14] in the context of optimal control problems. We point out the similarities and differences between RL and classical optimal control approaches and stress the advantages of RL when applied to biomechanical systems.
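The policy-search view of optimal control mentioned above can be illustrated with a minimal sketch of the classical REINFORCE policy-gradient algorithm [16] on a toy one-dimensional regulation task. The dynamics, reward, policy parametrization, and all hyperparameters below are illustrative assumptions for this sketch, not the biomechanical model or the algorithm used in the paper.

```python
import numpy as np

# Minimal REINFORCE policy-gradient sketch on a toy 1-D control task.
# Policy: u ~ N(theta * x, SIGMA^2); dynamics: x' = x + 0.1 * u;
# reward per step: -x^2 (drive the state to zero). All of this is an
# illustrative assumption, not the setup of the paper.

rng = np.random.default_rng(0)
SIGMA = 0.1  # fixed exploration noise of the Gaussian policy

def rollout(theta, horizon=20):
    """Run one episode; return the score-function sum and the episode return."""
    x, glp, ret = 1.0, 0.0, 0.0
    for _ in range(horizon):
        u = theta * x + SIGMA * rng.standard_normal()
        # d/dtheta log N(u; theta*x, SIGMA^2), evaluated at the visited state
        glp += (u - theta * x) * x / SIGMA**2
        x = x + 0.1 * u
        ret -= x**2
    return glp, ret

theta, lr = 0.0, 1e-3
for _ in range(500):
    samples = [rollout(theta) for _ in range(16)]
    glps = np.array([g for g, _ in samples])
    rets = np.array([r for _, r in samples])
    # subtracting a baseline (mean return) reduces the gradient variance
    theta += lr * np.mean(glps * (rets - rets.mean()))

print(theta)  # a negative feedback gain stabilizes the system
```

Whereas a classical optimal control method would exploit the known dynamics, this estimator only needs sampled trajectories; trust-region and proximal methods such as TRPO [11] and PPO [13] refine the same basic gradient estimate.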



The authors are grateful for the funding by the Federal Ministry of Education and Research of Germany (BMBF), project number 05M16UKD.

Fig. 3: Activations of the Hill's muscle models

Fig. 4: State of one trajectory executed after the training


  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). Software available from tensorflow.org
  2. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI Gym. CoRR (2016). abs/1606.01540
  3. Coady, P.: AI Gym workout (2017). Cited 26 Oct 2018
  4. Deisenroth, M., Rasmussen, C.: PILCO: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th International Conference on Machine Learning, ICML, pp. 465–472 (2011)
  5. Gerdts, M.: Optimal Control of ODEs and DAEs. De Gruyter Textbook. De Gruyter, Berlin (2011)
  6. Hill, A.V.: The heat of shortening and the dynamic constants of muscle. Proc. R. Soc. Lond. B Biol. Sci. 126(843), 136–195 (1938)
  7. Kidzinski, L., Mohanty, S.P., Ong, C.F., Hicks, J.L., Carroll, S.F., Levine, S., Salathé, M., Delp, S.L.: Learning to run challenge: synthesizing physiologically accurate motion using deep reinforcement learning. CoRR (2018). abs/1804.00198
  8. Kullback, S.: Information Theory and Statistics. Wiley, New York (1959)
  9. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
  10. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn. Wiley, New York (1994)
  11. Schulman, J., Levine, S., Abbeel, P., Jordan, M.I., Moritz, P.: Trust region policy optimization. In: ICML, Lille, France, pp. 1889–1897 (2015)
  12. Schulman, J., Moritz, P., Levine, S., Jordan, M.I., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. CoRR (2015). abs/1506.02438
  13. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. CoRR (2017). abs/1707.06347
  14. Sutton, R.S., Barto, A.G.: Introduction to Reinforcement Learning, 1st edn. MIT Press, Cambridge (1998)
  15. Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8(3), 279–292 (1992)
  16. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3), 229–256 (1992)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Fraunhofer ITWM, Kaiserslautern, Germany
