Evolutionary Function Approximation for Gait Generation on Legged Robots

  • Oscar A. Silva
  • Miguel A. Solis
Part of the Studies in Systems, Decision and Control book series (SSDC, volume 40)


Reinforcement learning methods can be computationally expensive, and their cost tends to grow as the cardinality of the state-space representation increases. This curse of dimensionality plays an important role in our work: generating gaits with more degrees of freedom at each leg implies a larger state space after discretization, and look-up tables become impractical. Appropriate function approximators are therefore needed for this kind of robotic task. This chapter shows the advantage of using reinforcement learning, specifically within the batch framework. A neuroevolution of augmenting topologies (NEAT) scheme is used as the function approximator: a particular case of a topology and weight evolving artificial neural network that has been shown to outperform fixed-topology networks on certain tasks. A comparison between function approximators within the batch reinforcement learning approach is tested on a simulated version of a hexapod robot designed and built by our undergraduate and graduate student group.
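To make the batch setting concrete, the following is a minimal sketch of fitted Q iteration on a toy chain MDP. The batch of transitions is collected once and then reused at every iteration; the "regressor" fitted to the targets r + γ·max Q(s′,·) is here just a table average for readability, whereas in the chapter's approach a NEAT-evolved network would play that role. The chain MDP, its sizes, and all names here are illustrative assumptions, not the chapter's actual experimental setup.

```python
import random

# Toy chain MDP (illustrative, not the hexapod task): states 0..N-1,
# actions {0: left, 1: right}; landing on the last state yields reward 1.
N, GAMMA = 5, 0.9

def step(s, a):
    s2 = min(N - 1, s + 1) if a == 1 else max(0, s - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)

# Collect a fixed batch of random transitions: the defining trait of
# batch RL is that learning only ever touches this stored set.
random.seed(0)
batch = []
for _ in range(500):
    s, a = random.randrange(N), random.randrange(2)
    s2, r = step(s, a)
    batch.append((s, a, r, s2))

def fitted_q_iteration(batch, n_iters=50):
    Q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(n_iters):
        # Build regression targets from the stored transitions.
        targets = {}
        for (s, a, r, s2) in batch:
            targets.setdefault((s, a), []).append(r + GAMMA * max(Q[s2]))
        # "Fit" step: a table average stands in for the supervised learner
        # (a neural network, tree ensemble, or evolved topology in practice).
        newQ = [[0.0, 0.0] for _ in range(N)]
        for (s, a), ys in targets.items():
            newQ[s][a] = sum(ys) / len(ys)
        Q = newQ
    return Q

Q = fitted_q_iteration(batch)
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N)]  # greedy: move right
```

Note that the table grows linearly with the number of discrete states, which is exactly what becomes intractable when each leg contributes extra degrees of freedom; replacing the table with a compact parameterized approximator is what keeps the batch loop feasible.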


Keywords: Artificial neural networks · Reinforcement learning · Robotics



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Innovación y Robótica Estudiantil UTFSM, Valparaíso, Chile
  2. Centro de Robótica UTFSM, Valparaíso, Chile
