Abstract
Reinforcement learning methods can be computationally expensive, and their cost tends to grow as the cardinality of the state-space representation increases. This curse of dimensionality is central to our work: generating gaits with more degrees of freedom per leg implies a larger state space after discretization, and look-up tables become impractical. Suitable function approximators are therefore needed for this kind of robotics task. This chapter shows the advantage of using reinforcement learning, specifically within the batch framework. A neuroevolution of augmenting topologies (NEAT) scheme is used as the function approximator; it is a particular case of a topology- and weight-evolving artificial neural network that has been shown to outperform fixed-topology networks on certain tasks. Function approximators within the batch reinforcement learning approach are compared on a simulated version of a hexapod robot designed and built by our undergraduate and graduate student group.
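As a rough illustration of the batch setting described above, the sketch below implements fitted Q iteration, a standard batch reinforcement learning scheme that repeatedly regresses Bellman targets onto a fixed set of recorded transitions. The regressor is deliberately pluggable, so a NEAT-evolved network could stand in for the simple tree ensemble used here; this is a minimal sketch under those assumptions, not the chapter's actual method, and all names and parameter values are illustrative.

```python
# Minimal fitted-Q-iteration sketch (batch RL); illustrative only.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor  # stand-in approximator;
                                                  # a NEAT-evolved net could
                                                  # replace it here


def fitted_q_iteration(transitions, n_actions, gamma=0.95, n_iters=50):
    """transitions: iterable of (state, action, reward, next_state),
    with states as 1-D feature vectors and actions as integer indices."""
    s, a, r, s2 = (np.array(x) for x in zip(*transitions))
    X = np.column_stack([s, a])  # regress Q on (state, action) pairs
    q = None
    for _ in range(n_iters):
        if q is None:
            y = r  # first iteration: Q equals the immediate reward
        else:
            # Bellman targets: r + gamma * max over a' of Q(s', a')
            q_next = np.column_stack([
                q.predict(np.column_stack([s2, np.full(len(s2), act)]))
                for act in range(n_actions)
            ])
            y = r + gamma * q_next.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50).fit(X, y)
    return q  # greedy policy: argmax over actions of q.predict(...)
```

Because the whole dataset is reused at every iteration, the approximator can be swapped freely between iterations, which is what makes evolved-topology networks a natural fit for this framework.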
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Silva, O.A., Solis, M.A. (2016). Evolutionary Function Approximation for Gait Generation on Legged Robots. In: Espinosa, H. (ed.) Nature-Inspired Computing for Control Systems. Studies in Systems, Decision and Control, vol 40. Springer, Cham. https://doi.org/10.1007/978-3-319-26230-7_10
DOI: https://doi.org/10.1007/978-3-319-26230-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26228-4
Online ISBN: 978-3-319-26230-7
eBook Packages: Engineering, Engineering (R0)