Blind Hexapod Locomotion in Complex Terrain with Gait Adaptation Using Deep Reinforcement Learning and Classification

  • Published in: Journal of Intelligent & Robotic Systems

Abstract

We present a scalable two-level architecture for Hexapod locomotion through complex terrain without the use of exteroceptive sensors. Our approach assumes that the target complex terrain can be modeled by N discrete terrain distributions, each capturing an individual difficulty of the target terrain. Expert policies (physical locomotion controllers), modeled by Artificial Neural Networks, are trained independently in these individual scenarios using Deep Reinforcement Learning. These policies are then autonomously multiplexed during inference by a Recurrent Neural Network terrain classifier conditioned on the state history, yielding an adaptive gait appropriate for the current terrain. We assess policy robustness through several tests in which we vary contact, friction, and actuator properties. We also demonstrate goal-based positional control of such a system and a way of selecting among several gait criteria during deployment, giving a complete solution for blind Hexapod locomotion in a practical setting. The Hexapod platform and all our experiments are modeled in the MuJoCo [25] physics simulator. Demonstrations are available in the supplementary video.
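The two-level control loop described in the abstract can be sketched as follows. Note that the expert policies, the terrain labels, and the classifier here are hypothetical placeholders (a pooled linear softmax standing in for the paper's trained RNN classifier and DRL-trained neural network policies), not the authors' models:

```python
import numpy as np

# Hypothetical expert policies: each maps a proprioceptive state to joint
# targets. In the paper these are neural networks trained with DRL, one
# per terrain distribution; here they are trivial placeholder functions.
def expert_flat(state):
    return 0.1 * state

def expert_stairs(state):
    return -0.1 * state

def expert_rubble(state):
    return 0.5 * np.tanh(state)

EXPERTS = [expert_flat, expert_stairs, expert_rubble]  # N = 3 terrains

def classify_terrain(state_history, W):
    """Toy stand-in for the RNN terrain classifier: pools the state
    history over time and applies a linear softmax over N terrain classes."""
    features = np.mean(state_history, axis=0)   # (state_dim,)
    logits = W @ features                       # (n_terrains,)
    exp = np.exp(logits - logits.max())
    return int(np.argmax(exp / exp.sum()))

def control_step(state_history, W):
    """Multiplexing step: classify the terrain from the state history,
    then run the matching expert policy on the latest state."""
    terrain = classify_terrain(state_history, W)
    return EXPERTS[terrain](state_history[-1]), terrain

# Minimal usage with a random state history and fixed classifier weights.
rng = np.random.default_rng(0)
history = rng.standard_normal((20, 8))          # 20 timesteps, 8-dim state
W = rng.standard_normal((3, 8))                 # placeholder classifier weights
action, terrain = control_step(history, W)
```

The key design point this illustrates is the decoupling: experts are trained independently per terrain, and only the classifier decides, at inference time, which expert is active.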


References

  1. Vespignani, M., Friesen, J.M., SunSpiral, V., Bruce, J.: Design of superball v2, a compliant tensegrity robot for absorbing large impacts. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2865–2871 (2018). https://doi.org/10.1109/IROS.2018.8594374

  2. Lample, G., Chaplot, D.S.: Playing FPS games with deep reinforcement learning. CoRR abs/1609.05521 (2016). URL http://arxiv.org/abs/1609.05521

  3. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. CoRR abs/1707.06347 (2017). URL http://arxiv.org/abs/1707.06347

  4. Wang, T., Liao, R., Ba, J., Fidler, S.: Nervenet: Learning structured policy with graph neural networks. In: International Conference on Learning Representations (2018). URL https://openreview.net/forum?id=S1sqHMZCb

  5. Peng, X.B., Berseth, G., van de Panne, M.: Terrain-adaptive locomotion skills using deep reinforcement learning. ACM Transactions on Graphics (Proc. SIGGRAPH 2016) 35(4) (2016)

  6. Bjelonic, M., Kottege, N., Homberger, T., Borges, P., Beckerle, P., Chli, M.: Weaver: Hexapod robot for autonomous navigation on unstructured terrain. Journal of Field Robotics. 35, 1063–1079 (2018). https://doi.org/10.1002/rob.21795

  7. Yu, W., Turk, G., Liu, C.K.: Learning symmetry and low-energy locomotion. CoRR abs/1801.08093 (2018). URL http://arxiv.org/abs/1801.08093

  8. Boston Dynamics, Spot. https://www.bostondynamics.com/spot. Accessed: 16-10-2019

  9. Ijspeert, A.J.: Central pattern generators for locomotion control in animals and robots: A review. Neural networks: the official journal of the International Neural Network Society. 21(4), 642–653 (2008)

  10. Trossen Robotics. https://www.trossenrobotics.com/. Accessed: 22-05-2010

  11. Isvara, Y., Rachmatullah, S., Mutijarsa, K., Prabakti, D.E., Pragitatama, W.: Terrain adaptation gait algorithm in a hexapod walking robot. In: 2014 13th International Conference on Control Automation Robotics Vision (ICARCV), pp. 1735–1739 (2014). https://doi.org/10.1109/ICARCV.2014.7064578

  12. OpenAI, Andrychowicz, M., Baker, B., Chociej, M., Józefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., Schneider, J., Sidor, S., Tobin, J., Welinder, P., Weng, L., Zaremba, W.: Learning dexterous in-hand manipulation. CoRR abs/1808.00177 (2018). URL http://arxiv.org/abs/1808.00177

  13. Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., Schmidhuber, J.: A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 31(5), 855–868 (2009). https://doi.org/10.1109/TPAMI.2008.137

  14. Kruijff, G., Kruijff-Korbayová, I., Keshavdas, S., Larochelle, B., Janícek, M., Colas, F., Liu, M., Pomerleau, F., Siegwart, R., Neerincx, M., Looije, R., Smets, N., Mioch, T., van Diggelen, J., Pirri, F., Gianni, M., Ferri, F., Menna, M., Worst, R., Linder, T., Tretyakov, V., Surmann, H., Svoboda, T., Reinštein, M., Zimmermann, K., Petříćek, T., Hlaváč, V.: Designing, developing, and deploying systems to support human-robot teams in disaster response. Advanced Robotics. 28(23), 1547–1570 (2014). https://doi.org/10.1080/01691864.2014.985335

  15. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: S.A. Solla, T.K. Leen, K. Muller (eds.) Advances in Neural Information Processing Systems 12, pp. 1057-1063. MIT Press (2000). URL http://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf

  16. Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. CoRR abs/1703.06907 (2017). URL http://arxiv.org/abs/1703.06907

  17. Perlin, K.: Improving noise. ACM Trans. Graph. 21(3), 681–682 (2002). https://doi.org/10.1145/566654.566636

  18. Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., Schmidhuber, J.: Natural evolution strategies. Journal of Machine Learning Research 15, 949-980 (2014). URL http://jmlr.org/papers/v15/wierstra14a.html

  19. Hutter, M., Gehring, C., Lauber, A., Gunther, F., Bellicoso, C.D., Tsounis, V., Fankhauser, P., Diethelm, R., Bachmann, S., Bloesch, M., Kolvenbach, H., Bjelonic, M., Isler, L., Meyer, K.: ANYmal - toward legged robots for harsh environments. Advanced Robotics. 31(17), 918–931 (2017). https://doi.org/10.1080/01691864.2017.1378591

  20. Graves, A., Mohamed, A., Hinton, G.E.: Speech recognition with deep recurrent neural networks. CoRR abs/1303.5778 (2013). URL http://arxiv.org/abs/1303.5778

  21. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735

  22. Manoonpong, P., Parlitz, U., Wörgötter, F.: Neural control and adaptive neural forward models for insect-like, energy-efficient, and adaptable locomotion of walking machines. Frontiers in neural circuits. 7, 12 (2013). https://doi.org/10.3389/fncir.2013.00012

  23. Ross, S., Gordon, G.J., Bagnell, J.A.: A reduction of imitation learning and structured prediction to no-regret online learning. CoRR abs/1011.0686 (2010). URL http://arxiv.org/abs/1011.0686

  24. Pecka, M., Zimmermann, K., Reinstein, M., Svoboda, T.: Controlling robot morphology from incomplete measurements. CoRR abs/1612.02739 (2016). URL http://arxiv.org/abs/1612.02739

  25. Todorov, E., Erez, T., Tassa, Y.: MuJoCo: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012)

  26. Čížek, P., Faigl, J.: On locomotion control using position feedback only in traversing rough terrains with hexapod crawling robot. IOP Conference Series: Materials Science and Engineering. 428, 012065 (2018). https://doi.org/10.1088/1757-899X/428/1/012065

  27. Sanchez-Gonzalez, A., Heess, N., Springenberg, J.T., Merel, J., Riedmiller, M.A., Hadsell, R., Battaglia, P.: Graph networks as learnable physics engines for inference and control. CoRR abs/1806.01242 (2018). URL http://arxiv.org/abs/1806.01242

  28. Saranli, U., Buehler, M., Koditschek, D.E.: RHex: A simple and highly mobile hexapod robot. The International Journal of Robotics Research. 20, 616–631 (2001). https://doi.org/10.1177/02783640122067570

  29. Xie, Z., Berseth, G., Clary, P., Hurst, J.W., van de Panne, M.: Feedback control for cassie with deep reinforcement learning. CoRR abs/1803.05580 (2018). URL http://arxiv.org/abs/1803.05580

  30. Sutton, R.: The Bitter Lesson. http://www.incompleteideas.net/IncIdeas/BitterLesson.html. Accessed: 2019-04-18


Acknowledgements

The research leading to these results has received funding from the Czech Science Foundation under Project 17-08842S.

Author information


Correspondence to Teymur Azayev.

Electronic supplementary material

Supplementary video (MP4 22,534 KB)


Cite this article

Azayev, T., Zimmermann, K.: Blind Hexapod Locomotion in Complex Terrain with Gait Adaptation Using Deep Reinforcement Learning and Classification. J Intell Robot Syst 99, 659–671 (2020). https://doi.org/10.1007/s10846-020-01162-8
