
Reinforcement learning with imitative behaviors for humanoid robots navigation: synchronous planning and control

Published in: Autonomous Robots

Abstract

Humanoid robots adapt well to complex environments and possess human-like flexibility, enabling them to perform precise farming and harvesting tasks on varied terrain; they are essential tools for agricultural intelligence. In this article, a novel method is proposed to improve the robustness of autonomous navigation for humanoid robots by fusing data between the footstep-planning and control levels. In particular, a fine-tuned deep reinforcement learning model, Proximal Policy Optimization (PPO), is introduced into this layer, after a heuristic trajectory has first been generated through imitation learning. During RL training, the KL divergence between the agent’s policy and the imitative expert policy is added to the advantage function as a penalty term. As a proof of concept, our navigation policy is trained in a robotic simulator and then successfully deployed on the physical robot GTX for indoor multi-mode navigation. The experimental results show that incorporating imitation learning imparts anthropomorphic attributes to the robot and facilitates the generation of seamless footstep patterns. A significant improvement of 21.56% in the deviation of the ZMP trajectory from the center in the y-direction is observed. Additionally, the method improves dynamic locomotion stability, with the body attitude angle remaining within ± 5.5\(^\circ \), compared to ± 48.4\(^\circ \) with the traditional algorithm. Overall, the navigation error verified in the experiments is below 5 cm. The proposed framework can serve as a reference for researchers studying autonomous navigation of humanoid robots on uneven ground.
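The KL-penalized advantage described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes discrete (categorical) action distributions and a fixed penalty weight `beta`, and all function names are hypothetical.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-8):
    """KL(p || q) between rows of categorical action distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def penalized_advantage(advantages, agent_probs, expert_probs, beta=0.1):
    """Subtract a KL(agent || expert) penalty from each step's advantage,
    discouraging the policy from drifting away from the imitative expert."""
    return advantages - beta * kl_categorical(agent_probs, expert_probs)

def ppo_clipped_loss(ratio, adv, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (negated, to be minimized)."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))
```

In this sketch, `beta` trades off reward maximization against staying close to the expert's footstep style: when the agent's action distribution matches the expert's, the KL term vanishes and the advantage is unchanged.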




Funding

This research was supported by the National Natural Science Foundation of China (Grant Nos. 51375434 and 11372270).

Author information

Authors and Affiliations

Authors

Contributions

Xiaoying Wang (First Author): Conceptualization, Methodology, Software, Investigation, Formal Analysis, Writing - Original Draft & Editing. Tong Zhang (Corresponding Author): Software, Validation, Visualization, Supervision, Writing - Original Draft & Review.

Corresponding author

Correspondence to Tong Zhang.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical approval

Ethics committee approval from the authors’ institution and informed consent have been obtained for this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, X., Zhang, T. Reinforcement learning with imitative behaviors for humanoid robots navigation: synchronous planning and control. Auton Robot 48, 5 (2024). https://doi.org/10.1007/s10514-024-10160-w



  • DOI: https://doi.org/10.1007/s10514-024-10160-w

Keywords

Navigation