
Reinforcement learning with imitative behaviors for humanoid robots navigation: synchronous planning and control

Published in: Autonomous Robots

Abstract

Humanoid robots adapt well to complex environments and possess human-like flexibility, enabling them to perform precise farming and harvesting tasks on varied terrain; they are essential tools for agricultural intelligence. In this article, a novel method is proposed to improve the robustness of autonomous navigation for humanoid robots by fusing data between the footstep-planning and control levels. In particular, a fine-tuned deep reinforcement learning model, Proximal Policy Optimization (PPO), is introduced into this layer, after a heuristic trajectory has first been generated through imitation learning. During RL training, the KL divergence between the agent’s policy and the imitative expert policy is added to the advantage function as a penalty term. As a proof of concept, our navigation policy is trained in a robotic simulator and then successfully deployed on the physical robot GTX for indoor multi-mode navigation. The experimental results show that incorporating imitation learning imparts anthropomorphic attributes to the robot and facilitates the generation of seamless footstep patterns. A significant improvement of 21.56% in the deviation of the ZMP trajectory from the center in the y-direction is observed. Additionally, the method improves dynamic locomotion stability, with the body attitude angle remaining within ± 5.5\(^\circ \), compared to ± 48.4\(^\circ \) with the traditional algorithm. Overall, the navigation error verified in the experiments is below 5 cm. The proposed framework can serve as a reference for researchers studying autonomous navigation of humanoid robots on uneven ground.
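The KL-penalized advantage described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes discrete (categorical) action distributions and a fixed penalty weight `beta`, and all function names are hypothetical.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-8):
    """KL(p || q) between rows of categorical action distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def penalized_advantage(advantages, agent_probs, expert_probs, beta=0.1):
    """Subtract a KL(agent || expert) penalty from each step's advantage,
    discouraging the policy from drifting away from the imitative expert."""
    return advantages - beta * kl_categorical(agent_probs, expert_probs)

def ppo_clipped_loss(ratio, adv, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (negated, to be minimized)."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))
```

In this sketch, `beta` trades off reward maximization against staying close to the expert's footstep style: when the agent's action distribution matches the expert's, the KL term vanishes and the advantage is unchanged.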




Funding

This research was supported by the National Natural Science Foundation of China (Grant Nos. 51375434 and 11372270).

Author information

Authors and Affiliations

Authors

Contributions

Xiaoying Wang (First Author): Conceptualization, Methodology, Software, Investigation, Formal Analysis, Writing - Original Draft & Editing. Tong Zhang (Corresponding Author): Software, Validation, Visualization, Supervision, Writing - Original Draft & Review.

Corresponding author

Correspondence to Tong Zhang.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical approval

Ethics committee approval from the authors’ institution and informed consent have been obtained for this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, X., Zhang, T. Reinforcement learning with imitative behaviors for humanoid robots navigation: synchronous planning and control. Auton Robot 48, 5 (2024). https://doi.org/10.1007/s10514-024-10160-w



  • DOI: https://doi.org/10.1007/s10514-024-10160-w

Keywords

Navigation