
Multi-Phase Multi-Objective Dexterous Manipulation with Adaptive Hierarchical Curriculum

  • Short Paper
  • Published:
Journal of Intelligent & Robotic Systems


Dexterous manipulation tasks usually involve multiple objectives, and the priorities of these objectives may vary across different phases of a manipulation task. Existing methods do not consider objective priorities or how they change during the task, making it difficult or even impossible for a robot to learn a good policy. In this work, we develop a novel Adaptive Hierarchical Curriculum to guide a robot in learning manipulation tasks with multiple prioritized objectives. Our method determines the objective priorities during the learning process and updates the learning sequence of the objectives to adapt to the changing priorities at different phases. A smooth transition function is developed to mitigate the effect of updating the learning sequence on learning stability. The proposed method is validated in a multi-objective manipulation task with a JACO robot arm, in which the robot must manipulate a target surrounded by obstacles. Simulation and physical experiment results show that the proposed method outperforms the baseline methods, achieving a 92.5% success rate over 40 tests and finishing the task in 36.4% less time on average.
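The abstract does not specify the form of the smooth transition function. As a purely illustrative sketch (not the authors' method), the idea of blending objective weights when the curriculum reorders priorities could look like the following, where the function names, weight values, and cosine ramp are all assumptions:

```python
import math

def smooth_transition(old_weights, new_weights, step, horizon):
    """Blend objective weights over `horizon` steps with a cosine ramp.

    `old_weights`/`new_weights` map objective names to priorities before
    and after a curriculum update; `step` counts steps since the update.
    The ramp is flat at both ends, avoiding abrupt reward changes.
    """
    t = min(step / horizon, 1.0)
    alpha = 0.5 * (1.0 - math.cos(math.pi * t))  # rises smoothly from 0 to 1
    return {k: (1 - alpha) * old_weights[k] + alpha * new_weights[k]
            for k in old_weights}

def shaped_reward(objective_rewards, weights):
    """Combine per-objective rewards using the current priority weights."""
    return sum(weights[k] * objective_rewards[k] for k in objective_rewards)

# Hypothetical phase change: obstacle avoidance gains priority over reaching.
old = {"reach": 0.8, "avoid": 0.2}
new = {"reach": 0.3, "avoid": 0.7}
w = smooth_transition(old, new, step=50, horizon=100)  # halfway through ramp
# w == {"reach": 0.55, "avoid": 0.45}
```

A hard switch between weight sets would change the reward landscape in a single step; the gradual interpolation above is one simple way such a transition could keep the objective weighting continuous while the learning sequence is updated.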


Data Availability

Not applicable.

Code Availability

Not applicable.





This material is based on work supported by the US NSF under grants 1652454 and 2114464. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and Affiliations



All authors contributed to the study conception and design. The first draft of the manuscript was written by Lingfeng Tao. Dr. Jiucai Zhang and Dr. Xiaoli Zhang provided comments and edits toward the final manuscript.

Corresponding author

Correspondence to Xiaoli Zhang.

Ethics declarations

Ethics Approval

Ethical approval was waived by the local Ethics Committee of Colorado School of Mines in view of the retrospective nature of the study and because all procedures performed were part of routine care.

Consent to Participate

Informed consent was obtained from all individual participants included in the study.

Consent for Publication

The participants have consented to the submission of the case report to the journal.

Conflict of Interests

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Table 3 shows the performance statistics of the trained policies across the three phases over 40 evaluations, including the number of times the robot touched obstacles, the average time consumed in each phase, and the average normalized reward in each phase. LS is not included because its training has no phases.

Table 3 Performance Statistics across Phases over 40 Evaluations


About this article


Cite this article

Tao, L., Zhang, J. & Zhang, X. Multi-Phase Multi-Objective Dexterous Manipulation with Adaptive Hierarchical Curriculum. J Intell Robot Syst 106, 1 (2022).

