Abstract
Today, robotic arms are widely used in industry, and reinforcement learning algorithms are frequently used to control them in complex environments. Deep deterministic policy gradient (DDPG) is a popular off-policy, model-free actor-critic deep reinforcement learning algorithm for continuous action spaces. It has achieved significant results when applied to control robotic arms with high degrees of freedom, but it also has limitations: DDPG is prone to instability and divergence in complex tasks because of their high-dimensional continuous action spaces. In this paper, to increase the reliability and convergence speed of DDPG, a new modified convergence DDPG (MCDDPG) algorithm is presented. By saving and reusing desirable parameters of previous actor and critic networks, the proposed algorithm shows a significant improvement in training time and model stability compared with conventional DDPG. We evaluate our method on the PR2's right arm, a 7-DoF manipulator, and simulations demonstrate that MCDDPG outperforms state-of-the-art algorithms such as DDPG and normalized advantage function (NAF) in learning complex robotic tasks.
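The core idea stated in the abstract, saving and reusing desirable parameters of previous actor and critic networks, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' exact procedure: the snapshot criterion (best episode return so far) and the restore threshold are assumptions introduced here, and the `ParamReuseTrainer` class name is hypothetical.

```python
import copy

class ParamReuseTrainer:
    """Sketch of checkpoint-reuse for an actor-critic learner:
    keep the best-performing actor/critic parameters seen so far,
    and fall back to them when training appears to diverge."""

    def __init__(self, actor_params, critic_params):
        self.actor = actor_params            # current actor weights
        self.critic = critic_params          # current critic weights
        self.best_return = float("-inf")     # best episode return observed
        self.snapshot = None                 # best (actor, critic) pair so far

    def after_episode(self, episode_return):
        if episode_return > self.best_return:
            # Episode improved: remember these "desirable" parameters.
            self.best_return = episode_return
            self.snapshot = (copy.deepcopy(self.actor),
                             copy.deepcopy(self.critic))
        elif (self.snapshot is not None
              and episode_return < 0.5 * self.best_return):
            # Performance collapsed: reuse the saved parameters instead
            # of continuing from a diverging point (threshold is illustrative).
            self.actor = copy.deepcopy(self.snapshot[0])
            self.critic = copy.deepcopy(self.snapshot[1])
```

In a full DDPG loop, `after_episode` would run after each rollout, with the parameter snapshots taken from the actor and critic networks themselves (e.g. their weight tensors); the design intent is that a single bad gradient phase cannot permanently destroy a policy that was already performing well.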
Funding
This study was not supported by any funding agency.
Contributions
All authors wrote the main manuscript text and reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Afzali, S.R., Shoaran, M. & Karimian, G. A Modified Convergence DDPG Algorithm for Robotic Manipulation. Neural Process Lett 55, 11637–11652 (2023). https://doi.org/10.1007/s11063-023-11393-z