Abstract
This article surveys deep learning (DL) and deep reinforcement learning (DRL) applied to robotics. Both tools have proven successful at delivering data-driven solutions to robotics tasks, and they provide a natural way to build an end-to-end pipeline from the robot's sensing to its actuation, passing through the generation of a policy that performs the given task. These frameworks have also been shown to cope with real-world complications such as sensing noise, imprecise actuation, and variability in the scenarios where the robot is deployed. In that vein, and given the growing interest in DL and DRL, the present work begins with a brief tutorial on deep reinforcement learning, whose goal is to convey the main concepts and approaches followed in the field. The article then describes the main, most recent, and most promising DL and DRL approaches in robotics, with enough technical detail to understand the core of each work and to motivate interested readers to start their own research in the area. To provide a comparative analysis, we then present several taxonomies in which the references can be classified, according to high-level features, the task a work addresses, the type of system, and the learning techniques used. We conclude by presenting promising research directions for both DL and DRL.
Notes
SARSA: State, Action, Reward, State, Action.
REINFORCE: REward Increment = Nonnegative Factor × Offset Reinforcement × Characteristic Eligibility.
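Both acronyms above describe concrete update rules, which the following minimal tabular sketch makes explicit (an illustrative example, not code from any surveyed work; all names and the toy transition are our own). The SARSA update bootstraps from the quintuple (S, A, R, S', A'), while Williams' REINFORCE increment is literally the product of the three acronym terms: a nonnegative step size, an offset (baseline-subtracted) reinforcement, and the characteristic eligibility ∇θ ln π(a|s; θ), here for a softmax policy:

```python
import math
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # On-policy TD update over the quintuple (State, Action, Reward, State, Action):
    # the target bootstraps from the action actually chosen in the next state.
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
    return Q[(s, a)]

def reinforce_update(theta, s, a, r, baseline=0.0, alpha=0.1, n_actions=2):
    # REward Increment = Nonnegative Factor (alpha)
    #                    * Offset Reinforcement (r - baseline)
    #                    * Characteristic Eligibility (d ln pi(a|s) / d theta).
    prefs = [theta[(s, b)] for b in range(n_actions)]
    z = sum(math.exp(p) for p in prefs)
    pi = [math.exp(p) / z for p in prefs]          # softmax policy
    for b in range(n_actions):
        eligibility = (1.0 if b == a else 0.0) - pi[b]
        theta[(s, b)] += alpha * (r - baseline) * eligibility
    return pi[a]

# Toy transition: in state 0, action 'go' earns reward 1 and leads to state 1,
# where the policy again picks 'go'. With Q initialized at zero:
Q = defaultdict(float)
q_new = sarsa_update(Q, 0, 'go', 1.0, 1, 'go')     # 0.5 * (1.0 + 0.9*0 - 0) = 0.5

# One REINFORCE step from uniform preferences after observing reward 1 for action 0.
theta = defaultdict(float)
pi_a = reinforce_update(theta, s=0, a=0, r=1.0)    # pi(a|s) was 0.5 before the step
```

The eligibility line is the softmax score function, 1[b = a] − π(b|s), so actions that earned above-baseline reward become more probable on the next visit to that state.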
References
Abate A, Prandini M, Lygeros J, Sastry S (2008) Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems. Automatica 44(11):2724–2734
Amini A, Gilitschenski I, Phillips J, Moseyko J, Banerjee R, Karaman S, Rus D (2020) Learning robust control policies for end-to-end autonomous driving from data-driven simulation. IEEE Robot Autom Lett RA-L 5(2):1143–1150
Amini A, Rosman G, Karaman S, Rus D (2019) Variational end-to-end navigation and localization. In: IEEE international conference on robotics and automation (ICRA), pp 8958–8964
Andrychowicz M, Wolski F, Ray A, Schneider J, Fong R, Welinder P, McGrew B, Tobin J, Pieter Abbeel O, Zaremba W (2017) Hindsight experience replay, pp 5048–5058
Asseman A, Kornuta T, Ozcan A (2018) Learning beyond simulated physics
Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
Bareinboim E, Forney A, Pearl J (2015) Bandits with unobserved confounders: a causal approach. Adv Neural Inf Process Syst NIPS 28:1342–1350
Barth-Maron G, Hoffman MW, Budden D, Dabney W, Horgan D, Tb D, Muldal A, Heess N, Lillicrap T (2018) Distributed distributional deterministic policy gradients. arXiv preprint arXiv:1804.08617
Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: International conference on machine learning (ICML), pp 449–458
Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. In: International conference on machine learning (ICML), pp 41–48
Bojarski M, Del Testa D, Dworakowski D, Firner B, Flepp B, Goyal P, Jackel LD, Monfort M, Muller U, Zhang J et al (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316
Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) OpenAI gym. arXiv preprint arXiv:1606.01540
Cabi S, Colmenarejo SG, Novikov A, Konyushkova K, Reed S, Jeong R, Zolna K, Aytar Y, Budden D, Vecerik M et al (2019) Scaling data-driven robotics with reward sketching and batch reinforcement learning. arXiv preprint arXiv:1909.12200
Cai P, Wang S, Sun Y, Liu M (2020) Probabilistic end-to-end vehicle navigation in complex dynamic environments with multimodal sensor fusion. IEEE Robot Autom Lett 5(3):4218–4224
Caicedo JC, Lazebnik S (2015) Active object localization with deep reinforcement learning. In: International conference on computer vision, pp 2488–2496
Caicedo JC, Lazebnik S (2017) Action-decision networks for visual tracking with deep reinforcement learning. In: IEEE international conference on imaging, vision and pattern recognition, pp 2711–2720
Calli B, Singh A, Walsman A, Srinivasa S, Abbeel P, Dollar AM (2015) The YCB object and model set: towards common benchmarks for manipulation research. In: International conference on advanced robotics (ICAR), pp 510–517
Campos V, Trott A, Xiong C, Socher R, Giró-i Nieto X, Torres J (2020) Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International conference on machine learning. PMLR, pp 1317–1327
Canny J (1988) Some algebraic and geometric computations in PSPACE. In: ACM symposium on theory of computing (STOC), pp 460–467
Canny J, Reif J (1987) New lower bound techniques for robot motion planning problems. In: IEEE symposium on foundations of computer science (FOCS), pp 49–60
Chebotar Y, Handa A, Makoviychuk V, Macklin M, Issac J, Ratliff N, Fox D (2019) Closing the sim-to-real loop: adapting simulation randomization with real world experience. In: IEEE international conference on robotics and automation (ICRA), pp 8973–8979
Chen B, Dai B, Lin Q, Ye G, Liu H, Song L (2019) Learning to plan in high dimensions via neural exploration-exploitation trees. In: International conference on learning representations (ICLR)
Chen T, Zhai X, Ritter M, Lucic M, Houlsby N (2019) Self-supervised GANs via auxiliary rotation loss. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12154–12163
Chen X, Fan H, Girshick R, He K (2020) Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297
Chiang HT, Malone N, Lesser K, Oishi M, Tapia L (2015) Aggressive moving obstacle avoidance using a stochastic reachable set based potential field. In: International workshop on the algorithmic foundations of robotics (WAFR), pp 73–89
Chiang HTL, Faust A, Sugaya S, Tapia L (2018) Fast swept volume estimation with deep learning. In: International workshop on the algorithmic foundations of robotics (WAFR), pp 52–68
Chiang HTL, Faust M, Fiser M, Frances A (2019) Learning navigation behaviors end to end with auto-RL. IEEE Robot Autom Lett 56:2007–2014
Chiang HTL, Hsu J, Fiser M, Tapia L, Faust A (2019) RL-RRT: kinodynamic motion planning via learning reachability estimators from RL policies. IEEE Robot Autom Lett 4(4):4298–4305
Codevilla F, Miiller M, López A, Koltun V, Dosovitskiy A (2018) End-to-end driving via conditional imitation learning. In: IEEE international conference on robotics and automation (ICRA), pp 1–9
Crosby M, Beyret B, Halina M (2019) The animal-ai olympics. Nat Mach Intell 1(5):257–257
Dasari S, Ebert F, Tian S, Nair S, Bucher B, Schmeckpeper K, Singh S, Levine S, Finn C (2019) Robonet: large-scale multi-robot learning. arXiv preprint arXiv:1910.11215
Dasgupta I, Wang J, Chiappa S, Mitrovic J, Ortega P, Raposo D, Hughes E, Battaglia P, Botvinick M, Kurth-Nelson Z (2019) Causal reasoning from meta-reinforcement learning. arXiv preprint arXiv:1901.08162
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 248–255
Devo A, Dionigi A, Costante G (2021) Enhancing continuous control of mobile robots for end-to-end visual active tracking. Robot Autonom Syst. https://doi.org/10.1016/j.robot.2021.103799
Dolgov D, Thrun S, Montemerlo M, Diebel J (2010) Path planning for autonomous vehicles in unknown semi-structured environments. Int J Robot Res 29(5):485–501
Dosovitskiy A, Fischer P, Ilg E, Hausser P, Hazirbas C, Golkov V, Van Der Smagt P, Cremers D, Brox T (2015) FlowNet: learning optical flow with convolutional networks. In: IEEE international conference on computer vision (ICCV), pp 2758–2766
Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) Carla: an open urban driving simulator. In: Conference on robot learning (CoRL), pp 1–16
Driess D, Oguz O, Ha JS, Toussaint M (2020) Deep visual heuristics: learning feasibility of mixed-integer programs for manipulation planning. In: 2020 IEEE international conference on robotics and automation (ICRA). IEEE, pp 9563–9569
Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Trans Neural Netw 22(10):1517–1531. https://doi.org/10.1109/TNN.2011.2160459
Fabisch A, Petzoldt C, Otto M, Kirchner F (2019) A survey of behavior learning applications in robotics-state of the art and perspectives. arXiv preprint arXiv:1906.01868
Fairbank M, Alonso E (2012) The divergence of reinforcement learning algorithms with value-iteration and function approximation. In: IEEE international joint conference on neural networks (IJCNN), pp 1–8
Faust A, Oslund K, Ramirez O, Francis A, Tapia L, Fiser M, Davidson J (2018) PRM-RL: long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning. In: IEEE international conference on robotics and automation (ICRA), pp 5113–5120
Fernández IMR, Sutanto G, Englert P, Ramachandran RK, Sukhatme GS (2020) Learning manifolds for sequential motion planning. arXiv preprint arXiv:2006.07746
Fortunato M, Azar MG, Piot B, Menick J, Hessel M, Osband I, Graves A, Mnih V, Munos R, Hassabis D et al (2018) Noisy networks for exploration. In: International conference on learning representations (ICLR)
Fox D (2001) KLD-sampling: adaptive particle filters, pp 713–720
Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: International conference on machine learning. PMLR, pp 1587–1596
Gao W, Hsu D, Lee WS, Shen S, Subramanian K (2017) Intention-net: integrating planning and deep learning for goal-directed autonomous navigation. In: Conference on robot learning (CoRL), pp 185–194
Garcia Cifuentes C, Issac J, Wüthrich M, Schaal S, Bohg J (2016) Probabilistic articulated real-time tracking for robot manipulation. IEEE Robot Autom Lett RA-L 2(2):577–584
Garg A, Chiang HTL, Sugaya S, Faust A, Tapia L (2019) Comparison of deep reinforcement learning policies to formal methods for moving obstacle avoidance. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3534–3541
Gershman SJ (2017) Reinforcement learning and causal models. The Oxford handbook of causal reasoning, p 295
Gonzalez-Trejo J, Mercado-Ravell DA, Becerra I, Murrieta-Cid R (2021) On the visual-based safe landing of UAVS in populated areas: a crucial aspect for urban deployment. IEEE Robot Autom Lett. https://doi.org/10.1109/LRA.2021.3101861
Goodfellow I, Bengio Y, Courville A, Bengio Y (2016) Deep learning, vol 1. MIT Press, Cambridge
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems (NIPS), pp 2672–2680
Hadsell R, Sermanet P, Ben J, Erkan A, Scoffier M, Kavukcuoglu K, Muller U, LeCun Y (2009) Learning long-range vision for autonomous off-road driving. J Field Robot 26(2):120–144
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
Heiden E, Millard D, Coumans E, Sukhatme GS (2020) Augmenting differentiable simulators with neural networks to close the Sim2Real gap. arXiv preprint arXiv:2007.06045
Henaff O (2020) Data-efficient image recognition with contrastive predictive coding. In: International conference on machine learning. PMLR, pp 4182–4192
Hernandez-Garcia JF, Sutton RS (2019) Understanding multi-step deep reinforcement learning: a systematic study of the DQN target. arXiv preprint arXiv:1901.07510
Hessel M, Modayil J, van Hasselt H, Schaul T, Ostrovski G, Dabney W, Horgan D, Piot B, Azar MG, Silver D (2018) Rainbow: combining improvements in deep reinforcement learning. In: AAAI conference on artificial intelligence
Higuera JCG, Meger D, Dudek G (2017) Adapting learned robotics behaviours through policy adjustment. In: IEEE international conference on robotics and automation (ICRA), pp 5837–5843
Hirose N, Sadeghian A, Vázquez M, Goebel P, Savarese S (2018) Gonet: a semi-supervised deep learning approach for traversability estimation. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3044–3051
Hirose N, Sadeghian A, Vázquez M, Goebel P, Savarese S (2018) Gonet: a semi-supervised deep learning approach for traversability estimation. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3044–3051
Hirose N, Sadeghian A, Xia F, Martín-Martín R, Savarese S (2019) VUNet: dynamic scene view synthesis for traversability estimation using an RGB camera. IEEE Robot Autom Lett 4(2):2062–2069
Hirose N, Xia F, Martín-Martín R, Sadeghian A, Savarese S (2019) Deep visual MPC-policy learning for navigation. IEEE Robot Autom Lett RA-L 4(4):3184–3191
Ho SB (2017) Causal learning versus reinforcement learning for knowledge learning and problem solving. In: AAAI workshops
Hoffman J, Tzeng E, Park T, Zhu JY, Isola P, Saenko K, Efros A, Darrell T (2018) Cycada: cycle-consistent adversarial domain adaptation. In: International conference on machine learning (ICML), pp 1989–1998
Horgan D, Quan J, Budden D, Barth-Maron G, Hessel M, van Hasselt H, Silver D (2018) Distributed prioritized experience replay. In: International conference on learning representations (ICLR)
Ibarz B, Leike J, Pohlen T, Irving G, Legg S, Amodei D (2018) Reward learning from human preferences and demonstrations in Atari. arXiv preprint arXiv:1811.06521
Ichter B, Harrison J, Pavone M (2018) Learning sampling distributions for robot motion planning. In: IEEE international conference on robotics and automation (ICRA), pp 7087–7094
Ichter B, Pavone M (2019) Robot motion planning in learned latent spaces. IEEE Robot Autom Lett 4(3):2407–2414
Ichter B, Schmerling E, Lee TWE, Faust A (2020) Learned critical probabilistic roadmaps for robotic motion planning. In: IEEE international conference on robotics and automation (ICRA), pp 9535–9541
Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T (2017) Flownet 2.0: evolution of optical flow estimation with deep networks. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2462–2470
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. PMLR, pp 448–456
Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F (2021) A survey on contrastive self-supervised learning. Technologies 9(1):2
James S, Ma Z, Arrojo DR, Davison AJ (2020) RLBench: the robot learning benchmark & learning environment. IEEE Robot Autom Lett 5(2):3019–3026
Jing L, Tian Y (2020) Self-supervised visual feature learning with deep neural networks: a survey. IEEE Trans Pattern Anal Mach Intell
Julian R, Swanson B, Sukhatme GS, Levine S, Finn C, Hausman K (2020) Efficient adaptation for end-to-end vision-based robotic manipulation. arXiv preprint arXiv:2004.10190
Kahn G, Abbeel P, Levine S (2020) Badgr: an autonomous self-supervised learning-based navigation system. arXiv preprint arXiv:2002.05700
Kalashnikov D, Irpan A, Pastor P, Ibarz J, Herzog A, Jang E, Quillen D, Holly E, Kalakrishnan M, Vanhoucke V et al (2018) Scalable deep reinforcement learning for vision-based robotic manipulation. In: Conference on robot learning (CoRL), pp 651–673
Kapturowski S, Ostrovski G, Quan J, Munos R, Dabney W (2018) Recurrent experience replay in distributed reinforcement learning. In: International conference on learning representations (ICLR)
Karkus P, Hsu D, Lee WS (2017) Qmdp-net: deep learning for planning under partial observability. In: Advances in neural information processing systems (NIPS), pp 4697–4707
Karkus P, Ma X, Hsu D, Kaelbling LP, Lee WS, Lozano-Pérez T (2019) Differentiable algorithm networks for composable robot learning. arXiv preprint arXiv:1905.11602
Károly AI, Galambos P, Kuti J, Rudas IJ (2020) Deep learning in robotics: survey on model structures and training strategies. IEEE Trans Syst Man Cybern Syst
Kaufmann E, Loquercio A, Ranftl R, Müller M, Koltun V, Scaramuzza D (2020) Deep drone acrobatics. arXiv preprint arXiv:2006.05768
Kaushik R, Desreumaux P, Mouret JB (2020) Adaptive prior selection for repertoire-based online adaptation in robotics. Front Robot AI 6:151. https://doi.org/10.3389/frobt.2019.00151
Kirtas M, Tsampazis K, Passalis N, Tefas A (2020) Deepbots: a webots-based deep reinforcement learning framework for robotics. In: IFIP international conference on artificial intelligence applications and innovations. Springer, pp 64–75
Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274
Kong J, Pfeiffer M, Schildbach G, Borrelli F (2015) Kinematic and dynamic vehicle models for autonomous driving control design. In: IEEE Intelligent vehicles symposium (IV), pp 1094–1099
Kostrikov I, Yarats D, Fergus R (2020) Image augmentation is all you need: regularizing deep reinforcement learning from pixels. arXiv preprint arXiv:2004.13649
Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Kumar R, Mandalika A, Choudhury S, Srinivasa S (2019) Lego: leveraging experience in roadmap generation for sampling-based planning. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1488–1495
Kuutti S, Bowden R, Jin Y, Barber P, Fallah S (2020) A survey of deep learning applications to autonomous vehicle control. IEEE Trans Intell Transp Syst
Lamb L, Garcez A, Gori M, Prates M, Avelar P, Vardi M (2020) Graph neural networks meet neural-symbolic computing: A survey and perspective. arXiv preprint arXiv:2003.00330
Lecarpentier E, Abel D, Asadi K, Jinnai Y, Rachelson E, Littman ML (2020) Lipschitz lifelong reinforcement learning. arXiv preprint arXiv:2001.05411
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
LeCun Y, Muller U, Ben J, Cosatto E, Flepp B (2006) Off-road obstacle avoidance through end-to-end learning. In: Advances in neural information processing systems (NIPS), pp 739–746
Lee JH, Han MK, Ko DW, Suh IH (2019) From big to small: multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326
Lee K, Smith L, Abbeel P (2021) Pebble: feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training. arXiv preprint arXiv:2106.05091
Lee MA, Zhu Y, Srinivasan K, Shah P, Savarese S, Fei-Fei L, Garg A, Bohg J (2019) Making sense of vision and touch: self-supervised learning of multimodal representations for contact-rich tasks. In: IEEE international conference on robotics and automation (ICRA), pp 8943–8950
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: International conference on learning representations (ICLR)—poster
Lippi M, Poklukar P, Welle MC, Varava A, Yin H, Marino A, Kragic D (2020) Latent space roadmap for visual action planning of deformable and rigid object manipulation. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)
Liu F, Shen C, Lin G, Reid I (2015) Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans Pattern Anal Mach Intell 38(10):2024–2039
Liu H, Abbeel P (2021) Behavior from the void: Unsupervised active pre-training
Liu K, Stadler M, Roy N (2020) Learned sampling distributions for efficient planning in hybrid geometric and object-level representations. In: IEEE international conference on robotics and automation (ICRA), pp 9555–9562
Loquercio A, Maqueda AI, Del-Blanco CR, Scaramuzza D (2018) DroNet: learning to fly by driving. IEEE Robot Autom Lett 3(2):1088–1095
Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275
Luo W, Sun P, Zhong F, Liu W, Zhang T, Wang Y (2018) End-to-end active object tracking via reinforcement learning. In: International conference on machine learning, pp 3286–3295
Luo W, Sun P, Zhong F, Liu W, Zhang T, Wang Y (2019) End-to-end active object tracking and its real-world deployment via reinforcement learning. IEEE Trans Pattern Anal Mach Intell 42(6):1317–1332
Madumal P, Miller T, Sonenberg L, Vetere F (2020) Explainable reinforcement learning through a causal lens. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 2493–2500
Mao J, Gan C, Kohli P, Tenenbaum JB, Wu J (2019) The neuro-symbolic concept learner: interpreting scenes, words, and sentences from natural supervision. arXiv preprint arXiv:1904.12584
McCarty SL, Burke LM, McGuire M (2018) Parallel monotonic basin hopping for low thrust trajectory optimization. In: AAS/AIAA space flight mechanics meeting, p 1452
Mendoza M, Vasquez-Gomez JI, Taud H, Sucar LE, Reta C (2020) Supervised learning of the next-best-view for 3D object reconstruction. Pattern Recognit Lett 133:224–231
Merkt WX, Ivan V, Dinev T, Havoutis I, Vijayakumar S (2021) Memory clustering using persistent homology for multimodality- and discontinuity-sensitive learning of optimal control warm-starts. IEEE Trans Robot. https://doi.org/10.1109/TRO.2021.3069132
Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International conference on machine learning (ICML), pp 1928–1937
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Molchanov A, Chen T, Hönig W, Preiss JA, Ayanian N, Sukhatme GS (2019) Sim-to-(multi)-real: transfer of low-level robust control policies to multiple quadrotors. arXiv preprint arXiv:1903.04628
Morgan AS, Bircher WG, Dollar AM (2021) Towards generalized manipulation learning through grasp mechanics-based features and self-supervision. IEEE Trans Robot. https://doi.org/10.1109/TRO.2021.3057802
Nagabandi A, Finn C, Levine S (2019) Deep online learning via meta-learning: continual adaptation for model-based RL
Nagabandi A, Konolige K, Levine S, Kumar V (2020) Deep dynamics models for learning dexterous manipulation. In: Conference on robot learning (CoRL), pp 1101–1112
Nagami K, Schwager M (2021) Hjb-rl: Initializing reinforcement learning with optimal control policies applied to autonomous drone racing. In: Robotics: science and systems, pp 1–9
Nguyen TT, Silander T, Li Z, Leong TY (2017) Scalable transfer learning in heterogeneous, dynamic environments. Artif Intell 247:70–94. https://doi.org/10.1016/j.artint.2015.09.013
Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) Wavenet: a generative model for raw audio. arXiv preprint arXiv:1609.03499
Owens A, Efros AA (2018) Audio-visual scene analysis with self-supervised multisensory features. In: European conference on computer vision (ECCV), pp 631–648
Pan X, Seita D, Gao Y, Canny J (2019) Risk averse robust adversarial reinforcement learning. In: IEEE international conference on robotics and automation (ICRA), pp 8522–8528
Park D, Hoshi Y, Kemp CC (2018) A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder. IEEE Robot Autom Lett RA-L 3(3):1544–1551
Pfeiffer M, Schaeuble M, Nieto J, Siegwart R, Cadena C (2017) From perception to decision: a data-driven approach to end-to-end motion planning for autonomous ground robots. In: IEEE international conference on robotics and automation (ICRA), pp 1527–1533
Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Qureshi AH, Miao Y, Simeonov A, Yip MC (2021) Motion planning networks: bridging the gap between learning-based and classical motion planners. IEEE Trans Robot
Qureshi AH, Simeonov A, Bency MJ, Yip MC (2019) Motion planning networks. In: IEEE international conference on robotics and automation (ICRA), pp 2118–2124
Qureshi AH, Yip MC (2018) Deeply informed neural sampling for robot motion planning. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 6582–6588
Radwan N, Valada A, Burgard W (2018) Vlocnet++: deep multitask learning for semantic visual localization and odometry. IEEE Robot Autom Lett 3(4):4407–4414
Ranftl R, Koltun V (2018) Deep fundamental matrix estimation. In: European conference on computer vision (ECCV), pp 284–299
Reddy DSK, Saha A, Tamilselvam SG, Agrawal P, Dayama P (2019) Risk averse reinforcement learning for mixed multi-agent environments. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, pp 2171–2173
Ribeiro EG, de Queiroz Mendes R, Grassi V (2021) Real-time deep learning approach to visual servo control and grasp detection for autonomous robotic manipulation. Robot Auton Syst. https://doi.org/10.1016/j.robot.2021.103757
Richardson M, Domingos P (2006) Markov logic networks. Mach Learn 62(1–2):107–136
Riegel R, Gray A, Luus F, Khan N, Makondo N, Akhalwaya IY, Qian H, Fagin R, Barahona F, Sharma U et al (2020) Logical neural networks. arXiv preprint arXiv:2006.13155
Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: explicit invariance during feature extraction. In: International conference on machine learning (ICML)
Ross S, Gordon G, Bagnell D (2011) A reduction of imitation learning and structured prediction to no-regret online learning. In: International conference on artificial intelligence and statistics (AISTATS), pp 627–635
Rubinstein RY, Kroese DP (2013) The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation and machine learning. Springer, New York
Ruder M, Dosovitskiy A, Brox T (2018) Artistic style transfer for videos and spherical images. Int J Comput Vis 126(11):1199–1219
Rudin N, Kolvenbach H, Tsounis V, Hutter M (2021) Cat-like jumping and landing of legged robots in low gravity using deep reinforcement learning. IEEE Trans Robot. https://doi.org/10.1109/TRO.2021.3084374
Schaul T, Horgan D, Gregor K, Silver D (2015) Universal value function approximators. In: International conference on machine learning (ICML), pp 1312–1320
Schaul T, Quan J, Antonoglou I, Silver D (2015) Prioritized experience replay. arXiv preprint arXiv:1511.05952
Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International conference on machine learning (ICML), pp 1889–1897
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
Schwarzer M, Anand A, Goel R, Hjelm RD, Courville A, Bachman P (2020) Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929
Seo Y, Chen L, Shin J, Lee H, Abbeel P, Lee K (2021) State entropy maximization with random encoders for efficient exploration. arXiv preprint arXiv:2102.09430
Serafini L, Garcez Ad (2016) Logic tensor networks: deep learning and logical reasoning from data and knowledge. arXiv preprint arXiv:1606.04422
Shi W, Song S, Wu C (2019) Soft policy gradient method for maximum entropy deep reinforcement learning. In: International joint conference on artificial intelligence (IJCAI), pp 3425–3431
Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: International conference on machine learning (ICML)
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR)
Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for dynamics and control. PMLR, pp 958–968
Singh S, Jaakkola T, Littman ML, Szepesvári C (2000) Convergence results for single-step on-policy reinforcement-learning algorithms. Mach Learn 38(3):287–308
Smolyanskiy N, Kamenev A, Smith J, Birchfield S (2017) Toward low-flying autonomous mav trail navigation using deep neural networks for environmental awareness. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 4241–4247
Sohn K, Lee H, Yan X (2015) Learning structured output representation using deep conditional generative models. Adv Neural Inf Process Syst NIPS 28:3483–3491
Srinivas A, Laskin M, Abbeel P (2020) Curl: contrastive unsupervised representations for reinforcement learning. arXiv preprint arXiv:2004.04136
Sukhbaatar S, Lin Z, Kostrikov I, Synnaeve G, Szlam A, Fergus R (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: International conference on learning representations (ICLR)
Sun T, Gong L, Li X, Xie S, Chen Z, Hu Q, Filliat D (2021) Robotdrlsim: a real time robot simulation platform for reinforcement learning and human interactive demonstration learning. J Phys Conf Ser 1746:012035
Sun Z, Li F, Duan X, Jin L, Lian Y, Liu S, Liu K (2021) Deep reinforcement learning for quadrotor path following with adaptive velocity. Auton Robots 45:119–134. https://doi.org/10.1016/j.robot.2021.103757
Sun Z, Li F, Duan X, Jin L, Lian Y, Liu S, Liu K (2021) A novel adaptive iterative learning control approach and human-in-the-loop control pattern for lower limb rehabilitation robot in disturbances environment. Auton Robots 45:595–610. https://doi.org/10.1016/j.robot.2021.103757
Sünderhauf N, Brock O, Scheirer W, Hadsell R, Fox D, Leitner J, Upcroft B, Abbeel P, Burgard W, Milford M et al (2018) The limits and potentials of deep learning for robotics. Int J Robot Res 37(4–5):405–420
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge
Tai L, Paolo G, Liu M (2017) Virtual-to-real deep reinforcement learning: continuous control of mobile robots for mapless navigation. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 31–36
Tang G, Hauser K (2019) Discontinuity-sensitive optimal control learning by mixture of experts. In: 2019 international conference on robotics and automation (ICRA). IEEE, pp 7892–7898
Tenorio-González AC, Morales EF (2018) Automatic discovery of concepts and actions. Expert Syst Appl 92:192–205
Tenorio-Gonzalez AC, Morales EF, Villasenor-Pineda L (2010) Dynamic reward shaping: training a robot by voice. In: Ibero-American conference on artificial intelligence. Springer, pp 483–492
Terasawa R, Ariki Y, Narihira T, Tsuboi T, Nagasaka K (2020) 3D-CNN based heuristic guided task-space planner for faster motion planning. In: IEEE international conference on robotics and automation (ICRA), pp 9548–9554
Tesauro G (1992) Practical issues in temporal difference learning. In: Advances in neural information processing systems (NIPS), pp 259–266
Thrun S, Schwartz A (1993) Issues in using function approximation for reinforcement learning. In: Connectionist models summer school
To T, Tremblay J, McKay D, Yamaguchi Y, Leung K, Balanon A, Cheng J, Birchfield S (2018) Ndds: Nvidia deep learning dataset synthesizer. https://github.com/NVIDIA/Dataset_Synthesizer
Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 23–30
Todorov E, Erez T, Tassa Y (2012) Mujoco: a physics engine for model-based control. In: 2012 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 5026–5033
Tremblay J, To T, Birchfield S (2018) Falling things: a synthetic dataset for 3D object detection and pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) workshops, pp 2038–2041
Tremblay J, To T, Sundaralingam B, Xiang Y, Fox D, Birchfield S (2018) Deep object pose estimation for semantic robotic grasping of household objects. arXiv preprint arXiv:1809.10790
Ugurlu H, Kalkan S, Saranli A (2021) Reinforcement learning versus conventional control for controlling a planar bi-rotor platform with tail appendage. J Intell Robot Syst. https://doi.org/10.1007/s10846-021-01412-3
Van Hasselt H, Guez A, Silver D (2015) Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461
Vasquez-Gomez JI, Troncoso D, Becerra I, Sucar E, Murrieta-Cid R (2021) Next-best-view regression using a 3D convolutional neural network. Mach Vis Appl 32(42):1–14. https://doi.org/10.1007/s00138-020-01166-2
Wang H, Yeung DY (2020) A survey on Bayesian deep learning. ACM Comput Surv 53(5):1–37
Wang Z, Chen C, Li HX, Dong D, Tarn TJ (2019) Incremental reinforcement learning with prioritized sweeping for dynamic environments. IEEE/ASME Trans Mechatron 24(2):621–632
Wang Z, Reed Garrett C, Pack Kaelbling L, Lozano-Pérez T (2021) Learning compositional models of robot skills for task and motion planning. Int J Robot Res 40(6–7):866–894. https://doi.org/10.1177/02783649211004615
Wang Z, Schaul T, Hessel M, Hasselt H, Lanctot M, Freitas N (2016) Dueling network architectures for deep reinforcement learning. In: International conference on machine learning (ICML), pp 1995–2003
Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292
Wellhausen L, Dosovitskiy A, Ranftl R, Walas K, Cadena C, Hutter M (2019) Where should I walk? Predicting terrain properties from images via self-supervised learning. IEEE Robot Autom Lett 4(2):1509–1516
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8(3–4):229–256
Wu C, Zeng R, Pan J, Wang CC, Liu YJ (2019) Plant phenotyping by deep-learning-based planner for multi-robots. IEEE Robot Autom Lett 4(4):3113–3120
Wu X, Sahoo D, Hoi SC (2020) Recent advances in deep learning for object detection. Neurocomputing 396:39–64
Xiang Y, Schmidt T, Narayanan V, Fox D (2017) PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199
Xu H, Gao Y, Yu F, Darrell T (2017) End-to-end learning of driving models from large-scale video datasets. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2174–2182
Yang C, Liu Y, Zell A (2021) Relative camera pose estimation using synthetic data with domain adaptation via cycle-consistent adversarial networks. J Intell Robot Syst. https://doi.org/10.1007/s10846-021-01439-6
Yarats D, Fergus R, Lazaric A, Pinto L (2021) Reinforcement learning with prototypical representations. In: International conference on machine learning (ICML)
Zhang C, Huh J, Lee DD (2018) Learning implicit sampling distributions for motion planning. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3654–3661
Zhang J, Cheung B, Finn C, Levine S, Jayaraman D (2020) Cautious adaptation for reinforcement learning in safety-critical settings. In: International conference on machine learning. PMLR, pp 11055–11065
Zhang J, Tai L, Yun P, Xiong Y, Liu M, Boedecker J, Burgard W (2019) VR-goggles for robots: real-to-sim domain adaptation for visual control. IEEE Robot Autom Lett 4(2):1148–1155
Zhang S, Liu B, Whiteson S (2020) Per-step reward: a new perspective for risk-averse reinforcement learning. arXiv preprint arXiv:2004.10888
Zhou T, Tulsiani S, Sun W, Malik J, Efros AA (2016) View synthesis by appearance flow. In: European conference on computer vision (ECCV), pp 286–301
Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE international conference on computer vision (ICCV), pp 2223–2232
Zhu Z, Lin K, Zhou J (2020) Transfer learning in deep reinforcement learning: a survey. arXiv preprint arXiv:2009.07888
Funding
This work was supported by CONACyT Alliance of Artificial Intelligence, Cátedras-CONACyT project 745 and Intel Corporation.
Author information
Contributions
E. F. Morales and R. Murrieta-Cid contributed to the article conception and design. All authors performed the literature search, drafted, and critically revised the work.
Ethics declarations
Conflict of interest
The authors have no relevant financial or nonfinancial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Morales, E.F., Murrieta-Cid, R., Becerra, I. et al. A survey on deep learning and deep reinforcement learning in robotics with a tutorial on deep reinforcement learning. Intel Serv Robotics 14, 773–805 (2021). https://doi.org/10.1007/s11370-021-00398-z