Abstract
This article surveys deep learning (DL) and deep reinforcement learning (DRL) applied to robotics. Both tools have proven successful at delivering data-driven solutions to robotics tasks, and they provide a natural way to build an end-to-end pipeline from the robot's sensing to its actuation, passing through the generation of a policy that performs the given task. These frameworks have also been shown to cope with real-world complications such as sensing noise, imprecise actuation, and variability in the scenarios where the robot is deployed. In that vein, and given the growing interest in DL and DRL, the present work begins with a brief tutorial on deep reinforcement learning, whose goal is to convey the main concepts and approaches followed in the field. The article then describes the main, most recent, and most promising DL and DRL approaches in robotics, with enough technical detail to understand the core of each work and to motivate interested readers to start their own research in the area. To provide a comparative analysis, we then present several taxonomies in which the references can be classified, according to high-level features, the task a work addresses, the type of system, and the learning techniques used. We conclude by presenting promising research directions for both DL and DRL.
Notes
SARSA: State, Action, Reward, State, Action.
REINFORCE: REward Increment = Nonnegative Factor × Offset Reinforcement × Characteristic Eligibility.
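Both acronyms above describe concrete update rules, which the following minimal tabular sketch makes explicit (an illustrative example, not code from any surveyed work; all names and the toy transition are our own). The SARSA update bootstraps from the quintuple (S, A, R, S', A'), while Williams' REINFORCE increment is literally the product of the three acronym terms: a nonnegative step size, an offset (baseline-subtracted) reinforcement, and the characteristic eligibility ∇θ ln π(a|s; θ), here for a softmax policy:

```python
import math
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # On-policy TD update over the quintuple (State, Action, Reward, State, Action):
    # the target bootstraps from the action actually chosen in the next state.
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
    return Q[(s, a)]

def reinforce_update(theta, s, a, r, baseline=0.0, alpha=0.1, n_actions=2):
    # REward Increment = Nonnegative Factor (alpha)
    #                    * Offset Reinforcement (r - baseline)
    #                    * Characteristic Eligibility (d ln pi(a|s) / d theta).
    prefs = [theta[(s, b)] for b in range(n_actions)]
    z = sum(math.exp(p) for p in prefs)
    pi = [math.exp(p) / z for p in prefs]          # softmax policy
    for b in range(n_actions):
        eligibility = (1.0 if b == a else 0.0) - pi[b]
        theta[(s, b)] += alpha * (r - baseline) * eligibility
    return pi[a]

# Toy transition: in state 0, action 'go' earns reward 1 and leads to state 1,
# where the policy again picks 'go'. With Q initialized at zero:
Q = defaultdict(float)
q_new = sarsa_update(Q, 0, 'go', 1.0, 1, 'go')     # 0.5 * (1.0 + 0.9*0 - 0) = 0.5

# One REINFORCE step from uniform preferences after observing reward 1 for action 0.
theta = defaultdict(float)
pi_a = reinforce_update(theta, s=0, a=0, r=1.0)    # pi(a|s) was 0.5 before the step
```

The eligibility line is the softmax score function, 1[b = a] − π(b|s), so actions that earned above-baseline reward become more probable on the next visit to that state.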
References
Abate A, Prandini M, Lygeros J, Sastry S (2008) Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems. Automatica 44(11):2724–2734
Amini A, Gilitschenski I, Phillips J, Moseyko J, Banerjee R, Karaman S, Rus D (2020) Learning robust control policies for end-to-end autonomous driving from data-driven simulation. IEEE Robot Autom Lett RA-L 5(2):1143–1150
Amini A, Rosman G, Karaman S, Rus D (2019) Variational end-to-end navigation and localization. In: IEEE international conference on robotics and automation (ICRA), pp 8958–8964
Andrychowicz M, Wolski F, Ray A, Schneider J, Fong R, Welinder P, McGrew B, Tobin J, Pieter Abbeel O, Zaremba W (2017) Hindsight experience replay, pp 5048–5058
Asseman A, Kornuta T, Ozcan A (2018) Learning beyond simulated physics
Badrinarayanan V, Kendall A, Cipolla R (2017) Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481–2495
Bareinboim E, Forney A, Pearl J (2015) Bandits with unobserved confounders: a causal approach. Adv Neural Inf Process Syst NIPS 28:1342–1350
Barth-Maron G, Hoffman MW, Budden D, Dabney W, Horgan D, Tb D, Muldal A, Heess N, Lillicrap T (2018) Distributed distributional deterministic policy gradients. arXiv preprint arXiv:1804.08617
Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: International conference on machine learning (ICML), pp 449–458
Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. In: International conference on machine learning (ICML), pp 41–48
Bojarski M, Del Testa D, Dworakowski D, Firner B, Flepp B, Goyal P, Jackel LD, Monfort M, Muller U, Zhang J et al (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316
Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) OpenAI gym. arXiv preprint arXiv:1606.01540
Cabi S, Colmenarejo SG, Novikov A, Konyushkova K, Reed S, Jeong R, Zolna K, Aytar Y, Budden D, Vecerik M et al (2019) Scaling data-driven robotics with reward sketching and batch reinforcement learning. arXiv preprint arXiv:1909.12200
Cai P, Wang S, Sun Y, Liu M (2020) Probabilistic end-to-end vehicle navigation in complex dynamic environments with multimodal sensor fusion. IEEE Robot Autom Lett 5(3):4218–4224
Caicedo JC, Lazebnik S (2015) Active object localization with deep reinforcement learning. In: International conference on computer vision, pp 2488–2496
Caicedo JC, Lazebnik S (2017) Action-decision networks for visual tracking with deep reinforcement learning. In: IEEE international conference on imaging, vision and pattern recognition, pp 2711–2720
Calli B, Singh A, Walsman A, Srinivasa S, Abbeel P, Dollar AM (2015) The YCB object and model set: towards common benchmarks for manipulation research. In: International conference on advanced robotics (ICAR), pp 510–517
Campos V, Trott A, Xiong C, Socher R, Giró-i Nieto X, Torres J (2020) Explore, discover and learn: Unsupervised discovery of state-covering skills. In: International conference on machine learning. PMLR, pp 1317–1327
Canny J (1988) Some algebraic and geometric computations in PSPACE. In: ACM symposium on theory of computing (STOC), pp 460–467
Canny J, Reif J (1987) New lower bound techniques for robot motion planning problems. In: IEEE symposium on foundations of computer science (FOCS), pp 49–60
Chebotar Y, Handa A, Makoviychuk V, Macklin M, Issac J, Ratliff N, Fox D (2019) Closing the sim-to-real loop: adapting simulation randomization with real world experience. In: IEEE international conference on robotics and automation (ICRA), pp 8973–8979
Chen B, Dai B, Lin Q, Ye G, Liu H, Song L (2019) Learning to plan in high dimensions via neural exploration-exploitation trees. In: International conference on learning representations (ICLR)
Chen T, Zhai X, Ritter M, Lucic M, Houlsby N (2019) Self-supervised GANs via auxiliary rotation loss. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12154–12163
Chen X, Fan H, Girshick R, He K (2020) Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297
Chiang HT, Malone N, Lesser K, Oishi M, Tapia L (2015) Aggressive moving obstacle avoidance using a stochastic reachable set based potential field. In: International workshop on the algorithmic foundations of robotics (WAFR), pp 73–89
Chiang HTL, Faust A, Sugaya S, Tapia L (2018) Fast swept volume estimation with deep learning. In: International workshop on the algorithmic foundations of robotics (WAFR), pp 52–68
Chiang HTL, Faust M, Fiser M, Frances A (2019) Learning navigation behaviors end to end with auto-RL. IEEE Robot Autom Lett 56:2007–2014
Chiang HTL, Hsu J, Fiser M, Tapia L, Faust A (2019) RL-RRT: kinodynamic motion planning via learning reachability estimators from RL policies. IEEE Robot Autom Lett 4(4):4298–4305
Codevilla F, Miiller M, López A, Koltun V, Dosovitskiy A (2018) End-to-end driving via conditional imitation learning. In: IEEE international conference on robotics and automation (ICRA), pp 1–9
Crosby M, Beyret B, Halina M (2019) The animal-ai olympics. Nat Mach Intell 1(5):257–257
Dasari S, Ebert F, Tian S, Nair S, Bucher B, Schmeckpeper K, Singh S, Levine S, Finn C (2019) Robonet: large-scale multi-robot learning. arXiv preprint arXiv:1910.11215
Dasgupta I, Wang J, Chiappa S, Mitrovic J, Ortega P, Raposo D, Hughes E, Battaglia P, Botvinick M, Kurth-Nelson Z (2019) Causal reasoning from meta-reinforcement learning. arXiv preprint arXiv:1901.08162
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 248–255
Devo A, Dionigi A, Costante G (2021) Enhancing continuous control of mobile robots for end-to-end visual active tracking. Robot Autonom Syst. https://doi.org/10.1016/j.robot.2021.103799
Dolgov D, Thrun S, Montemerlo M, Diebel J (2010) Path planning for autonomous vehicles in unknown semi-structured environments. Int J Robot Res 29(5):485–501
Dosovitskiy A, Fischer P, Ilg E, Hausser P, Hazirbas C, Golkov V, Van Der Smagt P, Cremers D, Brox T (2015) FlowNet: learning optical flow with convolutional networks. In: IEEE international conference on computer vision (ICCV), pp 2758–2766
Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) Carla: an open urban driving simulator. In: Conference on robot learning (CoRL), pp 1–16
Driess D, Oguz O, Ha JS, Toussaint M (2020) Deep visual heuristics: learning feasibility of mixed-integer programs for manipulation planning. In: 2020 IEEE international conference on robotics and automation (ICRA). IEEE, pp 9563–9569
Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Trans Neural Netw 22(10):1517–1531. https://doi.org/10.1109/TNN.2011.2160459
Fabisch A, Petzoldt C, Otto M, Kirchner F (2019) A survey of behavior learning applications in robotics-state of the art and perspectives. arXiv preprint arXiv:1906.01868
Fairbank M, Alonso E (2012) The divergence of reinforcement learning algorithms with value-iteration and function approximation. In: IEEE international joint conference on neural networks (IJCNN), pp 1–8
Faust A, Oslund K, Ramirez O, Francis A, Tapia L, Fiser M, Davidson J (2018) PRM-RL: long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning. In: IEEE international conference on robotics and automation (ICRA), pp 5113–5120
Fernández IMR, Sutanto G, Englert P, Ramachandran RK, Sukhatme GS (2020) Learning manifolds for sequential motion planning. arXiv preprint arXiv:2006.07746
Fortunato M, Azar MG, Piot B, Menick J, Hessel M, Osband I, Graves A, Mnih V, Munos R, Hassabis D et al (2018) Noisy networks for exploration. In: International conference on learning representations (ICLR)
Fox D (2001) KLD-sampling: adaptive particle filters, pp 713–720
Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: International conference on machine learning. PMLR, pp 1587–1596
Gao W, Hsu D, Lee WS, Shen S, Subramanian K (2017) Intention-net: integrating planning and deep learning for goal-directed autonomous navigation. In: Conference on robot learning (CoRL), pp 185–194
Garcia Cifuentes C, Issac J, Wüthrich M, Schaal S, Bohg J (2016) Probabilistic articulated real-time tracking for robot manipulation. IEEE Robot Autom Lett RA-L 2(2):577–584
Garg A, Chiang HTL, Sugaya S, Faust A, Tapia L (2019) Comparison of deep reinforcement learning policies to formal methods for moving obstacle avoidance. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3534–3541
Gershman SJ (2017) Reinforcement learning and causal models. The Oxford handbook of causal reasoning, p 295
Gonzalez-Trejo J, Mercado-Ravell DA, Becerra I, Murrieta-Cid R (2021) On the visual-based safe landing of UAVS in populated areas: a crucial aspect for urban deployment. IEEE Robot Autom Lett. https://doi.org/10.1109/LRA.2021.3101861
Goodfellow I, Bengio Y, Courville A, Bengio Y (2016) Deep learning, vol 1. MIT Press, Cambridge
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems (NIPS), pp 2672–2680
Hadsell R, Sermanet P, Ben J, Erkan A, Scoffier M, Kavukcuoglu K, Muller U, LeCun Y (2009) Learning long-range vision for autonomous off-road driving. J Field Robot 26(2):120–144
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778
Heiden E, Millard D, Coumans E, Sukhatme GS (2020) Augmenting differentiable simulators with neural networks to close the Sim2Real gap. arXiv preprint arXiv:2007.06045
Henaff O (2020) Data-efficient image recognition with contrastive predictive coding. In: International conference on machine learning. PMLR, pp 4182–4192
Hernandez-Garcia JF, Sutton RS (2019) Understanding multi-step deep reinforcement learning: a systematic study of the DQN target. arXiv preprint arXiv:1901.07510
Hessel M, Modayil J, van Hasselt H, Schaul T, Ostrovski G, Dabney W, Horgan D, Piot B, Azar MG, Silver D (2018) Rainbow: combining improvements in deep reinforcement learning. In: AAAI conference on artificial intelligence
Higuera JCG, Meger D, Dudek G (2017) Adapting learned robotics behaviours through policy adjustment. In: IEEE international conference on robotics and automation (ICRA), pp 5837–5843
Hirose N, Sadeghian A, Vázquez M, Goebel P, Savarese S (2018) Gonet: a semi-supervised deep learning approach for traversability estimation. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3044–3051
Hirose N, Sadeghian A, Vázquez M, Goebel P, Savarese S (2018) Gonet: a semi-supervised deep learning approach for traversability estimation. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3044–3051
Hirose N, Sadeghian A, Xia F, Martín-Martín R, Savarese S (2019) VUNet: dynamic scene view synthesis for traversability estimation using an RGB camera. IEEE Robot Autom Lett 4(2):2062–2069
Hirose N, Xia F, Martín-Martín R, Sadeghian A, Savarese S (2019) Deep visual MPC-policy learning for navigation. IEEE Robot Autom Lett RA-L 4(4):3184–3191
Ho SB (2017) Causal learning versus reinforcement learning for knowledge learning and problem solving. In: AAAI workshops
Hoffman J, Tzeng E, Park T, Zhu JY, Isola P, Saenko K, Efros A, Darrell T (2018) Cycada: cycle-consistent adversarial domain adaptation. In: International conference on machine learning (ICML), pp 1989–1998
Horgan D, Quan J, Budden D, Barth-Maron G, Hessel M, van Hasselt H, Silver D (2018) Distributed prioritized experience replay. In: International conference on learning representations (ICLR)
Ibarz B, Leike J, Pohlen T, Irving G, Legg S, Amodei D (2018) Reward learning from human preferences and demonstrations in Atari. arXiv preprint arXiv:1811.06521
Ichter B, Harrison J, Pavone M (2018) Learning sampling distributions for robot motion planning. In: IEEE international conference on robotics and automation (ICRA), pp 7087–7094
Ichter B, Pavone M (2019) Robot motion planning in learned latent spaces. IEEE Robot Autom Lett 4(3):2407–2414
Ichter B, Schmerling E, Lee TWE, Faust A (2020) Learned critical probabilistic roadmaps for robotic motion planning. In: IEEE international conference on robotics and automation (ICRA), pp 9535–9541
Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T (2017) Flownet 2.0: evolution of optical flow estimation with deep networks. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2462–2470
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. PMLR, pp 448–456
Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F (2021) A survey on contrastive self-supervised learning. Technologies 9(1):2
James S, Ma Z, Arrojo DR, Davison AJ (2020) RLBench: the robot learning benchmark & learning environment. IEEE Robot Autom Lett 5(2):3019–3026
Jing L, Tian Y (2020) Self-supervised visual feature learning with deep neural networks: a survey. IEEE Trans Pattern Anal Mach Intell
Julian R, Swanson B, Sukhatme GS, Levine S, Finn C, Hausman K (2020) Efficient adaptation for end-to-end vision-based robotic manipulation. arXiv preprint arXiv:2004.10190
Kahn G, Abbeel P, Levine S (2020) Badgr: an autonomous self-supervised learning-based navigation system. arXiv preprint arXiv:2002.05700
Kalashnikov D, Irpan A, Pastor P, Ibarz J, Herzog A, Jang E, Quillen D, Holly E, Kalakrishnan M, Vanhoucke V et al (2018) Scalable deep reinforcement learning for vision-based robotic manipulation. In: Conference on robot learning (CoRL), pp 651–673
Kapturowski S, Ostrovski G, Quan J, Munos R, Dabney W (2018) Recurrent experience replay in distributed reinforcement learning. In: International conference on learning representations (ICLR)
Karkus P, Hsu D, Lee WS (2017) Qmdp-net: deep learning for planning under partial observability. In: Advances in neural information processing systems (NIPS), pp 4697–4707
Karkus P, Ma X, Hsu D, Kaelbling LP, Lee WS, Lozano-Pérez T (2019) Differentiable algorithm networks for composable robot learning. arXiv preprint arXiv:1905.11602
Károly AI, Galambos P, Kuti J, Rudas IJ (2020) Deep learning in robotics: survey on model structures and training strategies. IEEE Trans Syst Man Cybern Syst
Kaufmann E, Loquercio A, Ranftl R, Müller M, Koltun V, Scaramuzza D (2020) Deep drone acrobatics. arXiv preprint arXiv:2006.05768
Kaushik R, Desreumaux P, Mouret JB (2020) Adaptive prior selection for repertoire-based online adaptation in robotics. Front Robot AI 6:151. https://doi.org/10.3389/frobt.2019.00151
Kirtas M, Tsampazis K, Passalis N, Tefas A (2020) Deepbots: a webots-based deep reinforcement learning framework for robotics. In: IFIP international conference on artificial intelligence applications and innovations. Springer, pp 64–75
Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274
Kong J, Pfeiffer M, Schildbach G, Borrelli F (2015) Kinematic and dynamic vehicle models for autonomous driving control design. In: IEEE Intelligent vehicles symposium (IV), pp 1094–1099
Kostrikov I, Yarats D, Fergus R (2020) Image augmentation is all you need: regularizing deep reinforcement learning from pixels. arXiv preprint arXiv:2004.13649
Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Kumar R, Mandalika A, Choudhury S, Srinivasa S (2019) Lego: leveraging experience in roadmap generation for sampling-based planning. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1488–1495
Kuutti S, Bowden R, Jin Y, Barber P, Fallah S (2020) A survey of deep learning applications to autonomous vehicle control. IEEE Trans Intell Transp Syst
Lamb L, Garcez A, Gori M, Prates M, Avelar P, Vardi M (2020) Graph neural networks meet neural-symbolic computing: A survey and perspective. arXiv preprint arXiv:2003.00330
Lecarpentier E, Abel D, Asadi K, Jinnai Y, Rachelson E, Littman ML (2020) Lipschitz lifelong reinforcement learning. arXiv preprint arXiv:2001.05411
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
LeCun Y, Muller U, Ben J, Cosatto E, Flepp B (2006) Off-road obstacle avoidance through end-to-end learning. In: Advances in neural information processing systems (NIPS), pp 739–746
Lee JH, Han MK, Ko DW, Suh IH (2019) From big to small: multi-scale local planar guidance for monocular depth estimation. arXiv preprint arXiv:1907.10326
Lee K, Smith L, Abbeel P (2021) Pebble: feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training. arXiv preprint arXiv:2106.05091
Lee MA, Zhu Y, Srinivasan K, Shah P, Savarese S, Fei-Fei L, Garg A, Bohg J (2019) Making sense of vision and touch: self-supervised learning of multimodal representations for contact-rich tasks. In: IEEE international conference on robotics and automation (ICRA), pp 8943–8950
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: International conference on learning representations (ICLR)—poster
Lippi M, Poklukar P, Welle MC, Varava A, Yin H, Marino A, Kragic D (2020) Latent space roadmap for visual action planning of deformable and rigid object manipulation. In: IEEE/RSJ international conference on intelligent robots and systems (IROS)
Liu F, Shen C, Lin G, Reid I (2015) Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans Pattern Anal Mach Intell 38(10):2024–2039
Liu H, Abbeel P (2021) Behavior from the void: Unsupervised active pre-training
Liu K, Stadler M, Roy N (2020) Learned sampling distributions for efficient planning in hybrid geometric and object-level representations. In: IEEE international conference on robotics and automation (ICRA), pp 9555–9562
Loquercio A, Maqueda AI, Del-Blanco CR, Scaramuzza D (2018) DroNet: learning to fly by driving. IEEE Robot Autom Lett 3(2):1088–1095
Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275
Luo W, Sun P, Zhong F, Liu W, Zhang T, Wang Y (2018) End-to-end active object tracking via reinforcement learning. In: International conference on machine learning, pp 3286–3295
Luo W, Sun P, Zhong F, Liu W, Zhang T, Wang Y (2019) End-to-end active object tracking and its real-world deployment via reinforcement learning. IEEE Trans Pattern Anal Mach Intell 42(6):1317–1332
Madumal P, Miller T, Sonenberg L, Vetere F (2020) Explainable reinforcement learning through a causal lens. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, pp 2493–2500
Mao J, Gan C, Kohli P, Tenenbaum JB, Wu J (2019) The neuro-symbolic concept learner: interpreting scenes, words, and sentences from natural supervision. arXiv preprint arXiv:1904.12584
McCarty SL, Burke LM, McGuire M (2018) Parallel monotonic basin hopping for low thrust trajectory optimization. In: AAS/AIAA space flight mechanics meeting, p 1452
Mendoza M, Vasquez-Gomez JI, Taud H, Sucar LE, Reta C (2020) Supervised learning of the next-best-view for 3D object reconstruction. Pattern Recognit Lett 133:224–231
Merkt WX, Ivan V, Dinev T, Havoutis I, Vijayakumar S (2021) Memory clustering using persistent homology for multimodality- and discontinuity-sensitive learning of optimal control warm-starts. IEEE Trans Robot. https://doi.org/10.1109/TRO.2021.3069132
Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International conference on machine learning (ICML), pp 1928–1937
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Molchanov A, Chen T, Hönig W, Preiss JA, Ayanian N, Sukhatme GS (2019) Sim-to-(multi)-real: transfer of low-level robust control policies to multiple quadrotors. arXiv preprint arXiv:1903.04628
Morgan AS, Bircher WG, Dollar AM (2021) Towards generalized manipulation learning through grasp mechanics-based features and self-supervision. IEEE Trans Robot. https://doi.org/10.1109/TRO.2021.3057802
Nagabandi A, Finn C, Levine S (2019) Deep online learning via meta-learning: continual adaptation for model-based RL
Nagabandi A, Konolige K, Levine S, Kumar V (2020) Deep dynamics models for learning dexterous manipulation. In: Conference on robot learning (CoRL), pp 1101–1112
Nagami K, Schwager M (2021) Hjb-rl: Initializing reinforcement learning with optimal control policies applied to autonomous drone racing. In: Robotics: science and systems, pp 1–9
Nguyen TT, Silander T, Li Z, Leong TY (2017) Scalable transfer learning in heterogeneous, dynamic environments. Artif Intell 247:70–94. https://doi.org/10.1016/j.artint.2015.09.013
Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) Wavenet: a generative model for raw audio. arXiv preprint arXiv:1609.03499
Owens A, Efros AA (2018) Audio-visual scene analysis with self-supervised multisensory features. In: European conference on computer vision (ECCV), pp 631–648
Pan X, Seita D, Gao Y, Canny J (2019) Risk averse robust adversarial reinforcement learning. In: IEEE international conference on robotics and automation (ICRA), pp 8522–8528
Park D, Hoshi Y, Kemp CC (2018) A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder. IEEE Robot Autom Lett RA-L 3(3):1544–1551
Pfeiffer M, Schaeuble M, Nieto J, Siegwart R, Cadena C (2017) From perception to decision: a data-driven approach to end-to-end motion planning for autonomous ground robots. In: IEEE international conference on robotics and automation (ICRA), pp 1527–1533
Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Qureshi AH, Miao Y, Simeonov A, Yip MC (2021) Motion planning networks: bridging the gap between learning-based and classical motion planners. IEEE Trans Robot
Qureshi AH, Simeonov A, Bency MJ, Yip MC (2019) Motion planning networks. In: IEEE international conference on robotics and automation (ICRA), pp 2118–2124
Qureshi AH, Yip MC (2018) Deeply informed neural sampling for robot motion planning. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 6582–6588
Radwan N, Valada A, Burgard W (2018) Vlocnet++: deep multitask learning for semantic visual localization and odometry. IEEE Robot Autom Lett 3(4):4407–4414
Ranftl R, Koltun V (2018) Deep fundamental matrix estimation. In: European conference on computer vision (ECCV), pp 284–299
Reddy DSK, Saha A, Tamilselvam SG, Agrawal P, Dayama P (2019) Risk averse reinforcement learning for mixed multi-agent environments. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, pp 2171–2173
Ribeiro EG, de Queiroz Mendes R, Grassi V (2021) Real-time deep learning approach to visual servo control and grasp detection for autonomous robotic manipulation. Robot Auton Syst. https://doi.org/10.1016/j.robot.2021.103757
Richardson M, Domingos P (2006) Markov logic networks. Mach Learn 62(1–2):107–136
Riegel R, Gray A, Luus F, Khan N, Makondo N, Akhalwaya IY, Qian H, Fagin R, Barahona F, Sharma U et al (2020) Logical neural networks. arXiv preprint arXiv:2006.13155
Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: explicit invariance during feature extraction. In: International conference on machine learning (ICML)
Ross S, Gordon G, Bagnell D (2011) A reduction of imitation learning and structured prediction to no-regret online learning. In: International conference on artificial intelligence and statistics (AISTATS), pp 627–635
Rubinstein RY, Kroese DP (2013) The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation and machine learning. Springer, New York
Ruder M, Dosovitskiy A, Brox T (2018) Artistic style transfer for videos and spherical images. Int J Comput Vis 126(11):1199–1219
Rudin N, Kolvenbach H, Tsounis V, Hutter M (2021) Cat-like jumping and landing of legged robots in low gravity using deep reinforcement learning. IEEE Trans Robot. https://doi.org/10.1109/TRO.2021.3084374
Schaul T, Horgan D, Gregor K, Silver D (2015) Universal value function approximators. In: International conference on machine learning (ICML), pp 1312–1320
Schaul T, Quan J, Antonoglou I, Silver D (2015) Prioritized experience replay. arXiv preprint arXiv:1511.05952
Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International conference on machine learning (ICML), pp 1889–1897
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
Schwarzer M, Anand A, Goel R, Hjelm RD, Courville A, Bachman P (2020) Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929
Seo Y, Chen L, Shin J, Lee H, Abbeel P, Lee K (2021) State entropy maximization with random encoders for efficient exploration. arXiv preprint arXiv:2102.09430
Serafini L, Garcez Ad (2016) Logic tensor networks: deep learning and logical reasoning from data and knowledge. arXiv preprint arXiv:1606.04422
Shi W, Song S, Wu C (2019) Soft policy gradient method for maximum entropy deep reinforcement learning. In: International joint conference on artificial intelligence (IJCAI), pp 3425–3431
Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: International conference on machine learning (ICML)
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR)
Singh R, Zhang Q, Chen Y (2020) Improving robustness via risk averse distributional reinforcement learning. In: Learning for dynamics and control. PMLR, pp 958–968
Singh S, Jaakkola T, Littman ML, Szepesvári C (2000) Convergence results for single-step on-policy reinforcement-learning algorithms. Mach Learn 38(3):287–308
Smolyanskiy N, Kamenev A, Smith J, Birchfield S (2017) Toward low-flying autonomous mav trail navigation using deep neural networks for environmental awareness. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 4241–4247
Sohn K, Lee H, Yan X (2015) Learning structured output representation using deep conditional generative models. Adv Neural Inf Process Syst NIPS 28:3483–3491
Srinivas A, Laskin M, Abbeel P (2020) Curl: contrastive unsupervised representations for reinforcement learning. arXiv preprint arXiv:2004.04136
Sukhbaatar S, Lin Z, Kostrikov I, Synnaeve G, Szlam A, Fergus R (2018) Intrinsic motivation and automatic curricula via asymmetric self-play. In: International conference on learning representations (ICLR)
Sun T, Gong L, Li X, Xie S, Chen Z, Hu Q, Filliat D (2021) Robotdrlsim: a real time robot simulation platform for reinforcement learning and human interactive demonstration learning. J Phys Conf Ser 1746:012035
Sun Z, Li F, Duan X, Jin L, Lian Y, Liu S, Liu K (2021) Deep reinforcement learning for quadrotor path following with adaptive velocity. Auton Robots 45:119–134. https://doi.org/10.1016/j.robot.2021.103757
Sun Z, Li F, Duan X, Jin L, Lian Y, Liu S, Liu K (2021) A novel adaptive iterative learning control approach and human-in-the-loop control pattern for lower limb rehabilitation robot in disturbances environment. Auton Robots 45:595–610. https://doi.org/10.1016/j.robot.2021.103757
Sünderhauf N, Brock O, Scheirer W, Hadsell R, Fox D, Leitner J, Upcroft B, Abbeel P, Burgard W, Milford M et al (2018) The limits and potentials of deep learning for robotics. Int J Robot Res 37(4–5):405–420
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge
Tai L, Paolo G, Liu M (2017) Virtual-to-real deep reinforcement learning: continuous control of mobile robots for mapless navigation. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 31–36
Tang G, Hauser K (2019) Discontinuity-sensitive optimal control learning by mixture of experts. In: 2019 international conference on robotics and automation (ICRA). IEEE, pp 7892–7898
Tenorio-González AC, Morales EF (2018) Automatic discovery of concepts and actions. Expert Syst Appl 92:192–205
Tenorio-Gonzalez AC, Morales EF, Villasenor-Pineda L (2010) Dynamic reward shaping: training a robot by voice. In: Ibero-American conference on artificial intelligence. Springer, pp 483–492
Terasawa R, Ariki Y, Narihira T, Tsuboi T, Nagasaka K (2020) 3D-CNN based heuristic guided task-space planner for faster motion planning. In: IEEE international conference on robotics and automation (ICRA), pp 9548–9554
Tesauro G (1992) Practical issues in temporal difference learning. In: Advances in neural information processing systems (NIPS), pp 259–266
Thrun S, Schwartz A (1993) Issues in using function approximation for reinforcement learning. In: Connectionist models summer school
To T, Tremblay J, McKay D, Yamaguchi Y, Leung K, Balanon A, Cheng J, Birchfield S (2018) Ndds: Nvidia deep learning dataset synthesizer. https://github.com/NVIDIA/Dataset_Synthesizer
Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 23–30
Todorov E, Erez T, Tassa Y (2012) Mujoco: a physics engine for model-based control. In: 2012 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 5026–5033
Tremblay J, To T, Birchfield S (2018) Falling things: a synthetic dataset for 3D object detection and pose estimation. In: IEEE conference on computer vision and pattern recognition (CVPR) workshops, pp 2038–2041
Tremblay J, To T, Sundaralingam B, Xiang Y, Fox D, Birchfield S (2018) Deep object pose estimation for semantic robotic grasping of household objects. arXiv preprint arXiv:1809.10790
Ugurlu H, Kalkan S, Saranli A (2021) Reinforcement learning versus conventional control for controlling a planar bi-rotor platform with tail appendage. J Intell Robot Syst. https://doi.org/10.1007/s10846-021-01412-3
Van Hasselt H, Guez A, Silver D (2015) Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461
Vasquez-Gomez JI, Troncoso D, Becerra I, Sucar E, Murrieta-Cid R (2021) Next-best-view regression using a 3D convolutional neural network. Mach Vis Appl 32(42):1–14. https://doi.org/10.1007/s00138-020-01166-2
Wang H, Yeung DY (2020) A survey on Bayesian deep learning. ACM Comput Surv 53(5):1–37
Wang Z, Chen C, Li HX, Dong D, Tarn TJ (2019) Incremental reinforcement learning with prioritized sweeping for dynamic environments. IEEE/ASME Trans Mechatron 24(2):621–632
Wang Z, Reed Garrett C, Pack Kaelbling L, Lozano-Pérez T (2021) Learning compositional models of robot skills for task and motion planning. Int J Robot Res 40(6–7):866–894. https://doi.org/10.1177/02783649211004615
Wang Z, Schaul T, Hessel M, Hasselt H, Lanctot M, Freitas N (2016) Dueling network architectures for deep reinforcement learning. In: International conference on machine learning (ICML), pp 1995–2003
Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292
Wellhausen L, Dosovitskiy A, Ranftl R, Walas K, Cadena C, Hutter M (2019) Where should I walk? Predicting terrain properties from images via self-supervised learning. IEEE Robot Autom Lett 4(2):1509–1516
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8(3–4):229–256
Wu C, Zeng R, Pan J, Wang CC, Liu YJ (2019) Plant phenotyping by deep-learning-based planner for multi-robots. IEEE Robot Autom Lett 4(4):3113–3120
Wu X, Sahoo D, Hoi SC (2020) Recent advances in deep learning for object detection. Neurocomputing 396:39–64
Xiang Y, Schmidt T, Narayanan V, Fox D (2017) PoseCNN: a convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199
Xu H, Gao Y, Yu F, Darrell T (2017) End-to-end learning of driving models from large-scale video datasets. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2174–2182
Yang C, Liu Y, Zell A (2021) Relative camera pose estimation using synthetic data with domain adaptation via cycle-consistent adversarial networks. J Intell Robot Syst. https://doi.org/10.1007/s10846-021-01439-6
Yarats D, Fergus R, Lazaric A, Pinto L (2021) Reinforcement learning with prototypical representations. In: International conference on machine learning (ICML)
Zhang C, Huh J, Lee DD (2018) Learning implicit sampling distributions for motion planning. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 3654–3661
Zhang J, Cheung B, Finn C, Levine S, Jayaraman D (2020) Cautious adaptation for reinforcement learning in safety-critical settings. In: International conference on machine learning. PMLR, pp 11055–11065
Zhang J, Tai L, Yun P, Xiong Y, Liu M, Boedecker J, Burgard W (2019) VR-goggles for robots: real-to-sim domain adaptation for visual control. IEEE Robot Autom Lett 4(2):1148–1155
Zhang S, Liu B, Whiteson S (2020) Per-step reward: a new perspective for risk-averse reinforcement learning. arXiv preprint arXiv:2004.10888
Zhou T, Tulsiani S, Sun W, Malik J, Efros AA (2016) View synthesis by appearance flow. In: European conference on computer vision (ECCV), pp 286–301
Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE international conference on computer vision (ICCV), pp 2223–2232
Zhu Z, Lin K, Zhou J (2020) Transfer learning in deep reinforcement learning: a survey. arXiv preprint arXiv:2009.07888
Funding
This work was supported by CONACyT Alliance of Artificial Intelligence, Cátedras-CONACyT project 745 and Intel Corporation.
Author information
Contributions
E. F. Morales and R. Murrieta-Cid contributed to the article conception and design. All authors performed the literature search, drafted, and critically revised the work.
Ethics declarations
Conflict of interest
The authors have no relevant financial or nonfinancial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Morales, E.F., Murrieta-Cid, R., Becerra, I. et al. A survey on deep learning and deep reinforcement learning in robotics with a tutorial on deep reinforcement learning. Intel Serv Robotics 14, 773–805 (2021). https://doi.org/10.1007/s11370-021-00398-z