
Behavior Adaptation by Means of Reinforcement Learning

Chapter in Marine Robot Autonomy

Abstract

Machine learning techniques can be used to solve the action-decision problem that most autonomous robots face when operating in unknown and changing environments. Reinforcement learning (RL) offers the possibility of learning a state-action policy that solves a particular task without any previous experience. A reinforcement function, designed by a human operator, is the only information required to determine, after some experimentation, how to solve the task. This chapter proposes the use of RL algorithms to learn reactive AUV behaviors, so that the state-action mapping that solves the task does not have to be defined by hand. The algorithms find the policy that optimizes the task and adapt to whatever environment dynamics are encountered. The advantage of the approach is that the same algorithms can be applied to a range of tasks, provided the problem is correctly sensed and defined. The two main methodologies applied in RL-based robot learning over the past two decades, value-function methods and policy gradient methods, are presented in this chapter and evaluated in two AUV tasks. In both cases, a well-known theoretical algorithm has been modified to fulfill the requirements of the AUV task and has been applied with a real AUV. Results show the effectiveness of both approaches, each with its own advantages and disadvantages, and motivate further investigation of these methods so that AUVs can perform more robustly and adaptively in future applications.
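To give a rough picture of the value-function side of this approach, the sketch below implements tabular Q-learning with epsilon-greedy exploration: a reward signal is the only task-specific input, and the learned Q-table yields the state-action policy. This is only a minimal illustration, not the chapter's modified algorithms for the real AUV; the environment functions (env_reset, env_step), the action set, and the reward are hypothetical placeholders to be supplied by the reader.

```python
# Minimal tabular Q-learning sketch (a value-function RL method).
# NOTE: illustrative only; env_reset/env_step and the action set are
# hypothetical placeholders, not the chapter's AUV implementation.

import random
from collections import defaultdict

def q_learning(env_reset, env_step, actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn Q(s, a) by temporal-difference updates.

    env_reset() -> initial state (hashable)
    env_step(state, action) -> (next_state, reward, done)
    """
    Q = defaultdict(float)

    for _ in range(episodes):
        state = env_reset()
        done = False
        while not done:
            # Epsilon-greedy exploration: mostly exploit, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env_step(state, action)

            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
            best_next = max(Q[(next_state, a)] for a in actions)
            target = reward + (0.0 if done else gamma * best_next)
            Q[(state, action)] += alpha * (target - Q[(state, action)])

            state = next_state

    # The greedy policy derived from Q is the learned state-action mapping.
    return {s: max(actions, key=lambda a: Q[(s, a)])
            for s in {s for (s, _) in Q}}
```

Policy gradient methods, the second family evaluated in the chapter, instead adjust the parameters of the policy directly along an estimated gradient of the expected reward, which avoids maintaining a value table or function approximator over the full state-action space.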



Acknowledgements

This research was sponsored by the Spanish government (DPI2008-06548-C03-03, DPI2011-27977-C03-02) and the PANDORA EU FP7-Project under the grant agreement No: ICT-288273.

Author information

Correspondence to Marc Carreras.


Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Carreras, M., El-fakdi, A., Ridao, P. (2013). Behavior Adaptation by Means of Reinforcement Learning. In: Seto, M. (eds) Marine Robot Autonomy. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5659-9_7

  • DOI: https://doi.org/10.1007/978-1-4614-5659-9_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-5658-2

  • Online ISBN: 978-1-4614-5659-9

