Abstract
The use of robots in society could be expanded by using reinforcement learning (RL) to allow robots to learn and adapt to new situations on-line. RL is a paradigm for learning sequential decision-making tasks, usually formulated as a Markov Decision Process (MDP). For an RL algorithm to be practical for robotic control, it must learn from very few samples while taking actions continually in real-time. In addition, it must learn efficiently in the face of noise, sensor and actuator delays, and continuous state features. In this paper, we present the TEXPLORE ROS code release, which contains TEXPLORE, the first algorithm to address all of these challenges together. We demonstrate TEXPLORE learning to control the velocity of an autonomous vehicle in real-time. TEXPLORE has been released as an open-source ROS repository, enabling learning on a variety of robot tasks.
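To make the RL/MDP loop described above concrete, the sketch below shows a generic tabular Q-learning agent acting in a toy MDP. This is not the TEXPLORE algorithm itself (TEXPLORE is model-based and learns random-forest models of the domain); the class and the toy two-state task here are purely illustrative assumptions, showing only the basic paradigm of choosing actions, receiving rewards, and updating value estimates.

```python
import random

random.seed(0)  # deterministic run for the toy example below

class QLearningAgent:
    """Minimal tabular Q-learning agent (illustration only, not TEXPLORE)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = {}              # (state, action) -> estimated value
        self.actions = actions
        self.alpha = alpha       # learning rate
        self.gamma = gamma       # discount factor
        self.epsilon = epsilon   # exploration probability

    def choose(self, state):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        # one-step temporal-difference (Q-learning) update
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)

# Toy single-state MDP: action 1 yields reward 1, action 0 yields 0.
agent = QLearningAgent(actions=[0, 1])
for _ in range(200):
    a = agent.choose(0)
    r = 1.0 if a == 1 else 0.0
    agent.update(0, a, r, 0)
```

After a few hundred interactions the agent's value estimate for the rewarding action dominates, illustrating learning purely from sampled experience. The sample-efficiency and real-time constraints stressed in the abstract are exactly what such a naive tabular learner lacks on a physical robot.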
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Hester, T., Stone, P. (2014). The Open-Source TEXPLORE Code Release for Reinforcement Learning on Robots. In: Behnke, S., Veloso, M., Visser, A., Xiong, R. (eds) RoboCup 2013: Robot World Cup XVII. RoboCup 2013. Lecture Notes in Computer Science(), vol 8371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44468-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44467-2
Online ISBN: 978-3-662-44468-9
eBook Packages: Computer Science (R0)