Abstract
Reinforcement Learning (RL) in either fully or partially observable domains usually places a requirement on the knowledge representation for learning to be sound: the underlying stochastic process must be Markovian. In many applications, including those involving interactions between multiple agents (e.g., humans and robots), sources of uncertainty affect rewards and transition dynamics in such a way that a Markovian representation would be computationally very expensive. An alternative formulation of the decision problem relies on partially specified behaviors with choice points. While this reduces the complexity of the policy space that must be explored, which is crucial for realistic autonomous agents that must bound their search time, it renders the domain Non-Markovian. In this paper, we present a novel algorithm for reinforcement learning in Non-Markovian domains. Our algorithm, Stochastic Search Monte Carlo, performs a global stochastic search in policy space, shaping the distribution from which the next policy is sampled by estimating an upper bound on the value of each action. We show experimentally that, in domains that are challenging for RL, high-level decisions in Non-Markovian processes can produce behavior at least as good as that learned by traditional algorithms, while requiring significantly fewer samples.
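To make the mechanism described above concrete, the following is a minimal sketch (not the implementation from the paper) of a global stochastic search over policies defined by choice points: each candidate policy is drawn from a distribution shaped by an optimistic, UCB-style upper bound on the Monte Carlo return observed for each action, the policy is evaluated by a single rollout, and the statistics of the chosen actions are updated. The function names, the softmax shaping, and the particular confidence bound are illustrative assumptions, not details taken from the paper.

# Illustrative sketch only: global stochastic search in policy space,
# with the sampling distribution over actions at each choice point
# shaped by an optimistic (UCB-style) upper bound on observed returns.
import math
import random

def sample_policy(choice_points, stats, episode):
    """Draw one deterministic policy: at every choice point, favour actions
    whose upper-bound estimate of the return is high."""
    policy = {}
    for cp, actions in choice_points.items():
        bounds = []
        for a in actions:
            n, mean = stats[(cp, a)]
            if n == 0:
                bounds.append(None)  # unexplored: treat as maximally optimistic
            else:
                bounds.append(mean + math.sqrt(2.0 * math.log(episode + 1) / n))
        if any(b is None for b in bounds):
            policy[cp] = random.choice([a for a, b in zip(actions, bounds) if b is None])
        else:
            # softmax over the upper bounds shapes the sampling distribution
            m = max(bounds)
            weights = [math.exp(b - m) for b in bounds]
            policy[cp] = random.choices(actions, weights=weights, k=1)[0]
    return policy

def run_search(choice_points, evaluate, n_episodes=500):
    """evaluate(policy) must return the scalar return of one Monte Carlo rollout."""
    stats = {(cp, a): (0, 0.0) for cp, acts in choice_points.items() for a in acts}
    best_policy, best_return = None, float("-inf")
    for episode in range(n_episodes):
        policy = sample_policy(choice_points, stats, episode)
        ret = evaluate(policy)
        # credit the episode return to every action the policy selected
        for cp, a in policy.items():
            n, mean = stats[(cp, a)]
            stats[(cp, a)] = (n + 1, mean + (ret - mean) / (n + 1))
        if ret > best_return:
            best_policy, best_return = policy, ret
    return best_policy, best_return

In this sketch a caller would supply choice_points (e.g., {'kickoff': ['pass', 'dribble']}) and an evaluate function that executes one episode of the partially specified behavior with the chosen actions and returns its total reward; both are hypothetical placeholders for whatever the agent's plan and environment provide.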
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Leonetti, M., Iocchi, L., Ramamoorthy, S. (2011). Reinforcement Learning through Global Stochastic Search in N-MDPs. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol. 6912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23783-6_21