Market-Based Reinforcement Learning in Partially Observable Worlds
Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov decision processes (POMDPs), where an agent must learn short-term memories of relevant previous events in order to act optimally. Most previous work, however, has focused on reactive settings (MDPs) rather than POMDPs. Here we reimplement a recent approach to market-based RL and, for the first time, evaluate it in a toy POMDP setting.
Keywords: Reinforcement Learning · Memory Register · Observable Markov Decision Process · Observable Environment · Bucket Brigade
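To make the setting concrete, the sketch below shows one way a market of rules can solve a POMDP that needs one bit of memory. It is not the paper's implementation: it assumes a Holland-style bucket brigade over condition-action rules and a one-bit memory register (both suggested by the keywords above), and the toy task, rule representation, and all parameter values are illustrative assumptions.

```python
import random

# Toy POMDP ("remember the flag"; an assumed task, not the paper's benchmark):
# at t=0 the agent observes a flag bit; at t=1 it observes only "blank" and is
# rewarded iff its action equals the flag. Acting optimally therefore requires
# one bit of short-term memory.

BID_RATIO = 0.1          # fraction of strength a rule bids (assumed value)
OBS = ["flag0", "flag1", "blank"]
ACTIONS = [0, 1]
MEMS = [0, 1]            # possible contents of the 1-bit memory register

class Rule:
    """Condition -> action rule that also writes the memory register."""
    def __init__(self, obs, mem, action, new_mem):
        self.cond = (obs, mem)   # matches an (observation, memory) pair
        self.action = action     # external action
        self.new_mem = new_mem   # value written to the memory register
        self.strength = 10.0     # assumed initial strength

    def noisy_bid(self):
        # Gaussian noise breaks ties between equally strong rules.
        return BID_RATIO * self.strength + random.gauss(0.0, 0.01)

# Enumerate one rule per (observation, memory, action, memory-write) combo.
rules = [Rule(o, m, a, nm)
         for o in OBS for m in MEMS for a in ACTIONS for nm in MEMS]

def episode():
    flag = random.randint(0, 1)
    mem = 0
    prev_winner = None
    winner = None
    for obs in (f"flag{flag}", "blank"):
        matching = [r for r in rules if r.cond == (obs, mem)]
        winner = max(matching, key=Rule.noisy_bid)  # auction among matches
        bid = BID_RATIO * winner.strength   # the deterministic bid is paid
        winner.strength -= bid              # winner pays its bid ...
        if prev_winner is not None:
            prev_winner.strength += bid     # ... to the previous winner
        mem = winner.new_mem                # rule writes the memory register
        prev_winner = winner
    reward = 1.0 if winner.action == flag else 0.0
    winner.strength += reward               # environment pays the last winner
    return reward

random.seed(0)
for epoch in range(50):
    avg = sum(episode() for _ in range(200)) / 200
print(f"average reward in final epoch: {avg:.2f}")
```

Credit flows backward through the bid chain: the rule acting at t=1 collects the environmental reward and pays its bid to the rule that acted at t=0, so chains in which the first winner writes the flag into the register and the second winner reads it back tend to grow in strength and outbid competitors. This sketch makes no convergence guarantees.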