Abstract
One of the key problems in model-based reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large relational domains, in which there is a varying number of objects and relations between them. We provide one of the first solutions to exploring large relational Markov decision processes by developing relational extensions of the concepts of the Explicit Explore or Exploit (E³) algorithm. A key insight is that the inherent generalization of learnt knowledge in the relational representation also has profound implications for the exploration strategy: what in a propositional setting would be considered a novel situation worth exploring may in the relational setting be an instance of a well-known context in which exploitation is promising. Our experimental evaluation shows the effectiveness and benefit of relational exploration over several propositional benchmark approaches on noisy 3D simulated robot manipulation problems.
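To make the contrast concrete, below is a minimal sketch of how an E³-style "known state" test changes once experience is aggregated by relational context rather than by ground state. It is not the authors' algorithm: the state encoding, the visit-count threshold, and the crude `lift` abstraction are illustrative assumptions only.

```python
from collections import defaultdict

# Assumed for illustration: a state is a set of (predicate, args) tuples,
# and a context counts as "known" after this many visits.
KNOWN_THRESHOLD = 5


def lift(state):
    """Crude relational abstraction: keep which predicates hold, drop the
    identities of the objects they mention (on(a,b) -> on(X,Y))."""
    return frozenset(pred for pred, _args in state)


class E3StyleCounter:
    """Tracks visit counts to decide explore vs. exploit, E^3-style."""

    def __init__(self, relational=True):
        self.relational = relational
        self.visits = defaultdict(int)

    def _key(self, state):
        # Propositional: every distinct ground state is a separate key.
        # Relational: ground states sharing a lifted context share one key,
        # so experience with on(a,b) also makes on(c,d) look familiar.
        return lift(state) if self.relational else frozenset(state)

    def observe(self, state):
        self.visits[self._key(state)] += 1

    def is_known(self, state):
        return self.visits[self._key(state)] >= KNOWN_THRESHOLD


# Usage: a ground state the propositional learner has never visited can
# already be "known" to the relational learner.
seen = {("on", ("a", "b")), ("clear", ("a",))}
new = {("on", ("c", "d")), ("clear", ("c",))}

rel, prop = E3StyleCounter(relational=True), E3StyleCounter(relational=False)
for _ in range(KNOWN_THRESHOLD):
    rel.observe(seen)
    prop.observe(seen)

print(rel.is_known(new))   # True: same lifted context -> exploit
print(prop.is_known(new))  # False: unseen ground state -> explore
```

In this toy setting the relational learner treats the new block configuration as an instance of a familiar context and exploits, while the propositional learner would flag it as novel and explore.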
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Lang, T., Toussaint, M., Kersting, K. (2010). Exploration in Relational Worlds. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol. 6322. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15883-4_12
DOI: https://doi.org/10.1007/978-3-642-15883-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15882-7
Online ISBN: 978-3-642-15883-4