Machine Learning

, Volume 64, Issue 1–3, pp 91–119 | Cite as

Graph kernels and Gaussian processes for relational reinforcement learning

  • Kurt Driessens
  • Jan Ramon
  • Thomas Gärtner


RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so called Q(uality)-value has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian processes in a relational setting we propose graph kernels as a covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion which can become unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization. This leads to better and more stable performance of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance based regression as a generalization algorithm for RRL.


Reinforcement learning Relational learning Graph kernels Gaussian processes 


  1. Barnett, S. (1979). Matrix Methods for Engineers and Scientists. MacGraw-Hill.Google Scholar
  2. Cortes, C., Haffner, P., & Mohri, M. (2003). Positive Definite Rational Kernels. In Proceedings of the 16th Annual Conference on Computational Learning Theory and the 7th Kernel Workshop.Google Scholar
  3. Dearden, R., Friedman, N., & Russell, S. (1998). Bayesian Q-learning. In Proceedings of AAAI-98/IAAI-98, (pp. 761–768).Google Scholar
  4. Demaine, E., Hohenberger, S., & Liben-Nowell, D. (2002). Tetris is Hard, Even to Approximate. Technical Report MIT-LCS-TR-865, Massachussets Institue of Technology, Boston.Google Scholar
  5. Deshpande, M., Kuramochi, M., & Karypis, G. (2002). Automated Approaches for Classifying Structures. In Proceedings of the 2nd ACM SIGKDD Workshop on Data Mining in Bioinformatics.Google Scholar
  6. Diestel, R. (2000). Graph Theory. Springer-Verlag.Google Scholar
  7. Dietterich, T., & Wang, X. (2002). Batch value function approximation via support vectors. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems, vol. 14, Cambridge, MA, The MIT Press.Google Scholar
  8. Driessens, K., & Džeroski, S. (2002). Integrating experimentation and guidance in relational reinforcement learning. In C. Sammut, & A. Hoffmann (Eds.), Proceedings of the Nineteenth International Conference on Machine Learning (pp. 115–122). Morgan Kaufmann Publishers, Inc.Google Scholar
  9. Driessens, K., & Džeroski, S. (2004). Integrating guidance into relational reinforcement learning. Machine Learning, 57, 271–304.MATHCrossRefGoogle Scholar
  10. Driessens, K., & Ramon, J. (2003). Relational instance based regression for relational reinforcement learning. In Proceedings of the Twentieth International Conference on Machine Learning (pp. 123–130). AAAI Press.Google Scholar
  11. Driessens, K., Ramon, J., & Blockeel, H. (2001). Speeding up Relational Reinforcement Learning Through the Use of an Incremental First Order Decision Tree Learner. In L. De Raedt, & P. Flach (Eds.), Proceedings of the 13th European Conference on Machine Learning, vol. 2167 of Lecture Notes in Artificial Intelligence (pp. 97–108). Springer-Verlag.Google Scholar
  12. Džeroski, S., De Raedt, L., & Blockeel, H. (1998). Relational reinforcement Learning. In Proceedings of the 15th International Conference on Machine Learning (pp. 136–143). Morgan Kaufmann.Google Scholar
  13. Engel, Y., Mannor, S., & Meir, R. (2003). Bayes meets Bellman: The gaussian process approach to temporal difference learning. In Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003) (pp. 154–161). Morgan Kaufmann.Google Scholar
  14. Gärtner, T. (2002). Exponential and Geometric Kernels for Graphs. In NIPS Workshop on Unreal Data: Principles of Modeling Nonvectorial Data.Google Scholar
  15. Gärtner, T. (2003). A survey of kernels for structured data. SIGKDD Explorations, 5(1), 49–58.Google Scholar
  16. Gärtner, T., Driessens, K., & Ramon, J. (2003a). Graph kernels and Gaussian processes for relational reinforcement learning. In Inductive Logic Programming, 13th International Conference, ILP 2003, Proceedings, vol. 2835 of Lecture Notes in Computer Science (pp. 146–163). Springer.Google Scholar
  17. Gärtner, T., Flach, P., & Wrobel, S. (2003b). On graph kernels: Hardness Results and Efficient Alternatives. In M. W. B. Schölkopf (Ed.), Proceedings of the 16th Annual Conference on Computational Learning Theory and the 7th Kernel Workshop (129–143).Google Scholar
  18. Gibbs, M. (1997). Bayesian Gaussian Processes for Regression and Classification. Ph.D. thesis, University of Cambridge.Google Scholar
  19. Golub, G. H., & Van Loan, C. F. (1996). Matrix computations. Johns Hopkins Series in the Mathematical Sciences. The Johns Hopkins University Press.Google Scholar
  20. Graepel, T. (2002). PAC-Bayesian Pattern Classification with Kernels. Ph.D. thesis, TU Berlin.Google Scholar
  21. Horvath, T., Gärtner, T., & Wrobel, S. (2004). Cyclic Pattern Kernels for Predictive Graph Mining. In Proceedings of the International Conference on Knowledge Discovery and Data Mining.Google Scholar
  22. Imrich, W., & Klavžar, S. (2000). Product Graphs: Structure and Recognition. John Wiley.Google Scholar
  23. Kaelbling, L., Littman, M., & Moore, A. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.Google Scholar
  24. Kashima, H., & Inokuchi, A. (2002). Kernels for Graph Classification. In ICDM Workshop on Active Mining.Google Scholar
  25. Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels Between Labeled Graphs. In Proceedings of the 20th International Conference on Machine Learning.Google Scholar
  26. Korte, B., & Vygen, J. (2002). Combinatorial Optimization: Theory and Algorithms. Springer-Verlag.Google Scholar
  27. Kuramochi, M., & Karypis, G. (2001). Frequent subgraph discovery. In Proceedings of the IEEE International Conference on Data Mining.Google Scholar
  28. MacKay, D. (1997a) Introduction to Gaussian processes. Aavailable at
  29. MacKay, D. J. C. (1997b). Introduction to Gaussian processes. Available at
  30. Mitchell, T. (1997). Machine Learning. McGraw-Hill.Google Scholar
  31. Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49, 161–178.MATHCrossRefGoogle Scholar
  32. Rasmussen, C. E., & Kuss, M. (2004). Gaussian Processes in Reinforcement Learning. In Advances in Neural Information Processing Systems, vol. 16. MIT Press.Google Scholar
  33. Rifkin, R. M. (2002). Everything old is new again: A fresh Look at Historical Approaches to Machine Learning. Ph.D. thesis, MIT.Google Scholar
  34. Saunders, C., Gammerman, A., & Vovk, v. (1998). Ridge Regression Learning Algorithm in Dual Variables. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann.Google Scholar
  35. Schaal, S., Atkeson, C. G., & Vijayakumar, S. (2000). Real-Time Robot Learning with Locally Weighted Statistical Learning. In Proceedings of the IEEE International Conference on Robotics and Automation (pp. 288–293). IEEE Press, Piscataway, N.J.Google Scholar
  36. Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. MIT Press.Google Scholar
  37. Smart, W. D., & Kaelbling, L. P. (2000). Practical Reinforcement Learning in Continuous Spaces. In Proceedings of the 17th International Conference on Machine Learning (pp. 903–910). Morgan Kaufmann.Google Scholar
  38. Sutton, R., & Barto, A. (1998). Reinforcement Learning: An introduction. Cambridge, MA: The MIT Press.Google Scholar
  39. Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag.Google Scholar
  40. Watkins, C. (1989). Learning from Delayed Rewards. Ph.D. thesis, King’s College, Cambridge.Google Scholar

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. 1.Department of Computer ScienceUniversity of WaikatoHamiltonNew Zealand
  2. 2.Department of Computer ScienceKatholieke Universiteit LeuvenHeverleeBelgium
  3. 3.Knowledge Discovery, Fraunhofer Institut Autonome Intelligente Systeme, Schloss BirlinghovenSankt AugustinGermany

Personalised recommendations