# Graph kernels and Gaussian processes for relational reinforcement learning

## Abstract

RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so called Q(uality)-value has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian processes in a relational setting we propose graph kernels as a covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion which can become unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization. This leads to better and more stable performance of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance based regression as a generalization algorithm for RRL.

### Keywords

Reinforcement learning Relational learning Graph kernels Gaussian processes### References

- Barnett, S. (1979).
*Matrix Methods for Engineers and Scientists*. MacGraw-Hill.Google Scholar - Cortes, C., Haffner, P., & Mohri, M. (2003). Positive Definite Rational Kernels. In
*Proceedings of the 16th Annual Conference on Computational Learning Theory and the 7th Kernel Workshop*.Google Scholar - Dearden, R., Friedman, N., & Russell, S. (1998). Bayesian Q-learning. In
*Proceedings of AAAI-98/IAAI-98*, (pp. 761–768).Google Scholar - Demaine, E., Hohenberger, S., & Liben-Nowell, D. (2002). Tetris is Hard, Even to Approximate. Technical Report MIT-LCS-TR-865, Massachussets Institue of Technology, Boston.Google Scholar
- Deshpande, M., Kuramochi, M., & Karypis, G. (2002). Automated Approaches for Classifying Structures. In
*Proceedings of the 2nd ACM SIGKDD Workshop on Data Mining in Bioinformatics*.Google Scholar - Diestel, R. (2000).
*Graph Theory*. Springer-Verlag.Google Scholar - Dietterich, T., & Wang, X. (2002). Batch value function approximation via support vectors. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.),
*Advances in Neural Information Processing Systems*, vol. 14, Cambridge, MA, The MIT Press.Google Scholar - Driessens, K., & Džeroski, S. (2002). Integrating experimentation and guidance in relational reinforcement learning. In C. Sammut, & A. Hoffmann (Eds.),
*Proceedings of the Nineteenth International Conference on Machine Learning*(pp. 115–122). Morgan Kaufmann Publishers, Inc.Google Scholar - Driessens, K., & Džeroski, S. (2004). Integrating guidance into relational reinforcement learning.
*Machine Learning*,*57*, 271–304.MATHCrossRefGoogle Scholar - Driessens, K., & Ramon, J. (2003). Relational instance based regression for relational reinforcement learning. In
*Proceedings of the Twentieth International Conference on Machine Learning*(pp. 123–130). AAAI Press.Google Scholar - Driessens, K., Ramon, J., & Blockeel, H. (2001). Speeding up Relational Reinforcement Learning Through the Use of an Incremental First Order Decision Tree Learner. In L. De Raedt, & P. Flach (Eds.),
*Proceedings of the 13th European Conference on Machine Learning*, vol. 2167 of*Lecture Notes in Artificial Intelligence*(pp. 97–108). Springer-Verlag.Google Scholar - Džeroski, S., De Raedt, L., & Blockeel, H. (1998). Relational reinforcement Learning. In
*Proceedings of the 15th International Conference on Machine Learning*(pp. 136–143). Morgan Kaufmann.Google Scholar - Engel, Y., Mannor, S., & Meir, R. (2003). Bayes meets Bellman: The gaussian process approach to temporal difference learning. In
*Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003)*(pp. 154–161). Morgan Kaufmann.Google Scholar - Gärtner, T. (2002). Exponential and Geometric Kernels for Graphs. In
*NIPS Workshop on Unreal Data: Principles of Modeling Nonvectorial Data*.Google Scholar - Gärtner, T. (2003). A survey of kernels for structured data.
*SIGKDD Explorations*,*5*(1), 49–58.Google Scholar - Gärtner, T., Driessens, K., & Ramon, J. (2003a). Graph kernels and Gaussian processes for relational reinforcement learning. In
*Inductive Logic Programming, 13th International Conference, ILP 2003, Proceedings*, vol. 2835 of*Lecture Notes in Computer Science*(pp. 146–163). Springer.Google Scholar - Gärtner, T., Flach, P., & Wrobel, S. (2003b). On graph kernels: Hardness Results and Efficient Alternatives. In M. W. B. Schölkopf (Ed.),
*Proceedings of the 16th Annual Conference on Computational Learning Theory and the 7th Kernel Workshop*(129–143).Google Scholar - Gibbs, M. (1997). Bayesian Gaussian Processes for Regression and Classification. Ph.D. thesis, University of Cambridge.Google Scholar
- Golub, G. H., & Van Loan, C. F. (1996).
*Matrix computations*. Johns Hopkins Series in the Mathematical Sciences. The Johns Hopkins University Press.Google Scholar - Graepel, T. (2002). PAC-Bayesian Pattern Classification with Kernels. Ph.D. thesis, TU Berlin.Google Scholar
- Horvath, T., Gärtner, T., & Wrobel, S. (2004). Cyclic Pattern Kernels for Predictive Graph Mining. In
*Proceedings of the International Conference on Knowledge Discovery and Data Mining*.Google Scholar - Imrich, W., & Klavžar, S. (2000).
*Product Graphs: Structure and Recognition*. John Wiley.Google Scholar - Kaelbling, L., Littman, M., & Moore, A. (1996). Reinforcement learning: A survey.
*Journal of Artificial Intelligence Research*,*4*, 237–285.Google Scholar - Kashima, H., & Inokuchi, A. (2002). Kernels for Graph Classification. In
*ICDM Workshop on Active Mining*.Google Scholar - Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels Between Labeled Graphs. In
*Proceedings of the 20th International Conference on Machine Learning*.Google Scholar - Korte, B., & Vygen, J. (2002).
*Combinatorial Optimization: Theory and Algorithms*. Springer-Verlag.Google Scholar - Kuramochi, M., & Karypis, G. (2001). Frequent subgraph discovery. In
*Proceedings of the IEEE International Conference on Data Mining*.Google Scholar - MacKay, D. (1997a) Introduction to Gaussian processes. Aavailable at http://wol.ra.phy.cam.ac.uk/mackay.
- MacKay, D. J. C. (1997b). Introduction to Gaussian processes. Available at http://wol.ra.phy.cam.ac.uk/mackay.
- Mitchell, T. (1997).
*Machine Learning*. McGraw-Hill.Google Scholar - Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning.
*Machine Learning*,*49*, 161–178.MATHCrossRefGoogle Scholar - Rasmussen, C. E., & Kuss, M. (2004). Gaussian Processes in Reinforcement Learning. In
*Advances in Neural Information Processing Systems*, vol. 16. MIT Press.Google Scholar - Rifkin, R. M. (2002). Everything old is new again: A fresh Look at Historical Approaches to Machine Learning. Ph.D. thesis, MIT.Google Scholar
- Saunders, C., Gammerman, A., & Vovk, v. (1998). Ridge Regression Learning Algorithm in Dual Variables. In
*Proceedings of the Fifteenth International Conference on Machine Learning*. Morgan Kaufmann.Google Scholar - Schaal, S., Atkeson, C. G., & Vijayakumar, S. (2000). Real-Time Robot Learning with Locally Weighted Statistical Learning. In
*Proceedings of the IEEE International Conference on Robotics and Automation*(pp. 288–293). IEEE Press, Piscataway, N.J.Google Scholar - Schölkopf, B., & Smola, A. J. (2002).
*Learning with kernels*. MIT Press.Google Scholar - Smart, W. D., & Kaelbling, L. P. (2000). Practical Reinforcement Learning in Continuous Spaces. In
*Proceedings of the 17th International Conference on Machine Learning*(pp. 903–910). Morgan Kaufmann.Google Scholar - Sutton, R., & Barto, A. (1998).
*Reinforcement Learning: An introduction*. Cambridge, MA: The MIT Press.Google Scholar - Vapnik, V. (1995).
*The Nature of Statistical Learning Theory*. Springer-Verlag.Google Scholar - Watkins, C. (1989). Learning from Delayed Rewards. Ph.D. thesis, King’s College, Cambridge.Google Scholar