Machine Learning

, Volume 75, Issue 1, pp 3–35 | Cite as

Graph kernels based on tree patterns for molecules

Open Access
Article

Abstract

Motivated by chemical applications, we revisit and extend a family of positive definite kernels for graphs based on the detection of common subtrees, initially proposed by Ramon and Gärtner (Proceedings of the first international workshop on mining graphs, trees and sequences, pp. 65–74, 2003). We propose new kernels with a parameter to control the complexity of the subtrees used as features to represent the graphs. This parameter allows to smoothly interpolate between classical graph kernels based on the count of common walks, on the one hand, and kernels that emphasize the detection of large common subtrees, on the other hand. We also propose two modular extensions to this formulation. The first extension increases the number of subtrees that define the feature space, and the second one removes noisy features from the graph representations. We validate experimentally these new kernels on problems of toxicity and anti-cancer activity prediction for small molecules with support vector machines.

Keywords

Graph kernels Support vector machines Chemoinformatics 

References

  1. Borgwardt, K. M., Ong, C. S., Schönauer, S., Vishwanathan, S. V. N., Smola, A. J., & Kriegel, H.-P. (2005). Protein function prediction via graph kernels. Bioinformatics, 21(Suppl. 1), i47–i56. CrossRefGoogle Scholar
  2. Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Google Scholar
  3. Chen, J., Swamidass, S. J., Dou, Y., Bruand, J., & Baldi, P. (2005). ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics, 21(22), 4133–4139. CrossRefGoogle Scholar
  4. Deshpande, M., Kuramochi, M., Wale, N., & Karypis, G. (2005). Frequent substructure-based approaches for classifying chemical compounds. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1036–1050. CrossRefGoogle Scholar
  5. Gärtner, T., Flach, P., & Wrobel, S. (2003). On graph kernels: hardness results and efficient alternatives. In B. Schölkopf & M. Warmuth (Eds.), Lecture notes in computer science : Vol. 2777. Proceedings of the sixteenth annual conference on computational learning theory and the seventh annual workshop on kernel machines (pp. 129–143). Heidelberg: Springer. Google Scholar
  6. Helma, C., Cramer, T., Kramer, S., & De Raedt, L. (2004). Data mining and machine learning techniques for the identification of mutagenicity inducing substructures and structure activity relationships of noncongeneric compounds. Journal of Chemical Information and Computer Sciences, 44(4), 1402–1411. Google Scholar
  7. Horváth, T., Gärtner, T., & Wrobel, S. (2004). Cyclic pattern kernels for predictive graph mining. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 158–167). New York: ACM Press. CrossRefGoogle Scholar
  8. Inokuchi, A., Washio, T., & Motoda, H. (2003). Complete mining of frequent patterns from graphs: mining graph data. Machine Learning, 50(3), 321–354. MATHCrossRefGoogle Scholar
  9. Karklin, Y., Meraz, R. F., & Holbrook, S. R. (2005). Classification of non-coding RNA using graph representations of secondary structure. In Pacific symposium on biocomputing (pp. 4–15). Google Scholar
  10. Kashima, H., Tsuda, K., & Inokuchi, A. (2004). Kernels for graphs. In B. Schölkopf, K. Tsuda, & J. P. Vert (Eds.), Kernel methods in computational biology (pp. 155–170). Cambridge: MIT Press. Google Scholar
  11. King, R. D., Muggleton, S. H., Srinivasan, A., & Sternberg, M. J. (1996). Structure-activity relationships derived by machine learning: the use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proceedings of the National Academy of Sciences of the United States of America, 93(1), 438–442. CrossRefGoogle Scholar
  12. Leach, A. R., & Gillet, V. J. (2003). An introduction to chemoinformatics. Dordrecht: Kluwer Academic. Google Scholar
  13. Leslie, C., Eskin, E., & Noble, W. S. (2002). The spectrum kernel: a string kernel for SVM protein classification. In R. B. Altman, A. K. Dunker, L. Hunter, K. Lauerdale, & T. E. Klein (Eds.), Proceedings of the Pacific symposium on biocomputing 2002 (pp. 564–575). Singapore: World Scientific. Google Scholar
  14. Mahé, P., Ueda, N., Akutsu, T., Perret, J.-L., & Vert, J.-P. (2005). Graph kernels for molecular structure-activity relationship analysis with support vector machines. Journal of Chemical Information and Modeling, 45(4), 939–951. CrossRefGoogle Scholar
  15. Menchetti, S., Costa, F., & Frasconi, P. (2005). Weighted decomposition kernels. In L. De Raedt & S. Wrobel (Eds.), Proceedings of the twenty-second international conference on machine learning (ICML 2005) (pp. 585–592). New York: Assoc. Comput. Mach. Google Scholar
  16. Ralaivola, L., Swamidass, S. J., Saigo, H., & Baldi, P. (2005). Graph kernels for chemical informatics. Neural Networks, 18(8), 1093–1110. CrossRefGoogle Scholar
  17. Ramon, J., & Gärtner, T. (2003). Expressivity versus efficiency of graph kernels. In T. Washio & L. De Raedt (Eds.), Proceedings of the first international workshop on mining graphs, trees and sequences (pp. 65–74). Google Scholar
  18. Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: support vector machines, regularization, optimization, and beyond. Cambridge: MIT Press. Google Scholar
  19. Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press. Google Scholar
  20. Swamidass, S. J., Chen, J., Bruand, J., Phung, P., Ralaivola, L., & Baldi, P. (2005). Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity. Bioinformatics, 21(Suppl. 1), i359–i368. CrossRefGoogle Scholar

Copyright information

© The Author(s) 2008

Authors and Affiliations

  1. 1.Centre for Computational BiologyEcole des Mines de Paris—ParisTechFontainebleauFrance
  2. 2.Institut CurieParisFrance
  3. 3.INSERM, U900ParisFrance

Personalised recommendations