Recursive Neural Networks for Undirected Graphs for Learning Molecular Endpoints

  • Ian Walsh
  • Alessandro Vullo
  • Gianluca Pollastri
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5780)


Accurately predicting the endpoints of chemical compounds is an important step towards drug design and molecular screening in particular.

Here we develop a recursive architecture that maps undirected graphs to individual labels, and apply it to predicting a number of different properties of small molecules. The results we obtain are generally state-of-the-art.

The final model is completely general and may be applied not only to prediction of molecular properties, but to a vast range of problems in which the input is a graph and the output is either a single property or (with small modifications) a set of properties of the nodes.
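The abstract does not describe the architecture in detail, so the following is only a rough sketch of the general idea of mapping an undirected graph to a single label. It is not the authors' model. It assumes a shared update function applied at every node, an order-independent sum over neighbour states, and a sum over all node states before the final output. The weights are random and untrained, and every name (`encode_graph`, `make_weights`, the one-hot atom encoding) is hypothetical.

```python
import math
import random


def make_weights(n_in, n_out, rng):
    """Small random weight matrix (list of rows); untrained, illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def affine_tanh(weights, vector):
    """Apply a weight matrix followed by tanh to a plain Python vector."""
    return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in weights]


def encode_graph(features, edges, w_state, w_out, steps=3):
    """Map an undirected graph to one scalar label.

    features: per-node feature vectors (hypothetical atom encoding)
    edges:    (i, j) index pairs, each treated as undirected
    """
    n = len(features)
    hidden = len(w_state)

    # Build a symmetric adjacency list so that edge direction plays no role.
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    states = [[0.0] * hidden for _ in range(n)]
    for _ in range(steps):
        new_states = []
        for v in range(n):
            # Summing neighbour states does not depend on neighbour order.
            agg = [sum(states[u][k] for u in neighbours[v]) for k in range(hidden)]
            # The same weights are shared by every node.
            new_states.append(affine_tanh(w_state, list(features[v]) + agg))
        states = new_states

    # Summing node states gives one vector for the whole graph.
    graph_vec = [sum(s[k] for s in states) for k in range(hidden)]
    return sum(w * x for w, x in zip(w_out, graph_vec))


rng = random.Random(0)
HIDDEN, N_FEAT = 4, 2  # assumed sizes; features are one-hot over [C, O]
w_state = make_weights(N_FEAT + HIDDEN, HIDDEN, rng)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]

# Heavy atoms of ethanol (C-C-O), written with two different node orderings.
y_a = encode_graph([[1, 0], [1, 0], [0, 1]], [(0, 1), (1, 2)], w_state, w_out)
y_b = encode_graph([[0, 1], [1, 0], [1, 0]], [(0, 1), (1, 2)], w_state, w_out)
print(abs(y_a - y_b) < 1e-9)
```

Because both aggregation steps are sums, the output should not depend on how the nodes are numbered, which is the property the final line checks. Returning the per-node `states` instead of their sum would correspond to the case where the output is a set of properties of the nodes.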


Keywords: Support Vector Machine, Root Mean Square Error, Undirected Graph, Aqueous Solubility, Multi Layer Perceptron
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Ian Walsh (1)
  • Alessandro Vullo (1)
  • Gianluca Pollastri (1)
  1. School of Computer Science and Informatics and Complex and Adaptive Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland
