Journal of Computer-Aided Molecular Design

Volume 23, Issue 7, pp 419–429

Machine learning of chemical reactivity from databases of organic reactions

  • Gonçalo V. S. M. Carrera
  • Sunil Gupta
  • João Aires-de-Sousa (corresponding author)


Databases of chemical reactions contain knowledge about the reactivity of specific reagents. Although information is generally available explicitly only for compounds reported to react, it is possible to derive information about substructures that do not react in the reported reactions. Both types of information (positive and negative) can be used to train machine learning techniques to predict whether a compound reacts with a specific reagent. The whole process was implemented with two databases of reactions, one involving BuNH₂ as the reagent and the other NaCNBH₃. Negative information was derived using MOLMAP molecular descriptors, and classification models were developed with Random Forests, also based on MOLMAP descriptors. The MOLMAP descriptors were based exclusively on calculated physicochemical features of molecules. Independent test sets were predicted with ∼90% accuracy. While NaCNBH₃ is a selective reducing reagent widely used in organic synthesis, BuNH₂ is a nucleophile that mimics the reactivity of the lysine side chain, which is involved in an initiating step of the mechanism leading to skin sensitization.
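The MOLMAP idea behind these descriptors can be illustrated with a minimal sketch. It assumes a Kohonen self-organizing map that has already been trained on bond property vectors (the toy weights in the example are invented), assigns each bond of a molecule to its winning neuron, and accumulates the hits on the map. Molecules of any size therefore yield a vector of the same length, which is what a classifier such as a Random Forest needs as input. The function names, the 0.3 neighbour weight and the clipping at the map edges are illustrative assumptions, not necessarily the authors' exact settings.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two property vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def winning_neuron(weights: List[List[Vector]], bond: Vector) -> Tuple[int, int]:
    """Return the (row, col) of the SOM neuron whose weights are closest to the bond."""
    best = (0, 0)
    best_d = math.inf
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = squared_distance(w, bond)
            if d < best_d:
                best_d, best = d, (i, j)
    return best


def molmap_descriptor(weights: List[List[Vector]], bonds: List[Vector],
                      neighbour_weight: float = 0.3) -> List[float]:
    """Hypothetical MOLMAP-style encoding of one molecule.

    Each bond (a vector of calculated physicochemical properties) adds 1.0 to its
    winning neuron and `neighbour_weight` to the up to eight surrounding neurons.
    Assumption: neighbours outside the map are ignored (no toroidal wrap-around).
    The flattened map is a fixed-length descriptor, independent of molecule size.
    """
    rows, cols = len(weights), len(weights[0])
    grid = [[0.0] * cols for _ in range(rows)]
    for bond in bonds:
        wi, wj = winning_neuron(weights, bond)
        grid[wi][wj] += 1.0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if di == 0 and dj == 0:
                    continue
                ni, nj = wi + di, wj + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    grid[ni][nj] += neighbour_weight
    return [value for row in grid for value in row]


if __name__ == "__main__":
    # Toy 2x2 map over 2-dimensional bond properties (invented numbers).
    som = [[(0.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (1.0, 1.0)]]
    molecule = [(0.1, 0.1), (0.9, 0.95)]
    print(molmap_descriptor(som, molecule))
```

In this toy case the two bonds land on opposite corners of the map, so every neuron also collects neighbour contributions; in a realistic setting the map would be much larger and the resulting vectors, one per molecule, would be passed to the classifier.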


Keywords: MOLMAP · Chemical reactivity · Databases · Machine learning · Electrophilicity



Abbreviations

MOLMAP: MOLecular maps of atom-level properties
RF: Random forest
VOC: Volatile organic compounds
QSAR: Quantitative structure–activity relationship
OOB: Out-of-bag
SVM: Support vector machines
ROC: Receiver operating characteristic
SOM: Self-organizing maps
HTS: High-throughput screening



G.C. and S.G. acknowledge Fundação para a Ciência e Tecnologia (Lisbon, Portugal) for financial support under grants SFRH/BD/18354/2004 and SFRH/BPD/14475/2003. Molecular Networks GmbH (Erlangen, Germany) and Infochem (Munich, Germany) are acknowledged for access to the PETRA program and to subsets of chemical reactions from the SPRESI database, respectively.

Supplementary material

Supplementary material 1 (PDF, 80 kb): 10822_2009_9275_MOESM1_ESM.pdf


  1. 1.
    Aptula AO, Patlewicz G, Roberts DW (2005) Chem Res Toxicol 18:1420. doi: 10.1021/tx050075m CrossRefGoogle Scholar
  2. 2.
    Benigni R (2005) Chem Rev 105:1767. doi: 10.1021/cr030049y CrossRefGoogle Scholar
  3. 3.
    Metz JT, Huth JR, Hajduk PJ (2007) J Comput Aided Mol Des 21:139. doi: 10.1007/s10822-007-9109-z CrossRefGoogle Scholar
  4. 4.
  5. 5.
    Directive 2003/15/EC of the European Parliament and of the Council of 27 February 2003 amending Council Directive 76/768/EEC. OJ L066, 26–35, 11 March 2003Google Scholar
  6. 6.
    Lilienblum W, Dekant W, Foth H, Gebel T, Hengstler JG, Kahl R, Kramer P-J, Schweinfurth H, Wollin K-M (2008) Arch Toxicol 82:211. doi: 10.1007/s00204-008-0279-9 CrossRefGoogle Scholar
  7. 7.
    Aptula AO, Patlewicz G, Roberts DW, Schultz TW (2006) Toxicol In Vitro 20:239. doi: 10.1016/j.tiv.2005.07.003 CrossRefGoogle Scholar
  8. 8.
    Gerberick GF, Vassallo JD, Bailey RE, Chaney JG, Morrall SW, Lepoittevin J-P (2004) Toxicol Sci 81:332. doi: 10.1093/toxsci/kfh213 CrossRefGoogle Scholar
  9. 9.
    Gerberick GF, Vassallo JD, Foertsch LM, Price BB, Chaney JG, Lepoittevin J-P (2007) Toxicol Sci 97:427. doi: 10.1093/toxsci/kfm064 CrossRefGoogle Scholar
  10. 10.
    Natsch A, Emter R, Ellis G (2009) Toxicol Sci 107:106. doi: 10.1093/toxsci/kfn204 CrossRefGoogle Scholar
  11. 11.
    Patlewicz G, Aptula AO, Roberts DW, Uriarte E (2008) QSAR Comb Sci 27:60. doi: 10.1002/qsar.200710067 CrossRefGoogle Scholar
  12. 12.
    Gramatica P, Pilutti P, Papa E (2004) Atmos Environ 38:6167. doi: 10.1016/j.atmosenv.2004.07.026 CrossRefGoogle Scholar
  13. 13.
    Chaudry UA, Popelier PLA (2003) J Phys Chem A 107:4578. doi: 10.1021/jp034272a CrossRefGoogle Scholar
  14. 14.
    Zhang H, Qu X, Ando H (2005) J Mol Struct THEOCHEM 725:31. doi: 10.1016/j.theochem.2005.02.086 CrossRefGoogle Scholar
  15. 15.
    Hiob R, Karelson M (2000) J Chem Inf Comput Sci 40:1062. doi: 10.1021/ci0004457 Google Scholar
  16. 16.
    Meylan WM, Howard PH (2003) Environ Toxicol Chem 22:1724. doi: 10.1897/01-275 CrossRefGoogle Scholar
  17. 17.
    Gramatica P, Consonni V, Todeschini R (1999) Chemosphere 38:1371. doi: 10.1016/S0045-6535(98)00539-6 CrossRefGoogle Scholar
  18. 18.
    Atkinson R (1998) Environ Toxicol Chem 7:435. doi: 10.1897/1552-8618(1988)7[435:EOGHRR]2.0.CO;2 CrossRefGoogle Scholar
  19. 19.
    Gramatica P, Pilutti P, Papa E (2004) J Chem Inf Comput Sci 44:1794Google Scholar
  20. 20.
    Klamt A (1993) Chemosphere 26:1273. doi: 10.1016/0045-6535(93)90181-4 CrossRefGoogle Scholar
  21. 21.
    Fatemi MH (2006) Anal Chim Acta 556:355. doi: 10.1016/j.aca.2005.09.033 CrossRefGoogle Scholar
  22. 22.
    Huth JR, Mendoza R, Olejniczak ET, Johnson RW, Cothron DA, Liu Y, Lerner CG, Chen J, Hajduk PJ (2005) J Am Chem Soc 127:217CrossRefGoogle Scholar
  23. 23.
    Satoh H, Itono S, Funatsu K, Takano K, Nakata TA (1999) J Chem Inf Comput Sci 39:671. doi: 10.1021/ci9801567 Google Scholar
  24. 24.
    Satoh H, Funatsu K, Takano K, Nakata T (2000) Bull Chem Soc Jpn 73:1955. doi: 10.1246/bcsj.73.1955 CrossRefGoogle Scholar
  25. 25.
    Simon V, Gasteiger J, Zupan J (1993) J Am Chem Soc 115:9148. doi: 10.1021/ja00073a034 CrossRefGoogle Scholar
  26. 26.
    Gupta S, Mathew S, Abreu PM, Aires-de-Sousa J (2006) Bioorg Med Chem 14:1199. doi: 10.1016/j.bmc.2005.09.047 CrossRefGoogle Scholar
  27. 27.
    Zhang Q, Aires-de-Sousa J (2007) J Chem Inf Model 47:1. doi: 10.1021/ci050520j CrossRefGoogle Scholar
  28. 28.
    Zhang Q-Y, Aires-de-Sousa J (2005) J Chem Inf Model 45:1775. doi: 10.1021/ci0502707 CrossRefGoogle Scholar
  29. 29.
    Latino DARS, Aires-de-Sousa J (2006) Angew Chem Int Ed 45:2066. doi: 10.1002/anie.200503833 CrossRefGoogle Scholar
  30. 30.
    Latino DARS, Zhang Q-Y, Aires-de-Sousa J (2008) Bioinformatics 24:2236. doi: 10.1093/bioinformatics/btn405 CrossRefGoogle Scholar
  31. 31.
  32. 32.
    Kohonen T (1998) Self-Organization and Associative Memory. Springer, BerlinGoogle Scholar
  33. 33.
    Breiman L (2001) Mach Learn 45:5. doi: 10.1023/A:1010933404324 CrossRefGoogle Scholar
  34. 34.
    Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BPJ (2003) Chem Inf Comput Sci 43:1947Google Scholar
  35. 35.
    R Development Core Team (2004). R: A language and environment for statistical computing. R foundation for statistical computing, Vienna, Austria. ISBN 3-900051-07-0, URL
  36. 36.
    Fortran original by Leo Breiman, Adele Cutler, R port by Andy Liaw and Matthew Wiener. (2004).
  37. 37.
    Clayden J, Greeves N, Warren S, Wothers P (2001) Organic Chemistry. Oxford University Press, OxfordGoogle Scholar

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Gonçalo V. S. M. Carrera (1)
  • Sunil Gupta (1)
  • João Aires-de-Sousa (1), corresponding author
  1. REQUIMTE, CQFB, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
