Journal of Computer-Aided Molecular Design, Volume 31, Issue 9, pp 829–839

Structure–reactivity modeling using mixture-based representation of chemical reactions

  • Pavel Polishchuk
  • Timur Madzhidov
  • Timur Gimadiev
  • Andrey Bodrov
  • Ramil Nugmanov
  • Alexandre Varnek


We describe a novel approach to reaction representation as a combination of two mixtures: a mixture of reactants and a mixture of products. Each mixture can in turn be encoded using a previously reported approach based on simplex descriptors (SiRMS). The feature vector representing the two mixtures is obtained either by concatenating the product and reactant descriptors or by taking the difference between the product and reactant descriptors. This reaction representation does not require explicit labeling of the reaction center. A rigorous “product-out” cross-validation (CV) strategy is also proposed. Unlike the naïve “reaction-out” CV approach, which is based on random selection of items, the proposed strategy provides a more realistic estimate of prediction accuracy for reactions yielding novel products. The new methodology has been applied to model the rate constants of E2 reactions. It has been demonstrated that the fragment control applicability domain approach significantly increases the prediction accuracy of the models. The models obtained with the new “mixture” approach performed better than those requiring either explicit (Condensed Graph of Reaction) or implicit (reaction fingerprints) reaction center labeling.
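The representation and validation ideas summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' SiRMS implementation: each molecule is assumed to be pre-encoded as a `Counter` of hypothetical fragment labels, a mixture is encoded simply as the sum of its components' fragment counts (actual SiRMS mixture descriptors are richer), and all function names (`mixture_descriptors`, `reaction_vector`, `product_out_folds`, `in_fragment_domain`) are illustrative. The product-out split groups reactions by a caller-supplied product key so that all reactions yielding the same product fall into the same fold, and the domain check treats a reaction as in-domain only if every fragment it contains was seen in training, which is one plausible reading of fragment control.

```python
from collections import Counter
import random
from typing import List, Sequence, Set

# Hypothetical stand-in for SiRMS: every molecule is assumed to be pre-encoded
# as a Counter mapping fragment labels to occurrence counts.


def mixture_descriptors(molecules: Sequence[Counter]) -> Counter:
    """Encode a mixture as the sum of its components' fragment counts.

    Simplification: real SiRMS mixture descriptors are richer than a plain sum.
    """
    total: Counter = Counter()
    for mol in molecules:
        total.update(mol)  # adds counts rather than replacing them
    return total


def reaction_vector(reactants: Sequence[Counter], products: Sequence[Counter],
                    vocabulary: Sequence[str], mode: str = "diff") -> List[int]:
    """Build a reaction feature vector from the reactant and product mixtures."""
    r = mixture_descriptors(reactants)
    p = mixture_descriptors(products)
    r_vec = [r[f] for f in vocabulary]  # missing fragments count as 0
    p_vec = [p[f] for f in vocabulary]
    if mode == "concat":
        return p_vec + r_vec  # product block followed by reactant block
    if mode == "diff":
        return [pv - rv for pv, rv in zip(p_vec, r_vec)]  # products minus reactants
    raise ValueError(f"unknown mode: {mode!r}")


def product_out_folds(product_keys: Sequence[str], n_folds: int,
                      seed: int = 0) -> List[int]:
    """Assign CV folds so that reactions sharing a product key share a fold."""
    groups = sorted(set(product_keys))
    random.Random(seed).shuffle(groups)  # random order of product groups
    fold_of_group = {g: i % n_folds for i, g in enumerate(groups)}
    return [fold_of_group[k] for k in product_keys]


def in_fragment_domain(fragments: Counter, training_fragments: Set[str]) -> bool:
    """Inside the domain only if every fragment present was seen in training."""
    return all(f in training_fragments for f, c in fragments.items() if c)


if __name__ == "__main__":
    # Toy E2-like example with made-up fragment labels (base omitted).
    ethyl_bromide = Counter({"C-C": 1, "C-Br": 1})
    ethene = Counter({"C=C": 1})
    hbr = Counter({"H-Br": 1})
    vocab = ["C-C", "C-Br", "C=C", "H-Br"]
    print(reaction_vector([ethyl_bromide], [ethene, hbr], vocab))  # [-1, -1, 1, 1]
    print(product_out_folds(["ethene", "ethene", "propene"], n_folds=2))
```

In this simplified encoding the difference mode cancels fragments that appear unchanged on both sides, so the vector emphasizes what the reaction transforms without any explicit reaction-center labeling; the concatenation mode keeps both sides and leaves that contrast for the learning algorithm to discover.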


Keywords: Chemical reactions; Simplex representation of molecular structure; Condensed graph of reaction; Reaction fingerprints; Rate constant prediction; Mixtures



This work was supported by Russian Science Foundation, Grant No. 14-43-00024.

Supplementary material

  • Supplementary material 1: 10822_2017_44_MOESM1_ESM.rdf (RDF, 2861 KB)
  • Supplementary material 2: 10822_2017_44_MOESM2_ESM.xlsx (XLSX, 39 KB)
  • Supplementary material 3: 10822_2017_44_MOESM3_ESM.pdf (PDF, 3710 KB)


  1. 1.
    Chen WL, Chen DZ, Taylor KT (2013) Automatic reaction mapping and reaction center detection. Wiley Interdiscip Rev Comput Mol Sci 3(6):560–593. doi: 10.1002/wcms.1140 CrossRefGoogle Scholar
  2. 2.
    Zhang J, Kleinöder T, Gasteiger J (2006) Prediction of pKa values for aliphatic carboxylic acids and alcohols with empirical atomic charge descriptors. J Chem Inf Model 46(6):2256–2266. doi: 10.1021/ci060129d CrossRefGoogle Scholar
  3. 3.
    Gasteiger J, Hondelmann U, Rose P, Witzenbichler W (1995) Computer-assisted prediction of the degradation of chemicals: hydrolysis of amides and benzoylphenylureas. J Chem Soc Perkin Trans 2(2):193–204. doi: 10.1039/p29950000193 CrossRefGoogle Scholar
  4. 4.
    Varnek A, Fourches D, Horvath D, Klimchuk O, Gaudin C, Vayer P, Solov’ev V, Hoonakker F, Tetko IV, Marcou G (2008) ISIDA—platform for virtual screening based on fragment and pharmacophoric descriptors. Curr Comput Aided Drug Des 4(3):191–198. doi: 10.2174/157340908785747465 CrossRefGoogle Scholar
  5. 5.
    Ruggiu F, Marcou G, Varnek A, Horvath D (2010) ISIDA property-labelled fragment descriptors. Mol Inform 29(12):855–868. doi: 10.1002/minf.201000099 CrossRefGoogle Scholar
  6. 6.
    Varnek A, Fourches D, Hoonakker F, Solov’ev VP (2005) Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures. J Comput Aided Mol Des 19(9):693–703. doi: 10.1007/s10822-005-9008-0 CrossRefGoogle Scholar
  7. 7.
    Hoonakker F, Lachiche N, Varnek A, Wagner A (2011) A representation to apply usual data mining techniques to chemical reactions—illustration on the rate constant of SN2 reactions in water. Int J Artif Intell Tools 20(02):253–270. doi: 10.1142/S0218213011000140 CrossRefGoogle Scholar
  8. 8.
    de Luca A, Horvath D, Marcou G, Solov’ev V, Varnek A (2012) Mining chemical reactions using neighborhood behavior and condensed graphs of reactions approaches. J Chem Inf Model 52(9):2325–2338. doi: 10.1021/ci300149n CrossRefGoogle Scholar
  9. 9.
    Madzhidov TI, Polishchuk PG, Nugmanov RI, Bodrov AV, Lin AI, Baskin II, Varnek AA, Antipin IS (2014) Structure-reactivity relationships in terms of the condensed graphs of reactions. Russ J Org Chem 50(4):459–463. doi: 10.1134/S1070428014040010 CrossRefGoogle Scholar
  10. 10.
    Nugmanov RI, Madzhidov TI, Haliullina GR, Baskin II, Antipin IS, Varnek A (2014) Development of “structure-reactivity” models for nucleophilic substitution reactions with participation of azides. J Struct Chem 55(6):1080–1087CrossRefGoogle Scholar
  11. 11.
    Madzhidov T, Bodrov A, Gimadiev T, Nugmanov R, Antipin I, Varnek A (2015) Obtaining structure-reactivity relationships for bimolecular elimination reactions with Condensed Reaction Graph approach. J Struct Chem 56(7):1227–1234CrossRefGoogle Scholar
  12. 12.
    Marcou G, Aires de Sousa J, Latino DARS, de Luca A, Horvath D, Rietsch V, Varnek A (2015) Expert system for predicting reaction conditions: the michael reaction case. J Chem Inf Model 55(2):239–250. doi: 10.1021/ci500698a CrossRefGoogle Scholar
  13. 13.
    Faulon J-L, Visco DP, Pophale RS (2003) The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J Chem Inf Comput Sci 43(3):707–720. doi: 10.1021/ci020345w CrossRefGoogle Scholar
  14. 14.
    Ridder L, Wagener M (2008) SyGMa: combining expert knowledge and empirical scoring in the prediction of metabolites. ChemMedChem 3(5):821–832. doi: 10.1002/cmdc.200700312 CrossRefGoogle Scholar
  15. 15.
    Schneider N, Lowe DM, Sayle RA, Landrum GA (2015) Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity. J Chem Inf Model 55(1):39–53. doi: 10.1021/ci5006614 CrossRefGoogle Scholar
  16. 16.
    Zhang Q-Y, Aires-de-Sousa J (2005) Structure-based classification of chemical reactions without assignment of reaction centers. J Chem Inf Model 45(6):1775–1783. doi: 10.1021/ci0502707 CrossRefGoogle Scholar
  17. 17.
    Kravtsov AA, Karpov PV, Baskin II, Palyulin VA, Zefirov NS (2011) Prediction of rate constants of SN2 reactions by the multicomponent QSPR method. Dokl Chem 440 (2):299–301. doi: 10.1134/s0012500811100107 CrossRefGoogle Scholar
  18. 18.
    Faulon J-L, Misra M, Martin S, Sale K, Sapra R (2008) Genome scale enzyme—metabolite and drug—target interaction predictions using the signature molecular descriptor. Bioinformatics 24(2):225–233. doi: 10.1093/bioinformatics/btm580 CrossRefGoogle Scholar
  19. 19.
    Kravtsov AA, Karpov PV, Baskin II, Palyulin VA, Zefirov NS (2011) Prediction of the preferable mechanism of nucleophilic substitution at saturated carbon atom and prognosis of S N 1 rate constants by means of QSPR. Dokl Chem 441 (1):314–317. doi: 10.1134/s0012500811110048 CrossRefGoogle Scholar
  20. 20.
    Muller C, Marcou G, Horvath D, Aires-de-Sousa J, Varnek A (2012) Models for identification of erroneous atom-to-atom mapping of reactions performed by automated algorithms. J Chem Inf Model 52(12):3116–3122. doi: 10.1021/ci300418q CrossRefGoogle Scholar
  21. 21.
    Patel H, Bodkin MJ, Chen B, Gillet VJ (2009) Knowledge-based approach to de novo design using reaction vectors. J Chem Inf Model 49(5):1163–1184. doi: 10.1021/ci800413m CrossRefGoogle Scholar
  22. 22.
    Oprisiu I, Varlamova E, Muratov E, Artemenko A, Marcou G, Polishchuk P, Kuz’min V, Varnek A (2012) QSPR approach to predict nonadditive properties of mixtures. Application to bubble point temperatures of binary mixtures of liquids. Mol Inform 31(6–7):491–502. doi: 10.1002/minf.201200006 CrossRefGoogle Scholar
  23. 23.
    Palm VA (1974–1978) Tables of rate and equilibrium constants of heterolytic organic reactions, vol 1–5. MoscowGoogle Scholar
  24. 24.
    Catalán J, Díaz C (1997) A generalized solvent acidity scale: the solvatochromism of o-tert-butylstilbazolium betaine dye and its homomorph o, o′-di-tert-butylstilbazolium betaine dye. Liebigs Ann 1997 (9):1941–1949. doi: 10.1002/jlac.199719970921 CrossRefGoogle Scholar
  25. 25.
    Catalán J, Díaz C, López V, Pérez P, De Paz J-LG, Rodríguez JG (1996) A generalized solvent basicity scale: the solvatochromism of 5-nitroindoline and its homomorph 1-methyl-5-nitroindoline. Liebigs Ann 1996 (11):1785–1794. doi: 10.1002/jlac.199619961112 CrossRefGoogle Scholar
  26. 26.
    Catalán J, López V, Pérez P, Martin-Villamil R, Rodríguez J-G (1995) Progress towards a generalized solvent polarity scale: The solvatochromism of 2-(dimethylamino)-7-nitrofluorene and its homomorph 2-fluoro-7-nitrofluorene. Liebigs Ann 1995 (2):241–252. doi: 10.1002/jlac.199519950234 CrossRefGoogle Scholar
  27. 27.
    Taft RW, Kamlet MJ (1976) The solvatochromic comparison method. 2. The .alpha.-scale of solvent hydrogen-bond donor (HBD) acidities. J Am Chem Soc 98(10):2886–2894. doi: 10.1021/ja00426a036 CrossRefGoogle Scholar
  28. 28.
    Kamlet MJ, Taft RW (1976) The solvatochromic comparison method. I. The .beta.-scale of solvent hydrogen-bond acceptor (HBA) basicities. J Am Chem Soc 98(2):377–383. doi: 10.1021/ja00418a009 CrossRefGoogle Scholar
  29. 29.
    Kamlet MJ, Abboud JL, Taft RW (1977) The solvatochromic comparison method. 6. The .pi.* scale of solvent polarities. J Am Chem Soc 99(18):6027–6038. doi: 10.1021/ja00460a031 CrossRefGoogle Scholar
  30. 30.
    cxcalc. 5.4 edn. Chemaxon, Budapest, HungaryGoogle Scholar
  31. 31.
    Kuz’min VE, Artemenko AG, Muratov EN (2008) Hierarchical QSAR technology based on the Simplex representation of molecular structure. J Comput Aided Mol Des 22(6–7):403–421. doi: 10.1007/s10822-008-9179-6 CrossRefGoogle Scholar
  32. 32.
    Kuz’min VE, Artemenko AG, Polischuk PG, Muratov EN, Khromov AI, Liahovskiy AV, Andronati SA, Makan SY (2005) Hierarchic system of QSAR models (1D-4D) on the base of simplex representation of molecular structure. J Mol Model 11:457–467. doi: 10.1007/s00894-005-0237-x CrossRefGoogle Scholar
  33. 33.
    RDKit: Open-Source Cheminformatics.
  34. 34.
    Carhart RE, Smith DH, Venkataraghavan R (1985) Atom pairs as molecular features in structure-activity studies: definition and applications. J Chem Inf Comput Sci 25(2):64–73. doi: 10.1021/ci00046a002 CrossRefGoogle Scholar
  35. 35.
    Rogers D, Hahn M (2010) Extended-Connectivity Fingerprints. J Chem Inf Model 50(5):742–754. doi: 10.1021/ci100050t CrossRefGoogle Scholar
  36. 36.
    Nilakantan R, Bauman N, Dixon JS, Venkataraghavan R (1987) Topological torsion: a new molecular descriptor for SAR applications. Comparison with other descriptors. J Chem Inf Comput Sci 27(2):82–85. doi: 10.1021/ci00054a008 CrossRefGoogle Scholar
  37. 37.
    Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22Google Scholar
  38. 38.
    Max Kuhn. Contributions from Jed Wing and Steve Weston and Andre Williams and Chris Keefer and Allan Engelhardt and Tony Cooper and Zachary Mayer and the R Core Team caret: Classification and Regression Training (2014). R package version 6.0–30 edn.Google Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
  2. A.V. Bogatsky Physico-Chemical Institute of National Academy of Sciences of Ukraine, Odessa, Ukraine
  3. A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
  4. Department of General and Organic Chemistry, Kazan State Medical University, Kazan, Russia
  5. Laboratory of Chemoinformatics, University of Strasbourg, Strasbourg, France
