Structure–reactivity modeling using mixture-based representation of chemical reactions
We describe a novel approach to reaction representation as a combination of two mixtures: a mixture of reactants and a mixture of products. Each mixture can in turn be encoded using a previously reported approach based on simplex descriptors (SiRMS). The feature vector representing the two mixtures is obtained either by concatenating the reactant and product descriptors or by taking the difference between the product and reactant descriptors. This reaction representation does not require explicit labeling of the reaction center. We also propose a rigorous “product-out” cross-validation (CV) strategy. Unlike the naïve “reaction-out” CV approach, which selects reactions at random, the proposed strategy provides a more realistic estimate of prediction accuracy for reactions that yield novel products. The new methodology was applied to model rate constants of E2 reactions. Using the fragment-control applicability domain approach was shown to significantly increase the prediction accuracy of the models. Models obtained with the new “mixture” approach performed better than those requiring either explicit (Condensed Graph of Reaction) or implicit (reaction fingerprints) labeling of the reaction center.
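The two ways of combining reactant and product mixtures can be illustrated with a minimal Python sketch. It assumes each molecule is already described by a bag of fragment counts; the bond-type labels used here are hypothetical placeholders for real SiRMS simplex descriptors, and summing component counts is a deliberate simplification of proper mixture descriptors. The helper `product_out_folds` is one possible reading of product-out CV (all reactions leading to the same product share a fold), and `in_fragment_domain` is an assumed version of a fragment-control applicability domain check. None of these names come from the original work.

```python
from collections import Counter
from typing import List, Sequence


def mixture_descriptors(components: Sequence[Counter]) -> Counter:
    """Assumed mixture encoding: sum the fragment counts of all components."""
    total = Counter()
    for mol in components:
        total.update(mol)  # adds counts fragment by fragment
    return total


def reaction_vector(reactants: Sequence[Counter], products: Sequence[Counter],
                    vocabulary: Sequence[str], mode: str = "diff") -> List[int]:
    """Encode a reaction from its two mixtures; no reaction center is labeled.

    mode="concat": reactant descriptors followed by product descriptors.
    mode="diff":   product descriptors minus reactant descriptors.
    """
    r = mixture_descriptors(reactants)
    p = mixture_descriptors(products)
    r_vec = [r[f] for f in vocabulary]  # missing fragments count as 0
    p_vec = [p[f] for f in vocabulary]
    if mode == "concat":
        return r_vec + p_vec
    if mode == "diff":
        # Explicit subtraction keeps negative values (Counter '-' would drop them).
        return [pv - rv for pv, rv in zip(p_vec, r_vec)]
    raise ValueError(f"unknown mode: {mode}")


def product_out_folds(product_ids: Sequence[str], n_folds: int) -> List[int]:
    """Assign CV folds so that all reactions sharing a product land together."""
    unique = sorted(set(product_ids))
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique)}
    return [fold_of[pid] for pid in product_ids]


def in_fragment_domain(reaction_fragments: Sequence[str],
                       training_fragments: Sequence[str]) -> bool:
    """Assumed fragment-control rule: every fragment must occur in training data."""
    return set(reaction_fragments) <= set(training_fragments)


# Toy E2 example (hypothetical fragment labels):
# CH3CH2Br + OH- -> CH2=CH2 + H2O + Br-
vocab = ["C-C", "C=C", "C-H", "C-Br", "O-H", "Br-"]
ethyl_bromide = Counter({"C-C": 1, "C-H": 5, "C-Br": 1})
hydroxide = Counter({"O-H": 1})
ethene = Counter({"C=C": 1, "C-H": 4})
water = Counter({"O-H": 2})
bromide = Counter({"Br-": 1})

print(reaction_vector([ethyl_bromide, hydroxide], [ethene, water, bromide], vocab))
# [-1, 1, -1, -1, 1, 1]
print(product_out_folds(["ethene", "propene", "ethene", "butene"], n_folds=2))
# [1, 0, 1, 0]
```

The difference vector is compact and highlights what changes in the reaction, while the concatenated vector preserves the absolute composition of both sides at the cost of doubling the length; in both cases the reaction center is never marked explicitly.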
Keywords: Chemical reactions; Simplex representation of molecular structure; Condensed graph of reaction; Reaction fingerprints; Rate constant prediction; Mixtures
This work was supported by Russian Science Foundation, Grant No. 14-43-00024.