Abstract
The solvation Gibbs energy of a chemical is a critical parameter in the chemical industry and in the study of chemical reactivity. Predicting solvation Gibbs energies for a large number of solvents and solutes with machine learning techniques is a challenging task. In this work, the random forest (RF) algorithm, together with a combined descriptor set from solvents and solutes, was used to develop a quantitative structure–property relationship (QSPR) model for the solvation Gibbs energies of 6238 solute/solvent pairs. The optimal RF model (ntree = 25, mtry = 10, and nodesize = 5) gave determination coefficients of 0.935 and 0.924 and root mean square errors of 2.477 and 2.464 kJ·mol⁻¹ for the training and test sets, respectively. In predicting solvation Gibbs energies for a large dataset, the optimal RF model is comparable to other QSPR models reported in the literature.
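As an illustration of the two statistics quoted above, the following minimal sketch computes a root mean square error and a determination coefficient in plain Python. It assumes the common definition R² = 1 − SS_res/SS_tot; the abstract does not state which definition the authors used, so this is an assumption. The function names and the four observed/predicted values are hypothetical and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the inputs."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical solvation Gibbs energies in kJ/mol (illustration only,
# not values from the study).
observed = [-20.0, -15.0, -10.0, -5.0]
predicted = [-19.0, -16.0, -9.0, -6.0]

print(f"RMSE = {rmse(observed, predicted):.3f} kJ/mol")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

Applied to a model's training-set and test-set predictions separately, the same two functions would yield one pair of values per set, which is how the figures in the abstract are reported.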
Data Availability
The data that support the findings of this study are available in the supporting information of this article.
Acknowledgements
This work was supported by the Open Project Program of Hunan Provincial Key Laboratory of Environmental Catalysis & Waste Regeneration (Hunan Institute of Engineering) (No. 2018KF11) and the Hunan Provincial Natural Science Foundation (Nos. 2020JJ6013, 2021JJ50111).
Author information
Contributions
ML, LZ, HW, and JZ contributed to data collection and curation, descriptor calculation, software, and model development; FW contributed to manuscript revision; XY contributed to conceptualization, methodology, writing-original draft, and manuscript revision.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Liao, M., Wu, F., Yu, X. et al. Random Forest Algorithm-Based Prediction of Solvation Gibbs Energies. J Solution Chem 52, 487–498 (2023). https://doi.org/10.1007/s10953-023-01247-6
DOI: https://doi.org/10.1007/s10953-023-01247-6