Random Forest Algorithm-Based Prediction of Solvation Gibbs Energies

Journal of Solution Chemistry

Abstract

The solvation Gibbs energy of a chemical is a critical parameter in the chemical industry and in chemical reactivity. Predicting solvation Gibbs energies for a large number of solvents and solutes with machine learning techniques is a challenging area. In this work, the random forest (RF) algorithm, together with a combined descriptor set from solvents and solutes, was used to develop a quantitative structure–property relationship (QSPR) model for the solvation Gibbs energies of 6238 solute/solvent pairs. The optimal RF model (ntree = 25, mtry = 10 and nodesize = 5) yielded determination coefficients of 0.935 and 0.924 and root mean square errors of 2.477 and 2.464 kJ·mol−1 for the training and test sets, respectively. In predicting the solvation Gibbs energies for a large dataset, the optimal RF model is comparable to other QSPR models reported in the literature.
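The two statistics quoted in the abstract can be illustrated with a minimal sketch. It assumes the determination coefficient is defined as 1 − SS_res/SS_tot, which the abstract does not state explicitly (some QSPR studies report the squared correlation coefficient instead). The function names and the four sample values are hypothetical and are not taken from the article's dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the inputs."""
    n = len(y_true)
    squared_errors = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_errors / n)


def r_squared(y_true, y_pred):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical solvation Gibbs energies in kJ/mol (illustrative only).
observed = [-20.0, -15.0, -10.0, -5.0]
predicted = [-19.0, -16.0, -9.0, -6.0]

print(f"RMSE = {rmse(observed, predicted):.3f} kJ/mol")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

Applying the same two functions separately to the training-set and test-set predictions would give one pair of values per set, which is the form in which the abstract reports its results.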

Data Availability

The data that support the findings of this study are available in the supporting information of this article.

Acknowledgements

This work was supported by the Open Project Program of Hunan Provincial Key Laboratory of Environmental Catalysis & Waste Regeneration (Hunan Institute of Engineering) (No. 2018KF11) and the Hunan Provincial Natural Science Foundation (Nos. 2020JJ6013, 2021JJ50111).

Author information

Contributions

ML, LZ, HW, and JZ contributed to data collection and curation, descriptor calculation, software, and model development; FW contributed to manuscript revision; XY contributed to conceptualization, methodology, writing-original draft, and manuscript revision.

Corresponding authors

Correspondence to Feng Wu or Xinliang Yu.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (XLSX 886.9 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liao, M., Wu, F., Yu, X. et al. Random Forest Algorithm-Based Prediction of Solvation Gibbs Energies. J Solution Chem 52, 487–498 (2023). https://doi.org/10.1007/s10953-023-01247-6

  • DOI: https://doi.org/10.1007/s10953-023-01247-6
