Machine learning prediction of empirical polarity using SMILES encoding of organic solvents

  • Original Article
Molecular Diversity

Abstract

Machine learning based statistical models have significantly increased the speed and accuracy with which the chemical and physical properties of compounds can be predicted, compared with experimental measurements and traditional ab initio and quantum mechanical approaches. The impact of these techniques on the chemical sciences has changed the way experiments are designed, and the last decade has seen computer-aided molecular design based on machine learning algorithms rise to prominence. The major challenge has been generating machine-readable data, in the form of descriptors and observations, for training the model; this can itself be time-consuming and computationally expensive when a molecular encoding based on atomic coordinates is used. In this study, we address this problem by using the SMILES representation of molecules to generate various topological, physicochemical, electronic and steric descriptors with open-source cheminformatics packages. Using the data generated by these packages, we developed a simple and explainable quantitative structure–property relationship model, based on an artificial neural network with 7 numerical descriptors and 1 categorical descriptor, for predicting the empirical polarity of a wide variety of organic solvents. Because polarity reflects the various solute–solvent and solvent–solvent interactions taking place in an organic transformation, an estimate of it beforehand should help a chemist design better experiments.
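The abstract does not list the eight descriptors, so the following is only a minimal sketch of how 7 numerical descriptors and 1 categorical descriptor might be assembled into an input vector for a neural network. The descriptor names (`d1` to `d7`), the category labels, and the helper functions are hypothetical placeholders, not the author's features or code. Numerical values are z-score standardized and the categorical value is one-hot encoded.

```python
from statistics import mean, pstdev

# Hypothetical placeholder names: the abstract does not name the actual descriptors.
NUMERIC_KEYS = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
# Assumed labels for the single categorical descriptor.
CATEGORIES = ["class_a", "class_b", "class_c"]


def fit_scaler(rows):
    """Return per-descriptor (mean, std) computed over the training rows."""
    stats = {}
    for key in NUMERIC_KEYS:
        values = [row[key] for row in rows]
        std = pstdev(values)
        # Fall back to 1.0 for constant columns to avoid division by zero.
        stats[key] = (mean(values), std if std > 0 else 1.0)
    return stats


def encode(row, stats):
    """Standardize the 7 numerical descriptors and one-hot encode the categorical one."""
    numeric = [(row[k] - stats[k][0]) / stats[k][1] for k in NUMERIC_KEYS]
    one_hot = [1.0 if row["category"] == c else 0.0 for c in CATEGORIES]
    return numeric + one_hot


# Toy rows with made-up numbers, used only to exercise the functions.
toy_rows = [
    {**{k: float(i + j) for j, k in enumerate(NUMERIC_KEYS)},
     "category": CATEGORIES[i % len(CATEGORIES)]}
    for i in range(4)
]
scaler = fit_scaler(toy_rows)
vector = encode(toy_rows[0], scaler)
print(len(vector))  # 7 standardized values plus one slot per category
```

The scaler is fitted on training rows only and then reused for new solvents, so that predictions for unseen molecules are scaled consistently with the training data.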

Graphical abstract

An ANN algorithm based on 8 descriptors was successfully employed to predict the ET(30) values of organic solvents.
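For context, ET(30) is conventionally defined on Reichardt's scale as the molar electronic transition energy of the pyridinium N-phenolate betaine dye in a given solvent, obtained from the wavelength of its longest-wavelength absorption maximum. This page does not spell out the conversion, so the sketch below rests on the standard relations ET(30) [kcal/mol] = 28591 / λmax [nm] and ETN = (ET(30) − 30.7) / 32.4, with tetramethylsilane and water as reference solvents. The function names and the example wavelength are illustrative assumptions.

```python
# h * c * N_A expressed in kcal nm / mol (standard conversion constant).
HC_NA_KCAL_NM = 28591.0
# Reference ET(30) values in kcal/mol used for normalization.
ET30_TMS = 30.7     # tetramethylsilane, least polar reference
ET30_WATER = 63.1   # water, most polar reference


def et30_from_lambda_max(lambda_max_nm):
    """Convert the dye's absorption maximum (nm) to ET(30) in kcal/mol."""
    if lambda_max_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_NA_KCAL_NM / lambda_max_nm


def normalized_et(et30):
    """Map ET(30) onto the dimensionless scale: 0 for TMS, 1 for water."""
    return (et30 - ET30_TMS) / (ET30_WATER - ET30_TMS)


# Illustrative wavelength, roughly where the dye absorbs in water.
et30 = et30_from_lambda_max(453.0)
print(round(et30, 1), round(normalized_et(et30), 2))
```

A shorter absorption wavelength gives a larger ET(30), which corresponds to a more polar solvent on this scale.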

Data availability

The datasets and model algorithms can be accessed from this link: https://github.com/v-saini/SMILES-EP.git.

Funding

This work was supported by the Department of Science and Technology, Ministry of Science and Technology, India (Grant Number DST/INSPIRE/04/2017/002529).

Author information

Corresponding author

Correspondence to Vaneet Saini.

Ethics declarations

Competing Interests

The author has no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 735 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saini, V. Machine learning prediction of empirical polarity using SMILES encoding of organic solvents. Mol Divers 27, 2331–2343 (2023). https://doi.org/10.1007/s11030-022-10559-6

  • DOI: https://doi.org/10.1007/s11030-022-10559-6
