Structural Chemistry, Volume 24, Issue 1, pp 303–331

QSPR with extended topochemical atom (ETA) indices. 4. Modeling aqueous solubility of drug like molecules and agrochemicals following OECD guidelines

Original Research


Aqueous solubility is a property of prime interest for predicting the behavior of chemical compounds inside the body, since water is the most ubiquitous component of any living cell. Predictive quantitative structure–property relationship (QSPR) models of aqueous solubility seek to identify the essential chemical information of molecules that controls their ability to dissolve. Given the importance of solubility in controlling the absorption, distribution, metabolism, excretion, and toxicity properties of drugs and similar chemicals, predictive models of aqueous solubility were developed following OECD guidelines for a large set (N = 565) of diverse drugs, drug-like compounds, and agrochemicals, using extended topochemical atom (ETA) indices and suitable chemometric tools. Because hydrophobicity plays a prime role in the solubilization of structurally complex and crystalline organic compounds, the computed lipophilicity parameter ClogP was also used. Models were additionally developed using various non-ETA descriptors, and a further attempt was made to build models combining ETA, non-ETA, and ClogP parameters. All the models were subjected to rigorous statistical validation using multiple strategies, and encouraging results were obtained for internal, external, and overall validation. A comparative analysis on the prediction (test) set between the general solubility equation and the best model developed here with ETA and ClogP parameters demonstrated the better predictive potential of the latter.
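As a rough illustration of the baseline named in the abstract, the sketch below implements the general solubility equation in its widely quoted Jain and Yalkowsky form, log S = 0.5 − 0.01(MP − 25) − log P, together with one common definition of external predictive R². The function names, the use of ClogP as the log P input, and the choice of this particular R² definition are assumptions made for illustration; the abstract does not state which exact formulas the authors applied.

```python
def gse_log_solubility(clogp: float, melting_point_c: float) -> float:
    """General solubility equation: estimated log10 molar aqueous solubility.

    Assumed form: log S = 0.5 - 0.01 * (MP - 25) - log P, with MP in deg C.
    """
    # Compounds that are liquid at 25 deg C carry no crystal-lattice penalty.
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - clogp


def r2_pred(observed, predicted, train_mean: float) -> float:
    """External predictive R^2, referenced to the training-set mean.

    Assumed definition: 1 - sum((obs - pred)^2) / sum((obs - train_mean)^2).
    """
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - train_mean) ** 2 for o in observed)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    # Hypothetical compound (not from the paper): ClogP = 2.0, MP = 125 deg C.
    print(gse_log_solubility(2.0, 125.0))
```

Computing `r2_pred` once for the equation's test-set predictions and once for a fitted model's predictions gives a like-for-like comparison of the kind the abstract describes, provided both use the same test compounds and the same training-set mean.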


Keywords: Aqueous solubility, QSPR, ETA, Topological, Drug, Agrochemical




Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Drug Theoretics and Cheminformatics Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
