The effect of the diversity of molecules in sets and similarity of sets on the quality of prediction in QSAR studies

  • Original paper
Journal of Mathematical Chemistry

Abstract

We report: (a) formulas and procedures for calculating the similarity of molecules in terms of chemical structure, size, shape and hydrophilicity; (b) a procedure for clustering sets of molecules according to similarity; and (c) formulas and procedures, based on the Shannon entropy formalism, for calculating the diversity of molecules in clustered sets and the similarity of clustered sets. The paper analyses the influence of the diversity of molecules and of the similarity of the calibration and prediction sets on the quality of prediction for the prediction set molecules. The calculated influence of a given molecular feature (chemical structure, size, shape or hydrophilicity) on toxicity depends on the structure of the database, specifically on the number and diversity of molecules that have the analyzed feature. A QSAR analysis of 49 phenol derivatives showed how the quality of prediction for the prediction set molecules depends on the diversity of molecules in the sets and on the similarity of the sets: (a) a direct correlation with the similarity of the sets, regardless of the analyzed molecular feature; (b) an inverse correlation with the diversity of molecules in the calibration set with respect to chemical structure, size and shape; (c) a direct correlation with the diversity of molecules in the calibration set with respect to hydrophilicity; and (d) a direct correlation with the diversity of molecules in the prediction set, regardless of the analyzed feature.
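The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of how a Shannon-entropy diversity measure for a clustered set might look. It assumes that diversity is the entropy of the cluster occupancy fractions; the function names, the normalization, and the example cluster labels are hypothetical and are not taken from the paper or from PRECLAV.

```python
import math
from collections import Counter


def shannon_diversity(cluster_labels):
    """Shannon entropy (in bits) of the cluster occupancy distribution.

    Assumption: each molecule carries one cluster label, and diversity is
    H = -sum(p_i * log2(p_i)), with p_i the fraction of molecules in cluster i.
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def normalized_diversity(cluster_labels):
    """Entropy divided by its maximum, log2(number of occupied clusters).

    The result lies in [0, 1]; a set with a single cluster has diversity 0.
    """
    k = len(set(cluster_labels))
    if k <= 1:
        return 0.0
    return shannon_diversity(cluster_labels) / math.log2(k)


# Hypothetical cluster assignments for a small calibration set.
calibration_clusters = ["A", "A", "A", "B", "B", "C"]
print(shannon_diversity(calibration_clusters))
print(normalized_diversity(calibration_clusters))
```

Under this assumption, a set whose molecules are spread evenly over many clusters has high diversity, and a set concentrated in one cluster has diversity zero.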

Author information

Corresponding author

Correspondence to Laszlo Tarko.

About this article

Cite this article

Tarko, L. The effect of the diversity of molecules in sets and similarity of sets on the quality of prediction in QSAR studies. J Math Chem 52, 948–965 (2014). https://doi.org/10.1007/s10910-013-0302-0

  • DOI: https://doi.org/10.1007/s10910-013-0302-0
