Abstract
We report here:
(a) formulas and procedures for calculating the similarity of molecules, considering their chemical structure, size, shape and hydrophilicity;
(b) a procedure for clustering sets of molecules according to similarity;
(c) formulas and procedures, based on the Shannon entropy formalism, for calculating the diversity of molecules in clusterized sets and the similarity of clusterized sets.
The paper analyses how the diversity of molecules and the similarity of the calibration and prediction sets influence the quality of prediction for the molecules in the prediction set. The calculated influence of a given molecular feature (chemical structure, size, shape or hydrophilicity) on toxicity depends on the structure of the database, specifically on the number and the diversity of the molecules that have the analyzed feature. A QSAR analysis of 49 phenol derivatives revealed how the diversity of molecules in the sets and the similarity of the sets affect the quality of prediction for the molecules in the prediction set. The quality of prediction shows:
(a) a direct correlation with the similarity of the sets, regardless of the analyzed molecular feature;
(b) an inverse correlation with the diversity of molecules in the calibration set, from the point of view of chemical structure, size and shape;
(c) a direct correlation with the diversity of molecules in the calibration set, from the point of view of hydrophilicity;
(d) a direct correlation with the diversity of molecules in the prediction set, regardless of the analyzed feature.
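The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of one common way to express the diversity of a clusterized set with Shannon entropy. It assumes that each molecule has already been assigned to a cluster and that diversity is the entropy of the cluster occupancy fractions. The function name and the cluster labels are hypothetical and are not taken from the paper or from PRECLAV.

```python
import math
from collections import Counter


def cluster_entropy_bits(cluster_labels):
    """Shannon entropy (in bits) of the cluster occupancy of a set.

    cluster_labels: one label per molecule, naming the cluster it belongs to.
    Assumption: diversity is H = -sum(p_i * log2(p_i)), where p_i is the
    fraction of molecules in cluster i. This is an illustration only, not
    the formula used in the paper.
    """
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    # An empty set or a set with a single cluster has no diversity.
    if total == 0 or len(counts) < 2:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical cluster assignments for two small sets of molecules.
even_set = ["A", "A", "B", "B"]      # molecules spread evenly over 2 clusters
skewed_set = ["A", "A", "A", "B"]    # most molecules in one cluster

print(cluster_entropy_bits(even_set))
print(cluster_entropy_bits(skewed_set))
```

Under this assumption, a set whose molecules are spread evenly over many clusters has higher entropy than a set dominated by one cluster, which matches the intuitive meaning of "diversity" used in the abstract.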
Cite this article
Tarko, L. The effect of the diversity of molecules in sets and similarity of sets on the quality of prediction in QSAR studies. J Math Chem 52, 948–965 (2014). https://doi.org/10.1007/s10910-013-0302-0