Novel criteria for elimination of the outliers in QSPR studies, when the ‘forward stepwise’ procedure is used

  • Original Paper
  • Journal of Mathematical Chemistry

Abstract

The proposed algorithm has the following characteristics: (a) it uses a new formula for the quality of the QSPRs; (b) the outlier (atypical) character is defined using a classic criterion; (c) the condition for elimination of the outliers includes the quality of the equation; (d) only ‘the most atypical’ molecule is eliminated, and all calculations are then automatically repeated; (e) the elimination of outliers stops if the condition for elimination is not fulfilled or if the number of eliminated molecules exceeds a predetermined limit. The second situation in (e) was encountered once in the four examples discussed. The number of descriptors in ‘the best’ equation and the number of outliers removed cannot be predicted a priori. The text also proposes a criterion for the identification of ‘outliers for lead hopping’; there were no molecules of this type in the four examples discussed. The initial numbers of molecules in the calibration sets were 50, 60, 133 and 54, the numbers of descriptors in ‘the best’ equations were 5, 5, 9 and 9, and the numbers of eliminated outliers were 0, 0, 8 and 6, respectively. When outliers were present, the best equation obtained with the outliers and the best equation obtained without them were very different.
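
The loop described in (c) to (e) can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract gives neither the new quality formula nor the exact classic criterion, so the sketch assumes adjusted R² as a stand-in quality score and a standardized-residual threshold as the atypicality test. All function names and the parameters `max_terms`, `z_limit` and `max_removed` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns coefficients and residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef

def quality(X, y):
    """Stand-in quality score (adjusted R^2); an assumption, NOT the paper's formula."""
    n, p = X.shape
    _, resid = fit_ols(X, y)
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def forward_stepwise(X, y, max_terms=3):
    """Greedy forward selection: add the descriptor that most improves quality."""
    chosen, best_q = [], -np.inf
    while len(chosen) < max_terms:
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        if not candidates:
            break
        q, j = max((quality(X[:, chosen + [j]], y), j) for j in candidates)
        if q <= best_q:          # no descriptor improves the equation
            break
        chosen.append(j)
        best_q = q
    return chosen, best_q

def eliminate_outliers(X, y, z_limit=2.5, max_removed=5):
    """Remove one molecule at a time, repeating the whole selection each time."""
    keep = np.arange(len(y))     # indices of molecules still in the calibration set
    removed = []
    cols, q = forward_stepwise(X[keep], y[keep])
    while len(removed) < max_removed:            # predetermined limit
        _, resid = fit_ols(X[keep][:, cols], y[keep])
        z = np.abs(resid) / resid.std(ddof=len(cols) + 1)
        worst = int(np.argmax(z))                # 'the most atypical' molecule
        if z[worst] <= z_limit:                  # assumed atypicality test
            break
        trial = np.delete(keep, worst)
        new_cols, new_q = forward_stepwise(X[trial], y[trial])
        if new_q <= q:                           # elimination must improve quality
            break
        removed.append(int(keep[worst]))
        keep, cols, q = trial, new_cols, new_q
    return cols, q, removed
```

Calling `eliminate_outliers(X, y)` on a descriptor matrix `X` (molecules by descriptors) and a property vector `y` returns the selected descriptor columns, the final quality score and the original indices of the removed molecules. Because the descriptor selection is rerun after every removal, the selected descriptors can change once an outlier is dropped, which is consistent with the abstract's remark that the best equations with and without outliers were very different.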

Author information

Corresponding author

Correspondence to Laszlo Tarko.

Ethics declarations

Conflict of interest

There is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tarko, L. Novel criteria for elimination of the outliers in QSPR studies, when the ‘forward stepwise’ procedure is used. J Math Chem 57, 1770–1796 (2019). https://doi.org/10.1007/s10910-019-01036-x

  • DOI: https://doi.org/10.1007/s10910-019-01036-x