Prediction of the datasets modelability for the building of QSAR classification models by means of the centroid based rivality index

  • Original Paper
Journal of Mathematical Chemistry

Abstract

The modelability index of a dataset of molecules measures the capacity of the dataset to be modeled using a QSAR algorithm. This measure allows the correct classification rate of the dataset to be predicted by counting, for each molecule, the nearest neighbors that belong to its same class. In this paper, we propose a new measure for predicting the modelability of datasets, based on the nearest neighbors based rivality index and the centroid based rivality index. These indexes take into account the noise that a nearest neighbor belonging to a different class can introduce into the results of the QSAR classification algorithm. Using thirty benchmark datasets, two types of dataset representation and six different algorithms, we show the excellent behavior of the proposed indexes, obtaining correlations with R² values greater than 0.9 between the correct classification rate obtained by five-fold cross-validation and the modelability index calculated using the centroid based rivality index.
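The abstract gives no formulas, so the following Python sketch only illustrates the two ideas it names, under stated assumptions. The function names, the Euclidean distance, the toy data, and in particular the normalized-difference form used for the centroid score are assumptions made for illustration; they are not taken from the paper and may differ from the authors' definitions.

```python
import numpy as np

def nearest_neighbor_modelability(X, y):
    """Average, over classes, of the fraction of samples whose first
    nearest neighbor has the same class (an assumed counting scheme)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances; a sample must not be its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    same = y[dist.argmin(axis=1)] == y
    return float(np.mean([same[y == c].mean() for c in np.unique(y)]))

def centroid_rivality(X, y):
    """Assumed centroid-based score per sample, in [-1, 1]:
    (d_own - d_rival) / (d_own + d_rival), where d_own is the distance to
    the sample's own class centroid and d_rival the distance to the
    closest centroid of any other class. Requires at least two classes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    own = y[:, None] == classes[None, :]      # exactly one True per row
    d_own = d[own]                            # row-major order, one per sample
    d_rival = np.where(own, np.inf, d).min(axis=1)
    return (d_own - d_rival) / (d_own + d_rival)

# Hypothetical toy data: two well-separated clusters of three points each.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_clean = [0, 0, 0, 1, 1, 1]
y_noisy = [0, 0, 0, 1, 1, 0]                  # last point mislabeled

print(nearest_neighbor_modelability(X, y_clean))
print(centroid_rivality(X, y_clean))
print(centroid_rivality(X, y_noisy))
```

Under this assumed form, a negative score means the sample lies closer to its own class centroid than to any rival centroid, and a positive score flags a sample that a rival class would claim, such as the mislabeled point in `y_noisy`.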

Funding

No funding supported the manuscript.

Author information

Contributions

ILR and MAGN shared all design and experimental tasks, as well as the development of the study and the manuscript.

Corresponding author

Correspondence to Irene Luque Ruiz.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Availability of data and material

A Word file containing the results of the prediction models built using fingerprint and similarity matrices as input data to the algorithms.

About this article

Cite this article

Luque Ruiz, I., Gómez-Nieto, M.Á. Prediction of the datasets modelability for the building of QSAR classification models by means of the centroid based rivality index. J Math Chem 57, 1374–1393 (2019). https://doi.org/10.1007/s10910-018-0972-8

  • DOI: https://doi.org/10.1007/s10910-018-0972-8
