Prediction of the datasets modelability for the building of QSAR classification models by means of the centroid based rivality index

Luque Ruiz, Irene; Gómez-Nieto, Miguel Ángel

doi:10.1007/s10910-018-0972-8


Original Paper
Published: 01 November 2018

Volume 57, pages 1374–1393 (2019)

Journal of Mathematical Chemistry


Abstract

The modelability index of a molecular dataset measures how well the dataset can be modeled by a QSAR algorithm. It predicts the correct classification rate of the dataset by counting the molecules whose nearest neighbors belong to the same class. In this paper, we propose a new measure for predicting dataset modelability based on the nearest-neighbor-based rivality index and the centroid-based rivality index. These indexes take into account the noise that a nearest neighbor belonging to a different class can introduce into the results of a QSAR classification algorithm. Using thirty benchmark datasets, two types of dataset representation, and six different algorithms, we show that the proposed indexes perform very well: the modelability index calculated using the centroid-based rivality index correlates with the correct classification rate obtained by five-fold cross-validation with R² values greater than 0.9.
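The abstract describes these indexes only in words. The Python sketch below illustrates them under explicit assumptions and is not the authors' implementation. `modi` follows the MODI definition of Golbraikh et al. (2014): for each class, the fraction of molecules whose first nearest neighbor belongs to the same class, averaged over classes. The rivality indexes are assumed to take the function-of-rival-similarity form (d_same - d_other) / (d_same + d_other), so that negative values mean a molecule lies closer to its own class; the centroid-based variant is assumed to replace nearest neighbors with class centroids. Euclidean distance on descriptor vectors, the aggregate `rivality_modelability` (fraction of molecules with a negative index), and all function names are hypothetical choices for illustration; the paper works with fingerprints and similarity matrices, and its exact definitions may differ.

```python
# Sketch only: the distance metric, rivality sign convention and aggregation are assumptions.
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors (an assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def modi(X, y):
    """Class-averaged fraction of molecules whose first nearest neighbor shares their class."""
    same, total = defaultdict(int), defaultdict(int)
    for i, xi in enumerate(X):
        # Nearest neighbor of molecule i, excluding the molecule itself.
        j = min((k for k in range(len(X)) if k != i), key=lambda k: euclidean(xi, X[k]))
        total[y[i]] += 1
        same[y[i]] += int(y[j] == y[i])
    return sum(same[c] / total[c] for c in total) / len(total)


def _rivality(d_same, d_other):
    """Ratio in [-1, 1]; negative means the molecule is closer to its own class."""
    denom = d_same + d_other
    return 0.0 if denom == 0 else (d_same - d_other) / denom


def rivality_nn(X, y, i):
    """Hypothetical nearest-neighbor rivality index (assumes every class has >= 2 members)."""
    d_same = min(euclidean(X[i], X[k]) for k in range(len(X)) if k != i and y[k] == y[i])
    d_other = min(euclidean(X[i], X[k]) for k in range(len(X)) if y[k] != y[i])
    return _rivality(d_same, d_other)


def class_centroids(X, y):
    """Mean descriptor vector of each class."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return {c: [sum(col) / len(pts) for col in zip(*pts)] for c, pts in groups.items()}


def rivality_centroid(X, y, i, centroids):
    """Hypothetical centroid-based variant: class centroids replace nearest neighbors."""
    d_same = euclidean(X[i], centroids[y[i]])
    d_other = min(euclidean(X[i], v) for c, v in centroids.items() if c != y[i])
    return _rivality(d_same, d_other)


def rivality_modelability(indexes):
    """Hypothetical aggregate: fraction of molecules with a negative rivality index."""
    return sum(1 for r in indexes if r < 0) / len(indexes)


def r_squared(u, v):
    """Squared Pearson correlation, e.g. between per-dataset CCR and modelability."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u)
    sv = sum((b - mv) ** 2 for b in v)
    return cov * cov / (su * sv)


if __name__ == "__main__":
    # Toy dataset: two well-separated clusters labelled 0 and 1.
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    y = [0, 0, 0, 1, 1, 1]
    cents = class_centroids(X, y)
    print(modi(X, y))
    print(rivality_modelability([rivality_nn(X, y, i) for i in range(len(X))]))
    print(rivality_modelability([rivality_centroid(X, y, i, cents) for i in range(len(X))]))
```

Run as a script, the toy dataset of two well-separated clusters prints 1.0 for all three measures. To reproduce the kind of comparison reported in the abstract, one would compute a modelability value per dataset, obtain the correct classification rate from five-fold cross-validation, and pass both lists to `r_squared`.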


Similar content being viewed by others

Dataset Modelability by QSAR: Continuous Response Variable

Modelability Criteria: Statistical Characteristics Estimating Feasibility to Build Predictive QSAR Models for a Dataset

MASSA Algorithm: an automated rational sampling of training and test subsets for QSAR modeling (Article, 07 October 2023)


Funding

No funding supported this manuscript.

Author information

Authors and Affiliations

Department of Computing and Numerical Analysis, University of Córdoba, Campus de Rabanales, Albert Einstein building, 14071, Córdoba, Spain
Irene Luque Ruiz & Miguel Ángel Gómez-Nieto


Contributions

ILR and MAGN shared all design and experimental tasks, as well as the development of the study and the manuscript.

Corresponding author

Correspondence to Irene Luque Ruiz.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Availability of data and material

A Word file containing the results of the prediction models built using fingerprints and similarity matrices as input data to the algorithms.


About this article

Cite this article

Luque Ruiz, I., Gómez-Nieto, M.Á. Prediction of the datasets modelability for the building of QSAR classification models by means of the centroid based rivality index. J Math Chem 57, 1374–1393 (2019). https://doi.org/10.1007/s10910-018-0972-8


Received: 11 June 2018
Accepted: 24 October 2018
Issue Date: 15 May 2019
DOI: https://doi.org/10.1007/s10910-018-0972-8

