Which Performance Parameters Are Best Suited to Assess the Predictive Ability of Models?

  • Károly Héberger
  • Anita Rácz
  • Dávid Bajusz
Part of the Challenges and Advances in Computational Chemistry and Physics book series (COCH, volume 24)


We revisit the lively debate in the QSAR literature concerning the use of external validation versus cross-validation, and present a thorough statistical comparison of model performance parameters using the recently published SRD (sum of (absolute) ranking differences) method and analysis of variance (ANOVA). Two case studies were investigated, one of which used external performance merits exclusively. For both case studies, SRD coupled with ANOVA shows unambiguously that the performance merits differ significantly, independently of data preprocessing. Although external merits are generally less consistent (farther from the reference) than merits based on the training set and on cross-validation, they still show a clear ordering and grouping pattern. The results presented here corroborate our recently published findings (SAR QSAR Environ. Res., 2015, 26, 683–700) that external validation is not necessarily a wise choice and is frequently comparable to a random evaluation of the models.
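To make the SRD idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. The rows of a table are the objects being compared (models), the columns are performance merits, and each merit's SRD is the sum of absolute differences between its ranking of the models and the ranking given by a reference. Several assumptions apply. The reference is taken to be the row-wise mean of all merits, a common consensus choice when no gold standard exists. Tied values share averaged ranks. All merits are assumed to point the same way (larger is better) and to lie on comparable scales, so the chapter's data preprocessing and its ANOVA step are not reproduced. The random comparison is a simple permutation stand-in for the randomization test used with SRD. All function names and the data are hypothetical.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srd(column, reference):
    """Sum of absolute differences between the ranks of one column and of the reference."""
    return sum(abs(a - b) for a, b in zip(average_ranks(column), average_ranks(reference)))


def srd_all(table):
    """SRD value of every column of `table` (rows = models, columns = merits).

    Assumption: the reference is the row-wise mean of all columns, so the
    columns must already point the same way and be on comparable scales.
    """
    reference = [mean(row) for row in table]
    return [srd([row[c] for row in table], reference) for c in range(len(table[0]))]


def random_srd(n_objects, trials=10000, seed=0):
    """SRD values of random permutations against a fixed reference ranking."""
    rng = random.Random(seed)
    reference = list(range(1, n_objects + 1))
    values = []
    for _ in range(trials):
        perm = reference[:]
        rng.shuffle(perm)
        values.append(sum(abs(a - b) for a, b in zip(perm, reference)))
    return values


def fraction_at_or_below(value, random_values):
    """Share of random rankings that are at least as close to the reference as `value`."""
    return sum(v <= value for v in random_values) / len(random_values)


if __name__ == "__main__":
    # Hypothetical data: 4 models (rows) x 3 merits (columns), larger = better.
    table = [
        [0.90, 0.85, 0.60],
        [0.80, 0.82, 0.70],
        [0.70, 0.60, 0.90],
        [0.60, 0.65, 0.50],
    ]
    scores = srd_all(table)
    print(scores)  # [0.0, 2.0, 4.0]
    rnd = random_srd(len(table))
    for c, s in enumerate(scores):
        share = fraction_at_or_below(s, rnd)
        print(f"merit {c}: SRD = {s}, share of random rankings as close = {share:.3f}")
```

A smaller SRD means a merit ranks the models more like the reference. In this toy example, merit 0 reproduces the consensus ranking exactly, and merit 2 is the farthest from it. With only four models, a sizable share of random rankings already reaches SRD = 4. This shows why the number of ranked objects matters when an SRD value is compared with the random distribution.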


Keywords: Performance parameters (merits); Ranking; Cross-validation; External validation; QSAR modeling



The work was supported by the Hungarian Scientific Research Fund (OTKA, grant number K 119269).


  1. Andrić, F., Bajusz, D., Rácz, A., et al. (2016). Multivariate assessment of lipophilicity scales: Computational and reversed phase thin-layer chromatographic indices. Journal of Pharmaceutical and Biomedical Analysis, 127, 81–93. doi: 10.1016/j.jpba.2016.04.001.
  2. Bajusz, D., Rácz, A., & Héberger, K. (2015). Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? Journal of Cheminformatics, 7, 20. doi: 10.1186/s13321-015-0069-3.
  3. Chirico, N., & Gramatica, P. (2011). Real external predictivity of QSAR models: How to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. Journal of Chemical Information and Modeling, 51, 2320–2335. doi: 10.1021/ci200211n.
  4. Consonni, V., Ballabio, D., & Todeschini, R. (2010). Evaluation of model predictive ability by external validation techniques. Journal of Chemometrics, 24, 194–201. doi: 10.1002/cem.1290.
  5. Esbensen, K. H., & Geladi, P. (2010). Principles of proper validation: Use and abuse of re-sampling for validation. Journal of Chemometrics, 24, 168–187. doi: 10.1002/cem.1310.
  6. Gramatica, P. (2014). External evaluation of QSAR models, in addition to cross-validation: Verification of predictive capability on totally new chemicals. Molecular Informatics, 33, 311–314. doi: 10.1002/minf.201400030.
  7. Gramatica, P., Cassani, S., Roy, P. P., et al. (2012). QSAR Modeling is not “push a button and find a correlation”: A case study of toxicity of (Benzo-)triazoles on Algae. Molecular Informatics, 31, 817–835. doi: 10.1002/minf.201200075.
  8. Gramatica, P., Chirico, N., Papa, E., et al. (2013). QSARINS: A new software for the development, analysis, and validation of QSAR MLR models. Journal of Computational Chemistry, 34, 2121–2132. doi: 10.1002/jcc.23361.
  9. Gütlein, M., Helma, C., Karwath, A., & Kramer, S. (2013). A large-scale empirical evaluation of cross-validation and external test set validation in (Q)SAR. Molecular Informatics, 32, 516–528. doi: 10.1002/minf.201200134.
  10. Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). Cross-validation. In The elements of statistical learning: Data mining, inference, and prediction (2nd ed., pp. 241–249). New York: Springer.
  11. Hawkins, D. M. (2004). The problem of overfitting. Journal of Chemical Information and Computer Sciences, 44, 1–12. doi: 10.1021/ci0342472.
  12. Hawkins, D. M., Basak, S. C., & Mills, D. (2003). Assessing model fit by cross-validation. Journal of Chemical Information and Computer Sciences, 43, 579–586. doi: 10.1021/ci025626i.
  13. Héberger, K. (2010). Sum of ranking differences compares methods or models fairly. TrAC Trends in Analytical Chemistry, 29, 101–109.
  14. Héberger, K., Kolarević, S., Kračun-Kolarević, M., et al. (2014). Evaluation of single-cell gel electrophoresis data: Combination of variance analysis with sum of ranking differences. Mutation Research, Genetic Toxicology and Environmental Mutagenesis, 771, 15–22. doi: 10.1016/j.mrgentox.2014.04.028.
  15. Kollár-Hunek, K., & Héberger, K. (2013). Method and model comparison by sum of ranking differences in cases of repeated observations (ties). Chemometrics and Intelligent Laboratory Systems, 127, 139–146. doi: 10.1016/j.chemolab.2013.06.007.
  16. Lin, L. I.-K. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45, 255–268.
  17. Lin, L. I.-K. (1992). Assay validation using the concordance correlation coefficient. Biometrics, 48, 599. doi: 10.2307/2532314.
  18. Lindman, H. R. (1991). Analysis of variance in experimental design. New York: Springer.
  19. Miller, A. (1990). Subset selection in regression. London: Chapman and Hall.
  20. Rácz, A., Bajusz, D., & Héberger, K. (2015). Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters. SAR and QSAR in Environmental Research, 26, 683–700. doi: 10.1080/1062936X.2015.1084647.
  21. Roy, K., Das, R. N., Ambure, P., & Aher, R. B. (2016). Be aware of error measures. Further studies on validation of predictive QSAR models. Chemometrics and Intelligent Laboratory Systems, 152, 18–33. doi: 10.1016/j.chemolab.2016.01.008.
  22. Schüürmann, G., Ebert, R.-U., Chen, J., et al. (2008). External validation and prediction employing the predictive squared correlation coefficient: Test set activity mean vs training set activity mean. Journal of Chemical Information and Modeling, 48, 2140–2145. doi: 10.1021/ci800253u.
  23. Shi, L. M., Fang, H., Tong, W., et al. (2001). QSAR models using a large diverse set of estrogens. Journal of Chemical Information and Computer Sciences, 41, 186–195. doi: 10.1021/ci000066d.
  24. Silla, J. M., Nunes, C. A., Cormanich, R. A., et al. (2011). MIA-QSPR and effect of variable selection on the modeling of kinetic parameters related to activities of modified peptides against dengue type 2. Chemometrics and Intelligent Laboratory Systems, 108, 146–149. doi: 10.1016/j.chemolab.2011.06.009.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Plasma Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
  2. Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
