Performance-Estimation Properties of Cross-Validation-Based Protocols with Simultaneous Hyper-Parameter Optimization

  • Ioannis Tsamardinos
  • Amin Rakhshani
  • Vincenzo Lagani
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8445)


In a typical supervised data analysis task, one needs to perform two tasks: (a) select the best combination of learning methods (e.g., for variable selection and classification) and tune their hyper-parameters (e.g., K in K-NN), also called model selection, and (b) provide an estimate of the performance of the final, reported model. Combining the two tasks is not trivial: when one selects the set of hyper-parameters that seems to provide the best estimated performance, that estimate is optimistic (biased / overfitted) because of the multiple statistical comparisons performed. In this paper, we confirm that simple Cross-Validation with model selection is indeed optimistic (it overestimates performance) in small-sample scenarios. In comparison, Nested Cross-Validation and the method by Tibshirani and Tibshirani provide conservative estimates, with the latter protocol being more computationally efficient. We also examine the role of stratification of samples and show that stratification is beneficial.
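The three protocols named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy 1-D K-NN classifier, pure-noise labels (so any classifier's true error rate is 0.5), and plain unstratified folds. The helper names (knn_predict, fold_errors, make_folds, cvm_and_tt, nested_cv) are hypothetical. The bias term follows the usual description of the Tibshirani and Tibshirani correction: the average, over folds, of the gap between the error of the globally selected hyper-parameter and the best error attainable on that fold.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def fold_errors(data, folds, ks):
    """errors[f][j]: error rate on fold f of K-NN with ks[j], trained on the other folds."""
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        row = []
        for k in ks:
            wrong = sum(knn_predict(train, data[i][0], k) != data[i][1]
                        for i in test_idx)
            row.append(wrong / len(test_idx))
        errors.append(row)
    return errors

def make_folds(n, n_folds, rng):
    """Random, unstratified partition of range(n) into n_folds folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def cvm_and_tt(data, ks, n_folds, rng):
    """Return (selected k, CV-with-model-selection estimate, TT-corrected estimate)."""
    folds = make_folds(len(data), n_folds, rng)
    e = fold_errors(data, folds, ks)
    # Cross-validated error of each candidate k (mean of per-fold error rates).
    cv = [sum(row[j] for row in e) / n_folds for j in range(len(ks))]
    best = min(range(len(ks)), key=cv.__getitem__)
    # TT bias: per fold, error of the selected k minus the best error on that fold.
    bias = sum(row[best] - min(row) for row in e) / n_folds
    return ks[best], cv[best], cv[best] + bias

def nested_cv(data, ks, n_folds, rng):
    """Outer folds estimate performance; model selection sees only the outer training part."""
    folds = make_folds(len(data), n_folds, rng)
    outer_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        best_k, _, _ = cvm_and_tt(train, ks, n_folds - 1, rng)
        wrong = sum(knn_predict(train, data[i][0], best_k) != data[i][1]
                    for i in test_idx)
        outer_errors.append(wrong / len(test_idx))
    return sum(outer_errors) / n_folds

# Toy data: 40 samples whose labels are independent of the feature.
rng = random.Random(0)
data = [(rng.random(), rng.randint(0, 1)) for _ in range(40)]
ks = [1, 3, 5, 7, 9]

best_k, cvm, tt = cvm_and_tt(data, ks, 5, rng)
ncv = nested_cv(data, ks, 5, rng)
print(f"selected K = {best_k}")
print(f"CV with model selection error: {cvm:.3f}")
print(f"TT-corrected error:            {tt:.3f}")
print(f"nested CV error:               {ncv:.3f}")
```

In this sketch the correction reuses the per-fold errors already computed during model selection, whereas nested cross-validation repeats the whole selection once per outer fold; that difference is the source of the computational saving mentioned in the abstract. A single run on 40 samples is noisy, so the individual numbers should not be read as evidence about bias; averaging over many simulated datasets would be needed for that.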





References

  1. Anguita, D., Ghio, A., Oneto, L., Ridella, S.: In-Sample and Out-of-Sample Model Selection and Error Estimation for Support Vector Machines. IEEE Trans. Neural Networks Learn. Syst. 23, 1390–1406 (2012)
  2. Cawley, G.C., Talbot, N.L.C.: On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010)
  3. Jensen, D.D., Cohen, P.R.: Multiple comparisons in induction algorithms. Mach. Learn. 38, 309–338 (2000)
  4. Kohavi, R.: A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In: International Joint Conference on Artificial Intelligence, pp. 1137–1143 (1995)
  5. Tibshirani, R.J., Tibshirani, R.: A bias correction for the minimum error rate in cross-validation. Ann. Appl. Stat. 3, 822–829 (2009)
  6. Statnikov, A., Aliferis, C.F., Tsamardinos, I., Hardin, D., Levy, S.: A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 21, 631–643 (2005)
  7. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. (2000)
  8. Mitchell, T.M.: Machine Learning (1997)
  9. Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics) (2006)
  10. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Elements 1, 337–387 (2009)
  11. Bengio, Y., Grandvalet, Y.: Bias in Estimating the Variance of K-Fold Cross-Validation. Statistical Modeling and Analysis for Complex Data Problem, 75–95 (2005)
  12. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann Series in Data Management Systems (2005)
  13. Lagani, V., Tsamardinos, I.: Structure-based variable selection for survival data. Bioinformatics 26, 1887–1894 (2010)
  14. Statnikov, A., Tsamardinos, I., Dosbayev, Y., Aliferis, C.F.: GEMS: A system for automated cancer diagnosis and biomarker discovery from microarray gene expression data. Int. J. Med. Inform. 74, 491–503 (2005)
  15. Salzberg, S.: On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Min. Knowl. Discov. 328, 317–328 (1997)
  16. Iizuka, N., Oka, M., Yamada-Okabe, H., Nishida, M., Maeda, Y., Mori, N., Takao, T., Tamesa, T., Tangoku, A., Tabuchi, H., Hamada, K., Nakayama, H., Ishitsuka, H., Miyamoto, T., Hirabayashi, A., Uchimura, S., Hamamoto, Y.: Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection. Lancet 361, 923–929 (2003)
  17. Kurgan, L.A., Cios, K.J., Tadeusiewicz, R., Ogiela, M., Goodenday, L.S.: Knowledge discovery approach to automated cardiac SPECT diagnosis. Artif. Intell. Med. 23, 149–169 (2001)
  18. Bock, R.K., Chilingarian, A., Gaug, M., Hakl, F., Hengstebeck, T., Jiřina, M., Klaschka, J., Kotrč, E., Savický, P., Towers, S., Vaiciulis, A., Wittek, W.: Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip. 516, 511–528 (2004)
  19. Mansouri, K., Ringsted, T., Ballabio, D., Todeschini, R., Consonni, V.: Quantitative structure-activity relationship models for ready biodegradability of chemicals. J. Chem. Inf. Model. 53, 867–878 (2013)
  20. Moro, S., Laureano, R.M.S.: Using Data Mining for Bank Direct Marketing: An application of the CRISP-DM methodology. In: Eur. Simul. Model. Conf., pp. 117–121 (2011)
  21. Bendall, S.C., Simonds, E.F., Qiu, P., Amir, E.D., Krutzik, P.O., Finck, R., Bruggner, R.V., Melamed, R., Trejo, A., Ornatsky, O.I., Balderas, R.S., Plevritis, S.K., Sachs, K., Pe’er, D., Tanner, S.D., Nolan, G.P.: Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 332, 687–696 (2011)
  22. Fawcett, T.: An introduction to ROC analysis (2006)
  23. Coppersmith, D., Hong, S.J., Hosking, J.R.M.: Partitioning Nominal Attributes in Decision Trees. Data Min. Knowl. Discov. 3, 197–217 (1999)
  24. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst..... 2, 1–39 (2011)
  25. O’Brien, R.G.: A General ANOVA Method for Robust Tests of Additive Models for Variances. J. Am. Stat. Assoc. 74, 877–880 (1979)
  26. Quenouille, M.H.: Approximate tests of correlation in time-series 3 (1949)
  27. Varma, S., Simon, R.: Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 7, 91 (2006)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ioannis Tsamardinos (1, 2)
  • Amin Rakhshani (1, 2)
  • Vincenzo Lagani (1)

  1. Institute of Computer Science, Foundation for Research and Technology Hellas, Heraklion, Greece
  2. Computer Science Department, University of Crete, Heraklion, Greece
