
An empirical comparison of \(V\)-fold penalisation and cross-validation for model selection in distribution-free regression

  • Theoretical Advances
  • Published in Pattern Analysis and Applications

Abstract

Model selection is a crucial issue in machine learning, and a wide variety of penalisation methods (with possibly data-dependent complexity penalties) have recently been introduced for this purpose. However, their empirical performance is generally not well documented in the literature. The goal of this paper is to investigate to what extent such recent techniques can be successfully used for tuning both the regularisation and kernel parameters in support vector regression (SVR) and the complexity measure in regression trees (CART). This task is traditionally solved via \(V\)-fold cross-validation (VFCV), which gives efficient results for a reasonable computational cost. A disadvantage of VFCV, however, is that it is known to provide an asymptotically suboptimal risk estimate as the number of examples tends to infinity. Recently, a penalisation procedure called \(V\)-fold penalisation has been proposed to improve on VFCV, supported by theoretical arguments. Here, we report on an extensive set of experiments comparing \(V\)-fold penalisation and VFCV for SVR/CART calibration on several benchmark datasets. We highlight cases in which VFCV and \(V\)-fold penalisation each provide poor estimates of the risk, and introduce a modified penalisation technique to reduce the estimation error.
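The abstract describes \(V\)-fold penalisation but does not define it. As an illustrative sketch only (not the paper's exact procedure), the criterion in Arlot's formulation adds to the empirical risk of the model trained on the full sample a penalty of the form \(\frac{V-1}{V}\sum_{j=1}^{V}\big[P_n\gamma(\hat{s}^{(-j)}) - P_n^{(-j)}\gamma(\hat{s}^{(-j)})\big]\), where \(\hat{s}^{(-j)}\) is trained with fold \(j\) removed. The sketch below applies this idea to tuning the SVR regularisation parameter \(C\); the synthetic data, squared-error loss, and the parameter grid are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def vfold_penalised_risk(make_model, X, y, V=5):
    """Sketch of a V-fold penalised risk estimate (Arlot-style penalty)."""
    # Empirical risk of the model trained on the full sample.
    full_model = make_model().fit(X, y)
    emp_risk = mean_squared_error(y, full_model.predict(X))
    # Penalty: for each fold j, (risk of the fold-j-removed model on the full
    # sample) minus (its risk on its own training sample), scaled by (V-1)/V.
    penalty = 0.0
    for train_idx, _ in KFold(n_splits=V, shuffle=True, random_state=0).split(X):
        m = make_model().fit(X[train_idx], y[train_idx])
        penalty += (mean_squared_error(y, m.predict(X))
                    - mean_squared_error(y[train_idx], m.predict(X[train_idx])))
    return emp_risk + (V - 1) / V * penalty

# Synthetic 1-D regression data (illustrative only).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(150, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(150)

# Tune the SVR regularisation parameter C by minimising the penalised criterion.
grid = [0.01, 0.1, 1.0, 10.0]
scores = {c: vfold_penalised_risk(lambda c=c: SVR(kernel="rbf", C=c, gamma=1.0), X, y)
          for c in grid}
best_C = min(scores, key=scores.get)
```

The same loop with the penalty replaced by the average held-out fold error would give the classical VFCV criterion, which is the baseline the paper compares against.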




Author information

Correspondence to Charanpal Dhanjal.


Cite this article

Dhanjal, C., Baskiotis, N., Clémençon, S. et al. An empirical comparison of \(V\)-fold penalisation and cross-validation for model selection in distribution-free regression. Pattern Anal Applic 19, 41–53 (2016). https://doi.org/10.1007/s10044-014-0381-y

