
Pattern Analysis and Applications, Volume 19, Issue 1, pp 41–53

An empirical comparison of \(V\)-fold penalisation and cross-validation for model selection in distribution-free regression

  • Charanpal Dhanjal
  • Nicolas Baskiotis
  • Stéphan Clémençon
  • Nicolas Usunier
Theoretical Advances

Abstract

Model selection is a crucial issue in machine learning, and a wide variety of penalisation methods (with possibly data-dependent complexity penalties) have recently been introduced for this purpose. However, their empirical performance is generally not well documented in the literature. The goal of this paper is to investigate to what extent such recent techniques can be successfully used for tuning both the regularisation and kernel parameters in support vector regression (SVR) and the complexity measure in regression trees (CART). This task is traditionally solved via \(V\)-fold cross-validation (VFCV), which gives efficient results for a reasonable computational cost. A disadvantage of VFCV, however, is that the procedure is known to provide an asymptotically suboptimal risk estimate as the number of examples tends to infinity. Recently, a penalisation procedure called \(V\)-fold penalisation has been proposed to improve on VFCV, supported by theoretical arguments. Here, we report on an extensive set of experiments comparing \(V\)-fold penalisation and VFCV for SVR/CART calibration on several benchmark datasets. We highlight cases in which VFCV and \(V\)-fold penalisation each provide poor estimates of the risk, and introduce a modified penalisation technique to reduce the estimation error.
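The VFCV baseline discussed in the abstract can be illustrated with a short sketch: the data are split into \(V\) folds, each model in the candidate family is trained on \(V-1\) folds and evaluated on the held-out fold, and the model minimising the averaged held-out error is selected. The sketch below is a minimal pure-Python illustration; the toy \(k\)-nearest-neighbour regressor, the synthetic data, and the interleaved fold scheme are illustrative assumptions, not the paper's SVR/CART experimental setup.

```python
import random

def knn_predict(train, x, k):
    # Predict by averaging the targets of the k nearest training points
    # (inputs are one-dimensional for simplicity).
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in neighbours) / k

def vfold_risk(data, k, V):
    # V-fold cross-validation estimate of the mean squared prediction risk:
    # train on V-1 folds, measure squared error on the held-out fold, average.
    folds = [data[j::V] for j in range(V)]
    total, count = 0.0, 0
    for j in range(V):
        held_out = folds[j]
        train = [p for i, f in enumerate(folds) if i != j for p in f]
        for x, y in held_out:
            err = knn_predict(train, x, k) - y
            total += err * err
            count += 1
    return total / count

# Toy data: noisy samples of y = x^2 on [0, 1].
random.seed(0)
data = [(x, x * x + random.gauss(0, 0.05))
        for x in (random.random() for _ in range(200))]

# Model selection: pick the smoothing parameter k minimising the VFCV risk.
risks = {k: vfold_risk(data, k, V=5) for k in (1, 3, 5, 9, 15)}
best_k = min(risks, key=risks.get)
```

The \(V\)-fold penalisation procedure compared against VFCV in the paper replaces the averaged held-out error with the empirical risk plus a data-driven penalty built from the same fold-wise estimators; the sketch above covers only the VFCV side of that comparison.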

Keywords

V-fold penalisation · Cross-validation · Model selection · SVR · CART


Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Charanpal Dhanjal (1, corresponding author)
  • Nicolas Baskiotis (1)
  • Stéphan Clémençon (2)
  • Nicolas Usunier (1)
  1. UPMC, LIP6, Paris, France
  2. Télécom ParisTech, Paris, France
