Abstract
Model selection is a central issue in machine learning, and a wide variety of penalisation methods (with possibly data-dependent complexity penalties) have recently been introduced for this purpose. However, their empirical performance is generally not well documented in the literature. The goal of this paper is to investigate to what extent such recent techniques can be successfully used for tuning both the regularisation and kernel parameters in support vector regression (SVR) and the complexity measure in regression trees (CART). This task is traditionally solved via \(V\)-fold cross-validation (VFCV), which gives efficient results at a reasonable computational cost. A disadvantage of VFCV, however, is that it is known to provide an asymptotically suboptimal estimate of the risk as the number of examples tends to infinity. Recently, a penalisation procedure called \(V\)-fold penalisation has been proposed to improve on VFCV, supported by theoretical arguments. Here, we report on an extensive set of experiments comparing \(V\)-fold penalisation and VFCV for SVR/CART calibration on several benchmark datasets. We highlight cases in which VFCV and \(V\)-fold penalisation, respectively, provide poor estimates of the risk, and introduce a modified penalisation technique to reduce the estimation error.
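The VFCV baseline discussed in the abstract can be sketched with scikit-learn. This is a minimal illustration of tuning the SVR regularisation parameter \(C\) and RBF kernel width \(\gamma\) by \(V\)-fold cross-validation; the synthetic data, the parameter grid, and the choice \(V = 5\) are assumptions for the example, not the paper's experimental settings.

```python
# Illustrative sketch of VFCV for SVR hyperparameter tuning.
# The data, grid, and V = 5 are arbitrary choices for the example.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(200)  # noisy regression target

# Candidate (C, gamma) pairs for the RBF-kernel SVR.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.1, 1.0]}

# V-fold cross-validation with V = 5: each candidate is trained on
# V-1 folds and its empirical risk estimated on the held-out fold.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      cv=cv, scoring="neg_mean_squared_error")
search.fit(X, y)

best = search.best_params_  # pair minimising the VFCV risk estimate
```

The \(V\)-fold penalisation procedure studied in the paper replaces the hold-out risk estimate above with a penalised empirical risk; only the VFCV baseline is shown here.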
Cite this article
Dhanjal, C., Baskiotis, N., Clémençon, S. et al. An empirical comparison of \(V\)-fold penalisation and cross-validation for model selection in distribution-free regression. Pattern Anal Applic 19, 41–53 (2016). https://doi.org/10.1007/s10044-014-0381-y