Abstract
We discuss a model selection procedure, the adaptive ridge selector, derived from a hierarchical Bayes argument, which yields a simple and efficient fitting algorithm. The hierarchical model utilized resembles an unreplicated variance components model and leads to weighting of the covariates. We discuss the intuition behind this type of estimator and investigate its behavior as a regularized least squares procedure. While related alternatives have recently been exploited to simultaneously fit and select variables/features in regression models (Tipping in J Mach Learn Res 1:211–244, 2001; Figueiredo in IEEE Trans Pattern Anal Mach Intell 25:1150–1159, 2003), the extension presented here shows considerable improvement in model selection accuracy in several important cases. We also compare this estimator's model selection performance to that of the lasso and adaptive lasso solution paths. Under randomized experimentation, we show that when the underlying model is sparse, a fixed choice of tuning parameter yields model selection accuracy superior to the entire solution paths of the lasso and adaptive lasso. Finally, we provide a robust version of the algorithm suitable for cases where outliers may be present.
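The abstract describes the fitting algorithm only at a high level. As a rough illustration of how an iteratively reweighted ridge scheme of this general kind can be implemented, the Python sketch below alternates a weighted ridge solve with a penalty-weight update; the update 1/(beta_j^2 + eps), the t-style observation weights used for the robust variant, and all tuning defaults are illustrative assumptions, not the authors' exact derivation.

    import numpy as np

    def adaptive_ridge(X, y, lam=1.0, nu=None, eps=1e-8, tol=1e-6, max_iter=200):
        """Illustrative iteratively reweighted ridge regression.

        Coefficients of small magnitude receive ever larger penalty
        weights and are driven toward zero, giving the adaptive-shrinkage
        behavior described in the abstract. Passing a degrees-of-freedom
        value nu switches on t-style observation weights that downweight
        outlying residuals (a crude stand-in for the robust variant).
        """
        n, p = X.shape
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
        obs_w = np.ones(n)
        for _ in range(max_iter):
            if nu is not None:
                r = y - X @ beta
                s2 = np.mean(r ** 2)                  # crude residual scale
                obs_w = (nu + 1.0) / (nu + r ** 2 / s2)
            pen_w = 1.0 / (beta ** 2 + eps)           # adaptive penalty weights
            XtW = X.T * obs_w                         # X'W via column scaling
            beta_new = np.linalg.solve(XtW @ X + lam * np.diag(pen_w), XtW @ y)
            if np.max(np.abs(beta_new - beta)) < tol:
                return beta_new
            beta = beta_new
        return beta

Coefficients whose penalty weights blow up are shrunk essentially to zero, so thresholding the fitted beta at a small cutoff performs the variable selection in this sketch.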
References
Angers JF, Berger JO (1991) Robust hierarchical Bayes estimation of exchangeable means. Can J Stat 19: 39–56
Bae K, Mallick BK (2004) Gene selection using a two-level hierarchical Bayesian model. Bioinformatics 20(18): 3423–3430
Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37(4): 373–384
Breiman L, Friedman JH (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80(391): 580–598
Brown P, Vannucci M, Fearn T (1998) Multivariable Bayesian variable selection and prediction. J R Stat Soc Series B 60(3): 627–641
Candes E, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat 35(6): 2313–2351
Casella G, Moreno E (2006) Objective Bayesian variable selection. J Am Stat Assoc 101(473): 157–167
Chen MH, Shao QM, Ibrahim JG (2001) Monte Carlo methods in Bayesian computation. Springer, Berlin
Chipman H, George EI, McCulloch RE (2001) The practical implementation of Bayesian model selection. IMS Lecture Notes - Monograph Series, vol 38, pp 65–116
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2): 407–499
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96: 1348–1360
Figueiredo MAT (2003) Adaptive sparseness for supervised learning. IEEE Trans Pattern Anal Mach Intell 25: 1150–1159
George EI (2000) The variable selection problem. J Am Stat Assoc 95: 1304–1308
Geweke J (1993) Bayesian treatment of the independent Student-t linear model. J Appl Econom 8(S1): S19–S40
Griffin JE, Brown PJ (2007) Bayesian adaptive lassos with non-convex penalization. Technical Report 07-2v2, Centre for Research in Statistical Methodology, University of Warwick, UK
Harville DA (1977) Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc 72(358): 320–338
Hoerl AE, Kennard RW (2000) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42(1): 80–86
Johnstone IM, Silverman BW (2005) Empirical Bayes selection of wavelet thresholds. Ann Stat 33: 1700–1752
Kiiveri H (2003) A Bayesian approach to variable selection when the number of variables is very large. IMS Lecture Notes - Monograph Series 40: 127–143
Lindley DV, Smith AFM (1972) Bayes estimates for the linear model. J R Stat Soc Series B 34(1): 1–41
O’Hagan A (1976) On posterior joint and marginal modes. Biometrika 63(2): 329–333
Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103: 681–686
Smith M, Kohn R (1996) Nonparametric regression using Bayesian variable selection. J Econom 75(2): 317–343
Sorensen D, Gianola D (2002) Likelihood, Bayesian, and MCMC methods in quantitative genetics. Springer, New York
Sun L, Hsu JSJ, Guttman I, Leonard T (1996) Bayesian methods for variance component models. J Am Stat Assoc 91(434): 743–752
ter Braak CJF (2006) Bayesian sigmoid shrinkage with improper variance priors and an application to wavelet denoising. Comput Stat Data Anal 51(2): 1232–1242
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Series B 58(1): 267–288
Tipping ME (2001) Sparse Bayesian learning and the relevance vector machine. J Mach Learn Res 1: 211–244
Yuan M, Lin Y (2005) Efficient empirical Bayes variable selection and estimation in linear models. J Am Stat Assoc 100(472): 1215–1225
Zhao P, Yu B (2006) On model selection consistency of lasso. J Mach Learn Res 7: 2541–2563
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101: 1418–1429
Cite this article
Armagan, A., Zaretzki, R.L. Model selection via adaptive shrinkage with t priors. Comput Stat 25, 441–461 (2010). https://doi.org/10.1007/s00180-010-0186-4