Abstract
We discuss a model selection procedure, the adaptive ridge selector, derived from a hierarchical Bayes argument, which yields a simple and efficient fitting algorithm. The hierarchical model utilized resembles an unreplicated variance components model and leads to weighting of the covariates. We discuss the intuition behind this type of estimator and investigate its behavior as a regularized least squares procedure. While related alternatives have recently been exploited to simultaneously fit and select variables/features in regression models (Tipping in J Mach Learn Res 1:211–244, 2001; Figueiredo in IEEE Trans Pattern Anal Mach Intell 25:1150–1159, 2003), the extension presented here shows considerable improvement in model selection accuracy in several important cases. We also compare this estimator's model selection performance to that of the lasso and adaptive lasso solution paths. Under randomized experimentation, we show that when the underlying model is sparse, a fixed choice of tuning parameter yields model selection accuracy superior to the entire solution paths of the lasso and adaptive lasso. Finally, we provide a robust version of the algorithm suitable for cases where outliers may be present.
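The abstract describes the fitting algorithm only at a high level. As a rough illustration of how an iteratively reweighted ridge scheme of this general kind can be implemented, the Python sketch below alternates a weighted ridge solve with a penalty-weight update; the update 1/(beta_j^2 + eps), the t-style observation weights used for the robust variant, and all tuning defaults are illustrative assumptions, not the authors' exact derivation.

    import numpy as np

    def adaptive_ridge(X, y, lam=1.0, nu=None, eps=1e-8, tol=1e-6, max_iter=200):
        """Illustrative iteratively reweighted ridge regression.

        Coefficients of small magnitude receive ever larger penalty
        weights and are driven toward zero, giving the adaptive-shrinkage
        behavior described in the abstract. Passing a degrees-of-freedom
        value nu switches on t-style observation weights that downweight
        outlying residuals (a crude stand-in for the robust variant).
        """
        n, p = X.shape
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
        obs_w = np.ones(n)
        for _ in range(max_iter):
            if nu is not None:
                r = y - X @ beta
                s2 = np.mean(r ** 2)                  # crude residual scale
                obs_w = (nu + 1.0) / (nu + r ** 2 / s2)
            pen_w = 1.0 / (beta ** 2 + eps)           # adaptive penalty weights
            XtW = X.T * obs_w                         # X'W via column scaling
            beta_new = np.linalg.solve(XtW @ X + lam * np.diag(pen_w), XtW @ y)
            if np.max(np.abs(beta_new - beta)) < tol:
                return beta_new
            beta = beta_new
        return beta

Coefficients whose penalty weights blow up are shrunk essentially to zero, so thresholding the fitted beta at a small cutoff performs the variable selection in this sketch.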
References
Angers JF, Berger JO (1991) Robust hierarchical Bayes estimation of exchangeable means. Can J Stat 19: 39–56
Bae K, Mallick BK (2004) Gene selection using a two-level hierarchical Bayesian model. Bioinformatics 20(18): 3423–3430
Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37(4): 373–384
Breiman L, Friedman JH (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80(391): 580–598
Brown P, Vannucci M, Fearn T (1998) Multivariable Bayesian variable selection and prediction. J R Stat Soc Series B 60(3): 627–641
Candes E, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann Stat 35(6): 2313–2351
Casella G, Moreno E (2006) Objective Bayesian variable selection. J Am Stat Assoc 101(473): 157–167
Chen MH, Shao QM, Ibrahim JG (2001) Monte Carlo methods in Bayesian computation. Springer, Berlin
Chipman H, George EI, McCulloch RE (2001) The practical implementation of Bayesian model selection. IMS Lecture Notes - Monograph Series, vol 38, pp 65–116
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2): 407–499
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96: 1348–1360
Figueiredo MAT (2003) Adaptive sparseness for supervised learning. IEEE Trans Pattern Anal Mach Intell 25: 1150–1159
George EI (2000) The variable selection problem. J Am Stat Assoc 95: 1304–1308
Geweke J (1993) Bayesian treatment of the independent Student-t linear model. J Appl Econom 8(S1): S19–S40
Griffin JE, Brown PJ (2007) Bayesian adaptive lassos with non-convex penalization. Technical Report 07-2v2, Centre for Research in Statistical Methodology, University of Warwick, UK
Harville DA (1977) Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc 72(358): 320–338
Hoerl AE, Kennard RW (2000) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 42(1): 80–86
Johnstone IM, Silverman BW (2005) Empirical Bayes selection of wavelet thresholds. Ann Stat 33: 1700–1752
Kiiveri H (2003) A Bayesian approach to variable selection when the number of variables is very large. IMS Lecture Notes - Monograph Series 40: 127–143
Lindley DV, Smith AFM (1972) Bayes estimates for the linear model. J R Stat Soc Series B 34(1): 1–41
O’Hagan A (1976) On posterior joint and marginal modes. Biometrika 63(2): 329–333
Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103: 681–686
Smith M, Kohn R (1996) Nonparametric regression using Bayesian variable selection. J Econom 75(2): 317–343
Sorensen D, Gianola D (2002) Likelihood, Bayesian, and MCMC methods in quantitative genetics. Springer, New York
Sun L, Hsu JSJ, Guttman I, Leonard T (1996) Bayesian methods for variance component models. J Am Stat Assoc 91(434): 743–752
ter Braak CJF (2006) Bayesian sigmoid shrinkage with improper variance priors and an application to wavelet denoising. Comput Stat Data Anal 51(2): 1232–1242
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Series B 58(1): 267–288
Tipping ME (2001) Sparse Bayesian learning and the relevance vector machine. J Mach Learn Res 1: 211–244
Yuan M, Lin Y (2005) Efficient empirical Bayes variable selection and estimation in linear models. J Am Stat Assoc 100(472): 1215–1225
Zhao P, Yu B (2006) On model selection consistency of lasso. J Mach Learn Res 7: 2541–2563
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101: 1418–1429
Cite this article
Armagan, A., Zaretzki, R.L. Model selection via adaptive shrinkage with t priors. Comput Stat 25, 441–461 (2010). https://doi.org/10.1007/s00180-010-0186-4