Abstract
The stochastic search variable selection proposed by George and McCulloch (J Am Stat Assoc 88:881–889, 1993) is one of the most popular variable selection methods for linear regression models. Many approaches have been proposed in the literature to improve its computational efficiency. However, most of them change its original Bayesian formulation, so the comparisons are not fair. This work focuses on improving the computational efficiency of the stochastic search variable selection while leaving its original Bayesian formulation unchanged. The improvement is achieved by developing a new Gibbs sampling scheme, different from that of George and McCulloch (J Am Stat Assoc 88:881–889, 1993). A remarkable feature of the proposed Gibbs sampling scheme is that it samples the regression coefficients from their posterior distributions in a componentwise manner, so that the expensive computation of the inverse of the information matrix, required by the algorithm of George and McCulloch (J Am Stat Assoc 88:881–889, 1993), is avoided. Moreover, since the original Bayesian formulation remains unchanged, the stochastic search variable selection using the proposed Gibbs sampling scheme is as efficient as that of George and McCulloch (J Am Stat Assoc 88:881–889, 1993) in terms of assigning large probabilities to the promising models. Some numerical results support these findings.
References
Beattie SD, Fong DKH, Lin DKJ (2002) A two-stage Bayesian model selection strategy for supersaturated designs. Technometrics 44:55–63
Casella G, George EI (1992) Explaining the Gibbs sampler. Am Stat 46:167–174
Chen RB, Chu CH, Lai TH, Wu YN (2011) Stochastic matching pursuit for Bayesian variable selection. Stat Comput 21:247–259
Chen RB, Weng JZ, Chu CH (2013) Screening procedure for supersaturated designs using a Bayesian variable selection method. Qual Reliab Eng Int 29:89–101
Chipman H (1998) Fast model search for designed experiments with complex aliasing. Quality improvement through statistical methods. Birkhäuser, Boston
Chipman H, Hamada M, Wu CFJ (1997) A Bayesian variable selection approach for analyzing designed experiments with complex aliasing. Technometrics 39:372–381
Diebolt J, Robert C (1994) Estimation of finite mixture distributions through Bayesian sampling. J R Stat Soc Ser B 56:363–375
Draper N, Smith H (1981) Applied regression analysis, 2nd edn. Wiley, New York
Fang KT, Li R, Sudjianto A (2006) Design and modeling for computer experiments. Chapman & Hall/CRC, Boca Raton
George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88:881–889
George EI, McCulloch RE (1997) Approaches for Bayesian variable selection. Stat Sin 7:339–373
Georgiou SD (2014) Supersaturated designs: a review of their construction and analysis. J Stat Plann Inference 144:92–109
Geweke J (1996) Variable selection and model comparison in regression. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian statistics. Oxford Press, Oxford
Huang HZ, Yang JY, Liu MQ (2014) Functionally induced priors for componentwise Gibbs sampler in the analysis of supersaturated designs. Comput Stat Data Anal 72:1–12
Li R, Lin DKJ (2003) Analysis methods for supersaturated design: some comparisons. J Data Sci 1:249–260
Lin DKJ (1993) A new class of supersaturated designs. Technometrics 35:28–31
Liu Y, Liu MQ (2011) Construction of optimal supersaturated design with large number of levels. J Stat Plan Inference 141:2035–2043
Liu Y, Liu MQ (2012) Construction of equidistant and weak equidistant supersaturated designs. Metrika 75:33–53
Liu Y, Liu MQ (2013) Construction of supersaturated design with large number of factors by the complementary design method. Acta Math Appl Sin 29:253–262
Phoa FKH, Pan YH, Xu H (2009) Analysis of supersaturated designs via the Dantzig selector. J Stat Plan Inference 139:2362–2372
Shao J (2003) Mathematical statistics, 2nd edn. Springer, New York
Sun FS, Lin DKJ, Liu MQ (2011) On construction of optimal mixed-level supersaturated designs. Ann Stat 39:1310–1333
Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation (with discussion). J Am Stat Assoc 82:528–550
Thompson MB (2010) A comparison of methods for computing autocorrelation time. Technical Report No. 1007, Department of Statistics, University of Toronto
Westfall PH, Young SS, Lin DKJ (1998) Forward selection error control in the analysis of supersaturated designs. Stat Sin 8:101–117
Wu CFJ, Hamada M (2009) Experiments: planning, analysis, and optimization, 2nd edn. Wiley, New York
Yin YH, Zhang QZ, Liu MQ (2013) A two-stage variable selection strategy for supersaturated designs with multiple responses. Front Math China 8:717–730
Zhang QZ, Zhang RC, Liu MQ (2007) A method for screening active effects in supersaturated designs. J Stat Plan Inference 137:235–248
Acknowledgements
The authors thank Editor Professor Norbert Henze, and two anonymous referees for their valuable comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 11271205, 11401321 and 11431006), the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20130031110002), the “131” Talents Program of Tianjin and Project 613319. The first two authors contributed equally to this work.
Appendix
Proof of Theorem 1
The joint probability density function (pdf) of all the variables can be expressed as
where the last equality follows from the assumptions (1)–(4).
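Although the display itself is not reproduced here, under the standard SSVS hierarchy the factorization referred to has the following form (a sketch; the individual factors are fixed by the paper's assumptions (1)–(4)):

```latex
\[
[\mathbf{Y},\boldsymbol{\beta},\boldsymbol{\gamma},\sigma^2]
  = [\mathbf{Y}\mid\boldsymbol{\beta},\sigma^2]\,
    [\boldsymbol{\beta}\mid\boldsymbol{\gamma}]\,
    [\boldsymbol{\gamma}]\,[\sigma^2]
  = [\mathbf{Y}\mid\boldsymbol{\beta},\sigma^2]\,
    \Big\{\prod_{i=1}^{p}[\beta_i\mid\gamma_i]\,[\gamma_i]\Big\}\,[\sigma^2].
\]
```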
For \(i=1,\ldots ,p\), the full conditional pdf of the pair \((\beta _i,\gamma _i)\) can be expressed as
where the notation \((-i)\) means all the components except the i-th one.
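The decomposition in (10) is presumably the standard one, in which \(\gamma_i\) is sampled first and \(\beta_i\) is then sampled given \(\gamma_i\) (a sketch consistent with the two conditional pdf's discussed next):

```latex
\[
[\beta_i,\gamma_i \mid \boldsymbol{\beta}_{(-i)},\boldsymbol{\gamma}_{(-i)},\sigma^2,\mathbf{Y}]
 = [\beta_i \mid \gamma_i,\boldsymbol{\beta}_{(-i)},\boldsymbol{\gamma}_{(-i)},\sigma^2,\mathbf{Y}]\,
   [\gamma_i \mid \boldsymbol{\beta}_{(-i)},\boldsymbol{\gamma}_{(-i)},\sigma^2,\mathbf{Y}].
\]
```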
Next, we derive the closed forms for the two conditional pdf’s on the right-hand side of (10). Since the prior distribution of \(\gamma _i\) is Bernoulli, it follows from Tanner and Wong (1987) that the conditional distribution of \(\gamma _i\) given the other variables is Bernoulli as well. Thus
Similarly,
Since \( P(\gamma _i=1|\varvec{\beta }_{(-i)}, \varvec{\gamma }_{(-i)}, \sigma ^2, {\mathbf {Y}})+ P(\gamma _i=0|\varvec{\beta }_{(-i)}, \varvec{\gamma }_{(-i)}, \sigma ^2, {\mathbf {Y}})=1\), after some basic algebra we obtain that
where
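The display is not reproduced here, but for a Bernoulli prior the conditional inclusion probability takes the following standard form (a sketch; \(p_i\) denotes the prior probability \(P(\gamma_i=1)\), and \(z_i\) is the likelihood ratio whose closed form is derived next):

```latex
\[
P(\gamma_i=1\mid\boldsymbol{\beta}_{(-i)},\boldsymbol{\gamma}_{(-i)},\sigma^2,\mathbf{Y})
  = \frac{p_i\,z_i}{p_i\,z_i + (1-p_i)},
\qquad
z_i = \frac{[\mathbf{Y}\mid\gamma_i=1,\boldsymbol{\beta}_{(-i)},\boldsymbol{\gamma}_{(-i)},\sigma^2]}
           {[\mathbf{Y}\mid\gamma_i=0,\boldsymbol{\beta}_{(-i)},\boldsymbol{\gamma}_{(-i)},\sigma^2]}.
\]
```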
Now we derive the closed form for \(z_i\). Notice that
The closed form of the integrand in the above equation can be calculated from (1) and (2). After some calculations, the integration yields
where C is a normalization constant, \( \omega _i^2=\sigma ^2/({\mathbf {X}}_i^T{\mathbf {X}}_i)\), and \(b_i={\mathbf {X}}_i^T{\mathbf {R}}_i/({\mathbf {X}}_i^T{\mathbf {X}}_i)\) with \({\mathbf {R}}_i={\mathbf {Y}}-\sum _{j\ne i}\beta _j{\mathbf {X}}_j\). The closed form of \([{\mathbf {Y}}|\gamma _i=0,\varvec{\beta }_{(-i)}, \varvec{\gamma }_{(-i)}, \sigma ^2]\) can be obtained from the right-hand side of the above equation with \(c_i\) being replaced by 1. After some calculations on (11), we have
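In the standard SSVS setting, integrating \(\beta_i\) out of the normal likelihood yields the following form (a hedged reconstruction, term-by-term consistent with the definitions of \(C\), \(\omega_i^2\) and \(b_i\) above):

```latex
\[
[\mathbf{Y}\mid\gamma_i=1,\boldsymbol{\beta}_{(-i)},\boldsymbol{\gamma}_{(-i)},\sigma^2]
  = C\,(\omega_i^2+c_i^2\tau_i^2)^{-1/2}
    \exp\!\Big\{-\frac{b_i^2}{2(\omega_i^2+c_i^2\tau_i^2)}\Big\},
\]
so that the ratio of the two marginal densities is
\[
z_i=\Big(\frac{\omega_i^2+\tau_i^2}{\omega_i^2+c_i^2\tau_i^2}\Big)^{1/2}
    \exp\!\Big\{\frac{b_i^2}{2}\Big(\frac{1}{\omega_i^2+\tau_i^2}
      -\frac{1}{\omega_i^2+c_i^2\tau_i^2}\Big)\Big\}.
\]
```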
where \((\sigma _1^i)^2=(\omega ^{-2}_i+c_i^{-2}\tau _i^{-2})^{-1}\) and \((\sigma _2^i)^2=(\omega ^{-2}_i+\tau _i^{-2})^{-1}\). From (9), we know that \([\beta _i|\gamma _i,\varvec{\beta }_{(-i)}, \varvec{\gamma }_{(-i)},\sigma ^2,{\mathbf {Y}}] \propto [{\mathbf {Y}}|\beta _i, \varvec{\beta }_{(-i)}, \sigma ^2][\beta _i|\gamma _i]\), which can be calculated from (1) and (2). In particular,
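Given the stated variances \((\sigma_1^i)^2\) and \((\sigma_2^i)^2\), standard normal–normal conjugacy (likelihood kernel \(N(b_i,\omega_i^2)\) against the slab prior \(N(0,c_i^2\tau_i^2)\) and spike prior \(N(0,\tau_i^2)\)) suggests the full conditionals have the form (a sketch, not the paper's exact display):

```latex
\[
[\beta_i\mid\gamma_i=1,\boldsymbol{\beta}_{(-i)},\boldsymbol{\gamma}_{(-i)},\sigma^2,\mathbf{Y}]
  = N\!\Big(\frac{(\sigma_1^i)^2\,b_i}{\omega_i^2},\,(\sigma_1^i)^2\Big),
\qquad
[\beta_i\mid\gamma_i=0,\boldsymbol{\beta}_{(-i)},\boldsymbol{\gamma}_{(-i)},\sigma^2,\mathbf{Y}]
  = N\!\Big(\frac{(\sigma_2^i)^2\,b_i}{\omega_i^2},\,(\sigma_2^i)^2\Big).
\]
```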
Finally, by the same argument as in George and McCulloch (1993), we have
where \({\mathrm {IG}}\) denotes an inverted gamma distribution. Then from the well-known relation between the inverted gamma distribution and the chi-square distribution (cf., Wu and Hamada 2009), we conclude that
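To make the componentwise scheme concrete, the following is a minimal Python sketch of one possible implementation of the sampler derived above. The hyperparameter names (`tau`, `c`, `p_incl`, `nu0`, `lam0`) and the particular inverted-gamma parameterization for \(\sigma^2\) are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def componentwise_ssvs(Y, X, n_iter=2000, tau=0.1, c=10.0, p_incl=0.5,
                       nu0=1.0, lam0=1.0, seed=0):
    """Sketch of a componentwise SSVS Gibbs sampler.

    Hyperparameters tau, c, p_incl, nu0, lam0 are illustrative
    assumptions; spike prior N(0, tau^2), slab prior N(0, (c*tau)^2).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX = np.sum(X * X, axis=0)           # X_i^T X_i for each column i
    beta = np.zeros(p)
    gamma = np.zeros(p, dtype=int)
    sigma2 = 1.0
    resid = Y - X @ beta                  # current residual Y - X beta
    draws = np.zeros((n_iter, p))
    incl = np.zeros((n_iter, p))
    for t in range(n_iter):
        for i in range(p):
            # R_i = Y - sum_{j != i} beta_j X_j  (add column i back)
            Ri = resid + beta[i] * X[:, i]
            omega2 = sigma2 / XtX[i]      # omega_i^2 = sigma^2 / X_i^T X_i
            b = X[:, i] @ Ri / XtX[i]     # b_i = X_i^T R_i / X_i^T X_i
            # likelihood ratio z_i with beta_i integrated out
            v1 = omega2 + (c * tau) ** 2
            v0 = omega2 + tau ** 2
            log_z = 0.5 * np.log(v0 / v1) + 0.5 * b * b * (1 / v0 - 1 / v1)
            prob1 = 1.0 / (1.0 + (1 - p_incl) / p_incl * np.exp(-log_z))
            gamma[i] = int(rng.random() < prob1)
            # componentwise draw of beta_i from its normal full conditional
            prior_var = (c * tau) ** 2 if gamma[i] == 1 else tau ** 2
            post_var = 1.0 / (1.0 / omega2 + 1.0 / prior_var)
            post_mean = post_var * b / omega2
            beta[i] = rng.normal(post_mean, np.sqrt(post_var))
            resid = Ri - beta[i] * X[:, i]
        # sigma^2 from its inverted-gamma full conditional
        shape = (nu0 + n) / 2.0
        scale = (nu0 * lam0 + resid @ resid) / 2.0
        sigma2 = scale / rng.gamma(shape)
        draws[t] = beta
        incl[t] = gamma
    return draws, incl
```

Each \(\beta_i\) update costs \(O(n)\) operations and never forms or inverts the full information matrix, which is the source of the computational saving described in the abstract.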
\(\square \)
Huang, H., Zhou, S., Liu, MQ. et al. Acceleration of the stochastic search variable selection via componentwise Gibbs sampling. Metrika 80, 289–308 (2017). https://doi.org/10.1007/s00184-016-0604-x
Keywords
- Bayesian variable selection
- Gibbs sampler
- Linear regression
- Stochastic search variable selection
- Supersaturated design