Bayesian selection of best subsets via hybrid search

Jin, Shiqiang; Goh, Gyuhyeong

doi:10.1007/s00180-020-00996-y

Original paper
Published: 11 May 2020

Computational Statistics, volume 36, pages 1991–2007 (2021)

Abstract

Over the past decades, variable selection for high-dimensional data has drawn increasing attention. When the number of predictors is large, model fitting and prediction become challenging. In this paper, we develop a new Bayesian method of best subset selection using a hybrid search algorithm that combines a deterministic local search with a stochastic global search. To reduce the computational cost of evaluating multiple candidate subsets at each update, we propose a novel strategy that calculates the exact marginal likelihoods of all neighbor models simultaneously in a single computation. In addition, we establish model selection consistency for the proposed method in the high-dimensional setting in which the number of possible predictors can increase faster than the sample size. A simulation study and a real data analysis are conducted to investigate the performance of the proposed method.

Acknowledgements

The authors are grateful to two anonymous reviewers for their helpful comments and constructive suggestions on an earlier version of this paper.

Author information

Authors and Affiliations

Department of Statistics, Kansas State University, Manhattan, KS, 66506, USA
Shiqiang Jin & Gyuhyeong Goh

Corresponding author

Correspondence to Gyuhyeong Goh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Calculation of Eq. (3)

For any $i \notin {\hat{{\varvec{\gamma }}}}$, from Eq. (2), we have

$$\begin{aligned} m(\mathbf{y}|{\hat{{\varvec{\gamma }}}}\cup \{i\})\propto & {} |\mathbf{X}_{{\hat{{\varvec{\gamma }}}}\cup \{i\} }^{{ \top }}\mathbf{X}_{{\hat{{\varvec{\gamma }}}}\cup \{i\} }+\tau ^{-1}\mathbf{I}_{k+1} |^{-1/2} \nonumber \\&\times \left( \mathbf{y}^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}\cup \{i\}}\mathbf{y}+b_{\sigma }\right) ^{-\frac{a_{\sigma }+n}{2} }. \end{aligned}$$

(13)

It follows from the Sherman–Morrison formula that

$$\begin{aligned} \mathbf{H}_{{\hat{{\varvec{\gamma }}}}\cup \{i\} }= \mathbf{H}_{{\hat{{\varvec{\gamma }}}}}-\frac{\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}\mathbf{x}_i \mathbf{x}_i ^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}} }{\tau ^{-1}+\mathbf{x}_i^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}\mathbf{x}_i}. \end{aligned}$$

(14)
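
As a brief check of (14), write $\mathbf{A}=\tau \mathbf{X}_{{\hat{{\varvec{\gamma }}}}}\mathbf{X}_{{\hat{{\varvec{\gamma }}}}}^{{ \top }}+\mathbf{I}_n$ (a shorthand introduced only for this remark) and take $\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}=\mathbf{A}^{-1}$, which is the identification used in the last step of (15). Adding predictor $i$ changes $\mathbf{A}$ by the rank-one term $\tau \mathbf{x}_i\mathbf{x}_i^{{ \top }}$, so the Sherman–Morrison formula gives

$$\begin{aligned} \mathbf{H}_{{\hat{{\varvec{\gamma }}}}\cup \{i\}}&= \left( \mathbf{A}+\tau \mathbf{x}_i\mathbf{x}_i^{{ \top }}\right) ^{-1} \\&= \mathbf{A}^{-1}-\frac{\tau \mathbf{A}^{-1}\mathbf{x}_i\mathbf{x}_i^{{ \top }}\mathbf{A}^{-1}}{1+\tau \mathbf{x}_i^{{ \top }}\mathbf{A}^{-1}\mathbf{x}_i} \\&= \mathbf{H}_{{\hat{{\varvec{\gamma }}}}}-\frac{\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}\mathbf{x}_i\mathbf{x}_i^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}}{\tau ^{-1}+\mathbf{x}_i^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}\mathbf{x}_i}, \end{aligned}$$

where the last line follows by dividing the numerator and the denominator of the fraction by $\tau$.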

Using Sylvester’s determinant identity and the Sherman–Morrison formula, we obtain

$$\begin{aligned}&|\mathbf{X}_{{\hat{{\varvec{\gamma }}}}\cup \{i\} }^{{ \top }}\mathbf{X}_{{\hat{{\varvec{\gamma }}}}\cup \{i\} }+\tau ^{-1}\mathbf{I}_{k+1} | \nonumber \\&\quad = \tau ^{-(k+1)}|\tau \mathbf{X}_{{\hat{{\varvec{\gamma }}}}\cup \{i\} }\mathbf{X}_{{\hat{{\varvec{\gamma }}}}\cup \{i\} }^{{ \top }}+\mathbf{I}_n |\nonumber \\&\quad = \tau ^{-(k+1)}|\tau \mathbf{X}_{{\hat{{\varvec{\gamma }}}}}\mathbf{X}_{{\hat{{\varvec{\gamma }}}}}^{{ \top }}+\mathbf{I}_n+\tau \mathbf{x}_i \mathbf{x}_i^{{ \top }} |\nonumber \\&\quad = \tau ^{-(k+1)} |\tau \mathbf{X}_{{\hat{{\varvec{\gamma }}}}}\mathbf{X}_{{\hat{{\varvec{\gamma }}}}}^{{ \top }}+\mathbf{I}_n| \{1+ \tau \mathbf{x}_i^{{ \top }} (\tau \mathbf{X}_{{\hat{{\varvec{\gamma }}}}}\mathbf{X}_{{\hat{{\varvec{\gamma }}}}}^{{ \top }}+\mathbf{I}_n)^{-1}\mathbf{x}_i\}\nonumber \\&\quad =| \mathbf{X}_{{\hat{{\varvec{\gamma }}}}}^{{ \top }}\mathbf{X}_{{\hat{{\varvec{\gamma }}}}}+\tau ^{-1}\mathbf{I}_{k}| (\tau ^{-1}+\mathbf{x}_i^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}} \mathbf{x}_i). \end{aligned}$$

(15)

Applying (14) and (15) to (13), we thus have

$$\begin{aligned} m(\mathbf{y}|{\hat{{\varvec{\gamma }}}}\cup \{i\})\propto & {} \left\{ \mathbf{y}^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}\mathbf{y}-\frac{( \mathbf{x}_i^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}\mathbf{y})^2}{\tau ^{-1}+\mathbf{x}_i^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}} \mathbf{x}_i }+b_{\sigma }\right\} ^ {-\frac{a_{\sigma }+n}{2}}\\&\times (\tau ^{-1}+\mathbf{x}_i^{{ \top }}\mathbf{H}_{{\hat{{\varvec{\gamma }}}}} \mathbf{x}_i)^{-1/2} \end{aligned}$$

for any $i \notin {\hat{{\varvec{\gamma }}}}$.
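
The practical point of this expression is that one matrix $\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}$ serves every candidate $i$ at once. The following NumPy sketch illustrates this and checks it numerically against a from-scratch evaluation of (13). It is not the authors' implementation: the function names `h_matrix`, `log_ml_direct` and `log_ml_add_all` are hypothetical, the data are simulated, and $\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}$ is assumed to take the form implied by the last step of (15), since Eq. (2) itself is not reproduced in this appendix.

```python
import numpy as np


def h_matrix(X_gamma, tau):
    """Assumed form: H = I_n - X (X'X + I/tau)^{-1} X' = (I_n + tau X X')^{-1}."""
    n, k = X_gamma.shape
    gram = X_gamma.T @ X_gamma + np.eye(k) / tau
    return np.eye(n) - X_gamma @ np.linalg.solve(gram, X_gamma.T)


def log_ml_direct(X_gamma, y, tau, a_sigma, b_sigma):
    """Log of the right-hand side of (13) for one model, computed from scratch."""
    n, k = X_gamma.shape
    gram = X_gamma.T @ X_gamma + np.eye(k) / tau
    _, logdet = np.linalg.slogdet(gram)
    quad = y @ h_matrix(X_gamma, tau) @ y
    return -0.5 * logdet - 0.5 * (a_sigma + n) * np.log(quad + b_sigma)


def log_ml_add_all(X, y, gamma, tau, a_sigma, b_sigma):
    """Log of the final display for every i not in gamma, in one pass.

    The scores omit the factor |X_gamma' X_gamma + I/tau|^{-1/2}, which is
    common to all candidates and is absorbed by the proportionality sign.
    """
    n, p = X.shape
    H = h_matrix(X[:, gamma], tau)
    rest = np.setdiff1d(np.arange(p), gamma)       # indices not in gamma
    HX = H @ X[:, rest]                            # columns are H x_i
    d = 1.0 / tau + np.einsum("ij,ij->j", X[:, rest], HX)  # tau^{-1} + x_i' H x_i
    c = HX.T @ y                                   # x_i' H y (H is symmetric)
    quad = y @ H @ y - c**2 / d + b_sigma
    return rest, -0.5 * np.log(d) - 0.5 * (a_sigma + n) * np.log(quad)


# Simulated data and hyperparameter values chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = rng.standard_normal(30)
gamma = [0, 3]  # current (non-empty) model

rest, fast = log_ml_add_all(X, y, gamma, tau=10.0, a_sigma=1.0, b_sigma=1.0)
direct = np.array(
    [log_ml_direct(X[:, gamma + [int(i)]], y, 10.0, 1.0, 1.0) for i in rest]
)
# The two vectors should agree up to one additive constant shared by all i.
print(np.allclose(fast - fast[0], direct - direct[0]))
```

In this sketch the fast version forms $\mathbf{H}_{{\hat{{\varvec{\gamma }}}}}$ once and then needs only matrix-vector products, whereas the direct version solves a new linear system for each candidate.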

Calculation of Eq. (4)

For any $j \in {\tilde{{\varvec{\gamma }}}}$, Eq. (2) leads to

$$\begin{aligned} m(\mathbf{y}|{\tilde{{\varvec{\gamma }}}}{\setminus }\{j\})\propto & {} |\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}{\setminus }\{j\} }^{{ \top }}\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}{\setminus }\{j\}}+\tau ^{-1}\mathbf{I}_{k} |^{-1/2} \nonumber \\&\times \left( \mathbf{y}^{{ \top }}\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}{\setminus }\{j\}}\mathbf{y}+b_{\sigma }\right) ^{-\frac{a_{\sigma }+n}{2} }. \end{aligned}$$

(16)

From the Sherman–Morrison formula, we have

$$\begin{aligned} \mathbf{H}_{{\tilde{{\varvec{\gamma }}}}{\setminus }\{j\} }= \mathbf{H}_{{\tilde{{\varvec{\gamma }}}}}+\frac{\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}}\mathbf{x}_j \mathbf{x}_j ^{{ \top }}\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}} }{\tau ^{-1}-\mathbf{x}_j^{{ \top }}\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}}\mathbf{x}_j}. \end{aligned}$$

(17)

From Sylvester’s determinant identity and the Sherman–Morrison formula, we obtain

$$\begin{aligned}&|\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}{\setminus }\{j\}}^{{ \top }}\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}{\setminus }\{j\} }+\tau ^{-1}\mathbf{I}_{k} | \nonumber \\&\quad = \tau ^{-k}|\tau \mathbf{X}_{{\tilde{{\varvec{\gamma }}}}{\setminus }\{j\} }\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}{\setminus }\{j\} }^{{ \top }}+\mathbf{I}_n |\nonumber \\&\quad = \tau ^{-k}|\tau \mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}^{{ \top }}+\mathbf{I}_n-\tau \mathbf{x}_j \mathbf{x}_j^{{ \top }} |\nonumber \\&\quad = \tau ^{-k} |\tau \mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}^{{ \top }}+\mathbf{I}_n| \{1- \tau \mathbf{x}_j^{{ \top }} (\tau \mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}^{{ \top }}+\mathbf{I}_n)^{-1}\mathbf{x}_j\}\nonumber \\&\quad =\tau ^2 | \mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}^{{ \top }}\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}+\tau ^{-1}\mathbf{I}_{k+1}| (\tau ^{-1}-\mathbf{x}_j^{{ \top }}\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}} \mathbf{x}_j). \end{aligned}$$

(18)

Hence, applying (17) and (18) to (16), we have

$$\begin{aligned}&m(\mathbf{y}|{\tilde{{\varvec{\gamma }}}}{\setminus } \{j\}) \propto \left\{ \mathbf{y}^{{ \top }}\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}}\mathbf{y}+\frac{( \mathbf{x}_j^{{ \top }}\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}}\mathbf{y})^2}{\tau ^{-1}-\mathbf{x}_j^{{ \top }}\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}} \mathbf{x}_j }+b_{\sigma }\right\} ^ {-\frac{a_{\sigma }+n}{2}}\\&\quad \times (\tau ^{-1}-\mathbf{x}_j^{{ \top }}\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}} \mathbf{x}_j)^{-1/2} \end{aligned}$$

for any $j \in {\tilde{{\varvec{\gamma }}}}$.
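
The deletion case can be checked in the same way. The sketch below evaluates the final display for every $j \in {\tilde{{\varvec{\gamma }}}}$ from a single $\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}}$ and compares it with a from-scratch evaluation of (16). As before, this is an illustration under assumptions rather than the authors' code: the function names are hypothetical, the data are simulated, and $\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}}$ is assumed to equal $(\tau \mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}\mathbf{X}_{{\tilde{{\varvec{\gamma }}}}}^{{ \top }}+\mathbf{I}_n)^{-1}$, the form used in the last step of (18).

```python
import numpy as np


def h_matrix(X_gamma, tau):
    """Assumed form: H = I_n - X (X'X + I/tau)^{-1} X' = (I_n + tau X X')^{-1}."""
    n, k = X_gamma.shape
    gram = X_gamma.T @ X_gamma + np.eye(k) / tau
    return np.eye(n) - X_gamma @ np.linalg.solve(gram, X_gamma.T)


def log_ml_direct(X_gamma, y, tau, a_sigma, b_sigma):
    """Log of the right-hand side of (16) for one model, computed from scratch."""
    n, k = X_gamma.shape
    gram = X_gamma.T @ X_gamma + np.eye(k) / tau
    _, logdet = np.linalg.slogdet(gram)
    quad = y @ h_matrix(X_gamma, tau) @ y
    return -0.5 * logdet - 0.5 * (a_sigma + n) * np.log(quad + b_sigma)


def log_ml_delete_all(X, y, gamma, tau, a_sigma, b_sigma):
    """Log of the final display for every j in gamma, in one pass.

    The scores omit the factor common to all j, namely
    (tau^2 |X_gamma' X_gamma + I/tau|)^{-1/2}, as in the proportionality sign.
    """
    n = X.shape[0]
    Xg = X[:, gamma]
    H = h_matrix(Xg, tau)
    HX = H @ Xg                                        # columns are H x_j
    d = 1.0 / tau - np.einsum("ij,ij->j", Xg, HX)      # tau^{-1} - x_j' H x_j
    c = HX.T @ y                                       # x_j' H y (H is symmetric)
    quad = y @ H @ y + c**2 / d + b_sigma
    return -0.5 * np.log(d) - 0.5 * (a_sigma + n) * np.log(quad)


# Simulated data and hyperparameter values chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = rng.standard_normal(30)
gamma = [0, 2, 5]  # current model with at least two predictors

fast = log_ml_delete_all(X, y, gamma, tau=10.0, a_sigma=1.0, b_sigma=1.0)
direct = np.array(
    [
        log_ml_direct(X[:, [g for g in gamma if g != j]], y, 10.0, 1.0, 1.0)
        for j in gamma
    ]
)
# The two vectors should agree up to one additive constant shared by all j.
print(np.allclose(fast - fast[0], direct - direct[0]))
```

One caveat for this sketch: $\tau ^{-1}-\mathbf{x}_j^{{ \top }}\mathbf{H}_{{\tilde{{\varvec{\gamma }}}}}\mathbf{x}_j$ is a difference of two nearly equal numbers when $\tau$ is very large, so a careful implementation may need to guard against loss of precision there.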

About this article

Cite this article

Jin, S., Goh, G. Bayesian selection of best subsets via hybrid search. Comput Stat 36, 1991–2007 (2021). https://doi.org/10.1007/s00180-020-00996-y

Received: 22 October 2019
Accepted: 29 April 2020
Published: 11 May 2020
Issue Date: September 2021
DOI: https://doi.org/10.1007/s00180-020-00996-y
