
Bayesian selection of best subsets via hybrid search

  • Original paper
Computational Statistics

Abstract

Over the past decades, variable selection for high-dimensional data has drawn increasing attention. With a large number of predictors, model fitting and prediction become challenging. In this paper, we develop a new Bayesian method of best subset selection using a hybrid search algorithm that combines a deterministic local search with a stochastic global search. To reduce the computational cost of evaluating multiple candidate subsets at each update, we propose a novel strategy that enables us to calculate the exact marginal likelihoods of all neighbor models simultaneously in a single computation. In addition, we establish model selection consistency for the proposed method in the high-dimensional setting in which the number of possible predictors can grow faster than the sample size. A simulation study and a real data analysis are conducted to investigate the performance of the proposed method.


Fig. 1
Fig. 2



Acknowledgements

The authors are grateful to two anonymous reviewers for their helpful comments and constructive suggestions on an earlier version of this paper.

Author information

Corresponding author

Correspondence to Gyuhyeong Goh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Calculation of Eq. (3)

Let \(\hat{\boldsymbol{\gamma}}\) denote a model containing \(k\) predictors. For any \(i \notin \hat{\boldsymbol{\gamma}}\), Eq. (2) gives

$$\begin{aligned}
m(\mathbf{y}|\hat{\boldsymbol{\gamma}}\cup\{i\})\propto{}&\left|\mathbf{X}_{\hat{\boldsymbol{\gamma}}\cup\{i\}}^{\top}\mathbf{X}_{\hat{\boldsymbol{\gamma}}\cup\{i\}}+\tau^{-1}\mathbf{I}_{k+1}\right|^{-1/2} \\
&\times\left(\mathbf{y}^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}\cup\{i\}}\mathbf{y}+b_{\sigma}\right)^{-\frac{a_{\sigma}+n}{2}}.
\end{aligned}\tag{13}$$
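For a single candidate subset, the right-hand side of (13) can be evaluated directly. The NumPy sketch below does this as a brute-force baseline. It assumes \(\mathbf{H}_{\boldsymbol{\gamma}}=(\mathbf{I}_n+\tau\mathbf{X}_{\boldsymbol{\gamma}}\mathbf{X}_{\boldsymbol{\gamma}}^{\top})^{-1}\), the form implied by the last step of (15), and it drops every constant hidden in the proportionality sign; the helper names `hat_matrix` and `log_marginal_rhs` and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np


def hat_matrix(X_sub, tau):
    """Assumed form H = (I_n + tau * X_sub X_sub^T)^{-1}, consistent with (14)-(15)."""
    n = X_sub.shape[0]
    return np.linalg.inv(np.eye(n) + tau * X_sub @ X_sub.T)


def log_marginal_rhs(X, y, subset, tau, a_sigma, b_sigma):
    """Log of the right-hand side of (13) for the columns listed in `subset`.

    Constants absorbed by the proportionality sign are dropped, so the value
    is only used to compare subsets of the same size.
    """
    n = len(y)
    X_sub = X[:, list(subset)]
    k = X_sub.shape[1]
    # log |X_sub^T X_sub + tau^{-1} I_k|, computed stably with slogdet
    _, logdet = np.linalg.slogdet(X_sub.T @ X_sub + np.eye(k) / tau)
    quad = y @ hat_matrix(X_sub, tau) @ y + b_sigma
    return -0.5 * logdet - 0.5 * (a_sigma + n) * np.log(quad)


# Hypothetical toy data: y depends on columns 1 and 4
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))
y = X[:, 1] - 2.0 * X[:, 4] + rng.standard_normal(30)
print(log_marginal_rhs(X, y, [1, 4], tau=1.0, a_sigma=1.0, b_sigma=1.0))
```

Scoring every \(i \notin \hat{\boldsymbol{\gamma}}\) this way repeats an \(n\times n\) inverse and a determinant for each candidate, which is the cost the rank-one updates in (14) and (15) avoid.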

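Here \(\mathbf{H}_{\boldsymbol{\gamma}}=(\mathbf{I}_n+\tau\mathbf{X}_{\boldsymbol{\gamma}}\mathbf{X}_{\boldsymbol{\gamma}}^{\top})^{-1}\) for a model \(\boldsymbol{\gamma}\), which is the form used in the last step of (15). Since \(\mathbf{H}_{\hat{\boldsymbol{\gamma}}\cup\{i\}}^{-1}=\mathbf{H}_{\hat{\boldsymbol{\gamma}}}^{-1}+\tau\mathbf{x}_i\mathbf{x}_i^{\top}\), it follows from the Sherman–Morrison formula that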

$$\mathbf{H}_{\hat{\boldsymbol{\gamma}}\cup\{i\}}=\mathbf{H}_{\hat{\boldsymbol{\gamma}}}-\frac{\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\mathbf{x}_i\mathbf{x}_i^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}}}{\tau^{-1}+\mathbf{x}_i^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\mathbf{x}_i}. \tag{14}$$
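As a sanity check on (14), the following sketch applies the rank-one update and compares it with a matrix recomputed from scratch. It assumes \(\mathbf{H}_{\boldsymbol{\gamma}}=(\mathbf{I}_n+\tau\mathbf{X}_{\boldsymbol{\gamma}}\mathbf{X}_{\boldsymbol{\gamma}}^{\top})^{-1}\), the form implied by (15); the helper names and the random data are hypothetical.

```python
import numpy as np


def hat_matrix(X_sub, tau):
    """Assumed form H = (I_n + tau * X_sub X_sub^T)^{-1}."""
    n = X_sub.shape[0]
    return np.linalg.inv(np.eye(n) + tau * X_sub @ X_sub.T)


def add_column_update(H, x, tau):
    """Formula (14): H for the model after adding column x.

    H is symmetric, so H x x^T H equals the outer product of H x with itself.
    """
    Hx = H @ x
    return H - np.outer(Hx, Hx) / (1.0 / tau + x @ Hx)


rng = np.random.default_rng(1)
X = rng.standard_normal((15, 5))
tau = 2.0
H_gamma = hat_matrix(X[:, :3], tau)                    # current model: columns 0, 1, 2
H_updated = add_column_update(H_gamma, X[:, 3], tau)   # add column 3 via (14)
H_direct = hat_matrix(X[:, :4], tau)                   # recompute from scratch
print(np.allclose(H_updated, H_direct))                # True
```

The update needs one matrix-vector product and one outer product, so it costs \(O(n^2)\) operations instead of the \(O(n^3)\) of a fresh inverse.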

Using Sylvester's determinant identity and the Sherman–Morrison formula, we obtain

$$\begin{aligned}
&\left|\mathbf{X}_{\hat{\boldsymbol{\gamma}}\cup\{i\}}^{\top}\mathbf{X}_{\hat{\boldsymbol{\gamma}}\cup\{i\}}+\tau^{-1}\mathbf{I}_{k+1}\right| \\
&\quad=\tau^{-(k+1)}\left|\tau\mathbf{X}_{\hat{\boldsymbol{\gamma}}\cup\{i\}}\mathbf{X}_{\hat{\boldsymbol{\gamma}}\cup\{i\}}^{\top}+\mathbf{I}_n\right| \\
&\quad=\tau^{-(k+1)}\left|\tau\mathbf{X}_{\hat{\boldsymbol{\gamma}}}\mathbf{X}_{\hat{\boldsymbol{\gamma}}}^{\top}+\mathbf{I}_n+\tau\mathbf{x}_i\mathbf{x}_i^{\top}\right| \\
&\quad=\tau^{-(k+1)}\left|\tau\mathbf{X}_{\hat{\boldsymbol{\gamma}}}\mathbf{X}_{\hat{\boldsymbol{\gamma}}}^{\top}+\mathbf{I}_n\right|\left\{1+\tau\mathbf{x}_i^{\top}\left(\tau\mathbf{X}_{\hat{\boldsymbol{\gamma}}}\mathbf{X}_{\hat{\boldsymbol{\gamma}}}^{\top}+\mathbf{I}_n\right)^{-1}\mathbf{x}_i\right\} \\
&\quad=\left|\mathbf{X}_{\hat{\boldsymbol{\gamma}}}^{\top}\mathbf{X}_{\hat{\boldsymbol{\gamma}}}+\tau^{-1}\mathbf{I}_{k}\right|\left(\tau^{-1}+\mathbf{x}_i^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\mathbf{x}_i\right).
\end{aligned}\tag{15}$$
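The chain of equalities in (15) can also be checked numerically. This sketch uses random data and assumes \(\mathbf{H}_{\hat{\boldsymbol{\gamma}}}=(\mathbf{I}_n+\tau\mathbf{X}_{\hat{\boldsymbol{\gamma}}}\mathbf{X}_{\hat{\boldsymbol{\gamma}}}^{\top})^{-1}\); it evaluates the left-hand side, the first (Sylvester) step, and the final line.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, tau = 12, 3, 0.7
X_gamma = rng.standard_normal((n, k))        # current model with k columns
x_i = rng.standard_normal(n)                 # candidate column i
X_new = np.column_stack([X_gamma, x_i])      # model with k + 1 columns

# Assumed form of H for the current model
H = np.linalg.inv(np.eye(n) + tau * X_gamma @ X_gamma.T)

# Left-hand side of (15): a (k+1) x (k+1) determinant
lhs = np.linalg.det(X_new.T @ X_new + np.eye(k + 1) / tau)
# First step of (15): Sylvester's identity turns it into an n x n determinant
step1 = tau ** -(k + 1) * np.linalg.det(tau * X_new @ X_new.T + np.eye(n))
# Final line of (15): current-model determinant times a scalar factor
rhs = np.linalg.det(X_gamma.T @ X_gamma + np.eye(k) / tau) * (1.0 / tau + x_i @ H @ x_i)

print(np.isclose(lhs, step1), np.isclose(lhs, rhs))  # True True
```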

Applying (14) and (15) to (13), we thus have

$$\begin{aligned}
m(\mathbf{y}|\hat{\boldsymbol{\gamma}}\cup\{i\})\propto{}&\left\{\mathbf{y}^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\mathbf{y}-\frac{(\mathbf{x}_i^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\mathbf{y})^2}{\tau^{-1}+\mathbf{x}_i^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\mathbf{x}_i}+b_{\sigma}\right\}^{-\frac{a_{\sigma}+n}{2}} \\
&\times\left(\tau^{-1}+\mathbf{x}_i^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\mathbf{x}_i\right)^{-1/2}
\end{aligned}$$

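for any \(i \notin \hat{\boldsymbol{\gamma}}\), where the factor \(|\mathbf{X}_{\hat{\boldsymbol{\gamma}}}^{\top}\mathbf{X}_{\hat{\boldsymbol{\gamma}}}+\tau^{-1}\mathbf{I}_k|^{-1/2}\), which does not depend on \(i\), has been absorbed into the proportionality constant. Once \(\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\) is available, each model \(\hat{\boldsymbol{\gamma}}\cup\{i\}\) therefore requires only the scalars \(\mathbf{x}_i^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\mathbf{y}\) and \(\mathbf{x}_i^{\top}\mathbf{H}_{\hat{\boldsymbol{\gamma}}}\mathbf{x}_i\).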
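The final expression is what makes it possible to score all models \(\hat{\boldsymbol{\gamma}}\cup\{i\}\) in one pass, as the abstract describes. The NumPy sketch below is one way to vectorize it, not the authors' implementation: it assumes \(\mathbf{H}_{\hat{\boldsymbol{\gamma}}}=(\mathbf{I}_n+\tau\mathbf{X}_{\hat{\boldsymbol{\gamma}}}\mathbf{X}_{\hat{\boldsymbol{\gamma}}}^{\top})^{-1}\), returns log-scores only up to an additive constant shared by all candidates, and uses a hypothetical function name `forward_neighbor_scores` and simulated data.

```python
import numpy as np


def forward_neighbor_scores(X, y, gamma, tau, a_sigma, b_sigma):
    """Log of the final expression above for every i not in gamma.

    Assumes H = (I_n + tau X_gamma X_gamma^T)^{-1}. Terms shared by all
    candidates are dropped, so only differences between scores matter.
    """
    n, p = X.shape
    X_g = X[:, list(gamma)]
    H = np.linalg.inv(np.eye(n) + tau * X_g @ X_g.T)       # one n x n inverse per current model
    outside = np.setdiff1d(np.arange(p), list(gamma))      # candidate indices i
    X_out = X[:, outside]
    HX = H @ X_out                                         # columns are H x_i
    denom = 1.0 / tau + np.einsum("ij,ij->j", X_out, HX)   # tau^{-1} + x_i^T H x_i
    proj = HX.T @ y                                        # x_i^T H y (H is symmetric)
    quad = y @ H @ y - proj ** 2 / denom + b_sigma
    scores = -0.5 * (a_sigma + n) * np.log(quad) - 0.5 * np.log(denom)
    return outside, scores


# Hypothetical example: current model {2}, score every single-column addition
rng = np.random.default_rng(4)
X = rng.standard_normal((40, 10))
y = 1.5 * X[:, 2] - X[:, 7] + rng.standard_normal(40)
cand, s = forward_neighbor_scores(X, y, [2], tau=1.0, a_sigma=1.0, b_sigma=1.0)
print(cand[np.argmax(s)])   # candidate column with the largest score
```

After one \(n\times n\) inverse, all candidate columns are handled by a single matrix product, whereas evaluating (13) separately would repeat the inverse for each candidate.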

Calculation of Eq. (4)

Now let \(\tilde{\boldsymbol{\gamma}}\) denote a model containing \(k+1\) predictors. For any \(j \in \tilde{\boldsymbol{\gamma}}\), Eq. (2) gives

$$\begin{aligned}
m(\mathbf{y}|\tilde{\boldsymbol{\gamma}}\setminus\{j\})\propto{}&\left|\mathbf{X}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}^{\top}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}+\tau^{-1}\mathbf{I}_{k}\right|^{-1/2} \\
&\times\left(\mathbf{y}^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}\mathbf{y}+b_{\sigma}\right)^{-\frac{a_{\sigma}+n}{2}}.
\end{aligned}\tag{16}$$

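Similarly, since \(\mathbf{H}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}^{-1}=\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}^{-1}-\tau\mathbf{x}_j\mathbf{x}_j^{\top}\), the Sherman–Morrison formula gives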

$$\mathbf{H}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}=\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}+\frac{\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}\mathbf{x}_j\mathbf{x}_j^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}}{\tau^{-1}-\mathbf{x}_j^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}\mathbf{x}_j}. \tag{17}$$
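The downdate (17) can be checked numerically by comparing it with a fresh inverse. This sketch assumes \(\mathbf{H}_{\boldsymbol{\gamma}}=(\mathbf{I}_n+\tau\mathbf{X}_{\boldsymbol{\gamma}}\mathbf{X}_{\boldsymbol{\gamma}}^{\top})^{-1}\), the form implied by (18); the helper name and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau = 15, 1.5
X_tilde = rng.standard_normal((n, 4))      # current model with k + 1 = 4 columns
j = 2                                      # position of the column to delete


def hat_matrix(X_sub):
    """Assumed form H = (I_n + tau * X_sub X_sub^T)^{-1}."""
    return np.linalg.inv(np.eye(n) + tau * X_sub @ X_sub.T)


H = hat_matrix(X_tilde)
x_j = X_tilde[:, j]
Hx = H @ x_j
# Formula (17); the denominator tau^{-1} - x_j^T H x_j is positive because x_j is in the model
H_downdated = H + np.outer(Hx, Hx) / (1.0 / tau - x_j @ Hx)
H_direct = hat_matrix(np.delete(X_tilde, j, axis=1))   # recompute without column j
print(np.allclose(H_downdated, H_direct))              # True
```

As with (14), each downdate costs \(O(n^2)\) operations rather than a new \(O(n^3)\) inverse.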

Using Sylvester's determinant identity and the Sherman–Morrison formula, we obtain

$$\begin{aligned}
&\left|\mathbf{X}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}^{\top}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}+\tau^{-1}\mathbf{I}_{k}\right| \\
&\quad=\tau^{-k}\left|\tau\mathbf{X}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}^{\top}+\mathbf{I}_n\right| \\
&\quad=\tau^{-k}\left|\tau\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}^{\top}+\mathbf{I}_n-\tau\mathbf{x}_j\mathbf{x}_j^{\top}\right| \\
&\quad=\tau^{-k}\left|\tau\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}^{\top}+\mathbf{I}_n\right|\left\{1-\tau\mathbf{x}_j^{\top}\left(\tau\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}^{\top}+\mathbf{I}_n\right)^{-1}\mathbf{x}_j\right\} \\
&\quad=\tau^{2}\left|\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}^{\top}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}+\tau^{-1}\mathbf{I}_{k+1}\right|\left(\tau^{-1}-\mathbf{x}_j^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}\mathbf{x}_j\right).
\end{aligned}\tag{18}$$
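A numerical check of (18), including the \(\tau^2\) factor in its last line, is sketched below with random data and the assumed form \(\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}=(\mathbf{I}_n+\tau\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}^{\top})^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, tau = 12, 3, 0.8
X_tilde = rng.standard_normal((n, k + 1))   # current model with k + 1 columns
j = 1                                       # position of the column to delete
x_j = X_tilde[:, j]
X_minus = np.delete(X_tilde, j, axis=1)     # model with column j removed (k columns)

# Assumed form of H for the current model
H = np.linalg.inv(np.eye(n) + tau * X_tilde @ X_tilde.T)

# Left-hand side of (18): a k x k determinant
lhs = np.linalg.det(X_minus.T @ X_minus + np.eye(k) / tau)
# Final line of (18), including the tau^2 factor
rhs = (tau ** 2
       * np.linalg.det(X_tilde.T @ X_tilde + np.eye(k + 1) / tau)
       * (1.0 / tau - x_j @ H @ x_j))

print(np.isclose(lhs, rhs))  # True
```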

Hence, applying (17) and (18) to (16), we have

$$\begin{aligned}
m(\mathbf{y}|\tilde{\boldsymbol{\gamma}}\setminus\{j\})\propto{}&\left\{\mathbf{y}^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}\mathbf{y}+\frac{(\mathbf{x}_j^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}\mathbf{y})^2}{\tau^{-1}-\mathbf{x}_j^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}\mathbf{x}_j}+b_{\sigma}\right\}^{-\frac{a_{\sigma}+n}{2}} \\
&\times\left(\tau^{-1}-\mathbf{x}_j^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}\mathbf{x}_j\right)^{-1/2}
\end{aligned}$$

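for any \(j \in \tilde{\boldsymbol{\gamma}}\), where the factor \(\tau^{-1}|\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}^{\top}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}+\tau^{-1}\mathbf{I}_{k+1}|^{-1/2}\), which does not depend on \(j\), has been absorbed into the proportionality constant. The quantity \(\tau^{-1}-\mathbf{x}_j^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}\mathbf{x}_j\) is always positive, because it equals \(\tau^{-1}/(1+\tau\mathbf{x}_j^{\top}\mathbf{H}_{\tilde{\boldsymbol{\gamma}}\setminus\{j\}}\mathbf{x}_j)\), so the expression is well defined.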
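The models \(\tilde{\boldsymbol{\gamma}}\setminus\{j\}\) can be scored in one pass in the same spirit. This is a hedged NumPy sketch rather than the authors' code: it assumes \(\mathbf{H}_{\tilde{\boldsymbol{\gamma}}}=(\mathbf{I}_n+\tau\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}\mathbf{X}_{\tilde{\boldsymbol{\gamma}}}^{\top})^{-1}\), returns log-scores up to a constant shared by all \(j\in\tilde{\boldsymbol{\gamma}}\), and the function name `backward_neighbor_scores` and the simulated data are hypothetical.

```python
import numpy as np


def backward_neighbor_scores(X, y, gamma_tilde, tau, a_sigma, b_sigma):
    """Log of the final expression above for every j in gamma_tilde.

    Assumes H = (I_n + tau X X^T)^{-1} over the columns in gamma_tilde. Terms
    shared by all candidates are dropped, so only differences matter.
    """
    n = X.shape[0]
    cols = np.asarray(list(gamma_tilde))
    X_in = X[:, cols]
    H = np.linalg.inv(np.eye(n) + tau * X_in @ X_in.T)    # one n x n inverse per current model
    HX = H @ X_in                                         # columns are H x_j
    denom = 1.0 / tau - np.einsum("ij,ij->j", X_in, HX)   # tau^{-1} - x_j^T H x_j, always > 0
    proj = HX.T @ y                                       # x_j^T H y (H is symmetric)
    quad = y @ H @ y + proj ** 2 / denom + b_sigma
    scores = -0.5 * (a_sigma + n) * np.log(quad) - 0.5 * np.log(denom)
    return cols, scores


# Hypothetical example: current model {0, 3, 5}, score every single-column deletion
rng = np.random.default_rng(6)
X = rng.standard_normal((40, 10))
y = 2.0 * X[:, 0] + X[:, 3] + rng.standard_normal(40)
cols, s = backward_neighbor_scores(X, y, [0, 3, 5], tau=1.0, a_sigma=1.0, b_sigma=1.0)
print(cols[np.argmax(s)])   # column whose removal gives the largest score
```

A local search step can therefore compare every single-variable deletion after computing just one \(n\times n\) inverse for the current model.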


About this article


Cite this article

Jin, S., Goh, G. Bayesian selection of best subsets via hybrid search. Comput Stat 36, 1991–2007 (2021). https://doi.org/10.1007/s00180-020-00996-y


  • DOI: https://doi.org/10.1007/s00180-020-00996-y
