Abstract
Over the past decades, variable selection for high-dimensional data has drawn increasing attention. A large number of predictors poses a major challenge for model fitting and prediction. In this paper, we develop a new Bayesian method of best subset selection using a hybrid search algorithm that combines a deterministic local search with a stochastic global search. To reduce the computational cost of evaluating multiple candidate subsets at each update, we propose a novel strategy that computes the exact marginal likelihoods of all neighbor models simultaneously in a single computation. In addition, we establish model selection consistency for the proposed method in the high-dimensional setting in which the number of possible predictors can increase faster than the sample size. A simulation study and a real data analysis are conducted to investigate the performance of the proposed method.
Acknowledgements
The authors are grateful to two anonymous reviewers for their helpful comments and constructive suggestions on an earlier version of this paper.
Appendix
Calculation of Eq. (3)
For any \(i \notin {\hat{{\varvec{\gamma }}}}\), from Eq. (2), we have
It follows from the Sherman–Morrison formula that
Using Sylvester’s determinant identity and the Sherman–Morrison formula, we obtain
Applying (14) and (15) to (13), we thus have
for any \(i \notin {\hat{{\varvec{\gamma }}}}\).
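The two identities invoked in this calculation can be checked numerically. The sketch below is a generic illustration, not the paper's Eqs. (13)–(15): it assumes a symmetric positive definite matrix of the form I + X X^T, and the names `X_cur`, `X_new`, `A`, and `q` are hypothetical. It shows how one matrix product gives the quadratic forms for every candidate column at once, after which each rank-one update of the log-determinant and the inverse needs no further factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 8, 3, 5
X_cur = rng.standard_normal((n, k))   # hypothetical columns already in the model
X_new = rng.standard_normal((n, m))   # hypothetical candidate columns to add

A = np.eye(n) + X_cur @ X_cur.T       # assumed symmetric positive definite matrix
A_inv = np.linalg.inv(A)
logdet_A = np.linalg.slogdet(A)[1]

# q[i] = x_i^T A^{-1} x_i for every candidate column, from one matrix product
AX = A_inv @ X_new
q = np.einsum("ij,ij->j", X_new, AX)

# Determinant identity: log det(A + x_i x_i^T) = log det(A) + log(1 + q[i])
logdet_add = logdet_A + np.log1p(q)

# Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + q)
i = 2
inv_add_i = A_inv - np.outer(AX[:, i], AX[:, i]) / (1.0 + q[i])

# Compare with direct recomputation for candidate i
A_plus = A + np.outer(X_new[:, i], X_new[:, i])
inv_err_add = np.max(np.abs(inv_add_i - np.linalg.inv(A_plus)))
logdet_err_add = abs(logdet_add[i] - np.linalg.slogdet(A_plus)[1])
```

Both error terms should be at the level of floating-point round-off, which is consistent with the updates being exact rather than approximate.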
Calculation of Eq. (4)
For any \(j \in {\tilde{{\varvec{\gamma }}}}\), Eq. (2) leads to
From the Sherman–Morrison formula, we have
From Sylvester’s determinant identity and the Sherman–Morrison formula, we obtain
Hence, applying (17) and (18) to (16), we have
for any \(j \in {\tilde{{\varvec{\gamma }}}}\).
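The deletion case uses the same identities with the sign of the rank-one term reversed. The sketch below is again a generic illustration under assumed notation, not the paper's Eqs. (16)–(18): `X`, `A`, and `q` are hypothetical names, and `A` is taken to be I + X X^T so that removing a column keeps the matrix positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
X = rng.standard_normal((n, k))       # hypothetical columns of the current model

A = np.eye(n) + X @ X.T               # assumed symmetric positive definite matrix
A_inv = np.linalg.inv(A)
logdet_A = np.linalg.slogdet(A)[1]

# q[j] = x_j^T A^{-1} x_j for every column in the model, from one matrix product
AX = A_inv @ X
q = np.einsum("ij,ij->j", X, AX)

# Determinant identity: log det(A - x_j x_j^T) = log det(A) + log(1 - q[j])
logdet_drop = logdet_A + np.log1p(-q)

# Sherman-Morrison: (A - x x^T)^{-1} = A^{-1} + (A^{-1} x)(A^{-1} x)^T / (1 - q)
j = 1
inv_drop_j = A_inv + np.outer(AX[:, j], AX[:, j]) / (1.0 - q[j])

# Compare with direct recomputation after deleting column j
X_minus = np.delete(X, j, axis=1)
A_minus = np.eye(n) + X_minus @ X_minus.T
inv_err_drop = np.max(np.abs(inv_drop_j - np.linalg.inv(A_minus)))
logdet_err_drop = abs(logdet_drop[j] - np.linalg.slogdet(A_minus)[1])
```

In this construction every `q[j]` is strictly below 1, so the logarithm and the division are well defined.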
Cite this article
Jin, S., Goh, G. Bayesian selection of best subsets via hybrid search. Comput Stat 36, 1991–2007 (2021). https://doi.org/10.1007/s00180-020-00996-y
DOI: https://doi.org/10.1007/s00180-020-00996-y