Abstract
The Bayesian approach to modelling differs from the frequentist approach primarily in that it supplements the data with prior information about the parameters. If we specify a “good” prior, in the sense that it nudges the likelihood in the right direction, then the resulting estimates will also be good. This is what we aim to do for variable selection problems, where the Bayesian method reduces the selection problem to one of estimation, rather than a true search of the variable space for the model that optimises a certain criterion. We contribute to the vast literature on variable selection by using I-priors [5], a class of Gaussian distributions with the distinguishing property that their covariance is proportional to the Fisher information (of the model parameters). The original motivation behind the I-prior methodology was to develop a novel unifying approach to various regression models. In this work, we detail the I-prior model used, and present simulation results and several real-world applications in which the I-prior performs favourably compared to other prior distributions and/or variable selection techniques in terms of model size, \(R^2\), predictive ability, and so on.
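To make the defining property concrete, here is a minimal one-predictor sketch (our own assumptions, not the chapter's implementation: unit error variance, a fixed scale parameter `lam` rather than one estimated from the data, and a plain linear model):

```python
import random

rng = random.Random(0)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # true slope 2, error variance 1

# The Fisher information for the slope beta in y_i = beta * x_i + eps_i
# (with unit error variance) is s = sum(x_i^2), so the I-prior is
# beta ~ N(0, lam * s) for a scale parameter lam (here fixed, not estimated).
lam = 1.0
s = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

beta_ols = sxy / s                      # maximum likelihood estimate
beta_ipr = sxy / (s + 1.0 / (lam * s))  # posterior mean; precision = s + (lam*s)^{-1}
print(beta_ols, beta_ipr)               # I-prior estimate shrinks slightly towards 0
```

Because the prior variance scales with the information \(s\), the shrinkage towards zero is mild when the data are informative, which is the intuition behind letting the covariance track the Fisher information.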
Notes
1. Briefly, in testing a point null hypothesis of the mean of a normally distributed parameter, the null hypothesis is increasingly accepted as the prior variance of the parameter approaches infinity, regardless of the evidence for or against the null. The paradox is also termed the Jeffreys–Lindley paradox [40].
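The paradox can be seen numerically. The sketch below uses our own assumptions (a single normal mean with known variance, prior \(\mu \sim N(0, \tau^2)\) under the alternative); it is an illustration, not taken from the chapter:

```python
import math

def bayes_factor_01(xbar, n, sigma2, tau2):
    """BF_01 for H0: mu = 0 vs H1: mu ~ N(0, tau2), given xbar ~ N(mu, sigma2/n)."""
    def norm_pdf(x, var):
        return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)
    s2 = sigma2 / n
    # Marginal of xbar: N(0, s2) under H0, N(0, s2 + tau2) under H1
    return norm_pdf(xbar, s2) / norm_pdf(xbar, s2 + tau2)

# xbar = 0.2 with n = 100 and sigma2 = 1 gives z = 2 (nominally significant at 5%),
# yet the Bayes factor in favour of H0 grows without bound as tau2 increases:
for tau2 in [1.0, 100.0, 10_000.0]:
    print(tau2, bayes_factor_01(0.2, 100, 1.0, tau2))
```

Even though the data are "significant" in the frequentist sense, the Bayes factor favours the null more and more strongly as the prior on \(\mu\) becomes more diffuse.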
2. The Jeffreys prior for a parameter \(\theta\) is defined as \(p(\theta) \propto \vert {\mathcal I}(\theta) \vert^{1/2}\) [21].
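As a standard worked example (not taken from the chapter): for a Bernoulli likelihood with success probability \(\theta\), the Fisher information is \({\mathcal I}(\theta) = 1/(\theta(1-\theta))\), so the Jeffreys prior is

\[
p(\theta) \propto {\mathcal I}(\theta)^{1/2} = \theta^{-1/2}(1-\theta)^{-1/2},
\]

i.e. a \(\text{Beta}(1/2, 1/2)\) distribution.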
3. For any row of \({\mathbf{X}}\), \(\text{Cov}[X_j, X_k] = \text{Cov}[Z_j + U, Z_k + U] = \text{Var}[U] = 1\), and \(\text{Var}[X_j] = \text{Var}[Z_j + U] = 2\). Thus, \(\text{Corr}[X_j, X_k] = \text{Cov}[X_j, X_k] / (\text{Var}[X_j]\,\text{Var}[X_k])^{1/2} = 1/2\).
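The value \(1/2\) can be checked by simulation; the sketch below (function name and sample size are our own choices) draws \(Z_j, Z_k, U \sim N(0,1)\) independently and estimates the correlation of \(X_j = Z_j + U\) and \(X_k = Z_k + U\) empirically:

```python
import random

def simulate_corr(n=100_000, seed=1):
    """Empirical Corr[X_j, X_k] where X_j = Z_j + U with Z_j, Z_k, U iid N(0, 1)."""
    rng = random.Random(seed)
    xj, xk = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        xj.append(rng.gauss(0, 1) + u)
        xk.append(rng.gauss(0, 1) + u)
    mj, mk = sum(xj) / n, sum(xk) / n
    cov = sum((a - mj) * (b - mk) for a, b in zip(xj, xk)) / n
    vj = sum((a - mj) ** 2 for a in xj) / n
    vk = sum((b - mk) ** 2 for b in xk) / n
    return cov / (vj * vk) ** 0.5

print(simulate_corr())  # close to the theoretical value 1/2
```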
4. Since the total model space used differed between our method and those of C&M and B&F, it does not make sense to compare the posterior model probabilities that we obtained. C&M reported a model probability of 0.491 for their model, but this model was not selected at all using the I-prior.
5. Map tiles by Stamen Design, under CC BY 3.0. Data by OpenStreetMap, under CC BY-SA 3.0. Created using the ggmap package [22] in R.
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: 2nd International Symposium on Information Theory, pp. 267–281. Akadémiai Kiadó (1973)
Banner, K.M., Higgs, M.D.: Considerations for assessing model averaging of regression coefficients. Ecol. Appl. 27(1), 78–93 (2017). https://doi.org/10.1002/eap.1419
Barbieri, M.M., Berger, J.O.: Optimal predictive model selection. Ann. Stat. 32(3), 870–897 (2004). https://doi.org/10.1214/009053604000000238
Bergsma, W.: Regression with I-priors. J. Econom. Stat. (2019). https://doi.org/10.1016/j.ecosta.2019.10.002
Bergsma, W.: Regression with I-priors. Econom. Stat. 14, 89–111 (2020)
Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, Boston, MA (2004). ISBN 978-1-4613-4792-7. https://doi.org/10.1007/978-1-4419-9096-9
Breiman, L., Friedman, J.H.: Estimating optimal transformations for multiple regression and correlation. J. Am. Stat. Assoc. 80(391), 590–598 (1985). https://doi.org/10.1080/01621459.1985.10478157
Cade, B.S.: Model averaging and muddled multimodel inferences. Ecology 96(9), 2370–2382 (2015). https://doi.org/10.1890/14-1639.1
Casella, G., Girón, F.J., Martínez, M.L., Moreno, E.: Consistency of Bayesian procedures for variable selection. Ann. Stat. 37(3), 1207–1228 (2009). https://doi.org/10.1214/08-AOS606
Casella, G., Moreno, E.: Objective Bayesian variable selection. J. Am. Stat. Assoc. 101(473), 157–167 (2006). https://doi.org/10.1198/016214505000000646
Chipman, H., George, E.I., McCulloch, R.E.: The practical implementation of Bayesian model selection. In: Lahiri P. (ed.) Model Selection, vol. 38, pp. 65–134. Institute of Mathematical Statistics (2001). https://doi.org/10.1214/lnms/1215540964
Dellaportas, P., Forster, J.J., Ntzoufras, I.: On Bayesian model and variable selection using MCMC. Stat. Comput. 12(1), 27–36 (2002). https://doi.org/10.1023/A:1013164120801
Fouskakis, D., Draper, D.: Comparing stochastic optimization methods for variable selection in binary outcome prediction, with application to health policy. J. Am. Stat. Assoc. 103(484), 1367–1381 (2008). https://doi.org/10.1198/016214508000001048
Friedman, J.H., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2009). ISBN 978-0-387-84857-0. https://doi.org/10.1007/978-0-387-84858-7
George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88(423), 881–889 (1993). https://doi.org/10.2307/2290777
Geweke, J.: Variable selection and model comparison in regression. In: Bernardo, J.M., Berger, J.O., Philip Dawid, A., Smith, A.F.M. (eds.) Bayesian Statistics 5. Proceedings of the Fifth Valencia International Meeting. Oxford University Press (1996). ISBN: 978-0-19-852356-7
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970). https://doi.org/10.2307/1267351
Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T.: Bayesian model averaging: a tutorial. Stat. Sci. 14(4), 382–401 (1999). https://doi.org/10.1214/ss/1009212519
Jamil, H.: ipriorBVS: Bayesian Variable Selection Using I-priors. R package version 0.1.1 (2018). https://github.com/haziqj/ipriorBVS
Jamil, H.: Regression modelling using priors depending on Fisher information covariance kernels (I-priors). Ph.D. thesis, London School of Economics and Political Science (2018)
Jeffreys, H.: An invariant form for the prior probability in estimation problems. Proc. Roy. Soc. A 186(1007), 453–461 (1946). https://doi.org/10.1098/rspa.1946.0056
Kahle, D., Wickham, H.: ggmap: spatial visualization with ggplot2. R J. 5(1), 144–161 (2013)
Kass, R.E., Raftery, A.E.: Bayes factors. J. Am. Stat. Assoc. 90(430), 773–795 (1995). https://doi.org/10.2307/2291091
Kuo, L., Mallick, B.: Variable selection for regression models. Sankhyā: Indian J. Stat. Ser. B 60(1), 65–81 (1998)
Kyung, M., Gill, J., Ghosh, M., Casella, G.: Penalized regression, standard errors, and Bayesian lassos. Bayesian Anal. 5(2), 369–411 (2010). https://doi.org/10.1214/10-BA607
Lee, K.E., Sha, N., Dougherty, E.R., Vannucci, M., Mallick, B.: Gene selection: a Bayesian variable selection approach. Bioinformatics 19(1), 90–97 (2003). https://doi.org/10.1093/bioinformatics/19.1.90
Leisch, F., Dimitriadou, E.: mlbench: Machine Learning Benchmark Problems. R package version 2.1-1 (2010)
Lindley, D.V.: A statistical paradox. Biometrika 44(1–2), 187–192 (1957). https://doi.org/10.1093/biomet/44.1-2.187
Madigan, D., Raftery, A.E.: Model selection and accounting for model uncertainty in graphical models using Occam’s window. J. Am. Stat. Assoc. 89(428), 1535–1546 (1994). https://doi.org/10.2307/2291017
Mallows, C.L.: Some comments on \(C_p\). Technometrics 15(4), 661–675 (1973). https://doi.org/10.2307/1267380
McDonald, G.C., Schwing, R.C.: Instabilities of regression estimates relating air pollution to mortality. Technometrics 15(3), 463–481 (1973). https://doi.org/10.2307/1266852
Miller, A.: Subset selection in regression. Chapman & Hall/CRC (2002). ISBN: 978-1-58488-171-1
Mitchell, T.J., Beauchamp, J.J.: Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 83(404), 1023–1032 (1988). https://doi.org/10.2307/2290129
Ntzoufras, I.: Bayesian modeling using WinBUGS. Wiley (2011). ISBN 978-0-470-14114-4. https://doi.org/10.1002/9780470434567
O’Hara, R.B., Sillanpää, M.J.: A review of Bayesian variable selection methods: what, how and which. Bayesian Anal. 4(1), 85–117 (2009). https://doi.org/10.1214/09-BA403
Ormerod, J.T., You, C., Müller, S.: A variational Bayes approach to variable selection. Electron. J. Stat. 11(2), 3549–3594 (2017). https://doi.org/10.1214/17-EJS1332
Park, T., Casella, G.: The Bayesian Lasso. J. Am. Stat. Assoc. 103(482), 681–686 (2008). https://doi.org/10.1198/016214508000000337
Plummer, M.: JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Hornik, K., Leisch, F., Zeileis, A. (eds.) Proceedings of the Third International Workshop on Distributed Statistical Computing (DSC 2003), Vienna, Austria (2003)
Raftery, A.E., Madigan, D., Hoeting, J.A.: Bayesian model averaging for linear regression models. J. Am. Stat. Assoc. 92(437), 179–191 (1997). https://doi.org/10.1080/01621459.1997.10473615
Robert, C.: On the Jeffreys-Lindley paradox. Philos. Sci. 81(2), 216–232 (2014). arXiv: 1303.5973
SAS Institute Inc.: SAS/STAT(R) 9.2 User’s Guide, 2nd edn. SAS Institute Inc., Cary, NC (2008). ISBN: 978-1-60764-566-5
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978). https://doi.org/10.1214/aos/1176344136
Scott, S.L., Varian, H.R.: Predicting the present with Bayesian structural time series. Int. J. Math. Model. Numer. Optim. 5(1–2), 4–23 (2014). https://doi.org/10.1504/IJMMNO.2014.059942
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996). https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 67(2), 301–320 (2005). https://doi.org/10.1111/j.1467-9868.2005.00503.x
Zellner, A.: On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pp. 233–243. Elsevier, New York (1986)
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Jamil, H., Bergsma, W. (2021). Bayesian Variable Selection for Linear Models Using I-Priors. In: Abdul Karim, S.A. (eds) Theoretical, Modelling and Numerical Simulations Toward Industry 4.0. Studies in Systems, Decision and Control, vol 319. Springer, Singapore. https://doi.org/10.1007/978-981-15-8987-4_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8986-7
Online ISBN: 978-981-15-8987-4
eBook Packages: Intelligent Technologies and Robotics (R0)