Abstract
We propose a Bayesian method to select groups of correlated explanatory variables in a linear regression framework. We do this by introducing a random matrix \(G\), which encodes the group structure, into the prior distribution assigned to the regression coefficients. The groups can thus be inferred by sampling from the posterior distribution of \(G\). We then give a graph-theoretic interpretation of \(G\) as the adjacency matrix of a graph whose connected components are cliques. We discuss the extension of the groups from cliques to more general random graphs, so that the proposed approach can be viewed as a method to find networks of correlated covariates that are associated with the response.
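To make the correspondence between groups and \(G\) concrete, here is a minimal sketch, not the paper's sampler. It assumes the zero-diagonal convention \(G_{ii}=0\) (suggested by the note on \(G_{ii}\)) and assumes groups are given as integer labels, one per covariate; the function names `group_matrix` and `groups_from_matrix` are hypothetical. Each posterior draw of \(G\) would be a matrix of this form, and the groups it encodes are the connected components (here, cliques) of the corresponding graph.

```python
import numpy as np

def group_matrix(labels):
    """Build a symmetric 0/1 matrix G with G[i, j] = 1 when covariates i != j share a group.

    Assumes the zero-diagonal convention G[i, i] = 0, so G is the adjacency
    matrix of a graph whose connected components are cliques.
    """
    labels = np.asarray(labels)
    G = (labels[:, None] == labels[None, :]).astype(int)
    np.fill_diagonal(G, 0)  # no self-loops
    return G

def groups_from_matrix(G):
    """Recover the groups as the connected components of the graph with adjacency G."""
    p = G.shape[0]
    seen = [False] * p
    groups = []
    for start in range(p):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:  # depth-first search from `start`
            i = stack.pop()
            comp.append(i)
            for j in np.flatnonzero(G[i]).tolist():
                if not seen[j]:
                    seen[j] = True
                    stack.append(j)
        groups.append(sorted(comp))
    return groups

# Hypothetical example: 5 covariates in groups {0, 2}, {1, 4}, {3}.
G = group_matrix([0, 1, 0, 2, 1])
print(groups_from_matrix(G))  # [[0, 2], [1, 4], [3]]
```

Going from labels to \(G\) and back loses only the arbitrary naming of the groups, which is why the group structure can be read off any sampled \(G\).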
Notes
We could instead set \(G_{ii}=1\). Since one can easily go from one form to the other by a suitable redefinition of the prior distributions, we choose the definitions which we find easier to implement.
To show this, one has to consider a function \(\phi (L, x)\) in (8) that is either finite or has a finite limit as \(x \rightarrow 1\) (if necessary, by simultaneously letting \(L\) become very small).
Henceforth, by network we mean a connected component of a graph that is not a clique.
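The definition above can be checked mechanically: split a graph into connected components and flag those that are not cliques. The sketch below assumes an undirected graph given as a 0/1 adjacency matrix with zero diagonal; the function names are hypothetical, and treating a single isolated covariate as a (trivial) clique is an assumption of this sketch rather than something stated in the note.

```python
def components(adj):
    """Connected components of an undirected graph given as a 0/1 adjacency matrix (list of lists)."""
    p = len(adj)
    seen = set()
    comps = []
    for s in range(p):
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:  # depth-first search from s
            i = stack.pop()
            comp.append(i)
            for j in range(p):
                if adj[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        comps.append(sorted(comp))
    return comps

def is_clique(adj, nodes):
    """True if every pair of distinct nodes is joined by an edge (vacuously true for one node)."""
    return all(adj[i][j] for i in nodes for j in nodes if i != j)

def split_cliques_and_networks(adj):
    """Label each connected component as a clique or, per the note's definition, a network."""
    cliques, networks = [], []
    for comp in components(adj):
        (cliques if is_clique(adj, comp) else networks).append(comp)
    return cliques, networks

# Hypothetical 6-covariate graph: {0, 1, 2} is a triangle (a clique);
# {3, 4, 5} is the path 3-4-5, connected but missing edge 3-5 (a network).
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
print(split_cliques_and_networks(adj))  # ([[0, 1, 2]], [[3, 4, 5]])
```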
Acknowledgments
This research was carried out while the author was at the Department of Public Health of the Weill Cornell Medical College. It was supported by CTSC Grant UL1-RR024996. The author would like to thank two reviewers for their comments and suggestions.
About this article
Cite this article
Monni, S. Bayesian variable selection for correlated covariates via colored cliques. AStA Adv Stat Anal 98, 143–163 (2014). https://doi.org/10.1007/s10182-013-0218-9