# Model Averaging

**DOI:** https://doi.org/10.1057/978-1-349-95189-5_2075

## Abstract

Model averaging estimates the distribution of quantities of interest across models. Model averaging can be used for inference, prediction and policy analysis to address model uncertainty. Three main approaches are discussed: Bayesian model averaging (BMA), empirical Bayes (EB) methods, and frequentist model averaging (FMA). Differences in prior specifications are contrasted using the example of normal, linear regression models. Finally, the article discusses implementation issues such as numerical simulation techniques and software for model averaging.

## Keywords

Bayes’ rule; Bayesian estimation; Bayesian model averaging; Empirical Bayes methods; Exchangeability; Frequentist model averaging; Homoskedasticity; Likelihood; Markov chain Monte Carlo methods; Metropolis–Hastings algorithm; Model averaging; Model selection criteria; Model uncertainty; Posterior model probabilities; Sensitivity analysis; Statistical decision theory; Stochastic search variable selection

## JEL Classifications

C10; C50; D81; E52; O40

Model averaging allows the estimation of the distribution of unknown parameters and related quantities of interest across different models. The basic principle of model averaging is to treat models and associated parameters as unobservable and estimate their distributions based on observable data. Model averaging can be employed for inference, prediction and policy analysis in the face of model uncertainty. Many areas of economics give rise to model uncertainty, including uncertainty about theory, specification and data issues. A naive approach that ignores model uncertainty generally results in biased parameter estimates, overconfident (too narrow) standard errors and misleading inference and predictions (see Draper 1995). Taking model uncertainty seriously implies a departure from conditioning on a particular model and calculating quantities of interest by averaging across different models instead.

Model averaging is conceptually straightforward. The sample information contained in the likelihood function for a particular model is combined with relative model weights or posterior model probabilities to estimate the distribution of unknown parameters across models. Three main approaches – Bayesian, empirical Bayes, and frequentist – have been developed, and they differ in their underlying statistical foundations and practical implementation.

*Bayesian model averaging* (*BMA*) was developed first to systematically deal with model uncertainty. The idea of combining evidence from different models is readily integrated into a Bayesian framework. Jeffreys (1961) laid the foundation for BMA, further developed by Leamer (1978). Hoeting et al. (1999), Wasserman (2000) and Koop (2003) give excellent introductions to BMA. A drawback of the Bayesian approach is that it requires prior assumptions about the distribution of unknown parameters. In response, *empirical Bayes* (*EB*) approaches have been developed to estimate elements of the prior using observable data. Chipman et al. (2001) argue for a pragmatic approach that introduces objective or frequentist considerations into model averaging. In contrast to Bayesian approaches, *frequentist model averaging* (*FMA*) methods were developed only relatively recently. Recent contributions include Yang (2001), Hjort and Claeskens (2003) and Hansen (2007).

Model averaging was not widely used until advances in statistical techniques and computing power facilitated its practical use (see Chib 2001; Geweke and Whiteman 2006). Economic applications of model averaging include economic growth (Fernandez et al. 2001a; Sala-i-Martin et al. 2004), finance (Avramov 2002), policy evaluation (Brock et al. 2003; Levin and Williams 2003), and macroeconomic forecasting (Garratt et al. 2003).

This article is organized as follows. The statistical model averaging framework is introduced in the next section. Different model averaging approaches are illustrated with applications to linear regressions. Finally, implementation issues, including model priors, numerical methods, and software are discussed.

## Statistical Framework

Suppose a decision maker observes data *Y* and wishes to learn about quantities of interest related to an unknown parameter (vector) *θ*, such as the effect of an economic variable (say *θ* > 0 or *θ* ≤ 0) or predictions of future observations *Y*^{f}. The utility (or loss) function of the decision maker describes the relation between the parameter of interest *θ* and an action *a*. For example, the decision maker could maximize expected utility

\( {\max}_aE\left[U\left(a,\theta \right)|Y\right]={\max}_a\int U\left(a,\theta \right)p\left(\theta |Y\right)d\theta . \)  (1)

In general, the preferred action depends on the preferences of the decision-maker and the unconditional distribution of parameters. Alternative preference structure can have important consequences for optimal estimators and implied policy conclusions. Bernardo and Smith (1994) give an accessible introduction to statistical decision theory. In the context of economic policy, Brock et al. (2003) present an interesting discussion of alternative preferences and implied policies.

The Bayesian approach treats the parameter *θ* as a random variable and summarizes the sample information in the *posterior* distribution of the parameter *θ*, which can be calculated using Bayes’s rule:

\( p\left(\theta |Y\right)=\frac{L\left(Y|\theta \right)p\left(\theta \right)}{p(Y)}\propto L\left(Y|\theta \right)p\left(\theta \right). \)  (2)

The posterior distribution is therefore proportional to the *likelihood* function *L*(*Y*|*θ*), which summarizes all information about *θ* contained in the observed data, and the *prior* distribution *p*(*θ*). In contrast, the classical approach assumes that the parameter *θ* is fixed (non-random) and does not have a meaningful distribution. The estimator \( \widehat{\theta} \) on the other hand is viewed as a random variable.
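The proportionality of posterior to likelihood times prior can be illustrated numerically. The sketch below uses hypothetical simulated data, a normal likelihood with known unit variance, and a standard normal prior, and checks the grid-based posterior mean against the known conjugate result:

```python
import numpy as np

# A minimal numerical sketch of Bayes' rule: posterior ∝ likelihood × prior,
# evaluated on a parameter grid. Data, prior, and grid are hypothetical choices.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=50)      # observed data Y, error sd known to be 1

theta = np.linspace(-3.0, 5.0, 2001)             # grid of parameter values
dtheta = theta[1] - theta[0]
log_prior = -0.5 * theta**2                      # N(0, 1) prior kernel for theta
log_lik = np.array([-0.5 * np.sum((y - t) ** 2) for t in theta])

log_post = log_prior + log_lik                   # Bayes' rule, up to a constant
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta                      # normalize numerically

post_mean = np.sum(theta * post) * dtheta
# In this conjugate setup the exact posterior mean is N*ybar/(N+1),
# shrinking the sample mean toward the prior mean of zero.
print(round(post_mean, 3))
```

The grid approximation reproduces the analytical shrinkage result to numerical accuracy.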

In many economic and more generally non-experimental applications, a decision maker might face considerable model uncertainty given potentially overlapping economic theories. Brock and Durlauf (2001) refer to this as ‘open-endedness’ of economic theories. Also, there might be alternative empirical specifications of these theoretical channels. In sum, the number of observations may be smaller than the number of suggested explanations, and the problem may be compounded by data problems, such as missing data or outliers.

Suppose the decision maker considers a set of models *M*_{1},…, *M*_{K} to explain the observed data. A model *M*_{j} can be described by a probability distribution *p*(*Y*|*θ*_{j}, *M*_{j}) with model-specific parameter (vector) *θ*_{j}. In a situation of model uncertainty, the decision maker evaluates the utility function Eq. (1) using the posterior distribution of *θ*. The posterior distribution is unconditional with respect to the set of models and is calculated by averaging conditional or model-specific distributions across all models:

\( p\left(\theta |Y\right)={\sum}_{j=1}^Kp\left({\theta}_j|Y,{M}_j\right){w}_j, \)  (3)

where the model weights *w*_{j} are proportional to the fit in explaining the observable data. In a Bayesian context, the weights are the *posterior model probabilities*, *w*_{j} = *p*(*M*_{j}|*Y*). Using Bayes’s rule,

\( p\left({M}_j|Y\right)=\frac{L\left(Y|{M}_j\right)p\left({M}_j\right)}{\sum_{j=1}^KL\left(Y|{M}_j\right)p\left({M}_j\right)}, \)  (4)

with prior model probabilities *p*(*M*_{j}) and model-specific marginal likelihood *L*(*Y*|*M*_{j}). The marginal likelihood is obtained by integrating a model-specific version of Eq. (2) with respect to *θ*_{j},

\( L\left(Y|{M}_j\right)=\int L\left(Y|{\theta}_j,{M}_j\right)p\left({\theta}_j|{M}_j\right)d{\theta}_j, \)  (5)

using the fact that the posterior distribution integrates to one, *∫p*(*θ*_{j}|*M*_{j}, *Y*)*dθ*_{j} = 1.

When comparing two models, *M*_{i} and *M*_{j} say, the ratio of posterior model probabilities or *posterior odds ratio* equals the ratio of integrated likelihoods times the prior odds:

\( \frac{p\left({M}_i|Y\right)}{p\left({M}_j|Y\right)}=\frac{L\left(Y|{M}_i\right)}{L\left(Y|{M}_j\right)}\cdot \frac{p\left({M}_i\right)}{p\left({M}_j\right)}. \)  (6)

The posterior probability of model *M*_{i} relative to all *K* models under consideration is given by Eq. (4), where the normalizing factor \( {\sum}_{j=1}^KL\left(Y|{M}_j\right)p\left({M}_j\right) \) ensures consistency of model weights.

Quantities of interest can be calculated from the unconditional posterior distribution of *θ*. For example, the unconditional posterior mean and variance of *θ* are given by

\( E\left(\theta |Y\right)={\sum}_{j=1}^Kp\left({M}_j|Y\right)E\left({\theta}_j|Y,{M}_j\right), \)  (7)

\( \operatorname{Var}\left(\theta |Y\right)={\sum}_{j=1}^Kp\left({M}_j|Y\right)\operatorname{Var}\left({\theta}_j|Y,{M}_j\right)+{\sum}_{j=1}^Kp\left({M}_j|Y\right){\left[E\left({\theta}_j|Y,{M}_j\right)-E\left(\theta |Y\right)\right]}^2. \)  (8)

The expression for the unconditional mean of *θ* in Eq. (7) is simply the model-weighted sum of conditional means. Notice that the unconditional variance of *θ* in Eq. (8) exceeds the sum of model-weighted conditional variances by an additional term, reflecting the distance between the estimated conditional mean in each model *E*(*θ*_{j}|*Y*, *M*_{j}) and the unconditional mean *E*(*θ*|*Y*). Ignoring this last term overestimates the precision of estimated effects and underestimates parameter uncertainty (see Draper 1995).
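The decomposition in Eqs (7) and (8) can be sketched in a few lines; the model weights and conditional moments below are hypothetical numbers chosen for illustration:

```python
import numpy as np

# Hypothetical posterior model weights and conditional moments for three models.
w = np.array([0.5, 0.3, 0.2])            # p(M_j | Y), sums to one
cond_mean = np.array([1.0, 0.6, 1.4])    # E(theta_j | Y, M_j)
cond_var = np.array([0.04, 0.09, 0.05])  # Var(theta_j | Y, M_j)

# Eq. (7): the unconditional mean is the weight-averaged conditional mean.
bma_mean = np.sum(w * cond_mean)

# Eq. (8): the unconditional variance adds a between-model spread term;
# dropping it overstates precision (Draper 1995).
within = np.sum(w * cond_var)
between = np.sum(w * (cond_mean - bma_mean) ** 2)
bma_var = within + between

print(bma_mean, within, between, bma_var)
```

The between-model term is what a naive, single-model analysis ignores.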

The advantage of the Bayesian approach to model averaging is its generality and the explicit treatment of model uncertainty and decision theory. The decision maker simply combines prior information about the distribution of parameters and models with sample information to calculate the unconditional posterior distribution of *θ* in Eq. (3).

In practice, the implementation of model averaging raises several issues:

1. The specification of the prior distribution of parameters *θ* requires assumptions about functional forms and unknown hyper-parameters, which will in general affect the marginal likelihood Eq. (5) and hence the posterior model weights Eq. (4).
2. The specification of prior probabilities over the model space *p*(*M*_{j}) might have important effects on posterior model weights Eq. (4).
3. The number of models *K* in Eq. (3) can be too large for a complete summation across models, implying the use of simulation techniques to approximate the unconditional distribution *p*(*θ*|*Y*) in Eq. (3).
4. Choices of the utility function Eq. (1) and of the class of models are other important issues.

These issues are discussed in turn, contrasting the fully Bayesian, empirical Bayes and frequentist approaches.

## Linear Regression Example

Many of the implementation problems of model averaging and approaches suggested in the literature can be illustrated using the linear regression example (see Koop 2003). Raftery et al. (1997) and Fernandez et al. (2001b) discuss BMA for linear regression models.

Consider the normal linear regression model

\( y={X}_j{\beta}_j+{\varepsilon}_j, \)  (9)

where *y* is the vector of *N* observations of the dependent variable and *X* = [*x*_{1},…, *x*_{k}] is a set of *k* regressors (including a constant) with associated coefficient vector *β*. Each model *M*_{j} is characterized by a subset of explanatory variables *X*_{j} with coefficient vector *β*_{j}. With *k* regressors, the number of linear models equals *K* = 2^{k}. The residuals are drawn from a multivariate normal distribution and are assumed to be conditionally homoskedastic, *ε*_{j} ∼ *N*(0, *σ*^{2}*I*). Notice that this implies that the residuals are also conditionally exchangeable (see Bernardo and Smith 1994; Brock and Durlauf 2001). The likelihood function of model *M*_{j} is then

\( L\left(y|{\beta}_j,{\sigma}^2,{M}_j\right)={\left(2\pi {\sigma}^2\right)}^{-N/2}\exp \left\{-\frac{1}{2{\sigma}^2}{\left(y-{X}_j{\beta}_j\right)}^{\prime}\left(y-{X}_j{\beta}_j\right)\right\}. \)  (10)

Suppose the decision maker is interested in the effect of different explanatory variables, represented by slope parameters *β* with posterior distribution of *p*(*β*|*Y* ). As shown in Eq. (3), the posterior distribution is estimated by weighting conditional distributions of parameters by posterior model probabilities. The relative posterior model weights in Eqs (6) and (4) are proportional to the marginal likelihood and prior model weights.

The model-specific posterior distributions take standard forms, with degrees of freedom *v*_{j} = *N* − *k*_{j} − 1. The implementation of model averaging – Bayesian, empirical Bayes, or frequentist – requires the specification of prior distributions *p*(*θ*_{j}) for the model parameters *θ*_{j} = (*β*_{j}, *σ*^{2}).

## Bayesian Conjugate Priors

A *conjugate prior* distribution leads to a posterior distribution in the same class of distributions when combined with the likelihood. The likelihood Eq. (10) belongs to the Normal-Gamma family of distributions, proportional to the product of a normal distribution for the slope *β*_{j}, conditional on the variance *σ*^{2}, and an inverse-Gamma distribution for the variance *σ*^{2}. The conjugate prior therefore takes the form

\( {\beta}_j|{\sigma}^2,{M}_j\sim N\left({\beta}_{0j},{\sigma}^2{V}_{0j}\right),\kern1em {\sigma}^2\sim IG\left({s}_0,{v}_0\right), \)  (13)

where the prior hyper-parameters for slope and variance are denoted by subscript 0. Notice that the error variance is assumed to be drawn from the same distribution across all regression models, reflecting the assumption of conditional homoskedasticity and exchangeability of the residuals.

A drawback of the Bayesian approach is that marginal likelihood and posterior model weights depend on unknown hyper-parameters (*β*_{0}, *V*_{0}, *s*_{0}, *v*_{0}). Different subjective priors therefore affect the posterior model weights and distribution of parameters, and hence also the decision maker’s action. The standard Bayesian approach to check for robustness with respect to the choice of prior parameters is sensitivity analysis. An alternative strategy is to limit the use of subjective prior information and use objective methods based on observed data.

## Empirical Bayes Priors

Empirical Bayes (EB) approaches make use of sample information to specify prior parameters. Different versions of empirical Bayes methods have been proposed in the literature (see Hoeting et al. 1999; George and Foster 2000; Chipman et al. 2001). To limit the importance of prior information, EB methods often use non-informative or diffuse priors that are dominated by the sample information (see Leamer 1978). Jeffreys (1961) proposes non-informative priors to represent lack of prior knowledge and derives a formal relationship to the expected information in the sample.

A drawback of non-informative priors is that they are usually not proper distributions, which can lead to undesirable properties when comparing models with different parameters. In this case, relative model weights can depend on arbitrary constants. However, this problem is not present when comparing models with common parameters, since normalizing constants drop out from *relative* model weights (see Kass and Raftery 1995). Koop (2003) argues that informative or proper priors should be used for all other (non-common) parameters.

Fernandez et al. (2001b) propose *benchmark priors* for BMA that limit the subjective prior information to a minimum while maintaining the Bayesian natural conjugate framework. They suggest a non-informative prior for the error variance, assumed to be the same in all *K* models, *p*(*σ*) ∝ 1/*σ*. The slope parameter *β*_{j} is drawn from a normal prior distribution as in Eq. (13), with prior mean *β*_{0j} = 0 and prior covariance matrix *V*_{0j} equal to the so-called *g-*prior suggested by Zellner (1986):

\( {V}_{0j}={\left({g}_0{X}_j^{\prime }{X}_j\right)}^{-1}, \)  (16)

with scalar prior parameter *g*_{0}. The *g-*prior simplifies the specification of prior covariances to choosing the single parameter *g*_{0}. For example, *g*_{0} = 0 corresponds to completely non-informative priors, and *g*_{0} = 1 implies a very informative prior receiving equal weight to the sample information. Based on extensive simulations, Fernandez et al. (2001b) recommend the following benchmark values:

\( {g}_0=1/N\ \mathrm{if}\ N>{k}^2,\kern1em {g}_0=1/{k}^2\ \mathrm{if}\ N\le {k}^2. \)

Note that the ratio of prior to sample variance *g*_{0} decreases with the sample size or with the square of the number of estimated parameters. If the number of parameters is relatively large, *k*^{2} ≥ *N*, the prior variance is assumed to be relatively more diffuse.

Combining the benchmark priors with the likelihood Eq. (10), the posterior weight of model *M*_{j} can be written as

\( p\left({M}_j|Y\right)\propto p\left({M}_j\right){\left(\frac{1+{g}_0}{g_0}\right)}^{-{k}_j/2}{\mathit{SSE}}_j^{-N/2}. \)  (17)

The weight for model *p*(*M*_{j}|*Y* ) depends on three terms: (i) the prior model weight *p* (*M*_{j}), (ii) a penalty term for the number of regressors \( {\left(\left(1+{g}_0\right)/{g}_0\right)}^{-{k}_j/2} \) implying a preference for parsimonious models, and (iii) a term involving the sum of squared errors of the regression *SSE*_{j} ≡ (*y* − *X*_{j}*β*_{j})′ (*y* − *X*_{j}*β*_{j}), corresponding to the kernel of the normal likelihood.
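The behaviour of these weights can be illustrated by enumerating a small model space. The sketch below uses hypothetical simulated data, the benchmark *g*_{0} of Fernandez et al. (2001b), and the three-term weight structure described above (a uniform model prior is implicit, as it drops out of the ratios):

```python
import numpy as np
from itertools import combinations

# Illustrative sketch: posterior model weights over all 2^k subsets of a small
# regressor set, using the dimension penalty and SSE kernel described in the text.
rng = np.random.default_rng(1)
N, k = 100, 4
X = rng.normal(size=(N, k))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=N)    # hypothetical DGP: only x1 matters
Xc = np.column_stack([np.ones(N), X])           # constant included in every model

g0 = 1.0 / max(N, k**2)                         # benchmark value of Fernandez et al. (2001b)

def log_weight(idx):
    """Log posterior weight (up to a constant) of the model using regressors idx."""
    Xj = Xc[:, (0,) + idx]
    beta, *_ = np.linalg.lstsq(Xj, y, rcond=None)
    sse = np.sum((y - Xj @ beta) ** 2)
    return -0.5 * len(idx) * np.log((1 + g0) / g0) - 0.5 * N * np.log(sse)

models = [m for r in range(k + 1) for m in combinations(range(1, k + 1), r)]
lw = np.array([log_weight(m) for m in models])
w = np.exp(lw - lw.max())
w /= w.sum()                                    # normalized posterior model weights

best = models[int(np.argmax(w))]
print(best, round(float(w.max()), 3))
```

With a strong signal on the first regressor, the weights concentrate on models that include it, and the dimension penalty discourages including the irrelevant ones.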

## Frequentist Sample Dominated Priors

A drawback of using non-informative *g-*priors for the error covariance matrix is that the limit of posterior weights may be very sensitive to the specification of the prior (see Leamer 1978). Alternatively, Leamer (1978) assumes that a proper, conjugate Normal-Gamma prior Eq. (13) is ‘dominated’ by the sample information as the number of observations *N* grows. For stationary regressors with lim_{N→∞}(*X′*_{j}*X*_{j})/*N* converging to a constant, the implied model weight is approximately equal to the (exponentiated) Schwarz (1978) model selection criterion (BIC):

\( p\left({M}_j|Y\right)\propto p\left({M}_j\right){N}^{-{k}_j/2}{\mathit{SSE}}_j^{-N/2}. \)  (18)

On closer inspection, the relative model weights using non-informative *g-*priors Eq. (17) or sample-dominated prior Eq. (18) are essentially the same, using *g*_{0} = 1/*N* in Eq. (16). This is very reassuring for a decision maker, since the relative model weights are very similar under an empirical Bayesian or frequentist interpretation.
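This near-equivalence can be checked numerically. The sketch below uses hypothetical simulated data and compares weights built from the *g*-prior dimension penalty, with *g*_{0} = 1/*N*, against weights built from the BIC penalty:

```python
import numpy as np
from itertools import combinations

# Sketch of the equivalence noted above: with g0 = 1/N, the g-prior penalty
# ((1+g0)/g0)^(-kj/2) is close to the BIC penalty N^(-kj/2).
rng = np.random.default_rng(2)
N, k = 120, 3
X = rng.normal(size=(N, k))
y = 0.5 + 1.5 * X[:, 0] + rng.normal(size=N)    # hypothetical DGP
Xc = np.column_stack([np.ones(N), X])
g0 = 1.0 / N

def sse(idx):
    """Sum of squared OLS residuals for the model using regressors idx."""
    Xj = Xc[:, (0,) + idx]
    beta, *_ = np.linalg.lstsq(Xj, y, rcond=None)
    return np.sum((y - Xj @ beta) ** 2)

models = [m for r in range(k + 1) for m in combinations(range(1, k + 1), r)]

def weights(penalty):
    """Normalized model weights with log-penalty `penalty` per regressor."""
    lw = np.array([-0.5 * len(m) * penalty - 0.5 * N * np.log(sse(m)) for m in models])
    w = np.exp(lw - lw.max())
    return w / w.sum()

w_g = weights(np.log((1 + g0) / g0))   # g-prior weights, Eq. (17) form
w_bic = weights(np.log(N))             # BIC weights, Eq. (18) form
print(np.round(np.abs(w_g - w_bic).max(), 4))   # maximal weight difference is small
```

The two weighting schemes produce almost identical model rankings, as the text observes.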

The BIC weights can also be derived from a unit information prior, where the information introduced by the prior corresponds to *one* datapoint from the sample (see Kass and Wasserman 1995; Raftery 1995). Klein and Brown (1984) give an alternative derivation of the BIC model weights Eq. (18) by minimizing the so-called Shannon information in the prior distribution; this approach also lends support for using the BIC model weights in small samples.

The underlying model space and its interpretation are important issues in the model uncertainty literature. Bernardo and Smith (1994) distinguish between *M-*closed and *M-*open environments, where the former includes the true model and the latter does not necessarily. A set of Akaike (AIC) model weights can be derived in the *M-*open environment as the best approximation to the true distribution (see Burnham and Anderson 2002). The AIC weights have the disadvantage that they are not consistent in *M-*closed environments.

## Prior Over Model Space

A popular choice is a uniform prior over the model space, *p*(*M*_{j}) = 1/*K* = 2^{−k}. This prior might represent diffuse information about the set of models, but does have important implications for the size of models: with each of the *k* regressors included independently with probability 1/2, the prior expected model size equals *k*/2.

There are different approaches to modelling the inclusion of explanatory variables in the linear regression models Eq. (9). Mitchell and Beauchamp (1988) assign a discrete prior probability mass *p*(*β*_{i} = 0|*M*_{j}) to excluding regressors *x*_{i} from the regression model *M*_{j}, that is a ‘spike’ at zero. A more Bayesian approach assigns a mixture of a relatively informative prior at zero (corresponding to a spike at zero) and a more diffuse prior if the variable is included (see George and McCulloch 1993).

In some applications, a prior expected model size \( \overline{k} \) different from *k*/2 might be preferable. Notice that this translates into a prior probability \( \pi =p\left({\beta}_i\ne 0|{M}_j\right)=\overline{k}/k \) of including a regressor *x*_{i} in model *M*_{j}. The implied model probability can then be written as

\( p\left({M}_j\right)={\pi}^{k_j}{\left(1-\pi \right)}^{k-{k}_j}. \)

Notice that the prior inclusion probabilities π_{i} and implied prior model weights can also differ across variables, which is used in the ‘stratified’ sampler of the BACE approach by Sala-i-Martin et al. (2004) to speed up numerical convergence.
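The implied binomial prior over the model space can be checked in a few lines; the number of regressors *k* and the prior expected model size \( \overline{k} \) below are hypothetical choices:

```python
from math import comb

# Sketch of the binomial prior over the model space implied by a common
# inclusion probability pi = kbar / k; k and kbar are hypothetical choices.
k = 10
kbar = 3
pi = kbar / k                     # prior inclusion probability of each regressor

def prior_model_prob(kj):
    """Prior probability of one particular model that includes kj regressors."""
    return pi**kj * (1 - pi) ** (k - kj)

# Sanity checks: probabilities over all 2^k models sum to one, and the
# prior expected model size equals kbar = k * pi.
total = sum(comb(k, kj) * prior_model_prob(kj) for kj in range(k + 1))
expected_size = sum(kj * comb(k, kj) * prior_model_prob(kj) for kj in range(k + 1))
print(total, expected_size)
```

Setting *π* = 1/2 recovers the uniform prior over the model space as a special case.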

George (1999) observes that, when allowing for a large number of explanatory variables which could be correlated with each other, posterior model probabilities can be spread across models with ‘similar’ regressors. To address this problem, George (1999) proposes *dilution priors*, which reduce the prior weight on models that include explanatory variables measuring similar underlying theories. Alternatively, one can impose a hierarchical structure on the set of models and variables and partition the model space accordingly (see Chipman et al. 2001; Brock et al. 2003). Doppelhofer and Weeks (2007) propose to estimate the degree of dependence or jointness among regressors over the model space. If we are only interested in prediction, the orthogonalization of regressors greatly reduces the computational burden of model averaging (see Clyde et al. 1996). The costs are the loss of interpretation of associated coefficient estimates and the need to recalculate orthogonal factors with changing sample information.

## Numerical Simulation Techniques

A major challenge for the practical implementation of model averaging is the computational burden of calculating posterior quantities of interest when the model space is potentially very large. In the linear regression example, an exhaustive summation over all 2^{k} models becomes impractical for a relatively moderate number of regressors: with 30 regressors there are already 2^{30}, or more than a billion, models.

Recent advances in computing power and development of statistical methods have made numerical approximations of posterior distributions feasible. Chib (2001) gives an overview of computationally intensive methods. Such methods include Markov chain Monte Carlo techniques (Madigan and York 1995), stochastic search variable selection (George and McCulloch 1993), the Metropolis–Hastings algorithm (Chib and Greenberg 1995), and the Gibbs sampler (Casella and George 1992). Chipman et al. (2001) contrast different approaches in the context of Bayesian model selection.

Monte Carlo integration approximates posterior moments of the parameters *θ* or related functions of interest *g*(*θ*) by sampling from the posterior distribution *p*(*θ*|*Y*); *g*(*θ*) could be any function, such as the variance of *θ* or predicted values of the dependent variable *y*. Consider the sample counterpart

\( {\widehat{g}}_S=\frac{1}{S}{\sum}_{s=1}^Sg\left({\theta}^{(s)}\right), \)

where *θ*^{(s)} is a random *i.i.d.* sample drawn from *p*(*θ*|*Y*) and *S* is the number of draws. Provided that *E*[*g*(*θ*)|*Y*] exists, a weak law of large numbers implies

\( {\widehat{g}}_S\to E\left[g\left(\theta \right)|Y\right]\kern1em \mathrm{as}\kern0.5em S\to \infty, \)

and a central limit theorem applies (see Geweke 1989),

\( \sqrt{S}\left({\widehat{g}}_S-E\left[g\left(\theta \right)|Y\right]\right)\to N\left(0,{\Sigma}_g\right), \)

where Σ_{g} is the estimated covariance matrix of *g*(*θ*)|*Y*.
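A minimal Monte Carlo integration sketch, assuming (hypothetically) a posterior that can be sampled from directly:

```python
import numpy as np

# Monte Carlo integration sketch: estimate E[g(theta)|Y] for g(theta) = theta^2,
# assuming the posterior p(theta|Y) is N(1, 0.5^2) and can be sampled directly.
rng = np.random.default_rng(3)
S = 200_000
theta = rng.normal(loc=1.0, scale=0.5, size=S)   # i.i.d. posterior draws
g_hat = np.mean(theta**2)                        # sample counterpart of the integral

# CLT-based numerical standard error of the Monte Carlo estimate.
mc_se = np.std(theta**2, ddof=1) / np.sqrt(S)

# For this posterior the exact value is E[theta^2] = 1^2 + 0.5^2 = 1.25.
print(round(g_hat, 3), round(mc_se, 5))
```

The numerical standard error shrinks at rate \( 1/\sqrt{S} \), so precision is bought with draws rather than analytical integration.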

Markov chain Monte Carlo (MCMC) techniques strengthen these results by constructing a Markov chain moving through the model space {*M*(*s*), *s* = 1,…, *S*} that simulates from a transition kernel *p*(*θ*^{(s)}|*θ*^{(s−1)}), starting from an initial value *θ*^{(0)}. There are various approaches to constructing a Markov chain that converges to the posterior distribution *p*(*θ*|*Y*). This limiting distribution can be estimated from simulated values of *θ*^{(s)}.

Simulation methods differ with respect to the choice of sampling procedure and transition kernels. A sampling algorithm that uses the underlying structure of the model can greatly improve the efficiency of the simulation. For example, the Gibbs sampler uses the structure of the statistical model to partition parameters and their distribution into blocks, which breaks up the simulation into smaller steps. In the linear regression example, the Gibbs sampler can draw from the conditional distributions for slope and variance parameters Eq. (13) separately. A disadvantage of numerical methods can be the technical challenges in their implementation (for an excellent introduction, see Gilks et al. 1996). Links to software packages and codes that facilitate implementation, such as BACC, BACE, BUGS and the BMA project website, are listed at the end of this article.
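The two-block Gibbs step described above can be sketched for a single regression model; the data, prior hyper-parameters, and chain length below are hypothetical choices, and the vague priors stand in for the conjugate forms of Eq. (13):

```python
import numpy as np

# Sketch of a two-block Gibbs sampler for one normal linear regression,
# alternating draws of beta | sigma^2 and sigma^2 | beta.
rng = np.random.default_rng(4)
N = 200
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = 2.0 + 1.0 * x + rng.normal(size=N)           # true parameters (2, 1, sigma^2 = 1)

V0_inv = np.eye(2) / 100.0                       # vague N(0, 100 I) prior on beta (assumed)
a0, b0 = 0.01, 0.01                              # vague inverse-Gamma prior on sigma^2 (assumed)
XtX, Xty = X.T @ X, X.T @ y

beta, sigma2 = np.zeros(2), 1.0
draws = []
for s in range(3000):
    # Block 1: beta | sigma^2, y is multivariate normal.
    Vn = np.linalg.inv(V0_inv + XtX / sigma2)
    mn = Vn @ (Xty / sigma2)
    beta = rng.multivariate_normal(mn, Vn)
    # Block 2: sigma^2 | beta, y is inverse-Gamma; draw via 1 / Gamma.
    resid = y - X @ beta
    sigma2 = 1.0 / rng.gamma(a0 + N / 2.0, 1.0 / (b0 + resid @ resid / 2.0))
    if s >= 500:                                  # discard burn-in draws
        draws.append(np.r_[beta, sigma2])

m = np.array(draws).mean(axis=0)                  # posterior means (intercept, slope, sigma^2)
print(np.round(m, 2))
```

Because each block is a standard distribution, no tuning of proposal densities is needed, which is the practical appeal of the Gibbs sampler over generic Metropolis–Hastings steps.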

An alternative approach is to limit the set of models and rule out dominated models by Occam’s razor (see Hoeting et al. 1999). This can speed up the computation of posterior distributions and can be a useful tool for model selection. Evidence by Raftery et al. (1996) suggests that model averaging leads to important improvements in predictive performance over any single model, and that averaging over the full model space gives a small predictive advantage relative to the restricted set of models. The relative performance of different model averaging techniques and associated model weights depends on the sample size and the stability of the estimated model (see Yuan and Yang 2005; Hansen 2007).


## Bibliography

- Avramov, D. 2002. Stock return predictability and model uncertainty. *Journal of Financial Economics* 64: 423–458.
- Bernardo, J.M., and A.F.M. Smith. 1994. *Bayesian theory*. New York: Wiley.
- Brock, W.A., and S.N. Durlauf. 2001. Growth empirics and reality. *World Bank Economic Review* 15: 229–272.
- Brock, W.A., S.N. Durlauf, and K. West. 2003. Policy evaluation in uncertain economic environments. *Brookings Papers on Economic Activity* 2003(1): 235–322.
- Burnham, K.P., and D.R. Anderson. 2002. *Model selection and multimodel inference: A practical information-theoretic approach*. 2nd ed. New York: Springer.
- Carlin, B.P., and T.A. Louis. 2000. *Bayes and empirical Bayes methods for data analysis*. 2nd ed. New York: Chapman & Hall.
- Casella, G., and E.I. George. 1992. Explaining the Gibbs sampler. *The American Statistician* 46: 167–174.
- Chib, S. 2001. Markov chain Monte Carlo methods: Computation and inference. In *Handbook of econometrics*, ed. J. Heckman and E. Leamer, vol. 5. Amsterdam: North-Holland.
- Chib, S., and E. Greenberg. 1995. Understanding the Metropolis–Hastings algorithm. *The American Statistician* 49: 327–335.
- Chipman, H., E.I. George, and R.E. McCulloch. 2001. The practical implementation of Bayesian model selection. In *Model selection, IMS lecture notes: Monograph series*, ed. P. Lahiri. Beachwood: Institute of Mathematical Statistics.
- Clyde, M., H. Desimone, and G. Parmigiani. 1996. Prediction via orthogonalized model mixing. *Journal of the American Statistical Association* 91: 1197–1208.
- Doppelhofer, G., and M. Weeks. 2007. Jointness of growth determinants. *Journal of Applied Econometrics*.
- Draper, D. 1995. Assessment and propagation of model uncertainty (with discussion). *Journal of the Royal Statistical Society B* 57: 45–97.
- Fernandez, C., E. Ley, and M.F.J. Steel. 2001a. Model uncertainty in cross-country growth regressions. *Journal of Applied Econometrics* 16: 563–576.
- Fernandez, C., E. Ley, and M.F.J. Steel. 2001b. Benchmark priors for Bayesian model averaging. *Journal of Econometrics* 100: 381–427.
- Garratt, A., K. Lee, M.H. Pesaran, and Y. Shin. 2003. Forecast uncertainties in macroeconomic modelling: An application to the U.K. economy. *Journal of the American Statistical Association* 98: 829–838.
- George, E.I. 1999. Discussion of Bayesian model averaging and model search strategies by M.A. Clyde. *Bayesian Statistics* 6: 175–177.
- George, E.I., and D.P. Foster. 2000. Calibration and empirical Bayes variable selection. *Biometrika* 87: 731–747.
- George, E., and R.E. McCulloch. 1993. Variable selection via Gibbs sampling. *Journal of the American Statistical Association* 88: 881–889.
- Geweke, J. 1989. Bayesian inference in econometric models using Monte Carlo integration. *Econometrica* 57: 1317–1339.
- Geweke, J., and C. Whiteman. 2006. Bayesian forecasting. In *Handbook of economic forecasting*, ed. G. Elliott, C.W.J. Granger, and A. Timmermann, vol. 1. Amsterdam: North-Holland.
- Gilks, W., S. Richardson, and D. Spiegelhalter. 1996. *Markov chain Monte Carlo in practice*. New York: Chapman & Hall.
- Hansen, B.E. 2007. Least squares model averaging. *Econometrica* 75: 1175–1189.
- Hjort, N.L., and G. Claeskens. 2003. Frequentist model averaging. *Journal of the American Statistical Association* 98: 879–899.
- Hoeting, J.A., D. Madigan, A.E. Raftery, and C.T. Volinsky. 1999. Bayesian model averaging: A tutorial. *Statistical Science* 14: 382–417.
- Jeffreys, H. 1961. *Theory of probability*. 3rd ed. Oxford: Clarendon Press.
- Kass, R.E., and A.E. Raftery. 1995. Bayes factors. *Journal of the American Statistical Association* 90: 773–795.
- Kass, R.E., and L. Wasserman. 1995. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. *Journal of the American Statistical Association* 90: 928–934.
- Klein, R.W., and S.J. Brown. 1984. Model selection when there is ‘minimal’ prior information. *Econometrica* 52: 1291–1312.
- Koop, G. 2003. *Bayesian econometrics*. Chichester: Wiley.
- Leamer, E. 1978. *Specification searches*. New York: Wiley.
- Levin, A.T., and J.C. Williams. 2003. Robust monetary policy with competing reference models. *Journal of Monetary Economics* 50: 945–975.
- Madigan, D., and J. York. 1995. Bayesian graphical models for discrete data. *International Statistical Review* 63: 215–232.
- Mitchell, T.J., and J.J. Beauchamp. 1988. Bayesian variable selection in linear regression. *Journal of the American Statistical Association* 83: 1023–1032.
- Raftery, A.E. 1995. Bayesian model selection in social research. *Sociological Methodology* 25: 111–163.
- Raftery, A.E., D. Madigan, and J.A. Hoeting. 1997. Bayesian model averaging for linear regression models. *Journal of the American Statistical Association* 92: 179–191.
- Raftery, A.E., D. Madigan, and C.T. Volinsky. 1996. Accounting for model uncertainty in survival analysis improves predictive performance. *Bayesian Statistics* 5: 323–349.
- Sala-i-Martin, X., G. Doppelhofer, and R.M. Miller. 2004. Determinants of economic growth: A Bayesian averaging of classical estimates (BACE) approach. *American Economic Review* 94: 813–835.
- Schwarz, G. 1978. Estimating the dimension of a model. *Annals of Statistics* 6: 461–464.
- Wasserman, L. 2000. Bayesian model selection and model averaging. *Journal of Mathematical Psychology* 44: 92–107.
- Yang, Y. 2001. Adaptive regression by mixing. *Journal of the American Statistical Association* 96: 574–588.
- Yuan, Z., and Y. Yang. 2005. Combining linear regression models: When and how? *Journal of the American Statistical Association* 100: 1202–1214.
- Zellner, A. 1986. On assessing prior distributions and Bayesian regression analysis with *g-*prior distributions. In *Bayesian inference and decision techniques: Essays in honor of Bruno de Finetti*, ed. P.K. Goel and A. Zellner. Amsterdam: North-Holland.

## Model Averaging Software and Codes

- BACC package: http://www2.cirano.qc.ca/~bacc
- BACE website: http://www.nhh.no/sam/bace
- BMA homepage: http://www.research.att.com/~volinsky/bma.html
- BUGS project: http://www.mrc-bsu.cam.ac.uk/bugs
- LeSage’s Econometrics Toolbox: http://www.spatial-econometrics.com