The New Palgrave Dictionary of Economics

2018 Edition
Publisher: Macmillan Publishers Ltd

Model Averaging

  • Gernot Doppelhofer
Reference work entry


Model averaging estimates the distribution of quantities of interest across models. It can be used for inference, prediction and policy analysis to address model uncertainty. Three main approaches are discussed: Bayesian model averaging (BMA), empirical Bayes (EB) methods, and frequentist model averaging (FMA). Differences in prior specifications are contrasted using the example of normal linear regression models. Finally, the article discusses implementation issues such as numerical simulation techniques and software for model averaging.


Keywords

Bayes’ rule; Bayesian estimation; Bayesian model averaging; Empirical Bayes methods; Exchangeability; Frequentist model averaging; Homoskedasticity; Likelihood; Markov chain Monte Carlo methods; Metropolis–Hastings algorithm; Model averaging; Model selection criteria; Model uncertainty; Posterior model probabilities; Sensitivity analysis; Statistical decision theory; Stochastic search variable selection

JEL Classifications

C10 C50 D81 E52 O40 

Model averaging allows the estimation of the distribution of unknown parameters and related quantities of interest across different models. The basic principle of model averaging is to treat models and associated parameters as unobservable and estimate their distributions based on observable data. Model averaging can be employed for inference, prediction and policy analysis in the face of model uncertainty. Many areas of economics give rise to model uncertainty, including uncertainty about theory, specification and data issues. A naive approach that ignores model uncertainty generally results in biased parameter estimates, overconfident (too narrow) standard errors and misleading inference and predictions (see Draper 1995). Taking model uncertainty seriously implies a departure from conditioning on a particular model and calculating quantities of interest by averaging across different models instead.

Model averaging is conceptually straightforward. The sample information contained in the likelihood function for a particular model is combined with relative model weights or posterior model probabilities to estimate the distribution of unknown parameters across models. Three main approaches – Bayesian, empirical Bayes, and frequentist – have been developed, and they differ in their underlying statistical foundations and practical implementation.

Bayesian model averaging (BMA) was developed first to systematically deal with model uncertainty. The idea of combining evidence from different models is readily integrated into a Bayesian framework. Jeffreys (1961) laid the foundation for BMA, further developed by Leamer (1978). Hoeting et al. (1999), Wasserman (2000) and Koop (2003) give excellent introductions to BMA. A drawback of the Bayesian approach is that it requires the specification of prior distributions for the unknown parameters. In response, empirical Bayes (EB) approaches have been developed to estimate elements of the prior using observable data. Chipman et al. (2001) argue for a pragmatic approach that introduces objective or frequentist considerations into model averaging. In contrast to Bayesian approaches, frequentist model averaging (FMA) methods were developed only relatively recently. Recent contributions include Yang (2001), Hjort and Claeskens (2003) and Hansen (2007).

Model averaging was not widely used until advances in statistical techniques and computing power facilitated its practical use (see Chib 2001; Geweke and Whiteman 2006). Economic applications of model averaging include economic growth (Fernandez et al. 2001a; Sala-i-Martin et al. 2004), finance (Avramov 2002), policy evaluation (Brock et al. 2003; Levin and Williams 2003), and macroeconomic forecasting (Garratt et al. 2003).

This article is organized as follows. The statistical model averaging framework is introduced in the next section. Different model averaging approaches are illustrated with applications to linear regressions. Finally, implementation issues, including model priors, numerical methods, and software are discussed.

Statistical Framework

Suppose a decision maker observes data Y and wishes to learn about quantities of interest related to an unknown parameter (vector) θ, such as the effect of an economic variable (say, θ > 0 versus θ ≤ 0) or predictions of future observations Yf. The utility (or loss) function of the decision maker describes the relation between the parameter of interest θ and an action a. For example, the decision maker could maximize expected utility
$$ \underset{a}{\max }E\left[u\left(a,\theta \right)|Y\right]=\int u\left(a,\theta \right)p\left(\theta |Y\right) d\theta . $$

In general, the preferred action depends on the preferences of the decision maker and the unconditional distribution of parameters. Alternative preference structures can have important consequences for optimal estimators and implied policy conclusions. Bernardo and Smith (1994) give an accessible introduction to statistical decision theory. In the context of economic policy, Brock et al. (2003) present an interesting discussion of alternative preferences and implied policies.
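As a numerical illustration of the expected-utility calculation in Eq. (1), the sketch below (with hypothetical posterior draws; all numbers are illustrative) searches a grid of actions for the expected-utility maximizer under quadratic loss, in which case the optimal action is the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of theta (e.g. obtained by model averaging)
theta = rng.normal(loc=1.0, scale=0.5, size=10_000)

def expected_utility(a, theta):
    """Monte Carlo estimate of E[u(a, theta) | Y] for quadratic loss."""
    return np.mean(-(a - theta) ** 2)

# Search a grid of candidate actions for the expected-utility maximizer
actions = np.linspace(-2, 4, 601)
best = actions[np.argmax([expected_utility(a, theta) for a in actions])]
```

Under an asymmetric loss function the maximizer would move away from the posterior mean, which is one way alternative preference structures translate into different policy choices.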

A key ingredient in decision making is the posterior distribution of the parameter θ, which can be calculated using Bayes’ rule:
$$ p\left(\theta |Y\right)=\frac{L\left(Y|\theta \right)p\left(\theta \right)}{p(Y)}\propto L\left(Y|\theta \right)p\left(\theta \right). $$

The posterior distribution is therefore proportional to the product of the likelihood function L(Y|θ), which summarizes all information about θ contained in the observed data, and the prior distribution p(θ). In contrast, the classical approach assumes that the parameter θ is fixed (non-random) and does not have a meaningful distribution. The estimator \( \widehat{\theta} \), on the other hand, is viewed as a random variable.
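The proportionality in Eq. (2) can be illustrated numerically: on a grid of parameter values, multiplying likelihood and prior pointwise and renormalizing recovers the posterior. The sketch below assumes a normal likelihood with known variance and a diffuse normal prior; all data and hyper-parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: N = 50 draws from N(2, 1)
y = rng.normal(loc=2.0, scale=1.0, size=50)

# Grid over theta
theta = np.linspace(-5, 10, 2001)

# Log-likelihood of the sample at each grid point (known variance 1)
loglik = -0.5 * ((y[:, None] - theta[None, :]) ** 2).sum(axis=0)

# Diffuse normal prior N(0, 10^2) on theta
logprior = -0.5 * (theta / 10.0) ** 2

# Posterior proportional to likelihood times prior; normalize on the grid
logpost = loglik + logprior
post = np.exp(logpost - logpost.max())
post /= post.sum()

posterior_mean = (theta * post).sum()
```

With a prior this diffuse, the posterior mean is pulled only negligibly away from the sample mean, reflecting the dominance of the likelihood.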

In many economic and, more generally, non-experimental applications, a decision maker may face considerable model uncertainty, given potentially overlapping economic theories. Brock and Durlauf (2001) refer to this as the ‘open-endedness’ of economic theories. There may also be alternative empirical specifications of these theoretical channels. In sum, the number of observations may be smaller than the number of suggested explanations, and the problem may be compounded by data problems, such as missing data or outliers.

Formally, there may be many candidate models M1,…, MK to explain the observed data. A model Mj can be described by a probability distribution p(Y|θj, Mj) with model-specific parameter (vector) θj. In a situation of model uncertainty, the decision-maker evaluates the utility function Eq. (1) using the posterior distribution of θ. The posterior distribution is unconditional with respect to the set of models and is calculated by averaging conditional or model-specific distributions across all models
$$ p\left(\theta |Y\right)=\sum_{j=1}^K{w}_j\cdot p\left({\theta}_j|{M}_j,Y\right), $$
where the model weights wj are proportional to the fit in explaining the observable data. In a Bayesian context, the weights are the posterior model probabilities, wj = p(Mj|Y). Using Bayes’ rule,
$$ p\left({M}_j|Y\right)=\frac{L\left(Y|{M}_j\right)p\left({M}_j\right)}{\sum_{j=1}^KL\left(Y|{M}_j\right)p\left({M}_j\right)}\propto L\left(Y|{M}_j\right)p\left({M}_j\right). $$
The posterior model weights are proportional to the product of prior model probability p(Mj) and model-specific marginal likelihood L(Y|Mj). The marginal likelihood is obtained by integrating a model-specific version of equation Eq. (2) with respect to θj
$$ L\left(Y|{M}_j\right)={\int}_{\theta }L\left(Y|{\theta}_j,{M}_j\right)p\left({\theta}_j|{M}_j\right)d{\theta}_j $$
using the fact that ∫p(θj|Mj, Y) dθj = 1.
When comparing two models, Mi and Mj say, the ratio of posterior model probabilities, or posterior odds ratio, equals the ratio of integrated likelihoods times the prior odds
$$ \frac{p\left({M}_i|Y\right)}{p\left({M}_j|Y\right)}=\frac{L\left(Y|{M}_i\right)p\left({M}_i\right)}{L\left(Y|{M}_j\right)p\left({M}_j\right)}. $$
Similarly, the weight for model Mi relative to the K models under consideration is given by Eq. (4), where the normalizing factor \( {\sum}_{j=1}^KL\left(Y|{M}_j\right)p\left({M}_j\right) \) ensures that the model weights sum to one.
The decision maker may be interested in particular aspects of the unconditional distribution Eq. (3), such as its mean or variance. Leamer (1978) derives the following expressions for the unconditional mean and variance of the parameter θ
$$ E\left(\theta |Y\right)=\sum_{j=1}^Kp\left({M}_j|Y\right)E\left({\theta}_j|Y,{M}_j\right). $$
$$ {\displaystyle \begin{array}{ll} Var\left(\theta |Y\right)&=E\left({\theta}^2|Y\right)-{\left[E\left(\theta |Y\right)\right]}^2\\ {}&=\sum \limits_{j=1}^Kp\left({M}_j|Y\right)\left\{ Var\left({\theta}_j|Y,{M}_j\right)+{\left[E\left({\theta}_j|Y,{M}_j\right)\right]}^2\right\}-{\left[E\left(\theta |Y\right)\right]}^2\\ {}&=\sum \limits_{j=1}^Kp\left({M}_j|Y\right) Var\left({\theta}_j|Y,{M}_j\right)+\sum \limits_{j=1}^Kp\left({M}_j|Y\right){\left[E\left({\theta}_j|Y,{M}_j\right)-E\left(\theta |Y\right)\right]}^2.\end{array}} $$

The expression for the unconditional mean of θ in Eq. (7) is simply the model-weighted sum of conditional means. Notice that the unconditional variance of θ in Eq. (8) exceeds the sum of model-weighted conditional variances by an additional term, reflecting the distance between the estimated conditional mean in each model E(θj|Y, Mj) and the unconditional mean E(θ|Y). Ignoring this last term overestimates the precision of estimated effects and underestimates parameter uncertainty (see Draper 1995).
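The decomposition in Eqs (7) and (8) is straightforward to compute once posterior model probabilities and conditional moments are available. The sketch below uses hypothetical values for three models; note that the unconditional variance exceeds the weighted within-model variances by the between-model term.

```python
import numpy as np

# Hypothetical posterior model probabilities and conditional moments
w = np.array([0.5, 0.3, 0.2])        # p(M_j | Y), summing to one
m = np.array([1.0, 0.4, -0.2])       # E(theta_j | Y, M_j)
v = np.array([0.04, 0.09, 0.25])     # Var(theta_j | Y, M_j)

# Unconditional (model-averaged) mean: weighted sum of conditional means
mean = (w * m).sum()

# Unconditional variance: within-model variance plus between-model spread
var = (w * v).sum() + (w * (m - mean) ** 2).sum()
```

Ignoring the second, between-model term would understate parameter uncertainty, exactly the overconfidence problem noted by Draper (1995).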

The advantage of the Bayesian approach to model averaging is its generality and the explicit treatment of model uncertainty and decision theory. The decision maker simply combines prior information about the distribution of parameters and models with sample information to calculate the unconditional posterior distribution of θ in Eq. (3).

However, there are several problems that can make implementation of BMA difficult in practice (see Hoeting et al. 1999; Chipman et al. 2001):

  1. The specification of the prior distribution of parameters θ requires assumptions about functional forms and unknown hyper-parameters, which in general affect the marginal likelihood Eq. (5) and hence the posterior model weights Eq. (4).

  2. The specification of prior probabilities over the model space p(Mj) might have important effects on posterior model weights Eq. (4).

  3. The number of models K in Eq. (3) can be too large for a complete summation across models, implying the use of simulation techniques to approximate the unconditional distribution p(θ|Y) in Eq. (3).

  4. The choice of utility function Eq. (1) and of the class of models under consideration are further important issues.


These issues are discussed in turn, contrasting the fully Bayesian, empirical Bayes and frequentist approaches.

Linear Regression Example

Many of the implementation problems of model averaging and approaches suggested in the literature can be illustrated using the linear regression example (see Koop 2003). Raftery et al. (1997) and Fernandez et al. (2001b) discuss BMA for linear regression models.

Consider linear regression models of the form
$$ y={x}_1{\beta}_1+\cdots +{x}_k{\beta}_k+\varepsilon = X\beta +\varepsilon, $$

where y is the vector of N observations of the dependent variable and X = [x1,…, xk] is a set of k regressors (including a constant) with associated coefficient vector β. Each model Mj is characterized by a subset of explanatory variables Xj with coefficient vector βj. With k regressors, the set of linear models has K = 2^k elements. The residuals are drawn from a multivariate normal distribution and are assumed to be conditionally homoskedastic, εj ∼ N(0, σ2I). Notice that this implies that the residuals are also conditionally exchangeable (see Bernardo and Smith 1994; Brock and Durlauf 2001).

Suppose the decision maker is interested in the effect of different explanatory variables, represented by slope parameters β with posterior distribution of p(β|Y ). As shown in Eq. (3), the posterior distribution is estimated by weighting conditional distributions of parameters by posterior model probabilities. The relative posterior model weights in Eqs (6) and (4) are proportional to the marginal likelihood and prior model weights.

For the normal regression model, the likelihood function can be written as
$$ {\displaystyle \begin{array}{ll} L\left(y|{\beta}_j,{\sigma}^2\right)&=\frac{1}{{\left(2\pi {\sigma}^2\right)}^{N/2}}\exp \left[-\frac{1}{2{\sigma}^2}{\left(y-{X}_j{\beta}_j\right)}^{\prime}\left(y-{X}_j{\beta}_j\right)\right]\\ {}&\propto \left\{\exp \left[-\frac{1}{2{\sigma}^2}{\left({\beta}_j-{\widehat{\beta}}_j\right)}^{\prime }{X}_j^{\prime }{X}_j\left({\beta}_j-{\widehat{\beta}}_j\right)\right]\right\}\times \left\{{\sigma}^{-\left({\nu}_j+1\right)}\exp \left[-\frac{\nu_j{s}_j^2}{2{\sigma}^2}\right]\right\}.\end{array}} $$
The second line of the likelihood substitutes the ordinary least squares (OLS) estimates for the slope and variance
$$ {\widehat{\beta}}_j={\left({X}_j^{\prime }{X}_j\right)}^{-1}{X}_j^{\prime }y, $$
$$ {s}_j^2=\frac{{\left(y-{X}_j{\widehat{\beta}}_j\right)}^{\prime}\left(y-{X}_j{\widehat{\beta}}_j\right)}{\nu_j}, $$

with degrees of freedom νj = N − kj − 1. The implementation of model averaging – Bayesian, empirical Bayes, or frequentist – requires the specification of prior distributions p(θj) for the model parameters θj = (βj, σ2).
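The model-specific OLS quantities in Eqs (11) and (12) can be computed for every subset of regressors. The following sketch uses hypothetical simulated data (the helper ols_stats is illustrative) and enumerates all 2^k models of Eq. (9):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: y depends on x1 only; x2 and x3 are irrelevant
N = 100
X_all = rng.normal(size=(N, 3))
y = 1.5 * X_all[:, 0] + rng.normal(size=N)

def ols_stats(y, Xj):
    """OLS coefficients, residual variance s_j^2 and SSE for one model."""
    Z = np.column_stack([np.ones(len(y)), Xj])   # include a constant
    beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta_hat
    sse = resid @ resid
    dof = len(y) - Z.shape[1]                    # N - k_j - 1 with k_j slopes
    return beta_hat, sse / dof, sse

# Enumerate all 2^k subsets of the k candidate regressors
k = X_all.shape[1]
models = [subset for r in range(k + 1)
          for subset in itertools.combinations(range(k), r)]
results = {m: ols_stats(y, X_all[:, m]) for m in models}
```

Each entry of `results` then supplies the ingredients (SSEj, kj) needed for the model weights discussed below.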

Bayesian Conjugate Priors

A standard way to specify priors in Bayesian estimation is to assume a prior structure that is analytically and computationally convenient. A conjugate prior distribution leads to a posterior distribution of the same class of distributions when combined with the likelihood. The likelihood Eq. (10) is part of the Normal-Gamma family of distributions, proportional to the product of a normal distribution for the slope βj, conditional on the variance σ2, and an inverse-Gamma distribution for the variance σ2. The conjugate prior therefore takes the form
$$ p\left({\beta}_j|{\sigma}^2,{M}_j\right)\sim N\left({\beta}_{0j},{\sigma}^2{V}_{0j}\right),\qquad p\left({\sigma}^2|{M}_j\right)=p\left({\sigma}^2\right)\sim IG\left({s}_0^2,{\nu}_0\right), $$

where the prior hyper-parameters for slope and variance are denoted by subscript 0. Notice that the error variance is assumed to be drawn from the same distribution across all regression models, reflecting the assumption of conditional homoskedasticity and exchangeability of the residuals.

A drawback of the Bayesian approach is that marginal likelihood and posterior model weights depend on unknown hyper-parameters (β0, V0, s0, v0). Different subjective priors therefore affect the posterior model weights and distribution of parameters, and hence also the decision maker’s action. The standard Bayesian approach to check for robustness with respect to the choice of prior parameters is sensitivity analysis. An alternative strategy is to limit the use of subjective prior information and use objective methods based on observed data.

Empirical Bayes Priors

Empirical Bayes (EB) approaches make use of sample information to specify prior parameters. Different versions of empirical Bayes methods have been proposed in the literature (see Hoeting et al. 1999; George and Foster 2000; Chipman et al. 2001). To limit the importance of prior information, EB methods often use non-informative or diffuse priors that are dominated by the sample information (see Leamer 1978). Jeffreys (1961) proposes non-informative priors to represent lack of prior knowledge and derives a formal relationship to the expected information in the sample.

A drawback of non-informative priors is that they are usually not proper distributions, which can lead to undesirable properties when comparing models with different parameters. In this case, relative model weights can depend on arbitrary constants. However, this problem is not present when comparing models with common parameters, since normalizing constants drop out from relative model weights (see Kass and Raftery 1995). Koop (2003) argues that informative or proper priors should be used for all other (non-common) parameters.

Fernandez et al. (2001b) propose benchmark priors for BMA that limit the subjective prior information to a minimum while maintaining the Bayesian natural conjugate framework. They suggest the following non-informative prior for the error variance, assumed to be the same in all K models:
$$ p\left({\sigma}^2\right)\propto \frac{1}{\sigma^2}. $$
The slope parameter βj is drawn from a normal prior distribution as in Eq. (13), with prior mean β0j = 0 and prior covariance matrix V0j equal to the so-called g-prior suggested by Zellner (1986):
$$ {V}_{0j}={\left({g}_0{X}_j^{\prime }{X}_j\right)}^{-1}. $$
Intuitively, the prior covariance matrix is assumed to be proportional to the sample covariance with a factor of proportionality g0. The g-prior simplifies the specification of prior covariances to choosing a single parameter g0. For example, g0 → 0 corresponds to a completely non-informative prior, and g0 = 1 implies a very informative prior that receives the same weight as the sample information. Based on extensive simulations, Fernandez et al. (2001b) recommend the following benchmark values:
$$ {g}_0=\left\{\begin{array}{lll}1/{k}^2,\hfill & \mathrm{if}\hfill & N\le {k}^2\hfill \\ {}1/N,\hfill & \mathrm{if}\hfill & N>{k}^2\hfill \end{array}.\right. $$

Note that the relative weight of the prior, g0, decreases with the sample size N or with the square of the number of regressors k. If the number of regressors is relatively large (k² ≥ N), the prior is relatively more diffuse.
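The benchmark rule Eq. (16) amounts to a one-line function; the sketch below is a direct transcription:

```python
def benchmark_g0(N, k):
    """Benchmark g-prior factor of Fernandez et al. (2001b), Eq. (16)."""
    return 1.0 / k**2 if N <= k**2 else 1.0 / N
```

For example, with k = 10 regressors the rule gives g0 = 1/100 for samples of up to N = 100 observations, and g0 = 1/N thereafter.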

Using this prior structure, the posterior weights for model Mj can be written as
$$ p\left({M}_j|Y\right)\propto p\left({M}_j\right)\cdot {\left(\frac{1+{g}_0}{g_0}\right)}^{-{k}_j/2}\cdot {SSE}_j^{-\left(N-1\right)/2}. $$

The weight for model p(Mj|Y) depends on three terms: (i) the prior model weight p(Mj), (ii) a penalty term for the number of regressors \( {\left(\left(1+{g}_0\right)/{g}_0\right)}^{-{k}_j/2} \) implying a preference for parsimonious models, and (iii) a term involving the sum of squared errors of the regression SSEj ≡ (y − Xj\( {\widehat{\beta}}_j \))′(y − Xj\( {\widehat{\beta}}_j \)), corresponding to the kernel of the normal likelihood.
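Posterior model weights Eq. (17) are conveniently computed in logs to avoid numerical under- or overflow when N is large. The sketch below (hypothetical SSE values and model sizes; a uniform model prior) normalizes the weights so that they sum to one:

```python
import numpy as np

def bma_weights(sse, kj, N, g0, log_prior=None):
    """Posterior model weights of Eq. (17), computed in logs for stability."""
    sse = np.asarray(sse, dtype=float)
    kj = np.asarray(kj, dtype=float)
    if log_prior is None:                        # uniform prior over models
        log_prior = np.zeros_like(sse)
    logw = (log_prior
            - 0.5 * kj * np.log((1 + g0) / g0)   # size penalty
            - 0.5 * (N - 1) * np.log(sse))       # fit term
    w = np.exp(logw - logw.max())                # subtract max before exp
    return w / w.sum()

# Three hypothetical nested models: SSEs and numbers of regressors
N = 100
w = bma_weights(sse=[80.0, 78.5, 78.4], kj=[1, 2, 3], N=N, g0=1.0 / N)
```

Here the marginal fit improvements of the larger models are too small to offset the size penalty, so the most parsimonious model receives the largest weight.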

Frequentist Sample Dominated Priors

A potential problem with non-informative priors is that the limit of the posterior weights may be very sensitive to the specification of the prior (see Leamer 1978). Alternatively, Leamer (1978) assumes that a proper, conjugate Normal-Gamma prior Eq. (13) is ‘dominated’ by the sample information as the number of observations N grows. For stationary regressors with lim N→∞ (X′jXj)/N converging to a constant matrix, the implied model weight is approximately equal to the (exponentiated) Schwarz (1978) model selection criterion (BIC)
$$ p\left({M}_j|Y\right)\propto p\left({M}_j\right)\cdot {N}^{-{k}_j/2}\cdot {SSE}_j^{-N/2}. $$

On closer inspection, the relative model weights using non-informative g-priors Eq. (17) and the sample-dominated prior Eq. (18) are essentially the same when setting g0 = 1/N in Eq. (16). This is reassuring for a decision maker, since the relative model weights are very similar under an empirical Bayes or frequentist interpretation.
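The BIC weights Eq. (18) follow the same pattern as the g-prior weights, with log N replacing log((1 + g0)/g0) as the penalty per regressor. A sketch with hypothetical SSE values and model sizes:

```python
import numpy as np

def bic_weights(sse, kj, N):
    """Model weights proportional to the exponentiated BIC, as in Eq. (18)."""
    sse = np.asarray(sse, dtype=float)
    kj = np.asarray(kj, dtype=float)
    logw = -0.5 * kj * np.log(N) - 0.5 * N * np.log(sse)
    w = np.exp(logw - logw.max())    # normalize in logs for stability
    return w / w.sum()

# Hypothetical models: SSEs and numbers of regressors
w = bic_weights(sse=[80.0, 78.5, 78.4], kj=[1, 2, 3], N=100)
```

A convenient feature of these weights is that they require only OLS output (SSEj, kj), with no prior hyper-parameters to specify.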

The BIC weights can also be derived from a unit information prior, where the information introduced by the prior corresponds to one datapoint from the sample (see Kass and Wasserman 1995; Raftery 1995). Klein and Brown (1984) give an alternative derivation of the BIC model weights Eq. (18) by minimizing the so-called Shannon information in the prior distribution; this approach also lends support for using the BIC model weights in small samples.

The underlying model space and its interpretation are important issues in the model uncertainty literature. Bernardo and Smith (1994) distinguish between M-closed and M-open environments, where the former includes the true model and the latter does not necessarily. A set of Akaike (AIC) model weights can be derived in the M-open environment as the best approximation to the true distribution (see Burnham and Anderson 2002). The AIC weights have the disadvantage that they are not consistent in M-closed environments.

Prior Over Model Space

An important ingredient to model averaging is the choice of prior model probability. A popular choice is to impose a uniform prior over the space of models
$$ p\left({M}_j\right)=1/K. $$

This prior might represent diffuse information about the set of models, but it does have important implications for the implied distribution of model size.

There are different approaches to modelling the inclusion of explanatory variables in the linear regression models Eq. (9). Mitchell and Beauchamp (1988) assign a discrete prior probability mass p(βi = 0|Mj) to excluding regressor xi from regression model Mj, that is, a ‘spike’ at zero. A more Bayesian approach assigns a mixture of a relatively informative prior at zero (corresponding to the spike) and a more diffuse prior if the variable is included (see George and McCulloch 1993).

An alternative to specifying prior model probabilities is to think about prior model size and the implied probability of including individual variables. Sala-i-Martin et al. (2004) argue that in the context of economic growth regressions a prior model size \( \overline{k} \) smaller than the one implied by uniform priors k/2 might be preferable. Notice that this translates into a prior probability \( \pi =p\left({\beta}_i\ne 0|{M}_j\right)=\overline{k}/k \) of including a regressor xi in model Mj. The implied model probability can then be written as
$$ p\left({M}_j\right)={\pi}^{k_j}\cdot {\left(1-\pi \right)}^{k-{k}_j}. $$

Notice that the prior inclusion probabilities πi and implied prior model weights can also differ across variables, which is used in the ‘stratified’ sampler of the BACE approach by Sala-i-Martin et al. (2004) to speed up numerical convergence.
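The implied prior Eq. (20) treats each regressor as included independently with probability π = k̄/k, so prior model size is binomially distributed with mean k̄. A sketch with hypothetical values of k and k̄:

```python
from math import comb

def prior_model_prob(kj, k, kbar):
    """Prior probability of a model with kj regressors, Eq. (20)."""
    pi = kbar / k            # prior inclusion probability per regressor
    return pi**kj * (1 - pi)**(k - kj)

# Sanity checks: probabilities over all models sum to one,
# and the expected model size equals kbar
k, kbar = 10, 3
total = sum(comb(k, j) * prior_model_prob(j, k, kbar)
            for j in range(k + 1))
mean_size = sum(j * comb(k, j) * prior_model_prob(j, k, kbar)
                for j in range(k + 1))
```

Shrinking k̄ below k/2 thus concentrates prior mass on smaller models without ruling any model out.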

George (1999) observes that, when allowing for a large number of explanatory variables which could be correlated with each other, posterior model probabilities can be spread across models with ‘similar’ regressors. To address this problem, George (1999) proposes dilution priors, which reduce the prior weight on models that include explanatory variables measuring similar underlying theories. Alternatively, one can impose a hierarchical structure on the set of models and variables and partition the model space accordingly (see Chipman et al. 2001; Brock et al. 2003). Doppelhofer and Weeks (2007) propose estimating the degree of dependence or jointness among regressors over the model space. If only prediction is of interest, orthogonalization of the regressors greatly reduces the computational burden of model averaging (see Clyde et al. 1996). The costs are the loss of interpretability of the associated coefficient estimates and the need to recalculate the orthogonal factors as the sample information changes.

Numerical Simulation Techniques

A major challenge for the practical implementation of model averaging is the computational burden of calculating posterior quantities of interest when the model space is potentially very large. In the linear regression example, an exhaustive summation over all 2^k models becomes impractical even for a moderate number of regressors: with 30 regressors there are already more than a billion models.

Recent advances in computing power and development of statistical methods have made numerical approximations of posterior distributions feasible. Chib (2001) gives an overview of computationally intensive methods. Such methods include Markov chain Monte Carlo techniques (Madigan and York 1995), stochastic search variable selection (George and McCulloch 1993), the Metropolis–Hastings algorithm (Chib and Greenberg 1995), and the Gibbs sampler (Casella and George 1992). Chipman et al. (2001) contrast different approaches in the context of Bayesian model selection.

The main idea of Monte Carlo simulation techniques is to estimate the empirical distribution of the parameter θ or related functions of interest g(θ) by sampling from the posterior distribution
$$ E\left[g\left(\theta \right)|Y\right]=\int g\left(\theta \right)p\left(\theta |Y\right) d\theta, $$
where g(θ) could be any function of interest, such as the variance of θ or predicted values of the dependent variable y. Consider the sample counterpart
$$ {\widehat{g}}_S=\frac{1}{S}\sum_{s=1}^Sg\left({\theta}^{(s)}\right), $$
where θ(s) are random i.i.d. samples drawn from p(θ|Y) and S is the number of draws. Provided that E[g(θ)|Y] exists, a weak law of large numbers implies
$$ {\widehat{g}}_S\overset{p}{\to }E\left[g\left(\theta \right)|Y\right]. $$
A central limit theorem implies that
$$ \sqrt{S}\left\{{\widehat{g}}_S-E\left[g\left(\theta \right)|Y\right]\right\}\overset{d}{\to }N\left(0,{\Sigma}_g\right) $$

where Σg is the covariance matrix of g(θ) under the posterior distribution.
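Eqs (21)–(24) can be illustrated with a stand-in posterior: draw S samples, average g(θ), and use the sample standard deviation to form the CLT-based numerical standard error. All distributions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in posterior: p(theta | Y) taken to be N(1, 0.5^2), with g(theta) = theta^2
S = 200_000
theta = rng.normal(1.0, 0.5, size=S)
g = theta ** 2

g_hat = g.mean()                       # Monte Carlo estimate of E[g(theta) | Y]
se = g.std(ddof=1) / np.sqrt(S)        # CLT-based numerical standard error
```

For this stand-in posterior the true value is E[θ²|Y] = 1 + 0.25 = 1.25, and the numerical standard error shrinks at rate 1/√S.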

Markov chain Monte Carlo (MCMC) techniques extend these results by constructing a Markov chain {θ(s), s = 1,…, S} over the parameter (or model) space that simulates from a transition kernel p(θ(s)|θ(s−1)), starting from an initial value θ(0). There are various approaches to constructing a Markov chain that converges to the posterior distribution p(θ|Y). This limiting distribution can be estimated from the simulated values θ(s).

Simulation methods differ with respect to the choice of sampling procedure and transition kernels. A sampling algorithm that uses the underlying structure of the model can greatly improve the efficiency of the simulation. For example, the Gibbs sampler uses the structure of the statistical model to partition parameters and their distribution into blocks, which breaks up the simulation into smaller steps. In the linear regression example, the Gibbs sampler can draw from the conditional distributions for slope and variance parameters Eq. (13) separately. A disadvantage of numerical methods can be the technical challenges in their implementation (for an excellent introduction, see Gilks et al. 1996). Links to software packages and codes that facilitate implementation, such as BACC, BACE, BUGS and the BMA project website, are listed at the end of this article.
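For the linear regression example, a minimal Gibbs sampler alternates between the two blocks described above: β|σ², y is normal around the OLS estimate, and σ²|β, y is inverse-Gamma under the non-informative prior p(σ²) ∝ 1/σ². The sketch below uses simulated data and is an illustration of the blocking idea, not production code.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated regression data with an intercept and one slope
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([0.5, 2.0])
y = X @ beta_true + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ (X.T @ y)

S, burn = 5_000, 500
sigma2 = 1.0
draws = np.empty((S, 2))

for s in range(S):
    # Block 1: beta | sigma2, y ~ N(beta_ols, sigma2 * (X'X)^-1)
    beta = rng.multivariate_normal(beta_ols, sigma2 * XtX_inv)

    # Block 2: sigma2 | beta, y ~ inverse-Gamma, sampled as SSR / chi2_N
    resid = y - X @ beta
    sigma2 = (resid @ resid) / rng.chisquare(N)

    draws[s] = beta

beta_post = draws[burn:].mean(axis=0)   # posterior mean after burn-in
```

Because each block is a standard distribution, no tuning is required; this is the efficiency gain from exploiting the model's conjugate structure.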

An alternative approach is to limit the set of models and rule out dominated models by Occam’s razor (see Hoeting et al. 1999). This can speed up the computation of posterior distributions and can be a useful tool for model selection. Evidence in Raftery et al. (1996) suggests that model averaging leads to important improvements in predictive performance over any single model, and that averaging over the full model space gives a small predictive advantage relative to the restricted set of models. The relative performance of different model averaging techniques and associated model weights depends on sample size and on the stability of the estimated model (see Yuan and Yang 2005; Hansen 2007).

Bibliography

  1. Avramov, D. 2002. Stock return predictability and model uncertainty. Journal of Financial Economics 64: 423–458.
  2. Bernardo, J.M., and A.F.M. Smith. 1994. Bayesian theory. New York: Wiley.
  3. Brock, W.A., and S.N. Durlauf. 2001. Growth empirics and reality. World Bank Economic Review 15: 229–272.
  4. Brock, W.A., S.N. Durlauf, and K. West. 2003. Policy evaluation in uncertain economic environments. Brookings Papers on Economic Activity 2003(1): 235–322.
  5. Burnham, K.P., and D.R. Anderson. 2002. Model selection and multimodel inference: A practical information-theoretic approach. 2nd ed. New York: Springer.
  6. Carlin, B.P., and T.A. Louis. 2000. Bayes and empirical Bayes methods for data analysis. 2nd ed. New York: Chapman & Hall.
  7. Casella, G., and E.I. George. 1992. Explaining the Gibbs sampler. The American Statistician 46: 167–174.
  8. Chib, S. 2001. Markov chain Monte Carlo methods: Computation and inference. In Handbook of econometrics, ed. J. Heckman and E. Leamer, vol. 5. Amsterdam: North-Holland.
  9. Chib, S., and E. Greenberg. 1995. Understanding the Metropolis–Hastings algorithm. The American Statistician 49: 327–335.
  10. Chipman, H., E.I. George, and R.E. McCulloch. 2001. The practical implementation of Bayesian model selection. In Model selection, IMS lecture notes: Monograph series, ed. P. Lahiri. Beachwood: Institute of Mathematical Statistics.
  11. Clyde, M., H. Desimone, and G. Parmigiani. 1996. Prediction via orthogonalized model mixing. Journal of the American Statistical Association 91: 1197–1208.
  12. Doppelhofer, G., and M. Weeks. 2007. Jointness of growth determinants. Journal of Applied Econometrics.
  13. Draper, D. 1995. Assessment and propagation of model uncertainty (with discussion). Journal of the Royal Statistical Society B 57: 45–97.
  14. Fernandez, C., E. Ley, and M.F.J. Steel. 2001a. Model uncertainty in cross-country growth regressions. Journal of Applied Econometrics 16: 563–576.
  15. Fernandez, C., E. Ley, and M.F.J. Steel. 2001b. Benchmark priors for Bayesian model averaging. Journal of Econometrics 100: 381–427.
  16. Garratt, A., K. Lee, M.H. Pesaran, and Y. Shin. 2003. Forecast uncertainties in macroeconomic modelling: An application to the U.K. economy. Journal of the American Statistical Association 98: 829–838.
  17. George, E.I. 1999. Discussion of ‘Bayesian model averaging and model search strategies’ by M.A. Clyde. Bayesian Statistics 6: 175–177.
  18. George, E.I., and D.P. Foster. 2000. Calibration and empirical Bayes variable selection. Biometrika 87: 731–747.
  19. George, E., and R.E. McCulloch. 1993. Variable selection via Gibbs sampling. Journal of the American Statistical Association 88: 881–889.
  20. Geweke, J. 1989. Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57: 1317–1339.
  21. Geweke, J., and C. Whiteman. 2006. Bayesian forecasting. In Handbook of economic forecasting, ed. G. Elliott, C.W.J. Granger, and A. Timmermann, vol. 1. Amsterdam: North-Holland.
  22. Gilks, W., S. Richardson, and D. Spiegelhalter. 1996. Markov chain Monte Carlo in practice. New York: Chapman & Hall.
  23. Hansen, B.E. 2007. Least squares model averaging. Econometrica 75: 1175–1189.
  24. Hjort, N.L., and G. Claeskens. 2003. Frequentist model averaging. Journal of the American Statistical Association 98: 879–899.
  25. Hoeting, J.A., D. Madigan, A.E. Raftery, and C.T. Volinsky. 1999. Bayesian model averaging: A tutorial. Statistical Science 14: 382–417.
  26. Jeffreys, H. 1961. Theory of probability. 3rd ed. Oxford: Clarendon Press.
  27. Kass, R.E., and A.E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90: 773–795.
  28. Kass, R.E., and L. Wasserman. 1995. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association 90: 928–934.
  29. Klein, R.W., and S.J. Brown. 1984. Model selection when there is ‘minimal’ prior information. Econometrica 52: 1291–1312.
  30. Koop, G. 2003. Bayesian econometrics. Chichester: Wiley.
  31. Leamer, E. 1978. Specification searches. New York: Wiley.
  32. Levin, A.T., and J.C. Williams. 2003. Robust monetary policy with competing reference models. Journal of Monetary Economics 50: 945–975.
  33. Madigan, D., and J. York. 1995. Bayesian graphical models for discrete data. International Statistical Review 63: 215–232.
  34. Mitchell, T.J., and J.J. Beauchamp. 1988. Bayesian variable selection in linear regression. Journal of the American Statistical Association 83: 1023–1032.
  35. Raftery, A.E. 1995. Bayesian model selection in social research. Sociological Methodology 25: 111–163.
  36. Raftery, A.E., D. Madigan, and J.A. Hoeting. 1997. Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92: 179–191.
  37. Raftery, A.E., D. Madigan, and C.T. Volinsky. 1996. Accounting for model uncertainty in survival analysis improves predictive performance. Bayesian Statistics 5: 323–349.
  38. Sala-i-Martin, X., G. Doppelhofer, and R.M. Miller. 2004. Determinants of economic growth: A Bayesian averaging of classical estimates (BACE) approach. American Economic Review 94: 813–835.
  39. Schwarz, G. 1978. Estimating the dimension of a model. Annals of Statistics 6: 461–464.
  40. Wasserman, L. 2000. Bayesian model selection and model averaging. Journal of Mathematical Psychology 44: 92–107.
  41. Yang, Y. 2001. Adaptive regression by mixing. Journal of the American Statistical Association 96: 574–588.
  42. Yuan, Z., and Y. Yang. 2005. Combining linear regression models: When and how? Journal of the American Statistical Association 100: 1202–1214.
  43. Zellner, A. 1986. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian inference and decision techniques: Essays in honor of Bruno de Finetti, ed. P.K. Goel and A. Zellner. Amsterdam: North-Holland.

Model Averaging Software and Codes

  1. LeSage’s Econometrics Toolbox:
