A Monte Carlo study of the BE estimator for growth regressions


A recent Monte Carlo study claims that the BE estimator outperforms other panel estimators in terms of average estimation bias in a dynamic specification of the Solow model in levels (Hauk and Wacziarg in J Econ Growth 14(2):103–147, 2009). Our simulation results show that the reported performance of the BE estimator depends on the selected parameterization of the data generating process. Using alternative parameter values, a different model specification, and a restricted cross-section estimator, we find that the BE estimator tends to produce a coefficient of the lagged endogenous variable that is biased toward 1.


  1.

    For BE estimates of a static production function model, see, e.g., Mairesse (1990).

  2.

    The econometrics literature has discussed the implications of cross-country heterogeneity in technology parameters for some time. For instance, Pesaran and Smith (1995) use estimates of labor demand functions to study the bias that results from pooling dynamic heterogeneous panels. Pesaran et al. (1996) use Monte Carlo simulations to show that traditional pooled estimators can be misleading for dynamic panels with slope heterogeneity.

  3.

    In addition to technology heterogeneity and cross-section correlation, the mainstream empirical growth literature has also largely ignored the problem of variable non-stationarity. These topics are emphasized in recent panel time series econometrics (for a survey, see Eberhardt and Teal 2011).

  4.

    This outcome is by no means self-evident because after the first time period, the synthetic cross-country per capita incomes depend on the endogenously generated per capita income of the previous period. Hence, there is no guarantee that the synthetic panel data will evolve just like the observed panel data, but under the assumptions being made they do, as documented for our replication in Table 7 in the Appendix. The appendix also gives more details on the construction of the synthetic data.

  5.

    In a continuous-time specification of Eq. (3), the sums would be replaced by integrals and the difference between the two time-averaged variables would become even smaller.

  6.

    If one has 40 years of data (1960–2000) as in HW and in our simulations below, the time horizon of the synthetic data is 40 years. HW set \(\tau =5\), which implies that \(T=8\). The MRW estimator sets \(\tau =40\), which implies that \(T=1\).

  7.

    Also unlike HW, we use Stata’s simulate command and code a number of additional sub-files for easier and more detailed reporting of results.

  8.

    The Monte Carlo simulations demonstrate the robustness of an estimator conditional on the variation of the true parameters of the DGP; they do not demonstrate the plausibility of alternative economic models.

  9.

    A more detailed discussion of the effect of heterogeneity bias on the BE estimates of \({\hat{b}}_j \left( {j=1,\ldots ,4} \right) \) in a dynamic panel specification is available upon request.

  10.

    We have repeated the simulations reported in Table 2 with synthetic data that include two additional time periods. These synthetic data are based on the moments of PWT 7.1 data (Heston et al. 2012) and schooling data from Barro and Lee (1994) and UNESCO (2012). Our simulation results based on the extended synthetic data are not systematically different from the main results presented in Table 2, as discussed in an earlier version of this paper. Hence, the simulation results reported in Table 2 appear to be robust to employing synthetic data with alternative moments and additional time periods. Detailed results can be generated with our Stata code (available for download from our homepages) by changing the database from “hw” to “pwt”.

  11.

    The results in Tables 2 and 3 for a true \(b_1 =0.832\) are not exactly identical because they are based on different simulation runs with 1000 replications.

  12.

    For the alternative sets of periods, see also the notes to Table 3 and the lower panel of Table 9.

  13.

    It is again acknowledged that a parameterization with \(b_1 =0.25\) and \(\tau =5\) is not plausible from the perspective of the Solow model. Here it is only meant to reveal the extent of the BE bias.

  14.

    As before, the results are not exactly identical because they are based on different simulation runs with 1000 replications.

  15.

    HW also consider the MRW estimator for \(\tau =40\). They appear to have used equation (\(2'\)) for their estimates, which can be compared with our estimates in the last row of Table 4. HW find a large difference between the estimated and the true coefficient of the lagged endogenous variable (0.773 vs. 0.229; their Table 12, \(F=10\,\%\)) but do not comment on this particular result. Their other coefficient estimates are also much further away from the true coefficients than our estimates in the last row of Table 4. In light of their own estimates, it remains unclear why HW (p. 141) conclude that “the MRW and BE biases are very similar in terms of magnitudes and signs” (which is correct for \(\tau =40\) but not shown by HW and incorrect for \(\tau <40\)).

  16.

    Detailed results for BE estimates of equation (\(5'\)) with time averages of the two income variables are not reported but can be generated with our Stata code (available for download from our homepages).

  17.

    In our simulations, measurement error is built in without autocorrelation, and hence the covariance matrix is a simple identity matrix. In the case of autocorrelated measurement errors, the off-diagonal elements would have to contain the degree of autocorrelation.


  1. Baltagi BH, Griffin JM (1984) Short and long run effects in pooled models. Int Econ Rev 25(3):631–645

  2. Barro RJ, Lee JW (1994) A new data set of educational attainment in the world. NBER working paper, 15902, Cambridge MA

  3. Barro RJ, Lee JW (2000) International data on educational attainment: updates and implications. CID working papers 42, Center for International Development at Harvard University, Cambridge MA

  4. Bergstrom AR (1984) Continuous time stochastic models and issues of aggregation over time. In: Griliches Z, Intriligator MD (eds) Handbook of econometrics, vol II. Elsevier, Amsterdam, pp 1145–1212

  5. Coakley J, Fuertes AM, Smith R (2006) Unobserved heterogeneity in panel time series models. Comput Stat Data Anal 50(9):2361–2380

  6. Eberhardt M, Teal F (2011) Econometrics for grumblers: a new look at the literature on cross-country growth empirics. J Econ Surv 25(1):109–155

  7. Hauk WR, Wacziarg R (2009) A Monte Carlo study of growth regressions. J Econ Growth 14(2):103–147

  8. Heston A, Summers R, Aten B (2002) Penn world table version 6.1. Center for International Comparisons of Production, Income and Prices at the University of Pennsylvania, November. URL: http://pwt.econ.upenn.edu/. Accessed 7 Sept 2012

  9. Heston A, Summers R, Aten B (2012) Penn world table version 7.1. Center for International Comparisons of Production, Income and Prices at the University of Pennsylvania, October 2002. URL: http://pwt.econ.upenn.edu/. Accessed 10 Sept 2012

  10. Islam N (1995) Growth empirics: a panel data approach. Q J Econ 110(4):1127–1170

  11. Kiviet JF (2011) Monte Carlo simulations for econometricians. Foundations and Trends® in Econometrics 5(1–2):1–181

  12. Lee K, Pesaran MH, Smith R (1998) Growth empirics: a panel data approach—a comment. Q J Econ 113(1):319–323

  13. Mairesse J (1990) Time-series and cross-sectional estimates on panel data: why are they different and why should they be equal? In: Hartog J, Ridder G, Theeuwes J (eds) Panel data and labor market studies. North Holland, Amsterdam, pp 81–95

  14. Mankiw NG, Romer D, Weil DN (1992) A contribution to the empirics of economic growth. Q J Econ 107(2):407–437

  15. Pesaran MH (2004) General diagnostic tests for cross section dependence in panels. CESifo working paper, 1229, Munich

  16. Pesaran MH, Smith R (1995) Estimating long-run relationships from dynamic heterogeneous panels. J Econom 68(1):79–113

  17. Pesaran MH, Smith R, Im KS (1996) Dynamic linear models for heterogeneous panels. In: Matyas L, Sevestre P (eds) The econometrics of panel data. Advanced studies in theoretical and applied econometrics, vol 33. Kluwer, Amsterdam, pp 145–195

  18. Tiao GC, Wei WS (1976) Effect of temporal aggregation on the dynamic relationship of two time series variables. Biometrika 63(3):513–523

  19. UNESCO (2012) Gross secondary school enrolment rates. URL: http://stats.uis.unesco.org/unesco. Accessed 16 Sept 2012


We are grateful to two anonymous referees and the associate editor Badi Baltagi for many constructive suggestions on earlier versions, to Bill Hauk and Romain Wacziarg for sharing their Stata code, and especially to Bill Hauk for patiently explaining many details of the code to one of us.

Correspondence to Erich Gundlach.



Table 7 Correlation matrices, observed and synthetic data

Construction of the synthetic data

We use the Stata code kindly provided by HW to derive the moments of the observed data, to draw the random variables, and to implement error heteroscedasticity. The following notes briefly summarize our application of the HW approach. More details are reported in HW (see their Sect. 3.1).

The synthetic data for our Monte Carlo simulations are based on the same sources used by HW, namely income, population, and investment data from Penn World Table (PWT) 6.1 (Heston et al. 2002) and schooling data from Barro and Lee (2000). Table 8 lists all variables used in Eq. (1) with their definitions and their original labels.

Table 8 List of variables

The population data from PWT are used to calculate period-specific population growth rates. As is common in the empirical growth literature, the rate of technological change is set to \(g=0.02\) and the depreciation rate is set to \(\delta =0.05\). The time fixed effect \(\eta _t\) is eliminated by measuring all variables as deviations from the cross-country average in each year. The country fixed effect \(\mu _i\), which is treated like an additional variable, is estimated from the PWT sample observations in 1960–2000 by a standard fixed effects regression using Eq. (1). The error term is identified as the residual of the same regression.
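The two data transformations described above can be sketched as follows. This is a minimal illustration, not the HW or Stata implementation: the function names are hypothetical, and the population growth rate is assumed to be the average annual log difference, which the text does not specify.

```python
import math

G = 0.02      # rate of technological change, as set in the text
DELTA = 0.05  # depreciation rate, as set in the text


def ln_n_g_delta(pop_start, pop_end, years):
    """Return ln(n + g + delta) for one country and one period.

    Assumption: n is the average annual log growth rate of population.
    """
    n = (math.log(pop_end) - math.log(pop_start)) / years
    return math.log(n + G + DELTA)


def demean_by_year(values_by_country):
    """Subtract the cross-country mean of a single year from each country's value.

    Applying this year by year removes a common time effect.
    """
    mean = sum(values_by_country.values()) / len(values_by_country)
    return {country: value - mean for country, value in values_by_country.items()}


if __name__ == "__main__":
    print(ln_n_g_delta(100.0, 110.0, 5))
    print(demean_by_year({"A": 1.0, "B": 3.0}))
```

After demeaning, every variable has a cross-section mean of zero in each year, so a common year effect drops out of the regression.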

The basic idea of the Monte Carlo simulations is that the true DGP is known, such that the biasing effects of measurement error and other data problems can be assessed. But the observed data are likely to come with measurement error in the first place. Hence, the covariance matrix of the empirical variables and the corresponding estimate of the error term have to be corrected as explained by HW (p. 116) before the moments of the empirical variables can be drawn.

The country fixed effect is also included in the covariance matrix because it is treated like an additional variable. To allow for the presence and absence of a heterogeneity bias in levels, the corresponding columns and rows are multiplied by the variable hl, which can be set to a value between 0 and 1 to control the strength of the correlation between the fixed effect and the other variables of the DGP. We use \(hl=1\) as the default parameterization in our simulations, i.e., a heterogeneity bias in levels of 100 %.
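A sketch of the hl scaling is given below. It rests on an assumption that the text leaves open: only the covariances between the fixed effect and the other variables are scaled, while the variance of the fixed effect itself (the diagonal element) is left unchanged. The function name and the list-of-lists matrix layout are hypothetical.

```python
def scale_fixed_effect_covariances(cov, fe_index, hl):
    """Return a copy of cov with the fixed effect's off-diagonal entries scaled by hl.

    cov: square covariance matrix as a list of lists
    fe_index: row/column position of the country fixed effect
    hl: strength of the heterogeneity bias, between 0 (none) and 1 (full)
    """
    if not 0.0 <= hl <= 1.0:
        raise ValueError("hl must lie between 0 and 1")
    scaled = [row[:] for row in cov]
    for j in range(len(cov)):
        if j != fe_index:  # assumption: keep the variance on the diagonal as is
            scaled[fe_index][j] *= hl
            scaled[j][fe_index] *= hl
    return scaled


if __name__ == "__main__":
    cov = [[1.0, 0.5], [0.5, 2.0]]
    print(scale_fixed_effect_covariances(cov, fe_index=1, hl=0.0))
```

With \(hl=0\) the fixed effect is uncorrelated with the other variables, and with \(hl=1\) the matrix is returned unchanged.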

Given these adjustments, the observed data for 1960–2000 are averaged into eight periods with an interval length of \(\tau =5\) years. The period averages for \(\ln s^{k}\), \(\ln s^{h}\), and \(\ln \left( {n+g+\delta } \right) \) are based on the values of all five observations within each period. Hence, the averages for the first period are based on the observed data for the years 1960–1964, the averages for the second period are based on the observed data for 1965–1969, and so on. Then the means and standard deviations of the time-averaged variables are derived for each of the eight periods and the cross-section correlation of all variables is calculated for each time period.
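The averaging step can be illustrated with a short sketch, assuming one country's annual series is held in a list of 40 observations (1960–1999). The function name is hypothetical.

```python
def period_averages(annual, tau):
    """Average annual observations into consecutive, non-overlapping blocks of tau years."""
    if len(annual) % tau != 0:
        raise ValueError("number of years must be a multiple of tau")
    return [sum(annual[i:i + tau]) / tau for i in range(0, len(annual), tau)]


if __name__ == "__main__":
    annual = [float(year - 1960) for year in range(1960, 2000)]  # placeholder data
    averages = period_averages(annual, tau=5)
    print(len(averages), averages[0], averages[-1])
```

With \(\tau =5\) the 40 annual observations yield eight period averages, the first covering 1960–1964 and the last 1995–1999.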

The adjusted moments of \(\ln s^{k}\), \(\ln s^{h}\), and \(\ln \left( {n+g+\delta } \right) \) are used together with the moments of the country fixed effect and the moments of \(\ln y\) in 1960 to draw an initial set of right-hand side variables for the first period for 100 cross-section units (countries). That is, Eq. (1) uses these random variables to generate the synthetic per capita income for 100 countries at the end of the first period, which in turn is used with the random variables based on the moments of the time-averaged empirical variables of the second period to generate the synthetic per capita income at the end of the second period, and so on until the final synthetic per capita income is reached. This process implies that the end-of-period income in t is equal to the beginning-of-period income in \(t+1\). Table 9 summarizes the time structure that underlies the generation of the synthetic data.
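The recursion can be sketched for a single country as follows. Eq. (1) is not reproduced in this section, so the sketch assumes a linear dynamic specification with coefficients \(b_1,\ldots ,b_4\) on lagged income, \(\ln s^{k}\), \(\ln s^{h}\), and \(\ln \left( {n+g+\delta } \right) \), plus the country fixed effect and an error term. The time effect is omitted because the variables are demeaned. All names are hypothetical.

```python
def simulate_income(ln_y0, regressors, mu, b, shocks):
    """Roll one country's log income forward period by period.

    ln_y0: log income at the start of the first period
    regressors: one (ln_sk, ln_sh, ln_ngd) tuple per period
    mu: country fixed effect
    b: (b1, b2, b3, b4), assumed coefficients of Eq. (1)
    shocks: one error draw per period
    Returns T + 1 values: initial income followed by each end-of-period income.
    """
    b1, b2, b3, b4 = b
    path = [ln_y0]
    for (ln_sk, ln_sh, ln_ngd), nu in zip(regressors, shocks):
        # End-of-period income in t becomes beginning-of-period income in t + 1.
        ln_y_end = b1 * path[-1] + b2 * ln_sk + b3 * ln_sh + b4 * ln_ngd + mu + nu
        path.append(ln_y_end)
    return path


if __name__ == "__main__":
    regs = [(0.0, 0.0, 0.0)] * 2
    print(simulate_income(1.0, regs, mu=0.0, b=(0.5, 0.0, 0.0, 0.0), shocks=[0.0, 0.0]))
```

Because each period's income feeds into the next, the simulated path depends on all earlier draws, which is why the match with the observed moments is not guaranteed in advance.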

Table 9 Periods and years of the synthetic data

Unlike HW, our Stata code also allows for changes in \(\tau \), i.e., changes in the number of years over which the time-averaged variables are calculated. For instance, if \(T=4\,\,\left( {\tau =10}\right) \), the averages of \(\ln s^{k}\), \(\ln s^{h}\), and \(\ln \left( {n+g+\delta } \right) \) for the first period are calculated from the first two periods of the synthetic data with \(T=8\), initial income refers to 1960, and final income refers to 1969=1970. The synthetic data for the remaining three periods of this example follow along the same lines. The same logic is applied to calculate synthetic data with an alternative number of periods, including the special case of \(T=1\,\,\left( {\tau =40}\right) \), where income in 2000 is used as the dependent variable and income in 1960 is used as the lagged dependent variable.
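The re-aggregation from \(T=8\) to a smaller number of periods can be sketched as below. The function and argument names are hypothetical, and the sketch assumes that the base periods have equal length, so that averaging period averages equals averaging the underlying years.

```python
def aggregate_periods(period_means, income_path, k):
    """Merge k adjacent base periods into one longer period.

    period_means: T base-period averages of one regressor
    income_path: T + 1 income values (initial income plus one end-of-period value per period)
    k: number of base periods per merged period (k = 2 turns tau = 5 into tau = 10)
    """
    T = len(period_means)
    if T % k != 0 or len(income_path) != T + 1:
        raise ValueError("inconsistent number of periods")
    merged_means = [sum(period_means[i:i + k]) / k for i in range(0, T, k)]
    # Keep only the income values at the boundaries of the merged periods.
    merged_income = income_path[::k]
    return merged_means, merged_income


if __name__ == "__main__":
    means = [float(i) for i in range(8)]
    income = [float(i) for i in range(9)]
    print(aggregate_periods(means, income, k=2))
    print(aggregate_periods(means, income, k=8))
```

Setting `k=8` gives the special case \(T=1\), in which only initial and final income remain.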

In the generation of the synthetic data, the error term of the DGP in Eq. (1) is augmented to allow for different degrees of endogeneity bias. The error term is assumed to consist of two parts,

$$\begin{aligned} \nu _{i,t}=\varepsilon _{i,t} +\zeta _{i,t}. \end{aligned}$$

The first part, \(\varepsilon _{i,t} =rc\ln s_{i,t}^k +rc\ln s_{i,t}^h -rc\ln \left( {n_{i,t} +g+\delta } \right) \), captures the assumed degree of correlation between the error term and the independent variables by setting rc to a positive or negative value. The default parameterization of our simulations is an absolute value of \(rc=0.25\), i.e., a residual correlation of 25 %. This correlation is assumed to be positive for \(\ln s^{k}\) and \(\ln s^{h}\), negative for \(\ln \left( {n+g+\delta } \right) \), and constant over all time periods.

The second part, \(\zeta _{i,t}\sim N\left( {\mu _\zeta ,\,\sigma _\zeta ^2}\right) \) with \(\mu _\zeta =\mu \left( {\nu _t} \right) -\mu \left( {\varepsilon _t}\right) \) and \(\sigma _\zeta ^2 =\left[ {\sigma \left( {\nu _t}\right) -\sigma \left( {\varepsilon _t}\right) } \right] ^{2}\), is the white noise term, whose mean and standard deviation are given by the overall moment minus the corresponding moment of the part that is correlated with the independent variables. The definition of the white noise term is necessary to ensure that the moments of the overall error term mimic the moments of the “real” error term.
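The two-part error term can be sketched as follows. The function names are hypothetical. The sketch follows the verbal description, in which the mean and standard deviation of the white noise term equal the overall moment minus the moment of the correlated part.

```python
import random


def correlated_part(ln_sk, ln_sh, ln_ngd, rc=0.25):
    """Part of the error term that is correlated with the regressors (epsilon)."""
    return rc * ln_sk + rc * ln_sh - rc * ln_ngd


def draw_white_noise(mu_nu, sigma_nu, mu_eps, sigma_eps, rng):
    """Draw zeta so that its moments are the overall moments minus those of epsilon."""
    mu_zeta = mu_nu - mu_eps
    # abs() guards against a negative value; the squared difference is the variance.
    sigma_zeta = abs(sigma_nu - sigma_eps)
    return rng.gauss(mu_zeta, sigma_zeta)


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed, for reproducibility of the sketch only
    eps = correlated_part(ln_sk=-1.5, ln_sh=-2.0, ln_ngd=-2.6)
    zeta = draw_white_noise(mu_nu=0.0, sigma_nu=0.3, mu_eps=0.1, sigma_eps=0.1, rng=rng)
    print(eps + zeta)  # overall error term nu = epsilon + zeta
```

Setting `rc=0` switches off the endogeneity bias, so that the error term consists of white noise only.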

Finally, measurement error is added to the variables according to

$$\begin{aligned} \ln \tilde{x}_{i,t} =\ln x_{i,t} +d_{i,t}, \end{aligned}$$

where \(x_{i,t}\) is a variable free of the measurement error, \(\tilde{x}_{i,t}\) is the same variable shocked with measurement error, and \(d_{i,t}\) is the measurement error term. The measurement error term is constructed as \(d_{i,t}\sim N\left( {-me\,\sigma _x^2 /2,\,me\,\sigma _x^2}\right) \) where me captures the assumed degree of measurement error. The default parameterization of our simulations is a value of \(me=0.10\), i.e., a measurement error of 10 %.
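A minimal sketch of the measurement error draw is given below. The function name is hypothetical, and \(\sigma _x^2\) is passed in as an argument because the text does not state how it is computed.

```python
import math
import random


def add_measurement_error(ln_x, sigma2_x, me, rng):
    """Return ln(x) shocked with measurement error d ~ N(-me*sigma2_x/2, me*sigma2_x)."""
    variance = me * sigma2_x
    d = rng.gauss(-variance / 2.0, math.sqrt(variance))
    return ln_x + d


if __name__ == "__main__":
    rng = random.Random(2016)  # fixed seed, for reproducibility of the sketch only
    clean = [0.0] * 50000
    shocked = [add_measurement_error(v, sigma2_x=1.0, me=0.10, rng=rng) for v in clean]
    # In levels, the average multiplicative shock exp(d) should be close to one.
    print(sum(math.exp(v) for v in shocked) / len(shocked))
```

Setting `me=0` returns the variable unchanged, which corresponds to a DGP without measurement error.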

Since the measurement error is assumed to be log-normally distributed in levels, i.e., \(d_{i,t}\) is normally distributed in logs, the moments of \(x_{i,t}\) remain independent of the degree of measurement error. Put differently, the underlying assumption is that the modeling of the measurement error with Eq. (12) does not lead to systematic negative or positive biases when drawing the random variables for the simulations (see footnote 17).
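For the mean, this property can be verified in one step, assuming that \(d_{i,t}\) is independent of \(x_{i,t}\) and writing \(s=me\,\sigma _x^2\) (a shorthand introduced here only for illustration). Using the mean of a log-normal variable,

$$\begin{aligned} E\left[ e^{d_{i,t}}\right] =\exp \left( -\frac{s}{2}+\frac{s}{2}\right) =1, \qquad E\left[ \tilde{x}_{i,t}\right] =E\left[ x_{i,t}\right] \,E\left[ e^{d_{i,t}}\right] =E\left[ x_{i,t}\right] . \end{aligned}$$

The negative mean of \(d_{i,t}\) thus offsets the upward shift that a zero-mean normal shock in logs would otherwise induce in levels.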

Cite this article

Ditzen, J., Gundlach, E. A Monte Carlo study of the BE estimator for growth regressions. Empir Econ 51, 31–55 (2016). https://doi.org/10.1007/s00181-015-1000-5

Keywords


  • Monte Carlo simulations
  • Dynamic panel specification
  • BE estimator
  • Solow model
  • Convergence rate

JEL Classification

  • C15
  • C23
  • O47