Abstract
Objective
We address several issues concerning standard error bias in pooled time-series cross-section regressions. These include autocorrelation, problems with unit root tests, nonstationarity in levels regressions, and problems with clustered standard errors.
Methods
We conduct unit root tests for crimes and other variables. We use Monte Carlo procedures to illustrate the standard error biases caused by the above issues in pooled studies. We replicate prior research that uses clustered standard errors with difference-in-differences regressions and only a small number of policy changes.
Results
Standard error biases in the presence of autocorrelation are substantial when standard errors are not clustered. Importantly, clustering greatly mitigates bias resulting from the use of nonstationary variables in levels regressions, although in some circumstances clustering can fail to correct for standard error biases due to other reasons. The “small number of policy changes” problem can cause extreme standard error bias, but this can be corrected using “placebo laws”. Other biases are caused by weighting regressions, having too few units, and having dissimilar autocorrelation coefficients across units.
Conclusions
With clustering, researchers can usually conduct regressions in levels even with nonstationary variables. They should, however, be leery of pitfalls caused by clustering, especially when conducting difference-in-differences analyses.
Notes
 1.
The main alternative TSCS design is the random effects model. Random effects parameter estimates are biased and inconsistent in the presence of unobserved heterogeneity (Wooldridge 2002). The results here are not directly applicable to random effects regressions.
 2.
 3.
We are concerned here with issues that affect only standard errors in TSCS studies, and not issues causing biased coefficients and, thus, usually t-ratios as well. The major cause of the latter is missing variable bias. A widely discussed example is spatial dependency, where variables are related to their counterparts in other units.
 4.
It should be noted that addressing autocorrelation by clustering does not fix problems associated with omitted variables if the autocorrelation is caused by “misspecified dynamics,” e.g., not including lags in a dynamic regression model.
 5.
Differencing stationary data typically generates negative serial correlation in the residuals. This will cause the t-ratios to be underestimated, but that can be corrected by clustering in panels or using Newey and West (1987) standard errors in single time series.
 6.
Except when T = 2, in which case the FE and the first-difference estimates are identical.
 7.
 8.
 9.
https://bjs.gov/ucrdata. These data are corrections of the original data published by the Federal Bureau of Investigation. The Bureau of Justice Statistics has not revised earlier data.
 10.
 11.
We also used two older tests, the ADF test and the Phillips and Perron (1988) test. They found more state series to be stationary, but never a majority, especially for rape, robbery, and assault.
 12.
Similarly, Spelman (2017) found that police is nonstationary in the great majority of cities.
 13.
We include the latter because the lagged dependent variable was often used to mitigate autocorrelation and is still often used as an important control variable. See the “Appendix” concerning the possibility that the lagged dependent variable can bias coefficients.
 14.
The simulations were conducted without year dummies, an important element of the fixed effects model, because in the program the year effects are random. The results in Table 5 are the same when year effects are included. We explored using block bootstrapping in the simulations, and the critical values are approximately two to five percent larger, even if rho is set at zero. In practice, bootstrapping is probably not useful for crime regressions because we find that it usually fails when year effects are entered.
 15.
We could not use only 1 or 2 states because the variance matrix was nonsymmetric or highly singular.
 16.
In separate simulations, we found that the standard error biases are very similar when all variables are stationary or when only 20 years are used. The biases are slightly less throughout when adding a lagged dependent variable to the clustered regression. The biases for OLS, with and without a lagged dependent variable, are slightly higher than in Table 5 when only a few states are included. Also, the standard error biases for OLS are approximately a third less when variables are stationary or when 20 years are used. Simulations with block bootstrapping produced critical values similar to those in Fig. 1.
 17.
The general results apply when the nonstationary variables are substituted with variables that are stationary with large autocorrelation coefficients. The results are similar when the dependent variable, rather than the independent variable, has heterogeneous autocorrelation coefficients. The standard error bias is greatly reduced by entering a lagged dependent variable.
 18.
This simulation assumes 50 states and 50 years, and it assumes that the dependent variable is nonstationary in all states. The results appear to be robust to other situations. They are very similar with 20 years and with stationary dependent variables. Critical values for OLS are quite uniform.
 19.
MacKinnon and Webb (2017) show that wild bootstrap-t methods do not work in this context.
 20.
The authors later published corrections to these results that did not change the overall conclusion or the order of magnitude of the estimated effects (Webster et al. 2014b). We thank Daniel Webster for supplying the data used in their paper. We believe that the authors can be faulted for not showing OLS results, where presumably the standard error is larger than with clustering, which should not be the case.
 21.
We use 999 observations instead of 1000 so that there is no need to average between two observations to get the 975th observation.
 22.
The 3.05 critical value for the dummy compares to the critical value of 2.72 for eight dummies in Fig. 3. The program, data, and log files for these examples are in the online appendix.
 23.
We thank John Donohue for supplying the data.
 24.
The 2.73 figure is the same as the critical value for eight state dummies in Fig. 3. Other examples of dummies or trends in a small number of states, with clustering, are Kovandzic et al (2004) and Crifasi et al (2015). Many earlier studies used single state variables with OLS, where the standard errors are biased due to autocorrelation, as discussed above.
 25.
In the past, the major reason for weighting was to mitigate heteroskedasticity, but this correction is now routine using robust regression procedures, which are automatically included when clustering standard errors in Stata. The results of the simulation here are very similar when heteroskedasticity is created by introducing the dissimilar state variation into the error term.
 26.
Table 8 is based on simulations using 30 years. The results vary little with the number of years, the number of nonstationary dependent variables, and whether a lagged dependent variable is entered. The results are also the same when dummies are entered for all states.
 27.
References
Abadie A, Diamond A, Hainmueller J (2010) Synthetic control methods for comparative case studies: estimating the effect of California’s tobacco control program. J Am Stat Assoc 105:493–505
Abhay A, Donohue JJ, Zhang A (2014) The impact of right to carry laws and the NRC Report: the latest lessons for the empirical evaluation of law and policy. NBER Working Paper No. 18294 http://www.nber.org/papers/w18294
Arellano M (1987) Computing robust standard errors for within-groups estimators. Oxf Bull Econ Stat 49(4):431–434
Bertrand M, Duflo E, Mullainathan S (2002) How much should we trust differences-in-differences estimates? NBER Working Paper No. 8841 http://www.nber.org/papers/w8841
Bertrand M, Duflo E, Mullainathan S (2004) How much should we trust differences-in-differences estimates? Q J Econ 119(1):249–275
Bound J, Jaeger DA, Baker RM (1995) Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J Am Stat Assoc 90:443–450
Breitung J (2000) The local power of some unit root tests for panel data. In: Baltagi BH (ed) Advances in econometrics, volume 15: nonstationary panels, panel cointegration, and dynamic panels. JAI Press, Amsterdam, pp 161–178
Breitung J, Pesaran MH (2008) Unit roots and cointegration in panels. In: Matyas L, Sevestre P (eds) The economics of panel data. Springer, Berlin, pp 279–302
Breusch TS (1978) Testing for autocorrelation in dynamic linear models. Aust Econ Pap 17:334–355
Cameron AC, Miller DL (2015) A practitioner’s guide to cluster-robust inference. J Hum Resour 50(2):317–372
Chalfin A, Haviland AM, Raphael S (2012) What do panel studies tell us about a deterrent effect of capital punishment? A critique of the literature. J Quant Criminol 29:5–43
Conley TG, Taber CR (2011) Inference with “difference in differences” with a small number of policy changes. Rev Econ Stat 93:113–125
Crifasi CK, Meyers JS, Vernick JS, Webster DW (2015) Effects of changes in permit-to-purchase handgun laws in Connecticut and Missouri on suicide rates. Prev Med 79:43–49
Dickey DA, Fuller WA (1979) Distribution of the estimators for autoregressive time series with a unit root. J Am Stat Assoc 74:427–431
Donohue JJ, Wolfers J (2006) Uses and abuses of statistical evidence in the death penalty debate. Stanford L Rev 58:791–845
Elliott G, Rothenberg TJ, Stock JH (1996) Efficient tests for an autoregressive unit root. Econometrica 64:813–831
Glaeser EL, Sacerdote B, Scheinkman JA (1996) Crime and social interactions. Q J Econ 3:507–548
Godfrey LG (1978) Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica 46:1293–1301
Granger CWJ, Newbold P (1974) Spurious regressions in econometrics. J Econom 2:111–120
Hadri K (2000) Testing for stationarity in heterogeneous panel. Econom J 3:148–161
Hamilton JD (1994) Time series analysis. Princeton University Press, Princeton
Hansen CB (2007) Asymptotic properties of a robust variance matrix estimator for panel data when T is large. J Econom 141(2):597–620
Harris RDF, Tzavalis E (1999) Inference for unit roots in dynamic panels where the time dimension is fixed. J Econom 91:201–226
Helland E, Tabarrok A (2004) Using placebo laws to test “more guns, less crime”. Adv Econ Anal Pol 4(1):1–7
Hlouskova J, Wagner M (2006) The performance of panel unit root and stationarity tests: results from a large scale simulation study. Econ Rev 25:85–116
Im KS, Pesaran MH, Shin Y (2003) Testing for unit roots in heterogeneous panels. J Econom 115:53–74
Judson RA, Owen AL (1999) Estimating dynamic panel data models: a guide for macroeconomists. Econ Lett 65:9–15
Kim JH, Choi I (2017) Unit roots in economic and financial time series: a re-evaluation at the decision-based significance levels. Econometrics 5(3):41
Kovandzic TV, Sloan JS, Vieraitis LM (2004) “Striking out” as crime reduction policy: the impact of “three strikes” laws on crime rates in U.S. cities. Justice Q 21:207–239
Kwiatkowski D, Phillips PCB, Schmidt P, Shin Y (1992) Testing the null hypothesis of stationarity against the alternative of a unit root. Econom J 54:91–115
Levin A, Lin CF, Chu SJ (2002) Unit roots in panel data: asymptotic and finite-sample properties. J Econom 108:1–24
MacKinnon JG, Webb MD (2017) Wild bootstrap inference for wildly different cluster sizes. J Appl Econom 32(1):233–254
Maddala GS, Wu S (1999) A comparative study of unit root tests with panel data and a new simple test. Oxf Bull Econ Stat 61:631–652
Marvell TB (2010) Prison population and crime. In: Benson BL, Zimmerman PR (eds) Handbook on the economics of crime. Edward Elgar, Northampton, pp 145–183
Moody CE (2016) Fixed-effects panel data models: to cluster or not to cluster. SSRN: https://ssrn.com/abstract=2840273
Newey WK, West KD (1987) A simple, positive semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55:703–708
Nickell S (1981) Biases in dynamic models with fixed effects. Econometrica 49:1417–1426
Phillips PCB, Perron P (1988) Testing for a unit root in time series regression. Biometrika 75:335–346
Rosenbaum PR (1996) Observational studies and nonrandomized experiments. In: Ghosh S, Rao CR (eds) Handbook of statistics, vol 13, pp 181–197
Spelman W (2008) Specifying the relationship between prison and crime. J Quant Criminol 24:149–178
Spelman W (2017) The murder mystery: police effectiveness and homicide. J Quant Criminol 33:859–886
Stock JH, Watson MW (2003) Introduction to econometrics. Pearson Education, Boston
Webster D, Crifasi CK, Vernick JS (2014a) Effects of the repeal of Missouri’s handgun purchaser licensing law on homicides. J Urban Health 91(2):293–302
Webster D, Crifasi CK, Vernick JS (2014b) Erratum to: Effects of the repeal of Missouri’s handgun purchaser licensing law on homicides. J Urban Health 91(3):598–601
Wooldridge JM (2002) Econometric analysis of cross section and panel data. MIT Press, Cambridge
Wooldridge JM (2016) Introductory econometrics, 6th edn. Cengage Learning, Boston
Acknowledgements
We thank the editor and three anonymous referees for constructive comments on an earlier draft.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix: Fixed Effects Regression and Monte Carlo
The fixed effects (FE) regression model has the advantage of being unbiased in the presence of unobserved heterogeneity. That is, if each state has long-run, permanent features that are correlated both with the dependent variable and the independent variables in the model, then any regression procedure, such as the random effects model or the pooled ordinary least squares model, that uses variation across states will be biased and inconsistent. This is very likely to be the case in criminology since Massachusetts, for example, is permanently different from Louisiana because of history, culture, climate, and a number of other dimensions. The same can be said for Arizona and Vermont, Hawaii and any other state.
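The FE model has the form^{Footnote 27}
$$y_{it} = \alpha_{i} + \gamma_{t} + \sum\limits_{k = 1}^{K} {\beta_{k} X_{k,it} } + u_{it} \quad (6)$$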
where i = 1, …, N, t = 1, …, T, X_{k,it} is the value of the kth regressor for state i in year t, α_{i} are state-specific fixed effects, and γ_{t} are year-specific fixed effects. This model requires four assumptions (assuming one regressor for simplicity):
that is, the conditional value of the error term is zero, given the value of the regressor(s);
that is, the variable(s) over all the years in one state are distributed identically but independent of the same variable(s) over the same time span in other states;
have nonzero finite fourth moments; this assumption is important for asymptotic results because it limits the probability of observing extreme values of the regressor(s) or the errors;
the errors are uncorrelated over time, conditional on the regressor(s).
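Stated formally, the four assumptions above can be sketched in standard textbook notation as follows. This is an illustrative rendering, not necessarily the authors' own display: the lowercase \(x_{it}\) for the single regressor, the conditioning on \(\alpha_{i}\), and the labels (i) to (iv) are assumptions made here.

```latex
\begin{align*}
&\text{(i)}   && E(u_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_{i}) = 0 \\
&\text{(ii)}  && (x_{i1}, \ldots, x_{iT}, u_{i1}, \ldots, u_{iT}),\; i = 1, \ldots, N,
                 \text{ are i.i.d. draws from their joint distribution} \\
&\text{(iii)} && (x_{it}, u_{it}) \text{ have nonzero finite fourth moments} \\
&\text{(iv)}  && \operatorname{cov}(u_{it}, u_{is} \mid x_{i1}, \ldots, x_{iT}, \alpha_{i}) = 0
                 \quad \text{for } t \neq s
\end{align*}
```

Assumption (iv) is the one that fails under autocorrelation, which is why the clustered variance estimator discussed later in this appendix matters.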
With these assumptions we can estimate the fixed-effects model, generating unbiased and consistent estimates. (The FE model is not efficient because it ignores cross-section variation.) The model is estimated by applying ordinary least squares to the demeaned variable(s), using the state means \(\bar{y}_{i} = (1/T)\sum_{t = 1}^{T} y_{it}\), \(\bar{x}_{i} = (1/T)\sum_{t = 1}^{T} x_{it}\), and \(\bar{u}_{i} = (1/T)\sum_{t = 1}^{T} u_{it}\). The cross-section equation is
Subtracting the cross-section equation from (6), still assuming only one regressor, yields
The fixed effects have been “swept out” or “absorbed.” We can write this equation as
The fixed-effects estimator is just OLS applied to (13). This estimator is also known as the “within” estimator because it only uses variation within each state.
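The demeaning steps can be written out as a short derivation. This is a sketch assuming one regressor and, as a simplification not stated in the text, no year effects (in practice the \(\gamma_{t}\) can be handled by adding year dummies to the regressors). The tilde notation for demeaned variables is also an assumption.

```latex
\begin{align*}
y_{it} &= \alpha_{i} + \beta x_{it} + u_{it}
  && \text{(model with one regressor)} \\
\bar{y}_{i} &= \alpha_{i} + \beta \bar{x}_{i} + \bar{u}_{i}
  && \text{(average over } t \text{ within state } i\text{)} \\
y_{it} - \bar{y}_{i} &= \beta (x_{it} - \bar{x}_{i}) + (u_{it} - \bar{u}_{i})
  && \text{(difference; } \alpha_{i} \text{ cancels)} \\
\tilde{y}_{it} &= \beta \tilde{x}_{it} + \tilde{u}_{it}
  && \text{(demeaned form)} \\
\hat{\beta} &= \frac{\sum_{i = 1}^{N} \sum_{t = 1}^{T} \tilde{x}_{it} \tilde{y}_{it}}
                    {\sum_{i = 1}^{N} \sum_{t = 1}^{T} \tilde{x}_{it}^{2}}
  && \text{(OLS on the demeaned data)}
\end{align*}
```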
Identical results can be achieved by regressing y on x and a set of dummy variables, one for each state such that the dummy for state i is one if state = state i, otherwise zero. This is the “least squares dummy variable” or LSDV model.
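The equivalence of the within estimator and the LSDV model can be checked numerically. The following is a minimal sketch on simulated data, using only the Python standard library; the sample sizes, the true slope of 0.5, and the helper names `solve`, `ols`, and `demean` are assumptions made for illustration and do not come from the authors' Stata programs.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

random.seed(1)
N, T, beta = 4, 10, 0.5            # hypothetical panel dimensions and slope
state, x, y = [], [], []
for i in range(N):
    alpha_i = random.uniform(0, 100)
    for t in range(T):
        xi = 0.05 * alpha_i + random.gauss(0, 1)   # regressor correlated with the fixed effect
        state.append(i)
        x.append(xi)
        y.append(alpha_i + beta * xi + random.gauss(0, 1))

def demean(v):
    """Subtract each state's own mean from its observations."""
    means = [sum(v[j] for j in range(len(v)) if state[j] == i) / T for i in range(N)]
    return [v[j] - means[state[j]] for j in range(len(v))]

# Within estimator: OLS of demeaned y on demeaned x (no intercept needed).
xt, yt = demean(x), demean(y)
b_within = sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)

# LSDV: OLS of y on x plus one dummy per state (no common intercept).
X = [[x[j]] + [1.0 if state[j] == i else 0.0 for i in range(N)] for j in range(N * T)]
b_lsdv = ols(X, y)[0]

print(b_within, b_lsdv)            # the two slope estimates agree up to rounding
```

The dummy-variable route becomes expensive as the number of states grows, which is one practical reason the demeaning route is usually preferred.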
Generalizing to multiple regression, define the matrix of observations on the demeaned regressors for state i as \(\tilde{X}_{i}\) so that
The asymptotic variance–covariance matrix is
where \(\hat{\sigma}_{u}^{2} = \sum_{i = 1}^{N} \sum_{t = 1}^{T} \hat{u}_{it}^{2} / (N(T - 1) - K)\) is a consistent estimator of the error variance and
The square roots of the principal diagonal of the AVAR matrix are the standard errors.
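In matrix form, the estimator and its conventional variance can be sketched as below. This is the standard textbook form, offered as an assumption about what the surrounding text describes: \(\tilde{y}_{i}\) (the T × 1 vector of demeaned outcomes for state i) and \(\tilde{x}_{it}\) (the K × 1 vector of demeaned regressors for state i in year t) are notation introduced here.

```latex
\begin{align*}
\hat{\beta} &= \Bigl( \sum_{i = 1}^{N} \tilde{X}_{i}' \tilde{X}_{i} \Bigr)^{-1}
               \sum_{i = 1}^{N} \tilde{X}_{i}' \tilde{y}_{i} \\
\widehat{\operatorname{AVAR}}(\hat{\beta}) &= \hat{\sigma}_{u}^{2}
               \Bigl( \sum_{i = 1}^{N} \tilde{X}_{i}' \tilde{X}_{i} \Bigr)^{-1} \\
\hat{u}_{it} &= \tilde{y}_{it} - \tilde{x}_{it}' \hat{\beta}
\end{align*}
```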
Clustered Standard Errors
The clustered asymptotic variance–covariance matrix (Arellano 1987) is a modified sandwich estimator (White 1984, Chapter 6):
The “meat” of the sandwich contains the estimated covariances among the error terms. The residuals are “clustered” in the sense that only covariances from state i are used, covariances from other states are ignored. The formula therefore corrects for heteroskedasticity (using the squared terms) and autocorrelation using the remaining terms.
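For intuition, the clustered estimator in its usual form is \((\sum_{i} \tilde{X}_{i}'\tilde{X}_{i})^{-1} (\sum_{i} \tilde{X}_{i}'\hat{u}_{i}\hat{u}_{i}'\tilde{X}_{i}) (\sum_{i} \tilde{X}_{i}'\tilde{X}_{i})^{-1}\), where \(\hat{u}_{i}\) is the residual vector for state i. The sketch below computes the one-regressor special case next to the conventional standard error on simulated data in which both the regressor and the error are AR(1). The panel dimensions, the autocorrelation coefficient of 0.8, and the function names are assumptions for illustration, and no finite-sample correction of the kind applied by statistical packages is included.

```python
import random

def ar1(T, rho, burn=50):
    """Return T draws from an AR(1) process with unit-variance shocks."""
    v, out = 0.0, []
    for t in range(T + burn):
        v = rho * v + random.gauss(0, 1)
        if t >= burn:
            out.append(v)
    return out

def fe_standard_errors(panel):
    """panel: list of (x_list, y_list), one pair per state.

    Returns (beta, se_conventional, se_clustered) for a one-regressor
    fixed-effects regression estimated on state-demeaned data.
    """
    demeaned = []
    for x, y in panel:
        mx, my = sum(x) / len(x), sum(y) / len(y)
        demeaned.append(([v - mx for v in x], [v - my for v in y]))
    sxx = sum(xv * xv for xd, _ in demeaned for xv in xd)
    sxy = sum(xv * yv for xd, yd in demeaned for xv, yv in zip(xd, yd))
    beta = sxy / sxx
    n_states = len(panel)
    n_obs = sum(len(x) for x, _ in panel)
    ssr, meat = 0.0, 0.0
    for xd, yd in demeaned:
        resid = [yv - beta * xv for xv, yv in zip(xd, yd)]
        ssr += sum(r * r for r in resid)
        # Squaring the within-state sum keeps every cross-product
        # x_t * u_t * x_s * u_s for the same state: the t = s terms handle
        # heteroskedasticity, the t != s terms handle autocorrelation.
        meat += sum(xv * r for xv, r in zip(xd, resid)) ** 2
    sigma2 = ssr / (n_obs - n_states - 1)      # N(T - 1) - K with K = 1
    se_conventional = (sigma2 / sxx) ** 0.5
    se_clustered = meat ** 0.5 / sxx           # sandwich: bread * meat * bread
    return beta, se_conventional, se_clustered

random.seed(0)
N, T, rho = 50, 30, 0.8                        # hypothetical settings
panel = []
for _ in range(N):
    alpha_i = random.uniform(0, 100)
    x = ar1(T, rho)
    u = ar1(T, rho)
    panel.append((x, [alpha_i + ui for ui in u]))   # true coefficient on x is zero

beta_hat, se_conv, se_clu = fe_standard_errors(panel)
print(beta_hat, se_conv, se_clu)
```

With positively autocorrelated regressors and errors, the clustered standard error comes out larger than the conventional one, which is the direction of the bias the paper describes for unclustered regressions.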
Bias with Lagged Dependent Variables
A problem particular to TSCS regression is potential bias when a lagged dependent variable is included in the list of regressors. This variable is correlated with the error term, which biases its coefficient (Nickell 1981). The order of the bias is 1/T, and the bias declines relatively rapidly with more years. Judson and Owen (1999) show that, for T of 30 or more, fixed effects TSCS performs as well as or better than other methods, such as the generalized method of moments. For T = 20 the bias is roughly 20% for the coefficient on the lagged dependent variable. The impact on the lagged dependent variable affects the coefficients and standard errors on other independent variables if these are correlated with the lagged dependent variable. In practice, the researcher does not know the direction and extent of this bias. We do not encounter the Nickell problem in our simulations because we set the coefficient on “x” to be zero, thereby dropping it from the regression and assuring that its coefficient remains unbiased.
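The way this bias shrinks with T can be illustrated with a small simulation. The sketch below estimates a pure dynamic panel model, y_it = α_i + ρ y_i,t−1 + e_it, by the within estimator for a short and a long panel. The true ρ of 0.5, the 500 states, and the function name `fe_lag_coefficient` are assumptions chosen for illustration; they are not the settings used in the paper.

```python
import random

def fe_lag_coefficient(N, T, rho, burn=50):
    """Within (FE) estimate of rho in y_it = alpha_i + rho * y_i,t-1 + e_it."""
    sxx = sxy = 0.0
    for _ in range(N):
        alpha = random.uniform(0, 100)
        y = alpha / (1 - rho)                  # start at the state's long-run mean
        path = []
        for t in range(burn + T + 1):
            y = alpha + rho * y + random.gauss(0, 1)
            if t >= burn:
                path.append(y)                 # keeps T + 1 values after burn-in
        lag, cur = path[:-1], path[1:]         # T usable (y_{t-1}, y_t) pairs
        ml, mc = sum(lag) / T, sum(cur) / T
        sxx += sum((l - ml) ** 2 for l in lag)
        sxy += sum((l - ml) * (c - mc) for l, c in zip(lag, cur))
    return sxy / sxx

random.seed(2)
rho = 0.5
est_short = fe_lag_coefficient(N=500, T=5, rho=rho)
est_long = fe_lag_coefficient(N=500, T=30, rho=rho)
print(est_short, est_long)   # both fall below rho; the short panel much more so
```

Adding states does not help here, because the bias comes from demeaning over a short time span; only a longer T reduces it.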
DF-GLS Unit Root Test
Elliott et al. (1996) propose the following two-step procedure.
If the data has a trend:
1. Detrend by estimating the following regression using OLS.
$$(y_{t} - ay_{t - 1} ) = \alpha (1 - a) + \delta (t - a(t - 1)) + v_{t}$$
ERS do Monte Carlo experiments to determine the optimal value of a:
$$a = 1 - \frac{13.5}{T}$$
Note that \(a \to 1\) as T goes to infinity (local to unity).
2. Compute the (detrended) residuals \(e = \hat{v}_{t}\) and estimate the usual ADF test equation.
$$\Delta e_{t} = (\rho - 1)e_{t - 1} + \sum\limits_{j = 1}^{p} {\gamma_{j} } \Delta e_{t - j} + v_{t}$$
Use the modified AIC to choose the lag length, p. Test using the standard t-ratio on the lagged level, taking the critical values from the ERS tables.
If the data does not have a trend:
1. Estimate the constant-only model (demeaned, not detrended).
$$(y_{t} - ay_{t - 1} ) = \alpha (1 - a) + v_{t} \;{\text{where}}\;a = 1 - \frac{7}{T}$$
2. Compute the residuals, \(\tilde{y}_{t}\), and estimate the same ADF test.
$$\Delta \tilde{y}_{t} = (\rho - 1)\tilde{y}_{t - 1} + \sum\limits_{j = 1}^{p} {\gamma_{j} } \Delta \tilde{y}_{t - j} + v_{t}$$
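The constant-only case can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: it fixes the lag length at p = 0 instead of choosing it by the modified AIC, keeps the first observation in levels when quasi-differencing, and takes the demeaned series to be y_t minus the estimated constant, following the usual ERS convention. The function name and the simulated series are hypothetical.

```python
import random

def dfgls_constant_only(y):
    """DF-GLS t-ratio for the constant-only case with p = 0 lagged differences."""
    T = len(y)
    a = 1 - 7 / T
    # Step 1: quasi-difference the series and the constant, then estimate alpha by OLS.
    yq = [y[0]] + [y[t] - a * y[t - 1] for t in range(1, T)]
    cq = [1.0] + [1 - a] * (T - 1)
    alpha_hat = sum(c * v for c, v in zip(cq, yq)) / sum(c * c for c in cq)
    yd = [v - alpha_hat for v in y]                 # GLS-demeaned series
    # Step 2: regression without a constant, d(yd_t) = (rho - 1) * yd_{t-1} + v_t.
    lag = yd[:-1]
    diff = [yd[t] - yd[t - 1] for t in range(1, T)]
    sxx = sum(l * l for l in lag)
    coef = sum(l * d for l, d in zip(lag, diff)) / sxx
    resid = [d - coef * l for l, d in zip(lag, diff)]
    s2 = sum(r * r for r in resid) / (len(diff) - 1)
    return coef / (s2 / sxx) ** 0.5                 # t-ratio on the lagged level

random.seed(3)
T = 200

# Stationary AR(1) around a mean of 10, started at its mean.
dev, stationary = 0.0, [10.0]
for _ in range(T - 1):
    dev = 0.5 * dev + random.gauss(0, 1)
    stationary.append(10.0 + dev)

# Pure random walk (unit root).
level, walk = 0.0, []
for _ in range(T):
    level += random.gauss(0, 1)
    walk.append(level)

t_stationary = dfgls_constant_only(stationary)
t_walk = dfgls_constant_only(walk)
print(t_stationary, t_walk)   # the stationary series gives the far more negative t-ratio
```

In applied work the t-ratio would be compared with the tabulated ERS critical values, and lagged differences would be added to absorb serial correlation in the test regression.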
Monte Carlo Program
The model is generated as follows. The researcher specifies the number of states with nonstationary dependent variables, n_{y}, and the number of states with nonstationary independent variables, n_{x}. In each case, the remaining state series are stationary with an autocorrelation coefficient randomly chosen with values between 0.80 and 0.99.
The state fixed effects (\(a_{i}\), \(b_{i}\)) are drawn from a uniform distribution with range [0, 100] and \(\sigma_{\varepsilon }^{2}\) = \(\sigma_{\nu }^{2}\) = 1. The estimated fixed-effects model is
We ran 10,000 regressions of this model for each variation of years and numbers of nonstationary dependent and independent variables. The results are presented in Table A in the online appendix, along with the Stata do-files. We also prepared a lengthy table with critical values for clustered regressions comparable to Table A, but giving the critical values from 0.10 to 0.01 as well as 0.005 (Table B).
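The logic of the Monte Carlo exercise can be sketched at a much smaller scale. The code below is a hypothetical illustration, not a port of the authors' Stata do-files: it assumes that y and x are unrelated random walks in every state, that there are no year effects, and that 20 states, 20 years, and 199 replications are enough to show the pattern. It records the conventional and clustered t-ratios for a true coefficient of zero and reads off the empirical 5% two-sided critical values.

```python
import random

def random_walk(T):
    """Return a random walk of length T with unit-variance increments."""
    level, out = 0.0, []
    for _ in range(T):
        level += random.gauss(0, 1)
        out.append(level)
    return out

def fe_t_ratios(N, T):
    """One replication: FE regression of y on an unrelated x.

    Returns (t_conventional, t_clustered) for the slope.
    """
    xs, ys = [], []
    for _ in range(N):
        a_i, b_i = random.uniform(0, 100), random.uniform(0, 100)   # state fixed effects
        x = [b_i + v for v in random_walk(T)]
        y = [a_i + v for v in random_walk(T)]
        mx, my = sum(x) / T, sum(y) / T
        xs.append([v - mx for v in x])
        ys.append([v - my for v in y])
    sxx = sum(v * v for xd in xs for v in xd)
    beta = sum(a * b for xd, yd in zip(xs, ys) for a, b in zip(xd, yd)) / sxx
    ssr = meat = 0.0
    for xd, yd in zip(xs, ys):
        resid = [b - beta * a for a, b in zip(xd, yd)]
        ssr += sum(r * r for r in resid)
        meat += sum(a * r for a, r in zip(xd, resid)) ** 2
    se_conv = (ssr / (N * (T - 1) - 1) / sxx) ** 0.5
    se_clu = meat ** 0.5 / sxx
    return beta / se_conv, beta / se_clu

random.seed(4)
reps = 199
draws = [fe_t_ratios(N=20, T=20) for _ in range(reps)]

# With 199 replications the 190th sorted |t| is the empirical 95th percentile,
# so no averaging between neighbouring order statistics is needed.
rank = 190 - 1
crit_conv = sorted(abs(t[0]) for t in draws)[rank]
crit_clu = sorted(abs(t[1]) for t in draws)[rank]
print(crit_conv, crit_clu)
```

A simulated critical value well above the nominal 1.96 means that a researcher using 1.96 would reject a true null too often. In this setup the conventional t-ratio needs a far larger critical value than the clustered one.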
Cite this article
Moody, C.E., Marvell, T.B. Clustering and Standard Error Bias in Fixed Effects Panel Data Regressions. J Quant Criminol 36, 347–369 (2020). https://doi.org/10.1007/s10940-018-9383-z
Keywords
 Panel data regression
 Autocorrelation
 Nonstationarity
 Clustered standard errors
 Small N
 Difference-in-differences
 Weighted regressions