
Clustering and Standard Error Bias in Fixed Effects Panel Data Regressions



We address several issues concerning standard error bias in pooled time-series cross-section regressions. These include autocorrelation, problems with unit root tests, nonstationarity in levels regressions, and problems with clustered standard errors.


We conduct unit root tests for crime and other variables. We use Monte Carlo procedures to illustrate the standard error biases caused by the above issues in pooled studies. We replicate prior research that uses clustered standard errors with difference-in-differences regressions and only a small number of policy changes.


Standard error biases in the presence of autocorrelation are substantial when standard errors are not clustered. Importantly, clustering greatly mitigates the standard error bias that results from using nonstationary variables in levels regressions, although in some circumstances clustering can fail to correct standard error biases that arise from other sources. The “small number of policy changes” problem can cause extreme standard error bias, but this can be corrected using “placebo laws”. Other biases are caused by weighting regressions, having too few units, and having dissimilar autocorrelation coefficients across units.


With clustering, researchers can usually conduct regressions in levels even with nonstationary variables. They should, however, be leery of pitfalls caused by clustering, especially when conducting difference-in-differences analyses.


Fig. 1
Fig. 2
Fig. 3


  1.

    The main alternative TSCS design is the random effects model. Random effects parameter estimates are biased and inconsistent when unobserved heterogeneity is correlated with the regressors (Wooldridge 2002). The results here are not directly applicable to random effects regressions.

  2.

    This induces the Nickell (1981) bias, but the FE model is nevertheless consistent as T, the number of years, goes to infinity. There is little bias for T > 20. See Judson and Owen 1999. See also the “Appendix”.

  3.

    We are concerned here with issues that affect only standard errors in TSCS studies, not with issues that bias the coefficients (and thus, usually, the t-ratios as well). The major cause of the latter is omitted variable bias. A widely discussed example is spatial dependence, where variables are related to their counterparts in other units.

  4.

    It should be noted that addressing autocorrelation by clustering does not fix problems associated with omitted variables if the autocorrelation is caused by “mis-specified dynamics,” e.g., not including lags in a dynamic regression model.

  5.

    Differencing stationary data typically generates negative serial correlation in the residuals. This will cause the t-ratios to be underestimated, but that can be corrected by clustering in panels or using Newey and West (1987) standard errors in single time series.

  6.

    The exception is the case T = 2, in which the FE and first-difference estimates are identical.

  7.

    An alternative to the ADF test is the Phillips and Perron (1988) test where the lagged differences are omitted and any autocorrelation is taken care of by the use of Newey and West (1987) standard errors.

  8.

    An example of the extent of this practice can be found in a review of 36 TSCS studies of the impact of prison populations on crime (Marvell 2010). All use differenced data, presumably because crimes and prison population may be nonstationary in most states (see Tables 2, 4).

  9. These data are corrections of the original data published by the Federal Bureau of Investigation. The Bureau of Justice Statistics has not revised earlier data.

  10.

    Tables 1 and 3 use the 0.05 significance level. When results are significant, they are always significant at the 0.001 level. When they are not significant, they are also not significant at the 0.10 level (with one exception, the HT test for larceny with no trend, where p = 0.051).

  11.

    We also used two older tests, the ADF test and the Phillips and Perron (1988) test. They found more state series to be stationary, but never a majority, especially for rape, robbery, and assault.

  12.

    Similarly, Spelman (2017) found that the police series is nonstationary in the great majority of cities.

  13.

    We include the latter because the lagged dependent variable was often used to mitigate autocorrelation and is still often used as an important control variable. See the “Appendix” concerning the possibility that the lagged dependent variable can bias coefficients.

  14.

    The simulations were conducted without year dummies, an important element of the fixed effects model, because in the program the year effects are random. The results in Table 5 are the same when year effects are included. We explored using block bootstrapping in the simulations, and the critical values are approximately two to five percent larger, even if rho is set at zero. In practice, bootstrapping is probably not useful for crime regressions because we find that it usually fails when year effects are entered.

  15.

    We could not use only 1 or 2 states because the variance matrix was nonsymmetric or highly singular.

  16.

    In separate simulations, we found that the standard error biases are very similar when all variables are stationary or when only 20 years are used. The biases are slightly less throughout when adding a lagged dependent variable to the clustered regression. The biases for OLS, with and without a lagged dependent variable, are slightly higher than in Table 5 when only a few states are included. Also, the standard error biases for OLS are approximately a third less when variables are stationary or when 20 years are used. Simulations with block bootstrapping produced critical values similar to those in Fig. 1.

  17.

    The general results apply when the nonstationary variables are replaced with stationary variables that have large autocorrelation coefficients. The results are similar when the dependent variable, rather than the independent variable, has heterogeneous autocorrelation coefficients. The standard error bias is greatly reduced by entering a lagged dependent variable.

  18.

    This simulation assumes 50 states and 50 years, and it assumes that the dependent variable is nonstationary in all states. The results appear to be robust to other situations. They are very similar with 20 years and with stationary dependent variables. Critical values for OLS are quite uniform.

  19.

    MacKinnon and Webb (2017) show that wild bootstrap-t methods do not work in this context.

  20.

    The authors later published corrections to these results that did not change the overall conclusion or the order of magnitude of the estimated effects (Webster et al. 2014b). We thank Daniel Webster for supplying the data used in their paper. We believe that the authors can be faulted for not showing OLS results, where the standard error is presumably larger than the clustered standard error, a result that should not occur.

  21.

    We use 999 observations instead of 1000 so that there is no need to average two adjacent observations to obtain the 975th observation.

  22.

    The 3.05 critical value for the dummy compares to the critical value of 2.72 for eight dummies in Fig. 3. The program, data, and log files for these examples are in the online appendix.

  23.

    We thank John Donohue for supplying the data.

  24.

    The 2.73 figure is the same as the critical value for eight state dummies in Fig. 3. Other examples of dummies or trends in a small number of states, with clustering, are Kovandzic et al (2004) and Crifasi et al (2015). Many earlier studies used single state variables with OLS, where the standard errors are biased due to autocorrelation, as discussed above.

  25.

    In the past, the major reason for weighting was to mitigate heteroskedasticity, but this correction is now routine using heteroskedasticity-robust standard errors, which Stata computes automatically when standard errors are clustered. The results of the simulation here are very similar when heteroskedasticity is created by introducing the dissimilar state variation into the error term.

  26.

    Table 8 is based on simulations using 30 years. The results vary little with the number of years, the number of nonstationary dependent variables, and whether a lagged dependent variable is entered. The results are also the same when dummies are entered for all states.

  27.

    This development is taken from Stock and Watson (2003, pp. 278–291) and Wooldridge (2002, pp. 247–302).
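Footnote 5's point about differencing can be checked with a short simulation. The sketch below, in Python with NumPy, assumes the simplest stationary case, i.i.d. standard normal errors. For such a series the first difference \(u_{t} - u_{t-1}\) has variance \(2\sigma^{2}\) and first-order autocovariance \(-\sigma^{2}\), so its theoretical lag-1 autocorrelation is −0.5. The helper name `lag1_autocorr` is ours and is not taken from the authors' programs.

```python
import numpy as np

def lag1_autocorr(z):
    """Sample first-order autocorrelation of a one-dimensional series."""
    z = z - z.mean()
    return float(np.dot(z[1:], z[:-1]) / np.dot(z, z))

rng = np.random.default_rng(0)
u = rng.standard_normal(100_000)   # stationary (white-noise) series
du = np.diff(u)                    # first differences: u_t - u_{t-1}

print(lag1_autocorr(u))    # close to 0
print(lag1_autocorr(du))   # close to -0.5: differencing induces negative serial correlation
```

Negative serial correlation of this kind makes conventional standard errors too large, which is why footnote 5 recommends clustering or Newey–West standard errors after differencing.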


  1. Abadie A, Diamond A, Hainmueller J (2010) Synthetic control methods for comparative case studies: estimating the effect of California’s tobacco control program. J Am Stat Assoc 105:493–505

  2. Aneja A, Donohue JJ, Zhang A (2014) The impact of right to carry laws and the NRC Report: the latest lessons for the empirical evaluation of law and policy. NBER Working Paper No. 18294

  3. Arellano M (1987) Computing robust standard errors for within-groups estimators. Oxf Bull Econ Stat 49(4):431–434

  4. Bertrand M, Duflo E, Mullainathan S (2002) How much should we trust differences-in-differences estimates? NBER Working Paper No. 8841

  5. Bertrand M, Duflo E, Mullainathan S (2004) How much should we trust differences-in-differences estimates? Q J Econ 119(1):249–275

  6. Bound J, Jaeger DA, Baker RM (1995) Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J Am Stat Assoc 90:443–450

  7. Breitung J (2000) The local power of some unit root tests for panel data. In: Baltagi BH (ed) Advances in econometrics, volume 15: nonstationary panels, panel cointegration, and dynamic panels. JAI Press, Amsterdam, pp 161–178

  8. Breitung J, Pesaran MH (2008) Unit roots and cointegration in panels. In: Matyas L, Sevestre P (eds) The econometrics of panel data. Springer, Berlin, pp 279–302

  9. Breusch TS (1978) Testing for autocorrelation in dynamic linear models. Aust Econ Pap 17:334–355

  10. Cameron AC, Miller DL (2015) A practitioner’s guide to cluster-robust inference. J Hum Resour 50(2):317–372

  11. Chalfin A, Haviland AM, Raphael S (2012) What do panel studies tell us about a deterrent effect of capital punishment? A critique of the literature. J Quant Criminol 29:5–43

  12. Conley TG, Taber CR (2011) Inference with “difference in differences” with a small number of policy changes. Rev Econ Stat 93:113–125

  13. Crifasi CK, Meyers JS, Vernick JS, Webster DW (2015) Effects of changes in permit-to-purchase handgun laws in Connecticut and Missouri on suicide rates. Prev Med 79:43–49

  14. Dickey DA, Fuller WA (1979) Distribution of the estimators for autoregressive time series with a unit root. J Am Stat Assoc 74:427–431

  15. Donohue JJ, Wolfers J (2006) Uses and abuses of statistical evidence in the death penalty debate. Stanford L Rev 58:791–845

  16. Elliott G, Rothenberg TJ, Stock JH (1996) Efficient tests for an autoregressive unit root. Econometrica 64:813–836

  17. Glaeser EL, Sacerdote B, Scheinkman JA (1996) Crime and social interactions. Q J Econ 111:507–548

  18. Godfrey LG (1978) Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica 46:1293–1301

  19. Granger CWJ, Newbold P (1974) Spurious regressions in econometrics. J Econom 2:111–120

  20. Hadri K (2000) Testing for stationarity in heterogeneous panel data. Econom J 3:148–161

  21. Hamilton JD (1994) Time series analysis. Princeton University Press, Princeton

  22. Hansen CB (2007) Asymptotic properties of a robust variance matrix estimator for panel data when T is large. J Econom 141(2):597–620

  23. Harris RDF, Tzavalis E (1999) Inference for unit roots in dynamic panels where the time dimension is fixed. J Econom 91:201–226

  24. Helland E, Tabarrok A (2004) Using placebo laws to test “more guns, less crime”. Adv Econ Anal Pol 4(1):1–7

  25. Hlouskova J, Wagner M (2006) The performance of panel unit root and stationarity tests: results from a large scale simulation study. Econom Rev 25:85–116

  26. Im KS, Pesaran MH, Shin Y (2003) Testing for unit roots in heterogeneous panels. J Econom 115:53–74

  27. Judson RA, Owen AL (1999) Estimating dynamic panel data models: a guide for macroeconomists. Econ Lett 65:9–15

  28. Kim JH, Choi I (2017) Unit roots in economic and financial time series: a re-evaluation at the decision-based significance levels. Econometrics 5(3):41

  29. Kovandzic TV, Sloan JJ, Vieraitis LM (2004) “Striking out” as crime reduction policy: the impact of “three strikes” laws on crime rates in U.S. cities. Justice Q 21:207–239

  30. Kwiatkowski D, Phillips PCB, Schmidt P, Shin Y (1992) Testing the null hypothesis of stationarity against the alternative of a unit root. J Econom 54:159–178

  31. Levin A, Lin CF, Chu CSJ (2002) Unit roots in panel data: asymptotic and finite-sample properties. J Econom 108:1–24

  32. MacKinnon JG, Webb MD (2017) Wild bootstrap inference for wildly different cluster sizes. J Appl Econom 32(1):233–254

  33. Maddala GS, Wu S (1999) A comparative study of unit root tests with panel data and a new simple test. Oxf Bull Econ Stat 61:631–652

  34. Marvell TB (2010) Prison population and crime. In: Benson BL, Zimmerman PR (eds) Handbook on the economics of crime. Edward Elgar, Northampton, pp 145–183

  35. Moody CE (2016) Fixed-effects panel data models: to cluster or not to cluster. SSRN:

  36. Newey WK, West KD (1987) A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55:703–708

  37. Nickell S (1981) Biases in dynamic models with fixed effects. Econometrica 49:1417–1426

  38. Phillips PCB, Perron P (1988) Testing for a unit root in time series regression. Biometrika 75:335–346

  39. Rosenbaum PR (1996) Observational studies and nonrandomized experiments. In: Ghosh S, Rao CR (eds) Handbook of statistics, vol 13, pp 181–197

  40. Spelman W (2008) Specifying the relationship between prison and crime. J Quant Criminol 24:149–178

  41. Spelman W (2017) The murder mystery: police effectiveness and homicide. J Quant Criminol 33:859–886

  42. Stock JH, Watson MW (2003) Introduction to econometrics. Pearson Education, Boston

  43. Webster D, Crifasi CK, Vernick JS (2014a) Effects of the repeal of Missouri’s handgun purchaser licensing law on homicides. J Urban Health 91(2):293–302

  44. Webster D, Crifasi CK, Vernick JS (2014b) Erratum to: Effects of the repeal of Missouri’s handgun purchaser licensing law on homicides. J Urban Health 91(3):598–601

  45. Wooldridge JM (2002) Econometric analysis of cross section and panel data. MIT Press, Cambridge

  46. Wooldridge JM (2016) Introductory econometrics: a modern approach, 6th edn. Cengage Learning, Boston



We thank the editor and three anonymous referees for constructive comments on an earlier draft.

Author information



Corresponding author

Correspondence to Thomas B. Marvell.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic supplementary material


Supplementary material 1 (ZIP 592 kb)

Appendix: Fixed Effects Regression and Monte Carlo


The fixed effects (FE) regression model has the advantage of being unbiased in the presence of unobserved heterogeneity. That is, if each state has long-run, permanent features that are correlated both with the dependent variable and with the independent variables in the model, then any regression procedure that uses variation across states, such as the random effects model or the pooled ordinary least squares model, will be biased and inconsistent. This is very likely to be the case in criminology since Massachusetts, for example, is permanently different from Louisiana because of history, culture, climate, and a number of other dimensions. The same can be said for Arizona and Vermont, or for Hawaii and any other state.

The FE model has the form (footnote 27)

$$y_{it} = \alpha_{i} + \sum\limits_{k = 1}^{K} {\beta_{k} } x_{k,it} + u_{it}$$

where i = 1, …, N, t = 1, …, T, \(x_{k,it}\) is the value of the kth regressor for state i in year t, and \(\alpha_{i}\) are state-specific fixed effects. (Year-specific fixed effects \(\gamma_{t}\) can be added to the model; they are omitted from the derivation below for simplicity.) This model requires four assumptions (assuming one regressor for simplicity):

$$E(u_{it} |x_{i1} , \ldots ,x_{iT} ,\alpha_{i} ) = 0$$

that is, the conditional mean of the error term is zero, given the value of the regressor(s) in every year and the state fixed effect (strict exogeneity);

$$(x_{i1} , \ldots ,x_{iT} ,y_{i1} , \ldots ,y_{iT} ),\quad i = 1, \ldots ,N\,{\text{are}}\,{\text{iid}}\,{\text{draws}}\,{\text{from}}\,{\text{their}}\,{\text{joint}}\,{\text{distribution}};$$

that is, the observations for a given state, taken over all years, are independent of the observations for other states over the same time span and are drawn from the same joint distribution;

$$\left( {x_{it} ,u_{it} } \right)$$

have nonzero finite fourth moments; this assumption, which is important for asymptotic results, limits the probability of observing extreme values of the regressor(s) or the errors;

$$\text{cov} (u_{it} ,u_{is} |x_{it} ,x_{is} ) = 0\quad {\text{for}}\quad s \ne t;$$

the errors are uncorrelated over time, conditional on the regressor(s).

With these assumptions we can estimate the fixed-effects model, generating unbiased and consistent estimates. (The FE model is not efficient because it ignores cross-section variation.) The model is estimated by applying ordinary least squares to the demeaned variable(s). Define the state means \(\bar{y}_{i} = (1/T)\sum\nolimits_{t = 1}^{T} y_{it}\), \(\bar{x}_{i} = (1/T)\sum\nolimits_{t = 1}^{T} x_{it}\), and \(\bar{u}_{i} = (1/T)\sum\nolimits_{t = 1}^{T} u_{it}\). Averaging the model over the T years for each state gives the cross-section equation

$$\bar{y}_{i} = \alpha_{i} + \beta \bar{x}_{i} + \bar{u}_{i}$$

Subtracting the cross-section equation from the FE model, still assuming only one regressor, yields

$$y_{it} - \bar{y}_{i} = \alpha_{i} + \beta x_{it} + u_{it} - \alpha_{i} - \beta \bar{x}_{i} - \bar{u}_{i} = \beta (x_{it} - \bar{x}_{i} ) + u_{it} - \bar{u}_{i}$$

The fixed effects have been “swept out” or “absorbed.” We can write this equation as

$$\tilde{y}_{it} = \beta \tilde{x}_{it} + \tilde{u}_{it}$$

The fixed-effects estimator is just OLS applied to this demeaned equation. It is also known as the “within” estimator because it uses only variation within each state:

$$\hat{\beta}_{FE} = \frac{\sum\limits_{i = 1}^{N} \sum\limits_{t = 1}^{T} \tilde{x}_{it} \tilde{y}_{it}}{\sum\limits_{i = 1}^{N} \sum\limits_{t = 1}^{T} \tilde{x}_{it}^{2}}$$

Identical slope estimates can be obtained by regressing y on x and a set of dummy variables, one for each state, where the dummy for state i equals one for observations from state i and zero otherwise. This is the “least squares dummy variable” or LSDV model.
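The equivalence between the within estimator and LSDV can be checked numerically. The sketch below is a Python/NumPy illustration on simulated data, not the authors' Stata code; the panel dimensions, the true slope of 2, and the correlation between x and the state effects are arbitrary assumptions chosen for the example, and `demean_by_state` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 10, 20                                   # hypothetical panel: 10 states, 20 years
state = np.repeat(np.arange(N), T)              # state index of each observation
alpha = rng.uniform(0, 100, N)                  # state fixed effects
x = rng.standard_normal(N * T) + 0.05 * alpha[state]     # x correlated with the fixed effects
y = alpha[state] + 2.0 * x + rng.standard_normal(N * T)  # true beta = 2

def demean_by_state(v, groups):
    """Within transformation: subtract each state's time average."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

# Within estimator: OLS of demeaned y on demeaned x
x_t, y_t = demean_by_state(x, state), demean_by_state(y, state)
beta_within = np.sum(x_t * y_t) / np.sum(x_t ** 2)

# LSDV: OLS of y on x plus one dummy per state (no common intercept)
D = (state[:, None] == np.arange(N)[None, :]).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)
beta_lsdv = coef[0]

print(beta_within, beta_lsdv)   # equal up to floating-point rounding
```

Demeaning is usually preferred in practice because the dummy matrix grows with the number of states, while the within transformation needs only the state means.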

Generalizing to multiple regression, define the matrix of observations on the demeaned regressors for state i as \(\tilde{X}_{i}\), so that

$$\hat{\beta}_{FE} = \left( \sum\limits_{i = 1}^{N} \tilde{X}_{i}^{\prime} \tilde{X}_{i} \right)^{-1} \left( \sum\limits_{i = 1}^{N} \tilde{X}_{i}^{\prime} \tilde{y}_{i} \right)$$

The asymptotic variance–covariance matrix is

$$AVAR(\hat{\beta }_{FE} ) = \hat{\sigma }_{u}^{2} \left( {\sum\limits_{i = 1}^{N} {\tilde{X}_{i}^{{\prime }} \tilde{X}_{i} } } \right)^{ - 1}$$

where \(\hat{\sigma}_{u}^{2} = \sum\nolimits_{i = 1}^{N} \sum\nolimits_{t = 1}^{T} \hat{u}_{it}^{2} / (N(T - 1) - K)\) is a consistent estimator of the error variance and

$$\hat{u}_{i} = \tilde{y}_{i} - \tilde{X}_{i} \hat{\beta}_{FE}, \quad i = 1, \ldots, N.$$

The standard errors are the square roots of the diagonal elements of the AVAR matrix.
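The conventional FE standard errors can be computed directly from the demeaned data. This minimal sketch assumes a balanced panel stored in long format (rows ordered state by state) and uses hypothetical helper names. It also cross-checks the result against an LSDV regression, whose residual degrees of freedom, NT − N − K, equal the N(T − 1) − K used in the formula above.

```python
import numpy as np

def demean_by_group(A, groups):
    """Subtract group (state) means from each column of A."""
    A = np.asarray(A, dtype=float).reshape(len(groups), -1)
    counts = np.bincount(groups)[:, None]
    sums = np.zeros((counts.shape[0], A.shape[1]))
    np.add.at(sums, groups, A)          # column sums by state
    return A - (sums / counts)[groups]

def fe_conventional(X, y, groups):
    """FE coefficients and conventional (non-clustered) standard errors."""
    Xt = demean_by_group(X, groups)
    yt = demean_by_group(y, groups).ravel()
    XtX_inv = np.linalg.inv(Xt.T @ Xt)
    beta = XtX_inv @ Xt.T @ yt
    resid = yt - Xt @ beta
    n_obs, K = Xt.shape
    N = len(np.unique(groups))
    sigma2 = resid @ resid / (n_obs - N - K)   # N(T-1) - K in a balanced panel
    return beta, np.sqrt(np.diag(sigma2 * XtX_inv))

# Illustrative data: 8 states, 15 years, two regressors
rng = np.random.default_rng(2)
N, T, K = 8, 15, 2
groups = np.repeat(np.arange(N), T)
X = rng.standard_normal((N * T, K))
y = rng.uniform(0, 100, N)[groups] + X @ np.array([1.0, -0.5]) + rng.standard_normal(N * T)
beta_fe, se_fe = fe_conventional(X, y, groups)

# Cross-check with LSDV: OLS on X plus state dummies, residual df = NT - N - K
Z = np.column_stack([X, (groups[:, None] == np.arange(N)).astype(float)])
ZtZ_inv = np.linalg.inv(Z.T @ Z)
coef = ZtZ_inv @ Z.T @ y
e = y - Z @ coef
se_lsdv = np.sqrt(np.diag(e @ e / (N * T - Z.shape[1]) * ZtZ_inv))[:K]
print(beta_fe, se_fe, se_lsdv)
```

Dividing by NT − K instead, that is, ignoring the N estimated fixed effects, would understate the error variance, especially when T is small.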

Clustered Standard Errors

The clustered asymptotic variance–covariance matrix (Arellano 1987) is a modified sandwich estimator (White 1984, Chapter 6):

$$\hat{V}(\hat{\beta }_{FE} ) = \left( {\tilde{X}^{\prime}\tilde{X}} \right)^{ - 1} \left( {\sum\limits_{i = 1}^{N} {\tilde{X}^{\prime}_{i} } \hat{u}_{i} \hat{u}_{i}^{\prime } \tilde{X}_{i} } \right)\left( {\tilde{X}^{\prime}\tilde{X}} \right)^{ - 1}$$

The “meat” of the sandwich contains the estimated covariances among the error terms. The residuals are “clustered” in the sense that only covariances among residuals from the same state are used; covariances across states are set to zero. The formula therefore corrects for heteroskedasticity (through the squared residuals) and for autocorrelation (through the cross-products of residuals from different years within a state).
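A minimal sketch of the clustered (Arellano) variance estimator follows, written in Python/NumPy and assuming a balanced panel in long format. It implements the sandwich formula as written, with no finite-sample adjustment; Stata and other packages multiply by a small-sample correction factor, so their numbers will differ slightly. The simulated data, in which both x and the error follow AR(1) processes with coefficient 0.9 within each state, and the helper names are illustrative assumptions.

```python
import numpy as np

def within(A, groups):
    """Subtract state means column by column (the FE within transformation)."""
    A = np.asarray(A, dtype=float).reshape(len(groups), -1)
    out = A.copy()
    for g in np.unique(groups):
        rows = groups == g
        out[rows] -= A[rows].mean(axis=0)
    return out

def fe_clustered(X, y, groups):
    """FE estimates with conventional and state-clustered (Arellano) standard errors.

    The clustered variance is the plain sandwich formula, with no
    finite-sample correction factor.
    """
    Xt, yt = within(X, groups), within(y, groups).ravel()
    bread = np.linalg.inv(Xt.T @ Xt)
    beta = bread @ Xt.T @ yt
    u = yt - Xt @ beta
    n_obs, K = Xt.shape
    states = np.unique(groups)
    V_conv = (u @ u / (n_obs - len(states) - K)) * bread   # sigma^2 (X'X)^-1
    meat = np.zeros((K, K))
    for g in states:
        s = Xt[groups == g].T @ u[groups == g]   # X_i' u_i for state i
        meat += np.outer(s, s)                   # adds X_i' u_i u_i' X_i
    V_cl = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V_conv)), np.sqrt(np.diag(V_cl))

def ar1_panel(rng, N, T, rho):
    """N independent AR(1) series of length T, stacked state by state."""
    e = rng.standard_normal((N, T))
    z = np.zeros((N, T))
    z[:, 0] = e[:, 0] / np.sqrt(1 - rho ** 2)   # start in the stationary distribution
    for t in range(1, T):
        z[:, t] = rho * z[:, t - 1] + e[:, t]
    return z.ravel()

rng = np.random.default_rng(3)
N, T = 50, 30
groups = np.repeat(np.arange(N), T)
x = ar1_panel(rng, N, T, 0.9)
y = rng.uniform(0, 100, N)[groups] + ar1_panel(rng, N, T, 0.9)   # y does not depend on x
beta, se_conv, se_cl = fe_clustered(x, y, groups)
print(beta, se_conv, se_cl)
```

With positive autocorrelation in both x and the errors, the clustered standard error is typically noticeably larger than the conventional one, which is the understatement the clustered estimator is meant to correct.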

Bias with Lagged Dependent Variables

A problem particular to TSCS regression is potential bias when a lagged dependent variable is included among the regressors. After demeaning, this variable is correlated with the (demeaned) error term, which biases its coefficient (Nickell 1981). The bias is of order 1/T and declines relatively rapidly as the number of years increases. Judson and Owen (1999) show that, for T of 30 or more, fixed effects TSCS performs as well as or better than other methods, such as the generalized method of moments. For T = 20 the bias is roughly 20% for the coefficient on the lagged dependent variable. The bias in that coefficient carries over to the coefficients and standard errors of other independent variables if these are correlated with the lagged dependent variable. In practice, the researcher does not know the direction and extent of this bias. We do not encounter the Nickell problem in our simulations because we set the coefficient on “x” to zero, thereby dropping it from the regression and ensuring that its coefficient remains unbiased.
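The Nickell bias and its decline with T can be seen in a small simulation. This sketch assumes a pure AR(1) panel, \(y_{it} = a_{i} + \rho y_{i,t-1} + e_{it}\) with ρ = 0.5, standard normal shocks, 200 states, and a burn-in period; these values are illustrative and are not taken from the paper's simulations, so the magnitudes need not match the 20% figure quoted above.

```python
import numpy as np

def fe_ar1_estimate(rng, N, T, rho, burn=50):
    """Simulate y_it = a_i + rho*y_i,t-1 + e_it and return the within estimate of rho."""
    a = rng.standard_normal(N)                       # state fixed effects
    y = np.zeros((N, T + burn + 1))
    for t in range(1, T + burn + 1):
        y[:, t] = a + rho * y[:, t - 1] + rng.standard_normal(N)
    y = y[:, burn:]                                  # keep y_0, ..., y_T
    lag, cur = y[:, :-1], y[:, 1:]                   # T usable observations per state
    lag_t = lag - lag.mean(axis=1, keepdims=True)    # within transformation
    cur_t = cur - cur.mean(axis=1, keepdims=True)
    return np.sum(lag_t * cur_t) / np.sum(lag_t ** 2)

rng = np.random.default_rng(4)
rho = 0.5
biases = {}
for T in (5, 10, 20, 50):
    estimates = [fe_ar1_estimate(rng, 200, T, rho) for _ in range(50)]
    biases[T] = np.mean(estimates) - rho
    print(T, round(biases[T], 3))   # negative bias that shrinks as T grows
```

Increasing the number of states does not remove this bias; only more years per state do, which is why the bias is described as order 1/T.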

DFGLS Unit Root Test

Elliott, Rothenberg, and Stock (1996; hereafter ERS) propose the following two-step procedure.

If the data have a trend:

  1.

    Detrend by estimating the following regression using OLS.

    $$(y_{t} - ay_{t - 1} ) = \alpha (1 - a) + \delta (t - a(t - 1)) + v_{t}$$

ERS use Monte Carlo experiments to determine the optimal value of a:

$$a = 1 - \frac{13.5}{T}$$

Note that \(a \to 1\) as T goes to infinity (local to unity).

  2.

    Compute the detrended series \(e_{t} = y_{t} - \hat{\alpha} - \hat{\delta} t\), using the estimates from step 1, and estimate the usual ADF test equation (with no deterministic terms).

    $$\Delta e_{t} = (\rho - 1)e_{t - 1} + \sum\limits_{j = 1}^{p} {\gamma_{j} } \Delta e_{t - j} + v_{t}$$

Use the modified AIC to choose the lag length p. Test using the standard t-ratio on the lagged level, taking the critical values from the ERS tables.

If the data do not have a trend:

  1.

    Estimate the constant only model (demeaned not detrended).

    $$(y_{t} - ay_{t - 1} ) = \alpha (1 - a) + v_{t} \;{\text{where}}\;a = 1 - \frac{7}{T}$$
  2.

    Compute the demeaned series \(\tilde{y}_{t} = y_{t} - \hat{\alpha}\), using the estimate from step 1, and estimate the same ADF test.

    $$\Delta \tilde{y}_{t} = (\rho - 1)\tilde{y}_{t - 1} + \sum\limits_{j = 1}^{p} {\gamma_{j} } \Delta \tilde{y}_{t - j} + v_{t}$$
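The two steps can be sketched as follows. This is an illustrative Python/NumPy implementation under stated simplifications: the lag length p is fixed by the user rather than chosen by the modified AIC, and the function (a hypothetical `dfgls_stat`) returns only the t-ratio, which must still be compared with the ERS critical values. The first observation is kept in levels when quasi-differencing, as in ERS.

```python
import numpy as np

def dfgls_stat(y, trend=True, p=0):
    """DF-GLS t-ratio with a fixed lag length p (sketch; no lag selection, no critical values)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    a = 1 - (13.5 if trend else 7.0) / T
    # Deterministic terms: constant, plus a linear trend if requested
    z = np.column_stack([np.ones(T), np.arange(1, T + 1)]) if trend else np.ones((T, 1))
    # Step 1: quasi-difference y and z (first observation in levels), then OLS
    yq = np.concatenate([y[:1], y[1:] - a * y[:-1]])
    zq = np.vstack([z[:1], z[1:] - a * z[:-1]])
    coef, *_ = np.linalg.lstsq(zq, yq, rcond=None)
    e = y - z @ coef                     # detrended (or demeaned) series
    # Step 2: ADF regression on e with no deterministic terms
    de = np.diff(e)                      # de[t] = e[t+1] - e[t]
    X = np.array([[e[t]] + [de[t - j] for j in range(1, p + 1)]
                  for t in range(p, len(de))])
    dep = de[p:]
    b, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ b
    s2 = resid @ resid / (len(dep) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return b[0] / se                     # t-ratio on the lagged level

rng = np.random.default_rng(5)
walk = np.cumsum(rng.standard_normal(200))   # unit root
noise = rng.standard_normal(200)             # stationary
print(dfgls_stat(walk, trend=True, p=1), dfgls_stat(noise, trend=True, p=1))
```

For the stationary series the statistic is far below any tabulated critical value, while for the random walk it usually is not. For production work, a maintained implementation with lag selection is preferable to this sketch.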

Monte Carlo Program

The model is generated as follows. The researcher specifies the number of states with nonstationary dependent variables, ny, and the number of states with nonstationary independent variables, nx. In each case, the series for the remaining states are stationary, with autocorrelation coefficients drawn at random between 0.80 and 0.99.

The state fixed effects (\(a_{i}\),\(b_{i}\)) are drawn from a uniform distribution with range [0, 100] and \(\sigma_{\varepsilon }^{2}\) = \(\sigma_{\nu }^{2}\) = 1. The estimated fixed-effects model is

$$y_{it} = \alpha_{i} + \beta x_{it} + u_{it}$$

We ran 10,000 regressions of this model for each variation of years and numbers of nonstationary dependent and independent variables. The results are presented in Table A in the online appendix, along with the Stata do-files. We also prepared a lengthy table with critical values for clustered regressions comparable to Table A, but giving the critical values from 0.10 to 0.01 as well as 0.005 (Table B).
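Because the data-generating equations are not reproduced here, the following is only a rough Python sketch of this kind of experiment, not a translation of the authors' Stata do-files. It assumes that each series equals a state effect drawn from U[0, 100] plus an AR(1) component with unit-variance normal shocks, where the coefficient is 1 in the first ny (or nx) states and is drawn uniformly from [0.80, 0.99] elsewhere. y and x are generated independently, so the true β is zero and a 5% test should reject about 5% of the time. Year effects are omitted, as in the simulations described in footnote 14, and all function names are hypothetical.

```python
import numpy as np

def simulate_panel(rng, N, T, n_unit_root, fe_scale=100.0):
    """Assumed DGP: state effect + AR(1) component; rho = 1 in the first
    n_unit_root states, rho ~ U(0.80, 0.99) elsewhere; N(0, 1) shocks."""
    rho = np.where(np.arange(N) < n_unit_root, 1.0, rng.uniform(0.80, 0.99, N))
    shocks = rng.standard_normal((N, T))
    z = np.zeros((N, T))
    z[:, 0] = shocks[:, 0]
    for t in range(1, T):
        z[:, t] = rho * z[:, t - 1] + shocks[:, t]
    return (rng.uniform(0, fe_scale, N)[:, None] + z).ravel()

def fe_tstats(x, y, N, T):
    """Within-estimator t-ratios for beta using conventional and clustered SEs."""
    xt = x.reshape(N, T) - x.reshape(N, T).mean(axis=1, keepdims=True)
    yt = y.reshape(N, T) - y.reshape(N, T).mean(axis=1, keepdims=True)
    sxx = np.sum(xt ** 2)
    beta = np.sum(xt * yt) / sxx
    u = yt - beta * xt
    se_conv = np.sqrt(np.sum(u ** 2) / (N * (T - 1) - 1) / sxx)
    se_cl = np.sqrt(np.sum(np.sum(xt * u, axis=1) ** 2)) / sxx   # state-clustered
    return beta / se_conv, beta / se_cl

def rejection_rates(N=50, T=30, ny=50, nx=50, reps=500, seed=0):
    """Share of |t| > 1.96 when the true beta is zero (nominal rate 0.05)."""
    rng = np.random.default_rng(seed)
    rej = np.zeros(2)
    for _ in range(reps):
        x = simulate_panel(rng, N, T, nx)
        y = simulate_panel(rng, N, T, ny)   # independent of x, so true beta = 0
        rej += np.abs(fe_tstats(x, y, N, T)) > 1.96
    return rej / reps

conv, clus = rejection_rates(reps=300)
print(conv, clus)
```

With all states nonstationary, the conventional rejection rate is typically far above the nominal 0.05, while the clustered rate is much closer to it, in line with the paper's finding that clustering largely corrects the standard error bias in levels regressions.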


About this article


Cite this article

Moody, C.E., Marvell, T.B. Clustering and Standard Error Bias in Fixed Effects Panel Data Regressions. J Quant Criminol 36, 347–369 (2020).



  • Panel data regression
  • Auto correlation
  • Nonstationarity
  • Clustered standard errors
  • Small N
  • Difference-in-differences
  • Weighted regressions