Clustering Methods for Statistical Inference

  • Living reference work entry
Handbook of Labor, Human Resources and Population Economics

Abstract

We discuss when and how to deal with possibly clustered errors in linear regression models. Specifically, we discuss situations in which a regression model may plausibly be treated as having error terms that are arbitrarily correlated within known clusters but uncorrelated across them. The methods we discuss include various covariance matrix estimators, possibly combined with various methods of obtaining critical values, several bootstrap procedures, and randomization inference. Special attention is given to models with few treated clusters and clusters that vary a lot in size, where inference may be problematic. Two empirical examples illustrate the methods we discuss and the concerns we raise, and a simulation experiment illustrates the consequences of over-clustering and under-clustering.
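
As a rough illustration of the covariance matrix estimators the abstract refers to, the following is a minimal sketch of one widely used cluster-robust "sandwich" estimator for OLS. The function name `cluster_robust_cov`, the variable names, and the choice of the small-sample factor G/(G-1) × (N-1)/(N-k) are assumptions made for this example; the entry itself may use different notation or other variants.

```python
import numpy as np

def cluster_robust_cov(X, y, cluster_ids):
    """Return OLS coefficients and a cluster-robust covariance matrix.

    Hypothetical helper for illustration only. Errors are allowed to be
    arbitrarily correlated within clusters and are treated as
    uncorrelated across clusters.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n, k = X.shape

    # OLS fit and residuals
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta

    # "Meat" of the sandwich: outer products of per-cluster score sums
    labels = np.unique(cluster_ids)
    G = labels.size
    meat = np.zeros((k, k))
    for g in labels:
        mask = cluster_ids == g
        score = X[mask].T @ resid[mask]  # k-vector for cluster g
        meat += np.outer(score, score)

    # Assumed small-sample factor: G/(G-1) * (N-1)/(N-k)
    correction = G / (G - 1) * (n - 1) / (n - k)
    cov = correction * XtX_inv @ meat @ XtX_inv
    return beta, cov

# Usage with simulated data: 12 clusters of 5 observations each
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(12), 5)
X = np.column_stack([np.ones(60), rng.normal(size=60)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=60)
beta, cov = cluster_robust_cov(X, y, ids)
std_errors = np.sqrt(np.diag(cov))
```

With this particular factor, treating every observation as its own cluster reduces the estimator to the familiar heteroskedasticity-robust HC1 form, which makes a convenient sanity check.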

Acknowledgments

We thank the Social Sciences and Humanities Research Council of Canada (SSHRC) for financial support. We are grateful to Mehtab Hanzroh for his excellent research assistance. We benefited from the comments of Alfonso Flores-Lagunes, Andreas Hagemann, Azeem Shaikh, Holger Spamann, an anonymous referee, and participants at the CIREQ 2019 Bootstrap Conference and the 2019 Canadian Economics Association Annual Meeting.

Corresponding author

Correspondence to James G. MacKinnon.

Copyright information

© 2020 Springer Nature Switzerland AG

About this entry

Cite this entry

MacKinnon, J.G., Webb, M.D. (2020). Clustering Methods for Statistical Inference. In: Zimmermann, K.F. (eds) Handbook of Labor, Human Resources and Population Economics. Springer, Cham. https://doi.org/10.1007/978-3-319-57365-6_43-1

  • DOI: https://doi.org/10.1007/978-3-319-57365-6_43-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57365-6

  • Online ISBN: 978-3-319-57365-6

  • eBook Packages: Springer Reference Economics and Finance; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences
