Perils and prospects of using aggregate area level socioeconomic information as a proxy for individual level socioeconomic confounders in instrumental variables regression

  • Jesse Yenchih Hsu
  • Scott A. Lorch
  • Dylan S. Small


A frequent concern when drawing causal inferences about a policy or treatment from observational studies is the presence of unmeasured confounding variables. The instrumental variable method is an approach to estimating a causal effect in the presence of unmeasured confounding variables. A valid instrumental variable must be independent of the unmeasured confounding variables, so any confounding variable that is correlated with the instrument must be controlled for. In health services research, socioeconomic status variables are often considered confounding variables. In recent studies, distance to a specialty care center has been used as an instrument for the effect of specialty care vs. general care. Because this instrument may be correlated with socioeconomic status, it is important to control for socioeconomic status variables in the instrumental variables regression. However, health data sets often lack individual-level socioeconomic information but contain area-average socioeconomic information from the US Census, e.g., average income or education level in a county. We study the bias of the two-stage least squares estimates in instrumental variables regression when an area-level variable is used as the control for a confounding variable that may be correlated with the instrument. We propose an aggregated instrumental variables regression based on Wald’s method of grouping, under the assumption that the grouping is independent of the errors. We present simulation results and an application to a study of perinatal care for premature infants.
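
The contrast described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or simulation design; the data-generating model, the coefficient values, the continuous treatment, and the helper name `iv_coef` are all assumptions chosen for illustration. It compares individual-level two-stage least squares that controls for an area average with a two-stage least squares fit on area means, in a setting where the instrument is correlated with individual socioeconomic status and the grouping by area is independent of the errors.

```python
import numpy as np

def iv_coef(y, d, z, control):
    """Just-identified 2SLS coefficient on d, using z as the instrument,
    with an intercept and one control included in both stages."""
    ones = np.ones_like(y)
    X = np.column_stack([d, ones, control])   # regressors
    Z = np.column_stack([z, ones, control])   # instruments
    return np.linalg.solve(Z.T @ X, Z.T @ y)[0]

rng = np.random.default_rng(0)
J, n, beta = 500, 50, 1.0   # areas, individuals per area, true effect (all assumed)

m = rng.normal(size=(J, 1))                # area-level SES component
x = m + rng.normal(size=(J, n))            # individual SES, not seen by the analyst
a = rng.normal(size=(J, 1))                # area-level variation in the instrument
z = 0.5 * x + a + rng.normal(size=(J, n))  # instrument, correlated with individual SES
u = rng.normal(size=(J, n))                # unmeasured confounder
d = z + x + u + rng.normal(size=(J, n))    # treatment (continuous for simplicity)
y = beta * d + 2.0 * x + u + rng.normal(size=(J, n))

x_area = x.mean(axis=1)                    # area average, the only SES information used

# Individual-level 2SLS with the area average as a proxy control.
beta_individual = iv_coef(y.ravel(), d.ravel(), z.ravel(), np.repeat(x_area, n))

# Aggregated 2SLS: average y, d and z within each area, then fit on the J area means.
beta_aggregated = iv_coef(y.mean(axis=1), d.mean(axis=1), z.mean(axis=1), x_area)

print(f"individual-level: {beta_individual:.3f}, aggregated: {beta_aggregated:.3f}")
```

Under these assumptions the within-area part of socioeconomic status stays in the error term of the individual-level fit and remains correlated with the instrument, so that estimate drifts away from the true effect. After averaging within areas, the area average accounts for the confounder exactly, so the aggregated estimate should land close to the true value of 1.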


  • Aggregation
  • Causal inference
  • Instrumental variables
  • Proxy variables
  • Wald’s grouping method



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Jesse Yenchih Hsu (1, 2)
  • Scott A. Lorch (2, 3, 4)
  • Dylan S. Small (1)

  1. Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, USA
  2. Center for Outcomes Research, The Children’s Hospital of Philadelphia, Philadelphia, USA
  3. Department of Pediatrics, School of Medicine, University of Pennsylvania, Philadelphia, USA
  4. Division of Neonatology, The Children’s Hospital of Philadelphia, Philadelphia, USA
