Health Services and Outcomes Research Methodology, Volume 12, Issue 2, pp 119–140

Perils and prospects of using aggregate area level socioeconomic information as a proxy for individual level socioeconomic confounders in instrumental variables regression


  • J. Y. Hsu
    • Department of Statistics, Wharton School, University of Pennsylvania
    • Center for Outcomes Research, The Children’s Hospital of Philadelphia
  • Scott A. Lorch
    • Center for Outcomes Research, The Children’s Hospital of Philadelphia
    • Department of Pediatrics, School of Medicine, University of Pennsylvania
    • Division of Neonatology, The Children’s Hospital of Philadelphia
  • Dylan S. Small
    • Department of Statistics, Wharton School, University of Pennsylvania

DOI: 10.1007/s10742-012-0095-9

Cite this article as:
Hsu, J.Y., Lorch, S.A. & Small, D.S. Health Serv Outcomes Res Method (2012) 12: 119. doi:10.1007/s10742-012-0095-9


A frequent concern when drawing causal inferences about a policy or treatment from observational studies is the presence of unmeasured confounding variables. The instrumental variable method is an approach to estimating a causal relationship in the presence of unmeasured confounding variables. A valid instrumental variable needs to be independent of the unmeasured confounding variables, so any confounding variable that is correlated with the instrument must be controlled for. In health services research, socioeconomic status variables are often considered confounding variables. In recent studies, distance to a specialty care center has been used as an instrument for the effect of specialty care vs. general care. Because this instrument may be correlated with socioeconomic status variables, it is important that they are controlled for in the instrumental variables regression. However, health data sets often lack individual socioeconomic information and contain only area average socioeconomic information from the US Census, e.g., average income or education level in a county. We study the bias of the two-stage least squares estimates in instrumental variables regression when an area-level variable is used as the controlled confounding variable and may be correlated with the instrument. We propose the aggregated instrumental variables regression, which builds on Wald’s method of grouping, under the assumption that the grouping is independent of the errors. We present simulation results and an application to a study of perinatal care for premature infants.
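The contrast described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design; it assumes a simple linear model in which the instrument is correlated with individual socioeconomic status, and all names (`tsls`, `simulate`, `area_mean`) and coefficient values are hypothetical choices made for illustration. It compares two-stage least squares when controlling for the individual-level confounder, when controlling for its area average as a proxy, and when all variables are first averaged within areas before the regression is run.

```python
import numpy as np


def tsls(y, d, z, controls):
    """Two-stage least squares estimate of the effect of d on y, with z as the instrument."""
    ones = np.ones(len(y))
    # First stage: regress the treatment on the instrument and the controls.
    first = np.column_stack([ones, z, controls])
    d_hat = first @ np.linalg.lstsq(first, d, rcond=None)[0]
    # Second stage: regress the outcome on the fitted treatment and the same controls.
    second = np.column_stack([ones, d_hat, controls])
    return np.linalg.lstsq(second, y, rcond=None)[0][1]


def simulate(n_areas=1000, n_per_area=40, beta=1.0, seed=0):
    """Assumed data-generating model (illustrative only, not taken from the paper)."""
    rng = np.random.default_rng(seed)
    area = np.repeat(np.arange(n_areas), n_per_area)
    n = area.size

    def area_mean(v):
        # Average of v within each area (all areas have n_per_area individuals).
        return np.bincount(area, weights=v) / n_per_area

    # Individual SES = area-level component + individual deviation.
    x = rng.normal(size=n_areas)[area] + rng.normal(size=n)
    # Instrument (e.g. distance) is correlated with individual SES.
    z = 0.5 * x + rng.normal(size=n)
    # Unmeasured confounder, independent of the instrument.
    u = rng.normal(size=n)
    d = z + x + u + rng.normal(size=n)          # treatment
    y = beta * d + x + u + rng.normal(size=n)   # outcome; true effect is beta

    x_bar = area_mean(x)
    return {
        # Control for the individual-level confounder (usually unavailable).
        "individual": tsls(y, d, z, x),
        # Control for the area average as a proxy, keeping individual-level data.
        "proxy": tsls(y, d, z, x_bar[area]),
        # Aggregate every variable to the area level, then run 2SLS on the means.
        "aggregated": tsls(area_mean(y), area_mean(d), area_mean(z), x_bar),
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}")
```

Under these assumptions, the proxy-controlled estimate is biased because the instrument remains correlated with the within-area variation in socioeconomic status, while the aggregated regression recovers the true effect at the cost of a larger variance. The grouping here is by area and is generated independently of the error terms, which mirrors the assumption stated in the abstract.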


Aggregation, Causal inference, Instrumental variables, Proxy variables, Wald’s grouping method

Copyright information

© Springer Science+Business Media, LLC 2012