Perils and prospects of using aggregate area level socioeconomic information as a proxy for individual level socioeconomic confounders in instrumental variables regression

Health Services and Outcomes Research Methodology

Abstract

A frequent concern in drawing causal inferences about a policy or treatment from observational studies is the presence of unmeasured confounding variables. The instrumental variable method is an approach to estimating a causal relationship in the presence of such confounders. A valid instrumental variable must be independent of the unmeasured confounding variables, so any confounding variable that is correlated with the instrument must be controlled for. In health services research, socioeconomic status variables are often regarded as confounders. In recent studies, distance to a specialty care center has been used as an instrument for the effect of specialty care vs. general care. Because this instrument may be correlated with socioeconomic status, it is important to control for socioeconomic status variables in the instrumental variables regression. However, health data sets often lack individual socioeconomic information and instead contain area-average socioeconomic information from the US Census, e.g., average income or education level in a county. We study how the bias of the two-stage least squares (TSLS) estimates in instrumental variables regression is affected when an area-level variable is used to control for a confounding variable that may be correlated with the instrument. We propose an aggregated instrumental variables regression based on Wald’s method of grouping, under the assumption that the grouping is independent of the errors. We present simulation results and an application to a study of perinatal care for premature infants.

Acknowledgments

The authors thank the Editors and the referees for helpful comments. This work was supported by the National Science Foundation (Measurement, Methodology and Statistics program) grant # NSF 0961971. This work was also supported by Maternal and Child Health Bureau (MCHB) grant # R40 MC05474-01-00 and by Agency for Healthcare Research and Quality (AHRQ) grant # R01 HS 01569.

Author information

Corresponding author

Correspondence to Jesse Yenchih Hsu.

Appendix: Consistency of the TSLS estimator from the aggregate IV regression and its variance estimate

In this section, we show the consistency of \(\hat{\beta}^{agg}\) obtained from the aggregate IV regression in Sect. 4 and provide an estimate of the variance of \(\hat{\beta}^{agg}\).

Let \(({\bf Y}^{agg}, {\bf X}^{agg}, {\bf D}^{agg}, {\bf Z}^{agg})\) denote the vectors of aggregated \(({\bf Y}, {\bf X}, {\bf D}, {\bf Z})\), and let \(\underline{{\bf W}}^{agg} = [{\bf D}^{agg}, {\bf X}^{agg}]\) and \(\underline{{\bf A}}^{agg} = [{\bf Z}^{agg}, {\bf X}^{agg}]\). We use \((\underline{{\bf W}}^{agg}, \underline{{\bf A}}^{agg})\) to distinguish these from the \(({\bf W}^{agg}, {\bf A}^{agg})\) used in Sect. 3, in which \({\bf W}^{agg} = [{\bf D}, {\bf X}^{agg}]\) and \({\bf A}^{agg} = [{\bf Z}, {\bf X}^{agg}]\). Let \(({\bf H}_{Y}, {\bf H}_{A}, {\bf H}_{W})\) be the aggregation errors for \(({\bf Y}, {\bf A}, {\bf W})\), where \({\bf H}_{Y} = {\bf Y} - {\bf Y}^{agg},\, {\bf H}_{A} = {\bf A} - \underline{{\bf A}}^{agg}\), and \({\bf H}_{W} = {\bf W} - \underline{{\bf W}}^{agg}\). The TSLS estimator \(\hat{\beta}^{agg}\) obtained from all aggregate variables is

$$ \begin{aligned} \hat{\beta}^{agg} &= (\underline{{\bf A}}^{agg^{T}} \underline{{\bf W}}^{agg})^{-1} \underline{{\bf A}}^{agg^{T}}{\bf Y}^{agg} \\ &= (\underline{{\bf A}}^{agg^{T}} \underline{{\bf W}}^{agg})^{-1} \underline{{\bf A}}^{agg^{T}}\{(\underline{{\bf W}}^{agg} + {\bf H}_{W}) \beta - {\bf H}_{Y} + \varepsilon\} \\ &= \beta + (\underline{{\bf A}}^{agg^{T}} \underline{{\bf W}}^{agg})^{-1} \underline{{\bf A}}^{agg^{T}}({\bf H}_{W} \beta - {\bf H}_{Y} + \varepsilon) \end{aligned} $$
(10)
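The estimator in (10) can be illustrated with a small simulation. The sketch below is only a hypothetical setup, not the authors' simulation design: the data-generating process, the coefficient values, the function names (`group_means`, `iv_estimate`, `simulate`) and the added intercept column are all assumptions made for illustration. It compares the fully aggregated estimator with the Sect. 3 style regression that keeps individual D and Z but substitutes the area-level mean for individual X.

```python
import numpy as np

def group_means(values, groups):
    """Replace each individual's value by the mean of its group."""
    counts = np.bincount(groups)
    sums = np.bincount(groups, weights=values)
    return (sums / counts)[groups]

def iv_estimate(y, W, A):
    """Just-identified IV/TSLS estimate (A'W)^{-1} A'y."""
    return np.linalg.solve(A.T @ W, A.T @ y)

def simulate(n_groups=400, group_size=50, beta_d=2.0, beta_x=1.0, seed=0):
    """Hypothetical data: individual SES x is correlated with the instrument z."""
    rng = np.random.default_rng(seed)
    groups = np.repeat(np.arange(n_groups), group_size)
    n = groups.size
    area_ses = rng.normal(size=n_groups)[groups]     # area-level SES component
    x = area_ses + rng.normal(size=n)                # individual SES
    area_shift = rng.normal(size=n_groups)[groups]   # area-level variation in z
    z = 0.5 * x + area_shift + rng.normal(size=n)    # instrument, correlated with x
    u = rng.normal(size=n)                           # unmeasured confounder
    d = z + 0.5 * x + u + rng.normal(size=n)         # treatment
    eps = -1.5 * u + rng.normal(size=n)              # error, independent of grouping
    y = beta_d * d + beta_x * x + eps
    return y, d, x, z, groups

y, d, x, z, groups = simulate()
ones = np.ones_like(y)

def agg(v):
    return group_means(v, groups)

# Fully aggregated regression: every variable replaced by its group mean.
W_agg = np.column_stack([agg(d), agg(x), ones])
A_agg = np.column_stack([agg(z), agg(x), ones])
beta_agg = iv_estimate(agg(y), W_agg, A_agg)

# Proxy regression: individual Y, D, Z, but area-level mean in place of X.
W_proxy = np.column_stack([d, agg(x), ones])
A_proxy = np.column_stack([z, agg(x), ones])
beta_proxy = iv_estimate(y, W_proxy, A_proxy)

print("true treatment effect      :", 2.0)
print("aggregated IV estimate     :", beta_agg[0])
print("area-level proxy estimate  :", beta_proxy[0])
```

In this assumed setup the within-area part of X stays correlated with Z, so the proxy regression leaves part of the confounding uncontrolled, whereas the aggregated regression removes the within-area variation from every variable at once.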

If we can show that \(\underline{{\bf A}}^{agg^{T}}({\bf H}_{W} \beta - {\bf H}_{Y} + \varepsilon) \xrightarrow{p}{\bf 0}\) in (10), then \(\hat{\beta}^{agg}\) is a consistent estimator for β. We can write the aggregate matrices \(({\bf Y}^{agg}, \underline{{\bf A}}^{agg}, \underline{{\bf W}}^{agg})\) as \(({\bf G}{\bf Y}, {\bf G}{\bf A}, {\bf G}{\bf W})\). The matrix \({\bf G}\) is a block-diagonal grouping matrix, where

$$ {\bf G} = \left[\begin{array}{llll} {\bf G}_{1} & {\bf 0} & \cdots & {\bf 0} \\ {\bf 0} & {\bf G}_{2} & \cdots & {\bf 0} \\ \vdots & \vdots & \ddots & \vdots \\ {\bf 0} & {\bf 0} & \cdots & {\bf G}_{j}\\ \end{array}\right] \quad \hbox{and} \quad {\bf G}_{j} = \left[\begin{array}{lll} 1/n_{j} & \cdots & 1/n_{j} \\ \vdots & \ddots & \vdots \\ 1/n_{j} & \cdots & 1/n_{j}\\ \end{array}\right]. $$
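As a quick numerical check of this construction, the following sketch builds a small grouping matrix and verifies that it is symmetric and idempotent, and that it maps each row to its group mean. The group sizes and the helper name `grouping_matrix` are illustrative assumptions.

```python
import numpy as np

def grouping_matrix(group_sizes):
    """Block-diagonal matrix whose j-th block is an n_j x n_j matrix of 1/n_j."""
    n = sum(group_sizes)
    G = np.zeros((n, n))
    start = 0
    for n_j in group_sizes:
        G[start:start + n_j, start:start + n_j] = 1.0 / n_j
        start += n_j
    return G

G = grouping_matrix([2, 3, 4])          # three hypothetical groups, n = 9
rng = np.random.default_rng(1)
W = rng.normal(size=(9, 2))             # arbitrary individual-level data

is_symmetric = np.allclose(G, G.T)
is_idempotent = np.allclose(G @ G, G)
# G W replaces each row of W by the column means of its group.
first_group_is_mean = np.allclose((G @ W)[:2], W[:2].mean(axis=0))
# Aggregation errors vanish after re-aggregation: G^T (W - G W) = 0.
errors_annihilated = np.allclose(G.T @ (W - G @ W), 0.0)

print(is_symmetric, is_idempotent, first_group_is_mean, errors_annihilated)
```

Forming the full n × n matrix is convenient for checking the algebra, but for real data sets it is usually cheaper to compute group means directly.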

Thus, \(\underline{{\bf A}}^{agg^{T}}({\bf H}_{W}\beta - {\bf H}_{Y} + \varepsilon)\) can be written as \({\bf A}^{T}{\bf G}^{T}\{({\bf W} - {\bf G}{\bf W})\beta - ({\bf Y} - {\bf G}{\bf Y}) + \varepsilon\}\). Since \({\bf G}\) is a symmetric and idempotent matrix, \({\bf G}^{T}({\bf W} - {\bf G}{\bf W}) = {\bf G}{\bf W} - {\bf G}{\bf G}{\bf W} = {\bf 0}\), and likewise \({\bf G}^{T}({\bf Y} - {\bf G}{\bf Y}) = {\bf 0}\). Also, \({\bf A}^{T}{\bf G}^{T}\varepsilon\) converges in probability to zero because of the assumption of independence between \({\bf G}\) and \(\varepsilon\). The variance of \(\hat{\beta}^{agg}\) is

$$ \begin{aligned} Var\left(\hat{\beta}^{agg}\right) &= Var\left\{(\underline{{\bf A}}^{agg^{T}} \underline{{\bf W}}^{agg})^{-1} \underline{{\bf A}}^{agg^{T}} ({\bf H}_{W} \beta - {\bf H}_{Y} + \varepsilon)\right\} \\ & = ({\bf A}^{T}{\bf G}{\bf W})^{-1}{\bf A}^{T}{\bf G} \times Var\left\{({\bf W} - {\bf G}{\bf W})\beta - ({\bf Y} - {\bf G}{\bf Y}) + \varepsilon\right\} \times {\bf G}{\bf A}({\bf W}^{T}{\bf G}{\bf A})^{-1} \\ &= ({\bf A}^{T}{\bf G}{\bf W})^{-1}{\bf A}^{T}{\bf G} \times Var\left\{({\bf G}{\bf Y} - {\bf G}{\bf W} \beta)\right\} \times {\bf G}{\bf A}({\bf W}^{T}{\bf G}{\bf A})^{-1} \\ &= ({\bf A}^{T}{\bf G}{\bf W})^{-1}{\bf A}^{T}{\bf G} \times Var({\bf G}\varepsilon) \times {\bf G}{\bf A}({\bf W}^{T}{\bf G}{\bf A})^{-1}, \end{aligned} $$
(11)

where \(Var({\bf G} \varepsilon)\) can be estimated by

$$ \begin{aligned} \widehat{Var}({\bf G}\varepsilon) &= \left[\begin{array}{llll} \hat{\Upsigma}_{1} & {\bf 0} & \cdots & {\bf 0} \\ {\bf 0} & \hat{\Upsigma}_{2} & \cdots & {\bf 0} \\ \vdots & \vdots & \ddots & \vdots \\ {\bf 0} & {\bf 0} & \cdots & \hat{\Upsigma}_{j}\\ \end{array}\right] \quad \hbox{and} \quad \\ \hat{\Upsigma}_{j} &= \left[\begin{array}{ccc} (n_{j}-1)^{-1}\sum_{i=1}^{n_{j}}(y_{j}^{agg}-[z_{j}^{agg}, x_{j}^{agg}]\hat{\beta}^{agg})^{2} & \cdots & (n_{j}-1)^{-1}\sum_{i=1}^{n_{j}}(y_{j}^{agg}-[z_{j}^{agg}, x_{j}^{agg}]\hat{\beta}^{agg})^{2} \\ \vdots & \ddots & \vdots \\ (n_{j}-1)^{-1}\sum_{i=1}^{n_{j}}(y_{j}^{agg}-[z_{j}^{agg}, x_{j}^{agg}]\hat{\beta}^{agg})^{2} & \cdots & (n_{j}-1)^{-1}\sum_{i=1}^{n_{j}}(y_{j}^{agg}-[z_{j}^{agg}, x_{j}^{agg}]\hat{\beta}^{agg})^{2}\\ \end{array}\right]. \end{aligned} $$
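The sandwich form in (11) can be computed from group-level quantities alone, because each block of \(\widehat{Var}({\bf G}\varepsilon)\) is a constant matrix. The sketch below is one possible reading of the formulas, under explicit assumptions: the group residual is taken to be the aggregated outcome minus the aggregated regressors \(\underline{{\bf W}}^{agg}\) times \(\hat{\beta}^{agg}\), an intercept column is added, every group has at least two members, and the simulated data and all function names are hypothetical.

```python
import numpy as np

def group_mean_rows(M, groups):
    """Return a J x p array with one row of column means per group."""
    sizes = np.bincount(groups)
    sums = np.zeros((sizes.size, M.shape[1]))
    np.add.at(sums, groups, M)           # accumulate rows into their group
    return sums / sizes[:, None]

def aggregate_iv_with_variance(y, W, A, groups):
    """Aggregated IV estimate and a sandwich variance estimate in the spirit of (11)."""
    sizes = np.bincount(groups).astype(float)      # n_j, assumed >= 2
    W_bar = group_mean_rows(W, groups)
    A_bar = group_mean_rows(A, groups)
    y_bar = group_mean_rows(y[:, None], groups)[:, 0]
    # A^T G W = sum_j n_j * a_bar_j w_bar_j^T
    AGW = (A_bar * sizes[:, None]).T @ W_bar
    beta = np.linalg.solve(AGW, (A_bar * sizes[:, None]).T @ y_bar)
    resid = y_bar - W_bar @ beta                   # one residual per group
    # Common entry of the j-th block: (n_j - 1)^{-1} * sum_i resid_j^2
    s = sizes / (sizes - 1.0) * resid ** 2
    # A^T G Var_hat(G eps) G A = sum_j s_j * n_j^2 * a_bar_j a_bar_j^T
    meat = (A_bar * (s * sizes ** 2)[:, None]).T @ A_bar
    bread = np.linalg.inv(AGW)
    return beta, bread @ meat @ bread.T

# Hypothetical data with true treatment effect 2.0 (illustrative only).
rng = np.random.default_rng(0)
n_groups, group_size = 400, 50
groups = np.repeat(np.arange(n_groups), group_size)
n = groups.size
x = rng.normal(size=n_groups)[groups] + rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n_groups)[groups] + rng.normal(size=n)
u = rng.normal(size=n)                              # unmeasured confounder
d = z + 0.5 * x + u + rng.normal(size=n)
y = 2.0 * d + 1.0 * x - 1.5 * u + rng.normal(size=n)

ones = np.ones(n)
W = np.column_stack([d, x, ones])
A = np.column_stack([z, x, ones])
beta, cov = aggregate_iv_with_variance(y, W, A, groups)
se = np.sqrt(np.diag(cov))
print("estimate of treatment effect:", beta[0], "standard error:", se[0])
```

Working with one row per group avoids forming the n × n matrices in (11) explicitly while giving the same matrix products.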

About this article

Cite this article

Hsu, J.Y., Lorch, S.A. & Small, D.S. Perils and prospects of using aggregate area level socioeconomic information as a proxy for individual level socioeconomic confounders in instrumental variables regression. Health Serv Outcomes Res Method 12, 119–140 (2012). https://doi.org/10.1007/s10742-012-0095-9

  • DOI: https://doi.org/10.1007/s10742-012-0095-9
