Abstract
A frequent concern in making statistical inference for causal effects of a policy or treatment based on observational studies is that there are unmeasured confounding variables. The instrumental variable method is an approach to estimating a causal relationship in the presence of unmeasured confounding variables. A valid instrumental variable needs to be independent of the unmeasured confounding variables, so a confounding variable that is correlated with the instrument must be controlled for. In health services research, socioeconomic status variables are often considered as confounding variables. In recent studies, distance to a specialty care center has been used as an instrument for the effect of specialty care vs. general care. Because the instrument may be correlated with socioeconomic status variables, it is important that socioeconomic status variables are controlled for in the instrumental variables regression. However, health data sets often lack individual socioeconomic information but contain area average socioeconomic information from the US Census, e.g., average income or education level in a county. We study the effects on the bias of the two stage least squares estimates in instrumental variables regression when using an area-level variable as a controlled confounding variable that may be correlated with the instrument. We propose the aggregated instrumental variables regression, which uses the concept of Wald’s method of grouping and relies on the assumption that the grouping is independent of the errors. We present simulation results and an application to a study of perinatal care for premature infants.
References
Abadie, A.: Semiparametric instrumental variable estimation of treatment response models. J. Econometr. 113, 231–263 (2003)
American Academy of Pediatrics, Committee on Fetus and Newborn: Levels of neonatal care. Pediatrics 114(5), 1341–1347 (2004)
Angrist, J.D.: Grouped-data estimation and testing in simple labor-supply models. J. Econometr. 47, 243–266 (1991)
Angrist, J.D., Imbens, G.W., Rubin, D.B.: Identification of causal effects using instrumental variables. J. Am. Stat. Assoc. 91(434), 444–455 (1996)
Angrist, J.D., Krueger, A.B.: Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Working Paper 8456, National Bureau of Economic Research (2001)
Baiocchi, M., Small, D.S., Lorch, S., Rosenbaum, P.R.: Building a stronger instrument in an observational study of perinatal care for premature infants. J. Am. Stat. Assoc. 105(492), 1285–1296 (2010)
Brookhart, M.A., Schneeweiss, S.: Preference-based instrumental variable methods for the estimation of treatment effects: assessing validity and interpreting results. Int. J. Biostat. 3(1), Article 14 (2007)
Card, D., Krueger, A.B.: Does school quality matter? Returns to education and the characteristics of public schools in the United States. J. Polit. Econ. 100(1), 1–40 (1992)
Cifuentes, J., Bronstein, J., Phibbs, C.S., Phibbs, R.H., Schmitt, S.K., Carlo, W.A.: Mortality in low birth weight infants according to level of neonatal care at hospital of birth. Pediatrics 109(5), 745–751 (2002)
Geronimus, A.T., Bound, J.: Use of census-based aggregate variables to proxy for socioeconomic group: evidence from national samples. Am. J. Epidemiol. 148(5), 475–486 (1998)
Geronimus, A.T., Bound, J., Neidert, L.J.: On the validity of using census geocode characteristics to proxy individual socioeconomic characteristics. J. Am. Stat. Assoc. 91(434), 529–537 (1996)
Hernán, M.A., Robins, J.M.: Instruments for causal inference: an epidemiologist’s dream? Epidemiology 17(4), 360–372 (2006)
Holland, P.W.: Causal inference, path analysis, and recursive structural equations models. Sociol. Methodol. 18, 449–484 (1988)
Joffe, M.M., Small, D., Ten Have, T., Brunelli, S., Feldman, H.I.: Extended instrumental variables estimation for overall effects. Int. J. Biostat. 4(1), Article 4 (2008)
Krieger, N.: Overcoming the absence of socioeconomic data in medical records: validation and application of a census-based methodology. Am. J. Public Health 82(5), 703–710 (1992)
Krieger, N., Chen, J.T., Waterman, P.D., Rehkopf, D.H., Subramanian, S.V.: Race/ethnicity, gender, and monitoring socioeconomic gradients in health: a comparison of area-based socioeconomic measures – the public health disparities geocoding project. Am. J. Public Health 93(10), 1655–1671 (2003)
Krieger, N., Chen, J.T., Waterman, P.D., Soobader, M.-J., Subramanian, S.V., Carson, R.: Choosing area based socioeconomic measures to monitor social inequalities in low birth weight and childhood lead poisoning: the public health disparities geocoding project (US). J. Epidemiol. Commun. Health 57, 186–199 (2003)
Lipsitz, S., Fitzmaurice, G.: Generalized estimating equations for longitudinal data analysis. In: Fitzmaurice, G., Davidian, M., Verbeke, G., Molenberghs, G. (eds.), Longitudinal Data Analysis, pp. 43–78. CRC/Chapman & Hall, Boca Raton, FL (2009)
Lorch, S.A., Baiocchi, M., Ahlberg, C.E., Small, D.S.: The differential impact of delivery hospital on the outcomes of premature infants. Pediatrics (in press) (2012)
Lorch, S.A., Myers, S., Carr, B.: The regionalization of pediatric health care. Pediatrics 126(6), 1182–1190 (2010)
Mayer, S.E., Jencks, C.: Growing up in poor neighborhoods: how much does it matter? Science 243(4897), 1441–1445 (1989)
McClellan, M., McNeil, B.J., Newhouse, J.P.: Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? J. Am. Med. Assoc. 272(1), 859–866 (1994)
Neyman, J.: On the application of probability theory to agricultural experiments (translated and edited by D.M. Dabrowska and T. P. Speed). Stat. Sci. 5(4), 465–480 (1990)
Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, New York (2000)
Phibbs, C.S., Baker, L.C., Caughey, A.B., Danielsen, B., Schmitt, S.K., Phibbs, R.H.: Level and volume of neonatal intensive care and mortality in very-low-birth-weight infants. New Engl. J. Med. 356, 2165–2175 (2007)
Phibbs, C.S., Mark, D.H., Luft, H.S., Peltzman-Rennie, D.J., Garnick, D.W., Lichtenberg, E., McPhee, S.J.: Choice of hospital for delivery: a comparison of high-risk and low-risk women. Health Serv. Res. 28(2), 201–222 (1993)
Phibbs, C.S., Robinson, J.C.: A variable-radius measure of local hospital market structure. Health Serv. Res. 28(3), 313–324 (1993)
Prais, S.J., Aitchison, J.: The grouping of observations in regression analysis. Rev. Int. Stat. Inst. 22(1/3), 1–22 (1954)
Rogowski, J.A., Horbar, J.D., Staiger, D.O., Kenny, M., Carpenter, J., Geppert, J.: Indirect vs direct hospital quality indicators for very-low-birth-weight infants. J. Am. Med. Assoc. 291(2), 202–209 (2004)
Rosenbaum, P.R., Rubin, D.B.: Discussion of “on state education statistics”: a difficulty with regression analyses of regional test score averages. J. Edu. Stat. 10(4), 326–333 (1985)
Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66(5), 688–701 (1974)
Rubin, D.B.: Statistics and causal inference: comment: which ifs have causal answers. J. Am. Stat. Assoc. 81(396), 961–962 (1986)
Stock, J.H., Wright, J.H., Yogo, M.: A survey of weak instruments and weak identification in generalized method of moments. J. Bus. Econ. Stat. 20(4), 518–529 (2002)
Theil, H.: Principles of Econometrics. Wiley, New York (1971)
Wald, A.: The fitting of straight lines if both variables are subject to error. Ann. Math. Stat. 11(3), 284–300 (1940)
Acknowledgments
The authors thank the Editors and the referees for helpful comments. This work was supported by the National Science Foundation (Measurement, Methodology and Statistics program) grant # NSF 0961971. This work was also supported by Maternal and Child Health Bureau (MCHB) grant # R40 MC05474-01-00 and by Agency for Healthcare Research and Quality (AHRQ) grant # R01 HS 01569.
Appendix: Consistency of the TSLS estimator from the aggregate IV regression and its variance estimate
In this section, we show the consistency of \(\hat{\beta}^{agg}\) obtained from the aggregate IV regression in Sect. 4. We also provide an estimate for the variance of \(\hat{\beta}^{agg}\).
Let \(({\bf Y}^{agg}, {\bf X}^{agg}, {\bf D}^{agg}, {\bf Z}^{agg})\) denote the vectors of aggregated \(({\bf Y}, {\bf X}, {\bf D}, {\bf Z})\), and let \(\underline{{\bf W}}^{agg} = [{\bf D}^{agg}, {\bf X}^{agg}]\) and \(\underline{{\bf A}}^{agg} = [{\bf Z}^{agg}, {\bf X}^{agg}]\). We write \((\underline{{\bf W}}^{agg}, \underline{{\bf A}}^{agg})\) to distinguish them from the \(({\bf W}^{agg}, {\bf A}^{agg})\) used in Sect. 3, in which \({\bf W}^{agg} = [{\bf D}, {\bf X}^{agg}]\) and \({\bf A}^{agg} = [{\bf Z}, {\bf X}^{agg}]\). Let \(({\bf H}_{Y}, {\bf H}_{A}, {\bf H}_{W})\) be the aggregation errors for \(({\bf Y}, {\bf A}, {\bf W})\), where \({\bf H}_{Y} = {\bf Y} - {\bf Y}^{agg},\, {\bf H}_{A} = {\bf A} - \underline{{\bf A}}^{agg}\), and \({\bf H}_{W} = {\bf W} - \underline{{\bf W}}^{agg}\). The TSLS estimator \(\hat{\beta}^{agg}\) obtained from all aggregate variables is
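The display numbered (10) does not survive in this text. A sketch of the form it plausibly takes is given here, assuming the structural model \({\bf Y} = {\bf W}\beta + \varepsilon\) with \({\bf W} = [{\bf D}, {\bf X}]\) and \({\bf A} = [{\bf Z}, {\bf X}]\) (an assumption carried over from the usual TSLS setup, not confirmed by the lines above), and writing \({\bf P}\) for the projection onto the columns of \(\underline{{\bf A}}^{agg}\) (notation introduced here for illustration):

\[
\hat{\beta}^{agg}
= \left(\underline{{\bf W}}^{agg^{T}} {\bf P}\, \underline{{\bf W}}^{agg}\right)^{-1} \underline{{\bf W}}^{agg^{T}} {\bf P}\, {\bf Y}^{agg}
= \beta + \left(\underline{{\bf W}}^{agg^{T}} {\bf P}\, \underline{{\bf W}}^{agg}\right)^{-1} \underline{{\bf W}}^{agg^{T}} {\bf P} \left({\bf H}_{W}\beta - {\bf H}_{Y} + \varepsilon\right),
\qquad
{\bf P} = \underline{{\bf A}}^{agg}\left(\underline{{\bf A}}^{agg^{T}} \underline{{\bf A}}^{agg}\right)^{-1} \underline{{\bf A}}^{agg^{T}}.
\]

The second equality uses \({\bf Y}^{agg} = {\bf Y} - {\bf H}_{Y} = (\underline{{\bf W}}^{agg} + {\bf H}_{W})\beta + \varepsilon - {\bf H}_{Y}\). Under this form, the estimation error depends on the data only through \(\underline{{\bf A}}^{agg^{T}}({\bf H}_{W}\beta - {\bf H}_{Y} + \varepsilon)\).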
If we can show that \(\underline{{\bf A}}^{agg^{T}}({\bf H}_{W} \beta - {\bf H}_{Y} + \varepsilon) \xrightarrow{p}{\bf 0}\) in (10), then \(\hat{\beta}^{agg}\) is a consistent estimator for β. We can write the aggregate matrices \(({\bf Y}^{agg}, \underline{{\bf A}}^{agg}, \underline{{\bf W}}^{agg})\) as \(({\bf G}{\bf Y}, {\bf G}{\bf A}, {\bf G}{\bf W})\). The matrix \({\bf G}\) is a diagonal grouping matrix, where
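The display defining \({\bf G}\) is not reproduced in this text. The following is a minimal sketch, assuming that \({\bf G}\) replaces each unit's value by the mean of its group, so that after sorting by group it is block diagonal with blocks \(\frac{1}{n_j}{\bf 1}{\bf 1}^{T}\). The function name `grouping_matrix`, the area labels, and the random stand-in for \({\bf W}\) are hypothetical and chosen only for illustration. The sketch checks numerically the properties of \({\bf G}\) that the consistency argument relies on.

```python
import numpy as np

def grouping_matrix(groups):
    """Return G such that G @ v replaces each entry of v by its group mean.

    Assumed form (not taken from the source): entry (i, k) equals 1/n_j when
    units i and k both belong to group j, and 0 otherwise.
    """
    groups = np.asarray(groups)
    same_group = groups[:, None] == groups[None, :]   # n x n membership indicator
    return same_group / same_group.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
groups = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])   # hypothetical area labels
G = grouping_matrix(groups)

W = rng.normal(size=(groups.size, 2))   # stand-in for the columns [D, X]
W_agg = G @ W                           # each row is now its area mean

# Properties used in the consistency argument.
assert np.allclose(G, G.T)                    # symmetric
assert np.allclose(G @ G, G)                  # idempotent
assert np.allclose(G.T @ (W - W_agg), 0.0)    # aggregation errors vanish after grouping
```

Under this averaging form, the aggregation errors \({\bf H}_{W}\) and \({\bf H}_{Y}\) are within-group deviations, which the grouping matrix maps to zero.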
Thus, \(\underline{{\bf A}}^{agg^{T}}({\bf H}_{W}\beta - {\bf H}_{Y} + \varepsilon)\) can be written as \({\bf A}^{T}{\bf G}^{T}\{({\bf W} - {\bf G}{\bf W})\beta - ({\bf Y} - {\bf G}{\bf Y}) + \varepsilon\}\). Since \({\bf G}\) is a symmetric and idempotent matrix, \({\bf G}^{T}({\bf W} - {\bf G}{\bf W})\) and \({\bf G}^{T}({\bf Y} - {\bf G}{\bf Y})\) are zero. Also, \({\bf A}^{T}{\bf G}^{T}\varepsilon\) converges in probability to zero because of the assumption of independence between \({\bf G}\) and \(\varepsilon\). The variance of \(\hat{\beta}^{agg}\) is
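The variance display is not reproduced in this text. A sketch of a sandwich form consistent with the argument above, treating the aggregate regressors and instruments as fixed and writing \({\bf P} = \underline{{\bf A}}^{agg}(\underline{{\bf A}}^{agg^{T}} \underline{{\bf A}}^{agg})^{-1} \underline{{\bf A}}^{agg^{T}}\) (notation introduced here, not taken from the source), would be

\[
Var(\hat{\beta}^{agg}) \approx
\left(\underline{{\bf W}}^{agg^{T}} {\bf P}\, \underline{{\bf W}}^{agg}\right)^{-1}
\underline{{\bf W}}^{agg^{T}} {\bf P}\; Var({\bf G}\varepsilon)\; {\bf P}\, \underline{{\bf W}}^{agg}
\left(\underline{{\bf W}}^{agg^{T}} {\bf P}\, \underline{{\bf W}}^{agg}\right)^{-1},
\]

which follows because \({\bf G}^{T}{\bf H}_{W} = {\bf 0}\), \({\bf G}^{T}{\bf H}_{Y} = {\bf 0}\), and \({\bf G}^{T} = {\bf G}^{T}{\bf G}\) together give \({\bf P}({\bf H}_{W}\beta - {\bf H}_{Y} + \varepsilon) = {\bf P}{\bf G}\varepsilon\). The exact expression in the published appendix may differ,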
where \(Var({\bf G} \varepsilon)\) can be estimated by
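The consistency argument in this appendix can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the simulation study reported in the article: the data-generating process, the parameter values, and the helper names `tsls` and `group_means` are all hypothetical. Areas are assigned independently of the errors, an unmeasured confounder `U` affects both treatment and outcome, and the individual-level covariate `X` is correlated with the instrument `Z` both across and within areas. It compares the TSLS estimate computed from area means of all variables with the estimate that keeps individual `D` and `Z` but controls only for the area mean of `X`, as in the Sect. 3 setup \({\bf W}^{agg} = [{\bf D}, {\bf X}^{agg}]\).

```python
import numpy as np

def tsls(y, W, A):
    """Two stage least squares: project W on A, then regress y on the fitted W."""
    W_hat = A @ np.linalg.lstsq(A, W, rcond=None)[0]
    return np.linalg.lstsq(W_hat, y, rcond=None)[0]

def group_means(v, group, n_groups):
    """Mean of v within each group label 0..n_groups-1."""
    counts = np.bincount(group, minlength=n_groups)
    return np.bincount(group, weights=v, minlength=n_groups) / counts

rng = np.random.default_rng(2012)
J, n = 2000, 20                     # hypothetical: 2000 areas, 20 units each
N = J * n
beta, theta = 2.0, 1.0              # hypothetical true effects of D and X on Y
group = np.repeat(np.arange(J), n)  # grouping is independent of the errors

z_area = rng.normal(size=J)
x_area = 0.5 * z_area + rng.normal(size=J)
a = rng.normal(size=N)                              # within-area variation in Z
Z = z_area[group] + a                               # instrument
X = x_area[group] + 0.5 * a + rng.normal(size=N)    # individual SES, correlated with Z
U = rng.normal(size=N)                              # unmeasured confounder
D = Z + X + U + rng.normal(size=N)                  # treatment
Y = beta * D + theta * X + U + rng.normal(size=N)   # outcome

# Aggregated IV regression: every variable replaced by its area mean.
# With equal area sizes, using the J area means gives the same coefficients
# as using the N rows of replicated means.
Y_agg, D_agg, Z_agg, X_agg = (group_means(v, group, J) for v in (Y, D, Z, X))
ones_J = np.ones(J)
beta_agg = tsls(Y_agg,
                np.column_stack([D_agg, X_agg, ones_J]),
                np.column_stack([Z_agg, X_agg, ones_J]))[0]

# Contrast: individual D and Z, with only the area mean of X as a control.
ones_N = np.ones(N)
X_proxy = X_agg[group]
beta_mixed = tsls(Y,
                  np.column_stack([D, X_proxy, ones_N]),
                  np.column_stack([Z, X_proxy, ones_N]))[0]

print(f"true beta = {beta}, aggregated TSLS = {beta_agg:.3f}, "
      f"area-level proxy TSLS = {beta_mixed:.3f}")
```

In this hypothetical design, the within-area part of `X` is left in the error term of the second regression and is correlated with `Z`, so that estimate is expected to be pulled away from the true value, while the fully aggregated estimate should stay close to it.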
Cite this article
Hsu, J.Y., Lorch, S.A. & Small, D.S. Perils and prospects of using aggregate area level socioeconomic information as a proxy for individual level socioeconomic confounders in instrumental variables regression. Health Serv Outcomes Res Method 12, 119–140 (2012). https://doi.org/10.1007/s10742-012-0095-9
DOI: https://doi.org/10.1007/s10742-012-0095-9