Abstract
This chapter discusses the use of directed acyclic graphs (DAGs) for causal inference in the observational social sciences. It focuses on DAGs’ main uses, discusses central principles, and gives applied examples. DAGs are visual representations of qualitative causal assumptions: They encode researchers’ beliefs about how the world works. Straightforward rules map these causal assumptions onto the associations and independencies in observable data. The two primary uses of DAGs are (1) determining the identifiability of causal effects from observed data and (2) deriving the testable implications of a causal model. Concepts covered in this chapter include identification, d-separation, confounding, endogenous selection, and overcontrol. Illustrative applications then demonstrate that conditioning on variables at any stage in a causal process can induce as well as remove bias, that confounding is a fundamentally causal rather than an associational concept, that conventional approaches to causal mediation analysis are often biased, and that causal inference in social networks inherently faces endogenous selection bias. The chapter discusses several graphical criteria for the identification of causal effects of single, time-point treatments (including the famous backdoor criterion), as well as identification criteria for multiple, time-varying treatments.
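The backdoor logic summarized above can be illustrated with a short simulation. This is a sketch under assumed linear structural equations, not an example from the chapter; the variables X, T, Y and all coefficients are hypothetical. Adjusting for the common cause X closes the backdoor path T←X→Y, so the adjusted regression coefficient on T recovers the causal effect, while the unadjusted association is confounded.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical linear structural equations: X confounds T -> Y
# (X -> T, X -> Y, and T -> Y with causal effect 2)
x = rng.normal(size=n)
t = x + rng.normal(size=n)
y = 2 * t + 3 * x + rng.normal(size=n)

# Naive estimate: simple regression slope of Y on T.
# Biased, because the backdoor path T <- X -> Y is open.
naive = np.cov(t, y)[0, 1] / np.var(t)

# Backdoor adjustment: regress Y on T and X jointly.
# Conditioning on X blocks the backdoor path; the T coefficient
# now recovers the causal effect (approximately 2).
X_mat = np.column_stack([np.ones(n), t, x])
coef, *_ = np.linalg.lstsq(X_mat, y, rcond=None)
adjusted = coef[1]
```

With these equations the naive slope converges to 3.5 rather than 2, so the two estimates separate cleanly even in moderate samples.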
Notes
- 1.
Identification is also relative to the set of observed variables. Identification may be possible for one set of observed variables, but not for another set. Mimicking the logic of secondary data analysis, here I assume that the analyst is given a set of observed variables (and hence that all other variables are unobserved). Identification analysis can also be used to ask what sets of variables should be observed to achieve identification.
- 2.
A detailed tutorial for reading counterfactuals (including nested counterfactuals) off a DAG is presented in Section 4.4 of Pearl (2012a).
- 3.
By contrast, marginally correlated error terms must be explicitly included in the causal DAG, since they represent common causes.
- 4.
These assumptions are, first, the causal Markov assumption, which states that a variable is independent of its nondescendants given its parents, and second, stability or faithfulness, which, among other things, rules out exact cancelation of positive and negative effects. In this chapter, I mostly use weak faithfulness, which is the reason for interpreting arrows as possible rather than certain direct effects. Glymour and Greenland (2008) give an accessible summary. See Pearl (2009) and Spirtes et al. ([1993] 2001) for technical details.
- 5.
Common cause confounding by unobserved variables is sometimes represented by a dashed double-headed arrow.
- 6.
Terminology is in flux. The name “endogenous selection bias” highlights that the problem originates from conditioning on an endogenous variable. Others prefer “selection bias” (Hernán et al. 2004), “collider stratification bias” (Greenland 2003), “M-bias” (Greenland 2003), “Berkson’s [1946] bias,” “explaining away effect” (Kim and Pearl 1983), or “conditioning bias” (Morgan and Winship 2007). Simpson’s paradox (Hernán et al. 2011) and the Monty-Hall dilemma (Burns and Wieth 2004) involve specific examples of endogenous selection. The shared structure of some examples of endogenous selection bias has been known in the social sciences at least since Heckman (1976). For a comprehensive treatment, see Elwert and Winship (forthcoming).
- 7.
Endogenous selection bias is guaranteed if one assumes that positive and negative arrows do not cancel each other out exactly, i.e., if the DAG is faithful. Faithfulness is a mild assumption since exact cancelation is exceedingly unlikely in practice.
- 8.
Occasionally, a variable may be both a collider and a common cause. In that case, conditioning on the variable may eliminate confounding bias but induce endogenous selection bias, whereas not conditioning on the variable would lead to confounding bias yet eliminate endogenous selection bias (Greenland 2003). Nevertheless, the definitions of confounding and endogenous selection remain distinct.
- 9.
D-connectedness necessarily implies statistical dependence if the DAG is faithful.
- 10.
- 11.
- 12.
The requirement not to condition on a descendant of a variable on a causal path is explained in the discussion of Fig. 13.9 below.
- 13.
Pearl (1995, 2009) and others use so-called do-operator notation to write \( P\left( {y_t} \right) \) as \( P\left( {Y=y\,|\,\mathrm{do}(T=t)} \right) \). The do-operator do(T = t) emphasizes that T is set to t by intervention (“doing”). \( P(Y=y\,|\,\mathrm{do}(T=t)) \) gives the post-intervention distribution of Y if one intervened on T to set it to some specific value t, that is, the counterfactual distribution of Y.
- 14.
D_5 is a descendant of the collider T→D_2←e_2 (recall the implied existence of idiosyncratic error terms), which opens the noncausal path T---e_2→D_2→Y.
- 15.
The difference between Fig. 13.13a, b illustrates why identifying the magnitude of a causal effect is more difficult than testing the null of no effect. If one could condition on W_R in Fig. 13.13a, then the absence of an association between M and W_0 conditional on E and W_R would imply the absence of a causal effect M→W_0—the null can be tested. But if there is an effect M→W_0, as in Fig. 13.13b, then the observed association between M and W_0 given E and W_R is biased for the causal effect M→W_0—the magnitude of the effect cannot be measured.
- 16.
- 17.
Elwert and Christakis (2008) use additional knowledge of the network topology to gauge and remove the bias from residual confounding (i.e., if conditioning on H does not solve the problem).
- 18.
- 19.
DAGs for triadic networks would usually include separate variables for the characteristics of all three members of a generic triad. Obviously, the complexity of a DAG increases with the complexity of social structure. This is one reason why causal inference in social networks is a difficult problem.
- 20.
Here, we focus on causal effects of time-varying treatments that contrast predetermined treatment sequences. For two binary unit treatments, we can define six causal effects corresponding to the six pairwise contrasts between the four possible predetermined treatment sequences, here, (math, math), (math, English), (English, math), and (English, English). Note that some of these causal effects, such as (math, English) vs. (English, English), equal so-called controlled direct effects (Pearl 2001; Robins and Greenland 1992). The identification criteria discussed in this section apply to all causal effects of predetermined treatment sequences and hence to all controlled direct causal effects. See Bollen and Pearl (Chap. 15, this volume) and Wang and Sobel (Chap. 12, this volume) for mediation formulae and the identification of other types of (“natural” or “pure”) direct and indirect effects. See Robins and Richardson (2011) and Pearl (2012b) for graphical identification conditions of path-specific effects. See Robins and Hernán (2009) for yet other types of time-varying treatments, especially the distinction between static and dynamic time-varying treatment effects.
- 21.
Note that the joint causal effect of A_0 and A_1 is not the same as the total causal effect of A_0 plus the total causal effect of A_1, as is sometimes incorrectly thought.
- 22.
A minimally sufficient set is a sufficient set with the smallest number of variables. There may be multiple minimally sufficient sets.
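The endogenous selection bias described in Notes 6 and 7 can be demonstrated in a few lines. This is an illustrative sketch with hypothetical variables, not an example from the chapter: T and Y are independent causes of a collider C (T→C←Y), and selecting a subsample on C induces a spurious association between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# T and Y are marginally independent causes of the collider C (T -> C <- Y)
t = rng.normal(size=n)
y = rng.normal(size=n)
c = t + y + rng.normal(size=n)

# Marginally, T and Y are (nearly) uncorrelated, as the DAG implies
r_marginal = np.corrcoef(t, y)[0, 1]

# Conditioning on the collider -- here, keeping only the subsample with
# C > 0 -- induces a clearly negative T-Y association: endogenous selection bias
keep = c > 0
r_selected = np.corrcoef(t[keep], y[keep])[0, 1]
```

Selection on C > 0 mimics the common situation where only units with a large value of an outcome-affected variable enter the sample; in this setup the induced correlation is roughly −0.27 despite T and Y being causally unrelated.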
References
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital data. Biometrics Bulletin, 2(3), 47–53.
Blalock, H. M. (1964). Causal inferences in nonexperimental research. Chapel Hill: University of North Carolina Press.
Brito, C., & Pearl, J. (2002). Generalized instrumental variables. In A. Darwiche & N. Friedman (Eds.), Uncertainty in artificial intelligence, proceedings of the eighteenth conference (pp. 85–93). San Francisco: Morgan Kaufmann.
Brumback, B. A., Hernán, M. A., Haneuse, S. J. P. A., & Robins, J. M. (2004). Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Statistics in Medicine, 23, 749–767.
Burns, B. D., & Wieth, M. (2004). The collider principle in causal reasoning: Why the Monty Hall Dilemma is so hard. Journal of Experimental Psychology: General, 133(3), 434–449.
Chan, H., & Kuroki, M. (2010). Using descendants as instrumental variables for the identification of direct causal effects in linear SEMs. In Proceedings of the thirteenth international conference on Artificial Intelligence and Statistics (AISTATS-10) (pp. 73–80), Sardinia, Italy.
Cole, S. R., & Hernán, M. A. (2002). Fallibility in estimating direct effects (with discussion). International Journal of Epidemiology, 31, 163–165.
Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic.
Elwert, F., & Christakis, N. A. (2006). Widowhood and race. American Sociological Review, 71(1), 16–41.
Elwert, F., & Christakis, N. A. (2008). Wives and ex-wives: A new test for homogamy bias in the widowhood effect. Demography, 45(4), 851–873.
Elwert, F., & Winship, C. (2010). Effect heterogeneity and bias in main-effects-only regression models. In R. Dechter, H. Geffner, & J. Y. Halpern (Eds.), Heuristics, probability and causality: A tribute to Judea Pearl (pp. 327–336). London: College Publications.
Elwert, F., & Winship, C. (forthcoming). Endogenous selection bias: The dangers of conditioning on collider variables. Annual Review of Sociology.
Farr, W. (1858). Influence of marriage on the mortality of the French people. In G. W. Hastings (Ed.), Transactions of the national association for the promotion of social science (pp. 504–513). London: John W. Park & Son.
Finn, J. D., & Achilles, C. M. (1990). Answers and questions about class size. American Educational Research Journal, 27(3), 557–577.
Finn, J. D., Gerber, S. B., & Boyd-Zaharias, J. (2005). Small classes in the early grades, academic achievement, and graduating from high school. Journal of Educational Psychology, 97(2), 214–223.
Fowler, J. H., & Christakis, N. A. (2010). Cooperative behavior cascades in human social networks. PNAS: Proceedings of the National Academy of Sciences, 107(12), 5334–5338.
Galles, D., & Pearl, J. (1998). An axiomatic characterization of causal counterfactuals. Foundations of Science, 3(1), 151–182.
Glymour, M. M., & Greenland, S. (2008). Causal diagrams. In K. J. Rothman, S. Greenland, & T. Lash (Eds.), Modern epidemiology (3rd ed., pp. 183–209). Philadelphia: Lippincott.
Greenland, S. (2003). Quantifying biases in causal models: Classical confounding versus collider-stratification bias. Epidemiology, 14, 300–306.
Greenland, S. (2010). Overthrowing the tyranny of null hypotheses hidden in causal diagrams. In R. Dechter, H. Geffner, & J. Y. Halpern (Eds.), Heuristics, probability and causality: A tribute to Judea Pearl (pp. 365–382). London: College Publications.
Greenland, S., & Robins, J. M. (1986). Identifiability, exchangeability and epidemiological confounding. International Journal of Epidemiology, 15, 413–419.
Greenland, S., Pearl, J., & Robins, J. M. (1999a). Causal diagrams for epidemiologic research. Epidemiology, 10, 37–48.
Greenland, S., Robins, J. M., & Pearl, J. (1999b). Confounding and collapsibility in causal inference. Statistical Science, 14, 29–46.
Gronau, R. (1974). Wage comparisons-a selectivity bias. Journal of Political Economy, 82, 1119–1144.
Heckman, J. J. (1974). Shadow prices, market wages and labor supply. Econometrica, 42(4), 679–694.
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection, and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475–492.
Hernán, M. A., Hernández-Diaz, S., Werler, M. M., Robins, J. M., & Mitchell, A. A. (2002). Causal knowledge as a prerequisite of confounding evaluation: An application to birth defects epidemiology. American Journal of Epidemiology, 155(2), 176–184.
Hernán, M. A., Hernández-Diaz, S., & Robins, J. M. (2004). A structural approach to selection bias. Epidemiology, 15(5), 615–625.
Hernán, M. A., Clayton, D., & Keiding, N. (2011). The Simpson’s paradox unraveled. International Journal of Epidemiology, 40, 780–785.
Holland, P. W. (1986). Statistics and causal inference (with discussion). Journal of the American Statistical Association, 81, 945–970.
Holland, P. W. (1988). Causal inference, path analysis, and recursive structural equation models. Sociological Methodology, 18, 449–484.
Kim, J.H., & Pearl, J. (1983). A computational model for combined causal and diagnostic reasoning in inference systems. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (pp. 190–193). Karlsruhe.
Kyono, T. (2010). Commentator: A front-end user-interface module for graphical and structural equation modeling (Tech. Rep. (R-364)). UCLA Cognitive Systems Laboratory.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. Cambridge: Cambridge University Press.
Morgan, S. L., & Winship, C. (2012). Bringing context and variability back in to causal analysis. In H. Kincaid (Ed.), Oxford handbook of the philosophy of the social sciences. New York: Oxford University Press.
Neyman, J. ([1923] 1990). On the application of probability theory to agricultural experiments. Essay on principles, section 9, translated (with discussion). Statistical Science, 5(4), 465–480.
O’Malley, A. J., Elwert, F., Rosenquist, J. N., Zaslavsky, A. M., & Christakis, N. A. (2012). Estimating peer effects in longitudinal dyadic data using instrumental variables (Working Paper). Department of Health Care Policy, Harvard Medical School.
Pearl, J. (1985). Bayesian networks: A model of self-activated memory for evidential reasoning. In Proceedings, Cognitive Science Society (pp. 329–334). Irvine: University of California.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems. San Mateo: Morgan Kaufman.
Pearl, J. (1993). Comment: Graphical models, causality, and interventions. Statistical Science, 8(3), 266–269.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–710.
Pearl, J. (1998). Graphs, causality, and structural equation models. Sociological Methods and Research, 27(2), 226–284.
Pearl, J. (2001). Direct and indirect effects. In Proceedings of the seventeenth conference on Uncertainty in Artificial Intelligence (pp. 411–420). San Francisco: Morgan Kaufmann.
Pearl, J. ([2000] 2009). Causality: Models, reasoning, and inference (2nd ed.). Cambridge: Cambridge University Press.
Pearl, J. (2010). The foundations of causal inference. Sociological Methodology, 40, 75–149.
Pearl, J. (2012a). The causal foundations of structural equation modeling. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 68–91). New York: Guilford Press.
Pearl, J. (2012b). Interpretable conditions for identifying direct and indirect effects (Tech. Rep. (R-389)). UCLA Cognitive Systems Laboratory.
Pearl, J., & Robins, J. M. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In P. Besnard & S. Hanks (Eds.), Uncertainty in artificial intelligence 11 (pp. 444–453). San Francisco: Morgan Kaufmann.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with a sustained exposure period: Application to the health worker survivor effect. Mathematical Modeling, 7, 1393–1512.
Robins, J. M. (1989). The control of confounding by intermediate variables. Statistics in Medicine, 8, 679–701.
Robins, J. M. (1997). Causal inference from complex longitudinal data. In M. Berkane (Ed.), Latent variable modeling and applications to causality (Lecture notes in statistics 120, pp. 69–117). New York: Springer.
Robins, J. M. (1999). Association, causation, and marginal structural models. Synthese, 121, 151–179.
Robins, J. M. (2001). Data, design, and background knowledge in etiologic inference. Epidemiology, 12(3), 313–320.
Robins, J. M., & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3, 143–155.
Robins, J. M., & Hernán, M. A. (2009). Estimation of the causal effects of time-varying exposures. In G. Fitzmaurice et al. (Eds.), Handbooks of modern statistical methods: Longitudinal data analysis (pp. 553–599). Boca Raton: CRC Press.
Robins, J. M., & Richardson, T. (2011). Alternative graphical causal models and the identification of direct effects. In P. Shrout, K. Keyes, & K. Ornstein (Eds.), Causality and psychopathology: Finding the determinants of disorders and their cures (pp. 103–158). New York: Oxford University Press.
Robins, J. M., & Wasserman, L. (1999). On the impossibility of inferring causation from association without background knowledge. In C. N. Glymour & G. G. Cooper (Eds.), Computation, causation, and discovery (pp. 305–321). Cambridge: AAAI/MIT Press.
Rosenbaum, P. R. (1984). The consequences of adjustment for a concomitant variable that has been affected by the treatment. Journal of the Royal Statistical Society, Series A, 147(5), 656–666.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and non-randomized studies. Journal of Educational Psychology, 66, 688–701.
Rubin, D. B. (1980). Comment on ‘randomization analysis of experimental data in the fisher randomization test’ by Basu. Journal of the American Statistical Association, 75, 591–593.
Shalizi, C. R., & Thomas, A. C. (2011). Homophily and contagion are generically confounded in observational social network studies. Sociological Methods and Research, 40, 211–239.
Sharkey, P., & Elwert, F. (2011). The legacy of disadvantage: Multigenerational neighborhood effects on cognitive ability. The American Journal of Sociology, 116(6), 1934–1981.
Shpitser, I., & Pearl, J. (2006). Identification of conditional interventional distributions. In R. Dechter & T. S. Richardson (Eds.), Proceedings of the twenty-first national conference on Artificial Intelligence (pp. 437–444). Menlo Park: AAAI Press.
Shpitser, I., & Pearl, J. (2007). What counterfactuals can be tested. In Proceedings of the twenty-third conference on Uncertainty in Artificial Intelligence (UAI-07) (pp. 352–359). Corvallis: AUAI Press.
Shpitser, I., VanderWeele, T. J., & Robins, J. M. (2010). On the validity of covariate adjustment for estimating causal effects. In Proceedings of the 26th conference on Uncertainty and Artificial Intelligence (pp. 527–536). Corvallis: AUAI Press.
Shrier, I. (2009). Letter to the editor. Statistics in Medicine, 27, 2740–2741.
Sjölander, A. (2009). Letter to the editor: Propensity scores and M-structures. Statistics in Medicine, 28, 1416–1423.
Smith, H. L. (1990). Specification problems in experimental and nonexperimental social research. Sociological Methodology, 20, 59–91.
Sobel, M. E. (2008). Identification of causal parameters in randomized studies with mediating variables. Journal of Educational and Behavioral Statistics, 33(2), 230–251.
Spirtes, P., Glymour, C. N., & Scheines, R. ([1993] 2001). Causation, prediction, and search (2nd ed.). Cambridge, MA: MIT Press.
Textor, J., Hardt, J., & Knüppel, S. (2011). Letter to the editor: DAGitty: A graphical tool for analyzing causal diagrams. Epidemiology, 22(5), 745.
VanderWeele, T. J. (2009). On the distinction between interaction and effect modification. Epidemiology, 20, 863–871.
VanderWeele, T. J. (2011). Sensitivity analysis for contagion effects in social networks. Sociological Methods and Research, 40, 240–255.
VanderWeele, T. J., & Robins, J. M. (2007). Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology, 18(5), 561–568.
VanderWeele, T. J., & Robins, J. M. (2009). Minimal sufficient causation and directed acyclic graphs. The Annals of Statistics, 37, 1437–1465.
VanderWeele, T. J., & Shpitser, I. (2011). A new criterion for confounder selection. Biometrics, 67, 1406–1413.
Verma, T., & Pearl, J. (1988). Causal networks: Semantics and expressiveness. In Proceedings of the fourth workshop on Uncertainty in Artificial Intelligence (pp. 352–359). Minneapolis/Mountain View: AUAI Press.
Winship, C., & Harding, D. J. (2008). A mechanism-based approach to the identification of age-period-cohort models. Sociological Methods and Research, 36(3), 362–401.
Wodtke, G. T., Harding, D. J., & Elwert, F. (2011). Neighborhood effects in temporal perspective: The impact of long-term exposure to concentrated disadvantage on high school graduation. American Sociological Review, 76, 713–736.
Wooldridge, J. (2005). Violating ignorability of treatment by controlling for too many factors. Econometric Theory, 21, 1026–1028.
Wooldridge, J. (2006). Acknowledgement of related prior work. Econometric Theory, 22, 1177–1178.
Acknowledgments
I thank Stephen Morgan, Judea Pearl, Tyler VanderWeele, Xiaolu Wang, Christopher Winship, and my students in Soc 952 at the University of Wisconsin for discussions and advice. Janet Clear and Laurie Silverberg provided editorial assistance. All errors are mine.
Copyright information
© 2013 Springer Science+Business Media Dordrecht
Cite this chapter
Elwert, F. (2013). Graphical Causal Models. In: Morgan, S. (eds) Handbook of Causal Analysis for Social Research. Handbooks of Sociology and Social Research. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6094-3_13
Print ISBN: 978-94-007-6093-6
Online ISBN: 978-94-007-6094-3