Causal Inference in Biostatistics

Handbook of Statistical Bioinformatics

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

This chapter considers the problems of causal inference in biostatistics. We briefly review the concepts of causal effect and causal discovery and then introduce the potential outcome approach for defining and assessing causal effects. We consider causal inference with data from randomized clinical trials and from the real world separately. For causal inference with clinical trial data, we focus on methods that address the problems of missing data and post-treatment variables. For causal inference with observational data, we provide a detailed discussion of methods that address the problems of measured and unmeasured confounding. Further, we briefly review current research topics in causal inference. Finally, we conclude the chapter with a list of software for estimating causal effects.

Acknowledgments

We thank Wenjie Hu and Peng Wu for their kind comments on combining RCTs and observational data. We thank Yuhao Deng for his suggestions on the relevant software and the figure illustrations, and especially for his helpful discussions on the draft of the chapter.

Author information

Corresponding author

Correspondence to Xiao-Hua Zhou.

Copyright information

© 2022 The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature

About this chapter

Cite this chapter

Han, S., Zhou, XH. (2022). Causal Inference in Biostatistics. In: Lu, H.HS., Schölkopf, B., Wells, M.T., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-65902-1_11
