Privacy-preserving estimation of an optimal individualized treatment rule: a case study in maximizing time to severe depression-related outcomes


Estimating individualized treatment rules, particularly in the context of right-censored outcomes, is challenging because the treatment effect heterogeneity of interest is often small and thus difficult to detect. While this motivates the use of very large datasets, such as those from multiple health systems or centres, data privacy may be a concern, with participating data centres reluctant to share individual-level data. In this case study on the treatment of depression, we demonstrate an application of distributed regression for privacy protection, used in combination with dynamic weighted survival modelling (DWSurv), to estimate an optimal individualized treatment rule whilst obscuring individual-level data. In simulations, we demonstrate the flexibility of this approach to address local treatment practices that may affect confounding, and show that DWSurv retains its double robustness even when performed through a (weighted) distributed regression approach. The work is motivated by, and illustrated with, an analysis of treatment for unipolar depression using the United Kingdom’s Clinical Practice Research Datalink.
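To make the idea of a (weighted) distributed regression concrete, the following is a minimal sketch and not the authors' implementation. It assumes a weighted least-squares fit in which each site shares only the aggregate matrices X'WX and X'Wy, and a coordinating centre sums them and solves for the coefficients. The function names (`site_summaries`, `combine`, `solve`), the toy data, the weights, and the design columns (intercept, covariate x, treatment a, interaction a·x) are all hypothetical; in practice the weights would be whatever balancing and censoring weights the chosen method prescribes.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def site_summaries(X: Sequence[Sequence[float]], y: Sequence[float],
                   w: Sequence[float]) -> Tuple[Matrix, Vector]:
    """Run locally at one site: return only the aggregates X'WX and X'Wy."""
    p = len(X[0])
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for row, yi, wi in zip(X, y, w):
        for j in range(p):
            xtwy[j] += wi * row[j] * yi
            for k in range(p):
                xtwx[j][k] += wi * row[j] * row[k]
    return xtwx, xtwy


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (M[i][n] - s) / M[i][i]
    return beta


def combine(summaries: Sequence[Tuple[Matrix, Vector]]) -> Vector:
    """Run by the coordinating centre: sum the site aggregates, then solve."""
    p = len(summaries[0][1])
    A = [[sum(s[0][j][k] for s in summaries) for k in range(p)]
         for j in range(p)]
    b = [sum(s[1][j] for s in summaries) for j in range(p)]
    return solve(A, b)


# Hypothetical toy data; design columns are [1, x, a, a*x].
# Outcomes follow y = 1 + 2x + 0.5a - 1.0*a*x exactly (no noise).
X_a = [[1, 0, 0, 0], [1, 1, 1, 1], [1, 2, 0, 0]]   # site A
y_a = [1.0, 2.5, 5.0]
w_a = [0.5, 1.0, 0.8]                               # illustrative weights
X_b = [[1, 1, 0, 0], [1, 3, 1, 3], [1, 2, 1, 2]]   # site B
y_b = [3.0, 4.5, 3.5]
w_b = [1.2, 0.7, 0.9]

# Only the aggregates leave each site; individual rows are never pooled.
beta = combine([site_summaries(X_a, y_a, w_a),
                site_summaries(X_b, y_b, w_b)])
print(beta)
```

Because the aggregates are sums over individuals, adding them across sites gives the same normal equations as a pooled weighted fit, so the coefficients match a pooled analysis up to floating-point error. Note that such aggregates can still leak information when a site contributes very few individuals, so this sketch illustrates the arithmetic only, not a complete privacy protocol.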




Author information


Corresponding author

Correspondence to Erica E. M. Moodie.

Ethics declarations

Conflict of interest

SMS has worked on grants awarded to Kaiser Permanente Washington Health Research Institute (KPWHRI) by Bristol Myers Squibb and by Pfizer. She was also a co-investigator on grants awarded to KPWHRI from Syneos Health, who represented a consortium of pharmaceutical companies carrying out FDA-mandated studies regarding the safety of extended-release opioids.

The study protocol was approved by the Independent Scientific Advisory Committee of the United Kingdom Clinical Practice Research Datalink (CPRD) (protocol number 19_017R) and the Research Ethics Committee of the Jewish General Hospital (Montréal, Québec, Canada).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

EEMM holds a Canada Research Chair (Tier 1) in Statistical Methods for Precision Medicine; she further acknowledges support from a Discovery Grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada and a chercheur de mérite career award from the Fonds de recherche du Québec–Santé. The contributions of JC and SMS to this work were supported by the National Institute of Mental Health of the National Institutes of Health, Award Number R01 MH114873. CD is supported by the Canadian Institutes of Health Research grant CIHR TD3-137716 and the Natural Sciences and Engineering Research Council of Canada grant NSERC 228203. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 3371 KB)

Supplementary file 2 (zip 7 KB)

About this article

Cite this article

Moodie, E.E.M., Coulombe, J., Danieli, C. et al. Privacy-preserving estimation of an optimal individualized treatment rule: a case study in maximizing time to severe depression-related outcomes. Lifetime Data Anal (2022).


Keywords

  • Data aggregation
  • Distributed regression
  • Dynamic weighted survival modelling
  • Effect modification
  • Precision medicine
  • Selective serotonin reuptake inhibitors

Mathematics Subject Classification

  • 92B15
  • 62P10