Abstract
Estimating individualized treatment rules, particularly in the context of right-censored outcomes, is challenging because the treatment effect heterogeneity of interest is often small and thus difficult to detect. While this motivates the use of very large datasets, such as those from multiple health systems or centres, data privacy may be a concern, with participating data centres reluctant to share individual-level data. In this case study on the treatment of depression, we demonstrate an application of distributed regression for privacy protection, used in combination with dynamic weighted survival modelling (DWSurv), to estimate an optimal individualized treatment rule whilst obscuring individual-level data. In simulations, we demonstrate the flexibility of this approach to address local treatment practices that may affect confounding, and show that DWSurv retains its double robustness even when performed through a (weighted) distributed regression approach. The work is motivated by, and illustrated with, an analysis of treatment for unipolar depression using the United Kingdom’s Clinical Practice Research Datalink.
Ethics declarations
Conflict of interest
SMS has worked on grants awarded to Kaiser Permanente Washington Health Research Institute (KPWHRI) by Bristol Myers Squibb and by Pfizer. She was also a co-investigator on grants awarded to KPWHRI from Syneos Health, who represented a consortium of pharmaceutical companies carrying out FDA-mandated studies regarding the safety of extended-release opioids. The study protocol was approved by the Independent Scientific Advisory Committee of the United Kingdom Clinical Practice Research Datalink (CPRD) (protocol number 19_017R) and the Research Ethics Committee of the Jewish General Hospital (Montréal, Québec, Canada).
Additional information
EEMM holds a Canada Research Chair (Tier 1) in Statistical Methods for Precision Medicine; she further acknowledges support from a Discovery Grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada and a chercheur de mérite career award from the Fonds de recherche du Québec–Santé. The contributions of JC and SMS to this work were supported by the National Institute of Mental Health of the National Institutes of Health, Award Number R01 MH114873. CD is supported by the Canadian Institutes of Health Research grant CIHR TD3-137716 and the Natural Sciences and Engineering Research Council of Canada grant NSERC 228203. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
About this article
Cite this article
Moodie, E.E.M., Coulombe, J., Danieli, C. et al. Privacy-preserving estimation of an optimal individualized treatment rule: a case study in maximizing time to severe depression-related outcomes. Lifetime Data Anal 28, 512–542 (2022). https://doi.org/10.1007/s10985-022-09554-8
DOI: https://doi.org/10.1007/s10985-022-09554-8
Keywords
- Data aggregation
- Distributed regression
- Dynamic weighted survival modelling
- Effect modification
- Precision medicine
- Selective serotonin reuptake inhibitors