
Statistical Analysis—Meta-Analysis/Reproducibility

Clinical Applications of Artificial Intelligence in Real-World Data

Abstract

Federated learning has gained great popularity in the last decade for its ability to build models collaboratively on data held by multiple data partners. In real-world biomedical settings, however, practical challenges remain. These include the need to protect patient privacy, the ability to account for between-site heterogeneity in patient characteristics, and, from an operational point of view, the number of communication rounds required across data partners. In this chapter, we describe and give examples of multi-database data-sharing mechanisms in the healthcare data context and highlight the primary methods available for performing statistical regression analysis in each setting. For each method, we discuss the advantages and disadvantages in terms of data privacy, communication efficiency, heterogeneity awareness, and statistical accuracy. Our goal is to give researchers the insight needed to choose among the available algorithms when conducting regression analysis with multi-site data in a given setting.
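As an illustration of the simplest aggregate-data approach implied by the chapter title, consider meta-analysis of site-level regression estimates. Each site fits its model locally and shares only a coefficient and its standard error, so one round of communication suffices and no patient-level data leave the site. The sketch below is our own illustration, not the chapter's algorithm. It assumes each site reports a single coefficient (for example, a log-odds ratio) with its standard error. It contrasts fixed-effect inverse-variance pooling with the DerSimonian-Laird random-effects estimator, which adds a between-site variance τ² to account for heterogeneity. The function names are hypothetical.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect meta-analysis: weight each site's estimate by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def dersimonian_laird(estimates, std_errors):
    """Random-effects meta-analysis with the DerSimonian-Laird estimate of tau^2."""
    k = len(estimates)
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed, _ = inverse_variance_pool(estimates, std_errors)
    # Cochran's Q: weighted dispersion of site estimates around the fixed-effect mean.
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, estimates))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    # Between-site variance, truncated at zero (c is 0 when only one site reports).
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Re-weight using total variance = within-site SE^2 + between-site tau^2.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total = sum(re_weights)
    pooled = sum(w * b for w, b in zip(re_weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total), tau2


if __name__ == "__main__":
    # Hypothetical site-level coefficients and standard errors from three sites.
    site_betas = [0.1, 0.3, 0.5]
    site_ses = [0.1, 0.1, 0.1]
    print(inverse_variance_pool(site_betas, site_ses))
    print(dersimonian_laird(site_betas, site_ses))
```

When all standard errors are equal, both estimators return the simple mean of the site estimates. The random-effects standard error is wider whenever the sites disagree more than their within-site errors would predict. Aggregate-data approaches of this kind generally do not reproduce a pooled individual-level analysis exactly, which is part of the accuracy trade-off the chapter discusses.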



Author information

Corresponding author: Yong Chen.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Edmondson, M.J., Luo, C., Chen, Y. (2023). Statistical Analysis—Meta-Analysis/Reproducibility. In: Asselbergs, F.W., Denaxas, S., Oberski, D.L., Moore, J.H. (eds) Clinical Applications of Artificial Intelligence in Real-World Data. Springer, Cham. https://doi.org/10.1007/978-3-031-36678-9_8

  • DOI: https://doi.org/10.1007/978-3-031-36678-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36677-2

  • Online ISBN: 978-3-031-36678-9

  • eBook Packages: Medicine; Medicine (R0)
