
Use of Real-World EMR Data to Rapidly Evaluate Treatment Effects of Existing Drugs for Emerging Infectious Diseases: Remdesivir for COVID-19 Treatment as an Example

  • Case Studies and Practice Articles
Statistics in Biosciences

Abstract

For an emerging infectious disease such as coronavirus disease 2019 (COVID-19), no medication or treatment may be available at first, which can result in high morbidity and mortality within a short period of time. In this situation, it is urgent to quickly identify whether existing medications or treatments could be repurposed to treat the new disease before time-consuming randomized clinical trials (RCTs) can be completed and new drugs can be developed. For example, when SARS-CoV-2 appeared in late 2019, clinicians began using existing antiviral drugs, anti-inflammatory drugs, immune-based therapies, and other types of medications to treat COVID-19 patients before any data or evidence supported their use for the new disease. Most of these medications were later shown by more rigorous RCTs or secondary data analyses to be ineffective or only marginally effective for treating COVID-19 patients. We propose using real-world electronic medical record (EMR) data to develop real-time treatment evaluation and monitoring systems that can identify effective treatments, or avoid ineffective ones, for emerging diseases in the future. To do this, we must first address the challenges of processing and analyzing complex and noisy EMR data. In this paper, we outline these challenges and propose practical statistical methods and guidelines derived from a project evaluating the antiviral medication remdesivir for COVID-19 treatment using a local healthcare EMR database.
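The abstract does not spell out which estimator the authors use. Because treatment in routine care is not randomized (for example, sicker patients may be more likely to receive a drug), any EMR-based treatment evaluation has to adjust for confounding. The following is a minimal sketch of one common adjustment, an inverse-probability-weighted (IPW) risk difference built on a propensity score. It is not presented as the authors' method. The `Patient` record, the single `severity` confounder, the `make` helper and the toy cohort are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Patient:
    severity: str  # hypothetical baseline confounder, e.g. "mild" or "severe"
    treated: bool  # hypothetical exposure flag, e.g. drug given during admission
    died: bool     # hypothetical binary outcome, e.g. in-hospital death


def propensity_by_stratum(patients):
    """Estimate P(treated | severity) as the treated fraction in each stratum.

    With a single categorical confounder this is a saturated propensity model;
    real EMR analyses would fit a model over many covariates instead.
    """
    counts = defaultdict(lambda: [0, 0])  # severity -> [n_treated, n_total]
    for p in patients:
        counts[p.severity][0] += int(p.treated)
        counts[p.severity][1] += 1
    return {s: n_t / n for s, (n_t, n) in counts.items()}


def ipw_risk_difference(patients):
    """Normalized (Hajek) IPW estimate of risk(treated) - risk(untreated)."""
    ps = propensity_by_stratum(patients)
    # Positivity: every stratum must contain both treated and untreated patients.
    bad = sorted(s for s, e in ps.items() if e in (0.0, 1.0))
    if bad:
        raise ValueError(f"positivity violated in strata: {bad}")
    num1 = den1 = num0 = den0 = 0.0
    for p in patients:
        e = ps[p.severity]
        if p.treated:
            w = 1.0 / e  # weight treated patients by 1 / P(treated | x)
            num1 += w * p.died
            den1 += w
        else:
            w = 1.0 / (1.0 - e)  # weight untreated by 1 / P(untreated | x)
            num0 += w * p.died
            den0 += w
    return num1 / den1 - num0 / den0


def make(severity, treated, n, deaths):
    """Build n hypothetical patients in one cell, the first `deaths` of whom died."""
    return [Patient(severity, treated, i < deaths) for i in range(n)]


# Hypothetical toy cohort (not study data): severe patients are treated more often.
cohort = (make("mild", True, 2, 0) + make("mild", False, 8, 1)
          + make("severe", True, 8, 2) + make("severe", False, 2, 1))
print(round(ipw_risk_difference(cohort), 4))  # -0.1875
```

In this toy cohort, crude mortality is 20% in both arms. This happens because severe patients are both more likely to be treated and more likely to die. Reweighting by the propensity score recovers the severity-standardized difference of -0.1875. A real EMR analysis would also need a multivariable propensity model, covariate-balance checks and variance estimates (for example, a bootstrap).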


Figures: Fig. 1, Fig. 2, Fig. 3.



Author information

Corresponding author

Correspondence to Hulin Wu.

Ethics declarations

Conflict of interest

The authors do not have any financial interests that are directly or indirectly related to the work submitted for publication.

Supplementary Information

Electronic supplementary material: Supplementary file 1 (DOCX, 17160 KB).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, C., Nigo, M., Patel, S. et al. Use of Real-World EMR Data to Rapidly Evaluate Treatment Effects of Existing Drugs for Emerging Infectious Diseases: Remdesivir for COVID-19 Treatment as an Example. Stat Biosci (2024). https://doi.org/10.1007/s12561-023-09411-8


  • DOI: https://doi.org/10.1007/s12561-023-09411-8
