Use of Real-World EMR Data to Rapidly Evaluate Treatment Effects of Existing Drugs for Emerging Infectious Diseases: Remdesivir for COVID-19 Treatment as an Example

Zhang, Chenguang; Nigo, Masayuki; Patel, Shivani; Yu, Duo; Septimus, Edward; Wu, Hulin

doi:10.1007/s12561-023-09411-8

Case Studies and Practice Articles
Published: 02 January 2024

Statistics in Biosciences

Chenguang Zhang¹, Masayuki Nigo²˒³, Shivani Patel⁴, Duo Yu⁵, Edward Septimus⁶˒⁷ & Hulin Wu¹ (ORCID: orcid.org/0000-0002-5809-5407)

Abstract

When an infectious disease such as coronavirus disease 2019 (COVID-19) first emerges, there may be no medication or treatment immediately available, which can result in high morbidity and mortality within a short period of time. In such cases, it is urgent to determine quickly whether existing medications or treatments can be repurposed for the new disease, before time-consuming randomized clinical trials (RCTs) can be completed and new drugs developed. For example, when SARS-CoV-2 appeared in late 2019, clinicians began using existing antiviral drugs, anti-inflammatory drugs, immune-based therapies, and other types of medications to treat COVID-19 patients before any data or evidence was available to support their use. Most of these medications were later shown, by more rigorous RCTs or secondary data analyses, to be ineffective or only marginally effective for treating COVID-19. We propose using real-world electronic medical record (EMR) data to develop real-time treatment evaluation and monitoring systems that can identify effective treatments, and help avoid ineffective ones, for future emerging diseases. Doing so first requires addressing the challenges of processing and analyzing complex, noisy EMR data. In this paper, we outline these challenges and propose practical statistical methods and guidelines, derived from a project that evaluated the antiviral medication remdesivir for COVID-19 treatment using a local healthcare EMR database.

Author information

Chenguang Zhang and Masayuki Nigo have contributed equally to this work.

Authors and Affiliations

Department of Biostatistics and Data Science, School of Public Health, University of Texas Health Science Center at Houston, 1200 Pressler Street, Suite E-833, Houston, TX, 77030, USA
Chenguang Zhang & Hulin Wu
Division of Infectious Diseases, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
Masayuki Nigo
School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
Masayuki Nigo
Memorial Hermann Health System, Houston, TX, USA
Shivani Patel
Division of Biostatistics, Institute for Health and Equity, Medical College of Wisconsin, Milwaukee, WI, USA
Duo Yu
Department of Data Science, Memorial Hermann Health System, Houston, TX, USA
Edward Septimus
Department of Population Medicine, Harvard Medical School, Boston, MA, USA
Edward Septimus

Corresponding author

Correspondence to Hulin Wu.

Ethics declarations

Conflict of interest

The authors do not have any financial interests that are directly or indirectly related to the work submitted for publication.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 17160 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, C., Nigo, M., Patel, S. et al. Use of Real-World EMR Data to Rapidly Evaluate Treatment Effects of Existing Drugs for Emerging Infectious Diseases: Remdesivir for COVID-19 Treatment as an Example. Stat Biosci (2024). https://doi.org/10.1007/s12561-023-09411-8

Received: 15 June 2023
Revised: 12 November 2023
Accepted: 18 November 2023
Published: 02 January 2024
DOI: https://doi.org/10.1007/s12561-023-09411-8
