BACKGROUND

Health organizations require rapid, reliable access to data on patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). On April 1, 2020, the Centers for Disease Control and Prevention (CDC) introduced a new International Classification of Diseases, 10th revision (ICD-10) code, U07.1 (COVID-19, virus identified),1 together with specific coding guidance regarding its appropriate use.2 Government agencies, including the Centers for Medicare and Medicaid Services, use ICD-10 data to ascertain COVID-19 hospitalizations,3 though little is known about the reliability of the U07.1 code in identifying disease. We sought to determine the performance characteristics of the ICD-10 code U07.1 for identification of COVID-19 illness in a large multicenter health system.

METHODS

We identified all inpatient encounters during which ≥1 SARS-CoV-2 reverse transcription polymerase chain reaction (RT-PCR) test was performed from April 1 to July 31, 2020, across the Mass General Brigham health system. Patients with ≥1 positive SARS-CoV-2 RT-PCR result were classified as COVID-19 positive. The agreement between COVID-19 positivity and primary or secondary ICD-10 coding of U07.1 was determined. Performance characteristics (sensitivity, specificity, positive predictive value [PPV], negative predictive value [NPV]) for the ICD-10 code U07.1 were reported overall and across major subgroups. Among COVID-19-positive patients, we performed multivariable logistic regression to identify independent predictors of corresponding ICD-10 coding for COVID-19 (sensitivity). The Mass General Brigham Institutional Review Board approved the study protocol. Data management and analysis were performed using Stata (College Station, TX).

RESULTS

There were 22,633 patient encounters with a discharge date between April 1, 2020, and July 31, 2020, in which ≥1 SARS-CoV-2 RT-PCR test was obtained during admission. Overall, 66.7%, 25.7%, and 7.7% of encounters had 1, 2, and >2 SARS-CoV-2 RT-PCR test(s) performed, respectively (range 1 to 16). Among these encounters, 2210 (9.8%) were determined to be COVID-19 positive. COVID-19 test–positive patients were older (64±18 vs. 60±19 years) and more likely to be men (51.8% vs. 44.3%), Hispanic (22.3% vs. 8.3%), and Black (21.6% vs. 10.4%) as compared with those with all negative RT-PCR results (P<0.001 for all). ICD-10 diagnostic code U07.1 was coded in 1208 (5.3%) encounters. U07.1 had an overall sensitivity of 49.2% (95% confidence interval [CI]: 47.1–51.3%), specificity of 99.4% (95% CI: 99.3–99.5%), PPV of 90.0% (95% CI: 88.2–91.6%), and NPV of 94.8% (95% CI: 94.5–95.0%). Specificity remained high over time (98.4% in April, 99.0% in May, 99.7% in June, and 99.9% in July 2020), while sensitivity varied and was lowest in July 2020 (27.9%). Sensitivity was lower among those aged 0–17 years (14.3%, 95% CI: 1.2–70.1%), although this subset was limited by few encounters (n=175). Similar performance of U07.1 coding was observed across all other major subgroups (Table 1). Among COVID-19 test–positive patients, earlier months in the pandemic were the only significant independent predictors of higher sensitivity of the ICD-10 diagnostic code U07.1 (Fig. 1).
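As an illustration of how the overall performance characteristics follow from a 2×2 table, the sketch below reconstructs approximate cell counts from the totals reported above (22,633 encounters, 2210 RT-PCR positive, 1208 coded U07.1) and the reported sensitivity. The cell counts are back-calculated and were not reported directly, so they should be read as an assumption. The Wilson score interval is likewise an assumption, since the interval method used in the analysis is not stated; the function names are hypothetical.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),  # coded U07.1 among RT-PCR positives
        "specificity": tn / (tn + fp),  # not coded among RT-PCR negatives
        "ppv": tp / (tp + fp),          # RT-PCR positive among coded
        "npv": tn / (tn + fn),          # RT-PCR negative among not coded
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Totals reported in the Results.
total_encounters = 22633
pcr_positive = 2210
coded_u071 = 1208

# ASSUMPTION: true positives back-calculated from the reported 49.2% sensitivity.
tp = round(0.492 * pcr_positive)
fn = pcr_positive - tp
fp = coded_u071 - tp
tn = total_encounters - pcr_positive - fp

metrics = diagnostic_metrics(tp, fp, fn, tn)
sens_ci = wilson_interval(tp, tp + fn)

for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
print(f"sensitivity 95% CI: {sens_ci[0] * 100:.1f}-{sens_ci[1] * 100:.1f}%")
```

Under these assumptions, the reconstructed counts give values that agree with the reported overall estimates to one decimal place, which suggests the reported totals and percentages are internally consistent.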

Table 1 Performance Characteristics of ICD-10 Code for COVID-19 as Compared with SARS-CoV-2 RT-PCR Positivity
Figure 1 Independent predictors of ICD-10 sensitivity among encounters with ≥1 positive SARS-CoV-2 RT-PCR test. CI, confidence interval; ICD-10, International Classification of Diseases, 10th revision; ICU, intensive care unit; OR, odds ratio; SARS-CoV-2 RT-PCR, severe acute respiratory syndrome coronavirus 2 reverse transcription polymerase chain reaction.

DISCUSSION

Uniform administrative coding for research, disease tracking, and quality improvement is appealing given its widespread use and ease of interoperability across health systems. Reliance on these administrative data will likely remain important for identifying prior COVID-19 disease, particularly given expanding interest in legacy effects and post-acute sequelae of SARS-CoV-2 infection (PASC). However, ICD-10 codes in other clinical settings4 and for COVID-19-related symptoms5 are known to be subject to misclassification.

We found that the sensitivity of U07.1 coding among hospitalized patients undergoing SARS-CoV-2 RT-PCR testing was modest, while specificity was high and approached 100% over time. Lack of initial awareness of or familiarity with ICD-10 coding for COVID-19, in addition to distinctions between test positivity and clinical disease, may account for the lower sensitivity. Lags in coding after hospital discharge and shifts in routine testing practices among hospitalized patients may also partially explain the variable sensitivity. The robust specificity of ICD-10 U07.1 coding suggests that claims-based analyses may accurately capture patients with true COVID-19 disease. Epidemiological evaluations relying solely on U07.1 coding, however, may underestimate true disease burden. We acknowledge that SARS-CoV-2 RT-PCR is an imperfect “gold standard” and has itself had variable reported performance.6 We did not have access to corroborating information from other microbiological assessments or clinical presentations. Until higher-fidelity testing is available, these data from a large integrated health system inform the alignment between SARS-CoV-2 RT-PCR testing and ICD-10 coding for COVID-19.

Access to Data

The first and corresponding authors, Drs. Ankeet S. Bhatt and Muthiah Vaduganathan, had full access to the study data and take responsibility for the integrity and accuracy of its analysis.