Introduction

COVID-19, the illness caused by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), arose as a global pandemic in 2020 and continued into 2021. Outcomes of the disease range widely from no symptoms to death, and result in high demand on health care providers, hospitals and resources, such as emergency care services, inpatient beds, intensive care beds and complex life-saving equipment. Correspondingly, the median costs associated with treated COVID-19 cases were estimated by Bartsch in 2020 to range from $3,994 for mild symptomatology to $18,579 for hospitalized cases [1]. A later study by Tsai et al. used Medicare claims and found higher costs for hospitalization, with a mean of $21,752, rising to $49,441 if mechanical ventilation was required [2]. Several studies have shown that patient characteristics such as age, gender, and comorbidities affect both the risk of infection and the severity of illness [3, 4, 5].

The World Health Organization (WHO), working with the International Forum for Acute Care Trialists and the International Severe Acute Respiratory and Emerging Infections Consortium, developed a WHO Clinical Progression Scale to measure the viral burden of COVID-19 and to assess the patient trajectory and resources used over the course of COVID-19 [6]. Other scoring instruments have been developed for emergency department triage or prediction of mortality [7]. These tools are used primarily to identify patient risk in order to optimize inpatient management.

To our knowledge, no tool exists to date that assigns a severity level to an episode of COVID-19 for purposes other than patient management. We propose a reliable and reproducible method that allows researchers to evaluate whole episodes of illness rather than acute hospitalizations alone, so that analyses can be conducted on the processes of treatment intervention, treatment effectiveness, treatment efficiencies, and health care costs and outcomes related to COVID-19.

Healthcare claims data are available through payor sources such as health insurance carriers, Medicare and Medicaid, as well as from companies that aggregate, de-identify and license use of large administrative claims databases. Health claims data provide valuable information on insured persons across time regardless of provider or provider group. Specific diagnoses and procedures are documented for services and exist in the data both historically and linked to treatment events by date. Thus, claims data allow for identification of specific individuals or cohorts who meet designated medical or demographic criteria, allowing the researcher to evaluate co-morbidities, utilization patterns, and costs of services, to define episodes of care, and to measure outcomes at both the individual and the population level. Claims data research has been used effectively for policy analyses and to inform population health initiatives.

Claims data are, however, subject to a time lag: the provider's submission of a claim for reimbursement follows the actual service, and the adjudication of the claim by the payor adds a further delay. Typically, claims data are added to the database when the claim is processed and paid by the carrier, so they are often incomplete until 90 days after submission. As the COVID-19 pandemic began in the United States in early 2020 and continues to date, the full year of 2020 claims was determined to be available for analysis in April 2021.

Methods

The database used for the development of this scale was Optum’s Clinformatics® Data Mart (CDM), which is derived from administrative health claims for members of large commercial and Medicare Advantage health plans (Optum® de-identified COVID-19 Electronic Health Record dataset, 2007–2020). The database includes approximately 19 million annual covered lives, for a total of over 65 million unique lives over a 14-year period (1/2007 through 12/2020). Clinformatics® Data Mart is statistically de-identified under the Expert Determination method consistent with HIPAA and managed according to Optum® customer data use agreements. CDM administrative claims submitted for payment by providers and pharmacies are verified, adjudicated and de-identified prior to inclusion. These data, including patient-level enrollment information, are derived from claims submitted for all medical and pharmacy health care services, with information related to health care costs and resource utilization. The population is geographically diverse, spanning all 50 states.

Results

The COVID-19 scale was applied to the national claims data in the Optum CDM for all medical claims in 2020. Of the 19,761,754 total unique persons with enrollment information in the database in 2020, 692,094 (3.5%) met the criteria for one of the severity levels based on diagnosis codes. The age distribution was as expected with infection rates generally rising with increasing age (Table 1).

Table 1 Percentage of persons in each age group with a COVID-19 diagnosis

As shown in Table 2, over half of all patients (60%) fell into Severity Level 2: a confirmed diagnosis of COVID-19 but asymptomatic and ambulatory. Another 14% remained ambulatory with documented symptoms (Level 3), so that 74% of diagnosed cases did not require a higher level of care. 12% utilized the emergency department (Level 4) but did not require admission, and the remaining 12% (Levels 5–9) were hospitalized at various levels of severity. Slightly more than 2% died during hospitalization (Level 9). The rates for each level varied considerably by age group, with the older ages reaching higher severity levels (p < 0.001). Other demographic factors such as race and ethnicity, geographic region, and comorbidity count had statistically significant associations with severity level of COVID-19 (Table 2).

Table 2 Distribution of severity by age, gender, race, region, comorbidities and cost

Costs also varied significantly by severity level, an expected finding because the levels reflect intensity of resource use, which drives costs. Table 3 presents the mean and median costs per person in relation to severity level. Patients who reached severity level 8 and survived incurred the highest costs related to COVID-19, at a median of $197,007. The highest severity level, 9, had lower median costs than the level below it, which is likely explained by death in hospital resulting in shorter lengths of stay. In Table 4, the gamma regression analysis shows how expected cost changes at each level of a predictor relative to its reference level while holding all other variables constant. The regression shows a significant relationship between cost and severity, in that more severe cases predicted higher costs while controlling for the other factors. The most pronounced difference was found between severity levels 3 and 4: severity level 4 had exp(1.86) = 6.42 times the cost of level 3. Similar patterns were observed at higher levels, such as 6.05 times the cost at level 5 compared with level 4, and 2.53 times the cost at level 8 compared with level 7.
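The conversion from a regression coefficient to a cost ratio quoted above can be illustrated with a short sketch. It assumes the gamma model used a log link, which is implied by the exponentiation in the text but not stated explicitly. The function names are hypothetical, and the coefficients back-computed from the ratios 6.05 and 2.53 are implied values, not figures quoted from Table 4.

```python
import math


def cost_ratio(coef: float) -> float:
    """Convert a log-scale regression coefficient into a multiplicative cost ratio."""
    return math.exp(coef)


def implied_coefficient(ratio: float) -> float:
    """Back-compute the log-scale coefficient implied by a reported cost ratio."""
    return math.log(ratio)


# Coefficient quoted in the text for level 4 versus level 3.
print(round(cost_ratio(1.86), 2))  # 6.42

# Ratios quoted in the text; the coefficients printed here are implied
# by those ratios and are NOT taken from Table 4.
for label, ratio in [("level 5 vs 4", 6.05), ("level 8 vs 7", 2.53)]:
    print(label, round(implied_coefficient(ratio), 2))
```

Because each coefficient compares a level with the one directly below it, the ratios multiply along the scale rather than add.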

Table 3 Mean and Median cost of care by severity level for patients with COVID-19
Table 4 Generalized Gamma Regression of Cost and Severity

Discussion

This study relied on claims data to identify persons with confirmed COVID-19. Confirmation of COVID-19 was determined by a diagnosis code on a claim submitted by a provider for medical or pharmacy services. The prevalence of confirmed COVID-19 cases in the claims data, at 3.5%, was lower than generally reported for the population. Sen et al. reported that 33% of the US population had been infected by the end of 2020, yet only 11.8% of those infections were documented, providing a comparative estimate of 3.9% of the population with documented infection [8]. The lower rate may be due to several factors, including (1) testing not recorded with a health care claim, (2) diagnosis not assigned on the testing claim, (3) cases confirmed through non-billed sources, such as public health agencies, resulting in undercounting due to care received without a related bill for service [9], and (4) selection bias in studying only persons with commercial health insurance [9].
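The two prevalence figures compared in this paragraph can be checked with simple arithmetic. The first ratio uses the person counts reported in the Results. The second assumes that the 11.8% reported by Sen et al. is the share of infections that were documented.

```latex
\[
\frac{692{,}094}{19{,}761{,}754} \approx 0.035 = 3.5\%,
\qquad
0.33 \times 0.118 \approx 0.039 = 3.9\%.
\]
```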

The COVID-19 pandemic impacted the health care system through excessive demands on resources such as intensive care facilities, over-taxed hospital capacity, and shortages of health care providers, all of which may have influenced the progression of an individual’s disease severity. These factors could not be controlled for in the model. An additional limitation related to high-cost claimants whose claims were allocated to a stop-loss account once an annual limit was reached, after which subsequent charges were not reported. This may have resulted in observed costs that were lower than the actual costs incurred. However, the number of affected individuals was small, and any resulting bias in cost estimates was likely minor.

The recorded average costs are consistent with the studies by Bartsch and Tsai, with ambulatory patients incurring costs of less than $3,000 and costs increasing for hospitalized cases, with wide variation around the overall means of $21,752 to $49,441 reported for Medicare patients [1, 2]. What has not been demonstrated previously is the extent to which health care costs are driven by the most severely affected patients. Because the present index was created based on the intensity of medical interventions, which is directly related to costs, an association between severity and cost was expected. What was not appreciated prior to this evaluation was the exponential shape of the cost curve, with the small number of patients receiving the most intensive care driving overall costs.

We believe that the COVID-19 scale would be useful for further research on both clinical and financial impacts of the disease. Severity Level 1 has limited utility for a cost analysis because it represented a small number of patients with limited information. However, the authors believe that this level may have potential value in other contexts, such as the exploration of long-term outcomes (i.e. “post-acute COVID”) and its impact on comorbid conditions.

The authors present this scale for application by researchers who use claims data to evaluate the impact of the COVID-19 pandemic on individuals, populations, and policy. Standardization of a measure of severity would allow easier comparison of results across studies and facilitate a determination of reproducibility of findings in various settings and populations. The scale can be implemented in any claims-based dataset, such as those maintained by health plans, researchers, and health systems with claims-based records. It is relevant across age groups, sexes, and payor groups (e.g. Medicare, Medicaid, and commercial). Future COVID-19 research will likely include analyses of the severity of COVID-19 events and the impact on continuing symptoms or complications. Additionally, over time, the value of Level 1 may increase as COVID-19 cases are less frequently documented by a provider and self-report increases. Finally, future use of the severity scale may benefit from the addition of information on vaccination history, for which various codes have been created.

To build the logic for the claims-based COVID-19 Severity Scale (referred to as the COVID-19 Scale), we modeled it upon the design of the WHO Clinical Progression Scale. For hospitalized patients, the WHO scale relied on clinical values documented in medical records, with a special focus on oxygen levels (FiO2 and pO2) and use of mechanical ventilation, renal dialysis and extracorporeal membrane oxygenation (ECMO) as key measures for patients at the highest levels of severity [6]. From data measured over the course of treatment, the WHO scale scored patient progression from 0 to 10 as follows: Score 0: uninfected; Scores 1, 2, and 3: ambulatory, mild disease; Scores 4 or 5: hospitalized, moderate disease; Scores 6, 7, 8, or 9: hospitalized, severe disease; and Score 10: dead.

Since documentation of oxygen levels (FiO2 and pO2) is not available in claims data, we modeled endpoints similar to those used in the WHO scale, using information that is routinely available within retrospective claims data. The modified measure is intended to be used as an index of episode severity as opposed to treatment progression. Endpoints commonly used included symptoms, respiratory status, progression to levels of treatment (ambulatory, emergency department, inpatient admission), and mortality. For patients who required oxygen therapy, we relied upon documentation of the use of various levels of respiratory treatments, with specific focus on mechanical ventilation. Renal dialysis and ECMO are well documented in claims data and were incorporated for patients who required these additional treatments. Death is not always well documented in claims data unless it occurs in an inpatient setting, for which discharge status is coded as “expired.”

The strategy for identification of cases between January 1, 2020 and April 2020 before an official ICD-10 diagnosis code was issued for COVID-19 relied upon the February 2020 guidance from the Centers for Disease Control and Prevention (CDC) that combined codes for respiratory infections with code B97.29 – other coronavirus [10]. Another challenge was evident in that many persons during the pandemic developed a presumptive COVID-19 infection without a confirming diagnosis or a medical claim submitted by a provider for detection or treatment. Additionally, individuals began to experience sequelae of COVID-19 without an earlier diagnosis in the claims data. For these cases, the CDC published additional guidance with ICD-10 codes for “personal history of COVID-19” or “sequelae of COVID-19” [11]. The publication of the ICD-10 code for COVID-19 in April 2020 allowed confirmed cases to be documented in claims data. The coding logic is detailed in Appendix A and includes the ICD-10 codes used in the COVID-19 Scale.
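The date-dependent case identification described above can be sketched as follows. This is a minimal illustration and not the coding logic of Appendix A. It assumes that U07.1 is the COVID-19 diagnosis code and that it was usable from April 1, 2020 (the text says only "April 2020"). The respiratory-infection codes listed are illustrative examples of codes paired with B97.29 in the early CDC guidance, and the function name is hypothetical.

```python
from datetime import date
from typing import Iterable

# Assumption: COVID-19 code treated as available from April 1, 2020.
COVID_CODE = "U07.1"
COVID_CODE_EFFECTIVE = date(2020, 4, 1)

OTHER_CORONAVIRUS = "B97.29"
# Illustrative respiratory-infection codes only; Appendix A holds the actual list.
RESPIRATORY_CODES = {"J12.89", "J20.8", "J22", "J40", "J80"}


def is_covid_claim(service_date: date, dx_codes: Iterable[str]) -> bool:
    """Flag a claim as COVID-19 using the date-dependent rule sketched here."""
    codes = set(dx_codes)
    if COVID_CODE in codes:
        return True
    if service_date < COVID_CODE_EFFECTIVE:
        # Early-2020 rule: B97.29 combined with a respiratory-infection code.
        return OTHER_CORONAVIRUS in codes and bool(codes & RESPIRATORY_CODES)
    return False


print(is_covid_claim(date(2020, 3, 10), ["J12.89", "B97.29"]))  # True
print(is_covid_claim(date(2020, 6, 10), ["J12.89", "B97.29"]))  # False
```

Codes for personal history or sequelae of COVID-19 are deliberately left out of this sketch because they feed Level 1 rather than confirmed cases.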

The original intent of the scale was to identify persons who had COVID-19 at any time during 2020, and to assign the highest severity level experienced by each person. If a person had more than one documented episode of COVID-19, the highest severity level was assigned to that person so that person-based research would capture the most debilitating state attained.
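The person-level rule described above, keeping the highest level across all of a person's episodes, can be sketched in a few lines. The input layout (pairs of person identifier and severity level) and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def highest_severity(episodes: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Return the maximum severity level observed for each person."""
    best: Dict[str, int] = {}
    for person_id, level in episodes:
        if level > best.get(person_id, 0):
            best[person_id] = level
    return best


# Hypothetical episodes: person "A" had two episodes in the year.
episodes = [("A", 3), ("B", 2), ("A", 6)]
print(highest_severity(episodes))  # {'A': 6, 'B': 2}
```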

Like the WHO Progression Scale, we tiered the COVID-19 Scale by ordinal levels according to symptomatology, resource use, and mortality. The individual ranks are clearly defined, mutually exclusive, and ordered in a hierarchical progression reflecting clinical deterioration [11]. Levels 1–3 had no documentation of presentation to or treatment at an emergency department or an inpatient facility, yet are differentiated by documentation of diagnosis and symptoms (Level 1: no documented diagnosis but personal report of COVID-19, Level 2: diagnosis code but no symptom code, Level 3: diagnosis code and symptom code(s)). Because the scale was used initially to assess “subacute COVID” and other individual health impacts, we defined Level 1 to represent a personal history of COVID as documented in the claims without confirmatory diagnostic evidence. Less than 1% of persons met the criteria for Level 1 (no confirmatory diagnosis yet a personal report of COVID-19), and because a diagnosis did not exist in the data for them, this level was excluded from further analysis.

The other ambulatory-only levels (Levels 2 and 3) were delineated by the existence of symptoms. Level 4 reflected treatment for COVID-19 at an emergency department without inpatient admission. Levels 5–9 all required an inpatient admission with progressively increasing levels of resource use or procedures reflecting respiratory distress or organ failure (Level 5: Hospital admit no oxygen, Level 6: Hospital admit with non-invasive oxygen, Level 7: Hospital admit with mechanical ventilation, Level 8: Hospital admit with mechanical ventilation and renal dialysis or ECMO). Level 9 indicated death during the hospital treatment. See Appendix B for detailed Level definitions.
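The hierarchy of Levels 1–9 can be expressed as a simple decision cascade, sketched below. This is not the Appendix B logic. The field names are hypothetical, every flag is assumed to refer to COVID-19-related claims only, and Level 8 is read as mechanical ventilation together with either renal dialysis or ECMO, which is one interpretation of the definition in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Episode:
    # All fields are hypothetical flags derived from COVID-19-related claims.
    personal_history_only: bool = False  # reported history, no diagnosis code
    has_diagnosis: bool = False
    has_symptoms: bool = False
    ed_visit: bool = False
    inpatient: bool = False
    noninvasive_oxygen: bool = False
    mechanical_ventilation: bool = False
    dialysis_or_ecmo: bool = False
    died_in_hospital: bool = False


def severity_level(ep: Episode) -> Optional[int]:
    """Assign the highest applicable level, checking the most severe first."""
    if ep.inpatient:
        if ep.died_in_hospital:
            return 9
        if ep.mechanical_ventilation and ep.dialysis_or_ecmo:
            return 8
        if ep.mechanical_ventilation:
            return 7
        if ep.noninvasive_oxygen:
            return 6
        return 5
    if ep.ed_visit:
        return 4
    if ep.has_diagnosis:
        return 3 if ep.has_symptoms else 2
    if ep.personal_history_only:
        return 1
    return None  # no evidence of COVID-19 in the claims


print(severity_level(Episode(has_diagnosis=True)))  # 2
print(severity_level(Episode(has_diagnosis=True, inpatient=True,
                             mechanical_ventilation=True)))  # 7
```

Checking the most severe condition first keeps the levels mutually exclusive, since an episode that qualifies for several levels is assigned only the highest.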

Costs were computed for each patient with COVID-19 in 2020, considering only the costs associated with claims that included a COVID-19 diagnosis. Total costs were based upon both hospital/facility and professional charges. Approximately 1% of the claims in the database had the charges and paid amounts recorded as “$0” or “$0.01”, and it was determined that these were claims that exceeded an annual individual stop-loss amount for that member. In these cases, the commercial health plan reallocated the claim to a stop-loss policy and the excess amount is shown as “0”, but all other claim details remain in the data. These cases were excluded from the cost analysis because the total cost could not be computed.
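The cost aggregation and stop-loss exclusion described above might look like the following sketch. The claim layout (person identifier, COVID-19 diagnosis flag, charge) is hypothetical. The sketch also assumes that the exclusion applies at the person level, so that anyone with a placeholder COVID-19 claim is dropped; the text does not say whether persons or individual claims were excluded.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Placeholder amounts that mark claims reallocated to a stop-loss policy.
STOP_LOSS_PLACEHOLDERS = (0.0, 0.01)

Claim = Tuple[str, bool, float]  # (person_id, has_covid_dx, charge)


def covid_cost_per_person(claims: Iterable[Claim]) -> Dict[str, float]:
    """Sum charges on COVID-19 claims per person, dropping stop-loss cases."""
    totals: Dict[str, float] = defaultdict(float)
    excluded = set()
    for person_id, has_covid_dx, charge in claims:
        if not has_covid_dx:
            continue  # only claims carrying a COVID-19 diagnosis count
        if round(charge, 2) in STOP_LOSS_PLACEHOLDERS:
            excluded.add(person_id)  # assumption: exclude the whole person
        else:
            totals[person_id] += charge
    return {p: t for p, t in totals.items() if p not in excluded}


claims = [
    ("A", True, 100.0), ("A", True, 250.5), ("A", False, 999.0),
    ("B", True, 500.0), ("B", True, 0.01),
]
print(covid_cost_per_person(claims))  # {'A': 350.5}
```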

A generalized gamma regression analysis was used because the univariate relationship between severity level and cost was non-linear. Severity levels were backward difference coded so that the regression compared each level with the level directly below it. A sensitivity analysis was done on only those identified as having COVID-19 after April 2020, when the COVID-19 ICD-10 code was developed. Results were nearly identical to those of the original regression on the complete population, supporting the robustness of the measure (Appendix C). This analysis was performed using SAS software, Version 9.4 of the SAS System [12]. This study was reviewed and approved by the Institutional Review Board of the University of Texas Health Science Center at Houston, and all methods were performed in accordance with the relevant guidelines and regulations.
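Backward difference coding, as used for the severity levels, can be illustrated with a small sketch that builds the standard contrast matrix for k ordered levels. The study's analysis was run in SAS; this Python version is only an illustration, and the function name is hypothetical. With this coding, each coefficient equals the difference between one level's mean and the mean of the level directly below it, on the scale of the link function.

```python
from typing import List


def backward_difference_matrix(k: int) -> List[List[float]]:
    """Contrast matrix with k rows (levels) and k - 1 columns (contrasts).

    Column j compares level j + 1 with level j (0-indexed levels).
    """
    return [
        [-(k - j - 1) / k if i <= j else (j + 1) / k for j in range(k - 1)]
        for i in range(k)
    ]


# Four levels shown for brevity.
for row in backward_difference_matrix(4):
    print(row)

# Check the interpretation with made-up level means: grand mean plus the
# contrast-weighted adjacent differences reproduces each level mean.
means = [1.0, 3.0, 4.0, 8.0]
grand = sum(means) / len(means)
diffs = [means[j + 1] - means[j] for j in range(len(means) - 1)]
rebuilt = [
    grand + sum(c * d for c, d in zip(row, diffs))
    for row in backward_difference_matrix(len(means))
]
print(rebuilt)  # [1.0, 3.0, 4.0, 8.0]
```

The reconstruction at the end shows why the coefficients can be read directly as level-to-level comparisons.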