Key Summary Points

Why carry out this study?

Prevalence of osteoarthritis (OA) is increasing, and it is projected to reach 25% of U.S. adults by 2030.

Understanding the burden of OA is key to determining the most beneficial prevention and treatment strategies, but classifying patients by disease severity to predict future needs is difficult.

Our study is the first to use two strategies for segmenting patients into OA severity groups, based on pain and treatment intensity, and compare differences in utilization and costs in a large population over 18 years.

What has been learned from the study?

Pain and treatment intensity were both strongly associated with OA-related resource utilization but not consistently with all-cause utilization.

These results provide evidence for developing improved criteria for classifying patients that would help to predict future disease burden and enable targeted preventative strategies.

Introduction

Osteoarthritis (OA) is a mechanical and inflammatory-mediated disease that involves progressive change in the external and internal environment of the joint and results in substantial clinical and economic burden [1, 2]. Worldwide, 14% of all individuals over age 60 have this diagnosis [3], and prevalence is only anticipated to increase [4]. In 2030, it is projected that approximately 25% of American adults will have disability due to OA, and over 50% of these sufferers will be 65 years of age and older [4]. In the United States (U.S.), hospitalizations for arthritis are the second most expensive, costing roughly $20 billion [5], and the U.S. is estimated to spend $139.8 billion annually on outpatient OA care [6,7,8]. In addition to age, there are also clear associations between OA and risk factors such as obesity and female sex [9,10,11,12,13,14,15].

The most likely reason for patients with osteoarthritis to seek help from a physician is the symptom of pain [16]. However, due to inconsistent measures to qualify pain sensation, as well as the perception that joint pain is a normal consequence of aging, many patients suffer in silence [17]. Even though self-reported pain scores such as the Numerical Rating Scale (NRS) are limited in adequately describing the multidimensional nature of pain, a two-point difference (on a 0–10 scale) has been proven to detect clinically important changes in pharmaceutical and physical therapy trials [18, 19], and using electronic health records to consistently report this measure has been shown to improve pain management outcomes [20]. Recently, increased healthcare resource utilization in patients with knee and hip OA was shown to be associated with higher pain scores, with cost of care for patients with mild pain roughly 18% less than for those who rated their pain as moderate to severe [21].

However, osteoarthritis pain is not always correlated with disease progression [16, 22], as the pathogenesis of OA is labile. This lability makes it difficult to reliably determine disease severity, particularly when imaging studies are not clinically indicated to make an OA diagnosis and thus not routinely ordered [23]. Although significant strides have been made in evidence-based clinical algorithms and care guidelines have been established, the data to guide clinical treatment decision making are still limited [8, 24,25,26,27]. Understanding the burden of OA is key to determining the most effective preventative strategies and the therapeutic interventions likely to have the greatest impact [28, 29].

This study was intended to address the question of which segments of the OA population are driving the burden and costs of the disease. While the studies mentioned above have compared patients with OA to those without it [23] or focused on pain levels [21], they have not directly compared utilization and cost across multiple approaches to defining OA subgroups. Our objective was to utilize two strategies for segmenting patients into OA severity groups—one defined by pain scores, and one based on treatment intensity—and compare how these two strategies were able to differentiate between groups having different healthcare utilization and costs. This is the first study to our knowledge to use both electronic health records and claims to analyze direct financial costs in subcategories of disease severity, based on both pain and treatment progression, in a large open-cohort design in an integrated health system over an 18-year period.

Methods

Setting and Study Design

All data for this study originated from Geisinger, an integrated health system in Pennsylvania. Geisinger serves over 500,000 patients per year with seven hospitals, a network of 138 primary and specialty clinics, and a single electronic health record (EHR) platform (Epic Corporation, Verona, WI) encompassing inpatient and outpatient care across its network since 2001. Geisinger Health Plan, an affiliated insurance company, provides insurance to approximately one-third of the patients receiving care at Geisinger, with the remaining patients having a mix of commercial, government, and other insurance plans. All EHR and claims data for this project were extracted from Geisinger’s comprehensive enterprise-level data warehouse, created in 2006 as a single source of truth of cleansed, relevant data for clinical, financial, operational, and research needs that is updated every 24 h.

This was a retrospective, open-cohort study using EHR and insurance claims at Geisinger. The aim was to compare how two strategies for segmenting OA patients based on pain or treatment intensity were able to differentiate between groups having different healthcare utilization and costs. Patients were initially identified as eligible for the study if they had at least two outpatient encounters at a Geisinger facility between January 1, 2001 and December 31, 2018 and were age 18 or older when they received a diagnosis code for OA of any joint in the EHR on an encounter, problem list, or OA-related procedure (hip/knee replacement, arthroscopy, or injection). The first occurrence of an OA diagnosis or procedure was defined as the patient’s index date, recognizing that in a retrospective study like this one, patients may have been diagnosed with OA prior to the study period or prior to entering the health system. OA was defined using the International Classification of Diseases Ninth/Tenth Revisions, Clinical Modification (ICD-9/10-CM) codes ICD-9: 715.* or ICD-10 M15-19 [30, 31]. Other procedure and diagnosis codes are provided in Tables X1 and X2 in the Supplementary Material. For this study, patients were included in the treatment intensity-based analysis if they had ≥1 month of insurance coverage and were included in the pain analysis if they also had ≥1 NRS for pain after OA diagnosis. Patients were right-censored when they dropped insurance enrollment, died, or reached the end of the study period on December 31, 2018, whichever came first. This study was performed in line with the principles of the Helsinki Declaration of 1964, and reviewed and approved (IRB #2019-1033) by Geisinger's Institutional Review Board (IRB) as meeting the criteria for exemption as defined in the U.S. Department of Health and Human Services Regulations for the Protection of Human Subjects [(45 CFR 46.104)]. 
The specific exemption under 45 CFR 46.104 was category 4 (secondary research for which consent is not required), which applied to this retrospective study.

Pain Episode Definition

We examined NRS pain scores (scale of 0–10) taken on or after the first OA diagnosis in either the inpatient or outpatient setting, and defined “pain episodes” for every patient, where patients could contribute multiple episodes to the analysis. Each pain episode began on the date when an NRS score was taken, and continued for 90 days, with time-dependent covariates such as age or body mass index (BMI) updated accordingly. If an additional NRS was taken during those 90 days, the length of the episode was extended further until 90 days elapsed with no new NRS scores. In order to avoid double counting utilization or cost, episodes could not overlap, and each episode was categorized based on the initial pain score taken as mild (0–3), moderate (4–6), or severe (7–10). These categories have been widely used in clinical studies and routine clinical practice [32].
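The episode rules above can be sketched in code. The following is a minimal illustration, not the SAS logic used in the study: the names `build_pain_episodes`, `PainEpisode`, and `pain_category` are hypothetical, and two boundary choices are assumptions (a score taken on or before the current end date extends the episode, and when several scores share a date the lowest one is treated as the initial score). Right-censoring at disenrollment, death, or end of study is not handled here.

```python
from dataclasses import dataclass
from datetime import date, timedelta

WINDOW = timedelta(days=90)  # episode continues 90 days past the latest score


def pain_category(nrs: int) -> str:
    """Map a 0-10 NRS score to mild (0-3), moderate (4-6), or severe (7-10)."""
    if not 0 <= nrs <= 10:
        raise ValueError(f"NRS score out of range: {nrs}")
    if nrs <= 3:
        return "mild"
    if nrs <= 6:
        return "moderate"
    return "severe"


@dataclass
class PainEpisode:
    start: date     # date of the first NRS score in the episode
    end: date       # 90 days after the last score in the episode
    category: str   # determined by the initial score only


def build_pain_episodes(scores: list[tuple[date, int]]) -> list[PainEpisode]:
    """Group one patient's (date, NRS) readings into non-overlapping episodes."""
    episodes: list[PainEpisode] = []
    for when, nrs in sorted(scores):
        category = pain_category(nrs)  # also validates every reading
        current = episodes[-1] if episodes else None
        # Assumption: a score on or before the current end date extends the episode.
        if current is not None and when <= current.end:
            current.end = when + WINDOW
        else:
            episodes.append(PainEpisode(when, when + WINDOW, category))
    return episodes
```

For example, scores on January 1 (NRS 2) and February 15 (NRS 8) of the same year would form a single mild episode ending 90 days after February 15, because the category is fixed by the first score.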

Treatment Stage Definition

We defined two stages of OA progression based on treatment intensity, mild and moderate-severe. A set of eight treatment-based criteria identified in EHR and claims data were used to define moderate-severe OA, and these were based on the 2019 OA treatment recommendations from the American College of Rheumatology (ACR) and 2019 OA treatment algorithm from the European Society for Clinical and Economic Aspects of Osteoporosis and Osteoarthritis (ESCEO) and other literature [24,25,26, 33,34,35,36,37,38,39]. These criteria are outlined and described in Table 1. At the time of initial OA diagnosis (index date), each patient was initially assigned to the mild category. If a patient never met any of the criteria, their entire timespan was classified as mild OA, and if a patient met at least one of the criteria on the same date as their first OA diagnosis, then their entire timespan was classified as moderate-severe OA. Otherwise, if a criterion was met after initial diagnosis, the patient’s time period was divided into mild and moderate-severe periods (i.e., the patient was presumed to progress to moderate-severe OA after the criterion was met). As before, time-dependent covariates such as age or BMI were updated at the start of each period. Kaplan–Meier survival analysis was used to examine and plot the percentage of patients progressing to moderate-severe OA over time.
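The assignment of follow-up time to the two stages can be illustrated as follows. This is a sketch under stated assumptions rather than the study's implementation: `split_treatment_periods` is a hypothetical helper, periods are returned as (stage, start, end) tuples with the transition date starting the moderate-severe period, and a criterion date that falls before the index date is treated the same as one on the index date.

```python
from datetime import date
from typing import Optional


def split_treatment_periods(
    index_date: date,
    censor_date: date,
    first_criterion_date: Optional[date] = None,
) -> list[tuple[str, date, date]]:
    """Return (stage, start, end) periods for one patient's follow-up."""
    if censor_date < index_date:
        raise ValueError("censor_date precedes index_date")
    # No criterion met during follow-up: the entire timespan is mild.
    if first_criterion_date is None or first_criterion_date > censor_date:
        return [("mild", index_date, censor_date)]
    # Criterion met on the index date (assumption: or earlier): all moderate-severe.
    if first_criterion_date <= index_date:
        return [("moderate-severe", index_date, censor_date)]
    # Criterion met after diagnosis: split follow-up at the criterion date.
    return [
        ("mild", index_date, first_criterion_date),
        ("moderate-severe", first_criterion_date, censor_date),
    ]
```

A patient diagnosed in 2010 who first meets a criterion in 2014 would thus contribute one mild period and one moderate-severe period, with time-dependent covariates refreshed at the start of each.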

Table 1 Treatment-based severity criteria used to define moderate-severe category of OA

As a secondary analysis to empirically support the validity of the eight criteria, we compared the prevalence of each criterion between two subgroups of patients: (1) “incident TKR patients,” with total knee replacement (TKR) and at least 24 months of EHR records prior to first OA diagnosis; and (2) “no THR or TKR” patients, who received no total hip replacement (THR) or TKR. Any patients not meeting either definition were excluded from this specific analysis. Each patient was flagged according to whether they met each of the eight criteria as of 180 days prior to TKR (group 1) or last encounter in the EHR (group 2). We hypothesized that incident TKR patients would be significantly more likely than the comparison group to have experienced each criterion if it were a marker for moderate-severe OA. A logistic regression model was used to estimate the relative risk ratios (RRR) and 95% confidence intervals (CI) of those ratios between the two groups.
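To make the comparison concrete, the sketch below computes a crude (unadjusted) risk ratio for one criterion from a 2×2 table, with a Wald-type confidence interval on the log scale. This is a simpler stand-in and not the logistic regression model used in the study; the function name `risk_ratio` and any counts passed to it are hypothetical.

```python
import math


def risk_ratio(events_a: int, total_a: int, events_b: int, total_b: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Crude risk ratio of group A vs. group B with a Wald CI on the log scale.

    Group A would be the incident TKR patients and group B the comparison
    patients; `events` is the number of patients meeting the criterion.
    """
    if min(events_a, events_b) <= 0 or events_a > total_a or events_b > total_b:
        raise ValueError("need 0 < events <= total in both groups")
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper
```

A criterion would be consistent with the hypothesis when the ratio and the lower bound of its interval both exceed 1.0.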

Statistical Analysis

Descriptive statistics (means and percentages) with standardized differences were used to describe the baseline characteristics of our two main populations (insured patients for the treatment-based analysis, and insured patients with pain episodes for the pain-based analysis), as well as the original EHR-based cohort from which these were drawn. We compared age, sex, body mass index (BMI), and Charlson Comorbidity Index scores (CCI) between treatment-based and pain episode groups. Once the start and end dates of pain episodes and treatment severity stages were established, we compared patients in the different categories with respect to all-cause outpatient visits (OP), emergency department visits (ED), and inpatient days hospitalized (IP). For these utilization outcomes, we analyzed both the frequency of use (percentage of time periods in which any use occurred) and the units used, expressed as rates per member per year (PMPY). For costs, we tabulated the total allowed amounts (i.e., amounts actually paid to the provider, combining payor and patient deductible, insurance and out-of-pocket costs) in the following categories: total cost, pharmacy cost, and medical cost, with medical cost further subdivided into inpatient, outpatient, and emergency department costs. All amounts were normalized to 2018 U.S. dollars based on the healthcare component of the Consumer Price Index [40]. All of the above outcomes were also re-analyzed within a subset of only those claims that included OA-related medications, procedures, or diagnosis codes in order to compare OA-related utilization and cost among categories. Because of concerns about confounding due to age, sex, and BMI, generalized linear regression models were used to test for statistically significant differences among treatment-based or pain episode categories, with additional terms to adjust for age category (18–44, 45–64, 65–79, or 80+), sex (male or female), and BMI (< or ≥30 kg/m²). These models used a binomial distribution for frequency of utilization, a normal distribution for rates of utilization, and a log-link function and gamma distribution for costs. Results were expressed as estimates of the utilization and cost for a reference group of females aged 45–64 with BMI ≥ 30. All statistical analysis was performed using SAS software (SAS 9.4, Cary, NC).
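Two of the arithmetic steps described above, the PMPY rate and the normalization to 2018 dollars, can be sketched briefly. The text does not spell out either formula, so both are assumptions: PMPY is taken as units divided by member-months of coverage, scaled to 12 months, and inflation adjustment as a ratio of index values. The function names are hypothetical and the index values in `EXAMPLE_CPI` are placeholders for illustration, not real Consumer Price Index figures.

```python
def pmpy_rate(units: float, member_months: float) -> float:
    """Units of utilization (or dollars) per member per year."""
    if member_months <= 0:
        raise ValueError("member_months must be positive")
    return units / member_months * 12.0


def to_2018_dollars(amount: float, year: int, cpi_by_year: dict[int, float]) -> float:
    """Rescale an amount from `year` dollars to 2018 dollars using a price index."""
    return amount * cpi_by_year[2018] / cpi_by_year[year]


# Placeholder index values for illustration only; NOT real CPI figures.
EXAMPLE_CPI = {2010: 80.0, 2018: 100.0}
```

Under these assumptions, 6 outpatient visits observed over 36 member-months correspond to 2.0 visits PMPY, and a $400 allowed amount from 2010 would be counted as $500 with the placeholder index.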

Results

There were 290,897 patients identified in the EHR with a diagnosis of OA during the study period; 127,656 (44%) of these had at least 1 month of claims information available after OA diagnosis and were included in the study. Of the 127,656 patients, approximately 26% progressed from mild to moderate-severe OA during the study and therefore contributed to both categories, while 48% were only mild and 26% were only observed after progressing to moderate-severe disease. Of the patients eligible for the study, 92,576 had at least one pain episode, for a total of 306,200 pain episodes available for analysis (43% mild, 32% moderate, and 25% severe).

Baseline demographics of insured OA patients included in this study were similar to those of our health system's OA population overall. Table X3 in the Supplementary Material provides more information about baseline characteristics of the initially identified EHR population versus those who met inclusion criteria for the current study. The initial OA population had a mean age of 50, with 59% females, 97% white/Caucasian and 97% non-Hispanic (reflecting the geographic region), 54% obese (BMI ≥ 30), and a mean CCI of 1.0. Patients divided into treatment-based categories and pain episodes did show some differences in baseline characteristics, as shown in Table 2. Patients with moderate-severe OA were older than those with mild OA (mean age 57 vs. 49 years), less likely to have a BMI < 30 (39% vs. 50%), and more likely to have a BMI > 35 (27% vs. 20%). Patients experiencing moderate and severe pain episodes had a lower mean age than those in mild pain episodes (54 vs. 54 vs. 57 years, respectively) and were more likely to be female (63% vs. 65% vs. 59%, respectively). Charlson Comorbidity Index was similar among groups. Regression adjustment for age, sex, and BMI was therefore used when comparing utilization and cost, to address these possible sources of confounding.

Table 2 Baseline characteristics compared among treatment-based and pain categories, as of the patient's first observation in each category

Moderate and severe pain episodes were associated with statistically significantly higher frequencies and rates of every category of OA-related utilization when compared with mild pain episodes (OP visits: 1.01 vs. 1.61 vs. 1.83 PMPY for mild, moderate, and severe pain, respectively; ED visits: 0.14 vs. 0.27 vs. 0.47 PMPY, respectively; IP days: 0.18 vs. 0.40 vs. 0.42; all p values < 0.0001). All-cause ED frequency and visits PMPY also significantly increased for severe pain vs. mild pain, but all other categories of all-cause utilization decreased or stayed the same with increasing pain severity (OP visits: 14.1 vs. 13.6 vs. 14.0; ED visits: 1.47 vs. 1.38 vs. 2.03; IP days: 4.7 vs. 3.8 vs. 3.3). Similarly, we observed significant increases in every category of OA-related costs during moderate and severe pain episodes when compared to mild pain (for example, pharmacy: $729 vs. $961 vs. $1228 PMPY for mild, moderate, and severe, respectively; medical: $1505 vs. $2613 vs. $2742; all p values < 0.0001). In contrast, pharmacy was the only type of all-cause cost that was positively associated with increasing pain severity (pharmacy: $9114 vs. $9558 vs. $10,591), while all other all-cause costs decreased with increasing pain. These data are presented in Fig. 1 and Table 3.

Fig. 1

Utilization and cost outcomes, compared among three pain categories (mild, moderate, and severe). Asterisks indicate significant contrasts (at p < 0.0001) between the indicated group compared to the mild pain category, and error bars indicate 95% confidence intervals. a Frequency (% of patients with any use) of all-cause utilization. b Frequency of OA-related utilization. c Units used (PMPY rate), all-cause. d Units used (PMPY), OA-related. e Mean costs, all-cause. f Mean costs, OA-related. ED emergency department visits, OA osteoarthritis, PMPY per member per year, US United States

Table 3 Frequency and units of utilization, and costs compared among the three pain episode types

All OA-related utilization and costs were significantly higher in both frequency and PMPY rates for patients in the moderate-severe OA category when compared to mild OA (OP visits: 0.58 vs. 1.08 PMPY for mild vs. moderate-severe OA, respectively; ED visits: 0.11 vs. 0.17; IP days: 0.07 vs. 0.24; pharmacy cost: $543 vs. $825; medical cost: $576 vs. $1,985; all p values < 0.0001). Patients in the moderate-severe OA category also had statistically significantly higher rates of all-cause ED visits and OP visits than mild OA patients (ED visits: 0.77 vs. 0.86 for mild vs. moderate-severe; OP visits: 9.7 vs. 10.0) but a similar rate of all-cause IP days (2.3 vs. 2.2, p = 0.53). Pharmacy and ED were the only two all-cause costs that were significantly higher for moderate-severe OA compared with mild OA (pharmacy: $6127 vs. $7707; ED: $715 vs. $786), while all other all-cause cost categories were significantly lower for moderate-severe OA patients. All significant p values noted above were < 0.0001. These data are presented in Fig. 2 and Table 4.

Fig. 2

Utilization and cost outcomes, compared among the two treatment-based categories (mild OA vs. moderate-severe OA). Asterisks indicate significant contrasts compared to the mild OA category with p < 0.0001, and error bars indicate 95% confidence intervals. a Frequency (% of patients with any use) of all-cause utilization. b Frequency of OA-related utilization. c Units used (PMPY rate), all-cause. d Units used (PMPY), OA-related. e Mean costs, all-cause. f Mean costs, OA-related. ED emergency department visits, OA osteoarthritis, PMPY per member per year, US United States

Table 4 Frequency and units of utilization, and costs, compared among the two treatment-based severity levels

Finally, the secondary analysis comparing prevalence of the eight treatment-based severity criteria between incident OA patients with TKR and patients without joint replacement showed that all criteria displayed statistically significant risk ratios greater than 1.0. Because the former patients were more likely to have met these criteria before TKR surgery than other OA patients were to have met them before the study period ended, these data provide additional empirical support for these treatment-based criteria, which were largely based on guideline and expert opinion. These comparisons and risk ratios are shown in Table 5.

Table 5 Numbers and percentages of patients in two subgroups (incident OA + TKR, or no THR or TKR) meeting each of the eight criteria used in this study to define moderate-severe OA, measured as of 180 days before their TKR or last encounter

Discussion

This study examined utilization and cost differences among subgroups of OA patients, segmented into groups using two different approaches. One approach was based on patients meeting one of eight treatment-based criteria indicating transition from mild to moderate-severe OA, and the other approach was based on shorter periods of time (90 days or more) triggered by a pain score in the mild, moderate, or severe range. Analyses adjusting for age, sex, and BMI differences showed increases in utilization and costs in every OA-related category with increasing severity, using either a treatment-based or pain-based severity definition. Some similar associations were observed with all-cause resource utilization and costs, but that evidence was much more mixed, with many all-cause utilization and cost types that decreased as pain or treatment severity increased.

We saw increases in every subcategory of OA-related utilization and costs for patients who were defined as having more severe OA, whether that severity was defined by treatment intensity or pain severity. Given the importance of pain as a symptom of OA, these associations have strong face validity and are consistent with what has been shown in other OA populations. Wei et al. [21], who used claims from 35,861 commercial and Medicare Part D knee/hip OA patients in the Optum database, reported statistically significant increases in OA-related costs stratified by pain severity; however, they also saw approximately 16–28% increases in total costs within each of these pain categories when comparing patients with versus without routine opioid use. Our work expands on that prior work by applying both pain severity and additional treatment criteria (instead of opioid use only), with very comparable results.

These increases in OA-related utilization and cost did not always translate, however, to increases in all-cause (i.e., including non-OA-related) utilization or costs, particularly when classifying the population on the basis of pain score. While treatment-based severity level tracked very well with most types of all-cause utilization, pharmacy and ED were the only types of all-cause costs associated with treatment severity level, and pharmacy was the only type of cost that showed a relationship with pain score severity. This lack of association remains an unexpected finding, though we suggest there are several possible explanations that could be explored further. Predominantly, it is reasonable to hypothesize that pain level may still be getting confounded with treatment; for example, lower pain scores could mean the patient is being treated more aggressively to control the pain and would therefore be incurring higher costs despite a lower reported pain score. Some types of utilization may taper off once the patient has progressed to a worse level; for example, if they have had a procedure or surgery that has helped them, or if they experienced unwanted side effects or other barriers to medication adherence, they may be spending or utilizing less despite the fact that they are further along in the progression of OA overall. We note also that in some prior studies, mean differences in ED visits, outpatient visits, and inpatient days between severe vs. mild pain that were statistically significant were still relatively small in clinical magnitude (e.g., only approximately 1.1, 1.9, and 0.8 per patient per year, respectively) [21].

Prior studies have aimed to identify clinically relevant phenotypes for OA [24], and Van Spil et al. described a consensus-based framework for conducting and reporting such studies [41]. There is still, however, no standard set of classification criteria, and this investigation provides further quantitative evidence supporting criteria based on both pain and clinician-ordered treatments. In the 2019 ACR guidelines for OA of the hand, hip, and knee, no hierarchy of recommended treatments is provided that would indicate varying levels of severity, and treatments may be used and reused at various times during the course of disease. The ESCEO's 2019 consensus statement [38] provides more of a guidance on three steps of treatment that may correspond more closely to the progression and severity of the disease, but all steps are based on medications only and no other domains, while the Osteoarthritis Research Society International (OARSI) published its own 2019 treatment guidelines that included non-pharmacological “core” interventions including exercise and education [42]. Deveza et al.'s 2017 systematic review (and 2019 narrative review) of knee OA phenotype studies noted that few studies combined data from different domains, despite the fact that evidence from different authors showed "pain sensitization, psychological distress, radiographic severity, BMI, muscle strength, inflammation and comorbidities" helped to differentiate OA patients [25, 43]. We did not directly compare our patients' classifications to ESCEO or OARSI classifications or previous authors' knee OA phenotypes, so we cannot assert advantages of our criteria over theirs, but we suggest our work provides further support for the inclusion of non-pharmacologic signals such as concomitant diagnoses and procedures when phenotyping OA patients based on OA disease severity.

The strength of this study was the novelty of being able to examine a large population over a long timeframe with a combination of both EHR and insurance claims, as opposed to claims only, on which previous large studies have often been based [21]. We recognize several limitations, however, related to both the pain and treatment-based analyses. We were not able to directly attribute pain severity scores to the patient’s OA, since the patient could be experiencing pain from other causes. We considered a sensitivity analysis to address this, excluding patients who were diagnosed with other major painful conditions such as cancer and chronic migraine, but those conditions affected fewer than 5% of patients in the cohort and thus would likely not have impacted our results; we nevertheless acknowledge that patients could have experienced non-OA pain. Next, our study was conducted in a largely Caucasian, non-Hispanic population, and we did not limit our study to hip or knee OA only, so these factors should be considered when directly comparing results with other studies. As this was an observational study, pain scores were not collected at regular intervals in usual practice, and so patients varied widely in their number and frequency of scores. We recognize that a functional measure such as the Knee Injury and Osteoarthritis Outcome Score (KOOS) would have been a better outcome than pain NRS, but the study included all types of OA and NRS was the most frequently available outcome in the EHR. Pain is also generally recognized to be a subjective, not objective, measure that could be influenced by other unmeasured factors, and there is uncertainty about the relative timing between when the score was collected and when treatment was administered (e.g., whether the pain score reflects a patient’s pain before or after a new treatment was given). Nonetheless, we believe this study’s results support considering pain when defining OA severity.
On the treatment side, we believe this investigation provides even stronger evidence for segmenting patients based on treatments received, with a few caveats. We recognize that caution is needed to avoid circular arguments in which receipt of a treatment (e.g., pain medication) is used as both a classifying variable and an outcome. Our treatment-based approach, however, assigned patients to the moderate-severe OA category if they received any one of eight types of treatment, none of which would be expected to individually account for the magnitude of differences in utilization or cost seen here. In our results, meeting any one of those criteria was associated with significantly higher OA-related utilization and cost in the subsequent period, across all categories, which supports the idea that these individual criteria are useful markers to predict future health and economic burden.

Conclusions

In conclusion, by understanding at what point osteoarthritis patients become greater consumers of healthcare resources, we can deploy targeted preventative strategies aimed at halting progression into the next more costly phase of disease. Previous studies have measured the extent of this burden, particularly with respect to differences between patients with OA and patients without it [1, 44], but this is the first study to our knowledge that utilizes patterns of care found in both claims and the EHR to analyze direct financial costs of OA patients in subcategories of disease severity based on both pain and treatment progression, and our results provide promising evidence of better criteria and approaches for predicting disease burden and costs in the future.