Introduction

One of the most important confounders in observational studies of patients with rheumatoid arthritis (RA) is disease severity. Owing to their large size and relative ease of access, health-care utilization databases have been increasingly used to study various treatment outcomes in RA [1-4]. However, clinical disease activity markers are not available in these databases, and hence studies conducted using these data are prone to residual confounding by disease severity. To address this problem, Ting et al. [5] developed an algorithm to create a claims-based index for rheumatoid arthritis severity (CIRAS) by using numerous variables from claims. In their original article, Ting et al. [5] assessed the validity of CIRAS against the RA records-based index of severity (RARBIS), which was constructed from ratings by a Delphi panel of potential markers of RA severity commonly found in medical charts, and reported moderate correlation between the medical records-based and claims-based indices.

Despite being commonly used in RA observational research [6-8], CIRAS has not been validated against a clinical marker of RA severity until now. RA severity is a complex concept that depends on a combination of disease activity, physical function impairment, and physical damage to the joints. Clinically accepted measures for accurately determining RA severity that include all of these aspects are scarce. However, the disease activity score in 28 joints calculated by using C-reactive protein (DAS28-CRP) [9] is commonly used to evaluate treatment success and to guide treatment selection in patients with RA [10]. Therefore, in the absence of a standard clinical measure for RA severity, we selected the disease activity measure DAS28-CRP to validate the claims-based severity measure CIRAS in this external validation study using data from the Brigham and Women’s Hospital Rheumatoid Arthritis Sequential Study (BRASS) linked to Medicare claims. Furthermore, we examined the correlation between the multi-dimensional health assessment questionnaire (MD-HAQ) [11] physical function scores and CIRAS.

Methods

The BRASS registry is a single-center, prospective, observational cohort of 1,350 patients with a rheumatologist-verified diagnosis of RA. For the subjects enrolled in this registry, data on patient-reported items (including demographics, lifestyle factors, medication use, and quality-of-life scales) and physician-reported items (such as DAS28-CRP, extra-articular manifestations, and medication changes) are collected during annual follow-up visits. For this study, we identified BRASS patients who were also enrolled in Medicare between 2006 and 2010 and linked their data from the two sources. Of these subjects, we further identified those with at least one valid DAS28-CRP measurement in BRASS after 365 days of continuous enrollment in Medicare. To calculate CIRAS for these patients, we applied the algorithm proposed by Ting et al. [5] to Medicare claims data from the 365 days immediately prior to the DAS28-CRP measurement date. We then calculated the Pearson correlation coefficient between CIRAS and DAS28-CRP. We also analyzed MD-HAQ physical function scores measured on the same day as DAS28-CRP and calculated the Pearson correlation coefficient between CIRAS and MD-HAQ. Personal identifiers were removed from the dataset before the analysis to protect subject confidentiality. Patient informed consent was, therefore, not required. This study was approved by the Brigham and Women’s Hospital’s Institutional Review Board.
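The two computational steps described above, restricting claims to a 365-day baseline window and correlating the resulting index with DAS28-CRP, can be sketched as follows. This is a minimal illustration and not the study code: the function names `claims_in_baseline` and `pearson_r` are hypothetical, the exact window boundaries (visit day excluded) are an assumption, and the sample values are made up.

```python
from datetime import date, timedelta
from math import sqrt

LOOKBACK_DAYS = 365  # baseline window length stated in the Methods


def claims_in_baseline(claim_dates, das28_date, lookback_days=LOOKBACK_DAYS):
    """Return claim dates that fall in the baseline window before the visit.

    Assumption: the window is [das28_date - lookback_days, das28_date),
    i.e. the visit day itself is excluded.
    """
    window_start = das28_date - timedelta(days=lookback_days)
    return [d for d in claim_dates if window_start <= d < das28_date]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up values for illustration only; these are not study data.
    visit = date(2009, 1, 1)
    claims = [date(2008, 1, 1), date(2008, 6, 1), date(2009, 1, 1)]
    print(claims_in_baseline(claims, visit))

    ciras = [3.7, 4.4, 5.1, 4.0, 4.8]
    das28 = [2.3, 4.6, 3.3, 3.9, 2.9]
    print(round(pearson_r(ciras, das28), 3))
```

In practice the index itself would be computed from the retained claims by the published CIRAS algorithm [5], which is not reproduced here.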

Furthermore, in order to improve the CIRAS algorithm, we identified several other potential predictors of RA severity from medical and pharmacy claims that were not part of the original CIRAS. This analysis was restricted to the subset of patients who had Medicare Part D enrollment for the 365-day period prior to the DAS28-CRP measurement date. The additional variables were rheumatoid lung involvement, hand surgery, tuberculin test ordered, anti-cyclic citrullinated peptide (CCP) test ordered, steroid use, opioid use, non-steroidal anti-inflammatory drug (NSAID) use, number of non-biologic disease-modifying antirheumatic drugs (DMARDs) used, and biologic DMARD use. A multivariable linear regression model was built by using DAS28-CRP as the outcome and these claims-derived variables as predictors. Adjusted correlations between the predictors and the outcome were reported as partial R² values. The full-model R² was reported as a measure of the overall performance of this model.
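The text does not state how the partial R² values were computed. One common definition, assumed here, is the proportional reduction in residual sum of squares when a predictor is added to a model that already contains all the others. The sketch below implements that definition with ordinary least squares in plain Python; the helper names (`solve_linear_system`, `residual_ss`, `partial_r2`, `model_r2`) are hypothetical, and the toy data are not study data.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_ss(rows, y):
    """Residual sum of squares of an OLS fit that includes an intercept."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def model_r2(rows, y):
    """Full-model R^2 = 1 - SSE / total sum of squares."""
    mean_y = sum(y) / len(y)
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - residual_ss(rows, y) / total_ss


def partial_r2(rows, y, index):
    """Partial R^2 of predictor `index`: (SSE_reduced - SSE_full) / SSE_reduced."""
    sse_full = residual_ss(rows, y)
    reduced_rows = [[v for j, v in enumerate(r) if j != index] for r in rows]
    sse_reduced = residual_ss(reduced_rows, y)
    return (sse_reduced - sse_full) / sse_reduced


if __name__ == "__main__":
    # Toy data: two binary claims-style indicators and a made-up outcome.
    predictors = [[0, 0], [1, 0], [0, 1], [1, 1]]
    outcome = [0.0, 1.0, 1.0, 3.0]
    print(model_r2(predictors, outcome))
    print(partial_r2(predictors, outcome, 0))
```

Solving the normal equations directly keeps the sketch dependency-free; for real claims data with many correlated predictors, a statistics package with a numerically stable least-squares routine would be the more sensible choice.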

Results

We identified 368 patients who were enrolled in both BRASS and Medicare. We then excluded 53 patients who did not have at least one valid DAS28-CRP measured in BRASS after 365 days of continuous enrollment in Medicare, leaving 315 patients with sufficient baseline data for calculation of CIRAS. Of these 315 patients, the majority (81%) were female. The mean (standard deviation) age of the cohort was 70 (10) years. The median (interquartile range) DAS28-CRP and CIRAS were 3.3 (2.3 to 4.6) and 4.4 (3.7 to 5.1), respectively. Other patient characteristics used for CIRAS calculation are summarized in Table 1. The correlation between the calculated CIRAS and DAS28-CRP was poor (Pearson correlation coefficient = 0.07, P = 0.24). The correlation between the calculated CIRAS and MD-HAQ physical function scores was also low (Pearson correlation coefficient = 0.08, P = 0.17).
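For readers who want to gauge how much uncertainty surrounds a correlation of this size, the sketch below derives an approximate two-sided P value and 95% confidence interval from a coefficient and a sample size by using the Fisher z transformation. This is an illustrative approximation, not the test used in the study (the text does not say which test produced the reported P values), and the function name `fisher_z_summary` is hypothetical. Because the published coefficients are rounded to two decimals, such a calculation can only be expected to land near the reported P values, not to reproduce them.

```python
from math import atanh, erfc, sqrt, tanh


def fisher_z_summary(r, n, z_crit=1.959964):
    """Approximate two-sided P value and confidence interval for Pearson r.

    Uses the Fisher z transformation with a normal approximation:
    atanh(r) is treated as normal with standard error 1 / sqrt(n - 3).
    z_crit defaults to the 97.5th percentile of the standard normal (95% CI).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = atanh(r)
    se = 1.0 / sqrt(n - 3)
    p_two_sided = erfc(abs(z) / se / sqrt(2.0))
    ci = (tanh(z - z_crit * se), tanh(z + z_crit * se))
    return p_two_sided, ci


if __name__ == "__main__":
    # Rounded coefficient and cohort size taken from the Results text.
    p_value, (lower, upper) = fisher_z_summary(0.07, 315)
    print(p_value, lower, upper)
```

With these inputs the interval spans zero, which is consistent with the conclusion that the correlation is not distinguishable from none.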

Table 1 Characteristics of rheumatoid arthritis patients included in the external validation study

Furthermore, we identified a subgroup of 119 patients who had at least 1 year of Medicare Part D enrollment immediately prior to the DAS28-CRP measurement date. The linear regression model containing the additional claims-derived variables along with the variables originally proposed by Ting et al. [5] yielded a model R² of 0.23, suggesting limited ability of this model to explain variation in DAS28-CRP. The most influential predictors in this model included biologic DMARD use, opioid use, tuberculin test ordered, and number of non-biologic DMARDs used in the prior year (Table 2).

Table 2 Adjusted correlations between additional claims-based variables related to rheumatoid arthritis severity and DAS28-CRP

Discussion

In this validation study using data from an external cohort of Medicare-enrolled patients with an established diagnosis of RA, the previously published algorithm to approximate RA severity by using claims-based variables had poor correlation with DAS28-CRP and MD-HAQ. Adding more variables derived from both medical and pharmacy claims as predictors in a linear regression model did not substantially improve the performance of this algorithm in predicting DAS28-CRP.

Several potential differences between this external validation study and the original study in which Ting et al. [5] developed CIRAS may explain the poor performance of CIRAS in this cohort. First, it must be noted that CIRAS was validated against a medical records-based RA severity index (RARBIS) in the original study and that the correlation between the two indices was found to be moderate (Spearman correlation coefficient = 0.51). RARBIS itself has been shown to correlate only moderately with DAS28 (Spearman correlation coefficient = 0.41) [12]. The majority of the clinical parameters measured through RARBIS, including patients’ functional status, arthritis flares, x-ray results, and laboratory results, are not captured in claims and hence are absent from CIRAS. Therefore, the poor performance of CIRAS against DAS28-CRP may simply reflect the inability to account for these important clinical parameters. Second, important differences between the current cohort and the CIRAS derivation cohort, including a sizable difference in sex distribution (81% versus 9% female), differences in disease activity, and differences in health-care utilization patterns, may help explain the poor performance of CIRAS in this validation cohort.

CIRAS has been used in observational studies of RA treatments in the past mainly to control for confounding by disease activity. Two prior studies used CIRAS as a covariate in their regression models for the outcome [6,7]. Another study used CIRAS as one of the variables for prediction of a disease risk score (infection score) and stratified the analysis on this disease risk score to account for measured confounding [8]. Findings from our study show poor correlation between CIRAS and DAS28-CRP (an RA activity measure, which often drives treatment selection) as well as MD-HAQ (a patient physical function score, which may be indicative of frailty and hence may be an important confounder). These findings suggest that CIRAS may not accurately approximate disease activity or frailty in observational studies of RA treatments using insurance claims data. Given this poor correlation between CIRAS and important confounders that are unmeasured in health-care claims data, future research should critically evaluate the benefit of using CIRAS as a tool for confounding control.

Another important contribution of our study is that it highlights the importance of external validation of claims-based algorithms. Two prior studies have attempted to build algorithms predicting RA severity. Wolfe et al. [13] used data on the type and number of DMARDs used by the patients in the National Data Bank for Rheumatic Diseases to predict their RA severity and found suboptimal performance of these variables in predicting RA severity as measured by a patient activity scale. Baser et al. [14] used Veterans Health Administration claims to build a severity index for rheumatoid arthritis (SIFRA) and reported moderate correlations with the CIRAS. Before widespread adoption of these indices, broad testing is critical to determine their appropriateness in different databases.

Conclusions

Our study found a low correlation between the previously proposed CIRAS and both DAS28-CRP and MD-HAQ physical function scores, suggesting that CIRAS may not reliably approximate RA disease activity or frailty in observational cohorts. Claims-based algorithms for clinical disease activity should be rigorously tested in distinct populations in order to establish their generalizability.