Plain English summary

Apart from clinical measures, laboratory tests, or imaging, patient-reported outcome measures (PROMs) are essential to evaluate the outcomes of inflammatory arthritis and its management. With the Patient-Reported Outcomes Measurement Information System (PROMIS), PROMs can be measured in a uniform and standardized way. PROMIS covers specific and generic health domains and is relevant for various patient populations. Specific PROMIS measures, such as Physical Function, Fatigue, Sleep Disturbance, or Depression, assess a narrower health domain than general measures such as PROMIS Global Health. Although the use of PROMIS measures is widely advocated, little is known about their actual use in patients with inflammatory arthritis. In this systematic literature review, we aimed to describe the use and outcomes of PROMIS measures in clinical studies involving people with rheumatoid arthritis or axial spondyloarthritis. We found that PROMIS measures are currently not often used in clinical studies in these patient groups and that there is considerable variation in which specific PROMIS measures are used. To facilitate comparisons of outcomes across studies, more standardization in the use of specific PROMIS measures is needed.

Introduction

Rheumatoid arthritis (RA) and axial spondyloarthritis (axSpA) are two forms of inflammatory arthritis that, despite the availability of effective drug treatments, can lead to pain, stiffness, fatigue, and limitations in functioning and participation in a considerable proportion of patients [1,2,3]. These consequences have a major impact on patients' quality of life [1,2,3].

Apart from clinical, laboratory, or imaging parameters, patient-reported outcome measures (PROMs) are essential to evaluate the outcomes of inflammatory arthritis and its management. Numerous PROMs, either generic or disease-specific, are currently used in clinical care and research in inflammatory arthritis. However, some widely used legacy measures based on classical test theory have been criticized for a lack of standardization, precision, and/or comparability of scores across studies and diseases [4, 5]. To overcome these limitations, the Patient-Reported Outcomes Measurement Information System (PROMIS) became available in 2007 [6]. PROMIS measures are questionnaires based on item response theory (available as Item Banks, Short Forms, or Computer Adaptive Tests) that cover specific and generic health domains and are relevant for various patient populations. All PROMIS measures use a standardized metric, the T-score, centered on the general population, which enhances the interpretability of scores.

PROMIS measures have been applied in general populations and in people with various physical conditions, such as critical illness, spinal surgery, low back pain, cancer, and chronic pain [7,8,9,10,11,12]. PROMIS measures also appear to be appropriate for patients with inflammatory arthritis, and several have been used in this population since their introduction in 2007. Recently, the International Consortium for Health Outcomes Measurement (ICHOM) promoted the use of the PROMIS Pain Interference, General Health, Physical Function, and Fatigue measures as part of routine outcome measurement for patients with inflammatory arthritis [13]. Reporting PROMIS outcomes in this more standardized way creates new opportunities to compare the performance of health care for inflammatory arthritis on a global scale, allowing health care professionals to learn from each other and further improve care for patients with inflammatory arthritis.

However, little is known so far about the extent and nature of the actual use of PROMIS measures in clinical research in patients with inflammatory arthritis. Thus, the aim of this review was to systematically determine the use and outcomes of PROMIS measures in clinical studies including patients with RA and/or axSpA. PROMIS outcomes were included to assess whether they reflect the relatively worse health status of patients with RA and axSpA compared with the general population.

Methods

Study design

This systematic review is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [14], except for the PRISMA item on risk-of-bias assessment, which was omitted because the methodological quality of the studies was deemed less relevant given the exploratory nature of the review.

Search strategy

A trained librarian (JS) searched nine electronic databases (PubMed, MEDLINE, Embase, Web of Science, Cochrane Library, Emcare, PsycINFO, Academic Search Premier, and Google Scholar) on July 29, 2022. The search strategy combined the disease concepts (RA, axSpA, and inflammatory arthritis) with PROMIS. Both controlled subject terms (such as MeSH terms) and various free-text words describing the search concepts were used. The search was limited to articles published from 2007 onwards, as PROMIS became available in that year. The search strategy is presented in Supplement 1. The identified records were imported into the Rayyan software application (http://rayyan.qcri.org) [15], and duplicates were removed. In addition, studies were identified indirectly by screening the reference lists of the included studies and of relevant systematic reviews retrieved by the search.
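Combining concepts in this way follows the usual Boolean pattern: synonyms within one concept are joined with OR, and the concept blocks are joined with AND, so a record must match every concept. The sketch below illustrates only that pattern. The synonym lists are hypothetical placeholders; the actual search strings, database-specific syntax, controlled-vocabulary terms, and the 2007 date limit are in Supplement 1 and are not reproduced here.

```python
# Hypothetical synonym lists for illustration; the real search strings are in Supplement 1.
DISEASE_TERMS = [
    '"rheumatoid arthritis"',
    '"axial spondyloarthritis"',
    '"ankylosing spondylitis"',
    '"inflammatory arthritis"',
]
PROMIS_TERMS = [
    "PROMIS",
    '"Patient-Reported Outcomes Measurement Information System"',
]


def or_block(terms: list[str]) -> str:
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    if not terms:
        raise ValueError("a concept needs at least one search term")
    return "(" + " OR ".join(terms) + ")"


def build_query(*concepts: list[str]) -> str:
    """Combine concept blocks with AND, so a record must match every concept."""
    return " AND ".join(or_block(c) for c in concepts)


if __name__ == "__main__":
    print(build_query(DISEASE_TERMS, PROMIS_TERMS))
```

The query string is generic; each database (e.g., PubMed versus Embase) has its own field tags and vocabulary, which is why the real strategy differs per database.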

Selection criteria

  • Inclusion criteria: Original clinical studies (a) reporting the use of one or more PROMIS measures; (b) including patients with RA and/or axSpA aged 18 years or above; (c) written in English, French, German or Dutch.

  • Exclusion criteria: Studies including patients with multiple diagnoses that did not report data on RA or axSpA patients separately.

No limitations were formulated on the type of study design (e.g., retrospective studies, prospective studies, randomized controlled trials).
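The eligibility criteria above can be expressed as a single check per full-text record. This is a minimal sketch, assuming a hypothetical `Record` summary of each article; the field names are illustrative, and "multiple diagnoses" is interpreted literally as more than one diagnosis in the sample (an assumption, since the review does not define it further).

```python
from dataclasses import dataclass

# Languages accepted under inclusion criterion (c).
ALLOWED_LANGUAGES = {"English", "French", "German", "Dutch"}
TARGET_DIAGNOSES = {"RA", "axSpA"}


@dataclass
class Record:
    """Hypothetical summary of one full-text article, for illustration only."""
    promis_measures: list[str]   # PROMIS measures the article reports
    diagnoses: set[str]          # e.g. {"RA"}, {"axSpA"}, {"RA", "PsA"}
    min_age: int                 # youngest age included in the study sample
    language: str
    ra_axspa_reported_separately: bool = True


def is_eligible(r: Record) -> bool:
    # (a) one or more PROMIS measures reported
    if not r.promis_measures:
        return False
    # (b) patients with RA and/or axSpA, aged 18 years or above
    if (not (r.diagnoses & TARGET_DIAGNOSES)) or r.min_age < 18:
        return False
    # (c) written in English, French, German, or Dutch
    if r.language not in ALLOWED_LANGUAGES:
        return False
    # Exclusion: multiple diagnoses (assumed: >1 diagnosis) without separate RA/axSpA data
    if len(r.diagnoses) > 1 and not r.ra_axspa_reported_separately:
        return False
    # No restriction on study design, so nothing else to check.
    return True
```

For example, `is_eligible(Record(["PROMIS Fatigue"], {"RA"}, 18, "English"))` returns `True`, whereas the same record written in Spanish returns `False`.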

Selection process

Records retrieved from the search were screened in two phases. In the first phase, two researchers (MT, IK) screened the titles and abstracts of all identified records against the eligibility criteria above, using the online Rayyan® software [15]. Records were scored as most likely eligible, possibly eligible, or not eligible, and records scored as not eligible were excluded. Disagreements were resolved by discussion between the two researchers; if no agreement was reached, the record was considered eligible for the second phase of screening. In addition, a third researcher (TVV) screened 10% of all records from the first phase (titles and abstracts) to ensure the quality of the selection process.
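The first-phase decision rule can be sketched as a function of the two reviewers' ratings. This is a minimal sketch under stated assumptions: a "disagreement" is taken to mean that exactly one reviewer rated the record not eligible (both "most likely" and "possibly" eligible ratings pass), and the `consensus` parameter is a hypothetical way of recording the outcome of the discussion, with `None` meaning no agreement was reached.

```python
from typing import Optional

NOT_ELIGIBLE = "not eligible"
RATINGS = {"most likely eligible", "possibly eligible", NOT_ELIGIBLE}


def phase1_decision(rating_a: str, rating_b: str,
                    consensus: Optional[str] = None) -> str:
    """Return 'exclude' or 'full-text' for one title/abstract record.

    consensus: rating agreed on after discussion, or None if the two
    reviewers could not agree (hypothetical parameter).
    """
    for r in (rating_a, rating_b):
        if r not in RATINGS:
            raise ValueError(f"unknown rating: {r!r}")
    a_out = rating_a == NOT_ELIGIBLE
    b_out = rating_b == NOT_ELIGIBLE
    if a_out and b_out:
        return "exclude"            # both reviewers: not eligible
    if not a_out and not b_out:
        return "full-text"          # both reviewers: (possibly) eligible
    # Disagreement: discussion decides; without agreement the record moves on.
    if consensus is None:
        return "full-text"
    return "exclude" if consensus == NOT_ELIGIBLE else "full-text"
```

The asymmetric default (unresolved records go to full-text screening) errs on the side of sensitivity, so a relevant study is not lost at the title/abstract stage.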

In the second phase, the same two researchers retrieved the full-text articles and screened them independently, using the same eligibility criteria. The screening outcomes were entered into a pre-defined database listing the inclusion and exclusion criteria. Disagreements were resolved by discussion between the two researchers; if no agreement was reached, a third researcher (TVV or MG) was consulted. Fifty percent of the full-text screening was checked by a third researcher (MG). The third reviewers (TVV, MG) were supervisors, engaged to further ensure the quality of the screening process. For feasibility reasons, given the number of titles and abstracts relative to the number of full-text papers, 10% of the title and abstract screening and 50% of the full-text screening and data extraction were checked.
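The quality checks above apply a fixed percentage to each screening phase. The review does not state how the checked records were chosen or how fractional counts were rounded; the sketch below assumes a seeded random sample and rounding up, both of which are illustrative choices. With 272 records at the title/abstract stage and 109 full-text articles, 10% and 50% give 28 and 55 records, respectively.

```python
import random


def audit_sample(record_ids: list[str], percent: int, seed: int = 0) -> list[str]:
    """Randomly pick `percent`% of the records (rounded up) for a third reviewer.

    Random selection and rounding up are assumptions, not the review's stated method.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    k = (percent * len(record_ids) + 99) // 100
    return random.Random(seed).sample(record_ids, k)


if __name__ == "__main__":
    title_abstract = [f"rec{i}" for i in range(272)]  # records after deduplication
    full_text = [f"ft{i}" for i in range(109)]        # full-text articles assessed
    print(len(audit_sample(title_abstract, 10)))      # 28
    print(len(audit_sample(full_text, 50)))           # 55
```

A fixed seed makes the audit sample reproducible, which helps if the check itself needs to be documented.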

Data extraction

A pre-defined data extraction form was used to systematically extract information from the full-text articles that were ultimately selected. One researcher (MT or IK) extracted the data, and a second researcher (MT or IK) checked the extraction. Again, a third researcher (MG) checked the data extraction of 50% of the papers to ensure that the data were extracted correctly.

Regarding study characteristics, we retrieved the first author, year of publication, country, study design (cross-sectional study, longitudinal cohort study, controlled or uncontrolled clinical trial, or other, as defined in the original study), and population (registry, community, or clinic). For the study populations, we collected the type of inflammatory arthritis (RA, axSpA, or both), the number of patients, and general patient characteristics (mean age, sex, disease duration).

Each article was treated as a separate study unless two or more articles drew on data from the same community, clinic(s), or registry and reported exactly the same sample size and general patient characteristics (age, sex distribution); in that case, these articles were treated as a single study. The date of the first publication was used for the chronological ordering of the studies. However, if one of these publications reported T-scores and the others did not, the date of the publication reporting T-scores was used.
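The grouping and dating rules can be sketched as follows. This is a minimal sketch, assuming a hypothetical `Article` record with the fields the rule compares (setting, sample size, mean age, proportion female). When several articles in a group report T-scores, the sketch uses the earliest of them; the review only describes the case of one T-score-reporting publication, so that generalization is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """Hypothetical per-article fields used by the grouping rule."""
    ref: int                 # citation number
    setting: str             # community, clinic(s) or registry the data come from
    n: int                   # sample size
    mean_age: float
    pct_female: float
    published: date
    reports_t_scores: bool


def group_into_studies(articles: list[Article]) -> list[list[Article]]:
    """Articles with identical setting, sample size and patient characteristics form one study."""
    groups: dict[tuple, list[Article]] = {}
    for a in articles:
        key = (a.setting, a.n, a.mean_age, a.pct_female)
        groups.setdefault(key, []).append(a)
    return list(groups.values())


def study_date(study: list[Article]) -> date:
    """Date used for chronological ordering of a study."""
    with_t = [a for a in study if a.reports_t_scores]
    # Prefer publications reporting T-scores; if none do, use all publications.
    candidates = with_t or study
    return min(a.published for a in candidates)
```

Usage: `sorted(group_into_studies(articles), key=study_date)` yields the studies in the chronological order described above.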

The names of the PROMIS measures used (Item Banks, Short Forms, Computer Adaptive Tests), including version numbers, were recorded and checked against the HealthMeasures website (healthmeasures.net), accessed on August 1, 2022. If the reported name of a PROMIS measure was not registered there, the measure was not taken into account. T-scores were also extracted, if available. T-scores follow a normalized distribution (T-score 0–100) with a standardized mean of 50 and a standard deviation of 10, where 50 represents the mean score of the general population. For some PROMIS measures, a score higher than 50 indicates a better outcome (e.g., PROMIS Ability to Participate in Social Roles and Activities, Physical Function, Satisfaction with Social Roles and Activities), whereas for others, a score higher than 50 indicates a worse outcome (e.g., Anger, Anxiety, Fatigue, Pain Behavior, Pain Intensity, Pain Interference, Sleep Disturbance, Sleep-Related Impairment, Depression). If a PROMIS measure was administered and reported at multiple time points in one study, the baseline results were extracted.
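Because the direction of scoring differs between measures, a T-score can only be interpreted together with the measure it belongs to. The sketch below encodes the two groups of measures named above and the population mean of 50 and SD of 10; the function names are illustrative, not part of any PROMIS software.

```python
# Scoring direction for the PROMIS measures named in the text.
HIGHER_IS_BETTER = {
    "Ability to Participate in Social Roles and Activities",
    "Physical Function",
    "Satisfaction with Social Roles and Activities",
}
HIGHER_IS_WORSE = {
    "Anger", "Anxiety", "Fatigue", "Pain Behavior", "Pain Intensity",
    "Pain Interference", "Sleep Disturbance", "Sleep-Related Impairment",
    "Depression",
}

POPULATION_MEAN = 50.0
POPULATION_SD = 10.0


def sd_from_mean(t_score: float) -> float:
    """Distance of a T-score from the general-population mean, in SD units."""
    return (t_score - POPULATION_MEAN) / POPULATION_SD


def health_vs_population(measure: str, t_score: float) -> str:
    """Return 'better', 'worse' or 'equal' relative to the population mean of 50."""
    if measure in HIGHER_IS_BETTER:
        sign = 1
    elif measure in HIGHER_IS_WORSE:
        sign = -1
    else:
        raise KeyError(f"scoring direction unknown for {measure!r}")
    signed = sign * (t_score - POPULATION_MEAN)
    if signed > 0:
        return "better"
    if signed < 0:
        return "worse"
    return "equal"
```

For example, a T-score of 40 lies one SD below the population mean on both Physical Function and Fatigue, but `health_vs_population` returns "worse" for the former and "better" for the latter.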

If the results of a specific PROMIS measure were reported in multiple articles grouped into one study, and the T-scores differed by less than 0.5, the score reported in the first publication was extracted. For any scores that were unclear, the first author of the article was contacted to confirm the calculation.
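The rule for a measure reported in several articles of one study can be sketched as below. The review specifies only the case where scores differ by less than 0.5; what happened with larger differences is not stated, so the sketch provisionally keeps the first publication's score and flags it for checking (for example, with the authors). That flag is a hypothetical addition.

```python
def merge_reported_scores(scores: list[tuple[int, float]],
                          tolerance: float = 0.5) -> tuple[float, bool]:
    """Pick one T-score for a measure reported in several articles of one study.

    scores: (publication_order, t_score) pairs; a lower order means an earlier article.
    Returns (chosen score, needs_review). The needs_review flag is an assumption.
    """
    if not scores:
        raise ValueError("no scores given")
    ordered = sorted(scores)                 # earliest publication first
    values = [t for _, t in ordered]
    first = values[0]
    spread = max(values) - min(values)
    # Differences below the tolerance: take the first publication's score.
    return first, spread >= tolerance
```

For example, scores of 55.2 (first article) and 55.6 (second article) yield `(55.2, False)`.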

Results

The search initially identified 714 records, which resulted in 272 records after deduplication. Title and abstract screening excluded 163 records, and after full-text screening of the remaining 109 articles, 69 were excluded. Thus, 40 articles were included in total [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55], reporting on 29 studies: 25 studies in RA patients, three studies in axSpA patients, and one study in both RA and axSpA patients. The flow of the screening process is shown in Fig. 1.
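The counts in the screening flow can be checked for internal consistency with simple subtraction. The sketch below uses only the numbers reported above; the number of duplicates (442) is derived, not reported.

```python
def prisma_flow(identified: int, after_dedup: int,
                excluded_screening: int, excluded_full_text: int) -> dict[str, int]:
    """Derive the intermediate counts of a PRISMA-style flow diagram."""
    counts = {
        "duplicates_removed": identified - after_dedup,
        "full_text_assessed": after_dedup - excluded_screening,
    }
    counts["included"] = counts["full_text_assessed"] - excluded_full_text
    if min(counts.values()) < 0:
        raise ValueError("inconsistent counts")
    return counts


flow = prisma_flow(identified=714, after_dedup=272,
                   excluded_screening=163, excluded_full_text=69)
print(flow)  # {'duplicates_removed': 442, 'full_text_assessed': 109, 'included': 40}

STUDIES_BY_POPULATION = {"RA": 25, "axSpA": 3, "RA and axSpA": 1}
print(sum(STUDIES_BY_POPULATION.values()))  # 29
```

The 40 included articles map onto 29 studies because some articles were grouped as one study under the rule described in the Methods.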

Fig. 1 Flowchart of screening process

The publication years of the studies ranged from 2011 to 2022, with relatively more studies published in recent years. Of the 29 studies, three were published in 2011–2015, 13 in 2015–2019, and 13 in 2020–2022.

The characteristics of the 29 included studies (22,855 patients in total) are summarized in Table 1. Most studies originated from the US (23 of 29 studies; 79.3%). The study designs were cross-sectional studies (10 of 29 studies; 34.5%), longitudinal cohort studies (15 of 29 studies; 51.7%), randomized controlled trials (two of 29 studies; 6.9%), or other designs (one pilot study, 3.4%; one cross-over study, 3.4%).
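The reported percentages follow from dividing each count by the 29 studies and rounding to one decimal. A minimal sketch that reproduces them and checks that the design categories add up to 29:

```python
def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal, as reported in the text."""
    return round(100 * count / total, 1)


def design_percentages(designs: dict[str, int], total: int) -> dict[str, float]:
    """Percentages for mutually exclusive categories that must sum to `total`."""
    if sum(designs.values()) != total:
        raise ValueError("category counts do not add up to the total")
    return {name: pct(n, total) for name, n in designs.items()}


DESIGNS = {
    "cross-sectional": 10,
    "longitudinal cohort": 15,
    "randomized controlled trial": 2,
    "pilot": 1,
    "cross-over": 1,
}

print(pct(23, 29))                      # 79.3 (studies from the US)
print(design_percentages(DESIGNS, 29))
```

Because of rounding, the design percentages sum to 100.0 here, but in general one-decimal percentages need not add up exactly to 100.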

Table 1 Characteristics of, and PROMIS measures used in, clinical studies in patients with axSpA and RA

Table 1 also shows the various PROMIS measures identified in the included studies. In total, 15 different PROMIS measures were identified in this review, consisting of two general health measures (PROMIS Global Health and PROMIS-29) and 13 measures pertaining to a specific health domain: PROMIS Physical Function, Fatigue, Pain Interference, Pain Intensity, Pain Behavior, Sleep Disturbance, Sleep-Related Impairment, Satisfaction with Social Roles and Activities, Ability to Participate in Social Roles and Activities, Anxiety, Anger, Depression, and Self-Efficacy Managing Symptoms. The four most frequently used measures were PROMIS Pain Interference (17 studies), Physical Function (14 studies), Fatigue (13 studies), and Depression (12 studies).

Table 2 details the specific versions of the PROMIS measures used, classified by type into Item Banks, Computer Adaptive Tests (CATs), and Short Forms. Part of the variation in versions can be explained by publication date, with more recent articles reporting more recent versions of the same PROMIS instrument. Other sources of variation include the precise naming and the number of items used.

Table 2 PROMIS measure versions used in axSpA and RA populations

Table 3 presents the T-scores of the PROMIS measures, classified by the health domain they represent. In total, eight of the 29 studies did not report actual outcomes of PROMIS measures as T-scores but reported only on their psychometric properties (e.g., validity, reliability, correlations with other questionnaires, responsiveness, meaningful change). The 26 articles presenting actual PROMIS data described the results of 21 studies.

Table 3 T-scores (SD) of PROMIS measures in populations with axSpA and RA

We contacted the authors of one study [18] because its reported T-score for PROMIS Ability to Participate in Social Roles differed considerably from the other reported scores; the authors confirmed its accuracy. For PROMIS measures where a higher score denotes better health, the mean T-score was > 50 in only one of the 24 reported scores, reflecting the overall poorer health status of people with RA and axSpA. For PROMIS measures where a lower score indicates better health, the mean T-score was < 50 in six of the 67 reported scores.

Actual T-scores were reported in 10 or more articles for four PROMIS measures, with mean T-scores ranging from 30.6 to 46.6 for Physical Function, 51.1 to 66.0 for Fatigue, 45.3 to 57.7 for Depression, and 52.2 to 65.8 for Pain Interference, overall indicating worse health than in the general population.
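Whether a range of study means lies entirely on the "worse" side of the population mean depends on each measure's scoring direction. Using only the ranges reported above (the lowest and highest study mean per measure), the sketch below shows that the Physical Function, Fatigue, and Pain Interference ranges lie wholly on the worse side of 50, whereas the Depression range spans it. Intermediate study means are not considered.

```python
# measure: (lowest mean T-score, highest mean T-score, higher_is_better)
REPORTED_RANGES = {
    "Physical Function": (30.6, 46.6, True),
    "Fatigue": (51.1, 66.0, False),
    "Depression": (45.3, 57.7, False),
    "Pain Interference": (52.2, 65.8, False),
}


def classify_range(low: float, high: float, higher_is_better: bool) -> str:
    """Locate a range of mean T-scores relative to the population mean of 50."""
    sign = 1 if higher_is_better else -1
    # Positive delta = better than the population mean, negative = worse.
    deltas = [sign * (t - 50.0) for t in (low, high)]
    if all(d < 0 for d in deltas):
        return "all worse than population mean"
    if all(d > 0 for d in deltas):
        return "all better than population mean"
    return "spans the population mean"


for measure, (low, high, hib) in REPORTED_RANGES.items():
    print(f"{measure}: {classify_range(low, high, hib)}")
# Physical Function: all worse than population mean
# Fatigue: all worse than population mean
# Depression: spans the population mean
# Pain Interference: all worse than population mean
```

The Depression result is consistent with some reported means indicating better-than-average health on that measure.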

Discussion

This systematic literature review of the use of PROMIS measures in clinical studies in RA and axSpA patients identified 29 studies described in 40 articles. In total, two general health measures and 13 domain-specific PROMIS measures were used, with PROMIS Pain Interference, Physical Function, Fatigue, and Depression being reported most often. Overall, there was considerable variation in the versions of the PROMIS measures used.

The 29 included studies were published from 2011 up to 2022, with relatively more articles published in recent years. As the total number of publications on clinical studies in RA and axSpA has also grown markedly, it remains to be ascertained whether the proportion of studies using PROMIS measures as outcome measures increased with time. Overall, the total number of identified studies using PROMIS measures is quite small as compared to the wealth of clinical studies in inflammatory arthritis published in the past two decades.

Regarding the nature of the PROMIS measures identified, most cover dimensions described in the International Classification of Functioning, Disability and Health (ICF) Core Sets (Comprehensive and Brief) for Rheumatoid Arthritis and for Ankylosing Spondylitis [56, 57]. Similarly, the full range of measures is in line with the Outcome Measures in Rheumatoid Arthritis Clinical Trials (OMERACT) recommendations for outcome assessment in RA and axSpA patients in clinical trials [58, 59]. Both the ICF Core Sets and the OMERACT recommendations specify health domains rather than specific measurement instruments such as PROMIS measures. Specific instruments are included in the more recently developed ICHOM core set for inflammatory arthritis, which explicitly advocates the PROMIS General Health, Pain Interference, Physical Function, and Fatigue measures [13]. In line with the ICHOM core set, Pain Interference, Physical Function, and Fatigue were among the PROMIS measures used most often. However, other PROMIS measures not recommended by ICHOM were also used substantially, such as PROMIS Sleep Disturbance, Ability to Participate in Social Roles and Activities, Depression, and Anxiety. Although not advocated by ICHOM, these measures do concern domains proposed by the OMERACT recommendations and the ICF Core Sets. It remains unclear whether the use of measures covering areas such as sleep indicates that the content of some established core sets should be revised. Moreover, the choice of PROMIS measures also depends on the research question. Hence, some studies may warrant the use of PROMIS measures not recommended by ICHOM, and in such studies the recommended measures may be less relevant.

With respect to the actual scores of the PROMIS measures, the T-scores extracted from 21 studies were generally in line with the expectation that patients with RA and axSpA have a worse health status than the general population. There were, however, some exceptions where the T-scores indicated better health than expected, namely for Depression and Ability to Participate in Social Roles and Activities. Overall, the number of T-scores available per PROMIS measure was low, and different versions of an instrument were often used. Although this could partly be explained by the release of updates, there was also considerable variation in the number of items and the naming. It remains to be established whether comparing scores obtained with different versions of the same measure is valid. Given this uncertainty and the variation in the number of items and naming of the PROMIS measures, we could not conduct subgroup analyses. Hence, no conclusions can be drawn on the level of T-scores in RA and axSpA patients.

This study had some limitations. First, owing to the large diversity of the included studies in terms of follow-up moments, data presentation, and inclusion criteria, we did not review the data on the psychometric properties of PROMIS measures according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines. Second, the large variability between studies also hampered further comparison between populations and studies by means of a meta-analysis. Third, only four studies reported on axSpA patients (three solely on axSpA and one on both RA and axSpA), compared with 25 studies solely on RA patients, which hampered interpretation for the axSpA group. Consequently, we were unable to compare the two groups; instead, we displayed the individual data and analyzed the data of RA and axSpA patients combined. Finally, certain PROMIS measures may be overrepresented because some studies were based on similar populations. Some studies overlapped with others but were considered separate studies because their data were not exactly the same in terms of sample size and general patient characteristics.

Nevertheless, the broad eligibility criteria allowed the inclusion of most of the relevant literature, providing a fairly complete picture of the use of PROMIS measures in clinical research in inflammatory arthritis. Conducting the study according to the PRISMA recommendations supports the accuracy and validity of the work.

In conclusion, PROMIS measures are currently not often used in clinical studies in patients with RA and axSpA. Among the studies that did use them, there was considerable variation both in the PROMIS measures used and in the specific versions of each instrument. As expected, the PROMIS outcomes reflected the overall impaired health of RA and axSpA populations. To facilitate comparisons across studies in future research, more standardization in the use of PROMIS measures in clinical studies in RA and axSpA is needed.