Introduction

Advances in health information technology have the potential to elevate the quality of patient care, especially by providing clinicians with efficient measures of patient-reported outcomes (PROs) that provide insights into health-related quality of life (HRQoL) during treatment. In the field of orthopaedic surgery, HRQoL assessment instruments help elucidate patients’ well-being and functional capabilities beyond visible outcomes [1, 2]. Numerous validated HRQoL assessment instruments exist in orthopaedics, commonly described as legacy measures. These include the American Shoulder and Elbow Surgeons Score (ASES), Disabilities of the Arm, Shoulder and Hand (DASH), Foot and Ankle Ability Measure (FAAM), Knee Injury and Osteoarthritis Outcome Score (KOOS), and others [1, 3]. However, most of these instruments are narrow in scope, limited to specific outcomes or mobility constructs [1, 3]. The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) was developed to deliver standardized, precise, quantitative values for individual domains of health and well-being [4], and has great potential to improve understanding of PROs in orthopaedic cases. By design, PROMIS outcome instruments report outcomes utilizing standardized T-scores. The computerized adaptive test (CAT) feature, available for many PROMIS instruments, is the most efficient method of collecting useful PROs in a multitude of musculoskeletal conditions by utilizing item response theory [1, 5, 6]. Patients respond to questions and the system is programmed to select subsequent questions based on answers to previous questions, which minimizes the burden on the patient while providing maximally useful information for clinicians [6]. The most important outcome domain in orthopaedic surgery is physical health (PH) [7]. In PROMIS, PH includes Physical Function (PF) and subdomains, such as Pediatric Mobility, Upper Extremity (UE), and Lower Extremity (LE) [8].
PROMIS PF CAT selects from a 124-item bank [6], and requires 12 or fewer questions to identify the most informative PF value [5]. While PROMIS PF CAT was not designed for any particular disease, the range of PRO values available allows it to be tailored for use in specific medical conditions.
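The adaptive selection and T-score reporting described above can be illustrated with a minimal sketch. It is not the PROMIS scoring engine: the dichotomous two-parameter logistic (2PL) model, the three-item bank, and the function names are all assumptions introduced here for illustration only. The sketch shows the general idea of choosing, at each step, the unasked item that carries the most information at the current estimate, and of reporting an estimate on a scale with a mean of 50 and a standard deviation of 10.

```python
import math


def t_score(theta):
    """Map a latent trait estimate (in SD units) onto the T-score metric."""
    # T-scores are scaled to a mean of 50 and an SD of 10 in the reference population.
    return 50.0 + 10.0 * theta


def item_information(theta, discrimination, difficulty):
    """Fisher information of one item under the assumed 2PL model."""
    p = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return discrimination ** 2 * p * (1.0 - p)


def next_item(theta, bank, asked):
    """Pick the unasked item that is most informative at the current estimate."""
    candidates = [i for i in range(len(bank)) if i not in asked]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))


# Hypothetical three-item bank: (discrimination, difficulty) pairs.
bank = [(1.5, -1.0), (1.5, 0.0), (1.5, 1.0)]
first = next_item(0.0, bank, asked=set())
print(first, t_score(0.0))
```

In this toy bank, an average respondent is first shown the item of middling difficulty; after a response that raises the estimate, the harder item becomes the more informative choice, which is how a CAT can reach a precise score with few questions.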

Psychometric validation of HRQoL assessment instruments generally requires evaluation of reliability, responsiveness, and validity [9, 10]. However, the use of different terminology for the same measurement properties can complicate consensus on the validity of an assessment instrument [11]. While Sullivan established guidelines for assessing the validity of PRO assessment instruments [12], and the Consensus-based Standards for the selection of health status Measurement Instruments (COSMIN) study developed international agreement on taxonomy, terminology, and definitions of measurement properties [11], the subsequent application of these terms remains to be evaluated.

A scoping study allows us to establish uniform application of terms and definitions for assessing validity of measurement properties as applied in orthopaedic research. This approach is defined by Daudt et al. as an attempt to “map the literature on a particular topic or research area and provide an opportunity to identify key concepts; gaps in the research; and types and sources of evidence to inform practice, policymaking, and research” [13]. A key feature of a scoping study is that the research aims to provide an overview of all existing literature concerning a broad topic [13, 14], whereas the purpose of a systematic review is to provide a summary of the leading existing research on a specific question [15].

HRQoL assessment instruments provide patients an opportunity to better understand their medical condition and be involved in their own care—key steps in reaching an appropriate and successful treatment plan. Given growing recognition of the importance of patients’ involvement in their own care, PROMIS is a measurement system which contains unique measures for improving patient care throughout orthopaedics. The efficacy of PROMIS has been demonstrated in many diseases including rheumatoid arthritis, chronic heart failure, and cancer [16]. However, its application throughout orthopaedic care has yet to be depicted. This scoping study sought to elucidate the extent to which orthopaedic surgery subspecialties have used and validated PROMIS measures in peer-reviewed research in order to identify its potential as an applicable and valuable tool across subspecialties in orthopaedics.

Methods

Approach

This study followed the methodology developed by Arksey and O’Malley [14], and further enhanced by Daudt et al. [13]. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist was followed for reporting the results of the study [17].

Data source

We identified all peer-reviewed publications in the National Library of Medicine (NLM) PubMed database that utilized PROMIS measures within adult orthopaedic surgery using specific search criteria: (PROMIS) AND (orthopaedics OR orthopaedic OR orthopedics OR orthopedic). The NLM PubMed database search was conducted on January 1, 2021. This search identified both assessments of surgical patient outcomes with PROMIS and analyses of the quality of PROMIS as a measurement system within orthopaedic surgery. Pediatric publications, literature reviews, and publications that were unrelated to the care of orthopaedic patients or did not utilize PROMIS in the study were excluded from analysis as identified by individual review of publications. The full text of each publication was independently reviewed by one of two reviewers.

Outcomes and variables collected

We then evaluated and charted each publication for study design, level of evidence, number of patients, and PROMIS domains and instrument format tested. Publications were separated by orthopaedic surgery subspecialties, including Foot and Ankle (FA), Hand and Upper Extremity (HUE), Tumor, Trauma, Adult Reconstruction (AR), Sports, Spine, and General Orthopaedics. We excluded review and editorial publications as well as pediatric orthopaedic surgery publications. Level of evidence was determined following the updated assignments provided by the Journal of Bone and Joint Surgery [18]. PROMIS domains included Global Health (Physical and Mental), PF, UE, LE, Pain Interference, Pain Intensity, Pain Behavior, Depression, Anxiety, Social Satisfaction, and Fatigue. Instrument format of PROMIS domains included CAT or short form. Study design included prospective, retrospective, cross-sectional, randomized controlled trial (RCT), and case report or series.

Assessment of PROMIS validation studies

After initial review, we further assessed each publication to determine whether the performance of PROMIS PH CAT measurement instruments (PF, UE, and LE) in orthopaedic surgery care was analyzed following the validation guidelines developed by Sullivan [12], with respect to reliability, responsiveness, and validity as defined by COSMIN [11]. In this study, a PROMIS validation study refers to a publication that statistically analyzed a PROMIS domain’s reliability [19], responsiveness [20], or validity [21] using the statistical tests described below [22, 23]. The statistical analysis of a PROMIS domain’s validity relates to the evaluation of content validity, construct validity, or criterion validity [11]. Each validation study was assessed for any recommendations on whether the PROMIS PH CAT domains utilized (PF, UE, and LE) were accurate and useful in orthopaedic patients.

Reliability, including internal consistency and inter- and intra-rater reliability, was presented by Cronbach’s alpha, kappa statistics, percentage agreement, or a correlation coefficient [11, 24].
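As an illustration of one of these statistics, the following minimal sketch computes Cronbach’s alpha from a respondents-by-items score matrix. The function name and the response data are hypothetical and were not drawn from any publication reviewed here; the formula is the standard one, comparing the sum of the item variances with the variance of the total score.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses: four respondents, three items scored 1 to 5.
responses = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
print(round(cronbach_alpha(responses), 3))
```

Values closer to 1 indicate that the items vary together, which is the sense of internal consistency used in this section.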

Internal and external responsiveness was assessed using a range of statistical tests including effect size, standardized response mean, relative efficiency statistic, the response statistic, and correlation (using Spearman’s rho) [11, 20, 25]. While evaluating the minimal clinically important difference and floor and ceiling effects of assessment instruments risks spurious change and does not maintain the same statistical integrity as the prior evaluation tests, studies that calculated these values were included as psychometric tests of responsiveness, as these calculations are necessary to measure responsiveness of a given instrument [26].
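For readers less familiar with these statistics, a minimal sketch of three of them follows. The definitions used are common ones and are stated here as assumptions, since individual publications may have used variants: effect size as mean change divided by the standard deviation of baseline scores, standardized response mean as mean change divided by the standard deviation of the change scores, and floor and ceiling effects as the proportions of respondents at the lowest and highest possible scores. All data and function names are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def floor_ceiling(scores, lowest, highest):
    """Proportions of respondents at the lowest and highest possible score."""
    n = len(scores)
    return (sum(s == lowest for s in scores) / n,
            sum(s == highest for s in scores) / n)


# Hypothetical T-scores before and after treatment for three patients.
pre = [40, 45, 50]
post = [46, 49, 58]
print(effect_size(pre, post), standardized_response_mean(pre, post))
```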

Modern validity theory from the psychometric perspective requires specific contexts to be evaluated in order to assess the validity of a PRO assessment instrument [27, 28]. Therefore, the types of validity evaluated by this study sought to indicate how interpretable PROMIS PH CAT scores are in various contexts of orthopaedic clinical care and research [29]. We included the three types of validity defined by COSMIN when evaluating the performance of PROMIS PH CAT: content validity, construct validity, and criterion validity [11]. Assessment of content validity uses judgements from experts in the field to give a scale of relevance for the construct or the dimensions of the construct evaluated, and an average relevance is calculated [30]. Additionally, confirmatory factor analysis, a special form of structural equation modeling, analyzes specific structures and components of the construct through correlations between latent variables, that is, variables mathematically inferred from observed variables [31]. Use of structural equation modeling to confirm specified relationships between PROMIS domains and events of interest in a disease or treatment qualified as measurement of content validity in the publications found [31, 32].

Assuming content validity, construct validity evaluates the consistency of the assessment instrument with different hypotheses [11]. For instance, evaluating construct validity can refer to the ability of PROMIS to discriminate between relevant groups or confirm relationships to known risk factors [11, 21]. Several methods for testing construct validity have been described, including correlation calculations such as Pearson’s correlation coefficient, multivariate analysis, confirmatory factor analysis, and covariance component analysis [33, 34]. Finally, criterion validity refers to the degree to which the assessment instrument correlates with previously validated or gold standard instruments [11], as tested by correlation coefficients [35].
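The correlation coefficients referred to above can be sketched as follows. This is an illustrative implementation only: the paired scores are hypothetical, the function names are ours, and the reviewed publications may have used other software or variants. Pearson’s coefficient measures linear association between two sets of scores; Spearman’s rho is computed here as Pearson’s coefficient applied to ranks, with tied values given their average rank.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranked scores."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical paired scores: a PROMIS CAT T-score and a legacy instrument score.
promis = [38.0, 42.5, 47.0, 51.5, 60.0]
legacy = [35.0, 48.0, 55.0, 70.0, 88.0]
print(round(pearson_r(promis, legacy), 3), round(spearman_rho(promis, legacy), 3))
```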

Results

Selection of sources of evidence

The NLM PubMed database search identified 493 non-duplicated publications. Individual review of the search results identified 102 publications that were not related to the primary objective of the search criteria and were therefore excluded from further analysis: 36 pediatric orthopaedics publications, 29 review studies, 9 editorials, 1 published erratum, and 27 publications that were unrelated to the care of orthopaedic patients or did not utilize PROMIS in the study. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram in Fig. 1 illustrates the sequence of review results collected in this study [36].

Fig. 1

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram for Patient-Reported Outcomes Measurement Information System (PROMIS) publications collected in adult orthopaedic surgery
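The selection counts reported above can be checked with a short tally. The counts are those stated in the text; the variable names and category labels are ours.

```python
# Publications returned by the PubMed search, as reported in the text.
identified = 493

# Exclusions by reason, as reported in the text.
excluded = {
    "pediatric orthopaedics": 36,
    "review studies": 29,
    "editorials": 9,
    "published erratum": 1,
    "unrelated or did not utilize PROMIS": 27,
}

total_excluded = sum(excluded.values())
included = identified - total_excluded
print(total_excluded, included)
```

The tally reproduces the 102 exclusions and leaves 391 publications for assessment.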

All orthopaedic surgery PROMIS publications

Of the total 391 publications assessed (Additional File 1), 153 (39%) were PROMIS PH CAT validation publications. From 2011 through 2020 there were increasingly more orthopaedic PROMIS studies published each year in all specialties except for Trauma (Table 1) and an increase in the number of studies investigating or utilizing PROMIS PH CAT domains in all specialties (Fig. 2). PROMIS publications most often reported HUE outcomes (26%, n = 100), followed by Spine (18%, n = 69) (Table 2). More Level I (8%, n = 3) and RCT (11%, n = 4) studies were published in the Trauma subspecialty relative to other subspecialties. UE, Pain Interference, and Depression domains were utilized the most frequently in HUE; PF and Pain Intensity domains were utilized the most in Spine. Six percent (n = 22) of publications were Level I; 33% (n = 129) were Level II; 50% (n = 196) were Level III; 11% (n = 43) were Level IV.

Table 1 Characteristics of all PROMIS publications by publication year
Fig. 2

Number of all adult orthopaedic surgery Patient-Reported Outcomes Measurement Information System (PROMIS) studies and PROMIS physical health computerized adaptive test (CAT) studies (Physical Function, Upper Extremity, and Lower Extremity) published each year from 2011 through December 31, 2020

Table 2 Characteristics of all PROMIS publications by orthopaedic subspecialty

PF (I: 50%, n = 11; II: 73%, n = 94; III: 68%, n = 133; IV: 58%, n = 25) and then Pain Interference (I: 36%, n = 8; II: 63%, n = 81; III: 51%, n = 99; IV: 53%, n = 23) were utilized the most within each level of evidence, and within each orthopaedic subspecialty with the exception of General Orthopaedics (Table 3).

Table 3 Characteristics of all PROMIS publications by level of evidence

Orthopaedic surgery physical health PROMIS validation publications

Ninety-five percent (n = 146) of all orthopaedic surgery PROMIS PH CAT validation publications determined that the instruments were responsive, reliable, and valid. Two studies in AR (18%), two in HUE (5%), two in Sports (8%), and one in Trauma (10%) did not find PROMIS PH CAT instruments to be valid instruments within their respective fields. Specifically, these studies found problems with PROMIS PH CAT criterion validity and responsiveness.

Eighty-five percent (n = 130) of all orthopaedic surgery PROMIS PH CAT validation publications analyzed PF, 30% (n = 46) analyzed UE, and 3% (n = 4) analyzed LE. More PROMIS PH CAT validation publications were performed in 2019 (35%, n = 53) than any other year (Table 4). PROMIS PH CAT validation publications most often reported HUE outcomes (26%, n = 40), followed by Spine (23%, n = 35) and Sports (16%, n = 24) (Table 5).

Table 4 Characteristics of publications validating physical health CAT PROMIS domains by publication year (includes physical function, upper extremity, lower extremity)
Table 5 Characteristics of publications validating physical health CAT PROMIS domains by orthopaedic subspecialty (includes physical function, upper extremity, lower extremity)

Reliability was the least-often analyzed component of PROMIS PH CAT performance throughout each subspecialty (range, 4–67%), as compared to responsiveness or validity. Reliability was analyzed most frequently in HUE validation studies (n = 12), followed by FA validation studies (n = 6). Responsiveness was analyzed most frequently in HUE validation studies (n = 33), followed by Spine (n = 21) and Sports (n = 19) studies. At least one form of validity (criterion, content, or construct) was analyzed in over 65% of validation studies in every subspecialty and in over 80% of General Orthopaedics, Spine, Trauma, and Tumor validation studies. More than one form of validity was analyzed in over 20% of FA, HUE, Spine, and Tumor validation studies. Five percent (n = 7) of validation publications were Level I studies; 50% (n = 77) were Level II studies; 42% (n = 65) were Level III; 3% (n = 4) were Level IV (Table 6). The largest share of Level I studies was performed in FA (43%, n = 3), of Level II studies in HUE (29%, n = 22), and of Level III studies in both HUE and Spine (25%, n = 16 each).

Table 6 Characteristics of publications validating physical health CAT PROMIS domains by level of evidence (includes physical function, upper extremity, lower extremity)

PROMIS PF CAT specifically was validated in 130 studies, 50% of which were Level II studies. Since 2011, PROMIS PF CAT was analyzed for reliability 29 times, responsiveness 110 times, and at least one form of validity 118 times.

Discussion

The increased utilization of PROMIS measurement instruments across all types of orthopaedic surgery has enabled surgeons to gain a deeper understanding of patients’ physical and mental health while engaging patients more directly in their care. Compared to legacy measurement instruments (ASES, DASH, FAAM, KOOS) which are generally narrow in scope and can incur patient and administrative burden [1, 3], PROMIS CATs have the capacity to be tuned to orthopaedic diseases and improve patients’ experiences in orthopaedic surgery clinics [37]. These tests are enabling surgeons to interpret the patient’s HRQoL before and after treatment [3]. Additionally, understanding the degree and impact of a patient’s pain provides surgeons with a metric for tailoring treatment to each patient’s specific goals and needs, whether that be surgical or medical management [38, 39]. This scoping study demonstrates that in addition to becoming a more frequent subject of analysis, the PROMIS PH CAT domains (PF, UE, and LE) have repeatedly been shown to be reliable, responsive, and interpretable instruments when utilized in most contexts of orthopaedic surgery.

This scoping study determined that from January 1, 2011, through December 31, 2020, the PROMIS PH CAT was found to be interpretable as analyzed by at least one type of validity in various contexts throughout all orthopaedic surgery subspecialties in a total of 146 studies. In particular, PROMIS PF CAT was interpretable in 130 studies, 50% of which were Level II studies. Specific PROMIS PH CAT subdomains were first proposed in 2011 by Hung et al. [8], and have since been tested for reliability 29 times, responsiveness 110 times, and at least one form of validity 118 times. The extensive analysis of PROMIS PH CAT validity demonstrates the potential of PROMIS to assess PH in orthopaedic surgery patients. More importantly, this establishes an instrument that should effectively depict the patient’s perception of physical function status. As a widely interpretable outcome assessment instrument, PROMIS PH CAT may benefit patient care and advance orthopaedic outcomes research.

While PROMIS CAT is being shown to be interpretable more frequently and in more contexts, several limitations remain. Integrating these measurement instruments into electronic medical records remains a substantial obstacle, predominantly due to financial, logistic, and technological barriers [40]. However, large-scale clinical implementation is possible and has valuable potential for improved patient care and experience [41]. Furthermore, while the short form format of PROMIS allows it to be administered as a physical test, the CAT format requires additional technology. The potential benefits outlined above may outweigh these costs in many settings. Additionally, CAT has been shown to have an improved ability to distinguish between two patients with similar health status [42], which can provide valuable insight into small differences that bear on diagnosis and care.

Limitations

We note several limitations of this analysis. The evaluation of each publication was performed by two reviewers, which risked reporting bias of selective inclusion of research findings. However, the studies analyzed had clear descriptions of collection variables and followed the terminology and guidelines created for studies validating assessment instruments [11, 12], which contributed to more reliable evaluation of publications. Utilization of specific statistical methods in evaluation of instrument validation reduced potential disagreement of publication type and analyses performed. Additionally, publications were not evaluated on quality of the results; recommendations for PROMIS instruments from validation studies were taken directly from the publication, following common methodology of scoping studies [13, 43].

Our scoping study solely searched the NLM PubMed database, which risked evidence selection bias due to the potential for missed studies published in other databases. However, the relatively high number of 391 publications demonstrated sufficient evidence of PROMIS usage in orthopaedic surgery. At the time of the search, the orthopaedic surgery Tumor subspecialty had only six PH CAT validation publications, which may be an area of further exploration. Finally, given the nature of a scoping study, the results can only be as good as the publications evaluated. Therefore, each publication was evaluated for number of patients studied and publication level of evidence, and validation was evaluated based on statistical methods. Stratification of the publications based on these variables allows readers to observe these differences and make their own inferences. Regardless of these limitations, our scoping study provides an exhaustive overview of the existing literature on the usage of PROMIS in orthopaedic surgery [13].

Conclusions

PROMIS utilization within orthopaedics as a whole has increased substantially within the past decade, particularly within PROMIS CAT domains. The existing literature reviewed in this scoping study demonstrates that PROMIS physical health CAT domains (PF, UE, and LE) are reliable, responsive, and interpretable in most contexts of patient care throughout all orthopaedic surgery subspecialties. PROMIS enables orthopaedic surgeons to gain a deeper understanding of a patient’s physical and mental health directly from the patient, facilitating the potential to improve shared decision-making and quality of care. With numerous validation analyses of PROMIS PH CAT domains and the increasing utilization of PROMIS instruments, this study demonstrates that PROMIS PH CAT measurement instruments have performed well in various contexts of orthopaedic clinical care and research. Clinicians and researchers should consider the use of PROMIS instruments within each context specifically, but in many instances, PROMIS PH CAT measures may work well in orthopaedic applications. While challenges of integrating these measurement instruments into electronic medical records exist, large-scale clinical implementation is possible and has valuable potential for improved patient care and experience; this implementation process should be an area of further research and a future healthcare objective.