Background

For a randomised controlled trial (RCT) to discern the true effect of an intervention, relevant and robust outcome measures must be chosen. A standardised set of outcome measures used across similar types of trials has the potential to increase their efficiency and value by enabling comparisons between trials and pooling of data, thereby providing more precise estimates of the treatment effect.

Twenty years ago, the World Health Organization (WHO) and the International League of Associations for Rheumatology (ILAR) established a core set of outcomes for clinical trials in rheumatoid arthritis. This work originated from the Outcome Measures in Rheumatology (OMERACT) group, which developed a framework and methodology (the OMERACT Filter) for the identification and validation of core outcome measurement sets for use in clinical trials for any health condition [1]. The OMERACT group has gone on to develop successful core outcome measurement sets for other conditions, including ankylosing spondylitis and gout, and the OMERACT Filter and methodology have been widely adopted internationally within the rheumatology community [1,2,3] and other disciplines [4,5,6].

Within the discipline of orthopaedic surgery, the development of a core outcome measurement set for trials involving patients with hip fractures is underway [7]. To our knowledge, there are currently no standardised or universally accepted core outcome measurement sets for clinical trials of joint replacement surgery. With over a million hip and knee joint replacements performed each year in the USA alone [8], and the technology for joint replacement surgery evolving rapidly, there is a need for high-quality RCTs. The use of standardised outcome measures in trials involving joint replacement will facilitate accurate and effective comparisons of new and existing joint replacement implants and techniques, as well as evaluation of the value of pre- and post-operative interventions.

In order to improve the reporting of relevant health outcome domains within joint replacement trials and develop a standard core set, a working group within OMERACT was established in 2008 and preliminary work was completed [9,10,11]. This work demonstrated the lack of well-validated outcome instruments in knee and hip clinical trials and identified the need to develop core outcome domains and a core outcome measurement set with the goal of harmonisation of outcome measures used in joint replacement clinical trials.

The OMERACT Filter 2.0 defines three “core areas” that should be measured within a clinical trial of any disease condition: death, life impact, and pathophysiological manifestations [1]; it also strongly recommends the measurement of resource utilisation. The OMERACT Filter 2.0 provides a roadmap describing the steps to achieve a final core measurement set for clinical trials for a given condition. Firstly, it recommends that relevant stakeholders start by identifying at least one “domain” within each of the core areas to formulate the “core domain set” (see Additional file 1). At least one applicable measurement instrument for each core domain is then identified to formulate a “core outcome measurement set.” Each measurement instrument must prove to be truthful (valid), discriminative, and reliable.

At the OMERACT-12 Meeting (2014), clinical and methodological experts in epidemiology, psychometrics, orthopaedics, and rheumatology, along with patient partners interested in harmonising outcomes for people undergoing joint replacement surgery, met as a working group. The ultimate aim of the group is to develop and reach international consensus on a core outcome measurement set for joint replacement surgery. In preparation for the meeting, we systematically examined the outcomes reported in all randomised controlled trials of joint replacement surgery published in 2008 and 2013. We found suboptimal reporting of primary outcomes in total joint replacement (TJR) trials as well as heterogeneity in the primary outcomes when reported [12]. In this paper, we report the extent to which the outcomes reported in the trials fulfil the OMERACT Filter 2.0 core areas of mortality, life impact, and pathophysiological manifestations, and the OMERACT Filter 2.0 strongly recommended area, resource use.

Methods

We undertook the review in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [13, 14]. A completed PRISMA checklist is provided as an additional file (see Additional file 2). The protocol for this review was registered with the International Prospective Register of Systematic Reviews (PROSPERO; Registration number: CRD42014009216).

We included all randomised or quasi-randomised (where allocation is not strictly random) controlled trials investigating joint replacement surgery (defined as substitution of any joint surface with a prosthesis) in adult patients (≥18 years) published in either 2008 or 2013. We restricted the review to these 2 years for two reasons. First, we anticipated that 2 years of data, including a recent year, would provide a reasonable sample size for our main objective of assessing consistency with the OMERACT Filter 2.0 [1]. Second, a secondary objective was to assess study quality and outcome reporting over time (2008 to 2013; reported in a separate manuscript [12]); because we expected >100 studies per year to be eligible, limited resources prohibited a review of 6 years of trial data. We excluded trials investigating spinal joint replacement surgery and those trials where the intervention of interest was not part of the intraoperative insertion of a joint replacement prosthesis, for example, trials investigating pre-operative education, peri-operative analgesia, or post-operative care.

The comparator could include another type of joint implant, surgical placebo or sham, usual care, physical therapy, or other active treatments. Trials were included if at least one outcome had been reported. Only trials published in English as full articles or available as a full trial report were included.

On 20 March 2014, we searched the Cochrane Central Register of Controlled Trials, MEDLINE, and EMBASE, and hand searched the reference lists of relevant articles, for randomised or quasi-randomised controlled trials. We limited the search to publications in 2008 and 2013 in order to capture recent trials. The search strategy used for MEDLINE is provided as an additional file (see Additional file 3).

Two authors (BR and PW) independently assessed the search results based on the title and abstract, and the full texts of all potentially eligible studies were then assessed to identify studies that fulfilled inclusion criteria. Any disagreement in study selection was resolved by consensus or by discussion with a third review author (RB).

Trial details were extracted for each trial including the first author, year of publication, and interventions. Additional details including number of participants, year of recruitment, study duration, and sample size were also extracted but are reported in a separate manuscript [12].

We extracted all outcome measures using a standardised data extraction form. Outcome measures were first grouped into outcome domains, and the domains were then grouped according to the three OMERACT core areas (pathophysiological manifestations, life impact, and death) or the recommended area (resource use). Joint-specific multidimensional outcome measures were broken down into their constituent outcome domains and then grouped according to the same four OMERACT areas. The data were then aggregated and reported using simple summary statistics.
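The grouping and aggregation steps described above can be illustrated with a minimal Python sketch. The domain-to-area mapping, the trial records, and the function names are hypothetical and chosen for illustration only; they are not the extraction form, the mapping, or the data used in this review.

```python
from collections import Counter

# Hypothetical mapping of outcome domains to OMERACT Filter 2.0 areas.
# The domain names and their assignments are illustrative assumptions only.
DOMAIN_TO_AREA = {
    "pain": "life impact",
    "function": "life impact",
    "implant migration": "pathophysiological",
    "range of motion": "pathophysiological",
    "mortality": "death",
    "length of stay": "resource use",
}

# The three core areas; resource use is recommended rather than core.
CORE_AREAS = {"pathophysiological", "life impact", "death"}


def areas_covered(domains):
    """Return the set of OMERACT areas covered by a trial's outcome domains."""
    return {DOMAIN_TO_AREA[d] for d in domains if d in DOMAIN_TO_AREA}


def summarise(trials):
    """Aggregate area coverage across trials using simple summary statistics."""
    n = len(trials)
    # Trials whose outcomes touch every one of the three core areas.
    all_core = sum(1 for domains in trials.values()
                   if CORE_AREAS <= areas_covered(domains))
    # Number of times any reported domain maps to each area.
    instances = Counter(DOMAIN_TO_AREA[d]
                        for domains in trials.values()
                        for d in domains if d in DOMAIN_TO_AREA)
    mean_domains = sum(len(set(d)) for d in trials.values()) / n
    return {
        "n_trials": n,
        "n_all_core": all_core,
        "pct_all_core": round(100 * all_core / n),
        "mean_domains": mean_domains,
        "instances_per_area": dict(instances),
    }


# Made-up example trials, not data from the included studies.
example_trials = {
    "trial_a": ["pain", "function", "implant migration", "mortality"],
    "trial_b": ["pain", "range of motion"],
    "trial_c": ["function", "implant migration", "length of stay"],
    "trial_d": ["pain", "function", "range of motion", "mortality", "length of stay"],
}

summary = summarise(example_trials)
```

In this sketch a trial counts as covering all three core areas only if at least one of its domains maps to each area, which mirrors the criterion used when reporting the proportion of trials fulfilling the core areas.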

Results

There were a total of 1635 potential studies identified from the initial searches after de-duplication (41 duplicates in 2008 and 60 duplicates in 2013), and 70 trials (30 published in 2008 and 40 published in 2013) met the eligibility criteria and were included in the review (Fig. 1). Screening of titles/abstracts was done over 3 weeks, data abstraction over the next 4–6 weeks and data analyses for the 4 weeks after that. No published trials of joint replacement involving the foot, ankle, or elbow were identified. There were 27 trials for hip, 39 trials for knee, three trials for shoulder, and one trial for replacement surgery of the small joints (Table 1). The inter-rater agreement was 86% for 2008 and 93% for 2013 initial abstractions. One hundred percent consensus was reached by discussion and with involvement of a third reviewer. There were 13 joint-specific multidimensional outcome tools reported; all of which measured outcome domains of both pain and function (Table 2). Nine (69%) of the joint-specific multidimensional outcome tools were patient reported.
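As an illustration of how an inter-rater agreement figure of this kind could be obtained, the following Python sketch computes simple percent agreement between two reviewers. This is an assumption: the statistic actually used for the agreement values reported here is not stated, and the function name and example decisions are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters recorded the same value."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must assess the same number of items")
    if not rater_a:
        raise ValueError("At least one item is required")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100 * matches / len(rater_a)


# Made-up abstraction decisions for seven items (not data from this review).
reviewer_1 = ["pain", "function", "death", "pain", "revision", "function", "pain"]
reviewer_2 = ["pain", "function", "death", "function", "revision", "function", "pain"]

agreement = percent_agreement(reviewer_1, reviewer_2)
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic would be an alternative choice.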

Fig. 1 We identified 1635 potential studies from the initial searches after de-duplication (41 duplicates in 2008 and 60 duplicates in 2013 were removed). Seventy trials (30 published in 2008 and 40 published in 2013) met the eligibility criteria

Table 1 Studies of hip and knee arthroplasty from 2008 to 2013
Table 2 Constituent outcomes for multidimensional joint-specific outcome tools

A mean of six outcome domains were reported per trial. Twenty-two (31%) trials reported outcome domains/measures in all three of the essential OMERACT core areas (pathophysiological, life impact, and death), and 21 (30%) trials reported outcome domains/measures in the recommended area of resource utilisation.

Hip replacement trial outcome domains

Twenty-seven trials of hip replacement surgery were included (10 published in 2008 and 17 published in 2013) (Table 3). Eighteen unique outcome measures were identified with a mean of six outcome measures per trial. Eleven (41%) trials reported an outcome domain/measure within all three of the essential OMERACT core areas. The most common outcome domains/measures reported were pain (20/27, 74%) and function (23/27, 85%).

Table 3 Hip RCT outcomes and their mapping to the three core and one optional OMERACT areas/domains

Seven unique outcome domains/measures mapped to core area pathophysiological, five mapped to life impact, five mapped to resource use and one mapped to death. Core area pathophysiological was represented most frequently with 86 instances of mapping to this area.

Knee replacement trial outcome domains

Thirty-nine trials of knee replacement surgery were included (19 published in 2008 and 20 in 2013) (Table 4). Twenty-one individual outcome domains/measures were identified with a mean of six per trial. Nine (23%) trials reported an outcome domain/measure within all three of the essential OMERACT core areas. The most common outcome domains/measures were pain (26/39, 67%) and function (27/39, 69%). Nine outcome domains mapped to pathophysiological, five mapped to life impact, six mapped to resource use and one outcome mapped to death. Core area pathophysiological was represented most frequently, with 150 instances of mapping to this area.

Table 4 Knee study outcomes and their mapping to three core and one optional OMERACT areas/domains

Shoulder replacement trial outcome domains

There were three (4%) trials of shoulder replacement surgery (see Additional file 4). Outcome domains/measures of pain, strength, and activity levels were reported in all three trials. Seven outcome domains mapped to pathophysiological, three mapped to life impact, and one outcome domain mapped to death. Core area pathophysiological was represented most frequently with 12 instances of mapping to this area.

Hand joint replacement outcome domains

There was one (1%) trial involving replacement of the small joints of the hand (see Additional file 4). It reported six individual outcome domains/measures, with four mapping to the pathophysiological and two to the life impact core areas.

Discussion

The purpose of this systematic review was to examine and highlight inconsistencies in reporting of joint replacement trials and make recommendations for future studies in the area. The review has shown that there are significant gaps in the measurement of OMERACT core outcome areas in joint replacement trials. Fewer than a third (31%) of trials captured outcome domains/measures within all three essential OMERACT core areas. The majority of joint replacement trials (but not all) did, however, capture outcome domains/measures of pain (71%) and function (77%). This finding is in keeping with the primary indications for joint replacement surgery, which are to relieve pain and improve function. All of the joint-specific multidimensional outcome tools included in the trials capture both pain and function, which reflects that these measures are well established and accepted by the orthopaedic community for monitoring outcomes after joint replacement surgery [15].

All trials captured domains within the core area of pathophysiological manifestations, with many trials reporting surrogate outcome domains such as radiostereometric analysis (RSA) and plain radiographs to assess implant loosening. RSA uses X-rays to determine the implant position and is a well-validated tool for measuring the movement of implants following joint replacement surgery. RSA requires specialist equipment and training, and its role is therefore largely limited to early/short-term clinical evaluation of joint replacements. The correlation between movement detected on RSA and longer term clinically meaningful implant failure is not well documented or validated [16]. It is not surprising that the OMERACT Filter 2.0 framework specifies both pathophysiological manifestations and life impact (such as pain, function, mobility, quality of life) as two of the three core areas for any disease construct. In our example, the Filter 2.0 indicates that it is just as important (if not more so) to know the true clinical impact of a difference in implant positioning between interventions, i.e. implant failure/revision and pain, function, and quality of life (impact on the patient), as it is to know the exact positioning of the implant (e.g. by RSA).

Measurement of mortality is one of the three core areas of the OMERACT Filter 2.0, but it was reported in only 36% of the trials reviewed. In addition, none of the trials reported whether or not mortality was considered attributable to the interventions under study or to the underlying condition/s. Measurement and reporting of 7-, 30- and 90-day mortality, or mortality during the trial (3, 6, or 12 months), could capture potential intervention-related versus unrelated deaths and be supplemented with a case-by-case review to determine the cause of death. For joint replacement, which is usually an elective procedure, mortality is rare and unexpected. Therefore, mortality reporting is very important. As in any clinical trial, study subject mortality is always known to the investigator and its reporting is quite simple, e.g. stating that “there were no deaths in this trial,” and/or adding a row with zeros (or the number applicable) to the table showing adverse events of each intervention being compared.
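The fixed-window mortality reporting suggested above could be tabulated along the lines of the following Python sketch. It assumes that the number of days from surgery to death is recorded for each participant who died; the function name, the data layout, and the example numbers are hypothetical and not drawn from any included trial.

```python
# Follow-up windows (days after surgery) suggested in the text.
WINDOWS = (7, 30, 90)


def mortality_by_window(days_to_death, n_participants, windows=WINDOWS):
    """Cumulative deaths and mortality (%) within each post-operative window.

    days_to_death holds one entry per participant who died: the number of
    days between surgery and death. Survivors are simply not listed.
    """
    if n_participants <= 0:
        raise ValueError("n_participants must be positive")
    report = {}
    for w in windows:
        # Count every death that occurred on or before day w.
        deaths = sum(1 for d in days_to_death if d <= w)
        report[w] = {"deaths": deaths, "pct": 100 * deaths / n_participants}
    return report


# Made-up trial arm: 200 participants, deaths on days 5, 28 and 61.
arm_report = mortality_by_window([5, 28, 61], n_participants=200)

# An arm with no deaths still yields an explicit row of zeros to report.
no_deaths = mortality_by_window([], n_participants=150)
```

Producing a row even when no deaths occurred matches the point made above that a row of zeros can be reported alongside other adverse events. Attribution of each death to the intervention would still require the case-by-case review described in the text.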

We also found that fewer than a third (30%) of trials captured the OMERACT recommended area of resource utilisation. Without comprehensive data about resource utilisation, it is difficult to determine the true comparative effectiveness (and cost-effectiveness) of one type of joint replacement compared to another. A potential reason for this gap may be a lack of appropriate outcome measures or a lack of consensus as to which outcome measure/s to use. Joint replacement is typically an elective surgery, and therefore, in principle, resource utilisation is pertinent and appropriate to capture from both the individual’s and the system’s perspective. Outcome tools would need to be identified that could capture not only the initial costs of surgery and follow-up hospital visits for the individual but also any additional costs incurred as a result of further surgery or its complications.

One of the limitations of this review is that we only included two snapshots of joint replacement research trials, i.e. trial results published in 2008 and 2013. Our results may therefore not be truly representative of periods just before, between, and after these dates. On the other hand, there is no reason to suspect that outcomes/measures and trial reporting would differ significantly in other years.

Successful adoption of the original OMERACT filter [17] for validation of measures has led to the successful development and implementation of core domain sets and core measurement sets for various rheumatic and non-rheumatic diseases [1, 4,5,6, 18]. An updated version, OMERACT filter 2.0, is based on the WHO framework [1]. OMERACT filter 2.0 provides a practical framework to develop and validate domains and measures for any health condition. A pragmatic approach is to use a data-driven, consensus-based process with multi-stakeholder involvement to define a minimum measurement set for all joint replacement trials. In line with the OMERACT working group’s future agenda for achieving an international consensus-based core domain set for joint replacement trials, and building upon the findings of this review, we have derived a preliminary core domain set for joint replacement clinical trials based on the OMERACT filter 2.0 and multi-stakeholder consensus. The joint replacement clinical trial core domain set includes six core domains: pain, function, patient satisfaction, revision, adverse events, and death [19].

Conclusions

In conclusion, this systematic review provides insights into the outcome areas/domains being used and reported in contemporary joint replacement RCTs and highlights the gaps in this area. The minimum standard of outcome reporting within joint replacement trials needs improvement. The OMERACT Filter [1] provides a well-established methodology for improving this, i.e. guidance and methods for developing a core outcome measurement set. RCTs are expensive, time-consuming studies. As researchers, we have a duty to patients to extract as much clinically useful information as possible. The development of a core outcome measurement set for joint replacement trials would help to strengthen both the design and subsequent reporting of results in much the same way as it has within rheumatology clinical trials, and could advance the field at an accelerated pace by allowing comparisons across trials and standard meta-analyses.