Introduction

Like many other chronic autoimmune diseases, rheumatoid arthritis (RA) is a heterogeneous disorder with a varying and largely unpredictable clinical presentation and disease course.1 The numbers and types of joints and body systems affected by RA generally differ between patients, as do the timing and nature of disease onset and the rapidity and aggressiveness of disease progression. Few reliable prognostic markers exist,2 although many outcomes are recognized by rheumatologists and health authorities to be important in assessing disease status. The three features measured most are acute signs and symptoms, bony (structural) changes, and overall long-term physical function and mental status.3

Until around 20 years ago, the issue of adequate RA disease assessment presented a substantial hurdle to investigators. To address this problem, various measures and scoring systems were developed to assess important clinical and structural outcomes, one of which was adopted by the American College of Rheumatology (ACR).4,5,6,7 Shortly thereafter, the FDA published a guidance document that included the ACR's recommendations for the study of new investigational agents.3 European health authorities concurrently developed and adopted similar outcome measures for new therapies.8,9

Health authorities in the US and Europe suggested that companies wishing to make claims about investigational therapies must study at least the clinical, structural (radiographic) and long-term functional outcomes of RA. To encourage long-term studies, the FDA recommended that, in clinical trials for product licensure, data on these three features should be collected, respectively, for no less than 6 months to identify acute improvements in clinical signs and symptoms, 1 year to assess radiologic effects and 2 years to assess health-related quality of life. The collection of long-term safety information, especially for immunosuppressive, immunomodulatory or immunogenic agents (e.g. biologics), was also encouraged.3,8 One of the key outcomes of efforts to improve clinical trial design has been the development of a number of new therapeutic agents for RA in the past decade or so.10 In particular, new types of therapies, including many new biologic agents, are now available (Table 1).

Table 1 Approval details of major new therapies for rheumatoid arthritis.

This Review discusses the outcome measures used to assess new investigational therapies for the treatment of RA, ways to ensure study rigor and the challenges that lie ahead.

Efficacy outcomes

Clinical assessment

As RA is a heterogeneous disease, improvement in disease status is usually assessed with a composite end point. Various outcome measures are in common use (Table 2). The most commonly used efficacy outcome in the US is the ACR index.11 This outcome measure defines a response to therapy as at least 20% improvement from baseline (ACR20 response) in both tender and swollen joint counts and in at least three of the following five measures: patient global assessment, physician global assessment, pain, disability (each measured by the visual analogue scale), and an acute-phase reactant (measured by erythrocyte sedimentation rate [ESR] or serum C-reactive protein level). More stringent levels of efficacy assessment, requiring a minimum of 50% (ACR50) or 70% (ACR70) improvement in the above parameters, are also widely used. Major clinical improvement is defined as an ACR70 response for at least 6 months, and remission as absence of fatigue, joint pain and tender and swollen joints, with morning stiffness less than 15 minutes in duration and low ESR (<30 mm/h for females; <20 mm/h for males) for at least two consecutive months.6,7 Widespread adoption of the ACR index in the past decade has unified the study of RA in the US and assisted in the substantial advancement of the field.
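The ACR20, ACR50 and ACR70 definitions can be expressed as a short calculation. The Python sketch below is an illustration only, not a validated scoring tool. The function names, the dictionary keys and the handling of a zero baseline are assumptions made for this example, and all seven measures are assumed to be oriented so that lower values indicate less disease.

```python
# Illustrative sketch only; names and data layout are hypothetical.
JOINT_COUNTS = ("tender_joints", "swollen_joints")
OTHER_MEASURES = ("patient_global", "physician_global", "pain",
                  "disability", "acute_phase_reactant")


def percent_improvement(baseline, follow_up):
    """Percentage reduction from baseline (lower values mean less disease)."""
    if baseline <= 0:
        # Assumption: no improvement can be shown from a zero baseline.
        return 0.0
    return 100.0 * (baseline - follow_up) / baseline


def meets_acr(baseline, follow_up, threshold):
    """True if both joint counts and at least three of the five other
    measures improve by at least `threshold` percent."""
    def improved(key):
        return percent_improvement(baseline[key], follow_up[key]) >= threshold

    joints_ok = all(improved(key) for key in JOINT_COUNTS)
    others_ok = sum(improved(key) for key in OTHER_MEASURES) >= 3
    return joints_ok and others_ok


def acr_response(baseline, follow_up):
    """Highest response level reached: 'ACR70', 'ACR50', 'ACR20' or None."""
    for threshold in (70, 50, 20):
        if meets_acr(baseline, follow_up, threshold):
            return f"ACR{threshold}"
    return None
```

Checking the thresholds from highest to lowest means that a patient is reported at the most stringent level reached, which matches how ACR50 and ACR70 are described as stricter versions of the same criteria.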

Table 2 Outcomes used to measure efficacy in rheumatoid arthritis trials.

The main limitation of the ACR index is that it only measures changes in signs and symptoms over time. In part to measure disease status at a given time point as well as changes over time, European investigators developed the Disease Activity Score (DAS). The DAS28, which scores 28 joints, is a calculated, numerical score ranging from 0 to 10 and, like the ACR index, incorporates multiple outcome measures. Data are based on a mixture of objective assessments and a general health assessment reported by patients, which is measured with a visual analogue scale (Table 2).8 Specific cutoff points in the scores indicate low (<3.2), moderate (3.2–5.1) and high (>5.1) disease activity. The European League Against Rheumatism (EULAR) has developed response criteria based on the DAS. These responses are based on improvement in scores from baseline over time following therapy, and are dependent on the patient's baseline condition. For example, for a patient with low baseline disease activity, a good clinical response is defined as an improvement greater than 1.2, a moderate response as an improvement of 0.6–1.2, and no response as an improvement less than 0.6; a patient achieving a final score below 2.6 is defined as being in remission.
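The cutoff points quoted above lend themselves to a simple lookup. The following Python sketch is a hedged illustration: the function names are hypothetical, only the example response bands given in the text are encoded, and rounding the improvement to two decimal places is an assumption added to avoid floating-point artifacts when a difference of two scores is passed in.

```python
# Illustrative sketch only; function names are hypothetical.

def das28_category(score):
    """Disease-activity band using the cutoffs quoted in the text."""
    if score < 3.2:
        return "low"
    if score <= 5.1:
        return "moderate"
    return "high"


def response_band(improvement):
    """Response band for the example quoted in the text.

    `improvement` is the baseline score minus the follow-up score.
    Rounding to two decimals is an assumption of this sketch.
    """
    improvement = round(improvement, 2)
    if improvement > 1.2:
        return "good"
    if improvement >= 0.6:
        return "moderate"
    return "none"
```

For instance, `das28_category(4.0)` returns `"moderate"`, and `response_band(1.5)` returns `"good"`.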

Health authorities have grown accustomed to using both the ACR index and the DAS28 (and their permutations) for interpreting patient data. Different percentage or numerical thresholds for improvement can be set for the ACR or DAS, respectively, which can be measured either at a particular time (e.g. landmark analysis) or calculated over time (e.g. area-under-the-curve). Although these derivative measures of the ACR and DAS often offer increased sensitivity for detecting treatment effects in clinical trials, their relevance to clinical practice is less certain. For example, while the minimum ACR score (ACRn response) can be used to detect smaller incremental improvements in patients' signs and symptoms following therapy, it is a less intuitive quantifier of improvement than other outcomes, and has never been validated as a meaningful or relevant measure of clinical benefit.

Radiographic assessment

Radiographic assessment of pathologic changes in bones and joints related to RA disease activity is one of three parameters that are most frequently used by rheumatologists to assess product efficacy in clinical trials.12 Which joints to include in a scoring system, which bony abnormalities to evaluate (bony erosions, joint space narrowing, malalignment, soft tissue swelling, ankylosis etc.) and which radiographic views to use are all important considerations for investigators and health authorities. In addition, the experience and number of readers of the radiographs, whether scores are determined by consensus or averaged among readers, and the order in which the films are read can all influence the interpretation of radiographic outcome data.12 For example, having radiographs assessed by more than one reader has been shown in a clinical trial to reduce measurement error rates and increase precision.13 One randomized controlled trial by Fries et al.14 showed that using two readers was optimal for clinical studies, and that reader training was essential for consistent results.

The Sharp score15 and Larsen score,16 and their modifications, are radiographic scoring systems that have been widely used in clinical trials (Table 2). Both of these systems measure the bony damage of representative joints of the hand and/or foot—frequently from the posterior–anterior view.

The Sharp scoring system is currently the most widely used radiographic scoring system in clinical trials. The system has been modified a number of times since it was originally proposed in 1971. These modifications include one from Sharp himself in 1985 (now considered the standard Sharp method), one in 1989 by van der Heijde17,18,19 and one in 1998 by Genant (widely in use today).20

Sharp's modified method considers 17 areas for erosion and 18 areas for joint space narrowing in each hand; these are the two types of bony abnormalities deemed the most important for assessing damage in RA. Each erosion scores one point, up to a maximum of five points for each area. Joint space narrowing is scored on a scale of 0–4. For example, a score of 1 is given when there is evidence of only focal narrowing, while a score of 4 is assigned when ankylosis is present. Thus, with both hands scored, total erosion scores range from 0 to 170, and total joint space narrowing scores range from 0 to 144.13,15
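The quoted maxima of 170 and 144 can be checked with simple arithmetic. The Python sketch below assumes that the area counts apply to each hand and that both hands are scored, which is the reading that reproduces the published totals. The constant and function names are hypothetical, and the range checks are an addition for the example.

```python
# Illustrative sketch only; names are hypothetical.
HANDS = 2
EROSION_AREAS_PER_HAND = 17
EROSION_MAX_PER_AREA = 5
JSN_AREAS_PER_HAND = 18   # JSN = joint space narrowing
JSN_MAX_PER_AREA = 4

max_erosion = HANDS * EROSION_AREAS_PER_HAND * EROSION_MAX_PER_AREA  # 2 * 17 * 5
max_jsn = HANDS * JSN_AREAS_PER_HAND * JSN_MAX_PER_AREA              # 2 * 18 * 4


def sharp_totals(erosions, narrowing):
    """Sum per-area scores for both hands after basic range checks.

    `erosions` holds one score per erosion area (both hands combined),
    `narrowing` one score per joint-space-narrowing area.
    """
    if len(erosions) != HANDS * EROSION_AREAS_PER_HAND:
        raise ValueError("unexpected number of erosion areas")
    if len(narrowing) != HANDS * JSN_AREAS_PER_HAND:
        raise ValueError("unexpected number of narrowing areas")
    if any(not 0 <= s <= EROSION_MAX_PER_AREA for s in erosions):
        raise ValueError("erosion score out of range")
    if any(not 0 <= s <= JSN_MAX_PER_AREA for s in narrowing):
        raise ValueError("narrowing score out of range")
    return sum(erosions), sum(narrowing)
```

With every area at its maximum, `sharp_totals` returns the pair `(max_erosion, max_jsn)`, which corresponds to the upper ends of the two ranges given in the text.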

In van der Heijde's modification of Sharp's method, erosions and joint space narrowing in the feet are also assessed and scored on a scale of 1–10. Fewer joints are assessed in the feet than in the hands, however, so the hands still contribute more than the feet to the final, overall score. In Genant's modification, only bones of the hands are scored, but erosion is scored on an eight-point scale with increments of 0.5, and joint space narrowing is scored on a nine-point scale with the same increments. The final score is then normalized to a scale of 0–200.

The strengths and weaknesses of these scoring systems, including their sensitivity, specificity and time needed for completion, are often debated. The more-detailed Sharp and van der Heijde versions are considered by some to have better sensitivity for changes than the less-detailed Genant modification; however, the increased time needed to complete more-detailed systems is considered to be a notable drawback by some investigators.13

Radiographic benefits can be detected as early as 3–4 months, and certainly by 6 months, following initiation of some therapies. To encourage sponsors to conduct trials for longer durations, however, the FDA and other health authorities recommend that patients be treated for at least 12 months before definitive claims about radiographic improvement are made. Furthermore, the authorities recommend that the study of radiographic progression of disease should continue for several years to determine the longer-term benefits. Many observational studies that measure radiographic progression last 2–5 years or longer, and many biopharmaceuticals in these studies have shown significant benefits on patient function and disease modification.21

Functional assessment

Patient function (disability) was included as a core feature of disease activity by the rheumatologists who devised the ACR criteria,6 but it is frequently reported as a separate outcome measure. To encourage long-term assessments of patient function, the FDA has effectively required that patients be followed up for a minimum of 2 years.3 Several tools describe metrics for this feature (Table 2).

The Health Assessment Questionnaire (HAQ) was originally developed in 1978 at Stanford University, CA, USA.22 It was one of the first methods developed to enable measurement of patient function, and is widely used—in various forms—in RA and other diseases. The results achieved with HAQ are not, however, as sensitive or specific as those yielded by the Short Form-36 Questionnaire (SF-36) physical functioning scale, role–physical scale or physical component score used in patients with psoriatic arthritis, systemic lupus erythematosus and systemic sclerosis, respectively.23,24,25,26 The original HAQ includes 20 questions in eight categories (Table 2). Each category contains two or three questions on dressing, standing, eating, walking, toileting, reach, grip and instrumental activities. Answers are scored from 0 to 3, with 3 representing the worst functioning. The highest score is taken as the score for that category, and the final HAQ score is the average of those for all categories. Shorter versions of the HAQ, including the modified HAQ, also exist.27
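The scoring rule described above (highest answer per category, then the average across categories) can be sketched as follows. This Python example is a simplified illustration under stated assumptions: the function name and input layout are hypothetical, and the sketch implements only the rule as described here, omitting any adjustments that the full instrument may apply.

```python
# Illustrative sketch only; function name and input layout are hypothetical.

def haq_score(category_answers):
    """Average of the per-category maxima.

    `category_answers` maps each category name to the list of answers
    (integers 0 to 3, with 3 representing the worst functioning).
    """
    if not category_answers:
        raise ValueError("no categories supplied")
    category_scores = []
    for name, answers in category_answers.items():
        if not answers or any(a not in (0, 1, 2, 3) for a in answers):
            raise ValueError(f"invalid answers for category {name!r}")
        # The highest answer is taken as the score for the category.
        category_scores.append(max(answers))
    return sum(category_scores) / len(category_scores)
```

Because each category is represented by its worst answer, a single severe limitation within a category raises the overall score even when the other answers in that category are 0.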

The SF-36 is a generic measure of patient function that is used across an even wider range of diseases than the HAQ. The form includes 36 questions across eight health domain areas, divided into both mental and physical aspects of health. The SF-36 has been shown to be well suited to capturing overall patient benefits in studies of RA treatment.28

Safety outcomes

Sponsors of new therapies for RA must obtain adequate safety information. Below are some safety outcomes that are of particular interest for many new RA therapies.

As treatment for RA frequently involves either suppression or substantial modulation of the immune system, the risks of infection or malignancy are often increased.29 This is particularly true for biologic agents that target potent immune activators (e.g. tumor necrosis factor). Diligent monitoring of patients in clinical trials and long-term, post-marketing registries is often necessary to prevent complications.

The emergence of new biologic therapies has also increased the importance of assessing immunogenicity, which can result in depletion and/or inhibition of the treatment. Other safety concerns include serum-sickness-like syndromes.30,31

Trial design considerations

Unmet medical needs in the patient population

Unmet medical needs are perhaps the most important considerations in assessing the overall benefits and risks of new investigational therapies, and can be defined by the seriousness of the patients' condition and the number of effective treatments available to treat it. Box 1 shows the commonly used definitions for patient populations with differing unmet medical needs enrolled in clinical trials.

Patient populations, comparator groups and methods of comparison

Clinical trials in RA are designed to compare the efficacy and safety of new investigational agents with those of existing alternatives. Comparisons can be made with a treatment used in a control group or with the patient's baseline condition. As patients in randomized controlled trials are randomly assigned to receive either the existing or investigational therapies, the treatments must be equal in terms of their anticipated likelihood to benefit (or potentially harm) patients. Such equality, known as equipoise, can be affected by many factors, including the risks and benefits of therapies used in the comparator arm, the likelihood of the new investigational therapy yielding benefits or adverse effects, and the manner in which patients will be monitored and removed from therapy in the event of a lack of efficacy or the development of new adverse events.32

Discussion of equipoise and the appropriate patient population for study is very often complicated, and requires careful consideration and co-ordination with health authorities. Experienced sponsors of clinical trials for new therapeutic agents provide the FDA and other health authorities with careful assessments of the available data for the new and existing therapies to be studied, in addition to detailed patient monitoring plans that include the number and timing of clinic visits and radiographic assessments. Investigators should also consider the use of an external data safety monitoring board to prospectively define stopping rules that specify alterations to the study and/or when a patient is to be removed from a study, should adverse events occur.

Dosage

During product development, it is important to systematically explore and establish the minimum effective dose and the maximum tolerated dose of a drug, whether dose adjustments are required to improve safety or tolerability, the effects of dose on pharmacokinetic (e.g. absorption, distribution, metabolism and excretion) and pharmacodynamic (e.g. effects on immune function) parameters, and the effects of other concomitant therapies on the investigational agent (i.e. drug–drug interactions).3

Challenges for the future

The success of rheumatologists so far in devising meaningful and effective outcome measures for RA and implementing rigorous trial designs for new therapeutic agents has resulted in substantial advances in the past decade or so. Clear regulatory frameworks for sponsors of new investigational agents have also been established. The involvement of investigators, health authorities, key opinion leaders and others in the field of rheumatology is a unique success story in the annals of modern medicine. New and remaining challenges must, however, be addressed if the field is to continue to progress.

With the development of many new therapeutic agents, important new questions have arisen regarding the characterization and optimization of their beneficial effects, their role in treatment algorithms, and their long-term safety. In addition, new approaches must be added to the three essential clinical areas of RA (signs and symptoms, disease progression and function) that are currently measured in clinical trials if novel agents are to be optimally and expeditiously developed. A partial list of challenges now facing rheumatologists can be found in Box 2.

A discussion of the challenges facing rheumatologists would not be complete without mentioning the issues of patient recruitment and overall trial quality and relevance. With the plethora of therapies now available for testing in RA, and the associated worldwide interest in the field, competition for patients to participate in clinical studies has notably intensified. The recruitment of adequate numbers of patients to clinical trials is becoming increasingly difficult. Moreover, differing standards of medical care and trial quality around the world have sometimes adversely affected the relevance and utility of clinical data. Improvements in efficiency and greater standardization of methods in clinical trials are necessary if the field is to continue to progress. Rheumatologists need to work towards conducting smaller and shorter studies that use radiographic, serologic, genetic, proteomic and other biomarkers as prognostic indicators and therapeutic outcomes. The current emphasis on empiric clinical data in studies needs to be broadened to include a range of biomarker data. Health authorities need to encourage more-efficient trial designs that maximize the potential to detect early signals of efficacy and minimize the continued enrollment of patients who are responding poorly to therapy.

Finally, rheumatologists need to emphasize the importance of overall life-cycle management for new therapies. That is, novel products for the treatment of RA need to be continually studied and characterized throughout their life span, and questions regarding their safety and efficacy need to be continually asked and addressed, if the field is to advance and patients are to maximize their benefits from new therapies.

Conclusions

These times are both exciting and demanding for rheumatologists. Despite the challenges of studying RA, a multisystem and heterogeneous disease with variable presentations and prognoses, investigators, key opinion leaders and health authorities in rheumatology have been successful in developing meaningful outcome measures and other trial design parameters for the study of new investigational therapies. As a result, many effective new therapies are now available for patients with RA. However, many challenges remain or are arising. The effectiveness of combination therapies, including combinations of small molecules and/or biologic therapies in early disease to maximize patient benefit and long-term patient function, is becoming an increasingly important concern. Likewise, new definitions and outcome measures that adequately characterize the nature, durability and clinical relevance of new bony tissue resulting from therapy are needed. Finally, identifying new clinical trial designs that are more efficient, quicker to identify patient responders and oriented towards effectively obtaining safety information is likely to become a priority for future investigators. Rheumatologists must meet these new challenges if the field is to progress and patients are to further benefit from new therapies.

Review criteria

No formal search of the literature was conducted. Evidence was taken from previous work, efforts and systematic reviews by the author and colleagues, from their own collections of articles, and from the author's direct experience at the FDA, where he co-authored a guidance document in this area.