Application of Item Response Theory to Modeling of Expanded Disability Status Scale in Multiple Sclerosis
- 1.6k Downloads
In this study, we report the development of the first item response theory (IRT) model within a pharmacometrics framework to characterize the disease progression in multiple sclerosis (MS), as measured by Expanded Disability Status Score (EDSS). Data were collected quarterly from a 96-week phase III clinical study by a blinder rater, involving 104,206 item-level observations from 1319 patients with relapsing-remitting MS (RRMS), treated with placebo or cladribine. Observed scores for each EDSS item were modeled describing the probability of a given score as a function of patients’ (unobserved) disability using a logistic model. Longitudinal data from placebo arms were used to describe the disease progression over time, and the model was then extended to cladribine arms to characterize the drug effect. Sensitivity with respect to patient disability was calculated as Fisher information for each EDSS item, which were ranked according to the amount of information they contained. The IRT model was able to describe baseline and longitudinal EDSS data on item and total level. The final model suggested that cladribine treatment significantly slows disease-progression rate, with a 20% decrease in disease-progression rate compared to placebo, irrespective of exposure, and effects an additional exposure-dependent reduction in disability progression. Four out of eight items contained 80% of information for the given range of disabilities. This study has illustrated that IRT modeling is specifically suitable for accurate quantification of disease status and description and prediction of disease progression in phase 3 studies on RRMS, by integrating EDSS item-level data in a meaningful manner.
Key Wordscladribine tablets disease progression model expanded disability status scale item response theory multiple sclerosis
Multiple sclerosis (MS) is a chronic inflammatory and neurodegenerative disease of the central nervous system (1). MS affects over 1 million people worldwide, and it is the leading cause of non-traumatic disability in young adults (2). Over 80% of all patients present with relapsing-remitting MS (RRMS), which is characterized by unpredictable acute episodes of neurological dysfunction named relapses, followed by variable recovery and periods of clinical stability.
The heterogeneity of the MS patient population and complexity of its clinical course have offered challenges to the quantification of disease severity and progression. The clinical manifestations of the disease are extremely variable, even in an individual patient, ranging from motor and sensory problems to cognitive and affective disorders, which renders it necessary to use multidimensional outcome measures.
Since the 1960s, many scales for rating disability caused by MS have been proposed, but none has been entirely satisfactory (3). The Kurtzke Expanded Disability Status Scale (EDSS) remains the most widely used scoring system in MS. Its assessment is based on seven functional systems including vision, brainstem, pyramidal, cerebellar, sensory, bowel and bladder, mental (cerebral), and ambulation (500-m walk), and reliance on aid. The EDSS is a summarized measure which ranges from 0 (normal neurological exam) to 10 (death due to MS) in incremental steps of 0.5 (4). Despite its wide use and acceptance, there are several perceived problems with the use of the scale, such as limited inter-rater reproducibility, bimodal distribution of the scale, and potentially unequal steps, mostly due to its ordinal nature (5, 6). Its overall score is greatly weighted toward ambulation, especially in higher scores (EDSS > 3.5) (7) and is rather insensitive to cognitive or upper limb dysfunctions. It is important to note that EDSS itself is rarely used as a clinical endpoint in MS clinical trial, but rather the EDSS-related endpoint: time to sustained EDSS progression.
Quantifying the disease severity in MS is important to monitor individual patients during their treatment and for evaluating experimental therapies in clinical trials. As increasing numbers of treatment options become available, sensitive clinical outcome measures that can detect small changes in disability that reliably reflect long-term changes in disease progression are required. Identifying effective treatments depends upon the availability of outcome measures that exhibit good sensitivity to rates of changes caused by the disease.
Traditionally, item response theory (IRT) models have been applied in educational testing to measure ability or proficiency and in psychological assessments to measure personality traits (8). Also, health outcome researchers have been employing IRT to questionnaire development, evaluation, and refinement (9). IRT is a statistical theory consisting of mathematical models expressing the probability of the particular response to a scale item as a function of an underlying trait, here disability of a person (10). IRT models are also referred to as latent trait models, because the latent “unobservable” trait of interest cannot be measured directly and is therefore assessed indirectly by scoring various items constructed to measure that underlying domain. Traditional scoring consists of summarizing all the information in one composite score, which might lead to loss of information captured in the individual item. The recent application of IRT to Alzheimer’s disease has demonstrated that increased precision in cognitive assessment can be achieved by not only considering scores on item level, but also how those items function and the amount of information they contain for the studied population (11, 12).
Here, we report the development of the first IRT model within a NLME (non-linear mixed effect) framework in MS therapeutic area. Analysis was based on the data from CLARITY (CLAdRIbine Tablets treating multiple sclerosis orallY) study where cladribine was found to reduce, as compared to placebo, the risk of 3-month sustained progression, by 33 and 31% in the cladribine 3.5 and 5.25 mg/kg groups, respectively (13). Giovannoni et al. have reported that the administration of cladribine tablets have been found to be also efficient in regard to other studied clinical endpoints: annualized relapse rate (primary endpoint), percentage of relapse-free patients, and occurrence of magnetic resonance imaging (MRI) detected brain lesions. The current work investigates the possibility of quantification of MS disease progression and of effect of cladribine tablets. We also explore the information content of each item constituting EDSS.
MATERIALS AND METHODS
Patients and Study Design
Data from the CLARITY clinical trial were included in the analysis. CLARITY was a phase III randomized, multi-center, double blind, parallel group, controlled study, evaluating the efficacy and safety of 3.5 and 5.25 mg/kg cumulative doses of cladribine tablets over 96 weeks in patients with RRMS. Enrolled subjects had a diagnosed definite relapse-remitting form of multiple sclerosis, according to the McDonald criteria (14). Outcome assessments were conducted in identical fashion to all other major MS clinical trials. The blind was maintained by utilizing a treating physician who viewed clinical laboratory results, and assessed adverse events and safety information. Patients received neurological assessment at baseline and every 12 weeks thereafter for the duration of the study by an independent blinded evaluating physician. The additional details of the study protocol, subject characteristics, and study results can be found in the original publication (13).
Analyses were performed in the software NONMEM 7.2.0, and Laplacian estimation method was applied for parameter estimation (15). The simulation-based diagnostics were realized using computer-intensive statistical methods available in the Perl-coded program PsN (16).
In addition to seven polychotomous items of functional systems with internal ranking, EDSS comprises measures of ambulation (0–500 m walk) and reliance on aid (0, 1, 2). According to neurostatus definition (www.neurostatus.net), it is the combination of ambulation and reliance on aid that is affecting the determination of EDSS and not one of those variables independently. This was used as a rational for combining those two variables in the IRT analysis. Thus, a polychotomous variable with 11 categories, called ambaid was defined as following: ambaid = 0: ambulation ≥ 500 m and aid = 0; ambaid = 1: 300 m ≤ ambulation ≤ 499 m and aid = 0; ambaid = 2: 200 m ≤ ambulation ≤ 299 m and aid = 0; ambaid = 3: 100 m ≤ ambulation ≤ 199 m and aid = 0; ambaid = 4: 5 m ≤ ambulation ≤ 99 m and aid = 0 or ambulation ≥ 50 m and aid = 1 or ambulation > 120 m and aid = 2; ambaid = 5: 10 m < ambulation ≤ 49 m and aid = 1 or 10 m ≤ ambulation ≤ 120 m and aid = 2; ambaid = 6: ambulation ≤ 5 m and use of standard wheelchair; ambaid = 7: ambulation of few steps requiring aid to transfer and use standard wheelchair with assistance or motorized wheelchair; ambaid = 8: patient is wheelchair bound and capable of “many” self-care; ambaid = 9: patient is bed bound and capable of “some” self-care; ambaid = 10: patient is bed bound and not capable of any self-care.
Parameters aj and bj characterizing item specific parameters were modeled as fixed effects, while the IRT disability D was modeled as subject-specific random effect, assuming normal distribution with a mean of zero and fixed variance of 1, meaning that 68% of the population will be within the IRT disability range of (−1, 1). The assumed scale of D goes from –∞ to + ∞, and it is relative to the studied population with a typical patient at baseline having an IRT disability of 0. In the case when scores of an item were not occurring in the available data, merging of scores with a closest observed score was performed.
Model development was conducted in five sequential steps: development of the baseline model; development of disease progression model based on placebo data; development of the exposure-response model based on data from patients on cladribine treatment; development of the covariate model; and model evaluation.
After the drug model was developed, all model parameters were re-estimated simultaneously based on all available data.
Age and clinical covariates (disease duration and number of relapses in the preceding 12 months (EXNB)) were evaluated for their potential to account for the variability in baseline IRT disability and in slope of disease progression of the full model described above, using a full random effect models (FREM) approach (20). Covariates were introduced as observed variables, and their distribution was modeled as random effects. A full covariance matrix between random effects for parameters and covariates was estimated together with other model components. Coefficients for covariate-parameter relations were obtained from the ratio of covariance between parameter and covariate variability to the covariate variance.
Model discrimination between hierarchical models was primarily numerical and based on the likelihood ratio test of obtained objective function values (OFVs). For model selection, a significance level of p < 0.05 was used, with the degrees of freedom being equal to the difference in the number of parameters between two models.
Model evaluation was carried out through simulation-based diagnostics, mainly visual predictive checks (VPCs). Two hundred Monte Carlo simulation replicates of the original dataset with 95% prediction intervals were generated. Simulations were performed both on item level and on total score level. An algorithm was developed using the R program (21), to derive the total EDSS scores from individual item scores.
Calculation of Information Content
From the developed IRT model, the Fisher information for estimating a patient’s IRT disability was calculated for each item constituting EDSS as minus the expectation of the second derivative of the log-likelihood. Subsequently, the information content for each item was computed for the studied population, and items were ranked according to the amount of information they contained.
Based on obtained item ranking, it was investigated whether a shorter version of the EDSS including only the most informative items would be able to identify patients with sustained progression equally well as the original scale. For this purpose, sustained progression was defined as a confirmed increase in EDSS after a period of at least 3 months with the increase defined in relation to the baseline, of ≥1.5 points if baseline EDSS was 0; ≥1 points if baseline EDSS was ≥1.0 and ≤4.5; ≥0.5 point if baseline EDSS was ≥5.0 (22). IRT disability status was determined based on all or on the subset of EDSS items, and then 200 simulations were performed using the developed model. The proportions of patients identified as progressing according to the original and shorten EDSS form were compared.
Summary of Patient Baseline Characteristics
CLARITY cladribine 3.5 mg/kg
CLARITY cladribine 5.25 mg/kg
Sex no. (%)
Age of disease onset
Figure 1 also highlights an expected score larger than 0 for the sensory, mental, and visual item for individuals considered healthy (IRT disability = −4); this can be explained by non-MS-related impairment of those functions, as those are not MS-specific symptoms.
For most of the items, score of 0 is the most frequently observed score. Also, the probability for a score of 0 drops quickly as the IRT disability increases, except for ambaid item where probability of having score of 0 remains 100% with increasing IRT disability until a certain level of IRT disability is reached. This is in line with common clinical knowledge that only patients with advanced stage of the disease will start experiencing impaired ambulation (EDSS higher than score of 4).
Probability curves for different scores of some items (e.g., mental) overlap over a range of IRT disability levels, indicating that a specific item does not differentiate well between those scores for a given range of IRT disability.
Figure in supplemental 1 shows that the frequency with which the score is observed at baseline is captured within the 95% prediction interval of the model.
Disease Progression Model Based on Placebo Data
A significant positive correlation of 0.59 (p < 0.001) was observed between baseline IRT disability and the disease progression rate, indicating that patients with higher IRT disability at baseline are likely to progress faster. Positive slope of disease progression, significantly different from zero (p < 0.001) was estimated. The estimated disease progression rates were on IRT disability scale. Simulations were also performed which translates those results to the EDSS scale, and according to these simulations, the typical patient in this dataset receiving placebo treatment will progress 0.16 EDSS points over 2 years.
with IRT disability at baseline (D0), disease progression rate (α), power constant (pwr), maximal exposure-dependent drug effect (Emax), exposure needed for half maximal effect (Exps50), and constant exposure-independent drug effect (EffD).
The effect of cladribine on IRT disability was best described using both exposure-dependent and exposure-independent drug effects. The final model suggests that cladribine treatment significantly (p < 0.001) slows disease-progression rate, with a 20% decrease in disease progression rate compared to placebo, irrespective of exposure in the investigated cumulative dose range (20–600 mg). The model also describes an exposure dependent decrease in IRT disability in patients treated with cladribine tablets with a cumulative dose of 407 mg being needed for half maximal (exposure-dependent) effect in a typical patient, which would translate for a typical patient receiving a typical dose of 240 mg in 45% reduction of disease progression.
Covariate analysis revealed that baseline IRT disability was correlated with age, duration of disease, and EXNB by coefficients of 0.027, 0.037, and 0.075, respectively. This means for instance that a typical patient of 58 years, who is 20 years older than the population’s mean of 38 years, will have a baseline IRT disability that is 0.54 (i.e., 20*0.027) units higher on the disability scale, then the IRT disability of a typical patient with the mean age in this population. Similarly, there is a 7.5% increase in baseline IRT disability per number of relapses (>1) in the year previous to the study. Coefficients for covariates effects on the slope of disease progression were 0.0053, 0.0054, and 0.05 for age, duration of disease, and EXNB, respectively.
Population Parameter Estimates from the Final Model
Parameter estimates (RSE%a)
Disease progression slope
Disease progression power
Emax exposure-dependent drug effect
Exp50 exposure-dependent drug effect
Constant exposure-independent drug effect
Calculation of Information Content
Ranking of EDSS Components by Information Content in Studied Population
Cumulative % total
Bowel and bladder
From this, the EDSS4 scale, based only on the four most informative items, was derived and then evaluated by computing the ratios of patients classified as progressing for EDSS4 (shortened version) and EDSS8 (original version). Based on simulations, proportions of progressing patients were very similar independent of used scale (95% CI [0.92, 1.06]).
Using the data from a phase III clinical trial, IRT methodology has been successfully implemented for the first time to model EDSS in patients with RRMS. The model reported here was developed using data from a clinical trial investigating the effect of cladribine tablets on RRMS. The drug effect model is certainly specific to cladribine, but the implementation of IRT methodology to EDSS as well as the description of time-course of disease progression has broader applicability, beyond cladribine tablets.
Traditional approaches to analyze questionnaire-based scales generally disregard the underlying nature of the data and usually regard only summary scores. In the past, EDSS has been modeled either as a continuous variable (19) or as an ordered categorical variable with considerable simplification of the scale (20 categories combined into 5–6 categories) (23, 24). Instead of modeling changes in the composite score over time, application of IRT allows derivation of underlying/unobserved latent variable from observed subscores and model the changes in that latent variable over time. The IRT methodology has been applied here to order categorical data, but it has been shown by Ueckert et al. that it is equally suitable for other types of non-continuous data, such as binary or count data (12).
The effectiveness of therapeutic interventions can be determined, only if accurate quantification of disease severity is possible. Central to the patient, the most important therapeutic aim of any disease modifying treatment of MS is to prevent or postpone long-term disability. In phase III trials, various surrogate measures such as relative reduction in annualized relapse rate and risk of 3-month sustained progression have been used as predictors for this disability, but there is limited evidence that those changes reflect true irreversible accumulation of disability at long-term scale (25). Both analyses of CLARITY trial data, our IRT analysis of EDSS subitems and traditional statistical analysis of time to sustained progression have found that cladribine tablets have an effect on the studied endpoints “disease progression” and “risk of 3-month sustained progression of disability”. However, using time to first confirmed disability progression as an endpoint in MS clinical development does not allow for a description of disease progression, as we know that disability progression does not stop after the first event. In contrast, our model can be used to understand time-course of disease and effect of the treatment, and a role of the individual components of EDSS. It can also be used for clinical trial simulations.
Another aspect of the slowly progressing and highly variable nature of the disease is that patients may remain at the same score for a prolonged period of time. According to the model developed by Savic et al.(19) for instance, a typical patient will experience 0.14 EDSS units increase in disease severity over 2 years on placebo treatment and only 0.024 points increase when treated with a 3.5 mg/kg cumulated dose of cladribine. Using IRT, disability may be determined more accurately than with the composite EDSS score. Results shown in Fig. 2 for instance reveal that each EDSS score corresponds to a wide range of underlying IRT disabilities, indicating that total scores are relatively imprecise measures of underlying IRT disability. Better quantification of disease severity will also improve our assessment of disease progression and treatment effect. Comparable results were obtained for ADAS-cog score in Alzheimer’s disease (11).
Full EDSS assessment takes over 40 min to be performed by a neurologist, hampering its use in everyday clinical practice. Increased efficiency could be achieved with optimal selection of the most informative subset of items. Here, we use Fisher information as a measure of item information content as it directly relates to the expected variance of the individual latent variable estimates. Conceptually, we are able to choose the items that have the largest signal to noise ratio, i.e., where a functional change relates most directly to a change in disease state. We have shown that 80% of the information about underlying disease status in MS, in the studied population, is quantified in only four of the eight EDSS items, namely cerebellar, pyramidal, ambaid, and bowel and bladder. Simulations have demonstrated that our proposed shortened scale performs equally well as the full EDSS scale when it comes to determining a clinically meaningful measure, the ratio of patients experiencing a 3-month sustained progression. With this example, we have just demonstrated how a rational subselection of items can be made if one wants to simplify the test. This could be taken even further by turning it into a dynamic process—the answer to the first item evaluation directs which item to investigate next.
MS affects functional systems of the EDSS differently as identified by Healy et al. (25). They have demonstrated that the time to sustained progression varied widely across the EDSS items; it was the fastest for the pyramidal and sensory scales and the slowest for brainstem and visual scales. Identification of subgroups of patients more likely to experience substantial worsening of the disease, by focusing on specific sensitive items, will increase the difference in drug effect between groups, if one is in fact present. Thus, the insight into information content on item level achieved through IRT analysis could be used as a valuable tool, in combination with other study enrichment strategies.
Despite its weaknesses, the extensive use of EDSS in patients with RRMS is likely to be continued. Current treatments have been authorized based on clinical trials using EDSS as one of the endpoints, and EMA requires new therapies to be compared to existing ones by using the same outcome measures to demonstrate their effectiveness (26). Also, on the individual patient level, there is a need for continuity in use of outcome measures in order to ensure the long-term records of disease severity (27). Moreover, the clinical course of RRMS can vary tremendously, and it is likely that different outcome measures are demanded in different stages of the disease (3). Establishing and quantifying the relationship between different outcome measures has been proven challenging in the past (25, 28).
One of the ways to enable the direct comparison of results obtained on different scales for disease severity would be by the application of IRT. Ueckert et al. have demonstrated the possibility of jointly analyzing different ADAS-cog variants, without any recalculation or normalization of measured scores. In the field of MS, this approach could be utilized to bridge between the diversity of scores that are used for quantification of disease progression, as IRT disability levels of patients can be easily compared once outcome measures on the different scales have been mapped to overall IRT disability. This approach will also allow evaluation of performance of one assessment method relative to another.
Accurate quantification of disease status and description and prediction of disease progression is essential for drug development. For chronic diseases with slow progression such as multiple sclerosis, this is especially pertinent, due to the high costs of long-term clinical trials required to establish treatment efficacy. This study has illustrated that IRT modeling is specifically suitable for this purpose in phase 3 studies on RRMS, by integrating EDSS item level data in a meaningful manner instead of aggregating information by deriving a composite score.
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement n° 115156, resources of which are composed of financial contributions from the European Union’s Seventh Framework Programme (FP7/2007–2013) and EFPIA companies’ in kind contribution. The DDMoRe project is also supported by financial contribution from Academic and SME partners. This work does not necessarily represent the view of all DDMoRe partners.
Compliance with Ethical Standards
Trial protocols, amendments, and subject informed consent forms were reviewed by a national, regional, or investigational site ethic committees or an Institutional Review Boards. The study was conducted in accordance with the ethical principles of the Declaration of Helsinki and the International Conference on Harmonization Tripartite Guidelines for Good Clinical Practice.
Conflicts of Interest
The authors declare that they have no conflict of interests.
- 10.Baker FB. The basics of item response theory. 2001.Google Scholar
- 12.Ueckert S, Plan EL, Ito K, Karlsson MO, Corrigan B, Hooker AC, et al. Improved utilization of ADAS-Cog assessment data through item response theory based pharmacometric modeling. Pharm Res 2014.Google Scholar
- 15.Beal SL, Sheiner LB, Boeckmann A, Bauer RJ. NONMEM user’s guides (1989–2009). Ellicott City: Icon Development Solutions; 2009.Google Scholar
- 19.Savic RM, Munafo A, Karlsson MO. Disease progression model for multiple sclerosis and effect of cladribine tablets therapy on clinical endpoints. San Diego: ACOP; 2011.Google Scholar
- 20.Karlsson MO. A full model approach based on the covariance matrix of parameters and covariates. PAGE 2012.Google Scholar
- 21.Team RC. R: A Language and Environment for Statistical Computing. 2014.Google Scholar
- 26.Draft of Guideline on clinical investigation of medicinal products for the treatment of Multiple Sclerosis. 2012.Google Scholar
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.