Introduction

Paroxysmal nocturnal hemoglobinuria (PNH) is a rare and life-threatening hematologic disorder with significant morbidity and premature mortality [1]. People with PNH may present with hemoglobinuria, thrombosis, impaired kidney function, abdominal pain, dysphagia, pulmonary hypertension, chest pain, dyspnea, erectile dysfunction in males, end organ damage, and/or severe fatigue [2,3,4,5,6,7]. PNH is characterized by dysregulation of the terminal complement pathway, leading to intravascular hemolysis and thrombosis. Such patients generally have a poor quality of life (QOL) [8]. If untreated, up to 35% die within 5 years of diagnosis [2, 3, 9,10,11,12,13]. Although onset can occur at any age, PNH has a worldwide mean age of diagnosis of 39.3 years (SD = 18.6) [2, 3, 14,15,16]. The prevalence rate is 12–13 per 1,000,000 persons and is similar across sexes but higher among older adults [17]. Its clinical course is highly unpredictable [3, 7]. Some patients have sudden onset and rapid progression to death, whereas others have long-term chronic illness but few life-threatening complications [3].

Eculizumab is a complement component-5 (C5) inhibitor that has been the standard of care since 2007, with evidence of lower mortality [18], improved QOL [19], reduced thrombosis risk, and normal life expectancy [9, 10, 12, 20]. Because of the treatment burden [21, 22] imposed by biweekly doses of eculizumab, recent clinical trials compared it with ravulizumab. Ravulizumab is a recently developed C5 inhibitor that produces immediate, complete, and sustained inhibition of C5 with an extended, 8-week dosing interval. Two head-to-head randomized clinical trials documented the non-inferiority, safety, tolerability, and efficacy of the two drugs. Trial 301 (ALXN1210-PNH-301) was implemented in people with PNH naïve to complement inhibitors [22]; Trial 302 (ALXN1210-PNH-302), in people with PNH who were stable on eculizumab for at least 6 months and of whom half were randomized to switch to ravulizumab [21]. The most frequently reported adverse event was headache, with slightly higher rates for ravulizumab [21].

One important indicator of treatment effectiveness is whether the treatment can enable a normal QOL; however, achieving “normal” or near-normal levels is a “high bar” for conditions like PNH. It is a particularly challenging question because there is no validated disease-specific patient-reported outcome (PRO) measure for PNH [23]. Because PNH’s QOL impacts are similar to those of hematologic cancers, the pivotal trials collected data on cancer-specific QOL measures. Published results reported no difference between the treatments on the Functional Assessment of Chronic Illness Therapy (FACIT)-Fatigue [21, 22] and showed improvements on the European Organisation for Research and Treatment of Cancer (EORTC) QLQ-C30 Global Health Status/QOL score [22]. Understanding how PNH EORTC scores compare to general-population values would be important for characterizing the QOL impact of ravulizumab and eculizumab.

A substantial evidence base of research across a broad range of patient populations has documented that people living with chronic or terminal illness evaluate their QOL differently than the general population does [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. These response-shift effects reflect changes in their internal standards, values and/or conceptualization of QOL over time [43, 44]. Such changes might, for example, lead to a different way of thinking about “moderate” versus “little” fatigue compared to someone who has never had this blood disease (i.e., recalibration or change in internal standards). They may change their ideas of what is important to role functioning [45], for example, leading to different priorities and thus a different perspective on how well they are functioning (i.e., reprioritization or change in values) [40]. They may change the way they define QOL, for example by focusing less on economic or professional achievements and more on family welfare or intimacy (i.e., reconceptualization or change in conceptualization) [46]. Response-shift effects are natural and common concomitants to treatment outcomes [47,48,49]. When adaptive, they can help people maintain a homeostasis or stability in QOL that enables better affective and eudemonic well-being [50, 51].

We hypothesize that PNH patients whose condition is well-managed will evidence response-shift effects. Evaluating response-shift effects is akin to studying an iceberg: one notices the portion that stands out from the surface (e.g., surprising or paradoxical findings), and then examines indicators of what is below to characterize the object’s nature and size (e.g., information about differences in correlations among variables, item-response parameters, or cognitive-appraisal processes).

The present study thus evaluated the impact of ravulizumab and eculizumab on patients’ QOL as measured by the EORTC QLQ-C30 after 26 weeks of treatment, as compared to general-population norms. This treatment period is generally accepted as sufficient to achieve a stable, well-managed condition. The present work thus provides a normative comparison by examining the same PRO in people with PNH and the general population. The study then investigated response-shift effects by examining differential item functioning (DIF) [52]—by treatment, by group as compared to the general population, and over time, the latter two suggesting and reflecting response-shift effects.

Methods

Sample

This post-hoc secondary analysis utilized three data sources: two PNH clinical trials and one general-population study. Both trials were phase-3, open-label studies evaluating the non-inferiority of ravulizumab compared to eculizumab in changing primary and secondary clinical endpoints. Trial 301 (ALXN1210-PNH-301) was implemented in people with PNH not previously treated with complement inhibitors [22]; Trial 302 (ALXN1210-PNH-302), in people with PNH who were stable on eculizumab for at least 6 months and of whom half were randomized to switch to ravulizumab [21]. Data available for analysis included longitudinal follow-up from baseline through the extension trials, at which time all participants received ravulizumab, with total follow-up time typically 12 months (mean = 11.9; SD = 2.2; range = 0.3–19.4). For complete details on trial inclusion and exclusion criteria and procedures, see references [21, 22]. The trials were conducted in accordance with the provisions of the Declaration of Helsinki, the International Conference on Harmonization guidelines for Good Clinical Practice, and applicable regulatory requirements. The trials were approved by the institutional review board at each participating institution. All the patients provided written informed consent before participating.

The general-population study provided a 2015 cross-sectional sample from 11 European countries. Further country-specific norm data were obtained from Russia, Turkey, Canada, and the United States. Ethical approval was not sought as this study was solely based on panel research data collected by GfK SE. The survey conformed to the required ethical standards by obtaining written informed consent from all participants and collecting data completely anonymously [53].

Measures

The EORTC QLQ-C30 is a comprehensive cancer-specific measure containing 30 items covering five function subscales (physical, role, emotional, cognitive, social); nine symptom subscales/items (fatigue, nausea/vomiting, pain, dyspnea, insomnia, appetite loss, constipation, diarrhea, financial difficulties); and a global health status/QOL subscale [54, 55]. Higher scores on the function and global health status/QOL scales and lower scores on the symptom scales reflect better health/QOL [56]. Of note, each individual item’s response options, except those for global health status/QOL, are ordered toward worsening health (i.e., a higher response value indicates worse health), which will be specifically relevant for selected analyses.
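To make the direction of the scale scores concrete, the following minimal sketch applies the linear transformation to a 0–100 range that is conventionally used for QLQ-C30 scales. It is an illustration only and not the code used in this study: the function names and example responses are hypothetical, and the sketch assumes 4-point items (range 3) for function and symptom scales and 7-point items (range 6) for global health status/QOL.

```python
def raw_score(items):
    """Mean of the item responses that make up one scale."""
    return sum(items) / len(items)


def function_score(items, item_range=3):
    """Function scales are reversed so that a higher score means better functioning."""
    return (1 - (raw_score(items) - 1) / item_range) * 100


def symptom_score(items, item_range=3):
    """Symptom scales/items keep the item direction: a higher score means more symptoms."""
    return (raw_score(items) - 1) / item_range * 100


def global_qol_score(items):
    """Global health status/QOL items use a 7-point response scale (range 6),
    where a higher response already means better health."""
    return (raw_score(items) - 1) / 6 * 100


# Hypothetical respondent: no physical limitations, moderate fatigue, good global QOL.
print(function_score([1, 1, 1, 1, 1]))   # best possible physical functioning
print(symptom_score([2, 3, 2]))          # fatigue on the 0-100 metric
print(global_qol_score([6, 6]))          # global health status/QOL
```

Because function scales are reversed and symptom scales are not, a respondent who answers "not at all" to every problem item scores 100 on function and 0 on symptoms.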

Demographic characteristics collected for all datasets included age, sex, and region. From the trial datasets, baseline clinical variables included in the analysis were lactate dehydrogenase or LDH stratum (< 1.5× upper limit of normal [ULN]; 1.5–< 3 × ULN; or ≥ 3 × ULN); pRBC stratum (0 units; 1–14; or > 14), and binary flags for aplastic anemia, immunosuppressant treatment, myelodysplastic syndrome, and bone marrow disorder.

Statistical analysis

Analyses were conducted for the overall PNH group versus general population and by PNH risk-factor group. Risk-factor groups were created based on clinical indicators known to be associated with worse PNH outcomes (Table 1). An initial risk-factor score was based on a weighted sum of these indicators. The binary flags were given a weight of one (i.e., no = 1, yes = 2), whereas the LDH stratum was given a higher weight (i.e., stratum 1 = 2; stratum 2 = 4, stratum 3 = 6). Since pRBC was not used in the 302 trial, it was not included among the clinical indicators used for the risk-factor score. This weighting approach was based on input from a knowledgeable PNH clinician (AGK). The resulting score ranged from 6 to 12, and it was used to create a lower-risk-factor group (score 6–8) and a higher-risk-factor group (score 9–12).
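The weighting scheme described above can be sketched as follows. This is a hedged illustration rather than the study code: the field names are hypothetical, and it assumes that the four binary flags listed under Measures (aplastic anemia, immunosuppressant treatment, myelodysplastic syndrome, bone marrow disorder) are the indicators summed together with the LDH stratum, which is consistent with the reported minimum score of 6.

```python
# Points per LDH stratum, as described in the text (stratum 1 = 2, 2 = 4, 3 = 6).
LDH_POINTS = {1: 2, 2: 4, 3: 6}

# Assumed set of binary clinical flags (hypothetical field names).
BINARY_FLAGS = (
    "aplastic_anemia",
    "immunosuppressant_treatment",
    "myelodysplastic_syndrome",
    "bone_marrow_disorder",
)


def risk_factor_score(ldh_stratum, flags):
    """Weighted sum: each binary flag scores no = 1 / yes = 2, plus the LDH points."""
    flag_points = sum(2 if flags[name] else 1 for name in BINARY_FLAGS)
    return LDH_POINTS[ldh_stratum] + flag_points


def risk_factor_group(score):
    """Scores of 6-8 form the lower-risk-factor group; 9 and above, the higher group."""
    return "lower" if score <= 8 else "higher"


# Hypothetical patient: LDH stratum 2 with a history of aplastic anemia.
patient = {name: False for name in BINARY_FLAGS}
patient["aplastic_anemia"] = True
score = risk_factor_score(2, patient)
print(score, risk_factor_group(score))
```

Under these assumptions, a patient in the lowest LDH stratum with no flags scores 6, the bottom of the reported range.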

Table 1 Deriving the PNH risk-factor score

Multivariate Analysis of Covariance (MANCOVA) compared people with PNH on ravulizumab or eculizumab after 26 weeks to the general-population sample. Group was coded such that those on ravulizumab and eculizumab were each compared to the general population, the referent group. Dependent variables for a first model included function and global-QOL scale scores, and for a second model, symptom scale scores/items. Age, sex, and region were included as covariates. MANCOVAs were also computed separately for lower and higher PNH-risk-factor groups as a way of adjusting for PNH severity.

Similar MANCOVA models were also computed by PNH-risk-factor group at baseline to check that results of the above models were likely results of treatment rather than of preexisting characteristics of the study samples.

Because the general-population sample was disproportionately large, model results are reported in terms of Cohen’s d statistic [57], expressed in standard-deviation units, to emphasize the degree to which group differences may have been clinically important. Using Cohen’s criteria, a d of 0.2–0.49 is considered a small effect size, 0.5–0.79 is medium, and 0.8 or greater is large [57].
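As a minimal sketch of this effect-size metric, the example below computes Cohen’s d from group means and standard deviations and labels it using the cutoffs above. The pooled-standard-deviation form is an assumption, since the text does not state which denominator was used; the "negligible" label for values below 0.2 and the example numbers are also hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using a pooled standard deviation (assumed form)."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def effect_size_label(d):
    """Cohen's criteria as cited in the text; the sign of d is ignored."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label not used in the text; added for completeness


# Hypothetical scale scores: treated group vs. reference group.
d = cohens_d(80.0, 20.0, 100, 74.0, 20.0, 100)
print(round(d, 2), effect_size_label(d))
```

Unlike a p value, d does not shrink or grow with sample size, which is why it is the more informative summary when one group is very large.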

Heat maps were used to illustrate group differences by computing this same effect size using means and standard deviations by age and gender groupings. Formatting of tables and figures illustrates effect-size magnitude, with more saturated color indicating larger effect.

Past research on item response and response shift has built on structural equation models [41] or item response theory (IRT) models [42]. Here, initial efforts used a bifactor model for the function scores (poor model fit) and multidimensional IRT models for function and symptom scores (models did not converge due to identifiability problems). The present work thus utilized a logistic-regression framework to test for DIF [43]. Accordingly, we adapted response-shift operationalizations by building upon this prior work.

In this study, recalibration response shift is operationalized as uniform DIF over time, because it reflects the idea that, for a given group, the difficulty of endorsing an item may change over time, after adjusting for the total subscale score (i.e., the latent trait). For example, uniform DIF would reflect a specific emotional-functioning item being easier or harder to endorse than one might expect, given a certain level of overall emotional functioning.

Reprioritization response shift is operationalized as non-uniform DIF over time because the relative difficulty of endorsing an item over time may change across the total score on the domain. This type of response shift is captured by item discrimination or slope. For example, non-uniform DIF would reflect a specific emotional-functioning item becoming easier or harder to endorse over time than one might expect, given a certain trajectory of overall emotional functioning.

DIF analyses [58, 59] were conducted on the 24 EORTC QLQ-C30 items belonging to scales with at least two items. The basic DIF analyses used ordinal logistic regression and involved building three nested models:

  • Model 1: Logit[P(Y ≤ j)] = αj + b1(Total Score);

  • Model 2: Logit[P(Y ≤ j)] = αj + b1(Total Score) + b2(Group); and

  • Model 3: Logit[P(Y ≤ j)] = αj + b1(Total Score) + b2(Group) + b3(Total Score * Group),

where P(Y ≤ j) represents the cumulative probability of a response in rating-scale category j or lower, each αj is a category-specific regression constant (threshold), and each b is a regression coefficient.
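To illustrate how the cumulative logits in these models translate into probabilities for each response category, the following sketch converts a set of thresholds and a linear predictor into category probabilities. It is illustrative only: the threshold and coefficient values are hypothetical, and the function names are not from any statistical package.

```python
import math


def logistic(x):
    """Inverse of the logit function."""
    return 1 / (1 + math.exp(-x))


def category_probabilities(thresholds, linear_predictor):
    """Turn cumulative logits into P(Y = j) for j = 1..J.

    thresholds: increasing constants alpha_1..alpha_(J-1)
    linear_predictor: b1*(Total Score) [+ b2*(Group) + b3*(Total Score*Group)]
    """
    # P(Y <= j) for j = 1..J-1, then P(Y <= J) = 1 by definition.
    cumulative = [logistic(a + linear_predictor) for a in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        previous = c
    return probabilities


# Hypothetical 4-category item with three thresholds and Model 1's single predictor.
b1, total_score = 0.04, 10.0
probs = category_probabilities([-1.0, 0.0, 1.0], b1 * total_score)
print([round(p, 3) for p in probs])
```

Adding the Group term (Model 2) shifts all cumulative logits by the same amount, whereas the interaction term (Model 3) lets that shift depend on the total score.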

The log-likelihood ratio test for statistical significance compared Model 1 versus 2, Model 2 versus 3, and Model 1 versus 3. Uniform DIF is characterized by b2 being significant and the log-likelihood test comparing Models 1 and 2 being significant (i.e., there is a significant main effect for Group). Non-uniform DIF is characterized by b3 being significant and the log-likelihood test comparing Models 2 and 3 being significant (i.e., there is a significant Group-by-total score interaction). Combined uniform and non-uniform DIF is characterized by the log-likelihood test comparing Models 1 and 3 being significant.
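The nested-model comparisons can be sketched as below. This is a simplified illustration under stated assumptions, not the study’s analysis code: it takes already-fitted log-likelihoods as inputs (the values shown are hypothetical), assumes a single Group indicator so that each added term costs one degree of freedom, uses only the likelihood-ratio tests (the coefficient-level tests on b2 and b3 are omitted), and relies on closed-form chi-square tail probabilities that hold only for 1 or 2 degrees of freedom.

```python
import math


def chi2_upper_tail(statistic, df):
    """Upper-tail chi-square probability; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return the likelihood-ratio statistic and its p value for two nested models."""
    statistic = max(0.0, 2 * (loglik_full - loglik_reduced))
    return statistic, chi2_upper_tail(statistic, df)


def classify_dif(loglik_m1, loglik_m2, loglik_m3, alpha=0.05):
    """Flag uniform, non-uniform, and combined DIF from the three model log-likelihoods."""
    _, p_uniform = likelihood_ratio_test(loglik_m1, loglik_m2, df=1)
    _, p_non_uniform = likelihood_ratio_test(loglik_m2, loglik_m3, df=1)
    _, p_combined = likelihood_ratio_test(loglik_m1, loglik_m3, df=2)
    return {
        "uniform": p_uniform < alpha,
        "non_uniform": p_non_uniform < alpha,
        "combined": p_combined < alpha,
    }


# Hypothetical log-likelihoods for one item: adding Group helps, the interaction does not.
print(classify_dif(-1000.0, -995.0, -994.9))
```

In practice the log-likelihoods would come from the fitted ordinal logistic regressions, and the degrees of freedom would equal the number of parameters added between models.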

DIF was computed in three ways to test distinct hypotheses, which tested one alternative explanation (first hypothesis) prior to testing for more definitive evidence of response-shift effects (second and third hypotheses, respectively):

DIF by treatment compared ravulizumab and eculizumab groups on item difficulty (threshold) and item discrimination (slope) in the longitudinal data. If significant, this type of DIF would suggest that the two treatment groups are responding differently to the EORTC items, and thus one cannot validly compare their responses.

DIF by group compared people with PNH to the general-population group at one point in time: after 26 weeks on therapy and at the single time point collected in the general-population study. In this analysis, domain scores were first grand-mean-centered to aid interpretation. When uniform DIF was detected, the associated odds ratio indicated the “favored” group: when > 1.0, the PNH group was more likely than expected to endorse (i.e., endorsing was “easier”); when < 1.0, the PNH group was less likely than expected to endorse (i.e., endorsing was “harder”). If the associated log-likelihood test’s p value was significant (i.e., < 0.05), this type of DIF showed that the groups were responding differently to the items. The use of the term “harder” reflects the centrality of the idea of difficulty in the study of item response. Greater item difficulty would mean a higher bar for endorsing a particular response option, given one’s total score on that domain. Such systematic differences between people with PNH and the general population would suggest that the two groups do not have a similar contingent true score, meaning that they are thinking about the QOL item(s) differently in terms of frame of reference, sampling of experience, standards of comparison, or patterns of emphasis. Fuller explanation of these concepts can be found in [46, 49]. Because the data testing this DIF hypothesis are measured at one point in time, response shift is not a definitive explanation and would require longitudinal data for confirmation.

DIF over time compared, for people with PNH, slopes and thresholds over the course of the pivotal and extension trials, to test for intra-individual changes. If significant, this type of DIF provides further support for recalibration and reprioritization response-shift effects. This DIF would demonstrate that individuals with PNH change the cognitive-appraisal processes underlying their item response, i.e., that their contingent true score changes over time.

Multilevel modeling was used to account for the multiple data points per person used for the DIF-by-treatment and DIF-over-time analyses. SPSS Release 27 [60] and Stata/IC 16.1 [61] were used for all analyses.

Results

Sample

The study samples included 441 people with PNH, of whom 246 had participated in trial 301 and 195 in trial 302. In trial 301, 214 people were on eculizumab and 224 on ravulizumab. In trial 302, 107 people were on eculizumab and 111 on ravulizumab. The PNH group was further characterized as 224 with lower and 217 with higher levels of risk factors. The EORTC general-population sample included 15,386 people. Table 2 provides descriptive statistics on demographic information shared between the two study samples. Table 3 provides clinical information about the PNH-treatment groups.

Table 2 Demographics of PNH patients at baseline compared to general population
Table 3 Clinical characteristics of PNH patients at baseline

QOL comparison after 26 weeks

MANCOVA models revealed that across levels of PNH risk factors, patients who had been on either ravulizumab or eculizumab for 26 weeks reported better physical, emotional, and cognitive functioning, and lower nausea/vomiting, pain, insomnia, appetite loss, constipation, and diarrhea symptoms, than the general population, after adjusting for covariates (Table 4). Additionally, ravulizumab patients reported higher global health status/QOL, lower fatigue, and lower financial difficulties than the general population (Table 4). The effect sizes were generally larger for the ravulizumab patients.

Table 4 Effect sizes: PNH patients after 26 weeks of treatment compared to general population

MANCOVA models conducted separately by risk level revealed further nuances in QOL after treatment. Similar to the overall MANCOVA, compared to the general population, both eculizumab and ravulizumab lower-risk-factor patients reported higher physical and emotional functioning and lower nausea/vomiting, pain, insomnia, appetite loss, constipation, and diarrhea symptoms. Further, the lower-risk-factor ravulizumab patients also reported better cognitive functioning and global QOL, and lower fatigue, dyspnea, and financial difficulties. In several domains, the effect sizes were larger for these ravulizumab patients (Table 4).

Models focused on the higher-risk-factor patients as compared to the general population revealed that people with PNH reported better emotional and cognitive functioning, and lower fatigue, pain, insomnia, constipation, and diarrhea (Table 4). Further, these ravulizumab patients also reported better physical and social functioning, and lower symptom burden in nausea/vomiting and appetite loss (Table 4). In almost all cases, these ravulizumab patients had larger effect sizes than the eculizumab patients (Table 4).

Figure 1a and b show heat maps comparing treated patients to general-population norms. Since all of the differences showed better scores for the PNH group (i.e., higher on function/global QOL scales, lower on symptom scales/items), only one color is used for the conditional formatting. These graphs suggest that generally the effects were larger for the function scales than for the symptom scales/items and larger for ravulizumab patients than for eculizumab patients.

Fig. 1

Heat maps. Heat maps illustrate group differences for ravulizumab (a) and eculizumab (b) using Cohen’s d effect size computed from aggregated means and standard deviations by age and gender groupings. Conditional formatting illustrates effect-size magnitude with a more saturated color reflecting larger effect size. Since all of the differences were in the direction of PNH group scoring better than the general population (i.e., higher on function/global QOL scales, lower on symptom scales/items), only one color is used for the conditional formatting. Figure a includes people with PNH on ravulizumab after 26 weeks either during the randomized period or during the extension-trial period. This meant assessment at 52 weeks for patients who had eculizumab for 26 weeks and then had ravulizumab for 26 weeks. Includes Trial 301 (N = 242) and 302 (N = 185). Figure b includes people with PNH who had been on eculizumab for 26 weeks. All these patients' assessments were made during the randomized period. Includes Trial 301 (N = 118) and 302 (N = 95)

QOL comparison at baseline

Because many of these findings were counter to expectation (i.e., functioning and symptom scores that were better than in the general population), we implemented similar MANCOVAs using the baseline data of trial patients who were treatment-naïve (from trial 301), to check whether the results were more likely due to treatment or to stable participant characteristics. These sample sizes are substantially smaller due to excluding patients from trial 302 while also splitting the analysis by level of risk. Results show that in general and as expected, untreated people with PNH at baseline reported worse function and symptom scores than did the general population. The exceptions generally involved small effects (Additional file 1: Table S1).

DIF by treatment

Results of multilevel DIF analysis by treatment group revealed no significant effects in any of the 24 EORTC QLQ-C30 items (Table 5). Thus, across the multiple time points, there is no indication of treatment-related DIF, and one can compare responses of people with PNH regardless of the treatment they have received. In other words, given the same total score, people in the two groups responded similarly to a given item in that scale.

Table 5 Results of DIF analyses by treatment group

DIF by group

Results of PNH versus general-population groups’ DIF analysis revealed uniform DIF in 14 items (Table 6). Most often it was more difficult for the treatment group to report that they had poor health. This was true for 9 of these items (1 physical item, 2 emotional, 1 cognitive, 2 social, 1 fatigue, 1 nausea, and 1 pain). In 5 of these items (1 physical, 1 emotional, 1 cognitive, 1 fatigue, 1 nausea), it was more difficult for the general-population group to report poorer health.

Table 6 Results of DIF analyses of PNH versus general population

Non-uniform DIF was detected in 11 items, suggesting that the group effect varied by level of the corresponding domain score. Six items favored the general population at the domain-score mean, meaning that it was easier for them to report poorer health (1 physical, 1 emotional, 1 cognitive, 2 social, 1 nausea). Five items favored the PNH group, meaning that it was easier for them to report poorer health (1 physical, 1 emotional, 1 cognitive, 1 fatigue, and 1 pain).

DIF over time

Results of multilevel DIF analysis evaluating the impact of time on the item responses of people with PNH revealed significant uniform DIF effects in 7 of the 24 items (Table 7). These differences related to physical function (2 of 5 items), role function (2 of 2), emotional function (1 of 4), fatigue (1 of 3), and pain (1 of 2). These DIF effects suggested a decreasing likelihood over time of endorsing physical function problems, fatigue, and pain symptoms, given their total scores on the corresponding scales. In contrast, there was an increasing likelihood of endorsing irritability (emotional function item). For the two role-function items, one result showed an increase and one a decrease, thereby canceling each other out. Three of 24 items showed evidence of non-uniform DIF: 1 emotional, 1 fatigue, and 1 pain. Thus, there is evidence of recalibration response-shift effects in 7 of 24 items, and reprioritization response-shift effects in 3 items.

Table 7 Results of DIF analyses over time

Discussion

This study revealed that people with PNH on eculizumab and especially ravulizumab for 26 weeks reported QOL levels better than those of the general population, typically by 0.3 standard deviations. Not only was ravulizumab not inferior to eculizumab [21, 22], but both treatments also appeared to make QOL with PNH at least as good as the norm. These findings were equally notable for lower- and higher-risk-factor patients. In contrast, at baseline and prior to treatment, people with PNH, especially those categorized with higher-risk-factor PNH, were generally worse off than the general population.

DIF analyses revealed group- and time-related DIF, but not treatment-related DIF. Thus, there were no systematic differences in item response between these two effective PNH treatments, but there were in analyses comparing people with PNH to the general population, and to themselves over time. Specifically, compared to the general population, people with PNH after 26 weeks of effective treatment tended to be less likely than expected to endorse poor health. For example, they were less likely to endorse having trouble concentrating than one might expect given their overall level of cognitive function (uniform DIF or recalibration). This effect for concentration was even more pronounced over levels of the trait (non-uniform DIF or reprioritization).

These recalibration and reprioritization effects reflect adaptive response shifts. In this way, the scores of people with PNH, irrespective of treatment, not only approached “normal” QOL, but even “better than normal.” This pattern of responses suggests that ravulizumab and eculizumab enabled patients not only to achieve a better QOL but also to adapt to their condition. For example, they may have been aware of being fatigued while at the same time noting that it was less debilitating than it used to be. Thus, compared to the general population, the same level of feeling heavy and lethargic may have been calibrated as less onerous for someone with PNH. This recalibration response shift would continue over time, making their earlier and later responses less-than-comparable because of differences in their contingent true score (e.g., comparing their QOL to different standards). As another related dynamic, they may have modified their daily responsibilities or hobbies, so that the activities were more feasible. In this new context, it would be more difficult for them to report that these activities were limited by their condition (reprioritization response shift).

PNH is a difficult disease to live with. Its many signs and symptoms involve multiple organ systems, and the uncertainty that people with PNH experience makes these function- and symptom-impacts even more challenging. A treatment that provides immediate, complete and sustained C5 inhibition not only brings QOL to a normal level, but it enables adaptation, which may have an even greater value. For someone who knows what debilitating fatigue is, being given the opportunity to experience life without fatigue makes those days all the more poignant and joyful.

The present work had many strengths, including robust sample sizes and the use of a general-population comparison sample. Its limitations must, however, be acknowledged. First, the comparison group was very large (more than 15,000 people), and so the multivariable analyses had sufficient power to detect very small effect sizes. This hypersensitivity is why we emphasize Cohen’s d effect sizes. Caution is also warranted in interpreting results because of the few items in each scale, especially when there are only two. Future research might replicate the response-shift analyses on groups of more similar size, or might investigate the longitudinal-DIF findings using measures of QOL cognitive appraisal [29] or interviews. Given the rarity of PNH, this replication would be challenging. Finally, in the multivariate analyses comparing people with PNH and the general population, we were ultimately able to adjust only for age, sex, and region. Other variables unexamined and unavailable in this study, such as expectations, might be relevant to explaining or mediating these group differences.

In summary, people with PNH who were treated for 26 weeks with eculizumab or ravulizumab not only showed comparable effects on clinical outcomes, but also showed a notable and important QOL benefit—especially with ravulizumab. People with PNH also provided evidence of response shifts over time, suggesting that the treatments enabled adaptive changes.