Introduction

Narcolepsy, a chronic neurologic disease with no known cure, is associated with an economic burden resulting from higher healthcare resource utilization and reduced productivity relative to those without narcolepsy, and a humanistic burden, since patients with narcolepsy report substantially impaired health-related quality of life (HRQoL) [1,2,3,4]. Studies that have used the 36-item Short Form Health Status survey (SF-36) [5], which is a generic patient-reported measure of HRQoL, have consistently shown lower scores on most SF-36 subscales in narcolepsy patients not only compared with the general population, but also relative to patients with other chronic conditions including obstructive sleep apnea, Parkinson’s disease, and epilepsy [6,7,8,9,10,11,12,13,14,15]. In particular, of the eight SF-36 subscales, the greatest differences between narcolepsy patients and the general population have been for the subscales of Vitality and Role Physical; the eight subscales of the SF-36 are Physical Function (PF; ability to perform physical activity), Role Physical (RP; assesses the impact of physical function on daily roles including work and daily activities), Bodily Pain (BP; presence of pain and its impact on limiting activities), General Health (GH; overall health status), Vitality (VT; energy and tiredness), Social Function (SF; ability to perform social activities), Role Emotional (RE; impact of emotional problems on participation in life activities such as work and other daily activities), and Mental Health (MH; general mood, i.e., anxiety, depression).

Clinical change in disease status is often assessed using a generic global measure, the Clinical Global Impression of Change (CGI-C) [16], which is a frequently used summary scale from the clinician’s perspective. The CGI-C asks the clinician to rate, on a scale from 1 (very much improved) to 7 (very much worse), how much the patient has changed, in the clinician’s judgment, since baseline or an earlier assessment. An analysis of SF-36 data from a clinical trial of sodium oxybate (SXB) suggested dose-dependent effects of SXB on the SF-36, with significant improvements for the change from baseline relative to placebo with the 9 g/night dose on the SF-36 domains of VT, GH, PF, and SF, as well as the Physical Component Summary (PCS) [17]. Similarly, analysis of data from this clinical trial showed that the proportion of patients in each SXB treatment group who were rated as Much Improved/Very Much Improved by the investigator was significantly greater than placebo [18]. However, it is not clear if the clinician’s assessment using this scale accurately reflects what the patient may be experiencing with regard to the broader impact of narcolepsy on health status or HRQoL.

Patient-reported outcomes are becoming increasingly important to researchers and regulatory agencies, including the United States Food and Drug Administration [19]. This importance suggests not only that there is a need for evaluating changes in disease status from both the clinical and patient perspectives, but also a need to understand the similarities and differences in how clinicians and patients perceive and interpret the impact of changes in disease. Furthermore, it is useful to identify whether any of the SF-36 domains reflect narcolepsy-specific issues or correlate with the clinician’s perspective of a patient’s improvement or worsening. Therefore, the purpose of this analysis was to provide initial assessment of the degree of correlation between these outcomes using data from the previously mentioned clinical trial of SXB that included both the SF-36 and the CGI-C [17, 18, 20]; SXB is approved for the treatment of both excessive daytime sleepiness and cataplexy in patients with narcolepsy [21].

Methods

Study Design and Population

This post hoc analysis was based on data from an 8-week clinical trial of SXB for which the methodology and results have previously been published [18]. Patients were required to be ≥16 years of age and have a diagnosis of narcolepsy with cataplexy based on clinical history, an overnight polysomnogram, and a multiple sleep latency test. Randomization was to treatment with placebo or SXB 4.5 g, 6 g, or 9 g administered as two equally divided nightly doses. This article does not contain any new studies with human or animal subjects performed by any of the authors.

Outcomes

Both the CGI-C [16] and the SF-36 [5] were included as outcomes in the trial, with the former assessed at Week 8, and the latter as an exploratory efficacy endpoint at baseline, Week 4, and Week 8. The CGI-C was scored by the clinician from 1 = very much improved to 7 = very much worse. Scores on the SF-36 range from 0 to 100, with higher scores indicating better HRQoL. In addition to the eight subscales, the SF-36 derives two summary scales by positively weighting specific domains: the PCS, which is derived from weighting the domains of PF, RP, BP, and GH, and the Mental Component Summary (MCS), which is derived from weighting the domains of VT, SF, RE, and MH.

Statistical Analysis

This analysis was based on data that were available for 209 of the 228 patients in the intent-to-treat population of the study; 19 patients did not have both CGI-C and SF-36 results for inclusion in the correlation analysis. Missing Week 8 values on the SF-36 were imputed by carrying forward the last observation (Week 4 values). Regardless of treatment group, the changes from baseline at Week 8 for all SF-36 subscales and the two summary scores were evaluated for correlation with the CGI-C at Week 8 using the Pearson product-moment correlation coefficient (r), testing the null hypothesis H₀: ρ = 0, where absolute values of r ≤ 0.30 represent a weak correlation, 0.30–0.50 are moderate, and ≥0.50 are strong [22]. Scatterplots and regression lines were developed to visualize the relationship between SF-36 and CGI-C. These scatterplot analyses estimated both the 95% confidence interval of the regression and the 95% prediction limits, which indicate the range within which future observations are expected to fall with 95% probability. Additionally, Pearson correlation analysis was performed to determine the relationships between the CGI-C at Week 8 and changes from baseline in SF-36 subscales for each treatment group.
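The analysis steps described above (last observation carried forward, change from baseline, Pearson correlation, and the weak/moderate/strong labels) can be illustrated with a minimal Python sketch. This is not the trial's analysis code: the function names and the toy records are hypothetical, and because the stated thresholds overlap at exactly 0.30 and 0.50, assigning those boundary values to the higher category is an assumption made here.

```python
import math

def locf(week4, week8):
    """Last observation carried forward: use Week 4 when Week 8 is missing."""
    return week4 if week8 is None else week8

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def strength(r):
    """Label |r| as weak, moderate, or strong.

    Assumption: boundary values (0.30, 0.50) go to the higher category.
    """
    a = abs(r)
    if a >= 0.50:
        return "strong"
    if a >= 0.30:
        return "moderate"
    return "weak"

# Hypothetical toy records: (CGI-C at Week 8, SF-36 baseline, Week 4, Week 8).
# None marks a missing Week 8 score. These are NOT trial data.
records = [
    (1, 30, 55, 60),
    (2, 40, 50, None),
    (2, 35, 45, 55),
    (3, 50, 55, 60),
    (4, 45, 45, 50),
    (4, 60, 60, None),
    (5, 55, 50, 50),
]

cgic = [rec[0] for rec in records]
# Change from baseline at Week 8, after imputing missing Week 8 values.
changes = [locf(rec[2], rec[3]) - rec[1] for rec in records]

r_toy = pearson_r(cgic, changes)
print(f"toy r = {r_toy:.3f} ({strength(r_toy)})")

# Coefficients reported in this article for VT and RP, with n = 209.
for label, r in (("VT", -0.464), ("RP", -0.310)):
    print(label, strength(r), f"t = {t_statistic(r, 209):.2f}")
```

A negative coefficient is the expected direction here, because lower CGI-C scores indicate greater improvement while higher SF-36 scores indicate better HRQoL. A by-treatment-group analysis would apply the same functions to each group's records separately.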

Results

Population Characteristics

Treatment groups were generally well-balanced for demographic characteristics, and the population was predominantly female (65.4%), white (86.0%), with a mean (SD) age of 40.5 (15.3) years. Baseline scores on the SF-36 were also well-balanced among the treatment groups (Table 1).

Table 1 Baseline characteristics of the intent-to-treat population (N = 228)

Correlation Between CGI-C and SF-36 Regardless of Treatment Group

A general relationship was observed between scores on the CGI-C and SF-36. This relationship was inverse, as indicated by the negative correlation coefficients: CGI-C scores indicating greater improvement (i.e., lower scores) were associated with improved HRQoL on the SF-36 domains (i.e., higher SF-36 scores) (Table 2). The correlations were moderate and significant for the subscales of VT (r = −0.464; P < 0.0001) and RP (r = −0.310; P < 0.0001). All other correlations, including those for the PCS and MCS, were weak; they were nevertheless significant, except for the two subscales with the weakest correlations, BP and RE (Table 2).
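To see why weak correlations can still reach statistical significance in a sample of this size, the test of ρ = 0 can be inverted to give the smallest |r| that would be significant. The worked step below is a sketch that assumes a two-sided test at α = 0.05 (the significance level is not stated in this section) and uses the approximate critical value t ≈ 1.97 for 207 degrees of freedom.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
\quad\Longrightarrow\quad
|r|_{\mathrm{crit}} = \frac{t_{\mathrm{crit}}}{\sqrt{t_{\mathrm{crit}}^{2} + n - 2}}
\approx \frac{1.97}{\sqrt{1.97^{2} + 207}}
\approx 0.136 \qquad (n = 209)
```

Under these assumptions, any coefficient with an absolute value above roughly 0.14 would be significant with n = 209, even though it falls well within the weak range.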

Table 2 Correlation of physician-rated Clinical Global Impression of Change (CGI-C) at Week 8 with changes reported by patients on the 36-item Short Form Health Survey (SF-36) regardless of treatment (n = 209)

Scatterplots of the relationship between CGI-C and the two SF-36 subscales that had moderate correlations, VT (Fig. 1a) and RP (Fig. 1b), show low R² values for the linear regression of both subscales, 0.215 and 0.096 for VT and RP, respectively, and wide 95% prediction limits, providing further visual representation that the correlations are not strong. For the other SF-36 subscales, the R² values of the linear regression ranged from 0.0079 (RE) to 0.0834 (SF) with similarly wide 95% prediction limits (data not shown).
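The reported R² values can be cross-checked against the correlation coefficients, since in a simple linear regression with a single predictor the coefficient of determination equals the square of the Pearson coefficient. The short calculation below is only an illustrative check; the implied |r| values for RE and SF are back-calculated from the R² range given above, not taken from Table 2.

```latex
R^{2} = r^{2}
\qquad
\text{VT: } (-0.464)^{2} = 0.2153 \approx 0.215
\qquad
\text{RP: } (-0.310)^{2} = 0.0961 \approx 0.096

\text{RE: } |r| = \sqrt{0.0079} \approx 0.089
\qquad
\text{SF: } |r| = \sqrt{0.0834} \approx 0.289
```

Read this way, the CGI-C accounts for about 22% of the variance in the VT change score and about 10% of the variance in the RP change score, which is consistent with the wide prediction limits.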

Fig. 1 Scatterplots showing moderate correlation at Week 8 between Clinical Global Impression of Change (CGI-C) and the 36-item Short Form Health Status survey (SF-36) subscales of (a) Vitality and (b) Role Physical

Correlation Between CGI-C and SF-36 by Treatment Group

Correlations appeared to be dose dependent across SXB doses for PF, RP, VT, SF, and the PCS, with the SXB 9 g dose showing strong correlations for these subscales except SF for which the correlation was moderate (Table 3); the strongest correlation was for VT with SXB 9 g. Correlations were generally weak or moderate for the other subscales across all doses, consistent with overall results.

Table 3 Correlation of physician-rated Clinical Global Impression of Change (CGI-C) at Week 8 with changes reported by patients on the 36-item Short Form Health Survey (SF-36) by treatment group

Discussion

Determining how global changes in disease and their impact are assessed and identifying concordance or discordance between clinician and patient perspectives can contribute to understanding what is considered an effective treatment. Correlation analyses such as those presented here help enhance this understanding by validating patient experiences as well as clinician observations.

While this analysis showed that most of the SF-36 subscales only weakly correlated with the CGI-C, it did identify two subscales with moderate correlations, suggesting changes in patient-reported domains that clinicians may perceive and that likely help them determine patient improvement. These subscales, VT and RP, are also the subscales most frequently reported to have the lowest values (poorest outcomes) among the SF-36 domains in patients with narcolepsy, as well as the greatest differences in scores between narcolepsy patients and the general population [6,7,8]. In particular, VT can be interpreted as a measure of fatigue, which is frequently reported by patients with narcolepsy [26, 27]. In contrast, the low correlation coefficients and the lack of statistical significance for two other SF-36 subscales indicate domains in which either there was not a good match between patient and clinician perspectives (RE) or the subscale did not adequately reflect disease-specific issues of patients with narcolepsy (BP). The weak correlation coefficients for PF, GH, SF, and MH, as well as for the PCS and MCS, also suggest that, despite reaching statistical significance, these scales may not be adequate indicators of change in narcolepsy. Based on these results, while the SF-36 can provide a general assessment of HRQoL in patients with narcolepsy, the use of other, more appropriate measures should be considered for evaluating global and disease-specific changes.

However, the low to moderate correlations also indicate a lack of agreement, at least in part, between how clinicians and patients perceive improvements. This lack of agreement is consistent with the discordance between patient and clinician perspectives of global assessments of disease activity and treatment response that has also been reported in other diseases [23,24,25], suggesting that the variables contributing to a patient’s perception of disease may be different from those contributing to the healthcare provider’s perception. Such a lack of agreement, especially with regard to the low correlations observed for mental health in the current analysis, further supports the need for narcolepsy assessment that incorporates a broader range of patient-reported outcomes that more closely align with patient perceptions of treatment effects.

Limitations

A limitation of this study is that it was a post hoc analysis, although data were available from most of the patients in the study. Additionally, the 8-week duration of the study may be considered a limitation because it may have been too short to capture patient-reported changes in HRQoL; response onset, assessed as clinically meaningful improvements in excessive daytime sleepiness and cataplexy, was observed in most patients within 2 months, but a longer period was needed to achieve maximum response [28]. Finally, the analysis did not evaluate factors that may confound the relationship between the two measures. However, it should be noted that these correlations reflect within-subject assessments, since scores from both outcome measures were assessed within the same patient among those who had available data or who met the imputation guidelines.

Conclusion

Only two SF-36 subscales, VT and RP, showed moderate correlation with the overall change in status observed by clinicians, suggesting that improvement in these domains may be reflected in clinicians’ ratings. The other SF-36 subscales and the summary scores showed only weak correlations with clinician ratings. These findings indicate that some aspects of HRQoL measured by the SF-36 may assess symptoms of narcolepsy that are also recognized by physicians, but that there is also discordance between patient and physician perspectives of disease, further suggesting a need for a broader assessment of narcolepsy and treatment effects that emphasizes the patient perspective.