Key Summary Points

Why carry out this study?

Two phase 3 clinical trials, SKYLIGHT 1 and 2, demonstrated that fezolinetant 30 mg and 45 mg once daily were superior to placebo in reducing the frequency and severity of vasomotor symptoms (VMS) at weeks 4 and 12 in participants with moderate-to-severe VMS associated with menopause.

It is important to understand what magnitude of treatment effect is clinically meaningful to patients to help guide the interpretation of these results.

Using pooled data from SKYLIGHT 1 and 2 and an anchor-based method, this study aimed to define a clinically meaningful threshold for reduction in the frequency of moderate-to-severe VMS in postmenopausal women treated with fezolinetant.

What was learned from the study?

The thresholds for a meaningful within-patient change in moderate-to-severe VMS frequency were estimated to be a reduction of 5.73 and 6.20 VMS episodes/day at weeks 4 and 12, respectively.

When these thresholds were applied, significantly greater proportions of responders were observed in participants treated with fezolinetant compared with placebo at both weeks 4 and 12.

This analysis indicates fezolinetant is an important nonhormonal treatment option that provides clinically meaningful improvements in symptoms for women who have VMS associated with menopause.

Introduction

Vasomotor symptoms (VMS), which are caused by an imbalance of hypothalamic thermoregulation, are commonly referred to as hot flashes and/or night sweats [1, 2, 3]. VMS are the characteristic symptoms of the menopausal transition; they affect up to 80% of women and are considered moderate-to-severe by up to 50% of women [2, 4, 5]. VMS are bothersome for the majority of women and have a negative impact on health-related quality of life (HRQoL), including effects on sleep, concentration/cognition, mood, energy, sexual activity, work, and social/leisure activities [6, 7].

VMS are reported across all races/ethnicities [5, 8], and they persist for a median duration of 7.4 years but may continue for more than 10 years in one-third of women [4, 9]. VMS are the most common primary reason for women to seek treatment for menopausal symptoms [8]. Hormone therapy remains an effective treatment [10, 11], but it is unsuitable for some women because of contraindications, and a large proportion of eligible women are averse to taking it (e.g., because they believe that menopausal symptoms will diminish without pharmacological intervention, or because of concerns about side effects and long-term risks) [11, 12].

Fezolinetant is a nonhormonal, selective neurokinin 3 receptor antagonist that moderates neuronal activity in the thermoregulatory center in the hypothalamus [13]. It was recently approved at a once-daily dose of 45 mg by the US Food and Drug Administration (FDA) for the treatment of moderate-to-severe VMS due to menopause and by the European Medicines Agency for the treatment of moderate-to-severe VMS associated with menopause [14]. SKYLIGHT 1 and SKYLIGHT 2, two identical phase 3 trials, demonstrated that fezolinetant 30 mg and 45 mg provided a statistically significant improvement in mean daily VMS frequency and VMS severity at weeks 4 and 12 (the four co-primary endpoints) compared with placebo in participants with moderate-to-severe VMS associated with menopause [15, 16]. SKYLIGHT 4, a 52-week, randomized, phase 3 study, confirmed the safety and tolerability of fezolinetant in this patient population [17].

Patient-reported outcomes (PROs) report the status of a patient’s health condition directly from the patient, without interpretation by a clinician or anyone else [18]. PROs help determine the effects of treatment, directly reflecting how patients feel or function [19]. Importantly, PROs can be used to estimate which improvements are clinically meaningful from the patient’s own perspective, which aids interpretation of the overall treatment effect estimated for any given treatment [20].

The objective of this analysis was to define the threshold for meaningful within-patient change in order to guide interpretation of results for the reduction in VMS frequency. This threshold was then applied in responder analyses, the results of which support the efficacy of fezolinetant observed for the primary endpoints reported elsewhere [15, 16]. These analyses were pre-specified and used pooled data from SKYLIGHT 1 and SKYLIGHT 2.

Methods

Study Design and Participants

The study design of SKYLIGHT 1 (NCT04003155) and SKYLIGHT 2 (NCT04003142) has been described previously [15, 16]. In brief, the two trials were phase 3, randomized, double-blind, placebo-controlled studies. Individuals aged ≥ 40 to ≤ 65 years who were female at birth and who had moderate-to-severe VMS (minimum average of seven VMS/day or ≥ 50 per week) were randomized to once-daily fezolinetant 30 mg, fezolinetant 45 mg, or placebo (1:1:1) for 12 weeks followed by a 40-week extension period of active treatment (individuals initially randomized to placebo were re-randomized to fezolinetant 30 mg or 45 mg). The SKYLIGHT 1 and SKYLIGHT 2 studies were conducted in accordance with the Declaration of Helsinki, Good Clinical Practice, and International Council for Harmonisation guidelines. An independent ethics committee or institutional review board reviewed the ethical, scientific, and medical appropriateness of the study at each site before data collection. Written informed consent was provided by all participants.

Evaluation of Meaningful Within-Patient Change

The four co-primary endpoints of both studies were mean change in daily frequency and severity of moderate-to-severe VMS from baseline to weeks 4 and 12 [15, 16]. VMS data were recorded by participants in an electronic VMS diary, completed daily for each 24-h period from screening through the follow-up visit. The diary, an interactive electronic data capture system available for data entry 24 h/day, included a reference guide with the following definitions: mild symptoms (i.e., sensation of heat without sweating), moderate symptoms (i.e., sensation of heat with sweating, able to continue activity), and severe symptoms (i.e., sensation of heat with sweating, causing cessation of activity) [21].
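The frequency endpoint reduces to counting moderate and severe diary entries per day and averaging over a window. The sketch below is illustrative only: it assumes a hypothetical diary export with one (day, severity) record per episode and a known list of diary days in each averaging window. The actual SKYLIGHT diary format and averaging windows are not described here, so all names and data are hypothetical.

```python
from collections import Counter
from statistics import fmean

# Per the diary reference guide, only moderate and severe episodes count
# toward the moderate-to-severe VMS frequency; mild episodes are ignored.
COUNTED_SEVERITIES = {"moderate", "severe"}


def mean_daily_frequency(entries, diary_days):
    """Mean number of moderate-to-severe VMS per diary day.

    entries: (day, severity) pairs, one per recorded episode (hypothetical format).
    diary_days: every day in the averaging window; days with no counted
    episodes contribute 0 because Counter returns 0 for missing keys.
    """
    counts = Counter(day for day, severity in entries if severity in COUNTED_SEVERITIES)
    return fmean([counts[day] for day in diary_days])


# Hypothetical diary data for one participant.
baseline = [(1, "moderate"), (1, "severe"), (1, "mild"), (2, "severe"),
            (3, "moderate"), (3, "moderate"), (3, "moderate")]
week_4 = [(29, "mild"), (30, "moderate")]

baseline_freq = mean_daily_frequency(baseline, diary_days=[1, 2, 3])
week_4_freq = mean_daily_frequency(week_4, diary_days=[29, 30])
change_from_baseline = week_4_freq - baseline_freq  # negative = fewer VMS/day
```

Here the participant averages 2.0 moderate-to-severe VMS/day at baseline and 0.5 at week 4, a change of −1.5 VMS/day.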

Meaningful within-patient change for fezolinetant treatment was assessed using the Patient Global Impression of Change in VMS (PGI-C VMS) instrument, a pre-specified secondary endpoint of the trials that served as the anchor measure in this analysis. PGI-C VMS is a single-item global PRO, analogous to the Clinical Global Impression (CGI) scales [22], designed to capture a patient’s assessment of change in VMS since the start of treatment. The PGI-C VMS asked: “Compared to the beginning of this study, how would you rate your hot flushes/night sweats now?” Participants rated change on a seven-point Likert scale: “much better,” “moderately better,” “a little better,” “no change,” “a little worse,” “moderately worse,” and “much worse.” PGI-C VMS responses were collected at weeks 4 and 12. The “moderately better” response category was selected to characterize a meaningful change in PGI-C VMS in our analysis. This conservative approach is supported by prior VMS studies that employed similar CGI scales [23, 24], in which clinically meaningful change in VMS frequency/severity was defined by the two highest-ranked categories of the seven-point scale, that is, by improvement beyond the “minimally improved” or “a little better” category. These two studies support the choice of primary anchor in our analysis. Week 12 was designated as the key time point.

Statistical Analyses

Analyses of meaningful within-patient change were performed using all randomized participants in SKYLIGHT 1 and 2 who had VMS frequency data at baseline and at least one post-baseline value at either week 4 or week 12. The primary analysis used the overall pooled population, and sensitivity analyses used subgroups of the overall pooled sample and the individual phase 3 studies (Table S1).

The number and proportion of participants in each PGI-C VMS response category were summarized at weeks 4 and 12; the fezolinetant and placebo groups were compared using the Cochran–Mantel–Haenszel test with modified ridit scores, stratified by study. In addition, descriptive statistics (mean, standard deviation [SD], and median) were reported for moderate-to-severe VMS frequency at baseline, week 4, and week 12, and for changes from baseline.
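To make the stratified comparison concrete, here is a minimal sketch of a Cochran–Mantel–Haenszel row-mean-scores test for two groups using modified ridit scores (the midrank of each ordered category divided by n + 1, computed within each stratum). It follows that textbook definition and is not the trials’ analysis code. The function names and counts are hypothetical, and the sketch handles one pairwise comparison (e.g., one fezolinetant dose versus placebo) with the two studies as strata.

```python
import math


def modified_ridit_scores(column_totals):
    """Modified ridit scores: midrank of each ordered category divided by (n + 1)."""
    n = sum(column_totals)
    scores, below = [], 0
    for n_j in column_totals:
        scores.append((below + (n_j + 1) / 2) / (n + 1))
        below += n_j
    return scores


def cmh_row_mean_scores(strata):
    """Stratified CMH row-mean-scores test comparing two groups (1 df).

    strata: one 2 x J table per stratum (here, per study), given as
    (group_1_counts, group_2_counts) over J ordered response categories.
    Scores are recomputed within each stratum. Returns (statistic, p_value).
    """
    deviation, variance = 0.0, 0.0
    for group_1, group_2 in strata:
        totals = [a + b for a, b in zip(group_1, group_2)]
        n, n_1, n_2 = sum(totals), sum(group_1), sum(group_2)
        scores = modified_ridit_scores(totals)
        mean_score = sum(s * t for s, t in zip(scores, totals)) / n
        score_var = sum(t * (s - mean_score) ** 2 for s, t in zip(scores, totals)) / n
        # Observed minus expected score sum in group 1 under no association.
        deviation += sum(s * c for s, c in zip(scores, group_1)) - n_1 * mean_score
        # Hypergeometric (sampling without replacement) variance of that sum.
        variance += n_1 * n_2 * score_var / (n - 1)
    statistic = deviation ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail, chi-square with 1 df
    return statistic, p_value


# Hypothetical counts ordered "much better" ... "much worse" for two studies.
study_1 = ([20, 10, 5, 5, 0, 0, 0], [5, 5, 10, 20, 0, 0, 0])
study_2 = ([18, 12, 6, 4, 0, 0, 0], [6, 6, 12, 16, 0, 0, 0])
q, p = cmh_row_mean_scores([study_1, study_2])
```

Recomputing the scores within each stratum mirrors stratification by study: each study’s responses are ranked only against that study’s own distribution.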

The suitability of PGI-C VMS as an anchor measure for change in VMS frequency was assessed with correlational analyses (polyserial and Spearman’s rank) and a review of selected descriptive statistics, to confirm that it was adequately related to the change in frequency of moderate-to-severe VMS at weeks 4 and 12. The subsequent meaningful within-patient change analyses were to be performed only if the association between PGI-C VMS and the change in moderate-to-severe VMS frequency was deemed appropriate, with a correlation of > 0.37; this threshold was based on criteria recommended in the literature for appropriate anchor measures [25, 26].
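The Spearman part of the anchor check can be sketched with the Python standard library alone, because Spearman’s coefficient is the Pearson correlation of tie-averaged ranks. The polyserial correlation, which requires maximum-likelihood estimation, is not shown. The PGI-C VMS coding (1 = “much better” to 7 = “much worse”) and the data are assumptions for illustration.

```python
import math

ANCHOR_MIN_CORRELATION = 0.37  # anchor criterion cited in the text


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: PGI-C VMS coded 1 = "much better" ... 7 = "much worse";
# change = follow-up minus baseline VMS/day (negative = improvement), so
# agreement between the two measures yields a positive correlation.
pgic = [1, 2, 2, 3, 4, 4, 5]
change = [-9, -6, -7, -3, -1, 0, 2]
rho = spearman(pgic, change)
anchor_ok = rho > ANCHOR_MIN_CORRELATION
```

With this coding, the sign of the correlation is positive when better PGI-C ratings go with larger reductions; reversing either coding would flip the sign but not the magnitude.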

Once PGI-C VMS was deemed an appropriate anchor measure, thresholds of meaningful within-patient change in moderate-to-severe VMS frequency were estimated. The primary anchor-based estimates were summarized descriptively as the mean (SD) change in the frequency of moderate-to-severe VMS for each level of change defined by the PGI-C VMS, with the median (first and third quartiles) as a supportive estimate. Estimates were derived separately for week 4 and week 12. Several supportive analyses were also conducted, including the distribution-based half-SD of baseline, receiver operating characteristic (ROC) curve analysis, and empirical cumulative distribution function (eCDF) plots of the primary anchor measure. The ROC-based analysis applied a slightly lower level of response, defined as the change in score that differentiates a “moderately better” response from an “a little better” response on the PGI-C VMS. eCDF curves for the change in the frequency of moderate-to-severe VMS by PGI-C VMS response group (anchor category) at weeks 4 and 12 are presented.
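The primary anchor-based estimate and two of the supportive estimates can be sketched as follows. This is a hypothetical illustration: the PGI-C coding, the quartile method (Python’s default), and the ROC set-up (Youden’s J, comparing only the “moderately better” and “a little better” groups) are assumptions, since the exact ROC criterion and software settings are not specified here.

```python
from statistics import mean, median, quantiles, stdev

MODERATELY_BETTER, A_LITTLE_BETTER = 2, 3  # assumed coding: 1 = much better ... 7 = much worse


def anchor_based_estimate(changes, pgic, category=MODERATELY_BETTER):
    """Mean (SD) and median (Q1, Q3) change in VMS/day within one PGI-C category."""
    group = [c for c, g in zip(changes, pgic) if g == category]
    q1, _, q3 = quantiles(group, n=4)  # Python's default 'exclusive' quartile method
    return {"mean": mean(group), "sd": stdev(group),
            "median": median(group), "q1": q1, "q3": q3}


def half_sd_estimate(baseline_frequencies):
    """Distribution-based estimate: half the SD of baseline VMS frequency."""
    return 0.5 * stdev(baseline_frequencies)


def roc_cutoff(changes, pgic, positive=MODERATELY_BETTER, negative=A_LITTLE_BETTER):
    """Change-score cut-off maximizing Youden's J (sensitivity + specificity - 1).

    A participant is classified as positive when their change is at or below
    the cut-off, i.e., a reduction at least as large as the cut-off magnitude.
    """
    pos = [c for c, g in zip(changes, pgic) if g == positive]
    neg = [c for c, g in zip(changes, pgic) if g == negative]
    best_cut, best_j = None, -1.0
    for cut in sorted(set(pos + neg)):
        sensitivity = sum(c <= cut for c in pos) / len(pos)
        specificity = sum(c > cut for c in neg) / len(neg)
        if sensitivity + specificity - 1 > best_j:
            best_cut, best_j = cut, sensitivity + specificity - 1
    return best_cut, best_j


# Hypothetical change scores (VMS/day) and PGI-C ratings at one time point.
changes = [-9, -7, -6, -6, -5, -3, -2, -1, 0, 1, 2, -4]
pgic = [1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 3]
estimate = anchor_based_estimate(changes, pgic)
cut, youden_j = roc_cutoff(changes, pgic)
half_sd = half_sd_estimate([5, 7, 9])
```

In this toy data set the “moderately better” group has a mean change of −6 VMS/day, and the ROC cut-off of −5 perfectly separates it from the “a little better” group; real data would show overlap and a J below 1.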

Responder analyses of moderate-to-severe VMS frequency were then performed to compare fezolinetant 30 mg and fezolinetant 45 mg with placebo, applying the primary threshold estimates of meaningful within-patient change associated with a PGI-C VMS “moderately better” response at week 4 and week 12. A participant was classified as a responder at week 4 if the reduction in VMS frequency from baseline to week 4 was equal to or larger than the week 4 meaningful within-patient change threshold; responders at week 12 were defined in the same way. ORs, 95% confidence intervals (CIs), and P values were obtained from a logistic regression model with protocol, treatment group, and smoking status (current versus former/never) as factors and baseline VMS frequency as a covariate.
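Once the sign convention is fixed, the responder rule becomes a single comparison. In this sketch, change is follow-up minus baseline (negative values are reductions), so a responder has a change at or below the negative threshold. Missing values count as non-responders, matching the “missing as non-responder” approach reported in the Results. The covariate-adjusted logistic regression is not reproduced, and the function names and data are hypothetical.

```python
# Primary thresholds from the pooled "moderately better" anchor group (VMS/day).
THRESHOLDS = {"week 4": -5.73, "week 12": -6.20}


def is_responder(change, threshold):
    """A reduction at least as large as the threshold counts as a response.

    change: follow-up minus baseline VMS/day (negative = fewer VMS), or
    None when missing; missing values are classified as non-response.
    """
    return change is not None and change <= threshold


def responder_proportions(changes, threshold):
    responders = sum(is_responder(c, threshold) for c in changes)
    observed = sum(c is not None for c in changes)
    return {
        "observed_cases": responders / observed,                # non-missing only
        "missing_as_non_responder": responders / len(changes),  # all participants
    }


# Hypothetical week 4 changes for five participants (one missing).
week_4_changes = [-8.0, -5.73, -5.0, None, -7.1]
rates = responder_proportions(week_4_changes, THRESHOLDS["week 4"])
```

Because missing participants stay in the denominator under the second rule, that proportion can never exceed the observed-case proportion.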

Results

Patient Demographics

The pooled population of SKYLIGHT 1 and 2 consisted of 1022 women who were randomized and received at least one dose of study drug (placebo n = 342, fezolinetant 30 mg n = 339, fezolinetant 45 mg n = 341) (Table 1). These women had a mean (SD) age of 54.3 (5.0) years, and most were White (81.1%). Demographics were generally balanced across groups, although mean time since onset of hot flashes was slightly longer in the placebo group (81.9 months) compared with the fezolinetant 30-mg (76.7 months) and 45-mg (76.9 months) groups.

Table 1 Key participant demographics and baseline characteristics (full analysis set)

VMS Frequency and PGI-C VMS

In the overall pooled population, the mean (SD) number of moderate-to-severe VMS episodes/day was 11.02 (5.30) at baseline and reduced to 6.22 (5.60) and 5.23 (5.40) at weeks 4 and 12, respectively, which corresponds to mean (SD) changes from baseline of − 4.86 (4.64) and − 5.82 (5.40) at weeks 4 and 12, respectively (Table 2). Similar data were reported in the pooled sub-sample populations (Table 2) and separately in SKYLIGHT 1 and SKYLIGHT 2 (Table S2).

Table 2 Moderate-to-severe VMS frequency at baseline, week 4, and week 12, and change from baseline to week 4 and week 12 in the pooled population

In the overall pooled population, greater proportions of participants in the fezolinetant 30-mg and 45-mg groups relative to placebo reported an improvement (“much better,” “moderately better,” or “a little better”) in PGI-C VMS at weeks 4 and 12 compared with baseline (Table 3). The proportions of participants with an improvement in PGI-C VMS were 81.6% in the fezolinetant 30-mg group and 85.6% in the fezolinetant 45-mg group versus 61.4% in the placebo group at week 4, and 84.7% in the fezolinetant 30-mg group and 91.1% in the fezolinetant 45-mg group versus 66.2% in the placebo group at week 12. The association between response and treatment group (fezolinetant 30-mg and 45-mg groups versus placebo) had a P value of < 0.001 at weeks 4 and 12. Similar data were reported in the individual phase 3 studies (Tables S3 and S4).

Table 3 PGI-C VMS categorical responses at week 4 and week 12 in the pooled population

Meaningful Within-Patient Change

Correlations of change scores between moderate-to-severe VMS frequency and PGI-C VMS exceeded the minimum magnitude of correlation criterion of > 0.37 in the overall pooled study sample; polyserial values were 0.55 at week 4 and 0.48 at week 12, and Spearman’s rank values were 0.59 at week 4 and 0.52 at week 12 (Table 4). At study level, the polyserial correlations in SKYLIGHT 1 and SKYLIGHT 2 were 0.55 and 0.56 at week 4 and 0.46 and 0.49 at week 12, respectively (Table S5).

Table 4 Correlation of change scores between moderate-to-severe VMS frequency and PGI-C VMS at weeks 4 and 12 in the pooled population

In the overall sample, using the primary anchor of “moderately better” improvement in PGI-C VMS, the mean (SD) and median (first quartile, third quartile) estimated thresholds for reporting a meaningful within-patient change in moderate-to-severe VMS frequency were − 5.73 (3.47) and − 5.79 (− 7.44, − 3.26) respectively at week 4 and − 6.20 (5.18) and − 6.28 (− 8.29, − 4.13) respectively at week 12 (Table 5).

Table 5 Meaningful within-patient change estimation statistics of change in moderate-to-severe VMS frequency by PGI-C VMS in the pooled population

The mean and median estimates were similar at both time points, and the first and third quartiles were roughly symmetric around the median, indicating that no extreme skewness in the data affected the mean estimates. The anchor-based estimates were slightly larger at week 12 than at week 4 (Fig. 1). At both weeks 4 and 12, the mean and median anchor-based threshold estimates were larger than the ROC estimates, which in turn were larger than the half-SD estimate. At study level, mean and median estimated thresholds from SKYLIGHT 1 and SKYLIGHT 2 are reported in Tables S6 and S7, respectively; SKYLIGHT 2 had slightly higher baseline mean VMS counts and variability than SKYLIGHT 1. The eCDF curves of the primary anchor measure response categories by VMS change scores at weeks 4 and 12 show well-spaced and well-ordered patterns for the three improvement response groups and the “no change” group, supporting the anchor-based method and its estimates (Fig. S1).
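For scale, suppose the distribution-based estimate is half the pooled baseline SD of moderate-to-severe VMS frequency reported for the overall population (SD 5.30). This is an assumption, since the exact SD used is not restated here. Under it, the half-SD estimate is less than half the magnitude of either mean anchor-based threshold:

```latex
\[
\tfrac{1}{2}\,\mathrm{SD}_{\text{baseline}} = 0.5 \times 5.30 = 2.65 \ \text{VMS/day},
\qquad 2.65 < \lvert -5.73 \rvert \ (\text{week } 4), \quad 2.65 < \lvert -6.20 \rvert \ (\text{week } 12).
\]
```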

Fig. 1

Meaningful within-patient change triangulation plot for moderate-to-severe VMS frequency in the pooled population. PGI-C Patient Global Impression of Change, ROC receiver operating characteristic, SD standard deviation, VMS vasomotor symptoms

VMS Frequency Responder Analyses

The responder analyses applied the primary threshold estimates based on a PGI-C VMS “moderately better” response in the pooled sample at week 4 (mean value − 5.73 VMS/day) and week 12 (mean value − 6.20 VMS/day), as described in the Methods. Among participants with a non-missing value at the analysis visit, the proportions of responders at week 12 were greater in the fezolinetant 30-mg (50.0%; n = 133/266) and 45-mg (55.1%; n = 161/292) groups than in the placebo group (31.4%; n = 88/280) [27]. When a “missing as non-responder” imputation method was used, greater proportions of responders were observed in the fezolinetant 30-mg (42.8%; n = 145/339) and 45-mg (46.6%; n = 159/341) groups than in the placebo group (24.0%; n = 82/342) at week 4. The corresponding ORs (95% CI) for being a VMS frequency responder versus placebo were 2.48 (1.78–3.47; P < 0.001) for fezolinetant 30 mg and 2.90 (2.09–4.07; P < 0.001) for fezolinetant 45 mg. Similarly, at week 12, greater proportions of participants experienced a meaningful reduction in VMS frequency in the fezolinetant 30-mg (38.9%; n = 132/339) and 45-mg (47.2%; n = 161/341) groups than in the placebo group (25.7%; n = 88/342). The corresponding ORs (95% CI) versus placebo were 1.90 (1.36–2.65; P < 0.001) and 2.68 (1.94–3.74; P < 0.001) for fezolinetant 30 mg and 45 mg, respectively.
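The “missing as non-responder” proportions can be checked directly from the reported counts. The sketch also computes crude (unadjusted) odds ratios. These are not the reported ORs, which come from a logistic regression adjusted for protocol, smoking status, and baseline VMS frequency, so they will differ somewhat (for example, about 2.37 versus the reported 2.48 for fezolinetant 30 mg at week 4). The dictionary layout and function names are illustrative.

```python
# Responder counts reported above (missing counted as non-responder):
# (responders, participants analysed) per time point and group.
COUNTS = {
    ("week 4", "placebo"): (82, 342),
    ("week 4", "fezolinetant 30 mg"): (145, 339),
    ("week 4", "fezolinetant 45 mg"): (159, 341),
    ("week 12", "placebo"): (88, 342),
    ("week 12", "fezolinetant 30 mg"): (132, 339),
    ("week 12", "fezolinetant 45 mg"): (161, 341),
}


def percent(responders, n):
    """Responder percentage rounded to one decimal place, as reported."""
    return round(100 * responders / n, 1)


def crude_odds_ratio(active, placebo):
    """Unadjusted odds ratio from a 2 x 2 table (no covariates)."""
    (a, n_a), (p, n_p) = active, placebo
    return (a / (n_a - a)) / (p / (n_p - p))


for week in ("week 4", "week 12"):
    for dose in ("fezolinetant 30 mg", "fezolinetant 45 mg"):
        active, placebo = COUNTS[(week, dose)], COUNTS[(week, "placebo")]
        print(f"{week}, {dose}: {percent(*active)}% vs {percent(*placebo)}%, "
              f"crude OR {crude_odds_ratio(active, placebo):.2f}")
```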

Discussion

Understanding patients’ perspectives on their condition and on the benefits of treatment is valuable both to regulatory health authorities and to healthcare professionals in routine care. In this pre-specified analysis of pooled data from SKYLIGHT 1 and 2, treatment with fezolinetant led to approximately five to six fewer moderate-to-severe VMS episodes/day at weeks 4 and 12 compared with baseline. The PGI-C VMS data at weeks 4 and 12 indicated that a greater proportion of participants treated with fezolinetant than with placebo reported improvement. Threshold estimates based on a “moderately better” PGI-C VMS response as the anchor for meaningful within-patient change indicate that a reduction of approximately six VMS episodes/day represents a meaningful improvement to patients (mean primary threshold values for meaningful within-patient change were − 5.7 at week 4 and − 6.2 at week 12). In addition, the odds of achieving meaningful within-patient change in VMS frequency were higher with both fezolinetant doses than with placebo at weeks 4 and 12. Together, these findings quantify the improvement in VMS frequency that patients with moderate-to-severe VMS consider meaningful and show that more participants reached it with fezolinetant than with placebo at both week 4 and week 12.

Our analyses show that PGI-C VMS is an appropriate anchor measure for defining meaningful within-patient change in VMS frequency. In the overall pooled sample, correlations between PGI-C VMS ratings and change from baseline in the frequency of moderate-to-severe VMS were 0.55–0.59 at week 4 and 0.48–0.52 at week 12. These estimates exceeded the minimum correlation criterion (> 0.37) required to support the appropriateness of a candidate anchor measure [25, 26]. The robustness of these findings is supported by the agreement between the polyserial and Spearman’s rank correlations and by their consistency across samples (i.e., the pooled population, SKYLIGHT 1 and SKYLIGHT 2, and the sub-samples).

There is currently no consensus on the best approach to assess treatment outcomes or symptom changes from the patient’s perspective in moderate-to-severe VMS. PGI-C VMS is based on the generic PGI-C scale but adapted for VMS: it asks patients to rate their hot flushes/night sweats at a given time point compared with the beginning of the study on a seven-point Likert scale from “much better” to “much worse.” The PGI-C is an easy-to-use, self-reported global rating measure that can help evaluate a patient’s assessment of efficacy relative to anchor points across multiple conditions [28, 29]. The PGI-C VMS is consistent in construction with the global rating items included in nearly all clinical trials to capture patients’ perceptions of their current experience of, or change from baseline in, disease or symptom severity. Anchoring objective VMS frequency to the subjective PGI-C VMS establishes the patient-reported meaningfulness of a change in VMS frequency. The key objective of this analysis was to define and apply a threshold for meaningful within-patient change in order to relate the measured changes in VMS frequency with fezolinetant to the changes that patients themselves perceive. In our study, the meaningful within-patient change threshold represents the smallest change in an outcome score that is considered meaningful at the individual level; in this context, “meaningful within-patient change” is considered equivalent to similar terms used in the literature. This approach is in line with FDA guidance [30], which states that “it is important to understand how [a clinical outcome assessment or patient reported outcome]-based endpoint corresponds to changes relevant to patients (e.g., the type and extent of change that is meaningful to patients)”.
In our paper, the focus is on the derivation of thresholds that allow one to qualify a clinically meaningful change in VMS frequency at patient level (not placebo-adjusted), from baseline to weeks 4 and 12.

Over 30 years ago, the term “minimal clinically important difference” was defined as “the smallest difference in score in the domain of interest that patients perceive as beneficial and would mandate, in the absence of troublesome side effects and excessive costs, a change in the patient’s management” [31]. A number of additional terms have been introduced in the following years, including “minimally important difference,” “minimally important change,” “clinically important difference,” “minimally detectable difference,” and “minimum detectable change” [32]. The precise definitions of these alternative terms may differ across studies, but they all generally aim to quantify thresholds of change that are considered clinically relevant for either an individual or a group [32, 33]. Anchor-based and distribution-based methods are the two most common approaches for estimating meaningful/minimal changes in PROs [32, 34, 35]. We used an anchor-based method in line with the recommendations of Revicki et al., who proposed anchor-based methods to provide the primary estimates of an instrument’s meaningful within-patient change, with distribution-based methods providing supportive evidence or estimates when anchor-based estimates are unavailable [35]; accordingly, we used the distribution-based method as a supportive analysis.

A few prior publications have assessed clinically or minimally important differences in postmenopausal women with moderate-to-severe VMS, but they involved different treatments and different anchoring PRO measures from those in our study. In two studies of hormone therapy, weekly VMS severity [24] or weekly VMS frequency [36] was anchored to generic (not VMS-specific) CGI outcomes. Other reports of responder thresholds in moderate-to-severe VMS include VMS frequency anchored to CGI and the Menopause-Specific Quality of Life questionnaire in women treated with hormone therapy [23], VMS frequency and severity anchored to the Menopause Symptoms Treatment Satisfaction Questionnaire in women treated with desvenlafaxine [37], and VMS frequency anchored to the Hot Flash Related Daily Interference Scale/Hot Flash Interference Scale in women treated with escitalopram [38].

Additional clinically important changes from the patient’s perspective have been observed with fezolinetant in other PRO data from the SKYLIGHT 1 and 2 studies. Statistically significant improvements from baseline in Menopause-Specific Quality of Life total score and VMS domain score were observed at weeks 4 and 12 in both the fezolinetant 30-mg and 45-mg groups compared with the placebo group [15, 16]. In addition, both fezolinetant doses produced numerical improvements in sleep scores, measured by the Patient-Reported Outcomes Measurement Information System (PROMIS) Short Form v1.0 Sleep Disturbance 8b total scores [15, 16]. Overall, these data indicate that fezolinetant improves different aspects of HRQoL from as early as week 4 in women with moderate-to-severe VMS.

PROs are routinely collected in clinical trials and are used alongside objective clinical outcomes to assess the overall impact of a medical treatment or intervention [39]. Data derived from PROs provide important evidence for the impact of treatment on patient-reported symptoms and HRQoL [39, 40]. The use of PROs is encouraged by the major international health policy and regulatory authorities (e.g., the European Medicines Agency and the FDA), and findings from a validated PRO can support clinical decision-making, pharmaceutical labeling claims, product reimbursement, and appraisals by health technology assessment bodies and payers [18, 40]. Perhaps most importantly, PROs capture the patient’s perspective, allowing healthcare professionals and patients to make better-informed healthcare decisions.

Strengths of this analysis include the use of data from a large pooled population of approximately 1000 patients with moderate-to-severe VMS from two phase 3, randomized, double-blind, placebo-controlled studies, and the estimation of meaningful within-patient change at two time points. Another strength is the evaluation of the plausibility of the threshold estimates using several supportive analyses. A limitation of our study is that the participants’ average weight and BMI were relatively high. Although this is consistent with several other studies conducted in menopausal women, the frequency and severity of VMS are recognized to be greater in women with a higher BMI. A further limitation is that the PGI-C VMS instrument may be subject to recall bias. Additionally, because the data were collected in a clinical trial setting, these findings may not generalize fully to routine clinical practice.

Conclusion

VMS are bothersome for most women undergoing the menopausal transition, but few large clinical trials have reported patients’ perceptions of meaningful symptomatic improvement. This analysis of pooled data from the SKYLIGHT 1 and 2 trials demonstrates that PGI-C VMS is sensitive to change and correlates with change in VMS frequency, and that a reduction of approximately six VMS per day is a meaningful improvement for patients with moderate-to-severe VMS. Applying this threshold in a responder analysis showed greater odds (ORs of up to approximately three) of achieving a meaningful within-patient reduction in VMS with fezolinetant 45 mg than with placebo at weeks 4 and 12. These analyses may support interpretation of the fezolinetant data by different stakeholders, including but not limited to clinicians and patients. Overall, these data support the position that fezolinetant provides a meaningful clinical benefit for women who have VMS associated with menopause and that it represents an important nonhormonal treatment option.