Key Summary Points

Why carry out this study?

Vasomotor symptoms (VMS) associated with menopause can negatively affect health-related quality of life (HRQoL). The Menopause-Specific Quality of Life (MENQOL) questionnaire has been developed to assess QOL specific to menopause.

To support use of MENQOL in studies of novel therapies for VMS, it is important to evaluate its properties and establish a definition of meaningful change (i.e., treatment response).

What was the hypothesis of the study?

The objective of the current study was to assess the psychometric properties, sensitivity to change, and clinically meaningful within-patient change of the MENQOL using data from the SKYLIGHT 1 and 2 studies, featuring fezolinetant, a novel non-hormonal neurokinin 3 receptor antagonist approved for treatment of VMS due to menopause.

What was learned from the study?

Overall, the results provide evidence of acceptable psychometric properties of the MENQOL overall and domain scores using 1-week recall, supporting use of this instrument to capture experiences among people with moderate-to-severe VMS associated with menopause and assess related endpoints in clinical trials.

These results support the internal consistency reliability, convergent validity, and structural validity of the MENQOL found in other populations such as breast cancer survivors and participants with diabetes.

Introduction

Menopause is often accompanied by bothersome vasomotor symptoms (VMS), particularly hot flashes and night sweats [1,2,3]. Hormone therapy is an effective VMS treatment but is contraindicated in women with a history of breast or endometrial cancer, coronary heart disease, venous thromboembolism, or stroke [4]. For some, therefore, non-hormonal treatment may be preferable to hormone therapy.

Fezolinetant is a novel selective non-hormonal neurokinin 3 receptor antagonist approved by the US Food and Drug Administration (FDA) for treatment of moderate-to-severe VMS due to menopause and by the European Medicines Agency for treatment of moderate-to-severe VMS associated with menopause. In phase 2 placebo-controlled studies in participants aged 40–65 years, fezolinetant significantly reduced the frequency and severity of moderate-to-severe VMS and improved a range of patient-reported outcomes (PROs) [5, 6]. Consequently, two phase 3 studies—SKYLIGHT 1 and 2—investigated efficacy and safety of fezolinetant.

During SKYLIGHT 1 and 2, quality of life (QOL) was assessed using a range of PRO measures, including the Menopause-Specific Quality of Life (MENQOL) questionnaire, which assesses QOL specific to menopause. Other PROs that were not menopause specific were the Patient-Reported Outcomes Measurement Information System Sleep Disturbance–Short Form 8b (PROMIS SD SF 8b), the Patient Global Impression of Change in Sleep Disturbance (PGI-C SD), and the Patient Global Impression of Severity in Sleep Disturbance (PGI-S SD). To support use of MENQOL in studies of novel therapies for VMS, it is important to evaluate its psychometric properties and sensitivity to change and establish a definition of meaningful change (i.e., treatment response). On the basis of guidance issued by the US FDA [7, 8], we assessed these properties using pooled data from SKYLIGHT 1 and 2 in participants diagnosed with menopause-associated moderate-to-severe VMS.

Methods

Study Design and Patients

SKYLIGHT 1 (NCT04003155) and 2 (NCT04003142) were phase 3, randomized, placebo-controlled, double-blind studies with identical designs. These studies were conducted in accordance with the Declaration of Helsinki, Good Clinical Practice, and International Council for Harmonisation guidelines. An independent ethics committee or institutional review board reviewed the ethical, scientific, and medical appropriateness of the study at each site before data collection. Individuals aged ≥ 40 to ≤ 65 years with moderate-to-severe VMS (minimum average seven hot flashes/day) who were female at birth were randomized to once-daily fezolinetant 30 mg, fezolinetant 45 mg, or placebo (1:1:1) for 12 weeks. All women had spontaneous amenorrhea for at least 12 consecutive months, spontaneous amenorrhea for at least 6 months with biochemical criteria of menopause (follicle-stimulating hormone > 40 IU/L), or bilateral oophorectomy at least 6 weeks before the screening visit (with or without hysterectomy), and all had a BMI of 18–38 kg/m². A 40-week open-label extension followed, in which all participants received active treatment (individuals initially randomized to placebo were re-randomized to fezolinetant 30 mg or 45 mg, and fezolinetant-treated individuals continued their original dose). Full details of the study designs and inclusion/exclusion criteria have been published [9, 10]. A total of 1022 women were randomized and received ≥ 1 dose of study drug across both studies (placebo, n = 342; fezolinetant 30 mg, n = 339; fezolinetant 45 mg, n = 341). Mean (standard deviation [SD]) age was 54.3 (5.0) years, and the majority of the women were White (828 [81.1%]). Demographic data were largely balanced across groups, although mean (range) time since onset of VMS was slightly longer in the placebo group (81.9 [2–422] months) versus the fezolinetant 30 mg (76.7 [3–370] months) and 45 mg (76.9 [1–396] months) groups.

MENQOL Questionnaire

The MENQOL questionnaire is a self-reported measure that assesses QOL specific to menopause [11, 12]. Respondents are asked whether they have experienced any of the 29 symptoms within the past week and to rate how bothersome each symptom was on a 7-point Likert scale (0 = not at all bothered; 6 = extremely bothered). The 29 items are combined into four domains: vasomotor (three items), psychosocial (seven items), physical (16 items), and sexual (three items). For analysis, the questionnaire score becomes 1 for no; 2 for yes, not bothered through to 8 for yes, extremely bothered. The score by domain is the mean of the converted item scores forming that domain and ranges from 1 to 8 [12].
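The scoring rule described above can be illustrated with a minimal Python sketch. It is not the study's analysis code (the analyses were run in SAS). The item-to-domain layout (items 1–3, 4–10, 11–26, and 27–29) is an assumption inferred from the item counts per domain, and the function names are hypothetical.

```python
from statistics import mean

# Assumed item layout (1-based item numbers), inferred from the item counts
# per domain; check against the instrument's scoring manual before use.
DOMAINS = {
    "vasomotor": range(1, 4),      # 3 items
    "psychosocial": range(4, 11),  # 7 items
    "physical": range(11, 27),     # 16 items
    "sexual": range(27, 30),       # 3 items
}


def convert_item(experienced, bother=None):
    """Map one raw response to the 1-8 analysis score.

    'No' (symptom not experienced) scores 1; 'yes' with a bother rating
    of 0..6 scores 2..8.
    """
    if not experienced:
        return 1
    if bother is None or not 0 <= bother <= 6:
        raise ValueError("bother must be 0..6 when the symptom was experienced")
    return bother + 2


def domain_scores(responses):
    """Return the mean converted score for each domain.

    responses: dict mapping item number -> (experienced, bother).
    """
    converted = {item: convert_item(*resp) for item, resp in responses.items()}
    return {
        name: mean(converted[item] for item in items)
        for name, items in DOMAINS.items()
    }


# Hypothetical respondent: bothered only by two vasomotor items.
example = {item: (False, None) for item in range(1, 30)}
example[1] = (True, 6)  # yes, extremely bothered -> 8
example[2] = (True, 0)  # yes, not bothered -> 2
scores = domain_scores(example)
```

Because "no" maps to 1 and bother ratings shift up by 2, every domain mean stays within the 1 to 8 range stated above.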

Additional PRO Measures

Eight other PRO measures included in the SKYLIGHT 1 and 2 studies were used to evaluate the MENQOL questionnaire: PRO Measurement Information System Sleep-Related Impairment–Short Form 8a (PROMIS SRI SF 8a); PROMIS Sleep Disturbance–Short Form 8b (PROMIS SD SF 8b); VMS episodes captured using an electronic diary; Patient Global Impression of Severity Sleep Disturbance (PGI-S SD); Patient Global Impression of Change Sleep Disturbance (PGI-C SD); Patient Global Impression of Change Vasomotor Symptoms (PGI-C VMS); EuroQoL 5-dimension 5-level (EQ-5D-5L) questionnaire, including the EQ visual analog scale (VAS); and Work Productivity and Activity Impairment questionnaire specific to Vasomotor Symptoms (WPAI-VMS). Further details on these instruments are included in Supplementary Table S1.

PRO assessments were completed at baseline and weeks 4 and 12 during the treatment period, except for PGI-C assessments, which were completed at weeks 4 and 12 only, as PGI-C measures change from baseline.

Descriptive Analyses

All PRO analyses were performed on the full analysis set (all randomized participants who received ≥ 1 dose of study drug). Completion rates were the number of participants with a completed item entry at each clinic visit divided by the total number in the full analysis set.

Descriptive analysis was performed at item and score level using pooled arms at baseline, week 4, and week 12 for MENQOL items. At each timepoint, the number (%) of participants with completed responses for each item was recorded and descriptive statistics generated for each item and for domain and total scores. Floor (> 20% of responses for the lowest/least severe option) and ceiling (> 20% of responses for the highest/most severe option) effects were investigated.
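As an illustration of the floor and ceiling criterion, the following Python sketch flags an item when more than 20% of responses fall on the lowest or highest option. The function name and the example data are hypothetical; the 1 and 8 bounds follow the converted MENQOL item scale.

```python
def floor_ceiling(scores, lowest=1, highest=8, threshold=0.20):
    """Flag floor/ceiling effects for one item.

    scores: converted item scores (1-8) from participants who completed the item.
    An effect is flagged when the share of responses at the extreme option
    exceeds the threshold (20% by default).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no completed responses")
    floor_share = sum(s == lowest for s in scores) / n
    ceiling_share = sum(s == highest for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical item: 3 of 10 responses at the lowest option, 1 at the highest.
result = floor_ceiling([1, 1, 1, 2, 3, 4, 5, 6, 7, 8])
```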

Psychometric Evaluation

Confirmatory factor analysis (CFA) was conducted on MENQOL using a second-order model including four factors, as previously described [13], with three latent factors represented for vasomotor, psychosocial, and sexual domains, with physical domain defined as the observed factor using the physical score (i.e., mean score of 16 domain items). These factors were loaded on the second-order latent factor representing the overall score (main construct). The maximum likelihood method was used; this method assumes that input data are multivariate normal, which may be approximately true for average scores using ordinal data with eight categories. CFA produced several goodness-of-fit measures or fit indices to evaluate the model, including the standardized root mean square residual, root mean square error of approximation, and comparative fit index. McDonald’s omega was also calculated as a measure of internal consistency and reliability.

Internal consistency was assessed using Cronbach’s alpha coefficient (values ≥ 0.70 indicate acceptable reliability [14]); 95% confidence intervals (CI) were calculated for the total score and four MENQOL domains at baseline and week 12. Alpha-if-item-deleted results were calculated.
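For readers less familiar with these statistics, a minimal Python sketch of Cronbach's alpha and the alpha-if-item-deleted values is given below. It uses the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), on complete cases only. It is not the study's SAS code, and it omits the 95% CI calculation.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete-case data.

    rows: list of per-participant lists of item scores (same length, k >= 2).
    """
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]  # sample variance per item
    total_var = variance([sum(row) for row in rows])   # variance of the sum score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item left out in turn (needs k >= 3)."""
    k = len(rows[0])
    return [
        cronbach_alpha([row[:j] + row[j + 1:] for row in rows])
        for j in range(k)
    ]


# Hypothetical data: three perfectly consistent items for four participants.
data = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(data)
dropped = alpha_if_item_deleted(data)
```

An item whose removal raises alpha noticeably would be a candidate for review; values that barely move, as in this toy example, suggest each item contributes consistently.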

Test–retest reliability was assessed using a two-way mixed, absolute agreement, single measure intraclass correlation coefficient (ICC) at baseline (test) and week 4 (retest). ICC values of 0.50–0.90 and > 0.90 represent moderate-to-good and excellent reliability, respectively [15].
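The ICC form named above (two-way, absolute agreement, single measure) can be computed from the mean squares of a two-way layout. The Python sketch below shows the point estimate only, for complete test and retest pairs; the function name is hypothetical and the selection of stable participants is assumed to have been done beforehand.

```python
def icc_absolute_single(pairs):
    """ICC for a two-way model, absolute agreement, single measure.

    pairs: list of per-participant score lists, e.g. [baseline, week_4].
    Uses ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + (k / n) * (MSC - MSE)).
    """
    n = len(pairs)      # participants (rows)
    k = len(pairs[0])   # timepoints (columns)
    grand = sum(sum(row) for row in pairs) / (n * k)
    row_means = [sum(row) / k for row in pairs]
    col_means = [sum(col) / n for col in zip(*pairs)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in pairs for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical data: retest is exactly 1 point higher than test for everyone.
shifted = [[1, 2], [2, 3], [3, 4], [4, 5]]
icc_shifted = icc_absolute_single(shifted)
```

In the toy data the ranking of participants is perfectly preserved, yet the coefficient is below 1 because absolute agreement penalizes the systematic 1-point shift between timepoints.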

Spearman correlation coefficients were used to evaluate convergent validity between MENQOL overall and domain scores and other PROs. At least moderate correlations (r > 0.30) were expected between overall and domain scores of similar constructs. MENQOL assessed QOL specific to menopause, which was predicted to be moderately associated with VMS (frequency and severity), sleep disturbance and related impairments (PGI-S SD, PROMIS SD SF 8b, PROMIS SRI SF 8a scores), overall health (EQ VAS), as well as VMS-related work productivity and activity (WPAI scores). Known-groups validity was evaluated by comparing mean MENQOL overall and domain scores across groups defined using EQ VAS and VMS severity. Analysis of variance with orthogonal planned comparisons was used to test hypotheses that MENQOL scores differ significantly between adjacent quartile groups defined using EQ VAS scores at baseline. The known groups served as the independent variable, and MENQOL scores were dependent variables in the analysis of variance with alpha = 0.05.

Sensitivity to change was evaluated using Spearman correlations and analysis of covariance. Changes in PRO scores from baseline to week 12 were correlated with changes in MENQOL scores. Concurrent improvement in PRO measures was expected to result in moderate-to-strong correlations.

Changes in MENQOL overall and domain scores from baseline to week 12 were assessed using mean changes for improved versus non-improved participants and through significance testing using separate analysis of covariance models controlling for baseline with alpha = 0.05. Change from baseline to week 12 was calculated for MENQOL scores among EQ VAS response groups (response = improvement of ≥ 0.5 SD of baseline score) and PGI-C VMS response groups (responders = participants reporting better, much better, or very much better). The dependent variable was change from baseline at week 12, and the model included the responder or improvement factor as fixed.

Thresholds for meaningful within-patient change on MENQOL were estimated using anchor-based approaches together with distribution-based estimates and receiver operating characteristic (ROC) curves. These methods are consistent with guidance from the FDA for determining responder thresholds [7, 8].

Meaningful within-patient change was evaluated using the PGI-C VMS as an anchor (a suitable anchor typically has a correlation of ≥ 0.30 [16]). Spearman correlations between categories of response scores in PGI-C VMS (Supplementary Table S2) and MENQOL summary score response at weeks 4 and 12 were assessed, as were descriptive statistics for raw change. Using the PGI-C VMS anchor, meaningful within-subject change was defined as a PGI-C VMS score of + 2 (moderately better). Mean changes in MENQOL domain and total scores from baseline to weeks 4 and 12 were calculated for the following categories of change: + 3 or + 2 (much better or moderately better), + 1 (a little better), 0 (no change), − 1 (a little worse) and − 2 or − 3 (moderately worse or much worse) (Supplementary Table S2). Mean changes in MENQOL scores from baseline to weeks 4 and 12 were calculated for ≥ 1-point increase, no change (0 points), and ≥ 1-point decrease. Sensitivity and specificity were calculated and ROC curves derived using logistic regression analyses, with anchor change group variables collapsed into two groups: moderate or better improvement and minimal/no improvement or deterioration (Supplementary Table S3). Responder status was the dependent variable in each model, and change from baseline in MENQOL score was the independent variable.

The clinically meaningful threshold was defined by the change value corresponding to the cutoff in the ROC space that minimizes the sum of squares of (1 − sensitivity) and (1 − specificity) and is therefore closest to the top-left corner of the ROC space (false-positive rate 0, sensitivity 1) [17].
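The cutoff rule can be sketched in Python as follows. As a simplifying assumption, the sketch thresholds the MENQOL change score directly instead of fitting a logistic regression; with a single predictor the two approaches trace the same ROC curve, because the fitted probability is a monotone function of the change score. Improvement is taken to be a decrease in score, and all names and data are hypothetical.

```python
def closest_to_corner_cutoff(changes, is_responder):
    """Pick the change-score cutoff closest to the ideal ROC point.

    changes: change from baseline in MENQOL score (negative = improvement).
    is_responder: anchor-defined responder status (True/False) per participant.
    A participant is classified as a responder when change <= cutoff.
    Minimizes (1 - sensitivity)^2 + (1 - specificity)^2.
    """
    positives = sum(is_responder)
    negatives = len(is_responder) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both responder groups are required")

    best = None
    for cutoff in sorted(set(changes)):
        true_pos = sum(c <= cutoff and r for c, r in zip(changes, is_responder))
        true_neg = sum(c > cutoff and not r for c, r in zip(changes, is_responder))
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        distance_sq = (1 - sensitivity) ** 2 + (1 - specificity) ** 2
        if best is None or distance_sq < best[0]:
            best = (distance_sq, cutoff, sensitivity, specificity)

    return {"cutoff": best[1], "sensitivity": best[2], "specificity": best[3]}


# Hypothetical data in which responders all improved by at least 2 points.
changes = [-3.0, -2.5, -2.0, -1.0, -0.5, 0.0, 0.5]
responder = [True, True, True, False, False, False, False]
roc_result = closest_to_corner_cutoff(changes, responder)
```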

For the supportive distribution-based methods, statistical parameters included effect size (Cohen’s d), SD, and standard error of measurement (defined as SD·√[1 − r], where r is the internal consistency of the instrument). Effect sizes were defined as small (0.2), medium (0.5), or large (0.8) [18].
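The distribution-based quantities can be illustrated with a short Python sketch. The helper names and the input values are hypothetical; only the formulas (SEM = SD·√[1 − r] and the half-SD benchmark used elsewhere in the analysis) come from the text.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), with r the internal consistency (0..1)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def half_sd(sd):
    """Half a baseline standard deviation, a common distribution-based benchmark."""
    return 0.5 * sd


# Hypothetical inputs: baseline SD of 1.0 and reliability of 0.75.
sem_value = standard_error_of_measurement(1.0, 0.75)
half_sd_value = half_sd(1.0)
```

Higher reliability shrinks the SEM, so a highly consistent score yields a smaller distribution-based estimate for the same baseline SD.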

Statistical Analyses

Statistical comparisons were made using two-sided tests at the alpha = 0.05 level. For point estimates, 95% CIs were used. All analyses were performed using SAS version 9.3 or higher (SAS Institute, Cary, North Carolina, USA). See Fig. S1 in the electronic supplementary material for details of the CFA second-order model code.

Results

Completion Rates

Of the expected 1022 participants in SKYLIGHT 1 and 2, 99.5–99.7% had baseline data, with completion rates remaining high at 91.2% at week 4 and 84.6% at week 12. Compliance rates were > 94% at all timepoints.

Mean MENQOL Scores

The mean MENQOL total score at baseline was 4.30, improving to 3.25 at week 4 and 3.16 at week 12 (Fig. 1). All item mean scores demonstrated the greatest decrease from baseline to week 4, with smaller improvements between weeks 4 and 12 (Supplementary Table S4). The greatest symptom bother at baseline was reported for items in the vasomotor domain (mean range 6.41–6.58) and the individual item “difficulty sleeping” (mean 5.60). While large floor effects were observed for all other items at baseline (mean range 2.13–4.87), all improved at weeks 4 and 12 (Supplementary Table S4). For domain scores, baseline vasomotor score was the highest (mean 6.52), improving to 4.74 at week 4 and 4.36 at week 12. Lower mean scores were observed for the psychosocial, physical, and sexual domains, ranging from 3.37 to 3.69 at baseline, from 2.52 to 2.88 at week 4, and from 2.51 to 2.91 at week 12.

Fig. 1 MENQOL total score over time. MENQOL Menopause-Specific Quality of Life questionnaire, SD standard deviation

Psychometric Evaluation

Acceptable fit was observed for the second-order model (Table 1; Supplementary Fig. S2). Factor loadings were consistently high for each domain, ranging from 0.75 to 0.82 for vasomotor, 0.64–0.74 for psychosocial, and 0.67–0.88 for the sexual domain. Latent domain factor loadings and physical domain scores were also highly related to the general factor (range 0.43–0.93).

Table 1 Confirmatory factor analysis of MENQOL overall score at baseline (second-order model)

Overall, MENQOL showed a high degree of internal consistency (Cronbach’s alpha: baseline, 0.93; week 12, 0.94; Table 2). Internal consistency for the domain scores was also high at both timepoints (baseline, 0.83–0.90; week 12, 0.84–0.91). Furthermore, the coefficients for MENQOL scores when each item was individually deleted further supported inter-item consistency (MENQOL overall score—item omitted: baseline, 0.93–0.93; week 12, 0.71–0.91; Table 2).

Table 2 Internal consistency reliability analysis of MENQOL overall and domain scores

Correlations between items within the same domain at baseline were generally sufficient without suggesting redundancy (defined as r = 0.40–0.90), particularly for the vasomotor (r = 0.76–0.80), psychosocial (r = 0.42–0.67), and sexual domains (r = 0.67–0.84; Supplementary Table S5). However, some weaker correlations were observed for physical domain items 14–17 (difficulty sleeping, aches in back of neck or head, decrease in physical strength, and decrease in stamina), items 21–22 (increased facial hair and changes in appearance/texture/tone of skin), and items 24–26 (low backache, frequent urination, and involuntary urination when laughing/coughing), which ranged from 0.23 to 0.40. Excluding these weak correlations, the physical item correlations ranged from 0.40 to 0.88.

Moderate-to-strong item-total correlations were observed at baseline (r = 0.44–0.73) and week 12 (r = 0.40–0.73) for MENQOL overall score, with no redundant items (Table 3). Furthermore, strong correlations were observed between items within each of the domain scores, notably the relationship between items 1–3 and the vasomotor score at baseline (r = 0.72–0.73) and week 12 (r = 0.81–0.82; Table 3).

Table 3 Item-total correlation analysis of MENQOL overall and domain scores at baseline and week 12

Moderate test–retest reliability was observed for MENQOL overall score when PGI-C VMS was used to define stable participants (ICC 0.71) as well as the domain scores (ICC 0.61–0.71) (Table 4). When EQ VAS was used to define stable participants, test–retest reliability was moderate for the physical and sexual domain scores (ICC 0.53 and 0.65, respectively) but lower for other domain scores (ICC 0.21–0.47; Table 4).

Table 4 Test–retest reliability analysis of MENQOL overall and domain scores among stable patients from baseline to week 4 in PGI-C VMS and EQ VAS scores

MENQOL overall scores at baseline moderately correlated (r > 0.30) with PROMIS SRI SF 8a, PROMIS SD SF 8b, PGI-S SD, EQ VAS, and WPAI activity impairment, presenteeism, and overall work productivity loss scores (Table 5). Weak correlations were observed for frequency and severity of VMS (r = 0.09 and 0.18, respectively) and for WPAI absenteeism (r = 0.20). For the vasomotor domain score, correlations were moderate with PROMIS SD SF 8b, PGI-S SD, and WPAI scores except absenteeism (absolute r = 0.34–0.43). Correlations between vasomotor domain score and frequency and severity of VMS (r = 0.21 and 0.23, respectively) were lower, as were correlations with PROMIS SRI SF 8a, EQ VAS, and WPAI absenteeism (absolute r = 0.02–0.30). Similar patterns of moderate correlations were observed for the psychosocial and physical domain scores between the same measures (absolute r = 0.35–0.46 and 0.32–0.48, respectively), except for notably higher correlations for PROMIS SRI SF 8a and EQ VAS (absolute r = 0.42–0.61). Similar to overall scores, weak correlations were found for VMS frequency and severity and WPAI absenteeism with psychosocial and physical scores (absolute r = 0.04–0.19 and 0.04–0.19, respectively). The MENQOL sexual domain score had low correlations with all PRO measures (absolute r = 0.04–0.26) (Table 5).

Table 5 Convergent validity: correlations between MENQOL overall and domain scores and assessments of related constructs at baseline

In the known-groups validity analysis, there were significant differences in MENQOL overall and domain scores across EQ VAS quartiles and VMS severity groups at baseline (all p ≤ 0.012), except for vasomotor score using EQ VAS quartiles and sexual scores using VMS severity (Table 6). MENQOL overall scores were significantly different between adjacent EQ VAS quartile and VMS severity groups (all p ≤ 0.013).

Table 6 Known-groups validity: analysis of mean MENQOL overall and domain scores by EQ VAS quartile groups at baseline

Moderate correlations (r > 0.30) were observed for the MENQOL overall change score (r = 0.37–0.50) and vasomotor domain change score (r = 0.38–0.60) with PGI-S SD, PGI-C SD, PGI-C VMS, PROMIS SRI SF 8a, PROMIS SD SF 8b, and frequency of VMS (Table 7). Moderate correlations were also observed for change from baseline in MENQOL psychosocial domain score with PGI-S SD, PGI-C VMS, PROMIS SRI SF 8a, and PROMIS SD SF 8b (r = 0.30–0.46) and change from baseline in physical score with PGI-S SD, PROMIS SRI SF 8a, PROMIS SD SF 8b, and EQ VAS (absolute r = 0.32–0.49; Table 7).

Table 7 Sensitivity to change: correlations between MENQOL overall and domain scores and PRO variables change from baseline to week 12

Moderate correlations were found between the anchor and MENQOL overall score (r = 0.47–0.48) and vasomotor domain score (r = 0.60–0.60; Supplementary Table S6). Anchor support was demonstrated for the psychosocial score, although the correlations were lower (r = 0.30–0.30). Changes in physical and sexual scores were weakly correlated with PGI-C VMS (r = 0.28–0.29 and 0.18–0.18, respectively).

For results with sufficient anchor correlations (area under the curve [AUC] > 0.70), AUCs for the MENQOL overall and domain scores ranged from 0.73 to 0.81 (Supplementary Table S7). Psychosocial score AUCs were, however, below the recommended value of 0.70 (week 4, 0.65; week 12, 0.66). Thresholds for overall score were − 1.08 and − 0.91 at weeks 4 and 12 and for the vasomotor domain were − 2.00 at both timepoints. The threshold for the psychosocial score was − 0.71 at both timepoints, although AUCs were low. The physical and sexual domains showed low anchor correlations and AUCs (Supplementary Table S7).

For the MENQOL overall score (score range 1–8), anchor-based mean change estimates at weeks 4 and 12 were − 1.05 and − 1.19, respectively, using the PGI-C VMS anchor (median − 0.98 and − 1.19; Table 8). ROC analyses provided support for the lower end of this range, even when only using the largest ROC estimates; the smallest threshold arising from the ROC analyses was − 0.91 points. Therefore, a threshold of 0.90 points based on the lowest anchor-based median and ROC estimates was selected for MENQOL overall score.

Table 8 Mean change in MENQOL overall and domain scores within PGI-C VMS categories of change from baseline to weeks 4 and 12

Vasomotor score anchor-based mean change estimates at weeks 4 and 12 were larger than for overall score (− 1.91 and − 2.02, respectively; median − 2; score range 1–8; Table 8). Again, ROC analyses provided support for the lower end of this range, with thresholds estimated at − 2.0 points. The same assessment for vasomotor score led to a threshold of 2.0 points, above the distribution-based estimate (0.5 SD at baseline) and the largest estimate for the “no change” anchor category (0.75). Thresholds for psychosocial score at weeks 4 and 12 were − 0.83 and − 0.87, respectively, for the anchor-based mean change estimates (median − 0.43 and − 0.71; score range 1–8); the ROC estimate was − 0.71, resulting in a threshold of 0.9 for the psychosocial score, in line with distribution-based and “no change” estimates (0.5 SD at baseline) as well as participants reporting “no change” (− 0.45).

Discussion

This analysis was designed to evaluate the psychometric properties, sensitivity to change, and clinically meaningful within-patient change of the MENQOL questionnaire in individuals experiencing moderate-to-severe VMS related to menopause who were treated with fezolinetant. MENQOL completion rates were high, ranging from 99.5% at baseline to 84.6% at week 12. Overall, the results provide evidence of acceptable psychometric properties of the MENQOL overall and domain scores using 1-week recall, supporting use of this instrument to capture experiences among people with moderate-to-severe VMS associated with menopause and assess related endpoints in clinical trials.

At baseline, greatest symptom bother was reported for vasomotor domain items (rated moderately to highly bothersome), and symptom improvement was seen at weeks 4 and 12. Baseline scores for psychosocial, physical, and sexual domains were low to moderate relative to vasomotor scores. Correlations between items were generally moderate (r > 0.4), with some weak correlations between physical domain items. Correlations between each item and the overall and domain scores (omitting that item) were also moderate to high, with no redundant items. CFA provided additional support for the established MENQOL domain structure, including overall score. The second-order model demonstrated acceptable fit and generally strong relationships between the items, domains, and overall score.

Internal consistency of the MENQOL overall and domain scores was supported using Cronbach’s alpha and McDonald’s omega, and MENQOL construct validity was supported for overall and domain scores. Overall and domain scores differentiated well between groups defined by EQ VAS severity at baseline except for vasomotor scores. Additionally, MENQOL scores between VMS severity groups were significantly different at baseline and week 12 except for sexual scores at baseline. Adequate convergent validity was generally demonstrated by moderate correlations between overall and domain scores and PRO measures of related constructs, although some weak correlations were observed. Longitudinal analysis using two timepoints provided support for sensitivity to change (baseline to week 12) and test–retest reliability (baseline to week 4; ICC 0.50–0.90) except for sexual domain scores. Test–retest reliability was lower using EQ VAS than PGI-C VMS, possibly because the EQ VAS is a general health measure, while the PGI-C VMS is symptom specific and more related to MENQOL overall and domain scores.

Thresholds for defining clinically important responses were estimated on the basis of within-subject change on each scale score. These data were triangulated considering the range of the scale, sufficient anchor correlations, the smallest improvement exceeding the 0.5 SD at baseline, and lower 95% CI estimates for participants experiencing “no change” on the anchors. On the basis of FDA guidance, more consideration was allotted to anchor-based estimates [7, 8]. The analysis supports a MENQOL overall score reduction of ≥ 0.9 points as responding to treatment (a clinically important threshold). Thresholds of 2.0 points for the vasomotor domain and 0.9 for the psychosocial domain were proposed, in addition to distribution-based threshold estimates of 0.8 and 1.2 for the physical and sexual domains, respectively.

A few prior publications have assessed clinically or minimally important differences in postmenopausal women with moderate-to-severe VMS, but they used different PRO measures as anchors compared with our study and involved different treatments. In two hormone therapy studies, weekly VMS frequency [19] or severity [20] was anchored to generic Clinical Global Impression (CGI) outcomes that were not specific to VMS. Other publications that have reported responder thresholds in moderate-to-severe VMS include those reporting VMS frequency and severity anchored to the Menopause Symptoms Treatment Satisfaction Questionnaire in women treated with desvenlafaxine [21]; VMS frequency anchored to CGI and the MENQOL questionnaire in women treated with hormone therapy [22]; and VMS frequency anchored to the Hot Flash Related Daily Interference scale/Hot Flash Interference scale in women treated with escitalopram [23].

The current results support the good psychometric properties of the MENQOL (internal consistency reliability, convergent validity, and structural validity) found in other populations such as breast cancer survivors [24, 25] and participants with diabetes [26].

Limitations are primarily due to challenges associated with evaluating the MENQOL physical and sexual domains. Anchors for meaningful change were not optimal for these domains. However, as noted by Bushmakin et al. [13], an individual may experience some physical symptoms of menopause (such as difficulty sleeping) but not others (such as drying skin). Correlations of anchors with overall and vasomotor domain scores were between 0.47 and 0.60 and were lower for the psychosocial domain (0.30–0.30) because of different relationships between VMS and the domains. More precise estimates of meaningful change may be offered by targeted global impressions of change, focused on specific concepts of interest for the physical and sexual domains. The scales against which construct validity and responsiveness for MENQOL domains were examined were moderately related to MENQOL domains in general, providing additional support for acceptable measurement properties of MENQOL in this population.

Conclusion

These analyses confirm the measurement properties of the MENQOL questionnaire. Additionally, within-person clinically important response thresholds have been established using appropriate anchors and distribution-based methods. Overall, these results suggest MENQOL is fit for purpose to evaluate appropriate endpoints in trials investigating moderate-to-severe VMS associated with menopause.