Key Points for Decision Makers

A proportion of patients with hypertrophic cardiomyopathy (HCM) report limiting symptoms, including shortness of breath, tiredness, chest pain, palpitations, dizziness, and fainting.

This study assessed the validity of the HCM Symptom Questionnaire (HCMSQ), a new patient-reported outcome instrument designed to measure symptoms and treatment benefit from the perspective of patients with HCM.

The HCMSQ demonstrated acceptable longitudinal measurement properties, supporting its use as a reliable and valid tool to capture patients’ symptoms of HCM and changes in those symptoms associated with treatments for HCM.

1 Introduction

Hypertrophic cardiomyopathy (HCM) is a primary myocardial disorder defined by left ventricular hypertrophy that cannot be explained by another cardiac or systemic disease [1]. Although fewer than one in 3000 people receive a diagnosis of HCM globally, it is thought that between one in 200 and one in 500 people in the general population have the disorder [1,2,3,4]. HCM is classified as obstructive or nonobstructive based on the presence or absence of left ventricular outflow tract (LVOT) obstruction [3, 5, 6]. Approximately two-thirds of patients with HCM are affected by obstructive HCM [6], and it is estimated that 50% of all patients diagnosed with obstructive HCM have one or more symptoms related to the condition [7].

Recent qualitative and quantitative research resulted in the development of a conceptual model to gain an in-depth understanding of the cardinal signs, symptoms, and impacts of HCM [8]. Core HCM symptoms are: shortness of breath, especially with physical exertion; fatigue; chest pain; palpitations; dizziness; and fainting (syncope) [2, 3, 6, 8,9,10]. These symptoms have physical, emotional, and social impacts on patients, and contribute to significant impairment in quality of life [9, 11].

The primary treatment goal for HCM is to improve patients’ symptoms and health status [3, 12]. Given the nature of HCM symptoms and their impact on patients’ function and quality of life, it is critical to understand the effect of interventions on these symptoms. Data generated from a fit-for-purpose patient-reported outcome (PRO) instrument can provide robust and direct evidence of treatment benefit from the patient’s perspective [13]. Such data can be used by regulators to understand the benefit–risk profile of the intervention; by payer and health technology assessment agencies to understand differences between treatments; by healthcare professionals to identify interventions from which patients are likely to gain benefit; and by patients to engage in shared treatment decision-making [1, 14, 15]. For a PRO instrument to be considered ‘fit for purpose’ it must be well defined and generate reliable and interpretable data. Specifically, the PRO instrument must be content valid, and it must demonstrate adequate psychometric measurement properties in the context in which it will be used, including being a valid measure of the domains it seeks to quantify, being reproducible in stable patients, and being sensitive to clinical changes.

The HCM Symptom Questionnaire (HCMSQ) has recently been developed as a novel PRO instrument for measuring the symptoms of HCM, in line with US Food and Drug Administration (FDA) guidance for developing PROs [13, 16]. In brief, cognitive debriefing interviews and a card-sorting task were conducted in 33 and 22 patients with HCM, respectively, and baseline blinded data from two trials (EXPLORER-HCM and MAVERICK-HCM) were pooled (N = 299) to develop the scoring algorithm [16]. Measurement properties were examined, followed by a meaningful-change analysis to interpret scores. Rasch modeling, mixed-model repeated measures, exploratory factor analysis, confirmatory factor analysis, and missing-data simulation analysis informed the number of domains and the items in each domain [16]. During development of the tool, robust evidence of content validity was shown in populations with nonobstructive HCM as well as obstructive HCM, and empirical evidence from both populations was used to develop the scoring algorithm [16]. Preliminary evidence of psychometric measurement properties in HCM has been shown, from cross-sectional data of patients with obstructive HCM and short-term longitudinal data of a small population with nonobstructive HCM; this included item-to-item correlation, item–scale analyses, test–retest reliability, convergent validity, known-groups validity, and responsiveness [16]. The known-groups analysis, performed to determine the degree to which the HCMSQ can distinguish between defined groups of patients, showed significant differences for many domains by Patient Global Impression of Severity (PGIS) category and by New York Heart Association (NYHA) classification [16]. To support the appropriateness of the HCMSQ instrument for defining disease severity and for use in clinical studies in obstructive HCM, longitudinal confirmation of psychometric measurement properties in an obstructive HCM population is required.
Here, we report the results of these longitudinal analyses, based on patient data from the phase III EXPLORER-HCM trial (ClinicalTrials.gov identifier NCT03470545).

2 Methods

2.1 Study Design and Patient Population

The EXPLORER-HCM trial was a phase III, double-blind, 1:1 randomized study to evaluate the safety, tolerability, and efficacy of mavacamten (starting at 5 mg) versus placebo in patients with symptomatic obstructive HCM [17]. The study included a 5-week screening period, a 30-week treatment period, and an 8-week post-treatment follow-up period (± 7 days). A total of 251 patients with obstructive HCM were enrolled in EXPLORER-HCM (Table 1 shows demographic characteristics of the study sample). Additional details of the EXPLORER-HCM methodology, patient characteristics, and key clinical and safety results have been described elsewhere [17,18,19,20].

Table 1 Demographic characteristics and baseline severity scores of the patient population

2.2 The HCMSQ Instrument and Other Measurement Tools

The HCMSQ was designed based on an HCM conceptual model of symptoms and impacts [8], and measures the core symptoms of HCM (covering tiredness/fatigue, heart palpitations, chest pain, dizziness, and shortness of breath) over a 24-hour recall period. The HCMSQ is intended to be administered on 7 consecutive days, with weekly item scores calculated as the sum of the available daily scores divided by the number of daily entries completed that week (i.e., the mean of the available days). The HCMSQ produces four domain scores (shortness of breath [4 items], tiredness [1 item], cardiovascular symptoms [3 items], syncope [1 item]), which are combined to create a total score (sum of the first three domains [8 items]) ranging from 0 to 12.5 [16]. The complete HCMSQ instrument is available in Supplementary Table S1 in the electronic supplementary material (ESM).

In EXPLORER-HCM, the HCMSQ was administered electronically on provisioned handheld devices daily during screening and through the first 6 weeks of the treatment period, and then for 7 consecutive days prior to the week 10, 14, 18, 22, 26, 30 (end of treatment), and 38 (end of study) study visits. Patients were only able to submit a daily entry if all items were completed. For HCMSQ data to be included in the evaluation, patients were required to have completed a daily entry on at least 4 out of the 7 consecutive days prior to the visit. If fewer than 4 days were available in the 7-day period, then a weekly score was not calculated and the entries for the week were treated as missing. HCMSQ scoring was performed as outlined in Supplementary Table S2 in the ESM. For example, for shortness of breath, the score is the sum of four shortness of breath items, with a potential score range of 0–18; lower scores indicate less shortness of breath. To guide the handling of missing data, a missing-data simulation study for HCMSQ domain and total scores has been performed previously [16].
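The weekly scoring and missing-data rule described above can be sketched in Python as follows. This is an illustrative reconstruction of the rule (at least 4 of 7 daily entries, weekly score as the mean of available days), not the trial's actual derivation code; see Supplementary Table S2 for the full algorithm.

```python
from typing import Optional

def weekly_item_score(daily_scores: list) -> Optional[float]:
    """Weekly HCMSQ item score: the mean of the available daily entries,
    returned only if at least 4 of the 7 days were completed; otherwise
    the week is treated as missing. `None` marks a day with no entry."""
    available = [s for s in daily_scores if s is not None]
    if len(available) < 4:
        return None  # fewer than 4 completed days: weekly score is missing
    return sum(available) / len(available)
```

For example, a week with only three completed diaries yields no weekly score, whereas `weekly_item_score([2, 3, 2, 3, 2, None, None])` averages the five available days.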

In addition to the HCMSQ, several other PRO instruments were administered in the EXPLORER-HCM study, including the Kansas City Cardiomyopathy Questionnaire-23 (KCCQ-23) [17, 18, 21], the EuroQol visual analogue scale (EQ VAS) [22], the PGIS, and the Patient Global Impression of Change (PGIC). The KCCQ-23 asks respondents to describe the frequency and severity of symptoms, physical and social limitations, and how they perceive that their heart failure symptoms impact their quality of life. It includes 23 items, has a recall period of 2 weeks, and produces individual scores for each of the seven domains (symptom burden, symptom frequency, physical limitation, quality of life, social limitation, self-efficacy, and symptom stability). In addition, it generates a total symptom score, which combines symptom burden and symptom frequency scales, and two summary scores—the clinical summary score is the sum of the total symptom score and physical limitation scales; the overall summary score is the sum of the total symptom score, clinical summary score, social limitation, and quality of life scales [21]. All scores are transformed into a score ranging from 0 to 100, with higher scores reflecting better health status. The EQ VAS records respondents’ self-rated health on a vertical, graduated (0–100) VAS, whereby 0 represents worst health imaginable and 100 represents the best health imaginable. Higher scores indicate better quality of life [22]. The PGIS is a single-item questionnaire that asks respondents to rate their overall symptom severity in the past week on a 5-point rating scale (from 1 [no symptoms] to 5 [very severe]). Higher scores indicate greater severity (see also Supplementary Methods in the ESM). The PGIC is a single-item questionnaire that asks respondents to rate their overall change in symptom severity compared with their status at randomization over time on a 7-point rating scale (from 1 [very much improved] to 7 [very much worse]). 
Higher scores indicate a change for the worse (see also Supplementary Methods in the ESM).

The PGIS was administered electronically at screening, and both the PGIS and the PGIC were administered electronically at the week 6, 10, 14, 18, 22, 26, 30, and 38 study visits. The other PROs (KCCQ-23 and EQ VAS) were administered electronically at the baseline, week 6, 12, 18, 30, and 38 study visits on the same devices. The scoring for the KCCQ-23 instrument and the EQ VAS was carried out according to the procedures described by their respective scoring manuals; patients with missing forms at a visit were excluded from the analysis at that time point.

2.3 HCMSQ Psychometric Analyses

Longitudinal psychometric analyses of the HCMSQ and derivation of a responder definition specific to the HCM population were conducted in line with FDA guidance [13]. This psychometric analysis outlines the longitudinal measurement properties focusing on change from baseline, using baseline and week 30 data from the EXPLORER-HCM trial; supplemental analyses also included data from weeks 6 and 18. The following scores were evaluated per the scoring algorithm (see also Supplementary Tables S1 and S2 in the ESM): shortness of breath (items 1–3 and 6), tiredness (item 7), cardiovascular symptoms (items 8–10), and total score. Item 11 (syncope) is not included in the total score. The decision to exclude syncope from the total score was due to its low occurrence in the patient population; from a psychometric evaluation perspective there was not enough variation in this item response to be included in the total score. However, because it provides relevant information about patient experiences with HCM, it remains part of the scale. Following results from Rasch modeling, items 4 and 5 are not included in the scoring of any domain or total scores, and will be removed in future administrations of the HCMSQ [16].

2.3.1 Test–Retest Reliability

Test–retest reliability is the extent to which an instrument yields consistent, reproducible estimates of a patient’s health status in a stable population. Test–retest reliability of the HCMSQ was evaluated by calculating agreement-based intraclass correlation coefficients (ICCs) using a two-way mixed model among a subgroup of stable patients. Stable patients were defined separately in two analyses: first using the PGIC (i.e., patients who reported “no change” on the PGIC at baseline and week 6, week 6 and week 18, and week 18 and week 30), and second using the PGIS (i.e., patients who endorsed the same response at both baseline and week 6, week 6 and week 18, and week 18 and week 30). ICC values > 0.7 were considered evidence of test–retest reliability in both analyses [23, 24].
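An agreement-based ICC from a two-way model, as used in these analyses, can be computed from ANOVA mean squares. The sketch below implements ICC(A,1) in McGraw and Wong's notation for an n-subjects by k-occasions array; it is a minimal illustration of the statistic, not the study's analysis code.

```python
import numpy as np

def icc_agreement(scores: np.ndarray) -> float:
    """Absolute-agreement intraclass correlation, ICC(A,1), for an
    n-subjects x k-occasions score matrix (e.g., HCMSQ scores at two
    visits for 'stable' patients). Values > 0.7 are taken here as
    evidence of test-retest reliability."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-subject means
    col_means = scores.mean(axis=0)   # per-occasion means
    # Two-way ANOVA sums of squares
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-subjects mean square
    msc = ss_cols / (k - 1)                 # between-occasions mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

With identical scores at both occasions the ICC is 1.0; as within-subject disagreement grows relative to between-subject spread, the ICC falls toward (and below) the 0.7 acceptability threshold.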

2.3.2 Sensitivity to Change/Responsiveness

Sensitivity to change refers to the extent to which the instrument can detect true change in patients known to have changed in clinical status. It was examined via correlations between change scores and via tests of differences between ordinal change categories.

2.3.2.1 Correlations Between Change Scores

Spearman correlation coefficients were calculated between changes from baseline to week 30 in the HCMSQ domain/total scores and changes from baseline to week 30 in the KCCQ-23 overall summary, clinical summary, and total symptom scores, and the EQ VAS score. Moderate correlations (r ≥ 0.3) were considered evidence for sensitivity to change.
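The correlation step can be illustrated as below with `scipy.stats.spearmanr` on simulated change scores (the data here are entirely hypothetical, generated only to show the calculation; the score names mirror the instruments above).

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical change-from-baseline scores for 100 patients (illustrative only):
# HCMSQ total change (lower = improvement) and a correlated KCCQ-style change
# (higher = improvement), so the expected correlation is negative.
n = 100
hcmsq_change = rng.normal(-1.5, 2.0, n)
kccq_change = -0.8 * hcmsq_change + rng.normal(0.0, 2.0, n)

rho, p = spearmanr(hcmsq_change, kccq_change)
# |rho| >= 0.3 would be taken as evidence of sensitivity to change
print(f"rho = {rho:.2f}, p = {p:.3g}")
```

Note the direction of scoring matters: because lower HCMSQ scores indicate fewer symptoms while higher KCCQ-23 and EQ VAS scores indicate better health, a moderate negative rho is the expected pattern.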

2.3.2.2 Differences Between Ordinal Change Categories

Mean change in HCMSQ domain/total scores from baseline to week 30 across ordinal change groups was assessed via the following variables and ordinal change categories:

  1. Change in NYHA classification from baseline to week 30 (i.e., improvement, no change, worsening).

  2. Change in PGIS score from baseline to week 30 (i.e., worsened, no change, improved by one category, improved by two or more categories).

  3. Change in PGIC score from baseline to week 30 (i.e., worsened, no change, minimally improved, much improved, very much improved).

  4. Change in peak oxygen consumption (pVO2) quartile group from baseline to week 30 (i.e., decreased, no change, increased).

  5. Change in post-exercise LVOT gradient scores from baseline to week 30 (i.e., decrease by two percentile groups [0–25th, 26–50th, 51–75th, and 76–100th], decrease by one percentile group, no change, increase by one percentile group, increase by two percentile groups).

Change group differences were tested via one-way analysis of variance (ANOVA). To assist in interpretation, two effect size measures are reported for each change category level: Cohen’s d, whereby a category’s mean change score is divided by the pooled baseline standard deviation (SD); and the standardized response mean (SRM), whereby a category’s mean change score is divided by the pooled change score SD. The magnitude of these effect size measures is most intuitively interpreted in terms of non-overlap (u) between distributions: when the effect size is 0, the populations’ distributions are perfectly superimposed and u = 0; when the effect size is 0.2 (a small effect), normally distributed populations of equal size and variability have only 14.7% of their combined area non-overlapping; when the effect size is 0.5 (a medium effect), 33% of the combined area does not overlap; and when the effect size is 0.8 (a large effect), the two populations are so separated that almost half (u = 47.4%) of their combined area does not overlap [25].
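The two effect size measures defined above reduce to one-line calculations; the following sketch (with made-up change scores, purely for illustration) shows the distinction between dividing by the baseline SD (Cohen's d) and by the change-score SD (SRM).

```python
import numpy as np

def cohens_d(change: np.ndarray, baseline_sd: float) -> float:
    """Cohen's d as used here: a category's mean change score divided by
    the pooled baseline standard deviation."""
    return change.mean() / baseline_sd

def srm(change: np.ndarray) -> float:
    """Standardized response mean: a category's mean change score divided
    by the standard deviation of the change scores themselves."""
    return change.mean() / change.std(ddof=1)

# Illustrative change scores for one change category (hypothetical values)
change = np.array([-2.0, -1.0, -3.0, -2.0])
print(cohens_d(change, baseline_sd=4.0))  # d relative to baseline spread
print(srm(change))                        # SRM relative to change-score spread
```

Because change scores are often less variable than baseline scores, the SRM for the same mean change is typically larger in magnitude than Cohen's d, which is worth keeping in mind when comparing the two against the 0.2/0.5/0.8 benchmarks.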

2.3.3 Meaningful Change Analysis (Responder Definition)

Meaningful change (i.e., the amount of individual-level change over a predefined period that could be interpreted as a meaningful clinical benefit) was evaluated by applying anchor-based methods to establish thresholds for meaningful change, using baseline and week 30 data and PGIC and PGIS scores as anchors [26]. To explore the adequacy of each anchor, the correlation (Spearman’s rho) of these anchors with the change in HCMSQ weekly scores was assessed. The anchor was implemented only if the correlation coefficient was > 0.30. Groups were formed from the “improved” PGIC responses, and mean change values were calculated for the HCMSQ weekly domain and total scores. These values represented the responder thresholds. The PGIS baseline to week 30 change scores were used to identify a sample of patients whose PGIS scores improved by at least one level. Mean change values for this sample represented a second estimate of responder thresholds.
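The anchor-based procedure described above can be sketched as follows. This is an illustrative reconstruction (the adequacy check and the mean change among anchor-defined improvers), not the trial's actual derivation code, and the example data are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def responder_threshold(change_scores, anchor, improved_mask):
    """Anchor-based responder threshold: the anchor is used only if
    |Spearman rho| between the change scores and the anchor exceeds 0.30;
    the threshold is then the mean change among anchor-defined 'improved'
    patients. Returns None when the anchor is deemed inadequate."""
    rho, _ = spearmanr(change_scores, anchor)
    if abs(rho) <= 0.30:
        return None  # anchor inadequate for this domain/total score
    return float(np.mean(np.asarray(change_scores)[improved_mask]))

# Hypothetical example: PGIC-style anchor (lower = more improved),
# with 'improved' defined as a rating of 3 or less.
change = np.array([-3.0, -2.0, -2.0, 0.0, 1.0, -1.0])
anchor = np.array([2, 2, 3, 4, 5, 4])
threshold = responder_threshold(change, anchor, improved_mask=(anchor <= 3))
print(threshold)  # mean change among the 'improved' patients
```

The same function applied with a PGIS-based definition of improvement gives the second threshold estimate; a domain whose change scores correlate weakly with the anchor (as happened for tiredness with the PGIC) simply yields no threshold.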

3 Results

A total of 251 patients were enrolled in the EXPLORER-HCM study. Baseline demographic characteristics as well as baseline severity scores are shown in Table 1. Patients had a mean age of 58.5 (SD: 11.9) years, 41% (102/251) were women, and most (183/251; 73%) were in NYHA class II. More than 80% of patients had evaluable HCMSQ data at baseline; the compliance rates stayed above 70% across all further time points (weeks 3, 6, 14, 18, 22, 26, 30), with the exception of week 10 when rates were > 50% (Supplementary Table S3, see ESM).

3.1 HCMSQ Longitudinal Psychometric Analyses

3.1.1 Test–Retest Reliability

In PGIC-based analyses, all HCMSQ domain scores had acceptable test–retest reliability using data from baseline and week 6 (n = 51), week 18 (n = 87), and week 30 (n = 106) (shortness of breath, ICC = 0.78–0.79; tiredness, ICC = 0.76–0.82; cardiovascular symptoms, ICC = 0.72–0.97; total score, ICC = 0.82–0.90; see Table 2). Similarly, all HCMSQ domain scores had acceptable test–retest reliability (ICC > 0.70) for PGIS-based analyses (Table 2).

Table 2 Assessment of test–retest reliability for HCMSQ domains using PGIC and PGIS

3.1.2 Sensitivity to Change/Responsiveness

3.1.2.1 Correlations Between Change Scores

HCMSQ domain scores and the total score had moderate or strong Spearman correlations with all KCCQ-23 and EQ VAS change scores (r ≥ 0.3) (Table 3).

Table 3 Assessment of correlations between change scores of HCMSQ domain/total scores with KCCQ-23 and EQ VAS scores (Spearman correlations), from baseline to week 30

3.1.2.2 Differences Between Ordinal Change Categories

HCMSQ domain and total change scores significantly differed across ordinal change levels for several measures of interest (Tables 4–8). There were statistically significant differences for all HCMSQ change scores by change in NYHA category (all p < 0.05), except for the tiredness score (Table 4); by change in PGIS score (all p < 0.0001; Table 5); by change in PGIC score (all p < 0.001; Table 6); and by change in pVO2 quartile group (all p ≤ 0.05; Table 7). There were no significant differences by change in LVOT gradient percentile group (Table 8). Effect size measures were consistently of greater magnitude for the improvement groups than for the no-change groups.

Table 4 Assessment of sensitivity to change: mean (SD) change from baseline to week 30 on HCMSQ domains, by NYHA category change
Table 5 Assessment of sensitivity to change: mean (SD) change from baseline to week 30 on HCMSQ domains, by PGIS category change
Table 6 Assessment of sensitivity to change: mean (SD) change from baseline to week 30 on HCMSQ domains, by PGIC category change
Table 7 Assessment of sensitivity to change: mean (SD) change from baseline to week 30 on HCMSQ domains, by pVO2 quartile change group
Table 8 Assessment of sensitivity to change: mean (SD) change from baseline to week 30 on HCMSQ domains, by LVOT quartile change group

3.1.3 Meaningful Change Analysis (Responder Definition)

Thresholds with improvement categories derived from the PGIC (n = 136) and PGIS (n = 90) analyses were 1.00 and 1.37, respectively, for cardiovascular symptoms; 2.57 and 3.40, respectively, for shortness of breath; and 1.47 and 2.04, respectively, for total score (see Table 9). For tiredness, a PGIC threshold was not calculated because correlation between change scores and anchor was < 0.3; with PGIS as anchor, a threshold of 0.73 was derived (Table 9).

Table 9 Meaningful change analysis (responder definition) for the HCMSQ, from baseline to week 30

4 Discussion

The HCMSQ, which was designed to measure the patient experience in the context of HCM, had already demonstrated good cross-sectional measurement properties (internal consistency reliability, convergent validity, known-groups validity) in obstructive HCM using baseline data from the EXPLORER-HCM trial, as well as adequate cross-sectional and longitudinal measurement properties (internal consistency reliability, convergent validity, known-groups validity, test–retest reliability, and sensitivity to change) in a relatively small population with nonobstructive HCM in the MAVERICK-HCM trial [16]. This present analysis was performed to supplement these analyses with longitudinal data from a larger data set, so that the HCMSQ could be considered reliable, valid, and interpretable for use in defining changes in obstructive HCM outcomes. Responder definitions aid the ease with which the data can be interpreted.

This study, conducted using longitudinal data from the EXPLORER-HCM data set, showed strong evidence for the longitudinal psychometric properties of all HCMSQ scores. The majority of HCMSQ domain scores had acceptable (ICC > 0.70) test–retest reliability, as demonstrated by analyses of PGIC- and PGIS-based stable patients across multiple time frames. The HCMSQ was sensitive to change: changes from baseline to week 30 in HCMSQ scores correlated moderately or strongly with change scores for measures of health-related quality of life (EQ VAS) and heart failure-specific health status (KCCQ-23); and HCMSQ change scores differed significantly by NYHA class (except tiredness), PGIS, PGIC, and pVO2 categories, but not by post-exercise LVOT gradient percentiles. The non-significant differences by LVOT gradient may reflect the use of a post-exercise gradient (rather than a resting or Valsalva LVOT gradient), as the symptoms that patients experienced and reported on the HCMSQ may not have arisen during high levels of exertion. Most of the mean differences in HCMSQ change scores fell in a logical order despite small sample sizes for some of the categories. A small number of rows in the assessment of the HCMSQ versus NYHA contradicted the expected order; however, this can be attributed to the small group of patients with worsened symptoms (n = 7) in this assessment. Although some of the correlations between HCMSQ change scores and KCCQ-23 and EQ VAS change scores were not as high as desirable, they crossed the threshold of 0.3 and are therefore acceptable [25].

The psychometric analyses presented here validate the HCMSQ tool for use in patient populations with obstructive HCM. Taken together with the analyses performed during the development of this instrument [16], the HCMSQ has demonstrated content validity and psychometric support in both nonobstructive and obstructive HCM, along with well-defined meaningful change estimates in both populations; the instrument is therefore fit for purpose to support an understanding of treatment benefit in HCM clinical trials. This makes it the first HCM-specific symptom PRO instrument to measure the patient experience in HCM. As such, it has potential to be used more widely by pharmaceutical companies, payer and health technology assessment agencies, and healthcare professionals to evaluate and track salient and meaningful symptoms of HCM. The HCMSQ can provide reliable, valid, and interpretable data to support decision-making.

The similar thresholds derived from the PGIC and PGIS, and the narrow ranges across time points, allow us to propose the following a priori responder definitions, derived from preliminary minimal meaningful change thresholds: a reduction in score of at least 1 point for the tiredness and cardiovascular symptoms domains, at least 2.5 points for the shortness of breath domain, and at least 2 points for the total score are considered clinically relevant for patients. These estimates of minimal meaningful change are conservative, and may change over time as more evidence is gathered through further studies [27]. Data on responders can be used to support interpretation of data derived from the HCMSQ, and to report the proportion of people who experience a meaningful improvement or worsening in symptoms. It is important to note that both anchors on which meaningful change was defined were patient-reported; the data set did not identify a clinical marker that could be considered an appropriate anchor. However, patient-reported anchors may be more suitable than clinical measures or those assessed by a clinician [28].

This psychometric analysis aimed to evaluate the longitudinal reliability and validity of the HCMSQ domains in the EXPLORER-HCM data set. Importantly, this analysis has addressed the next steps suggested in the publication discussing development of the instrument [16]. Our findings should be interpreted in the context of several potential limitations. First, while the HCMSQ was developed for patients with obstructive HCM and nonobstructive HCM, the EXPLORER-HCM data set included only patients with obstructive HCM. Similarly, the generalizability of data from clinical trials to real-world clinical practice cannot be assumed. The EXPLORER-HCM patient population was relatively small (n = 251) and, although it was demographically similar to a general population of patients with HCM [17], patient experiences of disease, treatment or care, and motivations for symptom improvement cannot be assumed to be the same. The HCMSQ can be considered content valid in a broad population owing to its development from a conceptual model developed in an observational sample, but it will be essential to confirm the psychometric measurement properties in a larger and more heterogeneous population of patients with HCM under routine clinical care. Further, the EXPLORER-HCM trial was designed to test the safety and efficacy of mavacamten, whereas analysis of the psychometric performance of the HCMSQ was a secondary consideration and the trial was not necessarily set up optimally for all analyses. Moreover, for HCMSQ data to be included in the analysis, patients were required to have a daily entry on at least 4 out of the 7 consecutive days prior to the visit. As such, as is often the case with PROs, there were missing data for which imputation was required; however, previous missing-data simulation analyses support this strategy for accounting for data gaps. Lastly, all questionnaires were administered electronically.
Further tests using pen-and-paper-based versions of the instrument could confirm the results. Despite these limitations, the HCMSQ demonstrated adequate measurement properties to support its use as a fit-for-purpose PRO to measure the symptoms of HCM.

5 Conclusions

The HCMSQ is the only instrument designed to specifically measure the symptoms of HCM. It demonstrates acceptable longitudinal measurement properties, supporting that it is a reliable and valid tool for use in HCM clinical trials to capture change in symptoms associated with treatments. These results support the use of the HCMSQ as a fit-for-purpose instrument to capture the experience of patients with symptoms of HCM.