Background

Fibromyalgia (FM) is a disorder characterized by chronic widespread pain and tenderness that is estimated to affect 0.5–10% of the worldwide population, and approximately 2–3% of the population (more than 5 million individuals) in the United States (US) alone [1,2,3,4,5]. Patients with FM often experience other symptoms, such as fatigue, impaired sleep, negative mood, cognitive limitations, and physical functioning limitations, leading to a reduced health-related quality of life (HRQoL) [6, 7]. Beyond pain, fatigue is commonly identified as one of the most bothersome and disabling symptoms, reported by more than 80% of FM patients [1, 5, 8]. Patients often describe fatigue as “disruptive or extremely disruptive” to their HRQoL [9].

There is a growing body of evidence from both clinical and regulatory communities supporting FM-related fatigue as a multidimensional concept [1, 9,10,11]. Additional research on this phenomenon is needed within the context of clinical studies to fully understand its dimensionality and to ascertain the ability of a single measure to saturate the construct of fatigue. The Multidimensional Daily Diary of Fatigue—Fibromyalgia-17 items (MDF-Fibro-17) is being developed for this purpose: to allow for the exploration and assessment of different components of FM-related fatigue (cognitive versus physical, etc.) in clinical trials while capturing the overall complexity of this experience [12].

Existing research conducted with FM patients for concept elicitation [1] and for cognitive debriefing and pilot testing [9] of an initial pool of 23 items was reviewed and used to inform the development of a multidimensional assessment of FM-related fatigue [12]. Five dimensions were identified to reflect the broad experience of FM-related fatigue: Global Fatigue Experience, Cognitive Fatigue, Physical Fatigue, Motivation, and Impact on Function. Qualitative and quantitative item-level evaluation suggested that 17 of the original pool of 23 items best supported the conceptual model. This resulted in the 17-item MDF-Fibro-17 being proposed [12].

The original qualitative work confirmed the content validity of the instrument [12], which was developed for use in FM clinical studies in accordance with the Food and Drug Administration (FDA) guidance for patient-reported outcome (PRO) development [13]. Further work, however, was needed to conduct psychometric analyses to support the appropriateness of the MDF-Fibro-17 for use in FM clinical studies. The original 23-item pool was therefore administered in a Phase 2 clinical study of TD-9855 (NCT01693692), and the psychometric analyses conducted are presented in this article.

The Phase 2 clinical study (NCT01693692) was a randomized, double-blind, parallel-group, placebo-controlled study conducted to investigate whether an investigational product, TD-9855, was effective in treating patients with fibromyalgia. TD-9855 is a potent reuptake inhibitor with modest selectivity for inhibition of norepinephrine reuptake and good central nervous system penetration properties in humans. It was hypothesized that TD-9855 would offer the potential for robust pain relief while minimizing any putative serotonergic side effects such as nausea, somnolence, fatigue, and sexual dysfunction. In addition, because the majority of fibromyalgia patients suffer comorbid fatigue, a reduction in serotonergic activity could be beneficial [14]. Based on this, the primary endpoint for this study was fibromyalgia pain and the exploratory endpoint was fibromyalgia-related fatigue. The Multidimensional Assessment of Fatigue (MAF) was included in the study along with the 23-item pool used to develop the MDF-Fibro-17. The study included 392 subjects treated with TD-9855 at 2 dose levels or with placebo, at a ratio of approximately 2 to 1.

This quantitative analysis was conducted to confirm whether the MDF-Fibro-17 is an acceptable instrument for the measurement of FM-related fatigue in clinical trials in adult patients with FM. It includes parameters associated with the reliability and validity of the individual items and scores of the MDF-Fibro-17, as well as the responsiveness and hence interpretability of the measure.

Methods

The original pool of 23 items developed from the qualitative work was incorporated into a Phase 2 study of TD-9855, an investigational norepinephrine and serotonin reuptake inhibitor, in patients with FM [15]. Patients were required to be diagnosed with FM according to the 1990 American College of Rheumatology criteria [3], to be aged 18–65 years, and to have a self-reported pain level of at least 4 on an 11-point Numeric Rating Scale (NRS). Each subject signed an informed consent form approved by an Institutional Review Board or Independent Ethics Committee prior to participating in this study. Ethical approval for the original qualitative research was provided by Copernicus, a US centralized Independent Review Board. Ethical approval for the Pfizer cross-sectional validation study was provided by the Schulman Associates Institutional Review Board, Inc. and the University of Cincinnati Institutional Review Board. Ethical approval for the Theravance validation study was obtained at a site level, with each site obtaining approval individually.

The 23 items were programmed onto a personal digital assistant (PDA) hand-held electronic device, to be completed by the patients at the end of each day during the placebo run-in period (Days -7 to -1), the treatment period (Days 1 to 43), and the post-treatment washout period (Days 44 to 57). Investigators and patients were trained in the use of the PDA and in completion of the diary in accordance with study procedures, and a quick reference guide was also provided.

Patients were instructed to complete all items at approximately the same time every evening, and a restricted time-window for completion was programmed between the hours of 17:00 and 24:00. Retrospective completion of missed days was not allowed. The diary questions were presented sequentially and the option to skip items was not provided.

Each item was presented as a 0–10 NRS anchored by “not at all” at 0 and “extremely” at 10; higher scores indicated greater fatigue severity for 22 of the 23 items. A weekly score was calculated as the mean of the available data if greater than 4 entries were completed within the 7-day period. Weekly periods with fewer than 4 entries were considered missing, with no imputation. All items were evaluated at the item level to confirm the fit of the data to the hypothesized 5-domain, 17-item conceptual model identified previously in qualitative work [12]. The 5 domain scores (Global Fatigue Experience, Cognitive Fatigue, Physical Fatigue, Motivation, and Impact on Function) were calculated as the mean of the item scores in each domain. A total score was calculated as the average of the domain scores (also ranging from 0 to 10).
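
The scoring rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scoring program used in the study: the function names are hypothetical, the item-to-domain assignment must be supplied by the user from the conceptual model [12], and the minimum number of daily entries per week is exposed as a parameter (set to 4 here as an assumption).

```python
from statistics import mean

def weekly_item_score(daily_scores, min_entries=4):
    """Mean of the available daily 0-10 ratings for one item in one 7-day period.

    daily_scores: up to 7 values, with None for days on which the diary
    was not completed. Returns None (missing, no imputation) when fewer
    than min_entries ratings are available.
    min_entries=4 is an assumption made for this sketch.
    """
    available = [s for s in daily_scores if s is not None]
    if len(available) < min_entries:
        return None
    return mean(available)

def domain_score(weekly_items, item_names):
    """Mean of the weekly item scores that belong to one domain.

    weekly_items: dict mapping item name -> weekly item score (or None).
    item_names: names of the items in the domain (user-supplied mapping).
    """
    values = [weekly_items[name] for name in item_names]
    if any(v is None for v in values):
        return None
    return mean(values)

def total_score(domain_scores):
    """Mean of the domain scores, so the total also ranges from 0 to 10."""
    values = list(domain_scores.values())
    if any(v is None for v in values):
        return None
    return mean(values)
```

Because the total is an average of domain averages, each domain contributes equally to the total regardless of how many items it contains.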

A number of additional instruments were included in the study and used to inform the psychometric evaluation of the MDF-Fibro-17 (see Table 1 for further details).

Table 1 Instruments used to inform the psychometric evaluation of the MDF-Fibro-17

The following standard set of psychometric analyses was performed [16].

Item-level evaluation

Item-level evaluation was conducted to examine data completeness and the pattern of missing item-level data. The distribution of responses per item was also examined to identify any floor or ceiling effects.
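
As a rough illustration of the floor and ceiling check, the sketch below computes the percentage of responses at the lowest and highest response options of the 0–10 NRS. The function name is hypothetical, and no cut-off for what counts as a floor or ceiling effect is implied.

```python
def floor_ceiling_percent(responses, minimum=0, maximum=10):
    """Return (floor %, ceiling %) for one item.

    responses: list of ratings on the 0-10 NRS.
    Floor = share of responses at the lowest option,
    ceiling = share of responses at the highest option.
    """
    n = len(responses)
    floor = 100.0 * sum(1 for r in responses if r == minimum) / n
    ceiling = 100.0 * sum(1 for r in responses if r == maximum) / n
    return floor, ceiling
```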

Confirmatory factor analysis (CFA)

Initial CFA of 17-item, five-factor latent-model

The factor structure of the MDF-Fibro-17 items was evaluated by fitting the 17-item, five-factor latent model (Fig. 1) to interim baseline data from the Phase 2 study (N = 192) to assess the degree to which the hypothesized conceptual measurement model fit the data.

Fig. 1 MDF-Fibro-17 Hypothesized Model

Second CFA of 5 domains to create a total score

Following the initial CFA conducted to explore the multidimensional domain structure of the measure, a secondary factor analysis of the domains was conducted to explore the appropriateness of calculating a total score (Fig. 1). This second CFA was conducted using the full data set from the Phase 2 study (N = 381). The averaged domain raw scores were used as the manifest variables in a single-factor CFA.

For the initial and secondary CFA, the goodness of fit of the models was evaluated by several fit indices using the following pre-defined thresholds: a comparative fit index (CFI) of 0.95 or higher; a root mean square error of approximation (RMSEA) value of 0.06 or lower; a non-normed fit index (NNFI) of 0.90 or higher; and a standardized root mean square residual (SRMR) of 0.08 or lower [17,18,19,20,21,22,23]. Confirmatory factor analysis was conducted using Mplus Version 6.1.19.
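
The pre-defined thresholds can be expressed as a simple lookup, as in the sketch below. It only compares already-estimated fit indices against the thresholds listed above and does not fit a CFA model; the function name and the example values are hypothetical.

```python
# Pre-defined goodness-of-fit thresholds as stated in the text.
# "min" means the index must be at or above the threshold,
# "max" means it must be at or below the threshold.
FIT_THRESHOLDS = {
    "CFI": ("min", 0.95),
    "RMSEA": ("max", 0.06),
    "NNFI": ("min", 0.90),
    "SRMR": ("max", 0.08),
}

def check_fit(indices):
    """Return {index name: True/False} for each supplied fit index."""
    results = {}
    for name, value in indices.items():
        direction, threshold = FIT_THRESHOLDS[name]
        if direction == "min":
            results[name] = value >= threshold
        else:
            results[name] = value <= threshold
    return results

# Hypothetical values, for illustration only.
example = check_fit({"CFI": 0.96, "RMSEA": 0.10, "NNFI": 0.93, "SRMR": 0.03})
```

In this hypothetical example only the RMSEA fails its threshold, which is the kind of pattern that would prompt inspection of modification indices.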

Item-domain relationships

The relationships between individual items and the proposed MDF-Fibro-17 domains were evaluated. Item-total correlations, within the hypothesized domains, were expected to be 0.4 or greater [24,25,26].
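
A corrected item-total correlation correlates each item with the sum of the remaining items in its domain, so that the item does not inflate its own correlation. The sketch below assumes a Pearson correlation and uses hypothetical function names; the exact estimator used in the study is not stated in the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_total(domain_items, target):
    """Correlate one item with the sum of the other items in its domain.

    domain_items: dict mapping item name -> list of scores, one per
    patient, in the same patient order for every item.
    target: name of the item being evaluated.
    """
    others = [name for name in domain_items if name != target]
    n_patients = len(domain_items[target])
    rest_total = [sum(domain_items[name][i] for name in others)
                  for i in range(n_patients)]
    return pearson(domain_items[target], rest_total)
```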

Reliability

The consistency with which the items measured fatigue at individual time points, as well as the repeatability of scores while patients were considered stable, was evaluated. Reliability of the MDF-Fibro-17 domain and total scores was assessed using test-retest reliability (intra-class correlation coefficient [ICC] ≥ 0.7; Spearman-Brown) and internal consistency (Cronbach’s alpha > 0.8) [18, 24]. The former was used specifically to determine the repeatability of the observed score in the absence of an observed change, and the latter to assess the level of internal consistency across a group of items within a domain.
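
For reference, Cronbach’s alpha for a domain with k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the summed score. The sketch below implements this standard formula with sample variances; the function name is hypothetical, and the ICC calculation is not shown.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for one domain.

    item_scores: list of k lists, one per item, each holding one score
    per patient in the same patient order.
    """
    k = len(item_scores)
    # Summed score per patient across the k items.
    totals = [sum(patient_values) for patient_values in zip(*item_scores)]
    sum_item_variances = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))
```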

Construct (convergent and divergent) validity

Convergent validity was assessed by examining correlations with other measures of fatigue (the MAF Global Fatigue Index [GFI] and the SF-36 Vitality [VT] subscale). A moderate relationship (>0.4) was also expected with overall FM severity (FIQ Total score) and with measures of physical functioning (the SF-36 physical functioning [PF] subscale and physical component score [PCS]).

Divergent validity was assessed by examining correlations with measures assessing concepts other than fatigue, such as mood (HADS), sexual function (ASEX), and cognitive function (BDEFS-SF, MASQ, PASAT, and ACT), and other aspects of HRQoL measured on the remaining 6 subscales of the SF-36v2.

Moderate or greater correlations (>0.4) were expected to confirm convergent validity, and weaker correlations (<0.4) were expected to confirm divergent validity. However, given the complex relationships between symptoms in FM, correlations with measures assessing concepts other than fatigue were not expected to be zero. These analyses were conducted on absolute scores at Baseline and repeated at End of Study using change scores calculated for each measure.

Known-groups validity

Known-groups validity was examined to provide further evidence of construct validity. Scores on measures indicative of overall severity of condition (the pain intensity NRS and FIQ total score), and the GFI, a measure of fatigue, were divided into quintiles. Mean MDF-Fibro-17 total and domain scores were computed for each quintile. A generalized linear model provided an overall F-test for the group discrimination with effect size estimates considered as 0.2 (small), 0.5 (moderate), and 0.8 (large) [27].
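
The grouping step of the known-groups analysis can be sketched as follows: patients are split into quintiles on an anchor measure and the mean MDF-Fibro-17 score is computed per quintile. The function name is hypothetical and the quintile cut points use the default method of Python’s `statistics.quantiles`, which is an assumption; the generalized linear model and F-test are not reproduced here.

```python
from bisect import bisect_right
from statistics import mean, quantiles

def means_by_quintile(anchor_scores, mdf_scores):
    """Mean MDF-Fibro-17 score within each quintile of an anchor measure.

    anchor_scores: anchor values (e.g. pain NRS, FIQ total, or GFI),
    one per patient.
    mdf_scores: MDF-Fibro-17 total or domain scores, same patient order.
    Returns {quintile number (1-5): mean score}; empty quintiles are omitted.
    """
    cut_points = quantiles(anchor_scores, n=5)  # four cut points
    groups = {q: [] for q in range(1, 6)}
    for anchor, score in zip(anchor_scores, mdf_scores):
        quintile = bisect_right(cut_points, anchor) + 1
        groups[quintile].append(score)
    return {q: mean(values) for q, values in groups.items() if values}
```

A monotonic increase in the quintile means from the lowest to the highest anchor quintile is the pattern expected if the measure discriminates between severity groups.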

Sensitivity to change and responder analysis

Effect sizes are defined as the mean change found in a variable divided by the standard deviation (SD) of that variable. Effect sizes are used to translate “the before and after changes” into a standard unit of measurement that provides a clearer understanding of the relative sensitivity and performance of each clinical variable. The ability of the MDF-Fibro-17 to detect changes observed in the clinical study was evaluated using distribution- and anchor-based methods. Distribution-based methods include estimations based on observed variance in the sample, such as the evaluation of ½ SD or 1 standard error of measurement. Anchor-based methods allow changes to be linked conceptually (e.g., through discriminability) to additional known clinical or patient variables.

For the distribution-based analyses, 2 definitions were used for the ½ SD approach (½ of the baseline SD and ½ of the change score SD) and 2 for the standard error of measurement (SEM) approach (SEM based on the ICC [test-retest coefficient] and the baseline SD, and SEM based on the ICC and the change score SD) [28,29,30].
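
The distribution-based quantities can be sketched as below. The SEM uses the standard formula SD × √(1 − ICC); the effect size divides the mean change by the baseline SD, which is an assumption because the text does not state which SD was used. Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores (assumed)."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)

def half_sd(values):
    """Half of the sample SD of either baseline scores or change scores."""
    return 0.5 * stdev(values)

def standard_error_of_measurement(sd, icc):
    """SEM from an SD and a test-retest ICC: SD * sqrt(1 - ICC)."""
    return sd * sqrt(1.0 - icc)
```

Passing the baseline SD or the change-score SD to `half_sd` and `standard_error_of_measurement` gives the 2 variants of each approach described above.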

For the anchor-based analyses, a collapsed PGI-C scale category of “very much improved” and “much improved” versus remaining PGI-C responses denoting minimal improvement, no change, or decline (“minimally worse” to “very much worse”) was used for discrimination on the MDF-Fibro-17 (see Table 1 for further details). Additional anchors of a change of 8.0 points on the GFI, and 11.0 points on the FIQ total score were also used based upon the meaningful change established for these measures [31,32,33,34,35,36,37,38,39,40].
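
The collapsed PGI-C anchor can be illustrated as follows: patients reporting “very much improved” or “much improved” form one group, all other responses form the second, and the mean MDF-Fibro-17 change is compared between them. This is a sketch with a hypothetical function name; the response labels outside those quoted in the text are assumptions.

```python
from statistics import mean

# Responses treated as meaningful improvement on the collapsed PGI-C anchor.
IMPROVED = {"very much improved", "much improved"}

def mean_change_by_pgic(pgic_responses, score_changes):
    """Mean score change for improved versus all other PGI-C responses.

    pgic_responses: PGI-C category label per patient.
    score_changes: change from Baseline in an MDF-Fibro-17 score,
    same patient order (negative values indicate less fatigue).
    Returns (mean change in improved group, mean change in other group);
    a group with no patients yields None.
    """
    improved, other = [], []
    for response, change in zip(pgic_responses, score_changes):
        if response.lower() in IMPROVED:
            improved.append(change)
        else:
            other.append(change)
    return (mean(improved) if improved else None,
            mean(other) if other else None)
```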

All analyses, unless otherwise specified, were conducted using SAS software Version 9.1.3 (SAS Institute Inc., Cary, NC, US). Values reported in text are means ± SD.

Results

Sample characteristics

The final sample of 392 patients in the intention-to-treat (ITT) population (369 females, 23 males) had an average age of 45.7 ± 10.6 years. The majority of patients were Caucasian (82.7%) followed by Black/African American (13.0%) (Table 2). At Baseline, patients had an average FIQ total score of 54.9 ± 14.92, which indicated moderate FM severity [32]. The average pain intensity NRS score was 6.1 ± 1.31 and average GFI score was 33.4 ± 8.09. Demographic and baseline clinical characteristics of the ITT analysis group are detailed in Table 2.

Table 2 Demographics and Baseline Clinical Characteristics (ITT Analysis group)

A total of 381 (97%) patients from the ITT population had data available on the MDF-Fibro-17 at Baseline. This analysis set was used in the psychometric evaluation of the measure.

Item-level evaluation

The items were administered via electronic PDA, which did not allow items to be skipped; therefore, there were no missing data at the item level. No floor or ceiling effects at the item level were observed (0.3–1.3% and 0.3–0.5%, respectively). All items showed a negative skew, with the majority of values to the right of the mean. Nine items had a z-score greater than 2.0, indicating a substantial departure from normality.

Confirmatory factor analysis (CFA)

Initial CFA of 17-item, five-factor latent-model

An initial CFA conducted using preliminary baseline data from the TD-9855 Phase 2 study (N = 192) indicated that the hypothesized MDF-Fibro-17 model fit the data well, with all parameters meeting the pre-specified criteria. The initial CFA was evaluated on the 17-item, 5-factor model hypothesized for the MDF-Fibro-17, and the results suggested that the model fit the data from both studies. These results are presented in Table 3; for reference, initial results from the existing validation study that was reviewed to inform the development of the tool, discussed elsewhere [12], are also included.

Table 3 Previous Confirmatory Factor Analyses of Item-level Results

Second CFA of 5 domains to create a total score (current study)

Using the full data set from the TD-9855 Phase 2 study (N = 381), the averaged domain raw scores were used as the manifest variables in a single-factor CFA to explore the appropriateness of a total score. The CFA models were evaluated in a stepwise fashion to allow for accumulation of evidence surrounding the dimensionality of the MDF-Fibro-17. The single-factor CFA model was evaluated on 5 domain scores and the total of 5 domain scores, and the results suggested that the model fit the data. The CFI and NNFI were both 0.952, above their respective 0.95 and 0.90 required thresholds. The SRMR was 0.020, below the prespecified 0.08 threshold, which is, in part, due to the small number of parameters in this model. Due to the presence of correlated residuals between the Global Fatigue Experience and Physical Fatigue domains, the RMSEA (0.15) fell short of recommended standards and was associated with a notably high modification index (>10.0; the expected reduction in the chi-square statistic if the constraint were removed). The path coefficients for the 5-domain MDF-Fibro-17, before accounting for correlated residuals, were 0.92 for Global Fatigue Experience, 0.88 for Cognitive Fatigue, 0.87 for Physical Fatigue, 0.98 for Motivation, and 0.99 for Impact on Function.

The second-order CFA confirmed that it is acceptable to calculate a total score, which consists of all domain scores. The CFI was 0.997 and NNFI was 0.992, both well above their respective 0.95 and 0.90 required thresholds. The SRMR (0.010) was well under the required threshold, and the RMSEA (0.061) also met required standards. The path coefficients for the 17-item, 5 domain MDF-Fibro, accounting for correlated residual between Global Fatigue Experience and Physical Fatigue (0.42), were between 0.88 and 0.990. The correlation coefficients for individual items ranged from 0.92 to 0.99. The CFA results are shown in Table 4.

Table 4 Confirmatory Factor Analysis of Domain-level

Item-domain relationships

Corrected item-total correlations within hypothesized domains ranged from 0.92 to 0.96 for Global Fatigue Experience, 0.96 to 0.98 for Cognitive Fatigue, 0.85 to 0.91 for Physical Fatigue, 0.94 to 0.96 for Motivation, and 0.93 to 0.97 for Impact on Function, all of which met pre-defined criteria and were considered substantial. For all items except two, the item correlated most strongly with its own domain. The item “How tired did your body feel today?”, part of the Physical Fatigue domain, correlated more strongly with Global Fatigue Experience (0.92), Motivation (0.88), and Impact on Function (0.88) than with its own domain (0.85). The item “How much did tiredness make it difficult to do things today?”, part of the Impact on Function domain, had a slightly higher correlation with the Motivation domain than with its own domain (0.95 versus 0.93). All correlations are presented in Table 5.

Table 5 Corrected Item-Level Psychometrics: Item-Total Correlations

Reliability

Test-retest reliability was assessed by evaluating the reproducibility of MDF-Fibro-17 scores over the time period between Baseline and Day 8 and from Week 5 to Week 6. All ICCs (Spearman-Brown) exceeded the required 0.70 level: for Baseline versus Day 8, ICCs ranged from 0.71 to 0.82 (median of 0.74), and for Week 5 versus Week 6, all exceeded 0.90.

Internal consistency was confirmed as acceptable with strong Cronbach’s alpha for the total score and all domain scores (α = 0.94–0.99). Reliability data are shown per MDF-Fibro domain in Table 6.

Table 6 Psychometric Testing of Final Questionnaire (Reliability, Construct Validity, Responsiveness)

Construct (convergent and divergent) validity

These data indicate overall good construct validity for the MDF-Fibro-17. Correlations with measures hypothesized to capture the same or a highly related concept, demonstrating convergent validity, were moderate (>0.4) to high (>0.7) at Baseline and End of Study for MDF-Fibro-17 scores. The highest correlations for each of the MDF-Fibro-17 total and domain scores were with the GFI (0.62 to 0.84), the FIQ Total (0.59 to 0.81), and the SF-36 VT (0.43 to 0.68). Correlations with the SF-36 measures of physical functioning (the PF and PCS) were at least moderate, with the exception of the MDF-Fibro-17 Cognitive Fatigue domain at Baseline versus PF and PCS (-0.31 and -0.28, respectively), and the MDF-Fibro-17 Global Fatigue Experience, Cognitive Fatigue, Physical Fatigue, and Motivation domains against the SF-36 PF at End of Study (-0.39, -0.34, -0.38, and -0.39, respectively). The results for convergent validity are presented in Table 6.

With respect to divergent validity, weaker correlations were observed, with low correlations (<0.4) between all MDF-Fibro-17 total and domain scores versus sexual function (ASEX) at Baseline and End of Study, and all measures of cognitive function (MASQ, PASAT, ACT, and BDEFS-SF), mood (HADS), and the other SF-36 subscales at Baseline. Low to moderate correlations were observed at the End of Study Treatment visit (Day 43; 0.36 to 0.66). The results for divergent validity are presented in Table 6.

Known-groups validity

All known-groups difference analyses of MDF-Fibro-17 scores were highly significant (p < 0.001) when performed using quintiles. Large effect sizes (>0.8) [27, 41], determined by the F value, provided an indication of the differential sensitivity of the MDF-Fibro-17 scores to the cross-sectional known groups, showing the greatest ability to discriminate between the 5 quintiles on the NRS, GFI, and FIQ Total. Scores by quintiles are summarized in Table 7.

Table 7 Scores by GFI, FIQ-Total, and Pain NRS Quintiles

Sensitivity to change and responder analysis

Significant (p < 0.001) changes were observed in all MDF-Fibro-17 scores from Baseline to End of Study. A medium effect size (>0.5) was observed for the Cognitive Fatigue domain (-0.69). Effect sizes for the total score and all other domains were large (-0.85 to -0.95). Similar effect sizes to those observed on the MDF-Fibro-17 were also observed in the pain intensity NRS, FIQ total score, GFI, and SF-36 VT.

The responder definitions for the MDF-Fibro-17 domains were assessed using distribution- and anchor-based approaches. Similar results were found with both distribution-based approaches, which were used to understand the lower limits of acceptable responder definitions. Anchor-based responder definitions using the PGI-C (Patients’ Global Impression of Change; very much/much improved category), GFI (>11-point improvement), and FIQ total score (>8-point improvement) were similar to those determined by selected distribution-based methods (-2.55 to -2.94). However, the responder definitions determined using the PGI-C much improved category, GFI, and FIQ had a broader range (-2.06 to -3.41). The mean responder score, based on the anchor-based analyses, for the MDF-Fibro-17 Total Score and the 5 domains ranged from -2.48 to -2.85. Overall, the recommended responder cut-off for the total score as well as the other domains is -2.5 (summarized in Table 8).

Table 8 Responder Analysis Results

Discussion

The MDF-Fibro-17 is a multidimensional measure of FM-related fatigue, made up of 5 domains (Global Fatigue Experience, Cognitive Fatigue, Physical Fatigue, Motivation, and Impact on Function). The analyses confirmed the domain structure suggested by the conceptual model developed from in-depth qualitative work with FM patients, and indicated sound psychometric properties of the measure.

All 17 items in the MDF-Fibro-17 performed well as individual items and as part of the 5-domain structure of the instrument. The multidimensional structure allows the MDF-Fibro-17 to capture the broad experience of FM-related fatigue, a characteristic that has been identified as important within the clinical and regulatory community [1, 7, 8, 10]. In addition, the factor analyses confirmed that it is also appropriate to calculate a single total score informed by the in-depth measurement of FM-related fatigue. The relationships between individual items within and across domains demonstrate the complexity of fatigue in FM. There was a strong correlation observed between motivation and physical functioning items in particular, suggesting potential item redundancy. However, both the qualitative data and the conceptual model [9] highlighted that these are related but distinct aspects of FM-related fatigue from the patient perspective and are therefore relevant and important to include within the measure.

Tests of internal consistency and test-retest reliability were strong, indicating that this is a highly reliable measure. The correlations observed between the MDF-Fibro-17 and other measures in the study hypothesized to be either similar (convergent validity) or dissimilar (divergent validity) were overall as expected, confirming good construct validity. The strongest relationships were observed between the MDF-Fibro-17 and overall FM severity (FIQ Total) and the GFI, another measure of fatigue. The moderate correlations with the SF-36 VT, a single item evaluating a simple concept similar to fatigue, and some of the measures for divergent validity demonstrate the high level of complexity of FM-related fatigue, in which multiple symptoms are experienced and, though distinct, are closely related.

Known-groups analysis revealed that the MDF-Fibro-17 total and domain scores were able to differentiate between all groups tested. Highly significant changes were observed over the study period on all scores of the MDF-Fibro-17, with medium to large effect sizes. These changes reflected those observed on other outcomes in the study and indicate that the instrument is sensitive to changes observed in a clinical study.

Responder analyses conducted using different definitions for both anchor-based and distribution-based techniques produced similar estimates, and the results suggested a reasonable responder cut-off to be around -2.5.

One limitation to this study is that although the MDF-Fibro-17 has the potential to assess the different components of FM-related fatigue based on data described above, this study was conducted in a particular clinical trial population in response to drug therapy intervention. Therefore, responsiveness and sensitivity to other therapies would need to be further explored in future studies.

Conclusion

The psychometric evaluation and strong evidence of content validity indicate that the MDF-Fibro-17 is a relevant, psychometrically robust, multidimensional instrument, with sensitivity to detect change and clear response definitions. Taken as a whole, the MDF-Fibro-17 has the potential to become a reliable clinical outcome assessment tool to evaluate fatigue in adult patients with FM within a clinical trial setting [12].