The Piper Fatigue Scale-12 (PFS-12): psychometric findings and item reduction in a cohort of breast cancer survivors
- Cite this article as:
- Reeve, B.B., Stover, A.M., Alfano, C.M. et al. Breast Cancer Res Treat (2012) 136: 9. doi:10.1007/s10549-012-2212-4
Brief, valid measures of fatigue, a prevalent and distressing cancer symptom, are needed for use in research. This study’s primary aim was to create a shortened version of the revised Piper Fatigue Scale (PFS-R) based on data from a diverse cohort of breast cancer survivors. A secondary aim was to determine whether the PFS captured multiple distinct aspects of fatigue (a multidimensional model) or a single overall fatigue factor (a unidimensional model). Breast cancer survivors (n = 799; stages in situ through IIIA; ages 29–86 years) were recruited through three SEER registries (New Mexico, Western Washington, and Los Angeles, CA) as part of the Health, Eating, Activity, and Lifestyle (HEAL) study. Fatigue was measured approximately 3 years post-diagnosis using the 22-item PFS-R, which has four subscales (Behavior, Affect, Sensory, and Cognition). Confirmatory factor analysis was used to compare unidimensional and multidimensional models. Six criteria guided item selection for the shortened scale: content validity, each item’s relationship with fatigue, content redundancy, differential item functioning by race and/or education, scale reliability, and literacy demand. Factor analyses supported the original 4-factor structure, although the bifactor model also showed evidence of a dominant underlying fatigue factor. Six items tested positive for differential item functioning between African-American and Caucasian survivors. Four additional items showed poor association, local dependence, or content validity concerns. After removing these 10 items, the reliability of the PFS-12 subscales ranged from 0.87 to 0.89, compared with 0.90–0.94 before item removal. The newly developed PFS-12 can be used to assess fatigue in African-American and Caucasian breast cancer survivors; it reduces response burden without compromising reliability or validity.
This is the first study to determine the literacy demand of the PFS and to compare PFS-R responses between African-American and Caucasian breast cancer survivors. Further testing in diverse populations is warranted.
Keywords: Fatigue; Breast cancer survivors; Patient-reported outcomes; Piper Fatigue Scale; Psychometrics
Cancer-related fatigue is defined as the perception of unusual tiredness that varies in pattern or severity and can affect the functional ability of cancer survivors [1–3]. A recent literature review of 18 studies measuring symptoms in adults during active treatment found that 62 % of patients experienced fatigue. Fatigue continues to be a prevalent and distressing symptom for cancer survivors years after active treatment ends. Among disease-free breast cancer survivors approximately 3 years post-diagnosis, 41 % reported moderate to severe fatigue levels.
Existing self-report fatigue questionnaires vary on a number of factors, including questionnaire length, reference period, and response scale. One key factor that differentiates these questionnaires is whether they provide a single overall score (a unidimensional measure) or multiple scores that reflect different attributes of the fatigue experience (a multidimensional measure). The Piper Fatigue Scale (PFS) is one of the most commonly used multidimensional fatigue measures in cancer research and includes subdomains covering the behavioral, affective, sensory, and cognitive/mood attributes of fatigue. The original PFS consisted of 40 questions (items), and the revised PFS (PFS-R) includes 22 questions.
Although the PFS-R has been translated into multiple languages [9–13], more researchers and clinicians would likely use the PFS if it were shorter. Respondent burden is a concern because most research studies include a battery of self-report measures to capture the health-related quality of life (HRQOL) of cancer patients. In addition, prospective designs are encouraged over cross-sectional ones to capture variation in the fatigue experience over time; because such designs administer the questionnaire repeatedly, reducing response burden is a necessity. Finally, to increase research on fatigue in diverse samples of survivors, fatigue instruments must be valid and reliable across racial/ethnic groups.
The overall goal of this psychometric study was to analyze the PFS-R in a diverse cohort of breast cancer survivors to reduce the number of questions in the scale. To accomplish this goal, the original four subdomain structure of the PFS-R was re-examined. The assumption guiding this study was that if analyses confirmed the multi-dimensionality of the scale, then the shortened scale must maintain its ability to provide reliable measurement for each subdomain.
As described elsewhere [6, 14, 15], participants were female breast cancer survivors enrolled in the Health, Eating, Activity, and Lifestyle (HEAL) study. A total of 1,183 women were recruited through the population-based Surveillance, Epidemiology, and End Results (SEER) cancer registries in New Mexico, Los Angeles County, and Western Washington. These women participated in a baseline in-person interview within 1 year after diagnosis. Of those, 944 (80 %) participated in a follow-up assessment approximately 2 years after the first interview, and 858 (73 %) completed an additional HRQOL questionnaire approximately 3 years (40.5 months) after diagnosis.
For this study, 57 women were excluded for recurrent or new primary breast cancer before completing their HRQOL survey, as were 2 women with incomplete fatigue data. The final sample of 799 women included 436 from New Mexico, 195 from Los Angeles County, and 168 from Western Washington. All African-Americans were recruited from Los Angeles and most Hispanic women were recruited from New Mexico. In addition, the African-Americans were restricted to 35–64 years of age to focus on younger breast cancer survivors. Participants were diagnosed with in situ, Stage I, II, or IIIA breast cancer.
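The participant flow above can be checked with a few lines of arithmetic. The sketch below uses only counts stated in the text; the variable names and the rounding helper are illustrative choices, not part of the HEAL analysis. Note that both reported percentages use the 1,183 recruited women as the denominator.

```python
# Participant counts as reported for the HEAL fatigue analysis.
RECRUITED = 1183
FOLLOW_UP = 944            # follow-up assessment ~2 years after baseline
HRQOL_SURVEY = 858         # HRQOL questionnaire ~3 years (40.5 months) after diagnosis
EXCLUDED_RECURRENCE = 57   # recurrent or new primary breast cancer before the survey
EXCLUDED_INCOMPLETE = 2    # incomplete fatigue data
BY_SITE = {"New Mexico": 436, "Los Angeles County": 195, "Western Washington": 168}


def pct(part: int, whole: int) -> int:
    """Percentage of `whole`, rounded to a whole number as in the text."""
    return round(100 * part / whole)


final_n = HRQOL_SURVEY - EXCLUDED_RECURRENCE - EXCLUDED_INCOMPLETE

print(pct(FOLLOW_UP, RECRUITED), pct(HRQOL_SURVEY, RECRUITED))  # 80 73
print(final_n, sum(BY_SITE.values()))                           # 799 799
```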
Informed consent was obtained from each participant at each assessment. The study was approved by the Institutional Review Board at each site, in accord with assurances filed with and approved by the U.S. Department of Health and Human Services.
Demographic and background variables
Age, education, and race/ethnicity data were collected at baseline. Data on marital status, household income, height, employment, menopausal status, smoking, and current weight were collected at the first follow-up survey. Self-reported comorbidities were captured on the HRQOL survey.
Stage of disease was obtained from each SEER registry database. Medical records were abstracted to obtain treatment information, and these data were supplemented with SEER registry data. Women were asked whether they were taking tamoxifen (yes/no) at the 24-month follow-up. In the HRQOL assessment, women self-reported information about reconstructive surgery and lymphedema.
The PFS-R was included in the HRQOL survey. It is a 22-item scale with four subscales: behavior (6 items), affect (5 items), sensory (5 items), and cognition/mood (6 items). Each item has 11 response categories on a 0–10 metric, with verbal descriptors anchoring the endpoints. Each subscale is scored separately, and the items are also combined into an overall score; higher scores reflect more fatigue. The HEAL study used an adapted version of the PFS-R that asked survivors to rate their fatigue over the past month rather than the past week. This extended reference period was used to minimize the effect of acute situational events and to better assess the survivor’s general state of fatigue. Previous studies have found that the PFS-R has acceptable internal consistency and evidence for validity in cancer patients [16–18], including the adapted version used in this study.
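A minimal scoring sketch follows. It assumes, as is common for the PFS-R although the text does not spell it out, that each subscale score is the mean of its items and the overall score is the mean of all 22 items, so every score stays on the 0–10 metric. The item identifiers (such as `b1` or `c6`) are hypothetical placeholders; only the subscale sizes come from the text.

```python
from statistics import mean

# Hypothetical item IDs; only the subscale sizes (6, 5, 5, 6) come from the text.
PFS_R_SUBSCALES = {
    "behavior": [f"b{i}" for i in range(1, 7)],
    "affect": [f"a{i}" for i in range(1, 6)],
    "sensory": [f"s{i}" for i in range(1, 6)],
    "cognition_mood": [f"c{i}" for i in range(1, 7)],
}
ALL_ITEMS = [item for items in PFS_R_SUBSCALES.values() for item in items]


def score_pfs_r(responses: dict[str, int]) -> dict[str, float]:
    """Score one respondent, assuming mean scoring on the 0-10 metric."""
    for item in ALL_ITEMS:
        value = responses[item]  # raises KeyError if an item is missing
        if not 0 <= value <= 10:
            raise ValueError(f"{item}: {value} is outside the 0-10 response range")
    scores = {name: mean(responses[i] for i in items)
              for name, items in PFS_R_SUBSCALES.items()}
    scores["total"] = mean(responses[i] for i in ALL_ITEMS)  # higher = more fatigue
    return scores


example = {item: 4 for item in ALL_ITEMS}
print(score_pfs_r(example))
```

Mean scoring keeps subscales with different numbers of items on the same 0–10 metric, which is why it is used here rather than sums.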
The HEAL HRQOL questionnaire included an initial screening question that had respondents skip the PFS-R behavioral and affective subscales if they indicated that they had “no fatigue” over the past 4 weeks. These subscales were skipped because their questions asked respondents to further characterize the fatigue they were currently experiencing. Approximately 36 % of the 799 breast cancer survivors (n = 291) did not complete those two subscales but did complete the sensory and cognitive/mood subscales.
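The screening rule creates a structured (not random) pattern of missing data: two subscales are absent by design for about a third of the sample. The sketch below encodes that rule with hypothetical names. Whether skipped respondents should be scored as 0 on the skipped subscales or left missing is an analytic choice the text does not specify, so the sketch only reports which subscales were administered.

```python
# Sketch of the screening skip described above; names are hypothetical.
ALL_SUBSCALES = ("behavior", "affect", "sensory", "cognition_mood")
SKIPPED_IF_NO_FATIGUE = frozenset({"behavior", "affect"})


def administered_subscales(reported_no_fatigue: bool) -> list[str]:
    """Subscales a respondent is asked, given the 'no fatigue' screening answer."""
    if reported_no_fatigue:
        return [s for s in ALL_SUBSCALES if s not in SKIPPED_IF_NO_FATIGUE]
    return list(ALL_SUBSCALES)


n_skipped, n_total = 291, 799
print(administered_subscales(True))      # ['sensory', 'cognition_mood']
print(round(100 * n_skipped / n_total))  # 36
```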
Before selecting items for removal from the PFS-R, the dimensionality of the PFS was re-confirmed to determine if psychometric analyses should be performed at the subscale level (i.e., referred to as the “multidimensional model” with four subdomains) or at the overall fatigue level (i.e., referred to as the “unidimensional model” with all 22 items loading on a single fatigue factor).
Confirmatory factor analysis was used to evaluate the fit of the multidimensional and unidimensional models to the HEAL data. In addition, a bifactor model was fit to the data to examine the extent to which the items loaded on a general fatigue factor compared with their loadings on their respective fatigue subdomains. In the bifactor model, larger loadings on the general factor than on the subdomain-specific factors would suggest that a dominant single fatigue factor accounted for most of the variation in the response data. Larger loadings on the subdomain-specific factors than on the general factor would suggest that the multidimensional model was a better solution.
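The comparison rule in the paragraph above can be expressed as a per-item check of general versus specific loadings. In this sketch the loadings are hypothetical inputs (fitted values would come from the CFA software, not from this code); the function simply counts which factor each item loads on more strongly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BifactorLoading:
    item: str
    general: float   # standardized loading on the general fatigue factor
    specific: float  # standardized loading on the item's own subdomain factor


def dominant_factor_counts(loadings: list[BifactorLoading]) -> dict[str, int]:
    """Count items loading more strongly on the general vs. the specific factor.

    Ties (equal absolute loadings) are counted as specific-dominant.
    """
    general = sum(abs(x.general) > abs(x.specific) for x in loadings)
    return {"general_dominant": general, "specific_dominant": len(loadings) - general}


# Hypothetical loadings for illustration only (not the values from this study).
demo = [
    BifactorLoading("item_1", general=0.78, specific=0.30),
    BifactorLoading("item_2", general=0.71, specific=0.45),
    BifactorLoading("item_3", general=0.40, specific=0.62),
]
print(dominant_factor_counts(demo))  # {'general_dominant': 2, 'specific_dominant': 1}
```

A clear majority of general-dominant items would point toward a single dominant fatigue factor; a majority of specific-dominant items would favor the multidimensional reading.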
The next step included reviewing the psychometric properties of the PFS-R with specific attention given to shortening scale length. Six criteria, described below, were considered for making item selections: (1) content validity, (2) strength of relationship with fatigue, (3) content redundancy, (4) differential item functioning by race and/or education, (5) reliability, and (6) literacy demand.
The content validity review examined the extent to which each item reflected a critical attribute of the fatigue domain being measured. This review was especially important for items being considered for removal based on evidence from the psychometric analyses.
The strength of the relationship between each item and the overall scale was assessed with three statistical methods. First, item-total score correlations were computed in SAS (version 9.2). Second, factor loadings were examined from the factor analyses. Finally, item response theory (IRT) models were used to examine how strongly each item was related to the fatigue domain (i.e., discrimination) and how the item’s response categories reflected different levels of fatigue (i.e., thresholds). The IRTPRO software (version 2.1; Scientific Software, Inc.) was used to estimate IRT parameters for Samejima’s graded response model [25–27].
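For readers less familiar with Samejima’s graded response model, the sketch below computes the probability of each response category for an item with discrimination `a` and strictly increasing thresholds b_k. The model gives P(X ≥ k | θ) = 1 / (1 + exp(−a(θ − b_k))), and each category probability is the difference between adjacent cumulative curves. The sketch uses the logistic metric without the 1.7 scaling constant; the parameter values are hypothetical, and this is an illustration of the model, not IRTPRO’s estimation code.

```python
import math


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities P(X = 0..K-1 | theta) under the graded response model.

    `thresholds` holds the K-1 ordered boundary locations b_k; an 11-category
    PFS item (responses 0-10) would have 10 thresholds.
    """
    if a <= 0:
        raise ValueError("discrimination must be positive")
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative curves P(X >= k): 1 for k = 0, logistic for k = 1..K-1, 0 for k = K.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta: float, a: float, thresholds: list[float]) -> float:
    """Expected item response (0..K-1) at a given fatigue level theta."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical item: discrimination 2.0, thresholds evenly spaced from -1.5 to 2.1.
b = [-1.5 + 0.4 * k for k in range(10)]
low, high = expected_score(-2.0, 2.0, b), expected_score(2.0, 2.0, b)
print(round(low, 2), round(high, 2))
```

A larger `a` makes the cumulative curves steeper, so the item separates respondents at nearby fatigue levels more sharply; that is what “poor discrimination” refers to in the results.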
Content redundancy was assessed with inter-item Pearson correlations (from SAS), residual correlations extracted from the factor analyses (from MPLUS), local dependence matrices, and IRT information functions (from IRTPRO). IRTPRO reports a standardized local dependence χ2 statistic for which values over 10 may be considered problematic. Local dependence indicates excessive correlation between items even after controlling for the underlying fatigue domain being measured. The authors’ expertise was also used to judge content redundancy.
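Two of the redundancy checks above reduce to simple screening rules. The sketch assumes a single subscale modeled by one factor with standardized loadings, so the model-implied correlation between items i and j is λ_i·λ_j and the residual is the observed correlation minus that product; it also applies the local dependence χ2 > 10 rule of thumb quoted above. The residual-correlation cutoff of 0.10, the item labels, and all input values are hypothetical.

```python
from itertools import combinations

Pair = tuple[int, int]


def residual_correlations(corr: list[list[float]], loadings: list[float]) -> dict[Pair, float]:
    """Observed minus one-factor-implied correlations for each item pair (i < j)."""
    return {(i, j): corr[i][j] - loadings[i] * loadings[j]
            for i, j in combinations(range(len(loadings)), 2)}


def flag_pairs(stats: dict, cutoff: float) -> list:
    """Pairs whose statistic exceeds the cutoff in absolute value, sorted."""
    return sorted(pair for pair, value in stats.items() if abs(value) > cutoff)


# Hypothetical three-item subscale.
loadings = [0.80, 0.80, 0.70]
corr = [
    [1.00, 0.85, 0.56],
    [0.85, 1.00, 0.58],
    [0.56, 0.58, 1.00],
]
print(flag_pairs(residual_correlations(corr, loadings), cutoff=0.10))  # [(0, 1)]

# Hypothetical standardized local dependence chi-square values.
ld_chi2 = {("item_a", "item_b"): 14.2, ("item_c", "item_d"): 3.1}
print(flag_pairs(ld_chi2, cutoff=10))  # [('item_a', 'item_b')]
```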
Differential item functioning was examined to identify possible response bias by race or by education. Because of the small numbers of Hispanic women and women classified as “other” race, differential item functioning could be examined only between non-Hispanic whites and African-Americans. For education, response data were compared between women with a high school education or less and those with more than a high school education. Differential item functioning was tested within an IRT framework (using IRTPRO), with Wald χ2 tests used to evaluate statistical significance. Because each item has many response categories (11) and some categories could have small sample sizes, a sensitivity analysis compared the differential item functioning findings using all 11 categories with those from an alternative 4-category response scale created by collapsing response categories (0 = none; 1–3 = mild; 4–6 = moderate; 7–10 = severe fatigue).
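The collapsing rule used in the sensitivity analysis can be written as a small mapping. The cut points come from the text; the function name and the example responses are hypothetical.

```python
from collections import Counter


# Collapsed scale from the text: 0 = none; 1-3 = mild; 4-6 = moderate; 7-10 = severe.
def collapse_category(response: int) -> str:
    """Map a 0-10 PFS response to the 4-category scale used in the sensitivity analysis."""
    if not isinstance(response, int) or not 0 <= response <= 10:
        raise ValueError(f"expected an integer from 0 to 10, got {response!r}")
    if response == 0:
        return "none"
    if response <= 3:
        return "mild"
    if response <= 6:
        return "moderate"
    return "severe"


# Hypothetical responses to one item: sparse cells on the 11-point scale
# become larger cells after collapsing.
responses = [0, 0, 1, 2, 2, 5, 6, 7, 9, 10]
print(Counter(collapse_category(r) for r in responses))
```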
Scale reliability was examined using Cronbach’s coefficient alpha, and scale precision was also examined with IRT information functions. Our goal was to keep each fatigue subscale’s reliability above .80, which provides more than adequate precision for group-level comparisons in cancer research settings.
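Cronbach’s alpha for k items is α = k/(k − 1) · (1 − Σ s_i² / s_X²), where s_i² is the variance of item i and s_X² is the variance of the summed score. The stdlib sketch below assumes complete data (no skipped items) and an item-by-respondent layout chosen for illustration. The alpha-if-item-deleted helper is a common companion diagnostic during item reduction; the text does not say whether the authors used it.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[j][r] is respondent r's answer to item j (complete data)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs the same respondents (at least two)")
    totals = [sum(col[r] for col in items) for r in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_deleted(items: list[list[float]]) -> list[float]:
    """Alpha recomputed with each item dropped in turn (needs at least three items)."""
    return [cronbach_alpha(items[:j] + items[j + 1:]) for j in range(len(items))]


# Two uncorrelated items give alpha = 0; identical items give alpha = 1.
print(round(cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]]), 6))
print(round(cronbach_alpha([[1, 2, 3, 4]] * 3), 6))
```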
Literacy demand was evaluated with the Lexile Framework for Reading. A Lexile value is based on two strong predictors of how difficult a text is to comprehend: word frequency and sentence length. Lexile measures map to corresponding grade levels ranging from first grade to post-high school. Scores were averaged across PFS items to produce the mean literacy demand of the scale and the corresponding mean reading grade level for the PFS. Lexile analyses have previously been used to evaluate HRQOL questionnaires. Institute of Medicine guidelines state that health communication materials should be written at a mean reading level of 8th grade or below.
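The averaging step and the 8th-grade benchmark can be sketched as follows. Converting a Lexile measure to a grade level relies on MetaMetrics’ published bands, which are not reproduced here, so this sketch assumes each item’s grade level has already been obtained. The item labels and grade values are hypothetical, and averaging grade levels is a simplification of averaging the underlying Lexile scores.

```python
from statistics import mean

IOM_MAX_MEAN_GRADE = 8  # Institute of Medicine benchmark cited in the text


def literacy_summary(item_grades: dict[str, float]) -> dict:
    """Mean reading grade across items, whether it meets the benchmark, and outlier items."""
    if not item_grades:
        raise ValueError("need at least one item")
    mean_grade = mean(item_grades.values())
    return {
        "mean_grade": mean_grade,
        "meets_benchmark": mean_grade <= IOM_MAX_MEAN_GRADE,
        "items_above_benchmark": sorted(
            item for item, grade in item_grades.items() if grade > IOM_MAX_MEAN_GRADE
        ),
    }


# Hypothetical per-item grade levels for a four-item illustration.
print(literacy_summary({"item_a": 3, "item_b": 5, "item_c": 7, "item_d": 13}))
```

As the example shows, a scale can meet the mean-grade benchmark while still containing individual items well above it, which is why item-level values matter for item selection.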
Together, these six criteria informed the selection of items to remain in the PFS-12. Guiding these decisions, the senior author, Barbara Piper, the originator of the PFS, provided oversight, drawing on her experience with the scale in different populations and settings worldwide. It was decided that at least three questions from each of the four PFS subscales would remain in the PFS-12 so that the shortened scale could still be factor analyzed. Only the key findings that led to the decision to remove or keep an item in the PFS-12 are reported in the “Results” section. After the final selection of the PFS-12 items, factor analysis and differential item functioning testing were not repeated, because the 12 items first need to be administered together, as a standalone scale, to a new sample.
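The three-item floor interacts with the final scale length in a simple way: with four subscales, 4 × 3 = 12, so a 12-item scale that respects the floor must keep exactly three items per subscale, and the 10 removed items must split as 3, 2, 2, and 3. The sketch below checks that arithmetic using only the subscale sizes given in the text; the function name is illustrative.

```python
# Subscale sizes of the 22-item PFS-R (from the text).
PFS_R_COUNTS = {"behavior": 6, "affect": 5, "sensory": 5, "cognition_mood": 6}
MIN_KEEP = 3  # minimum items retained per subscale, as decided by the authors


def removal_capacity(counts: dict[str, int], keep: int = MIN_KEEP) -> dict[str, int]:
    """Maximum number of items that can be dropped from each subscale."""
    capacity = {}
    for name, n in counts.items():
        if n < keep:
            raise ValueError(f"{name} has only {n} items; cannot keep {keep}")
        capacity[name] = n - keep
    return capacity


capacity = removal_capacity(PFS_R_COUNTS)
print(capacity)  # {'behavior': 3, 'affect': 2, 'sensory': 2, 'cognition_mood': 3}
print(sum(capacity.values()), sum(PFS_R_COUNTS.values()) - sum(capacity.values()))  # 10 12
```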
Survivor characteristics by race and ethnicity
Groups (columns): African-American non-Hispanic breast cancer survivors (n = 196); Caucasian non-Hispanic breast cancer survivors (n = 484); Hispanic breast cancer survivors (n = 95); Other race non-Hispanic breast cancer survivors (n = 24); Total sample (n = 799)
Age at diagnosis**, mean ± SD: 51.43 ± 7.84 (African-American); 57.30 ± 10.77 (Caucasian); 55.82 ± 11.58 (Hispanic); 51.33 ± 6.13 (Other race); 55.51 ± 10.43 (Total)
Other characteristics (row labels): ≤High school graduate; Not working (leave, retired, unemployed); Not at all (or N/A); On some days; # Comorbid conditions that limit activities*; Stage of breast cancer**: In situ (stage 0); No surgical procedure; Radiation therapy and chemotherapy; Tamoxifen at follow-up (ns): Yes, not current; Bothered by lymphedema in past 3 months*: Not at all, A fair amount
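As a consistency check on the table, the overall mean age should equal the sample-size-weighted average of the group means. The sketch uses only the group sizes and means reported above; because the group means are themselves rounded, agreement is expected only to within about 0.01 year.

```python
# (n, mean age at diagnosis) per group, from the table above.
GROUPS = {
    "African-American non-Hispanic": (196, 51.43),
    "Caucasian non-Hispanic": (484, 57.30),
    "Hispanic": (95, 55.82),
    "Other race non-Hispanic": (24, 51.33),
}


def pooled_mean(groups: dict[str, tuple[int, float]]) -> tuple[int, float]:
    """Total n and the n-weighted mean of the group means."""
    n_total = sum(n for n, _ in groups.values())
    weighted = sum(n * m for n, m in groups.values())
    return n_total, weighted / n_total


n_total, overall = pooled_mean(GROUPS)
print(n_total, round(overall, 3))  # 799 55.505
```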
Dimensionality of the 22-item PFS-R
Factor loadings for a 4-factor, 1-factor, and bifactor model
Columns: 4-factor model loading; 1-factor model loading; bifactor general-factor loading; bifactor specific-factor loading
Behavioral subscale (6 items)
To What Degree is the Fatigue You Are Now Feeling Causing You Distress? (No Distress-Great Deal of Distress)
…Interfering With Your Ability to Complete Your Work or School Activities? (None-Great Deal)
… Interfering With Your Ability to Visit or Socialize With Your Friends? (None-Great Deal)
…Interfering With Your Ability to Engage in Sexual Activity? (None-Great Deal)
…Interfering With Your Ability to Engage in the Kind of Activities You Enjoy Doing? (None-Great Deal)
How Would You Describe the Degree of Intensity or Severity of the Fatigue Which You Are Now Experiencing? (Mild to Severe)
Affective subscale (5 items)
To What Degree Would You Describe the Fatigue Which You Are Experiencing Now as Being: Pleasant to Unpleasant
Agreeable to Disagreeable
Protective to Destructive
Positive to Negative
Normal to Abnormal
Sensory subscale (5 items)
To What Degree Are You Now Feeling: Strong to Weak
Awake to Sleepy
Lively to Listless
Refreshed to Tired
Energetic to Unenergetic
Cognitive/mood subscale (6 items)
To What Degree Are You Now Feeling: Patient to Impatient
Relaxed to Tense
Exhilarated to Depressed
Able to Concentrate to Unable to Concentrate
Able to Remember to Unable to Remember
Able to Think Clearly to Unable to Think Clearly
Model fit statistics (criteria)
RMSEA (Root Mean Square Error of Approximation): ≤.08
CFI (Comparative Fit Index): ≥.95
TLI (Tucker–Lewis Index): ≥.95
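The fit criteria listed in the table can be applied mechanically once a model’s χ2 statistics are available. The sketch below uses common textbook formulas for RMSEA (with N − 1 in the denominator), CFI, and TLI, computed from the model and baseline (independence) χ2 values. Software such as MPLUS may apply estimator-specific adjustments (for example, with categorical estimators), and every input value shown is hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def meets_criteria(rmsea_v: float, cfi_v: float, tli_v: float) -> dict[str, bool]:
    """Apply the cutoffs listed in the table: RMSEA <= .08, CFI >= .95, TLI >= .95."""
    return {"RMSEA": rmsea_v <= 0.08, "CFI": cfi_v >= 0.95, "TLI": tli_v >= 0.95}


# Hypothetical model and baseline statistics.
r = rmsea(200.0, 100, 101)
c = cfi(200.0, 100, 2100.0, 120)
t = tli(200.0, 100, 2100.0, 120)
print(round(r, 3), round(c, 3), round(t, 3), meets_criteria(r, c, t))
```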
The bifactor model results and the high correlations among the subscales suggest a dominant underlying fatigue factor. The first factor accounted for 58 % of the variance observed in the data, whereas the second factor accounted for only 7 %. Nevertheless, to be consistent with prior use of the PFS-R and to maintain content validity, item reduction analyses proceeded subscale by subscale.
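Statements such as “58 % of the variance” are typically derived from the eigenvalues of the inter-item correlation matrix: with k standardized items the total variance is k, so each eigenvalue divided by k is that component’s share. The text does not state the extraction method, so this reading is an assumption. The stdlib sketch below finds the two largest eigenvalues by power iteration with deflation (adequate for a symmetric positive semidefinite correlation matrix) and checks it on an idealized matrix in which every inter-item correlation equals r, where the first eigenvalue is known to be 1 + (k − 1)r.

```python
import math
import random


def _matvec(a: list[list[float]], v: list[float]) -> list[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def top_eigenvalues(corr: list[list[float]], n_components: int = 2,
                    iters: int = 1000) -> list[float]:
    """Largest eigenvalues of a symmetric PSD matrix via power iteration + deflation."""
    k = len(corr)
    a = [row[:] for row in corr]
    rng = random.Random(0)  # fixed seed so results are reproducible
    eigenvalues = []
    for _ in range(n_components):
        v = [rng.uniform(0.5, 1.5) for _ in range(k)]  # random, non-degenerate start
        for _ in range(iters):
            w = _matvec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract
                break
            v = [x / norm for x in w]
        vnorm = math.sqrt(sum(x * x for x in v))
        v = [x / vnorm for x in v]
        lam = sum(x * y for x, y in zip(v, _matvec(a, v)))  # Rayleigh quotient
        eigenvalues.append(lam)
        # Deflate: subtract the component just found before the next pass.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(k)] for i in range(k)]
    return eigenvalues


def variance_shares(corr: list[list[float]], n_components: int = 2) -> list[float]:
    """Share of total standardized variance (= k) carried by each leading component."""
    return [lam / len(corr) for lam in top_eigenvalues(corr, n_components)]


# Idealized 4-item example: all off-diagonal correlations equal 0.5.
k, r = 4, 0.5
equicorr = [[1.0 if i == j else r for j in range(k)] for i in range(k)]
print([round(s, 3) for s in variance_shares(equicorr)])  # [0.625, 0.125]
```

Power iteration converges slowly when the leading eigenvalues are close; for real analyses a linear-algebra library or the factor-analysis software’s own output would be the safer source.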
Selecting items for the PFS-12
In the behavioral subscale, no item tested positive for differential item functioning. The item “engage in sexual activity” had the lowest item-total score correlation, and the IRT model (not presented) showed that it had poor discrimination. The item “fatigue causing distress” was judged to have more affective than behavioral content. Although no psychometric issue emerged, the item on “socializing with friends” was judged to be highly related to the item on “engaging in enjoyable activities”; the latter was preferred because it captures a broader range of impact on activities done alone or in a group. The item “engaging in enjoyable activities” had the highest literacy demand but was retained for these content reasons.
In the affective subscale, four items tested positive for differential item functioning between African-Americans and whites. In addition, two item pairs were found to exhibit local dependence. Given this psychometric evidence, we relied on content expertise and literacy demand to select the final three items for the affective subscale of the PFS-12.
For the sensory subscale, two items tested positive for differential item functioning. In addition, three item pairs were found to be locally dependent. One item from each pair was removed to resolve the local dependence.
In the cognitive/mood subscale, two items tested positive for differential item functioning and three item pairs had evidence for local dependence.
For each subscale, the differential item functioning analyses were repeated as a sensitivity analysis using the collapsed 4-category response scale. There was still no differential item functioning by education level. Consistent with the 11-category findings, the item “exhilarated to depressed” on the cognitive/mood subscale tested positive for differential item functioning by race. However, the item “able to remember” did not show differential item functioning with the collapsed categories. The item “lively to listless” on the sensory subscale tested positive for differential item functioning by race with both the 11-point and the 4-point scales, whereas with the 4-point scale the sensory item “energetic to unenergetic” no longer showed differential item functioning by race. Unlike the 11-category findings, the sexual activity item in the behavioral subscale tested positive for differential item functioning by race. In the affective subscale, only one item (“pleasant to unpleasant”) showed differential item functioning by race; it was retained for content validity reasons.
Overall, the PFS-12 maintained high scale reliability (α = .92), compared with .96 for the original 22-item PFS-R. Reliability for each PFS-12 subscale also remained above .80: behavior (.89), affective (.87), sensory (.87), and cognition/mood (.87). The formatted version of the PFS-12 and the accompanying scoring manual are provided in the Appendix.
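One way to put the drop from .96 to .92 in context is the Spearman–Brown prophecy formula, ρ* = nρ / (1 + (n − 1)ρ), where n is the ratio of new to old test length. This formula is not part of the authors’ analysis, and it assumes the removed items are exchangeable with the retained ones, which targeted item selection does not guarantee; it is shown only as a rough benchmark.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predicted reliability when test length is multiplied by length_ratio."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if length_ratio <= 0.0:
        raise ValueError("length_ratio must be positive")
    return length_ratio * reliability / (1.0 + (length_ratio - 1.0) * reliability)


# 22-item PFS-R (alpha = .96) shortened to 12 items.
predicted = spearman_brown(0.96, 12 / 22)
print(round(predicted, 3))  # 0.929
```

The predicted value of about .93 is close to the observed .92, in line with the point that shortening the scale cost little overall reliability.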
In a diverse sample of breast cancer survivors approximately 3 years from diagnosis, the 22-item PFS-R was shortened to the 12-item PFS-12 based on multiple criteria including reliability, validity, literacy demand, and response bias (i.e., differential item functioning) by race. This ten-item reduction in the PFS-R has the potential to reduce response burden in future studies while still maintaining a high level of precision for group-level comparisons at the subscale level (i.e., behavioral, affective, sensory, and cognitive/mood aspects of fatigue).
After testing alternative factor analytic models, results indicated that a four-factor model representing the original four PFS-R subdomains fit the data better than a one-factor model. This provides evidence that the four subscales may represent distinct aspects of the fatigue experience as reported by breast cancer survivors.
It is possible, however, that the support for the multidimensional model reflects methodological artifact more than distinct fatigue subdomains. For instance, items within each subscale are worded more similarly to one another than to items in other subscales; in addition, items presented next to each other will, on average, be more related than items further apart. If the “distinctiveness” of the factors were purely artifact, this would suggest that the fatigue experience affects all aspects of a person’s life similarly: behavior, sensory experience, affect, and cognition/mood. In support of this perspective, a dominant underlying factor accounted for 58 % of the variance, and the four subscales were highly correlated (all correlations >0.67). In addition, the bifactor model supported a dominant general fatigue factor after accounting for the clustering of items within each subscale. This study cannot disentangle whether the evidence for a multidimensional model reflects distinct fatigue subdomains or artifact; it is likely a mixture of both.
Retaining the ability to provide scores on the PFS-12 for both overall fatigue and individual subdomains is valuable for researchers. It allows investigators to characterize the extent to which a health condition or treatment affects different aspects of the fatigue experience. In addition, an intervention may differentially affect specific fatigue subdomains; e.g., meditation may improve cognition/mood and have less effect on behavioral fatigue. For other research studies, only overall fatigue may be of interest, as a mediating variable or an outcome. For each application, the PFS-12 can be used without extensive response burden.
Evidence in the literature from the US and abroad supports the PFS as a multidimensional measure. Three psychometric studies of non-English translations of the PFS-R (Greek, French, and Italian) also identified the same four subscales. In contrast, two studies found evidence for a 3-factor solution combining the sensory and cognition/mood domains. One of these studies used a Brazilian translation of the PFS-R; the other used the English version in caregivers of stroke survivors. Lending additional support to the multidimensional model of fatigue, recent evidence suggests that the mechanisms driving fatigue may differ by fatigue dimension: increased inflammation (generally thought to produce fatigue in cancer survivors) was related to the behavioral and sensory aspects of fatigue but not to the more psychological aspects (affective and cognitive subscales).
The review of the literacy demand of the PFS-R found that items in the sensory subscale required a 3rd grade education or higher. The most demanding items were in the behavioral subscale and required an 8th grade education or higher (with one item requiring a post-high school education). This most demanding item was kept in the PFS-12 because it captures a broader range of enjoyable activities and applies to experiences alone or with a friend or partner. Future versions of the PFS-R or PFS-12 should consider revising this item to read, “To what degree is the fatigue you are now feeling interfering with your ability to do the activities you enjoy?” This simple modification does not change the item content but lowers the literacy demand to a 9th grade education level. Additional cognitive testing of this revised item is recommended.
This study expands on findings from a published study of the same breast cancer cohort, which found that fatigue, measured by the PFS-R, was associated with poorer HRQOL. Specifically, fatigue was associated with pain, cognitive problems, antidepressant use, weight gain/personal appearance, and physical inactivity. Because that study provided evidence for the validity of the PFS-R through its associations with related clinical and HRQOL factors, these analyses were not duplicated here.
This study had limitations. First, although this psychometric analysis used a population-based sample of breast cancer survivors, the results cannot be generalized to other survivor groups or health conditions. Second, race comparisons are not based on women recruited equally across sites: the Los Angeles site collected all African-American data, and New Mexico collected the majority of Hispanic data. Thus, the differential item functioning results by race could reflect geographic differences. In addition, eligible African-American women were restricted to 35–64 years of age, so age differences could be another source of the differential item functioning findings. Restricting white women to the same age range would not have yielded a large enough sample for differential item functioning testing.
We therefore recommend additional analyses of other datasets that include the PFS-R to confirm these findings, including datasets that used a 7-day reference period instead of the 30-day reference period used in this study. The psychometric properties of the PFS-12 also need to be reconfirmed in new samples in which the 12 items are presented together rather than dispersed across the larger PFS-R. Evaluations in women undergoing active treatment and in other cancer populations that include both men and women are also recommended.
Despite these limitations, the brief and reliable PFS-12 should be of great value in future patient-centered outcomes research studies that aim to capture the multidimensional aspects of the fatigue experience. Additional research is planned to identify clinically meaningful cut-points on the PFS-12 that classify individuals as mildly, moderately, or severely fatigued and to characterize the association between these cut-points and HRQOL decrements. Together, a psychometrically sound and decision-relevant fatigue measure will enable researchers to provide empirical evidence about the impact of interventions on survivors’ lives, which may lead to safer and more effective treatments.
Dr. Reeve’s work was supported under a National Cancer Institute contract: HHSN261201000642P.
Conflict of interest
Barbara Piper is the developer of the scale we analyzed. There are no other conflicts of interest to report.