Background

The importance of symptom control in cancer populations, in particular, has been widely recognized due to the extraordinarily high prevalence of physical and psychological symptoms as well as the impact of these symptoms on patients' quality of life (QOL) [1]. For patients with advanced disease who have reduced life expectancy and no immediate hope for a cure, relief of physical symptoms and maintenance of function become primary objectives of medical intervention [2–4]. This is true for advanced prostate cancer, in particular, where patients are faced with palliative rather than curative treatment options [3].

Although the literature contains a number of reliable and valid instruments to measure quality of life (QOL) [5–8], oncology health care experts and regulatory agencies have resisted using these multi-item, multi-dimensional instruments in clinical practice and decision-making [9–12]. This resistance stems from time and resource constraints [13, 14], difficulty interpreting the meaning of multidimensional information, and difficulty determining the clinical meaning of score changes, including implications for treatment decisions [12, 15–20].

The U.S. Food and Drug Administration (FDA) has stated that, along with survival, benefit to QOL is one of two primary endpoints that could be considered for approval of new anti-cancer drugs [21]. Yet this regulatory agency is also challenged by the implications of multidimensional QOL assessment for claims of drug effectiveness [22]. The FDA Oncologic Drugs Advisory Committee (ODAC) subcommittee on Quality of Life has suggested that assessment of symptoms might represent a reasonable place to start in working toward a goal of more focused assessment of QOL domains [23].

Most recently validated measures of cancer-specific QOL incorporate an assessment of certain prevalent symptoms, such as pain and fatigue, within the multidimensional assessment [5, 6]. Broad-based cancer-specific QOL questionnaires, such as the Functional Assessment of Cancer Therapy-General (FACT-G) [6] and European Organization for Research and Treatment of Cancer (EORTC) QLQ-C30 [5], assess a few common cancer symptoms such as pain, fatigue and nausea, and add more detailed, site-specific symptom assessment to the "core" general questionnaire. While the questionnaires have been developed and tested to assess cancer-specific symptoms, disease symptoms of interest are embedded in larger, longer QOL questionnaires and cannot readily be aggregated into clinically relevant, responsive symptom indices. A common request, therefore, is for a more symptom-focused approach to QOL assessment, in which the disease symptoms measured by these multi-dimensional QOL questionnaires are aggregated in a way that is clinically relevant, easy to use in clinical practice, and psychometrically acceptable.

One response to this need was launched by the National Comprehensive Cancer Network (NCCN) in relation to nine common cancers, including prostate cancer [24]. This effort revealed that there are seven symptoms or concerns in prostate cancer that hold the very highest priority for clinical experts who treat men with advanced prostate cancer. These concerns include fatigue, pain (3 items), weight loss, difficulty with urination, and being able to enjoy life. This article describes the development and initial validation of a brief prostate cancer-specific symptom index derived from a well-established multidimensional QOL questionnaire, the FACT-P [4].

Methods

The FACT Prostate Symptom Index (FAPSI), a brief, symptom-targeted instrument, was developed and validated in three phases. During phase 1, we extracted a list of symptoms related to cancer in general as well as prostate cancer specifically from the FACT-P to develop a prostate cancer symptoms/concerns survey (Survey). In phase 2, we presented the Survey to an international sample of physician experts for selection of the highest priority symptoms to evaluate when treating men with advanced prostate cancer. During phase 3, we analyzed data from a clinical trial in which the FACT-P was administered to patients to determine the psychometric performance of the FAPSI-8.

Participants

The sample of physicians who were asked to complete the Survey had at least three years' experience treating 100 patients with advanced prostate cancer. The total sample included 23 medical oncologists (17 North American, 6 European), 13 radiation oncologists (9 North American, 4 European), and 20 urologists (11 North American, 9 European). A total of 44 physician experts (18 medical oncologists, 16 urologists, 10 radiation oncologists; 29 North American, 15 European) completed the Survey, for an overall response rate of 79%. The response rate was consistent among specialties and between geographic areas, ranging from 77% to 80%.

The validation patient sample consisted of 288 men with hormone refractory prostate cancer enrolled in a randomized, placebo-controlled clinical trial of atrasentan, an oral, selective endothelin-A receptor antagonist. See Table 1 for patient demographic and clinical characteristics. Institutional review board approval was obtained at each institution where data were collected in the clinical trial.

Table 1 Description of Patient Sample

Measures

The source of symptoms and concerns for both the survey tool and the symptom index was the FACT-P [4], comprised of the 27 FACT-G items plus 12 items specific to prostate disease, such as urinary, sexual and bowel dysfunction, and pain [25]. The FACT-G (version 4) is a 27-item compilation of general questions divided into 4 primary QOL domains: Physical Well-Being (PWB), Social/Family Well-Being (S/FWB), Emotional Well-Being (EWB), and Functional Well-Being (FWB) [6]. Scores are obtained for each of the specific domains as well as a total QOL score. An additional score, the Trial Outcome Index (TOI) [26, 27], is created by summing the PWB, FWB, and Prostate Cancer Subscale (PCS). Responses to FACT questions use a five-point Likert-type scale ranging from 0 ("not at all") to 4 ("very much so"). The FACT-G has good test-retest reliability (r range = 0.82 – 0.92), is sensitive to change over time, and has been shown to possess good convergent and discriminant validity [6]. Both the FACT-G and FACT-P were derived using a thorough item generation and review procedure with patients and clinicians, ensuring that important content is well-covered.

The European Organization for Research and Treatment of Cancer (EORTC) QLQ-C30 QOL questionnaire is a widely-used, validated instrument that includes 30 questions measuring physical, role, emotional, and social functioning, disease symptoms, financial impact, and global QOL [5]. A global score as well as symptom scores (e.g., pain, fatigue) can be calculated. The QLQ-C30 was used in this study to evaluate convergent validity.

The Eastern Cooperative Oncology Group (ECOG) Performance Status Rating (PSR) is a single-item rating of the degree to which patients are able to participate in typical activities without the need for rest [28]. This index is widely used in cancer clinical trials to assess functional capability of patients as they undergo treatment. The PSR score ranges from 0 (normal activity without symptoms) to 4 (unable to get out of bed). In this study, the PSR was obtained from patients themselves and served as a means of classifying patients for known-groups validation.

Procedure

An independent review of the 39 items on the FACT-P was conducted by two medical oncologists with subspecialties in health services research and policy and one of the co-authors (DC), a clinical psychologist specializing in QOL assessment. First, symptoms or concerns from the FACT-G deemed to be a consequence of the disease itself were selected. Second, symptoms or concerns specific to prostate cancer (from the PCS) were subjected to the same review process. Items were rated on a four-point Likert scale from "always disease-related" to "never disease-related." Items rated as "always" or "usually" disease-related symptoms or concerns by two or more of the three raters were retained for the Survey. The raters discussed any items that did not receive a consensus rating in a conference call. Twenty-nine items resulting from this two-step process were compiled in a symptom/concern survey. The only revision made to the wording of any item was the addition of "bone pain" as clarification in parentheses after the question: "I have certain areas of my body where I experience significant pain."
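The retention rule described above (kept when at least two of the three raters chose "always" or "usually") can be sketched as follows. This is an illustration only: the intermediate scale label "sometimes", the item identifiers item_B and item_C, and all ratings shown are hypothetical and are not taken from the study.

```python
# Minimal sketch of the two-of-three rater retention rule.
# Assumption: the four scale labels are "always", "usually", "sometimes", "never";
# the text only names the two endpoints and "usually".
DISEASE_RELATED = {"always", "usually"}


def retain_item(ratings, min_raters=2):
    """Return True if at least `min_raters` raters judged the item
    'always' or 'usually' disease-related."""
    votes = sum(1 for rating in ratings if rating in DISEASE_RELATED)
    return votes >= min_raters


# Hypothetical ratings from three raters (not study data).
example_ratings = {
    "I have pain": ["always", "usually", "usually"],
    "item_B": ["usually", "sometimes", "always"],
    "item_C": ["never", "sometimes", "usually"],
}

retained = [item for item, ratings in example_ratings.items() if retain_item(ratings)]
print(retained)
```

Items that fall short of the threshold would, per the procedure above, go to discussion among the raters rather than being dropped automatically; the sketch does not model that step.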

To control for effects due to order of administration of items, four versions of each survey were created and randomly distributed. The survey asked each respondent to select no more than 10 symptoms or concerns that were "the most important to monitor when assessing the value of treatment for advanced prostate cancer." Of the ten symptoms/concerns nominated as "the most important," each respondent was then asked to select up to five as "the very most important." Respondents were also asked to write in important symptoms or concerns that were omitted from the Survey. Surveys were sent to physician experts via email, traditional mail, fax and/or distribution at cooperative group conferences (i.e., ECOG). Each physician who returned a survey and completed the participant information section was compensated for his/her time.

Survey Analysis Plan

The Survey was analyzed by tabulating the frequency with which experts selected a particular symptom/concern as one of the five most important for the total sample and for each specialty (medical oncology, radiation oncology, urology) and geographic region (North American, European). The items most commonly endorsed by the total sample were retained in the final symptom index. The criterion for item retention was a rate of endorsement as one of the top five symptoms/concerns exceeding the probability of chance (17%), calculated by dividing the allowable number of "very most important symptoms" (5) by the total number of items in the survey (29). In addition, a 95% confidence interval (CI) above chance probability was calculated to serve as a more conservative criterion for selection. Using the total sample, 2 × 2 Chi square analyses were conducted to determine if the order of presentation of the symptoms had any systematic influence on experts' selection of the ten "most important" symptoms.
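The arithmetic behind the retention criterion can be sketched as below. The chance probability follows directly from the text (5 allowable picks out of 29 items). The confidence bound is an assumption: the text does not say how the 95% CI was computed, so a normal approximation to the binomial with the 44 respondents is used here purely for illustration. The chi-square helper is the standard Pearson statistic for a 2 × 2 table without continuity correction, which is also an assumption about the form used; the tables passed to it are hypothetical.

```python
import math


def chance_probability(max_picks=5, n_items=29):
    """Probability that a given item is among the picks under random selection."""
    return max_picks / n_items


def upper_ci_bound(p, n_respondents, z=1.96):
    """Upper limit of a normal-approximation 95% CI around p (assumed method)."""
    standard_error = math.sqrt(p * (1 - p) / n_respondents)
    return p + z * standard_error


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denominator


p_chance = chance_probability()          # 5 / 29, roughly 17%
conservative = upper_ci_bound(p_chance, 44)

print(f"chance threshold: {p_chance:.3f}")
print(f"conservative threshold (assumed method): {conservative:.3f}")

# Hypothetical order-effect table: rows = item shown in top/bottom half,
# columns = endorsed / not endorsed.
print(chi_square_2x2(15, 5, 5, 15))
```

An item would meet the liberal criterion when its endorsement rate exceeds the first value and the conservative criterion when it exceeds the second.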

Validation Analysis Plan

Patient data used for the initial validation analyses included only those time points from baseline through week 24 of the 52-week trial because of patient attrition. Because the objective of this study was to develop and validate a brief symptom index as opposed to determining treatment response, and the sample size remained adequate, we did not feel the study objectives were compromised by this cut-off point.

Patient responses to the items retained for the FAPSI were subjected to analysis for determination of internal consistency (Cronbach's alpha), and convergent and discriminant validity. Guyatt's Responsiveness Statistic, a modification of the effect size, was calculated as an index of the responsiveness of FAPSI to change in clinical status [29]. This statistic is computed as the ratio of the difference between the average change in FAPSI scores among patients whose ECOG PSR worsened and the average change in FAPSI scores among patients whose PSR remained unchanged to the standard deviation of FAPSI scores among patients with no change in PSR. Similar to Cohen's effect size conventions [30], a Guyatt's statistic of 0.20–0.49 is considered small, 0.50–0.79 is moderate, and ≥ 0.80 is large.
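The two statistics named above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code. Two assumptions are made: the denominator of Guyatt's statistic is taken as the standard deviation of change scores in the stable group (the usual form of the statistic; the wording above could also be read as the standard deviation of the scores themselves), and values below 0.20 are labelled "below small" because the text gives no label for that range. All input data are hypothetical.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for `responses`, a list of respondents,
    each a list of k item scores (sample variances are used)."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def guyatt_statistic(change_worsened, change_stable):
    """(mean change in worsened group - mean change in stable group)
    divided by the SD of change in the stable group (assumed denominator)."""
    return (mean(change_worsened) - mean(change_stable)) / stdev(change_stable)


def responsiveness_label(value):
    """Classify the magnitude using the cut-offs given in the text."""
    size = abs(value)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"  # assumption: the text does not name this range


# Hypothetical change scores (follow-up minus baseline), not study data.
worsened = [-4, -6, -5]
stable = [-1, 0, 1, 0]
g = guyatt_statistic(worsened, stable)
print(round(g, 2), responsiveness_label(g))

# Two perfectly correlated hypothetical items give alpha = 1.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

The sign of the statistic depends on the direction of scoring, so the label is based on the absolute value.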

We also applied an item response theory (IRT) based approach to evaluate the unidimensionality and construct validity of the FAPSI candidate items in greater detail [31]. For items retained in the symptom index according to the more liberal of the criteria (i.e., exceeding chance probability of endorsement), Andrich's [32, 33] rating scale extension of the Rasch measurement model was used to determine whether FAPSI candidate items measure the same underlying construct. The WINSTEPS computer program [34] was used for Rasch analyses. Unweighted item fit mean square (MNSQ) values (expected value = 1.0) were also calculated to identify potential misfitting items or those that indicate a lack of construct homogeneity with other items in a scale to assure scale unidimensionality. MNSQ = 1.3 was set as the critical value for a misfitting item. The MNSQ value indicates the amount of error associated with the item estimate with respect to its fit with other items in the dimension being measured. For example, a MNSQ of 1.40 indicates 40% excess noise in the data, suggesting the item is measuring a different dimension than the one it is intended to measure.
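To make the fit statistic concrete, the sketch below computes an unweighted (outfit) mean square as the average squared standardized residual for one item and then interprets it. It assumes the model-expected score and model variance for each response are already available from a fitted Rasch model (in the study these came from WINSTEPS); the function names and the numbers used are hypothetical. The lower bound of 0.7 is the one applied later in the Results.

```python
def outfit_mnsq(observed, expected, variances):
    """Unweighted mean square: mean of squared standardized residuals,
    given model expectations and model variances for each response."""
    squared_residuals = [
        (x - e) ** 2 / v for x, e, v in zip(observed, expected, variances)
    ]
    return sum(squared_residuals) / len(squared_residuals)


def fit_flag(mnsq, low=0.7, high=1.3):
    """Classify an item using the 0.7-1.3 range reported in this study."""
    if mnsq > high:
        return "misfit"
    if mnsq < low:
        return "overfit"
    return "acceptable"


def excess_noise_percent(mnsq):
    """Percentage of noise above the expected value of 1.0."""
    return (mnsq - 1.0) * 100


# Hypothetical responses whose residuals match the model variance exactly.
value = outfit_mnsq([1, 0], [0.5, 0.5], [0.25, 0.25])
print(value, fit_flag(value))

# The example from the text: an MNSQ of 1.40 corresponds to 40% excess noise.
print(fit_flag(1.40), round(excess_noise_percent(1.40)))
```

A value near 1.0 means the observed residuals are about as large as the model predicts, which is why 1.0 is the expected value.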

Results

Survey

Of the symptoms/concerns presented to physician experts (Table 2), eight items were endorsed with a greater than chance probability (>17%) for the total sample and selected for FAPSI-8 (see Additional File 1 [Appendix]). Using the total sample, chi-square analyses revealed that presentation order had no systematic effect. The only symptom on the FAPSI-8 displaying a significant order effect was "I have pain" (χ²[1] = 7.3, p < .05). Physician experts whose Survey presented this item in the top half endorsed it more frequently than those whose Survey presented the item in the bottom half. Two additional write-in symptoms/concerns were each entered by more than one physician expert: hot flashes (2 experts) and PSA-related anxiety (2 experts).

Table 2 Frequency of endorsement of checklist symptom/concerns

Endorsed symptoms/concerns that ranked highly across all physician expert specialties (medical oncology, radiation oncology, urology) included three pain items (pain, bone pain, pain limiting performance of activities) and fatigue (Table 3). Variation between specialties in prioritization of weight loss and urinary difficulty was greater than for other items. Difficulty urinating was endorsed as among the five "very most important" symptoms/concerns by 50% of urologists and 40% of radiation oncologists, but by only 11% of medical oncologists. Geographic differences in ranking of urinary difficulty were also apparent: European experts (mostly urologists) ranked it second overall, whereas North American experts ranked it seventh.

Table 3 Rankings of FAPSI-8 Items by Expert Specialty and Geographic Region

FAPSI-8 Validation

Scores and internal consistencies for the FACT-G, FACT-P and FAPSI are reported in Table 4. The scale scores are presented both in raw form and transformed to a 0–100 scale for ease of comparison across scales.
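For readers who want to reproduce the 0–100 presentation, a simple linear rescaling is sketched below. This is an assumption about how the transformation was done; the text states only that scores were transformed to a 0–100 scale. The maximum raw score follows from the 0–4 response format described in Measures (for example, 8 items give a 0–32 range); the function name is hypothetical.

```python
def to_0_100(raw_score, n_items, max_per_item=4):
    """Linearly rescale a raw summed score to 0-100, assuming each item
    is scored 0..max_per_item so the raw range is 0..n_items * max_per_item."""
    max_raw = n_items * max_per_item
    if not 0 <= raw_score <= max_raw:
        raise ValueError("raw score outside the possible range")
    return 100.0 * raw_score / max_raw


print(to_0_100(16, n_items=8))    # midpoint of an 8-item index
print(to_0_100(108, n_items=27))  # maximum of a 27-item scale
```

Rescaling in this way leaves correlations and significance tests unchanged and only affects how easily scales of different lengths can be compared.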

Table 4 Descriptive Baseline Statistics of Scales (N = 272–278)

Items with mean square (MNSQ) values outside 0.7–1.3 have been identified as possible misfitting items, indicating that further examination may be warranted (Linacre & Wright). MNSQ < 0.7 suggests "overfit" to the concept being measured, and MNSQ > 1.3 suggests misfit to the dimension being measured by the collection of FAPSI questions. These analyses suggest that the items "I have difficulty urinating" and "My problems with urinating limit my activities" do not measure a construct consistent with the other 6 items. Excluding these items to form the FAPSI-6 produced essentially no change in internal consistency. However, because these items received frequent endorsement by the physician experts, we elected to retain them in the FAPSI-8 (Table 5).

Table 5 Summary of item statistics for FAPSI-8

The FACT-G and FACT-P had good internal consistency (baseline alpha = 0.84 and 0.87, respectively). PWB, FWB, and EWB subscales (alphas = 0.69 to 0.85) as well as the PCS (alpha = 0.70) and TOI (alpha = 0.86) also demonstrated good internal consistency. The internal consistency of the SFWB scale was lower than that of the other domain scales (alpha = 0.59).

Because the item level analyses suggested that both a 6- and 8-item version of the FAPSI warranted consideration, analyses of the FAPSI were conducted on both versions. The 6-item symptom index excluded the items, "I have difficulty urinating" and "My problems with urinating limit my activities." Internal consistency of the FAPSI-6 (alpha = 0.68) and FAPSI-8 (alpha = 0.67) was adequate at baseline, and by week 24 increased to 0.81 and 0.80, respectively (Table 4).

FAPSI-8 was significantly and positively correlated with the FACT-G total score (r = 0.51, p < .001), PWB (r = 0.66, p < .001), FWB (r = 0.44, p < .001), EWB (r = 0.40, p < .0001), FACT-P total score (r = 0.71, p < .001), PCS (r = 0.85, p < .001), and TOI (r = 0.80, p < .001), as well as the EORTC global score (r = 0.48, p < .001), and negatively correlated with the EORTC pain symptom scale (r = -0.72, p < .001) and fatigue symptom scale (r = -0.59, p < .001) (Table 6). The magnitude of correlations of the 6-item symptom index with the above scales was comparable to FAPSI-8, with the exception of the three FACT scales that include the two urination items excluded from the 6-item index (FACT-P, r = 0.67; PCS, r = 0.73; TOI, r = 0.74, all p < .001). Neither symptom index was significantly correlated with the FACT SFWB subscale.

Table 6 Unadjusted and adjusted correlations between baseline FAPSI-6 & FAPSI-8 and study measures (N = 272–278)

FAPSI-6 and FAPSI-8 had comparable responsiveness on Guyatt's statistic (Table 7). While all of the scales have responsiveness statistics consistent with large effect sizes, the responsiveness statistics for FAPSI-6 and FAPSI-8 were among the largest (1.42 and 1.29, respectively) and were comparable to that of the commonly-recommended FACT TOI (1.33).

Table 7 Guyatt's Responsiveness Statistics for FAPSI-6 and FAPSI-8

The sample was divided into three groups by baseline PSR: PSR = 0 versus PSR = 1 versus PSR = 2 (no patients were rated a PSR ≥ 3). Better symptom status (higher FAPSI score) was expected to be associated with better performance status (lower PSR). Baseline PSR was associated with QOL and symptom status as measured by FACT-G total score (F(2,269) = 19.97, p < .0001), FACT-P total score (F(2,268) = 25.09, p < .0001), PWB (F(2,274) = 30.90, p < .0001), FWB (F(2,273) = 30.87, p < .0001), EWB (F(2,272) = 3.55, p < .05), PCS (F(2,274) = 20.01, p < .0001), TOI (F(2,273) = 40.16, p < .0001), and both the FAPSI-6 (F(2,274) = 19.06, p < .0001) and FAPSI-8 (F(2,274) = 21.46, p < .0001). SFWB scores were not significantly different between the three PSR groups. Post hoc review of the subgroup differences using Tukey's HSD indicated that the FACT-G, FACT-P, PWB, FWB, and TOI differentiated all three PSR levels (all p < .05). In contrast, PCS, FAPSI-6 and FAPSI-8 were able to differentiate only between PSR = 0 and PSR = 1 or PSR = 2 (Figure 1).

Figure 1

Mean FACT scale responses (± one standard error of the mean) by baseline patient ECOG Performance Status Rating (PSR). PSR groups were trichotomized into PSR = 0 (n = 159–160), PSR = 1 (n = 102–105), and PSR = 2 (n = 12). [1] indicates discrimination between (PSR = 0) v (PSR = 1) v (PSR = 2); [2] indicates discrimination between (PSR = 0) v (PSR = 1 or 2). *p < .05, ***p < .001

At week 24, PSR remained associated with QOL and symptom status: FACT-G total score (F(2,117) = 12.91, p < .0001), FACT-P total score (F(2,117) = 12.25, p < .0001), PWB (F(2,119) = 14.23, p < .0001), FWB (F(2,118) = 21.51, p < .0001), EWB (F(2,118) = 3.62, p < .05), PCS (F(2,119) = 7.10, p < .01), TOI (F(2,118) = 16.84, p < .0001), and both the FAPSI-6 (F(2,119) = 11.75, p < .0001) and FAPSI-8 (F(2,119) = 9.99, p < .0001). SFWB scores did not differ between the three PSR groups. As with the baseline differences, post hoc review with Tukey's HSD indicated that the FACT-G, FACT-P, PWB, FWB, and TOI differentiated all three PSR levels (all p < .05). At week 24, the PCS was able to differentiate only between PSR = 0 and PSR = 2. The ability of the six- and eight-item FAPSI scales to discriminate between PSR groups was intermediate, differentiating between PSR = 0 and PSR = 1 or PSR = 2 (Figure 2).

Figure 2

Mean FACT scale responses (± one standard error of the mean) by Week 24 patient ECOG Performance Status Rating (PSR). PSR groups were trichotomized into PSR = 0 (n = 70), PSR = 1 (n = 37–39), and PSR = 2 (n = 13). [1] indicates discrimination between (PSR = 0) v (PSR = 1) v (PSR = 2); [2] indicates discrimination between (PSR = 0) v (PSR = 1 or 2); [3] indicates discrimination between (PSR = 0) v (PSR = 2). *p < .05, **p < .01, ***p < .001

Discussion

The objective of this project was to develop a brief symptom index for advanced prostate cancer from items derived from an existing, well-established multidimensional QOL questionnaire, the FACT-P. Based on the input of an international sample of 44 expert physicians, an eight-item symptom index was constructed. Initial patient validation of the eight items demonstrated that these items have adequate reliability and validity to assess the most important symptoms in this population. The FAPSI-6 and FAPSI-8 were shown to have good internal consistency, and convergent validity was demonstrated by their significant correlations with the FACT-G and its PWB, EWB, and FWB subscales as well as with the FACT-P and the Prostate Cancer Subscale. The FAPSI-6 and FAPSI-8 also successfully discriminated patients based on differences in performance status at baseline and week 24, with patients with better performance status reporting better symptom status than those with poorer performance status. Although they had comparable responsiveness on Guyatt's statistic, neither the FAPSI-6 nor the FAPSI-8 separated performance status groups quite as well as the FACT-G, FACT-P, PWB, FWB and TOI. Further research is needed to determine whether the FAPSI-6 or FAPSI-8 is best used in concert with other measures, such as the FACT-G, FACT-P, EORTC QLQ-C30 or SF-36.

The candidate items presented to the experts for selection were drawn from the FACT QOL measurement system, although experts were also provided with the opportunity to 'write in' items not appearing on the surveys. Results of this project suggest that the FACT-P contains most of the disease-related symptoms and concerns that physicians believe are important to monitor in this patient population. Results of item-level analyses also suggested that a 6-item version of the FAPSI, excluding two items related to urination difficulties, demonstrated good psychometric performance in this population. However, the slight psychometric gain with respect to unidimensionality must be weighed against the sacrifice in clinical utility resulting from these two items deemed relevant by expert clinicians. Although patients did not participate in the choice of target symptoms, they did participate, in a 3:1 ratio, in the selection of the original items during development of the FACT-Prostate. It remains to be seen, however, if patients would select similar or the same 8 symptoms when presented with this task.

The symptoms endorsed as the most important included three pain items ("I have pain," "My pain keeps me from doing things I want to do," and "I have certain areas of my body where I experience significant pain [bone pain]"). Five questions of the 29 on the survey pertained to pain whereas, for example, only one was devoted to fatigue. We believe that the frequency with which these multiple pain items were endorsed among the top five "most important" highlights the importance of pain experiences in advanced prostate cancer patients, but this must be confirmed in subsequent studies.

Observed consistencies and differences in item endorsement between expert groups (specialty and region) were informative in two ways. First, the eight final items comprising the FAPSI-8 were selected based on the combined endorsements of a range of specialists treating advanced prostate cancer patients, and respondents from all three specialties and both geographic regions endorsed most of the eight items. Second, some items were prioritized differently across groups. Difficulty urinating was endorsed as among the top five priority symptoms by 50% of urologists and 40% of radiation oncologists in this sample but only 11% of medical oncologists. This same question was endorsed as a priority symptom by 60% of European experts and only 17% of North American experts, but this is probably due to the greater representation of urologists in the European sample than in the North American sample (53% vs. 28%, respectively).

The priority symptoms identified by expert physicians in this study are consistent with previously reported symptoms and concerns of cancer patients in general, and prostate cancer patients specifically. Pain and fatigue have been highlighted in a number of studies of symptom assessment in numerous medical oncology populations [1, 35, 36]. In addition, depending on the stage of disease, patients with prostate cancer report difficulties with anorexia, urination, and sexual function [4, 37]. The NCCN survey, using a similar methodology but a U.S. sample only, produced six of the same eight symptoms (fatigue, bone pain, pain, pain limits performance, weight loss, difficulty urinating) in its seven-item NCCN/FACT Prostate Symptom Index, which also includes an item concerned with being able to enjoy life [24].

For patients with advanced prostate cancer, especially hormone refractory prostate cancer, current therapy has limited ability to extend life and is associated with some morbidity [38–41]. The choice of additional therapies can be justified only when symptomatic relief or maintenance or improvement in QOL is reasonable to expect [42]. Some treatments have demonstrated beneficial effects on disease-related symptoms and QOL [43]. The availability of patient-reported symptom and QOL information may be useful in helping patients and physicians make more informed choices about treatments as well as cope with the consequences of the choices they make [44].

Disease-specific symptom assessment has the potential to play a key role in evaluating patient-related endpoints in clinical trials. Cancer of a specific site is often accompanied by a distinct constellation of symptoms. Some clinical trials contain endpoints that include multidimensional QOL along with disease- or treatment-specific endpoints [45]. While the assessment of global QOL is important, the use of global QOL scores may obscure important and significant changes in disease-related symptoms when those symptoms are embedded in a larger instrument [36]. This underscores the importance of targeting some assessment toward pre-specified, priority disease-related symptoms. Further, the FDA Oncologic Drugs Advisory Committee (ODAC) subcommittee on QOL has advanced the position that overall claims of QOL benefit should not be made from one or two domain measurements and that claims made in this area need to be specific to the domain that was measured [23]. An abbreviated, symptom-focused assessment could lend support to the use of more targeted claims, such as "symptomatic relief" or "delay of onset of tumor-related symptoms."

The use of brief assessment tools to assess symptomatology may serve the interests of the clinical investigators and regulatory authorities as well as the patients being treated for these various diseases. From a clinician's perspective, assessment of symptomatology may represent an efficient and clinically-relevant means of obtaining information related to the symptom component of QOL. It may also help identify patients who would benefit from palliative interventions [17]. Systematic symptom assessment may help to clarify a treatment's toxicity, potential palliative benefit, or need to make a change in the patient's clinical management [45]. It is noteworthy that a degree of responsiveness in the 8-item index reported here is lost relative to the full-length FACT-P. In addition, some important areas of patient concern are necessarily omitted from this brief index. Thus, while this eight-item scale has been shown to be a suitable index of important symptoms associated with prostate cancer, it is not a replacement tool for the FACT-P. Each individual user must decide whether competing considerations of content relevance, clinical interpretability, and length would suggest use of one or the other in a given application. This research expands the range of assessment and reporting options with the FACT measurement system.

Conclusions

In summary, experts in the management of prostate cancer can reach consensus about the symptoms and concerns that are most important to monitor when treating patients with advanced disease. Furthermore, the symptoms identified by experts as the very most important to assess in treating patients with advanced cancer can be derived from a well-established multidimensional QOL questionnaire. Both the FAPSI-6 and FAPSI-8 represent the constellation of symptoms/concerns endorsed by our sample of experts and possess adequate psychometric properties and sensitivity to justify their use in future studies. The decision regarding six or eight items, or other reasonable combinations of a priori targeted questions, can be left to the discretion of the investigator and may be dictated by preferred length of scale, weighting of symptom category, or particular cluster of symptoms/concerns of interest. Future work will examine the extent to which changes in symptomatology as measured by these brief indices translate into meaningful improvement to the patient.