The Functional Assessment of Cancer Therapy-General (FACT-G) Questionnaire, designed to measure quality of life (QOL) in cancer patients [1], has undergone many validation studies in both its English and translated versions [2-5]. However, little research has been conducted among South American Spanish-speaking patients. A previous study of Uruguayan cancer patients using the FACT-G Spanish Version 2 [6] showed acceptable to good reliability and validity, except for the emotional well-being subscale. Cella and his colleagues also recognized this potential problem and subsequently revised the items in a new Spanish Version 3 and in the most recent Spanish Version 4 [2] in order to improve reliability. Major changes include adding to the scoring algorithm one item that was previously available but not scored ("I worry my condition will get worse" – "Me preocupa que mi enfermedad empeore"), rephrasing some other items to improve readability, and removing the 2-item Relationship with Doctor subscale. These changes highlight the need for additional validation studies in the Spanish-speaking cancer patient population of South America.

Another reason for further psychometric studies of the FACT-G is that, to date, its Spanish version has been validated using classical test theory (CTT) approaches, like other QOL questionnaires. Traditionally, classical psychometric procedures have dominated health status assessment. More recently, item response theory (IRT) measurement models have entered the field, and researchers are increasingly enthusiastic about the prospect of deriving better definitions of underlying constructs and the opportunity to turn attention away from static tests and scales toward items and the incremental information they provide [7]. The present study aimed to evaluate the performance of the FACT-G Spanish Version 4 in Uruguayan cancer patients using both classical psychometric and item response theory based approaches.



The data were collected from cancer patients between 18 and 75 years of age, with various tumor sites, at different stages of disease and under different forms of treatment. To be eligible, patients had to be fluent in Spanish. Potential participants were identified from the daily record of office visits, treatment visits and inpatient hospitalizations. To ensure sufficient experience with treatment-related side effects, patients had to have completed a minimum of two cycles of chemotherapy and/or 10 radiation therapy sessions, or one month of hormone therapy. At least one month had to have elapsed since the last surgery. To ensure heterogeneity of socioeconomic features, patients were recruited from one private hospital (Centro de Asistencia del Sindicato Médico del Uruguay – CASMU) and three public hospitals (Hospital de Clínicas de la Universidad de la República, Instituto de Oncología del MSP, Servicio de Oncología Radioterápica del Hospital Pereira Rossell) in the city of Montevideo. Patients with ostensible cognitive deficits or serious psychiatric dysfunctions were excluded. Ability to give informed consent was required. Approval from the corresponding ethics committees was obtained.


Patients were assessed using a battery of instruments. Two physician-rated QOL questionnaires, the ECOG Performance Status Rating and Spitzer's Quality of Life Index doctor version (QLI-d), were completed by either the treating physician or the oncologist. Patient self-reported questionnaires included the Functional Assessment of Cancer Therapy-General (FACT-G) Spanish Version 4, Spitzer's Quality of Life Index patient version (QLI-p), the Profile of Mood States, Short Form (POMS-SF) and the Marlowe-Crowne Social Desirability Scale (MCSDS).

ECOG Performance Status Rating (ECOG PSR) is a five-point scale [8] ranging from 0 (fully ambulatory) to 4 (not being able to leave bed).

The Quality of Life Index (QL-I) [9] is a five-item questionnaire in which each item explores one dimension or domain of quality of life: health, activity, daily living, support and outlook. Every item has three response categories indicating different levels of functional impairment. Although it was originally developed as an observer-rated scale (QLI-d), it can also be used as a patient-rated scale (QLI-p). For the purpose of the study, the QL-I was translated into Spanish following a forward and backward translation procedure, carried out by a native English-speaking linguist and a native Spanish-speaking English translator. The Spanish versions of the two QLI questionnaires are available from the authors upon request.

The Functional Assessment of Cancer Therapy-General Questionnaire (FACT-G) Spanish Version 4 [2] is a widely used QOL instrument. It comprises 27 questions that assess four primary dimensions of QOL: physical (PWB; 7 items), social and family (SFWB; 7 items), emotional (EWB; 6 items), and functional well-being (FWB; 7 items). It uses 5-point Likert-type response categories ranging from 0 = 'not at all' to 4 = 'very much'. The total FACT-G score is the sum of the four subscale scores and ranges from 0 to 108. Data from a previous study of the FACT-G Spanish Version 2 [6] conducted among Uruguayan cancer patients suggested that its reliability was acceptable to good in all subscales except the EWB scale. An important question raised by these results was whether these subscales showed sufficient internal consistency to justify their use across cultures and whether the Spanish EWB was equivalent to its English counterpart. Since then, the developers of the FACT-G have revised the questionnaire into its most recent Version 4 [2]. Major changes were the inclusion of an additional item ("I worry my condition will get worse" – "Me preocupa que mi enfermedad empeore") in the EWB subscale, the removal of the two-item "Relationship with the Doctor" subscale and the rewording of 12 Spanish items to improve their readability.
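The scoring arithmetic described above can be illustrated with a minimal sketch. It assumes that item responses have already been oriented so that higher values mean better well-being and that no answers are missing; neither step is described in the text, and the function name `score_fact_g` is hypothetical rather than part of any official scoring software.

```python
# Number of items per FACT-G subscale, as stated in the text.
SUBSCALE_ITEMS = {"PWB": 7, "SFWB": 7, "EWB": 6, "FWB": 7}
MAX_ITEM_SCORE = 4  # responses run from 0 ('not at all') to 4 ('very much')


def score_fact_g(responses):
    """Return subscale scores and the total for one patient.

    `responses` maps each subscale name to a list of item scores (0-4).
    Assumption: items are already oriented so that higher means better QOL
    and there are no missing answers.
    """
    scores = {}
    for name, n_items in SUBSCALE_ITEMS.items():
        items = responses[name]
        if len(items) != n_items:
            raise ValueError(f"{name} expects {n_items} items, got {len(items)}")
        if any(not 0 <= x <= MAX_ITEM_SCORE for x in items):
            raise ValueError(f"{name} item scores must lie between 0 and {MAX_ITEM_SCORE}")
        scores[name] = sum(items)
    # The total is the sum of the four subscale scores.
    scores["TOTAL"] = sum(scores[name] for name in SUBSCALE_ITEMS)
    return scores


# Highest possible total: 27 items x 4 points = 108.
best = score_fact_g({name: [MAX_ITEM_SCORE] * n for name, n in SUBSCALE_ITEMS.items()})
print(best["TOTAL"])
```

The subscale maxima follow directly from the item counts: 28 points for the three 7-item subscales and 24 points for the 6-item EWB subscale.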

The Profile of Mood States, Short Form (POMS-SF) [10] is a widely used scale measuring subjective mood states, such as anxiety, tension, vigor, depression, fatigue and confusion. The POMS-SF is a valid measure of affective states and psychological adjustment in cancer patients and is available in Spanish. A Total Mood Disturbance score (POMS TMD) may be obtained by summing the scores of the five Tension, Depression, Anxiety, Fatigue and Confusion subscales and subtracting the Vigor score from that sum. Only patients with a sixth-grade or higher reading level were included in the analysis, in accordance with the instrument developers' instructions.

The 10-item short form of the Marlowe-Crowne Social Desirability Scale (MCSDS) [11] provides a measure of the degree to which participants endorse socially desirable characteristics. A validated Spanish version of the questionnaire [12] was completed by the study participants.

Demographic, disease and treatment information was collected from patients and the treating physician, and was verified by the research assistants against the participants' medical records.

Statistical analysis

The classical psychometric analysis of the data consisted of an examination of the reliability and validity of the FACT-G Version 4. Reliability was examined by internal consistency (Cronbach's coefficient alpha) for each subscale and the overall scale. Alpha coefficients of 0.70 or higher were considered acceptable [13]. Construct validity was assessed by comparing mean FACT-G total and subscale scores across known groups (i.e., patients' performance status and in- vs. out-patient status) and by studying the correlations between the instruments (convergent and discriminant validity).
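For readers unfamiliar with the coefficient, the following is a minimal sketch of how Cronbach's alpha is computed from item-level responses, using the standard formula alpha = k/(k-1) × (1 − Σ item variances / variance of the total score). The response matrix is hypothetical and is not study data; the function name `cronbach_alpha` is ours.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses (0-4) from five patients to a four-item scale.
example = [
    [4, 3, 4, 4],
    [2, 2, 3, 2],
    [1, 0, 1, 1],
    [3, 3, 3, 4],
    [0, 1, 0, 1],
]
alpha = cronbach_alpha(example)
# Report alpha and whether it meets the 0.70 criterion used in the text.
print(round(alpha, 2), alpha >= 0.70)
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as an index of internal consistency.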

Several methods have been described for establishing the clinical significance of QOL measures [14]. Anchor-based methods examine the relationship between scores on the instrument whose interpretation is under question (the target instrument) and some independent measure (an anchor). This approach requires that 1) the anchor is easily interpretable and 2) there is an appreciable association between the target and the anchor. Differences in scores in relation to the clinical anchors can then be used to set the minimum important difference or clinically meaningful change [15] in order to evaluate outcomes in clinical trials. Anchor-based clinical significance was studied in order to determine clinically significant differences in QOL assessments as measured by FACT-G total and subscale scores, using tumor stage as a definite clinical criterion commonly used in oncology [16]. Differences of 5 to 10 points on a 100-point scale are considered relevant in determining the clinical significance of QOL measures [17]. To examine the statistical magnitude of the observed differences, each mean difference was standardized by dividing it by its standard deviation (effect size). An effect size of d = 0.2 was taken to indicate a small difference, d = 0.5 a moderate difference, and d = 0.8 a large one [18].
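The effect size calculation can be sketched as follows. The text does not state which standard deviation was used as the denominator; this example assumes the pooled standard deviation of the two groups, which is one common choice. The scores are hypothetical, and the "negligible" label for values below 0.2 is our addition.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def label_effect(d):
    """Map |d| onto the small / moderate / large benchmarks quoted in the text."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical total scores for two stage groups (not study data).
local = [88, 92, 75, 81, 95, 70]
metastatic = [80, 85, 70, 74, 90, 66]
d = cohens_d(local, metastatic)
print(round(d, 2), label_effect(d))
```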

Rasch analysis

The item response data were analyzed using Andrich's [19-22] rating scale model (RSM). The RSM is an item response theory (IRT)-based measurement model that has been implemented in the WINSTEPS computer program [23]. The RSM specifies two facets (person latent trait, $B_n$; item location, $D_i$) and a set of step thresholds ($F_j$). The probability of person n responding in category j of item i can then be expressed by the log-odds formula:

$$\ln\left(\frac{P_{nij}}{P_{ni(j-1)}}\right) = B_n - D_i - F_j,$$

in which $P_{nij}$ is the probability of person n endorsing or choosing category j of item i, $P_{ni(j-1)}$ is the probability of person n endorsing or choosing category j - 1 of item i, $B_n$ is the latent trait measure (e.g., fatigue) of person n, $D_i$ is the location of item i, and $F_j$ is the step threshold between categories j - 1 and j. In the present study, for example, $F_1$ is the transition from intensity category 1 ("not at all") to category 2 ("a little bit") and $F_4$ is the transition from category 4 ("quite a bit") to category 5 ("very much"). Each threshold is the point on the latent trait scale (e.g., PWB) at which two consecutive category response curves intersect.
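To make the formula concrete, the sketch below computes the category probabilities implied by the rating scale model for one person and one item, and checks that the log-odds of adjacent categories equal Bn - Di - Fj. All numerical values are hypothetical, the function name `rsm_category_probs` is ours, and the code indexes categories from 0 whereas the text numbers them from 1. It is an illustration of the model, not of how WINSTEPS estimates its parameters.

```python
from math import exp, log


def rsm_category_probs(b, d, thresholds):
    """Category probabilities under the rating scale model.

    b: person measure (Bn), d: item location (Di), both in logits.
    thresholds: step thresholds F1..Fm shared by all items of the scale.
    Returns probabilities for the m + 1 ordered categories, lowest first.
    """
    # Cumulative sums of (b - d - Fj); the lowest category has an empty sum (0).
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(log_numerators[-1] + (b - d - f))
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: a person 0.5 logits above an item, four thresholds
# for a five-category response scale.
b, d = 1.0, 0.5
thresholds = [-1.5, -0.5, 0.5, 1.5]
probs = rsm_category_probs(b, d, thresholds)

# The adjacent-category log-odds reproduce Bn - Di - Fj.
for j, f in enumerate(thresholds, start=1):
    print(j, round(log(probs[j] / probs[j - 1]), 3), round(b - d - f, 3))
```

Because the thresholds are shared across items, the only item-specific parameter is the location Di, which is what distinguishes the rating scale model from models with item-specific thresholds.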

Item fit statistics

To examine how well each item fit a unidimensional construct (e.g., PWB), the infit and outfit mean square (MNSQ) item fit statistics provided by the WINSTEPS program were evaluated. Fit implies meeting the measurement requirements of item homogeneity and unidimensionality. It also supports the validity of the item calibrations and person measures. Item misfit indicates that an item is not measuring the same underlying construct as the other items within the same scale.

The infit MNSQ is an information-weighted fit statistic, which is more sensitive to unexpected behavior affecting responses to items near the person's trait level. The weighting reduces the influence of less informative, low-variance, off-target responses. The outfit MNSQ is an outlier-sensitive fit statistic, more sensitive to unexpected behavior by persons on items far from the person's trait level. These statistics have an expected value of 1.0 and range from 0 to infinity. Values substantially below 1 indicate local dependencies in the data; values substantially above 1 indicate noise. These values are on a ratio scale, so that 1.2 indicates 20% excess noise. We set 0.60 to 1.40 as the range of infit and outfit MNSQ values indicating good item fit.
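The difference between the two statistics can be sketched with the usual mean-square definitions: outfit is the unweighted mean of squared standardized residuals, and infit is the sum of squared residuals divided by the sum of model variances. The person measures, responses and item parameters below are hypothetical, the function names are ours, and the sketch takes the parameters as given rather than estimating them as WINSTEPS does.

```python
from math import exp


def rsm_category_probs(b, d, thresholds):
    """Rating scale model probabilities for categories 0..m."""
    log_num = [0.0]
    for f in thresholds:
        log_num.append(log_num[-1] + (b - d - f))
    peak = max(log_num)
    weights = [exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def item_fit(observed, person_measures, d, thresholds):
    """Return (infit MNSQ, outfit MNSQ) for one item.

    observed[n] is person n's response (0..m) and person_measures[n] the
    corresponding latent trait estimate in logits.
    """
    sq_residuals, variances = [], []
    for x, b in zip(observed, person_measures):
        probs = rsm_category_probs(b, d, thresholds)
        expected = sum(k * p for k, p in enumerate(probs))
        var = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        sq_residuals.append((x - expected) ** 2)
        variances.append(var)
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_residuals, variances)) / len(observed)
    # Infit: squared residuals weighted by the model variance (information).
    infit = sum(sq_residuals) / sum(variances)
    return infit, outfit


def within_cutoffs(value, low=0.60, high=1.40):
    """Apply the 0.60-1.40 acceptance range quoted in the text."""
    return low <= value <= high


# Hypothetical person measures and responses to one item (not study data).
measures = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
responses = [0, 1, 1, 2, 2, 3, 4, 0]  # the last response is an off-target surprise
infit, outfit = item_fit(responses, measures, d=0.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
print(round(infit, 2), round(outfit, 2), within_cutoffs(infit), within_cutoffs(outfit))
```

In this constructed example a single unexpected response from a person far from the item inflates the outfit statistic more than the infit statistic, which is the behavior the paragraph above describes.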


A total of 361 patients were approached and asked to participate. Among them, 36 patients (10%) refused to participate, were too sick to complete the battery of questionnaires, or were prevented from participating by a family member. Data collection for 16 other participants (4%) was not completed for other reasons. There were no significant differences between participants and non-participants with regard to age, gender, institution or tumor stage; however, more non-participants were in-patients (chi-square = 13.5, p = .001) and had higher ECOG PSR ratings (chi-square = 7.0, p = .03). The study analysis was based on the remaining 309 patients. Detailed demographic and clinical information about these patients can be found in Table 1. Several features deserve emphasis: this is a sample of severely ill patients, 78% of whom had regional or metastatic disease; only 13.9% of patients were fully employed; 27% had less than six years of formal education; and income was very unequally distributed. Monthly income spanned a very wide range, with an interquartile range of 192 to 571 dollars/month, reflecting the low incomes and unequal distribution that prevail in South American countries. As many as 41.7% of patients in the study sample reported having no religion, a typical characteristic of Uruguayan culture that is quite dissimilar to other South American countries.

Table 1 Socio-demographic and clinical features of the sample (n = 309)

Mean time since cancer diagnosis was 29 months (SD = 39.5 months; range: 2 weeks to 24.5 years). As for treatment characteristics, 172 patients had undergone surgery (55.7%); 204 received only one treatment modality, of whom 95 were treated with chemotherapy (ChT) (30.7%), 98 with radiotherapy (RT) (31.7%) and 11 with hormone therapy (HT) (3.6%), while 100 patients received a combination of ChT and RT, 4 (1.3%) were treated with RT and HT, and 1 (0.3%) with ChT and HT. The timing of testing in relation to treatment varied across the sample. In 119 cases (38.5%), patients were interviewed during the week following their last treatment (ChT cycle or RT session); in 64 cases (20.7%) between one week and one month had elapsed since the last treatment; in 44 cases (14.2%) between 3 months and a year; and 82 patients (26.5%) had been off treatment for more than one year.

The mode of administration of the FACT-G (self-administration vs. read in interview) was recorded in 303 cases. Although the FACT-G is designed for self-administration, most patients in our sample (n = 205, 67.7%) requested some help from the interviewer to fill out the questionnaire. Figures 1 to 4 show differences in sociodemographic and somatic features between the two groups of patients. An independent-samples t-test showed that patients capable of self-administration were younger (p < .001), while chi-square statistics showed that this group had a higher education level (p < .001) and better performance status (lower ratings on the ECOG scale) (p < .001). No differences in gender were observed.

Figure 1. Differences in mode of administration with age

Figure 2. Differences in mode of administration with gender

Figure 3. Differences in mode of administration with education

Figure 4. Differences in mode of administration with ECOG-PSR


The reliability of each of the FACT-G scales was evaluated with Cronbach's coefficient alpha. The alpha coefficients for the total scale and the four FACT-G subscales (PWB, SFWB, EWB, FWB) were quite good (range: .78 to .91; see Table 2). Table 2 also shows alpha coefficients of the FACT-G English and Spanish Version 2. No relevant differences were found in Cronbach alpha coefficients for the total FACT-G and its subscales when patients who completed the questionnaire by self-administration (Cronbach's alpha = .78 to .91) and by interview (Cronbach's alpha = .79 to .89) were studied separately.

Table 2 Internal consistency of FACT-G and subscales


Mean score differences among known groups were tested by ANOVA or t-test. Scheffé post-hoc comparisons showed significant differences in FACT-G total and subscale scores according to ECOG PSR (Table 3), indicating that patients with worse functional status scored lower on the multidimensional QOL assessment. Because of their small number, patients rated "3" or "4" on the ECOG PSR were combined with those rated "2". Total FACT-G scores and subscale scores, with the exception of the PWB subscale, also discriminated between groups defined by patient location at the time of filling out the questionnaire (out-patient vs. in-patient); see Table 3.

Table 3 Differences in FACT-G scores according to criteria groups

As further evidence of validity, tumor stage (local vs. regional vs. metastatic disease) was examined as a clinical anchor, and differences in mean scores and effect sizes were calculated (Table 4). Differences of approximately 5 to 10 percentage points in the total FACT-G score (5.6 to 10.5 points on a 108-point scale) were observed between local vs. metastatic and regional vs. metastatic disease. Similarly, differences of 11 to 18 percentage points in the PWB subscale score (3 to 5 points on a 28-point subscale) and of 9 percentage points in the FWB subscale (2.6 points on a 28-point subscale) were observed between the ratings of patients with local or regional disease and those of patients with metastatic disease. The effect sizes of these differences ranged from 0.30 to 0.60.
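The conversion from raw points to percentage of the scale range, which allows the observed differences to be compared with the 5 to 10 points per 100 criterion, is simple arithmetic. The sketch below reproduces it for the figures quoted in this paragraph; the helper name `percent_of_range` is ours.

```python
def percent_of_range(points, scale_max, scale_min=0):
    """Express a raw score difference as a percentage of the scale range."""
    return 100.0 * points / (scale_max - scale_min)


# Differences reported in the text, with the scale maxima they refer to.
reported = [
    ("FACT-G total, lower bound", 5.6, 108),
    ("FACT-G total, upper bound", 10.5, 108),
    ("PWB, lower bound", 3.0, 28),
    ("PWB, upper bound", 5.0, 28),
    ("FWB", 2.6, 28),
]
for label, points, scale_max in reported:
    print(f"{label}: {percent_of_range(points, scale_max):.1f}%")
```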

Table 4 Differences in FACT-G and subscale scores with tumor stage and effect size

Convergent and discriminant validity were evaluated using data from the whole set of patients, except in the case of the POMS-SF questionnaire, for which only patients who had completed the sixth grade of primary school or higher were included, in accordance with the POMS manual's instructions. As expected, Pearson correlation coefficients between the FACT-G total score and its subscale scores were high, ranging from r = .53 to r = .80 (Table 5). Similarly, moderate but statistically significant correlations (r = .30 and higher) were observed with the QL-I total score in both doctor and patient versions, as well as between QL-I items and the corresponding FACT-G subscales (QL-Id and QL-Ip Activity, Daily Living and Health with FWB; QL-Id and QL-Ip Support with SFWB) (Table 5). The correlation between the FACT-G and the POMS-SF Total Mood Disturbance score (POMS TMD) was also relatively high. As expected, the correlation coefficients between the FACT-G total and subscale scores and the MCSDS-10 were low.

Table 5 Correlation between FACT-G, QLI doctor and patient version, POMS-SF, MCSDS-10

The results of the Rasch analyses confirmed the unidimensionality of each subscale of the FACT-G, with two exceptions (see Table 6). The items in the table are listed in order of item difficulty. GS7 (I am satisfied with my sex life) of the SFWB subscale had both MNSQ fit statistics greater than 1.40, indicating that it did not perform well with the other items in the same scale. GE2 (I am satisfied with how I'm coping with my illness) of the EWB subscale also had large MNSQ fit statistics, indicating poor fit to a unidimensional model with the other items on the same scale.

Table 6 IRT based analysis of FACT-G Spanish Version 4.


Since most instruments designed to evaluate quality of life have been developed in the United States or in Western Europe, it is necessary to adapt them to be used in other cultural settings. Thus, it is important to produce a culturally equivalent measure that can be used to accurately evaluate different groups of people. The final step in the complex process of cross-cultural adaptation is to validate the instrument through the study of the psychometric properties of the measure.

A validation study of the FACT-G Spanish Version 4 was conducted in a sample of Uruguayan cancer patients. A large variability in the biological and sociodemographic features of the sample was ensured to study the general performance of the questionnaire. The frequency of tumor sites represents very closely the incidence of solid tumors in the Uruguayan general population [24].

Reliability analysis showed high internal consistency, as indicated by Cronbach alpha coefficients ranging from .78 to .91. Comparison of these data with the results obtained from the FACT-G English and Spanish Version 2 (Table 2) indicates a marked improvement in the reliability coefficients of the FACT-G Spanish Version 4 total scale and subscales. Based on these results, it is safe to conclude that the FACT-G Spanish Version 4 shows sufficient internal consistency to justify its use across cultures.

As evidence of construct validity, the FACT-G questionnaire was capable of discriminating among groups of patients according to their performance status, showing differences in the total scale and subscale scores across the criterion groups. The FACT-G total and the Functional Well-being subscale showed the best discriminative ability. Differences among known groups were also observed in the FACT-G total and subscale scores between in-patients and out-patients. As expected, FACT-G scores were higher (better QOL) among out-patients, and differences were statistically significant on the Functional, Social and Emotional Well-being subscales.

Quality of life researchers have been concerned about the clinical significance of measures and have sought practical and comprehensive criteria that clinicians can interpret when research results are conveyed. Clinical status and disease characteristics were considered as clinical anchors. In our study, differences ranging from 5 to 10 points in the overall scale and of approximately 3 points in the physical and functional subscales can be considered clinically relevant, since they discriminate between patients with locoregional and metastatic disease, a clear-cut criterion commonly used by oncologists. Accordingly, effect size calculations showed moderate values ranging from 0.30 to 0.60. These findings are consistent with those of a longitudinal study using the FACT-G to assess treatment outcomes in a sample of advanced lung cancer patients [15].

In CTT, the most common form of construct validation of HRQL measures has been the study of convergent and discriminant validity [25]. We included it in our study, along with the more recent IRT approach, because it provides relevant information on the relationship of the questionnaire with other measures of QOL and related constructs, i.e., performance status or psychological distress, once a priori hypotheses have been made about the magnitude and direction of the correlations. As evidence of convergent validity, moderate but significant correlations were found between the FACT-G and a set of instruments (ECOG PSR, QL-I and POMS-SF) that are expected a priori to be related to QOL assessments, while no correlation was found between the FACT-G and the MCSDS-10, supporting discriminant validity.

An important issue to be considered is the technical equivalence of the FACT-G when used in a sample of cancer patients from a South American country [26]. As mentioned earlier, most patients in the Uruguayan sample preferred to have the questionnaire read aloud by an interviewer rather than filling it out themselves. This is not a common finding in studies of patients from the United States. It raises the question of whether the two modes of administration yield comparable data in each culture. In our study, the internal consistency of the FACT-G, as measured by Cronbach alpha coefficients for the total and subscale scores, did not differ between the two groups of patients when they were analyzed separately. In a study of the impact of socio-cultural and clinical factors on the quality of life of Hispanic and African American cancer patients, Wan et al. [27] also found no significant effect of the mode of administration of the FACT-G on the reporting of overall QOL. Recently, audio-visual computer-based assessments of QOL have provided an innovative way of gathering and using self-report data and may be feasible for individuals with limited literacy skills (Hahn et al., unpublished data).

Based on the CTT approach, we may conclude that the Spanish FACT-G Version 4 is a psychometrically sound instrument for assessing QOL in the population studied. However, in the present study we went on to introduce IRT-based analyses of the data, considering the significant advantages shown by this method when used to evaluate health outcome measures. Despite the long history of CTT, there remain major limitations in some areas, which have been summarized by Hambleton [28]. First, the CTT-based statistics that describe test performance are sample dependent. IRT is more useful since it provides more robust item statistics that are independent and invariant over sample populations that vary in the trait measured by the test. Another limitation of CTT is that the scores commonly used as a measure of the examinees' ability are test dependent. Potential advantages of using IRT in health outcome assessments are: more comprehensive and accurate evaluation of item characteristics, assessment of group differences in item and scale functioning, evaluation of scales containing items with different response formats, improvement of existing measures, computer adaptive testing (CAT) applications, and evaluation of person fit [29]. Thus, IRT can facilitate the development of new items and scales to improve existing measures. It may draw attention to redundant items or to locations along the trait continuum (in our case, quality of life) where the scale provides little information and needs to be improved.

An IRT analysis was included in the evaluation of the FACT-G Spanish Version 4 because it provides information on the reliability of the scale beyond that provided by the CTT approach. A rating scale model (RSM) [20] was used, which assumes that the logit-transformed measures of the item scores within each subscale vary along the latent trait (quality of life) and are ordered according to how difficult it was for patients to endorse each item, with negative values representing items that are easier to endorse and positive values those that are more difficult. In the present study, item fit statistics confirmed the unidimensionality of each subscale, with two exceptions. In the case of GS7 (I am satisfied with my sex life), many patients were reluctant to give information about their sexual life, and this may be a cause of inconsistency in their responses. Another reason for the discrepancy may be related to a problem in translation or comprehension. In both items (GS7 and GE2) the word "satisfaction" was translated into Spanish as "satisfacción", which in Spanish implies a degree of fulfillment that patients may not be inclined to express when answering such a question. Another possible explanation is that these items are the only ones phrased positively in their respective subscales, while the rest of the items refer to conditions that are negative for quality of life.

Other IRT models could have been used for the analysis of the data. For instance, the graded response model (GRM), an extension of the two-parameter logistic model [30], is also appropriate when item responses can be characterized as ordered categorical responses. However, scores obtained using several models have been highly correlated, showing that these approaches yield comparable results [31].

IRT analyses demand large sample sizes in order to obtain stable and invariant item and latent trait estimates. However, several studies using this procedure to assess the psychometric properties of QOL measures have relied on rather small samples of patients (range: 100 to 400 patients) [32-37].

Cella and Chang [7] warned of the possible limitations of using IRT methods in the evaluation of health measures, since they were originally developed for and used with a fairly homogeneous educational assessment population. When these methods are applied to more heterogeneous clinical populations, there may be limitations in obtaining item-free estimates of sample latent traits. They remark that the context, selection and sequence of questions, considering both item diversity and clinical diversity, may produce sample-dependent item difficulty estimates and therefore unreliable item-dependent estimates of patient ability. The continuous monitoring of item calibrations involved in the process of item banking will help to resolve these uncertainties.

The present study is the first in South America to report results on item functioning in a health-related quality of life measure. Future studies with larger samples of patients could lead to a better understanding of differences in item functioning across South American countries and cultures, and to progress toward item banking and CAT technology suitable for developing countries.


We conclude that the FACT-G Spanish Version 4 showed good reliability and validity under both classical psychometric and IRT approaches, and that it is a valid instrument for setting clinically significant differences in longitudinal studies of cancer treatment. Thus, the FACT-G Spanish language version, as reported here, provides sufficient assurance of equivalence to its original English version to be used in future research on quality of life among South American Spanish-speaking patients.