Measurement invariance of the Functional Assessment of Cancer Therapy—Colorectal quality-of-life instrument among modes of administration

Objectives To test for the measurement invariance of the Functional Assessment of Cancer Therapy—Colorectal (FACT-C) in patients with colorectal neoplasms between two modes of administration (self- and interviewer administrations). It is important to establish the measurement invariance of the FACT-C across different modes of administration to ascertain whether it is valid to pool FACT-C data collected by different modes or to assess each group separately. Methods A cross-sectional sample of 391 Chinese patients with colorectal neoplasms was recruited from specialist outpatient clinics between September 2009 and July 2010. Confirmatory factor analysis (CFA) was used to test the original five-factor model of the FACT-C on data collected by self- and interviewer administrations in single-group analysis. Multiple-group CFA was then used to compare the factor structure between the two modes of administration using chi-square tests and other goodness-of-fit statistics. Results The hypothesized five-factor model of FACT-C demonstrated good fit in each group. Configural invariance and metric invariance were fully supported in multiple-group CFA. Some item intercepts and their corresponding error variances were not identical between administration groups, suggesting evidence of partial strict factorial invariance. Conclusions Our results confirmed that the five-factor structure of FACT-C was invariant in Chinese patients using both self- and interviewer administrations. It is appropriate to pool or compare data in the emotional well-being and colorectal cancer subscale scores collected by both administrations. Measurement invariance in three items, one from each of the other subscales, may be contaminated by response bias between modes of administration.


Introduction
Health-related quality of life (HRQOL) is commonly used in clinical trials as an outcome measure in the evaluation of medical interventions. Most HRQOL measures are completed by self-administration using pencil and paper or by interviewer administration using face-to-face or telephone-based approaches. Theoretically, interviewer administration has the advantage of feasibility (e.g., higher response rate), especially among subjects who have low literacy, manual dexterity problems or visual impairments. Trained interviewers can help patients to understand ambiguous and complex item concepts with the support of standardized clarifications and explanations. However, completing HRQOL measures by self-administration eliminates interviewer bias and may save considerable time and costs [1].
Measurement invariance across groups can be assumed when the relationships between observed variables and latent variables are the same across samples. However, unless measurement invariance across subgroups has been demonstrated for an instrument, group comparisons of HRQOL scores collected by different modes of administration may be problematic [2][3][4]. Differences in the observed HRQOL scores may reflect confounders rather than real differences in the latent variables, and their interpretation may be biased, flawed or misleading. Therefore, it is important to establish evidence for invariance of measurement scores across groups to support fair and meaningful group comparisons of HRQOL outcome data [2]. Much research has raised concerns about measurement invariance of HRQOL instruments across socio-demographic subgroups, such as gender [5][6][7], age [7][8][9][10] and ethnicity [11][12][13][14][15][16]. An effect of administration mode on HRQOL scores has also been found in a number of cancer clinical trial studies using generic [17] or cancer-specific instruments [1,11,18,19]. When differences in HRQOL scores are detected, the potential violation of measurement invariance across modes of administration needs to be considered. For example, a patient is hypothesized to express the same underlying level of HRQOL through self- and interviewer administrations. Although both administration modes present exactly the same item and response wording, self-administration allows patients to reconsider and change their answers upon completion of the instrument, whereas interviewer administration may make it harder for patients to alter their answers to previous items. In addition, patients may tend to give more socially desirable responses in interviewer administration than in self-administration [20].
Therefore, the observed variables given by the item responses are not only related to the latent variables but are also influenced by the mode of administration.
The Functional Assessment of Cancer Therapy-General (FACT-G) is a commonly used HRQOL instrument in oncology. It was demonstrated to have a four-factor solution, corresponding to the original subscales, in the US population [1,21], and later among Latin-American (e.g., Uruguay [22] and Colombia [23]) and UK [24] patients. The replication of the findings across countries provides strong evidence regarding the dimensionality of the instrument. Some of these studies were conducted using exploratory factor analysis, and the data-driven factor structure was found to differ from the hypothesized factor structure [23,24]. Given the discrepancies between the data-driven and hypothesized factor structures, confirmatory factor analysis (CFA) was used to examine how well the data fitted the hypothesized relationships of the instrument [1,23]. The FACT-Colorectal (FACT-C), a colorectal cancer-specific HRQOL measure, is an extended version of the FACT-G. It comprises five subscales (the four FACT-G subscales and an additional concerns subscale, the "Colorectal Cancer Subscale") [25]. The FACT instruments were principally designed for self-administration, but they can also be administered by interview [26]. Measurement invariance of the FACT-G between self- and interviewer administrations in patients with a high literacy level has been illustrated [1]. However, the measurement invariance of the FACT has not been demonstrated between modes of administration in Chinese patients with a wider range of literacy competence.
The aims of the study were twofold. Firstly, we aimed to validate the conceptual measurement model of the FACT-C among Chinese patients with colorectal neoplasms. Secondly, we aimed to examine the measurement invariance of FACT-C scores collected using self- and interviewer administrations. Measurement invariance was assessed using sequential multiple-group CFA, a modern quantitative approach [27]. The level of invariance of the FACT-C factor structure and factor loadings, and the equality of the item-level statistics across the two mode-of-administration groups, were assessed. Few data are available on the factor structure and measurement model of the five-factor solution for the FACT-C tested using CFA. Furthermore, no studies have examined the factor structure of the FACT instruments in the Chinese population. To enable the combined analysis of FACT-C data collected by two different modes, measurement invariance should be established between them.

Method
Participants and data collection A total of 647 patients were recruited from outpatient specialist colorectal clinics of a regional hospital in Hong Kong, China, between September 2009 and July 2010. All patients were adults with a known diagnosis of colorectal neoplasms (colorectal polyp/cancer) for at least 6 months. In this study, colorectal polyp and colorectal cancer were defined as non-malignant and malignant tumors, respectively, of the colon or the rectum or both. Patients completed the traditional Chinese version of the FACT-C and reported demographic variables such as age, education level, marital status, working status, smoking and drinking status, income and disease status. Clinical variables and other HRQOL measures were also collected and have been reported in previous papers [28][29][30][31]. In total, 57 subjects refused to participate in this study, and 41 subjects were excluded because they had a life expectancy of less than 6 months, were unable to understand and communicate in Chinese/Cantonese, displayed evidence of cognitive impairment or were too ill to participate in an interview. Subjects were allowed to choose their preferred instrument administration mode unless they could not complete the questionnaire by themselves. According to the FACT administration guideline [26], face-to-face and telephone administrations are both supported provided that adequate training is given to each interviewer. At least two training sessions were given to each interviewer, who was instructed to go through each item of the questionnaire from beginning to end and to standardize how each item and its response options were read out during the interviews. Previous studies provided support for the use of FACT instruments by interviewer administration without partitioning it into face-to-face and telephone interviews [1,18].
Therefore, the 48 subjects interviewed face-to-face and the 108 subjects interviewed by telephone were collapsed into one interviewer administration group in the current study to obtain a sufficient sample size. The aim of the study was explained to 549 eligible subjects (self-administration: 340; interviewer administration: 209), and written consent was obtained. Thirteen subjects withdrew from the study in the early part of the survey, and 145 subjects did not complete the FACT-C component of the survey. The remaining 391 subjects (self-administration: 235; interviewer administration: 156) with complete FACT-C data were included in the data analyses. Ethics approval was obtained from the Institutional Review Board of the University of Hong Kong and Hospital Authority, and the trial was registered with the HK Clinical Trial Register.
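The participant flow described above can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the counts reported in this section; the variable names are illustrative and are not part of the original study.

```python
# Participant flow, using only the counts reported in the text.
approached = 647
refused = 57
excluded = 41
eligible = approached - refused - excluded

# Eligible subjects by chosen mode of administration.
eligible_by_mode = {"self": 340, "interviewer": 209}

withdrew = 13
incomplete_fact_c = 145
analysed = eligible - withdrew - incomplete_fact_c

# Subjects with complete FACT-C data, by mode.
analysed_by_mode = {"self": 235, "interviewer": 156}

# The interviewer group pools two interview formats.
interviewer_split = {"face_to_face": 48, "telephone": 108}

print("eligible:", eligible, "analysed:", analysed)
```

Each subtotal agrees with the figures given in the text: the mode-specific counts sum to the eligible and analysed totals, and the face-to-face and telephone interviews sum to the interviewer administration group.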
The FACT-C The FACT-C, the colorectal-specific module of the FACT measurement system [26], measures self-reported HRQOL during the past 7 days and has been extensively validated in English- [25], Spanish- [25], Korean- [32], French- [33] and Cantonese-speaking Chinese patients [28]. The FACT-C aggregates 36 items into five dimensions: Physical Well-Being (PWB, 7 items), Functional Well-Being (FWB, 7 items), Social/Family Well-Being (SWB, 7 items), Emotional Well-Being (EWB, 6 items) and the additional concerns Colorectal Cancer Subscale (CCS, 9 items). Each item is rated on a five-point Likert scale (not at all, a little bit, somewhat, quite a bit, very much). The FACT-C has been shown to have an acceptable degree of validity and reliability across a number of populations [25,28,32,33] using classical quantitative approaches.
The FACT-C item related to sexual satisfaction (GS7, "I am satisfied with my sex life") was omitted from analysis because conservative sexual attitudes in Chinese society resulted in a low overall response rate (38.8 %). Two items relating to ostomy appliances (C8, "I am embarrassed by my ostomy appliance", and C9, "Caring for my ostomy appliance is difficult") were not applicable to the 251 colorectal cancer subjects who were not living with a stoma, so they were omitted from the analyses.
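The effect of these omissions on the analysed item set can be tallied as follows. This is a minimal Python sketch; the assignment of GS7 to the SWB subscale and of C8 and C9 to the CCS is inferred from the item counts reported elsewhere in the paper (six SWB and seven CCS items in the Results), and the variable names are illustrative.

```python
# Items per subscale in the full FACT-C, as listed in the text.
full_items = {"PWB": 7, "FWB": 7, "SWB": 7, "EWB": 6, "CCS": 9}

# Items omitted from analysis (subscale membership is inferred, see lead-in).
omitted = {"SWB": ["GS7"], "CCS": ["C8", "C9"]}

# Number of items per subscale that enter the factor analyses.
analysed_items = {
    scale: n - len(omitted.get(scale, []))
    for scale, n in full_items.items()
}
total_analysed = sum(analysed_items.values())

print(analysed_items, total_analysed)
```

The total of 33 analysed items matches the number of items referred to in the single-group analyses.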

Data analysis
The socio-demographic and clinical characteristics of subjects who did and did not complete the FACT-C instrument were described. Independent t tests and chi-square tests were conducted to assess the differences between self-administered and interviewer-administered subjects. Descriptive analyses were carried out using SPSS 18.0 for Windows (SPSS Inc., Chicago, IL, USA).

Model estimation
Factor analyses were carried out using the LISREL 8.80 program (Scientific Software International, Inc., Lincolnwood, IL, USA). The CFA models for ordinal data were fitted using a polychoric correlation matrix to confirm the hypothesized factor structure for the FACT-C originally proposed by Ward et al. [25]. The diagonally weighted least-squares method, an estimator that can be used for ordinal data, was employed for parameter estimation. Missing data were excluded from mean comparisons and factor analysis.

Measurement invariance testing
The hypothesized five-factor model was first tested separately on the self- and interviewer-administered data, and a combined model was also assessed. The purpose of evaluating the factor structure of the FACT-C in single-group analysis was to investigate whether the measurement model had five factors and whether the 33 items loaded on the same factors under each mode of administration. Multiple-group CFA was then conducted to examine the extent of measurement invariance of the FACT-C factor structure across the mode-of-administration groups, which was evaluated in four steps [2,4,8,34,35]. Firstly, configural invariance (which tests the equality of factor structures and model specification across groups) was used to assess whether the hypothesized five-factor model is the same across groups. If there is no configural invariance between the models, it is unnecessary to perform subsequent analyses of measurement invariance. Secondly, metric invariance (which tests equality of factor loadings across groups) was examined by constraining the factor loadings to be equal across groups. Thirdly, scalar invariance (which tests equality of item intercepts across groups) was examined. Scalar invariance is satisfied if the model still fits when the item intercepts and factor loadings are constrained to be identical across groups. Finally, strict factorial invariance (which tests the equality of item residuals across groups) was examined. Strict factorial invariance is achieved only if the model still fits when the factor loadings, item intercepts and item residuals are constrained to be equal across groups simultaneously, in addition to configural invariance [2]. The analytic procedures applied a "step-up" strategy that began with the unconstrained model and proceeded through consecutively more constrained models [35].
At each level of invariance testing, partial measurement invariance was assessed using the conventional Cheung and Rensvold [36] approach to determine whether the removal of cross-group constraints would improve the model fit substantially after re-specification of models. This approach was initially developed for testing partial metric invariance (partial equality of factor loadings across groups), but it can be applied to assess partial scalar invariance (partial equality of item intercepts across groups) and partial strict factorial invariance (partial equality of item residuals across groups) [4].
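The nested sequence of constraints described above can be summarized in standard multiple-group CFA notation. The symbols below are an assumption introduced here for illustration and do not appear in the original text; with ordinal items analysed through polychoric correlations, the linear equation is best read as applying to the continuous latent responses assumed to underlie each item.

```latex
% Measurement model for the item vector x in group g
% (g = 1: self-administration, g = 2: interviewer administration).
% tau: item intercepts, Lambda: factor loadings, xi: latent factors,
% delta: item residuals with covariance matrix Theta.
\begin{align*}
x^{(g)} &= \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)},
  \qquad \operatorname{Cov}\bigl(\delta^{(g)}\bigr) = \Theta^{(g)} \\[4pt]
\text{Configural:}\quad
  & \Lambda^{(1)} \text{ and } \Lambda^{(2)} \text{ have the same pattern of free and fixed entries} \\
\text{Metric:}\quad
  & \Lambda^{(1)} = \Lambda^{(2)} \\
\text{Scalar:}\quad
  & \Lambda^{(1)} = \Lambda^{(2)}, \quad \tau^{(1)} = \tau^{(2)} \\
\text{Strict:}\quad
  & \Lambda^{(1)} = \Lambda^{(2)}, \quad \tau^{(1)} = \tau^{(2)}, \quad \Theta^{(1)} = \Theta^{(2)}
\end{align*}
```

Under this reading, partial invariance at a given step means that the corresponding equality holds for most items while the parameters of a small number of items are left free to differ between the two groups.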

Goodness-of-fit statistics
The model goodness-of-fit statistics were primarily assessed using the root mean square error of approximation (RMSEA) [37], comparative fit index (CFI) [38] and Tucker-Lewis index (TLI). Besides these absolute and incremental fit measures, the Satorra-Bentler scaled chi-square statistic (SB χ²) [39] was estimated using the diagonally weighted least-squares method, with degrees of freedom (df) reported to reflect the model fit. CFA models were considered to have acceptable fit if RMSEA values and their 90 % confidence intervals were close to 0.08 or below and CFI and TLI values were close to 0.95 or greater [40]. For multiple-group comparisons, the Satorra-Bentler scaled chi-square difference (ΔSB χ²) test and the change in CFI (ΔCFI) [41] were used to compare the fit of the more constrained model with that of the less constrained model. A P value of less than 0.05 was considered statistically significant for the ΔSB χ² test, implying that the null hypothesis of invariance (i.e., the constrained model) should be rejected. A ΔCFI greater than -0.01 was taken to indicate invariance [41].
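The decision rules in this section can be written down as two small checks. The following Python sketch is illustrative only: the class and function names are hypothetical, and the hard cut-offs simplify the "close to" wording used in the text. The example values are those reported in the Results; the P value of 0.0005 is a stand-in for "P < 0.001".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FitIndices:
    rmsea: float
    cfi: float
    tli: Optional[float] = None  # None when the TLI is not reported


def acceptable_fit(fit: FitIndices, rmsea_max: float = 0.08,
                   incremental_min: float = 0.95) -> bool:
    """Screen one CFA model against the cut-offs quoted in the text."""
    checks = [fit.rmsea <= rmsea_max, fit.cfi >= incremental_min]
    if fit.tli is not None:
        checks.append(fit.tli >= incremental_min)
    return all(checks)


def invariance_checks(p_value: float, delta_cfi: float,
                      alpha: float = 0.05, cfi_floor: float = -0.01) -> dict:
    """Apply both nested-model criteria; True means invariance is retained."""
    return {
        "chi_square_difference": p_value >= alpha,
        "delta_cfi": delta_cfi > cfi_floor,
    }


# Model 6 as reported in the Results (the TLI is not given in the text).
model6 = FitIndices(rmsea=0.0649, cfi=0.962)
print(acceptable_fit(model6))

# Metric step (Model 4 vs. Model 3): P = 0.225, delta CFI = 0.000.
print(invariance_checks(0.225, 0.000))

# Full scalar step (Model 5 vs. Model 4): P < 0.001 (0.0005 is a stand-in),
# delta CFI = -0.005.
print(invariance_checks(0.0005, -0.005))
```

In the full scalar step the two criteria disagree: the chi-square difference test rejects invariance, while the change in CFI stays within the -0.01 bound. The Results treat the significant chi-square difference as the reason for moving to a partial scalar invariance model.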

Results
Sample characteristics Table 1 shows the socio-demographic and clinical characteristics of patients overall, by administration mode and by completion of the FACT-C instrument. The mean age of interviewer-administered patients was significantly higher than that of the self-administered patients (66.0 ± 12.0 vs. 61.2 ± 10.8, P < 0.001). Self-administered patients were more likely than interviewer-administered patients to be younger, to have received at least primary school education (P < 0.001), to be working (P = 0.001) and to have a higher income (P < 0.001). Colorectal cancer (CRC) patients were more likely to be interviewer-administered than self-administered (P = 0.041). Among CRC patients, being on palliative treatment (P = 0.007) and having a stoma (P = 0.045) were associated with interviewer administration.
FACT-C descriptive statistics Table 2 shows the mean, standard deviation, and floor and ceiling effects of items among all patients, and separately for the self-administration and interviewer administration groups. Overall, three out of seven CCS items, all PWB items and five out of six EWB items had a floor effect, defined as a floor percentage > 30 %. Ceiling effects (> 30 %) were observed in four out of six SWB items. Table 3 shows the mean differences in FACT-C subscale scores with and without adjustments for socio-demographic and clinical characteristics. Statistically significant differences between self- and interviewer administrations were found in PWB, EWB and CCS subscale scores (P < 0.001 for each).
Adjusted results for those subscale scores were also significantly different between modes of administration (PWB, P < 0.001; EWB, P < 0.001; CCS, P = 0.002). Both the unadjusted and adjusted differences in those subscale scores were negative, indicating that interviewer administration gave higher estimated HRQOL than self-administration.
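The floor and ceiling definitions used above amount to the percentage of respondents choosing the lowest or highest response category of an item. The sketch below shows one way to compute them in Python; the 0 to 4 response coding, the function name and the example responses are assumptions made for illustration, not data from the study.

```python
from collections import Counter


def floor_ceiling(responses, lowest=0, highest=4, threshold=30.0):
    """Percentage of answers at the lowest and highest response category."""
    answered = [r for r in responses if r is not None]  # drop missing answers
    if not answered:
        raise ValueError("no valid responses for this item")
    counts = Counter(answered)
    floor_pct = 100.0 * counts[lowest] / len(answered)
    ceiling_pct = 100.0 * counts[highest] / len(answered)
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }


# Hypothetical responses to one item (None marks a missing answer).
example_item = [0, 0, 0, 1, 2, 4, None, 0, 3, 1]
result = floor_ceiling(example_item)
print(result)
```

In this hypothetical item, four of the nine valid answers fall in the lowest category, so the item would be flagged for a floor effect but not for a ceiling effect. Whether the lowest category represents poor or good HRQOL depends on how the item is worded and scored.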
Factor structure Table 4 presents the goodness-of-fit indices of two CFA models in single-group analysis, overall and separately on data collected by the two modes of administration. For negatively worded items, the responses were reversed so that all factor loadings were positive. In single-group analyses of self- and interviewer-administered data, factor loadings of all items except C5 ("I have diarrhea (diarrhoea)") exceeded 0.4, achieving substantial interpretability of the underlying factor structure. Based on the conventional guidelines [40], the original five-factor CFA model
Multiple-group CFA Table 5 shows the results of the single-group and multiple-group CFA for testing invariance between modes of administration. Multiple-group CFA started with the assessment of configural invariance with the five-factor model. Model 3 precluded the equality constraints on
The following model (Model 4) tested metric invariance by imposing additional equality constraints on the factor loadings. The fit of Model 4 was not significantly worse than that of Model 3 (ΔSB χ² = 33.29, Δdf = 28, P value = 0.225, ΔCFI = 0.000), which supported full metric invariance. In other words, the factor loadings were statistically equivalent in both administration groups.
Model 5 tested full scalar invariance by imposing equality constraints on the intercepts. Full scalar invariance across groups was not supported, as indicated by a statistically significant misfit (ΔSB χ² = 127.04, Δdf = 28, P value < 0.001, ΔCFI = -0.005). Upon rejection of full scalar invariance, separate CFA models were tested that allowed the intercepts of specific items to be freely estimated in order to test for partial scalar invariance. Model 6 showed the improvement in model fit when the intercepts of items GP3 ("Because of my physical condition, I have trouble meeting the needs of my family"), GS3 ("I get support from my friends") and GF1 ("I am able to work (include work at home)") were allowed to vary. Model 6 showed good fit (RMSEA = 0.0649, CFI = 0.962) and improved on the full scalar invariance model (Model 5). Furthermore, compared with the full metric invariance model (Model 4), Model 6 did not fit significantly worse (ΔSB χ² = 24.83, Δdf = 25, P value = 0.472, ΔCFI = 0.000), suggesting partial scalar invariance (i.e., intercepts of all items except GP3, GS3 and GF1 were equivalent across groups) between the self-administration and interviewer administration groups.
The final step in our invariance analyses tested partial strict factorial invariance by additionally restricting the error variances of all items except GP3, GS3 and GF1 to be equal between groups. Model 7 did not fit significantly worse than the partial scalar invariance model (Model 6) (ΔSB χ² = 22.35, Δdf = 30, P value = 0.841, ΔCFI = 0.002). As a whole, our results supported partial strict factorial invariance between administration groups.
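The reported difference tests can be cross-checked with a short calculation. The Python sketch below makes two assumptions that are not stated in the text: that each reported ΔSB χ² is referred to an ordinary chi-square distribution with Δdf degrees of freedom, and that Δdf follows the usual parameter counting (one loading per factor fixed for identification, latent means freed in one group once intercepts are constrained). The function `chi2_sf` is an illustrative helper built on the standard library, not part of LISREL.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        total = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum.
    m = (df - 1) // 2
    total = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


n_items, n_factors, n_freed = 33, 5, 3  # n_freed: GP3, GS3 and GF1

# Degrees of freedom expected under the counting assumptions in the lead-in.
expected_df = {
    "metric (Model 4 vs 3)": n_items - n_factors,
    "full scalar (Model 5 vs 4)": n_items - n_factors,
    "partial scalar (Model 6 vs 4)": n_items - n_factors - n_freed,
    "partial strict (Model 7 vs 6)": n_items - n_freed,
}

# (delta SB chi-square, delta df) as reported in the text.
reported = {
    "metric (Model 4 vs 3)": (33.29, 28),
    "full scalar (Model 5 vs 4)": (127.04, 28),
    "partial scalar (Model 6 vs 4)": (24.83, 25),
    "partial strict (Model 7 vs 6)": (22.35, 30),
}

for name, (stat, df) in reported.items():
    print(name, "df matches:", df == expected_df[name],
          "P =", round(chi2_sf(stat, df), 3))
```

Under these assumptions the recomputed P values should land close to those reported, and a Δdf of 30 in the last comparison corresponds to equality constraints on the error variances of the 30 items other than GP3, GS3 and GF1.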

Discussion
This study examined the factor structure and measurement invariance of the FACT-C between two administration groups. Our results showed significant mean differences in the physical and emotional well-being and colorectal-specific scores of the FACT-C between administration modes; in a previous study, such differences were driven by lower estimated scores under self-administration than under interviewer administration on the physical and emotional subscale scores of the FACT-G [1]. No difference was found in the social and functional subscale scores between administration modes. The differences between administration modes were not significantly reduced by adjustment for socio-demographic and clinical factors. In contrast to previous studies [1,11,19] that examined the relationship between factors and FACT-G subscale scores, this study found mode of administration to be a significant determinant of physical and emotional subscale scores. Interviewer-administered patients did not report significantly lower HRQOL in social and functional subscale scores when compared with self-administered patients, which is in contrast to findings of previous studies [1,11,19].
Our CFA results provided empirical evidence to support the hypothesized five-factor structure and conceptual measurement model of the FACT-C [25]. The model showed a satisfactory fit without the need for any modification of the hypothesized factor structure in our Chinese patients with colorectal neoplasms, irrespective of whether the FACT-C was self-administered or interviewer-administered. In a previous study that examined the FACT-G in a Latin-American sample [23], modification of the original four-factor structure was suggested by factor analysis, with minor modifications to the emotional subscale. Factor analysis conducted in a UK sample [24] found that some items in the social and functional subscales did not load satisfactorily (r < 0.5) on any existing factors, and this indicated potential ambiguity in the measurement of the underlying constructs. In contrast, we found that the factor loadings of all items in the physical, social, emotional and functional subscales were satisfactorily (r ≥ 0.5) correlated with their hypothesized factors within the colorectal-specific FACT-C measure. Of the last seven items concerning colorectal cancer, five loaded on the colorectal-specific factor.
Our multiple-group CFA supported full metric invariance (Model 4), which implied that the factor structure and corresponding factor loadings were equal between the two modes of administration. However, violations of scalar invariance with regard to mode of administration were identified in three different items. The hypothesis of full scalar invariance was not supported because the intercepts of GP3 ("Because of my physical condition, I have trouble meeting the needs of my family"), GS3 ("I get support from my friends") and GF1 ("I am able to work (include work at home)"), from the physical, social and functional subscales, respectively, were not invariant between administration groups. Partial scalar invariance implied that group differences observed in the physical, social and functional subscales could not be interpreted as reflecting real differences in those constructs alone. In other words, the physical, social and functional subscales were not measured identically across administration groups when employing the hypothesized five-factor model of the FACT-C measure. The current findings indicate that mean comparisons of those subscale scores may be affected by the mode of administration, whereas it is appropriate to pool or compare the emotional well-being and colorectal cancer subscale scores collected by the two administration groups. In general, one possible explanation for the non-invariance is the response bias introduced by the differential effects of administration mode [4]. Interviewer administration tends to elicit responses in line with acceptable social norms, the so-called social desirability bias [20]. In a randomized study of administration modes, cancer patients were found to inflate their HRQOL and to give more socially favorable and desirable responses when the instrument was administered by an interviewer [17].
Concerning the measurement non-invariance in three individual items (GP3, GS3 and GF1), patients may understand those items differently across administration modes, or may confuse them with other items that carry a similar concept during interviews. For example, most patients (76.3 %) gave the same answer for items GS3 ("I get support from my friends") and GS1 ("I feel close to my friends") in interviewer administration, but this was less common (69.4 %) in self-administration. Further studies should provide evidence of the measurement invariance of the FACT-C across socio-demographic and disease groups. It is certainly worthwhile to consider the measurement invariance of the FACT-C in relation to other patient characteristics because differences in latent HRQOL may exist across those characteristics. Findings of measurement non-invariance in factor structure, loadings or intercepts call for caution in interpreting group differences between self-administered and interviewer-administered FACT instruments. Removing misfit items is recommended to improve the measurement properties of FACT instruments [1,24]. For instance, Hahn et al. [1] identified misfit items in the PWB and SWB subscales of the FACT-G English version and in the PWB and FWB subscales of the Spanish version. Mean comparisons of the misfit items and corresponding subscales between groups were beyond the scope of this study. Further studies should apply contemporary psychometric methods such as item-response theory and Rasch analysis to investigate the factor structure and psychometric performance of the FACT-C scales.

Limitations
Firstly, our study results were based on a convenience sample of Chinese patients with colorectal neoplasms, so they may not be generalizable to non-Chinese or other Chinese populations. Investigations into measurement invariance in other cultural or ethnic groups of patients with colorectal neoplasms are needed. Secondly, the removal of the sensitive item related to sex and the items related to stoma, which had low response rates, limits the applicability of the results to these items. Further empirical evaluations of data from patients who are more likely to answer the sexual activity item may be useful to overcome this shortcoming. Finally, there were significant differences in socio-demographic characteristics between the self-administration and interviewer administration groups, possibly due to patients' own preference for administration mode in the outpatient setting, which could have introduced measurement variance in addition to that due to the mode of administration. Because patients chose the administration mode in the current study, younger patients and those with lower disease severity preferred self-administration to interviewer administration, except for patients with literacy, dexterity or visual difficulties, for whom only interviewer administration was feasible. To eliminate this bias, patients' characteristics should be balanced across the two administration groups through randomization in a further study.

Conclusion
Our results revealed that the five-factor structure of the FACT-C provided good fit when the HRQOL instrument was self-administered or interviewer-administered, whether the groups were analysed separately or simultaneously. The construct validity of the FACT-C was supported in Chinese patients with colorectal neoplasms. Given the acceptable degree of cross-group measurement invariance, it is valid to compare EWB and CCS scores collected by these two modes of administration. In the other subscales, complete measurement invariance could be contaminated by response bias related to the mode of administration. Pooling or direct comparison of FACT-C data collected by a mixture of self- and interviewer administrations should be done with caution.