Main

Nasopharyngeal carcinoma (NPC), here used as a surrogate for nasopharyngeal cancer, is one of the rarer malignancies globally, with an age-adjusted incidence rate of 1.2 per 100 000 persons per year for both sexes combined (Ferlay et al, 2010). Its incidence shows marked geographical and ethnic variation worldwide (Jemal et al, 2011). Southern China, other parts of South-Eastern Asia, Northern and Southern Africa, and Alaska (limited to the Inuit populations) are high-incidence areas, whereas populations living elsewhere in the United States of America and in Europe have considerably lower NPC incidence rates (Curado et al, 2007; Jemal et al, 2011). Infection with the Epstein–Barr virus (EBV) and genetic susceptibility appear to play a major role in high-incidence populations, but migrant studies suggest that other environmental factors may also be important (Vokes et al, 1997; Chang and Adami, 2006). In low-incidence areas, NPC has been associated with well-known lifestyle risk factors, such as tobacco and alcohol (Polesel et al, 2011; Vokes et al, 1997; International Agency for Research on Cancer, 2004, 2007).

Among other factors, diet has been associated with NPC in both low- and high-incidence areas (Chang and Adami, 2006; Gallicchio et al, 2006; Jia and Qin, 2012). There are suggestions that high intakes of non-preserved fruit and non-starchy vegetables protect against NPC, whereas Cantonese-style salted fish, characterised by the use of less salt and a higher degree of fermentation during the drying process, is a cause of NPC (World Cancer Research Fund/American Institute for Cancer Research, 2007). In addition, a few studies have considered the role of single nutrients in this cancer (Lee et al, 1994; Farrow et al, 1998; Kasum et al, 2002; Polesel et al, 2012), but only two of them were based on validated food-frequency questionnaires (FFQs) (Farrow et al, 1998; Polesel et al, 2012). To our knowledge, only one study has assessed the association between dietary patterns and NPC risk (Armstrong et al, 1998), even though dietary patterns address the well-known issues of collinearity among nutrients and interdependencies between foods and nutrients (Hu, 2002). Therefore, the overall evidence on diet and NPC risk is still weak and mainly limited to high-incidence areas (Farrow et al, 1998; Kasum et al, 2002; Gallicchio et al, 2006), which indicates the need for further research.

In the current paper, we carried out an exploratory principal component factor analysis (PCFA) to identify a posteriori dietary patterns in a multicentre case–control study on cancer of the nasopharynx conducted in a low-incidence country, Italy. Previous analyses of single nutrients and NPC on these data showed positive associations with higher intakes of cholesterol, saturated fatty acids, and phosphorus, and inverse associations with some carotenoids and total fibre (Polesel et al, 2012; Bidoli et al, 2013). Among foods, higher intakes of eggs were significantly associated with an increased NPC risk, whereas total vegetables, in particular raw and yellow/red vegetables, were inversely related to it (Polesel et al, 2013).

The identification of dietary patterns may extend our knowledge of the association between dietary habits and NPC risk in this low-incidence Mediterranean country.

Materials and Methods

A case–control study on NPC was conducted between January 1992 and December 2008 within an established network of centres, including Aviano (Pordenone) and Milan in northern Italy, and Naples and Catania in southern Italy (Polesel et al, 2011, 2012; Bidoli et al, 2013). Cases were 198 Caucasian subjects (157 men, 41 women; median age 52 years, range 18–76 years) admitted to major teaching and general hospitals in the study areas with incident, histologically confirmed NPC, diagnosed no more than 1 year before the interview, and with no history of cancer at other sites. Available cases included 137 (69.2%) undifferentiated NPCs (World Health Organization (WHO) type 3), 23 (11.6%) keratinising squamous cell carcinomas (WHO type 1; here referred to as differentiated NPCs), and 38 (19.2%) not otherwise specified NPCs (Shanmugaratnam, 1991). EBV status was defined by the detection of EBV nuclear antigen in tissue samples and was available for only 61 cases. All 57 undifferentiated NPCs tested and 2 of 4 differentiated NPCs were EBV positive. Controls were 594 Caucasian subjects (471 men, 123 women; median age 52 years, range 19–76 years) admitted to the same hospitals as cases for a wide spectrum of acute, non-neoplastic conditions unrelated to tobacco smoking, alcohol, or long-term modifications of diet. Three controls were frequency matched to each case according to age (±2 years), sex, period of interview (±2 years), and area of residence. Of the controls, 34% were admitted for traumas, 32% for other orthopedic disorders, 22% for acute surgical conditions, and 12% for miscellaneous other illnesses. Fewer than 3% of the cases and controls originally contacted refused to participate. Cases and controls who agreed to participate signed an informed consent form based on recommendations from the individual hospitals' ethics boards.

Centrally trained interviewers administered a structured questionnaire to cases and controls during their hospital stay. The questionnaire included information on socio-demographic characteristics, anthropometric variables, lifestyle factors including tobacco and alcohol, a problem-oriented personal medical history, family history of cancer, and, for women, menstrual and reproductive history.

Information on diet referred to the 2 years before diagnosis for cases (or before hospital admission, for controls) and was based on an FFQ that had been validated for nutrient intake and tested for reproducibility for specific nutrients and food items (Franceschi et al, 1993, 1995; Decarli et al, 1996). Subjects were asked to indicate the quantity and average weekly frequency of consumption for the period under investigation. The FFQ covered 78 foods and food groups, including some of the most common Italian recipes, and various types of alcoholic beverages. Intakes reported at least once a month but less than once a week were coded as 0.5 per week. Dietary supplements were not considered, given their rare use in this population. Additional questions on fat-intake pattern and portion size were used to modulate the composition of recipes.

Italian food composition tables (Gnagnarella et al, 2004) were used to calculate intakes of total energy and various nutrients. When appropriate, losses due to cooking were subtracted in computing vitamin content.

Statistical analysis

Factorability of the original matrix

We conducted the analyses on a selected set of 28 major macro- and micro-nutrients. Nutrients were chosen to provide a comprehensive representation of the Italian diet and to assess their potential role in cancer risk. Moreover, we took into account existing relationships among nutrients to avoid over-representation of single nutrients and the resulting artificially inflated correlations.

We evaluated whether the correlation matrix of the nutrients was factorable through both visual inspection of the matrix and statistical procedures, including Bartlett's test of sphericity and the overall (Kaiser-Meyer-Olkin) and individual measures of sampling adequacy (Pett et al, 2003). Given the reassuring results (see Results section), we applied factor analysis to derive the dietary patterns.
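The factorability checks can be illustrated with a minimal Python sketch (the study itself used R; this is not the authors' code). It assumes NumPy and SciPy, a subjects × nutrients array `X`, and the standard textbook formulas: Bartlett's statistic −(n − 1 − (2p + 5)/6) ln|R| on p(p − 1)/2 degrees of freedom, and overall and per-variable KMO values built from squared correlations and squared anti-image (partial) correlations taken from the inverse of R. The helper names `bartlett_sphericity` and `kmo`, and the toy data, are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test of H0: the correlation matrix of X is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # ln|R|, numerically stable
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return stat, df, chi2.sf(stat, df)        # upper-tail chi-square P-value


def kmo(X):
    """Overall Kaiser-Meyer-Olkin statistic and per-variable MSA values."""
    R = np.corrcoef(X, rowvar=False)
    Q = np.linalg.inv(R)
    d = np.sqrt(np.diag(Q))
    A = -Q / np.outer(d, d)                   # anti-image (partial) correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # ignore the diagonal
    r2 = np.where(off, R ** 2, 0.0)
    a2 = np.where(off, A ** 2, 0.0)
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + a2.sum())
    return overall, msa


# Hypothetical toy data: two blocks of three correlated "nutrients"
rng = np.random.default_rng(0)
f = rng.normal(size=(400, 2))
W = np.array([[0.9, 0.8, 0.7, 0, 0, 0], [0, 0, 0, 0.8, 0.7, 0.6]])
X = f @ W + 0.4 * rng.normal(size=(400, 6))
stat, df, p_value = bartlett_sphericity(X)
overall, msa = kmo(X)
```

An overall KMO value close to 1 indicates that partial correlations are small relative to the zero-order correlations, which is what makes a matrix a good candidate for factoring.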

Identification of dietary patterns

Exploratory PCFA (Johnson and Wichern, 2007) was carried out on the correlation matrix of the original nutrients, pooling cases and controls, to describe the variance–covariance structure among nutrients in terms of a few underlying, unobservable, randomly varying factors, generally known as dietary patterns. We chose the number of factors to retain based on three criteria: factor eigenvalue greater than 1, the scree plot, and factor interpretability (Johnson and Wichern, 2007). We applied a varimax rotation to the factor loading matrix to achieve a simpler loading pattern. Nutrients with rotated factor loadings greater than or equal to 0.63 in absolute value on a given factor were used to name the factors and are referred to as 'dominant nutrients' hereafter. Because the contribution of a factor to a nutrient's sample variance equals the square of that nutrient's loading on the factor, the 0.63 cut-off corresponds to a minimum contribution of approximately 0.40 (0.63² ≈ 0.40; Comrey and Lee, 1992). We calculated factor scores using the weighted least squares method; they indicate the degree to which each subject's diet conforms to each of the identified patterns.
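A compact sketch of the extraction, rotation, and scoring steps follows, in Python with NumPy and under stated assumptions: principal component loadings (eigenvectors of R scaled by the square root of their eigenvalues), the eigenvalue > 1 rule only (the scree plot and interpretability are judgement calls not coded here), a standard SVD-based varimax algorithm with Kaiser row normalisation (similar in spirit to R's `varimax(normalize = TRUE)`, not checked against the authors' settings), the 0.63 cut-off for dominant nutrients, and weighted least squares (Bartlett) scores with uniquenesses 1 − h² as weights. Function names and toy data are hypothetical.

```python
import numpy as np


def varimax(L, max_iter=500, tol=1e-8):
    """Varimax rotation (SVD algorithm) with Kaiser row normalisation."""
    h = np.sqrt(np.sum(L ** 2, axis=1, keepdims=True))
    A = L / h                                   # rows scaled to unit length
    p, k = A.shape
    T = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        B = A @ T
        G = A.T @ (B ** 3 - B * np.sum(B ** 2, axis=0) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt                              # orthogonal rotation matrix
        if s.sum() < crit * (1 + tol):          # criterion stopped improving
            break
        crit = s.sum()
    return (A @ T) * h                          # undo the normalisation


def pc_factor_analysis(X, cutoff=0.63):
    """Principal component factors, eigenvalue > 1 rule, varimax, WLS scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)          # eigh returns ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    m = int(np.sum(eigval > 1.0))               # Kaiser criterion only
    loadings = varimax(eigvec[:, :m] * np.sqrt(eigval[:m]))
    h2 = np.sum(loadings ** 2, axis=1)          # communalities
    dominant = np.abs(loadings) >= cutoff       # 0.63**2 is about 0.40
    # Weighted least squares (Bartlett) scores, weights = 1 / uniquenesses
    psi_inv = np.diag(1.0 / (1.0 - h2))
    weights = np.linalg.solve(loadings.T @ psi_inv @ loadings,
                              loadings.T @ psi_inv)
    scores = Z @ weights.T
    return eigval, loadings, h2, dominant, scores


# Hypothetical toy data: two blocks of three correlated "nutrients"
rng = np.random.default_rng(0)
f = rng.normal(size=(400, 2))
W = np.array([[0.9, 0.8, 0.7, 0, 0, 0], [0, 0, 0, 0.8, 0.7, 0.6]])
X = f @ W + 0.4 * rng.normal(size=(400, 6))
eigval, loadings, h2, dominant, scores = pc_factor_analysis(X)
```

Because the nutrients are standardised, the share of total variance explained by rotated factor j is the sum of its squared loadings divided by the number of nutrients, and the communalities `h2` are unchanged by the orthogonal rotation.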

To examine the robustness of the identified dietary patterns, we performed three checks. First, we performed a principal axis factor analysis and a maximum likelihood factor analysis, after a logarithmic transformation of the original nutrients to improve normality in their joint distribution. Second, we confirmed the number of factors to retain using Velicer's minimum average partial test (Velicer, 1976). Third, we calculated factor scores by applying the multiple regression method and standardising the results (Johnson and Wichern, 2007). Given the robustness of the patterns across these complementary checks, all subsequent analyses used factor scores from the original overall PCFA, calculated with the weighted least squares method.
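A minimal sketch of Velicer's minimum average partial (MAP) test is given below, assuming the original criterion based on squared partial correlations: for m = 0, 1, 2, … the first m principal components are partialled out of R and the average squared off-diagonal partial correlation is recorded; the suggested number of factors is the m with the smallest average. `velicer_map` and the toy data are hypothetical, and the loop stops at m = p − 2 because partialling out p − 1 components leaves a rank-one residual.

```python
import numpy as np


def velicer_map(R):
    """Velicer's MAP test (squared partial correlations, 1976 version)."""
    p = R.shape[0]
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]
    loadings = eigvec[:, order] * np.sqrt(np.clip(eigval[order], 0.0, None))
    off = ~np.eye(p, dtype=bool)
    avg_sq = [np.mean(R[off] ** 2)]             # m = 0: nothing partialled out
    for m in range(1, p - 1):                   # m = p - 1 leaves a rank-1 residual
        A = loadings[:, :m]
        C = R - A @ A.T                         # partial covariance matrix
        d = np.sqrt(np.diag(C))
        P = C / np.outer(d, d)                  # partial correlations
        avg_sq.append(np.mean(P[off] ** 2))
    avg_sq = np.array(avg_sq)
    return int(np.argmin(avg_sq)), avg_sq


# Hypothetical toy data with two underlying factors
rng = np.random.default_rng(0)
f = rng.normal(size=(400, 2))
W = np.array([[0.9, 0.8, 0.7, 0, 0, 0], [0, 0, 0, 0.8, 0.7, 0.6]])
X = f @ W + 0.4 * rng.normal(size=(400, 6))
m_map, avg_sq = velicer_map(np.corrcoef(X, rowvar=False))
```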

To assess reliability and refine the identified factors, we evaluated the internal consistency of the nutrients loading more than 0.40 (in absolute value) on each factor using the standardised Cronbach's coefficient α (Cronbach, 1951; Pett et al, 2003).
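The standardised Cronbach's α depends only on the number of items k and their mean inter-item correlation r̄, through α = k·r̄ / (1 + (k − 1)·r̄). The sketch below, assuming NumPy and a correlation matrix of the items retained for one factor, also recomputes α with each item deleted. Function names are hypothetical, and it assumes items with negative loadings have already been reverse-scored.

```python
import numpy as np


def standardised_alpha(R):
    """Standardised Cronbach's alpha from the items' correlation matrix."""
    k = R.shape[0]
    r_bar = R[~np.eye(k, dtype=bool)].mean()    # mean inter-item correlation
    return k * r_bar / (1 + (k - 1) * r_bar)


def alpha_if_item_deleted(R):
    """Standardised alpha recomputed after dropping each item in turn."""
    k = R.shape[0]
    return np.array([
        standardised_alpha(np.delete(np.delete(R, i, axis=0), i, axis=1))
        for i in range(k)
    ])


# Four hypothetical items with every inter-item correlation equal to 0.5
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)
alpha = standardised_alpha(R)          # 4 * 0.5 / (1 + 3 * 0.5) = 0.8
dropped = alpha_if_item_deleted(R)     # each 3 * 0.5 / (1 + 2 * 0.5) = 0.75
```

With equal inter-item correlations, deleting any item lowers α, which is the pattern the alpha-if-item-deleted check looks for.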

To assess the internal reproducibility of the identified patterns, factor analysis was performed separately in two randomly selected subsets of the original data using the same approach as the main analysis. The procedure was repeated several times.

To improve interpretability of the identified dietary patterns, we calculated the values of the Spearman rank correlation coefficient between the continuous factor scores derived from our factor analysis and the weekly number of portions for 29 selected food groups defined on the same data.
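Spearman's coefficient is Pearson's correlation computed on ranks, with tied values (common for weekly food portions, for example many zeros) given their average rank. The NumPy-only sketch below uses hypothetical names and data; `scipy.stats.spearmanr` computes the same quantity in practice.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    mean_rank = np.bincount(inverse, weights=ranks) / np.bincount(inverse)
    return mean_rank[inverse]


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


# Hypothetical factor scores and weekly portions (tie at zero portions)
factor_scores = np.array([-1.2, -0.3, 0.1, 0.8, 1.5])
weekly_portions = np.array([0, 0, 2, 3, 7])
rho = spearman(factor_scores, weekly_portions)
```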

Risk estimates

For each factor, participants were grouped into three categories according to tertiles of factor scores among the controls.

We estimated the odds ratios (ORs) and the corresponding 95% confidence intervals (CIs) for tertile categories and for a continuous increment in factor scores using unconditional multiple logistic regression models. We fitted separate models for each factor and a composite model including all the factors simultaneously. Tests for linear trend were computed for all these models by scoring the tertiles as 1, 2, and 3. We included in each model the same set of potential confounding and risk factors: age, sex, area of residence, education, year of interview, alcohol drinking, and tobacco smoking. We carried out stratified analyses by age, alcohol drinking, and tobacco smoking.
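The risk-estimation steps can be sketched as follows, assuming NumPy: tertile cut points computed among controls only and applied to all subjects, an unconditional logistic model fitted by Newton–Raphson, ORs as exp(β) with Wald 95% CIs exp(β ± 1.96·SE), and a trend test obtained by entering the tertile score (1, 2, 3) as a single numeric term and taking the Wald P-value of its coefficient (one common choice; the text does not say which test was used). Names and simulated data are hypothetical; in the real analysis the design matrix would also contain the confounders listed above as additional columns.

```python
import math
import numpy as np


def tertile_categories(score, is_control):
    """Categories 1-3 from tertile cut points computed among controls only."""
    cuts = np.quantile(score[is_control], [1 / 3, 2 / 3])
    return np.digitize(score, cuts) + 1         # 1 = lowest, 3 = highest tertile


def logistic_fit(X, y, n_iter=25):
    """Unconditional logistic regression via Newton-Raphson.

    X must already contain an intercept column; returns coefficients and SEs.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (prob * (1 - prob))[:, None])   # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - prob))
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (prob * (1 - prob))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def odds_ratio(beta, se, j):
    """OR and Wald 95% CI for coefficient j."""
    return (math.exp(beta[j]),
            math.exp(beta[j] - 1.96 * se[j]),
            math.exp(beta[j] + 1.96 * se[j]))


def wald_p(beta, se, j):
    """Two-sided Wald P-value, e.g. for the 1-2-3 tertile score (trend test)."""
    z = beta[j] / se[j]
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical example: 12 control scores split into three equal tertiles
cat = tertile_categories(np.arange(12.0), np.ones(12, dtype=bool))

# Simulated data with one binary exposure (true log-OR = 0.7)
rng = np.random.default_rng(1)
n = 4000
x = rng.integers(0, 2, n)
y = (rng.random(n) < 1 / (1 + np.exp(-(-0.5 + 0.7 * x)))).astype(float)
beta, se = logistic_fit(np.column_stack([np.ones(n), x]), y)
or_, lo, hi = odds_ratio(beta, se, 1)
```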

Calculations were performed using the open-source statistical computing environment R (Ihaka and Gentleman, 1996; R Core Team, 2014), with its libraries psych (Revelle, 2014) and GPArotation (Bernaards and Jennrich, 2005).

Results

The correlation matrix of the original nutrients was amenable to factor analysis. Visual inspection revealed that each nutrient had at least 12 correlation coefficients ⩾0.30 in absolute value (data not shown), so the analyses could be performed on the entire set of selected nutrients. Table 1 reports the results of the statistical procedures for checking matrix factorability. Bartlett's test of sphericity rejected the null hypothesis that the correlation matrix is an identity matrix (P-value<0.001). The Kaiser-Meyer-Olkin statistic was about 0.85, suggesting that the sample size was adequate relative to the number of nutrients. In addition, the individual measures of sampling adequacy gave reassuring results, with 21 nutrients having measures ⩾0.80, 5 in the 0.70s, and only 2 with measures between 0.60 and 0.70.

Table 1 Factorability of the correlation matrix of the original nutrients: Bartlett’s test of sphericity and measures of sampling adequacy

Table 2 presents the factor loading matrix for the five retained factors, together with the corresponding communality estimates. The retained factors explained ∼80% of the total variance in the original data set. The greater the loading of a given nutrient on a factor, the higher the contribution of that nutrient to the factor. The first factor, named Animal products, had the greatest loadings on calcium, riboflavin, phosphorus, saturated fatty acids, animal protein, and cholesterol (20.90% of the total variance explained); the second factor, named Starch-rich, was characterised by the greatest loadings on starch, vegetable protein, and sodium (15.85% of the total variance explained); the third factor, named Vitamins and fibre, had the greatest loadings on vitamin C, total fibre, β-carotene equivalents, and total folate (15.12% of the total variance explained); the fourth factor, named Animal unsaturated fatty acids (AUFAs), had the greatest loadings on vitamin D, other polyunsaturated fatty acids, and niacin (15.06% of the total variance explained); the fifth factor, named Vegetable unsaturated fatty acids (VUFAs), had the greatest loadings on linoleic acid, vitamin E, and linolenic acid (12.67% of the total variance explained). Except for lycopene, all the examined nutrients had a loading greater than 0.40 on at least one factor, supporting their inclusion in the original set of nutrients. The percentages of nutrient variance explained by all the retained factors (communality estimates) were generally satisfactory and exceeded 80% for most nutrients.
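As an arithmetic check, the five factor-specific percentages add up to the overall proportion of variance explained. With standardised nutrients, each percentage equals the sum of that factor's squared rotated loadings divided by the number of nutrients (28); the symbol ℓ_ij for the rotated loading of nutrient i on factor j is notation introduced here:

$$20.90 + 15.85 + 15.12 + 15.06 + 12.67 = 79.60\% \approx 80\%$$

$$\mathrm{VAR}_j = \frac{100}{28}\sum_{i=1}^{28} \ell_{ij}^{2}$$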

Table 2 Factor loading matrixa, COMM and explained VAR for the five major dietary patterns identified by factor analysis

Most of the nutrients contributed to high reliability. The standardised coefficient α was greater than 0.90 for each factor, indicating that at least 90% of the variance in the total score of each factor's subscale can be attributed to reliable, systematic variance. Standardised coefficients α with an item deleted were generally lower than the overall standardised coefficient α for the same factor, although the differences were small (Appendix Table A1). The internal reproducibility of the sets of patterns identified in the two split samples was reassuring: factor loading matrices and communalities obtained from the two factor analyses were almost identical (data not shown).

Table 3 shows the values of the Spearman rank correlation coefficient between the continuous factor scores derived from our factor analysis of nutrient intakes and the weekly number of portions of 29 food groups defined on the same data set. For the Animal product pattern, the highest values were with milk, cheese, desserts, eggs, sugar and candies, potatoes, butter and margarine, and red meat; for the Starch-rich pattern, with bread, desserts, red meat, and pasta and rice; for the Vitamins and fibre pattern, with other fruits, citrus fruits, fruiting vegetables, root vegetables, other vegetables, olive oil, leafy vegetables, pulses, cruciferous vegetables, and liver; for the AUFA pattern, with fish, liver, red meat, olive oil, eggs, processed meat, and leafy vegetables; and for the VUFA pattern, with specified seed oils, unspecified seed oils, red meat, leafy vegetables, and butter and margarine.

Table 3 Spearman rank correlation coefficients between continuous factor scores derived from factor analysis on nutrient intakes and weekly number of portions for 29 selected food groups defined on the same data

Table 4 gives the ORs and corresponding CIs for NPC by tertiles of factor scores and by continuous factor scores for the retained dietary patterns. Results refer to the composite model including all five factors simultaneously, together with the relevant confounding and risk variables. Higher intakes of the dominant nutrients for the Animal product dietary pattern were positively associated with NPC risk (OR=2.62, 95% CI=1.67–4.13 for the highest vs the lowest score tertile, P for trend<0.001). A positive association was also observed between the Starch-rich pattern and NPC risk (OR=2.05, 95% CI=1.27–3.33, P for trend=0.022), whereas an inverse but nonsignificant association was evident with the Vitamins and fibre pattern (OR=0.68, 95% CI=0.44–1.05, P for trend=0.223). In addition, there was a borderline significant association between the AUFA pattern and NPC risk (OR=1.55, 95% CI=1.00–2.39, P for trend=0.038). Finally, the VUFA pattern was positively related to NPC risk: the OR for the highest vs the lowest score tertile was 1.90 (95% CI=1.22–2.96, P for trend=0.011). Consistent results were observed in the five models including each factor separately.

Table 4 ORsa of NPC and corresponding 95% CIs according to tertiles of factor scores and continuous factor scores from a principal component factor analysis

Table 5 shows the ORs of NPC and corresponding CIs for the identified dietary patterns in strata of age, tobacco smoking, and alcohol drinking. There was no significant heterogeneity across strata for any of the five patterns. However, among subjects drinking no more than the median intake of the overall population (14.5 drinks per week), those in the highest tertile of the Vitamins and fibre pattern had a significantly lower NPC risk (OR=0.46, 95% CI=0.24–0.87), whereas the corresponding OR for subjects drinking >14.5 drinks per week was 0.96 (95% CI=0.51–1.80).

Table 5 ORsa of NPC and corresponding 95% CIs on tertiles of factor scores in strata of age, tobacco smoking, and alcohol drinking

Discussion

The present analysis identified five major dietary patterns that explained about 80% of the total variance in the nutrient intakes of this Italian population. After adjustment for several confounders and mutual adjustment for all the other patterns, the Animal products, Starch-rich, and VUFA patterns were positively related to NPC, whereas the AUFA pattern, also positively related, showed a borderline significant association with NPC. The Vitamins and fibre pattern was inversely, but non-significantly, associated with NPC.

To our knowledge, only one other study has investigated the role of dietary patterns, identified through factor analysis, in the risk of NPC (Armstrong et al, 1998). This case–control study was carried out in the high-incidence Chinese population of the Federal Territory (Kuala Lumpur) and the State of Selangor, Malaysia, and assessed the association of NPC with single food items (queried in a 55-item FFQ) and dietary patterns at two time points (childhood and 5 years before diagnosis). Factor analysis was applied to a selection of 19 foods identified as being associated with NPC risk in the single-food-item analysis. In agreement with our work, the results showed a significant positive association with a dietary pattern based on meats (Factor 3), including pork/beef, liver, and other organ meats (∼60% increase in NPC risk with a change from the 25th to the 75th percentile of each estimated factor score), which is similar to our Animal product and AUFA patterns. The decrease in risk associated with their Factor 1, which loaded highly on fresh fruit and vegetable consumption with modest involvement of shrimp and preserved fruit, was replicated in our study, although the association was significant only in subjects aged 52 years or younger. In addition, the strongest positive association with NPC was found for a pattern based on salted, preserved foods (∼95% increase), which is not typical of the Italian diet. However, the preliminary selection of foods based on significant findings from the single-food-item analysis did not provide a comprehensive picture of the overall diet and therefore prevents a fair comparison of results.

The composition of the patterns associated with NPC is consistent with findings on nutrients and food groups from our own (Polesel et al, 2012, 2013; Bidoli et al, 2013) and some other studies (Farrow et al, 1998; Kasum et al, 2002; Chang and Adami, 2006; Gallicchio et al, 2006; Turkoz et al, 2011; Jia and Qin, 2012) conducted in low-incidence areas, where meat, fresh fruit and vegetables, β-carotene, and vitamin C were associated, to a greater or lesser extent, with NPC. However, apart from the established association with salted fish and salt-preserved foods, typical of high-incidence areas, the evidence on dietary components and NPC is still limited (World Cancer Research Fund/American Institute for Cancer Research, 2007; Jia and Qin, 2012).

Factor analysis is the most common statistical method currently used to derive a posteriori dietary patterns in studies assessing the association between diet and cancer. Properly used, this technique allows different aspects of the overall diet to be entered in the same regression model without severe multicollinearity. For instance, as people who consume large amounts of red and processed meats tend to consume less poultry, fish, and vegetables, and vice versa, an apparent detrimental effect of these meats could be due, at least in part, to low intakes of these favourable foods (World Cancer Research Fund/American Institute for Cancer Research, 2007). Factor analysis may handle these issues in a more elaborate way than traditional approaches, and may offer advantages over simultaneous statistical adjustment for food groups used as independent explanatory covariates (Armstrong et al, 1998; Imamura et al, 2009). However, subjective decisions are involved at each stage of this process (Newby and Tucker, 2004; Edefonti et al, 2009). In this paper, we proposed the use of tools for assessing the factorability of the correlation matrix. Moreover, we stressed the importance of performing a series of separate robustness analyses to evaluate the impact of the decisions taken during the analysis. For instance, we checked the factor extraction method using different approaches, considered two different methods for calculating factor scores, and applied factor analysis separately in subsets of subjects. Only after a complete series of checks can a given data reduction solution be considered likely to be independent of the specific statistical procedure used to derive it.

Given the strong evidence supporting a causal role of EBV in the development of NPC (Chang and Adami, 2006), the lack of information on EBV status for most NPC cases and controls was a potential weakness of the present study. However, EBV status is unlikely to confound the association between dietary habits and NPC. In our data, the associations with the identified dietary patterns did not change substantially when analyses were restricted to undifferentiated NPC cases only, although the Vitamins and fibre pattern became significant in the highest tertile (OR=0.58, 95% CI=0.34–0.98). The small sample size was an additional weakness, reflecting the fact that the study was carried out in a low-incidence area, which is itself a strength of the study. Indeed, in low-incidence areas, the choice of an adequate sample size has to balance the need to avoid insufficient statistical power against the possibility of changes in lifestyle habits, especially diet, over a long period of observation.

This study shares some limitations, but also several strengths, of hospital-based case–control investigations. The high participation rate and the comparable catchment areas of cases and controls have likely avoided substantial selection bias, and comparison of cases with each major diagnostic category of controls led to similar results. Bias in the recall of food intake by cases should be small, given the limited knowledge of, and attention paid to, the possible relationship between diet and NPC in this population. The comparability of recall between cases and controls was improved by interviewing all subjects in a hospital setting (D'Avanzo et al, 1997). Information on alcohol, tobacco, and diet was satisfactorily reproducible (D'Avanzo et al, 1996; Ferraroni et al, 1996). Subjects with admission diagnoses related to tobacco smoking, alcohol drinking, or diet modifications were not eligible as controls.

In conclusion, the findings of the present study point to positive associations between NPC and the Animal product, Starch-rich, and VUFA dietary patterns. An inverse but nonsignificant association was also observed with the Vitamins and fibre pattern in our low-incidence country.