Dietary patterns and breast cancer risk among women in northern Tanzania: a case–control study
- First Online:
- Cite this article as:
- Jordan, I., Hebestreit, A., Swai, B. et al. Eur J Nutr (2013) 52: 905. doi:10.1007/s00394-012-0398-1
- 1.2k Downloads
Breast cancer is the second most common cancer among women in the Kilimanjaro Region of Tanzania. It was tested within a case–control study in this region whether a specific dietary pattern impacts on the breast cancer risk.
A validated semi-quantitative Food Frequency Questionnaire was used to assess the dietary intake of 115 female breast cancer patients and 230 healthy age-matched women living in the same districts. A logistic regression was performed to estimate breast cancer risk. Dietary patterns were obtained using principal component analysis with Varimax rotation.
The adjusted logistic regression estimated an increased risk for a “Fatty Diet”, characterized by a higher consumption of milk, vegetable oils and fats, butter, lard and red meat (OR = 1.42, 95 % CI 1.08–1.87; P = 0.01), and for a “Fruity Diet”, characterized by a higher consumption of fish, mango, papaya, avocado and watery fruits (OR = 1.61, 95 % CI 1.14–2.28; P = 0.01). Both diets showed an inverse association with the ratio between polyunsaturated and saturated fatty acids (P/S ratio).
A diet characterized by a low P/S ratio seems to be more important for the development of breast cancer than total fat intake.
KeywordsBreast cancerDietary patternPUFATanzania
Established factors for breast cancer development are age at menarche, age at menopause, age at first full-term pregnancy, breastfeeding and alcohol consumption at all ages [1–7]. A high percentage of total body fat and tall height at adulthood in postmenopausal women is associated with an increased risk [8–11]. Several studies have looked at possible linkages between single nutrient intake as well as foods or dietary patterns and breast cancer [12–18]. However, there has been limited evidence suggesting that consumption of total dietary fat and special dietary patterns influence breast cancer risk, but no internationally accepted conclusion was reached up to now [7, 19, 20].
In Tanzania, a low-income country where breast cancer is currently the second most common cancer in women, the lifestyle characterized by long-standing lactation or late age at menarche has been associated with a lower breast cancer risk . However, breast cancer occurs, and a pilot case–control study in the Kilimanjaro Region of northern Tanzania estimated an increased association between alcohol consumption and breast cancer . A new case–control study looked at dietary patterns rather than single nutrients as nutrients are ingested within diets. A case–control design was chosen because of a lack of demographic data and infrastructural deficits for identification of all women affected to allow for a prospective study approach.
The study was carried out in collaboration between the Kilimanjaro Christian Medical Centre (KCMC) in Moshi, Tanzania, and the Institute of Nutritional Sciences of the University of Giessen, Germany.
Breast cancer patients and controls were recruited in the Kilimanjaro Region between 2004 and 2007. The detailed study methodology has been described previously . In summary, cases were identified using fine needle aspiration cytology (FNAC) confirming primary breast cancer diagnosis. The hospital and visitor-based controls were matched according to age (±1.5 years) and lived in the same district for at least 5 years during the past 10 years. The controls were interviewed for their medical history and underwent a physical examination to exclude palpable breast cancer. After informed consent, 115 cases and 230 controls were interviewed by either a trained nurse or a medical doctor in Swahili, based on a standardized questionnaire in English about their socioeconomic situation, current and former lifestyle. At first, the variables were tested for normal distribution, followed by their respective tests for statistically significant differences between cases and controls. The two control groups were analysed for differences in their socioeconomic status using the Mann–Whitney U test .
The present analyses focus on the dietary patterns of both cases and controls using the data of a semi-quantitative food frequency questionnaire (FFQ). The FFQ food list was prepared based on market surveys at different seasons and completed after a pre-test. The relative validity of the FFQ was assessed in 2005 and 2006 based on two non-consecutive 24-h recalls of 50 randomly selected women with a mean age of 40 years (23–70 years), who did not participate in the case–control study but lived in the same study region. The validation study covered two seasons with different food availability: dry and rainy season. Data collection was done by four trained enumerators. The training included estimation of quantities using common household measurements, for example, cups, spoons, customary packing size, and solid foods in pieces or slices. Foods were prepared according to local standard recipes and weighed using household kitchen scales by the research staff. Countable foods such as onions, eggs or bananas were classified according to their size into small, medium and large. Samples of food pieces were obtained from the local market, and mean weights were taken of each size. The matter of size was intensively discussed in the interviewer trainings to assure a common comprehension. A raw/cooked coefficient was applied when large deviations between cooked and raw foods were expected after preparation, for example, for dried cereals (pasta, rice) and dried legumes. The coefficients were calculated by cooking experiments done by the nutritionist but without calculating any loss of vitamins and minerals. Seasonal food availability on individual level was assessed within the interview, especially for fruits, and a seasonal factor was applied accordingly.
The FFQ data from both, the validation and the case–control study, were entered into NutriSurvey®, a nutrition software package, which generated tables of the individual food and nutrient intake per day, latter based on food composition tables from Tanzania, Kenya, Senegal, Mali and Germany [23, 24]. All data were converted to gram intake per day for each food item.
For the validation study, the data sets were merged into six food groups to describe individual food intake: (1) cereals: bread, rolls, cereal products, grains, egg-free pasta; (2) vegetables: vegetables, pulses, potatoes, mushrooms; (3) animal products: eggs, dairy and cheese, meat, fish, poultry, sausages and other meat products; (4) beverages: non-alcoholic beverages, coffee, tea, water, alcoholic beverages; (5) fruits; (6) fats: oil, fats, butter. Since the values of most variables were not normally distributed, non-parametric tests were carried out in the subsequent analysis. The studied population had a low educational level, and considering the relative high number of interviewers in relation to the study population, the validation data were tested for interviewer effects before any statistical analysis was performed. The Kruskal–Wallis test chosen to test for homogeneity between the interviewers showed interviewer effects in 100 % of the food groups confirmed by the median one-way test at a level of 83 %. Therefore, further analysis was carried out stratified by interviewer. The Wilcoxon signed rank test was used to test the 24-h recall and the FFQ for seasonal variability. It is a non-parametric test equivalent to the paired t test. In addition, the Wilcoxon signed rank test was used to test for differences in the results of the 24-h recall and FFQ. There was no evidence for a seasonal effect in the food groups if the FFQ is used, except for non-alcoholic beverages. Differences in the intake of oils and fats assessed by the validated FFQ and its reference, the 24-h recall, could only be shown by one interviewer. This might be due to low quantification capacities of the studied population especially in this respective food group and especially during the 24-h recall. Furthermore, Spearman correlation was calculated with all interviewers grouped together for comparison with other studies that did not report whether they checked for interviewer bias. The correlation coefficient (rs) was highest in the food group “fruits” (rs = 0.39, P = 0.01) followed by “cereals” (rs = 0.38, P = 0.01), “beverages” (rs = 0.33, P = 0.01) and the food groups “animal products” and “vegetables” (rs = 0.27 and rs = 0.14, respectively, both values not significant). A negative non-significant correlation coefficient (rs) was found in the food group “oils and fats” (rs = −0.22, P = 0.13). There had been no consistent statistical differences between FFQs and the 24-h recalls, and the correlations were low to modest but comparable to other studies except for oil and fats [25–27]. The negative correlation coefficient for oils and fats might be explained by the difficulties in assessing the oil and fat consumption using the 24-h recall. The reference methods ranged from 7-day-weighed record and 2-day-weighed record to two 7-day food dairies. The high variation in correlation coefficients for the different food groups might be caused by under- or overestimation due to either high fluctuations in food availability or difficulties on the part of the respondents in estimating the quantities of the foods consumed in low-income countries. However, Parr et al.  pointed out that these factors should not be directly linked to the questionnaire design; thus, the tested FFQ was considered a reliable instrument to assess dietary intake in the Kilimanjaro Region.
However, it was recommended paying special attention to the training of interviewers and especially to the assessment of the oil and fat intake. In addition, a calculation for seasonal variability in fruit and vegetable intake was recommended to be used where applicable. The FFQ finally contained in total 65 food items.
From data on individual food intake of the case–control study population, dietary patterns were created using PCA. Although there is a certain disagreement among statistical theorists about it [29–31], PCA was chosen for keeping the results comparable to other studies looking at dietary patterns and disease [20, 32–36]. The sampling adequacy of the food group variables for factor analysis was confirmed using the Kaiser–Meyer–Olkin measure. The food items listed in the FFQ were at first merged into 36 food groups for obtaining factors from the PCA defined as dietary patterns. A second PCA was performed based on 34 food groups, excluding alcoholic beverages. Scree plots and parallel analysis were used to quantify the number of factors wanted . Food groups with factor loadings between −0.4 and 0.4 were disregarded for defining the dietary patterns. Differences in body size, metabolic efficiency and physical activity increase the variation in dietary intake, thus requiring energy adjustment. We chose to apply the residual method after performing the PCA to ensure comparability between the dietary patterns [37, 38]. The final dietary patterns were included in a non-conditional logistic regression model; at first adjusted only for age. Secondly, the dietary patterns were included into the basic model described elsewhere . This model includes the matching variables of age, place of living and the acknowledged predictors in the aetiology of breast cancer from high-income countries. In addition, the body mass index (BMI) of the women at the age of 10 and 20 years as well as current BMI was estimated by each woman herself using a pictogram developed by Stunkard et al.  and modified to African settings .
If possible, the variables were entered as continuous variables. The variable “age at first full-term pregnancy” was categorized into three groups, “first full-term pregnancy ‘≤20 years’, ‘>20 years’, and ‘no pregnancy’”.
Descriptive statistics, principal component analysis and logistic regression were performed using the statistical package of SPSS version 18 (SPSS Inc.).
Ethical clearance was obtained from the Research and Ethics Committee of the KCMC, Moshi, Tanzania, and the Ethics Committee of the Faculty of Medicine of the University of Giessen, Germany.
Selected socioeconomic and reproductive indicators 
Age at menarche (years)
Age at first full-term pregnancy (years)
Number of childrena
Lifelong lactation (months)
Less than 3 years
Finished primary school
Finished secondary school
Property level (%)
Women with children (%)
Results of rotated principal component analysis (PCA 1)
Diet of the Rich
Variance explained (%)
Cucumber and okra
Carrots and tomatoes
Green (cooking) banana
Butter and lard
Mixed vegetable fats and oil
A PCA was conducted primarily on 36 food groups with Varimax rotation. The Kaiser–Meyer–Olkin measure verified the sampling adequacy for the PCA, KMO = 0.621, which is considered as mediocre [40, 41]. Following Kaiser’s criterion retaining all components with eigenvalues greater than one, 14 components would have been useful for further analysis. However, the number of food groups with factor loadings <−0.4 or >0.4 varied between 0 and 11; thus, the results were not interpretable. Consequently, it was decided to retain four components as suggested by the scree plot. These four components or dietary patterns describe 29.9 % of the variance in food intake (Table 2). The first pattern is characterized by rice, nuts, eggs, chapati (unleavened East African flat wheat bread), leguminous vegetables, bread, soda and red meat. Since most of these food items are usually purchased, we called it the “Diet of the Rich”. Pattern two is characterized by Mchicha, cucumber, okra, onions, carrots, tomatoes, maize, fish and avocado. Mchicha is the Swahili name for amaranth leaf, a traditional food in Tanzania often synonymously used for a dish consisting of amaranth leaves and, for example, onions, tomatoes and/or carrots in various amounts. The pattern was therefore named “Mchicha Diet”. The third pattern is characterized by ripe and green banana, sugar, different fruits, tubers, pulses and Mbege. The mountainous area of the Kilimanjaro Region is known for its various banana plants. Therefore, pattern three was called “Banana Diet”. Pattern four is characterized by a high consumption of milk, butter, lard, vegetable oils and fats, and a low consumption of sunflower oil and tea. All of the positively loading food items relate to fat, thus we called this pattern “Fatty Diet”. With increased affiliation to this Fatty Diet, bread consumption decreased (1st quartile median = 17 g bread/d, 4th quartile median = 9 g bread/d; P for trend < 0.001) and red meat consumption increased (1st quartile median 44 g/day, 4th quartile median = 52 g/day; P for trend = 0.09).
Results of the logistic regression: dietary patterns only
95 % CI
Dietary patterns (PCA 1)
Diet of the Rich
Basic breast cancer risk model and dietary patterns
95 % CI
Body mass index (kg/m²)
At 20 years
Age at first full-term pregnancy
Dietary patterns (no alc)
Diet of the Rich (no alc)
Fruity Diet (no alc)
Mchicha Diet (no alc)
Banana Diet (no alc)
Starchy Diet (no alc)
Fatty Diet (no alc)
Several dietary patterns from two principal component analyses with Varimax rotation based on a FFQ were associated with increased breast cancer risk. Two patterns, both called Fatty Diet, are basically characterized by a higher consumption of milk, mixed vegetable oils and fats, butter and lard, but a low consumption of sunflower oil. Both Fatty Diets were associated with an increased risk in different logistical models. A diet rich in fat similar to our Fatty Diets was discussed by Schulz et al.  using reduced rank regression, stating that specific fatty acids are less important in populations with a generally higher fat consumption (mean 8.3–10.4 g/MJ). However, this level of dietary fat intake as a proportion of energy intake was comparable to our study population (mean 10.2 g/MJ), but the mean total fat intake in our population was 15 g per day lower because of the overall lower energy consumption than reported by Schulz et al. . Here, the women’s total fat intake was not associated with breast cancer risk (data not shown), although total fat intake increased significantly with increased affiliation to the fatty dietary patterns. Another prospective cohort study found a direct association between dietary fat intake including subtypes and post-menopausal invasive breast cancer . However, they recorded at median 20.3 % energy intake from fat per day in the 1st quintile and 40.1 % energy intake from fat per day in the last quintile. Only the latter energy intake level from dietary fat is comparable to our data. The wide range of fat intake observed in their study population may have resulted in an increased statistical power. This assumption was made by Thiébaut et al.  based on the hypotheses from Wynder et al.  that a threshold effect may exist for dietary fat, such that it would be difficult to detect an association between fat intake and breast cancer risk in Western populations. They referred to studies about Asian diets in which more people consume diets containing 20 % or less of energy from fat, which have shown significant or borderline significant associations of fat intake and breast cancer risk . The median fat intake as percentage from energy intake in our study population was 39 %, which is above this benchmark of 20 % and may explain why no association was found for our population. Regarding the fatty acid composition of the diet, the major PUFA sources reported by Thiébaut et al. were vegetable oils and fats, butter and mayonnaise . Except mayonnaise, these food items have also been identified by Schulz et al.  and in our study as part of dietary patterns rich in fat which have been associated with a higher risk of developing breast cancer. Even if the fatty acid composition of foods varies intrinsically, this composition may be more important than the total fat intake. Our study population showed a negative association of the P/S ratio with breast cancer risk (Fig. 1). This negative association was also observed in a case–control study among pre-menopausal women in Singapore, but was attributed to PUFA intake only . However, results from the European Investigation into Cancer and Nutrition (EPIC) study  and a case–control study in Connecticut  supported the hypothesis of Rose et al.  and Key et al.  that both saturated and polyunsaturated fatty acids influence inversely the oestrogen metabolism and mammary carcinogenesis. In addition, results from the Shanghai Women’s Health study, a prospective cohort study, suggested that the relative amounts of n−6 PUFA to marine-derived n-3 PUFAs may be more important for the breast cancer risk than individual amounts of these fatty acids in the diet . They supported the hypothesis that the different PUFA compete as enzyme substrates inside membrane phospholipids : this may also explain the contradictory results of other studies analysing the effect of PUFAs on breast cancer risk [50–52].
Investigators from the Black Women’s Health Study, a prospective cohort study, identified a dietary pattern similar to our Fatty Diet called “Western Diet” also based on a PCA with Varimax rotation and factor loadings for dairy products and meat at similar level . However, they associated a lower risk for breast cancer only with another dietary pattern, the “Prudent Diet”, characterized with a low consumption of meat and dairy products. Since both the Western and the Prudent Diets were more complex than in our study with each diet having more than 8 foods with factor loadings above 0.4, it is not known whether the non-relationship between the Western Diet and breast cancer has been masked by a higher consumption of potentially preventive foods which in turn result in a high P/S ratio.
Effect of alcohol on dietary patterns risk association
The reported consumption of the local banana beer, Mbege, increased significantly with increased affiliation to the Fatty Diet (P for trend < 0.001). Our data show a higher breast cancer risk for women mainly following the Banana Diet, which was also associated with a high consumption of Mbege, even though there is no risk association between alcohol intake and breast cancer risk in this study. According to the WRCF panel, there is ample and generally consistent evidence from case–control and cohort studies that alcoholic drinks are a cause of pre- and post-menopausal breast cancer . In order to exclude a possible bias in the risk estimation of dietary behaviour and breast cancer risk, we generated a second set of dietary patterns excluding the alcoholic beverages from the factor analysis. The risk-increasing effect of the Fatty Diet remained slightly less pronounced when alcoholic beverages were singularized and added separately into the risk estimation model. On the contrary, the Mchicha Diet and the Banana Diet were no longer associated with breast cancer risk, although the latter is characterized by rapidly absorbable carbohydrates. Such carbohydrates have been associated with increased breast cancer risk [20, 54]. One would assume that if the alcoholic beverages did influence the risk estimation of the Michicha and Banana Diets, the analysis keeping alcoholic beverages as separate food groups should visualize an increased risk association. However, the odds ratio of Mbege as well as bottled beer and wine was estimated to be 1.00 (95 % CI, 1.00–1.00; P = 0.08 and 0.87, respectively) indicating no risk association. In our study, the alcohol consumption was 8.2 g/day, which is well below the recommended maximum intake of one drink per day in the European code against cancer . Thus, the alcohol intake in general was probably too low to show an effect.
Socioeconomic status (SES) is an internationally acknowledged indicator in epidemiological, economical and sociological studies. However, there is no international consensus on assessing SES, income and household expenditures being the most commonly used measures of SES . In low-income countries like Tanzania, these indicators are difficult to assess. Often, poverty or possession scores are used instead. Studies from low-income countries looking at socioeconomic status and health have shown that a possession score might be even a better indicator of SES, as this score allows greater discrimination in identifying health risks than a poverty index . In this study, a possession score called property, which was used as proxy for SES, showed an inverse association with breast cancer risk. This seems at odds with the statement that higher education and socioeconomic status are associated with an increased risk resulting from the lower number of parities and lactations. However, parity and lactation were correlated with educational level only. Furthermore, educational level was negatively correlated with age indicating a trend towards higher education among the younger women. Education might have an impact on breast cancer risk estimation in the future following the expectation that lactation and parity will reduce over time with increasing educational level. In addition, it is expected that the trend towards higher education and fewer children, thus reduced lifelong lactation, will continue especially with all the efforts towards the MDGs.
We did not find a correlation between lactation, parity and property level. In the context of this study, “low property level” means people can call a bicycle or a radio their own. If they own both, they already belong to the group at “medium property level”. Thus, any extra income is used first to improve basic living conditions like nutrition, sanitation and health before it is used for education. Alderman  points out that the relationship between possessions to nutrition provides only an indirect answer looking at social transfer programmes in low-income countries aiming at improvement in nutrition and healthcare-seeking behaviour. However, according to Hou et al. , it may not be surprising to observe an inverse association between SES and breast cancer risk, as studies have shown that people with low SES develop triple-negative subtypes, which accounts for a substantial proportion of breast cancer in Africa. Nevertheless, they required confirmation by larger population-based studies.
Strengths and limitations
Our data show the impact of reproductive and lifestyle factors on breast cancer aetiology of women in the Kilimanjaro Region. With regard to eating habits and dietary patterns, the diversity of the Kilimanjaro diet is low, and it was less likely to miss important foods on the FFQ food list reducing the estimation bias for dietary behaviour. Due to low education levels and the poor infrastructure, we do not expect socially desirable answers, and participants are less likely to be informed about possible dietary impacts on health outcomes. The semi-quantitative FFQ allowed us to identify non-consumers and frequent consumers on the basis of eating habits, nutrient and energy intake in the case and control groups. Also, ready-to-use meals and eating out are uncommon in the Kilimanjaro region, which facilitated the identification of food groups based on single food items and less on complex meals.
The sample is relatively small compared to studies in Westernized countries. The sample size required to achieve a high level of power in a logistic regression depends on the number of predictors and the size of the expected effect. Peduzzi et al.  showed that no problems occur by events per variable (EVP) of 10 or more. Also, high regression coefficients and high correlations between the predictors may cause large problems in the estimation process, resulting in very low power even with EVP of 20 or more . Thus, we have tested multicollinearity, which was acceptable in all predictors used. Several studies showed that a sample size like in our studies allows detecting large and medium effects, but might miss small effects. Thus, the results of our study are moderately powered and need to be confirmed by studies with a larger study population.
Before running a PCA, the sampling adequacy was controlled using the KMO measure. The KMO measure, which was 0.62, is considered as mediocre in our case [68, 69]. A low KMO measure might result in a high unexplained variance. However, in this study, we extracted 6 factors explaining 40.3 % variance, which is a medium result compared to other studies, for example: Hu et al. = 2 factors: 20 %; Arkkola et al. = 7 factors: 29.5 %; Shi et al. = 4 factors: 28.5 %; Lau et al. = 2 factors: 17.1 % [69–73]. In addition, Bartlett’s test for sphericity was 2,599.25, P < 0.001 indicating that correlations between items were sufficiently large for a PCA.
A major limitation is the PCA method. There have been discussions that the PCA method is less suitable for risk estimations of dietary patterns, because of difficulties to find plausible linkages between dietary patterns and the observed disease . Therefore, it was recommend to use reduced rank regression based on response variables. However, breast cancer develops over a long period of time. Thus, using response variables—such as biochemical parameters—is only possible in prospective studies.
The knowledge about breast cancer and breast self-examination was very poor in our study population, and facilities for cancer diagnosis and treatment are still rare in countries like Tanzania . In order to avoid a bias, we excluded the family history data for cancer from the analysis.
In the absence of a general health insurance, patients have had to pay for getting access to the health facilities. With the aim to minimize confounding errors due to different livelihood systems between cases and controls, we decided to select the controls also from within the hospital setting. But thereby, other selection biases cannot be excluded.
In conclusion, a dietary pattern rich in fat and characterized by a low P/S ratio may be associated with a higher risk of breast cancer. The fatty acid composition is probably more important than total fat intake for the breast cancer risk.
We thank the participating women of the Kilimanjaro Region and the members of the research team. Greatly appreciated is the input of: Mark Swai for his contribution to the study design and implementation, Joel Julia Mfinanga for conducting the interviews and physical examinations, Johannes Herrmann for statistical guidance and data analysis, and Emma Sykes and Margaret Wightman for the English editing.
Conflict of interest
The study was partially funded by Henskes Ltd., Laatzen, Germany, Plural, Sachse and Heinzelmann Ltd., Germany, as well as the German Academic Exchange Service (DAAD). The authors have no conflict of interest.
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.