Serum vitamins A and E as modifiers of lipid trait genetics in the National Health and Nutrition Examination Surveys as part of the Population Architecture using Genomics and Epidemiology (PAGE) study
Both environmental and genetic factors impact lipid traits. Environmental modifiers of known genotype–phenotype associations may account for some of the “missing heritability” of these traits. To identify such modifiers, we genotyped 23 lipid-associated variants identified previously through genome-wide association studies (GWAS) in 2,435 non-Hispanic white, 1,407 non-Hispanic black, and 1,734 Mexican-American samples collected for the National Health and Nutrition Examination Surveys (NHANES). Along with lipid levels, NHANES collected environmental variables, including serum levels of the fat-soluble micronutrients vitamin A and vitamin E. As part of the Population Architecture using Genomics and Epidemiology (PAGE) study, we modeled gene–environment interactions between vitamin A or vitamin E and 23 variants previously associated with high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglyceride (TG) levels. We identified three SNP × vitamin A and six SNP × vitamin E interactions at a significance threshold of p < 2.2 × 10⁻³. The most significant interaction was APOB rs693 × vitamin E (p = 8.9 × 10⁻⁷) for LDL-C levels among Mexican-Americans. The nine significant interaction models individually explained 0.35–1.61 % of the variation in any one of the lipid traits. Our results suggest that vitamins A and E may modify known genotype–phenotype associations; however, these interactions account for only a fraction of the overall variability observed for HDL-C, LDL-C, and TG levels in the general population.
The importance of both genetics and environment in shaping an individual’s lipid profile is intuitively obvious. However, the search for gene–environment interactions that influence levels of high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglycerides (TG) has only been relatively recent. One driving force for expanding beyond the standard single-variant models is the observation that single-variant main effects do not account for the majority of the heritability attributed to additive genetics for most complex human traits (Manolio et al. 2009). For the lipid traits, heritability estimates are as high as 80 % (Heller et al. 1993; O’Connell et al. 1988; Snieder et al. 1999), yet the largest and most comprehensive lipid meta-analysis to date was only able to explain about 25–30 % of the genetic variance (Teslovich et al. 2010). The identification of gene–environment interactions may help find a proportion of this “missing heritability”.
Within a statistical framework, a gene–environment interaction describes the effect of a genotype and an environmental factor that deviates from their additive effects. Within a biological framework, the environment (or its by-product) modifies the function or amount of a gene product (Hunter 2005). The latter approach to identify gene–environment interactions is difficult in outbred populations such as humans given that both genetic background and environmental exposures vary within and across populations. Model organisms are more suited to identify biological interactions, but it is difficult to automate these studies, and the findings of these experiments may not generalize to humans (Ober and Vercelli 2011). In contrast, methods to identify statistical interactions can be automated, making them an attractive option for detecting gene–environment interactions important for complex human traits (Hunter 2005).
A number of candidate environmental factors may affect lipoprotein phenotypes, including diet and nutrition. More specifically, fat-soluble micronutrients such as vitamin E (α-tocopherol) and vitamin A may influence lipid metabolism because their metabolic pathways are tightly linked: fat-soluble vitamins and vitamin precursors are absorbed together with dietary fat. Following absorption, both lipophilic molecules are transported to the liver via lipoproteins (as retinyl esters in the case of vitamin A). Vitamins E and A are then re-secreted by the liver into the circulation (as retinol in the case of vitamin A). However, here their metabolic pathways diverge, with the majority (90 %) of vitamin E found in LDL or HDL particles and the majority of circulating vitamin A found in complex with a specific transporter, retinol-binding protein (Norum and Blomhoff 1992; Zingg and Azzi 2004). Despite this, both vitamins A and E are known to be positively correlated with cholesterol levels. Furthermore, previous studies have indicated that variants in genes which influence lipid metabolism also influence plasma α-tocopherol levels (Borel et al. 2007, 2009; Gomez-Coronado et al. 2002; Ortega et al. 2005) and trans-retinol levels (Gomez-Coronado et al. 2002).
Despite evidence that genetic variants and environmental factors are independently associated with lipid traits, relatively few studies have been published investigating the interaction between the two (Bernstein et al. 2002; Corella et al. 2001a, b; Hagberg et al. 2000; Lai et al. 2006; Weinberg 2002). And, to our knowledge, no studies explicitly testing the effect of non-additive interactions between lipid-associated SNPs and vitamin E or A on lipid levels have been published. We present here an investigation of the effects of 23 lipid-associated SNPs in the context of circulating levels of vitamins A and E using data from the National Health and Nutrition Examination Surveys (NHANES) as part of the Population Architecture using Genomics and Epidemiology (PAGE) study (Matise et al. 2011). Analysis of greater than 5,500 participants from this diverse population-based survey provides the first steps in finding the “missing heritability” for lipid traits by accounting for nutritional modifiers.
Materials and methods
Study samples were drawn from three National Health and Nutrition Examination Surveys (NHANES III, NHANES 1999–2000, and NHANES 2001–2002). Participant ascertainment and data collection for NHANES have been previously described (Centers for Disease Control and Prevention 1996; Centers for Disease Control and Prevention 2010). Fasting adults (age ≥18 years) were included in this analysis, regardless of self-reported lipid lowering medication use, as less than 4 % of participants fell into this category and previous sensitivity analyses showed that excluding participants based on medication use did not appreciably alter the results of single-SNP tests of association (Dumitrescu et al. 2011). Race/ethnicity was self-described. Body mass index (BMI) was calculated from height and weight measured in the Mobile Examination Center by CDC medical personnel. Current smoking was defined by “do you smoke cigarettes now?” or cotinine levels >15 ng/ml.
All procedures were approved by the CDC Ethics Review Board and written informed consent was obtained from all participants. Because no identifying information was accessed by the investigators, this study was considered exempt from human subjects review by Vanderbilt University’s Institutional Review Board.
Laboratory and dietary measurements
Serum HDL-C, triglycerides, and total cholesterol were measured using standard enzymatic methods. LDL-C was calculated using the Friedewald equation, with missing values assigned for samples with triglyceride levels greater than 400 mg/dl. Serum levels of vitamin E (α-tocopherol) and vitamin A (retinol) were measured with isocratic high-performance liquid chromatography (Centers for Disease Control and Prevention 1996; Centers for Disease Control and Prevention (CDC) 2002).
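As an illustration of the LDL-C calculation described above, the following minimal sketch applies the standard form of the Friedewald equation for values in mg/dl (LDL-C = total cholesterol − HDL-C − triglycerides/5) and returns a missing value when triglycerides exceed 400 mg/dl. The function name and the use of `None` for missing values are assumptions made for this example; the study's own code is not shown in the text.

```python
from typing import Optional


def friedewald_ldl(total_chol: float, hdl: float, triglycerides: float) -> Optional[float]:
    """Estimate LDL-C in mg/dl with the Friedewald equation.

    Hypothetical helper: returns None (missing) when triglycerides
    exceed 400 mg/dl, mirroring the exclusion rule stated in the text.
    """
    if triglycerides > 400:
        return None
    # Triglycerides / 5 approximates VLDL cholesterol in mg/dl.
    return total_chol - hdl - triglycerides / 5.0


# Example: total cholesterol 200, HDL-C 50, triglycerides 150 (all mg/dl)
print(friedewald_ldl(200, 50, 150))  # 120.0
print(friedewald_ldl(200, 50, 450))  # None
```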
Data for dietary intake were collected via a 24-h dietary recall administered by a trained dietary interviewer. Total nutrient intake was calculated using the US Department of Agriculture’s survey nutrient database. Total energy intake from fat, protein, carbohydrates, and alcohol was calculated by multiplying the grams of intake by the appropriate conversion factor: 9, 4, 4, and 7 kcal/g, respectively.
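The energy calculation can be sketched as follows, using the conversion factors given above. The dictionary keys, the function name, and the example intake values are hypothetical and chosen only for illustration.

```python
# Conversion factors from the text, in kcal per gram of intake.
KCAL_PER_GRAM = {"fat": 9, "protein": 4, "carbohydrate": 4, "alcohol": 7}


def energy_from_macronutrients(grams):
    """Return energy (kcal) contributed by each source, given intake in grams."""
    return {source: grams[source] * factor for source, factor in KCAL_PER_GRAM.items()}


# Hypothetical 24-h recall totals in grams (not study data).
intake = {"fat": 70, "protein": 80, "carbohydrate": 250, "alcohol": 10}
energy = energy_from_macronutrients(intake)
print(energy)                 # {'fat': 630, 'protein': 320, 'carbohydrate': 1000, 'alcohol': 70}
print(sum(energy.values()))   # 2020
```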
SNP selection and genotyping
List of 23 candidate genes and GWAS-identified SNPs genotyped in NHANES (table body not recovered; surviving column headers: Build 37 location (bp); Gene of interest)
Regression modeling was used to investigate the effect of interactions between lipid-associated variants and vitamin levels on HDL-C, LDL-C, and triglycerides. Gene–environment interactions were modeled using a multiplicative interaction term between the environmental variable and the additively encoded SNP. All models were adjusted for the main effect of the SNP and the environmental variable, along with age and sex. Significant associations were further adjusted for total energy intake from five dietary variables and/or BMI and current smoking status. Triglycerides and vitamin E levels were natural-log transformed due to a skewed, non-normal distribution. A linear model was used for the main effects of both serum vitamins for all analyses. Visual inspection of the relevant scatter plots failed to suggest that more complicated models would better fit these main effects. Given that misspecification of the model will reduce our power to detect an interaction, only linear models were considered here.
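The model structure described above (lipid trait regressed on the additively coded SNP, the serum vitamin, their product, age, and sex) can be sketched as below. This is an illustrative ordinary least squares fit in plain Python on simulated, noise-free data, not the SAS code used in the study; all function names, coefficient values, and simulated distributions are assumptions. In the study, triglycerides and vitamin E would be natural-log transformed before entering such a model.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def design_row(snp, vitamin, age, sex):
    """Intercept, SNP (0/1/2 minor alleles), vitamin, SNP x vitamin, age, sex."""
    return [1.0, snp, vitamin, snp * vitamin, age, sex]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def simulate(n, beta, seed=0):
    """Simulate noise-free outcomes from hypothetical coefficients beta."""
    rng = random.Random(seed)
    rows = [design_row(rng.choice([0, 1, 2]), rng.gauss(55, 15),
                       rng.uniform(18, 80), rng.choice([0, 1])) for _ in range(n)]
    y = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return rows, y


# Hypothetical coefficients; index 3 is the SNP x vitamin interaction term.
true_beta = [100.0, 2.0, 0.1, 0.05, 0.3, -4.0]
rows, y = simulate(200, true_beta)
print([round(b, 4) for b in fit_ols(rows, y)])
```

The coefficient on the product term is the quantity tested for a gene–environment interaction; a real analysis would also need standard errors and p values, which this sketch omits.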
All analyses were stratified by self-reported race/ethnicity to minimize possible confounding due to population stratification and were conducted in SAS v9.2 (SAS Institute, Cary, NC) using the Analytic Data Research by Email (ANDRE) portal of the CDC Research Data Center in Hyattsville, MD. Associations were deemed significant if the p value was less than or equal to the Bonferroni-corrected threshold of 2.2 × 10⁻³ (=0.05/23 SNPs). Aggregate statistics related to this work will be available via dbGaP as part of the PAGE study.
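The threshold arithmetic can be checked with a short sketch; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 23)
print(f"{threshold:.1e}")  # 2.2e-03
```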
To detect an interaction of a certain effect size, a general rule of thumb is that four times the sample size required to detect a comparable main effect is needed (Smith and Day 1984; Thomas 2010). Based on this assumption, we have 80 % power to detect a gene–environment interaction with an effect size as low as R² = 2.3 % in non-Hispanic whites, R² = 3.8 % in non-Hispanic blacks, and R² = 3.1 % in Mexican-Americans (α = 2.2 × 10⁻³; additive genetic model). Quanto (Gauderman 2002) was used to estimate statistical power.
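NHANES participant characteristics (three value columns per row; column headers and some row labels are missing from the source)
(row label missing): 51.9 ± 20 | 42.5 ± 17 | 42.8 ± 18
Vitamin A (μg/dl): 60.6 ± 16 | 53.1 ± 17 | 52.8 ± 15
Vitamin E (μg/dl): 1,322 ± 615 | 1,002 ± 379 | 1,135 ± 459
(row label missing): 51.3 ± 16 | 54.1 ± 17 | 48.3 ± 14
(row label missing): 126.9 ± 36 | 122.2 ± 39 | 120.9 ± 34
(row label missing): 146.7 ± 93 | 107.0 ± 72 | 156.3 ± 104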
Associations between lipid traits and vitamins A and E
SNP × vitamin interactions
Significant SNP × environment interactions in NHANES (table values not recovered)
Columns: Associated lipid trait; SNP main effect; Environment main effect; SNP × Environment interaction effect
Interactions listed: rs693 × VitA; rs693 × VitE (three rows); rs1748195 × VitA; rs1748195 × VitE; rs11206510 × VitA; rs11206510 × VitE; rs3135506 × VitE
Interactions between ANGPTL3 rs1748195 and both vitamins A and E were associated with HDL-C levels in non-Hispanic whites (p = 1.16 × 10⁻³ and p = 2.06 × 10⁻³). The ANGPTL3 rs1748195 × vitamin A interaction trended toward significance in non-Hispanic blacks (p = 0.01) but was not associated with HDL-C in Mexican-Americans (p = 0.64, Table S1). Similarly, the rs1748195 × vitamin E interaction was not associated with HDL-C in the other two populations.
Two interactions with a variant in PCSK9 are also listed in Table 4. The PCSK9 rs11206510 × vitamin A interaction was associated with LDL-C in Mexican-Americans at p = 7.65 × 10⁻⁵. In addition, the PCSK9 rs11206510 × vitamin E interaction was associated with transformed triglycerides in non-Hispanic whites at p = 1.27 × 10⁻³. Lastly, the only significant gene–environment interaction observed in non-Hispanic blacks was between the APOA1/C3/A4/A5 cluster variant rs3135506 and vitamin E, which was associated with triglyceride levels at p = 2.45 × 10⁻⁴.
The nine significant interaction models individually explained 0.35–1.61 % of the variation in one of the lipid traits. Interactions rs693 × vitamin E and rs11206510 × vitamin A had the greatest R² values and accounted for 1.61 and 1.26 %, respectively, of the variation in LDL-C among Mexican-Americans. The seven other interaction terms had R² values <1 %.
Adjustment for lifestyle and dietary variables
Additional factors, both dietary and environmental, may influence both serum lipid and serum vitamin levels and, possibly, the interactions modeled here. To account for these variables, we adjusted our nine most significant associations for (1) BMI and current smoking status and (2) BMI, current smoking status, and five dietary variables (total fiber and total energy intake from carbohydrates, protein, fat, and alcohol), along with age and sex. For five of the nine associations tested, adjustment for BMI and smoking did not appreciably alter the results compared to the models minimally adjusted for age and sex (Table S7). And of these five associations, only one (rs1748195 × vitamin E with HDL-C in non-Hispanic whites) was no longer significant (p = 0.02) after including dietary variables in the model. Interestingly, of the four associations that no longer remained significant (p > 0.04) after adjustment for BMI and current smoking status, all four included an interaction with rs693. Indeed, the p value for the previously most significant interaction (rs693 × vitamin E with LDL-C in Mexican-Americans) rose from p = 2.67 × 10⁻⁷ to p = 0.59 and the amount of variance explained dropped from R² = 1.61 % to only R² = 0.30 % (Table S7).
In this study, we have identified three novel SNP × vitamin A and six novel SNP × vitamin E interactions. The largest share of the significant interactions was associated with triglycerides (4/9), and most were among non-Hispanic whites (6/9), which may be a result of the stronger associations between triglycerides and serum vitamin levels (Table 3) and the larger sample size for non-Hispanic whites compared to non-Hispanic blacks and Mexican-Americans (Table 1). When dietary and lifestyle variables were included in the model, the four vitamin interactions with rs693 were no longer significant (Table S7).
Although we identified several statistically significant interactions, the overall contribution each interaction term made toward the observed trait variability for any of the lipid traits was small. For example, after adjusting for age and sex, the interactions discovered here explained only 0.35–0.39 %, 0.67–1.61 %, and 0.36–0.80 % of the variability in HDL-C, LDL-C, and triglyceride levels, respectively. Our most significant finding (APOB rs693 × vitamin E) only explained 1.61 % of the variance in LDL-C among Mexican-Americans, a trait that is up to 80 % heritable. In comparison, the effect of age and sex together accounted for 5.9 % of the variance in LDL-C among Mexican-Americans. Furthermore, after adjusting our most significant interactions for BMI, current smoking status, and dietary intake, all the R² values decreased, with the rs11206510 × vitamin A interaction with LDL-C in Mexican-Americans resulting in the largest R² of only 1.12 %.
All of the genes implicated here play key roles in lipid metabolism. ANGPTL3 encodes a protein which can suppress lipoprotein lipase (LPL) activity, leading to increases in plasma triglycerides and HDL-C. PCSK9 encodes proprotein convertase subtilisin/kexin type 9, a protein that binds the LDL receptor and induces its degradation. The APOA1/C3/A4/A5 gene cluster lies within a 17-kb region on chromosome 11. Proteins made by this gene cluster are the major constituents of very low-density lipoprotein (VLDL) and/or HDL, act to inhibit LPL activity, and influence dietary fat absorption and chylomicron synthesis (Delgado-Lista et al. 2010). The gene products of APOB, apoB-48 and apoB-100, are the main apolipoproteins of chylomicrons and LDL particles, respectively.
Interestingly, four out of the nine (44 %) significant interactions included the variant rs693 in APOB (Table 4). As noted above, the APOB gene products are the main apolipoproteins of chylomicrons and LDL particles. In fact, one study showed that genetically modified mice that do not express APOB in the intestine do not form chylomicrons and display defective absorption of fats and fat-soluble vitamins (Young et al. 1995). Furthermore, mutations in APOB have been shown to cause familial hypobetalipoproteinemia (FHBL), which is characterized by low levels of apolipoprotein B-containing lipoproteins and fat-soluble vitamin malabsorption, resulting in neurological complications from lack of vitamin E (Young 1990).
Both vitamin E and A precursors are incorporated into chylomicrons for delivery to the liver. In addition, circulating vitamin E is found exclusively in plasma lipoproteins (VLDL, LDL, and HDL) (Borel et al. 2007). The interdependence of these vitamins and lipids (as demonstrated in Table 3) suggests that the interactions described in this study may either simply reflect the strong correlation between vitamins and lipids or be of biological relevance. In support of the latter interpretation, micronutrients have previously been implicated in affecting the expression of important lipid-metabolizing genes (Hagberg et al. 2000; Gatica et al. 2006; Mooradian et al. 2006a, b; Oliveros et al. 2007). For example, Mooradian et al. (2006a) demonstrated that high concentrations of vitamin E were associated with significant decreases in apoA-I expression (which is sensitive to the oxidative state of the cell) in hepatic HepG2 cells by reducing apoA-I promoter activity.
In this study, we tested for gene–environment interactions regardless of whether any of the 23 SNPs had a significant main effect in our previous meta-analysis as part of the larger PAGE study (Dumitrescu et al. 2011). Indeed, only about half (13 HDL-C, 12 LDL-C, and 12 TG = 37/69 = 54 %) of the associations tested in Dumitrescu et al. were significant at p < 0.05 in European Americans, the largest population studied in PAGE (n ≈ 20,000). Even fewer single-SNP associations were significant in African Americans (8 HDL-C, 3 LDL-C, and 8 TG = 19/69 = 28 %; n ≈ 9,000) and in Mexican-Americans/Hispanics (8 HDL-C, 6 LDL-C, and 5 TG = 19/69 = 28 %; n ≈ 2,500). Differences in SNP main effects across racial/ethnic groups may help to explain the differences we observed across our three study populations. Indeed, of the significant interactions identified, only the rs693 × vitamin E interaction with LDL-C was significant in more than one racial/ethnic group (non-Hispanic whites and Mexican-Americans). As discussed in Dumitrescu et al. (2011), the lack of generalization of SNP main effects between the different racial/ethnic groups may be due to differences in linkage disequilibrium or differences in power.
It has also been argued that gene–environment heterogeneity may be, in part, to blame for the lack of replication among GWAS and among different ancestral populations (Lasky-Su et al. 2008; Ober and Vercelli 2011). In the single-SNP PAGE meta-analysis detailed in Dumitrescu et al. (2011), APOB rs693 was strongly associated with LDL-C in European Americans (p = 3.38 × 10⁻²¹), marginally associated in African Americans (p = 0.02), but not associated in Mexican-Americans/Hispanics (p = 0.18). However, in this analysis, which represents a subset of the PAGE study sample, the main effect of rs693 was significantly associated in Mexican-Americans (p = 1.17 × 10⁻⁶, Table 4) after adjusting for the interaction with vitamin E. Indeed, nine significant gene–environment interactions were identified here and involved seven independent SNP main effects. Of those seven, only four had significant main effects in our earlier single-SNP meta-analysis for the same lipid trait and study population. However, after adjusting for the gene–environment interaction, the SNP main effect was significant (p < 6.11 × 10⁻³) for all seven. Accounting for environmental modifiers in genetic studies of lipid levels may not only uncover new biology, it may also improve the generalizability of findings from genome-wide association studies.
In interpreting our findings, we should consider several aspects. First, NHANES is a cross-sectional study and, therefore, we are unable to determine the temporal sequence of our results. Second, the issue of sample size and the ‘curse of dimensionality’ (Bellman 1961; Dumitrescu et al. 2011) are relevant to this study. As the number of factors under study increases (as with the addition of interaction terms), so does the number of strata. With a set sample size, increasing the number of terms in the model quickly increases the degrees of freedom and reduces the per-stratum sample size, thus decreasing statistical power. For this reason, even with relatively large sample sizes in NHANES, we had to restrict our analysis to SNPs with minor allele frequencies >5 %. To better study less common variants, collaborative studies and/or other non-regression-based approaches (such as Multifactor Dimensionality Reduction) (Ritchie et al. 2001) may be appropriate, although they are not without their own limitations.
In addition, it is important to note that correcting for multiple testing in gene–environment interaction studies is inherently more complicated than in standard single-SNP association studies. For GWAS, it is well known that a strict Bonferroni adjustment using the total number of SNPs tested is overly conservative as many SNPs are in linkage disequilibrium and, therefore, not all tests are independent. This concern holds true for studies of gene–environment interactions and is compounded by correlations among the SNP and the environmental variable (i.e., main effects) and, possibly, correlations among the different environmental variables tested. And while permutation testing has become very popular in single-SNP analyses as a way to correct for multiple testing, for gene–gene and gene–environment interaction studies, permutation testing is not available in most situations and does not guarantee strong control of the family-wise error rate (Anderson and Robinson 2001; Buzkova et al. 2011). As this was a discovery study, we corrected for only 23 tests even though we conducted 414 tests (23 SNPs × 2 environmental variables × 3 race/ethnicities × 3 lipid traits), albeit many of these tests are highly correlated. Indeed, replication may be the most acceptable approach to filter true findings from the false positives, but this approach, like all others, is not without limitations.
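The test count quoted above can be reproduced by enumerating every combination of SNP, environmental variable, race/ethnicity, and lipid trait. The sketch below uses placeholder SNP labels and treats each exposure × group × trait combination as one family of 23 SNP tests; that grouping is an assumption made here to illustrate why the correction was for 23 tests rather than 414, and is not taken from the authors' code.

```python
from itertools import product

snps = [f"SNP{i}" for i in range(1, 24)]  # placeholders for the 23 variants
exposures = ["vitamin A", "vitamin E"]
groups = ["non-Hispanic white", "non-Hispanic black", "Mexican-American"]
traits = ["HDL-C", "LDL-C", "TG"]

# Every interaction model fitted: one per SNP x exposure x group x trait.
all_tests = list(product(snps, exposures, groups, traits))

# Assumed grouping: each exposure x group x trait defines a family of 23 SNP tests.
families = set(product(exposures, groups, traits))

print(len(all_tests))                   # 414
print(len(families))                    # 18
print(len(all_tests) // len(families))  # 23
```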
The differences in lipid traits between individuals and between populations may partly result from interactions of known lipid-associated genetic variants and fat-soluble micronutrients. The results presented here highlight the fact that the effect sizes of gene–environment interactions tend to be small and that large sample sizes are needed to detect them. Nevertheless, understanding the mechanism of the interaction between these lipid-associated variants and environmental factors, such as serum vitamin E and A levels, is imperative to determining the etiology of a poor lipid profile and could, therefore, have implications in clinical care.
Genotyping in NHANES was supported in part by The Population Architecture Using Genomics and Epidemiology (PAGE) study, which is funded by the National Human Genome Research Institute (NHGRI). Data included in this report resulted from the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) Study, as part of the NHGRI PAGE study (U01HG004798). Genotyping services for select EAGLE NHANES III SNPs presented here were also provided by the Johns Hopkins University under federal contract number (N01-HV-48195) from NHLBI. We at EAGLE would like to thank Dr. Geraldine McQuillan and Jody McLean for their help in accessing the Genetic NHANES data. The Vanderbilt University Center for Human Genetics Research, Computational Genomics Core provided computational and/or analytical support for this work. The NHANES DNA samples are stored and plated by the Vanderbilt DNA Resources Core, managed by Cara Sutcliffe. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the National Institutes of Health or the Centers for Disease Control and Prevention.