Introduction

S-Adenosylmethionine (SAM) is the methyl donor for more than 200 methyltransferase reactions involving DNA, RNA, proteins, and metabolites (Finkelstein 2007; Lennard 2010) that take part in a wide range of metabolic and signaling pathways. Transfer of the methyl group yields S-adenosylhomocysteine (SAH), which is subsequently metabolized to homocysteine (Hcy). Plasma levels of SAM, SAH, and Hcy, individually and in combination, have been associated with cardiovascular, neurological, immunological, and obesity-related pathologies (Poirier et al. 2001c; Guerra-Shinohara et al. 2004; Strain et al. 2004; Selley 2007; Van Hecke et al. 2008; Ming et al. 2009; Obeid et al. 2009; Panza et al. 2009a, b; Linnebank et al. 2010; Muller 2010; Altug Sen et al. 2011). SAM is a product of the one-carbon pathway, which requires the nutritional cofactors folate, vitamin B6 (pyridoxal), and vitamin B12 and the substrates choline, betaine, and methionine, all of which are derived from the diet (Mason 2003; Lim et al. 2007; Rubio-Aliaga et al. 2011; Kasperzyk et al. 2011; Dominguez-Salas et al. 2013; Wadhwani et al. 2013). Diet quality has been shown to alter levels of SAM, SAH, and homocysteine dramatically (Poirier et al. 2001a; Dominguez-Salas et al. 2013) and, in turn, the methylation potential for all substrates, including DNA. Variation in one-carbon pathway genes, which also alters methylation potential, has been studied intensively (Hazra et al. 2007; Wernimont et al. 2011; Kasperzyk et al. 2011; Molloy 2012).

Gene–environment interactions that alter one-carbon pathway production of SAM have both short-term and long-term effects since DNA methylation and other SAM-dependent products contribute to the regulation of gene transcription (Klose and Zhang 2007; Gibney and Nolan 2010; Pu et al. 2010). A growing body of evidence from laboratory animal (Wolff et al. 1998; Cooney et al. 2002; Dolinoy et al. 2006; Waterland et al. 2007) and human studies (Heijmans et al. 2008; Gertz et al. 2011) indicates that changes in DNA methylation in utero or during critical developmental windows contribute to the developmental origins of adult diseases (Barker et al. 1993; Godfrey and Barker 1995; Gluckman et al. 2009; Hochberg et al. 2011). While many studies have understandably focused on methylation potential during fetal and perinatal periods (McGowan and Szyf 2010; Laurent et al. 2010; Beyan et al. 2012), fewer investigations have analyzed methyl substrate pools during childhood and adolescence.

The USDA Delta Nutrition Intervention Research Initiative (Delta NIRI), renamed the Delta Obesity Prevention Research Unit in 2010, initiated a community-based participatory research (CBPR) program (O’Fallon et al. 2000) to develop interventions addressing childhood obesity in communities of the Lower Mississippi Delta (LMD) region. This project was conducted in conjunction with a Freedom School Program held at the Boys, Girls, Adults Community Development Center (BGACDC), which has managed and operated a summer day camp for children and teens for 30 years. CBPR conducts research while simultaneously applying existing scientific knowledge to improve prevention practices and healthcare among the participants and their community (Israel et al. 2005; McCabe-Sellers et al. 2008). The USDA program has a long history in the Delta region, as evidenced by the many consultations held with the community (Harrison 1997; Smith et al. 1999; Yadrick et al. 2001; Horton et al. 2004; Ndirangu et al. 2007, 2008), assessments of food insecurity in the region (Stuff et al. 2004a, b; Champagne et al. 2007), studies of local diets (Champagne et al. 2004; McCabe-Sellers et al. 2007; McGee et al. 2008), and the creation of a regional food frequency questionnaire (Tucker et al. 2005). However, previous efficacy assessments had not analyzed biochemical markers to test whether the intervention programs altered physiological processes. In consultation with community leaders in Phillips County, USDA and FDA researchers therefore expanded the scope of the CBPR program to include physiological measures from blood analyzed with omics and genomic methodologies, physical activity monitoring, and dietary intake (McCabe-Sellers et al. 2008).

Levels of 9 plasma metabolites, the erythrocyte metabolites S-adenosylmethionine and S-adenosylhomocysteine, 1,129 plasma proteins, and 1M single-nucleotide polymorphisms (SNPs) were measured. Associations were found between the SAM/SAH ratio and gene variants and between levels of metabolites and plasma proteins, which may provide additional biomarkers of vitamin status. Cluster analysis identified two SAM/SAH metabolic groups, which differed significantly in multiple dietary variables as well as in genotype within 25 genes involved in SAM/SAH metabolism.

Methods

Study design

This study used an observational n-of-1 design, with data aggregated for population-level statistical analysis (Nikles et al. 2011), to assess how levels of metabolites and proteins were associated with each other, with dietary intake, and with genetic makeup. Participants in the research studies conducted in the Marvell (AR) School District in 2009 and 2010 were recruited during the youth summer day program at the Boys, Girls, Adults Community Development Center (BGACDC). The camp was held at sites in Marvell and Elaine (Arkansas) and consisted of structured and unstructured physical activities, reading, leadership, and other enrichment activities using material and training from the Children’s Defense Fund’s Freedom School curriculum (http://www.childrensdefense.org/programs-campaigns/freedom-schools/). Children arrived by 7:30 a.m., and the day ended between 3 and 4 p.m. The FDA’s Research Involving Human Subjects Committee (RIHSC) and the University of Arkansas for Medical Sciences (UAMS) Institutional Review Board (IRB) approved this research protocol.

Breakfast, lunch, and two healthy snacks were provided each day during the 5 weeks of the summer day camp. The snacks offered were fruits and vegetables. Reduced-fat milk and water were provided instead of sweetened fruit drinks or other sweetened soft drinks. Menus were developed in accordance with USDA guidelines for healthy meals for children/teens ages 6–14, and the same foods were offered in both years of the camp, although the quantity of food eaten by each participant at the camp was not monitored.

Assessments were conducted before the camp began (baseline), at the end of the 5-week camp, and 1 month after the camp ended (post-camp). Thirty-six participants were recruited in the first year, and 19 completed all three assessments. In the second year, 72 participants enrolled, and 42 completed all three assessments. Hence, data were available for 105 individuals at the two baseline assessments, 72 at the end of the summer day camp, and 61 who had at least three assessments over the 2-year study. Results for the three assessments in each of the 2 years are reported; 15 individuals participated in both years. All participants were healthy African-American children and adolescents. None of the participants were taking prescribed medicines, and none had overt malnutrition, an active infection, or any known genetic disease that could alter metabolism. Some participants voluntarily left the study between assessments; no adverse events were recorded.

Assessments at baseline (time point 1), end of camp (time point 2), and post-camp (time point 3)

Blood was sampled from participants after an overnight fast at each time point. Blood (3 ml) was collected in purple-top EDTA Vacutainer tubes, kept on ice, and centrifuged within 0.5 h of collection. After centrifugation, plasma, buffy coat, and red blood cells (erythrocytes) were separated, frozen, and stored at −80 °C. A second 3-ml blood sample for DNA extraction was collected in a PAXgene tube (Qiagen 761115) at time point 1.

Height and weight were measured according to a previously used training protocol modified from Lohman et al. (1998). Body mass index (BMI) was calculated and compared with the percentile references of the Centers for Disease Control and Prevention (CDC) 2000 growth curves (Kuczmarski et al. 2002): normal weight, 5th percentile ≤ BMI < 85th percentile; overweight, 85th percentile ≤ BMI < 95th percentile; and obesity, BMI ≥ 95th percentile (WHO Expert Committee 1995).
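For illustration, the BMI calculation and the percentile cutoffs above can be sketched as follows. This is a minimal sketch, not the study's code: it assumes the BMI-for-age percentile is obtained from the CDC 2000 reference through the LMS transformation, and the L, M, and S values in the usage lines are hypothetical placeholders, not values taken from the CDC tables.

```python
import math
from statistics import NormalDist


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def lms_percentile(x: float, L: float, M: float, S: float) -> float:
    """Convert a measurement to a percentile with the LMS method.

    L, M, S must come from the CDC 2000 reference table for the child's
    sex and age in months (not included here).
    """
    if L == 0:
        z = math.log(x / M) / S
    else:
        z = ((x / M) ** L - 1) / (L * S)
    return 100 * NormalDist().cdf(z)


def weight_status(percentile: float) -> str:
    """Apply the BMI-for-age cutoffs used in the text."""
    if percentile >= 95:
        return "obesity"
    if percentile >= 85:
        return "overweight"
    if percentile >= 5:
        return "normal weight"
    return "below 5th percentile"  # category not defined in the text


# Usage with hypothetical LMS parameters (placeholders, not CDC values)
value = bmi(38.0, 1.42)
pct = lms_percentile(value, L=-1.6, M=17.5, S=0.13)
print(round(value, 1), round(pct, 1), weight_status(pct))
```

Classification is kept separate from the percentile calculation so that the cutoffs can be checked independently of the reference tables.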

Dietary intake

Twenty-four-hour dietary recall interviews were conducted with all children and adolescents at each time point, with the assistance of a parent/guardian, using the USDA Automated Multiple-Pass Method (AMPM) (Tucker et al. 2005). The USDA Food and Nutrient Database for Dietary Studies (FNDDS) software [version 2.0 (2006) and version 3.0 (2008); Beltsville, MD: Agricultural Research Service, Food Surveys Research Group] was used for energy and nutrient intake analyses (http://www.ars.usda.gov/SP2UserFiles/Place/12355000/pdf/fndds_doc.pdf).

To measure compliance with dietary guidance, the Healthy Eating Index [HEI (Guenther et al. 2007)] was also calculated. The HEI has a maximum score of 100, summed over 12 components: (1) total fruit (TF), (2) whole fruit (WF), (3) total vegetables (TV), (4) dark green and orange vegetables and legumes (DGOVL), (5) total grains (TG), and (6) whole grains (WG), each scored to a maximum of 5; (7) milk, (8) meat and beans, (9) oils, (10) saturated fat, and (11) sodium, each scored to a maximum of 10; and (12) solid fat, alcohol, and added sugar (SoFAAS), scored to a maximum of 20. By convention, a total score above 80 was considered “good,” scores of 51–80 indicated “needs improvement,” and scores below 51 were considered “poor” (Guenther et al. 2007; Miller et al. 2011).
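The scoring rules above reduce to simple arithmetic: the 12 component scores are summed and the total is mapped to a category. The sketch below assumes the component scores have already been derived from the 24-h recalls (the per-component scoring standards are not reproduced here); the dictionary keys are hypothetical labels based on the abbreviations in the text.

```python
# Maximum points per HEI component, as listed in the text (sum = 100)
HEI_MAX = {
    "TF": 5, "WF": 5, "TV": 5, "DGOVL": 5, "TG": 5, "WG": 5,
    "milk": 10, "meat_beans": 10, "oils": 10, "sat_fat": 10, "sodium": 10,
    "SoFAAS": 20,
}


def hei_total(scores: dict) -> float:
    """Sum component scores after checking each is within its allowed range."""
    missing = HEI_MAX.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    total = 0.0
    for component, max_points in HEI_MAX.items():
        s = scores[component]
        if not 0 <= s <= max_points:
            raise ValueError(f"{component} score {s} outside 0-{max_points}")
        total += s
    return total


def hei_category(total: float) -> str:
    """Conventional cutoffs: >80 good, 51-80 needs improvement, <51 poor."""
    if total > 80:
        return "good"
    if total >= 51:
        return "needs improvement"
    return "poor"


# Hypothetical component scores for one recall (not study data)
example = dict(HEI_MAX)
example.update(WF=0.5, DGOVL=0.5, WG=0.0, SoFAAS=6.0, sodium=4.0)
total = hei_total(example)
print(total, hei_category(total))
```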

Metabolites, proteomic, and genomic analysis

Metabolites

All samples from each year at each time point were randomized at the time of analysis. Plasma Hcy was analyzed by HPLC, and erythrocyte SAM and SAH were analyzed by UPLC as reported previously (Wise et al. 1997). Lipid-soluble and water-soluble vitamins were determined separately using UPLC coupled to tandem mass spectrometry (UPLC/MS/MS) as reported in Morine, Monteiro et al. (submitted). Complete metabolite analysis was done on 105 samples at time point 1, 72 samples at time point 2, and 61 samples at time point 3 (both years combined). The time points and data points (i.e., samples) available for statistical analysis are reported in each table. Statistical analyses used two age groups (<9 and ≥9 years of age), based on the USDA dietary intake guidelines for ages 4–8 and 9–13 (http://www.cnpp.usda.gov/Publications/DietaryGuidelines/2010/PolicyDoc/Appendices.pdf).

Proteomics

The plasma proteome was analyzed in 110 samples from 6 different time points (three in each of the 2 years of sample collection). However, because of missing samples at time points 2 and 3, only data from the 61 participants at time point 1 (the same participants who completed three assessments) were used in subsequent statistical and computational analyses. Somalogic Inc. (Boulder, CO) performed all proteomic assessments and was blinded to the clinical characteristics of the participants. Samples were analyzed as previously described (Gold 1995; Brody and Gold 2000; Gold et al. 2010; Ostroff et al. 2010; Brody et al. 2012).

Genomic analysis

Blood for DNA isolation was collected in a 3-ml PAXgene tube, and DNA was isolated according to the manufacturer’s protocol (http://tinyurl.com/ot2ovuc). Whole-genome genotyping was done with HumanOmni1-Quad, version 1.0 kits (Illumina, San Diego, CA) following the manufacturer’s protocol, as described in Morine, Monteiro et al. (submitted).

Statistical analysis

All statistical analyses were performed with R version 3.0.1 or SPSS version 15.0®. Unless otherwise specified, only participants who had completed all three questionnaires (n = 61) were included in ANCOVA (SPSS), adjusted for age, sex, location, year, and energy intake; nutrient intakes were also adjusted for energy intake. Differences in nutritional intake variables and metabolite levels between SAM/SAH clusters were tested with analysis of covariance (SPSS ANCOVA), adjusting for age and sex, with a significance threshold of 0.05. To determine whether participants’ metabolite levels and Healthy Eating Index scores differed between baseline (first time point) and the end of the camp (second time point), paired t tests were performed for normally distributed variables and Wilcoxon signed-rank tests for non-normally distributed variables, using SPSS. The general linear model for repeated measures was also used in longitudinal analysis to adjust for confounding variables.
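The two paired tests named above summarize within-participant change differently: the paired t test uses the mean of the differences, whereas the Wilcoxon signed-rank test uses the ranks of the absolute differences. A minimal standard-library sketch of the two test statistics follows. It does not compute p-values or test normality (in the study these steps were done in SPSS), and the function names and example values are illustrative.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for within-participant change."""
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1


def wilcoxon_signed_rank(before, after):
    """Wilcoxon signed-rank statistics (W+, W-, n) on nonzero paired differences.

    Zero differences are dropped and tied absolute differences get their
    average rank.
    """
    d = [a - b for b, a in zip(before, after) if a != b]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return w_plus, w_minus, len(d)


# Hypothetical paired measurements at time points 1 and 2 (not study data)
tp1 = [1.0, 2.0, 3.0, 4.0]
tp2 = [2.0, 4.0, 3.0, 8.0]
print(paired_t(tp1, tp2))
print(wilcoxon_signed_rank(tp1, tp2))
```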

Although 1M SNPs were measured, the analysis focused on SNPs within genes with a known functional association with the metabolites measured in the study. These genes were identified by mining the MetaCore database (version 6.10 build 31731) for all genes with a direct functional connection to organic micronutrients. This yielded 275 unique genes (S2_Gene_Annotate_Net), represented by 9515 SNPs on the SNP array. These 9515 SNPs were further filtered to remove those with a minor allele frequency below 0.1 and those that deviated significantly from Hardy–Weinberg equilibrium, leaving a final starting set of 4122 SNPs. Associations between genotype at each of these SNPs and SAM/SAH levels (averaged across the three time points) were assessed using generalized estimating equations (GEE), as implemented in the geese function of the geepack R library (Højsgaard et al. 2006). The genetic analysis used one genotyping data set per individual (16 individuals attended both years of the camp). In the GEE models, SAM/SAH was modeled as a function of genotype at each SNP locus, adjusted for age, gender, and mean total grain intake, with sibling relationships among participants (9 pairs and 1 trio) included as a background correlation structure. A significance threshold of 0.1 was applied after correcting for multiple testing with the Benjamini and Hochberg method (Benjamini and Hochberg 1995).
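The two SNP filters described above can be sketched from genotype counts. This is an illustration only: the text does not state which Hardy–Weinberg test or significance cutoff was used, so the chi-square goodness-of-fit test (1 degree of freedom) and the cutoff of 0.001 below are assumptions; exact tests are often preferred for small samples.

```python
import math


def allele_freq(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele A from counts of AA, AB, and BB genotypes."""
    n = n_aa + n_ab + n_bb
    return (2 * n_aa + n_ab) / (2 * n)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions (1 df)."""
    n = n_aa + n_ab + n_bb
    p = allele_freq(n_aa, n_ab, n_bb)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((obs - exp) ** 2 / exp
               for obs, exp in zip((n_aa, n_ab, n_bb), expected))
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(counts, maf_min=0.1, hwe_alpha=1e-3) -> bool:
    """Keep a SNP if MAF >= maf_min and it does not deviate from HWE.

    The MAF filter runs first; with maf_min > 0, monomorphic SNPs (which
    would have zero expected counts) never reach the chi-square step.
    """
    p = allele_freq(*counts)
    if min(p, 1 - p) < maf_min:
        return False
    return hwe_chi2_pvalue(*counts) >= hwe_alpha


# Hypothetical genotype counts (AA, AB, BB) for three SNPs
for counts in [(25, 50, 25), (50, 0, 50), (95, 5, 0)]:
    print(counts, passes_qc(counts))
```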

Proteomic data were comprehensively collected only at the first time point, so longitudinal analysis was not performed on this data set. Because many plasma proteins had outlier samples, robust linear regression [using the lmFit and eBayes functions from the limma R library (Smyth 2005)] was used to identify proteins associated with age group, sex, and body mass index (BMI). Fitted linear models controlled for age, sex, year, and location, and a significance threshold of 0.1 was applied after correcting for multiple testing with the Benjamini and Hochberg method (Benjamini and Hochberg 1995). Associations between proteins and plasma vitamins, SAM, SAH, and SAM/SAH were identified using sparse partial least squares (sPLS), as implemented in the spls function of the mixOmics R library (Lê Cao et al. 2009). Whereas robust regression identified proteins associated with single outcome variables, sPLS allowed broader analysis of correlation patterns between multiple proteins and multiple metabolites. Because sPLS was used for exploratory analysis of covariance patterns in the data set, variable selection kept the top 50 variables in both dimensions; i.e., a lasso penalty was imposed that shrank all but 50 variable weights in each dimension to zero (Kim-Anh et al. 2008).
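Both the SNP and the protein analyses apply the Benjamini and Hochberg (1995) step-up procedure with a threshold of 0.1. A minimal sketch of BH-adjusted p-values follows; it is intended to give the same values as R's p.adjust(p, method = "BH") but is not the code used in the study, and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, fdr=0.1):
    """Indices of tests whose BH-adjusted p-value is at or below the FDR threshold."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q <= fdr]


# Hypothetical raw p-values from five per-SNP (or per-protein) tests
raw = [0.01, 0.04, 0.03, 0.005, 0.6]
print(bh_adjust(raw))
print(significant(raw, fdr=0.1))
```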

Data mining

Proteins correlated with age, sex, or metabolites were annotated primarily by manual curation using Online Mendelian Inheritance in Man (OMIM; http://www.omim.org/) or published reports. Functional analysis of proteins with high similarity in hierarchical clustering dendrograms (“branches” of the dendrograms displayed in the figures) was performed with the Autoanalyze algorithm of the MetaCore™ system version 6.15 build 62452 (Thomson Reuters GeneGO). Although small gene clusters are not ideal for drawing broader biological conclusions from overrepresentation analysis, we used functional overrepresentation analysis to determine whether genes that clustered together in hierarchical clustering also tended to share functional roles (as evidenced by significantly overrepresented pathways). SNPs/genes within a cluster often mapped to different regions of the same chromosome or to different chromosomes (Fig. 2). For the same reason, we extended the analysis to protein clusters (Fig. 3) to determine whether proteins that clustered together were functionally related. Although all such analyses are limited by the knowledge represented in databases (i.e., publication bias), this strategy extends single-gene/protein analysis to functional pathways. We provide both single-gene/protein annotations and the functional pathway data (see tabs in the Supplementary EXCEL files for each figure).

Results

Demographic and anthropometric data

A total of 108 participants attending the summer day camp over the 2 years were initially recruited for the baseline analysis, and samples from 105 were analyzed at baseline. Nutritional status based on the CDC 2000 BMI classification showed a high prevalence of overweight and obesity in this population: 60 % of participants were healthy weight, 13.3 % were overweight, and 26.7 % were obese (Table 1). Most participants were older than 9 years, and more females than males attended the camp. A higher percentage of participants completed all three assessments in 2010 (66.7 %) than in 2009 (33.3 %).

Table 1 Demographic and anthropometric data

Dietary intake assessments

Food intake results were based on the mean of three 24-h recalls (one per assessment period), because single food intake measurements are less reliable than repeated measures. Food intake during the camp was not monitored. No significant differences in energy or nutrient intake were observed between sexes. Younger (<9 years old) participants reported higher intakes of riboflavin (p = 0.014), folate (p = 0.026), vitamin B12 (p = 0.008), iron (p = 0.042), and vitamin D (p = 0.025) than participants who were ≥9 years old.

Healthy Eating Index (Guenther et al. 2007) scores were also determined from the 24-h dietary intake data (Table 2). Mean HEI scores of participants who completed three 24-h recalls did not differ significantly by age or sex (data not shown). The total HEI score for this population was below 51, indicative of a poor diet (McCabe-Sellers et al. 2007). Relative to the 2005 Dietary Guidelines for Americans, total dark green and orange vegetables and legumes (TDGOVL), whole fruit (WF), and whole grains (WG) had the lowest scores among our participants (Table 2). BMI was negatively associated with total fruit, whole fruit, and milk and positively associated with saturated fat; however, after adjustment for age and sex, only the milk component remained correlated with BMI (p = 0.04).

Table 2 Healthy Eating Index scores for all components

In separate longitudinal analyses to assess whether the food provided at the camp affected HEI components, 24-h dietary intake data at baseline (time point 1) and at the end of camp (time point 2) were compared with the paired t test and the Wilcoxon signed-rank test. Seventy-three participants had data at both time points. Total fruit (p < 0.001), whole fruit (p = 0.001), and whole grain scores (p = 0.02) improved after the camp in both years. No other component changed significantly.

Plasma and erythrocyte metabolite levels

Depending on the normality of the data, the mean and standard deviation or the median and range for each plasma metabolite are provided in Table 3. ANCOVA was used to model the effects of sex, age, year, and/or location on plasma vitamin levels. Results of the analysis of individual metabolites and comparison with published literature are provided in Supplement 1. The key results are:

Table 3 Plasma vitamin levels
  • Vitamin A. Levels of vitamin A in this population were significantly above those reported in NHANES [36.4 (35.6–37.2)] (CDC 2012). Plasma vitamin A was negatively associated with HEI (r = −0.3; p = 0.01) and positively correlated with homocysteine (r = 0.51; p < 0.001).

  • Vitamin D. The population average level of vitamin D in the participants in this study was 21 ng/ml, which is just below the 50th percentile of the 2012 NHANES data (CDC 2012). However, 55 % (39 of 70) had vitamin D plasma levels below 20 ng/ml and one participant had a value below 7 ng/ml. The youngest participants (<9 years old) in the present study had higher mean vitamin D plasma levels compared with older participants (p = 0.005).

  • Folate. The standard cutoff for low plasma folate, <3 ng/mL, is based on microbiological assays (Raiten and Fisher 1995). In our study, which used LC/MS methods, 96.3 % (26 of 27) of the participants measured were below this cutoff (Table 4). In addition, younger participants had higher plasma folate levels than older participants (p = 0.025).

    Table 4 Metabolite concentrations and nutrient intakes between SAM/SAH clusters
  • Thiamin. 70 % (n = 49 out of 70) of the participants in this study (Table 3) had values lower than the reference value of 1.6 μg/dl (Lynch and Young 2000). Thiamin plasma levels were positively correlated with intakes of total dark green and orange vegetables and legumes (r = 0.3; p = 0.018). Additional longitudinal analyses are in Supplement 1.

  • Riboflavin. Younger participants had higher levels of riboflavin than older participants (p < 0.001). Plasma riboflavin levels were also higher in females than in males [0.34 µg/dl (0.04–3.67) versus 0.18 µg/dl (0.05–0.69); p = 0.002].

  • Pyridoxine and pyridoxal. Plasma levels of pyridoxal and pyridoxine in these participants (Table 3) were high compared with the limited number of reports measuring these metabolites in children and teens (Midttun et al. 2005; Footitt et al. 2012). Plasma pyridoxal levels were correlated with total fruit intake (r = 0.35; p = 0.006) and inversely correlated with homocysteine levels (Pearson: r = −0.46; p < 0.001).

  • Vitamin E. The participants in this study (ages 6–14) had higher levels of vitamin E compared to NHANES 2012 data (Table 3). Plasma vitamin E was not associated with any HEI components. Average vitamin E plasma levels from years 1 and 2 decreased between time point 1 and time point 2 (Supplement 1).

  • Homocysteine. All participants had Hcy levels below 15 µmol/l (the upper reference limit), corroborating the results of Pfeiffer et al. (2005). For those who had two or three data points (total of 76), Hcy was positively correlated with (1) mean plasma vitamin A (Pearson: r = 0.51; p < 0.001), (2) mean plasma riboflavin (Spearman: r = 0.28; p = 0.020), (3) mean erythrocyte SAM (Pearson: r = 0.59; p < 0.001), and (4) the erythrocyte SAM/SAH ratio (Spearman: r = 0.49; p < 0.001). Hcy was negatively correlated with (1) mean plasma vitamin E (Spearman: r = −0.36; p = 0.001) and (2) mean plasma pyridoxal (Pearson: r = −0.46; p < 0.001). These metabolites remained significantly associated with Hcy after adjustment for age and sex. Bates et al. (2007) also found a negative correlation between vitamin B6 and Hcy.

  • Erythrocyte SAM, SAH, and SAM/SAH. Although SAM and SAH concentrations are often measured in plasma (Poirier et al. 2001c; van Driel et al. 2009), levels of these metabolites are higher in erythrocytes (Poirier et al. 2001b; Smulders et al. 2007; Hirsch et al. 2008) and may be less affected by physiological processes that induce cell death or turnover. The mean SAM/SAH ratio across time points was used because the ratio did not change with or without controlling for confounding variables such as the improved whole fruit, total fruit, and whole grain intake at time point 2. Mean concentrations of erythrocyte SAM and SAH were 0.97 and 0.88 μmol/L, respectively, in 61 participants with at least three assessments per year. The mean erythrocyte SAM/SAH ratio in this study (0.98) was considerably lower than the ratios of 2–8 reported in the literature (Poirier et al. 2001a, c; Smulders et al. 2007; Hirsch et al. 2008; Dominguez-Salas et al. 2013). These differences may be due to measurement techniques, age, ancestral background, or environmental factors that influence the SAM/SAH ratio.
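The reported mean SAM/SAH ratio (0.98) is not the ratio of the mean concentrations (0.97/0.88 ≈ 1.10). If, as the wording suggests, the ratio was computed per participant and then averaged, the two quantities need not agree. The toy calculation below uses hypothetical concentrations (not study data) to show the difference.

```python
from statistics import mean

# Hypothetical erythrocyte concentrations (umol/L) for two participants;
# illustrative values only, not data from this study.
sam = [0.4, 1.5]
sah = [0.6, 1.2]

ratio_of_means = mean(sam) / mean(sah)                  # 0.95 / 0.90
mean_of_ratios = mean(s / h for s, h in zip(sam, sah))  # mean of 0.667 and 1.25

print(f"ratio of means: {ratio_of_means:.3f}")
print(f"mean of ratios: {mean_of_ratios:.3f}")
```

Which summary is appropriate depends on the question: the mean of per-participant ratios describes the typical individual, whereas the ratio of means describes the pooled concentrations.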

SAM/SAH ratio: groups and diet influence

The distribution of mean SAM/SAH ratios in the 61 participants was further analyzed with the K-means clustering algorithm. Significantly distinct SAM/SAH clusters (distance between cluster centers 0.991; Fig. 1) were found at k = 2, but not at k = 3, 4, or 5. Cluster 1 (C1) consisted of 10 participants with higher SAM levels and SAM/SAH ratios than the 51 participants in cluster 2 (C2) (Fig. 1; Table 4). The male-to-female ratio differed (p = 0.021) between C1 (7 males, 3 females) and C2 (16 males, 35 females), consistent with sex differences in SAM/SAH ratios reported in adults (Poirier et al. 2001a; Smulders et al. 2007). C1 participants were younger (age 8.2 ± 1.39 vs. 9.7 ± 2.3 years; p = 0.049) and more often came from the first year of the study, whereas C2 had more participants from the second year (p = 0.004). Body mass index and weight status (normal, overweight, obese) were similar between clusters (not shown).
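K-means with k = 2 on a single variable, such as each participant's mean SAM/SAH ratio, alternates between assigning each value to the nearest center and recomputing each center as its cluster mean (Lloyd's algorithm). The sketch below is an illustration, not the SPSS procedure used in the study: initializing the centers evenly across the sorted values is an assumption (SPSS uses its own initialization), and the example ratios are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's k-means for one-dimensional data; returns (centers, labels)."""
    if k < 2 or k > len(values):
        raise ValueError("need 2 <= k <= number of values")
    xs = sorted(values)
    # Assumed initialization: centers spread evenly over the sorted values
    centers = [xs[round(i * (len(xs) - 1) / (k - 1))] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for each value
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: mean of each cluster (keep old center if cluster is empty)
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical mean SAM/SAH ratios (not study data)
ratios = [0.7, 0.8, 0.9, 2.0, 2.2]
centers, labels = kmeans_1d(ratios)
print(centers, labels, abs(centers[1] - centers[0]))
```

The last value printed is the center-to-center distance, the same kind of quantity reported for Fig. 1.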

Fig. 1

K-means clustering of SAM/SAH ratios. SAM/SAH was analyzed with the SPSS K-means clustering program. Only K = 2 yielded significant differences between the groups, with a distance of 0.991 between the centers of cluster 1 and cluster 2.

Plasma vitamin A levels also differed significantly between C1 and C2 (Table 4), and higher vitamin A was correlated with higher SAM/SAH (r = 0.30; p = 0.02). Differences in homocysteine levels between the two groups approached significance (p = 0.053; Table 4), and higher (though still normal) plasma homocysteine was correlated with higher SAM/SAH (r = 0.42; p = 0.001). Plasma pyridoxal (r = −0.37; p = 0.003) and vitamin E (r = −0.26; p = 0.046) levels were negatively correlated with the SAM/SAH ratio. Other differences in metabolite levels between C1 and C2 did not reach significance at α = 0.05.
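The correlations reported here and in the metabolite summaries use Pearson's r for normally distributed variables and Spearman's rank correlation otherwise. Spearman's coefficient is Pearson's r computed on ranks, with tied values given their average rank. A minimal standard-library sketch follows (no p-values; function names and example values are illustrative, not from the study's software).

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired values: a monotonic but nonlinear relationship
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

For a monotonic but nonlinear relationship like this one, Spearman's coefficient is exactly 1 while Pearson's r is slightly below 1.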

Analysis of mean 24-h dietary intakes showed that participants in C1 consumed, on average, significantly more vitamin A (retinol equivalents), thiamin, iron, β-carotene, and energy than participants in C2 (Table 4). Sixteen individuals attended the camp in both years; of these, 14 had lower SAM/SAH ratios in year 2 than in year 1 (Table 5). In these participants, levels of all metabolites except SAH and vitamin D also changed significantly between year 1 and year 2, whereas changes in vitamin B12 and saturated fat intake reached only trend level. Others have noted changes in SAM/SAH ratio due to changes in diet and environment (Dominguez-Salas et al. 2013).

Table 5 Variables in 14 individuals that changed from higher SAM/SAH in year 1 to lower SAM/SAH in year 2

Genotype analysis

Although the SAM/SAH ratio correlated with different dietary patterns (Table 4) and with indications that specific nutrients affect methylation potential, genetic variation may also contribute to the observed plasma and erythrocyte metabolite levels. Bottom-up analysis of single-gene variants (e.g., SNPs) is unlikely to explain a complex phenotype such as the relationship between SAM and SAH. Genome-wide association analysis, a top-down approach, typically requires sample sizes much larger than the number of participants in this study. Hence, the association of genotype with the SAM/SAH ratio was analyzed with a middle-out approach, an emerging strategy that analyzes a predetermined subset of high-dimensional data limited to a system of interest (Radulescu et al. 2008; de Graaf et al. 2009; Panteleev et al. 2010; Secomb and Pries 2011). Specifically, we matched the genetic system to be analyzed to the plasma and erythrocyte metabolites measured in this study, since many are involved in the one-carbon pathway (folate, pyridoxal/pyridoxine, thiamin) or were correlated with SAM/SAH (vitamins A and E, homocysteine). Commercial software and a database (GeneGO MetaCore) were used to identify 275 genes in pathways and networks involved in the metabolism of the 11 metabolites measured in this study (Fig. 2 and gene list tab in Supplement 2). These are referred to as the micronutrient neighborhood genes.

Fig. 2

Heatmap of significant SNPs associated with SAM/SAH ratios. SNPs statistically associated with SAM/SAH ratios (left axis, ordered from high to low SAM/SAH) after correction for multiple comparisons were identified using the procedures described in “Methods.” Two hundred sixty-seven (267) genes were used for genetic analysis (Supplements 3 and 5)

The genotyping platform used in this study included 9515 SNPs in 268 of these 275 genes. Of the 9515 SNPs, 4122 were selected for analysis based on preprocessing criteria (see “Methods”). Significant correlations between genotype and ratio of SAM/SAH levels were assessed in each SNP using generalized estimating equations (GEE), wherein SAM/SAH was modeled as a function of genotype at each SNP locus, adjusted by age, gender, mean total grain intake, and sibling relationships among the participants (the latter being included as a background correlation structure in the GEE models). The raw ratio of SAM/SAH was used instead of the SAM/SAH cluster memberships due to the higher information content of continuous variables. Resulting p values were corrected for multiple testing using the procedure proposed by Benjamini and Hochberg (2000). Forty-six SNPs in 25 genes (annotate tab in Supplement 2) were found to be associated with the SAM/SAH ratio. Two SNPs were not assigned to genes in public databases.

The statistically significant genes identified were involved in organic ion transport (e.g., ABCC4) and other transport systems (e.g., SLC1A1, SLC28A3, SLC29A3) and in micronutrient metabolism pathways (e.g., ALDH1A3, BHMT), including genes involved in vitamin A (BCMO1, RDH5) and vitamin B6 (PDXK) metabolism. Wernimont et al. (2011) analyzed associations of plasma Hcy levels and Alu and LINE-1 DNA methylation status with 330 polymorphisms in 52 genes directly involved in SAM/SAH metabolism in an elderly population. None of the statistically significant SNPs found in that study overlapped with those reported here, which may be due to differences in genetic makeup, dietary intake, and age between the participants of the two studies.

The MetaCore™ functional analysis tools were used to determine the gene ontology processes of genes that clustered in branches of the heatmap (red arrows, Fig. 2). Within the five major clusters, the statistically significant SNPs/genes pointed to cell signaling, energy metabolism, negative insulin regulation, ion and lipid transport, and cell adhesion as the main functions of the 25 identified genes (Net tab in Supplement 2). Hence, the middle-out approach identified genes within a wider SAM/SAH system than that defined solely by the one-carbon and methylation pathways.

Proteins correlated with plasma metabolite levels

DNA aptamer technology (Zichi et al. 2008; Gold et al. 2010) was used to measure 1,129 proteins in the plasma of 61 participants at baseline. Sparse partial least squares analysis identified 100 protein aptamers (99 unique proteins) associated with baseline plasma Hcy, vitamin A, riboflavin, vitamin E, thiamin, and pyridoxal and with erythrocyte SAM, SAH, and SAM/SAH levels (Fig. 3; Annotate tab in Supplement 3). Although the Pearson correlation coefficients for individual protein–metabolite pairs were modest (between −0.5 and +0.5), many proteins showed similar correlations with Hcy, vitamin A, and riboflavin and inverse correlations with thiamin and pyridoxal. These proteins participate in a wide variety of metabolic, neuronal, immune, growth, and developmental processes (Fig. 3), and each is annotated in Supplement 3 (Net tab).

Fig. 3

Heatmap of proteins most strongly correlated with metabolites. DNA aptamer technology was used to analyze 1,129 proteins in the plasma of 107 samples from all time points. Analysis was done at SomaLogic (Boulder, CO). A complete set of data was available for participants at time point 1. See “Methods” for statistical analysis procedures. Annotation of each protein was done with OMIM or published reports and is provided in Supplement 1. Plasma membrane and soluble proteins refer to proteins expected to be in the plasma. Functional analysis of proteins with similar correlation coefficients (branches marked by red arrows above the heatmap) was performed using GeneGO MetaCore™ programs for identifying gene ontologies (GO) (S2_Gene_Annotate_Net)

Cluster analysis showed that ~85 % of the proteins that correlated strongly with plasma metabolites (as opposed to erythrocyte metabolites) were intracellular proteins, presumably released from damaged cells (right branch of the heatmap). In contrast, 72 % of the proteins predicted to be present in plasma were in the left branch. The significance of these cluster results will require further study but may be related to the developmental reprogramming of tissues expected in individuals of this age range. Cole et al. (2013) analyzed 4,705 proteins using iTRAQ mass spectrometry and identified proteins strongly correlated with vitamin A (retinol binding protein 4), 25-hydroxyvitamin D (vitamin D binding protein), α-tocopherol (apolipoprotein C-III), copper (ceruloplasmin), and selenium (selenoprotein P isoform 1) in Nepalese children aged 6–8 years. None of these protein–micronutrient correlations were found in the study reported here, a difference that may be due to dissimilarities in genetic makeup and environment between the two study populations.

MetaCore™ data mining of proteins with closely related correlation coefficients (i.e., with similar correlation patterns, designated by red arrows above the heatmap) identified functional subnetworks by mapping to gene ontology (GO) terms (Net tab in Supplement 3). For example, the first branch (NCR2, MAP2K1, HBA/HBB, MAP2K4, MAP2K1) was associated with lower levels of erythrocyte SAM and plasma Hcy and was significantly enriched for response to hormone stimulus (p = 8.992 × 10⁻⁴⁸) and response to organic nitrogen (p = 1.009 × 10⁻⁴³; Net tab in Supplement 3). A second branch (KLK6, HIBADH, NR1D1, NCK1, PDGFC, FGF7, DLL4, LGALS2, PTEN) correlated with low plasma vitamin A and riboflavin and was also associated with responses to hormones and organic nitrogen. Results for the remaining branches are provided in Supplement 3. Many individual soluble and membrane proteins have relatively strong positive associations with erythrocyte SAM levels (Fig. 3), consistent with the central role of SAM in methylation reactions across diverse pathways and functional processes. The breadth of biological processes associated with vitamins and metabolites (Hcy, SAM, SAH) is not unexpected, since vitamins are cofactors for a large and diverse set of enzymes (Ames et al. 2002).
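GO enrichment p-values such as those quoted for the first branch are commonly computed with a one-sided hypergeometric test: given a background of N genes of which K carry a GO term, how surprising is it that k of the n genes in a branch carry it? The sketch below assumes that formulation; it is not MetaCore's internal implementation, which may differ (for example in the background set or in any multiple-testing adjustment), and the numbers in the example are hypothetical.

```python
from math import comb


def go_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background universe
    K: background genes annotated to the GO term
    n: genes in the query set (e.g., one heatmap branch)
    k: query genes annotated to the GO term
    """
    # Sum the hypergeometric probabilities from k up to the largest possible overlap.
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 3 of a 4-gene branch carry a term held by 5 of 20 genes.
print(go_enrichment_p(3, 4, 5, 20))
```

Because `math.comb` works with exact integers, the same function handles genome-sized backgrounds (N of roughly 20,000) and still returns very small p-values of the order reported above without overflow.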

Proteins correlated with physiological and anthropometric variables

The availability of quantitative blood protein levels provided an opportunity to discover correlations with other physiological and anthropometric variables. Robust linear regression identified fourteen proteins associated with BMI (Table 6), including proteins involved in appetite regulation (leptin and ghrelin) and inflammatory processes (CRP, LGALS3BP, CD70, and APCS). In adolescents, leptin levels have been positively associated with BMI (Chu et al. 2001; Hansen et al. 2010) and ghrelin levels negatively associated with BMI (Stylianou et al. 2007). In this study, however, ghrelin was positively associated with BMI. Because ghrelin levels are high before meals (Moran 2009) and the samples in this study were taken in the fasting state, this result suggests that the acute appetite stimulus was highest in the high-BMI participants.

Table 6 Proteins associated with BMI
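The specific robust estimator is not named in this section. As a hedged illustration, the sketch below assumes a Huber M-estimator fitted by iteratively reweighted least squares (IRLS), a common form of robust linear regression; the tuning constant k = 1.345, the median-based scale, and the toy data are assumptions, not the study's settings. It shows why such a fit suits small samples: one extreme value cannot drag the slope as it does in ordinary least squares.

```python
from statistics import median


def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b


def huber_line(x, y, k=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of y = a + b*x by iteratively reweighted least squares."""
    a, b = wls_line(x, y, [1.0] * len(x))  # start from the ordinary least-squares fit
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual rescaled to a normal SD.
        scale = max(median([abs(r) for r in resid]) / 0.6745, 1e-12)
        # Full weight inside k*scale, down-weighted in proportion to 1/|r| outside it.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        a_new, b_new = wls_line(x, y, w)
        if abs(a_new - a) < tol and abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b


# Hypothetical data: a true slope of 2 with small noise and one gross outlier.
x = list(range(10))
y = [2 * xi + 1 + (0.5 if xi % 2 == 0 else -0.5) for xi in x]
y[9] = 60.0

print("OLS slope:  ", round(wls_line(x, y, [1.0] * len(x))[1], 2))
print("Huber slope:", round(huber_line(x, y)[1], 2))
```

For real analyses a tested library routine, such as the statsmodels `RLM` class with the `HuberT` norm, would be preferable to this hand-rolled loop.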

Correlation analysis identified 49 proteins associated with age group and 30 proteins correlated with sex (Supplement 5). Mapping these protein sets to function revealed differences between females and males in glucose transport (through FASLG, PGK1, IDS) and insulin metabolic processes (through IDE). Protein levels suggest that these subsystems may be more active in males than in females, a finding consistent with the different metabolic trajectories induced by puberty (DiVall and Radovick 2009; Lewis and Lee 2009), including processes involved in lipid metabolism and fat deposition [e.g., (Staiano and Katzmarzyk 2012; Shapira 2013)]. Owing to space limitations, the figures and analysis of these proteins are provided in Supplement 4.

Discussion

Experimental design and main results

Data from the sequencing of the human genome were expected to enable more comprehensive analyses of physiology and of individual responses to drugs, nutrition, and lifestyle factors, as well as personalized healthcare. However, many human experiments continue to use case–control designs, which implicitly assume that individuals randomized to the case (intervention) group and to the control group are genetically identical, with similar, if not identical, diets and lifestyles. The HapMap (The International HapMap 2005; Frazer et al. 2007), Human Genome Diversity (http://www.hagsc.org/hgdp/), and 1000 Genomes (Durbin et al. 2010) projects, together with every published whole-genome sequence [references in (Olson 2012)], have demonstrated that individuals are genetically unique. Metabolomic, proteomic, and clinical data also demonstrate biochemical individuality (Williams 1956; Robinette et al. 2012).

The experimental design of the observational study reported here provided the same dietary and physical environment to all participants in a community-based summer day program for 6- to 14-year-olds. This design accounts for heterogeneity in genetic makeup, individuality in metabolites and proteins, and dietary differences by measuring these parameters and aggregating the data for population-level analysis (this report) and for group-level analysis (SAM/SAH clusters in this report and Morine, Monteiro et al. submitted).

Discovery-based methods identified two groups of participants that differed significantly in SAM/SAH ratio. A sex difference in SAM/SAH levels was also found, consistent with results published by others (Poirier et al. 2001c; Van Hecke et al. 2008). Average intakes of energy, thiamin, iron, β-carotene, and vitamin A (RE) were higher in participants in the high SAM/SAH group than in those in the low SAM/SAH group. Longitudinal analysis of 14 participants who attended the summer day camp in both years of the study indicated that changes in saturated fat intake, vitamin B12 intake, and metabolite levels might have contributed to differences in SAM/SAH ratio. Hence, SAM/SAH may be a marker of nutritional status, a conclusion consistent with previously published reports with larger numbers of participants (Poirier et al. 2001a; Barbosa et al. 2008; Dominguez-Salas et al. 2013). Others showed that the SAM/SAH ratio correlated with differences in methylation at metastable epialleles that depended on season and food availability (Waterland et al. 2010). Changes in epigenetic programming at critical developmental windows, such as in utero, early childhood, or puberty, have been associated with developmental plasticity, health, and susceptibility to chronic diseases in adulthood (Barker et al. 1993; Gluckman et al. 2009; Kussmann et al. 2010). The ability to identify plasma proteins involved in inflammatory or other metabolic processes that respond to nutritional interventions may lead to diets that improve the SAM/SAH ratio and associated physiological processes.
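The discovery-based grouping procedure itself is not detailed in this section. As a simplified stand-in only, the sketch below assumes that participants are split into two groups on the SAM/SAH ratio alone by one-dimensional k-means with k = 2, which can be solved exactly because optimal one-dimensional clusters are contiguous in sorted order. The SAM and SAH values are hypothetical, and the function names are illustrative.

```python
def sam_sah_ratio(sam, sah):
    """Element-wise SAM/SAH ratio; both inputs must use the same units."""
    return [s / h for s, h in zip(sam, sah)]


def two_group_split(values):
    """Exact 1-D two-group split (k-means with k = 2).

    In one dimension the optimal clusters are contiguous in sorted order, so
    trying every cut point and keeping the lowest within-group sum of squares
    finds the global optimum.
    """
    v = sorted(values)

    def sse(group):
        mean = sum(group) / len(group)
        return sum((g - mean) ** 2 for g in group)

    cut = min(range(1, len(v)), key=lambda i: sse(v[:i]) + sse(v[i:]))
    return v[:cut], v[cut:]


# Hypothetical erythrocyte SAM and SAH values for five participants.
sam = [2.0, 2.2, 2.4, 6.0, 6.4]
sah = [2.0, 2.0, 2.0, 2.0, 2.0]
low, high = two_group_split(sam_sah_ratio(sam, sah))
print("low SAM/SAH:", low, "high SAM/SAH:", high)
```

Once groups are defined, dietary intakes such as energy, thiamin, or vitamin A can be compared between them, which is the kind of contrast summarized in the paragraph above.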

To determine whether genotypic differences were also associated with SAM/SAH levels, we used a middle-out genetic analysis. Specifically, existing pathway and network knowledge was mined to identify a micronutrient-related neighborhood of 275 genes whose protein products interact with, regulate, or metabolize the micronutrients measured in the study. Forty-six SNPs in 25 genes were significantly associated with differences in the SAM/SAH ratio after correction for multiple comparisons. Expanding beyond single-gene [e.g., (Lee et al. 2011)] or single-pathway [e.g., (Kiyohara et al. 2006; Kelemen et al. 2008; Wernimont et al. 2011; Signorello et al. 2011)] analyses allows a broader interrogation of the system interacting with the measured metabolites. The discovery-based middle-out strategy also avoided a pitfall of genome-wide association studies: a large multiple-testing burden arising from the many SNPs (and hence statistical tests) that may not be core to the study question.
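The multiple-testing argument can be illustrated with two standard corrections. This section does not state which correction the study applied, so the sketch simply shows both: Bonferroni, which divides α by the number of tests, and the Benjamini–Hochberg false discovery rate procedure. The p-values and the 2,000-SNP panel size are hypothetical. Restricting testing to a gene neighborhood shrinks the number of tests m and therefore relaxes the per-test threshold compared with the conventional genome-wide 0.05/10⁶ = 5 × 10⁻⁸.

```python
def per_test_threshold(alpha, n_tests):
    """Bonferroni per-test significance threshold."""
    return alpha / n_tests


def bonferroni(pvalues, alpha=0.05):
    """Indices of tests significant after Bonferroni correction (FWER control)."""
    cutoff = per_test_threshold(alpha, len(pvalues))
    return [i for i, p in enumerate(pvalues) if p <= cutoff]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure (FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value clears rank/m * alpha.
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values for eight SNPs.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print("Bonferroni:", bonferroni(pvals))
print("BH:", benjamini_hochberg(pvals))
# Per-test threshold for ~1 million GWAS tests versus a hypothetical 2,000-SNP panel.
print(per_test_threshold(0.05, 1_000_000), per_test_threshold(0.05, 2_000))
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini–Hochberg tolerates a controlled fraction of false discoveries and typically retains more hits, as the second SNP in the example shows.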

In addition, a phenotype may be caused by genetic contributions to many different metabolic processes (Kaput 2008). For example, individuals of the same age or sex had different proteomic profiles (Supplements 4 and 5), and individuals with identical SAM/SAH ratios differed in genotype at one or more of the statistically significant SNPs identified in this study (Fig. 2). The genetic contribution to a complex phenotype is therefore dispersed among many genes, most likely with different population-average effect sizes (Peltonen and McKusick 2001). The effect size of the same variant at a given locus may differ between individuals because of epistatic and gene–environment interactions (Williams 1956; Olson 2012). Any given protein or DNA marker therefore needs to be interpreted in conjunction with other variants in the genome, and similarities in genotype (and/or metabolite and/or proteomic) patterns may be more informative than any single genetic variant, even after correction for multiple comparisons and even in a highly powered study. The effects of diet on SAM/SAH were analyzed separately from the effects of genotype on SAM/SAH, although the ratio was the common variable in both analyses. The sample size was too small to estimate the interaction term and the effect size for each nutrient–SNP combination.
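To make the unestimated interaction term concrete, a generic gene–environment model for one SNP and one nutrient can be written as below. This is a textbook formulation, not a model fitted in this study; coding the genotype G_i as a minor-allele count (0, 1, 2) and the nutrient intake as E_i are assumptions for illustration.

```latex
\begin{aligned}
\left(\frac{\mathrm{SAM}}{\mathrm{SAH}}\right)_i
  &= \beta_0 + \beta_G\, G_i + \beta_E\, E_i + \beta_{GE}\, G_i E_i + \varepsilon_i,\\[4pt]
\frac{\partial\, (\mathrm{SAM}/\mathrm{SAH})_i}{\partial E_i}
  &= \beta_E + \beta_{GE}\, G_i .
\end{aligned}
```

The second line shows why the effect of a nutrient can differ between individuals: it depends on genotype through β_GE. Estimating one such coefficient for every pairing of q SNPs with p nutrients requires p·q interaction terms (for example, 46 SNPs and a hypothetical 10 nutrients give 460), which quickly outnumbers the participants available in a study of this size.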

These data and those of others (Waterland et al. 2010; Dominguez-Salas et al. 2013) are cautionary examples for genetic association and metabolomics studies: using metabolite levels without knowledge of usual dietary intakes may result in misclassification of individuals, thereby affecting association analyses. Lack of dietary intake data (and the genetic heterogeneity added by large sample sizes) may be among the many reasons for the “missing heritability” (Hardy and Singleton 2009; Manolio et al. 2009; Hebebrand et al. 2010; Ober and Vercelli 2011) in genetic association studies. Interpreting the differences in metabolite levels was made possible by assessing nutrient intakes, in this case with 24-h dietary recalls. Although dietary assessments have been criticized (Thompson et al. 2010) and are not routinely used in genome-wide studies, any validated intake data are better than no information about environmental conditions.
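A short calculation shows how misclassification weakens an association. Assume individuals are labelled "low status" from a single metabolite measurement, and that the labelling has the same sensitivity (Se) and specificity (Sp) in every genotype group (nondifferential misclassification). The observed proportion labelled low is then Se·p + (1 − Sp)(1 − p), so the observed between-genotype difference equals (Se + Sp − 1) times the true difference and is biased toward zero. The sensitivity, specificity, and proportions below are hypothetical.

```python
def apparent_proportion(true_p, sensitivity, specificity):
    """Proportion classified as 'low status' when classification is imperfect."""
    return sensitivity * true_p + (1 - specificity) * (1 - true_p)


def apparent_difference(p_group1, p_group2, sensitivity, specificity):
    """Observed between-genotype difference under nondifferential misclassification."""
    return (apparent_proportion(p_group1, sensitivity, specificity)
            - apparent_proportion(p_group2, sensitivity, specificity))


# Hypothetical: 40 % vs 20 % truly low status; a single metabolite measurement
# classifies with 80 % sensitivity and 90 % specificity in both genotype groups.
true_diff = 0.40 - 0.20
obs_diff = apparent_difference(0.40, 0.20, sensitivity=0.80, specificity=0.90)
print(round(true_diff, 3), round(obs_diff, 3))  # observed = (0.8 + 0.9 - 1) * true
```

Misclassification that differs between genotype groups (for example, if diet differs by genotype) can bias the estimate in either direction, which is one reason usual intake data help interpret metabolite-based phenotypes.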

Plasma metabolite levels—population-based data aggregation

In addition to the main finding of SAM/SAH groups that differed in dietary intakes, metabolite levels, and genotypes, data were also aggregated for “population”-level analysis. The average diet of the participants was classified as poor, which may affect the observed levels of the individual plasma metabolites and proteins measured in this study relative to studies in populations with different nutritional intakes. For example, total dark green and orange vegetables and legumes (TDGOVL), whole fruit (WF), and whole grains (WG) had the poorest scores among participants in our study relative to the 2005 Dietary Guidelines for Americans. Intake of WG in children and adolescents is not well documented and is subject to methodological questions (Newby et al. 2007; Garden et al. 2011); however, studies conducted before (Forshee and Storey 2003) and after (O’Neil et al. 2011) the release of the 2005 Dietary Guidelines for Americans (DGA) indicated that WG consumption is low in individuals younger than 18 years. Others have found inverse associations between intakes of TDGOVL, WF, and WG and central obesity among adolescents (Bradlee et al. 2010), and, among girls of low socioeconomic status (SES), overweight girls ate fewer servings of fruit than non-overweight girls (Wilson et al. 2009). Lower milk intakes were also associated with being overweight (Wilson et al. 2009) and with a high BMI (Garden et al. 2011). Other studies focused on adults, making it difficult to compare results (French et al. 1999; Ello-Martin et al. 2007).
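This section does not name the scoring instrument, but component scores judged against the 2005 DGA are typically Healthy Eating Index-2005 (HEI-2005) scores; we assume that here. HEI-2005 expresses intake as a density per 1,000 kcal and prorates each component linearly from zero at no intake to its maximum at a standard amount. The standards below are as we recall them from the HEI-2005 documentation and should be verified before use; the intakes are hypothetical.

```python
# Assumed HEI-2005 standards: (maximum points, intake per 1,000 kcal that earns
# the maximum). Values as we recall them; check against the USDA documentation.
HEI2005_STANDARDS = {
    "TDGOVL": (5.0, 0.4),  # dark green/orange vegetables and legumes, cup eq.
    "WF": (5.0, 0.4),      # whole fruit, cup eq.
    "WG": (5.0, 1.5),      # whole grains, ounce eq.
}


def hei_component_score(component, intake, energy_kcal):
    """Density-based score: zero at no intake, prorated linearly up to the standard."""
    max_points, standard = HEI2005_STANDARDS[component]
    density = intake / (energy_kcal / 1000.0)  # amount per 1,000 kcal
    return max_points * min(density / standard, 1.0)


# Hypothetical child: 1,800 kcal, 0.9 oz eq. whole grains, 0.2 cup eq. whole fruit.
print(round(hei_component_score("WG", 0.9, 1800), 2))
print(round(hei_component_score("WF", 0.2, 1800), 2))
```

Because scoring is per 1,000 kcal, a child with a high energy intake but few whole grains still scores low on WG, which is why density scores are informative about diet quality rather than quantity.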

Over 50 % of the participants in this study had low levels of vitamin D, and 70 % were below the recommended range for thiamin. Metabolite–metabolite associations were also detected: Hcy correlated positively with (1) mean plasma vitamin A, (2) mean plasma riboflavin, (3) mean erythrocyte SAM, and (4) the erythrocyte SAM/SAH ratio. The correlation between higher levels of vitamin A and higher levels of homocysteine appears to be novel, and further research will be required to understand its significance. Hcy correlated negatively with (1) mean plasma vitamin E and (2) mean plasma pyridoxal. Bates et al. (2007) also found a negative correlation between vitamin B6 and homocysteine. These associations were further analyzed by discovery-based methods and will be reported elsewhere (Morine, Monteiro et al. submitted).

Correlations of plasma proteins with vitamins, BMI, age, and sex

Robust linear regression analysis identified proteins associated with plasma and erythrocyte metabolites, BMI, age, and sex. These analyses were limited because sufficient samples were available only for time point 1 in both years of the study. The 99 plasma proteins were only modestly associated with levels of individual plasma and erythrocyte metabolites or combinations of them. The results indicate that no single protein could serve as a marker for a plasma metabolite, and specifically for a micronutrient. Additional studies would be needed to test whether combinations of the proteins associated with metabolite levels could serve as markers of micronutrient status. Post-analysis data mining indicated the many biological processes that might be influenced by, or correlated with, changes in plasma levels of these metabolites, including basic metabolic pathways, hormonal responses, and immune, neuronal, and growth processes.

The prevalence of obesity and overweight among the study participants was 26.7 and 13.3 %, respectively, compared with a national average of 33 % overweight and obese among US children aged 6–11 years (http://www.cdc.gov/healthyyouth/obesity/facts.htm). Proteomic analysis indicated that leptin and the inflammatory markers CRP, LGALS3BP, CD70, and APCS were associated with BMI in this population, as in other published reports (Chu et al. 2001; Tam et al. 2010; Hansen et al. 2010). Ghrelin was also positively associated with BMI, which differs from the findings of others (Stylianou et al. 2007). Since ghrelin levels are high before meals (Moran 2009), this result may reflect sampling in the fasted state. Different combinations of proinflammatory markers have been found in overweight or obese children; for example, high CRP levels are associated with obesity, and BMI can predict CRP levels [rev in (Tam et al. 2010)]. Additional studies will be needed to determine whether the other proteins correlated with BMI (Table 6) are specific to the gene–environment interactions that occurred in this population.

Forty proteins differed between younger (<9 years) and older (≥9 years) participants, and 30 between males and females. A noteworthy outcome of this study is that proteomic analysis may allow the determination of biological rather than chronological age, since some younger children clustered with the older children and vice versa (Supplement 4). Manual annotation showed that the plasma and membrane proteins were involved in neuronal, immune, and growth-related processes, which link vitamin homeostasis with basic cellular and physiological processes (Ma et al. 2009; Liu et al. 2013; Swartz et al. 2013; Kedishvili 2013). Age and sex differences in proteins involved in glucagon and insulin signaling and glucose transport suggest that these processes change with age and differ by sex. These results need to be tested with metabolomic and proteomic analyses targeted to these processes in other populations with more participants than in this study. Many of the proteins that differed between males and females would be expected (CGA/FSHB), and others may provide a stimulus to examine early sex differences in neuronal, metabolic, and immune system functions. Other studies have also shown differences in plasma proteomic profiles between males and females [e.g., (Silliman et al. 2013)]. Changes in basic metabolic processes during aging (Staiano and Katzmarzyk 2012; Shapira 2013) and puberty (DiVall and Radovick 2009; Lewis and Lee 2009) make it challenging to create dietary recommendations to maintain health.

Community-based participatory research

The translational study described here used community-based participatory research (CBPR) principles, one of which emphasizes “conducting research that is beneficial to the communities involved” (Kannan et al. 2009), a form of socially engaged nutrition science (Beauman et al. 2005; Cannon and Leitzmann 2005; Schubert et al. 2012). Consultations with community members and frequent meetings with community leaders ensured the active participation of the community and participants. The “standard” CBPR methods (Israel et al. 2005) were extended (McCabe-Sellers et al. 2008) with a discovery-based analytical approach that classifies participants by similar homeostatic profiles and allows analyses of individuals. Although difficult to quantify, this research study raised awareness of the role of nutrition in health among participants and their families through the research activities and community meetings. Further dialogs with the community are planned to report the findings of this research and to continue the dialog on improving health through nutrition and lifestyle choices.

Limitations and reproducibility

Although the sample size of this study was small, significant results were obtained for dietary intakes, metabolite levels, proteins, and genetic associations. Proteomic and genomic data were corrected for multiple comparisons. However, human genetic, cultural, and lifestyle (including dietary intake) variability will make it challenging to replicate these experimental results: specific genotypes (see Fig. 2) and diets (averages shown in Table 2) produce different physiological readouts, as shown by the proteomic analysis (e.g., Supplements 3 and 5). Hence, the results reported here are specific to the genetic makeup of the individuals in the study, their dietary patterns at the time of the study, geographical location, built environment, and the socioeconomic factors that alter their physiology (Hochberg et al. 2011; United Nations Standing Committee on Nutrition 2012). Nevertheless, the results presented here can be integrated with other experimental findings, provided that all reports analyze genetic makeup, diet, and the metadata associated with the experimental setting. Progress in nutrition and health research would be more rapid with the development of harmonized protocols that allow the integration of high-dimensional data sets from different genetic, cultural, and environmental backgrounds.