Background

There is intense clinical and research interest in blood and urinary biomarkers to diagnose disease, to risk stratify individuals for prognosis and potential intervention, and to provide insights into disease pathogenesis [1]. Hence, it has been proposed that biomarkers may prove useful in the goal of developing what has been referred to as "predictive, preemptive, personalized medicine" [2].

In the present analysis, we examined biomarkers involving four biological systems: inflammation, natriuretic peptides, hepatic function, and vitamins. Circulating inflammatory, natriuretic peptides [3–5], hepatic function [6, 7] and vitamin [8] biomarker concentrations have been linked to increased risk of cardiovascular disease and mortality. For instance, the inflammatory marker C-reactive protein (CRP) predicts incident stroke [9], coronary heart disease [10–12], and all-cause mortality [13].

Because of their prognostic importance, there has been interest in understanding the environmental and genetic factors contributing to interindividual variability in systemic biomarker concentrations. Prior reports support the heritability of systemic biomarker concentrations reflecting inflammatory processes [14, 15], natriuretic peptide activation [16], hepatic function [17, 18], and vitamin metabolism [19]. The majority of prior studies examining the genetic contribution to biomarker concentrations have examined genetic linkage or variation in selected candidate genes. Although there have been some successes with both approaches [20], the specific genes contributing to variability of most circulating biomarkers are incompletely understood. In this genome-wide association study (GWAS), we examined the relation of single nucleotide polymorphisms (SNPs) on the Affymetrix 100K chip to variation in systemic biomarker concentrations. The GWAS approach has the advantage that it is not constrained by known physiologic associations.

Materials and methods

Study sample

The biomarkers were assessed in the Framingham Offspring sample, which is described in the Framingham 100K Overview [21]. Briefly, the Framingham Offspring were recruited in 1971–1974 from the children (and children's spouses) of the Framingham Original Cohort [22]. The examinations and the number of participants in which the biomarkers were assessed vary by analyte, as noted in Table 1.

Table 1 Types of traits phenotype master trait table, exam cycle, numbers of participants in family plates with phenotype

Phenotype definitions and methods

Biomarkers were measured on morning specimens collected between 7:30 and 9:00 am after an overnight fast (typically 10 hours). EDTA and citrated blood collection tubes were centrifuged in a refrigerated centrifuge immediately after venipuncture. Serum blood collection tubes sat for 30 minutes after venipuncture to allow for complete clotting. Specimens were processed immediately after centrifugation. Blood samples were centrifuged and frozen at -20°C (examinations 2 through 4) and -80°C (examinations 5 through 7). The measurement of the inflammatory markers is detailed in the inflammatory marker manual at the National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007.

Inflammatory biomarkers (except CRP) were measured in duplicate with commercially available ELISA kits: R&D Systems (intercellular adhesion molecule-1, interleukin-6, monocyte chemoattractant protein-1 [MCP1], P-selectin, tumor necrosis factor receptor 2, high sensitivity tumor necrosis factor-α), Bender MedSystems (CD40 ligand), Oxis (myeloperoxidase), and BIOMEDICA (osteoprotegerin). High-sensitivity CRP was measured in 2002 and 2004 on specimens from examination cycles 2, 6 and 7 with a Dade Behring nephelometer; the less sensitive Hemagen assay was used in 1998 for examination cycle 5 specimens. Natriuretic peptides were measured by Shionogi using a noncompetitive high sensitivity immunoradiometric assay [23]. Liver function tests were measured at examination cycle 2 by Quest Diagnostics (previously METPATH) with a variety of methods: γ-glutamyl aminotransferase was measured with spectrophotometry [7]; bilirubin was measured by the colorimetric method (Dow Bilirubin Kit) [24, 25]; alkaline phosphatase was measured with the kinetic method [26, 27]; aspartate aminotransferase and alanine aminotransferase were measured using the kinetic method with Beckman Liquid-Stat Reagent Kit [28]. Vitamin K status was measured as phylloquinone concentrations with reverse phase high-performance liquid chromatography [29], and percentage of undercarboxylated osteocalcin was measured by radioimmunoassay [30, 31]. Vitamin D status was measured as 25(OH)D concentrations by RIA (DiaSorin, Stillwater MN).

Plasma samples were used for natriuretic peptides, vitamin K phylloquinone, vitamin D, and some inflammatory markers including CD40 ligand, osteoprotegerin, P-selectin, tumor necrosis factor receptor 2, and tumor necrosis factor-α. Serum samples were analyzed for liver function, vitamin K % undercarboxylated osteocalcin, and other inflammatory markers including CRP, interleukin-6, soluble intercellular adhesion molecule-1, MCP1, and myeloperoxidase concentrations. The reproducibility of the biomarkers was good; the intra-assay coefficients of variation were CD40 ligand 4.4%, interleukin-6 3.1%, intercellular adhesion molecule-1 3.1%, MCP1 4.1%, myeloperoxidase 3.0%, osteoprotegerin 3.7%, P-selectin 3.0%, tumor necrosis factor-α 8.8%, and tumor necrosis factor receptor-2 2.3%; the inter-assay coefficients of variation were brain natriuretic peptide 12.2% and N-terminal atrial natriuretic peptide 12.7%. The kappa statistic for 146 CRP samples run in duplicate was 0.95 [32]. Coefficients of variation for aspartate aminotransferase and alanine aminotransferase were 10.7% and 8.3%, respectively. The coefficients of variation for low and high Vitamin K plasma phylloquinone concentrations were 15.2% and 10.9%, respectively, on control specimens. For low, medium and high osteocalcin concentrations used to determine Vitamin K percentage of undercarboxylated osteocalcin, the coefficients of variation were 22.3%, 12.8%, and 7.8%, respectively. For Vitamin D, the coefficients of variation were 8.5% and 13.2%, respectively.
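As an illustration of how an intra-assay coefficient of variation can be obtained from duplicate measurements, the following minimal sketch computes the percent CV (100 × sample SD / mean) for each duplicate pair and averages across specimens. The readings and the function names (`percent_cv`, `mean_intra_assay_cv`) are hypothetical, and averaging per-specimen CVs is only one common convention; the text does not state which formula the laboratories used.

```python
from statistics import mean, stdev

def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)

def mean_intra_assay_cv(duplicate_pairs):
    """Average the per-specimen CVs of duplicate measurements (assumed convention)."""
    return mean(percent_cv(pair) for pair in duplicate_pairs)

# Hypothetical duplicate readings for three specimens (arbitrary units).
pairs = [(10.0, 10.4), (22.1, 21.5), (5.0, 5.2)]
intra_cv = mean_intra_assay_cv(pairs)
print(f"intra-assay CV: {intra_cv:.1f}%")
```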

Genotyping methods

Details of the genotyping methods are available in the Framingham Heart Study 100K Overview [21]. Framingham staff extracted genomic DNA with a Qiagen Blood and Cell Culture Maxi Kit from immortalized lymphoblasts. Briefly, SNPs on the Affymetrix 100K chip were genotyped (n = 112,990 autosomal SNPs) in a sample of family members of the Original and Offspring cohorts of the Framingham Heart Study [33]. SNPs were excluded for the following reasons: minor allele frequency <10% (n = 38,062); call rate <80% (n = 2,346); Hardy-Weinberg equilibrium p-value < 0.001 (n = 1,595), leaving 70,987 SNPs available for analysis.
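The exclusion counts above can be checked with simple arithmetic. The short sketch below assumes that the three exclusion counts do not overlap (each SNP is counted under one reason only), which is consistent with the reported total; the variable names are illustrative.

```python
# SNP counts reported in the text.
genotyped = 112_990
excluded = {
    "minor allele frequency < 10%": 38_062,
    "call rate < 80%": 2_346,
    "Hardy-Weinberg p < 0.001": 1_595,
}

# Assumption: exclusion categories are mutually exclusive.
remaining = genotyped - sum(excluded.values())
print(f"SNPs available for analysis: {remaining:,}")
```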

Statistical analysis methods

We created standardized multivariable-adjusted natural log transformed biomarker residuals adjusted for the covariates listed in Table 1. The CRP average residuals were constructed as follows: (1) we created age- and sex-adjusted or multivariable-adjusted residuals at each of exams 2, 6 and 7; (2) we averaged the residuals across exams; (3) we excluded the average if fewer than 2 exams were available for its calculation. In some instances we performed additional transformation (e.g. Winsorized models). Tobit models were used to generate residuals for the natriuretic peptides, because 2% of N-ANP levels and 30% of BNP levels were below the respective assay detection limits. Association and linkage results examining age- and sex-adjusted residuals are posted at the web site. As described in the Overview [21], we used generalized estimating equations (GEE) and family based association testing (FBAT), assuming an additive genetic effect, to account for correlation among related individuals within nuclear families. We also used Merlin software [34] (splitting the largest families) to compute exact identity by descent linkage, with variance component analysis in SOLAR using 11,200 SNPs and short tandem repeats [35]. Traits with extreme values, defined as more than 4 standard deviations away from the mean, were Winsorized at 4.0 in secondary linkage analyses to determine the sensitivity of the logarithm of the odds (LOD) score to the presence of outlier values.
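Two of the steps described above, averaging per-exam residuals with a minimum of 2 exams and Winsorizing standardized residuals at 4 standard deviations, can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the residuals are already standardized (so clipping at ±4.0 corresponds to 4 SD), represents a missing exam as `None`, and uses hypothetical function names.

```python
def average_residual(per_exam, min_exams=2):
    """Average the available per-exam residuals; return None if too few exams."""
    observed = [r for r in per_exam if r is not None]
    if len(observed) < min_exams:
        return None
    return sum(observed) / len(observed)

def winsorize(z_scores, limit=4.0):
    """Clip standardized residuals to the interval [-limit, +limit]."""
    return [max(-limit, min(limit, z)) for z in z_scores]

# Hypothetical residuals at exams 2, 6 and 7 for two participants.
print(average_residual([1.0, None, 3.0]))    # two exams available
print(average_residual([None, None, 0.5]))   # only one exam: excluded
print(winsorize([-5.2, 0.3, 4.0, 6.1]))
```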

Results

Twenty-two biomarker traits (plus 4 additional CRP traits) were analyzed in 1012 Offspring participants, on log-transformed multivariable-adjusted residuals as outlined in Table 1 (minimum-maximum per phenotype n = 507–1008). The phenotypes were collected at various Framingham Offspring examinations from cycles 2 to 7. At examination cycles 2 and 7 the mean age of the participants with both phenotype and genotype data was 41 ± 10 and 59 ± 10 years, and 51.2% and 51.1% were women, respectively. For details of biomarker phenotype-genotype association refer to http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007.

There were 58 SNPs associated with biomarker concentrations with a p < 10⁻⁶ by GEE. The 25 most statistically significant GEE associations sorted by p-value, listed with their corresponding FBAT p-value, are shown in Table 2a. MCP1 concentrations were associated with rs2494250 (p = 1 × 10⁻¹⁴) and rs4128725 (p = 3.68 × 10⁻¹²), both on chromosome 1, near the FCER1A and the OR10J1 genes, respectively. CRP concentrations averaged over 3 examinations (about 20 years) were associated with rs2794520 (p = 2.83 × 10⁻⁸) and rs2808629 (p = 3.19 × 10⁻⁸).

Table 2 Top genetic associations with biomarkers based on the lowest p value for GEE test (2a), FBAT (2b), and Linkage (2c)

We estimated the amount of variability in biomarker concentrations explained by the 4 most statistically significant SNPs in the GEE model using a pseudo measure of R² based on log-likelihood estimates [36]. The two most statistically significant GEE SNPs explained about 7% and 4% of the variability in MCP1 concentrations (R² = 0.070 for rs2494250 and R² = 0.043 for rs4128725); for CRP concentrations averaged over examinations 2, 6, and 7 the two most statistically significant GEE SNPs explained 2.3% of the variability (R² = 0.023 for rs2794520 and rs2808629) [36]. We also examined the linkage disequilibrium between the most statistically significant GEE SNPs: rs2494250 and rs4128725 had a D' = 0.724 and an r² = 0.196, whereas rs2794520 and rs2808629 served as perfect proxies for each other (D' = 1; r² = 1).
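To make the two linkage disequilibrium measures concrete, the sketch below computes D' and r² from allele and haplotype frequencies using the standard definitions (D = p_AB − p_A·p_B, D' = |D| / D_max, r² = D² / [p_A(1 − p_A)·p_B(1 − p_B)]). The frequencies are hypothetical and are not estimates for the SNPs reported here; the function name `ld_stats` is illustrative. The second example shows how D' can be high while r² stays low when allele frequencies differ.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D', r^2) for two biallelic loci.

    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1).
    p_ab: frequency of the A-B haplotype.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Hypothetical perfect proxies: identical allele frequencies, always co-inherited.
print(ld_stats(0.3, 0.3, 0.3))
# Hypothetical pair with unequal allele frequencies: high D', modest r^2.
print(ld_stats(0.5, 0.2, 0.2))
```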

With FBAT, 11 SNPs were associated with biomarker concentrations with a p < 10⁻⁶. The two most statistically significant SNPs for FBAT were the same two SNPs observed with GEE: MCP1 concentrations were significantly associated with rs4128725 (p = 3.28 × 10⁻⁸) and rs2494250 (p = 3.55 × 10⁻⁸) (Table 2b). In addition, B-type natriuretic peptide (rs437021, p = 1.01 × 10⁻⁶) and Vitamin K % undercarboxylated osteocalcin (rs2052028, p = 1.07 × 10⁻⁶) were also nominally significantly associated.

In Table 2c we list the magnitude and location of LOD scores > 2.5 observed for the circulating biomarker traits. Because we were concerned that some of the LOD scores might be inflated by individuals with extreme marker concentrations, we reanalyzed the LOD scores on Winsorized residuals. The peak Winsorized LOD scores observed were for the biomarkers MCP1 (4.38, chromosome 1), and CRP (3.23, chromosome 10; 3.28, chromosome 1). Of note the 1.5 LOD support intervals for the linkage peaks on chromosome 1 included the SNPs significantly associated with MCP1 and CRP reported above (GEE model).

In an effort to potentially uncover genetic pleiotropy we display in Table 3 two ways to synthesize findings across phenotypes. We examined 3 correlated inflammatory biomarker phenotypes, interleukin-6, CRP and fibrinogen, and report SNPs that were significantly associated with all 3 phenotypes by GEE or FBAT at p < 0.01 (Table 3a). We also examined phenotypes within a specific biomarker category including CRP over multiple examinations, liver function tests and vitamin concentrations (nutrients involved in bone health [37, 38]), and display in Table 3b SNPs significant by either FBAT or GEE at a p < 0.01 for all of the phenotypes in a given phenotype cluster.

Table 3 Combined phenotypes

In Table 4 we compared our data with previously reported phenotype-genotype associations in the published literature on systemic biomarker concentrations: bilirubin concentrations (TA repeat in UGT1A1) [39, 40]; CRP (CRP) [20, 32, 41–50]; intercellular adhesion molecule-1 (ICAM1) [51–54]; interleukin-6 (IL6) [55–62]; and MCP1 (CCL2 = MCP1 gene) [63, 64]. Unfortunately, there were no SNPs within 60 kb of the ICAM1 gene on the Affymetrix 100K chip. There was no association between bilirubin concentrations and 1 SNP within 30 kb (rs741159) or 2 additional SNPs within 50 kb (rs726017 and rs6752792) of a previously reported TA repeat in UGT1A1. Additionally, there was no association between interleukin-6 concentrations and SNPs in the IL6 region despite one SNP in high linkage disequilibrium (LD; r² = 0.819) with the previously reported rs1800795 (-174G/C) SNP. Similarly, we did not observe an association between MCP1 concentrations and SNPs in the CCL2 region, despite one SNP with a high r² (0.956) with the SNP previously reported in the literature. For CRP concentrations, we had 2 SNPs in perfect LD with rs1205, and we observed strong evidence for replication. However, it should be noted that this association has been previously reported by Framingham investigators in unrelated participants [32]. Similarly, rs431568, which is in high LD (r² = 0.83) with 2 previously associated SNPs (rs3116653 and rs1417938), was highly associated with many of the CRP phenotypes.

Table 4 Comparison with the prior literature

Discussion

In collaboration with NCBI we have web-posted our unfiltered biomarker-genotype associations and linkage results to provide a resource to investigators seeking to understand and replicate their biomarker-genotype associations. We submit that the findings of highest priority for follow-up are associations that were detected by several statistical approaches. MCP1 was associated with 2 SNPs on chromosome 1 (rs4128725 and rs2494250) with p-values on the order of 10⁻⁸ by FBAT and ≤ 10⁻¹² by GEE. Acknowledging that linkage is less powerful and accurate, we note that the 1.5-LOD support interval for the MCP1 linkage peak (Winsorized maximum LOD 4.38) on chromosome 1 supports the GEE and FBAT analyses. Findings for CRP (chromosome 1), brain natriuretic peptide (chromosome 1) and Vitamin K % undercarboxylated osteocalcin (chromosome 7) are also of potential priority for follow-up. We acknowledge that the ultimate validation of our findings will require replication in other cohorts and functional studies.

A fundamental challenge of GWAS tests is sorting through associations and prioritizing SNPs for follow-up. In the absence of external replication, one approach to synthesizing findings is to examine associations across similar biological domains, which may capture pleiotropy. We presented the exploratory analyses in Tables 3a and 3b, but reiterate that the findings will need to be examined in other cohorts.

Do the findings represent true positive genetic associations?

It is notable that some of the associations with the strongest statistical support were for associations between a gene and its protein product (e.g. CRP gene and CRP concentration). Cis-acting regulatory variants have been shown to influence mRNA and protein levels for many genes [65]. Studies involving additional biomarker phenotypes and variants (e.g. Affymetrix 500 K Chip) should clarify whether cis- or trans-acting regulatory variants explain the greatest proportion of phenotypic variation.

With GWAS, which typically test for the association of thousands of SNPs with multiple traits, it is difficult for any specific association to achieve genome-wide significance. For instance, a strict Bonferroni correction for the 30 traits tested in the present study with both age/sex- and multivariable-adjusted models and 2 statistical methods (0.05/(70,987 × 30 × 2 × 2)) would require a p = 5.9 × 10⁻⁹. We submit that the most significant association in the selected biomarker group, the FCER1A rs2494250 SNP with MCP1 concentrations, achieved genome-wide significance with a GEE p = 1.0 × 10⁻¹⁴ and a FBAT p = 3.5 × 10⁻⁸. It should be noted that rs2494250 and rs4128725 are in modest linkage disequilibrium (D' = 0.724 and r² = 0.196) and hence may be serving as proxies for the same causal SNP.
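The Bonferroni threshold quoted above follows directly from the number of tests. The sketch below reproduces the calculation from the counts given in the text; the variable names are illustrative.

```python
# Counts taken from the text.
n_snps = 70_987        # SNPs passing quality filters
n_traits = 30          # biomarker traits tested
n_adjustments = 2      # age/sex-adjusted and multivariable-adjusted models
n_methods = 2          # GEE and FBAT
alpha = 0.05

n_tests = n_snps * n_traits * n_adjustments * n_methods
threshold = alpha / n_tests
print(f"{n_tests:,} tests -> Bonferroni threshold p = {threshold:.1e}")

# The GEE p-value reported for rs2494250 and MCP1 falls below this threshold.
gee_p = 1.0e-14
print(gee_p < threshold)
```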

Several human and experimental studies suggest that the association between FCER1A and MCP1 concentrations is biologically plausible. FCER1A codes for the high affinity Fc receptor fragment for IgE. In vitro experiments with rat mast cells demonstrated that aggregation of the high affinity receptor for IgE (FcεRI) increased gene transcription and secretion of MCP1 [66]. Similarly, in mouse mast cells, MCP1 mRNA increased significantly when the FcεRI was occupied by small amounts of IgE/antigen [67]. In humans, IgE and MCP1 concentrations are both increased in occupational asthma [68, 69]. Similar to the animal data, human mast cells exposed to anti-IgE antibody or to IgE released MCP1 [70–72].

Comparison with prior literature

Our efforts to compare our findings with associations previously reported in the literature underscore some of the challenges in genetic association studies. The ICAM1 gene did not have any markers within 60 kb on the Affymetrix 100K chip. Of the 4 genes that did have SNPs in the corresponding genomic region, only the CRP association was replicated in our cohort; however, as noted above, we [32], as well as others [20], have previously reported this association. For bilirubin concentrations we previously reported significant linkage to the chromosome 2q telomere [39] and a significant association with a TA repeat in UGT1A1, under this linkage peak [40], in Framingham unrelated participants. However, there was no association between bilirubin concentrations and the 3 SNPs within 60 kb of UGT1A1. The previously reported interleukin-6-IL6 and MCP1-CCL2 associations were not replicated. Of note, our group previously reported that rs1024611 (in CCL2) was associated with MCP1 concentrations in unrelated participants [63]; the association was nowhere close to significant in the present report (FBAT p = 0.78; GEE p = 0.35). The failure to confirm the previously reported Framingham MCP1-CCL2 association may stem from the current report having a smaller sample size (n = 989), using different genetic markers, and being conducted with an additive genetic model in related participants, as opposed to the prior study, which used unrelated participants (n = 1602) with recessive and dominant models. In a recent meta-analysis of phenotype-genotype association studies, only about one third (8 of 25) of the associations examined were replicated [73]. There are many plausible explanations why we did not replicate previously reported phenotype-genotype associations. Previous reports could represent false positive findings; the present and prior study cohorts may differ on key factors that modify the phenotype-genotype associations; or our lack of replication may represent a false negative report because of inadequate statistical power [73, 74].

Strengths and limitations

The strengths of the present study include a comprehensively characterized community-based cohort, with biomarker phenotypes routinely assessed with careful attention to quality control. However, the cohort was largely middle-aged to elderly, and white of European descent, so the findings may not be generalizable to individuals who are younger or of other ethnicity/racial descent. DNA was collected at the 5th and 6th examinations, which may have introduced a survival bias. In addition, our study was susceptible to false negative findings because of the moderate size of the cohort; we lacked power to detect modest associations. Conversely, similar to most GWAS, the reported associations and linkage may represent false positive findings from multiple statistical testing.

Conclusions and future directions

The Framingham GWAS and the web posting of the unfiltered results represent a unique resource to discover potentially novel genetic influences on systemic biomarker variability. We acknowledge that the newly described associations will need to be replicated in other studies.