Background

Genome-wide association studies (GWAS) have proved useful for discovering genetic variants associated with many complex diseases and traits [1]. Unfortunately, for some complex human diseases, such as elevated blood pressure or hypertension, GWAS has had very limited success in variant discovery. In fact, the variants discovered so far through GWAS collectively explain only a small fraction of the known heritability of any of these diseases [2–5].

Given what is known about the biology of these diseases and traits, it is suspected that important variants with moderate to large effect sizes remain to be identified; this is commonly referred to as the “missing heritability” [2, 3, 6, 7]. Proposed explanations for missing heritability include that it could lie in regulatory rare variants, functional variants, structural variants, or gene-by-gene and gene-by-environment interactions [8–11]. It has also been suggested that multiple small-effect variants, which are individually undetectable with the statistical power of GWAS, contribute additively to the missing heritability [12–14]. Another explanation is that current estimates of total heritability may have been significantly inflated by the effects of epistasis [15]. The search for missing heritability has involved various approaches, including pathway-based analysis of common, less frequent, and rare variants [16–18]; analysis of correlated traits using summary statistics from GWAS [19]; and analytical procedures that accommodate a mixture of effects on the traits [20]. Because some of the existing effective pharmacotherapeutic agents for blood pressure control act by targeting specific biological pathways, and these pathways are underrepresented among the GWAS-identified variants [1, 21–24], analytical approaches that focus on known biological pathways rather than on the entire genome could lead to the discovery of some of the variants linked to “missing heritability” in association studies.

Consequently, the main objective of the present study was to perform pathway-based association analysis to identify functional variants associated with blood pressure phenotypes in the aldosterone-regulated sodium reabsorption pathway, using the whole exome sequence data provided for Genetic Analysis Workshop 19 (GAW19). The aldosterone pathway was chosen because it is one of the known target biological pathways for pharmacological control of hypertension. We hypothesized that functional genetic variants in the pathway influence susceptibility to blood pressure elevation.

Methods

Analyses were based on the unrelated-individuals data set of human whole exome sequence data, together with the simulated and real phenotype data, as provided for GAW19 and described by Almasy et al. [25].

Study subjects and phenotypes

The study sample included 1943 adult Hispanic subjects (1021 type 2 diabetes cases and 922 controls) from the San Antonio Family Heart Study, San Antonio Family Diabetes/Gallbladder Study, Veterans Administration Genetic Epidemiology Study, and the Investigation of Nephropathy and Diabetes Study family component (HA) [26–29], and from the Starr County, Texas (HS) studies [30, 31]. Available study variables included sex, age, diastolic blood pressure (DBP), systolic blood pressure (SBP), and use of antihypertensive medication. Of the 1943 subjects, only 1850 had complete data on the study variables.

We analyzed both the simulated blood pressure phenotypes in the “SIMPHEN.1” data set and the real blood pressure phenotypes in the “T2D-GENES_P1_Hispanic_phenotypes” data set. The outcome variables included in the analysis were DBP, SBP, pulse pressure (PP) (defined as PP = SBP − DBP), mean arterial pressure (MAP) (defined as MAP = DBP + [PP/3]), and hypertension (defined as blood pressure ≥140/90 mm Hg or use of antihypertensive medication). Sex, age, and age-squared were treated as covariates in the analysis.
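As an illustration of the derived phenotypes defined above, the following is a minimal Python sketch. The class and function names are hypothetical, and the threshold “≥140/90 mm Hg” is read here as SBP ≥ 140 or DBP ≥ 90, which is an assumption about how the cutoff was applied rather than something stated in the text.

```python
from dataclasses import dataclass


@dataclass
class BloodPressure:
    """Hypothetical container for one subject's measurements."""
    sbp: float            # systolic blood pressure, mm Hg
    dbp: float            # diastolic blood pressure, mm Hg
    on_medication: bool   # use of antihypertensive medication


def pulse_pressure(bp):
    # PP = SBP - DBP
    return bp.sbp - bp.dbp


def mean_arterial_pressure(bp):
    # MAP = DBP + PP / 3
    return bp.dbp + pulse_pressure(bp) / 3.0


def is_hypertensive(bp):
    # Assumption: ">=140/90" means SBP >= 140 OR DBP >= 90.
    return bp.sbp >= 140 or bp.dbp >= 90 or bp.on_medication
```

For a subject with SBP 120 and DBP 80 who is not on medication, this gives PP = 40 and MAP of about 93.3 mm Hg, and the subject is classified as normotensive.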

Genotype data

Whole exome sequence data were provided for the 11 odd-numbered autosomes. The genotypes used in the present analysis were based on NALTT (the thresholded number of nonreference alleles for each individual) as provided in the variant call format (VCF) files. We used the software BCFtools (http://samtools.github.io/bcftools/bcftools.html) to extract data on biallelic (single nucleotide and deletion/insertion) variants and then recoded the genotypes from 0/1/2 to ACGT using the information on both the reference and alternate alleles for each variant. Quality control (QC) of the genotype data was carried out using the software PLINK [32]. Of the 1,765,688 total available variants, 1,711,766 were biallelic. We excluded 136,233 variants with a missing genotype rate greater than 10 % and 1,529,240 variants with a minor allele frequency of less than 1 %. Rare variants were excluded because the focus of the analysis was on common and less-frequent variants, and also because the sample size was too small for single-variant analysis of rare variants. Another 1238 variants that failed the Hardy-Weinberg equilibrium test at p <0.001 were excluded. One sample with a missing genotype rate greater than 10 % was excluded. The final quality-controlled genotype data set comprised 1942 samples and 45,055 biallelic variants. Principal component analysis was performed using all 45,055 variants, and the first of 10 components was extracted and included in the association analysis to control for population stratification. Only the 1850 subjects with complete data on blood pressure phenotypes were included in the association analysis.
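The variant-level missingness and minor allele frequency filters described above can be sketched as follows. This is only an illustration under stated assumptions, not the PLINK procedure itself: the function names are hypothetical, genotypes are taken as 0/1/2 nonreference allele counts with `None` marking a missing call, and the Hardy-Weinberg and sample-level filters are omitted. The final lines simply tally the exclusion counts reported in the text.

```python
def missing_rate(genotypes):
    """Fraction of missing calls; None marks a missing genotype (assumption)."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 nonreference allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_variant_qc(genotypes, max_missing=0.10, min_maf=0.01):
    """Keep a variant with missingness <= 10 % and MAF >= 1 %."""
    return (missing_rate(genotypes) <= max_missing
            and minor_allele_frequency(genotypes) >= min_maf)


# Tally of the variant counts reported in the text.
biallelic = 1_711_766
remaining = biallelic - 136_233 - 1_529_240 - 1_238  # 45,055 variants
```

The tally reproduces the 45,055 variants reported for the final data set, which suggests that the three variant-level exclusions were applied to non-overlapping sets of variants.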

Aldosterone-regulated sodium reabsorption pathway genes

The aldosterone-regulated sodium reabsorption pathway was defined using the KEGG PATHWAY Database (http://www.genome.jp/kegg/pathway.html). The pathway comprises 39 genes located across 14 autosomes and the X chromosome. Twenty-two genes were on the 11 odd-numbered autosomes available for the present analysis (Table 1). Variants were annotated using SeattleSeq Annotation (http://snp.gs.washington.edu/SeattleSeqAnnotation138/index.jsp). Of the 45,055 biallelic variants that passed QC, a total of 127 were in the aldosterone-regulated sodium reabsorption pathway. With the exception of the SFN gene, each of the 22 genes on the odd-numbered autosomes had at least 1 variant available for analysis.

Table 1 Aldosterone-regulated sodium reabsorption pathway genes available in the data set

Association analysis

Using the simulated and real phenotypes, we fitted additive linear (for DBP, SBP, PP, and MAP) and logistic (for hypertension status) regression models for each outcome variable, with the variant as the explanatory variable coded as dosage of the minor allele. Sex, age, age-squared, and the first principal component were included as covariates. The software PLINK [32] was used for the association analysis by implementing its set-based tests. All 127 variants in the pathway were considered as a set. The test involved the following iterative steps: (a) for each variant, we determined which other variants were in linkage disequilibrium above a certain threshold R² and eliminated those variants; (b) we performed single-variant association analysis and selected up to N variants with p values below P, starting with the most significant one; (c) from this subset of variants, we calculated the set statistic as the mean of the single-variant statistics; (d) we permuted the data set 5000 times and repeated steps (b) and (c) for each permuted data set; and (e) we calculated the empirical p value as the proportion of permuted data sets in which the set statistic exceeded that of the original data set. Software default values of 0.5, 0.05, and 10 were used for the parameters R², P, and N, respectively. For each outcome variable, haplotype and epistasis association analyses were also done. The epistasis analysis involved all pairwise combinations of the 127 variants and their interaction. We also performed Bonferroni correction for multiple testing, taking the number of tests to be the number of genes in the pathway. This was based on the assumption of nonindependence of variants within the pathway genes.
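Steps (a) through (e) above can be sketched in Python as follows. This is a simplified illustration and not PLINK's implementation: the function names are hypothetical, the single-variant statistic is a covariate-free 1-df trend statistic (n·r²) rather than the regression models used in the study, linkage disequilibrium is approximated by the squared correlation of dosages, and pruning is done greedily in order of significance. The defaults mirror the R², P, and N values quoted above.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def single_variant_test(dosage, phenotype):
    """Return (statistic, p value) for the trend statistic n * r^2."""
    stat = len(dosage) * pearson_r(dosage, phenotype) ** 2
    p = math.erfc(math.sqrt(stat / 2.0))  # chi-square (1 df) upper tail
    return stat, p


def set_statistic(variants, phenotype, r2_max=0.5, p_max=0.05, n_max=10):
    """Steps (a)-(c): rank variants, prune by LD, average the statistics."""
    ranked = sorted(
        ((single_variant_test(d, phenotype), d) for d in variants),
        key=lambda item: item[0][1],  # most significant first
    )
    selected = []
    for (stat, p), dosage in ranked:
        if p >= p_max or len(selected) == n_max:
            break
        # Skip variants in high LD with one that is already selected.
        if all(pearson_r(dosage, kept) ** 2 <= r2_max for _, kept in selected):
            selected.append((stat, dosage))
    if not selected:
        return 0.0
    return sum(stat for stat, _ in selected) / len(selected)


def empirical_p(variants, phenotype, n_perm=5000, seed=0, **kwargs):
    """Steps (d)-(e): permute phenotype labels and compare set statistics."""
    rng = random.Random(seed)
    observed = set_statistic(variants, phenotype, **kwargs)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(variants, shuffled, **kwargs) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1)
```

Calling `empirical_p(variants, phenotype)` with a list of per-variant dosage lists and a phenotype list returns one permutation-based p value for the whole set. Because the selection in steps (a) to (c) is repeated inside every permutation, the p value accounts for having picked the most significant variants.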

Results

Figure 1 displays the distributions of the single-variant association signals for both real and simulated phenotypes. Table 2 shows the topmost pathway-based association signals. After Bonferroni correction for multiple testing, associations of the PRKCA gene with DBP, SBP, and MAP remained significant in both real and simulated data. None of the empirical p values reached statistical significance. Figure 2 displays the distributions of the haplotype associations. The haplotype signals are similar to those of the single-variant analysis in Fig. 1. Table 3 shows the results of the 2-locus epistasis analysis. The most significant interactions for real phenotypes were those between different genes (for example, INS vs. PIK3R2 for DBP), whereas for simulated phenotypes there were significant within-gene interactions, such as in the INSR gene for SBP, MAP, and hypertension (Table 3).

Fig. 1

Distributions of single single-nucleotide polymorphism association signals for real phenotypes (top) and simulated phenotypes (bottom)

Table 2 List of genes with top pathway-based signals for real and simulated phenotypes

Fig. 2

Distributions of haplotype association signals for real phenotypes (top) and simulated phenotypes (bottom)

Table 3 List of loci with most significant epistasis signals for real and simulated phenotypes

Discussion

In this study, we explored a pathway-based analytical approach for the discovery of functional variants influencing blood pressure phenotypes as an additional method that could lead to the identification of further variants for complex human conditions. We focused on a known biological pathway rather than on pathways constructed from unproven biological systems. Our hypothesis was that, because existing effective pharmacotherapeutic agents for blood pressure control act by targeting specific biological pathways, appropriate analytical methods that focus on such pathways could identify more variants linked to complex human conditions than have been discovered so far by GWAS and candidate gene approaches. Results from this analysis indicate that the use of known biological pathways for genetic association analysis can indeed be a useful approach in the presence of true association, because it takes advantage of the nonindependence of variants across pathway genes in setting the threshold for statistical significance. We do note that, because our analysis included genes from only the odd-numbered autosomes provided for GAW19, these results and their interpretations cannot be taken as fully representative of the aldosterone pathway. We are of the opinion that pathway-based analysis of variants from all genes in the pathway, together with those from the regulatory regions, would lead to the identification of important associations that can be interpreted with fewer limitations than in the present study. The use of known biological pathways in this study represents a useful extension of genetic association analysis for complex human diseases.

Conclusions

The findings from this study show that pathway-based analytical approaches can be useful in identifying important disease-associated variants that are otherwise undetectable by GWAS. This is because the assumption of nonindependence of variants within and across pathway genes reduces the penalty for multiple testing and thus yields a less stringent threshold for statistical significance.