Background

Lipid level profiles are important determinants of heart and vascular health [1], with a disproportionate prevalence of unhealthy lipid profiles among U.S. minority populations, especially Hispanics/Latinos [2, 3]. The most commonly studied lipid traits include total cholesterol, high-density lipoprotein cholesterol (HDL), low-density lipoprotein cholesterol (LDL), and triglycerides. High levels of LDL and triglycerides are considered risk factors for coronary heart disease (CHD), while the opposite is true for HDL levels [4].

The estimated heritability of lipid traits from twin and family studies, generally studied in European descent populations, ranges from 20% to more than 70% [5]. The largest genome-wide association study (GWAS) reported to date identified 157 common genetic variants (minor allele frequency [MAF] > 2%) in over 188,000 participants from European-descent populations [6, 7], and several of the loci marked by these variants also have secondary signals, defined as associations that remain (or become) statistically significant after conditioning on the most significant SNP in the region [8]. Collectively, these common variants explain ~30-33% of the phenotypic variance for these traits in samples of primarily European ancestry [8].

Despite the disparities surrounding unhealthy lipid profiles [2, 3], there is a dearth of genetic studies investigating lipids in non-European populations. While most lipid-associated loci were discovered in studies of Europeans, a few studies of lipids have identified population-specific signals in African, Asian, and Hispanic/Latino descent populations [9,10,11,12,13,14,15]. Previous studies [14] show that by studying multiple ethnicities one can leverage differences in associations, allele frequencies, and linkage disequilibrium (LD) patterns to fine-map known loci and narrow down the region in which functional variants are expected. Therefore, studies in diverse ethnicities are important to increase understanding of the differences in lipid profiles across populations, to ascertain whether these differences are due to genetic architecture, and finally, for fine-mapping of known loci.

In this study, we present findings from a large Hispanic/Latino cohort, the Hispanic Community Health Study / Study of Latinos (HCHS/SOL). We tested the association of more than 25 million genotyped or imputed variants with four lipid traits: LDL, HDL, total cholesterol, and triglycerides. Further, we sought to identify new signals within known association regions by implementing statistical models conditioned on previously reported associated SNPs [9, 11,12,13,14]. We assessed SNP associations that were significant in the conditional model for replication in a Hispanic/Latino meta-analysis, and for generalization in European- and African-descent study populations. In addition, we assessed the phenotypic variance in lipid traits explained by common SNPs across the genome in the HCHS/SOL.

Methods

Participants and study design

The HCHS/SOL is a community-based cohort study of 16,415 self-identified Hispanic/Latino individuals aged 18-74 from randomly selected households near four U.S. field sites (Chicago, IL; Miami, FL; Bronx, NY; and San Diego, CA) [16]. The two-stage probability sample design was previously described in LaVange et al. [17]. The HCHS/SOL cohort includes participants who self-identified as having a Hispanic/Latino background, the largest groups being Central American (n = 1732), Cuban (n = 2348), Dominican (n = 1473), Mexican (n = 6472), Puerto Rican (n = 2728), and South American (n = 1072). Baseline lipid levels were measured for participants during clinical examinations from 2008 to 2011. Of the study population, 12,803 individuals both consented to genotyping and were successfully genotyped. The HCHS/SOL was approved by institutional review boards at participating institutions, and written informed consent was obtained from all participants.

Genotyping and imputation

DNA extracted from whole blood was genotyped on an Illumina custom array consisting of the Illumina Omni 2.5M array (HumanOmni2.5-8v1-1) and ~150,000 custom SNPs identified to capture Amerindian genetic variation. Sample-level quality and identity checks, and SNP-level quality filtering resulted in a total of 12,803 samples with a missing call rate < 1% and 2,232,944 informative SNPs with a missing call rate < 2%. Imputation with the 1000 Genomes Project phase 1 multi-ethnic reference panel using SHAPEIT2 pre-phasing and IMPUTE2 imputation resulted in 25,568,744 imputed variants. Genotype quality control, imputation, relatedness, and PC estimation methods are described in more detail in Conomos et al., 2016 [18].

Lipids phenotype outcomes

Twelve-hour fasting blood samples were collected according to standard protocols and used to measure serum total cholesterol, triglycerides, and HDL levels [19]. LDL levels were estimated according to the Friedewald equation [20]. An inventory of all prescription and over-the-counter medication each participant had used in the previous four weeks was taken at the clinic examination. We retained individuals taking lipid-lowering medications, first because these individuals are likely to carry variants that result in dyslipidemia, and second, to maximize the sample size. Therefore, we adjusted lipid levels by adding constant values for participants who reported taking lipid-lowering medications (statins, fibrates, bile acid sequestrants, niacin, and cholesterol absorption inhibitors) as has been done previously [15]. This adjustment was made in an attempt to restore the lipid value to what it was before taking the medications. The constant value depended on the specific type of medications used (Additional file 1: Table S1) [4, 21, 22]. If multiple medications were used, we applied the correction factor with the largest effect (e.g. for someone on statins and fibrates, we adjusted their triglycerides level by +57.1 mg/dL and their LDL by +49.9 mg/dL). To assess potential biases from applying a medication correction, we performed sensitivity analyses with known lipid loci, where we included all individuals and 1) applied a correction factor by adding constant values for participants who reported taking lipid-lowering medications, or 2) adjusted for lipid-lowering medications using a separate covariate for each lipid drug. Results varied little regardless of how we accounted for users of lipid-lowering medication (Additional file 2: Fig. S1). One extreme triglyceride level was excluded from analysis (adjusted triglycerides = 6366 mg/dL). Triglyceride levels were log-transformed (after medication adjustment) for association analyses. All other lipid traits were normally distributed.
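As an illustration of the LDL estimation step, the following minimal sketch applies the standard Friedewald equation (LDL = total cholesterol - HDL - triglycerides / 5, all in mg/dL). The function name and the handling of triglycerides at or above 400 mg/dL (where the estimate is generally considered unreliable) are assumptions for illustration, not details taken from the HCHS/SOL protocol.

```python
def friedewald_ldl(total_chol, hdl, triglycerides, tg_limit=400.0):
    """Estimate LDL cholesterol (mg/dL) with the Friedewald equation.

    All inputs are in mg/dL. Returns None when triglycerides reach
    tg_limit, an assumed cutoff above which the estimate is usually
    not reported.
    """
    if triglycerides >= tg_limit:
        return None
    # Triglycerides / 5 approximates VLDL cholesterol in mg/dL.
    return total_chol - hdl - triglycerides / 5.0


# Hypothetical participant values, in mg/dL.
ldl = friedewald_ldl(total_chol=200.0, hdl=50.0, triglycerides=150.0)
```

The estimate depends on a fasting triglyceride value, which is why the fasting samples described above matter for this step.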

Genetic association analyses

Tests for genetic associations were performed using linear mixed models (LMMs), adjusting for population structure using the first five genetic principal components (PCs) and “genetic analysis group” as fixed effects. Genetic analysis groups were defined from a combination of genetic PCs and self-identified ancestry [18]. We adjusted for sampling design using a function (determined by AIC) of the sampling weights as a fixed effect, and adjusted for correlation among individuals due to shared community (group block), household, and genetic relatedness (kinship) using random effects. Expected allelic dosages were used for imputed SNPs in the association analyses, and results were filtered according to the effective minor allele count accounting for imputation quality. A detailed description of genetic association analyses in the HCHS/SOL can be found in Conomos et al., 2016 [18]. Significance was assessed using a genome-wide significance threshold of p-value ≤ 5 × 10^-8. A significant locus was defined as a 1 Mb region (+/− 500 kb) around the most significant SNP (index SNP).
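The locus definition above can be sketched as a greedy pass over genome-wide significant SNPs. This is a minimal illustration, assuming that index SNPs are chosen in order of increasing p-value and that any significant SNP within 500 kb of an already chosen index SNP belongs to that locus; the tuple layout and the SNP identifiers in the example are hypothetical.

```python
GENOME_WIDE_P = 5e-8   # significance threshold stated in the text
FLANK_BP = 500_000     # +/- 500 kb around each index SNP


def define_loci(snps, p_threshold=GENOME_WIDE_P, flank=FLANK_BP):
    """Return index SNPs, one per 1 Mb locus.

    snps: iterable of (snp_id, chrom, pos, p_value) tuples.
    """
    significant = sorted(
        (s for s in snps if s[3] <= p_threshold), key=lambda s: s[3]
    )
    index_snps = []
    for snp in significant:
        _, chrom, pos, _ = snp
        # Skip SNPs that fall inside the window of a stronger index SNP.
        covered = any(
            chrom == c and abs(pos - p) <= flank for _, c, p, _ in index_snps
        )
        if not covered:
            index_snps.append(snp)
    return index_snps


# Hypothetical association results.
results = [
    ("snpA", "1", 1_000_000, 1e-12),
    ("snpB", "1", 1_200_000, 1e-9),   # within 500 kb of snpA
    ("snpC", "1", 2_000_000, 3e-8),   # separate locus
    ("snpD", "2", 1_000_000, 1e-6),   # not genome-wide significant
]
loci = define_loci(results)
```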

Conditional analyses

To evaluate the possibility of novel lipid loci within previously established association regions, we applied the same LMMs for association testing while conditioning on previously reported index SNPs associated with any of the four lipid traits [6,7,8,9,10,11,12,13,14,15] (i.e. by including them as covariates), under the assumption of pleiotropy (Additional file 1: Table S2). In this table a ‘primary’ SNP was defined as the first identified (published) SNP in a given 1 Mb (+/− 500 kb) region, and an established ‘secondary’ SNP was defined as any published SNP inside a ‘primary’ SNP region. All primary SNPs are independent of each other and of all secondary SNPs, but not all secondary SNPs in a given region are independent of each other. SNPs that remained or became genome-wide significant after conditioning on known loci were considered for replication testing in other cohorts. In the results presented here, we defined potentially novel primary signals as previously unreported SNPs that fell outside of a known index SNP region. We defined potentially novel secondary signals as previously unreported SNPs that fell inside a known index SNP region but were independent of all other SNPs within that region.
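The primary/secondary distinction for a candidate signal reduces to a distance check against known index SNPs. The sketch below assumes the candidate has already passed the conditional significance test; the function name and the coordinates in the example are hypothetical.

```python
def classify_signal(chrom, pos, known_index_snps, flank=500_000):
    """Label a conditionally significant SNP as 'primary' or 'secondary'.

    known_index_snps: iterable of (chrom, pos) for published index SNPs.
    A candidate within +/- flank bp of any known index SNP on the same
    chromosome is a potentially novel secondary signal; otherwise it is
    a potentially novel primary signal.
    """
    for known_chrom, known_pos in known_index_snps:
        if chrom == known_chrom and abs(pos - known_pos) <= flank:
            return "secondary"
    return "primary"


# Hypothetical coordinates: one known index SNP on chromosome 7.
known = [("7", 6_400_000)]
near = classify_signal("7", 6_530_000, known)   # 130 kb away
far = classify_signal("7", 9_000_000, known)    # outside the window
```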

Replication of potential novel signals

Eight SNPs that reached genome-wide significance, separate from previously reported signals, were tested for replication in independent studies. These consisted of samples from Mexican Americans from Starr County, Texas and individuals from Mexico City [9], women in the Women’s Health Initiative Study (WHI) who self-reported Hispanic/Latino ancestry [23], and a subset of cohorts from the GUARDIAN consortium consisting of participants of Mexican ancestry [Insulin Resistance Atherosclerosis Study [24], Insulin Resistance Atherosclerosis Family Study (IRAS-FS) [25], Hypertension-Insulin Resistance (HTN-IR) Family study [26], and Mexican-American Coronary Artery Disease (MACAD) study [27]]. We also sought to generalize novel association signals to populations of European or African ancestry, including individuals in the Atherosclerosis Risk in Communities (ARIC) study [28] and the WHI [23]. We selected individuals of different ancestries because many of the SNPs followed up for replication testing were rare in the HCHS/SOL, but slightly more frequent in other ancestries based on reference samples including African (AFR) or European (EUR) individuals.

In each replication study, fasting lipid levels were collected using standardized procedures and lipid phenotypes were adjusted for medication use in a manner similar to the HCHS/SOL analyses, except in GUARDIAN, which did not adjust for lipid medication because fewer than 5% of participants in each study used it. Each study used linear regression stratified by ancestry (i.e. European, African, and Hispanic/Latino) to test for SNP-trait associations while adjusting for covariates including age, sex, and PCs 1-10, and obtained p-values from the Wald test. Family-based cohorts adjusted for pedigree structure. We then performed both ancestry-specific and combined-ancestry inverse normal fixed effects meta-analyses. To test the hypothesis of an association in the replication studies, we used the framework of Sofer et al. [29] and calculated a false discovery rate (FDR)-controlling directional r-value for each tested association, based on its p-value in both the HCHS/SOL and the replication study. FDR was controlled at the 0.05 level in calculating the r-values in each replication analysis, and we concluded that an association replicated in a given follow-up meta-analysis if it had an r-value ≤ 0.05.
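One common form of an inverse normal fixed effects meta-analysis is the weighted Z-score (Stouffer) method, sketched below. The square-root-of-sample-size weights and the input layout are assumptions for illustration; the text does not state which weights the replication studies used.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def inverse_normal_meta(results):
    """Combine per-study results with the weighted Z-score method.

    results: list of (beta, p_value, n) with a two-sided p_value in
    (0, 1]. Each p-value is converted to a signed Z-score using the
    sign of beta, then combined with weights sqrt(n) (an assumption).
    Returns (z_meta, two_sided_p_meta).
    """
    numerator = 0.0
    weight_sq_sum = 0.0
    for beta, p_value, n in results:
        z = abs(_STD_NORMAL.inv_cdf(p_value / 2.0))
        if beta < 0:
            z = -z
        weight = sqrt(n)
        numerator += weight * z
        weight_sq_sum += weight * weight
    z_meta = numerator / sqrt(weight_sq_sum)
    p_meta = 2.0 * _STD_NORMAL.cdf(-abs(z_meta))
    return z_meta, p_meta


# Two hypothetical studies of equal size with concordant effects.
z_meta, p_meta = inverse_normal_meta([(0.3, 0.05, 1000), (0.2, 0.05, 1000)])
```

With equal weights and concordant directions, the combined Z-score is larger than either study's own Z-score; opposite directions cancel.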

Heritability estimation

To estimate the heritability of each lipid trait in the HCHS/SOL sample, we estimated kinship coefficients within a maximal set of unrelated participants, using the complete set of genotyped SNPs with minor allele frequencies (MAF) ≥ 1% (~1.7 million SNPs). Unrelated individuals were defined as those with estimated kinship coefficients smaller than 2^(-11/2), i.e. more distant than fourth degree relatives. We estimated heritability as the proportion of the total phenotypic variance explained by the kinship coefficient matrix, calculated in an LMM as described above. The LMM was adjusted for the same fixed and random effects as described above in genetic association analyses, with the exception that a slightly different kinship matrix was used. Heritability p-values were calculated based on the likelihood ratio test, and 95% confidence intervals based on the normal approximation to the distribution of the ratio between the kinship variance components and the total variance.
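The kinship cutoff of 2^(-11/2) (about 0.0221) can be written as a small helper. The sketch assumes the widely used convention in which the expected kinship for degree-d relatives is 2^-(d+1) and the lower inference bound for degree d is 2^-(d+3/2); the function names are hypothetical.

```python
def kinship_lower_bound(degree):
    """Lower kinship bound for relatives of the given degree.

    Assumes the convention where the bound is 2^-(degree + 3/2),
    which gives 2^(-11/2) for fourth-degree relatives.
    """
    return 2.0 ** -(degree + 1.5)


def is_unrelated(kinship, max_degree=4):
    """True if a pair is more distant than max_degree relatives."""
    return kinship < kinship_lower_bound(max_degree)


threshold = kinship_lower_bound(4)  # equals 2 ** -5.5
```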

Generalization of previously reported SNP-trait associations

We identified nine studies that previously reported SNP-lipid trait associations in cohorts of European ancestry [5,6,7], and other ancestries [8,9,10,11,12,13]. We investigated whether the 347 SNP-trait associations (some SNPs overlap with more than one lipid trait, Additional file 1: Table S2) reported in these studies generalized to Hispanics/Latinos in the HCHS/SOL sample. For each known lipid-associated SNP, we calculated an FDR-controlling directional r-value based on both the p-value reported in the literature, and the HCHS/SOL association testing results. We computed r-values for each study (i.e. the study in the literature and the HCHS/SOL study) and trait separately. An association was considered generalized if its corresponding r-value was smaller than 0.05.

For each lipid trait, we also computed a genetic risk score for the SNPs that did not generalize to determine the importance of power on negative results. Specifically, for each trait, we summed the trait-increasing alleles of all the non-generalized SNPs, and tested the resulting risk score in the linear mixed model described above. A p-value <0.05 indicates that, while not formally generalized, some of the SNPs are likely associated with the trait in Hispanics/Latinos.
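The unweighted risk score described above can be sketched as follows. The dictionary layout, function name, and SNP identifiers are hypothetical; the only logic taken from the text is that trait-increasing alleles are summed across the non-generalized SNPs.

```python
def risk_score(dosages, effect_signs):
    """Sum trait-increasing allele dosages for one participant.

    dosages: dict mapping SNP id to the dosage (0 to 2) of the coded
    allele. effect_signs: dict mapping SNP id to +1 if the coded allele
    was reported as trait-increasing, or -1 if trait-decreasing.
    When the coded allele decreases the trait, the other allele is
    counted instead (2 - dosage).
    """
    score = 0.0
    for snp, sign in effect_signs.items():
        dosage = dosages[snp]
        score += dosage if sign > 0 else 2.0 - dosage
    return score


# Hypothetical participant with two non-generalized SNPs.
score = risk_score({"snp1": 1.0, "snp2": 0.5}, {"snp1": 1, "snp2": -1})
```

The resulting per-participant score would then enter the linear mixed model as a single predictor in place of an individual SNP.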

Results

The sample included participants (~59% female) who were on average 46.1 years of age, ranging from 18 to 74 years (Table 1). Approximately 12.3% of the individuals reported using lipid-lowering medications. Mean measured total cholesterol, LDL, HDL, and triglycerides were 199.49 (±43.6) mg/dL, 122.85 (±36.54) mg/dL, 49.07 (±13.09) mg/dL, and 139.75 (±101.03) mg/dL, respectively.

Table 1 Descriptives of analytic sample for each lipid trait

Trait-specific association analyses

The genomic inflation factors for the association analyses of HDL, LDL, and total cholesterol were each 1.03 and for triglycerides was 1.00 (Additional file 2: Fig. S2), indicating adequate control of population stratification. In these trait-specific analyses, we identified 14, 16, 17, and 10 genome-wide significant independent loci (+/− 500 kb from index SNP) associated with HDL, LDL, total cholesterol, and triglycerides, respectively (Additional file 1: Table S3). We tested these loci for novelty as follows.
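The genomic inflation factor is conventionally the median observed chi-square statistic divided by the median of a 1-degree-of-freedom chi-square distribution (about 0.4549). The sketch below assumes that definition and two-sided p-values as input; the text does not specify how the reported factors were computed.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square with 1 df


def genomic_inflation(p_values):
    """Compute lambda from two-sided p-values in (0, 1].

    Each p-value is mapped to a 1-df chi-square statistic as the
    square of the corresponding standard normal quantile.
    """
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated null distribution.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
```

Values close to 1, as for the evenly spaced p-values here, indicate little inflation of the test statistics.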

Conditional analyses

To identify potentially novel signals, we conditioned the trait-specific association analyses on 344 previously identified variants for the four lipid traits in European, Asian, African, and Hispanic/Latino samples. We observed eight new independent genome-wide significant association signals in total: four for HDL, one for LDL, two for total cholesterol, and one for triglycerides (Additional file 1: Table S4).

Two of the four HDL signals were potentially novel secondary signals as they fell within +/− 500 kb of a known locus, one in APOA5/A1 (rs184637772, MAF = 0.002) and one in DAGLB (rs77071750, MAF = 0.003). The additional six signals were potentially novel primary signals, as they fell outside +/− 500 kb of known loci: two signals associated with HDL, one in SYNE1 (rs78768891, MAF = 0.007) and one in AUTS2 (rs191891263, MAF = 0.003); one signal associated with LDL located near DNAL1 (rs149886784, MAF = 0.002); one signal associated with triglycerides near SMOC2 (rs77635931, MAF = 0.002); and two signals associated with total cholesterol, in CD86 (rs114378860, MAF = 0.007), and near DNAH5 (rs183336356, MAF = 0.002).

Replication of potential novel signals

These potentially novel signals failed to replicate in our combined replication samples from GUARDIAN, the ARIC study, the WHI study, and an existing Mexican ancestry meta-analysis (Table 2). They also failed to replicate when we tested them in each separate meta-analyzed ancestry of the replication samples (i.e. European only, African only, and Hispanic only). Based on the meta-analyzed replication sample, our power to replicate each signal was less than 80%, except for the signal in DAGLB (rs77071750), for which power was 94%. This was partly due to the larger replication sample available for this signal and its slightly higher MAF in the combined replication sample (0.08% versus 0.02% for HCHS/SOL). The signal in DAGLB (rs77071750) nearly reached significance in the combined meta-analyzed sample, with an r-value of 0.08. Figure 1 shows this signal before and after conditioning on the known SNP, rs702485.

Table 2 Replication and overall meta-analysis of 8 variants of interest with the indicated lipid trait
Fig. 1

LocusZoom plots of the potentially novel signal identified in the DAGLB locus (index variant rs77071750) that is independent of the known signal at rs702485. The top half of the figure, “Primary analysis”, shows the -log10 p-values for all variants before conditioning on the known variant. The bottom half of the figure, “Conditioned on known SNPs”, shows the -log10 p-values for all variants after conditioning on the known variant, rs702485.

Heritability estimates of lipid traits

SNP-based heritability was estimated from a variance-component analysis performed with a subset of 10,264 individuals that excluded close familial relatives. The estimated heritability was 22% for total cholesterol, 24% for triglycerides, 24% for HDL, and 21% for LDL (Table 3). These estimates are population-specific, but comparable to what has been reported before using SNP data in large European ancestry populations [6, 8].

Table 3 Heritability estimates for each lipid trait

Generalization

We tested all known signals that had been previously associated with a given lipid trait for generalization in the HCHS/SOL using directional r-values. For total cholesterol, we tested 121 previously published variants in 74 regions. Of these, 36 regions generalized to the HCHS/SOL (Additional file 1: Tables S5, S9), two of which (HLA and PLEC1 regions) were based on secondary signals only. For LDL, we tested 128 published variants in 60 regions (Additional file 1: Tables S6, S9). Of these, 28 generalized to HCHS/SOL, one of which (LDLRAP1) was based on a secondary signal only. For HDL, we tested 139 published variants in 75 regions. Thirty-one of the 75 regions generalized to the HCHS/SOL (Additional file 1: Tables S7, S9), one of which (LILRA3) was based on secondary signals only. Finally, we tested 79 published triglycerides variants in 44 regions (Additional file 1: Tables S8, S9). Of these, 19 regions generalized to the HCHS/SOL, one of which was based on a secondary signal (HLA). In general, 50% of the tested SNPs generalized, both overall and by ancestry. Notably, 11 of the 12 SNPs previously identified in studies of Hispanic/Latino ancestry replicated in HCHS/SOL.

We defined a region as generalized if at least one of the variants in the region generalized to HCHS/SOL. Figure 2 shows the number of generalized and non-generalized regions per trait. A region could have generalized because the primary SNP generalized, because a secondary SNP generalized, or because both did. There were nine regions in which the primary SNP did not generalize, while a secondary SNP did generalize (4 regions for HDL, 2 regions for total cholesterol, 1 region for LDL, 2 regions for triglycerides). In 49 generalized regions, only the primary SNP generalized, but no additional secondary SNP generalized. In the 48 remaining generalized regions, both the primary SNP and at least one secondary SNP generalized. Across all associations (regardless of regions), 43% of the primary SNPs (109 of 253 SNPs) generalized, and 59% of the secondary SNPs (123 of 209 total SNPs) generalized.
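The region-level rule can be expressed directly in terms of r-values. This sketch uses the category labels from Figure 2 and the r-value cutoff of 0.05 stated in the Methods; the function signature and the example r-values are hypothetical.

```python
def classify_region(primary_r, secondary_rs, threshold=0.05):
    """Classify one region by which of its SNPs generalized.

    primary_r: r-value of the primary SNP.
    secondary_rs: list of r-values for secondary SNPs (may be empty).
    A SNP generalizes when its r-value is below the threshold, and a
    region generalizes when at least one of its SNPs does.
    """
    primary_ok = primary_r < threshold
    secondary_ok = any(r < threshold for r in secondary_rs)
    if primary_ok and secondary_ok:
        return "Primary & Secondary"
    if primary_ok:
        return "Only primary"
    if secondary_ok:
        return "Only secondary"
    return "Not generalized"


# Hypothetical r-values for four regions.
labels = [
    classify_region(0.001, [0.2, 0.01]),
    classify_region(0.001, []),
    classify_region(0.3, [0.04]),
    classify_region(0.3, [0.5]),
]
```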

Fig. 2

The number of generalized and non-generalized regions per trait. The generalized regions are divided into those that generalized solely because the primary SNP generalized (while no secondary SNP did): “Only primary”; those that generalized solely because at least one secondary SNP generalized (while the primary SNP did not): “Only secondary”; and those in which both the primary SNP and at least one secondary SNP generalized: “Primary & Secondary”. Regions that generalized are shown in blue and those that did not generalize are shown in grey.

Finally, for each trait, to assess the possibility that additional previously reported associations exist in the HCHS/SOL but could not be generalized due to lack of power, we calculated a score defined by the sum of all trait-increasing alleles of all the non-generalized SNPs. The results were highly significant: the p-values for the genetic scores were 8.6 × 10^-26 for HDL, 1.4 × 10^-29 for LDL, 6.9 × 10^-41 for triglycerides, and 3.5 × 10^-15 for total cholesterol. These results suggest that some of the non-generalized SNPs are truly associated with their respective trait in the HCHS/SOL, but did not generalize individually due to lack of power.

Discussion

We performed a GWAS of four lipid traits in a sample of approximately 12,800 Hispanic/Latino participants of HCHS/SOL. Of the eight potentially novel signals identified by conditioning on known loci from the literature, none replicated in several diverse replication cohorts. We demonstrated that >50% of the previously identified GWAS loci for lipid traits generalized to HCHS/SOL, which has important implications for further study of lipid genetics in Hispanics/Latinos.

We failed to replicate the eight potentially novel signals, possibly because of the very low minor allele frequencies, which ranged from 0.2% to 0.7% in the HCHS/SOL sample. At these frequencies and the identified effect sizes, our replication samples did not have 80% power to replicate these effects. Studies in larger samples of Hispanic/Latino or diverse ancestry participants are needed to further investigate these low frequency variants. The signal that came closest to replicating was a signal in DAGLB (rs77071750 with MAF = 0.003) associated with HDL. rs77071750 lies about 130 kb from the primary signal, rs702485, and is monomorphic in populations of European ancestry but has a frequency of about 4% in populations of African ancestry. The SNP rs77071750 is intronic to GRID2IP, which encodes the glutamate receptor, ionotropic, delta 2 (Grid2) interacting protein that links GRID2 with actin cytoskeleton signaling molecules. As for the other potentially novel SNPs, while rs184637772 did not replicate, it has some interesting biology. It is intronic to APOC3 and has the histone marks H3K4me3 and H3K9ac, indicative of promoter activity in liver and intestine. Other potentially novel SNPs appear less interesting.

Generalization analysis revealed that >50% of previously established lipid variants identified in GWAS of European-, Hispanic/Latino-, East Asian-, or African-descent populations generalized to HCHS/SOL based on an r-value < 0.05, which accounts for agreement in the direction of effect. This fraction is much greater than expected by chance for each trait (binomial test for consistent direction of effect and an r-value < 0.05 reached p-values < 6.5 × 10^-31 for each trait). We might expect that loci that are directionally consistent and generalize are also likely to be functionally involved in lipid biology across diverse population groups. On the other hand, Hispanic/Latino descent populations have a fair amount of European ancestry and this could also be a reason for generalization. Failure to generalize can occur because the power for discovery in the HCHS/SOL was low, due to either low MAF or differences in LD across populations, chip coverage of the relevant locus, or because the originally published variant was a false positive.

We note that of the 12 SNPs (11 for HDL and 1 for triglycerides) that were previously identified in Hispanic/Latino samples, 11 generalized to HCHS/SOL. Ten of the 12 SNPs identified in Hispanic/Latino samples lie in previously established 1 Mb regions identified in Europeans, while two SNPs (both associated with HDL), rs148533712 (in RORA) and rs78557978 (near UGT8), lie in 1 Mb regions first identified in Hispanic/Latino samples. However, neither rs148533712 nor rs78557978 generalized to HCHS/SOL. Of the 10 SNPs that are in previously identified 1 Mb regions (all of which generalized to HCHS/SOL), five associated with HDL are in LD (r2 > 0.5) with another previously identified European SNP or signal (rs2278426 in LOC55908 and the ANGPTL8 known region, rs1532624 near CETP, rs261334 and rs1077835 in/near LIPC, and rs4149310 in ABCA1). Except for rs4149310, all have no remaining signal (p-values > 0.05) after conditional analyses. However, rs4149310 still has some signal remaining, specifically an unconditioned p-value = 1.52 × 10^-12 that becomes p-value = 5.6 × 10^-8 after conditioning on all known SNPs. An additional SNP associated with HDL, rs2472386 in the ABCA1 region, identified in a different Hispanic sample than rs4149310, is in moderate LD with rs4149310 (r2 = 0.6) and in low LD with a nearby European identified SNP, rs2853579 (r2 = 0.3). However, between these 3 SNPs (i.e. rs2853579, rs4149310, and rs2472386) there is a distinct haplotype (TTG) that is identified in the 1000 Genomes phase 3 AMR populations at 1% frequency and not found in the EUR populations. Two SNPs associated with HDL, rs9282541, a missense variant in the ABCA1 gene, and rs11216230, an intronic variant in the SIK3 gene (also in the APOA5/APOA1 known region), are Hispanic-specific signals and independent of all other known signals in their respective regions [12, 13]. We replicate these two signals for the first time in another Hispanic sample. Within the APOA5/APOA1 region for HDL, rs11216230 is the only SNP in a secondary signal that generalized. The other 4 SNPs in secondary signals did not replicate. Finally, a Hispanic-specific signal in the CILP2 region, rs8102280, an intronic variant in the MAU2 gene, generalized to the HCHS/SOL sample. This signal has been replicated previously [9].

Conclusions

In summary, we did not identify any novel lipid-associated loci, but did demonstrate that greater than half of previously identified GWAS loci for lipid traits generalized to HCHS/SOL. These results suggest that the genetic architecture of lipid levels includes several loci that are shared across different population groups. It is also possible that the generalization of European signals is due to the large fractions of European ancestry in Hispanics. Larger sample sizes are required for further investigations of potentially novel loci in Hispanic populations.