Background

Triglyceride (TG) levels are a marker for health outcomes. High TG, (defined as 200 < TG < 500 mg/dL in adults), are associated with, and may cause, cardiovascular disease (CVD) [1,2,3,4]. High TG is also associated with type II diabetes and metabolic syndrome [5, 6]. TG levels are affected by environmental variables such as diet, alcohol consumption, smoking, and exercise [7,8,9,10,11]. Maintaining healthy levels of TGs may improve health outcomes [6, 12].

Both hypo- and hypertriglyceridemia (HTG) have major genetic components such as heterozygous variation at APOC3 and APOA5, respectively [13,14,15]. More recently, homozygous and compound heterozygous null variation at ANGPTL3 and ANGTPL4 have been shown to be associated with low TG [16, 17]. SNVs in the genes APOE, GCKR, APOA1/C3/A4/A5 gene cluster, INSR and CETP have smaller effects on TG [18,19,20].

We previously identified linkage and association of a rare, highly conserved missense SNV, rs762174003, in SLC25A40 with HTG in a large family, and replicated the association with rare missense SNVs at highly conserved sites in SLC25A40 in a European American (EA) and African American (AA) sample [21]. SLC25A40 encodes a mitochondrial transmembrane protein [22]. In drosophila, SLC25A40 loss-of-function was shown to contribute to mitochondrial damage due to excessive oxidation [23]. Here, we attempt to replicate the association between deleterious variation at SLC25A40, as well as other reported associations with variation at APOE, GCKR, APOA1/C3/A4/A5 gene cluster, INSR, and CETP, with TG in a multi-ancestry population from the Electronic Medical Records and Genomics (eMERGE) Network phase 3 study [24, 25].

Methods

Participants and phenotype data

Participants were ascertained from each of nine phase 3 eMERGE sites, a study tasked with investigating return of results from the sequencing of a panel of ~ 100 genes in a large (N = 25,000), diverse cohort of participants [24]. Participants ages 18 years and older were included, as the distribution for TG levels in children differs from that of adults. We ascertained relevant patient data from the clinical records, including TG level measured in mg/dL (N = 12,229 adult participants), age, sex, weight, height and body mass index (BMI), measured at the same clinical visit. We also ascertained relevant ICD9 and ICD10 codes for lipidemias, pancreatic cancer, and pancreatitis. Thirteen participants with chylomicronemia (ICD9 code 272.3, ICD10 code E78.3), 12 participants with Kaposi’s sarcoma (ICD9 code 176 and ICD10 code C46) and 2,539 participants with morbid obesity (ICD9 code 278.01 and ICD10 code E66.01) or bariatric surgery were removed as their TG values are likely associated with unaccounted for exposures. Quality control was performed within participants as well. Participants who had maximum TG (maxTG) < 40 and had more than one record verifying this low TG were retained. However, one participant who had a single record with TG = 19 mg/dL was removed. Records within 2 weeks prior or 4 weeks after a diagnosis of pancreatitis (ICD9 code 577.0, ICD10 code K85.9) for an individual were removed (382 participants), as high TG are correlated with pancreatitis and is likely due to an environmental exposure such as alcohol. Similarly, individual level records coincident and after a diagnosis of pancreatic cancer were removed (147 participants). Quality control of BMI was performed using height and weight data as described by Goodloe et. al [26]. For participants with only BMI data (i.e., no height or weight data), values more than 5 standard deviations from the participant specific median BMI were removed.

Genotype data

A subset of individuals (N = 24,956) in eMERGE were genotyped using a custom capture target containing sequence for 788 genes with partial to full exon coverage, including 58 of the 59 genes considered actionable by the American College of Medical Genetics (ACMG) [27, 28], and detects 62,051 SNVs. Sequencing was performed at two sites: Partners Healthcare Laboratory for Molecular Medicine (LMM) and Baylor. Samples from dropped out participants, that failed quality control or had sex discrepancy were removed (N = 229). After removing duplicate data for a single participant, sequence data for all participants was aligned to a merged target sequence map, using the Burrows-Wheeler Aligner version 0.7.10 [29]. Genotypes were jointly called on all participants using the Genome Analysis Toolkit version 3.5 [30]. 896 Variants with genotype missingness rate > 5% were removed. Individual level genotyping rate was > 98% for all participants. After cleaning the data, 8970 participants with both phenotype and genotype data remained in the analysis. Four additional participants were removed due to their influence on the residual model (see results).

Principal components (PC) of ancestry was calculated on a pruned set of 1571 SNVs (r2 < 0.7, MAF > 0.05). Self-reported ancestry aligned well with the first two PCs.

Exon sequence data was available for the relevant genes SLC25A40, APOC3, and APOA5. Two individuals with known pathogenic variants in APOA5 (rs201079485, rs147528707) were removed from analysis. Exon sequence data was also available for known TG risk SNVs rs429358 and rs7412 (which determine the APOE ε2/3/4 genotype), rs7248104 (INSR), rs1260326 (GCKR), rs7205804 (CETP), and rs964184 (APOA1/C3/A4/A5 gene cluster).

Analysis

The phenotype of interest, medTGres, was calculated as the residuals from the median log10 transformed TG for each individual, adjusted for median age, sex, median BMI, presence of hypertriglyceridemia (HTG, ICD9 code 272.1 and ICD10 code E78.1), presence of hyperlipidemia (HL, ICD9 codes 272.2, 272.4, 272.9 and ICD10 codes E78.2, E78.4, E78.9), APOE genotype, known risk SNVs, known APOC3 protective SNVs, the first two principal components (PCs) of ancestry, and site. The variable for median age was centered around age 50, as this is when TG levels tend to descend, and then squared, as the relationship between TG levels and age is parabolic. We log10 transformed TG as the distribution is highly skewed and the transformation makes it nearly Normal. We used the median transformed TG (medTG) value over all records for each individual because the median protects against outliers that could be due to non-fasting measurements. The resulting residuals from the above linear model (medTGres) were used as the primary phenotype. We also used a similar residual from the adjusted maximum log10 transformed TG (maxTGres), as using the maximum protects against errors introduced from treatment with Niacin, fibrates or binders for high TG. Given our more complete local data from the University of Washington and Kaiser Permanente Washington (UWKP) site, we expect about 3% of the participants to be receiving such treatment at some time point in their clinical records [31]. However, this data was not available to us study-wide.

In addition to assessing the significance of the known risk variation, we analyzed variants in SLC25A40 that are likely to affect TG levels [21]. This included rare (maximum population MAF < 0.005) variants that either caused an early termination in the first 90% of the coding region, or changed the amino acid sequence. Missense variants were further constrained to be evolutionarily conserved (Genomic Evolutionary Rate Profiling (GERP) score > 4.8) [32]. The maximum population MAF was derived from populations in the gnomAD database that have not experienced a bottleneck (i.e., ignored MAF from Ashkenazi Jewish or Finnish populations) [33, 34]. We performed four separate gene-wise tests for association with medTGres and maxTGres using 1. Genotype at single SNVs with more than 10 heterozygotes, 2. All other SNVs collapsed into a single indicator variable, 3. All other rare missense SNVs collapsed into a single indicator variable and 4. All other early terminations collapsed into a single indicator variable.

Results

Demographics

Demographics of the participants included in these analyses are given in Table 1. There were slightly more females than males at most sites. The distribution of age varied widely among the sites, with younger participants from Children’s Hospital of Philadelphia (CHOP) and Cincinnati Children’s Hospital Medical Center. Median BMI tended to be above normal (> 25), but there were some participants with low BMI (< 18.5) (Columbia, Mayo). There were 51 participants with very low median BMI < 11, all from Mayo, with at least 3 records supporting this measurement. The median TG ranged between 19 and 1371 mg/dL over all participants, but the majority of participants had median TG < 150 mg/dL, which is defined as normal (Fig. 1). The number of TG records available per participant varied widely, with a median of 6 overall. Most participants were of European ancestry (77%) followed by African ancestry (7%) and Asian ancestry (6%) (Table 2, Fig. 2). Most of the participants of Asian ancestry (82%) were from UWKP and most of the participants of African ancestry (62%) were from Northwestern and Cincinnati.

Table 1 Demographics by site
Fig. 1
figure 1

Distribution of median triglyceride by site. The horizontal line at 150 mg/dL represents the maximum normal level for an adult

Table 2 Distribution of ancestry by site
Fig. 2
figure 2

First and second principal components of ancestry. AS Asian ancestry, EA European ancestry, AA African ancestry, UN unknown ancestry

RiskSNVs

Genotype counts for APOE and risk SNVs at APOC3, APOA5, APOA1/C3/A4/A5 gene cluster and GCKR are given in Tables 3 and 4. The distribution of APOE genotypes follows expectations, given known allele frequencies. Sixty-two participants were heterozygous for 1 of 4 protective APOC3 SNVs (rs76353203, rs147210663, rs138326449, rs140621530). Eighty-three participants had at least one of two risk alleles at APOA5 SNV rs2075291.

Table 3 APOE genotypes counts and percentages derived from rs429358 and rs7412, by site and ancestry
Table 4 Risk SNV genotype counts by site and ancestry

SLC25A40 genotype data

Twenty-one rare SNVs, consisting of 2 stop gains, 3 frame shifts and 16 evolutionarily conserved missense variants, were heterozygous in the data set for 32 participants (Table 5). There were 119 participants who were heterozygous at the missense rs724665 and this SNV is considered separately from the others. Each participant was heterozygous for at most one of the SNVs. Three participants were heterozygous at the frame shifts and 2 participants were heterozygous at the stop gains, limiting the power of the tests.

Table 5 SLC25A40 variants which are rare and cause a coding change

Residual model

The final residual model, containing only significant covariates is shown in Table 6. Although centered-median-age-squared was significantly associated with log10(medTG), its effect is near zero (β = − 4e−05). Males had higher log10(medTG) (β = 0.02). Increased BMI was associated with higher log10(medTG) (β = 0.01). Similarly, individuals with HTG or HL had higher log10(medTG) (β = 0.3 and 0.08, respectively) than other participants. SITE was also highly associated with log10(medTG) (F-test p < 2e−16). Neither the INSR or CETP SNVs were significantly associated with median or maximum TG (p > 0.16), and were not included in the model (data not shown). Four potentially influential outlying participants (as defined by their cook’s distance [35]) in either model (medTGres or maxTGres) were removed, leaving a total of 8,966 participants with complete phenotype and genotype data for analysis. None of the participants with low BMI (median BMI < 15) appeared to influence the models.

Table 6 Effect of covariates in linear model used to adjust median TG

Association with known TG risk SNVs

Previously published risk SNVs at APOE, GCKR, APOA5 and APOC3 were associated with TG (Table 6). The minor alleles at the APOE SNVs rs429358 and rs7412 were associated with increased log10(medTG) (β = 0.01 and 0.04, respectively) and are jointly significantly associated with log10(medTG) (F-test p = 1.1e−11). Under an additive model, the minor alleles at GCKR and APOA1/C3/A4/A5 gene cluster were also significantly positively associated with log10(medTG) (GCKR: β = 0.03, p < 2e−16, APOA1/C3/A4/A5 gene cluster: β = 0.04, p < 2e−16). Under a dominant model, presence of either minor allele at APOA5 was positively associated with log10(medTG) (β = 0.09, p = 2.7e−05). Jointly, a single copy of any of the minor alleles at the four APOC3 SNVs were negatively associated with log10(medTG) (β = -0.23, p = 1.3e−21).

Primary analysis

None of the SLC25A40 SNVs were significantly associated with medTGres, either separately or in combination (p > 0.4, Table 7, Fig. 3). Furthermore, the estimated effect of the variants was negative in all situations (rs724665 β = − 0.01; all SNVs excluding rs724665 β = − 0.02; all rare missense β = − 0.01; rare early terminations β = − 0.07). Similar results were observed for maxTGres and for analysis including the EA-only subset.

Table 7 Gene tests for SLC25A40 association with medTGres
Fig. 3
figure 3

Association between variation at SLC25A40 and medTGres. a rs724665; b all rare variation; c all rare missense variation; d all rare stop gains and frameshifts. Width of the boxplots is proportional to sample size

Discussion

We replicated association between TG and risk SNVs at APOE, GCKR, APOA5 and APOC3, with the respective directions of effect as expected. Association between variation at APOA5 and APOC3 with TG has been well documented, and our ability to detect these effects supports the reliability of these TG data. Similarly, replication of an association between APOE and TG lends further support to the association of APOE ε2/3/4 and lipid traits in general. The association between GCKR and TG further supports a biological mechanism between diabetes and TG [36]. However, we did not replicate the reported association between TG and risk SNVs at INSR and CETP. The association between INSR rs7248104 and TG was first reported by a large GWAS with a sample size of 176,000 subjects [18]. It is possible that the sample size here is too small to detect any effect. Similarly, the association with CETP rs7205804 was discovered in a cohort of 95,000 subjects [37], and therefore it is possible that the sample size here is too small to detect any effect with TG.

Additionally, we failed to replicate an association between putatively deleterious variation at SLC25A40 and TG. Previous studies relied on linkage in a large family and data from the Exome Sequencing Project (ESP). It is possible that the original signal found in the family is due to another variant in linkage with rs762174003, but unrelated to the function of SLC25A40. Further research into this locus may benefit from whole genome sequencing of selected individuals from this family. However, this would not explain why an association was detected in the relatively small sample from the ESP. Results for maximum TG also showed no association.

It is also possible that there is a relationship between SLC25A40 and TG, but we are not able to detect it in this sample. One major caveat is that we are missing data on treatment for high TG. In our local data set, UWKP, where treatment data were available, we discovered approximately 3% of individuals were being treated for high TG with niacin, fibrates or binders [31]. When we accounted for these treatments in the local data we found evidence supporting an association between TG and rare early termination and conserved missense variation at SLC25A40 [31]. As the number of participants who harbor such a variant in SLC25A40 is small (1.7% of our sample), an unaccounted for 3% treatment rate could obscure any signal. Although we used median TG to reduce the effect of unknown treatment, it may have been insufficient as we do not know when treatment would have begun. We did not use the mean as it can be influenced by outliers and TG have a right skewed distribution. Furthermore, this is secondary use of data, which is prone to a high error rate in the data. Although we attempted to clean the data using multiple strategies, it is possible that enough error remained in the data to obscure any true association with SLC25A40. In addition, the small numbers of heterozygotes that we observe in this study limit the power of our tests.

Conclusion

Although we replicated known, strong associations with TG, these data do not replicate the previously reported association between SL25A40 variation and TG. Larger datasets with more complete data on fasting and medication use may be required to further explore this association. Furthermore, use of secondary data, such as EHR data, needs extensive quality control and would benefit from more comprehensive data extraction methods.