Background

Elevated circulating triglyceride (TG) levels are a well-established risk factor for cardiovascular disease (CVD), with accumulating evidence supporting a causal role [1,2,3]. Other CVD-associated lipid profiles include elevated low density lipoprotein cholesterol (LDLC) and total cholesterol (TC), and low levels of high density lipoprotein cholesterol (HDLC) [4, 5]. The genetic architecture of lipid levels is polygenic [6], comprising rare variants with high penetrance (e.g., LDLR mutations in familial hypercholesterolemia) [7, 8], and the aggregate effects of common variation that can be captured in a polygenic risk score (PRS) [9, 10].

Among the genetic contributors to lipid disorders are rare copy number variants (CNVs). Classically, these have been confined to CNVs that impact a single well-established dyslipidemia gene (e.g., LDLR, LPL) [11]. Recently, however, we demonstrated that the 22q11.2 microdeletion confers an approximately 2-fold increased risk for mild-moderate hypertriglyceridemia (HTG; TG levels 1.7–10.0 mmol/L) compared to general population risk [12]. Related conditions also associated with the 22q11.2 deletion include type 2 diabetes [13] (T2D) and obesity [14]. To our knowledge, this is the only recurrent multigenic CNV to be associated with elevated TG levels that does not overlap an established TG metabolism gene. The 22q11.2 microdeletion, with estimated live birth prevalence of 1 in 2148 [15], defines the 22q11.2 deletion syndrome (22q11.2DS), and has proven utility as a genetic model to study associated common complex conditions [16]. Its role as a genetic model includes serving as a platform to investigate the interplay between rare and common genetic variation, an area that has become of intense research interest and potential clinical relevance [17,18,19,20,21,22]. For example, recent studies have demonstrated that additional genome-wide variation, rare CNVs and schizophrenia PRS, can contribute to likelihood of schizophrenia expression, where there is baseline > 20-fold increased risk conferred by the 22q11.2 deletion [18, 22, 23].

In this study, we aimed to assess the additional genomic and phenotypic contributors to lipid levels (TG, HDLC, LDLC, TC) in individuals with a 22q11.2 microdeletion, with a focus on TG levels given the associated elevated baseline risk for mild-moderate HTG [12]. We studied a unique, deeply phenotyped adult cohort of individuals with 22q11.2DS for whom both lipid level and genome sequencing data were available (Additional file 1: Figure S1). The main goal of the study was to test whether lipid PRSs derived from the general population are associated with lipid levels (TG, LDLC, HDLC, and TC) in individuals with a 22q11.2 microdeletion. In exploratory post-hoc analyses, we constructed receiver operating characteristic (ROC) curves to assess the predictive value of the TG-PRS and other clinical variables for mild-moderate HTG status. We also examined the role of additional genome-wide clinically relevant rare variants, and assessed the overall contribution of candidate gene-based rare variants to lipid levels.

Methods

22q11.2DS cohort and clinical variables

This study involved a well-characterized cohort of adults with a typical 22q11.2 microdeletion ascertained from a specialized 22q11.2DS clinic in Toronto, Canada. Typical 22q11.2 deletions were identified through standard clinical laboratory methods [13, 24] and precise 22q11.2 deletion extents confirmed using genome sequencing data (see Additional file 1: Table S1 for details).

To be included, participants had to have at least one recorded circulating lipid level of TC, TG, LDLC, and/or HDLC (Additional File 1: Figure S1), obtained from routine clinical bloodwork assessments. Measurements were taken predominantly in the non-fasting but not post-prandial state, as this was most feasible for this patient population [12]. For most individuals we used their most recent bloodwork. LDLC levels were calculated using the Friedewald equation. However, where an LDLC level was unavailable because a high TG level would make the Friedewald estimate inaccurate [25], we used records of LDLC levels at other time points when available. No LDLC levels were calculated using the Friedewald equation when TG levels were > 4.52 mmol/L, consistent with previous lipid genetics studies [9, 26].
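As a minimal sketch of the LDLC rule described above, the following Python function applies the Friedewald equation in mmol/L and declines to estimate LDLC above the stated TG cut-off. The TG/2.2 term is the commonly used mmol/L form of the equation and is an assumption here, as the text does not spell out the formula; the function name and example values are hypothetical.

```python
from typing import Optional

TG_CUTOFF_MMOL_L = 4.52  # cut-off stated in the text


def friedewald_ldlc(tc: float, hdlc: float, tg: float) -> Optional[float]:
    """Estimate LDL cholesterol (mmol/L) with the Friedewald equation.

    Returns None when TG exceeds the cut-off, because the estimate is
    considered unreliable at high TG levels.
    """
    if tg > TG_CUTOFF_MMOL_L:
        return None
    # Assumed mmol/L form: VLDL cholesterol is approximated as TG / 2.2.
    return tc - hdlc - tg / 2.2


# Made-up example values (mmol/L), not study data.
print(friedewald_ldlc(tc=5.2, hdlc=1.3, tg=1.7))
print(friedewald_ldlc(tc=6.0, hdlc=1.0, tg=5.0))  # TG above cut-off: no estimate
```

Returning no value above the cut-off mirrors the study's choice to fall back on LDLC records from other time points rather than to compute an unreliable estimate.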

Additionally, we assessed other traits known to influence lipid levels or genetic background, including sex, age, BMI, T2D, psychotic illness [27, 28], and ancestry. T2D was defined as having a hemoglobin A1c value ≥ 6.5% and/or diagnosed with T2D as indicated by medical records. We defined “psychotic” as individuals diagnosed with schizophrenia or schizoaffective disorder; all other individuals were deemed “non-psychotic”. European versus non-European ancestry was assigned using principal component analysis (PCA) of common genetic variants (Additional file 1: Figure S2), which showed complete concordance with pedigree-derived information.

For details on genome sequencing methods and variant annotation, see Additional file 2: Supplementary Methods.

Polygenic risk score analyses

We used lipid PRSs that were previously constructed in a UK Biobank study [29] (PGS Catalog publication ID: PGP000263), with a development sample of 391,124 European individuals using the penalized regression (bigstatsr) method. Genotype positions and effect sizes for the TG (PGS001979), HDLC (PGS001954), LDLC (PGS001933), and TC (PGS001895) PRSs were retrieved from the PGS Catalog [30] (Additional file 1: Table S3). Individual-level PRSs for the study cohort were calculated using PRSice-2 following quality control (QC) (Additional File 1: Figure S3).

We tested for associations between each PRS and its corresponding lipid level using linear regression in 1) a univariable model and 2) a multivariable model that adjusted for other key phenotypic variables, batch (TCAG vs IBBC cohort and sequencing platform), and the first four principal components (PCs) of ancestry.

  • 1) lipid level ~ lipid PRS

  • 2) lipid level ~ lipid PRS + sex* + age + BMI + T2D* + psychotic illness* + cohort + sequencing platform + PC1–PC4

  * binary variable

Binary variables were coded as 0 or 1 and all values were standardized using the scale() function in R to produce standardized beta coefficients. For regression analyses, TG levels were natural log transformed to approximate a normal distribution, as done previously [8, 9, 31] (Additional File 1: Figure S4). For individuals on statins, LDLC and TC levels were divided by 0.7 and 0.8, respectively, to adjust for the cholesterol-lowering effects of these medications, as done previously [8, 9, 32]. The variance in lipid level explained by each multivariable model was measured using the multiple R² metric. The variance in lipid levels explained by the PRS variable alone in a multivariable model (i.e., ΔR²) was calculated as the difference in the multiple R² between the multivariable model with the PRS variable (full model) and the model without it (covariate-only model). Additionally, we tested for a TG-PRS × BMI interaction by adding this interaction term both to a model that included TG-PRS and BMI as the other independent variables and to the multivariable model (2) (Additional file 1: Table S4).
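The preprocessing and ΔR² steps described above can be sketched as follows. The study reports using R; this is a standard-library Python illustration only, with hypothetical function names, a small hand-written least-squares solver, and made-up toy data that do not come from the study.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Center and scale to unit sample SD (the convention of R's scale())."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def adjust_for_statin(level, on_statin, divisor):
    """Approximate the untreated level: divisor 0.7 for LDLC, 0.8 for TC."""
    return level / divisor if on_statin else level


def ols_r2(columns, y):
    """Multiple R^2 of y regressed on the predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = mean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def delta_r2(prs, covariates, y):
    """R^2 of the full model minus R^2 of the covariate-only model."""
    return ols_r2([prs] + covariates, y) - ols_r2(covariates, y)


# Toy data (made up for illustration; not from the study).
tg_mmol = [0.9, 1.4, 2.1, 1.1, 3.0, 1.8, 2.6, 1.2]
prs = [-1.2, -0.3, 0.8, -0.9, 1.5, 0.1, 0.9, -0.6]
bmi = [22.0, 27.5, 31.0, 24.0, 33.5, 26.0, 29.0, 30.5]

y = zscore([math.log(t) for t in tg_mmol])  # natural-log TG, then standardize
print(round(delta_r2(zscore(prs), [zscore(bmi)], y), 3))
print(adjust_for_statin(2.8, on_statin=True, divisor=0.7))  # LDLC on a statin
```

Because the covariate-only model is nested within the full model, ΔR² computed this way cannot be negative; it isolates the share of variance that the PRS explains beyond the covariates.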

Receiver operating characteristic (ROC) curve analyses

Given the elevated baseline risk for mild-moderate HTG for individuals with 22q11.2DS, we constructed ROC curves to classify mild-moderate HTG status based on logistic regression models using TG-PRS, sex, and BMI as predictor variables, independently or in various combinations (TG-PRS + BMI, TG-PRS + sex, BMI + sex, TG-PRS + BMI + sex). Logistic regression models were implemented using the glm() function in R, and all visualizations and analyses related to ROC curves were done using the R package “pROC” [33]. DeLong’s test for two correlated ROC curves was used to test for a difference between the areas under the curve (AUCs) of two ROC curves, and the optimal sensitivity and specificity of each ROC curve were determined using Youden’s J statistic. Confidence intervals for AUCs were calculated using 2000 bootstrap replicates.
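To make the two ROC summary quantities concrete, here is a minimal Python sketch of the AUC and of the Youden-optimal cut-off. It is not the pROC implementation used in the study: the function names are hypothetical, the predicted probabilities are made up, and DeLong’s test and bootstrap confidence intervals are omitted.

```python
def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties count one half; this equals the scaled Mann-Whitney U statistic.
    """
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def youden_optimum(scores, labels):
    """Return (J, threshold, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1, calling 'case' when score >= threshold."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= t)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < t)
        sens, spec = tp / n_case, tn / n_ctrl
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best


# Toy predicted probabilities (made up); 1 = mild-moderate HTG, 0 = not.
probs = [0.15, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.85]
status = [0, 0, 1, 0, 1, 0, 1, 1]

print(auc(probs, status))
print(youden_optimum(probs, status))
```

In practice the scores would be the fitted probabilities from each logistic regression model, so that one AUC and one optimal cut-off are obtained per model.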

Rare variant analyses

To prioritize variants for assessment of clinical relevance with respect to their relationship to causing extreme lipid levels (i.e., high TG, LDLC, HDLC, and low HDLC), we restricted to variants affecting protein coding or splicing regions that are (1) very rare (gnomAD PopMax filtering allele frequency < 0.2%), (2) loss of function (LoF) or predicted damaging missense, and (3) within genes relevant to lipid levels that are part of a targeted next generation sequencing (NGS) panel (n = 33 candidate genes) used at a specialized genetics clinic for lipid metabolism disorders in London, Ontario [34] (Additional file 1: Table S5). Prioritized rare variants were then assessed using the American College of Medical Genetics and Genomics (ACMG) variant interpretation guidelines [35] or LDLR-specific guidelines developed by ClinGen [36]. For further details on variant prioritization, see Additional File 2: Supplementary methods.
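The three prioritization criteria above amount to a simple conjunctive filter, sketched here in Python. Everything about the data model is an assumption made for illustration: the `Variant` fields, the consequence labels, and the two-gene subset standing in for the 33-gene panel (the full list is in Additional file 1: Table S5).

```python
from dataclasses import dataclass

# Illustrative subset only: two panel genes named in the text (the panel has 33).
PANEL_GENES = {"LDLR", "CREB3L3"}
# Assumed consequence labels; real annotation tools use their own vocabularies.
LOF_CONSEQUENCES = {"frameshift", "stop_gained", "splice_site"}
MAX_POPMAX_FAF = 0.002  # 0.2%, the threshold stated in the text


@dataclass
class Variant:
    gene: str
    popmax_faf: float  # gnomAD PopMax filtering allele frequency, as a fraction
    consequence: str
    predicted_damaging: bool = False  # only meaningful for missense variants


def is_prioritized(v: Variant) -> bool:
    """Apply criteria (1) very rare, (2) LoF or damaging missense, (3) panel gene."""
    very_rare = v.popmax_faf < MAX_POPMAX_FAF
    damaging = v.consequence in LOF_CONSEQUENCES or (
        v.consequence == "missense" and v.predicted_damaging
    )
    return very_rare and damaging and v.gene in PANEL_GENES


# Made-up candidate variants, not study data.
candidates = [
    Variant("LDLR", 0.00001, "missense", predicted_damaging=True),
    Variant("CREB3L3", 0.0001, "frameshift"),
    Variant("LDLR", 0.01, "missense", predicted_damaging=True),  # too common
    Variant("LDLR", 0.00001, "synonymous"),  # not damaging
    Variant("GENE_X", 0.00001, "frameshift"),  # hypothetical off-panel gene
]
kept = [v for v in candidates if is_prioritized(v)]
print([(v.gene, v.consequence) for v in kept])
```

Variants passing such a filter are only candidates; in the study they were then classified individually under the ACMG or ClinGen LDLR-specific guidelines.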

Additionally, we sought to assess whether being a carrier of a rare variant, including those with potentially smaller effect sizes that are not considered pathogenic/likely pathogenic per ACMG criteria, would be associated with altered lipid levels (Additional file 1: Table S5). An association between “rare variant carrier status” and lipid levels was assessed using the same univariable and multivariable linear regression models as for PRS analyses, but with the rare variant carrier status variable in place of the PRS variable. For additional details on the filtering criteria for rare variants for this analysis, see Additional File 2: Supplementary Methods.

All statistical analyses were performed using R version 4.0.3. Statistical significance was defined as p < 0.05. P-values were not adjusted for multiple testing.

Results

Cohort description and demographic and phenotypic predictors of lipid levels

Table 1 summarizes the clinical and demographic features of the 151 of 157 individuals with a 22q11.2 microdeletion who had genome sequencing data that passed common variant quality control (Additional file 1: Figure S1). As expected from previous results for an overlapping sample (Additional file 1: Table S2), 45.0% of the cohort with TG data available (n=67 of 149) had mild-moderate HTG (TG 1.7–10.0 mmol/L), representing an approximately twofold increase in risk compared to an age-matched general Canadian population prevalence of 21.6% [12]. Males in particular had a significantly higher prevalence of HTG (60.8% vs 29.3%, Fisher’s exact test p = 1.44E-04) and significantly lower average HDLC levels (0.99 mmol/L vs 1.31 mmol/L, Wilcoxon test p = 3.02E-09) (Table 1).
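The "approximately twofold" figure follows directly from the two prevalences quoted above, as this short Python check shows. The variable names are ours; the counts and the reference prevalence are those given in the text.

```python
cases, total = 67, 149          # individuals with mild-moderate HTG / with TG data
reference_prevalence = 0.216    # age-matched general Canadian population [12]

cohort_prevalence = cases / total
prevalence_ratio = cohort_prevalence / reference_prevalence

print(f"{cohort_prevalence:.1%}, ratio {prevalence_ratio:.2f}")  # 45.0%, ratio 2.08
```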

Table 1 Lipid and other phenotypic/clinical variables, and sex effects, in adults with a 22q11.2 microdeletion

Polygenic risk score and other predictors of lipid levels

We first examined whether lipid PRSs (for TG, HDLC, LDLC, and TC) that were previously developed in an entirely European general population cohort (aged 40–69 years) [29] (Additional file 1: Table S3) would be significantly associated with their corresponding lipid level in a relatively younger (median age 34 years, range 17–64 years), predominantly (89.4%) European, and smaller (n = 149–151) cohort of individuals with a 22q11.2 microdeletion. In univariable models, each PRS was a significant predictor (all p < 0.01) of its corresponding lipid level. However, the multivariable linear regression models that adjusted for sex, age, BMI, T2D, psychotic illness, ascertainment cohort, sequencing platform, and the first four principal components (PCs) of ancestry (Table 2, Additional file 1: Figure S7) were significant overall only for TG and HDLC levels; the overall models were non-significant for LDLC and TC, where only the PRS variable appeared to have a significant effect. Results were similar when restricting to only individuals of European ancestry (n = 134–136), with no notable changes in effect sizes of the respective PRSs (Additional file 1: Table S6).

Table 2 Linear regression analyses testing lipid polygenic risk score (PRS) as a predictor of its corresponding lipid level, in a univariable model and in a multivariable model accounting for phenotypic, batch, and ancestry variables

The TG-PRS variable alone (beta = 0.313, p = 1.52E-04) explained 8.4% of the variance (ΔR²) in TG levels in the multivariable model (model R² = 24.7%, model p = 7.26E-05) (Table 2), with male sex and higher BMI also significant independent predictors of higher TG levels (Table 2).

Univariable linear regression analyses performed separately in those with and without obesity (BMI ≥ 30 kg/m²) revealed that the effect size of the association between the TG-PRS and TG levels was greater in those with obesity (beta = 0.4617) than in those without obesity (beta = 0.1778) (Fig. 1). Further testing for a difference between these effect sizes (i.e., the slopes of the regression lines), by adding an interaction term (TG-PRS × BMI) to a linear regression model that included TG-PRS and BMI as predictors of TG levels, identified a significant interaction (beta = 0.179, p = 0.045) (Additional file 1: Table S4). However, the interaction term did not reach significance when adjusted for the other 10 variables included in the multivariable models (beta = 0.168, p = 0.059) (Additional file 1: Table S4).
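The subgroup comparison and the interaction term can be illustrated with a small Python sketch. The records are made up and the slopes are unstandardized, so the numbers are not comparable to the standardized betas reported above; the function and variable names are hypothetical.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Toy records (made up): (TG-PRS, BMI in kg/m^2, natural-log TG).
records = [
    (-1.0, 24.0, 0.00), (0.0, 26.5, 0.10), (1.0, 22.0, 0.20), (2.0, 28.0, 0.30),
    (-1.0, 31.0, 0.20), (0.0, 34.5, 0.60), (1.0, 30.0, 1.00), (2.0, 36.0, 1.40),
]


def subgroup_slope(rows):
    """Slope of ln TG on TG-PRS within one subgroup."""
    return slope([r[0] for r in rows], [r[2] for r in rows])


obese = [r for r in records if r[1] >= 30]
non_obese = [r for r in records if r[1] < 30]
slope_obese = subgroup_slope(obese)
slope_non_obese = subgroup_slope(non_obese)
print(slope_obese, slope_non_obese)

# The interaction predictor is the element-wise product of the two variables;
# it is added as an extra column alongside TG-PRS and BMI in the regression.
interaction = [prs * bmi for prs, bmi, _ in records]
```

A steeper slope in one subgroup is only descriptive; the formal test is whether the coefficient of the product term differs from zero in the model that also contains both main effects.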

Fig. 1

Scatterplot and linear associations between the triglyceride (TG) polygenic risk score (TG-PRS) and TG levels for n = 149 adults with a 22q11.2 microdeletion by obesity classification. Fitted lines were generated using linear regression performed within two sub-groups, with (orange) (beta = 0.4617) and without (blue) (beta = 0.1778) obesity defined as BMI ≥ 30 kg/m² (unadjusted TG-PRS × BMI interaction beta = 0.179, p = 0.045). TG levels are natural log (ln) transformed. The dashed black line indicates the natural log transformed value of the lower-bound clinical cut-off that defines mild-moderate HTG (1.7 mmol/L); individuals above this line would be classified as having mild-moderate hypertriglyceridemia

Using triglyceride polygenic risk score (TG-PRS), BMI, and sex to classify mild-moderate hypertriglyceridemia

In post-hoc analyses we constructed ROC curves using logistic regression models to assess the ability of the TG-PRS, along with significant clinical predictors of TG levels (sex and BMI), to discriminate between those with and without clinically defined mild-moderate HTG (TG levels 1.7–10.0 mmol/L). Among the univariable models, sex had the highest AUC (0.659; 95% CI 0.582–0.736), with BMI and TG-PRS having slightly lower AUCs (Fig. 2, Additional File 1: Table S7). Combining all three variables in one model resulted in a significantly higher AUC (0.7486) than each univariable model (vs TG-PRS p = 0.0023, vs BMI p = 0.0089, vs sex p = 0.0024), achieving an optimal sensitivity and specificity (using the Youden Index) of 0.746 and 0.707, respectively (Additional File 1: Tables S7 and S8). Furthermore, we tested whether the addition of TG-PRS to each clinical predictor alone (i.e., sex vs sex + TG-PRS, BMI vs BMI + TG-PRS) or combined (i.e., sex + BMI vs sex + BMI + TG-PRS) would improve prediction (Additional File 1: Figure S8). In each case, the addition of TG-PRS marginally increased the AUC, but the difference did not reach statistical significance (Additional File 1: Table S8).

Fig. 2

Receiver operating characteristic (ROC) curves of logistic regression models using each of triglyceride polygenic risk score (TG-PRS; red), BMI (brown), sex (purple), and the combination of these three variables (blue), as predictors of mild-moderate hypertriglyceridemia in adults with 22q11.2DS. The area under the curve (AUC) and 95% confidence intervals (95% CI) for each curve are shown in the figure key on the bottom right

We also assessed the prevalence of mild-moderate HTG within each decile of TG-PRS (Additional File 1: Figure S9). Individuals in the top decile had a non-significantly higher prevalence of mild-moderate HTG (64.3%; n = 9 of 14) compared to individuals in the lowest decile (26.7%; n = 4 of 15) (Fisher’s exact test p = 0.0656, as recomputed from these counts).
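For readers who wish to recompute the decile comparison from the counts given above, the following Python sketch implements a two-sided Fisher’s exact test with the standard library. It is an illustration under our own naming, not the authors’ code; it sums the probabilities of all tables that are no more likely than the observed one, which is the two-sided convention that R’s fisher.test is documented to use.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Counts from the text: top decile 9 with HTG, 5 without;
# lowest decile 4 with HTG, 11 without.
p_value = fisher_exact_two_sided(9, 5, 4, 11)
print(round(p_value, 4))
```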

Clinically relevant rare variants

In 32 (20.5%) of 156 individuals we identified 38 rare (< 0.2%) single nucleotide (SNV) or insertion/deletion (indels) variants in 17 of the 33 lipid panel genes assessed, which we then examined using standard clinical criteria [35, 36] (Additional file 1: Table S9). There were no rare CNVs overlapping exonic regions of these 33 genes (Additional file 1: Table S5).

Of these 38 SNV/indels, two were classified as pathogenic/likely pathogenic, one for hypercholesterolemia and the other for HTG. A pathogenic heterozygous missense variant in LDLR (c.G523A, p.D175N; rs121908033; ClinVar ID: 3726) [37, 38], diagnostic for familial hypercholesterolemia, was identified in a 21.1-year-old male with an elevated LDLC level of 4.53 mmol/L. We also identified a loss of function (LoF) (frameshift insertion) variant in CREB3L3 (c.729dupG, p.L243fs; rs780374391; ClinVar ID: 967101) that was classified as likely pathogenic with respect to mild-moderate HTG. This female patient had a variable history of elevated TG levels with a maximum recorded TG level of 3.89 mmol/L (age 46.9 years), and most recent recorded TG level of 1.57 mmol/L (age 47.4 years). See Additional file 1: Table S10 for detailed patient history and scoring of these two variants.

Rare variant carrier status regression analyses

Of the n = 154–156 individuals assessed for the four lipid traits examined (Additional file 1: Figure S1), we identified relatively few individuals who were carriers of rare (< 1.0%) variants in genes canonically associated with each lipid trait: high TG (n = 7, two individuals had two variants each), high LDLC (n = 14), low HDLC (n = 9), and high HDLC (n = 10) (Additional file 1: Table S5, Table S11). Rare variant carrier status was not a significant predictor of the level of corresponding lipid trait using either univariable or multivariable linear regression models (Additional file 1: Table S12), possibly due to limited power afforded by the sample size.

Discussion

In this initial study of lipid genetics in individuals with the most common pathogenic CNV in humans, we demonstrate that part of the variable expression of the 22q11.2 microdeletion with respect to elevated TG levels can be explained by genome-wide common variation captured in a PRS, along with significant effects of sex and BMI. With few individuals having rare variants in TG metabolism genes, this study was underpowered to determine their role in 22q11.2DS.

The modifying effect of the TG-PRS on TG levels in the presence of a high-impact “first hit” exerted by the 22q11.2 deletion is consistent with previous studies of schizophrenia and related neuropsychiatric phenotypes in 22q11.2DS, where the schizophrenia PRS has been reported to modify the associated penetrance of the deletion [18, 39]. This pattern for the 22q11.2 deletion also appears to be consistent with previously reported PRS modification of the BMI-lowering effect of the 16p11.2 duplication [40]. In other conditions, PRSs have also been reported to modify the penetrance of a pathogenic rare variant affecting a single gene, e.g., in familial hypercholesterolemia [21, 40], coronary artery disease [17], breast cancer [20], and prostate cancer [20]. Notably, the variance in TG levels explained by the TG-PRS (8%) in a multivariable model in the current study of 22q11.2DS appears in line with general population expectations for TG-PRS explaining variance in TG levels (~ 2–10%), although it is difficult to make head-to-head comparisons of PRS performance given that each study used a different PRS and different methods to quantify PRS performance [41,42,43].

Consistent with previous studies using general population samples [44,45,46,47], in this 22q11.2 microdeletion sample we identified a potential interaction between TG-PRS and BMI, in an unadjusted interaction model, that indicates that the TG-PRS may have a stronger association with TG levels amongst individuals with obesity. This suggests that obesity may play a role in further unmasking the TG level-increasing effect of a high TG-PRS, potentially further increasing the likelihood of HTG in individuals with a 22q11.2 microdeletion who also have elevated BMI.

The potential value of PRSs as a clinical risk prediction tool is a widely debated topic [48, 49]. In a previous study of PRS and schizophrenia in 22q11.2DS, it was suggested that PRSs may have greater clinical use when there is an elevated baseline risk, as stratification using PRSs would be able to produce larger differences in absolute risk [39]. In this study, we assessed the ability of the TG-PRS, along with the significant clinical variables sex and BMI, to predict mild-moderate HTG, in the context of an elevated baseline risk conveyed by a typical pathogenic 22q11.2 deletion. While the performance of each of these three variables independently was relatively poor (AUC no higher than 0.66), the combination of the three demonstrated moderate predictive value (AUC = 0.75, sensitivity = 0.75, specificity = 0.71). However, the addition of the TG-PRS to each clinical variable-only model did not produce a significant increase in prediction accuracy, suggesting that the TG-PRS used adds little clinically meaningful predictive value beyond that obtained from standard clinical data. Furthermore, while we observed a substantial difference in the prevalence of mild-moderate HTG between individuals in the highest (64.3%) vs the lowest (26.7%) deciles of TG-PRS, this difference did not reach statistical significance, likely related to the relatively small size of each decile bin (n = 14–15) and the effect size of the PRS used.

It is also important to consider when an intermediate biomarker, such as a lipid PRS, would be clinically useful. Lipid PRSs are predictors of lipid levels, which are risk factors for end-point diseases, such as coronary artery disease [6]. Lipid PRSs would not serve as actionable markers in adults where lipid levels are obtainable from routine bloodwork, as the lipid level itself would guide management. Future research may identify a potential use-case in childhood for individuals with 22q11.2DS, prior to HTG manifestation, if a well-validated and more highly predictive TG-PRS becomes available: a high TG-PRS could then further motivate preventive measures, such as physical exercise and a healthy diet.

The results of this study demonstrate the potential value of a sample of individuals with a rare clinically relevant CNV, for whom both deep phenotyping and genome sequencing data are available, for studying complex, polygenic disorders, especially those of high clinical relevance. To our knowledge, this study contains the largest available cohort of adults with a 22q11.2 microdeletion who have lipid level and other essential phenotypic data, which enabled an assessment of the added value of lipid PRSs and of their combined ability with phenotypic data to predict HTG. This study would not be possible using current large-scale biobank data, as there is an acknowledged selection bias against individuals with rare high-impact CNVs (e.g., n = 10 with a pathogenic 22q11.2 deletion in UK Biobank) [50, 51]. Sequencing data enabled the identification of two rare pathogenic/likely pathogenic variants relevant to lipid disorders.

This study also has several limitations. Due to the rarity of 22q11.2DS, especially with relevant adult data, the sample size is relatively small. This limited the ability to compare mild-moderate HTG prevalence between extremes of the PRS distribution (e.g., highest vs lowest decile of TG-PRS), and to further assess effects of rare variant burden on lipid levels [8, 10]. Also, given the sample size limitations, we opted to include individuals of non-European ancestry in the main analyses and accounted for ancestry using PCA, despite applying a PRS developed using only individuals of European ancestry where the predictive performance may be decreased by up to half [29] when applied to non-European ancestries. Although we observed no substantial differences in effect sizes when restricting to only individuals of European ancestry, using a PRS derived from a multi-ancestry GWAS [9] may yield better overall performance. Also, we were unable to assess the influence of lifestyle/environmental factors on TG levels; this will require future studies.

Conclusions

In conclusion, we found that the TG-PRS is associated with TG levels in the context of elevated baseline risk for mild-moderate HTG conferred by the 22q11.2 microdeletion. The results contribute to the body of literature demonstrating how additional genome-wide common variation can modify the expression of a high-impact rare variant.