Introduction

Prospective cohort studies have identified several risk factors for CHD, including high LDL- and low HDL-cholesterol, obesity, glycaemic traits and high BP. The slow progress of the disease and the potential for reverse causation and confounding make it difficult to infer the causal relationships between the risk factors and CHD using traditional epidemiological methods or intervention studies.

Randomised controlled trials (RCTs) are usually considered a gold-standard method to establish causality. For example, RCTs have shown that lowering LDL-cholesterol levels with statin treatment reduces cardiovascular events [13], which confirms the causal role of LDL-cholesterol in CHD. In contrast, even though low levels of HDL-cholesterol are associated with increased CHD risk, treatment that elevates HDL-cholesterol levels has not been successful in CHD prevention [4]. The causal role of many other CHD risk factors, such as glycaemic traits and obesity, still needs to be determined.

Genetic data can be used to infer causal relationships between risk factors and disease in a more cost-effective way compared with traditional RCTs. As genotypes are determined at conception, they are unaffected by reverse causation and confounding and, thus, genetic data can be used to reveal causal and mechanistic insights leading to CHD. Genome-wide association studies (GWASs) have uncovered genetic variants robustly associated with various phenotypes. There is a growing number of studies demonstrating how the genetic data can be used to assess causal relationships between risk factors and disease. For example, studies using summary-level data from GWASs have supported the causal role of blood triacylglycerols, but not HDL-cholesterol, on CHD [5, 6].

In this study, our objective was to simultaneously evaluate the causal role of several CHD risk factor traits by using genome-wide genetic data from large-scale GWASs [712]. We studied associations between the established risk factors (blood lipids, obesity, glycaemic traits and BP) and CHD with two complementary approaches: (1) using summary statistics from GWASs to analyse the accordance of SNP effects on risk factors and on CHD; and (2) individual-level analysis where we constructed genetic risk scores (GRSs) in a large Finnish dataset (N = 26,554, CHD events n = 4016).

Methods

Genetic marker selection

Single-nucleotide polymorphisms (SNPs) can be used as instrumental variables in Mendelian randomisation (MR) analysis to assess causality of risk factors on disease. The valid instrument should be associated with the outcome only through the risk factor of interest, i.e. it should not have pleiotropic effects through other risk factors. Weak intruments and pleiotropic effects are common challenges of genetic instruments. Inclusion of a high number of SNPs increases the power of the instrument, but it also increases the risk of pleiotropy. On the other hand, excluding SNPs based on observed associations with other traits to minimise pleiotropic effects could lead to weak intrument bias. Mechanistic SNP exclusions based on pre-defined statistical thresholds might also exclude SNPs that are not pleiotropic, but have effects for other traits on the same causal pathway than the risk factor of interest (Fig. 1), which does not violate the causal inference. In this study, rather than excluding SNPs to decrease the risk for pleiotropy, we applied methods that take pleiotropic effects into account at the modelling phase [5, 13, 14].

Fig. 1
figure 1

Causal graph between genetic variant (G) and outcome (Y), when the variant is associated with two risk factors (X1, X2) in the same (a) and multiple (b) causal pathways

For our analyses, we selected all SNPs associated with established CHD risk factors and CHD at genome-wide significance level from the largest GWASs for each trait (for lipids n = 188,577, for BMI n = 339,224, for WHR n = 224,459, for glycaemic traits n = 133,010, for BP n = 200,000, for CHD n = 184,305) [712] (electronic supplementary material [ESM] Table 1). For each SNP, we obtained effect estimates for LDL- and HDL-cholesterol, triacylglycerols, BMI, WHR, fasting insulin and glucose, systolic and diastolic BP and CHD from the public databases of published GWASs. The effect sizes for lipids and obesity traits are in SD units from inverse normal-transformed traits. Fasting insulin results are in ln-transformed units for fasting insulin. Data on lipid traits have been contributed by Global Lipids Genetics investigators, and have been downloaded from http://csg.sph.umich.edu//abecasis/public/lipids2013 (accessed 12 June 2016). Data on BMI and WHR have been contributed by Genome-wide Investigation of ANThropometric measures (GIANT) investigators and have been downloaded from www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files (accessed 12 June 2016). Data on glycaemic traits have been contributed by the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) investigators and have been downloaded from www.magicinvestigators.org (accessed 12 June 2016). Data on BP traits have been contributed by International Consortium for Blood Pressure and have been downloaded from www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000585.v1.p1 (accessed 12 June 2016). Data on coronary artery disease/myocardial infarction have been contributed by CARDIoGRAMplusC4D investigators and have been downloaded from www.cardiogramplusc4d.org (accessed 12 June 2016).

Summary-level data analysis

To evaluate the causal effects of risk factor traits on CHD, we applied a statistical method that controls for pleiotropic effects, noted as ‘weighted regression-based method’ [14]. This method has been described in Do et al and Burgess et al [5, 14]; in short, we extracted the SNP effect estimates for LDL- and HDL-cholesterol, triacylglycerols, BMI, WHR, fasting insulin and glucose, systolic and diastolic BP, and CHD from the summary result files of reference GWASs. As some of the reference GWASs were conducted by using cardio-metabochip data, we were able to find 278 SNPs that had effect estimates available for all traits. In addition, we removed SNPs based on their high correlation with other SNPs (r 2 > 0.5), as strong linkage disequilibrium (LD) between the SNPs might result in too narrow CIs [13]. This resulted in 257 SNPs. Pairwise SNP correlations were estimated using the Finnish population cohort FINRISK. To further evaluate the pleiotropy assumption, we applied a sensitivity analyses with the R package gtx (version 0.0.8, https://cran.r-project.org/web/packages/gtx/index.html). In this method, SNPs are removed based on the null hypothesis that they affect disease risk only through their effects on the trait of interest (i.e. they have no pleiotropic effects). The details of this method are provided in Johnson [15]; in short, the SNPs are filtered stepwise until there is no significant heterogeneity (p ≤ 0.05) of the SNP effects on CHD risk relative to the effects on the risk factor of interest. In addition, the effect of influential observations was evaluated using the Cook’s D statistic.

The association between the SNP effects on the trait of interest (βTrait_1) and CHD (βCHD) were tested with a linear regression model. The effects of genetic variants on other risk factor traits were taken into account in the modelling so that the SNP effects on other traits (βTrait_2,…,βTrait_n ) were included as covariates in the linear regression model. The regression model parameters were estimated using weights defined by inverse standard errors of βCHD to account for uncertainty of the β coefficients [14].

Individual-level data analysis

Study cohorts

The prospective study population consists of 19,751 individuals from four FINRISK studies with risk factor and genome-wide genetic marker data. Each FINRISK study is a random sample drawn from the 25–64- or 25–74-year-old inhabitants in different regions of Finland. Extensive independent surveys were conducted in 1992, 1997, 2002 and 2007, and hospitalisation and mortality data for study participants were obtained from the Finnish National Hospital Discharge Register and the Finnish National Causes-of-Death Register. CHD was defined as myocardial infarction, unstable angina pectoris, coronary revascularisation, or death due to CHD (ICD-9 codes 410-414, 798 [www.icd9data.com/2007/Volume1] and ICD-10 codes I20-I25, I46, R96, R98 [www.who.int/classifications/icd/en/]). The registry-based follow-up ended on 31 December 2013.

The data were genotyped in six batches with Illumina 610K, Illumina Omni Express, Illumina HumanCoreExome and Affymetrix 6.0. SNPs with Hardy–Weinberg equilibrium (HWE) p value < 1 × 10−6, call rate < 95% (99% for SNPs with minor allele frequency [MAF] < 5%) were excluded. For Illumina 610K, Illumina Omni Express and Affymetrix 6.0 MAF ≤ 1% were excluded. In HumanCoreExome data, monomorphic markers and singletons were excluded. Each subset was imputed with IMPUTE 2 [16] by using the 1000 genomes reference panel (September 2013 release). After imputation, data were merged with qctool software (version 1.4, www.well.ox.ac.uk/~gav/qctool/#overview). The FINRISK study was approved by the ethical committee of the National Public Health Institute, Helsinki, Finland. Participants gave written informed consent.

The DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study is a subcohort of FINRISK 2007. The cohort has detailed phenotypic information on metabolic traits (including the oral glucose tolerance test) and genetic data genotyped with the Illumina cardio-metabochip. In this study, 3971 individuals from the DILGOM study sample were used for sensitivity analysis of glycaemic traits.

The Corogene cohort is a case–control dataset including 2113 Finnish CHD patients and their matched controls [17]. CHD patients were collected from the Helsinki University Hospital in 2006–2008, where CHD was defined as coronary artery obstruction > 50% in at least one coronary artery. Controls were collected from the FINRISK participants living in the Helsinki-Vantaa region. For each CHD case, two controls were matched by sex and birth year. The data were genotyped with Illumina 610K and the same pre-imputation quality control was applied as for FINRISK. Data were imputed with IMPUTE 2 by using the 1000 genomes reference panel (September 2013 release). The Corogene study was approved by the ethics committee of Helsinki University Hospital. Participants gave written informed consent.

The Genmets cohort is a case–control study for the metabolic syndrome, sampled from the Health 2000 survey [18]. Half of the study participants were defined to have the metabolic syndrome based on IDF criteria, and and half of the study participants were their matched controls. Prevalent CHD was determined at baseline based on self-reports and clinical diagnoses made by the field physicians. The genetic data were genotyped with Illumina 610K and the same quality control was applied as for FINRISK and Corogene cohorts. Data were imputed with IMPUTE 2 by using the 1000 genomes reference panel (September 2013 release). For the present study, a total of 1998 individuals were included for the analysis. The Health 2000 study was approved by the ethical committee of the National Public Health Institute, Helsinki, Finland. Participants gave written informed consent.

Statistical methods

Imputation quality for all selected SNPs was good (information score > 0.8), thus there was no need to exclude any SNPs based on the imputation quality. We calculated GRSs for each trait as a weighted sum of the risk alleles and divided by the number of SNPs in each score, after which GRSs were standardised (mean = 0, SD = 1). Effect estimates obtained from discovery GWASs were used as the weights [711]. There was a high correlation between systolic BP (SBP)-GRS and diastolic BP (DBP)-GRS (r = 0.87). Otherwise, the correlations were modest (|r| < 0.41, ESM Fig. 1).

The associations between the GRSs and CHD events were estimated with logistic regression and conditional logistic regression models. Analyses were conducted separately in FINRISK, Corogene and Genmets study cohorts. We first fitted the models individually for each GRS, and then combined them into a single model to reduce the effect of pleiotropic SNPs that are included in more than one GRS. DBP-GRS was not included in the joint model because of its high correlation with SBP-GRS. In FINRISK, we adjusted the models for age, sex, cohort indicator and genotyping batch. To avoid overlap with the Corogene cohort, those individuals used as controls in Corogene were excluded from the FINRISK dataset. In Corogene, we studied the association with conditional logistic regression analysis. In Genmets, we used logistic regression model adjusted for age, sex and metabolic syndrome case status. The results from the three datasets were combined with fixed-effects meta-analysis, as no significant heterogeneity was detected (ESM Table 2).

We then performed some additional analyses in the FINRISK cohort (N = 19,078, incident CHD events n = 1376), where the risk factor traits were measured at baseline (1992, 1997, 2002 and 2007). We estimated the associations between the GRSs and risk factor traits. The measurements of insulin and glucose were available only for a subset of the data (n = 12,754 for insulin, n = 11,194 for glucose). The trait distributions were inspected and outliers removed, and log-transformation was applied to normalise triacylglycerols and insulin. In the analyses for lipid traits, those individuals receiving lipid-lowering medication were excluded. For BP analyses, we added 15 mmHg for SBP and 10 mmHg for DBP for individuals receiving treatment for high BP. For glycaemic traits, diabetic individuals were excluded. Associations between the GRSs and risk factor traits were obtained with linear regression models adjusted for age, sex, cohort indicator and genotyping batch. To evaluate causal effects of risk factor traits on CHD, we applied the two-stage least-squares (2SLS) method that controls for pleiotropic effects of the GRSs. This is a two-step regression model, where the risk factors are first regressed on the GRSs and then the outcome is regressed on the fitted values of the risk factors [13].

All analyses were done with the R statistical package (version 3.0.2).

Results

Summary-level data analysis

The effect estimates of SNPs for the traits are denoted as βLDL, βHDL, βTG, βBMI, βWHR, βInsulinIns), βGlucoseGlu), βSBP, βDBP and βCHD. The results of the linear regression model βCHD ∼ βLDL + βHDL + βTG + βBMI + βWHR + βIns are shown in Table 1. The effects for glycaemic traits βIns and βGlu and lipids βLDL and βTG were associated with βCHD after accounting for all other risk factor related SNP effects. Interestingly, the effects for obesity traits βBMI and βWHR were also associated with βCHD only when βIns was omitted from the model (Table 2). Excluding other βs did not modify the results for other predictors. This supports the previous findings that LDL-cholesterol and triacylglycerols are causally related to CHD, and suggests that insulin is a causal risk factor that might explain the epidemiological associations between obesity traits and CHD.

Table 1 Associations between the SNP effects on risk factor traits and SNP effects on CHD in a multivariable model
Table 2 Associations between the SNP effects on risk factor traits and SNP effects on CHD in a multivariable model, without βIns

Our sensitivity analyses with gtx removed 68 SNPs based on the null hypothesis of no pleiotropic effects. With 189 SNPs, the effect size for βIns was 0.80 (95% CI 0.55, 1.05; p = 2.5−9; Fig. 2a, b), which is comparable with the estimate of βIns in Table 1. Cook’s D statistic indicated one influential SNP (rs3184504, ESM Fig. 2), but removing this from the model did not change the results (ESM Table 3). Finally, we performed the analysis without the SNPs with lower frequency (MAF < 0.05) as they might have imprecise effect estimates. This exclusion did not affect the results notably (ESM Table 4).

Fig. 2
figure 2

Effects of SNPs on insulin and risk for CHD. (a) Effects of all 257 independent SNPs. (b) Effects of 189 SNPs after removing the SNPs based on the heterogeneity test. Vertical grey lines show 95% confidence interval (CI) for each SNP. A solid red line shows the estimated effect for insulin on CHD with dashed lines representing the 95% CI

Individual-level data analysis

Study characteristics

The characteristics of the study cohorts are shown in Table 3. Four FINRISK studies were initiated in 1992, 1997, 2002 and 2007, when the baseline information was collected. The mean age at baseline was 47.9 years (SD 13.3 years) and 47.0% of participants were men and 53.0% women. During the median follow-up of 14.5 years (interquartile range 10.75–18.25), 1420 (7.2%) incident CHD cases occurred. In total, 653 individuals had been diagnosed cardiovascular disease (CVD) at baseline. The Corogene study consisted of 2113 CHD events and 4440 matched controls (of whom 1576 were unique). The Genmets cohort consisted of 1998 individuals of whom 99 (5%) had CHD at baseline.

Table 3 Cohort characteristics

Associations between GRSs, risk factors and CHD

Among nine GRSs for risk factor traits, BMI-, WHR- and insulin-GRSs had individually significant associations with CHD (Table 4). The estimates were largely similar in a multivariable model with all GRSs tested simultaneously (BMI-GRSs: OR 1.08 per SD in GRS, 95% CI 1.04, 1.13, p = 1 × 10−4; WHR-GRS: OR 1.06 per SD in GRS, 95% CI 1.02, 1.10, p = 0.003; insulin-GRS: OR 1.05 per SD in GRS, 95% CI 1.01, 1.09, p = 0.025).

Table 4 Associations between GRSs and CHD (N = 25,534, cases n = 4016)

The analyses with risk factor data in the FINRISK cohort showed that all the GRSs were associated with the corresponding risk factor trait (p ≤ 1 × 10−8, ESM Table 5). However, the 2SLS model yielded non-significant results for all risk factors, probably because of lack of power and imprecise measurement of glycaemic traits due to inadequate fasting (n = 12,754 for insulin, n = 11,194 for glucose with, on average, 5 h fasting). The modest association between LDL-GRSs and CHD also suggests that the power of the GRSs to evaluate causal process is relatively weak; in fact, the GRSs only explained 0.3–7.3% of the trait variance (ESM Table 5).

As data for glycaemic traits were limited, we performed a sensitivity analyses using the DILGOM metabochip data with > 8 h fasting (n = 3971), from which we estimated the effect of measured insulin levels on incident CHD. The effect estimate of fasting insulin on CHD in the DILGOM cohort was higher (HR 1.32 per SD in GRS, 95% CI 1.14, 1.53) than in FINRISK (HR 1.18 per SD in GRS, 95% CI 1.12, 1.25). This result illustrates how incomplete fasting can bias the observational estimates of insulin. To further evaluate the ability of insulin-GRS to reflect insulin resistance, we studied associations between insulin-GRS and different measurements of insulin resistance in the DILGOM cohort. Insulin-GRS was associated with fasting and 2 h insulin (insulin 2 h after oral glucose tolerance test), HOMA-IR and insulin sensitivity index (ISI) (ESM Table 6).

Discussion

We have shown evidence that glycaemic traits (insulin and glucose) are causally related to CHD. Our results also support the previous findings that triacylglycerols and LDL-cholesterol are causal risk factors for CHD. Moreover, the SNP effects for insulin explained the effects for BMI and WHR, which suggests that the risk-increasing effect of obesity is at least partially mediated by insulin. This is in line with the idea that healthy obesity (as occurs in 10–25% of obese individuals) is due to preserved insulin sensitivity [19]. Our results from the individual-level data analysis supported the causal role of insulin, but not glucose, on CHD.

Insulin resistance usually coexists with obesity, but not always. For example, insulin resistance is common in some populations with normal BMI [20]. The pathological role of insulin resistance in CVD has been debated for decades. The Helsinki Policemen Study was one of the first prospective studies demonstrating associations between high plasma insulin and incident CHD [2123], stroke [24] and all-cause mortality [25]. Further, other markers of insulin resistance, namely HOMA-IR and ISI, have been shown to associate with CVD risk independently of the metabolic syndrome [2628]. Other studies, however, have shown that fasting insulin levels predict the CVD risk only moderately and the effect is highly dependent on the measurement assay used [29].

Contradictory results could be at least partly explained by the limitations in observational epidemiological studies to assess causality. Observational studies are prone to confounding, reverse causality and multicollinearity, which could especially affect the traits that are measured with lower precision, such as insulin resistance. The advantage of using genetic markers in epidemiological studies compared with traditional risk factor traits is that it is possible to avoid most of these common problems. However, it is probable that SNPs associated with direct measurement of insulin resistance (rather than fasting insulin) would be more suitable genetic instruments for causal analysis. As genetic loci associated with insulin resistence will be identified, this hypothesis can be tested. Nevertheless, the correlation between fasting insulin and the gold-standard clamp measurement of insulin resistance is reasonably high, and the measurements correlate similarly with metabolic and cardiovascular risks [30]. Thus, fasting insulin is a useful marker of insulin resistance in large-scale studies where it is not possible to conduct the clamp measurement.

To overcome the limitations of observational studies to assess causality, many studies have used summary-level data from GWASs to infer causal relationships between risk factors and CHD [5, 6, 31, 32]. Yaghootkar et al [32] showed that a GRS comprising SNPs associated with fasting insulin and adverse metabolic profile, but lower BMI, is associated with CHD. This is consistent with our finding that insulin is associated with CHD independently of obesity. Our study has several strengths compared with these studies. First, these studies have only evaluated one or a few traits at a time (e.g. lipids [5], BMI [31]). We are not aware of any other study that takes genetic effects on multiple CHD risk factor traits into account simultaneously. Second, the method for pleiotropic SNP effects suggested by Do et al [5] has been criticised as it does not account for uncertainty of the effect estimates [13]. In our analyses, we applied a simple modification of this method that takes the uncertainty into account [14]. Third, apart from studies using only summary-level data, we performed complementary analysis in our study with the large individual-level dataset with consistent disease event definitions and genome-wide genetic data. In addition, by using the population-based FINRISK cohort with extensive baseline health information and follow-up data, we were able to estimate the relationships between the GRSs, risk factor traits and prospectively ascertained CHD events in the same dataset. The diagnoses for incident CHD events in our data have been obtained from validated national registries with comprehensive coverage [33, 34].

Our study also has some limitations. First, we restricted our analysis to lead SNPs with genome-wide significance. Consequently, the variants only explain a small proportion of trait variability. Identifying new loci for these traits could highly increase the effect sizes between the GRSs and CHD and increase the power in MR analysis. Another way to create more powerful GRSs is to combine multiple genetic variants from a single gene region. Even if the variants are correlated, this method can provide more precise causal estimates [35]. Second, limited data of glycaemic traits restricted us to perform powerful multivariable analysis with measured risk factor data. Thus, individual-level analysis with detailed phenotypic and genetic information is required to further elucidate the causal processes between insulin resistance and CHD. Third, even though our statistical modelling approach and sensitivity analyses minimise the potential for pleiotropic effects on our set of well-known CHD risk traits, it is possible that the tested SNPs have large effects on other factors and causal pathways leading to CHD. As further GWAS results for potential risk factors are published, these can be added to the future analyses of causality and thus minimise the possibility of pleiotropy.

In summary, our findings from complementary summary- and individual-level data analyses support the causal role of insulin and insulin resistance in the pathogenesis of CHD. Efficient prevention and treatment of insulin resistance is essential to prevent future CHD events.