Genetic support for the causal role of insulin in coronary heart disease
Epidemiological studies have identified several traits associated with CHD, but few of these have been shown to be causal risk factors and thus suitable targets for treatment. Our aim was to evaluate the causal role of a large set of known CHD risk factors using single-nucleotide polymorphisms (SNPs) as instrumental variables.
Based on published genome-wide association studies (GWASs), we estimated the associations between the established risk factors (blood lipids, obesity, glycaemic traits and BP) and CHD with two complementary approaches: (1) using summary statistics from GWASs to analyse the accordance of SNP effects on risk factors and on CHD; and (2) individual-level analysis where we constructed genetic risk scores (GRSs) in a large Finnish dataset (N = 26,554, CHD events n = 4016).
We used a weighted regression-based method for summary-level data to evaluate the causality of risk factors. The associations between the GRSs and CHD in the Finnish dataset were evaluated with logistic and conditional logistic regression models.
The summary-level data analysis revealed causal effects between glycaemic traits (insulin and glucose) and CHD. The individual-level data analysis supported the causal role of insulin, but not of glucose, on CHD. The GRS for insulin was associated with CHD in the Finnish cohort (OR 1.06 per SD in GRS, 95% CI 1.02, 1.10, p = 0.002).
These results support the causal role of insulin in the pathogenesis of CHD. Efficient treatment and prevention of insulin resistance is essential to prevent future CHD events.
KeywordsEpidemiology Genetics Insulin resistance Insulin sensitivity
DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome
Genetic risk score
Genome-wide association study
Insulin sensitivity index
Minor allele frequency
Randomised controlled trial
Prospective cohort studies have identified several risk factors for CHD, including high LDL- and low HDL-cholesterol, obesity, glycaemic traits and high BP. The slow progress of the disease and the potential for reverse causation and confounding make it difficult to infer the causal relationships between the risk factors and CHD using traditional epidemiological methods or intervention studies.
Randomised controlled trials (RCTs) are usually considered a gold-standard method to establish causality. For example, RCTs have shown that lowering LDL-cholesterol levels with statin treatment reduces cardiovascular events [1, 2, 3], which confirms the causal role of LDL-cholesterol in CHD. In contrast, even though low levels of HDL-cholesterol are associated with increased CHD risk, treatment that elevates HDL-cholesterol levels has not been successful in CHD prevention . The causal role of many other CHD risk factors, such as glycaemic traits and obesity, still needs to be determined.
Genetic data can be used to infer causal relationships between risk factors and disease in a more cost-effective way compared with traditional RCTs. As genotypes are determined at conception, they are unaffected by reverse causation and confounding and, thus, genetic data can be used to reveal causal and mechanistic insights leading to CHD. Genome-wide association studies (GWASs) have uncovered genetic variants robustly associated with various phenotypes. There is a growing number of studies demonstrating how the genetic data can be used to assess causal relationships between risk factors and disease. For example, studies using summary-level data from GWASs have supported the causal role of blood triacylglycerols, but not HDL-cholesterol, on CHD [5, 6].
In this study, our objective was to simultaneously evaluate the causal role of several CHD risk factor traits by using genome-wide genetic data from large-scale GWASs [7, 8, 9, 10, 11, 12]. We studied associations between the established risk factors (blood lipids, obesity, glycaemic traits and BP) and CHD with two complementary approaches: (1) using summary statistics from GWASs to analyse the accordance of SNP effects on risk factors and on CHD; and (2) individual-level analysis where we constructed genetic risk scores (GRSs) in a large Finnish dataset (N = 26,554, CHD events n = 4016).
Genetic marker selection
For our analyses, we selected all SNPs associated with established CHD risk factors and CHD at genome-wide significance level from the largest GWASs for each trait (for lipids n = 188,577, for BMI n = 339,224, for WHR n = 224,459, for glycaemic traits n = 133,010, for BP n = 200,000, for CHD n = 184,305) [7, 8, 9, 10, 11, 12] (electronic supplementary material [ESM] Table 1). For each SNP, we obtained effect estimates for LDL- and HDL-cholesterol, triacylglycerols, BMI, WHR, fasting insulin and glucose, systolic and diastolic BP and CHD from the public databases of published GWASs. The effect sizes for lipids and obesity traits are in SD units from inverse normal-transformed traits. Fasting insulin results are in ln-transformed units for fasting insulin. Data on lipid traits have been contributed by Global Lipids Genetics investigators, and have been downloaded from http://csg.sph.umich.edu//abecasis/public/lipids2013 (accessed 12 June 2016). Data on BMI and WHR have been contributed by Genome-wide Investigation of ANThropometric measures (GIANT) investigators and have been downloaded from www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files (accessed 12 June 2016). Data on glycaemic traits have been contributed by the Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) investigators and have been downloaded from www.magicinvestigators.org (accessed 12 June 2016). Data on BP traits have been contributed by International Consortium for Blood Pressure and have been downloaded from www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000585.v1.p1 (accessed 12 June 2016). Data on coronary artery disease/myocardial infarction have been contributed by CARDIoGRAMplusC4D investigators and have been downloaded from www.cardiogramplusc4d.org (accessed 12 June 2016).
Summary-level data analysis
To evaluate the causal effects of risk factor traits on CHD, we applied a statistical method that controls for pleiotropic effects, noted as ‘weighted regression-based method’ . This method has been described in Do et al and Burgess et al [5, 14]; in short, we extracted the SNP effect estimates for LDL- and HDL-cholesterol, triacylglycerols, BMI, WHR, fasting insulin and glucose, systolic and diastolic BP, and CHD from the summary result files of reference GWASs. As some of the reference GWASs were conducted by using cardio-metabochip data, we were able to find 278 SNPs that had effect estimates available for all traits. In addition, we removed SNPs based on their high correlation with other SNPs (r2 > 0.5), as strong linkage disequilibrium (LD) between the SNPs might result in too narrow CIs . This resulted in 257 SNPs. Pairwise SNP correlations were estimated using the Finnish population cohort FINRISK. To further evaluate the pleiotropy assumption, we applied a sensitivity analyses with the R package gtx (version 0.0.8, https://cran.r-project.org/web/packages/gtx/index.html). In this method, SNPs are removed based on the null hypothesis that they affect disease risk only through their effects on the trait of interest (i.e. they have no pleiotropic effects). The details of this method are provided in Johnson ; in short, the SNPs are filtered stepwise until there is no significant heterogeneity (p ≤ 0.05) of the SNP effects on CHD risk relative to the effects on the risk factor of interest. In addition, the effect of influential observations was evaluated using the Cook’s D statistic.
The association between the SNP effects on the trait of interest (βTrait_1) and CHD (βCHD) were tested with a linear regression model. The effects of genetic variants on other risk factor traits were taken into account in the modelling so that the SNP effects on other traits (βTrait_2,…,βTrait_n) were included as covariates in the linear regression model. The regression model parameters were estimated using weights defined by inverse standard errors of βCHD to account for uncertainty of the β coefficients .
Individual-level data analysis
The prospective study population consists of 19,751 individuals from four FINRISK studies with risk factor and genome-wide genetic marker data. Each FINRISK study is a random sample drawn from the 25–64- or 25–74-year-old inhabitants in different regions of Finland. Extensive independent surveys were conducted in 1992, 1997, 2002 and 2007, and hospitalisation and mortality data for study participants were obtained from the Finnish National Hospital Discharge Register and the Finnish National Causes-of-Death Register. CHD was defined as myocardial infarction, unstable angina pectoris, coronary revascularisation, or death due to CHD (ICD-9 codes 410-414, 798 [www.icd9data.com/2007/Volume1] and ICD-10 codes I20-I25, I46, R96, R98 [www.who.int/classifications/icd/en/]). The registry-based follow-up ended on 31 December 2013.
The data were genotyped in six batches with Illumina 610K, Illumina Omni Express, Illumina HumanCoreExome and Affymetrix 6.0. SNPs with Hardy–Weinberg equilibrium (HWE) p value < 1 × 10−6, call rate < 95% (99% for SNPs with minor allele frequency [MAF] < 5%) were excluded. For Illumina 610K, Illumina Omni Express and Affymetrix 6.0 MAF ≤ 1% were excluded. In HumanCoreExome data, monomorphic markers and singletons were excluded. Each subset was imputed with IMPUTE 2  by using the 1000 genomes reference panel (September 2013 release). After imputation, data were merged with qctool software (version 1.4, www.well.ox.ac.uk/~gav/qctool/#overview). The FINRISK study was approved by the ethical committee of the National Public Health Institute, Helsinki, Finland. Participants gave written informed consent.
The DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study is a subcohort of FINRISK 2007. The cohort has detailed phenotypic information on metabolic traits (including the oral glucose tolerance test) and genetic data genotyped with the Illumina cardio-metabochip. In this study, 3971 individuals from the DILGOM study sample were used for sensitivity analysis of glycaemic traits.
The Corogene cohort is a case–control dataset including 2113 Finnish CHD patients and their matched controls . CHD patients were collected from the Helsinki University Hospital in 2006–2008, where CHD was defined as coronary artery obstruction > 50% in at least one coronary artery. Controls were collected from the FINRISK participants living in the Helsinki-Vantaa region. For each CHD case, two controls were matched by sex and birth year. The data were genotyped with Illumina 610K and the same pre-imputation quality control was applied as for FINRISK. Data were imputed with IMPUTE 2 by using the 1000 genomes reference panel (September 2013 release). The Corogene study was approved by the ethics committee of Helsinki University Hospital. Participants gave written informed consent.
The Genmets cohort is a case–control study for the metabolic syndrome, sampled from the Health 2000 survey . Half of the study participants were defined to have the metabolic syndrome based on IDF criteria, and and half of the study participants were their matched controls. Prevalent CHD was determined at baseline based on self-reports and clinical diagnoses made by the field physicians. The genetic data were genotyped with Illumina 610K and the same quality control was applied as for FINRISK and Corogene cohorts. Data were imputed with IMPUTE 2 by using the 1000 genomes reference panel (September 2013 release). For the present study, a total of 1998 individuals were included for the analysis. The Health 2000 study was approved by the ethical committee of the National Public Health Institute, Helsinki, Finland. Participants gave written informed consent.
Imputation quality for all selected SNPs was good (information score > 0.8), thus there was no need to exclude any SNPs based on the imputation quality. We calculated GRSs for each trait as a weighted sum of the risk alleles and divided by the number of SNPs in each score, after which GRSs were standardised (mean = 0, SD = 1). Effect estimates obtained from discovery GWASs were used as the weights [7, 8, 9, 10, 11]. There was a high correlation between systolic BP (SBP)-GRS and diastolic BP (DBP)-GRS (r = 0.87). Otherwise, the correlations were modest (|r| < 0.41, ESM Fig. 1).
The associations between the GRSs and CHD events were estimated with logistic regression and conditional logistic regression models. Analyses were conducted separately in FINRISK, Corogene and Genmets study cohorts. We first fitted the models individually for each GRS, and then combined them into a single model to reduce the effect of pleiotropic SNPs that are included in more than one GRS. DBP-GRS was not included in the joint model because of its high correlation with SBP-GRS. In FINRISK, we adjusted the models for age, sex, cohort indicator and genotyping batch. To avoid overlap with the Corogene cohort, those individuals used as controls in Corogene were excluded from the FINRISK dataset. In Corogene, we studied the association with conditional logistic regression analysis. In Genmets, we used logistic regression model adjusted for age, sex and metabolic syndrome case status. The results from the three datasets were combined with fixed-effects meta-analysis, as no significant heterogeneity was detected (ESM Table 2).
We then performed some additional analyses in the FINRISK cohort (N = 19,078, incident CHD events n = 1376), where the risk factor traits were measured at baseline (1992, 1997, 2002 and 2007). We estimated the associations between the GRSs and risk factor traits. The measurements of insulin and glucose were available only for a subset of the data (n = 12,754 for insulin, n = 11,194 for glucose). The trait distributions were inspected and outliers removed, and log-transformation was applied to normalise triacylglycerols and insulin. In the analyses for lipid traits, those individuals receiving lipid-lowering medication were excluded. For BP analyses, we added 15 mmHg for SBP and 10 mmHg for DBP for individuals receiving treatment for high BP. For glycaemic traits, diabetic individuals were excluded. Associations between the GRSs and risk factor traits were obtained with linear regression models adjusted for age, sex, cohort indicator and genotyping batch. To evaluate causal effects of risk factor traits on CHD, we applied the two-stage least-squares (2SLS) method that controls for pleiotropic effects of the GRSs. This is a two-step regression model, where the risk factors are first regressed on the GRSs and then the outcome is regressed on the fitted values of the risk factors .
All analyses were done with the R statistical package (version 3.0.2).
Summary-level data analysis
Associations between the SNP effects on risk factor traits and SNP effects on CHD in a multivariable model
Effects of 257 SNPs (units)
2 × 10−16***
3 × 10−5***
Associations between the SNP effects on risk factor traits and SNP effects on CHD in a multivariable model, without βIns
Effects of 257 SNPs (units)
2 × 10−16***
Individual-level data analysis
Sex, n (%)
Mean age, years (SD)
Blood lipids, mmol/l, (SD)
BP, mmHg (SD)
BMI, kg/m2 (SD)
WHR, cm/cm (SD)
Glucose, mmol/l (SD)
Insulin, pmol/l (SD)
Prevalent diabetes, n (%)
CHD, n (%)
Prevalent CVD, n (%)
Incident CHD, n (%)
Associations between GRSs, risk factors and CHD
Associations between GRSs and CHD (N = 25,534, cases n = 4016)
No. of SNPs
Significance after multiple testing correctionb
4 × 10−5
The analyses with risk factor data in the FINRISK cohort showed that all the GRSs were associated with the corresponding risk factor trait (p ≤ 1 × 10−8, ESM Table 5). However, the 2SLS model yielded non-significant results for all risk factors, probably because of lack of power and imprecise measurement of glycaemic traits due to inadequate fasting (n = 12,754 for insulin, n = 11,194 for glucose with, on average, 5 h fasting). The modest association between LDL-GRSs and CHD also suggests that the power of the GRSs to evaluate causal process is relatively weak; in fact, the GRSs only explained 0.3–7.3% of the trait variance (ESM Table 5).
As data for glycaemic traits were limited, we performed a sensitivity analyses using the DILGOM metabochip data with > 8 h fasting (n = 3971), from which we estimated the effect of measured insulin levels on incident CHD. The effect estimate of fasting insulin on CHD in the DILGOM cohort was higher (HR 1.32 per SD in GRS, 95% CI 1.14, 1.53) than in FINRISK (HR 1.18 per SD in GRS, 95% CI 1.12, 1.25). This result illustrates how incomplete fasting can bias the observational estimates of insulin. To further evaluate the ability of insulin-GRS to reflect insulin resistance, we studied associations between insulin-GRS and different measurements of insulin resistance in the DILGOM cohort. Insulin-GRS was associated with fasting and 2 h insulin (insulin 2 h after oral glucose tolerance test), HOMA-IR and insulin sensitivity index (ISI) (ESM Table 6).
We have shown evidence that glycaemic traits (insulin and glucose) are causally related to CHD. Our results also support the previous findings that triacylglycerols and LDL-cholesterol are causal risk factors for CHD. Moreover, the SNP effects for insulin explained the effects for BMI and WHR, which suggests that the risk-increasing effect of obesity is at least partially mediated by insulin. This is in line with the idea that healthy obesity (as occurs in 10–25% of obese individuals) is due to preserved insulin sensitivity . Our results from the individual-level data analysis supported the causal role of insulin, but not glucose, on CHD.
Insulin resistance usually coexists with obesity, but not always. For example, insulin resistance is common in some populations with normal BMI . The pathological role of insulin resistance in CVD has been debated for decades. The Helsinki Policemen Study was one of the first prospective studies demonstrating associations between high plasma insulin and incident CHD [21, 22, 23], stroke  and all-cause mortality . Further, other markers of insulin resistance, namely HOMA-IR and ISI, have been shown to associate with CVD risk independently of the metabolic syndrome [26, 27, 28]. Other studies, however, have shown that fasting insulin levels predict the CVD risk only moderately and the effect is highly dependent on the measurement assay used .
Contradictory results could be at least partly explained by the limitations in observational epidemiological studies to assess causality. Observational studies are prone to confounding, reverse causality and multicollinearity, which could especially affect the traits that are measured with lower precision, such as insulin resistance. The advantage of using genetic markers in epidemiological studies compared with traditional risk factor traits is that it is possible to avoid most of these common problems. However, it is probable that SNPs associated with direct measurement of insulin resistance (rather than fasting insulin) would be more suitable genetic instruments for causal analysis. As genetic loci associated with insulin resistence will be identified, this hypothesis can be tested. Nevertheless, the correlation between fasting insulin and the gold-standard clamp measurement of insulin resistance is reasonably high, and the measurements correlate similarly with metabolic and cardiovascular risks . Thus, fasting insulin is a useful marker of insulin resistance in large-scale studies where it is not possible to conduct the clamp measurement.
To overcome the limitations of observational studies to assess causality, many studies have used summary-level data from GWASs to infer causal relationships between risk factors and CHD [5, 6, 31, 32]. Yaghootkar et al  showed that a GRS comprising SNPs associated with fasting insulin and adverse metabolic profile, but lower BMI, is associated with CHD. This is consistent with our finding that insulin is associated with CHD independently of obesity. Our study has several strengths compared with these studies. First, these studies have only evaluated one or a few traits at a time (e.g. lipids , BMI ). We are not aware of any other study that takes genetic effects on multiple CHD risk factor traits into account simultaneously. Second, the method for pleiotropic SNP effects suggested by Do et al  has been criticised as it does not account for uncertainty of the effect estimates . In our analyses, we applied a simple modification of this method that takes the uncertainty into account . Third, apart from studies using only summary-level data, we performed complementary analysis in our study with the large individual-level dataset with consistent disease event definitions and genome-wide genetic data. In addition, by using the population-based FINRISK cohort with extensive baseline health information and follow-up data, we were able to estimate the relationships between the GRSs, risk factor traits and prospectively ascertained CHD events in the same dataset. The diagnoses for incident CHD events in our data have been obtained from validated national registries with comprehensive coverage [33, 34].
Our study also has some limitations. First, we restricted our analysis to lead SNPs with genome-wide significance. Consequently, the variants only explain a small proportion of trait variability. Identifying new loci for these traits could highly increase the effect sizes between the GRSs and CHD and increase the power in MR analysis. Another way to create more powerful GRSs is to combine multiple genetic variants from a single gene region. Even if the variants are correlated, this method can provide more precise causal estimates . Second, limited data of glycaemic traits restricted us to perform powerful multivariable analysis with measured risk factor data. Thus, individual-level analysis with detailed phenotypic and genetic information is required to further elucidate the causal processes between insulin resistance and CHD. Third, even though our statistical modelling approach and sensitivity analyses minimise the potential for pleiotropic effects on our set of well-known CHD risk traits, it is possible that the tested SNPs have large effects on other factors and causal pathways leading to CHD. As further GWAS results for potential risk factors are published, these can be added to the future analyses of causality and thus minimise the possibility of pleiotropy.
In summary, our findings from complementary summary- and individual-level data analyses support the causal role of insulin and insulin resistance in the pathogenesis of CHD. Efficient prevention and treatment of insulin resistance is essential to prevent future CHD events.
We acknowledge CSC (IT center for science), Espoo, Finland for computational services.
This work was supported by the Academy of Finland (251217 and 255847 to SR, 139635 to VS, 263836 to SM), Center of Excellence in Complex Disease Genetics (to SR), EU FP7 projects ENGAGE (201413 to SR) and BioSHaRE (261433 to SR), the Finnish Foundation for Cardiovascular Research (to SR and VS), Biocentrum Helsinki (to SR) and the Sigrid Juselius Foundation (to SR).
Duality of interest
The authors declare that there is no duality of interest associated with this manuscript.
ET, MP and SR conceived and designed the study. SM, JS, M-LL, JS, AL, AJ and VS acquired the data. ET, MP, ASH, A-PS, SR and VS analysed and interpreted data. ET drafted the manuscript, and all authors critically revised it for key intellectual content. All authors approved the final version to be published. ET and SR are responsible for the integrity of the work as a whole.
- 12.Nikpay M, Goel A, Won HH et al (2015) A comprehensive 1,000 genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet 47:1121–1130Google Scholar
- 15.Johnson T (2012) Efficient calculation for multi-SNP genetic risk scores. Available from https://cran.r-project.org/web/packages/gtx/vignettes/ashg2012.pdf, accessed 6 June 2016
- 18.Aromaa A, Koskinen S (eds) (2004) Health and functional capacity in Finland. Baseline results of the Health 2000 health examination survey. National Public Health Institute, HelsinkiGoogle Scholar
- 22.Pyorala K, Savolainen E, Kaukola S, Haapakoski J (1985) Plasma insulin as coronary heart disease risk factor: relationship to other risk factors and predictive value during 9 1/2-year follow-up of the Helsinki Policemen Study population. Acta Medica Scand Suppl 701:38–52Google Scholar