Details of participants are given in Table 1 and Fig. 1. Ethical approval was obtained from the Ethics Review Committee at the Aga Khan University, Karachi, Pakistan.
Table 1 Baseline characteristics of the enrolled participants
Study design
A population-based sample representative of adults in Karachi was recruited in a cluster randomised trial of strategies for control of hypertension (COBRA, ClinicalTrials.gov ID no. NCT00327574). The sampling details have been described previously [15]. Briefly, a multi-stage cluster random sampling design was used to select 12 geographical clusters in Karachi, the largest metropolitan city in Pakistan. A census was carried out and all individuals from all households in the selected areas were listed. All individuals aged 40 years or above who resided in these households were invited to participate in the study by trained community health workers. All participants were evaluated after giving informed consent. A range of anthropometric and biochemical data were collected from all consenting participants, with and without hypertension, who had been screened for eligibility for the trial. Enrolled participants were categorised into the five main Pakistani ethnic subgroups (Muhajirs, Punjabis, Sindhis, Baluchis and Pashtun) or 'others' (Table 1) [16, 17]. The number of individuals analysed differed slightly by trait, because genotyping success rates differed and individuals with diabetes had to be excluded from the fasting glucose analysis.
SNP selection
We selected single nucleotide polymorphisms (SNPs) only from GWAS of European individuals, because these studies account for the majority of GWAS findings and we aimed to test the effects of SNPs identified in Europeans in Pakistanis. We selected the ten SNPs most strongly associated with triacylglycerol levels [8, 9] (excluding FADS1 rs174547 and GCKR rs1260326 because they are strongly associated with several other quantitative traits relevant to diabetes), the five SNPs most strongly associated with BMI [10], 16 SNPs associated with fasting glucose [7] and 29 SNPs associated with blood pressure [12].
Genotyping
We genotyped all SNPs except the five BMI SNPs using the KASPAR assay, a modified Taqman assay (www.kbioscience.co.uk). The five BMI SNPs were genotyped using a pre-designed Taqman SNP genotyping assay from Applied Biosystems (Warrington, UK), followed by genotype clustering using Klustercaller software (Kbiosciences, Hoddesdon, UK). The final number of individuals analysed differed by trait because of different genotyping success rates (for any one trait, we required individuals to have >85% of the SNPs typed for that trait successfully genotyped) and, for fasting glucose, because we excluded individuals with diabetes.
Fasting glucose SNPs
We excluded participants with diabetes as defined by use of glucose-lowering medication or fasting glucose ≥7 mmol/l, since these individuals would be on glucose-lowering agents, making their fasting glucose levels uninformative. Appropriate samples were available from 1,544 individuals, of which 18 (1%) failed four or more of the 16 attempted SNPs in the batch; these samples were excluded. The call rates were therefore generated from a subset of 1,526 samples, for which at least 13 of 16 SNPs were called (Fig. 1). Call rates by SNP in these 1,526 samples ranged from 96.81% to 99.4%.
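The sample-level filter described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function name `passes_call_threshold`, the use of `None` to represent a failed call and the toy genotypes are all assumptions.

```python
from typing import Optional, Sequence


def passes_call_threshold(genotypes: Sequence[Optional[int]], max_failed: int) -> bool:
    """Return True if the sample has at most `max_failed` failed SNP calls.

    A failed call is represented by None (an assumption of this sketch).
    """
    n_failed = sum(1 for g in genotypes if g is None)
    return n_failed <= max_failed


# Fasting glucose panel: 16 SNPs attempted. Samples failing four or more
# SNPs are excluded, so at most three failures are tolerated.
sample_ok = [0, 1, 2, None, 1, 0, 2, 1, None, 0, 1, 1, 2, 0, None, 1]      # 3 failed calls
sample_bad = [0, None, 2, None, 1, 0, None, 1, None, 0, 1, 1, 2, 0, 1, 1]  # 4 failed calls

print(passes_call_threshold(sample_ok, max_failed=3))
print(passes_call_threshold(sample_bad, max_failed=3))
```

The same function would apply to the other panels by changing `max_failed` to match the exclusion rule stated for each trait.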
Triacylglycerol SNPs
We used 2,111 participants, including those with diabetes. Of these, 60 (3%) failed two or more of the ten attempted SNPs in the batch and we excluded these samples. The call rates were therefore generated from a subset of 2,051 samples, for which at least nine of ten SNPs were called (Fig. 1). Call rates by SNP ranged from 98.5% to 100%.
BMI SNPs
We used 2,004 participants, including those with diabetes. Of these, 249 (12%) failed two or more of the five attempted SNPs in the batch and were excluded. The call rates were therefore generated from a subset of 1,755 samples, for which at least four of the five SNPs were called (Fig. 1). Call rates by SNP ranged from 88.7% to 99.4%.
Blood pressure SNPs
We used 1,833 samples. Of these, 15 (0.8%) failed 22 or more of the 25 attempted SNPs in the batch and were excluded. The call rates were therefore generated from a subset of 1,818 samples, for which at least 23 of 25 SNPs were called (Fig. 1). Call rates by SNP ranged from 94.2% to 98.6%.
Statistical analyses
Association of SNPs with phenotypes in South Asians
We log10-transformed triacylglycerol levels and systolic and diastolic blood pressure. Fasting glucose did not require transformation. We generated within-study z scores for fasting glucose and log10-transformed triacylglycerol using the mean and SD of the samples. We used individual SNPs as independent variables and outcome measures as dependent variables in a per-allele test, with age, sex, BMI, ethnicity and clustering of household members as covariates. We coded ethnicity as a non-ordinal variable with values of 1, 2, 3 and 4 for the Muhajir, Sindhi, Punjabi and other ethnicities, respectively. For individual SNP associations, we did not have adequate power to replicate the effects observed in European samples, and we did not correct single SNP analyses for multiple testing. Instead, we report the number of SNPs associated in the same direction as the European associations at p < 0.05 and compare this with the one in 20 SNPs expected by chance. We also tested the significance of genetic risk scores that combine information from all known variants. For these tests we had sufficient power: assuming that the combined variants explain >0.7% of the phenotypic variance and a minimum sample size of 1,500 individuals, we had >89% power to detect effects at p = 0.05.
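As a rough check on the power statement for the genetic risk scores, the calculation can be approximated as below. The method behind the quoted figure is not described here, so the normal approximation, the non-centrality formula n × R² / (1 − R²) and the function name `approx_power` are assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n: int, r2: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, 1-df test for a predictor that
    explains a fraction `r2` of the outcome variance in `n` individuals.

    Uses a normal approximation with shift sqrt(n * r2 / (1 - r2));
    this formula is an assumption of the sketch, not taken from the paper.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = sqrt(n * r2 / (1 - r2))
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# 1,500 individuals, score explaining 0.7% of phenotypic variance.
print(round(approx_power(1500, 0.007), 2))
```

Under these assumptions the result is about 0.90, in line with the >89% power quoted above.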
We created weighted allele scores and accounted for the varying, previously reported effect sizes of each SNP using Eq. (1), where w is the β-coefficient from the individual regressions of SNP genotype against the outcome.
$$ \text{Weighted score} = \left( w_1 \times \text{SNP}_1 \right) + \left( w_2 \times \text{SNP}_2 \right) + \cdots + \left( w_n \times \text{SNP}_n \right) $$
(1)
We rescaled the weighted score to reflect the number of available SNPs using Eq. (2) as described in Lin et al. [18]:
$$ \text{Allele score} = \frac{\text{Weighted score} \times \text{Number of SNPs available}}{\text{Sum weight of the available SNPs}} $$
(2)
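Equations (1) and (2) can be combined in a short function. This is a minimal sketch: the function name `allele_score`, the convention that `None` marks a SNP that failed genotyping, and the example weights are hypothetical, and the weights are assumed to have a non-zero sum.

```python
from typing import Optional, Sequence


def allele_score(genotypes: Sequence[Optional[int]], weights: Sequence[float]) -> float:
    """Weighted allele score rescaled to the number of available SNPs.

    genotypes: risk-allele counts (0, 1 or 2) per SNP; None marks a missing
               call (an assumption of this sketch).
    weights:   per-SNP beta-coefficients, in the same order as `genotypes`.
    """
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must have the same length")
    available = [(g, w) for g, w in zip(genotypes, weights) if g is not None]
    if not available:
        raise ValueError("no genotyped SNPs available for this individual")
    weighted_score = sum(w * g for g, w in available)       # Eq. (1)
    sum_weights = sum(w for _, w in available)
    return weighted_score * len(available) / sum_weights    # Eq. (2)


weights = [0.10, 0.05, 0.20]                 # hypothetical effect sizes
print(allele_score([2, 1, 0], weights))      # all three SNPs genotyped
print(allele_score([2, None, 0], weights))   # second SNP missing
```

With equal weights the rescaled score reduces to the plain count of risk alleles, so it stays on the scale of "number of alleles" whatever the weights are.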
We used this allele score as the independent variable, and fasting glucose, triacylglycerol, BMI and systolic or diastolic blood pressure as the dependent variables, with age, sex, ethnicity, clustering of household members and BMI (except for the BMI analysis) as covariates in linear regression analyses.
For all traits, we also performed a sensitivity analysis by computing the allele score from only those alleles associated with the dependent variable at p < 0.05, and assessed the significance of each model and the variance it explained.
We also evaluated the association between SNPs and secondary phenotypes. For this, we used the weighted allele score as independent variable and type 2 diabetes status (defined as use of glucose-lowering medications or a fasting blood glucose level of ≥7.0 mmol/l), hypertension status (defined as mean systolic blood pressure ≥140 mmHg or mean diastolic blood pressure ≥90 mmHg or taking anti-hypertensive medication) or hypertriacylglycerolaemia (defined as serum triacylglycerol >1.7 mmol/l) as the dependent variables in logistic regression analyses, with age and sex as covariates.
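The three case definitions above translate directly into simple predicates. The function and parameter names below are illustrative assumptions; only the thresholds come from the text.

```python
def has_type2_diabetes(fasting_glucose_mmol_l: float, on_glucose_lowering_meds: bool) -> bool:
    """Glucose-lowering medication or fasting glucose >= 7.0 mmol/l."""
    return on_glucose_lowering_meds or fasting_glucose_mmol_l >= 7.0


def has_hypertension(mean_sbp_mmhg: float, mean_dbp_mmhg: float, on_antihypertensives: bool) -> bool:
    """Mean systolic >= 140 mmHg, mean diastolic >= 90 mmHg, or on treatment."""
    return on_antihypertensives or mean_sbp_mmhg >= 140 or mean_dbp_mmhg >= 90


def has_hypertriacylglycerolaemia(triacylglycerol_mmol_l: float) -> bool:
    """Serum triacylglycerol strictly above 1.7 mmol/l."""
    return triacylglycerol_mmol_l > 1.7


# Hypothetical participant: normal glucose, raised diastolic pressure, borderline lipids.
print(has_type2_diabetes(5.4, on_glucose_lowering_meds=False))
print(has_hypertension(132, 92, on_antihypertensives=False))
print(has_hypertriacylglycerolaemia(1.7))
```

Note that the diabetes and hypertension thresholds are inclusive, whereas the triacylglycerol threshold is strict, so a value of exactly 1.7 mmol/l is not classified as a case.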
Mendelian randomisation and instrumental variables analysis
We performed a series of instrumental variables (IV)-based Mendelian randomisation analyses on several pairs of traits, as described below. We used the weighted allele score associated with the primary trait as the ‘instrument’ to test the relationship between each pair of primary and secondary phenotypes. An IV analysis relates the variation in the potentially causal risk factor of interest (here one of the quantitative traits) that is influenced by the ‘instrument’ (here the weighted allele score) to the outcome (here one of the quantitative traits or type 2 diabetes). This method assumes that the IV (1) is not associated with measured or unmeasured confounders (likely to be true for genetic variants) [11] and (2) is related to the outcome only via its effect on the risk factor. This produces an estimate of the causal effect in a similar way to an intention-to-treat analysis in a randomised controlled trial [11]. To ensure our tests were as close as possible to meeting these assumptions, we excluded two SNPs (in or near GCKR and FADS1) known to have multiple effects on metabolic phenotypes.
When the outcome was a continuously distributed trait, we performed the IV estimation for each outcome in each study using the two-stage least squares estimator, implemented in the Stata command ‘ivreg2’. We tested for a difference between the IV and observational estimates using the Durbin–Wu–Hausman test of endogeneity.
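The logic of the two-stage least squares estimator can be illustrated with a single instrument and no covariates. This is a teaching sketch in plain Python, not a reimplementation of ‘ivreg2’: the helper names and the toy data are assumptions, covariates and household clustering are ignored, and no standard errors are computed (standard errors taken naively from the second-stage regression would not be valid).

```python
from statistics import mean


def ols(x, y):
    """Intercept and slope from a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(instrument, exposure, outcome):
    """Point estimate of the effect of exposure on outcome using one instrument."""
    # Stage 1: predict the exposure from the instrument (allele score).
    a0, a1 = ols(instrument, exposure)
    fitted = [a0 + a1 * z for z in instrument]
    # Stage 2: regress the outcome on the genetically predicted exposure.
    _, causal_effect = ols(fitted, outcome)
    return causal_effect


# Toy data: the confounder u is uncorrelated with the score z by construction,
# and the true effect of exposure on outcome is 2.
z = [0, 1, 2, 3, 4, 5]                               # allele score (instrument)
u = [1, -2, 1, 1, -2, 1]                             # unmeasured confounder
x = [0.5 * zi + ui for zi, ui in zip(z, u)]          # exposure
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]        # outcome

print(two_stage_least_squares(z, x, y))   # IV estimate, close to the true effect
print(ols(x, y)[1])                       # observational slope, biased by u
```

In this toy example the observational slope is inflated by the confounder, while the IV estimate recovers the true effect; a difference of this kind is what the Durbin–Wu–Hausman test is designed to detect.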
When the outcome was a dichotomous trait (type 2 diabetes case–control status) we performed IV analysis using a logistic control function estimator. The analysis was performed in two stages. In the first stage, we assessed the observational association between the weighted allele score and the triacylglycerol z score. We saved the predicted values and residuals from this regression model. In the second stage, we used the predicted values from stage 1 as the independent variable (reflecting an unconfounded estimate of triacylglycerol levels due to these genotypes) and diabetes status as the dependent variable in a logistic regression analysis. The residuals from stage 1 were included as a covariate, representing residual variation in triacylglycerol levels that is not due to these genotypes. We then used a Wald test to assess the evidence of any difference between the predicted values coefficient (IV estimate of the causal effect of triacylglycerol levels on type 2 diabetes) and the residuals coefficient as a test of endogeneity.
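The two stages of the control function estimator can be written compactly as follows. The notation is introduced here for illustration only (it does not appear in the original description), and covariates are omitted for brevity: TG is the triacylglycerol z score, S the weighted allele score, and T2D the type 2 diabetes status of individual i.

$$ \text{Stage 1:}\quad \text{TG}_i = \alpha_0 + \alpha_1 S_i + \varepsilon_i, \qquad \widehat{\text{TG}}_i = \hat{\alpha}_0 + \hat{\alpha}_1 S_i, \qquad \hat{\varepsilon}_i = \text{TG}_i - \widehat{\text{TG}}_i $$

$$ \text{Stage 2:}\quad \operatorname{logit} P\left( \text{T2D}_i = 1 \right) = \beta_0 + \beta_1 \widehat{\text{TG}}_i + \beta_2 \hat{\varepsilon}_i $$

In this notation, β1 is the IV estimate of the causal effect (on the log-odds scale), and the Wald test of endogeneity corresponds to testing the null hypothesis β1 = β2.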
The sensitivity analysis for the IV analysis was performed using a weighted allele score restricted to alleles significantly associated with the outcomes.
Given a minimum sample size of 1,500, we had 80% power, at p < 0.05, to detect combinations of genetic variants that explain >0.5% of phenotypic variance. It is difficult to estimate the statistical power of Mendelian randomisation tests, but an approximate method is to take the product of the SNP–primary trait and primary trait–secondary trait effects. For example, an effect explaining 0.5% of variance would require a SNP–primary trait association explaining 5% of variance and a primary trait–secondary trait association explaining 10% of variance. Most associations in our study were weaker than this, and power was therefore lower than 80% in most tests. However, given that these calculations are very approximate, we present the results of the tests, which in themselves provide a better basis for power calculations in further studies [19, 20]. All statistical analyses were performed in Stata/IC v.12.1 for Windows (Stata, College Station, TX, USA).
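The product rule used in the worked example above amounts to a single multiplication. The function name is hypothetical and, as noted, the approximation itself is very rough.

```python
def approx_mr_variance_explained(r2_snp_primary: float, r2_primary_secondary: float) -> float:
    """Approximate variance in the secondary trait explained through the instrument,
    taken as the product of the two variance-explained fractions."""
    return r2_snp_primary * r2_primary_secondary


# Worked example from the text: 5% x 10% of variance explained.
effect = approx_mr_variance_explained(0.05, 0.10)
print(f"{effect:.1%}")
```

This reproduces the 0.5% figure, which is the smallest combined effect detectable with 80% power at the stated sample size.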