Introduction

Given the relatively small sample sizes obtainable in pharmacogenetic studies1, consortia are forming with the goal of improving power to detect associations by pooling data generated from multiple studies. One effort relevant to this work is the International Serious Adverse Event Consortium (iSAEC). Phase 1 of the project has resulted in one of the largest drug-induced liver injury (DILI) research collections in the world, and initial studies have successfully identified genetic variants associated with DILI2.

While major risk factors for DILI have been identified in some cases (e.g., HLA-B*5701 as a major risk factor for liver injury due to flucloxacillin), no SNPs were identified for DILI across all drugs, nor have any been validated in this collection when flucloxacillin and co-amoxiclav are excluded. This is likely due to limited power to detect modest SNP effects with the relatively small cohort size. Understanding the heritability of these conditions is one step toward understanding the genetic architecture of pharmacogenomic risk and may provide valuable information for designing future studies.

Genome-wide association studies (GWAS) usually assume that common diseases are attributable to allelic variants present in more than 1–5% of the population3. However, most variants identified so far in GWAS explain relatively small increases in risk. Moreover, for complex conditions where heritability has been estimated through twin and family studies, GWAS findings have not accounted for those estimates. For example, the heritability of height is estimated to be 80%, but only 4% can be explained by variants identified through GWAS4. This discrepancy has been described as the missing heritability problem5. One proposed explanation is that rare variants with larger effects, gene-gene interactions, and/or gene-environment interactions are poorly detected by GWAS approaches6. Another is that common variants of smaller effect have not yet been identified. Peter Visscher's group has developed the Genome-wide Complex Trait Analysis (GCTA) method to measure the effect of common variants using a mixed model with genome-wide SNP data7. Applying this approach has provided further evidence that a substantial proportion of heritability is captured by common SNPs in complex traits including height and body mass index (BMI).

This approach is particularly attractive in the context of pharmacogenomics studies, where ascertainment of family data is difficult or even impossible. Because pharmacogenomics studies implicitly assume that adverse drug reactions (ADRs) are subject to substantial genetic control, it is important to present evidence of the size of the genetic component (or heritability) for the trait under investigation.

Heritability estimates for complex traits are typically determined through twin studies. However, these approaches have limited utility in the context of susceptibility to ADRs8. Limitations are primarily due to difficulties recruiting and obtaining clinical outcome data in twins. This limitation might be overcome by applying the GCTA approach that estimates the effect of common variants from genome-wide SNP data. In this study we assessed the performance of this approach with DILI GWAS datasets.

Results

Proportion of heritability captured by genome-wide common variants for drug-induced liver injury

We estimated the proportion of h2 captured by all genome-wide common SNPs, by chromosome 6 SNPs and by genome-wide SNPs excluding chromosome 6 for DILI. Results for liver injury induced by co-amoxiclav and by flucloxacillin are summarized in Table 1. For flucloxacillin-induced DILI, chromosome 6 explained almost all of the heritability (h2all = 0.48 ± 0.313, h2chr6 = 0.478 ± 0.076, h2NOchr6 = 0.18 ± 0.40). For co-amoxiclav-induced DILI, by contrast, chromosome 6 explained only part of the heritability (h2all = 0.40 ± 0.16, h2chr6 = 0.17 ± 0.041, h2NOchr6 = 0.378 ± 0.17), indicating that contributions from additional common variants are yet to be found. Values of h2 may be underestimated because the GCTA algorithm constrains SNP heritability on the observed scale to be no greater than 1. Even so, this finding suggests that continued collection of DILI cases is valuable for discovering additional associations with common genetic variants.

Table 1 Heritability estimates from iSAEC genome-wide data for co-amoxiclav- and flucloxacillin-induced drug-induced liver injury

GCTA algorithm estimation of heritability for moderate sample sizes

The heritability of T1D is estimated from pedigree studies to be 0.9 (refs. 9,10). To date, the proportion of phenotypic variance explained by validated SNPs and genome-wide significant variants (including pre-GWAS loci with large effects) is 0.6 (refs. 9,11). In addition, the proportion of phenotypic variance explained when all GWAS SNPs are considered simultaneously is 0.3 (refs. 9,12). With the GCTA algorithm, we estimated the proportion of phenotypic variance explained by all genome-wide SNPs, chromosome 6 genome-wide SNPs and genome-wide SNPs without chromosome 6. Results for the full WTCCC dataset are summarized in Table 2. Results for 75 and 200 cases are summarized in Table 3. Our simulations to assess the impact of sample size on heritability estimates indicated relatively stable estimates with moderate sample sizes. We estimate the proportion of h2 captured by common SNPs for T1D to be, on average, 0.51 ± 0.23 with 75 cases and 0.46 ± 0.15 with 200 cases. These estimates are similar to those obtained with the entire dataset of 1963 cases and 2938 controls (h2all = 0.44 ± 0.037). Moreover, estimates calculated from 75 and 200 cases gave similar results, with both indicating that chromosome 6 explains part of the heritability (the h2chr6 estimate is 0.19 ± 0.15 with 75 cases and 0.12 ± 0.058 with 200 cases). For 75 cases, however, the standard deviation of our h2chr6 estimate is close to the mean. This indicates that estimates are less stable with smaller sample sizes, underscoring the uncertainty in the estimates for flucloxacillin-induced DILI. Values of h2chr6 calculated from this T1D dataset may also be underestimated due to inadequate SNP density in the MHC region.
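The comparison of standard deviation to mean for the h2chr6 estimates can be made explicit as a ratio. The short Python sketch below uses only the values reported in the text; the function name relative_spread is hypothetical and is introduced purely for illustration.

```python
def relative_spread(mean_estimate, sd_estimate):
    """Return the standard deviation expressed as a fraction of the mean."""
    if mean_estimate <= 0:
        raise ValueError("mean_estimate must be positive")
    return sd_estimate / mean_estimate


# h2chr6 mean and standard deviation over subsamples, as reported in the text.
H2_CHR6 = {75: (0.19, 0.15), 200: (0.12, 0.058)}

if __name__ == "__main__":
    for n_cases, (mean_h2, sd_h2) in H2_CHR6.items():
        ratio = relative_spread(mean_h2, sd_h2)
        print(f"{n_cases} cases: SD/mean = {ratio:.2f}")
```

A ratio approaching 1 means the spread across subsamples is as large as the estimate itself, which is the sense in which the 75-case estimate is described as less stable.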

Table 2 Heritability estimates from WTCCC genome-wide data for T1D
Table 3 Heritability estimation from WTCCC genome-wide data for T1D with limited number of samples

Analysis of measurement errors in the drug-induced liver injury dataset

Lastly, we observe that the estimated h2 is 0 when controls from the flucloxacillin-induced and co-amoxiclav-induced DILI datasets are coded as cases. Results are summarized in Table 4. We conclude that there were no substantial measurement errors in our datasets.

Table 4 Heritability estimation to assess measurement errors

Discussion

While further investigation is required to confirm the robustness of the GCTA algorithm for low-prevalence traits such as DILI, this work highlights the potential value of its application. Here we were able to apply the algorithm to investigate the contribution of common variants at the chromosome level. Such investigations can provide insight into disease mechanism and into inter-individual variation. Moreover, for ADRs in particular, we are able to provide previously unobtainable estimates of heritability. Such estimates will help optimize the design of future studies for identifying additional genetic contributions to these conditions.

Our findings suggest that collecting and genotyping more co-amoxiclav-induced DILI cases is valuable for discovering additional associations.

Methods

Study population

DILI datasets were from case-control studies of individuals taking flucloxacillin (77 cases and 288 population controls)2 and individuals taking co-amoxiclav (201 cases and 532 population controls)13. The genotyping data were generated using the Illumina 1M-duo array, as described previously2,13.

Genome-wide complex trait analysis algorithm

The GCTA algorithm measures the effect of common variants with genome-wide SNP data using a mixed model approach7. The algorithm involves first estimating the genetic relationship matrix (GRM) of all individuals, then fitting the GRM in a mixed linear model (MLM) for binary traits to estimate the proportion of variance explained by all the autosomal SNPs. We used the GCTA algorithm extended for case-control designs14 to estimate h2 captured by common SNPs for DILI. We also assessed the robustness of the GCTA algorithm with datasets of moderate sample sizes and low prevalence.
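For illustration, the first step (estimating the GRM) can be sketched in Python. This is not the GCTA implementation. It assumes genotypes coded as 0, 1 or 2 copies of a reference allele, estimates allele frequencies from the sample itself, and, as a simplification, applies the same standardized cross-product to diagonal and off-diagonal entries. The function names are hypothetical.

```python
def allele_frequencies(genotypes):
    """Estimate the reference-allele frequency of each SNP.

    genotypes: list of individuals, each a list of allele counts (0, 1 or 2),
    one entry per SNP. All individuals must have the same number of SNPs.
    """
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    if any(len(row) != n_snps for row in genotypes):
        raise ValueError("all individuals must have the same number of SNPs")
    return [
        sum(row[i] for row in genotypes) / (2 * n_individuals)
        for i in range(n_snps)
    ]


def genetic_relationship_matrix(genotypes):
    """Return an n x n matrix of average standardized genotype cross-products."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    # Monomorphic SNPs carry no information and would divide by zero.
    usable = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]
    if not usable:
        raise ValueError("no polymorphic SNPs available")
    grm = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            total = 0.0
            for i in usable:
                p = freqs[i]
                # Center each genotype by its expectation 2p and scale by
                # the binomial variance 2p(1 - p).
                total += (
                    (genotypes[j][i] - 2 * p)
                    * (genotypes[k][i] - 2 * p)
                    / (2 * p * (1 - p))
                )
            grm[j][k] = grm[k][j] = total / len(usable)
    return grm


if __name__ == "__main__":
    toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]  # 4 individuals, 3 SNPs
    for row in genetic_relationship_matrix(toy):
        print([round(value, 3) for value in row])
```

Restricting the SNP indices passed to such a function (for example, to chromosome 6 only, or to all chromosomes except 6) is one way to picture the chromosome-level partitioning used in this study.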

Estimating proportion of heritability captured by common variants

We used the GCTA algorithm to estimate h2 from iSAEC DILI genome-wide data (all chromosomes), chromosome 6 data and genome-wide data without chromosome 6. We estimated h2 for all individuals (co-amoxiclav- or flucloxacillin-induced DILI), for individuals with co-amoxiclav-induced DILI and for individuals with flucloxacillin-induced DILI. For individuals with co-amoxiclav-induced DILI, we evaluated northwest Europeans and southern Europeans, both together and separately. Each analysis was performed separately. For each population and dataset, unless otherwise noted, we calculated h2 adjusting for prediction errors due to global structure and local structure. We used principal components analysis (PCA) to adjust for prediction errors due to global structure; two principal components were included as regression covariates in the mixed model. We also used an optional functionality of the GCTA tool to adjust for prediction errors due to imperfect linkage disequilibrium. All analyses were performed for DILI prevalence estimated to be 0.00012,15 and 0.0005. We report main results using a prevalence of 0.0005, based on recent work (the estimated rate is 0.00043 in the Icelandic population16) and a general trend of underreporting of DILI incidence17.
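The role of the prevalence parameter can be illustrated with the standard transformation from the observed (0/1) scale to the liability scale for ascertained case-control samples. The Python sketch below assumes this commonly used transformation; the text above does not state the formula, and the function name and the observed-scale input value are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, case_fraction):
    """Convert an observed-scale h2 to the liability scale.

    prevalence: population prevalence K of the trait.
    case_fraction: proportion P of cases in the analysed sample.
    Assumed transformation: h2_l = h2_o * [K(1-K)]^2 / (z^2 * P(1-P)),
    where z is the standard normal density at the liability threshold.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not 0.0 < case_fraction < 1.0:
        raise ValueError("case_fraction must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


if __name__ == "__main__":
    # Sample composition of the flucloxacillin dataset (77 cases, 288 controls).
    case_fraction = 77 / (77 + 288)
    hypothetical_h2_observed = 0.5  # illustrative input, not a study result
    for prevalence in (0.0001, 0.0005):
        value = observed_to_liability(
            hypothetical_h2_observed, prevalence, case_fraction
        )
        print(f"K = {prevalence}: liability-scale h2 = {value:.3f}")
```

Under this transformation, the same observed-scale value maps to a lower liability-scale estimate when the assumed prevalence is lower, which is why the choice of DILI prevalence matters for the reported h2.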

Assessing the robustness of the GCTA algorithm with moderate sample sizes

To evaluate the robustness of the GCTA algorithm with datasets of moderate sample sizes, we conducted a positive control experiment and a negative control experiment. The positive control experiment consisted of simulations in which we estimated h2 for subsets of cases and controls from a type 1 diabetes (T1D) dataset. We chose a T1D dataset because the major histocompatibility complex (MHC) region on chromosome 6 makes a major genetic contribution to the risk of both DILI2 and T1D11,18. Specifically, we used the Wellcome Trust Case Control Consortium (WTCCC) T1D dataset (1963 cases and 2938 controls)18. Genotyping for the WTCCC study was conducted using an Illumina 550K chip. We estimated h2 from genome-wide data (all chromosomes), chromosome 6 data and genome-wide data without chromosome 6 for cases and controls in a 1:3 ratio, where the number of cases was 75 or 200 to reflect sample sizes similar to those of the iSAEC DILI datasets. Estimates of h2 were averaged over twenty random selections of cases and controls for each of these sample sizes. The prevalence of T1D is estimated to be 0.008 (ref. 19). We report the mean and standard deviation of the twenty h2 estimates. We also report h2 estimates and standard errors for the full WTCCC T1D dataset with the GCTA prevalence parameter set at 0.0005 and at 0.008. Because the WTCCC data were collected from a homogeneous population, we did not adjust for prediction errors due to global structure.
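The subsampling design of the positive control experiment can be sketched as follows. This Python sketch is illustrative only: it assumes sampling without replacement and the sample standard deviation, neither of which is specified in the text, and estimate_h2 is a hypothetical callback standing in for a full GCTA run on the selected individuals.

```python
import random
from statistics import mean, stdev


def subsample(case_ids, control_ids, n_cases, rng, controls_per_case=3):
    """Draw n_cases cases and controls_per_case times as many controls."""
    cases = rng.sample(case_ids, n_cases)
    controls = rng.sample(control_ids, n_cases * controls_per_case)
    return cases, controls


def repeated_estimates(case_ids, control_ids, n_cases, estimate_h2,
                       n_repeats=20, seed=0):
    """Return the mean and standard deviation of h2 over random subsamples.

    estimate_h2(cases, controls) is a hypothetical placeholder for the
    heritability estimation performed on each subsample.
    """
    rng = random.Random(seed)  # fixed seed so the selections are reproducible
    estimates = []
    for _ in range(n_repeats):
        cases, controls = subsample(case_ids, control_ids, n_cases, rng)
        estimates.append(estimate_h2(cases, controls))
    return mean(estimates), stdev(estimates)


if __name__ == "__main__":
    # Identifier ranges sized like the WTCCC T1D dataset described above.
    all_cases = list(range(1963))
    all_controls = list(range(10000, 10000 + 2938))
    toy_rng = random.Random(42)

    def toy_estimator(cases, controls):
        # Stand-in that returns noise; a real analysis would estimate h2 here.
        return 0.45 + toy_rng.gauss(0.0, 0.1)

    for n_cases in (75, 200):
        avg, sd = repeated_estimates(all_cases, all_controls, n_cases,
                                     toy_estimator)
        print(f"{n_cases} cases: mean = {avg:.2f}, SD = {sd:.2f}")
```

With 200 cases, each selection draws 600 controls, which remains within the 2938 controls available.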

For the negative control experiment, we tested for measurement errors by estimating h2 for DILI controls coded as cases with WTCCC controls. This analysis was performed with northwestern European controls from the flucloxacillin-induced and co-amoxiclav-induced DILI datasets. The prevalence of DILI was set to 0.0005.