Introduction

Pharmacogenomics is the study of how inter-individual genetic variation determines drug response or toxicity [1]. With the rapid development and increasing application of genome-wide genotyping and sequencing technologies, the field has shifted from evaluating single genes or pathways known to be associated with a drug's metabolic detoxification profile towards evaluating millions of variants using a comprehensive, unbiased approach. Genome-wide association studies (GWAS) involve the rapid evaluation of common SNPs throughout the genome for associations with complex diseases or pharmacological traits, and can be used in various study designs, including case-control studies, cohort studies and clinical trials [2]. The field of oncology is especially invested in the discovery of pharmacogenomic markers that predict drug response or toxicity, because chemotherapeutic drugs often have narrow therapeutic indices with toxicity or non-response being potentially life-threatening [3]. The aim is to identify genetic markers that will facilitate physician decision-making regarding optimal drug selection, dose and treatment duration on a patient-by-patient basis, with consequent improvement in drug efficacy and decreased toxicity.

Recent advances in sequencing technologies, statistical genetics analysis methods and clinical trial designs have shown promise for the discovery of variants associated with drug response. Successful clinical GWAS of cancer pharmacogenomic phenotypes have been reported [4–11], but replication of germline variant associations has been difficult, often owing to challenges associated with large clinical trials and a lack of well-defined replication populations in oncology. In this review, we will focus mainly on the contribution of germline genetic variations in chemotherapeutic toxicity and response, and discuss the advantages and limitations of GWAS in patient cohorts and lymphoblastoid cell lines (LCLs). Finally, we will reflect on the challenges of pharmacogenomic discovery for cancer chemotherapeutics and the implementation of these discoveries in the clinical setting.

Challenges of pharmacogenomic discovery

Pharmacogenomic studies of cancer differ from those of other complex diseases in several ways. For one, there are two genomes (germline and tumor) to be considered. Variation in the germline genome represents inter-individual inherited genetic differences. In contrast, the tumor genome is composed of acquired somatic mutations that have accumulated over the evolution of the cancer, in addition to germline SNPs. Thus, variation in the tumor genome represents disease variation. The tumor genome is undeniably important in explaining the heterogeneous responses seen in patients treated with chemotherapy. An excellent example of this is the identification of somatic mutations in the tyrosine kinase domain of the epidermal growth factor receptor (EGFR) gene that correlate with response to gefitinib in non-small-cell lung cancer patients [12, 13]. However, previous studies have shown that chemotherapeutic response is likely a heritable trait, suggesting that germline genetic variation also contributes to a patient's response to a drug [14–16]. The role of the germline genome in cancer pharmacogenomics will be the main focus of this review.

Another characteristic of pharmacogenomics in the field of oncology is the difficulty of performing studies in humans, especially using pedigrees or related individuals. Chemotherapeutics are too toxic to be given to unaffected individuals, and as a result classical genetic studies with related individuals are not possible. Furthermore, chemotherapy response and toxicity are probably multigenic traits; therefore, for most drugs, many biologically important signals do not reach genome-wide significance but may contribute to some extent to the trait [17–19]. One solution to these challenges is to use a very large clinical study for the discovery of markers and then to confirm the findings in a large validation cohort [20]. However, this brings up one of the greatest challenges: clinical studies are very expensive, and large clinical studies in which all patients receive a single chemotherapeutic agent at the same dosage regimen are rare. Confounders might include concomitant medications or alternative therapies [21]. Despite these challenges, pharmacogenomic discovery has led to the identification of genetic markers associated with response to chemotherapy. Yet, even when significant genotype-pharmacological phenotype associations have been validated, effectively applying these discoveries to clinical practice remains challenging.

Genetic variants in germline DNA

Contributions to chemotherapeutic toxicity

There are several well-studied relationships between germline genetic variation in a metabolizing gene and drug toxicity. This has led to the inclusion of pharmacogenomic information for chemotherapeutics in the US Food and Drug Administration (FDA) drug labels to ensure prescribing physicians are aware of the consequences of relevant genetic information. Discoveries of pharmacogenomic-trait-associated genetic polymorphisms that have resulted in inclusion of pharmacogenomic information in FDA drug labels are listed in Table 1. We list only genetic variants, but there are several other biomarkers that can be utilized when prescribing drugs, including gene expression changes, chromosomal translocations and copy number variations.

Table 1 Genetic polymorphisms that are included as pharmacogenomic information in FDA labels for chemotherapeutic agents

Genetic variation in thiopurine methyltransferase (TPMT) is associated with myelosuppression after 6-mercaptopurine (6-MP) and 6-thioguanine (6-TG) treatment [22]. 6-MP is a standard treatment option for the most common childhood malignancy, acute lymphoblastic leukemia (ALL) [23]. In addition, data suggest that genetic testing of TPMT may be important not only for determining TPMT-related 6-MP toxicity but also for determining response to 6-MP, measured by minimal residual disease (MRD), in the early course of childhood ALL [24]. Dose modifications based on TPMT genetic testing are now recommended by the FDA, and have been adopted widely at St Jude Children's Research Hospital and certain other centers in the treatment of pediatric ALL [25, 26].

Genetic variation in the metabolizing enzyme UDP glucuronosyltransferase 1 family, polypeptide A1 (UGT1A1) is associated with irinotecan-induced neutropenia [27, 28]. Irinotecan is used to treat rhabdomyosarcoma and refractory solid tumors, and the high association between drug toxicity and genetic variation in UGT1A1 has resulted in an FDA-mandated label change [29].

Another well-studied example is the association between 5-fluorouracil (5-FU)/capecitabine toxicities and genetic variation in dihydropyrimidine dehydrogenase (DPYD), the rate-limiting enzyme in 5-FU catabolism [30, 31]. DPYD genetic variants, specifically heterozygosity for the defective DPYD*2A allele, were found to be a risk factor for 5-FU toxicities, including leucopenia and severe mucositis. Interestingly, the effects of this heterozygosity depended strongly on sex, because increased toxicity was observed only in men with the risk variant [32]. However, the predictive value of DPYD*2A genotyping is limited, and although the FDA label for 5-FU, which is used in the treatment of several cancers, states that patients with DPYD enzyme deficiency should not use 5-FU-based chemotherapy, the FDA does not require genetic testing [15, 33].

These findings are all examples of the successful implementation of genetic testing in the clinic to affect drug treatment strategy. In each case, the genetic variants were discovered by candidate gene studies focusing on genes involved in drug metabolism and were found to have a large effect size. However, for most chemotherapeutics, toxicity and response are probably multigenic traits, dependent on multiple SNPs in modifier genes that have small effect sizes. Thus, a more comprehensive technique, such as GWAS, has been critical for furthering our understanding of genetic influences on chemotherapeutic toxicity and response.

In 2010, a GWAS was conducted that aimed to identify genetic variants associated with a common side effect of aromatase inhibitors, adverse musculoskeletal effects [8]. Aromatase inhibitors are an alternative treatment to tamoxifen for post-menopausal, hormone-dependent breast cancer patients [34, 35]. The GWAS included 293 cases and 585 controls. The four most significant SNPs were located on chromosome 14, and T-cell leukemia 1A (TCL1A) was the gene closest to the four SNPs [8]. Although this study did not include a validation cohort, the authors performed follow-up studies in cell lines to identify potential mechanisms by which these SNPs may be contributing to adverse musculoskeletal effects. They found that one of the SNPs created an estrogen response element and that TCL1A expression was estrogen dependent, suggesting that patients who carry the SNP might be more sensitive to the reduction of estrogen caused by aromatase inhibitor treatment. Although the means by which TCL1A expression causes adverse musculoskeletal effects were not described, the functional follow-up of their GWAS findings was valuable to the study [8]. Having a potential mechanism to at least partly explain why a genetic variant influences drug response increases the chances that it is indeed biologically relevant, especially if a validation cohort is not available.
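As a rough illustration of the kind of single-SNP test that underlies a case-control GWAS such as the one above, the following sketch computes an allelic chi-square test on a 2x2 table of allele counts. It is a minimal example under stated assumptions, not the analysis used in [8]: the function name `allelic_chi2` and the allele counts are invented for illustration (only the allele totals, 2 x 293 and 2 x 585, follow from the sample sizes quoted in the text), and the threshold of 5e-8 is the conventional genome-wide significance level rather than one reported by that study.

```python
from math import erfc, sqrt

# Conventional genome-wide significance level (assumed here, not taken from [8]).
GENOME_WIDE_ALPHA = 5e-8

def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value, odds_ratio) for the minor allele.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    if min(a + b, c + d, a + c, b + d) == 0 or b * c == 0:
        raise ValueError("table has an empty margin; the test is undefined")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts for one SNP: 293 cases (586 alleles)
# and 585 controls (1170 alleles). These counts are invented.
chi2, p, odds = allelic_chi2(150, 436, 200, 970)
print(f"chi2={chi2:.2f}  p={p:.2e}  OR={odds:.2f}  "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

In practice this test is repeated for every genotyped SNP, which is why a stringent threshold is applied and why variants of small effect in modestly sized cohorts often fail to reach it.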

Contributions to response to chemotherapy

In contrast to chemotherapeutic toxicity, which affects normal cells, the tumor genome and the germline genome are probably both important in the response to chemotherapeutics. Many of the FDA-mandated label changes relevant to drug response relate to genetic variants in the tumor genome, such as somatic mutations in EGFR for gefitinib, erlotinib and cetuximab, as mentioned previously. Other well-studied examples of tumor gene-drug pairs are KRAS and cetuximab, and BRAF and vemurafenib (Table 1) [36]. However, several recent studies have demonstrated the importance of germline genetic variation in drug response using a GWAS approach. In 2009, two studies identified genetic variants that are critical in determining pediatric ALL patient prognosis [10, 37]. One paper focused on response to methotrexate, finding that, in a discovery cohort of 434 patients, the most significant associations were with SNPs in the organic anion transporter polypeptide SLCO1B1 [10]. These SNPs were validated in an independent cohort of 206 patients. SLCO1B1 mediates uptake and excretion of substrates from the blood, including methotrexate [38]. Further investigation by sequencing of SLCO1B1 demonstrated that both common and rare variants contribute to methotrexate clearance [4]. These studies were able to identify a novel gene that was previously ignored in candidate gene studies, emphasizing the benefit of utilizing unbiased, genome-wide approaches [39–41].

The other study aimed to identify germline SNPs associated with risk of MRD after chemotherapy to induce ALL remission in pediatric patients [37]. It is important to note that GWAS with a pharmacological phenotype as the measured endpoint in clinical samples provide more specific data related to the drug than GWAS measuring overall survival. There are other examples of studies measuring overall survival in a population of cancer patients treated with a specific drug, but whether the SNPs identified by these studies are involved in drug responsiveness or in other factors important in overall survival, such as disease aggressiveness, cannot be elucidated without further functional studies [5]. This study investigated two independent cohorts of newly diagnosed pediatric ALL cases: 318 patients in St Jude Total Therapy protocols XIIIB and XV, and 169 patients in Children's Oncology Group trial P9906 [37]. The two patient cohorts were on slightly different remission-induction regimens with different time points for MRD measurement. One benefit to this strategy is that SNPs identified in both cohorts would be expected to have broader prognostic significance, but SNPs specific to either induction treatment could be missed. This study identified 102 SNPs associated with MRD in both cohorts, five of which were located within the IL15 locus. These SNPs were also associated with other leukemic phenotypes such as hematological relapse.

Both of these studies highlight the benefits of investigating genetic variants associated with drug response at a genome-wide level. They also address some of the challenges of GWAS, such as the high rate of false discoveries, variation between patient cohorts, and accessibility of validation cohorts. As a complement to clinical studies, LCLs can be used to investigate associations between genetic variation and chemotherapeutic susceptibility.

LCLs as a model for pharmacogenomic discovery

Some of the limitations of clinical GWAS can be overcome by performing whole-genome studies using cellular models. Studies performed in LCLs derived from large pedigrees have demonstrated a significant role of genetics in the variation in cellular sensitivity seen with several chemotherapeutic agents [14, 42–45]. The International HapMap Project was launched in 2002 with the intention of creating a public database of common variations in the human genome [46]. The benefits of HapMap LCLs in identifying genetic variants associated with pharmacological traits include publicly available genotype and sequencing data, allowing for GWAS between the HapMap/1000 Genomes variants [47, 48] and cellular phenotypes. Furthermore, gene expression data [49, 50], cytosine modification patterns [51–53], and microRNA data [54] are publicly available for several of the populations, making them a valuable resource for exploring genotype-phenotype relationships at a genome-wide level. Overlaying these datasets on top of each other allows researchers to investigate genetic and epigenetic influences on gene expression, and how they can affect cellular phenotypes such as cellular sensitivity to a drug (Figure 1). Unlike clinical GWAS, which can only show correlation, LCLs offer the opportunity to test the finding via experimental manipulation and therefore begin to get at the underlying biology. LCLs are an unlimited resource and allow for the evaluation of toxic drugs in a controlled testing system.

Figure 1

Integration of LCL datasets allows for comprehensive investigation of genotype-phenotype relationships. Genotype information can be found in the International HapMap Project or 1000 Genomes Project databases. Publicly available cytosine modification and microRNA data can be included to identify SNPs associated with these epigenetic factors. Genetics and epigenetics can both influence gene transcriptional activity, which may ultimately lead to variation in pharmacological phenotypes.

However, as with any model system, there are disadvantages of working with LCLs for pharmacogenomic discovery. The phenotype observed from in vitro experiments may not be recapitulated in vivo. For example, studies have shown differences in LCL DNA methylation patterns compared to whole blood and peripheral blood samples [55, 56]. This suggests that LCLs may not recapitulate the epigenetic regulation of normal blood cells, which should be taken into consideration when analyzing downstream phenotypes. But there is still a strong genetic influence on inter-individual DNA methylation patterns in LCLs [51], and incorporating these data into epigenetic studies in LCLs may help researchers focus on biologically relevant epigenetic differences. Experiments with LCLs are also subject to in vitro confounders, such as Epstein-Barr virus (EBV) copy number, growth rate differences between cell lines, and thaw effects. A disadvantage that is especially important to take into consideration for pharmacogenomic studies is that most LCLs lack expression of many CYP450 enzymes and several transporters [57]; therefore, they are most useful for identifying the contribution of pharmacodynamic genes.

LCLs seem most appropriate as a model for chemotherapeutic toxicity and, to some extent, chemotherapeutic response, although they do not contain the extensive somatic mutations known to be present in tumors. There are several cellular phenotypes that can be measured to determine cellular sensitivity to a drug, including cytotoxicity, apoptosis, gene expression changes, and intracellular concentration of the drug or metabolite. Owing to the diverse world populations from which LCLs were created, inclusion of multiple ethnic populations allows for either investigation of inter-ethnic differences or meta-analyses of multiple populations to obtain 'cross-population' SNPs [58, 59].

In addition to identifying genetic variants associated with cellular pharmacological traits, LCLs have also been used to map SNPs associated with endophenotypes such as gene expression. Comprehensive expression quantitative trait loci (eQTL) maps can be analyzed in conjunction with pharmacological-trait-associated SNPs to evaluate the potential function of these associated SNPs [60]. Interestingly, SNPs associated with chemotherapeutic-induced cytotoxicity in LCLs are enriched in eQTLs [61]. Since most pharmacogenetic studies previous to GWAS were focused on variation in coding regions of known candidate genes, this was an important finding because it opened up the possibility that SNPs in introns or intergenic regions associated with gene expression contributed significantly to variation in pharmacological phenotypes. Furthermore, connections between pharmacologically important variants and eQTLs may lay the basis for understanding the mechanism behind genetic influence on cellular sensitivity to chemotherapy.

To facilitate the integration of genotype, gene expression and drug phenotype data in LCLs, the 'triangle model' was first proposed in 2007 [62]. The first side of the triangle is a GWAS between SNPs and a pharmacological phenotype. On the second side, eQTL analysis is performed on the most significant SNPs from the first side to identify SNPs associated with expression of a gene. To complete the triangle, the expression of the eQTL target genes is tested for significant correlation with drug sensitivity. For example, the HapMap LCLs were used to investigate the role of genetic variation in susceptibility to cytosine arabinoside (ara-C) [63]. Ara-C is an anti-metabolite used to treat patients with acute myeloid leukemia and other hematological malignancies [64]. Using the triangle method, four eQTLs were identified that explained 51% of the variability in ara-C sensitivity among HapMap individuals of European descent (CEU), and five SNPs were identified that explained 58% of the variation among individuals of African descent (YRI). These SNPs were specific to each population, and the YRI population was observed to be more sensitive to ara-C than the CEU population.
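The three sides of the triangle can be sketched in a few lines of code. The example below is a simplified illustration, not the published method of [62] or [63]: it replaces each formal association test with a Pearson correlation and a hypothetical cutoff `r_min`, and the function names, SNP and gene identifiers, and all data values are invented.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy / sqrt(sxx * syy)

def triangle(genotypes, expression, phenotype, r_min=0.5):
    """Return (snp, gene) pairs that close all three sides of the triangle.

    genotypes:  {snp_id: [0/1/2 minor-allele counts per cell line]}
    expression: {gene_id: [expression level per cell line]}
    phenotype:  [drug sensitivity measure per cell line]
    r_min:      hypothetical |r| cutoff standing in for a significance test
    """
    # Side 1: SNP versus pharmacological phenotype.
    hits = [s for s, g in genotypes.items()
            if abs(pearson_r(g, phenotype)) >= r_min]
    closed = []
    for snp in hits:
        for gene, expr in expression.items():
            # Side 2: SNP versus gene expression (eQTL).
            if abs(pearson_r(genotypes[snp], expr)) < r_min:
                continue
            # Side 3: expression of the eQTL target versus phenotype.
            if abs(pearson_r(expr, phenotype)) >= r_min:
                closed.append((snp, gene))
    return closed

# Invented toy data for six cell lines.
genotypes = {"snp_a": [0, 0, 1, 1, 2, 2], "snp_b": [1, 0, 0, 1, 1, 0]}
expression = {"gene_x": [5.0, 5.5, 7.0, 6.8, 9.1, 9.0],
              "gene_y": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0]}
phenotype = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

print(triangle(genotypes, expression, phenotype))
```

A real analysis would use regression models with covariates and multiple-testing correction on each side; the correlation cutoff here only keeps the sketch short.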

Translation of LCL findings to the clinic

Although the use of LCLs as a model system for cancer pharmacology brings with it a variety of challenges [65], targets discovered through studies using the LCL model have been replicated in clinical trials, arguably the ultimate measure of utility (Figure 2). A candidate-gene approach in LCLs identified SNPs in FKBP that were associated with sensitivity to anti-leukemics, and these SNPs were found to also associate with clinical response in acute myeloid leukemia patients [66]. In another study using the LCL model, novel germline genetic biomarkers of platinum susceptibility were identified, and these variants were replicable in a clinical setting with head and neck cancer patients [67]. In another LCL study, a top SNP associated with resistance to cisplatin was found to be significantly associated with decreased progression-free survival and poorer overall survival in ovarian cancer patients [68]. A similar study assessed cisplatin cytotoxicity in LCLs from the Human Variation Panel. The 168 most significant SNPs identified in the LCL GWAS were then genotyped in 222 small-cell lung cancer and 961 non-small-cell lung cancer patients treated with platinum-based therapy [69]. Several of the top SNPs were trans-eQTLs, and subsequent knockdown of two of the target genes significantly decreased cisplatin sensitivity in three lung cancer cell lines. Although the top SNPs from these two platinum-based studies did not overlap, this may be attributed to the relatively small sample sizes, differences in ethnicities, differences in cell line panels (HapMap versus Human Variation Panel), and other common LCL confounders such as intrinsic growth rate and ATP levels [69].

Figure 2

Translation between cell-based models and clinical studies is bidirectional. The identification of SNPs associated with drug response from a GWAS in LCLs has to be confirmed in patient studies to determine clinical significance. Conversely, SNPs associated with drug response that are identified in a patient cohort and are confirmed in a validation cohort can be experimentally tested in the LCL model to determine biological significance.

Furthermore, recent work from our group has shown that LCLs are able to model paclitaxel-induced peripheral neuropathy. Paclitaxel is a tubulin-targeting agent used in the treatment of many cancers, including breast, lung, head and neck, and ovarian [70]. Peripheral neuropathy is a common side effect of many chemotherapeutic agents, including paclitaxel, and limits their efficacy in patients [71]. A recent GWAS conducted with the CALGB 40101 patient cohort aimed to identify germline genetic variants associated with this adverse effect, and found significant associations with SNPs in FGD4 in both the discovery and validation cohorts [7]. Modeling this toxicity in LCLs would allow for functional follow-up studies to understand better the mechanisms behind this specific adverse effect. To test LCLs as a potential model for peripheral neuropathy, a GWAS was performed in 247 HapMap LCLs and the results from this experiment were compared to the CALGB 40101 GWAS of sensory peripheral neuropathy in 859 breast cancer patients treated with paclitaxel in the previous study. We observed an enrichment of LCL cytotoxicity-associated SNPs in the peripheral-neuropathy-associated SNPs from the clinical trial with concordant allelic directions of effect (empirical P = 0.007) [72]. A second study investigated cis-eQTLs in β-tubulin IIa (TUBB2A) and their correlation with paclitaxel neurotoxicity in 214 cancer patients treated with paclitaxel [73]. Patients with promoter genotypes associated with higher levels of TUBB2A expression experienced less paclitaxel neurotoxicity. In subsequent analyses in LCLs, it was found that increased TUBB2A expression correlated with resistance to paclitaxel. This is another example of how clinical studies and LCL experiments can complement each other to generate a more comprehensive understanding of the role of genetic variation in drug sensitivity [73].
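An empirical P value for enrichment, such as the one quoted above, is typically obtained by comparing an observed overlap with overlaps seen under random resampling. The sketch below shows one simple way to do this and is an assumption-laden illustration rather than the procedure of [72]: it ignores allelic direction of effect and linkage disequilibrium, draws random SNP sets uniformly from a background list, and uses invented SNP identifiers and a hypothetical function name, `empirical_enrichment_p`.

```python
import random

def empirical_enrichment_p(lcl_hits, clinical_hits, background,
                           n_perm=2000, seed=0):
    """Permutation-style test for overlap between two SNP lists.

    Returns (observed_overlap, empirical_p), where empirical_p is the
    fraction of random SNP sets, each the same size as lcl_hits and drawn
    from background, whose overlap with clinical_hits is at least as
    large as the observed overlap.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    clinical = set(clinical_hits)
    lcl = set(lcl_hits)
    observed = len(lcl & clinical)
    pool = sorted(set(background))
    as_extreme = 0
    for _ in range(n_perm):
        draw = rng.sample(pool, len(lcl))
        if sum(1 for snp in draw if snp in clinical) >= observed:
            as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)

# Invented example: 1,000 background SNPs, 50 clinical hits, and 20 LCL
# hits of which 10 are shared with the clinical list.
background = [f"rs{i}" for i in range(1000)]
clinical_hits = background[:50]
lcl_hits = background[:10] + background[500:510]

observed, p_emp = empirical_enrichment_p(lcl_hits, clinical_hits, background)
print(observed, p_emp)
```

The smallest P value this approach can report is 1 / (n_perm + 1), so the number of resamples bounds the resolution of the estimate.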

Clinical implementation of pharmacogenomic discoveries

Although the idea of 'personalized medicine' has generated much excitement, the clinical use of pharmacogenomic discoveries remains uncommon. One of the barriers to the use of pharmacogenomic testing is that some prescribing decisions must be made quickly, making the need to wait for a genetic test unappealing to many physicians [74]. A solution to this is preemptive genetic testing. However, preemptive genetic testing has life-long implications, and the physician must make the decision whether to disclose all of the patient's genetic information or just the information relevant to the current prescribing situation [74]. For example, genetic variation in genes important in drug metabolism and transport may be important in adverse drug responses to several drugs, not just chemotherapy; thus, the patient's genotype for these genes may be useful in future clinical decisions [75].

In order to study the feasibility of incorporating prospective pharmacogenomic testing, the 1200 Patients Project at The University of Chicago has been designed as a model to identify and overcome barriers to the clinical implementation of pharmacogenomics [76]. This model system is prospectively recruiting 1,200 adults who are receiving outpatient care under one of 12 'early adopter' physicians. Preemptive comprehensive pharmacogenomic genotyping will be performed on all patients in a high-throughput Clinical Laboratory Improvement Amendments setting. This addresses the barriers of time delay and cost, because physicians will receive genetic information about a patient from a single, cost-effective test for many pharmacogenomic variants before they prescribe any drug. Using a genotyping platform designed for specific variants associated with pharmacogenomic traits also reduces the ethical concerns raised regarding next-generation sequencing, which may identify incidental genetic findings such as genetic variants associated with disease risk [76].

If genetic information about patients is to be made available to physicians, databases that facilitate physicians' searches for the impact of specific SNPs on relevant drugs will be needed, and are currently being developed [77]. The Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) is an example of a database that serves as an interactive tool for researchers and physicians searching for information on genetic variation and drug response [78]. PharmGKB displays genotype, molecular and clinical data, and lets the user know the strength of the association based on the confidence of the existing literature. Users can search and browse the knowledge base by genes, drugs, diseases and pathways [78]. Yet, even with this information easily accessible, physicians as a community will still need guidance on how to handle such an abundance of knowledge. The realization of this challenge inspired the creation of the Clinical Pharmacogenetics Implementation Consortium (CPIC) in 2009 [79]. CPIC is a collaboration between Pharmacogenomics Research Network members, PharmGKB staff, and experts in pharmacogenetics, pharmacogenomics and laboratory medicine. Its goal is to provide clear, peer-reviewed guidelines to physicians in order to facilitate the effective use of pharmacogenetic tests in the clinic. Even with these efforts in place, ongoing hard work and communication between researchers, physicians, pharmaceutical companies and patients will be required before pharmacogenetic testing is implemented effectively and commonly in the clinic. For more information on the progress in and challenges of clinical implementation of pharmacogenomic testing, please see the following literature [3, 80, 81].

Conclusions

Recent advances in genotyping and sequencing technologies have had a significant impact on the field of pharmacogenomics. The goal of pharmacogenomics is to use a patient's genotype to inform clinical decision-making regarding treatment strategies, with the ultimate goal of avoiding adverse drug reactions while achieving the best drug response. This review has highlighted several successful pharmacogenomic GWAS and discussed the challenges of identifying genetic variants associated with pharmacological traits. Future progress will likely require a combination of patient cohort studies as well as cell-based studies, and effective implementation of pharmacogenomic findings into clinical practice.