Pharmacogenomics and gene regulatory elements: an emerging picture

Most pharmacogenomics studies to date have focused on coding variants of pharmacologically important proteins. However, well-supported examples of variants in regulatory elements of genes involved in drug response, such as drug metabolizing enzymes and transporters (see review by Georgitsi et al. [1]), show that variants in noncoding regulatory sequences are also likely to be important (Table 1). Three classes of regulatory elements have been studied in this context: promoters, enhancers and microRNAs (miRNAs). Each of these has a direct impact on the abundance of messenger RNA (mRNA) (in the case of promoters and enhancers) and protein (in the case of miRNAs). Genetic variation within each of these elements has been linked to human disease as well as interindividual differences in drug response. For example, a single nucleotide polymorphism (SNP) in the promoter of VKORC1, the gene encoding vitamin K epoxide reductase complex subunit 1, radically affects an individual's response to the anticoagulant warfarin [2]. Likewise, a SNP within an enhancer in the vicinity of several solute carrier family (SLC) drug transporters is associated with increased clearance of methotrexate (MTX) [3], and a SNP within a 3'-untranslated region (UTR) miRNA binding site prevents resistance to the chemotherapeutic cisplatin [4].

Table 1 Pharmacogene regulatory variants linked to drug response

While a rough estimate of the number of coding genes exists [5], it is unclear how many regulatory elements there are in the genome. Defining critical regulatory elements is difficult because the search space is vast (98% of our genome is noncoding) and lacks clear sequence cues such as open reading frames. Next-generation sequencing (NGS) approaches are rapidly changing the status quo, revealing the location and function of regulatory elements on a genomic scale. Robust, high-throughput DNA sequencing platforms emerged as the Human Genome Project came to a close in 2003, as did a desire to establish a reference human epigenome [6]. Key technical advances have brought this goal closer to reality by enabling rapid de novo detection of DNA methylation [7, 8], enhancers [9, 10] and RNA transcripts [11–13] on a genome-wide level (Table 2).

Table 2 Current next-generation sequencing technologies that are suitable for pharmacogene regulatory element discovery

In this review, we discuss the role of each class of regulatory element in drug response variability, and how NGS approaches have shaped our understanding of these mechanisms. We also discuss NGS technologies such as deoxyribonuclease I sequencing (DNase-Seq), formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-Seq), high-throughput chromosome conformation capture (Hi-C) and chromatin interaction analysis by paired-end tag sequencing (ChIA-PET). These technologies have not yet been applied to pharmacogene regulation but will greatly improve our ability to interpret noncoding genetic variation. Finally, we comment on the need for more efficient functional validation, and discuss other challenges that need to be considered in moving NGS data into the clinic.

Pharmacogene regulatory elements

There are many different classes of gene regulatory elements, including promoters, enhancers, miRNAs, silencers and insulators (Figure 1; for detailed reviews see Maston et al. [14] and Noonan et al. [15]). In this review, we focus on the first three classes, each of which has been linked to multiple pharmacogenomic phenotypes (Table 1). This does not mean that other classes of regulatory elements are unimportant for pharmacogene regulation. They most likely are important, but pharmacogene examples have yet to be identified. As functional validation assays improve, a more complete picture of regulatory mechanisms will no doubt emerge.

Figure 1

Schematic summarizing the roles of different classes of regulatory elements. The proximal promoter (dark blue) is located in the immediate vicinity (-250 bp to +250 bp) of the gene's transcription start site (TSS; indicated by the arrow pointing to the right). Additional elements up to 5 kb upstream of this region are still considered part of the promoter (light blue). Promoters are enriched for transcription factor binding sites (TFBS) and are thought to serve as tethering elements for enhancers. The formation of an enhancer-promoter loop activates transcription of the target gene. MicroRNAs (miRNAs) can modulate the abundance of specific mRNA transcripts by binding to their 3' untranslated regions (UTRs). Silencers are thought to have the opposite effect to enhancers, turning off the expression of genes in specific tissues and at specific time points. Insulators are thought to act as barriers, preventing enhancers and silencers from regulating neighboring genes.

Promoters

Gene promoters are located at the 5' end of their target gene and often have two separate domains, known as the core and proximal promoter regions. The core promoter, where the transcription machinery assembles [16, 17], is usually 35 to 40 base pairs (bp) long. The proximal promoter is located in the immediate vicinity (-250 bp to +250 bp) of the gene's transcription start site (TSS). It contains several transcription factor binding sites (TFBS) and is thought to serve as a tethering element for enhancers, enabling them to interact with the core promoter [18]. Additional elements up to 5 kb upstream of the proximal promoter are often considered part of the 'promoter region', and are designated as such in this review. Mammalian promoters often contain CpG islands: sequences at least 200 bp in length with >50% G/C content [19]. CpG islands are unmethylated in tissues in which their target gene is expressed [20, 21]. In disease states, however, their methylation can silence the target gene.
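The CpG island definition above can be turned into a simple sliding-window scan. The sketch below is a minimal illustration, not a production tool. It uses a 200 bp window and >50% G+C, as in the text. It also applies an observed/expected CpG ratio above 0.6, taken from the classic Gardiner-Garden and Frommer criteria; that criterion is an added assumption not stated above. The function name `cpg_island_windows` is hypothetical.

```python
def cpg_island_windows(seq: str, window: int = 200, min_gc: float = 0.5,
                       min_obs_exp: float = 0.6) -> list[tuple[int, int]]:
    """Return merged (start, end) spans, 0-based half-open, built from
    windows that meet the CpG-island criteria."""
    seq = seq.upper()
    passing = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # "CG" cannot overlap itself, so count() is exact
        gc_fraction = (c + g) / window
        # Observed/expected CpG ratio (assumed extra criterion; see lead-in).
        obs_exp = cpg * window / (c * g) if c and g else 0.0
        if gc_fraction > min_gc and obs_exp > min_obs_exp:
            passing.append((start, start + window))

    # Merge overlapping or touching windows into contiguous islands.
    merged: list[tuple[int, int]] = []
    for start, end in passing:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Toy input: 300 bp of CG repeats flanked by AT-rich sequence.
print(cpg_island_windows("AT" * 200 + "CG" * 150 + "AT" * 200))
```

The scan reports windows rather than exact island boundaries. The merged span can therefore extend somewhat beyond the GC-rich core, depending on the window size.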

Numerous studies have shown that genetic variants in promoters have functional effects on drug response. Two well-studied pharmacogene promoter variants lie in the genes encoding VKORC1 and UDP glucuronosyltransferase 1 family polypeptide A1 (UGT1A1). They have been linked to the anticoagulant response to warfarin [22] and to irinotecan-induced diarrhea and neutropenia [23, 24], respectively. VKORC1 is the target of warfarin (Figure 2). A common variant in its promoter region (rs9923231, -1639G>A; global minor allele frequency 0.467) results in the formation of a novel E-box binding site. This leads to lower VKORC1 mRNA expression [2] and therefore a lower effective dose of warfarin. The site is thought to recruit transcription factors that suppress gene expression by activating repressive histone modification complexes. This variant explains much of the variability in average dose requirements among Caucasians, and is incorporated in warfarin-dosing algorithms to improve treatment outcomes [25–27]. The active metabolite of irinotecan, SN-38, is inactivated by glucuronidation by UGT1A1 [28], whose promoter contains five to eight copies of a 'TA' repeat. UGT1A1 expression is negatively correlated with the number of TA repeats. The seven-repeat allele (denoted UGT1A1*28) was shown to be significantly associated with higher-grade neutropenia and diarrhea in patients treated with irinotecan [29–31].
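The A(TA)nTAA structure of the UGT1A1 TATA element lends itself to a simple pattern match. In the sketch below, the repeat count n follows the usual A(TA)nTAA convention, so the terminal TAA is not counted. A seven-repeat element is labeled UGT1A1*28, as described above. The function names are hypothetical. The sketch assumes the promoter sequence has already been extracted from a read or assembly, and it does not handle sequencing errors or heterozygous samples.

```python
import re
from typing import Optional

# A, then n "TA" units, then TAA. The greedy (TA)+ backtracks one unit so that
# the closing TAA is matched separately and not counted as a repeat.
_TATA_ELEMENT = re.compile(r"A((?:TA)+)TAA")


def ugt1a1_ta_repeats(promoter: str) -> Optional[int]:
    """Return n for the longest A(TA)nTAA element in the sequence, or None."""
    counts = [len(m.group(1)) // 2
              for m in _TATA_ELEMENT.finditer(promoter.upper())]
    return max(counts, default=None)


def label_ta_allele(n: int) -> str:
    """Name the allele; only the seven-repeat *28 allele is named here."""
    return "UGT1A1*28" if n == 7 else f"(TA){n}"


six = "GGC" + "A" + "TA" * 6 + "TAA" + "GG"    # toy six-repeat element
seven = "GGC" + "A" + "TA" * 7 + "TAA" + "GG"  # toy seven-repeat element
print(ugt1a1_ta_repeats(six), label_ta_allele(ugt1a1_ta_repeats(seven)))
# prints: 6 UGT1A1*28
```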

Figure 2

Examples of regulatory element variants affecting promoter activity and microRNA binding. (a) A promoter variant in the gene encoding vitamin K epoxide reductase complex subunit 1 (VKORC1). VKORC1 is an enzyme that converts vitamin K into an active form that is vital for blood coagulation, and it is the target of warfarin. Individuals with the -1639A promoter variant have diminished levels of this enzyme, resulting in increased sensitivity to a typical warfarin dose. (b) Some tumors become resistant to the chemotherapeutic cisplatin by upregulating the miR-200b/200c/429 family of microRNAs. These microRNAs blunt the action of the drug by binding to and downregulating the mRNA of AP-2α, a key effector of cisplatin treatment. In cell lines harboring the TFAP2A 3' UTR polymorphism rs1045385 A>C, miR-200b/200c/429 cannot bind its mRNA target, resulting in increased responsiveness to cisplatin.

Several large-scale pharmacogene promoter sequencing studies have been conducted, and they illustrate the potential advantages of using NGS technologies in the future. In one such example, the promoters of 107 ATP-binding cassette (ABC) and SLC drug-associated transporter genes were resequenced in an ethnically diverse cohort of 272 individuals, identifying several variants that affect expression levels [32]. Another study systematically identified noncoding expression quantitative trait loci (eQTLs) affecting expression of cytochrome P450 (CYP) enzymes in the liver [33], which play key roles in drug metabolism and toxicity. Studies such as these provide valuable functional annotations that can be mined in future pharmacogenomic association studies and whole-genome datasets.

Enhancers

Enhancers interact with promoters, instructing them when, where and at what level to express their target gene. They can act in cis, regulating a nearby gene on the same chromosome, or in trans, regulating genes on a different chromosome [34]. cis enhancers can be located distal to the 5' or 3' end of the regulated gene, in introns, or even within a coding exon of their target gene [35, 36]. The Sonic Hedgehog (SHH) limb enhancer lies approximately 1 Mb (1,000,000 bp) from the SHH TSS, highlighting the difficulty of linking such elements with their target genes [37]. Enhancers are thought to direct tissue-specific expression in a modular fashion, so a gene that is active in many tissues is likely to be influenced by multiple enhancers [38, 39]. Genetic variation within enhancers can have direct consequences in human disease [15, 40–42], but information on their role in interindividual drug response is scarce.

A useful example of pharmacogene enhancers from the literature is that of liver CYPs, which metabolize the vast majority of pharmaceutical compounds. Of these, CYP3A4 is the most abundantly expressed in the liver, a major site of drug disposition [43]. It is also thought to single-handedly catalyze the metabolism of >50% of prescribed pharmaceutical agents. CYP3A4 activity can vary 5- to 20-fold between individuals (depending on the substrate) [44], and its protein expression level can vary up to 40-fold [45]. None of the 28 common SNPs in the CYP3A4 locus has been linked to variability in its expression [46]. This suggests that the variability stems from other, possibly distal, regulatory variants. Two regions, 7.7 kb and 10.5 kb upstream of the gene encoding CYP3A4, were shown to drive its expression in transgenic mouse studies [47, 48]. A trinucleotide insertion within one of these regions is present in about 3.1% of the French population, and leads to reduced induction of CYP3A4 expression in cell culture models [48]. This insertion is relatively rare in other populations, and its effects on adverse drug reactions are unclear. Nevertheless, this study provides evidence that enhancer variants can contribute to interindividual differences in drug response. Distal enhancers have also been discovered for genes encoding other CYP family members, including CYP1A1 [49], CYP1B1 [49], CYP2E1 [50] and CYP2B6 [51], as well as for genes encoding other liver enzymes, such as alcohol dehydrogenase 4 (ADH4) [52] and UGT1A1 [53]. The phenotypic effects of variants within these regions are mostly unknown, mainly because physiologically relevant studies are difficult to carry out.

Our laboratory recently used comparative genomics to identify evolutionarily conserved regions (ECRs) near nine liver membrane transporter genes. We then screened these ECRs in vivo using the hydrodynamic tail vein assay [3]. In this technique, a large volume of plasmid-containing solution is rapidly injected into the tail vein of an adult mouse, resulting in expression of a reporter, such as luciferase, predominantly in the liver. Using this approach, five ECRs near the genes encoding ABCB11, SLC10A1, SLCO1B1, SLCO1A2 and SLC47A1 were identified as enhancers. Common human genetic variants within these regions were further functionally characterized. One was associated with reduced mRNA expression of SLCO1A2 in human liver tissue samples. Another was associated with increased clearance of MTX, a chemotherapeutic substrate of SLCO1A2 that is used to treat several malignancies as well as psoriasis and rheumatoid arthritis. As NGS techniques become more widely adopted, we will be able to rapidly identify distal regulatory elements and key variants within them.
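ECRs are typically defined from a genome alignment as stretches that exceed a minimum identity over a minimum length. The sketch below assumes two already-aligned sequences of equal length, with gaps written as '-'. It uses a cutoff of 70% identity over 100 bp, which is commonly used but is an assumption here rather than necessarily the threshold used in [3]. `find_ecrs` is a hypothetical name, and coordinates are alignment columns, not genomic positions.

```python
from itertools import accumulate


def find_ecrs(ref_aln: str, other_aln: str, window: int = 100,
              min_identity: float = 0.70) -> list[tuple[int, int]]:
    """Return half-open (start, end) alignment-column spans formed by merging
    windows whose identity is >= min_identity."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both sequences carry the same base; gap columns never match.
    matches = [int(a == b and a != "-")
               for a, b in zip(ref_aln.upper(), other_aln.upper())]
    prefix = [0, *accumulate(matches)]  # prefix[i] = matches in columns [0, i)

    ecrs: list[tuple[int, int]] = []
    for start in range(len(matches) - window + 1):
        identity = (prefix[start + window] - prefix[start]) / window
        if identity >= min_identity:
            end = start + window
            if ecrs and start <= ecrs[-1][1]:
                ecrs[-1] = (ecrs[-1][0], end)  # extend the current ECR
            else:
                ecrs.append((start, end))
    return ecrs


# Toy alignment: identical for 120 columns, then diverged.
print(find_ecrs("A" * 200, "A" * 120 + "C" * 80))  # prints [(0, 150)]
```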

MicroRNAs

miRNAs are small (18 to 25 nucleotide) noncoding RNAs that regulate gene expression by binding to partially complementary sequences in the 3' UTRs of target mRNAs. They are endogenously transcribed as precursors and processed into mature forms [54–56]. Mature miRNAs harbor a 'seed' region, spanning nucleotides 2 to 8 from the 5' end, that is crucial for binding to target mRNA. Upon binding, the miRNA initiates translational repression or cleavage of its target mRNA [57, 58]. SNPs within the seed region of the miRNA or within the binding site on the target mRNA (miRSNPs) affect miRNA targeting and can lead to interindividual differences in expression. Rarer variants can occur in genes involved in miRNA biogenesis and maturation, leading to more severe, syndromic phenotypes [59–62]. Compared with enhancers, miRNAs are relatively easy to identify using computational tools [63, 64], and extensive databases of known and predicted miRNAs are publicly available [65].
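Target recognition depends mainly on the seed. A first-pass scan for candidate binding sites, and for miRSNPs that destroy them, can therefore be as simple as searching a 3' UTR for the reverse complement of nucleotides 2 to 8. The sketch below finds only exact 7-nucleotide seed matches (often called 7mer-m8 sites). Real prediction tools additionally score other site types, conservation and sequence context. The sketch uses the let-7a mature sequence only because its seed is well known. The UTR sequences are toy examples, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_site(mirna: str) -> str:
    """DNA sequence of a target site pairing with miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("U", "T")[1:8]  # 1-based positions 2..8
    return seed.translate(_COMPLEMENT)[::-1]     # reverse complement


def find_seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based start positions of exact seed-match sites in a 3' UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")
    hits, pos = [], utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
print(seed_site(let7a))                          # prints CTACCTC
print(find_seed_sites(let7a, "AAACTACCTCAGGG"))  # prints [3]
print(find_seed_sites(let7a, "AAACTCCCTCAGGG"))  # toy A>C change: prints []
```

Comparing the hit lists for the reference and alternative UTR alleles is the simplest way to flag a candidate miRSNP. A site that disappears, as in the toy A>C change above, suggests loss of miRNA targeting.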

Despite abundant evidence that miRNAs participate in almost all aspects of cell biology [66–68], there are only a handful of examples of their role in interindividual drug response. Overexpression of miR-27b, which binds the 3' UTRs of CYP1B1 [69] and CYP3A4 [70], leads to CYP3A4 downregulation and increased sensitivity to cyclophosphamide [70]. miR-27a and miR-451 activate expression of P-glycoprotein, the ABCB1 gene product, which renders cancer cells resistant to chemotherapeutics. Antagomirs are small synthetic RNAs complementary to their target miRNAs. Treatment with miR-27a and miR-451 antagomirs increases accumulation of doxorubicin in drug-resistant cells [71]. It has been reported recently that miR-200c is downregulated in patients resistant to breast cancer therapy, as well as in human breast cancer cell lines resistant to doxorubicin [72], suggesting additional mechanisms in this pathway.

An example of a variant affecting pharmacogene miRNA targeting is C829T, near the miR-24 binding site in the 3' UTR of the gene encoding human dihydrofolate reductase (DHFR). DHFR is a metabolic enzyme that is key to DNA synthesis. MTX binds DHFR with high affinity, inhibiting its activity in malignant cells. The C829T variant interferes with miR-24 targeting, resulting in DHFR overexpression and MTX resistance [73]. Another example is rs1045385 A>C, in the miR-200b/200c/429 binding site in the 3' UTR of the gene encoding transcription factor AP-2α (TFAP2A) (Figure 2). AP-2α acts as a tumor suppressor by regulating key genes involved in cell proliferation and apoptosis, and it can be induced by the chemotherapeutic agent cisplatin. Cancerous cells from certain endometrial and esophageal tumors can become resistant to cisplatin by upregulating miR-200b, miR-200c and miR-429, which bind to the 3' UTR of AP-2α mRNA and repress its translation. The rs1045385 C allele interferes with miR-200b/200c/429 targeting of AP-2α. This prevents AP-2α downregulation and allows effective cisplatin treatment [4].

Wang et al. [74] recently performed a pair-wise correlation coefficient analysis on expression levels of 366 miRNAs and 14,174 mRNAs in 90 immortalized lymphoblastoid cell lines. They identified 7,207 significantly correlated miRNA-mRNA pairs, with a good representation of metabolic enzymes (for example, CYP family) and drug transporters (for example, ABC and SLC family). Datasets such as these provide excellent troves of candidate regulatory elements for functional validation. The use of NGS methods such as RNA-Seq will greatly aid in the generation of such datasets and will help shed light on the role of miRNAs in regulating drug responses.

NGS approaches for investigating pharmacogene regulatory elements

Although gene promoters are easily identified by their location, their activity in different cell types and disease states can be altered by a myriad of intrinsic regulatory factors and genetic variants. Epigenetic factors such as DNA methylation can also affect promoter activity, resulting in differential response to drug treatment [75–79]. For example, methylation of the promoter of the gene encoding O-6-methylguanine-DNA methyltransferase (MGMT) is a good predictor of the efficacy of temozolomide in the treatment of glioblastoma patients [78, 79]. NGS technologies that can analyze DNA methylation on a genomic scale (Table 2), such as MethylC-Seq [7, 8], reduced representation bisulfite sequencing (RRBS) [80, 81], methylated DNA immunoprecipitation sequencing (MeDIP-Seq) [82], methylated DNA binding domain sequencing (MBD-Seq) [83], CXXC affinity purification plus deep sequencing (CAP-Seq) [84] and methylation-sensitive restriction enzyme sequencing (MRE-Seq) [85] will facilitate future systematic epigenetic studies of pharmacogene regulation.
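The bisulfite-based methods in this list (MethylC-Seq and RRBS) read out methylation base by base. Bisulfite converts unmethylated cytosines to uracil, which is read as T after PCR, while methylated cytosines remain C. The minimal sketch below pools per-CpG counts into a region-level methylation estimate, such as one might compute for the MGMT promoter. It assumes that per-CpG counts of C-supporting and T-supporting reads have already been tabulated for a promoter, and it uses a hypothetical 5-read coverage floor. Enrichment-based methods such as MeDIP-Seq and MBD-Seq do not yield per-CpG counts and are not covered.

```python
from typing import Optional


def cpg_methylation(meth_reads: int, unmeth_reads: int) -> Optional[float]:
    """Fraction of reads showing C (methylated) at one CpG, or None if uncovered."""
    total = meth_reads + unmeth_reads
    return meth_reads / total if total else None


def region_methylation(cpg_counts: list[tuple[int, int]],
                       min_coverage: int = 5) -> Optional[float]:
    """Pool (C reads, T reads) over CpGs with adequate coverage in a region."""
    meth = total = 0
    for c_reads, t_reads in cpg_counts:
        if c_reads + t_reads >= min_coverage:  # skip poorly covered CpGs
            meth += c_reads
            total += c_reads + t_reads
    return meth / total if total else None


# Toy promoter with three CpGs; the last is below the coverage floor.
print(region_methylation([(10, 0), (5, 5), (0, 2)]))  # prints 0.75
```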

Regulatory proteins influence gene expression by interacting with specific DNA sequences. Determining which proteins bind to which sites in the genome is the first step to understanding regulatory mechanisms. Chromatin immunoprecipitation (ChIP) approaches have been widely used for this purpose. More recently, coupling ChIP with NGS (ChIP-Seq; Figure 3) [86] has become the de facto standard, as it provides an unbiased, genome-wide view of protein-DNA binding with a high signal-to-noise ratio [87]. Beyond specific regulatory proteins, high-quality antibodies against histone modifications have been used to characterize chromatin regulatory states [10, 88, 89]. The nucleosome core consists of histone proteins, which can be modified post-translationally (for example, by methylation, acetylation, phosphorylation, ubiquitination and sumoylation). These modifications mark the regulatory state of the genomic region in which they occur (active, silent, and so on). They can therefore be used to detect gene regulatory elements such as promoters and enhancers. For example, developmentally active enhancers can be identified by acetylation of lysine 27 of histone H3 (H3K27ac) [90].
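ChIP-Seq peak callers commonly ask whether the number of reads in a genomic window exceeds what a background model predicts. The sketch below is a simplified, MACS-style Poisson test. It assumes reads have already been counted in fixed bins and that control (input) counts are already scaled to ChIP sequencing depth. The local expected count is taken as the larger of a genome-wide background and the control count. Real callers also model fragment size, use several local windows and correct for multiple testing. The function names and p-value cutoff are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); adequate for modest lam."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_enriched_bins(chip: list[int], control: list[int],
                       background: float, p_cutoff: float = 1e-5) -> list[int]:
    """Indices of bins whose ChIP count is improbably high under the local
    Poisson expectation max(background, control count)."""
    calls = []
    for i, (k, c) in enumerate(zip(chip, control)):
        lam = max(background, c)
        if poisson_sf(k, lam) < p_cutoff:
            calls.append(i)
    return calls


# Toy bins: bin 2 is enriched; bin 3 has equally high input, so it is not.
print(call_enriched_bins([0, 1, 10, 10], [1, 1, 1, 9], background=1.0))  # prints [2]
```

Using the larger of the background and the control count keeps regions that are inherently over-represented in input DNA from being called as binding sites.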

Figure 3

Next-generation sequencing (NGS) techniques to identify gene regulatory elements. For RNA-Seq, RNA of interest is fragmented either before or after conversion to complementary DNA (cDNA), and sequencing adapters are then ligated. In chromatin immunoprecipitation followed by next-generation sequencing (ChIP-Seq), chromatin is crosslinked with formaldehyde, fragmented and then immunoprecipitated with a specific antibody. The crosslinks are then reversed and sequencing adapters are ligated. For next-generation sequencing of deoxyribonuclease I (DNase I) hypersensitive sites (DNase-Seq), chromatin is digested with DNase I. One biotinylated adapter is ligated, and the fragments are then digested with the restriction enzyme MmeI and subjected to biotin pull-down, after which a second adapter is ligated. In formaldehyde-assisted isolation of regulatory elements (FAIRE)-Seq, chromatin is crosslinked with formaldehyde and then fragmented by sonication. Fragments are subjected to phenol-chloroform extraction, and sequencing adapters are ligated to the fragments recovered in the aqueous phase. For chromatin interaction analysis by paired-end tag sequencing (ChIA-PET), chromatin is crosslinked with formaldehyde and then fragmented by sonication. An antibody is used to enrich for protein-bound fragments. Biotinylated half-linkers with MmeI sites are ligated. The half-linkers are connected, and the linked fragments are digested with MmeI and selected by biotin pull-down. Sequencing adapters are then ligated. In high-throughput chromosome conformation capture (Hi-C), chromatin is crosslinked with formaldehyde and fragmented using a restriction enzyme, and the fragment ends are labeled with biotin and ligated. The DNA is then sheared, biotin-containing fragments are enriched by biotin pull-down, and sequencing adapters are ligated.

Large-scale, multi-center efforts have mapped the binding of dozens of regulatory proteins in a variety of human cell lines [91]. These include the human liver cancer cell line HepG2 treated with agents such as forskolin, insulin and pravastatin. Several classes of regulatory proteins have been successfully profiled by ChIP-Seq. These include transcription factors such as signal transducer and activator of transcription 1 (STAT1) or 3 (STAT3) [92, 93], and co-activators that co-localize with enhancers, such as CREB binding protein (CREBBP/CBP) and E1A binding protein p300 (EP300/p300) [94, 95]. They also include p160 co-regulators [95] and nuclear receptors such as farnesoid X receptor (FXR/NR1H4) and pregnane X receptor (PXR/NR1I2) [96, 97]. However, only one study to date has used ChIP-Seq to interrogate a pharmacogenomically relevant drug response in vivo. Cui et al. [98] mapped PXR binding in mice before and after treatment with pregnenolone-16α-carbonitrile (PCN). Rifampin is an antibiotic with hepatotoxic side effects that activates human PXR. PCN activates rodent PXR in an analogous way and is thought to induce many of the same targets. In addition to identifying many novel PXR-bound loci, the authors identified a new DNA motif recognized by the factor. Results such as these are invaluable for elucidating complex drug response mechanisms. Furthermore, regulatory regions that are enriched in ChIP-Seq data only after addition of a drug make attractive candidates for discovering variants underlying interindividual drug response.

Over the past two decades, microarray-based methods have substantially improved our ability to quantify gene transcription through gains in throughput. RNA-Seq (Figure 3) has the potential to push the boundaries of our knowledge further by offering an unbiased approach that requires no prior knowledge of transcript variants, and offers single base pair resolution and high dynamic range [11–13]. RNA-Seq is currently the only method that can rapidly detect novel splice isoforms [99–101] and mRNA sequence variants (for example, RNA editing) on a genome-wide scale. Current commercially available RNA-Seq sample preparation kits require as little as 10 pg of total RNA [102], allowing the possibility of strand-specific sequencing of mRNA species from single cells.
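Raw RNA-Seq read counts need to be normalized for transcript length and sequencing depth before expression levels can be compared across genes or samples. One common scheme is transcripts per million (TPM). Each gene's count is divided by its length in kilobases, and the values are then rescaled to sum to one million in each sample. The minimal sketch below assumes per-gene counts and transcript lengths are already available; gene names and numbers are toy values.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Transcripts per million from read counts and transcript lengths (bp).

    Assumes at least one gene has a nonzero count."""
    # Reads per kilobase corrects for longer transcripts yielding more reads.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    # Scaling by the per-sample total corrects for sequencing depth.
    per_million = sum(rpk.values()) / 1_000_000
    return {gene: value / per_million for gene, value in rpk.items()}


values = tpm({"GENE_A": 100, "GENE_B": 200}, {"GENE_A": 1_000, "GENE_B": 2_000})
# Both genes get ~500,000 TPM: GENE_B has twice the reads but is twice as long.
print(values)
```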

A primary use for RNA-Seq in pharmacogenomic studies is the determination of gene expression profiles that can be correlated with drug response phenotypes. Different proteins govern pharmacokinetic interactions with the drug (how the drug enters cells and reaches its target) and pharmacodynamic interactions (how the drug exerts its cellular effects), so focusing on any single gene is of limited use. For example, breast cancer cell lines show differential responses to drugs depending on their gene expression profiles [103]. In addition, expression levels of several genes have been shown to be statistically associated with response to common chemotherapy agents such as etoposide [104], cisplatin [105] and carboplatin [106]. The Pharmacogenomics Knowledge Base project [107, 108] curates pharmacogenomic data from a wide variety of basic and clinical reports and uses them to construct drug pathways. These pathways routinely involve a dozen or more genes and multiple deleterious variants, and their complexity highlights the need for genome-wide profiling approaches.

For miRNA and miRSNP discovery and profiling, RNA-Seq offers unprecedented scale and depth. For example, Lee et al. [109] conducted a comprehensive survey of miRNA sequence variations from human and mouse samples using RNA-Seq. This study demonstrated the complexity of the human miRNA spectrum through deep sequencing of isomiRs (miRNA sequence variants generated from the same precursor by different processing mechanisms). So far, only a few studies have used NGS methodologies to identify miRNAs for diagnostic or prognostic applications. Most of this research was carried out on tumor development and progression [110–114], with sporadic reports of miRNA profiling in non-cancer diseases (for example, endometriosis, preeclampsia) [115, 116].

The systematic identification and characterization of other classes of regulatory elements will improve our knowledge of how regulatory nucleotide variants affect drug response. Silencers are bound by repressor proteins and can be thought of as the opposite of enhancers; they turn off gene expression at specific time points and locations [14, 15, 117, 118]. Insulators create cis-regulatory boundaries that prevent the transcriptional activity of one gene from affecting neighboring genes [14, 15, 119, 120]. Variants in these elements are likely to influence interindividual drug responses but remain to be identified.

Future directions for pharmacogene regulatory element discovery

Several emerging NGS techniques have not yet been directly applied to pharmacogenomics, but promise to greatly improve our ability to interpret regulatory variants. Accessible DNA in active chromatin often harbors regulatory sequences such as promoters, enhancers, silencers and insulators. Deoxyribonuclease I (DNase I) hypersensitive sites (HS) cluster around TSSs, but a significant portion also maps to regions distal to known TSSs [121]. These sites can be mapped genome-wide by DNase-Seq (Figure 3), which requires no prior knowledge of specific transcription factors. A related approach, FAIRE [122], identifies open chromatin by phenol-chloroform phase separation (Figure 3). Nucleosome-depleted DNA is less efficiently crosslinked to protein and is therefore recovered in the aqueous phase. FAIRE and DNase HS sites are highly correlated overall, but each technique detects some unique sites because of differences between them [123]. Both DNase-Seq and FAIRE-Seq will be useful for broadly identifying regulatory elements in pharmacologically relevant tissues and will help prioritize SNP discovery efforts.
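The agreement between two open-chromatin maps, such as FAIRE-Seq and DNase-Seq peaks, is often summarized as the fraction of peaks in one set that overlap at least one peak in the other. The minimal sketch below works on a single chromosome and assumes half-open (start, end) peak intervals. Genome-scale analyses would typically use dedicated interval tools such as BEDTools. The function names and coordinates here are hypothetical.

```python
from bisect import bisect_left


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort and merge overlapping or touching half-open intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_overlapping(peaks_a: list[tuple[int, int]],
                         peaks_b: list[tuple[int, int]]) -> float:
    """Fraction of peaks in A overlapping any peak in B (same chromosome)."""
    if not peaks_a:
        return 0.0
    b = merge_intervals(peaks_b)  # disjoint and sorted, so ends also ascend
    starts = [start for start, _ in b]
    hits = 0
    for start, end in peaks_a:
        j = bisect_left(starts, end) - 1  # last B interval starting before `end`
        if j >= 0 and b[j][1] > start:
            hits += 1
    return hits / len(peaks_a)


faire = [(100, 200), (500, 600), (900, 950)]
dnase = [(150, 300), (580, 700), (1_000, 1_100)]
print(fraction_overlapping(faire, dnase))  # two of three FAIRE peaks overlap
```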

Enhancers are thought to interact with promoters through chromatin looping (Figure 1) [124]. These looping interactions can be identified using chromosome conformation capture (3C) and several of its derivatives (4C [125], 5C [126]). With the advent of NGS, genome-wide adaptations of this technique have been introduced, such as Hi-C [127] and ChIA-PET [128] (Figure 3). A great advantage of these techniques over ChIP-Seq, DNase-Seq and FAIRE is that they can link regulatory elements with their target genes. They could therefore be used to systematically link variants with individual expression profiles, much like eQTLs but with the power to identify long-distance and trans interactions.
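These methods make it possible to assign an enhancer to the gene whose promoter it physically contacts, rather than simply to the nearest gene. That step can be sketched as follows. The example assumes loop calls are already available as pairs of anchor intervals on one chromosome; trans contacts would additionally need chromosome labels. It uses toy coordinates and hypothetical gene names. It also compares the loop-based assignment with a naive nearest-TSS rule, which would misassign long-range enhancers such as the SHH limb enhancer described earlier.

```python
def _overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def link_by_loops(enhancers: dict[str, tuple[int, int]], tss: dict[str, int],
                  loops: list[tuple[tuple[int, int], tuple[int, int]]]
                  ) -> dict[str, set[str]]:
    """Genes whose TSS lies in the anchor looped to an enhancer's anchor."""
    links: dict[str, set[str]] = {name: set() for name in enhancers}
    for anchor1, anchor2 in loops:
        # A loop has no orientation, so test both anchor assignments.
        for near, far in ((anchor1, anchor2), (anchor2, anchor1)):
            for name, interval in enhancers.items():
                if _overlaps(interval, near):
                    links[name].update(gene for gene, pos in tss.items()
                                       if far[0] <= pos < far[1])
    return links


def nearest_gene(enhancers: dict[str, tuple[int, int]],
                 tss: dict[str, int]) -> dict[str, str]:
    """Naive baseline: gene whose TSS is closest to the enhancer midpoint."""
    return {name: min(tss, key=lambda gene: abs(tss[gene] - (s + e) / 2))
            for name, (s, e) in enhancers.items()}


enhancers = {"E1": (20_000, 21_000)}
tss = {"NEAR_GENE": 10_000, "FAR_GENE": 1_000_000}
loops = [((19_500, 21_500), (999_000, 1_001_000))]
print(link_by_loops(enhancers, tss, loops))  # prints {'E1': {'FAR_GENE'}}
print(nearest_gene(enhancers, tss))          # prints {'E1': 'NEAR_GENE'}
```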

NGS technologies are constantly improving, allowing higher throughput and the ability to ask biological questions on a genomic scale. 'Third-generation' single-molecule sequencing platforms are forthcoming and are reviewed in detail elsewhere [129]. Besides higher throughput and longer read lengths, they eliminate the amplification step. This minimizes non-linear amplification biases and thus alleviates some of the informatics and statistical challenges associated with analyses of RNA-Seq and ChIP-Seq data [130–132]. A significant limitation of current ChIP-Seq protocols is the low resolution (about 200 to 300 bp) with which TFBSs within regulatory elements can be identified. ChIP-exo partially addresses this problem using lambda exonuclease, which degrades DNA strand-specifically in the 5' to 3' direction up to the crosslinked protein. This removes DNA not directly involved in the protein-DNA interaction [133]. This modification of the ChIP-Seq protocol significantly increases the signal-to-noise ratio, enabling much more precise peak calling.

A major obstacle to overcome is our inability to functionally characterize candidate regulatory elements and variants with high throughput. Techniques such as ChIP-Seq, DNase-Seq, FAIRE, Hi-C and ChIA-PET are descriptive in nature. Techniques that allow the functional characterization of thousands of these sequences in a high-throughput manner are therefore critically needed. One approach that could overcome this hurdle is the massively parallel reporter assay. In this assay, each candidate sequence drives a reporter carrying unique transcribed barcodes, whose abundance can be measured by RNA-Seq. Using this methodology, thousands of promoter variants were tested in a single experiment [134], and key bases that negatively affected expression were identified. The approach has recently been extended to enhancers, both in vitro using human cell lines [135] and in vivo using the mouse hydrodynamic tail vein assay [136]. Further development of such approaches will permit high-throughput functional characterization of regulatory elements and the nucleotide variants within them.
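In a barcoded massively parallel reporter assay, each tested sequence is represented by several barcodes; an example is a reference or alternative allele of a regulatory element. Its activity can be estimated as barcode RNA reads (from RNA-Seq) divided by barcode DNA reads (plasmid abundance). A variant's effect is then the log2 ratio of alternative to reference activity. The sketch below is a simplified illustration, not the analysis used in [134–136]. The minimum-DNA filter, the pseudocount and the function names are assumptions. Real analyses also normalize for library depth and estimate uncertainty across barcodes.

```python
import math
from collections import defaultdict


def element_activity(barcodes: list[tuple[str, int, int]],
                     min_dna: int = 10) -> dict[str, float]:
    """Pooled RNA/DNA read ratio per element from (element, rna, dna) rows."""
    rna: dict[str, int] = defaultdict(int)
    dna: dict[str, int] = defaultdict(int)
    for element, rna_reads, dna_reads in barcodes:
        if dna_reads >= min_dna:  # drop barcodes too rare in the plasmid pool
            rna[element] += rna_reads
            dna[element] += dna_reads
    return {element: rna[element] / dna[element] for element in dna}


def variant_effect(activity: dict[str, float], ref: str, alt: str,
                   pseudo: float = 1e-6) -> float:
    """log2(alt / ref) activity; the pseudocount avoids taking log of zero."""
    return math.log2((activity[alt] + pseudo) / (activity[ref] + pseudo))


rows = [("ref", 100, 50), ("ref", 100, 50),
        ("alt", 25, 50), ("alt", 25, 50),
        ("alt", 999, 2)]  # last barcode fails the DNA filter
activity = element_activity(rows)
print(activity, variant_effect(activity, "ref", "alt"))  # effect is close to -2
```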

Translational implications of pharmacogene regulation

As we learn more about how specific variants in regulatory sequences contribute to differential drug responses, personalized drug dosing will become more commonplace. Warfarin has become a poster child for pharmacogenomics owing to the frequency with which it is prescribed and the importance of genetic testing for proper dosing. Warfarin has a very narrow therapeutic index: there is a small difference between clinically beneficial and toxic doses, and a large interindividual variation in maintenance dose. Several reports have confirmed VKORC1 as the target of warfarin and CYP2C9 as the principal enzyme responsible for its metabolism [137–139]. Variants affecting the expression or activity of these genes, together with non-genetic information such as age, weight and drug interactions, can explain approximately 50% of the variability in warfarin maintenance dose [26]. A prospective study demonstrated the therapeutic benefits of genotyping known CYP2C9 and VKORC1 variant alleles to achieve an optimal warfarin dose [137]. The clinical implications of the VKORC1 -1639G>A regulatory variant and coding variants of CYP2C9 prompted the Food and Drug Administration (FDA) to add this information to warfarin labeling [140].

Another example of a regulatory variant that has been translated to the clinic is the UGT1A1*28 promoter allele, which alters an individual's response to the anticancer drug irinotecan. Information about the UGT1A1 variant has been added to the 'Warnings' section of the FDA-approved drug label [140]. The label also summarizes the associated clinically significant adverse reactions, severe neutropenia and diarrhea. If a patient is known to be homozygous for the UGT1A1*28 allele, clinicians are instructed to reduce the prescribed dose of irinotecan by one level. Patients taking irinotecan are often monitored for adverse reactions so that side effects can be managed early. Genotyping tests for pharmacogene variants are becoming more widely available, along with guidelines to help clinicians with dosing and dose adjustment [26, 141–143].

Although many functional pharmacogene variants have been discovered, the uptake of pharmacogenomic testing in the clinic has been slow. There is mounting evidence that pharmacogenomic data can play an important role in identifying responders and non-responders to drugs, avoiding side effects and optimizing dosing for patients. However, it is unclear how biologically significant correlations translate into therapeutic benefit when new clinical practices are adopted. It is vital that we develop a useful framework to sift through and prioritize functional variants for clinical study. At the same time, training and education about the value of pharmacogenomic testing will need to be promoted among health professionals before new policies can be widely adopted.

Concluding remarks

Over the past few years, NGS technologies have greatly accelerated the identification of regulatory elements. However, their use has mainly been limited to genome annotation of physiologically normal cells and tissues. With time, their use in interpreting pharmacogenomic drug-gene interactions will grow rapidly. With each individual genome having millions of nucleotide variants, and reference epigenomic datasets soon to be widely available, there will be a vital need for ways to limit the search space for biologically relevant variants. The use of these technologies in a drug-specific manner, such as the study carried out by Cui et al. [98] to map PCN-induced PXR binding sites in mice, will provide the opportunity to highlight drug response-associated regions in these whole-genome datasets.
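One simple way to limit the search space is to keep only variants that fall inside drug-responsive regulatory regions. Examples are loci that gain transcription factor binding after drug treatment, such as the PCN-induced PXR sites described above. The sketch assumes variants as (chromosome, position) pairs and regions as half-open (chromosome, start, end) intervals in the same coordinate system. All names and coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_region_index(regions: list[tuple[str, int, int]]
                       ) -> dict[str, list[tuple[int, int]]]:
    """Per-chromosome sorted, merged half-open intervals."""
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged: list[tuple[int, int]] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = merged
    return index


def variants_in_regions(variants: list[tuple[str, int]],
                        regions: list[tuple[str, int, int]]
                        ) -> list[tuple[str, int]]:
    """Variants whose position lies inside any region on the same chromosome."""
    index = build_region_index(regions)
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in index.items()}
    kept = []
    for chrom, pos in variants:
        if chrom not in index:
            continue
        j = bisect_right(starts[chrom], pos) - 1  # last region starting <= pos
        if j >= 0 and pos < index[chrom][j][1]:
            kept.append((chrom, pos))
    return kept


drug_induced = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 10, 20)]
candidates = [("chr1", 99), ("chr1", 100), ("chr1", 250), ("chr1", 300),
              ("chr2", 15), ("chr3", 15)]
print(variants_in_regions(candidates, drug_induced))
# prints [('chr1', 100), ('chr1', 250), ('chr2', 15)]
```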

A major challenge will be to bring these experimental results to a clinical setting. Strong collaboration will be needed between scientists, software engineers, clinicians and pharmacists to generate tools to visualize and interpret genomic variation. Several ethical issues will also need to be addressed, such as the privacy and confidentiality of these genomic data, how they will be stored and who can access them. This information will be extremely important throughout the entire prescription process. In addition, rapid high-throughput assays to functionally characterize variants in pharmacogene-associated regulatory elements are still needed. Techniques that use transcribed barcodes alongside NGS technologies, as mentioned previously [134–136], hold great promise. However, these techniques will need to allow rapid functional assessment of each individual's uncharacterized nucleotide variants, so that their functional consequences can be analyzed before a drug is prescribed. Such technologies could also be extremely useful in cancer treatment, by allowing assessment of how de novo mutations alter the efficacy of a drug treatment. The ultimate goal of these studies would be to provide a physician or pharmacist with information about an individual's genomic sequence and any drug-associated gene or regulatory element variants. With this information, the most efficient and least harmful drug could be prescribed for each patient.