Strategic Applications of Gene Expression: From Drug Discovery/Development to Bedside
Gene expression is useful for identifying the molecular signature of a disease and for correlating a pharmacodynamic marker with the dose-dependent cellular responses to drug exposure. It can guide drug discovery by illustrating engagement of the desired cellular pathways/networks, as well as avoidance of toxicological pathways. Successful employment of gene-expression signatures in the later stages of drug development depends on their linkage to clinically meaningful phenotypic characteristics and requires a biologically meaningful mechanism combined with stringent statistical rigor. Much of the success in clinical drug development hinges on predefining the signature genes according to their fitness for the intended application. Specific examples are highlighted to illustrate the breadth and depth of the potential utility of gene-expression signatures, from drug discovery and clinical development to targeted therapeutics at the bedside.
KEY WORDS: clinical molecular signatures; molecular signatures of disease; signature genes; target engagement; toxicological pathways
With the increased availability and lowering costs of DNA technologies, gene expression has become a more readily used tool indispensable in drug discovery and development. Educational institutes and NIH have joined forces to produce several sizable publicly accessible gene expression databases, such as the Connectivity Map (1) and the Tox 21 project (2); these have been compiled to provide resources for mining, with the hopes to provide drug discovery opportunities and repurposing of drugs (3, 4, 5). In order to treat orphan diseases where it is difficult to pool enough data to provide an approvable drug, an enriched data source with multiple databases integrated properly is being pursued (6). Beyond discovery, examples of gene expression are being used in the clinic and demonstrate attempts to optimize personalized medicine (7,8). In light of the breadth and diverse application of gene expression in drug discovery and clinical application, this paper will briefly review examples of gene expression in various stages of drug development followed by a high-level strategic map of the fit-for-purpose application of gene expression with the recognition that this type of platform will continue to advance rapidly.
Depicting Drug Actions at the Molecular Level
One major advantage in drug development is the ability to leverage known biology by developing drug candidates that can either target or avoid specific pathways or networks. For example, the cellular pathways leading to cell death (apoptosis) can be a differentiator in oncology drug candidates; on the other hand, being able to avoid the same cell death pathways could lead to an improved drug safety profile in other therapeutic areas. Gene expression technologies, including microarray, quantitative real-time polymerase chain reaction, and next-generation sequencing, are useful for achieving that goal by illustrating engagement of the targeted receptor(s), pathway or network through the downregulated or upregulated pattern(s) of the intended drug target(s). As gene expression technologies advance, gene expression profiling of the whole genome has become affordable and thus more rapidly adopted. Whole genome expression profiling offers the advantage of providing a series of snapshots of the cellular transcriptional patterns to reveal the temporal cellular responses following exposure to a drug candidate and, thereby, depicts its extent of intracellular actions (9), as well as identifies the genes untowardly perturbed due to its suspected off-target actions (10). An adequate analysis of genome-wide expression data to comprehensively capture all the key actions at the molecular level is, however, challenging because of the complex multiple layers of genetic regulatory mechanisms and feedback loops which are mediated over time. To address this issue, one widely used tactic is gene clustering which groups genes into individual functional categories (11) in order to facilitate proper interpretation of gene expression results. Gene clustering can be used in conjunction with curated knowledge bases of pathways in aiding the interpretation of gene expression results (12). 
Though several pathway databases are publicly accessible (13,14), there are known inconsistencies in annotation/curation and vocabulary standards, as well as in pathway details (15). For example, signaling pathways and networks can differ between primary and immortalized cells even though both are derived from the human liver (16). Another challenge in the assessment and prediction of target engagement arises when only one cell line is used, perhaps because of cost; interpretation based on a single cell line could lead to an incorrect conclusion. Furthermore, the environmental milieu surrounding the cell may cause different cellular responses at both gene and protein levels (16). Most importantly, a few cell lines derived from a few individuals will never adequately represent the broad genetic diversity and variability of the whole human population. The results of a gene expression study are often extrapolated beyond the scope of its study design. Thus, correct illustration of target engagement or off-target effects hinges on understanding the context of a pathway in a specific cell line and on incorporating both translational and biochemical phenotypic characteristics into the interpretation of transcriptional results.
Beyond the cell lines described above, comparisons between healthy and diseased tissues could potentially identify druggable targets (17). The gene expression pattern of a diseased tissue is expected to be reversed after treatment with an effective drug. For example, if a signature gene set is upregulated in the diseased condition, then it is expected to be downregulated relative to pretreatment when the disease is treated effectively. Based on this notion of opposing patterns, gene expression profiles of individual FDA-approved drugs in the Connectivity Map were compared to 100 diseases from the Gene Expression Omnibus database of NCBI; statistical associations were found between cimetidine and small-cell lung cancer (18) and between topiramate and inflammatory bowel disease (IBD) (19). The approved indications were gastric ulcer for cimetidine and epilepsy for topiramate. Following up on these statistical associations, investigators demonstrated the preclinical efficacy of topiramate in treating IBD in a rat model and of cimetidine in treating non-small cell lung carcinoma in a tumor xenograft mouse model. These preclinical findings await further replication prior to entering clinical development; however, these studies demonstrate the utility of integrating disease-associated and drug-caused perturbations at the transcriptional level for drug discovery and repurposing of already approved drugs.
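The signature-reversal idea described above can be made concrete with a small sketch. The scoring rule used here (a signed agreement fraction) and all gene names and values are illustrative assumptions; the Connectivity Map relies on its own rank-based statistic, which this sketch does not attempt to reproduce.

```python
def reversal_score(disease_signature, drug_profile):
    """Score how strongly a drug profile opposes a disease signature.

    disease_signature: dict gene -> +1 (up in disease) or -1 (down in disease)
    drug_profile: dict gene -> log fold change after drug treatment
    Returns a value in [-1, 1]; -1 means every shared gene is reversed.
    """
    # Only genes present in both inputs, with a non-zero drug effect, are scored.
    shared = [g for g in disease_signature
              if g in drug_profile and drug_profile[g] != 0]
    if not shared:
        raise ValueError("no shared, non-zero genes to compare")
    agreement = 0
    for g in shared:
        drug_direction = 1 if drug_profile[g] > 0 else -1
        agreement += disease_signature[g] * drug_direction
    return agreement / len(shared)


# Hypothetical disease signature and drug profiles (made-up values).
disease = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1, "GENE_D": -1}
drug_x = {"GENE_A": -1.2, "GENE_B": -0.4, "GENE_C": 0.9, "GENE_D": 0.3}
drug_y = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": -1.1, "GENE_D": 0.2}

scores = {"drug_x": reversal_score(disease, drug_x),
          "drug_y": reversal_score(disease, drug_y)}
# Most negative score first: the candidate that best reverses the signature.
ranked = sorted(scores, key=scores.get)
```

In this toy input, `drug_x` opposes the disease direction for every gene and therefore ranks first as a repurposing candidate; a real analysis would also need a significance estimate, for example by permutation.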
Genetic mutations create diverse characteristics within an ethnic population and among all ethnic groups and form the basis for the variability in patient sensitivity to a drug treatment. Recent advances in next-generation sequencing technology have made whole exome sequencing more accessible and affordable for detecting genetic mutations in a more comprehensive way. Genome-wide expression profiling with a limited number of cell lines, when used in conjunction with genetic mutation data, can help to better predict and understand the variability of patient response to a drug treatment. Such an approach integrating both genetic and transcriptomic levels of information is important to ensure the success of large phase 3 clinical trials and adequate post-approval efficacy of a drug. Since it is not practical to access the organ tissues from a very large number of human subjects for genome-wide expression profiling, a compendium of close to 1,000 human cell lines with diverse lineages and genetic mutations has been shown to be a useful alternative for gathering the information needed for computational assessment of the varying effect of a drug candidate (20). This large number of cell lines increased the confidence in the finding that neuroblastoma RAS viral (v-ras) oncogene homolog (NRAS) mutant cell lines were sensitive to MAPK/ERK kinase (MEK) inhibitors (20).
Taken together, next-generation DNA sequencing and genome-wide expression analysis allow detailed dissection and delineation of context-specific cellular responses at the molecular level (21). These capabilities support discovering new drug targets, repurposing FDA-approved drugs to treat diseases for which they have not been indicated, and focusing development on those drug candidates that have a greater likelihood of showing clinical efficacy at the population level.
Targeting Pathways Common to Multiple Diseases
Several studies have reported analysis of large gene expression data sets focusing either on specific disease areas, such as respiratory diseases (22,23), infectious diseases (24), and cancers (25), or on profiling a specific tissue type, such as peripheral blood mononuclear cells (PBMC) (26) or whole blood (27), across multiple diseases. These studies collectively provide a rich data set for identifying distinct common pathways/gene modules that are shared by pathophysiological processes of multiple diseases. Other approaches integrate both genetics and transcription profiles from crucial tissues to identify the causal genes that are associated with the phenotype of a disease (28,29). These disease phenotype-associated causal genes often reside in the same genetic network. One such network, the macrophage-enriched metabolic network, was found to be highly enriched in the genes that are causal for metabolic syndrome and was identified in both rodent (30) and human (31). These genes identified via disease or tissue specific pathways could be further studied to potentially identify new targets for drug discovery.
Recently, a gene signature composed of ∼2,500 genes was identified in 12 expression profiling data sets derived from 9 different tissues of rodent inflammatory disease models, including ovalbumin-challenged asthma model (lung), IL-1β transgenic emphysema model (lung), TGFβ Tg transgenic pulmonary fibrosis model (lung), high-fat diet-treated ApoE knockout atherosclerosis model (aorta), db/db diabetes model (adipose and islet), ob/ob obesity model (adipose), carrageenan-induced inflammation pain model (skin), Chung neuropathic pain model (dorsal root ganglia), middle cerebral artery occlusion (experimental stroke model, brain), LPS-treated acute injury model (liver), and age-related sarcopenia model (muscle) (32). These genes significantly overlapped with the known drug targets and contained co-expressed genes linked to metabolic disorders, infectious diseases, and cancers. A large proportion of the genes in this “inflammatome” are connected in several tissue-specific Bayesian networks built from multiple independent mouse and human cohorts. Both the “inflammatome” signature and the corresponding consensus Bayesian network were highly enriched in immune response-related genes which have been found causal for adiposity, adipokine, diabetes, aortic lesion, bone, muscle, and cholesterol traits, thereby supporting the causal nature of the “inflammatome” signature. A further integrated analysis with multiple Bayesian networks highlighted 151 key regulators potentially and biologically relevant to several disease phenotypes (32). Hematopoietic cell kinase (Hck), one of the key regulators identified, for example, has been shown to be associated with chronic obstructive pulmonary disease (33) and with a poor outcome of chronic myeloid leukemia (34). Tyrobp/Dap12 contains an immunoreceptor tyrosine-based activation motif which is a key regulator implicated in presenile dementia with bone cysts and in a cognitive disorder Nasu–Hakola disease (35,36).
There are common molecular characteristics and pathways shared by these various diseases, indicating that there are potential targets for developing individual drugs possibly useful for treating more than one disease (37).
Avoiding Toxicological Pathways
Gene expression profiling, due to its capacity to detect comprehensive transcriptomic alterations in the target cells or tissues, has been used to de-risk therapeutic agents under development in all major drug categories including small molecules, biologics, and small interfering RNA (siRNA) as exemplified in the succeeding paragraphs.
An analysis of liver transcriptomes led to the identification of several key cellular pathways affected by ritonavir, an HIV protease inhibitor (PI) (38). The results were then compared to a gene expression compendium from 52 unrelated compounds and to other PIs, including atazanavir and 2 experimental HIV PIs. This comparison showed that the key biological pathways associated with the ritonavir signature genes were cholesterol and fatty acid biosynthesis. Ritonavir reportedly also upregulated the ubiquitin proteasome system (UPS), including multiple proteasomal subunit transcripts and genes involved in ubiquitination (39). The established association between proteasomal induction and lipid elevations was then applied to screen for novel PIs that do not induce the UPS (40), in hopes of avoiding the unwanted lipid elevations associated with the earlier approved PIs.
IL-13 shows direct actions on lung epithelial and smooth muscle cells (41) and is implicated in airway hyperreactivity; thus, the IL-13 pathways provide an attractive target for drug development for asthma treatments. Two types of signaling IL-4/IL-13 receptors have been characterized. IL-4Rα chain and the common γ (γC) chain constitute the type I receptor, which is utilized solely by IL-4 and is expressed primarily in the lymphoid cells, while IL-4Rα and IL-13Rα1 form the type II receptor, which is shared by both IL-4 and IL-13 and is ubiquitously expressed (42). The blocking of IL-13Rα1 provides an advantage in the initial differentiation of CD4 T cells into Th2 cells, and subsequently, the IL-4 signaling through the type I receptor will not be impacted. Under this notion, three humanized anti-IL13Rα1 mAbs with affinity maturation were developed and gene expression profiling was conducted in a primary normal human dermal fibroblast (NHDF) cell line. IL-13 generated a robust and consistent signature in the NHDF line, and all three humanized anti-IL13Rα1 mAbs significantly inhibited the signature. An IL-13 activity index ranked the relative potency of the overall inhibitory effect of each mAb. When a specific cutoff was applied to the number of signature genes generated by each antibody alone in the absence of IL-13 (i.e., off-target effect), the same rank order as the activity index was observed, suggesting that the top-ranked mAb would provide the most favorable safety profile since it induced the fewest potentially off-target genes.
RNA interference (RNAi) is a gene regulatory pathway which can be employed to effectively knock down any target gene and is currently being developed into potential novel therapies (43). One of the most common issues with RNAi therapeutics is off-target effects, which could lead to adverse events (AEs) in the clinic (44). Many strategies have been adopted to improve RNAi specificity in order to reduce off-target gene expression and to reduce immune stimulation. For example, 2′-O-methyl ribosyl substitution at position 2 in the guide siRNA strand and structurally asymmetric siRNA design could be adopted to achieve improved siRNA specificity; in addition, Fucini et al. showed that 2′-fluoro modification of adenosine significantly reduced cytokine induction by siRNA in human PBMC (45, 46, 47). Gene expression profiling has been the most widely used technology to monitor RNAi-induced off-target effects. Jackson et al. were the first group to apply genome-wide expression profiling to assess the specificity of siRNA knockdown in cultured human cells and discovered that off-target silencing could occur in genes containing as few as 11 identical contiguous nucleotides to the siRNA (48,49).
TRANSITION TO EARLY CLINICAL DEVELOPMENT
Gene expression profiling can provide a scientific bridge between cellular transcriptomic characteristics and clinical phenotypes following treatment with a drug. For example, increased expression of a suspected immunosuppressive gene signature, regulated by the nuclear factor of activated T cells, in transplant patients receiving cyclosporine was associated with recurrent infection and development of skin cancer (50,51). Surrogate tissues, such as blood, skin, or hair follicles, are used to understand the on-target and off-target effects in less accessible organs (52, 53, 54), such as the kidney or liver. In patients treated for advanced renal cancer, a set of gene transcripts observed in PBMC was associated with the cumulative exposure to a drug (55). The changes in the expression of Ki-67 (proliferation-related Ki-67 antigen), phospho-S6 (phosphorylated S6 ribosomal protein), cyclin D1, and progesterone receptor signatures in breast cancer tissues were shown to be useful pharmacodynamic biomarkers that were associated with fewer events in estrogen receptor-positive patients treated with everolimus combined with letrozole as compared to everolimus alone (56). These results suggested that the PI3K/Akt/mTOR pathway plays a key role in patients’ response to anti-endocrine therapy (56, 57, 58). These examples demonstrate potential applications of gene transcript signatures to support molecular-level understanding and prediction of clinical responses. At present, however, gene expression profiles cannot quantitatively and predictively link the cellular response to patients’ in vivo responses for an appropriate choice of dose without translational functional/response studies in animals.
Disease Molecular Signatures
One common practice is to use animal models to predict human responses in the clinic; gene expression profiling is one tool that can be leveraged to associate the changes in animals with the human clinical outcome. The gene expression changes can be tested in either direction. Either the disease biomarker of transcriptomic nature is first modeled in animal models and then tested for consistency in humans or findings in humans can be extrapolated back into the animal models since the preclinical setting typically requires a shorter time frame, costs less, and can help differentiate backup compounds. Transcriptomic analysis can also be used in a longitudinal manner for understanding the progression of a disease or changes in treatment. Gene expression can be useful for understanding diseases; for example, exploratory studies include mRNA expression profiling for diabetic nephropathy using urinary pellets (59), hepatic tissues from subjects with normal liver and with alcoholic hepatitis (60), and cartilage/synovium from osteoarthritic patients and animal models as detailed in the succeeding paragraphs.
Osteoarthritis (OA), a disease associated with reduced synovial joint function and increased pain, afflicts greater than 30 million individuals (61). No consistently effective method exists for preventing OA or halting its progression despite the available clinical treatments. There are several potential biomarkers associated with the clinical progression of OA, including the detection of proteolytic products of cartilage matrix components (62) and gross changes in the structure and content of articular cartilage, subchondral bone, synovial membrane, joint ligaments, and tendons (63). Due to the slow and intermittent progression of this disease, there is a critical need to identify biomarkers that determine the OA disease course and predict its rate of progression. Since human samples are relatively inaccessible, an integrated analysis was performed on the OA-related samples derived from four species (64), including human OA knee cartilage, cartilage from the mouse STR/Ort model, cartilage from the rat anterior cruciate ligament (ACL) transection model, and synovium from the dog ACL model. Approximately 3,000 cartilage signature genes were identified in human OA samples which were in common with at least one preclinical species. Annotation of the upregulated common signature genes pointed to the pathways related to skeletal development, extracellular matrix–receptor interaction, focal adhesion, phosphate transport, and blood vessel development. Additional analyses were performed between human OA cartilage and six mouse inflammatory disease models. By focusing on the human OA-specific genes, a set of potential OA biomarkers, including asporin (ASPN), gremlin 1 (GREM1), and matrilin 3 (MATN3), were identified. The literature confirms the biological relevance of the previously identified human OA-specific gene signature. ASPN has been shown to be highly expressed in tenocyte, synoviocyte, and chondrocyte. 
An aspartic acid repeat polymorphism in the promoter of the ASPN gene inhibited chondrogenesis induced by TGFβ and increased susceptibility to OA (65). Skeletal overexpression of GREM1 impaired bone formation and caused osteopenia (66). MATN3 is also highly expressed in chondrocytes, and a sequence variant of MATN3 is a risk factor for OA (67). In addition, knockout of MATN3 in mice caused premature maturation of chondrocytes to hypertrophy and increased bone mineral density and OA (68). This gene expression signature could be a potential OA biomarker for identifying future drug candidates to treat OA, but additional research is needed to demonstrate its clinical utility.
There have been several studies employing gene expression technologies to understand drug-induced adverse reactions (10,69,70). Multiple types of data are often needed to pinpoint the mechanism of action involved in a clinical drug-induced adverse reaction. Both gene expression patterns of myeloma plasma cells and single-nucleotide polymorphisms from each patient were used to compare and contrast early-onset and late-onset neuropathy after the administration of bortezomib or vincristine (69). The genes associated with late-onset peripheral neuropathy differed from those associated with early-onset peripheral neuropathy after the administration of bortezomib; genes associated with late-onset neuropathy also differed between bortezomib and vincristine treatments. The genes responsible for the absorption, distribution, and metabolism of vincristine seemed to be associated with its treatment-related neuropathy. Despite the compelling results, these gene-expression signatures and genetic data cannot completely explain the drug-induced adverse reactions. Carfilzomib is the second proteasome inhibitor developed in its class; when compared to bortezomib, carfilzomib treatment did not cause neuropathy as frequently or as severely (70). Though HtrA2/Omi was upregulated by both drugs, it was inhibited only by bortezomib. HtrA2/Omi is a mitochondrial serine protease involved in mitochondrial homeostasis (71). This example illustrates the complexity of gene expression: perturbation of a gene (upregulation or downregulation) by a drug does not necessarily reflect any direct interaction (positive or negative) between the drug and the gene. The observed differential inhibition by bortezomib indicates that integrating the genomic, transcriptomic, and biochemical information with the safety phenotypic data for comparisons between treatments is important to clearly delineate and precisely pinpoint the true mechanism underlying severe adverse reactions associated with treatment administration.
Vaccination is one of the most effective methods for controlling infectious diseases. Typically, laborious antibody titer measurements and T cell response assays are used to evaluate the efficacy of vaccines. As for vaccine safety, conventional animal toxicity tests, which assess development-, reproduction-, and immunogenicity-associated safety issues, rely on repeated dosing and monitoring of animal weight change; these tests are costly and time-consuming. A more rapid and precise assessment of vaccine efficacy and safety could therefore provide a market advantage in developing vaccines. Several reports have been published in the past few years using gene expression profiling technology to evaluate vaccine safety. In general, the gene expression results were consistent with the degree of toxic effects observed in more traditional assays, such as the abnormal toxicity test and the leukopenic toxicity test (72, 73, 74). More recently, the systems vaccinology approach has used genome-wide gene expression to characterize the host responses to vaccination (75, 76, 77, 78). In these studies, blood signature genes associated with B cell or T cell response were flagged as potential biomarkers to help differentiate vaccine efficacy or immunogenicity.
For assessing vaccine safety or reactogenicity, additional analysis methodologies, such as gene module approach (26) and metagene model (79), were employed to characterize the vaccine-modulated blood signature genes. Signature genes obtained were then annotated by pathway analysis tools (80). The gene module approach (26) was developed to generate gene expression fingerprints which provide a stable framework for the visualization and functional annotation of blood gene expression results. Essentially, that framework was derived from gene expression profiles generated using the Affymetrix GeneChips (>44,000 probe sets) in 241 PBMC patient samples with 8 diseases (systemic juvenile idiopathic arthritis, systemic lupus erythematosus, type I diabetes, metastatic melanoma, Escherichia coli infection, Staphylococcus aureus infection, influenza A infection, and liver transplant recipients). The co-expressed transcripts were segregated into 28 modules by k-means clustering, and each module contained between 22 and 322 transcripts. The genes within the majority of modules were associated with a particular cell type, biological pathway, or process (26). A module scoring algorithm was developed to obtain module fingerprints with easily distinguishable module scores which allowed association with clinical measurements, such as antibody titer (immunogenicity) or adverse reaction (reactogenicity). When the association analysis for a pilot study was performed between the reactogenicity scores derived from 7 marketed or experimental vaccines (Adacel, Menactra, Havrix, Prevnar, RabAvert, and Merck’s V512/influenza and MRKAd5gag/HIV vaccines) and the 28 blood gene modules (26), the modules identified to be significantly associated with the severity of AEs included one module containing multiple interferon-inducible genes and immune-related transcription factors. 
The finding for interferon regulatory factor-1 (IRF1), a key transcription factor regulating the cellular interferon response, is consistent with a literature report by Reif et al. (81), in which smallpox vaccine-associated adverse effects in healthy, vaccinia virus-naive adult volunteers were shown to be associated with two single-nucleotide polymorphisms in the IRF1 gene. Evidence at the levels of both gene expression and genetic mutation is thus mutually supportive for understanding the role of the IRF1 gene in AEs associated with vaccines.
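The module-level scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions: the score is taken to be the percentage of a module's transcripts that are upregulated minus the percentage downregulated, and the module names, transcript IDs, threshold, and values are all hypothetical. The published algorithm (26) may differ in its details.

```python
def module_scores(modules, log_ratios, threshold=1.0):
    """Return, per module, (% transcripts up) - (% transcripts down).

    modules: dict module name -> list of transcript IDs
    log_ratios: dict transcript ID -> log2 ratio versus baseline
    threshold: absolute log2 ratio needed to count as changed (assumed value)
    """
    scores = {}
    for name, transcripts in modules.items():
        measured = [t for t in transcripts if t in log_ratios]
        if not measured:
            continue  # skip modules with no measured transcripts
        up = sum(1 for t in measured if log_ratios[t] >= threshold)
        down = sum(1 for t in measured if log_ratios[t] <= -threshold)
        scores[name] = 100.0 * (up - down) / len(measured)
    return scores


# Hypothetical modules and post-vaccination log2 ratios (made-up values).
modules = {
    "interferon": ["T1", "T2", "T3", "T4"],
    "b_cell": ["T5", "T6"],
}
post_vaccination = {"T1": 2.1, "T2": 1.4, "T3": 0.2, "T4": 1.0,
                    "T5": -1.5, "T6": 0.1}
fingerprint = module_scores(modules, post_vaccination)
```

A per-subject fingerprint of this kind reduces thousands of transcripts to a few dozen module scores, which can then be tested for association with a clinical measurement such as antibody titer or a reactogenicity score.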
LATE-STAGE DRUG DEVELOPMENT AND BEDSIDE APPLICATION
As illustrated by the many examples above, genome-wide expression analysis is useful for depicting the biological networks that could be acted upon by a drug candidate; this technology can disclose information beyond the traditional quantitative structure–activity relationship methods. A recent gene array study indicated that although anthracyclines and etoposide are both known DNA topoisomerase II inhibitors, anthracyclines, but not etoposide, could also act as global transcription repressors (82). Coadministration of transcriptional repressors was shown to counteract the pharmacological actions of drugs that increased the expression of the proapoptotic protein. As a result, the investigators proposed that bortezomib should not be coadministered with anthracyclines or other transcriptional repressors. Given that these drugs are being used together clinically, leveraging the biological knowledge gained from genome-wide expression analysis at the bedside would help avoid cancelling efficacy as a result of drug–drug interaction and exposing patients to toxicity.
A comprehensive integrative personal omics profile (iPOP) combining genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles, as described previously (83), could become more routine in the future with advances in next-generation sequencing technology. However, due to the cost and the large amount of data to analyze, the iPOP approach will remain a research tool for quite some time.
DATA ANALYSIS CHALLENGES
Drug Discovery and Development
Gene array technologies demand the integration of a substantial amount of dynamic data across time with very limited and valuable samples; the ability to perform a robust statistical analysis to answer the study question and to translate a gene-expression signature to a relevant clinical endpoint continues to remain very challenging.
Applications of gene-expression signatures change as the signature moves from discovery to use in the clinic per the fit-for-purpose framework. In the discovery stage, there is much more flexibility in the statistical analyses and study design. Though clustering is a common method, other methods of statistical analysis are also used, and many of the analysis methods used in this space can lead to overfitting. Understanding the limitations of how vastly different data sets are combined can help drive the choice of statistical methodology. As described previously in many of the examples, separation of conditions (disease versus no disease or treated versus untreated) aids in the discovery of relevant adverse experiences or of subgroups with increased efficacy; the cleaner that divide, the easier it is to pick up a “true” signal. In addition, the multiplicity adjustment can be relaxed to allow more false positives into consideration, to be filtered by biological interpretation on the back end. Limited or absent biological knowledge, however, also needs to be considered in assessing the molecular phenotypes.
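As one illustration of what relaxing the multiplicity adjustment can mean in practice, the sketch below applies the Benjamini-Hochberg false discovery rate procedure at two thresholds. The choice of this procedure, the p-values, and the thresholds are assumptions made for the example; the text above does not prescribe a specific adjustment method.

```python
def benjamini_hochberg(p_values, fdr):
    """Return sorted indices of hypotheses rejected at the given FDR level."""
    m = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene p-values from a differential expression test.
p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60]
strict = benjamini_hochberg(p, fdr=0.05)   # confirmatory-style threshold
relaxed = benjamini_hochberg(p, fdr=0.10)  # discovery-style threshold
```

With these made-up values, the stricter threshold retains two genes and the relaxed threshold retains four; the additional candidates would then be kept or discarded based on biological interpretation and replication.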
As the gene expression signatures are significantly qualified in scope for clinical use, for example, stratification onto treatment, the statistical analyses become much more straightforward and clinical trial statistics can be applied with more rigorous multiplicity adjustments. One real challenge is the development of a companion diagnostic in a timely fashion that would enable bedside use given the fine balance among discovery, replication, and determination of the threshold levels that accompany the assay development.
In summary, there exists no one solution, statistical methodology, or paradigm to move from discovery to clinical use in this space given the high dimensional nature of the data which quickly overwhelms the much smaller sample size. Biological knowledge is also changing and evolving as rapidly as technology. So, as exemplified by many of the examples described above, the ideal path is to (1) focus the study question, (2) cast a wider net at first by integrating many of the technologies with phenotypes and then (3) tighten that net in terms of the set giving the most signal with biological relevance, and (4) replicate findings.
Identifying Clinical Molecular Signatures for Complex Diseases
Building clinically relevant molecular signatures that can be used for the diagnosis, prognosis, and management of complex diseases is key to personalized medicine. Data analysis is critical to the development of clinically robust molecular signatures (97). Less stringency is often applied when analyzing molecular profiling data for exploratory purposes than for clinical use, for example, for treatment assignment. Whereas suboptimal data analysis protocols (e.g., biased, underpowered, leading to redundant biomarkers, etc.) may be tolerated in exploratory research and hypothesis generation, they are not acceptable for clinical use. Clinical-grade molecular signatures are typically subject to very stringent requirements and may even have a separate regulatory path, depending on their use in the trial. Closing the gap between the standards for exploratory versus clinically successful molecular signatures entails a thorough understanding of possible biases in the data analysis phase and the development of strategies to avoid them. Some pitfalls of data analyses for identifying gene expression-based predictive signatures include (1) using unsupervised methods (e.g., clustering) (98); (2) biasing signature accuracy estimation by conducting supervised gene selection both on training and testing data (98); (3) failing to identify predictive signal because of the lack of power in the study or by conducting gene selection for a different phenotype (99); (4) selecting an arbitrary molecular signature without accounting for other equally predictive coexisting molecular signatures (i.e., the phenomenon of “molecular signature multiplicity”) or without testing for statistical redundancy of molecular signatures; and (5) not accounting for the effect of normalization and/or other data preprocessing on selecting molecular signatures, on building of predictive models, and on estimating their accuracy (100).
Concerted efforts to establish agreed-upon guidelines for the development and validation of molecular signatures (101), together with greater standardization of software, are needed to avoid the above biases. The most important gap is the replication of significant associations, as described in the fit-for-purpose model.
Gene expression profiling and many of the emerging molecular profiling technologies have become an integral part of drug discovery/development and even of personalized patient care. At present, gene expression profiling is used most notably in oncology, in attempts to tailor treatment regimens to tumor subtypes. The challenge lies in developing a companion diagnostic quickly enough to enable use at the bedside, given the fine balance among discovery, replication, and determination of the threshold levels that accompany assay development. The opportune applications of a gene-expression signature change as the signature moves from discovery to use in the clinic, per the fit-for-purpose framework. Gene array technologies enable the integration of a substantial amount of dynamic data across time; however, performing a robust statistical analysis that answers the study question and translating a gene-expression signature into a relevant clinical endpoint remain very challenging.
Alexander Statnikov was supported in part by NIH/NLM grant 1 R01 LM011179-01.
- 5. Connectivity Map. http://www.broadinstitute.org/cmap/. Accessed July 2012.
- 6. Developing Orphan Products: FDA and Rare Disease Day. http://www.fda.gov/ForIndustry/DevelopingProductsforRareDiseasesConditions/ucm239698.htm. Accessed July 2012.
- 13. KEGG (Kyoto Encyclopedia of Genes and Genomes). http://www.genome.jp/kegg. Accessed July 2011.
- 14. Reactome. http://www.reactome.org/ReactomeGWT/entrypoint.html. Accessed January 2012.
- 16. Alexopoulos LG, Saez-Rodriguez J, Cosgrove BD, Lauffenburger DA, Sorger PK. Networks inferred from biochemical data reveal profound differences in toll-like receptor and inflammatory signaling between normal and transformed hepatocytes. Mol Cell Proteomics. 2010;9(9):1849–65. doi: 10.1074/mcp.M110.000406.
- 27. Banchereau R, Jordan-Villegas A, Ardura M, Mejias A, Baldwin N, Xu H, et al. Host immune transcriptional profiles reflect the variability in clinical disease manifestations in patients with Staphylococcus aureus infections. PLoS One. 2012;7(4):e34390. doi: 10.1371/journal.pone.0034390.
- 28. Puig O, Wang IM, Cheng P, Zhou P, Roy S, Cully D, et al. Transcriptome profiling and network analysis of genetically hypertensive mice identifies potential pharmacological targets of hypertension. Physiol Genomics. 2010;42A(1):24–32. doi: 10.1152/physiolgenomics.00010.2010.
- 52. Locatelli G, Bosotti R, Ciomei M, Brasca MG, Calogero R, Mercurio C, et al. Transcriptional analysis of an E2F gene signature as a biomarker of activity of the cyclin-dependent kinase inhibitor PHA-793887 in tumor and skin biopsies from a phase I clinical study. Mol Cancer Ther. 2010;9(5):1265–73. doi: 10.1158/1535-7163.MCT-09-1163.
- 54. Berkofsky-Fessler W, Nguyen TQ, Delmar P, Molnos J, Kanwal C, DePinto W, et al. Preclinical biomarkers for a cyclin-dependent kinase inhibitor translate to candidate pharmacodynamic biomarkers in phase I patients. Mol Cancer Ther. 2009;8(9):2517–25. doi: 10.1158/1535-7163.MCT-09-0083.
- 55. Boni JP, Leister C, Bender G, Fitzpatrick V, Twine N, Stover J, et al. Population pharmacokinetics of CCI-779: correlations to safety and pharmacogenomic responses in patients with advanced renal cancer. Clin Pharmacol Ther. 2005;77(1):76–89. doi: 10.1016/j.clpt.2004.08.025.
- 56. Baselga J, Semiglazov V, van Dam P, Manikhas A, Bellet M, Mayordomo J, et al. Phase II randomized study of neoadjuvant everolimus plus letrozole compared with placebo plus letrozole in patients with estrogen receptor-positive breast cancer. J Clin Oncol. 2009;27(16):2630–7. doi: 10.1200/JCO.2008.18.8391.
- 60. Affo S, Dominguez M, Lozano JJ, Sancho-Bru P, Rodrigo-Torres D, Morales-Ibanez O, et al. Transcriptome analysis identifies TNF superfamily receptors as potential therapeutic targets in alcoholic hepatitis. Gut. 2012. doi: 10.1136/gutjnl-2011-301146.
- 67. Pullig O, Tagariello A, Schweizer A, Swoboda B, Schaller P, Winterpacht A. MATN3 (matrilin-3) sequence variation (pT303M) is a risk factor for osteoarthritis of the CMC1 joint of the hand, but not for knee osteoarthritis. Ann Rheum Dis. 2007;66(2):279–80. doi: 10.1136/ard.2006.058263.
- 68. van der Weyden L, Wei L, Luo J, Yang X, Birk DE, Adams DJ, et al. Functional knockout of the matrilin-3 gene causes premature chondrocyte maturation to hypertrophy and increases bone mineral density and osteoarthritis. Am J Pathol. 2006;169(2):515–27. doi: 10.2353/ajpath.2006.050981.
- 69. Broyl A, Corthals SL, Jongen JL, van der Holt B, Kuiper R, de Knegt Y, et al. Mechanisms of peripheral neuropathy associated with bortezomib and vincristine in patients with newly diagnosed multiple myeloma: a prospective analysis of data from the HOVON-65/GMMG-HD4 trial. Lancet Oncol. 2010;11(11):1057–65. doi: 10.1016/S1470-2045(10)-0.
- 77. Palermo RE, Patterson LJ, Aicher LD, Korth MJ, Robert-Guroff M, Katze MG. Genomic analysis reveals pre- and postchallenge differences in a rhesus macaque AIDS vaccine trial: insights into mechanisms of vaccine efficacy. J Virol. 2011;85(2):1099–116. doi: 10.1128/JVI.01522-10.
- 78. Balas C, Kennel A, Deauvieau F, Sodoyer R, Arnaud-Barbe N, Lang J, et al. Different innate signatures induced in human monocyte-derived dendritic cells by wild-type dengue 3 virus, attenuated but reactogenic dengue 3 vaccine virus, or attenuated nonreactogenic dengue 1–4 vaccine virus strains. J Infect Dis. 2011;203(1):103–8. doi: 10.1093/infdis/jiq022.
- 85. Becker H, Marcucci G, Maharry K, Radmacher MD, Mrozek K, Margeson D, et al. Favorable prognostic impact of NPM1 mutations in older patients with cytogenetically normal de novo acute myeloid leukemia and associated gene- and microRNA-expression signatures: a Cancer and Leukemia Group B study. J Clin Oncol. 2010;28(4):596–604. doi: 10.1200/JCO.2009.25.1496.
- 88. Jonsson G, Staaf J, Vallon-Christersson J, Ringner M, Holm K, Hegardt C, et al. Genomic subtypes of breast cancer identified by array-comparative genomic hybridization display distinct molecular and clinical characteristics. Breast Cancer Res. 2010;12(3):R42. doi: 10.1186/bcr2596.
- 94. PharmGKB. The Pharmacogenomics Knowledgebase. http://www.pharmgkb.org. Accessed July 2011.