Gastrointestinal (GI) cancers affect approximately 2.5 million people worldwide per annum, with close to half this number dying each year.1 Treatment is multimodal, and various regimens of surgery, chemotherapy, and radiotherapy have been evaluated for best response guided by clinicopathological staging at the time of diagnosis. Apart from early-stage tumors where surgery alone may be curative, in most instances combination therapy has been shown to prolong survival. A number of studies have demonstrated that patients with locally advanced GI cancers benefit from neoadjuvant chemotherapy with or without radiation prior to surgical resection.2–4 This approach aims to downstage bulky tumors to improve resectability and treat potential metastatic disease.

Traditionally, selection of patients for neoadjuvant therapy is made on pretreatment stage alone, and although many patients respond well using this “one size fits all” approach, tumor response to treatment is neither uniform nor predictable. Pathological complete response (pCR), defined as complete absence of residual tumor in the resected specimen, is a good predictor of improved long-term outcome.5,6 The rate of pCR to neoadjuvant therapy in patients with GI cancers ranges from 10% to 30%.6,7 Unfortunately, a similar proportion of tumors fail to respond or even progress despite multimodality therapy. These patients derive no benefit from the additional chemotherapy or chemoradiotherapy (CRT) and yet are exposed to the potentially toxic side effects of this treatment as well as having their surgery delayed. Historically, clinicians have relied on real-time monitoring of the patient to track response and adjust treatment accordingly. The ability to predict response in advance of treatment would have a major impact on the clinical management of patients with GI tumors.

Sensitivity to treatment is dependent on many factors including drug metabolism, DNA repair, and cell death, all of which can have unique differences from patient to patient because of their underlying genetic makeup. While cancer is a highly complex disease, it is conceivable that common features may be shared between tumors that behave in a similar fashion in response to CRT. Thus, as we enter the era of personalized medicine, the potential for using pharmacogenomic approaches to identify, in advance, the probability of a patient responding to treatment holds great promise. This will enable more informed decision-making regarding the most appropriate therapeutic approach for each patient, leading to improved rates of response and survival.

Predicting the outcome of CRT in an accurate and reliable manner is an area of great research interest, and a number of different approaches have been investigated in the past. Methods include assessing clinical data, histological markers, serum analysis, DNA markers, miRNAs, and epigenetic modifications associated with tumor behavior.8–14 However, these approaches are still in the early stages of development and many have yielded conflicting evidence. As a result, they are yet to make the transition to routine clinical use, potentially because many of these markers only consider 1 or a few aspects of tumor biology. The advent of microarray technology, which enables the simultaneous assessment of the expression levels of thousands of genes, holds a great deal of promise in identifying genetic signatures associated with treatment response and overall prognosis. However, to date, a verified gene list capable of predicting outcomes in various GI cancers remains elusive.

In this review we address the promise, and the perils, of using gene expression profiling for predicting response and prognosis in primary esophageal, gastric, rectal, and anal cancers.

The Promise of Gene Expression Profiling

Microarray-based prediction has been used successfully to design tests to manage breast cancer treatment with the development of MammaPrint and OncoType DX, 2 commercially available platforms for breast cancer prognostics. These assays are excellent examples of microarray research providing clinically important information. MammaPrint, a custom array that predicts risk of metastasis in lymph node negative patients, was developed following an initial study profiling 117 tumor samples. The 70 gene prognostic signature generated was then validated in a further 450 patients.15–17 OncoType DX is a prognostic real-time PCR test for recurrence, overall survival, and chemotherapy benefit in node negative, estrogen receptor positive breast cancer patients. This assay was developed by selecting a pool of candidate genes from previous microarray studies. A panel of 21 genes was then used to generate a recurrence score from gene expression analysis of 668 tumors.18 Again, this signature was carefully validated in a number of independent investigations.19,20 The MammaPrint platform has now entered the MINDACT clinical trial, while the performance of OncoType DX is being assessed by the TAILORx trial. The success of these tests has provided “proof-of-principle” that has encouraged the search for microarray-based predictive approaches in other cancer types.

Predictive Microarray Studies in GI Cancer

Numerous studies have used microarray technology to investigate the predictive potential of the transcriptome in GI cancer (Table 1). Each study has generated classifiers capable of high predictive accuracy; however, little overlap is observed between these gene lists. When gene lists predicting similar outcomes for the same tissue type are carefully compared, only a handful of identical matches are evident. For example, in 3 similar publications investigating response to CRT in esophageal cancer only 2 genes, protein phosphatase 2 and ZW10 interactor, appear in more than 1 list.21–23

Table 1 Predictive microarray studies in GI cancers

Searching for common genes in previously published data highlights that to date findings are conflicting, with studies investigating the same tumor type often failing to agree on genes influencing response and prognosis. This is due to a number of inherent problems associated with predictive medicine and a microarray-based approach. In order for this field to progress, confidence in these classifiers must be established through repeated validation. It is clear that with careful consideration these issues can be overcome. Below we address some of the concerns surrounding microarray experiments and what will be required to enable accurate prediction of patient response and prognosis in GI cancer.

Patient and Treatment Heterogeneity

Lack of independent validation between studies and discrepancies between gene lists can be attributed to many factors, the first being heterogeneous sample populations. Different patient demographics and varied treatment regimes introduce variation into predictive classifiers. Each cohort exhibits differences in age, race, and gender that impact the predictive profile identified. In addition, modern treatment modalities often involve combinations of therapies. Consequently, while some studies investigate outcome following radiotherapy alone, others use treatment approaches consisting of various combinations of chemotherapeutic agents with or without radiotherapy. It is probable that these various treatments act via diverse mechanisms, and the effects of multidrug combinations are likely to be complex. Thus, comparing and/or combining expression data from different treatment regimes may produce results that are difficult to interpret and may limit the ability to generate a robust predictive algorithm. For example, a recent study predicting response to CRT in rectal cancer obtained a gene list capable of 87% correct prediction in a test set.32 However, a mixed chemotherapy regime was used in this cohort including combinations of fluorouracil and leucovorin, irinotecan, and capecitabine. While the predictive accuracy of this classifier is strong, given the potential differences in metabolism and mode of action of these drugs, predictive power might be improved by comparing gene expression differences between patients undergoing a single treatment regime. To ensure the identification of robust predictive classifiers, the number of extraneous variables should be limited as much as possible.

Source of Predictive Tissue

Tumors are varied in their composition and highly heterogeneous for stromal and cancerous components. It can be argued that as long as the gene list extracted is predictive the exact source or tumor content should be of no concern, but variations in tumor composition may affect the reproducibility of the data. Microdissection is often used to ensure a pure tumor cell population can be profiled. However, reports are emerging demonstrating that stromal signatures are capable of indicating chemosensitivity, recurrence, and outcome.38 Thus, the predictive potential of the stromal compartment should not be ignored.

While the majority of studies attempting to elucidate predictive profiles simply determine tumor expression profiles, some have analyzed other tissues to identify patterns of expression associated with outcome. Recent evidence suggests that profiling non-neoplastic tissue can correctly predict clinical outcome in colorectal cancer patients more than 80% of the time.27,39 These studies raise the possibility that prognostic markers are present in normal tissues, consistent with the idea that response to treatment is an inherent characteristic of the individual rather than a property of the tumor itself. This possibility requires further investigation because of limited numbers and the lack of validation in the current studies.

Heterogeneity due to the existence of cancer subtypes must also be considered when attempting class prediction. Results may be confounded by genetically distinct tumor types. For example, although anal cancers are considered a single tumor type, microarray profiling has revealed the existence of 2 distinct molecular subtypes.37 In addition, predictive profiling of esophageal tumors revealed that while a predictive classifier could be generated for squamous cell carcinomas the same could not be achieved for adenocarcinomas, suggesting that these 2 subtypes behave very differently.21 These findings highlight that, in order to create robust predictors, the heterogeneity of the cancer itself must be taken into account and different cancer subgroups analyzed separately.

Sample Size and Statistical Analysis

Predictive microarray studies are often logistically challenging in addition to being costly in terms of time and money. These factors unfortunately often result in limited sample size that in turn jeopardizes statistical power. When considering patient numbers compared with the thousands of genes on array platforms, it is evident that robust statistical analysis is essential. The problem with a large number of variables in microarray experiments is that current bioinformatic approaches can inadvertently overestimate predictive accuracy. Classical statistical methods are designed for large patient numbers and a small number of variables, whereas microarray experiments have an overwhelming number of variables and a small number of patients. If classical statistical methods are used with a P value threshold of .05 and no correction for multiple testing, approximately 5% of genes with no true association can be expected to be identified as predictive through random chance alone. Many current studies are therefore subject to high false discovery rates, meaning that the predictive classifiers identified contain a high proportion of genes falsely associated with response or prognosis. Increasing cohort size considerably will limit the generation of gene lists riddled with false positives.
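
The scale of this multiple-testing problem can be illustrated with a minimal sketch. It assumes a hypothetical platform of 20,000 genes, none of which is truly associated with response, and applies the Benjamini-Hochberg procedure as one commonly used false discovery rate correction. Neither the gene count nor the choice of correction is taken from the studies reviewed here, and the function names are illustrative only.

```python
import random


def expected_false_positives(n_genes, alpha):
    """Expected number of null genes passing an uncorrected P value threshold."""
    return n_genes * alpha


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Rank hypotheses from the smallest to the largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P value falls under its own threshold.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def simulate_null_hits(n_genes, alpha, seed=0):
    """Count 'significant' genes when every gene is null (uniform P values)."""
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_genes)]
    uncorrected = sum(p < alpha for p in p_values)
    corrected = len(benjamini_hochberg(p_values, fdr=alpha))
    return uncorrected, corrected


if __name__ == "__main__":
    n_genes, alpha = 20_000, 0.05  # hypothetical platform size and threshold
    print("Expected chance hits:", expected_false_positives(n_genes, alpha))
    uncorrected, corrected = simulate_null_hits(n_genes, alpha)
    print("Simulated uncorrected hits:", uncorrected)
    print("Hits after Benjamini-Hochberg:", corrected)
```

With no true signal present, the uncorrected threshold is expected to flag on the order of 1,000 of the 20,000 hypothetical genes, whereas a false discovery rate correction should reject few or none. In a real dataset the correction would be applied to P values from the actual expression comparison rather than to simulated ones.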

The number of genes in the classifier must also be optimized to ensure a stable and highly predictive test. A gene list that is too small or too large will have limited predictive power. Focusing on 1 or just a few genes may result in less robust prediction in further prospective studies, depending on their true significance. Meanwhile, larger classifiers are highly specific for the cohort they were derived from, and many genes will not retain their predictive power when applied to different sets of patients.

Concerns have been raised regarding the true predictive power of microarray studies, with some suggesting microarray studies merely identify meaningless, clinically irrelevant gene lists. Michiels et al. analyzed the gene signatures from 7 large high-impact microarray studies and found that 5 of the 7 studies essentially could not predict classification better than chance alone and cautioned against overly optimistic interpretation of data.40 Analysis and overmanipulation of data can generate questionable results and artefacts. Therefore, it is essential that such studies are designed and carried out thoughtfully to gain the most appropriate and relevant information. In addition, it should be noted that MammaPrint and OncoType DX have yet to prove their worth. While these platforms have been carefully validated, it is still unclear how well they are actually performing and how far they are influencing treatment decisions and patient outcome.41

Independent Validation

While numerous studies have used microarray profiling to identify predictive classifiers associated with response and prognosis in GI cancer, the differences in the genes identified by each study are cause for concern. The reliability of microarray data will remain questionable until these profiles are independently confirmed. Without such confirmation, it is hard to argue that the gene lists are predictive.

Fortunately, independent validation has been achieved in some cases. Lin et al. independently profiled 149 colorectal tumors from New Zealand on oligonucleotide arrays and a second cohort of 55 colorectal tumors from Germany on Affymetrix U133A chips.29 Classifiers built from clinical and array data were used to predict recurrence. The classifier based on the New Zealand samples had a predictive power of 77%, while the classifier derived from the German samples was capable of 84% correct prediction. Importantly, predictive power was retained when the classifiers were exchanged and each was tested on the other cohort’s data, even though the 2 datasets were generated on different platforms. These findings are extremely encouraging and demonstrate that microarray data holds robust predictive information.

Gene lists extracted by each array study will also obviously differ depending on the platform used. A clear source for differences between predictive gene lists is the fact that some platforms consider completely different genes to others, with little or no overlap. While some platforms offer comprehensive coverage of the transcriptome with tens of thousands of genes represented, other in-house arrays are more selective with only a few hundred genes represented.25,26

Definition and Clinical Value of Response

Attempts to stratify GI cancer patients in terms of response to neoadjuvant therapy have been problematic. The discrepancy between predictive gene lists can, to some extent, be attributed to the definition of response. How response is best defined is highly contentious and significantly impacts the gene lists identified during analysis. Some studies classify patients as “responder” versus “nonresponder,” while others compare “partial” versus “complete” responders. However, robust markers of response may be lost if a complete responder is compared with a partial responder since a partial response to treatment may simply reflect a larger starting size of the tumor or a slower rate of response. Certainly emerging evidence suggests that the most important factor in achieving pCR is an increased interval between neoadjuvant therapy and time to surgery.42

Adding further confusion to defining response is the way in which it is measured. The RECIST and WHO criteria for measuring response in solid tumors clearly define complete and partial response, but they are based on assessing the reduction in tumor volume rather than the residual cancer remaining at the tumor site that many consider a better measure of response.43,44 Assessment of residual tumor often uses tumor regression grade, a pathologic grading system measuring the reaction of the tumor to treatment, while others consider differences in T staging or Dukes stage pretreatment and posttreatment. However, while residual tumor is almost certainly a more definitive measure of likely outcome, is an 80% reduction in a large tumor really less of a response to the therapy than complete regression of a small tumor?

There is also some debate regarding the clinical application of a predictive test, and this can impact the way in which response is defined. One obvious application is to enable patients to avoid enduring the unnecessary side effects of an ultimately ineffectual treatment by identifying those who are unlikely to benefit from the therapy. In this case, a predictive test might seek to distinguish between “nonresponder” and “responder.” However, an alternative application is to identify those patients who are likely to achieve a complete response from their neoadjuvant therapy and thus might not require surgery.45 In this scenario, the critical cutoff would be between “partial” and “complete” responders. Clearly, the ultimate aim of the predictive test and how clinical management might be altered by the result must be considered when deciding how to define response and when assessing the success of a predictive test.

Retrospective Versus Prospective Analysis

The majority of studies published to date have been retrospective and represent an important step in developing a clinically relevant test. While retrospective studies are required for initially building the predictive classifier, the true test of class prediction power requires the ability of a classifier to predict response or outcome in prospectively collected samples. Some studies have used the approach of a training and test set that essentially involves both retrospective and prospective samples; however, numbers are quite limited. For example, 1 study investigating response to neoadjuvant therapy in colorectal cancer patients used a cohort of 52 patients to produce a gene list that predicted response with 82.4% accuracy in a test set of 17 further patients.31 The next phase of research requires much larger numbers and prospective analysis to confirm and validate findings from this and other preliminary studies.
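
The limitation of small test sets can be made concrete with a short worked example. The sketch below computes a Wilson score interval for the accuracy reported in the 17-patient test set. It assumes that 82.4% corresponds to 14 of 17 patients correctly classified, which is inferred from the percentages and not stated in the text above, and the Wilson method is simply one reasonable choice of interval for a small sample.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Assumption: 82.4% accuracy in 17 patients means 14 correct predictions.
    correct, total = 14, 17
    low, high = wilson_interval(correct, total)
    print(f"Observed accuracy: {correct / total:.1%}")
    print(f"Approximate 95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval spans roughly 59% to 94%, so a test set of this size cannot distinguish a modestly useful classifier from an excellent one. The same calculation with ten times as many patients at the same observed accuracy gives a much narrower interval.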

Gene Lists Versus Pathway Analysis

Microarrays are a powerful tool for expanding knowledge in predictive medicine and can help narrow research focus to genes with the most significant roles. However, the large amounts of data generated from microarray experiments can be overwhelming and, as mentioned previously, gene list instability contributes to a lack of confidence in data. Indeed, it may be a case of “not seeing the forest for the trees.” That is, it is possible that focusing on a set of the most significant genes in a dataset may be rather uninformative, while identifying pathways rather than gene lists might provide greater insight into the biology of response.

Pathway analysis is emerging as an important tool, where predictive trends in the data in terms of affected pathways can help elucidate the underlying biological mechanisms of response and survival. For example, a recent publication that profiled 301 gastric tumors found that the proliferation/stem cell, NF-κB and Wnt/β-catenin pathways were altered in the majority of gastric cancers, and increased activation of the NF-κB and proliferation/stem cell pathways was associated with reduced survival.46 Furthermore, pathway analysis (using Ingenuity Pathway Analysis software, Redwood City, CA) of the gene lists from the 4 response prediction studies in esophageal cancer demonstrates that the NF-κB signaling network is a strong feature in 3 of these signatures.21–24 Similarly, in 3 studies attempting to predict response of rectal tumors to CRT the generated gene lists are completely unique, with no shared genes, but pathway analysis reveals that the TNF signaling network is a common feature of all 3 classifiers (Brettingham-Moore, manuscript submitted).32,34,36 Since NF-κB is activated downstream of TNF, these results are consistent with the TNF/NF-κB molecular pathway having an important role in determining the sensitivity and/or resistance of tumor cells to CRT, and thus this pathway may be a potential target for novel therapeutic interventions aimed at increasing the sensitivity of tumors to treatment.

Lab to Clinic Transition for Predictive Arrays

Microarray-based research is helping to uncover the underlying genetic basis responsible for mediating patient response and prognosis; however, there is a long way to go. Predictive classifiers may not provide a definitive answer in terms of a patient’s response or prognosis, but will ultimately provide valuable additional information. Predictive technology will enable patients and oncologists to assess the likelihood that certain therapies will be beneficial and determine when and how to modify treatment options. More informed decision-making will ultimately enable increased rates of response and survival.

Unfortunately, a number of concerns have been raised regarding microarray studies attempting to classify GI cancer patients in terms of response or prognosis. The most obvious is the lack of independent validation between studies, and this must be addressed to enable this research to move forward. The main focus of future research in this area should be increasing participant numbers. This will require inter-institutional collaborations in which predictive classifiers generated by 1 study can be tested on an independent set of samples by another group. In the end, many combinations of gene lists may be predictive, and for translation to a clinically reliable test it may simply be a matter of picking 1 set or the most significant pathway and standardizing the test.

Currently, the use of predictive arrays in GI cancer lags behind their use in breast cancer. This is likely due to a number of factors, but perhaps first and foremost is the significantly greater volume of research into predictive medicine in breast cancer. Compared with breast cancer, upper GI cancers are much less common, and ongoing controversies regarding optimal multimodality therapy for locally advanced cancers further reduce the opportunities for large studies. Colonic cancers are more prevalent, but resection remains the main initial treatment and thus predicting response to therapy is less relevant. Furthermore, while distinct molecular subtypes in breast cancer are well defined, the identification of molecular subtypes within GI cancers is less developed. Better characterization of genetic subtypes of GI cancers may reduce the biological variation in expression arrays and allow the generation of more robust predictive signatures for individual tumor subtypes.

Conclusion

Predictive genomics in GI cancer treatment is getting closer to reaching the stage of clinical trials, but it is not quite there yet. The next few years will prove to be critical in terms of conducting high-quality studies capable of confident and accurate prediction. Future studies must strengthen and build upon the previous reports in order to reach the point of changing clinical practice. The final hurdle remaining is the convincing validation of predictive classifiers with prospective studies utilizing sufficient patient numbers. Only then will it be possible to use predictive genomics routinely to tailor patient treatment.