The term “proteome,” coined in 1995, is analogous to “genome,” and was initially used to describe “the entire complement of proteins expressed by a genome, cell, tissue or organism” [1]. The human genome contains about 23,000 protein-coding genes [2], but because of the occurrence of alternative splicing, the proteome is much larger, and probably consists of more than 100,000 distinct polypeptides [3]. The prevalence of post-translational modifications contributes additional diversity. Proteomic analysis is more challenging than genomic analysis, but is also more rewarding, because it captures regulatory effects at all levels of gene expression (i.e., transcriptional, translational, and post-translational). The goal of proteomics, in most cases, is the discovery of protein biomarkers, which are signatures of physiological or disease state. These can be used, alone or in combination, for screening and diagnosis, establishment of individual prognosis, prediction of individual response to therapy, and monitoring of disease progression [4, 5].

This review is in three parts. First, we present an overview of current proteomic technologies. Proteins and peptides are much more chemically diverse than nucleic acids, and the technologies required for proteome analysis are correspondingly more complex. It is helpful to introduce terminology and current technical approaches before considering clinical studies in detail. Second, we describe the use of proteomic technologies in GI oncology. This compilation is based on a MEDLINE search of the literature through August 2008. In addition to summarizing our results in table form, we provide an overview and briefly summarize a few notable findings in the main text. Third, we discuss prospects for application of proteomic findings in GI oncology, including the limitations of current studies and a discussion of steps that are needed to advance the field to the next level.

Proteomic Technologies

Figure 1 provides an overview of currently available analytical strategies, including types of samples and profiling methods.

Fig. 1

Strategies for proteomic analysis of clinical samples. Samples may include serum, other body fluids, or tissue. Profiling may be antibody-based or MS-based. A variety of labeling and protein separation techniques may be used prior to MS. Top-down and bottom-up approaches differ in the order in which steps are performed. In many proteomic studies, key findings are validated by independent means (see text for details and definition of additional terms)

Types of Samples

Samples used for clinical proteomics come from three sources: serum, other accessible body fluids (e.g., saliva, gastric juice, pancreatic juice, or bile), and tissue. Serum is advantageous for screening and early detection of disease because collection is minimally invasive. It presents challenges for analysis, however, because cancer biomarkers are likely to be much more dilute in serum than in the tissue of origin. An additional complication is that a majority of serum peptides likely represent fragments of larger proteins degraded by various proteases [6]. GI fluids are advantageous because they are relatively organ-specific, and proteins of interest may be present at higher concentrations than those in the serum. Because collection is difficult and somewhat invasive, analysis of GI fluids is an option that is primarily applicable in symptomatic patients. Tissue samples are advantageous because tissue is the ultimate source of biomarkers present in serum and other fluids. Although collection is invasive, tissue extraction provides access to a variety of intracellular regulatory proteins, such as regulatory kinases or transcription factors, which would not routinely be present in serum or GI fluids. Tissue studies thus may provide more insight into disease mechanisms than can be obtained by analysis of samples from non-tissue sources.

Tissue proteomics may use material from bulk dissection or laser capture microdissection (LCM). The latter, illustrated in Fig. 2, allows analysis of specific cell types (e.g., cancer cells free of stroma) [7]. Other methods of sample fractionation have also been used to enrich for cancer cells, for example passage through a narrow gauge needle to detach tumor cells from stroma [8]. Tissue proteomics can also be performed using imaging mass spectrometry (IMS), where tissue sections are analyzed directly by mass spectrometry, circumventing the need for microdissection or protein extraction (recently reviewed in [9]).

Fig. 2

LCM. a Thermoplastic membrane is placed over a tissue section, b infra-red laser pulse is used to heat a 7.5–30 μm diameter spot, briefly melting the membrane and capturing cells of interest. Heating and cooling of the membrane apparently has no adverse effect [7]. c Cells of interest become attached to the membrane and can be lifted from the slide for downstream analysis. d Application of LCM on colonic epithelium and colon cancer tissue slides

Use of archival tissue is complicated by covalent protein modifications introduced by common methods of fixation and staining. Although several recent reports describe analysis of peptides recovered from formalin-fixed, paraffin-embedded tissues [10–13], the most common method of sample preservation for proteomic analysis is freezing, which necessitates dedicated sample collection. Alternative techniques based on alcohol or other chemical fixatives have also shown promise [14, 15].

Profiling Methods

Mass Spectrometry (MS)-Based Profiling

Many clinical studies are exploratory, that is, broad surveys of the proteome without prior knowledge of the proteins of interest. Profiling of tissue extracts can be performed using either a “top-down” or “bottom-up” approach (Fig. 1). Top-down approaches begin with one or more separation steps to resolve individual proteins, or classes of proteins, in a complex mixture. Because intact proteins are physically and chemically diverse, there is no single universally applicable separation method. Two-dimensional gel electrophoresis (2-DE), which separates proteins based on charge in one dimension and size in the other [16], can separate up to 5,000 distinct proteins simultaneously [17]. Proteins may be reacted with fluorescent CyDyes prior to electrophoresis, or the gel may be stained afterward. Limitations of 2-DE are that it cannot be fully automated, and tends not to resolve proteins that are large, hydrophobic, or strongly basic. Liquid chromatography (LC) provides an alternative to 2D gels; although it can be used for intact proteins, it has found wider application in the “bottom-up” approaches discussed below.

Surface adsorption is a specialized separation and concentration method used in surface enhanced laser desorption ionization (SELDI). A protein solution is incubated with a hydrophobic, charged, or other surface that is fabricated as part of a chip that can be introduced directly into a mass spectrometer, producing a complex mass spectrum that provides a “fingerprint” of a physiological state or disease. SELDI is easily applied to large numbers of clinical samples and is most commonly used for serum studies.

In a bottom-up approach, the order of the analytical steps differs [18, 19]. Proteins in the sample mixture are first digested to completion with a site-specific protease. Peptides (rather than intact proteins) are chromatographically separated by high-resolution ion exchange and reverse-phase LC. Products are again analyzed by MS in the final step [20]. Although throughput is limited, bottom-up approaches can identify very large numbers of proteomic features, and they can be applied to proteins that are difficult to solubilize and resolve when intact [21]. Top-down and bottom-up approaches are thus complementary and potentially provide somewhat different information.

Options for final MS analysis are similar in all approaches. A “soft ionization” procedure creates peptide ions in the gas phase, using mild conditions that maintain peptide bonds intact. In matrix-assisted laser desorption ionization (MALDI) (and its specialized variations, SELDI and IMS) a laser pulse is directed at a mixture of protein sample and an organic matrix (Fig. 3a) [22]. With electrospray ionization (ESI), the other common soft ionization method, a protein or peptide solution passes through a heated capillary, spraying droplets of solution into a vacuum chamber containing a strong electric field, where they then evaporate and ionize (Fig. 3b) [23]. The ions are passed through a mass analyzer, which separates them based on mass-to-charge (m/z) ratio.
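The relationship between peptide mass, charge state, and the m/z value reported by the mass analyzer can be illustrated with a short calculation. The Python sketch below assumes positive-mode ionization by protonation; the function names and the example mass are illustrative and do not come from any instrument software.

```python
# Sketch: m/z of a protonated ion, assuming each charge is carried by one added proton.
PROTON_MASS = 1.007276  # Da, mass of a proton

def mz(neutral_mass: float, charge: int) -> float:
    """Return m/z for a molecule of the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

def charge_series(neutral_mass: float, max_charge: int) -> dict:
    """Return {charge: m/z} for charge states 1..max_charge."""
    return {z: mz(neutral_mass, z) for z in range(1, max_charge + 1)}

if __name__ == "__main__":
    # Hypothetical 2,000 Da peptide: one peak at z = 1, several peaks once higher charges occur.
    for z, value in charge_series(2000.0, 3).items():
        print(f"z = {z}: m/z = {value:.4f}")
```

A single charge state per peptide gives one peak, whereas several charge states spread the same peptide over several m/z values, which is one reason spectra containing multiply charged ions are more complex to interpret.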

Fig. 3

MALDI-MS and ESI-MS procedures. a In MALDI-MS, samples are co-crystallized with an organic matrix on a metal target plate. A pulsed laser irradiates the co-crystals, which causes rapid heating and desorption of ions into the gas phase. Ions go through the mass analyzer and the detector registers the numbers of ions at each individual mass-to-charge (m/z) value, then the peptide mass fingerprint is generated. MALDI-MS produces relatively simple spectra composed of ions with unit charge. b In ESI-MS, sample molecules are ionized directly in the analyte solution by passing through a heated capillary device, spraying droplets of solution into a vacuum chamber containing a high-strength electric field. The resulting ions pass through a mass analyzer and detector as in a. ESI-MS produces complex spectra with multiply charged ions

Data is obtained in the form of a mass spectrum—a histogram with ion counts on the vertical axis and m/z on the horizontal axis (Fig. 3). In SELDI, pattern analysis may be applied to detect features of the spectrum that correlate with disease state, even without identification of proteins at the molecular level. In other MS procedures, the goal is molecular identification of proteins present in the sample. This is done by matching a pattern of peptides, or “peptide mass fingerprint,” against a human protein database. Confidence in MS identifications is based on coverage (the fraction of the protein’s total sequence represented among the identified peptides) and statistical criteria particular to the method used.
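Sequence coverage, as defined above, is simple to compute once peptides have been assigned to a protein. The following Python sketch is illustrative only: it assumes trypsin as the site-specific protease (cleavage after lysine or arginine, except before proline), and the protein sequence and function names are hypothetical.

```python
import re

def tryptic_digest(sequence: str) -> list:
    """In silico digest, assuming the common trypsin rule: cleave after K or R unless followed by P."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]

def sequence_coverage(protein: str, peptides: list) -> float:
    """Fraction of residues in `protein` covered by at least one identified peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)

if __name__ == "__main__":
    protein = "AAKRPGGRCC"  # made-up sequence for illustration
    print(tryptic_digest(protein))
    print(sequence_coverage(protein, ["AAK", "CC"]))
```

In practice, database search software compares the masses of such theoretical peptides with the observed peaks; the sketch shows only the digestion rule and the coverage calculation, not the statistical scoring.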

Tandem mass spectrometers, which have more than one mass analyzer connected in series, can perform MS/MS. This involves selection of an ion of interest based on m/z, partial fragmentation at peptide bonds (by collision with an inert gas), and passage of the products through a second mass analyzer, with the resulting fragmentation pattern providing amino acid sequence information for the precursor ion. Partial sequence data obtained by MS/MS provides a further basis for identification [24].
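To show how a fragmentation pattern encodes sequence, the Python sketch below computes theoretical fragment masses for a peptide. It assumes the b- and y-type ions commonly produced by cleavage at peptide bonds, singly charged fragments, and unmodified residues; these assumptions, the monoisotopic masses used, and the function name are ours rather than part of the cited work.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da

def fragment_ions(peptide: str):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    Entry k of each list corresponds to cleavage after residue k + 1, so the
    b-ion list runs b1..b(n-1) and the y-ion list runs y(n-1)..y1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions

if __name__ == "__main__":
    b, y = fragment_ions("GAK")
    print([round(m, 4) for m in b])
    print([round(m, 4) for m in y])
```

Differences between consecutive b ions (or consecutive y ions) equal the mass of a single residue, which is how a ladder of fragment peaks can be read as partial amino acid sequence.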

Imaging MS is performed by first coating a thin (10-μm) tissue section with organic matrix. The section is systematically moved underneath a laser beam and a mass spectrum is collected at each position. Software renders the data as a spatially resolved density map showing relative abundance of peptides or proteins of interest [9, 25].
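The rendering step can be sketched as follows. This Python example assumes that each laser position has already been reduced to a list of (m/z, intensity) peaks; the data structure, the tolerance window, and the function name are hypothetical and are not drawn from any imaging software.

```python
def ion_image(spectra: dict, target_mz: float, tol: float) -> list:
    """Build a 2-D intensity map for one m/z value.

    spectra maps (row, col) grid positions to lists of (mz, intensity) peaks.
    Each pixel is the summed intensity of peaks within +/- tol of target_mz.
    """
    n_rows = max(r for r, _ in spectra) + 1
    n_cols = max(c for _, c in spectra) + 1
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for (r, c), peaks in spectra.items():
        image[r][c] = sum(
            (inten for m, inten in peaks if abs(m - target_mz) <= tol), 0.0
        )
    return image

if __name__ == "__main__":
    # Hypothetical 2 x 2 raster of a tissue section.
    spectra = {
        (0, 0): [(1000.0, 5.0), (1200.0, 1.0)],
        (0, 1): [(1000.2, 3.0)],
        (1, 0): [],
        (1, 1): [(999.9, 2.0), (1000.1, 4.0)],
    }
    for row in ion_image(spectra, target_mz=1000.0, tol=0.25):
        print(row)
```

Repeating the calculation for different m/z values yields one map per peptide or protein of interest, which can then be overlaid on a histological image of the same section.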

Quantification of Protein Abundance

Clinical laboratory studies require quantification of molecular species, rather than simple determination of presence or absence. Neither gel staining nor mass spectrometry provides a good indication of absolute quantity. Quantification thus relies on multiplex analysis, where samples from different sources are differentially labeled, mixed, and subjected to electrophoretic or chromatographic separation. The abundance of each protein or peptide is determined relative to the corresponding feature in the other sample. Clinical samples can be compared directly (e.g., diseased versus normal) or indirectly with reference to an invariant internal standard, consisting of a mixture of samples used in the experiment.

Two-dimensional difference gel electrophoresis (2D-DIGE) is the most common multiplex top-down approach [26, 27] (Fig. 4). Proteins are covalently labeled by reaction of cyanine dyes with cysteine or lysine residues. Spectrally distinct dyes are similar in molecular weight and do not change the protein charge. Thus, the same proteins in different samples, labeled in different colors, migrate to the same position in the gel. For each spot, the ratio of emission at different wavelengths provides a measure of relative abundance [28].
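The ratio calculation can be written out explicitly. The Python sketch below assumes a design in which every gel carries one sample in one dye channel and a pooled internal standard in the other; the spot intensities and function names are invented for illustration.

```python
import math

def log2_internal_ratio(sample_signal: float, standard_signal: float) -> float:
    """log2 of the sample/standard emission ratio for one spot on one gel."""
    if sample_signal <= 0 or standard_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(sample_signal / standard_signal)

def fold_change(gel_a: tuple, gel_b: tuple) -> float:
    """Relative abundance of a spot in sample A versus sample B.

    Each argument is (sample_signal, standard_signal) from a separate gel;
    dividing by the shared internal standard removes gel-to-gel variation.
    """
    return 2 ** (log2_internal_ratio(*gel_a) - log2_internal_ratio(*gel_b))

if __name__ == "__main__":
    cancer_gel = (400.0, 100.0)  # hypothetical spot intensities
    normal_gel = (100.0, 100.0)
    print(fold_change(cancer_gel, normal_gel))
```

Working on a log scale treats increases and decreases symmetrically, which is convenient when ratios from many gels are later compared statistically.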

Fig. 4

Two-dimensional difference gel electrophoresis (2D-DIGE). a Representative gel images of proteins from analysis of a microdissected CRC specimen in our laboratory. Red represents Cy5-labeled sample proteins, and green represents Cy3-labeled pooled internal standard. In the multiplexed image, spots that are more abundant in the sample than in the standard appear red, spots that are less abundant in the sample appear green, and spots that are equal in the sample and the standard appear yellow. b Design of a clinical proteomics experiment. In this example, which is based on analysis of cancer-normal pairs, each patient contributes two samples: cancer and adjacent normal tissue. The number of gels equals the number of samples. For each spot in each gel, the ratio of emission at Cy5 and Cy3 wavelengths is measured. These “internal ratios” are used to compare the relative abundance of a given protein across the different specimens in the experiment

Isotope-coded affinity tag (ICAT) technology is the analogous method for the bottom-up approach. The ICAT reagent combines three moieties: a biotin group, a heavy or light isotope-tagged linker (e.g., containing 2H vs. 1H, or 13C vs. 12C), and a thiol-specific reactive group that reacts with cysteine in the protein sample (Fig. 5a) [29]. Two samples, pre-labeled with heavy- or light-isotope ICAT reagent, are mixed and proteolytically digested (Fig. 5b). Tagged peptides are isolated by avidin affinity chromatography and analyzed by LC-MS [30]. The relative abundance of heavy and light isotope peaks for each peptide provides an accurate measure of the relative abundance of the peptide in different samples. A variation, isotope-coded protein label (ICPL) [31], is based on isotopic labeling of free amino groups in proteins, which are more abundant than thiols. Another variation, isobaric tags for relative and absolute quantification (iTRAQ) allows multiplexing of up to four samples simultaneously [32].
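The pairing of heavy and light peaks can be sketched in a few lines. In the Python example below, the mass difference between the heavy and light tags is passed in as a parameter; the 8.0 Da value used in the demonstration, the peak list, and the function name are assumptions made for illustration, and the sketch assumes one tag per peptide.

```python
def pair_isotope_peaks(peaks: list, mass_shift: float, charge: int = 1, tol: float = 0.01) -> list:
    """Find light/heavy peak pairs and their abundance ratios.

    peaks: list of (mz, intensity) tuples from one spectrum.
    mass_shift: mass difference in Da between heavy and light tags (assumed known).
    Returns a list of (light_mz, heavy_mz, heavy_to_light_ratio).
    """
    delta = mass_shift / charge  # spacing on the m/z axis
    pairs = []
    for light_mz, light_int in peaks:
        if light_int <= 0:
            continue
        for heavy_mz, heavy_int in peaks:
            if abs((heavy_mz - light_mz) - delta) <= tol:
                pairs.append((light_mz, heavy_mz, heavy_int / light_int))
    return pairs

if __name__ == "__main__":
    # Hypothetical spectrum: one labeled pair plus an unrelated peak.
    spectrum = [(500.0, 100.0), (508.0, 50.0), (620.0, 10.0)]
    print(pair_isotope_peaks(spectrum, mass_shift=8.0))
```

A ratio below 1 in this sketch means the peptide was less abundant in the heavy-labeled sample; peaks without a partner at the expected spacing are simply ignored.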

Fig. 5

Schematic illustration of ICAT procedure. a ICAT reagent combines three moieties: a biotin tag, a heavy or light isotope-tagged linker, and a thiol-specific reactive group. b Samples, labeled with heavy- or light-isotope ICAT reagent are mixed and digested. Tagged peptides are isolated by avidin affinity chromatography and analyzed by LC-MS. The relative abundance of heavy and light isotope peaks for each peptide is then measured. Peptides of interest can be identified by MS/MS analysis

Antibody-Based Profiling

In contrast to MS-based methods, antibody-based profiling requires prior knowledge of proteins of interest. Tissue microarrays exemplify a broad class of technologies referred to as protein arrays where proteins or tissue samples are spotted on a surface and probed with antibody (Fig. 6a) [33, 34]. Often used for validation of biomarkers identified in MS-based methods, they have the same advantages and disadvantages as other forms of immunohistochemistry (IHC). Interpretation of staining patterns can be subjective, and quantification is less precise than with other proteomic methods [35].

Fig. 6

Protein microarray technology. a Tissue microarray. Multiple tissue sections (or protein extracts) are spotted onto an array, which is incubated with a specific antibody against the protein of interest. Samples that contain the protein of interest are then detected. b Antibody microarrays: A series of capture molecules (antibodies) are displayed on a slide or membrane that is exposed to analytes (a tissue lysate). The bound proteins are detected by labeled secondary antibodies

In another variation on array technology, a panel of antibodies is spotted on a surface and incubated with a solubilized mixture of proteins. After washing, the protein bound to each spot is quantified using a labeled secondary antibody or reagent (Fig. 6b) [36]. The technology of antibody arrays is just beginning to be applied in GI oncology [37, 38] and holds promise as a method for simultaneous analysis of multiple biomarkers, or “proteomic signatures” in a clinical laboratory setting.

Use of Proteomic Technologies in GI Oncology

Methods

To identify relevant literature, we searched MEDLINE through August 2008 using entry terms including “proteomics,” “biomarker discovery,” “mass spectrometry,” “gastrointestinal tumor,” “serum,” “human tissue,” “gastric juice,” “pancreatic juice,” “bile,” “GI secretions,” “esophageal cancer,” “gastric cancer,” “small intestine tumor,” “colorectal cancer,” “pancreatic cancer,” “hepatocellular carcinoma,” and “cholangiocarcinoma” in different combinations. English-language abstracts of the retrieved articles were reviewed and categorized. In all but a few cases, full articles were obtained and reviewed. Additional citations were obtained from review articles and from the bibliographies of cited references.

Serum Biomarkers

We identified 57 serum-based studies (Table 1). Of these, 54 used MS-based profiling, while three recent studies applied antibody-based profiling [37–39]. All but one of the MS-based studies used a “top-down” strategy, in the majority of cases SELDI-MS (38/54 studies).

Table 1 Serum proteomic surveys relevant to cancer and other diseases of the GI tract

Diseases of the Alimentary Tract

Eight studies analyzed sera from patients with esophageal cancer or related premalignant conditions such as dysplasia or basal cell hyperplasia. In three of these, anonymous SELDI m/z peaks were used in classification algorithms to discriminate between normal or disease states [40–42]. In a fourth SELDI study, m/z peaks were used to distinguish chemoradiation responders from non-responders [43]. Two 2-DE studies identified a small number of serum proteins that differed in pre- and post-surgery patients, with no overlap in the proteins identified in the two reports [44, 45]. Two other studies identified characteristic serum autoantibodies against peroxiredoxin VI and heat-shock protein 70, respectively, as potential diagnostic biomarkers [46, 47].

Five studies of gastric cancer have used the SELDI approach [48–52]. In each case, SELDI identified combinations of m/z peaks that correctly classified most cancer patients versus other subjects. In one study, relevant peaks were identified as stress-related proteins, including heat-shock protein 27, glucose-regulated protein, and protein disulfide isomerase [50]. Levels of these proteins declined following surgery, suggesting that they could be used in surveillance for recurrence [50]. Specificity and sensitivity using SELDI-based biomarkers were higher than those achieved for the same samples using two established markers, carcinoembryonic antigen (CEA) and carbohydrate antigen (CA) 19–9, in combination [52]. A very recent study used antibody microarray technology to explore serum biomarkers of gastric cancer. Serum reactivity with IPO-38 antibody, which is directed against a small nuclear protein (possibly H2B), appeared useful both for diagnosis and for predicting survival in gastric cancer [38].

Ten studies analyzed sera from colorectal cancer (CRC) patients. Nine used SELDI-MS; several showed that SELDI-MS biomarkers compare favorably with established tests, including fecal occult blood or CEA [53], or a triple combination of CEA, CA19–9, and CA 242 [54]. SELDI-MS markers can be used to classify different stages of CRC [55], and to differentiate between good and poor responders to neoadjuvant therapy [56]. As with other diseases, the majority of CRC SELDI-MS studies are based on anonymous peaks, although a few studies report identification of specific proteins associated with classifier peaks, including apolipoproteins A-I and C-I [57], complement C3a des-arg, alpha1-antitrypsin and transferrin [53], and serum amyloid A [58]. A 2-DE serum analysis found 28 spots differentially expressed between cancer and normal, among which clusterin, complement factor I and β-2-glycoprotein I were proposed as a potential panel of CRC biomarkers [59].

Although most studies focus on cancer, three used proteomic methods to identify biomarkers of benign diseases [60–62]. One of these studies showed the ability to distinguish patients with Crohn’s disease or ulcerative colitis from control subjects who were either healthy or suffered from other inflammatory conditions based on four identified classifier proteins [60]. Another showed the ability to identify patients with large colon adenomas based on a set of anonymous m/z peaks [62]. Another used ICAT technology to identify proteins useful for differential diagnosis of familial adenomatous polyposis [61].

Diseases of the Pancreas and Hepatobiliary Tract

Ten studies profiled sera from pancreatic or biliary tract cancer patients. Three SELDI studies identified anonymous m/z peaks that correctly classified most pancreatic cancer [63, 64] or cholangiocarcinoma [65] patients. In one of these studies, SELDI-MS biomarkers, or SELDI-MS biomarkers in combination with CA19-9, were significantly more accurate than CA19-9 alone [63]. Serum CA 19-9 can also be sensitively detected with protein array technology [39]. Other studies, using 2-DE, identified proteins, or in one case autoantibodies, that are differentially present in sera from cancer patients versus control subjects [66–69]. In one of the first applications of antibody microarray to GI oncology, a recent study identified a signature consisting of 21 protein analytes that discriminates between short-surviving (<12 months) and long-surviving pancreatic cancer patients [37].

Twenty studies, primarily using SELDI-MS, characterized changes in the serum proteome of patients with hepatocellular carcinoma (HCC). HCC usually develops following a long history of chronic liver disease, and there is a need for markers of progression to cancer. All SELDI studies identified m/z peaks that accurately classified sera from patients with chronic hepatitis B or C infection, cirrhosis, and HCC (Table 1). Two studies specifically commented on prospects for use of SELDI-MS biomarkers for early detection: Kanmura et al. analyzed sera collected before the diagnosis of HCC by ultrasonography. They demonstrated the ability of SELDI-MS biomarkers to predict the diagnosis of HCC in 6/7 patients before HCC was clinically apparent [70]. Zinkin et al. demonstrated that SELDI-MS biomarkers were more accurate than traditional markers in detecting small HCCs in Hepatitis C patients [71]. A 2-D liquid phase fractionation study, using chromatofocusing (similar to the first dimension of 2-DE but performed in solution) and reverse-phase LC, identified 14 proteins with differential expression in HCC, albeit based on a single patient per group [72]. Another recent study identified a characteristic autoantibody signature in HCC patients [73].

Prospects for Clinical Translation of Serum Biomarkers

When multiple studies of the same disease are compared, a major limitation of serum profiling becomes evident, which is the unsatisfactory reproducibility between studies. The majority of early serum studies used SELDI technology, resulting in identification of anonymous discriminatory m/z peaks. In only a very few cases were the same discriminatory peaks identified. This may well reflect technical differences in sample collection, processing, type of SELDI chip, or other variables. Inconsistency between studies, however, is a major barrier to clinical translation of SELDI biomarkers. In the minority of instances where m/z peaks have been identified at the molecular level, many of them correspond to high abundance, seemingly nonspecific molecules such as stress proteins, clotting factors, and other known serum components. Although tests based on these markers might be clinically useful, it is disappointing that markers have not been identified with a more obvious connection to biological mechanisms of cancer development. One reason for this, suggested by Diamandis [74], is that current SELDI-TOF technology is capable of detecting only those proteins present at a concentration greater than 1 μg/ml, which is approximately 1,000-fold greater than the concentrations of established serum tumor markers (e.g., CEA). Very recently, newer technologies such as ICAT and protein arrays have begun to be applied in serum studies, and it is possible that these may overcome some of the limitations of earlier methodologies.

A potentially difficult issue is that most serum studies relied on patients with advanced disease, where host-tumor (paraneoplastic) interactions are likely to be prominent. Serum biomarkers discovered thus far may not be applicable for early detection of cancer in the general, low-risk population, which is typically a stated goal in serum studies. A more immediate application of serum-based biomarkers may be for differential diagnosis in symptomatic patients or monitoring of disease progression and treatment responses following diagnosis. If issues of standardization and reproducibility can be overcome, accuracy in the various studies cited here (>80% sensitivity and specificity) seems well within the range that would be needed for clinical utility.
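For readers less familiar with these performance measures, sensitivity and specificity follow directly from the counts of correct and incorrect classifications. The Python sketch below uses invented counts and hypothetical function names; it does not reproduce data from any cited study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased subjects that the test correctly flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-diseased subjects that the test correctly clears."""
    return true_neg / (true_neg + false_pos)

def meets_threshold(true_pos: int, false_neg: int, true_neg: int, false_pos: int,
                    minimum: float = 0.80) -> bool:
    """True if both measures exceed `minimum` (0.80 mirrors the >80% figure above)."""
    return (sensitivity(true_pos, false_neg) > minimum
            and specificity(true_neg, false_pos) > minimum)

if __name__ == "__main__":
    # Hypothetical validation set: 50 cancer patients, 50 controls.
    print(sensitivity(45, 5), specificity(42, 8))
    print(meets_threshold(45, 5, 42, 8))
```

Because both measures are estimated from finite patient groups, values obtained in a small discovery cohort can shift considerably when the same classifier is applied to an independent set of samples.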

Biomarkers from GI Secretions

Biological fluids have a special role in proteomics as applied to GI oncology. The GI tract is unique among organ systems because of the amount and type of secretions. The normal adult produces about 7 l of GI fluids daily, including saliva, gastric juice, pancreatic juice, bile, and enteric secretions [75]. These are secreted and reabsorbed in balance. Fluids produced by the GI tract have less-complex compositions than serum, are relatively organ-specific, and are potentially good sources for biomarker discovery.

There have been two “top-down” 2-D gel-based proteomic profiling studies of gastric juice. These reported simple changes in proteomic pattern that differentiate cancer, precancerous conditions and benign disease, including loss of gastric digestive enzymes and appearance of α1-antitrypsin-related proteins [76, 77] (Table 2).

Table 2 Proteomic surveys of GI-associated body fluids

A SELDI-MS study by Rosty et al. [78] dramatically demonstrated the advantages of using pancreatic juice over serum for detection of pancreatic adenocarcinoma markers. They showed that hepatocarcinoma-intestine-pancreas/pancreatitis-associated-protein-1 (HIP/PAP-1) was present at 1,000-fold higher levels in pancreatic juice of cancer patients than in the serum of the same individuals. The fold difference in cancer patients versus other subjects was also much higher in pancreatic juice than in serum. Studies by Chen et al. [79, 80] used “bottom-up” ICAT and tandem MS-based proteomics to compare protein expression in pancreatic juice from cancer, chronic pancreatitis, and normal tissue. They identified 30 proteins specific to cancer, 27 specific to chronic pancreatitis, with nine in common. Three studies using “top-down” gel-based separations identified numerous potential cancer biomarkers, some of which were known and others of which were novel, including a HIP/PAP-1-related protein designated as PAP-2 [81–83].

Three studies characterized bile from patients with cholangiocarcinoma. Kristiansen et al. [84] used lectin chromatography to enrich for proteins of interest and deplete interfering proteins, facilitating analysis of the glycoproteome. Eighty-seven unique proteins were identified and 33 glycosylation sites were found. Two studies [85, 86] analyzed the proteomes of bile fluid from patients with malignant and benign bile tract obstruction using 2-DE; in one study, the pancreatic elastase/amylase ratio was confirmed to be a much more accurate marker than CEA or CA 19-9 [86].

Prospects for Clinical Translation of GI Secretions Biomarkers

Together, studies confirm the promise of GI secretions as a concentrated source of potentially useful biomarkers. Accessibility of these fluids varies, with collection of gastric juice being considerably easier and less invasive than pancreatic juice or bile. Nevertheless, fluids are routinely collected in symptomatic patients and tests based on these fluids may therefore be practical.

Biomarkers from GI Tissue

Tissue biomarkers are useful when a sample of the diseased tissue is available as a result of biopsy or surgical resection. Biomarkers identified by proteomic profiling of tissue have the potential to be useful directly, for example in staging or prediction of response to therapy. Information gleaned from tissue studies also lays a foundation for development of clinical serum tests; for example, if proteomic profiling reveals that a particular protein is present at high concentration in tumor tissue, one might develop a more-sensitive assay (based, for example, on protein-chip or other approaches) to investigate the presence of the protein in serum from cancer patients.

Diseases of the Alimentary Tract

There have been ten studies of esophageal cancer, all using “top-down” analysis. Eight used 2-DE separation, one used chromatofocusing, and one used capillary high-performance LC (Table 3). One of the most comprehensive of all reported proteomic surveys of GI cancers, conducted by Hatakeyama et al. [87], used 2D-DIGE to analyze 129 microdissected tissue specimens, which identified 217 differentially expressed proteins at the molecular level. Thirty-three of these distinguished tumors with and without nodal metastasis. Extensive bioinformatic analysis identified clusters of similarly regulated proteins, and gene ontology analysis showed that differentially regulated proteins had structural, transporter, chaperone, oxidoreduction, transcription, and signal-transduction activities. Zhao et al. [88] performed an interesting comparison of protein and mRNA expression. Of 38 proteins that differed in cancer-metaplasia pairs, mRNA correlated with protein expression changes in some instances but differed markedly in many others, underscoring the value added by proteomic analysis.

Table 3 Proteomic studies of tissues relevant to GI cancers

There have been eight gastric cancer tissue studies, all based on “top-down” analyses. Greengauz-Roberts et al. [89] demonstrated the ability to profile very small amounts of tissue (5 μg protein) using LCM and reported 42 proteins with differential expression in gastric adenocarcinoma versus spasmolytic peptide expressing metaplasia. He et al. [90] identified an 18-kDa antrum mucosa protein that was dramatically down-regulated in cancer tissues and proposed a special role for this protein in pathogenesis of gastric cancer. GI stromal tumor is a rare, non-epithelial malignancy of the GI tract, most commonly occurring in the stomach. Many cases are associated with mutation of the KIT protooncogene or platelet-derived growth factor receptor alpha. Two 2-DE studies identified proteins that were differentially expressed in patients in different mutation classes, or that discriminated patients with poor and good prognoses. Pfetin, a potassium channel protein, was identified as a powerful prognostic marker [91].

Eleven reports describe proteomic profiling of CRC or premalignant adenomas (Table 3). Seven studies using top-down 2-DE approaches identified numerous differentially expressed proteins, including transcription regulators, signal transduction and cytoskeletal proteins, molecular chaperones, protein synthesis factors, metabolic enzymes, apoptosis-associated proteins, and a proteoglycan (mimecan). A study by Pei et al. [92] is notable for identifying four proteins that differed specifically between primary tumors derived from node-positive versus node-negative patients. Two studies applied IMS technology to tissue sections without solubilization or protein-separation steps, an approach that is capable of providing spatially resolved images of in situ protein abundance in tumor areas versus normal areas [93, 94]. Another methodologically interesting study by Madoz-Gurpide et al. [95] selected 29 gene products for detailed investigation based on statistically significant up-regulation at the mRNA level and other criteria. They expressed these gene products in E. coli, prepared antibodies, and tested seven by IHC in a tissue microarray. They confirmed that six (ANXA3, BMP4, LCN2, SPARC, MMP7, and MMP11) were up-regulated at the protein level. Their unique, gene- and antibody-based approach avoids bias against interesting classes of proteins (i.e., very large, hydrophobic, or insoluble) that are readily overlooked in top-down proteomic approaches that rely on 2-DE as a first step.

Diseases of the Pancreas and Hepatobiliary Tract

Pancreatic cancer is second only to CRC as a cause of GI cancer deaths in the US. Unlike CRC, it is almost never detected until it has reached an incurable stage. Better detection, together with insights into disease mechanisms that might lead to better preventive or therapeutic options, would bring a large public health benefit. There have been seven “top-down” 2-DE studies (Table 3). The most comprehensive of these, by Lu et al. [96], identified, at the molecular level, 111 differentially expressed proteins. Proteins in this and other studies have structural, protease, metabolic, immune/inflammatory, transporter, RNA processing, transcription factor, signal transduction, cell adhesion, and other activities; some have been further validated by IHC. Studies by Chen et al. [97, 98] used “bottom-up” ICAT and tandem MS-based proteomics to identify 50 proteins as differentially expressed in cancer and 116 in pancreatitis, with considerable overlap between groups. Finally, a SELDI study [99] identified 33 anonymous m/z peaks that collectively distinguished pancreatic cancer, benign disease, and nonmalignant tissue. The same group applied a similar methodology to cholangiocarcinoma and identified 14 discriminatory, anonymous m/z peaks [65].

HCC is another GI cancer that has been widely studied using tissue profiling. As in the serum studies, progression of HBV- or HCV-related disease to HCC has been the main focus. Of 19 studies, 16 used “top-down” 2-DE approaches, two used a “bottom-up” ICAT approach, and one used direct analysis of tissue slices by SELDI-TOF. Two of the most comprehensive studies combined LCM with ICAT and 2D-LC-MS/MS to compare the proteome of HCC with that of normal liver, identifying 149 differentially expressed proteins in one case and 261 in the other [100, 101]. Blanc et al. [102] identified 155 differentially regulated proteins in a 2-DE study, and Luk et al. [103] identified 90 in another. A 2D-DIGE study identified 127 differentially expressed proteins in cancer, and demonstrated in a validation study that a proteomic signature, based on clathrin heavy chain and formiminotransferase cyclodeaminase, could make substantial contributions to early diagnosis of HCC [104].

A methodologically interesting study by Emadali et al. [105] describes analysis of the hepatic tyrosine phosphoproteome using anti-phosphotyrosine antibodies to enrich for proteins of interest, followed by 1-DE and LC-ESI-MS/MS analysis. Although the study focused on ischemia/reperfusion (I/R) injury, the methodology could be readily extended to HCC. They found that the tyrosine kinase adaptor protein Nck-1 might play a role in I/R-induced actin reorganization.

Identification of Site of Cancer Origin

Pathologists sometimes face the problem of identifying the original site of a metastatic cancer when no primary tumor has been identified. Bloom et al. [106] used 2-DE, MALDI-TOF, and LC-MS/MS to compare the proteomic profiles of 77 histologically similar adenocarcinomas arising from six different sites of origin. Using these data, a neural network could correctly classify a single held-out sample with an average predictive accuracy of 82%. These findings show that proteomic data can be used to construct an accurate classifier for tumors without knowledge of their primary site of origin.
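The hold-one-out evaluation described above can be illustrated with a minimal sketch. It is not the method of Bloom et al.: a simple nearest-centroid rule is swapped in for their neural network, and the feature vectors, the site labels ("site_A", "site_B"), and the function names are hypothetical values chosen only for illustration.

```python
from math import dist


def nearest_centroid_predict(train, query):
    """Return the label whose class centroid lies closest to `query`.

    Stand-in for the neural network used in the cited study (assumption).
    """
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    # Centroid = per-feature mean over all training samples of that label.
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_label.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], query))


def leave_one_out_accuracy(samples):
    """Hold out each sample in turn, train on the rest, score the prediction."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_centroid_predict(train, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: (feature intensities, site of origin).
toy_samples = [
    ([1.0, 0.2], "site_A"), ([0.9, 0.1], "site_A"), ([1.1, 0.3], "site_A"),
    ([0.2, 1.0], "site_B"), ([0.1, 0.9], "site_B"), ([0.3, 1.1], "site_B"),
]

print(leave_one_out_accuracy(toy_samples))
```

Because each sample is scored by a model that never saw it, the resulting fraction estimates performance on new tumors rather than fit to the training set.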

Prospects for Clinical Translation of Tissue Biomarkers

Almost every tissue proteomics study provides quantitative expression values for at least a few hundred “features” (spots on a gel, m/z peaks, or linked sets of peptides identified in bottom-up analysis). Among these, it is easy to find proteins or other features that show significant differential expression characteristic of physiological or disease states. More than ten independent studies have appeared for some tumor types (i.e., CRC and HCC). However, the lists of candidate biomarkers produced in studies of the same tumor type often differ substantially.

Some differences may be attributable to the distinctive patient population seen at individual centers or to the nature of the comparison sample (i.e., patient-matched normal tissue, patients with benign disease, or “healthy” control subjects). There are also sources of variability that are particular to proteomic analysis. Existing proteomic profiling technology samples no more than 1% of the total proteome, and different studies may sample a different 1% depending on details of the methodology. Additionally, in top-down studies, investigators choose only a fraction of the total features as “interesting” enough for molecular identification, and the criteria for selecting these features vary. There are also significant differences in sample preparation. Some studies (14/64) use microdissection or other techniques to enrich for tumor cells, whereas the majority use bulk specimens, in which the tumor-cell-specific proteomic signature may be partially obscured by intermixed host tissue.
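One way to make the degree of agreement between studies concrete is to compute the pairwise overlap of their candidate lists. The sketch below uses the Jaccard index (shared candidates divided by all candidates reported by either study); this metric is our choice for illustration, not one used in the cited studies, and the study names and protein identifiers are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(study_lists):
    """Map each pair of study names to the Jaccard index of their lists."""
    return {
        (s1, s2): jaccard(study_lists[s1], study_lists[s2])
        for s1, s2 in combinations(sorted(study_lists), 2)
    }


# Hypothetical candidate lists; real input would use a shared identifier
# scheme so that the same protein is not counted under two names.
candidate_lists = {
    "study_A": {"P1", "P2", "P3", "P4"},
    "study_B": {"P3", "P4", "P5"},
    "study_C": {"P6", "P7"},
}

for pair, score in pairwise_overlap(candidate_lists).items():
    print(pair, round(score, 2))
```

A low index between two studies of the same tumor type does not by itself show that either list is wrong; as noted above, the studies may simply have sampled different fractions of the proteome.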

Despite these sources of variability, some common themes are evident, notably quantitative changes in cytoskeletal proteins, stress proteins, and enzymes of intermediary metabolism. In addition to common and abundant proteins, most studies report a few proteins that seem “interesting” because they are potentially involved in processes that drive malignancy, rather than simply reflecting the malignant phenotype. These include transcription factors, signal transduction proteins, and tumor suppressors. Both types of markers are potentially valuable. Quantitative changes in abundant proteins may have value in establishing individual prognosis (e.g., changes in cytoskeletal proteins that are predictive of metastatic potential), whereas transcription factors and signaling proteins may provide novel therapeutic targets.

Application of Proteomic Findings in GI Oncology

The number of exploratory proteomic studies in GI oncology is astonishing: more than 130 to date. Given the multiplicity of studies, have we moved closer to the ultimate objectives of proteomic research? None of the discoveries cited here has yet made a significant impact on clinical care. Many barriers to clinical translation are evident: the disconnect inherent in the use of late-stage patients to discover markers of early disease, the lack of overlap in biomarkers identified in different studies of the same disease, and the lack of standardization in sample collection and storage, protein-separation procedures, mass spectrometry, and statistical methodology. Rapid evolution of MS technology is a particularly significant contributor to lack of standardization, because studies performed at different times in different institutions almost inevitably involve different instrumentation.

Some progress has been made towards standardization, at least in the areas of sample preparation and analysis: the Early Detection Research Network of the US National Cancer Institute provides standard operating procedures and assay protocols to researchers via its Web site (http://edrn.nci.nih.gov). At present, however, the content of this resource remains limited.

Our review of published proteomic studies in GI oncology has suggested to us several additional steps that would be particularly valuable to the field:

  • It is important that study design and patient selection reflect the intended application of the markers. Markers discovered in tissue-based studies are most likely to be used for establishing individual prognosis or predicting response to therapy when the presence of disease is already known. It is therefore desirable for these studies to focus on classifying disease subsets or establishing molecular correlates of response in existing trials. Markers discovered in serum-based studies are often intended for early diagnosis, in which case it is essential that future studies include subjects who are at risk but have not yet been diagnosed with disease.

  • It would be helpful to incorporate uniformly rigorous statistical criteria in both design and analysis. There is wide variation in the types and sophistication of statistical analysis employed in proteomic studies. It appears, in many cases, that study size is based on availability of samples or other resources, rather than on explicit statistical reasoning. The use of “fold-change” remains prevalent as a criterion for ranking candidate biomarkers, although there is seldom explicit justification.

  • It would be fruitful to maintain an up-to-date and searchable index of the lists of biomarkers obtained in different studies. Proteomic studies generate vast amounts of data; even a small study with 10–20 patients can generate tens of thousands of protein abundance values. In this review, we have relied primarily on the authors’ own assessments of their findings, as a comprehensive re-analysis of all of the primary data in the cited studies is beyond our scope.

  • Finally, it is essential that future studies focus not only on identifying the disease-associated alterations in proteins but also on determining the cellular functions of the proteins identified, as well as the mechanistic networks in which they participate. The biomarkers identified experimentally should serve as entry points for investigating the mechanisms of carcinogenesis and tumor progression.
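To illustrate the concern raised above about ranking candidates by fold-change alone, the following minimal sketch compares a log2 fold-change with a Welch t statistic for two hypothetical features. The abundance values and function names are assumptions made for illustration, and the t statistic is only one of several criteria that could be used; it is not a recommendation drawn from the cited studies.

```python
from math import log2, sqrt
from statistics import mean, variance


def log2_fold_change(case, control):
    """log2 of the ratio of mean abundances (case over control)."""
    return log2(mean(case) / mean(control))


def welch_t(case, control):
    """Welch t statistic: mean difference scaled by its standard error."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


# Hypothetical abundance values, given as (case, control) per feature.
noisy = ([1.0, 9.0, 2.0], [1.0, 2.0, 3.0])   # large ratio, driven by one sample
steady = ([3.0, 3.1, 2.9], [2.0, 2.1, 1.9])  # smaller ratio, highly consistent

for name, (case, control) in (("noisy", noisy), ("steady", steady)):
    print(name, log2_fold_change(case, control), welch_t(case, control))
```

In this toy example, the "noisy" feature ranks first by fold-change, whereas the "steady" feature ranks first once within-group variability and sample size are taken into account.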

Despite the existing barriers to clinical translation, it is important not to lose sight of the ultimate promise of clinical proteomics in GI medicine. Standard diagnostic procedures for GI diseases are largely based on clinical data in combination with endoscopy, imaging examination, histopathology, and immunohistology. Yet, we often observe that individual patients sharing the same type of disease, with the same histopathologic diagnosis, at an identical stage, end up with different clinical outcomes with respect to survival and treatment response. This indicates that each patient’s disease may have a unique constellation of molecular derangements [107].

In the future, clinical proteomics may provide a rational basis for individualized therapy. Patients with a GI malignancy could be identified early by screening serum or other GI fluids. A tissue biopsy could then be analyzed for a proteomic signature to establish prognosis, to select the best targets for individualized therapy, and to predict therapeutic responses and toxicities [108]. A recurrence could be detected early by serum analysis, providing an opportunity to alter the therapeutic regimen. Using these new tools, GI malignancies could become manageable chronic diseases [109].