Using genomic approaches to understand cancer progression

The accumulation of genetic alterations is thought to drive the progression of normal cells through hyperplastic and dysplastic stages to invasive cancer and, finally, metastatic disease. Since the initial efforts to link histopathological changes to the mutation of specific genes in colorectal cancer [1], progression models have been developed for many tumor types, including lung, breast, head and neck, and prostate [2–5]. Mutational and gene expression analysis of known tumor suppressors and oncogenes in the context of early tumorigenesis has provided insight into the role of these genes in cancer progression [6, 7]. Gene discovery has been greatly facilitated by molecular cytogenetic technologies that identify chromosomal regions associated with various stages and outcomes. Furthermore, high-throughput, genome-wide approaches and the complete sequencing of the human genome have accelerated the large-scale discovery of cancer-related genes and pathways [8].

While genetic alterations in tumors are common, changes found in premalignant stages are more likely to represent causal events initiating and promoting cancer development. These events may be masked by the complex pattern of genetic alterations often associated with genetic instability in later stages of disease. For this reason all stages of progression have to be considered in order to fully understand how malignant tissues develop. To date, genomic and proteomic efforts have been primarily directed at the study of tumors. The relatively limited literature on genetic studies of earlier stage cancers is attributable to challenges associated with accessing premalignant specimens and the fact that genome-wide analysis would require quantities of material far exceeding the size of the minute specimens obtained. Recent advances in cell isolation techniques and miniaturization of genomic technologies have enabled comprehensive molecular profiling of selected cell types and high resolution mapping of gene disruption associated with specific disease phenotypes. This review article describes the current genomic technologies used for analysis of cancer, the model systems used to corroborate the significance of candidate cancer genes and pathways, and the genetic progression models for common types of cancer.

Tissue heterogeneity

Tumors and precancerous lesions are heterogeneous cell populations harboring normal stromal and inflammatory cells. The presence of these cells could mask the detection of genetic and gene expression alterations in the cancer cells. The development of laser-assisted microdissection techniques addresses this problem by enabling selective isolation of cell populations, for example normal epithelium and hyperplastic cells [9–11] (Fig. 1).

Figure 1

Laser capture microdissection of prostate tissue sections. Panel A shows a hematoxylin and eosin (H&E)-stained prostate section. The black arrow indicates stromal cells. Red arrows indicate epithelial cells. Panel B shows the laser outline of the cells to be collected. Panel C shows the remaining cells after laser capture. Panel D shows the cells collected from the outlined area. Images provided by Dr. J. R. Vielkind.

Three commonly used microdissection techniques are laser capture microdissection (LCM), laser microbeam microdissection (LMM), and laser pressure catapult (LPC) [11]. LCM involves the capture of cells by adhering them to a thermoplastic membrane activated by a near-infrared low power laser [10]. The relatively low intensity of the laser does not damage DNA, RNA, or proteins in the captured cells, while the remaining tissue section is left intact on the glass slide [11]. LMM uses a focused laser beam to cut out target cells and to photoablate unwanted adjacent tissue [12]. LMM is often used in conjunction with LPC, a technique that involves the build up of laser-generated high-photon density under a given specimen, causing the selected cells of interest to catapult up along the path of the beam and become available for collection [13].

Identification of genetic alterations

Current methods for genome-wide detection of genetic alterations fall into three main categories: (1) molecular cytogenetic evaluation of chromosomal aberrations and re-arrangements, (2) DNA polymorphism analysis for detecting loss of heterozygosity (LOH) or allelic imbalance, and (3) comparative genomic hybridization (CGH) approaches for identifying segmental copy number changes.

Molecular cytogenetics

Cytogenetic approaches are designed to detect aberrations and rearrangements under direct examination of chromosomes and chromosomal targets. G-banding, fluorescence in situ hybridization (FISH), and spectral karyotyping (SKY) are the commonly used methods [14, 15]. G-banding is often used in clinical settings for the analysis of leukaemia and is best suited to detect large chromosomal aberrations, namely structural or numeric changes [16]. This method evaluates stained metaphase chromosome spreads to identify rearrangements and gain or loss of chromosome bands. One of the most comprehensive databases of cytogenetic information for various tumor types is the Mitelman Database of Chromosome Aberrations in Cancer [17]. This and other cytogenetic databases are listed in Table 1.

Table 1 On-line Resources

FISH has helped bridge the gap between molecular genetics and classical cytogenetics. This technology uses specific DNA probes of known chromosomal location to evaluate alterations at a specific locus on a cell-by-cell basis (Fig 2a) [18]. Gain, loss, and splitting of hybridization signals on metaphase or interphase chromosomes reflect duplication, deletion, and translocation events respectively [19]. FISH is useful in fine mapping genetic alterations in very small specimens such as premalignant lesions since it does not require microdissection. With the development of fluorochromes that fluoresce at different wavelengths, multicolor FISH (M-FISH) has enabled the examination of multiple loci in the same experiment [20].

Figure 2

Cytogenetic analysis. Panel A shows an image of fluorescence in situ hybridization. The metaphase chromosome spread is hybridized with two locus-specific probes labeled with FITC (green) and SpectrumRed (red). Panel B shows spectral karyotype (SKY) analysis of immortalized human prostate epithelial cells. The line was derived from cells previously described [18]. Images provided by Dr. J. Squire.

SKY uses 24 different probe sets to virtually paint each chromosome a different color. This technique involves the simultaneous excitation of multiple fluorochromes and the use of an interferometer to determine the profile at each pixel [15] (Fig 2b).

Although whole genome cytogenetic techniques are limited to the identification of intrachromosomal rearrangements and breakpoint determination, they have been the preferred techniques for detailed karyotypic assessment of structural chromosome aberrations [15].

Assessing LOH using polymorphic markers

Microsatellite analysis uses simple sequence repeat (SSR) polymorphisms as markers for detecting LOH. A polymerase chain reaction (PCR) using primers flanking a repeat should yield two signals corresponding to the two heterozygous alleles. When the signal intensity ratio of the tumor alleles differs from that of the normal alleles, allelic imbalance or LOH is inferred. An example of mapping of LOH at the chromosome scale was the use of 28 markers spanning chromosome 3p to determine three distinct regions of alteration in non-small cell lung cancer [21]. In addition, microsatellite analysis is commonly used for fine mapping minimal regions of LOH. However, this approach is limited by the availability of polymorphic SSR markers in the chromosomal regions of interest. For microdissected, minute premalignant specimens, DNA yield is an additional limitation since each marker requires at least 5 nanograms of DNA per assay [22]. Therefore, although whole-genome allelotyping has been applied to early stage cancer [23–26], efforts have been largely focused on tumors and cell lines where material is not limiting.
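The comparison of tumor and normal allele signals described above can be sketched as a simple ratio calculation. The following is a minimal illustration, not a published scoring scheme: the function names, the example peak heights, and the cutoffs of 0.5 and 2.0 are all assumptions chosen for demonstration.

```python
def allelic_imbalance_ratio(normal_peaks, tumor_peaks):
    """Return (T1/T2) / (N1/N2) for one heterozygous microsatellite marker.

    Each argument is a pair of signal intensities for the two alleles.
    A value near 1.0 means the tumor retains the allele balance seen in
    the matched normal sample.
    """
    n1, n2 = normal_peaks
    t1, t2 = tumor_peaks
    if min(n1, n2, t1, t2) <= 0:
        raise ValueError("peak intensities must be positive")
    return (t1 / t2) / (n1 / n2)


def call_loh(ratio, lower=0.5, upper=2.0):
    """Flag allelic imbalance when the ratio leaves [lower, upper].

    The cutoffs are illustrative assumptions, not values from the text.
    """
    return ratio < lower or ratio > upper


# Hypothetical peak heights (arbitrary units) for the two alleles of one marker.
normal = (1000.0, 900.0)
tumor = (950.0, 300.0)
ratio = allelic_imbalance_ratio(normal, tumor)
print(round(ratio, 2), call_loh(ratio))
```

In practice the cutoffs would need to be calibrated against replicate assays, since residual normal cells in a specimen pull the ratio back toward 1.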

Single nucleotide polymorphisms (SNPs) are another source of DNA markers used in identifying LOH. SNPs are common in the human genome and in some instances their variation can be correlated to disease behaviour [27, 28]. The throughput of this approach is greatly enhanced by parallel analysis of multiple loci on microarrays. For example, GeneChip® arrays from Affymetrix® have enabled simultaneous tracking of approximately 1,500 SNPs [29]. The large number of SNPs examined compensates for the fact that not all loci will be informative (heterozygous). The recently released "Mapping 10K Array", which tracks more than 10,000 SNPs distributed throughout the genome, should increase the information content of an array hybridization experiment.

Unlike microsatellite or SNP analyses, amplified fragment length polymorphism (AFLP)-based approaches require no previous knowledge of polymorphisms. Fingerprinting techniques such as random amplification of polymorphic DNA (RAPD) or arbitrarily primed PCR (AP-PCR) use short primers of 10 to 20 nucleotides to amplify multiple fragments randomly distributed throughout the genome (Fig 3). The PCR products are then separated by electrophoresis to display up to dozens of anonymous DNA polymorphisms [30–32]. These techniques have been applied to a variety of tumor types to study genomic instability, to identify novel DNA amplifications and deletions, and to assess changes in methylation state [33–42]. The recently developed methylation-sensitive AFLP (MS-AFLP) technology allows for an unbiased assessment of epigenetic changes in a subset of methylation sites throughout the genome [43, 44]. However, RAPD patterns have not yet been widely used to predict prognosis.

Figure 3

Schematic representation of randomly amplified polymorphic DNA PCR analysis of tumor DNA. Panel A shows a chromosome with multiple binding sites for selected primers (red arrows). Panel B is a representation of a given locus in six different patients. Patient 1 contains both primer sites, patients 2, 4, and 5 contain neither site, and patients 3 and 6 contain only one of the primer sites. The solid line represents normal chromosome, while the dotted line represents regions of chromosomal loss. Panel C displays the DNA fingerprints for paired normal and tumor DNA from these six individuals. The blue bands correspond to fragments indicated in Panel B.

Comparative genomic hybridization

Comparative genomic hybridization (CGH) detects segmental DNA copy number changes. Differentially labeled tumor DNA and control normal DNA are co-hybridized to a metaphase chromosome spread, producing an average fluorescence ratio profile at approximately 20 Mbp resolution [45]. Copy number changes in a variety of cancers and, to a lesser extent, premalignant lesions have been detected using this method [46–56]. While CGH provides a profile of the entire genome, its resolution is limited, making it difficult to identify the specific genes altered. CGH is therefore often used in conjunction with FISH to fine map alterations to the gene level. As CGH has become more widely used, profile databases have been assembled for public access (see Table 1).

Array-based CGH

Until recently, localized deletion mapping using microsatellite markers has represented the highest resolution method available to identify potential tumor suppressor genes. However, new approaches based on the use of genomic microarrays have been developed. To achieve higher resolution, Pollack et al. made use of cDNA microarrays for analyzing genomic DNA-derived probes [57, 58]. However, this approach is hampered by suboptimal hybridization, which arises because the genomic DNA probe contains introns that are absent from the spotted cDNA target. As mentioned above, the recent development of SNP arrays has greatly facilitated deletion detection, though the resolution of SNP arrays is currently limited to approximately 10,000 SNPs, and only a subset of these loci will be informative (heterozygous) in any given sample. Another technology called representational oligonucleotide microarray analysis (ROMA) provides a means of detecting genetic alterations in cancer tissue using a high density oligonucleotide array to profile subtractive hybridization products generated through representational differential analysis [59, 60].

Complementary to these array-based CGH techniques, bacterial artificial chromosome (BAC) array CGH allows the detection of segmental copy number changes [45, 61]. BAC array CGH is similar to conventional chromosomal CGH except that it uses segments of human DNA as hybridization targets instead of a metaphase spread of chromosomes [45, 61, 62] (Fig 4). Hybridization onto such arrays overcomes the low resolution that limits conventional CGH. As with conventional CGH, total genomic DNA from a tumor and a normal cell population are differentially labeled and co-hybridized onto an array. The ratio of the fluorescence intensities on each DNA spot on the array is proportional to the copy number of the corresponding sequence.
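The relationship between spot fluorescence ratio and copy number can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the intensities are invented, the function names are hypothetical, and the cutoff of ±0.3 on the log2 scale is an arbitrary illustrative value rather than a recommended one.

```python
import math


def log2_ratios(test_intensity, reference_intensity):
    """Per-spot log2(test / reference) for matched lists of spot intensities."""
    return [math.log2(t / r) for t, r in zip(test_intensity, reference_intensity)]


def call_copy_number(log_ratio, threshold=0.3):
    """Classify one spot; the threshold is an illustrative assumption."""
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "no change"


# Hypothetical background-subtracted intensities for four arrayed clones.
test = [2000.0, 1000.0, 500.0, 1050.0]        # tumor (test) channel
reference = [1000.0, 1000.0, 1000.0, 1000.0]  # normal (reference) channel

ratios = log2_ratios(test, reference)
calls = [call_copy_number(x) for x in ratios]
print(list(zip([round(x, 2) for x in ratios], calls)))
```

For an idealized pure diploid sample, a single-copy gain corresponds to a ratio of 3/2 (log2 of about 0.58) and a single-copy loss to 1/2 (log2 of -1); contaminating normal cells compress both values toward zero.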

Figure 4

Principle of array CGH. This figure shows the steps in BAC array CGH. (A) BAC clones are selected from a physical map of the genome. (B) DNA samples are extracted from selected BAC clones and their identity is confirmed by DNA fingerprinting or sequence analysis. (C) A multi-step amplification process generates sufficient material from each clone for array spotting. Each clone is spotted in replicate onto a solid support. (D) Reference DNA and test DNA are differentially labeled with cyanine 3 and cyanine 5 respectively. (E) The two labeled products are combined and hybridized onto the spotted slide. (F) Images from hybridized slides are obtained by scanning in two channels. Signal intensity ratios from individual spots can be displayed as a simple plot (G) or by using more complex software such as SeeGH, which can display copy number alterations throughout the whole genome (H) [82].

High resolution arrays allow for the delineation of amplification and deletion boundaries in a single experiment. These arrays have been instrumental in detailed analysis of specific chromosomal regions [42, 63–68]. High resolution analysis of an entire chromosome or chromosome arm for segmental copy number alterations is made possible with whole chromosome or chromosome arm BAC arrays [69–71].

The application of this technology for genome-wide profiling was first described by Snijders et al., who used 2460 marker BACs and P1 clones to generate an array with clones positioned at ~1.4 Mbp intervals [72]. Arrays of similar resolution have been reported by other groups [73, 74]. This technology has been applied to analyze cell lines and tumors from lymphoma, bladder, breast, prostate, and kidney [75–80].

Further advancement of this technology to tiling resolution of the whole genome has eliminated the need for inferring continuity between marker BACs. This was achieved by using an ordered set of 32,433 BAC clones that provide full coverage of the genome, allowing the profiling of the entire genome in a single experiment [61, 81, 82] (Fig 5).

Figure 5

SeeGH display of whole genome array comparative genomic hybridization. SeeGH translates spot signal ratio data from array CGH experiments to give high resolution chromosome profiles (see Fig 4). Signal ratios are plotted on a log2 scale. This figure shows a whole genome profile of a squamous non-small cell lung carcinoma. Vertical green and red lines are scale bars indicating log2 ratios. Copy number losses are indicated by a shift in ratio to the left of zero, while gains are reflected by a shift to the right. Red and green arrows highlight examples of copy number losses and gains, respectively.

Digital karyotyping

Digital karyotyping is a genome-wide approach for identifying copy number alterations [83]. This technique involves the isolation and enumeration of short sequence tags from specific genomic loci, namely tags adjacent to Sac I restriction enzyme cut sites throughout the genome. Digital enumeration of the tags at intervals along each chromosome reflects DNA content. The concept behind this DNA profiling technique is analogous to that of serial analysis of gene expression (SAGE) described below [84], except that the DNA tags concatenated for sequence analysis are derived from fragmented genomic DNA rather than from a cDNA population. The sensitivity and specificity of digital karyotyping depend on the combination of mapping and fragmenting enzymes employed as well as the number of tags sampled. High-copy-number amplifications can be detected with fewer tag counts.
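The idea that tag counts along a chromosome reflect DNA content can be sketched as follows. This is a simplified illustration only: it uses fixed, non-overlapping windows and a single expected count per window, both of which are assumptions made here for brevity, and the tag positions are invented.

```python
from collections import Counter


def window_tag_counts(tag_positions, window_size):
    """Count observed tags in fixed-size windows along one chromosome.

    Returns a dict mapping window index (position // window_size) to count.
    """
    return dict(Counter(pos // window_size for pos in tag_positions))


def estimate_copy_number(observed, expected, ploidy=2):
    """Scale the observed/expected tag count by the assumed normal ploidy."""
    return ploidy * observed / expected


# Hypothetical mapped tag positions (bp) on one chromosome.
positions = [100_000, 400_000, 1_200_000, 1_300_000,
             1_500_000, 1_900_000, 2_500_000, 2_800_000]

counts = window_tag_counts(positions, window_size=1_000_000)
# Assume two tags per window would be expected from a normal diploid genome.
copy_numbers = {w: estimate_copy_number(c, expected=2) for w, c in counts.items()}
print(copy_numbers)
```

With so few tags per window the estimate is very noisy, which is consistent with the point above that sensitivity depends on the number of tags sampled and that only high-level amplifications stand out at low tag counts.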

Expression profiling

Ultimately, the genome-wide search for oncogenes and tumor suppressors will require the integration of both genomic and expression analysis approaches. Integration of genetic and gene expression data will validate the candidate genes in regions of DNA alteration as well as highlight the downstream effects.

Microarrays

The two main types of microarrays are cDNA microarrays and oligonucleotide microarrays [85, 86]. cDNA microarrays have PCR-generated "target" cDNAs deposited onto glass, whereas oligonucleotide microarrays are manufactured either by a photolithographic process that synthesizes the oligonucleotides directly on the glass slide or by deposition of pre-synthesized oligonucleotides onto glass slides [87, 88]. Both types of microarray are hybridized with cDNA samples derived from tissues of interest to assess changes in expression levels. After competitive hybridization of the cDNA samples, differentially labeled with dyes such as Cyanine 3 and Cyanine 5, the slides are washed to remove nonspecifically bound material and then scanned to determine the relative intensities of each channel. Normalization corrects for differences in labeling and detection efficiencies so that the two datasets can be compared [89].
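As one concrete example of normalization, the sketch below centers the log2 ratios of a two-channel experiment on their median. The text does not specify a method, so this choice is an assumption made for illustration; it relies on the further assumption that most genes on the array are not differentially expressed. The function name and intensities are hypothetical.

```python
import math
from statistics import median


def median_center_log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) per spot, shifted so that the median is zero.

    Subtracting the median removes a global dye or labeling bias under the
    assumption that the typical gene is unchanged between the two samples.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    shift = median(log_ratios)
    return [x - shift for x in log_ratios]


# Hypothetical spot intensities in which the Cy5 channel is globally
# about twice as bright as the Cy3 channel.
cy5 = [400.0, 800.0, 1600.0, 800.0, 800.0]
cy3 = [400.0, 400.0, 400.0, 400.0, 400.0]

normalized = median_center_log_ratios(cy5, cy3)
print(normalized)
```

After centering, only the first and third spots deviate from zero, so the apparent two-fold excess in the remaining spots is treated as a channel bias rather than a biological difference.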

Approximately a quarter of microarray-related literature pertains to cancer, with tumor and cell line transcriptome profiling providing numerous insights into disease [90]. The development of the "lymphochip" cDNA microarray and other cDNA and oligonucleotide arrays has allowed the subclassification of many disease types including lymphoma, leukaemia, and cancers of the breast and lung [91–100]. Analysis of small specimens, such as those derived from premalignant tissue, has been facilitated by the introduction of RNA amplification methods where cDNA is linearly amplified, thus preserving the composition of the original RNA population [101, 102]. This analysis of premalignant lesions has led to the discovery of new biomarkers for determining prognosis and new targets for treatment. Frequently used microarray databases are listed in Table 1.

Serial analysis of gene expression

Unlike microarray technology, which restricts analysis to those cDNAs represented on a chip, serial analysis of gene expression (SAGE) provides an unbiased profile of the transcriptome by taking a raw count of sequence tags, each representing a transcript in an RNA population [84]. The tag count is accomplished through the creation and quantification of concatenated tags generated from tissue mRNAs [103]. Figure 6 summarizes the steps of SAGE library construction. Absolute quantification of the transcriptome allows the creation of gene expression profiles that can subsequently be compared against profiles from other cell types. The longSAGE variation of the SAGE protocol allows more specific tag mapping, notably to cDNAs but also to genomic sequence [104]. The microSAGE protocol, on the other hand, reduces the amount of RNA required for library construction and therefore facilitates examination of the early stages in carcinogenesis [105, 106]. There are a number of web resources for SAGE (see Table 1). SAGEnet provides multiple protocols, while SAGEmap and SAGE Genie provide analysis tools and databases [107, 108].
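The counting step can be illustrated with a short sketch that splits a sequenced concatemer at the Nla III site (CATG), takes one tag from each end of every ditag, and tallies the results. The tag length of 10 bases, the function names, and the example sequences are assumptions made for illustration; the sketch also ignores complications such as tags that themselves contain CATG and the removal of duplicate ditags.

```python
from collections import Counter

ANCHOR = "CATG"   # Nla III recognition site separating ditags
TAG_LENGTH = 10   # assumed tag length; longSAGE tags would be longer

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(concatemer):
    """Extract both tags from every ditag in one concatemer read."""
    tags = []
    for ditag in concatemer.upper().split(ANCHOR):
        if len(ditag) < 2 * TAG_LENGTH:
            continue  # skip empty or truncated fragments
        tags.append(ditag[:TAG_LENGTH])
        # The second tag lies on the opposite strand of the ditag.
        tags.append(reverse_complement(ditag[-TAG_LENGTH:]))
    return tags


# Build a hypothetical concatemer from two made-up tags.
tag_a = "AAACCCGGGT"
tag_b = "GATTACAGGA"
ditags = [tag_a + reverse_complement(tag_b), tag_a + reverse_complement(tag_a)]
concatemer = ANCHOR + ANCHOR.join(ditags) + ANCHOR

tag_counts = Counter(tags_from_concatemer(concatemer))
print(tag_counts)
```

Pooling such counts over many sequenced clones gives the tag abundance table from which expression profiles are compared.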

Figure 6

Serial analysis of gene expression (SAGE) library construction. This figure shows the steps in SAGE profiling. (A) An RNA population is reverse transcribed to cDNAs using oligo-T primers attached to magnetic beads. (B) cDNAs are collected and digested with the restriction endonuclease Nla III. (C) Linkers containing sequence recognized by BsmF I are ligated to the digested cDNAs. Sequence tags are released from the beads by BsmF I digestion (BsmF I cuts at a fixed distance downstream from its recognition site). (D) Released DNA tags are ligated together to form ditags. (E) Ditags are amplified and then digested with Nla III to remove the linkers. (F) Ditags are ligated together to form a concatemer, which is then cloned into a plasmid vector to generate a SAGE library. The identity and abundance of tags are deduced from DNA sequence analysis of plasmid clones of concatenated ditags. (G) Relative abundance of gene expression (between genes within the same RNA population or between samples) is deduced by counting sequence tags. Diagrams provided by Dr. K. Lonergan.

SAGE-based research to identify cancer markers has been conducted for a variety of primary cancers and cell lines, including breast, kidney, prostate, liver, lung, gastric, colorectal, and pancreatic cancer [109–128]. In a few of these instances, such as the work on breast cancer by Porter et al., libraries have been generated for early histopathological stages of cancer that demonstrate expression profiles distinct to each stage [108, 114, 127]. These authors suggested that some of the observed gene expression changes tied to progression through the in situ stages of disease were likely involved with cell growth, differentiation, and survival.

Quantitative PCR

Whole genome profiling approaches, such as SAGE and microarrays, yield candidate genes that require verification. Given that biological specimens are often limited in size, traditional Northern blot analysis may not always be possible. Reverse-transcriptase polymerase chain reaction (RT-PCR) provides semi-quantitative assessment of relative abundance of specific transcripts using gene-specific primers [129]. Real-time RT-PCR measures product amount after each cycle of amplification, based on the association of fluorescence with the amount of DNA accumulated during the PCR [130–133]. Three common real-time approaches are SYBR Green® staining, the TaqMan® system, and the molecular beacon system [134]. In the SYBR Green® method, a fluorescent dye that binds non-specifically to double-stranded DNA is measured to quantify the accumulation of PCR products. In the TaqMan® system, a fluorescence resonance energy transfer (FRET) oligonucleotide probe complementary to the target sequence is used as the reporter system. The fluorescence of the reporter molecule at the 5' end of the oligonucleotide is suppressed by a quencher molecule at the 3' end. When strand synthesis occurs in PCR, the nuclease activity of Taq polymerase degrades the FRET probe and releases the reporter from the quencher, producing fluorescence. In the molecular beacon method, the 3' quencher and 5' reporter of the FRET probe initially exhibit no fluorescence because the oligonucleotide forms a hairpin loop that brings the two into close proximity. Binding of the probe to a target sequence separates the two fluorochromes, allowing the reporter to fluoresce.
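One common way of turning per-cycle fluorescence readings into a comparable number is to record the fractional cycle at which the signal first crosses a fixed threshold. The text does not describe this calculation, so the sketch below is an assumption offered for illustration; the function name, threshold, and readings are hypothetical, and the readings assume idealized doubling at every cycle.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which the signal crosses the threshold.

    fluorescence[i] is the reading taken after cycle i + 1. The crossing
    point is found by linear interpolation between the two flanking cycles.
    Returns None if the signal never rises through the threshold.
    """
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # prev was measured after cycle i, curr after cycle i + 1.
            return i + (threshold - prev) / (curr - prev)
    return None


# Hypothetical readings: sample B starts with twice as much template as A.
sample_a = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
sample_b = [2.0, 4.0, 8.0, 16.0, 32.0, 64.0]

ct_a = threshold_cycle(sample_a, threshold=12.0)
ct_b = threshold_cycle(sample_b, threshold=12.0)
print(ct_a, ct_b, ct_a - ct_b)
```

Under perfect doubling, a one-cycle difference in crossing point corresponds to a two-fold difference in starting template; real reactions amplify with lower efficiency, so this relationship is only approximate.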

Immunohistochemistry, tissue microarrays, and proteomic approaches

Basic immunohistochemical (IHC) techniques, when applied to tissue microarrays (TMA), allow for high throughput analysis of multiple tissues [135–137]. In the construction of TMAs, core samples taken from multiple archival specimens are re-embedded in a paraffin block so that each section of the TMA contains multiple samples for parallel analysis [138]. Similarly, cytology microarrays, with cell suspensions spotted in an array format, facilitate parallel analysis of intact cells [139].

While IHC examines individual targets, proteomic approaches aim to assess global changes at the protein level [8, 140]. For more than a quarter century, two-dimensional polyacrylamide gel electrophoresis has been a commonly used method for displaying the proteome [141]. This approach separates proteins based on isoelectric focusing (pI) and size (polyacrylamide gel electrophoresis). A recently developed method for resolving proteins is isotope-coded affinity tagging (ICAT) which allows quantitative analysis of paired protein samples through the use of stable isotope labeling [142]. Isotopic tags covalently bind cysteine residues within a protein. Tagged proteins are separated and identified by liquid chromatography and mass spectrometry. An assessment of these two methodologies was provided by Patton et al. [143].

In contrast to gel electrophoresis, mass spectrometry assesses protein size by time of flight (TOF) analysis [144, 145]. A technique that incorporates this approach is surface-enhanced laser desorption/ionization (SELDI)-TOF, an affinity-based method in which proteins adsorb to a given chemically modified surface and, subsequently, the bound proteins are resolved by TOF analysis [146, 147]. This technique is commonly used for detecting disease-associated proteins in cell lysates as well as serum.

Recently, high throughput proteomic approaches have been used for identifying protein interactions with other proteins, nucleic acids, lipids, antibodies, and drugs. These approaches include protein array-based and phage display-based methodologies. Cell lysates or protein samples are differentially labeled and competitively hybridized to individual protein targets arrayed on a small surface. Signal intensity ratios are used to calculate the relative abundance of a given molecule. Commercially available antibody microarrays have immobilized selected antibodies targeted against components of known cellular pathways such as signal transduction, cell cycle regulation, gene transcription, or apoptosis [148].

Proteins may also be displayed on the surface of bacteriophage, serving as an alternative to protein arrays for high throughput screening [149–151]. In this system, cDNA libraries are inserted into vectors that generate fusion products with a bacterial phage coat protein. These recombinant proteins are expressed on the surface of the bacteriophage and can be screened for interactions with proteins of interest.

Gene silencing and overexpression

Methylation

Epigenetic changes may alter gene expression. In general, they are heritable and do not arise due to alterations of DNA sequence [152]. Methylation is the best characterized epigenetic change, typically occurring at CpG dinucleotides within the mammalian genome [153]. CpG dinucleotides are commonly found in promoter regions, in "CpG islands" which are long portions of DNA with high GC content. With the exception of the X chromosome, CpG residues in promoter regions are typically unmethylated [154, 155]. Methylation occurs by the attachment of a methyl group to C5 of the cytosine residue after DNA replication has occurred, resulting in the loss of gene expression. The relative amount of methylation can vary, a decrease termed hypomethylation and an increase known as hypermethylation.

Methylated DNA can be distinguished from unmethylated DNA by virtue of resistance to 1) methylation-sensitive restriction enzyme digestion and 2) bisulfite treatment. In the first case, isoschizomers such as Hpa II and Msp I (which recognize CCGG) and Xma I and Sma I (which recognize CCCGGG) are often used to detect methylation, since cleavage by Hpa II and Xma I is impaired by internal cytosine methylation of the recognition sequence. This distinguishing feature is the basis of global methylation detection methods such as restriction landmark genomic scanning (RLGS) of CpG island methylation and methylation target arrays [154, 156–159]. In methylation target arrays a multitude of CpG islands are spotted onto an array and hybridized with probes generated by linker-mediated PCR amplification of sample DNA pre-digested with a methylation-specific enzyme [160–162]. Methyl-CpG binding proteins can be used to identify the unique distribution of methylated CpG islands by chromatin immunoprecipitation [163]. Methylated DNA bound to these proteins serves to identify novel targets of epigenetic inactivation in human cancer. Localization of these targets can be achieved by hybridization to CpG island microarrays or through CGH. In the second case, bisulfite treatment of DNA causes selective deamination of cytosine to uracil [164]. In contrast to cytosine, 5-methyl-cytosine does not react with bisulfite; hence oligonucleotide primers can be tailored to recognize altered or unaltered sequence in order to distinguish unmethylated and methylated targets in a methylation-specific PCR assay.
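The effect of bisulfite treatment on a sequence can be modelled in a few lines. This is a toy sketch: the function name and the example sequence are hypothetical, methylated positions are supplied by hand, and converted cytosines are written as T because uracil is read as thymine after PCR amplification.

```python
def bisulfite_convert(sequence, methylated_positions=()):
    """Simulate bisulfite conversion of one DNA strand.

    Unmethylated C is deaminated to U (reported here as T, as it would
    appear after PCR); C at any 0-based index listed in
    methylated_positions is protected and left unchanged.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(sequence.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical fragment with CpG cytosines at indices 1 and 5.
fragment = "ACGTCCGGA"

unmethylated_product = bisulfite_convert(fragment)
methylated_product = bisulfite_convert(fragment, methylated_positions=(1, 5))
print(unmethylated_product, methylated_product)
```

The two products differ only at the CpG cytosines, and it is this difference that primers in a methylation-specific PCR assay are designed to discriminate.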

With respect to the progression of cancer, the genetic changes associated with disease development are often accompanied by significant changes in methylation state [152]. The idea that epigenetic changes can be a mechanism for altering gene expression and driving tumorigenesis has been supported by recent work, examples including work on 14-3-3σ and CCND2 in breast cancer, p16INK4A and RASSF1A in lung cancer, and HPP1 and SFRP1 in colorectal cancer [165–170].

Deducing function of novel genes

Cell models

Cell culture models are often used to deduce gene function through the introduction of a foreign gene or by disruption of endogenous gene function, thereby creating a new phenotype or altering cell behaviour.

A new approach for disrupting gene function is RNA interference (RNAi). This method targets specific genes by way of post-transcriptional gene silencing. The natural function of RNAi is thought to be protection of the genome against invasion by mobile genetic elements such as transposons and viruses, which produce aberrant RNA or dsRNA in the host cell when they become active [171]. Efforts to develop an RNAi microarray will ultimately allow for knockdown analysis of gene function to be undertaken on a genome-wide scale [172, 173].

Animal models

Animal models serve two broad functions in terms of identifying and characterizing genes involved in cancer and cancer progression. First, sequence homology between known animal genes and previously unidentified human genes allows for speculation as to the gene's function in humans. This is possible because an increasing number of whole genome sequences are available for a variety of animals (e.g. Fugu, Drosophila, mouse, chimpanzee) [174–177]. Second, animals serve as functional models for cancer, allowing researchers to assess the effects of gene disruption, treatment regimes, and disease progression. Mammalian models are expected to more closely mimic the intricacies of human conditions [178]. The expansive body of literature pertaining to murine malignancy and the completion of the mouse genome sequence make the mouse the leading model for cancer gene discovery [179, 180].

Initial efforts to examine cancer genetics in the mouse involved incorporation of embryonic stem (ES) cells containing mutated forms of a gene of interest into a developing mouse [181, 182]. Conditional mutants allow spatial and temporal control over the expression of the introduced genotypic alteration, an example being the Cre-lox system [182, 183]. Briefly, this system involves the generation of parallel lines of mice, one having been manipulated to have the gene of interest flanked by P1 bacteriophage loxP sites and the other having the Cre recombinase expressed under the control of a tissue-specific promoter. When these lines are crossed, the gene book-ended by loxP sites is excised in that tissue where Cre is expressed, thereby disrupting expression of the gene of interest and allowing researchers to assess its role in tumor development in that tissue. There are numerous variations on this technique currently in use and the Cre-lox system has been widely applied in cancer progression research [184, 185].

Current cancer progression models

Genome-wide analysis has led to the discovery of many genes involved in cancer progression. This section summarizes the genetic alterations and gene expression changes associated with the stages of progression in four major cancer types.

Breast cancer

Histopathological stages of the most common form of breast cancer include atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS), and invasive ductal carcinoma [3]. Inherited alterations at the BRCA1 or BRCA2 loci can predispose individuals to breast cancer, the histology in these cases differing from that seen in sporadic disease [186, 187]. Altered expression of the FHIT tumor suppressor locus is common in many breast cancer types, especially in individuals carrying BRCA2 mutations [188]. Recent gene expression profiling studies have served to identify a genetic basis for the disease stages listed above [3, 53, 100, 114, 127, 189, 190]. SAGE and microarray data have demonstrated that the relative expression of genes within the transcriptome varies from stage to stage, with some genes being expressed solely in a specific stage. Correlation of expression changes across multiple cases has led to the characterization of prognostic biomarkers [97, 98, 187, 191–195]. Furthermore, proteomic studies have identified additional changes in DCIS not detected by nucleic acid-based assays [140, 196]. Methylation changes driving breast cancer progression have been identified using both high throughput techniques and more established techniques (e.g. methylation-specific PCR) [113, 162, 166, 167, 197–200]. This has led to the discovery of epigenetic changes that correlate with disease outcome and therefore have strong prognostic value. Figure 7 provides a summary of the genes and chromosomal regions implicated in breast cancer progression.
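As a rough illustration of what it means for a gene to be expressed solely in one stage, the sketch below scans a small table of expression counts and reports genes detected in exactly one stage. Everything here is assumed for the sake of the example: the gene names, the counts, the inclusion of a normal reference stage, and the detection threshold are hypothetical placeholders, not data from the studies cited above.

```python
from typing import Dict, List

# Stage order follows the text, with a normal reference added as an assumption.
STAGES: List[str] = ["normal", "ADH", "DCIS", "invasive"]

# Hypothetical tag counts per stage (toy values, not real measurements).
PROFILES: Dict[str, List[int]] = {
    "gene_A": [12, 15, 11, 14],  # detected in every stage
    "gene_B": [0, 0, 9, 0],      # detected only in DCIS
    "gene_C": [0, 0, 0, 21],     # detected only in invasive carcinoma
    "gene_D": [3, 0, 7, 8],      # detected in several stages
}


def stage_specific_genes(
    profiles: Dict[str, List[int]],
    stages: List[str],
    min_count: int = 1,
) -> Dict[str, str]:
    """Map each gene detected in exactly one stage to that stage."""
    specific: Dict[str, str] = {}
    for gene, counts in profiles.items():
        detected = [s for s, c in zip(stages, counts) if c >= min_count]
        if len(detected) == 1:
            specific[gene] = detected[0]
    return specific


for gene, stage in stage_specific_genes(PROFILES, STAGES).items():
    print(f"{gene} is detected only in {stage}")
```

In practice, a simple detection threshold like `min_count` would be replaced by a statistical test that accounts for sampling depth and replicate variation; the threshold is used here only to keep the sketch short.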

Figure 7

Progression model of ductal breast cancer. Histopathological stages of the most common form of breast cancer include atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS), and invasive ductal carcinoma. This figure highlights the changes that occur in breast cancer throughout the histopathological stages of the disease.

Prostate cancer

Prostate cancer is multifocal and heterogeneous, meaning that benign, premalignant, and malignant tissues coexist within the same patient [5]. The prostate cancer progression model suggests that normal prostatic epithelium changes to prostatic intraepithelial neoplasia (PIN), which in turn becomes localized invasive cancer, then metastatic disease, and, finally, hormone refractory disease, with increasing severity reflected in a higher Gleason grade [201, 202]. The hormone refractory stage occurs after metastasis, when patients cease to respond to hormone therapy and quickly succumb to the disease [203]. Both conventional and high throughput techniques have been employed to assess the progression of prostate cancer in terms of chromosomal instability and methylation [203–210]. Most genes that have been implicated in prostate cancer development have been identified through linkage analysis. Brothmann et al. summarized cytogenetic and molecular genetic alterations associated with hereditary and sporadic prostate cancer, as well as epigenetic changes [201]. With newer technologies such as LCM, it is now possible to procure isolated populations of cells in which to deduce somatic events. In addition, cDNA microarray and SAGE technologies have elucidated gene expression changes tied to prostate cancer progression at each histopathological stage [202, 211–214]. These same technologies have been used to identify potential biomarkers, taking advantage of the correlation between the expression of specific genes and Gleason score to generate a prognostic model, based solely on gene expression data, for patients who have undergone prostatectomy [215]. Integration of gene expression profiles with tissue microarray data has allowed multiplex assessment of biomarkers for diagnostics and prognostics in prostate cancer [216, 217]. The genetic, epigenetic, and chromosomal alterations that have been characterized for prostate cancer are shown in Figure 8.

Figure 8

Progression model of prostate cancer. The prostate cancer progression model suggests that normal prostatic epithelium changes to prostatic intraepithelial neoplasia (PIN), which in turn becomes localized invasive cancer, then metastatic disease, and, finally, hormone refractory disease, with increasing severity reflected in a higher Gleason grade. This figure outlines the changes that occur in the progression of prostate cancer.

Lung cancer

Pathogenesis of lung cancer is thought to differ for small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) [218]. Classification of lung cancer subtypes is possible based solely on differential expression patterns [94–96, 219–223]. Analysis of gene expression, methylation, and chromosomal changes in lung cancer has served to refine the existing lung cancer progression model [37, 167, 219, 220, 224–236]. Disease progression is best characterized in bronchial squamous cell carcinoma, an NSCLC subtype in which normal epithelium develops hyperplasia or metaplasia, followed by varying degrees of dysplasia, and then carcinoma in situ and invasive cancer [2, 237–241]. Alterations on chromosome 3p followed by alterations on 9p are believed to be the earliest genetic events to occur in the progression of the disease [242]. Analysis of the early stages of squamous cell carcinoma has been facilitated by the development of fluorescence bronchoscopy technology (e.g. the LIFE-Lung device), which allows the detection and capture of minute lesions [243, 244] (Figure 9). hTERT, located on chromosome 5p, has also been studied extensively in lung cancer [226]. hTERT expression has been reported to be slightly increased in the early premalignant stages of development and to rise gradually as the lesion becomes more severe [226, 245]. Figure 10 shows the genetic alterations understood to drive progression.

Figure 9

Fluorescence bronchoscopy. Panel A shows a white light bronchoscopy image of the upper left lobe of the lung, obtained using the LIFE-Lung device. Panel B shows the detection of a carcinoma in situ lesion by its abnormal autofluorescence. The lesion is observed as a brownish area against a background of green fluorescence. Images provided by Dr. S. Lam.

Figure 10

Progression model of squamous non-small cell lung carcinoma. Squamous cell carcinoma of the lung progresses from normal epithelium through metaplasia, dysplasia, and carcinoma in situ to invasive carcinoma. The alterations present in the various stages of the disease are outlined in this figure.

Colorectal cancer

Colorectal cancer typically progresses from normal epithelium through dysplasia and adenoma stages to carcinoma in situ and finally to invasive cancer [246, 247]. Genetic instability is a hallmark of colorectal cancer; microsatellite instability (MIN) is attributed to defects in DNA mismatch repair genes, whereas chromosomal instability (CIN) is characterized by gross chromosomal changes arising during cell division and commonly involves APC and β-catenin mutations [248–250]. cDNA microarray analysis has revealed different gene expression patterns for cell cycle regulation and DNA repair genes in colorectal cancer cell lines characterized by CIN or MIN [251]. Furthermore, gene expression profiling with SAGE, oligonucleotide arrays, and cDNA microarrays has been applied to identify staging and prognostic markers [112, 118, 252–259]. The genes implicated are typically involved in cell cycle control, apoptosis, angiogenesis, and the transcription machinery. Figure 11 details the alterations understood to drive tumorigenesis in the colorectal region.

Figure 11

Progression model of colorectal cancer. Colorectal cancer typically progresses from normal epithelium through dysplasia and adenoma stages to carcinoma in situ and finally to invasive cancer. The changes that occur in the progression of colorectal cancer are outlined in this figure.

Conclusion

Advances in technology have provided the means to survey the entire genome at increased resolution. With such a global approach, it is possible to identify genetic alterations and gene expression changes at both the early and late stages of cancer progression. Integrating analyses at the level of the genome, transcriptome, and proteome allows key pathways and functions to be defined, giving a better understanding of the critical steps driving disease progression.

Knowledge of the causal events driving progression will provide a mechanistic basis for the subclassification of disease, as well as novel targets for early diagnosis and for the creation of more specific treatment regimens [260].