1 Introduction

In the past decade, advancements in genome profiling technologies have greatly improved our ability to understand the landscape of cancer genomes. From the emergence of array-based comparative genomic hybridization (CGH) and spectral karyotyping (SKY) to the current state of next-generation sequencing, the resolution at which the genome can be described has improved over a millionfold [1–6]. Likewise, the recent development of integrative platforms to relate multiple dimensions of DNA features (such as copy number, allelic status, sequence mutations, and DNA methylation) to gene expression patterns has dramatically improved our ability to identify causal genetic events and decipher their downstream consequences in the context of gene networks and biological functions [7, 8] (Table 1). Landmark events in cancer genomics, from the launch of the Cancer Genome Anatomy Project at the beginning of the decade to the recent publications of complete cancer genome sequences, are highlighted in Fig. 1 [3–6, 8, 11–45].

Table 1 List of software for integrative analysis
Fig. 1 Advances in the cancer genomic landscape post Y2K. The timeframes of events are estimated based on time of publication

Multiple levels of genetic and epigenetic disruption are instrumental to cancer development, whereby specific genes may be altered by a variety of mechanisms. For example, the tumor suppressor CDKN2A can be inactivated through copy number loss, DNA hypermethylation, or sequence mutation. These mechanisms of disruption can occur in a tumor-specific manner or may occur concurrently in the same tumor, i.e., a two-hit scenario. Moreover, in the former situation, if a given gene or pathway's frequency of alteration is low when examined by one mechanism or dimension, it is likely that the gene/pathway would be overlooked by the analysis. However, when multiple dimensions of disruption are considered in the analyses, alteration of the gene in question may be detected at a high frequency, albeit at low frequencies by any one mechanism. This illustrates the need for and the benefit of integrative analytical approaches. In this article, we discuss the impact of multidimensional genomic analyses on our view of the cancer genome landscape and the contribution of such new knowledge to our understanding of cancer progression and metastasis.

2 Genomic alterations

2.1 Chromosomal aberrations

Chromosomal aberrations and rearrangements, such as translocations and gains/losses of whole or portions of chromosome arms, are detected through direct examination using molecular cytogenetic techniques such as G-banding, SKY, fluorescence in situ hybridization (FISH), and CGH [2, 46–50]. The manifestation of such alterations is generally attributed to mitotic errors, where centrosomal aberrations and telomere dysfunction play key causative roles [51–55].

Aberrations such as gains and losses have been further refined using technologies such as microarray CGH (see below). While translocations are primarily associated with different types of leukemia and lymphoma, recent genomic studies have also identified them in epithelial tumors such as prostate and lung cancer [56–63]. A compilation of cumulative cytogenetic data from three main sources (NCI/NCBI SKY/M-FISH & CGH Database, NCI Mitelman Database of Chromosome Aberrations in Cancer, and NCI Recurrent Aberrations in Cancer) is now integrated into NCBI’s Entrez system as Cancer Chromosomes [64] (Table 2).

Table 2 List of genomic resources and databases

2.2 Gene dosage, allelic imbalance, and mutational status

Gene dosage

Genomic DNA copy number alterations are a prominent mechanism of gene disruption that contributes to tumor development [85]. Segmental amplification may lead to an increase in gene and protein expression of oncogenes, while deletions may lead to haploinsufficiency or the loss of expression of tumor suppressor genes. Since the development of microarray-based CGH in the mid-1990s, advances in the technology have dramatically increased genome coverage and target density, improving both the resolution and sensitivity of detection of copy number alterations [86, 87]. The first genome-wide array CGH analysis utilized cDNA microarrays originally designed for gene expression profiling [88]. Since these first experiments, whole genome tiling path arrays with tens of thousands of bacterial artificial chromosome clones, oligonucleotide arrays (25–80-nucleotide probes), and single-nucleotide polymorphism (SNP) arrays with over one million DNA elements have been developed, together with the essential bioinformatics tools for visualization and analysis of high-density array CGH data (Fig. 1) [7, 35, 89–93]. These innovations have enabled increasingly precise mapping of the boundaries and magnitude of genetic alterations throughout the genome in a single experiment, greatly increasing our understanding of the cancer genome landscape in the context of DNA copy number [35, 94–98]. While early attempts were made utilizing sequence-based approaches [99–102], recent studies have begun to illustrate the improvement in detection resolution offered by advances in high-throughput sequencing technologies [6, 13, 15, 16]. The popularity of genome sequencing will depend on further cost reduction in data generation and major advancements in analysis [103].

Copy number variation

The discovery of a vast abundance of germ line segmental DNA copy number variation (CNV) in the normal human population has not only provided a baseline for interpretation of cancer genome data but also highlighted the need for comparison against paired normal tissue [20, 21, 33, 34, 104–111]. Moreover, it has been shown that many of the reported CNVs overlap with loci involved with sensory perception and more importantly, disease susceptibility. While the role of CNV in cancer is not well understood, a recent study showed that these regions are more susceptible to genomic rearrangement and may initiate subsequent alterations during tumorigenesis [112]. Moreover, CNV at 1q21.1 was recently shown to be associated with neuroblastoma and implicated NBPF23, a new member of the neuroblastoma breakpoint family, in tumorigenesis [113]. A database of all known CNVs is available at http://projects.tcag.ca/variation [33]. In addition, as copy number profiles of cancer genomes accumulate, hotspots for amplification and deletion are becoming evident, and signature alterations associated with specific diseases and cancer histologic subtypes are emerging [114–118]. The manifestation of “oncogene addiction” through lineage-specific DNA amplification is a case in point [40, 41, 119–122].

Allelic status

SNP arrays are best known for their application in genome-wide association studies (GWAS), where the correlation of haplotype with phenotype implicates disease susceptibility [123, 124]. SNP array platforms have shown tremendous advances in resolution, with the number of SNPs that can be simultaneously measured increased by 1,000-fold since initial development. Currently, for example, the Affymetrix SNP 6.0 array platform measures 1.8 million elements representing 906,600 SNP elements and >946,000 CNV elements. Likewise, on the Illumina HumanOmni1 platform, over 1,000,000 sites (representing a mixture of SNP and CNV elements) can be simultaneously assessed. In addition to their application in GWAS, SNP arrays can also be used to detect somatic alterations and, when applied in this context, can allow for the simultaneous detection of copy number alteration and allele imbalance in tumor genomes. In the example in Fig. 2, when the SNP array profile of a lung cancer genome is compared against that of its paired noncancerous lung tissue, it is not only possible to distinguish regions of allelic balanced copy neutrality (Fig. 2a) from allelic imbalance (Fig. 2b, c), but also regions of allelic imbalance due to segmental DNA copy number alteration (Fig. 2b) from those without change in total copy number (Fig. 2c).

Fig. 2 SNP array analysis to identify areas of altered copy number and allelic composition in a clinical lung cancer specimen. Shown here are (a) a region that is copy-neutral with no observed allelic imbalance and regions containing (b) a segmental gain and (c) UPD. In the allele-specific copy number plot, the gain (in b) is likely a single-copy change, and the UPD event (in c) is signified by the shift in allele levels while total copy number remains neutral

Mutational profiling and whole genome sequencing

In cancer, oncogenes are thought to harbor mutations which lead to increased protein expression or constitutive protein activation while tumor suppressor genes are thought to harbor mutations which are inactivating, either through total loss of protein expression or expression of mutant, nonfunctional protein. In addition, activating and inactivating mutations can also be accompanied by changes in gene dosage or allele status (see below). Traditionally, mutation screening has been focused on specific oncogene and tumor suppressor loci. With the availability of newer and cheaper sequencing technologies [125], recent studies have expanded from single gene analyses to genome-wide screens [6, 13, 15, 16, 126]. For example, in studies using small cell lung cancer and melanoma cell lines, tens of thousands of somatic mutations were identified in each cell line, with a proportion of these mutations being attributed to cigarette smoke (G to T substitutions) and UV exposure (C to T), respectively [4, 5]. It will be interesting to see if other cancers have such mutation signatures. Another observation made in both studies was that the uneven distribution of mutations suggests that DNA sequence integrity is largely maintained by transcription-associated DNA repair. While these and future studies will uncover a vast number of mutations, the contribution of those mutations to tumorigenesis will need to be determined [127, 128].

2.3 Genomic landscape: gains, losses, and uniparental disomy

Individually, the study of genomic dimensions has yielded a global description of cancer genomes in terms of gene dosage, allelic status, and somatic mutation. Collectively, however, the integration of these three dimensions has brought two concepts to the forefront: allele-specific copy number alterations and uniparental disomy (UPD; Fig. 2). Typically, the relationship between somatic mutation and allele-specific copy number alterations has been associated with tumor suppressor genes (e.g., RB1 and TP53) whereby mutation is combined with loss to achieve biallelic inactivation [129, 130]. However, recent studies have shown preferential amplification of alleles encoding mutated oncogenes as well [131–136]. In non-small cell lung cancer, mutant allele-specific imbalance (MASI) is frequently present in mutant EGFR and KRAS tumor cells and is associated with increased mutant allele transcription and gene activity [136].

UPD is the presence of two copies of a chromosome segment from one parent and the absence of that DNA from the other parent. Somatic UPD, also known as copy-neutral loss of heterozygosity (LOH), results in loss of heterozygosity (tumor versus normal) without a change in total DNA copy number [137–139]. UPD is observed at tumor suppressor gene loci whereby, upon loss of the wild-type allele, the mutated allele is duplicated, resulting in a diploid state with homozygous mutation of the target gene [140]. Interestingly, UPD events are also detected at mutated oncogenes [136, 141–143]. Until recently, due to limitations in the resolution of genomic array platforms, the prevalence of this event has been widely underestimated and underappreciated. Recent studies have shown that UPD events are frequently observed in tumor genomes, with most of the findings reported from hematological malignancies [144–153]. Our genome-wide analysis of segmental gain, loss, and UPD in the T47D breast cancer cell line genome showed that a significant portion of the genome exhibits UPD, rivaling the proportion of the genome affected by segmental gain and loss and highlighting the potential of UPD as a prominent mechanism of gene disruption in epithelial cancer (Fig. 3). Interestingly, PIK3CA and TP53 mutations in T47D are noted in the Catalogue of Somatic Mutations in Cancer [67]. Integrative analysis at these loci detected copy number increase at PIK3CA and copy number loss at TP53, illustrating the MASI concept described above (Fig. 3).
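The distinction drawn above between gain, loss, and copy-neutral LOH can be made concrete with a minimal sketch. It assumes that allele-specific copy numbers have already been estimated for each segment and that the matched normal tissue is diploid and heterozygous; the `Segment` type, its field names, and the output labels are hypothetical and are not taken from Partek Genomics Suite or any other package mentioned in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Allele-specific copy number of one tumor segment (hypothetical type).

    major/minor are the copy numbers of the more and less abundant allele.
    """
    major: int
    minor: int


def classify_segment(seg: Segment, normal_total: int = 2) -> str:
    """Label a segment relative to a heterozygous diploid normal (assumed)."""
    if seg.minor < 0 or seg.minor > seg.major:
        raise ValueError("expected 0 <= minor <= major")
    total = seg.major + seg.minor
    if total > normal_total:
        return "gain"
    if total < normal_total:
        return "loss"
    # Total copy number is unchanged; only the allelic composition can differ.
    if seg.minor == 0:
        return "UPD (copy-neutral LOH)"
    return "copy-neutral, allelic balance"


if __name__ == "__main__":
    for seg in (Segment(1, 1), Segment(2, 1), Segment(2, 0), Segment(1, 0)):
        print(seg, "->", classify_segment(seg))
```

In this toy scheme, a segment with two copies of one allele and none of the other is reported as UPD because total copy number matches the normal reference while heterozygosity is lost, which is why total copy number profiling alone cannot detect it.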

Fig. 3 Overlay of chromosomal regions of gain, loss, and UPD (copy number neutral LOH) inherent to the T47D breast cancer cell line. The chromosomal loci for PIK3CA and TP53 (modified by activating and inactivating mutations, respectively, in this cell line) are indicated. The majority of the genome is affected by any one of the three genomic alterations. Raw SNP 6.0 array data were obtained from the Sanger database with mutation status obtained from the COSMIC database [67]. Copy number and allelic status changes were determined using Partek Genomics Suite, and reference genomes used were 72 individuals from the HapMap collection. Data were visualized using the SIGMA2 software [7]

Somatic UPD also exists at genes without mutation. The potential significance of this somatic event is not readily apparent, but it raises the intriguing possibility of allelic conversion of epigenetic status [139, 144, 154].

3 Epigenomic alterations

3.1 The cancer methylome

Abnormal DNA methylation patterns occur in cancer, whereby focal hypermethylation at many CpG islands is evident in a background of global DNA hypomethylation [155–158]. Broad hypomethylation may lead to genomic instability, while hypermethylation of CpG islands silences transcription of specific genes [157, 159–161]. Nonrandom methylation of multiple CpG islands observed in colon cancer led to the discovery of CpG island methylator phenotype, which is causally linked to microsatellite instability via silencing of the mismatch repair gene, MLH1 [162–164].

The determination of DNA methylation status relies on the ability to discriminate between methylated and unmethylated cytosines. This is achieved by exploiting methylation-sensitive/insensitive isoschizomer restriction enzyme pairs [165–171], chemical conversion of unmethylated cytosine to uracil [172–177], and the affinity for methylated DNA of specially developed antibodies and methylated DNA binding proteins [26, 178–184]. Several computational methods have been developed for deriving approximations of actual methylation levels from the relative levels generated by most microarray and locus-specific sequencing assays [168, 183, 185, 186]. However, it is important to note that CpG targets represented on microarrays may or may not be the only elements controlling gene expression. Recently, it was shown that, in the human colon cancer methylome, sequences up to 2 kb away from CpG islands, termed CpG island shores, exhibited more methylation than CpG islands and had greater influence on gene expression than CpG islands [187]. Furthermore, while excess promoter methylation is typically associated with transcriptional repression, the loss of required methylation within gene bodies, proximal to promoters, can have the same effect [188]. DNA methylation of epigenetic neighborhoods in the megabase size range has also been reported [189]. Validation of methylation-mediated control of gene-specific expression and evaluation of biological significance can be achieved via pharmacologic manipulation of DNA methylation, for example by 5-azacytidine treatment, to relieve methylation silencing and invoke re-expression [22, 190].

The first single-base resolution maps of the human methylome have recently been generated by sequencing of bisulfite-converted DNA from human embryonic stem cells and fetal fibroblasts [14, 191]. This landmark study will greatly advance the analysis of DNA methylation by providing whole genome reference maps of methylation in these specific cells. However, it is well known that DNA methylation is tissue-specific and that it changes throughout development; thus, methylome maps for all tissues at various stages of development may be necessary to provide adequate maps of “normal” methylation patterns for use in deciphering aberrant methylation patterns characteristic of tumors [192–197]. In recognition of this, the Human Epigenome Project was launched in 2004 to map the methylomes of all major human tissues [198].

3.2 Integration of cancer genomic and epigenomic events

DNA methylation and genomic instability

Cancer-specific aberrant DNA methylation is associated with reduced genomic stability and subsequent copy number alterations, including preferential loss of certain imprinted alleles (LOI) [199–205]. Mechanistically, this instability may be related to the susceptibility of hypomethylated DNA to undergo inappropriate recombination events [206]. Another mechanism known to negatively impact genomic integrity in lung cancer is the relaxation of transposable element control that is mediated by DNA methylation [207–211].

DNA hypomethylation and DNA amplification

Preliminary evidence of specific demethylation of somatic segmental amplifications (or amplicons) has been put forth in lung cancer, perhaps representing a novel mechanism of aberrant oncogene activation [210, 212]. Further studies using large-scale sequencing of bisulfite-treated DNA will help to clarify this phenomenon [14]. Hypomethylation has also been implicated in the formation of specific copy number alterations in glioblastoma multiforme [213]. One potentially interesting application for DNA methylation profiling of cancer amplicons such as these is in the discrimination between “driver” and “passenger” genes within the amplified sequence. It may be that DNA methylation within the promoters or gene bodies of these genes is responsible for the lack of uniform overexpression of genes residing within amplicons.

DNA hypermethylation and copy number loss

The relationship between DNA hypermethylation and allelic loss is well documented. Tumor suppressor genes are frequently found in regions of common LOH, and these same genes are frequently found to be hypermethylated, perhaps best exemplified by the FHIT gene on chromosome 3p [214]. Although it is unclear whether loss or hypermethylation occurs first, both are known to be very early events in tumorigenesis, preceding any histologic alterations [215–217]. With the advent of high-resolution genome-wide technologies, it has become possible to comprehensively search for genes that are inactivated by both mechanisms simultaneously [218].

Histone modification states

While DNA methylation and gene dosage profiling technologies have become accessible, technologies for global assays of other key epigenetic marks, including histone modifications, are not widely available. One of the main challenges in conducting high-quality genome-wide chromatin immunoprecipitation experiments on microarray (ChIP-chip) or sequencing (ChIP-seq) platforms is the requirement for high-quality DNA from pure cells, which essentially means growing cells in culture. It is thus difficult to analyze these dimensions from clinical specimens. However, much has been learned from studies of the relationship between different histone modification states and transcriptional activation or repression in model systems. Examples utilizing ChIP-chip include cell- or context-specific histone modification patterns related to cell- or context-specific gene expression; histone H3 lysine 27 (H3K27) trimethylation patterns associated with prostate, lung, and breast cancers; and H3K9 and H3K79 modification patterns in leukemia [219–225]. Examples utilizing ChIP-seq include the analysis of the growth inhibition program of the androgen receptor and the chromatin interaction network of the estrogen receptor [226, 227].

4 Relating genetic and epigenetic events to changes in the transcriptome through integrative analysis

Aberrations in individual genetic or epigenetic dimensions are prominent across various cancer types, culminating in changes to the transcriptome. However, for a given gene, most of the events documented previously, such as copy number amplification, homozygous deletion, somatic mutation, or DNA hypermethylation, do not occur in 100% of tumors for a given cancer type. Moreover, it has been observed that the same gene may be activated or inactivated by different mechanisms. Since most of the studies described above analyzed single DNA dimensions, it is likely that many genes would be overlooked due to a low frequency of alteration in a single dimension; the same gene may be detected at a high frequency when multiple dimensions are considered. Thus, analysis of more dimensions may reveal higher frequency gene-specific disruption with corresponding transcriptome aberrations for particular cancer types, as would be expected for genes causative to cancer development.

4.1 Multiple mechanisms of gene disruption

Expression profiling studies have been instrumental in detecting genes dysregulated in cancer [228–230]. However, aberrant expression of some genes may simply reflect incidental genome instability or secondary dysregulation. Global gene expression profiling alone may not distinguish causal events from bystander changes. One of the first studies to relate gene expression changes with gene dosage status on a global scale was a parallel analysis of DNA and mRNA [88, 231]. The same cDNA microarray platform was used to investigate the impact of DNA copy number alterations on the expression of over 6,500 genes. This study determined that 62% of genes located within regions of DNA amplification showed elevated expression in breast cancer. Subsequent studies in other cancer types revealed a broad range in the correlation between increased gene dosage and expression levels for protein coding genes (19% to 62%) [114, 228, 231–234]. Studies integrating gene dosage and gene expression have identified cancer subtype-specific pathway activation and signatures associated with clinical outcome [118, 235–238]. In addition, when examining known disease-relevant pathways, it has been shown that even though individual components of a pathway are disrupted at a low frequency, collectively, these alterations can result in frequent disruption of a given pathway [18, 114]. Similarly, alterations in DNA methylation or histone modification status can also affect gene expression and have subsequent pathway level consequences (see above).

4.2 Multiple mechanisms of disrupting noncoding RNA levels

Segmental DNA copy number alterations also affect the expression of noncoding RNAs [239–243]. MicroRNAs (miRNA) have been shown to have a significant role in cancer development with specific miRNAs implicated in a number of different cancer types [28, 244–246]. Specific miRNA expression signatures are associated with critical steps in tumor initiation and development including cell hyperproliferation, angiogenesis, tumor formation, and metastasis [247]. High-throughput analysis of microRNAs has been of interest, and microarrays have been developed to assess essentially all annotated microRNAs. To date, >700 miRNAs have been annotated in the genome (http://mirdb.org/miRDB/statistics.html, [75]), with more likely to be discovered. For example, we recently demonstrated that a deletion on chromosome 5q leads to the reduced expression of two miRNAs that are abundant in hematopoietic stem/progenitor cells. This study revealed haploinsufficiency and reduced expression of miR-145 and miR-146a as mediators of a subtype of myelodysplastic syndrome [242]. Although the genomic loss and underexpression implicates a tumor-suppressive role for these specific miRNAs, others undergo activating genomic alterations and elevated expression and hence are thought to be oncogenic [248, 249].

Just as copy number alterations can alter miRNA activity, epigenetic alterations have also been shown to affect miRNA expression [250–252]. Aberrant methylation of miRNAs has been reported in a variety of cancer types, and the disruption of epigenetically mediated miRNA control has been shown to have oncogenic effects due to downstream gene deregulation [253]. For example, abnormal DNA methylation of miRNAs has been associated with tumor metastasis, leading to the appreciation of a group of metastasis-related miRNAs [249].

4.3 Multidimensional integration of genome, epigenome, and transcriptome

Large-scale initiatives

Since multiple genomic/epigenomic mechanisms can influence gene expression and lead to disruption of a given function, an integrative multidimensional analysis is necessary for a more comprehensive understanding of the cancer phenotype (Fig. 4). Specific programs and initiatives such as those by The Cancer Genome Atlas project and the cancer Biomedical Informatics Grid enable parallel and multidimensional analysis of cancer genomes [8, 18] (Table 2). Recently, studies in glioblastoma and osteosarcoma have shown that integrative genomic and epigenomic approaches can indeed reveal the specific genetic pathways involved in different cancers [18, 254].

Fig. 4 Integration of copy number, allelic status, DNA methylation, and gene expression for a single lung adenocarcinoma sample. (a) Copy number and (b) allele status analyses revealed a high level allele-specific DNA amplification (highlighted in yellow, image generated with Partek Genomics Suite); (c) individual CpG loci within this region were assessed for differential methylation between tumor and nonmalignant tissue. Hypomethylation at the indicated CpG locus, which corresponds to the MUC1 gene, is observed (visualized with Genesis). (d) Expression analysis revealed fourfold overexpression of the MUC1 transcript when a tumor sample was compared to matched, adjacent nonmalignant tissue. Copy number and allele status profiling was performed using the Affymetrix SNP 6.0 array; DNA methylation profiling using the Illumina Infinium HM27 platform and gene expression using the Affymetrix Human Exon 1.0 ST array

Gene disruption by multiple mechanisms

One of the two key reasons for using an integrative approach is the ability to detect critical genes that are disrupted by multiple mechanisms across a sample set but are disrupted at a low frequency by any one mechanism. These genes would have been overlooked in previous, single dimensional studies. The second key advantage of integrative approaches is the ability to identify genes that are simultaneously disrupted by multiple mechanisms—two hits—in a single sample. Using a dataset comprised of DNA copy number, allelic status, DNA methylation, and gene expression profiles from ten lung adenocarcinomas and matched nonmalignant tissue controls, we illustrate these benefits below.

If gene expression changes are a consequence of alterations at the DNA level, then a higher proportion of the observed expression changes can be directly attributed to a defined causal event when multiple types of DNA alterations are examined (Fig. 5a). While some samples have over 70% of their expression changes associated with DNA level changes (sample 7, sample 8), other samples have only 30% (sample 5, sample 9). Additionally, as a consequence of associating more gene expression changes with DNA level changes within a sample, more disrupted genes are detected, and in turn, more disrupted pathways are identified across a sample set (Fig. 5b, c). In fact, in our example, more than five times as many genes (∼1,100 compared to ∼200) are detected as disrupted in at least 50% of the samples when we account for multiple mechanisms of disruption (versus one mechanism alone; Fig. 5b). This result illustrates that without using an integrative approach, many potentially important genes would be dismissed as they are disrupted by low frequency events when a single DNA dimension is analyzed. This also holds true at the pathway level when the identified genes are grouped based on their biological function (Fig. 5d). For example, the Hepatic Fibrosis/Hepatic Stellate Cell Activation pathway and the RAR Activation pathway, which are identified when all DNA dimensions are considered, would not be detected as significantly altered when using individual DNA dimensions alone.
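The frequency argument above can be illustrated with a minimal sketch for a single gene. The input format (one set of sample identifiers per DNA dimension), the function name, and the toy data are hypothetical and do not reproduce the pipeline used for Fig. 5; the only assumption carried over from the text is that a sample counts as disrupted when the gene is altered by at least one mechanism.

```python
from typing import Dict, Set

# Hypothetical input: for each DNA dimension, the samples in which the gene
# is altered and shows a matching expression change.
Calls = Dict[str, Set[str]]


def disruption_frequency(calls: Calls, n_samples: int) -> Dict[str, float]:
    """Return per-dimension frequencies plus the combined ("any") frequency."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    freqs = {dim: len(samples) / n_samples for dim, samples in calls.items()}
    # A sample is counted once, however many mechanisms affect the gene in it.
    disrupted_by_any = set().union(*calls.values())
    freqs["any"] = len(disrupted_by_any) / n_samples
    return freqs


if __name__ == "__main__":
    toy_gene = {
        "copy_number": {"S1", "S2"},
        "methylation": {"S2", "S3", "S4"},
        "upd": {"S5", "S6"},
    }
    freqs = disruption_frequency(toy_gene, n_samples=10)
    threshold = 0.5  # the fixed frequency threshold used in the text
    for name, freq in freqs.items():
        print(f"{name}: {freq:.0%} passes={freq >= threshold}")
```

With these toy calls, no single dimension reaches the 50% threshold, whereas the combined frequency does, which is the situation described for the additional genes recovered by the multidimensional analysis.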

Fig. 5 Enhanced analysis of the cancer phenotype using an integrative and multidimensional approach. (a) On average, a higher proportion of differential gene expression can be associated with genomic alterations when examining multiple DNA dimensions relative to single dimensions. (b) Using a fixed frequency threshold of 50%, more genes are revealed to be frequently disrupted when multiple mechanisms of genomic alteration (e.g., altered copy number, DNA methylation, or copy number neutral LOH) are considered (∼200 genes versus more than 1,000 genes). (c) Pathway analyses performed using gene lists derived from a multidimensional approach identify a greater number of aberrant pathways than those performed using gene lists from a unidimensional approach. (d) Functional pathways identified using the integrated gene list are of relatively high significance; the top 10 such pathways are shown. This suggests that the additional identified genes associate with specific pathways rather than with random functions. The four bars represent, from left to right: all dimensions, copy number, DNA methylation, and UPD. Ingenuity Pathway Analysis was used for the analyses in c and d. (e) Example of two genes that are missed when a single DNA dimension is studied but captured when multiple DNA dimensions are examined. Both ribonucleotide reductase M2 (RRM2) [255, 256] and retinoic acid receptor responder (tazarotene-induced) 2 (RARRES2) [257, 258] are known to be deregulated in multiple cancer types

Implications on sample size requirements

In the example above, we illustrate that a significant number of genes and pathways exhibit a low frequency of disruption when examining single dimensions (and thus would be overlooked) but, indeed, exhibit a high frequency of disruption when multiple dimensions are considered (Fig. 5). Notably, these findings imply that integrative multidimensional analysis of individual samples may directly impact the cohort sample size required for gene discovery on the basis of frequency of disruption (Fig. 5e). Reduction in sample size requirements means that one can extend this approach to situations involving rare specimens where accrual of hundreds of samples in a reasonable timeframe is not possible. Moreover, reduced sample sizes are particularly applicable to familial cancers or to isolated populations at increased risk for specific cancers.
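Why a higher observed frequency of disruption lowers the required cohort size can be seen with a simple binomial calculation. The following is an illustrative sketch only, not an analysis performed in this study: it assumes that samples are independent, that a gene has a fixed true disruption frequency `p`, and that discovery means observing the gene disrupted in at least `m` samples. The function names, `m`, and the 90% detection probability are hypothetical choices.

```python
from math import comb


def prob_at_least(n: int, m: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p): gene seen disrupted in >= m of n samples."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def min_samples(p: float, m: int, power: float = 0.9, n_max: int = 10_000) -> int:
    """Smallest cohort size giving at least `power` chance of >= m disrupted samples."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    for n in range(m, n_max + 1):
        if prob_at_least(n, m, p) >= power:
            return n
    raise ValueError("no cohort size up to n_max reaches the requested power")


if __name__ == "__main__":
    # Assumed frequencies: 10% by a single mechanism, 40% across all mechanisms.
    for label, p in (("single dimension", 0.10), ("all dimensions", 0.40)):
        print(label, "->", min_samples(p, m=3), "samples")
```

Under this simplified model, a gene disrupted in 40% of tumors when all dimensions are combined needs a markedly smaller cohort to be seen repeatedly than the same gene assessed through a mechanism that affects only 10% of tumors.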

Biallelic gene disruption

Two-hit biallelic inactivation of genes and high-level gene amplifications are typically considered to be causal mechanisms that drive gene expression changes. When examining multiple DNA dimensions, concerted biallelic disruption of a gene in the same sample can be readily identified; copy number loss with hypermethylation resulting in underexpression or copy number gain with hypomethylation and overexpression are examples. Indeed, we do identify genes harboring concerted disruptions using the same lung adenocarcinoma dataset mentioned above. The MUC1 locus exhibits concurrent copy number increase with hypomethylation and overexpression (Fig. 4). MUC1 has previously been shown to be important in lung and breast cancers and is currently a target for therapeutic intervention [259–261]. Collectively, we have demonstrated how an integrative, multidimensional approach can be utilized for cancer gene and pathway discovery.
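The two concerted patterns named above can be expressed as a small rule, sketched here under the assumption that each dimension has already been reduced to a call of -1, 0, or +1 for a gene in a single tumor relative to its matched nonmalignant tissue. The `GeneProfile` type, the encoding, and the labels are hypothetical and serve only to illustrate the logic.

```python
from typing import NamedTuple, Optional


class GeneProfile(NamedTuple):
    """Per-gene calls in one tumor versus matched normal (hypothetical encoding)."""
    copy_number: int  # -1 loss, 0 neutral, +1 gain
    methylation: int  # -1 hypomethylated, 0 unchanged, +1 hypermethylated
    expression: int   # -1 underexpressed, 0 unchanged, +1 overexpressed


def concerted_disruption(g: GeneProfile) -> Optional[str]:
    """Return the concerted pattern if DNA and expression changes agree, else None."""
    if g.copy_number == -1 and g.methylation == 1 and g.expression == -1:
        return "concerted inactivation"
    if g.copy_number == 1 and g.methylation == -1 and g.expression == 1:
        return "concerted activation"
    return None


if __name__ == "__main__":
    # Pattern reported for the MUC1 locus in Fig. 4: gain, hypomethylation,
    # and overexpression in the same sample.
    muc1_like = GeneProfile(copy_number=1, methylation=-1, expression=1)
    print(concerted_disruption(muc1_like))
```

A gene whose copy number and methylation calls point in opposite functional directions, or whose expression is unchanged, is deliberately left unlabeled by this rule.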

4.4 Disruption of multiple components in biological pathways

We described above how an integrative, multidimensional approach improves the detection of disrupted genes, especially those affected by multiple low-frequency mechanisms. This concept can be extended to identify biological pathways, where multiple pathway components are disrupted at low frequencies (see above; Fig. 5d). The EGFR signaling pathway is a well-documented dysregulated component of lung cancer. Using the same multidimensional profiling dataset from Fig. 5 above, seven genes were detected with gene dosage alteration at a frequency ≥30%. However, when we collectively considered alterations in gene dosage, allelic status, DNA methylation, and somatic mutation (the latter for KRAS and EGFR only), 18 genes in the pathway were identified to be altered at ≥30% frequency (Fig. 6). The detection of the additional 11 genes illustrates the benefit of employing an integrative approach and extends the sample size reduction argument to the pathway level.

Fig. 6

Identification of multiple disrupted components in a biological pathway. Integrative analysis identifies more genes affected in the EGFR signaling pathway than single-dimensional analysis alone. In this example, multidimensional profiling data were generated from ten lung adenocarcinomas and their paired noncancerous lung tissue. Analysis of DNA copy number (gene dosage) alterations that affected expression identified seven genes (in green) that are disrupted at ≥30% frequency. However, when alterations in copy number, DNA methylation, sequence mutation, and/or copy-neutral LOH were considered, 17 genes disrupted at ≥30% frequency were found to be associated with a change in expression, with an additional gene, KRAS, harboring frequent mutation. The 11 additional genes are indicated in red. Genes in gray are not significant in this dataset as they did not meet the frequency criterion
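The pathway-level comparison amounts to applying the same ≥30% criterion to each pathway gene twice, once with gene dosage alone and once with any dimension, and then taking the difference. A minimal sketch follows; the gene names and counts are placeholders and do not reproduce the data behind Fig. 6.

```python
THRESHOLD_PERCENT = 30  # the >=30% frequency criterion used in the text
N_SAMPLES = 10          # ten tumors, as in the dataset described above

# Placeholder counts of altered samples per pathway gene (hypothetical values).
altered = {
    "GENE_A": {"dosage": 4, "any_dimension": 6},
    "GENE_B": {"dosage": 1, "any_dimension": 3},
    "GENE_C": {"dosage": 2, "any_dimension": 5},
    "GENE_D": {"dosage": 0, "any_dimension": 2},
}


def genes_passing(counts, key, n_samples=N_SAMPLES):
    """Genes whose alteration frequency for `key` meets the threshold.

    Integer arithmetic avoids floating-point edge cases at exactly 30%.
    """
    return {
        gene
        for gene, c in counts.items()
        if 100 * c[key] >= THRESHOLD_PERCENT * n_samples
    }


dosage_only = genes_passing(altered, "dosage")
integrated = genes_passing(altered, "any_dimension")
additional = integrated - dosage_only  # genes detected only by integration

print(sorted(dosage_only), sorted(integrated), sorted(additional))
```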

5 Tracking clonal expansion in spatial dimensions

Delineating the clonal relationship between multiple tumors in the same patient is relevant not only to clinical management of disease but also to the understanding of metastasis. Multiple tumors in the same patient may not necessarily share an identical genomic profile. The similarities and differences in genomic landscape between tumors are quantifiable and therefore can be used for delineating relatedness. Whole genome comparison based on array CGH profiles is a new tool for distinguishing metastatic from primary synchronous carcinomas. A multitude of genomic features, for example the boundaries of segmental deletions, are used to delineate the presence and the sequence of events in clonal evolution [262–270].
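One way to quantify relatedness from segment boundaries is to score the fraction of breakpoints that two tumors share. The sketch below is an illustrative assumption, not the method of the cited studies: the segment coordinates are invented, the score is a simple Dice-style overlap, and the `tolerance` parameter stands in for the positional uncertainty of the profiling platform.

```python
def breakpoints(segments):
    """Collect the (chromosome, position) boundaries of altered segments."""
    points = set()
    for chrom, start, end in segments:
        points.add((chrom, start))
        points.add((chrom, end))
    return points


def shared_fraction(tumor_a, tumor_b, tolerance=0):
    """Fraction of all breakpoints that have a match in the other tumor."""
    a, b = breakpoints(tumor_a), breakpoints(tumor_b)

    def matched(point, others):
        return any(
            point[0] == other[0] and abs(point[1] - other[1]) <= tolerance
            for other in others
        )

    total = len(a) + len(b)
    if total == 0:
        return 0.0
    shared = sum(matched(p, b) for p in a) + sum(matched(q, a) for q in b)
    return shared / total


# Hypothetical deleted segments as (chromosome, start, end).
primary = [("chr9", 21_000_000, 22_500_000), ("chr3", 50_000_000, 52_000_000)]
second_tumor = [("chr9", 21_000_050, 22_500_000), ("chr17", 7_000_000, 8_000_000)]
independent = [("chr9", 20_000_000, 23_000_000)]

print(shared_fraction(primary, second_tumor, tolerance=100))
print(shared_fraction(primary, independent, tolerance=100))
```

Two deletions that affect the same gene but have different boundaries, as in the `independent` example, contribute nothing to the score, which reflects the reasoning that identical boundaries are unlikely to arise independently.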

Furthermore, signature genetic alterations can be used to track clonality in a cell population, putting genetic events in the context of tumor tissue architecture. When preselected markers are assessed by FISH in individual nuclei on a tissue section, the clustering and expansion of clonally related cells can be delineated by analyzing the marker patterns of neighboring cells (Fig. 7).

Fig. 7

Automated detection of selected clonal populations of cells within a cancer biopsy tissue section. All nuclei (∼150,000 in this example) are detected, and FISH probe signal counts are enumerated for each nucleus. The FISH signal pattern of each cell is compared against those of its neighbors in order to define spatial association (or neighborhood). A mathematical model is then applied to determine clonal cell relationships. a Mapping cancer cells on a tissue section. A gain or loss of any one of three FISH markers indicates a cancer cell. This image shows the density of cancer cells (so defined) in neighborhoods as a color overlay. Red indicates a high fraction of cancer cells, yellow indicates a medium fraction, and blue indicates a low fraction or none (see scale bar). Most of the section is highlighted except for the surrounding normal stromal infiltrates. b Mapping clonal cells. The same image data were analyzed for concurrent gains of each of the three markers. The two clusters of cells, magnified within the white boxes, are cells harboring gain of all three markers
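The per-nucleus rules in the caption (gain or loss of any marker for panel a, gain of all three markers for panel b) and the neighborhood density can be sketched as follows. The mathematical model used for Fig. 7 is not specified here, so the fixed-radius neighborhood, the assumption of two copies per marker in a normal nucleus, and the coordinates and counts below are all hypothetical.

```python
from math import hypot

NORMAL_COPIES = 2  # assumed signal count per marker in a normal diploid nucleus


def is_cancer(counts):
    """Panel a rule: gain or loss of any one marker."""
    return any(c != NORMAL_COPIES for c in counts)


def has_all_gains(counts):
    """Panel b rule: concurrent gain of every marker."""
    return all(c > NORMAL_COPIES for c in counts)


def neighborhood_fraction(nuclei, index, radius, predicate):
    """Fraction of nuclei within `radius` of nucleus `index` (itself included)
    that satisfy `predicate`. Brute force; a spatial index would be needed
    at the scale of a whole tissue section."""
    x0, y0, _ = nuclei[index]
    neighbors = [
        counts for x, y, counts in nuclei if hypot(x - x0, y - y0) <= radius
    ]
    return sum(predicate(c) for c in neighbors) / len(neighbors)


# Hypothetical nuclei as (x, y, (marker1, marker2, marker3) signal counts).
nuclei = [
    (0, 0, (3, 3, 3)),
    (1, 0, (3, 4, 3)),
    (0, 1, (2, 3, 2)),
    (10, 10, (2, 2, 2)),
    (11, 10, (2, 2, 2)),
]

print(neighborhood_fraction(nuclei, 0, 2.0, is_cancer))
print(neighborhood_fraction(nuclei, 0, 2.0, has_all_gains))
print(neighborhood_fraction(nuclei, 3, 2.0, is_cancer))
```

Mapping these fractions back onto nucleus coordinates gives a density overlay of the kind described for panel a, while the stricter rule isolates candidate clonal clusters as in panel b.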

6 Evaluating the biological significance of integrative genomics findings

The utilization of an integrative genomic, epigenomic, and transcriptomic approach will undoubtedly improve our ability to identify gene disruptions and their effects on gene expression. The next challenge is to develop high-throughput approaches for obtaining functional and phenotypic evidence of the biological relevance of such gene disruptions; examples include functional genomic screens by RNAi, proteomic profiling, and metabolite profiling. Forced expression of genes and RNAi knockdown of gene expression are commonly used methods for assessing growth and invasion phenotypes in cell models. Genome-wide RNAi screens, comprising large libraries of short hairpin RNA sequences redundantly targeting thousands of genes, have been used to identify genes essential to tumorigenesis, including tumor suppressor genes as well as genes that cooperate with oncogenic mutations in several malignancies [24, 30, 31, 271–279]. Animal models are also instrumental to functional validation of genes singly or in combination, but this topic is beyond the scope of this article. Cross-referencing genomic findings with proteomic profiles can help determine functional consequences, yielding information on expression levels, posttranslational modification, and protein–protein interactions [280–284]. As recent studies have highlighted the importance of the metabolome in cancer, the genomic landscape can also be integrated with metabolome profiles to determine the role of genetic and epigenetic alterations in cellular physiology relevant to cancer development [285–287].

The progress made in the development of technologies and approaches to analyze the genome, epigenome, and transcriptome has allowed a much improved understanding of cancer landscapes. With the increased application of sequence-based approaches to analyze genetic and epigenetic dimensions, and with the additional complexity of the proteome and metabolome to follow, an unprecedented definition of the cancer cell can be achieved. The next key challenge will be the synthesis of this information to better understand fundamental cancer processes such as progression, metastasis, and drug resistance.