Genomic instability is important for cancer development and can manifest as copy number (CN) gain or loss as well as loss of heterozygosity (LOH). Copy number neutral LOH (CNNLOH) has been observed in tumors following the widespread application of SNP array technology [1, 2]. CNNLOH is common in many tumor types, including basal cell carcinoma [3], acute myeloid leukemia [4, 5], medulloblastoma [6], melanoma [7], follicular lymphoma [8], colorectal cancers [9-11], glioblastoma [12, 13], cutaneous squamous cell carcinomas [14], acute promyelocytic leukemia [15], acute lymphoblastic leukemia [16], ovarian tumor [17], and esophageal adenocarcinoma [18], and has recently been reviewed for myeloid malignancies [19]. CNNLOH is thought to result from mitotic recombination or nondisjunction in somatic tumor cells [3]. However, the distribution of complex DNA alterations and its relation to gene expression in tumors have not been characterized in ESCC.

ESCC is a common malignancy worldwide and one of the most common cancers in the Chinese population; Shanxi Province in north central China has some of the highest esophageal cancer rates in the world [20, 21]. Previously, we identified several regions of LOH and CN alteration in ESCC using microsatellite markers and low- and high-density SNP arrays [22-27], where the majority of ESCC patients from this high-risk population were found to have high genomic instability and high frequency of LOH on several chromosome arms. However, we have not found causal mutations in candidate genes within the LOH regions identified. For example, 82% of 56 ESCCs showed LOH when tested with four microsatellite markers flanking ANXA1 (9q11-q21), but no somatic mutations were detected in these patients [28]. Another example is BRCA2, which also showed frequent LOH in ESCC (57% for D13S260, 83% for D13S767), but only infrequent somatic mutations in these cancer patients (2/56, 3.5%) [29, 30]. Contrary to expectation, expression of BRCA2 was often increased (unpublished data).

In the present study, we analyzed DNA from 30 micro-dissected ESCC tumors, adjacent normal tissue, and blood DNA from the same patient using the Affymetrix 500K SNP array to identify the distribution of complex DNA alterations, including CNNLOH, and we related CNNLOH to expression of the genes affected as assessed with the Affymetrix U133A 2.0 array in these patients.

Methods

Case selection

This study was approved by the Institutional Review Boards of the Shanxi Cancer Hospital and the US National Cancer Institute (NCI). Cases diagnosed with ESCC between 1998 and 2001 in the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, and considered candidates for curative surgical resection were identified and recruited to participate in this study. None of the cases had prior therapy and Shanxi was the ancestral home for all. After obtaining informed consent, cases were interviewed to obtain information on demographics, cancer risk factors (eg, smoking, alcohol drinking, and detailed family history of cancer), and clinical information. The cases evaluated here were part of a larger case-control study of upper gastrointestinal cancers conducted in Shanxi Province [31-33].

Biological specimen collection and processing

Venous blood (10 ml) was taken from each case prior to surgery and germ-line DNA from whole blood was extracted and purified using the standard phenol/chloroform method.

Tumor and adjacent normal tissues were dissected at the time of surgery and stored in liquid nitrogen until used. One 5-micron section was H&E stained and reviewed by a pathologist from the NCI to guide the micro-dissection. Five to ten consecutive 8-micron sections were cut from fresh frozen tumor and adjacent normal tissues. Tumor and normal cells were manually micro-dissected under light microscopy. DNA was extracted from micro-dissected tumor as previously described [34] using the protocol from the Puregene DNA Purification Tissue Kit (Gentra Systems, Inc., Minneapolis, MN). RNA was extracted from 17 of these micro-dissected tumor and matched normal tissue pairs using the protocol from the PureLink Micro-to-Midi Total RNA Purification System (Catalog number 12183-018, Invitrogen, Carlsbad, CA). RNA quality and quantity were determined using the RNA 6000 Labchip/Agilent 2100 Bioanalyzer (Agilent Technologies, Germantown, MD). The same tissue blocks were used for extraction of both DNA and RNA for each case studied.

Target preparation for GeneChip Human Mapping 500 K array set

The Affymetrix GeneChip Human Mapping 500 K array set contains ~262,000 (Nsp I array) and ~238,000 (Sty I array) SNPs (mean probe spacing = 5.8 Kb, mean heterozygosity = 27%). A detailed gene chip protocol can be found at

Experiments were conducted according to the protocol (GeneChip Mapping Assay manual) supplied by Affymetrix, Inc. (Santa Clara, CA). Genotype calls were generated by GTYPE v 4.0 software (Affymetrix). Germ-line, tumor and adjacent normal DNA from each case were run together in parallel in the same experiment (ie, same batch, same day). The GEO accession numbers for these array data are GSE15526 and GSE20347.

Probe preparation and hybridization for Human Genome U133A 2.0 array

The Affymetrix Human Genome U133A 2.0 array is a single array used to interrogate expression of 14,500 well-characterized human genes. Array experiments were performed using 1-5 μg total RNA each. We followed the protocol provided by the manufacturer to carry out reverse transcription, labeling, and hybridization.

GeneChip 500 K array data analysis

Probe intensity data from Affymetrix 500 K SNP arrays were used to identify DNA alterations in the present study. To avoid gender-related issues, SNPs mapped to either the X or Y chromosome were excluded.

Copy number (CN) loss or gain was based on comparisons of either adjacent normal to germ-line DNA or tumor to germ-line DNA. Microarray data were first normalized using the gtype-probeset-genotype package included in Affymetrix Power Tools version 1.85. Each tumor sample was individually normalized via the BRLMM algorithm along with 99 blood samples. These blood samples were obtained from the 30 ESCC cases evaluated in the present study plus 69 healthy controls (age-, sex-, and region-matched to cases) who were all part of a larger case-control study of upper gastrointestinal cancers conducted in Shanxi Province (as noted above). Paired CN analysis was then performed on each sample using the Affymetrix Power Tools paired-copy-number workflow, which implements the Affymetrix Copy Number Analysis Tool (CNAT) algorithm. DNA obtained from the blood of each case served as the normal control; a sliding window of 100 kb was chosen to optimize the identification of extended regions of CN alteration (see The output of the CNAT program is a CN state rather than an absolute CN prediction: a state of 2 corresponds to normal CN; states 0 and 1 correspond to CN loss; and states 3 and 4 correspond to CN gain.
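The mapping from CNAT state to the three CN categories used in this study can be sketched in a few lines of Python. This is a minimal illustration only, not the Affymetrix Power Tools code; the function name `cn_category` is hypothetical.

```python
def cn_category(state: int) -> str:
    """Map a CNAT copy number state (0-4) to 'loss', 'neutral', or 'gain'.

    States 0 and 1 are CN loss, state 2 is normal (neutral) CN,
    and states 3 and 4 are CN gain, as described in the text.
    """
    if state not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected CN state: {state}")
    if state <= 1:
        return "loss"
    if state == 2:
        return "neutral"
    return "gain"


# Show the category assigned to every possible state.
print([(s, cn_category(s)) for s in range(5)])
```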

In the present study, we modified the method for identifying LOH used in our previous studies [26, 27]. Here, LOH was determined using the Affymetrix Power Tools copynumber-pipeline program paired-LOH workflow. Input was *.CHP files generated with the gtype-probeset-genotype package as described above. Matched blood DNA served as the reference for LOH analysis for each tumor and normal adjacent sample.

Combination of LOH and CN alterations

We defined six combinations of copy number state and LOH status. LOH positive loci may have CN loss (CN ≤ 1), be CN neutral (CNNLOH, CN = 2) or show CN gain (CN ≥ 3); likewise, LOH negative loci may show CN loss, gain, or neutrality. LOH and CN segments for each tumor were defined independently for each sample as contiguous blocks of informative SNPs that possessed the same LOH and CN state. Endpoints of LOH/CN segments were defined by informative SNPs. Some uninformative SNPs were located between these LOH/CN segments; we considered these SNPs to have an undefined LOH/CN state (see Additional file 1/Figure S1). Segment sizes were empirically observed from the data.
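The segment definition above can be illustrated with a short Python sketch. It is a simplified reading of the text rather than the pipeline actually used: the `Snp` record, the function names, and the choice to simply skip uninformative SNPs (so that they fall in gaps between segments) are assumptions made for illustration.

```python
from itertools import groupby
from typing import List, NamedTuple, Optional, Tuple


class Snp(NamedTuple):
    position: int          # base-pair coordinate on one chromosome
    cn_state: int          # CNAT state, 0-4
    loh: Optional[bool]    # True/False if informative, None if uninformative


def combined_state(cn_state: int, loh: bool) -> Tuple[str, str]:
    """Return one of the six (LOH status, CN category) combinations."""
    cn = "loss" if cn_state <= 1 else "neutral" if cn_state == 2 else "gain"
    return ("LOH" if loh else "noLOH", cn)


def segments(snps: List[Snp]) -> List[Tuple[int, int, Tuple[str, str]]]:
    """Group informative SNPs into contiguous runs sharing one LOH/CN state.

    Each segment is (start, end, state); both endpoints are informative SNPs.
    Uninformative SNPs are skipped (an assumption of this sketch).
    """
    ordered = sorted(snps, key=lambda s: s.position)
    informative = [s for s in ordered if s.loh is not None]
    result = []
    for state, run in groupby(
        informative, key=lambda s: combined_state(s.cn_state, s.loh)
    ):
        run = list(run)
        result.append((run[0].position, run[-1].position, state))
    return result


# Toy example: the SNP at position 300 is uninformative and lies between
# a CNNLOH segment and an LOH-with-CN-loss segment.
example = [
    Snp(100, 2, True), Snp(200, 2, True), Snp(300, 2, None),
    Snp(400, 1, True), Snp(500, 1, True), Snp(600, 2, False),
]
for seg in segments(example):
    print(seg)
```

In the toy example the uninformative SNP is not assigned to either neighboring segment, which mirrors the "undefined LOH/CN state" described in the text.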

Comparison of CN status in DNA from blood versus micro-dissected adjacent normal tissue

DNA isolated from normal adjacent tissue is frequently used as a control in microarray experiments. In the present study we used DNA isolated from peripheral blood. We expected peripheral blood DNA to be a superior control for two reasons: first, unlike adjacent normal tissue, it does not run the risk of being contaminated with tumor cells; second, adjacent normal tissue may actually be precancerous and contain genetic lesions. To examine whether blood DNA and adjacent normal esophageal DNA were equivalent controls, we compared copy number state calls for blood and normal adjacent tissue from each of the 30 ESCC patients. We found that the two controls were equivalent: 99.29% to 99.99% of all copy number calls were identical. Overall, 99.96% of SNPs in blood and 99.93% in normal adjacent tissue were CN = 2 state.
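A per-patient concordance of this kind can be computed as the percentage of SNPs with identical CN state calls in the two control samples. The sketch below is a hypothetical illustration in Python; the function name and the toy input are assumptions, and the real comparison was run on the array output for each of the 30 patients.

```python
from typing import Sequence


def cn_concordance(blood: Sequence[int], adjacent_normal: Sequence[int]) -> float:
    """Percentage of SNPs whose CN state call is identical in both samples.

    Both sequences must list CN states for the same SNPs in the same order.
    """
    if len(blood) != len(adjacent_normal):
        raise ValueError("samples must cover the same SNPs")
    if not blood:
        raise ValueError("no SNPs to compare")
    identical = sum(b == n for b, n in zip(blood, adjacent_normal))
    return 100.0 * identical / len(blood)


# Toy example with four SNPs, one of which differs between the two controls.
print(cn_concordance([2, 2, 2, 3], [2, 2, 1, 3]))
```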

Human Genome U133A 2.0 array data analysis and relation between CNNLOH and mRNA expression

The Robust Multiarray Average (RMA) algorithm [35, 36] implemented in Bioconductor in R was used for background correction and normalization across all samples. For each sample log2 fold changes in gene expression were calculated by subtracting the adjacent normal RMA value from the corresponding tumor RMA value.
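Because RMA values are already on the log2 scale, the fold change for each probe set reduces to a simple difference. A minimal Python sketch follows; the function name and the probe set identifiers are hypothetical, and the RMA step itself (performed in Bioconductor) is not reproduced here.

```python
from typing import Dict


def log2_fold_changes(
    tumor_rma: Dict[str, float], normal_rma: Dict[str, float]
) -> Dict[str, float]:
    """Tumor minus matched adjacent normal RMA value for each shared probe set."""
    shared = tumor_rma.keys() & normal_rma.keys()
    return {probe: tumor_rma[probe] - normal_rma[probe] for probe in shared}


# Toy values: probe_a is 2-fold higher in tumor (log2 FC = 1.0),
# probe_b is 4-fold lower in tumor (log2 FC = -2.0).
tumor = {"probe_a": 9.0, "probe_b": 6.5}
normal = {"probe_a": 8.0, "probe_b": 8.5}
print(log2_fold_changes(tumor, normal))
```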

To determine whether any gene showed a difference in the tumor versus normal gene expression fold change that was dependent on LOH state, we performed the following steps: (i) First, genes assayed by the U133A microarray were mapped onto each LOH/CN segment of each sample. Map locations of genes were taken from the Affymetrix version na29 microarray annotation file. Note that probe sets from the same gene may have different reference sequences which differ in their chromosomal locations. Also, not every gene will map to every sample - in a particular sample, a gene may map to a gap between LOH/CN regions. (ii) Next, we identified genes for which at least two of the 17 ESCC samples with expression data were LOH negative and at least two samples were LOH positive. (iii) We then performed two-sided unpaired t-tests comparing the log2 fold changes for a probe set in LOH positive and LOH negative samples. A P-value < 0.01 was considered significant. (iv) Finally, SNPs on the 500 K microarray were mapped to the reference sequence for each expression probe set. Since probe sets from the same gene may have different reference sequences, they may differ in the number of SNPs assigned to them (Additional file 2/Figure S2).
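Steps (ii) and (iii) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the analysis code used in the study: the text does not say whether the t-test assumed equal variances, so a pooled-variance Student's t-test is assumed here, and the P-value is obtained by numerically integrating the t density so that only the standard library is needed. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided P-value, using Simpson's rule on the density over [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def loh_expression_test(loh_pos, loh_neg, alpha=0.01, min_samples=2):
    """Compare log2 fold changes between LOH positive and LOH negative samples.

    Returns None if either group has fewer than min_samples values (step ii)
    or if the pooled variance is zero; otherwise returns (t, p, significant).
    """
    n1, n2 = len(loh_pos), len(loh_neg)
    if n1 < min_samples or n2 < min_samples:
        return None
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(loh_pos) + (n2 - 1) * variance(loh_neg)) / df
    if pooled == 0:
        return None
    t = (mean(loh_pos) - mean(loh_neg)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    p = two_sided_p(t, df)
    return t, p, p < alpha


# Toy log2 fold changes for one probe set (invented values for illustration).
print(loh_expression_test([-1.0, -1.2, -0.8, -1.1], [0.1, -0.1, 0.0, 0.2]))
```

With 17 samples the group sizes are small, so a gene that only just meets the two-versus-two requirement has very few degrees of freedom and correspondingly little power.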

Results

In the present study we determined copy number and loss of heterozygosity (LOH) status in DNA isolated from germ-line and micro-dissected tumor and matched adjacent normal samples from 30 ESCC patients using the Affymetrix 500 K SNP array. The average genotype call rate was 96% (range 89-99%): 96% (90-98%) for the 250 K Nsp I array and 95% (89-99%) for the 250 K Sty I array. Genotype call rates were similar for all three tissue types examined. We first analyzed whether copy numbers were similar between DNAs from the two normal tissues: germ-line (blood) and micro-dissected adjacent normal samples. Our analysis indicated that DNA CN values were similar between the two normal tissues (Additional file 3 - Table S1), as expected. Our results indicate that germ-line DNA can be used as a normal control in studies of CN alteration; it is more readily available than matched adjacent normal tissue.

Complex DNA alterations in ESCC

The distribution of DNA alterations in each of the 30 ESCC cases is summarized in Table 1 (with LOH) and in Additional file 4/Table S2 (without LOH). We divided genomic regions into three groups based on CN states: CN loss, neutral, and gain. We found that 50%, 90%, and 93% of cases showed LOH in the CN loss, neutral, and gain groups, respectively (Table 1). For each chromosome, we also calculated the percentage of SNPs involved in LOH for each group. They ranged between 20-57%, 7-100%, and 2-100% for the CN loss, neutral, and gain groups, respectively (Table 1). Our results suggest that LOH with CN neutral or gain are common phenomena in ESCC. For SNPs without LOH, we also calculated the percent of SNPs in each CN state; averages were 5%, 84%, and 11% for CN loss, neutral, and gain, respectively.

Table 1 LOH by copy number in ESCC cases by individual case (N = 30)

The distribution of the six types of DNA alterations for all 30 cases by chromosome arm is shown in Table 2 (with LOH) and Additional file 5/Table S3 (without LOH). CNNLOH was observed on all chromosome arms, but most frequently on 19p (100%), 5p (96%), 2p (95%), and 20q (95%). The highest frequencies of LOH with CN loss (CN = 1) were found on 3p (56%), 5q (47%), and 21q (41%); relatively high frequencies were also seen on 18q (31%), 11q (29%), 1p (28%), 19q (27%), and 11p (25%). LOH with CN gain was most common on 20p (82%), 8q (74%) and 3q (42%) (Table 2 and Figure 1). Taken together, our results show that LOH with CNN or CN gain were much more frequent than LOH with CN loss on every chromosome arm but one (ie, 3p).

Table 2 LOH by copy number in ESCC cases by chromosomal arm (N = 30 cases)
Figure 1

Patterns of loss of heterozygosity and copy number variation in 30 ESCC samples for chromosome 3. Each row (numbered 1 - 30) represents an individual ESCC sample. Circles indicate the positions of SNPs showing LOH. SNP positions are color coded as follows: black indicates copy number neutral LOH; blue indicates LOH accompanied by copy number reduction; red indicates LOH with copy number gain. An ideogram of the chromosome is at the bottom of the figure.

CN alterations in the non-LOH group by chromosome arm are summarized in Additional file 5/Table S3. Briefly, a frequency of CN loss ≥ 10% was observed on eight chromosome arms (3p, 4p, 4q, 5q, 8p, 9p, 11q, and 13q). A frequency of CN gain ≥ 10% was observed on 13 chromosome arms (1q, 2p, 2q, 3q, 5p, 7p, 7q, 8q, 12p, 14q, 18p, 20p, and 20q).

Relation between genomic alterations and gene expression

The average present call rate on the Human Genome U133A array was 53% (range 51-61%) for the 34 chips from the 17 sample pairs with sufficient tissue for RNA isolation and testing. To investigate the relation between LOH/CNV and gene expression levels, we intersected genes on the Affymetrix U133A chip with SNPs on the 500 K SNP array. SNPs that mapped within genes are summarized in Additional file 6/Table S4 and include 169,687 SNPs within 12,225 genes.

We were interested in identifying differentially-expressed genes between LOH and non-LOH groups in genes that were CN neutral. A total of 4,572 genes qualified for this analysis (see Methods). Among these genes, 168 genes showed significant differences in expression between tumors with and without LOH (P < 0.01) (Additional file 7/Table S5). Based on chance alone (at the P < 0.01 level), differences in only 45 genes would be expected; therefore, expression differences were observed in over three times as many genes as expected. One hundred and one (60%) of the 168 genes showed lower expression levels in CNNLOH than in the normal group (ie, CNN, no LOH), whereas 67 genes (40%) showed higher expression levels in CNNLOH (Additional file 7/Table S5). Twenty-eight of the 101 down-regulated genes (32 probes) and 18 of the 67 up-regulated genes (19 probes) showed expression differences ≥ 2-fold (Table 3). These findings suggest that in the CN neutral state, LOH can affect gene expression.
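The chance expectation quoted above follows from multiplying the number of genes tested by the significance threshold. The short Python check below makes the arithmetic explicit; it assumes one test per gene and that every gene is truly null, which is a simplification, and the variable names are illustrative.

```python
genes_tested = 4572   # CN neutral genes that qualified for the analysis
alpha = 0.01          # significance threshold
observed = 168        # genes with P < 0.01

# Expected number of false positives if no gene were truly affected by LOH.
expected_by_chance = genes_tested * alpha

# How many times more significant genes were seen than chance would predict.
enrichment = observed / expected_by_chance

print(f"expected by chance: {expected_by_chance:.1f}")
print(f"observed / expected: {enrichment:.1f}")
```

The expectation comes to roughly 45 to 46 genes, so the observed 168 genes is more than three times the number expected by chance alone.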

Table 3 Comparison of gene expression in copy number neutral (CNN) genes with LOH and without LOH (normal) (N = 46 genes significantly differentially-expressed 2-fold or greater)*

We also compared expression of genes with LOH versus no LOH in CN loss genes. We identified six of 600 genes which showed significantly different expression between the LOH groups. All six genes showed increased expression in tumors with LOH (Table 4a).

Table 4 Comparison of gene expression in copy number loss/gain genes with LOH and without LOH*

Finally, we compared gene expression in the CN gain state between tumors with and without LOH. We found that six of 354 genes showed significant differences in expression between the two groups, including two down-regulated and four up-regulated genes (Table 4b).

Discussion

We characterized ESCC tumors for complex DNA alterations - LOH and CNV - and related these genomic alterations to gene expression. To our knowledge, this is the first report to comprehensively address the distribution of complex DNA alterations in ESCC and its relation to gene expression on a genome-wide scale.

Ninety percent of cases showed CNNLOH in their tumors and, over all cases, CNNLOH was found on every chromosome arm, indicating that it is a common phenomenon.

The frequency of CNNLOH observed here in ESCC was much less than has been reported in other cancers [3-19]. For example, in colon cancer and basal cell carcinoma nearly all LOH was associated with copy number neutral regions [3, 10]. In general, CNNLOH occurs with variable frequency in different genomic regions in tumors of different origin. There are several differences between the study reported here and previous studies which likely influenced the results. First, DNA from micro-dissected tumor and adjacent normal was used in the present study, while either cancer DNA without matched controls or cancer cell lines were used in most other reported studies. Second, we examined LOH and CN alterations using the same SNP array platform, while other studies used SNPs for LOH and CGH arrays for CN analyses. Third, the criteria for identifying LOH differed among the studies reported. Finally, the types of cancers studied previously differ from the present study which is the first report of CNNLOH in ESCC.

In previous LOH studies, we reported high-frequency LOH on several chromosome arms, including 3p, 4p, 4q, 9p, 9q, 13q, 17p, and 17q [23, 26, 27]. By integrating LOH and CN alteration data in the present study, we can now say that the LOH on 3p is primarily due to CN loss LOH, while the LOH on the other seven chromosome arms is predominantly due to CNNLOH.

Our results showed that CNNLOH can change expression levels of genes in ESCC, either increasing or decreasing them. We do not know why CNNLOH changes gene expression, but one possibility is that the two alleles may have different gene expression levels. For example, if allele A expression is greater than allele B, the expression level for the 3 genotypes would be ordered as AA > AB > BB. CNNLOH with retention of two B alleles (genotype BB) would then show lower expression than genotype AB. Conversely, CNNLOH with loss of the allele B would result in two copies of allele A and a higher level of expression than that of AB cells. Another possibility is that the two alleles have different expression due to different epigenetic states, with LOH resulting in copies with two extreme epigenetic states. A third possibility is that one allele harbors a mutation and subsequent LOH leads to a homozygous mutant. Several studies have shown that CNNLOH regions can harbor mutated genes. For example, JAK2 V617F, FLT3-ITD, AML1/RUNX1, WT1, and NPM1 mutations were all found in CNNLOH regions in AML [15]. These various hypotheses merit testing in the future.
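The first hypothesis, that the two alleles differ in expression level, can be made concrete with a toy additive model in Python. The per-allele expression values below are arbitrary illustrative numbers, not measurements, and the additive assumption is a simplification introduced here.

```python
from typing import Dict


def genotype_expression(genotype: str, allele_levels: Dict[str, float]) -> float:
    """Total expression of a two-copy genotype under a simple additive model."""
    return sum(allele_levels[allele] for allele in genotype)


# Hypothetical per-allele expression: allele A is expressed more than allele B.
levels = {"A": 3.0, "B": 1.0}

heterozygous = genotype_expression("AB", levels)   # before CNNLOH
retained_b = genotype_expression("BB", levels)     # CNNLOH retaining allele B
retained_a = genotype_expression("AA", levels)     # CNNLOH retaining allele A

# Copy number is 2 in all three cases, yet expression differs by genotype.
print(retained_a, heterozygous, retained_b)
```

Under this model the same CN neutral event raises or lowers expression depending on which allele is retained, consistent with the mix of up- and down-regulated genes observed.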

The design of the present study has several important features: (i) we compared CN status between DNA from germ-line and micro-dissected adjacent normal tissue; (ii) we used micro-dissected DNA from tumor tissue; (iii) we assessed both LOH and CN alterations simultaneously using the same array platform; and (iv) we integrated complex DNA alterations and gene expression data on a genome-wide level using both high density SNP and expression arrays in the same cases. A noteworthy weakness of our study is the relatively small number of cases evaluated (including a particularly small number of cases with both LOH and RNA expression data to evaluate, due in part to the 500K chip mean heterozygosity of 27%), which limited our power to detect significant differences in loci between LOH and non-LOH groups. In addition, findings for ESCC from this high-risk region may not be generalizable to populations elsewhere in the world.

In summary, we investigated the distribution of complex DNA alterations in ESCCs at the genome-wide level and determined that CN neutral is the most common CN state in LOH, and that CNNLOH is a very common phenomenon overall. Importantly, we also showed that CNNLOH could alter the expression level of genes affected in ESCC.

Conclusions

CNNLOH is a common phenomenon in many cancers, including ESCC, and non-disjunction and/or somatic recombination are the most likely mechanisms for its occurrence. CNNLOH can result in changes in gene expression which are functionally significant. Expression differences in CNNLOH suggest that alleles are different in terms of their gene expression potential, and that these differences may result from differences in genotype and/or epigenetics.