Skip to main content
Log in

Towards the Molecular Characterization of Disease: Comparison of Molecular and Histological Analysis of Esophageal Epithelia

  • Published:
Journal of Gastrointestinal Surgery Aims and scope

Abstract

Reliable quantification of gene expression offers the possibility of more accurate and prognostically relevant characterization of tissues than potentially subjective interpretations of histopathologists. We measured the expression of 18 selected genes and compared them to histological features in a spectrum of esophageal disease to evaluate the feasibility of molecular characterization of normal and pathologic esophageal epithelia. Esophageal tissue biopsies from 82 patients with foregut symptoms were laser capture microdissected, and the expression levels of 18 selected genes were measured by quantitative real-time polymerase chain reaction. Linear discriminant analysis, which uses combinations of genes to distinguish between histological groups, was performed to compare gene expression and the following five histological groups: (1) normal squamous epithelium (n = 35), (2) reflux esophagitis (n = 13), (3) non-dysplastic Barrett’s (n = 33), (4) dysplastic Barrett’s (n = 16), (5) adenocarcinoma (n = 31). A panel of seven genes had 90–94% predictive power to distinguish non-dysplastic and dysplastic Barrett’s esophagus. Clustering analysis revealed structure in gene expression values even in the absence of histology. Expression levels in 17 genes differed significantly across histological groups. Classification based on gene expression agreed with histopathological assessment in the following percentage of cases: normal squamous epithelium = 53%, reflux esophagitis = 31%, non-dysplastic Barrett’s = 76%, dysplastic Barrett’s = 40%, and adenocarcinoma = 59%. Interestingly, predictive power improved markedly when inflammatory and dysplastic tissues were removed (77–94%). Gene expression classification agrees well with histopathological examination. When differences occur, it is unclear whether this effect is due to intraobserver variability in pathological diagnosis or to a genuine difference between gene expression and histopathology.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Figure 1

Similar content being viewed by others

References

  1. Alikhan M, Rex D, Khan A, Rahmani E, Cummings O, Ulbright TM. Variable pathologic interpretation of columnar lined esophagus by general pathologists in community practice. Gastrointest Endosc 1999;50:23–26.

    Article  PubMed  CAS  Google Scholar 

  2. Skacel M, Petras RE, Gramlich TL, Sigel JE, Richter JE, Goldblum JR. The diagnosis of low-grade dysplasia in Barrett’s esophagus and its implications for disease progression. Am J Gastroenterol 2000;95:3383–3387.

    Article  PubMed  CAS  Google Scholar 

  3. Ormsby AH, Petras RE, Henricks WH, Rice TW, Rybicki LA, Richter JE, Goldblum JR. Observer variation in the diagnosis of superficial oesophageal adenocarcinoma. Gut 2002;51:671–676.

    Article  PubMed  CAS  Google Scholar 

  4. Heid CA, Stevens J, Livak KJ, Williams PM. Real time quantitative PCR. Genome Res 1996;6:986–994.

    Article  PubMed  CAS  Google Scholar 

  5. Khan J, Saal LH, Bittner ML, Chen Y, Trent JM, Meltzer PS. Expression profiling in cancer using cDNA microarrays. Electrophoresis 1999;20:223–229.

    Article  PubMed  CAS  Google Scholar 

  6. Ioannidis JP. Microarrays and molecular research: Noise discovery? Lancet 2005;365:454–455.

    PubMed  Google Scholar 

  7. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: A multiple random validation strategy. Lancet 2005;365:488–492.

    Article  PubMed  CAS  Google Scholar 

  8. Abramovitz A, Leyland-Jones B. A systems approach to clinical oncology: Focus on breast cancer. Proteome Sci 2006;4:5–20.

    Article  PubMed  Google Scholar 

  9. Woodward TA, Klingler PD, Genko PV, Wolfe JT. Barrett’s esophagus, apoptosis and cell cycle regulation: Correlation of p53 with Bax, Bcl-2 and p21 protein expression. Anticancer Res 2000;20:2427–2432.

    PubMed  CAS  Google Scholar 

  10. Vallbohmer D, DeMeester SR, Peters JH, Oh DS, Kuramochi H, Shimizu D, Hagen JA, Danenberg KD, Danenberg PV, DeMeester TR, Chandrasoma PT. Cdx-2 expression in squamous and metaplastic columnar epithelia of the esophagus. Dis Esophagus 2006;19:260–266.

    Article  PubMed  CAS  Google Scholar 

  11. Vallbohmer D, Peters JH, Kuramochi H, Oh D, Yang D, Shimizu D, DeMeester SR, Hagen JA, Chandrasoma PT, Danenberg KD, Danenberg PV, DeMeester TR. Molecular determinants in targeted therapy for esophageal adenocarcinoma. Arch Surg 2006;141:476–481.

    Article  PubMed  CAS  Google Scholar 

  12. Vallbohmer D, Peters JH, Oh D, Kuramochi H, Shimizu D, Demeester SR, Hagen JA, Chandrasoma PT, Danenberg KD, DeMeester TR, Danenberg P. Survivin, a potential biomarker in the development of Barrett’s adenocarcinoma. Surgery 2005;138:701–706.

    Article  PubMed  Google Scholar 

  13. Osterheld MC, Bian YS, Bosman FT, Benhattar J, Fontolliet C. Beta-catenin expression and its association with prognostic factors in adenocarcinoma developed in Barrett esophagus. Am J Clin Pathol 2002;117:451–456.

    Article  PubMed  CAS  Google Scholar 

  14. Park WS, Lee JH, Shin MS, Park JY, Kim HS, Kim YS, Park CH, Lee SK, Lee SH, Lee SN, Kim H, Yoo NJ, Lee JY. Inactivating mutations of KILLER/DR5 gene in gastric cancers. Gastroenterology 2001;121:1219–1225.

    Article  PubMed  CAS  Google Scholar 

  15. Yacoub L, Goldman H, Odze RD. Transforming growth factor-alpha, epidermal growth factor receptor, and MiB-1 expression in Barrett’s-associated neoplasia: Correlation with prognosis. Mod Pathol 1997;10:105–112.

    PubMed  CAS  Google Scholar 

  16. Flejou JF, Paraf F, Muzeau F, Fekete F, Henin D, Jothy S, Potet F. Expression of c-erbB-2 oncogene product in Barrett’s adenocarcinoma: Pathological and prognostic correlations. J Clin Pathol 1994;47:23–26.

    Article  PubMed  CAS  Google Scholar 

  17. Lord RV, Salonga D, Danenberg KD, Peters JH, DeMeester TR, Park JM, Johansson J, Skinner KA, Chandrasoma P, DeMeester SR, Bremner CG, Tsai PI, Danenberg PV. Telomerase reverse transcriptase expression is increased early in the Barrett’s metaplasia, dysplasia, adenocarcinoma sequence. J Gastrointest Surg 2000;4:135–142.

    Article  PubMed  CAS  Google Scholar 

  18. Samantaray S, Sharma R, Chattopadhyaya TK, Gupta SD, Ralhan R. Increased expression of MMP-2 and MMP-9 in esophageal squamous cell carcinoma. J Cancer Res Clin Oncol 2004;130:37–44.

    Article  PubMed  CAS  Google Scholar 

  19. Brabender J, Lord RV, Metzger R, Park J, Salonga D, Danenberg KD, Danenberg PV, Holscher AH, Schneider PM. Differential SPARC mRNA expression in Barrett’s oesophagus. Br J Cancer 2003;89:1508–1512.

    Article  PubMed  CAS  Google Scholar 

  20. Salmela MT, Karjalainen-Lindsberg ML, Puolakkainen P, Saarialho-Kere U. Upregulation and differential expression of matrilysin (MMP-7) and metalloelastase (MMP-12) and their inhibitors TIMP-1 and TIMP-3 in Barrett’s oesophageal adenocarcinoma. Br J Cancer 2001;85:383–392.

    Article  PubMed  CAS  Google Scholar 

  21. Lenz HJ, Danenberg KD, Leichman CG, Florentine B, Johnston PG, Groshen S, Zhou L, Xiong YP, Danenberg PV, Leichman LP. p53 and thymidylate synthase expression in untreated stage II colon cancer: Associations with recurrence, survival, and site. Clin Cancer Res 1998;4:1227–1234.

    PubMed  CAS  Google Scholar 

  22. Vallbohmer D, Peters JH, Kuramochi H, Schneider S, Hamoui N, Uchida K, Shimizu D, Chandrasoma PT, Danenberg KD, Danenberg P, DeMeester TR. Candidate biomarkers in the development of Barrett’s adenocarcinoma; CDX-2 & TSPAN-1. 45th Annual meeting of the society for surgery of the alimentary tract (SSAT), DDW New Orleans, 2004a.

  23. Kuramochi H, Vallböhmer D, Uchida K, Schneider S, Hamoui N, Shimizu D, Chandrasoma PT, DeMeester TR, Danenberg KD, Danenberg PV, Peters JH. Quantitative, tissue-specific analysis of cyclooxygenase gene expression in the pathogenesis of Barrett’s adenocarcinoma. J Gastrointest Surg 2004;8:1007–1016.

    Article  PubMed  Google Scholar 

  24. Bonner RF, Emmert-Buck M, Cole K, Pohida T, Chuaqui R, Goldstein S, Liotta LA. Laser capture microdissection: Molecular analysis of tissue. Science 1997;278:1481–1483.

    Article  PubMed  CAS  Google Scholar 

  25. Lord RV, Salonga D, Danenberg KD, Peters JH, DeMeester TR, Park JM, Johansson J, Skinner KA, Chandrasoma P, DeMeester SR, Bremner CG, Tsai PI, Danenberg PV. Telomerase reverse transcriptase expression is increased early in the Barrett’s metaplasia, dysplasia, adenocarcinoma sequence. J Gastrointest Surg 2000;4:135–142.

    Article  PubMed  CAS  Google Scholar 

  26. Hanawa M, Suzuki S, Dobashi Y, Yamane T, Kono K, Enomoto N, Ooi A. EGFR protein overexpression and gene amplification in squamous cell carcinomas of the esophagus. Int J Cancer 2006;118:1173–1180.

    Article  PubMed  CAS  Google Scholar 

  27. Hulscher JB, Haringsma J, Benraadt J, Offerhaus GJ, ten Kate FJ, Baak JP, Tytgat GN, van Lanschot JJ. Barrett advisory committee comprehensive cancer centre Amsterdam Barrett advisory committee: First results. Neth J Med 2001;58:3–8.

    Article  PubMed  CAS  Google Scholar 

  28. Oberg S, Wenner J, Johansson J, Walther B, Willen R. Barrett esophagus: Risk factors for progression to dysplasia and adenocarcinoma. Ann Surg 2005;242(1):49–54.

    Article  PubMed  Google Scholar 

  29. Tusher V, Tibshirani R, Chu C. Significance analysis of microarrays applied to ionizing radiation response. PNAS 2001;98:5116–5121.

    Article  PubMed  CAS  Google Scholar 

  30. Brabender, Marjoram P, Salonga D, Metzger R, Schneider PM, Park JM, Schneider S, Holscher AH, Yin J, Meltzer SJ, Danenberg KD, Danenberg PV, Lord RV. A multigene expression panel for the molecular diagnosis of Barrett’s esophagus and Barrett’s adenocarcinoma of the esophagus. Oncogene 2004;23:4780–4788.

    Article  PubMed  Google Scholar 

  31. Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene expression data. Proc Natl Acad Sci 2002;99:6562–6566.

    Article  PubMed  CAS  Google Scholar 

  32. Hartigan JA. Clustering Algorithms. New York: Wiley, 1975.

    Google Scholar 

Download references

Acknowledgment

This paper was supported by NIH RO1-CA84424-02 (JHP).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jeffrey H. Peters.

Additional information

Daniel Vallböhmer and Paul Marjoram contributed equally to the paper.

Appendix

Appendix

A more lengthy discussion of the statistical techniques used in this analysis is provided in the following.

SAM/t test

The analysis was begun by identifying genes that have significantly changed expression for one or more stages of the disease. After log-transforming the data, three methods to do this were used: t tests, Wilcoxon rank sum tests, and SAM.29 SAM is a package that uses a function of the change in gene expression, related to the standard deviation, and then tests the significance of the calculated value via permutation tests.29

LDA

An LDA attempts to find vectors, consisting of linear combinations of gene expression values, which are able to successfully discriminate between samples by correctly assigning them to their known histological groups. This is an example of a “supervised clustering” method. Other examples include neural nets and multinomial regression. In previous work, an LDA was found to perform as well as, or better than, these other methods.30

In a context such as in this study, in which there are relatively many genes, one has to be aware of the issue of overfitting; that is, given enough genes, it might be expected to explain any classification, even for data that represent nothing but noise. Two schemes were employed to guard against this. Firstly, when fitting the LDA, the cross-validation error rate as an assessment of goodness-of-fit was used. Secondly, the results were compared to those obtained from analyzing data sets in which the histologies were randomly permuted (Fig. 2). These approaches are briefly explained in the following.

Figure 2
figure 2

Histogram of frequency of cross-validated error rates in 1,000 permuted data sets (error rate for fit of observed data shown in red).

Cross-validation

In a cross-validation test for a given model, one sample is left out of the data set and fit an LDA for the remaining samples. Then the omitted sample is reintroduced and checked whether the fitted LDA can correctly predict the type of this sample. For a sample of size n, this procedure is repeated n times, leaving out a different sample each time. Then the overall error rate (the proportion of samples that have their histology incorrectly predicted) for all n cases, as well as reporting a breakdown of this error rate by histological grouping, is reported.

Permutation Test

In the permutation test, the data set is taken and then randomly permuted the histologic labeling of the samples. Thus, a new data set is created in which the frequency of each histologic group is the same as for the original data, but for which the relationship between histology and gene expression is no longer likely to hold. Therefore, these permuted data sets are considered to represent noise. One thousand permuted data sets are generated using this scheme, and then an LDA on each data set is performed. The error rates for these data are reported, and the results to the error rate obtained on the original data are compared. This gives a measure of significance for our LDA fit.

Principal Components

Principal components analysis is a standard statistical technique that finds linear combinations of gene expression values that capture as much of the variation in the data as possible. It is analogous to an LDA in which the histological information is not used. The principal components can then be used as a means of data reduction to represent multivariate data in plots involving fewer (in our case 3) dimensions.

Test to Identify “Best Models of Genes”

As well as being interested in the fit obtained from an LDA using all genes, it is also interesting to know which subset of genes is most accurate in predicting histologic group. Somewhat counter-intuitively, it is not likely to be the case that the model containing all genes will predict histology most accurately.31 The reason for this is that if, for example, five genes are sufficient to characterize the data, extra genes that are added to the analysis will merely model the remaining noise and will thus decrease the accuracy when they are also used to predict the type of future samples. Given that there were just 17 potentially useful genes, it was possible to exhaustively explore all two17 possible LDA models and determine those that were best-fitting. This was performed using an R-script (available from the authors).

An important note regarding estimated error rates in such an exhaustive analysis should be noted. For a single LDA, the cross-validation error rate gives an accurate and unbiased estimate of the expected predictive error of the fitted model. However, when many such models are compared and those with the lowest error rates are selected, these error rates are now expected to underestimate the future predictive error rate of the given models. Thus, the analysis should be interpreted as a way of finding the models that best predict the data, and a ranking of models in terms of their predictive power, rather than as an unbiased estimate of the exact error rates of those models. For a discussion of these issues, see Ambroise and McLachlan.31

K-means Clustering

To compare the study results with those that might be obtained by an analysis that does not use the histologic information, a further (so-called ‘unsupervised’) analysis was conducted. For this, K-means clustering was used, a method that clusters data based upon correlations between gene expression without knowledge of the true histology and that has been widely used to analyze microarray data.32 When using K-means clustering, one has to pre-specify the number of groups (K) into which the data is clustered. Informally speaking, the method then constructs K “centers” and clusters each sample to its closest center. In the context of our study, these centers are vectors of gene expression values. The algorithm will report the clustering corresponding to the optimal set of centers.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Vallböhmer, D., Marjoram, P., Kuramochi, H. et al. Towards the Molecular Characterization of Disease: Comparison of Molecular and Histological Analysis of Esophageal Epithelia. J Gastrointest Surg 11, 1095–1104 (2007). https://doi.org/10.1007/s11605-007-0208-x

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11605-007-0208-x

Keywords

Navigation