Abstract
In many gene expression studies, the goals include discovery of novel biological classes and identification of genes whose expression can reliably be associated with these classes. Here we present a statistical analysis approach to facilitate both of these goals. The key idea is to model gene expression using latent categories that can be interpreted as a gene being turned “on” or “off” compared to a baseline level of expression. This three-way categorization is used for defining a reference in the unsupervised setting, for removing noise prior to clustering, for defining molecular subclasses in a way that is portable across platforms, and for defining easily interpretable probability-based distance measures for visualization, mining, and clustering.
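The three-way latent categorization can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that baseline expression is normally distributed and that "on" and "off" values are spread uniformly above and below the baseline mean. The function names, the parameter values, and the mean-absolute-difference distance are hypothetical choices for illustration, and the parameters are fixed by hand rather than estimated from data.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def three_way_posterior(x, mu=0.0, sigma=1.0, width=6.0, pi_off=0.1, pi_on=0.1):
    """Return (p_off, p_base, p_on) for one expression value x.

    Assumed model (illustrative only): baseline ~ Normal(mu, sigma),
    "on" ~ Uniform(mu, mu + width), "off" ~ Uniform(mu - width, mu),
    with prior weights pi_off, pi_on and 1 - pi_off - pi_on.
    """
    pi_base = 1.0 - pi_off - pi_on
    f_base = pi_base * normal_pdf(x, mu, sigma)
    f_on = pi_on / width if mu < x <= mu + width else 0.0
    f_off = pi_off / width if mu - width <= x < mu else 0.0
    total = f_off + f_base + f_on
    if total == 0.0:
        # x lies outside the uniform ranges and the normal density underflowed.
        raise ValueError("x is outside the range covered by the assumed model")
    return f_off / total, f_base / total, f_on / total

def signed_score(x, **params):
    """Probability of "on" minus probability of "off", a value in [-1, 1]."""
    p_off, _, p_on = three_way_posterior(x, **params)
    return p_on - p_off

def distance(sample_a, sample_b, **params):
    """Hypothetical probability-based distance between two samples:
    the mean absolute difference of their per-gene signed scores."""
    diffs = [abs(signed_score(a, **params) - signed_score(b, **params))
             for a, b in zip(sample_a, sample_b)]
    return sum(diffs) / len(diffs)
```

Under these assumptions, a value at the baseline mean scores 0, values well above it score close to 1, and values well below it score close to -1, so the scores are on a common scale regardless of the original measurement units.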
Copyright information
© 2003 Springer-Verlag New York, Inc.
About this chapter
Cite this chapter
Garrett, E.S., Parmigiani, G. (2003). POE: Statistical Methods for Qualitative Analysis of Gene Expression. In: Parmigiani, G., Garrett, E.S., Irizarry, R.A., Zeger, S.L. (eds) The Analysis of Gene Expression Data. Statistics for Biology and Health. Springer, New York, NY. https://doi.org/10.1007/0-387-21679-0_16
DOI: https://doi.org/10.1007/0-387-21679-0_16
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-95577-3
Online ISBN: 978-0-387-21679-9
eBook Packages: Springer Book Archive