Classical Statistical Approaches to Molecular Classification of Cancer from Gene Expression Profiling
Recent literature on microarray technology has stressed the need to incorporate classical statistical practice into experimental design so that more robust, classical statistical methods can be used in data analysis. We demonstrate that classical statistical methods are applicable to the data previously presented by Golub et al. (1999). Our preliminary analysis of all 6817 genes uses simple t-tests for statistically significant separation of mean gene expression levels between the two cancer types. The resulting subsets of genes that distinguish AML from ALL are relatively consistent with those published by Golub et al. We select predictor genes based on their t-values and on stepwise discriminant analysis, and evaluate the resulting model’s performance in predicting 34 test samples by linear discriminant analysis. With the 25 predictor genes we chose, only two samples (samples 61 and 66) were predicted incorrectly. We also evaluate the parsimony of our model by using a stepwise method to determine the minimum number of genes required to maintain a high level of accuracy in predicting cancer type.
Key words: discriminant analysis, stepwise, leukemia, SAS, PROC DISCRIM
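The per-gene t-test screening described in the abstract can be illustrated with a minimal sketch. It is not the authors' SAS code: it assumes a pooled-variance (equal-variance) two-sample t statistic, which the abstract does not specify, and the gene names, expression values, and the helper names `t_statistic` and `rank_genes` are hypothetical. It covers only the ranking step, not the stepwise discriminant analysis or the linear discriminant classifier.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic (assumed form of the t-test)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def rank_genes(expr, labels, top_k):
    """Return the top_k genes ordered by decreasing |t| between ALL and AML.

    expr   : dict mapping gene name -> list of expression values, one per sample
    labels : list of class labels ("ALL" or "AML"), one per sample
    """
    scores = {}
    for gene, values in expr.items():
        all_vals = [v for v, lab in zip(values, labels) if lab == "ALL"]
        aml_vals = [v for v, lab in zip(values, labels) if lab == "AML"]
        scores[gene] = t_statistic(all_vals, aml_vals)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_k]


# Toy data, invented for illustration only (not taken from Golub et al.).
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expr = {
    "gene_A": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # higher in AML
    "gene_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    "gene_C": [7.0, 6.5, 7.2, 3.0, 3.4, 2.9],  # higher in ALL
}

top_genes = rank_genes(expr, labels, top_k=2)
print(top_genes)
```

In a full analysis the genes selected this way would serve as candidate predictors for the stepwise and linear discriminant steps, which this sketch leaves out.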
- Dudoit, Sandrine, Yang, Yee Hwa, Callow, Matthew J., and Speed, Terence P., 2000, Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical Report #578, http://www.stat.Berkeley.EDU/users/terry/zarray/html/matt.html
- Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D. and Lander, E.S., 1999. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, Vol 286, pp 531–537.
- Golub’s web site (www.genome.wi.mit.edu/MPR)
- Hilsenbeck, Susan G., Friedrichs, William E., Schiff, Rachel, O’Connell, Peter, Hansen, Rhonda K., Osborne, Kent, and Fuqua, Suzanne A.W., 1999, Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. J Nat. Cancer Inst, Vol 91.5, pp 453–459.
- Kaminski, Naftali, Allard, John D., Pittet, Jean F., Zuo, Fengrong, Griffiths, Mark J.D., Morris, David, Huang, Xiaozhu, Sheppard, Dean, and Heller, Renu A., 2000, Global analysis of gene expression in pulmonary fibrosis reveals distinct programs regulating lung inflammation and fibrosis. PNAS, Vol 97.4, pp 1778–1783.
- Kerr, M.K. and Churchill, G.A., 2001, Statistical design and the analysis of gene expression microarray data. Genet. Res. Apr: 77(2), pp 123–128.
- PUBMED (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi)
- SAS/STAT User’s Guide (V6.04), 1990. SAS Institute, Inc., Cary, NC, USA.