Classical Statistical Approaches to Molecular Classification of Cancer from Gene Expression Profiling

  • Jun Lu
  • Sarah Hardy
  • Wen-Li Tao
  • Spencer Muse
  • Bruce Weir
  • Susan Spruill

Abstract

Recent literature on microarray technology has emphasized the need to incorporate classical statistical practices into experimental design so that more robust, classical statistical methodologies can be used in data analysis. We demonstrate that classical statistical methods are applicable to the data previously presented by Golub et al. (1999). Our preliminary analysis of all 6817 genes uses simple t-tests to identify statistically significant separation of mean gene expression levels between the two cancer types. The subsets of genes we find to distinguish AML from ALL are relatively consistent with those published by Golub et al. We select predictor genes based on their t-values and on stepwise discriminant analysis, and we evaluate the resulting model's performance in predicting 34 test samples by linear discriminant analysis. With the 25 predictor genes we chose, only two samples (61 and 66) were not correctly predicted. We also assess the parsimony of our model by using a stepwise method to determine the minimum number of genes required to maintain a high level of accuracy in predicting cancer type.
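The gene-selection and prediction steps described in the abstract can be illustrated with a small sketch. It is not the authors' SAS procedure: it ranks genes by a pooled-variance two-sample t-statistic (the abstract says only "simple t-tests", so the pooled form is an assumption), omits the stepwise selection step entirely, and replaces full linear discriminant analysis with a simplified diagonal variant that ignores correlations between genes and assumes equal class priors. The data are synthetic, and all function names, sample sizes, and parameters are hypothetical.

```python
import math
import random
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t-statistic with a pooled variance estimate (assumed form)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def rank_genes(expr, labels, k):
    """Indices of the k genes with the largest |t| between ALL and AML."""
    scores = []
    for g in range(len(expr[0])):
        all_vals = [row[g] for row, y in zip(expr, labels) if y == "ALL"]
        aml_vals = [row[g] for row, y in zip(expr, labels) if y == "AML"]
        scores.append((abs(pooled_t(all_vals, aml_vals)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def fit(expr, labels, genes):
    """Class means and pooled per-gene variances on the selected genes."""
    classes = sorted(set(labels))
    means = {
        c: [mean(row[g] for row, y in zip(expr, labels) if y == c) for g in genes]
        for c in classes
    }
    dof = len(expr) - len(classes)
    pooled = [
        sum((row[g] - means[y][j]) ** 2 for row, y in zip(expr, labels)) / dof
        for j, g in enumerate(genes)
    ]
    return means, pooled


def predict(model, genes, row):
    """Assign the class whose mean is nearest in variance-scaled distance."""
    means, pooled = model

    def distance(c):
        return sum((row[g] - m) ** 2 / v for g, m, v in zip(genes, means[c], pooled))

    return min(means, key=distance)


def make_synthetic(rng, n_per_class, n_genes=20, informative=(0, 1, 2), shift=6.0):
    """Synthetic expression data: only the 'informative' genes differ by class."""
    expr, labels = [], []
    for label in ("ALL", "AML"):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == "AML":
                for g in informative:
                    row[g] += shift
            expr.append(row)
            labels.append(label)
    return expr, labels


rng = random.Random(0)
train_x, train_y = make_synthetic(rng, n_per_class=10)
test_x, test_y = make_synthetic(rng, n_per_class=5)

selected = rank_genes(train_x, train_y, k=3)
model = fit(train_x, train_y, selected)
predicted = [predict(model, selected, row) for row in test_x]
errors = sum(p != y for p, y in zip(predicted, test_y))
print("selected genes:", sorted(selected))
print("misclassified test samples:", errors)
```

Genes are ranked on the training samples only and the held-out samples are used purely for evaluation, mirroring the train-then-predict design the abstract describes. Varying `k` gives a rough feel for the parsimony question, although a proper stepwise procedure would add or drop genes based on their conditional contribution rather than their marginal t-value.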

Key words

discriminant analysis, stepwise, leukemia, SAS PROC DISCRIM

References

  1. Bittner, Michael, Meltzer, Paul, and Trent, Jeffrey, 1999, Data analysis and integration: of steps and arrows. Nature Genetics, Vol 22, pp 213–215.
  2. Dudoit, Sandrine, Yang, Yee Hwa, Callow, Matthew J., and Speed, Terence P., 2000, Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical Report #578, http://www.stat.Berkeley.EDU/users/terry/zarray/html/matt.html
  3. Duggan, David J., Bittner, Michael, Chen, Yidong, Meltzer, Paul, and Trent, Jeffrey M., 1999, Expression profiling using cDNA microarrays. Nature Genetics, Vol 21, pp 10–14.
  4. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., and Lander, E.S., 1999, Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, Vol 286, pp 531–537.
  5. Golub’s web site (www.genome.wi.mit.edu/MPR)
  6. Hilsenbeck, Susan G., Friedrichs, William E., Schiff, Rachel, O’Connell, Peter, Hansen, Rhonda K., Osborne, Kent, and Fuqua, Suzanne A.W., 1999, Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. J Nat. Cancer Inst, Vol 91.5, pp 453–459.
  7. Kaminski, Naftali, Allard, John D., Pittet, Jean F., Zuo, Fengrong, Griffiths, Mark J.D., Morris, David, Huang, Xiaozhu, Sheppard, Dean, and Heller, Renu A., 2000, Global analysis of gene expression in pulmonary fibrosis reveals distinct programs regulating lung inflammation and fibrosis. PNAS, Vol 97.4, pp 1778–1783.
  8. Kerr, M.K. and Churchill, G.A., 2001, Statistical design and the analysis of gene expression microarray data. Genet. Res., Apr: 77(2), pp 123–128.
  9. PUBMED (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi)
  10. SAS/STAT User’s Guide (V6.04), 1990. SAS Institute, Inc., Cary, NC, USA.

Copyright information

© Springer Science+Business Media New York 2002

Authors and Affiliations

  • Jun Lu (1)
  • Sarah Hardy (1)
  • Wen-Li Tao (1)
  • Spencer Muse (1)
  • Bruce Weir (1)
  • Susan Spruill (2)

  1. Program in Bioinformatics, NC State University, USA
  2. PPGx, Inc., a subsidiary of DNA Sciences, USA
