Event Models for Tumor Classification with SAGE Gene Expression Data

  • Xin Jin
  • Anbang Xu
  • Guoxing Zhao
  • Jixin Ma
  • Rongfang Bie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3992)

Abstract

Serial Analysis of Gene Expression (SAGE) is a relatively new method for monitoring gene expression levels and is expected to contribute significantly to progress in cancer treatment by enabling precise and early diagnosis. A promising application of SAGE gene expression data is the classification of tumors. In this paper, we build three event models (the multivariate Bernoulli model, the multinomial model and the normalized multinomial model) for SAGE data classification. Both binary classification and multicategory classification are investigated. Experiments on two SAGE datasets show that the multivariate Bernoulli model performs well at small feature sizes, the normalized multinomial model performs well at medium feature sizes, and the multinomial model performs better at large feature sizes. The multinomial model achieves the highest overall accuracy.
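The three event models named in the abstract can be illustrated with a small naive Bayes sketch. This is not the authors' implementation: the function names, the Laplace smoothing constant `alpha`, and the toy tag counts are all hypothetical, and the abstract does not define the normalization used by the normalized multinomial model, so L2 normalization of each library's tag-count vector is assumed here purely for illustration.

```python
import math

def train_multinomial(X, y, alpha=1.0):
    """Multinomial event model: features are tag counts (possibly fractional).
    alpha is an assumed Laplace smoothing constant."""
    n_feat = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        totals = [sum(r[j] for r in rows) for j in range(n_feat)]
        denom = sum(totals) + alpha * n_feat
        model[c] = {
            "log_prior": math.log(len(rows) / len(X)),
            "log_theta": [math.log((t + alpha) / denom) for t in totals],
        }
    return model

def predict_multinomial(model, x):
    def score(c):
        m = model[c]
        return m["log_prior"] + sum(v * lt for v, lt in zip(x, m["log_theta"]))
    return max(model, key=score)

def train_bernoulli(X, y, alpha=1.0):
    """Multivariate Bernoulli event model: only presence or absence of a tag counts."""
    n_feat = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        p = [(sum(1 for r in rows if r[j] > 0) + alpha) / (len(rows) + 2 * alpha)
             for j in range(n_feat)]
        model[c] = {"log_prior": math.log(len(rows) / len(X)), "p": p}
    return model

def predict_bernoulli(model, x):
    def score(c):
        m = model[c]
        return m["log_prior"] + sum(
            math.log(p) if v > 0 else math.log(1.0 - p)
            for v, p in zip(x, m["p"]))
    return max(model, key=score)

def l2_normalize(x):
    """Assumed normalization step: scale a count vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)

# Hypothetical toy data: rows are SAGE libraries, columns are tag counts.
X = [[10, 0, 1], [8, 1, 0], [0, 9, 2], [1, 7, 3]]
y = ["tumor", "tumor", "normal", "normal"]
query = [6, 0, 0]

pred_multinomial = predict_multinomial(train_multinomial(X, y), query)
pred_bernoulli = predict_bernoulli(train_bernoulli(X, y), query)
# Normalized multinomial (as assumed here): normalize, then reuse the multinomial model.
X_norm = [l2_normalize(x) for x in X]
pred_normalized = predict_multinomial(train_multinomial(X_norm, y), l2_normalize(query))
```

The Bernoulli model discards count magnitudes, while the two multinomial variants use them; in this sketch the normalized variant differs only in rescaling each library before training and prediction, which reduces the influence of libraries with very large total tag counts.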

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xin Jin (1)
  • Anbang Xu (1)
  • Guoxing Zhao (2, 3)
  • Jixin Ma (3)
  • Rongfang Bie (1)
  1. College of Information Science and Technology, Beijing Normal University, Beijing, P.R. China
  2. School of Mathematical Sciences, Beijing Normal University, Beijing, P.R. China
  3. School of Computing and Mathematical Science, University of Greenwich, London, U.K.