Gene Selection and Classification of Human Lymphoma from Microarray Data

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNBI, volume 3745)

Abstract

DNA microarray experiments provide information on thousands of genes, and bioinformatics researchers have analyzed these data with various machine learning techniques to diagnose diseases. Recently, Support Vector Machines (SVM) have been shown to be an effective tool for analyzing microarray data. Previous work involving SVM used every gene in the microarray to classify normal and malignant lymphoid tissue. This paper shows that, with gene selection techniques that retain only 10% of the genes in the “Lymphochip” (a DNA microarray developed at Stanford University School of Medicine), a classification accuracy of about 98% is achieved, which is comparable to the performance obtained using every gene. The paper thus demonstrates the usefulness of feature selection techniques in conjunction with SVM for improving its performance in analyzing Lymphochip microarray data. The improvement was evident in terms of better accuracy, ROC (receiver operating characteristic) analysis, and faster training. Using the selected subsets of Lymphochip genes, the paper then compares the performance of SVM against two other well-known classifiers: multi-layer perceptron (MLP) and linear discriminant analysis (LDA). Experimental results show that SVM outperforms the other two classifiers.
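The abstract does not say which gene selection techniques were used, so the following is only a minimal sketch of one common filter-style approach, assumed here for illustration: rank each gene by the absolute Welch t-statistic between the two tissue classes and keep the top 10%. The function names, the synthetic data, and the choice of statistic are all hypothetical and are not taken from the paper.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two lists of expression values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def select_top_genes(samples, labels, fraction=0.10):
    """Rank genes by |t| between the two classes and keep the top fraction.

    samples: list of per-sample expression vectors (one value per gene)
    labels:  list of 0/1 class labels (e.g. normal vs. malignant)
    Returns the sorted column indices of the selected genes.
    """
    n_genes = len(samples[0])
    scored = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(samples, labels) if y == 1]
        neg = [row[g] for row, y in zip(samples, labels) if y == 0]
        scored.append((welch_t(pos, neg), g))
    keep = max(1, round(fraction * n_genes))
    scored.sort(reverse=True)  # highest-scoring genes first
    return sorted(g for _, g in scored[:keep])


def make_toy_data(n_per_class=20, n_genes=200, n_informative=10, seed=0):
    """Synthetic stand-in for microarray data (NOT the Lymphochip data).

    The first n_informative genes are shifted upward in class 1;
    all remaining genes are pure noise.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for y in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if y == 1:
                for g in range(n_informative):
                    row[g] += 3.0
            samples.append(row)
            labels.append(y)
    return samples, labels


samples, labels = make_toy_data()
selected = select_top_genes(samples, labels, fraction=0.10)

# Reduced expression matrix containing only the selected gene columns.
reduced = [[row[g] for g in selected] for row in samples]
print(len(selected), "of", len(samples[0]), "genes kept")
```

The reduced matrix is what would then be passed to a classifier such as SVM, MLP, or LDA. In a real evaluation the ranking would normally be computed on the training portion of each split only, since selecting genes with the test samples included tends to inflate the reported accuracy.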

Keywords

  • Support Vector Machine
  • Linear Discriminant Analysis
  • Receiver Operating Characteristic
  • Receiver Operating Characteristic Curve
  • Hidden Node

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kamruzzaman, J., Lim, S., Gondal, I., Begg, R. (2005). Gene Selection and Classification of Human Lymphoma from Microarray Data. In: Oliveira, J.L., Maojo, V., Martín-Sánchez, F., Pereira, A.S. (eds) Biological and Medical Data Analysis. ISBMDA 2005. Lecture Notes in Computer Science, vol 3745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573067_38

  • DOI: https://doi.org/10.1007/11573067_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29674-4

  • Online ISBN: 978-3-540-31658-9

  • eBook Packages: Computer Science (R0)