Abstract
DNA microarray experiments provide information on thousands of genes, and bioinformatics researchers have analyzed these data with various machine learning techniques to diagnose diseases. Recently, Support Vector Machines (SVM) have been shown to be an effective tool for analyzing microarray data. Previous work involving SVM used every gene in the microarray to classify normal and malignant lymphoid tissue. This paper shows that, with gene selection techniques that retain only 10% of the genes in the “Lymphochip” (a DNA microarray developed at Stanford University School of Medicine), a classification accuracy of about 98% is achieved, which is comparable to the performance obtained using every gene. The paper thus demonstrates the usefulness of feature selection techniques in conjunction with SVM for improving its performance in analyzing Lymphochip microarray data. The improvement is evident in accuracy, receiver operating characteristic (ROC) analysis, and training time. Using the selected subsets of Lymphochip genes, the paper then compares the performance of SVM against two other well-known classifiers: multi-layer perceptron (MLP) and linear discriminant analysis (LDA). Experimental results show that SVM outperforms the other two classifiers.
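The abstract does not say which gene selection technique was used, so the following is only an illustrative sketch of one common filter-style approach: rank each gene by the absolute Welch t-statistic between the two classes and keep the top 10%. The data are synthetic, and the function names (`welch_t`, `select_top_genes`, `make_synthetic`) and all parameter values are assumptions made for this example, not taken from the paper.

```python
import random
import statistics


def welch_t(a, b):
    """Absolute Welch t-statistic between two lists of expression values."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    denom = (va + vb) ** 0.5
    if denom == 0.0:
        return 0.0
    return abs(statistics.mean(a) - statistics.mean(b)) / denom


def select_top_genes(samples, labels, fraction=0.10):
    """Return sorted indices of the top `fraction` of genes ranked by |t|.

    samples: list of per-sample expression vectors (one value per gene)
    labels:  list of 0/1 class labels (e.g. normal vs. malignant)
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        pos = [s[g] for s, y in zip(samples, labels) if y == 1]
        neg = [s[g] for s, y in zip(samples, labels) if y == 0]
        scores.append((welch_t(pos, neg), g))
    scores.sort(reverse=True)  # highest |t| first
    keep = max(1, int(round(fraction * n_genes)))
    return sorted(g for _, g in scores[:keep])


def make_synthetic(n_samples=40, n_genes=200, n_informative=10, seed=0):
    """Synthetic stand-in for microarray data (assumption, not Lymphochip)."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    samples = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 3.0 * y  # shift the informative genes in class 1
        samples.append(row)
    return samples, labels


samples, labels = make_synthetic()
selected = select_top_genes(samples, labels, fraction=0.10)
# Reduced expression matrix: same samples, only the selected genes.
reduced = [[row[g] for g in selected] for row in samples]
print(len(selected), "of", len(samples[0]), "genes kept")
```

The reduced matrix is what would then be passed to a classifier such as an SVM, MLP, or LDA. In a real evaluation the ranking should be computed on the training folds only, so that the selection step does not see the test samples.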
Keywords
- Support Vector Machine
- Linear Discriminant Analysis
- Receiver Operating Characteristic
- Receiver Operating Characteristic Curve
- Hidden Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kamruzzaman, J., Lim, S., Gondal, I., Begg, R. (2005). Gene Selection and Classification of Human Lymphoma from Microarray Data. In: Oliveira, J.L., Maojo, V., Martín-Sánchez, F., Pereira, A.S. (eds) Biological and Medical Data Analysis. ISBMDA 2005. Lecture Notes in Computer Science, vol 3745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573067_38
DOI: https://doi.org/10.1007/11573067_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29674-4
Online ISBN: 978-3-540-31658-9
eBook Packages: Computer Science, Computer Science (R0)