An Empirical Comparison of Feature Reduction Methods in the Context of Microarray Data Classification

Kestler, Hans A.; Müssel, Christoph

doi:10.1007/11829898_24

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4087)

Abstract

Distinguishing cancerous from benign processes in the body is often a difficult diagnostic problem in the clinical setting, yet it is of major importance for the treatment of patients. Measuring the expression of a large number of genes with DNA microarrays may help here. A single experiment can measure the expression levels of several thousand genes, but usually only a few dozen experiments are carried out. The resulting data sets have very high dimensionality (many genes) and low cardinality (few samples). In this situation, feature reduction techniques that lower the dimensionality of the data are essential for building predictive tools based on classification.
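A short numerical check makes the "high dimensionality, low cardinality" problem concrete. With two classes and n samples, the pooled within-class covariance matrix that LDA has to invert has rank at most n - 2. It is therefore singular whenever there are more genes than samples. The sketch below verifies this on hypothetical synthetic data: 30 samples and 1000 genes drawn from a standard normal distribution. The sizes and seed are illustrative assumptions and are not taken from the paper's data sets.

```python
import numpy as np

# Hypothetical two-class data set: n = 30 arrays, p = 1000 genes.
rng = np.random.default_rng(0)
n, p = 30, 1000
X = rng.normal(size=(n, p))
y = np.repeat([0, 1], n // 2)

# Center each class on its own mean, then form the pooled within-class
# covariance matrix that LDA would have to invert.
centered = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in (0, 1)])
S_pooled = centered.T @ centered / (n - 2)

# rank(S_pooled) == rank(centered) <= n - 2, far below p, so S_pooled is singular.
rank = np.linalg.matrix_rank(centered)
print(S_pooled.shape, rank)
```

The rank is computed on the 30 x 1000 centered matrix rather than on the 1000 x 1000 covariance. The two ranks are equal, and the smaller matrix gives a numerically cleaner result.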

Methods and Data: We compare the popular feature selection and classification method PAM (Tibshirani et al.) to several other methods. We apply feature reduction and feature ranking methods such as random projection, random feature selection, ranking by the area under the ROC curve (AUC), and principal component analysis (PCA). We combine these with the classification component of PAM, linear discriminant analysis (LDA), a nearest prototype (NP) classifier, and linear support vector machines (SVMs). All methods are applied to three publicly available, linearly separable gene expression data sets of varying cardinality and dimensionality.
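The feature reduction and ranking methods named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation:

- The projection matrix has Gaussian N(0, 1/k) entries. This is one common construction for random projection; others exist.
- Genes are ranked by how far their AUC lies from 0.5, so that both up- and down-regulated genes count.
- The data set, sizes and function names are hypothetical.

```python
import numpy as np


def random_projection(X, k, rng):
    """Map each row of X from p to k dimensions with a Gaussian random matrix.

    Entries are N(0, 1/k), so squared Euclidean distances are preserved in
    expectation (the setting of the Johnson-Lindenstrauss lemma).
    """
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R


def random_feature_selection(X, k, rng):
    """Keep k genes (columns) drawn uniformly at random without replacement."""
    idx = rng.choice(X.shape[1], size=k, replace=False)
    return X[:, idx], idx


def auc_per_gene(X, y):
    """Per-gene area under the ROC curve for labels y in {0, 1}.

    Equals P(x_class1 > x_class0) + 0.5 * P(tie), i.e. the normalized
    Mann-Whitney U statistic, computed over all class-1/class-0 pairs.
    """
    X1, X0 = X[y == 1], X[y == 0]
    greater = (X1[:, None, :] > X0[None, :, :]).mean(axis=(0, 1))
    ties = (X1[:, None, :] == X0[None, :, :]).mean(axis=(0, 1))
    return greater + 0.5 * ties


def top_k_by_auc(X, y, k):
    """Indices of the k genes whose AUC lies farthest from 0.5 (either direction)."""
    return np.argsort(-np.abs(auc_per_gene(X, y) - 0.5))[:k]


def pca(X, k):
    """Scores of the centered data on its first k principal components (via SVD)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:k]
    return (X - mean) @ components.T, mean, components


# Hypothetical example: 40 samples, 2000 genes, first 10 genes informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2000))
y = np.repeat([0, 1], 20)
X[y == 1, :10] += 2.0

Z_rp = random_projection(X, 50, rng)
Z_rs, kept = random_feature_selection(X, 50, rng)
top = top_k_by_auc(X, y, 50)
Z_pca, mu, components = pca(X, 10)
```

In a classification setting, each transform should be fitted on the training samples only and then applied unchanged to the test samples. This matters most for the AUC ranking and PCA, which look at the data. Otherwise the error estimate is optimistically biased.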

Results and Conclusions: In our experiments with the gene expression data we did not find a clearly superior algorithm. Instead, and most surprisingly, feature reduction using random projections or random feature selection often performed just as well as the other methods.
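The finding that random projections or random selections often match supervised ranking can be probed with a simple experiment. The sketch below compares three reducers, each followed by a nearest prototype classifier, under k-fold cross-validation. It is a hypothetical harness, not the paper's protocol. The following are all illustrative assumptions, and no particular outcome is implied:

- the prototype is the class mean;
- the folds are unstratified and there are 5 of them;
- k = 50 dimensions are retained;
- the synthetic data have 200 weakly shifted genes out of 2000.

```python
import numpy as np


def nearest_prototype_fit(Z, y):
    """One prototype per class, assumed here to be the class mean."""
    classes = np.unique(y)
    return classes, np.vstack([Z[y == c].mean(axis=0) for c in classes])


def nearest_prototype_predict(model, Z):
    """Assign each row of Z to the class of the closest prototype (Euclidean)."""
    classes, protos = model
    d2 = ((Z[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]


def auc_scores(X, y):
    """Per-gene AUC for labels in {0, 1}, ties counted as 0.5."""
    X1, X0 = X[y == 1], X[y == 0]
    gt = (X1[:, None, :] > X0[None, :, :]).mean(axis=(0, 1))
    eq = (X1[:, None, :] == X0[None, :, :]).mean(axis=(0, 1))
    return gt + 0.5 * eq


def make_reducer(method, k, rng):
    """Return fit(X_train, y_train) -> transform, fitted on training data only."""
    def fit(X, y):
        p = X.shape[1]
        if method == "random_projection":
            R = rng.normal(size=(p, k)) / np.sqrt(k)
            return lambda A: A @ R
        if method == "random_selection":
            idx = rng.choice(p, size=k, replace=False)
        elif method == "auc":
            idx = np.argsort(-np.abs(auc_scores(X, y) - 0.5))[:k]
        else:
            raise ValueError(method)
        return lambda A: A[:, idx]
    return fit


def cv_accuracy(X, y, fit_reducer, n_folds, rng):
    """Plain shuffled k-fold cross-validation (no stratification)."""
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    correct = 0
    for test_idx in folds:
        train = np.setdiff1d(np.arange(len(y)), test_idx)
        transform = fit_reducer(X[train], y[train])
        model = nearest_prototype_fit(transform(X[train]), y[train])
        pred = nearest_prototype_predict(model, transform(X[test_idx]))
        correct += int(np.sum(pred == y[test_idx]))
    return correct / len(y)


# Hypothetical data: 40 samples, 2000 genes, 200 weakly differentially expressed.
rng = np.random.default_rng(2)
n, p = 40, 2000
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :200] += 0.5

results = {
    m: cv_accuracy(X, y, make_reducer(m, 50, rng), n_folds=5, rng=rng)
    for m in ("random_projection", "random_selection", "auc")
}
print(results)
```

Every reducer is refitted inside each training fold, so the AUC ranking never sees the test samples. Without this, the supervised method would gain an unfair advantage over the random ones.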

References

Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Diagnosis of multiple cancer types by shrunken centroids of gene expression. PNAS 99(10), 6567–6572 (2002)
Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays. Statistical Science 18(1), 104–117 (2003)
Vempala, S.: The Random Projection Method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 65 (2004)
Duda, R.O., Hart, P., Stork, D.: Pattern Classification. Wiley, Chichester (2001)
Webb, A.: Statistical Pattern Recognition. Wiley, Chichester (2002)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge (2000)
Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G.: PAMR package version 1.27 (2005), http://www-stat.stanford.edu/~tibs/PAM/Rdist
Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics 26, 189–206 (1984)
Therrien, C.: Decision estimation and classification. Wiley, Chichester (1989)
Boser, B., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Haussler, D. (ed.) Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144–152. ACM Press, New York (1992)
Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C., Lander, E.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–536 (1999)
Dudoit, S., Fridlyand, J., Speed, T.: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97(457), 77–87 (2002)
Khan, J., Wei, J., Ringner, M., Saal, L., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C., Peterson, C., Meltzer, P.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7(6), 673–679 (2001)
Buchholz, M., Kestler, H., Bauer, A., Bock, W., Rau, B., Leder, G., Kratzer, W., Bommer, M., Scarpa, A., Schilling, M., Adler, G., Hoheisel, J., Gress, T.: Specialized DNA arrays for the differentiation of pancreatic tumors. Clin. Cancer Res. 11(22), 8048–8054 (2005)

Author information

Authors and Affiliations

Department of Neural Information Processing, University of Ulm, 89069, Ulm, Germany
Hans A. Kestler
Department of Internal Medicine I, University Hospital Ulm, Robert-Koch-Str. 8, 89081, Ulm, Germany
Hans A. Kestler & Christoph Müssel

Editor information

Editors and Affiliations

Institute of Neural Information Processing, University of Ulm, D-89069, Ulm, Germany
Friedhelm Schwenker
Dipartimento di Sistemi e Informatica, Università di Firenze, Via di Santa Marta 3, 50139, Firenze, Italy
Simone Marinai

About this paper

Cite this paper

Kestler, H.A., Müssel, C. (2006). An Empirical Comparison of Feature Reduction Methods in the Context of Microarray Data Classification. In: Schwenker, F., Marinai, S. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2006. Lecture Notes in Computer Science, vol 4087. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11829898_24

DOI: https://doi.org/10.1007/11829898_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37951-5
Online ISBN: 978-3-540-37952-2
eBook Packages: Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition