Abstract
This paper addresses the problem of cell classification using gene expression data. A main feature of this kind of data is the very large number of variables (genes) relative to the number of observations (cells), which makes most standard statistical classification methods difficult to employ. The proposed solution builds classification rules on subsets of genes whose behavior across the cells differs most from that of all the other genes. This variable selection procedure is based on suitable linear transformations of the observed data: a strategy resorting to independent component analysis is explored. Our proposal is compared with the nearest shrunken centroid method (Tibshirani et al., 2002) on three publicly available data sets.
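The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state the selection rule, so the tail rule used here (keep genes whose score on some independent component is among the largest in absolute value), the number of components, the 5% tail fraction, the synthetic data, and all function names are assumptions made for illustration. The ICA step is a plain NumPy version of symmetric FastICA with a tanh nonlinearity, applied to a genes-by-cells matrix.

```python
import numpy as np

def whiten(X, k):
    """Reduce a genes x cells matrix to k uncorrelated, unit-variance columns."""
    Xc = X - X.mean(axis=0)                      # center each cell (column)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * np.sqrt(X.shape[0])        # Z.T @ Z / n_genes = identity

def sym_orth(W):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W, computed via SVD."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

def fastica(Z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on whitened data Z (genes x k)."""
    rng = np.random.default_rng(seed)
    n, k = Z.shape
    W = sym_orth(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        S = Z @ W.T
        G = np.tanh(S)
        G_prime = 1.0 - G ** 2
        # Fixed-point update for every row w: E[z g(w'z)] - E[g'(w'z)] w
        W_new = sym_orth(G.T @ Z / n - G_prime.mean(axis=0)[:, None] * W)
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return Z @ W.T                               # component scores per gene

def select_genes(S, tail=0.05):
    """Assumed rule: keep genes in the top `tail` fraction of |score| on any component."""
    thresholds = np.quantile(np.abs(S), 1.0 - tail, axis=0)
    return np.flatnonzero((np.abs(S) > thresholds).any(axis=1))

# Synthetic stand-in for an expression matrix: 500 genes x 20 cells,
# with the first 25 genes shifted upward in the first 10 cells.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
X[:25, :10] += 3.0

S = fastica(whiten(X, k=3))
selected = select_genes(S)
print(len(selected), "genes selected")
```

A classification rule would then be fitted using only the rows of `X` indexed by `selected`. The choice of classifier is left open here.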
References
ALIZADEH, A.A., EISEN, M.B., DAVIS, R.E. et al. (2000): Distinct Types of Diffuse Large B-cell Lymphoma Identified by Gene Expression Profiling. Nature, 403, 503–511.
DUDOIT, S., FRIDLYAND, J. and SPEED, T.P. (2002): Comparison of Discrimination Methods for the Classification of Tumors using Gene Expression Data. Journal of the American Statistical Association, 97, 77–87.
GOLUB, T.R., SLONIM, D.K., TAMAYO, P. et al. (1999): Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286, 531–537.
HYVÄRINEN, A., KARHUNEN, J. and OJA, E. (2001): Independent Component Analysis, Wiley, New York.
KHAN, J., WEI, J., RINGNER, M. et al. (2001): Classification and Diagnostic Prediction of Cancers Using Gene Expression Profiling and Artificial Neural Networks. Nature Medicine, 7, 673–679.
TIBSHIRANI, R., HASTIE, T., NARASIMHAN, B. and CHU, G. (2002): Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression. Proceedings of the National Academy of Sciences, 99, 6567–6572.
VIROLI, C. (2003): Reflections on a Supervised Approach to Independent Component Analysis, in: Between Data Science and Applied Data Analysis, M. Schader, W. Gaul and M. Vichi (Eds.), Studies in Classification, Data Analysis, and Knowledge Organization, Springer, Berlin, 501–509.
WALL, M.E., RECHTSTEINER, A. and ROCHA, L.M. (2003): Singular Value Decomposition and Principal Component Analysis, in: A Practical Approach to Microarray Data Analysis, Berrar D.P., Dubitzky W. and Granzow M. (Eds.), Kluwer, Norwell, 91–109.
Copyright information
© 2005 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Calò, D.G., Galimberti, G., Pillati, M., Viroli, C. (2005). Variable Selection in Cell Classification Problems: A Strategy Based on Independent Component Analysis. In: Bock, HH., et al. New Developments in Classification and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27373-5_3
DOI: https://doi.org/10.1007/3-540-27373-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23809-6
Online ISBN: 978-3-540-27373-8
eBook Packages: Mathematics and Statistics (R0)