Abstract
The dimensionality of biological data is often very high. Feature selection can be used to tackle the problem of high dimensionality. However, the majority of work on feature selection has focused on supervised methods, which require class labels. The problem is further compounded when the data are time-series gene expression measurements that record the effect of external stimuli on a biological system. In this paper, we propose an unsupervised method for gene selection from time-series gene expression data, founded on statistical significance testing and swap randomization. We perform experiments with a publicly available mouse gene expression dataset and with a human gene expression dataset describing exposure to asbestos. The results on both datasets show a considerable decrease in the number of genes.
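The abstract names the ingredients of the method (significance testing plus swap randomization) but not the exact procedure, so the following is only a minimal sketch of how those ingredients can fit together, not the authors' algorithm. It assumes the expression data have already been binarized into a genes × time-points 0/1 matrix (columns in time order), and it uses a hypothetical test statistic, `continuity`, that counts consecutive time points at which a gene is "active". Swap randomization shuffles the matrix while keeping every row sum and column sum fixed, so it yields a null distribution for the statistic; genes are then selected with the Holm step-down correction. The names `swap_randomize`, `continuity`, `holm_select`, `select_genes`, and the default swap count are all illustrative assumptions.

```python
import random
from typing import List, Optional

Matrix = List[List[int]]


def swap_randomize(m: Matrix, n_swaps: int, rng: random.Random) -> Matrix:
    """Return a randomized copy of binary matrix m with identical row and column sums.

    A swap picks rows r1, r2 and columns c1, c2; if the 2x2 submatrix is
    [[1, 0], [0, 1]] it is flipped to [[0, 1], [1, 0]], which preserves all margins.
    """
    a = [row[:] for row in m]
    n_rows, n_cols = len(a), len(a[0])
    for _ in range(n_swaps):
        r1, r2 = rng.randrange(n_rows), rng.randrange(n_rows)
        c1, c2 = rng.randrange(n_cols), rng.randrange(n_cols)
        if a[r1][c1] == 1 and a[r2][c2] == 1 and a[r1][c2] == 0 and a[r2][c1] == 0:
            a[r1][c1] = a[r2][c2] = 0
            a[r1][c2] = a[r2][c1] = 1
    return a


def continuity(row: List[int]) -> int:
    """Hypothetical statistic: number of adjacent time-point pairs that are both 1."""
    return sum(1 for x, y in zip(row, row[1:]) if x == 1 and y == 1)


def holm_select(pvals: List[float], alpha: float) -> List[int]:
    """Holm step-down procedure: return indices of hypotheses rejected at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    selected = []
    for rank, idx in enumerate(order):
        if pvals[idx] <= alpha / (m - rank):
            selected.append(idx)
        else:
            break  # stop at the first non-rejection
    return sorted(selected)


def select_genes(m: Matrix, n_samples: int = 200, n_swaps: Optional[int] = None,
                 alpha: float = 0.05, seed: int = 0) -> List[int]:
    """Select genes whose observed statistic is unusually high under swap randomization."""
    rng = random.Random(seed)
    if n_swaps is None:
        n_swaps = 10 * sum(map(sum, m))  # heuristic: a few swap attempts per 1-entry
    observed = [continuity(row) for row in m]
    exceed = [0] * len(m)
    for _ in range(n_samples):
        randomized = swap_randomize(m, n_swaps, rng)
        for g, row in enumerate(randomized):
            if continuity(row) >= observed[g]:
                exceed[g] += 1
    # Empirical p-values with the usual +1 correction so no p-value is exactly 0.
    pvals = [(1 + e) / (1 + n_samples) for e in exceed]
    return holm_select(pvals, alpha)
```

Calling `select_genes(binary_matrix)` returns the row indices of the retained genes. Each randomized sample restarts from the observed matrix rather than continuing one long chain; both variants of swap randomization are in use, and the choice here is only for simplicity.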
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Adhikari, P.R., Upadhyaya, B.B., Meng, C., Hollmén, J. (2011). Gene Selection in Time-Series Gene Expression Data. In: Loog, M., Wessels, L., Reinders, M.J.T., de Ridder, D. (eds) Pattern Recognition in Bioinformatics. PRIB 2011. Lecture Notes in Computer Science, vol 7036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24855-9_13
DOI: https://doi.org/10.1007/978-3-642-24855-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24854-2
Online ISBN: 978-3-642-24855-9
eBook Packages: Computer Science, Computer Science (R0)