Efficient Microarray Data Classification with Three-Stage Dimensionality Reduction
High dimensionality and small sample size are intrinsic properties of microarray data, and extracting useful knowledge from such data requires effective computational methods. Classification is one of the important tasks in this field. A search space spanning thousands of genes makes efficient classification complex and difficult. In this work, three stages are adopted to handle the curse of dimensionality and classify microarray data. In the first stage, statistical measures are used to remove genes that do not contribute to classification. In the second stage, additional noisy genes are removed on the basis of their signal-to-noise ratio (SNR). In the third stage, principal component analysis (PCA) is used to reduce the dimension further. Finally, the reduced datasets are presented to four classification algorithms, namely an artificial neural network (ANN), a naïve Bayesian classifier, multiple linear regression (MLR), and k-nearest neighbor (k-NN), to evaluate their performance and validate the benefits of three-stage dimensionality reduction. The experimental results show that the combined use of statistical methods, SNR, and PCA improves the overall performance of the classifiers.
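As a rough illustration of the second stage only, the sketch below ranks genes by SNR and keeps the top-scoring ones. The abstract does not define the ratio, so the code assumes the commonly used two-class form |mean_0 − mean_1| / (std_0 + std_1); the function names `snr_scores` and `top_genes_by_snr`, the `keep` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def snr_scores(samples, labels):
    """Return one SNR score per gene for a two-class dataset.

    samples: list of expression vectors, one per sample, all the same length
    labels:  list of 0/1 class labels, one per sample
    Assumed definition (not stated in the abstract):
        SNR = |mean_0 - mean_1| / (std_0 + std_1)
    """
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    class1 = [s for s, y in zip(samples, labels) if y == 1]
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s in class0]
        b = [s[g] for s in class1]
        diff = abs(mean(a) - mean(b))
        spread = pstdev(a) + pstdev(b)
        if spread > 0:
            scores.append(diff / spread)
        else:
            # Zero spread in both classes: treat as perfectly separating
            # if the class means differ, otherwise as uninformative.
            scores.append(float("inf") if diff > 0 else 0.0)
    return scores


def top_genes_by_snr(samples, labels, keep):
    """Return the indices (ascending) of the `keep` highest-SNR genes."""
    scores = snr_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:keep])


# Toy data: 4 samples x 3 genes; only gene 1 separates the two classes well.
toy_samples = [
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.9],
    [0.9, 9.0, 0.1],
    [1.1, 9.2, 0.8],
]
toy_labels = [0, 0, 1, 1]

toy_scores = snr_scores(toy_samples, toy_labels)
selected = top_genes_by_snr(toy_samples, toy_labels, keep=1)
```

In a full pipeline along the lines described above, the genes surviving this filter would then be passed to PCA and on to the classifiers; how many genes to keep at each stage is a tuning choice that this sketch leaves open.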
Keywords: Classification, Microarray data, Feature selection
The authors gratefully acknowledge the support of the Original Technology Research Program for Brain Science through the National Research Foundation (NRF) of Korea (NRF:2010-0018948) funded by the Ministry of Education, Science, and Technology.