Efficient Microarray Data Classification with Three-Stage Dimensionality Reduction

  • Rasmita Dash
  • B. B. Misra
  • Satchidananda Dehuri
  • Sung-Bae Cho
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 308)


High dimensionality and small sample size are intrinsic to microarray data, which therefore require effective computational methods to discover useful knowledge. Classification of microarray data is one of the important tasks in this field. A search space spanning thousands of genes makes efficient classification complex and difficult. In this work, three stages are adopted to handle the curse of dimensionality and classify microarray data. In the first stage, statistical measures are used to remove genes that do not contribute to classification. In the second stage, further noisy genes are removed using the signal-to-noise ratio (SNR). In the third stage, principal component analysis (PCA) is used to further reduce the dimension. Finally, the reduced datasets are presented to different classification techniques to evaluate their performance. Four classification algorithms, namely artificial neural network (ANN), naïve Bayesian classifier, multiple linear regression (MLR), and k-nearest neighbor (k-NN), are used to validate the benefits of three-stage dimensionality reduction. The experimental results show that the use of statistical methods, SNR, and PCA improves the overall performance of the classifiers.
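The first two stages described above can be illustrated with a minimal sketch. The abstract names neither the statistical measure used in stage one nor the exact SNR formula, so everything here is an assumption: a plain variance threshold stands in for the statistical filter, and SNR is taken as (mean of class 0 − mean of class 1) / (std of class 0 + std of class 1), a definition commonly used for two-class gene ranking. The function names, the toy data, and the thresholds are hypothetical, and the PCA stage is omitted.

```python
import math
from statistics import mean, pstdev, pvariance


def variance_filter(X, min_var):
    """Stage 1 (assumed statistic): indices of genes whose variance
    across all samples is at least min_var."""
    n_genes = len(X[0])
    return [g for g in range(n_genes)
            if pvariance([row[g] for row in X]) >= min_var]


def snr_score(X, y, g):
    """Stage 2 (assumed formula): signal-to-noise ratio of gene g
    for a two-class problem with labels 0 and 1."""
    a = [row[g] for row, label in zip(X, y) if label == 0]
    b = [row[g] for row, label in zip(X, y) if label == 1]
    diff = mean(a) - mean(b)
    spread = pstdev(a) + pstdev(b)
    if spread == 0:
        # No within-class variation: either uninformative or perfectly separating.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / spread


def select_genes(X, y, min_var, top_k):
    """Apply the variance filter, then keep the top_k genes by |SNR|."""
    kept = variance_filter(X, min_var)
    ranked = sorted(kept, key=lambda g: abs(snr_score(X, y, g)), reverse=True)
    return ranked[:top_k]


# Toy expression matrix: 4 samples (rows) x 4 genes (columns), made up for illustration.
X = [
    [5.0, 1.0, 2.0, 3.0],
    [6.0, 1.0, 2.5, 9.0],
    [1.0, 1.0, 2.2, 4.0],
    [0.0, 1.0, 2.4, 8.0],
]
y = [0, 0, 1, 1]

selected = select_genes(X, y, min_var=0.1, top_k=1)
print(selected)
```

In this toy example the constant and near-constant genes are dropped by the filter, and the remaining genes are ranked by how well their class means separate relative to within-class spread. The selected columns would then be passed to PCA and to a classifier such as k-NN.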


Keywords: Classification, Microarray data, Feature selection



The authors gratefully acknowledge the support of the Original Technology Research Program for Brain Science through the National Research Foundation (NRF) of Korea (NRF:2010-0018948) funded by the Ministry of Education, Science, and Technology.



Copyright information

© Springer India 2015

Authors and Affiliations

  • Rasmita Dash (1)
  • B. B. Misra (2)
  • Satchidananda Dehuri (3)
  • Sung-Bae Cho (4)
  1. Department of Computer Science and Information Technology, SOA University, Bhubaneswar, India
  2. Department of Computer Science and Engineering, Silicon Institute of Technology, Bhubaneswar, India
  3. Department of Systems Engineering, Ajou University, Suwon, South Korea
  4. Soft Computing Laboratory, Department of Computer Science, Yonsei University, Seoul, South Korea
