Three-Step Framework of Feature Selection for Data of DNA Microarray Experiments

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 283)


Dimensionality reduction of the attribute set is a common preprocessing step in machine learning. It is especially important for high-dimensional data that admit a low-dimensional representation, such as gene expression data. Feature reduction is essential for microarray data because most of their attributes are believed to be unrelated to the observed classes. This paper proposes a three-step feature selection framework based on feature clustering, multi-criteria assessment (Borda count) and the Markov blanket. Because the proposed framework is a filter method, it can be used with any classification algorithm. Both its classification performance and its selection stability were assessed in experiments on 10 microarray data sets. The evaluation showed that the Markov blanket filter produces results comparable to state-of-the-art methods in terms of classification performance; however, it tends to produce unstable solutions.
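The abstract names the three steps but not how each one is implemented, so the following is only a minimal sketch built on assumptions that are not taken from the paper. Features are clustered by single linkage on absolute Pearson correlation with a hypothetical `threshold` of 0.9. Within each cluster, a Borda count over two assumed criteria (absolute correlation with the class label and a two-class signal-to-noise ratio) picks one representative. The Markov blanket step is approximated with the FCBF-style redundancy test of Yu and Liu, using absolute correlation in place of symmetric uncertainty. A Jaccard (Tanimoto) stability score is included because the abstract reports selection stability without naming the measure. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def snr(x, y):
    """Signal-to-noise ratio (mean1 - mean0) / (sd1 + sd0) for 0/1 class labels."""
    a = [v for v, c in zip(x, y) if c == 0]
    b = [v for v, c in zip(x, y) if c == 1]
    denom = pstdev(a) + pstdev(b)
    return 0.0 if denom == 0 else (mean(b) - mean(a)) / denom


def cluster_features(cols, threshold):
    """Step 1 (assumed method): single-linkage grouping of features whose
    absolute pairwise correlation reaches `threshold`, via union-find."""
    parent = list(range(len(cols)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(pearson(cols[i], cols[j])) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(cols)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def borda_scores(criteria):
    """Step 2: Borda count. Each criterion maps feature -> score (higher is
    better); among n features, the r-th ranked (0-based) earns n - 1 - r points."""
    totals = {}
    for scores in criteria:
        ranked = sorted(scores, key=scores.get, reverse=True)  # ties keep input order
        for r, f in enumerate(ranked):
            totals[f] = totals.get(f, 0) + len(ranked) - 1 - r
    return totals


def markov_blanket_filter(cols, y, candidates):
    """Step 3 (FCBF-style approximation, assumed): drop feature f when an
    already-kept, at-least-as-relevant feature k has |corr(f, k)| >= rel(f)."""
    rel = {f: abs(pearson(cols[f], y)) for f in candidates}
    kept = []
    for f in sorted(candidates, key=rel.get, reverse=True):
        if not any(abs(pearson(cols[f], cols[k])) >= rel[f] for k in kept):
            kept.append(f)
    return kept


def three_step_selection(X, y, threshold=0.9):
    """X: samples x features (list of rows); y: 0/1 labels. Returns kept feature indices."""
    cols = [list(c) for c in zip(*X)]  # one list per feature
    reps = []
    for group in cluster_features(cols, threshold):
        criteria = [
            {f: abs(pearson(cols[f], y)) for f in group},
            {f: abs(snr(cols[f], y)) for f in group},
        ]
        totals = borda_scores(criteria)
        reps.append(max(group, key=totals.get))  # cluster representative
    return markov_blanket_filter(cols, y, reps)


def jaccard_stability(subsets):
    """Average pairwise Jaccard similarity of selected-feature sets (needs >= 2 sets)."""
    sets = [set(s) for s in subsets]
    pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    sims = [len(a & b) / len(a | b) if a | b else 1.0 for a, b in pairs]
    return sum(sims) / len(sims)
```

To measure stability in this sketch, one would run `three_step_selection` on several resamples of a data set and pass the resulting index lists to `jaccard_stability`; values near 1 mean the selected subset barely changes between resamples. Note that the FCBF-style test also discards candidates with zero relevance, since any absolute correlation satisfies the redundancy condition.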


Keywords: Feature selection; Feature clustering; Feature filtering; Microarray; Borda count; Markov blanket; Stability




Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

Department of Systems and Computer Networks, Wroclaw University of Technology, Wroclaw, Poland
