Three-Step Framework of Feature Selection for Data of DNA Microarray Experiments

  • Conference paper
Information Technologies in Biomedicine, Volume 3

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 283)

Abstract

Dimensionality reduction of the attribute set is a common preprocessing step in machine learning. It is especially important for high-dimensional data with a low-dimensional representation, such as gene expression data. Feature reduction is essential for microarray data because most attributes are believed to be unrelated to the observed classes. This paper proposes a three-step feature selection framework based on feature clustering, multi-criteria assessment (Borda count), and a Markov blanket. The proposed framework is a filter method, so it can be used with any classification algorithm. Its classification performance and selection stability were assessed in experiments on 10 microarray data sets. The evaluation showed that the Markov blanket filter produces results comparable to state-of-the-art methods in terms of classification performance. However, it tends to produce unstable solutions.
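The abstract names the Borda count as the multi-criteria assessment step but does not show which criteria the authors rank by or how they implement the aggregation. As a minimal sketch of the general technique only, the following Python code combines several feature rankings with a standard Borda count. The function names `borda_count` and `top_k`, the gene labels, and the three example rankings are all hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Sequence


def borda_count(rankings: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Aggregate several rankings of the same features with a Borda count.

    Each ranking lists feature names from best to worst. A feature at
    0-based position p in a ranking of n features earns n - 1 - p points,
    and the points are summed over all rankings.
    """
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - 1 - position)
    return scores


def top_k(scores: Dict[str, int], k: int) -> List[str]:
    """Return the k features with the highest aggregated score."""
    # Sort by descending score; break ties by name so the result is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Hypothetical rankings of four genes under three assessment criteria.
# These values are made up for illustration and are not from the paper.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]

scores = borda_count(rankings)
selected = top_k(scores, 2)
print(scores)    # {'g1': 8, 'g2': 6, 'g3': 3, 'g4': 1}
print(selected)  # ['g1', 'g2']
```

Because the aggregation uses only rank positions, criteria measured on different scales can be combined without normalising their raw scores, which is one plausible reason to prefer a Borda count for multi-criteria assessment.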

Author information

Corresponding author

Correspondence to Pawel Trajdos.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Trajdos, P., Kamizelich, A., Kurzynski, M. (2014). Three-Step Framework of Feature Selection for Data of DNA Microarray Experiments. In: Piętka, E., Kawa, J., Wieclawek, W. (eds) Information Technologies in Biomedicine, Volume 3. Advances in Intelligent Systems and Computing, vol 283. Springer, Cham. https://doi.org/10.1007/978-3-319-06593-9_36

  • DOI: https://doi.org/10.1007/978-3-319-06593-9_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06592-2

  • Online ISBN: 978-3-319-06593-9

  • eBook Packages: Engineering, Engineering (R0)
