Abstract
Dimensionality reduction of the attribute set is a common preprocessing step in machine learning. This step is especially important for high-dimensional data that admit a low-dimensional representation, such as gene expression data. Feature reduction is essential for microarray data because most microarray attributes are believed to be unrelated to the observed classes. This paper proposes a three-step feature selection framework based on feature clustering, multi-criteria assessment (Borda count) and a Markov blanket. The proposed framework is a filter method, so it can be used with any classification algorithm. Its classification performance and selection stability were assessed in experimental studies performed on 10 microarray data sets. The experimental evaluation showed that the Markov blanket filter produces results comparable to state-of-the-art methods in terms of classification performance. However, it tends to produce unstable solutions.
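The multi-criteria assessment step named above relies on the Borda count. As a rough illustration only, the following minimal sketch shows how a Borda count can merge several feature rankings into one consensus ranking. The abstract does not say which ranking criteria are combined or how ties are handled, so the function names (`borda_count`, `aggregate_ranking`), the feature labels and the alphabetical tie-break are assumptions made for this example, not the authors' implementation.

```python
from typing import Dict, List, Sequence


def borda_count(rankings: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Sum Borda points over several best-first rankings of the same features.

    With n features, the feature at 0-based position p in a ranking
    earns n - 1 - p points from that ranking.
    """
    features = set(rankings[0])
    n = len(features)
    scores = {feature: 0 for feature in features}
    for ranking in rankings:
        # Every criterion must rank exactly the same features, each once.
        if len(ranking) != n or set(ranking) != features:
            raise ValueError("each ranking must order the same features exactly once")
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    return scores


def aggregate_ranking(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Return features ordered by descending Borda score.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    scores = borda_count(rankings)
    return sorted(scores, key=lambda feature: (-scores[feature], feature))


# Hypothetical rankings of four genes produced by three different criteria.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
consensus = aggregate_ranking(rankings)
print(consensus)
```

In this toy input, g1 collects 3 + 2 + 3 = 8 points and g4 only 0 + 1 + 0 = 1, so g1 leads the consensus ranking and g4 closes it. A positional scheme like this uses only rank order, which lets criteria measured on different scales be combined without normalisation.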
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Trajdos, P., Kamizelich, A., Kurzynski, M. (2014). Three-Step Framework of Feature Selection for Data of DNA Microarray Experiments. In: Piętka, E., Kawa, J., Wieclawek, W. (eds) Information Technologies in Biomedicine, Volume 3. Advances in Intelligent Systems and Computing, vol 283. Springer, Cham. https://doi.org/10.1007/978-3-319-06593-9_36
DOI: https://doi.org/10.1007/978-3-319-06593-9_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06592-2
Online ISBN: 978-3-319-06593-9
eBook Packages: Engineering, Engineering (R0)