Three-Step Framework of Feature Selection for Data of DNA Microarray Experiments

Trajdos, Pawel; Kamizelich, Adam; Kurzynski, Marek

doi:10.1007/978-3-319-06593-9_36

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 283)

Abstract

Dimensionality reduction of the attribute set is a common preprocessing step in machine learning. It is especially important for high-dimensional data that admit a low-dimensional representation, such as gene expression data. Feature reduction is essential for microarray data because most microarray attributes are believed to be unrelated to the observed classes. This paper proposes a three-step feature selection framework based on feature clustering, multi-criteria assessment (Borda count) and a Markov blanket filter. The proposed framework is a filter method, so it can be used with any classification algorithm. Both its classification performance and its selection stability were assessed in experiments on 10 microarray data sets. The evaluation showed that the Markov blanket filter produces results comparable to state-of-the-art methods in terms of classification performance; however, it tends to produce unstable solutions.
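The first step of the framework is feature clustering, but the abstract does not say which clustering algorithm or gene-to-gene similarity is used. As a minimal sketch, assuming absolute Pearson correlation as the similarity, the greedy leader clustering below groups co-expressed genes. The functions `pearson` and `cluster_features` and the `threshold` parameter are hypothetical illustrations, not the authors' implementation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def cluster_features(columns, threshold=0.8):
    """Hypothetical stand-in for the paper's (unspecified) clustering step.

    Greedy leader clustering: each gene joins the first cluster whose leader
    (first member) it correlates with at |r| >= threshold; otherwise it starts
    a new cluster. columns: one list of sample values per gene.
    Returns a list of clusters, each a list of gene indices."""
    clusters = []
    for j, col in enumerate(columns):
        for cluster in clusters:
            if abs(pearson(columns[cluster[0]], col)) >= threshold:
                cluster.append(j)
                break
        else:  # no existing cluster was close enough
            clusters.append([j])
    return clusters
```

With `threshold=0.9`, the toy genes [1, 2, 3, 4], [2, 4, 6, 8] and [4, 3, 2, 1] (|r| = 1 with the first) form one cluster, while [1, 3, 2, 4] (r = 0.8 with the first) starts a second. Leader clustering is cheap but order-dependent; a hierarchical method avoids that at a higher computational cost.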
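The second step fuses several feature-assessment criteria with the Borda count. The sketch below assumes each criterion has already produced a complete best-first ranking of the same genes (the abstract does not name the criteria): a gene at 0-based position p in a ranking of n genes earns n - 1 - p points, and genes are re-ranked by their total. `borda_aggregate` is a hypothetical name.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Fuse several feature rankings with the Borda count.

    rankings: list of rankings, each a best-first list of the same n feature
    names. Position p (0-based) earns n - 1 - p points.
    Returns feature names ordered by total points, best first."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    # Descending score; ties broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))
```

For the rankings [g1, g2, g3, g4], [g2, g1, g4, g3] and [g1, g3, g2, g4], the totals are 8, 6, 3 and 1 points, so the fused order is g1, g2, g3, g4. Because Borda only uses positions, criteria with very different score scales can be combined without normalization.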
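The third step removes redundant genes with a Markov blanket filter. The exact criterion is not given in the abstract, so this sketch assumes the approximate Markov blanket test used by the FCBF (Fast Correlation-Based Filter) algorithm, with absolute Pearson correlation substituted for symmetric uncertainty so that it applies directly to continuous expression values and a 0/1 class label. Gene j is dropped when some already-kept gene i is at least as relevant to the class and correlates with j at least as strongly as j correlates with the class. `markov_blanket_filter` and `min_relevance` are hypothetical names.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def markov_blanket_filter(columns, labels, min_relevance=0.0):
    """Hypothetical FCBF-style approximate Markov blanket filter.

    columns: one list of expression values per gene; labels: numeric classes (0/1).
    Returns indices of the kept genes, most relevant first."""
    # Relevance of each gene is |r(gene, class)|.
    rel = [abs(pearson(col, labels)) for col in columns]
    # Visit genes by decreasing relevance, skipping weakly relevant ones.
    order = sorted((k for k in range(len(columns)) if rel[k] >= min_relevance),
                   key=lambda k: -rel[k])
    kept = []
    for j in order:
        # Every kept gene i was visited earlier, so rel[i] >= rel[j] already holds.
        # Gene i is an approximate Markov blanket for j if |r(i, j)| >= rel[j],
        # in which case j is treated as redundant.
        covered = any(abs(pearson(columns[i], columns[j])) >= rel[j] for i in kept)
        if not covered:
            kept.append(j)
    return kept
```

On toy data where one gene is a scaled copy of another, only one of the pair survives, while a gene that separates the classes through different samples is kept. When two genes are equally relevant, the greedy pass keeps whichever it meets first.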
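Selection stability, reported above as the weak point of the Markov blanket filter, is commonly measured by running the selector on several resamples of the training data and averaging the pairwise similarity of the selected subsets. This sketch assumes the Jaccard (Tanimoto) index as the similarity; the abstract does not say which stability measure was used, and `jaccard` and `selection_stability` are hypothetical names.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard (Tanimoto) similarity of two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)

def selection_stability(subsets):
    """Mean pairwise Jaccard similarity of subsets selected on different
    resamples (e.g. bootstrap samples or cross-validation folds).
    1.0 means every run selected exactly the same features."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

For the subsets {g1, g2, g3}, {g1, g2, g4} and {g1, g2, g3}, the pairwise indices are 0.5, 1.0 and 0.5, giving a stability of 2/3.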

Author information

Authors and Affiliations

Department of Systems and Computer Networks, Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370, Wroclaw, Poland
Pawel Trajdos, Adam Kamizelich & Marek Kurzynski

Corresponding author

Correspondence to Pawel Trajdos.

Editor information

Editors and Affiliations

Faculty of Biomedical Engineering, Silesian University of Technology, Gliwice, Poland
Ewa Piętka, Jacek Kawa & Wojciech Wieclawek

About this paper

Cite this paper

Trajdos, P., Kamizelich, A., Kurzynski, M. (2014). Three-Step Framework of Feature Selection for Data of DNA Microarray Experiments. In: Piętka, E., Kawa, J., Wieclawek, W. (eds) Information Technologies in Biomedicine, Volume 3. Advances in Intelligent Systems and Computing, vol 283. Springer, Cham. https://doi.org/10.1007/978-3-319-06593-9_36

DOI: https://doi.org/10.1007/978-3-319-06593-9_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06592-2
Online ISBN: 978-3-319-06593-9
eBook Packages: Engineering, Engineering (R0)
