A Deceiving Charm of Feature Selection: The Microarray Case Study
Microarray analysis has become a significant application of machine learning in molecular biology. Datasets obtained with this method consist of tens of thousands of attributes, usually describing only tens of objects. Such a setting makes some form of feature selection an inevitable step of the analysis, mostly to reduce the feature set to a manageable size, but also to gain biological insight into the mechanisms of the investigated process. In this paper we present a reanalysis of a previously published late radiation toxicity prediction problem. Using this striking example, we show how futile it may be to rely on non-validated feature selection, and how even advanced algorithms fail to distinguish between noise and signal when the latter is weak. We also propose methods for detecting and dealing with these problems.
Keywords: gene expression, feature selection, random forest
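The warning about non-validated feature selection can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' procedure (their study works with real microarray data and random-forest-based selection). It assumes pure-noise data (40 samples, 2000 features), a Welch t-statistic filter keeping the top 20 features, and a nearest-centroid classifier scored by leave-one-out cross-validation; all sizes, the filter and the classifier are illustrative choices. Selecting features once on the full data and then cross-validating only the classifier reports high accuracy on data that contains no signal at all, whereas repeating the selection inside every fold should give an estimate near chance level.

```python
import numpy as np


def top_k_features(X, y, k):
    """Indices of the k features with the largest absolute Welch t-statistic."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    return np.argsort(-np.abs(t))[:k]


def nearest_centroid_predict(X_train, y_train, x):
    """Assign x to the class whose training centroid is closer (Euclidean)."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    return int(np.linalg.norm(x - c1) < np.linalg.norm(x - c0))


def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy, with feature selection outside or inside the loop."""
    n = len(y)
    # Non-validated variant: features are chosen once using ALL samples,
    # including each sample that will later be held out for testing.
    fixed = None if select_inside_fold else top_k_features(X, y, k)
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        # Validated variant: selection sees only the training part of the fold.
        feats = top_k_features(X[train], y[train], k) if select_inside_fold else fixed
        pred = nearest_centroid_predict(X[train][:, feats], y[train], X[i, feats])
        hits += int(pred == y[i])
    return hits / n


rng = np.random.default_rng(0)
n_samples, n_features, k = 40, 2000, 20          # hypothetical sizes
X = rng.standard_normal((n_samples, n_features))  # pure noise: no informative feature
y = np.repeat([0, 1], n_samples // 2)             # labels unrelated to X

biased = loo_accuracy(X, y, k, select_inside_fold=False)
honest = loo_accuracy(X, y, k, select_inside_fold=True)
print(f"selection outside CV: {biased:.2f}")
print(f"selection inside CV:  {honest:.2f}")
```

The gap between the two printed numbers is the optimism introduced by letting held-out samples influence which features are chosen; only the second estimate reflects how the selected feature set would perform on new samples.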