Abstract
Analysis and interpretation of gene-expression profiles, and the identification of the respective molecular or gene markers, are key to understanding the genetic basis of major diseases. The problem is challenging because of the huge number of genes (thousands to tens of thousands) and the small number of samples (about 50 to 100 cases). In this paper we present a novel gene-selection methodology based on the discretization of the continuous gene-expression values. With a specially devised gene-ranking metric we measure the strength of each gene with respect to its power to discriminate between sample categories. Then a greedy feature-elimination algorithm is applied to the rank-ordered genes to form the final set of selected genes. Unseen samples are classified according to a specially devised prediction/matching metric. The methodology was applied to a number of real-world gene-expression studies, yielding very good results.
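The pipeline outlined in the abstract (discretize, rank, greedily eliminate, classify) can be illustrated with a minimal sketch. The abstract does not define the paper's ranking or prediction/matching metrics, so the code below substitutes stand-ins that are assumptions, not the authors' method: a single entropy-based cut point per gene, information gain as the ranking score, a majority vote over discretized genes as the classifier, and training-set accuracy as the elimination criterion. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Best binary cut point and its information gain (stand-in ranking metric)."""
    pairs = sorted(zip(values, labels))
    base, best_t, best_gain = entropy(labels), None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal expression values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        split_h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - split_h > best_gain:
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = base - split_h
    return best_t, best_gain

def fit_gene(values, labels):
    """Discretize one gene into low/high and record the majority class of each state."""
    t, gain = best_split(values, labels)
    if t is None:
        return None  # gene cannot separate the classes at all
    votes = {}
    for state in ("low", "high"):
        members = [lab for v, lab in zip(values, labels)
                   if ("high" if v > t else "low") == state]
        votes[state] = Counter(members).most_common(1)[0][0]
    return {"threshold": t, "gain": gain, "votes": votes}

def predict(models, genes, sample):
    """Majority vote of the selected genes (stand-in for the matching metric)."""
    ballots = Counter()
    for g in genes:
        state = "high" if sample[g] > models[g]["threshold"] else "low"
        ballots[models[g]["votes"][state]] += 1
    return ballots.most_common(1)[0][0]

def accuracy(models, genes, expr, labels):
    """Training-set accuracy of the vote classifier restricted to `genes`."""
    hits = 0
    for i, lab in enumerate(labels):
        sample = {g: expr[g][i] for g in genes}
        hits += predict(models, genes, sample) == lab
    return hits / len(labels)

def greedy_eliminate(ranked, models, expr, labels):
    """Drop genes from the weakest upward while accuracy does not decrease."""
    selected = list(ranked)
    for g in reversed(ranked):
        if len(selected) == 1:
            break
        trial = [x for x in selected if x != g]
        if accuracy(models, trial, expr, labels) >= accuracy(models, selected, expr, labels):
            selected = trial
    return selected

# Toy data (hypothetical): three genes measured on six samples of two classes.
expr = {
    "g1": [1.0, 1.2, 0.9, 5.0, 5.5, 4.8],
    "g2": [3.0, 1.0, 2.9, 3.1, 1.1, 3.0],
    "g3": [0.5, 0.7, 0.6, 2.0, 0.65, 2.2],
}
labels = ["A", "A", "A", "B", "B", "B"]

fitted = {g: fit_gene(v, labels) for g, v in expr.items()}
models = {g: m for g, m in fitted.items() if m is not None}
ranked = sorted(models, key=lambda g: models[g]["gain"], reverse=True)
selected = greedy_eliminate(ranked, models, expr, labels)
print(ranked, selected)
```

Measuring accuracy on the training samples keeps the sketch short; with only 50 to 100 cases, as the abstract notes, a cross-validated estimate would be a more cautious elimination criterion.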
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Potamias, G., Koumakis, L., Moustakis, V. (2004). Gene Selection via Discretized Gene-Expression Profiles and Greedy Feature-Elimination. In: Vouros, G.A., Panayiotopoulos, T. (eds) Methods and Applications of Artificial Intelligence. SETN 2004. Lecture Notes in Computer Science, vol 3025. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24674-9_27
DOI: https://doi.org/10.1007/978-3-540-24674-9_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21937-8
Online ISBN: 978-3-540-24674-9
eBook Packages: Springer Book Archive