Abstract
In recent years, machine learning and data mining have found a successful application area in DNA microarray technology. A gene expression profile measures thousands of genes simultaneously and reflects the complex relationships among them. A well-known constraint of microarray data is that the number of genes is very large compared with the small number of available experiments or cases. In this context, an accurate gene selection strategy is crucial for reducing the generalization error (false positives) of state-of-the-art classification algorithms. This paper presents a reduction algorithm based on the notion of fuzzy gene expression, in which similar (co-expressed) genes belonging to different patients are selected to construct a supervised prototype-based retrieval model. This technique implements the retrieval step of our new gene-CBR system. The proposed method is illustrated with the analysis of microarray data from bone marrow cases of 43 adult patients with cancer, plus a group of three cases corresponding to healthy persons.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fdez-Riverola, F., Díaz, F., Borrajo, M.L., Yáñez, J.C., Corchado, J.M. (2005). Improving Gene Selection in Microarray Data Analysis Using Fuzzy Patterns Inside a CBR System. In: Muñoz-Ávila, H., Ricci, F. (eds) Case-Based Reasoning Research and Development. ICCBR 2005. Lecture Notes in Computer Science, vol 3620. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11536406_17
DOI: https://doi.org/10.1007/11536406_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28174-0
Online ISBN: 978-3-540-31855-2
eBook Packages: Computer Science (R0)