Improving Gene Selection in Microarray Data Analysis Using Fuzzy Patterns Inside a CBR System

  • Florentino Fdez-Riverola
  • Fernando Díaz
  • M. Lourdes Borrajo
  • J. Carlos Yáñez
  • Juan M. Corchado
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3620)


In recent years, machine learning and data mining have found a successful application area in DNA microarray technology. A gene expression profile measures thousands of genes at the same time and reflects the complex relationships between them. A well-known constraint of microarray data is the large number of genes compared with the small number of available experiments or cases. In this context, an accurate gene selection strategy is crucial for reducing the generalization error (false positives) of state-of-the-art classification algorithms. This paper presents a reduction algorithm based on the notion of fuzzy gene expression, in which similar (co-expressed) genes belonging to different patients are selected to construct a supervised prototype-based retrieval model. The technique implements the retrieval step of our new gene-CBR system. The proposed method is illustrated with the analysis of microarray data from bone marrow cases of 43 adult patients with cancer, plus a group of three cases from healthy persons.
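The abstract does not spell out the reduction algorithm, so the following is only a rough Python sketch of one way the idea of fuzzy gene expression could be turned into a gene filter. Everything in it is an assumption made for illustration: the three linguistic labels (LOW, MEDIUM, HIGH), the piecewise-linear membership functions, the cut points, the thresholds `theta` and `pi`, the function names, and the toy data. It should not be read as the authors' method.

```python
from collections import Counter

def memberships(x, low, mid, high):
    """Membership degrees of x in LOW, MEDIUM, HIGH (assumed piecewise-linear shapes)."""
    if x <= low:
        lo = 1.0
    elif x >= mid:
        lo = 0.0
    else:
        lo = (mid - x) / (mid - low)
    if x >= high:
        hi = 1.0
    elif x <= mid:
        hi = 0.0
    else:
        hi = (x - mid) / (high - mid)
    # The three degrees are built to sum to 1, so MEDIUM is the remainder.
    return {"LOW": lo, "MEDIUM": max(0.0, 1.0 - lo - hi), "HIGH": hi}

def fuzzy_label(x, low, mid, high, theta=0.7):
    """Best-matching label, or None when no membership degree reaches theta."""
    label, degree = max(memberships(x, low, mid, high).items(), key=lambda kv: kv[1])
    return label if degree >= theta else None

def fuzzy_pattern(samples, cuts, theta=0.7, pi=0.75):
    """Keep, per gene, the label shared by at least a fraction pi of the samples."""
    pattern = {}
    for gene, (low, mid, high) in cuts.items():
        labels = [fuzzy_label(s[gene], low, mid, high, theta) for s in samples]
        counts = Counter(label for label in labels if label is not None)
        if counts:
            label, n = counts.most_common(1)[0]
            if n / len(samples) >= pi:
                pattern[gene] = label
    return pattern

def discriminant_genes(patterns):
    """Genes whose label differs between class patterns or is missing from one of them."""
    genes = set()
    for pattern in patterns.values():
        genes.update(pattern)
    return [g for g in sorted(genes)
            if len({p.get(g) for p in patterns.values()}) > 1]

# Hypothetical cut points (low, mid, high) and toy expression values.
cuts = {"g1": (0.0, 5.0, 10.0), "g2": (0.0, 5.0, 10.0), "g3": (0.0, 5.0, 10.0)}
tumour = [
    {"g1": 9.5, "g2": 5.1, "g3": 1.0},
    {"g1": 9.8, "g2": 4.9, "g3": 9.0},
    {"g1": 9.0, "g2": 5.2, "g3": 5.0},
    {"g1": 9.9, "g2": 5.0, "g3": 2.5},
]
healthy = [
    {"g1": 0.5, "g2": 5.0, "g3": 0.2},
    {"g1": 1.0, "g2": 4.8, "g3": 0.4},
    {"g1": 0.8, "g2": 5.3, "g3": 0.1},
]
patterns = {"tumour": fuzzy_pattern(tumour, cuts),
            "healthy": fuzzy_pattern(healthy, cuts)}
print(discriminant_genes(patterns))  # ['g1', 'g3']
```

In this toy run, g2 is dropped because both classes share the label MEDIUM, while g1 and g3 are kept because their class patterns differ. Treating a gene that appears in only one class pattern as discriminant is a design choice of this sketch.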


Membership Function; Case Base Reasoning; Microarray Data Analysis; Soft Computing Technique; Linguistic Label
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Florentino Fdez-Riverola (1)
  • Fernando Díaz (2)
  • M. Lourdes Borrajo (1)
  • J. Carlos Yáñez (3)
  • Juan M. Corchado (4)

  1. Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Ourense, Spain
  2. Dept. Informática, University of Valladolid, Escuela Universitaria de Informática, Segovia, Spain
  3. Dept. of Financial Accounting, University of Vigo, Ourense, Spain
  4. Dept. de Informática y Automática, University of Salamanca, Salamanca, Spain
