Gene Expression Data Mining for Functional Genomics using Fuzzy Technology

  • Reinhard Guthke
  • Wolfgang Schmidt-Heck
  • Daniel Hahn
  • Michael Pfaff
Part of the International Series in Intelligent Technologies book series (ISIT, volume 18)


Methods for supervised and unsupervised clustering and machine learning were studied in order to automatically model relationships between gene expression data and gene functions of the microorganism Escherichia coli. For a pre-selected subset of 265 genes belonging to 3 functional groups, the function was predicted with an accuracy of 63–71% by the various data mining methods described in this paper. Some of these methods, namely K-means clustering, Kohonen’s self-organizing maps (SOM), Eisen’s hierarchical clustering and Quinlan’s C4.5 decision tree induction algorithm, have already been applied to gene expression data analysis in the literature, whereas the fuzzy approach to gene expression data analysis is introduced by the authors. The fuzzy-C-means algorithm (FCM) and the Gustafson-Kessel algorithm for unsupervised clustering, as well as the Adaptive Neuro-Fuzzy Inference System (ANFIS), were successfully applied to the functional classification of E. coli genes.
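
To make the fuzzy-C-means idea named above more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes Euclidean distance, a fuzzifier of m = 2 and random initial memberships. The function name `fuzzy_c_means`, its parameters and the toy profiles are hypothetical and chosen only for illustration; they are not data from the chapter.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=200, tol=1e-9, seed=0):
    """Sketch of fuzzy C-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by u_ik ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Memberships: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - c[d]) ** 2 for d in range(dim)) ** 0.5,
                    1e-12)  # guard against division by zero
                for c in centers
            ]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Made-up two-dimensional "expression profiles" forming two obvious groups.
profiles = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, memberships = fuzzy_c_means(profiles, n_clusters=2)

# A hard assignment takes the cluster with the largest membership.
labels = [row.index(max(row)) for row in memberships]
```

Unlike K-means, each profile receives a graded membership in every cluster, and a hard label is derived only at the end by taking the largest membership. The Gustafson-Kessel algorithm mentioned above differs in that it adapts a covariance-based distance for each cluster, which this sketch does not attempt.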

Key words

fuzzy clustering, machine learning, biotechnology, DNA chips, microarrays, Escherichia coli


References


  1. Bezdek, J.C., Pattern Recognition with Fuzzy Objective Function Algorithms, N.Y.: Plenum, 1981.
  2. Blattner, F.R., Plunkett, G., Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F., Gregor, J., Davis, N.W., Kirkpatrick, H.A., Goeden, M.A., Rose, D.J., Mau, B., Shao, Y., The complete genome sequence of Escherichia coli K-12, Science 1997; 277: 1453–1474.
  3. Brazma, A., Mining the yeast genome expression and sequence data, 1999.
  4. Eisen, M.B., Spellman, M.B., Brown, P.O., Botstein, D., Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. 1998; 95: 14863–14868.
  5. Gustafson, E.E., Kessel, W.C., Fuzzy Clustering with a Fuzzy Covariance Matrix, IEEE CDC, San Diego, California, 1979, 761–766.
  6. Guthke, R., Schroeckh, V., Berkholz, R., Pfaff, M., Data Mining and Model based Experimental Design of Fed Batch Fermentations for Bioprocess Optimization and Functional Genomics. Proceedings of the Conference on Data Mining in Bioinformatics; 1999, November 10–12; EMBL European Bioinformatics Institute, Hinxton/Cambridge/UK.
  7. Guthke, R., Hahn, D., Fahnert, B., Kroll, T., Wölfl, S., Gene Expression Data Mining by Fuzzy-C-Means Clustering and Fuzzy Rule Generation. Proceedings of the 11th International Biotechnology Symposium; 2000 September 3–8, Berlin.
  8. Henrion, R., Henrion, G., Multivariate Datenanalyse, Berlin: Springer-Verlag, 1995.
  9. Jang, J.-S.R., Fuzzy Modeling Using Generalized Neural Networks and Kalman Filter Algorithms. Proceedings of the 9th National Conference on Artificial Intelligence (AAAI-91); 1991; 762–767.
  10. Jang, J.-S.R., ANFIS: Adaptive-Network-based Fuzzy Inference Systems, IEEE Transactions on Systems, Man and Cybernetics 1993; 23: 665–685.
  11. Kohonen, T., Self-Organization and Associative Memory. 2nd Edition, Berlin: Springer-Verlag, 1987.
  12. MacQueen, J., Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Math. Stat. Prob., 1965/66. Le Cam, L.M. and Neyman, J. eds. Berkeley 1967, 1: 281–297.
  13. Quinlan, J.R., C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
  14. Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, S., Lander, E.S., Golub, T.R., Interpreting patterns of gene expression with self-organizing maps: Methods & application to hematopoietic differentiation. Proc. Natl. Acad. Sci. 1999; 96: 2907–2912.
  15. Tao, H., Bausch, C., Richmond, C., Blattner, F.R., Conway, T., Functional Genomics: Expression Analysis of Escherichia coli growing on minimal and rich media. J. Bacteriol. 1999; 181: 6425–6440.

Copyright information

© Springer Science+Business Media New York 2002

Authors and Affiliations

  • Reinhard Guthke (1)
  • Wolfgang Schmidt-Heck (1)
  • Daniel Hahn (1)
  • Michael Pfaff (2)

  1. Hans Knöll Institute for Natural Products Research, Jena, Germany
  2. BioControl Jena GmbH, Jena, Germany
