A Knowledge-Driven Method to Evaluate Multi-source Clustering

  • Chengyong Yang
  • Erliang Zeng
  • Tao Li
  • Giri Narasimhan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3759)

Abstract

Recent research has demonstrated that biological literature can complement the information extracted from gene expression data to yield better gene clusters. The Multi-Source Clustering (MSC) algorithm, recently proposed by the authors, performs semantic integration of information obtained from gene expression data and biomedical text literature. To address the challenge of evaluating clustering results, we propose a new knowledge-driven approach based on information extracted from a database of published binding sites of known transcription factors (TFs). We propose the C-index as a measure for objective, quantitative evaluation. We compare the results of MSC on the integrated data sources with the results obtained by (a) clustering the gene expression data alone, (b) clustering the text data alone, and (c) clustering after feature-level integration. We show that the C-index measurements of the clustering results from MSC are better than those from the other three approaches.
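To make the evaluation measure concrete, the following is a minimal sketch of the C-index in its standard Hubert-Levin form, C = (S - S_min) / (S_max - S_min). Here S is the sum of distances over all within-cluster pairs, and S_min and S_max are the sums of the same number of smallest and largest pairwise distances overall. This is an illustration under assumptions, not the authors' implementation: the function name `c_index` and the toy distance matrix are hypothetical, and the way the paper derives gene-to-gene distances from the transcription factor binding-site database is not described in this abstract. In this standard form, values closer to 0 indicate better clusterings.

```python
from itertools import combinations


def c_index(distances, labels):
    """Standard Hubert-Levin C-index (0 = best, 1 = worst).

    distances: symmetric matrix (list of lists) of pairwise distances.
    labels: cluster label for each item, in the same order as the matrix.
    """
    n = len(labels)
    pairs = list(combinations(range(n), 2))

    # All pairwise distances, sorted ascending.
    all_d = sorted(distances[i][j] for i, j in pairs)

    # Distances between items assigned to the same cluster.
    within = [distances[i][j] for i, j in pairs if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0 or n_w == len(all_d):
        raise ValueError("need both within-cluster and between-cluster pairs")

    s = sum(within)
    s_min = sum(all_d[:n_w])   # best case: the n_w smallest distances
    s_max = sum(all_d[-n_w:])  # worst case: the n_w largest distances
    if s_max == s_min:
        raise ValueError("C-index is undefined when all distances are equal")
    return (s - s_min) / (s_max - s_min)


# Toy data (hypothetical): items 0 and 1 are close, items 2 and 3 are close.
toy = [
    [0, 1, 5, 5],
    [1, 0, 5, 5],
    [5, 5, 0, 1],
    [5, 5, 1, 0],
]
good = c_index(toy, [0, 0, 1, 1])  # clusters match the close pairs
bad = c_index(toy, [0, 1, 0, 1])   # clusters split the close pairs
```

With this toy matrix, the clustering that groups the close pairs scores 0.0 and the one that splits them scores 1.0, so a lower C-index corresponds to the clustering that agrees better with the reference distances.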

Keywords

  • Gene Expression Data
  • Text Data
  • Cluster Assignment
  • Semantic Integration
  • MEDLINE Abstract

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Chengyong Yang (1)
  • Erliang Zeng (1)
  • Tao Li (1)
  • Giri Narasimhan (1)
  1. Bioinformatics Research Group (BioRG), School of Computer Science, Florida International University, Miami, USA
