Ontology-Driven Co-clustering of Gene Expression Data

  • Francesca Cordero
  • Ruggero G. Pensa
  • Alessia Visconti
  • Dino Ienco
  • Marco Botta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5883)

Abstract

The huge volume of gene expression data produced by microarrays and other high-throughput techniques has encouraged the development of new computational techniques to evaluate the data and to formulate new biological hypotheses. To this end, co-clustering techniques are widely used: they identify groups of genes that show similar activity patterns under a specific subset of the experimental conditions by measuring the similarity in expression within these groups. However, in many applications, distance metrics based only on expression levels fail to capture biologically meaningful clusters.
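As an illustration of what "measuring the similarity in expression within these groups" can look like, the sketch below computes the mean squared residue of a co-cluster, a coherence measure commonly used in the biclustering literature. It is a minimal sketch and not necessarily the objective used in this paper; the function name `mean_squared_residue` and the toy matrices are assumptions made for the example.

```python
from typing import List, Sequence


def mean_squared_residue(
    matrix: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
) -> float:
    """Mean squared residue of the co-cluster given by `rows` (genes) and `cols` (conditions).

    Hypothetical helper for illustration. A value of 0 means the selected genes
    follow a perfectly additive pattern across the selected conditions.
    """
    # Extract the submatrix for the selected genes and conditions.
    sub: List[List[float]] = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing row, column and overall effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy data (invented): three genes whose profiles differ only by a constant shift.
coherent = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
# Toy data (invented): two genes with opposite behaviour across two conditions.
incoherent = [[1.0, 0.0], [0.0, 1.0]]
```

On the toy data, the shifted profiles in `coherent` give a residue of (numerically) zero, while the opposite profiles in `incoherent` give a strictly positive value, which is the kind of signal an expression-only co-clustering objective relies on.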

We propose a methodology in which a standard expression-based co-clustering algorithm is enhanced by sets of constraints that take into account the similarity or dissimilarity, inferred from the Gene Ontology (GO), between pairs of genes. Our approach minimizes the intervention of the analyst in the co-clustering process. It provides meaningful co-clusters whose discovery and interpretation are made easier by embedding GO annotations.
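One way to picture how pairwise constraints might be derived from GO annotations is sketched below. This is an assumption-laden illustration, not the procedure of the paper: the Jaccard overlap of annotation sets stands in for whatever GO-based similarity the authors use, and the function name `go_constraints`, both thresholds, and the gene names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def go_constraints(
    annotations: Dict[str, Set[str]],
    must_threshold: float = 0.5,    # assumed cut-off, not taken from the paper
    cannot_threshold: float = 0.0,  # assumed cut-off, not taken from the paper
) -> Tuple[List[Pair], List[Pair]]:
    """Derive must-link and cannot-link gene pairs from GO annotation overlap.

    Similarity here is the Jaccard index of the two annotation sets, a simple
    stand-in chosen for illustration only.
    """
    must_link: List[Pair] = []
    cannot_link: List[Pair] = []
    for gene_a, gene_b in combinations(sorted(annotations), 2):
        terms_a, terms_b = annotations[gene_a], annotations[gene_b]
        if not terms_a or not terms_b:
            # An unannotated gene gives no evidence either way: leave it unconstrained.
            continue
        similarity = len(terms_a & terms_b) / len(terms_a | terms_b)
        if similarity >= must_threshold:
            must_link.append((gene_a, gene_b))
        elif similarity <= cannot_threshold:
            cannot_link.append((gene_a, gene_b))
    return must_link, cannot_link


# Invented example: gene names are placeholders, GO identifiers are illustrative.
example_annotations = {
    "geneA": {"GO:0006412", "GO:0003735"},
    "geneB": {"GO:0006412", "GO:0003735"},
    "geneC": {"GO:0006281"},
    "geneD": set(),
}
must, cannot = go_constraints(example_annotations)
```

In this sketch the constraints are computed once from the annotations, before any clustering, so the analyst does not have to specify gene pairs by hand; a constrained co-clustering algorithm would then take `must` and `cannot` as additional input.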

Keywords

Gene Ontology; Root Mean Square Error; Gene Expression Data; Transitive Closure; Normalize Mutual Information
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Francesca Cordero (1, 2, 3)
  • Ruggero G. Pensa (2)
  • Alessia Visconti (2, 3)
  • Dino Ienco (2, 3)
  • Marco Botta (2, 3)

  1. Department of Clinical and Biological Sciences, University of Torino
  2. Department of Computer Science, University of Torino
  3. Center for Complex Systems in Molecular Biology and Medicine - SysBioM, University of Torino
