Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information

  • Saket Navlakha
  • James White
  • Niranjan Nagarajan
  • Mihai Pop
  • Carl Kingsford
Conference paper

DOI: 10.1007/978-3-642-02008-7_29

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5541)
Cite this paper as:
Navlakha S., White J., Nagarajan N., Pop M., Kingsford C. (2009) Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information. In: Batzoglou S. (eds) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg

Abstract

Hierarchical clustering is a popular method for grouping together similar elements based on a distance measure between them. In many cases, annotations for some elements are known beforehand, which can aid the clustering process. We present a novel approach for decomposing a hierarchical clustering into the clusters that optimally match a set of known annotations, as measured by the variation of information metric. Our approach is general and does not require the user to enter the number of clusters desired. We apply it to two biological domains: finding protein complexes within protein interaction networks and identifying species within metagenomic DNA samples. For these two applications, we test the quality of our clusters by using them to predict complex and species membership, respectively. We find that our approach generally outperforms the commonly used heuristic methods.
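For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of the standard variation of information between two clusterings of the same elements, VI(A, B) = H(A|B) + H(B|A). It is not the authors' tree-decomposition algorithm, which this page does not describe. The function name `variation_of_information`, the flat label-list input, and the use of natural logarithms are assumptions made for illustration.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Return VI(A, B) = H(A|B) + H(B|A) in nats.

    labels_a[i] and labels_b[i] are the cluster labels that the two
    clusterings assign to element i (hypothetical input format).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same elements")
    n = len(labels_a)
    if n == 0:
        return 0.0

    size_a = Counter(labels_a)               # cluster sizes in A
    size_b = Counter(labels_b)               # cluster sizes in B
    joint = Counter(zip(labels_a, labels_b))  # sizes of pairwise overlaps

    vi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * [log(p(a) / p(a,b)) + log(p(b) / p(a,b))]
        vi += (n_ab / n) * (log(size_a[a] / n_ab) + log(size_b[b] / n_ab))
    return vi
```

A value of 0 means the two clusterings are identical up to relabeling; larger values mean they disagree more. In the setting the abstract describes, one label list would come from a candidate cut of the hierarchical tree and the other from the known annotations.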

Keywords

Hierarchical Tree Decompositions; Variation of Information; Clustering; Protein Interaction Networks; Metagenomics; OTUs


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Saket Navlakha (1, 2)
  • James White (2)
  • Niranjan Nagarajan (1, 2)
  • Mihai Pop (1, 2)
  • Carl Kingsford (1, 2)

  1. Department of Computer Science
  2. Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland
