CompostBin: A DNA Composition-Based Algorithm for Binning Environmental Shotgun Reads

  • Sourav Chatterji
  • Ichitaro Yamazaki
  • Zhaojun Bai
  • Jonathan A. Eisen
Conference paper

DOI: 10.1007/978-3-540-78839-3_3

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4955)
Cite this paper as:
Chatterji S., Yamazaki I., Bai Z., Eisen J.A. (2008) CompostBin: A DNA Composition-Based Algorithm for Binning Environmental Shotgun Reads. In: Vingron M., Wong L. (eds) Research in Computational Molecular Biology. RECOMB 2008. Lecture Notes in Computer Science, vol 4955. Springer, Berlin, Heidelberg

Abstract

A major hindrance to studies of microbial diversity has been that the vast majority of microbes cannot be cultured in the laboratory and thus are not amenable to traditional methods of characterization. Environmental shotgun sequencing (ESS) overcomes this hurdle by sequencing the DNA from the organisms present in a microbial community. The interpretation of this metagenomic data can be greatly facilitated by associating every sequence read with its source organism. We report the development of CompostBin, a DNA composition-based algorithm for analyzing metagenomic sequence reads and distributing them into taxon-specific bins. Unlike previous methods that seek to bin assembled contigs and often require training on known reference genomes, CompostBin has the ability to accurately bin raw sequence reads without need for assembly or training. CompostBin uses a novel weighted PCA algorithm to project the high dimensional DNA composition data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm on this filtered data set to classify sequences into taxon-specific bins. We demonstrate the algorithm’s accuracy on a variety of low to medium complexity data sets.
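The pipeline described in the abstract (composition features, a weighted PCA projection, then normalized cut clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the k-mer length, the weighting scheme, or the similarity measure, so the function names, the default `k`, the caller-supplied `weights`, the Gaussian affinity, and the median-based `sigma` below are all assumptions chosen for illustration. With uniform weights the projection reduces to ordinary PCA, and the clustering step performs only a single two-way spectral split.

```python
import itertools
import numpy as np

def kmer_frequencies(read, k=4):
    """Normalized k-mer frequency vector of one read (length 4**k).
    The value of k is an illustrative assumption."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product("ACGT", repeat=k))}
    counts = np.zeros(len(index))
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])
        if j is not None:  # skip k-mers containing N or other ambiguity codes
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts

def weighted_pca(X, weights, n_components=3):
    """Project rows of X onto the leading eigenvectors of a weighted covariance.
    The weights are supplied by the caller (hypothetical placeholder for the
    paper's weighting); uniform weights give ordinary PCA."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mean = w @ X                       # weighted mean of the feature vectors
    Xc = X - mean
    cov = (Xc * w[:, None]).T @ Xc     # weighted covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return Xc @ top

def normalized_cut_bisect(Y, sigma=None):
    """Two-way normalized cut relaxation: split points by the sign of the
    second-smallest generalized eigenvector of (D - W) y = lambda * D y."""
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]))  # heuristic bandwidth (assumption)
    W = np.exp(-d2 / (2 * sigma ** 2))          # Gaussian affinity (assumption)
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(Y)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)             # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]           # map back to generalized problem
    return (fiedler > 0).astype(int)
```

A typical call sequence would build a feature matrix with one `kmer_frequencies` row per read, pass it through `weighted_pca`, and hand the projected points to `normalized_cut_bisect`; binning into more than two taxa would require applying the split recursively or using several eigenvectors.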

Keywords

Metagenomics; Binning; Feature Extraction; Normalized Cut; weighted PCA; DNA composition metrics; Genome Signatures

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Sourav Chatterji (1)
  • Ichitaro Yamazaki (2)
  • Zhaojun Bai (2)
  • Jonathan A. Eisen (1, 3)

  1. Genome Center, UC Davis, Davis, USA
  2. Computer Science Department, UC Davis, Davis, USA
  3. The Joint Genome Institute, Walnut Creek, USA