Clustering Metagenome Short Reads Using Weighted Proteins

  • Gianluigi Folino
  • Fabio Gori
  • Mike S. M. Jetten
  • Elena Marchiori
Conference paper

DOI: 10.1007/978-3-642-01184-9_14

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5483)
Cite this paper as:
Folino G., Gori F., Jetten M.S.M., Marchiori E. (2009) Clustering Metagenome Short Reads Using Weighted Proteins. In: Pizzuti C., Ritchie M.D., Giacobini M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2009. Lecture Notes in Computer Science, vol 5483. Springer, Berlin, Heidelberg

Abstract

This paper proposes a new knowledge-based method for clustering metagenome short reads. The method incorporates biological knowledge into the clustering process by means of a list of proteins associated with each read. These proteins are chosen from a reference proteome database according to their similarity to the given read, as evaluated by BLAST. We introduce a scoring function for weighting the resulting proteins and use the weighted proteins to cluster the reads. The resulting clustering algorithm automatically selects the number of clusters and generates possibly overlapping clusters of reads. Experiments on real-life benchmark datasets show the effectiveness of the method for reducing the size of a metagenome dataset while maintaining a high accuracy of organism content.
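
The abstract does not give the scoring function or the selection procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each read comes with a list of BLAST hits as (protein, bit score) pairs, that a protein's weight is the sum of its bit scores over all reads, and that clusters are picked greedily by weight until every read with a hit is covered. The names `weight_proteins` and `cluster_reads` and the toy data are hypothetical.

```python
from collections import defaultdict


def weight_proteins(hits):
    """Assumed weighting: sum each protein's bit scores over all reads.

    hits maps a read id to a list of (protein_id, bit_score) pairs.
    """
    weights = defaultdict(float)
    for read_hits in hits.values():
        for protein, score in read_hits:
            weights[protein] += score
    return dict(weights)


def cluster_reads(hits):
    """Greedy sketch: one cluster per selected protein, clusters may overlap."""
    weights = weight_proteins(hits)

    # Collect, for every protein, the set of reads that hit it.
    members = defaultdict(set)
    for read, read_hits in hits.items():
        for protein, _ in read_hits:
            members[protein].add(read)

    # Reads without any hit cannot be assigned and are left out.
    uncovered = {read for read, read_hits in hits.items() if read_hits}

    clusters = {}
    # Visit proteins from heaviest to lightest; ties broken by protein id.
    for protein in sorted(weights, key=lambda p: (-weights[p], p)):
        if not uncovered:
            break
        if members[protein] & uncovered:
            clusters[protein] = set(members[protein])
            uncovered -= members[protein]
    return clusters


# Toy input (made up for illustration): read id -> BLAST hits.
example_hits = {
    "r1": [("P1", 50.0), ("P2", 30.0)],
    "r2": [("P1", 40.0)],
    "r3": [("P2", 45.0), ("P3", 20.0)],
    "r4": [],
}
example_clusters = cluster_reads(example_hits)
```

In this toy run the number of clusters is not fixed in advance: it is whatever number of proteins is needed to cover the reads. Read `r1` ends up in two clusters, which mirrors the overlapping clusters mentioned in the abstract.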

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Gianluigi Folino (1)
  • Fabio Gori (2)
  • Mike S. M. Jetten (2)
  • Elena Marchiori (2)
  1. ICAR-CNR, Rende, Italy
  2. Radboud University, Nijmegen, The Netherlands
