MetaFlow: Metagenomic Profiling Based on Whole-Genome Coverage Analysis with Min-Cost Flows

  • Ahmed Sobih
  • Alexandru I. Tomescu
  • Veli Mäkinen
Conference paper

DOI: 10.1007/978-3-319-31957-5_8

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9649)
Cite this paper as:
Sobih A., Tomescu A.I., Mäkinen V. (2016) MetaFlow: Metagenomic Profiling Based on Whole-Genome Coverage Analysis with Min-Cost Flows. In: Singh M. (eds) Research in Computational Molecular Biology. RECOMB 2016. Lecture Notes in Computer Science, vol 9649. Springer, Cham

Abstract

High-throughput sequencing (HTS) of metagenomes is proving essential in understanding the environment and diseases. State-of-the-art methods for discovering the species and their abundances in an HTS sample are based on genome-specific markers, which can lead to skewed results, especially at species level. We present MetaFlow, the first method based on coverage analysis across entire genomes that also scales to HTS samples. We formulated this problem as an NP-hard matching problem in a bipartite graph, which we solved in practice by min-cost flows. On synthetic data sets of varying complexity and similarity, MetaFlow is more precise and sensitive than popular tools such as MetaPhlAn, mOTU, GSMer and BLAST, and its abundance estimations at species level are two to four times better in terms of \(\ell_1\)-norm. On a real human stool data set, MetaFlow identifies B. uniformis as most predominant, in line with previous human gut studies, whereas marker-based methods report it as rare. MetaFlow is freely available at http://cs.helsinki.fi/gsa/metaflow.
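The \(\ell_1\)-norm comparison mentioned above can be illustrated with a minimal sketch. It assumes that abundance profiles are given as mappings from species name to relative abundance and that a species missing from a profile counts as zero. The function name `l1_distance` and the example values are hypothetical and are not taken from the paper or from the MetaFlow software.

```python
def l1_distance(true_abundance, estimated_abundance):
    """Sum of absolute abundance differences over the union of species.

    Both arguments map a species name to its relative abundance.
    A species absent from one profile is treated as abundance 0.0
    (an assumption made for this illustration).
    """
    species = set(true_abundance) | set(estimated_abundance)
    return sum(
        abs(true_abundance.get(s, 0.0) - estimated_abundance.get(s, 0.0))
        for s in species
    )


# Hypothetical relative abundances, invented for illustration only.
truth = {"species_a": 0.5, "species_b": 0.3, "species_c": 0.2}
estimate = {"species_a": 0.4, "species_b": 0.3, "species_d": 0.3}

# Differences: 0.1 (a) + 0.0 (b) + 0.2 (c missed) + 0.3 (d spurious)
error = l1_distance(truth, estimate)
print(f"l1 error: {error:.2f}")
```

Under this measure a smaller value means a closer estimate, and both missed species and spurious species contribute to the error.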

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ahmed Sobih (1)
  • Alexandru I. Tomescu (1)
  • Veli Mäkinen (1)
  1. Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
