MetaFlow: Metagenomic Profiling Based on Whole-Genome Coverage Analysis with Min-Cost Flows

  • Ahmed Sobih
  • Alexandru I. Tomescu
  • Veli Mäkinen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9649)


High-throughput sequencing (HTS) of metagenomes is proving essential in understanding the environment and diseases. State-of-the-art methods for discovering the species and their abundances in an HTS sample are based on genome-specific markers, which can lead to skewed results, especially at the species level. We present MetaFlow, the first method based on coverage analysis across entire genomes that also scales to HTS samples. We formulate this problem as an NP-hard matching problem in a bipartite graph, which we solve in practice with min-cost flows. On synthetic data sets of varying complexity and similarity, MetaFlow is more precise and sensitive than popular tools such as MetaPhlAn, mOTU, GSMer and BLAST, and its abundance estimations at the species level are two to four times better in terms of the \(\ell_1\)-norm. On a real human stool data set, MetaFlow identifies B. uniformis as the most predominant species, in line with previous human gut studies, whereas marker-based methods report it as rare. MetaFlow is freely available at
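To make the \(\ell_1\)-norm comparison concrete, the following is a minimal sketch of how the distance between a predicted and a true species-abundance profile could be computed. The function name `l1_distance`, the dictionary representation, and the example species and values are illustrative assumptions; the abstract does not specify the exact evaluation protocol.

```python
def l1_distance(predicted, truth):
    """Return the l1 distance between two abundance profiles.

    Each profile is a dict mapping a species name to its relative
    abundance. A species missing from a profile counts as abundance 0.
    (Hypothetical helper, not taken from the MetaFlow software.)
    """
    species = set(predicted) | set(truth)
    return sum(abs(predicted.get(s, 0.0) - truth.get(s, 0.0)) for s in species)


# Made-up profiles: species_C is missed, species_D is a false positive.
truth = {"species_A": 0.5, "species_B": 0.3, "species_C": 0.2}
predicted = {"species_A": 0.4, "species_B": 0.3, "species_D": 0.3}

print(l1_distance(predicted, truth))
```

Under this measure, a lower value means the estimated abundances are closer to the true ones, and both missed species and false positives add to the error.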



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ahmed Sobih (1)
  • Alexandru I. Tomescu (1)
  • Veli Mäkinen (1)
  1. Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
