MetaFlow: Metagenomic Profiling Based on Whole-Genome Coverage Analysis with Min-Cost Flows

  • Ahmed Sobih
  • Alexandru I. Tomescu
  • Veli Mäkinen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9649)


High-throughput sequencing (HTS) of metagenomes is proving essential in understanding the environment and diseases. State-of-the-art methods for discovering the species and their abundances in an HTS sample are based on genome-specific markers, which can lead to skewed results, especially at species level. We present MetaFlow, the first method based on coverage analysis across entire genomes that also scales to HTS samples. We formulate this problem as an NP-hard matching problem in a bipartite graph, which we solve in practice by min-cost flows. On synthetic data sets of varying complexity and similarity, MetaFlow is more precise and sensitive than popular tools such as MetaPhlAn, mOTU, GSMer and BLAST, and its abundance estimations at species level are two to four times better in terms of \(\ell _1\)-norm. On a real human stool data set, MetaFlow identifies B. uniformis as the most predominant species, in line with previous human gut studies, whereas marker-based methods report it as rare. MetaFlow is freely available at
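As a rough illustration of the \(\ell _1\)-norm comparison mentioned above, the following sketch computes the sum of absolute differences between a reference abundance profile and an estimated one. The function name `l1_distance`, the species labels, and the abundance values are all hypothetical and chosen only for illustration; the exact evaluation protocol used in the paper may differ.

```python
def l1_distance(true_abund, est_abund):
    """Sum of absolute abundance differences over the union of species.

    A species missing from one profile is treated as having abundance 0.
    """
    species = set(true_abund) | set(est_abund)
    return sum(
        abs(true_abund.get(s, 0.0) - est_abund.get(s, 0.0)) for s in species
    )


# Hypothetical relative abundances (fractions summing to 1), not real data.
truth = {"species_A": 0.5, "species_B": 0.3, "species_C": 0.2}
estimate = {"species_A": 0.4, "species_B": 0.3, "species_D": 0.3}

error = l1_distance(truth, estimate)
print(f"l1 error: {error:.2f}")
```

A smaller value means the estimated profile is closer to the reference; a missed species and a falsely reported species both contribute their full abundance to the error.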


Reference Genome, Bipartite Graph, Abundance Estimation, Full Version, Read Coverage
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We thank Romeo Rizzi for discussions about the computational complexity of our problem. This work was partially supported by the Academy of Finland under grants 284598 (CoECGR) to A.S. and V.M. and 274977 to A.T.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ahmed Sobih (1)
  • Alexandru I. Tomescu (1)
  • Veli Mäkinen (1)
  1. Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
