A Scalable Reference-Free Metagenomic Binning Pipeline

  • Terry Ma
  • Xin Xing
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10847)


Metagenomics studies the microbial genomes in an ecosystem, such as the human gastrointestinal tract, by sequencing thousands of organisms in parallel. The sheer number of genomic fragments is challenging for current metagenomic binning software to process. Here we present a scalable, reference-free metagenomic binning pipeline designed to handle large-scale metagenomic data. It accepts several terabase pairs (TB) of reads as input and produces highly accurate binning results, even at the species level. The pipeline outputs all binned species across multiple metagenomic samples, together with their estimated relative abundances. We integrate the pipeline into an open-source software package, MetaMat, which is freely available at:


Metagenomic binning, Parallel computing, Disease diagnosis



This research was supported in part by the National Institutes of Health grant R01 GM113242-01 and the National Science Foundation grants DMS-1440038 and DMS-1440037.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Lambert High School, Suwanee, USA
  2. University of Georgia, Athens, USA
