EMDUniFrac: exact linear time computation of the UniFrac metric and identification of differentially abundant organisms

Journal of Mathematical Biology

Abstract

Both the weighted and unweighted UniFrac distances have been successfully employed to assess whether two communities differ, but they give no information about how the communities differ. We take advantage of recent observations that the UniFrac metric is equivalent to the so-called earth mover’s distance (also known as the Kantorovich–Rubinstein metric) to develop an algorithm that not only computes the UniFrac distance in linear time and space, but also simultaneously identifies which operational taxonomic units are responsible for the observed differences between samples. This allows the algorithm, called EMDUniFrac, to determine why given samples are different, not just whether they are different, with no added computational burden. EMDUniFrac can be applied to any distribution on a tree, and so is particularly well suited to analyzing both operational taxonomic units derived from amplicon sequencing and community profiles resulting from classifying whole genome shotgun metagenomes. The EMDUniFrac source code (written in Python) is freely available at https://github.com/dkoslicki/EMDUniFrac.
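To make the idea in the abstract concrete, the following is a minimal sketch of how an earth mover's distance between two distributions on a rooted tree can be computed in a single bottom-up pass. It is not taken from the EMDUniFrac repository. The function name `emd_on_tree`, the parent-array tree encoding, and the toy data are all assumptions made for illustration. The sketch relies on the standard fact that, on a tree, the mass that must cross an edge equals the net difference between the two distributions in the subtree below that edge, so the distance is the sum over edges of branch length times the absolute net difference. The signed per-edge terms indicate which parts of the tree account for the difference.

```python
from typing import Dict, List, Tuple


def emd_on_tree(
    parent: List[int],
    branch_len: List[float],
    p: List[float],
    q: List[float],
) -> Tuple[float, Dict[int, float]]:
    """Earth mover's distance between distributions p and q on a rooted tree.

    Assumed (hypothetical) encoding: parent[i] is the index of node i's
    parent, or -1 for the root, and every child index is smaller than its
    parent's index. branch_len[i] is the length of the edge from node i to
    its parent (ignored for the root).

    Returns the distance and a dict mapping node i to the signed
    contribution of the edge above i (positive where p has excess mass).
    """
    n = len(parent)
    # Net mass at each node; children are folded into parents as we go.
    net = [p[i] - q[i] for i in range(n)]
    contributions: Dict[int, float] = {}
    distance = 0.0
    for i in range(n):  # children are visited before their parents
        par = parent[i]
        if par < 0:
            continue  # the root has no edge above it
        flow = net[i]  # net mass that must cross the edge above node i
        if flow != 0.0:
            contributions[i] = branch_len[i] * flow
        distance += branch_len[i] * abs(flow)
        net[par] += flow  # push the imbalance up toward the root
    return distance, contributions


# Toy tree (illustrative only): leaves 0 and 1 under node 2;
# node 2 and leaf 3 under root 4.
parent = [2, 2, 4, 4, -1]
branch_len = [1.0, 1.0, 0.5, 2.0, 0.0]
sample_p = [0.5, 0.5, 0.0, 0.0, 0.0]
sample_q = [0.0, 0.5, 0.0, 0.5, 0.0]

distance, contributions = emd_on_tree(parent, branch_len, sample_p, sample_q)
print(distance)
print(contributions)
```

In the toy example, half of the mass must move from leaf 0 to leaf 3 along a path of total length 3.5, and the per-edge terms show which branches carry that movement. Each node is visited once, so the running time of this sketch is linear in the number of nodes.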

Author information

Corresponding author

Correspondence to Jason McClelland.

About this article

Cite this article

McClelland, J., Koslicki, D. EMDUniFrac: exact linear time computation of the UniFrac metric and identification of differentially abundant organisms. J. Math. Biol. 77, 935–949 (2018). https://doi.org/10.1007/s00285-018-1235-9

  • DOI: https://doi.org/10.1007/s00285-018-1235-9
