A Novel Approach for Compressing Phylogenetic Trees
Phylogenetic trees are tree structures that depict relationships between organisms. Popular analysis techniques often produce large collections of candidate trees, which are expensive to store. We introduce TreeZip, a novel algorithm that compresses phylogenetic trees based on their shared evolutionary relationships. We evaluate TreeZip’s performance on fourteen tree collections ranging from 2,505 trees on 328 taxa to 150,000 trees on 525 taxa, corresponding to file sizes of 0.6 MB to 434 MB. Our results show that TreeZip typically compresses a tree file to less than 2% of its original size. When coupled with a standard compression method such as 7zip, TreeZip can compress a file to less than 1% of its original size. Files this small are easier to exchange with colleagues around the world.
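To illustrate why shared relationships make a tree collection compressible, the following minimal sketch counts how often each clade (the set of taxa below an internal node) recurs across a small collection. It is not TreeZip's encoding, which the abstract does not describe. The function names, the three toy trees, and the use of rooted clades on Newick strings without branch lengths are all assumptions made for illustration.

```python
from collections import Counter


def parse_newick(text):
    """Parse a Newick string without branch lengths into nested tuples of leaf names."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    return node()


def clades(tree, found):
    """Return the leaf set under `tree`, appending each internal node's leaf set to `found`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, found) for child in tree))
    found.append(leaves)
    return leaves


def clade_counts(newick_trees):
    """Count, over the whole collection, how many trees contain each clade."""
    counts = Counter()
    for text in newick_trees:
        found = []
        clades(parse_newick(text.strip()), found)
        counts.update(found)
    return counts


# Hypothetical toy collection: three trees on the same four taxa.
trees = ["((A,B),(C,D));", "((A,B),(D,C));", "(((A,B),C),D);"]
counts = clade_counts(trees)
total = sum(counts.values())   # clades stored if every tree is written out in full
unique = len(counts)           # clades stored if each distinct clade is kept once
print(f"{total} clades in total, {unique} distinct")
```

In this toy collection the three trees contain nine clades but only four distinct ones, so a representation that stores each distinct clade once, together with the trees that contain it, has far less to write down than the raw list of trees.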