Abstract
Phylogenetic tree search algorithms often produce thousands of trees, which biologists save in Newick format for further analysis. Unfortunately, Newick is neither space-efficient nor conducive to post-tree analysis such as consensus computation. We propose a new format for storing phylogenetic trees that significantly reduces storage requirements while still allowing the trees to be used as input to post-tree analysis. We implemented mechanisms to read and write such data from and to files, and also implemented a consensus algorithm that is an order of magnitude faster than standard phylogenetic analysis tools. We demonstrate our results on a collection of data files produced by both maximum parsimony tree searches and Bayesian methods.
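To illustrate why a collection of Newick trees tends to be redundant, the following minimal sketch stores each distinct subtree only once and compares the number of stored entries with the total number of nodes. It is not the format proposed in the paper, which the abstract does not describe; the function names (`parse_newick`, `intern_tree`, `count_nodes`, `summarize`) are hypothetical, and the parser assumes topology-only Newick strings terminated by `;`, with no branch lengths or quoted labels.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names.

    Assumption: no branch lengths, no quoted labels, string ends with ';'.
    """
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
            return tuple(children)
        start = pos
        while text[pos] not in ",();":
            pos += 1
        return text[start:pos]

    return node()


def intern_tree(tree, table):
    """Return an integer id for `tree`, storing each distinct subtree once.

    Child ids are sorted so that (C,D) and (D,C) map to the same entry,
    i.e. sibling order is treated as irrelevant.
    """
    if isinstance(tree, tuple):
        key = tuple(sorted(intern_tree(child, table) for child in tree))
    else:
        key = tree  # a leaf is keyed by its name
    if key not in table:
        table[key] = len(table)
    return table[key]


def count_nodes(tree):
    """Count every node (leaf or internal) in a parsed tree."""
    if isinstance(tree, tuple):
        return 1 + sum(count_nodes(child) for child in tree)
    return 1


def summarize(newick_strings):
    """Return (root ids, number of distinct subtrees, total node count)."""
    table = {}
    parsed = [parse_newick(s) for s in newick_strings]
    roots = [intern_tree(t, table) for t in parsed]
    total = sum(count_nodes(t) for t in parsed)
    return roots, len(table), total
```

For example, `summarize(["((A,B),(C,D));", "((A,B),(D,C));", "(((A,B),C),D);"])` assigns the first two trees the same root id, since they differ only in sibling order. The gap between distinct subtrees and total nodes grows as more trees over the same taxa are added, which is the kind of redundancy a compact format for tree collections can exploit.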
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boyer, R.S., Hunt, W.A., Nelesen, S.M. (2005). A Compressed Format for Collections of Phylogenetic Trees and Improved Consensus Performance. In: Casadio, R., Myers, G. (eds) Algorithms in Bioinformatics. WABI 2005. Lecture Notes in Computer Science, vol 3692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557067_29
DOI: https://doi.org/10.1007/11557067_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29008-7
Online ISBN: 978-3-540-31812-5
eBook Packages: Computer Science (R0)