Encyclopedia of Parallel Computing

2011 Edition
Editors: David Padua

Phylogenetics

  • Alexandros Stamatakis
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-09766-4_443

Synonyms

Definition

Phylogenetics, or phylogenetic inference, is a bioinformatics discipline concerned with models and algorithms for reconstructing the evolutionary history, usually in the form of a (binary) evolutionary tree, of a set of living biological organisms from their molecular (DNA) sequence data or morphological trait data.

Discussion

Introduction

The reconstruction of phylogenetic (evolutionary) trees from molecular or morphological sequence data is a comparatively old bioinformatics discipline: likelihood-based statistical models for phylogenetic inference were introduced in the early 1980s, and discrete criteria that rely on counting changes in the sequence data date back to the late 1960s and early 1970s.
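
To make the idea of a discrete criterion that counts changes concrete, the following is a minimal sketch of one such criterion, parsimony scoring with Fitch's small-parsimony algorithm, on a fixed binary tree. The entry itself does not prescribe this particular algorithm. The tree encoding (nested pairs of leaf names), the function names `fitch_site` and `parsimony_score`, and the four-taxon toy alignment are all assumptions made for illustration.

```python
def fitch_site(tree, states):
    """Return (candidate_states, changes) for one alignment column.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its character at this site
    """
    if isinstance(tree, str):
        # A leaf carries exactly its observed state and needs no change.
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change at this node.
        return common, left_changes + right_changes
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical toy data: four taxa, four aligned sites.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCCA"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree_1, alignment))  # 3
print(parsimony_score(tree_2, alignment))  # 4
```

Under this criterion the tree requiring fewer changes is preferred, so `tree_1` would be chosen over `tree_2` for the toy alignment. The sketch only scores a given tree; searching over candidate trees is a separate problem.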

Computationally, likelihood-based phylogenetic inference approaches represent a major challenge, because of high memory footprints and of floating point...

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Alexandros Stamatakis
    Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany