Revisiting Evaluation of Multiple Sequence Alignment Methods

Multiple Sequence Alignment

Part of the book series: Methods in Molecular Biology (MIMB, volume 2231)

Abstract

Multiple sequence alignment is a core first step in many bioinformatics analyses, and errors in these alignments can have negative consequences for scientific studies. In this article, we review some of the recent literature evaluating multiple sequence alignment methods and identify specific challenges that arise when performing these evaluations. In particular, we discuss the different trends observed in simulation studies and when using biological benchmarks. Overall, we find that multiple sequence alignment, far from being a “solved problem,” would benefit from new attention.

Acknowledgements

This paper was supported by NSF grant ABI-1458652 to TW. This research was also inspired by a Program on Multiple Sequence Alignment held at IPAM (Institute for Pure & Applied Mathematics, an NSF Math Institute at UCLA) in January 2015. The author wishes to thank George Chacko, whose critical comments were helpful in improving the manuscript.

Author information

Corresponding author

Correspondence to Tandy Warnow.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Warnow, T. (2021). Revisiting Evaluation of Multiple Sequence Alignment Methods. In: Katoh, K. (ed.) Multiple Sequence Alignment. Methods in Molecular Biology, vol 2231. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1036-7_17

  • DOI: https://doi.org/10.1007/978-1-0716-1036-7_17

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1035-0

  • Online ISBN: 978-1-0716-1036-7

  • eBook Packages: Springer Protocols
