Multiple Sequence Alignment Computation Using the T-Coffee Regressive Algorithm Implementation

Part of the Methods in Molecular Biology book series (MIMB, volume 2231)

Abstract

Many fields of biology rely on the inference of accurate multiple sequence alignments (MSAs) of biological sequences. Unfortunately, computing an optimal MSA is an NP-complete problem, which limits practical computation to approximate solutions obtained with heuristics. The progressive algorithm is one of the most popular frameworks for computing MSAs. It involves pre-clustering the sequences into a guide tree and aligning them starting with the most similar ones. The scalability of this framework is limited, especially with respect to accuracy, which degrades as the number of sequences grows. We present here an alternative approach named the regressive algorithm. In this framework, sequences are first clustered and then aligned starting with the most distantly related ones. This approach has been shown to greatly improve accuracy when scaling up, especially on datasets featuring 10,000 sequences or more. Another benefit is the ability to integrate third-party clustering methods and third-party MSA aligners. The regressive algorithm has been tested on up to 1.5 million sequences, and its implementation is available in the T-Coffee package.
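The abstract contrasts two orderings over the same guide tree: progressive (most similar first) and regressive (most distantly related first). The Python sketch below is a toy illustration under stated assumptions, not T-Coffee's implementation: it computes only the alignment schedule of each approach, not the alignments themselves. The guide tree is a hypothetical binary nested tuple of sequence names, SEQS is made-up toy data, the bucket size n is an illustrative parameter, and each regressive subtree is represented by its longest sequence (an assumption of this sketch). The function names are hypothetical and are not part of the T-Coffee interface.

```python
"""Toy comparison of progressive vs. regressive alignment schedules.

Assumptions (illustrative only, not T-Coffee's implementation):
- a guide tree is a binary nested tuple whose leaves are sequence names;
- each regressive subtree is represented by its longest sequence;
- only the order/grouping of alignment steps is computed, no actual alignment.
"""
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(node: Tree) -> List[str]:
    """Sequence names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def progressive_schedule(tree: Tree) -> List[Tuple[List[str], List[str]]]:
    """Leaf-to-root order: each internal node merges its two child groups only
    after both are aligned, so the root (most distant) merge always comes last."""
    steps = []

    def visit(node: Tree) -> List[str]:
        if isinstance(node, str):
            return [node]
        left, right = visit(node[0]), visit(node[1])
        steps.append((left, right))
        return left + right

    visit(tree)
    return steps


def split_top(tree: Tree, n: int) -> List[Tree]:
    """Cut the tree into at most n subtrees by repeatedly expanding the
    shallowest internal node (leftmost on ties), i.e. cutting closest to the root."""
    parts = [(0, tree)]
    while len(parts) < n:
        internal = [i for i, (_, t) in enumerate(parts) if isinstance(t, tuple)]
        if not internal:
            break  # every remaining part is a single sequence
        i = min(internal, key=lambda j: parts[j][0])
        depth, (left, right) = parts[i]
        parts[i:i + 1] = [(depth + 1, left), (depth + 1, right)]
    return [t for _, t in parts]


def regressive_schedule(tree: Tree, seqs: Dict[str, str], n: int) -> List[List[str]]:
    """Root-to-leaf order: first group one representative per top-level
    subtree (the most distantly related sequences), then recurse into each subtree."""
    if n < 2:
        raise ValueError("bucket size n must be at least 2")
    groups: List[List[str]] = []

    def visit(node: Tree) -> None:
        parts = split_top(node, n)
        # Hypothetical representative rule: the longest sequence in each subtree.
        reps = [max(leaves(p), key=lambda s: len(seqs[s])) for p in parts]
        groups.append(reps)  # each group could be aligned by any third-party MSA tool
        for part in parts:
            if isinstance(part, tuple):  # subtree holding more than one sequence
                visit(part)

    visit(tree)
    return groups


TREE: Tree = ((("a", "b"), "c"), ("d", ("e", "f")))
SEQS = {"a": "MKV", "b": "MKVL", "c": "MK", "d": "MKVLA", "e": "M", "f": "MKVLAG"}

if __name__ == "__main__":
    print("progressive:", progressive_schedule(TREE))
    print("regressive:", regressive_schedule(TREE, SEQS, n=2))
```

With n=2 on this toy tree, the progressive schedule starts by merging the deepest pair (a with b) and ends with the root merge, whereas the regressive schedule first groups one representative of each top-level subtree (b and f) and only then descends. When a subtree contains more than one sequence, its representative reappears in the group computed for that subtree; this shared sequence is what would allow independently computed sub-alignments to be stitched together, a merging step this sketch does not implement.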

Key words

  • Sequence alignment
  • MSA
  • Guide tree
  • Progressive alignment

Acknowledgments

We acknowledge Des Higgins and Olivier Gascuel for useful discussions and feedback.

Author information

Correspondence to Cedric Notredame.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Garriga, E. et al. (2021). Multiple Sequence Alignment Computation Using the T-Coffee Regressive Algorithm Implementation. In: Katoh, K. (eds) Multiple Sequence Alignment. Methods in Molecular Biology, vol 2231. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1036-7_6

  • DOI: https://doi.org/10.1007/978-1-0716-1036-7_6

  • Published:

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1035-0

  • Online ISBN: 978-1-0716-1036-7

  • eBook Packages: Springer Protocols