Multiple Sequence Alignment Computation Using the T-Coffee Regressive Algorithm Implementation

Garriga, Edgar; Di Tommaso, Paolo; Magis, Cedrik; Erb, Ionas; Mansouri, Leila; Baltzis, Athanasios; Floden, Evan; Notredame, Cedric

doi:10.1007/978-1-0716-1036-7_6

Part of the book series: Methods in Molecular Biology (MIMB, volume 2231)

Abstract

Many fields of biology rely on the inference of accurate multiple sequence alignments (MSAs) of biological sequences. Unfortunately, the problem of assembling an MSA is NP-complete, which limits computation to approximate solutions obtained with heuristics. The progressive algorithm is one of the most popular frameworks for the computation of MSAs. It involves pre-clustering the sequences and aligning them starting with the most similar ones. The scalability of this framework is limited, especially with respect to accuracy. We present here an alternative approach named the regressive algorithm. In this framework, sequences are first clustered and then aligned starting with the most distantly related ones. This approach has been shown to greatly improve accuracy during scale-up, especially on datasets featuring 10,000 sequences or more. Another benefit is the possibility of integrating third-party clustering methods and third-party MSA aligners. The regressive algorithm has been tested on up to 1.5 million sequences; its implementation is available in the T-Coffee package.
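
The contrast between the two frameworks can be illustrated with a toy sketch. The Python below is not the T-Coffee implementation and performs no alignment. It only shows the order in which the nodes of a guide tree are visited: leaf-to-root for a progressive scheme, root-to-leaf for a regressive one. The nested-tuple tree, the sequence names, and the function names are all hypothetical, and the sketch deliberately omits how the real method chooses which sequences to align at each step.

```python
from typing import List, Tuple, Union

# Hypothetical guide tree: a leaf is a sequence name, an internal node
# is a pair of subtrees. Deeper splits join more similar sequences.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Return the sequence names under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def progressive_order(tree: Tree) -> List[List[str]]:
    """Leaf-to-root visit order: children come before their parent."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return progressive_order(left) + progressive_order(right) + [leaves(tree)]


def regressive_order(tree: Tree) -> List[List[str]]:
    """Root-to-leaf visit order: a parent comes before its children."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return [leaves(tree)] + regressive_order(left) + regressive_order(right)


# Toy input (assumption): two pairs of similar sequences, distantly related
# to each other across the root split.
toy_tree: Tree = (("seqA", "seqB"), ("seqC", "seqD"))

print("progressive:", progressive_order(toy_tree))
print("regressive: ", regressive_order(toy_tree))
```

In the progressive order the most similar pairs are handled first and the root split, which separates the most distant groups, comes last. The regressive order reverses this, starting at the root split and working down toward the closely related pairs.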

Acknowledgments

We acknowledge Des Higgins and Olivier Gascuel for useful discussions and feedback.

Author information

Authors and Affiliations

Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain
Edgar Garriga, Paolo Di Tommaso, Cedrik Magis, Ionas Erb, Leila Mansouri, Athanasios Baltzis, Evan Floden & Cedric Notredame
Universitat Pompeu Fabra (UPF), Barcelona, Spain
Cedric Notredame

Corresponding author

Correspondence to Cedric Notredame.

Editor information

Editors and Affiliations

Research Institute for Microbial Disease, Osaka University, Osaka, Japan
Kazutaka Katoh

About this protocol

Cite this protocol

Garriga, E. et al. (2021). Multiple Sequence Alignment Computation Using the T-Coffee Regressive Algorithm Implementation. In: Katoh, K. (eds) Multiple Sequence Alignment. Methods in Molecular Biology, vol 2231. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1036-7_6

DOI: https://doi.org/10.1007/978-1-0716-1036-7_6
Published: 09 December 2020
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1035-0
Online ISBN: 978-1-0716-1036-7
eBook Packages: Springer Protocols