PASTA: Ultra-Large Multiple Sequence Alignment
- Cite this paper as:
- Mirarab S., Nguyen N., Warnow T. (2014) PASTA: Ultra-Large Multiple Sequence Alignment. In: Sharan R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham
In this paper, we introduce PASTA, a new and highly scalable algorithm for large-scale multiple sequence alignment estimation. Given a guide tree, PASTA uses a new technique to produce an alignment, and this technique makes it both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improves on the accuracy of the leading alignment methods on large datasets, and can analyze much larger datasets than current methods. We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATé trees, and substantially better than trees from other methods. Finally, PASTA is very fast, highly parallelizable, and requires relatively little memory.
Keywords: Multiple sequence alignment, Ultra-large, SATé