PASTA: Ultra-Large Multiple Sequence Alignment

  • Siavash Mirarab
  • Nam Nguyen
  • Tandy Warnow
Conference paper

DOI: 10.1007/978-3-319-05269-4_15

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8394)
Cite this paper as:
Mirarab S., Nguyen N., Warnow T. (2014) PASTA: Ultra-Large Multiple Sequence Alignment. In: Sharan R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham

Abstract

In this paper, we introduce a new and highly scalable algorithm, PASTA, for large-scale multiple sequence alignment estimation. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy of the leading alignment methods on large datasets, and is able to analyze much larger datasets than the current methods. We also show that trees estimated on PASTA alignments are highly accurate – slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is very fast, highly parallelizable, and requires relatively little memory.

Keywords

Multiple sequence alignment, Ultra-large, SATé

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Siavash Mirarab (1)
  • Nam Nguyen (1)
  • Tandy Warnow (1)
  1. Department of Computer Science, University of Texas at Austin, Austin, USA
