Procrastination Leads to Efficient Filtration for Local Multiple Alignment

  • Aaron E. Darling
  • Todd J. Treangen
  • Louxin Zhang
  • Carla Kuiken
  • Xavier Messeguer
  • Nicole T. Perna
Conference paper

DOI: 10.1007/11851561_12

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4175)
Cite this paper as:
Darling A.E., Treangen T.J., Zhang L., Kuiken C., Messeguer X., Perna N.T. (2006) Procrastination Leads to Efficient Filtration for Local Multiple Alignment. In: Bücher P., Moret B.M.E. (eds) Algorithms in Bioinformatics. WABI 2006. Lecture Notes in Computer Science, vol 4175. Springer, Berlin, Heidelberg

Abstract

We describe an efficient local multiple alignment filtration heuristic for identification of conserved regions in one or more DNA sequences. The method incorporates several novel ideas: (1) palindromic spaced seed patterns to match both DNA strands simultaneously, (2) seed extension (chaining) in order of decreasing multiplicity, and (3) procrastination when low multiplicity matches are encountered. The resulting local multiple alignments may have nucleotide substitutions and internal gaps as large as w characters in any occurrence of the motif. The algorithm consumes \(\mathcal{O}(wN)\) memory and \(\mathcal{O}(wN \log wN)\) time where N is the sequence length. We score the significance of multiple alignments using entropy-based motif scoring methods. We demonstrate the performance of our filtration method on Alu-repeat rich segments of the human genome and a large set of Hepatitis C virus genomes. The GPL implementation of our algorithm in C++ is called procrastAligner and is freely available from http://gel.ahabs.wisc.edu/procrastination
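The first two ideas in the abstract can be illustrated with a minimal sketch. It is not the procrastAligner implementation: the pattern `1101011`, the function names, and the use of a lexicographic minimum as the canonical key are all assumptions made for illustration. The sketch shows that a seed pattern that reads the same in both directions yields one key for a site and for its reverse complement, and that the resulting seed groups can then be visited in order of decreasing multiplicity. The extension and procrastination steps themselves are not shown.

```python
from collections import defaultdict

# Complement table for unambiguous DNA bases only (assumption: no N or IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(pattern):
    """A spaced seed pattern is palindromic if it equals its own reversal."""
    return pattern == pattern[::-1]


def seed_key(window, pattern):
    """Keep only the characters at positions where the pattern has a '1'."""
    return "".join(base for base, flag in zip(window, pattern) if flag == "1")


def canonical_key(window, pattern):
    """Strand-independent key: the smaller of the key and its reverse complement.

    Because the pattern is palindromic, applying it to the reverse-complemented
    window gives exactly the reverse complement of the forward key.
    """
    if not is_palindromic(pattern):
        raise ValueError("pattern must be palindromic")
    key = seed_key(window, pattern)
    return min(key, reverse_complement(key))


def index_seeds(seq, pattern):
    """Map each canonical key to the start positions of its windows."""
    hits = defaultdict(list)
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        hits[canonical_key(window, pattern)].append(start)
    return hits


def by_decreasing_multiplicity(hits):
    """Order seed groups so that the most frequent keys come first."""
    return sorted(hits.items(), key=lambda item: len(item[1]), reverse=True)


if __name__ == "__main__":
    pattern = "1101011"  # hypothetical palindromic pattern
    seq = "ACGTTGA" + reverse_complement("ACGTTGA")
    for key, positions in by_decreasing_multiplicity(index_seeds(seq, pattern)):
        print(key, positions)
```

Taking the minimum of a key and its reverse complement is one common way to make a k-mer index strand-independent; the paper may canonicalize seeds differently.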

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Aaron E. Darling (1)
  • Todd J. Treangen (2)
  • Louxin Zhang (4)
  • Carla Kuiken (5)
  • Xavier Messeguer (2)
  • Nicole T. Perna (3)

  1. Department of Computer Science, University of Wisconsin, USA
  2. Department of Computer Science, Technical University of Catalonia, Barcelona, Spain
  3. Department of Animal Health and Biomedical Sciences, Genome Center, University of Wisconsin, USA
  4. Department of Mathematics, National University of Singapore, Singapore
  5. T-10 Theoretical Biology Division, Los Alamos National Laboratory, USA
