A Bit-Parallel, General Integer-Scoring Sequence Alignment Algorithm
 Gary Benson, Yozen Hernandez, Joshua Loving
Abstract
Mapping of next-generation sequencing data and other processor-intensive sequence comparison applications have motivated a continued search for high-efficiency sequence alignment algorithms. In one approach, which exploits the inherent parallelism in computer logic calculations, individual cells in an alignment scoring matrix are represented as bits in a computer word, and the calculation of scores is emulated by a series of bit operations composed of AND, OR, XOR, complement, shift, and addition. Bit-parallelism has been successfully applied to the Longest Common Subsequence (LCS) and edit-distance problems, producing solutions that are significantly faster than standard implementations. However, the intensive mental effort required to produce these solutions, which are closely tied to special properties of the problems, has limited efforts to extend bit-parallelism to more general scoring schemes. In this paper, we give the first bit-parallel solution for general, integer-scoring global alignment. Integer-scoring schemes, which are widely used, assign integer weights for match, mismatch, and insertion/deletion (indel). Our method depends on structural properties of the relationship between adjacent scores in the scoring matrix. We utilize these properties to construct a class of efficient algorithms, each designed for a particular set of weights, and we introduce a standard for characterizing efficiency in terms of the average number of bit operations per cell of the original scoring matrix.
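To make the bit-parallel idea concrete, the following is a minimal Python sketch of the earlier LCS technique the abstract refers to; it is not the authors' integer-scoring method. It assumes the bit-vector LCS-length recurrence in the style of Crochemore et al. (2001) and Hyyrö (2004): bit i of a match mask is set when the i-th character of `a` equals the current character of `b`, and a constant number of word operations (AND, addition, subtraction, OR) updates an entire column of the scoring matrix at once. Python's unbounded integers stand in for a single machine word here, so `a` may be longer than 64 characters; a real implementation would use fixed-width words and split long patterns across several of them. The function names are hypothetical. The quadratic reference scorer takes integer weights for match, mismatch, and indel, and LCS is the special case (1, 0, 0).

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """LCS length of a and b via a bit-vector column update (hypothetical name)."""
    m = len(a)
    if m == 0:
        return 0
    mask = (1 << m) - 1  # keep only the m bits that correspond to rows of a
    # Match masks: bit i of peq[c] is set when a[i] == c.
    peq = {}
    for i, ch in enumerate(a):
        peq[ch] = peq.get(ch, 0) | (1 << i)
    v = mask  # all ones: LCS length 0 so far
    for ch in b:
        u = v & peq.get(ch, 0)
        # u's bits are a subset of v's, so v - u equals v & ~peq[ch].
        # Carries from v + u ripple through runs of one bits, so a single
        # addition propagates match information along the whole column.
        v = ((v + u) | (v - u)) & mask
    # Every zero bit in the low m bits of v adds one to the LCS length.
    return m - bin(v).count("1")


def global_score_dp(a: str, b: str, match: int, mismatch: int, indel: int) -> int:
    """Cell-by-cell global alignment score with integer weights (hypothetical name)."""
    prev = [j * indel for j in range(len(b) + 1)]  # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        cur = [i * indel] + [0] * len(b)  # column 0: a[:i] aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + indel, cur[j - 1] + indel)
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    a, b = "ABCBDAB", "BDCABA"
    print(lcs_length_bitparallel(a, b))    # 4
    print(global_score_dp(a, b, 1, 0, 0))  # 4, the same value cell by cell
```

Keeping the quadratic scorer alongside makes the bit-parallel result easy to check: with weights (1, 0, 0) the two must agree. Extending the word-level trick from this special case to arbitrary integer weights is exactly what the paper addresses, and this sketch does not attempt it.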
 Book Title
 Combinatorial Pattern Matching
 Book Subtitle
 24th Annual Symposium, CPM 2013, Bad Herrenalb, Germany, June 17-19, 2013. Proceedings
 Pages
 pp. 50-61
 Copyright
 2013
 DOI
 10.1007/978-3-642-38905-4_7
 Print ISBN
 978-3-642-38904-7
 Online ISBN
 978-3-642-38905-4
 Series Title
 Lecture Notes in Computer Science
 Series Volume
 7922
 Series ISSN
 0302-9743
 Publisher
 Springer Berlin Heidelberg
 Copyright Holder
 Springer-Verlag Berlin Heidelberg
 Keywords

 bit-parallelism
 global sequence alignment
 integer weights
 Editors

 Johannes Fischer (16)
 Peter Sanders (17)
 Editor Affiliations

 16. Faculty of Informatics, Institute of Theoretical Informatics, Karlsruhe Institute of Technology
 17. Karlsruhe Institute of Technology
 Authors

 Gary Benson (18, 19, 20)
 Yozen Hernandez (18, 19)
 Joshua Loving (18, 19)
 Author Affiliations

 18. Laboratory for Biocomputing and Informatics, Boston University, Boston, MA, USA
 19. Graduate Program in Bioinformatics, Boston University, Boston, MA, USA
 20. Department of Computer Science, Boston University, Boston, MA, USA