Fast and Accurate Genome Anchoring Using Fuzzy Hash Maps
Although hash-based approaches to sequence alignment and genome assembly are long established, their utility is predicated on the rapid identification of exact k-mers in a hash-map or similar data structure. We describe how a fuzzy hash-map can be applied to quickly and accurately align a prokaryotic genome to the reference genome of a related species. Using this technique, a draft genome of Mycoplasma genitalium, sampled at 1X coverage, was accurately anchored against the genome of Mycoplasma pneumoniae. The fuzzy approach to alignment ordered and orientated more than 65% of the reads from the draft genome in under 10 seconds, with an error rate of <1.5%. Without sacrificing execution speed, fuzzy hash-maps also provide a mechanism for tolerating errors and variability in k-mer-centric sequence alignment and assembly applications.
Keywords: Draft Genome, Edit Distance, Hash Code, Mycoplasma Genitalium, Fuzzy Index
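The abstract does not say how the fuzzy hash-map is built, so the following is only a minimal sketch of one way a hash-based index can tolerate inexact k-mers. It is not the authors' method. It assumes a simple pigeonhole scheme: each reference k-mer is stored under both of its halves, so a query that differs by a single substitution still shares one half exactly and lands in the same bucket. The class name `FuzzyKmerIndex`, the one-substitution limit, and the example sequences are all hypothetical.

```python
from collections import defaultdict

# Assumption of this sketch: tolerate at most one substitution per k-mer.
MAX_MISMATCHES = 1


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


class FuzzyKmerIndex:
    """Hypothetical k-mer index that tolerates one substitution per k-mer."""

    def __init__(self, k):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.half = k // 2
        # (half number, half sequence) -> set of (reference k-mer, position)
        self.buckets = defaultdict(set)

    def _keys(self, kmer):
        # With one substitution, at least one half is still an exact match.
        return (0, kmer[:self.half]), (1, kmer[self.half:])

    def add_reference(self, seq):
        """Index every k-mer of the reference under both of its halves."""
        for pos in range(len(seq) - self.k + 1):
            kmer = seq[pos:pos + self.k]
            for key in self._keys(kmer):
                self.buckets[key].add((kmer, pos))

    def lookup(self, kmer):
        """Return sorted reference positions within MAX_MISMATCHES of kmer."""
        if len(kmer) != self.k:
            raise ValueError("query length must equal k")
        hits = set()
        for key in self._keys(kmer):
            for ref_kmer, pos in self.buckets.get(key, ()):
                # Verify each candidate, since sharing a half is not enough.
                if hamming(kmer, ref_kmer) <= MAX_MISMATCHES:
                    hits.add(pos)
        return sorted(hits)


index = FuzzyKmerIndex(k=6)
index.add_reference("GATTACAGGCTT")   # made-up reference sequence
exact_hits = index.lookup("CAGGCT")   # exact copy of a reference k-mer
fuzzy_hits = index.lookup("GATTCC")   # "GATTAC" with one substitution
```

Each lookup costs two bucket probes plus a short verification pass over the candidates, so it stays close to the speed of an exact hash lookup. Handling insertions and deletions (edit distance rather than Hamming distance) would need a different keying scheme than the one assumed here.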