5th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2011)

Volume 93 of the series Advances in Intelligent and Soft Computing pp 149-156

Fast and Accurate Genome Anchoring Using Fuzzy Hash Maps

  • John Healy, Department of Computing & Mathematics, Galway-Mayo Institute of Technology
  • Desmond Chambers, Department of Information Technology, National University of Ireland


Although hash-based approaches to sequence alignment and genome assembly are long established, their utility depends on the rapid identification of exact k-mers in a hash-map or similar data structure. We describe how a fuzzy hash-map can be applied to quickly and accurately align a prokaryotic genome to the reference genome of a related species. Using this technique, a draft genome of Mycoplasma genitalium, sampled at 1X coverage, was accurately anchored against the genome of Mycoplasma pneumoniae. The fuzzy approach to alignment ordered and orientated more than 65% of the reads from the draft genome in under 10 seconds, with an error rate of <1.5%. Without sacrificing execution speed, fuzzy hash-maps also provide a mechanism for error tolerance and variability in k-mer-centric sequence alignment and assembly applications.
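The abstract does not say how the fuzzy hash-map is built, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that "fuzzy" lookup means tolerating a single substitution per k-mer, which is emulated here by probing an ordinary exact index with every one-mismatch neighbour of the query. The names FuzzyKmerMap and anchor_read, the choice of k, and the voting scheme used to pick a position and orientation are all hypothetical.

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


class FuzzyKmerMap:
    """Hypothetical fuzzy k-mer index (an assumption, not the paper's design).

    Reference k-mers are stored exactly; lookup also probes every k-mer that
    differs from the query by one substitution.
    """

    def __init__(self, reference, k):
        self.k = k
        self.index = defaultdict(list)
        for i in range(len(reference) - k + 1):
            self.index[reference[i:i + k]].append(i)

    def lookup(self, kmer):
        """Reference positions whose k-mer is within Hamming distance 1."""
        hits = set(self.index.get(kmer, ()))
        for i, base in enumerate(kmer):
            for alt in "ACGT":
                if alt != base:
                    neighbour = kmer[:i] + alt + kmer[i + 1:]
                    hits.update(self.index.get(neighbour, ()))
        return hits


def anchor_read(fmap, read):
    """Return (reference offset, strand, votes) for the best placement.

    Each k-mer hit votes for the offset (reference position minus position
    in the read); both orientations of the read are tried. Returns None if
    no k-mer of the read hits the reference.
    """
    best = None
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        votes = Counter()
        for j in range(len(seq) - fmap.k + 1):
            for pos in fmap.lookup(seq[j:j + fmap.k]):
                votes[pos - j] += 1
        if votes:
            offset, count = votes.most_common(1)[0]
            if best is None or count > best[2]:
                best = (offset, strand, count)
    return best


# Toy data: a 30-base reference and a 16-base read copied from offset 5
# with one substitution introduced (T replaced by G at read index 7).
reference = "ACGTTGCAAGGCTTAACCGGATATCGCGTA"
read = "GCAAGGCGTAACCGGA"
fmap = FuzzyKmerMap(reference, k=8)
placement = anchor_read(fmap, read)
```

In this toy case the single substitution falls inside eight of the read's nine 8-mers, so an exact-match index would see only one seed, whereas the one-mismatch lookup still lets every k-mer vote for the same offset. Enumerating neighbours costs 3k extra probes per k-mer; a production fuzzy hash-map would presumably use a cheaper scheme.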