Biological Sequences and the Exact String Matching Problem

3.15 Summary

In computational biology one often needs to look up the occurrences of some pattern P in a text T. Since the texts of computational biology include genome sequences, which tend to be large, it is important to apply efficient methods of string matching. Traditional string matching methods that preprocess the pattern are guaranteed to take time O(n), where n is the length of the text. By preprocessing a set of patterns into a keyword tree, this time requirement can be extended to set matching. Instead of preprocessing one or more patterns, it is also possible to preprocess the text. A suffix tree is a data structure that can be constructed for a given text in O(n) time. Once it is constructed, it can be used to search for any P in T in time O(m), where m is the length of the pattern. In addition to making string searching extremely efficient, a suffix tree reveals in one fell swoop the entire repeat structure of T without the need for carrying out any string comparisons. This has important biological applications, since unique and repeat sequences play a central role in many fundamental as well as biotechnological problems. Finally, suffix trees can also be used for rapid inexact string matching, where ≤ k mismatches between P and its occurrence in T are allowed.
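
The idea of preprocessing the text rather than the pattern can be illustrated with a small sketch. The Python code below is not a suffix tree: it builds a naive, uncompressed suffix trie in O(n²) time and space, which is only practical for short texts, whereas a real suffix tree compresses unary paths and can be built in O(n). The function names `build_suffix_trie` and `find_occurrences` and the `$` sentinel are assumptions made for this illustration. What the sketch does share with a suffix tree is the query behaviour: locating P requires at most m steps down from the root.

```python
def build_suffix_trie(text, sentinel="$"):
    """Build a naive suffix trie of text as nested dicts.

    Illustration only: O(n^2) time and space, unlike a true suffix tree.
    The sentinel is assumed not to occur in text.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    padded = text + sentinel
    root = {}
    for start in range(len(padded)):
        node = root
        for ch in padded[start:]:
            node = node.setdefault(ch, {})
        # Mark the leaf with the start position of this suffix.
        # The key None cannot collide with a single-character edge label.
        node[None] = start
    return root


def find_occurrences(root, pattern):
    """Return the sorted start positions of pattern in the indexed text."""
    node = root
    # Walk down at most len(pattern) edges: this is the O(m) part.
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    # Every leaf below this node is one occurrence of the pattern.
    starts, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is None:
                starts.append(child)
            else:
                stack.append(child)
    return sorted(starts)


trie = build_suffix_trie("ACGTACGA")
print(find_occurrences(trie, "ACG"))  # [0, 4]
print(find_occurrences(trie, "TT"))   # []
```

Any node in this trie with more than one leaf beneath it spells out a substring that occurs more than once in the text, which is a simplified view of how the repeat structure of T can be read off the tree without further string comparisons.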

Keywords

  • Internal Node
  • Failure Link
  • Biological Sequence
  • String Match
  • Maximal Repeat

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • DOI: 10.1007/3-7643-7387-3_3
  • Chapter length: 21 pages
  • ISBN: 978-3-7643-7387-0

3.16 Further Reading

  1. D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, 1997.

Copyright information

© 2006 Birkhäuser Verlag

About this chapter

Cite this chapter

(2006). Biological Sequences and the Exact String Matching Problem. In: Introduction to Computational Biology. Birkhäuser Basel. https://doi.org/10.1007/3-7643-7387-3_3
