Indexing Factors in DNA/RNA Sequences

  • Tomáš Flouri
  • Costas Iliopoulos
  • M. Sohel Rahman
  • Ladislav Vagner
  • Michal Voráček
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 13)

Abstract

In this paper, we introduce the Truncated Generalized Suffix Automaton (TGSA) and present an efficient on-line algorithm for its construction. TGSA is a novel type of finite automaton suitable for indexing DNA and RNA sequences in which the text is degenerate, i.e., contains sets of characters. TGSA indexes the so-called k-factors, the factors of the degenerate text whose length does not exceed a given constant k. The presented algorithm runs in \(\mathcal{O}(n^2)\) time, where n is the length of the input DNA/RNA sequence. The resulting TGSA has at most a linear number of states with respect to the length of the text. TGSA enables us to find the list occ(u) of all occurrences of a given pattern u in the degenerate text in time |u| + |occ(u)|.
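
To make the notions of degenerate text, k-factor, and occ(u) concrete, the following is a minimal sketch that is not the TGSA itself: it stores every k-factor in a plain dictionary, which may grow exponentially in k for highly degenerate texts and does not reflect the automaton's state bound or construction time. The function names `build_k_factor_index` and `occ`, the representation of the text as a list of character sets, and the sample input are all assumptions made for illustration.

```python
from collections import defaultdict


def build_k_factor_index(text, k):
    """Naive index of all k-factors of a degenerate text.

    text: list of sets of characters (one set per position); hypothetical format.
    k:    maximum factor length to index.
    Returns a dict mapping each factor to the ascending list of start positions.
    """
    index = defaultdict(list)
    n = len(text)
    for start in range(n):
        # All strings spelled so far from `start`, one character per position.
        prefixes = [""]
        for pos in range(start, min(start + k, n)):
            prefixes = [p + c for p in prefixes for c in sorted(text[pos])]
            for factor in prefixes:
                index[factor].append(start)
    return dict(index)


def occ(index, u):
    """Start positions of pattern u; empty if u is absent or longer than k."""
    return index.get(u, [])


# Position 1 is degenerate: it matches either C or G.
text = [{"A"}, {"C", "G"}, {"T"}, {"A"}, {"C"}]
index = build_k_factor_index(text, k=3)
print(occ(index, "AC"))    # [0, 3]
print(occ(index, "AGT"))   # [0]
print(occ(index, "ACTA"))  # [] because the pattern is longer than k
```

A lookup in this dictionary plays the role of reading u through the automaton; the TGSA described in the abstract aims to provide the same answers with a number of states that is at most linear in the text length.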

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Tomáš Flouri
    • 1
  • Costas Iliopoulos
    • 2
  • M. Sohel Rahman
    • 2
  • Ladislav Vagner
    • 1
  • Michal Voráček
    • 1
  1. Department of Computer Science & Engineering, Czech Technical University in Prague, Czech Republic
  2. Algorithm Design Group, Department of Computer Science, King’s College London, Strand, London, England
