International Conference on Current Trends in Theory and Practice of Informatics

SOFSEM 2016: Theory and Practice of Computer Science pp 208-216

Subsequence Automata with Default Transitions

• Philip Bille
• Inge Li Gørtz
• Frederik Rye Skjoldjensen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9587)

Abstract

Let S be a string of length n with characters from an alphabet of size $$\sigma$$. The subsequence automaton of S (often called the directed acyclic subsequence graph) is the minimal deterministic finite automaton accepting all subsequences of S. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $$O(n\sigma )$$ and that this bound is asymptotically optimal.
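The straightforward construction mentioned above can be sketched as follows. This is a minimal illustration, not code from the paper: the function names `build_subsequence_automaton`, `accepts`, and `count_transitions` are hypothetical, and the sketch assumes state i means "the input read so far has been matched greedily inside the first i characters of S". Every state is accepting, and each state stores up to $$\sigma$$ outgoing transitions, which is where the $$O(n\sigma )$$ size comes from.

```python
def build_subsequence_automaton(s):
    """Build the standard subsequence automaton of s as a transition table.

    next_state[i][c] is the state reached from state i on character c,
    namely j + 1 for the smallest j >= i with s[j] == c. A missing key
    means there is no such transition (the input is rejected).
    """
    n = len(s)
    next_state = [dict() for _ in range(n + 1)]
    upcoming = {}  # character -> target state of its next occurrence
    # Scan right to left so that `upcoming` always describes s[i:].
    for i in range(n - 1, -1, -1):
        upcoming[s[i]] = i + 1
        next_state[i] = dict(upcoming)  # copy: up to sigma entries per state
    return next_state


def accepts(next_state, pattern):
    """Return True if pattern is a subsequence of the string behind next_state."""
    state = 0
    for c in pattern:
        if c not in next_state[state]:
            return False
        state = next_state[state][c]
    return True  # every state is accepting


def count_transitions(next_state):
    """Total number of stored transitions, at most n * sigma."""
    return sum(len(row) for row in next_state)
```

For example, `accepts(build_subsequence_automaton("abab"), "aab")` follows one transition per pattern character and never needs to rescan S.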

In this paper, we consider subsequence automata with default transitions, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the delay, i.e., the maximum number of consecutive default transitions followed before consuming a character.
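To make the notions of default transition and delay concrete, here is a small sketch of one extreme of the trade-off. It is an assumption-laden illustration and not the hierarchical construction of the paper: each state i has a single regular transition (on the character S[i]) and a default transition to state i + 1, so the automaton has size O(n) but its delay can be as large as n. The names `build_chain_automaton` and `run` are hypothetical.

```python
def build_chain_automaton(s):
    """Chain automaton with default transitions for the string s.

    regular[i] maps a character to the next state (consumes the character).
    default[i] is the state to fall back to when no regular transition
    matches (does NOT consume the character), or None if there is none.
    """
    n = len(s)
    regular = [{s[i]: i + 1} for i in range(n)] + [{}]
    default = [i + 1 for i in range(n)] + [None]
    return regular, default


def run(regular, default, pattern):
    """Simulate the automaton on pattern.

    Returns (accepted, max_delay), where max_delay is the largest number of
    consecutive default transitions followed while handling one character.
    """
    state = 0
    max_delay = 0
    for c in pattern:
        delay = 0
        # Follow default transitions until a regular transition matches c.
        while c not in regular[state]:
            if default[state] is None:
                return False, max_delay  # stuck: pattern is not a subsequence
            state = default[state]
            delay += 1
            max_delay = max(max_delay, delay)
        state = regular[state][c]  # regular transition consumes c
    return True, max_delay
```

Calling `run(*build_chain_automaton("abcab"), "bb")` accepts after following two consecutive default transitions for the second `b`, which shows how a smaller automaton pays for its size with a larger delay.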

Specifically, given any integer parameter k, $$1 < k \le \sigma$$, we present a subsequence automaton with default transitions of size $$O(nk\log _{k}\sigma )$$ and delay $$O(\log _k \sigma )$$. Hence, with $$k = 2$$ we obtain an automaton of size $$O(n \log \sigma )$$ and delay $$O(\log \sigma )$$. At the other extreme, with $$k = \sigma$$, we obtain an automaton of size $$O(n \sigma )$$ and delay $$O(1)$$, thus matching the bound for the standard subsequence automaton construction. The key component of our result is a novel hierarchical automata construction of independent interest.
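
The two extremes follow by direct substitution into the general bounds. The worked substitution below is added for illustration; the intermediate case $$k = \sqrt{\sigma }$$ is not stated in the abstract and assumes that $$\sqrt{\sigma }$$ is an integer greater than 1.

$$
\begin{aligned}
k = 2:&\quad \text{size } O(n \cdot 2 \cdot \log _2 \sigma ) = O(n \log \sigma ), &&\text{delay } O(\log _2 \sigma ) = O(\log \sigma ),\\
k = \sqrt{\sigma }:&\quad \text{size } O(n \cdot \sqrt{\sigma } \cdot \log _{\sqrt{\sigma }} \sigma ) = O(n \sqrt{\sigma }), &&\text{delay } O(\log _{\sqrt{\sigma }} \sigma ) = O(1),\\
k = \sigma :&\quad \text{size } O(n \cdot \sigma \cdot \log _{\sigma } \sigma ) = O(n \sigma ), &&\text{delay } O(\log _{\sigma } \sigma ) = O(1).
\end{aligned}
$$

Here $$\log _{\sqrt{\sigma }} \sigma = 2$$ and $$\log _{\sigma } \sigma = 1$$, so the logarithmic factor is a constant in the last two cases.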
