Stochastic Analysis of Minimal Automata Growth for Generalized Strings
Generalized strings describe various biological motifs that arise in molecular and computational biology. In this manuscript, we introduce an alternative, efficient algorithm to construct the minimal deterministic finite automaton (DFA), i.e., the one with the fewest states, associated with any generalized string. We exploit this construction to characterize the typical growth of the minimal DFA associated with a random generalized string of increasing length. Even though the worst-case growth may be exponential, we identify a point in the construction of the minimal DFA at which it starts to grow linearly, and we conclude that, with probability tending to one, it has at most a polynomial number of states. We conjecture that this number is in fact linear.
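To make the objects in the abstract concrete, the following is a minimal sketch and not the algorithm introduced in the manuscript. It assumes that a generalized string is a sequence of nonempty sets of characters and that the automaton of interest recognizes every text ending with an occurrence of the motif. It uses the textbook subset construction followed by Moore-style partition refinement; the function names `build_dfa`, `minimize`, and `count_minimal_states` are hypothetical.

```python
def build_dfa(motif, alphabet):
    """Subset construction for the language alphabet* . motif.

    motif is a list of character sets (here: strings of allowed characters).
    A DFA state is the frozenset of motif prefix lengths currently matched.
    """
    n = len(motif)
    start = frozenset([0])
    trans = {}
    todo = [start]
    while todo:
        state = todo.pop()
        if state in trans:
            continue
        trans[state] = {}
        for c in alphabet:
            # Position 0 is always active because a match may start anywhere.
            nxt = frozenset([0] + [i + 1 for i in state
                                   if i < n and c in motif[i]])
            trans[state][c] = nxt
            todo.append(nxt)
    accepting = {s for s in trans if n in s}
    return start, trans, accepting


def minimize(trans, accepting, alphabet):
    """Moore partition refinement; maps each state to its equivalence class."""
    block = {s: int(s in accepting) for s in trans}
    while True:
        ids = {}
        new_block = {}
        for s in trans:
            # Two states stay together only if they agree on their own class
            # and on the class reached under every symbol.
            sig = (block[s],) + tuple(block[trans[s][c]] for c in alphabet)
            new_block[s] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


def count_minimal_states(motif, alphabet):
    """Number of states of the minimal DFA under the assumptions above."""
    _, trans, accepting = build_dfa(motif, alphabet)
    return len(set(minimize(trans, accepting, alphabet).values()))


if __name__ == "__main__":
    # A simple string: linear number of states.
    print(count_minimal_states(["A", "B"], "AB"))
    # "Third symbol from the end is A": exponential in the motif length.
    print(count_minimal_states(["A", "AB", "AB"], "AB"))
```

The second call illustrates the exponential worst case mentioned in the abstract: the motif forces the automaton to remember which of the last three symbols were `A`. The subset construction itself can be exponential, which is why a sketch like this is only suitable for short motifs.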
Keywords: Aho-Corasick algorithm; Deterministic finite automaton; Generalized string; Minimization; Motif; Polynomial growth
Mathematics Subject Classification (2010): 68Q25, 68Q45, 68Q87, 68W40
We are thankful to two anonymous referees for their careful reading of this paper and valuable suggestions. We are also very thankful to Dr. Dougherty for partially funding this research through her NSF EXTREEMS training grant.