Aho, A. V., Corasick, M. J. (1975). Efficient string matching: An aid to bibliographic search. *Communications of the ACM*, *18*(6), 333–340.

Aston, J. A. D., Martin, D. E. K. (2007). Distributions associated with general runs and patterns in hidden Markov models. *Annals of Applied Statistics*, *1*(2), 585–611.

Balakrishnan, N., Koutras, M. V. (2002). *Runs and scans with applications*. New York: Wiley.

Bassino, F., Clement, J., Fayolle, J., Nicodème, P. (2008). Constructions for clumps statistics. *Discrete Mathematics and Theoretical Computer Science *(DMTCS), AI, 179–194.

Benson, G. (1999). Tandem repeats finder: a program to analyze DNA sequences.

*Nucleic Acids Research*,

*27*, 573–580.

CrossRefGoogle ScholarBenson, G., Mak, D. Y. F. (2009). Exact distribution of a spaced seed statistic for DNA homology detection. *String processing and information retrieval, Lecture Notes in Computer Science*, *5280*, 282–293.

Buhler, J., Keich, U., Sun, Y. (2005). Designing seeds for similarity search in genomic DNA. *Journal of Computer and System Sciences*, *70*, 342–363.

Ebneshahrashoob, M., Gao, T., Wu, M. (2005). An efficient algorithm for exact distribution of discrete scan statistic. *Methodology and Computing in Applied Probability*, *7*(4), 459–471.

Fu, J. C. (1996). Distribution theory of runs and patterns associated with a sequence of multi-state trials.

*Statistica Sinica*,

*6*, 957–974.

MathSciNetMATHGoogle ScholarFu, J. C., Koutras, M. V. (1994). Distribution theory of runs: a Markov chain approach. *Journal of the American Statistical Association*, *89*, 1050–1058.

Fu, J. C., Lou, W. Y. (2003). *Distribution theory of runs and patterns and its applications*. Singapore: World Scientific Publishing Co.

Fu, J. C., Lou, W. Y. W., Bai, Z.-D., Li, G. (2002). The exact and limiting distributions of the number of successes in success runs within a sequence of Markov-dependent two-state trials. *Annals of the Institute of Statistical Mathematics*, *54*(4), 719–730.

Hopcroft, J. E. (1971). An

\(n\) log

\(n\) algorithm for minimizing states in a finite automaton. In Z. Kohavi & A. Paz (Eds.),

*Theory of Machines and Computations* (pp. 189–196). New York: Academic Press.

CrossRefGoogle ScholarHopcroft, J. E., Motwani, R., Ullman, J. D. (2001). *Introduction to automata theory, languages, and computation*. New York: Addison-Wesley.

Keich, U., Li, M., Ma, B., Tromp, J. (2004). On spaced seeds for similarity search. *Discrete Applied Mathematics*, *138*(3), 253–263.

Koutras, M. V., Alexandrou, V. A. (1995). Runs, scans and urn models: A unified Markov chain approach. *Annals of the Institute of Statistical Mathematics*, *47*, 743–766.

Kucherov G., Noé, L., Roytberg, M. (2007). Subset seed automaton. In *Implementation and application of automata, Lecture Notes in Computer Science*, *Volume 4783* (pp. 180–191).

Ledent, S., Robin, S. (2005). Checking homogeneity of motifs’ distribution in heterogenous sequences. *Journal of Computational Biology*, *12*, 672–685.

Lladser, M., Betterton, M. D., Knight, R. (2008). Multiple pattern matching: a Markov chain approach. *Journal of Mathematical Biology*, *56*(1–2), 51–92.

Lou, W. Y. W. (2003). The exact distribution of the

\(k\)-tuple statistic for sequence homology.

*Statistics and Probability Letters*,

*61*, 51–59.

MathSciNetCrossRefMATHGoogle ScholarMa, B., Tromp, J., Li, M. (2002). Patternhunter-faster and more sensitive homology search. *Bioinformatics*, *18*(3), 440–445.

Marshall, T. and Rahmann, S. (2008). Probabilistic arithmetic automata and their application to pattern matching statistics. *Lecture Notes In Computer Science*; Vol. 5029, *Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching* (pp. 95–106).

Martin, D. E. K. (2006). The exact joint distribution of the sum of heads and apparent size statistics of a “tandem repeats finder” algorithm.

*Bulletin of Mathematical Biology*,

*68*, 2353–2364.

MathSciNetCrossRefMATHGoogle ScholarMartin, D. E. K. (2008). Application of auxiliary Markov chains to start-up demonstration tests.

*European Journal of Operational Research*,

*184*(2), 574–583.

CrossRefMATHGoogle ScholarMartin, D. E. K. (2013). Coverage of spaced seeds as a measure of clumping. In American Statistical (Ed.),

*Association 2013 Proceedings of the Section on Computational Statistics*. Alexandria, VA: American Statistical Association.

Google ScholarMartin, D. E. K. (2014). P-values for the discrete scan statistic through slack variables.

*Communications in Statistics, Simulation and Computation*. doi:

10.1080/03610918.2013.777457.

Martin, D. E. K., Aston, J. A. D. (2008). Waiting time distribution of generalized later patterns. *Computational Statistics and Data Analysis*, *52*, 4879–4890.

Martin, D. E. K., Aston, J. A. D. (2013). Distributions of statistics of hidden state sequences through the sum-product algorithm. *Methodology and Computing in Applied Probability*, *15*(4), 897–918.

Martin, D. E. K., Coleman, D. A. (2011). Distributions of clump statistics for a collection of words. *Journal of Applied Probability*, *48*, 1049–1059.

Mealy, G. H. (1955). A method for synthesizing sequential circuits.

*Bell System Technical Journal*,

*34*(5), 1045–1079.

MathSciNetCrossRefGoogle ScholarMoore E.F. (1956) Gedanken-experiments on sequential machines. *Automata Studies: Annals of Mathematical Studies*, \(34\), 129–153. Princeton, N.J.: Princeton University Press.

Noé, L., Kucherov, G. (2004). Improved hit criteria for DNA local alignment. *BMC Bioinformatics*, *5*, 149.

Noé, L., Martin, D. E. K. (2014). A coverage criterion for spaced seeds and its applications to SVM string kernels and \(k\)-mer distances. *Journal of Computational Biology*, *21*(12), 947–963.

Nuel, G. (2008). Pattern Markov chains: Optimal Markov chain embedding through deterministic finite automata.

*Journal of Applied Probability*,

*45*(1), 226–243.

MathSciNetCrossRefMATHGoogle ScholarParzen, E. (1962).

*Stochastic Processes*. San Francisco: Holden-Day Inc.

MATHGoogle ScholarRibeca, P., Raineri, E. (2008). Faster exact Markovian probability functions for motif occurrences: a DFA-only approach. *Bioinformatics*, *24*(24), 2839–2848.

Robin, S., Rodolphe, F., Schbath, S. (2005). *DNA, words and models*. United Kingdom: Cambridge University Press.

Tewari, A., Srivastava, U., Gupta, P. (2002). A parallel DFA minimization algorithm. In High performance computing Hi PC, *Lecture Notes in Computer Science* (Vol. 2552, pp. 34–40).