Handbook of Scan Statistics, pp 1-19

# Generating Function Methods for Run and Scan Statistics

## Abstract

Run and pattern statistics have found successful applications in various fields. Many classical results on distributions of runs were obtained by combinatorial methods. As the patterns under study become more complicated, the combinatorial complexity involved can become challenging, especially for multistate or multiset systems. Several unified methods have been devised to overcome these combinatorial difficulties. One of them is the finite Markov chain imbedding approach. Here we use a systematic approach inspired by methods in statistical physics. In this approach the study of run and pattern distributions is decoupled into two simple, independent steps. In the first step, the elements of each object (usually represented by its generating function) are considered in isolation, without regard to the elements of the other objects. In the second step, formulas in matrix or explicit form combine the results of the first step into a whole multi-object system with potential nearest-neighbor interactions. Because only one kind of object is considered at a time in the first step, the complexity arising from simultaneous interactions among elements of multiple objects is avoided. In essence, the method builds a higher-level generating function for the whole system from the lower-level generating functions of the individual objects. Because it works with generating functions at each step, the method usually yields results that are more general than those obtained by other methods. Examples of different complexities and flavors for run- and pattern-related distributions are used to illustrate the method.
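The two steps can be illustrated with a minimal sketch for the simplest case: the distribution of the total number of runs in arrangements of a multiset. This is not taken from the chapter. It assumes that each letter `i` contributes a single-run generating function `g_i = w(x_i + x_i^2 + ...)`, where `w` marks a run and `x_i` marks a letter, and that the only nearest-neighbor rule is that adjacent runs must use different letters, which gives the recursion `F_i = g_i (1 + sum_{j != i} F_j)` for arrangements ending in a run of letter `i`. All function names are hypothetical, and polynomials are stored as plain dictionaries keyed by exponent tuples.

```python
from collections import defaultdict
from itertools import groupby, permutations


def block_gf(i, counts):
    """Step 1: one run of letter i in isolation, w*(x_i + x_i^2 + ...),
    truncated at counts[i] letters. Keys are (runs, n_0, ..., n_{k-1})."""
    gf = {}
    for length in range(1, counts[i] + 1):
        exps = [0] * len(counts)
        exps[i] = length
        gf[(1, *exps)] = 1
    return gf


def add(a, b):
    """Sum of two polynomials."""
    out = defaultdict(int, a)
    for key, coeff in b.items():
        out[key] += coeff
    return dict(out)


def multiply(a, b, counts):
    """Product of two polynomials, dropping terms that exceed counts."""
    out = defaultdict(int)
    for (ra, *ea), ca in a.items():
        for (rb, *eb), cb in b.items():
            exps = [p + q for p, q in zip(ea, eb)]
            if all(e <= c for e, c in zip(exps, counts)):
                out[(ra + rb, *exps)] += ca * cb
    return dict(out)


def run_distribution(counts):
    """Step 2: combine the single-letter generating functions.
    F[i] collects arrangements whose last run uses letter i; iterating
    F_i = g_i * (1 + sum_{j != i} F_j) adds one more run per pass."""
    k = len(counts)
    g = [block_gf(i, counts) for i in range(k)]
    one = {(0,) * (k + 1): 1}
    F = [{} for _ in range(k)]
    for _ in range(sum(counts)):  # an arrangement has at most sum(counts) runs
        new_F = []
        for i in range(k):
            prefix = one
            for j in range(k):
                if j != i:
                    prefix = add(prefix, F[j])
            new_F.append(multiply(g[i], prefix, counts))
        F = new_F
    # Read off the coefficients of x_0^counts[0] * ... * x_{k-1}^counts[k-1].
    target = tuple(counts)
    dist = defaultdict(int)
    for F_i in F:
        for (runs, *exps), coeff in F_i.items():
            if tuple(exps) == target:
                dist[runs] += coeff
    return dict(dist)


def brute_force(counts):
    """Direct enumeration of distinct arrangements, for checking only."""
    letters = [i for i, c in enumerate(counts) for _ in range(c)]
    dist = defaultdict(int)
    for seq in set(permutations(letters)):
        dist[sum(1 for _ in groupby(seq))] += 1
    return dict(dist)
```

For example, `run_distribution((3, 2))` returns the counts `{2: 2, 3: 3, 4: 4, 5: 1}` for the ten arrangements of three 0s and two 1s, which agrees with `brute_force((3, 2))`. Changing only `block_gf` (for instance, restricting the allowed run lengths) changes the statistic being counted without touching the combination step, which is the kind of decoupling the abstract describes.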

## Keywords

Combinatorial complexity; Distribution-free statistical test; Distributions of runs; Eulerian number and Simon Newcomb number; Generating function; Multivariate ligand binding; Randomness test; Rises, falls, and levels; Successions