Abstract
In this chapter we review methods for computing probabilities of the discrete scan statistic. Most of the presented results are for independent trials, as results for higher-order Markovian sequences are scarce. Results from three papers on exact computation of probabilities in Markovian sequences are given: two are for binary Markov chains, and the third allows multistate higher-order Markovian trials. Whereas exact computation of the complete distribution of the statistic is limited to relatively small values of the scanning window w, larger window sizes can be handled in the case of individual p-values and extreme values of the scan statistic. Approximations and bounds on probabilities for the statistic have been developed for still larger values of w. Product-type and Poisson/compound Poisson approximations are considered here, as well as Bonferroni- and product-type bounds that indicate the accuracy of the approximations. The final section presents numerical comparisons of exact and approximate methods to evaluate the accuracy of the approximations, and discusses possible areas of future study.
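To make the window-size limitation concrete, the following is a minimal sketch and not one of the chapter's algorithms. It assumes independent Bernoulli(p) trials and takes the scan statistic to be the largest number of successes in any w consecutive trials out of n. The function names `scan_tail_prob` and `scan_tail_prob_brute` are hypothetical. The first function tracks the last w - 1 outcomes as a state, so the number of states can grow like 2^(w-1), which suggests why exact methods of this kind are practical only for small w.

```python
from itertools import product


def scan_tail_prob(n, w, k, p):
    """P(S_w >= k) for n i.i.d. Bernoulli(p) trials and window length w.

    Assumption for this sketch: S_w is the maximum number of successes
    in any w consecutive trials, with 1 <= w <= n.
    """
    if not 1 <= w <= n:
        raise ValueError("need 1 <= w <= n")
    mask = (1 << (w - 1)) - 1          # keeps only the last w - 1 outcomes
    alive = {0: 1.0}                   # state -> mass with no window at k yet
    for _ in range(n):
        nxt = {}
        for state, mass in alive.items():
            ones = bin(state).count("1")
            for bit, prob in ((0, 1.0 - p), (1, p)):
                if ones + bit >= k:
                    continue           # current window reached k: absorbed
                new_state = ((state << 1) | bit) & mask
                nxt[new_state] = nxt.get(new_state, 0.0) + mass * prob
        alive = nxt
    return 1.0 - sum(alive.values())


def scan_tail_prob_brute(n, w, k, p):
    """Same probability by enumerating all 2**n sequences (tiny n only)."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        s = max(sum(xs[t:t + w]) for t in range(n - w + 1))
        if s >= k:
            ones = sum(xs)
            total += p ** ones * (1.0 - p) ** (n - ones)
    return total
```

For small inputs the two functions can be compared directly, for example `scan_tail_prob(8, 3, 2, 0.3)` against `scan_tail_prob_brute(8, 3, 2, 0.3)`. Handling Markovian dependence would require the success probability to depend on the stored state, which this sketch does not attempt.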
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Martin, D.E.K. (2019). Discrete Scan Statistics for Higher-Order Markovian Sequences. In: Glaz, J., Koutras, M. (eds) Handbook of Scan Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8414-1_35-1
DOI: https://doi.org/10.1007/978-1-4614-8414-1_35-1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8414-1
Online ISBN: 978-1-4614-8414-1
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering