Discrete Scan Statistics for Higher-Order Markovian Sequences

  • Living reference work entry
Handbook of Scan Statistics

Abstract

In this chapter we review methods for computing probabilities of the discrete scan statistic. Most of the presented results are for independent trials, as results for higher-order Markovian sequences are scarce. Results from three papers on exact computation of probabilities in Markovian sequences are given: two treat binary Markov chains, and the third allows multistate higher-order Markovian trials. Whereas exact computation of the complete distribution of the statistic is limited to relatively small values of the scanning window w, larger window sizes can be handled when only individual p-values or extreme values of the scan statistic are needed. Approximations and bounds on probabilities for the statistic have been developed for still larger values of w. Product-type and Poisson/compound Poisson approximations are considered here, as well as Bonferroni- and product-type bounds that give a feel for the accuracy of approximations. The final section presents numerical comparisons of exact and approximate methods to evaluate the accuracy of the approximations, and outlines possible areas of future study.
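To make the object of study concrete, the sketch below computes an exact tail probability P(S >= k) of the discrete scan statistic, where S is taken to be the largest number of successes in any w consecutive trials out of n. It is an illustration only and is not taken from the chapter or from any of the cited papers. It assumes a binary first-order Markov chain, and the function names and parameters (scan_tail_prob, p_init, p_trans) are hypothetical. The state is the content of the current window, so the number of states can grow like 2^w, which is consistent with the remark above that exact computation is limited to relatively small w.

```python
from itertools import product


def scan_tail_prob(n, w, k, p_init, p_trans):
    """P(max number of 1s in any w consecutive trials >= k), assuming n >= w.

    Assumed model (illustrative): binary first-order Markov chain with
    p_init = P(X_1 = 1) and p_trans[a] = P(X_t = 1 | X_{t-1} = a).
    """
    if n < w:
        raise ValueError("need n >= w")
    # dist maps the last (at most w) outcomes to the probability of reaching
    # that window content with no window having reached k successes so far.
    dist = {(): 1.0}
    hit = 0.0  # probability mass already absorbed (some window reached k)
    for _ in range(n):
        new = {}
        for state, pr in dist.items():
            p1 = p_init if not state else p_trans[state[-1]]
            for x, px in ((0, 1.0 - p1), (1, p1)):
                nxt = (state + (x,))[-w:]  # slide the window by one trial
                mass = pr * px
                if sum(nxt) >= k:
                    hit += mass  # absorb: the event S >= k has occurred
                else:
                    new[nxt] = new.get(nxt, 0.0) + mass
        dist = new
    return hit


def scan_tail_prob_bruteforce(n, w, k, p_init, p_trans):
    """Same probability by enumerating all 2^n sequences (checking only)."""
    total = 0.0
    for xs in product((0, 1), repeat=n):
        pr = p_init if xs[0] else 1.0 - p_init
        for a, b in zip(xs, xs[1:]):
            pr *= p_trans[a] if b else 1.0 - p_trans[a]
        if max(sum(xs[i:i + w]) for i in range(n - w + 1)) >= k:
            total += pr
    return total


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print(scan_tail_prob(50, 6, 5, 0.3, (0.2, 0.6)))
```

Setting both entries of p_trans equal to p_init reduces the model to independent Bernoulli trials. The brute-force function is included only as a cross-check for short sequences.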

Author information

Corresponding author

Correspondence to Donald E. K. Martin.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Martin, D.E.K. (2019). Discrete Scan Statistics for Higher-Order Markovian Sequences. In: Glaz, J., Koutras, M. (eds) Handbook of Scan Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8414-1_35-1

  • DOI: https://doi.org/10.1007/978-1-4614-8414-1_35-1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-8414-1

  • Online ISBN: 978-1-4614-8414-1

  • eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
