On the asymptotic distribution of the scan statistic for empirical distributions

Extremes

Abstract

This paper investigates the asymptotic behavior of several variants of the scan statistic for empirical distributions, which can be applied to detect the presence of an anomalous interval of any given length. In particular, we are interested in a Studentized scan statistic that is often preferable in practice. The main ingredients of our proof include Kolmogorov’s theorem, Poisson approximation, and the technical devices developed by Kabluchko and Wang (Stoch. Process. Appl. 124 (2014) 2824–2867).

Notes

  1. Naus himself cited even earlier works from the 1940s by Silberstein (1945), Berg (1945), and Mack (1948).

References

  • Aldous, D.: Probability approximations via the Poisson clumping heuristic, vol 77. Springer Science & Business Media (2013)

  • Anderson, T.W., Darling, D.A.: Asymptotic theory of certain goodness of fit criteria based on stochastic processes. The Annals of Mathematical Statistics pp 193–212 (1952)

  • Arias-Castro, E., Donoho, D.L., Huo, X.: Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inf. Theory 51(7), 2402–2425 (2005)

  • Arias-Castro, E., Chen, S., et al.: Distribution-free multiple testing. Elec. J. Stat. 11(1), 1983–2001 (2017)

  • Arias-Castro, E., Ying, A., et al.: Detection of sparse mixtures: Higher criticism and scan statistic. Elec. J. Stat. 13(1), 208–230 (2019)

  • Arias-Castro, E., Chen, S., Ying, A.: A scan procedure for multiple testing: Beyond threshold-type procedures. J. Stat. Plan. Inf. (2020)

  • Arratia, R., Goldstein, L., Gordon, L.: Two moments suffice for Poisson approximations: the Chen-Stein method. Ann. Probab. 17(1), 9–25 (1989)

  • Bahadur, R.R., Rao, R.R.: On deviations of the sample mean. Ann. Math. Statist. 31(4), 1015–1027 (1960)

  • Barnard, G.A.: Control charts and stochastic processes. J. Roy. Stat. Soc.: Ser. B (Methodol.) 21(2), 239–257 (1959)

  • Berg, W.: Aggregates in one-and two-dimensional random distributions. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 36(256), 337–346 (1945)

  • Berk, R.H., Jones, D.H.: Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 47(1), 47–59 (1979)

  • Besag, J., Newell, J.: The detection of clusters in rare diseases. J. R. Stat. Soc. A. Stat. Soc. 154(1), 143–155 (1991)

  • Cai, T.T., Wu, Y.: Optimal detection of sparse mixtures against a given null distribution. IEEE Trans. Inf. Theory 60(4), 2217–2232 (2014)

  • Cai, T.T., Jin, J., Low, M.G.: Estimation and confidence sets for sparse normal mixtures. Ann. Stat. 35(6), 2421–2449 (2007)

  • Cai, T.T., Jeng, X.J., Jin, J.: Optimal detection of heterogeneous and heteroscedastic mixtures. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73(5), 629–662 (2011)

  • Chan, H.P., Lai, T.L.: Maxima of asymptotically Gaussian random fields and moderate deviation approximations to boundary crossing probabilities of sums of random variables with multidimensional indices. Ann. Probab. 34(1), 80–121 (2006)

  • Cramér, H.: Les sommes et les fonctions de variables aléatoires, vol. 736. Hermann (1938)

  • Darling, D., Erdős, P.: A limit theorem for the maximum of normalized sums of independent random variables. Duke Math. J. 23(1), 143–155 (1956)

  • Deheuvels, P., Devroye, L., Lynch, J.: Exact convergence rate in the limit theorems of Erdős-Rényi and Shepp. Ann. Probab. 14(1), 209–223 (1986)

  • Donoho, D., Jin, J.: Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics pp 962–994 (2004)

  • Donoho, D., Jin, J.: Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proc. Natl. Acad. Sci. 105(39), 14790–14795 (2008)

  • Donoho, D., Jin, J.: Special invited paper: Higher criticism for large-scale inference, especially for rare and weak effects. Stat. Sci. pp 1–25 (2015)

  • Dümbgen, L., Spokoiny, V.G.: Multiscale testing of qualitative hypotheses. Annals of Statistics pp 124–152 (2001)

  • Eicker, F.: The asymptotic distribution of the suprema of the standardized empirical processes. The Annals of Statistics pp 116–138 (1979)

  • Erdős, P., Rényi, A.: On a new law of large numbers. J d’Analyse Mathématique 23(1), 103–111 (1970)

  • Gao, Z., Stoev, S., et al.: Fundamental limits of exact support recovery in high dimensions. Bernoulli 26(4), 2605–2638 (2020)

  • Glaz, J., Balakrishnan, N.: Scan statistics and applications. Springer (1999)

  • Glaz, J., Koutras, M.V.: Handbook of Scan Statistics. Springer, New York (2018). https://doi.org/10.1007/978-1-4614-8414-1

  • Glaz, J., Naus, J.I., Wallenstein, S.: Scan statistics. Springer (2001)

  • Glaz, J., Pozdnyakov, V., Wallenstein, S.: Scan statistics: Methods and applications. Springer Science & Business Media (2009)

  • Gombay, E., Horvath, L.: An application of the maximum likelihood test to the change-point problem. Stochastic Processes and their Applications 50(1), 161–171 (1994)

  • Gontscharuk, V., Finner, H.: Asymptotics of goodness-of-fit tests based on minimum p-value statistics. Communications in Statistics-Theory and Methods 46(5), 2332–2342 (2017)

  • Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M., Weiss, D.: Syndromic surveillance in public health practice, New York City. Emerging Infectious Diseases 10(5), 858–864 (2004)

  • Jaeschke, D.: The asymptotic distribution of the supremum of the standardized empirical distribution function on subintervals. The Annals of Statistics pp 108–115 (1979)

  • Jager, L., Wellner, J.A.: A new goodness of fit test: the reversed Berk-Jones statistic (2004)

  • Jager, L., Wellner, J.A.: Goodness-of-fit tests via phi-divergences. Ann. Stat. 35(5), 2018–2053 (2007)

  • Jin, J.: Detecting and estimating sparse mixtures. PhD thesis, Stanford University (2003)

  • Jin, J., Starck, J.L., Donoho, D.L., Aghanim, N., Forni, O.: Cosmological non-Gaussian signature detection: Comparing performance of different statistical tests. EURASIP Journal on Advances in Signal Processing 15, 297184 (2005)

  • Kabluchko, Z.: Extremes of the standardized Gaussian noise. Stochastic Processes and their Applications 121(3), 515–533 (2011)

  • Kabluchko, Z., Wang, Y.: Limiting distribution for the maximal standardized increment of a random walk. Stochastic Processes and their Applications 124(9), 2824–2867 (2014)

  • Kolmogorov, A.: Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Istituto Italiano degli Attuari 4, 89–91 (1933)

  • König, C., Munk, A., Werner, F., et al.: Multidimensional multiscale scanning in exponential families: Limit theory and statistical consequences. Ann. Stat. 48(2), 655–678 (2020)

  • Kulldorff, M.: A spatial scan statistic. Communications in Statistics-Theory and Methods 26(6), 1481–1496 (1997)

  • Mack, C.: An exact formula for \(q_k(n)\), the probable number of \(k\)-aggregates in a random distribution of \(n\) points. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 39(297), 778–790 (1948)

  • Mason, D.M., Shorack, G.R., Wellner, J.A.: Strong limit theorems for oscillation moduli of the uniform empirical process. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 65(1), 83–97 (1983)

  • Mikosch, T., Račkauskas, A.: The limit distribution of the maximum increment of a random walk with regularly varying jump size distribution. Bernoulli 16(4), 1016–1038 (2010)

  • Moscovich, A., Nadler, B., Spiegelman, C.: On the exact Berk-Jones statistics and their \(p\)-value calculation. Elec. J. Stat. 10(2), 2329–2354 (2016)

  • Naus, J.I.: The distribution of the size of the maximum cluster of points on a line. J. Am. Stat. Assoc. 60(310), 532–538 (1965)

  • Petrov, V.V.: Limit theorems of probability theory: sequences of independent random variables. Tech. rep, Oxford, New York (1995)

  • Proksch, K., Werner, F., Munk, A.: Multiscale scanning in inverse problems. Ann. Stat. 46(6B), 3569–3602 (2018)

  • Qualls, C., Watanabe, H.: Asymptotic properties of Gaussian random fields. Trans. Am. Math. Soc. 177, 155–171 (1973)

  • Sharpnack, J., Arias-Castro, E.: Exact asymptotics for the scan statistic and fast alternatives. Elec. J. Stat. 10(2), 2641–2684 (2016)

  • Shorack, G.R., Wellner, J.A.: Empirical processes with applications to statistics. SIAM (2009)

  • Siegmund, D.: Large deviations for boundary crossing probabilities. The Annals of Probability pp 581–588 (1982) 

  • Siegmund, D.: Boundary crossing probabilities and statistical applications. The Annals of Statistics pp 361–404 (1986)

  • Siegmund, D.: Approximate tail probabilities for the maxima of some random fields. The Annals of Probability pp 487–501 (1988)

  • Siegmund, D.: Sequential analysis: tests and confidence intervals. Springer Science & Business Media (2013)

  • Siegmund, D., Venkatraman, E.: Using the generalized likelihood ratio statistic for sequential detection of a change-point. The Annals of Statistics pp 255–271 (1995)

  • Siegmund, D., Yakir, B.: Tail probabilities for the null distribution of scanning statistics. Bernoulli 6(2), 191–213 (2000)

  • Silberstein, L.: The probable number of aggregates in distributions of points. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 36(256), 319–336 (1945)

Acknowledgements

Andrew Ying was partially supported by the Achievement Rewards for College Scientists (ARCS) Scholarship. The authors would like to thank Ery Arias-Castro for motivating the problem, and Qi-Man Shao, Xiao Fang, Hock Peng Chan and David O. Siegmund for stimulating discussions and pointers to the literature. The authors would also like to thank two anonymous reviewers for their valuable suggestions which considerably improved the content and the structure of the paper.

Author information

Corresponding author

Correspondence to Andrew Ying.

About this article

Cite this article

Ying, A., Zhou, WX. On the asymptotic distribution of the scan statistic for empirical distributions. Extremes 25, 487–528 (2022). https://doi.org/10.1007/s10687-021-00435-1

  • DOI: https://doi.org/10.1007/s10687-021-00435-1
