Abstract
This paper investigates the asymptotic behavior of several variants of the scan statistic for empirical distributions, which can be applied to detect the presence of an anomalous interval of any given length. In particular, we are interested in a Studentized scan statistic that is often preferable in practice. The main ingredients of our proof include Kolmogorov’s theorem, Poisson approximation, and the technical devices developed by Kabluchko and Wang (Stoch. Process. Their Appl. 124 (2014) 2824–2867).
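The abstract does not define the statistics, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes the sample lies in [0, 1] and is Uniform(0, 1) under the null. It scans intervals (a, b] whose endpoints lie on a regular grid and computes two maxima of the empirical excess √n(p̂ − (b − a)), where p̂ = F_n(b) − F_n(a) is the empirical mass of the interval. The standardized version divides by the null standard deviation √((b − a)(1 − (b − a))). The Studentized version divides by the plug-in estimate √(p̂(1 − p̂)). The function name `scan_statistics`, the grid, and the parameters `grid_size` and `min_len` are hypothetical choices; the exact normalizations and the range of interval lengths treated in the paper may differ.

```python
import math
from bisect import bisect_right


def scan_statistics(sample, grid_size=100, min_len=0.05):
    """Maximal standardized and Studentized interval excesses (illustrative sketch).

    Assumption: the sample lies in [0, 1] and is Uniform(0, 1) under the null.
    Intervals (a, b] have endpoints on the grid {0, 1/grid_size, ..., 1}.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    grid = [k / grid_size for k in range(grid_size + 1)]
    # Empirical CDF F_n(t) = #{x <= t} / n evaluated at every grid point.
    ecdf = [bisect_right(xs, t) / n for t in grid]

    best_std = -math.inf
    best_stu = -math.inf
    for i in range(len(grid)):
        for j in range(i + 1, len(grid)):
            length = grid[j] - grid[i]
            # Skip short intervals and the full interval (zero null variance).
            if length < min_len or length >= 1.0:
                continue
            p_hat = ecdf[j] - ecdf[i]  # empirical mass of (a, b]
            excess = math.sqrt(n) * (p_hat - length)
            # Standardized: divide by the null standard deviation.
            best_std = max(best_std, excess / math.sqrt(length * (1.0 - length)))
            # Studentized: divide by the plug-in estimate; undefined if p_hat is 0 or 1.
            if 0.0 < p_hat < 1.0:
                best_stu = max(best_stu, excess / math.sqrt(p_hat * (1.0 - p_hat)))
    return best_std, best_stu
```

Intervals whose empirical mass is 0 or 1 are skipped in the Studentized maximum because the estimated variance vanishes there. A sample with a tight cluster of points produces large values of both statistics.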
References
Aldous, D.: Probability approximations via the Poisson clumping heuristic, vol. 77. Springer Science & Business Media (2013)
Anderson, T.W., Darling, D.A.: Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann. Math. Stat., 193–212 (1952)
Arias-Castro, E., Donoho, D.L., Huo, X.: Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inf. Theory 51(7), 2402–2425 (2005)
Arias-Castro, E., Chen, S., et al.: Distribution-free multiple testing. Electron. J. Stat. 11(1), 1983–2001 (2017)
Arias-Castro, E., Ying, A., et al.: Detection of sparse mixtures: Higher criticism and scan statistic. Electron. J. Stat. 13(1), 208–230 (2019)
Arias-Castro, E., Chen, S., Ying, A.: A scan procedure for multiple testing: Beyond threshold-type procedures. J. Stat. Plan. Inf. (2020)
Arratia, R., Goldstein, L., Gordon, L.: Two moments suffice for Poisson approximations: the Chen–Stein method. Ann. Probab. 17(1), 9–25 (1989)
Bahadur, R.R., Rao, R.R.: On deviations of the sample mean. Ann. Math. Stat. 31(4), 1015–1027 (1960)
Barnard, G.A.: Control charts and stochastic processes. J. Roy. Stat. Soc.: Ser. B (Methodol.) 21(2), 239–257 (1959)
Berg, W.: Aggregates in one- and two-dimensional random distributions. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 36(256), 337–346 (1945)
Berk, R.H., Jones, D.H.: Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Z. Wahrscheinlichkeitstheorie verw. Gebiete 47(1), 47–59 (1979)
Besag, J., Newell, J.: The detection of clusters in rare diseases. J. R. Stat. Soc. A. Stat. Soc. 154(1), 143–155 (1991)
Cai, T.T., Wu, Y.: Optimal detection of sparse mixtures against a given null distribution. IEEE Trans. Inf. Theory 60(4), 2217–2232 (2014)
Cai, T.T., Jin, J., Low, M.G.: Estimation and confidence sets for sparse normal mixtures. Ann. Stat. 35(6), 2421–2449 (2007)
Cai, T.T., Jeng, X.J., Jin, J.: Optimal detection of heterogeneous and heteroscedastic mixtures. J. R. Stat. Soc. Ser. B Stat. Methodol. 73(5), 629–662 (2011)
Chan, H.P., Lai, T.L.: Maxima of asymptotically Gaussian random fields and moderate deviation approximations to boundary crossing probabilities of sums of random variables with multidimensional indices. Ann. Probab. 34(1), 80–121 (2006)
Cramér, H.: Les sommes et les fonctions de variables aléatoires, vol. 736. Hermann (1938)
Darling, D., Erdös, P.: A limit theorem for the maximum of normalized sums of independent random variables. Duke Math. J. 23(1), 143–155 (1956)
Deheuvels, P., Devroye, L., Lynch, J.: Exact convergence rate in the limit theorems of Erdős–Rényi and Shepp. Ann. Probab. 14(1), 209–223 (1986)
Donoho, D., Jin, J.: Higher criticism for detecting sparse heterogeneous mixtures. Ann. Stat., 962–994 (2004)
Donoho, D., Jin, J.: Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proc. Natl. Acad. Sci. 105(39), 14790–14795 (2008)
Donoho, D., Jin, J.: Special invited paper: Higher criticism for large-scale inference, especially for rare and weak effects. Stat. Sci., 1–25 (2015)
Dümbgen, L., Spokoiny, V.G.: Multiscale testing of qualitative hypotheses. Ann. Stat., 124–152 (2001)
Eicker, F.: The asymptotic distribution of the suprema of the standardized empirical processes. Ann. Stat., 116–138 (1979)
Erdös, P., Rényi, A.: On a new law of large numbers. J. Anal. Math. 23(1), 103–111 (1970)
Gao, Z., Stoev, S., et al.: Fundamental limits of exact support recovery in high dimensions. Bernoulli 26(4), 2605–2638 (2020)
Glaz, J., Balakrishnan, N.: Scan statistics and applications. Springer (1999)
Glaz, J., Koutras, M.V.: Handbook of Scan Statistics. Springer, New York (2018). https://doi.org/10.1007/978-1-4614-8414-1
Glaz, J., Naus, J.I., Wallenstein, S.: Scan statistics. Springer (2001)
Glaz, J., Pozdnyakov, V., Wallenstein, S.: Scan statistics: Methods and applications. Springer Science & Business Media (2009)
Gombay, E., Horváth, L.: An application of the maximum likelihood test to the change-point problem. Stoch. Process. Appl. 50(1), 161–171 (1994)
Gontscharuk, V., Finner, H.: Asymptotics of goodness-of-fit tests based on minimum p-value statistics. Commun. Stat. Theory Methods 46(5), 2332–2342 (2017)
Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M., Weiss, D.: Syndromic surveillance in public health practice, New York City. Emerging Infectious Diseases 10(5), 858–864 (2004)
Jaeschke, D.: The asymptotic distribution of the supremum of the standardized empirical distribution function on subintervals. Ann. Stat., 108–115 (1979)
Jager, L., Wellner, J.A.: A new goodness of fit test: the reversed Berk–Jones statistic (2004)
Jager, L., Wellner, J.A.: Goodness-of-fit tests via phi-divergences. Ann. Stat. 35(5), 2018–2053 (2007)
Jin, J.: Detecting and estimating sparse mixtures. PhD thesis, Stanford University (2003)
Jin, J., Starck, J.L., Donoho, D.L., Aghanim, N., Forni, O.: Cosmological non-Gaussian signature detection: Comparing performance of different statistical tests. EURASIP J. Adv. Signal Process. 15, 297184 (2005)
Kabluchko, Z.: Extremes of the standardized Gaussian noise. Stoch. Process. Appl. 121(3), 515–533 (2011)
Kabluchko, Z., Wang, Y.: Limiting distribution for the maximal standardized increment of a random walk. Stoch. Process. Appl. 124(9), 2824–2867 (2014)
Kolmogorov, A.: Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Istituto Italiano degli Attuari 4, 89–91 (1933)
König, C., Munk, A., Werner, F., et al.: Multidimensional multiscale scanning in exponential families: Limit theory and statistical consequences. Ann. Stat. 48(2), 655–678 (2020)
Kulldorff, M.: A spatial scan statistic. Commun. Stat. Theory Methods 26(6), 1481–1496 (1997)
Mack, C.: An exact formula for \(q_k(n)\), the probable number of \(k\)-aggregates in a random distribution of \(n\) points. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 39(297), 778–790 (1948)
Mason, D.M., Shorack, G.R., Wellner, J.A.: Strong limit theorems for oscillation moduli of the uniform empirical process. Z. Wahrscheinlichkeitstheorie verw. Gebiete 65(1), 83–97 (1983)
Mikosch, T., Račkauskas, A.: The limit distribution of the maximum increment of a random walk with regularly varying jump size distribution. Bernoulli 16(4), 1016–1038 (2010)
Moscovich, A., Nadler, B., Spiegelman, C.: On the exact Berk–Jones statistics and their \(p\)-value calculation. Electron. J. Stat. 10(2), 2329–2354 (2016)
Naus, J.I.: The distribution of the size of the maximum cluster of points on a line. J. Am. Stat. Assoc. 60(310), 532–538 (1965)
Petrov, V.V.: Limit theorems of probability theory: sequences of independent random variables. Oxford University Press, New York (1995)
Proksch, K., Werner, F., Munk, A.: Multiscale scanning in inverse problems. Ann. Stat. 46(6B), 3569–3602 (2018)
Qualls, C., Watanabe, H.: Asymptotic properties of Gaussian random fields. Trans. Am. Math. Soc. 177, 155–171 (1973)
Sharpnack, J., Arias-Castro, E.: Exact asymptotics for the scan statistic and fast alternatives. Electron. J. Stat. 10(2), 2641–2684 (2016)
Shorack, G.R., Wellner, J.A.: Empirical processes with applications to statistics. SIAM (2009)
Siegmund, D.: Large deviations for boundary crossing probabilities. Ann. Probab., 581–588 (1982)
Siegmund, D.: Boundary crossing probabilities and statistical applications. Ann. Stat., 361–404 (1986)
Siegmund, D.: Approximate tail probabilities for the maxima of some random fields. Ann. Probab., 487–501 (1988)
Siegmund, D.: Sequential analysis: tests and confidence intervals. Springer Science & Business Media (2013)
Siegmund, D., Venkatraman, E.: Using the generalized likelihood ratio statistic for sequential detection of a change-point. Ann. Stat., 255–271 (1995)
Siegmund, D., Yakir, B.: Tail probabilities for the null distribution of scanning statistics. Bernoulli 6(2), 191–213 (2000)
Silberstein, L.: The probable number of aggregates in distributions of points. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 36(256), 319–336 (1945)
Acknowledgements
Andrew Ying was partially supported by the Achievement Rewards for College Scientists (ARCS) Scholarship. The authors would like to thank Ery Arias-Castro for motivating the problem, and Qi-Man Shao, Xiao Fang, Hock Peng Chan and David O. Siegmund for stimulating discussions and pointers to the literature. The authors would also like to thank two anonymous reviewers for their valuable suggestions which considerably improved the content and the structure of the paper.
About this article
Cite this article
Ying, A., Zhou, WX. On the asymptotic distribution of the scan statistic for empirical distributions. Extremes 25, 487–528 (2022). https://doi.org/10.1007/s10687-021-00435-1