Abstract
Motivated by the increasing frequency of natural and anthropogenic hazards, we revisit the problem of assessing whether an apparent temporal clustering in a sequence of randomly occurring events is a genuine surprise that calls for closer examination. We study the problem in both discrete and continuous time formulations. In the discrete formulation, the problem reduces to deriving the probability that p independent people all have birthdays within d days of each other. We provide an analytical expression for a warning limit: if a subset of p people among n is observed to have birthdays within d days of each other and d is smaller than the warning limit, the observation should be treated as a surprising cluster. In the continuous time framework, three sets of results are given. First, we provide an asymptotic analysis of the problem by embedding it into an extreme value problem for high order spacings of iid samples from the U[0, 1] density. Second, a novel analytical nonasymptotic bound is derived using tools from empirical process theory. Finally, the required probability is approximated using various bounds and asymptotic results on the supremum of the scanning process of a one-dimensional stationary Poisson process. We apply the theory to climate-change-related datasets, temperature datasets, and mass shooting records in the United States. These real data applications provide supporting evidence for climate change and for recent spikes in gun violence.
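The discrete formulation can be explored numerically. The following is a minimal Monte Carlo sketch, not the analytical warning limit derived in the paper. It estimates the probability that some p of n independent, uniformly distributed birthdays fall within d days of each other. The function names, the 365-day circular calendar, and the reading of "within d days" as "the earliest and latest of the p birthdays differ by at most d" are all assumptions made for illustration.

```python
import random


def has_cluster(birthdays, p, d, year_length=365):
    """Return True if some p of the birthdays lie within d days of each other.

    Assumed convention: days are integers 0..year_length-1 on a circular
    calendar, and p birthdays form a cluster when they fit in an arc whose
    first and last day differ by at most d (with d < year_length).
    """
    n = len(birthdays)
    if n < p:
        return False
    days = sorted(birthdays)
    # Unroll the circle: repeating the days shifted by one year makes
    # arcs that wrap past the end of the year contiguous in the list.
    extended = days + [x + year_length for x in days]
    # Any cluster of p points is a run of p consecutive sorted points.
    for i in range(n):
        if extended[i + p - 1] - extended[i] <= d:
            return True
    return False


def estimate_cluster_probability(n, p, d, trials=20000, year_length=365, seed=0):
    """Monte Carlo estimate of P(some p of n birthdays lie within d days)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(year_length) for _ in range(n)]
        if has_cluster(birthdays, p, d, year_length):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # p = 2, d = 0 is the classical birthday problem.
    print(estimate_cluster_probability(n=23, p=2, d=0))
    # A triple of birthdays within one day of each other among 23 people.
    print(estimate_cluster_probability(n=23, p=3, d=1))
```

With p = 2 and d = 0 the sketch reduces to the classical birthday problem, which gives a quick sanity check on the simulation. Comparing an observed d with the distribution of the smallest simulated window is one informal way to judge whether a cluster is surprising, under the same uniformity and independence assumptions.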
Acknowledgements
We are greatly indebted to Joe Glaz and Christian Robert for carefully reading earlier drafts of this manuscript and for contributing to the development of the results. Comments from two anonymous reviewers greatly improved this paper, and we are much indebted to them. Li’s research is partially supported by NSF grants DPP-1418339 and AGS-1602845 and NASA-NNX14A080G, and DasGupta’s research is partially supported by grant 206057 from Elsevier Global Analytics.
Cite this article
Dasgupta, A., Li, B. Detection and Analysis of Spikes in a Random Sequence. Methodol Comput Appl Probab 20, 1429–1451 (2018). https://doi.org/10.1007/s11009-018-9637-0
DOI: https://doi.org/10.1007/s11009-018-9637-0