Abstract
Researchers frequently scan sequences for unusual clustering of events. Glaz et al. (2001) survey the scan statistic tools developed for these analyses. Many of these tools deal with clustering of a single type of event. In other applications the researcher scans for clusters of two types of events, A and B. Consider a sequence of D independent and identically distributed trials, where each trial has one of four possible outcomes: A^c ∩ B^c, A ∩ B^c, A^c ∩ B, A ∩ B. When the events A and B both occur within d consecutive trials, we say that a two-type d-cluster has occurred. A directional cluster is also defined, which requires that the A event come at least as early as the B event. Naus and Wartenberg (1997) develop a double scan statistic that counts the number of declumped (a type of non-overlapping) clusters that contain at least one of each of the two types of events. They derive the expectation, the variance, and a Poisson approximation for the distribution of the double scan statistic. The approximation and declumping methods work well when the events are relatively rare, but less well when the two types of events occur with high frequency. This paper develops an alternative family of double scan statistics that count the number of non-overlapping two-type d-clusters. These new double scan statistics behave similarly to the Naus-Wartenberg statistic for rare events, but capture other information in the denser event case. Exact and approximate results are derived for the distribution of the new double scan statistics, allowing their use over a wider range of event densities. The double scan statistics are compared for the epidemiologic application in Naus and Wartenberg (1997), and for a molecular biology application involving genome versus genome protein hits.
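As a rough illustration of the two-type d-cluster definition in the abstract, the Python sketch below simulates independent trials with the four possible outcomes and counts non-overlapping clusters with a simple left-to-right rule: once an A and a B fall within d consecutive trials, a cluster is counted and the scan restarts at the next trial. The function names, the greedy restart rule, and the outcome probabilities are assumptions made for illustration only; they are not taken from the paper, whose family of statistics may define non-overlapping clusters differently.

```python
import random

def simulate_trials(n_trials, probs, seed=0):
    """Draw i.i.d. trials; each trial is a pair (a_occurred, b_occurred).

    probs gives hypothetical probabilities for the four outcomes, in the order
    (neither A nor B, A only, B only, both A and B).
    """
    outcomes = [(False, False), (True, False), (False, True), (True, True)]
    rng = random.Random(seed)
    return rng.choices(outcomes, weights=probs, k=n_trials)

def count_nonoverlapping_clusters(trials, d, directional=False):
    """Greedy left-to-right count of non-overlapping two-type d-clusters.

    Assumed rule: a cluster is counted as soon as the most recent A and the
    most recent B (both after the previous counted cluster) lie within d
    consecutive trials; the scan then restarts from the next trial.
    In the directional case the A must come at least as early as the B.
    """
    last_a = last_b = None
    count = 0
    for i, (a, b) in enumerate(trials):
        if a:
            last_a = i
        if b:
            last_b = i
        if last_a is None or last_b is None:
            continue
        gap = last_b - last_a  # positive when A precedes B
        hit = (0 <= gap < d) if directional else (abs(gap) < d)
        if hit:
            count += 1
            last_a = last_b = None  # restart so that clusters do not overlap
    return count

# Small hand-checkable sequence: A at trials 0 and 5, B at trials 2 and 4.
example = [(True, False), (False, False), (False, True),
           (False, False), (False, True), (True, False)]
undirected = count_nonoverlapping_clusters(example, d=3)
directed = count_nonoverlapping_clusters(example, d=3, directional=True)

# Simulated sequence with hypothetical rare-event probabilities.
simulated = simulate_trials(1000, probs=(0.90, 0.04, 0.04, 0.02), seed=1)
simulated_count = count_nonoverlapping_clusters(simulated, d=5)
```

In the hand-checked sequence, the second A follows the second B, so it completes a cluster only under the non-directional definition. Other restart conventions would give different counts when events are dense, which is the regime where, according to the abstract, the choice of statistic matters most.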
References
O. Barndorff-Nielsen, Information and Exponential Families, Wiley: Chichester, 1978.
E. Çinlar, Introduction to Stochastic Processes, Prentice-Hall: Englewood Cliffs, NJ, 1975.
C. F. Chen and S. Karlin, “Poisson approximation for conditional r-scan lengths of multiple renewal processes and applications to marker arrays in biomolecular sequences,” Journal of Applied Probability vol. 37 pp. 865–880, 2000.
M. D. Ermolaeva, O. White, and S. L. Salzberg, “Prediction of operons in microbial genomes,” Nucleic Acids Research vol. 29 pp. 1216–1221, 2001.
J. Glaz, J. Naus, and S. Wallenstein, Scan Statistics, Springer: New York, 2001.
M. Greenberg, J. Naus, D. Schneider, and D. Wartenberg, “Temporal clustering of homicide and suicide among 15–24 year old white and black Americans,” Ethnicity and Disease vol. 1 pp. 342–350, 1991.
J. Hoh and J. Ott, “Scan statistics to scan markers for susceptibility genes,” Proceedings of the National Academy of Sciences USA vol. 97 pp. 9615–9617, 2000.
S. Karlin and V. Brendel, “Chance and statistical significance in protein and DNA sequence analysis,” Science vol. 257 pp. 39–49, 1992.
S. Karlin and C. F. Chen, “r-Scan statistics of a marker array in multiple sequences derived from a common progenitor,” Annals of Applied Probability vol. 10 pp. 709–725, 2000.
S. Karlin and F. Ost, “Counts of long aligned word matches among random letter sequences,” Advances in Applied Probability vol. 19 pp. 293–351, 1987.
S. Karlin and F. Ost, “Maximal length of common words among random letter sequences,” Annals of Probability vol. 16 pp. 535–563, 1988.
M. Y. Leung, G. A. Schachtel, and H. S. Yu, “Scan statistics and DNA sequence analysis: The search for an origin of replication in a virus,” Nonlinear World vol. 1 pp. 445–471, 1994.
J. I. Naus and K. N. Sheng, “Matching among multiple random sequences,” Bulletin of Mathematical Biology vol. 59 pp. 483–496, 1997.
J. Naus and D. Wartenberg, “A double-scan statistic for clusters of two types of events,” Journal of the American Statistical Association vol. 92 pp. 1105–1113, 1997.
V. T. Stefanov, “Noncurved exponential families associated with observations over finite-state Markov chains,” Scandinavian Journal of Statistics vol. 18 pp. 353–356, 1991.
V. T. Stefanov, “On some waiting time problems,” Journal of Applied Probability vol. 37 pp. 756–764, 2000.
Cite this article
Naus, J.I., Stefanov, V.T. Double-Scan Statistics. Methodology and Computing in Applied Probability 4, 163–180 (2002). https://doi.org/10.1023/A:1020641624294