Abstract

Researchers frequently scan sequences for unusual clustering of events. Glaz et al. (2001) survey the scan statistic tools developed for these analyses. Many of these tools deal with clustering of a single type of event. In other applications the researcher scans for clusters of two types of events, A and B. Consider a sequence of D independent and identically distributed trials where each trial has one of four possible outcomes: A^cB^c, AB^c, A^cB, AB (the superscript c denotes the complement). When the events A and B both occur within d consecutive trials, we say that a two-type d-cluster has occurred (a directional cluster is also defined, which requires that the A event come at least as early as the B event). Naus and Wartenberg (1997) develop a double scan statistic that counts the number of declumped (a type of non-overlapping) clusters that contain at least one of each of two different types of events. They derive the expectation and variance of the double scan statistic and a Poisson approximation for its distribution. The approximation and declumping methods work well when the events are relatively rare, but less well when the two types of events occur with high frequency. This paper develops an alternative family of double scan statistics that count the number of non-overlapping two-type d-clusters. These new double scan statistics behave similarly to the Naus-Wartenberg statistic for rare events, but capture other information in the denser-event case. Exact and approximate results are derived for the distribution of the new double scan statistics, allowing their use over a wider range of event densities. The double scan statistics are compared for the epidemiologic application in Naus and Wartenberg, and for a molecular biology application involving genome versus genome protein hits.
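As an illustration of the cluster definition above, the following is a minimal sketch that counts non-overlapping two-type d-clusters in a sequence of trials. It assumes a greedy left-to-right rule: a cluster is counted as soon as an A and a B lie within d consecutive trials, and the scan then restarts after the current trial. This rule, the function name `count_two_type_clusters`, and the encoding of a trial as a pair of booleans are assumptions made for illustration; the abstract does not specify the counting rule, and the statistics developed in the paper may define non-overlapping clusters differently.

```python
# Each trial is a pair (a_occurred, b_occurred); names are illustrative only.
A, B, AB, NONE = (True, False), (False, True), (True, True), (False, False)


def count_two_type_clusters(trials, d, directional=False):
    """Greedy count of non-overlapping two-type d-clusters (assumed rule).

    A cluster is recorded when the most recent A and the most recent B
    (since the last recorded cluster) fall within d consecutive trials.
    If directional is True, the A must occur no later than the B.
    """
    count = 0
    last_a = last_b = None  # positions of latest unused A and B events
    for t, (a, b) in enumerate(trials):
        if a:
            last_a = t
        if b:
            last_b = t
        if last_a is None or last_b is None:
            continue
        within_window = abs(last_a - last_b) < d
        a_not_after_b = last_a <= last_b
        if within_window and (a_not_after_b or not directional):
            count += 1
            last_a = last_b = None  # restart so clusters do not overlap
    return count


trials = [A, NONE, B, B, A, AB]
print(count_two_type_clusters(trials, d=3))                    # 3
print(count_two_type_clusters(trials, d=3, directional=True))  # 2
```

In the example, the undirected count finds clusters ending at trials 3, 5, and 6, while the directional count skips the B-then-A pair at trials 4 and 5.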

References

  • O. Barndorff-Nielsen, Information and Exponential Families, Wiley: Chichester, 1978.

  • E. Çinlar, Introduction to Stochastic Processes, Prentice-Hall: Englewood Cliffs, NJ, 1975.

  • C. F. Chen and S. Karlin, “Poisson approximation for conditional r-scan lengths of multiple renewal processes and applications to marker arrays in biomolecular sequences,” Journal of Applied Probability vol. 37 pp. 865–880, 2000.

  • M. D. Ermolaeva, O. White, and S. L. Salzberg, “Prediction of operons in microbial genomes,” Nucleic Acids Research vol. 29 pp. 1216–1221, 2001.

  • J. Glaz, J. Naus, and S. Wallenstein, Scan Statistics, Springer: New York, 2001.

  • M. Greenberg, J. Naus, D. Schneider, and D. Wartenberg, “Temporal clustering of homicide and suicide among 15–24 year old white and black Americans,” Ethnicity and Disease vol. 1 pp. 342–350, 1991.

  • J. Hoh and J. Ott, “Scan statistics to scan markers for susceptibility genes,” Proceedings of the National Academy of Sciences USA vol. 97 pp. 9615–9617, 2000.

  • S. Karlin and V. Brendel, “Chance and statistical significance in protein and DNA sequence analysis,” Science vol. 257 pp. 39–49, 1992.

  • S. Karlin and C. F. Chen, “r-Scan statistics of a marker array in multiple sequences derived from a common progenitor,” Annals of Applied Probability vol. 10 pp. 709–725, 2000.

  • S. Karlin and F. Ost, “Counts of long aligned word matches among random letter sequences,” Advances in Applied Probability vol. 19 pp. 293–351, 1987.

  • S. Karlin and F. Ost, “Maximal length of common words among random letter sequences,” Annals of Probability vol. 16 pp. 535–563, 1988.

  • M. Y. Leung, G. A. Schachtel, and H. S. Yu, “Scan statistics and DNA sequence analysis: The search for an origin of replication in a virus,” Nonlinear World vol. 1 pp. 445–471, 1994.

  • J. I. Naus and K. N. Sheng, “Matching among multiple random sequences,” Bulletin of Mathematical Biology vol. 59 pp. 483–496, 1997.

  • J. Naus and D. Wartenberg, “A double-scan statistic for clusters of two types of events,” Journal of the American Statistical Association vol. 92 pp. 1105–1113, 1997.

  • V. T. Stefanov, “Noncurved exponential families associated with observations over finite-state Markov chains,” Scandinavian Journal of Statistics vol. 18 pp. 353–356, 1991.

  • V. T. Stefanov, “On some waiting time problems,” Journal of Applied Probability vol. 37 pp. 756–764, 2000.

About this article

Cite this article

Naus, J.I., Stefanov, V.T. Double-Scan Statistics. Methodology and Computing in Applied Probability 4, 163–180 (2002). https://doi.org/10.1023/A:1020641624294

  • DOI: https://doi.org/10.1023/A:1020641624294