Scan Statistic Tail Probability Assessment Based on Process Covariance and Window Size

Methodology and Computing in Applied Probability

Abstract

A scan statistic is examined for testing the existence of a global peak in a random process of dependent variables with an arbitrary distribution. The tail probability of the scan statistic is obtained from the covariance of the moving sums process, thereby accounting for the spatial nature of the data as well as the size of the searching window. Exact formulas linking this covariance to the window size and the correlation coefficient are developed under general, common and auto covariance structures of the variables in the original process. The implementation and applicability of the formulas are demonstrated on multiple processes of t-statistics, including the case of unknown covariance. A sensitivity analysis provides further insight into how the tail probability interacts with the parameters that influence it. R code for the tail probability computation and the data analysis is provided in the supplementary material.
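To illustrate how the covariance of a moving sums process depends on the window size and the correlation coefficient, the following is a minimal Python sketch. It is not the author's R code and does not reproduce the paper's formulas. It assumes unit variances and a common correlation `rho` between all pairs of variables, and the function names (`moving_sum_cov`, `common_corr_matrix`, `closed_form_common`) are hypothetical. The closed form used here follows from summing the covariance matrix over two windows of size `w` whose starting points are `d` positions apart.

```python
def moving_sum_cov(sigma, w):
    """Covariance matrix of the moving sums S_i = X_i + ... + X_{i+w-1}.

    sigma is the covariance matrix of X (list of lists), w the window size.
    Cov(S_i, S_k) is the sum of sigma[a][b] over a in window i, b in window k.
    """
    n = len(sigma)
    m = n - w + 1  # number of window positions
    cov = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for k in range(m):
            cov[i][k] = sum(
                sigma[a][b]
                for a in range(i, i + w)
                for b in range(k, k + w)
            )
    return cov


def common_corr_matrix(n, rho):
    """Assumed structure: unit variances, the same correlation rho everywhere."""
    return [[1.0 if a == b else rho for b in range(n)] for a in range(n)]


def closed_form_common(w, d, rho):
    """Cov(S_i, S_{i+d}) under the assumed common-correlation structure.

    The two windows share max(w - d, 0) indices, each contributing variance 1;
    the remaining w*w - shared index pairs each contribute rho.
    """
    shared = max(w - d, 0)
    return shared + (w * w - shared) * rho


if __name__ == "__main__":
    n, w, rho = 8, 3, 0.3
    cov = moving_sum_cov(common_corr_matrix(n, rho), w)
    for d in range(n - w + 1):
        # Brute-force value next to the closed form, for lags d = 0, 1, ...
        print(d, cov[0][d], closed_form_common(w, d, rho))
```

Under these assumptions the covariance between two moving sums grows with the window size and with `rho`, and it decreases as the windows overlap less, which is the kind of dependence the tail probability has to account for.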

Author information

Corresponding author

Correspondence to Anat Reiner-Benaim.

Electronic supplementary material

Supplementary material is available with the online version of this article (PDF, 307 KB).

Cite this article

Reiner-Benaim, A. Scan Statistic Tail Probability Assessment Based on Process Covariance and Window Size. Methodol Comput Appl Probab 18, 717–745 (2016). https://doi.org/10.1007/s11009-015-9447-6

  • DOI: https://doi.org/10.1007/s11009-015-9447-6
