Scan Statistic Tail Probability Assessment Based on Process Covariance and Window Size

Reiner-Benaim, Anat

doi:10.1007/s11009-015-9447-6

Published: 02 July 2015

Methodology and Computing in Applied Probability, volume 18, pages 717–745 (2016)

Anat Reiner-Benaim¹

Abstract

A scan statistic is examined for testing the existence of a global peak in a random process with dependent variables of any distribution. The tail probability of the scan statistic is obtained from the covariance of the moving sums process, thereby accounting for the spatial nature of the data as well as the size of the search window. Exact formulas linking this covariance to the window size and the correlation coefficient are developed under general, common, and autocovariance structures of the variables in the original process. The implementation and applicability of the formulas are demonstrated on multiple processes of t-statistics, including the case of unknown covariance. A sensitivity analysis provides further insight into how the tail probability varies with the parameters that influence it. R code for the tail probability computation and the data analysis is provided in the supplementary material.
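
To make the quantities named in the abstract concrete, the following is a minimal sketch and not the article's method or its R code. It assumes a zero-mean, unit-variance Gaussian process with an AR(1)-type autocovariance, Cov(X_j, X_k) = rho^|j-k|, and computes the covariance of two moving sums by direct double summation rather than by the closed-form formulas the article develops. The function names, the Bonferroni bound, and the Monte Carlo check are all illustrative assumptions.

```python
import math
import random


def moving_sum_cov(w, rho, lag, sigma2=1.0):
    """Covariance of two width-w moving sums whose windows start `lag` apart.

    Assumes Cov(X_j, X_k) = sigma2 * rho**|j - k| (an assumed structure,
    not necessarily the one used in the article)."""
    total = 0.0
    for a in range(w):          # offset inside the first window
        for b in range(w):      # offset inside the second window
            total += rho ** abs(a - b - lag)
    return sigma2 * total


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bonferroni_tail_bound(n, w, rho, threshold):
    """Crude upper bound on P(max moving sum > threshold), Gaussian case."""
    n_windows = n - w + 1
    sd = math.sqrt(moving_sum_cov(w, rho, 0))
    return min(1.0, n_windows * normal_upper_tail(threshold / sd))


def monte_carlo_tail(n, w, rho, threshold, reps=2000, seed=1):
    """Simulation estimate of P(max moving sum > threshold) for a
    stationary Gaussian AR(1) process with lag-1 correlation rho."""
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _rep in range(reps):
        x = [rng.gauss(0.0, 1.0)]
        for _t in range(n - 1):
            x.append(rho * x[-1] + innov_sd * rng.gauss(0.0, 1.0))
        s = sum(x[:w])          # first window
        best = s
        for i in range(w, n):   # slide the window one step at a time
            s += x[i] - x[i - w]
            best = max(best, s)
        if best > threshold:
            hits += 1
    return hits / reps
```

For example, with window size w = 2 and rho = 0.5, moving_sum_cov(2, 0.5, 0) gives the variance of a moving sum, and moving_sum_cov(2, 0.5, 1) gives the covariance between adjacent, overlapping windows. Both grow with the window size and with rho, which is the dependence the tail probability has to account for. The Bonferroni bound ignores the correlation between windows, so it is expected to be conservative compared with the simulation estimate.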

Author information

Authors and Affiliations

University of Haifa, Haifa, 3498838, Israel
Anat Reiner-Benaim

Corresponding author

Correspondence to Anat Reiner-Benaim.

Electronic supplementary material

The electronic supplementary material is provided as a PDF (307 KB).

About this article

Cite this article

Reiner-Benaim, A. Scan Statistic Tail Probability Assessment Based on Process Covariance and Window Size. Methodol Comput Appl Probab 18, 717–745 (2016). https://doi.org/10.1007/s11009-015-9447-6

Received: 16 November 2013
Revised: 10 January 2015
Accepted: 01 May 2015
Published: 02 July 2015
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11009-015-9447-6
