Abstract
We consider the problem of detecting sparse heterogeneous mixtures from a nonparametric perspective. Specifically, we assume that the null distribution is symmetric about zero, while the true effects have positive median. We then suggest two new tests for this purpose. The main one is a form of the Anderson–Darling test for symmetry and is closely related to the higher criticism. It is shown to achieve the detection boundary for the normal mixture model and, more generally, for asymptotically generalized Gaussian mixture models, in all sparsity regimes. The other test is a form of longest-run test and is specifically designed for the very sparse situation.
Notes
This literature typically assumes that \(F=G\), but this is not essential. Indeed, the definition of the HC test does not require knowledge of \(G\), and Cai and Wu (2014) showed that the HC can adapt to various choices of \(G\).
This was suggested to us by a reviewer.
The power under the mixture model (2) is a weighted average of the power at each \(m\), where the weights are given by the binomial distribution with parameters \((n, \varepsilon )\).
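As a worked form of this weighted average, under assumed notation: write \(\beta_m\) for the power of a given test when exactly \(m\) of the \(n\) observations are true effects, and \(\beta(\varepsilon)\) for its power under the mixture model (2). Both symbols are introduced here for illustration only and are not taken from the original text. The statement above would then read

\[
\beta(\varepsilon) \;=\; \sum_{m=0}^{n} \binom{n}{m}\, \varepsilon^{m} (1-\varepsilon)^{n-m}\, \beta_m .
\]

Since the binomial weights have mean \(n\varepsilon\) and standard deviation \(\sqrt{n\varepsilon(1-\varepsilon)}\), the sum is dominated by values of \(m\) close to \(n\varepsilon\).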
In our theoretical developments, we focused on the risk, which is standard in the literature. In our numerical experiments, we chose instead to fix the level and evaluate the power. We did so because this is typically what is done in practice. In fact, optimizing the risk necessitates knowledge of the alternative, which is rarely available.
References
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750
Anderson TW, Darling DA (1952) Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann Math Stat 23:193–212
Arias-Castro E, Wang M (2013) Distribution-free tests for sparse heterogeneous mixtures. Preprint arXiv:1308.0346
Arias-Castro E, Candès EJ, Plan Y (2011) Global testing under sparse alternatives: anova, multiple comparisons and the higher criticism. Ann Stat 39(5):2533–2556
Baklizi A (2007) Testing symmetry using a trimmed longest run statistic. Aust N Z J Stat 49(4):339–347
Cai TT, Wu Y (2014) Optimal detection of sparse mixtures against a given null distribution. IEEE Trans Inf Theory 60(4):2217–2232
Cai TT, Jeng XJ, Jin J (2011) Optimal detection of heterogeneous and heteroscedastic mixtures. J R Stat Soc Ser B Stat Methodol 73(5):629–662
Darling DA, Erdős P (1956) A limit theorem for the maximum of normalized sums of independent random variables. Duke Math J 23:143–155
Delaigle A, Hall P (2009) Higher criticism in the context of unknown distribution, non-independence and classification. In: Perspectives in mathematical sciences. I, vol 7. World Scientific Publishing, Hackensack, pp 109–138
Delaigle A, Hall P, Jin J (2011) Robustness and accuracy of methods for high dimensional data analysis based on Student’s \(t\) statistic. J R Stat Soc Ser B Stat Methodol 73(3):283–301
Donoho D, Jin J (2004) Higher criticism for detecting sparse heterogeneous mixtures. Ann Stat 32(3):962–994
Dudoit S, van der Laan MJ (2008) Multiple testing procedures with applications to genomics. Springer series in statistics. Springer, New York
Hall P, Jin J (2008) Properties of higher criticism under strong dependence. Ann Stat 36(1):381–402
Hall P, Jin J (2010) Innovated higher criticism for detecting sparse signals in correlated noise. Ann Stat 38(3):1686–1732
Hettmansperger TP (1984) Statistical inference based on ranks. Wiley, New York
Ingster Y, Tsybakov A, Verzelen N (2010) Detection boundary in sparse regression. Electron J Stat 4:1476–1526
Ingster YI (1997) Some problems of hypothesis testing leading to infinitely divisible distributions. Math Methods Stat 6(1):47–69
Ingster YI (2002a) Adaptive detection of a signal of growing dimension. I. Math Methods Stat 10:395–421
Ingster YI (2002b) Adaptive detection of a signal of growing dimension. II. Math Methods Stat 11:37–68
Jager L, Wellner J (2007) Goodness-of-fit tests via phi-divergences. Ann Stat 35(5):2018–2053
Jin J (2003) Detecting and estimating sparse mixtures. PhD Thesis, Stanford University
Lehmann EL, Romano JP (2005) Testing statistical hypotheses. Springer texts in statistics, 3rd edn. Springer, New York
Wang M (2014) On the detection of sparse mixtures. PhD Thesis, University of California, San Diego
Acknowledgments
We would like to thank Jason Schweinsberg for helpful discussions. This work was partially supported by a Grant from the US Office of Naval Research (N00014-09-1-0258) and a Grant from the US National Science Foundation (DMS 1223137).
Cite this article
Arias-Castro, E., Wang, M. Distribution-free tests for sparse heterogeneous mixtures. TEST 26, 71–94 (2017). https://doi.org/10.1007/s11749-016-0499-x
DOI: https://doi.org/10.1007/s11749-016-0499-x
Keywords
- Mixture detection
- Distribution-free tests
- Higher criticism
- Anderson–Darling test
- Smirnov test for symmetry