Distribution-free tests for sparse heterogeneous mixtures

  • Original Paper
  • Published in: TEST

Abstract

We consider the problem of detecting sparse heterogeneous mixtures from a nonparametric perspective. Specifically, we assume that the null distribution is symmetric about zero, while the true effects have positive median. We then suggest two new tests for this purpose. The main one is a form of Anderson–Darling test for symmetry and is closely related to the higher criticism. It is shown to achieve the detection boundary for the normal mixture model and, more generally, for asymptotically generalized Gaussian mixture models, in all sparsity regimes. The other test is a form of longest run test and specifically designed for the very sparse situation.
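The abstract does not state the test statistics, so the following Python sketch is only an illustration of the general idea, not the authors' definitions. It assumes the normal mixture model takes the common form (1 − ε)·N(0, 1) + ε·N(μ, 1) with μ > 0, and it uses two sign-based statistics chosen here for illustration: a normalized cumulative sum of signs and the longest run of positive signs, both computed after ordering the observations by decreasing absolute value. All function names are hypothetical. The calibration step relies on a standard fact: when the null distribution is continuous and symmetric about zero, the signs are i.i.d. fair coin flips independent of the absolute values, so the null distribution of any statistic of the ordered signs can be simulated without knowing the null distribution itself.

```python
import math
import random


def ordered_signs(xs):
    """Signs of the observations, ordered by decreasing absolute value.

    Exact zeros are mapped to -1; under a continuous null they occur
    with probability zero.
    """
    ranked = sorted(xs, key=abs, reverse=True)
    return [1 if x > 0 else -1 for x in ranked]


def cusum_sign_stat(signs):
    """Illustrative statistic: max over k of (sum of first k signs) / sqrt(k)."""
    best, total = -math.inf, 0
    for k, s in enumerate(signs, start=1):
        total += s
        best = max(best, total / math.sqrt(k))
    return best


def longest_positive_run(signs):
    """Illustrative statistic: length of the longest run of +1 in the sequence."""
    best = run = 0
    for s in signs:
        run = run + 1 if s > 0 else 0
        best = max(best, run)
    return best


def monte_carlo_pvalue(stat, observed, n, reps=2000, rng=None):
    """Monte Carlo p-value, simulating the null as n i.i.d. fair signs."""
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(reps):
        null_signs = [rng.choice((-1, 1)) for _ in range(n)]
        if stat(null_signs) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (reps + 1)


def sample_mixture(n, eps, mu, rng):
    """Assumed model: each point is N(mu, 1) with probability eps, else N(0, 1)."""
    return [rng.gauss(mu if rng.random() < eps else 0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    xs = sample_mixture(n=1000, eps=0.05, mu=3.0, rng=rng)
    signs = ordered_signs(xs)
    for stat in (cusum_sign_stat, longest_positive_run):
        print(stat.__name__, monte_carlo_pvalue(stat, stat(signs), len(xs)))
```

Ordering by absolute value puts the largest observations first, which is where a sparse set of positive effects would be expected to produce an excess of positive signs.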

Notes

  1. This literature typically assumes that \(F=G\), but this is not essential. Indeed, the definition of the higher criticism (HC) test does not require knowledge of G, and Cai and Wu (2014) showed that the HC test can adapt to various choices of G.

  2. This was suggested to us by a reviewer.

  3. The power under the mixture model (2) is a weighted average of the power at each m, where the weights are given by the binomial distribution with parameters \((n, \varepsilon )\).

  4. In our theoretical developments, we focused on the risk, which is standard in the literature. In our numerical experiments, we chose instead to fix the level and evaluate the power. We did so because this is typically what is done in practice. In fact, optimizing the risk necessitates knowledge of the alternative, which is rarely available.
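The weighted average described in Note 3 can be sketched in Python as follows. This is a minimal illustration under the assumption that m denotes the number of true effects among the n observations; the function names and the `power_at_m` callable are hypothetical and do not come from the paper. The binomial weights are computed in log space so that large n does not overflow.

```python
import math


def binomial_pmf(m, n, eps):
    """P(M = m) for M ~ Binomial(n, eps), computed via log-gamma."""
    if eps <= 0.0:
        return 1.0 if m == 0 else 0.0
    if eps >= 1.0:
        return 1.0 if m == n else 0.0
    log_p = (
        math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
        + m * math.log(eps)
        + (n - m) * math.log1p(-eps)
    )
    return math.exp(log_p)


def mixture_power(power_at_m, n, eps):
    """Average power_at_m(m) over m = 0..n with Binomial(n, eps) weights."""
    return sum(binomial_pmf(m, n, eps) * power_at_m(m) for m in range(n + 1))
```

As a sanity check, passing `power_at_m = lambda m: m` returns the binomial mean, which should be close to n·ε up to floating-point error.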

References

  • Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750

  • Anderson TW, Darling DA (1952) Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann Math Stat 23:193–212

  • Arias-Castro E, Wang M (2013) Distribution-free tests for sparse heterogeneous mixtures. Preprint arXiv:1308.0346

  • Arias-Castro E, Candès EJ, Plan Y (2011) Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann Stat 39(5):2533–2556

  • Baklizi A (2007) Testing symmetry using a trimmed longest run statistic. Aust N Z J Stat 49(4):339–347

  • Cai TT, Wu Y (2014) Optimal detection of sparse mixtures against a given null distribution. IEEE Trans Inf Theory 60(4):2217–2232

  • Cai TT, Jeng XJ, Jin J (2011) Optimal detection of heterogeneous and heteroscedastic mixtures. J R Stat Soc Ser B Stat Methodol 73(5):629–662

  • Darling DA, Erdös P (1956) A limit theorem for the maximum of normalized sums of independent random variables. Duke Math J 23:143–155

  • Delaigle A, Hall P (2009) Higher criticism in the context of unknown distribution, non-independence and classification. In: Perspectives in mathematical sciences. I, vol 7. World Scientific Publishing, Hackensack, pp 109–138

  • Delaigle A, Hall P, Jin J (2011) Robustness and accuracy of methods for high dimensional data analysis based on Student’s \(t\) statistic. J R Stat Soc Ser B Stat Methodol 73(3):283–301

  • Donoho D, Jin J (2004) Higher criticism for detecting sparse heterogeneous mixtures. Ann Stat 32(3):962–994

  • Dudoit S, van der Laan MJ (2008) Multiple testing procedures with applications to genomics. Springer series in statistics. Springer, New York

  • Hall P, Jin J (2008) Properties of higher criticism under strong dependence. Ann Stat 36(1):381–402

  • Hall P, Jin J (2010) Innovated higher criticism for detecting sparse signals in correlated noise. Ann Stat 38(3):1686–1732

  • Hettmansperger TP (1984) Statistical inference based on ranks. Wiley, New York

  • Ingster Y, Tsybakov A, Verzelen N (2010) Detection boundary in sparse regression. Electron J Stat 4:1476–1526

  • Ingster YI (1997) Some problems of hypothesis testing leading to infinitely divisible distributions. Math Methods Stat 6(1):47–69

  • Ingster YI (2002a) Adaptive detection of a signal of growing dimension. I. Math Methods Stat 10:395–421

  • Ingster YI (2002b) Adaptive detection of a signal of growing dimension. II. Math Methods Stat 11:37–68

  • Jager L, Wellner J (2007) Goodness-of-fit tests via phi-divergences. Ann Stat 35(5):2018–2053

  • Jin J (2003) Detecting and estimating sparse mixtures. PhD Thesis, Stanford University

  • Lehmann EL, Romano JP (2005) Testing statistical hypotheses. Springer texts in statistics, 3rd edn. Springer, New York

  • Wang M (2014) On the detection of sparse mixtures. PhD Thesis, University of California, San Diego

Acknowledgments

We would like to thank Jason Schweinsberg for helpful discussions. This work was partially supported by a grant from the US Office of Naval Research (N00014-09-1-0258) and a grant from the US National Science Foundation (DMS 1223137).

Author information

Corresponding author

Correspondence to Ery Arias-Castro.

About this article

Cite this article

Arias-Castro, E., Wang, M. Distribution-free tests for sparse heterogeneous mixtures. TEST 26, 71–94 (2017). https://doi.org/10.1007/s11749-016-0499-x

  • DOI: https://doi.org/10.1007/s11749-016-0499-x
