WIKS: a general Bayesian nonparametric index for quantifying differences between two populations

de Carvalho Ceregatti, Rafael; Izbicki, Rafael; Bueno Salasar, Luis Ernesto

doi:10.1007/s11749-020-00718-y

Original Paper
Published: 29 May 2020

TEST, Volume 30, pages 274–291 (2021)

Rafael de Carvalho Ceregatti¹, Rafael Izbicki¹ & Luis Ernesto Bueno Salasar¹ (ORCID: orcid.org/0000-0003-4715-8633)

Abstract

A key problem in many research investigations is to decide whether two samples have the same distribution. Numerous statistical methods have been devoted to this issue, but only a few have considered a Bayesian nonparametric approach. In this paper, we propose a novel nonparametric Bayesian index (WIKS) for quantifying the difference between two populations \(P_1\) and \(P_2\), which is defined by a weighted posterior expectation of the Kolmogorov–Smirnov distance between \(P_1\) and \(P_2\). We present a Bayesian decision-theoretic argument to support the use of the WIKS index and a simple algorithm to compute it. Furthermore, we prove that WIKS is a statistically consistent procedure and that it controls the significance level uniformly over the null hypothesis, a feature that simplifies the choice of cutoff values for making decisions. We present a real data analysis and an extensive simulation study showing that WIKS is more powerful than competing approaches under several settings.
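As a rough illustration of the quantity the abstract describes, the sketch below computes the Kolmogorov–Smirnov distance between two discrete distribution functions and then averages that distance over draws from two independent Dirichlet process posteriors in the limiting case of zero prior concentration (the Bayesian bootstrap). This is an unweighted posterior expectation under assumed priors, not the WIKS index itself: the weighting scheme and the prior used in the paper are not given on this page, and the function names `ks_distance` and `posterior_mean_ks` are hypothetical.

```python
import numpy as np


def ks_distance(x, y, wx=None, wy=None):
    """Sup-norm distance between two (optionally weighted) discrete CDFs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Default to equal weights, i.e. the ordinary empirical CDFs.
    wx = np.full(x.size, 1.0 / x.size) if wx is None else np.asarray(wx, dtype=float)
    wy = np.full(y.size, 1.0 / y.size) if wy is None else np.asarray(wy, dtype=float)
    # Both CDFs are step functions that jump only at pooled sample points,
    # so the supremum is attained on this grid.
    grid = np.sort(np.concatenate([x, y]))
    fx = (wx[None, :] * (x[None, :] <= grid[:, None])).sum(axis=1)
    fy = (wy[None, :] * (y[None, :] <= grid[:, None])).sum(axis=1)
    return float(np.max(np.abs(fx - fy)))


def posterior_mean_ks(x, y, n_draws=2000, seed=0):
    """Monte Carlo estimate of the posterior mean KS distance.

    Assumption (not from the paper): each population has a Dirichlet
    process prior whose concentration tends to zero, so a posterior draw
    places Dirichlet(1, ..., 1) weights on the observed points.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        wx = rng.dirichlet(np.ones(x.size))
        wy = rng.dirichlet(np.ones(y.size))
        draws[b] = ks_distance(x, y, wx, wy)
    return float(draws.mean())
```

Calling `posterior_mean_ks(sample_1, sample_2)` returns a value in [0, 1]; larger values indicate that the two posterior distributions tend to be further apart in the Kolmogorov–Smirnov sense.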


Notes

Common choices for this metric are the Kolmogorov–Smirnov metric, the \(L_2\) metric, the Lévy metric, the \(L_1\) metric and the symmetrized Kullback–Leibler metric. For a survey of metrics between probability measures, see Rachev et al. (2013).
This approach was suggested by, e.g., Swartz (1999) in a Bayesian nonparametric goodness-of-fit context.
Proposition 3 of Supplementary Material.
Proposition 1 of Supplementary Material.
In general, as K (the concentration parameter) decreases, the role of G will be less important; in fact, as K gets closer to zero, the test statistic gets closer to the Kolmogorov–Smirnov test statistic.
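The limiting behaviour described in the last note can be illustrated with the standard posterior mean of a Dirichlet process, assuming that K is the concentration parameter and G the base distribution of such a prior (an assumption; this page does not define them). Under that assumption the posterior mean distribution function is the weighted average \((K\,G(t) + n\,F_n(t))/(K+n)\), where \(F_n\) is the empirical distribution function of the n observations, so the weight \(K/(K+n)\) given to G vanishes as K decreases. The function name `dp_posterior_mean_cdf` and the standard normal base distribution are illustrative choices only.

```python
import math

import numpy as np


def normal_cdf(t):
    """Standard normal CDF, used here as an illustrative base distribution G."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def dp_posterior_mean_cdf(t, data, K, base_cdf=normal_cdf):
    """Posterior mean CDF at t under a Dirichlet process prior DP(K, G)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    ecdf = float(np.mean(data <= t))  # empirical CDF F_n(t)
    # Weighted average of the base CDF (weight K) and the empirical CDF (weight n).
    return (K * base_cdf(t) + n * ecdf) / (K + n)
```

With a very small K the returned value is essentially the empirical distribution function at t, while a very large K returns a value close to G(t).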

References

Al Labadi L, Zarepour M (2014) Goodness-of-fit tests based on the distance between the Dirichlet process and its base measure. J Nonparametric Stat 26(2):341–357
Basu S, Chib S (2003) Marginal likelihood and Bayes factors for Dirichlet process mixture models. J Am Stat Assoc 98(461):224–235
Berger JO, Guglielmi A (2001) Bayesian and conditional frequentist testing of a parametric model versus nonparametric alternatives. J Am Stat Assoc 96(453):174–184
Cecato JF, Martinelli JE, Izbicki R, Yassuda MS, Aprahamian I (2016) A subtest analysis of the Montreal Cognitive Assessment (MoCA): which subtests can best discriminate between healthy controls, mild cognitive impairment and Alzheimer’s disease? Int Psychogeriatrics 28(5):825–832
Chen Y, Hanson TE (2014) Bayesian nonparametric k-sample tests for censored and uncensored data. Comput Stat Data Anal 71:335–346
Cuevas A, Febrero M, Fraiman R (2004) An ANOVA test for functional data. Comput Stat Data Anal 47(1):111–122
DeGroot MH (1970) Optimal statistical decisions. McGraw-Hill, New York
Duong T, Goud B, Schauer K (2012) Closed-form density-based framework for automatic detection of cellular morphology changes. Proc Natl Acad Sci 109(22):8382–8387
Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1(4):209–230
Ferguson TS (1974) Prior distributions on spaces of probability measures. Ann Stat 2(4):615–629
Florens JP, Richard JF, Rolin JM (1996) Bayesian encompassing specification tests of a parametric model against a non parametric alternative. Working Papers 96.08, Catholique de Louvain - Institut de statistique
Good IJ (1992) The Bayes/non-Bayes compromise: a brief review. J Am Stat Assoc 87:597–606
Gretton A, Borgwardt KM, Rasch MJ, Schölkopf B, Smola A (2012) A kernel two-sample test. J Mach Learn Res 13(Mar):723–773
Hjort NL et al (1990) Nonparametric Bayes estimators based on beta processes in models for life history data. Ann Stat 18(3):1259–1294
Holmes CC, Caron F, Griffin JE, Stephens DA (2015) Two-sample Bayesian nonparametric hypothesis testing. Bayesian Anal 10(2):297–320
Jeffreys H (1961) The theory of probability. Oxford University Press, Oxford
Kolmogorov AN (1933) Sulla determinazione empirica di una legge di distribuzione. Giorn Ist Ital Attuar 4:83–91
Komárek A (2014) mixAk: Multivariate normal mixture models and mixtures of generalized linear mixed models including model based clustering. R package version 3
Lavine M et al (1992) Some aspects of Pólya tree distributions for statistical modelling. Ann Stat 20(3):1222–1235
Mann HB, Whitney DR (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 18:50–60
Pfister N, Bühlmann P, Schölkopf B, Peters J (2018) Kernel-based tests for joint independence. J R Stat Soc Ser B (Stat Methodol) 80(1):5–31
R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Rachev ST, Klebanov L, Stoyanov SV, Fabozzi F (2013) The methods of distances in the theory of probability and statistics. Springer, New York
Ramdas A, Trillos NG, Cuturi M (2017) On Wasserstein two-sample testing and related families of nonparametric tests. Entropy 19(2):47
Sethuraman J (1994) A constructive definition of Dirichlet priors. Stat Sin 4(2):639–650
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions. Ann Math Stat 19(2):279–281
Srivastava R, Li P, Ruppert D (2016) RAPTT: an exact two-sample test in high dimensions using random projections. J Comput Graph Stat 25(3):954–970
Swartz T (1999) Nonparametric goodness-of-fit. Commun Stat Theory Methods 28(12):2821–2841
Székely GJ, Rizzo ML et al (2004) Testing for equal distributions in high dimension. InterStat 5(16.10):1249–1272
Wei S, Lee C, Wichers L, Marron JS (2016) Direction-projection-permutation for high-dimensional hypothesis tests. J Comput Graph Stat 25(2):549–569
Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics Bull 1(6):80–83

Acknowledgements

The authors are grateful for the suggestions given by Danilo Lourenço Lopes, José Galvão Leite, the anonymous referees and the editors.

Author information

Authors and Affiliations

Federal University of São Carlos, Rod. Washington Luís km 235, SP-310, São Carlos, SP, Brazil
Rafael de Carvalho Ceregatti, Rafael Izbicki & Luis Ernesto Bueno Salasar

Corresponding author

Correspondence to Luis Ernesto Bueno Salasar.

Additional information

This work was partially supported by FAPESP – Fundação de Amparo à Pesquisa do Estado de São Paulo, Grants 2017/03363-8 and 2019/11321-9 and CNPq – Conselho Nacional de Desenvolvimento Científico e Tecnológico, Grant PQ 306943/2017-4.

Electronic supplementary material

Supplementary material 1 (pdf 3236 KB)

About this article

Cite this article

de Carvalho Ceregatti, R., Izbicki, R. & Bueno Salasar, L.E. WIKS: a general Bayesian nonparametric index for quantifying differences between two populations. TEST 30, 274–291 (2021). https://doi.org/10.1007/s11749-020-00718-y

Received: 31 May 2019
Accepted: 25 April 2020
Published: 29 May 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11749-020-00718-y
