Abstract
In this paper, a Bayesian nonparametric approach to the two-sample problem is proposed. Given two samples \(X = X_1, \ldots, X_{m_1} \overset{\text{i.i.d.}}{\sim} F\) and \(Y = Y_1, \ldots, Y_{m_2} \overset{\text{i.i.d.}}{\sim} G\), with \(F\) and \(G\) being unknown continuous cumulative distribution functions, we wish to test the null hypothesis \(H_0: F = G\). The method is based on computing the Kolmogorov distance between two posterior Dirichlet processes and comparing the result with a reference distance. The parameters of the Dirichlet processes are selected so that any discrepancy between the posterior distance and the reference distance is related to the difference between the two samples. Relevant theoretical properties of the procedure are also developed. Through simulated examples, the approach is compared to the frequentist Kolmogorov–Smirnov test and to a Bayesian nonparametric test, and it demonstrates excellent performance.
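As a rough illustration of the first ingredient described above, the following Python sketch draws approximate samples from two posterior Dirichlet processes and computes the Kolmogorov distance between them. It is a minimal sketch under stated assumptions, not the procedure of the paper: the standard normal base measure, the concentration parameter `a`, the number of atoms `n_atoms`, and the finite-dimensional Dirichlet approximation used to simulate the process are all choices made here for illustration. The abstract does not specify the reference distance, so that step is left out.

```python
import numpy as np


def posterior_dp_draw(sample, a, base_sampler, rng, n_atoms=200):
    """Approximate draw from the posterior DP(a + m, H*), where
    H* = (a * H + sum of point masses at the data) / (a + m).

    Uses a finite-dimensional Dirichlet approximation (an assumption of
    this sketch): n_atoms i.i.d. atoms from H* with symmetric Dirichlet
    weights of total mass a + m. Returns (atoms, weights).
    """
    sample = np.asarray(sample, dtype=float)
    m = len(sample)
    # Each atom comes from the base measure H with probability a / (a + m),
    # otherwise it is resampled from the observed data.
    from_base = rng.random(n_atoms) < a / (a + m)
    atoms = np.where(from_base,
                     base_sampler(n_atoms, rng),
                     rng.choice(sample, size=n_atoms))
    weights = rng.dirichlet(np.full(n_atoms, (a + m) / n_atoms))
    return atoms, weights


def kolmogorov_distance(atoms1, w1, atoms2, w2):
    """Sup-norm distance between the CDFs of two discrete distributions."""
    grid = np.sort(np.concatenate([atoms1, atoms2]))

    def cdf_on_grid(atoms, w):
        order = np.argsort(atoms)
        cum = np.concatenate([[0.0], np.cumsum(w[order])])
        # Number of atoms <= each grid point gives the index into cum.
        idx = np.searchsorted(atoms[order], grid, side="right")
        return cum[idx]

    return float(np.max(np.abs(cdf_on_grid(atoms1, w1) - cdf_on_grid(atoms2, w2))))


def posterior_distances(x, y, a=1.0, n_draws=200, seed=0):
    """Monte Carlo sample of Kolmogorov distances between posterior draws."""
    rng = np.random.default_rng(seed)
    # Hypothetical base measure H = N(0, 1); not taken from the paper.
    base = lambda n, r: r.standard_normal(n)
    out = np.empty(n_draws)
    for k in range(n_draws):
        ax, wx = posterior_dp_draw(x, a, base, rng)
        ay, wy = posterior_dp_draw(y, a, base, rng)
        out[k] = kolmogorov_distance(ax, wx, ay, wy)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal(100)
    y_same = rng.standard_normal(100)         # same distribution as x
    y_shift = rng.standard_normal(100) + 2.0  # shifted alternative
    print("mean distance, F = G :", posterior_distances(x, y_same).mean())
    print("mean distance, F != G:", posterior_distances(x, y_shift).mean())
```

In this sketch the posterior distances tend to be larger when the two samples come from different distributions. A full test would still need the reference distance and the parameter choices described in the paper.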
Al-Labadi, L., Zarepour, M. Two-sample Kolmogorov-Smirnov test using a Bayesian nonparametric approach. Math. Meth. Stat. 26, 212–225 (2017). https://doi.org/10.3103/S1066530717030048