Two-sample Kolmogorov-Smirnov test using a Bayesian nonparametric approach

Mathematical Methods of Statistics

Abstract

In this paper, a Bayesian nonparametric approach to the two-sample problem is proposed. Given two samples \(X = X_1, \ldots, X_{m_1} \overset{\text{i.i.d.}}{\sim} F\) and \(Y = Y_1, \ldots, Y_{m_2} \overset{\text{i.i.d.}}{\sim} G\), with \(F\) and \(G\) being unknown continuous cumulative distribution functions, we wish to test the null hypothesis \(H_0: F = G\). The method is based on computing the Kolmogorov distance between two posterior Dirichlet processes and comparing the result with a reference distance. The parameters of the Dirichlet processes are selected so that any discrepancy between the posterior distance and the reference distance is related to the difference between the two samples. Relevant theoretical properties of the procedure are also developed. Through simulated examples, the approach is compared with the frequentist Kolmogorov–Smirnov test and with a Bayesian nonparametric test, and it demonstrates excellent performance in these comparisons.
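The distance computation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the finite Dirichlet-weight approximation to a posterior Dirichlet process draw, the concentration parameter `a`, the standard normal base measure, and the function names `posterior_dp_draw`, `kolmogorov_distance`, and `posterior_distances` are all assumptions made for illustration. The reference distance is left out because the abstract does not specify how it is constructed.

```python
import numpy as np


def kolmogorov_distance(atoms_f, weights_f, atoms_g, weights_g):
    """Sup-norm distance between two discrete CDFs given as atoms and weights."""
    atoms_f = np.asarray(atoms_f, dtype=float)
    atoms_g = np.asarray(atoms_g, dtype=float)
    weights_f = np.asarray(weights_f, dtype=float)
    weights_g = np.asarray(weights_g, dtype=float)
    # Both CDFs are step functions, so the supremum is attained at an atom.
    grid = np.union1d(atoms_f, atoms_g)
    order_f, order_g = np.argsort(atoms_f), np.argsort(atoms_g)
    cum_f = np.concatenate(([0.0], np.cumsum(weights_f[order_f])))
    cum_g = np.concatenate(([0.0], np.cumsum(weights_g[order_g])))
    # searchsorted(..., side="right") counts the atoms <= each grid point.
    cdf_f = cum_f[np.searchsorted(atoms_f[order_f], grid, side="right")]
    cdf_g = cum_g[np.searchsorted(atoms_g[order_g], grid, side="right")]
    return float(np.max(np.abs(cdf_f - cdf_g)))


def posterior_dp_draw(sample, a, base_sampler, n_atoms, rng):
    """Approximate draw from DP(a + m, (a*H + sum of point masses at data)/(a + m)).

    Assumption: a finite approximation with symmetric Dirichlet weights,
    chosen for brevity; the paper may use a different simulation scheme.
    """
    sample = np.asarray(sample, dtype=float)
    m = sample.size
    # Each atom comes from the base measure H with probability a/(a+m),
    # otherwise it is resampled from the observed data.
    from_base = rng.random(n_atoms) < a / (a + m)
    atoms = rng.choice(sample, size=n_atoms, replace=True)
    atoms[from_base] = base_sampler(rng, int(from_base.sum()))
    weights = rng.dirichlet(np.full(n_atoms, (a + m) / n_atoms))
    return atoms, weights


def posterior_distances(x, y, a, base_sampler, n_atoms, n_draws, rng):
    """Monte Carlo sample of the Kolmogorov distance between the two posteriors."""
    out = np.empty(n_draws)
    for i in range(n_draws):
        fx = posterior_dp_draw(x, a, base_sampler, n_atoms, rng)
        gy = posterior_dp_draw(y, a, base_sampler, n_atoms, rng)
        out[i] = kolmogorov_distance(*fx, *gy)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=50)   # hypothetical sample from F
    y = rng.normal(1.0, 1.0, size=40)   # hypothetical sample from G
    std_normal = lambda r, k: r.normal(0.0, 1.0, size=k)  # assumed base measure H
    d = posterior_distances(x, y, 1.0, std_normal, 200, 500, rng)
    print("mean posterior Kolmogorov distance:", d.mean())
```

In a full procedure, the simulated posterior distances would then be compared with draws of the reference distance; under this sketch that comparison step is deliberately left open.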

Author information

Corresponding author

Correspondence to L. Al-Labadi.

About this article

Cite this article

Al-Labadi, L., Zarepour, M. Two-sample Kolmogorov-Smirnov test using a Bayesian nonparametric approach. Math. Meth. Stat. 26, 212–225 (2017). https://doi.org/10.3103/S1066530717030048

  • DOI: https://doi.org/10.3103/S1066530717030048
