Two-sample homogeneity tests based on divergence measures

Wornowizki, Max; Fried, Roland

doi:10.1007/s00180-015-0633-3

Original Paper
Published: 09 January 2016

Volume 31, pages 291–313, (2016)
Computational Statistics

Max Wornowizki¹ &
Roland Fried¹

Abstract

The concept of f-divergences introduced by Ali and Silvey (J R Stat Soc (B) 28:131–142, 1966) provides a rich set of distance-like measures between pairs of distributions. Divergences do not focus on particular moments of random variables, but rather consider discrepancies between the corresponding probability density functions. Thus, two-sample tests based on these measures can detect arbitrary alternatives when testing the equality of the distributions. We treat the problem of divergence estimation as well as the subsequent testing for the homogeneity of two samples. In particular, we propose a nonparametric estimator for f-divergences in the case of continuous distributions, which is based on kernel density estimation and spline smoothing. As we show in extensive simulations, the new method is stable and performs quite well in comparison to several existing non- and semiparametric divergence estimators. Furthermore, we tackle the two-sample homogeneity problem using permutation tests based on various divergence estimators. In simulations, the methods are compared to an asymptotic divergence test as well as to several traditional parametric and nonparametric procedures under different distributional assumptions and alternatives. It turns out that divergence-based methods detect discrepancies between distributions more often than traditional methods if the distributions do not differ in location only. The findings are illustrated on ion mobility spectrometry data.
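
To make the idea of a divergence-based permutation test concrete, the following is a minimal sketch, not the estimator proposed in the article. It assumes the standard definition of an f-divergence, D_f(P, Q) = ∫ q(x) f(p(x)/q(x)) dx for a convex f with f(1) = 0, and uses the squared Hellinger distance 1 − ∫ sqrt(p(x) q(x)) dx as one member of that family. The densities are replaced by plain Gaussian kernel density estimates with a rule-of-thumb bandwidth, and the spline smoothing step mentioned in the abstract is omitted. The function names (`silverman_bandwidth`, `kde`, `hellinger_sq`, `permutation_test`) and all default parameters are hypothetical choices made for this illustration.

```python
import math
import random


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth (an assumption; other selectors could be used)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def kde(sample, h):
    """Return a Gaussian kernel density estimate with bandwidth h as a callable."""
    c = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return lambda t: c * sum(math.exp(-0.5 * ((t - v) / h) ** 2) for v in sample)


def hellinger_sq(x, y, grid_size=200):
    """Plug-in estimate of the squared Hellinger distance 1 - integral of sqrt(p*q)."""
    hx, hy = silverman_bandwidth(x), silverman_bandwidth(y)
    p, q = kde(x, hx), kde(y, hy)
    # Integrate over the pooled data range, padded by four bandwidths.
    pad = 4.0 * max(hx, hy)
    lo = min(min(x), min(y)) - pad
    hi = max(max(x), max(y)) + pad
    step = (hi - lo) / grid_size
    # Midpoint rule for the Bhattacharyya coefficient.
    bc = step * sum(
        math.sqrt(p(lo + (i + 0.5) * step) * q(lo + (i + 0.5) * step))
        for i in range(grid_size)
    )
    return 1.0 - bc


def permutation_test(x, y, statistic, n_perm=199, seed=0):
    """Permutation p-value for H0: both samples come from the same distribution."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of the group labels
        if statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Same location, different scale: a difference that is not a pure location shift.
    x = [rng.gauss(0.0, 1.0) for _ in range(30)]
    y = [rng.gauss(0.0, 3.0) for _ in range(30)]
    print("estimated squared Hellinger distance:", hellinger_sq(x, y))
    print("permutation p-value:", permutation_test(x, y, hellinger_sq, n_perm=99))
```

The permutation step only relies on exchangeability of the pooled observations under the null hypothesis, so any other divergence estimator with the same `(x, y)` signature could be passed in as `statistic`. The sketch assumes continuous data without ties; a sample of identical values would give a zero bandwidth.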

References

Ali SM, Silvey SD (1966) A general class of coefficients of divergence of one distribution from another. J R Stat Soc (B) 28:131–142
Alin A, Kurt S (2008) Ordinary and penalized minimum power-divergence estimators in two-way contingency tables. Comput Stat 23:455–468
Azzalini A (1985) A class of distributions which includes the normal ones. Scand J Stat 12:171–178
Basu A, Lindsay BG (1994) Minimum disparity estimation for continuous models: efficiency, distributions and robustness. Ann Inst Stat Math 46(4):683–705
Basu A, Harris IR, Hjort NL, Jones MC (1998) Robust and efficient estimation by minimising a density power divergence. Biometrika 85:549–559
Beran R (1977) Minimum Hellinger distance estimates for parametric models. Ann Stat 5:445–463
Bischl B, Lang M, Mersmann O (2013) BatchExperiments: statistical experiments on batch computing clusters. R package version 1.0-968, http://CRAN.R-project.org/package=BatchExperiments/
Cardot H, Prchal L, Sarda P (2007) No effect and lack-of-fit permutation tests for functional regression. Comput Stat 22:371–390
D’Addario M, Kopczynski D, Baumbach JI, Rahmann S (2014) A modular computational framework for automated peak extraction from ion mobility spectra. BMC Bioinform 15:25–36
Fisher RA (1935) The design of experiments. Oliver and Boyd, Edinburgh
Green PJ, Silverman BW (1994) Nonparametric regression and generalized linear models: a roughness penalty approach. CRC Monogr Stat Appl Probab (Book 58), Chapman and Hall, New York
Govindarajulu Z (2007) Nonparametric inference. World Scientific Pub Co, Singapore
Kim JS, Scott CD (2012) Robust kernel density estimation. J Mach Learn Res 13(1):2529–2565
Kanamori T, Suzuki T, Sugiyama M (2012) f-divergence estimation and two-sample homogeneity test under semiparametric density-ratio models. IEEE Trans Inf Theory 58:708–720
Kopczynski D, Baumbach JI, Rahmann S (2012) Peak modeling for ion mobility spectrometry measurements. In: Proceedings of the 20th European signal processing conference (EUSIPCO 2012), pp 1801–1805
Lee ET, Desu MM, Gehan EA (1975) A Monte Carlo study of the power of some two-sample tests. Biometrika 62:425–432
Lee S, Na O (2005) Test for parameter change based on the estimator minimizing density-based divergence measures. Ann Inst Stat Math 57:553–573
Liese F, Miescke KJ (2008) Statistical decision theory: estimation, testing, and selection. Springer Series in Statistics, Berlin
Lindsay BG (1994) Efficiency versus robustness: the case for minimum Hellinger distance and related methods. Ann Stat 22:1081–1114
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7:308–313
Qin J (1998) Inferences for case control and semiparametric two-sample density ratio models. Biometrika 85:619–630
R Development Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org
Seghouane AK, Amari SI (2007) The AIC criterion and symmetrizing the Kullback–Leibler divergence. IEEE Trans Neural Netw 18:97–104
Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc (B) 53:683–690
Sohn S, Jung BC, Jhun M (2012) Permutation tests using least distance estimator in the multivariate regression model. Comput Stat 27:191–201
Sugiyama M, Kanamori T, Suzuki T, Hido S, Sese J, Takeuchi I, Wei L (2009) A density-ratio framework for statistical data processing. IPSJ Trans Comput Vis Appl 1:183–208
Turlach BA (1993) Bandwidth selection in kernel density estimation: a review. Université catholique de Louvain
Zeileis A, Hothorn T (2013) A toolbox of permutation tests for structural change. Stat Pap 54:931–954
Zhu Y, Wu J, Lu X (2013) Minimum Hellinger distance estimation for a two-sample semiparametric cure rate model with censored survival data. Comput Stat 28:2495–2518

Acknowledgments

We thank the anonymous referees for their valuable remarks, which helped us to improve this work. The authors were supported in part by the Collaborative Research Center 876, Project C3 "Multi-level statistical analysis of high-frequency spatio-temporal process data", and the Collaborative Research Center 823, Project C3 "Analysis of structural change in dynamic processes", of the German Research Foundation. Furthermore, we thank Marianna D’Addario and Dominik Kopczynski, both members of the Bioinformatics group of Prof. Dr. Sven Rahmann in the Collaborative Research Center 876, Project B1, for providing interesting real-world data for our analysis.

Author information

Authors and Affiliations

Department of Statistics, Technische Universität Dortmund, 44221, Dortmund, Germany
Max Wornowizki & Roland Fried

Corresponding author

Correspondence to Max Wornowizki.

About this article

Cite this article

Wornowizki, M., Fried, R. Two-sample homogeneity tests based on divergence measures. Comput Stat 31, 291–313 (2016). https://doi.org/10.1007/s00180-015-0633-3

Received: 30 June 2014
Accepted: 07 December 2015
Published: 09 January 2016
Issue Date: March 2016
DOI: https://doi.org/10.1007/s00180-015-0633-3
