Abstract
Quantifying the difference between two distributions is a common problem in many machine learning and data mining tasks. What is also common in many tasks is that we only have empirical data: we do not know the true distributions nor their form, and hence, before we can measure their divergence, we first need to assume a distribution or perform density estimation. For exploratory purposes this is unsatisfactory, as we want to explore the data, not our expectations. In this paper we study how to measure the divergence between two distributions non-parametrically. More specifically, we formalise the well-known Jensen-Shannon divergence using cumulative distribution functions. This allows us to compute divergences directly and efficiently from data, without the need for estimation. Moreover, empirical evaluation shows that our method performs very well at detecting differences between distributions, outperforming the state of the art in both statistical power and efficiency across a wide range of tasks.
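The core idea — plugging empirical cumulative distribution functions into a Jensen-Shannon-style form so that no density estimation is needed — can be sketched as follows. This toy implementation is our own illustration, not the authors' exact estimator: the function names, the use of the pooled sample as the evaluation grid, and the trapezoidal integration are all assumptions made for the sketch.

```python
import numpy as np

def empirical_cdf(sample, grid):
    """Fraction of sample points <= each grid value (the empirical CDF)."""
    sample = np.sort(sample)
    return np.searchsorted(sample, grid, side="right") / len(sample)

def cdf_js_divergence(x, y):
    """Illustrative Jensen-Shannon-style divergence on empirical CDFs.

    Evaluates both empirical CDFs on the pooled sample, plugs them into
    the JS form with mixture M = (P + Q) / 2, and integrates numerically.
    Non-negative, and zero iff the two empirical CDFs coincide.
    """
    grid = np.sort(np.concatenate([x, y]))
    p = empirical_cdf(x, grid)
    q = empirical_cdf(y, grid)
    m = 0.5 * (p + q)
    # Convention 0 * log 0 = 0; suppress the harmless warnings from log2(0).
    with np.errstate(divide="ignore", invalid="ignore"):
        term_p = np.where(p > 0, p * np.log2(p / m), 0.0)
        term_q = np.where(q > 0, q * np.log2(q / m), 0.0)
    integrand = 0.5 * (term_p + term_q)
    # Trapezoidal rule over the (sorted, possibly duplicated) pooled grid.
    dx = np.diff(grid)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dx))

rng = np.random.default_rng(0)
same = cdf_js_divergence(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
diff = cdf_js_divergence(rng.normal(0, 1, 500), rng.normal(3, 1, 500))
print(same, diff)
```

Because the empirical CDF is a simple step function over the sorted sample, the whole computation runs directly on the data — no bandwidths, bin widths, or distributional assumptions — and two samples from the same distribution yield a divergence near zero, while a mean-shifted pair yields a clearly larger value.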
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, H.V., Vreeken, J. (2015). Non-parametric Jensen-Shannon Divergence. In: Appice, A., Rodrigues, P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9285. Springer, Cham. https://doi.org/10.1007/978-3-319-23525-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23524-0
Online ISBN: 978-3-319-23525-7