Abstract
Quantifying the difference between two distributions is a common problem in many machine learning and data mining tasks. What is also common in many tasks is that we only have empirical data: we do not know the true distributions nor their form, and hence, before we can measure their divergence, we first need to assume a distribution or perform density estimation. For exploratory purposes this is unsatisfactory, as we want to explore the data, not our expectations. In this paper we study how to measure the divergence between two distributions non-parametrically. More specifically, we formalise the well-known Jensen-Shannon divergence using cumulative distribution functions. This allows us to compute divergences directly and efficiently from data, without the need for estimation. Moreover, empirical evaluation shows that our method performs very well at detecting differences between distributions, outperforming the state of the art in both statistical power and efficiency across a wide range of tasks.
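The core idea — plugging empirical cumulative distribution functions into a Jensen-Shannon-style form so that no density estimation is needed — can be sketched as follows. This toy implementation is our own illustration, not the authors' exact estimator: the function names, the use of the pooled sample as the evaluation grid, and the trapezoidal integration are all assumptions made for the sketch.

```python
import numpy as np

def empirical_cdf(sample, grid):
    """Fraction of sample points <= each grid value (the empirical CDF)."""
    sample = np.sort(sample)
    return np.searchsorted(sample, grid, side="right") / len(sample)

def cdf_js_divergence(x, y):
    """Illustrative Jensen-Shannon-style divergence on empirical CDFs.

    Evaluates both empirical CDFs on the pooled sample, plugs them into
    the JS form with mixture M = (P + Q) / 2, and integrates numerically.
    Non-negative, and zero iff the two empirical CDFs coincide.
    """
    grid = np.sort(np.concatenate([x, y]))
    p = empirical_cdf(x, grid)
    q = empirical_cdf(y, grid)
    m = 0.5 * (p + q)
    # Convention 0 * log 0 = 0; suppress the harmless warnings from log2(0).
    with np.errstate(divide="ignore", invalid="ignore"):
        term_p = np.where(p > 0, p * np.log2(p / m), 0.0)
        term_q = np.where(q > 0, q * np.log2(q / m), 0.0)
    integrand = 0.5 * (term_p + term_q)
    # Trapezoidal rule over the (sorted, possibly duplicated) pooled grid.
    dx = np.diff(grid)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dx))

rng = np.random.default_rng(0)
same = cdf_js_divergence(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
diff = cdf_js_divergence(rng.normal(0, 1, 500), rng.normal(3, 1, 500))
print(same, diff)
```

Because the empirical CDF is a simple step function over the sorted sample, the whole computation runs directly on the data — no bandwidths, bin widths, or distributional assumptions — and two samples from the same distribution yield a divergence near zero, while a mean-shifted pair yields a clearly larger value.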
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, H.V., Vreeken, J. (2015). Non-parametric Jensen-Shannon Divergence. In: Appice, A., Rodrigues, P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9285. Springer, Cham. https://doi.org/10.1007/978-3-319-23525-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23524-0
Online ISBN: 978-3-319-23525-7