Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2015: Machine Learning and Knowledge Discovery in Databases, pp. 173–189

Non-parametric Jensen-Shannon Divergence

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9285)


Quantifying the difference between two distributions is a common problem in many machine learning and data mining tasks. What is also common in many tasks is that we only have empirical data; that is, we know neither the true distributions nor their form, and hence, before we can measure their divergence, we first have to assume a distribution or estimate one. For exploratory purposes this is unsatisfactory, as we want to explore the data, not our expectations. In this paper we study how to measure the divergence between two distributions non-parametrically. More specifically, we formalise the well-known Jensen-Shannon divergence in terms of cumulative distribution functions, which allows us to compute divergences directly and efficiently from data, without the need for estimation. Moreover, empirical evaluation shows that our method detects differences between distributions very well, outperforming the state of the art in both statistical power and efficiency across a wide range of tasks.
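The core idea — replacing densities with empirical cumulative distribution functions so the divergence can be evaluated directly on samples — can be illustrated with a minimal sketch. The function names and the trapezoidal integration below are our own illustrative choices, not the paper's exact estimator: we evaluate both empirical CDFs on the pooled sample points and integrate the two KL-like terms of the Jensen-Shannon form against their mixture.

```python
import numpy as np

def empirical_cdf(sample, grid):
    """Empirical CDF of `sample` evaluated at the points in `grid`."""
    sample = np.sort(sample)
    return np.searchsorted(sample, grid, side="right") / len(sample)

def cumulative_js_divergence(x, y):
    """Sketch of a CDF-based Jensen-Shannon-style divergence.

    Illustrative only (not the paper's exact formulation): both
    empirical CDFs are evaluated on the pooled sample points, and the
    two KL-like terms are integrated against the mixture CDF with the
    trapezoidal rule. No density estimation is involved.
    """
    grid = np.sort(np.concatenate([x, y]))
    P = empirical_cdf(x, grid)
    Q = empirical_cdf(y, grid)
    M = 0.5 * (P + Q)            # mixture of the two CDFs
    eps = 1e-12                  # avoid log(0) where a CDF is still zero
    integrand = 0.5 * (P * np.log((P + eps) / (M + eps))
                       + Q * np.log((Q + eps) / (M + eps)))
    # Trapezoidal integration over the pooled sample points.
    dx = np.diff(grid)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dx))
```

Because the CDFs are step functions computed directly from the samples, the measure is zero for identical samples and grows as the two empirical distributions pull apart, with no parametric assumptions anywhere in the computation.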





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

Max Planck Institute for Informatics and Saarland University, Saarbrücken, Germany
