3D Insights to Some Divergences for Robust Statistics and Machine Learning

  • Conference paper
Geometric Science of Information (GSI 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10589)

Abstract

Divergences (distances), which measure the similarity or proximity between two probability distributions, have proven very useful for many tasks in statistics, machine learning, information theory, etc. Prominent examples are the Kullback-Leibler information, the Csiszar-Ali-Silvey \(\phi \)-divergences (CASD) for convex functions \(\phi \), the “classical” (i.e., unscaled) Bregman distances, and the more general scaled Bregman distances (SBD) of [26, 27]. By means of 3D plots we show several properties and pitfalls of the geometries of SBDs, also for non-probability distributions; we also cover the robustness of the corresponding minimum-distance concepts. For these investigations, we construct a special SBD subclass which covers both the often-used power divergences (of CASD type) and their robustness-enhanced extensions with non-convex, non-concave \(\phi \).
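
To make the relation between these divergence classes concrete, the following minimal Python sketch evaluates a scaled Bregman distance on a finite state space. It assumes the discrete form \(B_{\phi }(P,Q\,|\,M)=\sum _x m(x)\,[\phi (p(x)/m(x))-\phi (q(x)/m(x))-\phi '(q(x)/m(x))\,(p(x)/m(x)-q(x)/m(x))]\) in the spirit of [27], which is not restated in this abstract. The function names, the two generators, and the example vectors are illustrative assumptions and are not taken from the paper.

```python
import math

# Assumed generator for the Kullback-Leibler case: phi(t) = t*log(t) + 1 - t.
def phi_kl(t):
    return t * math.log(t) + 1.0 - t

def dphi_kl(t):
    return math.log(t)

# Assumed quadratic generator phi_2(t) = (t - 1)^2 / 2 (a power-type example).
def phi_2(t):
    return 0.5 * (t - 1.0) ** 2

def dphi_2(t):
    return t - 1.0

def scaled_bregman(p, q, m, phi, dphi):
    """Discrete scaled Bregman distance of p and q with scaling m (all entries > 0)."""
    total = 0.0
    for p_x, q_x, m_x in zip(p, q, m):
        s, t = p_x / m_x, q_x / m_x
        total += m_x * (phi(s) - phi(t) - dphi(t) * (s - t))
    return total

# Hypothetical probability vectors on a 3-element state space.
P = [0.2, 0.5, 0.3]
Q = [0.4, 0.4, 0.2]
ONES = [1.0, 1.0, 1.0]

# Scaling M = Q gives a phi-divergence of CASD type;
# scaling M = 1 gives the classical (unscaled) Bregman distance.
casd_2 = scaled_bregman(P, Q, Q, phi_2, dphi_2)       # half Pearson chi-square
breg_2 = scaled_bregman(P, Q, ONES, phi_2, dphi_2)    # half squared Euclidean distance
kl_casd = scaled_bregman(P, Q, Q, phi_kl, dphi_kl)    # Kullback-Leibler
kl_breg = scaled_bregman(P, Q, ONES, phi_kl, dphi_kl) # also Kullback-Leibler here

print(casd_2, breg_2, kl_casd, kl_breg)
```

With the quadratic generator the two scalings give visibly different quantities, whereas for the Kullback-Leibler generator and probability vectors both scalings coincide. This is one simple way to see that the choice of scaling is an additional degree of freedom.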

Notes

  1. Which is equal to \(D_{{\check{\phi }}_{1}}\left( P,Q \right) \) with \({\check{\phi }}_{1}(t) := t\log t \in [-\mathrm {e}^{-1}, \infty [\), but generally \(D_{\phi _{1}}\left( \mu ,\nu \right) \ne D_{{\check{\phi }}_{1}}\left( \mu ,\nu \right) \) where the latter can be negative and thus isn’t a distance.

  2. Also notice that the HD together with \(\theta _0 = 0.5\) does not exhibit such an effect for our smaller 3-element state space, due to the lack of outliers.
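
The statement in Note 1 can be checked numerically with a short Python sketch. The generator \(\phi _{1}(t) = t\log t + 1 - t\) is an assumption here, since \(\phi _{1}\) is not defined in this excerpt; the helper name `csiszar_divergence` and the example vectors are likewise hypothetical.

```python
import math

def phi_1(t):
    """Assumed generator phi_1(t) = t*log(t) + 1 - t, nonnegative for t > 0."""
    return t * math.log(t) + 1.0 - t

def phi_1_check(t):
    """Generator from Note 1: t*log(t), bounded below by -1/e."""
    return t * math.log(t)

def csiszar_divergence(mu, nu, phi):
    """Discrete phi-divergence sum_x nu(x) * phi(mu(x) / nu(x)), all entries > 0."""
    return sum(n * phi(m / n) for m, n in zip(mu, nu))

# Probability vectors: both generators give the same value.
P = [0.2, 0.5, 0.3]
Q = [0.4, 0.4, 0.2]
d_prob_1 = csiszar_divergence(P, Q, phi_1)
d_prob_check = csiszar_divergence(P, Q, phi_1_check)

# Hypothetical non-probability measure with total mass 0.4: the values differ,
# and the t*log(t) version becomes negative.
MU = [0.1, 0.2, 0.1]
d_mass_1 = csiszar_divergence(MU, Q, phi_1)
d_mass_check = csiszar_divergence(MU, Q, phi_1_check)

print(d_prob_1, d_prob_check)
print(d_mass_1, d_mass_check)
```

Under the assumed \(\phi _{1}\), the two divergences differ by the difference of the total masses of \(\nu \) and \(\mu \), which vanishes exactly when both have the same total mass, for instance when both are probability distributions.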

References

  1. Ali, S.M., Silvey, S.D.: A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B–28, 131–140 (1966)

  2. Basu, A., Lindsay, B.G.: Minimum disparity estimation for continuous models: efficiency, distributions and robustness. Ann. Inst. Statist. Math. 46(4), 683–705 (1994)

  3. Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika 85, 549–559 (1998)

  4. Basu, A., Shioya, H., Park, C.: Statistical Inference: The Minimum Distance Approach. CRC Press, Boca Raton (2011)

  5. Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)

  6. Broniatowski, M.: A weighted bootstrap procedure for divergence minimization problems. In: Antoch, J., Jureckova, J., Maciak, M., Pešta, M. (eds.) AMISTAT 2015, pp. 1–22. Springer, Cham (2017)

  7. Cerone, P., Dragomir, S.S.: Approximation of the integral mean divergence and \(f-\)divergence via mean results. Math. Comp. Model. 42, 207–219 (2005)

  8. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning & Games. Cambridge UP, New York (2006)

  9. Csiszar, I.: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. A–8, 85–108 (1963)

  10. Csiszar, I., Breuer, T.: Measuring distribution model risk. Math. Finance 26(2), 395–411 (2016)

  11. Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. Mach. Learn. 48, 253–285 (2002)

  12. Kißlinger, A.-L., Stummer, W.: Some decision procedures based on scaled Bregman distance surfaces. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2013. LNCS, vol. 8085, pp. 479–486. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40020-9_52

  13. Kißlinger, A.-L., Stummer, W.: New model search for nonlinear recursive models, regressions and autoregressions. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2015. LNCS, vol. 9389, pp. 693–701. Springer, Cham (2015). doi:10.1007/978-3-319-25040-3_74

  14. Kißlinger, A.-L., Stummer, W.: A New Information-Geometric Method of Change Detection. (2015, Preprint)

  15. Kißlinger, A.-L., Stummer, W.: Robust statistical engineering by means of scaled Bregman distances. In: Agostinelli, C., Basu, A., Filzmoser, P., Mukherjee, D. (eds.) Recent Advances in Robust Statistics: Theory and Applications, pp. 81–113. Springer, New Delhi (2016). doi:10.1007/978-81-322-3643-6_5

  16. Liese, F., Vajda, I.: Convex Statistical Distances. Teubner, Leipzig (1987)

  17. Lindsay, B.G.: Efficiency versus robustness: the case for minimum Hellinger distance and related methods. Ann. Statist. 22(2), 1081–1114 (1994)

  18. Murata, N., Takenouchi, T., Kanamori, T., Eguchi, S.: Information geometry of U-boost and Bregman divergence. Neural Comput. 16(7), 1437–1481 (2004)

  19. Nock, R., Menon, A.K., Ong, C.S.: A scaled Bregman theorem with applications. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 19–27 (2016)

  20. Nock, R., Nielsen, F., Amari, S.-I.: On conformal divergences and their population minimizers. IEEE Trans. Inf. Theory 62(1), 527–538 (2016)

  21. Nock, R., Nielsen, F.: Bregman divergences and surrogates for learning. IEEE Trans. Pattern Anal. Mach. Intell. 31(11), 2048–2059 (2009)

  22. Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall, Boca Raton (2006)

  23. Pardo, M.C., Vajda, I.: On asymptotic properties of information-theoretic divergences. IEEE Trans. Inf. Theory 49(7), 1860–1868 (2003)

  24. Read, T.R.C., Cressie, N.A.C.: Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer, New York (1988)

  25. Shioya, H., Da-te, T.: A generalisation of Lin divergence and the derivation of a new information divergence measure. Electr. Commun. Japan 78(7), 34–40 (1995)

  26. Stummer, W.: Some Bregman distances between financial diffusion processes. Proc. Appl. Math. Mech. 7(1), 1050503–1050504 (2007)

  27. Stummer, W., Vajda, I.: On Bregman distances and divergences of probability measures. IEEE Trans. Inf. Theory 58(3), 1277–1288 (2012)

  28. Stummer, W., Vajda, I.: On divergences of finite measures and their applicability in statistics and information theory. Statistics 44, 169–187 (2010)

  29. Sugiyama, M., Suzuki, T., Kanamori, T.: Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation. Ann. Inst. Stat. Math. 64, 1009–1044 (2012)

  30. Tsuda, K., Rätsch, G., Warmuth, M.: Matrix exponentiated gradient updates for on-line learning and Bregman projection. J. Mach. Learn. Res. 6, 995–1018 (2005)

  31. Wu, L., Hoi, S.C.H., Jin, R., Zhu, J., Yu, N.: Learning Bregman distance functions for semi-supervised clustering. IEEE Trans. Knowl. Data Eng. 24(3), 478–491 (2012)

Acknowledgement

We are grateful to all three referees for their useful suggestions. W. Stummer thanks A.L. Kißlinger for valuable discussions.

Author information

Correspondence to Wolfgang Stummer.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Roensch, B., Stummer, W. (2017). 3D Insights to Some Divergences for Robust Statistics and Machine Learning. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2017. Lecture Notes in Computer Science, vol 10589. Springer, Cham. https://doi.org/10.1007/978-3-319-68445-1_54

  • DOI: https://doi.org/10.1007/978-3-319-68445-1_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68444-4

  • Online ISBN: 978-3-319-68445-1

  • eBook Packages: Computer Science (R0)
