Abstract
This article gives an introduction to information geometry and surveys its applications in the areas of machine learning, optimization and statistical inference. Information geometry is explained intuitively through divergence functions introduced on a manifold of probability distributions and on other general manifolds. A divergence function induces a Riemannian structure together with a pair of dually coupled affine connections, and many manifolds of practical interest turn out to be dually flat. When a manifold is dually flat, a generalized Pythagorean theorem and a related projection theorem hold; these provide useful tools for a variety of approximation and optimization problems. We apply them to alternating minimization problems, Ying-Yang machines and the belief propagation algorithm in machine learning.
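As a concrete illustration of the generalized Pythagorean theorem, consider the Kullback-Leibler divergence (a special case of the divergence functions above). For a joint distribution p(x, y), the product of its marginals is the m-projection of p onto the submanifold of independent distributions, and for any other independent distribution q the divergence decomposes additively. The sketch below (a minimal numerical check, not from the article itself; the distributions are arbitrary examples) verifies this decomposition:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Joint distribution p(x, y) on a 2x2 space, flattened row-major.
p = [0.3, 0.2, 0.1, 0.4]

# Marginals of p.
p1 = [p[0] + p[1], p[2] + p[3]]   # p(x)
p2 = [p[0] + p[2], p[1] + p[3]]   # p(y)

# r = m-projection of p onto the independent distributions:
# the product of its own marginals.
r = [p1[i] * p2[j] for i in range(2) for j in range(2)]

# q = an arbitrary independent distribution q1(x) * q2(y).
q1, q2 = [0.6, 0.4], [0.5, 0.5]
q = [q1[i] * q2[j] for i in range(2) for j in range(2)]

# Generalized Pythagorean theorem: D(p||q) = D(p||r) + D(r||q),
# since the geodesic from p to r is "orthogonal" to that from r to q.
lhs = kl(p, q)
rhs = kl(p, r) + kl(r, q)
print(f"D(p||q) = {lhs:.6f},  D(p||r) + D(r||q) = {rhs:.6f}")
assert abs(lhs - rhs) < 1e-12
```

The same decomposition underlies the projection theorem used in the alternating minimization applications mentioned above: each projection step minimizes one divergence term while the other is held fixed.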
Shun-ichi Amari was born in Tokyo, Japan, on January 3, 1936. He graduated from the Graduate School of the University of Tokyo in 1963, majoring in mathematical engineering, and received the degree of Doctor of Engineering. He worked as an Associate Professor at Kyushu University and the University of Tokyo, then as a Full Professor at the University of Tokyo, where he is now Professor Emeritus. He moved to the RIKEN Brain Science Institute, served as its Director for five years, and is now Senior Advisor. He has been engaged in research across wide areas of mathematical science and engineering, such as topological network theory, differential geometry of continuum mechanics, pattern recognition, and information sciences. In particular, he has devoted himself to the mathematical foundations of neural networks, including statistical neurodynamics, dynamical theory of neural fields, associative memory, self-organization, and general learning theory. Another main subject of his research is information geometry, which he initiated; it applies modern differential geometry to statistical inference, information theory, control theory, stochastic reasoning, and neural networks, providing a powerful new method for the information sciences and probability theory. Dr. Amari is a past President of the International Neural Network Society and of the Institute of Electronics, Information and Communication Engineers, Japan. He received the Emanuel R. Piore Award and the Neural Networks Pioneer Award from the IEEE, the Japan Academy Award, the C&C Award and the Caianiello Memorial Award. He was the founding co-editor-in-chief of Neural Networks, among many other journals.
Cite this article
Amari, S. Information geometry in optimization, machine learning and statistical inference. Front. Electr. Electron. Eng. China 5, 241–260 (2010). https://doi.org/10.1007/s11460-010-0101-3