Collection

Information Geometry for Data Science

This collection is associated with the conference Information Geometry for Data Science (IG4DS2022), held in Hamburg, Germany, September 19-23, 2022.

Data, in its many forms and across many disciplines, has become an essential resource for research in the 21st century. Indeed, data-driven knowledge extraction now constitutes one of the core paradigms of scientific discovery. This paradigm is supported by the many successes of universal architectures and algorithms, such as deep neural networks, which can explain observed data and, at the same time, generalise extremely well to new, unobserved data. Such systems are thus capable of revealing the intrinsic structure of the data, an important step within the process of knowledge extraction.

Structure is coupled with geometry at various levels. At the lowest level, spatially or temporally extended data, such as images or audio recordings, often exhibit complex geometric features which encode their underlying structure. At the next level of description, we can interpret each datum as a structureless point in a high- or infinite-dimensional vector space. Here, structure emerges when we consider a collection of such data points. This collection can then be modelled in terms of a manifold, leading to the notion of a data manifold, or in terms of a distribution of data points. In information geometry, one typically considers such a distribution as a single point in the set of probability measures on the (measurable) space of data points. With this, we enter the next level of description. Again, a collection of points, each of them a distribution of data points, forms a geometric object, referred to as a statistical model. Traditionally, information geometry has been concerned with the identification of natural geometric structures on such models, the Fisher-Rao metric and the Amari-Chentsov tensor being important instances.
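For orientation, both structures admit well-known coordinate expressions. For a parametrised statistical model {p_\theta} (the notation below is ours, chosen purely for illustration), they read, in LaTeX notation,

    g_{ij}(\theta) = \mathbb{E}_{p_\theta}\left[ \partial_i \log p_\theta \; \partial_j \log p_\theta \right],

    T_{ijk}(\theta) = \mathbb{E}_{p_\theta}\left[ \partial_i \log p_\theta \; \partial_j \log p_\theta \; \partial_k \log p_\theta \right],

where \partial_i denotes the partial derivative with respect to the i-th parameter and the expectation is taken under p_\theta.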

Given a set of observed data points, summarised by the so-called empirical distribution, it is natural to search for a distribution within the statistical model that optimally explains the empirical distribution and, at the same time, allows us to predict new data. Such a search within the statistical model is referred to as a learning process, a process intensively studied in statistics and machine learning. It has been demonstrated that the geometry of the statistical model has a great impact on the quality of learning. One instance of this is the natural gradient method, which improves learning simply by utilising the natural geometry induced by the Fisher-Rao metric, as illustrated in the sketch below. This geometric perspective of information geometry has already had a great influence on machine learning and is expected to further influence the general field of data science.
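As a minimal, self-contained sketch of the natural gradient idea (our own illustration, not drawn from any article in this collection), consider maximum-likelihood estimation for a Bernoulli model, where the Fisher information is available in closed form and can be used to precondition the ordinary gradient:

    import numpy as np

    # Sketch: natural gradient ascent for a Bernoulli(theta) model.
    # The Fisher information of Bernoulli(theta) is F(theta) = 1 / (theta * (1 - theta)).

    rng = np.random.default_rng(0)
    data = rng.binomial(1, 0.7, size=1000)   # synthetic observations (assumed setup)
    m = data.mean()                          # empirical mean, the sufficient statistic

    def avg_loglik_grad(theta):
        # Gradient of the average log-likelihood: (m - theta) / (theta * (1 - theta)).
        return m / theta - (1 - m) / (1 - theta)

    def fisher(theta):
        return 1.0 / (theta * (1 - theta))

    theta = 0.1            # deliberately poor initialisation
    eta = 0.5              # step size
    for _ in range(20):
        nat_grad = avg_loglik_grad(theta) / fisher(theta)   # equals m - theta
        theta += eta * nat_grad

    print(f"estimate: {theta:.4f}  empirical mean: {m:.4f}")

Here the natural gradient F(theta)^{-1} times the gradient reduces to m - theta, so each step contracts the error by a factor of 1 - eta, regardless of how strongly the raw gradient is distorted near the boundary of the parameter interval.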

A learning process is, by its very nature, a data-driven and therefore stochastic process. It takes another level of description to interpret that process as a deterministic evolution of a distribution on the given parametrised model. This evolution is typically described in terms of a Kolmogorov equation, a partial differential equation, which can also be studied with the help of Riemannian geometry, where, this time, the Riemannian metric is naturally chosen to be the Otto metric. Here, geometry again yields important insights into the learning process within a statistical model.
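A standard instance of this viewpoint, recalled here for orientation (in LaTeX notation, with unit diffusion constant): by a classical result of Jordan, Kinderlehrer, and Otto, the Kolmogorov forward (Fokker-Planck) equation

    \partial_t \rho_t = \nabla \cdot \left( \rho_t \nabla V \right) + \Delta \rho_t

is precisely the gradient flow, with respect to the Otto metric, of the free-energy functional

    F(\rho) = \int V(x)\, \rho(x)\, dx + \int \rho(x) \log \rho(x)\, dx,

which combines a potential-energy term and the negative entropy.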

The aim of this collection is to highlight recent developments within information geometry that are relevant for data science at any level of the outlined hierarchy of descriptions. Bridging between these levels is the subject of the fields of optimal transport and Wasserstein geometry. Their consistent integration within information geometry is expected to contribute to the foundations of data science.

Editors

  • Nihat Ay

    Professor, Hamburg University of Technology, Germany (Editor-in-Chief)

Articles (8 in this collection)