THE L^p-FISHER-RAO METRIC AND AMARI-ČENCOV α-CONNECTIONS

Abstract. We introduce a family of Finsler metrics, called the L^p-Fisher-Rao metrics F_p, for p ∈ (1, ∞), which generalize the classical Fisher-Rao metric F_2, both on the space of densities Dens_+(M) and on the space of probability densities Prob(M). We then study their relation to the Amari-Čencov α-connections ∇^(α) from information geometry: on Dens_+(M), the geodesic equations of F_p and ∇^(α) coincide for p = 2/(1 − α). Both are pullbacks of canonical constructions on L^p(M), in which geodesics are simply straight lines. In particular, this gives a new interpretation of α-geodesics as energy-minimizing curves. On Prob(M), the F_p and ∇^(α) geodesics can still be thought of as pullbacks of natural operations on the unit sphere in L^p(M), but in this case they no longer coincide unless p = 2. On this space we show that geodesics of the α-connections are pullbacks of projections of straight lines onto the unit sphere, and that they cease to exist when they leave the positive part of the sphere. This unveils the geometric structure of solutions to the generalized Proudman-Johnson equations and generalizes them to higher dimensions. In addition, we calculate the associated tensors of F_p and study their relation to ∇^(α).


1. Introduction
Information geometry is concerned with the study of spaces of probability densities as differentiable manifolds. Its first developments were mostly about the finite-dimensional geometry of parametric statistical models, for which the space of distributions can be identified with the parameter space. In 1945, Rao [37] showed that the Fisher information could be used to define a Riemannian metric on this space, and in 1982, Čencov [15] proved that it is the only metric invariant with respect to sufficient statistics, for families with finite sample spaces. The Fisher-Rao metric was also shown to induce well-known geometries on certain important statistical models, such as hyperbolic geometry on normal distributions [3].
Encompassing the Fisher-Rao metric, a richer geometric structure was introduced by Čencov [15] and Amari [2] on spaces of parametric probability distributions. The Amari-Čencov structure relies on a family of affine connections called the α-connections, denoted by ∇^(α), that are dual with respect to the Fisher-Rao metric, and such that the 0-connection is the Levi-Civita connection. The α-connections arise naturally as an interpolating family between the so-called exponential and mixture connections ∇^(1) and ∇^(−1), for which exponential and mixture families are (dually) flat manifolds. These geometric tools relate to natural information-theoretic quantities such as the Kullback-Leibler divergence, and have been used in statistical inference, e.g., to express conditions for the existence of consistent and efficient estimators, or to obtain a purely geometric interpretation of the famous Expectation-Maximization (EM) algorithm in the presence of hidden variables [1].
In parallel, infinite-dimensional information geometry tools have also been developed in the nonparametric setting, although arguably to a lesser extent. The nonparametric Fisher-Rao metric was introduced by Friedrich in 1991 [21] on the space of all probability densities. He showed that it yields the historical Fisher information metric when restricted to finite-dimensional submanifolds representing parametric statistical models, and that the geometry is spherical with constant curvature 1/4. More than two decades later, it was proved to be the only metric (up to a multiplicative factor) invariant with respect to the action of sufficient statistics, namely diffeomorphic change of the support, just like in the finite-dimensional case [4,9]. In the infinite-dimensional setting, it is possible to work with diffeomorphisms of the support instead of the densities themselves, since the space of smooth densities on a compact manifold M with respect to a volume form λ can be obtained as the quotient Diff(M)/Diff_λ(M) of diffeomorphisms modulo diffeomorphisms preserving λ. Using this representation, Khesin, Lenells, Misiołek and Preston [26] showed in 2013 that the Fisher-Rao metric can be obtained as the quotient of a right-invariant homogeneous Sobolev Ḣ^1-metric on Diff(M); see also [33] and the recent overview article [27].
The Amari-Čencov structure induced by the α-connections has also received interest in the nonparametric setting. Gibilisco and Pistone [24] defined the exponential and mixture connections in this case, and showed that for α ∈ (−1, 1), the interpolating connections can be defined through a p-root mapping to an L^p-sphere, for p = 2/(1 − α). Divergences and dualistic structures are investigated in the monograph of Ay, Jost, Lê and Schwachhöfer [5], although the α-connections themselves are not directly considered there in the infinite-dimensional setting. See also [35] for a definition of the α-divergences and α-connections in a Hilbert manifold setting. In [30], Lenells and Misiołek study the α-connections on diffeomorphisms and relate their geodesic equations to a well-known equation, the generalized Proudman-Johnson equation. Very recently, three authors of the present paper showed that these Proudman-Johnson equations, on the real line, can alternatively be seen as the geodesic equations of right-invariant Finsler metrics on the diffeomorphism group [11], which were first introduced in [18]. This established a first link between α-connections and a family of Finsler metrics, which we investigate further here.
1.1. Main contributions. The aim of the present paper is three-fold. First, to introduce and study the L^p-Fisher-Rao metrics F_p on (probability) densities, defined for p ∈ (1, ∞), any density µ and any tangent vector a. Note that this is a family of Finsler metrics that coincides with the Fisher-Rao metric when p = 2. Second, to give a precise and rigorous review of the Amari-Čencov α-connections in the infinite-dimensional setting, a new variational formulation of their corresponding geodesics, and explicit solution formulas for them. Finally, to make links between the two, distinguishing between the space of densities, the space of probability densities, and parametric statistical models.
We now describe the main contributions in more detail. We study the L^p-Fisher-Rao geometry of (probability) densities through a mapping Φ_p to the set of positive functions, defined relative to some background probability measure λ. Just like the Fisher-Rao metric is the pullback of the standard L^2-metric via the square-root transform [26,13,22], we show that the L^p-Fisher-Rao metric is the pullback of the L^p-norm via the mapping Φ_p, which we call by analogy the p-root transform (Theorems 3.12 and 4.10). The L^p-Fisher-Rao geometry on the space of densities is therefore that of a flat space, as described in Corollary 3.13, and on the space of probability densities that of the L^p-sphere (Theorem 4.10). The p-root transform (for p = 2/(1 − α)) also presents an alternative way to define the α-connections as pullbacks of the trivial connection of the vector space of functions (Theorems 3.12 and 4.10), as first shown by Gibilisco and Pistone [24] for probability distributions, albeit with a slightly different construction. The geometric differences between these constructions for the L^p-Fisher-Rao metric and the α-connections, which we systematically study in this paper, are summarized in Figure 1.
Towards this aim, we show that the geodesic equations of F_p and ∇^(α) coincide on Dens_+(M) (for α = 1 − 2/p), but not on Prob(M) (see Theorems 3.3, 3.7, 4.2 and 4.4); similarly, on Dens_+(M) the Chern connection induced by F_p coincides with the α-connection, while this no longer holds on Prob(M) (Theorem 3.10 and Remark 4.8). This provides the novel variational formulation of these α-connection geodesics.
We further use the p-root transform to obtain explicit solution formulas for α-geodesics on densities and on probability densities: for densities, we show in Corollary 3.13 that geodesics are pullbacks of straight lines in L^p space, whereas for probability densities we show in Theorem 4.11 that they are pullbacks of projections of straight lines in L^p onto the L^p-sphere. In the latter case the projection involves a time rescaling that is obtained as a solution of an ordinary differential equation. Similar solutions of the geodesic equation of the α-connection were obtained for finite sample spaces [5, pp. 50-51]. In the infinite-dimensional case with a one-dimensional base manifold M, this gives an explicit solution (modulo a solution to an ODE) of the generalized Proudman-Johnson equation, for a certain range of parameters, and of its generalization to higher-dimensional base manifolds by Lenells and Misiołek [30]. There, they proved the complete integrability of these equations for the flat case α = ±1 by providing an explicit solution formula. Similarly, the integrability for the case α = 0 was shown in [26]. Our results can thus be interpreted as complete integrability of the α-geodesic equation for the whole range α ∈ (−1, 1).
The results in the one-dimensional situation are in correspondence with the analysis of [29,38], where a similar p-root transform was used to study the generalized Proudman-Johnson equation. In these articles it was used as an ad-hoc simplification of some auxiliary equations; here we expose the geometry behind it, which also simplifies some of the authors' calculations, and generalize it to higher dimensions. These connections are summarized in Section 5.
Throughout this paper we work in the smooth category, i.e., all densities are assumed to be smooth, and the underlying space M is assumed to be a smooth manifold. This is mainly in order to avoid some technicalities, and most results hold in much lower regularity. For example, for all results not involving the action of Diff(M), the underlying space M can simply be a measurable space, and in many cases densities only need to be integrable.
1.2. Outline. The rest of the paper is organized as follows. We start by describing some background on spaces of densities and the Fisher-Rao metric in Section 2. Then we investigate the geometries induced by the α-connections and the L^p-Fisher-Rao metrics, as well as their links, on the space of smooth densities in Section 3 and on the space of probability densities in Section 4. In Section 5 we discuss the relations of the various geodesic equations obtained in Sections 3-4 to some known PDEs, as well as the relation between the L^p-Fisher-Rao metric and Finsler metrics on diffeomorphism groups. The different notions of geodesics are compared numerically on an example in Section 6. Finally, we consider the finite-dimensional setting of parametric statistical models in Section 7, illustrated by the special case of normal distributions. In Appendix A we present a short overview of infinite-dimensional Finsler geometry.
Data availability statement. Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

2. Spaces of densities and the Fisher-Rao metric
In all of this article let M be a closed manifold of dimension n = dim(M) < ∞. We denote by Dens_+(M) the space of smooth positive densities and by Prob(M) the subspace of smooth probability densities, i.e.,
Dens_+(M) = {µ ∈ Ω^n(M) : µ > 0}, Prob(M) = {µ ∈ Dens_+(M) : ∫_M µ = 1}.
Since Dens_+(M) is an open subset of the Fréchet space Ω^n(M), it carries the structure of a Fréchet manifold with tangent space T_µ Dens_+(M) = Ω^n(M). Similarly, as a linear subspace of a Fréchet manifold, the space of probability densities is a Fréchet manifold, where the tangent space is given by
T_µ Prob(M) = {a ∈ Ω^n(M) : ∫_M a = 0}.
On both the space of densities and the space of probability densities we can consider the pushforward action of the diffeomorphism group Diff(M). On Dens_+(M) it is given by
(1) Diff(M) × Dens_+(M) → Dens_+(M), (φ, µ) ↦ φ_*µ,
and, since the pushforward by a diffeomorphism is volume preserving, this action restricts to an action on the space of probability densities. By a result of Moser [34] this action is transitive, which allows us to identify the space of probability densities with the quotient
Prob(M) ≅ Diff(M)/Diff_λ(M),
where Diff_λ(M) is the group of diffeomorphisms preserving some fixed probability density λ. Thus, constructions (metrics, connections, geodesics) on Prob(M) can be pulled back to Diff(M) via the map φ ↦ φ_*λ.
For a ∈ Ω^n(M) and µ ∈ Dens_+(M), we denote by a/µ the Radon-Nikodym derivative of a with respect to µ. In particular, the map µ ↦ µ/λ allows us to identify Dens_+(M) with the positive smooth functions on M, and Prob(M) with the positive smooth functions that integrate to one. For the proof of the local well-posedness results in Sections 3 and 4 we will also need the Sobolev completions of these spaces, which can be defined using their Radon-Nikodym derivative with respect to λ, i.e., for k > dim(M)/2 we consider
Dens^k_+(M) = {µ : µ/λ ∈ H^k(M), µ/λ > 0}, Prob^k(M) = {µ ∈ Dens^k_+(M) : ∫_M µ = 1}.
Note that the assumption k > dim(M)/2 is necessary to make sense of the positivity condition. A central object in information geometry is the Fisher-Rao metric, which we introduce now: Via restriction, G^FR induces a Riemannian metric on Prob(M), which we denote by the same letter.
3. The L^p-Fisher-Rao metric and α-connections on the space of densities
In this section we introduce the L^p-Fisher-Rao metric on the space of densities, which will allow us to obtain a new interpretation of the family of α-connections.
3.1. The Amari-Čencov α-connections on Dens_+(M). First we introduce the family of α-connections on the space Dens_+(M). In the finite-dimensional case, i.e., when M is a finite set, the definitions below coincide with the classical ones, see e.g. [2,4].
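For a finite sample space, the objects of this subsection can be explored numerically. The following is a minimal sketch, assuming the standard Amari form of the α-divergence for positive measures, D^(α)(µ‖ν) = 4/(1 − α²) Σ_i [ (1 − α)/2 µ_i + (1 + α)/2 ν_i − µ_i^{(1−α)/2} ν_i^{(1+α)/2} ], and the Fisher-Rao quadratic form G^FR_µ(a, a) = Σ_i a_i²/µ_i. Both normalizations, the orientation of the two arguments, and all function names are assumptions made for illustration and may differ from the conventions used in the text.

```python
def alpha_divergence(mu, nu, alpha):
    """Assumed Amari-type alpha-divergence between positive measures on a finite set."""
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (-1, 1)")
    w_mu = (1.0 - alpha) / 2.0
    w_nu = (1.0 + alpha) / 2.0
    # Each summand is >= 0 by the weighted AM-GM inequality,
    # the finite-set instance of the Hoelder argument.
    total = sum(w_mu * m + w_nu * n - m ** w_mu * n ** w_nu
                for m, n in zip(mu, nu))
    return 4.0 / (1.0 - alpha ** 2) * total


def fisher_rao_sq(mu, a):
    """Assumed Fisher-Rao quadratic form G_mu(a, a) = sum_i a_i^2 / mu_i."""
    return sum(ai * ai / mi for ai, mi in zip(a, mu))


def second_order_ratio(mu, a, alpha, eps=1e-3):
    """Ratio D(mu || mu + eps*a) / (eps^2 / 2 * G_mu(a, a)).

    If the second-order expansion of the divergence is the Fisher-Rao
    metric, this ratio should be close to 1 for small eps.
    """
    nu = [mi + eps * ai for mi, ai in zip(mu, a)]
    return alpha_divergence(mu, nu, alpha) / (0.5 * eps ** 2 * fisher_rao_sq(mu, a))
```

Under these assumptions, evaluating second_order_ratio for several values of α gives numbers close to 1, which is a numerical counterpart of the statement that the second-order behaviour of the divergence does not depend on α.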
Using Hölder's inequality, it follows that D^(α) is non-negative and vanishes if and only if µ = ν. Furthermore, a straightforward calculation shows that the negative of its second derivative defines a positive bilinear form, which is exactly the Fisher-Rao metric, i.e., Here ∂_µ and ∂_ν refer to derivatives with respect to the µ and ν variables, respectively. Thus, for any α ∈ (−1, 1), D^(α) is a divergence in the sense of [4, Section 4.4], and induces a connection ∇^(α) on Dens_+(M) via the relation where a, b, c ∈ T_µ Dens_+(M). Since Dens_+(M) is a Fréchet manifold, G^FR is merely a weak Riemannian metric, and as such, (4) does not necessarily define ∇^(α) uniquely. However, in our case it does, yielding the following formulae:
Lemma 3.2 (α-connection). For any α ∈ (−1, 1) the α-connections ∇^(α) on Dens_+(M) are given by Here Db.a|_µ := D_µ b(a_µ) denotes the directional derivative of the vector field b in the direction given by a_µ.
The easiest way to read this lemma (and similar formulae below) is to consider again the identification of densities and positive functions via µ → µ/λ.
Proof. This follows directly from formula (4). □
In the following result we study the local well-posedness of the corresponding geodesic equations. To this end, we first consider these equations on a Banach space of Sobolev densities, where local well-posedness follows easily from the Picard-Lindelöf theorem. The result in the smooth category then follows from an Ebin-Marsden type no-loss-no-gain result [19]: For any k > dim(M)/2 the geodesic equations are locally well-posed on the space of Sobolev densities Dens^k_+(M), i.e., given initial conditions µ(0) ∈ Dens^k_+(M), µ_t(0) ∈ T_{µ(0)} Dens^k_+(M) there exists a unique solution to equation (6) defined on a maximal interval of existence [0, T). The maximal interval of existence is uniform in the Sobolev order k, and thus the local well-posedness continues to hold in the limit, i.e., on the space of smooth densities Dens_+(M).
Proof. The formula for the geodesic equation follows directly from Lemma 3.2. To show the local well-posedness we view the geodesic equation (6) as a flow equation on T Dens^k_+(M). To this end, let F(µ, µ_t) denote the right-hand side of the geodesic equation, i.e., (7) F(µ, µ_t) = µ^{−1} µ_t^2, where we use the identification of Dens^k_+(M) with the space of positive Sobolev functions H^k_+(M) and of T_µ Dens^k_+(M) with all of H^k(M). Using the Sobolev module properties and the positivity of µ, it follows that F is a smooth map from H^k_+(M) × H^k(M) to H^k(M), and thus the local well-posedness follows from the Picard-Lindelöf theorem. Next, we observe that F is equivariant under the action of the diffeomorphism group Diff(M), i.e., F(φ_*µ, φ_*µ_t) = φ_*F(µ, µ_t). Thus the result on the uniformity of the maximal interval of existence follows by an adaptation of the Ebin-Marsden no-loss-no-gain theorem [19, Lemma 12.2] to the present setting, i.e., the diffeomorphism group acting on densities. This can be achieved by following the proof in [7], where the no-loss-no-gain result has been extended to the action of diffeomorphisms on the space of all Riemannian metrics. The key ingredient for this result is the fact that, in a chart, Lie derivatives along coordinate vector fields coincide with ordinary derivatives. □
Remark 3.5. It is easy to see that the L^p-Fisher-Rao metric satisfies the axioms of a Finsler metric, as defined in Definition A.1, for any p ∈ (1, ∞). We will, however, see in Lemma 3.8 that it is not strongly convex if p ≠ 2.
First we show that the family of L^p-Fisher-Rao metrics shares an important property with the Fisher-Rao metric: they are invariant under the action of the diffeomorphism group as defined in (1).
Lemma 3.6. For any p ∈ (1, ∞), the L^p-Fisher-Rao metric on the space Dens_+(M) is invariant under the action of the diffeomorphism group Diff(M), i.e., (9) F_p(φ_*µ, φ_*a) = F_p(µ, a) for all φ ∈ Diff(M).
Proof. This result follows by direct computation using the transformation formula for integrals. □
Next we calculate the geodesic equations of this family of Finsler metrics on Dens_+(M).
Theorem 3.7 (Geodesic equation on Dens_+(M)). For any p ∈ (1, ∞), the geodesic equation of the L^p-Fisher-Rao metric on the space of densities Dens_+(M) is given by which coincides with the geodesic equation of the α-connection for α = 1 − 2/p. Thus the local well-posedness result of Theorem 3.3 also holds for the geodesic equation of the L^p-Fisher-Rao metric.
Proof. The length functional of the L^p-Fisher-Rao metric on Dens_+(M) is given by where µ : [0, 1] → Dens_+(M) is such that µ(0) = µ_0, µ(1) = µ_1, and where µ_t denotes its (time) derivative. A geodesic is a path that locally minimizes the length functional; since L is invariant under reparametrization, we can restrict ourselves to paths of constant speed. By the Hölder inequality, it follows that constant-speed geodesics are equivalently the local minimizers of the q-energy for any q > 1. In our case the most convenient choice is to consider the q-energy with q = p. The corresponding energy functional reads as Calculating the variation of the p-energy functional in direction δµ leads to (11) where we used integration by parts in time t and the fact that the variational direction vanishes at the end points, i.e., δµ(0) = δµ(1) = 0. From here we can immediately read off the geodesic equation, which can be simplified to the desired formula. That this equation coincides with the geodesic equation of the α-connection can be seen by comparing it to the equation of Theorem 3.3. □
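The coincidence of the two geodesic equations can be checked pointwise on M, since under the identification µ ↦ µ/λ the equation involves no derivatives in x. The sketch below assumes that the geodesic equation takes the form µ_tt = (1 − 1/p) µ_t²/µ with 1 − 1/p = (1 + α)/2; this explicit coefficient is our reconstruction, and the function names are hypothetical. It evaluates the finite-difference residual of this equation along a curve, and can be applied to the pullback of a straight line under a p-th root, t ↦ (f_0 + t f_1)^p.

```python
def geodesic_residual(curve, p, t, h=1e-3):
    """Central finite-difference residual of mu_tt - (1 - 1/p) * mu_t^2 / mu at time t.

    `curve` is a scalar function t -> mu(t) > 0, i.e. the value of the
    density (relative to lambda) at one fixed point of M.
    """
    m_minus, m_zero, m_plus = curve(t - h), curve(t), curve(t + h)
    mu_t = (m_plus - m_minus) / (2.0 * h)
    mu_tt = (m_plus - 2.0 * m_zero + m_minus) / (h * h)
    return mu_tt - (1.0 - 1.0 / p) * mu_t ** 2 / m_zero


def p_root_line(f0, f1, p):
    """Curve t -> (f0 + t*f1)^p: a straight line in p-th root coordinates."""
    return lambda t: (f0 + t * f1) ** p
```

For example, geodesic_residual(p_root_line(1.3, -0.4, 3.0), 3.0, 0.5) is zero up to discretization error, whereas a curve that is affine in µ itself, such as t ↦ 1 + t, gives a residual of order one when p ≠ 1.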
Next we study the Finslerian geometry induced by the L^p-Fisher-Rao metric (see Appendix A for a short overview of the main definitions). We will see in the next lemma that the L^p-Fisher-Rao metric is, in general, not strongly convex, and thus some of the calculations in this and the next sections have to be understood formally.
Lemma 3.8 (The Hessian matrix). Let µ ∈ Dens_+(M) and ν, a, b ∈ T_µ Dens_+(M). The Hessian matrix g_ν of the squared L^p-Fisher-Rao metric at ν is given by where If ν is nowhere zero, then g_ν is positive definite and thus a Riemannian metric. If ν vanishes on an open set U ⊂ M then, for p > 2, g_ν is degenerate, as it vanishes for all a, b ∈ T_µ Dens_+(M) with support contained in U, and for p < 2 it is not well-defined.
Proof. We introduce the notations To compute the Hessian matrix of F_p^2(µ, ν) we need to calculate the second derivative in r and s of F_p^2(µ, ω). We have For the second derivative we get Evaluating at r = s = 0 yields the desired formula for g_ν. For ν ≠ 0 we can use the Cauchy-Schwarz inequality to prove the positive-definiteness of the Hessian: Then we get the inequality Thus, for ν a nowhere vanishing vector field, g_ν(a, a) = 0 implies that a/µ = 0. □
Lemma 3.9 (The Cartan tensor). Let µ ∈ Dens_+(M) and ν, a, b, c ∈ T_µ Dens_+(M). The Cartan tensor of the L^p-Fisher-Rao metric is given by
Proof. This formula can be derived similarly to the formula for the Hessian, by computing where ω(r, s, t) = ν + ra + sb + tc. □

3.3. The α-connection as Chern connection of the L^p-Fisher-Rao metric. Next we show that the Chern connection associated to the L^p-Fisher-Rao metric on Dens_+(M) is an α-connection, when two entries are taken to be the same.
Proof. Formula (16) defines the Chern connection if and only if it verifies the generalized Koszul formula (see Lemma A.8) Since the Cartan tensor verifies To compute the first terms of the right-hand side of this equality, we will need where I and J are defined by (15), and Using this we get and The following terms of the right-hand side of the generalized Koszul formula (17) Finally, there remains to compute the two terms involving the Chern connection, i.e., the term on the left-hand side and the last term of the right-hand side. With the chosen value of α, we have and so This yields, using (14), Putting all the terms together yields the left-hand side of the generalized Koszul formula (17), i.e.

□
As a direct consequence of the above characterization of the α-connections as a Chern connection, we obtain that these connections can be interpreted as describing energy-minimizing curves:
Corollary 3.11. Let α ∈ (−1, 1). Geodesic curves of the α-connection describe locally minimizing curves of the 2/(1 − α)-energy
3.4. The p-root transform. Next, we isometrically map the space of densities to a simpler space, which will allow us to obtain explicit expressions for solutions to the geodesic equation; we call this construction, which is a direct generalization of the square-root transform for the Fisher-Rao metric, the p-root transform. At the same time, the p-root transform presents an alternative way to define the α-connection. This was first proposed by Gibilisco and Pistone [24], who considered this construction specifically for the space Prob(M), albeit with slightly different notation and a different identification of a tangent vector with a function.
Theorem 3.12. Endow the space C^∞(M) of smooth functions with the standard L^p-norm and with the trivial vector space connection ∇^tr, i.e., for two vector fields ξ, η : (c) The pullback Φ_p^* ∇^tr coincides with ∇^(α) up to a constant depending only on the footpoint: In particular, the geodesics of Φ_p^* ∇^tr and ∇^(α) coincide.
Note that geodesics of the trivial connection on a vector space are always straight lines; in particular, this theorem allows us to obtain geodesics of the L^p-Fisher-Rao metric (of the α-connection, resp.) by pulling back straight lines in C^∞(M) using Φ_p. We will use this in Corollary 3.13 below to explicitly describe the resulting formulas on Dens_+(M). First we present the proof of the above theorem, which is a fairly straightforward calculation:
Proof of Theorem 3.12. The characterization of the image of Φ_p follows directly from the definition of Dens_+(M). To show item (b) we calculate, for µ ∈ Dens_+(M) and a ∈ T_µ Dens_+(M), the differential of Φ_p: Therefore the pullback of the L^p-norm via the embedding Φ_p is given by which implies that the embedding Φ_p is indeed an isometry.
Similarly we calculate for item (c) a b| µ .

□
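The isometry property can be checked numerically on a finite set. The sketch below assumes the normalizations Φ_p(µ) = p (µ/λ)^{1/p} and F_p(µ, a) = (Σ_i |a_i/µ_i|^p µ_i)^{1/p}, with λ the counting measure; these explicit formulas and the function names are assumptions made for illustration and should be compared with the definitions of Φ_p and F_p used in the text. The differential of Φ_p is approximated by a central finite difference rather than computed in closed form.

```python
def p_root(mu, p):
    """Assumed p-root transform mu -> p * mu^(1/p), with lambda the counting measure."""
    return [p * m ** (1.0 / p) for m in mu]


def lp_norm(f, p):
    """L^p norm of a function on a finite set (counting measure)."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)


def lp_fisher_rao(mu, a, p):
    """Assumed L^p-Fisher-Rao norm F_p(mu, a) = (sum_i |a_i/mu_i|^p mu_i)^(1/p)."""
    return sum(abs(ai / mi) ** p * mi for ai, mi in zip(a, mu)) ** (1.0 / p)


def p_root_differential(mu, a, p, eps=1e-6):
    """Central finite-difference approximation of d(Phi_p)_mu(a)."""
    plus = p_root([mi + eps * ai for mi, ai in zip(mu, a)], p)
    minus = p_root([mi - eps * ai for mi, ai in zip(mu, a)], p)
    return [(x - y) / (2.0 * eps) for x, y in zip(plus, minus)]
```

Under these assumptions, lp_norm(p_root_differential(mu, a, p), p) and lp_fisher_rao(mu, a, p) agree up to discretization error for any positive mu and any a, and for p = 2 the square of lp_fisher_rao reduces to Σ_i a_i²/µ_i.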
The above theorem allows us to explicitly solve for geodesics on Dens_+(M), which in turn leads to a proof of metric and geodesic incompleteness of the L^p-Fisher-Rao metric for any p > 1. By the equivalence of geodesics for the α-connections and for the L^p-Fisher-Rao metric, the formulas for geodesics also hold for the former. In the finite-dimensional setting this solution formula (via the p-root mapping) for the α-geodesics is known, albeit without any geometric interpretation, cf.
(a) The space Dens_+(M) equipped with the L^p-Fisher-Rao metric (the α-connection, resp.) is geodesically convex and, even more, there exists an explicit formula for all minimizing geodesics: given any µ_0, µ_1 ∈ Dens_+(M) the unique geodesic µ : [0, 1] → Dens_+(M) connecting µ_0 to µ_1 is given by
(b) Given any µ_0, µ_1 ∈ Dens_+(M) the geodesic distance of the L^p-Fisher-Rao metric is given by In particular, the geodesic distance of the L^p-Fisher-Rao metric on Dens_+(M) is nondegenerate.
(c) For any initial conditions µ_0 ∈ Dens_+(M) and a ∈ T_{µ_0} Dens_+(M) the unique L^p-Fisher-Rao geodesic (α-connection geodesic, resp.) µ : [0, T) → Dens_+(M) defined on its maximal interval of existence [0, T) is given by The geodesic µ(t) exists for all time t, i.e., T = ∞, if and only if (a/λ)(x) ≥ 0 for all x ∈ M. Thus the space Dens_+(M) equipped with the L^p-Fisher-Rao metric is geodesically incomplete, since the solution to the geodesic equation (10) leaves the space in finite time for any initial condition with (a/λ)(x) < 0 for some x.
(d) The space Dens_+(M) equipped with the geodesic distance of the L^p-Fisher-Rao metric is metrically incomplete.
(e) The metric completion of the space Dens_+(M) with respect to the geodesic distance of the L^p-Fisher-Rao metric is the space of all non-negative L^1-densities:
Proof. Statements (a)-(d) follow directly from the isometry of Theorem 3.12, the fact that geodesics on the vector space (C^∞(M), L^p) are straight lines, and the characterization of the image of Φ_p as an open, convex subset of C^∞(M). To see the statement regarding the metric completion, we observe that the metric completion of the image is exactly the set of a.e. non-negative L^p-functions, and thus the statement on the metric completion follows by applying Φ_p^{−1}. □
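The statements of the corollary can be made concrete on a finite set. The following sketch pulls back straight lines under a p-th root, again assuming Φ_p(µ) = p (µ/λ)^{1/p} with λ the counting measure; the factor p in the distance depends on this assumed normalization, and all function names are hypothetical. The boundary value geodesic and the exit time of the initial value geodesic do not depend on the normalization.

```python
import math


def geodesic(mu0, mu1, p, t):
    """Boundary value geodesic: pullback of the segment between the p-th roots."""
    return [((1.0 - t) * m0 ** (1.0 / p) + t * m1 ** (1.0 / p)) ** p
            for m0, m1 in zip(mu0, mu1)]


def distance(mu0, mu1, p):
    """L^p distance between the (assumed) p-root transforms p * mu^(1/p)."""
    return p * sum(abs(m1 ** (1.0 / p) - m0 ** (1.0 / p)) ** p
                   for m0, m1 in zip(mu0, mu1)) ** (1.0 / p)


def exit_time(mu0, a, p):
    """First time at which some component of the initial value geodesic hits zero."""
    times = [p * m / -ai for m, ai in zip(mu0, a) if ai < 0]
    return min(times) if times else math.inf


def shoot(mu0, a, p, t):
    """Initial value geodesic mu0 * (1 + t*a/(p*mu0))^p, valid for 0 <= t < exit time."""
    if not 0.0 <= t < exit_time(mu0, a, p):
        raise ValueError("t lies outside the interval of existence")
    return [m * (1.0 + t * ai / (p * m)) ** p for m, ai in zip(mu0, a)]
```

For instance, with mu0 = [1.0, 2.0], a = [-1.0, 0.5] and p = 2, the first component reaches zero at t = 2, while for an initial velocity with only non-negative components exit_time returns infinity, in line with item (c).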

4. The L^p-Fisher-Rao metric and α-connections on the space of probability densities
The L^p-Fisher-Rao metric F_p and the α-divergence D^(α) define, via restriction, corresponding objects on Prob(M), which we study in this section. In particular, we will see that Prob(M) equipped with the L^p-Fisher-Rao metric corresponds geometrically to an infinite-dimensional L^p-sphere. In addition, we will see that the equivalence to the α-connection, which was established for the space of all densities in the previous section, does not hold on the space of probability densities. Consequently we obtain three different notions of p-geodesics on this space:
(1) geodesics of the restriction of the L^p-Fisher-Rao metric to Prob(M);
(2) geodesics of the α-connections on Prob(M);
(3) projections of L^p-Fisher-Rao geodesic curves (or equivalently, the α-connection ones) on Dens_+(M).
In addition, if we allow curves to leave the space of probability densities, we obtain a fourth notion:
(4) L^p-Fisher-Rao geodesics in Dens_+(M).
In analogy to the L^2 case, the induced geodesic distance between probability densities defines an L^p version of the Hellinger distance.
We will show that (2) and (3) coincide, thereby providing an explicit formula for α-geodesics on Prob(M). For a graphic summary of these constructions we refer to Figure 1. In the next section we will compare the remaining three notions of geodesics numerically.
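Notions (3) and (4) can be sketched on a finite set as follows. We assume here that the projection in (3) is the radial projection onto the L^p-sphere in p-th root coordinates, which corresponds to normalizing the total mass of the density; this reading, the use of the counting measure as λ, and the function names are assumptions made for illustration. The sketch only reproduces the image of the projected curve: the time rescaling by the solution of an ODE, which is needed to obtain the affinely parametrized α-geodesic, is not implemented.

```python
def dens_geodesic(mu0, mu1, p, t):
    """Notion (4): geodesic in Dens_+ as pullback of a segment in p-th root coordinates."""
    return [((1.0 - t) * m0 ** (1.0 / p) + t * m1 ** (1.0 / p)) ** p
            for m0, m1 in zip(mu0, mu1)]


def total_mass(mu):
    """Total mass of a measure on a finite set."""
    return sum(mu)


def projected_geodesic(mu0, mu1, p, t):
    """Notion (3), up to reparametrization: Dens_+ geodesic renormalized to mass one."""
    mu = dens_geodesic(mu0, mu1, p, t)
    mass = total_mass(mu)
    return [m / mass for m in mu]
```

For two distinct probability vectors the curve of notion (4) has total mass strictly less than one for 0 < t < 1, so it leaves Prob(M), while the projected curve stays in Prob(M) by construction.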
4.1. The Amari-Čencov α-connections on Prob(M). The restriction of the α-divergences D^(α) to the space Prob(M) again induces a family of α-connections, which we will denote by ∇^(α). Note that this connection is not simply the restriction of the α-connections on Dens_+(M), which is the reason for choosing a different notation for it. We start by deriving an explicit formula for the α-connections on Prob(M): For finite sample spaces this result is well-known (e.g., [5, Section 2.5.2]); in infinite dimensions formula (19) agrees with formula (22) in [30], under the identification Prob(M) = Diff(M)/Diff_λ(M).
Proof. To derive the formula for the α-connection ∇^(α) we calculate the second derivative of the restriction of D^(α), which is again given by formula (4), with the only difference being that a, b, c ∈ T_µ Prob(M). Thus we have determined ∇ For any k > dim(M)/2 the geodesic equations are locally well-posed on the space of Sobolev probability densities Prob^k(M), i.e., given initial conditions µ(0) ∈ Prob^k(M), µ_t(0) ∈ T_{µ(0)} Prob^k(M) there exists a unique solution to equation (6) defined on a maximal interval of existence [0, T).
The maximal interval of existence is uniform in the Sobolev order k, and thus the local well-posedness continues to hold in the limit, i.e., on the space of smooth probability densities Prob(M).
Proof. The proof of the local well-posedness follows exactly as in Theorem 3.3. □
4.2. The L^p-Fisher-Rao metric on Prob(M). Next, we study the restriction of the L^p-Fisher-Rao metric to the space Prob(M).
Remark 4.3 (Čencov's theorem). Note that Lemma 3.6 on the invariance of the L^p-Fisher-Rao metric continues to hold on the space Prob(M). For the Riemannian case and dim(M) > 1, Čencov's theorem states that the Fisher-Rao metric is the only Riemannian metric on Prob(M) that is invariant under the action of the diffeomorphism group Diff(M), cf. [16,4,9]. In the Finslerian case there is a significant amount of additional flexibility, and one can indeed construct metrics beyond the L^p-Fisher-Rao metric that satisfy this property. In future work it would be interesting to obtain a complete characterization of all such Finsler metrics.
We start by computing the geodesic equation of the restriction of the L^p-Fisher-Rao metric F_p to Prob(M):
Theorem 4.4 (Geodesic equation on Prob(M)). For any p ∈ (1, ∞), the geodesic equation of the L^p-Fisher-Rao metric on the space of probability densities Prob(M) is given by where C(t) is a constant depending only on time t, which is chosen such that ∫_M µ(t) = 1.
This equation coincides with the geodesic equation of the α-connection if and only if p = 2 (α = 0, resp.).
Remark 4.5 (Existence of solutions). In the previous section we showed that the geodesic equation of the α-connections is locally well-posed on the space Prob(M). One would be tempted to expect a similar result for the geodesic equation of the L^p-Fisher-Rao metric; recall that this statement was true on the space Dens_+(M). It turns out that the above equation is analytically much worse behaved: the problem arises from the vanishing of the quantity µ_t/µ, which leads to singularities of the geodesic equation. As a consequence, we conjecture that the geodesic equation does not admit any classical solutions. This behavior can also be observed in the numerical simulations (Figure 2), where the obtained (approximate) solutions show a singular behavior.
Proof of Theorem 4.4. To derive this equation, we proceed as for the geodesic equation on the space Dens_+(M). We then obtain again for the variation of the p-energy with the only difference being that δµ now has to integrate to zero. Thus we do not get that Ψ = 0 as we had on the space Dens_+(M), but only that Ψ has to be orthogonal to all such δµ. This is equivalent to Ψ being a constant for each fixed time t, which is determined by the condition that ∫_M µ(t) = 1. □
The above result suggests that the equivalence between the α-connection and the Chern connection of the L^p-Fisher-Rao metric cannot hold in this setting. We make this formal in the following theorem:
Theorem 4.6 (The Chern connection on Prob(M)). For a vector field ν on Prob(M) the Chern connection is given by, for all a ∈ T_µ Prob(M), (21) with the constants
Remark 4.7. As any vector field ν ∈ T_µ Prob(M) has zeros, the above formula has to be taken with caution and should be understood formally only.
Remark 4.8. In particular, when all entries are the same, the Chern connection on Prob(M) is the orthogonal projection of the α-connection ∇^(α) on Dens_+(M), for α = 1 − 2/p, with respect to g_ν, the Riemannian metric (12) induced by the L^p-Fisher-Rao metric (22) where p* is the Hölder conjugate of p and Indeed, the correction term k |ν/µ|^{2−p} µ is orthogonal to T Prob(M) and makes the integral zero.
Proof of Theorem 4.6. We start by noticing that, since Dν(a) integrates to zero, the integral of the right-hand side of (21) is zero, and so it defines a tangent vector of Prob(M). The formula (21) defines the Chern connection if and only if it satisfies the generalized Koszul formula (17).
Letting α = 1 − 2/p and ∇^(α) be the corresponding α-connection on Dens_+(M), we can decompose the candidate for the Chern connection as Since ∇^(α) is the Chern connection on Dens_+(M) for this choice of α, the candidate (21) satisfies the generalized Koszul formula if and only if (23) This also means that all terms in the Cartan tensor (14) but one vanish, leaving Finally, it remains to compute Putting all these together, and noticing that k_2(ν, ν) = −(p/(p−2)) k_1(ν), we obtain and so condition (23) is satisfied. □

4.3. The p-root transform on Prob(M). In the previous section we have seen that the α-connection and the L^p-Fisher-Rao metric induce different geodesics on the space Prob(M). In this section we investigate the geometric reasons behind this, by connecting both of these objects to the p-root transform. In order to state this result we need to define an appropriate connection on the sphere S_p := {f ∈ C^∞(M) : ∥f∥_{L^p} = 1}, as the image of Prob(M) under Φ_p is contained in this set. To this end, we define:

Definition 4.9 (p-projection and p-connection). The p-projection map π_p : T C^∞|_{S_p} → T S_p is defined by The induced p-connection on S_p is defined by The geodesic equation ∇^p_γ̇ γ̇ = 0 can therefore be written as:

(24) γ̈ ∥ γ, ∫_M γ^p dλ = 1.

Note that from a metric point of view, this splitting is natural, since f ∈ T_f C^∞ is the unique direction in which straight lines (i.e., geodesics in C^∞) emanating from f get away from S_p the fastest with respect to the L^p norm (since for p ∈ (1, ∞) the space L^p is strictly convex). Similarly, For a more general viewpoint on projections on a sphere in uniformly convex Banach spaces whose dual is also uniformly convex, see [23] and [22, Prop. 2].
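As a rough illustration of Definition 4.9, the following sketch discretizes M = [0, 1] and projects a function onto the tangent space of the L^p sphere. The explicit formula used here, π_p(h) = h − (∫ f^{p−1} h dλ) f, is an assumption: it is the projection along the direction f onto T_f S_p = {h : ∫ f^{p−1} h dλ = 0}, which matches the splitting described above but is not taken verbatim from the definition. The grid, the quadrature weights and all function names are hypothetical.

```python
import numpy as np

def lp_norm(f, p, w):
    """Discrete L^p norm with quadrature weights w."""
    return np.sum(np.abs(f) ** p * w) ** (1.0 / p)

def p_projection(f, h, p, w):
    """Assumed form of the p-projection: remove the component of h along f.

    For f on the unit L^p sphere, the result k satisfies sum(f^(p-1) * k * w) = 0,
    i.e. it is tangent to the sphere at f.
    """
    coeff = np.sum(f ** (p - 1) * h * w)  # discrete version of the integral of f^(p-1) h
    return h - coeff * f

# Midpoint grid on M = [0, 1] with uniform weights (illustrative choice).
n = 200
x = (np.arange(n) + 0.5) / n
w = np.full(n, 1.0 / n)
p = 3.0

f = 1.0 + 0.5 * np.cos(2 * np.pi * x)   # positive function
f = f / lp_norm(f, p, w)                # normalize onto the L^p sphere
h = np.sin(2 * np.pi * x) + x           # arbitrary direction in C^infty
h_tan = p_projection(f, h, p, w)

tangency = np.sum(f ** (p - 1) * h_tan * w)   # should vanish up to rounding
```

Applying `p_projection` twice gives the same result as applying it once, as expected of a projection.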
We are now able to formulate the analogue of Theorem 3.12, which demonstrates the geometric differences between the α-connections and the L^p-Fisher-Rao metric:

Theorem 4.10. Let α ∈ (−1, 1) and, as before, denote p = 2/(1 − α). Consider the restriction of the map Φ_p, as defined in (18), to the space Prob(M). We have: In particular, the geodesics of Φ_p^* ∇^p and ∇^(α) coincide.
Proof. The proof follows by the same calculation as the proof of Theorem 3.12. □

On S_p, geodesics are no longer straight lines, and we do not have an explicit solution for the geodesic equations of either the α-connection or the L^p-Fisher-Rao metric. However, by projecting straight lines onto the sphere and rescaling time, one can obtain geodesics for the α-connection (cf. [4, Section 2.5.2], where this result has been shown in the finite-dimensional situation):

Theorem 4.11. Let f ∈ S_p and ξ ∈ T_f S_p. Let I ⊂ R be an interval containing 0, and let τ : Then γ : I → S_p defined by A boundary value problem between f, g ∈ S_p can similarly be addressed by putting ξ = g − f and I = [0, 1], and replacing the initial conditions for τ by the boundary conditions τ(0) = 0, τ(1) = 1.
Geodesics of ∇ (α) are obtained by pulling back these geodesics using Φ p .They all cease to exist (i.e., leave the space Prob(M )) after finite time.Since the geodesic equation is locally well-posed (Proposition 4.2), this procedure induces all the α-connection geodesics, i.e., the exponential map of ∇ (α) .
Proof. Using (24), we need to show that γ̈ ∥ γ; all the other assumptions are satisfied by construction. We have The last addend is clearly parallel to γ. Hence it is sufficient to require that which is equivalent to the wanted ODE.
In order to prove that the pullback of the solutions leaves Prob(M) after a finite time, we need to show that γ(t) stops being positive, i.e., that for some t > 0, f(x) + τ(t)ξ(x) ≤ 0 for some x ∈ M. By the equivariance under the action of Diff(M), cf. Remark 4.3, it is sufficient to consider the case f ≡ 1 (which corresponds to µ(0) = λ). In this case ξ is a non-zero function satisfying ∫_M ξ λ = 0, and thus in particular ξ(x) < 0 for some x. Therefore, in order to prove that 1 + τ(t)ξ(x) ≤ 0 for some t, it is sufficient to prove that τ is unbounded as t → ∞. Note that we can write the equation for τ as (25) τ (t) = 2 1 − 1 Now, since s ↦ 1 + sξ is a tangent line to the unit sphere at f = 1 in the strictly convex space L^p, it follows that ∥1 + sξ∥_{L^p} ≥ 1, and equality holds if and only if s = 0. Thus, the term in the parentheses in (25) is non-negative, and vanishes if and only if τ(t) = 0. Since we also have τ(0) = 0 and τ̇(0) = 1, it follows that τ(t) > 0 for t ∈ (0, t_0) for some t_0 small enough, and thus for any positive t. It follows therefore that τ > t for all t > 0, and in particular, it is unbounded. □

Remark 4.12. In fact, the estimate τ > t implies that 1 + τ(t)ξ hits zero at some point for the first time at t* < 1/(−min ξ). Pulling back to Prob(M), we obtain that a geodesic from λ with initial condition a ∈ T_λ Prob(M) blows up at time

(26) t* < p/(−min(a/λ)).
In principle, better estimates on the blowup time can be obtained by a more careful analysis of (25). The estimate (26) is exactly the estimate obtained in [29, Formula (78)] (there, the parameter a is equivalent to −1 − 2/p in our notation).

Example 4.13 (Fisher-Rao geodesics). For the case p = 2, assuming that ξ is a unit vector (which is, by definition, perpendicular to f), we obtain that the ODE takes a simpler form whose solution is τ(t) = tan t, yielding the known solution of the Fisher-Rao geodesics [26, Remark 4.4].
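The case p = 2 can be checked numerically. The sketch below assumes that the curve in Theorem 4.11 is the radial projection γ(t) = (f + τ(t)ξ)/∥f + τ(t)ξ∥ of the straight line onto the sphere (the explicit formula is not reproduced in the text above), and verifies on a grid that the choice τ(t) = tan t yields the great circle cos(t) f + sin(t) ξ. The grid and the particular f and ξ are arbitrary illustrative choices.

```python
import numpy as np

# Midpoint grid on M = [0, 1] with uniform quadrature weights.
n = 400
x = (np.arange(n) + 0.5) / n
w = np.full(n, 1.0 / n)

def l2_inner(u, v):
    """Discrete L^2 inner product."""
    return np.sum(u * v * w)

# Base point f on the unit L^2 sphere.
f = 1.0 + 0.3 * np.cos(2 * np.pi * x)
f = f / np.sqrt(l2_inner(f, f))

# Unit tangent vector xi, orthogonal to f.
xi = np.sin(2 * np.pi * x)
xi = xi - l2_inner(xi, f) * f
xi = xi / np.sqrt(l2_inner(xi, xi))

# Project the reparametrized straight line f + tan(t) xi onto the sphere.
t = np.linspace(0.0, 1.2, 25)                 # stays below pi/2
line = f[None, :] + np.tan(t)[:, None] * xi[None, :]
norms = np.sqrt(np.sum(line ** 2 * w, axis=1))
gamma = line / norms[:, None]

# Great circle through f with initial velocity xi.
circle = np.cos(t)[:, None] * f[None, :] + np.sin(t)[:, None] * xi[None, :]
max_err = np.max(np.abs(gamma - circle))
```

The agreement follows from ∥f + tan(t) ξ∥² = 1 + tan²(t) = 1/cos²(t) when f and ξ are orthonormal.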

Summary of relations to known PDEs and metrics on diffeomorphism groups
We now summarize how the L^p-Fisher-Rao metric relates to (degenerate) right-invariant Finsler metrics on the group of diffeomorphisms, in a similar spirit to [26], where this was studied for the L^2-case. Furthermore, we will see how the geodesic equations described in this paper relate to other previously studied equations in hydrodynamics and mathematical physics:

• On the diffeomorphism group of a closed manifold M one can consider the family of right-invariant (degenerate) Ẇ^{1,p}-Finsler metrics of the form These metrics were useful for proving that the diameter of Diff(M) with respect to some critical Sobolev Riemannian metrics is infinite [12]. Note that the kernel of the Finsler metric Fp consists exactly of all divergence-free vector fields, and thus Fp is only a "true" Finsler metric on the quotient space Diff(M)/Diff_λ(M). The relation to the L^p-Fisher-Rao metric, as studied in the present article, becomes clear by considering the mapping φ ↦ Jac(φ)λ, which gives rise to an isometry Note that this result is a direct generalization of the case p = 2 treated in Khesin et al. [26]. For this case Modin [33] constructed an extension of the metric F2 to obtain a non-degenerate, right-invariant Riemannian metric on the full group of diffeomorphisms Diff(M) that still descends to the Fisher-Rao metric F_2 on Prob(M). In future work it would be interesting to consider a similar extension for the case p ≠ 2.
• Similarly, the α-connections on Prob(M) can be pulled back to Diff(M)/Diff_λ(M); the corresponding geodesic equation (which is equivalent to the one in Theorem 4.2) was first considered in [30]. Theorem 4.11 shows their integrability and finite-time blowup.
• For the special case M = S^1, where the group of volume-preserving diffeomorphisms is given by the group of rotations Rot(S^1), the α-connections on Prob(S^1) can thus be pulled back to Diff(S^1)/Rot(S^1), where the associated geodesic equation, when presented on the Lie algebra, is the generalized periodic inviscid Proudman-Johnson equation, as was first shown in [30]. See [38, 29] and the references therein for analysis of this equation, also beyond the range α ∈ (−1, 1).
• Similarly, the L^p-Fisher-Rao metric on Prob(S^1) can be considered as a Finsler metric on Diff(S^1)/Rot(S^1). The resulting geodesic equation is the periodic r-Hunter-Saxton equation for r = 1/p, as considered in [18, 11]. As shown in this paper, this is not the same equation as the one of the α-connections on Prob(S^1) (i.e., the generalized periodic inviscid Proudman-Johnson equation), unlike what we erroneously stated in [11].
• For M = R, the geodesic equations of the α-connections (equivalently, of the L^p-Fisher-Rao metric) on Dens(R) can be considered as equations on an appropriate subgroup of Diff(R), defined in [11]. The resulting equation is the generalized non-periodic inviscid Proudman-Johnson equation, or equivalently, the non-periodic r-Hunter-Saxton equation (for r = 1/p) [18]. Moreover, the metric Fp described above on this subgroup of Diff(R) yields a similar isometry to (Dens_+(R), F_p), as follows from [11]. It would be interesting to know whether (Dens_+(M), F_p) can be similarly interpreted on compact manifolds as well, maybe in a similar way to the "simple unbalanced optimal transport" extension, introduced recently in [28].

6. A numerical comparison of geodesics on Dens_+(M) and Prob(M)

In this section we numerically compare the different notions of geodesics that we have encountered in this article. Given two probability densities we consider three notions of geodesics:

(1) The geodesic for the L^p-Fisher-Rao metric and the α-connection on Dens_+(M), which is simply obtained as the pullback by the p-root transform Φ_p of the straight line in L^p. This geodesic leaves the space Prob(M).
(2) The geodesic for the α-connection on Prob(M), which is the pullback by the p-root transform of the projection of the straight line on the L^p sphere, as described in Theorem 4.11.
(3) The L^p-Fisher-Rao geodesic on Prob(M), which is the pullback by the p-root transform of the geodesic of the L^p-metric restricted to the L^p-sphere.

Specifically, we consider the example of probability densities on the one-dimensional base space M = [0, 1]. Note that we have an explicit formula for the first two notions of geodesics (geodesics on Dens_+(M) and α-connection geodesics on Prob(M)), but that the calculation of the L^p-Fisher-Rao geodesic between two probability distributions µ_0 and µ_1 requires us to solve an optimization problem: the geodesic boundary value problem on the L^p-sphere. Namely, we minimize the p-energy for the L^p metric on smooth functions where f : [0, 1] → C^∞(M) is a path constrained to belong to the L^p-sphere, such that f(0) = Φ_p(µ_0), f(1) = Φ_p(µ_1), and f_t denotes its time derivative. This is equivalent to minimizing the length functional, as explained in the proof of Theorem 3.7. We then obtain the wanted geodesic µ : [0, 1] → Prob(M) by applying Φ_p^{−1}.

In Figure 2 we show the three types of geodesics obtained for different values of p (p = 2, 3, 5, 10 from top to bottom), and the corresponding values of α = 1 − 2/p. The constrained minimization of (27) was performed in Python using the Sequential Least Squares Programming (SLSQP) method provided by the SciPy minimization solver, with a discretization of 30 time points and 100 sampling points, in a straightforward implementation that was not optimized for computational efficiency. As expected, the L^p-Fisher-Rao metric and the α-connection yield different geodesics on Prob(M), except for the special case p = 2 corresponding to the Fisher-Rao metric and its Levi-Civita connection.
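The constrained minimization described above can be sketched as follows. This is not the authors' implementation: it assumes that the discretized p-energy is the sum over time steps of ∥(f_{k+1} − f_k)/Δt∥^p_{L^p} Δt (a constant factor would not change the minimizer), it uses a much coarser grid than the 30 × 100 discretization mentioned in the text so that it runs quickly with finite-difference gradients, and the endpoint densities `mu0`, `mu1` are made up for illustration. The initial guess is the straight line between the endpoints projected onto the L^p sphere.

```python
import numpy as np
from scipy.optimize import minimize

p = 3.0
n_t, n_x = 6, 12                       # coarse illustrative grid (time points, sample points)
x = (np.arange(n_x) + 0.5) / n_x       # midpoint grid on M = [0, 1]
w = np.full(n_x, 1.0 / n_x)            # quadrature weights
dt = 1.0 / (n_t - 1)

def normalize(mu):
    """Rescale a positive function so that it integrates to one."""
    return mu / np.sum(mu * w)

# Hypothetical endpoint densities and their p-root transforms.
mu0 = normalize(np.exp(-((x - 0.3) ** 2) / 0.05) + 0.2)
mu1 = normalize(np.exp(-((x - 0.7) ** 2) / 0.05) + 0.2)
f0, f1 = mu0 ** (1.0 / p), mu1 ** (1.0 / p)

def full_path(z):
    """Attach the fixed endpoints to the free interior time slices."""
    return np.vstack([f0, z.reshape(n_t - 2, n_x), f1])

def p_energy(z):
    """Discretized p-energy of the path (assumed normalization)."""
    vel = np.diff(full_path(z), axis=0) / dt
    return np.sum(np.abs(vel) ** p * w) * dt

def sphere_constraint(z):
    """Each interior slice must have unit L^p norm."""
    inner = z.reshape(n_t - 2, n_x)
    return np.sum(np.abs(inner) ** p * w, axis=1) - 1.0

# Initial guess: straight line projected radially onto the sphere.
s = np.linspace(0.0, 1.0, n_t)[1:-1, None]
line = (1.0 - s) * f0 + s * f1
z0 = (line / np.sum(line ** p * w, axis=1, keepdims=True) ** (1.0 / p)).ravel()

res = minimize(p_energy, z0, method="SLSQP",
               constraints=[{"type": "eq", "fun": sphere_constraint}],
               options={"maxiter": 200, "ftol": 1e-10})

f_opt = full_path(res.x)
mu_opt = np.abs(f_opt) ** p            # pull back to densities via the inverse p-root map
```

Each row of `mu_opt` is an approximate probability density along the discretized path; supplying analytic gradients would be the natural next step for finer grids.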

Finite-dimensional geometry of parametric statistical models
In this section we make the link with the finite-dimensional setting of parametric statistical models. Let us consider a finite-dimensional submanifold of Prob(R^n) corresponding to a family of probability distributions on R^n that are absolutely continuous with respect to the Lebesgue measure, and whose densities are parametrized by a parameter θ belonging to an open subset Θ of R^d: Here x ∈ R^n is the sample variable and dx denotes the Lebesgue measure on R^n. Then a tangent vector of P_Θ at a given µ = f(·, θ)dx is of the form a = d/dt|_{t=0} µ_t, where µ_t = f(·, θ_t)dx with t ↦ θ_t a curve in Θ such that θ_0 = θ and θ̇_0 = u ∈ T_θ Θ. Thus the tangent space at µ is where e_i = (∂f/∂θ_i) dx. Here ∇_θ denotes the gradient with respect to θ and ⟨·, ·⟩ the Euclidean scalar product on R^d. In the sequel, we identify P_Θ ≃ Θ and T_µ P_Θ ≃ T_θ Θ ≃ R^d via the one-to-one maps (29) ϕ : Θ → P_Θ, θ ↦ f(·, θ)dx,

7.1. The Fisher-Rao metric and the α-connection. The Fisher-Rao metric on the parameter space Θ is the Riemannian metric whose metric matrix is the Fisher information matrix Here E denotes the expectation taken with respect to the random variable X of density f(·, θ), and ℓ(x, θ) = log f(x, θ) is the log-likelihood.
Definition 7.1. Given θ ∈ Θ and u, v ∈ T_θ Θ ≃ R^d, the Fisher-Rao metric is The Fisher-Rao metric on the parameter space Θ is the pullback of the Fisher-Rao metric on the infinite-dimensional space Prob(R^n) by the bijection ϕ defined by (29), i.e. for any θ ∈ Θ and , and so both are denoted the same way.
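For a concrete instance, the Fisher information matrix of the univariate normal family parametrized by θ = (m, σ) can be approximated by quadrature, using the standard expression G_ij = E[∂_i ℓ ∂_j ℓ]. The sketch below is only an illustration (function names and quadrature parameters are arbitrary); it recovers the classical closed form diag(1/σ², 2/σ²).

```python
import numpy as np

def normal_pdf(x, m, sigma):
    """Density of the normal distribution with mean m and standard deviation sigma."""
    return np.exp(-0.5 * ((x - m) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def score(x, m, sigma):
    """Gradient of the log-likelihood with respect to (m, sigma)."""
    z = (x - m) / sigma
    return np.stack([z / sigma, (z ** 2 - 1.0) / sigma])

def fisher_matrix(m, sigma, n=20001, width=12.0):
    """Approximate G_ij = E[d_i l * d_j l] by a Riemann sum on a wide grid."""
    x = np.linspace(m - width * sigma, m + width * sigma, n)
    dx = x[1] - x[0]
    s = score(x, m, sigma)                     # shape (2, n)
    dens = normal_pdf(x, m, sigma)
    return np.sum(s[:, None, :] * s[None, :, :] * dens, axis=-1) * dx

G = fisher_matrix(-2.0, 1.5)
G_closed_form = np.diag([1.0 / 1.5 ** 2, 2.0 / 1.5 ** 2])
```

The matrix depends on σ only, which is consistent with the hyperbolic (Poincaré half-plane) structure of the normal family mentioned in the introduction.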
Just like in the infinite-dimensional setting, the α-connection on the parameter space can be defined using the α-divergence.
Definition 7.2.The α-connection on Θ is defined by its Christoffel symbols of the first kind ([39], Eqn 2.9) is the α-divergence.This yields the following formula in local coordinates, where The following result is well-known in the literature, and stated e.g. in [2] for spaces of probability distributions on a finite set.Theorem 7.3 (α-connection on Θ).For any u, v ∈ T θ Θ, we have where ∇(α) and ∇ (α) denote the α-connections on P Θ ≃ Θ and Prob(R n ) respectively, and Proj FR : is the orthogonal projection with respect to the Fisher-Rao metric.
Proof.First notice that at any µ = ϕ(θ) ∈ P Θ , the orthogonal projection of a tangent vector a ∈ T µ Prob(R n ) onto T θ Θ with respect to the Fisher-Rao metric G FR is given by where (G ij ) ij is the inverse of the Fisher matrix.Indeed, the tangent space T µ P Θ is a d-dimensional vector space spanned by the tangent vectors e i = ∂ i f dx for i = 1, . . ., d, and so the orthogonal projection of a ∈ T µ Prob(R n ) onto T µ P Θ is given by u i e i where for j = 1, . . ., d, G FR (a − u i e i , e j ) = 0 i.e.G FR (a, e j ) = u i G FR (e i , e j ) = u i G ij .
The α-connection on Prob(R n ) is given by where D µ b(a) is the directional derivative of the vector field b in the direction of the vector a µ .Let ∂ i denote partial derivative with respect to θ i for all i = 1, . . ., d.For vector fields on the finite-dimensional manifold P Θ , and at µ = ϕ(θ), we get since where in the last equality we used the equality Remembering that G FR (hdx, kdx) = E(hk/f 2 ) and since E(∂ m ℓ) = 0, we obtain using (30), which concludes the proof.□ 7.2.The L p -Fisher-Rao metric.We now introduce a finite-dimensional version of the Finsler L p -Fisher-Rao metric.
Definition 7.4. Given θ ∈ Θ and v ∈ T_θ Θ we define the L^p-Fisher-Rao metric on Θ as Here ⟨·, ·⟩ denotes the Euclidean scalar product on R^d, E denotes the expectation taken with respect to the random variable X of density f(·, θ), and ℓ(x, θ) = log f(x, θ) is the log-likelihood.
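The displayed formula (32) is not reproduced above; the ingredients listed after it suggest the form F_p(θ, v) = (E|⟨∇_θ ℓ(X, θ), v⟩|^p)^{1/p}, and the sketch below adopts this as an assumption. It evaluates the expression by quadrature for the normal family with θ = (m, σ) and compares the case p = 2 with the Fisher-Rao norm. All names and numerical parameters are illustrative.

```python
import numpy as np

def normal_pdf(x, m, sigma):
    """Density of the normal distribution with mean m and standard deviation sigma."""
    return np.exp(-0.5 * ((x - m) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def lp_fisher_rao(theta, v, p, n=20001, width=12.0):
    """Assumed L^p-Fisher-Rao norm (E|<grad_theta l, v>|^p)^(1/p) for the normal family."""
    m, sigma = theta
    x = np.linspace(m - width * sigma, m + width * sigma, n)
    dx = x[1] - x[0]
    z = (x - m) / sigma
    # Directional derivative of the log-likelihood in the direction v = (v_m, v_sigma).
    directional = v[0] * z / sigma + v[1] * (z ** 2 - 1.0) / sigma
    return (np.sum(np.abs(directional) ** p * normal_pdf(x, m, sigma)) * dx) ** (1.0 / p)

theta = (0.0, 2.0)
v = np.array([1.0, -0.5])

f2 = lp_fisher_rao(theta, v, 2.0)
# Fisher-Rao norm from the Fisher matrix diag(1/sigma^2, 2/sigma^2).
f2_closed_form = np.sqrt((v[0] ** 2 + 2.0 * v[1] ** 2) / theta[1] ** 2)
f3 = lp_fisher_rao(theta, v, 3.0)
```

Under this assumed form the metric is positively homogeneous of degree one in v, and it increases with p, since L^p norms with respect to a probability measure are monotone in p.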
The metric (32) on the parameter space Θ coincides with the Finsler metric induced on P_Θ by the L^p-Fisher-Rao metric (8) through the identification P_Θ ≃ Θ, which is why they are denoted the same way. Indeed, for any (θ, v) ∈ TΘ,

Lemma 7.5 (Induced Chern connection on Θ). The Chern connection associated to the L^p-Fisher-Rao metric on Θ is given by where g and C respectively denote the Riemannian metric (12) and Cartan tensor (14) induced by the L^p-Fisher-Rao metric, (g_v)_{ij} = g_{ϕ_* v}(e_i, e_j) and (e_i)_i are the basis vectors (28) of T_µ P_Θ and

Proof. Let a = ϕ_* u, ν = ϕ_* v, α = 1 − 2/p, and ∇^(α) be the α-connection on Dens_+(R^n). Similarly to the orthogonal projection with respect to the Fisher-Rao metric (31), the orthogonal projection on TΘ with respect to g_ν is given by We define the connection ∇ by Let us show that ∇ is the Chern connection ∇v on Θ, by showing once again that it satisfies the generalized Koszul formula (40). Using the notations g_v(u, w) = g_{ϕ_* v}(ϕ_* u, ϕ_* w), C_v(u, w, z) = C_{ϕ_* v}(ϕ_* u, ϕ_* w, ϕ_* z) and the fact that C_v(v, ·, ·) = 0, the generalized Koszul formula can be written Recalling that ∇^(α) is the Chern connection on Dens_+(M) and noticing that ϕ which is easily checked to be true using the fact that g_ν(e_i, e_j) = (g_v)_{ij}. To obtain the desired formula for ∇v, we write the α-connection in coordinates, through the same computations as in the proof of Theorem 7.3 Using (35) we obtain ), e_i) e_j which inserted into (36) gives the desired result. □

Remark 7.6. As in infinite dimensions (see Remark 4.8), when all entries are the same, the Chern connection on P_Θ ≃ Θ is the orthogonal projection of the α-connection ∇^(α) on Dens_+(R^n), for α = 1 − 2/p, with respect to g_{ϕ_* v}, the Riemannian metric (12) induced by the L^p-Fisher-Rao metric

Theorem 7.7 (Geodesic equation on Θ). The geodesic equation of the L^p-Fisher-Rao metric on the space P_Θ is given by

(38) θ̈^m + (g_θ̇)^{mk} g_{ϕ_* θ̇}(ω(θ̇, θ̇), e_k) = 0,

where (e_i)_i are the basis vectors (28) and ω is defined by (34).
Proof. This results directly from Lemma A.9 in Appendix A and writing ∇^θ̇_θ̇ θ̇ = 0 in local coordinates using (37). □

In both cases, we solve the geodesic ODE with boundary constraints in Python for a discretization of 50 time steps, using the dedicated function in SciPy, which implements a fourth-order collocation algorithm. We plot in Figure 3 the L^p-Fisher-Rao geodesics for several values of p as well as the α-geodesics for the corresponding values of α. As expected, these geodesics do not coincide, except for p = 2, where we retrieve the Fisher-Rao metric.
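To illustrate the boundary value approach in the simplest case, the sketch below solves the geodesic problem for p = 2 (the Fisher-Rao metric on normal distributions) between (m, σ) = (−2, 1) and (2, 1), the endpoints used in Figure 3. It assumes that the SciPy routine referred to is `scipy.integrate.solve_bvp`, which is a fourth-order collocation solver; the text does not name the function. The ODE is the Levi-Civita geodesic equation of the metric (dm² + 2 dσ²)/σ², not the general equation (38), and the initial guess is an arbitrary choice.

```python
import numpy as np
from scipy.integrate import solve_bvp

def geodesic_rhs(t, y):
    """Fisher-Rao geodesic equation for normal distributions, as a first-order system.

    y = (m, sigma, dm/dt, dsigma/dt), metric (dm^2 + 2 dsigma^2) / sigma^2.
    """
    m, s, dm, ds = y
    return np.vstack([dm,
                      ds,
                      2.0 * dm * ds / s,
                      ds ** 2 / s - dm ** 2 / (2.0 * s)])

def boundary(ya, yb):
    """Endpoints (m, sigma) = (-2, 1) at t = 0 and (2, 1) at t = 1."""
    return np.array([ya[0] + 2.0, ya[1] - 1.0, yb[0] - 2.0, yb[1] - 1.0])

t = np.linspace(0.0, 1.0, 50)                      # 50 time steps, as in the text
guess = np.vstack([-2.0 + 4.0 * t,                 # m: straight interpolation
                   1.0 + 0.7 * np.sin(np.pi * t),  # sigma: arch above the endpoints
                   np.full_like(t, 4.0),
                   0.7 * np.pi * np.cos(np.pi * t)])

sol = solve_bvp(geodesic_rhs, boundary, t, guess, tol=1e-6)
m_path, sigma_path = sol.sol(t)[:2]

# Fisher-Rao geodesics of this family lie on half-ellipses (m - c)^2 + 2 sigma^2 = const;
# for these symmetric endpoints, c = 0 and the constant is 6.
ellipse_residual = np.max(np.abs(m_path ** 2 + 2.0 * sigma_path ** 2 - 6.0))
```

For p ≠ 2 the right-hand side would have to be replaced by the coordinate expression of (38), which requires the quantities g and ω evaluated along the curve.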

Appendix A. Infinite dimensional Finsler geometry
In this appendix we present several key definitions of Finsler geometry in the infinite-dimensional setting. We base our definitions on their counterparts from classical finite-dimensional Finsler geometry, see e.g. [6, 17, 36].
In the following let M be an infinite-dimensional Fréchet manifold with tangent bundle TM.

Remark A.2. It can be shown that the strong convexity condition (d) implies the subadditivity condition (c), and several modern textbooks require strong convexity instead of subadditivity in the definition of a Finsler metric, as this allows one to develop several concepts of Riemannian geometry in the Finslerian setting. We choose not to assume this stronger condition, as our main example, the L^p-Fisher-Rao metric, is not strongly convex.
Remark A.3 (Weak and strong Finsler metrics). For each x ∈ M the Finsler metric F induces a topology on T_x M, and in finite dimensions this topology coincides with the original manifold topology. In infinite dimensions this is not the case, and we distinguish between two types of Finsler metrics: strong Finsler metrics, for which F_x induces the locally convex topology on T_x M, and weak Finsler metrics, where the induced topology can be weaker than the locally convex topology. If M is not a Banach manifold, then any Finsler metric on M can only be a weak Finsler metric.
Similarly to a Riemannian metric, a Finsler structure F on a manifold M defines a length structure on the set of piecewise smooth curves, and thus one can define a corresponding path length distance: for any pair of points x, y ∈ M we consider the induced geodesic distance function d_F(x, y) := inf_c L_F(c), where the infimum is taken over the set of piecewise smooth curves that connect x to y. As in Riemannian geometry, one can show that minimizing the length is equivalent to minimizing the energy, which is defined as

Remark A.5 (Vanishing geodesic distance). It is easy to see that the geodesic distance function is symmetric and satisfies the triangle inequality. In general, for weak Finsler metrics, it does not satisfy the non-degeneracy property, namely that d_F(x, y) = 0 if and only if x = y. Indeed, even in the Riemannian case, several examples have been encountered where the geodesic distance is degenerate or even vanishes identically, see e.g. [20, 32, 10, 25].
Next we will introduce two important concepts from Finsler geometry: the Cartan tensor, which was introduced by E. Cartan [14] to evaluate the differences between Finsler metrics and Riemannian metrics, and the Chern connection, which is a generalization of the Levi-Civita connection on a Finsler manifold.
Note that the definition of these two objects requires that the Finsler metric is strongly convex. As the L^p-Fisher-Rao metric, studied in the following sections, does not satisfy this property, several of the calculations in these parts have to be taken with caution and should thus be understood only formally.
Definition A.6 (Cartan tensor and Chern connection). Let (M, F) be a Finsler manifold, where F is assumed to satisfy the strong convexity assumption. For any nonzero tangent vector V ∈ T_x M, the Cartan tensor is defined as the symmetric trilinear form C_V(X, Y, Z) := 1 4 and the Chern connection, if it exists, is the unique affine, torsion-free connection ∇^V that is almost metric, that is for vector fields X, Y, Z we have

Remark A.7. In the above definition of the Chern connection we have added the assumption of its existence. This additional assumption is not necessary in finite dimensions; the possible non-existence is entirely an infinite-dimensional phenomenon, see e.g. [8], where the authors studied a Riemannian metric on a group of diffeomorphisms such that the corresponding Levi-Civita connection does not exist.
The next lemma, which is of importance when we show the equivalence of the Chern connection of the L^p-Fisher-Rao metric and the α-connection on Dens(M), provides a generalized Koszul formula for the Chern connection:

Lemma A.8. Let (M, F) be a Finsler manifold, where F is assumed to satisfy the strong convexity assumption. For every non-zero vector field V ∈ T_x M, the Chern connection, if it exists, satisfies

Figure 1. Geometric structures on Dens_+(M) and Prob(M) via the p-root transform: The map µ ↦ (µ/λ)^{1/p} maps Dens_+(M) to (a subset of) L^p(λ), and Prob(M) to its unit sphere S_p. On L^p(λ) there is the natural Finsler metric ∥·∥_{L^p} and the trivial connection ∇^tr of a vector space; the geodesics of both are straight lines. Their pullbacks via the p-root map yield (up to a constant) the L^p-Fisher-Rao metric F_p and the α-connection ∇^(α) on Dens_+(M), whose geodesic equations coincide. The metric ∥·∥_{L^p} naturally restricts to S_p. The connection ∇^tr induces a connection on S_p via the natural projection π_p : T L^p(λ)|_{S_p} → T S_p. The geodesics of this induced metric and connection differ. Their pullbacks via the p-root map yield (up to a constant) F_p and the α-connection ∇^(α) on Prob(M).

and, as before, denote p = 2/(1 − α). Define the map Φ_p : Dens_+(M) → C^∞(M) by Φ_p(µ) = (µ/λ)^{1/p}. We have:
(a) The image Φ_p(Dens_+(M)) is the set of all positive functions in C^∞(M).
(b) The mapping Φ_p is an isometric embedding, where Dens_+(M) is equipped with a multiple of the L^p-Fisher-Rao metric and where C^∞(M) is viewed as a vector space equipped with the standard L^p-norm.
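Statement (b) can be checked numerically on a grid. The sketch below uses the p-root map µ ↦ (µ/λ)^{1/p} from the caption of Figure 1 and assumes that the L^p-Fisher-Rao metric has the form F_p(µ, a) = (∫ |a/µ|^p µ)^{1/p}; its definition (8) is not part of this excerpt, so this form is an assumption. Under it, the L^p norm of the derivative of the p-root map equals F_p divided by p, which is the "multiple" referred to in (b). The density µ, the tangent vector a and the grid are arbitrary illustrative choices, with λ the uniform density on [0, 1].

```python
import numpy as np

n = 500
x = (np.arange(n) + 0.5) / n          # midpoint grid on M = [0, 1]
w = np.full(n, 1.0 / n)               # quadrature weights
lam = np.ones(n)                      # reference density lambda (uniform)
p = 3.0

def p_root(mu):
    """p-root transform of a positive density."""
    return (mu / lam) ** (1.0 / p)

def d_p_root(mu, a):
    """Derivative of the p-root transform at mu in the direction a."""
    return (1.0 / p) * (mu / lam) ** (1.0 / p - 1.0) * a / lam

def lp_norm(h):
    """L^p(lambda) norm of a function."""
    return np.sum(np.abs(h) ** p * lam * w) ** (1.0 / p)

def fisher_rao_p(mu, a):
    """Assumed form of the L^p-Fisher-Rao metric."""
    return np.sum(np.abs(a / mu) ** p * mu * w) ** (1.0 / p)

mu = 1.0 + 0.5 * np.cos(2 * np.pi * x)      # positive density
a = np.sin(2 * np.pi * x) + 0.3             # tangent vector on Dens_+(M)

ratio = lp_norm(d_p_root(mu, a)) / fisher_rao_p(mu, a)

# Finite-difference check of the derivative formula.
eps = 1e-6
fd = (p_root(mu + eps * a) - p_root(mu)) / eps
fd_error = np.max(np.abs(fd - d_p_root(mu, a)))
```

The ratio is independent of the choice of µ and a, which is the discrete counterpart of the isometry statement.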

[5, Page 50]. Corollary 3.13 (The geometry of the L^p-Fisher-Rao metric). For any p > 1 we have the following statements:

Lemma 4.1. For any α ∈ (−1, 1) the α-connections ∇^(α) on Prob(M) are given by b on Prob(M) is the projection of ∇^(α)_a b with respect to the Fisher-Rao metric G_FR.

(α) up to a function in the G_FR orthogonal complement of T_µ Prob(M), which are exactly the constant multiples of µ. Thus the formula follows by ensuring that ∇^(α)_a b ∈ Prob(M). This argument also proves that ∇^(α)_a b is the Fisher-Rao projection of ∇

Theorem 4.2. A path µ : [0, 1] → Prob(M) is a geodesic with respect to ∇^(α) if

Noticing that, for all b ∈ T_µ Prob(M), I(|ν/µ|^{2−p} µ, b) = ∫_M b = 0, we see from (12) that

(a) The image Φ_p(Prob(M)) is the set of all positive functions in the L^p-sphere S_p.
(b) The mapping Φ_p is an isometric embedding, where Prob(M) is equipped with a multiple of the L^p-Fisher-Rao metric and where S_p is equipped with the restriction of the standard L^p-norm.
(c) The pullback Φ_p^* ∇^p to Prob(M) coincides with the connection ∇^(α) up to a constant depending only on the footpoint:

Figure 2. Different notions of geodesics between two probability distributions on [0, 1], for p = 2, 3, 5, 10 from top to bottom, and corresponding values of α = 1 − 2/p. On the left: geodesics of Dens_+(M) for the L^p-Fisher-Rao metric and the corresponding α-connection. In the middle: α-geodesics on Prob(M). On the right: L^p-Fisher-Rao geodesics on Prob(M). The last two notions coincide only for p = 2.

Figure 3. Comparison of the geodesics between two normal distributions shown in the parameter space for the L^p-Fisher-Rao metric (left) and for the α-connection (right), for different values of p and the corresponding values of α = 1 − 2/p. The geodesics all start at the normal distribution of parameters m_0 = −2, σ_0 = 1, and end at the normal distribution of parameters m_1 = 2, σ_1 = σ_0 = 1.

Example 7.8 (Normal distributions). Let us consider the example of univariate normal distributions, parametrized by mean and standard deviation. The parameter space is the upper half-plane Θ = R × R*_+, and the Fisher-Rao metric, after a change of coordinates, is the hyperbolic metric of the Poincaré half-plane. The family of Riemannian metrics induced by the Finsler L^p-Fisher-Rao metric are also Poincaré metrics: for any θ = (m, σ) ∈ Θ and v ∈ R^2, the metric matrix is given by g^v_(m,σ) = (1/σ^2) g^v_0, where g^v_0 does not depend on m and σ. In order to compute geodesics for the L^p-Fisher-Rao metric, one can solve the geodesic equation (38), using the following densities with respect to a given µ(dx) = (1/(√(2π) σ)) exp(−(x − m)^2/(2σ^2)) dx: the basis vectors of the tangent plane T_µ P_Θ are given by e_1/µ = z/σ and e_2/µ = (1/σ)(z^2 − 1), with z := (x − m)/σ, and for a given curve t ↦ θ(t) = (m(t), σ(t)), 2 − 1)) ṁ/σ. The L^p-Fisher-Rao geodesic can be compared to the solutions of the geodesic equation of the α-connection for α = 1 − 2/p:

Definition A.1 (Finsler structure). A Finsler structure on M is a function F : TM → [0, ∞) that is smooth on the complement of the zero section of TM and satisfies, for all x ∈ M and X, Y ∈ T_x M:
(a) F(λY) = λF(Y) for all λ > 0;
(b) F(Y) ≥ 0, and F(Y) = 0 if and only if Y = 0;
(c) F(X + Y) ≤ F(X) + F(Y).
The Finsler norm F is called strongly convex if we have in addition:
(d) For any 0 ≠ V ∈ T_x M the Hessian matrix g_V of F^2 at V exists and is positive definite, where

g_V(X, Y) := \frac{1}{2} \frac{\partial^2}{\partial s \, \partial t} F^2(V + sX + tY) \Big|_{s=t=0}.
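As a finite-dimensional illustration of condition (d), the sketch below approximates g_V(X, Y) = ½ ∂²/∂s∂t F²(V + sX + tY) at s = t = 0 by central differences, for the ℓ^p norm on R³ playing the role of F. This toy norm, the vectors and the step size are illustrative assumptions and are not taken from the paper. Positive homogeneity of F implies g_V(V, V) = F(V)², which the code checks.

```python
import numpy as np

p = 4.0

def finsler_norm(v):
    """Toy Finsler norm: the l^p norm on R^3."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

def fundamental_tensor(V, X, Y, h=1e-3):
    """Central-difference approximation of (1/2) d^2/(ds dt) F^2(V + sX + tY) at s = t = 0."""
    def sq(s, t):
        return finsler_norm(V + s * X + t * Y) ** 2
    mixed = (sq(h, h) - sq(h, -h) - sq(-h, h) + sq(-h, -h)) / (4.0 * h * h)
    return 0.5 * mixed

V = np.array([1.0, 2.0, -1.5])
X = np.array([0.3, -0.2, 0.5])
Y = np.array([-0.4, 0.1, 0.2])

g_VV = fundamental_tensor(V, V, V)     # equals F(V)^2 by homogeneity
g_XY = fundamental_tensor(V, X, Y)
g_YX = fundamental_tensor(V, Y, X)     # symmetry of the Hessian
g_XX = fundamental_tensor(V, X, X)
```

In the Riemannian case p = 2 the tensor g_V does not depend on V; its dependence on V for p ≠ 2 is what the Cartan tensor measures.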