1 Introduction

1.1 Overview, motivation

Semantic image segmentation, a.k.a. image labeling, denotes the problem of partitioning an image into meaningful parts. Applications abound and include the interpretation of traffic scenes by computer vision systems, medical image analysis, remote sensing, etc. The state of the art is based on deep networks trained on very large data sets. A recent survey [1] reviews a vast number of different deep network architectures and their empirical performance on various benchmark data sets. Among the challenges discussed in [1, Sec. 6.3], the authors write: “... a concrete study of the underlying behavior/dynamics of these models is lacking. A better understanding of the theoretical aspects of these models can enable the development of better models curated toward various segmentation scenarios.”

In [2], a class of dynamical systems for image labeling, called assignment flows, was introduced in order to contribute to the mathematics of deep networks and learning. We refer to Sect. 3 for a precise definition and to [3] for a review of recent related work. Assignment flows correspond to solutions W(t) of a high-dimensional system of coupled ordinary differential equations (ODEs) of the form

$$\begin{aligned} \dot{W}(t) = \mathcal {R}_{W(t)}[F\big (W(t)\big )], \end{aligned}$$
(1)

that evolve on the so-called assignment manifold \(\mathcal {W}\). Each ODE of this system is a replicator equation [4, 5]

$$\begin{aligned} \dot{W}_{i} = R_{W_{i}}F_{i}(W),\qquad \dot{W}_{ij} = W_{ij}\Bigg (F_{ij}(W)-\sum _{l=1}^n W_{il}F_{il}(W)\Bigg ),\quad i \in [m],\; j\in [n], \end{aligned}$$
(2)

whose solution \(W_{i}(t)\in \mathcal {S}:={{\,\textrm{rint}\,}}\Delta _{n-1}\subset \mathbb {R}_{+}^{n}\) evolves on the relative interior of the probability simplex that is equipped with the Fisher–Rao metric g [6] and is labeled by a vertex \(i\in \mathcal {V}\) of an underlying graph \(\mathcal {G}=(\mathcal {V},\mathcal {E})\). The assignment manifold \(\mathcal {W}=\mathcal {S}\times \cdots \times \mathcal {S}\) is the product of the Riemannian manifolds \((\mathcal {S},g)\) with respect to all vertices \(i\in \mathcal {V}\).

The essential components of the vector field of (2) are a collection of affinity functions \(F_{ij}:\mathcal {W}\rightarrow \mathbb {R}\) that measure the affinity (fitness, etc.) of the classes (types, species, etc.) \(j\in [n]\). The differences of these affinity values to their expected (or average) value on the right-hand side of (2), together with the multiplication by \(W_{ij}\), define the replicator equation. For suitably defined affinity functions, the solution of this equation is supposed to perform a selection of some class j: \(W_{i}(t)\) converges for \(t\rightarrow \infty \) to a vertex \(e_{j} \in \Delta _{n-1}\) and in this sense encodes the decision to assign the class label j to the vertex \(i\in \mathcal {V}\) and to any data indexed by i, like e.g. the color value in some image; see Sect. 3.2 for more details.

The basic idea underlying the assignment flow approach (1) is to assign a replicator equation to each vertex of an underlying graph and to couple them through smooth nonlinear interactions of the assignment vectors \(\{W_{k}:k\in \mathcal {N}_{i}\subset \mathcal {V}\}\) within neighborhoods \(\mathcal {N}_{i}\) around each vertex \(i\in \mathcal {V}\). This is why the argument of \(F_{i}\) in (2) is W rather than \(W_{i}\). As a consequence, dynamic label assignments are performed by solving (1) at each vertex i depending on the context in terms of all other decisions. The fact that W(t) assigns class labels at each vertex when \(t\rightarrow \infty \) is not clear a priori but depends on F. We refer to [7] for the study of a basic instance of F and sufficient conditions that ensure unique labeling decisions.
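
For readers who prefer code, the following minimal numpy sketch evaluates the coupled replicator vector field (2) for a toy affinity function (uniform averaging of all assignment vectors, chosen here purely for illustration; the admissible class of affinity functions is characterized in Theorem 3.3 below).

```python
import numpy as np

m, n = 5, 3                          # number of vertices and labels
rng = np.random.default_rng(0)

W = rng.random((m, n))
W /= W.sum(axis=1, keepdims=True)    # row-stochastic: a point on the assignment manifold

def F(W):
    # placeholder affinity: every vertex sees the average assignment vector
    # (a crude stand-in for the neighborhood interactions described above)
    return np.tile(W.mean(axis=0), (m, 1))

def replicator_field(W):
    # row-wise replicator equation (2): W_ij * (F_ij(W) - <W_i, F_i(W)>)
    FW = F(W)
    return W * (FW - (W * FW).sum(axis=1, keepdims=True))

V = replicator_field(W)
print(np.allclose(V.sum(axis=1), 0.0))   # each row sums to zero: the flow stays on the simplex
```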

The connection to deep networks results from approximating the flow by geometric integration. The simplest such scheme among a range of proper schemes [8], the geometric Euler scheme with discrete time index t and step size \(h_{(t)}\), yields the iterative update rule

$$\begin{aligned} W_{i}^{(t+1)} = {{\,\textrm{Exp}\,}}_{W_{i}^{(t)}} \circ {R}_{W_{i}^{(t)}}\left( h_{(t)} F_{i}(W^{(t)})\right) ,\quad i\in \mathcal {V}, \end{aligned}$$
(3)

where \({{\,\textrm{Exp}\,}}:T\mathcal {W}\rightarrow \mathcal {W}\) denotes the exponential map of the so-called e-connection of information geometry [6, 9]. The key observation to be made here is that for the choice of a linear affinity map F, see Sect. 4.2, the right-hand side of (3) involves the two essential ingredients of most deep network architectures:

  1. A linear operation at each vertex of the underlying graph parametrized by network parameters, here given as part of the definition of the linear affinity map F.

  2. A pointwise smooth nonlinearity, here given by the exponential and replicator maps \({{\,\textrm{Exp}\,}}_{W_{i}} \circ {R}_{W_{i}}\); a numerical sketch of one update step (3) follows this list.
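
The update (3) can be sketched in a few lines. The closed-form expression for \({{\,\textrm{Exp}\,}}_{W_{i}} \circ {R}_{W_{i}}\) used below, \(W_{i}\diamond e^{h F_{i}}/\langle W_{i}, e^{h F_{i}}\rangle \), is the lifting map commonly used in the assignment flow literature and is assumed here to coincide with the simplified formula mentioned in Remark 1.1; the constant affinity matrix is a toy choice.

```python
import numpy as np

def geometric_euler_step(W, F_W, h):
    """One update of the geometric Euler scheme (3).

    Assumes the closed-form lifting map
        Exp_{W_i} o R_{W_i}(h F_i) = (W_i * exp(h F_i)) / <W_i, exp(h F_i)>,
    as commonly stated in the assignment flow literature (cf. Remark 1.1).
    """
    Z = W * np.exp(h * F_W)
    return Z / Z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
W = np.full((4, 3), 1.0 / 3.0)           # 4 vertices, 3 labels, start at the barycenter
F_W = rng.standard_normal((4, 3))        # toy state-independent affinities

for _ in range(200):
    W = geometric_euler_step(W, F_W, h=0.1)

print(W.round(3))                        # rows concentrate on argmax_j F_ij (label selection)
```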

Related Work. The connection between general continuous-time ODEs and deep networks has been picked out as a central theme by [10, 11]; this viewpoint classifies the assignment flow as a particular ‘Neural ODE’. The above-mentioned limited understanding of what deep networks really do underlines the importance of characterizing and understanding the dynamics (1) of assignment flows.

Concerning the geometry of the probability simplex, information geometry and relations to the general concepts of Lagrangian and Hamiltonian geometric mechanics, we refer to the recent general survey [12]. Further papers devoted to this connection, from the viewpoint of regular exponential families of distributions, include [13, 14]. Our paper also relies on the established concepts of geometric mechanics [15] and information geometry [6, 9] and focuses on two specific aspects: (i) the assignment flow system (1), which couples equations of the form (2) through an affinity function F, and (ii) the specification of a class of affinity functions F which makes it possible to characterize any solution to (1) as a critical point of a novel action functional (Theorem 3.3).

Remark 1.1

In general, numerical integration schemes on manifolds involving exponential maps can be challenging to compute. However, in the present case of the e-connection on simplices, an explicit expression for the corresponding exponential map in the geometric Euler scheme (3) exists, see [9, Proposition 2.4, p. 43f]. A simplified formula of the right-hand side expression in (3) is given as Equation (129) below.

1.2 Contribution, organization

The aim of this paper is to exhibit a special Lagrangian \(L:T\mathcal {W}\rightarrow \mathbb {R}\) of the form kinetic minus potential energy, with a specific choice for the potential energy, and to characterize solutions W(t) to (1) as stationary points of the corresponding action functional

$$\begin{aligned} \mathcal {L}(W) = \int _{0}^{t}L(W,\dot{W}) dt. \end{aligned}$$
(4)

Our result generalizes the result of a recent paper [16], where an action functional was introduced for the evolution p(t) of a single discrete probability vector on the corresponding probability simplex. By contrast, equation (1) couples the evolution of a (typically large) number of assignment vectors across the underlying graph. In particular, we characterize precisely the admissible class of affinity functions F that establishes the connection between (4) and the corresponding Euler–Lagrange equation, a condition that is missing in [16]; see also Sect. 4.4. Furthermore, using the Legendre transform, we compute an explicit expression of the Hamiltonian system associated to (4) in the form of the equivalent Lagrangian system on \(T{\mathcal {W}}\). Finally, we show that, except for starting points in a specific set of measure zero (the set of Mañé critical points), solutions of the assignment flow are reparametrized geodesics of the so-called Jacobi metric.

This paper considerably elaborates the conference version [17].

The paper is organized as follows. Section 2 collects basic notions of geometric mechanics that are required in the remainder of the paper. The assignment flow and our novel results are presented in Sect. 3, followed by a discussion in Sect. 4. We conclude in Sect. 5.

1.3 Basic notation

In accordance with the standard notation in differential geometry, coordinates of vectors have upper indices. For any \(k \in \mathbb {N}\), we set \([k]:= \{ 1, \ldots , k\} \subset \mathbb {N}\). The standard basis of \(\mathbb {R}^d\) is denoted by \(\{e_1, \ldots , e_d\}\) and we set \(\mathbb {1}_d:= (1, \ldots , 1)^\top \in \mathbb {R}^d\).

Depending on the arguments, \(\langle a, b\rangle \) denotes the Euclidean inner product of vectors or the inner product \(\langle A, B\rangle ={{\,\textrm{tr}\,}}(A^{\top } B)\) of matrices inducing the Frobenius norm \(\Vert A\Vert _{F}=\langle A, A\rangle ^{1/2}\). The identity matrix is denoted by \(I_d \in \mathbb {R}^{d\times d}\) and the i-th row vector of any matrix A by \(A_i\).

The linear dependence of a mapping F on its argument x is indicated by square brackets F[x], if F is just a matrix we simply write Fx. The adjoint of a linear operator \(F:\mathbb {R}^{m\times n} \rightarrow \mathbb {R}^{m\times n}\) with respect to the standard matrix inner product on \(\mathbb {R}^{m\times n}\) is denoted by \(F^*\) and fulfills

$$\begin{aligned} \langle F^*[A], B\rangle = \langle A, F[B]\rangle , \quad \text {for all } A, B \in \mathbb {R}^{m\times n} \end{aligned}$$
(5)

Inequalities between vectors or matrices are to be understood componentwise. For \(a, b \in \mathbb {R}^d\), we denote componentwise multiplication (Hadamard product) by

$$\begin{aligned} a \diamond b := (a^1b^1, \ldots , a^d b^d)^\top \end{aligned}$$
(6)

and, if all components of b are nonzero, componentwise division by \(\frac{a}{b} = (\frac{a^1}{b^1}, \ldots , \frac{a^d}{b^d})^\top \). We further set

$$\begin{aligned} a^{\diamond k} := a^{\diamond (k-1)}\diamond a\quad \text {and}\quad a^{\diamond 0} := \mathbb {1}_d. \end{aligned}$$
(7)

Finally, if \(p \in \mathbb {R}^d\) is a probability vector, i.e. \(p \ge 0\) and \(\langle p, \mathbb {1}_d\rangle = 1\), then the expected value and variance of a vector \(a \in \mathbb {R}^d\) (interpreted as a random variable \(a :[d] \rightarrow \mathbb {R}\)) are

$$\begin{aligned} \mathbb {E}_p[a] = \langle p, a\rangle \quad \text {and} \quad \textrm{Var}_p(a) = \mathbb {E}_p[a^2] - (\mathbb {E}_p[a])^2 = \langle p, a^{\diamond 2}\rangle - \langle p, a\rangle ^2. \end{aligned}$$
(8)
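
As a quick numerical sanity check of the notation (6)-(8), the following snippet verifies the variance identity for a random probability vector.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
p = rng.random(d)
p /= p.sum()                              # probability vector: p >= 0, <p, 1> = 1
a = rng.standard_normal(d)

E_p  = p @ a                              # expected value, cf. (8)
Var1 = p @ (a * a) - E_p**2               # <p, a^{diamond 2}> - <p, a>^2
Var2 = p @ (a - E_p)**2                   # equivalent centered form
print(np.isclose(Var1, Var2))             # True
```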

2 Elements from geometric mechanics

In this section, we collect some basic notions of geometric mechanics from [15, Ch. 3] that are required in subsequent sections.

2.1 Hamiltonian systems

Let \((N, \omega )\) be a symplectic manifold with the symplectic two-form \(\omega \), and let \(H :N \rightarrow \mathbb {R}\) be a smooth function, called the Hamiltonian. The Hamiltonian vector field \(X_H\) corresponding to H is defined as symplectic gradient by

$$\begin{aligned} dH|_x [v] = \omega _x(X_H(x), v), \quad \text {for all } x\in N, v \in T_xN. \end{aligned}$$
(9)

The triplet \((N, \omega , X_H)\) is called a Hamiltonian system. By [15, Prop. 3.3.2], a curve \(\gamma (t)\) is an integral curve of \(X_H\), i.e.

$$\begin{aligned} \dot{\gamma }(t) = X_H(\gamma (t)), \end{aligned}$$
(10)

if and only if in Darboux coordinates \((q^1, \ldots , q^n, p_1, \ldots , p_n)\) for \(\omega \), the Hamiltonian equations hold for the curve \(\gamma (t) = (q(t), p(t))\),

$$\begin{aligned} \dot{q}^i(t) = \frac{\partial H}{\partial p_i} (q(t), p(t))\quad \text {and} \quad \dot{p}_i(t) = - \frac{\partial H}{\partial q^i} (q(t), p(t)), \quad \text {for all } i \in [n]. \end{aligned}$$
(11)

The value of the Hamiltonian \(H(\gamma (t))\) (also called energy) is constant along integral curves of \(X_H\).

For any smooth manifold M, the cotangent bundle \((T^*M,\omega ^\textrm{can})\) is a basic instance of the above situation, with the canonical symplectic form \(\omega ^\textrm{can}\). Thus any smooth function \(H :T^*M \rightarrow \mathbb {R}\) gives rise to a Hamiltonian system, where \(T^*M\) is interpreted as momentum phase space and H represents an energy.
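
As an elementary illustration of (10), (11) and the conservation of H (not an example from the assignment flow setting), the snippet below integrates Hamilton's equations for a one-dimensional harmonic oscillator with a symplectic Euler step and monitors the energy.

```python
import numpy as np

# toy Hamiltonian H(q, p) = p^2/2 + q^2/2 (harmonic oscillator, not from the paper)
def dH_dq(q, p): return q
def dH_dp(q, p): return p

q, p, h = 1.0, 0.0, 1e-3
energies = []
for _ in range(20000):
    p -= h * dH_dq(q, p)                  # symplectic (semi-implicit) Euler step for (11)
    q += h * dH_dp(q, p)
    energies.append(0.5 * p**2 + 0.5 * q**2)

print(max(energies) - min(energies))      # small: H is (nearly) conserved along the flow
```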

2.2 Lagrangian systems

Suppose M is a smooth manifold. Similar to Hamiltonian systems on momentum phase space \(T^*M\), there is a related concept on the tangent bundle TM, interpreted as velocity phase space. In this context, a smooth function \(L :TM \rightarrow \mathbb {R}\) is called Lagrangian. For a given point \(x \in M\), denote the restriction of L to the fiber \(T_xM\) by \(L_x:= L|_{T_xM} :T_xM \rightarrow \mathbb {R}\). The fiber derivative of L is defined as

$$\begin{aligned} \mathbb {F}L :TM \rightarrow T^*M, \quad (x, v) \mapsto \mathbb {F}L(x,v) := (x, dL_x|_v), \end{aligned}$$
(12)

where \(dL_x|_v :T_xM \rightarrow \mathbb {R}\) is the differential of \(L_x\) at \(v \in T_xM\). The function L is called a regular Lagrangian if \(\mathbb {F}L\) is regular at all points (i.e. \(\mathbb {F}L\) is a submersion), which is equivalent to \(\mathbb {F}L:TM \rightarrow T^*M\) being a local diffeomorphism [15, Prop. 3.5.9]. Furthermore, L is called hyperregular Lagrangian if \(\mathbb {F}L :TM \rightarrow T^*M\) is a diffeomorphism. A class of hyperregular Lagrangians that will be relevant in Sect. 3 is given in Eq. (22) below.

The Lagrangian two-form \(\omega _L\) is defined as the pullback of the canonical symplectic form \(\omega ^\textrm{can}\) on the cotangent bundle \(T^*M\) under the fiber derivative \(\mathbb {F}L\)

$$\begin{aligned} \omega _L := (\mathbb {F}L)^*\omega ^\textrm{can}. \end{aligned}$$
(13)

According to [15, Prop. 3.5.9], \(\omega _L\) is a symplectic form on TM if and only if L is a regular Lagrangian. In the following, we only consider regular Lagrangians. The action associated to a Lagrangian \(L :TM \rightarrow \mathbb {R}\) is defined by

$$\begin{aligned} A :TM \rightarrow \mathbb {R}, \quad (x, v) \mapsto \mathbb {F}L(x, v)[v] = dL_x|_v [v], \end{aligned}$$
(14)

and the energy function by \(E:= A - L\), that is

$$\begin{aligned} E:TM \rightarrow \mathbb {R},\quad (x, v) \mapsto \mathbb {F}L(x, v)[v] - L(x, v) = dL_x|_v[v] - L(x, v). \end{aligned}$$
(15)

The Lagrangian vector field for L is the unique vector field \(X_E\) on TM satisfying

$$\begin{aligned} dE|_x[v] = \omega _{L, x}(X_E(x), v) \quad \text {for all } x \in TM,\; v \in T_x (TM). \end{aligned}$$
(16)

Since we assume L to be regular, \(X_E\) is nothing but the symplectic gradient of the energy E with respect to \(\omega _L\). A curve \(\gamma (t) = (x(t), v(t))\) on TM is an integral curve of \(X_E\), i.e.

$$\begin{aligned} \dot{\gamma }(t) = X_E(\gamma (t)), \end{aligned}$$
(17)

if \(v(t) = \dot{x}(t)\) and the classical Euler–Lagrange equations in local coordinates

$$\begin{aligned} \frac{d}{dt} \bigg ( \frac{\partial L}{\partial \dot{x}^i} \big (x(t), \dot{x}(t)\big )\bigg ) = \frac{\partial L}{\partial x^i}\big (x(t), \dot{x}(t)\big ) \quad \text {for all } i \in [n] \end{aligned}$$
(18)

are satisfied. Let \(\gamma :I \rightarrow TM\) be any integral curve of \(X_E\). Then

$$\begin{aligned} \tfrac{d}{dt}E(\gamma )=0, \end{aligned}$$
(19)

that is, the energy E is constant along \(\gamma \), analogous to the constancy of the Hamiltonian H due to (9). The subsequent Sect. 2.3 makes this connection explicit.

2.3 The Legendre transform

Let \(L:TM\rightarrow \mathbb {R}\) be a hyperregular Lagrangian, i.e. the fiber derivative \(\mathbb {F}L :TM \rightarrow T^*M\) is a diffeomorphism. Then the Lagrangian system on TM and the Hamiltonian system on \(T^*M\) are related to each other by the Legendre transformation, with the Hamiltonian \(H:T^{*}M\rightarrow \mathbb {R}\) corresponding to the energy E via

$$\begin{aligned} H = E \circ (\mathbb {F}L)^{-1}. \end{aligned}$$
(20)

Accordingly, the Hamiltonian vector field \(X_H\) on \(T^*M\) and the Lagrangian vector field \(X_E\) on TM are \(\mathbb {F}L\) related [15, Thm. 3.6.2], that is

$$\begin{aligned} X_H = (\mathbb {F}L)_* X_E, \end{aligned}$$
(21)

and thus integral curves of \(X_E\) are mapped to integral curves of \(X_H\) and vice versa. Furthermore, the base integral curves of \(X_E\) and \(X_H\) coincide.

Therefore, as a consequence of (20) and for a hyperregular Lagrangian L, the energy E is just another representation of the corresponding Hamiltonian H.

2.4 Mechanics on Riemannian manifolds

Let \((M, h)\) be a Riemannian manifold. Suppose a smooth function \(G :M \rightarrow \mathbb {R}\), called potential, is given and consider the Lagrangian

$$\begin{aligned} L(x, v) = \tfrac{1}{2}\Vert v\Vert _h^2 - G(x), \quad (x, v) \in TM. \end{aligned}$$
(22)

It then follows (see [15, Sec. 3.7] or by direct computation) that the fiber derivative of L is the canonical isomorphism

$$\begin{aligned} \mathbb {F}L = h^\flat :TM \rightarrow T^*M. \end{aligned}$$
(23)

Hence the Lagrangian L is hyperregular with action A and energy \(E = A - L\) given by

$$\begin{aligned} A(x, v) = \Vert v\Vert _h^2 \quad \text {and} \quad E(x, v) = \tfrac{1}{2}\Vert v\Vert _h^2 + G(x) \quad \text {for all } (x, v) \in TM. \end{aligned}$$
(24)
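
The relations (23) and (24) can be checked numerically for a constant metric on \(M = \mathbb {R}^2\), where no charts are needed; the toy potential and metric below are chosen only for illustration.

```python
import numpy as np

H = np.array([[2.0, 0.3],                 # constant Riemannian metric h on M = R^2
              [0.3, 1.0]])
def G(x):    return np.sin(x[0]) + x[1]**2          # toy potential
def L(x, v): return 0.5 * v @ H @ v - G(x)          # Lagrangian (22)

x = np.array([0.4, -0.2])
v = np.array([1.0, 0.5])

# fiber derivative (12): gradient of L(x, .) at v, via central differences
eps = 1e-6
FL = np.array([(L(x, v + eps * e) - L(x, v - eps * e)) / (2 * eps) for e in np.eye(2)])
print(np.allclose(FL, H @ v, atol=1e-6))            # (23): the fiber derivative is h^flat

A_ = FL @ v                                         # action (14)
E_ = A_ - L(x, v)                                   # energy (15)
print(np.isclose(A_, v @ H @ v), np.isclose(E_, 0.5 * v @ H @ v + G(x)))   # (24)
```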

Proposition 2.1

[15, Prop. 3.7.4] Let \((M, h)\) be a Riemannian manifold, \(\pi :TM \rightarrow M\) the natural projection and \(L:TM\rightarrow \mathbb {R}\) the Lagrangian defined by (22). Then the curve \(\gamma :I \rightarrow TM\) with \(\gamma (t) = (x(t), v(t))\) is an integral curve of the Lagrangian vector field \(X_E\), i.e. satisfies the Euler–Lagrange equation, if and only if the corresponding base integral curve \(\pi \circ \gamma = x :I \rightarrow M\) satisfies

$$\begin{aligned} D^h_t\dot{x}(t) = - {{\,\textrm{grad}\,}}^h G(x(t)), \end{aligned}$$
(25)

where \(D^h_t = \nabla ^{h}_{\dot{x}}\) is the covariant derivative along x with respect to the Riemannian (Levi-Civita) connection \(\nabla ^{h}\). Here, \(\textrm{grad}^{h}G\) denotes the Riemannian gradient of the potential G.

3 Mechanics of assignment flows

In this section, we get back to the scenario of image labeling, informally introduced in Sect. 1.1. The assignment manifold underlying the assignment flow is introduced in Sect. 3.1, the definition of the assignment flow approach (1) is completed in Sect. 3.2, and the Fisher–Rao metric is introduced in Sect. 3.3. We state and prove the main result of this paper in Sect. 3.4 and calculate in Sect. 3.5 an explicit expression for the associated Hamiltonian system in terms of the corresponding Lagrangian system.

3.1 Assignment manifold

Let \({\mathcal {G}}= ({\mathcal {V}}, {\mathcal {E}})\) denote an undirected graph and identify

$$\begin{aligned} {\mathcal {V}}= [m]\quad \text {with} \quad m := |{\mathcal {V}}|. \end{aligned}$$
(26)

Assume that for every node \(i \in {\mathcal {V}}\) some data point \(f_i\) is given in a metric space \((\mathcal {F}, d_{\mathcal {F}})\), together with a set \(\mathcal {F}_* = \{f_1^*, \ldots , f_n^*\} \subset \mathcal {F}\) of predefined prototypes, also called labels, identified with

$$\begin{aligned} \mathcal {F}_* = [n] \quad \text {for}\quad n := |\mathcal {F}_*|. \end{aligned}$$
(27)

Context-based metric data classification or labeling refers to the task of assigning to each node \(i \in {\mathcal {V}}\) a suitable label in \(\mathcal {F}_*\), based on the metric distance to the given data \(f_i\) and the relation between data points encoded by the edge set \({\mathcal {E}}\).

As introduced in Sect. 1.1, for every \(i \in {\mathcal {V}}\) the assignment of labels \(\mathcal {F}_*\) to a data point \(f_{i}\) is represented by an assignment vector \(W_{i}(t)\), where the j-th entry \(W_i^j(t)\) represents the probability for the j-th label \(f_{j}^{*}\). These assignment vectors are determined by (2) and evolve on the relative interior of the \((n-1)\)-simplex

$$\begin{aligned} {\mathcal {S}}:= \{ p \in \mathbb {R}^n:p > 0 \text { and } \langle p, \mathbb {1}_n\rangle = 1\} \end{aligned}$$
(28)

with barycenter

$$\begin{aligned} \mathbb {1}_{\mathcal {S}} := \tfrac{1}{n}\mathbb {1}_{n}. \end{aligned}$$
(29)

Accordingly, all probabilistic label choices on the graph are encoded as a single point \(W\in \mathcal {W}\) on the product space

$$\begin{aligned} {\mathcal {W}}:= {\mathcal {S}}\times \cdots \times {\mathcal {S}}\qquad (m = |{\mathcal {V}}| \text { factors) }, \end{aligned}$$
(30)

with barycenter

$$\begin{aligned} {\mathbb {1}_{\mathcal {W}}}:= (\mathbb {1}_{\mathcal {S}},\dotsc ,\mathbb {1}_{\mathcal {S}})^{\top }. \end{aligned}$$
(31)

Thus, the i-th component of \(W = (W_k)_{k\in {\mathcal {V}}}\) represents the probability distribution of label assignments for node \(i \in {\mathcal {V}}\)

$$\begin{aligned} W_i=(W_{i}^1,\dotsc ,W_{i}^n)^{\top } \in {\mathcal {S}}. \end{aligned}$$
(32)

In the following, we always identify the space \({\mathcal {W}}\) from (30) with its matrix embedding

$$\begin{aligned} {\mathcal {W}}= \{ W \in \mathbb {R}^{m\times n}:W > 0 \text { and } W\mathbb {1}_n = \mathbb {1}_m\}, \end{aligned}$$
(33)

by regarding the i-th component \(W_i\) of a point \(W = (W_k)_{k\in {\mathcal {V}}}\) in (30) as the i-th row of a matrix in \(\mathbb {R}^{m\times n}\). Hence points \(W \in {\mathcal {W}}\) are viewed as row-stochastic matrices with full support, called assignment matrices, with assignment vectors (32) as row vectors. The barycenter (31) can then also be expressed as a matrix

$$\begin{aligned} {\mathbb {1}_{\mathcal {W}}}= \mathbb {1}_m {\mathbb {1}_{\mathcal {S}}}^\top = \tfrac{1}{n} \mathbb {1}_m \mathbb {1}_n^\top . \end{aligned}$$
(34)

The tangent space of \({\mathcal {S}}\subset \mathbb {R}^n\) from (28) at any point \(p \in {\mathcal {S}}\) is identified as

$$\begin{aligned} T_p {\mathcal {S}}= \{ v \in \mathbb {R}^n :\langle v, \mathbb {1}_n\rangle = 0\} =: {T_{0}}. \end{aligned}$$
(35)

Hence \(T_p{\mathcal {S}}\) is represented by the same vector subspace \({T_{0}}\) of codimension 1, for all \(p \in {\mathcal {S}}\). In particular, the tangent bundle is trivial

$$\begin{aligned} T{\mathcal {S}}= {\mathcal {S}}\times {T_{0}}. \end{aligned}$$
(36)

Viewing \({\mathcal {W}}\) as an embedded submanifold of \(\mathbb {R}^{m\times n}\) by (33), we accordingly identify

$$\begin{aligned} T_W {\mathcal {W}}= \{ V \in \mathbb {R}^{m\times n} :V \mathbb {1}_n = 0\} =: {\mathcal {T}_{0}}, \quad \text {for all } W \in {\mathcal {W}}\subset \mathbb {R}^{m\times n}. \end{aligned}$$
(37)

With this identification the tangent bundle is also trivial

$$\begin{aligned} T{\mathcal {W}}= {\mathcal {W}}\times {\mathcal {T}_{0}}. \end{aligned}$$
(38)
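
The identifications (33)-(38) translate directly into array code. The following sketch builds the barycenter (34), projects an arbitrary matrix onto \({\mathcal {T}_{0}}\) (cf. (45) below), and checks the defining constraints.

```python
import numpy as np

m, n = 4, 3
rng = np.random.default_rng(3)

W_bar = np.full((m, n), 1.0 / n)              # barycenter (34): (1/n) 1_m 1_n^T
print(np.allclose(W_bar @ np.ones(n), 1.0))   # row-stochastic, cf. (33)

A = rng.standard_normal((m, n))
V = A - A.mean(axis=1, keepdims=True)         # row-wise orthogonal projection onto T_0
print(np.allclose(V @ np.ones(n), 0.0))       # V 1_n = 0, i.e. V lies in the tangent space (37)
```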

3.2 Assignment flows

Assignment flows are dynamical systems on \({\mathcal {W}}\) for inferring probabilistic label assignments that gradually become unambiguous label assignments as \(t\rightarrow \infty \). These dynamical systems have the form

$$\begin{aligned} \dot{W}(t) = {\mathcal {R}}_{W(t)}[ F(W(t))], \quad \text {with}\quad W(0) \in {\mathcal {W}}, \end{aligned}$$
(39)

where

$$\begin{aligned} F:\mathcal {W}\rightarrow \mathbb {R}^{m\times n} \end{aligned}$$
(40)

is a smooth function and

$$\begin{aligned} {\mathcal {R}}_W :\mathbb {R}^{m\times n} \rightarrow T_W {\mathcal {W}}= {\mathcal {T}_{0}}, \quad \text {for}\quad W \in {\mathcal {W}}, \end{aligned}$$
(41)

is the linear replicator map defined componentwise

$$\begin{aligned} \mathcal {R}_{W}[F(W)]&= \big ({R}_{W_{i}} F_{i}(W)\big )_{i\in \mathcal {V}},\quad W\in \mathcal {W}, \end{aligned}$$
(42a)

via the replicator matrix

$$\begin{aligned} {R}_{W_{i}}&= {{\,\textrm{Diag}\,}}(W_{i})-W_{i}W_{i}^{\top },\quad i\in \mathcal {V}. \end{aligned}$$
(42b)

The function F couples the evolution of the individual assignment vectors \(\dot{W}_{i},\,i\in \mathcal {V}\), over the graph, typically by reinforcing tangent directions of similar assignment vectors, and is therefore called affinity or similarity mapping. Each choice of a similarity mapping F defines a particular assignment flow; see Sect. 4.2 for a basic instance. Our main result stated in Sect. 3.4 characterizes a general class of admissible similarity mappings F.

3.3 Fisher–Rao metric

From an information geometric viewpoint [6, 9], the canonical Riemannian structure on \({\mathcal {S}}\) is given by the Fisher–Rao (information) metric

$$\begin{aligned} g_p(u, v) := \Big \langle u, \frac{v}{p} \Big \rangle , \quad \text {for all } p \in {\mathcal {S}}\;\text {and}\; u, v \in {T_{0}}. \end{aligned}$$
(43)

This naturally extends to the product manifold structure of \({\mathcal {W}}\) (30) via the product metric

$$\begin{aligned} g_W(U, V) := \sum _{i \in [m]} g_{W_i}(U_i, V_i) = \Big \langle U, \frac{V}{W}\Big \rangle , \quad \text {for all } W \in {\mathcal {W}}\;\text {and}\; U, V \in {\mathcal {T}_{0}}. \end{aligned}$$
(44)

This product metric turns the assignment manifold \(\mathcal {W}\) into a Riemannian manifold.

The orthogonal projections onto \({T_{0}}\) and \({\mathcal {T}_{0}}\), respectively, with respect to the Euclidean inner product are given by

$$\begin{aligned} {P_{T_{0}}}&:\mathbb {R}^n \rightarrow {T_{0}},&{P_{T_{0}}}&:= I_n - \tfrac{1}{n}\mathbb {1}_n \mathbb {1}_n^\top \in \mathbb {R}^{n\times n}, \end{aligned}$$
(45a)
$$\begin{aligned} {\mathcal {P}_{\mathcal {T}_{0}}}&:\mathbb {R}^{m\times n} \rightarrow {\mathcal {T}_{0}},&{\mathcal {P}_{\mathcal {T}_{0}}}[A]&:= \big ( {P_{T_{0}}}A_i\big )_{i\in {\mathcal {V}}}. \end{aligned}$$
(45b)

Next, we return to the replicator mappings (42). The linear mapping

$$\begin{aligned} {R}_p :\mathbb {R}^n \rightarrow {T_{0}},\qquad {R}_p = {{\,\textrm{Diag}\,}}(p) - pp^\top \in \mathbb {R}^{n\times n} \end{aligned}$$
(46)

is symmetric

$$\begin{aligned} {R}_p^* = {R}_p^\top = {R}_p, \end{aligned}$$
(47)

satisfies the relations

$$\begin{aligned} {R}_p&= {R}_p {P_{T_{0}}}= {P_{T_{0}}}{R}_p, \end{aligned}$$
(48a)
$$\begin{aligned} \ker ({R}_p)&= \mathbb {R}\mathbb {1}_n, \end{aligned}$$
(48b)

and the restriction \({R}_p|_{T_{0}}:{T_{0}}\rightarrow {T_{0}}\) to the linear subspace \({T_{0}}\subset \mathbb {R}^n\) is a linear isomorphism with inverse given by [18, Lem. 3.1]

$$\begin{aligned} ({R}_p|_{{T_{0}}})^{-1} u = {P_{T_{0}}}{{\,\textrm{Diag}\,}}\big (\tfrac{1}{p}\big ) u = {P_{T_{0}}}\frac{u}{p}, \quad \text {for all } u \in {T_{0}}. \end{aligned}$$
(49)
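
The algebraic properties (46)-(49) of \({R}_p\) are easy to verify numerically, as the following sketch illustrates.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
p = rng.random(n)
p /= p.sum()

R = np.diag(p) - np.outer(p, p)                           # replicator matrix (46)
P = np.eye(n) - np.ones((n, n)) / n                       # projection onto T_0, cf. (45a)

print(np.allclose(R, R.T))                                # symmetry (47)
print(np.allclose(R @ np.ones(n), 0.0))                   # 1_n lies in ker(R_p), cf. (48b)
print(np.allclose(R @ P, R), np.allclose(P @ R, R))       # relations (48a)

u = P @ rng.standard_normal(n)                            # arbitrary u in T_0
print(np.allclose(R @ (P @ (u / p)), u))                  # inverse formula (49)
```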

Likewise, the replicator operator \({\mathcal {R}}_W :\mathbb {R}^{m \times n} \rightarrow {\mathcal {T}_{0}}\) satisfies for all \(W \in {\mathcal {W}}\)

$$\begin{aligned} {\mathcal {R}}_W = {\mathcal {R}}_W \circ {\mathcal {P}_{\mathcal {T}_{0}}}= {\mathcal {P}_{\mathcal {T}_{0}}}\circ {\mathcal {R}}_W \end{aligned}$$
(50)

and the restriction to the linear subspace \({\mathcal {T}_{0}}\subset \mathbb {R}^{m\times n}\) is a linear isomorphism with inverse

$$\begin{aligned} \big ( {\mathcal {R}}_W|_{\mathcal {T}_{0}}\big )^{-1}[U] = {\mathcal {P}_{\mathcal {T}_{0}}}\Big [\frac{U}{W}\Big ], \quad \text {for all } U \in {\mathcal {T}_{0}}. \end{aligned}$$
(51)

Since all the components \({R}_{W_i}\) of \({\mathcal {R}}_W\) are symmetric, we have for all \(X, Y \in \mathbb {R}^{m\times n}\)

$$\begin{aligned} \langle {\mathcal {R}}_{W}[X], Y\rangle = \sum _{i \in [m]} \langle {R}_{W_i} X_i, Y_i\rangle \overset{(47)}{=} \sum _{i \in [m]} \langle X_i, {R}_{W_i} Y_i\rangle = \langle X, {\mathcal {R}}_W[Y]\rangle , \end{aligned}$$
(52)

showing that \({\mathcal {R}}_W\) is self-adjoint \({\mathcal {R}}_W^* = {\mathcal {R}}_W\) with respect to the matrix inner product. There is also a relation between the Fisher–Rao metric and the matrix inner product in terms of the replicator operator.

Lemma 3.1

At any point \(W \in {\mathcal {W}}\), the replicator operator \({\mathcal {R}}_W\) transforms the Riemannian metric into the matrix inner product

$$\begin{aligned} g_W({\mathcal {R}}_W[U], V) = \langle U, V\rangle , \quad \text {for all}\quad U, V \in T_W{\mathcal {W}}= {\mathcal {T}_{0}}. \end{aligned}$$
(53)

Proof

Using the properties of the replicator operator \({\mathcal {R}}_W\) directly results in

$$\begin{aligned} g_W({\mathcal {R}}_W[U], V) \overset{(44)}{=}&\Big \langle {\mathcal {R}}_W[U], \frac{V}{W} \Big \rangle \overset{(52)}{=} \Big \langle U, {\mathcal {R}}_W\Big [\frac{V}{W}\Big ]\Big \rangle \end{aligned}$$
(54a)
$$\begin{aligned} \overset{(50)}{=}&\Big \langle U, {\mathcal {R}}_W \circ {\mathcal {P}_{\mathcal {T}_{0}}}\Big [\frac{V}{W}\Big ]\Big \rangle \overset{(51)}{=} \langle U, V\rangle \end{aligned}$$
(54b)

\(\square \)

Corollary 3.2

Let \(J :{\mathcal {W}}\rightarrow \mathbb {R}\) be a smooth function and assume there is a smooth map \(\Psi :{\mathcal {W}}\rightarrow \mathbb {R}^{m\times n}\) such that the differential of J takes the form

$$\begin{aligned} dJ|_W[V] = \langle \Psi (W), V\rangle \quad \text {for all } W \in {\mathcal {W}}\text { and } V \in T_W{\mathcal {W}}= {\mathcal {T}_{0}}\end{aligned}$$
(55)

with respect to the matrix inner product \(\langle \cdot , \cdot \rangle \). Then, the Riemannian gradient of J is given by

$$\begin{aligned} {{\,\textrm{grad}\,}}^g J(W) = {\mathcal {R}}_W[\Psi (W)] \quad \text {for all } W \in {\mathcal {W}}\end{aligned}$$
(56)

Proof

Let \(V \in T_W{\mathcal {W}}= {\mathcal {T}_{0}}\) be arbitrary. As a consequence of Lemma 3.1,

$$\begin{aligned} dJ|_W[V] = g_W\big ({\mathcal {R}}_W[\Psi (W)], V\big ) \quad \text {for all } V \in T_W{\mathcal {W}}= {\mathcal {T}_{0}}, \end{aligned}$$
(57)

with \({\mathcal {R}}_W[\Psi (W)] \in T_W{\mathcal {W}}= {\mathcal {T}_{0}}\). Since this uniquely determines the Riemannian gradient of J, the statement follows. \(\square \)

For functions \(J:{\mathcal {W}}\rightarrow \mathbb {R}\) that extend to an open set, the above corollary directly implies a relation between the Riemannian gradient and the usual gradient, a result that is already well known [9, Prop. 2.2]. For this, suppose \(\widetilde{J} :U \rightarrow \mathbb {R}\) is a smooth extension of J defined on some open set \(U \subset \mathbb {R}^{m\times n}\) containing \({\mathcal {W}}\), i.e. \(\widetilde{J}|_{\mathcal {W}}= J\). Then \(\Psi (W)\) can be chosen as the usual gradient with respect to the matrix inner product, \(\partial \widetilde{J}(W) \in \mathbb {R}^{m\times n}\), and the Riemannian gradient of J is given by

$$\begin{aligned} {{\,\textrm{grad}\,}}^g J(W) = {\mathcal {R}}_W[\partial \widetilde{J}(W)], \quad \text {for all } W \in {\mathcal {W}}. \end{aligned}$$
(58)
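
Relation (58) yields a simple recipe for computing Riemannian gradients on \({\mathcal {W}}\). The sketch below illustrates it for the toy extension \(\widetilde{J}(W) = \frac{1}{2}\Vert W - C\Vert _F^2\) (chosen here only for illustration) and compares a finite-difference directional derivative with \(g_W({{\,\textrm{grad}\,}}^g J(W), V)\).

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 3
W = rng.random((m, n))
W /= W.sum(axis=1, keepdims=True)
C = rng.standard_normal((m, n))

J  = lambda W: 0.5 * np.sum((W - C) ** 2)         # smooth extension J~ of J
dJ = lambda W: W - C                              # Euclidean gradient of J~

def replicator(W, X):                             # R_W[X], rows given by (42b)
    return W * X - W * (W * X).sum(axis=1, keepdims=True)

grad_g = replicator(W, dJ(W))                     # Riemannian gradient (58)

V = rng.standard_normal((m, n))
V -= V.mean(axis=1, keepdims=True)                # tangent direction in T_0, cf. (37)

eps = 1e-6
lhs = (J(W + eps * V) - J(W - eps * V)) / (2 * eps)   # dJ|_W[V]
rhs = np.sum(grad_g * (V / W))                        # g_W(grad^g J, V), cf. (44)
print(np.isclose(lhs, rhs, atol=1e-6))
```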

3.4 The action functional

Our main result is summarized in the following theorem. It refers to affinity functions \(F :{\mathcal {W}}\rightarrow \mathbb {R}^{m\times n}\) introduced in and discussed after equation (39). Applying the identifications \(T_W{\mathcal {W}}= {\mathcal {T}_{0}}\) from (37) and \(T_{F(W)}\mathbb {R}^{m\times n} = \mathbb {R}^{m\times n}\) for every \(W \in {\mathcal {W}}\) allows us to view the differential of F as a linear operator

$$\begin{aligned} dF|_W :{\mathcal {T}_{0}}\rightarrow \mathbb {R}^{m\times n}. \end{aligned}$$
(59)

The adjoint of \(dF|_W\) with respect to the standard matrix inner product (5) on \(\mathbb {R}^{m\times n}\) and \({\mathcal {T}_{0}}\subset \mathbb {R}^{m\times n}\) is denoted by

$$\begin{aligned} dF|^*_W :\mathbb {R}^{m\times n} \rightarrow {\mathcal {T}_{0}}. \end{aligned}$$
(60)

Theorem 3.3

Let \(F:{\mathcal {W}}\rightarrow \mathbb {R}^{m\times n}\) be an affinity map and \(W :[t_0, t_1] \rightarrow {\mathcal {W}}\) a solution of the corresponding assignment flow (39). Then W(t) is a critical point of the action functional

$$\begin{aligned} \mathcal {L}(W) = \int _{t_0}^{t_1} \tfrac{1}{2}\Vert \dot{W}(t)\Vert _g^2 + \tfrac{1}{2}\sum _{i \in {\mathcal {V}}} \textrm{Var}_{W_i(t)}\big (F_i(W(t))\big ) dt, \end{aligned}$$
(61)

if and only if the affinity function F fulfills the condition

$$\begin{aligned} 0 = {\mathcal {R}}_{W(t)} \circ ( dF|_{W(t)} - dF|_{W(t)}^*)\circ {\mathcal {R}}_{W(t)} [F(W(t))]\quad \text { for }\; t \in [t_0, t_1], \end{aligned}$$
(62)

where \(dF|_{W(t)}^*\) is the adjoint linear operator of \(dF|_{W(t)}\) from (60) and \({\mathcal {R}}_{W(t)}\) is the replicator operator defined by (42a). This condition is equivalent to the Euler–Lagrange equation

$$\begin{aligned} D_t^g \dot{W}(t) = \tfrac{1}{2} \sum _{i\in {\mathcal {V}}}{{\,\textrm{grad}\,}}^{g}\!\textrm{Var}_{W_i(t)}\big (F_i(W(t))\big ) \quad \text {for }\; t \in [t_0, t_1]. \end{aligned}$$
(63)

Remark 3.4

Theorem 3.3 characterizes the class of affinity functions, in terms of condition (62), for which solutions to the assignment flow equation (39) are stationary points of the action functional (61) and, equivalently, solutions of the Euler–Lagrange equation (63). We defer most of the further discussion to Sect. 4 but mention one important point here. Since every first-order ODE can trivially be described as a special case of the Euler–Lagrange equation of some quadratic Lagrangian, it is worth pointing out that the Lagrangian underlying the action functional \(\mathcal {L}\) in Theorem 3.3 is classical, that is, of the form kinetic minus potential energy. In particular, the potential \(-\frac{1}{2}\sum _{i \in {\mathcal {V}}} \textrm{Var}_{W_i(t)}\big (F_i(W(t))\big ) \) (note the minus sign) is a non-positive function. Because of (65) in connection with the formula for the energy in (24), solutions of the assignment flow equation (39) correspond precisely to those solutions of the Euler–Lagrange equation with energy 0. Since 0 is the maximum of the potential, this energy value is precisely the Mañé critical value of this Lagrangian system; see Sect. 4 for further remarks.
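
Condition (62) can be probed numerically once an affinity map is fixed. The sketch below uses the linear choice \(F(W) = \Omega W\) with a weight matrix \(\Omega \) — an illustrative guess at the kind of linear affinity map referred to in Sect. 4.2, not a definition taken from this paper — for which \(dF|_W[V] = \Omega V\) and the adjoint acts by \(\Omega ^\top \) up to the projection \({\mathcal {P}_{\mathcal {T}_{0}}}\) absorbed by \({\mathcal {R}}_W\) via (50); for symmetric \(\Omega \) the residual of (62) vanishes, while an asymmetric \(\Omega \) generally violates the condition.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
W = rng.random((m, n))
W /= W.sum(axis=1, keepdims=True)

def replicator(W, X):                       # R_W[X] row-wise, cf. (42)
    return W * X - W * (W * X).sum(axis=1, keepdims=True)

def residual_62(Omega):
    # residual of condition (62) for the (assumed) linear map F(W) = Omega @ W,
    # whose differential dF|_W[V] = Omega @ V has adjoint U -> Omega.T @ U
    X = replicator(W, Omega @ W)            # R_W[F(W)]
    return np.linalg.norm(replicator(W, Omega @ X - Omega.T @ X))

A = rng.standard_normal((m, m))
print(residual_62(A + A.T))                 # 0: symmetric Omega satisfies (62)
print(residual_62(A))                       # generally nonzero for asymmetric Omega
```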

We proceed with Lemmata to prepare the proof of Theorem 3.3.

Lemma 3.5

Let \(p \in {\mathcal {S}}\) and \(f \in \mathbb {R}^{n}\). Then

$$\begin{aligned} \Vert {R}_p f\Vert _g^2 = \langle f, {R}_p f\rangle = \mathbb {E}_p[f^2] - (\mathbb {E}_p[f])^2 = \textrm{Var}_{p}(f). \end{aligned}$$
(64)

Thus, for \(W \in {\mathcal {W}}\) and \(F \in \mathbb {R}^{m\times n}\), we have

$$\begin{aligned} \Vert {\mathcal {R}}_W[F]\Vert _g^2 = \langle F, {\mathcal {R}}_W[F]\rangle = \sum _{i\in {\mathcal {V}}} \textrm{Var}_{W_i} (F_i). \end{aligned}$$
(65)

Proof

We have

$$\begin{aligned} \Vert {R}_p f\Vert _g^2&= g_p({R}_p f, {R}_p f) \overset{(53)}{=} \langle f, {R}_p f\rangle \overset{(46)}{=} \langle f, p\diamond f - \langle p, f\rangle p\rangle \end{aligned}$$
(66a)
$$\begin{aligned}&= \langle f^{\diamond 2}, p\rangle - \langle f, p\rangle ^2 = \mathbb {E}_p[f^2] - (\mathbb {E}_p[f])^2 = \textrm{Var}_p(f). \end{aligned}$$
(66b)

Therefore, it follows

$$\begin{aligned} \Vert {\mathcal {R}}_W[F]\Vert _g^2 \overset{(53)}{=} \langle F, {\mathcal {R}}_W[F]\rangle \overset{(42a)}{=} \sum _{i\in [m]} \langle F_i, {R}_{W_i} F_i\rangle = \sum _{i\in {\mathcal {V}}} \textrm{Var}_{W_i} (F_i) \end{aligned}$$
(67)

\(\square \)

Next, we compute the differential of the assignment flow vector field (39) viewed as a mapping

$$\begin{aligned} {\mathcal {R}}[F] :{\mathcal {W}}\rightarrow {\mathcal {T}_{0}}, \qquad W \mapsto {\mathcal {R}}[F](W):= {\mathcal {R}}_W[F(W)]. \end{aligned}$$
(68)

Lemma 3.6

With the identifications \(T_W{\mathcal {W}}= {\mathcal {T}_{0}}\) due to (37) and \(T_{{\mathcal {R}}_W[F(W)]}{\mathcal {T}_{0}}= {\mathcal {T}_{0}}\), the differential of the mapping (68) is a linear map \(d{\mathcal {R}}[F]|_W :{\mathcal {T}_{0}}\rightarrow {\mathcal {T}_{0}}\), given by

$$\begin{aligned} d{\mathcal {R}}[F]|_W[V]&= {\mathcal {R}}_W\circ dF|_W[V] + \mathcal {B}(W, F(W))[V], \qquad V \in {\mathcal {T}_{0}}, \end{aligned}$$
(69a)

where the i-th row of the linear map \(\mathcal {B}(W, F(W)) :{\mathcal {T}_{0}}\rightarrow {\mathcal {T}_{0}}\) is defined via matrix multiplication

$$\begin{aligned} (\mathcal {B}(W, F)[V])_i&:= B(W_i, F_i)V_i, \qquad i\in \mathcal {V},\quad W \in {\mathcal {W}},\; F \in \mathbb {R}^{m\times n}, \end{aligned}$$
(69b)

with matrix B given by

$$\begin{aligned} B(p, f)&:= {{\,\textrm{Diag}\,}}(f) - \langle p, f\rangle I_n - p f^\top , \qquad p \in {\mathcal {S}}, f \in \mathbb {R}^n. \end{aligned}$$
(69c)

Proof

A short calculation shows \(\langle B(W_i, F_i(W))V_i, \mathbb {1}_n\rangle = 0\) for all \(i \in {\mathcal {V}}\), that is \(\mathcal {B}(W, F(W))[V] \in {\mathcal {T}_{0}}\). Let \(\eta :(-\varepsilon , \varepsilon ) \rightarrow {\mathcal {W}}\) be a curve with \(\eta (0) = W\) and \(\dot{\eta }(0) = V\). Keeping in mind \({R}_p = {{\,\textrm{Diag}\,}}(p) - pp^\top \), we obtain for each row vector indexed by \(i \in {\mathcal {V}}\)

$$\begin{aligned} \big (d{\mathcal {R}}[F]|_W[V]\big )_i&= \tfrac{d}{dt} {R}_{\eta _i(t)} F_i(\eta (t))\big |_{t = 0} = \tfrac{d}{dt} {R}_{\eta _i(t)}\big |_{t=0} F_i(W) + {R}_{W_i} \tfrac{d}{dt} F_i(\eta (t))\big |_{t = 0} \end{aligned}$$
(70a)
$$\begin{aligned}&= \big ({{\,\textrm{Diag}\,}}(V_i) - V_i W_i^\top - W_i V_i^\top \big ) F_i(W) + \big ({\mathcal {R}}_{W} \big [\tfrac{d}{dt} F(\eta (t))\big |_{t = 0}\big ]\big )_i \end{aligned}$$
(70b)
$$\begin{aligned}&= \big (\mathcal {B}(W, F(W))[V]\big )_i + \big ({\mathcal {R}}_W\circ dF|_W[V]\big )_i, \end{aligned}$$
(70c)

where \({{\,\textrm{Diag}\,}}(V_i)F_i(W) = {{\,\textrm{Diag}\,}}(F_i(W))V_i\) and \(V_i^\top F_i(W) = F_i(W)^\top V_i\) was used to obtain the last equality. \(\square \)

Next, we consider the covariant derivative of vector fields along a curve \(p :I \rightarrow {\mathcal {S}}\), with \(I \subset \mathbb {R}\) an interval. Due to \(T{\mathcal {S}}= {\mathcal {S}}\times {T_{0}}\), we view a vector field v(t) along p(t) as a map \(v :I \rightarrow {T_{0}}\), and likewise its usual time derivative \(\dot{v} :I \rightarrow {T_{0}}\), since \({T_{0}}\) is a vector space. Defining

$$\begin{aligned} A :{\mathcal {S}}\times {T_{0}}\rightarrow {T_{0}},\qquad (p, v) \mapsto A(p, v):= \frac{v^{\diamond 2}}{p} - \Vert v\Vert _g^2 p \end{aligned}$$
(71)

and using the expression from [9, Eq. (2.60)] (with \(\alpha \) set to 0), the covariant derivative \(D_t^g v\) of v is related to \(\dot{v}\) by

$$\begin{aligned} D_t^g v(t) = \dot{v}(t) - \tfrac{1}{2} \frac{(v(t))^{\diamond 2}}{p(t)} + \tfrac{1}{2} \Vert v(t)\Vert _g^2 p(t) = \dot{v}(t) - \tfrac{1}{2} A(p(t), v(t)). \end{aligned}$$
(72)

Similarly, as a consequence of \(T{\mathcal {W}}= {\mathcal {W}}\times {\mathcal {T}_{0}}\), we regard a vector field V(t) along a curve \(W :I \rightarrow {\mathcal {W}}\) as a mapping \(V :I \rightarrow {\mathcal {T}_{0}}\), and likewise \(\dot{V} :I \rightarrow {\mathcal {T}_{0}}\). Since the covariant derivative on a product manifold equipped with a product metric is the componentwise application of the individual covariant derivatives, the covariant derivative of V on \({\mathcal {W}}\) has the form

$$\begin{aligned} D_t^g V(t) = \dot{V}(t) - \tfrac{1}{2}\mathcal {A}(W(t), V(t)), \end{aligned}$$
(73)

with i-th row of the last term given componentwise by (71)

$$\begin{aligned} \mathcal {A} :{\mathcal {W}}\times {\mathcal {T}_{0}}\rightarrow {\mathcal {T}_{0}},\qquad (\mathcal {A}(W, V))_i = A(W_i, V_i) \quad \text {for all } i\in [m]. \end{aligned}$$
(74)

The acceleration of a curve W(t) on \({\mathcal {W}}\) is the covariant derivative of its velocity vector field \(V(t):= \dot{W}(t)\), related to the ordinary time derivative \(\dot{V} = \ddot{W}\) in \(\mathbb {R}^{m\times n}\) by

$$\begin{aligned} D_t^g \dot{W}(t) = \ddot{W}(t) - \tfrac{1}{2}\mathcal {A}(W(t), \dot{W}(t)). \end{aligned}$$
(75)
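
The coordinate-free formulas (71)-(75) are straightforward to implement. As a consistency check added here for illustration, the snippet below verifies metric compatibility along a toy curve on \({\mathcal {S}}\), i.e. \(\frac{d}{dt} g_{p(t)}(\dot{p}, \dot{p}) = 2 g_{p(t)}(\dot{p}, D_t^g \dot{p})\) with \(D_t^g\) computed via (72).

```python
import numpy as np

a = np.array([0.2, -0.5, 0.7, 0.1])
b = np.array([1.0, -0.3, 0.4, -1.1])

def p(t):                                    # smooth toy curve on S (a softmax curve)
    z = np.exp(a + t * b)
    return z / z.sum()

def g(q, u, v):                              # Fisher-Rao metric (43)
    return np.sum(u * v / q)

h = 1e-5
def pdot(t):  return (p(t + h) - p(t - h)) / (2 * h)
def pddot(t): return (p(t + h) - 2 * p(t) + p(t - h)) / h**2

t = 0.3
q, v = p(t), pdot(t)
Dtv = pddot(t) - 0.5 * (v**2 / q - g(q, v, v) * q)      # covariant derivative (72) of v = pdot

speed2 = lambda s: g(p(s), pdot(s), pdot(s))            # g_{p(s)}(pdot, pdot)
lhs = (speed2(t + h) - speed2(t - h)) / (2 * h)         # d/dt g(pdot, pdot)
rhs = 2 * g(q, v, Dtv)                                  # metric compatibility
print(np.isclose(lhs, rhs, atol=1e-4))
```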

Lemma 3.7

Suppose \(W :I \rightarrow {\mathcal {W}}\) is a solution of the assignment flow (39). Then the acceleration of W(t) in terms of the covariant derivative of \(\dot{W}(t)\) takes the form

$$\begin{aligned} D^g_t \dot{W} = {\mathcal {R}}_{W} \circ dF|_{W} \circ {\mathcal {R}}_{W}[F(W)] + \tfrac{1}{2} \mathcal {A}\big (W, {\mathcal {R}}_{W}[F(W)]\big ). \end{aligned}$$
(76)

Proof

Since W(t) is a solution of \(\dot{W}(t) = {\mathcal {R}}_{W(t)}[F(W(t))]\), the second derivative \(\ddot{W}(t) = \frac{d}{dt} \dot{W}(t)\) takes the form (to simplify notation we omit the argument t)

$$\begin{aligned} \ddot{W}&= \tfrac{d}{dt}{\mathcal {R}}_{W}[F(W)] = d{\mathcal {R}}[F]|_{W}[\dot{W}] \overset{\text {Lem.~}3.6}{=} {\mathcal {R}}_{W}\circ dF|_{W}[\dot{W}] + \mathcal {B}(W, F(W))[\dot{W}] \end{aligned}$$
(77a)
$$\begin{aligned}&= {\mathcal {R}}_{W} \circ dF|_{W}\circ {\mathcal {R}}_{W}[F(W)] + \mathcal {B}(W, F(W))[{\mathcal {R}}_{W}[F(W)]], \end{aligned}$$
(77b)

where \(\mathcal {B}\) is defined by (69b). We have \(\langle f, {R}_p f\rangle = \Vert {R}_p f\Vert _g^2\) by Lemma 3.5 and using (69c)

$$\begin{aligned} B(p, f) R_p f&= (f - \langle p, f\rangle \mathbb {1}_n) \diamond ({R}_p f) - \langle f, {R}_p f\rangle p \end{aligned}$$
(78a)
$$\begin{aligned}&= \tfrac{1}{p} ({R}_p f)^{\diamond 2} - \Vert {R}_p f\Vert _g^2 p = A(p, {R}_p f). \end{aligned}$$
(78b)

This implies \(\mathcal {B}(W, F(W))[{\mathcal {R}}_{W}[F(W)]] = \mathcal {A}(W, {\mathcal {R}}_W[F(W)])\) and results in the identity

$$\begin{aligned} \ddot{W} = {\mathcal {R}}_{W} \circ dF|_{W} \circ {\mathcal {R}}_{W}[F(W)] + \mathcal {A}\big (W, {\mathcal {R}}_{W}[F(W)]\big ). \end{aligned}$$
(79)

Substituting this expression into (75) yields (76). \(\square \)

As a final preparatory step, we define the potential

$$\begin{aligned} G :{\mathcal {W}}\rightarrow \mathbb {R}, \qquad G(W) := - \tfrac{1}{2}\Vert {\mathcal {R}}_W[F(W)]\Vert _g^2 \overset{(65)}{=} -\tfrac{1}{2}\sum _{k\in {\mathcal {V}}} \textrm{Var}_{W_k} (F_k(W)) \end{aligned}$$
(80)

and compute its Riemannian gradient.

Lemma 3.8

The Riemannian gradient of the potential G from (80) is given by

$$\begin{aligned} {{\,\textrm{grad}\,}}^{g}\!G(W) = -{\mathcal {R}}_W \circ dF|_W^*\circ {\mathcal {R}}_W[F(W)] - \tfrac{1}{2}\mathcal {A}(W, {\mathcal {R}}_W[F(W)]),\quad \forall W \in {\mathcal {W}}, \end{aligned}$$
(81)

where \(dF|_{W(t)}^*\) is the adjoint linear operator of \(dF|_{W(t)}\) from (60).

Proof

Let \(W \in {\mathcal {W}}\). In the following, we derive the expression in (81) by applying Corollary 3.2. To this end, take any \(V \in T_W{\mathcal {W}}={\mathcal {T}_{0}}\) and let \(\eta :(-\varepsilon , \varepsilon ) \rightarrow {\mathcal {W}}\) be a curve with \(\eta (0) = W\) and \(\dot{\eta }(0) = V\). Then

$$\begin{aligned} dG|_W[V]&= \tfrac{d}{dt}G(\eta (t))\big |_{t=0} \overset{(65)}{=} -\tfrac{1}{2}\tfrac{d}{dt}\big \langle F(\eta (t)), {\mathcal {R}}_{\eta (t)} [F(\eta (t))]\big \rangle \big |_{t=0} \end{aligned}$$
(82a)
$$\begin{aligned}&= -\tfrac{1}{2}\big \langle \tfrac{d}{dt} F(\eta (t))\big |_{t=0}, {\mathcal {R}}_{W} [F(W)]\big \rangle -\tfrac{1}{2}\big \langle F(W), \tfrac{d}{dt}{\mathcal {R}}_{\eta (t)} [F(\eta (t))]\big |_{t=0}\big \rangle \end{aligned}$$
(82b)
$$\begin{aligned}&= -\tfrac{1}{2}\big \langle dF|_W[V], {\mathcal {R}}_W [F(W)]\big \rangle - \tfrac{1}{2}\big \langle F(W), d{\mathcal {R}}[F]|_W[V]\big \rangle . \end{aligned}$$
(82c)

Using the expression for \(d{\mathcal {R}}[F]|_W\) from Lemma 3.6 and \({\mathcal {R}}_W^* = {\mathcal {R}}_W\) from (52), the second inner product takes the form

$$\begin{aligned} \big \langle F(W), d{\mathcal {R}}[F]|_W[V]\big \rangle&= \big \langle F(W), {\mathcal {R}}_W\circ dF|_W[V]\big \rangle + \big \langle F(W), \mathcal {B}(W, F(W))[V]\big \rangle \end{aligned}$$
(83a)
$$\begin{aligned}&= \big \langle dF|_W^*\circ {\mathcal {R}}_W[F(W)], V\big \rangle + \big \langle \mathcal {B}^*(W, F(W))[F(W)], V\big \rangle . \end{aligned}$$
(83b)

Substituting back this formula into the above expression for \(dG|_W\) together with the expression

$$\begin{aligned} \big \langle dF|_W[V], {\mathcal {R}}_W [F(W)]\big \rangle = \big \langle V, dF|_W^*\circ {\mathcal {R}}_W [F(W)]\big \rangle \end{aligned}$$
(84)

for the first inner product, results in

$$\begin{aligned} dG|_W[V]&= \big \langle -dF|_W^*\circ {\mathcal {R}}_W[F(W)] - \tfrac{1}{2} \mathcal {B}^*(W, F(W))[F(W)], V\big \rangle \end{aligned}$$
(85a)
$$\begin{aligned}&= \langle \Psi (W),V\rangle . \end{aligned}$$
(85b)

Due to Corollary 3.2, the Riemannian gradient is given by

$$\begin{aligned} {{\,\textrm{grad}\,}}^g G(W)&= {\mathcal {R}}_W[\Psi (W)] \end{aligned}$$
(86a)
$$\begin{aligned}&= -{\mathcal {R}}_W \circ dF|_W^*\circ {\mathcal {R}}_W[F(W)] - \tfrac{1}{2}{\mathcal {R}}_W[\mathcal {B}^*(W, F(W))[F(W)]]. \end{aligned}$$
(86b)

Regarding the adjoint mapping \(\mathcal {B}^{*}\), we have

$$\begin{aligned} (\mathcal {B}^{*}(W,F)[U])_i = B(W_i, F_i)^\top U_i \quad \text {for all } i \in {\mathcal {V}}\end{aligned}$$
(87)

and by (69c)

$$\begin{aligned} B(p,f)R_{p}&= ({{\,\textrm{Diag}\,}}(f)-\langle p,f\rangle I_{n}-p f^{\top })({{\,\textrm{Diag}\,}}(p)-p p^{\top }) \end{aligned}$$
(88a)
$$\begin{aligned}&= {{\,\textrm{Diag}\,}}(f\diamond p)-\langle p,f\rangle {{\,\textrm{Diag}\,}}(p)-p (f\diamond p)^{\top } \nonumber \\&\quad -(f\diamond p)p^{\top }+ \langle p,f\rangle p p^{\top } + pp^\top fp^\top \end{aligned}$$
(88b)
$$\begin{aligned}&= ({{\,\textrm{Diag}\,}}(p)-p p^{\top })({{\,\textrm{Diag}\,}}(f)-\langle p,f\rangle I_{n}-f p^{\top }) \end{aligned}$$
(88c)
$$\begin{aligned}&= R_{p}B(p,f)^{\top }. \end{aligned}$$
(88d)

Thus, by (88) and (78), we obtain \(R_{p}B(p,f)^{\top } f = B(p,f)R_{p}f = A(p,R_{p} f)\) and consequently, by the componentwise definitions of \(\mathcal {B}^*\) in (87), \(\mathcal {A}\) in (74) and \({\mathcal {R}}_W\) in (42a),

$$\begin{aligned} {\mathcal {R}}_W[\mathcal {B}^*(W, F(W))[F(W)]] = \mathcal {A}(W, {\mathcal {R}}_W[F(W)]). \end{aligned}$$
(89)

Substitution into (86b) yields (81). \(\square \)
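
Formula (81) can be validated against a finite-difference directional derivative of G. The sketch below does this for the illustrative linear affinity \(F(W) = \Omega W\) (the same assumption as in the sketch after Remark 3.4), for which \(dF|_W^*[U]\) may be taken as \(\Omega ^\top U\) up to the projection absorbed by \({\mathcal {R}}_W\) via (50).

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 4, 3
W = rng.random((m, n))
W /= W.sum(axis=1, keepdims=True)
Omega = rng.standard_normal((m, m))                      # assumed linear affinity F(W) = Omega @ W

def replicator(W, X):                                    # R_W[X], cf. (42)
    return W * X - W * (W * X).sum(axis=1, keepdims=True)

def G(W):                                                # potential (80) via (65)
    F_W = Omega @ W
    return -0.5 * np.sum(F_W * replicator(W, F_W))

def grad_G(W):                                           # formula (81)
    X = replicator(W, Omega @ W)                         # R_W[F(W)]
    A = X**2 / W - (X**2 / W).sum(axis=1, keepdims=True) * W   # A(W_i, X_i), cf. (71)
    return -replicator(W, Omega.T @ X) - 0.5 * A

V = rng.standard_normal((m, n))
V -= V.mean(axis=1, keepdims=True)                       # direction V in T_0
eps = 1e-6
lhs = (G(W + eps * V) - G(W - eps * V)) / (2 * eps)      # dG|_W[V]
rhs = np.sum(grad_G(W) * (V / W))                        # g_W(grad^g G, V), cf. (44)
print(np.isclose(lhs, rhs, rtol=1e-4))
```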

Proof of Theorem 3.3

Due to Lemma 3.5, the Lagrangian of the action functional (61) has the form

$$\begin{aligned} L(W, V) = \tfrac{1}{2}\Vert V\Vert _g^2 - G(W), \end{aligned}$$
(90)

with G(W) defined by (80). Therefore, the Euler–Lagrange equation (63) is a direct consequence of Proposition 2.1. Due to Lemmas 3.7 and 3.8, the expression for the acceleration of W(t) and the Riemannian gradient of G at W(t) both contain the term

$$\begin{aligned} \tfrac{1}{2}\mathcal {A}(W(t), {\mathcal {R}}_{W(t)}[F(W(t))]) \end{aligned}$$
(91)

with opposite signs, which yields the relation

$$\begin{aligned}&D_t^g \dot{W}(t) + {{\,\textrm{grad}\,}}^{g}\!G(W(t)) \\&\quad = {\mathcal {R}}_{W(t)}\circ dF|_{W(t)}\circ {\mathcal {R}}_{W(t)} [F(W(t))] - {\mathcal {R}}_{W(t)}\circ dF|_{W(t)}^*\circ {\mathcal {R}}_{W(t)} [F(W(t))]\\&\quad = {\mathcal {R}}_{W(t)}\circ (dF|_{W(t)} - dF|_{W(t)}^*)\circ {\mathcal {R}}_{W(t)}[F(W(t))]. \end{aligned}$$

As a consequence, the characterization of F in (62) is equivalent to the Euler–Lagrange equation (63) and by Proposition 2.1 equivalent to W(t) being a critical point of the action functional. \(\square \)

3.5 Lagrangian and Hamiltonian point of view

Theorem 3.3 rests upon the representation of the assignment flow as a Lagrangian mechanical system of the form kinetic minus potential energy (90), as summarized in Sect. 2.4. Due to this specific form, Proposition 2.1 can be applied to characterize critical points of the action functional \(\mathcal {L}\) from Theorem 3.3 as solutions to the Euler–Lagrange equation (63), which in turn makes it possible to derive condition (62).

For general Lagrangians, however, Proposition 2.1 is not applicable and critical points of the action functional are characterized as integral curves of the Lagrangian vector field \(X_E\) as detailed in Sect. 2.2. Since Lagrangians of the form kinetic minus potential energy (22) are hyperregular, the representation as Hamiltonian system via the Legendre transformation \(\mathbb {F}L\) is an equivalent alternative. As mentioned in Sect. 2.3, the energy \(E:T{\mathcal {W}}\rightarrow \mathbb {R}\), the Hamiltonian \(H :T^*{\mathcal {W}}\rightarrow \mathbb {R}\) and their corresponding vector fields \(X_E\) on \(T{\mathcal {W}}\) and \(X_H\) on \(T^*{\mathcal {W}}\) are related via

$$\begin{aligned} E = H \circ \mathbb {F}L \quad \text {and}\quad X_E = (\mathbb {F}L)_*^{-1} X_H. \end{aligned}$$
(92)

To obtain interpretable explicit formulas, it will be more convenient to work on \(T{\mathcal {W}}\) instead of \(T^*{\mathcal {W}}\). In the following, we derive an explicit expression for the Lagrangian vector field \(X_E\) and relate its corresponding integral curves to the Euler–Lagrange equation (63) of Theorem 3.3. Because \(X_E\) is the symplectic gradient of the energy E with respect to the Lagrangian form \(\omega _L\), see (13), we first calculate an alternative formula for \(\omega _L\) in terms of the Fisher–Rao metric. For this we exploit the fact that the assignment manifold is a so-called Hessian manifold [19], that is, in suitable coordinates the Fisher–Rao metric is the Hessian of a convex function.

Since \(T{\mathcal {W}}= {\mathcal {W}}\times {\mathcal {T}_{0}}\) (38) is trivial, the tangent space of \(T{\mathcal {W}}\) at any point \((W, V) \in T{\mathcal {W}}\) can be identified with the vector space

$$\begin{aligned} T_{(W, V)} T{\mathcal {W}}= {\mathcal {T}_{0}}\times {\mathcal {T}_{0}}. \end{aligned}$$
(93)

With this identification, the Lagrangian two-form \(\omega _L\) has the following simple expression.

Lemma 3.9

Let \((W, V) \in T{\mathcal {W}}\) and \(A = (A', A''), B = (B', B'') \in T_{(W, V)}T{\mathcal {W}}= {\mathcal {T}_{0}}\times {\mathcal {T}_{0}}\). Then the Lagrangian two-form can be expressed via the Fisher-Rao metric as

$$\begin{aligned} \omega _L|_{(W, V)} \big (A, B\big ) = g_W(A', B'') - g_W(A'', B'). \end{aligned}$$
(94)

Proof

In the following, if \(\varphi \) is a real valued function on \({\mathcal {S}}\) or \({\mathcal {W}}\), then its coordinate representation is denoted by \(\widehat{\varphi }\). A global chart on \({\mathcal {S}}\) is given by \(\eta _{\mathcal {S}}:{\mathcal {S}}\rightarrow \mathbb {R}^{n-1}\) with \(p \mapsto \eta _{\mathcal {S}}(p) = (p^1, \ldots , p^{n-1})\). It is a standard result from information geometry [20] that the negative entropy \(\varphi \), a smooth convex function on \({\mathcal {S}}\) defined by

$$\begin{aligned} \varphi :{\mathcal {S}}\rightarrow \mathbb {R}, \quad p \mapsto \sum _{i \in [n]} p^i \log (p^i) = \langle p, \log (p)\rangle , \end{aligned}$$
(95)

induces the Fisher-Rao metric in coordinates \(\eta _{\mathcal {S}}\), denoted by \((g^{\mathcal {S}}_{ij})\), as the Hessian of \(\widehat{\varphi }\)

$$\begin{aligned} g^{\mathcal {S}}_{ij}(p) = \frac{\partial ^2 \widehat{\varphi }}{\partial p^i \partial p^j} (p^1, \ldots , p^{n-1}) \quad \text {for all } i, j \in [n-1]. \end{aligned}$$
(96)

Thus, a single simplex \({\mathcal {S}}\) has the structure of a Hessian manifold [19]. As a global chart of the product manifold \({\mathcal {W}}= \prod _{i\in [m]} {\mathcal {S}}\) we take the product chart \(\eta _{\mathcal {W}}:{\mathcal {W}}\rightarrow \mathbb {R}^{m(n-1)}\) with \(W \mapsto \eta _{\mathcal {W}}(W) = (\eta _{\mathcal {S}}(W_1), \ldots , \eta _{\mathcal {S}}(W_m)) = (x^1, \ldots , x^{m(n-1)}) = x\), where each \(W_i\) lies in \({\mathcal {S}}\) for all \(i \in [m]\). Define the accumulated negative entropy by

$$\begin{aligned} \varphi _\textrm{acc} :{\mathcal {W}}\rightarrow \mathbb {R}, \quad W \mapsto \sum _{i\in [m]} \varphi (W_i). \end{aligned}$$
(97)

Let \((g^{\mathcal {W}}_{ij})\) denote the representation of the product Fisher–Rao metric (44) on \({\mathcal {W}}\) in coordinates \(\eta _{\mathcal {W}}\). Since \(\varphi _\textrm{acc}\) separates over the product structure of \({\mathcal {W}}\), the accumulated negative entropy also induces the product Riemannian metric in the chart \(\eta _{\mathcal {W}}\)

$$\begin{aligned} g^{\mathcal {W}}_{ij}(x) = \frac{\partial ^2 \widehat{\varphi }_\textrm{acc}}{\partial x^i \partial x^j} (x), \end{aligned}$$
(98)

equipping also the assignment manifold with the structure of a Hessian manifold [19].

Now, take an arbitrary point \((W, V) \in T{\mathcal {W}}= {\mathcal {W}}\times {\mathcal {T}_{0}}\) and let \((x, v)\) be the corresponding coordinates with respect to the chart \(\eta _{\mathcal {W}}\). According to [15, Prop. 3.5.6], the Lagrangian two-form \(\omega _L\) (13) in coordinates is given by

$$\begin{aligned} \omega _L = \sum _{i, j} \bigg ( \frac{\partial ^2 L}{\partial v^i \partial x^j } dx^i \wedge dx^j + \frac{\partial ^2 L}{\partial v^i \partial v^j} dx^i\wedge dv^j\bigg ). \end{aligned}$$
(99)

Since the coordinate expression of the Lagrangian (90) is \(L(x, v) = \frac{1}{2}\sum _{i, j} g^{\mathcal {W}}_{ij}v^i v^j - G(x)\), the second-order derivatives are

$$\begin{aligned} \frac{\partial ^2 L}{\partial v^i \partial x^j} = \sum _k\frac{\partial g^{\mathcal {W}}_{ik}}{\partial x^j}v^k \quad \text {and}\quad \frac{\partial ^2 L}{\partial v^i \partial v^j} = g^{\mathcal {W}}_{ij}. \end{aligned}$$
(100)

Plugging these expressions into (99) and rearranging the first sum using \(dx^j\wedge dx^i = - dx^i \wedge dx^j\) yields

$$\begin{aligned} \omega _L = \sum _{i < j} \sum _{k}\Bigg (\frac{\partial g^{\mathcal {W}}_{ik}}{\partial x^j} - \frac{\partial g^{\mathcal {W}}_{jk}}{\partial x^i} \Bigg )v^k dx^i\wedge dx^j + \sum _{i,j} g^{\mathcal {W}}_{ij}dx^i \wedge dv^j. \end{aligned}$$
(101)

Due to the Hessian structure (98)

$$\begin{aligned} \frac{\partial g^{\mathcal {W}}_{ik}}{\partial x^j} = \frac{\partial ^3 \widehat{\varphi }_\textrm{acc}}{\partial x^j \partial x^i \partial x^k} = \frac{\partial ^3 \widehat{\varphi }_\textrm{acc}}{\partial x^i \partial x^j \partial x^k} = \frac{\partial g^{\mathcal {W}}_{jk}}{\partial x^i} \end{aligned}$$
(102)

holds and the first sum in (101) vanishes, resulting in the simplified expression

$$\begin{aligned} \omega _L = \sum _{ij}g^{\mathcal {W}}_{ij}dx^i \wedge dv^j. \end{aligned}$$
(103)

Suppose \(A = (A', A''), B=(B', B'') \in T_{(W, V)} T{\mathcal {W}}= {\mathcal {T}_{0}}\times {\mathcal {T}_{0}}\) with coordinates

$$\begin{aligned} A = \sum _i A^{'i} \frac{\partial }{\partial x^i} + \sum _i A^{''i} \frac{\partial }{\partial v^i} \quad \text {and}\quad B = \sum _i B^{'i} \frac{\partial }{\partial x^i} + \sum _i B^{''i} \frac{\partial }{\partial v^i}. \end{aligned}$$
(104)

Evaluating the Lagrangian two-form (103), we finally obtain

$$\begin{aligned} \omega _L(A, B) = \sum _{ij} g^{\mathcal {W}}_{ij}A^{'i}B^{''j} - \sum _{ji}g^{\mathcal {W}}_{ji}A^{''j}B^{'i} = g_W(A', B'') - g_W(A'', B'). \end{aligned}$$
(105)

\(\square \)

Now that we have an explicit expression for the Lagrangian two-form \(\omega _L\), we are in a position to calculate an explicit representation of the Lagrangian vector field \(X_E\).

Proposition 3.10

The Lagrangian vector field \(X_E\) on \(T{\mathcal {W}}\) associated to the Lagrangian (90) at a point \((W, V) \in T{\mathcal {W}}= {\mathcal {W}}\times {\mathcal {T}_{0}}\) is given by

$$\begin{aligned} X_E(W, V) = \begin{pmatrix}V \\ \frac{1}{2} \mathcal {A}(W, V) - {{\,\textrm{grad}\,}}^{g}\!G(W) \end{pmatrix}. \end{aligned}$$
(106)

Proof

We directly use the definition (16) of the Lagrangian vector field \(X_E\). For this, let \(B = (B', B'') \in T_{(W, V)} T{\mathcal {W}}= {\mathcal {T}_{0}}\times {\mathcal {T}_{0}}\) be arbitrary and assume that \(\gamma (t) = (W(t), V(t))\) is a smooth curve in \(T{\mathcal {W}}= {\mathcal {W}}\times {\mathcal {T}_{0}}\) with

$$\begin{aligned} \gamma (0) = (W(0), V(0)) = (W, V)\quad \text {and}\quad \dot{\gamma }(0) = (\dot{W}(0), \dot{V}(0)) = B = (B', B''). \end{aligned}$$
(107)

The time derivative of the potential G is expressed via the Riemannian gradient

$$\begin{aligned} \tfrac{d}{dt} G(W(t))\big |_{t=0} = dG|_W[B'] = g_{W}\big ({{\,\textrm{grad}\,}}^{g}\!G(W), B'\big ). \end{aligned}$$
(108)

By (73), the covariant derivative of V(t) at \(t=0\) is \(D_t^gV(0) = B'' - \frac{1}{2}\mathcal {A}(W, V)\), resulting in

$$\begin{aligned} \tfrac{d}{dt}\tfrac{1}{2}\big \Vert V(t)\big \Vert _{g}^2\big |_{t=0}&= \tfrac{1}{2} \tfrac{d}{dt} g_{W(t)}\big (V(t), V(t)\big )\big |_{t=0} = g_{W(0)}\big (V(0), D_t^g V(0)\big ) \end{aligned}$$
(109a)
$$\begin{aligned}&= g_{W}\big (V, B'' - \tfrac{1}{2}\mathcal {A}(W, V)\big ). \end{aligned}$$
(109b)

Putting everything together we obtain the following relation for the differential of the energy E from (24)

$$\begin{aligned} dE|_{(W, V)}[B]&= \tfrac{d}{dt} E(W(t), V(t))\big |_{t=0} = \tfrac{d}{dt}\tfrac{1}{2}\big \Vert V(t)\big \Vert _{g}^2\big |_{t=0} + \tfrac{d}{dt} G(W(t))\big |_{t=0} \end{aligned}$$
(110a)
$$\begin{aligned}&= g_W\big (V, B''\big ) - g_W\big (\tfrac{1}{2} \mathcal {A}(W, V) - {{\,\textrm{grad}\,}}^{g}\!G(W), B'\big ). \end{aligned}$$
(110b)

Writing \(X_E = (X_E', X_E'') \in {\mathcal {T}_{0}}\times {\mathcal {T}_{0}}\) and comparing (110b) with the above expression for \(\omega _L\) from Lemma 3.9 shows \(X_E'(W, V) = V\) and \(X_E''(W, V) = \tfrac{1}{2} \mathcal {A}(W, V) - {{\,\textrm{grad}\,}}^{g}\!G(W)\). \(\square \)

Any solution curve \(\gamma (t) = (W(t), V(t)) \in T{\mathcal {W}}= {\mathcal {W}}\times {\mathcal {T}_{0}}\) of the Lagrangian dynamics induced by the Lagrangian vector field \(X_E\) associated to the Lagrangian (90) of Theorem 3.3 fulfills the ODE

$$\begin{aligned} \begin{pmatrix}\dot{W}\\ \dot{V} \end{pmatrix}= X_E(W, V) = \begin{pmatrix}V \\ \frac{1}{2} \mathcal {A}(W, V) - {{\,\textrm{grad}\,}}^{g}\!G(W) \end{pmatrix}. \end{aligned}$$
(111)

This form of the Hamiltonian ODE simply reflects the fact that this first-order dynamics on \(T{\mathcal {W}}\) is induced by a second-order ODE on \({\mathcal {W}}\). Indeed, substituting \(V = \dot{W}\) in the second component of \(X_E\) results in

$$\begin{aligned} \ddot{W} = \dot{V} = \tfrac{1}{2} \mathcal {A}(W, V) - {{\,\textrm{grad}\,}}^{g}\!G(W) \quad \overset{(75)}{\Leftrightarrow } \quad D^g_t \dot{W} = - {{\,\textrm{grad}\,}}^{g}\!G(W), \end{aligned}$$
(112)

which we already know to be satisfied by the base curve W(t) due to (63) of Theorem 3.3.

Remark 3.11

Equation (111), and in particular its version on the left-hand side of (112), gives an alternative way to prove condition (62) in Theorem 3.3 without using Proposition 2.1. For this, we can directly apply Lemma 3.8 and the result in (79).

4 Discussion

In this section, we discuss various properties and consequences of Theorem 3.3.

4.1 Mañé critical value

In his influential work [21], Mañé introduced critical values which should be interpreted as energy levels that mark important dynamical and geometric changes of the Euler–Lagrange flow; see [22] for a nice introduction. Dynamical properties at energies equal to a Mañé critical value are often hard to analyze. In general, there are various related Mañé critical values; however, for classical Lagrangians such as L in (90) of Theorem 3.3, all of them agree and equal the maximum of the potential. As pointed out before, the potential part of the Lagrangian L is \(G(W) = -\frac{1}{2}\sum _{i \in {\mathcal {V}}} \textrm{Var}_{W_i}\big (F_i(W)\big )\), which attains its maximum value 0. At the same time, solutions to the assignment flow equation (39) are precisely the solutions to the Euler–Lagrange equation (63) with energy 0, i.e. at the Mañé critical value of \(\mathcal {L}\).

In the following, basic properties of the set of Mañé critical points on \({\mathcal {W}}\)

$$\begin{aligned} {\mathcal {M}_\textrm{crit}}:= {{\,\textrm{argmax}\,}}_{W \in {\mathcal {W}}} G(W) = G^{-1}(0) \end{aligned}$$
(113)

are investigated and summarized in Proposition 4.1. Subsequently, based on a result from geometric mechanics, Proposition 4.4 shows that integral curves of the assignment flow that are critical points of the action functional \(\mathcal {L}\) in Theorem 3.3 and start in the complement

$$\begin{aligned} {\mathcal {Q}}:= {\mathcal {W}}\setminus {\mathcal {M}_\textrm{crit}}\end{aligned}$$
(114)

are actually reparametrized geodesics of the so-called Jacobi metric introduced below.

By Lemma 3.5, we have

$$\begin{aligned} 0 = G(W) \overset{(80)}{=} -\tfrac{1}{2}\Vert {\mathcal {R}}_W[F(W)]\Vert _g^2 \quad \Leftrightarrow \quad 0 = {\mathcal {R}}_W[F(W)], \end{aligned}$$
(115)

that is, the potential attains its maximum at W if and only if W is an equilibrium point of the assignment flow (39). Since \({\mathcal {R}}_W|_{\mathcal {T}_{0}}\) is a linear isomorphism by (51), we further obtain

$$\begin{aligned} 0 = {\mathcal {R}}_W[F(W)] \overset{(50)}{=} {\mathcal {R}}_W|_{\mathcal {T}_{0}}\circ {\mathcal {P}_{\mathcal {T}_{0}}}[F(W)] \quad \Leftrightarrow \quad 0 = {\mathcal {P}_{\mathcal {T}_{0}}}[F(W)]. \end{aligned}$$
(116)

Thus, we need to consider the zero set of the smooth map

$$\begin{aligned} {\mathcal {P}_{\mathcal {T}_{0}}}\circ F :{\mathcal {W}}\rightarrow {\mathcal {T}_{0}}. \end{aligned}$$
(117)

We restrict our analysis to affinity functions F for which the differential \(d({\mathcal {P}_{\mathcal {T}_{0}}}\circ F)\) has constant rank on \({\mathcal {W}}\), in the following denoted by r. To avoid the trivial case \({\mathcal {P}_{\mathcal {T}_{0}}}\circ F \equiv \textrm{const}\), we further restrict to the case \(r \ge 1\). A basic instance of this case is given in Sect. 4.2, where F is a linear map.

Due to the Constant-Rank Level Set Theorem [23, Thm. 5.12], the zero set

$$\begin{aligned} ({\mathcal {P}_{\mathcal {T}_{0}}}\circ F)^{-1}(0) = G^{-1}(0) = {\mathcal {M}_\textrm{crit}}\subset {\mathcal {W}}\end{aligned}$$
(118)

is a properly embedded submanifold of \({\mathcal {W}}\) with dimension

$$\begin{aligned} \dim ({\mathcal {M}_\textrm{crit}}) = \dim ({\mathcal {W}}) - r \le \dim ({\mathcal {W}}) - 1. \end{aligned}$$
(119)

Since the dimension of \({\mathcal {M}_\textrm{crit}}\) is strictly less than \(\dim ({\mathcal {W}})\), it is a submanifold of measure zero in \({\mathcal {W}}\) [23, Cor. 6.12]. Therefore, the complement \({\mathcal {Q}}\) (114), that is, the set of points W with \(G(W) < 0\), is a dense subset of \({\mathcal {W}}\) [23, Prop. 6.8]. According to [23, Prop. 5.5], being properly embedded in \({\mathcal {W}}\) is equivalent to being a closed subset of \({\mathcal {W}}\) (in the subspace topology). Thus, \({\mathcal {Q}}\) is an open subset of \({\mathcal {W}}\) and consequently also a submanifold. Overall, we have proven the following statement.

Proposition 4.1

If the differential \(d({\mathcal {P}_{\mathcal {T}_{0}}}\circ F)\) has constant rank \(r\ge 1\) on \({\mathcal {W}}\), then the set \({\mathcal {M}_\textrm{crit}}\) of Mañé critical points (113) is a submanifold of \({\mathcal {W}}\) with measure zero and its complement \({\mathcal {Q}}\subset {\mathcal {W}}\) (114) is an open and dense subset.

Equipped with this result, we are now able to characterize solutions of the assignment flow (39) starting in \({\mathcal {Q}}\) as reparametrized geodesics.

Definition 4.2

[15, Def. 3.7.6] Let h be a Riemannian metric on M and \(G :M \rightarrow \mathbb {R}\) a potential. Assume C is a constant such that \(G(x) < C\) holds for all \(x \in M\). Then the Jacobi metric is defined by

$$\begin{aligned} h_C:= (C - G)h. \end{aligned}$$
(120)

Theorem 4.3

[15, Thm. 3.7.7] Up to reparametrization, the base integral curves of the Lagrangian \(L(x, v) = \frac{1}{2}\Vert v\Vert _h^2 - G(x)\) with energy \(E_0\) are the same as geodesics of the Jacobi metric \(h_{E_0}\) with energy 1.

Since \(G < 0\) on \({\mathcal {Q}}\), we restrict our investigation to the Riemannian submanifold \(({\mathcal {Q}}, g|_{\mathcal {Q}})\) and set \(C:= 0\), resulting in the Jacobi metric \(h_0 = (-G) g|_{\mathcal {Q}}\) of the form

$$\begin{aligned} (h_0)_W \overset{(80)}{=} \tfrac{1}{2}\sum _{k\in {\mathcal {V}}} \textrm{Var}_{W_k} (F_k(W))\ g_W \quad \text {for any point } W \in {\mathcal {Q}}. \end{aligned}$$
(121)
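The conformal factor \(-G(W) = \tfrac{1}{2}\sum _{k\in {\mathcal {V}}} \textrm{Var}_{W_k}(F_k(W))\) in (121) equals one half of the squared Fisher–Rao norm of the assignment-flow vector field by (80). The following minimal numerical sketch, which is not part of the formal development, checks this identity for random data; the coordinate formulas used below for the replicator operator and the Fisher–Rao norm are the standard expressions consistent with (2), and all dimensions and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 4                                    # illustrative numbers of vertices and labels

W = rng.random((m, n))
W /= W.sum(axis=1, keepdims=True)              # point on the assignment manifold (rows on the simplex)
F = rng.standard_normal((m, n))                # arbitrary affinity values F(W)

# replicator operator applied row-wise: (R_W[F])_i = W_i * F_i - <W_i, F_i> W_i
RWF = W * F - (W * F).sum(axis=1, keepdims=True) * W

# squared Fisher-Rao norm of the tangent vector R_W[F] at W: sum_ij (R_W[F])_ij^2 / W_ij
fisher_rao_sq = np.sum(RWF ** 2 / W)

# sum of the per-vertex variances Var_{W_i}(F_i) = <W_i, F_i^2> - <W_i, F_i>^2
mean = (W * F).sum(axis=1)
total_var = ((W * F ** 2).sum(axis=1) - mean ** 2).sum()   # equals -2 G(W); the Jacobi factor -G(W) is total_var / 2

print(fisher_rao_sq, total_var)                # the two values agree up to rounding
assert np.allclose(fisher_rao_sq, total_var)
```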

Now, let W(t) be an integral curve of the assignment flow (39). If the initial value W(0) lies in \({\mathcal {Q}}\), then the entire integral curve W(t) remains in \({\mathcal {Q}}\). This is a consequence of Mañé critical points being equilibrium points by (115) and the fact that the assignment flow is a first-order ODE. If additionally W(t) is a critical point of the action functional \(\mathcal {L}\) from Theorem 3.3, then W(t) is a base integral curve with energy \(E_0 = 0\). Thus, Theorem 4.3 directly implies the following statement.

Proposition 4.4

Let W(t) be an integral curve of the assignment flow (39). If W(t) is a critical point of the action functional \(\mathcal {L}\) in Theorem 3.3 with initial value \(W(0)\in {\mathcal {Q}}\), then, up to reparametrization, W(t) is a geodesic of the Jacobi metric (121).

Since geodesics are locally length-minimizing, this result shows that, up to initial conditions in a set of measure zero, solutions to the assignment flow locally realize shortest paths between assignments.

Remark 4.5

It is important to note that the previous statement only holds for solutions of the assignment flow, which is a first-order ODE. A general solution of the second-order Euler–Lagrange equation (63) might leave \({\mathcal {Q}}\) in finite time and cross the set \({\mathcal {M}_\textrm{crit}}\).

Next, we consider the arc length of an integral curve W(t) from Proposition 4.4 with respect to the Jacobi metric \(h_0\). Because of

$$\begin{aligned} \Vert \dot{W}(t)\Vert _{h_0}^2 = - G(W(t)) \Vert \dot{W}\Vert ^2_g \overset{(80)}{=} 2G^2(W(t)), \end{aligned}$$
(122)

the arc length takes the form

$$\begin{aligned} \alpha (t) := \int _0^t \Vert \dot{W}(\tau )\Vert _{h_0}\, d\tau = \sqrt{2} \int _0^t |G(W(\tau ))|\, d\tau = \tfrac{1}{\sqrt{2}} \sum _{k\in {\mathcal {V}}} \int _0^t\textrm{Var}_{W_k(\tau )} \big (F_k(W(\tau ))\big )\, d\tau . \end{aligned}$$
(123)

Due to the initial condition \(W(0) \in {\mathcal {Q}}\), the solution W(t) is a regular curve, and \(\alpha \) can be used to reparametrize W(t) by arc length, thereby obtaining the actual geodesic

$$\begin{aligned} \widetilde{W}(s) := W(\alpha ^{-1}(s)) \end{aligned}$$
(124)

with respect to the Jacobi metric \(h_0\). Setting \(\widetilde{F}(\widetilde{W}):= (\sqrt{2} |G(\widetilde{W})|)^{-1} F(\widetilde{W})\), a standard calculation using the inverse function rule further reveals

$$\begin{aligned} \frac{d}{ds}\widetilde{W}(s) = \dot{W}(\alpha ^{-1}(s)) \frac{1}{\sqrt{2}|G(\widetilde{W}(s))|} = {\mathcal {R}}_{\widetilde{W}(s)}[ \widetilde{F}(\widetilde{W}(s))]. \end{aligned}$$
(125)

Thus, the geodesics \(\widetilde{W}(s)\) of the Jacobi metric are themselves solutions to an assignment flow, one in which the original affinity mapping F is rescaled by the factor \((\sqrt{2}\, |G(\widetilde{W})|)^{-1}\), that is, up to a constant, by the reciprocal of the summed per-vertex variances of its components. This allows one to investigate Riemannian properties of the Jacobi metric and its geodesics in future work.

In the next section, we directly determine the set \({\mathcal {M}_\textrm{crit}}\) for a basic instance of an assignment flow.

4.2 Admissible affinity functions

Condition (62) characterizes affinity functions F for Theorem 3.3 to hold. We contrast this condition with a simple affinity function used in prior work and directly determine the corresponding set \({\mathcal {M}_\textrm{crit}}\) of Mañé critical points.

The recent paper [18, Proposition 3.6] introduced a reparametrization, called S-flow, of the original assignment flow formulation of [2]. The distance information between each data point \(f_i \in \mathcal {F}\) and the labels \(f^*_j \in \mathcal {F}\) is collected in the data matrix

$$\begin{aligned} D \in \mathbb {R}^{m\times n}\quad \text {with}\quad D_{ij} = d_\mathcal {F}(f_i, f^*_j), \quad \text {for } i \in [m], j \in [n], \end{aligned}$$
(126)

where \(d_\mathcal {F}\) is the metric introduced in Sect. 3.1. Intuitively, the entry \(D_{ij}\) quantifies how well the data point \(f_i\) is represented by the label \(f^*_j\). For a nonnegative averaging matrix

$$\begin{aligned} \Omega \in \mathbb {R}^{m\times m} \quad \text {with}\quad \Omega \ge 0 \quad \text {and}\quad \Omega \mathbb {1}_m = \mathbb {1}_m, \end{aligned}$$
(127)

the S-flow equations read

$$\begin{aligned} \dot{S}&= {\mathcal {R}}_{S}[\Omega S],&S(0)&= \exp _{\mathbb {1}_{\mathcal {W}}}(-\Omega D), \end{aligned}$$
(128a)
$$\begin{aligned} \dot{W}&= {\mathcal {R}}_{W}[S],&W(0)&= \mathbb {1}_{\mathcal {W}}, \end{aligned}$$
(128b)

where the so-called lifting map

$$\begin{aligned} \exp _{W}:{\mathcal {T}_{0}}\rightarrow \mathcal {W},\qquad \exp _{W}&= {{\,\textrm{Exp}\,}}_{W}\circ {\mathcal {R}}_{W}, \quad W\in \mathcal {W}, \end{aligned}$$
(129a)
$$\begin{aligned} \big (\exp _{W}(V)\big )_{i}&= \frac{W_{i}\diamond e^{V_{i}}}{\langle W_{i},e^{V_{i}}\rangle },\quad i\in [m],\quad W\in \mathcal {W},\; V\in {\mathcal {T}_{0}}\end{aligned}$$
(129b)

is the composition of the mapping (42a) and the exponential map \({{\,\textrm{Exp}\,}}\) of \((\mathcal {W},g)\) with respect to the so-called e-connection of information geometry [6]. Note that both solutions S(t), W(t) evolve on \(\mathcal {W}\) and that W(t) depends on S(t) but not vice versa. Hence we focus on the system (128a) and the specific affinity function given by matrix multiplication

$$\begin{aligned} F(S) = \Omega S. \end{aligned}$$
(130)

The differential of F at \(S \in {\mathcal {W}}\) is therefore also given by matrix multiplication

$$\begin{aligned} dF|_{S}[V]=\Omega V, \end{aligned}$$
(131)

that is, condition (62) holds, in particular, if \(\Omega =\Omega ^{\top }\) is symmetric. This assumption was adopted in [18] and, in a slightly more general form, also in [7].
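To make the S-flow (128a) with the linear affinity function (130) concrete, the following sketch integrates it numerically with the simple geometric scheme \(S_{k+1} = \exp _{S_k}\!\big (h\, \Omega S_k\big )\) built from the lifting map (129b); this scheme is a common choice in the assignment-flow literature but is not discussed in the present paper. The cyclic-chain averaging matrix \(\Omega \), the random data matrix D, the step size h and the iteration count are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3                                    # illustrative numbers of vertices and labels
h, steps = 0.1, 400                            # illustrative step size and number of geometric Euler steps

D = rng.random((m, n))                         # illustrative data matrix, cf. (126)

# illustrative symmetric, row-stochastic averaging matrix: uniform weights on a cyclic chain graph
Omega = np.zeros((m, m))
for i in range(m):
    Omega[i, [i - 1, i, (i + 1) % m]] = 1.0 / 3.0

def lifting_map(W, V):
    """Lifting map (129b): (exp_W(V))_i = W_i * e^{V_i} / <W_i, e^{V_i}>, applied row-wise."""
    E = W * np.exp(V - V.max(axis=1, keepdims=True))    # row-wise shift; the map is invariant to it
    return E / E.sum(axis=1, keepdims=True)

# initialization (128a): S(0) = exp_{1_W}(-Omega D), with 1_W the barycenter of the assignment manifold
barycenter = np.full((m, n), 1.0 / n)
S = lifting_map(barycenter, -Omega @ D)

# geometric Euler steps for dS/dt = R_S[Omega S]
for _ in range(steps):
    S = lifting_map(S, h * (Omega @ S))

print(np.round(S, 3))    # rows typically become increasingly concentrated near simplex vertices
```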

Next, we determine the set \({\mathcal {M}_\textrm{crit}}\) of Mañé critical points (113) based on the condition on the right-hand side of (116). A basic calculation using the properties of \({\mathcal {P}_{\mathcal {T}_{0}}}\) and \(\Omega \) shows that these two linear operators commute, resulting in

$$\begin{aligned} {\mathcal {P}_{\mathcal {T}_{0}}}[F(W)] = {\mathcal {P}_{\mathcal {T}_{0}}}[\Omega W] = \Omega {\mathcal {P}_{\mathcal {T}_{0}}}[W] = \Omega \big ( W - {\mathbb {1}_{\mathcal {W}}}\big ) \quad \text {for all } W \in {\mathcal {W}}. \end{aligned}$$
(132)

Since the corresponding differential is just matrix multiplication by \(\Omega \), independent of \(W \in {\mathcal {W}}\),

$$\begin{aligned} d({\mathcal {P}_{\mathcal {T}_{0}}}\circ F)|_W[V] = {\mathcal {P}_{\mathcal {T}_{0}}}[\Omega V] = \Omega V \quad \text {for all } V \in {\mathcal {T}_{0}}, \end{aligned}$$
(133)

the rank r of \(d({\mathcal {P}_{\mathcal {T}_{0}}}\circ F)\) is constant. For Proposition 4.1 to hold, we need to check that the rank satisfies \(r\ge 1\). For this, denote the corresponding kernel of (133) by

$$\begin{aligned} \Sigma _\Omega := \ker \big (d({\mathcal {P}_{\mathcal {T}_{0}}}\circ F)\big ) = \{ V \in {\mathcal {T}_{0}}\ |\ \Omega V = 0\}. \end{aligned}$$
(134)

Lemma 4.6

\(\dim (\Sigma _\Omega ) = (n-1)\dim (\ker (\Omega ))\) and therefore the rank of \(d({\mathcal {P}_{\mathcal {T}_{0}}}\circ F)\) on \({\mathcal {W}}\) is \(r = (n-1) {{\,\textrm{rank}\,}}(\Omega )\).

Proof

Denote the standard basis of \(\mathbb {R}^n\) by \(e_1, \ldots , e_n\). A basis for \({T_{0}}\) (35) is then given by

$$\begin{aligned} b_i := e_i - e_n, \quad \text {for } i \in [n-1]. \end{aligned}$$
(135)

Furthermore, set \(K:= \dim (\ker (\Omega ))\) and let \(a_1, \ldots , a_K\) be a basis of \(\ker (\Omega ) \subset \mathbb {R}^m\). Then, for every \(k \in [K]\) and \(i \in [n-1]\)

$$\begin{aligned} a_k b_i^\top \mathbb {1}_n = a_k \langle b_i, \mathbb {1}_n\rangle = 0\quad \text {and}\quad \Omega a_kb_i^\top = 0, \end{aligned}$$
(136)

showing that \(a_k b_i^\top \in \Sigma _\Omega \). As the \(a_k\) and the \(b_i\) are each linearly independent, so are the outer products \(a_k b_i^\top \) for all \(k \in [K]\) and \(i \in [n-1]\). Now, let \(V \in \Sigma _\Omega \) be arbitrary. Writing V as \(V = \sum _{i \in [n]} V e_i e_i^\top \), we obtain

$$\begin{aligned} 0 \overset{(37)}{=} V\mathbb {1}_n = \sum _{i\in [n]} Ve_i \quad \Leftrightarrow \quad Ve_n = -\sum _{i\in [n-1]} Ve_i, \end{aligned}$$
(137)

which in turn shows that V can be expressed in terms of the basis \(b_i\) as \(V = \sum _{i\in [n-1]} V e_i b^\top _i\). On the other hand, the i-th column of V, given by \(Ve_i\), fulfills \(\Omega Ve_i = 0\) and can be expressed as \(Ve_i = \sum _{k\in [K]} \lambda _{ki} a_k\), with coefficients \(\lambda _{ki} \in \mathbb {R}\). Putting everything together results in \(V = \sum _{i\in [n-1]} V e_i b^\top _i = \sum _{i\in [n-1]}\sum _{k\in [K]} \lambda _{ki} a_kb^\top _i\), showing that the \(a_k b_i^\top \) indeed form a basis of \(\Sigma _\Omega \). As a result, the formulas for \(\dim (\Sigma _\Omega )\) and the rank

$$\begin{aligned} r = \dim ({\mathcal {T}_{0}}) - \dim (\Sigma _\Omega ) = (n-1)m - (n-1)\dim (\ker (\Omega )) = (n-1){{\,\textrm{rank}\,}}(\Omega ) \end{aligned}$$
(138)

follow. \(\square \)
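The rank formula of Lemma 4.6 can be checked numerically by assembling the matrix of the linear map \(V \mapsto \Omega V\) on \({\mathcal {T}_{0}}\) in the basis \(e_i b_j^\top \) used in the proof and comparing its rank with \((n-1)\,{{\,\textrm{rank}\,}}(\Omega )\). The rank-deficient matrix \(\Omega = BB^\top \) below is an illustrative construction; symmetry and row-stochasticity are not needed for the formula itself.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 6, 4, 3                                  # k is the intended rank of Omega (illustrative)

B = rng.random((m, k))
Omega = B @ B.T                                    # generically rank(Omega) = k

# basis of the tangent space T_0^m: matrices e_i b_j^T with b_j = e_j - e_n, i in [m], j in [n-1], cf. (135)
basis = []
for i in range(m):
    for j in range(n - 1):
        E = np.zeros((m, n))
        E[i, j], E[i, n - 1] = 1.0, -1.0
        basis.append(E)

# matrix of the linear map V -> Omega V (which equals P_{T_0}[Omega V], cf. (133)) in this basis
L = np.column_stack([(Omega @ E).ravel() for E in basis])

r = np.linalg.matrix_rank(L)
print(r, (n - 1) * np.linalg.matrix_rank(Omega))   # both values equal 9 = (n-1) * rank(Omega)
assert r == (n - 1) * np.linalg.matrix_rank(Omega)
```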

As a consequence of \({{\,\textrm{rank}\,}}(\Omega ) \ge 1\) by (127), a lower bound on the rank r of \(d({\mathcal {P}_{\mathcal {T}_{0}}}\circ F)\) is given by \(r \ge n-1 \ge 1\). Therefore, Proposition 4.1 applies and \({\mathcal {M}_\textrm{crit}}\) for the S-flow is a submanifold of \({\mathcal {W}}\) with measure zero. The expression of \({\mathcal {P}_{\mathcal {T}_{0}}}\circ F\) in terms of \(\Omega \) from (132) and the fact that \(W -{\mathbb {1}_{\mathcal {W}}}\) lies in \({\mathcal {T}_{0}}\) for all \(W \in {\mathcal {W}}\) allow an explicit characterization of \({\mathcal {M}_\textrm{crit}}\) as the intersection of an affine subspace with \({\mathcal {W}}\)

$$\begin{aligned} {\mathcal {M}_\textrm{crit}}\overset{(118)}{=} ({\mathcal {P}_{\mathcal {T}_{0}}}\circ F)^{-1}(0) \overset{(132)}{=} ({\mathbb {1}_{\mathcal {W}}}+ \Sigma _\Omega ) \cap {\mathcal {W}}\end{aligned}$$
(139)

with dimension

$$\begin{aligned} \dim ({\mathcal {M}_\textrm{crit}}) \overset{(119)}{=} \dim ({\mathcal {W}}) - r \overset{\text { Lem. }4.6}{=} \dim (\Sigma _\Omega ) \le (m-1)(n-1). \end{aligned}$$
(140)

As \(\Omega \) is given, \({\mathcal {M}_\textrm{crit}}\) can be constructed explicitly once a basis of \(\ker (\Omega )\) has been computed. Therefore, we are able to check whether \(S(0) \notin {\mathcal {M}_\textrm{crit}}\), in which case the corresponding integral curve S(t) of the S-flow (128a) is a reparametrized geodesic of the Jacobi metric (121) with energy \(E_0 = 0\), according to Theorem 4.3.
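As an illustration, the following sketch performs this check for a simple uniform averaging matrix: it computes \(S(0) = \exp _{{\mathbb {1}_{\mathcal {W}}}}(-\Omega D)\) as a row-wise softmax and tests membership in \({\mathcal {M}_\textrm{crit}}\) via (132) and (139), i.e. by evaluating \(\Omega \big (S(0) - {\mathbb {1}_{\mathcal {W}}}\big )\). The particular \(\Omega \) and D are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 3

D = rng.random((m, n))                         # illustrative data matrix (126)
Omega = np.full((m, m), 1.0 / m)               # illustrative symmetric, row-stochastic averaging matrix

# S(0) = exp_{1_W}(-Omega D): row-wise softmax of -Omega D, cf. (128a) and (129b)
V = -Omega @ D
E = np.exp(V - V.max(axis=1, keepdims=True))
S0 = E / E.sum(axis=1, keepdims=True)

# by (132) and (139): S0 lies in M_crit iff Omega (S0 - 1_W) = 0, with 1_W the barycenter
residual = Omega @ (S0 - 1.0 / n)
print(np.linalg.norm(residual))                # nonzero for generic D, hence S(0) is not a Mane critical point
```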

We conclude this section with another observation that should stimulate future work. Under the aforementioned symmetry assumption on the averaging matrix, a continuous-domain approach was studied in [18], corresponding to (128a) at ‘spatial scale zero’. The latter means considering only parameter matrices \(\Omega \) in (128a) whose sparse row vectors \(\Omega _{i}\) encode nearest-neighbor interactions of \(S_{i}\) and \(\{S_{k}:k\sim i\}\) on an underlying regular grid graph, and regarding the right-hand side of (128a) as a discretized Riemannian gradient of a continuous-domain variational approach with pointwise defined variables. Specifically, replacing \(i\in \mathcal {V}\) by locations \(x\in U\subset \mathbb {R}^{d}\), the vector field \(S :\mathcal {V} \rightarrow {\mathcal {S}}\), \(i \mapsto S_i\), becomes a simplex-valued vector field \(S :U \rightarrow {\mathcal {S}}\), \(x \mapsto S(x)\), that has to solve a variational inequality. Besides analyzing the existence of a minimizer in a suitable function space and a corresponding dedicated numerical algorithm, [18] presented a partial differential equation, derived heuristically under overly strong regularity assumptions, that is supposed to characterize any minimizer \(S^{*}\) and reads

$$\begin{aligned} R_{S^{*}}(-\Delta S^{*}-\alpha S^{*}) = 0, \end{aligned}$$
(141)

where \(R_{S^{*}}\) applies \(R_{S^{*}(x)}\) pointwise to the vector \((-\Delta S^{*}-\alpha S^{*})(x)\) at every \(x\in U\), in the same way as the mapping \({\mathcal {R}}_{W}\) defined by (42a) amounts to applying the mappings (42b) at every vertex \(i\in \mathcal {V}\).

From this viewpoint, condition (62),

$$\begin{aligned} 0 = {\mathcal {R}}_{W(t)}\circ (dF|_{W(t)}-dF|^{*}_{W(t)})\circ {\mathcal {R}}_{W(t)}[F(W(t))], \end{aligned}$$
(142)

that was shown to be equivalent to the Euler–Lagrange equation (63), should become the spatially discrete but nonlocal analogue of (141) in the limit \(t\rightarrow \infty \). We leave the exploration of this observation for future work.

4.3 Geometric dynamics versus optimization

In contrast to classical approaches to the labeling problem, the presented geometric dynamical formulation does not rely on finding maximizers of a task-specific objective function, but instead depends solely on the Lagrangian dynamics governing the inference process. In the following, this is discussed in more detail.

Classical approaches to image labeling [24] are usually formulated as minimization problems \(\min _X J(X)\) of (preferably convex) objective functions, where global minimizers are associated with meaningful label assignments. As a consequence, the minimizers themselves are the solution of the labeling problem, independent of any specific optimization strategy used to find or approximate them.

In [18, Prop. 3.9, Prop. 3.10] it was shown that if the averaging matrix \(\Omega \) is symmetric, \(\Omega =\Omega ^{\top }\), then the above-mentioned S-flow (128a) is actually a Riemannian gradient ascent flow with respect to the function

$$\begin{aligned} J(S) = \tfrac{1}{2}\langle S, \Omega S\rangle = \tfrac{1}{2}\Vert S\Vert _2^2 - \tfrac{1}{4}\sum _{i\in {\mathcal {V}}}\sum _{j\in {\mathcal {N}}_i} \Omega _{ij} \Vert S_i - S_j\Vert _2^2. \end{aligned}$$
(143)

Similar to the continuous case in [18, Prop. 4.2], it can be shown that the global maximizers of J are spatially constant assignments, i.e. every node in the graph receives the same label. This can be seen directly from the right-hand side expression for J in (143): for J to attain its supremum, the first term \(\Vert S\Vert _2^2\) has to be maximal, which happens precisely if every \(S_i\) is a standard basis vector, and the second term \(\sum _{i\in {\mathcal {V}}}\sum _{j\in {\mathcal {N}}_i} \Omega _{ij} \Vert S_i - S_j\Vert _2^2\) has to be minimal (zero), which happens precisely if all the \(S_i\) coincide, that is, if S is spatially constant.
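The second equality in (143) relies on \(\Omega \) being both symmetric and row-stochastic. A quick numerical check of this identity, with an illustrative uniform averaging matrix on a cyclic chain graph and a random point S, reads as follows.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 7, 4

# illustrative symmetric, row-stochastic Omega: uniform weights on a cyclic chain graph
Omega = np.zeros((m, m))
for i in range(m):
    Omega[i, [i - 1, i, (i + 1) % m]] = 1.0 / 3.0

S = rng.random((m, n))
S /= S.sum(axis=1, keepdims=True)              # arbitrary point on the assignment manifold

lhs = 0.5 * np.sum(S * (Omega @ S))            # 1/2 <S, Omega S>

pairwise = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)   # ||S_i - S_j||^2 for all pairs (i, j)
rhs = 0.5 * np.sum(S ** 2) - 0.25 * np.sum(Omega * pairwise)

print(lhs, rhs)                                # the two values agree, confirming (143) for this Omega
assert np.allclose(lhs, rhs)
```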

Therefore, in contrast to the above-mentioned classical methods, we are not interested in maximizers of the function J, as they generally do not represent meaningful assignments. Indeed, any nontrivial assignment the S-flow S(t) converges to (which is observed experimentally [7, 18]) cannot be a maximizer of J. Rather, the integral curves themselves, that is, the inference process governed by the spatially coupled replicator dynamics, are the crucial element responsible for producing meaningful label assignments as limit points. This highlights the importance of the Lagrangian-mechanical viewpoint of the assignment flow. The second-order formulation of the dynamics allows one to relate the assignment flow to other rich areas of mathematics and theoretical physics, with the aim of further investigating and revealing its properties for metric data labeling.

4.4 Directly related work

In [16, Thm. 2.1], the authors claim that the solutions of all uncoupled equations of the form \(\dot{p} = {R}_p F(p)\) on a single simplex, \(p(t) \in {\mathcal {S}}\), satisfy the Euler–Lagrange equation associated with the cost functional

$$\begin{aligned} \mathcal {L}(p) := \int _{t_0}^{t_1} \tfrac{1}{2}\Vert \dot{p}(t)\Vert _g^2 + \tfrac{1}{2}\Vert {R}_{p(t)} F(p(t))\Vert _g^2 dt \quad \text {for curves } p :[t_0, t_1] \rightarrow {\mathcal {S}}. \end{aligned}$$
(144)

In our present paper, we derive a more general result (Theorem 3.3) for a system (1) of coupled equations from the viewpoint of geometric mechanics on manifolds, of which (144) is a (very) special case. In particular, we derive a necessary condition (62) that is missing in [16], which any affinity function F has to satisfy for the assertion of Theorem 3.3 to hold. This latter result yields an interpretation of stationary points of the action functional as solutions of the Euler–Lagrange equation (63).

It can be shown that in the case of \(n = 2\) labels, any fitness function F indeed fulfills condition (62) and therefore also the Euler–Lagrange equation. However, for \(n > 2\) labels this is no longer true, as the following counterexample demonstrates.

Suppose we have more than two labels, i.e. \(n>2\), and first consider the case of a single node, \(m = |{\mathcal {V}}| = 1\), that is, an uncoupled replicator equation on a single simplex. Define the matrix \(F:= e_2 e_1^\top \), where \(e_i\) are the standard basis vectors of \(\mathbb {R}^n\). Thus, the affinity function is a linear map

$$\begin{aligned} F:{\mathcal {S}}\rightarrow \mathbb {R}^n, \quad p = (p^1, \ldots , p^n)^\top \mapsto Fp = p^1 e_2 \end{aligned}$$
(145)

fulfilling \(d F_p = F\) and \(d F_p^* = F^\top \). A short calculation using the relation \({R}_p e_i = p^i (e_i - p)\) (Einstein summation convention is not used) shows that the first coordinate of condition (62) takes the form

$$\begin{aligned} \big ({R}_p (F - F^\top ){R}_p F p\big )^1 = - (p^1)^2 p^2 (1 - p^1 - p^2) \ne 0,\quad \text {for all } p \in {\mathcal {S}}. \end{aligned}$$

This example generalizes to the case \(m > 1\) by defining the linear affinity function \(\mathcal {F}[W]\) componentwise by \((\mathcal {F}[W])_i:= F W_i,\,i\in [m]\).
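Both claims, that condition (62) always holds for \(n = 2\) labels and that the affinity map (145) violates it for \(n > 2\), can be verified numerically. The sketch below uses the matrix representation \(R_p = \textrm{Diag}(p) - pp^\top \) of the replicator operator, consistent with (2), and compares the first coordinate of the left-hand side of (62) with the closed-form expression above; the random points p are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def replicator_matrix(p):
    """R_p = Diag(p) - p p^T, so that R_p v = Diag(p) v - <p, v> p."""
    return np.diag(p) - np.outer(p, p)

def random_simplex_point(n):
    p = rng.random(n)
    return p / p.sum()

# n = 2 labels: R_p (A - A^T) R_p vanishes for any matrix A, so condition (62) always holds
p2 = random_simplex_point(2)
A = rng.standard_normal((2, 2))
R2 = replicator_matrix(p2)
assert np.allclose(R2 @ (A - A.T) @ R2, 0.0)

# n = 3 labels: the affinity map F = e_2 e_1^T from (145) violates condition (62)
n = 3
F = np.zeros((n, n))
F[1, 0] = 1.0                                  # F = e_2 e_1^T in 0-based indexing
p = random_simplex_point(n)
Rp = replicator_matrix(p)

lhs = Rp @ (F - F.T) @ Rp @ (F @ p)            # left-hand side of condition (62), uncoupled case
closed_form = -(p[0] ** 2) * p[1] * (1.0 - p[0] - p[1])

print(lhs[0], closed_form)                     # equal and nonzero for p in the interior of the simplex
assert np.allclose(lhs[0], closed_form) and abs(lhs[0]) > 1e-12
```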

5 Conclusion

In this work, we generalized a previous result on uncoupled replicator equations from [16] to the case of coupled replicator equations. The viewpoint of Lagrangian mechanics on manifolds resulted in an interpretable Euler–Lagrange equation (63) and provided the mathematical tools to derive condition (62), which characterizes those affinity maps F that result in critical points of the action functional (61). Accordingly, a counterexample constructed with the specific affinity map (145) highlights that not all affinity maps F lead to critical points. Using the Legendre transformation, we also calculated an explicit expression for the associated Hamiltonian system in terms of the corresponding Lagrangian system (106).

Finally, the geometric mechanics perspective enabled the insight that, ignoring a set of starting points of measure zero, solutions to the assignment flow are reparametrized geodesics of the Jacobi metric (121). Thus, in a certain sense, these solutions locally connect assignment states in an optimal way by realizing a shortest path.

Our results provide a basis for exploring analogies to mathematical representations of interacting particle systems in theoretical physics in future work. In addition, transformations motivated by the underlying symplectic theory [25, 26], which has provided further insight into optimal transport and the corresponding Wasserstein geometry, might also be worth exploring from the viewpoint of information geometry as exploited in our work. This may further enhance our understanding of dynamical and learning systems, such as deep neural networks, that reveal structures in metric data.