On the Geometric Mechanics of Assignment Flows for Metric Data Labeling

Metric data labeling refers to the task of assigning one of multiple predefined labels to every given datapoint based on the metric distance between label and data. This assignment of labels typically takes place in a spatial or spatio-temporal context. Assignment flows are a class of dynamical models for metric data labeling that evolve on a basic statistical manifold, the so-called assignment manifold, governed by a system of coupled replicator equations. In this paper we generalize the result of a recent paper for uncoupled replicator equations and, adopting the viewpoint of geometric mechanics, relate assignment flows to critical points of an action functional via the associated Euler-Lagrange equation. We also show that not every assignment flow is a critical point and characterize precisely the class of coupled replicator equations fulfilling this relation, a condition that has been missing in recent related work. Finally, some consequences of this connection to Lagrangian mechanics are investigated, including the fact that assignment flows are, up to initial conditions of measure zero, reparametrized geodesics of the so-called Jacobi metric.


INTRODUCTION
1.1. Overview, Motivation. Semantic image segmentation, a.k.a. image labeling, denotes the problem of partitioning an image into meaningful parts. Applications abound and include interpretation of traffic scenes by computer vision systems, medical image analysis, remote sensing, etc. The state of the art is based on deep networks that were trained on very large data sets. A recent survey [MBP+21] reviews a vast number of different deep network architectures and their empirical performance on various benchmark data sets. Among the challenges discussed in [MBP+21, Sec. 6.3], the authors write: "... a concrete study of the underlying behavior / dynamics of these models is lacking. A better understanding of the theoretical aspects of these models can enable the development of better models curated toward various segmentation scenarios."
In [APSS17], a class of dynamical systems for image labeling, called assignment flows, was introduced in order to contribute to the mathematics of deep networks and learning. We refer to Section 3 for a precise definition and to [Sch20] for a review of recent related work. Assignment flows correspond to solutions $W(t)$ of a high-dimensional system of coupled ordinary differential equations (ODEs) of the form
$\dot W = R_W[F(W)]$ (1.1)
that evolve on the so-called assignment manifold $\mathcal{W}$. Each ODE of this system is a replicator equation [HS03, San10]
$\dot W_{ij} = W_{ij}\big(F_{ij}(W) - \sum_{l \in [n]} W_{il} F_{il}(W)\big)$, $j \in [n] := \{1, \dots, n\}$, (1.2)
whose solution $W_i(t) \in S := \operatorname{rint} \Delta_{n-1} \subset \mathbb{R}^n_+$ evolves on the relative interior of the probability simplex that is equipped with the Fisher-Rao metric $g$ [AN00] and is labeled by a vertex $i \in V$ of an underlying graph $\mathcal{G} = (V, E)$. The assignment manifold $\mathcal{W} = S \times \dots \times S$ is the product of the Riemannian manifolds $(S, g)$ with respect to all vertices $i \in V$.
The essential components of the vector field of (1.2) are the affinity functions $F_{ij}: \mathcal{W} \to \mathbb{R}$ that measure the affinity (fitness, etc.) of the classes (types, species, etc.) $j \in [n]$. The differences of these affinity values to their expected (or average) value on the right-hand side of (1.2), together with the multiplication by $W_{ij}$, define the replicator equation. For suitably defined affinity functions, the solution of this equation is supposed to perform a selection of some class $j$: $W_i(t)$ converges for $t \to \infty$ to a vertex $e_j$ of $\Delta_{n-1}$ and in this sense encodes the decision to assign the class label $j$ to the vertex $i \in V$ and to any data indexed by $i$, e.g. the color value in some image; see Section 3.2 for more details.
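The selection behavior described above can be illustrated on a single simplex. The following is a minimal sketch, not the scheme used in the paper: it integrates the replicator equation with a plain explicit Euler step, and the constant affinity vector, the step size `h` and the number of steps are hypothetical choices made only for illustration.

```python
def replicator_rhs(p, f):
    """Right-hand side p_j * (f_j - <p, f>) of a single replicator equation."""
    mean_f = sum(pj * fj for pj, fj in zip(p, f))
    return [pj * (fj - mean_f) for pj, fj in zip(p, f)]


def integrate(p0, affinity, h=0.01, steps=5000):
    """Explicit Euler integration; `affinity` maps p to a fitness vector."""
    p = list(p0)
    for _ in range(steps):
        v = replicator_rhs(p, affinity(p))
        p = [pj + h * vj for pj, vj in zip(p, v)]
    return p


# Hypothetical constant affinities: the second class has the largest fitness.
p_final = integrate([1 / 3, 1 / 3, 1 / 3], lambda p: [0.2, 1.0, 0.5])
print(p_final)
```

In this toy setting the mass concentrates on the class with the largest affinity, i.e. the iterate approaches the corresponding vertex of the simplex while staying in its interior.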
The basic idea underlying the assignment flow approach (1.1) is to assign a replicator equation to each vertex of an underlying graph and to couple these equations through smooth nonlinear interactions of the assignment vectors $\{W_k : k \in N_i \subset V\}$ within neighborhoods $N_i$ around each vertex $i \in V$. This is why the argument of $F_i$ in (1.2) is $W$ rather than $W_i$. As a consequence, dynamic label assignments are performed by solving (1.1) at each vertex $i$ depending on the context in terms of all other decisions. The fact that $W(t)$ assigns class labels at each vertex when $t \to \infty$ is not clear a priori but depends on $F$. We refer to [ZZS21] for the study of a basic instance of $F$ and sufficient conditions that ensure unique labeling decisions.
The connection to deep networks results from approximating the flow by geometric integration. The simplest such scheme among a range of proper schemes [ZSPS20], the geometric Euler scheme with discrete time index $t$ and stepsize $h^{(t)}$, yields the iterative update rule
$W_i^{(t+1)} = \mathrm{Exp}_{W_i^{(t)}} \circ R_{W_i^{(t)}}\big(h^{(t)} F_i(W^{(t)})\big)$, $i \in V$, (1.3)
where $\mathrm{Exp}: T\mathcal{W} \to \mathcal{W}$ denotes the exponential map of the so-called e-connection of information geometry [AN00, AJLS17]. The key observation to be made here is that for the choice of a linear affinity map $F$ the right-hand side of (1.3) involves the two essential ingredients of most deep network architectures: (1) a linear operation at each vertex of the underlying graph parametrized by network parameters, here given as part of the definition of the linear affinity map $F$; (2) a pointwise smooth nonlinearity, here given by the exponential and replicator maps $\mathrm{Exp}_{W_i} \circ R_{W_i}$. The connection between general continuous-time ODEs and deep networks has been picked out as a central theme by [HR17, CRBD18] and classifies the assignment flow as a particular 'Neural ODE'. The above-mentioned limited understanding of what deep networks really do underlines the importance of characterizing and understanding the dynamics (1.1) of assignment flows.
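The two ingredients named above, a linear map followed by a pointwise nonlinearity, can be made concrete with a small sketch of the geometric Euler iteration. It rests on assumptions that are not stated in this section: the composition of exponential and replicator map is taken in the closed form $p \diamond e^{v} / \langle p, e^{v} \rangle$ known from the assignment flow literature, the affinity map is the linear map $F(W) = \Omega W$, and the weights `omega`, the step size and the initial point are hypothetical.

```python
import math


def lifting_map(p, v):
    """Assumed closed form of Exp_p(R_p v): p * exp(v), renormalized."""
    q = [pj * math.exp(vj) for pj, vj in zip(p, v)]
    z = sum(q)
    return [qj / z for qj in q]


def linear_affinity(omega, W):
    """Linear affinity map F(W) = omega @ W (the 'linear layer')."""
    m, n = len(W), len(W[0])
    return [[sum(omega[i][k] * W[k][j] for k in range(m)) for j in range(n)]
            for i in range(m)]


def geometric_euler(W0, omega, h=0.5, steps=200):
    """Iterate W_i <- lifting_map(W_i, h * F_i(W)) at every vertex i."""
    W = [row[:] for row in W0]
    for _ in range(steps):
        F = linear_affinity(omega, W)
        W = [lifting_map(Wi, [h * fij for fij in Fi]) for Wi, Fi in zip(W, F)]
    return W


# Hypothetical path graph with three vertices and two labels.
omega = [[0.6, 0.4, 0.0],
         [0.4, 0.2, 0.4],
         [0.0, 0.4, 0.6]]
W0 = [[0.60, 0.40],
      [0.45, 0.55],
      [0.70, 0.30]]
W_final = geometric_euler(W0, omega)
print(W_final)
```

Each iteration is a matrix multiplication followed by a softmax-like normalization at every vertex, which is the structural analogy to a network layer. In this toy instance the middle vertex, which initially favors the second label, is drawn toward the label preferred by its neighbors.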
1.2. Contribution, Organization. The aim of this paper is to exhibit a natural Lagrangian $L: T\mathcal{W} \to \mathbb{R}$ of the form kinetic minus potential energy and to characterize solutions $W(t)$ to (1.1) as stationary points of a corresponding action functional
$\mathcal{L}(W) = \int_{t_0}^{t_1} L\big(W(t), \dot W(t)\big)\, dt$. (1.4)
Our result generalizes the result of a recent paper [RK18], where an action functional was introduced for the evolution $p(t)$ of a single discrete probability vector on the corresponding probability simplex. By contrast, equation (1.1) couples the evolution of a (typically large) number of assignment vectors across the underlying graph. In particular, we characterize precisely the admissible class of affinity functions $F$ that establishes the connection between (1.4) and the corresponding Euler-Lagrange equation, a condition that is missing in [RK18]; see also Section 4.2. Furthermore, we show that except for starting points in a specific set of measure zero, the set of Mañé critical points, solutions of the assignment flow are reparametrized geodesics of the so-called Jacobi metric. Finally, using the Legendre transform, we compute an explicit expression of the Hamiltonian system associated to (1.4) in the form of the equivalent Lagrangian system on $T\mathcal{W}$.
The paper is organized as follows. Section 2 collects basic notions of geometric mechanics that are required in the remainder of the paper. The assignment flow and our novel results are presented in Section 3, followed by a discussion in Section 4. We conclude in Section 5.
Depending on the arguments, $\langle a, b \rangle$ denotes the Euclidean inner product of vectors or the inner product $\langle A, B \rangle = \operatorname{tr}(A^\top B)$ of matrices inducing the Frobenius norm $\|A\|_F = \langle A, A \rangle^{1/2}$. The identity matrix is denoted by $I_d \in \mathbb{R}^{d \times d}$ and the $i$-th row vector of any matrix $A$ by $A_i$.
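The linear dependence of a mapping $F$ on its argument $x$ is indicated by square brackets $F[x]$; if $F$ is just a matrix we simply write $Fx$. The adjoint of a linear operator $F: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ with respect to the standard matrix inner product on $\mathbb{R}^{m \times n}$ is denoted by $F^*$ and fulfills
$\langle F[A], B \rangle = \langle A, F^*[B] \rangle$ for all $A, B \in \mathbb{R}^{m \times n}$. (1.5)
Inequalities between vectors or matrices are to be understood componentwise. For $a, b \in \mathbb{R}^d$, we denote componentwise multiplication (Hadamard product) by
$a \diamond b := (a_1 b_1, \dots, a_d b_d)^\top$ (1.6)
and componentwise division, for $b \neq 0$, by $\frac{a}{b} = \big(\frac{a_1}{b_1}, \dots, \frac{a_d}{b_d}\big)^\top$. We further set
$a^{\diamond k} := a^{\diamond (k-1)} \diamond a$ and $a^{\diamond 0} := 1_d$. (1.7)
Finally, if $p \in \mathbb{R}^d$ is a probability vector, i.e. $p \geq 0$ and $\langle p, 1_d \rangle = 1$, then the expected value and variance of a vector $a \in \mathbb{R}^d$ (interpreted as a random variable $a: [d] \to \mathbb{R}$) are
$\mathbb{E}_p[a] := \langle p, a \rangle$ and $\mathrm{Var}_p(a) := \mathbb{E}_p[a^{\diamond 2}] - (\mathbb{E}_p[a])^2$. (1.8)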
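As a small worked example of this notation, the following sketch evaluates the expected value and variance of a vector with respect to a probability vector. It assumes the standard definitions $\mathbb{E}_p[a] = \langle p, a \rangle$ and $\mathrm{Var}_p(a) = \mathbb{E}_p[a^{\diamond 2}] - (\mathbb{E}_p[a])^2$; the function names and the numbers are hypothetical.

```python
def hadamard_power(a, k):
    """Componentwise power a^{<>k}; k = 0 gives the all-ones vector."""
    return [x ** k for x in a]


def expectation(p, a):
    """E_p[a] = <p, a> for a probability vector p."""
    return sum(pi * ai for pi, ai in zip(p, a))


def variance(p, a):
    """Var_p(a) = E_p[a^{<>2}] - (E_p[a])^2."""
    return expectation(p, hadamard_power(a, 2)) - expectation(p, a) ** 2


p = [0.5, 0.25, 0.25]
a = [1.0, 2.0, 4.0]
mean_a = expectation(p, a)
var_a = variance(p, a)
print(mean_a, var_a)
```

For these values the mean is 2 and the second moment is 5.5, so the variance is 1.5.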

ELEMENTS FROM GEOMETRIC MECHANICS
In this section, we collect from [AM87, Ch. 3] some basic notions of geometric mechanics that are required in subsequent sections.
2.1. Hamiltonian Systems. Let $(N, \omega)$ be a symplectic manifold with the symplectic two-form $\omega$, and let $H: N \to \mathbb{R}$ be a smooth function, called the Hamiltonian. The Hamiltonian vector field $X_H$ corresponding to $H$ is defined as symplectic gradient by
$dH = \omega(X_H, \cdot)$. (2.1)
A curve $\gamma$ on $N$ is an integral curve of $X_H$, i.e.
$\dot\gamma(t) = X_H(\gamma(t))$, (2.2)
if and only if in Darboux coordinates $(q^1, \dots, q^n, p_1, \dots, p_n)$ for $\omega$, the Hamiltonian equations hold for the curve $\gamma(t) = (q(t), p(t))$,
$\dot q^i(t) = \frac{\partial H}{\partial p_i}(q(t), p(t))$ and $\dot p_i(t) = -\frac{\partial H}{\partial q^i}(q(t), p(t))$, for all $i \in [n]$. (2.3)
The value of the Hamiltonian $H(\gamma(t))$ (also called energy) is constant along integral curves of $X_H$. For any smooth manifold $M$, the cotangent bundle $(T^*M, \omega_{can})$ is a basic instance of the above situation, with the canonical symplectic form $\omega_{can}$. Thus any smooth function $H: T^*M \to \mathbb{R}$ gives rise to a Hamiltonian system, where $T^*M$ is interpreted as momentum phase space and $H$ represents an energy.
2.2. Lagrangian Systems. Suppose $M$ is a smooth manifold. Similar to Hamiltonian systems on momentum phase space $T^*M$, there is a related concept on the tangent bundle $TM$, interpreted as velocity phase space. In this context, a smooth function $L: TM \to \mathbb{R}$ is called Lagrangian. For a given point $x \in M$, denote the restriction of $L$ to the fiber $T_xM$ by $L_x := L|_{T_xM}: T_xM \to \mathbb{R}$. The fiber derivative of $L$ is defined as
$\mathbb{F}L: TM \to T^*M$, $(x, v) \mapsto dL_x|_v$, (2.4)
where $dL_x|_v \in T_x^*M$ denotes the differential of $L_x$ at $v \in T_xM$ under the canonical identification $T_v(T_xM) \cong T_xM$. The Lagrangian two-form $\omega_L$ is defined as the pullback of the canonical symplectic form $\omega_{can}$ on the cotangent bundle $T^*M$ under the fiber derivative $\mathbb{F}L$,
$\omega_L := (\mathbb{F}L)^* \omega_{can}$. (2.5)
According to [AM87, Prop. 3.5.9], $\omega_L$ is a symplectic form on $TM$ if and only if $L$ is a regular Lagrangian. In the following, we only consider regular Lagrangians. The action associated to a Lagrangian $L: TM \to \mathbb{R}$ is defined by
$A: TM \to \mathbb{R}$, $A(x, v) := \mathbb{F}L(x, v)[v]$, (2.6)
and the energy function by $E := A - L$, that is
$E(x, v) = \mathbb{F}L(x, v)[v] - L(x, v)$. (2.7)
The Lagrangian vector field for $L$ is the unique vector field $X_E$ on $TM$ satisfying
$dE = \omega_L(X_E, \cdot)$. (2.8)
Since we assume $L$ to be regular, $X_E$ is nothing else than the symplectic gradient of $E$ with respect to $\omega_L$. A curve $\gamma(t) = (x(t), v(t))$ on $TM$ is an integral curve of $X_E$, i.e.
(2.12) Accordingly, the Hamiltonian vector field $X_H$ on $T^*M$ and the Lagrangian vector field $X_E$ on $TM$ are $\mathbb{F}L$-related [AM87, Thm. 3.6.2], that is
$(\mathbb{F}L)_* X_E = X_H$, (2.13)
and thus integral curves of $X_E$ are mapped to integral curves of $X_H$ and vice versa. Furthermore, the base integral curves of $X_E$ and $X_H$ coincide. Therefore, as a consequence of (2.12) and for a hyperregular Lagrangian $L$, the energy $E$ is just another representation of the corresponding Hamiltonian $H$.
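2.4. Mechanics on Riemannian Manifolds. Let $(M, h)$ be a Riemannian manifold. Suppose a smooth function $G: M \to \mathbb{R}$, called potential, is given and consider the Lagrangian
$L(x, v) := \frac{1}{2} \|v\|_h^2 - G(x)$. (2.14)
It then follows (see [AM87, Sec. 3.7] or by direct computation) that the fiber derivative of $L$ is the canonical isomorphism
$\mathbb{F}L = h^\flat: TM \to T^*M$. (2.15)
Hence the Lagrangian $L$ is hyperregular with action $A$ and energy $E = A - L$ given by
$A(x, v) = \|v\|_h^2$ and $E(x, v) = \frac{1}{2} \|v\|_h^2 + G(x)$.
Proposition 2.1. A curve $x(t)$ on $M$ is a critical point of the action functional associated with the Lagrangian (2.14) if and only if it satisfies the Euler-Lagrange equation
$D_t^h \dot x(t) = -\operatorname{grad}_h G(x(t))$,
where $D_t^h = \nabla^h_{\dot x}$ is the covariant derivative along $x$ with respect to the Riemannian (Levi-Civita) connection $\nabla^h$. Here, $\operatorname{grad}_h G$ denotes the Riemannian gradient of the potential $G$.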

MECHANICS OF ASSIGNMENT FLOWS
In this section, we get back to the scenario of image labeling informally introduced in Section 1.1. Section 3.2 completes the definition of the assignment flow approach (1.1). The assignment manifold underlying the assignment flow is introduced in Section 3.1, together with the Fisher-Rao metric in Section 3.3. We state and prove the main result of this paper in Section 3.4.
3.1. Assignment Manifold. Let $\mathcal{G} = (V, E)$ be a given graph with $m := |V|$ nodes. (3.1)
Assume that for every node $i \in V$ some data point $f_i$ is given in a metric space $(F, d_F)$, together with a set $F^* = \{f^*_1, \dots, f^*_n\} \subset F$ of predefined prototypes, also called labels. Context-based metric data classification or labeling refers to the task of assigning to each node $i \in V$ a suitable label in $F^*$, based on the metric distance to the given data $f_i$ and the relation between data points encoded by the edge set $E$.
As introduced in Section 1.1, for every $i \in V$ the assignment of labels $F^*$ to a data point $f_i$ is represented by an assignment vector $W_i(t)$, where the $j$-th entry $W_i^j(t)$ represents the probability for the $j$-th label $f^*_j$. These assignment vectors are determined by (1.2) and evolve on the relative interior of the $(n-1)$-simplex
$S := \{p \in \mathbb{R}^n : p > 0, \ \langle p, 1_n \rangle = 1\}$ (3.3)
with barycenter
$1_S := \frac{1}{n} 1_n$. (3.4)
Accordingly, all probabilistic label choices on the graph are encoded as a single point $W \in \mathcal{W}$ on the product space
$\mathcal{W} := S \times \dots \times S$ ($m = |V|$ factors) (3.5)
with barycenter
$1_{\mathcal{W}} := (1_S, \dots, 1_S)^\top$. (3.6)
Thus, the $i$-th component of $W = (W_k)_{k \in V}$ represents the probability distribution of label assignments for node $i \in V$,
$W_i = (W_i^1, \dots, W_i^n)^\top \in S$. (3.7)
In the following, we always identify the space $\mathcal{W}$ from (3.5) with its matrix embedding (3.8), obtained by regarding the $i$-th component $W_i$ of a point $W = (W_k)_{k \in V}$ in (3.5) as the $i$-th row of a matrix in $\mathbb{R}^{m \times n}$. Hence points $W \in \mathcal{W}$ are viewed as row-stochastic matrices with full support, called assignment matrices, with assignment vectors (3.7) as row vectors. The barycenter (3.6) can then also be expressed as a matrix
$1_{\mathcal{W}} = \frac{1}{n} 1_m 1_n^\top$. (3.9)
The tangent space of $S \subset \mathbb{R}^n$ from (3.3) at any point $p \in S$ is identified as
$T_p S = \{v \in \mathbb{R}^n : \langle v, 1_n \rangle = 0\} =: T_0$. (3.10)
Hence $T_pS$ is represented by the same vector subspace $T_0$ of codimension 1, for all $p \in S$. In particular, the tangent bundle is trivial,
$TS = S \times T_0$. (3.11)
Viewing $\mathcal{W}$ as an embedded submanifold of $\mathbb{R}^{m \times n}$ by (3.8), we accordingly identify
$T_W \mathcal{W} = \{V \in \mathbb{R}^{m \times n} : V 1_n = 0\} =: \mathcal{T}_0$. (3.12)
With this identification the tangent bundle is also trivial,
$T\mathcal{W} = \mathcal{W} \times \mathcal{T}_0$. (3.13)
3.2. Assignment Flows. Assignment flows are dynamical systems on $\mathcal{W}$ for inferring probabilistic label assignments that gradually become unambiguous label assignments as $t \to \infty$. These dynamical systems have the form
$\dot W(t) = R_{W(t)}\big[F(W(t))\big]$, (3.14)
where
$R_W: \mathbb{R}^{m \times n} \to \mathcal{T}_0$, $(R_W[X])_i := R_{W_i} X_i$, $i \in V$, (3.17a)
is the linear replicator map defined componentwise via the replicator matrix
$R_p := \mathrm{Diag}(p) - p p^\top$, $p \in S$. (3.17b)
The function $F: \mathcal{W} \to \mathbb{R}^{m \times n}$ couples the evolution of the individual assignment vectors $W_i$, $i \in V$, over the graph, typically by reinforcing tangent directions of similar assignment vectors, and is therefore called affinity or similarity mapping. Each choice of a similarity mapping $F$ defines a particular assignment flow; see Section 4.1.2 for a basic instance. Our main result stated in Section 3.4 characterizes a general class of admissible similarity mappings $F$.

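3.3. Fisher-Rao Metric. From an information geometric viewpoint [AN00, AJLS17], the canonical Riemannian structure on $S$ is given by the Fisher-Rao (information) metric
$g_p(u, v) := \big\langle u, \tfrac{v}{p} \big\rangle = \sum_{j \in [n]} \frac{u_j v_j}{p_j}$, $u, v \in T_0$, $p \in S$.
This naturally extends to the product manifold structure of $\mathcal{W}$ (3.5) via the product metric
$g_W(U, V) := \sum_{i \in V} g_{W_i}(U_i, V_i)$, $U, V \in \mathcal{T}_0$, $W \in \mathcal{W}$,
which turns the assignment manifold $\mathcal{W}$ into a Riemannian manifold.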
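The orthogonal projections onto $T_0$ and $\mathcal{T}_0$, respectively, with respect to the Euclidean inner product are given by
$P_{T_0}(x) = x - \frac{1}{n} \langle x, 1_n \rangle 1_n$, $x \in \mathbb{R}^n$, and $P_{\mathcal{T}_0}[X] = X - \frac{1}{n} X 1_n 1_n^\top$, $X \in \mathbb{R}^{m \times n}$.
Next, we return to the replicator mappings (3.17). The linear mapping $R_p$ satisfies the relations
$R_p = R_p P_{T_0} = P_{T_0} R_p$,
and the restriction $R_p|_{T_0}: T_0 \to T_0$ to the linear subspace $T_0 \subset \mathbb{R}^n$ is a linear isomorphism with inverse given by [SS21, Lem. 3.1]
$(R_p|_{T_0})^{-1}(v) = P_{T_0}\big(\tfrac{v}{p}\big)$, $v \in T_0$.
Likewise, the replicator operator $R_W: \mathbb{R}^{m \times n} \to \mathcal{T}_0$ satisfies for all $X \in \mathbb{R}^{m \times n}$
$R_W[X] = R_W\big[P_{\mathcal{T}_0}[X]\big] = P_{\mathcal{T}_0}\big[R_W[X]\big]$,
and the restriction to the linear subspace $\mathcal{T}_0 \subset \mathbb{R}^{m \times n}$ is a linear isomorphism with inverse
$(R_W|_{\mathcal{T}_0})^{-1}[V] = P_{\mathcal{T}_0}\big[\tfrac{V}{W}\big]$, $V \in \mathcal{T}_0$. (3.26)
Since every replicator matrix $R_{W_i}$ is symmetric, we have
$\langle R_W[X], Y \rangle = \sum_{i \in V} \langle R_{W_i} X_i, Y_i \rangle = \langle X, R_W[Y] \rangle$ for all $X, Y \in \mathbb{R}^{m \times n}$, (3.27)
showing that $R_W$ is self-adjoint, $R_W^* = R_W$, with respect to the matrix inner product. There is also a relation between the Fisher-Rao metric and the matrix inner product in terms of the replicator operator.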
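Lemma 3.1. At any point $W \in \mathcal{W}$, the replicator operator $R_W$ transforms the Riemannian metric into the matrix inner product,
$g_W\big(R_W[X], V\big) = \langle X, V \rangle$ for all $X \in \mathbb{R}^{m \times n}$, $V \in \mathcal{T}_0$. (3.28)
Proof. Using the properties of the replicator operator $R_W$ directly results in
$g_W\big(R_W[X], V\big) = \big\langle R_W[X], \tfrac{V}{W} \big\rangle = \big\langle X, R_W\big[\tfrac{V}{W}\big] \big\rangle = \langle X, V \rangle$,
where the last equation follows from (3.26).
Corollary 3.2. Let $J: \mathcal{W} \to \mathbb{R}$ be a smooth function and assume there is a smooth map $\Psi: \mathcal{W} \to \mathbb{R}^{m \times n}$ such that the differential of $J$ takes the form
$dJ|_W[V] = \langle \Psi(W), V \rangle$ for all $V \in \mathcal{T}_0$
with respect to the matrix inner product $\langle \cdot, \cdot \rangle$. Then, the Riemannian gradient of $J$ is given by
$\operatorname{grad}_g J(W) = R_W[\Psi(W)]$.
Proof. By Lemma 3.1, $g_W\big(R_W[\Psi(W)], V\big) = \langle \Psi(W), V \rangle = dJ|_W[V]$ for all $V \in \mathcal{T}_0$. Since this uniquely determines the Riemannian gradient of $J$, the statement follows.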
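The relation between the Fisher-Rao metric and the Euclidean inner product can be checked numerically on a single simplex. The sketch below assumes that Lemma 3.1 reads $g_p(R_p x, v) = \langle x, v \rangle$ for $x \in \mathbb{R}^n$ and $v \in T_0$, with the standard Fisher-Rao metric $g_p(u, v) = \sum_j u_j v_j / p_j$, and that the inverse of $R_p$ on $T_0$ is $v \mapsto P_{T_0}(v / p)$; the test vectors are hypothetical.

```python
def replicator(p, x):
    """R_p x = p * (x - <p, x>), i.e. (Diag(p) - p p^T) x."""
    mean = sum(pj * xj for pj, xj in zip(p, x))
    return [pj * (xj - mean) for pj, xj in zip(p, x)]


def fisher_rao(p, u, v):
    """Fisher-Rao inner product g_p(u, v) = sum_j u_j v_j / p_j."""
    return sum(uj * vj / pj for pj, uj, vj in zip(p, u, v))


def euclid(u, v):
    return sum(uj * vj for uj, vj in zip(u, v))


def project_T0(x):
    """Orthogonal projection onto vectors with zero component sum."""
    mean = sum(x) / len(x)
    return [xj - mean for xj in x]


def inverse_replicator_on_T0(p, v):
    """Assumed inverse of R_p restricted to T_0: P_{T_0}(v / p)."""
    return project_T0([vj / pj for pj, vj in zip(p, v)])


p = [0.2, 0.3, 0.5]      # point in the open simplex
x = [1.0, -2.0, 0.5]     # arbitrary vector
v = [0.1, -0.3, 0.2]     # tangent vector, components sum to zero

lhs = fisher_rao(p, replicator(p, x), v)
rhs = euclid(x, v)
roundtrip = replicator(p, inverse_replicator_on_T0(p, v))
print(lhs, rhs, roundtrip)
```

Both inner products agree, and applying $R_p$ after the assumed inverse returns the tangent vector `v`, which is the mechanism behind the gradient formula of Corollary 3.2.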
For functions $J: \mathcal{W} \to \mathbb{R}$ extending onto an open set, the above corollary directly implies a relation between the Riemannian gradient and the usual gradient, a result that is already well known [AJLS17, Prop. 2.2]. For this, suppose $\tilde J: U \to \mathbb{R}$ is a smooth extension of $J$ defined on some open set $U \subset \mathbb{R}^{m \times n}$ containing $\mathcal{W}$, i.e. $\tilde J|_{\mathcal{W}} = J$. Then, $\Psi(W)$ can be chosen as the usual gradient $\partial \tilde J(W) \in \mathbb{R}^{m \times n}$ with respect to the matrix inner product, and the Riemannian gradient of $J$ is given by
$\operatorname{grad}_g J(W) = R_W\big[\partial \tilde J(W)\big]$. (3.33)
3.4. The Action Functional. Our main result is summarized in the following theorem. It refers to affinity functions $F: \mathcal{W} \to \mathbb{R}^{m \times n}$ introduced in and discussed after equation (3.14).
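Applying the identifications $T_W \mathcal{W} = \mathcal{T}_0$ from (3.12) and $T_{F(W)} \mathbb{R}^{m \times n} = \mathbb{R}^{m \times n}$ for every $W \in \mathcal{W}$ allows to view the differential of $F$ as a linear operator
$dF|_W: \mathcal{T}_0 \to \mathbb{R}^{m \times n}$. (3.35)
Theorem 3.3. Let $F: \mathcal{W} \to \mathbb{R}^{m \times n}$ be an affinity function and let $W(t)$ be a solution of the assignment flow (3.14). Then $W(t)$ is a critical point of the action functional
$\mathcal{L}(W) := \int_{t_0}^{t_1} \frac{1}{2} \|\dot W(t)\|_g^2 + \frac{1}{2} \sum_{i \in V} \mathrm{Var}_{W_i(t)}\big(F_i(W(t))\big)\, dt$ (3.36)
if and only if the affinity function $F$ fulfills the condition
$R_{W(t)}\Big[dF|_{W(t)}\big[R_{W(t)}[F(W(t))]\big]\Big] = R_{W(t)}\Big[dF|^*_{W(t)}\big[R_{W(t)}[F(W(t))]\big]\Big]$, (3.37)
where $dF|^*_{W(t)}$ is the adjoint linear operator of $dF|_{W(t)}$ from (3.35) and $R_{W(t)}$ is the replicator operator defined by (3.17a). This condition is equivalent to the Euler-Lagrange equation
$D_t^g \dot W(t) = \frac{1}{2} \operatorname{grad}_g \Big(\sum_{i \in V} \mathrm{Var}_{W_i}\big(F_i(W)\big)\Big)\Big|_{W = W(t)}$. (3.38)
Remark 3.4. Theorem 3.3 characterizes the class of affinity functions, in terms of condition (3.37), for which solutions to the assignment flow equation (3.14) are stationary points of the action functional (3.36) and solutions of the Euler-Lagrange equation (3.38), respectively. We defer most of the further discussion to Section 4 but mention one important point here. Since every first-order ODE can trivially be described as a special case of the Euler-Lagrange equation of some quadratic Lagrangian, it is worth pointing out that the Lagrangian $L$ in Theorem 3.3 is classical, that is, of the form kinetic minus potential energy. In particular, the potential $-\frac{1}{2} \sum_{i \in V} \mathrm{Var}_{W_i(t)}\big(F_i(W(t))\big)$ (note the minus sign) is a non-positive function. Solutions of the assignment flow equation (3.14) correspond precisely to those solutions of the Euler-Lagrange equation with energy 0. Since 0 is the maximum of the potential, this energy value is precisely the Mañé critical value of this Lagrangian system; see Section 4 for further remarks.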
We proceed with Lemmata to prepare the proof of Theorem 3.3.
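Lemma 3.5. For every $p \in S$ and $f \in \mathbb{R}^n$, we have $\|R_p f\|_g^2 = \langle f, R_p f \rangle = \mathrm{Var}_p(f)$. Thus, for $W \in \mathcal{W}$ and $F \in \mathbb{R}^{m \times n}$, we have
$\|R_W[F]\|_g^2 = \langle F, R_W[F] \rangle = \sum_{i \in V} \mathrm{Var}_{W_i}(F_i)$. (3.40)
Proof. We have $\|R_p f\|_g^2 = \langle f, R_p f \rangle$ by Lemma 3.1 and $\langle f, R_p f \rangle = \langle p, f^{\diamond 2} \rangle - \langle p, f \rangle^2 = \mathrm{Var}_p(f)$ due to $R_p = \mathrm{Diag}(p) - p p^\top$. Summing over the rows $i \in V$ yields (3.40).
Next, we compute the differential of the assignment flow vector field (3.14) viewed as a mapping $R[F]: \mathcal{W} \to \mathcal{T}_0$, $W \mapsto R_W[F(W)]$.
Lemma 3.6. The differential of $R[F]$ at $W \in \mathcal{W}$ is given by
$dR[F]|_W[V] = R_W\big[dF|_W[V]\big] + B(W, F(W))[V]$, $V \in \mathcal{T}_0$, (3.44a)
where the $i$-th row of the linear map $B(W, F(W)): \mathcal{T}_0 \to \mathcal{T}_0$ is defined via matrix multiplication
$\big(B(W, F(W))[V]\big)_i := B\big(W_i, F_i(W)\big) V_i$ (3.44b)
with matrix $B$ given by
$B(p, f) := \mathrm{Diag}(f) - \langle p, f \rangle I_n - p f^\top$. (3.44c)
Proof. Let $\eta: (-\varepsilon, \varepsilon) \to \mathcal{W}$ be a curve with $\eta(0) = W$ and $\dot\eta(0) = V$. Keeping in mind $R_p = \mathrm{Diag}(p) - p p^\top$, we obtain for each row vector indexed by $i \in V$
$\frac{d}{dt} R_{\eta_i(t)} F_i(\eta(t))\big|_{t=0} = R_{W_i}\, dF_i|_W[V] + \big(\mathrm{Diag}(V_i) - V_i W_i^\top - W_i V_i^\top\big) F_i(W)$,
where $\big(\mathrm{Diag}(V_i) - V_i W_i^\top - W_i V_i^\top\big) F_i(W) = B\big(W_i, F_i(W)\big) V_i$.
Next, we consider the covariant derivative of vector fields along a curve $p: I \to S$, with $I \subset \mathbb{R}$ an interval. Due to $TS = S \times T_0$, we view a vector field $v(t)$ along $p(t)$ as a map $v: I \to T_0$, and also its usual time derivative $\dot v: I \to T_0$, since $T_0$ is a vector space. Using the expression from [AJLS17, Eq. (2.60)] (with $\alpha$ set to 0), the covariant derivative of $v$ along $p$ takes the form
$D_t^g v = \dot v - \frac{1}{2} \frac{v \diamond \dot p}{p} + \frac{1}{2} \langle v, \dot p \rangle_g\, p$.
Similarly, as a consequence of $T\mathcal{W} = \mathcal{W} \times \mathcal{T}_0$, we regard a vector field $V(t)$ along a curve $W: I \to \mathcal{W}$ as a mapping $V: I \to \mathcal{T}_0$, and likewise $\dot V: I \to \mathcal{T}_0$. Since the covariant derivative on a product manifold equipped with a product metric is the componentwise application of the individual covariant derivatives, the covariant derivative of $V$ on $\mathcal{W}$ is given row-wise by
$(D_t^g V)_i = \dot V_i - \frac{1}{2} \frac{V_i \diamond \dot W_i}{W_i} + \frac{1}{2} \langle V_i, \dot W_i \rangle_g\, W_i$, $i \in V$. (3.46)
The acceleration of a curve $W(t)$ on $\mathcal{W}$ is the covariant derivative of its velocity vector field $V(t) := \dot W(t)$, related to the ordinary time derivative $\dot V = \ddot W$ in $\mathbb{R}^{m \times n}$ by
$(D_t^g \dot W)_i = \ddot W_i - \frac{1}{2} \frac{\dot W_i \diamond \dot W_i}{W_i} + \frac{1}{2} \|\dot W_i\|_g^2\, W_i$, $i \in V$. (3.50)
Lemma 3.7. Suppose $W: I \to \mathcal{W}$ is a solution of the assignment flow (3.14). Then the acceleration of $W(t)$ in terms of the covariant derivative of $\dot W(t)$ takes the form
$(D_t^g \dot W)_i = R_{W_i}\, dF_i|_W\big[R_W[F(W)]\big] + \frac{1}{2} R_{W_i} \big(F_i(W) - \mathbb{E}_{W_i}[F_i(W)]\, 1_n\big)^{\diamond 2}$, $i \in V$. (3.51)
Proof. By Lemma 3.6, the ordinary time derivative of $\dot W = R_W[F(W)]$ takes the form (to simplify notation we omit the argument $t$)
$\ddot W = R_W\big[dF|_W[\dot W]\big] + B(W, F(W))[\dot W]$,
where $B$ is defined by (3.44b). We have $\langle f, R_p f \rangle = \|R_p f\|_g^2$ by Lemma 3.5, and using (3.44c) with $p = W_i$, $f = F_i(W)$ and $\dot W_i = R_p f$ results in the identity
$B(p, f) R_p f - \frac{1}{2} \frac{(R_p f)^{\diamond 2}}{p} + \frac{1}{2} \|R_p f\|_g^2\, p = \frac{1}{2} R_p \big(f - \mathbb{E}_p[f]\, 1_n\big)^{\diamond 2}$.
Substituting this expression into (3.50) yields (3.51).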
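As a final preparatory step, we define the potential
$G(W) := -\frac{1}{2} \sum_{i \in V} \mathrm{Var}_{W_i}\big(F_i(W)\big)$, $W \in \mathcal{W}$, (3.55)
and compute its Riemannian gradient.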
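Lemma 3.8. The Riemannian gradient of the potential $G$ from (3.55) is given row-wise by
$\big(\operatorname{grad}_g G(W)\big)_i = -R_{W_i} \Big(dF|^*_W\big[R_W[F(W)]\big]\Big)_i - \frac{1}{2} R_{W_i} \big(F_i(W) - \mathbb{E}_{W_i}[F_i(W)]\, 1_n\big)^{\diamond 2}$, $i \in V$, (3.56)
where $dF|^*_W$ is the adjoint linear operator of $dF|_W$ from (3.35).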
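Proof. Let $W \in \mathcal{W}$. In the following, we derive the expression in (3.56) by applying Corollary 3.2. To this end, take any $V \in T_W \mathcal{W} = \mathcal{T}_0$ and let $\eta: (-\varepsilon, \varepsilon) \to \mathcal{W}$ be a curve with $\eta(0) = W$ and $\dot\eta(0) = V$. Then, using $G(W) = -\frac{1}{2} \langle F(W), R_W[F(W)] \rangle$ by Lemma 3.5,
$dG|_W[V] = \frac{d}{dt} G(\eta(t))\big|_{t=0} = -\frac{1}{2} \big\langle dF|_W[V], R_W[F(W)] \big\rangle - \frac{1}{2} \big\langle F(W), dR[F]|_W[V] \big\rangle$.
The first inner product equals $\big\langle dF|^*_W[R_W[F(W)]], V \big\rangle$. Using the expression for $dR[F]|_W$ from Lemma 3.6 and $R_W^* = R_W$ from (3.27), the second inner product takes the form
$\big\langle F(W), dR[F]|_W[V] \big\rangle = \big\langle dF|^*_W[R_W[F(W)]], V \big\rangle + \big\langle B^*(W, F(W))[F(W)], V \big\rangle$.
Substituting back this formula into the above expression for $dG|_W$ together with the expression for the first inner product, results in
$dG|_W[V] = -\Big\langle dF|^*_W\big[R_W[F(W)]\big] + \frac{1}{2} B^*(W, F(W))[F(W)], V \Big\rangle$. (3.60b)
Due to Corollary 3.2, the Riemannian gradient is given by
$\operatorname{grad}_g G(W) = -R_W\Big[dF|^*_W\big[R_W[F(W)]\big]\Big] - \frac{1}{2} R_W\Big[B^*(W, F(W))[F(W)]\Big]$.
Regarding the adjoint mapping $B^*$, we have $\big(B^*(W, F(W))[X]\big)_i = B\big(W_i, F_i(W)\big)^\top X_i$ and by (3.44c)
$R_{W_i} B\big(W_i, F_i(W)\big)^\top F_i(W) = R_{W_i} \big(F_i(W)^{\diamond 2} - 2\, \mathbb{E}_{W_i}[F_i(W)]\, F_i(W)\big) = R_{W_i} \big(F_i(W) - \mathbb{E}_{W_i}[F_i(W)]\, 1_n\big)^{\diamond 2}$,
where the last equation uses $R_{W_i} 1_n = 0$. This yields (3.56).
Proof of Theorem 3.3. Due to Lemma 3.5, the Lagrangian of the action functional (3.36) has the form
$L(W, V) = \frac{1}{2} \|V\|_g^2 - G(W)$ (3.65)
with $G(W)$ defined by (3.55). Therefore, the Euler-Lagrange equation (3.38) is a direct consequence of Proposition 2.1. Due to Lemmas 3.7 and 3.8, the expression for the acceleration of $W(t)$ and the Riemannian gradient of $G$ at $W(t)$ both contain the term $\frac{1}{2} R_{W_i} \big(F_i(W) - \mathbb{E}_{W_i}[F_i(W)]\, 1_n\big)^{\diamond 2}$ with opposite signs, which yields the relation
$D_t^g \dot W + \operatorname{grad}_g G(W) = R_W\Big[dF|_W\big[R_W[F(W)]\big]\Big] - R_W\Big[dF|^*_W\big[R_W[F(W)]\big]\Big]$.
As a consequence, the characterization of $F$ in (3.37) is equivalent to the Euler-Lagrange equation (3.38) and, by Proposition 2.1, equivalent to $W(t)$ being a critical point of the action functional.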

DISCUSSION
4.1. Some Implications of Theorem 3.3. We discuss in this section various properties and consequences of Theorem 3.3.

4.1.1. Mañé Critical Value. In his influential work [Mn97] Mañé introduced critical values which should be interpreted as energy levels that mark important dynamical and geometric changes for the Euler-Lagrange flow; see [Abb13] for a nice introduction. Dynamical properties at energies equal to a Mañé critical value are often hard to analyze. In general, there are various related Mañé critical values; however, for classical Lagrangians such as $L$ in Theorem 3.3, cf. (3.65), all of them agree and equal the maximum of the potential. As pointed out before, the potential part of the Lagrangian $L$ is $G$, which has 0 as maximum. At the same time, solutions to the assignment flow equation (3.14) are precisely the solutions to the Euler-Lagrange equation (3.38) of energy 0, i.e. at the Mañé critical value of $L$.
In the following, basic properties of the set of Mañé critical points on $\mathcal{W}$ are investigated and summarized in Proposition 4.1. Subsequently, based on a result from geometric mechanics, Proposition 4.4 shows that integral curves of the assignment flow that are critical points of the action functional $\mathcal{L}$ in Theorem 3.3 and start in the complement are actually reparametrized geodesics of the so-called Jacobi metric introduced below.
By Lemma 3.5, we have
$G(W) = -\frac{1}{2} \|R_W[F(W)]\|_g^2 \leq 0$ and $G(W) = 0 \iff R_W[F(W)] = 0$,
that is the potential assumes its maximum at $W$ if and only if $W$ is an equilibrium point of the assignment flow (3.14). Due to $R_W|_{\mathcal{T}_0}$ being a linear isomorphism by (3.26), we further obtain
$R_W[F(W)] = 0 \iff P_{\mathcal{T}_0}[F(W)] = 0$. (4.4)
Thus, we need to consider the zero set of the smooth map $P_{\mathcal{T}_0} \circ F: \mathcal{W} \to \mathcal{T}_0$. We restrict our analysis to affinity functions $F$ for which the differential $d(P_{\mathcal{T}_0} \circ F)$ has constant rank on $\mathcal{W}$, in the following denoted by $r$. To avoid the trivial case $P_{\mathcal{T}_0} \circ F \equiv \mathrm{const}$ we further restrict to the case $r \geq 1$. A basic instance of this case is given in Section 4.1.2 with $F$ being a linear map. Due to the Constant-Rank Level Set Theorem [Lee13, Thm. 5.12], the zero set
$M_{crit} = (P_{\mathcal{T}_0} \circ F)^{-1}(0)$
is a properly embedded submanifold of $\mathcal{W}$ with dimension
$\dim(M_{crit}) = \dim(\mathcal{W}) - r = m(n-1) - r$.
Since the dimension of $M_{crit}$ is strictly less than $\dim(\mathcal{W})$, it is a submanifold with measure zero in $\mathcal{W}$ [Lee13, Cor. 6.12]. Therefore, the complement $Q$ (4.2), that is the set of points $W$ with $G(W) < 0$, is a dense ([Lee13, Prop. 6.8]) subset of $\mathcal{W}$. According to [Lee13, Prop. 5.5], being properly embedded in $\mathcal{W}$ is equivalent to being a closed subset of $\mathcal{W}$ (in the subspace topology). Thus, $Q$ is an open subset of $\mathcal{W}$ and consequently also a submanifold. Overall we have proven the following statement.
with energy $E_0$ are the same as geodesics of the Jacobi metric $h_{E_0}$ with energy 1.
Since $G < 0$ on $Q$, we restrict our investigation to the Riemannian submanifold $(Q, g|_Q)$ and set $C := 0$, resulting in the Jacobi metric $h_0 = (-G)\, g|_Q$ of the form
$h_0 = \frac{1}{2} \sum_{i \in V} \mathrm{Var}_{W_i}\big(F_i(W)\big)\, g|_Q$. (4.9)
In the next section, we directly determine the set $M_{crit}$ for a basic instance of an assignment flow.
4.1.2. Admissible Affinity Functions. Condition (3.37) characterizes affinity functions $F$ for Theorem 3.3 to hold. We contrast this condition with a simple affinity function used in prior work and directly determine the corresponding set $M_{crit}$ of Mañé critical points.
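The recent paper [SS21, Proposition 3.6] introduced a reparametrization, called S-flow, of the original assignment flow formulation of [APSS17]. The distance information between each data point $f_i \in F$ and the labels $f^*_j \in F$ is collected in the data matrix
$D \in \mathbb{R}^{m \times n}$, $D_{ij} := d_F(f_i, f^*_j)$, (4.10)
where $d_F$ is the metric introduced in Section 3.1. Intuitively, it represents how well each data point is represented by the labels. For a nonnegative averaging matrix
$\Omega \in \mathbb{R}^{m \times m}$, $\Omega \geq 0$, $\Omega 1_m = 1_m$, (4.11)
the S-flow is the system (4.12) consisting of the two equations
$\dot S = R_S[\Omega S]$, (4.12a)
$\dot W = R_W[S]$, (4.12b)
whose initial conditions are specified in [SS21] in terms of the so-called lifting map, that is the composition of the mapping (3.17a) and the exponential map $\mathrm{Exp}$ of $(\mathcal{W}, g)$ with respect to the so-called e-connection of information geometry [AN00]. Note that both solutions $S(t)$, $W(t)$ evolve on $\mathcal{W}$ and that $W(t)$ depends on $S(t)$ but not vice versa. Hence we focus on the system (4.12a) and the specific affinity function given by matrix multiplication
$F(S) := \Omega S$.
The differential of $F$ at $S \in \mathcal{W}$ is therefore also given by matrix multiplication, $dF|_S[V] = \Omega V$ for $V \in \mathcal{T}_0$, and its adjoint by multiplication with $\Omega^\top$ (followed by the projection $P_{\mathcal{T}_0}$), that is condition (3.37) holds in particular if $\Omega = \Omega^\top$ is symmetric. This assumption was adopted in [SS21] and in a slightly more general form also in [ZZS21].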
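Next, we determine the set $M_{crit}$ of Mañé critical points (4.1) based on the condition on the right-hand side of (4.4). A basic calculation using the properties of $P_{\mathcal{T}_0}$ and $\Omega$ shows that these two linear operators commute, resulting in
$P_{\mathcal{T}_0}[F(W)] = P_{\mathcal{T}_0}[\Omega W] = \Omega\, P_{\mathcal{T}_0}[W] = \Omega (W - 1_{\mathcal{W}})$. (4.16)
Since the corresponding differential
$d(P_{\mathcal{T}_0} \circ F)|_W[V] = \Omega V$, $V \in \mathcal{T}_0$, (4.17)
is just matrix multiplication independent of $W \in \mathcal{W}$, the rank $r$ of $d(P_{\mathcal{T}_0} \circ F)$ is constant. For Proposition 4.1 to hold, we need to check that the rank satisfies $r \geq 1$. For this, denote the corresponding kernel of (4.17) by $\Sigma_\Omega := \{V \in \mathcal{T}_0 : \Omega V = 0\}$.
Lemma 4.6. $\dim(\Sigma_\Omega) = (n-1) \dim(\ker(\Omega))$ and therefore the rank of $d(P_{\mathcal{T}_0} \circ F)$ is $r = (n-1) \operatorname{rank}(\Omega)$.
Proof. Denote the standard basis of $\mathbb{R}^n$ by $e_1, \dots, e_n$. A basis for $T_0$ (3.10) is then given by $b_i := e_i - e_n$, $i \in [n-1]$. Furthermore, set $K := \dim(\ker(\Omega))$ and let $a_1, \dots, a_K$ be a basis of $\ker(\Omega) \subset \mathbb{R}^m$. Then, for every $k \in [K]$ and $i \in [n-1]$, we have $\Omega (a_k b_i^\top) = (\Omega a_k) b_i^\top = 0$ and $(a_k b_i^\top) 1_n = a_k \langle b_i, 1_n \rangle = 0$, showing that $a_k b_i^\top \in \Sigma_\Omega$. As all the $a_k$ and $b_i$ are each linearly independent, so are their outer products $a_k b_i^\top$. Conversely, let $V \in \Sigma_\Omega$. Because of $V 1_n = 0$ we obtain $V e_n = -\sum_{i \in [n-1]} V e_i$, which in turn shows that $V$ can be expressed in terms of the basis $b_i$ as $V = \sum_{i \in [n-1]} V e_i b_i^\top$. On the other hand, the $i$-th column of $V$, given by $V e_i$, fulfills $\Omega V e_i = 0$ and can be expressed as $V e_i = \sum_{k \in [K]} c_{ki} a_k$ with coefficients $c_{ki} \in \mathbb{R}$, showing that all the $a_k b_i^\top$ are indeed a basis for $\Sigma_\Omega$. As a result, the formulas for $\dim(\Sigma_\Omega)$ and the rank follow from $\dim(\mathcal{T}_0) = m(n-1)$.
As a consequence of $\operatorname{rank}(\Omega) \geq 1$ by (4.11), a lower bound on the rank $r$ of $d(P_{\mathcal{T}_0} \circ F)$ is given by $r \geq n - 1 \geq 1$. Therefore, Proposition 4.1 applies and $M_{crit}$ for the S-flow is a submanifold of $\mathcal{W}$ with measure zero. The expression of $P_{\mathcal{T}_0} \circ F$ in terms of $\Omega$ from (4.16) and the fact that $W - 1_{\mathcal{W}}$ lies in $\mathcal{T}_0$ for all $W \in \mathcal{W}$ allow to characterize $M_{crit}$ explicitly as the intersection of $\mathcal{W}$ with an affine subspace,
$M_{crit} = \{W \in \mathcal{W} : \Omega (W - 1_{\mathcal{W}}) = 0\} = (1_{\mathcal{W}} + \Sigma_\Omega) \cap \mathcal{W}$.
As $\Omega$ is assumed to be given, $M_{crit}$ can explicitly be constructed after a basis for $\ker(\Omega)$ has been calculated. Therefore, we are able to check if $S(0) \notin M_{crit}$, in which case the corresponding integral curve $S(t)$ of the S-flow (4.12a) would be a reparametrized geodesic for the Jacobi metric (4.9) with energy $E_0 = 0$, according to Theorem 4.3.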
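The check whether an initial point lies in $M_{crit}$ can be sketched in a few lines. The code assumes the characterization $M_{crit} = \{S \in \mathcal{W} : \Omega (S - 1_{\mathcal{W}}) = 0\}$ for the linear affinity function $F(S) = \Omega S$, with $1_{\mathcal{W}}$ the matrix whose entries all equal $1/n$; the function name `in_m_crit`, the tolerance and the example matrices are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def in_m_crit(S, omega, tol=1e-12):
    """Test omega @ (S - barycenter) == 0 up to a tolerance."""
    n = len(S[0])
    centered = [[sij - 1.0 / n for sij in row] for row in S]
    product = matmul(omega, centered)
    return all(abs(x) <= tol for row in product for x in row)


# Hypothetical symmetric averaging matrix whose kernel is spanned by (1, -1).
omega = [[0.5, 0.5],
         [0.5, 0.5]]

S_crit = [[0.6, 0.4],
          [0.4, 0.6]]     # S - barycenter has columns in ker(omega)
S_generic = [[0.6, 0.4],
             [0.5, 0.5]]  # does not satisfy the kernel condition

print(in_m_crit(S_crit, omega), in_m_crit(S_generic, omega))
```

A point such as `S_crit` is an equilibrium of the S-flow, whereas an initial point such as `S_generic` lies in the complement, which is the case relevant for the geodesic interpretation.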
We conclude this section with another observation that should stimulate future work. Under the aforementioned symmetry assumption, a continuous-domain approach was studied in [SS21] corresponding to (4.12) at 'spatial scale zero'. The latter means to consider only parameter matrices $\Omega$ in (4.12a) whose sparse row vectors $\Omega_i$ encode nearest-neighbor interactions of $S_i$ and $\{S_k : k \sim i\}$ on an underlying regular grid graph, and to consider the right-hand side of (4.12a) as discretized Riemannian gradient of a continuous-domain variational approach with pointwise defined variables. Specifically, replacing $i \in V$ by locations $x \in U \subset \mathbb{R}^d$, the vector field $S: V \to S$, $i \mapsto S_i$, becomes a simplex-valued vector field $S: U \to S$, $x \mapsto S(x)$, that has to solve a variational inequality. Besides analyzing existence of a minimizer in a suitable function space and a corresponding dedicated numerical algorithm, a heuristically (under too strong regularity assumptions) derived partial differential equation was presented that is supposed to characterize any minimizer $S^*$ and reads
$R_{S^*}(-\Delta S^* - \alpha S^*) = 0$, (4.25)
where $R_{S^*}$ applies pointwise $R_{S^*(x)}$ to the vector $(-\Delta S^* - \alpha S^*)(x)$ at every $x \in \Omega$, in the same way as the mapping $R_W$ defined by (3.17a) amounts to applying the mappings (3.17b) at every vertex $i \in V$.
From this viewpoint, condition (3.37), which was shown to be equivalent to the Euler-Lagrange equation (3.38), should become the spatially discrete but nonlocal analogue of (4.25) in the limit $t \to \infty$. We leave the exploration of this observation for future work.
4.1.3. Geometric Integration Versus Optimization. In contrast to classical approaches to the labeling problem, the presented dynamical geometric formulation does not merely rely on finding maximizers of a task-specific objective function, but instead solely depends on the Lagrangian dynamics governing the inference process. In the following, this is discussed in more detail.
Classical formulations of image labeling [KAH+15] are usually formulated as minimization problems of (preferably convex) functions, $\min_X J(X)$, where global minimizers are associated with meaningful label assignments. As a consequence, the minimizers themselves are the solution of the labeling problem, independent of any specific optimization strategy used to find or approximate them.
In [SS21, Prop. 3.9, Prop. 3.10] it was shown that if the weight matrix $\Omega$ is symmetric, $\Omega = \Omega^\top$, then the above-mentioned S-flow (4.12a) is actually a Riemannian gradient ascent flow with respect to the function
$J(S) := \frac{1}{2} \langle S, \Omega S \rangle = \frac{1}{2} \|S\|_2^2 - \frac{1}{4} \sum_{i \in V} \sum_{j \in N_i} \Omega_{ij} \|S_i - S_j\|_2^2$. (4.27)
Similar to the continuous case in [SS21, Prop. 4.2], it can be shown that the global maximizers of $J$ are spatially constant assignments, i.e. every node in the graph has the same label. This can directly be seen from the right-hand side expression for $J$ in (4.27). In order for $J$ to attain its supremum, the first term $\|S\|_2^2$ needs to be maximal, which happens precisely if every $S_i$ is one of the standard basis vectors, and the second term $\sum_{i \in V} \sum_{j \in N_i} \Omega_{ij} \|S_i - S_j\|_2^2$ needs to be minimal (zero), which happens precisely if all the $S_i$ have the same value at all nodes $i \in V$, that is $S$ is spatially constant.
Therefore, in contrast to the above-mentioned classical methods, we are not interested in maximizers of the function $J$, as they generally do not represent meaningful assignments. Indeed, any nontrivial assignment the S-flow $S(t)$ converges to (which experimentally happens [ZZS21, SS21]) cannot be a maximizer of $J$. Rather, the integral curves themselves, that is the inference process governed by the spatially coupled replicator dynamics, are the crucial element responsible for producing meaningful label assignments as limit points. This highlights the importance of the Lagrangian mechanical viewpoint of the assignment flow.
4.2. Directly Related Work. In [RK18, Thm. 2.1], the authors claim that all uncoupled equations of the form $\dot p = R_p F(p)$, on a single simplex $p(t) \in S$, satisfy the Euler-Lagrange equation associated with the cost functional
$\int_{t_0}^{t_1} \frac{1}{2} \|\dot p(t)\|_g^2 + \frac{1}{2} \mathrm{Var}_{p(t)}\big(F(p(t))\big)\, dt$. (4.28)
In our present paper, we derive a more general result (Theorem 3.3) for a system (1.1) of coupled equations from the viewpoint of geometric mechanics on manifolds, of which (4.28) is a (very) special case. In particular, we derive a necessary condition (3.37) that is missing in [RK18], which any affinity function $F$ has to satisfy for the assertion of Theorem 3.3 to hold. This latter result yields an interpretation of stationary points of the action functional as solutions of the Euler-Lagrange equation (3.38).
It can be shown that in the case of $n = 2$ labels, any fitness function $F$ indeed fulfills condition (3.37) and therefore also the Euler-Lagrange equation. However, for $n > 2$ labels this is no longer true, as the following counterexample demonstrates.
Suppose we have more than two labels, i.e. $n > 2$, and first consider the case of $m = |\mathcal{V}| = 1$ node, that is, an uncoupled replicator equation on a single simplex. Define the matrix $F := e_2 e_1^\top$, where $e_i$ are the standard basis vectors of $\mathbb{R}^n$. Thus, the affinity function is the linear map
$F \colon \mathcal{S} \to \mathbb{R}^n$, $p = (p^1, \dots, p^n)^\top \mapsto F p = p^1 e_2$ (4.29)
fulfilling $dF_p = F$ and $dF_p^* = F^\top$. A short calculation using the relation $R_p e_i = p^i (e_i - p)$ (Einstein summation convention is not used) shows that the first coordinate of condition (3.37) takes the form
This example generalizes to the case $m > 1$ by defining the linear affinity function

For general Lagrangians, however, Proposition 2.1 is not applicable and critical points of the action functional are characterized as integral curves of the Lagrangian vector field $X_E$, as detailed in Section 2.2. Since Lagrangians of the form kinetic minus potential energy (2.14) are hyperregular, the representation as a Hamiltonian system via the Legendre transformation $\mathbb{F}L$ is an equivalent alternative. As mentioned in Section 2.3, the energy $E \colon T\mathcal{W} \to \mathbb{R}$, the Hamiltonian $H \colon T^*\mathcal{W} \to \mathbb{R}$ and their corresponding vector fields $X_E$ on $T\mathcal{W}$ and $X_H$ on $T^*\mathcal{W}$ are related via
To obtain interpretable explicit formulas, it will be more convenient to work on $T\mathcal{W}$ instead of $T^*\mathcal{W}$. In the following, we derive an explicit expression for the Lagrangian vector field $X_E$ and relate its corresponding integral curves to the Euler-Lagrange equation (3.38) of Theorem 3.3.
Because $X_E$ is the symplectic gradient of the energy $E$ with respect to the Lagrangian form $\omega_L$, see (2.5), we first calculate an alternative formula for $\omega_L$ in terms of the Fisher-Rao metric. For this we exploit the fact that the assignment manifold is a so-called Hessian manifold [SY97], that is, in suitable coordinates the Fisher-Rao metric is the Hessian of a convex function.
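The Hessian property can be checked numerically on a single simplex. The following is a minimal sketch, not taken from the paper: it assumes the chart given by the first $n-1$ probability coordinates $x = (p^1, \dots, p^{n-1})$, in which the Fisher-Rao metric has the matrix $g_{ij} = \delta_{ij}/p^i + 1/p^n$, and compares it with a finite-difference Hessian of the negative entropy. The function names and the test point are illustrative choices.

```python
import numpy as np

def neg_entropy(x):
    """Negative entropy of p = (x, 1 - sum(x)), viewed as a function of x."""
    p = np.append(x, 1.0 - x.sum())
    return float(np.sum(p * np.log(p)))

def fisher_rao_matrix(x):
    """Fisher-Rao metric in the assumed chart: g_ij = delta_ij / p_i + 1 / p_n."""
    p_last = 1.0 - x.sum()
    return np.diag(1.0 / x) + np.full((x.size, x.size), 1.0 / p_last)

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference approximation of the Hessian of f at x."""
    d = x.size
    hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d)
            ej = np.zeros(d)
            ei[i] = h
            ej[j] = h
            hess[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                          - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return hess

# Illustrative point in the open simplex: p = (0.2, 0.3, 0.1, 0.4).
x = np.array([0.2, 0.3, 0.1])
G = fisher_rao_matrix(x)
H = numerical_hessian(neg_entropy, x)
print(np.max(np.abs(G - H)))  # small discretization error
```

The comparison only covers one simplex; for the product manifold the same check would apply factor by factor, since the accumulated negative entropy separates over the nodes.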
Since $T\mathcal{W} = \mathcal{W} \times T_0$ (3.13) is trivial, the tangent space of $T\mathcal{W}$ at any point $(W, V) \in T\mathcal{W}$ can be identified with the vector space $T_{(W,V)} T\mathcal{W} = T_0 \times T_0$. With this identification, the Lagrangian two-form $\omega_L$ has the following simple expression.
Then the Lagrangian two-form can be expressed via the Fisher-Rao metric as

Define the accumulated negative entropy by
and let $(g^{\mathcal{W}}_{ij})$ denote the representation of the product Fisher-Rao metric (3.19) on $\mathcal{W}$ in the coordinates $\eta^{\mathcal{W}}$. Since $\varphi_{\mathrm{acc}}$ separates over the product structure of $\mathcal{W}$, the accumulated negative entropy also induces the product Riemannian metric in the chart $\eta^{\mathcal{W}}$, equipping the assignment manifold as well with the structure of a Hessian manifold [SY97]. Now, take an arbitrary point $(W, V) \in T\mathcal{W} = \mathcal{W} \times T_0$ and let $(x, v)$ be the corresponding coordinates with respect to the chart $\eta^{\mathcal{W}}$. According to [AM87, Prop. 3.5.6], the Lagrangian two-form $\omega_L$ (2.5) in coordinates is given by (4.37). Since the coordinate expression of the Lagrangian (3.65) (4.38)
Plugging these expressions into (4.37) and rearranging the first sum using $dx^j \wedge dx^i = -dx^i \wedge dx^j$ yields

Now that we have an explicit expression for the Lagrangian two-form $\omega_L$, we are in a position to calculate an explicit representation of the Lagrangian vector field $X_E$.
Proposition 4.8. The Lagrangian vector field $X_E$ on $T\mathcal{W}$ associated to the Lagrangian (3.65) at a point $(W, V) \in T\mathcal{W} = \mathcal{W} \times T_0$ is given by
$X_E(W, V) = \big( V, \tfrac{1}{2} A(W, V) - \operatorname{grad}_g G(W) \big)$. (4.44)

Writing $X_E = (X'_E, X''_E) \in T_0 \times T_0$ and comparing (4.48b) with the above expression for $\omega_L$ from Lemma 4.7 shows $X'_E(W, V) = V$ and $X''_E(W, V) = \tfrac{1}{2} A(W, V) - \operatorname{grad}_g G(W)$.
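Any solution curve $\gamma(t) = (W(t), V(t)) \in T\mathcal{W} = \mathcal{W} \times T_0$ of the Lagrangian dynamics induced by the Lagrangian vector field $X_E$ associated to the Lagrangian (3.65) of Theorem 3.3 fulfills the ODE
$\dot{W}(t) = V(t)$, $\quad \dot{V}(t) = \tfrac{1}{2} A\big(W(t), V(t)\big) - \operatorname{grad}_g G\big(W(t)\big)$.
This form of the Hamiltonian ODE simply reflects the fact that this first-order dynamics on $T\mathcal{W}$ is induced by a second-order ODE on $\mathcal{W}$. Indeed, substituting $V = \dot{W}$ in the second component of $X_E$ results in
$\ddot{W}(t) = \tfrac{1}{2} A\big(W(t), \dot{W}(t)\big) - \operatorname{grad}_g G\big(W(t)\big)$,
which we already know to be satisfied by the base curve $W(t)$ by (3.38) of Theorem 3.3.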
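The passage from a second-order ODE on the base manifold to a first-order system on the tangent bundle can be illustrated on a toy mechanical system. The sketch below is not the assignment-flow Lagrangian (3.65): it assumes the flat Euclidean metric on $\mathbb{R}^2$, so that the term $A$ vanishes, and a hypothetical quadratic potential $G(w) = \tfrac{1}{2}\|w\|^2$. The vector field `X_E`, the integrator `rk4`, the step size and the initial values are illustrative choices. It integrates $(\dot{w}, \dot{v}) = (v, -\operatorname{grad} G(w))$ and compares the energy at the start and at the end of the curve.

```python
import numpy as np

def grad_G(w):
    """Gradient of the hypothetical potential G(w) = 0.5 * |w|^2."""
    return w

def energy(w, v):
    """Energy E = kinetic + potential for the toy system (flat metric)."""
    return 0.5 * (v @ v) + 0.5 * (w @ w)

def X_E(z):
    """Toy vector field on the tangent bundle: z = (w, v) -> (v, -grad G(w))."""
    w, v = np.split(z, 2)
    return np.concatenate([v, -grad_G(w)])

def rk4(z, dt, steps):
    """Classical fourth-order Runge-Kutta integration of z' = X_E(z)."""
    for _ in range(steps):
        k1 = X_E(z)
        k2 = X_E(z + 0.5 * dt * k1)
        k3 = X_E(z + 0.5 * dt * k2)
        k4 = X_E(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z

# Illustrative initial point (w0, v0) in the tangent bundle of R^2.
w0 = np.array([1.0, 0.0])
v0 = np.array([0.0, 0.5])
z_end = rk4(np.concatenate([w0, v0]), dt=1e-3, steps=2000)
w_end, v_end = np.split(z_end, 2)
print(energy(w0, v0), energy(w_end, v_end))
```

In this flat setting the base curve solves $\ddot{w} = -\operatorname{grad} G(w)$, and the energy stays constant along the integral curve up to integration error, which mirrors the role of $E$ as the generator of $X_E$.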

CONCLUSION
In this work, we generalized a previous result on uncoupled replicator equations from [RK18] to the case of coupled replicator equations. The viewpoint of Lagrangian mechanics on manifolds resulted in an interpretable Euler-Lagrange equation (3.38) and provided the mathematical tools to derive condition (3.37) for characterizing those affinity maps $F$ that result in critical points of the action functional (3.36). Accordingly, a constructed counterexample in terms of the specific affinity map (4.29) highlights that not all affinity maps $F$ lead to critical points.
The geometric mechanics perspective enabled the insight that, ignoring a set of starting points of measure zero, solutions to the assignment flow are reparametrized geodesics of the Jacobi metric (4.9). Thus, in a certain sense, these solutions locally connect assignment states in an optimal way by realizing a shortest path. Finally, using the Legendre transformation, we calculated an explicit expression for the associated Hamiltonian system in terms of the corresponding Lagrangian system (4.44).
Our results provide a basis for exploring analogies to mathematical representations of interacting particle systems in theoretical physics in future work. This may further enhance our understanding of dynamical and learning systems that reveal structures in metric data.

1.3. Basic Notation. In accordance with the standard notation in differential geometry, coordinates of vectors have upper indices. For any $n \in \mathbb{N}$, we set $[n] := \{1, \dots, n\} \subset \mathbb{N}$. The standard basis of $\mathbb{R}^d$ is denoted by $\{e_1, \dots, e_d\}$ and we set 1

Proposition 4.1. If the differential $d(P_{T_0} \circ F)$ has constant rank $r \geq 1$ on $\mathcal{W}$, then the set $M_{\mathrm{crit}}$ of Mañé critical points (4.1) is a submanifold of $\mathcal{W}$ with measure zero and its complement $Q \subset \mathcal{W}$ (4.2) is an open and dense subset.

Equipped with this result, we are now able to characterize solutions of the assignment flow (3.14) starting in $Q$ as reparametrized geodesics.

Definition 4.2 ([AM87, Def. 3.7.6]). Let $h$ be a Riemannian metric on $M$ and $G \colon M \to \mathbb{R}$ a potential. Assume $C$ is a constant such that $G(x) < C$ holds for all $x \in M$. Then the Jacobi metric is defined by
$h_C := (C - G) h$. (4.8)

Theorem 4.3 ([AM87, Thm. 3.7.7]). Up to reparametrization, the base integral curves of the Lagrangian

4.3. Lagrangian and Hamiltonian Point of View. Theorem 3.3 rests upon the representation of the assignment flow as a Lagrangian mechanical system of the form kinetic minus potential energy (3.65), as summarized in Section 2.4. Due to this specific form, Proposition 2.1 can be applied to characterize critical points of the action functional $\mathcal{L}$ from Theorem 3.3 as solutions to the Euler-Lagrange equation (3.38), which in turn allows us to derive condition (3.37).

Proof. We directly use the definition (2.8) of the Lagrangian vector field $X_E$. For this, let $B = (B', B'') \in T_{(W,V)} T\mathcal{W} = T_0 \times T_0$ be arbitrary and assume $\gamma(t) = (W(t), V(t))$ is a smooth curve in $T\mathcal{W} = \mathcal{W} \times T_0$ with
$\gamma(0) = (W(0), V(0)) = (W, V)$ and $\dot{\gamma}(0) = (\dot{W}(0), \dot{V}(0)) = B = (B', B'')$. (4.45)
The time derivative of the potential $G$ is expressed via the Riemannian gradient,
$\frac{d}{dt} G(W(t)) \big|_{t=0} = dG|_W[B'] = g_W\big( \operatorname{grad}_g G(W), B' \big)$. (4.46)
By (3.48), the covariant derivative of $V(t)$ at $t = 0$ is $D^g_t V(0) = B'' - \tfrac{1}{2} A(W, V)$, resulting in
$\frac{d}{dt} \tfrac{1}{2} g_{W(t)}\big( V(t), V(t) \big) \big|_{t=0} = g_{W(0)}\big( V(0), D^g_t V(0) \big)$ (4.47a)
$= g_W\big( V, B'' - \tfrac{1}{2} A(W, V) \big)$. (4.47b)
Putting everything together, we obtain the following relation for the differential of the energy $E$ from (2.16):
$dE|_{(W,V)}[B] = \frac{d}{dt} E(W(t), V(t)) \big|_{t=0} = \frac{d}{dt} \tfrac{1}{2} g_{W(t)}\big( V(t), V(t) \big) \big|_{t=0} + \frac{d}{dt} G(W(t)) \big|_{t=0}$ (4.48a)
$= g_W(V, B'') - g_W\big( \tfrac{1}{2} A(W, V) - \operatorname{grad}_g G(W), B' \big)$. (4.48b)

The Legendre Transform. Let $L \colon TM \to \mathbb{R}$ be a hyperregular Lagrangian, i.e. the fiber derivative $\mathbb{F}L \colon TM \to T^*M$ is a diffeomorphism. Then the Lagrangian system on $TM$ and the Hamiltonian system on $T^*M$ are related to each other by the Legendre transformation, with the Hamiltonian $H \colon T^*M \to \mathbb{R}$ corresponding to the energy $E$ via

Now, let $W(t)$ be an integral curve of the assignment flow (3.14). If the initial value $W(0)$ lies in $Q$, then the entire integral curve $W(t)$ remains in $Q$. This is a consequence of Mañé critical points being equilibrium points by (4.3) and the fact that the assignment flow is a first-order ODE. If additionally $W(t)$ is a critical point of the action functional $\mathcal{L}$ from Theorem 3.3, then $W(t)$ is a base integral curve with energy $E_0 = 0$.
Thus, Theorem 4.3 directly implies the following statement.

Proposition 4.4. Let $W(t)$ be an integral curve of the assignment flow (3.14). If $W(t)$ is a critical point of the action functional $\mathcal{L}$ in Theorem 3.3 with initial value $W(0) \in Q$, then, up to reparametrization, $W(t)$ is a geodesic of the Jacobi metric (4.9).

It is important to note that the previous statement is only true for solutions of the assignment flow, which is a first-order ODE. A general solution of the second-order Euler-Lagrange equation (3.38) might leave $Q$ in finite time and cross the set $M_{\mathrm{crit}}$.