Assignment Flows for Data Labeling on Graphs: Convergence and Stability

The assignment flow, recently introduced in J. Math. Imaging Vision 58/2 (2017), constitutes a high-dimensional dynamical system that evolves on an elementary statistical manifold and performs contextual labeling (classification) of data given in any metric space. Vertices of a given graph index the data points and define a system of neighborhoods. These neighborhoods, together with nonnegative weight parameters, define the regularization of the evolution of label assignments to data points through geometric averaging induced by the affine e-connection of information geometry. From the viewpoint of evolutionary game dynamics, the assignment flow may be characterized as a large system of replicator equations that are coupled by geometric averaging. This paper establishes conditions on the weight parameters that guarantee convergence of the continuous-time assignment flow to integral assignments (labelings), up to a negligible subset of situations that will not be encountered when working with real data in practice. Furthermore, we classify attractors of the flow and quantify corresponding basins of attraction. This provides convergence guarantees for the assignment flow, which are extended to the discrete-time assignment flow obtained by applying a Runge-Kutta-Munthe-Kaas scheme for the numerical geometric integration of the assignment flow. Several counterexamples illustrate that violating the conditions may entail unfavorable behavior of the assignment flow regarding contextual data classification.


Problem and Motivation
Metric data labeling denotes the task of assigning to each data point of a given finite set F_I = {f_i : i ∈ I} ⊂ F in a metric space (F, d_F) a unique label (a.k.a. prototype, class representative) from another given set F*_J = {f*_j : j ∈ J} ⊂ F. The data indices i ∈ I typically refer to positions x_i ∈ R^d in space or in space-time [0, T] × R^d. Accordingly, one associates with the data a graph G = (I, E), where the set of nodes I indexes the data and the edge set E ⊂ I × I represents a neighborhood system. A basic example is provided by the data of a digital image observed on a regular grid graph G, in which case the data space F may be a color space, a high-dimensional Euclidean space as in multispectral imaging, or the positive-definite matrix manifold as in diffusion tensor medical imaging.
Data labeling provides a dramatic reduction of given data as Figure 1.1 illustrates. In addition, it is a crucial step for data interpretation. Basic examples include the analysis of traffic scenes [8], of medical images or of satellite images in remote sensing.
The assignment flow approach [2] provides a mathematical framework for the design of dynamical systems that perform metric data labeling. This approach replaces established variational methods for image segmentation [7] as well as discrete Markov random fields for image labeling [17] by smooth dynamical systems that facilitate the design of hierarchical systems for large-scale numerical data analysis. In addition, it can be extended to unsupervised scenarios [34], where the labels F*_J can be adapted to given data or even learned from the data itself [35]. We refer to the survey [26] for further discussion and related work.
Interpretation of data is generally not possible without an inductive bias towards prior expectations and application-specific knowledge. In connection with image labeling, such knowledge is represented by regularization parameters that influence label assignments by controlling the assignment flow. Figure 1.2 provides an illustration. Nowadays, such parameters are learned directly from data. Due to the inherent smoothness, assignment flows can be conveniently used to accomplish this machine learning task [16,31,32]. From a more distant point of view, deep networks and learning [13] prevail in machine learning. Besides their unprecedented performance in applications, current deep learning architectures are also known to be susceptible to data perturbations leading to unpredictable erroneous outputs [9,11,14]. Our aim, therefore, is to prove stability properties of assignment flows under suitable assumptions on the regularization parameters, together with the guarantee that labelings, i.e. integral assignments, are computed for any data at hand. Section 1.3 further details the scope of this paper after introducing the assignment flow approach in the next section.

Fig. 1.1 Data labeling on a graph through the assignment flow: Values from a finite set (so-called labels) are assigned to a given vector-valued function so as to preserve its spatial structure on a certain spatial scale. LEFT: Input data. CENTER: Data labeled at a fine spatial scale. RIGHT: Data labeled at a coarser spatial scale. Scale is determined by the size |N_i|, i ∈ I, of the neighborhoods N_i (1.13) that couple the individual dynamics (1.1) (see Section 1.2). Here, uniform weight parameters Ω (1.14) were used.

Assignment Flow
The assignment flow has been introduced by [2] for the labeling of arbitrary data given on a graph G = (I, E). It is defined by the system of nonlinear ODEs

Ẇ_i = R_{W_i} S_i(W),   W_i(0) = 1_S = (1/n) 1_n,   i ∈ I,   (1.1)

whose solutions W_i(t) evolve on the elementary Riemannian manifold (S, g) given by the relative interior S = rint(Δ_n) of the probability simplex

Δ_n = { p ∈ R^n : ∑_{j=1}^n p_j = ⟨1_n, p⟩ = 1, p ≥ 0 }.   (1.2)

Here, n = |J| denotes the number of labels and 1_n = (1, . . . , 1) ∈ R^n is the vector of ones. The tangent space of S at any point p ∈ S is given by

T_0 = { v ∈ R^n : ⟨1_n, v⟩ = 0 },   (1.3)

and the Riemannian structure on S is defined by the Fisher-Rao metric

g_p(u, v) = ⟨u, Diag(p)^{-1} v⟩,   u, v ∈ T_0,   p ∈ S.   (1.4)

The basic idea underlying the approach (1.1) is that each vector W_i(t) converges within S to an ε-neighborhood of some vertex (unit vector) e_{j(i)} of Δ_n, that is

‖W_i(T) − e_{j(i)}‖ < ε   (1.5)

for sufficiently large T = T(ε) > 0. This enables one to assign a unique label (class index) j to the data point observed at vertex i ∈ I by trivial rounding:

j(i) = arg max_{j ∈ J} W_{ij}(T).   (1.6)

In the following, we give a complete definition of the vector field defining the assignment flow (1.1). The linear mapping R_{W_i} of (1.1) will be called replicator matrix. It is defined by

R_p : R^n → T_0,   R_p = Diag(p) − p p^T,   p ∈ S.   (1.7)

Regarding the orthogonal projection onto T_0 given by

Π_0 : R^n → T_0,   Π_0 = I_n − (1/n) 1_n 1_n^T,   (1.8)

with I_n denoting the identity matrix, the replicator matrix satisfies

R_p = R_p Π_0 = Π_0 R_p.   (1.9)

Further, we will use the exponential map and its inverse,

exp_p : R^n → S,   exp_p(v) = p e^v / ⟨p, e^v⟩,   p ∈ S,   (1.10a)

where multiplication, division, exponentiation and logarithm of vectors are meant componentwise. We call this map 'exponential' for simplicity. In fact, definition (1.10a) is the explicit expression of the relation

exp_p = Exp_p ∘ R_p,   (1.10b)

where Exp : S × T_0 → S is the exponential map corresponding to the affine e-connection of information geometry; see [1,3] and [26] for details. A straightforward calculation shows that the differential of exp_p at v is

d exp_p(v) = R_{exp_p(v)},

where the right-hand side is defined by (1.7) and (1.10a).
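The maps (1.7) and (1.10a) and the differential identity for exp_p can be checked numerically in a few lines. The following sketch is illustrative only (the test vectors p, v, u are our own, not from the paper); the finite-difference check confirms that the Jacobian of exp_p at v equals the replicator matrix evaluated at exp_p(v):

```python
import numpy as np

def replicator_matrix(p):
    # R_p = Diag(p) - p p^T, eq. (1.7); maps R^n into the tangent space T_0
    return np.diag(p) - np.outer(p, p)

def exp_map(p, v):
    # exp_p(v) = p e^v / <p, e^v>, eq. (1.10a); all operations componentwise
    q = p * np.exp(v)
    return q / q.sum()

p = np.array([0.5, 0.3, 0.2])       # a point of S (positive, sums to one)
v = np.array([1.0, -2.0, 0.5])
s = exp_map(p, v)

# exp_p(v) lies in the open simplex S
assert np.isclose(s.sum(), 1.0) and (s > 0).all()

# R_p maps into T_0: components of R_p v sum to zero
assert np.isclose((replicator_matrix(p) @ v).sum(), 0.0)

# differential identity d exp_p(v) = R_{exp_p(v)}, checked by finite differences
u = np.array([0.3, -0.1, -0.2])
h = 1e-6
fd = (exp_map(p, v + h * u) - exp_map(p, v)) / h
assert np.allclose(fd, replicator_matrix(s) @ u, atol=1e-5)
```

The last assertion is precisely the statement that the right-hand side of the differential is defined by (1.7) evaluated at (1.10a).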
The behavior of the assignment flow (1.1) essentially rests upon the coupling of the local systems through the mappings S_i within local neighborhoods

N_i = { k ∈ I : ik ∈ E } ∪ { i },   i ∈ I,   (1.13)

corresponding to the adjacency relation E ⊆ I × I of the underlying graph G. These couplings are parameterized by nonnegative weights

ω_{ik} ≥ 0,   ∑_{k ∈ N_i} ω_{ik} = 1,   i ∈ I.   (1.14)

Considering the assignment manifold

W = S × · · · × S   (|I| times),   (1.15)

the similarity map S : W → W is defined by geometric averaging of the likelihood vectors

L_k(W_k) = exp_{W_k}(−D_k),   S_i(W) = Exp_{W_i}( ∑_{k ∈ N_i} ω_{ik} Exp_{W_i}^{-1}(L_k(W_k)) ),   i ∈ I.   (1.16)

It regularizes the assignment vectors W_i ∈ S depending on the parameters (1.14), for given input data in terms of distance vectors D_i ∈ R^n storing the distances between data points f_i ∈ F_I and prototypes f*_j ∈ F*_J. Denoting the barycenter of S by 1_S = (1/n) 1_n, the defining relation (1.16a) can be rewritten in the form [23, Lemma 3.2]

S_i(W) = exp_{1_S}( ∑_{k ∈ N_i} ω_{ik} (log W_k − D_k) ),   i ∈ I.   (1.17)

In view of (1.15), all the mappings in (1.7), (1.10) and (1.16) naturally generalize from S to W and from T_0 given by (1.3) to T_0 = T_0 × · · · × T_0.
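A minimal numerical sketch of the similarity map, assuming the rewritten form S_i(W) = exp_{1_S}(∑_{k∈N_i} ω_{ik}(log W_k − D_k)) as stated in (1.17) above (our reading of that relation), and using the fact that exp_{1_S} reduces to a row-wise softmax; all names and test data below are ours:

```python
import numpy as np

def similarity(W, D, Omega):
    # S_i(W) = exp_{1_S}( sum_{k in N_i} omega_ik (log W_k - D_k) ), cf. (1.17);
    # rows of Omega carry the weights (1.14), zero outside the neighborhood N_i
    U = Omega @ (np.log(W) - D)          # row i = weighted average over N_i
    E = np.exp(U - U.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)   # exp_{1_S} acts as a softmax

# random assignments, distances and row-stochastic weights for illustration
rng = np.random.default_rng(0)
m, n = 4, 3
W = rng.random((m, n)); W /= W.sum(axis=1, keepdims=True)
D = rng.random((m, n))
Omega = rng.random((m, m)); Omega /= Omega.sum(axis=1, keepdims=True)

S = similarity(W, D, Omega)
```

The map returns a point of the assignment manifold again: every row of S is a strictly positive probability vector, which is the regularization property used throughout the paper.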

Objectives
The first goal of this paper is to analyze the asymptotic behavior of the assignment flow (1.1) depending on the parameters (1.14). It was conjectured [2, Conjecture 1] that, for data in 'general position' as typically observed in real scenarios (e.g. no symmetry due to additive noise), the assignment flow converges to an integral labeling at every pixel, as described above in connection with (1.6). We confirm this conjecture in this paper under suitable assumptions on the parameters Ω. To this end, we use a reparametrization of the assignment flow and clarify the convergence of the reparametrized flow to equilibria as well as their stability. The second goal of this paper concerns the same question regarding the time-discrete assignment flow that is generated by a scheme for numerically integrating (1.1). Depending on the chosen scheme, properties of the resulting flow may differ from those of the time-continuous flow (1.1). Indeed, the authors of [2] adopted a numerical scheme from [19] which, when adapted and applied to (1.1), was shown in [5] to always converge to a constant solution, i.e. a single label is assigned to every pixel no matter which data are observed. Even though numerical experiments strongly indicate that this undesirable asymptotic behavior is irrelevant in practice, because it only occurs when W(t_k) is so close to the boundary of the closure of the underlying domain that it cannot be resolved with the usual machine accuracy, such behavior is nonetheless unsatisfactory from the mathematical viewpoint.
In this paper, therefore, we consider the simplest numerical scheme that was recently devised and studied in [33], to better take into account the geometry underlying the assignment flow (1.1) than the numerical scheme adopted in [2]. We show under suitable assumptions on the parameters Ω , that the time-discrete assignment flow generated by such a proper numerical scheme cannot exhibit the pathological asymptotic behavior mentioned above.

Related Work
The assignment flow approach emerged from classical methods (variational methods, discrete Markov random fields) for image segmentation and labeling. We refer to [26] for further discussion. The approach can take into account any differentiable data likelihood, and all discrete decisions, like the formation of spatial regions at a certain scale, are made by integrating the flow numerically. The inherent smoothness of the approach compares favorably to discrete schemes for image segmentation, like region growing schemes [20], in particular regarding the learning of parameters for incorporating prior knowledge. In particular, spatial regularization can be performed independently of the metric model of the data at hand. This is not the case for segmentation based on spectral clustering [27], as discussed in detail and demonstrated by [35].
From a more distant viewpoint, our results may also be of interest in the field of evolutionary game dynamics [15,22]. The corresponding basic dynamical system has the form

ṗ = p f(p) − E_p[f(p)] p,   (1.21)

where the first multiplication on the right-hand side is done componentwise, the expectation is given by E_p[f(p)] = ⟨p, f(p)⟩, and p(t) evolves on Δ_n. The differential equation (1.21) is known as replicator equation. It constitutes a Riemannian gradient flow with respect to the Fisher-Rao metric if f = ∇F derives from a potential F. It is well known that, depending on what 'affinity function' f : Δ_n → R^n is chosen, a broad range of dynamics may occur, even for linear affinities p ↦ Ap, A ∈ R^{n×n} (see e.g. [6]). Other choices even give rise to chaotic dynamics (see e.g. [12]). By comparison, the explicit form of Eq. (1.1) reads

Ẇ_i = W_i S_i(W) − ⟨W_i, S_i(W)⟩ W_i,   i ∈ I,

where S_i(W) couples a possibly very large number m = |I| of replicator equations of the form (1.22), as explained above in connection with (1.14). The mapping S_i does not derive from a potential, however, but can be related to a potential after a proper reparametrization and under a symmetry assumption on the parameters (1.14) [23]. We refer to [26] for a more comprehensive discussion of the background and further work related to the assignment flow (1.1).
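For illustration (a minimal sketch of our own, not taken from the paper), a single replicator equation with a linear affinity f(p) = Ap can be integrated with an explicit Euler scheme; with the symmetric payoff matrix chosen below, the interior equilibrium (1/2, 1/2) is attracting:

```python
import numpy as np

def replicator_step(p, f, h):
    # explicit Euler step for the replicator equation (1.21):
    # dp/dt = p f(p) - <p, f(p)> p   (first product componentwise)
    v = f(p)
    return p + h * (p * v - np.dot(p, v) * p)

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # illustrative payoff matrix, f(p) = A p
p = np.array([0.8, 0.2])            # initial point in the open simplex
for _ in range(2000):               # integrate up to t = 20 with h = 0.01
    p = replicator_step(p, lambda q: A @ q, 0.01)
```

Note that the Euler step preserves the simplex constraint ⟨1_n, p⟩ = 1 exactly, since the increment lies in the tangent space T_0; for this payoff matrix the trajectory approaches the interior equilibrium (1/2, 1/2).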

Organization
The assignment flow and its basic properties (limit points, convergence, stability) are established in Section 2. In Section 2.4, we also briefly examine properties of a simplified approximate version of the assignment flow that can be linearly parametrized on the tangent space, which is convenient for the data-driven estimation of suitable weight parameters [16]. In Section 3, we extend these results to the discrete-time assignment flow that is obtained by applying the simplest numerical scheme for the geometric integration of the assignment flow, as worked out in [33]. Numerical examples demonstrate that violating the conditions established in Section 2 may lead to various behaviors of the assignment flow, all of which are unfavorable as regards data classification. Some lengthy proofs have been relegated to Appendix A. We conclude in Section 4.

Basic Notation
We set [n] = {1, 2, . . . , n} for any n ∈ N and denote by |S| the cardinality of any finite set S. Throughout this paper, m and n denote the number of vertices of the underlying graph G = (I, E) and the number of classes indexed by J, respectively,

m = |I|,   n = |J|.   (1.23)

The set W = S × · · · × S (1.15) is called assignment manifold, where S = rint(Δ_n) is the relative interior of the probability simplex Δ_n. S and W, respectively, are equipped with the Fisher-Rao metric (1.4) and hence are Riemannian manifolds. Points of W are row-stochastic matrices denoted by W = (W_1, . . . , W_m) ∈ W with row vectors (also called subvectors) W_i ∈ S, i ∈ I, and with components W_{ij}, j ∈ J. The same notation is adopted for the image S(W) of the mapping S : W → W defined by (1.16). We denote the set of nonnegative reals by R_{≥0}. The parameters (1.14) form a matrix Ω ∈ R_{≥0}^{m×m}. The subvectors of ΩS are denoted by (ΩS)_i, i ∈ I. 1_n = (1, 1, . . . , 1) ∈ R^n denotes the vector with all entries equal to 1, and e_i = (0, . . . , 0, 1, 0, . . . , 0) is the ith unit vector; the dimension of e_i will be clear from the context. 1_S = (1/n) 1_n denotes the barycenter of S (uniform categorical distribution). Similarly, 1_W with subvectors (1_W)_i = 1_S, i ∈ I, denotes the barycenter of the assignment manifold W. I_n denotes the identity matrix of dimension n × n.
The closure of W is denoted by

W̄ = Δ_n × · · · × Δ_n   (|I| times),

and the set of integral assignments (labelings) by

W* = { W ∈ W̄ : W_i ∈ {e_1, . . . , e_n}, i ∈ I }.

The support of a vector v ∈ R^n is denoted by supp(v) = {i ∈ [n] : v_i ≠ 0}. ⟨x, y⟩ denotes the Euclidean inner product of vectors x, y, and ⟨A, B⟩ = tr(A^T B) the inner product of matrices A, B. The spectral (or operator) norm of a matrix A is denoted by ‖A‖_2. For two matrices of the same size, A ∘ B denotes the Hadamard (entrywise) matrix product. For A ∈ R^{m×n}, B ∈ R^{p×q}, the matrix A ⊗ B ∈ R^{mp×nq} denotes the Kronecker product of matrices with submatrices A_{ij} B ∈ R^{p×q}, i ∈ [m], j ∈ [n] (cf. e.g. [30]). N(A) and R(A) denote the nullspace and the range of the linear mapping represented by A ∈ R^{m×n}. For strictly positive vectors with full support, like p ∈ S with supp(p) = [n], the entrywise division of a vector v ∈ R^n by p is denoted by v/p. Likewise, we set pv = (p_1 v_1, . . . , p_n v_n). The exponential function and the logarithm apply componentwise to vectors, i.e. e^v = (e^{v_1}, . . . , e^{v_n}) and log p = (log p_1, . . . , log p_n). For large expressions as arguments, we also write exp(·) instead of e^{(·)}, which should not be confused with the exponential map (1.10) that is always written with a subscript. Diag(p) denotes the diagonal matrix with the components of the vector p on its diagonal.

Representation of the Assignment Flow
The following parametrization of the assignment flow will be convenient for our analysis:

Ṡ = R_S(ΩS),   S(0) = exp_{1_W}(−ΩD),   (2.1a)
Ẇ = R_W S,   W(0) = 1_W.   (2.1b)

The difference between (1.20) and (2.1) is that the latter representation separates the dependencies on the data D and on the assignments W: the given data D completely determine S(t) through the initial condition of (2.1a), and S(t) completely determines the assignments W(t) by (2.1b). In what follows, our focus will be on how the parameters Ω affect S(t) and W(t).

Remark 1 (S-flow)
We call system (2.1a) the S-flow and denote its solution by S(t) in the remainder of this paper, and we use the shorthand F for the vector field, i.e.

Ṡ = F(S) := R_S(ΩS).   (2.2)
A direct consequence of the parametrization (2.1) is the following.
Let S(t) solve (2.1a). Then the solution to (2.1b) is given by

W(t) = exp_{1_W}( ∫_0^t S(τ) dτ ) = exp_{1_W}( ∫_0^t Π_0 S(τ) dτ ).   (2.3)

The second equation of (2.3) follows from the first equation of (1.9).
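The parametrization (2.1) and the recovery of W(t) via (2.3) can be illustrated with a plain explicit Euler scheme. This is a crude sketch for illustration only (the paper's geometric integration schemes are the subject of Section 3); the S-flow vector field is read as Ṡ_i = R_{S_i}(ΩS)_i per our reconstruction of (2.2), and the chain graph, step size and distance data below are our own:

```python
import numpy as np

def s_flow_euler(S0, Omega, h=0.1, steps=2000):
    # explicit Euler integration of the S-flow (2.2), dS_i/dt = R_{S_i}(Omega S)_i;
    # U accumulates int_0^t S dtau, from which W(t) is recovered via (2.3)
    S, U = S0.copy(), np.zeros_like(S0)
    for _ in range(steps):
        V = Omega @ S
        S = S + h * (S * V - (S * V).sum(axis=1, keepdims=True) * S)
        U = U + h * S
    E = np.exp(U - U.max(axis=1, keepdims=True))   # exp_{1_W} is a row-wise
    W = E / E.sum(axis=1, keepdims=True)           # softmax (shift-invariant)
    return S, W

# 3-node chain, uniform weights, 2 labels; distances favor label 0 everywhere
Omega = np.array([[1/2, 1/2, 0.0],
                  [1/3, 1/3, 1/3],
                  [0.0, 1/2, 1/2]])
D = np.array([[0.0, 1.0]] * 3)
E0 = np.exp(-Omega @ D)
S0 = E0 / E0.sum(axis=1, keepdims=True)            # S(0) = exp_{1_W}(-Omega D)
S, W = s_flow_euler(S0, Omega)
```

For this spatially coherent data, both S(t) and W(t) approach the integral assignment that labels every node with label 0, as predicted by the convergence results of this section.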
Transferring the assignment flow (1.20) to the tangent space T_0 and linearizing the ODE leads to the linear assignment flow [33, Prop. 4.2]

V̇ = R_S(ΩV + B),   (2.5)

with fixed S ∈ W and B ∈ T_0. We note that both the S-flow (2.2) and the linear assignment flow (2.5) are defined by similar vector fields on the tangent space T_0. Ignoring the constant term B in (2.5), which can be represented by using a corresponding initial point (see Lemma 2), the difference concerns the parameter S of the replicator matrix: in the linear assignment flow, this parameter S is fixed, whereas in the S-flow, it changes with the flow. Notice that 'linear' refers to the linearity of the ODE (2.5) on the tangent space. The corresponding lifted flow on the assignment manifold is still nonlinear (cf. [33, Def. 4.1]).
Convergence properties of the S-flow and the linear assignment flow are analyzed in the following sections.

Existence and Uniqueness
We establish global existence and uniqueness of both the S-flow and the assignment flow and examine to what extent the former determines the latter.

Remark 2 (a) The domain of the vector field defining the S-flow, and consequently the domain of the assignment flow, too, can be extended to the closure W̄, and we henceforth assume this to be the case. Furthermore, the domain of the S-flow can be extended to an open set U with W̄ ⊂ U ⊆ R^{m×n}. In the latter case, although existence for all t ≥ 0 is no longer guaranteed, this simplifies the stability analysis of equilibria S* ∈ W̄, as we will see in Section 2.3. (b) The assignment flow shares with replicator equations in general (cf. [15]) the property of invariance with respect to the boundary ∂W̄: due to the multiplication by R_S and R_W, respectively, neither S(t) nor W(t) can leave the corresponding facet of ∂W̄ once they reach it.
Next, we examine what convergence of S(t) close to ∂ W implies for W (t).

Proposition 4 Let

V_j = { p ∈ Δ_n : p_j > p_k for all k ∈ J \ {j} },   j ∈ J,   (2.6)

denote the Voronoi cells of the vertices of Δ_n in Δ_n, and suppose lim_{t→∞} S_i(t) = S*_i ∈ Δ_n for every i ∈ I. Then the following assertions hold.
(a) If S*_i ∈ V_{j*(i)} for some label (index) j*(i) ∈ J, then W_i(t) converges exponentially fast to e_{j*(i)}. In particular, Proposition 4(a) states that if any subvector of the S-flow converges within a Voronoi cell (2.6), then the corresponding subvector of W(t) converges exponentially fast to the corresponding integral assignment. Proposition 4(b) handles the case where the limit point S*_i lies on the border of adjacent Voronoi cells, that is, the set arg max_{j∈J} S*_{ij} is not a singleton. In this case, one can only state that W_i(t) converges to some (possibly nonintegral) point W*_i, without being able to predict this limit precisely based on S*_i alone. In contrast to (a), we also have to assume that the convergence of the S-flow is fast enough; see the hypothesis of (2.8). This assumption is reasonable, however, because it is satisfied whenever S*_i is a subvector of a hyperbolic equilibrium point of the S-flow (cf. Remark 5 below).

Example 1
We briefly demonstrate what may happen when the assumption of (2.8) is violated. Suppose S_i(t) and S*_i are given by (2.9). The first component of S_i(t) converges faster than the second component, so that the convergence rate assumption of (2.8) does not hold. Calculating W_i(t) via (2.3) gives (2.10), i.e. W_i(t) still converges, but we have supp(W*_i) ⊊ arg max_{j∈J} S*_{ij}, unlike the statement of (2.8). This example also shows that, in the case of Proposition 4(b), the limit W*_i may depend on the trajectory S_i(t), rather than only on the limit point S*_i as in case (a).
Proposition 4 makes explicit that the S-flow largely determines the asymptotic behavior of W (t). The next section, therefore, focuses on the S-flow (2.2) and on its dependency on the parameters Ω .

Convergence to Equilibria and Stability
In this section, we characterize equilibria, their stability, and convergence properties of the S-flow (2.2). Quantitative estimates of the basins of attraction of exponentially stable equilibria are provided as well.

Characterization of Equilibria and Their Stability
We show in this section that, under mild conditions, only integral equilibrium points S* ∈ W* can be stable.

Proposition 5 (equilibria)
Let Ω ∈ R^{m×m} be an arbitrary matrix.
(a) A point S* ∈ W̄ is an equilibrium point of the S-flow (2.2) if and only if

(ΩS*)_{ij} = ⟨S*_i, (ΩS*)_i⟩ for all j ∈ supp(S*_i), i ∈ I,   (2.11)

i.e., the subvectors (ΩS*)_i are constant on supp(S*_i), for each i ∈ I. (b) Every point S* ∈ W* is an equilibrium point of the S-flow (2.1a).
(c) Let J^+ ⊆ J be a nonempty subset of indices, and let 1_{J^+} ∈ R^n be the corresponding indicator vector with components (1_{J^+})_j = 1 if j ∈ J^+ and (1_{J^+})_j = 0 otherwise. Then S* = (1/|J^+|) 1_m 1_{J^+}^T is an equilibrium point. In particular, the barycenter 1_W = (1/n) 1_m 1_n^T, corresponding to J^+ = J, is an equilibrium point.

Proof
(a) Each equation of the system (2.2) has the form

Ṡ_i = R_{S_i}(ΩS)_i = S_i (ΩS)_i − ⟨S_i, (ΩS)_i⟩ S_i.

Hence S* is an equilibrium point if and only if, for every i ∈ I and j ∈ J,

S*_{ij} ( (ΩS*)_{ij} − ⟨S*_i, (ΩS*)_i⟩ ) = 0,

that is, for every j ∈ supp(S*_i), the term in the round brackets is zero, which is (2.11).
(b), (c) In both cases, the subvector (ΩS*)_i is constant on supp(S*_i) for every i ∈ I, which implies by (a) that S* is an equilibrium point.
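Assertions (b) and (c) of Proposition 5 are easy to confirm numerically. The following sketch (arbitrary nonnegative Ω and test points of our own; the vector field is read componentwise as F_i(S) = R_{S_i}(ΩS)_i per our reconstruction of (2.2)) evaluates F at an integral assignment and at the barycenter:

```python
import numpy as np

def F(S, Omega):
    # S-flow vector field (2.2): F_i(S) = R_{S_i} (Omega S)_i, computed row-wise
    V = Omega @ S
    return S * V - (S * V).sum(axis=1, keepdims=True) * S

rng = np.random.default_rng(1)
m, n = 5, 3
Omega = rng.random((m, m))                 # arbitrary nonnegative parameters

# (b): integral assignments are equilibria (R_{e_j} = 0)
S_int = np.zeros((m, n))
S_int[np.arange(m), rng.integers(0, n, m)] = 1.0
res_int = F(S_int, Omega)

# (c): the barycenter 1_W (uniform rows) is an equilibrium
S_bar = np.full((m, n), 1.0 / n)
res_bar = F(S_bar, Omega)
```

Both residuals vanish (up to floating point) for any choice of Ω, in agreement with Proposition 5; the computation also makes the mechanism visible, since (ΩS*)_i is constant on supp(S*_i) in both cases.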

Remark 3
The set of equilibria characterized by Proposition 5 (b) and (c) may not exhaust the set of all equilibrium points for a general parameter matrix Ω . However, we will show below that, under certain mild conditions, any such additional equilibrium points must be unstable.
Next, we study the stability of equilibrium points.
Lemma 1 (Jacobian) Let F(S) denote the vector field defining the S-flow (2.2). Then, after stacking S row-wise, the Jacobian matrix of F is given by

∂F/∂S (S) = Diag( ( Diag((ΩS)_i) − ⟨S_i, (ΩS)_i⟩ I_n − S_i (ΩS)_i^T )_{i∈I} ) + Diag( (R_{S_i})_{i∈I} ) (Ω ⊗ I_n),   (2.13)

where Diag((A_i)_{i∈I}) denotes the block-diagonal matrix with blocks A_1, . . . , A_m. Proof The subvectors of F have the form

F_i(S) = S_i (ΩS)_i − ⟨S_i, (ΩS)_i⟩ S_i.   (2.14)

Hence, for the differential in a direction T ∈ R^{m×n},

dF_i(S)[T] = ( Diag((ΩS)_i) − ⟨S_i, (ΩS)_i⟩ I_n − S_i (ΩS)_i^T ) T_i + R_{S_i} (ΩT)_i,   (2.15)

with vec(T) ∈ R^{mn} denoting the vector that results from stacking the row vectors (subvectors) of T. Comparing both sides of this equation, with the block matrices of the left-hand side given by (2.15), implies (2.13).
Proposition 6 (spectrum of the Jacobian) Let S* ∈ W̄ be an equilibrium point of the S-flow (2.2).
(a) If S* ∈ W* with S*_i = e_{j*(i)}, i ∈ I, then R_{S*_i} = 0 for every i ∈ I, so that the second summand in (2.13) vanishes, and the spectrum of the Jacobian is given by

σ( ∂F/∂S (S*) ) = ∪_{i∈I} ( { −(ΩS*)_{i j*(i)} } ∪ { (ΩS*)_{ij} − (ΩS*)_{i j*(i)} : j ∈ J \ {j*(i)} } ).   (2.16)

In the latter case, the eigenvectors are given by (2.17): for the eigenvalue (ΩS*)_{ij} − (ΩS*)_{i j*(i)} with (ΩS*)_{ij} > 0, an eigenvector is the matrix V ∈ R^{m×n} whose rows vanish except for row i, which equals e_j − e_{j*(i)}; in particular, this eigenvector lies in T_0. (c) Assume the parameter matrix Ω, with elements ω_ii, i ∈ I, on the main diagonal, is nonnegative. If S*_i ∉ {0, 1}^n and ω_ii > 0 hold for some i ∈ I, then the Jacobian matrix has at least one eigenvalue with positive real part. The real and imaginary parts of the corresponding eigenvector lie in a subspace T_+ ⊆ T_0. Next, we apply Proposition 6 and the stability criteria stated in Appendix B in order to classify the equilibria of the S-flow.

Corollary 1 (stability of equilibria)
Let Ω be a nonnegative matrix with positive diagonal entries. Then, regarding the equilibria S* ∈ W̄ of the S-flow (2.2), the following assertions hold.
(a) S* ∈ W* is exponentially stable if, for all i ∈ I,

(ΩS*)_{i j*(i)} > (ΩS*)_{ij} for all j ∈ J \ {j*(i)}, where j*(i) is defined by S*_i = e_{j*(i)}.   (2.20)

(b) S* ∈ W* is unstable if (ΩS*)_{ij} > (ΩS*)_{i j*(i)} for some i ∈ I and some j ∈ J \ {j*(i)}. (c) Every nonintegral equilibrium point S* ∈ W̄ \ W* is unstable.
Proof (a) We apply Theorem 3(a), which provides a condition for the stability of the S-flow regarded as a flow on an open subset of R^{m×n}. Since stability also holds on subsets, this shows stability of the S-flow on W̄. By Proposition 6(a), the spectrum of ∂F/∂S(S*), for S* ∈ W*, is given by the right-hand side of (2.16) and, since Ω is nonnegative, is clearly negative if condition (2.20) holds. (b) We take the eigenvectors into account and invoke Proposition 16(b). The eigenvectors are given by (2.17), and if the eigenvalue (ΩS*)_{ij} − (ΩS*)_{i j*(i)} is positive, there exists an open truncated cone C ⊂ R^{m×n} such that δ·V ∈ C for sufficiently small δ > 0, and the S-flow emanating from C moves away from S*. This shows the instability of S*. (c) By the assumption on Ω, there is an eigenvalue with positive real part due to Proposition 6(c), and the real and imaginary parts of the corresponding eigenvector lie in T_+ ⊆ T_0. So the argument of (b) applies here as well, using the real part of the eigenvector.
Remark 4 (selection of stable equilibria) For S* to be exponentially stable, Corollary 1(a) requires that every averaged subvector (ΩS*)_i has the same maximal component as the corresponding subvector S*_i. This means that the Ω-weighted average of the vectors S*_k within the neighborhood k ∈ N_i lies in the Voronoi cell V_{j*(i)} (2.6) corresponding to S*_i. Thus, Corollary 1 provides a mathematical and intuitively plausible definition of 'spatially coherent' segmentations of given data that can be determined by means of the assignment flow. This also demonstrates how the label (index) selection mechanism of the replicator equations (1.22), whose spatial coupling defines the assignment flow (1.20), works from the point of view of evolutionary dynamics [22] when using the similarity vectors S_i(W) (1.16) as 'affinity measures'.
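The selection mechanism described in Remark 4 can be made concrete in a few lines. The sketch below checks the stability criterion (2.20) — as we read it: for each i, the maximal component of (ΩS*)_i must sit at j*(i) = arg max_j S*_{ij} — for two labelings on a 4-node chain graph with uniform weights (the example data are our own):

```python
import numpy as np

def is_exponentially_stable(S_star, Omega):
    # criterion (2.20), as reconstructed here: for every i, the averaged
    # subvector (Omega S*)_i has its strict maximum at j*(i) = argmax_j S*_ij
    V = Omega @ S_star
    j_star = S_star.argmax(axis=1)
    return all(
        (V[i, j_star[i]] > np.delete(V[i], j_star[i])).all()
        for i in range(S_star.shape[0])
    )

# uniform weights on a 4-node chain with neighborhoods N_i = {i-1, i, i+1}
Omega = np.array([
    [1/2, 1/2, 0.0, 0.0],
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 1/3, 1/3, 1/3],
    [0.0, 0.0, 1/2, 1/2],
])
coherent    = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])  # two regions
alternating = np.array([[1., 0.], [0., 1.], [1., 0.], [0., 1.]])  # checkerboard
```

The spatially coherent labeling satisfies the criterion, whereas the alternating one does not: at an interior node of the checkerboard, the neighborhood average favors the opposite label, which is exactly the 'spatial coherence' selection described above.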

Convergence of the S-flow to Equilibria
We make the basic assumption that the parameter matrix Ω has the form

Ω = Diag(w)^{-1} Ω̂,   w ∈ R^m_{>0},   (2.22)

with a symmetric matrix Ω̂ = Ω̂^T ∈ R^{m×m}. Matrices of the form (2.22) include as special cases symmetric parameter matrices

Ω = Ω^T   (2.23a)

and parameters satisfying

w_i ω_{ik} = w_k ω_{ki},   i, k ∈ I, for some w ∈ R^m_{>0}.   (2.23b)

An instance of Ω satisfying (2.23b) are nonnegative uniform weights over symmetric neighborhoods, i.e.

ω_{ik} = 1/|N_i|,   k ∈ N_i,   with k ∈ N_i ⇔ i ∈ N_k.
Note that, in the following basic convergence theorem, neither Ω nor Ω̂ is assumed to be row-stochastic or nonnegative.

Theorem 1 (convergence to equilibria)
Let Ω be of the form (2.22). Then the S-flow (2.2) converges to an equilibrium point S* = S*(S_0) ∈ W̄, for any initial value S_0 ∈ W̄.
Proof See Appendix A.

Proposition 7
Let Ω be nonnegative with positive diagonal entries, and let S * ∈ W be an equilibrium point of the S-flow (2.2) which satisfies one of the instability criteria of Corollary 1 (b) or (c). Then the set of starting points S 0 ∈ W for which the S-flow converges to S * has measure zero in W.
Proof By [18], there exists a center-stable manifold M_cs(S*) which is invariant under the S-flow and tangent to E_c ⊕ E_s at S*. Here, E_c and E_s denote the center and stable subspaces of ∂F/∂S(S*), respectively. Any trajectory of the S-flow converging to S* lies in M_cs(S*). Therefore, it suffices to show that the dimension of the manifold M_cs(S*) ∩ W is smaller than the dimension of W. Note that M_cs(S*) ∩ W is a manifold since both M_cs(S*) and W are invariant under the S-flow. We have

dim M_cs(S*) = dim E_c + dim E_s = mn − dim E_u,

where E_u denotes the unstable subspace of ∂F/∂S(S*). Since ∂F/∂S(S*) has an eigenvalue with positive real part and a corresponding eigenvector lying in T_0 (cf. the proof of Corollary 1), the unstable subspace intersects the tangent space nontrivially, so that dim(M_cs(S*) ∩ W) < dim W. Hence M_cs(S*) ∩ W has measure zero in W.

Remark 5 (consequences for the assignment flow) If S* ∈ W̄ is a hyperbolic equilibrium point, then the S-flow locally behaves as its linearization near S* by the Hartman-Grobman theorem [21, Section 2.8]. Since a linear flow can only converge with an exponential convergence rate, this is also the case for the S-flow (2.2). More precisely, if the S-flow converges to a hyperbolic equilibrium S* ∈ W̄, then there exist α, β > 0 such that ‖S(t) − S*‖ ≤ α e^{−βt} for all sufficiently large t. In particular, the assumption of Proposition 4(b) automatically holds if S* is hyperbolic.

Theorem 2
Let Ω be a nonnegative matrix with positive diagonal entries. Then the set of starting points S_0 ∈ W for which the S-flow (2.2) converges to a nonintegral equilibrium S* ∈ W̄ has measure zero in W.
Proof Let E = {S* ∈ W̄ : F(S*) = 0} denote the set of all equilibria of the S-flow in W̄, which is a compact subset of W̄. If E contains only isolated points, i.e., if E is finite, then the statement follows from Proposition 7. In order to also take into account nonfinite sets E of equilibria, we apply the more general [10, Theorem 9.1]. Some additional notation is introduced first.
For any index set J ⊆ I × J, set

E_J = { S* ∈ E : supp(S*) = J },   supp(S*) = { (i, j) ∈ I × J : S*_{ij} ≠ 0 }.

The set E_J is the relative interior of a convex polytope and therefore a manifold of equilibria. This follows from the observation that the equilibrium criterion (2.11) is a set of linear equality constraints for S* ∈ W̄ with fixed support J. Further, for any triple (n_s, n_c, n_u) of nonnegative integers with n_s + n_c + n_u = mn, set

E_{(n_s,n_c,n_u)} = { S* ∈ E : dim E_s(S*) = n_s, dim E_c(S*) = n_c, dim E_u(S*) = n_u },

where E_c(S*), E_s(S*) and E_u(S*) denote the center, stable and unstable subspaces of ∂F/∂S(S*). This set can be written as a countable union of compact sets, which can be seen as follows. The map

E → C^{mn},   S* ↦ λ( ∂F/∂S (S*) ),   (2.29)

where λ(·) denotes the vector of eigenvalues, is a continuous map on a compact set and therefore proper, i.e., preimages of compact sets under the map (2.29) are compact. It is clear that the set U_s × U_c × U_u with

U_s = { z ∈ C : Re(z) < 0 }^{n_s},   U_c = { z ∈ C : Re(z) = 0 }^{n_c},   U_u = { z ∈ C : Re(z) > 0 }^{n_u}

can be written as a countable union of compact sets. The preimage of this set under the map (2.29) is E_{(n_s,n_c,n_u)}.
To complete the proof, we now argue similar to the proof of Proposition 7: the existence of nontrivial unstable subspaces for nonintegral equilibria implies that the center-stable manifold has a smaller dimension.
Let J be the support of any nonintegral equilibrium and let E_{(n_s,n_c,n_u)} be such that E_J ∩ E_{(n_s,n_c,n_u)} ≠ ∅. As seen in the proof of Corollary 1(c), we have E_u(S*) ∩ T_0 ≠ {0} for any S* ∈ E_J, i.e. n_u ≥ 1. Since both E_J and E_{(n_s,n_c,n_u)} can be written as countable unions of compact sets, this is also the case for their intersection, i.e., we have

E_J ∩ E_{(n_s,n_c,n_u)} = ∪_{l∈N} K_l

with K_l ⊆ E_J compact. For any l ∈ N, there exists a center-stable manifold M_cs(K_l) containing K_l, which is invariant under the S-flow and tangent to E_c(S*) ⊕ E_s(S*) at any S* ∈ K_l [10, Theorem 9.1]. Any trajectory of the S-flow converging to a point S* ∈ K_l lies in M_cs(K_l). Hence, analogous to the proof of Proposition 7, we have

dim M_cs(K_l) = mn − n_u < mn

with any S* ∈ K_l, i.e., M_cs(K_l) ∩ W has measure zero in W. The countable union ∪_{l∈N} M_cs(K_l) ∩ W, which contains all trajectories converging to an equilibrium S* ∈ E_J ∩ E_{(n_s,n_c,n_u)}, has measure zero as well. Since there are only finitely many such sets E_J ∩ E_{(n_s,n_c,n_u)}, this completes the proof.
In view of Theorem 2, the following corollary, which additionally takes into account assumption (2.22), is obvious.

Corollary 2 (Convergence to integral assignments)
Let Ω be a nonnegative matrix with positive diagonal entries which also fulfills the symmetry assumption (2.22). Then the set of starting points S 0 ∈ W, for which the S-flow (2.2) does not converge to an integral assignment S * ∈ W * , has measure zero. If Ω is additionally invertible, then the set of distance matrices D ∈ R m×n for which the S-flow does not converge to an integral assignment has measure zero as well.

Basins of Attraction
Corollary 1 says that, if a point S * ∈ W * satisfies the stability criterion (2.20), then there exists an open neighborhood of S * such that the S-flow emanating from this neighborhood will converge to S * with an exponential convergence rate. The subsequent proposition quantifies this statement by describing the convergence in balls around the equilibria which are contained in the corresponding basin of attraction.

Proposition 8
Let Ω be a nonnegative matrix with positive diagonal entries, and let S* ∈ W* satisfy (2.20). Furthermore, set

A(S*) = { S ∈ W̄ : (ΩS)_{i j*(i)} > (ΩS)_{ij} for all i ∈ I, j ∈ J \ {j*(i)} },   (2.33)

which is an open convex polytope containing S*. Finally, let ε > 0 be small enough such that

B_ε(S*) ∩ W̄ ⊆ A(S*).   (2.34)

Then, regarding the S-flow (2.2), the following holds: every trajectory emanating from a point S_0 ∈ B_δ(S*) ∩ W with δ ≤ ε remains in B_δ(S*) and converges to S* at an exponential rate. Indeed, along any such trajectory one obtains a differential inequality for max_{i∈I} ‖S_i − S*_i‖_1, and by Gronwall's Lemma (2.35a) holds. Hence, max_{i∈I} ‖S_i − S*_i‖_1 monotonically decreases as long as S(t) ∈ B_δ(S*). This guarantees that S(t) stays in B_δ(S*) ⊂ B_ε(S*) and converges toward S*.
Proposition 8 provides a criterion for terminating the numerical integration of the S-flow and the subsequent 'safe' rounding to an integral solution. For this purpose, the following proposition provides an estimate of the value ε defining (2.34).
Proposition 9 Let S* ∈ W* satisfy (2.20). A value ε > 0 that is sufficiently small for the inclusion (2.34) to hold is given by the expression ε_est in (2.39). Proof We have to show that S ∈ A(S*), with A(S*) given by (2.33), for any S ∈ B_ε(S*) ∩ W̄.
Hence, for any i ∈ I and any j ≠ j*(i), with j*(k), k ∈ I, defined analogously to j*(i) in (2.20), by dropping the second nonnegative summand and using that the subvectors of S* are unit vectors, we obtain the strict inequality (ΩS)_{i j*(i)} > (ΩS)_{ij}. This verifies S ∈ A(S*).

Corollary 3
Let Ω defined by (1.14) be given by uniform weights ω_{ik} = 1/|N_i|, k ∈ N_i, i ∈ I. Then the value ε > 0 achieving the inclusion (2.34) can be chosen as the value ε_unif given by (2.44). Proof Let j*(i) be defined as in (2.20). We have ε_unif ≤ ε_est, with ε_est given by (2.39), by assumption, by the integrality of the numerator in (2.44a), and by the monotonicity of the function involved. The assertion therefore follows from Proposition 9.

Convergence Properties of the Linear Assignment Flow
This section analyzes the convergence of the linear assignment flow to equilibria and limit points. To apply the standard theory, we rewrite the matrix-valued (V ∈ R^{m×n}) equation of the linear assignment flow (2.5) as a vector-valued (V ∈ R^{mn}) one, reusing the symbol V for simplicity. Equation (2.5) then takes the form V̇ = AV + b (2.46a). Note that the matrix A is exactly the second summand in the Jacobian (2.13) of the S-flow. The first summand of (2.13) is due to the dependence of the replicator matrix on the flow; the linear assignment flow (2.5) ignores this dependency by regarding S ∈ W as fixed.
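For concreteness, the vectorized system matrix A = R_S(Ω ⊗ I_n) is straightforward to assemble numerically: R_S is block diagonal with the replicator matrices R_{S_i} = Diag(S_i) − S_i S_i^⊤ on its diagonal. The following sketch (Python/numpy; the particular Ω and S below are our illustrative choices, not taken from the paper) also confirms the rank and kernel statements of Proposition 11 and the real spectrum obtained for a symmetric Ω:

```python
import numpy as np

def replicator_matrix(s):
    """R_s = Diag(s) - s s^T for a point s in the simplex."""
    return np.diag(s) - np.outer(s, s)

def system_matrix(Omega, S):
    """A = R_S (Omega kron I_n) for a row-stochastic S in R^{m x n}."""
    m, n = S.shape
    R = np.zeros((m * n, m * n))
    for i in range(m):
        R[i*n:(i+1)*n, i*n:(i+1)*n] = replicator_matrix(S[i])
    return R @ np.kron(Omega, np.eye(n))

rng = np.random.default_rng(0)
m, n = 4, 3
Omega = 0.7 * np.eye(m) + 0.3 / m          # symmetric, row-stochastic, invertible
S = rng.uniform(0.2, 1.0, (m, n))
S /= S.sum(axis=1, keepdims=True)          # interior point of the assignment manifold
A = system_matrix(Omega, S)

assert np.linalg.matrix_rank(A) == m * (n - 1)        # rank m(n-1)
for i in range(m):
    v = np.kron(np.eye(m)[i], np.ones(n))
    assert np.allclose(A @ v, 0)                      # e_i kron 1_n lies in the kernel
assert np.allclose(np.linalg.eigvals(A).imag, 0, atol=1e-6)  # real spectrum: Omega symmetric
```

The symmetric, strictly diagonally dominant Ω guarantees invertibility of Ω ⊗ I_n, so rank(A) = rank(R_S) = m(n − 1) holds exactly.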
The following lemma says that under the assumption b ∈ R(A), the asymptotic properties of (2.46a) can be inferred from the homogeneous system. Lemma 2 Let Ψ_{A,b,V_0}(t) denote the flow of the dynamical system (2.46) with initial condition V(0) = V_0, and assume b ∈ R(A). Then the corresponding flow equation holds. Proof For b ∈ R(A) we have AA⁺b = b, and therefore the claim follows with Duhamel's formula [29, p. 72]. The hyperplane of initial values with c_1 = 0 separates two half-spaces, which are the regions of attraction for the limit points in the directions v_1 and −v_1, respectively.
Lemma 3 implies the following properties of the system (2.48).
Proposition 10 Any linear dynamical system of the form (2.48) with diagonalizable A has the following properties: (a) If A has an eigenvalue with positive real part, then any finite equilibrium is unstable, and the set of initial points converging to these equilibria is a null set. (b) If all eigenvalues of A are real, then no trajectory spirals around a subspace through the origin infinitely often, i.e. 0 is neither a spiral sink nor a spiral source. (c) The set of equilibria is the nullspace N(A). (d) The stable (resp. unstable) manifold is spanned by the eigenvectors of A corresponding to eigenvalues with negative (resp. positive) real part. All initial points that do not belong to the center-stable manifold diverge to infinity.
The following proposition complements Proposition 10 by examining the spectrum of the matrix A.
Proposition 11 Let A = R S (Ω ⊗ I n ) be the system matrix of the linear assignment flow (2.46a). Then the following holds.
S denotes the symmetric positive semidefinite square root of R_Ŝ. The last matrix is symmetric, and therefore all of the matrices above only have real eigenvalues. By Proposition 10(b), the system converges either to a finite equilibrium or towards a fixed point at infinity. (c) We have rank(A) = rank(R_S(Ω ⊗ I_n)) = rank(R_S) = m(n − 1), which yields the first statement. The second statement follows from R_S(Ω ⊗ I_n)(e_i ⊗ 1_n) = R_S(Ω e_i ⊗ I_n 1_n) = R_S(Ω e_i ⊗ 1_n) = 0, (2.52) since R_{S_i} 1_n = 0 for all i ∈ I. With Proposition 10(c) we conclude that 0 is the only finite equilibrium. (2.53) Hence, by Sylvester's law of inertia, the matrices (Ω ⊗ I_n)^{1/2} R_S (Ω ⊗ I_n)^{1/2} and R_S have the same inertia. Thus, the center-stable manifold contains only the origin, and Proposition 10(d) yields divergence to infinity.

Remark 6
If Ω is not a row-wise positive scaling of a symmetric matrix, the resulting matrix A may have complex eigenvalues. This can be seen for the following choice of Ω̂, for which the matrix A has the eigenvalues σ(A) = {1/2 + (1/2)i, 1/2 − (1/2)i, 0, 0}. Note that this Ω is a row-wise scaling of a symmetric matrix, but not a row-wise positive scaling.
The same matrix Ŝ together with another choice of Ω exhibits similar behavior. If Ω is randomly chosen, or is a matrix of the form (2.22) estimated from data, it generally has negative eigenvalues.
To analyze the asymptotic behavior of the lifted flow, it suffices to lift the line in the direction of the maximal eigenvector to the assignment manifold, as examined next. In particular, if v has a unique maximal entry, then exp_p(tv) converges to the corresponding unit vector as t → ∞.
Proof Set v_max = max_i v_i and consider exp_p(tv) = exp_p(t(v − v_max 1_n)) = (p ⊙ e^{t(v − v_max 1_n)}) / ⟨p, e^{t(v − v_max 1_n)}⟩. In the numerator, every entry that does not correspond to a maximal entry of v converges to 0 as t → ∞, whereas the remaining entries converge to the corresponding entries of p. The denominator normalizes the expression, which yields the result.
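The argument is easy to reproduce numerically; a minimal sketch (numpy; the particular p and v are arbitrary illustrative choices):

```python
import numpy as np

def lift(p, v):
    """Lifting map exp_p(v) = (p * e^v) / <p, e^v> on the simplex."""
    w = p * np.exp(v - v.max())      # subtracting max(v) leaves the value unchanged
    return w / w.sum()

p = np.array([0.5, 0.3, 0.2])        # interior point of the simplex
v = np.array([0.4, 1.1, -0.2])       # unique maximal entry at index 1
q = lift(p, 100.0 * v)               # large t approximates the limit t -> infinity
assert np.allclose(q, np.eye(3)[1], atol=1e-12)
```

The shift by max(v) in the code is exactly the invariance exp_p(v) = exp_p(v − v_max 1_n) used in the proof; here it doubles as a safeguard against floating-point overflow.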
Applying this lemma to each vertex in I, we get the following statement on the convergence of the lifted linear assignment flow to integral assignments.

Corollary 4 Under the assumptions of Lemma 3, if the limit direction determined by v_1 and W_0 has a unique maximal entry for each vertex, then the lifted flow (2.56) converges to an integral assignment.
Because W 0 and the dominant eigenvector of A depend on real data in practice, the assumptions of Corollary 4 are typically satisfied.
We conclude this section by comparing the convergence properties of the S-flow to those of the linear assignment flow.

Remark 7 (S-flow vs. linear assignment flow)
If the diagonal entries of Ω are nonnegative with at least one positive entry, then the Jacobian matrices of the S-flow (at nonintegral points) and the Jacobian matrix A of the linear assignment flow have at least one eigenvalue with positive real part (see Proposition 6(c) and Proposition 11(a)). Thus, for both flows and such an Ω, the nonintegral equilibria are unstable (Corollary 1(c) and Proposition 11(a)).
Theorem 1 and Proposition 11(b) state that for both flows a sufficient condition for convergence is that Ω has the form (2.22). Let Ω have both properties, i.e. nonnegative on the diagonal with at least one positive entry and row-wise positive scaling of a symmetric matrix. Then, the set of initial values converging to a nonintegral point is negligible (Proposition 7, Theorem 2 and Proposition 10(a)).
For a given initial value, the two flows generally converge to different limit points, and their regions of attraction generally differ. However, for small finite times, the linear assignment flow approximates the assignment flow, and (after the appropriate transformation) the S-flow, very well [33].

Discretization, Geometric Integration
We confine ourselves to the simplest geometric scheme worked out in [33] for numerically integrating the assignment flow (1.20). Applying this scheme to the S-flow (2.2), which has the same structure as (1.20), yields the iteration (3.1), where h denotes a fixed step size and the iteration counter t corresponds to the point of time th.
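As an illustration of such a geometric step (a sketch in our own notation, not necessarily the exact scheme (3.1): we assume an update of the form S_i^{(t+1)} = exp_{S_i^{(t)}}(h (ΩS^{(t)})_i) with the row-wise lifting map exp_p(v) = p ⊙ e^v / ⟨p, e^v⟩), each iterate stays on the assignment manifold by construction:

```python
import numpy as np

def lift_rows(S, V):
    """Row-wise lifting map exp_{S_i}(V_i) = S_i * e^{V_i} / <S_i, e^{V_i}>."""
    W = S * np.exp(V - V.max(axis=1, keepdims=True))
    return W / W.sum(axis=1, keepdims=True)

def geometric_euler(S0, Omega, h=0.5, steps=500):
    """Iterate the lifted Euler step: every iterate has nonnegative rows summing to 1."""
    S = S0.copy()
    for _ in range(steps):
        S = lift_rows(S, h * (Omega @ S))
    return S

m, n = 6, 2
Omega = np.full((m, m), 1.0 / m)                 # uniform averaging weights (our choice)
S0 = np.tile([0.7, 0.3], (m, 1))                 # data slightly favor label 0
S = geometric_euler(S0, Omega)

assert np.allclose(S.sum(axis=1), 1.0)           # rows stay in the simplex
assert (S >= 0).all()
assert (S[:, 0] > 0.999).all()                   # converged to the integral label 0
```

Because the update is multiplicative and renormalized row-wise, no projection back onto the simplex is ever needed, in contrast to an ordinary explicit Euler step.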
The following proposition shows that using this numerical method is 'safe' in the sense that, for a sufficiently small step size h, the approximation of the continuous-time solution S(t) by the sequence (S^{(t)})_{t≥0} generated by (3.1) becomes arbitrarily accurate.
Proposition 12 Let L > 0 be the Lipschitz constant of the mapping F (2.2) defining the S-flow. Then there exists a constant C > 0 such that the solution S(t) to the S-flow (2.2) and the sequence (S^{(t)})_{t≥0} generated by (3.1) satisfy the relation (3.2). Proof See Appendix A.
Proposition 8 asserts the existence of regions of attraction for stable equilibria S * ∈ W of the continuous-time S-flow (2.2). The following proposition extends this assertion to the discrete-time S-flow (3.1).

Numerical Examples, Discussion
We illustrate in this section by a range of counter-examples that violating assumption (2.22) can make the assignment flow behave quite differently from what the assertions of Section 2 predict. In fact, we use violations of the assumptions as a guiding principle for constructing alternative asymptotic behavior (Section 3.2.2).
In addition, we briefly discuss the influence of the parameter matrix Ω on the spatial shape of labelings returned by the assignment flow. Finally, we illustrate that our results on the region of attraction of the S-flow towards labelings turn the termination criterion proposed by [2] into a mathematically sound one, provided a proper geometric scheme is used for numerically integrating the assignment flow.

Vanishing Diagonal Averaging Parameters
We consider a small dynamical system that violates the basic assumption of Corollary 1 that all diagonal entries of the parameter matrix Ω of the S-flow (2.2) are positive. As a consequence, an entire line L of nonintegral points S* locally attracts the flow. Every point on L is an equilibrium of the S-flow satisfying F(S*) = 0; in particular, this includes nonintegral points with p ∈ (0, 1). The eigenvalues of the Jacobian are nonpositive. The phase portrait depicted in Figure 3.1 illustrates that L locally attracts the flow. This small example demonstrates that violating the basic assumption (here, specifically, ω_11 in (3.7) is not positive) leads to S-flows with properties not covered by the results of Section 2. Note that Theorem 2 is also based on this assumption and does not apply to the present example: there is an open set of starting points S_0 ∈ W for which the S-flow converges to nonintegral equilibria S* ∈ W.
Recalling Corollary 4, we see that for the linear assignment flow (2.5), continuous sets on the boundary of the assignment manifold, like the line L in Figure 3.1, cannot arise as sets of limit points.

Constructing 3 × 3 Systems with Various Asymptotic Properties
In this section, we construct a family of S-flows (2.1a) in terms of a class of nonnegative parameter matrices Ω that may violate assumption (2.22), which underlies Theorem 1. Accordingly, for a small problem size n = 3, we explicitly specify flows that exhibit one of the following behaviors: 1. t → S(t) converges to a point S* ∈ W as t → ∞; 2. t → S(t) is periodic with some period t_1 > 0; 3. t → S(t) neither converges to a point nor is periodic.
These cases are discussed below as Example 3 and illustrated by Figure 3.2. They demonstrate that assumption (2.22) is sharp in the sense that violating it may easily cause the flow to fail to converge to an equilibrium.
Let D denote the set of doubly stochastic, circulant matrices. We consider the case m = |I| = |J| = n and therefore have D ⊂ W. Let P denote the permutation matrix that represents the n-cycle (1, . . . , n). Then D is the convex hull of the matrices {P, P², . . . , Pⁿ} with Pⁿ = I_n, and any element M ∈ D admits the representation M = Σ_{k∈[n]} µ_k P^k with µ ∈ Δ_n. (3.11) Since the matrices P, P², . . . , Pⁿ ∈ R^{n×n} are linearly independent, the vector µ ∈ Δ_n is uniquely determined. We call µ the representative of M ∈ D. The following lemma characterizes two matrix products on D in terms of the corresponding representatives.
Lemma 5 Let µ^{(1)}, µ^{(2)} ∈ Δ_n be the representatives of any two matrices M^{(1)}, M^{(2)} ∈ D. Then the element-wise Hadamard product and the ordinary matrix product admit explicit representations in terms of µ^{(1)} and µ^{(2)}. Proof We note that the kth power of P is again a cyclic permutation matrix. This implies P^k ⊙ P^l = δ_{kl} P^k for k, l ∈ [n], with δ_{kl} denoting the Kronecker delta, which yields the formula for the Hadamard product. As for (3.13), the matrix product formula follows by direct computation. The following proposition shows that the S-flow on D can be expressed by the evolution of the corresponding representative.
Then the solution S(t) evolves on D for all t ∈ R. In addition, the corresponding representative p(t) ∈ Δ_n of S(t) = Σ_{k∈[n]} p_k(t) P^k satisfies the replicator equation ṗ = R_p(Ω p). Proof Let S = Σ_{k∈[n]} p_k P^k ∈ D with p ∈ Δ_n. Lemma 5 yields the corresponding product representations; since the resulting identity holds for any i ∈ [n], the right-hand side of the S-flow (2.1a) can be rewritten accordingly. Since p ∈ Δ_n, we have ⟨v, 1_n⟩ = 0, i.e. v is tangent to Δ_n. Hence, by (3.20), Ṡ = Σ_{k∈[n]} ṗ_k P^k = R_S(Ω S) is determined by ṗ = v = R_p(Ω p), whose solution p(t) evolves on Δ_n.
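Both Lemma 5 and the reduction to the representative can be checked numerically. The sketch below (numpy; the function names and the choices n = 4 and Ω = I_3 are ours, not the paper's) verifies the two product formulas on D and then integrates ṗ = R_p(Ωp) with an explicit Euler step for the symmetric choice Ω = I_3, for which the flow converges to a single integral point:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
P = np.roll(np.eye(n), 1, axis=1)      # permutation matrix of the n-cycle

def from_rep(mu):
    """M = sum_k mu_k P^k; here mu[k] weights P^{k+1}, and P^n = I."""
    return sum(mu[k] * np.linalg.matrix_power(P, k + 1) for k in range(n))

mu1, mu2 = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))
M1, M2 = from_rep(mu1), from_rep(mu2)

# Hadamard product: P^k and P^l have disjoint supports for k != l (mod n),
# so M1 * M2 has the entrywise product of representatives as coefficients
assert np.allclose(M1 * M2, from_rep(mu1 * mu2))

# ordinary matrix product: the representative is the circular convolution,
# with the index shift induced by mu[k] weighting P^{k+1}
conv = np.array([sum(mu1[k] * mu2[(m - k - 1) % n] for k in range(n))
                 for m in range(n)])
assert np.allclose(M1 @ M2, from_rep(conv))

# representative flow pdot = R_p(Omega p), Euler-integrated for Omega = I_3
def rhs(p, Omega):
    a = Omega @ p
    return p * (a - p @ a)             # R_p v = Diag(p) v - <p, v> p

p = np.array([0.5, 0.3, 0.2])
for _ in range(50_000):
    p = p + 1e-3 * rhs(p, np.eye(3))
assert abs(p.sum() - 1.0) < 1e-8 and p[0] > 0.99   # converges to argmax p(0)
```

The plain Euler step suffices here because the right-hand side sums to zero, so the simplex constraint is preserved up to round-off.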
The following proposition introduces a restriction on the parameter matrices Ω ∈ D that ensures, for any such Ω, that the product Π_{j∈[n]} p_j changes monotonically under the flow (3.17).

Proof Set π_p := Π_{j∈[n]} p_j. By virtue of (3.17) and ⟨1_n, Ω p⟩ = ⟨Ω^⊤ 1_n, p⟩ = 1 (Ω is doubly stochastic and p ∈ Δ_n), we have d/dt π_p = π_p Σ_{j∈[n]} ((Ω p)_j − ⟨p, Ω p⟩) = π_p (1 − n ⟨p, Ω p⟩). Hence, since π_p > 0 for p in the interior of Δ_n, d/dt π_p has the same sign as 1/n − ⟨p, Ω p⟩. Regarding the term ⟨p, Ω p⟩, we have the following three cases: (α) for all k < n, the inequality ⟨p, P^k p⟩ ≤ ⟨p, p⟩ = ⟨p, P^n p⟩ holds, with equality if and only if p = (1/n) 1_n; (β) Σ_{k∈[n]} ⟨p, P^k p⟩ = ⟨p, 1_{n×n} p⟩ = 1; (γ) for all k ∈ [n], ⟨p, P^k p⟩ = ⟨p, P^{n−k} p⟩, since P^{−1} = P^⊤.
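The identity d/dt π_p = π_p (1 − n⟨p, Ωp⟩) derived in the proof can be probed by a finite difference along the flow direction; a sketch (numpy; the random doubly stochastic circulant Ω built from a Dirichlet sample is our illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
P = np.roll(np.eye(n), 1, axis=1)            # n-cycle permutation matrix
mu = rng.dirichlet(np.ones(n))
Omega = sum(mu[k] * np.linalg.matrix_power(P, k + 1) for k in range(n))

p = rng.dirichlet(np.ones(n))                # interior point of the simplex
a = Omega @ p
pdot = p * (a - p @ a)                       # flow direction R_p(Omega p)

predicted = 1.0 - n * (p @ a)                # claimed value of d/dt log(pi_p)
h = 1e-6
fd = (np.log(np.prod(p + h * pdot)) - np.log(np.prod(p))) / h
assert abs(fd - predicted) < 1e-4
```

Since Ω is doubly stochastic, ⟨1_n, Ωp⟩ = 1, which is what turns the sum Σ_j ((Ωp)_j − ⟨p, Ωp⟩) into 1 − n⟨p, Ωp⟩.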
The scalars γ_k in (3.21) steer the skew-symmetric part of Ω. Consequently, if γ_k = 0 for all k, then Ω is symmetric and the S-flow converges to a single point by Theorem 1. Depending on the skew-symmetric part, the S-flow may fail to converge to a point, as Example 3 below demonstrates for a few explicit instances and n = 3. Note that in the case n = 3, (3.21) describes a parametrization, rather than a restriction, of Ω ∈ D, subject to the constraint µ ∈ Δ_3.
We examine the behavior of the flow (3.17) depending on the parameters α and γ. Note that the flow does not depend on the parameter β, which merely ensures that Ω is row-stochastic. Case α < 0. As already discussed (Remark 8), p(t) converges to the barycenter in this case. Depending on γ, this may happen with (γ ≠ 0) or without (γ = 0) a spiral, as depicted by Figure 3.2 (a) and (b).
Case α > 0. We distinguish again the two cases γ = 0 and γ ≠ 0. If γ = 0, then the flow reduces to ṗ = α R_p p, whose solution converges to lim_{t→∞} p(t) = (1/|J*|) Σ_{j∈J*} e_j ∈ Δ_3, with J* = arg max_{j∈[3]} p_j(0). (3.31) As for the remaining case α > 0 and γ ≠ 0, we distinguish α > |γ| and α ≤ |γ|, as illustrated by Figure 3.2.

Example 4 This example continues Example 3. Accordingly, we consider the case n = 3 and assume Ω ∈ D. Let the distance matrix D, whose row vectors define the mappings (1.16) corresponding to the assignment flow, be chosen with the corresponding circulant structure. Then, if Ω ∈ D, the initial value S(0) = exp_{1_W}(−Ω D) of the S-flow (2.1a) lies in D as well. Hence, the above observations of Example 3 for the S-flow apply. The resulting assignment flow t → W(t) then also evolves in D, which can be verified using (2.3).
As for the averaging parameters Ω, we consider three matrices in D. Matrix Ω_center corresponds to the parameters (α, β, γ) = (−1/2, 3/2, 1/2) of (3.28), for which the S-flow converges to the barycenter. As a consequence, W(t) converges to a point in W \ {1_W}.
Matrix Ω_cycle corresponds to the parameters (α, β, γ) = (0, 1, 1/3), for which the S-flow has periodic orbits. Since these orbits are symmetric around the barycenter, i.e. ∫_0^{t_1} (S(t) − 1_W) dt = 0 with t_1 the period of the trajectory, the trajectory t → W(t) is also periodic as a consequence of equation (2.3).
Finally, matrix Ω_spiral corresponds to the parameters (α, β, γ) = (0.1, 0.9, 0.3), for which the S-flow spirals towards the boundary of the simplex. It is not clear a priori whether t → W(t) converges to a single point. The trajectory of W(t) shown in Figure 3.3 suggests that the assignment flow also spirals towards the boundary of the simplex without converging to a single point.

Remark 9
Examples 3 and 4 considered the special case m = |I| = |J| = n = 3. In further experiments we observed similar behavior also in the case |J| < |I|. For example, it can be verified for |J| = 2 and Ω = Ω_cycle from Example 4 that the S-flow possesses an (unstable) limit cycle, i.e. a periodic orbit.
The above examples also demonstrate that several symmetries in the input data are required, e.g. Ω ∈ D and S_0 ∈ D, in order to obtain nonconvergent orbits. Small perturbations, like numerical errors or the noise omnipresent in real data, break these symmetries. Therefore, such behavior of the S-flow and the assignment flow, respectively, is very unlikely to be observed in practice.

Geometric Averaging and Spatial Shape
We design and construct a small academic example that, despite its simplicity, illustrates the following important points.

[Displaced caption of Figure 3.3: The flow for the matrix Ω_center converges to a point in the interior of the assignment manifold; this limit point differs from the barycenter. The trajectory for the averaging matrix Ω_cycle is a closed curve. The trajectory for Ω_spiral spirals towards the boundary of the simplex. For the sake of clarity, only the trajectory of one data point is plotted for Ω_spiral; the trajectories for the other data points can be obtained from it by permuting the label indices.]
The points are: the region of attraction due to Corollary 3, here for the special case of uniform averaging parameters Ω (and likewise, more generally, for nonuniform Ω (Proposition 9)), which enables terminating the numerical scheme and rounding to the correct labeling; the influence of Ω on the spatial shape of patterns created through data labeling, which provides the basis for pixel-accurate 'semantic' image labeling; and the fact that undesired asymptotic behavior of the numerically integrated assignment flow (cf. Remark 10 below) cannot occur when using proper geometric numerical integration, like the scheme (3.1) or any scheme devised by [33].

[Displaced caption of Figure 3.4: The input image consisting of three colors, which was used for computing the distance matrix D, is shown on the left. This distance matrix was used to initialize the S-flow, whose limit is illustrated by the image on the right. This is a minimal example demonstrating how the stability conditions (2.20) constrain spatial shape.]
green and blue. For spatial regularization we used 3 × 3 neighborhoods N_i, i ∈ I, with uniform weights ω_ik = 1/|N_i|, k ∈ N_i, and with shrunken neighborhoods where they intersect the boundary of the underlying square domain. The distance matrix D that initializes the S-flow by S_0 = exp_{1_W}(−Ω D) was set to D_ij = 10 · ‖u_i − e_j‖_2, i ∈ I, j ∈ J.
Adopting the termination criterion from [2], we numerically integrated the S-flow using the scheme (3.1) until the iteration T at which the average entropy dropped below 10^{−3}, i.e. until condition (3.34) was met.
The resulting assignment S^{(T)} was rounded to the integral assignment S* ∈ W* depicted in the right panel of Figure 3.4. We observe the following.
(i) The resulting labeling S* differs from the input image although exact (integral) input data were used. This conforms to Corollary 1(b), which allows us to recognize the input data as unstable. As a consequence, the green and blue labels at the corners of the corresponding quadrilateral shapes in the input data are replaced by the flow. The resulting labeling S* is stable, as one easily verifies using Corollary 1(a). This simple example and the corresponding observation point to a fundamental question to be investigated in future work: how can Ω be used for 'storing' prior knowledge about the shape of labeling patterns? (ii) Using the estimate (2.43), which is the special case of (2.39) for uniform weights, we computed the value ε. Since the distance between S* and the assignment S^{(T)} obtained after terminating the numerical integration due to (3.34) satisfied the corresponding bound, Proposition 13 guaranteed that S^{(t)} converges to S* for t > T, i.e. that no label indicated by S^{(T)} can change anymore. With regard to Proposition 12, the estimate (3.2) implies for sufficiently small step size h > 0 that the continuous S-flow S(hT) also lies in the attracting region B_ε(S*). Proposition 8 then states the convergence of the S-flow to S*. Eventually, the continuous assignment flow (1.20) converges to S* by Proposition 4.
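The entropy-based termination test and the subsequent rounding are elementary to implement; a sketch (numpy; the near-integral matrix S_T below is a stand-in for a computed assignment S^{(T)}, not data from the experiment):

```python
import numpy as np

def average_entropy(S):
    """Mean entropy of the rows of S, with 0 * log(0) := 0."""
    T = np.where(S > 0, S * np.log(np.where(S > 0, S, 1.0)), 0.0)
    return -T.sum() / S.shape[0]

def round_to_labeling(S):
    """Round each row of S to the unit vector at its maximal entry."""
    return np.eye(S.shape[1])[np.argmax(S, axis=1)]

S_T = np.array([[1 - 1e-5, 1e-5],
                [2e-5, 1 - 2e-5],
                [1 - 5e-6, 5e-6]])
assert average_entropy(S_T) < 1e-3          # termination criterion met
L = round_to_labeling(S_T)
assert np.array_equal(L, [[1., 0.], [0., 1.], [1., 0.]])
```

In the setting of point (ii), once the terminated iterate additionally lies in the ball B_ε(S*), this argmax rounding is 'safe' in the sense that no label can change anymore.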
Remark 10 (numerical integration and asymptotic behavior) The authors of [2] adopted a numerical scheme from [19] which, when adapted and applied to (1.1), was shown in [5] to always converge to a constant solution as t → ∞, i.e. a single label is assigned to every pixel, which clearly is an unfavorable property. Irrespective of the fact that [2] used uniform positive weights, which satisfy assumption (2.22), this odd asymptotic behavior resulted from the fact that the adaptation of the discrete scheme of [19] implicitly uses different step sizes for updating the flow S_i at different locations i ∈ I. Our results in this paper show that, under appropriate assumptions on the parameter matrix Ω, the continuous-time assignment flow does not exhibit this asymptotic behavior. In addition, point (ii) above and Proposition 13 show that using a proper geometric scheme from [33] turns condition (3.34) into a sound criterion for terminating the numerical scheme, followed by safe rounding to an integral labeling.

Conclusion
We established in this paper that, under reasonable assumptions on the weight parameters Ω, the assignment flow approach is a sound method for contextual data classification on graphs. Favourable properties, like convergence to integral assignments and the existence of corresponding basins of attraction, extend to the sequences generated by discrete-time schemes for geometric integration. This shows that geometric numerical integration of the assignment flow yields sound numerical algorithms. A range of counter-examples demonstrates that these conditions are not too strong, since violating them may quickly lead to unfavorable behavior of the assignment flow regarding classification.
The results provide a proper basis for, and justify, recent work on learning the assignment flow parameters Ω from data [16,31,32], on extending the approach to unsupervised data classification on graphs [34,35], and on taking additional spatial constraints into account [28]. Our future work will focus on deeper parametrizations of assignment flows within the same mathematical framework and on studying their properties and performance for statistical data classification on graphs.

A.1 Proof of Proposition 4
Proof (a) Let β_i be defined such that there exists t_1 ≥ 0 with the stated property. We estimate the relevant quantity, where the last inequality follows from the hypothesis of (2.8). Thus the improper integral converges, whereas for any j ∈ J \ J*(i) it diverges.

A.2 Proof of Proposition 6

Proof (a) Since σ(∂F/∂S(S*)) = σ((∂F/∂S(S*))^⊤), we may alternatively regard the transpose of the Jacobian. For each i ∈ I, the transposed Jacobian possesses the eigenpairs (A.9). If S* ∈ W*, then |supp(S*_i)| = 1 for each i ∈ I, and therefore (A.9) specifies all mn eigenpairs and the entire spectrum, which proves (2.16). In this case, the eigenvectors of ∂F/∂S(S*) can also be stated explicitly. Since R_{S*} = 0, each block B_i fulfills the corresponding relation, and the eigenvectors of ∂F/∂S(S*) follow for all i ∈ I, i.e. the Jacobian matrix simplifies accordingly. Hence, the Jacobian has the following mn − (m − |Ĩ|)(|J_+| − 1) eigenpairs. If |Ĩ| = m, we thus have a complete set of mn eigenpairs. If |Ĩ| < m, we may consider a diagonalizable perturbation Ω̃ of Ω. By the same argument, we get a complete set of eigenpairs for the perturbed Jacobian matrix. Consequently, we obtain (2.18) by continuity of the spectrum. (c) We show that the real and imaginary parts of the corresponding eigenvector lie in the linear subspace T_+. To this end, we show the two inclusions im R_{S*} ⊆ T_+ ⊆ ker B, (A.19) where R_{S*} and B denote the block diagonal matrices (A.20). As for the first inclusion, we use the orthogonal projection Π_{T_+} onto T_+. One can verify that Π_{T_+} R_{S*} = R_{S*}, which implies im R_{S*} ⊆ im Π_{T_+} = T_+, i.e. the first inclusion of (A.19).
As for the second inclusion, we have to take into account that S* is an equilibrium point, i.e. (2.11) holds. Since B is a block diagonal matrix, it suffices to examine each block separately. Since the relevant quantity is independent of j ∈ supp(S*_i), we get B_i v = 0 for any v ∈ R^n with ⟨v, 1_n⟩ = 0 and supp(v) ⊆ supp(S*_i). This verifies the second inclusion of (A.19). As a consequence of the two inclusions (A.19), any eigenvector V of R_{S*}(Ω ⊗ I_n) corresponding to a nonvanishing eigenvalue λ ≠ 0 has real and imaginary parts lying in im R_{S*} ⊆ T_+ ⊆ ker B. Therefore, (λ, V) is also an eigenpair of ∂F/∂S(S*) = B + R_{S*}(Ω ⊗ I_n). It remains to show that the remaining matrix has at least one eigenvalue with positive real part. Since its trace is positive by assumption, the existence of such an eigenvalue is guaranteed.

A.3 Proof of Theorem 1
The proof follows after two preparatory Lemmata. Let Λ ⊂ W be the limit set of the orbit {S(t) : t ≥ 0}, i.e.
The set Λ is nonempty since W is compact.
with equality only if S satisfies the equilibrium criterion (A.28).
Next, we introduce some additional notation. Let S* ∈ Λ be an equilibrium with S(t_k) → S*. The weighted Kullback–Leibler divergence is defined with weights w ∈ R^m_{>0} from (2.22) and the corresponding supports. Analogously to [19], we consider the corresponding index sets and define the continuous functions Q : W → R_{≥0} and V : W → R_{≥0} ∪ {∞}. The equilibrium criterion (A.28) implies V(S*) = Q(S*) = 0. Using the Lyapunov function (A.29), we have the following. Proof (of Theorem 1) Let S* ∈ Λ be any equilibrium point and (t_k)_{k∈N} a corresponding sequence due to (A.27). We show that D^w_{KL}(S*, S(t)) → 0 for t → ∞, which is equivalent to the assertion S(t) → S* to be shown.

A.4 Proof of Proposition 12
Proof For any t ∈ N_0, we set Y^{(t)}(τ) = F_τ(S^{(t)}). By assumption, F given by (2.2) is C¹, as is G given by (A.43), which has the same form. Consequently, regarding the integrand of the last integral: since W is compact, there exists a constant C such that the stated bound holds.

B Stability Statements for Dynamical Systems
We state basic results from the literature which are used to analyze the stability of the equilibria of the S-flow in Section 2.3.
Theorem 3 Let x * be an equilibrium point of the systemẋ(t) = F(x(t)) with F ∈ C 1 (U, R n ).
(a) If all eigenvalues of the Jacobian matrix ∂ F ∂ x (x * ) have negative real part, then x * is exponentially stable. (b) If the Jacobian matrix ∂ F ∂ x (x * ) has an eigenvalue with positive real part, then x * is unstable.
Statement (a) can be found in [29, Theorem 6.10]. For statement (b) we refer to [24, Proposition 6.2.1]. These stability criteria concern flows ẋ(t) = F(x(t)) on an open subset U ⊆ R^n. Since we regard the S-flow as a flow on the compact set W, a few additional arguments are needed. In [24, Section 6.8.4], a direct proof of Theorem 3(b) is sketched. Since we employ techniques used in that proof in our own analysis, we summarize the main statements in the following proposition for the reader's convenience. Informally, the proposition states that if ∂F/∂x(x*) has an eigenvalue with positive real part, then there exists an open truncated cone at x* in which the flow ẋ = F(x) is repelled from x*.
We note that if ∂F/∂x(x*) is diagonalizable with real eigenvalues, then the similarity transform in Proposition 16(a) is just the diagonalization. In general, if v ∈ C^n is an eigenvector of ∂F/∂x(x*) corresponding to an eigenvalue λ ∈ C with Re(λ) > 0, and V ∈ GL_n(R) is given by Proposition 16(a), then V^{−1} Re(v) and V^{−1} Im(v) are of the form (0, y_u)^⊤.