Assignment flows for data labeling on graphs: convergence and stability

Zern, Artjom; Zeilmann, Alexander; Schnörr, Christoph

doi:10.1007/s41884-021-00060-8

Assignment flows for data labeling on graphs: convergence and stability

Research Paper
Open access
Published: 18 November 2021

Volume 5, pages 355–404, (2022)
Cite this article

Download PDF

You have full access to this open access article

Information Geometry Aims and scope Submit manuscript

Assignment flows for data labeling on graphs: convergence and stability

Download PDF

2589 Accesses
8 Citations
6 Altmetric
Explore all metrics

Abstract

The assignment flow recently introduced in the J. Math. Imaging and Vision 58/2 (2017) constitutes a high-dimensional dynamical system that evolves on a statistical product manifold and performs contextual labeling (classification) of data given in a metric space. Vertices of an underlying corresponding graph index the data points and define a system of neighborhoods. These neighborhoods together with nonnegative weight parameters define the regularization of the evolution of label assignments to data points, through geometric averaging induced by the affine e-connection of information geometry. From the point of view of evolutionary game dynamics, the assignment flow may be characterized as a large system of replicator equations that are coupled by geometric averaging. This paper establishes conditions on the weight parameters that guarantee convergence of the continuous-time assignment flow to integral assignments (labelings), up to a negligible subset of situations that will not be encountered when working with real data in practice. Furthermore, we classify attractors of the flow and quantify corresponding basins of attraction. This provides convergence guarantees for the assignment flow which are extended to the discrete-time assignment flow that results from applying a Runge–Kutta–Munthe–Kaas scheme for the numerical geometric integration of the assignment flow. Several counter-examples illustrate that violating the conditions may entail unfavorable behavior of the assignment flow regarding contextual data classification.

On the Geometric Mechanics of Assignment Flows for Metric Data Labeling

On the geometric mechanics of assignment flows for metric data labeling

Article Open access 03 November 2023

A Variational Perspective on the Assignment Flow

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

1.1 Problem and motivation

Metric data labeling denotes the task to assign to each data point of a given finite set $\mathcal {F}_{I}=\{f_{i}:i\in I\}\subset \mathcal {F}$ in a metric space $(\mathcal {F},d_{\mathcal {F}})$ a unique label (a.k.a. prototype, class representative) from another given set $\mathcal {F}^{*}_{J}=\{f^{*}_{j}:j\in J\}\subset \mathcal {F}$. The data indices $i\in I$ typically refer to positions $x_{i} \in {\mathbb {R}}^{d}$ in space of in space-time $[0,T]\times {\mathbb {R}}^{d}$. Accordingly, one associates with the data a graph $G=(I,E)$ where the set of nodes I indexes the data and the edge set $E\subset I\times I$ represents a neighborhood system. A basic example is provided by the data of a digital image observed on a regular grid graph G in which case the data space $\mathcal {F}$ may be a color space, a high-dimensional Euclidean space like in multispectral imaging, or the positive-definite matrix manifold like in diffusion tensor medical imaging.

Data labeling provides a dramatic reduction of given data as Fig. 1 illustrates. In addition, it is a crucial step for data interpretation. Basic examples include the analysis of traffic scenes [8], of medical images or of satellite images in remote sensing.

The assignment flow approach [2] provides a mathematical framework for the design of dynamical systems that perform metric data labeling. This approach replaces established variational methods to image segmentation [7] as well as discrete Markov random fields for image labeling [17] by smooth dynamical systems that facilitate the design of hierarchical systems for large-scale numerical data analysis. In addition, it can be extended to unsupervised scenarios [34] where the labels $\mathcal {F}_{J}^{*}$ can be adapted to given data or even learned from the data itself [35]. We refer to the survey [26] for further discussion and related work.

Interpretation of data is generally not possible without an inductive bias towards prior expectations and application-specific knowledge. In connection with image labeling, such knowledge is represented by regularization parameters that influence label assignments by controlling the assignment flow. Figure 2 provides an illustration. Nowadays, such parameters are learned directly from data. Due to the inherent smoothness, assignment flows can be conveniently used to accomplish this machine learning task [16, 31, 32].

From a more distant point of view, deep networks and learning [13] prevail in machine learning. Besides their unprecedented performance in applications, current deep learning architectures are also known to be susceptible to data perturbations leading to unpredictable erroneous outputs [9, 11, 14]. Our aim, therefore, is to prove stability properties of assignment flows under suitable assumptions on the regularization parameters, together with the guarantee that labelings, i.e. integral assignments, are computed for any data at hand.

Section 1.3 further details the scope of this paper after introducing the assignment flow approach in the next section.

1.2 Assignment flow

The assignment flow has been introduced by [2] for the labeling of arbitrary data given on a graph $G=(I,E)$. It is defined by the system of nonlinear ODEs

$$\begin{aligned} {\dot{W}}_{i} = R_{W_{i}} S_{i}(W),\quad W_{i}(0) = \tfrac{1}{n} \mathbb {1}_n,\quad i \in I, \end{aligned}$$

(1.1)

whose solutions $W_i(t)$ evolve on the elementary Riemannian manifold $(\mathcal {S},g)$ given by the relative interior $\mathcal {S}={{\,\mathrm{rint}\,}}(\varDelta _{n})$ of the probability simplex

$$\begin{aligned} \varDelta _{n} = \Big \{p \in {\mathbb {R}}^{n} :\sum _{j=1}^{n} p_{j} = \langle \mathbb {1}_{n},p\rangle = 1,\ p \ge 0 \Big \}. \end{aligned}$$

(1.2)

Here, $n = |J|$ denotes the number of labels and $\mathbb {1}_n = (1,\dotsc ,1)^{\top } \in {\mathbb {R}}^n$ is the vector of ones. The tangent space of $\mathcal {S}$ at any point $p \in \mathcal {S}$ is given by

$$\begin{aligned} T_{0} = \{v \in {\mathbb {R}}^{n} :\langle \mathbb {1}_n,v\rangle = 0\}, \end{aligned}$$

(1.3)

and the Riemannian structure on $\mathcal {S}$ is defined by the Fisher-Rao metric

$$\begin{aligned} g_{p}(u,v) = \sum _{j=1}^n \frac{u_{j} v_{j}}{p_{j}},\quad p \in \mathcal {S},\quad u,v \in T_{0}. \end{aligned}$$

(1.4)

The basic idea underlying the approach (1.1) is that each vector $W_{i}(t)$ converges within $\mathcal {S}$ to an $\varepsilon $-neighborhood of some vertex (unit vector) $e_{j}$ of $\varDelta _{n}$, that is

$$\begin{aligned} \forall \varepsilon > 0:\quad \Vert W_{i}(T)-e_{j}\Vert \le \varepsilon , \end{aligned}$$

(1.5)

for sufficiently large $T = T(\varepsilon )>0$. This enables to assign a unique label (class index) j to the data point observed at vertex $i \in I$ by trivial rounding:

$$\begin{aligned} j = \underset{l \in \{1,\dotsc ,n\}}{{{\,\mathrm{arg max}\,}}}~W_{i l}. \end{aligned}$$

(1.6)

In the following, we give a complete definition of the vector field defining the assignment flow (1.1). The linear mapping $R_{W_{i}}$ of (1.1) will be called replicator matrix. It is defined by

$$\begin{aligned} R_{p} :{\mathbb {R}}^{n} \rightarrow T_{0},\quad R_{p} = {{\,\mathrm{Diag}\,}}(p)-p p^{\top },\quad p \in \mathcal {S}. \end{aligned}$$

(1.7)

Regarding the orthogonal projection onto $T_0$ given by

$$\begin{aligned} \varPi _{0} :{\mathbb {R}}^{n} \rightarrow T_{0},\qquad \varPi _{0} = I_{n}-\tfrac{1}{n}\mathbb {1}_{n}\mathbb {1}_{n}^{\top } \end{aligned}$$

(1.8)

with $I_n$ denoting the identity matrix, the replicator matrix satisfies

$$\begin{aligned} R_{p} = R_{p}\varPi _{0} = \varPi _{0} R_{p},\quad \forall p \in \mathcal {S}. \end{aligned}$$

(1.9)

Further, we will use the exponential map and its inverse

$$\begin{aligned}&\exp _{p} :{\mathbb {R}}^{n} \rightarrow \mathcal {S},\quad \exp _{p}(v) = \frac{p e^{v}}{\langle p, e^{v}\rangle },\quad p \in \mathcal {S}, \end{aligned}$$

(1.10a)

$$\begin{aligned}&\exp _{p}^{-1} :\mathcal {S} \rightarrow T_{0},\quad \exp _{p}^{-1}(q) = \varPi _{0}\log \frac{q}{p}, \end{aligned}$$

(1.10b)

where multiplication, division, exponentiation and logarithm of vectors is meant componentwise. We call this map ‘exponential’ for simplicity. In fact, definition (1.10a) is the explicit expression of the relation

$$\begin{aligned} \exp _{p} = {{\,\mathrm{Exp}\,}}_{p} \circ R_{p}, \end{aligned}$$

(1.11)

where ${{\,\mathrm{Exp}\,}}:\mathcal {S} \times T_{0} \rightarrow \mathcal {S}$ is the exponential map corresponding to the affine e-connection of information geometry; see [1, 3, 26] for details. A straightforward calculation shows that the differential of $\exp _{p}$ at v is

$$\begin{aligned} d\exp _{p}(v) = R_{\exp _{p}(v)}, \end{aligned}$$

(1.12)

where the right-hand side is defined by (1.7) and (1.10a).

The behavior of the assignment flow (1.1), essentially rests upon the coupling of the local systems through the mappings $S_{i}$ within local neighborhoods

$$\begin{aligned} \mathcal {N}_{i} = \{i\} \cup \{k \in I :i \sim k\},\qquad i \in I, \end{aligned}$$

(1.13)

corresponding to the adjacency relation $E \subseteq I \times I$ of the underlying graph G. These couplings are parameterized by nonnegative weights

$$\begin{aligned} \varOmega = \{\omega _{ik}\}_{k \in \mathcal {N}_{i}, i \in I}. \end{aligned}$$

(1.14)

Considering the assignment manifold

$$\begin{aligned} \mathcal {W} = \mathcal {S} \times \cdots \times \mathcal {S}, \qquad (\text{|I| } \text{ times}) \end{aligned}$$

(1.15)

the similarity map $S :\mathcal {W} \rightarrow \mathcal {W}$ is defined by

$$\begin{aligned}&S_{i}:\mathcal {W} \rightarrow \mathcal {S},\qquad S_{i}(W) = {{\,\mathrm{Exp}\,}}_{W_{i}}\Big (\sum _{k \in \mathcal {N}_{i}} \omega _{ik} {{\,\mathrm{Exp}\,}}_{W_{i}}^{-1}\big (L_{k}(W_{k})\big )\Big ), \quad i \in I \end{aligned}$$

(1.16a)

$$\begin{aligned}&L_{i} :\mathcal {S} \rightarrow \mathcal {S},\qquad L_{i}(W_{i}) = \exp _{W_{i}}(-D_{i}),\quad i \in I. \end{aligned}$$

(1.16b)

It regularizes the assignment vectors $W_{i} \in \mathcal {S}$ depending on the parameters (1.14), for given input data in terms of distance vectors $D_{i} \in {\mathbb {R}}^n$ storing the distances $D_{ij} = d_{\mathcal {F}}(f_i, f_j^*)$ between data points $f_i \in \mathcal {F}_{I}$ and prototypes $f_j^* \in \mathcal {F}_{J}^*$. Denoting the barycenter of $\mathcal {S}$ with $\mathbb {1}_{\mathcal {S}} = \frac{1}{n} \mathbb {1}_{n}$, the defining relation (1.16a) can be rewritten in the form [23, Lemma 3.2]

$$\begin{aligned} S_{i}(W) = \exp _{\mathbb {1}_{\mathcal {S}}}\Big (\sum _{k \in \mathcal {N}_{i}}\omega _{ik}\big (\exp _{\mathbb {1}_{\mathcal {S}}}^{-1}(W_{k})-D_{k}\big )\Big ),\quad i \in I. \end{aligned}$$

(1.17)

In view of (1.15), all the mappings in (1.7), (1.10) and (1.16) naturally generalize from $\mathcal {S}$ to $\mathcal {W}$ and from $T_{0}$ given by (1.3) to

$$\begin{aligned} \mathcal {T}_{0} = T_{0} \times \cdots \times T_{0}. \qquad (|I|\text { times}) \end{aligned}$$

(1.18)

For example,

$$\begin{aligned} \exp _{W}(V) = \big (\exp _{W_{1}}(V_{1}), \dotsc ,\exp _{W_{|I|}}(V_{|I|})\big )^{\top }. \end{aligned}$$

(1.19)

We also denote the barycenter of $\mathcal {W}$ with $\mathbb {1}_{\mathcal {W}} = ( \mathbb {1}_{\mathcal {S}}, \dotsc , \mathbb {1}_{\mathcal {S}})^\top $. Accordingly, collecting all equations of (1.1), the assignment flow reads

$$\begin{aligned} {\dot{W}} = R_{W} S(W),\qquad W(0)=\mathbb {1}_{\mathcal {W}}. \end{aligned}$$

(1.20)

1.3 Objectives

The first goal of this paper is to analyze the asymptotic behavior of the assignment flow (1.1) depending on the parameters (1.14). It was conjectured [2, Conjecture 1] that, for data in ‘general position’ as they are typically observed in real scenarios (e.g. no symmetry due to additive noise), the assignment flow converges to an integral labeling at every pixel, as described above in connection with (1.6). We confirm this conjecture in this paper under suitable assumptions on the parameters $\varOmega $. To this end, we use a reparametrization of the assignment flow and clarify the convergence of the reparameterized flow to equilibria and their stability.

The second goal of this paper concerns the same question regarding the time-discrete assignment flow that is generated by a scheme for numerically integrating (1.1). Depending on what scheme is chosen, properties of the resulting flow may differ from properties of the time-continuous flow (1.1). Indeed, the authors of [2] adopted a numerical scheme from [19] which, when adapted and applied to (1.1), was shown in [5] to always converge to a constant solution, i.e. a single label is assigned to every pixel no matter which data are observed. Even though numerical experiments strongly indicate that this undesirable asymptotic behavior is irrelevant in practice, because it only occurs when $W(t_{k})$ is so close to the boundary of the closure of the underlying domain such that it cannot be reproduced with the usual machine accuracy, such behavior—nonetheless—is unsatisfactory from the mathematical viewpoint.

In this paper, therefore, we consider the simplest numerical scheme that was recently devised and studied in [33], to better take into account the geometry underlying the assignment flow (1.1) than the numerical scheme adopted in [2]. We show under suitable assumptions on the parameters $\varOmega $, that the time-discrete assignment flow generated by such a proper numerical scheme cannot exhibit the pathological asymptotic behavior mentioned above.

1.4 Related work

The assignment flow approach emerged from classical methods (variational methods, discrete Markov random fields) to image segmentation and labeling. We refer to [26] for further discussion. The approach can take into account any differentiable data likelihood, and all discrete decisions like the formation of spatial regions at a certain scale are done by integrating the flow numerically. The inherent smoothness of the approach compares favorably to discrete schemes for image segmentation, like region growing schemes [20], in particular regarding the learning of parameters for incorporating prior knowledge. In particular, spatial regularization can be performed independently of the metric model of the data at hand. This is not the case for segmentation based on spectral clustering [27] as discussed in detail and demonstrated by [35].

From a more distant viewpoint, our results may be also of interest in the field of evolutionary game dynamics [15, 22]. The corresponding basic dynamical system has the form

$$\begin{aligned} \dot{p} = p \big (f(p)-{\mathbb {E}}_{p}[f(p)] \mathbb {1}_{n}\big ),\qquad p(0) \in \varDelta _{n}, \end{aligned}$$

(1.21)

where the first multiplication on the right-hand side is done componentwise, the expectation is given by ${\mathbb {E}}_{p}[f(p)]=\langle p, f(p)\rangle $ and p(t) evolves on $\varDelta _{n}$. The differential equation (1.21) is known as replicator equation. It constitutes a Riemannian gradient flow with respect to the Fisher-Rao metric if $f = \nabla F$ derives from a potential F. It is well known that depending on what ‘affinity function’ $f :\varDelta _{n} \rightarrow {\mathbb {R}}^{n}$ is chosen, a broad range of dynamics may occur, even for linear affinities $p \mapsto A p,\; A \in {\mathbb {R}}^{n \times n}$ (see e.g. [6]). Other choices give even rise to chaotic dynamics (see e.g. [12]). By comparison, the explicit form of Eq. (1.1) reads

$$\begin{aligned} \dot{W}_{i} = W_{i}\big (S_{i}(W)-{\mathbb {E}}_{W_{i}} [S_{i}(W)]\mathbb {1}_{n}\big ),\qquad i \in I, \end{aligned}$$

(1.22)

where $S_{i}(W)$ couples a possibly very large number $m = |I|$ of replicator equations of the form (1.22), as explained above in connection with (1.14). The mapping $S_{i}$ does not derive from a potential, however, but can be related to a potential after a proper reparametrization and under a symmetry assumption on the parameters (1.14) [23]. We refer to [26] for a more comprehensive discussion of the background and further work related to the assignment flow (1.1).

1.5 Organization

The assignment flow and its basic properties (limit points, convergence, stability) are established in Sect. 2. We briefly examine in Sect. 2.4 also properties of a simplified approximate version of the assignment flow, that can be linearly parametrized on the tangent space, which is convenient for data-driven estimation of suitable weight parameters [16]. In Sect. 3, we extend these results to the discrete-time assignment flow that is obtained by applying the simplest numerical scheme for geometric integration of the assignment flow, as worked out in [33]. Numerical examples demonstrate that violating the conditions established in Sect. 2 may lead to various behaviors of the assignment flow, all of which are unfavorable as regards data classification. Some lengthy proofs have been relegated to Appendix A. We conclude in Sect. 4.

1.6 Basic notation

We set $[n]=\{1,2,\dotsc ,n\}$ for any $n \in {\mathbb {N}}$ and denote by |S| the cardinality of any finite set S. Throughout this paper, m and n will denote the number of vertices of the underlying graph $G=(I,E)$ and the number of classes indexed by J, respectively,

$$\begin{aligned} m=|I|,\qquad n = |J|. \end{aligned}$$

(1.23)

The set $\mathcal {W} = \mathcal {S} \times \dots \times \mathcal {S}$ (1.15) is called assignment manifold, where $\mathcal {S} = {{\,\mathrm{rint}\,}}(\varDelta _n)$ is the relative interior of the probability simplex $\varDelta _n$. $\mathcal {S}$ and $\mathcal {W}$, respectively, are equipped with the Fisher-Rao metric (1.4) and hence are Riemannian manifolds. Points of $\mathcal {W}$ are row-stochastic matrices denoted by $W =(W_{1},\dotsc ,W_{m})^{\top } \in \mathcal {W}$ with row vectors (also called subvectors) $W_{i} \in \mathcal {S},\, i \in I$ and with components $W_{ij},\, j \in J$. The same notation is adopted for the image S(W) of the mapping $S :\mathcal {W} \rightarrow \mathcal {W}$ defined by (1.16). We denote the set of nonnegative reals by ${\mathbb {R}}_{\ge 0}$. Parameters (1.14) form a matrix $\varOmega \in {\mathbb {R}}_{\ge 0}^{m \times m}$. The subvectors of $\varOmega S$ are denoted by $(\varOmega S)_{i},\, i \in I$.

$\mathbb {1}_n=(1,1,\dotsc ,1)^{\top } \in {\mathbb {R}}^n$ denotes the vector with all entries equal to 1 and $e_{i}=(0,\dotsc ,0,1,0,\dotsc ,0)^{\top }$ is the ith unit vector. The dimension of $e_{i}$ will be clear from the context. $\mathbb {1}_{\mathcal {S}} = \frac{1}{n}\mathbb {1}_{n}$ denotes the barycenter of $\mathcal {S}$ (uniform categorical distribution). Similarly, $\mathbb {1}_{\mathcal {W}}$ with subvectors $(\mathbb {1}_{\mathcal {W}})_{i}=\mathbb {1}_{\mathcal {S}},\,i \in I$ denotes the barycenter of the assignment manifold $\mathcal {W}$. $I_{n}$ denotes the identity matrix of dimension $n \times n$.

The closure of $\mathcal {W}$ is denoted by

$$\begin{aligned} \overline{\mathcal {W}}=\varDelta _{n} \times \cdots \times \varDelta _{n} \end{aligned}$$

(1.24)

and the set of integral assignments (labelings) by

$$\begin{aligned} \overline{{\mathcal {W}}}^*= \overline{{\mathcal {W}}} \cap \{ 0, 1 \}^{m \times n}. \end{aligned}$$

(1.25)

Each subvector $W_{i}$ of a point $W \in \overline{{\mathcal {W}}}^*$ is a unit vector $W_{i}=e_{j}$ for some $j \in J$.

The support of a vector $v \in {\mathbb {R}}^n$ is denoted by ${{\,\mathrm{supp}\,}}(v) = \{ i \in [n] :v_i \ne 0 \}$. $\langle x, y\rangle $ denotes the Euclidean inner product of vectors x, y and $\langle A, B\rangle ={\mathrm{tr}}(A^{\top } B)$ the inner product of matrices A, B. The spectral (or operator) norm of a matrix A is denoted by $\Vert A\Vert _{2}$. For two matrices of the same size, $A \odot B$ denotes the Hadamard (entry-wise) matrix product. For $A \in {\mathbb {R}}^{m \times n}$, $B \in {\mathbb {R}}^{p \times q}$, the matrix $A \otimes B \in {\mathbb {R}}^{mp \times nq}$ denotes the Kronecker product of matrices with submatrices $A_{ij} B \in {\mathbb {R}}^{p \times q},\; i \in [m],\, j \in [n]$ (cf. e.g. [30]). $\mathcal {N}(A)$ and $\mathcal {R}(A)$ denote the nullspace and the range of the linear mapping represented by $A \in {\mathbb {R}}^{m \times n}$. For strictly positive vectors with full support, like $p \in \mathcal {S}$ with ${{\,\mathrm{supp}\,}}(p)=[n]$, the entry-wise division of a vector $v \in {\mathbb {R}}^{n}$ by p is denoted by $\frac{v}{p}$. Likewise, we set $p v = (p_{1} v_{1},\dotsc ,p_{n} v_{n})^{\top }$. The exponential function and the logarithm apply componentwise to vectors, i.e. $e^{v}=(e^{v_{1}},\dotsc ,e^{v_{n}})^{\top }$ and $\log p = (\log p_{1},\dotsc ,\log p_{n})^{\top }$. For large expression as arguments, we also write

$$\begin{aligned} e^{v} = \exp (v), \end{aligned}$$

(1.26)

which should not be confused with the exponential map (1.10) that is always written with subscript. ${{\,\mathrm{Diag}\,}}(p)$ denotes the diagonal matrix with the components of the vector p on its diagonal.

2 Properties of the assignment flow

2.1 Representation of the assignment flow

The following parametrization of the assignment flow will be convenient for our analysis.

Proposition 1

(S-parametrization [23, Proposition 3.6]) The assignment flow (1.20) is equivalent to the system

$$\begin{aligned} {\dot{S}}&= R_{S}(\varOmega S), \qquad S(0) = \exp _{\mathbb {1}_{{\mathcal {W}}}}(-\varOmega D), \end{aligned}$$

(2.1a)

$$\begin{aligned} {\dot{W}}&= R_{W} S, \qquad W(0) = \mathbb {1}_{{\mathcal {W}}}. \end{aligned}$$

(2.1b)

More precisely, $W(t),\,t \ge 0$ solves (1.20) if and only if it solves (2.1).

The difference between (1.20) and (2.1) is that the latter representation separates the dependencies on the data D and the assignments W: The given data D completely determines S(t) through the initial condition of (2.1a), and S(t) completely determines the assignments W(t) by (2.1b). In what follows, our focus will be on how the parameters $\varOmega $ affect S(t) and W(t).

Remark 1

(S-flow) We call S-flow system (2.1a) and its solution S(t) in the remainder of this paper and use the short-hand F for the vector field, i.e.

$$\begin{aligned} \dot{S} = F(S) = R_{S}(\varOmega S),\qquad S(0) = S_{0} \in \mathcal {W}. \end{aligned}$$

(2.2)

A direct consequence of the parametrization (2.1) is the following.

Proposition 2

Let $S(t),\; t \ge 0$ solve (2.1a). Then the solution to (2.1b) is given by

$$\begin{aligned} W(t) = \exp _{\mathbb {1}_{{\mathcal {W}}}} \left( \int _{0}^{t} S(\tau ) \,\mathrm {d}\tau \right) = \exp _{\mathbb {1}_{{\mathcal {W}}}} \left( \int _{0}^{t} \varPi _{0}S(\tau ) \,\mathrm {d}\tau \right) . \end{aligned}$$

(2.3)

Proof

Set $I_{S}(t)=\int _{0}^{t} S(\tau ){\mathrm{d}}\tau $. Then $W(t) = \exp _{\mathbb {1}_{\mathcal {W}}}\big (I_{S}(t)\big )$ and

$$\begin{aligned} \dot{W}(t) = d\exp _{\mathbb {1}_{\mathcal {W}}}\big (I_{S}(t)\big )\big [{\dot{I}}_{S}(t)\big ] \overset{(1.12)}{=} R_{\exp _{\mathbb {1}_{\mathcal {W}}}(I_{S}(t))}\big ({\dot{I}}_{S}(t)\big ) = R_{W(t)}\big (S(t)\big ). \end{aligned}$$

(2.4)

The second equation of (2.3) follows from the first equation of (1.9).$\square $

Transferring the assignment flow (1.20) to the tangent space $\mathcal {T}_{0}$ and linearizing the ODE leads to the linear assignment flow [33, Prop. 4.2]

$$\begin{aligned} {\dot{V}}&= R_{{\widehat{S}}}(\varOmega V) + B, \quad V(0) = 0, \quad V \in \mathcal {T}_0, \end{aligned}$$

(2.5)

with fixed ${\widehat{S}} \in \mathcal {W}$ and $B \in \mathcal {T}_{0}$.

We note that both the S-flow (2.2) and the linear assignment flow (2.5) are defined by similar vector fields on the tangent space $\mathcal {T}_{0}$. Ignoring the constant term B in (2.5) that can be represented by using a corresponding initial point (see Lemma 2), the difference concerns the parameters S and ${{\widehat{S}}}$ of the replicator matrix: In the linear assignment flow, this parameter ${{\widehat{S}}}$ is fixed, whereas in the S-flow, it changes with the flow. Notice that ‘linear’ refers to the linearity of the ODE (2.5) on the tangent space. The corresponding lifted flow (2.56) on the assignment manifold is still nonlinear (cf. [33, Def. 4.1]).

Convergence properties of the S-flow and the linear assignment flow are analyzed in the following sections.

2.2 Existence and uniqueness

We establish global existence and uniqueness of both the S-flow and the assignment flow and examine to what extent the former determines the latter.

Proposition 3

(global existence and uniqueness) The solutions W(t), S(t) to (2.1) are unique and globally exist for $t \ge 0$.

Proof

The hyperplanes $\{ S :\sum _{j} S_{ij} = 1 \}$ for $i \in I$ and $\{ S :S_{ij} = 0 \}$ for $i \in I$, $j \in J$ are invariant with respect to the flow (2.2). Hence, S(t) stays in ${\mathcal {W}} \subset \overline{{\mathcal {W}}}$ (cf. [15]) and therefore exists for all $t \in {\mathbb {R}}$ by [29, Corollary 2.16]. Equation (2.3) then implies the existence of W(t) for all $t \in {\mathbb {R}}$. The uniqueness of the solutions follow by the local Lipschitz continuity of the right-hand side of (2.2) and (1.20), respectively.$\square $

Remark 2

(a)
It is clear in view of the representation (2.1) that the domain $\mathcal {W}$ of the S-flow and consequently the domain of the assignment flow, too, can be extended to $\overline{\mathcal {W}}$, and we henceforth assume this to be the case. Furthermore, the domain of the S-flow can be extended to an open set U with $\overline{\mathcal {W}} \subset U \subseteq {\mathbb {R}}^{m \times n}$. In the latter case, although the existence for all $t \ge 0$ is no longer guaranteed, this simplifies the stability analysis of equilibria $S^* \in \overline{\mathcal {W}}$, as we will see in Sect. 2.3.
(b)
The assignment flow shares with replicator equations in general (cf. [15]) that it is invariant with respect to the boundary $\partial \overline{\mathcal {W}}$: Due to the multiplication with $R_{S}$ and $R_{W}$, respectively, both S(t) and W(t) cannot leave the corresponding facet of $\partial \overline{\mathcal {W}}$ whenever they reach it.

Next, we examine what convergence of S(t) close to $\partial \overline{\mathcal {W}}$ implies for W(t).

Proposition 4

Let

$$\begin{aligned} \mathcal {V}_{j} = \big \{p \in \varDelta _{n}:p_{j} > p_{l},\; \forall l \in [n]\setminus \{j\}\big \},\quad j \in [n] \end{aligned}$$

(2.6)

denote the Voronoi cells of the vertices of $\varDelta _{n}$ in $\varDelta _{n}$ and suppose $\lim _{t \rightarrow \infty } S_{i}(t) = S_{i}^{*} \in \varDelta _{n}$, for any $i \in I$. Then the following assertions hold.

(a)
If $S_{i}^{*} \in \mathcal {V}_{j^{*}(i)}$ for some label (index) $j^{*} = j^{*}(i) \in J$, then there exist constants $\alpha _{i}, \beta _{i} > 0$ such that
$$\begin{aligned} \Vert W_{i}(t)-e_{j^{*}(i)}\Vert _{1} \le \alpha _{i} e^{-\beta _{i} t},\quad \forall t \ge 0. \end{aligned}$$
(2.7a)
In particular,
$$\begin{aligned} \lim _{t \rightarrow \infty } W_{i}(t) = e_{j^{*}(i)}. \end{aligned}$$
(2.7b)
(b)
One has
$$\begin{aligned}&\int _{0}^{\infty }\Vert S_{i}(t)-S_{i}^{*}\Vert _{1}{\mathrm{d}}t <\infty \nonumber \\&\quad \implies \quad \lim _{t \rightarrow \infty } W_{i}(t) = W_{i}^{*} \quad \text {with}\quad {{\,\mathrm{supp}\,}}(W_{i}^{*})={{\,\mathrm{arg max}\,}}_{j \in J} S_{ij}^{*}. \end{aligned}$$
(2.8)

Proof

See Appendix A. $\square $

Proposition 4(a) states that if any subvector of the S-flow converges to a Voronoi cell (2.6), then the corresponding subvector of W(t) converges exponentially fast to the corresponding integral assignment.

Proposition 4(b) handles the case when the limit point $S_{i}^{*}$ lies at the border of adjacent Voronoi cells, that is the set ${{\,\mathrm{arg max}\,}}_{j \in J} S_{ij}^{*}$ is not a singleton. In this case, one can only state that $W_{i}(t)$ converges to some (possibly nonintegral) point $W_{i}^{*}$ without being able to predict precisely this limit based on $S_{i}^{*}$ alone. In contrast to (a), we also have to assume that the convergence of the S-flow is fast enough—see the hypothesis of (2.8). This assumption is reasonable, however, because it is satisfied whenever $S_{i}^{*}$ is subvector of a hyperbolic equilibrium point of the S-flow (cf. Remark 5 below).

Example 1

We briefly demonstrate what may happen when the assumption of (2.8) is violated. Suppose $S_{i}(t)$ and $S_{i}^{*}$ are given by

$$\begin{aligned} S_i(t) = \begin{pmatrix} \frac{1}{2} - \frac{1}{t+1} \\ \frac{1}{2}- \frac{2}{t+1} \\ \frac{3}{t+1} \end{pmatrix} \ \longrightarrow \ S_i^* = \begin{pmatrix} \frac{1}{2} \\ \frac{1}{2} \\ 0 \end{pmatrix} \quad \text {for} \quad t \rightarrow \infty . \end{aligned}$$

(2.9)

The first component of $S_i(t)$ converges faster than the second component. Since $\Vert S_{i}(t) - S_{i}^{*}\Vert _{1} = \frac{6}{t+1}$, the convergence rate assumption of (2.8) does not hold. Calculating $W_i(t)$ due to (2.3) gives

$$\begin{aligned}&W_i(t) = \frac{1}{1 + \frac{1}{t+1} + (t+1)^4 e^{-\frac{1}{2}t}} \begin{pmatrix} 1 \\ \frac{1}{t+1} \\ (t+1)^4 e^{-\frac{1}{2}t} \end{pmatrix} \nonumber \\&\quad \longrightarrow W_i^* = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \hbox { for}\ t \rightarrow \infty , \end{aligned}$$

(2.10)

i.e. $W_i(t)$ still converges, but we have ${{\,\mathrm{supp}\,}}(W_{i}^{*}) \subsetneq {{\,\mathrm{arg max}\,}}_{j \in J} S_{ij}^{*}$ unlike the statement of (2.8). This example also shows that, in the case of Proposition 4(b), the limit $W_{i}^{*}$ may depend on the trajectory $S_{i}(t)$, rather than only on the limit point $S_{i}^{*}$ as in case (a).

Proposition 4 makes explicit that the S-flow largely determines the asymptotic behavior of W(t). The next section, therefore, focuses on the S-flow (2.2) and on its dependency on the parameters $\varOmega $.

2.3 Convergence to equilibria and stability

In this section, we characterize equilibria, their stability, and convergence properties of the S-flow (2.2). Quantitative estimates of the basin of attraction to exponentially stable equilibria will be provided, too.

2.3.1 Characterization of equilibria and their stability

We show in this section under mild conditions that only integral equilibrium points $S^{*} \in \overline{\mathcal {W}}^{*}$ can be stable.

Proposition 5

(equilibria) Let $\varOmega \in {\mathbb {R}}^{n \times n}$ be an arbitrary matrix.

(a)
A point $S^* \in \overline{{\mathcal {W}}}$ is an equilibrium point of the S-flow (2.2) if and only if
$$\begin{aligned} (\varOmega S^*)_{ij} = \langle S_{i}^*, (\varOmega S^*)_{i} \rangle , \qquad \forall j \in {{\,\mathrm{supp}\,}}(S_i^*),\qquad \forall i \in I, \end{aligned}$$
(2.11)
i.e., the subvectors $(\varOmega S^*)_{i}$ are constant on ${{\,\mathrm{supp}\,}}S_i^*$, for each $i \in I$.
(b)
Every point $S^* \in \overline{{\mathcal {W}}}^*$ is an equilibrium point of the S-flow (2.1a).
(c)
Let $J_{+} \subseteq J$ be a non-empty subset of indices, and let $\mathbb {1}_{J_{+}} \in {\mathbb {R}}^{n}$ be the corresponding indicator vector with components $(\mathbb {1}_{J_{+}})_{j}=1$ if $j \in J_{+}$ and $(\mathbb {1}_{J_{+}})_{j}=0$ otherwise. Then $S^*= \tfrac{1}{|J_{+}|} \mathbb {1}_{m} \mathbb {1}_{J_{+}}^\top $ is an equilibrium point. In particular, the barycenter $\mathbb {1}_{{\mathcal {W}}}=\tfrac{1}{n} \mathbb {1}_{m} \mathbb {1}_{n}^\top $ corresponding to $J_{+}=J$ is an equilibrium point.

Proof

(a)
Each equation of the system (2.2) has the form
$$\begin{aligned} {\dot{S}}_{ij} = S_{ij} \big ( (\varOmega S)_{ij} - \langle S_i, (\varOmega S)_{i} \rangle \big ),\quad i \in I,\; j \in J. \end{aligned}$$
(2.12)
$\dot{S}_{ij}=0$ implies $S_{ij}=S_{ij}^{*} \ne 0$ if $j\in {{\,\mathrm{supp}\,}}(S_{i}^{*})$ and that the term in the round brackets is zero, which is (2.11).
(b)
The replicator matrix (1.7) satisfies $R_{e_{j}} \equiv 0,\; \forall j \in J$. This implies $R_{S^*} = 0$ and in turn $R_{S^*} (\varOmega S^*) = 0$.
(c)
Since $\varOmega S^* = \frac{1}{|J_+|} (\varOmega \mathbb {1}_{m}) \mathbb {1}_{J_+}^\top $, the subvectors $(\varOmega S^*)_i,\,i \in I$ are constant on $J_+ = {{\,\mathrm{supp}\,}}S_i^*$, which implies by (a) that $S^*$ is an equilibrium point.

$\square $

Remark 3

The set of equilibria characterized by Proposition 5 (b) and (c) may not exhaust the set of all equilibrium points for a general parameter matrix $\varOmega $. However, we will show below that, under certain mild conditions, any such additional equilibrium points must be unstable.

Next, we study the stability of equilibrium points.

Lemma 1

(Jacobian) Let F(S) denote the vector field defining the S-flow (2.2). Then, after stacking S row-wise, the Jacobian matrix of F is given by

$$\begin{aligned} \frac{\partial F}{\partial S} = \begin{pmatrix} B_1 &{} &{} \\ &{} \ddots &{} \\ &{} &{} B_{m} \end{pmatrix} + \begin{pmatrix} R_{S_{1}} &{} &{} \\ &{} \ddots &{} \\ &{} &{} R_{S_{m}} \end{pmatrix} \cdot \varOmega \otimes I_{n} \end{aligned}$$

(2.13)

with block matrices $B_i = {{\,\mathrm{Diag}\,}}\big ( (\varOmega S)_i \big ) - \langle S_i, (\varOmega S)_i \rangle I_{n} - S_i (\varOmega S)_i^\top $ and $R_{S_i}$ given by (1.7).

Proof

The subvectors of F have the form

$$\begin{aligned} F_i(S) = R_{S_i} (\varOmega S)_i = \big ( {{\,\mathrm{Diag}\,}}(S_i) - S_i S_i^\top \big ) (\varOmega S)_i,\qquad i \in I. \end{aligned}$$

(2.14)

Hence

$$\begin{aligned} \mathrm {d}F_i(S)[T]&= \tfrac{\mathrm {d}}{\mathrm {d}t} F_i(S+t T) |_{t=0} \end{aligned}$$

(2.15a)

$$\begin{aligned}&= \big ( {{\,\mathrm{Diag}\,}}(T_i) - T_i S_i^\top - S_i T_i^\top \big ) (\varOmega S)_i + R_{S_i} (\varOmega T)_i \end{aligned}$$

(2.15b)

$$\begin{aligned}&= \big ( {{\,\mathrm{Diag}\,}}\big ( (\varOmega S)_i \big ) - \langle S_i, (\varOmega S)_i \rangle I_{n} - S_i (\varOmega S)_i^\top \big ) T_i + R_{S_i} (\varOmega T)_i\end{aligned}$$

(2.15c)

$$\begin{aligned}&= B_{i} T_{i} + R_{S_i} (\varOmega T)_i. \end{aligned}$$

(2.15d)

We have $\mathrm {d}F(S)[T] = \tfrac{\partial F}{\partial S} {\text {vec}}(T)$ with ${\text {vec}}(T) \in {\mathbb {R}}^{m n}$ denoting the vector that results from stacking the row vectors (subvectors) of T. Comparing both sides of this equation, with the block matrices of the left-hand side given by (2.15), implies (2.13). $\square $

Proposition 6

(eigenvalues of the Jacobian) Let $S^* \in \overline{{\mathcal {W}}}$ be an equilibrium point of the S-flow (2.2), i.e. $F(S^{*}) = R_{S^{*}}(\varOmega S^{*}) = 0$. Then regarding the spectrum $\sigma \big ( \tfrac{\partial F}{\partial S}(S^*) \big )$, the following assertions hold.

(a)
A subset of the spectrum is given by
$$\begin{aligned} \sigma \big ( \tfrac{\partial F}{\partial S}(S^*) \big ) \supseteq \bigcup _{i \in I} \big \{ - \langle S_i^*, (\varOmega S^*)_i \rangle \big \} \cup \big \{ (\varOmega S^*)_{ij} - \langle S_i^*, (\varOmega S^*)_i \rangle \big \}_{j \in J \setminus {{\,\mathrm{supp}\,}}(S_i^*)}.\nonumber \\ \end{aligned}$$
(2.16)
This relation becomes an equation if $S^*$ is integral, i.e. $S^* \in \overline{{\mathcal {W}}}^*$. In the latter case, the eigenvectors are given by
$$\begin{aligned} e_i e_{j^*(i)}^\top \in {\mathbb {R}}^{m \times n}, \quad e_i (e_{j^*(i)} - e_j)^\top \in {\mathcal {T}}_0, \quad \forall j \in J \setminus \{ j^*(i) \}, \quad \forall i \in I. \end{aligned}$$
(2.17)
(b)
If $S^* = \tfrac{1}{|J_{+}|} \mathbb {1}_{m} \mathbb {1}_{J_{+}}^\top $ with $J_{+}\subseteq J$ and $|J_{+}| \ge 2$, then
$$\begin{aligned} \sigma \big ( \tfrac{\partial F}{\partial S}(S^*) \big ) = \bigcup _{i \in I} \Big \{ -\tfrac{(\varOmega \mathbb {1}_{m})_{i}}{|J_{+}|} \Big \} \cup \bigcup _{\lambda \in \sigma (\varOmega )} \big \{ \tfrac{\lambda }{|J_{+}|} \big \}. \end{aligned}$$
(2.18)
(c)
Assume the parameter matrix $\varOmega $ with elements $\omega _{ii},\, i \in I$ on the main diagonal, is nonnegative. If $S_{i}^* \not \in \{0,1\}^{n}$ and $\omega _{ii} > 0$ hold for some $i \in I$, then the Jacobian matrix has at least one eigenvalue with positive real part. The real and imaginary part of the corresponding eigenvector lie in
$$\begin{aligned} {\mathcal {T}}_{+} = \big \{ V \in {\mathcal {T}}_0 :{{\,\mathrm{supp}\,}}(V) \subseteq {{\,\mathrm{supp}\,}}(S^*) \big \}. \end{aligned}$$
(2.19)

Proof

See Appendix A. $\square $

Next, we apply Proposition 6 and the stability criteria stated in Appendix B in order to classify the equilibria of the S-flow.

Corollary 1

(stability of equilibria) Let $\varOmega $ be a nonnegative matrix with positive diagonal entries. Then, regarding the equilibria $S^* \in \overline{{\mathcal {W}}}$ of the S-flow (2.2), the following assertions hold.

(a)
$S^* \in \overline{\mathcal {W}}^{*}$ is exponentially stable if, for all $i \in I$,
$$\begin{aligned}&(\varOmega S^*)_{i j} < (\varOmega S^*)_{i j^*(i)} \quad \text {for all } j \in J \setminus \{ j^*(i) \} \nonumber \\&\quad \text {with} \quad \{ j^*(i) \} = {{\,\mathrm{arg max}\,}}_{j \in J}~S_{ij}^*. \end{aligned}$$
(2.20)
(b)
$S^* \in \overline{\mathcal {W}}^{*}$ is unstable if, for some $i \in I$,
$$\begin{aligned}&(\varOmega S^*)_{i j} > (\varOmega S^*)_{i j^*(i)} \quad \text {for some } j \in J \setminus \{ j^*(i) \} \nonumber \\&\quad \text {with} \quad \{ j^*(i) \} = {{\,\mathrm{arg max}\,}}_{j \in J}~S_{ij}^*. \end{aligned}$$
(2.21)
(c)
All equilibrium points $S^* \not \in \overline{\mathcal {W}}^{*}$ are unstable.

Proof

(a)
We apply Theorem 3(a) that provides a condition for stability of the S-flow, regarded as flow on an open subset of ${\mathbb {R}}^{m\times n}$. Since the stability also holds on subsets, this shows stability of the S-flow on $\overline{\mathcal {W}}$.

By Proposition 6(a), the spectrum of $\tfrac{\partial F}{\partial S}(S^*)$, for $S^{*} \in \overline{\mathcal {W}}^{*}$, is given by the right-hand side of (2.16) and, since $\varOmega $ is nonnegative, is clearly negative if condition (2.20) holds.
(b)
We take eigenvectors into account and invoke Proposition 16(b). The eigenvectors are given by (2.17), and if the eigenvalue $\lambda = (\varOmega S^*)_{ij} - (\varOmega S^*)_{i j^*(i)}$ is positive, then the corresponding eigenvector $V = e_i (e_{j^*(i)} - e_j)^\top \in {\mathcal {T}}_0$ is tangent to $\overline{{\mathcal {W}}}$. By Proposition 16(b), there exists an open truncated cone ${\mathcal {C}} \subset {\mathbb {R}}^{m\times n}$ such that $\delta \cdot V \in {\mathcal {C}}$, for sufficiently small $\delta > 0$, and the S-flow (2.1a) is repelled from $S^*$ within $S^* + {\mathcal {C}}$. Since $V \in {\mathcal {T}}_0$, the (relatively) open subset $(S^* + {\mathcal {C}}) \cap \overline{{\mathcal {W}}} \subset \overline{{\mathcal {W}}}$ is non-empty. This shows the instability of $S^*$.
(c)
By the assumption on $\varOmega $, there is an eigenvalue with positive real part due to Proposition 6(c), and the real and imaginary part of the corresponding eigenvector lie in ${\mathcal {T}}_{+} \subseteq {\mathcal {T}}_0$. So the argument of (b) applies here as well using the real part of the eigenvector.

$\square $

Remark 4

(selection of stable equilibria) For $S^{*}$ to be exponentially stable, Corollary 1(a) requires that every averaged subvector $(\varOmega S^{*})_{i}$ has the same component as maximal component, as does the corresponding subvector $S_{i}^{*}$. This means that the $\varOmega $-weighted average of the vectors $S_{j}^{*}$ within the neighborhood $j \in \mathcal {N}_{i}$ lies in the Voronoi-cell $\mathcal {V}_{j^{*}(i)}$ (2.6) corresponding to $S_{i}^{*}$.

Thus, Corollary 1 provides a mathematical and intuitively plausible definition of ‘spatially coherent’ segmentations of given data, that can be determined by means of the assignment flow. This also demonstrates how the label (index) selection mechanism of the replicator equations (1.22), whose spatial coupling defines the assignment flow (1.20), works from the point of view of evolutionary dynamics [22] when using the similarity vectors $S_{i}(W)$ (1.16) as ‘affinity measures’.

2.3.2 Convergence of the S-flow to equilibria

We make the basic assumption that the parameter matrix $\varOmega $ has the form

$$\begin{aligned} \varOmega = {{\,\mathrm{Diag}\,}}(w)^{-1} {\widehat{\varOmega }} \quad \text {with} \quad w \in {\mathbb {R}}_{>0}^{m} \qquad \text {and} \qquad {\widehat{\varOmega }}^\top = {\widehat{\varOmega }} \in {\mathbb {R}}^{m \times m}. \end{aligned}$$

(2.22)

Matrices of the form (2.22) include as special cases parameters satisfying

$$\begin{aligned} \varOmega&= \varOmega ^{\top },\quad (\text {symmetric weights}) \end{aligned}$$

(2.23a)

$$\begin{aligned} w&= {{\widehat{\varOmega }}} \mathbb {1}_{m}. \quad (\text {normalized weights}) \end{aligned}$$

(2.23b)

An instance of $\varOmega $ satisfying (2.23b) are nonnegative uniform weights with symmetric neighborhoods, i.e.

$$\begin{aligned} \omega _{ik} = \tfrac{1}{| {\mathcal {N}}_i |},\quad \forall k \in \mathcal {N}_{i} \quad \text {and}\quad k \in {\mathcal {N}}_i \quad \Leftrightarrow \quad i \in {\mathcal {N}}_k. \end{aligned}$$

(2.24)

Note that in the following basic convergence theorem, neither $\varOmega $ nor ${\widehat{\varOmega }}$ is assumed to be row-stochastic or nonnegative.

Theorem 1

(convergence to equilibria) Let $\varOmega $ be of the form (2.22). Then the S-flow (2.2) converges to an equilibrium point $S^{*} = S^{*}(S_{0}) \in \overline{{\mathcal {W}}}$, for any initial value $S_0 \in {\mathcal {W}}$.

Proof

See Appendix A. $\square $

Proposition 7

Let $\varOmega $ be nonnegative with positive diagonal entries, and let ${S^* \in \overline{{\mathcal {W}}}}$ be an equilibrium point of the S-flow (2.2) which satisfies one of the instability criteria of Corollary 1 (b) or (c). Then the set of starting points $S_0 \in {\mathcal {W}}$ for which the S-flow converges to $S^*$ has measure zero in ${\mathcal {W}}$.

Proof

By [18], there exists a center-stable manifold ${\mathcal {M}}_{\text {cs}}(S^*)$ which is invariant under the S-flow and tangent to $E_{\text {c}} \oplus E_{\text {s}}$ at $S^*$. Here, $E_{\text {c}}$ and $E_{\text {s}}$ denote the center and stable subspace of $\tfrac{\partial F}{\partial S}(S^*)$, respectively. Any trajectory of the S-flow converging to $S^*$ lies in ${\mathcal {M}}_{\text {cs}}(S^*)$. Therefore, it suffices to show that the dimension of the manifold ${\mathcal {M}}_{\text {cs}}(S^*) \cap {\mathcal {W}}$ is smaller than the dimension of ${\mathcal {W}}$. Note that ${\mathcal {M}}_{\text {cs}}(S^*) \cap {\mathcal {W}}$ is a manifold since both ${\mathcal {M}}_{\text {cs}}(S^*)$ and ${\mathcal {W}}$ are invariant under the S-flow. We have

$$\begin{aligned} \begin{aligned} \dim \big ( {\mathcal {M}}_{\text {cs}}(S^*) \cap {\mathcal {W}} \big )&= \dim \big ( (E_{\text {c}} \oplus E_{\text {s}}) \cap {\mathcal {T}}_0 \big ) = \dim ({\mathcal {T}}_0) - \dim ( E_{\text {u}} \cap {\mathcal {T}}_0 ) \\&= \dim ({\mathcal {W}}) - \dim ( E_{\text {u}} \cap {\mathcal {T}}_0 ), \end{aligned} \end{aligned}$$

(2.25)

where $E_{\text {u}}$ denotes the unstable subspace of $\tfrac{\partial F}{\partial S}(S^*)$. Since $\tfrac{\partial F}{\partial S}(S^*)$ has an eigenvalue with positive real part and a corresponding eigenvector lying in ${\mathcal {T}}_0$ (cf. proof of Corollary 1), we have $\dim ( {E_{\text {u}} \cap {\mathcal {T}}_0} ) \ge 1$ and therefore $\dim \!\big ( {\mathcal {M}}_{\text {cs}}(S^*) \cap {\mathcal {W}} \big ) \le \dim ({\mathcal {W}}) - 1$. $\square $

Remark 5

(consequences for the assignment flow) If $S^* \in \overline{\mathcal {W}}$ is a hyperbolic equilibrium point, then the S-flow locally behaves as its linearization near $S^*$ by the Hartman-Grobman theorem [21, Section 2.8]. Since a linear flow can only converge with an exponential convergence rate, this is also the case for the S-flow (2.2).^{Footnote 1} More precisely, if the S-flow converges to a hyperbolic equilibrium $S^* \in \overline{{\mathcal {W}}}$ then there exist $\alpha , \beta > 0$ such that $\Vert S(t) - S^* \Vert \le \alpha e^{-\beta t}$ irrespective of whether $S^*$ is stable or not. A direct consequence is $\int _0^\infty \Vert S_i(t) - S_i^* \Vert _{1} \mathrm {d}t < \infty $ for all $i \in I$, i.e., assumption of Proposition 4(b) automatically holds if $S^*$ is hyperbolic.

Theorem 2

Let $\varOmega $ be a nonnegative matrix with positive diagonal entries. Then the set of starting points $S_0 \in {\mathcal {W}}$ for which the S-flow (2.2) converges to a nonintegral equilibrium $S^* \in \overline{{\mathcal {W}}}$, has measure zero in ${\mathcal {W}}$.

Proof

Let ${\mathcal {E}} = \{ S^* \in \overline{\mathcal {W}} :F(S^*) = 0 \}$ denote the set of all equilibria of the S-flow in $\overline{{\mathcal {W}}}$, which is a compact subset of $\overline{{\mathcal {W}}}$. If ${\mathcal {E}}$ contains only isolated points, i.e., ${\mathcal {E}}$ is finite, then the statement follows from Proposition 7. In order to take also into account nonfinite sets $\mathcal {E}$ of equilibria, we apply the more general [10, Theorem 9.1]. Some additional notation is introduced first.

For any index set $\mathcal {J} \subseteq I \times J$, set

$$\begin{aligned} {\mathcal {E}}_{{\mathcal {J}}} = \big \{ S^* \in {\mathcal {E}} :{{\,\mathrm{supp}\,}}(S^*) = {\mathcal {J}} \big \} \subset {\mathcal {E}}. \end{aligned}$$

(2.26)

The set ${\mathcal {E}}_{{\mathcal {J}}}$ is the relative interior of a convex polytope and therefore a manifold of equilibria. This follows from the observation that the equilibrium criterion (2.11) is a set of linear equality constraints for $S^* \in \overline{\mathcal {W}}$, given by

$$\begin{aligned} \left. \begin{array}{ll} (\varOmega S^*)_{ij} - (\varOmega S^*)_{il} =0 &{}\quad \forall j,l \in {{\,\mathrm{supp}\,}}(S_i^*) \\ S_{ij}^* = 0&{}\quad \forall j \in J \setminus {{\,\mathrm{supp}\,}}(S_i^*) \end{array} \quad \right\} , \qquad \forall i \in I. \end{aligned}$$

(2.27)

Further, define for $n_{\text {s}},n_{\text {c}},n_{\text {u}} \in {\mathbb {N}}\cup \{0\}$ with $n_{\text {s}} + n_{\text {c}} + n_{\text {u}} = m n$ the set

$$\begin{aligned} {\mathcal {E}}_{(n_{\text {s}},n_{\text {c}},n_{\text {u}})} = \big \{ S^* \in {\mathcal {E}} :\dim E_{\text {s}}(S^*) = n_{\text {s}},\; \dim E_{\text {c}}(S^*) = n_{\text {c}},\; \dim E_{\text {u}}(S^*) = n_{\text {u}} \big \},\nonumber \\ \end{aligned}$$

(2.28)

where $E_{\text {c}}(S^*)$, $E_{\text {s}}(S^*)$ and $E_{\text {u}}(S^*)$ denote the center, stable and unstable subspace of $\tfrac{\partial F}{\partial S}(S^*)$. This set can be written as countable union of compact sets. This can be seen as follows. The map

$$\begin{aligned} {\mathcal {E}} \rightarrow \big \{ x \in {\mathbb {R}}^{m n} :x_1 \le x_2 \le \dots \le x_{m n} \big \}, \qquad S^* \mapsto \mathfrak {R}\Big ( \lambda \big ( \tfrac{\partial F}{\partial S}(S^*) \big ) \Big ), \end{aligned}$$

(2.29)

where $\lambda (\cdot )$ denotes the vector of eigenvalues, is a continuous map on a compact set and therefore proper, i.e., preimages of compact sets under the map (2.29) are compact. It is clear that the set $U_{\text {s}} \times U_{\text {c}} \times U_{\text {u}}$ with

$$\begin{aligned} U_{\text {s}}&= \big \{ x \in {\mathbb {R}}^{n_{\text {s}}} :x_1 \le \dots \le x_{n_s} < 0 \big \}, \end{aligned}$$

(2.30a)

$$\begin{aligned} U_{\text {c}}&= \big \{ x \in {\mathbb {R}}^{n_{\text {c}}} :x = 0 \big \}, \end{aligned}$$

(2.30b)

$$\begin{aligned} U_{\text {u}}&= \big \{ x \in {\mathbb {R}}^{n_{\text {u}}} :0 < x_1 \le \dots \le x_{n_u} \big \} \end{aligned}$$

(2.30c)

can be written as countable union of compact sets. The preimage of this set under the map (2.29) is ${\mathcal {E}}_{(n_{\text {s}},n_{\text {c}},n_{\text {u}})}$.

To complete the proof, we now argue similar to the proof of Proposition 7: the existence of nontrivial unstable subspaces for nonintegral equilibria implies that the center-stable manifold has a smaller dimension.

Let ${\mathcal {J}}$ be the support of any nonintegral equilibrium and let ${\mathcal {E}}_{(n_{\text {s}},n_{\text {c}},n_{\text {u}})}$ be such that ${\mathcal {E}}_{{\mathcal {J}}} \cap {\mathcal {E}}_{(n_{\text {s}},n_{\text {c}},n_{\text {u}})} \ne \emptyset $. As seen in the proof of Corollary 1(c), we have $E_{\text {u}}(S^*) \cap \mathcal {T}_0 \ne \{0\}$ for any $S^* \in {\mathcal {E}}_{{\mathcal {J}}}$, i.e. $n_{\text {u}} \ge 1$. Since both ${\mathcal {E}}_{{\mathcal {J}}}$ and ${\mathcal {E}}_{(n_{\text {s}},n_{\text {c}},n_{\text {u}})}$ can be written as countable union of compact sets, this is also the case for their intersection, i.e., we have

$$\begin{aligned} {\mathcal {E}}_{{\mathcal {J}}} \cap {\mathcal {E}}_{(n_{\text {s}},n_{\text {c}},n_{\text {u}})} = \bigcup _{l \in {\mathbb {N}}} K_l \end{aligned}$$

(2.31)

with $K_l \subseteq {\mathcal {E}}_{{\mathcal {J}}}$ compact. For any $l \in {\mathbb {N}}$, there exists a center-stable manifold ${\mathcal {M}}_{\text {cs}}(K_l)$ containing $K_l$, which is invariant under the S-flow and tangent to $E_{\text {c}}(S^*) \oplus E_{\text {s}}(S^*)$ at any $S^* \in K_l$ [10, Theorem 9.1]. Any trajectory of the S-flow converging to a point $S^* \in K_l$ lies in ${\mathcal {M}}_{\text {cs}}(K_l)$. Hence, analogous to the proof of Proposition 7, we have

$$\begin{aligned} \dim \big ( {\mathcal {M}}_{\text {cs}}(K_l) \cap \mathcal {W} \big ) = \dim (\mathcal {W}) - \dim (E_{\text {u}}(S^*) \cap \mathcal {T}_0) \le \dim (\mathcal {W}) - 1, \end{aligned}$$

(2.32)

with any $S^* \in K_l$, i.e., ${\mathcal {M}}_{\text {cs}}(K_l) \cap \mathcal {W}$ has measure zero in $\mathcal {W}$. The countable union $\bigcup _{l \in {\mathbb {N}}} {\mathcal {M}}_{\text {cs}}(K_l) \cap \mathcal {W}$, which contains all trajectories converging to an equilibrium $S^* \in {\mathcal {E}}_{{\mathcal {J}}} \cap {\mathcal {E}}_{(n_{\text {s}},n_{\text {c}},n_{\text {u}})}$, has measure zero as well. Since there are only finitely many such sets ${\mathcal {E}}_{{\mathcal {J}}} \cap {\mathcal {E}}_{(n_{\text {s}},n_{\text {c}},n_{\text {u}})}$, this completes the proof. $\square $

In view of Theorem 2, the following Corollary that additionally takes into account assumption (2.22), is obvious.

Corollary 2

(Convergence to integral assignments) Let $\varOmega $ be a nonnegative matrix with positive diagonal entries which also fulfills the symmetry assumption (2.22). Then the set of starting points $S_0 \in {\mathcal {W}}$, for which the S-flow (2.2) does not converge to an integral assignment $S^* \in \overline{{\mathcal {W}}}^*$, has measure zero. If $\varOmega $ is additionally invertible, then the set of distance matrices $D \in {\mathbb {R}}^{m \times n}$ for which the S-flow does not converge to an integral assignment has measure zero as well.

2.3.3 Basins of attraction

Corollary 1 says that, if a point $S^* \in \overline{{\mathcal {W}}}^*$ satisfies the stability criterion (2.20), then there exists an open neighborhood of $S^*$ such that the S-flow emanating from this neighborhood will converge to $S^*$ with an exponential convergence rate. The subsequent proposition quantifies this statement by describing the convergence in balls around the equilibria which are contained in the corresponding basin of attraction.

Proposition 8

Let $\varOmega $ be a nonnegative matrix with positive diagonal entries, and let $S^* \in \overline{{\mathcal {W}}}^*$ satisfy (2.20). Furthermore, set

$$\begin{aligned}&A(S^*) :=\bigcap _{i \in I} \bigcap _{j \ne j^*(i)} \big \{ S \in {\mathbb {R}}^{m \times n} :(\varOmega S)_{ij} < (\varOmega S)_{i j^*(i)} \big \} \nonumber \\&\quad \text {with} \quad \{ j^*(i) \} = {{\,\mathrm{arg max}\,}}_{j \in J}~S_{ij}^*, \end{aligned}$$

(2.33)

which is an open convex polytope containing $S^*$. Finally, let $\varepsilon > 0$ be small enough such that

$$\begin{aligned} B_{\varepsilon }(S^*) :=\big \{ S \in \overline{{\mathcal {W}}} :\max _{i \in I} \Vert S_{i} - S_{i}^* \Vert _{1} < \varepsilon \big \} \subset \big (A(S^*) \cap \overline{{\mathcal {W}}}\big ). \end{aligned}$$

(2.34)

Then, regarding the S-flow (2.2), the following holds: If $S(t_0) \in B_{\varepsilon }(S^*)$ for some point in time $t_0$, then $S(t) \in B_{\varepsilon }(S^*)$ for all $t \ge t_0$ and $\lim _{t \rightarrow \infty } S(t) = S^*$. Moreover, we have

$$\begin{aligned} \Vert S_i(t) - S_i^* \Vert _{1} \le \Vert S_i(t_0) - S_i^* \Vert _{1} \cdot e^{-\beta _i (t - t_0)}, \quad \forall i \in I, \end{aligned}$$

(2.35a)

where

$$\begin{aligned} \beta _i = \min _{S \in \overline{B_{\delta }(S^*)} \cap \overline{{\mathcal {W}}}}~S_{i j^*(i)} \cdot \min _{j \ne j^*(i)}~\big ( (\varOmega S)_{i j^*(i)} - (\varOmega S)_{ij} \big ) > 0 \end{aligned}$$

(2.35b)

and $\delta > 0$ is chosen small enough such that $S(t_{0}) \in \overline{B_{\delta }(S^{*})} \subset B_{\varepsilon }(S^{*})$.

Proof

For each $i \in I$, we have with $S_{i}^{*}=e_{j^{*}(i)}$

$$\begin{aligned}&\frac{\mathrm {d}}{\mathrm {d}t} \Vert S_{i} - S_{i}^* \Vert _{1} \nonumber \\&= \frac{\mathrm {d}}{\mathrm {d}t} \Big ( 1 - S_{i j^*(i)} + \sum _{j \ne j^*(i)} S_{i j} \Big ) \qquad \left( \text {using}\; \sum _{j \in [n]} S_{ij}=1\right) \end{aligned}$$

(2.36a)

$$\begin{aligned}&= \frac{\mathrm {d}}{\mathrm {d}t}(2 - 2 S_{i j^*(i)}) \end{aligned}$$

(2.36b)

$$\begin{aligned}&{\mathop {=}\limits ^{\tiny (2.2)}} - 2 S_{i j^*(i)} \big ( (\varOmega S)_{i j^*(i)} - \langle S_i, (\varOmega S)_i \rangle \big ) \end{aligned}$$

(2.36c)

$$\begin{aligned}&\le -2 S_{i j^*(i)} \Big ( (\varOmega S)_{i j^*(i)} - S_{i j^*(i)} (\varOmega S)_{i j^*(i)} - \max _{j \ne j^*(i)}~(\varOmega S)_{ij} \sum _{j \ne j^*(i)} S_{ij} \Big ) \end{aligned}$$

(2.36d)

$$\begin{aligned}&= -2 S_{i j^*(i)} (1 - S_{i j^*(i)}) \Big ( (\varOmega S)_{i j^*(i)} - \max _{j \ne j^*(i)}~(\varOmega S)_{ij} \Big ) \end{aligned}$$

(2.36e)

$$\begin{aligned}&{\mathop {=}\limits ^{\tiny (2.36b)}} - S_{i j^*(i)} \Vert S_{i} - S_{i}^* \Vert _{1} \cdot \min _{j \ne j^*(i)}~\big ( (\varOmega S)_{i j^*(i)} - (\varOmega S)_{ij} \big ). \end{aligned}$$

(2.36f)

Choosing $\delta > 0$ such that $S(t_0) \in \overline{B_{\delta }(S^*)} \subset B_{\varepsilon }(S^*)$, it follows that $\beta _{i}$ given by (2.35b) is positive. Consequently

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \Vert S_{i} - S_{i}^* \Vert _{1} \le -\beta _i \Vert S_{i} - S_{i}^* \Vert _{1} \end{aligned}$$

(2.37)

and by Gronwall’s Lemma (2.35a) holds. Hence, $\max _{i \in I} \Vert S_{i} - S_{i}^* \Vert _{1}$ monotonically decreases as long as $S(t) \in \overline{B_{\delta }(S^*)}$. This guarantees that S(t) stays in $\overline{B_{\delta }(S^*)} \subset B_{\varepsilon }(S^*)$ and converges toward $S^*$. $\square $

Note that if S(t) is close to $S^*$, then the convergence rate (2.35) of S(t) is approximately governed by

$$\begin{aligned} \beta _i \approx \min _{j \ne j^*(i)}~\big ( (\varOmega S^*)_{i j^*(i)} - (\varOmega S^*)_{ij} \big ). \end{aligned}$$

(2.38)

Proposition 8 provides a criterion for terminating the numerical integration of the S-flow and subsequent ‘safe’ rounding to an integral solution. For this purpose, the following proposition provides an estimate of $\varepsilon $ defining (2.34).

Proposition 9

Let $S^* \in \overline{{\mathcal {W}}}^*$ satisfy (2.20). A value $\varepsilon > 0$ that is sufficient small for the inclusion (2.34) to hold, is given by

$$\begin{aligned} \varepsilon _{\mathrm {est}} = \min _{i \in I}~\min _{j \ne j^*(i)}~2 \cdot \frac{(\varOmega S^*)_{i j^*(i)} - (\varOmega S^*)_{ij}}{(\varOmega \mathbb {1}_{m} )_{i} + (\varOmega S^*)_{i j^*(i)} - (\varOmega S^*)_{ij}} > 0. \end{aligned}$$

(2.39)

Proof

Let $S \in \overline{{\mathcal {W}}}$ be a point such that

$$\begin{aligned} \max _{i \in I} \Vert S_i - S_i^* \Vert _1 < \varepsilon = \varepsilon _{\mathrm {est}}. \end{aligned}$$

(2.40)

We have to show that $S \in A(S^{*})$, with $A(S^{*})$ given by (2.33).

Since $\Vert S_i - S_i^* \Vert _1 = 2 - 2 S_{i j^*(i)}$, we have

$$\begin{aligned} S_{i j^*(i)}&> 1 - \frac{\varepsilon }{2}, \qquad \qquad S_{i j} \le \sum _{l \ne j^*(i)} S_{il} = 1 - S_{i j^*(i)} < \frac{\varepsilon }{2}, \quad \forall j \ne j^*(i).\nonumber \\ \end{aligned}$$

(2.41a)

Hence, for any $i \in I$ and any $j \ne j^*(i)$, we get with $j^{*}(k),\,k \in I$ similarly defined as $j^{*}(i)$ in (2.20),

$$\begin{aligned} (\varOmega S)_{i j^*(i)} - (\varOmega S)_{ij}&\overset{(1.14)}{=} \sum _{k \in \mathcal {N}_{i}} \omega _{i k} S_{k j^*(i)} - \sum _{k \in \mathcal {N}_{i}} \omega _{i k} S_{kj} \end{aligned}$$

(2.42a)

$$\begin{aligned}&= \sum _{\begin{array}{c} k \in \mathcal {N}_{i} \\ {j^*(k) = j^*(i)} \end{array}} \omega _{i k} \overbrace{ S_{k j^*(i)} }^{> 1 - \tfrac{\varepsilon }{2}} + \sum _{\begin{array}{c} k \in \mathcal {N}_{i} \\ {j^*(k) \ne j^*(i)} \end{array}} \omega _{i k} \overbrace{ S_{k j^*(i)} }^{\ge 0} - \sum _{\begin{array}{c} k \in \mathcal {N}_{i} \\ {j^*(k) = j} \end{array}} \omega _{i k} \overbrace{ S_{kj} }^{\le 1}\nonumber \\&\quad - \sum _{\begin{array}{c} k \in \mathcal {N}_{i} \\ {j^*(k) \ne j} \end{array}} \omega _{i k} \overbrace{ S_{kj} }^{< \tfrac{\varepsilon }{2}}, \end{aligned}$$

(2.42b)

and by dropping the second nonnegative summand,

$$\begin{aligned}&> \Big (1 - \frac{\varepsilon }{2} \Big ) \sum _{\begin{array}{c} k \in \mathcal {N}_{i} \\ {j^*(k) = j^*(i)} \end{array}} \omega _{i k} - \sum _{\begin{array}{c} k \in \mathcal {N}_{i} \\ {j^*(k) = j} \end{array}} \omega _{i k} - \frac{\varepsilon }{2} \sum _{\begin{array}{c} k \in \mathcal {N}_{i} \\ {j^*(k) \ne j} \end{array}} \omega _{i k} \end{aligned}$$

(2.42c)

and using the subvectors of $S^{*}$ are unit vectors,

$$\begin{aligned}&= \Big (1 - \frac{\varepsilon }{2} \Big ) (\varOmega S^*)_{i j^*(i)} - (\varOmega S^*)_{ij} - \frac{\varepsilon }{2} \big ( ( \varOmega \mathbb {1}_{m} )_{i} - (\varOmega S^*)_{ij} \big ) \end{aligned}$$

(2.42d)

$$\begin{aligned}&= (\varOmega S^*)_{i j^*(i)} - (\varOmega S^*)_{ij} - \frac{\varepsilon }{2} \Big ( ( \varOmega \mathbb {1}_{m} )_{i} + (\varOmega S^*)_{i j^*(i)} - (\varOmega S^*)_{ij} \Big ) \end{aligned}$$

(2.42e)

$$\begin{aligned}&\overset{(2.39)}{\ge }\; 0. \end{aligned}$$

(2.42f)

This verifies $S \in A(S^*)$. $\square $

Figure 3 illustrates the sets $A(S^*)$ and $B_{\varepsilon }(S^*)$ defined by (2.33) and (2.34), for some examples in the simple case of two data points and two labels. The beige and green regions in the left panel illustrate that the condition $S(t_0) \in A(S^*)$ neither guarantees that the S-flow converges to $S^*$ nor to stay in $A(S^*)$. This demonstrates the need for the sets $B_{\varepsilon }(S^*)$, shown as shaded squares in Fig. 3. Note that $B_{\varepsilon }(S^*) \ne \emptyset $ only if $S^* \in A(S^*) \ne \emptyset $, i.e. if the stability condition (2.20) is fulfilled.

If uniform weights $\varOmega $ are used for averaging, then the estimate (2.39) can be cast into a simple form that no longer depends on $S^*$.

Corollary 3

Let $\varOmega $ defined by (1.14) be given by uniform weights $\omega _{ik} = \tfrac{1}{|{\mathcal {N}}_i |}$, ${k \in \mathcal {N}_{i}}$, ${i \in I}$. Then the value $\varepsilon > 0$ that achieves the inclusion (2.34) can be chosen as

$$\begin{aligned} \varepsilon _{\mathrm {unif}} = \frac{2}{1 + \max _{i \in I} |{\mathcal {N}}_i|} > 0. \end{aligned}$$

(2.43)

Proof

Let $j^{*}(i)$ be defined as in (2.20). We have

$$\begin{aligned} (\varOmega S^*)_{i j^*(i)} - (\varOmega S^*)_{ij}&= \frac{| \{ k \in {\mathcal {N}}_i :j^*(k) = j^*(i) \} | - | \{ k \in {\mathcal {N}}_i :j^*(k) = j \} |}{| {\mathcal {N}}_i|} \end{aligned}$$

(2.44a)

$$\begin{aligned}&\ge \frac{1}{| {\mathcal {N}}_i |} > 0 \end{aligned}$$

(2.44b)

by assumption and integrality of the numerator in (2.44a). Monotonicity of the function $x \mapsto \frac{x}{1+x}$ implies

$$\begin{aligned} 2 \cdot \frac{(\varOmega S^*)_{i j^*(i)} - (\varOmega S^*)_{ij}}{1 + (\varOmega S^*)_{i j^*(i)} - (\varOmega S^*)_{ij}} \ge 2 \cdot \frac{\frac{1}{| {\mathcal {N}}_i |}}{1 + \frac{1}{| {\mathcal {N}}_i |}} = \frac{2}{1 + | {\mathcal {N}}_i |} \end{aligned}$$

(2.45)

and hence $\varepsilon _{\mathrm {unif}} \le \varepsilon _{\mathrm {est}}$, with $\varepsilon _{\mathrm {est}}$ given by (2.39). The assertion, therefore, follows from Proposition 9. $\square $

2.4 Convergence properties of the linear assignment flow

This section analyzes the convergence of the linear assignment flow to equilibria and limit points. To apply the standard theory, we rewrite the matrix-valued ($V \in {\mathbb {R}}^{m \times n}$) equation of the linear assignment flow (2.5) into a vector-valued ($V \in {\mathbb {R}}^{m n}$) one, using again V, for simplicity.

Equation (2.5) then takes the form

$$\begin{aligned} {\dot{V}}&= A V + b, \quad V(0) = 0, \end{aligned}$$

(2.46a)

where

$$\begin{aligned} A&= R_{{\widehat{S}}}(\varOmega \otimes I_n). \end{aligned}$$

(2.46b)

Note that matrix A is exactly the second summand in the Jacobian (2.13) of the S-flow. The first summand of (2.13) is due to the dependence of the replicator matrix on the flow. The linear assignment flow (2.5) ignores this dependency by assuming ${{\widehat{S}}} \in \mathcal {W}$ to be fixed.

The following Lemma says that under the assumption $b \in {\mathcal {R}}(A)$, the asymptotic properties of (2.46a) can be inferred from the homogeneous system.

Lemma 2

Let $\varPsi _{A,b,V_0}(t)$ denote the flow of the dynamical system (2.46) but with initial condition $V(0) = V_0$ and assume $b \in {\mathcal {R}}(A)$. Then the equation $\varPsi _{A,b,V_0}(t) = \varPsi _{A,0,V_0+A^+b}(t)-A^+b$ holds, where $A^+$ denotes the pseudoinverse of A.

Proof

For $b \in {\mathcal {R}}(A)$ we have $A A^+ b = b$ and therefore with Duhamel’s formula [29, p. 72]

$$\begin{aligned} \varPsi _{A,b,V_0}(t)&= e^{t A} V_0 + \int _0^t e^{(t-\tau ) A} b\ \mathrm {d}\tau = e^{t A} V_0 + \int _0^t e^{(t-\tau ) A} A\ \mathrm {d}\tau A^+ b \end{aligned}$$

(2.47a)

$$\begin{aligned}&= e^{t A} V_0 + (e^{t A} - I_n) A^+ b = \varPsi _{A,0,V_0+A^+b}(t)-A^+b. \end{aligned}$$

(2.47b)

$\square $

As the translation of the flow by $-A^+b$ does not change the convergence properties (except for translating the equilibria), we can focus on the corresponding homogeneous system

$$\begin{aligned} {\dot{V}} = A V,\quad V(0) = V_0. \end{aligned}$$

(2.48)

Using the eigensystem of A, the solution to (2.48) can be represented in the following well-known way.

Lemma 3

Let A be a diagonalizable matrix with eigenvalues $\lambda _i$, corresponding eigenvectors $v_i$. Further let $V_0 = \sum _i c_i v_i$ with $c_i \in {\mathbb {R}}$. The solution of the linear dynamical system (2.48) can be written as

$$\begin{aligned} V(t) = \sum _i c_i e^{\lambda _i t} v_i. \end{aligned}$$

(2.49)

Without loss of generality, let $\lambda _1$ be the dominant eigenvalue, i.e. the eigenvalue with maximal real part. If $\lambda _1$ is unique and $c_1 \ne 0$, then

$$\begin{aligned} \lim _{t \rightarrow \infty } V(t) = \lim _{t \rightarrow \infty } c_1 e^{\lambda _1 t} v_1. \end{aligned}$$

(2.50)

The hyperplane of initial values with $c_1 = 0$ separates two half-spaces which are the regions of attraction for the limit points in the directions $v_{1}$ and $-v_{1}$, respectively.

Lemma 3 implies the following properties of the system (2.48).

Proposition 10

Any linear dynamical system of the form (2.48) with diagonalizable A has the following properties

(a)
If A has an eigenvalue with positive real part, then any finite equilibrium is unstable and the set of initial points converging to these equilibria is a null set.
(b)
If all eigenvalues of A are real, then the trajectory does not spiral around a subspace through the origin infinitely often, i.e. 0 neither is a spiral sink nor a spiral source.
(c)
The set of equilibria is the nullspace $\mathcal {N}(A)$.
(d)
The stable (resp. unstable) manifold is spanned by the eigenvectors of A corresponding to eigenvalues with negative (resp. positive) real part. All initial points which do not belong to the center-stable manifold diverge to infinity.

The following proposition complements Proposition 10 by examining the spectrum of the matrix A.

Proposition 11

Let $A = R_{{\widehat{S}}}(\varOmega \otimes I_n)$ be the system matrix of the linear assignment flow (2.46a). Then the following holds.

(a)
If the diagonal of $\varOmega $ is nonnegative and contains at least one positive element, the matrix A has at least one eigenvalue with positive real part. This means that all finite equilibria are unstable.
(b)
If $\varOmega $ has the form (2.22) (i.e. $\varOmega $ is a row-wise positive scaling of a symmetric matrix), then A has only real eigenvalues. As a consequence, any initial value converges either to a finite equilibrium or to a fixed limit point at infinity.
(c)
If $\varOmega $ is invertible, then ${\mathrm{rank}}(A) = m(n-1)$. Furthermore, $\mathcal {N}(A)$ is spanned by the vectors $\{e_i \otimes \mathbb {1}_n :i \in I\}$ and the restriction $A|_{\mathcal {T}_{0}}$ is invertible. Thus, 0 is the only finite equilibrium.
(d)
If $\varOmega $ is invertible and positive definite, then the $m(n-1)$ nonzero eigenvalues of A are positive as well. Consequently, the restriction $A|_{\mathcal {T}_{0}}$ is positive definite and any initial value, except for the origin, diverges to infinity.

Proof

(a)
Because the trace of A is positive—cf. (A.26)—A must have at least one positive eigenvalue. The statement on the stability of the equilibria follows from Proposition 10(a).
(b)
Using the notation $A \sim B$ for the similarity of the matrices A and B, we have
$$\begin{aligned} A&= R_{{\widehat{S}}} (\varOmega \otimes I_n) \overset{(2.22)}{=} R_{{\widehat{S}}} ({{\,\mathrm{Diag}\,}}(w)^{-1}{\widehat{\varOmega }} \otimes I_n) \end{aligned}$$
(2.51a)
$$\begin{aligned}&= R_{{\widehat{S}}} ({{\,\mathrm{Diag}\,}}(w)^{-1} \otimes I_n) ({\widehat{\varOmega }} \otimes I_n) \end{aligned}$$
(2.51b)
$$\begin{aligned}&= R_{{\widehat{S}}} ({{\,\mathrm{Diag}\,}}(w) \otimes I_n)^{-1} ({\widehat{\varOmega }} \otimes I_n) \end{aligned}$$
(2.51c)
$$\begin{aligned}&\sim ({{\,\mathrm{Diag}\,}}(w) \otimes I_n)^{\frac{1}{2}} R_{{\widehat{S}}} ({{\,\mathrm{Diag}\,}}(w) \otimes I_n)^{-\frac{1}{2}} \nonumber \\&\quad \cdot ({{\,\mathrm{Diag}\,}}(w) \otimes I_n)^{-\frac{1}{2}} ({\widehat{\varOmega }} \otimes I_n) ({{\,\mathrm{Diag}\,}}(w) \otimes I_n)^{-\frac{1}{2}} \end{aligned}$$
(2.51d)
$$\begin{aligned}&= R_{{\widehat{S}}} ({{\,\mathrm{Diag}\,}}(w) \otimes I_n)^{-\frac{1}{2}} ({\widehat{\varOmega }} \otimes I_n) ({{\,\mathrm{Diag}\,}}(w) \otimes I_n)^{-\frac{1}{2}} \end{aligned}$$
(2.51e)
$$\begin{aligned}&\sim R_{{\widehat{S}}}^{\frac{1}{2}} ({{\,\mathrm{Diag}\,}}(w) \otimes I_n)^{-\frac{1}{2}} ({\widehat{\varOmega }} \otimes I_n) ({{\,\mathrm{Diag}\,}}(w) \otimes I_n)^{-\frac{1}{2}} R_{{\widehat{S}}}^{\frac{1}{2}}, \end{aligned}$$
(2.51f)
where $R_{{\hat{S}}}^{\frac{1}{2}}$ denotes the symmetric positive semidefinite square root of $R_{{\hat{S}}}$. The last matrix is symmetric and therefore all of the matrices above only have real eigenvalues. By Proposition 10(b), the system converges either to a finite equilibrium or towards a fixed point at infinity.
(c)
We have ${\mathrm{rank}}(A) = {\mathrm{rank}}(R_{{\widehat{S}}}(\varOmega \otimes I_n)) = {\mathrm{rank}}(R_{{\widehat{S}}}) = m(n-1)$, which yields the first statement. The second statement follows from
$$\begin{aligned} {R_{{\widehat{S}}}(\varOmega \otimes I_n)(e_i \otimes \mathbb {1}_n)} = {R_{{\widehat{S}}}(\varOmega e_i \otimes I_n \mathbb {1}_n)} = {R_{{\widehat{S}}}(\varOmega e_i \otimes \mathbb {1}_n)} = 0, \end{aligned}$$
(2.52)
since ${R_{{\widehat{S}}_{i}}\mathbb {1}_{n}=0}$, ${\forall i \in I}$. With Proposition 10(c) we conclude that 0 is the only finite equilibrium.
(d)
$R_{{\widehat{S}}}$ is positive semidefinite and we have
$$\begin{aligned} \sigma (R_{{\widehat{S}}}(\varOmega \otimes I_n)) = {\sigma ((\varOmega \otimes I_n)^\frac{1}{2}R_{{\widehat{S}}}(\varOmega \otimes I_n)^\frac{1}{2})}. \end{aligned}$$
(2.53)
Hence, by Sylvester’s law, the matrices $(\varOmega \otimes I_n)^\frac{1}{2}R_{{\widehat{S}}}(\varOmega \otimes I_n)^\frac{1}{2}$ and $R_{{\widehat{S}}}$ have the same inertia. Thus, the center-stable manifold contains only the origin. Proposition 10(d) yields divergence to infinity.

$\square $

Remark 6

If $\varOmega $ is not a row-wise positive scaling of a symmetric matrix, the resulting matrix A may have complex eigenvalues. This can be seen for the choice

$$\begin{aligned} {\hat{S}} = \frac{1}{2}\begin{pmatrix} 1 &{} 1 \\ 1 &{} 1 \end{pmatrix},\quad \varOmega = \frac{1}{2}\begin{pmatrix} 1 &{} 1 \\ -1 &{} 1 \end{pmatrix}, \end{aligned}$$

(2.54)

for which the matrix A has the eigenvalues $\sigma (A) = \{\frac{1}{2}+\frac{1}{2} i, \frac{1}{2}-\frac{1}{2} i, 0, 0 \}$. Note that $\varOmega $ is a row-wise scaling of a symmetric matrix but not a row-wise positive scaling.

The same matrix ${\hat{S}}$ and the matrix

$$\begin{aligned} \varOmega = \frac{1}{2}\begin{pmatrix} -1 &{} 1 \\ 1 &{} -1 \end{pmatrix} \end{aligned}$$

(2.55)

yield only nonpositive eigenvalues $\sigma (A) = \{-\frac{1}{2}, 0, 0, 0 \}$.

For uniform positive weights (2.24), $\varOmega $ has nonpositive eigenvalues. The existence of the eigenvalue 0 depends on the size of the graph and the size of the neighborhood. If $\varOmega $ is a randomly chosen or a matrix of the form (2.22) and estimated from data, it generally has negative eigenvalues.

To analyze the asymptotic behavior of the lifted flow

$$\begin{aligned} W(t) = {{\,\mathrm{Exp}\,}}_{W_{0}}\big (V(t)\big ) = \exp _{W_{0}}\Big (\frac{V(t)}{W_{0}}\Big ), \end{aligned}$$

(2.56)

it is enough to lift the line in direction of the maximal eigenvector to the assignment manifold, as examined next.

Lemma 4

Let v be a vector which has its maximal entries at the positions $\{i_1, \dots i_k\} = {{\,\mathrm{arg max}\,}}_i v_i$. Then the line in direction v lifted at $p \in \mathcal {S}$ converges to a specific point on a k-dimensional face of $\mathcal {S}$ given by

$$\begin{aligned} \lim _{t \rightarrow \infty } \exp _p(t v) = \frac{1}{\sum _{l \in [k]} p_{i_l}} \sum _{l \in [k]} p_{i_l} e_{i_l}. \end{aligned}$$

(2.57)

In particular, if v has a unique maximal entry, $\lim _{t \rightarrow \infty } \exp _p(t v)$ converges to the corresponding unit vector.

Proof

Set $v_{\max } = \max _i v_i$ and consider $\exp _p(t v) = \exp _p(t (v - v_{\max } \mathbb {1}_{n})) = \frac{p e^{t(v - v_{\max } \mathbb {1}_{n})}}{\langle p, e^{t(v - v_{\max } \mathbb {1}_{n})} \rangle }$. In the numerator, every entry which does not correspond to a maximal entry of v converges to 0 for $t \rightarrow \infty $, whereas the other entries converge to the corresponding entry in p. The denominator normalizes the expression, which yields the result. $\square $

Applying this lemma to each vertex in I, we get the following statement on the convergence of the lifted linear assignment flow to integral assignments.

Corollary 4

Under the assumptions of Lemma 3, if $\frac{v_1}{W_0}$ has a unique maximal entry for each vertex, then the lifted flow (2.56) converges to an integral assignment.

Because $W_0$ and the dominant eigenvector of A depend on real data in practice, the assumptions of Corollary 4 are typically satisfied.

We conclude this section by comparing the convergence properties of the S-flow to those of the linear assignment flow.

Remark 7

(S-flow vs. linear assignment flow) If $\varOmega $ is nonnegative on the diagonal with at least one positive entry, the Jacobian matrices of the S-flow (at nonintegral points) and the Jacobian matrix of the linear assignment flow, i.e. A, have at least one eigenvalue with positive real part (see Proposition 6(c) and Proposition 11(a)). Thus, for both flows and such an $\varOmega $, the nonintegral equilibria are unstable (Corollary 1(c) and Proposition 11(a)).

Theorem 1 and Proposition 11(b) state that for both flows a sufficient condition for convergence is that $\varOmega $ has the form (2.22). Let $\varOmega $ have both properties, i.e. nonnegative on the diagonal with at least one positive entry and row-wise positive scaling of a symmetric matrix. Then, the set of initial values converging to a nonintegral point is negligible (Proposition 7, Theorem 2 and Proposition 10(a)).

For a given initial value, the two flows generally converge to different limit points and their regions of attraction generally look different. However, for small finite time-points, the linear assignment flow approximates the assignment flow and (after the appropriate transformation) the S-flow very well [33].

3 Discretization, numerical examples and discussion

3.1 Discretization, geometric integration

We confine ourselves to the simplest geometric scheme worked out by [33] for numerically integrating the assignment flow (1.20). Applying this scheme to the S-flow (2.2) that has the same structure as (1.20), yields the iteration

$$\begin{aligned} S^{(t+1)} = F_h( S^{(t)} ),\qquad F_h(S) = \exp _{S}(h \varOmega S),\qquad h > 0,\; t \in {\mathbb {N}}_{0}, \end{aligned}$$

(3.1)

where h denotes a fixed step size and the iteration step t represents the points of time th.

The following proposition shows that using this numerical method is ‘safe’ in the sense that, by setting h to a sufficiently small value, the approximation of the continuous-time solution S(t) by the sequence $\big (S(t h)\big )_{t \ge 0}$ generated by (3.1) can become arbitrarily accurate.

Proposition 12

Let $L > 0$ be the Lipschitz constant of the mapping F (2.2) defining the S-flow. Then there exists a constant $C > 0$ such that the solution S(t) to the S-flow (2.2) and the sequence $\big (S(t h)\big )_{t \ge 0}$ generated by (3.1) satisfy the relation

$$\begin{aligned} \big \Vert S(t h) - S^{(t)} \big \Vert \le \frac{C}{2 L} h e^{(t+1) L h},\qquad \forall t \in {\mathbb {N}}. \end{aligned}$$

(3.2)

Proof

See Appendix A. $\square $

Proposition 8 asserts the existence of regions of attraction for stable equilibria $S^{*}\in \overline{\mathcal {W}}$ of the continuous-time S-flow (2.2). The following proposition extends this assertion to the discrete-time S-flow (3.1).

Proposition 13

Let $\varOmega $, $S^* \in \overline{{\mathcal {W}}}^*$, $A(S^*)$ and $B_{\varepsilon }(S^*)$ be as in Proposition 8. Then, for the sequence $(S^{(t)})_{t \in {\mathbb {N}}}$ generated by (3.1), the following holds. If $S^{(t_0)} \in B_{\varepsilon }(S^*)$ for some time point $t_0 \in {\mathbb {N}}$, then $S^{(t)} \in B_{\varepsilon }(S^*)$ for all $t \ge t_0$ and $\lim _{t \rightarrow \infty } S^{(t)} = S^*$. Moreover, we have

$$\begin{aligned} \big \Vert S_i^{(t)} - S_i^* \big \Vert _{1} \le \big \Vert S_i^{(t_0)} - S_i^* \big \Vert _{1} \cdot \gamma _i^{t - t_0} \end{aligned}$$

(3.3)

with $\gamma _i \in (0,1)$, for each $i \in I$.

Proof

Let

$$\begin{aligned} \beta _i = \beta _i(S) :=\min \big \{ (\varOmega S)_{i j^*(i)} - (\varOmega S)_{ij} \big \}_{j \ne j^*(i)}. \end{aligned}$$

(3.4)

For $S \in A(S^*)$, we have $\beta _i(S) > 0$ and with $S_{i}^{*}=e_{ij^{*}(i)}$, $F_{h,i}(S)\in \varDelta _{n}$,

$$\begin{aligned} \big \Vert F_{h,i}(S) - S_i^* \big \Vert _1&= 2 - 2 F_{h, i j^*(i)}(S) \end{aligned}$$

(3.5a)

$$\begin{aligned}&= 2 - 2 \frac{ S_{i j^*(i)} }{ S_{i j^*(i)} + \sum _{j \ne j^*(i)} S_{ij} e^{h (\varOmega S)_{ij} - h (\varOmega S)_{i j^*(i)}} } \end{aligned}$$

(3.5b)

$$\begin{aligned}&\le 2 - 2 \frac{ S_{i j^*(i)} }{ S_{i j^*(i)} + (1 - S_{i j^*(i)} ) e^{-h \beta _i} } \end{aligned}$$

(3.5c)

$$\begin{aligned}&= \Vert S_i - S_i^* \Vert _1 \underbrace{\frac{ e^{-h \beta _i} }{ S_{i j^*(i)} + (1 - S_{i j^*(i)}) e^{-h \beta _i} }}_{< 1}. \end{aligned}$$

(3.5d)

Choosing $\delta > 0$ with $S^{(t_0)} \in \overline{B_{\delta }(S^*)} \subset B_{\varepsilon }(S^*)$, we set

$$\begin{aligned} \gamma _i = \max _{S \in \overline{B_{\delta }(S^*)}}~\frac{ e^{-h \beta _i(S)} }{ S_{i j^*(i)} + (1 - S_{i j^*(i)}) e^{-h \beta _i(S)} } \in (0,1). \end{aligned}$$

(3.6)

and thus get $\Vert F_{h,i}(S) - S_i^* \Vert _1 \le \gamma _i \Vert S_i - S_i^* \Vert _1$ for $S \in B_{\delta }(S^*)$, which implies $F_h(\overline{B_{\delta }(S^*)}) \subseteq \overline{B_{\delta }(S^*)} \subset B_{\varepsilon }(S^*)$ and the exponential convergence rate (3.3) of $S^{(t)}$.

$\square $

3.2 Numerical examples, discussion

We illustrate in this section by a range of counter-examples that violating assumption (2.22) can make the assignment flow behave quite differently from what the assertions of Sect. 2 predict. In fact, we use violations of the assumptions as a guiding principle for constructing alternative asymptotic behavior (Sect. 3.2.2).

In addition, we briefly discuss the influence of the parameter matrix $\varOmega $ on the spatial shape of labelings returned by the assignment flow. Finally, we illustrate that our results on the region of attraction of the S-flow towards labelings turns the termination criterion proposed by [2] into a mathematically sound one, provided a proper geometrical scheme is used for numerically integrating the assignment flow.

3.2.1 Vanishing diagonal averaging parameters

We consider a small dynamical system that violates the basic assumption of Corollary 1, that all diagonal entries of the parameter matrix $\varOmega $ of the S-flow (2.2) are positive. As a consequence, an entire line of nonintegral points $S^{*}$ is locally attracting the flow.

Example 2

Let $m=|I| = 3$ and $n=|J| = 2$, and let the parameters of the S-flow (2.2) be given by the row-stochastic matrix

$$\begin{aligned} \varOmega = \{\omega _{ik}\}_{k \in \mathcal {N}_{i}, i \in I} = \frac{1}{4} \begin{pmatrix} 0 &{}\quad 2 &{}\quad 2 \\ 1 &{}\quad 2 &{}\quad 1 \\ 1 &{}\quad 1 &{}\quad 2 \end{pmatrix}. \end{aligned}$$

(3.7)

One easily checks that any point $S^{*}$ on the line $\mathcal {L}$

$$\begin{aligned} {\mathcal {L}} = \left\{ \begin{pmatrix} p &{}\quad 1-p \\ 1 &{}\quad 0 \\ 0 &{}\quad 1 \end{pmatrix} :p \in [0,1] \right\} \subset \overline{{\mathcal {W}}} \end{aligned}$$

(3.8)

is an equilibrium of the S-flow satisfying $F(S^{*})=0$. In particular, this includes nonintegral points with $p \in (0,1)$. The eigenvalues of the Jacobian are given by

$$\begin{aligned} \sigma \big ( \tfrac{\partial F}{\partial S}(S^*) \big ) = \big \{ 0, -\tfrac{1}{2}, -\tfrac{p+2}{4}, -\tfrac{p}{2}, -\tfrac{1-p}{2}, -\tfrac{3-p}{4} \big \} \subset {\mathbb {R}}_{\le 0} \end{aligned}$$

(3.9)

and are nonpositive. The phase portrait depicted by Fig. 4 illustrates that $\mathcal {L}$ locally attracts the flow.

This small example demonstrates that violation of the basic assumption—here, specifically, $\omega _{11}$ of (3.7) is not positive—leads to S-flows with properties not covered by the results of Sect. 2. Note that Theorem 2 is also based on this assumption and does not apply to the present example: there is an open set of starting points $S_{0} \in \mathcal {W}$ for which the S-flow converges to nonintegral equilibria $S^{*} \in \overline{{\mathcal {W}}}$.

Recalling Corollary 4, we see that for the linear assignment flow (2.5) continuous sets on the boundary of the assignment manifold, like line $\mathcal {L}$ in Fig. 4, cannot be limit points.

3.2.2 Constructing $3\times 3$ systems with various asymptotic properties

In this section, we construct a family of S-flows (2.1a) in terms of a class of nonnegative parameter matrices $\varOmega $, that may violate assumption (2.22) which underlies Theorem 1. Accordingly, for a small problem size $n=3$, we explicitly specify flows that exhibit one of the following behaviors:

1.
$t \mapsto S(t)$ converges towards a point $S^* \in \overline{{\mathcal {W}}}$ as $t \rightarrow \infty $;
2.
$t \mapsto S(t)$ is periodic with some period $t_1 > 0$;
3.
$t \mapsto S(t)$ neither converges to a point nor is periodic.

These cases are discussed below as Example 3 and illustrated by Fig. 5. They demonstrate that assumption (2.22) is not too strong, because violation may easily imply that the flow fails to converge to an equilibrium.

Let $\mathcal {D}$ denote the set of doubly stochastic, circulant matrices. We consider the case $m=|I|=|J|=n$ and therefore have $\mathcal {D} \subset \overline{{\mathcal {W}}}$. Let

$$\begin{aligned} P \in \{0,1\}^{n \times n},\qquad P_{ij} = {\left\{ \begin{array}{ll} 1, &{}\text {if }i-j \equiv 1\ ({\text {mod}} n), \\ 0, &{}\text {else} \end{array}\right. } \end{aligned}$$

(3.10)

denote the permutation matrix that represents the n-cycle $(1,\dotsc ,n)$. Then $\mathcal {D}$ is the convex hull of the matrices $\{P, P^2, \dots , P^n \}$ with $P^n = I_n$, and any element $M \in {\mathcal {D}}$ admits the representation

$$\begin{aligned} M = \sum _{k \in [n]} \mu _{k} P^k \quad \text {with} \quad \mu \in \varDelta _{n}. \end{aligned}$$

(3.11)

Since the matrices $P, P^2, \dots , P^n \in {\mathbb {R}}^{n \times n}$ are linearly independent, the vector ${\mu \in \varDelta _n}$ is uniquely determined. We will call $\mu $ the representative of $M \in {\mathcal {D}}$. The following Lemma characterizes two matrix products on $\mathcal {D}$ in terms of the corresponding matrix representatives.

Lemma 5

Let $\mu ^{(1)}, \mu ^{(2)} \in \varDelta _{n}$ be the representatives of any two matrices $M^{(1)}, M^{(2)} \in {\mathcal {D}}$. Then the element-wise Hadamard product and the ordinary matrix product, respectively, are given by

$$\begin{aligned} M^{(1)} \odot M^{(2)}&= \sum _{k \in [n]} \eta _{k} P^k \quad \text {with} \quad \eta = \mu ^{(1)} \odot \mu ^{(2)} \in {\mathbb {R}}_{\ge 0}^n, \end{aligned}$$

(3.12)

$$\begin{aligned} M^{(1)} M^{(2)}&= \sum _{k \in [n]} \mu _{k} P^k \quad \text {with} \quad \mu = M^{(1)} \mu ^{(2)} \in \varDelta _n. \end{aligned}$$

(3.13)

Proof

We note that the kth power of P is given by

$$\begin{aligned} (P^k)_{ij} = {\left\{ \begin{array}{ll} 1, &{}\text {if }i-j \equiv k\ ({\text {mod}} n), \\ 0, &{}\text {else.} \end{array}\right. } \end{aligned}$$

(3.14)

This implies $P^k \odot P^l = \delta _{kl} P^k$ for $k,l \in [n]$, with $\delta _{kl}$ denoting the Kronecker delta, and

$$\begin{aligned} \begin{aligned} M^{(1)} \odot M^{(2)}&= \bigg ( \sum _{k \in [n]} \mu _k^{(1)} P^k \bigg ) \odot \bigg ( \sum _{l \in [n]} \mu _l^{(2)} P^l \bigg ) = \sum _{k,l \in [n]} \mu _k^{(1)} \mu _l^{(2)} P^k \odot P^l \\&= \sum _{k \in [n]} \mu _k^{(1)} \mu _k^{(2)} P^k. \end{aligned} \end{aligned}$$

(3.15)

As for (3.13), we compute

$$\begin{aligned} M^{(1)} M^{(2)}&{=} \sum _{k,j \in [n]} \mu _{k}^{(1)} \mu _{j}^{(2)} P^{k+j} = \sum _{i \in [n]}~\sum _{k+j \equiv i ({\text {mod}} n)} \mu _{k}^{(1)} \mu _{j}^{(2)} P^{i} \end{aligned}$$

(3.16a)

$$\begin{aligned}&{\mathop {=}\limits ^{(\tiny 3.14)}} \sum _{i \in [n]} \sum _{k \in [n]} \mu _{k}^{(1)} ( P^{k} \mu ^{(2)})_i \, P^i = \sum _{i \in [n]} ( M^{(1)} \mu ^{(2)} )_i \, P^i. \end{aligned}$$

(3.16b)

$\square $

The following proposition shows that the S-flow on $\mathcal {D}$ can be expressed by the evolution of the corresponding representative.

Proposition 14

Let $\varOmega \in {\mathcal {D}}$ and suppose the S-flow (2.1a) is initialized at $S(0) \in {\mathcal {D}}$. Then the solution $S(t) \in {\mathcal {D}}$ evolves on $\mathcal {D}$ for all $t \in {\mathbb {R}}$. In addition, the corresponding representative $p(t) \in \varDelta _n$ of $S(t) = \sum _{k \in [n]} p_{k}(t) P^{k}$ satisfies the replicator equation

$$\begin{aligned} {\dot{p}} = R_{p}(\varOmega p). \end{aligned}$$

(3.17)

Proof

Let $S = \sum _{k \in [n]} p_k P^k \in {\mathcal {D}}$ with $p \in \varDelta _n$. Lemma 5 implies

$$\begin{aligned} S \odot \varOmega S = \sum _{k \in [n]} p_k (\varOmega p)_k \, P^k. \end{aligned}$$

(3.18)

Therefore, for any $i \in [n]$,

$$\begin{aligned} \begin{aligned} \langle S_i, (\varOmega S)_i \rangle&= \langle \mathbb {1}_{n}, S_i \odot (\varOmega S)_i \rangle = \big \langle \mathbb {1}_{n}, \big ( S \odot (\varOmega S) \big )_i \big \rangle \\&= \sum _{k \in [n]} p_k (\varOmega p)_k \underbrace{\langle \mathbb {1}_{n},(P^{k})_{i}\rangle }_{=1} = \langle p, \varOmega p \rangle . \end{aligned} \end{aligned}$$

(3.19)

Since this equation holds for any $i \in [n]$, the right-hand side of the S-flow (2.1a) can be rewritten as

$$\begin{aligned} R_{S}(\varOmega S)&= S \odot (\varOmega S) - \langle p, \varOmega p \rangle S \overset{(3.18)}{=} \sum _{k \in [n]}\Big (p_k (\varOmega p)_k \, P^k - \langle p, \varOmega p \rangle p_{k} P^{k}\Big ) \end{aligned}$$

(3.20a)

$$\begin{aligned}&= \sum _{k \in [n]} v_k P^k \quad \text {with} \quad v = p \odot (\varOmega p) - \langle p, \varOmega p \rangle p = R_{p}(\varOmega p). \end{aligned}$$

(3.20b)

Since $p \in \varDelta _{n}$, we have $\langle v,\mathbb {1}_{n}\rangle = 0$, that is v is tangent to $\varDelta _{n}$. Hence, by (3.20), $\dot{S} = \sum _{k \in [n]} \dot{p}_{k} P^{k} = R_{S}(\varOmega S) $ is determined by $\dot{p} = v = R_{p}(\varOmega p)$, whose solution p(t) evolves on $\varDelta _{n}$. $\square $

The following proposition introduces a restriction of parameter matrices $\varOmega \in \mathcal {D}$ that ensures, for any such $\varOmega $, that the product $\prod _{j \in [n]} p_{j}$ changes monotonously depending on the flow (3.17).

Proposition 15

Let $\varOmega = \sum _{k \in [n]} \mu _{k} P^{k} \in {\mathcal {D}}$ be parametrized by

$$\begin{aligned} \mu = \alpha e_{n} + \frac{\beta }{n} \mathbb {1}_n + \sum _{k < \big \lfloor \tfrac{n}{2} \big \rfloor } \gamma _k (e_{k} - e_{n-k}) \in \varDelta _n,\qquad \alpha ,\beta ,\gamma _{1},\dotsc ,\gamma _{\lfloor \tfrac{n}{2}\rfloor -1} \in {\mathbb {R}}. \end{aligned}$$

(3.21)

Suppose $p(t) \in {\mathcal {S}} = {{\,\mathrm{rint}\,}}(\varDelta _n)$ solves (3.17). Then

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \prod _{j \in [n]} p_j(t) \left\{ \begin{aligned}&< 0, \;\text {if}\; \alpha> 0\\&= 0, \;\text {if}\; \alpha = 0\\&> 0, \;\text {if}\; \alpha < 0 \end{aligned}\right\} , \quad \text {for} \quad p(t) \ne \tfrac{1}{n} \mathbb {1}_n. \end{aligned}$$

(3.22)

Proof

Set $\pi _{p} :=\prod _{j \in [n]} p_j$. By virtue of (3.17) and $\langle \mathbb {1}_{n},\varOmega p\rangle =\langle \varOmega ^{\top }\mathbb {1}_{n},p\rangle =1$ ($\varOmega $ is doubly stochastic and $p \in \varDelta _{n}$), we have

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \pi _{p} = \pi _{p}\sum _{j \in [n]} \big ( (\varOmega p)_j - \langle p, \varOmega p \rangle \big ) = \pi _{p} \big ( 1 - n \langle p, \varOmega p \rangle \big ). \end{aligned}$$

(3.23)

Hence, since $\pi _{p} > 0$ for $p \in {\mathcal {S}}$, $\frac{\mathrm {d}}{\mathrm {d}t} \pi _{p}$ has the same sign as $\tfrac{1}{n} - \langle p, \varOmega p \rangle $. Regarding the term

$$\begin{aligned} \langle p, \varOmega p \rangle = \sum _{k \in [n]} \mu _k \langle p, P^k p \rangle , \end{aligned}$$

(3.24)

we have the following three cases:

($\alpha $):: for all $k < n$, the inequality $\langle p, P^k p \rangle \le \langle p, p \rangle = \langle p, P^n p \rangle $ holds, with equality if and only if $p = \tfrac{1}{n} \mathbb {1}_n$;
($\beta $):: $\sum _{k \in [n]} \langle p, P^k p \rangle = \langle p, \mathbb {1}_{n \times n} \, p \rangle = 1$;
($\gamma $):: for all $k \in [n]$, $\langle p, P^k p \rangle = \langle p, P^{n-k} p \rangle $, since $P^{-1} = P^\top $.

Inserting (3.21) into (3.24) and applying $(\alpha ),(\beta ),(\gamma )$ gives

$$\begin{aligned} \langle p, \varOmega p \rangle = \alpha \langle p, p \rangle + \beta \frac{1}{n} \quad \text {and} \quad \langle p, p \rangle > \frac{1}{n} \sum _{k \in [n]} \langle p, P^k p \rangle = \frac{1}{n} \quad \text {for} \quad p \ne \tfrac{1}{n} \mathbb {1}_n. \end{aligned}$$

(3.25)

Since $\langle \mu , \mathbb {1}_n \rangle = \alpha + \beta = 1$, we further obtain

$$\begin{aligned} \langle p, \varOmega p \rangle \left\{ \begin{aligned}&> \tfrac{1}{n},\;\text {if}\; \alpha > 0\\&= \tfrac{1}{n},\;\text {if}\; \alpha = 0\\&< \tfrac{1}{n},\;\text {if}\; \alpha < 0 \end{aligned}\right\} ,\quad \text {for all} \quad p \in \varDelta _n \setminus \{ \tfrac{1}{n} \mathbb {1}_{n} \}. \end{aligned}$$

(3.26)

Combining (3.26) and (3.23) yields (3.22). $\square $

Remark 8

Based on Proposition 15, we observe: If $\alpha > 0$, then p(t) moves towards the (relative) boundary of the simplex $\varDelta _n$, for any $p(0) \ne \tfrac{1}{n} \mathbb {1}_n$. If $\alpha < 0$, then p(t) converges towards the barycenter $\tfrac{1}{n} \mathbb {1}_n$. For $\alpha = 0$, the product $\prod _{j \in [n]} p_j(t)$ is constant over time.

The scalars $\gamma _k$ in (3.21) steer the skew-symmetric part of $\varOmega $. Consequently, if $\gamma _k = 0$ for all k, then $\varOmega $ is symmetric and the S-flow converges to a single point by Theorem 1. Depending on the skew-symmetric part, the S-flow may not converge to a point, as Example 3 below will demonstrate for few explicit instances and $n=3$. Note that, in this case $n=3$, (3.21) describes a parametrization rather than a restriction of $\varOmega \in {\mathcal {D}}$.

Example 3

Let $n=3$. The matrix $\varOmega \in {\mathcal {D}}$ take the form

$$\begin{aligned} \mu&= \alpha e_3 + \tfrac{\beta }{3}\mathbb {1}_3 + \gamma (e_1 - e_2), \end{aligned}$$

(3.27)

$$\begin{aligned} \varOmega&= \begin{pmatrix} \mu _3 &{} \mu _2 &{} \mu _1 \\ \mu _1 &{} \mu _3 &{} \mu _2 \\ \mu _2 &{} \mu _1 &{} \mu _3 \end{pmatrix} = \alpha \begin{pmatrix} 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \end{pmatrix} + \frac{\beta }{3} \begin{pmatrix} 1 &{} 1 &{} 1 \\ 1 &{} 1 &{} 1 \\ 1 &{} 1 &{} 1 \end{pmatrix} + \gamma \begin{pmatrix} 0 &{} -1 &{} 1 \\ 1 &{} 0 &{} -1 \\ -1 &{} 1 &{} 0 \end{pmatrix} \end{aligned}$$

(3.28)

with the constraint $\mu \in \varDelta _3$, i.e.

$$\begin{aligned} \alpha + \beta = 1, \quad \alpha +\frac{\beta }{3} \ge 0, \quad \frac{\beta }{3} \ge |\gamma |. \end{aligned}$$

(3.29)

We examine the behavior of the flow (3.17), depending on the parameters $\alpha $ and $\gamma $. Note, that the flow does not depend on the parameter $\beta $ that merely ensures $\varOmega $ to be row-stochastic.

Case $\alpha < 0$. As already discussed (Remark 8), p(t) converges to the barycenter in this case. Depending on $\gamma $, this may happen with ($\gamma \ne 0$) or without ($\gamma = 0$) a spiral as depicted by Fig. 5 (a) and (b).

Case $\alpha = 0$. We distinguish the two cases $\gamma = 0$ and $\gamma \ne 0$. If $\gamma = 0$, then we have $\varOmega = \tfrac{1}{3} \mathbb {1}_{3\times 3}$ and therefore ${\dot{p}} = R_{p} \varOmega p \equiv 0$, i.e., each point $p^* \in \varDelta _3$ is an equilibrium. In contrast, if $\gamma \ne 0$, then we have the (standard) rock-paper-scissors dynamics [25, Chapter 10]:

$$\begin{aligned} {\dot{p}} = \gamma \begin{pmatrix} p_1 (p_3 - p_2) \\ p_2 (p_1 - p_3) \\ p_3 (p_2 - p_1) \end{pmatrix} \ne 0, \quad \text {for } p \in \varDelta _3 \setminus \{ e_1, e_2, e_3, \tfrac{1}{3} \mathbb {1}_3 \}. \end{aligned}$$

(3.30)

Starting at a point $p_0 \in {{\,\mathrm{rint}\,}}(\varDelta _3) {\setminus }\{ \tfrac{1}{3} \mathbb {1}_3 \}$, the curve $t \mapsto p(t)$ moves along the closed curve $\big \{ p \in \varDelta _3 :\prod _{j} p_j = \prod _{j} p_{0,j} \big \}$, i.e., the curve $t \mapsto p(t)$ is periodic; see Fig. 5 (c).

Case $\alpha > 0$. We distinguish again the two cases $\gamma = 0$ and $\gamma \ne 0$. If $\gamma = 0$, then the flow reduces to ${\dot{p}} = \alpha R_p p$ whose solution converges to

$$\begin{aligned} \lim _{t \rightarrow \infty } p(t) = \tfrac{1}{|J^*|} \sum _{j \in J^*} e_j \in \varDelta _3, \quad \text {with} \quad J^* = {\text {*}}{arg\, max}_{j \in [3]}~p_j(0). \end{aligned}$$

(3.31)

As for the remaining case $\alpha > 0$ and $\gamma \ne 0$, we distinguish $\alpha > |\gamma |$ and $\alpha \le |\gamma |$ as illustrated by Fig. 5 (e), (f) and (g). If $\alpha \le |\gamma |$, then we have a generalized rock-paper-scissors game [25, Chapter 10]. The curve $t \mapsto p(t)$ spirals towards the boundary of the simplex $\varDelta _3$ and does not converge to a single point. In contrast, if $\alpha > |\gamma |$, then the flow converges to a point on the boundary. In fact, the vertices of the simplex are attractors.

Example 3 is devoted to the S-flow (2.1a) that parametrizes the assignment flow (2.1b), as specified by Proposition 2. The following examples illustrate how the assignment flow may behave if the S-flow does not converge to an equilibrium point.

Example 4

This example continues Example 3. Accordingly, we consider the case $n=3$ and assume $\varOmega \in \mathcal {D}$. Let the distance matrix D, whose row vectors define the mappings (1.16) corresponding to the assignment flow, be given by

$$\begin{aligned} D = \begin{pmatrix} 0 &{}\quad 1 &{}\quad 1 \\ 1 &{}\quad 0 &{}\quad 1 \\ 1 &{}\quad 1 &{}\quad 0 \end{pmatrix}. \end{aligned}$$

(3.32)

Then, if $\varOmega \in {\mathcal {D}}$, the initial value $S(0) = \exp _{\mathbb {1}_{{\mathcal {W}}}}(-\varOmega D)$ of the S-flow (2.1a) lies in ${\mathcal {D}}$ as well. Hence, the above observations of Example 3 for the S-flow hold. The resulting assignment flow $t \mapsto W(t)$ then also evolves in ${\mathcal {D}}$ which can be verified using (2.3). As for the averaging parameters $\varOmega $, we consider the following three matrices in ${\mathcal {D}}$:

$$\begin{aligned} \varOmega _{\text {center}} = \begin{pmatrix} 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \end{pmatrix}, \quad \varOmega _{\text {cycle}} = \frac{1}{3} \begin{pmatrix} 1 &{} 0 &{} 2 \\ 2 &{} 1 &{} 0 \\ 0 &{} 2 &{} 1 \end{pmatrix}, \quad \varOmega _{\text {spiral}} = \frac{1}{5} \begin{pmatrix} 2 &{} 0 &{} 3 \\ 3 &{} 2 &{} 0 \\ 0 &{} 3 &{} 2 \end{pmatrix}. \end{aligned}$$

(3.33)

Fig. 6 displays the trajectories of the assignment flow for these averaging matrices. The symmetry of these plots results from $W(t) \in {\mathcal {D}}$.

Matrix $\varOmega _{\text {center}}$ corresponds to the parameters $(\alpha ,\beta ,\gamma ) = (-\tfrac{1}{2},\tfrac{3}{2},\tfrac{1}{2})$ of (3.27), for which the S-flow converges to the barycenter. As a consequence, W(t) converges to a point in ${\mathcal {W}} \setminus \{ \mathbb {1}_{{\mathcal {W}}}\}$.

Matrix $\varOmega _{\text {cycle}}$ corresponds to the parameters $(\alpha ,\beta ,\gamma ) = (0,1,\tfrac{1}{3})$, for which the S-flow has periodic orbits. Since these orbits are symmetric around the barycenter, i.e. $\int _{0}^{t_1} \big ( S(t) - \mathbb {1}_{{\mathcal {W}}}\big ) \mathrm {d}t = 0$ with $t_1$ being the period of the trajectory, the trajectory $t \mapsto W(t)$ is also periodic as a consequence of equation (2.3).

Finally, matrix $\varOmega _{\text {spiral}}$ corresponds to the parameters $(\alpha ,\beta ,\gamma ) = (0.1,0.9,0.3)$, for which the S-flow spirals towards the boundary of the simplex. It is not clear a priori if $t \mapsto W(t)$ does not converge to a single point either. The trajectory of W(t) shown by Fig. 6 suggests that the assignment flow also spirals towards the boundary of the simplex without converging to a single point.

Remark 9

Examples 3 and 4 considered the special case $m=|I| = |J|=n=3$. We observed in further experiments similar behaviors also in the case $|J| < |I|$. For example, it can be verified, for $|J|=2$ and $\varOmega = \varOmega _{\text {cycle}}$ from Example 4, that the S-flow possesses a (unstable) limit cycle, i.e. a periodic orbit.

The above examples also demonstrate that several symmetries in the input data are required, e.g. $\varOmega \in {\mathcal {D}}$ and $S_0 \in {\mathcal {D}}$, in order to obtain nonconvergent orbits. However, small perturbations like numerical errors or the omnipresent noise in real data will break these symmetries. Therefore, it is very unlikely to observe such behavior of the S-flow and the assignment flow, respectively, in practice.

3.2.3 Geometric averaging and spatial shape

We design and construct a small academical example that, despite its simplicity, illustrates the following important points:

the region of attraction due to Corollary 3, here for the special case of uniform averaging parameters $\varOmega $ (and likewise more generally for nonuniform $\varOmega $ (Proposition 9)), that enables to terminate the numerical scheme and rounding to the correct labeling;
the influence of $\varOmega $ on the spatial shape of patterns created through data labeling, which provides the basis for pixel-accurate ‘semantic’ image labeling;
undesired asymptotic behavior of the numerically integrated assignment flow—cf. Remark 10 below—cannot occur when using proper geometric numerical integration, like the scheme (3.1) or any scheme devised by [33].

Example 5

We consider a $12 \times 12$ RGB image $u :I \rightarrow [0,1]^3$ shown by Fig. 7. The three unit vectors $e_{j},\,j \in J=[3]$ define the labels that are marked by the colors red, green and blue. For spatial regularization we used $3 \times 3$ neighborhoods $\mathcal {N}_{i},\,i \in I$ with uniform weights $\omega _{ik} = \tfrac{1}{|{\mathcal {N}}_i|},\, k \in \mathcal {N}_{i}$, with shrunken neighborhoods if they intersect the boundary of the underlying quadratic domain. The distance matrix D that initializes the S-flow by $S_0 = \exp _{\mathbb {1}_{{\mathcal {W}}}}(-\varOmega D)$, was set to $D_{ij} = 10 \cdot \Vert u_i - e_j \Vert _2,\,i \in I,\,j \in J$.

Adopting the termination criterion from [2], we numerically integrated the S-flow using the scheme (3.1), until iteration T when the average entropy dropped below $10^{-3}$, i.e.

$$\begin{aligned} -\frac{1}{|I| \log |J|} \sum _{i\in I,j\in J} S_{ij}^{(T)} \log S_{ij}^{(T)} < 10^{-3}. \end{aligned}$$

(3.34)

The resulting assignment $S^{(T)}$ was rounded to the integral assignment $S^* \in \overline{{\mathcal {W}}}^*$ depicted by the right panel of Fig. 7. We observe the following.

(i)
The resulting labeling $S^{*}$ differs from the input image although exact (integral) input data are used.

This conforms to Corollary 1(b), which enables to recognize the input data as unstable. As a consequence, the green and blue labels at the corners of the corresponding quadrilateral shapes in the input data are replaced by the flow. The resulting labeling $S^{*}$ is stable, as one easily verifies using Corollary 1(a).

This simple example and the corresponding observation points to a fundamental question to be investigated in future work: how can $\varOmega $ be used for ‘storing’ prior knowledge about the shape of labeling patterns?
(ii)
Using the estimate (2.43) which is the special case of (2.39) in the case of uniform weights, we computed
$$\begin{aligned} \varepsilon _{\mathrm {est}} = \varepsilon _{\mathrm {unif}} = 0.2. \end{aligned}$$
(3.35)
Since the distance between $S^*$ and the assignment $S^{(T)}$ obtained after terminating numerical integration due to (3.34), satisfied
$$\begin{aligned} \max _{i \in I}~\Vert S_i^{(T)} - S_i^* \Vert _1 \approx 0.00196 < \varepsilon _{\mathrm {est}}, \end{aligned}$$
(3.36)
we had the guarantee due to Proposition 13 that $S^{(t)}$ converges for $t > T$ to $S^*$, i.e. that no label indicated by $S^{(T)}$ can change anymore. With regard to Proposition 12, the estimate (3.2) implies for sufficiently small step size ${h > 0}$ that the continuous S-flow S(hT) also lies in the attracting region $B_{\varepsilon }(S^*)$. Proposition 8 then states the convergence of the S-flow to $S^*$. Eventually, the continuous assignment flow (1.20) converge to $S^*$ by Proposition 4.

Remark 10

(numerical integration and asymptotic behavior) The authors of [2] adopted a numerical scheme from [19] which, when adapted and applied to (1.1), was shown in [5] to always converge to a constant solution as $t \rightarrow \infty $, i.e. a single label is assigned to every pixel, which clearly is an unfavorable property. Irrespective of the fact that uniform positive weights were used by [2], that satisfy assumption (2.22), this strange asymptotic behavior resulted from the fact that the adaption of the discrete scheme of [19] implicitly uses different step sizes for updating the flow $S_{i}$ at different locations $i \in I$.

Our results in this paper show that the continuous-time assignment flow does not exhibit this asymptotic behavior, under appropriate assumptions on the parameter matrix $\varOmega $. In addition, point (ii) above and Proposition 13 show that using a proper geometric scheme from [33] turns condition (3.34) into a sound criterion for terminating the numerical scheme, followed by safe rounding to an integral labeling.

4 Conclusion

We established in this paper that under reasonable assumptions on the weight parameters $\varOmega $, the assignment flow approach is a sound method for contextual data classification on graphs. Favourable properties like convergence to integral assignments and existence of corresponding basins of attraction extend to sequences generated by discrete-time schemes for geometric integration. This shows that geometric numerical integration of the assignment flow yields sound numerical algorithms. A range of counter-examples demonstrate that these conditions are not too strong, since violating them may quickly lead to unfavorable behavior of the assignment flow regarding classification.

The results provide a proper basis and justify recent work on learning the assignment flow parameters $\varOmega $ from data [16, 31, 32], on extending the approach to unsupervised data classification on graphs [34, 35] or taking additional spatial constraints into account [28]. Our future work will focus on deeper parametrizations of assignment flows using the same mathematical framework and on studying their properties and performance for statistical data classification on graphs.

Notes

Note that this follows by the Hölder continuity of the homeomorphism in the Hartman-Grobman theorem. The Hölder continuity is shown in [4].

References

Amari, S.I., Nagaoka, H.: Methods of Information Geometry. Oxford Univ. Press, Oxford (2000)
MATH Google Scholar
Åström, F., Petra, S., Schmitzer, B., Schnörr, C.: Image labeling by assignment. J. Math. Imag. Vision 58(2), 211–238 (2017)
Article MathSciNet MATH Google Scholar
Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information Geometry, Ergebnisse Der Mathematik Und Ihrer Grenzgebiete 34, vol. 64. Springer, Cham (2017)
Google Scholar
Belitskii, G., Rayskin, V.: On the Grobman–Hartman theorem in $\alpha $-Hölder class for Banach spaces. preprint (2009)
Bergmann, R., Fitschen, J.H., Persch, J., Steidl, G.: Iterative multiplicative filters for data labeling. Int. J. Comput. Vision 123(3), 435–453 (2017)
Article MathSciNet MATH Google Scholar
Bomze, I.M.: Regularity versus degeneracy in dynamics, games, and optimization: a unified approach to different aspects. SIAM Rev. 44(3), 394–414 (2002)
Article MathSciNet MATH Google Scholar
Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
Article MathSciNet MATH Google Scholar
Cordts, M., Omran, M., Ramos, S., Refeld, T., Enzweiler, M., Beneson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: Proc. CVPR (2016)
Elad, M.: Deep, deep trouble: deep learning’s impact on image processing, mathematics, and humanity. SIAM News 50(4) (2017)
Fenichel, N.: Geometric singular perturbation theory for ordinary differential equations. J. Differ. Equ. 31(1), 53–98 (1979)
Article MathSciNet MATH Google Scholar
Finlayson, S., Bowers, J., Ito, J., Zittrain, J., Beam, A., Kohane, I.: Adversarial attacks on medical machine learning: emerging vulnerabilities demand new conversations. Science 363(6433), 1287–1289 (2019)
Article Google Scholar
Galla, T., Farmer, J.: Complex dynamics in learning complicated games. PNAS 110(4), 1232–1236 (2013)
Article MathSciNet MATH Google Scholar
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, New York (2016)
MATH Google Scholar
Heaven, D.: Deep trouble for deep learning. Nature 574 (2019)
Hofbauer, J., Sigmund, K.: Evolutionary game dynamics. Bull. Am. Math. Soc. 40(4), 479–519 (2003)
Article MathSciNet MATH Google Scholar
Hühnerbein, R., Savarino, F., Petra, S., Schnörr, C.: Learning adaptive regularization for image labeling using geometric assignment. J. Math. Imaging Vis. 63, 186–215 (2021)
Article MathSciNet MATH Google Scholar
Kappes, J., Andres, B., Hamprecht, F., Schnörr, C., Nowozin, S., Batra, D., Kim, S., Kausler, B., Kröger, T., Lellmann, J., Komodakis, N., Savchynskyy, B., Rother, C.: A comparative study of modern inference techniques for structured discrete energy minimization problems. Int. J. Comput. Vis. 115(2), 155–184 (2015)
Article MathSciNet Google Scholar
Kelley, A.: The stable, center-stable, center, center-unstable, unstable manifolds. J. Differ. Equ. (1966)
Losert, V., Akin, E.: Dynamics of games and genes: discrete versus continuous time. J. Math. Biol. 17(2), 241–251 (1983)
Article MathSciNet MATH Google Scholar
Nock, R., Nielsen, F.: Statistical region merging. IEEE Trans. Pattern. Anal. Mach. Intell. 26(11), 1452–1458 (2004)
Article Google Scholar
Perko, L.: Differential Equations and Dynamical Systems, vol. 7. Springer, Berlin (2001)
MATH Google Scholar
Sandholm, W.H.: Population Games and Evolutionary Dynamics. MIT Press, Chicago (2010)
MATH Google Scholar
Savarino, F., Schnörr, C.: Continuous-domain assignment flows. Eur. J. Appl. Math. 32(3), 570–597 (2021)
Article MathSciNet MATH Google Scholar
Schaeffer, D.G., Cain, J.W.: Ordinary Differential Equations: Basics and Beyond. Springer, Berlin (2016)
Book MATH Google Scholar
Schecter, S., Gintis, H.: Game Theory in Action: An Introduction to Classical and Evolutionary Models. Princeton University Press, Princeton (2016)
Book MATH Google Scholar
Schnörr, C.: Assignment Flows. In: Grohs, P., Holler, M., Weinmann, A. (eds.) Variational Methods for Nonlinear Geometric Data and Applications, pp. 235–260. Springer, Berlin (2020)
Google Scholar
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern. Anal. Mach. Intell. 22, 888–905 (2000)
Article Google Scholar
Sitenko, D., Boll, B., Schnörr, C.: Assignment flow for order-constrained OCT segmentation. Int. J. Comput. Vis. 129, 3088–3118 (2021).
Teschl, G.: Ordinary Differential Equations and Dynamical Systems. Grad. Studies Math, vol. 140. Amer. Math. Soc, London (2012)
MATH Google Scholar
Van Loan, C.F.: The ubiquitous Kronecker product. J. Comput. Appl. Math. 123, 85–100 (2000)
Article MathSciNet MATH Google Scholar
Zeilmann, A., Petra, S., Schnörr, C.: Learning Linear Assignment Flows for Image Labeling via Exponential Integration. In: Elmoataz, A., Fadili, J., Quéau, Y., Rabin, J., Simon, L. (eds.) Scale Space and Variational Methods in Computer Vision (SSVM), LNCS, vol. 12679, pp. 385–397 (2021)
Zeilmann, A., Petra, S., Schnörr, C.: Learning Linearized Assignment Flows for Image Labeling. arXiv:2108.02571 (2021)
Zeilmann, A., Savarino, F., Petra, S., Schnörr, C.: Geometric numerical integration of the assignment flow. Inverse Prob. 36, 034004 (33pp) (2020)
Article MathSciNet MATH Google Scholar
Zern, A., Zisler, M., Petra, S., Schnörr, C.: Unsupervised assignment flow: label learning on feature manifolds by spatially regularized geometric assignment. J. Math. Image. Vis. 62(6–7), 982–1006 (2020)
Article MathSciNet MATH Google Scholar
Zisler, M., Zern, A., Petra, S., Schnörr, C.: Self-assignment flows for unsupervised data labeling on graphs. SIAM J. Image. Sci. 13(3), 1113–1156 (2020)
Article MathSciNet MATH Google Scholar

Download references

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Image and Pattern Analysis Group, Heidelberg University, Heidelberg, Germany
Artjom Zern, Alexander Zeilmann & Christoph Schnörr

Authors

Artjom Zern
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Zeilmann
View author publications
You can also search for this author in PubMed Google Scholar
Christoph Schnörr
View author publications
You can also search for this author in PubMed Google Scholar

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work has also been stimulated by the Heidelberg Excellence Cluster STRUCTURES, funded by the DFG under Germany’s Excellent Strategy EXC-2181/1 - 390900948.

Appendices

Proofs

1.1 Proof of Proposition 4

Proof

(a)
Let $\beta _i :=\tfrac{1}{2} \min \{S_{i j^*(i)}^* - S_{ij}^*\}_{j \ne j^*(i)} > 0$. Since
$$\begin{aligned} \lim _{t\rightarrow \infty } S_{i j^*(i)}(t) - S_{ij}(t) = S_{i j^*(i)}^* - S_{ij}^* \ge 2 \beta _i > 0, \quad \forall j \in J \setminus \{ j^*(i) \},\qquad \quad \end{aligned}$$
(A.1)
there exists $t_1 \ge 0$ such that
$$\begin{aligned} S_{i j^*(i)}(t) - S_{ij}(t) > \beta _i, \quad \forall t \ge t_1, \quad \forall j \in J \setminus \{ j^*(i) \}. \end{aligned}$$
(A.2)
We estimate
$$\begin{aligned}&\Vert W_i(t) - e_{j^*(i)}\Vert _1 \end{aligned}$$
(A.3a)
$$\begin{aligned}&\quad = 1 - W_{i j^*(i)} + \sum _{j \ne j^*(i)} W_{ij} = 2 - 2 W_{i j^*(i)} \nonumber \\&\qquad \overset{(2.3)}{=} 2 - 2 \frac{\exp \Big ( \int _{0}^{t} S_{i j^*(i)}(\tau ) \mathrm {d}\tau \Big )}{\sum _{j \in J} \exp \Big ( \int _{0}^{t} S_{ij}(\tau ) \mathrm {d}\tau \Big )} \end{aligned}$$
(A.3b)
$$\begin{aligned}&\quad = 2 \frac{\sum _{j \ne j^*(i)} \exp \Big ( \int _{0}^{t} S_{ij}(\tau ) \mathrm {d}\tau \Big )}{\sum _{j \in J} \exp \Big ( \int _{0}^{t} S_{ij}(\tau ) \mathrm {d}\tau \Big )} \end{aligned}$$
(A.3c)
$$\begin{aligned}&\quad = 2 \frac{\sum _{j \ne j^*(i)} \exp \Big ( \int _{0}^{t} \big ( S_{ij}(\tau ) - S_{i j^*(i)}(\tau ) \big ) \mathrm {d}\tau \Big )}{1 + \sum _{j \ne j^*(i)} \exp \Big ( \int _{0}^{t} \big ( S_{ij}(\tau ) - S_{i j^*(i)}(\tau ) \big ) \mathrm {d}\tau \Big )} \end{aligned}$$
(A.3d)
$$\begin{aligned}&\quad \le 2 \sum _{j \ne j^*(i)} \exp \Big ( \int _{0}^{t} \big ( S_{ij}(\tau ) - S_{i j^*(i)}(\tau ) \big ) \mathrm {d}\tau \Big ) \end{aligned}$$
(A.3e)
$$\begin{aligned}&\quad = 2 \sum _{j \ne j^*(i)} \exp \Big ( \int _{0}^{t_1} \big ( S_{ij}(\tau ) - S_{i j^*(i)}(\tau ) \big ) \mathrm {d}\tau + \int _{t_1}^{t} \big ( \overbrace{S_{ij}(\tau ) - S_{i j^*(i)}(\tau )}^{< - \beta _i} \big ) \mathrm {d}\tau \Big ) \end{aligned}$$
(A.3f)
$$\begin{aligned}&\quad \le 2 \sum _{j \ne j^*(i)} \exp \Big ( \int _{0}^{t_1} \big ( S_{ij}(\tau ) - S_{i j^*(i)}(\tau ) \big ) \mathrm {d}\tau \Big ) \cdot e^{-\beta _i (t - t_1)} \end{aligned}$$
(A.3g)
$$\begin{aligned}&\quad = \underbrace{2 e^{\beta _i t_1} \sum _{j \ne j^*(i)} \exp \Big ( \int _{0}^{t_1} \big ( S_{ij}(\tau ) - S_{i j^*(i)}(\tau ) \big ) \mathrm {d}\tau \Big )}_{=:\alpha _i > 0} \cdot e^{-\beta _i t}, \end{aligned}$$
(A.3h)
which proves (2.7).
(b)
Let $J^*(i) :={{\,\mathrm{arg max}\,}}_{j \in J}~S_{ij}^*$. For any $j, l \in J^*(i)$, we have
$$\begin{aligned} \begin{aligned} \int _{0}^{\infty } \big | S_{ij}(t) - S_{il}(t) \big | \mathrm {d}t&\le \int _{0}^{\infty } \big | S_{ij}(t) - S_{ij}^* \big | \mathrm {d}t + \int _{0}^{\infty } \big | S_{il}(t) - S_{il}^* \big | \mathrm {d}t \\&\le 2 \int _{0}^{\infty } \Vert S_i(t) - S_i^* \Vert _1 \mathrm {d}t < \infty , \end{aligned} \end{aligned}$$
(A.4)
where the last inequality follows from the hypothesis of (2.8). Thus, the improper integral $\int _{0}^{\infty } {\big ( S_{ij}(t) - S_{il}(t) \big )} \mathrm {d}t \in {\mathbb {R}}$ exists.

If $j \in J^*(i)$, we obtain
$$\begin{aligned} W_{ij}(t)&\overset{(2.3)}{=} \frac{\exp \Big ( \int _{0}^{t} S_{ij}(\tau ) \mathrm {d}\tau \Big )}{\sum _{l \in J} \exp \Big (\int _{0}^{t} S_{il}(\tau ) \mathrm {d}\tau \Big )} \end{aligned}$$
(A.5a)
$$\begin{aligned}&= \Bigg ( 1 + \sum _{l \in J {\setminus }J^*(i)} \exp \Big ( \overbrace{ \int _{0}^{t} \big ( S_{il}(\tau ) - S_{ij}(\tau ) \big ) \mathrm {d}\tau }^{\rightarrow -\infty } \Big ) \nonumber \\&\quad + \sum _{l \in J^*(i) {\setminus }\{ j \}} \exp \Big ( \int _{0}^{t} \big ( S_{il}(\tau ) - S_{ij}(\tau ) \big ) \mathrm {d}\tau \Big ) \Bigg )^{-1} \end{aligned}$$
(A.5b)
$$\begin{aligned}&\longrightarrow \Bigg ( 1 + \sum _{l \in J^*(i) {\setminus }\{ j \}} \exp \Big ( \int _{0}^{\infty } \big ( S_{il}(\tau ) - S_{ij}(\tau ) \big ) \mathrm {d}\tau \Big ) \Bigg )^{-1} \in (0,1] \quad \text {for} \quad t \rightarrow \infty , \end{aligned}$$
(A.5c)
whereas for any $j \in J \setminus J^*(i)$
$$\begin{aligned} W_{ij}(t)&= \frac{\exp \Big ( \int _{0}^{t} S_{ij}(\tau ) \mathrm {d}\tau \Big )}{\sum _{l \in J} \exp \Big ( \int _{0}^{t} S_{il}(\tau ) \mathrm {d}\tau \Big )} \end{aligned}$$
(A.6a)
$$\begin{aligned}&= \Bigg ( \overbrace{\sum _{l \in J {\setminus }J^*(i)} \exp \Big ( \int _{0}^{t} \big ( S_{il}(\tau ) - S_{ij}(\tau ) \big ) \mathrm {d}\tau \Big )}^{\ge 0} \nonumber \\&\quad + \sum _{l \in J^*(i)} \exp \Big ( \underbrace{\int _{0}^{t} \big ( S_{il}(\tau ) - S_{ij}(\tau ) \big ) \mathrm {d}\tau }_{\rightarrow \infty } \Big ) \Bigg )^{-1} \end{aligned}$$
(A.6b)
$$\begin{aligned}&\longrightarrow 0 \quad \text {for} \quad t \rightarrow \infty . \end{aligned}$$
(A.6c)

$\square $

1.2 Proof of Proposition 6

Proof

(a)
Since $\sigma \big ( \tfrac{\partial F}{\partial S}(S^*)^\top \big ) = \sigma \big ( \tfrac{\partial F}{\partial S}(S^*) \big )$, we may alternatively regard the transpose of the Jacobian
$$\begin{aligned} \tfrac{\partial F}{\partial S}(S^*)^\top = \begin{pmatrix} B_1^\top &{} &{} \\ &{} \ddots &{} \\ &{} &{} B_{m}^\top \end{pmatrix} + \varOmega ^\top \otimes I_{n} \cdot \begin{pmatrix} R_{S_{1}^*} &{} &{} \\ &{} \ddots &{} \\ &{} &{} R_{S_{m}^*} \end{pmatrix} \end{aligned}$$
(A.7)
with $B_i^\top = {{\,\mathrm{Diag}\,}}\big ( (\varOmega S^*)_i \big ) - \langle S_i^*, (\varOmega S^*)_i \rangle I_{n} - (\varOmega S^*)_i {S_i^*}^\top $. We have for each $i \in I$,
$$\begin{aligned} B_i^\top \mathbb {1}_{n}&= -\langle S_i^*, (\varOmega S^*)_i \rangle \mathbb {1}_{n}, \quad R_{S_i^*} \mathbb {1}_{n} = 0, \end{aligned}$$
(A.8a)
$$\begin{aligned} B_i^\top e_j&= \big ( (\varOmega S^*)_{ij} - \langle S_i^*, (\varOmega S^*)_i \rangle \big ) e_j, \quad R_{S_i^*} e_j = 0, \nonumber \\&\quad \forall j \in J {\setminus }{{\,\mathrm{supp}\,}}(S_i^*). \end{aligned}$$
(A.8b)
Hence, the transposed Jacobian possesses the following eigenpairs:
$$\begin{aligned} \tfrac{\partial F}{\partial S}(S^*)^\top \cdot e_i \otimes \mathbb {1}_{n}&= -\langle S_i^*, (\varOmega S^*)_i \rangle \cdot e_i \otimes \mathbb {1}_{n}, \quad \forall i \in I, \end{aligned}$$
(A.9a)
$$\begin{aligned} \tfrac{\partial F}{\partial S}(S^*)^\top \cdot e_i \otimes e_j&= \big ( (\varOmega S^*)_{i j} -\langle S_i^*, (\varOmega S^*)_i \rangle \big ) \cdot e_i \otimes e_j, \quad \forall j \in J {\setminus }{{\,\mathrm{supp}\,}}(S_i^*), \quad \forall i \in I. \end{aligned}$$
(A.9b)
If $S^* \in \overline{{\mathcal {W}}}^*$, then $|{{\,\mathrm{supp}\,}}(S_i^*)| = 1$ for each $i \in I$ and therefore (A.9) specifies all mn eigenpairs and the entire spectrum, which proves (2.16). In this case, the eigenvectors of $\tfrac{\partial F}{\partial S}(S^*)$ can also be stated explicitly: Since $R_{S^*} = 0$, we have
$$\begin{aligned} \tfrac{\partial F}{\partial S}(S^*) = \begin{pmatrix} B_1 &{} &{} \\ &{} \ddots &{} \\ &{} &{} B_{m} \end{pmatrix}. \end{aligned}$$
(A.10)
Each block $B_i$ fulfills
$$\begin{aligned} B_i S_i^*&= - \langle S_i^*, (\varOmega S^*)_i \rangle S_i^*, \end{aligned}$$
(A.11a)
$$\begin{aligned} B_i (S_i^* - e_j)&= \big ( (\varOmega S^*)_{ij} - \langle S_i^*, (\varOmega S^*)_i \rangle \big ) (S_i^* - e_j) \quad \forall j \in J {\setminus }{{\,\mathrm{supp}\,}}(S_i^*). \end{aligned}$$
(A.11b)
Hence, the corresponding eigenvectors of $\tfrac{\partial F}{\partial S}(S^*)$ are
$$\begin{aligned} e_i \otimes S_i^*, \quad e_i \otimes (S_i^* - e_j), \quad \forall j \in J \setminus {{\,\mathrm{supp}\,}}(S_i^*), \quad \forall i \in I. \end{aligned}$$
(A.12)
(b)
Since $\varOmega S^* = \tfrac{1}{|J_{+}|} (\varOmega \mathbb {1}_{m}) \mathbb {1}_{J_{+}}^\top $ for $S^*= \tfrac{1}{|J_{+}|} \mathbb {1}_{m} \mathbb {1}_{J_{+}}^\top $, we have
$$\begin{aligned} B_i&= ( \varOmega \mathbb {1}_{m} )_{i} \cdot \Big ( \tfrac{1}{|J_{+}|} {{\,\mathrm{Diag}\,}}(\mathbb {1}_{J_{+}}) - \tfrac{1}{|J_{+}|} I_{n} -\tfrac{1}{|J_{+}|^2} \mathbb {1}_{J_{+}} \mathbb {1}_{J_{+}}^\top \Big ), \end{aligned}$$
(A.13)
$$\begin{aligned} R_{S_i^*}&= \tfrac{1}{|J_{+}|} {{\,\mathrm{Diag}\,}}(\mathbb {1}_{J_{+}}) - \tfrac{1}{|J_{+}|^2} \mathbb {1}_{J_{+}} \mathbb {1}_{J_{+}}^\top \end{aligned}$$
(A.14)
for all $i \in I$, i.e., the Jacobian matrix simplifies to
$$\begin{aligned}&\tfrac{\partial F}{\partial S}(S^*) = {{\,\mathrm{Diag}\,}}(\varOmega \mathbb {1}_{m}) \otimes B_0 + \varOmega \otimes R_{S_1} \nonumber \\&\quad \text {with} \quad B_0 = \tfrac{1}{|J_{+}|} {{\,\mathrm{Diag}\,}}(\mathbb {1}_{J_{+}}) - \tfrac{1}{|J_{+}|} I_{n} -\tfrac{1}{|J_{+}|^2} \mathbb {1}_{J_{+}} \mathbb {1}_{J_{+}}^\top . \end{aligned}$$
(A.15)
Let $\{ (\lambda _i, w_i) \}_{i \in {\tilde{I}}} \subset {\mathbb {C}}\times {\mathbb {C}}^{m}$ be the set of all eigenpairs of $\varOmega $ indexed by ${{\widetilde{I}}}$, and let $\{ v_{1}, \dots , v_{|J_{+}|-1} \}$ be a basis of $\big \{ v \in {\mathbb {R}}^{n} :\langle v, \mathbb {1}_{J_{+}} \rangle = 0,\ {{\,\mathrm{supp}\,}}(v) \subseteq J_{+} \big \}$. Note that $|{\widetilde{I}}| < m$ if and only if $\varOmega $ is not diagonalizable. A short calculation shows
$$\begin{aligned} B_0 e_j&= -\tfrac{1}{|J_{+}|} e_j, \quad R_{S_1} e_j = 0, \quad \forall j \in J {\setminus }J_{+}, \end{aligned}$$
(A.16a)
$$\begin{aligned} B_0 \mathbb {1}_{J_{+}}&= -\tfrac{1}{|J_{+}|} \mathbb {1}_{J_{+}}, \quad R_{S_1} \mathbb {1}_{J_{+}} = 0, \end{aligned}$$
(A.16b)
$$\begin{aligned} B_0 v_j&= 0, \quad R_{S_1} v_j = \tfrac{1}{|J_{+}|} v_j, \quad \forall j \in \{ 1, \dots , |J_{+}|-1 \}. \end{aligned}$$
(A.16c)
Hence, the Jacobian has the following $m n - (m-|{\widetilde{I}}|)(|J_{+}|-1)$ eigenpairs:
$$\begin{aligned}&\Big ( -\tfrac{( \varOmega \mathbb {1}_{m} )_{i}}{|J_{+}|}, e_i \otimes e_j \Big ), \quad \forall j \in J {\setminus }J_{+}, \quad \forall i \in I, \end{aligned}$$
(A.17a)
$$\begin{aligned}&\Big ( -\tfrac{( \varOmega \mathbb {1}_{m} )_{i}}{|J_{+}|}, e_i \otimes \mathbb {1}_{J_{+}} \Big ), \quad \forall i \in I, \end{aligned}$$
(A.17b)
$$\begin{aligned}&\Big ( \tfrac{\lambda _i}{|J_{+}|}, w_i \otimes v_j \Big ), \quad \forall j \in \{1, \dots , |J_{+}|-1 \}, \quad \forall i \in {\widetilde{I}}. \end{aligned}$$
(A.17c)
If $|{\widetilde{I}}| = m$, we thus have a complete set of mn eigenpairs. If $|{\tilde{I}}| < m$, we may consider a diagonalizable perturbation ${\widetilde{\varOmega }}$ of $\varOmega $. By the same argument, we get a complete set of eigenpairs for the perturbed Jacobian matrix. Consequently, we obtain (2.18) by continuity of the spectrum.
(c)
We show that the real and imaginary parts of the corresponding eigenvector lie in the linear subspace
$$\begin{aligned} {\mathcal {T}}_{+} = {\mathcal {T}}_+(S^*) = \big \{ V \in {\mathbb {R}}^{m n} :\langle V_i, \mathbb {1}_{n} \rangle = 0,\ {{\,\mathrm{supp}\,}}(V_i) \subseteq {{\,\mathrm{supp}\,}}(S_i^*), \; \forall i \in I \big \} \end{aligned}$$
(A.18)
To this end, we show the two inclusions
$$\begin{aligned} {{\,\mathrm{im}\,}}R_{S^*} \subseteq {\mathcal {T}}_{+} \subseteq \ker B, \end{aligned}$$
(A.19)
where $R_{S^*}$ and B denote the block diagonal matrices
$$\begin{aligned} B = \begin{pmatrix} B_1 &{} &{} \\ &{} \ddots &{} \\ &{} &{} B_{m} \end{pmatrix}, \quad R_{S^*} = \begin{pmatrix} R_{S_1^*} &{} &{} \\ &{} \ddots &{} \\ &{} &{} R_{S_{m}^*} \end{pmatrix}. \end{aligned}$$
(A.20)
As for the first inclusion, we use the orthogonal projection onto ${\mathcal {T}}_{+}$ given by
$$\begin{aligned}&\varPi _{{\mathcal {T}}_{+}} = \begin{pmatrix} \varPi _{{\mathcal {T}}_{+},1} &{} &{} \\ &{} \ddots &{} \\ &{} &{} \varPi _{{\mathcal {T}}_{+},m} \end{pmatrix} \nonumber \\&\quad \text {with} \quad \varPi _{{\mathcal {T}}_{+},i} = {{\,\mathrm{Diag}\,}}( \mathbb {1}_{J_i} ) - \tfrac{1}{|J_i|} \mathbb {1}_{J_i} \mathbb {1}_{J_i}^\top , \quad J_i = {{\,\mathrm{supp}\,}}(S_i^*) \quad \forall i \in I.\qquad \end{aligned}$$
(A.21)
One can verify that $\varPi _{{\mathcal {T}}_{+}} R_{S^*} = R_{S^*}$ which implies ${{\,\mathrm{im}\,}}R_{S^*} \subseteq {{\,\mathrm{im}\,}}\varPi _{{\mathcal {T}}_{+}} = {\mathcal {T}}_{+}$, i.e. the first inclusion of (A.19).

As for the second inclusion, we have to take into account that $S^*$ is an equilibrium point, i.e. by (2.11)
$$\begin{aligned} (\varOmega S^*)_{ij} = \langle S_i^*, (\varOmega S^*)_i \rangle \quad \forall j \in {{\,\mathrm{supp}\,}}(S_i^*) \quad \forall i \in I. \end{aligned}$$
(A.22)
Since B is a block diagonal matrix, it suffices to examine each block
$$\begin{aligned} B_i = {{\,\mathrm{Diag}\,}}\big ( (\varOmega S^*)_i \big ) - \langle S_i^*, (\varOmega S^*)_i \rangle I_{n} - S^*_i (\varOmega S^*)_i^\top \end{aligned}$$
(A.23)
separately. Since
$$\begin{aligned}&B_i e_j = (\varOmega S^*)_{ij} e_j - \langle S_i^*, (\varOmega S^*)_i \rangle e_j - (\varOmega S^*)_{ij} S_i^* = -\langle S_i^*, (\varOmega S^*)_i \rangle S_i^*, \nonumber \\&\quad \forall j \in {{\,\mathrm{supp}\,}}(S_i^*) \end{aligned}$$
(A.24)
is independent of $j\in {{\,\mathrm{supp}\,}}(S_i^*)$, we get $B_i v = 0$ for any $v \in {\mathbb {R}}^{n}$ with $\langle v, \mathbb {1}_{n} \rangle = 0$ and ${{\,\mathrm{supp}\,}}(v) \subseteq {{\,\mathrm{supp}\,}}(S_i^*)$. This verifies the second inclusion of (A.19).

As a consequence of the two inclusions (A.19), any eigenvector V of $R_{S^*} (\varOmega \otimes I_{n})$ corresponding to a nonvanishing eigenvalue $\lambda \ne 0$ has a real and imaginary part lying in ${{\,\mathrm{im}\,}}R_{S^*} \subseteq {\mathcal {T}}_{+} \subseteq \ker B$. Therefore, $(\lambda , V)$ is also an eigenpair of $\tfrac{\partial F}{\partial S}(S^*) = B + R_{S^*} (\varOmega \otimes I_{n})$. It remains to show that
$$\begin{aligned} R_{S^*} (\varOmega \otimes I_{n}) = \begin{pmatrix} \omega _{11} R_{S_1^*} &{} \cdots &{} \omega _{1 m} R_{S_1^*} \\ \vdots &{} &{} \vdots \\ \omega _{m 1} R_{S_{m}^*} &{} \cdots &{} \omega _{m m} R_{S_{m}^*} \end{pmatrix} \end{aligned}$$
(A.25)
has at least one eigenvalue with positive real part. Since the trace
$$\begin{aligned} {\mathrm{tr}}\big ( R_{S^*} (\varOmega \otimes I_{n}) \big ) = \sum _{i \in I} \omega _{ii} {\mathrm{tr}}\big ( R_{S_i^*} \big ) = \sum _{i \in I} \underbrace{\omega _{ii} \sum _{j \in J} (S_{ij}^* - {S_{ij}^*}^2)}_{{\left\{ \begin{array}{ll} \ge 0, &{} \forall i \in I \\> 0, &{} \hbox { for some}\ i \in I \end{array}\right. }} > 0 \end{aligned}$$
(A.26)
is positive by assumption, the existence of such an eigenvalue is guaranteed.

$\square $

1.3 Proof of Theorem 1

The proof follows after two preparatory Lemmata. Let $\varLambda \subset \overline{{\mathcal {W}}}$ be the limit set of the orbit $\{ S(t) :t \ge 0 \}$, i.e.

$$\begin{aligned} \varLambda = \varLambda (S_0) = \Big \{ S^* \in \overline{{\mathcal {W}}} :\exists (t_k)_{k \in {\mathbb {N}}} \subset {\mathbb {R}}_{\ge 0}\;\text {with}\; t_k \rightarrow \infty ,\; S(t_k) \rightarrow S^* \Big \}.\quad \quad \end{aligned}$$

(A.27)

The set $\varLambda \ne \emptyset $ is non-empty since $\overline{{\mathcal {W}}}$ is compact.

Lemma 6

Every point $S^{*} \in \varLambda $ of the limit set (A.27) is an equilibrium point satisfying the condition of Proposition 5(a), which under assumption (2.22) reads

$$\begin{aligned} ({\widehat{\varOmega }} S^*)_{ij} = \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle \quad \forall j \in {{\,\mathrm{supp}\,}}S_i^* \quad \forall i \in I. \end{aligned}$$

(A.28)

Proof

The assertion follows from [19, Proposition 1] if the flow ${\dot{S}} = F(S)$ on $\overline{{\mathcal {W}}}$ admits a Lyapunov function $f :\overline{{\mathcal {W}}} \rightarrow {\mathbb {R}}$, i.e. $\frac{\mathrm {d}}{\mathrm {d}t}f(S(t)) = \langle \nabla f(S(t)), F(S(t)) \rangle \ge 0$, with equality only at an equilibrium.

The function $f :\overline{{\mathcal {W}}} \rightarrow {\mathbb {R}}$,

$$\begin{aligned} f(S) = \langle S, {{\widehat{\varOmega }}} S \rangle \end{aligned}$$

(A.29)

is a Lyapunov function for the S-flow (2.2), since

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} f\big (S(t)\big )&= 2 \langle {{\widehat{\varOmega }}} S, {\dot{S}} \rangle = 2 \sum _{i \in I} \langle ({\widehat{\varOmega }} S)_i, {\dot{S}}_i \rangle \nonumber \\&\overset{\langle \mathbb {1}_{n},\dot{S}_{i}\rangle = 0}{=} 2 \sum _{i \in I} \big \langle ({\widehat{\varOmega }} S)_i - \langle S_i, ({\widehat{\varOmega }} S)_i \rangle \mathbb {1}_{n}, {\dot{S}}_i \big \rangle \end{aligned}$$

(A.30a)

$$\begin{aligned}&\overset{(2.2), (2.22)}{=} \sum _{i \in I} \frac{2}{w_i} \sum _{j \in J} S_{ij} \Big ( ({\widehat{\varOmega }} S)_{ij} - \langle S_i, ({\widehat{\varOmega }} S)_i \rangle \Big )^2 \ge 0, \end{aligned}$$

(A.30b)

with equality only if S satisfies the equilibrium criterion (A.28). $\square $

Next, we introduce some additional notation. Let $S^* \in \varLambda $ be an equilibrium with $S(t_k) \rightarrow S^*$. The weighted Kullback-Leibler divergence is defined by

$$\begin{aligned} D_{\mathrm {KL}}^w(S^*, S)&= {\left\{ \begin{array}{ll} - \sum _{i \in I} w_i \sum _{j \in {{\,\mathrm{supp}\,}}(S_i^*)} S_{ij}^* \log \tfrac{S_{ij}}{S_{ij}^*}, &{}\text {if }{{\,\mathrm{supp}\,}}(S^*) \subseteq {{\,\mathrm{supp}\,}}(S), \\ \infty , &{} \text {else,} \end{array}\right. } \end{aligned}$$

(A.31a)

$$\begin{aligned}&= \sum _{i \in I} w_i D_{\mathrm {KL}}(S_i^*, S_i), \end{aligned}$$

(A.31b)

with weights $w \in {\mathbb {R}}_{>0}^{m}$ from (2.22) and the supports

$$\begin{aligned} {{\,\mathrm{supp}\,}}S&= \{ (i,j) \in I \times J :S_{ij} \ne 0 \}, \end{aligned}$$

(A.32a)

$$\begin{aligned} {{\,\mathrm{supp}\,}}S_i&= \{ j \in J :S_{ij} \ne 0 \}. \end{aligned}$$

(A.32b)

Analogously to [19], we consider the index sets

$$\begin{aligned} J_0(i)&= \big \{ j \in J :({\widehat{\varOmega }} S^*)_{ij} = \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle \big \}, \end{aligned}$$

(A.33a)

$$\begin{aligned} J_-(i)&= \big \{ j \in J :({\widehat{\varOmega }} S^*)_{ij} > \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle \big \}, \end{aligned}$$

(A.33b)

$$\begin{aligned} J_+(i)&= \big \{ j \in J :({\widehat{\varOmega }} S^*)_{ij} < \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle \big \} \end{aligned}$$

(A.33c)

and define the continuous functions $Q :\overline{{\mathcal {W}}} \rightarrow {\mathbb {R}}_{\ge 0}$ and $V :\overline{{\mathcal {W}}} \rightarrow {\mathbb {R}}_{\ge 0} \cup \{ \infty \}$ by

$$\begin{aligned} Q :\overline{{\mathcal {W}}}&\rightarrow {\mathbb {R}}_{\ge 0},\qquad Q(S) = \sum _{i \in I} w_i \sum _{j \in J_+(i)} S_{ij}, \end{aligned}$$

(A.34a)

$$\begin{aligned} V :\overline{{\mathcal {W}}}&\rightarrow {\mathbb {R}}_{\ge 0} \cup \{ \infty \},\qquad V(S) = D_{\mathrm {KL}}^w(S^*, S) + 2 Q(S). \end{aligned}$$

(A.34b)

The equilibrium criterion (A.28) implies

$$\begin{aligned} {{\,\mathrm{supp}\,}}(S_i^*) \subseteq J_0(i) \qquad \text {and} \qquad J_-(i), J_+(i) \subseteq J \setminus {{\,\mathrm{supp}\,}}(S_i^*) \qquad \forall i \in I, \end{aligned}$$

(A.35)

i.e. $V(S^*) = Q(S^*) = 0$. Using the Lyapunov function (A.29), we have the following.

Lemma 7

(cf. [19, Proposition 2]) There exists $\varepsilon > 0$ such that, if $\Vert S(t) - S^* \Vert < \varepsilon $ and $f(S(t)) < f(S^*)$ with f given in (A.29), then $\frac{\mathrm {d}}{\mathrm {d}t} V(S(t)) < 0$.

Proof

Since $S(t) \in {\mathcal {W}}$ for all $t \ge 0$, we have $D_{\mathrm {KL}}^w(S^*, S(t)) < \infty $. Hence

$$\begin{aligned}&\frac{\mathrm {d}}{\mathrm {d}t} D_{\mathrm {KL}}^w(S^*, S(t)) \nonumber \\&{\mathop {=}\limits ^{\tiny \mathrm{(A.31)}}} -\sum _{i \in I} w_i \sum _{j \in {{\,\mathrm{supp}\,}}(S_i^*)} S_{ij}^* \frac{{\dot{S}}_{ij}}{S_{ij}} \end{aligned}$$

(A.36a)

$$\begin{aligned}&{\mathop {=}\limits ^{\tiny (2.2)}} -\sum _{i \in I} \sum _{j \in {{\,\mathrm{supp}\,}}(S_i^*)} S_{ij}^* \big ( ({\widehat{\varOmega }} S)_{ij} - \langle S_i, ({\widehat{\varOmega }} S)_i \rangle \big ) \end{aligned}$$

(A.36b)

$$\begin{aligned}&= \langle S, {\widehat{\varOmega }} S \rangle - \langle S^*, {\widehat{\varOmega }} S \rangle {\mathop {=}\limits ^{\tiny (2.2)}} \langle S, {\widehat{\varOmega }} S \rangle - \langle S, {\widehat{\varOmega }} S^* \rangle \end{aligned}$$

(A.36c)

$$\begin{aligned}&= \langle S, {\widehat{\varOmega }} S \rangle - \langle S^*, {\widehat{\varOmega }} S^* \rangle + \langle S^*, {\widehat{\varOmega }} S^* \rangle - \langle S, {\widehat{\varOmega }} S^* \rangle \end{aligned}$$

(A.36d)

$$\begin{aligned}&{\mathop {=}\limits ^{\sum _{j \in J} S_{ij}=1}} \langle S, {\widehat{\varOmega }} S \rangle - \langle S^*, {\widehat{\varOmega }} S^* \rangle + \sum _{i \in I} \sum _{j \in J} S_{ij} \Big ( \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle - ({\widehat{\varOmega }} S^*)_{ij} \Big ) \end{aligned}$$

(A.36e)

$$\begin{aligned}&{\mathop {=}\limits ^{\tiny (\mathrm{A.33})}} \underbrace{\langle S, {\widehat{\varOmega }} S \rangle - \langle S^*, {\widehat{\varOmega }} S^* \rangle }_{=f(S)-f(S^*)< 0} + \sum _{i \in I} \sum _{j \in J_-(i)} S_{ij} \Big ( \underbrace{ \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle - ({\widehat{\varOmega }} S^*)_{ij} }_{< 0} \Big ) \end{aligned}$$

(A.36f)

$$\begin{aligned}&\quad + \sum _{i \in I} \sum _{j \in J_+(i)} S_{ij} \Big ( \underbrace{ \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle - ({\widehat{\varOmega }} S^*)_{ij} }_{> 0} \Big ). \end{aligned}$$

(A.36g)

We now focus on Q(S) (A.34a) that is added to the KL-divergence to define V(S) in (A.34b). We have for each $j \in J_+(i)$

$$\begin{aligned} \langle S_i, ({\widehat{\varOmega }} S)_i \rangle - ({\widehat{\varOmega }} S)_{ij} \quad \longrightarrow \quad \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle - ({\widehat{\varOmega }} S^*)_{ij} > 0 \qquad \text {as} \quad S \rightarrow S^*.\nonumber \\ \end{aligned}$$

(A.37)

Since the limit is positive, there exists $\varepsilon > 0$ such that $\Vert S - S^* \Vert < \varepsilon $ implies

$$\begin{aligned} \langle S_i, ({\widehat{\varOmega }} S)_i \rangle - ({\widehat{\varOmega }} S)_{ij} \ge \frac{3}{4} \Big ( \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle - ({\widehat{\varOmega }} S^*)_{ij} \Big ), \quad \forall j \in J_+(i), \quad \forall i \in I.\nonumber \\ \end{aligned}$$

(A.38)

Consequently,

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} Q\big (S(t)\big )&\overset{\begin{array}{c} (2.2) \\ (\mathrm{A.34a}) \end{array}}{=} \sum _{i \in I} \sum _{j \in J_+(i)} S_{ij} \Big ( ({\widehat{\varOmega }} S)_{ij} - \langle S_i, ({\widehat{\varOmega }} S)_i \rangle \Big ) \end{aligned}$$

(A.39a)

$$\begin{aligned}&\le - \frac{3}{4} \sum _{i \in I} \sum _{j \in J_+(i)} S_{ij} \Big ( \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle - ({\widehat{\varOmega }} S^*)_{ij} \Big ). \end{aligned}$$

(A.39b)

Substituting (A.36) and (A.39) into (A.34b), we finally obtain

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} V(S(t))&= \frac{\mathrm {d}}{\mathrm {d}t} D_{\mathrm {KL}}^w(S^*, S(t)) + 2 \frac{\mathrm {d}}{\mathrm {d}t} Q(S(t)) \nonumber \\&\le \langle S, {\widehat{\varOmega }} S \rangle - \langle S^*, {\widehat{\varOmega }} S^* \rangle + \sum _{i \in I} \sum _{j \in J_-(i)} S_{ij} \Big ( \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle - ({\widehat{\varOmega }} S^*)_{ij} \Big ) \nonumber \\&\quad - \frac{1}{2} \sum _{i \in I} \sum _{j \in J_+(i)} S_{ij} \Big ( \langle S_i^*, ({\widehat{\varOmega }} S^*)_i \rangle - ({\widehat{\varOmega }} S^*)_{ij} \Big ) \end{aligned}$$

(A.40a)

$$\begin{aligned}&< 0. \end{aligned}$$

(A.40b)

$\square $

Proof of Theorem 1

Let $S^{*} \in \varLambda $ be any equilibrium point and $(t_{k})_{k \in {\mathbb {N}}}$ a corresponding sequence due to (A.27). We show that $D_{\mathrm {KL}}^w(S^*, S(t)) \rightarrow 0$ for $t \rightarrow \infty $, which is equivalent to the assertion $S(t) \rightarrow S^*$ to be shown.

Choose $\varepsilon > 0$ according to the Lemma 7. There exists $\varepsilon _1 > 0$ such that the (relatively) open set $U = \{ S \in \overline{{\mathcal {W}}} :V(S) < \varepsilon _1 \}$ is contained in $\{ S \in \overline{{\mathcal {W}}} :\Vert S - S^* \Vert < \varepsilon \}$. The function $t \mapsto f(S(t))$ is strictly increasing unless the orbit $\{ S(t) :t \ge 0 \}$ consists of an equilibrium. Hence, $f(S(t)) < f(S^*)$ for all $t \ge 0$. Since ${S(t_k) \rightarrow S^*}$, we get $S(t_{k_0}) \in U$ for some $k_0 \in {\mathbb {N}}$. Since then $t \mapsto V(S(t))$ is decreasing, i.e. ${V(S(t))< V(S(t_{k_0})) < \varepsilon _1}$ for all $t > t_{k_0}$, and because V(S(t)) is decreasing and $V(S(t_k)) \rightarrow V(S^*) = 0$, we get

$$\begin{aligned} 0 \le D_{\mathrm {KL}}^w (S^*, S(t)) \le V(S(t)) \rightarrow 0 \quad \text {for} \quad t \rightarrow \infty , \end{aligned}$$

(A.41)

which implies $S(t) \rightarrow S^*$ for $t \rightarrow \infty $. $\square $

1.4 Proof of Proposition 12

Proof

For any $t \in {\mathbb {N}}_{0}$, we set

$$\begin{aligned} Y^{(t)}(\tau )&= F_{\tau }(S^{(t)}) \overset{(3.1)}{=} \exp _{S^{(t)}}(\tau \varOmega S^{(t)}) \end{aligned}$$

(A.42a)

and thus have

$$\begin{aligned} Y^{(t)}(h)&= S^{(t+1)},\qquad Y^{(t)}(0) = S^{(t)},\qquad Y^{(0)}(0)= S_{0}. \end{aligned}$$

(A.42b)

Formula (1.12) implies

$$\begin{aligned} \dot{Y}^{(t)}(\tau ) = \frac{\mathrm {d}}{\mathrm {d}\tau }\exp _{S^{(t)}}(\tau \varOmega S^{(t)}) = R_{Y^{(t)}(\tau )}(\varOmega S^{(t)}) = G(Y^{(t)}), \end{aligned}$$

(A.43)

where we defined the shorthand $G(Y^{(t)})$.

Now, with S(t) solving the S-flow (2.2), we estimate with any $T \ge t h$,

$$\begin{aligned}&S(T) - Y^{(t)}(T-th) \end{aligned}$$

(A.44a)

$$\begin{aligned}&\quad = S(t h)-Y^{(t)}(0) + \int _{0}^{T-t h}\frac{\mathrm {d}}{\mathrm {d}\tau }\big (S(th+\tau )-Y^{(t)}(\tau )\big )\mathrm {d}\tau \end{aligned}$$

(A.44b)

$$\begin{aligned}&{\mathop {=}\limits ^{\tiny (\mathrm{A.43}),(2.2)}} S(th)-Y^{(t)}(0) + \int _{0}^{T-t h} \Big (F\big (S(t h+\tau )\big ) - G\big (Y^{(t)}(\tau )\big )\Big )\mathrm {d}\tau \end{aligned}$$

(A.44c)

$$\begin{aligned}&= S(th)-Y^{(t)}(0) + \int _{0}^{T-t h} \Big (F\big (S(t h+\tau )\big ) - F\big (Y^{(t)}(\tau )\big )\Big )\mathrm {d}\tau \nonumber \\&\quad + \int _{0}^{T-t h} \Big (F\big (Y^{(t)}(\tau )\big ) - G\big (Y^{(t)}(\tau )\big )\Big )\mathrm {d}\tau \end{aligned}$$

(A.44d)

$$\begin{aligned}&= S(th)-Y^{(t)}(0) + \int _{0}^{T-t h} \Big (F\big (S(t h+\tau )\big ) - F\big (Y^{(t)}(\tau )\big )\Big )\mathrm {d}\tau \nonumber \\&\quad + \int _{0}^{T-t h} \int _{0}^{\tau } \frac{\mathrm {d}}{\mathrm {d}\tau }\Big (F\big (Y^{(t)}(\tau )\big ) - G\big (Y^{(t)}(\tau )\big )\Big )\Big |_{\tau =\lambda }\mathrm {d}\lambda \mathrm {d}\tau \end{aligned}$$

(A.44e)

$$\begin{aligned}&= S(th) - Y^{(t)}(0) + \int _{0}^{T-t h} \Big (F\big (S(t h+\tau )\big ) - F\big (Y^{(t)}(\tau )\big )\Big )\mathrm {d}\tau \nonumber \\&\quad + \int _{0}^{T-t h} \int _{0}^{\tau } \Big (dF\big (Y^{(t)}(\lambda )\big )\big [G\big (Y^{(t)}(\lambda )\big )\big ]\nonumber \\&\qquad \qquad \qquad \qquad - dG\big (Y^{(t)}(\lambda )\big )\big [G\big (Y^{(t)}(\lambda )\big )\big ] \Big )\mathrm {d}\lambda \mathrm {d}\tau . \end{aligned}$$

(A.44f)

By assumption, F given by (2.2) is $C^{1}$, as is G given by (A.43) which has the same form. Consequently, regarding the integrand of the last integral, since $\overline{\mathcal {W}}$ is compact there exists a constant C such that

$$\begin{aligned} \Big \Vert dF\big (Y^{(t)}(\lambda )\big )\big [G\big (Y^{(t)}(\lambda )\big )\big ] - dG\big (Y^{(t)}(\lambda )\big )\big [G\big (Y^{(t)}(\lambda )\big )\big ] \Big \Vert \le C,\qquad \forall Y^{(t)} \in \overline{{\mathcal {W}}}.\nonumber \\ \end{aligned}$$

(A.45)

Hence,

$$\begin{aligned}&\Vert S(T) - Y^{(t)}(T-th)\Vert \end{aligned}$$

(A.46a)

$$\begin{aligned}&\quad \le \Vert S(th) - Y^{(t)}(0)\Vert + \int _{0}^{T-t h} \big \Vert F\big (S(t h+\tau )\big ) - F\big (Y^{(t)}(\tau )\big )\big \Vert \mathrm {d}\tau \nonumber \\&\qquad + C \int _{0}^{T-t h} \int _{0}^{\tau }\mathrm {d}\lambda \mathrm {d}\tau \end{aligned}$$

(A.46b)

$$\begin{aligned}&\quad \le \Vert S(th) - Y^{(t)}(0)\Vert + L \int _{0}^{T-t h} \Vert S(t h+\tau )-Y^{(t)}(\tau )\Vert \mathrm {d}\tau \nonumber \\&\qquad + \frac{C}{2}(T-t h)^{2} \end{aligned}$$

(A.46c)

Applying Gronwall’s inequality [29, Lemma 2.7] yields

$$\begin{aligned} \Vert S(T) - Y^{(t)}(T-th)\Vert \le \Big (\Vert S(th) - Y^{(t)}(0)\Vert + \frac{C}{2}(T-t h)^{2}\Big ) e^{L(T-t h)} \end{aligned}$$

(A.47)

and setting $T=(t+1) h$

$$\begin{aligned} \big \Vert S\big ((t+1)h\big )-Y^{(t)}(h)\big \Vert&\overset{(\mathrm{A.42b})}{=} \big \Vert S\big ((t+1)h\big )-S^{(t+1)}\big \Vert \end{aligned}$$

(A.48a)

$$\begin{aligned}&\overset{(\mathrm{A.42b})}{\le } \Big (\Vert S(th) - S^{(t)}\Vert + \frac{C h^{2}}{2}\Big ) e^{L h}. \end{aligned}$$

(A.48b)

Thus,

$$\begin{aligned} \Vert S(t h)-S^{(t)}\Vert&\le \Big (\Vert S\big ((t-1)h\big ) - S^{(t-1)}\Vert + \frac{Ch^{2}}{2}\Big ) e^{L h} \end{aligned}$$

(A.49a)

$$\begin{aligned}&\le \Big (\Big (\Vert S\big ((t-2)h\big ) - S^{(t-2)}\Vert + \frac{Ch^{2}}{2}\Big ) e^{L h} + \frac{Ch^{2}}{2}\Big ) e^{L h}\end{aligned}$$

(A.49b)

$$\begin{aligned}&= \big \Vert S\big ((t-2)h\big ) - S^{(t-2)}\big \Vert e^{2\,L h} + \frac{Ch^{2}}{2}\big (e^{2\,L h} + e^{L y}\big )\end{aligned}$$

(A.49c)

$$\begin{aligned}&= \big \Vert S\big ((t-2)h\big ) - S^{(t-2)}\big \Vert e^{2\,L h} + \frac{Ch^{2}}{2}\big (e^{2\,L h} + e^{L h}\big ) \end{aligned}$$

(A.49d)

$$\begin{aligned}&= \frac{Ch^{2}}{2}\Big (\frac{e^{(t+1) L h}-1}{e^{L h}-1}-1\Big ) = \frac{Ch^{2}}{2}e^{L h}\frac{e^{t L h}-1}{e^{L h}-1} \end{aligned}$$

(A.49e)

and using $e^{L h}\ge 1+L h$

$$\begin{aligned}&\le \frac{C}{2\,L} h e^{(t+1) L h},\qquad \forall t \in {\mathbb {N}}. \end{aligned}$$

(A.49f)

$\square $

Stability statements for dynamical systems

We state basic results from the literature which are used to analyze the stability of the equilibria of the S-flow in Sect. 2.3.

Theorem 3

Let $x^*$ be an equilibrium point of the system ${\dot{x}}(t) = F(x(t))$ with $F \in C^1(U, {\mathbb {R}}^n)$.

(a)
If all eigenvalues of the Jacobian matrix $\tfrac{\partial F}{\partial x}(x^*)$ have negative real part, then $x^*$ is exponentially stable.
(b)
If the Jacobian matrix $\tfrac{\partial F}{\partial x}(x^*)$ has an eigenvalue with positive real part, then $x^*$ is unstable.

Statement (a) can be found in [29, Theorem 6.10]. For statement (b) we refer to [24, Proposition 6.2.1]. These stability criteria concern flows ${\dot{x}}(t) = F(x(t))$ on an open subset $U \subseteq {\mathbb {R}}^n$.

Since we regard the S-flow as a flow on the compact set $\overline{{\mathcal {W}}}$, we need a few additional arguments. In [24, Section 6.8.4], a direct proof of theorem 3(b) is sketched. Since we employ techniques that are used in that proof for our own analysis, we summarize the main statements in the following proposition for the reader’s convenience. Informally, the proposition states that, if $\tfrac{\partial F}{\partial x}(x^*)$ has an eigenvalue with positive real part, then there exists an open truncated cone at $x^*$ where the flow ${\dot{x}} = F(x)$ is repelled from $x^*$.

Proposition 16

Let $x^*$ be an equilibrium point of ${\dot{x}}(t) = F(x(t))$ with $F \in C^1(U, {\mathbb {R}}^n)$. Then

(a)
There exist a sufficiently small $\varepsilon _1 > 0$ and a (real) similarity transform
$$\begin{aligned} V^{-1} \tfrac{\partial F}{\partial x}(x^*) V = \begin{pmatrix} A_{\text {sc}} &{} 0 \\ 0 &{} A_{\text {u}} \end{pmatrix} = A, \end{aligned}$$
(B.1)
such that
1. (i)
  $\mathfrak {R}(\lambda ) \le 0$ for all eigenvalues $\lambda $ of $A_{\text {sc}}$,
2. (ii)
  $\mathfrak {R}(\lambda ) > 0$ for all eigenvalues $\lambda $ of $A_{\text {u}}$,
3. (iii)
  $\langle y_{\text {sc}}, A_{\text {sc}} y_{\text {sc}} \rangle \le \tfrac{\varepsilon _1}{4} \Vert y_{\text {sc}} \Vert _2^2$,
4. (iv)
  $\langle y_{\text {u}}, A_{\text {u}} y_{\text {u}} \rangle \ge \varepsilon _1 \Vert y_{\text {u}} \Vert _2^2$.
(b)
Suppose $\tfrac{\partial F}{\partial x}(x^*)$ has at least one eigenvalue $\lambda $ with $\mathfrak {R}(\lambda ) > 0$. Considering an affine coordinate transform $y=V^{-1}(x - x^*)$ with $V \in \mathrm {GL}_n({\mathbb {R}})$ from (a), the resulting flow
$$\begin{aligned} {\dot{y}} = G(y) = V^{-1} F(Vy + x^*), \end{aligned}$$
(B.2)
which has the equilibrium $y^* = 0$ with $\tfrac{\partial G}{\partial y}(0) = V^{-1} \tfrac{\partial F}{\partial x}(x^*) V = A$, has the following property. There exist $\eta > 0$, $\delta > 0$ and $\varepsilon > 0$, such that if the flow starts at some point in the open truncated cone
(B.3)
then the solution will not cross the conical portion of $\partial U_{\eta , \delta }$, i.e.
$$\begin{aligned} \big \{ y \in {\mathbb {R}}^{n} :\Vert y_{\text {sc}} \Vert _2^2 = \eta \Vert y_{\text {u}} \Vert _2^2,\ \Vert y \Vert _2 < \delta \big \}, \end{aligned}$$
(B.4)
and it fulfills $\Vert y(t) \Vert \ge \Vert y(0) \Vert e^{\varepsilon t}$ as long as $y(t) \in U_{\eta ,\delta }$, i.e., y(t) leaves the ball $B_{\delta }(0)$ at some time point. Especially, the equilibrium $y^* = 0$ is unstable. This property is accordingly transferred to the equilibrium $x^*$ of ${\dot{x}}(t) = F(x(t))$ using $x(t) = V y(t) + x^*$.

We note that if $\tfrac{\partial F}{\partial x}(x^*)$ is diagonalizable with real eigenvalues then the similarity transform in proposition 16(a) is just the diagonalization. In general, if $v \in {\mathbb {C}}^n$ is an eigenvector of $\tfrac{\partial F}{\partial x}(x^*)$ corresponding to an eigenvalue $\lambda \in {\mathbb {C}}$ with $\mathfrak {R}(\lambda ) > 0$ and $V \in \mathrm {GL}_n({\mathbb {R}})$ is given by Proposition 16(a), then $V^{-1} \mathfrak {R}(v) = \left( {\begin{array}{c}0\\ y_{\text {u}}\end{array}}\right) $ and $V^{-1} \mathfrak {I}(v) = \left( {\begin{array}{c}0\\ {\tilde{y}}_{\text {u}}\end{array}}\right) $.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Zern, A., Zeilmann, A. & Schnörr, C. Assignment flows for data labeling on graphs: convergence and stability. Info. Geo. 5, 355–404 (2022). https://doi.org/10.1007/s41884-021-00060-8

Download citation

Received: 27 February 2020
Revised: 07 October 2021
Accepted: 31 October 2021
Published: 18 November 2021
Issue Date: November 2022
DOI: https://doi.org/10.1007/s41884-021-00060-8

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Assignment flows for data labeling on graphs: convergence and stability

Abstract

Similar content being viewed by others

On the Geometric Mechanics of Assignment Flows for Metric Data Labeling

On the geometric mechanics of assignment flows for metric data labeling

A Variational Perspective on the Assignment Flow

1 Introduction

1.1 Problem and motivation

1.2 Assignment flow

1.3 Objectives

1.4 Related work

1.5 Organization

1.6 Basic notation

2 Properties of the assignment flow

2.1 Representation of the assignment flow

Proposition 1

Remark 1

Proposition 2

Proof

2.2 Existence and uniqueness

Proposition 3

Proof

Remark 2

Proposition 4

Proof

Example 1

2.3 Convergence to equilibria and stability

2.3.1 Characterization of equilibria and their stability

Proposition 5

Proof

Remark 3

Lemma 1

Proof

Proposition 6

Proof

Corollary 1

Proof

Remark 4

2.3.2 Convergence of the S-flow to equilibria

Theorem 1

Proof

Proposition 7

Proof

Remark 5

Theorem 2

Proof

Corollary 2

2.3.3 Basins of attraction

Proposition 8

Proof

Proposition 9

Proof

Corollary 3

Proof

2.4 Convergence properties of the linear assignment flow

Lemma 2

Proof

Lemma 3

Proposition 10

Proposition 11

Proof

Remark 6

Lemma 4

Proof

Corollary 4

Remark 7

3 Discretization, numerical examples and discussion

3.1 Discretization, geometric integration

Proposition 12

Proof

Proposition 13

Proof

3.2 Numerical examples, discussion

3.2.1 Vanishing diagonal averaging parameters

Example 2

3.2.2 Constructing \(3\times 3\) systems with various asymptotic properties

Lemma 5

Proof

Proposition 14

Proof