Mean curvature, threshold dynamics, and phase field theory on finite graphs

In the continuum, close connections exist between mean curvature flow, the Allen-Cahn (AC) partial differential equation, and the Merriman-Bence-Osher (MBO) threshold dynamics scheme. Graph analogues of these processes have recently seen a rise in popularity as relaxations of NP-complete combinatorial problems, which demands deeper theoretical underpinnings of the graph processes. The aim of this paper is to introduce these graph processes in the light of their continuum counterparts, provide some background, prove the first results connecting them, illustrate these processes with examples and identify open questions for future study. We derive a graph curvature from the graph cut function, the natural graph counterpart of total variation (perimeter). This derivation and the resulting curvature definition differ from those in earlier literature, where the continuum mean curvature is simply discretized, and bears many similarities to the continuum nonlocal curvature or nonlocal means formulation. This new graph curvature is not only relevant for graph MBO dynamics, but also appears in the variational formulation of a discrete time graph mean curvature flow. We prove estimates showing that the dynamics are trivial for both MBO and AC evolutions if the parameters (the time-step and diffuse interface scale, respectively) are sufficiently small (a phenomenon known as"freezing"or"pinning") and also that the dynamics for MBO are nontrivial if the time step is large enough. These bounds are in terms of graph quantities such as the spectrum of the graph Laplacian and the graph curvature. Adapting a Lyapunov functional for the continuum MBO scheme to graphs, we prove that the graph MBO scheme converges to a stationary state in a finite number of iterations. Variations on this scheme have recently become popular in the literature as ways to minimize (continuum) nonlocal total variation.


Introduction
Motion by mean curvature and flows involving mean curvature in general appear in many important continuum models, including models coming from materials science [81,105], fluid dynamics [58], and combustion [111,88]. All such models involve a front propagating with a velocity depending on the mean curvature of the front. Recently, there has been an increasing interest in using ideas from continuum PDEs (related to mean curvature) in discrete applications such as image analysis, machine learning and data clustering [9,107,74,51,60]. This paper initiates a systematic study of the definition of mean curvature for vertex sets of an arbitrary graph G = (V, E). We examine the effectiveness of the algorithms in the recent papers mentioned above and on how they may be improved. The graphs considered are arbitrary graphs and are not necessarily obtained as the discretization of a continuum problem, so our perspective is only parallel to one that is purely motivated by numerical analysis. In particular, we do not assume an embedding of the graph in a low dimensional space.
Of course, the various definitions of curvature in the (usual) continuum setting (see Appendix A for a brief review and some references) motivate and inform our approach to defining the curvature of a vertex set S ⊂ V using the discrete total variation norm and the discrete divergence of a "normal" edge flow. Since they are closely related to questions of mean curvature, the Allen-Cahn equation and the MBO scheme for arbitrary graphs G = (V, E) arise naturally in the present investigation. Theoretical and numerical examples are used to highlight possible connections between all these concepts, leading to a number of open questions given in Section 7. Graphs Laplacians, Allen-Cahn, and MBO. Graph Laplacians are the central objects of study in spectral graph theory [29]. These graph operators share many properties with their continuum counterparts. The Allen-Cahn equation on the graph V is defined in terms of the graph Laplacian, ∆, and any (typically bistable quartic) potential, W . One considers a phase field, u : V × R + → R, solving the differential equation,u This nonlinear equation has received greater attention recently, spurred by some of the applications mentioned above. The graph Allen-Cahn equation was introduced in the context of data classification in [9] and, in a number of examples, was shown to be both accurate and 4 efficient. As is well known, the continuum Allen-Cahn equation is the L 2 gradient flow associated to the Ginzburg-Landau functional. This is also true in the graph setting. In [107] it was shown that the graph Ginzburg-Landau functional Γ-converges to the graph cut objective functional on graphs, if the characteristic length scale ε goes to zero. Moreover, a relationship between the graph cut functional and the continuum total variation functional was given. At the same time, the continuum Allen-Cahn solution is known to converge to mean curvature flow, when ε → 0 [6]. Furthermore, the mean curvature is directly related to the first variation of the total variation functional. In this paper we therefore study the graph Allen-Cahn equation and make connections between it and a graph cut derived 'graph curvature'. The third ingredient in this paper is the threshold dynamics or Merriman-Bence-Osher (MBO) algorithm on graphs. Its continuum counterpart was introduced in [75,76] and consists of iteratively solving the heat equation for a short time, τ , and thresholding the result to an indicator function. It is known that, for short diffusion times τ , this approximates mean curvature flow [46,5]. In this paper we therefore also study the connections between the graph MBO scheme, the graph AC equation, and graph curvature.
In a recent series of papers [41,103,36,32,104,33,40,35, 34] Elmoataz et al. study partial differential equations and front propagation on graphs, mainly from a numerical point of view. In these papers the 1-Laplacian on a graph is used as curvature, which differs from our approach. We use the anisotropic graph total variation instead of the isotropic total variation (see Section 2 for definitions), since [107] suggests that is the natural total variation on graphs.
A common obstacle, when transferring results and intuitions from the continuum to graphs, is the (implicit) lower bound on the accessible length scales on a graph. We show in Theorem 5.3 and Theorem 4.2, that if ε or τ is too small, then the Allen-Cahn equation exhibits "freezing" (or "pinning"), or the MBO scheme is stationary, respectively. Hence the interesting dynamics happen at small but positive, ε or τ , rather than in the limits as ε → 0 or τ → 0. Related is the lack of a chain rule for derivatives on graphs 1 , which can be traced back to the absence of 'infinitesimal' length scales on a graph. As a consequence, the level set approach, which has proven very successful in describing continuum mean curvature flow, is not independent of the level set function on a graph.
New results. The finite spectral radius of the Laplacian is used to derive explicit bounds on the parameters for both threshold dynamics (MBO) and the graph Allen-Cahn equation that guarantee "freezing" or "pinning" of the initial phase, a phenomenon that has been observed in numerical simulations and is well known for discretizations of the continuum processes [75] [77,Section 4.4].
In the opposite direction, an argument based on the comparison principle is used to obtain a lower bound for the MBO time step that guarantees that a specific node of the phase changes in a single MBO iteration. This bound is given in terms of a new notion of mean curvature for general graphs, and as such, it is a "local" quantity (as opposed to one coming from spectral data). Such local bounds may be of use in developing adaptive time stepping schemes that complement the (spectral) adaptive schemes, such as those developed for discretizations of the continuum mean curvature flow [91,113]. In this sense, introducing the graph mean curvature and highlighting its connection with subjects in continuum PDE (MBO, Ginzburg-Landau, and nonlocal mean curvature) and graph theory (graph cuts, connectivity) are the main contributions of this work.
The results in Sections 4 and 5 and the numerical evidence and explicit examples in Section 6 suggest several open questions about the graph MBO scheme, the graph Allen-Cahn equation, and graph mean curvature, which are discussed in Section 7. These are interesting questions for future work.
Outline. In Section 2 the relevant graph based calculus is introduced, setting the notation for the rest of the paper. In particular, the graph Laplacian and its basic properties are discussed. Section 3 discusses curvature and mean curvature flow on a graph. Sections 4 and 5 discuss the MBO scheme and Allen-Cahn equation on graphs, respectively, and sufficient conditions are given on the parameters to guarantee freezing or pinning of the initial conditions. Section 6 explores the graph processes and concepts introduced in these previous three sections through theoretical and computational examples. Finally, we conclude in Section 7, with a discussion and a few open questions based on the new estimates and examples from previous sections. In Appendix A, we make some remarks on the continuum mean curvature flow. Appendix B discusses some similarities between the graph Laplacian, the graph 1-Laplacian, and the graph curvature. 6

Setup
We work on a finite 2 , undirected graph G = (V, E) with vertex set 3 V = {i} n i=1 and edge set E ⊂ V 2 . The graph is weighted; each edge (i, j) ∈ E, incident on nodes i and j, is assigned a weight ω ij > 0. Since the graph is undirected, (i, j) is identified with (j, i) and ω ij = ω ji . To simplify notation we extend ω ij to be zero for all (i, j) ∈ V 2 which do not correspond to an edge. The degree of node i is d i := j∈V ω ij . Denote the maximal and minimal degrees by d + := max i∈V d i and d − := min i∈V d i . We assume that G has no isolated nodes, i.e., d − > 0. For each i ∈ V we then have a nonempty set of neighbors N i := {j ∈ V : ω ij > 0}. We also assume G has no self-loops, i .e., ω ii = 0. In particular i ∈ N i .
Let V be the space of all functions V → R and E the space of all skew-symmetric 4 functions E → R. Again to simplify notation, we extend each ϕ ∈ E to a function ϕ : V 2 → R by setting ϕ ij = 0 if (i, j) ∈ E. As justified in earlier work [107] we introduce the following inner products and operators for parameters q ∈ [1/2, 1] and r ∈ [0, 1]: Note that in the sum in ϕ, φ E the indices i and j both run over all nodes. The edges (i, j) and (j, i) are counted separately, hence the correction factor 1 2 . Note that the powers 2q − 1 and 1 − q in the E inner product and 'dot product' and in the gradient, are zero for the admissible choices q = 1 2 and q = 1 respectively. In these cases we define ω 0 ij = 0 whenever ω ij = 0, so as not to make the inner product, 'dot product', or gradient, nonlocal on the graph. The inner products on V and E 5 are analogous to a weighted L 2 inner products in the 2 In this paper, we are working with a fixed graph G with a finite number of nodes. In no sense are we considering a sequence of graphs or taking a "continuum limit". 3 We will use the terms "vertex" and "node" interchangeably. 4 The necessity of skew-symmetry may not be obvious at this point, but it is a common requirement for consistency of certain concepts in discrete calculus, see e.g., [49,54,56,21]. See also Remark 2.1. 5 Note that ·, · E is indeed an inner product on the space of (skew-symmetric) functions E → R, but not for the space of functions V 2 → R, because for those functions the 'inner product' can be zero for nontrivial functions. continuum case, while the 'dot product' inner product (ϕ · φ) i is analogous to a weighted dot product between vector(field)s (without the integration of the L 2 inner product). A direct computation shows that div and ∇ are adjoints with respect to ·, · V and ·, · E , namely for u ∈ V and φ ∈ E we have This leads to the following associated norms, Laplacians, set measures, and total variation functionals: • Inner product norms, The norm corresponding to the dot product |ϕ| i := (ϕ · ϕ) i .
Note that |φ| ∈ V. • The Dirichlet energy 1 2 • The graph Laplacian ∆ := div • ∇ : It is worth noting that the sign convention for the graph Laplacian is opposite to that used for the continuum Laplacian in most of the PDE literature (in particular, the graph Laplacian ∆ is a positive semidefinite operator). When r = 0, ∆ is referred to as the unnormalized weighted graph Laplacian. When r = 1, ∆ is referred to as the asymmetric normalized graph Laplacian or random walk graph Laplacian. Another Laplacian, often encountered in the spectral graph 6 To justify these definitions and convince ourselves that there should be no ω ij or d i included in the maximum norms we define ϕ p E,p := 1 2 i,j∈V ϕ p ij ω 2q−1 ij . Adapting the proofs in the continuum case, e.g., [1, Theorems 2.3 and 2.8], to the graph situation we can prove a Hölder inequality ϕφ E,1 ≤ ϕ E,p φ E,p for 1 < p, p < ∞ such that 1 p + 1 p = 1, an embedding theorem of the form 8 theory literature, is the symmetric normalized graph Laplacian. This one falls outside the scope of the current setup and will not be considered in this paper. For general references on the graph Laplacian, consult [80,29,108,10].
Note that |S| is just a special case of vol S, for r = 0 (recall we assume d − > 0). • The anisotropic total variation TV q a : V → R: Here, the signum function is understood to act element-wise on the elements of ∇u. For the isotropic total variation on graphs, see Remark 2.1. The maximum in the definition of TV q a is achieved by ϕ = ϕ a := sgn(∇u).
Note that the values ϕ a takes on the set {∇u = 0} are irrelevant for achieving the maximum, hence this function is not uniquely determined.
The anisotropic total variation of the indicator function 7 for the set S ⊂ V , denoted χ S , is given by Thus, the total variation of a set S is equivalent to the graph cut between S and S c := V \S which is used in graph theory and spectral clustering [95]. For future reference it is useful to note that Remark 2.1. One can also define an isotropic total variation on graphs TV : V → R: The isotropic total variation also has a 'maximum formulation', if we are willing to let go of the skew-symmetry condition for functions in E. To be precise, define the extended set of edge functions E e as the set of all functions E → R, and extend the definition of divergence (compatible with the earlier definition) to functions ϕ ∈ E e : Using the same inner product on E e as on E, this divergence is again the adjoint of the gradient ∇. Then we can write The maximum is achieved by taking As in the anisotropic case, the values ϕ T V takes on the set {∇u = 0} are irrelevant for achieving the maximum, hence this function is not uniquely determined. The quantity div ∇u |∇u| is often referred to as the 1-Laplacian of u.
Lemma 2.2. The norms · V and · V,∞ are equivalent, with optimal constants given by Next we recall the definitions of node set boundaries and (signed) graph distance.
Then, the graph distance between arbitrary i, j ∈ V is given by where the minimum is taken over all paths γ with i 1 = i, i N = j.
In other words, d G ij is the minimal distance to go from node i to node j, traveling only via existing edges, where each edge represents a distance ω q−1 ij . For a given set S ⊂ V , we define the graph distance to S at each node as the minimal graph distance to a node in S: As argued in, for example, [71, Section 3.1, Example 2], d S is the solution u ∈ V to the equation Note that ∂S ⊂ S. Alternative definitions appear in the literature in which ∂S ⊂ S c .
2.1. Basic spectral properties of the graph Laplacian, ∆ In this section, we collect a number of spectral properties of the graph Laplacian ∆ : V → V. Further discussion and details for the special cases r = 0 and r = 1 can be found in [80,29,108,10], from which our presentation follows.
Note that ∆ : V → V is a self-adjoint operator in the V norm. For u ∈ V \ {0}, the Rayleigh quotient R : V → R is defined as The eigenvalues of the graph Laplacian, ∆, are then defined via the variational formulation, The minimum in (5), is attained when F k is spanned by the first k eigenfunctions, i.e., the eigenfunctions corresponding to the k smallest eigenvalues, counting multiplicities. In particular, there are n 8 Similarly, by changing the "strictly less than" inequalities into "strictly larger than" inequalities the boundary ∂(S c ) of the set S c is defined. The reduced boundary of S can be defined as the following subset of ∂S (compare with the continuum case in [4, Definition 3.54]): and again similarly for ∂ * S c .
non-negative real eigenvalues (counted with multiplicity), denoted {λ k } n k=1 . If we denote the span of the first k − 1 eigenfunctions bŷ where u ⊥ VFk−1 indicates that u is orthogonal (in the sense of ·, · V ) to u i , for i ∈ {1, . . . , k − 1}. Taking variations of the Rayleigh quotient with respect to u, we find that (λ k , u k ) satisfies (6) if and only if u ⊥ VFk−1 and, for all v ∈ V, Finally, unwinding the definitions, we find that (7) is equivalent to the matrix eigenvalue problem where A ij = ω ij and D ii = d i is a diagonal matrix 9 . We remark that the change of variables, y = D r/2 x, in (8) gives the standard eigenvalue problem Recall that the spectral radius ρ of ∆ is defined as the maximum of the absolute values of the eigenvalues of ∆, Lemma 2.5 (Spectral properties of the graph Laplacian, ∆). The following properties are satisfied: (a) The smallest eigenvalue is λ 1 = 0. The multiplicity of λ 1 = 0 is the number of connected components of the graph and the associated eigenspace is spanned by set of indicator vectors for each connected component. If there is only one connected component, λ 1 is simple and the first (unnormalized) eigenfunction is ∆u V u V , and the spectral radius are equal, ∆ V = ρ(∆). This implies that, for all u ∈ V, (c) The trace satisfies tr ∆ = n k=1 λ k = i∈V d 1−r i . Consequently, (d) If G is not a complete graph then (e) The second eigenvalue satisfies .
(b) Noting that ∆ is a self-adjoint operator, a proof can be found in, for example, [89,Thm. VI.6].
(c) Because the trace of the operator ∆ is equal to the trace of its matrix representation, we have tr ∆ = tr L = n k=1 λ k . Since we assume there are no self-loops in the graph, tr D −r A = 0, hence tr L = tr D 1−r . Equation (9) follows from the fact that λ 1 = 0 and the maximum (minimum) of a set is greater (less) than or equal to the mean of the set.
(d) If G is not a complete graph, then there exists an (a, b) / ∈ E. We define the test function v ∈ V, Note that v, 1 V = 0. The desired upper bound then follows from (6).
Then v, 1 V = 0 and v 2 V = (vol S c )(vol S)vol V . Using (2), we compute ∇v 2 E = (vol V ) 2 TV 1 a (χ S ). The first inequality then follows from (6). For the second inequality, If j ∈ V is such that d j = d + , then the supremum is attained by the vector u = χ {j} and the result follows.
The following lemma states properties of the diffusion operator e −t∆ : V → V.
Let the mass, M , be defined as in (10), λ 2 be the second eigenvalue of the graph Laplacian, and ε > 0. Assume the graph is connected.

Proof. (a) We compute
denote the eigenpairs of the graph Laplacian with V-normalized eigenvectors, then the spectral decomposition of u is given by Recalling from Lemma 2.5 that λ 1 = 0 and v 1 = (vol V ) − 1 2 χ V and using (11), we compute But by Lemma 2.2, this implies The result now follows, since by Lemma 2.5, λ 2 > 0 for a connected graph.
(d) If u 0 ≡ v 0 , then u(t) ≡ v(t), for all t > 0, by the uniqueness theorem for ordinary differential equations. In this case there is nothing more to prove. Moreover, by repeating the argument on each connected component we may assume without loss of generality that the entire graph is connected.
Let u 0 , v 0 be such that, for all j ∈ V , we have (u 0 ) j ≤ (v 0 ) j , and for some j 0 ∈ V , (u 0 ) j0 < (v 0 ) j0 . We will show that in this case u j (t) < v j (t), for every j ∈ V and all t > 0, which proves the strong comparison principle and in particular the comparison principle.
Arguing by contradiction, suppose that u j (t) ≥ v j (t) for some t and some j. Let t 0 be the last time v(t) lies everywhere above u(t), that is By our assumption we have that 0 ≤ t 0 < ∞. Then, from the definition of t 0 there is some k ∈ V such that u k (t 0 ) = v k (t 0 ) andu k (t 0 ) ≥v k (t 0 ). Moreover, again due to the definition of t 0 , u j (t 0 ) ≤ v j (t 0 ), for all j ∈ V . This shows that if u i (t 0 ) < v i (t 0 ) for some neighbor i of k, then (∆u(t 0 )) k > (∆v(t 0 )) k and 0 =v k (t 0 ) + (∆v(t 0 )) k <u k (t 0 ) + (∆u(t 0 )) k = 0, which is a contradiction. We conclude that u(t 0 ) = v(t 0 ) at all neighbors of k, and by iterating the above argument we get in fact that u(t 0 ) = v(t 0 ) at all nodes of V , since we are dealing with the case where V is connected. By the uniqueness theorem for ordinary differential equations we conclude that u 0 = v 0 at all nodes, which gives a contradiction, and the strong comparison principle is proved.
Remark 2.7. The dependence of the convergence of u(t) to the steady state (vol V ) −1 M χ V on the second Laplacian eigenvalue, λ 2 , in Lemma 2.6(c), is related to the rate of convergence of a Markov process on a graph to the uniform distribution [100]. Due to this property, λ −1 2 is sometimes referred to as the mixing-time for a graph. The second eigenvalue λ 2 is also referred to as the algebraic connectivity or Fiedler value for a graph [48], and plays an important role in many applications. The robustness of a network to node/edge failures is highly dependent on the algebraic connectivity of the graph. In the "chip-firing game" of Björner, Lovász, and Shor, the algebraic connectivity dictates the length of a terminating game [11]. The algebraic connectivity is also related to the informativeness of a least-squares ranking on a graph [86]. Consequently, algebraic connectivity is a measure of performance for the convergence rate in sensor networks, data fusion, load balancing, and consensus problems [84].

Relation between graph Laplacians and balanced graph cuts
In spectral graph theory there are some well known connections between the various graph Laplacians and different normalizations of the graph cut TV 1 a (χ S ) = i∈S,j∈S c ω ij from (2). For example, when r = 0 (and hence vol S = |S|), the quantity T V 1 a (χ S ) min(vol S, vol S c ) in Lemma 2.5(e) is the Cheeger cut and its minimum over S ⊂ V is the Cheeger constant, e.g., [95,102]. Let S 1 , . . . , S k be a partition of all the nodes V , for a given integer k and define the balanced graph cut We use the subscript r to remind us that vol S i depends on r. For r = 0 this quantity is known in the literature as the ratio cut 10 , for r = 1 as the normalized cut [95,108]. These quantities are introduced, because minimization of the graph cut, without a balancing term in the denominator, often leads to a partition with many singleton sets, which is typically unwanted in the application at hand. Minimization of this balanced cut over all partitions of V is an NPcomplete problem [109,95], but a relaxation of this problem can be defined using the graph Laplacian. For example, in [108,Section 5] it is shown that where H is an n by k matrix with elements Note that H is orthonormal in the V inner product, i.e., H T D r H = I, where I is the k by k identity matrix. Hence the minimization of C r over all partitions, is equivalent to minimizing Tr(H T LH) over all n by k V-orthonormal matrices of the form as in (12). The relaxation of this NP-complete minimization problem is now formulated by dropping the condition (12) and minimizing over all n by k V-orthonormal matrices. The problem then becomes an eigenvalue problem and the first k eigenvectors of L are expected to be approximations to the indicator functions of the optimal partition S 1 , . . . , S k . There is often no guarantee of the quality of this approximation though [98,57,64,65,99,108]. In order to turn the approximations of the indicator functions into true indicator functions a method like thresholding or k-means clustering is often used [82]. For partitioning the nodes into two subsets (k = 2), the potential term in the Allen-Cahn equation (discussed in Section 5) can be interpreted as a nonlinear extension of the graph Laplacian eigenvalue problem which forces the approximate solutions to be close to indicator functions. In this light, it is interesting that the graph Ginzburg-Landau functional (with possibly a mass constraint), of which the Allen-Cahn equation is a gradient flow, Γ-converges to the graph cut functional [107].

Curvature and mean curvature flow on graphs
In this section we will derive a graph curvature, analogous to mean curvature in the continuum, and study some of its properties. We end the section with a description of mean curvature flow on graphs.

Graph curvature
In the continuum case, the mean curvature is given by (minus) the divergence of the normal vector field on the boundary of the set (see Appendix A.2 and (46)). This normal vector field achieves the supremum in the definition of total variation of a characteristic function (under sufficient smoothness conditions). Hence, we define the normal of a vertex set by using ϕ a from (1) which achieves the supremum of the anisotropic total variation.
As in the continuum case, we define the curvature of a set as the divergence of the normal.
Recall from (1), that ν S ij is not uniquely determined on {(i, j) ∈ E : (∇χ S ) ij = 0} (i.e., away from the boundary ∂S ∪ ∂(S c ), in the sense of Definition 2.4) and hence the value 0 in (13) is a choice corresponding to the extension of the normal field away from the boundary. This ambiguity is irrelevant when div ν S is coupled to the characteristic function χ S via the V-inner product, as in but care should be taken when trying to interpret the normal or the curvature outside this setting. Note that for q = 1 and S ⊂ V , |(κ 1,r The curvature, κ q,r S , has the property that it vanishes away from the boundary ∂S ∪ ∂(S c ). In particular, we see that the above mentioned ambiguity is also irrelevant for pairings with χ S c since Let S,Ŝ ⊂ V be two node sets, then (15) implies In particular, ifŜ = S \ {n} for a node n ∈ S, then Because we assume there are no self-loops, in the final equality ω nn = 0. A similar computation shows for n ∈ S c The preceding discussion implies the following: if Ω ⊂ V is such that S minimizes TV q a (χ S ) among all sets S ⊂ V such that S∆S ⊂ Ω, then we have that (compare with the nonlocal mean curvature, (23)). Here, Ω is a set where S and S are forced to agree (similar to enforcing a boundary condition in the continuum case). The two inequality conditions in (18) are opposite for the two sides of the interface between S and S c . This strengthens the heuristic idea that the 'real' interface, where there would be an equality condition, is lost due to the lower bound on the accessible length scales on a graph.

Remark 3.3.
It is interesting to make some connections between the graph total variation TV a and graph curvature κ 1,r S on the one hand, and the local clustering coefficient [110] on the other. Assume G is unweighted (and undirected, as per usual in this paper), then the clustering coefficient C i of node i, is the number of triangles node i is part of, divided by the number of possible triangles in the (A version for weighted graphs was introduced in [7, Formula 5].) Using (2) and (16), we can rewrite, for r = 1, As for the continuum case, we can arrive at the graph curvature, κ q,r S , in several ways. We discuss some of them below. However, the analogy with the continuum curvature becomes even more apparent 20 if instead of the standard mean curvature, we consider the nonlocal mean curvature (see Section 3.2). If we (formally) compute the first variation of the continuum TV(u) over all functions u ∈ BV of bounded variation (and not restricting ourselves to characteristic functions only), we find div ∇u |∇u| , the curvature of the level sets of u. In (28) and Appendix B we follow a similar procedure on the graph and again find div ϕ a , with ϕ a from (1).
Similarly, in the continuum case, an alternative definition of the mean curvature is div ∇χ S |∇χ S | which is a Radon measure defined on the boundary. However, |∇χ S | = 1 on the boundary of S, so that the mean curvature is simply ∆χ S . As long as S is a rectifiable set, this computation can be made rigorous in the context of BV functions (see [47,Chapter 5]). Computing the analogous quantity on a graph, we find This is equal to κ 1,r S (compare with (3)). The choice q = 1 is a natural one, because it corresponds to the Γ-limit of the graph Ginzburg-Landau functional (GL ε ) [107], whose definition is given in Section 5.
In the continuum case the mean curvature κ(x) at the point x ∈ ∂S in the boundary of a set S ∈ R d satisfies the property that, if S is smooth enough, then for any given ball B δ (x) of radius δ and center x ∈ ∂S, Note that if S were a half space, then the expression on the right is zero for all δ, since ∂S would separate each B δ (x) in sets of equal volume. Thus κ measures how much ∂S deviates from cutting B δ (x) in sets of equal volume. The analogous computation on a graph, replacing the ball by the set Note that if r = 0 and G is an unweighted graph, such that ω ij ∈ {0, 1}, then in whose right hand side we recognize (18). In the last equality we used that ω ij = 0, if j ∈ N i . The equality in (19) has an interesting consequence on the level of Γ-convergence of functionals. For more information about the theory of Γ-convergence, we refer to [31,14]. The first result of Γ-convergence for the Ginzburg-Landau functional goes back to work of Modica and Mortola [79,78].
Remark 3.4. We note that graph curvature is related to the process of bootstrap percolation [23,63,13], in which nodes on an unweighted graph switch from 'inactive' to 'active', if their number of active neighbors exceeds a given threshold value. This number corresponds exactly to the graph curvature (κ q,0 S ) i , if node i is inactive and S is the set of active nodes.
Theorem 3.5. Let g ∈ C(R n ), ε > 0,W ∈ C 2 (R) a nonnegative double well potential with wells at 0 and 1, and consider the functionals else.
Then f ε Γ → f 0 as ε → 0 (using any of the equivalent metrics on R n ).
Furthermore, if the double well potentialW satisfies a coercivity condition -i.e., there exists a c > 0 such that for large |u|, W (u) ≤ c(u 2 − 1)-then compactness holds, in the following sense: be a sequence such that there exists a C > 0 such that for all n ∈ N f εn (u n ) < C. Then there exists a subsequence and a u ∞ of the form u ∞ = χ S , for some S ⊂ V , such that u n → u ∞ as n → ∞. 22 Proof. The key point in the proof of the Γ-convergence is to note that f ε is a continuous perturbation of the functional w ε : V → R, By a well known property of Γ-convergence [31, Proposition 6.21], the Γ-limit is preserved under continuous perturbations. Then using the fact, shown above in (19) The compactness result is a direct adaptation of the proof of [107, Theorem 3.2] to the current functionals f ε .
Remark 3.6. Note that in Theorem 3.5 above, we can also use the double well potential W with wells at ±1, instead ofW . In that case, the limit functional w 0 in the proof takes finite values only for functions of the form We end this subsection with another similarity between the graph based objects we introduced and their continuum counterparts. The gradient of the graph distance d ∂S , from Definition 2.3, agrees with the normal ν, from (13), on the boundary of S induced by the graph distance, in the sense of the following lemma. This again corresponds to what we expect based on the continuum case. We define the signed distance to ∂S as Lemma 3.7. Let S ⊂ V . Define the exterior boundary of S induced by the graph distance d ∂S as Similarly, let the interior boundary of S induced by the graph distance be Note that ∂ ext S ⊂ ∂(S c ). Hence, i ∈ ∂(S c ) and thus there is a k ∈ N j such that k ∈ ∂S and therefore d ∂S k = 0. Because d ∂S is nonnegative and i ∈ ∂ ext S, we deduce Thus the maximum is achieved for k = j, and hence so is the minimum in (4), which shows that The proof for i ∈ ∂ int S follows from similar arguments, noting that Note that ∂ ext S ⊂ ∂(S c ), but equality does not necessarily hold. If a shortest path from i ∈ ∂(S c ) to ∂S does not equal {i, j}, for some j ∈ ∂S, then i ∈ ∂ ext S. This situation does not occur if the graph distances are consistent, in the following sense: if, for all i, j, k ∈ V ,

Relation with the continuum nonlocal mean curvature
There is a clear analogy between the expressions in (18) and the (continuum) nonlocal mean curvature [18,17], as well as between TV q a and continuum nonlocal energy functionals. Consider an interaction kernel K : This kernel K can be thought of as the energy given by a long-range interaction between a particle placed at x with a particle at y. It defines a functional on subsets S ⊂ R n , sometimes called "nonlocal perimeter" or "nonlocal energy" and it is given by where for any pair A, B ⊂ R n we write Compare this with the graph case, if, for A, B ⊂ V , we write In terms of this bilinear functional, the anisotropic total variation, defined in (2), can be rewritten . We see that (21) is nothing but the continuum version of (22), and one may rightfully interpret the weight matrix ω q ij as an interaction kernel between pairs of nodes in G and L G (S, S c ) as measuring the total "interaction energy" between S and S c .
In this case one can see that the following two conditions must hold If, arguing heuristically, we let A shrink down to any x ∈ (∂S) ∩ Ω, we find that The integral on the left, which is well defined in the principal value sense 12 when x ∈ ∂S and ∂S is smooth enough (C 2 suffices), is known as the nonlocal mean curvature of S at x with respect to K, or just nonlocal mean curvature of S at x, when K is clear from the context. As with J K and TV q a , we see that is exactly a continuum analogue of the quantity j∈S − j∈S c ω q nj in (18), moreover, the inequalities in (23) are a continuum analogue of those in (18). Note however, that j∈S − j∈S c ω q nj is defined for all of n ∈ V , whereas κ nonlocal (x) above is only defined when x ∈ ∂S and ∂S is smooth enough.
Known regularity results deal mostly with the case K s (x, y) := c n,s |x − y| −n−s for s ∈ (0, 1) [17,22]. It is worth noting that J Ks is a fractional Sobolev norm of the characteristic function of S where . Ḣs /2 is defined in terms of the Fourier transform of f by Moreover, as s → 1 − the quantity above gives the perimeter of S, and the corresponding nonlocal mean curvature converges pointwise to the standard mean curvature. In [18] it is shown that if we consider the MBO scheme, where instead of the heat equation we use the fractional heat equation, then in the limit we get a set S t evolving over time with a normal velocity at x ∈ ∂S t given by

Mean curvature flow
In this section, we define a mean curvature flow on graphs and connect it with the curvature κ q,r S in (14). It is not clear what is the most natural notion for the evolution of a phase in a graph. Do we want to consider a sequence of subsets {S n } n∈N , or a continuous family {S t } t>0 which, although piecewise constant in t, may change in arbitrarily small time intervals? How we connect solutions of the graph mean curvature flow to solutions of the graph (MBO τ ) scheme from Section 4, or to solutions of the graph Allen-Cahn equation (ACE ε ) from Section 5, will depend on the answer to this question. For now, we shall be content with considering a phase evolution comprised of a discrete sequence of sets S n = S(nðt), n ∈ N, that correspond to the state of the system at discrete time steps.
Our construction follows the well-known variational formulations for classical mean curvature flow [3,68]. Appendix A.2 has a brief overview of mean curvature and the associated flow in the continuum case. An obstacle one encounters when trying to emulate the continuum level set method to express mean curvature flow on graphs is, that, due to the lack of a discrete chain rule, the resulting equation is not independent on the choice of level set function.
Recall the notions of graph distance and boundary of a node set from Section 2.
Definition 3.8. The mean curvature flow, S n = S(nðt), with discrete time step ðt for an initial set S 0 ⊂ V , is defined and Note that, for a given graph G, minimizers of F may not be unique. In this case different mean curvature flows can be defined, depending on the choice of S n+1 . An example of this non-uniqueness on the 4-regular graph is given in Section 6.5.
We choose to use the distance to Σ n , instead of the distance to either ∂S n or ∂(S c n ), so that the mean curvature flow is not a priori (independent of curvature) biased to either adding nodes to or removing nodes from S.
Since nodes in Σ n can be added or removed from S without increasing the last term of (25), every stationary state χ S of (MCF ðt ) is a minimal surface in the sense that TV q a (χ S ) ≤ TV q a (χ {S∪{n} ) for n ∈ ∂(S c ) and TV q a (χ S ) ≤ TV q a (χ {S\{n} ) for n ∈ ∂S (in the case where the minimizer of F is unique, the inequalities are strict). In particular, the sequence defined by (MCF ðt ) is not "frozen" for ðt arbitrarily small. Remark 3.9. In the continuum, stationary points of mean curvature flow are minimal surfaces, i.e., surfaces of zero mean curvature [55,20,73,30]. On the graph, the zero mean curvature condition is too restrictive, since it would only allow for disconnected sets S and S c , but, given Definition 3.8 above, we can still define an ðt-minimal set on a graph as a node set S ⊂ V , such that S is a stationary point of the mean curvature flow (MCF ðt ). Note that, if ðt 1 > ðt 2 and S is an ðt 1 -minimal set, then S is also an ðt 2 -minimal set. We can rewrite the last term in F(Ŝ, S n ) in terms of the signed graph distance (20)), which takes nonnegative values in S c n and nonpositive values in S n . We state the precise result in the following lemma.
Proof. The rewriting of TV q a (χŜ) − TV q a (χ Sn ) in terms of the curvatures, follows directly from (17). For the distance term, we compute The proof is completed by noting that the last term above does not depend on S.
Lemma 3.11 allows us to find a convex functional, the superlevel sets of whose minimizers are themselves minimizers of F from (25) (similar to the continuum results in, for example, [27] and [28]). Before we state and prove that result in Theorem 3.13, we prove a layer cake or coarea formula for the discrete total variation, TV q a .
Lemma 3.12. Let u ∈ V and define, for t ∈ R, Then Proof. As also noted in [112] and [24], we have, for i, j ∈ V , The second result follows, since, Theorem 3.13. Let u ∈ V be a minimizer of the convex functional then, for all s ∈ R, the superlevel set Proof. Let u − ≤ min i∈V u i . Then from Lemma 3.12 we know that This gives where F is as in (26). Hence, if u minimizes F , then for a.e. s ∈ [u − , ∞), the superlevel set E(s) minimizes F (·, S n ). In fact, since u − is arbitrarily chosen from (−∞, min i∈V u i ], this result holds for a.e. s ∈ R. Because u takes only finitely many values, the result holds for all s ∈ R. Lemma 3.11 now completes the proof. Note that Lemma 3.12 and Theorem 3.13 above also hold if we define E(t) in terms of a non-strict inequality instead: Remark 3.14. The function TV q a (u) is a convex function, thus, taking ∂TV q a (u) as the (possibly multivalued) subdifferential of TV q a (u) [39], any minimizer of u → TV q a (u) + u, g V , for g ∈ V, will solve the differential inclusion ∂TV q a (u) −g. From the definition of TV q a we see that its subdifferential is only multivalued at u for which ∇u vanishes between two nodes, at all other u, TV q a is pointwise differentiable. In particular, if u, v ∈ V and ∇u is never zero, we may differentiate 13 Since div is the adjoint of ∇, it follows that, for all v ∈ V, d dt (TV q a (u + tv) + u + tv, g V ) = div (sgn(∇u)) + g, v V . Therefore, the Euler-Lagrange equation for minimizers of F from (27), is div sgn(∇u) + 1 ðt sd Σn = 0, provided ∇u is never zero. Whenever (∇u) ij = 0 for some i, j, the above equation is replaced by a differential inclusion in terms of the subdifferential of the absolute value function. Concretely and to recap, since the subdifferential of the absolute value function at 0 is the interval [−1, 1], there exists φ ∈ E such that |φ ij | ≤ 1 for all i, j ∈ V , div φ + 1 ðt sd Σn = 0, and if (∇u) ij = 0, then φ ij = sgn((∇u) ij ).
Fast computational methods for the solution of (MCF ðt ) based on max flow/min cut algorithms are developed in [24,25]. These methods exploit the homogeneity and submodularity of the total variational functional, TV q a . Those same papers also show it is possible to rewrite (MCF ðt ) as an ROF problem [90].
It would also be interesting to compare (MCF ðt ) to the finite difference scheme for motion of level sets by mean curvature on a lattice in [83].

Threshold dynamics on graphs
In this section we study the threshold dynamics or Merriman-Bence-Osher algorithm on a graph G. For a short overview of the continuum case we refer to Section A.3 in Appendix A.

The graph MBO algorithm
The MBO scheme on a graph, describing the evolution of a node subset S ⊂ V , is given as follows.
Here χ S denotes the characteristic function of the set S.
Threshold step. Define the set S k ⊂ V to be By the comparison principle, Lemma 2.6(d), we note that the solution to (29) satisfies v(t) ∈ [0, 1] n for all t ∈ [0, τ ]. An alternative description of the algorithm is as follows. Let u k be the indicator of the set S k as defined by the (MBO τ ) algorithm.
If we define the thresholding function P : R → {0, 1}, which acts by thresholding then the iterates can be succinctly written u k = (P e −∆τ ) k u 0 . Several papers use the MBO algorithm on a graph to approximate motion by mean curvature. For example, in [74,51,60], the MBO algorithm on graphs was implemented and used to study data clustering, community detection, segmentation, object recognition, and inpainting. This is accomplished by simply reinterpreting the Laplacian in (47), or in appropriate extensions of (47), as the graph Laplacian.
4.2. The "step-size" τ in the (MBO τ ) algorithm As discussed at the end of Section A.3, in a finite difference discretization of the continuum MBO algorithm, the time step τ must be chosen carefully to avoid trivial dynamics. For τ too small, there is not enough diffusion to change the value of u at neighboring grid points beyond the threshold value. In this case, the solution is stationary under an (MBO τ ) iteration and we say that the solution is frozen or pinned. For τ too large there is so much diffusion that a stationary state is reached after one iteration in the MBO scheme. It is not surprising that these finite difference effects also appear for the MBO algorithm on graphs. From the form of the heat solution operator, e ∆τ , we expect that τ should be roughly chosen in the interval (λ −1 n , λ −1 2 ). Theorems 4.2 and 4.3 strengthen this intuition. The following theorem gives a lower bound on the choice of τ to avoid freezing in the (MBO τ ) algorithm on general graphs.
Proof. To prove (30), let χ S be the characteristic function on a set S ⊂ V . For a node to be added or removed from S by one iteration of (MBO τ ), it is necessary that e −τ ∆ χ S − χ S V,∞ ≥ 1 2 . For the linear operator A : V → V, let A V be the operator norm induced by · V , i.e., Lemma 2.5(b)). Using Lemma 2.2, we compute Using the triangle inequality and the submultiplicative property of · V (see, e.g., [59]), we compute , all nodes are stationary under an (MBO τ ) iteration.
The following corollary of Lemma 2.6(c) shows that an upper bound on τ is necessary to avoid trivial dynamics. (volV ) where the mass M (u 0 ) is defined in (10), then Proof. In Lemma 2.6(c), This corollary shows that, if τ is chosen too large, one iteration of (MBO τ ) leads to a trivial state u = χ V or u = 0, which is stationary under the algorithm (MBO τ ).
The following theorem gives a condition for which there is a gap between the lower and upper bound for τ .
Theorem 4.4. Consider the (MBO τ ) iterations on a graph with n ≥ 2. Let τ ρ and τ t be defined as in (30) and (32). If λ2 λn < log Since d r − ≤ volS, we have The result follows.
Theorem 4.4 further reenforces our intuition that τ should be chosen in the interval (λ −1 n , λ −1 2 ). If, for a particular graph, this interval is very small, then Theorems 4.2 and 4.3 cannot provide an interval for which the (MBO τ ) iterations has a chance of being nonstationary after the first iteration. Note however that the interval [τ ρ , τ t ] given by these theorems, is not necessarily a sharp interval for interesting dynamics.

A Lyapunov functional for the graph (MBO τ ) algorithm
In this section we introduce a functional which is decreasing on iterations of the (MBO τ ) algorithm. The analogous functional for the continuum setting was recently found in [42]. The functional is then used to show that the (MBO τ ) algorithm with any initial condition converges to a stationary state in a finite number of iterations.
Let τ > 0 and consider the functional J : V → R defined by Note that by, Lemma 2.6(a), J(u) = M (u) − u, e −τ ∆ u V , where M is the mass from (10).
Proof. We compute, for all v = 0, Taking the first variation of J(u) = 1 − u, e −τ ∆ u V , we find that δJ δu , δu as desired.
Define the convex set K := {φ ∈ V : ∀j ∈ V φ j ∈ [0, 1]}. Is it instructive to consider the optimization problem, Since the objective function in (34) is concave and the admissible set is a compact and convex set, it follows that the solution to (34) is attained by a vertex function u ∈ B := {v ∈ V : ∀j ∈ V v j ∈ {0, 1}}.
Here B is the set of binary vertex functions, taking the value 0 or 1 on each vertex. The sequential linear programming approach to solving the system (34) is to consider a sequence of vertex functions {u k } ∞ k=0 which satisfies u k+1 = arg min v∈K L u k (v), u 0 = χ S , for a node set S ⊂ V.
The optimization problem in (35) may not have a unique solution, so the iterates are not well-defined. The following proposition shows that the iterations of the (MBO τ ) algorithm define a unique sequence satisfying (35). Note that the optimization problem in (35) is the minimization of a linear objective function over a compact and convex set, implying that for any sequence {u k } ∞ k=0 satisfying (35), u k ∈ B for all k ≥ 0.
Proposition 4.6. The iterations defined by the (MBO τ ) algorithm satisfy (35). The functional J, defined in (33), is non-increasing on the iterates {u k } ∞ k=1 , i.e., J(u k+1 ) ≤ J(u k ), with equality only obtained if u k+1 = u k . Consequently, the (MBO τ ) algorithm with any initial condition converges to a stationary state in a finite number of iterations.
Proof. At each iteration k, the objective functional L u k is linear and thus the minimum is attained by a function These are precisely the (MBO τ ) iterations. By the strict concavity of J and linearity of L u k , for u k+1 = u k , Since u k ∈ K, L u k (u k+1 ) ≤ L u k (u k ) which implies J(u k+1 ) < J(u k ). The convergence of the algorithm in a finite number of iterations then follows from the fact that B contains only a finite number of points, the vertices of the unit n-cube.
Proposition 4.6 shows that J is a Lyapunov function for the (MBO τ ) iterates. From the proof of Proposition 4.6, we also note that the non-uniqueness of the iterates in (35) corresponds to the choice in the (MBO τ ) algorithm of thresholding vertices {j ∈ V : e −τ ∆ u k = 1 2 } to either 0 or 1 (see Remark 4.1).
Remark 4.7. The framework of [42] easily allows for the extension of the MBO algorithm to more phases, however we do not pursue these ideas here.

A local guarantee for a 'nonfrozen' (MBO τ ) iteration
We begin by observing that the constant τ κ in Theorem 4.2 depends on the maximum curvature κ 1,r S of the indicator set in the graph, ∆χ S V,∞ . In this section, we prove a theorem which gives a condition on τ in terms of the local curvature (κ 1,r S ) i at a node i, which 36 guarantees that the value of u on that node will change in one iteration of the graph (MBO τ ) scheme. We first introduce some notation which is needed to state the theorem. Recall that the set of neighbors of a node i ∈ V is N i = {j ∈ V : ω ij > 0}. Let 1 ∈ V be an arbitrary node in the graph G and let S ⊂ V . We define the sets The set S 1 contains neighbors of node 1 which are also in either the boundary ∂(S c ) or ∂S (depending on whether or not 1 ∈ S). For u ∈ V, define ∆ as We see that ∆ on S 1 is similar to the Laplacian on the subgraph induced by S 1 , with the important distinction that, for each i ∈ V , the degree d i is the degree of i in the full graph G, not the degree in the subgraph induced by S 1 . In [29,Section 8.4], ∆ is referred to as the Laplacian with Dirichlet conditions on ∂(S 1 c ).
Note in particular that, if v ∈ V 1 , then e −t∆ v ∈ V 1 for all t ≥ 0.
It is important to note that both |(κ 1,r S ) 1 | 2 and (∆ ) 2 χ S1 V,∞ are local quantities, in the sense that they only depend on the structure of G at node 1 and its neighbors. This is in contrast with Theorem 4.2, which depends on the spectrum of the Laplacian on G. The existence of a lower bound τ 1 on τ is unsurprising in the light of this earlier freezing result. The necessity for an upper bound τ 2 37 can be understood from our wish to only use local quantities in this theorem.
Let v satisfy the heat equation with Dirichlet boundary data, As noted above the theorem, v(t) ∈ V 1 for all t ≥ 0. It is easily checked that v is subcaloric, i.e.,v i ≤ −(∆v) i for all i ∈ V , and v(0) ≤ χ S . In addition, the Laplacian satisfies −(∆u) i ≤ −(∆ũ) i if u i =ũ i and u j ≤ũ j , for j = i. Hence, by the theory of differential inequalities (see for example [101,Theorem 8 In particular, Note that − (∆ χ S1 ) 1 = −(κ 1,r S ) 1 = |(κ 1,r S ) 1 |, where the last equality follows because 1 ∈ S.
We conclude that which proves the result for the case in which 1 ∈ S.
To prove the desired statement if 1 ∈ S, we note that Recall that, in this case, S 1 = N 1 ∩ S c , and the same 38 derivation as above holds 14 , since 1 ∈ S c , with the exception that the admissible range of τ becomes the open interval (τ 1 , τ 2 ). This is because, by our definition of the (MBO τ ) algorithm, the thresholding operator thresholds the 1 2 -level set to 1.
In the remainder of this section we determine some conditions under which the requirement |(κ 1,r S ) 1 | 2 > (∆ ) 2 χ S1 V,∞ in Theorem 4.8 is satisfied. To this end, define the reduced degrees, for i ∈ V , as Proof. Consider the |V | × |V | matrix corresponding to ∆ . After possibly relabeling the nodes, it can be written as 14 In particular, carefully note that now − ∆ χ S 1 1 = (κ 1,r S ) 1 = |(κ 1,r S ) 1 | holds.
If the first condition in (39) is satisfied with then the second condition in (39) can be replaced by the condition that, for all i ∈ S 1 , ω i1 ≤ d i .
Proof of Corollory 4.10. For r = 1, we compute, for i ∈ S 1 , Since d 1 > 0, condition (38) reduces to max on the other hand, the maximum is achieved at some i ∈ S 1 , then Here, we first used the first condition from (39), and then the second. Combined with Theorem 4.8 and Lemma 4.9, this proves the first claim. Now assume, instead of the second condition in (39) It is worthwhile to understand the conditions in the corollary. Adding the two conditions in (39) gives is a measure of the relative strength of connection of node i within the set S 1 compared to all its connections in G. Similarly d 1 d1 is the relative strength of connection of node 1 within S 1 (or, equivalently, within S 1 ), compared to all its connections in 41 G. The conditions in (39) thus require node 1 to be a node with comparatively large relative connection strength within S 1 , compared to the other nodes in S 1 . This will allow enough mass to diffuse to or away from node 1 (depending on whether or not 1 ∈ S), for it to pass the threshold value 1 2 , without too much of the locally available mass diffusing to other nodes.
Some examples in Section 6 further examines the conditions in (38) and (39).

Allen-Cahn equation on graphs
In this section, we investigate the Allen-Cahn equation on graphs. A short overview of the continuum Allen-Cahn equation can be found in Section A.1 in Appendix A.
We propose the following Allen-Cahn equation (ACE) on graphs, for all i ∈ V : for a given initial condition u 0 ∈ V and ε > 0. Here W ∈ C 2 (R) is a double well potential. For definiteness we set W to be the standard double well potential W (u) = (u+1) 2 (u−1) 2 , hence W (u) = 4u(u 2 − 1) and W has two stable minima at the wells at u = ±1 and an unstable local maximum at u = 0. Recall that the sign convention for the Laplacian is opposite to the one used in the continuum literature. Note that for ε sufficiently small, this system has 3 n equilibria, of which 2 n are stable. The (MBO τ ) algorithm is closely related to time-splitting methods applied to the Allen-Cahn evolution (ACE ε ). The diffusion step is precisely the time evolution with respect to the first term of (ACE ε ) and the thresholding step is the asymptotic behavior of evolution with respect to the second term of (ACE ε ).
The case V = Z d with weights ω ij = ω( i − j ) was considered in [8], where it is seen as an approximation to the Ising model, and stationary solutions and traveling waves are constructed for ε small enough. The authors note that, when ω ij corresponds to nearestneighbors, this equation is known as the discrete Nagumo equation, which is a simplified model of neural networks. In this context, [61] considered the Nagumo equation in Z 1 and derived the existence of traveling waves. In general, we are not aware of any previous works where (ACE ε ) is considered for an arbitrary weighted graph (V, E, ω ij ). It would be interesting to see whether the analysis in [8] can can be extended to use (ACE ε ) to study phase transitions in general graphs, a topic of interest in other areas of mathematics [70].
Just as in the continuum case, we arrive at (ACE ε ) as the gradient flow given by the graph Ginzburg-Landau functional, GL ε (u) : V → R, , and whose first variation is given by The factor d −r i in the potential term is needed to cancel the factor d r i in the V-inner product. Equation (ACE ε ) is then the V-gradient flow associated with (GL ε ).
Recall that the Laplacian ∆ also depends on r. In fact, the equation in (ACE ε ) can be rewritten as showing that the factor d r i can be interpreted as a node-dependent time rescaling.
By standard ODE arguments and the smoothness of the right hand side of (ACE ε ), for each ε > 0 a unique C 1 solution to (ACE ε ) exists for all t > 0.
This continuum case (see Appendix A.1) suggests an approach for finding a valid notion of mean curvature (and its flow) for graphs: Take initial data u(0) = χ S − χ S c , for some node set S ⊂ V , and consider the corresponding solution u ε (t) to (ACE ε ), for all times t > 0. The question is whether the limit exists. Even if it does, it is unlikely thatū is of the form χ S(t) − χ S(t) c , which can be interpreted as a binary indicator function for some evolving set S(t), for all times t > 0, but there may be an approximate phase separation: for some small δ > 0. Is there a way to characterize the evolution of the "interface" between the two level sets ofū i (t)?
However, a little analysis shows the above approach is rather naïve. Indeed, unlike in the continuum case, the graph Laplacian of the indicator function of a set S ⊂ V is always a well-defined bounded function (in any norm). Thus, for small ε the potential term in the equation will dominate the dynamics, and pinning or freezing will occur, as proven in Theorem 5.3. This is the dynamics in which the sign of the value of u on each node is fixed by the sign of the initial value, and u at each node just settles into the corresponding well of W .
As discussed at the start of Section 3.3, the question how to connect the sequence of sets evolving by graph mean curvature to 44 the super (or sub) level sets {i ∈ V : u ε i (t) > 0} for solutions of (ACE ε ), is still open. See also Question 7.4.
Remark 5.1. Note that in the (MBO τ ) algorithm, the values of u are reinitialized in every iteration to 0 or 1. Our choice of the double well potential W in (ACE ε ) has two equilibria corresponding to the level sets for ±1. Correspondingly, the unstable equilibrium for (MBO τ ) corresponds to the 1/2 level set, while for (ACE ε ) it corresponds to the 0 level set. This agrees with the now standard notations for Allen-Cahn and MBO.
Below, we show that for all ε below a finite ε 0 > 0 the functions u ε i (t) do not change sign as t varies, so that pinning occurs. Recall that a set which contains the forward orbit of each of its elements is called positively invariant, and that the number of nodes in the graph G is |V | = n.
Lemma 5.2. Consider the set S := {u ∈ V : u 2 V ≤ 17 4 n d −r + } and let u(t) be the solution to (ACE ε ) for a given ε > 0. Then t → u(t) 2 V is decreasing at each t such that u(t) ∈ S c . As a consequence, the set S is positively invariant and every trajectory of (ACE ε ) enters S in finite time.
The last inequality follows, since u i (t) 2 − 1 > 1 for i ∈ A c (t), and where we have used that |A(t)| ≤ n. This shows u(t) 2 V is decreasing in the region S c , as desired. The other statements in the lemma now follow.
There exist an ε ρ and an ε κ (depending on the spectral radius of ∆ via (41), and on sup t≥0 ∆u(t) V,∞ < ∞ via (42), respectively), such that, if either ε ≤ ε ρ or ε ≤ ε κ , then the solution u(t) to (ACE ε ) is such that sign(u i (t)) is constant in time, for all i ∈ V .
Proof. By Lemma 5.2, u(t) 2 V ≤ 17 4 n d −r + for t large enough. Hence, by continuity of u(t), there is a C (depending on the initial condition) such that, for all t ≥ 0, Thus, if ρ > 0 denotes the spectral radius of ∆, we get, for all i ∈ V , for all i ∈ V , thus, we have the inequalities . Without loss of generality, we can assume that there is a number α ∈ (0, 1) such that |u i (0)| ≥ α for all i ∈ V . If there is an i ∈ V such that |u i (t)| = α for a given t, then we have that |W (u i (t))| = 4α(1 − α 2 ), with a sign opposite to that of u i . Thus, if Hence u i (t) can never reach zero, and by continuity in t it does not change sign.
Alternatively, instead of (40), we can estimate The finitude of the right hand side follows from (40). Following the same reasoning as above, we then conclude The constant ε κ in Theorem 5.3 involves ∆u V,∞ , which is "curvature-like". Compare this to the constant τ κ in Theorem 4.2, which depends on the maximum curvature of the indicator set in the graph, ∆χ S V,∞ , as also discussed in Section 4.4. This tentative similarity makes us suspect, that a condition on the local curvature, similar to those for the (MBO τ ) algorithm given in Theorem 4.8, guarantees a phase change in the Allen-Cahn flow. We discuss this further in Question 7.3.
We see that the discrete nature of the graph, manifest in the finite spectral radius of the Laplacian, makes the limit behavior of (ACE ε ) as ε → 0 much different than that for the continuum case. In particular, this means that we ought to look for a notion of mean curvature flow on graphs more carefully.
Remark 5.4. For ε small enough, but not smaller than the ε 0 from Theorem 5.3 above, we expect interesting asymptotic behavior for the motion of the phases in (ACE ε ) on intermediate time scales. Such asymptotics might be connected to the graph curvature of the phases, which would match the situation in the continuum setting, where the solution has phases that for large times behave as if they were evolving by mean curvature flow, while the solution itself becomes stationary in the limit t → +∞. This phenomenon is known as dynamic metastability (see for instance [16] and the references therein). See also Question 7.4.

Explicit and computational examples
In this section we give several examples of graphs where the mean curvature, MBO, and Allen Cahn evolutions can be compared either explicitly or computationally.

Complete graph
Consider the complete graph, K n , on n nodes with ω ij = ω for all i, j ∈ V . See Figure 1a. In this case, the matrix representation of the graph Laplacian is given by the circulant matrix, L = ω[(n − 1)ω] −r n Id n − 1 n 1 t n , where 1 n denotes the vector in R n with all entries equal to 1, and Id n the identity matrix in R n×n . The eigenvalues of L are given by 0 and ωn[(n − 1)ω] −r (with multiplicity n − 1). In particular, λ 2 = λ n = ρ. Note that the normalized eigenvector corresponding to eigenvalue 0 is given by (vol V ) − 1 2 χ V . 47 Let S ⊂ V be a set with volume ratio R S = vol S vol V (see also Theorem 4.3). Using the spectral decomposition from (11), the evolution of χ S by the heat equation can be explicitly written as Assume R S = 1 2 . Then there exists a critical time step τ c depending only on vol S, vol V , and ρ such that τ < τ c implies the solution to the (MBO τ ) evolution is pinned and τ ≥ τ c implies exactly one iteration of the (MBO τ ) evolution gives a stationary solution, either 0 or χ V depending on the initial mass, M (χ S ) = vol S (see (10)). From the solution, the critical time step τ c can be directly computed, Note that for n > 2, R S > 1 n and τ t > τ c > τ ρ . By symmetry, both the Allen-Cahn equations and mean curvature flows reduce to two-dimensional systems, with one variable governing the value of the nodes in S and the other the nodes in S c . Critical parameters and ðt exist for which below the phase remains the same for all nodes and above the phase simultaneously changes.

Star graph
Consider a star graph SG n as in Figure 1b with n ≥ 3 nodes. Here the central node (say node 1) is connected to all other nodes and the other n − 1 nodes are only connected to the central node. Hence, for all i ∈ {2, . . . , n}, ω 1i = ω i1 > 0, and all the other ω jk are zero.
We consider the unnormalized graph Laplacian L = D − A (r = 0 in (8)). Since d 1 = n j=2 ω 1j and d i = ω 1i , for i ∈ {2, . . . , n}, we can explicitly compute the characteristic polynomial of L: If all non-zero edge weights have the same value ω, this simplifies considerably to We now let S = {1} and note that χ S has the explicit expansion in terms of these eigenvectors, We now consider the (MBO τ ) iterates of χ S . We compute e −∆τ χ S = n −1 χ V + (n − 1) 15 Here, subscripts j denote the components of the vectors.

49
Thus pinning occurs if τ < τ c := 1 nω log 2 n−1 n−2 . If τ > τ c , the solution to the (MBO τ ) evolution gives the stationary solution, 0, after exactly one iteration. The bound from Theorem 4.2 states that pinning occurs if τ < 1 nω log 3 2 . The bound in Theorem 4.3 states that trivial dynamics occur if τ > 1 ω log 2 n−1 n−2 . Qualitatively, this example shows that it is easier for a solution to be pinned on nodes with smaller degree.
We now consider an implication of Theorem 4.8 in the case where the graph induced by S 1 is a star graph with node 1 as center, i.e., d i = 0 for all i ∈ S 1 . This is certainly true for the case when the graph is a star graph and S = {2, . . . , n}. The following lemma states, in the case where r = 1, a simple criterion on the degrees for which there exists a τ interval in which node 1 switches phase in a single iteration of (MBO τ ). We will see an application of the lemma in Section 6.3. Lemma 6.1. Let r = 1 and consider the case where the subgraph induced by S 1 is a star graph with node 1 as center, i.e., d i = 0 for all i ∈ S 1 . If either then there exists a nontrivial τ interval (as in Theorem 4.8) such that node 1 will change phase in the next (MBO τ ) iterate.
Proof. We see from condition (39) (or via direct computation from (38)) that a sufficient condition for τ 1 < τ 2 , is to have, for all i ∈ S 1 , is the only node in S 1 , then ω i1 = d 1 and we find the condition d i < d 1 . If however ω ij , and thus the condition on d1 di can be replaced by the simpler (but stronger) condition d i < d 1 , for all i ∈ S 1 .

A regular tree
We consider the (MBO τ ) iterations on a regular tree as in Figure 2.
Let ω ij = ω, for all (i, j) ∈ E, and r = 1. As in Figure 2a, we consider the case where the initial set S consists of the leaves of a branch. We first observe that the subgraph induced by S j , for any j ∈ V , is a star graph with node j as center, i.e., d i = 0 for all i ∈ S j (for an example of a star graph with five nodes, see Figure 1b  that the hypothesis of Lemma 6.1 are satisfied with nodes 9 and 10 each playing the role of "node 1" in the lemma. Applying Lemma 6.1 to node 9 in Figure 2a where S = {1, 2, 3, 4}, we see that there exists a τ such that node 9 will change in the next iteration. By symmetry, node 10 will change in the same iteration. If node 13 doesn't change in the first MBO iteration, Lemma 6.1 can be applied again (because node 13 has two children, the hypotheses of the lemma are again satisfied with 9, 10 ∈ S 13 ) to show that there exists a τ such that node 13 will be added to the set. After node 13 has been added to the set, S, as in Figure 2b the MBO iterates are stationary. To see that node 15 cannot be added to S, assume that it were. Then the value of the Lyapunov functional, (33), must have decreased. But by symmetry, in the next MBO iteration, node 15 will be removed from S, again decreasing the value of the Lyapunov functional, a contradiction. The final configuration in Figure 2b minimizes the normalized cut, as defined in Section 2.2.
This argument is easily generalized to trees where each node, excluding leaves, has the same number of children c ≥ 2.

A small square grid
Here we construct an explicit example where Theorem 4.8 can be applied to show that there exists a time interval (τ 1 , τ 2 ) such that a node is guaranteed to change in one iteration of the (MBO τ ) algorithm.
Consider a three by three (nonperiodic) square grid as in Fig such that conditions (39) are satisfied. Furthermore, with > 0, and even the full condition (38), for r = 1, is satisfied. From (36) we can then compute for the time interval (τ 1 , τ 2 ) of Theorem 4.8.

Torus graph
Consider the n-cycle, C n with n nodes. The nodes are arranged in a circle and each node is connected to its 2 neighbors. We take ω ij = ω for i ∼ j and zero otherwise. See Figure 1d. We consider the unnormalized graph Laplacian L = D −A (r = 0 in (8)). In this case, L is a circulant matrix diag({−1, 2, −1}, {−1, 0, 1}). The eigenpairs {(λ j , v j )} n j=1 are given by We then consider the 2-torus graph, T 2 n1,n2 which is the Kronecker (tensor) product of the n 1 -and n 2 -cycles. See Figure 3. In particular, if u and v are eigenfunctions of the graph Laplacian on C n1 and C n2 with corresponding eigenvalues α and β respectively, then w = u ⊗ v (with w i,j = u i v j ) is an eigenvector of T 2 n1,n2 with 52 corresponding eigenvalue α + β. In particular, the spectral radius of the Laplacian is ρ = 8ω.
Consider for a moment T 2 n1,n2 as a discretization of the torus, T 2 . The nontrivial minimal-perimeter subsets of T 2 are given by "strips". Thus we might expect that for some initial condition, χ S , S ⊂ V the evolution by MBO, Allen-Cahn, or MC would converge to a strip.
We consider the (MBO τ ) evolution on a 32×12 torus with ω = 1 and initial condition, as in Figure 3a. For τ = 1.12, the solution is stationary after 4 iterations once the "high curvature corners" have been removed, as in Figure 3b. For τ = 4, the solution evolves into a minimal-perimeter "strip" in 5 iterations, as in Figure 3c.
For the parameters in Figure 3b, we compute the guaranteed stationarity bounds in (30) and (31) to be τ ρ ≈ 0.0057 and τ κ = 1 4 , respectively, showing these bounds are not sharp.
Consider (MCF ðt ), with S n equal to the minimal-perimeter strip in Figure 3c. Then S n+1 = S n is a minimizer of F(·, S n ), but so are S n+1 = S n ∪ ∂(S c n ) and S n+1 = S n \ ∂S n (or variations in which only one 'vertical line' of the boundary is added or removed). This illustrates a possible type of non-uniqueness for (MCF ðt ), which occurs when S n is totally geodesic (i.e., its boundary is a geodesic). To reiterate, the stationary solution in Figure 3b is frozen (due to the smallness of τ ), while the solution in Figure 3c is totally geodesic. 53

Buckyball graph
Consider the buckyball graph with 60 nodes and 90 edges with ω ij = ω for all edges (i, j) as in Figure 4. The graph is regular; each node has degree 3ω. Consider for a moment the buckyball graph as a (coarse) discretization of the sphere, S 2 . There are no nontrivial minimal-perimeter 54 subsets of S 2 . Great circles are the only nontrivial stationary submanifolds of S 2 (and have constant curvature). In fact, great circles are totally geodesic. Thus we might expect that for any initial condition, χ S , S ⊂ V such that |S| = |S 2 |/2, the evolution by MBO, Allen-Cahn, or MC would converge to a stationary solution, either 0 or χ V depending on the initial mass, M (χ S ) = vol S. If S is chosen to be a symmetric partitioning of the nodes for the buckyball graph, we expect that the (MBO τ ) evolution will be stationary for all values of τ . The bound from Theorem 4.2 states that pinning in (MBO τ ) occurs if The bound in Theorem 4.3 states that trivial dynamics occur if We find numerically that λ 2 ≈ ω 1−r 3 −r · 0.2434 and λ n ≈ ω 1−r 3 −r · 5.6180. For initial condition χ S , with |S| = 14, as given in Figure 4 (top left), and r = 0 and ω = 1, Theorems 4.2 and 4.3 predict that pinning occurs if τ < 0.0223 and trivial dynamics occur if τ > 15.1811. We find numerically that this initial condition is pinned if τ < 1.89 and trivial dynamics occur if τ > 3.54. For intermediate values of τ , the iterates shrink to the empty node set. For τ = 2, the iterates take 3 iterations to reach steady state, as illustrated in Figure 4.
For the initial condition χ S where S is taken to be a symmetric partitioning of the nodes, (MBO τ ) evolution is pinned for all values of τ .

Adjoining regular lattices
We consider the graph which is composed by adjoining a square and triangular lattice. See Figure 5. We take r = 0 and ω ij = 1 for i ∼ j and zero otherwise. Note that the degree of a node in the triangular lattice is 6 and the degree of a node in the square lattice is 4.
To test the intuition from the star graph (see Section 6.2) that it is easier for the solution to pin on nodes with smaller degree, we consider the initial condition given in the top left panel of Figure 5. The mass is initially distributed over both the square and triangular lattice sites. We consider the (MBO τ ) evolution with τ = 0.8. The solution moves freely on the lattice sites with degree > 4, i.e., on the triangular lattice. However, on the square lattice, the solution only 'rounds corners'. 55   The nodes on the 'border' of the graph (where the regular lattice was cut) have smaller degree. In Figure 6, we demonstrate that the solution can also be pinned on the border. Again, the initial condition is given in the top left panel. In this simulation, we take τ = 0.9. Away from the boundary, the solution set can again shrink freely. However, the solution becomes pinned on the border. 58

Two moons graph
In this last example, we consider a graph which is widely used as a benchmark problem for partitioning algorithms. Our construction of the graph follows [12]. The graph is generated by first randomly distributing 600 points in a region described by two half arcs in R 2referred to as "two moons". See Figure 7 (top left). The points are then embedded in R 100 and randomly perturbed by i.i.d. Gaussian noise with mean zero and standard deviation σ = 0.1. Let k = 10.
The edge weights are chosen to be and d i is the Euclidean distance between x i and its k-th nearest neighbor. We then take the symmetrized k-nearest neighbors graph. This is given in Figure 7 (top right). We consider the (MBO τ ) evolution with τ = 5 and initial condition as shown in Figure 7 (bottom left). After 9 iterations, the (MBO τ ) evolution converges to the state in Figure 7 (bottom right).
We want to stress that the two moons example is meant as an illustration of the (MBO τ ) algorithm on a more complex toy graph. In this paper we do not aim to compete in terms of accuracy or efficiency with existing clustering methods, hence we will not focus on those aspects of the two moons example.

Discussion and open questions
Motivated by curvature flows in continuum mechanics, we described several analogous processes on graphs. In particular we used the graph total variation, or graph cut, to define curvature on graphs, which we then related to the graph Allen-Cahn equation, graph MBO scheme, and graph mean curvature flow. The continuum intuition for these problems suggests many results, some of which we proved in this paper, some which we have shown cannot hold on a graph because of the lack of infinitesimal length scales, and some which we state below as, still unproven, open questions.
In a sense to be made precise, for a suitable choice of τ (not too small, not too large, depending on ω, most likely depending on the graph's spectrum), the dynamics of (MBO τ ) are expected to approximate those of graph MCF. An approach to Question 7.1, uses the Taylor series expansion for the solution of the graph heat equation: for S ⊂ V . Hence, we can rewrite the Lyapunov functional J from (33) as Using 1 − χ S , χ S V = 0, 1, ∆χ S V = 0, and (3), we find J(χ S ) = τ TV 1 a (χ S ) + R S (τ ). This connection between the Lyapunov functional J and the total variation, and hence the MCF functional F from (25), strengthens the plausibility of a positive answer to Question 7.1. A more difficult question, which could be of great use in numerical problems, is how we can estimate the number iterations of (MBO τ ) needed to go from some initial data to a minimizer of the Ginzburg-Landau functional or graph cut functional. Question 7.2 (Minimizing graph cut). For any, a priori specified, approximation error, is there a local, quantitative bound on the number of iterations of (MBO τ ) needed to approximate a minimizer of the graph cut functional TV q a up to the specified accuracy? "Local" means here that the bound does not rely on the spectrum of the graph, but instead uses quantities such as graph curvature κ q,r or the total variation TV q a (χ N ) for some local graph neighborhood N ⊂ V . The analogous question can be asked for (MCF ðt ).
In Theorem 4.8 it was shown that, if the curvature at a given node is sufficiently large and the time step τ in (MBO τ ) is chosen in the right interval, then the value at the node will change in one (MBO τ ) iteration. The next question is the analogous statement for the Allen-Cahn equation, (ACE ε ). Question 7.3 (Non-freezing for Allen-Cahn). Let ε be in some positive interval and let u ε be a solution of the Allen-Cahn equation (ACE ε ) for this choice of ε. Suppose that the curvature (κ 1,r S ) i of S = {j ∈ V : u ε j (t 0 ) ≤ 0} at a node i ∈ S, is sufficiently large (possibly depending 60 on ε). Is there is some interval of positive times such that u ε i (t 0 +h) > 0 for h in this interval?
Because (ACE ε ) is derived from the graph functional (GL ε ), we suspect that the correct curvature in Question 7.3 is κ 1,r S , the curvature related to the anisotropic functional 1 2 TV 1 a , which was identified as the Γ-limit of (GL ε ) for ε → 0 in earlier work [107], and not the curvature which can be derived from the isotropic total variation functional TV as the continuum case might suggest at first glance. Since we have seen that pinning occurs for small enough ε, full convergence is not expected here, but the numerical examples of Section 6 suggest an approximate result for small ε is feasible. Question 7.4 (Allen-Cahn and graph Mean Curvature Flow). Is there an ε > 0 such that, given the solution u ε to (ACE ε ) for some > 0, there is an increasing sequence of times t n for which either the sets S n := {j ∈ V : u ε j (t n ) ≤ 0} or the sets S n := {j ∈ V : u ε j (t n ) ≥ 0} form a solution to the graph MCF?
Furthermore, among sequences with this property is there exactly one sequence {t n } that is maximal in the following sense: there exists no sequence {t n }, of which {t n } is a strict subsequence, such that {S t n } n ⊂ {S tn } n and {S t n } is still a solution to the graph MCF?
A different question is how the graph MCF behaves in the continuum limit, when it is formulated on a sequence of graphs which are ever finer discretizations of some continuum space. We expect that it should give back the usual MCF in the continuum limit, or some anisotropic MCF, as the convergence results in [107] show the final limit could crucially depend on the scaling in ðt and the discretization parameter (which will show up in the graph weights). This question is similar (and perhaps equivalent) to the convergence of discretization schemes for the usual MCF. Similar questions can be asked about the graph Allen-Cahn equation and graph MBO scheme. Question 7.5 (Stability of graph MCF, MBO, and ACE, in the continuum limit). Suppose we are given any sequence of graphs (V k , ω k ij ), k ∈ N, converging in the Gromov-Hausdorff sense to a Riemannian manifold (M, g). Is there a fixed time interval such that, as k → ∞, any sequence generated by (MBO τ ) with τ in this interval converges to a sequence generated by the (possibly anisotropic) continuum MBO algorithm in M (with the Laplacian induced by g)? Accordingly, do solutions of (ACE ε ) converge to solutions of the (possibly anisotropic) continuum Allen-Cahn equation in M , and do solutions to (MCF ðt ) converge to viscosity solutions (via the level set formulation) of (possibly anisotropic) continuum MCF in M , with initial data given by the limit of the initial data in each V n ?
As explained in Appendix A, MCF is closely related to certain models of continuum phase transitions, particularly Allen-Cahn and Ginzburg-Landau dynamics. However, another important connection with statistical mechanics involves Ising models and other interacting particle systems, which are known to converge in the mesoscopic limit to flow by mean curvature. In work of Katsoulakis and Souganidis [66,67] convergence to a viscosity solution of MCF is first proved. See also the related work of Funaki and Spohn [50] where MCF is derived as a deterministic limit of stochastic Ginzburg-Landau dynamics. On the other hand, there is vast literature concerned with (for instance) the Ising model (and its generalizations) on graphs [69,70], see also Durrett's book [37]. This suggests the following question.
Question 7.6 (Possible probabilistic interpretations of graph MCF, MBO, and AC). Is (MCF ðt ) related to an interacting particle system on the underlying graph? Also, are there interacting particle systems or stochastic processes in V that are related to (MBO τ ) or (ACE ε )? Finding such a system would partly resolve the issue that a front moving on a graph in continuum time necessarily does so in a way that, from a continuum point of view, looks discontinuous (as discussed previously in this paper, in particular in Sections 3.3 and 5.), as the particle dynamics would be continuous in time and stochastic. The convergence results in [66,67,50] show that the above question has an a priori higher chance of having a positive answer for a large graph, as it already holds in the continuum limit. An interesting direction would be to investigate the relation between such probabilistic interpretations and 'spread of information' dynamics such as bootstrap percolation [23], gossip algorithms [94], and replicator dynamics [52,53,96].

A. The continuum case
In this appendix, we briefly review and provide references for the Allen-Cahn equation, the MBO algorithm, and mean curvature flow in the continuum setting.
A.1. The continuum Allen-Cahn equation The Allen-Cahn equation is a reaction-diffusion equation, given by where u : R n × R + → R and ∆ is the standard Laplacian (although other linear elliptic operators can be considered as well), and f is a non-linear function of the form f = −W where W : R → R is a double well potential with two global minima. For simplicity, take W (u) = (u + 1) 2 (u − 1) 2 , where the minima are at ±1. A question which is always of interest is understanding the way that solutions to (43) converge to equilibrium. For each fixed x, one expects that u(x, t) approaches either 1 or −1, as t → +∞, as these values correspond to the minima of W . This indicates that for very large t the function u defines two regions of R n , where it is very close to either 1 or to −1, separated with a smooth transition layer in between.
This asymptotic behavior is well understood nowadays. Rescaling (x, t) as ( x ε , t ε 2 ), we obtain the equation Note that for very small ε the function u ε describes the long time behavior of the original u. Then, it is well known (see [6,16] for background and discussion) that, as ε → 0 + , the solutions u ε (x, t) converge to a function which takes the value −1 in some set S t (depending on time) and takes the value 1 in S c t . Here S t is a set whose boundary is evolving by mean curvature flow (see Section A.2).
Although the original motivation for studying (43) was phase transitions, it is also the gradient flow of the Ginzburg-Landau functional. Precisely, equation (44) is the L 2 gradient flow of the functional It is expected that solutions to (44) converge to a local minimum of this functional, as t → +∞, thus schemes for (44) could be used for approximating minima of (45). This is the application that serves as the biggest motivation in the graph setting. For more information about reaction-diffusion equations with a polynomial nonlinearity we refer to [106, Section 1.1].

A.2. Continuum mean curvature flow
Mean curvature flow (MCF) consists of the evolution of a closed, oriented hypersurface Σ t ⊂ R d over time, such that the inner normal velocity at a given point of Σ t is equal to the mean curvature of Σ t at that point. The study of such a flow has been greatly motivated by phase transition models in crystal growth and materials science, in particular since the important work of Allen and Cahn [2]. Starting with the seminal work of Brakke [15], the mathematical study of this flow has been vast, and has involved areas of mathematics ranging from differential geometry to stochastic control. The use of MCF is now widespread in the modeling of moving fronts [19]. The reason why MCF is so ubiquitous in the phase transitions literature, is that many singular limits of reaction diffusion equations (i.e., singular limits of Ginzburg-Landau dynamics) converge to motion by mean curvature. See [16,87,5] for precise convergence theorems and further discussion.
A well known feature of MCF is both the formation of singularities and the occurrence of topological changes, regardless of the smoothness of the initial data. A significant portion of the literature on MCF deals with notions of weak solutions, the first of which goes back to Brakke [15]. Partial regularity for weak solutions as well as regularity up to the first singular time have been widely studied [38].
An equivalent formulation of the flow looks not only at the hypersurface Σ t , but at the entire domain Ω t bounded by it, so that ∂Ω t = Σ t . Accordingly, it is said that Ω t itself is evolving by mean curvature flow. This perspective is natural for phase transitions.
Let φ(t, ·) : R d → R be the signed distance function to the set Ω at time t. From the level set method perspective [85], the motion by mean curvature 16 of Ω t corresponds to an initial value problem 16 In the literature two related, but different, concepts of mean curvature appear. One corresponds with the factor div ∇φ |∇φ| in (46), the other has a normalization factor 1 d−1 , where d is the dimension of the space. This normalization by the dimension of the hypersurface justifies the "mean" part of "mean curvature". for a fully non-linear degenerate parabolic equation, φ t = F (D 2 φ, ∇φ), φ(·, 0) = φ 0 , where F (D 2 φ, ∇φ) = −|∇φ|div ∇φ |∇φ| .
Then, when there is a smooth solution φ(x, t), the domains given by Ω t := {φ(·, t) < 0} will be evolving by mean curvature flow and will start from the original domain Ω. In general, even for an initial domain with a smooth boundary, a smooth solution might not exist for all times, and one must work with viscosity solutions. In this context, the convergence of the MBO scheme (47) (explained in Section A.3) to such viscosity solutions was proved by Evans [46]. It is worth remarking that Soner and Touzi in [97] interpret MCF as a stochastic control problem. In this interpretation, one controls a Brownian motion for which one is allowed to turn off diffusion in one given direction. The surface Σ t in this case arises as the set of points that can be reached with probability 1. This probabilistic interpretation is quite different from those mentioned in the discussion at the end of Section 7.
Finally, given the affinity with the graph setting, it is worthwhile to comment briefly on the more recent nonlocal mean curvature flow. Caffarelli and Souganidis [18] arrive at this flow by following a nonlocal and continuum analogue of (MBO τ ), where instead of using the Laplacian one uses a fractional power of the Laplacian (−∆) s with s ∈ (0, 1/2). A level set formulation based on viscosity solutions was developed later by Imbert [62].

A.3. The continuum MBO algorithm
The Merriman, Bence, and Osher (MBO) algorithm [75,76,77], also known as the threshold dynamics algorithm, approximates the dynamics of mean curvature flow (46) by alternatively applying diffusion and thresholding operators. Let χ(t, ·) be the characteristic function of the set Ω t at time t. Define the diffusion operator χ 0 → u(t, ·) := e t∆ χ 0 to be the solution of the initial value probleṁ u = ∆u, u(0) = χ 0 (·).
Define the threshold operator The MBO evolution of a set described by u at time T can then be succinctly written χ(T, ·) = P e τ ∆ k χ 0 , where τ = T /k (47) is the 'time step' and k is a parameter. In [46,5] convergence of the MBO algorithm to motion by mean curvature, defined in (46), as k ↑ ∞, is proven. The MBO scheme and its implementation has evolved considerably since its original proposal. We provide a short, non-exhaustive, overview here. In [72,92], the MBO scheme was extended to multiplephase problems. In [91,93], a spectral discretization of the MBO scheme for motion by mean curvature was proposed, which is much more efficient then finite difference approaches. This approach can be applied to both two-phase and multi-phase problems. Convergence for an anisotropic variant of the MBO scheme was proven in [26], and in [43] diffusion generated motion was applied to higher order geometric motions. In [45], the MBO scheme was extended to a thresholding method for approximating the evolution by gradient descent of the Mumford-Shah functional and applied to image segmentation problems. In [44] the authors study MBO-like schemes which use the signed distance function. Recent work [42] presents new algorithms for multiphase mean curvature flow, based on a variational description of the MBO scheme.
It is well-known that, in a finite difference scheme for the MBO algorithm, the time step τ (equivalently k) in (47) must be chosen carefully and in the limit as k ↑ ∞, the discretized MBO evolution is stationary. In fact, when discussing the numerical implementation of the algorithm on a discrete grid, Merriman, Bence, and Osher [75] observe: "The basic requirement is that [the time step, τ ,] be short enough so that the local analysis . . . is valid, but also long enough so that the boundary curve moves by at least one grid cell on the spatial grid (otherwise the curve would be stuck)." They derive heuristic upper and lower bounds on the time step, τ , for the algorithm to approximate motion by mean curvature.

B. Calculation of the first variation for graph total variation
In (28) we computed d dt t=0 TV q a (u + tv) = sgn(∇u), ∇v using the convexity of TV q a . In this section we review this fact to other kinds 66 of graph total variation, which are expressible as T V(u) := max{ div ϕ, u V : ϕ ∈ A}, where A ⊂ E e is some admissible set of edge functions 17 , such that a maximizer ϕ u ∈ A exists (even if it might not be unique). The key fact is that such a T V(u) is convex and might be studied via convex analysis. The convexity of T V is evident from its definition: u → T V(u) is a scalar valued function given as the maximum of a family of linear functions u → div ϕ, u V . Let us recall some concepts from convex analysis [39, Chapter 1, Section 5], in particular, the subdifferential of T V at u. This set valued function is denoted by ∂T V(u) and given by That is, v ∈ T V(u) if and only if it is the slope of an affine function which is tangent to the graph of T V at u. In particular, at the points where T V(u) is differentiable ∂T V(u) consists of a single element: the gradient of T V(u) at u.
In the particular case of T V, it follows that Indeed, note that by (48), for any w ∈ V, T V(w) = div ϕ w , w V , where ϕ w ∈ A is a maximizer in (48). It follows that, if u ∈ V is given, then v ∈ ∂T V(u) ⇔ div ϕ w , w V ≥ div ϕ u , u V + div v, w − u V .
By choosing w = 0 and w = 2u, respectively, we find v, u V = div ϕ u , u V = T V(u). This proves the set identity (49) for any u ∈ V.
On the other hand, the convexity of T V(u) implies it is a locally Lipschitz function, making it differentiable for a.e. 18 u ∈ V by Rademacher's theorem [47, Chapter 6, Section 6.2, Theorem 2]. Therefore, for a.e. u ∈ V, ∂T V(u) contains a single element v = div ϕ u . Then, for a.e. u, it follows that d dt t=0 T V(u + tv) = div ϕ u , v V Now we can consider particular choices for T V and hence for A. For T V = TV q a , we have ϕ u = ϕ a from (1), as already explained in Remark 3.14. Similar computations can be done if we take A's corresponding to ϕ u given respectively by For the latter we can also write where we have to remember that ω q ij ui−uj |∇u|j and 1 |∇u|j are to be interpreted as 0 whenever |∇u| j = 0 for any j ∈ V (including j = i). Because the node function div ϕ TV is the first variation of the isotropic graph total variation, in the literature it is sometimes referred to as curvature or 1-Laplacian. In this paper we have argued why the use of the anisotropic total variation TV a to define curvature, as in (14), is a more natural choice on graphs.