Mean Curvature, Threshold Dynamics, and Phase Field Theory on Finite Graphs

In the continuum, close connections exist between mean curvature flow, the Allen-Cahn (AC) partial differential equation, and the Merriman-Bence-Osher (MBO) threshold dynamics scheme. Graph analogues of these processes have recently seen a rise in popularity as relaxations of NP-complete combinatorial problems, which demands deeper theoretical underpinnings of the graph processes. The aim of this paper is to introduce these graph processes in the light of their continuum counterparts, provide some background, prove the first results connecting them, illustrate these processes with examples and identify open questions for future study. We derive a graph curvature from the graph cut function, the natural graph counterpart of total variation (perimeter). This derivation and the resulting curvature definition differ from those in earlier literature, where the continuum mean curvature is simply discretized, and bears many similarities to the continuum nonlocal curvature or nonlocal means formulation. This new graph curvature is not only relevant for graph MBO dynamics, but also appears in the variational formulation of a discrete time graph mean curvature flow. We prove estimates showing that the dynamics are trivial for both MBO and AC evolutions if the parameters (the time-step and diffuse interface scale, respectively) are sufficiently small (a phenomenon known as “freezing” or “pinning”) and also that the dynamics for MBO are nontrivial if the time step is large enough. These bounds are in terms of graph quantities such as the spectrum of the graph Laplacian and the graph curvature. Adapting a Lyapunov functional for the continuum MBO scheme to graphs, we prove that the graph MBO scheme converges to a stationary state in a finite number of iterations. Variations on this scheme have recently become popular in the literature as ways to minimize (continuum) nonlocal total variation.


Introduction
Motion by mean curvature and flows involving mean curvature in general appear in many important continuum models, including models coming from materials science [81,105], fluid dynamics [58], and combustion [111,88]. All such models involve a front propagating with a velocity depending on the mean curvature of the front. Recently, there has been an increasing interest in using ideas from continuum PDEs (related to mean curvature) in discrete applications such as image analysis, machine learning and data clustering [9,107,74,51,60].
This paper initiates a systematic study of the definition of mean curvature for vertex sets of an arbitrary graph G = (V, E). We examine the mathematical underpinnings of some of the algorithms in the recent papers mentioned above. The graphs considered are arbitrary graphs and are not necessarily obtained as the discretization of a continuum problem, so our perspective is only parallel to one that is purely motivated by numerical analysis. In particular, we do not assume an embedding of the graph in a low dimensional space.
Of course, the various definitions of curvature in the (usual) continuum setting (see Appendix A for a brief review and some references) motivate and inform our approach to defining the curvature of a vertex set S ⊂ V using the discrete total variation norm and the discrete divergence of a "normal" edge flow. Since they are closely related to questions of mean curvature, the Allen-Cahn equation and the MBO scheme for arbitrary graphs G = (V, E) arise naturally in the present investigation. Theoretical and numerical examples are used to highlight possible connections between all these concepts, leading to a number of open questions given in Section 7.
Graphs Laplacians, Allen-Cahn, and MBO. Graph Laplacians are the central objects of study in spectral graph theory [29]. These graph operators share many properties with their continuum counterparts. The Allen-Cahn equation on the graph V is defined in terms of the graph Laplacian, Δ, and any (typically bistable quartic) potential, W . One considers a phase field, u : V × R + → R, solving the differential equation,u

Setup
We work on a finite 2 , undirected graph G = (V, E) with vertex set 3 V = {i} n i=1 and edge set E ⊂ V 2 . The graph is weighted; each edge (i, j) ∈ E, incident on nodes i and j, is assigned a weight ω ij > 0. Since the graph is undirected, (i, j) is identified with (j, i) and ω ij = ω ji . To simplify notation we extend ω ij to be zero for all (i, j) ∈ V 2 which do not correspond to an edge. The degree of node i is d i := j∈V ω ij . Denote the maximal and minimal degrees by d + := max i∈V d i and d − := min i∈V d i . We assume that G has no isolated nodes, i.e., d − > 0. For each i ∈ V we then have a non-empty set of neighbors N i := {j ∈ V : ω ij > 0}. We also assume G has no self-loops, i .e., ω ii = 0. In particular i ∈ N i .
Let V be the space of all functions V → R and E the space of all skewsymmetric 4 functions E → R. Again to simplify notation, we extend each ϕ ∈ E to a function ϕ : V 2 → R by setting ϕ ij = 0 if (i, j) ∈ E. As justified in earlier work [107] we introduce the following inner products and operators for parameters q ∈ [1/2, 1] and r ∈ [0, 1]: Note that in the sum in ϕ, φ E the indices i and j both run over all nodes. The edges (i, j) and (j, i) are counted separately, hence the correction factor 1 2 . Note that the powers 2q − 1 and 1− q in the E inner product and 'dot product' and in the gradient, are zero for the admissible choices q = 1 2 and q = 1 respectively. In these cases we define ω 0 ij = 0 whenever ω ij = 0, so as not to make the inner product, 'dot product', or gradient, nonlocal on the graph. The inner products on V and E 5 are analogous to a weighted L 2 inner products in the continuum case, while the 'dot product' inner product (ϕ · φ) i is analogous to a weighted dot product between vector(field)s (without the integration of the L 2 inner product). A direct computation shows that div and ∇ are adjoints with respect to ·, · V and ·, · E , namely for u ∈ V and φ ∈ E we have The characteristic function of a node set S ⊂ V is χ S ∈ V, defined via This leads to the following associated norms, Laplacians, set measures, and total variation functionals: • Inner product norms, u V := u, u V and ϕ E := ϕ, ϕ E . • Maximum norms 6 , u V,∞ := max{|u i | : i ∈ V } and ϕ E,∞ := max{|ϕ ij | : i, j ∈ V }. • The norm corresponding to the dot product |ϕ| i := (ϕ · ϕ) i . Note that |φ| ∈ V. • The Dirichlet energy • The graph Laplacian Δ := div • ∇: V → V. So (Δu) i := d −r i j∈V ω ij (u i − u j ). 5 Note that ·, · E is indeed an inner product on the space of (skew-symmetric) functions E → R, but not on the space of functions V 2 → R, because for those functions the 'inner product' can be zero for nontrivial functions. 6 To justify these definitions and convince ourselves that there should be no ωij or di included in the maximum norms we define ϕ p E,p := 1 2 i,j∈V ϕ p ij ω 2q−1 ij . Adapting the proofs in the continuum case, e.g., [  It is worth noting that the sign convention for the graph Laplacian is opposite to that used for the continuum Laplacian in most of the PDE literature (in particular, the graph Laplacian Δ is a positive semidefinite operator). When r = 0, Δ is referred to as the unnormalized weighted graph Laplacian. When r = 1, Δ is referred to as the asymmetric normalized graph Laplacian or random walk graph Laplacian. Another Laplacian, often encountered in the spectral graph theory literature, is the symmetric normalized graph Laplacian. This one falls outside the scope of the current setup and will not be considered in this paper. For general references on the graph Laplacian, consult [80,29,108,10]. Note that |S| is just a special case of vol S, for r = 0 (recall we assume d − > 0). • The anisotropic total variation TV q a : V → R: TV q a (u) := max{ div ϕ, Here, the signum function is understood to act element-wise on the elements of ∇u. For the isotropic total variation on graphs, see Remark 2.1. The maximum in the definition of TV q a is achieved by ϕ = ϕ a := sgn(∇u). (1) Note that the values ϕ a takes on the set {∇u = 0} are irrelevant for achieving the maximum, hence this function is not uniquely determined.
The anisotropic total variation of the indicator function 7 for the set S ⊂ V , denoted χ S , is given by TV q a (χ S ) = i∈S,j∈S c ω q ij .
Thus, the total variation of a set S is equivalent to the graph cut between S and S c := V \ S which is used in graph theory and spectral clustering [95]. For future reference it is useful to note that Remark 2.1. One can also define an isotropic total variation on graphs TV : V → R: 7 For χV , the indicator function of the full node set, we also write the constant function 1.
Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 9 The isotropic total variation also has a 'maximum formulation', if we are willing to let go of the skew-symmetry condition for functions in E. To be precise, define the extended set of edge functions E e as the set of all functions E → R, and extend the definition of divergence (compatible with the earlier definition) to functions ϕ ∈ E e : (div ϕ) i := 1 2d r i j∈V ω q ij (ϕ ji − ϕ ij ).
Using the same inner product on E e as on E, this divergence is again the adjoint of the gradient ∇. Then we can write The maximum is achieved by taking As in the anisotropic case, the values ϕ T V takes on the set {∇u = 0} are irrelevant for achieving the maximum, hence this function is not uniquely determined. The quantity div ∇u |∇u| is often referred to as the 1-Laplacian of u. Lemma 2.2. The norms · V and · V,∞ are equivalent, with optimal constants given by d Next we recall the definitions of node set boundaries and (signed) graph distance.
Then, the graph distance between arbitrary i, j ∈ V is given by where the minimum is taken over all paths γ with i 1 = i, i N = j. In other words, d G ij is the minimal distance to go from node i to node j, traveling only via existing edges, where each edge represents a distance ω q−1 ij . For a given set S ⊂ V , we define the graph distance to S at each node as the minimal graph distance to a node in S:

10
Y. van Gennip, N. Guillen, B. Osting and A.L. Bertozzi Vol. 82 (2014) As argued in, for example, [71, Section 3.1, Example 2], d S is the solution u ∈ V to the equation Note that ∂S ⊂ S. Alternative definitions appear in the literature in which ∂S ⊂ S c .

Basic spectral properties of the graph Laplacian, Δ
In this section, we collect a number of spectral properties of the graph Laplacian Δ : V → V. Further discussion and details for the special cases r = 0 and r = 1 can be found in [80,29,108,10], from which our presentation follows.
Note that Δ : The eigenvalues of the graph Laplacian, Δ, are then defined via the variational formulation, The minimum in (5), is attained when F k is spanned by the first k eigenfunctions, i.e., the eigenfunctions corresponding to the k smallest eigenvalues, counting multiplicities. In particular, there are n non-negative real eigenvalues (counted with multiplicity), denoted {λ k } n k=1 . If we denote the span of the first k − 1 eigenfunc- where u ⊥ VFk−1 indicates that u is orthogonal (in the sense of ·, · V ) to u i , for i ∈ {1, . . . , k − 1}. Taking variations of the Rayleigh quotient with respect to u, we find that (λ k , u k ) satisfies (6) if and only if u ⊥ VFk−1 and, for all v ∈ V, 8 Similarly, by changing the "strictly less than" inequality into "strictly larger than" inequality the boundary ∂(S c ) of the set S c is defined. The reduced boundary of S can be defined as the following subset of ∂S (compare with the continuum case in [4, Definition 3.54]): and again similarly for ∂ * S c .
Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 11 Finally, unwinding the definitions, we find that (7) is equivalent to the matrix eigenvalue problem where A ij = ω ij and D ii = d i is a diagonal matrix 9 . We remark that the change of variables, y = D 1/2 x, in (8) gives the standard eigenvalue problem Recall that the spectral radius ρ of Δ is defined as the maximum of the absolute values of the eigenvalues of Δ, Δu V u V , and the spectral radius are equal, Δ V = ρ(Δ). This implies that, for all u ∈ V, (c) The trace satisfies tr Δ = n k=1 λ k = i∈V d 1−r i . Consequently, (d) If G is not a complete graph, then (e) The second eigenvalue satisfies .
(b) Noting that Δ is a self-adjoint operator, a proof can be found in, for example, [89,Thm. VI.6].
(c) Because the trace of the operator Δ is equal to the trace of its matrix representation, we have tr Δ = tr L = n k=1 λ k . Since we assume there are no selfloops in the graph, tr D −r A = 0, hence tr L = tr D 1−r . Equation (9) follows from the fact that λ 1 = 0 and the maximum (minimum) of a set is greater (less) than or equal to the mean of the set.
(d) If G is not a complete graph, then there exists an Note that v, 1 V = 0. The desired upper bound then follows from (6).
Then v, The first inequality then follows from (6). For the second inequality, If j ∈ V is such that d j = d + , then the supremum is attained by the vector u = χ {j} and the result follows.
The following lemma states properties of the diffusion operator e −tΔ : V → V.
Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 13 (c) Let the mass, M , be defined as in (10), λ 2 be the second eigenvalue of the graph Laplacian, and ε > 0. Assume the graph is connected. If denote the eigenpairs of the graph Laplacian with V-normalized eigenvectors, then the spectral decomposition of u is given by Recalling from Lemma 2.5 that λ 1 = 0 and v 1 = (vol V ) − 1 2 χ V and using (11), we compute But by Lemma 2.2, this implies The result now follows, since by Lemma 2.5, λ 2 > 0 for a connected graph.
, for all t > 0, by the uniqueness theorem for ordinary differential equations. In this case there is nothing more to prove. Moreover, by repeating the argument on each connected component we may assume without loss of generality that the entire graph is connected.
Let u 0 , v 0 be such that, for all j ∈ V , we have (u 0 ) j ≤ (v 0 ) j , and for some j 0 ∈ V , (u 0 ) j 0 < (v 0 ) j 0 . We will show that in this case u j (t) < v j (t), for every j ∈ V and all t > 0, which proves the strong comparison principle and in particular the comparison principle.
which is a contradiction. We conclude that u(t 0 ) = v(t 0 ) at all neighbors of k, and by iterating the above argument we get in fact that u(t 0 ) = v(t 0 ) at all nodes of V , since we are dealing with the case where V is connected. By the uniqueness theorem for ordinary differential equations we conclude that u 0 = v 0 at all nodes, which gives a contradiction, and the strong comparison principle is proved.
Remark 2.7. The dependence of the convergence of u(t) to the steady state (vol V ) −1 Mχ V on the second Laplacian eigenvalue, λ 2 , in Lemma 2.6(c), is related to the rate of convergence of a Markov process on a graph to the uniform distribution [100]. Due to this property, λ −1 2 is sometimes referred to as the mixing-time for a graph. The second eigenvalue λ 2 is also referred to as the algebraic connectivity or Fiedler value for a graph [48], and plays an important role in many applications. The robustness of a network to node/edge failures is highly dependent on the algebraic connectivity of the graph. In the "chip-firing game" of Björner, Lovász, and Shor, the algebraic connectivity dictates the length of a terminating game [11]. The algebraic connectivity is also related to the informativeness of a least-squares ranking on a graph [86]. Consequently, algebraic connectivity is a measure of performance for the convergence rate in sensor networks, data fusion, load balancing, and consensus problems [84].

Relation between graph Laplacians and balanced graph cuts
In spectral graph theory there are some well known connections between the various graph Laplacians and different normalizations of the graph cut TV 1 a (χ S ) = i∈S,j∈S c ω ij from (2). For example, when r = 0 (and hence vol S = |S|), the quan- in Lemma 2.5(e) is the Cheeger cut and its minimum over S ⊂ V is the Cheeger constant, e.g., [95,102]. Let S 1 , . . . , S k be a partition of all the nodes V , for a given integer k and define the balanced graph cut We use the subscript r to remind us that vol S i depends on r. For r = 0 this quantity is known in the literature as the ratio cut 10 , for r = 1 as the normalized cut [95,108]. 10 Confusingly, the ratio cut is sometimes also called average cut, and the Cheeger cut  These quantities are introduced, because minimization of the graph cut, without a balancing term in the denominator, often leads to a partition with many singleton sets, which is typically unwanted in the application at hand. Minimization of this balanced cut over all partitions of V is an NP-complete problem [109,95], but a relaxation of this problem can be defined using the graph Laplacian. For example, in [108,Section 5] it is shown that where H is an n by k matrix with elements Note that H is orthonormal in the V inner product, i.e., H T D r H = I, where I is the k by k identity matrix. Hence the minimization of C r over all partitions, is equivalent to minimizing Tr(H T LH) over all n by k V-orthonormal matrices of the form as in (12). The relaxation of this NP-complete minimization problem is now formulated by dropping the condition (12) and minimizing over all n by k V-orthonormal matrices. The problem then becomes an eigenvalue problem and the first k eigenvectors of L are expected to be approximations to the indicator functions of the optimal partition S 1 , . . . , S k . There is often no guarantee of the quality of this approximation though [98,57,64,65,99,108]. In order to turn the approximations of the indicator functions into true indicator functions a method like thresholding or k-means clustering is often used [82]. For partitioning the nodes into two subsets (k = 2), the potential term in the Allen-Cahn equation (discussed in Section 5) can be interpreted as a nonlinear extension of the graph Laplacian eigenvalue problem which forces the approximate solutions to be close to indicator functions. In this light, it is interesting that the graph Ginzburg-Landau functional (with possibly a mass constraint), of which the Allen-Cahn equation is a gradient flow, Γ-converges to the graph cut functional [107].

Curvature and mean curvature flow on graphs
In this section we will derive a graph curvature, analogous to mean curvature in the continuum, and study some of its properties. We end the section with a description of mean curvature flow on graphs.

Graph curvature
In the continuum case, the mean curvature is given by (minus) the divergence of the normal vector field on the boundary of the set (see Appendix A.2 and (45)). This normal vector field achieves the supremum in the definition of total variation of a characteristic function (under sufficient smoothness conditions). Hence, we define the normal of a vertex set by using ϕ a from (1) which achieves the supremum of the anisotropic total variation.
As in the continuum case, we define the curvature of a set as the divergence of the normal.
Recall from (1), that ν S ij is not uniquely determined on {(i, j) ∈ E : (∇χ S ) ij = 0} (i.e., away from the boundary ∂S ∪ ∂(S c ), in the sense of Definition 2.4) and hence the value 0 in (13) is a choice corresponding to the extension of the normal field away from the boundary. This ambiguity is irrelevant when div ν S is coupled to the characteristic function χ S via the V-inner product, as in but care should be taken when trying to interpret the normal or the curvature outside this setting. Note that for q = 1 and The curvature, κ q,r S , has the property that it vanishes away from the boundary ∂S ∪ ∂(S c ). In particular, we see that the above mentioned ambiguity is also irrelevant for pairings with χ S c since Let S,Ŝ ⊂ V be two node sets, then (15) implies For the last two terms, we compute Mean Curvature, Threshold Dynamics,... 17 In particular, ifŜ = S \ {n} for a node n ∈ S, then Because we assume there are no self-loops, in the final equality ω nn = 0. A similar computation shows for n ∈ S c The preceding discussion implies the following: if Ω ⊂ V is such that S minimizes TV q a (χ S ) among all sets S ⊂ V such that SΔS ⊂ Ω, then we have that (compare with the nonlocal mean curvature, (23)). Here, Ω is a set where S and S are forced to agree (similar to enforcing a boundary condition in the continuum case). The two inequality conditions in (18) are opposite for the two sides of the interface between S and S c . This strengthens the heuristic idea that the 'real' interface, where there would be an equality condition, is lost due to the lower bound on the accessible length scales on a graph.

Remark 3.3.
It is interesting to make some connections between the graph total variation TV a and graph curvature κ 1,r S on the one hand, and the local clustering coefficient [110] on the other. Assume G is unweighted (and undirected, as per usual in this paper), then the clustering coefficient C i of node i, is the number of triangles node i is part of, divided by the number of possible triangles in the neighborhood of i, i.e.,  (2) and (16), we can rewrite, for r = 1, As for the continuum case, we can arrive at the graph curvature, κ q,r S , in several ways. We discuss some of them below. However, the analogy with the continuum curvature becomes even more apparent if instead of the standard mean curvature, we consider the nonlocal mean curvature (see Section 3.2).
If we (formally) compute the first variation of the continuum TV(u) over all functions u ∈ BV of bounded variation (not restricting ourselves to characteristic functions only), we find div ∇u |∇u| , the curvature of the level sets of u. In (28) and Appendix B we follow a similar procedure on the graph and again find div ϕ a , with ϕ a from (1).
Similarly, in the continuum case, an alternative definition of the mean curvature is div ∇χ S |∇χ S | which is a Radon measure defined on the boundary. However, |∇χ S | = 1 on the boundary of S, so that the mean curvature is simply Δχ S . As long as S is a rectifiable set, this computation can be made rigorous in the context of BV functions (see [47,Chapter 5]). Computing the analogous quantity on a graph, we find This is equal to κ 1,r S (compare with (3)). The choice q = 1 is a natural one, because it corresponds to the Γ-limit of the graph Ginzburg-Landau functional (GL ε ) [107], whose definition is given in Section 5.
In the continuum case the mean curvature κ(x) at the point x ∈ ∂S in the boundary of a set S ∈ R d satisfies the property that, if S is smooth enough, then for any given ball B δ (x) of radius δ and center x ∈ ∂S,

Vol.82 (2014)
Mean Curvature, Threshold Dynamics,... 19 Note that if S were a half space, then the expression on the right is zero for all δ, since ∂S would separate each B δ (x) in sets of equal volume. Thus κ measures how much ∂S deviates from cutting B δ (x) in sets of equal volume. The analogous computation on a graph, replacing the ball by the set Note that if r = 0 and G is an unweighted graph, such that ω ij ∈ {0, 1}, then in whose right hand side we recognize (18). In the last equality we used that

Remark 3.4.
We note that graph curvature is related to the process of bootstrap percolation [23,63,13], in which nodes on an unweighted graph switch from 'inactive' to 'active', if their number of active neighbors exceeds a given threshold value. This number corresponds exactly to the graph curvature (κ q,0 S ) i , if node i is inactive and S is the set of active nodes.
The equality in (19) has an interesting consequence on the level of Γ-convergence of functionals. For more information about the theory of Γ-convergence, we refer to [31,14]. The first result of Γ-convergence for the Ginzburg-Landau functional goes back to work of Modica and Mortola [79,78].
a nonnegative double well potential with wells at 0 and 1, and consider the functionals else.
Then f ε Γ → f 0 as ε → 0 (using any of the equivalent metrics on R n ).
Furthermore, if the double well potentialW satisfies a coercivity conditioni.e., there exists a c > 0 such that for large |u|,W (u) ≤ c(u 2 −1)then compactness holds, in the following sense: and let {u n } ∞ n=1 be a sequence such that there exists a C > 0 such that for all n ∈ N f εn (u n ) < C. Then there exists a subsequence Proof. The key point in the proof of the Γ-convergence is to note that f ε is a continuous perturbation of the functional w ε : V → R, By a well known property of Γ-convergence [31, Proposition 6.21], the Γ-limit is preserved under continuous perturbations. Then using the fact, shown above in (19), that Δu = κ 1,r S if u = χ S , completes the proof of Γ-convergence. The compactness result is a direct adaptation of the proof of [107,Theorem 3.2] to the current functionals f ε . Remark 3.6. Note that in Theorem 3.5 above, we can also use the double well potential W with wells at ±1, instead ofW . In that case, the limit functional w 0 in the proof takes finite values only for functions of the form We end this subsection with another similarity between the graph based objects we introduced and their continuum counterparts. The gradient of the graph distance d ∂S , from Definition 2.3, agrees with the normal ν, from (13), on the boundary of S induced by the graph distance, in the sense of the following lemma. This again corresponds to what we expect based on the continuum case. We define the signed distance to ∂S as Define the exterior boundary of S induced by the graph distance d ∂S as Similarly, let the interior boundary of S induced by the graph distance be Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 21 Note that ∂ ext S ⊂ ∂(S c ). Hence, i ∈ ∂(S c ) and thus there is a k ∈ N j such that k ∈ ∂S and therefore d ∂S Thus the maximum is achieved for k = j, and hence so is the minimum in (4), which shows that The proof for i ∈ ∂ int S follows from similar arguments, noting that This situation does not occur if the graph distances are consistent, in the following

Relation with the continuum nonlocal mean curvature
There is a clear analogy between the expressions in (18) and the (continuum) nonlocal mean curvature [18,17], as well as between TV q a and continuum nonlocal energy functionals.
Consider an interaction kernel K : This kernel K can be thought of as the energy given by a long-range interaction between a particle placed at x with a particle at y. It defines a functional on subsets S ⊂ R n , sometimes called "nonlocal perimeter" or "nonlocal energy" and it is given by where for any pair A, B ⊂ R n we write Compare this with the graph case, if, for A, B ⊂ V , we write In terms of this bilinear functional, the anisotropic total variation, defined in (2), can be rewritten TV q a (χ S ) = L G (S, S c ). We see that (21) is nothing but the continuum version of (22), and one may rightfully interpret the weight matrix ω q ij as an interaction kernel between pairs of nodes in G and L G (S, S c ) as measuring the total "interaction energy" between S and S c . Now suppose that S ⊂ R n minimizes J K (S) in some domain Ω, meaning that if S is such that SΔS ⊂⊂ Ω then J K (S) ≤ J K (S ). In this case one can see that the following two conditions must hold If, arguing heuristically, we let A shrink down to any x ∈ (∂S) ∩ Ω, we find that The integral on the left, which is well defined in the principal value sense 12 when x ∈ ∂S and ∂S is smooth enough (C 2 suffices), is known as the nonlocal mean curvature of S at x with respect to K, or just nonlocal mean curvature of S at x, when K is clear from the context. As with J K and TV q a , we see that is exactly a continuum analogue of the quantity j∈S − j∈S c ω q nj in (18), moreover, the inequalities in (23) are a continuum analogue of those in (18). Note however, that j∈S − j∈S c ω q nj is defined for all of n ∈ V , whereas κ nonlocal (x) above is only defined when x ∈ ∂S and ∂S is smooth enough.
Known regularity results deal mostly with the case K s (x, y) := c n,s |x − y| −n−s for s ∈ (0, 1) [17,22]. It is worth noting that J Ks is a fractional Sobolev norm of the characteristic function of S Moreover, as s → 1 − the quantity above gives the perimeter of S, and the corresponding nonlocal mean curvature converges pointwise to the standard mean curvature. In [18] it is shown that if we consider the MBO scheme, where instead of the heat equation we use the fractional heat equation, then in the limit we get a set S t evolving over time with a normal velocity at x ∈ ∂S t given by Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 23

Mean curvature flow
In this section, we define a mean curvature flow on graphs and connect it with the curvature κ q,r S in (14). It is not clear what is the most natural notion for the evolution of a phase in a graph. Do we want to consider a sequence of subsets {S n } n∈N , or a continuous family {S t } t>0 which, although piecewise constant in t, may change in arbitrarily small time intervals? How we connect solutions of the graph mean curvature flow to solutions of the graph (MBO τ ) scheme from Section 4, or to solutions of the graph Allen-Cahn equation (ACE ε ) from Section 5, will depend on the answer to this question. For now, we shall be content with considering a phase evolution comprised of a discrete sequence of sets S n = S(nðt), n ∈ N, that correspond to the state of the system at discrete time steps.
Our construction follows the well-known variational formulations for classical mean curvature flow [3,68]. Appendix A.2 has a brief overview of mean curvature and the associated flow in the continuum case. An obstacle one encounters when trying to emulate the continuum level set method to express mean curvature flow on graphs is, that, due to the lack of a discrete chain rule, the resulting equation is not independent on the choice of level set function.
Recall the notions of graph distance and boundary of a node set from Section 2. and Note that, for a given graph G, minimizers of F may not be unique. In this case different mean curvature flows can be defined, depending on the choice of S n+1 . An example of this non-uniqueness on the 4-regular graph is given in Section 6.5.
We choose to use the distance to Σ n , instead of the distance to either ∂S n or ∂(S c n ), so that the mean curvature flow is not a priori (independent of curvature) biased to either adding nodes to or removing nodes from S.
Since nodes in Σ n can be added or removed from S without increasing the last term of (25), every stationary state χ S of (MCF ðt ) is a minimal surface in the sense that TV q a (χ S ) ≤ TV q a (χ {S∪{n} ) for n ∈ ∂(S c ) and TV q a (χ S ) ≤ TV q a (χ {S\{n} ) for n ∈ ∂S (in the case where the minimizer of F is unique, the inequalities are strict). In particular, the sequence defined by (MCF ðt ) is not "frozen" for ðt arbitrarily small. In the continuum, stationary points of mean curvature flow are minimal surfaces, i.e., surfaces of zero mean curvature [55,20,73,30]. On the graph, the zero mean curvature condition is too restrictive, since it would only allow for disconnected sets S and S c , but, given Definition 3.8 above, we can still define an ðt-minimal set on a graph as a node set S ⊂ V , such that S is a stationary point of the mean curvature flow (MCF ðt ). Note that, if ðt 1 > ðt 2 and S is an ðt 1 -minimal set, then S is also an ðt 2 -minimal set.
Remark 3.10. In the last term of F(Ŝ, S n ), we use a symmetrized distance to the boundary, For this alternative choice, (MCF ðt ) exhibits "freezing".
We can rewrite the last term in F(Ŝ, S n ) in terms of the signed graph distance (20)), which takes nonnegative values in S c n and nonpositive values in S n . We state the precise result in the following lemma.
Proof. The rewriting of TV q a (χŜ) − TV q a (χ Sn ) in terms of the curvatures, follows directly from (17). For the distance term, we compute The proof is completed by noting that the last term above does not depend onŜ. Lemma 3.11 allows us to find a convex functional, the superlevel sets of whose minimizers are themselves minimizers of F from (25) (similar to the continuum results in, for example, [27] and [28]). Before we state and prove that result in Theorem 3.13, we prove a layer cake or coarea formula for the discrete total variation, TV q a . Lemma 3.12. Let u ∈ V and define, for t ∈ R, Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 25 Proof. As also noted in [112] and [24], we have, for i, j ∈ V , The second result follows, since, then, for all s ∈ R, the superlevel set Proof. Let u − ≤ min i∈V u i . Then from Lemma 3.12 we know that This gives where F is as in (26). Hence, if u minimizes F , then for a.e. s ∈ [u − , ∞), the superlevel set E(s) minimizes F (·, S n ). In fact, since u − is arbitrarily chosen from (−∞, min i∈V u i ], this result holds for a.e. s ∈ R. Because u takes only finitely many values, the result holds for all s ∈ R. Lemma 3.11 now completes the proof. Note that Lemma 3.12 and Theorem 3.13 above also hold if we define E(t) in terms of a non-strict inequality instead: Remark 3.14. The function TV q a (u) is a convex function, thus, taking ∂TV q a (u) as the (possibly multivalued) subdifferential of TV q a (u) [39], any minimizer of u → TV q a (u) + u, g V , for g ∈ V, will solve the differential inclusion From the definition of TV q a we see that its subdifferential is only multivalued at u for which ∇u vanishes between two nodes, at all other u, TV q a is pointwise differentiable. In particular, if u, v ∈ V and ∇u is never zero, we may differentiate 13 Since div is the adjoint of ∇, it follows that, for all v ∈ V, Therefore, the Euler-Lagrange equation for minimizers of F from (27), is div sgn(∇u) + 1 ðt sd Σn = 0, provided ∇u is never zero. Whenever (∇u) ij = 0 for some i, j, the above equation is replaced by a differential inclusion in terms of the subdifferential of the absolute value function. Concretely and to recap, since the subdifferential of the absolute value function at 0 is the interval and if (∇u) ij = 0, then φ ij = sgn((∇u) ij ).
Fast computational methods for the solution of (MCF ðt ) based on max flow/min cut algorithms are developed in [24,25]. These methods exploit the homogeneity and submodularity of the total variational functional, TV q a . Those same papers also show it is possible to rewrite (MCF ðt ) as an Rudin-Osher-Fatemi (ROF) problem [90].
It would also be interesting to compare (MCF ðt ) to the finite difference scheme for motion of level sets by mean curvature on a lattice in [83].

Threshold dynamics on graphs
In this section we study the threshold dynamics or Merriman-Bence-Osher algorithm on a graph G. For a short overview of the continuum case we refer to Section A.3 in Appendix A.

The graph MBO algorithm
The MBO scheme on a graph, describing the evolution of a node subset S ⊂ V , is given as follows.

Algorithm (MBO τ ): The Merriman-Bence-Osher algorithm on a graph.
Data: An initial node subset S 0 ⊂ V , a time step τ > 0, and the number of time steps N > 0. Output: A sequence of node sets {S k } N k=1 , which is the (MBO τ ) evolution of S 0 . for k = 1 to N , do Diffusion step. Let v = e −Δτ χ S k−1 denote the solution at time τ of the initial value problemv Here χ S denotes the characteristic function of the set S.
By the comparison principle, Lemma 2.6(d), we note that the solution to (29)  then the iterates can be succinctly written u k = (P e −Δτ ) k u 0 . Several papers use the MBO algorithm on a graph to approximate motion by mean curvature. For example, in [74,51,60], the MBO algorithm on graphs was implemented and used to study data clustering, community detection, segmentation, object recognition, and inpainting. This is accomplished by simply reinterpreting the Laplacian in (46), or in appropriate extensions of (46), as the graph Laplacian.

The "step-size" τ in the (MBO τ ) algorithm
As discussed at the end of Section A.3, in a finite difference discretization of the continuum MBO algorithm, the time step τ must be chosen carefully to avoid trivial dynamics. For τ too small, there is not enough diffusion to change the value of u at neighboring grid points beyond the threshold value. In this case, the solution is stationary under an (MBO τ ) iteration and we say that the solution is frozen or pinned. For τ too large there is so much diffusion that a stationary state is reached after one iteration in the MBO scheme. It is not surprising that these finite difference In particular, since vol S > d Proof. To prove (29), let χ S be the characteristic function on a set S ⊂ V . For a node to be added or removed from S by one iteration of (MBO τ ), it is necessary that (see also Lemma 2.5(b)). Using Lemma 2.2, we compute Using the triangle inequality and the submultiplicative property of · V (see, e.g., [59]), we compute To prove (30), we write the solution to the heat equation at time τ , This implies Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 29 Here, we used the comparison principle, Lemma 2.6(d). Thus, if τ ≤ 1 2 Δχ S V,∞ , then u(τ ) − χ S V,∞ ≤ 1 2 , implying the (MBO τ ) solution is stationary.
The following corollary of Lemma 2.6(c) shows that an upper bound on τ is necessary to avoid trivial dynamics.
where the mass M (u 0 ) is defined in (10), then This corollary shows that, if τ is chosen too large, one iteration of (MBO τ ) leads to a trivial state u = χ V or u = 0, which is stationary under the algorithm (MBO τ ).
The following theorem gives a condition for which there is a gap between the lower and upper bound for τ . Theorem 4.4. Consider the (MBO τ ) iterations on a graph with n ≥ 2. Let τ ρ and τ t be defined as in (29) and (31).
Since d r − ≤ volS, we have The result follows. Theorem 4.4 further reinforces our intuition that τ should be chosen in the interval (λ −1 n , λ −1 2 ). If, for a particular graph, this interval is very small, then Theorems 4.2 and 4.3 cannot provide an interval for which the (MBO τ ) iterations has a chance of being non-stationary after the first iteration. Note however that the interval [τ ρ , τ t ] given by these theorems, is not necessarily a sharp interval for interesting dynamics.

A Lyapunov functional for the graph (MBO τ ) algorithm
In this section we introduce a functional which is decreasing on iterations of the (MBO τ ) algorithm. The analogous functional for the continuum setting was recently found in [42]. The functional is then used to show that the (MBO τ ) algorithm with any initial condition converges to a stationary state in a finite number of iterations.
Let τ > 0 and consider the functional J : V → R defined by Note that by, Lemma 2.6(a), where M is the mass from (10). 1. J is a strictly concave functional on V.

J is Fréchet differentiable with derivative in the direction v given by
Proof. We compute, for all v = 0, Taking the first variation of J(u) = 1 − u, e −τ Δ u V , we find that δJ δu , δu as desired.
Since the objective function in (33) is concave and the admissible set is a compact and convex set, it follows that the solution to (33) The optimization problem in (34) may not have a unique solution, so the iterates are not well-defined. The following proposition shows that the iterations of the (MBO τ ) algorithm define a unique sequence satisfying (34). Note that the optimization problem in (34) is the minimization of a linear objective function over a compact and convex set, implying that for any sequence {u k } ∞ k=0 satisfying (34), u k ∈ B for all k ≥ 0. Proof. At each iteration k, the objective functional L u k is linear and thus the minimum is attained by a function These are precisely the (MBO τ ) iterations. By the strict concavity of J and linearity of L u k , for u k+1 = u k , Since u k ∈ K, L u k (u k+1 ) ≤ L u k (u k ) which implies J(u k+1 ) < J(u k ). The convergence of the algorithm in a finite number of iterations then follows from the fact that B contains only a finite number of points, the vertices of the unit n-cube.
Proposition 4.6 shows that J is a Lyapunov function for the (MBO τ ) iterates. From the proof of Proposition 4.6, we also note that the non-uniqueness of the iterates in (34) corresponds to the choice in the (MBO τ ) algorithm of thresholding vertices {j ∈ V : e −τ Δ u k = 1 2 } to either 0 or 1 (see Remark 4.1).

Remark 4.7.
The framework of [42] easily allows for the extension of the MBO algorithm to more phases, however we do not pursue these ideas here.

A local guarantee for a 'nonfrozen' (MBO τ ) iteration
We begin by observing that the constant τ κ in Theorem 4.2 depends on the maximum curvature κ 1,r S of the indicator set in the graph, Δχ S V,∞ . In this section, we prove a theorem which gives a condition on τ in terms of the local curvature (κ 1,r S ) i at a node i, which guarantees that the value of u on that node will change in one iteration of the graph (MBO τ ) scheme. Vol.82 (2014) We first introduce some notation which is needed to state the theorem. Recall that the set of neighbors of a node i ∈ V is N i = {j ∈ V : ω ij > 0}. Let 1 ∈ V be an arbitrary node in the graph G and let S ⊂ V . We define the sets The set S 1 contains neighbors of node 1 which are also in either the boundary ∂(S c ) or ∂S (depending on whether or not 1 ∈ S). For u ∈ V, define Δ as We see that Δ on S 1 is similar to the Laplacian on the subgraph induced by S 1 , with the important distinction that, for each i ∈ V , the degree d i is the degree of i in the full graph G, not the degree in the subgraph induced by S 1 . In [29,Section 8.4], Δ is referred to as the Laplacian with Dirichlet conditions on ∂(S 1 c ). If Note in particular that, if v ∈ V 1 , then e −tΔ v ∈ V 1 for all t ≥ 0.

Theorem 4.8. Let 1 ∈ V be an arbitrary node and S ⊂ V be such that
That is, the phase at node 1 changes after one (MBO τ ) iteration.
It is important to note that both |(κ 1,r S ) 1 | 2 and (Δ ) 2 χ S 1 V,∞ are local quantities, in the sense that they only depend on the structure of G at node 1 and its neighbors. This is in contrast with Theorem 4.2, which depends on the spectrum of the Laplacian on G. The existence of a lower bound τ 1 on τ is unsurprising in the light of this earlier freezing result. The necessity for an upper bound τ 2 can be understood from our wish to only use local quantities in this theorem.

Vol.82 (2014)
Mean Curvature, Threshold Dynamics,... 33 Let v satisfy the heat equation with Dirichlet boundary data, As noted above the theorem, v(t) ∈ V 1 for all t ≥ 0. It is easily checked that v is subcaloric, i.e.,v i ≤ −(Δv) i for all i ∈ V , and v(0) ≤ χ S . In addition, the Laplacian satisfies −(Δu) i ≤ −(Δũ) i if u i =ũ i and u j ≤ũ j , for j = i. Hence, by the theory of differential inequalities (see for example [101,Theorem 8 In particular, where the last equality follows because 1 ∈ S.
We conclude that which proves the result for the case in which 1 ∈ S.
To prove the desired statement if 1 ∈ S, we note that Recall that, in this case, S 1 = N 1 ∩ S c , and the same derivation as above holds 14 , since 1 ∈ S c , with the exception that the admissible range of τ becomes the open interval (τ 1 , τ 2 ). This is because, by our definition of the (MBO τ ) algorithm, the thresholding operator thresholds the 1 2 -level set to 1. In the remainder of this section we determine some conditions under which the requirement |(κ 1,r S ) 1 | 2 > (Δ ) 2 χ S 1 V,∞ in Theorem 4.8 is satisfied. To this end, define the reduced degrees, for i ∈ V , as 14 In particular, carefully note that now − (Δ χS 1 ) 1 = (κ 1,r S )1 = |(κ 1,r S )1| holds.
Proof. Consider the |V | × |V | matrix corresponding to Δ . After possibly relabeling the nodes, it can be written as where, for the second equality, we used that ω ii = ω jj = 0. From (14) and the definition of S 1 , we find then condition (37) is satisfied, and there exist 0 < τ 1 < τ 2 as in Theorem 4.8.

Vol.82 (2014)
Mean Curvature, Threshold Dynamics,... 35 If the first condition in (38) is satisfied with then the second condition in (38) can be replaced by the condition that, for all i ∈ S 1 , Note, by (14) and (36), that the conditions in (38) can be rewritten as Proof of Corollory 4.10. For r = 1, we compute, for i ∈ S 1 , If the first condition in (38) is satisfied, we have Since d 1 > 0, condition (37) reduces to max If the maximum is If, on the other hand, the maximum is achieved at some i ∈ S 1 , then Here, we first used the first condition from (38), and then the second. Combined with Theorem 4.8 and Lemma 4.9, this proves the first claim. Now assume, instead of the second condition in (38) It is worthwhile to understand the conditions in the corollary. Adding the two conditions in (38) gives the relative strength of connection of node i within the set S 1 compared to all its connections in G. Similarly d 1 d 1 is the relative strength of connection of node 1 within S 1 (or, equivalently, within S 1 ), compared to all its connections in G. The conditions in (38) thus require node 1 to be a node with comparatively large relative connection strength within S 1 , compared to the other nodes in S 1 . This will allow enough mass to diffuse to or away from node 1 (depending on whether or not 1 ∈ S), for it to pass the threshold value 1 2 , without too much of the locally available mass diffusing to other nodes.
Some examples in Section 6 further examine the conditions in (37) and (38).

Remark 4.11. One can interpret the condition
Combining these results, the condition or, equivalently, if Using vol S 1 = i∈S 1 d r i ≥ |S 1 |d r − , we can deduce the stronger sufficient condition Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 37

Allen-Cahn equation on graphs
In this section, we investigate the Allen-Cahn equation on graphs. A short overview of the continuum Allen-Cahn equation can be found in Section A.1 in Appendix A. We propose the following Allen-Cahn equation (ACE) on graphs, for all i ∈ V : for a given initial condition u 0 ∈ V and ε > 0. Here W ∈ C 2 (R) is a double well potential. For definiteness we set W to be the standard double well potential W (u) = (u + 1) 2 (u − 1) 2 , hence W (u) = 4u(u 2 − 1) and W has two stable minima at the wells at u = ±1 and an unstable local maximum at u = 0. Recall that the sign convention for the Laplacian is opposite to the one used in the continuum literature. Note that for ε sufficiently small, this system has 3 n equilibria, of which 2 n are stable. The (MBO τ ) algorithm is closely related to time-splitting methods applied to the Allen-Cahn evolution (ACE ε ). The diffusion step is precisely the time evolution with respect to the first term of (ACE ε ) and the thresholding step is the asymptotic behavior of evolution with respect to the second term of (ACE ε ).
The case V = Z d with weights ω ij = ω( i − j ) was considered in [8], where it is seen as an approximation to the Ising model, and stationary solutions and traveling waves are constructed for ε small enough. The authors note that, when ω ij corresponds to nearest-neighbors, this equation is known as the discrete Nagumo equation, which is a simplified model of neural networks. In this context, [61] considered the Nagumo equation in Z 1 and derived the existence of traveling waves. In general, we are not aware of any previous works where (ACE ε ) is considered for an arbitrary weighted graph (V, E, ω ij ). It would be interesting to see whether the analysis in [8] can can be extended to use (ACE ε ) to study phase transitions in general graphs, a topic of interest in other areas of mathematics [70].
Just as in the continuum case, we arrive at (ACE ε ) as the gradient flow given by the graph Ginzburg-Landau functional, GL ε (u) : V → R, , and whose first variation is given by The factor d −r i in the potential term is needed to cancel the factor d r i in the V-inner product. Equation (ACE ε ) is then the V-gradient flow associated with (GL ε ).
Recall that the Laplacian Δ also depends on r. In fact, the equation in (ACE ε ) can be rewritten as showing that the factor d r i can be interpreted as a node-dependent time rescaling.
Y. van Gennip, N. Guillen, B. Osting and A.L. Bertozzi Vol. 82 (2014) By standard ODE arguments and the smoothness of the right hand side of (ACE ε ), for each ε > 0 a unique C 1 solution to (ACE ε ) exists for all t > 0.
This continuum case (see Appendix A.1) suggests an approach for finding a valid notion of mean curvature (and its flow) for graphs: Take initial data u(0) = χ S −χ S c , for some node set S ⊂ V , and consider the corresponding solution u ε (t) to (ACE ε ), for all times t > 0. The question is whether the limit exists. Even if it does, it is unlikely thatū is of the form χ S(t) − χ S(t) c , which can be interpreted as a binary indicator function for some evolving set S(t), for all times t > 0, but there may be an approximate phase separation: , for some small δ > 0. Is there a way to characterize the evolution of the "interface" between the two level sets ofū i (t)?
However, a little analysis shows the above approach is rather naïve. Indeed, unlike in the continuum case, the graph Laplacian of the indicator function of a set S ⊂ V is always a well-defined bounded function (in any norm). Thus, for small ε the potential term in the equation will dominate the dynamics, and pinning or freezing will occur, as proven in Theorem 5.3. This is the dynamics in which the sign of the value of u on each node is fixed by the sign of the initial value, and u at each node just settles into the corresponding well of W .
As discussed at the start of Section 3.3, the question how to connect the sequence of sets evolving by graph mean curvature to the super (or sub) level sets {i ∈ V : u ε i (t) > 0} for solutions of (ACE ε ), is still open. See also Question 7.4. Remark 5.1. Note that in the (MBO τ ) algorithm, the values of u are reinitialized in every iteration to 0 or 1. Our choice of the double well potential W in (ACE ε ) has two equilibria corresponding to the level sets for ±1. Correspondingly, the unstable equilibrium for (MBO τ ) corresponds to the 1/2 level set, while for (ACE ε ) it corresponds to the 0 level set. This agrees with the now standard notations for Allen-Cahn and MBO.
Below, we show that for all ε below a finite ε 0 > 0 the functions u ε i (t) do not change sign as t varies, so that pinning occurs. Recall that a set which contains the forward orbit of each of its elements is called positively invariant, and that the number of nodes in the graph G is |V | = n. Lemma 5.2. Consider the set S := {u ∈ V : u 2 V ≤ 17 4 n d −r + } and let u(t) be the solution to (ACE ε ) for a given ε > 0. Then t → u(t) 2 V is decreasing at each t such that u(t) ∈ S c . As a consequence, the set S is positively invariant and every trajectory of (ACE ε ) enters S in finite time.
Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 39 The last inequality follows, since u i (t) 2 − 1 > 1 for i ∈ A c (t), and max{x 2 (1 − x 2 ) : V is decreasing in the region S c , as desired. The other statements in the lemma now follow.
There exist an ε ρ and an ε κ (depending on the spectral radius of Δ via (40), and on sup t≥0 Proof. By Lemma 5.2, u(t) 2 V ≤ 17 4 n d −r + for t large enough. Hence, by continuity of u(t), there is a C (depending on the initial condition) such that, for all t ≥ 0, Thus, if ρ > 0 denotes the spectral radius of Δ, we get, for all i ∈ V , for all i ∈ V , thus, we have the inequalities . Without loss of generality, we can assume that there is a number α ∈ (0, 1) such that |u i (0)| ≥ α for all i ∈ V . If there is an i ∈ V such that |u i (t)| = α for a given t, then we have that |W (u i (t))| = 4α(1 − α 2 ), with a sign opposite to that of u i . Thus, if Hence u i (t) can never reach zero, and by continuity in t it does not change sign. Alternatively, instead of (39), we can estimate The finitude of the right hand side follows from (39). Following the same reasoning as above, we then conclude The constant ε κ in Theorem 5.3 involves Δu V,∞ , which is "curvature-like". Compare this to the constant τ κ in Theorem 4.2, which depends on the maximum curvature of the indicator set in the graph, Δχ S V,∞ , as also discussed in Section 4.4. This tentative similarity makes us suspect, that a condition on the local curvature, similar to those for the (MBO τ ) algorithm given in Theorem 4.8, guarantees a phase change in the Allen-Cahn flow. We discuss this further in Question 7.3.
We see that the discrete nature of the graph, manifest in the finite spectral radius of the Laplacian, makes the limit behavior of (ACE ε ) as ε → 0 much different than that for the continuum case. In particular, this means that we ought to look for a notion of mean curvature flow on graphs more carefully.

Remark 5.4.
For ε small enough, but not smaller than the ε 0 from Theorem 5.3 above, we expect interesting asymptotic behavior for the motion of the phases in (ACE ε ) on intermediate time scales. Such asymptotics might be connected to the graph curvature of the phases, which would match the situation in the continuum setting, where the solution has phases that for large times behave as if they were evolving by mean curvature flow, while the solution itself becomes stationary in the limit t → +∞. This phenomenon is known as dynamic metastability (see for instance [16] and the references therein). See also Question 7.4.

Explicit and computational examples
In this section we give several examples of graphs where the mean curvature, MBO, and Allen-Cahn evolutions can be compared either explicitly or computationally.

Complete graph
Consider the complete graph, K n , on n nodes with ω ij = ω for all i, j ∈ V . See Figure 1a. In this case, the matrix representation of the graph Laplacian is given by the circulant matrix, L = ω[(n − 1)ω] −r n Id n − 1 n 1 t n , where 1 n denotes the vector in R n with all entries equal to 1, and Id n the identity matrix in R n×n . The eigenvalues of L are given by 0 and ωn[(n − 1)ω] −r (with multiplicity n − 1). In particular, λ 2 = λ n = ρ. Note that the normalized eigenvector corresponding to eigenvalue 0 is given by (vol V ) − 1 2 χ V .

Vol.82 (2014)
Mean Curvature, Threshold Dynamics,... 41 Let S ⊂ V be a set with volume ratio R S = vol S vol V (see also Theorem 4.3). Using the spectral decomposition from (11), the evolution of χ S by the heat equation can be written explicitly as Assume R S = 1 2 . Then there exists a critical time step τ c depending only on vol S, vol V , and ρ such that τ < τ c implies the solution to the (MBO τ ) evolution is pinned and τ ≥ τ c implies exactly one iteration of the (MBO τ ) evolution gives a stationary solution, either 0 or χ V depending on the initial mass, M (χ S ) = vol S (see (10)). From the solution, the critical time step τ c can be directly computed, If R S = 1 2 , symmetry pins the (MBO τ ) evolution for all τ > 0. The bound from Theorem 4.2 states that pinning occurs if where we have used the fact that, for all i ∈ V , d r i = volV n . The bound in Theorem 4.3 states that trivial dynamics occur if τ > τ t = ρ −1 log Note that for n > 2, R S > 1 n and τ t > τ c > τ ρ . By symmetry, both the Allen-Cahn equations and mean curvature flows reduce to two-dimensional systems, with one variable governing the value of the nodes in S and the other the nodes in S c . Critical parameters and ðt exist for which below the phase remains the same for all nodes and above the phase simultaneously changes.

Star graph
Consider a star graph SG n as in Figure 1b with n ≥ 3 nodes. Here the central node (say node 1) is connected to all other nodes and the other n − 1 nodes are only connected to the central node. Hence, for all i ∈ {2, . . . , n}, ω 1i = ω i1 > 0, and all the other ω jk are zero.
We consider the unnormalized graph Laplacian L = D − A (r = 0 in (8)). Since d 1 = n j=2 ω 1j and d i = ω 1i , for i ∈ {2, . . . , n}, we can explicitly compute the characteristic polynomial of L: If all non-zero edge weights have the same value ω, this simplifies considerably to We now let S = {1} and note that χ S has the explicit expansion in terms of these eigenvectors, We now consider the (MBO τ ) iterates of χ S . We compute Thus pinning occurs if τ < τ c := 1 nω log 2 n−1 n−2 . If τ > τ c , the solution to the (MBO τ ) evolution gives the stationary solution, 0, after exactly one iteration. The bound from Theorem 4.2 states that pinning occurs if τ < 1 nω log 3 2 . The bound in Theorem 4.3 states that trivial dynamics occur if τ > 1 ω log 2 n−1 n−2 . Qualitatively, this example shows that it is easier for a solution to be pinned on nodes with smaller degree.
Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 43 We now consider an implication of Theorem 4.8 in the case where the graph induced by S 1 is a star graph with node 1 as center, i.e., d i = 0 for all i ∈ S 1 . This is certainly true for the case when the graph is a star graph and S = {2, . . . , n}. The following lemma states, in the case where r = 1, a simple criterion on the degrees for which there exists a τ interval in which node 1 switches phase in a single iteration of (MBO τ ). We will see an application of the lemma in Section 6.3. Lemma 6.1. Let r = 1 and consider the case where the subgraph induced by S 1 is a star graph with node 1 as center, i.e., d i = 0 for all i ∈ S 1 . If either then there exists a nontrivial τ interval (as in Theorem 4.8) such that node 1 will change phase in the next (MBO τ ) iterate.
Proof. We see from condition (38) (or via direct computation from (37)) that a sufficient condition for τ 1 is the only node in S 1 , then ω i1 = d 1 and we find the condition thus the condition on d 1 d i can be replaced by the simpler (but stronger) condition d i < d 1 , for all i ∈ S 1 .

A regular tree
We consider the (MBO τ ) iterations on a regular tree as in Figure 2. Let ω ij = ω, for all (i, j) ∈ E, and r = 1. As in Figure 2a, we consider the case where the initial set S consists of the leaves of a branch. We first observe that the subgraph induced by S j , for any j ∈ V , is a star graph with node j as center, i.e., d i = 0 for all i ∈ S j (for an example of a star graph with five nodes, see Figure 1b), so that the hypotheses of Lemma 6.1 are satisfied with nodes 9 and 10 each playing the role of "node 1" in the lemma.
Applying Lemma 6.1 to node 9 in Figure 2a where S = {1, 2, 3, 4}, we see that there exists a τ such that node 9 will change in the next iteration. By symmetry, node 10 will change in the same iteration. If node 13 doesn't change in the first MBO iteration, Lemma 6.1 can be applied again (because node 13 has two children, the hypotheses of the lemma are again satisfied with 9, 10 ∈ S 13 ) to show that there exists a τ such that node 13 will be added to the set. After node 13 has been added to the set, S, as in Figure 2b the MBO iterates are stationary. To see that node 15 cannot be added to S, assume that it were. Then the value of the Lyapunov functional, (32), must have decreased. But by symmetry, in the next MBO iteration, node 15 will be removed from S, again decreasing the value of the Lyapunov functional, a contradiction. The final configuration in Figure 2b minimizes the normalized cut, as defined in Section 2.2.  This argument is easily generalized to trees where each node, excluding leaves, has the same number of children c ≥ 2.

A small square grid
Here we construct an explicit example where Theorem 4.8 can be applied to show that there exists a time interval (τ 1 , τ 2 ) such that a node is guaranteed to change in one iteration of the (MBO τ ) algorithm.
Consider a three by three (nonperiodic) square grid as in Figure 1c with unit edge weights, with nodes numbered 1 through 9 from left to right, top to bottom. Let It is easily checked that, for i ∈ S 5 , such that conditions (38) are satisfied. Furthermore, with S 5 := S 5 ∪ {5}, for the time interval (τ 1 , τ 2 ) of Theorem 4.8.

Torus graph
Consider the n-cycle, C n with n nodes. The nodes are arranged in a circle and each node is connected to its 2 neighbors. We take ω ij = ω if an edge connects nodes i and j, and ω ij = 0 otherwise. See Figure 1d. We consider the unnormalized graph Laplacian L = D − A (r = 0 in (8)). In this case, L is a circulant matrix diag({−1, 2, −1}, {−1, 0, 1}). The eigenpairs {(λ j , v j )} n j=1 are given by We then consider the 2-torus graph, T 2 n 1 ,n 2 which is the Kronecker (tensor) product of the n 1 -and n 2 -cycles. See Figure 3. In particular, if u and v are eigenfunctions of the graph Laplacian on C n 1 and C n 2 with corresponding eigenvalues α and β respectively, then w = u ⊗ v (with w i,j = u i v j ) is an eigenvector of T 2 n 1 ,n 2 with corresponding eigenvalue α + β. In particular, the spectral radius of the Laplacian is ρ = 8ω.
Consider for a moment T 2 n 1 ,n 2 as a discretization of the torus, T 2 . The nontrivial minimal-perimeter subsets of T 2 are given by "strips". Thus we might expect that for some initial condition, χ S , S ⊂ V the evolution by MBO, Allen-Cahn, or MCF would converge to a strip.
We consider the (MBO τ ) evolution on a 32 × 12 torus with ω = 1 and initial condition, as in Figure 3a. For τ = 1.12, the solution is stationary after 4 iterations once the "high curvature corners" have been removed, as in Figure 3b. For τ = 4, the solution evolves into a minimal-perimeter "strip" in 5 iterations, as in Figure 3c.
For the parameters in Figure 3b, we compute the guaranteed stationarity bounds in (29) and (30) to be τ ρ ≈ 0.0057 and τ κ = 1 4 , respectively, showing these bounds are not sharp.
Consider (MCF ðt ), with S n equal to the minimal-perimeter strip in Figure 3c. Then S n+1 = S n is a minimizer of F(·, S n ), but so are S n+1 = S n ∪∂(S c n ) and S n+1 = S n \ ∂S n (or variations in which only one 'vertical line' of the boundary is added or removed). This illustrates a possible type of non-uniqueness for (MCF ðt ), which occurs when S n is totally geodesic (i.e., its boundary is a geodesic). To reiterate, the stationary solution in Figure 3b is frozen (due to the smallness of τ ), while the solution in Figure 3c is totally geodesic.

Buckyball graph
Consider the buckyball graph with 60 nodes and 90 edges with ω ij = ω for all edges (i, j) as in Figure 4. The graph is regular; each node has degree 3ω.
Consider for a moment the buckyball graph as a (coarse) discretization of the sphere, S 2 . There are no nontrivial minimal-perimeter subsets of S 2 . Great circles are the only nontrivial stationary submanifolds of S 2 (and have constant curvature). In fact, great circles are totally geodesic. Thus we might expect that for any initial condition, χ S , S ⊂ V such that |S| = |S 2 |/2, the evolution by MBO, Allen-Cahn, or MC would converge to a stationary solution, either 0 or χ V depending on the initial mass, M (χ S ) = vol S. If S is chosen to be a symmetric partitioning of the nodes for the buckyball graph, we expect that the (MBO τ ) evolution will be stationary for all values of τ .
The bound from Theorem 4.2 states that pinning in (MBO τ ) occurs if

The bound in Theorem 4.3 states that trivial dynamics occur if
We find numerically that λ 2 ≈ ω 1−r 3 −r · 0.2434 and λ n ≈ ω 1−r 3 −r · 5.6180. For initial condition χ S , with |S| = 14, as given in Figure 4 (top left), and r = 0 and ω = 1, Theorems 4.2 and 4.3 predict that pinning occurs if τ < 0.0223 and trivial dynamics occur if τ > 15.1811. We find numerically that this initial condition is pinned if τ < 1.89 and trivial dynamics occur if τ > 3.54. For intermediate values of τ , the iterates shrink to the empty node set. For τ = 2, the iterates take 3 iterations to reach steady state, as illustrated in Figure 4.
For the initial condition χ S where S is taken to be a symmetric partitioning of the nodes, (MBO τ ) evolution is pinned for all values of τ .

Adjoining regular lattices
We consider the graph which is composed by adjoining a square and triangular lattice. See Figure 5. We take r = 0 and ω ij = 1 for i ∼ j and zero otherwise. Note that the degree of a node in the triangular lattice is 6 and the degree of a node in the square lattice is 4.
To test the intuition from the star graph (see Section 6.2) that it is easier for the solution to pin on nodes with smaller degree, we consider the initial condition given in the top left panel of Figure 5. The mass is initially distributed over both the     The nodes on the 'border' of the graph (where the regular lattice was cut) have smaller degree. In Figure 6, we demonstrate that the solution can also be pinned on the border. Again, the initial condition is given in the top left panel. In this simulation, we take τ = 0.9. Away from the boundary, the solution set can again shrink freely. However, the solution becomes pinned on the border. In this last example, we consider a graph which is widely used as a benchmark problem for partitioning algorithms. Our construction of the graph follows [12]. The graph is generated by first randomly distributing 600 points in a region described by two half arcs in R 2 -referred to as "two moons". See Figure 7 (top left). The points are then embedded in R 100 and randomly perturbed by i.i.d. Gaussian noise with mean zero and standard deviation σ = 0.1. Let k = 10. The edge weights are chosen to be

Two moons graph
and d i is the Euclidean distance between x i and its k-th nearest neighbor. We then take the symmetrized k-nearest neighbors graph. This is given in Figure 7 (top right). We consider the (MBO τ ) evolution with τ = 5 and initial condition as shown in Figure 7 (bottom left). After 9 iterations, the (MBO τ ) evolution converges to the state in Figure 7 (bottom right).
We want to stress that the two moons example is meant as an illustration of the (MBO τ ) algorithm on a more complex toy graph. In this paper we do not aim to compete in terms of accuracy or efficiency with existing clustering methods, hence we will not focus on those aspects of the two moons example.

Discussion and open questions
Motivated by curvature flows in continuum mechanics, we described several analogous processes on graphs. In particular we used the graph total variation, or graph cut, to define curvature on graphs, which we then related to the graph Allen-Cahn equation, graph MBO scheme, and graph mean curvature flow. The continuum intuition for these problems suggests many results, some of which we proved in this paper, some which we have shown cannot hold on a graph because of the lack of infinitesimal length scales, and some which we state below as, still unproven, open questions.
In a sense to be made precise, for a suitable choice of τ (not too small, not too large, depending on ω, most likely depending on the graph's spectrum), the dynamics of (MBO τ ) are expected to approximate those of graph MCF. for S ⊂ V . Hence, we can rewrite the Lyapunov functional J from (32) as Using 1 − χ S , χ S V = 0, 1, Δχ S V = 0, and (3), we find J(χ S ) = τ TV 1 a (χ S ) + R S (τ ). This connection between the Lyapunov functional J and the total variation, and hence the MCF functional F from (25), strengthens the plausibility of a positive answer to Question 7.1. A more difficult question, which could be of great use in numerical problems, is how we can estimate the number iterations of (MBO τ ) needed to go from some initial data to a minimizer of the Ginzburg-Landau functional or graph cut functional.

Question 7.2 (Minimizing graph cut).
For any, a priori specified, approximation error, is there a local, quantitative bound on the number of iterations of (MBO τ ) needed to approximate a minimizer of the graph cut functional TV q a up to the specified accuracy? "Local" means here that the bound does not rely on the spectrum of the graph, but instead uses quantities such as graph curvature κ q,r or the total variation TV q a (χ N ) for some local graph neighborhood N ⊂ V . The analogous question can be asked for (MCF ðt ).
In Theorem 4.8 it was shown that, if the curvature at a given node is sufficiently large and the time step τ in (MBO τ ) is chosen in the right interval, then the value at the node will change in one (MBO τ ) iteration. The next question is the analogous statement for the Allen-Cahn equation, (ACE ε ).

Question 7.3 (Non-freezing for Allen-Cahn).
Let ε be in some positive interval and let u ε be a solution of the Allen-Cahn equation (ACE ε ) for this choice of ε. Suppose that the curvature (κ 1,r S ) i of S = {j ∈ V : u ε j (t 0 ) ≤ 0} at a node i ∈ S, is sufficiently large (possibly depending on ε). Is there is some interval of positive times such that u ε i (t 0 + h) > 0 for h in this interval?
Because (ACE ε ) is derived from the graph functional (GL ε ), we suspect that the correct curvature in Question 7.3 is κ 1,r S , the curvature related to the anisotropic functional 1 2 TV 1 a , which was identified as the Γ-limit of (GL ε ) for ε → 0 in earlier work [107], and not the curvature which can be derived from the isotropic total variation functional TV as the continuum case might suggest at first glance. Since we have seen that pinning occurs for small enough ε, full convergence is not expected here, but the numerical examples of Section 6 suggest an approximate result for small ε is feasible. sequence of times t n for which either the sets S n := {j ∈ V : u ε j (t n ) ≤ 0} or the sets S n := {j ∈ V : u ε j (t n ) ≥ 0} form a solution to the graph MCF? Furthermore, among sequences with this property is there exactly one sequence {t n } that is maximal in the following sense: there exists no sequence {t n }, of which {t n } is a strict subsequence, such that {S t n } n ⊂ {S tn } n and {S t n } is still a solution to the graph MCF?
A different question is how the graph MCF behaves in the continuum limit, when it is formulated on a sequence of graphs which are ever finer discretizations of some continuum space. We expect that it should give back the usual MCF in the continuum limit, or some anisotropic MCF, as the convergence results in [107] show the final limit could crucially depend on the scaling in ðt and the discretization parameter (which will show up in the graph weights). This question is similar (and perhaps equivalent) to the convergence of discretization schemes for the usual MCF. Similar questions can be asked about the graph Allen-Cahn equation and graph MBO scheme.

Question 7.5 (Stability of graph MCF, MBO, and ACE, in the continuum limit).
Suppose we are given any sequence of graphs (V k , ω k ij ), k ∈ N, converging in the Gromov-Hausdorff sense to a Riemannian manifold (M, g). Is there a fixed time interval such that, as k → ∞, any sequence generated by (MBO τ ) with τ in this interval converges to a sequence generated by the (possibly anisotropic) continuum MBO algorithm in M (with the Laplacian induced by g)? Accordingly, do solutions of (ACE ε ) converge to solutions of the (possibly anisotropic) continuum Allen-Cahn equation in M , and do solutions to (MCF ðt ) converge to viscosity solutions (via the level set formulation) of (possibly anisotropic) continuum MCF in M , with initial data given by the limit of the initial data in each V n ?
As explained in Appendix A, MCF is closely related to certain models of continuum phase transitions, particularly Allen-Cahn and Ginzburg-Landau dynamics. However, another important connection with statistical mechanics involves Ising models and other interacting particle systems, which are known to converge in the mesoscopic limit to flow by mean curvature. In work of Katsoulakis and Souganidis [66,67] convergence to a viscosity solution of MCF is first proved. See also the related work of Funaki and Spohn [50] where MCF is derived as a deterministic limit of stochastic Ginzburg-Landau dynamics. On the other hand, there is vast literature concerned with (for instance) the Ising model (and its generalizations) on graphs [69,70], see also Durrett's book [37]. This suggests the following question. Is (MCF ðt ) related to an interacting particle system on the underlying graph? Also, are there interacting particle systems or stochastic processes in V that are related to (MBO τ ) or (ACE ε )? Finding such a system would partly resolve the issue that a front moving on a graph in continuum time necessarily does so in a way that, from a continuum point Then, when there is a smooth solution φ(x, t), the domains given by Ω t := {φ(·, t) < 0} will be evolving by mean curvature flow and will start from the original domain Ω. In general, even for an initial domain with a smooth boundary, a smooth solution might not exist for all times, and one must work with viscosity solutions. In this context, the convergence of the MBO scheme (46) (explained in Section A.3) to such viscosity solutions was proved by Evans [46]. It is worth remarking that Soner and Touzi in [97] interpret MCF as a stochastic control problem. In this interpretation, one controls a Brownian motion for which one is allowed to turn off diffusion in one given direction. The surface Σ t in this case arises as the set of points that can be reached with probability 1. This probabilistic interpretation is quite different from those mentioned in the discussion at the end of Section 7.
Finally, given the affinity with the graph setting, it is worthwhile to comment briefly on the more recent nonlocal mean curvature flow. Caffarelli and Souganidis [18] arrive at this flow by following a nonlocal and continuum analogue of (MBO τ ), where instead of using the Laplacian one uses a fractional power of the Laplacian (−Δ) s with s ∈ (0, 1/2). A level set formulation based on viscosity solutions was developed later by Imbert [62].
Define the threshold operator P u(x) := 1 u(x) ≥ 1 2 , 0 u(x) < 1 2 . The MBO evolution of a set described by u at time T can then be succinctly written is the 'time step' and k is a parameter. In [46,5] convergence of the MBO algorithm to motion by mean curvature, defined in (45), as k ↑ ∞, is proven. The MBO scheme and its implementation has evolved considerably since its original proposal. We provide a short, non-exhaustive, overview here. In [72,92], the MBO scheme was extended to multiple-phase problems. In [91,93], a spectral Vol. 82 (2014) Mean Curvature, Threshold Dynamics,... 57 discretization of the MBO scheme for motion by mean curvature was proposed, which is much more efficient then finite difference approaches. This approach can be applied to both two-phase and multi-phase problems. Convergence for an anisotropic variant of the MBO scheme was proven in [26], and in [43] diffusion generated motion was applied to higher order geometric motions. In [45], the MBO scheme was extended to a thresholding method for approximating the evolution by gradient descent of the Mumford-Shah functional and applied to image segmentation problems. In [44] the authors study MBO-like schemes which use the signed distance function. Recent work [42] presents new algorithms for multiphase mean curvature flow, based on a variational description of the MBO scheme. It is well-known that, in a finite difference scheme for the MBO algorithm, the time step τ (equivalently k) in (46) must be chosen carefully and in the limit as k ↑ ∞, the discretized MBO evolution is stationary. In fact, when discussing the numerical implementation of the algorithm on a discrete grid, Merriman, Bence, and Osher [75] observe: "The basic requirement is that [the time step, τ ,] be short enough so that the local analysis . . . is valid, but also long enough so that the boundary curve moves by at least one grid cell on the spatial grid (otherwise the curve would be stuck)." They derive heuristic upper and lower bounds on the time step, τ , for the algorithm to approximate motion by mean curvature.

B. Calculation of the first variation for graph total variation
In (28) we computed d dt t=0 TV q a (u + tv) = sgn(∇u), ∇v using the convexity of TV q a . In this section we review this fact for other kinds of graph total variation, which are expressible as T V(u) := max{ div ϕ, u V : ϕ ∈ A}, (47) where A ⊂ E e is some admissible set of edge functions 17 , such that a maximizer ϕ u ∈ A exists (even if it might not be unique). The key fact is that such a T V(u) is convex and might be studied via convex analysis. The convexity of T V is evident from its definition: u → T V(u) is a scalar valued function given as the maximum of a family of linear functions u → div ϕ, u V . Let us recall some concepts from convex analysis [39, Chapter 1, Section 5], in particular, the subdifferential of T V at u. This set-valued function is denoted by ∂T V(u) and given by That is, v ∈ T V(u) if and only if it is the slope of an affine function which is tangent to the graph of T V at u. In particular, at the points where T V(u) is differentiable ∂T V(u) consists of a single element: the gradient of T V(u) at u.

Vol.82 (2014)
Mean Curvature, Threshold Dynamics,... 59 where we have to remember that ω q ij u i −u j |∇u| j and 1 |∇u| j are to be interpreted as 0 whenever |∇u| j = 0 for any j ∈ V (including j = i). Because the node function div ϕ TV is the first variation of the isotropic graph total variation, in the literature it is sometimes referred to as curvature or 1-Laplacian. In this paper we have argued why the use of the anisotropic total variation TV q a to define curvature, as in (14), is a more natural choice on graphs.