Convergence of the Graph Allen–Cahn Scheme

The graph Laplacian and the graph cut problem are closely related to Markov random fields and have many applications in clustering and image segmentation. The diffuse interface model is widely used in materials science and can also serve as a proxy for total variation minimization. In Bertozzi and Flenner (Multiscale Model Simul 10(3):1090–1118, 2012), an algorithm was developed that generalizes the diffuse interface model to graphs in order to solve the graph cut problem. This work analyzes the conditions under which the graph diffuse interface algorithm converges. Using techniques from numerical PDE and convex optimization, we show monotonicity in function value and convergence under an a posteriori condition for a class of schemes under a graph-independent stepsize condition. We also generalize our results to incorporate spectral truncation, a common technique used to save computational cost, as well as multiclass classification. Various numerical experiments are performed to compare theoretical results with practical performance.

The graph cut problem originated in computer science for the purpose of partitioning nodes on a graph [6]. It is tightly related to statistical physics due to its connections with Markov random fields (MRF) and spin systems. In particular, the maximum a posteriori (MAP) estimation of the Ising model can be formulated as a graph cut problem [17]. The results also generalize to the multiclass graph cut by extending to the generalized Potts model [5]. Therefore, efficient solutions to the graph cut problem provide a means of performing MAP estimation for these types of MRFs, and are computationally more efficient than techniques for generic MRFs such as belief propagation [32,35]. Graph partitioning is also tightly related to the study of networks in statistical physics [21,24,36]. In [18], Hu et al. applied methods for solving graph cut problems to perform modularity optimization [16,25,36], a technique widely applied for community detection in networks.
On the other hand, diffuse interface models have been widely used in mathematical physics to model the free boundary of interfaces [9,26]. Diffuse interface models are often built around the Ginzburg-Landau functional, defined as
$$GL(u) = \frac{\epsilon}{2}\int |\nabla u|^2\,dx + \frac{1}{\epsilon}\int W(u)\,dx,$$
where $W$ is a double-well potential. Evolution by the gradient flow of the Ginzburg-Landau functional has been used to model the dynamics of two phases in materials science. The most common among them is the Allen-Cahn equation [9], the $L^2$ gradient flow of the Ginzburg-Landau functional. Another commonly used model is the Cahn-Hilliard equation [2,8]. Diffuse interface models can often be used as a proxy for total variation (TV) minimization, since the $\Gamma$-limit of the Ginzburg-Landau functional is shown to be the TV semi-norm [20].
The key observation linking the two areas above is that the TV semi-norm, when suitably generalized to weighted graphs, coincides with the graph cut functional for discrete-valued functions on graphs [29]. Hence techniques for TV minimization can also be applied to solve the graph cut problem. In [3], Bertozzi et al. generalized the Ginzburg-Landau functional to graphs, and developed an algorithm based on the Allen-Cahn equation to approximately solve the graph cut problem. This was made rigorous by the result that the graph Ginzburg-Landau functional $\Gamma$-converges to the graph TV functional [28]. Following this line of work, a series of new algorithms were developed for semi-supervised and unsupervised classification problems on weighted graphs [18,23], applying techniques for TV minimization to the setting of weighted graphs.
The reason many PDE models defined on the Euclidean space R n can be generalized to discrete graphs is that the graph Laplacian matrix [30] shares many connections with the classical Laplacian operator. We recap the definition of the graph Laplacian and some of its basic properties below.
We consider a weighted graph G with vertices ordered {1, 2, . . . , N}. Each pair of vertices (i, j) is assigned a weight $w_{ij} \ge 0$, with $w_{ij} > 0$ representing an edge connecting i and j, and $w_{ij} = 0$ otherwise. The weights $w_{ij}$ form the weight matrix (or adjacency matrix) W of the graph G. Given a weight matrix W, one can construct three different kinds of graph Laplacians:
$$L = D - W \quad \text{(unnormalized)}, \qquad (2)$$
$$L_s = I - D^{-1/2} W D^{-1/2} \quad \text{(symmetric)}, \qquad (3)$$
$$L_{rw} = I - D^{-1} W \quad \text{(random walk)}, \qquad (4)$$
where D is the diagonal matrix with $d_{ii} = \sum_j w_{ij}$. Throughout this paper, we assume that each node i is connected to at least one other node, so that $d_{ii} > 0$ for all i and Eqs. (3) and (4) are well-defined.
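As a quick illustration, the three Laplacians and the Dirichlet energy identity (5) below can be checked numerically; the weight matrix here is a toy example, not taken from the paper:

```python
import numpy as np

def graph_laplacians(W):
    """Build the three graph Laplacians from a symmetric weight matrix W.

    Assumes every node has at least one neighbor, so all degrees d_ii > 0.
    """
    d = W.sum(axis=1)                       # degrees d_ii = sum_j w_ij
    D = np.diag(d)
    L_u = D - W                             # unnormalized Laplacian
    L_rw = np.eye(len(d)) - np.diag(1.0 / d) @ W              # I - D^{-1} W
    D_is = np.diag(1.0 / np.sqrt(d))
    L_s = np.eye(len(d)) - D_is @ W @ D_is  # I - D^{-1/2} W D^{-1/2}
    return L_u, L_rw, L_s

# Dirichlet energy identity: <u, L_u u> = (1/2) sum_ij w_ij (u(i) - u(j))^2
W = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 0.5],
              [2.0, 0.5, 0.0]])
u = np.array([1.0, -1.0, 0.5])
L_u, L_rw, L_s = graph_laplacians(W)
lhs = u @ L_u @ u
rhs = 0.5 * sum(W[i, j] * (u[i] - u[j]) ** 2
                for i in range(3) for j in range(3))
```

Both normalized Laplacians have rows that interact with the constant vector as expected: the rows of $L_{rw}$ (and of $L$) sum to zero.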
All three Laplacian matrices are commonly used in graph learning problems. In particular, the graph Dirichlet energy for the unnormalized graph Laplacian satisfies
$$\langle u, Lu\rangle = \frac{1}{2}\sum_{i,j} w_{ij}\,(u(i) - u(j))^2. \qquad (5)$$
Here u is a mapping from the set of nodes {1, . . . , N} to R, identified with a vector in $\mathbb{R}^N$. We use u(i) to denote the value of u on the node i. Similar to the classical Dirichlet energy, the graph Dirichlet energy penalizes similar nodes (i.e., pairs such that $w_{ij}$ is large) for having different function values, bringing a notion of "smoothness" to functions defined on the graph. In this paper, we mainly focus on the unnormalized Laplacian, and generalize to the other two cases whenever we can.

This paper studies the discrete graph Allen-Cahn scheme in [3] used for graph semi-supervised classification. We give a brief introduction to the semi-supervised learning problem and its relation to the graph Allen-Cahn scheme. Given a collection of objects indexed by Z = {1, . . . , N} and a label y(i) ∈ C for each object i, the task of semi-supervised learning is to infer the labels for all items given only the labels on a subset of objects $Z' \subset Z$. We mainly focus on the case of binary classification, i.e., when |C| = 2, since the original Ginzburg-Landau model in [3] was designed to handle the binary case. However, we also generalize modestly to incorporate multiclass classification in Sect. 5. Following the convention in [3], the binary label set C is assumed to be C = {−1, 1}.

Next, we introduce the Ginzburg-Landau energy and the Allen-Cahn equation on graphs. Define the Ginzburg-Landau energy on graphs by replacing the spatial Laplacian with the graph Laplacian L:
$$GL(u) = \frac{\epsilon}{2}\langle u, Lu\rangle + \frac{1}{\epsilon}\sum_i W(u(i)), \qquad (6)$$
where W is the double-well potential $W(x) = \frac{1}{4}(x^2 - 1)^2$. Let $\mathcal{W}(u) = \sum_i W(u(i))$. The Allen-Cahn equation on graphs is defined as the gradient flow of the graph Ginzburg-Landau functional:
$$\frac{du}{dt} = -\epsilon Lu - \frac{1}{\epsilon}W'(u). \qquad (7)$$
The discrete graph Allen-Cahn scheme in [3] is a semi-implicit discretization of Eq. (7); the semi-implicit treatment counters the ill-conditioning of the graph Laplacian. To perform graph semi-supervised classification, we add a quadratic fidelity term $\frac{\eta}{2}\sum_{i\in Z'}(u(i) - y(i))^2$ to the graph Ginzburg-Landau energy, where y(i) are the known labels and η is a scalar parameter reflecting the strength of the fidelity. For our purpose, it is more convenient to adopt a matrix notation for the fidelity term, namely
$$F(u) = GL(u) + \frac{\eta}{2}\|u - y\|_\Lambda^2, \qquad \|u - y\|_\Lambda^2 := \langle u - y, \Lambda(u - y)\rangle,$$
where Λ is a diagonal matrix with $\Lambda_{ii} = 1$ if $i \in Z'$ and 0 otherwise. The value u(i) can be interpreted as a continuous label assignment, and thresholding on u(i) > 0 and u(i) < 0 gives a corresponding partition of the graph. Solving the gradient flow of GL(u) via a semi-implicit discretization yields
$$u^{k+1} = u^k - dt\left(\epsilon L u^{k+1} + \frac{1}{\epsilon}W'(u^k)\right), \qquad (8)$$
and solving the gradient flow of F(u) in the same manner, we have
$$u^{k+1} = u^k - dt\left(\epsilon L u^{k+1} + \frac{1}{\epsilon}W'(u^k) + \eta\Lambda(u^k - y)\right). \qquad (10)$$
In later sections, we will study the scheme (8) first and then incorporate the fidelity term in the analysis.

Next, we introduce spectral truncation. Note that in each iteration of (8) and (10), we need to solve a linear system of the form $(I + dt\,L)u = v$. In many applications, the number of nodes N on a graph is huge, and it is too costly to solve this equation directly. In [3,23], the proposed strategy was to project u onto the m eigenvectors of the graph Laplacian with the smallest eigenvalues. In practice, spectral truncation gives accurate segmentation results but is computationally much cheaper. The reason spectral truncation works is that the first few eigenvectors of the graph Laplacian carry rich geometric information about the graph. In particular, the second eigenvector, named the Fiedler vector, approximates the solution to the normalized graph cut problem [30].
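A minimal sketch of the semi-implicit step described above; the graph, parameters, and fidelity-free setup below are illustrative choices, not the paper's experiments:

```python
import numpy as np

def allen_cahn_step(u, L, dt, eps, eta=0.0, Lam=None, y=None):
    """One semi-implicit step: explicit (forward) treatment of W'(u) and the
    fidelity term, implicit (backward) treatment of the stiff Laplacian term."""
    Wp = u**3 - u                           # W'(u) for W(x) = (x^2 - 1)^2 / 4
    v = u - dt * Wp / eps
    if Lam is not None:
        v = v - dt * eta * Lam @ (u - y)    # quadratic fidelity contribution
    A = np.eye(len(u)) + dt * eps * L       # implicit Laplacian solve
    return np.linalg.solve(A, v)

# Toy 3-node graph, no fidelity (eta = 0).
W = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 1.0],
              [0.2, 1.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W              # unnormalized Laplacian
u = np.array([0.9, -0.8, 0.1])
for _ in range(100):
    u = allen_cahn_step(u, L, dt=0.4, eps=1.0)
labels = np.sign(u)                         # threshold to obtain the partition
```

With dt = 0.4 ≤ 0.5 (the tight bound of Sect. 2 at ε = 1), the iterates stay in the unit $L^\infty$ ball.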
In practice, the selection of the stepsize dt is very important to the performance of the model, but it has largely been chosen empirically by trial and error in previous papers. In this paper, we carry out a thorough and rigorous analysis of the range of stepsizes for which the scheme is well-behaved. Our main contributions are:
- We prove that there exists a graph-independent upper bound c such that for all 0 ≤ dt ≤ c, the schemes (8), (10) are monotone in the Ginzburg-Landau energy, and that under an a posteriori condition, the sequence $\{u^k\}$ is convergent.
- We show that the upper bound c depends linearly on ε, and is inversely proportional to the fidelity strength η in (10).
- We generalize the results to incorporate spectral truncation and multiclass classification.
- We conduct a variety of numerical experiments to compare practical performance with theory.
The paper is structured as follows: in Sect. 2, we prove that the scheme is bounded via a discrete version of the maximum principle. In Sect. 3, we use $L^2$ estimates to prove monotonicity and convergence. In Sect. 4, we prove monotonicity and boundedness for spectral truncation under a graph-dependent stepsize bound $dt = O(N^{-1})$, and provide an example showing the dependence of dt on the graph size. In Sect. 5, we generalize the results to multiclass classification. In Sect. 6, a variety of numerical experiments are performed to compare the theory with practical performance. We close this introduction with a list of notations and definitions used throughout the paper.
- L: placeholder for any of the three definitions of the graph Laplacian; the exact choice will be specified in the proposition or context where it is referred to.
- Z: the set of nodes of the graph, with cardinality N; $Z'$: the fidelity set, i.e., the set of nodes where the labels are known.
- u: a function Z → R, identified with a vector in $\mathbb{R}^N$; u(j) denotes the evaluation of u on node j, and $u^k$ denotes the kth iterate of some numerical scheme.
- F: a diagonal map $F(u) = (F_1(u(1)), \ldots, F_N(u(N)))$ for $u \in \mathbb{R}^N$; we call $F_i : \mathbb{R} \to \mathbb{R}$ the components of the diagonal map F.

Maximum Principle: $L^\infty$ Estimates
The main result for this section is the following: if $\|u^0\|_\infty \le 1$ and $0 \le dt \le \epsilon/2$, then the iterates of (8) satisfy $\|u^k\|_\infty \le 1$ for all k (Proposition 1).
What is notable is that the stepsize restriction is independent of the graph size. We also note that the bound on dt depends linearly on ε, and we will generalize this dependency to include the fidelity term later in this section. To prove the proposition, we split the discretization (8) into two parts:
$$v^k = u^k - \frac{dt}{\epsilon}W'(u^k),$$
$$(I + dt\,\epsilon L)\,u^{k+1} = v^k. \qquad (12)$$
We will prove that $\|u^{k+1}\|_\infty \le \|v^k\|_\infty$ for all dt > 0 via the maximum principle, and show that the stepsize restriction essentially comes from the first line of (12). For future reference, we call the first line of (12) the forward step, since it corresponds to a forward stepping scheme for the gradient flow, and the second line the backward step correspondingly.

Maximum Principle
The classical maximum principle argument relies on the fact that $\Delta u(x_0) \ge 0$ for $x_0$ a local minimizer. This fact carries over to graphs and is an extension of the classical maximum principle for finite difference operators [10]: if i is a local minimizer of u, i.e., $u(i) \le u(j)$ for all j with $w_{ij} > 0$, then $(Lu)(i) \le 0$.

Proof For both the random walk and the unnormalized Laplacian, we can write $(Lu)(i) = \sum_j c_{ij}\,(u(i) - u(j))$ with coefficients $c_{ij} \ge 0$. If i is a local minimizer, every term in the sum is nonpositive, hence $(Lu)(i) \le 0$.

Next, we prove a maximum principle for discrete time.
Suppose $(I + dt\,L)u = v$ for some dt > 0, where L is either the unnormalized or the random walk Laplacian. Then $\|u\|_\infty \le \|v\|_\infty$.

Proof Suppose $i = \arg\min_j u(j)$ is any node that attains the minimum of u. Then, since $(Lu)(i) \le 0$, we have $u(i) = v(i) - dt\,(Lu)(i) \ge v(i) \ge -\|v\|_\infty$. Arguing similarly with the maximum, we conclude that $\|u\|_\infty \le \|v\|_\infty$.
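This discrete maximum principle can be checked numerically; the random weight matrix below is an arbitrary test case, and the bound holds for every dt > 0:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 30
A = rng.random((N, N))
W = (A + A.T) / 2                    # symmetric weights
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W       # unnormalized Laplacian

# If (I + dt L) u = v, then ||u||_inf <= ||v||_inf for any dt > 0.
for dt in [0.1, 1.0, 100.0]:
    v = rng.uniform(-1, 1, N)
    u = np.linalg.solve(np.eye(N) + dt * L, v)
    assert np.max(np.abs(u)) <= np.max(np.abs(v)) + 1e-10
```

The underlying reason is that $I + dt\,L$ is a diagonally dominant M-matrix whose inverse is nonnegative with unit row sums.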

Proof of Boundedness
We show that the stepsize bound needed for the sequence $u^k$ to remain bounded depends only on the forward step of the scheme.

Proposition 4 Let $u^k$ be defined by
$$v^k = u^k - dt\,\Psi(u^k), \qquad (I + dt\,\sigma L)\,u^{k+1} = v^k, \qquad (16)$$
where Ψ is a diagonal map $\Psi : (u(1), \ldots, u(N)) \mapsto (\Psi_1(u(1)), \ldots, \Psi_N(u(N)))$, L is the unnormalized graph Laplacian, and σ is some constant greater than 0. Define the forward map componentwise by $F^i_{dt}(x) = x - dt\,\Psi_i(x)$. If each $F^i_{dt}$ maps $[-M, M]$ into itself and $\|u^0\|_\infty \le M$, then $\|u^k\|_\infty \le M$ for all k.

Proof We set M = 1 and $\Psi = (W', \ldots, W')$, where W is the double-well function. Note that by replacing dt with dt/ε and setting $\sigma = \epsilon^2$ in (16), we recover the original scheme (12). Therefore, we may assume ε = 1, and scale the bound obtained by ε. The componentwise forward map is then
$$F_{dt}(x) = x - dt\,W'(x). \qquad (17)$$

Lemma 1 Define $F_{dt}$ as in (17). Then $F_{dt}$ maps $[-M, M]$ into itself if and only if
$$\max_{x \in [-M, M]} |x - dt\,(x^3 - x)| \le M. \qquad (18)$$
Since F dt is cubic in x, (18) can be solved analytically via brute force calculation. Setting M = 1 and solving (18) for dt ≥ 0 gives 0 ≤ dt ≤ 0.5.
The choice of the constant M = 1 is natural since the function values u(i) are ideally close to the binary class labels {−1, 1}. However, if we merely want to prove boundedness without enforcing $\|u^k\|_\infty \le 1$, we can obtain a larger stepsize bound by maximizing the bound from (18) with respect to M. The reason we compute these constants explicitly is that we will compare them in Sect. 6 against results from real applications. For future reference, the dt ≤ 0.5 bound will be called the "tight bound", while the dt ≤ 2.1 bound will be called the "loose bound".
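The tight and loose bounds can be explored numerically; the grid check and bisection below are an illustrative sketch of solving (18) with ε = 1, not the paper's analytic computation:

```python
import numpy as np

def F(x, dt):
    """Forward map with eps = 1: F_dt(x) = x - dt * W'(x)."""
    return x - dt * (x**3 - x)

def maps_into(M, dt, n=20001):
    """Check condition (18) on a fine grid: does F_dt map [-M, M] into itself?"""
    x = np.linspace(-M, M, n)
    return np.max(np.abs(F(x, dt))) <= M + 1e-12

def max_dt(M):
    """Largest admissible stepsize for a given M, found by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if maps_into(M, mid) else (lo, mid)
    return lo

# Tight bound: with M = 1 the forward map preserves [-1, 1] iff dt <= 0.5.
assert maps_into(1.0, 0.5) and not maps_into(1.0, 0.51)
# Allowing a larger M admits a larger stepsize, which is the idea behind
# the loose bound.
assert max_dt(1.4) > max_dt(1.0)
```

The bisection relies on the fact that, for fixed M, once the map fails to preserve the interval it keeps failing as dt grows.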

Generalizations of the Scheme
In this section, we extend the previous results to the case where fidelity is added, and also to the case of the symmetric graph Laplacian $L_s$.
We restate the graph Allen-Cahn scheme with fidelity:
$$v^k = u^k - dt\left(\frac{1}{\epsilon}W'(u^k) + \eta\Lambda(u^k - y)\right), \qquad (I + dt\,\epsilon L)\,u^{k+1} = v^k, \qquad (19)$$
where Λ is a diagonal matrix with $\Lambda_{ii} = 1$ if i is in the fidelity set $Z'$ and 0 otherwise.

Proposition 5 (Graph Allen-Cahn with fidelity) Define $u^k$ by (19) and suppose $\|u^0\|_\infty \le 1$. Then for all $0 \le dt \le \frac{\epsilon}{2 + \eta\epsilon}$, we have $\|u^k\|_\infty \le 1$ for all k.
Proof Denote the forward map of (19) by $F_{dt}$; the argument then parallels that of Proposition 4. The case of the symmetric graph Laplacian $L_s$ is a little different. Since $L_s$ does not satisfy (13), we can no longer apply the maximum principle arguments. However, we are still able to prove boundedness under the assumption that the graph satisfies a certain uniformity condition.
Define u k by the semi-implicit scheme (11) where L is set to be the symmetric Laplacian L s .
Proof By the definitions of $L_s$ and $L_{rw}$, we have the relation
$$L_s = D^{1/2} L_{rw} D^{-1/2}. \qquad (21)$$
Substituting (21) into line 2 of (12) with $L = L_s$, we perform a change of variables $\tilde u^k = \alpha^{-1} D^{-1/2} u^k$. By the definition of α, we have $\|\tilde u^0\|_\infty \le 1$. We use the same technique as before: we can prove the theorem if we show that the transformed forward map $F^i_{dt}$ maps [−1, 1] to itself for all i = 1, . . . , N. This is formalized in the next lemma, whose proof we omit since it involves only brute force calculations.

Lemma 3 For any
Finally, since $\|\tilde u^k\|_\infty \le 1$, we have $\|u^k\|_\infty \le 2$ by the definition of $\tilde u^k$.

Remark 1
The condition ρ < M with M = 4 is arbitrary, chosen only to simplify the calculations for dt. The proposition here is weaker than Proposition 1 due to the loss of the maximum principle. We will see this again in the analysis of spectral truncation in Sect. 4.

Energy Method: $L^2$ Estimates
In this section, we derive estimates in terms of the $L^2$ norm. Our goal is to prove that the graph Allen-Cahn scheme is monotone in function value, and to derive convergence results for the sequence $\{u^k\}$. We drop the subscript on $L^2$ norms in this section. Our proof is loosely motivated by the analysis of convex-concave splitting in [11,33]. In [11], Eyre proved the following monotonicity result: if $E = E_1 + E_2$ with $E_1$ convex and $E_2$ concave, then the semi-implicit scheme
$$u^{k+1} = u^k - dt\,\big(\nabla E_1(u^{k+1}) + \nabla E_2(u^k)\big) \qquad (25)$$
is monotone in E, namely, $E(u^{k+1}) \le E(u^k)$. In our proof, we will set $E = GL(u)$, $E_1 = \frac{\epsilon}{2}\langle u, Lu\rangle$, and $E_2 = \frac{1}{\epsilon}\mathcal{W}(u)$. Since $E_2$ is not concave, we will have to generalize Proposition 7 to general $E_2$. But first, we digress a bit and establish the connection between the semi-implicit scheme (25) and the proximal gradient method, which merely assumes $E_1$ to be sub-differentiable. The reason for this generalization is to have a unified framework for dealing with $E_1$ taking extended real values, which is the case when we study spectral truncation in Sect. 4.
The proximal gradient iteration [4] is defined as
$$x^{k+1} = \text{Prox}_{dt\,E_1}\big(x^k - dt\,\nabla E_2(x^k)\big), \qquad (26)$$
where the Prox operator is defined as
$$\text{Prox}_{\gamma f}(x) = \arg\min_y \frac{1}{2}\|y - x\|^2 + \gamma f(y). \qquad (27)$$
This scheme is in fact equivalent to the semi-implicit scheme (25) when $E_1$ is differentiable. This is clear from the implicit gradient interpretation of the proximal map: namely, if $y = \text{Prox}_{\gamma f}(x)$, then $y = x - \gamma\,\partial f(y)$ for some subgradient $\partial f(y)$. The Prox operator is well-defined if f is a proper closed convex function taking extended real values, namely, if the domain of f is non-empty, f is convex, and the epigraph of f is closed. We now prove an energy estimate for the proximal gradient method when $E_2$ is a general function.

Proposition 8 Let $x^{k+1}$ be defined by (26), and let M bound $\|\nabla^2 E_2\|$ along the segment between $x^k$ and $x^{k+1}$. Then
$$E(x^{k+1}) \le E(x^k) - \Big(\frac{1}{dt} - \frac{M}{2}\Big)\|x^{k+1} - x^k\|^2. \qquad (28)$$
Proof The second line is by the definition of subgradients, where $\partial E_1(x^{k+1})$ can be any vector in the subgradient set. The third line follows by substituting the particular subgradient $\partial E_1(x^{k+1})$ appearing in the definition of $x^{k+1}$. The fourth line is obtained by a one-variable Taylor expansion of $E_2$ along the line segment between $x^k$ and $x^{k+1}$.
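For the quadratic part $E_1(u) = \frac{\epsilon}{2}\langle u, Lu\rangle$, the proximal map has the closed form $(I + dt\,\epsilon L)^{-1}v$, which is exactly the backward step of the semi-implicit scheme. A small numerical check of this equivalence (toy random graph, illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps, dt = 20, 1.0, 0.3
A = rng.random((N, N))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W          # unnormalized Laplacian
v = rng.standard_normal(N)

# Closed form of Prox_{dt E1}(v) for E1(u) = (eps/2) <u, L u>:
u_prox = np.linalg.solve(np.eye(N) + dt * eps * L, v)

# Optimality condition of argmin_u (1/2)||u - v||^2 + dt * E1(u):
#   u - v + dt * eps * L u = 0.
residual = u_prox - v + dt * eps * L @ u_prox
assert np.max(np.abs(residual)) < 1e-8
```

So one proximal gradient step $u^{k+1} = \text{Prox}_{dt\,E_1}(u^k - dt\,\nabla E_2(u^k))$ reproduces the forward/backward splitting of (12).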
Next, we apply the estimate of Proposition 8 and the boundedness results in Sect. 2 to prove that the graph Allen-Cahn scheme is monotone in the Ginzburg-Landau energy under a graph-independent stepsize.
Proposition 9 (Monotonicity of the Graph Allen-Cahn Scheme) Let $u^k$ be the graph Allen-Cahn scheme with fidelity defined below:
$$v^k = u^k - dt\left(\frac{1}{\epsilon}W'(u^k) + \eta\Lambda(u^k - y)\right), \qquad (I + dt\,\epsilon L)\,u^{k+1} = v^k, \qquad (30)$$
where L is the unnormalized graph Laplacian. If $\|u^0\|_\infty \le 1$, then for all $0 \le dt \le \min\big(\frac{\epsilon}{2+\eta\epsilon},\ \frac{2\epsilon}{2+\eta\epsilon}\big)$, the scheme is monotone in the Ginzburg-Landau energy with fidelity,
$$E(u^{k+1}) \le E(u^k) - \Big(\frac{1}{dt} - \frac{M}{2}\Big)\|u^{k+1} - u^k\|^2 \le E(u^k). \qquad (31)$$
The result holds for the symmetric Laplacian if we add the uniformity condition (20) for the graph.

Proof Since (30) is equivalent to the proximal gradient scheme with $E_1$ and $E_2$ defined above, we can apply Proposition 8. Since the $L^\infty$ unit ball is convex, line segments from $u^k$ to $u^{k+1}$ lie in the set $\{\|u\|_\infty \le 1\}$, and we can estimate M by the inequality
$$M \le \max_{\|\xi\|_\infty \le 1} \|\nabla^2 E_2(\xi)\| \le \frac{2}{\epsilon} + \eta.$$
Hence $u^k$ is monotone in E. The case of the symmetric Laplacian can be proved in a similar manner by computing an estimate of $\max_{\|\xi\|_\infty \le 2} \|\nabla^2 E_2(\xi)\|$.
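Proposition 9 can be checked numerically; below is a sketch with ε = 1, η = 0, and dt = 0.4 (inside the stated bound, which is 0.5 in this case), on an arbitrary random graph:

```python
import numpy as np

rng = np.random.default_rng(2)
N, eps, dt = 40, 1.0, 0.4
# Random sparse symmetric weights, plus a path to keep every node connected.
A = rng.random((N, N)) < 0.2
W = np.triu(A, 1).astype(float)
W = W + W.T
W += np.eye(N, k=1) + np.eye(N, k=-1)
L = np.diag(W.sum(axis=1)) - W

def GL(u):
    """Graph Ginzburg-Landau energy (no fidelity)."""
    return 0.5 * eps * (u @ L @ u) + np.sum((u**2 - 1) ** 2) / (4 * eps)

u = rng.uniform(-1, 1, N)                  # ||u^0||_inf <= 1
energies = [GL(u)]
for _ in range(200):
    v = u - (dt / eps) * (u**3 - u)        # forward step
    u = np.linalg.solve(np.eye(N) + dt * eps * L, v)  # backward step
    energies.append(GL(u))
diffs = np.diff(energies)
assert np.all(diffs <= 1e-8)               # energy never increases
```

The iterates also stay in the unit $L^\infty$ ball, consistent with Sect. 2.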
Next, we discuss the convergence of the iterates $\{u^k\}$. First, we prove subsequence convergence of $\{u^k\}$ to a stationary point of E(u). We first need a lemma on the sequence $\{u^{k+1} - u^k\}$.

Lemma 4 Let $u^k$, dt be as in Proposition 9. Then $\lim_{k\to\infty} \|u^{k+1} - u^k\| = 0$.
Proof Summing Eq. (31) over k, we obtain, for all n,
$$\Big(\frac{1}{dt} - \frac{M}{2}\Big)\sum_{k=0}^{n-1}\|u^{k+1} - u^k\|^2 \le E(u^0) - E(u^n).$$
Since $E(u^n) \ge 0$ and $dt < 2/M$, the series $\sum_k \|u^{k+1} - u^k\|^2$ converges, which proves the lemma.
Proposition 10 (Subsequence convergence to stationary points) Let $u^k$, dt be as in Proposition 9, and let S be the set of limit points of $\{u^k\}$. Then every $u^* \in S$ is a critical point of E, i.e., $\nabla E(u^*) = 0$. Hence any convergent subsequence of $\{u^k\}$ converges to a stationary point of E. The proof passes to the limit in the scheme along the subsequence, using Lemma 4 and the continuity of $\nabla E$. In general, we cannot prove that the full sequence $\{u^k\}$ is convergent, since it is possible for the iterates to oscillate between several minima. However, we show that when the set of limit points is finite, we do have convergence. This is stated in Lemma 5, which is proved in the Appendix.

Lemma 5
Let u k be a bounded sequence in R N , and lim k→∞ u k+1 − u k = 0. Let S be the set of limit points of the set {u k |k ≥ 1}. If S has only finitely many points, then S contains only a single point u * , and hence lim k→∞ u k = u * .
Finally, we provide an easy-to-check a posteriori condition that guarantees convergence using the lemma above. The condition states that the iterates $u^k$ must take values reasonably close to the double-well minima −1 and 1. Empirically, we observe that the values of $u^k$ are usually near −1 and 1 close to convergence, so the condition is not very restrictive in practice.
where the condition requires the iterates to eventually lie in $D = \{u : 1/3 < u(i)^2 \le 1,\ \forall i\}$, and $u^2$ denotes the diagonal matrix whose entries are $u(i)^2$. Note that $\nabla^2 E(u) = \epsilon L + \eta\Lambda + \frac{1}{\epsilon}(3u^2 - I)$ is positive definite on D, since $\eta\Lambda$ and $\epsilon L$ are positive semi-definite and $3u^2 - I$ is positive definite on D. Therefore, the stationary points of E are isolated on D. Since D is bounded, this implies the set of stationary points in D is finite.

Analysis on Spectral Truncation
In this section, we generalize the analysis of the previous sections to incorporate spectral truncation. We establish a bound $dt = O(N^{-1})$ for monotonicity and boundedness when the initial condition $u^0 \in V_m$, where $V_m$ is defined below, and $dt = O(N^{-3/2})$ for the general case. First, we formally define the spectrally truncated graph Allen-Cahn scheme. All conclusions in this section hold for both the unnormalized and the symmetric Laplacian; therefore we will not make the distinction, and denote both by L.
Let $\{\phi_1, \phi_2, \ldots, \phi_m\}$ be eigenvectors of the graph Laplacian L ordered by eigenvalues in ascending order, $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$. Define the mth eigenspace as $V_m = \text{span}\{\phi_1, \phi_2, \ldots, \phi_m\}$, and $P_m$ as the orthogonal projection operator onto the space $V_m$. Then the spectrally truncated scheme is defined as
$$v^k = u^k - \frac{dt}{\epsilon}W'(u^k), \qquad (I + dt\,\epsilon L)\,u^{k+1} = P_m v^k. \qquad (34)$$
Note that in practice, we do not directly solve the linear system on the second line of (34), but instead express $u^{k+1}$ directly in terms of the eigenvectors as in (38). However, the matrix form is notationally more convenient for the subsequent analysis. We want to apply the energy estimates of Sect. 3 to spectral truncation. To do this, we first show that the spectrally truncated scheme (34) can be expressed as a proximal gradient scheme for suitable $E_1$ and $E_2$.
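A sketch of the truncated step and its eigenbasis form mentioned above (toy random graph; ε = 1 and all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, m, dt = 50, 5, 0.05
A = rng.random((N, N))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
lam, Phi = np.linalg.eigh(L)               # eigenvalues in ascending order
Phi_m, lam_m = Phi[:, :m], lam[:m]         # the m smallest modes span V_m

def truncated_step(u):
    v = u - dt * (u**3 - u)                # forward step (eps = 1)
    a = Phi_m.T @ v                        # coefficients of P_m v
    return Phi_m @ (a / (1.0 + dt * lam_m))  # implicit solve, diagonal in V_m

# The eigenbasis update agrees with solving (I + dt L) u = P_m v directly.
u0 = Phi_m @ rng.standard_normal(m)        # initial condition in V_m
v0 = u0 - dt * (u0**3 - u0)
direct = np.linalg.solve(np.eye(N) + dt * L, Phi_m @ (Phi_m.T @ v0))
assert np.max(np.abs(truncated_step(u0) - direct)) < 1e-10

u = u0
for _ in range(30):
    u = truncated_step(u)                  # iterates stay in V_m by construction
```

The diagonal solve is why the truncated scheme is cheap: only m eigenpairs are ever needed.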

Proposition 12 (Reformulation of Spectral Truncation)
The spectrally truncated scheme (34) is equivalent to the proximal gradient scheme (26) with
$$E_1(u) = \frac{\epsilon}{2}\langle u, Lu\rangle + I_{V_m}(u), \qquad E_2(u) = \frac{1}{\epsilon}\mathcal{W}(u),$$
where $I_{V_m}$ is the indicator function of the mth eigenspace, i.e., $I_{V_m}(u) = 0$ if $u \in V_m$ and $+\infty$ otherwise.
Proof Let v be any vector in $\mathbb{R}^N$. Define u and u′ by the spectrally truncated step and the proximal step, respectively, namely
$$(I + dt\,\epsilon L)\,u = P_m v, \qquad u' = \text{Prox}_{dt\,E_1}(v). \qquad (36)$$
We only have to show u = u′. Decomposing (36) in terms of the eigenbasis $\{\phi_1, \phi_2, \ldots, \phi_m\}$ and comparing coefficients, both u and u′ equal $\sum_{i=1}^m \frac{\langle \phi_i, v\rangle}{1 + dt\,\epsilon\lambda_i}\,\phi_i$. Hence u = u′.
Since the orthogonal projection $P_m$ can be expansive in the $L^\infty$ norm, i.e., $\|P_m u\|_\infty \le \|u\|_\infty$ does not always hold, we lose the maximum principle. However, we show that the energy estimate alone is enough to prove monotonicity and boundedness under a smaller stepsize.

Proposition 13 Suppose $u^0 \in V_m$ with $\|u^0\|_\infty \le 1$. Then there exists δ independent of N such that for all $0 \le dt \le \delta N^{-1}$, the sequence $\{u^k\}$ is bounded and $GL(u^{k+1}) \le GL(u^k)$.

The choice ε = 1 is only to avoid complicated dependencies on ε that obscure the proof; for the next two sections, we assume ε = 1 throughout. To prove the theorem, we first establish the following lemmas.

Lemma 6 (Inverse bound) Suppose $GL(u) \le C_0 N$. Then $\|u\|_2 \le C_1\sqrt{N}$, where $C_1$ depends only on $C_0$.

Lemma 7 Let $u^k$ and $u^{k+1}$ be defined by (34). Then the following inequality holds:
$$\|u^{k+1}\|_2 \le (1 + dt)\,\|u^k\|_2 + dt\,\|u^k\|_2^3.$$
Proof Since L is symmetric positive semi-definite and the orthogonal projection $P_m$ is non-expansive in the $L^2$ norm, we have
$$\|u^{k+1}\|_2 \le \|P_m v^k\|_2 \le \|v^k\|_2 \le \|u^k\|_2 + dt\,\|W'(u^k)\|_2 \le (1 + dt)\,\|u^k\|_2 + dt\,\|u^k\|_2^3,$$
where we used $\|u^3\|_2 \le \|u\|_\infty^2\,\|u\|_2 \le \|u\|_2^3$. Next, we prove the main proposition. The idea is to choose dt small enough that monotonicity in GL is satisfied, and then apply Lemma 6 to obtain a bound on $u^k$.
Proof (Proposition 13) Let $E_1(u) = \frac{1}{2}\langle u, Lu\rangle + I_{V_m}(u)$, $E_2(u) = \mathcal{W}(u)$, and $E = E_1 + E_2 = GL(u) + I_{V_m}(u)$. By Proposition 12, (34) is equivalent to the proximal gradient scheme for the splitting $E = E_1 + E_2$. We also have $E(u^k) = GL(u^k)$ for all $k \ge 0$, since $u^k \in V_m$. Therefore, we use $E(u^k)$ and $GL(u^k)$ interchangeably.
We argue by induction on the hypothesis
$$\|u^k\|_2 \le C_1\sqrt{N}, \qquad GL(u^k) \le C_0 N, \qquad (43)$$
where (43) is assumed to hold at iteration k. We first prove the first part of (43) for k + 1. Since $\|u^k\|_2 \le C_1\sqrt{N}$, we apply Lemma 7 and get $\|u^{k+1}\|_2 \le C_1(1 + dt)N^{1/2} + C_1^3\,dt\,N^{3/2}$. Therefore, we can choose $\delta_1$ independent of N such that for all $0 \le dt \le \delta_1 N^{-1}$, $\|u^{k+1}\|_2 \le A_1 N^{1/2}$ for some $A_1$ independent of N. Next, we apply Proposition 8 and choose dt such that $E(u^k) \ge E(u^{k+1})$. Since $\|u^{k+1}\|_\infty \le \|u^{k+1}\|_2 \le A_1 N^{1/2}$, we can set M in Proposition 8 by the estimate
$$M \le \max_i\,\big(3u(i)^2 - 1\big) \le 3A_1^2 N =: A_2 N,$$
where $A_2$ is independent of N. Let $\delta_2 = 2/A_2$ and $\delta = \min(\delta_1, \delta_2)$; then for $0 \le dt \le \delta N^{-1}$, the scheme is monotone and $GL(u^{k+1}) \le GL(u^k) \le C_0 N$. To prove the second part of (43), note that since $GL(u^{k+1}) \le C_0 N$, we can apply the inverse bound of Lemma 6 to get $\|u^{k+1}\|_2 \le C_1\sqrt{N}$. This completes the induction step.
In Proposition 13, we assumed the initial condition $u^0$ to lie in the subspace $V_m$. This is not generally done in practice, as $u^0$ is usually chosen to have binary values {−1, 1}. The corollary below gives a monotonicity result for $u^0$ not in $V_m$.

Corollary 1 Let $u^k$ be defined as in Proposition 13, and let $u^0$ be any vector satisfying $\|u^0\|_\infty \le 1$. Then there exists δ independent of N such that for all $dt < \delta N^{-3/2}$, $\{u^k\}$ is bounded and $GL(u^{k+1}) \le GL(u^k)$ for $k \ge 1$.

Proof Since $u^0$ need not lie in the feasible set $V_m$, we may have $E(u^0) = +\infty \ne GL(u^0)$. However, since $u^1 \in V_m$, we can start the induction from k = 1; the bound on $\|u^1\|_2$ for the case k = 1 was already proved above. To prove (44) for general k, we apply Lemma 7 and choose $0 \le dt \le \delta_1 N^{-3/2}$ so that $\|v^k\|_2 \le A_1 N^{3/4}$. We then apply Proposition 8, estimate M as before, and set $\delta_2 = 2/A_2$. Choosing $\delta = \min(\delta_1, \delta_2)$, we obtain monotonicity for $0 \le dt \le \delta N^{-3/2}$.

Fig. 1 Illustration of the counter example graph with N = 7. We index the leftmost node by 1 and the rightmost node by 2, both marked by a cross in the figure. Starting from the top left node marked by a circle, we rotate counterclockwise and assign odd indices {2k + 1 | k ≥ 1} to these nodes. We assign even indices {2k | k ≥ 2} on the right similarly. There are N nodes marked by circles on each side, hence the graph has a total of 2N + 2 nodes

A Counter Example for Graph-Independent Stepsize Restriction
We proved that the spectrally truncated scheme is monotone under the stepsize range $0 \le dt \le \delta = O(N^{-1})$. One would hope to achieve a graph-free stepsize rule, as in the case of the original scheme (8) without spectral truncation. However, as the example below shows, a constant stepsize guaranteeing monotonicity over all graph Laplacians of all sizes is not possible.

Proposition 14 (Graph Size Dependent Stepsize Restriction) Define $u^k$ as in (34), with ε = 1. For any δ > 0 and $dt = \delta N^{-\alpha}$, $0 \le \alpha < 1$, we can always find an unnormalized graph Laplacian $L_{N\times N}$ and some initial condition $\|u^0\|_\infty = 1$ such that the scheme (34) with truncation number m = 2 is not monotone in the Ginzburg-Landau energy.
Remark 2 The case α = 0 corresponds to a graph-independent stepsize. However, this result is stronger: it claims that dt must shrink at least as fast as $O(N^{-1})$ for monotonicity to hold for all graphs.
To prove Proposition 14, we explicitly construct a collection of weighted graphs that require increasingly small stepsizes to guarantee monotonicity as the graph size N increases. The graph is defined in Definition 1 and illustrated in Fig. 1. To give the idea behind the construction, note that the reason the maximum principle fails for spectral truncation is that a general orthogonal projection P is expansive in the $L^\infty$ norm: for some vector $\|v\|_\infty \le 1$, we have in the worst case $\|Pv\|_\infty = O(\sqrt{N})$. Our strategy is to explicitly construct a graph such that the projection operator $P_m$ onto one of its eigenspaces $V_m$ attains this worst-case $L^\infty$ expansion. This is made precise in Proposition 15.

Definition 1 (Counter Example Graph)
1. Indexing: We index the nodes as shown in Fig. 1. The graph has a total of 2N + 2 nodes, where N is the number of nodes marked by a circle on each side.
2. Edge weights: With reference to Fig. 1, we set the weights of the solid black edges to 10, of the solid gray edges to 1, and of the dashed gray edges to $\gamma/N$, where $\gamma = \frac{2}{1 - N^{-1}} = 2 + o(1)$. Writing out the weight matrix, we have $w_{ij} = 10$ for i, j of the same parity with $i, j \ne 1, 2$, with the remaining entries as indicated in Fig. 1.
3. Graph Laplacian: We choose L to be the unnormalized graph Laplacian L = D − W.

Proposition 15 Under the setup above, the second eigenvector $\phi_2$ of the graph Laplacian can be computed explicitly.
We refer to the Appendix for the proof of this proposition. Next, we give the proof of Proposition 14. The idea is that after the first two iterations, $|u^2(1)|$ is arbitrarily larger than $|u^1(1)|$, and thus the scheme cannot be monotone in the Ginzburg-Landau energy.
Proof (Proposition 14) Define $u^k$ by the spectrally truncated scheme (34) with $u^0 = \text{Sgn}(\phi_2)$ and $dt = \delta N^{-\alpha}$ for some δ > 0 and $0 \le \alpha < 1$. A direct computation using the explicit form of $\phi_2$ shows that $u^2(1)$ is asymptotically larger than $u^1(1)$ with respect to N; hence $GL(u^2) > GL(u^1)$ for N large, and the scheme is not monotone in GL for large N.

Heuristic Explanation for Good Typical Behavior
Despite the pathological behavior of the example given above, the stepsize for spectral truncation does not depend badly on N in practice. In this section, we attempt to give a heuristic explanation of this from two viewpoints.
The first view analyzes the projection operator $P_m$ in the $L^\infty$ norm. The maximum principle fails because $P_m$ can be expansive in the $L^\infty$ norm: for some vector $\|v\|_\infty \le 1$, $\|P_m v\|_\infty$ can be as large as $O(\sqrt{N})$. However, an easy analysis shows that the probability of attaining such an $O(\sqrt{N})$ bound decays exponentially as N grows large, as shown in a simplified analysis in Proposition 17 of the Appendix. Thus in practice, it is very rare that applying $P_m$ violates the maximum principle "too much".
The second view restricts attention to data that come from a random sample. Namely, we assume that the data points $x_i$ are sampled i.i.d. from a probability distribution p. In [31], it is proven under very general assumptions that the eigenfunctions and eigenvalues of the symmetric graph Laplacian converge to continuous limits almost surely. Moreover, the projection operators $P_k$ converge in various senses (see [31] for details) to their continuous limits. More recently, results on continuum limits of graph-cut problems can be found in [27]. Under this setup, we can define the Allen-Cahn scheme on the continuous domain and discuss its properties on suitable function spaces. The spectrally truncated scheme still would not satisfy the maximum principle, but at least the estimates involved would be independent of the number of samples $x_i$, which is also the size of the graph.

Results for Multiclass Classification
The analysis in the previous sections carries over in a straightforward fashion to the multiclass case. Multiclass diffuse interface algorithms on graphs can be found in [15,19,23]. We first state some basic notation. Let K be the number of classes, and N the number of nodes on the graph. We define u to be a real-valued N × K matrix, and obtain the classification results from the matrix u by taking the row-wise maximum: the predicted label of node i is $\arg\max_j u_{ij}$. We think of the matrix u as a vector-valued function on the graph, and denote its rows by u(i).
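A tiny illustration of the row-wise labeling convention (the matrix below is made up):

```python
import numpy as np

# u is an N x K matrix of continuous class scores; the predicted label of
# node i is the argmax of row u(i).
u = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.70, 0.20],
              [0.3, 0.30, 0.40]])
labels = np.argmax(u, axis=1)
# labels is [0, 1, 2]
```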
In [14], a different well function is defined using the $L^1$ norm instead of $L^2$. However, the algorithm in [14] uses a subgradient descent followed by a projection onto the Gibbs simplex.
Since the Gibbs simplex itself is already bounded, this renders the boundedness result trivial, and therefore we only prove the results for the $L^2$ well. Define $\mathcal{W}(u) = \sum_{i=1}^N W(u(i))$. We minimize GL by the semi-implicit scheme (49). The main result we prove is the multiclass analogue of the boundedness result: there exists a graph-independent constant c such that the iterates remain bounded for $0 \le dt \le c$. Since the rows in line 1 of (49) are decoupled, we only have to show that the forward map maps each row of $u^k$ into $[0, 1]^K$ for $0 \le dt \le c$. This is proven in the lemma below.
Proof Given $x \in [0, 1]^K$, denote the components of x by $x_i$, and let $y = F_{dt}(x)$. For each i, a direct computation shows that $y_i \in [0, 1]$ for $0 \le dt \le c$.

Remark 4 Using the same arguments as in previous sections, we can extend the result to incorporate fidelity and also prove monotonicity. We omit these discussions for the sake of brevity.

Numerical Results
In this section, we present a variety of numerical experiments on several different types of datasets. They illustrate our theory and also have implications for the real-world performance of the schemes. In the following subsections, we specify the exact type of graph Laplacian used for each experiment. For all experiments, we initialize $u^0$ randomly from the uniform distribution on $[-1, 1]^N$.

Two Moons
The two moons dataset was used by Buhler et al. [7] in exploring spectral clustering with p-Laplacians. It is constructed by sampling from two half circles of radius one in $\mathbb{R}^2$, centered at (0, 0) and (1, 0.5). Gaussian noise of standard deviation 0.02 in $\mathbb{R}^{100}$ is then added to the data points. The weight matrix is constructed using Zelnik-Manor and Perona's procedure [34]: we set $w_{ij} = e^{-\|x_i - x_j\|^2/\sqrt{\tau_i\tau_j}}$, where $\tau_i$ is the distance from $x_i$ to its Mth closest neighbor. We consider all three Laplacians $L_u$, $L_{rw}$, and $L_s$ in this section, and refer to the figure captions for the type of Laplacian used in each experiment.
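A sketch of this construction; the moon parametrization details and M = 10 are illustrative assumptions, and only the centers, noise level, and weight formula come from the text:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100                                              # points per moon (assumed)
t = rng.uniform(0, np.pi, n)
top = np.stack([np.cos(t), np.sin(t)], axis=1)             # arc centered at (0, 0)
bot = np.stack([1 - np.cos(t), 0.5 - np.sin(t)], axis=1)   # arc centered at (1, 0.5)
X = np.zeros((2 * n, 100))                           # embed in R^100
X[:, :2] = np.vstack([top, bot])
X += 0.02 * rng.standard_normal(X.shape)             # Gaussian noise, sd 0.02

# Self-tuning weights: w_ij = exp(-||x_i - x_j||^2 / sqrt(tau_i tau_j)),
# with tau_i the distance from x_i to its M-th closest neighbor.
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
M = 10
tau = np.sqrt(np.sort(D2, axis=1)[:, M])             # M-th neighbor distance
Wmat = np.exp(-D2 / np.sqrt(tau[:, None] * tau[None, :]))
np.fill_diagonal(Wmat, 0.0)
d = Wmat.sum(axis=1)                                 # degrees, all positive
```

Any of the three Laplacians of Sect. 1 can then be built from `Wmat`.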
In the experiments below, we compute the maximum stepsize dt such that the scheme satisfies an a posteriori criterion reflecting either the boundedness or the monotonicity of the scheme. Namely, the boundedness criterion requires the iterates $u^k$ to remain bounded for all $k \le$ MaxIter, and the monotonicity criterion requires the energy to be non-increasing, $E(u^{k+1}) \le E(u^k)$, for all $k \le$ MaxIter. We set MaxIter = 500, and use bisection to determine the maximum stepsize that satisfies the given criterion.

Figure 2 plots the maximum stepsize such that the graph Allen-Cahn scheme satisfies the boundedness criterion for M = 1, 10, where the graphs are generated from the two moons dataset with N = 20:20:2000. No fidelity terms are added, and we set $\varepsilon = 1$. We perform the experiment for both the random walk Laplacian and the unnormalized Laplacian. We observe empirically that the stepsizes are independent of the graph size N, and that they match the tight and loose bounds nicely.

Figure 3 plots the maximum stepsize dt such that the graph Allen-Cahn scheme satisfies the monotonicity criterion. We plot the results for all three types of Laplacians. On the left, we fix $\varepsilon = 1$ and vary N in the range 20:20:2000. As we can see, the typical maximum stepsize for monotonicity lies between the tight and loose bounds. On the right, we fix N = 2000 and vary $\varepsilon$ in the range 0:0.02:1. We observe empirically that the maximum stepsize dt for the unnormalized Laplacian has an almost linear relation with $\varepsilon$. For the random walk and symmetric Laplacians, the relation is linear for small values of $\varepsilon$, but deviates as $\varepsilon$ grows larger.

Figure 4 (left) plots the maximum stepsize dt that satisfies the monotonicity criterion for the scheme under spectral truncation. The truncation level is set at Neig = 50. The results are compared with the original scheme without spectral truncation, and we see that the maximum stepsizes are roughly in the same range across all sizes of graphs tested in the experiment. We suspect that the effects of varying the truncation level Neig may be hard to observe, as suggested in Fig. 4 (left), and will most likely depend on the specific data set and the graph construction parameters. Due to the length of the paper, we omit discussion of varying the truncation level. Figure 4 (right) plots the effects of adding a quadratic fidelity term with strength parameter c while keeping $\varepsilon = 1$ fixed, for different percentages of randomly sampled fidelity points. We observe empirically that the stepsize dt decays as c increases to a large value, which matches the bound obtained in Proposition 5.
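The bisection search for the maximum admissible stepsize can be sketched as follows. This is a minimal sketch, not the authors' implementation: `criterion` stands for a hypothetical callback that runs the scheme for MaxIter iterations at a given dt and reports whether the chosen a posteriori criterion (boundedness or monotonicity) held, and we assume the criterion holds on an interval $[0, dt^*]$.

```python
def max_stepsize(criterion, dt_lo=0.0, dt_hi=4.0, tol=1e-4):
    """Bisection for the largest dt with criterion(dt) == True,
    assuming the criterion holds for all dt below some threshold."""
    assert criterion(dt_lo), "criterion must hold at the lower endpoint"
    while dt_hi - dt_lo > tol:
        mid = 0.5 * (dt_lo + dt_hi)
        if criterion(mid):
            dt_lo = mid   # criterion holds: threshold is above mid
        else:
            dt_hi = mid   # criterion fails: threshold is below mid
    return dt_lo
```

Each call to `criterion` is a full run of the scheme, so the cost is MaxIter iterations times $\log_2((dt_{hi} - dt_{lo})/\mathrm{tol})$ bisection steps.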

Two Cows
The purpose of this experiment is to study the effects of Nyström extension on the maximum stepsize for monotonicity. Nyström extension is a sampling technique used to approximate eigenvectors without explicitly computing the graph Laplacian [1,12,13]. The technique is very useful since it is often computationally prohibitive to work with the full graph Laplacian when the graph size N is large, which is often the case in image processing applications.
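The flavor of the Nyström idea can be conveyed with a short sketch: sample $m$ landmark points, eigendecompose the small $m \times m$ similarity block, and extend the eigenvectors to all $N$ points through the cross-similarities. This is a generic illustration under our own assumptions (the function name, the plain Gaussian kernel, and uniform landmark sampling), not the exact pipeline of [3], which normalizes the extension to approximate the symmetric graph Laplacian.

```python
import numpy as np

def nystrom_eigs(X, m=50, sigma=1.0, seed=0):
    """Nystrom approximation of eigenpairs of the Gaussian weight matrix W:
    eigendecompose the m x m landmark block, then extend eigenvectors to
    all N points via the N x m cross-similarity block (sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), m, replace=False)          # landmark indices
    d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)  # N x m squared distances
    C = np.exp(-d2 / sigma**2)                          # W[:, landmarks]
    Wmm = C[idx]                                        # m x m landmark block
    lam, U = np.linalg.eigh(Wmm)                        # small eigenproblem
    keep = lam > 1e-10                                  # drop numerically null directions
    V = C @ U[:, keep] / lam[keep]                      # Nystrom extension of eigenvectors
    return lam[keep], V
```

The cost is $O(Nm)$ kernel evaluations plus an $O(m^3)$ eigendecomposition, instead of the $O(N^2)$ (or worse) needed for the full graph.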
The images of the two cows (see Fig. 5) are from the Microsoft Database, and have been used in previous papers for the task of image segmentation [3,23]. The dimensions of the original image are 312×280. We generate 10 images of increasingly smaller size (312/k)×(280/k), k = 1, ..., 10, by resizing the original image to the target dimensions. We use a feature window of size 7×7, and construct a fully connected graph with $w_{ij} = e^{-\|x_i - x_j\|^2/\sigma^2}$, where $x_i$ is the feature vector of pixel $i$ and $\sigma = 1$. We use the symmetric graph Laplacian for this dataset. The eigenvectors are constructed using the Nyström extension, the details of which can be found in [3]. Figure 5 shows two images with k = 1, 5 being segmented under the same stepsize dt = 2, $\varepsilon = 4$. For fidelity, we select a rectangular area of pixels (see blue and red boxes in Fig. 5), and set the fidelity strength to η = 1. Figure 6 plots the maximum stepsize dt for monotonicity versus $N^{-1/2}$, where N is the size of the graph, which equals the number of pixels in the image. To ensure segmentation quality, a smaller $\varepsilon$ had to be chosen for images of lower resolution: we choose $\varepsilon = 4$ for k ≤ 5 and $\varepsilon = 2$ for k > 5.
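Extracting the 7×7 feature windows amounts to stacking each pixel's neighborhood into a feature vector. The sketch below is our own minimal version: the helper name `patch_features` is hypothetical, and handling the image border by reflection is an assumption, not necessarily the choice made in [3,23].

```python
import numpy as np

def patch_features(img, w=7):
    """Stack the w x w neighborhood of each pixel of a grayscale image
    into a feature vector, reflecting the image at the border (sketch)."""
    pad = w // 2
    padded = np.pad(img, pad, mode="reflect")
    H, W = img.shape
    feats = np.empty((H * W, w * w))
    for i in range(H):
        for j in range(W):
            # window centered at pixel (i, j) of the original image
            feats[i * W + j] = padded[i:i + w, j:j + w].ravel()
    return feats
```

Each row of the output is then one node's feature vector $x_i$, from which the Gaussian weights of the fully connected graph are computed.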

MNIST
The purpose of this experiment is to study the stepsize bound for the multiclass graph Allen-Cahn scheme. The MNIST database [22] contains approximately 70,000 28×28 images of handwritten digits from zero to nine. The graph is constructed by first projecting each image onto the 50 principal components obtained through PCA of the entire MNIST dataset. The weights are computed using Zelnik-Manor and Perona's scaling [34] with 50 nearest neighbors.
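This two-step construction (PCA projection, then sparse self-tuning weights on nearest neighbors) can be sketched as below. The function names are hypothetical and the dense distance matrix is for illustration only; a full-scale run on 70,000 images would use an approximate nearest-neighbor search rather than the quadratic computation here.

```python
import numpy as np

def pca_project(X, k=50):
    """Project data (rows of X) onto the top-k principal components,
    computed via SVD of the centered data matrix (sketch)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def knn_zmp_weights(Y, k=50):
    """Zelnik-Manor/Perona weights restricted to each point's k nearest
    neighbors, then symmetrized (sketch)."""
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    order = np.argsort(d2, axis=1)                  # column 0 is the point itself
    tau = np.take_along_axis(np.sqrt(d2), order[:, [k]], axis=1).ravel()  # k-th NN distance
    W = np.zeros_like(d2)
    rows = np.repeat(np.arange(len(Y)), k)
    cols = order[:, 1:k + 1].ravel()                # k nearest neighbors, excluding self
    W[rows, cols] = np.exp(-d2[rows, cols] / np.sqrt(tau[rows] * tau[cols]))
    return np.maximum(W, W.T)                       # symmetrize the kNN graph
```

The symmetrization keeps an edge whenever either endpoint lists the other among its k nearest neighbors.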
We consider subsets of the MNIST dataset by choosing a triplet of digits (e.g. {4, 5, 6}). For each such subset, there are approximately 25,000 images, where each image is a representation of one of the digits in the triplet. We test the maximum stepsizes that satisfy the monotonicity criterion on several such subsets, as shown in Table 1. We set $\varepsilon = 1$, η = 1, and randomly select 5% of the data points as the fidelity set. We also use spectral truncation with 100 eigenvectors to speed up computations. Table 1 shows the maximum stepsizes for various choices of digit triplets, and the classification accuracy for dt = 0.5. We observe that the maximum dt does not change as the choice of triplet varies, and that a good classification accuracy is achieved in all cases under a stepsize close to the maximum allowed for monotonicity.

Discussion
The graph Allen-Cahn scheme has been used to approximate solutions to the graph cut problem. This paper studies the range of stepsizes for which the graph Allen-Cahn scheme converges, in relation to the graph Laplacian and other parameters. In summary, we obtain graph-independent bounds on dt for which the graph Allen-Cahn scheme is bounded and monotone. Moreover, under a mild a posteriori condition, we show the iterates converge to a stationary point of the total energy E. We then prove a similar monotonicity and boundedness result for stepsizes $0 \le dt \le O(N^{-1})$ when spectral truncation is applied. We show via an explicit example that the dependency of the stepsize dt on the number of nodes N is unavoidable in the worst case. We also extend the results to the multiclass Ginzburg-Landau functional using similar techniques as in the binary case.
There are still some very interesting problems left to be explored. One interesting theoretical problem is to generalize the results to other well potentials with different asymptotic growth rates. It may also be worthwhile to explore the dependency of dt on $\varepsilon$ in the spectral truncation analysis, which the paper, for the sake of simplicity, does not address. Another potential problem is the relationship between the stepsize and the accuracy of the classification result. So far this analysis does not attempt to characterize the quality of the extrema reached, but experiments have shown that the classification accuracy does differ under different choices of stepsize.

Let $\varphi = (a_0, \tilde a_0, b_1, \tilde b_1, \ldots, b_N, \tilde b_N)$ be an eigenvector of $L$ with eigenvalue $\lambda > 0$. Since $e = (1, 1, \ldots, 1)$ is an eigenvector of $L$ with eigenvalue 0, we have $\langle \varphi, e \rangle = 0$, i.e. $\sum_i \varphi(i) = 0$.
Define the eigenspace of eigenvalue $\lambda$ as $V_\lambda$. Since the graph is invariant under reflection along the middle and under permutations of the nodes marked with a circle (see Fig. 1), $V_\lambda$ is also invariant under these actions. Namely, define $R(\varphi) = (\tilde a_0, a_0, \tilde b_1, b_1, \ldots, \tilde b_N, b_N)$ and $\sigma(\varphi) = (a_0, \tilde a_0, b_{\sigma(1)}, \tilde b_{\sigma(1)}, \ldots, b_{\sigma(N)}, \tilde b_{\sigma(N)})$, where $\sigma$ is any permutation of $1, \ldots, N$; then $R(\varphi)$ and $\sigma(\varphi)$ are also eigenvectors of $L$ with eigenvalue $\lambda$. Let $\xi_0$ be the average of $\sigma(\varphi)$ over $\sigma \in C(1, N)$, where $C(1, N)$ is the cyclic permutation group on the indices $1, \ldots, N$, and $b^* = (\sum_i b_i)/N$. Then either $\xi_0 \neq 0$, in which case $\xi_0 \in V_\lambda$, or $\xi_0 = (0, 0, \ldots, 0)$. We discuss each case separately. Note that for cases where the potential eigenvector $v$ is already completely determined, e.g. cases 2-4, we can use the definition of an eigenvector,
$$Lv = \lambda v, \qquad (54)$$
to verify whether the candidate is an eigenvector or not.
We continue with the proof of Proposition 15. We will show that, for the particular weights we have chosen, one of the vectors of form 1 in Lemma 9 minimizes the Dirichlet energy (56). First, we define $\chi_1$ to be the minimizer of (56) under the additional constraint $\chi_1 = (a, -a, b, -b, \ldots, b, -b)$. Writing (56) in terms of $a$ and $b$, introducing a Lagrange multiplier $k$ for the constraint, and solving the resulting optimality condition for $k$ in terms of $\gamma$, we find that $\chi_1$ is an eigenvector of $L$ whose eigenvalue $\lambda$ is the smallest non-zero eigenvalue of $L$. Since the eigenvalue 0 has multiplicity one, $\chi_1$ is the "second eigenvector" of $L$.

Proposition 17 Define the set $M$, where $P_m$ is any projection operator onto a subspace, and $0 < C < 1$. Then the volume (with respect to the standard $L^2$ metric in $\mathbb{R}^N$) of the set $M$ decreases exponentially with respect to the number of dimensions $N$.
The proposition shows that if $u$ were sampled uniformly from a unit cube, then the probability of some projection $P_m$ expanding the max norm by a factor of $O(\sqrt{N})$ is exponentially decreasing. Since $v_n$ is the projected direction of $u$, we have $P_m u = \langle u, v_n \rangle v_n$. Since $\langle u_n, v_n \rangle \le 1$, the projected direction $v_n$ must lie in the set $S = \{v \mid \|v\|_2 = 1, \|v\|_\infty \ge C\}$. However, the set $S$ consists of the $N$ "caps" of a unit sphere (see Fig. 7), and hence its volume is exponentially decreasing with respect to the standard metric on the sphere. On the other hand, since $\|v_n\|_\infty \le 1$, by (60) we have $\langle u_n, v_n \rangle \ge C$, and thus $u$ lies in a cone $K(v_n)$ with angle $\cos(\theta) \ge C$. Hence $u \in \bigcup_{v \in S} K(v)$, and since the cones $K(v)$ have exponentially decreasing volume as well, $\mathrm{Vol}(M)$ is exponentially decreasing with respect to $N$.
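The exponential shrinkage of the caps $S = \{v \mid \|v\|_2 = 1, \|v\|_\infty \ge C\}$ can be checked numerically by Monte Carlo. The sketch below is our own illustration (the helper name `cap_fraction` is hypothetical): it estimates the probability that a uniformly random unit vector in $\mathbb{R}^N$ lands in $S$, which should decrease as $N$ grows for fixed $C$.

```python
import numpy as np

def cap_fraction(N, C=0.5, trials=20000, seed=0):
    """Monte Carlo estimate of P(||v||_inf >= C) for v uniform on the
    unit sphere in R^N, i.e. the measure of the spherical caps S (sketch)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((trials, N))
    v = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform on the sphere
    return np.mean(np.abs(v).max(axis=1) >= C)
```

Normalizing a standard Gaussian vector is the standard way to sample uniformly on the sphere; the typical max-norm of such a vector scales like $\sqrt{2\log N / N}$, so for fixed $C$ the estimated fraction drops quickly with $N$.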