Convergence of the thresholding scheme for multi-phase mean-curvature flow

We consider the thresholding scheme, a time discretization for mean curvature flow introduced by Merriman et al. (Diffusion generated motion by mean curvature. Department of Mathematics, University of California, Los Angeles 1992). We prove a convergence result in the multi-phase case. The result establishes convergence towards a weak formulation of mean curvature flow in the BV-framework of sets of finite perimeter. The proof is based on the interpretation of the thresholding scheme as a minimizing movements scheme by Esedoğlu et al. (Commun Pure Appl Math 68(5):808–864, 2015). This interpretation means that the thresholding scheme preserves the structure of (multi-phase) mean curvature flow as a gradient flow w. r. t. the total interfacial energy. More precisely, the thresholding scheme is a minimizing movements scheme for an energy functional that Γ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Gamma $$\end{document}-converges to the total interfacial energy. In this sense, our proof is similar to the convergence results of Almgren et al. (SIAM J Control Optim 31(2):387–438, 1993) and Luckhaus and Sturzenhecker (Calculus Var Partial Differ Equ 3(2):253–271, 1995), which establish convergence of a more academic minimizing movements scheme. Like the one of Luckhaus and Sturzenhecker, ours is a conditional convergence result, which means that we have to assume that the time-integrated energy of the approximation converges to the time-integrated energy of the limit. This is a natural assumption, which however is not ensured by the compactness coming from the basic estimates.


Context
The thresholding scheme, a time discretization for mean curvature flow introduced by Merriman et al. [26], has because of its conceptual and practical simplicity become a very popular scheme, see Algorithm 1 for its definition in a more general context. It has a natural extension from the two-phase case to the multi-phase case with triple junctions in local equilibrium, well-known in case for equal surface tensions since some time [27]. Multi-phase meancurvature flow models the slow relaxation of grain boundaries in polycrystals (called grain growth), where each grain corresponds to a phase. Elsey et al. have shown that (a modification of) the thresholding scheme is practical in handling a large number of grains over time intervals sufficiently large to extract significant statistics of the coarsening (also called aging) of the grain configuration [11][12][13]. In grain growth, the surface tension (and the mobility) of a grain boundary is both dependent on the misorientation between the crystal lattice of the two adjacent grains and on the orientation of its normal. In other words, the surface tension σ i j of an interface is indexed by the pair (i, j) of phases it separates, and is anisotropic. Esedoglu and the second author have shown in [14] the thresholding scheme can be extended to handle the first extension in a very general way, including in particular the most popular Ansatz for a misorientation-dependent grain boundary energy [33]. How to handle general anisotropies in the framework of the thresholding scheme, even in case of two phases, seems not yet to be completely settled, see however [5] and [19]. Hence in this work, we will focus on the isotropic case, ignore mobilities, but make the attempt to be as general as [14] when it comes to the dependence of σ i j on the pair (i, j).
In the two-phase case, the convergence of the thresholding scheme is well-understood: two-phase mean curvature flow satisfies a geometric comparison principle, and it is easy to see that the thresholding scheme preserves this structure. Partial differential equations and geometric motions that allow for a comparison principle can typically be even characterized by comparison with very simple solutions, which opens the way for a definition of a very robust notion of weak solutions, namely what bears the somewhat misleading name of viscosity solutions. If one allows for what the experts know as fattening, two-phase mean-curvature flow is well-posed in this framework [16]. Barles and Georgelin [4] and Evans [15] proved independently that the thresholding scheme converges to mean-curvature flow in this sense. Hence the main novelty of this work is a (conditional) convergence result in the multi-phase case; where clearly a geometric comparison principle is absent. However, the result has also some interest in the two-phase case, since it establishes convergence even in situations where the viscosity solution features fattening. Together with Drew Swartz [22], the first author uses similar arguments to treat another version of mean curvature flow that does not even allow for a comparison principle in the two-phase case, namely volume-preserving mean-curvature flow. They prove (conditioned) convergence of a scheme introduced by Ruuth and Wetton in [35]. We also draw the reader's attention to the recent work of Mugnai et al. [32], where they prove a (conditional) convergence result as in [24] of a modification of the scheme in [2,24] to volume-preserving mean curvature flow. Note that due to the only conditional convergence, our result does not provide a long-time existence result for (weak solutions of) multi-phase mean curvature flow. Short-time existence results of smooth solutions go back to the work of Bronsard and Reitich [7]. Mantegazza et al. [25] and Schnürer et al. [38] were able to construct long-time solutions close to a self-similar singularity.
For the present work, the structural substitute for the comparison principle is the gradient flow structure. Folklore says that mean curvature flow, also in its multi-phase version, is the gradient flow of the total interfacial energy. It is by now well-appreciated that the gradient flow structure also requires fixing a Riemannian structure, that is, an inner product on the tangent space, which here is given by the space of normal velocities. Mean curvature flow is then the gradient flow with respect to the L 2 -inner product, in case of grain growth weighted by grain-boundary-dependent and anisotropic mobilities. Loosely speaking, Brakke's existence proof in the framework of varifolds [6] is based on this structure in the sense that the solution monitors weighted versions of the interfacial energy. Recently, Kim and Tonegawa [21] improved this work by deriving the continuity of the volumes of the grains in the case of grain growth with equal surface tensions which ensures that the solution is non-trivial. Also Ilmanen's convergence proof of the Allen-Cahn equation, a diffuse interface approximation of computational relevance in the world of phase-field models, to mean curvature flow makes use of the gradient flow structure [18]. It was only discovered recently that the thresholding algorithm preserves also this gradient flow structure [14], which in that paper was taken as a guiding principle to extend the scheme to surface tensions σ i j and mobilities that depend on the phase pair (i, j). In this paper, we take the gradient flow structure, which we make more precise in the following paragraphs, as a guiding principle for the convergence proof.
On the abstract level, every gradient flow has a natural discretization in time, which comes in form of a sequence of variational problems: the configuration n at time step n is obtained by minimizing 1 2 dist 2 ( , n−1 ) + h E( ), where n−1 is the configuration at the preceding time step, h is the time-step size and dist denotes the induced distance on the configuration space endowed with the Riemannian structure. In the Euclidean case, the Euler-Lagrange equation (i. e. the first variation) of this variational problem yields the implicit (or backwards) Euler scheme. This variational scheme has been named "minimizing movements" by De Giorgi [10], and has recently gained popularity because it allows to interpret diffusion equations as gradient flows of an entropy functional w. r. t. the Wasserstein metric ( [20], see [3] for the abstract framework)-an otherwise unrelated problem. However, the formal Riemannian structure in case of mean curvature flow is completely degenerate: dist 2 ( ,˜ ) as defined as the infimal energy of curves in configuration space that connect to˜ vanishes identically, cf. [28].
Hence when formulating a minimizing movements scheme in case of mean curvature flow, one has to come up with a proxy for dist 2 ( ,˜ ). This has been independently achieved by Almgren et al. [2] on the one side and Luckhaus and Sturzenhecker [24] on the other side of the Atlantic.
= ∂ and˜ = ∂˜ , 2 ˜ d(x, )dx is one possible substitute for dist 2 ( ,˜ ) in the minimizing movements scheme, where d(x, ) denotes the unsigned distance of the point x to the surface -it is easy to see that to leading order in the proximity of˜ to , this expression behaves as the metric tensor V 2 dx, where V is the normal velocity leading from to˜ in one unit time. Their work makes this point by proving that this minimizing movements scheme converges to mean curvature flow. To be more precise, they consider a time-discrete solution { n } n of the minimizing movement scheme, interpolated as a piecewise constant function h in time and assume that for a subsequence h ↓ 0, the time-dependent sets h converge to in a stronger sense than the given compactness provides. Almgren et al. assume that h (t) converges to (t) in the Hausdorff distance and show that solves the mean curvature flow equation in the above mentioned viscosity sense. The argument was later substantially simplified by Chambolle and Novaga in [9]. Luckhaus and Sturzenhecker start from a weaker convergence assumption than the one in [2]: they assume that for the finite time horizon T under consideration, T 0 | h (t)|dt converges to T 0 | (t)|dt. Then they show that evolves according to a weak formulation of mean curvature flow, using a distributional formulation of mean curvature that is available for sets of finite perimeter, see Definition 1.1 for the multi-phase case of this formulation. Incidentally, weak-strong uniqueness of this formulation seems not to be understood-even in the two-phase case. Those are both only conditional convergence results: While the natural estimates coming from the minimizing movements scheme, namely the uniform boundedness of sup n | n | and n 2 n n+1 d(x, n )dx, are enough to ensure   T 0 | (t)|dt or even the convergence of h (t) to (t) in the Hausdorff distance. Our result will be a conditional convergence result very much in the same sense as the one in [24] but it turns out that our convergence result for the thresholding scheme requires no regularity theory for (almost) minimal surfaces, in contrast to the one of [24] and is therefore not restricted to low spatial dimensions d ≤ 7. Although the time discretization scheme in [2,24] seems rather academic from a computational point of view, it has been adapted for numerical simulations by Chambolle in [8]. Nevertheless, even in that variant, in each step one has to compute a (signed) distance function and solve a convex optimization problem.
Following [14], we now explain in which sense the thresholding scheme may be considered as a minimizing movements scheme for mean curvature flow. Each step in Algorithm 1 is equivalent to minimizing a functional of the form E h (χ)−E h (χ −χ n−1 ), where the functional E h , defined below in (3), is an approximation to the total interfacial energy. It is a little more subtle to see that the second term, −E h (χ n − χ n−1 ), is comparable to the metric tensor V 2 dx. The -convergence of functionals of the type (3) to the area functional has a long history: for the two-phase case, cf. Alberti and Bellettini [1] and Miranda et al. [29]. The multi-phase case, also for arbitrary surface tensions was investigated by Esedoglu and the second author in [14]. Incidentally, it is easy to see that -convergence of the energy functionals is not sufficient for the convergence of the corresponding gradient flows; Sandier and Serfaty [36] have identified sufficient conditions on both the functional and the metric tensor for this to be true.
Identically, the approach of Saye and Sethian [37] for multi-phase evolutions can also be seen as coming from the gradient flow structure when applied to mean-curvature flow with P phases. More precisely, it can be understood as a time splitting of an L 2 -gradient flow with an additional phase whose volume is strongly penalized: the first step is (P + 1)-phase gradient flow w. r. t. the total interfacial energy and the second step is (P + 1)-gradient flow w. r. t. the volume penalization (so geometrical optics leading to the Voronoi construction).

Informal summary of the proof
We now give a summary of the main steps and ideas of the convergence proof. In Sect. 2, we draw consequences from the basic estimate (10) in a minimizing movements scheme, like compactness, Proposition 2.1, coming from a uniform (integrated) modulus of continuity in space, Lemma 2.4, and in time, Lemma 2.5. We also draw the first consequence from the strengthened convergence (8) in the case of equal surface tensions in Proposition 2.2. We strongly advise the reader to familiarize him-or herself with the argument for the modulus of continuity in time, Lemma 2.5, since it is there that the mesoscopic time scale √ h appears for the first time in a simple context before being used in Sect. 4 in a more complex context. In the same vein, the fudge factor α in the mesoscopic time scale α √ h, which will be crucial in Sect. 4, will first be introduced and used in the simple context when estimating the normal velocity V of the limit in Proposition 2.2.
Starting from Sect. 3, we also use the Euler-Lagrange equation (34) of the minimizing movement scheme. By Euler-Lagrange equation we understand the first variation w. r. t. the independent variables, as generated by a test vector field ξ . In Sect. 3, we pass to the limit in the energetic part of the first variation, recovering the mean curvature H via the term H ξ · ν = ∇ · ξ − ν · ∇ξ ν. This amounts to show that under our assumption of strengthened convergence (8), the -convergence of the functionals can be upgraded to a distributional convergence of their first variations, cf. Proposition 3.1. It is a classical result credited to Reshetnyak [34] that under the strengthened convergence of sets of finite perimeter, the measure-theoretic normals and thus the distributional expression for mean curvature also converge. The fact that this convergence of the first variation may also hold when combined with a diffuse interface approximation is known for instance in case of the Ginzburg-Landau approximation of the area functional (also known by the names of Modica and Mortola, who established this -convergence [30,31]), see [23]. In our case the convergence of the first variations relies on a localization of the ingredients for the -convergence worked out in [14], like the consistency, i. e. pointwise convergence of these functionals. Section 4 constitutes the central and, as we believe, most innovative piece of the paper; we pass to the limit in the dissipation/metric part of the first variation, recovering the normal velocity V via the term V ξ · ν. In fact, we think of the test-field ξ as localizing this expression in time and space, and recover the desired limiting expression only up to an error that measures how well the limiting configuration can be approximated by a configuration with only two phases and a flat interface in the space-time patch under consideration; this is measured both in terms of area (leading to a multi-phase excess in the language of the regularity theory of minimal surfaces) and volume, see Proposition 4.1. The main difficulty of recovering the metric term V ξ · ν in comparison to recovering the distributional form ∇ · ξ − ν · ∇ξ ν of the energetic term is that one has to recover both the normal velocity V , which is distributionally characterized by ∂ t χ − V |∇χ| = 0 on the level of the characteristic function χ, and the (spatial) normal ν. In short: one has to pass to the limit in a product. More precisely, the main difficulty is that there is no good bound on the discrete normal velocity V at hand on the level of the microscopic time scale h; only on the level of the above-mentioned mesoscopic time scale √ h, such an estimate is available. This comes from the fact that the basic estimate yields control of the time derivative of the characteristic function χ only when mollified on the spatial scale √ h in u = G h * χ. The main technical ingredient to overcome this lack of control in Proposition 4.1 is presented in Lemma 4.2 in the two-phase case and in Lemma 4.5 in the general setting: if one of the two (spatial) functions u,ũ is not too far from being strictly monotone in a given direction (a consequence of the control of the tilt excess, see Lemma 4.4), then the spatial L 1 -difference between the level sets χ = {u > 1 2 } andχ = {ũ > 1 2 } is controlled by the squared L 2 -difference between u andũ. In Sect. 5, we combine the results of the previous two sections yielding the weak formulation of V = H on some space-time patch up to an error expressed in terms of the above mentioned (multi-phase) tilt excess of the limit on that patch. Complete localization in time and partition of unity in space allows us to assemble this to obtain V = H globally, up to an error expressed by the time integral of the sum of the tilt excess over the spatial patches of finite overlap. De Giorgi's structure theorem for sets of finite perimeter (cf. Theorem 4.4 in [17]), adapted to a multi-phase situation but just used for a fixed time slice, implies that the error expression can be made arbitrarily small by sending the length scale of the spatial patches to zero.

Notation
We denote by the Gaussian kernel of variance h. Note that G 2t (z) is the fundamental solution to the heat equation and thus We recall some basic properties, such as the normalization, non-negativity, boundedness and the factorization property: where G 1 denotes the 1-dimensional and G d−1 the (d − 1)-dimensional Gaussian kernel; let us also mention the semi-group property Throughout the paper, we will work on the flat torus [0, ) d . The thresholding scheme for multiple phases, introduced in [14], for arbitrary surface tensions σ i j and mobilities μ i j = 1/σ i j is the following. 1. Convolution step: 2. Thresholding step: We will denote the characteristic functions of the phases n i at the nth time step by χ n i and interpolate these functions piecewise constantly in time, i. e.
As in [14], we define the approximate energies for admissible measurable functions: Here and in the sequel dx stands short for [0, ) d dx, whereas dz stands short for R d dz.
The minimal assumption on the matrix of surface tensions {σ i j }, next to the obvious is the following triangle inequality It is known that (e. g. [14]), under the conditions above, these energies -converge w. r. t. the L 1 -topology to the optimal partition energy given by for admissible χ: The constant c 0 is given by For our purpose we ask the matrix of surface tensions σ to satisfy a strict triangle inequality: We recall the minimizing movements interpretation from [14] which is easy to check. The combination of convolution and thesholding step in Algorithm 1 is equivalent to solving the following minimization problem where χ runs over (4). The proof will mostly be based on the interpretation (5) and only once uses the original form (1) and (2) in Lemmas 4.2 and 4.4, respectively. Following [14], we will additionally assume that σ is conditionally negative-definite, i. e.
where σ > 0 is a constant. That means, that σ is negative as a bilinear form on (1, . . . , 1) ⊥ . This ensures that −E h (χ − χ n−1 ) in (5) is non-negative and penalizes the distance to the previous step.
In the following we write A B to express that A ≤ C B for a (possibly large) generic constant C < ∞ that only depends on the dimension d, the total number of phases P and on the matrix of surface tensions σ through σ min = min i = j σ i j , σ max = max σ i j , σ and min{σ ik + σ k j − σ i j : i, j, k pairwise different}. Furthermore, we say a statement holds for A B if the statement holds for A ≤ 1 C B for some generic constant C < ∞ as above.

Main result
The definition of our weak notion of mean-curvature flow is a distributional formulation which is suited to the framework of functions of bounded variation. Definition 1.1 (Motion by mean curvature) Fix some finite time horizon T < ∞, a matrix of surface tensions σ as above and initial data χ 0 : We say that the network and which are normal velocities in the sense that for Note that (7) also encodes the initial conditions as well as (6) encodes the Herring angle condition. Indeed, for a smooth evolution, since for any interface we have where = ∂ , b denotes the conormal and H the mean curvature of , we do not only obtain the equation along the smooth parts of the interfaces but also the Herring angle condition at triple junctions. If three phases 1 , 2 and 3 meet at a point x, then we have In terms of the opening angles θ 1 , θ 2 and θ 3 at the junction, this condition reads so that the opening angles at triple junctions are determined by the surface tensions.

Remark 1.2
To prove the convergence of the scheme, we will need the following convergence assumption: This assumption makes sure that there is no loss of area in the limit h → 0 as in Fig. 1.

Theorem 1.3
Let P ∈ N, let the matrix of surface tensions σ satisfy the strict triangle inequality and be conditionally negative-definite, T < ∞ be a finite time horizon and let χ 0 be given with E(χ 0 ) < ∞. Then for any sequence there exists a subsequence h ↓ 0 and a The mesoscopic time scale arises naturally from the scheme: due to the parabolic scaling, the microscopic time scale h corresponds to the length scale √ h as can be seen from the kernel G h . Since for a smooth evolution, the normal velocity V is of order 1, this prompts the mesoscopic time scale √ h. The parameter α will be kept fixed most of the time until the very end, where we send α → 0. Therefore, it is natural to think of α ∼ 1, but small. These three time scales go hand in hand with the following numbers, which we will for simplicity assume to be natural numbers throughout the proof: The following simple identities linking these different parameters will be used frequently:

Compactness
In this section we prove the compactness of the approximate solutions, construct the normal velocities and derive bounds on these velocities. In the first subsection we present all results of this section; the proofs can be found in the subsequent subsection.

Results
The first main result of this section is the following compactness statement.
The second main result of this section is the following construction of the normal velocities and the square-integrability under the convergence assumption (8).

Proposition 2.2
If the convergence assumption (8) holds, the limit χ = lim h→0 χ h has the following properties.
Both results essentially stem from the following basic estimate, a direct consequence of the minimizing movements interpretation (5).

Lemma 2.3 (Energy-dissipation estimate) The approximate solutions satisfy
√ −E h defines a norm on the process space {ω : [0, ) d → R P | i ω i = 0}. In particular, the algorithm dissipates energy.
In order to prove Proposition 2.1 we derive estimates on time-and space-variations of the approximations only using the basic estimate (10).
Variations in time are controlled by the following lemma coming from interpolating the (unbalanced) estimate (10) on time scales of order √ h.

Lemma 2.5 (Almost BV in time)
The approximate solutions satisfy for any τ > 0.
Let us also mention that with the same methods we can prove C In particular, For the proof of the second main result of this section, Proposition 2.2, and also for later use in Sect. 4 it is useful to define certain measures which are induced by the metric term. These measures allow us to localize the result of Lemma 2.5. In the two-phase case this is enough to prove that the measure ∂ t χ is absolutely continuous w. r. t. the perimeter and the existence and integrability of the normal velocity, cf. (i) and (ii) of Proposition 2.2. The square-integrability follows then from a refinement of these estimates by localizing the fudge factor α (cf. Remark 1.5) after passage to the limit h → 0. where By the monotonicity of h → G h * u L 2 and the energy-dissipation estimate (10), we have and μ h μ after passage to a further subsequence for some finite, non-negative measure μ In order to prove Proposition 2.2 also in the multi-phase case we have to ensure that the convergence assumption implies the convergence of the individual interfacial areas

Fig. 3
As h → 0, the two interfaces h 12 and h 23 merge into one interface, 13 , between Phases 1 and 3. Therefore the measure of | h 13 | jumps up in the limit h → 0 although the total interfacial energy converges due to the choice of surface tensions Lemma 2.8 (Implications of convergence assumption) The convergence assumption (8) ensures that for any pair i = j and any as h → 0.
The proof of Lemma 2.8 heavily relies on the fact that σ satisfies the strict triangle inequality so that we can preserve the triangle inequality after perturbing the energy functional. The following example shows that this is not a technical assumption but is a necessary condition for the lemma to hold and thus plays a crucial role in identifying the normal velocities V i .

Example 2.9
To fix ideas let us consider three sets 1 , 2 and 3 in dimension d = 2 with surface tensions σ 12 = σ 23 = 1, σ 13 = 2 as illustrated in Fig. 3. Then, the total energy is constant in h and due to the choice of the surface tensions the convergence assumption is fulfilled. Nevertheless, we clearly have This example also illustrates that although the energy functional E is lower semi-continuous, the individual interfacial energies 1

Proofs
Before proving the statements of this section we cite two results of [14] which will be used frequently in the proofs. The following monotonicity statement is a key tool for the -convergence in [14]. We will use it throughout our proofs but we seem not to rely heavily on it.
Another important tool for the -convergence in [14] is the following consistency, or pointwise convergence of the functionals E h to E, which we will refine in Sect. 3.

Lemma 2.11 (Consistency) For any admissible χ ∈ BV , we have
Taking the limit h → 0 in (18) with χ = χ 0 and using (19), we see that that the interfacial energy E 0 of the initial data χ(0) ≡ χ 0 bounds the approximate energy of the initial data: We first prove Proposition 2.1 which follows directly from the estimates in Lemmas 2.4 and 2.5. Then we give the proofs of the Lemmas used for Proposition 2.1. We present the proof of Proposition 2.2 at the end of this section since the proof heavily relies on the techniques developed in the proofs of the lemmas, especially in Lemma 2.5.
Proof of Proposition 2.1 The proof is an adaptation of the Riesz-Kolmogorov L p -compactness theorem. By Lemmas 2.4 and 2.5, we have for any δ, τ > 0 and e ∈ S d−1 . For δ > 0 consider the mollifier ϕ δ given by the scaling Hence, on the one hand, the mollified functions are equicontinuous and by Arzelà-Ascoli where the balls B (u i ) are given w. r. t. the C 0 -norm. On the other hand, for any function χ we have Using this for χ h and plugging in (20) yields Then set := ρ T d and find u 1 , . . . , u n from above. Note that only finitely many of the elements in the sequence {h} are greater than h 0 . Therefore, is a finite covering of balls (w. r. t. L 1 -norm) of given radius ρ > 0. Therefore, {χ h } h is precompact and hence relatively compact in L 1 . Hence we can extract a converging subsequence. After passing to another subsequence, we can w. l. o. g. assume that we also have pointwise convergence almost everywhere in (0,

Proof of Lemma 2.3 By the minimality condition (5), we have in particular
. Then (10) follows from the short argument after Lemma 2.11. We claim that the pairing − 1 dx defines a scalar product on the process space. It is bilinear and symmetric thanks to the symmetry of σ and G h . Since σ is conditionally negative-definite, Furthermore, we have equality only if ω ≡ 0. Thus, √ −E h is the induced norm on the process space.

Proof of Lemma 2.4
Step 1 We claim that Indeed, for any characteristic function χ : and thus by symmetry of G 2h : Applying this on χ h i , summing over i = 1, . . . , P, using where we used the approximate monotonicity of E h , cf. Lemma 2.10. Using the energydissipation estimate (10), we have and integration in time yields (21).
Step 2 By (21) and Hadamard's trick, we have on the one hand Using the translation invariance and (22) for the components of χ h , we have Proof of Lemma 2.5 In this proof, we make use of the mesoscopic time scale τ = α √ h, see Remark 1.5 for the notation. First we argue that it is enough to prove for α ∈ [1,2]. If α ∈ (0, 1), we can apply (23) twice, once for τ = √ h and once for τ = (1 + α) √ h and obtain (12). If α > 2, we can iterate (23). Thus we may assume that α ∈ [1,2]. We have for all these k's. Hence we may assume w. l. o. g. that k = 0 and prove only Note that for any two characteristic functions χ,χ we have Now we post-process the energy-dissipation estimate (10). Using the triangle inequality for the norm √ −E h on the process space and Jensen's inequality, we have Using (25) for χ Kl i and χ K(l−1) i with (22) for the second and the third right-hand side term and the conditional negativity of σ and the above inequality for the first right-hand side term we obtain which establishes (24) and thus concludes the proof.
Proof of Lemma 2.6 First note that (14) follows directly from (13) since we also have χ h (t) → χ(t) in L 1 for almost every t. The argument for (13) comes in two steps. Let s > t, τ := s − t and t ∈ [nh, (n + 1)h).
Step 1 Let τ be a multiple of h. We may assume w. l. o. g. that τ = m 2 h for some m ∈ N. As in the proof of Lemma 2.5, using (25) and (26) we derive As before, we sum these estimates: Step We prove the statement in three steps. In the first step we reduce the statement to a time-independent one. In the second step, we show that due to the strict triangle inequality, the convergence of the energies implies the convergence of the individual perimeters. In the third step, we conclude by showing that this convergence still holds true if we localize with a test function ζ , which proves the time-independent statement formulated in the first step.
Step 1: reduction to a time-independent problem It is enough to prove that We further claim that for a subsequence Then Lebesgue's dominated convergence, cf. (10), and the convergence assumption (8) yield and thus (28) after passage to a subsequence. Therefore, we can apply (27) for a. e. t and the time-dependent version follows from the time-independent one by Lebesgue's dominated convergence theorem and (10).
Step 2: convergence of perimeters We claim that given , the individual perimeters converge in the following sense: we have where F h and F are the two-phase analogues of the (approximate) energies: We will prove this claim by perturbing the functional E h . We recall that the functionals F h -converge to F (see e. g. [29] or [14]). Since the argument for the three cases work in the same way, we restrict ourself to the first case, Since the matrix of surface tensions σ satisfies the strict triangle inequality, we can perturb the functionals E h in the following way: for sufficiently small > 0, the associated surface tensions for the functional satisfy the triangle inequality so that approximate monotonicity, Lemma 2.10, and consistency, Lemma 2.11, still apply. Therefore, by Lemma 2.10, we have for By assumption, the left-hand side converges to E(χ).
is clearly a continuous functional on L 2 , the first right-hand side term converges as h → 0. Thus, for any h 0 > 0, As h 0 → 0, Lemma 2.11 yields By the -convergence we also have and thus the convergence Step 3: conclusion We claim that given (27). We will not prove (27) directly but prove that for any for the localized functionals instead. This is indeed sufficient since for any χ 1 , χ 2 , we clearly have and (29) therefore implies (27). Now we give the argument for (29). As before, we only prove one of the statements, . For this we use two lemmas that we will prove in Sect. 3. First, by applying Lemma 3.6, which is the localized version of Lemma 2.11, we have for → 0 and thus conclude the proof. Let us mention that one can also follow a different line of proof by localizing the monotonicity statement of Lemma 2.10 with a test function ζ . Since Lemma 3.7 seems more robust, we only prove the statement in this fashion.
Proof of Proposition 2.2 We make use of the mesoscopic time scale τ , see Remark 1.5 for the notation.
In this part we choose α = 1. Using the notation ∂ τ ζ for the discrete time derivative , by the smoothness of ζ , Since suppζ is compact, by Lemma 2.5 we have for any α > 0 and any ζ ∈ We fix ζ and by linearity we may assume that ζ ≥ 0 if we prove the inequality with absolute values on the left-hand side. We use the identity from above Setting to be the time average over a microscopic time interval, we have For simplicity, we will ignore k at first. We can argue as in the proof of Lemma 2.5, here with the localization ζ : By (22) we have for any with F h as in (30) and furthermore Therefore, using (25) we obtain where the last right-hand side term vanishes as h ↓ 0 by (10). For the first right-hand side term we note that for any ζ ∈ C ∞ ([0, ) d ) and any χ,χ ∈ {0, 1} we have so that we can replace the first right-hand side term by up to an error that vanishes as h ↓ 0, due to the above calculation and e. g. Lemma 2.6. As in (26) for −E h , now for this localized version, we can use the triangle inquality and Jensen's inequality to bound this term by as h ↓ 0, where μ h is the (approximate) dissipation measure defined in (15). Therefore we have as h ↓ 0. Taking the mean over the k's we obtain Passing to the limit h → 0, (17), which is guaranteed by the convergence assumption (8), implies (31).
If we take ζ ∈ C ∞ 0 (U ), the first term on the right-hand side of (31) vanishes and therefore Since the left-hand side does not depend on α, we have Taking the supremum over all Thus, ∂ t χ i is absolutely continuous w. r. t. |∇χ i | dt and the Radon-Nikodym theorem completes the proof. Argument for (iii) We refine the estimate in the argument for (ii). Instead of estimating the right-hand side of (31) and optimizing afterwards, which leads to a weak L 2 -bounds, we localize. Starting from (31), we notice that we can localize with the test function ζ . Thus, we can post-process the estimate and obtain and some constant C < ∞ which depends only on the dimension d, the number of phases P and the matrix of surface tensions σ . Now choose where we set α := 1 if V i = 0, in which case all other integrands vanish. Then, the first term on the right-hand side can be absorbed in the left-hand side and we obtain

Energy functional and curvature
It is a classical result by Reshetnyak [34] that the convergence χ h → χ in L 1 and imply convergence of the first variation A result by Luckhaus and Modica [23] shows that this may extend to a -convergence situation, namely in case of the Ginzburg-Landau functional We show that this also extends to our -converging functionals E h . Let us first address why the first variation of the approximate energies is of interest in view of our minimizing movements scheme. We recall (5): the approximate solution χ n at time nh minimizes The natural variations of such a minimization problem are inner variations, i. e. variations of the independent variable. Given a vector field and an admissible χ, we define the deformation χ s of χ along ξ by the distributional equation which means that the phases are deformed by the flow generated through ξ . The inner variation δ E h of the energy E h at χ along the vector field ξ is then given by For an admissibleχ the inner variation of the metric term −E h (χ −χ) is given by The (chosen and not necessarily unique) minimizer χ n in Algorithm 1 therefore satisfies the Euler-Lagrange equation

Results
The goal of this section is to prove the following statement about the convergence of the first term in the Euler-Lagrange equation.
and furthermore assume that Then, for any It is easy to reduce the statement to the following time-independent statement.
and furthermore assume that Then, for any Remark 3.3 Proposition 3.2 and all other statements in this section hold also in a more general context. We do not need the approximations χ h to be characteristic functions. In fact the statements hold for any sequence u h : The following first lemma brings the first variation δ E h of E h into a more convenient form, up to an error vanishing as h → 0 because of the smoothness of ξ . Already at this stage one can see the structure We have already seen in Lemma 2.8 that we can pass to the limit in the term involving only the kernel G h I d: where now ζ = ∇ · ξ . The next proposition shows that we can also pass to the limit in the term involving the second derivatives h∇ 2 G h of the kernel, which yields the projection ν ⊗ ν onto the normal direction in the limit.

Proposition 3.5
Let χ h , χ satisfy the convergence assumptions (37) and (38). Then for any The following two statements are used to prove Proposition 3.5. The following lemma yields in particular the construction part in the -convergence result of E h to E. We need it in a localized form; the proof closely follows the proof of Lemma 4 in Section 7.2 of [14].
The next lemma shows that under our convergence assumption of χ h to χ, the corresponding spatial covariance functions f h and f are very close and allows us to pass from Lemmas 3.4 and 3.6 to Proposition 3.2.

Lemma 3.7 (Error estimate)
Let χ h , χ satisfy the convergence assumptions (37) and (38) and let k be a non-negative kernel such that for some polynomial p. Then where

Proofs
Proof of Proposition 3.  (10) that Applying Proposition 3.5 for the kernel ∇ 2 G with ∇ξ playing the role of the matrix field A and Lemma 2.8 for the kernel G with ζ = ∇ · ξ , we can conclude the proof.
Proof of Lemma 3.4 Recall the definition of δ E h in (32). Since −∇χ · ξ = −∇ · (χ ξ) + χ (∇ · ξ ) for any functionχ : [0, ) d → R, we can rewrite the integral on the right-hand side of (32): Let us first turn to the first right-hand side term. For fixed (i, j), we can collect the two terms in the sum that belong to the interface between phases i and j and obtain by the antisymmetry of the kernel ∇G h that the resulting term with the prefactor A Taylor expansion of ξ around x gives the first-order term Now we argue that the second-order term is controlled by , the contribution of the second-order term is controlled by Using the approximate monotonicity (18) of E h , we have suitable control over this term.
After distributing the first-order term on both summand (i, j) and ( j, i) we therefore have Proof of Proposition 3.5 By Lemma 3.6 we know that the term converges if we take χ instead of the approximation χ h on the left-hand side of the statement. Lemma 3.7 in turn controls the error by substituting χ h by χ on the left-hand side.
Proof of Lemma 3.6 Our main focus in this proof lies on the anisotropic kernel ∇ 2 G. The statement for G is-up to the localization-already contained in the proof of Lemma 4 in Section 7.2 of [14].
Step 1: reduction of the statement to a simpler kernel Since ∇ 2 G(z) is a symmetric matrix, the inner product depends only the symmetric part A sym of A; hence w. l. o. g. let A be a symmetric matrix field. But then there exist functions We also note e i ⊗ e j + e j ⊗ e i = e i + e j ⊗ e i + e j − e i ⊗ e i + e j ⊗ e j .
Hence by linearity it is enough to prove the statement for A of the form for some ξ ∈ S d−1 . By rotational invariance we may assume Hence the statement can be reduced to for any ζ ∈ C ∞ ([0, ) d ). In the following we will show that for any such ζ and χ, and for the anisotropic kernel k(z) = z 2 1 G(z) we have The analogous statement for the Gaussian kernel G instead of the anisotropic kernel k isup to the localization with ζ -contained in [14]. In that case the right-hand side of (43) turns into the localized energy, i.e. replacing the anisotropic term (ν 2 1 + 1) by 1. Since it is indeed sufficient to prove (43). We will prove this in five steps. Before starting, we introduce spherical coordinates z = r ξ on the left-hand side: In the following two steps of the proof, we simplify the problem by disintegrating in r (Step 2) and ξ (Step 3). Then we explicitly calculate an integral that arises in the second reduction and which translates the anisotropy of the kernel k into a geometric information about the normal (Step 4). We simplify further by disintegration in the vertical component (Step 5) and conclude by solving the one-dimensional problem (Step 6).
Step 2: disintegration in r We claim that it is sufficient to show Indeed, note that since G(z) = G(|z|) and d dr G(r ) = −r G(r ) we have, using integration by parts, Replacing √ h by √ h r on the left-hand side of (45) and integrating w. r. t. the non-negative measure G(r )r d dr and using the equality from above shows that (45), in view of (44), formally implies (43). To make this step rigorous, we use Lebesgue's dominated convergence theorem. A dominating function can be obtained as follows: which is finite and independent of r . Hence, it is integrable w. r. t. the finite measure G(r )r d+2 dr.
Step 3: disintegration in ξ We claim that it is sufficient to show that for each ξ ∈ S d−1 , Indeed, if we integrate w. r. t. the non-negative measure 1 2 ξ 2 1 dξ we obtain the left-hand side of (45) from the left-hand side of (46). At least formally, this is obvious because of the symmetry under ξ → −ξ . The dominating function to interchange limit and integration is obtained as in Step 1: For the passage from the right-hand side of (46) to the right-hand side of (45) we note that since and |ν| = 1 |∇χ|a. e. it is enough to prove to obtain the equality for the right-hand side.
Step 6: argument for (49) Since χ,χ are {0, 1}-valued, every jump has height 1 and since χ,χ ∈ BV ([0, )), the total number of jumps is finite. Let J,J ⊂ [0, ) denote the jump sets of χ andχ , respectively. Now, if √ h is smaller than the minimal distance between two different points in J ∪J , then in view of (48), the only contribution to the left-hand side of (49) comes from neighborhoods of points where both, χ andχ , jump:

on each of these intervals and that
for intervals of the form Note that by (48), χ +χ jumps precisely where either χ orχ jumps. Thus Therefore, (49) holds, which concludes the proof.
Proof of Lemma 3.7 The proof is divided into two steps. First, we prove the claim for k = G, to generalize this result for arbitrary kernels k in the second step.
Step 1: k = G By Lemma 3.6 and the convergence assumption (38), we already know Hence, it is sufficient to show that Fix h 0 > 0 and N ∈ N and set h := 1 N 2 h 0 . We will make use of the following triangle inequality for f (h) = f, f h : This inequality has been proven in the proof of Lemma 3 in Section 7.1 of [14]. For the convenience of the reader we reproduce the argument here: using the admissibility of χ in the form of k χ k = 1, we obtain the following identity for any pair 1 ≤ i, j ≤ P of phases and any points x, x , x ∈ [0, ) d : Note that the contribution of k ∈ {i, j} to the sum has a sign: k∈{i, j} We now fix z, w ∈ R d and use the above inequality for x = x + z, x = x + z + w so that after multiplication with σ i j , summation over 1 ≤ i, j ≤ P and integration over x, we obtain on the left-hand side. Indeed, using the translation invariance for the term appearing in f ζ (w), we have Using the triangle inequality for the surface tensions, we see that the first right-hand side integral is non-positive: Indeed, the first and the third term, and the second and the last term cancel since the domain of indices in the sums is symmetric and thus we have (50). By iterating the triangle inequality (50) for Hence, by the definition of h, Therefore, using (51) for f h , the subadditivity of u → u + and finally (51) for f , we obtain Integrating w. r. t. the positive measure G(z) dz yields Given δ > 0, by Lemma 3.6 we may first choose h 0 > 0 such that for all 0 < h < h 0 : We note that we may now choose N ∈ N so large that for all 0 < h < 1 N 2 h 0 : Indeed, using the triangle inequality and translation invariance we have which tends to zero as h → 0 because by Lebesgue's dominated convergence and (37).
Hence also the second term on the right-hand side of (52) is small: Step 2: k = p G Fix > 0. Since G is exponentially decaying, we can find a number Hence we can split the integral into two parts. On the one hand, using (40) for k = G, and on the other hand, using (53) and the approximate monotonicity in Lemma 2.10, By the convergence assumption (38) and the consistency, cf. Lemma 2.11, we can take the limit h → 0 on the right-hand side and obtain lim sup Since the left-hand side does not depend on > 0, this implies (40).

Dissipation functional and velocity
As for any minimizing movements scheme, the time derivative of the solution should arise from the metric term in the minimization scheme. For the minimizing movements scheme of our interfacial motion, the time derivative is the normal velocity. The goal of this section, which is the core of the paper, is to compare the first variation of the dissipation functional to the normal velocity.

Idea of the proof
Let us first give an idea of the proof in a simplified setting with only two phases, a constant test vector field ξ and no localization. Then the first variation (33) of the metric term reads Using the distributional equation ∇χ · ξ = ∇ · (χξ ) − (∇ · ξ ) χ, this is equal to as h → 0. We will prove this in Lemma 4.7. Since ∂ −h t χ h = χ n −χ n−1 h V |∇χ| dt and √ h∇G h * χ n ≈ c 0 ν only in a weak sense, we cannot pass to the limit a priori. Our strategy is to freeze the normal and to control by the excess where χ * = 1 {x·ν * >λ} is a half space in direction of ν * . By the convergence assumption ε 2 converges to as h → 0, which is small by De Giorgi's structure theorem-at least after localization in space and time; i. e. sets of finite perimeter have (approximate) tangent planes almost everywhere.
To be self-consistent we will prove this application of De Giorgi's result in Sect. 5. The main difficulty in controlling (54) lies in finding good bounds on For the sake of simplicity we set E 0 = T = = 1 and write χ instead of χ h in the following. In Sect. 2 we have seen the bound For this, we used the energy-dissipation estimate (10) to bound the dissipation and Jensen's inequality gave us control over the function by the fudge factor α appearing in the definition of the mesoscopic time scale τ = α √ h: This estimate is the reason for the slight abuse of notation: we call the function in (56) α 2 (t) in order to keep the relation (57) between the two quantities in mind. In the following we will always carry along the argument t of the function α 2 (t) to make the difference clear. Writing χ τ short for χ( · + τ ) we have shown in the proof of Lemma 2.5 that (55) holds in the more precise form of In this section we will derive the following more subtle bound: While the argument for (55) was based on we now start from the thresholding scheme: We will use an elementary one-dimensional estimate, Lemma 4.2 (cf. Corollary 79 for this rescaled version), in direction ν * = e 1 (w. l. o. g.) and integrate transversally to obtain The first right-hand side term measures the monotonicity of the phase function u in normal direction in the transition zone { 1 3 ≤ u ≤ 2 3 }. It is clear that this term vanishes for χ −h = χ * , provided the universal constant c > 0 is sufficiently small. In Lemma 4.4 we will indeed bound this term by the excess at the previous time step. Compared to the first approach which yielded (58), where the limiting factor is that the first right-hand side term is only O( √ h), the result of the latter approach yields the improvement for an arbitrary (small) parameter s > 0. Now we show how to use the bound (61) in order to estimate (54). First, in Lemma 4.7 by freezing time for χ on the mesoscopic time scale τ = α √ h and using a telescoping sum for the first term ∂ h t χ we will show that By (61) the error term is controlled by by choosing s ∼ α 2 3 . Second, in Lemma 4.8 we will show how to use the algebraic relation (χ τ − χ)(χ τ + χ) = χ τ − χ for the product (χ τ − χ) √ h∇G h * (χ τ + χ) so that we can rewrite the right-hand side of (62) as for some kernel k. Third, in Lemma 4.9 we will control the first error term by using its quadratic structure and the estimate (61) before the transversal integration in x : Fig. 4 The majority phases 1 and 2 and the half space * = {x · ν * > λ} approximating 1 inside the ball B 2r . Its complement * c approximates 2 inside B 2r by choosings ∼ α 2 3 and s ∼ α 4 9 . We note that the values of the exponents of α in (63) and (65) do not play any role and can be easily improved. We only need the extra terms, here α 1 3 and α 1 9 , to be o(1) as α → 0; the prefactor of the excess ε 2 , here 1 α , can be large. Indeed, after sending h → 0 we will obtain the error 1 α E 2 + α 1 9 . We will handle this term in Sect. 5 by first sending the fineness of the localization to zero so that E 2 vanishes, and then sending the parameter α → 0.
In the following we will make the above steps rigorous and give a full proof in the multiphase case. First we state the main result, Proposition 4.1, then we explain the tools we will be using more carefully in the subsequent lemmas. We turn first to the two-phase case to present the one-dimensional estimate (60) in Lemma 4.2, its rescaled and localized version Corollary 4.3 and the estimate for the error term Lemma 4.4. Subsequently we state the same results in Lemma 4.5 and Corollary 4.6 for the multi-phase case. These estimates are the core of the proof of Proposition 4.1 and use the explicit structure of the scheme. Let us note that in these estimates we are using the two steps of the scheme, the convolution step (1) and the thresholding step (2), in a well-separated way. Indeed, the one-dimensional estimate, Lemma 4.5, analyzes the thresholding step (2); and Corollary 4.6 brings the (transversally integrated) error term in the form of the excess ε 2 at the previous time step by analyzing the convolution step (1).

Results
The main result of this section is the following proposition which will be used for small time intervals in Sect. 5 where we will control the limiting error terms which appear here with soft arguments from geometric measure theory. In view of the definition of E 2 below, the proposition assumes that χ 3 , . . . , χ P are the minority phases in the space-time cylinder (0, T ) × B r (Fig. 4); likewise it assumes that the normal between χ 1 and χ 2 is close to the first unit vector e 1 . This can be assumed since on the one hand we can relabel the phases in case we want to treat another pair of phases as the majority phases. On the other hand, due to the rotational invariance, it is no restriction to assume that e 1 is the approximate normal.
The exponents of α in this statement are of no importance and can be easily improved. It is only relevant that the two extra error terms, i. e. r d−1 T and η dμ, are equipped with prefactors which vanish as α → 0. In Sect. 5 we will show that-even after summationthe excess will vanish as the fineness of the localization, i. e. the radius r of the ball in the statement of Proposition 4.1 tends to zero. There we will take first the limit r → 0 and then α → 0 to prove Theorem 1.3. The prefactor of the excess, here 1 α 2 differs from the one in the two-phase case since the one-dimensional estimate is slightly different in the multi-phase case.
Let us comment on the structure of E 2 . The first term, describing the surface area of Phases 3, . . . , P inside the ball B 2r , will be small in the application when χ 3 , . . . , χ P are indeed the minority phases. The second term, sometimes called the excess energy describes how far χ 1 and χ 2 are away from being half spaces in direction e 1 or −e 1 , respectively. The terms comparing the surface energy inside B 2r do not see the orientation of the normal, whereas the bulk terms measuring the L 1 -distance inside the ball B 2r do see the orientation of the normal.
The estimates in Sect. 2 are not sufficient to understand the link between the first variation of the metric term and the normal velocities. For this, we need refined estimates which we will first present for the two-phase case, where only one interface evolves. The main tool of the proof is the following one-dimensional lemma. For two functions u,ũ, it estimates the L 1 -distance between the characteristic functions χ = 1 {u≥ 1 2 } andχ = 1 {ũ≥ 1 2 } in terms of the L 2 -distance between the u's -at the expense of a term that measures the strict monotonicity of one of the functions u. We will apply it in a rescaled version for x 1 being the normal direction.
for every s > 0.
The following modified version of Lemma 4.2 is the estimate one would use in the twophase case.
for any s > 0.
In the previous corollary, it was crucial to control strict monotonicity of one of the two functions via the term In the following lemma, we consider the d-dimensional version, i. e. dx 1 replaced by dx, of this term in case of u = G h * χ. We show that this term can be controlled in terms of the excess, measuring the energy difference to a half space χ * in direction e 1 .
where ε 2 is defined via and the integral on the left-hand side of (68) with the two cases <, + and >, −, respectively is a short notation for the sum of the two integrals.
In our application, we use the following lemma which is valid for any number of phases with arbitrary surface tensions instead of Lemma 4.2 or Corollary 4.3. Nevertheless, the core of the proof is already contained in the respective estimates in the two-phase case above. As in Proposition 4.1, we assume that χ 1 and χ 2 are the majority phases and that e 1 is the approximate normal to 1 = {χ 1 = 1}. Lemma 4.5 Let I ⊂ R be an interval, h > 0, η ∈ C ∞ 0 (R), 0 ≤ η ≤ 1 radially nonincreasing and u,ũ : I → R P be two smooth maps into the standard simplex for any s 1.
As Lemma 4.4 can be used to estimate the integrated version of the error in Corollary 4.3 against the excess, the following corollary shows that the integrated version of the corresponding error term in the multi-phase version, Lemma 4.5, can be estimated against a multi-phase version of the excess ε 2 .
where the functional ε 2 (χ) is defined via and the functional F h is the following localized version of the approximate energy in the two-phase case With these tools we can now turn to the rigorous proof of (62)-(65) in the following lemmas. In the next two lemmas, we approximate the first variation of the metric term by an expression that makes the normal velocity appear. The main idea is to work, as for Lemma 2.5, on a mesoscopic time scale τ ∼ √ h, introducing a fudge factor α, cf. Remark 1.5. The first lemma shows that we may coarsen the first variation from the microscopic time scale h to the mesoscopic time scale α √ h and is therefore the rigorous analogue of (62). It also shows that we may pull the test vector field ξ out of the convolution.
in the sense that the error is controlled by where η ∈ C ∞ 0 (B 2r ) is a radially symmetric, radially non-increasing cut-off for B r in B 2r with |∇η| 1 r and the functional ε 2 (χ) is defined in Corollary 4.6.
While the first lemma made the mesoscopic time derivative 1 τ χ Kl i − χ K(l−1) i appear, the upcoming second lemma makes the approximate normal, here e 1 , appear. This is the analogue of (64).

Lemma 4.8 Given ξ and η as in Lemma
in the sense that the error is controlled by as h → 0, where 0 ≤ k(z) ≤ |z|G(z) and the functional ε 2 (χ) is defined in Corollary 4.6.
Let us comment on the error term: the first part of the error term arises because e 1 is only the approximate normal. The last part arises in the passage from a diffuse to a sharp interface and formally is of quadratic nature.
The following lemma deals with the error term in the foregoing lemma and brings it into the standard form. The only difference to the two-phase case in (65) is the prefactor in front of the excess ε 2 which comes from the slight difference in the two one-dimensional estimates.

Lemma 4.9 With η as in Lemma 4.7 we have
where the functional ε 2 (χ) is defined in Corollary 4.6.
With the above lemma we can conclude the proof of Proposition 4.1. Since one of the error terms includes the factor r d−1 we will only use the proposition in case there the behavior in the ball B r is non-trivial. In the trivial case-meaning that the measure of the boundary inside B is much smaller than r d−1 -we can use the following easy estimate.

Proofs
Proof of Proposition 4.1 Step 1: the discrete analogue of (66) The statement follows easily from Here we use the notation ε 2 (t) := ε 2 (χ h (t)), where the functional ε 2 (χ) is defined in Corollary 4.6. The infimum is taken over all half spaces χ * = 1 {x 1 >λ} in direction e 1 . All terms appearing in ε 2 correspond to terms in E 2 . The first term is the sum of the localized approximate energies of χ 3 , . . . , χ P , the second term describes the approximate energy excess of Phases 1 and 2. The convergence of these terms as h → 0 for a fixed half space χ * follows as in the proof of Lemma 2.8. Taking the infimum over the half spaces yields (66).
Step 2: choice of appropriately shifted mesoscopic time slices In order to prove (71), we use the machinery that we develop later on in this section. There we work on the mesoscopic time scale τ = α √ h instead of the microscopic time scale h, see Remark 1.5 for the notation. To apply these results, we have to adjust the time shift of time slices of mesoscopic distance. At the end, we will choose a microscopic time shift k 0 ∈ {1, . . . , K} such that the average over time slices of mesoscopic distance is controlled by the average over all time slices: This follows from the simple fact that ε 2 (k 0 ) ≤ 1 K K k=1 ε 2 (k) for some k 0 . For notational simplicity, we shall assume that k 0 = 0 in (72).
Step 3: argument for (71) Using Lemmas 4.7, 4.8 and 4.9, we obtain up to an error where we used the choice of time slices (72). Since ξ has compact support in (0, T ), a discrete integration by parts yields By the Hölder-type bounds in Lemma 2.6 we can replace the mesoscopic scale on the righthand side by the microscopic scale for χ: By the smoothness of ξ , we can easily do the same for ξ to obtain by (iii) in Proposition 2.2 that for h → 0 Using this for the right-hand side of (73) establishes (71) and thus concludes the proof.
Proof of Lemma 4.2 Step 1: an easier inequality We claim that for any function u ∈ C 0,1 (I ), we have By Jensen's inequality for the convex function z → z 2 − , we have If |J | ≥ 4, then −1 ≤ 2(u(b) − u(a))/|J | ≤ 1 and so Thus, we have Hence, which concludes the proof.
Proof of Corollary 4.3 By rescaling x 1 = √ hx 1 ,û(x 1 ) = u( √ hx 1 ), and analogously forũ and using Lemma 4.2 for the transformed functions we obtain: Now we approximate η by simple functions: let Then 0 ≤η ≤ η, |η −η| ≤ 1 N and since η is radially non-increasing, each J n is an open interval. We can apply (79) with J n playing the role of I . By linearity we have Passing to the limit N → ∞, the left-hand side converges to 1 √ h η |χ −χ| dx 1 and we obtain the claim.
Proof of Lemma 4.4 Argument for (68) As in Step 1 of the proof of Lemma 2.4, by (22) we Using , and 2u + = |u| + u on the set {z 1 > 0} and 2u − = |u| − u on {z 1 < 0}, we thus obtain where we used again <, + and >, −, respectively as a short notation for the sum of the two integrals. Now we can apply a Taylor expansion for η around x, i. e. write η(x) − η(x − z) = ∇η(x) · z + O(|z| 2 ), where the constant in the O(|z| 2 )-term depends linearly on ∇ 2 η ∞ . By symmetry, the first-order term is Note that the right-hand side can be controlled by The second-order term is controlled by which completes the proof of (68). Argument for (69). For the first arguments let w. l. o. g. h = 1. The first ingredient is the identity where the last term is the sum of the two integrals. Indeed, since ∂ 1 G(z) = −z 1 G(z) is odd in z 1 , and splitting the integrand in the form u = |u|−2u − on the set {z 1 > 0} and −u = |u|−2u + on {z 1 < 0}, respectively, we derive which is (81). The second ingredient for (69) is To obtain (82), we estimate We recall that we G factorizes in a one-dimensional Gaussian G 1 and a (d − 1)-dimensional so that the second integral can be estimated from above by 2G 1 (0) . Therefore we have Optimizing in yields (82).
Using the fact that χ ∈ {0, 1}, implies the third ingredient: Combining (81), (82) and (83), one finds a positive constant c such that where we recall that the last term is the sum of the two integrals. We consider the "bad" set E := x : By construction of E we have a good estimate on E c : and thus we obtain strict monotonicity of G * χ in e 1 -direction outside E as long as the first term on the left-hand side dominates the second term: We introduce the parameter h again. Then this turns into with now By construction of E and since |z|G h (z) by a change of coordinates z → 2z and the subadditivity of the functions u → u ± . The last term can be handled using a Taylor expansion of η around x: where the constant in the O( √ h)-term depends linearly on E h (χ) and ∇η ∞ . Indeed, the error in the equation above is-up to a constant times ∇η ∞ -estimated by Using (68), we obtain and thus (69) holds.
Proof of Lemma 4.5 By the same argument as in Corollary 4.3, we can ignore the cut-off η and the parameter h > 0 and reduce the claim to the following version: We will prove Then (85) follows from the one-dimensional case in the form of (78) for the first right-hand side set and Chebyshev's inequality for the second and third right-hand side set. The fact that we replaced the 1 in (78) by the universal constant c > 0 can be justified by a simple rescaling.
In order to prove (86), we fix i ∈ {1, . . . , P} and define the functions andṽ in the same way, so that χ i = 1 {v>0} ,χ i = 1 {ṽ>0} and We clearly have which together with Chebyshev's inequality yields the desired bound on the measure of the second right-hand side set of (87). Therefore our goal is to prove which then implies (86). Now we give the argument for (88). First, we decompose the set {|v| < s} in the following way: We claim that Indeed, plugging in the definition of φ, using the triangle inequality for the surface tensions and l u l = 1, for k / ∈ {i, j}, we have on E j Subtracting φ j on both sides, we obtain u k ≤ 1 2 . Since φ j − s ≤ φ i on E j with the same chain of inequalities as before we obtain The same inequality holds for u j since φ i − s ≤ φ j , which concludes the argument for (89).
On the one hand, (89) gives us the upper bound for u 1 on {|v| < s}. On the other hand, since u ∧ (1 − u) = u − (2u − 1) + for any u we infer from (89) that on the set This concludes the argument for (88) and therefore the proof of the lemma.
Proof of Corollary 4.6 By Lemma 4.4, the claim follows from the obvious inequality Proof of Lemma 4.7 We recall the definition of the inner variation of −E h (χ −χ) in (33): we have for any pair of admissible functions χ,χ and any test function In our case, after integration in time, this yields denotes the time average of ξ over a microscopic time interval [nh, (n + 1)h). Now we prove step by step that 1. the (∇ · ξ)-term is negligible as h → 0; 2. we can freeze mesoscopic time for ξ , that is, substitute ξ n by some nearby value ξ(l n τ ) at the expense of an o(1)-term; 3. we can smuggle in η at the expense of an o(1)-term; 4. we can freeze mesoscopic time for χ h and substitute χ n in the second factor by the mean 1 2 χ h ((l n − 1)τ ) + χ h (l n τ ) , which is the main step; 5. we can get rid of η again at the expense of an o(1)-term; and finally; 6. we can pull ξ out of the convolution at the expense of an o(1)-term.

Note that
Step 3 and Step 5 are just auxiliary steps for Step 4.
Step 1: the (∇ · ξ)-term vanishes as h → 0 By Jensen's inequality, for any pair i, j we have Since the L 2 -norm of G h * u is decreasing in h and by the energy-dissipation estimate (10), the error is controlled by Step 2: time freezing for ξ We can approximate ξ n by a nearby value ξ(l n τ ), where l n ∈ {1, . . . L} is chosen such that K(l n − 1) < n ≤ Kl n . Note that |ξ n − ξ l n | ≤ τ ∂ t ξ ∞ .
Therefore, by Jensen's inequality, we have for any pair i, j Using the energy-dissipation estimate (10), the error is controlled by Step 3: smuggling in η We claim Using ∇G h = G h/2 * ∇G h/2 , the left-hand side is equal to Note that since η ≡ 1 on the support of ξ and |z| ∇G 1/2 (z) |z| 2 G(z) has finite integral, we have for any χ ∈ {0, 1}, Thus, using the Cauchy-Schwarz inequality and the energy-dissipation estimate (10), the error is controlled by Step 4: time freezing for χ h We claim that for any pair of indices i, j in the sense that the error is controlled by Here, we assumed that Phases 1 and 2 are the majority phases in the support of η. Indeed, we can control the error using the Cauchy-Schwarz inequality by Since 0 ≤ η ≤ 1, the term in the first parenthesis is controlled by η dμ h . For the term in the second parenthesis, we fix the mesoscopic block index l and the microscopic time step index k and sum at the end. Let l = 1 and write ξ instead of ξ(lτ ) for notational simplicity. We use the L 2 -convolution estimate and introduce η in the second integral, which is equal to 1 on the support of ξ : With Lemma 4.5 in the integrated form and Corollary 4.6, we can estimate these terms in the following way. We set for abbreviation By Minkowski's triangle inequality w. r. t. the measure η dx, we see that α also satisfies a triangle inequality. Thus, thanks to Jensen's inequality, Therefore, by integrating Lemma 4.5 over the tangential directions x 2 , . . . , x d and using Corollary 4.6, we have By (15) we have n α 2 (n, n − 1) ≤ η dμ h and the relation K τ = α 2 , we have This justifies the name α 2 (k, k ), since the term arising from α 2 (k, k ) is estimated in (91) by α 2 , the square of the fudge factor in the definition of the mesoscopic time scale τ = α √ h. Therefore, after summation over the mesoscopic block index l, (90) in conjunction with (91) yields Using Young's inequality, the total error in this step is controlled by ξ ∞ times If we now choose s = α Step 5: getting rid of η again As in Step 3, we can estimate as h → 0.
Step 6: pulling out ξ First, fix l and write ξ = ξ(lτ ). For simplicity of the formula, we will ignore l and formally set l = 1. Note that since ∇G is antisymmetric, we have for any two Then the error on this single time interval splits into two terms. The one coming from the first-order term in the expansion of ξ is where we used Jensen's inequality. Since K h = h∇ 2 G h + G h I d, h∇ 2 G h * u L 2 G h/2 * u L 2 for any u and since the L 2 -norm of G h * u is non-increasing in h, we have for any function v Plugging this into the inequality above with v playing the role of χ k i − χ k−1 i , multiplying by τ , summing over the block index l and using Jensen's inequality, we can control the contribution to the error coming from the first-order term by By Lemma 2.5, the contribution coming from the second-order term in the expansion of ξ is controlled by Finally, we note that by the time freezing in Step 4, we constructed a telescope sum: rewriting the summation over the microscopic time step index n = 1, . . . , N as the double sum over the microscopic time step index k = 1, . . . , K in the respective mesoscopic time intervals and the mesoscopic block index l = 1, . . . , L, we have for each l, Thus we obtain which concludes the proof.
Proof of Lemma 4.8 Step 1: rough estimate for minority phases We first argue that if {i, j} = {1, 2}, that is if the product involves at least one minority phase, then we can estimate this term. Let us first assume that j / ∈ {1, 2}. By a manipulation as in the proof of Lemma 4.7 and the Cauchy-Schwarz inequality, we have as h → 0. Note that for any characteristic function χ ∈ {0, 1}, since ∇G(z) dz = 0, Treating the metric term as in the proof of Lemma 4.7 with the triangle inequality and Jensen's inequality afterwards, we obtain the bound If instead i / ∈ {1, 2}, using a discrete integration by parts, the antisymmetry of ∇G and a manipulation as in the proof of Lemma 4.7 for ξ , we can exchange the roles of Phase i and Phase j: Thus, we can use the above argument also in this case and the only terms contributing to the sum as h → 0 are the terms involving both majority phases.
In the following we assume that i = 1 and j = 2. In the other case, we can just exchange the roles of χ 1 and χ 2 in the following steps and use −e 1 as the approximate normal to χ 2 instead and the proof is the same.
Step 2: substituting χ 2 by 1 − χ 1 We claim that Since ∇G * 1 = 0, the claim is clearly equivalent to proving that we can replace χ 2 by 1 − χ 1 in the second left-hand side term of the claim. But by i χ i = 1 and linearity the resulting error term is which can be handled by Step 1.
Step 3: substitution of ∇G We want to replace the convolution with ∇G on the left-hand side of the claim by a convolution with the anisotropic kernel To that purpose, we claim that for any characteristic function χ ∈ {0, 1}, Here, where the infimum is taken over all half spaces χ * = 1 {x 1 >λ} in direction e 1 . Using this inequality for χ K(l−1) 1 and χ Kl 1 , we can substitute those two summands and the error is estimated as desired: Now we give the argument for (92). By measuring length in terms of √ h, we may assume that h = 1. Since ∇G dz = 0 and ∇G(z) = −z G(z), using the identities u = |u| − 2u − and u = −|u| + 2u + on the sets {z 1 > 0} and {z 1 < 0}, respectively, Using |χ 1 − χ 2 | = (1 − χ 1 )χ 2 + χ 1 (1 − χ 2 ) for χ 1 , χ 2 ∈ {0, 1}, this implies the pointwise identity where the last term stands for the sum of the two integrals. Integration w. r. t. η dx now yields: As in the argument for (69), we can follow the lines from (84) on so that (68) yields (92).
Step 4: an identity for K We claim that for any two characteristic functions χ,χ ∈ {0, 1}, we have the pointwise identity Indeed, by scaling, we may w. l. o. g. assume h = 1 and start with Exchanging the roles of χ andχ , one obtains for the second part Using the factorization property of G and the symmetry z G d−1 (z )dz = 0, one computes that for any vector ξ ∈ R d ξ · K = sign(z 1 ) Hence the identity follows from (χ − 1)χ − (χ − 1) χ = χ −χ.
Step 5: conclusion Applying Steps 1 and 2, using the identity in Step 3 for the remaining two terms involving Phases 1 and 2, we end up with the right-hand side of the claim. The error is controlled by as h → 0. Note that |K | = k, where k is the kernel defined in the statement of the lemma. It remains to argue that η can be equally distributed on both copies of χ Kl − χ K(l−1) . For this, note that for u = χ Kl − χ K(l−1) ∈ [0, 1], Thus, in our case, we can use Lemma 2.6 and bound the error by Proof of Lemma 4.9 First, we note that it is enough to prove the following similar statement for a fixed mesoscopic time interval: Indeed, if we multiply (93) by τ and sum over the mesoscopic block index l we obtain the statement.
In the proof of (93), we will exploit the convolution in the normal direction e 1 in Step 1, which will allow us in Step 2 to make use of the quadratic structure of this term.
Step 1 We can estimate the kernel k by a kernel that factorizes in two kernels k 1 , k in normaland tangential direction, respectively, which are of the form Let us still denote the kernel by k. We have The second factor in the right-hand side convolution can be estimated in two ways: Therefore, we obtain a quadratic term with two copies of 1 Step 2 Now we use Lemma 4.5 before integration in x . We write ε 2 (x ) for the first error term in (70) and set so that the statement of Lemma 4.5 turns into We recall the link between the function α 2 (x ) and the fudge factor α as mentioned in (91) but now before summation over the mesoscopic block index l: Then for any two parameters s,s 1 the right-hand side of (94) is estimated by For the first and the last summand in the first factor, 1 s ε 2 (x ) and 1 s 2 α 2 (x ), we use the 1 in the minimum on the right. For the second summand on the left, s, we use the second term in the minimum for the pairing. Using the L 1 -convolution estimate and (95)  1 and then s = α 4 9 1.
Proof of Lemma 4.10 Thanks to the convergence assumption (8), we can apply Proposition 3.1. Using the Euler-Lagrange equation (34) for χ n and (36), we can identify the first term on the left-hand side as the limit of the first variation of the dissipation functional as h → 0. Following Step 1 of the proof of Lemma 4.7 and then estimating directly as in Step 3, but for ξ instead of η, we obtain Using the Cauchy-Schwarz inequality, for any pair i, j we have The first right-hand side factor is bounded by η dμ h . As in Step 1 in the proof of Lemma 4.8, the second right-hand side factor can be controlled by as h → 0, where we used Lemma 2.8 to pass to the limit. Thus, using Young's inequality, we have To estimate the second term in the lemma, note that by Young's inequality we have which concludes the proof.

Convergence
In Sect. 3, we identified the limit of the first variation of the energy; in Sect. 4, we identified the limit of first variation of the metric term up to an error that measures the local approximability by a half space. In this section, we show by soft arguments from geometric measure theory that this error can be made arbitrarily small. Before that, we will state the main ingredients of the proof here.
Definition 5.1 Given r > 0, we define the covering Z d is a regular grid of midpoints on [0, ) d . By construction, for each n ≥ 1 and each r > 0, the covering in the sense that for each point in [0, ) d , the number of balls containing this point is bounded by a constant c(d, n) which is independent of r . For given δ > 0 and χ : [0, ) d → {0, 1} ∈ BV , we define B r,δ to be the subset of B r consisting of all balls B such that the following two conditions hold: where η 2B is a cut-off for 2B.
The following lemma will be used to control the error terms obtained in Sect. 4 on the "bad" balls B ∈ B r − B r,δ .

Lemma 5.3
For any δ > 0 and any χ : In a rescaled version, the following lemma can be used to control the error terms on the "good" balls B ∈ B r,δ .

Lemma 5.4
Let η be a radially symmetric cut-off for the unit ball B. Then for any ε > 0 there exists δ = δ(d, ε) > 0 such that for any χ : there exists a half space χ * in direction e 1 such that then we can do so with one normal ν * ∈ S d−1 and its inverse −ν * :

Proof of Theorem 1.3
Using Proposition 4.1 and the lemmas from above, we can give the proof of the main result. The proof consists of three steps: 1. Post-processing Propositions 3.1 and 4.1, using the Euler-Lagrange equation (34) and by making the half space time-dependent, 2. Estimates for fixed time and 3. Integration in time.
where E 2 i j is defined via The infimum is taken over all half spaces χ * = 1 {x·ν * >λ} in direction ν * . Argument for (102): by symmetry, we may assume w. l. o. g. that the minimum on the righthand side of (102) is realized for i = 1 and j = 2. The Euler-Lagrange equation (34) of the minimizing movements interpretation (5) links Proposition 3.1 with the metric term: Before applying the results of Sect. 4 we symmetrize the second term on the left-hand side of (66): we claim that we can replace which appears on the left-hand side of (66) by the symmetrized term i, j which appears in the weak formulation (6). Then using Proposition 4.1 and this symmetrization or the rough estimate Lemma 4.10 yields (102). Now we show how to replace (103) by (104).
Note that by Young's inequality we have so that we can estimate both terms after integration in time by We are left with estimating the summands in (104) with {i, j} = {1, 2}. For those terms we can use Young's inequality in the following form |ν i | |V i | ≤ 1 α + αV 2 i so that after integration in time these terms are controlled by ξ B ∞ times which concludes the argument for the symmetrization and thus for (102).
Here, we see, why we needed to introduce extra terms in E 1 compared to the terms that were already present in the definition of E 1 in Sect. 4. These different terms are sometimes called tilt-excess and excess energy, respectively. Now let ξ ∈ C ∞ 0 ((0, T )×[0, ) d , R d ) be given. First, we localize ξ in space according to the covering B r from Definition 5.1. To do so, we introduce a subordinate partition of unity {ϕ B } B∈B r and set ξ B := ϕ B ξ . Then ξ = B∈B r ξ B , ξ B ∈ C ∞ 0 (B) and ξ B ∞ ≤ ξ ∞ . Given a radially symmetric and radially non-increasing cut-off η of B 1 (0) in B 2 (0), for each ball B in the covering, we can construct a cut-off η B of B in 2B by shifting and rescaling. Given any measurable function ν * : (0, T ) → S d−1 and any α ∈ (0, 1) we claim where E 2 B (ν * , t) := min i = j E 2 i j (ν * , t) for ν * ∈ S d−1 . Now we give the argument that (102) implies (105). We approximate the measurable function ν * in time by a piecewise constant function. Let 0 = T 0 < · · · < T M = T denote a partition of (0, T ) such that the approximation ν * M of ν * is constant on each interval [T m−1 , T m ). Since the measures on the left-hand side are absolutely continuous in time, we can approximate ξ B by vector fields which vanish at the points T m and both, the curvature and the velocity term converge. Therefore, we can apply (102) on each time interval (T m−1 , T m ). Lebesgue's dominated convergence gives us the convergence of the integral on the right-hand side and thus (105) holds.
Step 2: estimates for fixed time Let t ∈ (0, T ) be fixed. We will omit the argument t in the following. Let ε > 0 and let δ = δ(ε) (to be determined later). Let B r,δ be defined as the set of good balls in the lattice: For B ∈ B r,δ , and i = 1, . . . , P, we denote by ν B,i the vector ν * for which the infimum is attained, so that By a rescaling and since η is radially symmetric, we can upgrade Lemma 5.5, so that for given γ > 0, we can find δ = δ(d, γ ) > 0 (independent of χ) and ν B ∈ S d−1 , such that Rescaling Lemma 5.4, we can define γ = γ (ε) > 0 and a half space χ * in direction ν B , such that These two steps give us the dependence of δ on ε. Using the lower bound on the perimeters on B ∈ B r,δ (t), we obtain Note that for the balls B ∈ B r − B r,δ , we have by Lemma 5.3: |∇χ i | → 0, as r → 0.
The speed of convergence depends on χ and ε (through δ).
Step 3: integration in time Using Lebesgue's dominated convergence theorem, we can integrate the pointwise-in-time estimates of Step 2. Recalling the decomposition ξ = B ξ B and using the finite overlap (97), we have Since by the energy-dissipation estimate (10) we have E(χ(t)) ≤ E 0 and can control the first term. By Lebesgue's dominated convergence and (106), the second term vanishes as r → 0. By (16) and Proposition 2.2, we can handle the last two terms. Thus we obtain Taking first the limit ε to zero and then α to zero yields (6), which concludes the proof of Theorem 1.3. 2B → 0 as r → 0.
Therefore, after passage to a subsequence and a diagonal argument to exhaust the open ball {η > 0}, we find χ such that χ n → χ pointwise a.e. on {η > 0}.
Since the first term on the left-hand side is lower semi-continuous and the second one is continuous, we can pass to the limit in the above inequality and obtain η |ν − e 1 | 2 |∇χ| = 2 η |∇χ| − 2 ∇η · e 1 χ dx ≤ 0.
A mollification argument shows that there exists a half space χ * in direction e 1 such that χ = χ * a.e. in {η > 0}.
Because of (114), this rules out B χ n − χ * ≥ ε 2 on the one hand. On the other hand, by lower semi-continuity of the perimeter, also B ∇χ * ≥ ε 2 + B |∇χ n | is ruled out. To obtain a contradiction also w. r. t. the first statement in (113), letη ≤ η be a cut-off for B in (1 + δ)B. Since (111) holds also forη instead of η, we have Since χ * is a half space and therefore has no mass on ∂ B, we have which is a contradiction.
Proof of Lemma 5. 5 We give an indirect argument. Assume there exists a sequence of characteristic functions {χ n } n with i χ n i = 1 a.e., a number ε > 0 such that we can find approximate normals ν * n i ∈ S d−1 with Together with (116), we can take the limit n → ∞ in (115) and obtain k≥3 B which is a contradiction since the left-hand side vanishes by construction.