Gamma-convergence of a nonlocal perimeter arising in adversarial machine learning

In this paper we prove Gamma-convergence of a nonlocal perimeter of Minkowski type to a local anisotropic perimeter. The nonlocal model describes the regularizing effect of adversarial training in binary classification. The energy essentially depends on the interaction between two distributions modelling likelihoods for the associated classes. We overcome typical strict regularity assumptions for the distributions by only assuming that they have bounded $BV$ densities. In the natural topology coming from compactness, we prove Gamma-convergence to a weighted perimeter with weight determined by an anisotropic function of the two densities. Despite being local, this sharp interface limit reflects classification stability with respect to adversarial perturbations. We further apply our results to deduce Gamma-convergence of the associated total variations, to study the asymptotics of adversarial training, and to prove Gamma-convergence of graph discretizations for the nonlocal perimeter.


Introduction
While modern machine learning methods, and in particular deep learning [37], are known to be effective tools for difficult tasks like image classification, they are prone to adversarial attacks [45]. The latter are imperceptible perturbations of the input which destroy classification accuracy. As a way to mitigate the effects of adversarial attacks, Madry et al. in [38] suggested a robust optimization algorithm to train more stable classifiers. Given a metric space $\mathcal X$ acting as feature space, a set $\mathcal Y$ acting as label space, a probability measure $\mu \in \mathcal M(\mathcal X \times \mathcal Y)$ which models the distribution of training data, a loss function $\ell : \mathcal Y \times \mathcal Y \to \mathbb R$, and a collection of classifiers $\mathcal C$, adversarial training takes the form of the minimization problem
$$\inf_{u \in \mathcal C}\; \mathbb E_{(x,y)\sim\mu}\Big[\sup_{\tilde x \in B(x,\varepsilon)} \ell(u(\tilde x), y)\Big]. \tag{1.1}$$
Here we use the notation $\mathbb E_{z\sim\mu}[f(z)] := \int f(z)\, d\mu(z)$. Adversarial training seeks a classifier for which adversarial attacks in the ball of radius $\varepsilon$ around $x$ (with respect to the metric on $\mathcal X$) have the least possible impact, as measured through the function $\ell$. Here $\varepsilon > 0$ is referred to as the adversarial budget and will play an important role in this article.
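For readers coming from the computational side, the inner supremum in (1.1) can be made concrete with a small sketch. The data, the threshold classifier, and the grid search over the open ball are all illustrative choices, not objects from the paper.

```python
# Sketch of the adversarial risk in (1.1) for a one-dimensional threshold
# classifier under the 0-1 loss.  The data, the classifier, and the grid
# search over the open ball B(x, eps) are illustrative, not from the paper.

def threshold_classifier(x, t=0.0):
    """Binary classifier u(x) = 1 if x >= t else 0."""
    return 1 if x >= t else 0

def adversarial_loss(x, y, eps, u):
    """Approximate sup_{x' in B(x, eps)} |u(x') - y| (the 0-1 loss).

    For a monotone threshold classifier, the supremum over the open ball
    is seen by a fine grid of perturbations strictly inside the ball."""
    grid = [x + eps * k / 100 for k in range(-99, 100)]
    return max(abs(u(xp) - y) for xp in grid)

def adversarial_risk(samples, eps, u):
    """Empirical version of E_{(x,y)~mu}[ sup_{x' in B(x,eps)} l(u(x'), y) ]."""
    return sum(adversarial_loss(x, y, eps, u) for x, y in samples) / len(samples)

# Two well-separated classes: with eps = 0 the risk vanishes, while a small
# adversarial budget already penalizes the sample near the decision boundary.
samples = [(-1.0, 0), (-0.5, 0), (0.05, 1), (1.0, 1)]
print(adversarial_risk(samples, eps=0.0, u=threshold_classifier))  # 0.0
print(adversarial_risk(samples, eps=0.1, u=threshold_classifier))  # 0.25
```

The example shows the mechanism behind the adversarial budget: only samples within distance ε of the decision boundary can have their loss raised by a perturbation.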
Since its introduction, a significant body of literature has evolved around adversarial training, focusing both on its empirical performance and improvement (see the survey [5]) and on its theoretical understanding. Since the purpose of this article is primarily of theoretical nature, we restrict our discussion to the latter developments. As it turns out, adversarial training has intriguing connections within mathematics. Firstly, it was noted in different works that adversarial training is strongly connected to optimal transport. This was explored in the binary classification case, where $\mathcal Y = \{0, 1\}$ and $\ell$ equals the 0-1 loss, in a series of works, see [9, 32, 41, 42] and the references therein. Only recently these results were generalized to the multi-class case where $\mathcal Y = \{1, 2, \dots, K\}$ by García Trillos et al. in [31], who characterize the associated adversarial training problem in terms of a multi-marginal optimal transport problem. Secondly, it was already observed by Finlay and Oberman in [26] that asymptotically, meaning for very small values of $\varepsilon > 0$ in (1.1), adversarial training is related to a regularization problem, where the gradient of the loss function is penalized in a suitable norm:
$$\inf_{u \in \mathcal C}\; \mathbb E_{(x,y)\sim\mu}\big[\ell(u(x), y)\big] + \varepsilon\, \mathbb E_{(x,y)\sim\mu}\big[\|\nabla_x \ell(u(x), y)\|_*\big]. \tag{1.2}$$
Here $\mathcal X$ is assumed to be a Banach space and $\|\cdot\|_*$ is the corresponding dual norm. While these connections were mostly formal, C. and N.
García Trillos in [30] made them rigorous in the context of adversarial training for residual neural networks. Still, even there the relation between adversarial training and regularization was of asymptotic type, in particular not allowing statements about the relation of minimizers of (1.1) and (1.2). Furthermore, in [32] García Trillos and Murray regard adversarial training in the form of (1.1) for binary classifiers as an evolution with artificial time $\varepsilon$ and relate it to (1.2) with a perimeter regularization at time $\varepsilon = 0$. A different approach was taken in their work together with the first author of this paper [12] where, again in the binary classification case and for $\ell$ the 0-1 loss, it was shown that adversarial training is equivalent to a nonlocal regularization problem:
$$(1.1) = \inf_{u \in \mathcal C}\; \mathbb E_{(x,y)\sim\mu}\big[\ell(u(x), y)\big] + \varepsilon\, \mathrm{TV}_\varepsilon(u; \rho). \tag{1.3}$$
Here $\mathrm{TV}_\varepsilon(\cdot; \rho)$ denotes a nonlocal total variation functional depending on the measures $\rho := (\rho_0, \rho_1)$ defined as $\rho_i := \mu(\cdot \times \{i\})$ for $i \in \{0, 1\}$, which are, up to normalization, the conditional distributions of the two classes describing their respective likelihoods. The set of classifiers can be the set of characteristic functions of Borel sets $\mathcal C_{\mathrm{char}} = \{\chi_A : A \in \mathcal B(\mathcal X)\}$ or the set of "soft classifiers" $\mathcal C_{\mathrm{soft}} = \{u : \mathcal X \to [0, 1]\}$ which live in a Lebesgue space equipped with a suitable measure on $\mathcal X$. In [12], existence of solutions for adversarial training was proven, which included suitable relaxations of the objective function in (1.3) to a lower semi-continuous function, the construction of precise representatives, and the insight that the model with $\mathcal C_{\mathrm{soft}}$ is a convex relaxation of the model with $\mathcal C_{\mathrm{char}}$. Furthermore, regularity properties of the decision boundaries of solutions were investigated.
We would like to emphasize that the results in [12] are proved for open balls $B(x, \varepsilon)$ in (1.1), and this will also be the setting of the present paper. Usually, adversarial training is defined using closed balls, which does not change the model drastically but requires more care with respect to measurability of the underlying functions, see the discussion in [12, Remark 1.3, Appendix B.1]. For a different approach to proving existence, using a closed ball model, we refer to the work [4] by Awasthi et al., and the follow-up paper [3] studying consistency of adversarial risks.
The focus of this paper will be on the asymptotics of the functional $\mathrm{TV}_\varepsilon(\cdot; \rho)$ in (1.3). In fact, we will work with the associated perimeter functional $\mathrm{Per}_\varepsilon(A; \rho) := \mathrm{TV}_\varepsilon(\chi_A; \rho)$ for $A \subset \mathcal X$ since all statements, in particular Gamma-convergence, proved for the perimeter directly carry over to the total variation (see Section 4). This perimeter was shown in [12] to be of the form (1.4), where the essential supremum and infimum are taken with respect to a suitable measure $\nu$ with sufficiently large support. When $\mathcal X = \mathbb R^d$ and the $\rho_i$ equal the Lebesgue measure, the perimeter (1.4) can be used to recover the Minkowski content of a subset of $\mathbb R^d$ by sending $\varepsilon \to 0$, see (2.1) below. In this simplified setting, the nonlocal perimeter has applications in image processing [6] and is mathematically well-understood. A thorough study of its properties like isoperimetric inequalities or compactness was undertaken in [16, 17], Gamma-convergence of related variants to local perimeters was investigated in [18, 19], and associated curvature flows were analyzed in [20, 21]. We note that Chambolle et al. [19] introduce anisotropy into the perimeter by replacing the ball $B(x, r)$ in the definition of (1.4) by a scaled convex set $C(x, r)$.
In this paper, we study the asymptotic behavior of the nonlocal perimeter (1.4) as $\varepsilon \to 0$, using the framework of Gamma-convergence (see, e.g., [11, 23]). This approach is widely used in applications to materials science (see, e.g., [1, 22, 29, 40]) and is particularly applicable to the study of energy minimization problems depending on singular perturbations, where it can describe complex energy landscapes in terms of simpler, better-understood effective energies. In the case of phase separation in binary alloys, energetic minimizers closely approximate minimal surfaces [39]. Likewise, we will relate local minimizers of the perimeter (1.4) to a weighted minimal surface, which has a transparent geometric interpretation.
Though the nonlocal perimeter (1.4) is not a phase field approximation of the classical perimeter, similar analytic tools are helpful. The Ambrosio-Tortorelli functional was introduced as an elliptic regularization of the Mumford-Shah energy for image segmentation, with the nature of the approximation made precise via Gamma-convergence [1, 10]. From the technical perspective, our work is related to the results of Fonseca and Liu [28], where Gamma-convergence of a weighted Ambrosio-Tortorelli functional is proven. In their setting, they consider a density described by a bounded SBV function which is uniformly bounded away from 0. In contrast to Chambolle et al. [19] and Fonseca and Liu [28], a principal challenge in our setting will be understanding the interaction between the densities $\rho_0$ and $\rho_1$ in the energy (1.4) and how these give rise to preferred directions.
We assume that $\mathcal X = \Omega \subset \mathbb R^d$ and that the measures $\rho_i$ have densities with respect to the Lebesgue measure and are supported on some subset of the domain $\Omega$. While for smooth densities Gamma-convergence is proven relatively easily (as in [19]), we only assume that the densities are bounded $BV$ functions. In this case, we prove that the Gamma-limit is an anisotropic and weighted perimeter of the form $\int_{\partial^* A} \beta(\nu_A; \rho)\, d\mathcal H^{d-1}$, where $\nu_A$ denotes the unit normal vector to the boundary of $A$. More rigorous definitions are given in Section 2. Here, $BV$ functions are a natural space for the distributions as discontinuities are allowed, but they are sufficiently regular to be well-defined on surfaces and thereby prescribe interfacial weights. We note that while the anisotropic dependence on the normal $\nu_A$ vanishes for continuous densities $\rho_i$, in the discontinuous case the anisotropy provides a direct interpretation of the asymptotic regularization effect coming from adversarial training (1.1) for small adversarial budgets (see Examples 1 and 2).
An interesting consequence of our Gamma-convergence result is the convergence of adversarial training (1.1) as $\varepsilon \to 0$ to a solution of the problem with $\varepsilon = 0$ with minimal perimeter. Furthermore, the primary result in the continuum setting can be used to recover a Gamma-convergence result for graph discretizations of the nonlocal perimeter; a setting that is especially relevant in the context of graph-based machine learning [33, 34, 36]. Our approach for this discrete-to-continuum convergence is in the spirit of these works and relies on $TL^p$ (transport $L^p$) spaces, which were introduced for the purpose of proving Gamma-convergence of a graph total variation by García Trillos and Slepčev in [34], but have also been used to prove quantitative convergence statements for graph problems, see, e.g., the works [14, 15] by Calder et al.
The rest of the paper is structured as follows: In Section 2 we introduce our notation, state our main results, and list some important properties of the nonlocal perimeter. Section 3 is devoted to proving a compactness result as well as Gamma-convergence of the nonlocal perimeters. In Section 4, we finally apply our results to deduce Gamma-convergence of the corresponding total variations, prove conditional convergence statements for adversarial training, and prove Gamma-convergence of graph discretizations.
2 Setup and main results

Notation
The most important bits of our notation are collected in the following.

Balls and cubes

We write $B(x, r)$ for the open ball of radius $r$ centered at $x$. $Q_\nu(x, r)$ is an open cube centered at $x \in \mathbb R^d$ with sides of length $r$ and two faces orthogonal to $\nu$. If $\nu$ is absent, the cube is assumed to be oriented along the axes. We also write $Q'(x, r)$ for the analogous open cube in $\mathbb R^{d-1}$.

Measure-theoretic set quantities

For $t \in [0, 1]$ the points where a measurable set $A \subset \mathbb R^d$ has density $t$ are defined as
$$A^t := \Big\{x \in \mathbb R^d : \lim_{r \to 0} \frac{|A \cap B(x, r)|}{|B(x, r)|} = t\Big\}.$$
The Minkowski content $\mathcal M(A)$ of $A \subset \mathbb R^d$ is defined as the following limit (in case it exists):
$$\mathcal M(A) := \lim_{\varepsilon \to 0} \frac{|\{x : \operatorname{dist}(x, A) < \varepsilon\}|}{2\varepsilon}. \tag{2.1}$$
We denote by $\partial^* A$ the reduced boundary of a set $A$. This is the set of points where the measure-theoretic normal exists on the boundary of $A$ [2, Definition 3.54].
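The Minkowski content can be illustrated numerically. For a circle of radius 1 in $\mathbb R^2$, the quotient $|\{\operatorname{dist} < \varepsilon\}|/(2\varepsilon)$ equals $2\pi$ exactly for every $\varepsilon < 1$ (the annulus has area $4\pi\varepsilon$), so a grid approximation of the tube volume should return a value close to $2\pi$. The grid parameters below are illustrative choices.

```python
import math

def minkowski_content_circle(radius=1.0, eps=0.1, h=0.002):
    """Grid estimate of |{x : dist(x, S) < eps}| / (2*eps) for the circle S
    of the given radius in R^2, using a midpoint rule on a uniform grid."""
    lo = -(radius + 2 * eps)
    n = int(2 * (radius + 2 * eps) / h)
    count = 0
    for i in range(n):
        x = lo + (i + 0.5) * h
        for j in range(n):
            y = lo + (j + 0.5) * h
            # cell midpoint lies in the eps-tube around the circle
            if abs(math.hypot(x, y) - radius) < eps:
                count += 1
    return count * h * h / (2 * eps)

est = minkowski_content_circle()
print(est, 2 * math.pi)  # the estimate is close to the length 2*pi of the circle
```

The same quotient with a generic measurable set need not converge, which is why the definition above asks for the limit to exist.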

Functions of bounded variation
For an open set $\Omega \subset \mathbb R^d$ we let $BV(\Omega)$ denote the space of functions of bounded variation [2]. Let $u \in BV(\Omega)$ and $M \subset \Omega$ be an $\mathcal H^{d-1}$-rectifiable set with normal $\nu$ (defined $\mathcal H^{d-1}$-a.e.). For $\mathcal H^{d-1}$-a.e. point $x \in M$, the measure-theoretic traces in the directions $\pm\nu$ exist and are denoted by $u^{\pm\nu}(x)$ [2, Theorem 3.77]. These are the values approached by $u$ as the input tends to $x$ within the half-space $\{y : \langle y - x, \pm\nu\rangle > 0\}$; precisely,
$$u^{\pm\nu}(x) = \lim_{r \to 0} \frac{1}{|B^{\pm\nu}(x, r)|} \int_{B^{\pm\nu}(x, r)} u(y)\, dy, \qquad B^{\pm\nu}(x, r) := B(x, r) \cap \{y : \langle y - x, \pm\nu\rangle > 0\}.$$
We typically write $u^\nu$ instead of $u^{+\nu}$. We denote by $u^+(x) := \max\{u^\nu(x), u^{-\nu}(x)\}$ the maximum of the trace values, and likewise $u^-(x) := \min\{u^\nu(x), u^{-\nu}(x)\}$ for the minimum. Note that this notation is different from the one used in [2, Definition 3.67]. The standard total variation of a function $u \in L^1(\Omega)$ is denoted by $\mathrm{TV}(u)$ and satisfies $\mathrm{TV}(u) = |Du|(\Omega)$ for $Du$ the measure representing the distributional derivative ($+\infty$ if it is not a finite measure). For $u \in BV(\Omega)$ we let $J_u$ denote its jump set; see, for example, [2] for a definition.

Main results
Let $\Omega \subset \mathbb R^d$ be an open domain. We consider two non-negative measures $\rho_0, \rho_1$ which are absolutely continuous with respect to the $d$-dimensional Lebesgue measure. To simplify our notation we shall identify these measures with their densities from now on, meaning that $d\rho_i(x) = \rho_i(x)\, dx$. We define the nonlocal perimeter of a measurable set $A \subset \Omega$ with respect to the measures $\rho := (\rho_0, \rho_1)$ and a parameter $\varepsilon > 0$ as
$$\mathrm{Per}_\varepsilon(A; \rho) := \frac{1}{\varepsilon} \int_\Omega \Big(\operatorname*{ess\,sup}_{B(x, \varepsilon)} \chi_A - \chi_A(x)\Big) \rho_0(x)\, dx + \frac{1}{\varepsilon} \int_\Omega \Big(\chi_A(x) - \operatorname*{ess\,inf}_{B(x, \varepsilon)} \chi_A\Big) \rho_1(x)\, dx. \tag{2.2}$$
It arises as a special case of (1.4) by choosing $\mathcal X = \Omega$ and $\nu$ as the Lebesgue measure. Our main result that we prove in this paper is Gamma-convergence of the nonlocal perimeters (2.2) to the localized version which preserves the apparent anisotropy of the energy. To motivate the correct topology for Gamma-convergence we first state a compactness property of the energies.
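To make the definition concrete, the following sketch evaluates the perimeter in $d = 1$ for an interval, assuming the equivalent $\varepsilon$-layer representation in the spirit of Lemma 2.6: $\rho_0$ is integrated over the $\varepsilon$-layer outside $A$ and $\rho_1$ over the $\varepsilon$-layer inside $A$, divided by $\varepsilon$. The set and the densities are illustrative choices, not from the paper.

```python
# Nonlocal perimeter of A = (a, b) in d = 1, via the eps-layer form:
# Per_eps = (1/eps) * [ integral of rho0 over {0 < dist(x, A) < eps}
#                       + integral of rho1 over {x in A : dist(x, A^c) < eps} ].

def per_eps(eps, a=0.0, b=1.0, rho0=lambda x: 1.0, rho1=lambda x: 1.0, h=1e-5):
    """Per_eps((a, b); (rho0, rho1)) by midpoint quadrature (needs b - a > 2*eps)."""
    def integral(f, lo, hi):
        n = max(1, int(round((hi - lo) / h)))
        w = (hi - lo) / n
        return sum(f(lo + (i + 0.5) * w) for i in range(n)) * w
    outer = integral(rho0, a - eps, a) + integral(rho0, b, b + eps)  # outside A
    inner = integral(rho1, a, a + eps) + integral(rho1, b - eps, b)  # inside A
    return (outer + inner) / eps

# Constant densities rho0 = rho1 = 1: each boundary point contributes
# rho0 + rho1 = 2, so the value is close to 4.
print(per_eps(0.01))  # ≈ 4.0

# A density that jumps across x = 0: only the one-sided value of rho0 from
# outside A enters at the left boundary point.
print(per_eps(0.01, rho0=lambda x: 2.0 if x < 0 else 0.5))  # ≈ 4.5
```

The second example already hints at the role of one-sided traces of the densities at the interface, which drives the anisotropy of the limit.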
Theorem 2.1. Let $\Omega \subset \mathbb R^d$ be an open and bounded Lipschitz domain, and let $\rho_0, \rho_1 \in BV(\Omega) \cap L^\infty(\Omega)$ satisfy $\operatorname*{ess\,inf}_\Omega (\rho_0 + \rho_1) > 0$. Then for any sequence $(\varepsilon_k)_{k \in \mathbb N}$ with $\lim_{k \to \infty} \varepsilon_k = 0$ and collection of sets $A_k \subset \Omega$ with $\limsup_{k \to \infty} \mathrm{Per}_{\varepsilon_k}(A_k; \rho) < \infty$, we have that, up to a subsequence (not relabeled),
$$\chi_{A_k} \to \chi_A \text{ in } L^1(\Omega) \text{ for some set of finite perimeter } A \subset \Omega. \tag{2.3}$$

The assumption that $\rho_0$ and $\rho_1$ belong to $BV(\Omega)$ is crucial for the regularity of the set $A$ following from Theorem 2.1. To see this, fix a set $A \subset \Omega$ which is not a set of finite perimeter. Defining $\rho_0 = \chi_A$ and $\rho_1 = \chi_{A^c}$, we have that $\mathrm{Per}_\varepsilon(A; \rho) \equiv 0$, showing that (2.3) cannot hold. If one enforces the constraint $\operatorname*{ess\,inf}_\Omega \rho_i > 0$ on both densities, this problem is resolved; however, this is an unreasonable constraint in the context of classification since it would require both classes to be entirely mixed. Supposing now that $A$ is a set of finite perimeter (i.e., $\chi_A \in BV(\Omega)$) and $\rho_0, \rho_1 \in BV(\Omega)$, we define the function
$$\beta(\nu; \rho)(x) := \min\big\{\rho_0^{\nu}(x) + \rho_1^{\nu}(x),\; \rho_0^{-\nu}(x) + \rho_1^{-\nu}(x),\; \rho_0^{-\nu}(x) + \rho_1^{\nu}(x)\big\}, \tag{2.4}$$
where $\nu = \frac{D\chi_A}{|D\chi_A|}$ is the measure-theoretic inner normal for $A$. We suppress the dependence of $\beta(\nu; \rho)$ on $A$ as $\beta(\nu; \rho)$ is uniquely prescribed, in the sense that if $A_0$ and $A_1$ are two sets of finite perimeter then the definition (2.4) is $\mathcal H^{d-1}$-a.e. equivalent in $x \in \Omega$. Note that if $\rho_0$ and $\rho_1$ are also continuous, then it holds for all $\nu$ that $\beta(\nu; \rho) = \rho_0 + \rho_1$, but for general $BV$-densities $\beta$ may be anisotropic.
Theorem 2.3 (Gamma-convergence). Under the assumptions of Theorem 2.1, the functionals $\mathrm{Per}_\varepsilon(\cdot; \rho)$ Gamma-converge as $\varepsilon \to 0$ to $\mathrm{Per}(\cdot; \rho)$ in the $L^1(\Omega)$ topology, where the weighted perimeter is defined by
$$\mathrm{Per}(A; \rho) := \int_{\partial^* A \cap \Omega} \beta(\nu_A; \rho)\, d\mathcal H^{d-1}. \tag{2.8}$$

We note that Example 1 and Theorem 2.3 provide a direct interpretation for how the nonlocal perimeter (2.2) selects a minimal surface for adversarial training (1.1). The fidelity term $\mathbb E_{(x,y)\sim\mu}[\ell(u(x), y)]$ in (1.1) roughly wants to align $A$ with $\operatorname{supp} \rho_1$, for which a simple case is given by (2.6). In this situation, the cost function in (1.1) for $A = (-1, 0)$ takes the value $2\varepsilon$ and the perimeter picks up the value $\beta = 2$. In contrast, for the sets $A_\delta = (-1, \pm\delta)$ with $\delta > \varepsilon$, the cost function has the value $\delta + \varepsilon$ and these sets recover the limiting perimeter with $\beta = 1$. Hence, adversarial regularization reduces the cost by effectively performing a preemptive stabilizing perturbation of the classification region.
Remark 2.4 (Decomposition of the limit perimeter). We remark that the limit perimeter may be decomposed into an isotropic perimeter plus an anisotropic energy living only on the intersection of the jump sets $J_{\rho_0} \cap J_{\rho_1}$ of the two densities. In particular, one can directly verify that
$$\mathrm{Per}(A; \rho) = \int_{\partial^* A \cap \Omega} \big(\rho_0^- + \rho_1^-\big)\, d\mathcal H^{d-1} + \int_{\partial^* A \cap J_{\rho_0} \cap J_{\rho_1}} \min\big\{\big(\rho_0^{-\nu_A} - \rho_0^{\nu_A}\big)_+,\; \big(\rho_1^{\nu_A} - \rho_1^{-\nu_A}\big)_+\big\}\, d\mathcal H^{d-1},$$
where $\nu_A := \frac{D\chi_A}{|D\chi_A|}$ is the measure-theoretic inner unit normal of the boundary of $A$ and we use the notation $t_+ := \max(t, 0)$ for $t \in \mathbb R$. Note that the only case in which the non-isotropic term is different from zero is the case that $\rho_0^{\nu_A} < \rho_0^{-\nu_A}$ and $\rho_1^{\nu_A} > \rho_1^{-\nu_A}$.
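This decomposition can be checked numerically if one takes $\beta(\nu; \rho)$ to be the minimum of three competing trace sums (shift the interface to the $\nu$ side, shift it to the $-\nu$ side, or leave it in place, with $\rho_0$ seen from outside $A$ and $\rho_1$ from inside). This reading of (2.4) is an assumption made for illustration; the trace values below are free inputs rather than traces of actual $BV$ functions.

```python
import random

def beta(r0_nu, r0_mnu, r1_nu, r1_mnu):
    """Minimum of the three competing trace sums (assumed form of (2.4))."""
    return min(r0_nu + r1_nu, r0_mnu + r1_mnu, r0_mnu + r1_nu)

def beta_decomposed(r0_nu, r0_mnu, r1_nu, r1_mnu):
    """Isotropic part (minimal trace of each density) plus the anisotropic
    correction, nonzero exactly when r0_nu < r0_mnu and r1_nu > r1_mnu."""
    pos = lambda t: max(t, 0.0)
    iso = min(r0_nu, r0_mnu) + min(r1_nu, r1_mnu)
    return iso + min(pos(r0_mnu - r0_nu), pos(r1_nu - r1_mnu))

random.seed(0)
for _ in range(10000):
    traces = [random.uniform(0.0, 1.0) for _ in range(4)]
    assert abs(beta(*traces) - beta_decomposed(*traces)) < 1e-12
print("decomposition verified on 10000 random trace quadruples")
```

A simple case-check on which side each density attains its minimal trace reproduces the same identity by hand.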
Note that all the sets $A \in \{(\alpha, 2) : \alpha \in [-1, 1]\}$ minimize the Bayes risk $\mathbb E_{(x,y)\sim\mu}[|\chi_A(x) - y|]$. However, easy computations show that $\mathrm{Per}((\alpha, 2); \rho) = 4/16$ for $\alpha \in (-1, 1]$ and $\mathrm{Per}((-1, 2); \rho) = 3/16$. Hence, $A = (-1, 2)$ has the smallest perimeter among these minimizers and hence solves (2.11). In this case the optimal Bayes classifier saturates the entire support of $\rho_1$ at the cost of picking up the small jump $\lim_{x \uparrow -1} \rho_0(x) - \lim_{x \downarrow -1} \rho_0(x) = 1/16$. Note that if one symmetrizes the situation by setting the values of $\rho_0$ on $(-2, 1)$ and of $\rho_1$ on $(1, 2)$ to

Theorem 2.3 is a direct consequence of Theorem 3.2 for the lim inf inequality and Theorem 3.3 for the lim sup inequality. To prove Theorem 3.2, we proceed via a slicing method, which allows us to reduce the argument to an elementary, though technical, treatment of the lim inf inequality in dimension $d = 1$. To prove the lim sup inequality in Theorem 3.3, we apply a density result of De Philippis et al. [24] to reduce to the case of a regular interface. In this setting, with the target energy in mind, we can locally perturb the interface to recover the appropriate minimum in the definition of $\beta(\nu; \rho)$ for the given orientation.
As is well known, Theorem 2.3 implies convergence of minimization problems involving $\mathrm{Per}_\varepsilon(\cdot; \rho)$ to problems defined in terms of the local perimeter $\mathrm{Per}(\cdot; \rho)$ (see, e.g., [11]). Further, this result has a couple of important applications which we discuss in Section 4: Firstly, it immediately implies Gamma-convergence of the corresponding total variation. Secondly, as seen in Theorem 2.5, it has important implications for the asymptotic behavior of adversarial training (1.1) as $\varepsilon \to 0$. Lastly, we can use it to prove Gamma-convergence of discrete perimeters on weighted graphs in the $TL^p$ topology, see Theorem 4.7 in Section 4.
Future work will include using the results of this paper to study dynamical versions of adversarial training. We remark that one way to think of the anisotropy in the limiting energy is that trace discontinuities pick up derivative information (in the spirit of $BV$). To recover an effective sharp interface model with anisotropy even for smooth densities, one could then look to the "gradient flow" of the energy. For this, one can interpret (1.3) as a first step in a minimizing movements discretization of the gradient flow of the perimeter, where $\varepsilon > 0$ is interpreted as the time step. Iterating this and sending $\varepsilon \to 0$, one expects to arrive at a weighted mean curvature flow depending on the densities $\rho_i$. Furthermore, preliminary calculations also indicate that the next-order Gamma-expansion of the nonlocal perimeter, i.e., the expression $\varepsilon^{-1}\big(\mathrm{Per}_\varepsilon(A; \rho) - \mathrm{Per}(A; \rho)\big)$, relies on the gap between the trace values of $\rho_0$ and $\rho_1$ on $\partial A$, which induces anisotropy even for smooth densities.

Auxiliary definitions and reformulations of the perimeter
To work with the perimeter (2.2), it is convenient to reformulate the energy so that it resembles the thickened sets introduced in the definition of the $(d-1)$-dimensional Minkowski content (2.1). Recalling that $A^t$ denotes the points of density $t$ in $A$, for our setting we have the following lemma, which may be directly verified (see also [19]).
Lemma 2.6. The perimeter (2.2) of a measurable set $A \subset \Omega$ admits an equivalent representation in terms of $\varepsilon$-layers around $A$, and furthermore it holds that $\mathrm{Per}_\varepsilon(A; \rho) = \mathrm{Per}_\varepsilon(A^1; \rho)$, meaning that $\mathrm{Per}_\varepsilon(A; \rho)$ can be expressed in terms of $A^1$.
The above formulations provide a clear way to define restricted (localized) versions of the nonlocal perimeter. For measurable subsets $A$ and $\Omega' \subset \Omega$, we define outer and inner nonlocal perimeters (respectively).

Note that by definition the monotonicity property Per
3 Gamma-convergence and compactness

In this section we will prove that the Gamma-limit of the nonlocal perimeters $\mathrm{Per}_\varepsilon(A; \rho)$, defined in (2.2), is given by $\mathrm{Per}(A; \rho)$, defined in (2.8), thereby completing the proof of Theorem 2.3.

Compactness
We can directly turn to the proof of compactness. The argument adapts the approach introduced in [19, Theorem 3.1], but takes care to account for the fact that the densities $\rho_i$ are allowed to vanish.
Proof of Theorem 2.1. Let us define the sequences of functions $(u_k)_{k \in \mathbb N}$ and $(v_k)_{k \in \mathbb N}$ as in (3.1). Here $u_k$ changes value in a small layer outside $A_k$ and, similarly, $v_k$ transitions to 1 inside $A_k$. Recalling that the gradient of the distance function has norm 1 almost everywhere outside of the 0 level-set (see, e.g., [25]), up to a null-set these functions satisfy (3.2). By the hypothesis $\operatorname*{ess\,inf}_\Omega (\rho_0 + \rho_1) > 0$, we have $\rho_0 + \rho_1 > c_\rho$ almost everywhere in $\Omega$ for some constant $c_\rho > 0$. Hence, the sets $\Omega_i := \{\rho_i > \delta\}$ cover $\Omega$ for $0 < \delta < c_\rho/2$, i.e., $\Omega = \Omega_0 \cup \Omega_1$. Applying the coarea formula to $\rho_0$ and $\rho_1$, we may choose $\delta$ such that $\Omega_i$ for $i = 0, 1$ are sets of finite perimeter.
Applying the chain rule for BV functions [2, Theorem 3.96] with $f : (y, z) \mapsto yz$, we find that $u_k \chi_{\Omega_0} \in BV(\Omega)$. Note that by construction $\mathrm{TV}(\chi_{\Omega_0}) < \infty$. Using this together with (3.2) shows that $u_k \chi_{\Omega_0}$ is bounded uniformly in $BV(\Omega)$. Similarly, we also have that $v_k \chi_{\Omega_1}$ is bounded uniformly in $BV(\Omega)$. Consequently, we apply BV-compactness (see, e.g., [2, Theorem 3.23]) to both sequences, to find that $u_k \chi_{\Omega_0} \to u$ and $v_k \chi_{\Omega_1} \to v$ in $L^1(\Omega)$ up to a subsequence. Using lower semi-continuity of the total variation we get $\mathrm{TV}(\chi_U) < \infty$ and similarly $\mathrm{TV}(\chi_V) < \infty$, meaning that both sets have finite perimeter in $\Omega$. Consequently, the limit set has finite perimeter.

Liminf bound
We now prove the associated lim inf bound in the definition of Gamma-convergence for Theorem 2.3. The argument relies on slicing techniques, which allow the general $d$-dimensional case to be reduced to one dimension. To keep this part of the argument relatively self-contained and notationally unencumbered, we perform the slicing argument locally, while remarking that the approach developed in [10] could also be applied.
In our proof of the 1-dimensional case, we will need the following auxiliary lemma, which is a direct consequence of Reshetnyak's lower semi-continuity theorem.
Proof. We restrict our attention to $i = 0$. Note that, up to an equivalent representative, we may assume that $\rho_0 = \rho_0^-$ is a lower semi-continuous function defined everywhere (see [2, Section 3.2]). Recall the function $u_k$ introduced in the proof of Theorem 2.1 in (3.1). As $u_k \to \chi_A$ in $L^1(\{x : \rho_0 > \delta\})$ for all $\delta > 0$, we may apply the Reshetnyak lower semi-continuity theorem [44, Theorem 1.7] in each open set $\{x \in I : \rho_0 > \delta\}$. Letting $\delta \to 0$ and applying (3.4) concludes the result.
Theorem 3.2. Let $\Omega \subset \mathbb R^d$ be an open, bounded subset with Lipschitz boundary, and assume that $\rho_0, \rho_1 \in BV(\Omega) \cap L^\infty(\Omega)$.

Proof. For notational convenience, we suppress the dependence on $\rho$ and write $\mathrm{Per}_\varepsilon(A)$ instead of $\mathrm{Per}_\varepsilon(A; \rho)$. Likewise, we consider $\varepsilon \to 0$, with the knowledge that this refers to a specific subsequence. Furthermore, we define $\nu := \frac{D\chi_A}{|D\chi_A|}$ and write $\beta_\nu$ instead of $\beta(\frac{D\chi_A}{|D\chi_A|}; \rho)$. We split the proof into two steps. In Step 1, we show that the result holds in dimension $d = 1$. In this setting, a good representative of a $BV$ function possesses one-sided limits everywhere, which will effectively allow us to reduce to the consideration of densities given by Heaviside functions and an elementary analysis. To recover the lim inf in general dimension, in Step 2, we use a slicing and covering argument along with fine properties of both $BV$ functions and sets of finite perimeter.
Step 1: Dimension $d = 1$. We may without loss of generality suppose that $\Omega$ is a single connected open interval and $\chi_A \in BV(\Omega; \{0, 1\})$; consequently, the reduced boundary $\partial^* A$ is a finite subset of $\Omega$. With this in hand, one can cover each element of $\partial^* A$ by pairwise-disjoint neighborhoods and apply the inequality (3.5) to conclude the theorem in the case $d = 1$. We first suppose without loss of generality that $x_0 = 0$, $\nu(x_0) = -1$, and that for any $\eta \le \eta_0$ we have $\partial^* A \cap (-\eta, \eta) = \{0\}$. Up to choosing a smaller $\eta_0$, we proceed by contradiction and suppose that there is a subsequence along which (3.6) holds for all $\eta \le \eta_0$. Applying Lemma 3.1 and checking all the cases for the two minima on the right-hand side shows that we must have (3.9). Using (3.9), we see that, without loss of generality, we may suppose that (3.10) holds. Using (3.10) inside (3.6) and applying (3.7) with the minimum identified by (3.9), we have (3.11) for some $\delta > 0$. As (3.6) and (3.11) are unaffected by choosing $\eta$ smaller, we now restrict $\eta$ to be $\eta < \delta/4$ and sufficiently small such that, for $i = 0, 1$, the one-sided limits are approximately satisfied, precisely as in (3.12). Using Lemma 2.6, we obtain the representation (3.13), with the key point being that $\mathrm{Per}^1_\varepsilon$ can also be expressed in terms of the same underlying set $A^1_\varepsilon$. Using this representation, by (3.11) and (3.12), for all $\varepsilon$ sufficiently small there is a suitable point $x_\varepsilon$. Using $x_\varepsilon$, the representation (3.13), and (3.12), together with (3.6) and (3.7) for $i = 1$ with minimum determined via (3.9), and taking $\eta \to 0$, we obtain a contradiction to the definition of $\beta_\nu$ in (2.4).
Step 2: Dimension $d > 1$. For any $\eta > 0$, we show that for $\mathcal H^{d-1}$-almost every $x_0 \in \partial^* A$ there is $r_0 := r_0(x_0, \eta) > 0$ such that for every $r < r_0$ the estimate (3.14) holds, where we recall that $Q_\nu(x_0, r)$ is a cube oriented along $\nu$. Supposing we have proven this, we may apply the Morse covering theorem [27] to find a countable collection of disjoint cubes $\{Q_{\nu(x_i)}(x_i, r_i)\}_{i \in \mathbb N}$ satisfying (3.14) and covering $\partial^* A$ up to an $\mathcal H^{d-1}$-null set. Directly estimating, taking $n \to \infty$, and then letting $\eta \to 0$ concludes the theorem.
Turning now to prove (3.14), we apply the De Giorgi structure theorem to conclude that, up to an $\mathcal H^{d-1}$-null set, $\partial^* A = \bigcup_{i \in \mathbb N} K_i$, where each $K_i$ is a subset of a $C^1$ manifold. Consequently, for $\mathcal H^{d-1}$-almost every $x_0 \in \partial^* A$, there is $i \in \mathbb N$ such that the density relations (3.15) hold and the normals are aligned with $\nu(x_0) = \nu_{K_i}(x_0)$.
Fixing $x_0$ as above, choose a scale $r_0$ such that the density relations in (3.15) hold up to error $\eta$, and such that the normal of $\partial^* A$ is nearly constant there. After some algebraic manipulation, one sees that (3.16) implies (3.17) and (3.18). Without loss of generality, we may assume $x_0 = 0$ and $\nu_{\partial^* A}(0) = e_d$. We perform a slicing argument. For notational convenience, we choose $A_\varepsilon$ to be given by the equivalent representative $A^1_\varepsilon$, allowing us to use the representation (3.13) without writing the superscript. For the sliced sets, the relevant distances are taken directly in $\mathbb R$. We now use the representation (3.13), Fubini's theorem with $x = (x', x_d)$, and (3.20) to estimate the energy on slices. Applying Fatou's lemma and Step 1 in the previous inequality, we obtain (3.21). Using this, and subsequently the coarea formula [2], (3.17), (3.18), and the $L^\infty$ bound on the densities ($\|\rho_i\|_{L^\infty} \le C$), we arrive at an estimate which together with (3.21) concludes (3.14) and the theorem.

Limsup bound
In this section, we show that for a given target classification region $A \subset \Omega$, we can construct a recovery sequence with the optimal asymptotic energy. The result is precisely stated in the following theorem. The proof of this theorem relies on technical properties of $BV$ functions, but is conceptually simple. Our approach is outlined in the following steps: 1. We use a recent approximation result of De Philippis et al. [24] to approximate the $BV$ function $\chi_A$ by a function $u \in BV(\Omega)$ having higher regularity. We select a level-set, given by $A_\eta$, of $u$ such that $\partial A_\eta = \partial^* A_\eta$, $\mathcal H^{d-1}(\partial A_\eta \triangle \partial^* A) \ll 1$, and a large portion of $\partial A_\eta$ is locally given by a $C^1$ graph.
2. We then break ∂A η into a good set, with smooth boundary, and a small bad set, with controllable error.
3. In the good set, $A_\eta$ has $C^1$ boundary, and here we construct an almost optimal improvement by perturbing the smooth interface. This is the result of Proposition 3.7.
(a) Using a covering argument, we localize the construction of a near optimal sequence and introduce improved approximations in balls centered on points x 0 ∈ ∂A η .
(b) Depending on the minimum value of $\beta(\nu; \rho)$, where $\nu$ is the inner normal of $A_\eta$, we either shift the interface up or down slightly or, in the latter case of the minimum, leave it unperturbed to recover the optimal trace energies. The essence of this is to approximately satisfy $\rho_0 + \rho_1 \approx \beta(\nu; \rho)$ on the modified interface.
4. Diagonalizing on approximations for $A_\eta$ and then on $\eta$, one obtains a recovery sequence for the original set $A$.
We begin with the proof of Proposition 3.7 for Step 3, and for this, a couple of auxiliary lemmas will make the argument easier. In our construction, we will select an appropriate level-set using the following lemma, which says that given control on an integral, one can control the integrand in a large region.

Lemma 3.4. Let $f : (a, b) \to [0, \infty)$ be integrable and $\theta \in (0, 1)$. Then
$$\Big|\Big\{s \in (a, b) : f(s) > \frac{1}{\theta(b - a)} \int_a^b f\, ds\Big\}\Big| \le \theta (b - a).$$

Proof. Using Markov's inequality one computes the claimed bound.

To control errors arising in our interface construction, we will take advantage of the assumption $\rho_i \in L^\infty(\Omega)$. Specifically, energetic contributions of $\mathcal H^{d-1}$-small pieces of our construction will be thrown away using the following proposition for the classical Minkowski content.

Proposition 3.5 (Theorem 2.106 [2]). Suppose that $f : \mathbb R^{d-1} \to \mathbb R^d$ is Lipschitz and $K \subset \mathbb R^{d-1}$ is compact. Then $\mathcal M(f(K)) = \mathcal H^{d-1}(f(K))$, where $\mathcal M$ is the $(d-1)$-dimensional Minkowski content defined in (2.1).
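Lemma 3.4 is a Markov-type statement, and a discretized sketch illustrates the bound: the set where a non-negative function exceeds $1/\theta$ times its average has measure at most $\theta(b - a)$. The function $f$ and the parameters below are arbitrary choices.

```python
# Discretized illustration of the Markov-type bound behind Lemma 3.4.
# For integrable f >= 0 on (a, b) and theta in (0, 1), the set where f
# exceeds 1/theta times its average value has measure <= theta * (b - a).

def markov_bad_set_fraction(f, a, b, theta, n=100000):
    """Fraction of (a, b) where f > average(f) / theta, on a midpoint grid."""
    h = (b - a) / n
    vals = [f(a + (i + 0.5) * h) for i in range(n)]
    avg = sum(vals) / n
    return sum(1 for v in vals if v > avg / theta) / n

frac = markov_bad_set_fraction(lambda s: s * s, 0.0, 1.0, theta=0.5)
print(frac)  # ≈ 0.18, comfortably below the guaranteed bound 0.5
assert frac <= 0.5
```

In the construction, this is exactly the device that produces "many" admissible level-sets: the complementary set, of measure at least $(1 - \theta)(b - a)$, consists of levels at which the integrand is controlled.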
In the proof of Step 3, in the case that the optimal energy sees the traces on both sides of the interface, we will need a recovery sequence with a non-flat interface. We will apply the following lemma to show that the weighted perimeters converge on either side of the interface.

Lemma 3.6. Suppose that $A \subset Q(0, r)$ is given by the sub-graph of a function in $C^1(Q'(0, r))$ which does not intersect the top and bottom of the cube $Q(0, r)$. Then the localized nonlocal perimeters converge on either side of the interface.

Proof. Again, we suppress the dependency on $\rho$. It suffices to show convergence for the outer perimeter $\mathrm{Per}^0_\varepsilon(A; Q(0, r))$ defined in (2.14). Let $g_0 : Q'(0, r) \to (-r/2, r/2)$ be the function prescribing the graph associated with $\partial A$. Let $x \in \mathbb R^d \mapsto \operatorname{sdist}(x, A)$ denote the signed distance from $A$, with the convention that it is non-negative outside of $A$. Note first that $\operatorname{sdist}^{-1}(s) \cap Q(0, r - 2s)$ is the graph of a Lipschitz function $g_s : Q'(0, r - 2s) \to (-r/2, r/2)$ with uniformly bounded gradient depending on $g_0$ for all $s > 0$ sufficiently small. To see this, note that the level set of sdist can be expressed by translating the graph of the boundary, that is, as in (3.22), which is the supremum of equicontinuous bounded functions. In fact, we have that (3.23) holds, where $\omega$ is the modulus of continuity of $\nabla g_0$ in $Q(0, r)$. To see this, we note that for sufficiently small $t \in \mathbb R^{d-1}$ and $x' \in Q'(0, r - 2s)$, by (3.22), there is always a $\nu_t \in S^{d-1}$ such that $g_s(x' + t) = g_0(x' + t + s\nu'_t) - s\nu_{t,d}$. Using the mean value theorem, we can estimate the difference quotient, where $\theta \in (0, 1)$; the same bound holds from below. Now assuming that $x'$ is a point of differentiability for $g_s$, we insert $g_s(x' + t) - g_s(x') = \langle \nabla g_s(x'), t\rangle + o(|t|)$ into the above inequality. Fixing $\tilde r \in (0, 1)$, taking the supremum over $t \in \tilde r S^{d-1}$, dividing by $\tilde r$, and then sending $\tilde r \to 0$, this inequality becomes $|\nabla g_s(x') - \nabla g_0(x')| \le \omega(s)$, and as Lipschitz functions are differentiable almost everywhere, we recover (3.23). Define $\varphi(x', s) := \varphi_s(x') := (x', g_s(x'))$. We now apply the coarea
and area formulas to rewrite the perimeter. Using the area formula once again and taking the difference, we can estimate the error. Taking the lim sup as $\varepsilon \to 0$ and then letting $\delta \to 0$ in the above estimate, we see that the lemma will be concluded if we show that (3.24) holds. Note that $(x', s) \mapsto \varphi(x', s)$ is a bi-Lipschitz function on $Q'(0, r(1-\delta)) \times (0, \varepsilon)$ for sufficiently small $\varepsilon$. By [2, Theorem 3.16] on the composition of BV functions with Lipschitz maps, it follows that $(x', s) \mapsto \rho_0(\varphi(x', s))$ is a $BV$ function. Further, by fine properties of $BV$ functions (see [2, Theorem 3.108 (b)]), it follows that (3.25) holds. Rewriting [2, Eq. (3.88)], for $f \in BV(Q(0, r(1-\delta)))$, we obtain (3.26). Inserting $f = \rho_0 \circ \varphi$ into the above equation and using (3.25) and (3.26) concludes the proof of (3.24) and thereby the lemma.
We now prove that smooth sets have near optimal approximations, completing the proof of Step 3. As a matter of notation, we will typically consider the closure and boundary of a set A relative to Ω and denote these by A and ∂A, respectively. However, to denote the closure of a set A in R^d, we will write A^{R^d}. This distinction will be important to ensure that the energy does not charge the boundary of Ω. Then for any η > 0, there is A_η such that A_η = A in a neighborhood of ∂Ω and the following inequalities hold:

Proof. The primary challenge in this construction is controlling the interaction between the interface ∂A and J_ρ := J_{ρ_0} ∪ J_{ρ_1}, the union of the jump sets of ρ_0 and ρ_1; a secondary challenge is to ensure that ∂A does not charge the boundary ∂Ω. We denote the inner normal of A by ν(x) := Dχ_A/|Dχ_A|(x), abbreviate β_ν = β(Dχ_A/|Dχ_A|; ρ), and suppress the dependence on ρ for notational simplicity. We write ∂A as the union of a good surface S_G and a bad surface S_B. We will select neighborhoods of Lebesgue points useful to our construction, and as such, we are only interested in keeping track of properties of S_G and S_B up to an H^{d−1}-null set. We may assume the following are satisfied up to H^{d−1}-null sets:

1.) For each point x ∈ ∂A, there is r_x > 0 such that, up to a rotation and translation, A is locally given by the subgraph of a C^1 function g_x centered at the origin with ∇g_x(0) = 0.
2.) For each point x ∈ ∂A, where we recall that Q_ν(x, r) is a cube oriented in the direction ν.
7.) For x ∈ S_B, we may assume

To prove the proposition, we will use a covering argument to select cubes containing the majority of points in S_G and S_B, for which the limits in the above list are approximately satisfied up to a small relative error η ≪ 1. Inside each of these nice cubes, we will modify the interface or leave it alone, depending on the target energy.
We begin by showing that for fixed η > 0 and for H^{d−1}-almost every x ∈ ∂A, there is r_0(x, η) > 0 such that for any r < r_0 there is a modification of A, denoted A_η, for which A_η = A in a neighborhood of ∂Q_{ν(x)}(x, r) and the inequalities hold. As the idea is similar for x ∈ S_G, we focus on the more complicated situation where x ∈ S_B.
Without loss of generality, we suppose that x = 0 ∈ S_B, that ν(0) = −e_d, and that the relevant properties of Hypothesis 1.)-Hypothesis 8.) are satisfied. Recalling the definition of β_ν in (2.4), to construct a modification of A in a small box centered at 0, we distinguish two cases: either β_ν(0) < ρ_0^{−ν}(0) + ρ_1^{ν}(0), or β_ν(0) = ρ_0^{−ν}(0) + ρ_1^{ν}(0). In the first case, we shift the interface up or down to recover the optimal trace. In the second case, the energy is best when picking up the traces from either side of A, and consequently we will not modify A, but must show that this suffices.
By Hypothesis 1.), the relative height satisfies

h_{∂A}(0, r) := sup_{y ∈ Q′(0,r)} |g_0(y)|/r → 0 as r → 0,

where Q′ denotes the (d − 1)-dimensional cube. Consequently, also using that ∇g_0 is continuous, we may choose r_0 = r(0, η) ≪ 1 accordingly. Further, by Hypothesis 7.), we may take r_0 ≪ 1 such that the corresponding smallness estimate holds for r < r_0, which gives the required bound. Thus, by Lemma 3.4, for θ ∈ (0, 1), we may find a θ-fraction of values a ∈ (ηr, 2ηr) with the desired slice estimate. Further, as J_ρ is σ-finite with respect to H^{d−1} (as the absolute value of the jump is positive and integrable on this set), we can assume that for such choices of the value a the analogous identities also hold. By Lemma 3.6 and (3.30), for all r′ < r, we have convergence of the localized perimeters. Now, we introduce a second small parameter 0 < δ < 1 (which can be taken equal to η), which we use to shrink the cube under consideration. Defining A_η in the cube Q(0, r) to be A ∪ (Q′(0, r(1 − δ)) × (−a, a)), the L^1 estimate of (3.28) follows immediately. By Proposition 3.5, we have the upper bound for the modified set, and one can additionally see that the remaining error is controlled by the boundary contributions, where j sums over the faces. Consequently, we may apply (3.29) and (3.31), the coarea formula [2], and Hypotheses 2.) and 3.) to find (3.32). As the constant C > 0 arising in the previous inequality is independent of x and r, taking δ = η and replacing η by η/C in the definition of A_η, the proof of this case is concluded.

Case 2: β_ν(0) = ρ_0^{−ν}(0) + ρ_1^{ν}(0). For A = A_η, the L^1 estimate of (3.28) is trivially satisfied. By Lemma 3.6, we have convergence of the ε-perimeters. Assuming that the limits in Hypotheses 3.) and 8.) are satisfied up to error η for r < r_0 = r(0, η), we can estimate as in (3.32). Once again, as η > 0 is arbitrary, the proof of (3.28) is complete.
By the Morse measure covering theorem [27], we may choose a finite collection of disjoint cubes {Q_i}, compactly contained in Ω and satisfying (3.28), which capture all but a small fraction of the surface measure of ∂A. We define A_η to be the set satisfying (3.28) in each of the finitely many disjoint cubes Q_i, and to coincide with A outside of these cubes. The L^1 estimate within the proposition statement is then satisfied. For each point x in the remaining portion F of ∂A, the surface is locally given, up to rotation, as the C^1-graph over a compact set, which we denote by C_x. Applying the Besicovitch covering theorem [27], we may find a finite subset of {C_x}, denoted {C_j}, covering F such that each point x belongs to C_j for at most C(d) many j.
Noting the bound from Proposition 3.5 and using the properties of the selected Q_i, we then estimate the total energy. As η > 0 is arbitrary, we conclude the proof of the proposition.
We now complete the proof of the lim sup bound in Theorem 3.3, and thereby the Gamma-convergence result of Theorem 2.3.
Proof of Theorem 3.3. As usual, we denote ε_k by ε. By Theorem 2.1, we may without loss of generality assume that χ_A ∈ BV(Ω).
Step 1: Construction of an approximating "smooth" set. For η > 0, we apply the approximation result [24, Theorem C] to find u ∈ SBV(Ω) such that (3.33) holds. We note that the approximation is found by applying [24, Theorem C] to χ_A − 1/2, and the L^∞ bound follows from the comment at the top of page 372 therein (in fact, H^{d−1}(M \ J_u) = 0 at this point).
We will select a level set of u to approximate A, and to do this, we will need to know that the boundary of the level set is well-behaved: away from M, it will suffice to apply Sard's theorem to see that this is a C^1 manifold; however, it is possible that the level-set boundary oscillates as it approaches M and creates a large intersection. To ensure this does not happen, we begin by modifying u so that

u is C^1 up to the manifold M and ∂Ω,    (3.34)

the precise meaning of which will become apparent.
To modify u so that it satisfies (3.34), we locally use reflections to regularize. We remark that the trace is well defined on M, and up to a small extension of the manifold at ∂M, we can assume that u^+ = u^− in a neighborhood of ∂M (here ∂M means the manifold boundary). We modify u as follows: for each x ∈ M ∩ {u^+ = u^−}, we choose r_x > 0 such that M ∩ B(x, r_x) is a graph and B(x, r_x) ∩ ∂M = ∅. Consider a partition of unity {ψ_i} with respect to a (finite) cover {B(x_i, r_i)} of {u^+ = u^−}. For each ball B(x_i, r_i), define M_i^± to be the ball intersected with the sub- or super-graph. In M_i^±, we can reflect and mollify uψ_i. Choosing fine enough mollifications, restricting to M_i^±, and adding together the mollified functions (and u(1 − Σ_i ψ_i)) provides the desired approximation satisfying (3.34) and preserving the relations in (3.33). Similarly, one may smoothly extend u to R^d as M ⊂⊂ Ω.
We control the second right-hand side term in (3.37) as follows. As in (3.42), we have the corresponding trace bound. Consequently, by Lemma 3.4, u^{ν_A} − u^{−ν_A} > 3/4 on J_u ∩ ∂*A outside of a set with H^{d−1}-measure less than Cη. As the normal ν_{A_s} coincides with the direction of positive change for u, for s ∈ (3/8, 5/8) we have that ν_A = ν_{A_s} outside of a small set controlled by η. In other words, the corresponding identity holds. Putting this estimate together with the bounds (3.36) and (3.44), we conclude (3.35).
Note first that by Sard's theorem, for almost every s ∈ (0, 1), the set ∂A_s ∩ Ω \ {u^+ = u^−} is a C^1 surface away from M ∪ ∂Ω (i.e., locally in Ω \ M). We will show that the claim holds locally in {u^+ = u^−} ∪ ∂Ω, meaning that for any x_0 ∈ {u^+ = u^−} ∪ ∂Ω, there is a radius r > 0 such that, for almost every s ∈ (0, 1), the claimed representation holds in B(x_0, r) for some compact set N ⊂ Ω (this allows us to avoid proving anything on ∂B(x_0, r)). With the claim satisfied locally, a covering argument concludes the claim.
We will assume that x_0 ∈ {u^+ = u^−}, as the case x_0 ∈ ∂Ω is simpler. Recall that we chose M so that dist({u^+ = u^−}, ∂M) > 0, so that there is r > 0 such that B(x_0, 2r) ∩ M is a C^1 surface. Let M^+ and M^− be the associated super-graph and sub-graph in B(x_0, 2r), respectively. By (3.34), u|_{M^±} has a C^1 extension to R^d, which we denote by u_{ext,±}. Similarly, denoting the trace of u from M^± onto M ∩ B(x_0, 2r) by u^±, we have that u^± belongs to C^1(M ∩ B(x_0, 2r)). Applying Sard's theorem three times, in M^± and in M ∩ B(x_0, 2r), we find that for almost every s ∈ (0, 1) the associated level-set boundaries are C^1 surfaces of dimension d − 1 and d − 2, respectively, and further that the corresponding identity holds, where ∂_M denotes the boundary with respect to the topology relative to M. Clearly we have the associated inclusion, and taking N := (∂_M{u^+ > s} ∪ ∂_M{u^− > s}) ∩ B(x_0, r), the claim is locally satisfied.
Step 2: Good and bad parts of ∂A_s. We now fix s ∈ (3/8, 5/8) such that the previous step holds with regularity (as in Substep 1.2) and estimate (3.35) for A_s := {u > s}. We construct open sets U_1 and U_2. The set U_1 will only contain points of ∂A_s where it is a smooth manifold. The set U_2 will contain the H^{d−1}-small collection of points in Ω for which ∂A_s^{R^d} is not given by a smooth manifold, i.e., it contains N.
We choose N ⊂ U_2 ⊂ {y : dist(y, N) ≤ δ} for 0 < δ ≪ 1 such that the surface measure of ∂A_s in U_2 is small; this is possible as ∩_{δ>0}{y : dist(y, N) ≤ δ} = N and H^{d−1}(N) = 0. For every x ∈ ∂A_s \ U_2, there is a local neighborhood contained in Ω in which the boundary is a graph. Consequently, we may choose U_1 ⊂⊂ Ω to be an open set with smooth boundary containing ∂A_s \ U_2, well separated from the sets arising from interface intersections, and with U_1 ∩ N = ∅.
Step 3: Near optimal approximation. We will now construct a new set A_η by modifying A_s in U_1, while in U_2 we leave the surface unchanged. We will show that this set A_η satisfies (3.46). Within each U_i, the constructed A_η will coincide with A_s in a neighborhood of ∂U_i. Consequently, for sufficiently small ε > 0, we have the decomposition (3.47), where M^ε is defined in (2.1). By the properties (3.45), one can argue as at the end of the proof of Proposition 3.7 (with {C_j}) to find the Minkowski content bound (3.48). Further, by Proposition 3.7, we can construct the modification in U_1. With this, using (3.35), we see that the L^1 estimate in (3.46) immediately follows. To obtain the lim sup inequality, we apply the decomposition (3.47), the estimate of the Minkowski content (3.48), and the bound on the difference between ∂A_s and ∂*A in (3.37). Up to redefinition of η to absorb the constant C, this concludes (3.46), which by a diagonalization argument concludes the theorem.
For use in the next section, we highlight that the approximation introduced in the above proof may be used as a near-optimal constant recovery sequence.

Corollary 3.8. If the conditions of Theorem 2.3 hold true, then for all measurable sets A ⊂ R^d with Per(A; ρ) < ∞ and for all η > 0, there is a set A_η with smooth boundary away from a finite union of (d − 2)-dimensional manifolds having transverse intersection with the domain boundary, in the sense that

Applications
Having proved our main statement, Theorem 2.3, we now turn towards applications. We deduce Gamma-convergence of the total variation functional associated to Per_ε(·; ρ) in Section 4.1; in Section 4.2 we discuss the asymptotic behavior of adversarial training as ε → 0; and in Section 4.3 we define discretizations of the nonlocal perimeter on random geometric graphs and prove their Gamma-convergence.

Gamma-convergence of total variation
We define the nonlocal total variation of u ∈ L^1(Ω) as follows. One can easily check (see also [12, Proposition 3.13] for L^∞-functions) that TV_ε satisfies the generalized coarea formula. For any integrable function, the set of values t for which the level set {u = t} has positive Lebesgue measure is a null set. Consequently, as definition (2.2) is invariant under modification by null sets, we may rewrite this accordingly. This motivates us to define a limiting version of this total variation, given in (4.3), which is identified as the Gamma-limit of TV_ε in the following theorem.

Proof. The result is a consequence of Theorem 2.3, (4.2), and [18, Proposition 3.5], a generic Gamma-convergence result for functionals satisfying a coarea formula.
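The generalized coarea formula above mirrors the classical layer-cake identity TV(u) = ∫ Per({u > t}) dt for BV functions. The following minimal numerical sketch (a hypothetical 1D discretization, not the paper's nonlocal functional) illustrates that identity by comparing the discrete total variation of a function with the integral over thresholds t of the number of interface points of its level sets {u > t}.

```python
# Numerical sanity check of the coarea identity TV(u) = ∫ Per({u > t}) dt
# on a 1D grid (illustrative only; not the nonlocal functional TV_eps).
u = [0.0, 0.3, 0.1, 0.7, 1.0, 0.4]

# Discrete total variation: sum of jumps between neighbors.
tv = sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))

def perimeter(t):
    """Number of sign changes of chi_{u > t}, the discrete 'perimeter' of the level set."""
    chi = [1 if v > t else 0 for v in u]
    return sum(abs(chi[i + 1] - chi[i]) for i in range(len(chi) - 1))

# Integrate Per({u > t}) over t in (0, 1) with a midpoint rule.
m = 20000
dt = 1.0 / m
integral = sum(perimeter((k + 0.5) * dt) for k in range(m)) * dt

print(tv, integral)  # the two values agree up to the quadrature error
```

Since each edge of the grid contributes exactly |u_{i+1} − u_i| to the t-integral, both quantities equal 2.0 here up to the midpoint-rule error.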
We further provide a natural integral characterization of the limit energy (4.3). Here we cannot directly argue via a density argument, as β's behavior on (d − 1)-dimensional sets is not "continuous" when approximated from the bulk.

Proof. For u fixed, we may define β := β(Du/|Du|; ρ) for H^{d−1}-almost every point and treat it as a fixed function. Given the properties of the jump set [2, Section 3.6], it follows that β has an H^{d−1}-equivalent Borel representative. We can then rewrite the equality in the proposition as an identity in which β is a generic positive, bounded, Borel measurable function. By a standard approximation argument, (4.4) will follow if we show that it holds for β := χ_A for any Borel measurable subset A ⊂ Ω.
To extend to a generic Borel subset, define the class S := {A ⊂ Ω : (4.4) holds for β := χ_A}. For A an open subset of Ω and β := χ_A, (4.4) reduces to the standard coarea formula for BV functions [2]. Consequently, S contains all open subsets. Noting that S satisfies the hypotheses of the monotone class (or π-λ) theorem [25, Theorem 1.4], it follows that S contains all Borel subsets, concluding the proposition.
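For orientation, the identity being extended is, in our paraphrase (a standard statement of the localized coarea formula for BV functions; the exact display (4.4) is not reproduced here):

```latex
% Localized coarea formula for u \in BV(\Omega), restricted to a Borel set A:
\int_A \mathrm{d}|Du|
  \;=\;
\int_{-\infty}^{\infty} \mathcal{H}^{d-1}\big(\partial^{*}\{u > t\} \cap A\big)\,\mathrm{d}t .
```

Both sides define Borel measures in A which agree on open sets, which is why the monotone class argument upgrades the identity from open to arbitrary Borel subsets.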

Asymptotics of adversarial training
In this section we apply our Gamma-convergence results to adversarial training (1.1). For this we let µ ∈ M(Ω × {0, 1}) be the measure characterized through the densities ρ_0 and ρ_1, and we choose as classifiers C the collection of characteristic functions of Borel sets. As proved in [12], the problem can equivalently be reformulated as a perimeter-regularized risk minimization, where Per_ε(A; ρ) is the nonlocal perimeter (2.2) for which we proved Gamma-convergence in Section 3. Before we turn to convergence of minimizers of this problem, we first discuss an alternative and simpler model for adversarial training, which arises from fixing the regularization parameter in front of the nonlocal perimeter in (4.7) to α > 0. Unless α = ε, this problem is no longer equivalent to adversarial training, but it can be interpreted as an affine combination of the non-adversarial and the adversarial risk. Gamma-convergence as ε → 0 is an easy consequence of Theorem 2.3, since (4.8) is a continuous perturbation of a Gamma-converging sequence of functionals. Let us now continue the discussion of the original adversarial training problem (4.6) (or equivalently (4.7)). In order to preserve the regularizing effect of the perimeter, it is natural to take the approach of [7, 13], developed in the context of Tikhonov regularization for inverse problems, and to consider the rescaled functional J_ε. Obviously, this functional has the same minimizers as the original problems (4.6) and (4.7). Judging from the results in [7, 13], one might hope that the Gamma-limit of J_ε is the functional which equals Per(A; ρ) whenever A minimizes the unregularized risk, and +∞ else.
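Based on the discussion above and the analogous constructions in [7, 13], the rescaled functional plausibly takes the following form; this is our hedged reconstruction of the elided display, and the precise normalization may differ in the original:

```latex
% Presumed form of the rescaled adversarial training functional:
J_\varepsilon(A) \;:=\;
\frac{1}{\varepsilon}\Big(
  \mathbb{E}_{(x,y)\sim\mu}\big[\,|\chi_A(x) - y|\,\big]
  \;-\;
  \inf_{B \in \mathfrak{B}(\Omega)} \mathbb{E}_{(x,y)\sim\mu}\big[\,|\chi_B(x) - y|\,\big]
\Big)
\;+\; \mathrm{Per}_\varepsilon(A;\rho).
```

Subtracting the optimal Bayes risk and dividing by ε leaves the minimizers unchanged while forcing the data term of any finite-energy limit to be optimal, which is consistent with the hoped-for Gamma-limit equal to Per(A; ρ) on risk minimizers and +∞ elsewhere.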
Proof. If we assume that α > 0, the claim follows directly; hence we consider the remaining case. If α = 0, we use the lim inf inequality from Theorem 2.3 to find the lower bound. The lim sup inequality is non-trivial and potentially even false: letting (A_k)_{k∈N} be a recovery sequence for the perimeter Per(A; ρ), for the lim sup inequality to be satisfied we would need to make sure that the rescaled data term vanishes in the limit. This requires that the recovery sequence converges sufficiently fast to A in L^1(Ω), namely at the rate (4.12), which is not obvious from our proof of Theorem 3.3. Even for smooth densities ρ_0, ρ_1, where the construction of the recovery sequences is much simpler, (4.12) is not obvious. On the other hand, the condition (4.12) for the validity of the lim sup inequality only has to be satisfied for the minimizers (so-called Bayes classifiers) of the unregularized problem inf_{B∈B(Ω)} E_{(x,y)∼µ}[|χ_B(x) − y|] which have finite weighted perimeter.
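The rate condition (4.12) is not displayed above; from the surrounding argument it must quantify "sufficiently fast convergence in L^1(Ω)" relative to ε, presumably of the form (our reconstruction):

```latex
% Presumed form of the rate condition (4.12):
\lim_{k \to \infty} \frac{1}{\varepsilon_k}
  \big\| \chi_{A_k} - \chi_{A} \big\|_{L^1(\Omega)} \;=\; 0 .
```

Since the densities ρ_0, ρ_1 are bounded, such an o(ε_k) rate in L^1 would make the rescaled difference of the data terms of A_k and A vanish, which is exactly what the lim sup inequality requires.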
This motivates assuming a so-called "source condition", demanding some regularity of these Bayes classifiers. In the field of inverse problems, source conditions are well studied and known to be necessary for proving convergence of variational regularization schemes [8, 13]. Our first source condition, referred to as the strong source condition, takes the following form: every Bayes classifier possesses a recovery sequence satisfying (4.12).

(sSC)
Note that a Bayes classifier A† admits a recovery sequence satisfying (4.12), for instance, if ∂A† is sufficiently smooth and the densities ρ_0 and ρ_1 are continuous. In this case the constant sequence, which trivially satisfies (4.12), is a recovery sequence. Under this strong condition we have proved the following Gamma-convergence result.

Proposition 4.5 (Conditional Gamma-convergence). Under the conditions of Theorem 2.3 and assuming (sSC), it holds that J_ε Gamma-converges as ε → 0.

In fact, a substantially weaker source condition suffices for compactness of solutions as ε → 0: there exists a Bayes classifier A† ∈ arg min which possesses a recovery sequence satisfying (4.12).

(wSC)
We get the following compactness statement assuming the validity of this source condition.

Proposition 4.6 (Conditional compactness). Under the conditions of Theorems 2.1 and 2.3, and assuming the source condition (wSC), any sequence of solutions to (4.6) is precompact in L^1(Ω) as ε → 0.
Proof. Let us take a sequence of solutions A_ε of (4.6) for ε → 0. Furthermore, let A†_ε denote a recovery sequence for the Bayes classifier A† which satisfies (wSC). Using the minimization property of A_ε, the desired bound holds.

Here the quantity δ_n > 0 is given by an explicit rate which, for d ≥ 3, coincides with the asymptotic connectivity threshold for random geometric graphs. We can use these transport maps to consider the measures ν^i_n, which are the empirical measures of the graph points associated with the i-th density ρ_i. We define a graph discretization of Per_ε(A; ρ) for A ⊂ X_n, which effectively counts the number of points in an exterior strip around A carrying the label 0 and the number of points in an interior strip in A carrying the label 1. Note that, although the complement A^c of a subset A of the graph vertices X_n is no longer a subset of X_n, the empirical measure ν^1_n only considers points in A^c ∩ X_n, so one can just as well replace A^c by X_n \ A. Note also that this graph model using (4.14) assumes that the labels are assigned according to the ground-truth distributions ρ_0 and ρ_1. One can also treat more general labeling models for which (4.14) is asymptotically satisfied as n → ∞, but for the sake of simplicity we limit the discussion to the model above.
Using the weight function W_n(x, y), we can equivalently express E_n(A) as a graph cut. This graph perimeter functional combines elements of the graph perimeter studied in [36] and the graph Lipschitz constant studied in [43]; correspondingly, the following Gamma-convergence proof bears similarities with both of these works. For proving Gamma-convergence of these graph perimeters to the continuum perimeter, we employ the TL^p-framework developed in [36]. Here we do not go into too much detail regarding the definition and the properties of these metric spaces; we just define the space as the set of pairs of L^p functions and measures,

TL^p(Ω) := {(f, µ) : µ ∈ P(Ω), f ∈ L^p(Ω, µ)},

where P(Ω) is the set of probability measures on Ω. We highlight [36, Proposition 3.12, 4.], which says that in our specific situation, with ρ being a strictly positive absolutely continuous measure, convergence of (u_n, ν_n) → (u, ρ) in the topology of TL^p(Ω) is equivalent to u_n ∘ T_n → u in L^p(Ω) for the maps T_n satisfying (4.13). Furthermore, we would like to emphasize that the functionals E_n are random variables, since they depend on the given realization of the random variables which constitute the vertices X_n of the graph. Still, it is possible to prove Gamma-convergence of these functionals with probability one, meaning that Gamma-convergence might be violated only on a set of graph realizations of zero probability. We refer the interested reader to [36, Definition 2.11] for precise definitions.
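To make the strip-counting construction concrete, here is a minimal Monte Carlo sketch for a toy configuration: uniform densities ρ_0 = ρ_1 on the unit square and A the left half-square. The sample sizes, strip width eps, and the normalization of the estimate E_n are illustrative assumptions, not the paper's exact definitions (in particular, no transport maps or graph weights are used here).

```python
import math
import random

random.seed(0)
n0 = n1 = 1000   # points carrying labels 0 and 1 (toy choice)
eps = 0.1        # strip width, playing the role of the adversarial budget

# Uniform samples on the unit square for each label (assumed densities).
pts0 = [(random.random(), random.random()) for _ in range(n0)]  # label 0
pts1 = [(random.random(), random.random()) for _ in range(n1)]  # label 1
allpts = pts0 + pts1

# Candidate set A: graph points in the left half; Ac: the rest.
A  = [p for p in allpts if p[0] < 0.5]
Ac = [p for p in allpts if p[0] >= 0.5]

def dist_to(p, pts):
    """Euclidean distance from p to the nearest point of pts."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in pts)

# Label-0 points in the exterior eps-strip around A, and
# label-1 points in the interior eps-strip of A.
out0 = sum(1 for p in pts0 if p[0] >= 0.5 and dist_to(p, A) <= eps)
in1  = sum(1 for p in pts1 if p[0] < 0.5 and dist_to(p, Ac) <= eps)

# Empirical strip masses, rescaled by 1/eps (illustrative normalization).
E_n = (out0 / n0 + in1 / n1) / eps
print(E_n)
```

For this flat interface each strip has area roughly eps, so both empirical fractions are about eps and the estimate concentrates near 2, matching the heuristic that the continuum limit weights the interface by the traces of ρ_0 and ρ_1 from either side.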
The following is the main result of this section; it asserts Gamma-convergence of the functionals E_n, as n → ∞ in the TL^1(Ω) topology, to the Gamma-limit from Theorem 2.3, together with an accompanying compactness property.

Proof. The result is proved in Lemmas 4.8 and 4.9, which establish the lim inf and lim sup inequalities. The proof of the compactness statement is implicitly contained in the proof of Lemma 4.8.

Proof. We can assume that Per(A; ρ) < ∞. Using Corollary 3.8, for all η > 0 there exists a set A_η such that L^d(A△A_η) ≤ η and lim sup_{n→∞} Per_{ε_n}(A_η; ρ) ≤ Per(A; ρ) + η for all sequences ε_n which converge to zero as n → ∞. We will abbreviate Ã := A_η. Given the conditions of Corollary 3.8, we have the corresponding boundary regularity. In the remainder of the proof we argue that we can replace every occurrence of χ_Ã ∘ T_n by χ_Ã (and similarly for Ã^c) in the limit n → ∞. For this we first estimate the terms without essential suprema and then the terms with essential suprema. Defining the set Â_n := {χ_Ã ∘ T_n = 1}, we have χ_Ã ∘ T_n = χ_{Â_n}. Furthermore, for every x ∈ Ã with dist(x, Ã^c) > ‖T_n − id‖_{L^∞(Ω)}, we can find x̂ ∈ X_n with |x̂ − x| ≤ ‖T_n − id‖_{L^∞(Ω)}, and hence x̂ ∈ Ã; this implies x ∈ Â_n. Similarly, one argues that x ∈ Ã^c with dist(x, Ã) > ‖T_n − id‖_{L^∞(Ω)} implies x ∈ Â_n^c. Hence, we obtain the desired relation, as the alternative would yield a contradiction. We can argue symmetrically for points that lie within Ã^{⊕ε_n} with distance to the complement larger than 2‖T_n − id‖_{L^∞(Ω)}. Using this, we may estimate the symmetric difference of these sets.

For x ∈ R^d and r > 0, we denote by B(x, r) := {y ∈ R^d : |x − y| < r} the open ball with radius r around x. Furthermore, Q_ν(x, r) denotes the cube centered at x with sides of length r oriented in the direction ν, and Q′(x, r) denotes the (d − 1)-dimensional cube. Measures and sets. The d-dimensional Lebesgue measure in R^d is denoted by L^d and the k-dimensional Hausdorff measure in R^d by H^k. For a set A ⊂ R^d we denote its complement by A^c := R^d \ A.
If A ⊂ Ω ⊂ R^d is a subset of some fixed other set Ω, we let A^c denote its relative complement Ω \ A. The symmetric difference of two sets A, B ⊂ R^d is denoted by A△B := (A \ B) ∪ (B \ A). Furthermore, the characteristic function of a set A ⊂ R^d is denoted by χ_A.

Theorem 3.3. Let the hypotheses of Theorem 3.2 hold. For any measurable set A ⊂ Ω and any sequence (ε_k)_{k∈N} with ε_k → 0 as k → ∞, there is a sequence of sets A_k such that χ_{A_k} → χ_A in L^1(Ω) and the following bound holds: lim sup_{k→∞} Per_{ε_k}(A_k; ρ) ≤ Per(A; ρ).

Proposition 3.7. Let Ω ⊂ R^d be an open, bounded set with Lipschitz boundary, M ⊂ R^d a C^1-manifold without boundary, and A ⊂ Ω a set. Suppose that ∂A^{R^d} is a submanifold of M, with the additional properties that ∂A^{R^d} = M ∩ Ω and H^{d−1}(M ∩ ∂Ω) = 0.

Substep 1.2: Regularity of ∂A_s. We claim that for almost every choice of s ∈ (3/8, 5/8), ∂A_s^{R^d} is contained in the compact image of a Lipschitz function f : R^d → Ω, i.e., f(K) = ∂A_s^{R^d} for some K ⊂⊂ R^d, and there is a compact set N ⊂ Ω with H^{d−1}(N) = 0 such that for any point x ∈ ∂A_s^{R^d} \ N there is a radius as claimed, and such that ‖χ_{A_η} − χ_A‖_{L^1(Ω)} ≤ η and lim sup_{ε→0} Per_ε(A_η; ρ) ≤ Per(A; ρ) + η.

Lemma 4.9 (Discrete lim sup inequality). Under the conditions of Theorem 4.7, for any measurable A ⊂ R^d, with probability one there exists a sequence of sets (A_n) ⊂ X_n with (χ_{A_n}, ν_n) → (χ_A, ρ) in TL^1(Ω) and lim sup_{n→∞} E_n(A_n) ≤ Per(A; ρ).