Phase field approximations of branched transportation problems

In branched transportation problems mass has to be transported from a given initial distribution to a given final distribution, where the cost of the transport is proportional to the transport distance, but subadditive in the transported mass. As a consequence, mass transport is cheaper the more mass is transported together, which leads to the emergence of hierarchically branching transport networks. We here consider transport costs that are piecewise affine in the transported mass with N affine segments, in which case the resulting network can be interpreted as a street network composed of N different types of streets. In two spatial dimensions we propose a phase field approximation of this street network using N phase fields and a function approximating the mass flux through the network. We prove the corresponding $\Gamma$-convergence and show some numerical simulation results.


Introduction
Branched transportation problems constitute a special class of optimal transport problems that have recently attracted lots of interest (see for instance [26, §4.4.2] and the references therein). Given two probability measures µ + and µ − on some domain Ω ⊂ R n , representing an initial and a final mass distribution, respectively, one seeks the most cost-efficient way to transport the mass from the initial to the final distribution. Unlike in classical optimal transport, the cost of a transportation scheme does not only depend on initial and final position of each mass particle, but also takes into account how many particles travel together. In fact, the transportation cost per transport distance is typically not proportional to the transported mass m, but rather a concave, nondecreasing function τ : [0, ∞) → [0, ∞), m → τ (m). Therefore it is cheaper to transport mass in bulk rather than moving each mass particle separately. This automatically results in transportation paths that exhibit an interesting, hierarchically ramified structure (see Figure 1). Well-known models (with parameters α < 1, a, b > 0) are obtained by the choices τ (m) = m α ("branched transport" [17]), τ (m) = min(m, am + b) ("urban planning" [9,10]), τ (m) = 0 if m = 0, 1 else ("Steiner tree problem" [2,18,25]). There exist multiple equivalent formulations for branched transportation problems. One particular formulation models the mass flux F as an unknown vector-valued Radon measure that describes the transport from µ + to µ − . The cost functional of the flux, which one seeks to minimize, is defined as if div F = µ + − µ − in the distributional sense (so that indeed F transports mass µ + to µ − ) and F = σH 1 S + F ⊥ for some countably 1-rectifiable set S ⊂ R n , an H 1 S-measurable function σ : S → R n , and a diffuse part F ⊥ (composed of a Cantor measure and an absolutely continuous measure). Otherwise, E µ+,µ− [F] = ∞.
In this article we devise phase field approximations of the functional E µ+,µ− in two space dimensions in the case of a piecewise affine transportation cost τ (m) = min{α 0 m, α 1 m + β 1 , . . . , α N m + β N } with positive parameters α i , β i . In fact, for a positive phase field parameter ε we consider (a slightly improved but less intuitive version of) the phase field functional [σ, ϕ 1 , . . . , ϕ N ] = ∞ otherwise. Our main result Theorem 2.1 shows that E µ+,µ− ε Γ-converges in a certain topology to E µ+,µ− as ε 0. Here the vector field σ approximates the mass flux F, the auxiliary scalar phase fields ϕ 1 , . . . , ϕ N disappear in the limit, and µ ε + , µ ε − are smoothed versions of µ + , µ − . One motivation for our particular choice of τ is that this allows a phase field version of the urban planning model, the other motivation is that any concave cost τ can be approximated by a piecewise affine function. Those phase field approximations then allow to find numerical approximations of optimal mass fluxes.

Related work
Phase field approximations represent a widely used tool to approach solutions to optimization problems depending on lower-dimensional sets. The concept takes advantage of the definition of Γ-convergence [6,7,15] to approximate singular energies by smoother elliptic functionals such that the associated minimizers converge as well. The term phase field is due to Modica and Mortola [22] who study a rescaled version of the Cahn and Hilliard functional which models the phase transition between two immiscible liquids and which turns out to approximate the perimeter functional of a set. Subsequently a similar idea has been used by Ambrosio and Tortorelli in [4,3] to obtain an approximation to the Mumford-Shah functional [23]. Later on these techniques have been used as well in fracture theory and optimal partitions problems [19,14]. More recently such phase field approximations have also been applied to branched transportation problems and the Steiner minimal tree problem (to find the graph of smallest length connecting a given set of points) which both feature high combinatorial complexity [20]. For instance, in [24] the authors propose an approximation to the branched transport problem based on the Modica-Mortola functional in which the phase field is replaced by a vector-valued function satisfying a divergence constraint. Similarly, in [5] Santambrogio et al. study a variational approximation to the Steiner minimal tree problem in which the connectedness constraint of the graph is enforced trough the introduction of a geodesic distance depending on the phase field. Our phase field approximations can be viewed as a generalization of recent work by two of the authors [11,12], in which essentially (1) for τ (m) = αm + β with α, β > 0 is approximated by an Ambrosio-Tortorelli-type functional defined as if div σ = ρ ε * (µ + − µ − ) for a smoothing kernel ρ ε and if ϕ ≥ α √ β ε almost everywhere, andẼ µ+,µ− ε [σ, ϕ] = ∞ otherwise. The function 1 − ϕ may be regarded as a smooth version of the characteristic function of a 1-rectifiable set (the Steiner tree) whose total length is approximated by the Ambrosio-Tortorelli phase field term in parentheses. In addition, the first term in the integral forces this set to contain the support of a vector field σ which encodes the mass flux from µ + to µ − .

Notation
Throughout, Ω ⊂ R 2 is an open bounded domain with Lipschitz boundary. The spaces of scalar and R n -valued continuous functions on the closure Ω are denoted C(Ω) and C(Ω; R n ) (for m times continuously differentiable functions we use C m ) and are equipped with the supremum norm · ∞ . Their topological duals are the spaces of scalar and R n -valued Radon measures (regular countably additive measures) M(Ω) and M(Ω; R n ), equipped with the total variation norm · M . Weak-* convergence in these spaces will be indicated by * . The subset P(Ω) ⊂ M(Ω) shall be the space of probability measures (nonnegative Radon measures with total variation 1). The total variation measure of some Radon measure λ will be denoted |λ|, and its Radon-Nikodym derivative with respect to |λ| as λ |λ| . The restriction of λ to a measurable set A is abbreviated λ A. The pushforward of a measure λ under a measurable map T is denoted T # λ. The standard Lebesgue and Sobolev spaces are indicated by L p (Ω) and W k,p (Ω), respectively; if they map into R n we write L p (Ω; R n ) and W k,p (Ω; R n ). The associated norms are indicated by · L p and · W k,p . Finally, the n-dimensional Hausdorff measure is denoted H n .
Structure of the paper: In Section 2 we introduce the approximating energies and precisely state our results. Section 3 is devoted to the proof of the Γ-convergence result. Its first three subsections deal with the Γ − lim inf inequality which is obtained via slicing, while the remaining two subsections prove an equicoercivity result and the Γ−lim sup inequality. Finally, in Section 4 we introduce a numerical discretization and algorithmic scheme to perform some exemplary simulations of branched transportation networks.

Model summary and Γ-convergence result
Here we state in more detail the considered variational model for transportation networks and its phase field approximation as well as the Γ-convergence result.

Introduction of the energies
Before we state the original energy and its phase field approximation, let us briefly recall the objects representing the transportation networks.
Definition 1 (Divergence measure vector field and mass flux).
1. A divergence measure vector field is a measure F ∈ M(Ω; R 2 ), whose weak divergence is a Radon measure, div F ∈ M(Ω), where the weak divergence is defined as for all ψ ∈ C 1 (R 2 ) with compact support.
By [28], any divergence measure vector field F can be decomposed as 3. Given µ + , µ − ∈ P(Ω) with compact support in Ω, a mass flux between µ + and µ − is a divergence measure vector field F with div F = µ + − µ − . The set of mass fluxes between µ + and µ − is denoted is uniformly bounded in H so that we can extract a weakly converging subsequence, still indexed by k for simplicity, Due to the closedness of X µ+,µ− ε with respect to weak convergence in H we have (σ, ϕ 1 , . . . , ϕ N ) ∈ X µ+,µ− ε . Note that the integrand of E µ+,µ− ε is convex in σ(x) and the ∇ϕ i (x) as well as continuous in σ(x) and the ϕ i (x), thus E µ+,µ− ε is lower semi-continuous along the sequence. Indeed, consider a subsequence along which each term L ε [ϕ k i ] converges and along which the ϕ k i converge pointwise almost everywhere (so that also γ k ε (x) = min i=1,...,N ϕ k i (x) 2 + α 2 i ε 2 /β i converges for almost all x ∈ Ω). By Mazur's lemma, a sequence of convex combinations m k j=k λ k j σ j of the σ k converges strongly (and up to another subsequence again pointwise) so that by Fatou's lemma we have where we exploited the weak lower semi-continuity of L ε , the monotonicity of ω ε α 0 , γε(x) ε , |σ(x)| in its second argument, its convexity in its last argument, and its continuity in its latter two arguments.
Remark 2 (Motivation of ω ε α 0 , γε(x) ε , |σ(x)| via relaxation). Keeping the phase fields ϕ 1 , . . . , ϕ N fixed and ignoring the regularizing term ε p |σ| 2 , the integrand ω ε (α 0 , γε ε , |σ|) is the convexification in σ of which shows the intuition of the phase field functional much clearer. Indeed, the minimum over N + 1 terms parallels the minimum in the definition of τ , and the i th term for i = 0, . . . , N describes (part of) the transportation cost α i m + β i . However, since the above is not convex with respect to σ, a functional with this integrand would not be weakly lower semi-continuous in σ and consequently possess no minimizers in general. Taking the lower semi-continuous envelope corresponds to replacing the above by ω ε (α 0 , γε ε , |σ|) (note that this only ensures existence of minimizers, but will not change the Γ-limit of the phase field functional).

Statement of Γ-convergence and equi-coercivity
Let us extend both E µ+,µ− and E We have the following Γ-convergence result.
Theorem 2.1 (Convergence of phase field cost functional). For admissible µ + , µ − ∈ P(Ω) we have where the Γ-limit is with respect to weak- * convergence in M(Ω; R 2 ) and strong convergence in L 1 (Ω) N .
The proof of this result is provided in the next section. Together with the following equicoercivity statement, whose proof is also deferred to the next section, we have that minimizers of the phase field cost functional E µ+,µ− ε approximate minimizers of the original cost functional E µ+,µ− .
Remark 3 (Phase field boundary conditions). Recall that we imposed boundary conditions ϕ i = 1 on ∂Ω. Without those, the recovery sequence from the following section could easily be adapted such that all full phase field profiles near the boundary will be replaced by half, one-sided phase field profiles. It is straightforward to show that the resulting limit functional would become where fluxes along the boundary are cheaper and thus preferred.
Remark 4 (Divergence measure vector fields and flat chains). Any divergence measure vector field can be identified with a flat 1-chain or a locally normal 1-current (see for instance [28,Sec. 5] or [8,Rem. 2.29(3)]; comprehensive references for flat chains and currents are [30,16]). Furthermore, for a sequence σ j , j ∈ N, of divergence measure vector fields with uniformly bounded div σ j M , weak- * convergence is equivalent to convergence of the corresponding flat 1chains with respect to the flat norm [8,Rem. 2.29(4)]. Analogously, scalar Radon measures of finite total variation and bounded support can be identified with flat 0-chains or locally normal 0-currents [29,Thm. 2.2], and for a bounded sequence of compactly supported scalar measures, weak- * convergence is equivalent to convergence with respect to the flat norm of the corresponding flat 0-chains.
From the above it follows that in Theorems 2.1 and 2.2 we may replace weak- * convergence by convergence with respect to the flat norm. Indeed, for both results it suffices to consider sequences (σ ε , ϕ ε 1 , . . . , ϕ ε N ) with uniformly bounded cost E µ+,µ− ε . For those we have uniformly bounded σ ε M (by Theorem 2.2) as well as uniformly bounded div σ ε M = µ ε + − µ ε − M so that weak- * and flat norm convergence are equivalent.
3 The Γ-limit of the phase field functional In this section we prove the Γ-convergence result. As is canonical, we begin with the lim inf-inequality, after which we prove the lim sup-inequality as well as equicoercivity.

The Γ − lim inf inequality for the dimension-reduced problem
Here we consider the energy reduced to codimension-1 slices of the domain Ω. In our particular case of a two-dimensional domain, each slice is just onedimensional, which will simplify notation a little (the procedure would be the same for higher codimensions, though). The reduced functional depends on the (scalar) normal flux ϑ through the slice as well as the scalar phase fields ϕ 1 , . . . , ϕ N restricted to the slice. 1. The decomposition of a measure ϑ ∈ M(I) into its atoms and the remain- where S ϑ ⊂ I is the set of atoms of ϑ, m ϑ : S ϑ → R is H 0 S ϑ -measurable, and ϑ ⊥ contains no atoms.
Definition 5 (Cost domains). For given (ϑ, ϕ 1 , . . . , ϕ N ) ∈ L 2 (I) × W 1,2 (I) N we set The sets are analogously defined for (σ, where we use the same notation (which case is referred to will be clear from the context).
We now show the following lower bound on the energy, from which the Γ − lim inf inequality for the dimension-reduced situation will automatically follow.
and denote the collection of connected components of I \I η by C η . Furthermore define the subcollection of connected components Proof. 1. (α 0 < ∞) We first show that without loss of generality we may assume The motivation is that there may be regions in which a phase field ϕ i is (still) small, but in which we actually have to pay α 0 |ϑ|. Thus, in those regions we would like ω ε (α 0 , γε(x) ε , |ϑ(x)|) to approximate α 0 |ϑ(x)| sufficiently well, and the above condition on ϑ ensures (3) We achieve (2) by modifying ϑ while keeping the cost as well as Iη |ϑ| dx and C |ϑ| dx for all C ∈ C η the same so that the overall estimate of the proposition is not affected. The modification mimics the relaxation from Remark 2: the modified ϑ oscillates between small and very large values. Indeed, for fixed where t C is chosen such that C |θ| dx = C |ϑ| dx (this is possible, since for t C = ∞ we have |θ| ≥ |ϑ| and for t C = −∞ we have |θ| ≤ |ϑ| everywhere on C). The cost did not change by this modification since Note that the modificationθ has a different set K ε 0 (θ, ϕ 1 , . . . , ϕ N ; I) than the original ϑ. Indeed, by definition ofθ we have where we have employed (3) and Jensen's inequality. Upon minimizing in m 0 , which yields the optimal value α 0 εH 1 (( Next consider for each C ∈ C η \ C ≥ η and i = 1, . . . , N the subsets and abbreviate m A = A |ϑ| dx for any A ⊂ I. Using Young's inequality, for i, j = 1, . . . , N we have Similarly, using Jensen's inequality we have where in the last step we optimized for H 1 (C < i ). Finally, if inf C ϕ i ≤ δ we have (using Young's inequality) where (c, d) and (e, f ) denote the first and the last connected component of Summarizing the previous estimates we obtain Finally, we obtain the desired estimate, 2. (α 0 = ∞) In this case the set K ε 0 is empty, and the cost functional reduces to With Jensen's inequality we thus compute Furthermore, the same calculation as in the case α 0 < ∞ yields with respect to weak- * convergence in M(J) and strong convergence in L 1 (J) N .
Proof. Let (ϑ ε , ϕ ε 1 , . . . , ϕ ε N ) be an arbitrary sequence converging to (ϑ, ϕ 1 , . . . , ϕ N ) in the considered topology as ε → 0, and assume without loss of generality that the limit inferior of G ε [ϑ ε , ϕ ε 1 , . . . , ϕ ε N ; J] is actually a limit and is finite (else there is nothing to show). Further we may assume and thus also in L 1 (Ĩ) so that ϕ i = 1. Even more, after passing to another subsequence, by Egorov's theorem all ϕ ε i converge uniformly to 1 outside a set of arbitrarily small measure. In particular, for any ξ > 0 we can find an open interval (a + ξ, b − ξ) ⊂ I ⊂Ĩ such that ϕ ε i → 1 uniformly on ∂I, and for any η < 1 there is an open set I η ⊂ I with We now choose δ = ε 1/3 and η = 1 − ε and denote by C η (ε) and C ≥ η (ε) the collections of connected components of I \ I η from Proposition 3.1 (which now depend on ε). Further we abbreviate . The bound of Proposition 3.1 implies that the number of elements in C < η (ε) is bounded uniformly in ε and η. Passing to another subsequence we may assume C < η (ε) to contain K sets C 1 (ε), . . . , C K (ε) whose midpoints converge to x 1 , . . . , x K ∈ I, respectively. Thus for an arbitrary ζ > 0 we have that for all ε small enough where in the second inequality we used for all measurable A, B, C ⊂ I (due to the subadditivity of τ ) and in the third inequality we used τ (m) ≤ α 0 m as well as the lower semi-continuity of the mass on an open set. Letting now ζ → 0 (so that by the σ-continuity of ϑ we have If on the other hand α 0 = ∞ we obtain from Proposition 3.1 where B ζ ({xi})\Ci(ε) |ϑ ε | dx decreases to zero by Proposition 3.1. Therefore, where in the second inequality we used τ (m 1 + m 2 ) ≤ τ (m 1 ) + α 1 m 2 for any The proof is concluded by letting ξ → 0 and noting lim inf ξ→0 G[ϑ; I] ≥ G[ϑ;Ĩ].

Slicing of vector-valued measures
We derive now some technical construction for divergence measure vector fields which are needed to reduce the Γ − lim inf inequality to the lower-dimensional setting of the previous section. In particular, we will introduce slices of a divergence measure vector field, which in the language of geometric measure theory correspond to slices of currents. We will slice in the direction of a unitary vector ξ ∈ S n−1 with orthogonal hyperplanes of the form The orthogonal projection onto H ξ,t is denoted The slicing will essentially be performed via disintegration. Let σ be a compactly supported divergence measure vector field. By the Disintegration Theorem [1, Thm. 2.28], for all ξ ∈ S n−1 and almost all t ∈ R there exists a unique measure We decompose π ξ # |σ · ξ| into its absolutely continuous and singular part according to π ξ # |σ · ξ| = σ ξ (t) dt + σ ⊥ ξ for dt the Lebesgue measure on R.
Lemma 3.1. For any ξ ∈ S n−1 and any compactly supported divergence measure vector field σ we have σ ⊥ ξ = 0, that is, the measure π ξ # |σ · ξ| = σ ξ (t) dt is absolutely continuous with respect to the Lebesgue measure on R. Moreover, for almost all t ∈ R and any compactly supported θ ∈ C ∞ (R n ) we have Proof. Abbreviate H = ξ ⊥ = H ξ,0 with corresponding orthogonal projection π H , let φ ∈ C ∞ (H) and ψ ∈ C ∞ (R) be compactly supported, and define Introducing Ψ(t) = t −∞ ψ(s) ds we obtain via the chain and product rule so that (denoting by χ A the characteristic function of a set A) (Note that we could just as well have used χ {ξ·x>s} instead of χ {ξ·x≥s} , which would ultimately lead to integration domains {ξ · x ≤ t} in (4); for almost all t this will be the same.) Applying the Fubini-Tonelli Theorem we obtain where in the second step we just added 0 = R n ∇[φ•π H ]·dσ + R n φ•π H d div σ in the square brackets. On the other hand, using the disintegration of σ · ξ we also have H ξ,s φ(π H (y)) dν ξ,s (y) (σ ξ (s) ds + dσ ⊥ ξ (s)) .
We now define the slice of a divergence measure vector field as the measure obtained via disintegration with respect to the one-dimensional Lebesgue measure.
Remark 5 (Properties of sliced functions and measures). 1. By Fubini's theorem it follows that for any function f of Sobolev-type W m,p the corresponding sliced function f ξ,t is well-defined and also of Sobolev-type W m,p for almost all ξ ∈ S n−1 and t ∈ R. For the same reason, strong convergence f j → j→∞ f in W m,p implies strong convergence (f j ) ξ,t → f ξ,t in W m,p on the sliced domain.
2. The definitions of sliced functions and measures are consistent in the following sense. If we identify a Lebesgue function f with the measure χ = f L for L the Lebesgue measure, then the same identification holds between f ξ,t and χ ξ,t for almost all ξ ∈ S n−1 and t ∈ R.
3. Let σ be a divergence measure vector field, then the properties [1, Thm. 2.28] of the disintegration σ · ξ = ν ξ,t ⊗ π ξ # |σ · ξ|(t) = ν ξ,t ⊗ σ ξ (t) dt = σ ξ,t ⊗ dt immediately imply the following. The map t → σ ξ,t M is integrable and satisfies Furthermore, for any measurable function f : R n → R, absolutely integrable with respect to |σ · ξ|, it holds We briefly relate our definition of sliced measures to other notions of slices from the literature. . This σ ξ,t equals the so-called normal trace of σ on H ξ,t (see [28] for its definition and properties). In general it is not a measure but continuous on Lip(H ξ,t ) in the sense 2. Interpreting a divergence measure vector field as a 1-current or a flat 1chain,Šilhavý's definition of σ ξ,t is identical to the classical slice of σ on H ξ,t as for instance defined in [29]  3. Our notion of a sliced measure from Definition 6 is equivalent to both above-mentioned notions. Indeed, (4) implies which shows that the sliced measure represents the normal flux through the hyperplane H ξ,t = {x · ξ = t}. This, however, is the same characterization as given in [28, (3.6)] and [16, 4.2.1] for both above notions of slices.
We conclude the section with several properties needed for the Γ − lim inf inequality. The following result makes use of the Kantorovich-Rubinstein norm (see for instance [21, eq. (2) & (5)]; in geometric measure theory it is known as the flat norm) on M(R n ), defined by For measures of uniformly bounded support and uniformly bounded mass it is known to metrize weak- * convergence (see for instance [8,Rem. 2.29(3)-(4)]). We will furthermore make use of the following fact. Let T s : x → x − sξ be the translation by s in direction −ξ. It is straightforward to check that for any divergence measure vector field µ ∈ M(R n ; R n ) we have div(π H ξ,t # (µ − µ · ξ ξ)) = π H ξ,t # ( div µ) .
As a consequence, for any µ ∈ M(H ξ,t ) and ν ∈ M(H ξ,t+s ) we have Indeed, let δ > 0 arbitrary and Proof. It suffices to show σ j ξ,t * σ ξ,t for a subsequence.
Remark 7 (Flat convergence of sliced currents). The convergence from Theorem 3.1 is consistent with the following property of slices of 1-currents: If σ j , j ∈ N, is a sequence of 1-currents of finite mass with σ j → σ in the flat norm, then (potentially after choosing a subsequence) σ j ξ,t → σ ξ,t in the flat norm for almost every ξ ∈ S n−1 , t ∈ R (see [13,
As a result, for a given ξ the set of t such that σ ξ,t is not H 0 -diffuse is a subset of π ξ (Θ). Thus it remains to show that for almost all ξ ∈ S n−1 the set π ξ (Θ) is a Lebesgue-nullset. Writing it actually suffices to show that π ξ (Θ) is a Lebesgue-nullset for any p ∈ N. Now by the properties of the 1-dimensional density of a measure [1,Thm. 256], so that Θ p can be decomposed into a countably 1-rectifiable and a purely 1-unrectifiable set [1, p. 83], for any Lipschitz f : R → R n ). By the H 1 -diffusivity assumption on σ we have (abbreviating the Lebesgue measure by L) and by a result due to Besicovitch [1, Thm. 2.65] we have L(π ξ (Θ u p )) = 0 for almost all ξ ∈ S n−1 . Thus, for almost all ξ ∈ S n−1 we have L(π ξ (Θ p )) = 0, as desired.
Remark 9 (Characterization of divergence measure vector fields). By a result due to Smirnov [27], any divergence measure vector field σ can be decomposed into simple oriented curves σ γ = γ #γ ds [0, 1] with γ : [0, 1] → R n a Lipschitz curve and ds the Lebesgue measure, that is, with J the set of Lipschitz curves and µ σ a nonnegative Borel measure.
The results of this section can alternatively be derived by resorting to this characterization, since the slice of a simple oriented curve σ γ can be explicitly calculated.

The Γ − lim inf inequality
We now prove the desired lim inf-inequality, which as already anticipated will be obtained by slicing.
with respect to weak- * convergence in M(Ω; R 2 ) and strong convergence in Proof. Let (σ ε , ϕ ε 1 , . . . , ϕ ε N ) converge to (σ, ϕ 1 , . . . , ϕ N ) in the considered topology. We first extend σ ε and σ to R 2 \ Ω by zero and ϕ ε i and ϕ i to R 2 \ Ω by 1 for i = 1, . . . , N . The phase field cost functional and the cost functional are extended to R 2 in the obvious way (their values do not change). Without loss of generality (potentially after extracting a subsequence) we may assume lim ε→0 E µ+,µ− ε [σ ε , ϕ ε 1 , . . . , ϕ ε N ] to exist and to be finite (else there is nothing to show). As a consequence we have div σ ε = µ ε + − µ ε − as well as div σ = µ + − µ − and ϕ 1 ≡ . . . ≡ ϕ N ≡ 1 (since the phase field cost functional is bounded below by . Now let A ⊂ R 2 open and bounded; the restriction of the phase field cost functional to a domain A will be denoted E µ+,µ− ε [·; A]. Choosing some ξ ∈ S 1 , by Fubini's decomposition theorem we have where G ε is the dimension-reduced phase field energy from Definition 4 and for simplicity we identified the domain A ξ,t of the sliced functions with an open subset of the real line. Fatou's lemma thus implies By assumption, the left-hand side is finite so that the right-hand side integrand is finite for almost all t ∈ R as well. Pick any such t and pass to a subsequence such that lim inf turns into lim. Indeed σ ε ξ,t * σ ξ,t for every ξ and almost all t, as σ ε * σ and Theorem 3.1. Thus, Corollary 3.1 on the reduced dimension problem implies for almost all t ∈ R so that For notational convenience let us now define the auxiliary function κ, defined for open subsets A ⊂ R 2 , as Furthermore, introduce the nonnegative Borel measure as well as the |σ|-measureable Borel functions for some sequence ξ j , j ∈ N, dense in S 1 .
Since σ is a divergence measure vector field, we have the desired result.

Equicoercivity
Proof of Theorem 2.2. Due to C > E (Ω) and thus also in L 1 (Ω). Furthermore, we will show further below that σ ε L 1 = σ ε M is uniformly bounded, which by the Banach-Alaoglu theorem implies existence of a weakly-* converging subsequence (still denoted σ ε ) with limit σ ∈ M(Ω; R 2 ). It is now a standard property of Γ-convergence that, due to the above equicoercivity, any sequence of minimizers of E µ+,µ− ε contains a subsequence converging to a minimizer of E µ+,µ− .

The Γ − lim sup inequality
of polyhedral divergence measure vector fields in Ω such that If µ + and µ − are finite linear combinations of Dirac masses (which we have assumed in the case α 0 = ∞), we may even choose µ j ± = µ ± . We will construct the recovery sequence based on those polyhedral divergence measure vector fields. In the following we will use the notation for the phase field cost functional without divergence constraints.
Step 1. Initial construction for a single polyhedral segment In this step we approximate a single line element m k,j θ k,j H 1 Σ k,j of σ j by a phase field version. To this end we fix j and 0 ≤ k ≤ M j and drop these indices from now on in the notation for the sake of legibility. Without loss of generality we may assume Σ = [0, L] × {0}, m > 0, and θ = e 1 the standard Euclidean basis vector. Setῑ = argmin{α i m + β i | i = 0, . . . , N } to identify the phase field that will be active on Σ (ῑ = 0 means that no phase field is active). We first specify (a preliminary version of) the vector field σ ε . To this end let d Σ denote the distance function associated with Σ and define the width ifῑ > 0 and a ε ι = α 0 mε otherwise (6) over which the vector field will be diffused. We now set where χ A shall denote the characteristic function of a set A. Note that this vector field encodes a total mass flux of m that is evenly spread over the a ε ιenlargement of Σ. The corresponding active phase field will be zero in that region. Indeed, consider the auxiliary Cauchy problem Figure 2: Left: Sketch of the optimal profile of |σ ε | and a phase field ϕ ε ι for someῑ > 0 with m = 2, ε = 0.1, αῑ = 1, βῑ = 1. Right: Sketch of the numerical solution to the 1D problem with the same parameters. whose solution φ ε (t) = 1−exp − t ε represents the well-known optimal Ambrosio-Tortorelli phase field profile. Then we set ϕ ε i = 0 for all i =ῑ and, ifῑ = 0, Let us now evaluate the corresponding phase field cost. In the caseῑ = 0 (which can only occur for α 0 < ∞) we obtain where we abbreviated q = min{1, p−1} > 0 and C(m, L) > 0 denotes a constant depending on m and L. In the caseῑ = 0 we have |σ ε | = βῑ/(αῑε) as well as γ ε = α 2 ι ε 2 /βῑ on the support of |σ ε | so that Furthermore we have L ε (ϕ ε i ) = 0 for i =ῑ and, employing the coarea formula, Summarizing, Step 2. Adapting sources and sinks of all polyhedral segments The vector field constructions σ ε k,j from the previous step for each polyhedral segment Σ k,j are not compatible with the divergence constraint associated with the measure σ j , that is, We remedy this by adapting the source and sink of each σ ε k,j . Set then all vector fields σ ε k,j have support in a band around Σ k,j of width no larger than r(j)ε. Without loss of generality we assume r(j) ≥ 1 (else we just increase it). Again we concentrate on a single segment with fixed j and k and drop these indices in the following (we will also write r instead of r(j)). Denote by s + and s − the starting and ending point of the segment Σ with respect to the orientation induced by θ. Consider the elliptic boundary value problems where δ y denotes a Dirac mass centered at y, B r (y) denotes the open ball of radius r around y, and ν denotes the outer unit normal to ∂B r (0). Note that the boundary value problems and their solutions u + and u − are independent of ε due to the definition of σ ε . Setting (where we assume ε small enough such that B εr (s + ) and B εr (s − ) do not intersect) it is straightforward to check Furthermore, to have at least one phase field zero on the new additional support B εr (s + ) ∪ B εr (s − ) of the vector field we set Let us now estimate the costs. Let us assume that ε is small enough so that all balls B εr(j) (s ± k,j ) are disjoint as are the supports suppσ ε k,j \ (B εr(j) (s + k,j ) ∪ B εr(j) (s − k,j )) for all k. An upper bound can then be achieved via The last summand can be bounded above by Thus, if we set S ± = {l ∈ {1, . . . , M j } | s ± j,l = s} for fixed s = s + k,j or s = s − k,j we have for some constant C(σ j ) > 0 depending on the polyhedral divergence measure vector field σ j and the considered point s. In summary, we thus have for some constant C(σ j ) depending on σ j .
Step 3. Correction of the global divergence Recall that the vector field σ ε to be constructed has to satisfy div σ ε = ρ ε * (µ + − µ − ). In the case α 0 = ∞ the vector fieldσ ε j already has that property due to µ j ± = µ ± (thus we set σ ε j =σ ε j ). However, if α 0 < ∞ (so that admissible sources and sinks µ + and µ − do not have to be finite linear combinations of Dirac masses) we still need to adapt the vector field to achieve the correct divergence. To this end, let λ j ± ∈ M({x ∈ Ω | dist(x, ∂Ω) ≥ ε; R 2 ) be the optimal Wasserstein-1 flux between µ j ± and µ ± , that is, λ ± j minimizes λ M among all vector-valued measures λ with div λ = µ j ± − µ ± . Setting we thus obtain div σ ε j = ρ ε * (µ + − µ − ), as desired. The additional cost can be estimated using the fact where µ j ± M is an upper bound for the total mass moved by λ j ± and the constant C > 0 depends on ρ. With those ingredients we obtain since the Wasserstein-1 distance W 1 (·, ·) metrizes weak- * convergence. Furthermore, in the previous steps we have already estimated ε p σ ε j 2 L 2 ≤ C(σ j )ε q . Finally, Summarizing, for some constant C > 0 and C(σ j ) > 0 depending only on σ j .

Numerical experiments
Here we describe the numerical discretization with finite elements and the subsequent optimization procedure used in our experiments.

Discretization
The proposed phase field approximation allows a simple numerical discretization with piecewise constant and piecewise linear finite elements. Let T h be a triangulation of the space Ω of grid size h such that Ω = T ∈T h T . Denoting by P m the space of polynomials of degree m, we define the finite element spaces the discretized phase field problem now reads where all integrals are evaluated using midpoint quadrature and the divergence constraint is enforced in weak form. In our numerical experiments we use Ω = (0, 1) 2 with a regular quadrilateral grid whose squares are all divided into two triangles.

Optimization
Here we describe the numerical optimization method used to find a minimizer of the objective functional. We first elaborate the simplest case in which the transport cost τ only features a single affine segment, that is, α 0 = ∞ and N = 1. Afterwards we consider the setting with N > 1 and subsequently also with α 0 < ∞, which requires more care due to the higher complexity of the energy landscape.
Single phase field and no diffuse mass flux (N = 1, α 0 = ∞) In this case the energy reads The employed optimization method is similar to the one presented in [12] and updates the variables σ and ϕ alternatingly.
Let us abbreviate f ε = ρ ε * (µ + − µ − ). For minimization with respect to σ, we use the dual variable λ ∈ X 1 h to enforce the divergence constraint and write where the last step follows by standard convex duality. The minimization in σ can be performed explicitly, yielding σ = ε∇λ γε . Inserting this solution leads to a maximization problem in λ, The corresponding optimality conditions, represent a linear system of equations that can readily be solved numerically for λ so that subsequently σ can be computed (note that Ω f ε dx = 0 so that the above equation has a solution). Fixing σ, the optimality condition for ϕ reads which can again be solved under the constraint ϕ 1 | ∂Ω = 1 by a linear system solver.
In addition to the alternating minimization a stepwise decrease of the phase field parameter ε is performed, starting from a large value ε start for which the energy landscape shows fewer local minima. Algorithm 1 summarizes the procedure.
Multiple phase fields and no diffuse mass flux (N > 1, α 0 = ∞) Again we aim for an alternating minimization scheme. Fixing ϕ 1 , . . . , ϕ N , the optimization for σ is the same as before, since only γ ε changes. However, the optimization in the phase fields ϕ 1 , . . . , ϕ N is strongly nonconvex (due to the minimum in γ ε ) and thus requires a rather good initialization and care in the alternating scheme.
To avoid minimization for phase field ϕ i with the min-function inside γ ε we perform a heuristic operator splitting: in each iteration of the optimization we Niter−1 set ϕ j to the solution of (8) for given fixed σ = σ j−1 set γ j ε = ϕ j 2 + α 2 1 ε 2 j /β 1 set λ j to the solution of (7) for given fixed γ ε = γ j ε set σ j = εj ∇λ j 2γ j ε end for end function return σ Niter , ϕ Niter , λ Niter first identify at each location which term inside γ ε = min(ϕ 2 1 +α 2 1 ε 2 /β 1 , . . . , ϕ 2 N + α 2 N ε 2 /β N ) is the minimizer, which is equivalent to specifying the regions Afterwards we then optimize the energy E ε separately for each phase field ϕ i assuming the regions R ε i fixed, that is, we minimize where χ R ε i is the characteristic function of region R ε i . Similarly to the case N = 1 of a single phase field, this amounts to solving the linear system for ϕ i ∈ X 1 h with ϕ i | ∂Ω = 1. Since in the above simple approach the regions R ε i and phase fields ϕ i can move, but cannot nucleate within a different region (indeed, imagine for instance R ε 1 = Ω, then ϕ 2 , . . . , ϕ N will be optimized to equal 1 everywhere so that in the next iteration again R ε 1 = Ω), a suitable initial guess is crucial. To provide such a guess for the initial regions R ε i , we proceed as follows. We first generate some flux network σ 0 consistent with the given source and sink. This can for instance be done using the previously described algorithm for just a single phase field: in our simulations we simply ignore ϕ 2 , . . . , ϕ N and pretend only the phase field ϕ 1 would exist (essentially this means we replace the cost function τ by m → α 1 m + β 1 for m > 0; of course, an alternative choice would be to take m → αm + β for some α, β > 0 that better approximate τ for a larger range of values m). We then identify the total mass flowing through each branch of σ 0 . To this end we convolve |σ 0 | with the characteristic function χ Br(0) of a disc of radius r. If r is sufficiently large compared to the width of the support of σ 0 , we obtain χ Br(0) * |σ 0 | (x) = where m(x) denotes the mass flux through the nearby branch of σ 0 . Now we can compute the regions and furthermore use σ 0 as initial guess of the vector field. Algorithm 2 summarizes the full procedure.
Niter−1 set ϕ j i to the solution of (10) for given fixed σ = σ j−1 , i = 1, . . . , N update regions R ε 1 , . . . , R ε N via (9) set γ j ε = min i=1,...,N (ϕ j i ) 2 + α 2 i ε 2 /β i set λ j to the solution of (7) for given fixed γ ε = γ j ε set σ j = εj ∇λ j 2γ j ε end for end function return σ Niter , ϕ Niter 1 , . . . , ϕ Niter N Note that the estimate m(x) of the flowing mass is only valid in close proximity of the support of σ 0 so that the regions R ε i are only reliable near σ 0 . However, away from σ 0 all phase fields will be close to 1 anyway so that the regions R ε i do not play a role there. The effectiveness of the initialization can be further improved by an energy rescaling which we typically perform in our simulations: Recall that the optimal width (6) of the support of the vector field σ not only depends on the transported mass m(x), but also on which phase field is active at x. Thus, initializing with some vector field σ that was computed based on preliminary active phase fields may erroneously give slight preference to incorrect phase fields. This can be avoided by a small parameter change which assigns a different ε to each phase field. Indeed, setting ε i = β i ε/α i to be the phase field parameter associated with phase field ϕ i , equation (6) shows that the support width of σ becomes mε and thus no longer depends on the phase field. Thus, in practice we usually work with the phase field cost withγ ε (x) = min i=1,...,N {ϕ i (x) 2 +α 2 i εε i /β i } = min i=1,...,N {ϕ i (x) 2 +α i ε 2 }. The Γ-convergence result can readily be adapted to this case.

Experimental results
The algorithms were implemented in MATLAB c ○ ; parameters reported in this section refer to the rescaled cost (12). We first present simulation results for a single phase field and no diffuse mass flux, N = 1 and α 0 = ∞. Figures 3  and 4 show solutions for a source and a number of equal sinks arranged as a regular polygon. If α 1 is small as in Figure 3, the solution looks similar to the Steiner tree (which would correspond to α 1 = 0), while the solutions become much more asymmetric for larger α 1 as in Figure 4. More complex examples are displayed in Figure 5. Figure 6 shows simulation results for the same source and sink configuration as in Figure 5, but this time with N = 3 different linear segments in τ and corresponding phase fields. It is clear that different phase fields become active on the different network branches according to the mass flux through each branch. This can be interpreted as having streets of three different qualities: the street ϕ 3 allows faster (cheaper) transport, but requires more maintenance than the others, while street ϕ 1 requires the least maintenance and only allows expensive transport.
The case α 0 < ∞ finally can be interpreted as the situation in which mass can also be transported off-road, that is, part of the transport may happen without a street network, thus having maintenance cost β 0 = 0, but at the price of large transport expenses α 0 per unit mass. Corresponding results for again the same source and sink configuration are shown in Figure 7. In contrast to the case α 0 = ∞ it is now also possible to have sources and sinks that are not concentrated in a finite number of points. A corresponding example is shown in Figure 8.