Geometric linearisation for optimal transport with strongly p-convex cost

We prove a geometric linearisation result for minimisers of optimal transport problems whose cost function is strongly p-convex and of p-growth. The initial and target measures are allowed to be rough, but are assumed to be close to the Lebesgue measure.


Introduction
The study of the optimal transport problem (1.1) is well established. Here λ, µ are two (finite) non-negative measures on R^d satisfying λ(R^d) = µ(R^d). We refer the reader to [26] and [23] for an introduction and an overview of the literature. When solutions take the form of a transport map π = (Id × T)_# µ, under mild assumptions, minimisers are characterised by satisfying the Euler–Lagrange equation (1.2) as well as the additional structure condition

T(x) = x + ∇c*(Dφ),   (1.3)

where φ is a c-concave function and c* denotes the convex conjugate of c. Assuming µ ∼ λ ∼ 1 and linearising the geometric nonlinearity in (1.2), that is, formally expanding det(Id + A) = 1 + tr A + …, we find that

div ∇c*(Dφ) = λ − µ.
(1.4)

Thus, at least formally, we expect solutions of (1.1) to be well approximated by solutions of (1.4). Note that in general (1.4) is a nonlinear equation. Thus we refer to the process of moving from (1.1) to (1.4) as geometric linearisation. The aim of this paper is to make this connection rigorous. We show the following:

Theorem 1.1. Let 1 < p < ∞. Suppose c : R^d → R is a strongly p-convex cost function of controlled-duality p-growth, that is, satisfying (1.11)–(1.14). Let π be a minimiser of (1.1) for some non-negative measures λ, µ satisfying λ(R^d) = µ(R^d). Denote by E(R) the local transportation cost and by D(R) the data term, defined in Section 2. Then, for every τ > 0, there exists ε(τ) > 0 such that if E(4) + D(4) ≤ ε, then there exist a radius R ∈ (2, 3), a constant c̄ ∈ R and a function φ satisfying

∫ c(x − y − ∇c*(Dφ)) dπ ≤ τ (E(4) + D(4)).   (1.5)

We remark that we explain our assumptions on the cost function in detail in Section 1.1. Theorem 1.1 states that if at some scale the local transportation cost E(R) (and the data term) are small, then at a smaller scale the transportation plan is well-approximated by ∇c*(Dφ), in the sense that estimate (1.5) holds. Note in particular that, as a consequence of (1.5) and (1.13), also

∫ |x − y − ∇c*(Dφ)|^p dπ ≲ τ (E(4) + D(4)).
Thus Theorem 1.1 makes the intuition leading to (1.4) rigorous.
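To make the formal expansion behind (1.4) explicit, here is a sketch, under the assumption (not stated explicitly in this copy) that (1.2) is the Jacobian equation of the push-forward constraint:

```latex
% Formal linearisation: assume (1.2) is the Jacobian (Monge--Ampere type) equation
%   \mu(T(x)) \det DT(x) = \lambda(x), \qquad T = \mathrm{Id} + \nabla c^*(D\phi).
% Writing A = D(\nabla c^*(D\phi)) and expanding \det(\mathrm{Id}+A) = 1 + \operatorname{tr} A + O(|A|^2),
\mu(T)\,\bigl(1 + \operatorname{tr} D(\nabla c^*(D\phi)) + O(|A|^2)\bigr) = \lambda .
% With \mu \sim \lambda \sim 1, keeping only first-order terms and using
% \operatorname{tr} Dv = \operatorname{div} v, one arrives at
\operatorname{div} \nabla c^*(D\phi) = \lambda - \mu + \text{(higher-order terms)} .
```

Dropping the higher-order terms gives precisely the linearised equation (1.4).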
Traditionally, (1.1) has been approached via the study of (1.2) using the theory of fully nonlinear elliptic equations developed by Caffarelli, see e.g. [2, 1] and the references therein. Recently, an alternative approach using variational techniques has been developed by Goldman and Otto in [9]. There, partial C^{1,α}-regularity for solutions of (1.1) in the case of Hölder-regular densities λ, µ and quadratic cost function c(x − y) = ½|x − y|² was proven. The key tool in the proof was a version of Theorem 1.1 in this setting. In later papers, continuous densities [6], rougher measures [8], more general cost functions (albeit still close to the quadratic cost) [21], as well as almost-minimisers of the quadratic cost functional [21] were considered. The quadratic version of Theorem 1.1 was also used to provide a more refined linearisation result of (1.2) in the quadratic set-up in [8], and of a similar statement in the context of optimal matching in [7]. Finally, quadratic versions of Theorem 1.1 played a key role in disproving the existence of a stationary cyclically monotone Poisson matching in dimension 2 [14].
We remark that very little information is available about the regularity of minimisers of (1.1) already in the simplest degenerate/singular case c(x − y) = |x − y|^p/p. In order to extend the techniques of [9] to this setting, an essential first step is Theorem 1.1. This result will also play a key role in extending the results of [14] to p-costs with p ≠ 2 [15].
The strategy of proof is similar to that used in [8], although with a number of simplifications. Further, we point the reader to [16], where a detailed account of the proof of Theorem 1.1 and the motivations behind the strategy are given in the quadratic case. The heart of the proof is contained in Section 7. The key insight, Lemma 7.2, is a consequence of the strong p-convexity, which allows us to estimate (up to error terms) the left-hand side of the main estimate in Theorem 1.1 by a sum of two terms: on the one hand, the difference between the local transportation energy of π and a dual form of the energy of φ (1.6), and on the other hand

∫_{{∃t : X(t) ∈ B_R}} ∫_τ^σ ⟨y − x − ∇c*(Dφ(X(t))), Dφ(X(t))⟩ dt dπ.   (1.7)

Here we write X(t) = (1 − t)x + ty and set τ = inf{t ∈ [0, 1] : X(t) ∈ B_R}, as well as σ = sup{t ∈ [0, 1] : X(t) ∈ B_R}. Lemma 7.2 replaces the quasi-orthogonality property, which was employed in the quadratic case. The quasi-orthogonality property relied on expanding the squares, which is not an available tool if p ≠ 2. From formal calculations, which we make rigorous in Lemma 7.5, (1.7) will be small if φ solves the Neumann problem (1.8), where f_R, g_R are functions tracking the location of X(τ) and X(σ), respectively. We formally define f_R, g_R in (2.2). However, as written, the problem is not well-posed and solutions do not possess sufficient regularity to carry out the necessary estimates. Hence, we will actually work with approximations of f_R and g_R, which we construct in Section 5.
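For orientation, the "expanding the square" computation that is available only in the quadratic case reads:

```latex
% Quasi-orthogonality in the quadratic case: with c(z) = \tfrac12 |z|^2,
\tfrac12\,|x - y - \nabla\phi|^{2}
  = \tfrac12\,|x - y|^{2} - \langle x - y, \nabla\phi\rangle + \tfrac12\,|\nabla\phi|^{2},
% so the cross term can be handled via the equation satisfied by \phi.
% For strongly p-convex costs with p \neq 2 no such exact expansion exists;
% Lemma 7.2 substitutes the strong p-convexity inequality for it.
```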
Controlling the error made in this approximation will require us to enlarge the domain on which we work and to choose a suitable radius R ∈ [2, 3]. This explains the presence of R in the formulation of Theorem 1.1. The estimate of (1.7) is finally carried out in Lemma 7.5 in Section 7.
The idea in estimating (1.6) is to relate the first term to the value of a localised optimal transportation problem. Then an appropriate competitor can be constructed using φ in order to estimate (1.6). We carry out the first step in Section 4, while the second is obtained in Lemma 7.4 in Section 7.
In order to carry out the estimates, both of (1.6) and of (1.7), two ingredients are essential. We require elliptic regularity estimates, which follow from strict p′-convexity of c*. We explain how to obtain these in Section 1.2. In the quadratic case, the relevant equation is the Laplace equation. Hence solutions are harmonic and therefore very regular: the proof in [8] requires C³-regularity of solutions! Already in the case c(x − y) = |x − y|^p/p with p ≠ 2, the best regularity known for solutions to (1.4) in general is C^{1,β}-regularity for some β > 0 [25, 18, 24]. Thus, at various places in the proof, more careful estimates are needed.
The second ingredient is an L∞-bound for minimisers of (1.1) in the small-energy regime, see Section 3. In the quadratic case, this relies on the monotonicity (in the classical sense) of solutions. In the non-quadratic case, c-monotonicity needs to be used directly. Focusing on p-homogeneous convex cost functions with p > 1, L∞-bounds were obtained in [10]. Note that [10] obtained L∞-bounds in all energy regimes, whereas the L∞-bounds obtained in this paper only cover the small-energy regime. A further difference is that in this paper we obtain the L∞-bound as a consequence of the strong p-convexity of the energy, whereas [10] relies on the homogeneity of the cost. Nevertheless, in the small-energy regime and for cost functions covered by the assumptions both of this paper and of [10], the obtained bounds agree. In particular, this is the case for the important example of the p-cost c(x − y) = (1/p)|x − y|^p with p > 1. Finally, we comment on why we restrict our attention to cost functions of the form c(x − y). This is due to the fact that our proof relies on the availability of a dynamical formulation. As (1.7) hints, we want to identify points (x, y) ∈ spt π with the trajectory X(t) = tx + (1 − t)y. This is related to the Benamou–Brenier formulation of optimal transport, cf. [3], which in our case states that (1.1) can alternatively be characterised as the minimisation problem (1.10). Here dj_t/dρ_t denotes the Radon–Nikodym derivative. We refer the reader to (2.2) for an explanation of how to make sense of (1.10) rigorously. This alternative, dynamical formulation of optimal transport is only available for costs of the form c(x − y) with c convex.
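As a reference point, the dynamical formulation alluded to in (1.10) can plausibly be written as follows; this is a sketch consistent with the Radon–Nikodym notation dj_t/dρ_t used above, while the precise duality-based definition is the one in (2.2):

```latex
% Benamou--Brenier formulation for costs of the form c(x-y), c convex:
\min\Bigl\{ \int_0^1 \!\!\int_{\mathbb{R}^d}
    c\Bigl(\frac{dj_t}{d\rho_t}\Bigr)\, d\rho_t \, dt
  \;:\; \partial_t \rho_t + \operatorname{div} j_t = 0,
  \ \rho_0 = \lambda,\ \rho_1 = \mu \Bigr\},
% where the continuity equation is understood distributionally and
% dj_t/d\rho_t is the Radon--Nikodym derivative of the flux w.r.t. the density.
```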
The outline of the paper is as follows: in the remainder of this section, we make precise our assumptions on the cost function (Section 1.1) and collect the elliptic regularity statements we require (Section 1.2). After fixing notation and recalling some elementary results on optimal transportation in Section 2, we collect the statements required to prove Theorem 1.1 in Section 7: in Section 3, we prove the L∞-estimate. In Section 4, we prove a localisation result for optimal transportation costs. Next, we approximate the boundary data of (1.8) in Section 5. For technical reasons, we also need to localise the data term D, which we do in Section 6.

Assumptions on the cost function and its dual
In this section, we explain our assumptions on the cost function c. In order to keep the statements of our results short, the conditions (1.11)–(1.14) below will be assumed to hold throughout the entire paper. We consider cost functions modelled on the p-cost, c(x) = |x|^p with p > 1. This is also the primary example of cost functions we have in mind. We emphasise, however, that the cost functions we consider need not be homogeneous. In fact, the assumptions we impose are standard within elliptic regularity theory, see e.g. [5, 20]. Let p ∈ (1, ∞). We consider a C¹ cost function c : R^d → R satisfying the following properties: there is Λ ≥ 1 such that (i) c is strongly p-convex: for any x, y ∈ R^d and τ ∈ [0, 1], (1.11) holds; (ii) c has p-growth, in the sense of (1.12)–(1.14). If the choice of p is clear from the context, we will write V = V_p and U = U_p. We further note the following inequality, valid for any z₁, z₂, z₃ ∈ R^d and with implicit constants depending only on p and d: (1.15). Estimate (1.15) follows from writing the difference as an integral; here ∇₂ denotes a derivative with respect to the second variable of V_p. Elementary calculations then give (1.15). (1.11) and the fact that c is C¹ imply that for any x, y ∈ R^d,

c(x) ≥ c(y) + ⟨∇c(y), x − y⟩ + λV(x, y),   (1.16)

⟨∇c(x) − ∇c(y), x − y⟩ ≥ λV(x, y).   (1.17)

This can be seen by arguing as in the 2-convex, 2-growth case, which can be found for example in [12, Chapter IV.4.1]. We require some information on the convexity properties of the convex conjugate; this follows by adaptations of the 2-convex, 2-growth theory in [22] and [13]. For the convenience of the reader, we provide proofs of the statements we require. Introduce the convex conjugate c*, defined by c*(ξ) = sup_{x ∈ R^d} (⟨ξ, x⟩ − c(x)). We remark that since c is strongly convex, C¹ and superlinear at infinity, c* is strongly convex, C¹ and superlinear at infinity. Moreover, ∇c and ∇c* are homeomorphisms of R^d and ∇c* = (∇c)⁻¹ [22, Theorem 26.5]. Note that due to (1.12), we have an upper bound (1.18) for c*; a lower bound can be obtained similarly, and we deduce a two-sided bound. Due to strict
p-convexity of c, c* satisfies (1.19) for any ξ₁, ξ₂ ∈ R^d. Indeed, for ξ₁, ξ₂ ∈ R^d, using (1.16) with the choice x = ∇c*(ξ₁), y = ∇c*(ξ₂), together with Cauchy–Schwarz and re-arranging, we have (1.20). We claim that for any ξ ∈ R^d, (1.21) holds; then (1.20) gives (1.19). Since ∇c* = (∇c)⁻¹ and both maps are homeomorphisms, to show (1.21) it suffices to show that for any x ∈ R^d, (1.22) holds. Since difference quotients of convex functions are non-decreasing, for any h ∈ R^d, applying also (1.13), we obtain an upper bound. Applying this with h replaced by th and choosing t such that |th| = |x|, as h was arbitrary, this gives the second inequality in (1.22). Note that in particular ∇c(0) = 0. Thus, using (1.11) with y = 0 and Cauchy–Schwarz gives a lower bound; after re-arranging, this proves the first inequality in (1.22). We also require that c* is p′-convex, that is, for some C(p, Λ) > 0, (1.24) holds. Indeed, we can use Taylor's theorem and (1.14) to obtain an expansion for any x, y ∈ R^d. In order to estimate the integral, we used a well-known estimate, see e.g. [11, 4]. Recall the Fenchel–Young inequality in the form ⟨ξ, x⟩ ≤ c(x) + c*(ξ), with equality if and only if x = ∇c*(ξ). Hence, with the choice x = ∇c*(ξ₁), (1.25) gives (1.27). Note that the supremum is nothing but CV_p(∇c*(ξ₁), ∇c*(ξ₂)), and arguing case by case as for (1.18), this shows a corresponding bound. In particular, recalling (1.21), we deduce a further bound. Employing the fact that for any convex f : R^d → R and any x, ξ ∈ R^d it holds that (f(· − x))*(ξ) = f*(ξ) + ⟨x, ξ⟩, we finally conclude. Combining this estimate with (1.27) gives (1.24).
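A worked example may help: for the model p-cost, all of the objects above are explicit (a standard computation, not spelled out in the paper):

```latex
% Worked example: the p-cost. For c(z) = \tfrac1p |z|^p, p \in (1,\infty),
% the convex conjugate is computed from c^*(\xi) = \sup_z (\langle \xi, z\rangle - c(z)):
c^*(\xi) = \tfrac{1}{p'} |\xi|^{p'}, \qquad \tfrac1p + \tfrac1{p'} = 1,
\qquad \nabla c^*(\xi) = |\xi|^{p'-2}\,\xi .
% In particular c^* has p'-growth and \nabla c^* = (\nabla c)^{-1},
% since \nabla c(z) = |z|^{p-2} z.
```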

Regularity assumptions on the dual system
Let R ∈ (2, 3). In this section, we state the regularity assumptions we make on distributional solutions φ ∈ W^{1,p′}(B) of the equation (1.28), where g ∈ L^p(∂B) and c_g satisfies the compatibility condition |B|c_g = ∫_{∂B} g. Here ν denotes the outward-pointing normal vector on ∂B. We will show that solutions exist and are unique up to a constant. Hence, we usually normalise solutions by requiring that ∫_B φ = 0. Fixing g ∈ L^p(∂B), we denote by φ_r the solution of (1.28), satisfying ∫_B φ_r = 0, with data g_r, where g_r denotes the convolution of g with a smooth convolution kernel on ∂B at scale r.

Lemma 1.2. Suppose c satisfies (1.11)–(1.14). Then solutions φ to (1.28) exist, are unique up to a constant, and the following statements hold: (i) φ satisfies the energy estimates (1.30) and (1.31). (ii) The regularity estimate (1.32) holds. (iii) The difference between φ and φ_r is controlled: there exists s = s(n, p) > 0 such that (1.33) holds. (iv) Dφ_r is Hölder-regular up to the boundary: for any β ∈ (0, 1), (1.34) holds.

Proof. Note that in light of the results of Section 1.1, c* is p′-convex and satisfies controlled p′-growth. Hence, the statements we need to prove are largely standard.
The existence and uniqueness up to a constant of solutions follow from the direct method. Testing the weak formulation of (1.28) with φ and applying (1.24) in combination with Hölder's inequality, the trace estimate and the Poincaré inequality (recall that ∫_B φ = 0), then re-arranging, gives (1.30). Using (1.12) and (1.21), (1.30) implies the corresponding bound (1.31).
(1.32) is proven in [19] and [17]. As the proofs are quite involved, we do not comment on them here. Instead, we turn to (1.33). We focus on the case p′ ≤ n, as the other case is easier. Testing the equations for φ and φ_r with φ − φ_r and applying (1.24) and Hölder's inequality, we obtain a first bound. By a standard trace estimate and Poincaré's inequality, together with standard properties of convolution, combining estimates and re-arranging concludes the proof. If p′ ≤ 2, we apply Hölder's inequality; since the additional factor is bounded due to (1.30) and standard properties of convolution, this again gives (1.33).
(1.34) follows from [19] and [17]. Again, the proof is involved and we do not comment on it here.

General notation
Throughout, we let 1 < p < ∞. B_r(x) denotes the ball of radius r > 0 centred at x ∈ R^d. We further write B_r = B_r(0) and B = B_1(0). c denotes a generic constant that may change from line to line. Relevant dependencies on Λ, say, will be denoted by c(Λ). We write a ≲ b and a ≳ b if there exists a constant c > 0 depending only on d, p and Λ such that a ≤ cb and a ≥ cb, respectively.
A set S ⊂ R^d × R^d is c-cyclically monotone if for any N ∈ N and any (x₁, y₁), …, (x_N, y_N) ∈ S,

Σ_{i=1}^N c(x_i − y_i) ≤ Σ_{i=1}^N c(x_i − y_{i+1}),

where we identify y_{N+1} = y₁. A function f : R^d → R is called c-concave if there exists a function g : R^d → R ∪ {−∞} such that

f(x) = inf_{y ∈ R^d} ( c(x − y) − g(y) )   for all x ∈ R^d.

Optimal transportation
We recall some definitions and facts about optimal transportation; see [26] for more details. For this subsection, the full strength of our assumptions (1.11)–(1.14) is not needed. In fact, assuming that c is lower semi-continuous, convex and satisfies the p-growth condition (1.12) would be sufficient.
Given a measure π on R^d × R^d, we denote its marginals by π₁ and π₂, respectively. The set of measures on R^d × R^d with marginals π₁ and π₂ is denoted by Π(π₁, π₂). Given two positive measures λ and µ with compact support and equal mass, we define

W_c(λ, µ) = min { ∫ c(x − y) dπ : π ∈ Π(λ, µ) }.

While our notation is reminiscent of the Wasserstein distance, and in fact gives the (p-th power of the) Wasserstein p-distance in the case c(x − y) = |x − y|^p, in general it is not a distance on measures. Under our hypotheses, an optimal coupling always exists; moreover, a coupling π is optimal if and only if its support is c-cyclically monotone.
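For discrete measures, the quantity W_c above is a finite linear programme, which makes it easy to experiment with. Below is a minimal sketch (assuming NumPy and SciPy are available; the function name `ot_plan` and the uniform weights are our choices for illustration, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

def ot_plan(x, y, p):
    """Solve the discrete optimal transport LP with cost c(x - y) = |x - y|^p / p
    between uniform measures on the 1-d point clouds x and y."""
    n, m = len(x), len(y)
    C = np.abs(np.subtract.outer(x, y)) ** p / p  # cost matrix c(x_i - y_j)
    # marginal constraints: each row of the plan sums to 1/n, each column to 1/m
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0       # row-sum constraint for x_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0                # column-sum constraint for y_j
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(n, m), res.fun

# tiny example: three sites each, slightly shifted
plan, cost = ot_plan(np.array([0.0, 1.0, 2.0]),
                     np.array([0.05, 1.05, 2.05]), p=3)
```

For a strictly convex one-dimensional cost such as this one, the optimal coupling is the monotone (here: diagonal) one, consistent with the c-cyclical monotonicity criterion above.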
Moreover, we note the following triangle-type inequality:

Lemma 2.1. Let ε ∈ (0, 1). There is C(ε) > 0 such that for any admissible measures µ₁, µ₂, µ₃ it holds that W_c(µ₁, µ₃) ≤ (1 + ε) W_c(µ₁, µ₂) + C(ε) W_c(µ₂, µ₃).

Proof. By the gluing lemma, see e.g. [23, Lemma 5.5], there exists σ, a positive measure on R^d × R^d × R^d, with marginal π₁ on the first two variables and marginal π₂ on the last two variables. Here π₁ and π₂ are the optimal couplings, with respect to W_c, between µ₁ and µ₂, and between µ₂ and µ₃, respectively. Set γ to be the marginal of σ with respect to the first and third variables. Then γ ∈ Π(µ₁, µ₃). It follows, using the convexity of c and the triangle inequality in L^p(γ), that a corresponding pointwise estimate holds for any t ∈ (0, 1). Using (1.12) and recalling the definition of γ, we deduce the claim. Choosing t sufficiently close to 1 gives the desired estimate.
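The convexity step in the proof can be sketched as follows (the displays are elided in this copy, so the constants here are indicative only):

```latex
% Splitting x - z along the intermediate point y, for t \in (0,1):
c(x - z) = c\!\left( t\,\tfrac{x-y}{t} + (1-t)\,\tfrac{y-z}{1-t} \right)
  \le t\, c\!\left(\tfrac{x-y}{t}\right) + (1-t)\, c\!\left(\tfrac{y-z}{1-t}\right).
% Integrating against the glued measure \sigma and using the p-growth (1.12)
% to absorb the factor (1-t)^{1-p} into C(\varepsilon), one obtains, for t
% close to 1, an estimate of the form
W_c(\mu_1, \mu_3) \le (1+\varepsilon)\, W_c(\mu_1, \mu_2)
  + C(\varepsilon)\, W_c(\mu_2, \mu_3).
```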
We require also the following consequence of Lemma 2.1.
We remark that the Benamou–Brenier formula (1.10) needs to be interpreted via duality in general; we define it accordingly. We recall the definitions of the quantities E and D that we use to measure smallness. We will find it convenient to work with trajectories X(t) = tx + (1 − t)y. In this context, it is useful to work on the domain Ω_R of pairs whose trajectory X meets B_R. To every trajectory X ∈ Ω, we associate entering and exiting times of B_R:

τ_R = inf{t ∈ [0, 1] : X(t) ∈ B_R},   σ_R = sup{t ∈ [0, 1] : X(t) ∈ B_R}.

Often, we will drop the subscripts and write Ω = Ω_R, σ = σ_R and τ = τ_R. Further, we will need to track trajectories entering and leaving B_R. This is achieved through the non-negative measures f_R and g_R, concentrated on ∂B_R and defined by the relations (2.2). Thus, the integrals in (2.2) are well-defined. We will often use similar observations without further justification.

Estimating radial projections
We record a technical estimate concerning radial projections we will require.
Here ĝ is the radial projection of g defined in (2.1).
Proof. By scaling, we may assume R = sup g = 1. The first inequality is then a direct consequence of Jensen's inequality.
For the second inequality, note that if ε ≪ 1, sup We conclude that for ω ∈ ∂B 1 , The last inequality holds, since the minimiser of

An L∞-bound on the displacement
A key point in our proof will be that trajectories do not move very much. Since we assume E(4) ≪ 1, this is evidently true on average. However, we will need to control the length of trajectories not just on average, but in a pointwise sense. We establish this result in this section. In the quadratic case, the proof in [9] relies on the fact that 2-monotonicity is equivalent to standard monotonicity. In our setting this is not available, and we hence provide a different proof, which relies heavily on the strong p-convexity of c.

Lemma 3.1. Let 1 < p < ∞. Let π be a coupling between two admissible measures λ and µ. Assume that Spt π is cyclically monotone with respect to the c-cost and that E(4) + D(4) ≪ 1. Then for every (x, y) ∈ Spt π ∩ #₃, we have the pointwise bound (3.1). As a consequence, for (x, y) ∈ Spt π and t ∈ [0, 1], (3.2) holds.

In the proof of Lemma 3.1 we require the following technical result, which we state independently, as we will require it again later.

In case α = 1, (3.3) holds with C^{0,1} replaced by C¹. Further, a second estimate holds under an additional assumption.

Proof. Integrate the estimate against an optimal transport plan π between µ and κ_{µ,R} dx⌞B_R to find the first claim. Applying Hölder's inequality and using (1.12), the result follows.
To obtain the second estimate, we proceed similarly, but start from a different pointwise estimate. The result follows using (1.12) and using Hölder's inequality. We proceed to prove Lemma 3.1.
Proof of Lemma 3.1. Fix (x, y) ∈ Spt π ∩ #₃. Without loss of generality, we may assume that (x, y) is suitably normalised.

Step 1. Barrier points exist in all directions: In this step we show that in all directions we may find points (x′, y′) ∈ Spt π with x′ ≈ y′. To be precise, consider an arbitrary unit vector n ∈ R^d and let r > 0. We show that for any n, and all r ≪ 1, there is a pair (x′, y′) ∈ Spt π with the desired properties. Assume, for contradiction, that for any M > 0 there are n ∈ R^d and r > 0 such that the desired property fails for all (x′, y′) ∈ Spt π. However, due to Lemma 3.2, and noting κ_{µ,4} ∼ 1, we obtain a lower bound on the mass near x + 2rn. Normalising η such that ∫_{B_r(x+2rn)} η dx ∼ r^d, we can guarantee κ_{µ,4} ∫ η dx ∼ κ_{µ,4} r^d ∼ r^d.
Ensuring D(4) ≪ r^{p+d}, so that the data term does not interfere, and since M was arbitrary, this is a contradiction.
Step 2. Building barriers: In this step, we show that trajectories are confined if we are given suitable barrier points. Without loss of generality, we may assume that x′ − x points in the e_n direction. Moreover, considering the cost c(·) − c(x), we may assume that c(x) = 0. Suppose, for a contradiction, that the desired inclusion fails for some α, ρ > 0 to be determined. Here Γ = {t(x′ − x) : t ≥ 0} and ā denotes the orthogonal projection of a point a ∈ R^{d−1} × R₊ onto Γ. We want to show that this contradicts the c-monotonicity of π, which proves the stated claim.
We note that we may assume x = 0. Indeed, setting z with |z| ≤ M E r d and z ∈ C 0,z , which we recognise as precisely the situation we are in if x = 0.
Taking ρ ≥ 4, we then estimate, using the growth assumption (1.13) and the strict convexity assumption (1.11). In particular, it suffices to establish one further estimate. Thus, choosing α, ε > 0 sufficiently small, we find that (3.5) holds, proving our claim.
We record two consequences of Lemma 3.1 we will use later.

Corollary 3.3. Under the assumptions of Lemma 3.1, it holds that
Proof. We use Lemma 3.1 to deduce that there is C > 0 such that the first estimate holds. Further, again using Lemma 3.1, there is C > 0 such that the second estimate holds.

A localisation result
In order to prove Theorem 1.1, we need to use optimality in a localised way, as the quantity we need to estimate is a local quantity. In general, given a minimiser π of optimal transport with cost function c between two measures λ and µ, it is not true that the localised transport cost of π is approximately equal to the optimal transport cost between the localised measures λ⌞B_R and µ⌞B_R. However, if we take into account the entry points of trajectories entering B_R (which we denoted f_R, cf. (2.2)) and the exit points of trajectories exiting B_R (which we denoted g_R, cf. (2.2)), the values are close, as we show in the next lemma.

Lemma 4.1. Let λ, µ be admissible measures. Suppose π ∈ Π(λ, µ) minimises (1.1). Let R ∈ [2, 3] and define f_R, g_R as in (2.2). Then for any τ, δ > 0, there is ε > 0 such that if E(4) + D(4) ≤ ε, then the localisation estimate below holds.

Proof. Introduce the weakly continuous family of probability measures {λ_z}_{z∈∂B_R} such that (4.1) holds for every test function. Let π′ be an optimal plan for the localised problem. In order to see that π′ ∈ Π(λ, µ), by symmetry it suffices to check that its first marginal is λ; hence we test (4.1) against ζ(x). We begin by noting that, due to the definition of µ_w and using that π′ is supported in B_R, a first identity holds. Similarly, using also the definition of f_R, a second identity holds. In particular, we have shown that π′ is admissible. As in the proof of Lemma 2.1, for any δ > 0 there is C_δ > 0 such that a pointwise triangle-type inequality holds for any x, y, z. Using this in combination with the fact that λ_z, µ_w are probability measures, we deduce an intermediate bound; to obtain the second-to-last line, we used the admissibility of π′. Now note, on the one hand, that due to the optimality of π, a lower bound holds; on the other hand, a matching upper bound holds for the localised problem. Thus, we have shown the claimed estimate; to obtain the last line, we used Corollary 3.3. Choosing ε sufficiently small, the result follows.

Approximating the boundary data
Before we can implement the c*-harmonic approximation, we face another problem. Lemma 4.1 suggests that the c*-harmonic function φ we should use in Theorem 1.1 is given as a solution of the Neumann problem (1.8). However, f_R, g_R, as well as λ, µ, are not sufficiently smooth for φ to make sense as a weak solution, and we will not be able to apply the regularity results of Lemma 1.2 as it stands. Hence, we will approximate f_R, g_R by suitable L^p(∂B_R)-functions f̄_R and ḡ_R and will replace λ, µ by suitable approximations. This approximation is given by the following result. Here f_R, g_R are the functions defined in (2.2).
Proof. By symmetry, it suffices to focus on the terms involving g. We begin by constructing ḡ_R. Let π₀ be optimal for W_c(µ⌞B₄, κ_{µ,4} dz⌞B₄); extend π₀, which is supported on B₄ × B₄. Let π be the minimiser of W_c(λ, µ). Then define the composed plan π̄ on R^d × R^d × R^d by the formula

∫ ζ(x, y, z) π̄(dx dy dz) = ∫∫ ζ(x, y, z) π₀(dz|y) π(dx dy),

valid for any test function ζ. We note that with respect to the (x, y) variables π̄ has marginal π, while with respect to the (y, z) variables π̄ has marginal π₀.
Fix R ∈ [2, 3]. Extend a trajectory X ∈ Ω in a piecewise affine fashion for t ∈ [1, 2]. Note that the distribution g′ of the endpoints of those trajectories that exit B_R during the time interval [0, 1] is given by (5.1). Note that due to Lemma 3.1, y = X(1) ∈ B₄ for any trajectory X that contributes to (5.1). Since the composed plan transports no mass from B₄ to B₄^c, we deduce that also z = X(2) ∈ B₄ and hence that g′ is supported in B₄. In particular, we may estimate, for any ζ ≥ 0, using the bound on the second marginal, that g′ has a density, still denoted g′, satisfying g′ ≤ κ_{µ,4}. This allows us to conclude the construction of ḡ_R. We now turn to establishing the claimed estimates for ḡ_R. Note that, directly from the definitions of the composed plan, g′ and g_R, we obtain an admissible plan for the transport between g_R and g′. Indeed, due to the definition of the composed plan and the definition of g_R in (2.2), a first identity holds for any test function ζ; on the other hand, using (5.1) and the definition of ḡ_R, a second identity holds. In particular, using the p-growth of c (1.12), and then once again the p-growth of c (1.12) together with Corollary 3.3, we obtain a bound in terms of E(4) and D(4). Choosing ε sufficiently small, the first estimate holds. Noting sup g′ ≤ κ_{µ,4} ≲ 1, in order to prove the second inequality, it suffices to prove a corresponding bound and to apply Lemma 2.3. The condition on the support of g in Lemma 2.3 applies due to Lemma 3.1. Note that, by the definition of g′, a chain of estimates holds. In order to obtain the second line, we observed that X(τ) ∈ ∂B_R; in addition, we noted that the extended trajectories are piecewise affine. The second-to-last line was obtained by applying Young's inequality. This concludes the proof.

Restricting the data
As we see from Lemma 5.1, we will not be able to work on B₄ directly, but will have to pass to a smaller ball B_R with some suitably chosen R ∈ [2, 3]. Hence, we need to control D(R) for a well-chosen R, while at the moment we only control D(4). Unfortunately, this does not follow immediately from the definition, but requires a technical proof utilising ideas of the previous sections. The outcome of these considerations is the following lemma:

Lemma 6.1. For any non-negative measure µ there is ε > 0 such that if D(4) ≤ ε, then D(R) is controlled in terms of D(4).

Proof. Fix R ∈ [2, 3]. In this proof, π will denote the optimal transference plan for the relevant localised problem. Introduce the measures f′ and g′, which record where exiting and entering trajectories end up, by asking that the defining relations hold for all test functions ζ. Introduce the mass densities κ_f and κ_g. We use Lemma 2.1 to split the quantity to be estimated into three terms I, II and III. Restricting π to trajectories that start in B_R gives an admissible plan for I; consequently, I is controlled. Since II will be estimated in the same way as III, but is slightly more tricky, we first estimate III. In order to estimate III, introduce the projection ĝ of g′ onto ∂B_R via (2.1). Using Lemma 2.1, we deduce a further splitting. Regarding the first term, we claim that it is controlled. Indeed, an admissible density-flux pair (ρ, j) for the Benamou–Brenier formulation (1.10) is given in terms of a solution φ of an auxiliary Neumann problem. We find, writing s = κ_{µ,4} − κ_f + tκ_g, for any ζ supported in B_R, the required identity; to obtain the second line, we used the Fenchel–Young inequality. Assuming |s − 1| ≪ 1 for now, using the p-growth of c (1.12), it is straightforward to conclude; to obtain the last inequality, we used the energy inequality (1.31). We now estimate ∫(ĝ)^p and W_c(ĝ, g′), arguing exactly as in the proof of Lemma 5.1. To deduce that |s − 1| ≪ 1 and to conclude the estimate of III, it suffices to show κ_f^p + κ_g^p ≲ D(4). By symmetry, it suffices to consider κ_g. Since ĝ is supported on ∂B_R, using Young's inequality, we find the required bound in terms of D(4). This concludes the estimate for III. It remains to estimate II. Using the subadditivity of W_c, we have a corresponding splitting. We want to proceed exactly as we did in
the estimate for III, the only delicate issue being that we do not have κ_{µ,4} dx⌞B_R − f′ ≥ c > 0, which is necessary in (6.2). However, this can be remedied by using Corollary 2.2 to deduce a lower bound. Note that since κ_{µ,4} dx⌞B_R − f′ ≥ 0, and using (6.3), after choosing ε sufficiently small, the required bound holds; to obtain the last inequality, we used the definition of D and chose ε sufficiently small. This allows us to proceed using the same argument as for III, concluding the estimate of II. This completes the proof.
The c*-harmonic approximation result

The goal of this section is to prove the c*-harmonic approximation result. With the results of the previous sections in hand, we can give a more precise version of Theorem 1.1. In particular, we can make explicit the problem that φ solves.
To this end, in light of Lemma 5.1 and Lemma 6.1, given τ > 0, we fix R ∈ (2, 3) such that there exist non-negative f̄_R, ḡ_R with the properties stated there. Then let φ be a solution, with ∫_{B_R} φ dx = 0, of (7.1), where c̄ is the constant such that (7.1) is well-posed. We emphasise that, while we do not make the dependence explicit in our notation, φ depends on the choice of the radius R.
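Since the display (7.1) is elided in this copy, we record a plausible form of the Neumann problem, reconstructed from the linearisation (1.4) and the boundary data f̄_R, ḡ_R; the signs and the placement of the constant c̄ are our guesses, not the paper's statement:

```latex
% Hypothetical reconstruction of (7.1); compare (1.4) and (1.8).
\begin{cases}
  \operatorname{div} \nabla c^{*}(D\phi) = \lambda - \mu + \bar c & \text{in } B_R,\\[2pt]
  \nabla c^{*}(D\phi)\cdot \nu = \bar g_R - \bar f_R & \text{on } \partial B_R,
\end{cases}
% with \bar c chosen so that the data are compatible:
% \int_{B_R} (\lambda - \mu + \bar c) = \int_{\partial B_R} (\bar g_R - \bar f_R).
```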
With this notation in place, we state a precise version of our main result. The proof will be a direct consequence of the lemmata we prove in the following subsections. We begin in Section 7.1 by using strong p-convexity in order to bound a quantity related to the left-hand side of the estimate in Theorem 7.1 by a difference of energies, as well as two error terms. The first error term arises from the approximation of the boundary data, while the second comes from passing to the perspective of trajectories. We construct a competitor to estimate the difference of energies and estimate the two error terms in Section 7.2. Collecting estimates, we conclude the proof of Theorem 7.1 in Section 7.3.

Error estimates
We would like to apply Lemma 7.2 with φ solving (7.1). In order to do so, we require φ ∈ C¹(B_R). However, note that ḡ_R and f̄_R will in general not be sufficiently smooth to ensure that φ ∈ C¹(B_R). Thus, we approximate them using mollification. To be precise, let 0 < r ≪ 1 and denote by f_R^r and g_R^r, respectively, the convolutions of f̄_R and ḡ_R with a smooth convolution kernel (on ∂B_R) at scale r. Set φ^r to be the solution of (7.2) with ∫_{B_R} φ^r = 0, where c̄_r is the constant such that (7.2) is well-posed. We begin by showing that replacing f̄_R and ḡ_R with f_R^r and g_R^r, respectively, is not detrimental to the left-hand side of the estimate in Lemma 7.2.

Lemma 7.3. For every τ > 0 there exist ε(τ), C(τ), r₀(τ) > 0 such that if E(4) + D(4) ≤ ε(τ) and 0 < r ≤ r₀, then there exists R ∈ [2, 3] such that if φ solves (7.1) and φ^r solves (7.2), then

∫_{Ω_{3/2}} ∫_{τ_{3/2}}^{σ_{3/2}} V(Ẋ(t), ∇c*(Dφ^r(X(t)))) dt dπ ≲ τ (E(4) + D(4)).
We now turn to estimating each of the three terms on the right-hand side of the estimate in Lemma 7.2 in turn. We will see that the second and third terms are errors that arise from the approximation of the boundary data and from passing to the perspective of trajectories, respectively. Accordingly, estimating them will be essentially routine. In contrast, estimating the first term requires us to construct an appropriate competitor to π.

Lemma 7.4. For every τ > 0 there exist ε(τ), C(τ), r₀(τ) > 0 such that if E(4) + D(4) ≤ ε(τ) and 0 < r ≤ r₀, then there exists R ∈ [2, 3] such that if φ^r solves (7.2), then the first term is controlled.

Proof. We note, in the case p ≤ 2, using the p-growth of c (1.13), the (p′ − 1)-growth of ∇c* (1.19) and Hölder's inequality, a first bound of order τ (E(4) + D(4)).
To obtain the last line, we used the elliptic estimates (1.30) and (1.33). In case p ≥ 2, a similar estimate holds by the same argument. Due to the localisation result of Lemma 4.1 and the L∞-bound in the form of Corollary 3.3, a further estimate holds. In particular, combining the previous two estimates and choosing r sufficiently small, it suffices to prove (7.3). Using Lemma 2.1, we obtain, for δ ∈ (0, 1) to be fixed, a splitting. Noting that, due to the definition of D and our choice of R, the data terms are controlled, we claim that (7.3) holds for some C > 0. Collecting estimates, choosing first δ and r small, then ε small, once (7.3) is established, the proof is complete. Establishing (7.3) is easy using the Benamou–Brenier formulation (1.10). For t ∈ [0, 1], introduce the non-singular, non-negative measure ρ_t and the vector-valued measure j_t. Note that (ρ_t, j_t) is admissible; here the right-hand side needs to be interpreted in the sense of (2.2). Since j_t is supported in B_R, it suffices to consider ζ supported in B_R. Then, by the definition of (j_t, ρ_t) and the Fenchel–Young inequality, an estimate holds for any s > 0. Choosing s = tκ_{µ,R} + (1 − t)κ_{λ,R} and integrating in t, we deduce the claim. Thus the proof of (7.3) is complete.

Proof of Theorem 7.1
We are now ready to prove Theorem 7.1.