1 Introduction

In this paper, we study the density of Lipschitz functions in Sobolev spaces when X is complete and separable, and \(\mu \) is any Radon measure on X which is positive and finite on balls. We consider the so-called Newton-Sobolev space \(N^{1,p}(X)\) defined in [35] (see also [3, 21]), which for \(p>1\) coincides with the one introduced independently in [6]. A function f is in \(N^{1,p}(X)\) if \(f\in L^p(X)\) and if it has an upper gradient \(g\in L^p(X)\); see definition (2.1). Associated to each f there is a minimal p-weak upper gradient \(g_f \in L^p(X)\), which plays the role of the norm of a gradient. We will give precise definitions of these and the following notation in Sect. 2.

Our main result proves density in energy or, rather, produces a sequence of Lipschitz functions which converges in energy. A sequence of functions \((f_i)_{i\in {{\mathbb {N}}}}\) with \(f_i\in N^{1,p}(X)\) converges to \(f\in N^{1,p}(X)\) in energy if the functions \(f_i\) converge to f in \(L^p(X)\) and their minimal p-weak upper gradients \(g_{f_i}\) converge to \(g_f\) in \(L^p(X)\). Our sequences of functions \(f_i\) will be Lipschitz functions with bounded support, that is, \(f_i \in {{\,\textrm{LIP}\,}}_b(X) \subset N^{1,p}(X)\). Our argument in fact shows more than the convergence of the minimal p-weak upper gradients. It shows that the asymptotic Lipschitz constants converge;

$$\begin{aligned} {{\,\textrm{lip}\,}}_a[f](x):=\lim _{r\rightarrow 0} \sup _{a\ne b\in B(x,r)} \frac{|f(a)-f(b)|}{d(a,b)}, \end{aligned}$$

and \({{\,\textrm{lip}\,}}_a[f](x) :=0\), if x is an isolated point. Since \({{\,\textrm{lip}\,}}_a[f]\) is an upper gradient, and since \(g_f\) is minimal, it follows that \({{\,\textrm{lip}\,}}_a[f]\ge g_f\) for any \(f\in {{\,\textrm{LIP}\,}}_b(X)\). Roughly, it is more difficult to ensure that \({{\,\textrm{lip}\,}}_a[f_i]\) converges, since it is larger. More precisely, from Lemma 2.9, it follows that convergence of \({{\,\textrm{lip}\,}}_a[f_i]\) implies convergence in energy, and is thus a stronger statement.

Theorem 1.1

Let X be a complete and separable metric space, let \(p\in [1,\infty )\) and let \(\mu \) be a Radon measure which is positive and finite on balls. If \(f\in N^{1,p}(X)\), then there exists a sequence \(f_i\in {{\,\textrm{LIP}\,}}_b(X)\subset N^{1,p}(X)\) so that the following properties hold.

  1. (1)

    The functions \(f_i\) converge in \(L^p(X)\) to f, that is

    $$\begin{aligned} \lim _{i\rightarrow \infty } \int _X |f_i-f|^p d\mu =0. \end{aligned}$$
  2. (2)

    The asymptotic Lipschitz constants \({{\,\textrm{lip}\,}}_a[f_i]\) and the minimal p-weak upper gradients \(g_{f_i}\) of \(f_i\) converge to the minimal p-weak upper gradient \(g_f\) of f in \(L^p(X)\), that is

    $$\begin{aligned} \lim _{i\rightarrow \infty } \int _X |g_{f_i}-g_f|^p d\mu = \lim _{i\rightarrow \infty } \int _X |{{\,\textrm{lip}\,}}_a[f_i]-g_f|^p d\mu = 0.\end{aligned}$$

The conclusion of the theorem, for exponents \(p>1\), is contained in [2] (see also [1, Section 6]). Their methods, however, were inexplicit. Further, the present result applies to the case \(p=1\), and gives a conceptually new way of obtaining their result. The case \(p=1\) is of particular importance in applications that use the co-area inequality, as can be seen from the concurrent work [10].

In fact, the proof yields a slightly stronger conclusion. That this result is stronger follows from Lemma 2.9.

Theorem 1.2

Let X be a complete and separable metric space, let \(p\in [1,\infty )\) and let \(\mu \) be a Radon measure which is positive and finite on balls. Let \(f\in L^1_\textrm{loc}(X)\) and let \(g\in L^p_\textrm{loc}(X)\) be a p-weak upper gradient of f. For every \(\epsilon >0\) and every bounded set \(C\subset X\) there exists a function \(g_\epsilon \in L^p_\textrm{loc}(X)\) with \(\Vert g_\epsilon -g\Vert _{L^p(C)}<\epsilon \) and a sequence \(f_i\in {{\,\textrm{LIP}\,}}_b(X)\), \(i\in {{\mathbb {N}}}\), so that the following properties hold.

  1. (1)

    The functions \(f_i\) converge in \(L^1_\textrm{loc}(X)\) to f, that is, for any bounded set \(A \subset X\),

    $$\begin{aligned} \lim _{i\rightarrow \infty } \int _{A} |f_i-f| d\mu =0. \end{aligned}$$
  2. (2)

    We have \({{\,\textrm{lip}\,}}_a[f_i]\le g_\epsilon \) for every \(i\in {{\mathbb {N}}}\).

Remark 1.3

Convergence in energy is weaker than convergence in norm. A sequence \(f_i\) converges in norm if for any \(\epsilon >0\) there exists an \(N\in {{\mathbb {N}}}\) so that \(\Vert f-f_i\Vert _{N^{1,p}}\le \epsilon \) for all \(i\ge N\). In particular, this requires that the minimal p-weak upper gradients \(g_{f-f_{i}}\) of the differences \(f-f_{i}\) converge to 0 in \(L^p(X)\) as \(i\rightarrow \infty \). However, if the functions \(f_i\) converge in energy to f, then we only know that the differences of minimal p-weak upper gradients, \(g_f-g_{f_i}\), converge to 0 in \(L^p(X)\). Crucially, we have \(g_{f-f_{i}} \ge |g_f-g_{f_i}|\) almost everywhere. This means that convergence in norm is a stronger statement than convergence in energy. Indeed, by the following example, it is strictly stronger.

Equip \(X=[0,1]^2\) with the metric \(d((x_1,y_1),(x_2,y_2))=|x_1-x_2|+|y_1-y_2|\) and the Lebesgue measure. Let \(f(x,y):=x\) and \(f_n(x,y):=x+n^{-1}\sin (ny)\), and consider any \(p\in [1,\infty )\). For any smooth function \(a \in N^{1,p}(X)\), its minimal p-weak upper gradient \(g_a\) is given by the \(\ell ^\infty \)-norm of its gradient vector, \(\Vert \nabla a\Vert _{\ell ^\infty }\). (Note that \(\ell ^\infty \) is the dual norm of \(\ell ^1\).) While this is classical, the argument is rarely spelled out in detail, and we will do so after explaining its relevance for our example. We get that \(g_{f_n}=\max \{|\partial _x f_n|, |\partial _y f_n|\} = 1 = g_f\). In particular, \(f_n \rightarrow f\) in \(L^p([0,1]^2)\) and \(g_{f_n}=g_f=1\). However, \(g_{f_n-f}=|\cos (ny)|\), which does not converge to 0 in \(L^p(X)\). Thus, the functions \(f_n\) converge to f in energy, but not in norm. If one instead uses a uniformly convex norm on \({{\mathbb {R}}}^2\) to define the metric, then this phenomenon does not occur. Further, in many settings, such as this one, one can fix the problem by taking convex combinations of the \(f_n\), which then converge in norm – even if the sequence \((f_n)_{n\in {{\mathbb {N}}}}\) only converges in energy. This final step requires reflexivity of the Sobolev space, which can be obtained, for example, under the assumption of finite Hausdorff dimension, see e.g. [11, Theorem 1.9].
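Since \(f_n-f\) and its gradient depend only on y, the relevant norms can be sanity-checked numerically along the y-variable. The following sketch (the helper `lp_norm`, a plain midpoint Riemann sum, is our own and not from the paper) shows \(\Vert f_n-f\Vert _{L^p}\) vanishing while \(\Vert g_{f_n-f}\Vert _{L^p}=\Vert \cos (ny)\Vert _{L^p}\) stays bounded away from zero.

```python
import math

def lp_norm(h, p, N=20000):
    """Approximate the L^p([0,1])-norm of h by a midpoint Riemann sum."""
    s = sum(abs(h((k + 0.5) / N)) ** p for k in range(N)) / N
    return s ** (1.0 / p)

p = 2
for n in (1, 10, 100):
    # f_n - f = n^{-1} sin(ny) -> 0 in L^p
    diff = lp_norm(lambda y: math.sin(n * y) / n, p)
    # g_{f_n - f} = |cos(ny)| does not tend to 0 in L^p
    grad = lp_norm(lambda y: math.cos(n * y), p)
    print(f"n={n:3d}  ||f_n - f||_p ~ {diff:.4f}  ||g_(f_n - f)||_p ~ {grad:.4f}")
```

For \(p=2\) the gradient norms hover near \(\sqrt{1/2}\approx 0.707\), the mean-square of the cosine, for all n.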

We give a brief explanation for the above identification of the minimal upper gradient which an experienced reader may wish to skip. A proof can be patched together from arguments that are contained in [19, Sections 6-7] and [21, Prop. 6.3.3.]. First, from the triangle inequality and the definition in (2.1) it follows that \(\Vert \nabla a\Vert _{\ell ^\infty }\) is an upper gradient for any smooth function \(a\in N^{1,p}(X)\) (since it bounds all directional derivatives). Conversely, it is minimal, for the following reason. If g is any p-integrable upper gradient of a, we have that (2.1) holds for every line segment in any given direction v. By the argument in [21, Prop. 6.3.3.], which consists of Lebesgue differentiation of g and differentiation of a along such line segments, we get that the inequality \(g\ge |\partial _v a|\) holds a.e., where \(\partial _v a=\langle v, \nabla a\rangle \) is the directional derivative. By taking a supremum over a dense collection of vectors v, and as a consequence of the definition of a dual norm, we obtain \(g\ge \Vert \nabla a\Vert _{\ell ^\infty }\).

Density in energy, albeit weaker than density in norm, suffices for many applications. One application is proving the equivalence of various types of Poincaré inequalities. We say that a pair \((u,g)\) satisfies a p-Poincaré inequality (with constants \((C,\Lambda )\)) if for each ball \(B(x,r) \subset X\)

$$\begin{aligned} \fint _{B(x,r)} |u-u_{B(x,r)}| ~d\mu \le C r \left( \fint _{B(x,\Lambda r)} g^p ~d\mu \right) ^{1/p}, \end{aligned}$$
(1.4)

where we define the average by \(u_{B(x,r)}:=\fint _{B(x,r)} u ~d\mu := \frac{1}{\mu (B(x,r))}\int _{B(x,r)} u ~d\mu \), when the final expression is well defined. For the statement of the following corollary, we define:

$$\begin{aligned} {{\,\textrm{lip}\,}}[f](x):=\lim _{r\rightarrow 0} \sup _{y\in B(x,r){\setminus }\{x\}} \frac{|f(x)-f(y)|}{d(x,y)}, \end{aligned}$$

and \({{\,\textrm{lip}\,}}[f](x):=0\), if x is an isolated point.

Corollary 1.5

Let X be a complete and separable metric space, let \(p\in [1,\infty )\) and let \(\mu \) be a Radon measure which is positive and finite on balls. For any fixed constants \((C,\Lambda ) \in (0,\infty )^2\) the following four conditions are equivalent.

  1. (1)

    For every Lipschitz function \(f:X \rightarrow {{\mathbb {R}}}\) the pair \((f,{{\,\textrm{lip}\,}}_a[f])\) satisfies a p-Poincaré inequality with constants \((C,\Lambda )\).

  2. (2)

    For every Lipschitz function \(f:X \rightarrow {{\mathbb {R}}}\) the pair \((f,{{\,\textrm{lip}\,}}[f])\) satisfies a p-Poincaré inequality with constants \((C,\Lambda )\).

  3. (3)

    For every \(f \in N^{1,p}(X)\) the pair \((f,g_f)\) satisfies a p-Poincaré inequality with constants \((C,\Lambda )\).

  4. (4)

For every \(f \in L^{1}_\textrm{loc}(X)\) and any upper gradient \(g\in L^1_\textrm{loc}(X)\) of f, the pair \((f,g)\) satisfies a p-Poincaré inequality with constants \((C,\Lambda )\).

The statement does not depend on any doubling or properness assumption, as was for example assumed in [24]. The proof, which we leave as a simple exercise to the reader, uses Theorems 1.1 and 1.2, cut-off functions (see the proof of Theorem 1.1) and the fact that \(g_f \le {{\,\textrm{lip}\,}}[f] \le {{\,\textrm{lip}\,}}_a[f]\) almost everywhere for Lipschitz functions. (For the last fact, see Example 2.5 and the definition of the minimal p-weak upper gradient in Sect. 2.) The equivalence with (4) is easier for \(g\in L^p_\textrm{loc}(X)\); the case \(g\in L^1_\textrm{loc}(X)\) is obtained by employing a cut-off function and a limiting argument involving radii \(r'\nearrow r\). We note that the assumption of completeness cannot be removed, as seen from the examples in [28] (see also [25] for a related self-improvement result).

Remark 1.6

Why is completeness relevant? If \(Y\subset X\) is any dense set of full measure, then (1) and (2) in Corollary 1.5 remain unchanged. Indeed, Y may even be disconnected! Koskela’s example in [28] is a connected open set \(Y\subset {{\mathbb {R}}}^2\) which is separated by a set whose capacity is not bounded from below scale-invariantly. This contradicts (3) and (4) in Corollary 1.5. However, Y has full measure and is dense in the plane, so (1) and (2) still hold. In our proof, completeness is used via an application of the Arzelà-Ascoli Theorem.

A further application of a technical nature is the following, which was pointed out to us by Elefterios Soultanis.

Corollary 1.7

Let \(p\in [1,\infty )\). Assume that X is complete and separable and equipped with a Radon measure which is positive and finite on balls. If \(f \in N^{1,p}(X)\), then there is a Borel function \({\tilde{f}}\in N^{1,p}(X)\) so that \({\tilde{f}}=f\) almost everywhere.

Remark 1.8

If X is additionally measure doubling and satisfies a p-Poincaré inequality, then this result is already known and follows directly from norm-density of Lipschitz functions, see e.g. [21, Theorem 8.2.1].

Note that the definition of \(f\in N^{1,p}(X)\) only requires that f is measurable. Further, note that the Newton-Sobolev condition involves a pointwise consideration. Thus, a direct modification using Borel regularity does not yield the result, since it may break the property of being a Newton-Sobolev function. The proof of this corollary follows immediately from Theorem 1.1 by considering a subsequence of \(f_i\) converging pointwise almost everywhere, together with its limit and [15, Proof of Corollary 7.10]. We remark that a posteriori \(f={\tilde{f}}\) also at capacity-almost every point, or quasi-everywhere, see [35, Corollary 3.3].

The question of the density of continuous and Lipschitz functions is also crucial in other contexts. For example, quasi-continuity properties of Sobolev functions are implied by the density of continuous functions in norm, see [4] and [33]. While we only obtain density in energy, it seems our techniques could have something to say in these contexts as well. In conclusion, it appears that density in energy of continuous and Lipschitz functions in Newton-type Sobolev spaces defined using upper gradients is far more generic than the existing literature suggests.

1.1 Approximation scheme

The approximation method that we introduce may seem surprisingly simple, even confusingly so. We thus wish to show how it arises naturally from prior work, explain how it is distinct from the state of the art, and survey existing approximation schemes. Our survey is brief and undoubtedly not complete. In the following discussion, \(f:X\rightarrow {{\mathbb {R}}}\) will be a Sobolev function (as understood in the specific context), and \({\tilde{f}}\) will denote its approximation.

In Euclidean spaces and Lie groups, the simplest way to approximate an \(L^p(X)\)-function f is via convolution: \({\tilde{f}}=f*\phi _n\), where \(\phi _n\) is an approximate identity. Even without a group structure, one can mimic this process on manifolds, and even on some CAT(0) spaces, using various center of mass constructions. See e.g. [13, 23].

In general metric spaces, such a convolution method is missing. One rather old method is to employ so-called discrete convolutions:

$$\begin{aligned} {\tilde{f}}(x)= \sum _n \psi _n f_{B_n}, \end{aligned}$$

where the \(B_n\) are balls which cover the space (with bounded overlap), and \((\psi _n)\) is a partition of unity subordinate to this cover. Such an approximation goes back to Coifman and Weiss [7]. Discrete convolutions have also been independently discovered in various guises, see e.g. [28]. When X is measure doubling (i.e. for some \(D\in [1,\infty )\) we have \(\mu (B(x,2r))\le D\mu (B(x,r))\) for each \(x\in X, r>0\)) and satisfies a Poincaré inequality, one can perform a discrete convolution in such a way that \({\tilde{f}}\) approximates f in \(L^p(X)\) and the \(N^{1,p}(X)\)-norm of \({\tilde{f}}\) is controlled. For such results, see [27]. For applications of discrete convolutions in other contexts and earlier results, see [34, pp. 290–292] and [31].
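To make the formula concrete, here is a minimal one-dimensional sketch of a discrete convolution (our own illustration, not a construction from [7] or [27]): balls of radius r centered on a grid cover [0, 1], hat functions form a subordinate partition of unity, and f is replaced by its ball averages.

```python
def discrete_convolution(f, r, samples=200):
    """Discrete convolution of f on [0,1] at scale r:
    f_tilde = sum_n psi_n * f_{B_n}, with balls B_n = (c_n - r, c_n + r)
    centered on the grid c_n = n*r, and hat functions psi_n subordinate
    to this cover forming a partition of unity."""
    centers = [n * r for n in range(int(round(1 / r)) + 1)]

    def ball_avg(c):  # the ball average f_{B_n}, over B_n intersected with [0,1]
        xs = [c - r + (k + 0.5) * 2 * r / samples for k in range(samples)]
        xs = [x for x in xs if 0.0 <= x <= 1.0]
        return sum(f(x) for x in xs) / len(xs)

    avgs = [ball_avg(c) for c in centers]

    def psi(n, x):  # hat function: 1 at c_n, vanishing outside (c_n - r, c_n + r)
        return max(0.0, 1.0 - abs(x - centers[n]) / r)

    def f_tilde(x):
        weights = [psi(n, x) for n in range(len(centers))]
        total = sum(weights)  # equals 1 on [0,1] for this grid of hats
        return sum(w * a for w, a in zip(weights, avgs)) / total

    return f_tilde
```

For \(f(x)=x\) and interior points, the output reproduces f: the average over an interior ball centered at \(c_n\) is \(c_n\), and the hat functions reproduce affine functions.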

Discrete convolutions have been quite successful in analysis on metric spaces. In fact, the papers using this technique are too numerous to list here, but some highlights are [26,27,28,29]. These papers require strong assumptions: a doubling measure and a Poincaré inequality. It is unclear whether the discrete convolution approach can be extended to more general settings. Another issue with discrete convolutions is that, even with the aforementioned strong assumptions, they fail to directly prove Theorem 1.1. Indeed, the approximating functions \({\tilde{f}}\) have minimal p-weak upper gradients \(g_{{\tilde{f}}}\) with a bound on their \(L^p(X)\)-norms – but convergence of the minimal p-weak upper gradients may fail.

In order to get convergence of gradients, and to weaken assumptions, we need to look beyond. The seminal paper of Cheeger [6] employs several useful approximation schemes, which apply to spaces with a doubling measure and a Poincaré inequality. We summarize them here.

  1. (1)

[6, Theorem 4.24]: a sub-level set approach employing an approximation based on the McShane extension in the proof of the existence of a differential for Sobolev functions. As Cheeger notes, this method was standard in Sobolev space theory by the time of his work. We are not certain where it first appeared. Hajłasz used the argument earlier for metric measure spaces [14, Theorem 5], and attributes the method to earlier publications of Liu [30] and Calderón-Zygmund [5, Theorem 13], who proved related results in the Euclidean setting.

  2. (2)

    [6, Lemma 5.2]: an extension from a net of points using curves similar to Formula (1.10) (see below), which was used to prove a version of Theorem 1.1 and conclude \(g_f={{\,\textrm{lip}\,}}[f]\) for length spaces.

  3. (3)

    [6, Theorem 6.5]: an approximation involving a two-step Lipschitz extension and piecewise distance function, which was used to prove that \(g_f={{\,\textrm{lip}\,}}[f]\) without the length-space assumption.

These get much closer to the approximation method introduced in the present paper – even though the proofs written in [6] involved the strong assumptions of a Poincaré inequality and doubling. These arguments thus do not directly answer our questions. However, with these stronger assumptions, [6, Theorem 6.5] does prove our Theorem 1.1 for \(p>1\).

Each of the methods in Cheeger’s paper involved extending \(f|_A\) from a subset \(A\subset X\) – or interpolating and extrapolating the function – while attempting to ensure that \(\sup _{a\in A}|{\tilde{f}}(a)-f(a)|\) is as small as possible. This idea, that approximation and extension problems are connected, is behind the present paper. We briefly describe approximations (1) and (2) above in slightly more detail to illustrate the connection to our method.

In the sub-level set approach, one obtains \({\tilde{f}}\) by taking a sub-level set \(A \subset X\) of the Hardy-Littlewood maximal function \(M(g_f^p)\) and employing a McShane Lipschitz extension. This is possible because it follows from a pointwise version of the Poincaré inequality (see e.g. [16, Theorem 3.3]) that \(f|_A\) is Lipschitz. This approximation has the remarkable Lusin property: the approximating function \({\tilde{f}}\) actually agrees with the function f on subsets of large measure. Together with the locality of the minimal p-weak upper gradient (see e.g. [21, Proposition 6.3.22]), one gets that \({\tilde{f}}-f\) has small \(N^{1,p}(X)\)-norm. If one wishes to remove the Poincaré inequality assumption, then one needs substantially different tools. Indeed, without a Poincaré inequality, one must give up the (Lipschitz-)Lusin property, because a function \(f \in N^{1,p}(X)\) may not, in general, be Lipschitz on any positive measure subset.

Example 1.9

Let \(X=[0,1]\) be a metric measure space equipped with the Lebesgue measure and the snowflake metric \(d(x,y)=\sqrt{|x-y|}\). Since X does not have any non-constant rectifiable curves, every function \(f\in L^p(X)\) is a Sobolev function. Let \(f(x)=\sum _{n=1}^\infty a^{-n} \sin (2\pi b^n x)\) with \(a,b\in {{\mathbb {N}}}\), \(1<a<\sqrt{b}\) and \(ab > 1+3\pi /2\). Then a classical computation shows that f is \(\log _b(a)\)-Hölder in the Euclidean metric, and \(\alpha \)-Hölder in the metric d, with \(\alpha :=2\log _b(a)<1\). On the other hand, the classical argument of Weierstrass [37] gives that for every \(x\in (0,1)\) and every sufficiently large \(m\in {{\mathbb {N}}}\) there exists a \(y \in [x-b^{-m},x+b^{-m}]\cap [0,1]\) with \(|f(x)-f(y)|\gtrsim a^{-m}\). Using this, one can conclude that any set \(A\subset (0,1)\), for which \(f|_A\) is Lipschitz with respect to d, must be porous (i.e. there is a constant \(c>0\) so that for every \(x\in A\) and every small enough \(r>0\) there exists a \(y\in (x-r,x+r)\) so that \(B(y,cr)\cap A = \emptyset \)). Consequently, there is no positive measure set A for which \(f|_A\) is Lipschitz.
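A numerical sketch of the function in this example, with the admissible choice \(a=2\), \(b=5\) (so that \(1<a<\sqrt{b}\) and \(ab>1+3\pi /2\)); the truncation level N is ours. The partial sums converge geometrically: the neglected tail is bounded by \(\sum _{n>N} a^{-n} = a^{-N}/(a-1)\), so partial sums are a faithful stand-in for f at any fixed accuracy.

```python
import math

def weierstrass_partial(x, a=2, b=5, N=40):
    """Partial sum of f(x) = sum_{n=1}^infty a^{-n} sin(2 pi b^n x).
    The neglected tail is at most a^{-N}/(a-1) in absolute value."""
    return sum(a ** (-n) * math.sin(2 * math.pi * b ** n * x)
               for n in range(1, N + 1))
```

With \(a=2\) the whole series is bounded by \(\sum _{n\ge 1} 2^{-n} = 1\), and two truncations at levels 20 and 40 differ by less than \(2^{-20}\).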

The second approximation in [6] takes the form of

$$\begin{aligned} {\tilde{f}}(x):= \inf _{\gamma :A \leadsto x} f(\gamma (0)) + \int _\gamma g ~ds, \end{aligned}$$
(1.10)

where \(A\subset X\) is some compact set of points, the infimum is taken over rectifiable curves \(\gamma :[0,1]\rightarrow X\) connecting A to x, and the integral \(\int _\gamma g ~ds\) is the usual curve integral. While the proof as written in [6] does use a Poincaré inequality, its full strength is not really needed. Indeed, most of the lemmas in [6] do not use this assumption, and the proof of [6, Lemma 5.2] could be rewritten by using the Lusin property and choosing a set A where \(f|_A\) is continuous. With such a modification, the proof would apply to any proper length space. To our knowledge, this has not been observed before. However, we omit the details of this claim, since our main theorem contains a stronger result.

The idea in formula (1.10) is to be close to f on the set A while insisting that g be an upper gradient. Such constructions have arisen in other settings where one wishes to prescribe a given upper gradient: see e.g. [4, Lemma 3.1]. The main technical problem with formula (1.10) is that it implicitly insists on the existence of rectifiable curves \(\gamma \). Without such curves, \({\tilde{f}}\) may even fail to be continuous. Indeed, even its measurability is non-trivial [22].

The issue of a lack of curves has already been identified, and resolved, in limited instances. When proving that a Poincaré inequality implies quasiconvexity, a priori one cannot assume the existence of any curves. See for example the beautiful discussion in [16, Proposition 4.4], where a version of this fact is proved – or the proof in [6, Theorem 17.1], which is originally due to Semmes. The proof involves “testing” the Poincaré inequality with functions of the form:

$$\begin{aligned} f_{\epsilon ,x}(y):=\inf _{p_0,\dots , p_n} \sum _{k=0}^{n-1} d(p_k,p_{k+1}), \end{aligned}$$

where the infimum is taken over all sequences of points \(p_0,\dots , p_n\), called discrete paths, with \(d(p_k,p_{k+1})\le \epsilon \) (for \(k=0,\dots , n-1\), \(n\in {{\mathbb {N}}}\)), \(p_0=x\) and \(p_n=y\), and where \(x\in X\) is a fixed point. Indeed, the approximation formula that we use will resemble and generalize this expression.
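On a finite sample of the space, \(f_{\epsilon ,x}\) can be computed exactly: discrete paths with steps of length at most \(\epsilon \) are walks in the graph whose edges join sample points at distance at most \(\epsilon \), so the infimum is a shortest-path computation. The following sketch (our own illustration via Dijkstra's algorithm, not part of the proof) makes this precise.

```python
import heapq

def discrete_path_distance(points, dist, eps, src):
    """Compute f_{eps,x}(y) for x = points[src] and every y = points[j]:
    the infimum of sum_k d(p_k, p_{k+1}) over discrete paths p_0, ..., p_n
    with p_0 = x, p_n = y and d(p_k, p_{k+1}) <= eps, via Dijkstra on the
    graph whose edges join sample points at distance <= eps."""
    n = len(points)
    cost = [float("inf")] * n
    cost[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        c, i = heapq.heappop(heap)
        if c > cost[i]:
            continue  # stale heap entry
        for j in range(n):
            d = dist(points[i], points[j])
            if 0.0 < d <= eps and c + d < cost[j]:
                cost[j] = c + d
                heapq.heappush(heap, (c + d, j))
    return cost
```

On a sample of [0, 1] with mesh below \(\epsilon \), the result is \(|x-y|\); if \(\epsilon \) falls below the mesh, points become unreachable and the value is \(\infty \), mirroring the convention \(\inf \emptyset = \infty \).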

Our approximation combines the two main aspects from before: discrete paths and extending from a subset \(A\subset X\). First, we explain the approximation for non-negative functions. The approximation is defined using the following given data: a non-negative function \(f:X\rightarrow [0,M]\) to approximate, which is bounded by \(M>0\); a continuous bounded non-negative function \(g:X\rightarrow [0,\infty )\), which is our desired upper gradient; a set \(A\subset X\) such that \(f|_A\) is continuous; and a scale parameter \(\delta >0\).

The formula for the approximating function is:

$$\begin{aligned} {\tilde{f}}(x):= \min \left\{ \inf _{p_0,\dots ,p_n} f(p_0) + \sum _{k=0}^{n-1}g(p_k)d(p_k,p_{k+1}),M\right\} \end{aligned}$$
(1.11)

where the infimum is taken over all discrete paths \(p_0,\dots ,p_n\) with \(p_0 \in A\), \(d(p_k,p_{k+1})\le \delta \) and \(p_n=x\). There are some key observations, from Lemma 2.13 below, which we highlight here and which guide our definition.

  1. (1)

    If there are no discrete paths with the given properties, then by the standard convention, the infimum in the definition is \(\infty \), and we get \({\tilde{f}}(x)=M\). In any case, the cut-off value M ensures that \({\tilde{f}}(x)\le M\) for every \(x\in X\).

  2. (2)

The function \({\tilde{f}}\) is automatically \(\max \{\sup _{x\in X}g(x), M/\delta \}\)-Lipschitz, and we have \({{\,\textrm{lip}\,}}_a[{\tilde{f}}] \le g\). Indeed, if \(d(x,y)\le \delta \) for some \(x,y\in X\), then by concatenating discrete paths we obtain the bound \(|{\tilde{f}}(x)-{\tilde{f}}(y)|\le \max \{g(x),g(y)\}d(x,y)\). By boundedness, if \(d(x,y)>\delta \), then \(|{\tilde{f}}(x)-{\tilde{f}}(y)|\le M \le \frac{M}{\delta } d(x,y)\). Further details will be given later.

  3. (3)

    We have \({\tilde{f}}(x)\le f(x)\) for each \(x\in A\). Indeed, in the infimum, we can choose the discrete path \(P=(x,x)\). Thus, we have \({\tilde{f}}(x) \le f(x)+g(x)d(x,x)=f(x)\).
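On a finite sample, formula (1.11) is again a shortest-path computation, now with multiple sources (the points of A, with initial values \(f(p_0)\)) and edge weights \(g(p_k)d(p_k,p_{k+1})\). The following sketch is our own finite illustration of (1.11), under the stated assumptions on f, g, A, M and \(\delta \); it also exhibits the three observations above.

```python
import heapq

def approximate_1_11(points, dist, f, g, A, M, delta):
    """Evaluate formula (1.11) at every sample point:
    f_tilde(x) = min( inf f(p_0) + sum_k g(p_k) d(p_k, p_{k+1}), M ),
    the infimum over discrete paths with p_0 in A, steps <= delta, p_n = x.
    A is given as a set of indices into points."""
    n = len(points)
    cost = [float("inf")] * n
    heap = []
    for i in A:  # multiple sources, seeded with the values f(p_0)
        if f(points[i]) < cost[i]:
            cost[i] = f(points[i])
            heapq.heappush(heap, (cost[i], i))
    while heap:
        c, i = heapq.heappop(heap)
        if c > cost[i]:
            continue  # stale heap entry
        for j in range(n):
            d = dist(points[i], points[j])
            if 0.0 < d <= delta:
                nc = c + g(points[i]) * d  # edge weight g(p_k) d(p_k, p_{k+1})
                if nc < cost[j]:
                    cost[j] = nc
                    heapq.heappush(heap, (nc, j))
    # the cut-off at M also handles points with no admissible discrete path
    return [min(c, M) for c in cost]
```

For instance, on a grid in [0, 1] with \(f(x)=x\), \(g\equiv 1\) and \(A=\{0\}\), the output is the distance to 0, while shrinking \(\delta \) below the mesh disconnects the graph and returns the cut-off value M, as in observation (1).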

The difficulty then lies in choosing a function g appropriately, so that one can show that \({\tilde{f}}|_A\) converges to \(f|_A\) pointwise as \(\delta \rightarrow 0\). Here, a refined version of the Arzelà-Ascoli theorem is used as part of a compactness and contradiction argument. If the convergence were to fail, then we would get a sequence of discrete paths converging to a curve which violates the upper gradient inequality (2.1). This step is classical and has already appeared in [20, Proposition 2.17] and in [6, Lemma 5.18]. According to Cheeger, this argument has also been used by Rickman and Ziemer [6, Remark 5.26].

For the technically minded, we already mention that properness, which is usually assumed, is avoided by appropriately choosing g to penalize paths that form non-compact families. Indeed, paths that travel “far” away from certain compact sets must have small “modulus”, as we will later make precise. This is a new argument and seems to be useful in other settings as well. With properness, however, our proof would be considerably simpler – and this will be indicated in the proof.

Formula (1.11) applies directly only to non-negative functions with bounded support. For signed functions or functions without bounded support, one first truncates and applies a cut-off function; the previous approximation scheme is then applied to the positive and negative parts individually. We note that modifications of the formula would yield approximations directly for any function \(f\in N^{1,p}(X)\); however, the scheme presented here simplifies the proof slightly.

Finally, for the sake of completeness, we wish to mention another enormously successful approximation method, which arises via estimates for the gradient flow of convex functionals, cf. [2]. This approximation is implicit: the method does not directly furnish the approximation, but shows that it exists. Indeed, it proceeds by defining a functional via relaxation of a Dirichlet energy, and needs lower semicontinuity of this functional in \(L^2(X)\). By showing that two different expressions define the gradient flow of this functional, one obtains the existence of approximations for Sobolev functions. While very general and powerful, this approach has two main shortcomings. The first is the inexplicit nature of the approximating function and the lack of any pointwise control. The second arises in the case \(p=1\): the relaxation approach then gives information not for \(N^{1,1}(X)\), but for functions of bounded variation. To obtain results in this borderline case, we needed to introduce the methods of the present paper. We remark that this case is still quite important, since it is connected via the co-area formula to (modulus) estimates on curves and surfaces in the space.

1.2 Further questions

The methods of this paper are likely to apply to a host of other Sobolev-type spaces and lead to interesting further questions. A number of works on approximations have appeared in different settings, see e.g. Orlicz-Sobolev spaces [36], Lorentz-Sobolev spaces [8] and variable exponent Sobolev spaces [17, 18]. One can even study these questions with a general Banach function space norm, see e.g. [32, 33]. This list is far from exhaustive. Indeed, a variety of authors have asked for necessary and sufficient conditions for the density of Lipschitz functions in these settings – and we suggest that completeness and separability suffice, perhaps with minimal further assumptions, when an upper gradient is used. It is important to note, however, that the situation is quite different for the Sobolev space, often denoted \(W^{1,p}(X)\), which is defined using a distributional gradient; see e.g. [18] for such issues in the variable exponent case. The techniques here suggest that the density questions in this different setting are equivalent to the identity \(N^{1,p}(X)=W^{1,p}(X)\), which is a type of regularity statement.

Another question is when (locally) Lipschitz functions are dense in \(N^{1,p}(\Omega )\) when \(\Omega \) is a domain – i.e. open and connected – in a complete and separable space X. We use completeness in our arguments, and additional care is needed close to the boundary of \(\Omega \). In some cases, say when \(\Omega \) is the slit disk \(B(0,1) {\setminus } ((0,1)\times \{0\}) \subset {{\mathbb {R}}}^2\), one would not expect such density for globally Lipschitz functions. However, it may be that some minimal assumption guarantees density of locally Lipschitz functions.

A final, and seemingly difficult, question is whether Lipschitz functions are always dense in \(N^{1,p}(X)\) in norm, and not just in energy (when X is complete). If \(p>1\) and if the Sobolev space is reflexive, then density in energy can be directly upgraded to density in norm. For a space X which is metrically doubling, this follows from [1, Corollary 41]. In a concurrent work with Elefterios Soultanis, we have employed techniques from this paper to obtain the density result for all \(p\in [1,\infty )\) under a weaker finite dimensionality assumption on an associated p-differentiable structure [11]. This finite dimensionality assumption is satisfied by all spaces X with finite Hausdorff dimension.

Outline: The proof of Theorem 1.1 will be at the end of Sect. 2. At the beginning of that section, there are three preliminary subsections, which will describe the terminology, basic properties of the approximating functions and some useful lemmas for discrete paths.

2 Proof of Approximation

2.1 Preliminaries

Open balls in a metric space X are defined by \(B(x,r):=\{y: d(x,y)<r\}\), for \(r>0\) and \(x\in X\). Throughout this section, \(p \in [1,\infty )\) and X is a complete and separable metric space equipped with a Radon measure \(\mu \) which is finite and positive on balls, i.e. \(0<\mu (B(x,r))<\infty \) for each \(x\in X, r>0\). The space \(L^p(X)\) consists of p-integrable functions and \(L^p_\textrm{loc}(X)\) consists of locally p-integrable functions, i.e. those f so that \(\int _A |f|^p ~d\mu <\infty \) for every bounded set \(A\subset X\). These spaces are equipped with the usual notions of convergence. A curve is a continuous map \(\gamma :I\rightarrow X\) from a compact interval \(I\subset {{\mathbb {R}}}\). A curve is rectifiable if it has finite length \(\textrm{Len}(\gamma )\), see [21, Chapter 5].

Recall that, as introduced by Heinonen and Koskela in [20], a non-negative Borel function \(g:X\rightarrow [0,\infty ]\) is a (true) upper gradient of (or for) \(f:X\rightarrow [-\infty ,\infty ]\), if

$$\begin{aligned} \int _\gamma g ~ds \ge |f(\gamma (1))-f(\gamma (0))|, \end{aligned}$$
(2.1)

for any rectifiable curve \(\gamma :[0,1]\rightarrow X\). If either \(f(\gamma (1))\) or \(f(\gamma (0))\) is infinite, we interpret the right-hand side as \(\infty \).

The property of being an upper gradient is not closed under \(L^p(X)\)-convergence. Thus, one introduces the notion of a p-weak upper gradient. One simple definition is the following: if \(\textrm{UG}(f)\subset L^p(X)\) denotes the collection of upper gradients of f that are contained in \(L^p(X)\), then g is a p-weak upper gradient if \(g\in \overline{\textrm{UG}(f)}\) (i.e. g lies in the closure of \(\textrm{UG}(f)\) in \(L^p(X)\)). Equivalently, this notion can be defined using a notion of “a.e. curve” coming from the concept of modulus of curve families. We say that g is a p-weak upper gradient if (2.1) holds for p-a.e. rectifiable curve \(\gamma \): i.e. if there exists an \(h\in L^p(X)\) so that inequality (2.1) holds for every rectifiable curve \(\gamma \) for which \(\int _\gamma h \, ds < \infty \). We refer the reader to [15, 35] for details on modulus.

Remark 2.2

We prefer not to define modulus of curve families here, since we do not directly need it. Instead, we only need some properties of minimal p-weak upper gradients (which often are proved using modulus techniques). The main property we need is that if g is a p-weak upper gradient for f, then for any \(\epsilon >0\) there is a lower semi-continuous (true) upper gradient \(g_\epsilon \ge g\) with \(\int _X |g-g_\epsilon |^p ~d\mu \le \epsilon .\) This can be easily seen from \(g\in \overline{\textrm{UG}(f)}\) together with the fact that any function in \(L^p(X)\) can be approximated from above by a lower semi-continuous function (by the Vitali-Carathéodory Theorem). For the details we refer to [21, Sections 4.2 and Chapters 5–6].

We define \(N^{1,p}(X)\) as the collection of \(f\in L^p(X)\) for which there exists a Borel upper gradient \(g \in L^p(X)\) of f. The functions in this collection are called Newton, Sobolev, or Newton-Sobolev functions. The space is called either the Newton, Sobolev, or Newton-Sobolev space. We define

$$\begin{aligned} \Vert f\Vert _{N^{1,p}}:=\left( \inf _{g\in \overline{\textrm{UG}(f)}} \Vert f\Vert _{L^p}^p + \Vert g\Vert _{L^p}^p\right) ^{1/p}, \end{aligned}$$
(2.3)

where the infimum is taken over all p-weak upper gradients of f (or, equivalently, over all true upper gradients g of f). By [15, Theorem 7.16] there always exists a minimal p-weak upper gradient \(g_f \in \overline{\textrm{UG}(f)}\subset L^p(X)\) which attains the infimum in (2.3), and which satisfies (2.1) for p-almost every curve. See also [35] for the \(p>1\) case. Since the set \(\overline{\textrm{UG}(f)}\) of p-weak upper gradients satisfies the lattice property [21, Corollary 6.3.12], we have that the minimal p-weak upper gradient satisfies the additional property that if \({\tilde{g}}\) is another p-weak upper gradient for f, then \({\tilde{g}}\ge g_f\) a.e. Throughout this paper, if \(f\in N^{1,p}(X)\), then \(g_f\) will denote its minimal p-weak upper gradient.

We also define the Lipschitz and asymptotic Lipschitz constant for a function \(f:X \rightarrow {{\mathbb {R}}}\)

$$\begin{aligned} {{\,\textrm{LIP}\,}}[f](A):= \sup _{x,y \in A, x\ne y} \frac{|f(x)-f(y)|}{d(x,y)}, \quad {{\,\textrm{lip}\,}}_a[f](x):=\lim _{r\rightarrow 0} {{\,\textrm{LIP}\,}}[f](B(x,r)). \end{aligned}$$
(2.4)

When A is a singleton, we interpret \({{\,\textrm{LIP}\,}}[f](A)=0\). A function \(f:X\rightarrow {{\mathbb {R}}}\) is Lipschitz if \({{\,\textrm{LIP}\,}}[f](X)<\infty \). The collection of Lipschitz functions \(f:X\rightarrow {{\mathbb {R}}}\) is denoted \({{\,\textrm{LIP}\,}}(X)\), and \({{\,\textrm{LIP}\,}}_b(X)\) is the collection of \(f\in {{\,\textrm{LIP}\,}}(X)\) with bounded support (i.e. there exists some ball \(B(x_0,R)\subset X\) so that \(f(y)=0\) for each \(y\not \in B(x_0,R)\)). A function f is called L-Lipschitz if \({{\,\textrm{LIP}\,}}[f](X)\le L\).
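Since these two constants drive the whole paper, it may help to see them computed concretely. The following sketch is purely illustrative and not part of the argument: it works on a finite point set in the plane with the Euclidean metric, all names are ours, and \({{\,\textrm{lip}\,}}_a[f]\) is only evaluated at a fixed small radius rather than in the limit \(r\rightarrow 0\).

```python
import math

def dist(a, b):
    # Euclidean distance between coordinate tuples (an illustrative choice of metric)
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(a, b)))

def LIP(f, A):
    # LIP[f](A): sup over distinct pairs in A of |f(x)-f(y)| / d(x,y);
    # interpreted as 0 when A is (essentially) a singleton, as in (2.4)
    pairs = [(x, y) for x in A for y in A if dist(x, y) > 0]
    if not pairs:
        return 0.0
    return max(abs(f(x) - f(y)) / dist(x, y) for x, y in pairs)

def lip_a(f, X, x, r):
    # finite-radius proxy for the asymptotic Lipschitz constant:
    # LIP[f] over the ball B(x, r); lip_a[f](x) is the limit as r -> 0
    return LIP(f, [y for y in X if dist(x, y) < r])

# toy example: f(x, y) = x on a grid, where every Lipschitz constant equals 1
X = [(i / 10, j / 10) for i in range(11) for j in range(11)]
f = lambda p: p[0]
print(LIP(f, X), lip_a(f, X, (0.5, 0.5), 0.15))  # both approximately 1.0
```

In a genuinely non-smooth space the limit \(r\rightarrow 0\) must of course be taken analytically; the sketch only evaluates the constant at one fixed scale.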

Example 2.5

Lipschitz functions yield examples of Sobolev functions. If \(f\in {{\,\textrm{LIP}\,}}(X)\), then \({{\,\textrm{lip}\,}}_a[f]\le {{\,\textrm{LIP}\,}}[f](X)\), and \({{\,\textrm{lip}\,}}_a[f]\) is an upper gradient of f. Indeed, \({{\,\textrm{lip}\,}}_a[f]\) controls the directional derivative along any rectifiable curve and thus (2.1) follows; see [21, Lemma 6.2.6] for a detailed argument. Consequently, \({{\,\textrm{LIP}\,}}_b(X)\subset N^{1,p}(X)\) for each \(p\in [1,\infty )\) and \(g_f\le {{\,\textrm{lip}\,}}_a[f]\) for each \(f\in {{\,\textrm{LIP}\,}}_b(X)\).

We will need some results from functional analysis. First, we need an argument which replaces the usual application of reflexivity when \(p>1\). This involves the following definition.

Definition 2.6

Let \(p\in [1,\infty )\). A collection of functions \({\mathcal {F}} \subset L^p(X)\) is strongly p-equi-integrable if,

  1. a)

    for every \(\epsilon >0\), there exists a set \(E \subset X\) with \(\mu (E)<\infty \) so that \(\int _{X{\setminus } E} |f|^p ~d\mu \le \epsilon \) for all \(f\in {\mathcal {F}}\);

  2. b)

    \(\sup _{f\in {\mathcal {F}}} \Vert f\Vert _{L^p(X)}<\infty \); and

  3. c)

    for every \(\epsilon >0\) there exists a \(\delta >0\) so that if \(\mu (E)<\delta \), then \(\int _E |f|^p ~d\mu \le \epsilon \) for all \(f\in {\mathcal {F}}\).

In the literature, p-equi-integrability usually refers to condition c) alone. In order to state more compact theorems and proofs below, we include conditions a) and b) and attach the modifier “strongly” to the term. If the measure is non-atomic, then by decomposing the measure into sufficiently small parts, one can show that a) and c) imply b).

The following theorem is classical, although often stated only for finite measure spaces. See [9, Theorem IV.8.9] or [12, Theorem 2.54].

Theorem 2.7

(Dunford-Pettis theorem) A collection \({\mathcal {F}}\subset L^1(X)\) is pre-compact in the weak topology of \(L^1(X)\) if and only if it is strongly 1-equi-integrable.

We will further need the Vitali-Convergence Theorem. A sequence of functions \(f_n \in L^p(X)\) converges in measure to a function \(f\in L^p(X)\) if for every \(\epsilon >0\) it holds that \(\lim _{n\rightarrow \infty } \mu (\{x:|f_n(x)-f(x)|\ge \epsilon \})=0\). For a proof of the following theorem, see [12, Theorem 2.24].

Theorem 2.8

(Vitali-Convergence Theorem) A sequence \((f_n)_{n\in {{\mathbb {N}}}}\) of functions \(f_n\in L^p(X)\) converges to f in \(L^p(X)\) if and only if it converges to f in measure and is strongly p-equi-integrable.

We apply these classical results to show that minimal p-weak upper gradients converge, if they possess some upper bounds which converge.

Lemma 2.9

Suppose that \(f_i,f\in N^{1,p}(X)\) are functions with minimal p-weak upper gradients \(g_{f_i},g_f \in L^p(X)\) and that \(f_i \rightarrow f\) in \(L^p(X)\). Suppose further that \({\tilde{g}}_i \in L^p(X)\) are functions which converge to \(g_f\) in \(L^p(X)\). If \(g_{f_i}\le {\tilde{g}}_i\) a.e. for each \(i\in {{\mathbb {N}}}\), then \(g_{f_i}\rightarrow g_f\) in \(L^p(X)\).

Proof

First, we will show that the sequence \(g_{f_i}\) converges to \(g_f\) weakly in \(L^p(X)\). To this end, it suffices to show that every subsequence of \(g_{f_i}\) has a further subsequence that converges weakly to \(g_f\). After relabelling indices, without loss of generality, it suffices to find such a subsequence for \(g_{f_i}\) itself.

First, we show that \(g_{f_i}\) has a weakly convergent subsequence in \(L^p(X)\), which converges to some function \({\tilde{g}}\). If \(p>1\), this follows from reflexivity of \(L^p(X)\). If \(p=1\), then the inequality \(g_{f_i}\le {\tilde{g}}_i\) and the fact that \({\tilde{g}}_i\) converge in \(L^1(X)\) easily imply that the sequence of functions \(g_{f_i}\) is strongly 1-equi-integrable. Therefore, by the Dunford-Pettis Theorem 2.7, the sequence has a weakly convergent subsequence. We will now denote this subsequence by \(g_{f_i}\), i.e. \(g_{f_i}\rightarrow {\tilde{g}}\) weakly in \(L^p(X)\).

It follows from [15, Lemma 7.8] that \({\tilde{g}}\) is a p-weak upper gradient of f and hence \({\tilde{g}}\ge g_f\) a.e. On the other hand, the weak lower semicontinuity of \(L^p\)-norms gives

$$\begin{aligned} \int _X {\tilde{g}}^p d\mu \le \liminf _{i\rightarrow \infty } \int _X g_{f_i}^p d\mu \le \liminf _{i\rightarrow \infty } \int _X {\tilde{g}}_{i}^p d\mu = \int _X g_f^p d\mu , \end{aligned}$$
(2.10)

which together with \({\tilde{g}}\ge g_f\) yields that \({\tilde{g}}=g_f\) a.e. Thus \(g_{f_i} \rightarrow g_f\) weakly in \(L^p(X)\).

Since the sequence \((g_{f_i})_{i\in {{\mathbb {N}}}}\) is strongly p-equi-integrable, by the Vitali-Convergence Theorem 2.8, it will converge to \(g_f\) if it converges in measure. Further, it follows from part c) of Definition 2.6, and the strong p-equi-integrability of the sequence \(g_{f_i}\), that it suffices to prove convergence in measure on all sets of finite measure, i.e.

$$\begin{aligned} \limsup _{i\rightarrow \infty } \mu (\{x\in A: |g_{f_i}(x)-g_f(x)|>\epsilon \})= 0 \end{aligned}$$

for every \(\epsilon >0\) and every \(A\subset X\) with \(\mu (A)<\infty \). Fix such a set \(A\subset X\) and \(\epsilon >0\). We have,

$$\begin{aligned} \mu (\{x\in A: |g_{f_i}(x)-g_f(x)|>\epsilon \})&\le \frac{1}{\epsilon }\int _{A} |g_f-g_{f_i}| \,d\mu \\ &= \frac{1}{\epsilon } \left( 2\int _{A\cap \{g_{f_i}>g_f\}} (g_{f_i}-g_f) \,d\mu + \int _{A} (g_f-g_{f_i})\,d\mu \right) . \end{aligned}$$

Since \(g_{f_i}\) converge weakly to \(g_f\), we get \(\lim _{i\rightarrow \infty } \int _A g_f-g_{f_i}d\mu =0\). In order to estimate the first term, recall that \(g_{f_i}\le {\tilde{g}}_i\) and \({\tilde{g}}_i\rightarrow g_f\) in \(L^p(X)\). Then,

$$\begin{aligned} \limsup _{i\rightarrow \infty } \int _{A\cap \{g_{f_i}>g_f\}} (g_{f_i}-g_f)\, d\mu&\le \limsup _{i\rightarrow \infty } \int _{A\cap \{g_{f_i}>g_f\}} ({\tilde{g}}_{i}-g_f)\, d\mu \\ &\le \limsup _{i\rightarrow \infty } \int _{A} |{\tilde{g}}_{i}-g_f| \,d\mu =0. \end{aligned}$$

By combining the two previous limits, we get

$$\begin{aligned} \limsup _{i\rightarrow \infty } \mu (\{x\in A: |g_{f_i}(x)-g_f(x)|>\epsilon \}) = 0. \end{aligned}$$

\(\square \)

We need a version of the Arzelà-Ascoli Theorem, which is easy to prove using standard techniques. We state this lemma to highlight the fact that our theorems apply to general complete and separable metric spaces, whereas most of the existing literature uses an assumption of properness. Indeed, we avoid properness by adding the assumption that the sets \(A_t\) that appear in the statement of the lemma are pre-compact. Consequently, we will be able to apply the lemma with \(Y=\ell ^\infty ({{\mathbb {N}}})\) which is not proper.

A curve \(\gamma :[0,1]\rightarrow X\) is L-Lipschitz if \(d(\gamma (s),\gamma (t))\le L|s-t|\) for every \(s,t\in [0,1]\).

Lemma 2.11

Let \(L \in [0,\infty )\). Suppose that Y is a complete metric space. Let \(\gamma _k:[0,1]\rightarrow Y\) be a sequence of L-Lipschitz curves so that for every \(t\in [0,1]\) the set \(A_t=\{\gamma _k(t): k\in {{\mathbb {N}}}\}\) is pre-compact in Y. Then there exists a subsequence of \(\gamma _k\) which converges uniformly to an L-Lipschitz curve \(\gamma \).

2.2 Approximating function

In this subsection, we introduce the formula for our approximation and prove its main properties.

A discrete path P (or simply “a path” P) is a sequence of points \(P=(p_0,\dots , p_n)\) with \(p_k\in X\) for each \(k=0,\dots , n\), and \(n\in {{\mathbb {N}}}\) with \(n\ge 1\). We define the mesh of P by \(\textrm{Mesh}(P):=\max _{k=0,\dots , n-1} d(p_k,p_{k+1})\), the diameter of P by \({{\,\textrm{diam}\,}}(P):=\max _{k,l} d(p_k,p_{l})\) and the length of P by \(\textrm{Len}(P):=\sum _{k=0}^{n-1} d(p_k,p_{k+1})\). By a slight abuse of notation, we will write \(p\in P\) if there is a \(k=0,\dots , n\) so that \(p_k=p\). Further, we write \(P\subset U\) for a subset \(U\subset X\), if \(p_k \in U\) for each \(k=0,\dots , n\). We write \(P\subset Q\) if the sequence of points in P forms a consecutive block of the points in Q: that is, \(P=(p_0,\dots , p_m)\) and \(Q=(q_0,\dots , q_n)\), \(m\le n\) and there exists an integer s with \(0\le s\le n-m\) so that \(q_{s+k}=p_k\) for each \(k=0,\dots , m\). Such a path P is called a sub-path of Q.
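The quantities \(\textrm{Mesh}\), \({{\,\textrm{diam}\,}}\) and \(\textrm{Len}\) are elementary maxima and sums over the path. As a hedged illustration (finite paths of coordinate tuples with the Euclidean distance; all names are ours, not the paper's):

```python
import math

def mesh(P):
    # Mesh(P): the largest gap between consecutive points of the path
    return max(math.dist(P[k], P[k + 1]) for k in range(len(P) - 1))

def diam(P):
    # diam(P): the largest distance between any two points of the path
    return max(math.dist(p, q) for p in P for q in P)

def length(P):
    # Len(P): the sum of the consecutive gaps
    return sum(math.dist(P[k], P[k + 1]) for k in range(len(P) - 1))

P = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(mesh(P), diam(P), length(P))  # 1.0, sqrt(2), 2.0
```

Note that \(\textrm{Mesh}(P)\le {{\,\textrm{diam}\,}}(P)\le \textrm{Len}(P)\) always, as the example shows.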

For \(\delta >0\) and a fixed closed set \(A\subset X\), we say that a discrete path \(P=(p_0,\dots , p_n)\) is \((\delta ,A,x)\)-admissible if \(\textrm{Mesh}(P)\le \delta \), \(p_0\in A\) and \(p_n=x\). The collection of all \((\delta ,A,x)\)-admissible discrete paths is denoted \({\mathcal {P}}(\delta ,A,x)\).

Suppose that \(f:X\rightarrow [0,M]\) is a bounded function for some \(M>0\), \(g:X\rightarrow [0,\infty )\) is a continuous bounded function, and \(A\subset X\) is a closed subset. Then, for \(\delta >0\) we define an approximating function \({\tilde{f}}\) with data \((f,g,A,M,\delta )\) as

$$\begin{aligned} {\tilde{f}}(x):=\min \{M,\inf _{(p_0,\dots , p_n) \in {\mathcal {P}}(\delta ,A,x)} f(p_0) + \sum _{k=0}^{n-1} g(p_k)d(p_k,p_{k+1})\}. \end{aligned}$$
(2.12)
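On a finite point set the infimum in (2.12) becomes a shortest-path problem: admissible paths are walks in the graph whose edges join points at distance at most \(\delta \), the step from p to q costs \(g(p)d(p,q)\), and paths start in A with initial cost \(f(p_0)\). The following Dijkstra-style sketch computes \({\tilde{f}}\) in this toy finite setting; it is an illustration under these assumptions, not the construction of the paper, which works on all of X, and all names are ours.

```python
import heapq, math

def approx_f(points, f, g, A, M, delta):
    # \tilde f from (2.12) on a finite point set: Dijkstra's algorithm over the
    # graph joining points at distance <= delta, where the step p -> q costs
    # g(p) * d(p, q), and the sources a in A start with cost f(a).
    dist = {p: math.inf for p in points}
    heap = []
    for a in A:
        dist[a] = f(a)
        heapq.heappush(heap, (f(a), a))
    while heap:
        c, p = heapq.heappop(heap)
        if c > dist[p]:
            continue  # stale heap entry
        for q in points:
            step = math.dist(p, q)
            if 0 < step <= delta and c + g(p) * step < dist[q]:
                dist[q] = c + g(p) * step
                heapq.heappush(heap, (dist[q], q))
    # points not reachable by delta-chains from A get the value M
    return {p: min(M, dist[p]) for p in points}

# toy example on the line: A = {0}, f = 0 on A, g = 1, delta = 0.6, M = 10
pts = [(0.0,), (0.5,), (1.0,), (2.0,)]
ft = approx_f(pts, f=lambda p: 0.0, g=lambda p: 1.0, A=[(0.0,)], M=10.0, delta=0.6)
print(ft[(1.0,)], ft[(2.0,)])  # 1.0 (two steps of cost 0.5) and 10.0 (unreachable)
```

The properties of Lemma 2.13 below can be checked by hand on such examples, e.g. \({\tilde{f}}\le f\) on A and the step bound (2.14).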

Lemma 2.13

Let \(M,\delta >0\), \(f:X\rightarrow [0,M]\), \(A\subset X\) be closed, and let \(g:X\rightarrow [0,\infty )\) be continuous and bounded. The function \({\tilde{f}}\) in Formula (2.12) satisfies the following properties.

  1. A)

    \({\tilde{f}}:X\rightarrow [0,M]\).

  2. B)

    For each \(x\in A\), \(0\le {\tilde{f}}(x)\le f(x)\).

  3. C)

    If \(x\in A\) and \(f(x)=0\), then \({\tilde{f}}(x)=0\).

  4. D)

    For each \(x,y\in X\) with \(d(x,y)\le \delta \), we have

    $$\begin{aligned} |{\tilde{f}}(x)-{\tilde{f}}(y)|\le \max \{g(x),g(y)\}d(x,y). \end{aligned}$$
    (2.14)
  5. E)

    \({{\,\textrm{lip}\,}}_a[{\tilde{f}}](x)\le g(x)\) for every \(x\in X\).

  6. F)

    \({\tilde{f}}\) is \(\max \{M\delta ^{-1}, \sup _{x\in X} g(x)\}\)-Lipschitz.

Proof

  1. A)

    First, since each term in the infimum in (2.12) is non-negative, \({\tilde{f}}\) is also non-negative. Further, \({\tilde{f}}(x)\le M\) for each \(x\in X\) follows immediately from the definition.

  2. B)

    Let \(x\in A\). By noting that the discrete path \(P=(x,x)\) is \((\delta ,A,x)\)-admissible, we get \({\tilde{f}}(x)\le f(x)+g(x)d(x,x)=f(x)\).

  3. C)

    Let \(x\in A\). By B) and A) we get \(0\le {\tilde{f}}(x)\le f(x)=0\).

  4. D)

    Let \(x,y\in X\) be arbitrary with \(d(x,y) \le \delta \). Consider an arbitrary discrete \((\delta , A, x)\)-admissible path \(P=(p_0,\dots , p_n)\). We form a \((\delta , A,y)\)-admissible path \(Q=(q_0,\dots , q_{n+1})\) by adjoining \(q_{n+1}=y\) and setting \(q_i=p_i\) for \(i\in \{0,\dots , n\}\). With such choices

    $$\begin{aligned} {\tilde{f}}(y)\le f(q_0)+\sum _{k=0}^{n} g(q_k)d(q_k,q_{k+1}) = f(p_0)+\sum _{k=0}^{n-1} g(p_k)d(p_k,p_{k+1}) + g(x)d(x,y). \end{aligned}$$

    Taking an infimum over all \((\delta ,A,x)\)-admissible paths and a minimum with M on both sides of the inequality, we get \({\tilde{f}}(y)\le {\tilde{f}}(x)+g(x)d(x,y)\). By switching the roles of x and y, we obtain Inequality (2.14).

  5. E)

    Take points \(x,a,b\in X\) with \(a,b\in B(x,r)\) for some \(r\le \delta /2\), so that \(d(a,b)\le \delta \). Then, by applying (2.14) to the pair of points \(a,b\), using the continuity of g, and sending \(a,b\rightarrow x\), we get

    $$\begin{aligned} {{\,\textrm{lip}\,}}_a[{\tilde{f}}](x)\le g(x). \end{aligned}$$
  6. F)

    Let \(L:=\max \{M\delta ^{-1}, \sup _{x\in X} g(x)\}\). If \(x,y\in X\), with \(d(x,y)\le \delta \), then from D) we get \(|{\tilde{f}}(x)-{\tilde{f}}(y)|\le L d(x,y)\). On the other hand, if \(x,y\in X\) with \(d(x,y)>\delta \), then by A) we have \(|{\tilde{f}}(x)-{\tilde{f}}(y)|\le M\le M\delta ^{-1} d(x,y)\le L d(x,y)\).\(\square \)

2.3 A compactness result for discrete curves

The convergence of our approximating functions to f is obtained via a contradiction argument, where we are given a sequence of discrete paths \(P^i\), and then extract a subsequence converging to a curve \(\gamma :[0,1]\rightarrow X\). In this subsection, we give the definition of convergence that we use and the main results for it.

First, to define our notion of convergence we fix an isometric Kuratowski embedding \(\iota : X\rightarrow \ell ^\infty ({{\mathbb {N}}})\), and identify X with its image under this embedding. A priori the notions that follow may depend on the choice of such an embedding. For our ultimate argument, such a dependence will play no role, and thus we do not analyse it much further. However, we will later indicate how to prove that the definitions are, in fact, independent of the choice of such an embedding.

For a subset \(A \subset X\) and \(x\in X\), the distance from x to A is given by \(d(x,A):=\inf _{a \in A} d(a,x)\). If \(P=(p_0,\dots , p_n)\) is a discrete path, we define its linearly interpolating curve as follows. If \(\textrm{Len}(P)=0\), we define \(\gamma _P:[0,1]\rightarrow \ell ^\infty ({{\mathbb {N}}})\) by \(\gamma _P(t)=p_0\) for each \(t\in [0,1]\). If \(\textrm{Len}(P)>0\) we define the sequence of interpolating times \(T_P=(t_0,\dots , t_n)\) by \(t_0=0\) and \(t_k = \sum _{i=0}^{k-1} d(p_i,p_{i+1})/\textrm{Len}(P)\) for \(k=1,\dots , n\). Then, we define \(\gamma _P:[0,1]\rightarrow \ell ^\infty ({{\mathbb {N}}})\) piecewise by linear interpolation in \(\ell ^\infty ({{\mathbb {N}}})\): we set \(\gamma _P(t_k)=p_k\), and when \(t\in [t_{k},t_{k+1}]\) and \(t_{k+1}>t_k\), we set \(\gamma _P(t)=((t_{k+1}-t)p_k + (t-t_k)p_{k+1})(t_{k+1}-t_k)^{-1}\). The following lemma is elementary to verify.

Lemma 2.15

If P is a discrete path, and \(\gamma _P\) is its linearly interpolating curve, then \(\gamma _P\) is \(\textrm{Len}(P)\)-Lipschitz, parametrized by constant speed, \(\textrm{Len}(P)=\textrm{Len}(\gamma _P)\) and for each \(t\in [0,1]\) there exists a point \(p\in P \subset X\) so that \(d(\gamma _P(t),p)\le \textrm{Mesh}(P)\).
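As a sanity check on the interpolating construction, the following sketch builds the interpolating times and evaluates \(\gamma _P\); it is purely illustrative, with coordinate tuples in \({{\mathbb {R}}}^n\) standing in for \(\ell ^\infty ({{\mathbb {N}}})\) and all names ours.

```python
import math

def interp_curve(P):
    # constant-speed piecewise-linear interpolation of a discrete path P,
    # with coordinate tuples in R^n standing in for ell^infty(N)
    L = sum(math.dist(P[k], P[k + 1]) for k in range(len(P) - 1))
    if L == 0:
        return lambda t: P[0]  # degenerate path: the constant curve
    # interpolating times T_P: t_0 = 0, t_k = (length up to p_k) / Len(P)
    t = [0.0]
    for k in range(len(P) - 1):
        t.append(t[-1] + math.dist(P[k], P[k + 1]) / L)
    def gamma(s):
        for k in range(len(P) - 1):
            if t[k] <= s <= t[k + 1] and t[k + 1] > t[k]:
                w = (s - t[k]) / (t[k + 1] - t[k])
                # convex combination with gamma(t_k) = p_k, gamma(t_{k+1}) = p_{k+1}
                return tuple((1 - w) * a + w * b for a, b in zip(P[k], P[k + 1]))
        return P[-1]
    return gamma

gam = interp_curve([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
print(gam(0.5))  # the half-length point of the path, i.e. the corner (1.0, 0.0)
```

The constant-speed property of Lemma 2.15 corresponds to the arc-length normalization of the times \(t_k\) in the sketch.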

We say that a sequence of discrete paths \(P^i\) converges to a curve \(\gamma :[0,1]\rightarrow X\), if \(\gamma _{P^i}\) converges uniformly to \(\gamma \) and if \(\lim _{i\rightarrow \infty }\textrm{Mesh}(P^i)=0\). While we do not need this, we note that this notion of convergence does not depend on the embedding to \(\ell ^\infty ({{\mathbb {N}}})\) and could be defined intrinsically. Indeed, we could also define paths with jumps \(\gamma '_P:[0,1]\rightarrow X\) piecewise: if \(t\in [t_k,t_{k+1})\) we set \(\gamma '_P(t)=p_k\), and \(\gamma '_P(1)=p_n\). Then, we say that \(P^i\) converges to \(\gamma \) if \(\gamma '_{P^i}\) converges uniformly to \(\gamma .\) It is straightforward to show that these two definitions are equivalent. However, by using the linearly interpolating curves we avoid some unfortunate technical issues later.

We need compactness results for discrete paths. First, we give a simpler compactness statement in order to illustrate the main idea in the setting where X is compact.

Lemma 2.16

If X is a compact metric space, and if \(\{P^i\}_{i\in {{\mathbb {N}}}}\) is a sequence of discrete paths with \(\lim _{i\rightarrow \infty } \textrm{Mesh}(P^i)=0\) and \(\sup _{i\in {{\mathbb {N}}}}\textrm{Len}(P^i)<\infty ,\) then there exists a subsequence \(i_k\) so that \(P^{i_k}\) converges to a curve \(\gamma :[0,1]\rightarrow X\).

Proof

Let \(\gamma _{P^i}:[0,1]\rightarrow \ell ^\infty ({{\mathbb {N}}})\) be the linearly interpolating curves and \(L:=\sup _{i\in {{\mathbb {N}}}} \textrm{Len}(P^i)<\infty \). To prove the claim, we first need to show that the sequence of linearly interpolating curves \(\gamma _{P^i}\) satisfies the assumptions of Lemma 2.11. First, Lemma 2.15 gives that each \(\gamma _{P^i}\) is L-Lipschitz. Second, let \(t\in [0,1]\), and consider \(A_t:=\{\gamma _{P^i}(t): i\in {{\mathbb {N}}}\}.\) We will show that \(A_t\) is precompact in \(\ell ^\infty ({{\mathbb {N}}})\) by showing that \(A_t\) is totally bounded.

Fix \(\eta >0\), and \(N\in {{\mathbb {N}}}\) so that for \(i\ge N\) we have \(\textrm{Mesh}(P^i)\le \eta \). Let \(K:=X\cup \{\gamma _{P^1}(t), \dots , \gamma _{P^N}(t)\}\). We claim that \(d(a,K)\le \eta \) for each \(a=\gamma _{P^i}(t)\in A_t\). For each \(i\le N\) this claim is trivial. For each \(i> N\), by Lemma 2.15, there exists a \(p^i \in P^i\) so that \(d(\gamma _{P^i}(t),p^i)\le \textrm{Mesh}(P^i)\le \eta \). Since \(p^i \in X\), we have \(d(\gamma _{P^i}(t),X)\le \eta \).

The set K is compact, and thus totally bounded. Thus, we can find points \(x_1,\dots , x_M \in K\) so that \(K\subset \bigcup _{j=1}^M B(x_j,\eta )\). Combining with the previous paragraph, we get \(A_t\subset \bigcup _{j=1}^M B(x_j,2\eta )\), which proves total boundedness since \(\eta >0\) is arbitrary.

Now, by Lemma 2.11, we have that a subsequence of \(\gamma _{P^i}\) converges to some curve \(\gamma :[0,1]\rightarrow \ell ^\infty ({{\mathbb {N}}})\). By Lemma 2.15, for each \(t\in [0,1]\), we have \(d(\gamma (t),X)\le \limsup _{i\rightarrow \infty } d(\gamma _{P^i}(t),X)\le \lim _{i\rightarrow \infty } \textrm{Mesh}(P^i) = 0\), and thus the image of \(\gamma \) is contained in X. \(\square \)

If we were to work only in proper metric spaces, the previous lemma would be quite sufficient for the proof of Theorem 1.1. However, for non-proper spaces, we need to force the discrete curves to “not pass far” from a sequence of compact sets.

First, let \(\{K_n\}_{n\in {{\mathbb {N}}}}\) be an increasing sequence of non-empty compact sets \(K_n\subset X\), with \(K_n\subset K_m\) for \(n\le m\). One should imagine these sets coming from the tightness of the measure \(\mu \) on the complete and separable metric space X. We call the sequence of continuous bounded functions \(h_n:\ell ^\infty ({{\mathbb {N}}})\rightarrow [0,\infty )\) defined by

$$\begin{aligned} h_n(x):=\sum _{k=1}^n \min \{nd(x,K_k),1\}. \end{aligned}$$
(2.17)

a good sequence of functions for \(\{K_k\}_{k\in {{\mathbb {N}}}}\).

These functions penalize paths that travel far away from the compact sets \(K_n\). Indeed, if \(d(x,K_n)\ge \eta \), then \(d(x,K_k)\ge \eta \) for each \(k\le n\) (since \(K_k\subset K_n\)), and thus \(h_n(x)\ge n\min \{n\eta ,1\}.\) If a sum involving \(h_n\) over a discrete path P is controlled, then this bound effectively forces P to stay within an \(\eta \)-neighborhood of \(K_n\) (for some \(n\in {{\mathbb {N}}}\)).
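The growth of the penalty is easy to verify numerically. A hedged sketch on the real line, identifying \(K_k\) with the interval \([-k,k]\) (a toy choice of ours, not the sets of the paper):

```python
def h(n, x, dist_to_K):
    # h_n(x) = sum_{k=1}^{n} min(n * d(x, K_k), 1), as in (2.17)
    return sum(min(n * dist_to_K(x, k), 1.0) for k in range(1, n + 1))

# toy setting on the real line: K_k = [-k, k], so d(x, K_k) = max(|x| - k, 0)
dist_to_K = lambda x, k: max(abs(x) - k, 0.0)

print(h(3, 0.0, dist_to_K))   # 0.0: the point lies in every K_k
print(h(3, 10.0, dist_to_K))  # 3.0: each of the three terms saturates at 1
```

This matches the bound \(h_n(x)\ge n\min \{n\eta ,1\}\): a point far from \(K_n\) is far from every smaller \(K_k\), so all n terms contribute.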

Since the proof of the following lemma is nearly identical to that of the previous one, we slightly abbreviate it and use the same notation.

Lemma 2.18

Let \(K_n\subset X\) be an increasing sequence of compact sets and let \(M,L,\Delta >0\) be constants. Let further \(h_n\) be a good sequence of functions for \(\{K_k\}_{k\in {{\mathbb {N}}}}\). If \(\{P^i=(p_0^i, \dots , p_{n(i)}^i)\}_{i\in {{\mathbb {N}}}}\) is a sequence of discrete paths, with \(\lim _{i\rightarrow \infty } \textrm{Mesh}(P^i)=0\), \(\sup _{i\in {{\mathbb {N}}}}\textrm{Len}(P^i)\le L\), \(\inf _{i\in {{\mathbb {N}}}}{{\,\textrm{diam}\,}}(P^i)\ge \Delta \) and

$$\begin{aligned} \sum _{k=0}^{n(i)-1} h_i(p_k^i)d(p_k^i,p_{k+1}^i)\le M \end{aligned}$$

for each \(i\in {{\mathbb {N}}}\), then there exists a subsequence of \(P^i\) which converges to a curve \(\gamma :[0,1]\rightarrow X\).

Proof

The proof proceeds in the same way as that of Lemma 2.16. First, for every \(i\in {{\mathbb {N}}}\) the linearly interpolating curve \(\gamma _{P^i}\) is L-Lipschitz. Second, if a subsequence of \(\gamma _{P^i}\) converges to some curve \(\gamma :[0,1]\rightarrow \ell ^\infty ({{\mathbb {N}}})\), then by Lemma 2.15, we have \(d(\gamma (t),X)\le \limsup _{i\rightarrow \infty } d(\gamma _{P^i}(t),X)\le \lim _{i\rightarrow \infty } \textrm{Mesh}(P^i) = 0\) for each \(t\in [0,1]\), and thus the image of \(\gamma \) is contained in X.

Thus, we only need to verify the second assumption of Lemma 2.11, i.e. that the set \(A_t=\{\gamma _{P^k}(t): k\in {{\mathbb {N}}}\}\) is precompact in \(\ell ^\infty ({{\mathbb {N}}})\) for each \(t\in [0,1]\). Equivalently, we need to show that it is totally bounded. This in turn is equivalent to the fact that for every \(\eta >0\) we can find a compact set K so that \(d(a,K)\le \eta \) for each \(a\in A_t\).

Fix \(\eta >0\) and \(t\in [0,1]\). Fix \(N\in {{\mathbb {N}}}\) so that \(\textrm{Mesh}(P^i)\le \eta /8\) for each \(i\ge N\). Then, fix \(T:= 8\lceil \eta ^{-1} \rceil +\lceil 2\,M\min \{\eta /8, \Delta \}^{-1}\rceil \), and set \({\tilde{N}}:=\max \{N,T\}\). Set \(K:= K_{{\tilde{N}}} \cup \{\gamma _{P^i}(t): i=1,\dots , {\tilde{N}}\}\).

Take any \(i\in {{\mathbb {N}}}\) and consider \(a=\gamma _{P^i}(t)\in A_t\). If \(i\le {\tilde{N}}\), we have \(d(\gamma _{P^i}(t),K)=0\le \eta \). Next, assume \(i\ge {\tilde{N}}\). By Lemma 2.15, there exists a \(p^i\in P^i\) so that \(d(p^i, \gamma _{P^i}(t))\le \textrm{Mesh}(P^i)\le \eta / 8\). If \(d(p^i, K_{{\tilde{N}}})\le \eta /2\) for each \(i\ge {\tilde{N}}\), then \(d(\gamma _{P^i}(t),K)\le \eta \), and we are done. Suppose therefore that we have some index \(i\ge {\tilde{N}}\), with \(d(p^i, K_{{\tilde{N}}})\ge \eta /2\). We will show that this leads to a contradiction and thus completes the proof.

Let \(Q^i=(q_0^i, \dots , q_{m(i)}^i) \subset P^i\) be the largest sub-path of \(P^i\) which contains \(p^i\) and which is contained in \(B(p^i, \eta /4)\). Since \(\textrm{Mesh}(P^i)\le \eta /8\), and \({{\,\textrm{diam}\,}}(P^i)\ge \Delta \), we have \({{\,\textrm{diam}\,}}(Q^i)\ge \min \{\eta /8, \Delta \}\). Then, \(d(q,K_{{\tilde{N}}})\ge \eta /4\ge {\tilde{N}}^{-1}\) for each \(q\in Q^i\). Thus, \(d(q, K_j)\ge {\tilde{N}}^{-1}\) for each \(q\in Q^i\) and \(j\le {\tilde{N}}.\) In particular \(h_i(q)\ge h_{{\tilde{N}}}(q)\ge {\tilde{N}} \ge \lceil 2\,M \min \{\eta /8, \Delta \}^{-1}\rceil \), where we used Formula (2.17). Thus,

$$\begin{aligned} \sum _{k=0}^{m(i)-1} h_{i}(q_k^i)d(q_k^i,q_{k+1}^i)\ge \lceil 2M\min \{\eta /8, \Delta \}^{-1}\rceil \textrm{Len}(Q^i) \ge 2M. \end{aligned}$$

Since \(Q^i\subset P^i\) and \(h_i\ge 0\), we have

$$\begin{aligned} 2M \le \sum _{k=0}^{m(i)-1} h_{i}(q_k^i)d(q_k^i,q_{k+1}^i)\le \sum _{k=0}^{n(i)-1} h_i(p_k^i)d(p_k^i,p_{k+1}^i) \le M, \end{aligned}$$

which is a contradiction. \(\square \)

We will also need a slightly technical lower semicontinuity statement reminiscent of [24, Proposition 4]. The sums that appear in the statement, and in Lemma 2.18, should be thought of as discrete Riemann sums.

Lemma 2.19

Let \(g:X\rightarrow [0,\infty ]\) be a lower semicontinuous function, and assume that \(\{g_i:X\rightarrow [0,\infty ]\}_{i\in {{\mathbb {N}}}}\) is an increasing sequence of continuous functions which converges to g pointwise, with \(g_i(x)\le g_j(x)\) for each \(x\in X\) and each \(i\le j\). If \(\{P^i=(p_0^i, \dots , p_{n(i)}^i)\}_{i\in {{\mathbb {N}}}}\) is a sequence of discrete paths, with \(\sup _{i\in {{\mathbb {N}}}} \textrm{Len}(P^i)<\infty \) and which converges to a curve \(\gamma :[0,1]\rightarrow X\), then

$$\begin{aligned} \int _\gamma g~ds\le \liminf _{i\rightarrow \infty }\sum _{k=0}^{n(i)-1} g_i(p_k^i)d(p_k^i, p_{k+1}^i). \end{aligned}$$

Proof

Let \(\gamma _{P^i}\) be the linearly interpolating curves of \(P^i\) and let \(L:=\sup _{i\in {{\mathbb {N}}}} \textrm{Len}(P^i)\). Use the Tietze Extension Theorem to extend each \(g_i:X \rightarrow {{\mathbb {R}}}\) to a continuous function on \(\ell ^\infty ({{\mathbb {N}}})\); by constructing the extensions recursively and taking maxima, we can ensure \(g_i \le g_j\) for \(i\le j\). We denote the extensions by the same letters. Extend the function g to all of \(\ell ^\infty ({{\mathbb {N}}})\) by the formula \(g(x):=\lim _{i\rightarrow \infty } g_i(x)\) for \(x\in \ell ^\infty ({{\mathbb {N}}})\). Since \((g_i)_{i\in {{\mathbb {N}}}}\) is an increasing sequence of continuous functions, g is lower semicontinuous.

Fix for the moment an index \(i\in {{\mathbb {N}}}\) and consider the function \(g_i\). Since \(\gamma _{P^j}\) converges uniformly to \(\gamma \) as \(j\rightarrow \infty \), the set \(K\subset \ell ^\infty ({{\mathbb {N}}})\) formed by the union of the images of the curves \(\gamma _{P^j}\) (\(j\in {{\mathbb {N}}}\)) and of \(\gamma \) is compact. On K, the function \(g_i\) is uniformly continuous.

Fix an \(\epsilon >0\). Since \(g_i|_K\) is uniformly continuous, there exists a \(\delta >0\) so that if \(x,y\in K\) and \(d(x,y)\le \delta \), then \(|g_i(x)-g_i(y)|\le \epsilon /L\). Choose then N so large that \(\textrm{Mesh}(P^j)\le \delta \) for each \(j\ge N\). If \(T_{P^j}=(t^j_0, \dots , t^j_{n(j)})\) is the sequence of interpolating times for \(P^j\), then \(d(\gamma _{P^j}(t),p^j_k)\le \textrm{Mesh}(P^j)\le \delta \) for each \(t\in [t^j_k, t^j_{k+1}]\) (\(k=0,\dots , n(j)-1\)). Consequently, for each \(j\ge N\), we get

$$\begin{aligned} &\left| \int _{\gamma _{P^j}} g_i \,ds- \sum _{k=0}^{n(j)-1} g_i(p^{j}_k) d(p^{j}_k,p^{j}_{k+1})\right| \\ &\quad = \left| \sum _{k=0}^{n(j)-1} \int _{\gamma _{P^j}|_{[t_k^j,t_{k+1}^j]}} g_i \,ds- \sum _{k=0}^{n(j)-1} g_i(p^{j}_k) d(p^{j}_k,p^{j}_{k+1})\right| \\ &\quad =\left| \sum _{k=0}^{n(j)-1} \int _{\gamma _{P^j}|_{[t_k^j,t_{k+1}^j]}} (g_i(\cdot )-g_i(p^j_k)) \,ds\right| \\ &\quad \le \textrm{Len}(P^j)\epsilon /L \le \epsilon . \end{aligned}$$

On the last line we used the fact that \(\textrm{Len}(P^j)=\textrm{Len}(\gamma _{P^j})\) by Lemma 2.15. Since \(\epsilon >0\) was arbitrary, by sending \(j\rightarrow \infty \) we get

$$\begin{aligned} \lim _{j\rightarrow \infty }&\left| \int _{\gamma _{P^j}} g_i ~ds- \sum _{k=0}^{n(j)-1} g_i(p^{j}_k) d(p^{j}_k,p^{j}_{k+1})\right| =0. \end{aligned}$$
(2.20)

By combining this with the lower semi-continuity of curve integrals (see e.g. the argument in [24, Proposition 4]) and the fact that \(g_i\le g_j\) for \(i\le j\), we have for each \(i\in {{\mathbb {N}}}\) that

$$\begin{aligned} \int _\gamma g_i ~ds&\le \liminf _{j\rightarrow \infty } \int _{\gamma _{P^j}} g_i ~ds {\mathop {=}\limits ^{2.20}} \liminf _{j\rightarrow \infty }\sum _{k=0}^{n(j)-1} g_i(p_k^j)d(p_k^j, p_{k+1}^j) \\&\le \liminf _{j\rightarrow \infty }\sum _{k=0}^{n(j)-1} g_j(p_k^j)d(p_k^j, p_{k+1}^j). \end{aligned}$$

Now, by letting \(i\rightarrow \infty \) and by using monotone convergence on the left hand side, we obtain the statement of the lemma. \(\square \)

2.4 Proof of Theorem 1.1

With the stage set, we are now able to prove the main theorem of the paper. The same proof, up to cosmetic changes, also gives Theorem 1.2. The main change is replacing the convergence of \(f_n\) in \(L^p(X)\) with convergence in \(L^1_\textrm{loc}\); instead of \(N^{1,p}(X)\), one employs the Dirichlet space defined in [21, Section 7.1] (alternatively, see [3]).

The argument will first use cut-off functions and truncation to reduce the problem to approximating non-negative and bounded functions with bounded support. This is based on the following lemma.

Lemma 2.21

If Theorem 1.1 is true for every \(f\in N^{1,p}(X)\) with \(f:X\rightarrow [0,M]\) for some \(M>0\) and with \(f|_{X{\setminus } B(x_0,R)}=0\) for some \(x_0\in X, R>0\), then Theorem 1.1 is true for all \(f\in N^{1,p}(X)\).

Proof

Let \(f\in N^{1,p}(X)\) be arbitrary. We will reduce the claim of Theorem 1.1 first to non-negative functions, then to bounded functions, and finally to functions with bounded support.

Reduction to non-negative: We can write \(f=f_+ - f_-,\) where \(f_+=\max \{f,0\}\) and \(f_- = \max \{-f,0\}\) with \(f_\pm \in N^{1,p}(X)\). Then \(g_f = g_{f_+}+ g_{f_-}\) by [21, Proposition 6.3.22]. If Theorem 1.1 is true for \(f_\pm \), then we can find a sequence of functions \(f_{\pm }^n \in {{\,\textrm{LIP}\,}}_b(X)\) with \({{\,\textrm{lip}\,}}_a[f_\pm ^n]\rightarrow g_{f_{\pm }}\) in \(L^p(X)\). Let \(f_n=f_{+}^n-f_{-}^n\) and note that \(g_{f_n}\le {{\,\textrm{lip}\,}}_a[f_n]\le {{\,\textrm{lip}\,}}_a[f_+^n]+{{\,\textrm{lip}\,}}_a[f_-^n]\). Since \({{\,\textrm{lip}\,}}_a[f_+^n]+{{\,\textrm{lip}\,}}_a[f_-^n]\rightarrow g_{f_+}+ g_{f_-}=g_f\) in \(L^p(X)\), Lemma 2.9 gives that \({{\,\textrm{lip}\,}}_a[f_n]\rightarrow g_f\) and \(g_{f_n}\rightarrow g_f\) in \(L^p(X)\). In other words, \(f_n \in {{\,\textrm{LIP}\,}}_b(X)\) approximates f in energy. Thus, without loss of generality it suffices to prove the claim for all non-negative functions f. Let \(f:X\rightarrow [0,\infty )\) be an arbitrary non-negative function in \(N^{1,p}(X)\).

Reduction to a bounded case: Consider functions \(f_M = \min \{f,M\}\) for all \(M>0\). Then, we have \(f_M \rightarrow f\) in \(L^p(X)\) as \(M\rightarrow \infty \). By [21, Proposition 6.3.22], the function \(g_M:=g_f \cdot 1_{X \setminus f^{-1}[0,M]}\) is a minimal p-weak upper gradient for \(f-f_M\). One directly observes that \(g_M \rightarrow 0\) in \(L^p(X)\). Thus, \(f_M\) converges to f in norm in \(N^{1,p}(X)\). In particular, \(g_{f_M}\rightarrow g_{f}\) in \(L^p(X)\). If Theorem 1.1 is true for each \(f_M\), then there exist sequences \(\{f_M^n\}_{n\in {{\mathbb {N}}}}\) which converge in energy to \(f_M\) and with \({{\,\textrm{lip}\,}}_a[f_M^n]\rightarrow g_{f_M}\) in \(L^p(X)\) as \(n\rightarrow \infty \). By a diagonal argument, with \(M\rightarrow \infty \) together with \(n\rightarrow \infty \), we get the claim of Theorem 1.1 for f.

Finally, we can assume that \(f\in N^{1,p}(X)\) and that for some \(M>0\) we have \(0\le f\le M\).

Reduction to bounded support: Let \(x_0 \in X\) be any fixed point, and consider the functions \(f_R(x) = f \psi _R(x)\), where \(\psi _R(x) = \max \{0,\min \{1, R-d(x_0,x)\}\}\) and \(R\in {{\mathbb {N}}}\), \(R>0\). The functions \(\psi _R\) are 1-Lipschitz and \(0\le \psi _R \le 1\). Further, \(f_R\rightarrow f\) pointwise and in \(L^p(X)\). Further, the function \(f-f_R\) has a weak upper gradient \(g_R=1_{X {\setminus } B(x_0,R-1)}g_f + f 1_{B(x_0,R+1){\setminus } B(x_0,R-1)}\) as follows from the Leibniz rule in [21, Proposition 6.3.28 and Proposition 6.3.22]. Thus \(g_R \rightarrow 0\) in \(L^p(X)\) and we get that \(f_R\rightarrow f\) in the norm of \(N^{1,p}(X)\). Therefore, functions with bounded support are norm dense, and we can repeat the argument in the previous step to obtain the lemma. Indeed, by assumption each \(f_R\) admits a sequence \(\{f_R^n\}_{n\in {{\mathbb {N}}}}\) of functions in \({{\,\textrm{LIP}\,}}_b(X)\) with \(f_R^n \rightarrow f_R\) in \(L^p(X)\) and with \(g_{f_R^n}\rightarrow g_{f_R}\) in \(L^p(X)\) as \(n\rightarrow \infty .\) Then, a diagonal argument concludes the proof. \(\square \)
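The three reductions act on f through elementary pointwise operations; the following sketch composes them in the order used in the proof (the real line as a toy space, all names ours, purely for illustration).

```python
def positive_part(f):
    return lambda x: max(f(x), 0.0)           # f_+ = max(f, 0)

def truncate(f, M):
    return lambda x: min(f(x), M)             # f_M = min(f, M)

def cutoff(R, x0, d):
    # the 1-Lipschitz function psi_R: equal to 1 on B(x0, R-1), 0 outside B(x0, R)
    return lambda x: max(0.0, min(1.0, R - d(x0, x)))

d = lambda a, b: abs(a - b)   # the real line as a toy metric space
f = lambda x: x               # unbounded sample function; f_- handles the sign change

# a non-negative, bounded function with bounded support, as in the reductions
fR = lambda x: truncate(positive_part(f), 5.0)(x) * cutoff(10.0, 0.0, d)(x)
print(fR(3.0), fR(7.0), fR(9.5), fR(12.0))  # 3.0 5.0 2.5 0.0
```

Each operation preserves membership in \(N^{1,p}(X)\) (by the lattice and Leibniz properties cited above), which is what makes the reductions legitimate.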

With this lemma, we are left to consider bounded non-negative functions with bounded support. For such functions we construct the approximation by making appropriate choices of \(g,A,\delta \) in Formula (2.12). In the proof, we first state a goal that these choices will guarantee, and show that reaching this goal suffices for the claim. After stating that goal, we construct a sequence of approximations \(f_n\) that reaches it for all large enough \(n\in {{\mathbb {N}}}\). (The actual approximating sequence of f would be obtained by a diagonal sequence, sending the parameter \(\epsilon >0\), which is introduced in the proof, to zero together with \(n\rightarrow \infty \).) The delicate part of the proof is a pointwise convergence result for the approximating sequence \(f_n\), which is established using the lemmas above and a contradiction argument.

Proof of Theorem 1.1

By Lemma 2.21, we may assume that \(f\in N^{1,p}(X)\) satisfies \(f:X\rightarrow [0,M]\) for some \(M>0\) and \(f|_{X{\setminus } B(x_0,R)}=0\) for some \(x_0\in X, R> 2\). Further, let \(g_f \in L^p(X)\) be the minimal p-weak upper gradient of f.

Goals of proof: Let \(\epsilon \in (0,1)\) be fixed. We will show that we can find functions \(g_\epsilon \in L^p(X)\) and \(f_\epsilon \in {{\,\textrm{LIP}\,}}_b(X)\) so that \(g_\epsilon \ge {{\,\textrm{lip}\,}}_a[f_\epsilon ]\),

$$\begin{aligned} \int _X |g_\epsilon -g_f|^p d\mu \le \epsilon , \end{aligned}$$
(2.22)

and

$$\begin{aligned} \int _X |f-f_\epsilon |^p d\mu \le \epsilon . \end{aligned}$$
(2.23)

The claim of the theorem then follows directly from Lemma 2.9 by using the inequality \({{\,\textrm{lip}\,}}_a[f_\epsilon ] \ge g_{f_\epsilon }\) and by choosing a sequence of \(\epsilon \) with \(\epsilon \searrow 0\).

Choice of \(g_\epsilon \) and \(g_n\): First, by Remark 2.2, we can choose a lower semicontinuous upper gradient \(g_1\) of f so that \(g_1\ge g_f\) and

$$\begin{aligned} \int _X |g_1-g_f|^p d\mu \le \epsilon 4^{-2p}. \end{aligned}$$
(2.24)

Further, since 0 is an upper gradient for f restricted to the set \(X\setminus B(x_0,R)\), we can set \(g_1(x)=g_f(x)=0\) for all \(x \in X \setminus B(x_0, R)\). Indeed, if \(g_1\) is an upper gradient that does not satisfy this property, we may modify it so that it does. This modification preserves both lower semicontinuity and the property of being an upper gradient. The latter is seen directly from the definition of an upper gradient as follows. Decompose the domain of the curve \(\gamma \) appearing in Definition (2.1) into the open set \(\gamma ^{-1}(B(x_0,R))\), which is an at most countable union of open intervals, and the compact set \(\gamma ^{-1}(X{\setminus } B(x_0,R))\), on which \(f=0\). By applying (2.1) to each of the open intervals comprising \(\gamma ^{-1}(B(x_0,R))\), and summing over them, we obtain (2.1) for the curve \(\gamma \).

By Lusin’s theorem and inner regularity of \(\mu \), we can choose an increasing sequence of compact sets \(K_n \subset B(x_0,2R)\) so that \(\mu (B(x_0,2R) \setminus K_n) \le \epsilon n^{-p}4^{-n-2p}\) and so that \(f|_{K_n}\) is continuous. Since \(K_n\) is an increasing sequence of sets, we get

$$\begin{aligned} \int _X \left( \sum _{n=1}^\infty 1_{B(x_0,2R) \setminus K_n}\right) ^p d\mu&\le \sum _{m=2}^\infty \int _{K_m\setminus K_{m-1}} \left( \sum _{n=1}^\infty 1_{B(x_0,2R) \setminus K_n}\right) ^p d\mu \nonumber \\&\le \sum _{m=2}^\infty \mu (K_m\setminus K_{m-1})(m-1)^p \le \epsilon 4^{-2p-2}. \end{aligned}$$
(2.25)
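The middle inequality in (2.25) rests on a pointwise bound which we spell out for clarity; on each difference set the infinite sum of indicators is in fact finite:

```latex
% Fix m \ge 2 and x \in K_m \setminus K_{m-1}. Since the sets K_n increase,
% x \in K_n for every n \ge m, so 1_{B(x_0,2R) \setminus K_n}(x) = 0 for n \ge m. Hence
\sum_{n=1}^{\infty} 1_{B(x_0,2R) \setminus K_n}(x)
  = \sum_{n=1}^{m-1} 1_{B(x_0,2R) \setminus K_n}(x) \le m-1,
% and integrating the p-th power over K_m \setminus K_{m-1} yields
\int_{K_m \setminus K_{m-1}} \Bigl( \sum_{n=1}^{\infty} 1_{B(x_0,2R) \setminus K_n} \Bigr)^{p} d\mu
  \le \mu(K_m \setminus K_{m-1})\,(m-1)^{p}.
```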

Choose a \(\sigma \in (0,1)\) so that \(\mu (B(x_0,2R))\sigma ^p \le \epsilon 4^{-2p}\), and define again \(\psi _{2R}(x) = \max \{0,\min \{1, 2R-d(x_0,x)\}\}\). Since \(R>2\), we have \(\psi _{2R}|_{B(x_0,3R/2)}=1\) and \(\psi _{2R}|_{X\setminus B(x_0,2R)}=0\).

Define

$$\begin{aligned} g_\epsilon (x) :=g_1(x) + \sigma \psi _{2R}(x)+\sum _{n=1}^\infty 1_{B(x_0,2R) \setminus K_n}(x). \end{aligned}$$
(2.26)

Inequality (2.22) follows from (2.24), (2.25) and (2.26). Also, \(g_\epsilon \) is lower semicontinuous.

By Baire’s theorem for lower semicontinuous functions, cf. [21, Proposition 4.2.2], we can find an increasing sequence of bounded continuous functions \(\{{\tilde{g}}_n\}_{n\in {{\mathbb {N}}}}\) converging to \(g_1\) with \(0\le {\tilde{g}}_m \le {\tilde{g}}_n \le g_1\) for \(m\le n\). Define

$$\begin{aligned} g_n(x) = {\tilde{g}}_n(x) +\sigma \psi _{2R}(x)+ \sum _{k=1}^n \min \{nd(x,K_k),1_{B(x_0,2R)}(x)\}. \end{aligned}$$
(2.27)

From this definition, we get that \(g_n\le g_m\) for \(n\le m\), that \(0 \le g_n\le g_\epsilon \) and that \(g_n\) converges pointwise to \(g_\epsilon \) as \(n \rightarrow \infty \). Finally, choose \(L\in {{\mathbb {N}}}\) so that \(\mu (B(x_0,2R) \setminus K_L) \le \epsilon (2M)^{-p}\), and define the closed set \(A:=K_L \cup (X\setminus B(x_0,R))\). Since \(f|_{K_L}\) and \(f|_{X{\setminus } B(x_0,R)}\) are continuous, the function \(f|_A\) is continuous.
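For clarity, the claimed pointwise convergence \(g_n \rightarrow g_\epsilon \) can be checked term by term against (2.26) and (2.27); a sketch:

```latex
% \tilde g_n \nearrow g_1 by the choice via Baire's theorem, and for each fixed k,
\min\{n\,d(x,K_k),\, 1_{B(x_0,2R)}(x)\}
  \xrightarrow[\;n\to\infty\;]{} 1_{B(x_0,2R) \setminus K_k}(x)
\quad \text{for every } x \in X,
% since d(x,K_k) > 0 for x \notin K_k (as K_k is closed), while both sides vanish
% when x \in K_k or x \notin B(x_0,2R). Each term is nondecreasing in n, so
g_n \nearrow g_\epsilon \quad \text{pointwise on } X.
```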

We remark that in definitions (2.26) and (2.27), we could avoid adding the summation term to \(g_\epsilon \) and \(g_n\) if the space X were proper. This is one place where properness would yield a simplification. In that case, later in the proof we would use the simpler Lemma 2.16 (with the bounded subset \(\overline{B(x_0,2R)}\) replacing X) instead of Lemma 2.18.

Approximating sequence \(f_n\): Define for each \(n\in {{\mathbb {N}}}\) the approximating function \(f_n\) with data \((f,g_n,A,M,n^{-1})\) by the formula

$$\begin{aligned} f_n(x):=\min \left\{ M,\inf _{p_0,\dots ,p_N} f(p_0)+\sum _{k=0}^{N-1} g_n(p_k)d(p_k,p_{k+1})\right\} , \end{aligned}$$
(2.28)

where the infimum is taken over all \((n^{-1},A,x)\) admissible discrete paths \((p_0,\dots , p_N)\).

Since \(g_n\) is bounded and continuous for each \(n\in {{\mathbb {N}}}\), Properties A, B, E and F of Lemma 2.13 yield that \(f_n:X\rightarrow [0,M]\) is a Lipschitz function with

$$\begin{aligned} {{\,\textrm{lip}\,}}_a[f_n]\le g_n\le g_\epsilon \quad \text { and } \quad 0\le f_n|_A\le f|_A. \end{aligned}$$

Further, Property C of Lemma 2.13 gives that \(f_n(x)=0\) for each \(x\in X\setminus B(x_0,R)\). Thus \(f_n \in {{\,\textrm{LIP}\,}}_b(X)\).

The main step remaining is to show that for each \(x\in K_L\) we have \(\lim _{n\rightarrow \infty }f_n(x) = f(x)\); that is, the task is to prove pointwise convergence on \(K_L\). Indeed, suppose that we have shown this. Note that \(f_n(x)=f(x)=0\) on \(A\setminus K_L\), since \(A\setminus K_L \subset X\setminus B(x_0,R)\). By Lebesgue dominated convergence, since \(f_n,f:X\rightarrow [0,M]\), we get for n large enough that

$$\begin{aligned} \int _{A} |f_n-f|^p ~d\mu = \int _{K_L} |f_n-f|^p ~d\mu \le \epsilon /2. \end{aligned}$$

We have \(f_n(x)=f(x)=0\) for \(x\in X{\setminus } B(x_0,R)\) and \(f_n(x),f(x) \in [0,M]\) for \(x\in B(x_0,R){\setminus } K_L\). Thus, from the choice of L and the fact \(X=A \cup (B(x_0,R)\setminus K_L)\), we get for \(n\in {{\mathbb {N}}}\) large enough

$$\begin{aligned} \int _X |f-f_n|^p ~d\mu= & {} \int _{A} |f-f_n|^p ~d\mu + \int _{B(x_0,R)\setminus K_L} |f-f_n|^p ~d\mu \quad \\ {}\le & {} \epsilon /2 + M^p \mu (B(x_0,R)\setminus K_L) \le \epsilon . \end{aligned}$$

Thus, the function \(f_\epsilon =f_n\) for n large enough and \(g_\epsilon \) realize all the aspects of the goal of the proof. We are left to show that \(f_n|_{K_L}\) converges to \(f|_{K_L}\) pointwise.

Pointwise convergence \(\lim _{n\rightarrow \infty } f_n(x) = f(x)\) for \(x\in K_L\): For the sake of a contradiction, assume that there exists some \(x\in K_L \subset B(x_0,R)\) for which pointwise convergence fails.

For \(n\ge m\) we have \(g_n\ge g_m\). Moreover, for each \(x\in X\) any \((n^{-1},A,x)\)-admissible path is also \((m^{-1},A,x)\)-admissible, so \(f_n\ge f_m\). Hence, the sequence \(\{f_n(x)\}_{n\in {{\mathbb {N}}}}\) is increasing in \(n\in {{\mathbb {N}}}\). By Lemma 2.13 we have \(f_n(x) \le f(x)\) for each \(n\in {{\mathbb {N}}}\). Thus, the limit \(\lim _{n\rightarrow \infty } f_n(x)\) exists. Since the limit is not equal to f(x), there exists a constant \(\delta >0\) so that \(\lim _{n\rightarrow \infty }f_n(x) \le f(x)-\delta \).

Because \(f(x)\le M\), we get \(f_n(x) \le M-\delta \) for each \(n\in {{\mathbb {N}}}\). From the definition of \(f_n(x)\) in Formula (2.28), we obtain discrete paths \(P^n=(p^n_{0},\dots , p^{n}_{N_n})\) which are \((n^{-1},A,x)\)-admissible, and with

$$\begin{aligned} f(p_0^n) + \sum _{k=0}^{N_n-1} g_n(p^n_k) d(p^n_k,p^n_{k+1}) < f(x)-\delta /2 \le M. \end{aligned}$$
(2.29)
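The paths in (2.29) arise from (2.28) by a routine step, which we record:

```latex
% Since f_n(x) \le f(x) - \delta \le M - \delta < M, the minimum with M in (2.28)
% is not attained, i.e. f_n(x) equals the infimum over admissible paths. Choosing
% an (n^{-1},A,x)-admissible path (p^n_0,\dots,p^n_{N_n}) within \delta/2 of this
% infimum gives
f(p_0^n) + \sum_{k=0}^{N_n-1} g_n(p^n_k)\, d(p^n_k, p^n_{k+1})
  < f_n(x) + \frac{\delta}{2}
  \le f(x) - \delta + \frac{\delta}{2}
  = f(x) - \frac{\delta}{2}.
```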

Our contradiction will be obtained by finding a curve in the limit of these discrete paths. Recall that \((n^{-1},A,x)\)-admissibility means that \(p^n_0\in A\), \(\textrm{Mesh}(P^n)\le n^{-1}\) and \(p^n_{N_n}=x\).

First, we restrict to a sub-path. Let \(Q^n=(q^n_0, \dots ,q^n_{M_n})\) be the largest sub-path of \(P^n\) with \(q^n_{M_n}=x\) and with \(Q^n \subset B(x_0,3R/2)\). We show that \(Q^n\) is \((n^{-1},A,x)\)-admissible and satisfies Estimate (2.30). Since \(Q^n \subset P^n\), it is clear that \(\textrm{Mesh}(Q^n)\le n^{-1}\). Further, by construction \(q^{n}_{M_n}=x\). Thus, we need to check that \(q^n_0\in A\).

If \(Q^n=P^n\), then clearly \(Q^n\) is \((n^{-1},A,x)\)-admissible and \(f(q_0^n)=f(p_0^n)\) and (2.30) is immediate. Next, assume that \(Q^n \subsetneq P^n\). Recall that at the beginning of the proof we assumed \(R>2\). Since \(\textrm{Mesh}(P^n)\le 1 < R/2\) and since \(Q^n\) is the largest sub-path with the given inclusion, we must have \(q^n_0 \in B(x_0,3R/2) {\setminus } B(x_0,R)\). Thus, \(q^n_0\in A\), and \(Q^n\) is still \((n^{-1},A,x)\)-admissible. Also, \(0=f(q_0^n)\le f(p_0^n)\). These give that

$$\begin{aligned} f(q_0^n) + \sum _{k=0}^{M_n-1} g_n(q^n_k) d(q^n_k,q^n_{k+1})\le & {} f(p_0^n) + \sum _{k=0}^{N_n-1} g_n(p^n_k) d(p^n_k,p^n_{k+1})\nonumber \\ {}< & {} f(x)-\delta /2 \le M. \end{aligned}$$
(2.30)

We will show that \(\{Q^n\}_{n\in {{\mathbb {N}}}}\) satisfies the assumptions of Lemma 2.18. This consists of verifying the mesh, length, diameter and h-sum conditions for a good sequence of functions. Recall the definition of a good sequence of functions for \(\{K_n\}_{n\in {{\mathbb {N}}}}\):

$$\begin{aligned} h_n(x):=\sum _{k=1}^n \min \{nd(x,K_k),1\}. \end{aligned}$$
(2.31)

Here is another point where properness would simplify the proof. If X were proper, we would only need to verify the mesh and length conditions and use Lemma 2.16. No h-sum condition, diameter condition or good sequence of functions would be involved.

  (1)

    Mesh condition \(\lim _{n\rightarrow \infty }\textrm{Mesh}(Q ^n)=0\): We have

    $$\begin{aligned} \lim _{n\rightarrow \infty }\textrm{Mesh}(Q ^n)\le \lim _{n\rightarrow \infty } n^{-1}=0,\end{aligned}$$

    since \(Q^n\) is \((n^{-1},A,x)\)-admissible.

  (2)

    Length bound \(\sup _{n\in {{\mathbb {N}}}} \textrm{Len}(Q^n)<\infty \): Fix \(n\in {{\mathbb {N}}}\). Recall (2.27), \(Q^n \subset B(x_0,3R/2)\) and \(R> 2\). The definition of \(\psi _{2R}\) guarantees that \(\psi _{2R}|_{B(x_0,3R/2)}=1\). Thus, we get \(g_n(q)\ge \sigma \) for every \(q\in Q^n\). By (2.30) and by setting \(C:=\sigma ^{-1}M\) we get

    $$\begin{aligned} \textrm{Len}(Q^n)= & {} \sigma ^{-1}\sum _{k=0}^{M_n-1} \sigma d(q^n_k,q^n_{k+1}) \nonumber \\\quad\le & {} \sigma ^{-1}\left( f(q_0^n) + \sum _{k=0}^{M_n-1} g_n(q^n_k) d(q^n_k,q^n_{k+1})\right) \le C, \end{aligned}$$
    (2.32)

    for all \(n\in {{\mathbb {N}}}.\)

  (3)

    Diameter bound \(\inf _{n\in {{\mathbb {N}}}} {{\,\textrm{diam}\,}}(Q^n)>0\): Fix \(n\in {{\mathbb {N}}}\). We stated earlier in the proof that \(f|_{A}\) is continuous. Thus, we can find a constant \(\Delta >0\) so that \(y\in A\) and \(d(x,y) \le \Delta \) imply \(|f(x)-f(y)|\le \delta /4\). We get \(f(q_0^n)<f(x)-\delta /2\) from Inequality (2.30). Since \(q_0^n\in A\) and \(|f(x)-f(q_0^n)|>\delta /2>\delta /4\), we get \(d(q_0^n,x)=d(q_0^n,q_{M_n}^{n})\ge \Delta \). In particular, \({{\,\textrm{diam}\,}}(Q^n)\ge \Delta \).

  (4)

    h-sum bound: Let \(h_n\) be the good sequence of functions for \(\{K_n\}_{n\in {{\mathbb {N}}}}\). Note that, by definition (2.31), \(h_n|_{B(x_0,2R)}\le g_n|_{B(x_0,2R)}\). Inequality (2.30) together with \(Q^n\subset B(x_0,2R)\) thus gives

    $$\begin{aligned} \sum _{k=0}^{M_n-1} h_n(q^n_k) d(q^n_k,q^n_{k+1}) \le \sum _{k=0}^{M_n-1} g_n(q^n_k) d(q^n_k,q^n_{k+1}) \le M. \end{aligned}$$

As a consequence, Lemma 2.18 shows that a subsequence of \(Q^n\) converges to a curve \(\gamma :[0,1]\rightarrow X\). We now pass to this subsequence.

Since \(q_0^n\in A\), A is closed and \(\lim _{n\rightarrow \infty } q_0^n =\gamma (0)\), we get that \(\gamma (0)\in A\). Similarly, \(\gamma (1)=x\). Recall that \(f|_{A}\) is continuous. Therefore, \(f(\gamma (0)) = \lim _{n\rightarrow \infty } f(q^{n}_0)\). Recall that the increasing sequence of functions \(g_n\) converges pointwise to \(g_\epsilon \). From Lemma 2.19 applied to \(Q^n,\gamma ,g_n\) and \(g_\epsilon \), we obtain

$$\begin{aligned} f(\gamma (0)) + \int _\gamma g_\epsilon ~ds {\mathop {\le }\limits ^{2.19}} \liminf _{n\rightarrow \infty } \left( f(q^{n}_0) + \sum _{k=0}^{M_n-1} g_{n}(q^n_k) d(q^{n}_k,q^{n}_{k+1})\right) {\mathop {\le }\limits ^{2.30}} f(\gamma (1))-\delta /2. \end{aligned}$$

We obtain

$$\begin{aligned} f(\gamma (1))-f(\gamma (0)) > \int _\gamma g_\epsilon ~ds, \end{aligned}$$

which contradicts the upper gradient inequality (2.1) for \(g_\epsilon \): since \(g_\epsilon \ge g_1\) and \(g_1\) is a genuine upper gradient of f, inequality (2.1) holds along every curve, in particular along \(\gamma \). \(\square \)

Remark 2.33

The proof above shows a more technical statement, which we highlight for purposes of future work. Suppose that f is non-negative, bounded by M, supported in \(B(x_0,R)\), and has a lower semicontinuous upper gradient \(g_1\). Let \(\sigma >0\) be arbitrary, take any increasing sequence of compact sets \(K_n\) as above, and define the function \(g_\epsilon \) as in (2.26). By the proof, for every \(L\in {{\mathbb {N}}}\) there is a sequence of Lipschitz functions \(f_m\) with \({{\,\textrm{lip}\,}}_a[f_m] \le g_\epsilon \) and \(f_m \rightarrow f\) pointwise on each compact set \(K_L\). Further, if X is proper, we do not need to modify \(g_\epsilon \) with the summation term. Also, if \(g_1\) is bounded below on \(B(x_0,2R)\), we do not even need to add \(\sigma \psi _{2R}\), which is only added to ensure the length bound for \(Q^n\).

Additionally, the proof shows the following: if f is a continuous function with bounded support, then we do not need to consider the exhaustion by compact sets \(K_n\) and get convergence on all of \(A=X\). (This may be useful, for example, if f is continuous but not Lipschitz. On the other hand, even if f is Lipschitz, one could have \({{\,\textrm{lip}\,}}_a[f]>g_f\), and wish to construct approximations.)

A natural follow-up work would consider these techniques in other Banach function spaces and the associated Newton-Sobolev spaces, as has been done in [8, 17, 18, 36]. See also the versions in general Banach function spaces in [33]. In these other settings, one would first need to ensure a lower semicontinuous upper gradient \(g_1\) which is close in norm to the minimal one (by a version of the Vitali-Carathéodory theorem as in Remark 2.2 and [32]). Then, one would check an appropriate version of Lemma 2.9. Finally, one would need to argue that the choices of \(K_n\) and \(\sigma \) can be made so that \(g_\epsilon \) and \(g_1\) are close in norm; this relies on some absolute continuity and monotone convergence in the applicable Banach function space. In proper metric spaces this should be slightly easier. For this argument, some form of the Vitali-Carathéodory theorem holding for the Banach function space seems necessary, see [33]. Further ideas or techniques, such as some form of differential structure, would be needed to upgrade the density in energy to density in norm. For such ideas, see [11].