1 Introduction

The theory of convex functions in CAT(1)-spaces has received considerable attention in the last 20 years or so. The theory first started in the papers of Jost [13, 14] and Mayer [20], which established, among other results, Crandall–Liggett-type [10] generation theorems for nonlinear semigroups corresponding to semi-convex functions in the CAT(0)-setting. The theory was further expanded in the monographs [2, 5], where the theory of gradient flows was studied in great detail. This was followed by further investigations into convex optimization theory in the CAT(0)-setting [5]. Somewhat more recently, Ohta and the second author have developed a theory of discrete [24] and continuous-time [25] gradient flows for lower semi-continuous semi-convex functions in CAT(1)-spaces. Note that it is enough to study the CAT(1)-setting, since for \(\kappa >0\), the metric \(d(\cdot ,\cdot )\) of a CAT(\(\kappa \))-space can be rescaled to \(\sqrt{\kappa }\,d(\cdot ,\cdot )\) so that it becomes a CAT(1)-space with this new metric. Using the Moreau–Yosida resolvent \(J^\phi _\tau (x)\) for a lower semi-continuous \(\lambda \)-convex function \(\phi :X\rightarrow (-\infty ,\infty ]\) and \(\tau >0\), one can construct the gradient flow as

$$\begin{aligned} S(t)x_0=\lim _{n\rightarrow \infty }\left( J^{\phi }_{t/n}\right) ^{n}(x_0) \end{aligned}$$
(1)

for a starting point \(S(0)x_0:=x_0\) and \(t>0\). This curve \(\xi (t)=S(t)x_0\) is characterized as the unique solution [22] to the Evolution Variational Inequality

$$\begin{aligned} \frac{1}{2} \frac{d}{dt} \big [ d^2\big ( \xi (t),y \big ) \big ] +\frac{\lambda }{2} d^2\big ( \xi (t),y \big ) +\phi \big ( \xi (t) \big ) \le \phi (y) \end{aligned}$$

for all \(y \in D(\phi )\) and almost all \(t>0\).
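To make the iteration (1) concrete, it can be tested numerically in the simplest model case \(X={\mathbb {R}}\) with the hypothetical potential \(\phi (x)=x^2/2\) (not one appearing in this paper), for which the resolvent has the closed form \(J_\tau (x)=x/(1+\tau )\). The following Python sketch is only an illustration of the scheme, under these assumptions:

```python
import math

def resolvent(x, tau):
    # Moreau-Yosida resolvent of phi(x) = x^2/2 on the real line:
    # J_tau(x) = argmin_z { z^2/2 + (x - z)^2/(2*tau) } = x/(1 + tau)
    return x / (1.0 + tau)

def gradient_flow(x0, t, n):
    # Crandall-Liggett-type iteration (1): S(t)x0 ~ (J_{t/n})^n (x0)
    x = x0
    for _ in range(n):
        x = resolvent(x, t / n)
    return x

# the exact gradient flow of x^2/2 is xi(t) = x0 * e^{-t},
# and (1 + t/n)^{-n} -> e^{-t} recovers it as n -> infinity
x0, t = 1.0, 2.0
assert abs(gradient_flow(x0, t, 10_000) - x0 * math.exp(-t)) < 1e-3
```

Here the convergence in (1) reduces to the elementary limit \((1+t/n)^{-n}\rightarrow e^{-t}\).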

Variational convergence for sequences of functions in CAT(0)-spaces has been considered in the meantime; see Jost [13,14,15] and then Kuwae–Shioya [17], which introduced the concept of Mosco convergence for sequences of convex functions in a CAT(0)-space, extending the pioneering works of Mosco [21] and Dal Maso [11]. See also [3] for a very detailed treatment of the Banach and Hilbert settings. In particular, following [14], it was established in [17] that Mosco convergence of lower semi-continuous convex functions implies the convergence of the corresponding Moreau–Yosida resolvents and, in turn, the convergence of their gradient flows, provided that all functions are bounded from below. This latter requirement was relaxed in [6], where convergence was established through (1) also in the case of limits of sequences of metric spaces. The author of [6] considered the general notion of an asymptotic relation of metric spaces, which includes, among others, Gromov–Hausdorff limits. An essential feature of these approaches is the concept of weak convergence in a CAT(0)-space, where closed and bounded sets turn out to be weakly sequentially compact [6]. The concept of weak convergence and compactness carries over to the CAT(1)-setting, as has been investigated by a number of authors along with the study of (semi-)convex functions [16, 26].

The purpose of this paper is to generalize the convergence of resolvents and gradient flows of Mosco convergent sequences of lower semi-continuous semi-convex functions to the CAT(1)-setting from the CAT(0)-setting with convex functions [5, 6, 17]. In order to utilize weak convergence, and thus to make sense of Mosco convergence, we need to assume some diameter bounds on our spaces in the most general case of arbitrary lower semi-continuous semi-convex functions. Under the somewhat stronger assumptions of uniform lower boundedness or Lipschitz continuity of our function sequence we are able to prove convergence without diameter restrictions. We also extend to the case of asymptotic relations, by generalizing the required weak convergence and compactness results to the CAT(1)-setting, which may be of independent interest. In [6] the author utilizes local and global slope estimates for lower semi-continuous functions in the CAT(0)-setting established in [2]. The second author with Ohta in [25] established more direct estimates of the error terms of minimizing movements related to gradient flows for semi-convex functions compared to those in [2]. We utilize these estimates in this paper in order to control the error terms of minimizing movements converging to gradient curves. This approach is more direct than the one available for CAT(0)-spaces in [6], and thus provides a simpler argument. We also rely on the Lipschitz estimates of [18] established for Moreau–Yosida resolvents locally in CAT(1)-spaces. The local nature of our analysis is one of the essential differences between the CAT(0)- and CAT(1)-settings.

Before moving on to the next section we fix some notation used in this paper. Given a real number \(\kappa > 0\), the model space \({M}_\kappa ^{2}\) is obtained from the sphere \(S^2\) by rescaling the distance function by the constant \(1/\sqrt{\kappa }\). For the definition of CAT(\(\kappa \))-spaces we refer to the textbook [9]. A subset A of a CAT(\(\kappa \))-space (X, d) is said to be \(\lambda \)-convex for \(\lambda > 0\) if any two points \(x, y \in A\) with \(d(x, y) < \lambda \) can be joined by a unique minimal geodesic, denoted by [x, y], contained in A. In this paper all CAT(\(\kappa \))-spaces are assumed to be complete.

2 Weak convergence in CAT(1)-spaces

We begin by recalling the concept of weak convergence in a CAT(1) space [13]; see also Definition 27 in [26]. In the definition we make use of \(P_Z(x)\), which denotes the unique nearest-point projection of the point \(x\in X\) onto a closed \(\pi \)-convex set \(Z\subseteq X\) with \(d(Z,x):=\inf _{z\in Z}d(z,x)<\pi /2\). Under these conditions \(P_Z\) is a retraction with the obtuse-angle property, see for example Exercise 2.6(1) in [9].

Definition 2.1

(Weak convergence) Let (X, d) be a CAT(1) space. A net \(x_i\in X\) included in \(B_x(\pi /2)\) with \(x\in X\) weakly converges to the point x, that is \(x_i\overset{w}{\rightarrow }\ x\), if for any \(\pi \)-convex geodesic segment \(\gamma \subseteq X\) with \(\gamma (0)=x\), the projections \(P_{\gamma }(x_i)\) converge strongly to x.

The following result characterizes weak convergence; in the case of CAT(0)-spaces this lemma can be found in [6]. In the CAT(1)-case the price to be paid in the formulation is the boundedness implied by the required convexity of geodesic segments.

Lemma 2.1

Let (X, d) be a CAT(1) space. Let \(x_n,x\in X\) be such that \(d(x_n,x)<\pi /2\) for all \(n\in {\mathbb {N}}\). Then \(x_n\rightarrow x\) if and only if \(x_n\overset{w}{\rightarrow }\ x\) and \(d(x_n,y)\rightarrow d(x,y)\) for some \(y\in X\) such that \(d(x,y)<\pi /2\).

Proof

The \("\Rightarrow "\) implication is trivial, so we only need to prove the \("\Leftarrow "\) implication. Let \(x_n\overset{w}{\rightarrow }\ x\) and \(d(x_n,y)\rightarrow d(x,y)\) for some \(y\in X\) such that \(d(x,y),d(x_n,x)<\pi /2\). We can assume that \(d(y,x)>0\), since otherwise \(d(x_n,x)\rightarrow d(x,x)=0\) already implies \(x_n\rightarrow x\). Then, by Exercise 2.6(1) in [9], we have the Alexandrov angle bound \(\angle _{P_{[x,y]}(x_n)}(x_n,y)\ge \pi /2\), which we call the obtuse-angle property; thus by the spherical law of cosines we have

$$\begin{aligned} \begin{aligned} \cos d(x_n,y)&\le \cos d(x_n,P_{[x,y]}(x_n))\cos d(y,P_{[x,y]}(x_n))\\&\quad +\sin d(x_n,P_{[x,y]}(x_n))\sin d(y,P_{[x,y]}(x_n))\cos \angle _{P_{[x,y]}(x_n)}(x_n,y)\\&\le \cos d(x_n,P_{[x,y]}(x_n))\cos d(y,P_{[x,y]}(x_n)). \end{aligned} \end{aligned}$$

Now, weak convergence implies convergence of the projections \(P_{[x,y]}(x_n)\rightarrow x\). Moreover, since \(P_{[x,y]}(x_n)\rightarrow x\) and \(d(x,y)>0\), we may assume \(P_{[x,y]}(x_n) \ne y\). Thus, the assumption \(d(x_n,y)\rightarrow d(x,y)\) combined with the above estimate implies

$$\begin{aligned} \cos d(x_n,P_{[x,y]}(x_n))\rightarrow 1, \end{aligned}$$
(2)

thus \(x_n\rightarrow x\) as well. \(\square \)
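The projection inequality used above can be checked numerically on the model space \(S^2\). The sketch below (an illustration only; the points and perturbation scale are our own choices) projects random points onto the great circle through two fixed points and verifies \(\cos d(q,y)\le \cos d(q,p)\cos d(y,p)\); for the projection onto the full great circle the right angle at the foot point even gives equality, by the spherical Pythagorean identity:

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def dist(u, v):
    # spherical distance on the unit sphere S^2 (a model CAT(1) space)
    return math.acos(max(-1.0, min(1.0, dot(u, v))))

def project_to_geodesic(q, x, y):
    # nearest-point projection of q onto the great circle through x and y
    e1 = x
    e2 = normalize([b - dot(y, x) * a for a, b in zip(x, y)])
    return normalize([dot(q, e1) * a + dot(q, e2) * b for a, b in zip(e1, e2)])

random.seed(0)
x = [1.0, 0.0, 0.0]
y = [0.0, 1.0, 0.0]
m = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]   # midpoint of the arc [x, y]
for _ in range(100):
    q = normalize([c + 0.3 * random.gauss(0.0, 1.0) for c in m])
    p = project_to_geodesic(q, x, y)
    # obtuse-angle property at p: cos d(q,y) <= cos d(q,p) * cos d(y,p)
    assert math.cos(dist(q, y)) <= math.cos(dist(q, p)) * math.cos(dist(p, y)) + 1e-12
```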

We also have the following fact, which ensures that small enough bounded sets are weakly sequentially compact, thus providing a Banach–Alaoglu-type property.

Lemma 2.2

(cf. Lemma 28 in [26]) Let (X, d) be a complete CAT(1)-space. Any sequence \((p_n)_{n\in {\mathbb {N}}}\) of points of X contained in a \(\pi \)-convex set has a subsequence which converges weakly to a point in X.

We will make use of the following result later.

Lemma 2.3

Let (X, d) be a CAT(1) space with \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \). If \((C_i)_{i\in I}\) is a non-increasing family of nonempty closed convex sets in X for an index set I, then \(\cap _{i\in I}C_i\ne \emptyset \).

Proof

See for example [16, Theorem 2.5.] \(\square \)

3 Ekeland principle and Mosco continuity

Throughout this section, let (X, d) be a CAT(1) space. We denote the extended real numbers by \(\overline{{\mathbb {R}}}:={\mathbb {R}} \cup \{-\infty ,\infty \}\).

Definition 3.1

(semi-continuity) A function \(\phi :X\rightarrow \overline{{\mathbb {R}}}\) with its effective domain \(D_\phi := \lbrace y \in X \vert \phi (y) < \infty \rbrace \) is lower semi-continuous (lsc) at \(x_0\in D_\phi \subseteq X\), if

$$\begin{aligned} \liminf _{x\rightarrow x_0} \phi (x)\ge \phi (x_0). \end{aligned}$$

Similarly we define weak lsc at \(x_0\in D_\phi \subseteq X\) if

$$\begin{aligned} \liminf _{x\overset{w}{\rightarrow }\ x_0} \phi (x)\ge \phi (x_0). \end{aligned}$$

A function \(\phi :X\rightarrow \overline{{\mathbb {R}}}\) is called lower semi-continuous on \(D_\phi \) if \(\phi \) is lsc at every point of \(D_\phi \). It is well known that a function \(\phi :X\rightarrow \overline{{\mathbb {R}}}\) is lsc if and only if all of its sublevel sets are closed, which in turn holds if and only if its epigraph is closed.

Definition 3.2

(semi-convexity) A function \(\phi :X\rightarrow \overline{{\mathbb {R}}}\) is semi-convex, more precisely \(\lambda \)-convex for a \(\lambda \in {\mathbb {R}}\), if

$$\begin{aligned} \phi \big ( \gamma (t) \big ) \le (1-t)\phi \big ( \gamma (0) \big ) +t\phi \big ( \gamma (1) \big ) -\frac{\lambda }{2}(1-t)td^2\big (\gamma (0),\gamma (1) \big ) \end{aligned}$$
(3)

along geodesics \(\gamma :[0,1] \rightarrow X\).

More generally \(\phi \) is quasi-convex if its sublevel sets are convex.

Lemma 3.1

(Ohta’s lemma cf. Lemma 5 in [26]) Any geodesic \(\gamma :[0,1]\rightarrow X\) with \(\gamma (0),\gamma (1)\in {\overline{B}}_z((\pi -\epsilon )/2)\) for an \(\epsilon >0\) satisfies

$$\begin{aligned} d^2(\gamma (t),z)\le (1-t)d^2(\gamma (0),z)+td^2(\gamma (1),z)-\frac{\kappa }{2}t(1-t)d^2(\gamma (0),\gamma (1)) \end{aligned}$$

for any \(t\in [0,1]\) and \(\kappa =(\pi -\epsilon )\tan (\epsilon /2)\).

Lemma 3.2

Let \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \). Let \(f:X\rightarrow (-\infty ,\infty ]\) be a convex lsc function. Then f is bounded below.

Proof

Assume to the contrary that \(\inf _Xf=-\infty \). Then the sub-level sets \(X_k:=\{x\in X:f(x)\le -k\}\) for \(k\in {\mathbb {N}}\) are nonempty, bounded, closed and convex. The sequence \(\{X_k\}_{k\in {\mathbb {N}}}\) is non-increasing, therefore by Lemma 2.3 it has nonempty intersection containing a point y. Then by construction \(f(y)=-\infty \), which is a contradiction. \(\square \)

Theorem 3.3

(Theorem 3.5 in [16]) Let \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \). Then closed convex sets are weakly closed.

Lemma 3.4

(Proposition 3.8 in [16]) Let \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \). Let \(f:X\rightarrow (-\infty ,\infty ]\) be a quasiconvex lsc function. Then f is weakly lsc. In particular \(x\mapsto d^2(a,x)\) is weakly lsc on \(B_a(\pi /2)\).

Proof

Assume on the contrary that

$$\begin{aligned} \liminf _{x\overset{w}{\rightarrow }\ x_0}f(x)<f(x_0). \end{aligned}$$

This means that there exist a sequence \(x_n\overset{w}{\rightarrow }\ x_0\), a subsequence \(x_{n_k}\), an index \(k_0\in {\mathbb {N}}\) and a \(\delta >0\) such that \(f(x_{n_k})<f(x_0)-\delta \) for all \(k>k_0\). Using the lsc property and quasiconvexity of f, we get

$$\begin{aligned} f(y)\le f(x_0)-\delta \end{aligned}$$

for all \(y\in \overline{\textrm{co}}\{x_{n_k}:k>k_0\}\). On the other hand, \(x_0\in \overline{\textrm{co}}\{x_{n_k}:k>k_0\}\) by Lemma 3.2 in [16], which yields \(f(x_0)\le f(x_0)-\delta \), a contradiction.

The quasiconvexity of \(d(a,\cdot )\) is Fact 4 in [26], which yields the statement for \(x\mapsto d^2(a,x)\). \(\square \)

We need the following result of Kendall, which assures the existence of jointly convex functions on CAT(1) spaces with convex geometry.

Theorem 3.5

(Yokota’s Theorem A in [26]) Let \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \). There exists a jointly \(\kappa \)-convex lsc function \(\Phi :X\times X\rightarrow [0,\infty )\) vanishing on the diagonal set \(\Delta :=\{(x,x) \,\vert \, x \in X\}\) for some \(\kappa >0\).

Lemma 3.6

(Ekeland principle, cf. [12]) Given \(x_0\in X\) and a lsc function \(f:X\rightarrow (-\infty ,\infty ]\) that is bounded below, there exist \(\alpha ,\beta \ge 0\) such that

$$\begin{aligned} f(x)\ge -\alpha d(x,x_0)-\beta \end{aligned}$$

for all \(x\in X\).

Definition 3.3

(Mosco convergence) A sequence of lsc functions \(f_n:X\rightarrow \overline{{\mathbb {R}}}\) is said to converge to \(f:X\rightarrow \overline{{\mathbb {R}}}\) in the sense of Mosco if, for any \(x\in X\), we have

  1. (M1)

    \(f(x)\le \liminf _{n\rightarrow \infty }f_n(x_n)\) whenever \(x_n\overset{w}{\rightarrow }\ x\),

  2. (M2)

    there exists a sequence \((y_n)\subseteq X\) such that \(y_n\rightarrow x\) and \(f_n(y_n)\rightarrow f(x)\).
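To fix ideas, here is a hypothetical toy example on the real line (not taken from the references), where weak and strong convergence coincide, so (M1) reduces to an ordinary liminf inequality: \(f_n(x)=|x-1/n|\) Mosco converges to \(f(x)=|x|\). A minimal sketch checking (M1) and (M2) numerically:

```python
def f_n(n, x):
    # hypothetical toy sequence on the real line: f_n(x) = |x - 1/n|
    return abs(x - 1.0 / n)

def f(x):
    # the Mosco limit: f(x) = |x|
    return abs(x)

x = 0.5
# (M2): the recovery sequence y_n = x + 1/n satisfies y_n -> x and f_n(y_n) -> f(x)
for n in range(1, 10):
    assert abs(f_n(n, x + 1.0 / n) - f(x)) < 1e-12

# (M1): for any x_n -> x we have liminf f_n(x_n) >= f(x); e.g. for x_n = x - 1/n
# the values f_n(x_n) = |x - 2/n| converge to f(x) from below
assert abs(f_n(1000, x - 1.0 / 1000) - f(x)) < 1e-2
```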

Proposition 3.7

(Ekeland principle for a Mosco convergent sequence of lsc semi-convex functions, bounded case) Let X be a CAT(1)-space with \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \). Given \(x_0\in X\) and a sequence of lsc \(\lambda \)-convex functions \(f_n:X \rightarrow (-\infty ,\infty ]\) that is Mosco converging to \(f:X \rightarrow (-\infty ,\infty ]\), there exist \(\alpha ,\beta \ge 0\) such that

$$\begin{aligned} f_n(x)\ge -\alpha d(x,x_0)-\beta \end{aligned}$$

for all \(x\in X\) and \(n\in {\mathbb {N}}\).

Proof

Assume that the assertion is false; that is, for any \(k\in {\mathbb {N}}\) there exist \(n_k\in {\mathbb {N}}\) and \(x_k\in X\) such that

$$\begin{aligned} f_{n_k}(x_k)+k[d(x_k,x_0)+1]<0. \end{aligned}$$

First assume that \(n_k\rightarrow \infty \) as \(k\rightarrow \infty \). Since we have \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \), by Lemma 2.2 we can select a weakly convergent subsequence from \(x_k\) still denoted by \(x_k\) with weak limit denoted by \({\overline{x}}\in X\). Then by the Mosco convergence of \(f_n\) we have

$$\begin{aligned} \begin{aligned} f({\overline{x}})\le \liminf _{k\rightarrow \infty }f_{n_k}(x_k)&\le \liminf _{k\rightarrow \infty }-k[d(x_k,x_0)+1]\\&= -\limsup _{k\rightarrow \infty }k[d(x_k,x_0)+1]\\&=-\infty \end{aligned} \end{aligned}$$

which is a contradiction.

What is left to check is the case when \(n_k\) remains bounded, that is, after passing to a subsequence, \(n_k=n_{k_0}\) for all \(k\ge k_0\) and some index \(k_0\in {\mathbb {N}}\). Let \(x_*\) denote the circumcenter of X, which exists by Lemma 3.3 in [8] and satisfies \(d(x_*,x)<\frac{\pi }{2}\) uniformly for all \(x\in X\). Consider the lsc function \(g(x):=f_{n_{k_0}}(x)+cd^2(x_*,x)\). Since \(f_{n_{k_0}}\) is \(\lambda \)-convex, we claim that there exists a large enough \(c>0\) so that g is convex. Indeed, using Lemma 3.1 we get

$$\begin{aligned} (1-t)g(\gamma (0))+tg(\gamma (1))-g(\gamma (t))&\ge \left( \frac{\lambda }{2}+c\frac{\kappa }{2}\right) t(1-t)d^2(\gamma (0),\gamma (1))\\&\ge 0 \end{aligned}$$

if \(c>0\) is large enough. Pick such a \(c>0\). Then by Lemma 3.4, \(g:X \rightarrow (-\infty ,\infty ]\) is weakly lsc, thus for a weakly convergent subsequence of \(x_k\), still denoted by \(x_k\), with weak limit \({\overline{x}}\in X\) we get

$$\begin{aligned} \begin{aligned} g({\overline{x}})&\le \liminf _{k\rightarrow \infty }\big [f_{n_{k_0}}(x_k)+cd^2(x_*,x_k)\big ]\\&\le \liminf _{k\rightarrow \infty }\big [-k[d(x_k,x_0)+1]+cd^2(x_*,x_k)\big ]\\&\le -\limsup _{k\rightarrow \infty }k[d(x_k,x_0)+1]+\limsup _{k\rightarrow \infty }cd^2(x_*,x_k)\\&\le -\limsup _{k\rightarrow \infty }k[d(x_k,x_0)+1]+\text {const.}\\&=-\infty \end{aligned} \end{aligned}$$

which is a contradiction. \(\square \)

As a potential function, we always consider a lower semi-continuous function \(\phi :X \rightarrow \overline{{\mathbb {R}}}\). The effective domain of \(\phi \) is defined as

$$\begin{aligned} D(\phi ):=X \setminus \phi ^{-1}(\infty ) \ne \emptyset . \end{aligned}$$

Given \(x \in X\) and \(\tau >0\), we define the Moreau–Yosida approximation:

$$\begin{aligned} \phi _{\tau }(x):=\inf _{z \in X} \left\{ \phi (z) +\frac{d^2(x,z)}{2\tau } \right\} \end{aligned}$$

and the Moreau–Yosida resolvent set

$$\begin{aligned} J_{\tau }^{\phi }(x):=\left\{ z \in X \,\bigg |\, \phi (z) +\frac{d^2(x,z)}{2\tau } =\phi _{\tau }(x) \right\} . \end{aligned}$$

For \(x \in D(\phi )\) and \(z \in J_{\tau }^{\phi }(x)\) (if \(J_{\tau }^{\phi }(x) \ne \emptyset \)), it is straightforward from

$$\begin{aligned} \phi (z)+\frac{d^2(x,z)}{2\tau } \le \phi (x) \end{aligned}$$

that \(\phi (z) \le \phi (x)\) and \(d^2(x,z) \le 2\tau \{ \phi (x)-\phi (z) \}\).
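As a sanity check, \(\phi _\tau \) and a point of \(J_\tau ^\phi \) can be computed by brute-force minimization for a hypothetical potential on an interval of the real line (our own example, not one from the paper), and the two elementary inequalities above verified directly:

```python
def phi(z):
    # hypothetical lsc potential on the real line, bounded below
    return (z - 1.0) ** 2 + abs(z)

grid = [i / 1000.0 for i in range(-2000, 2001)]   # the interval [-2, 2] discretized

def moreau_yosida(x, tau):
    # brute-force minimizer z of phi(z) + d^2(x,z)/(2*tau) over the grid,
    # returning (phi_tau(x), z) with z in (a discretization of) J_tau^phi(x)
    best = min(grid, key=lambda z: phi(z) + (x - z) ** 2 / (2 * tau))
    return phi(best) + (x - best) ** 2 / (2 * tau), best

x, tau = -1.0, 0.5
phi_tau, z = moreau_yosida(x, tau)
assert phi_tau <= phi(x)                            # take z = x in the infimum
assert phi(z) <= phi(x)                             # first elementary inequality
assert (x - z) ** 2 <= 2 * tau * (phi(x) - phi(z)) + 1e-9   # second inequality
```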

We can consider two kinds of conditions on \(\phi \):

Assumption 1

  1. (1)

    There exists \(\tau _*(\phi ) \in (0,\infty ]\) such that \(\phi _{\tau }(x)>-\infty \) and \(J_{\tau }^{\phi }(x) \ne \emptyset \) for all \(x \in X\) and \(\tau \in (0,\tau _*(\phi ))\) (coercivity).

  2. (2)

    For any \(Q \in {\mathbb {R}}\), bounded subsets of the sub-level set \(\{ x \in X \,|\, \phi (x) \le Q \}\) are relatively compact in X (compactness).

We remark that, if \(\phi _{\tau _*}(x_*)>-\infty \) for some \(x_* \in X\) and \(\tau _*>0\), then \(\phi _{\tau }(x)>-\infty \) for every \(x \in X\) and \(\tau \in (0,\tau _*)\) (see [2, Lemma 2.2.1]). Then, if the compactness (2) holds, we have \(J_{\tau }^{\phi }(x) \ne \emptyset \) by the lower semi-continuity of \(\phi \) (see [2, Corollary 2.2.2]).

Remark 3.1

If \({{\,\mathrm{\textrm{diam}}\,}}(X)<\infty \) and the compactness (2) holds, then the lower semi-continuity of \(\phi \) implies that every sub-level set \(\{ x \in X \,|\, \phi (x) \le Q \}\) is (empty or) compact. Thus \(\phi \) is bounded below and we can take \(\tau _*(\phi )=\infty \), in particular, (1) holds.

The following remark assures the nonemptiness and uniqueness of \(J_{\tau }^{\phi }(z)\):

Remark 3.2

For \(\lambda \)-convex \(\phi \), by Proposition 3.26 in [18], \(J_{\tau }^{\phi }(z)\) is a single point in the same neighborhoods of z as in Lemma 3.1, provided \(K+2\lambda \tau >0\), where \(K>0\). The K-convexity of \(x\mapsto d^2(z,x)\) holds under \({{\,\mathrm{\textrm{diam}}\,}}(X) < \frac{\pi }{2}-\epsilon \) with \(K:= (\pi - 2\epsilon )\tan (\epsilon )\).

The following is one of the main results of this section.

Theorem 3.8

Let \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \) and let \(f_n:X \rightarrow (-\infty ,\infty ]\) be a sequence of lsc \(\lambda \)-convex functions Mosco converging to \(f:X \rightarrow (-\infty ,\infty ]\). Then

$$\begin{aligned} \lim _{n\rightarrow \infty }(f_n)_\tau (x)=f_\tau (x) \end{aligned}$$

and

$$\begin{aligned} \lim _{n\rightarrow \infty }J^{f_n}_\tau (x)=J^{f}_\tau (x) \end{aligned}$$

for any small enough \(\tau >0\) and \(x\in D(f)\).

Proof

From Proposition 3.7 we have for some \(\alpha , \beta \ge 0\):

$$\begin{aligned} f_n(J^{f_n}_\tau (x))\ge -\alpha d(J^{f_n}_\tau (x),x)-\beta . \end{aligned}$$

From the definition of \(J^{f_n}_\tau (x)\), taking \(y_n \rightarrow x\) as in (M2), we have

$$\begin{aligned} f_n(y_n)+\frac{1}{2\tau }d^2(x,y_n)\ge f_n(J^{f_n}_\tau (x))+\frac{1}{2\tau }d^2(x,J^{f_n}_\tau (x)) \end{aligned}$$

which combined with the above estimate yields

$$\begin{aligned} f_n(y_n)+\frac{1}{2\tau }d^2(x,y_n)+\alpha d(J^{f_n}_\tau (x),x)+\beta \ge \frac{1}{2\tau }d^2(x,J^{f_n}_\tau (x)). \end{aligned}$$

That is,

$$\begin{aligned} 0\ge d^2(x,J^{f_n}_\tau (x))-2\tau \alpha d(J^{f_n}_\tau (x),x)-2\tau (f_n(y_n)+\beta )-d^2(x,y_n). \end{aligned}$$

Since \(d^2(x,y_n)\rightarrow 0\) and \(f_n(y_n)\rightarrow f(x)<\infty \) by (M2), for small enough \(\tau >0\) the above forces

$$\begin{aligned} d(x,J^{f_n}_\tau (x))<\epsilon \end{aligned}$$
(4)

for all large enough \(n\in {\mathbb {N}}\), given an arbitrary \(\epsilon \le \frac{\pi }{2}\).

Now we are in a position to prove \(\lim _{n\rightarrow \infty }J^{f_n}_\tau (x)=J^{f}_\tau (x)\). Suppose to the contrary that there exists a subsequence \(j_k\in J^{f_{n_k}}_\tau (x)\) such that \(d(j_k,J^{f}_\tau (x))\) does not converge to 0. Since \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \), there exists a subsequence of \(\{j_k\}_{k\in {\mathbb {N}}}\), still denoted by \(j_k\), with weak limit \(c\in X\).

Since \(f_n\rightarrow f\) in the sense of Mosco, there exists a sequence \(y_n\rightarrow J^{f}_\tau (x)\) with \(f_n(y_n)\rightarrow f(J^{f}_\tau (x))\). Then, using the weak lsc property of \(d^2(x,\cdot )\), we get

$$\begin{aligned} \begin{aligned} \limsup _{k\rightarrow \infty }(f_{n_k})_\tau (x)&\le \limsup _{k\rightarrow \infty }f_{n_k}(y_{n_k})+\frac{1}{2\tau }d^2(x,y_{n_k})\\&=f(J^{f}_\tau (x))+\frac{1}{2\tau }d^2(x,J^{f}_\tau (x))\\&\le f(c)+\frac{1}{2\tau }d^2(x,c)\\&\le \liminf _{k\rightarrow \infty }f_{n_k}(j_k)+\frac{1}{2\tau }d^2(x,j_k)\\&=\liminf _{k\rightarrow \infty }f_{n_k}(J^{f_{n_k}}_\tau (x))+\frac{1}{2\tau }d^2(x,J^{f_{n_k}}_\tau (x))\\&=\liminf _{k\rightarrow \infty }(f_{n_k})_\tau (x) \end{aligned} \end{aligned}$$
(5)

which yields \(c\in J^{f}_\tau (x)\). Furthermore using (M1) and (M2) with \(z_n\rightarrow c\) and \(f_n(z_n)\rightarrow f(c)\) we have

$$\begin{aligned} \limsup _{k\rightarrow \infty }\frac{1}{2\tau }d^2(x,j_k)&\le \limsup _{k\rightarrow \infty }-f_{n_k}(j_k)+f_{n_k}(z_{n_k})+\frac{1}{2\tau }d^2(x,z_{n_k})\\&=-\liminf _{k\rightarrow \infty }f_{n_k}(j_k)+f(c)+\frac{1}{2\tau }d^2(x,c)\\&\le -f(c)+f(c)+\frac{1}{2\tau }d^2(x,c)\\&=\frac{1}{2\tau }d^2(x,c)\\&\le \liminf _{k\rightarrow \infty }\frac{1}{2\tau }d^2(x,j_k) \end{aligned}$$

which together with (4) and Lemma 2.1 proves the strong convergence

$$\begin{aligned} j_k\rightarrow c \end{aligned}$$

which contradicts the assumption that \(d(j_k,J^{f}_\tau (x))\) does not converge to 0. Then using (5) we get

$$\begin{aligned} \lim _{n\rightarrow \infty }(f_n)_\tau (x)&=\lim _{n\rightarrow \infty }f_n(J^{f_n}_\tau (x))+\frac{1}{2\tau }d^2(x,J^{f_n}_\tau (x))\\&=f(J^{f}_\tau (x))+\frac{1}{2\tau }d^2(x,J^{f}_\tau (x))=f_\tau (x) \end{aligned}$$

finishing the proof of the remaining part of the assertion. \(\square \)

Remark 3.3

If \(J^{f}_\tau (x)\) is not a single point in Theorem 3.8 above, then it still follows from the same proof that all weak cluster points of \(J^{f_n}_\tau (x)\) are in fact strong cluster points and belong to \(J^{f}_\tau (x)\).

The following is a variant of the above result using different assumptions.

Theorem 3.9

Let \(f_n:X \rightarrow (-\infty ,\infty ]\) be a sequence of L-Lipschitz functions that is Mosco converging to \(f:X \rightarrow \overline{{\mathbb {R}}}\). Then

$$\begin{aligned} \lim _{n\rightarrow \infty }(f_n)_\tau (x)=f_\tau (x) \end{aligned}$$

and

$$\begin{aligned} \lim _{n\rightarrow \infty }J^{f_n}_\tau (x)=J^{f}_\tau (x) \end{aligned}$$

for any small enough \(\tau >0\).

Proof

From the definition of \(J^{f_n}_\tau (x)\), we have

$$\begin{aligned} f_n(x)\ge f_n(J^{f_n}_\tau (x))+\frac{1}{2\tau }d^2(x,J^{f_n}_\tau (x)) \end{aligned}$$

which combined with the L-Lipschitz property yields

$$\begin{aligned} d^2(x,J^{f_n}_\tau (x))&\le 2\tau [f_n(x)-f_n(J^{f_n}_\tau (x))],\\ d(x,J^{f_n}_\tau (x))&\le 2\tau \frac{f_n(x)-f_n(J^{f_n}_\tau (x))}{d(x,J^{f_n}_\tau (x))}\\&\le 2\tau L. \end{aligned}$$

From here for small enough \(\tau >0\) we obtain that

$$\begin{aligned} d(x,J^{f_n}_\tau (x))<\epsilon \end{aligned}$$

for all large enough \(n\in {\mathbb {N}}\), given an arbitrary \(\epsilon \le \frac{\pi }{2}\). We can follow the proof of the previous Theorem 3.8 from (5) onward to obtain our assertions. \(\square \)
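The bound \(d(x,J^{f_n}_\tau (x))\le 2\tau L\) can be checked directly in a model case: on the real line the resolvent of the L-Lipschitz function \(f(x)=L|x|\) is soft thresholding (a standard fact, used here only as an assumed illustrative example):

```python
def resolvent(x, tau, L):
    # resolvent of f(x) = L*|x| on the real line: soft thresholding by tau*L
    if x > tau * L:
        return x - tau * L
    if x < -tau * L:
        return x + tau * L
    return 0.0

L, tau = 3.0, 0.1
for x in [-5.0, -0.2, 0.0, 0.15, 4.0]:
    # the displacement is min(tau*L, |x|), hence at most 2*tau*L
    assert abs(x - resolvent(x, tau, L)) <= 2 * tau * L
```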

The following is yet another variant of the above; it generalizes Proposition 5.12 in [17], which proves this in CAT(0)-spaces.

Theorem 3.10

Let \(f_n:X \rightarrow (-\infty ,\infty ]\) be a uniformly lower bounded sequence of lsc functions that is Mosco converging to \(f:X \rightarrow (-\infty ,\infty ]\). Then

$$\begin{aligned} \lim _{n\rightarrow \infty }(f_n)_\tau (x)=f_\tau (x) \end{aligned}$$

and

$$\begin{aligned} \lim _{n\rightarrow \infty }J^{f_n}_\tau (x)=J^{f}_\tau (x) \end{aligned}$$

for any small enough \(\tau >0\) and \(x\in D(f)\).

Proof

Without loss of generality we may assume the uniform lower bound \(f_n,f\ge 0\). Since \(f\not \equiv +\infty \), we have \(f_\tau (x)<+\infty \) and thus \(f(J^{f}_\tau (x))<+\infty \). By (M2) there exists a sequence \(\{y_n\}_{n\in {\mathbb {N}}}\subseteq X\) such that \(y_n\rightarrow x\) and \(f_n(y_n)\rightarrow f(x)\). Then by definition we have

$$\begin{aligned} f_n(y_n)+\frac{1}{2\tau }d^2(x,y_n)\ge f_n(J^{f_n}_\tau (x))+\frac{1}{2\tau }d^2(x,J^{f_n}_\tau (x)), \end{aligned}$$

which yields

$$\begin{aligned} \begin{aligned} d^2(x,J^{f_n}_\tau (x))&\le 2\tau [f_n(y_n)-f_n(J^{f_n}_\tau (x))]+d^2(x,y_n)\\&\le 2\tau f_n(y_n)+d^2(x,y_n)\rightarrow 2\tau f(x) \end{aligned} \end{aligned}$$

as \(n\rightarrow \infty \). Choosing small enough \(\tau >0\) this eventually yields that

$$\begin{aligned} d(x,J^{f_n}_\tau (x))<\epsilon \end{aligned}$$

for all large enough \(n\in {\mathbb {N}}\), given an arbitrary \(\epsilon \le \frac{\pi }{2}\). We can follow the proof of the previous Theorem 3.8 from (5) onward to obtain our assertions. \(\square \)

To construct discrete approximations of gradient curves of \(\phi \), we consider a partition of the interval \([0,\infty )\):

$$\begin{aligned} {\mathscr {P}}_{\tau }=\{0=t^0_{\tau }<t^1_{\tau } <\cdots \}, \qquad \lim _{k \rightarrow \infty } t^k_{\tau } =\infty , \end{aligned}$$

and set

$$\begin{aligned} \tau _k:=t^k_{\tau }-t^{k-1}_{\tau } \quad \text {for }k \in {\mathbb {N}},\qquad |\tau |:=\sup _{k \in {\mathbb {N}}} \tau _k. \end{aligned}$$

We will always assume \(|\tau |<\tau _*(\phi )\). Given an initial point \(x_0 \in D(\phi )\),

$$\begin{aligned} x_{\tau }^0:=x_0\text { and recursively choose }x_{\tau }^k \in J_{\tau _k}^{\phi }(x_{\tau }^{k-1})\text { for each }k \in {\mathbb {N}}. \end{aligned}$$
(6)

We call \(\{x_{\tau }^k\}_{k \in {\mathbb {N}}}\) a discrete solution of the variational scheme (6) associated with the partition \({\mathscr {P}}_{\tau }\), which is thought of as a discrete-time gradient curve for the potential function \(\phi \). The following a priori estimates (see [2, Lemma 3.2.2]) will be useful in the sequel. We remark that these estimates are easily obtained if \(\phi \) is bounded below.
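A minimal sketch of the variational scheme (6) in a model case (our own hypothetical example, not one from the paper): on the real line with the potential \(\phi (x)=|x|\) the resolvent is soft thresholding, and the discrete solution reproduces the gradient curve \(\xi (t)=\max \{|x_0|-t,0\}\) (for \(x_0>0\)) exactly at the partition times:

```python
def resolvent_abs(x, tau):
    # resolvent of phi(x) = |x| on the real line: soft thresholding by tau
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def discrete_solution(x0, partition):
    # variational scheme (6): x_tau^k in J_{tau_k}^phi(x_tau^{k-1})
    xs = [x0]
    for k in range(1, len(partition)):
        xs.append(resolvent_abs(xs[-1], partition[k] - partition[k - 1]))
    return xs

# uniform partition of [0, 2]; the gradient curve from x0 = 1.5 is max(1.5 - t, 0)
partition = [2.0 * k / 100 for k in range(101)]
xs = discrete_solution(1.5, partition)
assert abs(xs[25] - 1.0) < 1e-9   # at t = 0.5 the curve is 1.5 - 0.5 = 1.0
assert xs[-1] == 0.0              # at t = 2.0 the curve is absorbed at the minimizer
```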

Lemma 3.11

(A priori estimates) Let \(\phi :X \rightarrow (-\infty ,\infty ]\) satisfy Assumption 1(1). Then, for any \(x_* \in X\) and \(Q,T>0\), there exists a constant \(C=C(x_*,\tau _*(\phi ),Q,T)>0\) such that, if a partition \({\mathscr {P}}_{\tau }\) and an associated discrete solution \(\{x_{\tau }^k\}_{k \in {\mathbb {N}}}\) of (6) satisfy

$$\begin{aligned} \phi (x_0) \le Q,\quad d^2(x_0,x_*) \le Q,\quad t_{\tau }^N \le T,\quad |\tau | \le \frac{\tau _*(\phi )}{8}, \end{aligned}$$

then we have for any \(1 \le k \le N\)

$$\begin{aligned} d^2(x_{\tau }^k,x_*) \le C, \qquad \sum _{l=1}^k \frac{d^2(x_{\tau }^{l-1},x_{\tau }^l)}{2\tau _l} \le \phi (x_0)-\phi (x_{\tau }^k) \le C. \end{aligned}$$

In particular, for all \(1 \le k \le N\), we have \(d^2(x_{\tau }^{k-1},x_{\tau }^k) \le 2C \tau _k\) and

$$\begin{aligned} d^2(x_0,x_{\tau }^k) \le \left( \sum _{l=1}^k d(x_{\tau }^{l-1},x_{\tau }^l) \right) ^2 \le \sum _{l=1}^k \frac{d^2(x_{\tau }^{l-1},x_{\tau }^l)}{\tau _l} \cdot \sum _{l=1}^k \tau _l \le 2C t^k_{\tau }. \end{aligned}$$
(7)

The following result ensures the existence of the gradient curve.

Theorem 3.12

(gradient curve, cf. [25]) Fix an initial point \(x_0 \in D(\phi )\) and consider discrete solutions \(\{ x_{\tau _i}^k \}_{k \in {\mathbb {N}}}\) with \(x_{\tau _i}^0=x_0\) associated with a sequence of partitions \(\{{\mathscr {P}}_{\tau _i}\}_{i \in {\mathbb {N}}}\) such that \(\lim _{i \rightarrow \infty }|\tau _i|=0\). Then the piecewise interpolated curve \({\bar{x}}_{\tau _i}:[0,\infty ) \rightarrow X\) converges to a curve \(\xi :[0,\infty ) \rightarrow X\) with \(\xi (0)=x_0\) as \(i \rightarrow \infty \), uniformly on each bounded interval [0, T]. In particular, the limit curve \(\xi \) depends neither on the choice of the sequence of partitions nor on that of the discrete solutions.

Theorem 3.13

(Error Estimate, cf. [25]) Let \(\lambda \le 0\), let \(\phi :X \rightarrow (-\infty ,\infty ]\) be lsc and \(\lambda \)-convex, and let \(|\tau |<\tau _*(\phi )\). Fix \(x_0\in D(\phi )\) and let \(\xi (t):={\mathcal {G}}(t,x_0)\) denote the gradient flow of \(\phi \). Then we have

$$\begin{aligned} {d}^2\big ({{\bar{x}}}_{\tau }(t),\xi (t) \big ) \le e^{-2\lambda t} \left( \sqrt{1-K'-\frac{2\lambda }{3} |\tau |} +\sqrt{-\frac{4\lambda }{3} |\tau |} \right) ^2 |\tau | \left\{ \phi (x_0)-\phi \big ( {\bar{x}}_{\tau }(t) \big ) \right\} \end{aligned}$$
(8)

for all \(t>0\), where \(K':=\min \{0,K\}\).

Thanks to the a priori estimate (Lemma 3.11), we have

$$\begin{aligned} \max \{ \phi (x_0)-\phi (x_{\tau _i}^k),\phi (x_0)-\phi (x_{\tau _j}^l) \} \le C=C(x_0,\tau _*(\phi ),\phi (x_0),T). \end{aligned}$$

Theorem 3.14

(Contraction property, cf. [25]) Let \(\phi \) be a lower semi-continuous \(\lambda \)-convex function. Take \(x_0,y_0 \in D(\phi )\) and put \(\xi (t):={\mathcal {G}}(t,x_0)\) and \(\zeta (t):={\mathcal {G}}(t,y_0)\). Then we have, for any \(t>0\),

$$\begin{aligned} d\big ( \xi (t),\zeta (t) \big ) \le e^{-\lambda t} d(x_0,y_0). \end{aligned}$$
(9)
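In the model case \(\phi (x)=x^2/2\) on the real line (a hypothetical 1-convex example of ours, so \(\lambda =1\)), the flow is \(\xi (t)=x_0e^{-t}\) and the contraction estimate (9) holds with equality, which the following sketch confirms numerically:

```python
import math

def flow(x0, t):
    # gradient flow of the 1-convex function phi(x) = x^2/2 on the real line
    return x0 * math.exp(-t)

x0, y0, lam = 2.0, -1.0, 1.0
for t in [0.1, 0.7, 3.0]:
    lhs = abs(flow(x0, t) - flow(y0, t))
    rhs = math.exp(-lam * t) * abs(x0 - y0)
    # contraction estimate (9); equality holds in this linear model case
    assert lhs <= rhs + 1e-12
```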

The following inequality is proved in Lemma 3.34 in [18] for all \(K+2\lambda \tau >0\), where \(K>0\) is the constant of K-convexity of \(x\mapsto d^2(a,x)\). The K-convexity of \(x\mapsto d^2(a,x)\) holds under \({{\,\mathrm{\textrm{diam}}\,}}(X) < \frac{\pi }{2}-\epsilon \) with \(K:= (\pi - 2\epsilon )\tan (\epsilon )\). We have \(K>0\) for example on \({\overline{B}}_a((\pi -\epsilon )/2)\) by Lemma 3.1, thus \(K>0\) for any \(a\in {\overline{B}}_z((\pi -\epsilon )/4)\) with z fixed, and then

$$\begin{aligned} d(J^{f}_\tau (x),J^{f}_\tau (y))\le 4\frac{2d(x,y)+d(x,J^{f}_\tau (y))+d(J^{f}_\tau (x),y)}{K+2\lambda \tau }d(x,y) \end{aligned}$$
(10)

by Lemma 3.34 in [18] for any \(x,y\in {\overline{B}}_z((\pi -\epsilon )/4)\) and fixed z.

Now we are in a position to prove the main result of this section, regarding the continuity of gradient flows under Mosco convergence. The proof also works in the same way in CAT(0)-spaces, providing a new proof of this result in that setting.

Theorem 3.15

Let (X, d) be a CAT(1) space. Assume that \(f_n:X \rightarrow (-\infty ,\infty ]\) is a sequence of lsc semi-convex functions Mosco converging to \(f:X \rightarrow (-\infty ,\infty ]\). Let \(J^{f}_\tau (x)\) and \(J^{f_n}_\tau (x)\) be the corresponding resolvents, so that for all small enough \(\tau >0\)

$$\begin{aligned} J^{f_n}_\tau (x)\rightarrow J^{f}_\tau (x) \end{aligned}$$

as established earlier under the extra assumptions imposed by Theorem 3.8, 3.9 or 3.10, and let \(S_t(x)\) and \(S^n_t(x)\) denote the corresponding gradient flows. Then

$$\begin{aligned} \lim _{n\rightarrow \infty }S^n_t(x)=S_t(x) \end{aligned}$$

for any \(x\in D(f)\).

Proof

By (M2) there exists a sequence \(y_n\rightarrow x\) with \(f_n(y_n)\rightarrow f(x)\). Then

$$\begin{aligned} \begin{aligned} d(S_t(x),S^n_t(x))&\le d(S_t(x),S^n_t(y_n))+d(S^n_t(y_n),S^n_t(x))\\&\le d(S_t(x),S^n_t(y_n))+e^{-\lambda t}d(y_n,x), \end{aligned} \end{aligned}$$
(11)

furthermore

$$\begin{aligned} \begin{aligned}&d(S_t(x),S^n_t(y_n))\le d\left( S_t(x),\left( J_{t/k}^{f}\right) ^{k}(x)\right) \\&\qquad +d\left( \left( J_{t/k}^{f}\right) ^{k}(x),S^n_t(y_n)\right) \\&\quad \le d\left( S_t(x),\left( J_{t/k}^{f}\right) ^{k}(x)\right) +d\left( \left( J_{t/k}^{f}\right) ^{k}(x),\left( J_{t/k}^{f_n}\right) ^{k}(y_n)\right) \\&\qquad +d\left( \left( J_{t/k}^{f_n}\right) ^{k}(y_n),S^n_t(y_n)\right) . \end{aligned} \end{aligned}$$
(12)

To estimate the second term of (12) we argue as

$$\begin{aligned} \begin{aligned}&d\left( \left( J_{t/k}^{f}\right) ^{k}(x),\left( J_{t/k}^{f_n}\right) ^{k}(y_n)\right) \ \\&\le d\left( J_{t/k}^{f}\left( \left( J_{t/k}^{f}\right) ^{k-1}(x)\right) ,J_{t/k}^{f_n}\left( \left( J_{t/k}^{f}\right) ^{k-1}(x)\right) \right) \\&\qquad +d\left( J_{t/k}^{f_n}\left( \left( J_{t/k}^{f}\right) ^{k-1}(x)\right) ,J_{t/k}^{f_n}\left( \left( J_{t/k}^{f_n}\right) ^{k-1}(x)\right) \right) \\&\qquad +d\left( J_{t/k}^{f_n}\left( \left( J_{t/k}^{f_n}\right) ^{k-1}(x)\right) ,\left( J_{t/k}^{f_n}\right) ^{k}(y_n)\right) \\&=d\left( J_{t/k}^{f}\left( \left( J_{t/k}^{f}\right) ^{k-1}(x)\right) ,J_{t/k}^{f_n}\left( \left( J_{t/k}^{f}\right) ^{k-1}(x)\right) \right) \\&\qquad +d\left( J_{t/k}^{f_n}\left( \left( J_{t/k}^{f}\right) ^{k-1}(x)\right) ,J_{t/k}^{f_n}\left( \left( J_{t/k}^{f_n}\right) ^{k-1}(x)\right) \right) \\&\qquad +d\left( \left( J_{t/k}^{f_n}\right) ^{k}(x),\left( J_{t/k}^{f_n}\right) ^{k}(y_n)\right) . \end{aligned} \end{aligned}$$
(13)

First notice that (4) is established in all of Theorems 3.8, 3.9 and 3.10. Then (10) provides a Lipschitz estimate for resolvents in CAT(1)-spaces on balls of small enough radius, while (4) ensures that the Lipschitz constants stay bounded. Applying (13) recursively in k, we obtain an upper bound in which the already established pointwise convergence of resolvents guarantees that the second term of (12) tends to 0 for each fixed k as \(n\rightarrow \infty \).

We continue by estimating each of the remaining terms on the right hand side of (12). By (8) with \(C_0:=\sqrt{1-K'-\frac{2\lambda }{3} |\tau |} +\sqrt{-\frac{4\lambda }{3} |\tau |}\) and \(x_{t/k}^k:=J_{t/k}^{f}\left( x_{t/k}^{k-1}\right) \), \(x_{t/k}^{0}:=x\) we have that

$$\begin{aligned} d\left( S_t(x),\left( J_{t/k}^{f}\right) ^{k}(x)\right) \le e^{-\lambda t}C_0\sqrt{\frac{t}{k}}\sqrt{f(x)-f(x_{t/k}^k)} \end{aligned}$$
(14)

and similarly with \(y_{t/k}^k:=J_{t/k}^{f_n}\left( y_{t/k}^{k-1}\right) \), \(y_{t/k}^{0}:=y_n\) we also have that

$$\begin{aligned} \begin{aligned} d\left( \left( J_{t/k}^{f_n}\right) ^{k}(y_n),S^n_t(y_n)\right)&\le e^{-\lambda t}C_0\sqrt{\frac{t}{k}}\sqrt{f_n(y_n)-f_n(y_{t/k}^k)}\\&\le e^{-\lambda t}C_0\sqrt{\frac{t}{k}}\sqrt{f(x)-f(x_{t/k}^k)+\epsilon _n}, \end{aligned} \end{aligned}$$
(15)

where to obtain the second inequality for fixed k we used (M2) for \(f_n(y_n)\rightarrow f(x)\), and (M1) for \(f(x_{t/k}^k)\le \liminf _{n\rightarrow \infty }f_n(y_{t/k}^k)\) where \(y_{t/k}^k\rightarrow x_{t/k}^k\) as \(n\rightarrow \infty \) by (13), thus the remainder \(\epsilon _n\rightarrow 0\).

Combining these estimates with the a priori estimate of Lemma 3.11 for f, given \(\epsilon >0\) we first choose k large enough and then N large enough so that the right hand side of (11) is less than \(\epsilon \) for all \(n>N\). \(\square \)
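The mechanism of this proof can be observed in a minimal numerical sketch, again in a hypothetical one-dimensional Euclidean model: for \(f(y)=y^2/2\) the iterated resolvents of (1) converge to the gradient flow \(S_t(x_0)=e^{-t}x_0\), and for the Mosco-converging perturbations \(f_n(y)=(y-1/n)^2/2\) the perturbed flows approach the unperturbed one, as the theorem asserts. The closed form used for the resolvent is specific to this toy quadratic family.

```python
import math

def iterate_resolvent(a, x0, t, n):
    """n resolvent steps of size t/n for f(y) = (y - a)^2 / 2; in this
    1-D model the resolvent is J_tau(x) = (x + tau*a) / (1 + tau)."""
    tau, x = t / n, x0
    for _ in range(n):
        x = (x + tau * a) / (1 + tau)
    return x

t, x0 = 1.0, 3.0
exact = math.exp(-t) * x0                       # gradient flow of y^2/2: S_t(x0) = e^{-t} x0
approx = iterate_resolvent(0.0, x0, t, 10_000)  # (J_{t/n})^n (x0) as in (1)
assert abs(approx - exact) < 1e-3

# f_n(y) = (y - 1/n)^2 / 2 Mosco-converges to f(y) = y^2 / 2 as n grows
flows = [iterate_resolvent(1.0 / n, x0, t, 10_000) for n in (1, 10, 100, 1000)]
assert abs(flows[-1] - approx) < 1e-2           # perturbed flows approach the limit flow
```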

4 Sequences of spaces with an asymptotic relation

Let \((X_i,d_i), (X,d)\) be complete \(\textrm{CAT}(1)\)-spaces. Define

$$\begin{aligned} {\mathcal {X}}:=X\sqcup \left( \bigsqcup _{i}X_i\right) \end{aligned}$$

as a disjoint union.

Definition 4.1

(Asymptotic relation, cf. [17]) We call a topology on \({\mathcal {X}}\) an asymptotic relation between \(X_i\) and X if

  1. (1)

    \(X_i\) and X are all closed in \({\mathcal {X}}\), and the restricted topology of \({\mathcal {X}}\) on each of \(X_i\) and X coincides with its original topology;

  2. (2)

    for any \(x\in X\) there exists a net \(x_i\in X_{i}\) converging to x in \({\mathcal {X}}\);

  3. (3)

    if \(X_i\ni x_i\rightarrow x\in X\) and \(X_i\ni y_i\rightarrow y\in X\) in \({\mathcal {X}}\), then we have \(d_{i}(x_i,y_i)\rightarrow d(x,y)\);

  4. (4)

    if \(X_i\ni x_i\rightarrow x\in X\) and \(y_i\in X_i\) is a net with \(d_i(x_i,y_i)\rightarrow 0\), then \(y_i\rightarrow x\) in \({\mathcal {X}}\).

In this section we assume that \(X_i\) and X have an asymptotic relation. The following definition can be found in [17] which generalizes weak convergence to an asymptotic relation in the \(\textrm{CAT}(0)\) setting. We modify it suitably for our \(\textrm{CAT}(1)\)-setting.

Definition 4.2

(Weak convergence) A net \(x_i\in X_i\) weakly converges to a point \(x\in X\), written \(x_i\overset{w}{\rightarrow }\ x\), if for every net of \(\pi \)-convex geodesic segments \(\gamma _i\subseteq X_i\) strongly converging to a \(\pi \)-convex geodesic segment \(\gamma \subseteq X\) with \(\gamma (0)=x\) and \(d_i(\gamma _i,x_i):=\inf _{z\in \gamma _i}d_i(z,x_i)<\pi /2\), the projections \(P_{\gamma _i}(x_i)\) strongly converge to x. Strong convergence of \(\gamma _i\) to \(\gamma \) means that \(\gamma _i(t)\rightarrow \gamma (t)\) for each \(t\in D(\gamma )\).

Similarly to the case of a single \(\textrm{CAT}(1)\)-space it is easy to prove that strong convergence implies weak convergence and that weak limit points are unique in small enough metric balls.

Let \(D\subset M_{\kappa }^2\) be a closed Jordan domain whose boundary has finite length, and let X be a metric space. A map \(f:D\rightarrow X\) is majorizing if it is 1-Lipschitz and its restriction to the boundary \(\partial D\) is length-preserving. If \(\Gamma \) is a closed curve in X, we say that D majorizes \(\Gamma \) if there is a majorizing map \(f:D\rightarrow X\) such that the restriction \(f|_{\partial D}\) traces out \(\Gamma \). See Section 8.12 of [1] for the following result.

Theorem 4.1

(Reshetnyak’s majorization theorem) For any closed curve \(\Gamma \) in a \(\textrm{CAT}(\kappa )\) space (of length at most \(R_\kappa :=\frac{\pi }{\sqrt{\kappa }}\) when \(\kappa >0\)), there exists a convex region D in \(M_{\kappa }^2\), and an associated map f such that D majorizes \(\Gamma \) under f.

The following three results appeared in [17] as Lemma 5 (3–5) in the \(\textrm{CAT}(0)\)-setting and have also been mentioned in [19] without proofs in the \(\textrm{CAT}(1)\)-case. We prove them below in the \(\textrm{CAT}(1)\)-setting.

Lemma 4.2

Let \(x_i,y_i\in X_i\) and \(x,y\in X\) such that \(d_i(x_i,y_i),d(y,x)<\pi /2\) for all i. Assume \(x_i\overset{w}{\rightarrow } x\) and \(y_i\rightarrow y\). Then we have

(1):

\(d(x,y)\le \liminf _{i}{d_i(x_i,y_i)}\),

(2):

\(d_i(x_i,y_i)\rightarrow d(x,y)\) if and only if \(x_i\rightarrow x\).

Proof

(1) :  Take a net \({\hat{x}}_i\rightarrow x\) such that all geodesic segments \([{\hat{x}}_i,y_i]\) are \(\pi \)-convex. By the obtuse-angle property the spherical law of cosines gives

$$\begin{aligned} \begin{aligned}&\cos d_i(x_i,y_i)\le \cos d_i(x_i,P_{[{\hat{x}}_i,y_i]}(x_i))\cos d_i(y_i,P_{[{\hat{x}}_i,y_i]}(x_i))\\&\qquad +\sin d_i(x_i,P_{[{\hat{x}}_i,y_i]}(x_i))\sin d_i(y_i,P_{[{\hat{x}}_i,y_i]}(x_i))\cos \angle _{P_{[{\hat{x}}_i,y_i]}(x_i)}(x_i,y_i)\\&\quad \le \cos d_i(x_i,P_{[{\hat{x}}_i,y_i]}(x_i))\cos d_i(y_i,P_{[{\hat{x}}_i,y_i]}(x_i))\\&\quad \le \cos d_i(y_i,P_{[{\hat{x}}_i,y_i]}(x_i)), \end{aligned} \end{aligned}$$

which implies

$$\begin{aligned} d_i(x_i,y_i)\ge d_i(y_i,P_{[{\hat{x}}_i,y_i]}(x_i)), \end{aligned}$$

thus

$$\begin{aligned} \liminf _{i}d_i(x_i,y_i)\ge \liminf _{i}d_i(y_i,P_{[{\hat{x}}_i,y_i]}(x_i))=d(y,x). \end{aligned}$$

(2) :  The implication "\(\Leftarrow \)" is obvious. We shall prove "\(\Rightarrow \)". The assumption forces the inequalities in the proof of (1) to become equalities in the limit, so \(\cos d_i(x_i,P_{[{\hat{x}}_i,y_i]}(x_i))\rightarrow 1\), that is, \(d_i(x_i,P_{[{\hat{x}}_i,y_i]}(x_i))\rightarrow 0\). Since \(P_{[{\hat{x}}_i,y_i]}(x_i)\rightarrow x\), this implies \(x_i\rightarrow x\). \(\square \)
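The comparison step in the proof of (1) can be illustrated on the model sphere \(S^2\): when the angle at the projection point is exactly \(\pi /2\), the spherical law of cosines reduces to the spherical Pythagorean theorem \(\cos c=\cos a\cos b\), so the hypotenuse dominates each leg, which is precisely the inequality \(d_i(x_i,y_i)\ge d_i(y_i,P_{[{\hat{x}}_i,y_i]}(x_i))\) used above. A small numerical check (illustrative only; the base point and side lengths are arbitrary choices):

```python
import math

def sph_dist(p, q):
    """Angular distance on the unit sphere S^2 (dot product clamped
    against floating-point drift)."""
    return math.acos(max(-1.0, min(1.0, sum(pi * qi for pi, qi in zip(p, q)))))

# right spherical triangle with the right angle at P = (1,0,0):
# x lies on the equator through P, y on a meridian through P
a, b = 0.7, 0.4
x = (math.cos(a), math.sin(a), 0.0)
y = (math.cos(b), 0.0, math.sin(b))

c = sph_dist(x, y)
# spherical Pythagorean theorem: cos c = cos a * cos b
assert abs(math.cos(c) - math.cos(a) * math.cos(b)) < 1e-12
# hence c >= b: the hypotenuse dominates the leg, as in Lemma 4.2(1)
assert c >= b
```

In the proof the angle at \(P_{[{\hat{x}}_i,y_i]}(x_i)\) is at least \(\pi /2\) by the obtuse-angle property, which only strengthens the inequality compared with this right-angle model.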

Lemma 4.3

Let \(x_i\in X_i\) be a net and \(\gamma _i,\sigma _i:[0,1]\rightarrow X_i\) geodesic segments such that \(x_i,\gamma _i,\sigma _i\) are contained in \((\pi /2-\epsilon )\)-convex sets for each index i and a given \(\epsilon >0\) and

$$\begin{aligned} \lim _{i}d_i(\gamma _i(0),\sigma _i(0))=\lim _{i}d_i(\gamma _i(1),\sigma _i(1))=0. \end{aligned}$$
(16)

Then we have

$$\begin{aligned} \lim _{i}d_i(P_{\gamma _i}(x_i),P_{\sigma _i}(x_i))=0. \end{aligned}$$

Proof

The \((\pi /2-\epsilon )\)-convexity assumption and (16), combined with Theorem 4.1, assure that

$$\begin{aligned} \lim _{i}\sup _{t\in [0,1]}d_i(\gamma _i(t),\sigma _i(t))=0. \end{aligned}$$
(17)

Alternatively, one can obtain (17) using Ohta’s L-convexity [23], available in CAT(1)-spaces, instead of Theorem 4.1. We set \(y_i:=P_{\gamma _i}(x_i)\), \(z_i:=P_{\sigma _i}(x_i)\) and take \(s_i,t_i\in [0,1]\) satisfying \(\gamma _i(s_i)=y_i\) and \(\sigma _i(t_i)=z_i\). Then (17) leads to

$$\begin{aligned} \lim _{i}d_i(\sigma _i(s_i),y_i)=\lim _{i}d_i(\gamma _i(t_i),z_i)=0. \end{aligned}$$

By Exercise 2.6(1) of [9], P is a nonexpansive map; moreover \(d_i(x_i,y_i)\le d_i(x_i,\gamma _i(t_i))\) and \(d_i(x_i,z_i)\le d_i(x_i,\sigma _i(s_i))\) since \(y_i\) and \(z_i\) are nearest-point projections. Hence

$$\begin{aligned} \begin{aligned} \lim _i|d_i(x_i,y_i)-d_i(x_i,z_i)|&=0,\\ \lim _i|d_i(x_i,y_i)-d_i(x_i,\gamma _i(t_i))|&=0. \end{aligned} \end{aligned}$$

By the spherical law of cosines we have

$$\begin{aligned} \cos d_i(x_i,\gamma _i(t_i))\le \cos d_i(x_i,y_i)\cos d_i(y_i,\gamma _i(t_i)) \end{aligned}$$

which implies

$$\begin{aligned} \cos d_i(x_i,\gamma _i(t_i))-\cos d_i(x_i,y_i)\le \cos d_i(x_i,y_i)(\cos d_i(y_i,\gamma _i(t_i))-1), \end{aligned}$$

and, using the trigonometric addition formula for \(\cos \), we have that \(\cos d_i(x_i,\gamma _i(t_i))-\cos d_i(x_i,y_i)\rightarrow 0\) since \(d_i(x_i,y_i)-d_i(x_i,\gamma _i(t_i))\rightarrow 0\). Furthermore, by the \((\pi /2-\epsilon )\)-convexity assumption we have

$$\begin{aligned} \cos d_i(x_i,\gamma _i(t_i)),\cos d_i(x_i,y_i)\ge s \end{aligned}$$

for some \(s>0\), so the above implies \(0\ge (\cos d_i(y_i,\gamma _i(t_i))-1)\rightarrow 0\), equivalently \(d_i(y_i,\gamma _i(t_i))\rightarrow 0\) and thus also \(d_i(y_i,z_i)\rightarrow 0\). \(\square \)

In the proof of the result below we assume that the spaces \(X_i,X\) are separable; however, the argument should generalize to nets in non-separable spaces, since the main tool, Cantor’s diagonalization process, is available there as well, see for example [4]. The proof closely follows the CAT(0) variant, Lemma 5.5 in [17].

Lemma 4.4

Let \(\epsilon >0\) be arbitrary. Any net \(x_i\in X_i\) satisfying \(d_i(x_i,o_i)<\pi /4-\epsilon \) for a net \(o_i\rightarrow o\in X\) has a weakly convergent subnet.

Proof

We may assume that \(\{x_i\}\) is a countable sequence. First take a dense countable subset \(\{\xi _\nu \}_{\nu \in {\mathbb {N}}}\subseteq B_o(\pi /4-\epsilon )\subseteq X\). Let \(y_0\in B_o(\pi /4-\epsilon )\) and select \(y_{0,i}\in B_{o_i}(\pi /4-\epsilon )\subseteq X_i\) such that \(y_{0,i}\rightarrow y_0\) strongly. For each \(\nu \in {\mathbb {N}}\), we select a sequence \(\xi _{\nu ,i}\in B_{o_i}(\pi /4-\epsilon )\subseteq X_i\) so that \(\xi _{\nu ,i}\rightarrow \xi _{\nu }\) strongly. Then we have \(\gamma ^0_{\nu ,i}:=[y_{0,i},\xi _{\nu ,i}]\rightarrow [y_{0},\xi _{\nu }]=:\gamma ^0_{\nu }\) as \(i\rightarrow \infty \). Since \(d_i(x_i,y_{0,i})\) is bounded, \(w^0_{\nu ,i}:=P_{[y_{0,i},\xi _{\nu ,i}]}(x_i)\) has a convergent subsequence whose limit denoted by \(w^0_{\nu }\) is a point in \([y_{0},\xi _{\nu }]\). By Cantor’s diagonalization process, we can choose a common subsequence of \(\{i\}\) independent of \(\nu \in {\mathbb {N}}\) for which \(w^0_{\nu ,i}\rightarrow w^0_{\nu }\) and we denote this subsequence with the indices \(\{i\}\). Select an arbitrary sequence \(0<\epsilon _m\rightarrow 0\). We define \(y_{m,i}\) and \(y_m\) inductively as follows. We assume that \(y_{m,i}\) and \(y_m\) are defined so that \(y_{m,i}\rightarrow y_m\). The sequence \(w^m_{\nu ,i}:=P_{[y_{m,i},\xi _{\nu ,i}]}(x_i)\) has a convergent subsequence, which can again be chosen independent of \(\nu \in {\mathbb {N}}\) and we replace \(\{i\}\) with that subsequence and set \(w^m_{\nu }:=\lim _{i}w^m_{\nu ,i}\). There exists a number \(\nu (m+1)\in {\mathbb {N}}\) such that

$$\begin{aligned} d(y_m,w^m_{\nu (m+1)})>\sup _{\nu }d(y_m,w^m_{\nu })-\epsilon _m. \end{aligned}$$

Define \(y_{m+1}:=w^m_{\nu (m+1)}\) and \(y_{m+1,i}:=w^m_{\nu (m+1),i}\). Then \(y_{m,i}\rightarrow y_{m}\) for each m.

By the obtuse-angle property, the spherical law of cosines implies

$$\begin{aligned} \cos d_i(y_{m,i},x_i)\le \cos d_i(x_i,y_{m+1,i})\cos d_i(y_{m+1,i},y_{m,i}). \end{aligned}$$

Taking a subsequence of \(\{i\}\) again, we can assume that for each m, \(d_i(y_{m,i},x_i)\) converges to some \(\lambda _m\in {\mathbb {R}}\). This implies that \(\lambda _m\) is monotone nonincreasing, thus \(d(y_{m+1},y_{m})\rightarrow 0\) as well. For any \(\nu \in {\mathbb {N}}\),

$$\begin{aligned} \lim _{i}d_i(y_{m,i},w^m_{\nu ,i})=d(y_{m},w^m_{\nu })<\epsilon _m+d(y_{m+1},y_{m})=:\epsilon '_m\rightarrow 0. \end{aligned}$$
(18)

There exists \(\{\nu (l,i)\}\) such that \(\lim _{i}\xi _{\nu (l,i)}=y_l\). By (18), if \(\nu \ll i\), then \(d_i(y_{m,i},w^m_{\nu ,i})\le \epsilon '_m\). We may assume \(\nu (l,i)\ll i\), so that we have

$$\begin{aligned} \limsup _{i}d_i(y_{m,i},w^m_{\nu ,i})\le \epsilon '_m. \end{aligned}$$

Since \([y_{m,i},\xi _{\nu (l,i),i}]\rightarrow [y_m,y_l]\), we can choose a common subsequence \(\{i\}\) independent of m, l for which \(w^m_{\nu (l,i),i}\in [y_{m,i},\xi _{\nu (l,i),i}]\) converges to some point \(x_{m,l}\in [y_m,y_l]\). By the inequality above

$$\begin{aligned} d(y_m,x_{m,l})\le \epsilon '_m. \end{aligned}$$

Applying Lemma 4.3 to \(\gamma _i:= [y_{m,i},y_{l,i}]\) and \(\sigma _i:= [y_{m,i},\xi _{\nu (l,i),i}]\), we have \(P_{[y_{m,i},y_{l,i}]} (x_i) \rightarrow x_{m,l}\). Therefore,

$$\begin{aligned} d(y_l,y_m)\le d(y_l,x_{l,m})+d(x_{l,m},y_m)\le \epsilon '_m+\epsilon '_l \end{aligned}$$

implying that \(\{y_m\}\) is Cauchy. Let \(x:=\lim _m y_m\). By (18), for any geodesic \(\gamma \) emanating from x with an associated sequence of geodesics \(\gamma _i\in X_i\) strongly converging to \(\gamma \) we have that \(P_{\gamma _i}(x_i)\rightarrow x\) strongly. Then it follows that \(x_i\overset{w}{\rightarrow }\ x\) completing the proof. \(\square \)

Remark 4.1

It is interesting to compare the statement of the above Lemma 4.4 with Lemma 2.2. Can we have the weak compactness of larger sets as well in Lemma 4.4 such as in Lemma 2.2 for a single space? Note that the proof above uses projections in an essential way, on the other hand the asymptotic center approach in [26] can prove Lemma 2.2 in a different way obtaining weak compactness in \(\pi \)-convex sets.

Definition 4.3

(Mosco convergence for asymptotic relations) A sequence of lsc functions \(f_n:X_n\rightarrow \overline{{\mathbb {R}}}\) is said to converge to \(f:X\rightarrow \overline{{\mathbb {R}}}\) in the sense of Mosco if, for any \(x\in X\), we have

  1. (M1)

    \(f(x)\le \liminf _{n\rightarrow \infty }f_n(x_n)\) whenever \(x_n\in X_n\) and \(x_n\overset{w}{\rightarrow }\ x\),

  2. (M2)

    there exists a sequence \(y_n\in X_n\), such that \(y_n\rightarrow x\) and \(f_n(y_n)\rightarrow f(x)\).

We prove a version of Proposition 3.7 adapted to the setting of this section.

Proposition 4.5

(Ekeland principle, bounded case) Let \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi /2-\epsilon \) for an \(\epsilon >0\), let \(w_n\in X_n\) be such that \(w_n\rightarrow w\in X\), and let \(f_n:X_n\rightarrow (-\infty ,\infty ]\) be a sequence of lsc \(\lambda \)-convex functions that is Mosco converging to \(f:X\rightarrow (-\infty ,\infty ]\). Then there exist \(\alpha ,\beta \ge 0\) such that

$$\begin{aligned} f_n(x_n)\ge -\alpha d_{n}(x_n,w_n)-\beta \end{aligned}$$

for all \(x_n\in X_n\) and \(n\in {\mathbb {N}}\).

Proof

Assume that the assertion is false; that is, for any \(k\in {\mathbb {N}}\), there exist \(n_k\in {\mathbb {N}}\) and \(x_{n_k}\in X_{n_k}\) such that

$$\begin{aligned} f_{n_k}(x_{n_k})+k[d_{n_k}(x_{n_k},w_{n_k})+1]<0. \end{aligned}$$

We first assume that \(n_k\rightarrow \infty \) as \(k\rightarrow \infty \). Since we have \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi \), we can select a weakly convergent subsequence from \(x_{n_k}\) still denoted by \(x_{n_k}\) with weak limit denoted by \({\overline{x}}\in X\). Then by the Mosco convergence of \(f_n\) we have

$$\begin{aligned} \begin{aligned} f({\overline{x}})\le \liminf _{k\rightarrow \infty }f_{n_k}(x_{n_k})&\le \liminf _{k\rightarrow \infty }-k[d_{n_k}(x_{n_k},w_{n_k})+1]\\&\le -\limsup _{k\rightarrow \infty }k[d_{n_k}(x_{n_k},w_{n_k})+1]\\&=-\infty \end{aligned} \end{aligned}$$

which is a contradiction. The rest of the proof is similar to the proof of Proposition 3.7. \(\square \)

Theorem 4.6

Let \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi /2-\epsilon \) for an \(\epsilon >0\) and \(f_n:X_n \rightarrow (-\infty ,\infty ]\) a sequence of lsc \(\lambda \)-convex functions that is Mosco converging to \(f:X \rightarrow (-\infty ,\infty ]\). Then

$$\begin{aligned} \lim _{n\rightarrow \infty }(f_n)_\tau (x_n)=f_\tau (x) \end{aligned}$$

and

$$\begin{aligned} \lim _{n\rightarrow \infty }J^{f_n}_\tau (x_n)=J^{f}_\tau (x) \end{aligned}$$

for any small enough \(\tau >0\) and \(x\in D(f)\) and \(x_n\in X_n\) with \(x_n\rightarrow x\).

Proof

From Proposition 4.5 we have

$$\begin{aligned} f_n(J^{f_n}_\tau (x_n))\ge -\alpha d_n(J^{f_n}_\tau (x_n),w_n)-\beta . \end{aligned}$$

From the definition of \(J^{f_n}_\tau (x_n)\) and (M2), we have

$$\begin{aligned} f_n(y_n)+\frac{1}{2\tau }d_n^2(x_n,y_n)\ge f_n(J^{f_n}_\tau (x_n))+\frac{1}{2\tau }d_n^2(x_n,J^{f_n}_\tau (x_n)) \end{aligned}$$

where \(y_n\rightarrow x\), which combined with the above estimate yields

$$\begin{aligned} f_n(y_n)+\frac{1}{2\tau }d_n^2(x_n,y_n)+\alpha d_n(J^{f_n}_\tau (x_n),w_n)+\beta \ge \frac{1}{2\tau }d_n^2(x_n,J^{f_n}_\tau (x_n)). \end{aligned}$$

That is,

$$\begin{aligned} 0\ge d_n^2(x_n,J^{f_n}_\tau (x_n))-2\tau \alpha d_n(J^{f_n}_\tau (x_n),w_n)-2\tau (f_n(y_n)+\beta )-d_n^2(x_n,y_n). \end{aligned}$$

Since \(d_n^2(x_n,y_n)\rightarrow 0\) and \(f_n(y_n)\rightarrow f(x)<\infty \) by (M2), for small enough \(\tau >0\) the above forces

$$\begin{aligned} d_n(x_n,J^{f_n}_\tau (x_n))<s \end{aligned}$$
(19)

for all large enough \(n\in {\mathbb {N}}\), given an arbitrary \(s\le \frac{\pi }{4}-\epsilon \).

Now we are in a position to prove \(\lim _{n\rightarrow \infty }J^{f_n}_\tau (x_n)=J^{f}_\tau (x)\). Pick any subsequence \(j_k\in J^{f_{n_k}}_\tau (x_{n_k})\). Since \({{\,\mathrm{\textrm{diam}}\,}}(X)<\pi /2-\epsilon \), there exists a subsequence of \(\{j_k\}_{k\in {\mathbb {N}}}\), still denoted by \(j_k\), with weak limit \(c\in X\).

Since \(f_n\rightarrow f\) in the sense of Mosco, there exists a sequence \(y_n\rightarrow J^{f}_\tau (x)\) with \(f_n(y_n)\rightarrow f(J^{f}_\tau (x))\). Then using (1) of Lemma 4.2 we get

$$\begin{aligned} \begin{aligned} \limsup _{k\rightarrow \infty }(f_{n_k})_\tau (x_{n_k})&\le \limsup _{k\rightarrow \infty }f_{n_k}(y_{n_k})+\frac{1}{2\tau }d_{n_k}^2(x_{n_k},y_{n_k})\\&=f(J^{f}_\tau (x))+\frac{1}{2\tau }d^2(x,J^{f}_\tau (x))\\&\le f(c)+\frac{1}{2\tau }d^2(x,c)\\&\le \liminf _{k\rightarrow \infty }f_{n_k}(j_k)+\frac{1}{2\tau }d_{n_k}^2(x_{n_k},j_k)\\&=\liminf _{k\rightarrow \infty }f_{n_k}(J^{f_{n_k}}_\tau (x_{n_k}))+\frac{1}{2\tau }d_{n_k}^2(x_{n_k},J^{f_{n_k}}_\tau (x_{n_k}))\\&=\liminf _{k\rightarrow \infty }(f_{n_k})_\tau (x_{n_k}) \end{aligned} \end{aligned}$$
(20)

which yields \(c\in J^{f}_\tau (x)\). Furthermore using (M1) and (M2) with \(z_n\rightarrow c\) and \(f_n(z_n)\rightarrow f(c)\) we have

$$\begin{aligned} \limsup _{k\rightarrow \infty }\frac{1}{2\tau }d_{n_k}^2(x_{n_k},j_k)&\le \limsup _{k\rightarrow \infty }-f_{n_k}(j_k)+f_{n_k}(z_{n_k})+\frac{1}{2\tau }d_{n_k}^2(x_{n_k},z_{n_k})\\&=-\liminf _{k\rightarrow \infty }f_{n_k}(j_k)+f(c)+\frac{1}{2\tau }d^2(x,c)\\&\le -f(c)+f(c)+\frac{1}{2\tau }d^2(x,c)\\&\le \frac{1}{2\tau }d^2(x,c)\\&\le \liminf _{k\rightarrow \infty }\frac{1}{2\tau }d_{n_k}^2(x_{n_k},j_k) \end{aligned}$$

which together with (19) and (2) of Lemma 4.2 prove the strong convergence

$$\begin{aligned} j_k\rightarrow c. \end{aligned}$$

Then using (20) we get

$$\begin{aligned} \lim _{n\rightarrow \infty }(f_n)_\tau (x_n)&=\lim _{n\rightarrow \infty }f_n(J^{f_n}_\tau (x_n))+\frac{1}{2\tau }d_n^2(x_n,J^{f_n}_\tau (x_n))\\&=f(J^{f}_\tau (x))+\frac{1}{2\tau }d^2(x,J^{f}_\tau (x))=f_\tau (x) \end{aligned}$$

finishing the proof. \(\square \)
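Both conclusions of Theorem 4.6, convergence of the Moreau envelopes \((f_n)_\tau \) and of the resolvents, can be sanity-checked in a hypothetical one-dimensional model in which every \(X_n\) is identified with the Euclidean line (a trivial asymptotic relation): for \(f_n(y)=(y-a_n)^2/2\) with \(a_n\rightarrow 0\), both quantities admit closed forms and converge to those of \(f(y)=y^2/2\). The closed forms below are specific to this quadratic toy family.

```python
# closed forms for f(y) = (y - a)^2 / 2 on the Euclidean line:
#   resolvent  J_tau(x) = (x + tau*a) / (1 + tau)
#   envelope   f_tau(x) = (x - a)^2 / (2 * (1 + tau))
def resolvent(a, x, tau):
    return (x + tau * a) / (1 + tau)

def envelope(a, x, tau):
    return (x - a) ** 2 / (2 * (1 + tau))

tau, x = 0.25, 1.5
limit_J, limit_e = resolvent(0.0, x, tau), envelope(0.0, x, tau)
# f_n has a_n = 1/n, which Mosco-converges to f (a = 0) as n grows
errs_J = [abs(resolvent(1.0 / n, x, tau) - limit_J) for n in (1, 10, 100)]
errs_e = [abs(envelope(1.0 / n, x, tau) - limit_e) for n in (1, 10, 100)]
# both error sequences decrease to 0, as Theorem 4.6 predicts
assert errs_J == sorted(errs_J, reverse=True) and errs_J[-1] < 1e-2
assert errs_e == sorted(errs_e, reverse=True) and errs_e[-1] < 1e-1
```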

At this point the analogs of Theorems 3.9 and 3.10 for asymptotic relations can be established in the same manner as well. We omit the details.

Theorem 4.7

Assume \(f_n:X_n\rightarrow (-\infty ,\infty ]\) is a sequence of lsc semi-convex functions that is Mosco converging to \(f:X\rightarrow (-\infty ,\infty ]\). Let \(J^{f}_\tau (x)\) and \(J^{f_n}_\tau (x_n)\) be the corresponding resolvents such that for all small enough \(\tau >0\)

$$\begin{aligned} J^{f_n}_\tau (x_n)\rightarrow J^{f}_\tau (x) \end{aligned}$$

as established earlier in this section and let \(S_t(x)\) and \(S^n_t(x_n)\) denote the corresponding gradient flows. Then

$$\begin{aligned} \lim _{n\rightarrow \infty }S^n_t(x_n)=S_t(x) \end{aligned}$$

for any \(t \ge 0\) and \(x\in D(f)\) and \(x_n\in X_n\) with \(x_n\rightarrow x\).

Proof

By (M2) there exists a sequence \(y_n\rightarrow x\) with \(f_n(y_n)\rightarrow f(x)\). Since \(y_n,x_n\rightarrow x\), we have \(d_n(y_n,x_n)\rightarrow 0\), implying

$$\begin{aligned} d_n(S^n_t(y_n),S^n_t(x_n))\le e^{-\lambda t}d_n(y_n,x_n)\rightarrow 0, \end{aligned}$$

so it is enough to show that \(S^n_t(y_n)\rightarrow S_t(x)\). First we claim that

$$\begin{aligned} \left( J_{t/k}^{f_n}\right) ^{k}(y_n)\rightarrow \left( J_{t/k}^{f}\right) ^{k}(x) \end{aligned}$$
(21)

for any fixed \(k\in {\mathbb {N}}\). Indeed, the case \(k=0\) is just \(y_n\rightarrow x\), and if (21) holds for some \(k\ge 0\), then Theorem 4.6 proves it for \(k+1\); the claim follows by induction.

Next, by (8) with \(C_0:=\sqrt{1-K'-\frac{2\lambda }{3} |\tau |} +\sqrt{-\frac{4\lambda }{3} |\tau |}\) and \(x_{t/k}^k:=J_{t/k}^{f}\left( x_{t/k}^{k-1}\right) \), \(x_{t/k}^{0}:=x\) we have that

$$\begin{aligned} d\left( S_t(x),\left( J_{t/k}^{f}\right) ^{k}(x)\right) \le e^{-\lambda t}C_0\sqrt{\frac{t}{k}}\sqrt{f(x)-f(x_{t/k}^k)} \end{aligned}$$
(22)

and similarly with \(y_{t/k}^k:=J_{t/k}^{f_n}\left( y_{t/k}^{k-1}\right) \), \(y_{t/k}^{0}:=y_n\) we also have that

$$\begin{aligned} \begin{aligned} d_n\left( \left( J_{t/k}^{f_n}\right) ^{k}(y_n),S^n_t(y_n)\right)&\le e^{-\lambda t}C_0\sqrt{\frac{t}{k}}\sqrt{f_n(y_n)-f_n(y_{t/k}^k)}\\&\le e^{-\lambda t}C_0\sqrt{\frac{t}{k}}\sqrt{f(x)-f(x_{t/k}^k)+\epsilon _n}, \end{aligned} \end{aligned}$$
(23)

where to obtain the second inequality for fixed k we used (M2) for \(f_n(y_n)\rightarrow f(x)\), and (M1) for \(f(x_{t/k}^k)\le \liminf _{n\rightarrow \infty }f_n(y_{t/k}^k)\) where \(y_{t/k}^k\rightarrow x_{t/k}^k\) as \(n\rightarrow \infty \) by (21), thus the remainder \(\epsilon _n\rightarrow 0\).

Now choose sequences \(z^t_n,j^{t,k}_n\in X_n\) such that \(z^t_n\rightarrow S_t(x)\) and \(j^{t,k}_n\rightarrow \left( J_{t/k}^{f}\right) ^{k}(x)\) as \(n\rightarrow \infty \). Then it is enough to show that \(d_n(S^n_t(y_n),z^t_n)\rightarrow 0\). We estimate as

$$\begin{aligned} \begin{aligned} d_n(S^n_t(y_n),z^t_n)&\le d_n\left( S^n_t(y_n),\left( J_{t/k}^{f_n}\right) ^{k}(y_n)\right) \\&\quad +d_n\left( \left( J_{t/k}^{f_n}\right) ^{k}(y_n),j^{t,k}_n\right) +d_n\left( j^{t,k}_n,z^t_n\right) \end{aligned} \end{aligned}$$
(24)

Combining these estimates with the a priori estimate of Lemma 3.11 for f, given an arbitrary \(\epsilon >0\), we first choose k large enough that the right hand side of (22) is less than \(\epsilon \), which implies that there exists \(N_1>0\) such that the last term on the right hand side of (24) is less than \(3\epsilon \) for all \(n>N_1\). Then choose \(N_2>N_1\) such that for all \(n>N_2\), by (21), the second term on the right hand side of (24) is less than \(2\epsilon \). Finally choose \(N_3>N_2\) such that for all \(n>N_3\) the right hand side of (23) is less than \(2\epsilon \), making the first term on the right hand side of (24) less than \(2\epsilon \). Hence \(d_n(S^n_t(y_n),z^t_n)<7\epsilon \), and the desired \(S^n_t(y_n)\rightarrow S_t(x)\) follows. \(\square \)