1 Introduction

In this paper, we consider optimal stopping problems under model uncertainty in terms of ambiguity aversion. By representation results, this means that we look at stochastic optimisation problems of the form

$$ \sup _{\tau \in \mathcal {T}}\sup _{Q\in \mathcal {Q}}\big(E_{ Q} [Y_{\tau} ]-\beta (Q)\big), $$
(1.1)

where \(\mathcal {T}\) and \(\mathcal {Q}\) denote the set of stopping times and a set of probability measures, respectively, whereas \(\beta \) stands for a convex penalty function (see Maccheroni et al. [27]). In the special case

$$ \sup _{\tau \in \mathcal {T}}\sup _{Q\in \mathcal {Q}}E_{Q}[Y_{ \tau}], $$
(1.2)

the optimal value represents the superhedging price of an American option in an incomplete financial market (see e.g. Trevino-Aguilar [34, Sect. 1.4], Föllmer and Schied [20, Chap. 7]). In its general form, the solution of (1.1) is of interest for robust efficient hedging of an American option in an incomplete financial market (see Trevino-Aguilar [34, Sect. 2.1], Föllmer and Schied [20, Chap. 8]). If the seller of this American option is only willing to invest an amount \(c\) strictly smaller than the superhedging price, then for any stopping time \(\tau \in \mathcal {T}\), the random variable \(Y_{\tau}\) may represent the shortfall of a hedging strategy with initial investment \(c\) when the American option is exercised at \(\tau \). Then

$$ \sup _{Q\in \mathcal {Q}}\left (E_{Q}[Y_{\tau}]-\beta ( Q)\right ) $$

gives a robust quantification of the shortfall risk at time \(\tau \) reflecting the seller’s model uncertainty.

The aim of the present paper is to solve the stopping problem (1.1) numerically. We restrict ourselves to penalty functions in the form of divergence functionals with respect to a reference probability measure. In this case, (1.1) reads as

$$ \sup _{\tau \in \mathcal {T}}\sup _{Q\in \mathcal {Q}}\big(E_{ Q}[Y_{\tau}]-E[\Phi (dQ/dP) ]\big), $$
(1.3)

where \(\Phi : [0,\infty )\rightarrow [0,\infty ]\) denotes a lower semicontinuous convex function and \(\mathcal {Q}\) consists of all probability measures \(Q\) which are absolutely continuous with respect to some reference probability measure \(P\). Besides standard optimal stopping, prominent specialisations of (1.3) are optimal stopping under the average value at risk and under the family of entropic risk measures.

Our investigations are built upon a specific representation of (1.3) established in Belomestny and Krätschmer [8] (with a refinement in Belomestny and Krätschmer [9]). The crucial point is that we may reformulate the optimal stopping problem in terms of a family of standard optimal stopping problems parametrised by a set of real numbers. This allows us to derive a so-called additive dual representation generalising the well-known dual representation of Rogers [30] for the standard optimal stopping problems, given by

$$ V^{\ast}:=\inf _{M\in \mathcal{M}}E \Big[\sup _{t\in [0,T]}(Z_{t}-M_{t})\Big], $$
(1.4)

where \((Z_{t})\) is an adapted cash-flow process and ℳ is the set of all \((\mathcal{F}_{t})\)-martingales starting in 0 at \(t=0\). We use this new generalised dual representation to efficiently construct Monte Carlo upper bounds for the value of the optimal stopping problems under model uncertainty. As for standard optimal stopping problems, several Monte Carlo algorithms for constructing upper-biased estimators for \(V^{*}\) based on (1.4) were suggested in the literature. They typically consist of two steps:

  1. a)

    apply some numerical method to construct a martingale \(\widehat{M}\) which is close to optimality;

  2. b)

    estimate \(E [\sup _{t\in [0,T]}(Z_{t}-\widehat{M}_{t})]\) by the sample mean, using a new independent sample (testing sample); a minimal sketch of this two-step procedure is given after this list.
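To fix ideas, the following minimal sketch (in Python with NumPy; the function name and data layout are ours, and the construction of \(\widehat{M}\) in step a) is left abstract) shows how step b) turns simulated paths into an upper-biased estimate of \(V^{*}\): by (1.4), for any martingale \(\widehat{M}\) with \(\widehat{M}_{0}=0\), the sample mean of \(\sup _{t}(Z_{t}-\widehat{M}_{t})\) estimates an upper bound of \(V^{*}\).

```python
import numpy as np

def dual_upper_bound(Z, M_hat):
    """Step b) of the dual approach: upper-biased estimate of V* from (1.4).

    Z     : array of shape (n_paths, n_times), simulated cash-flow paths Z_t
    M_hat : array of shape (n_paths, n_times), approximate martingale with M_0 = 0
    Returns the sample mean of max_t (Z_t - M_hat_t) and its standard error.
    """
    path_max = np.max(Z - M_hat, axis=1)   # pathwise sup_t (Z_t - M_hat_t)
    return path_max.mean(), path_max.std(ddof=1) / np.sqrt(len(path_max))

# Step a) would produce M_hat on an independent training sample; step b)
# then evaluates the estimator on a fresh testing sample:
# V_up, stderr = dual_upper_bound(Z_test, M_hat_test)
```

The size of the reported standard error is exactly the point of the discussion below: it is governed by the variance of \(\sup _{t\in [0,T]}(Z_{t}-\widehat{M}_{t})\).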

All the existing dual Monte Carlo algorithms can be divided into two broad categories depending on how the martingale \(\widehat{M}\) is constructed. In the first class of algorithms, see for example Andersen and Broadie [2] and Glasserman [22, Chap. 8], Belomestny and Schoenmakers [10, Part III] for further references, the choice of the martingale \(\widehat{M}\) is based on approximating the so-called Doob martingale. A particular feature of the Doob martingale is that it solves (1.4) and, moreover, satisfies

$$ V^{\ast}=\sup _{t\in [0,T]}(Z_{t}-M_{t}^{\ast})\quad \text{almost surely.} $$
(1.5)

Because of (1.5), we say that the Doob martingale is surely or strongly optimal. In the second class of algorithms, one tries to solve the dual optimisation problem (1.4) directly by using methods of stochastic approximation and some parametric subclasses of ℳ. Let us mention Desai et al. [18], where the authors essentially applied the sample average approximation (SAA) approach and used a nested Monte Carlo method to construct a suitable finite-dimensional linear space of martingales, thus casting the resulting minimisation problem into a linear program. However, it was demonstrated later in Belomestny [6] that the approach in Desai et al. [18] may end up with martingales \(\widehat{M}\) that are close to optimal but only in expectation, with the variance of the random variable \(\sup _{t\in [0,T]}(Z_{t}-\widehat{M}_{t})\) being relatively high. In contrast, due to (1.5), for a martingale that is close to the Doob martingale \(M^{\ast}\) (in an \(L^{2}\) sense, for instance), this variance will be close to zero. Consequently, the estimation in step b) can be done more efficiently for such a martingale. Thus it is essential to find martingales that are “close” to the Doob martingale, or at least “close” to a surely optimal martingale. In this respect, Belomestny [6] proposed a modification of the plain SAA based on variance penalisation. The convergence analysis of this algorithm reveals that the variance of the random variable \(\sup _{t\in [0,T]}(Z_{t}-\widehat{M}_{t})\) converges to zero as the number of paths used to build \(\widehat{M}\) increases.

The contribution of the current work is twofold. First, we generalise the approach of Belomestny [6] to the case of optimal stopping problems under model uncertainty by using the dual representation by Belomestny and Krätschmer [8]. Second, we provide a thorough convergence analysis of the proposed algorithm. The main theoretical challenge is to extend the analysis of Belomestny [6] to objective functions involving empirical expectations and empirical variances of much more complicated objects than in Belomestny [6]; see Sect. 3. We use essentially different techniques (e.g. different concentration inequalities) and derive faster convergence rates that improve upon those in Belomestny [6] for standard optimal stopping problems. We also illustrate our results in a diffusion setting where, by the martingale representation theorem, martingales can be represented as stochastic integrals with respect to the driving Brownian motion. As compared to Belomestny [6], we consider here not only parametric linear families of martingales, but rather general nonparametric ones defined as stochastic integrals with smooth integrands.

Putting our contribution into perspective, it should be emphasised that none of the general devices suggested in the literature can be used to analyse the optimal stopping problem (1.1) or even (1.3). To the best of our knowledge, there exist two general strategies, both based on some underlying filtered probability space \((\Omega , \mathcal{F},(\mathcal{F}_{t})_{0 \leq t \leq T},P)\). The first focuses on sets \(\mathcal {Q}\) for which we may find conditional nonlinear expectations extending the functional

$$ X\mapsto \sup _{Q\in \mathcal {Q}}\big(E_{Q}[X] - \beta (Q)\big) $$

and satisfying a property called time-consistency which extends the tower property of conditional expectations. Time-consistency, sometimes also called recursiveness, allows extending the dynamic programming principle from standard optimal stopping problems to optimal stopping problems of the form (1.1). Studies following this line of reasoning may be found e.g. in Trevino-Aguilar [34, Sects. 4.1 and 4.2], Bayraktar and Yao [3], Bayraktar and Yao [4], Ekren et al. [19] and Bayraktar and Yao [5] (see also Riedel [29], Krätschmer and Schoenmakers [25] and Föllmer and Schied [20, Chap. 6] for the discrete-time case). Unfortunately, this approach requires very restrictive conditions that \(\mathcal {Q}\) should satisfy, at least for optimal stopping (1.2) (see e.g. Delbaen [17], or Belomestny et al. [7] for the case where \(\mathcal {Q}\) consists of probability measures that are equivalent to \(P\)). Even worse, it is known from Kupper and Schachermayer [26] that for the optimal stopping problem (1.3), \(\mathcal {Q}\) meets this requirement in two cases only. These choices of \(\Phi \) correspond to standard optimal stopping and optimal stopping under entropic risk measures.

The second approach proposed very recently in Huang and Yu [24] and Huang et al. [23] offers a way to solve the optimal stopping problem (1.2) when a dynamic programming principle cannot be applied. The main idea in these papers is to tackle optimal stopping within a game-theoretic framework and look for Nash subgame perfect equilibria. This line of reasoning refers to a long history in economics on how to deal with time-inconsistent dynamic utility maximisation, going back to Strotz [33], Selten [31] and Selten [32]. It has become popular for applications in stochastic finance due to the contributions by Björk and Murgoci [13] and Björk et al. [12], where the authors treat stochastic control problems which do not admit a Bellman optimality principle. Formally, the expected payoffs corresponding to the equilibria approximate the optimal values of (1.2) from below. However, this approach cannot be used directly for the optimal stopping problem (1.3) since this reduces to (1.2) only in a few cases (see Ben-Tal and Teboulle [11]), with optimal stopping under average value at risk as the outstanding representative. Moreover, a numerical method to calculate the payoffs at the equilibria is missing.

In conclusion, the existing literature on robust optimal stopping does not lead in general to a constructive numerical approach to calculate the optimal values of (1.3). This paper offers a method to deal with this problem and is completed by studying its theoretical properties.

The paper is organised as follows. In Sect. 2, we introduce convex risk measures and give some examples. Then we introduce primal and dual representations for our optimal stopping problems under model uncertainty. In Sect. 3, we develop a Monte Carlo functional optimisation algorithm based on the derived dual representation. Then we analyse its convergence towards the solution, depending on the number of Monte Carlo paths and complexity of the underlying functional class. The results are specified to a setting of diffusion processes in Sect. 4. Afterwards, we present some numerical results in Sect. 5. The proofs of the results from Sects. 3 and 4 are given in Sects. 6–8.

2 Setup

Let \(0< T<\infty \) and let \((\Omega , \mathcal{F},(\mathcal{F}_{t})_{0 \leq t \leq T},P)\) be a filtered probability space, where \((\mathcal {F}_{t})_{t\in [0,T]}\) is a right-continuous filtration with \(\mathcal {F}_{0}\) complete and trivial. We also impose the following requirements:

  • \((\Omega ,{\mathcal{F}}_{t},P|_{{\mathcal{F}}_{t}} )\) is atomless for \(t > 0\).

  • The \(L^{1}\)-space \(L^{1} (\Omega ,{\mathcal{F}}_{t},P|_{{\mathcal{F}}_{t}} )\) is weakly separable for \(t > 0\).

Consider a lower semicontinuous convex mapping \(\Phi : [0, \infty )\rightarrow [0,\infty ]\) satisfying \(\Phi (x_{0}) < \infty \) for some \(x_{0} > 0\), \(\inf _{x\geq 0}\Phi (x) = 0\) and \(\lim _{x\to \infty}\frac{\Phi (x)}{x} = \infty \). Its Fenchel–Legendre transform

$$ \Phi ^{*}(y):=\sup _{x\geq 0}~\big(xy - \Phi (x)\big) $$

is a finite nondecreasing convex function whose restriction to \([0,\infty )\) is a finite Young function, that is, \(\Phi ^{*}:[0,\infty )\to [0,\infty )\) is convex and satisfies

$$ \Phi ^{*}(0)=0, \qquad \lim _{x \to \infty} \Phi ^{*}(x)=\infty . $$
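For orientation, consider the entropic penalty \(\Phi (x) = x\log x - x + 1\) (corresponding to the entropic risk measures mentioned in the introduction): a direct computation of the supremum, which is attained at \(x = e^{y}\), gives

$$ \Phi ^{*}(y) = e^{y} - 1, $$

which is indeed finite, nondecreasing and convex with \(\Phi ^{*}(0)=0\).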

Consider the space

$$ H^{\Phi ^{*}}= \{ X \in L^{0}: E[\Phi ^{*}(c |X|)] < \infty \text{ for all } c >0 \}, $$

where \(L^{0}\) is the class of all (equivalence classes of) finite-valued random variables. For abbreviation, let us introduce the functional \(\rho : H^{\Phi ^{*}}\rightarrow \mathbb{R}\) defined by

$$ \rho (X)=\sup _{Q\in \mathcal {Q}_{\Phi}}\big(E_{Q}[X]- E[\Phi (dQ/dP) ]\big), $$

where \(\mathcal {Q}_{\Phi}\) stands for the set of all probability measures \(Q\) which are absolutely continuous with respect to \(P\) and such that \(\Phi (dQ/dP)\) is \(P\)-integrable. Note that \(X dQ/dP\) is \(P\)-integrable for every \(Q\in \mathcal {Q}_{\Phi}\) and any \(X\in H^{\Phi ^{*}}\) due to Young’s inequality.

Example 2.1

Let us illustrate our setup in the case of the so-called average value at risk, also known as expected shortfall or conditional value at risk. The average value at risk at level \(\alpha \in [0,1)\) is defined as the functional

$$ AV@R_{\alpha} (X) := \frac{1}{1-\alpha}\,\int _{\alpha}^{1} F^{ \leftarrow}_{X}(\beta )\,d\beta , $$

where \(X\) is \(P\)-integrable and \(F^{\leftarrow}_{X}\) denotes the left-continuous quantile function of the distribution function \(F_{X}\) of \(X\) defined by \(F^{\leftarrow}_{X}(\alpha ) = \inf \{x\in \mathbb{R}: F_{X}(x)\geq \alpha \}\) for \(\alpha \in (0,1)\). Note that \(AV@R_{0}(X) = E[X]\) for any \(P\)-integrable \(X\). Moreover, for \(\alpha \in (0,1)\), it is well known that

$$ AV@R_{\alpha}(X) = \sup \limits _{Q\in \mathcal {Q}_{\Phi _{\alpha}}} E_{Q}[X]\qquad \mbox{for }P\mbox{-integrable }X, $$

where \(\Phi _{\alpha}\) stands for the function defined by \(\Phi _{\alpha}(x) = 0\) for \(x\leq 1/(1-\alpha )\), whereas \(\Phi _{\alpha}(x) = \infty \) otherwise (cf. Föllmer and Schied [20, Theorem 4.52]). Observe that the set \(\mathcal {Q}_{\Phi _{\alpha}}\) consists of all probability measures on ℱ with \(dQ/dP\leq 1/(1-\alpha )\) \(P\)-a.s.
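As a small numerical illustration (a minimal sketch in Python with NumPy; the function name and the quadrature grid are ours), the quantile-integral definition above can be approximated by averaging empirical quantiles over \([\alpha ,1)\):

```python
import numpy as np

def avar(sample, alpha, grid_size=1000):
    """Empirical AV@R_alpha(X) = (1/(1-alpha)) * int_alpha^1 F_X^{<-}(beta) dbeta,
    approximated by averaging the empirical quantile function on a uniform grid."""
    betas = np.linspace(alpha, 1.0, grid_size, endpoint=False)
    return np.quantile(sample, betas).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
print(avar(x, 0.95))   # for N(0,1), AV@R_0.95 = phi(q_0.95)/0.05, roughly 2.06
```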

Consider now a right-continuous nonnegative stochastic process \((Y_{t})\) adapted to \((\mathcal{F}_{t})\). Furthermore, let \(\mathcal {T}\) consist of all \([0,T]\)-valued stopping times \(\tau \) with respect to \((\mathcal{F}_{t})\). The main object of our study is the optimal stopping problem

$$ V_{0}=\sup _{\tau \in \mathcal {T}}\, \rho (Y_{\tau}). $$
(2.1)

For fixed \(x\in \mathbb{R}\), we denote by \(V^{x} = (V^{x}_{t})_{t\in [0,T]}\) the Snell envelope of the process \((\Phi ^{*}(x + Y_{t}) - x )_{t\in [0,T]}\), defined via

$$ V^{x}_{t}=\operatorname *{ess\,sup}_{\tau \in \mathcal {T},\, \tau \geq t} E\big[\Phi ^{*}(x + Y_{\tau}) - x \,\big|\, \mathcal{F}_{t}\big], \qquad t\in [0,T]. $$

Let \({\mathrm{int}}({\mathrm{dom}}(\Phi ))\) denote the topological interior of the effective domain of the mapping \(\Phi : [0,\infty )\rightarrow [0,\infty ]\). In addition to the standing assumptions on \(\Phi \), we assume that \(1\in {\mathrm{int}}({\mathrm{dom}}(\Phi ))\). Denote by \(\mathcal {M}_{0}\) the set of all martingales \((M_{t})_{t\in [0,T]}\) with \(M_{0}=0\) such that \(\sup _{t\in [0,T]} |M_{t}|\) is \(P\)-integrable. The following result was proved in Belomestny and Krätschmer [8] along with Belomestny and Krätschmer [9]. We point out that it uses that \((\Omega ,\mathcal {F}_{t},P|_{\mathcal {F}_{t}})\) is atomless and \(L^{1} (\Omega ,{\mathcal{F}}_{t},P|_{{\mathcal{F}}_{t}} )\) is weakly separable, for each \(t>0\).

Theorem 2.2

If there is some \(p > 1\) such that \(\sup _{t\in [0,T]}|\Phi ^{*}(x + Y_{t})|\) is \(P\)-integrable of order \(p\) for any \(x\in \mathbb{R}\), then we have the dual representations

$$ V_{0}=\min _{x\in K} E\Big[\sup _{t\in [0,T]}\big(\Phi ^{*}(x + Y_{t}) - x - M^{*,x}_{t}\big)\Big] =\min _{x\in K}\,\inf _{M\in \mathcal {M}_{0}} E\Big[\sup _{t\in [0,T]}\big(\Phi ^{*}(x + Y_{t}) - x - M_{t}\big)\Big]. $$
(2.2)

Here \(M^{*,x}\) is the martingale part of the Doob–Meyer decomposition of the Snell envelope \(V^{x}\) and \(K \subseteq \mathbb{R}\) denotes a suitably chosen compact set.

Remark 2.3

The above dual representation is remarkable for at least two reasons. Firstly, it allows one to construct upper bounds for the value \(V_{0}\) by choosing a martingale \(M\) from the set \(\mathcal {M}_{0}\). Secondly, if the optimal martingale \(M^{*,x}\) is found, then a single trajectory of the reward process \(Y\) and of the martingale \(M^{*,x}\) suffices to compute \(V_{0}\) without error. In this sense, such a dual representation can be computationally more efficient than the primal one.

Remark 2.4

We may describe more precisely how to choose the compact set \(K\) in Theorem 2.2. First of all, observe that under the assumptions of this theorem, the representation results imply

$$ \sup _{\tau \in \mathcal {T}}\rho (Y_{\tau}) \leq \inf _{x\in \mathbb{R}} E\Big[\sup _{t\in [0,T]}\big(\Phi ^{*}(x + Y_{t}) - x \big) \Big] < \infty . $$

Secondly,

$$ E\Big[\sup _{t\in [0,T]}\big(\Phi ^{*}(x + Y_{t}) - x - M^{*,x}_{t} \big)\Big]\geq \Phi ^{*}(x + Y_{0}) - x $$

holds for any real number \(x\). Next, by assumption we may find \(0\leq x_{0} < 1 < x_{1}\) such that \(x_{0}, x_{1}\) belong to the effective domain of \(\Phi \). Then by the definition of \(\Phi ^{*}\),

$$ \Phi ^{*}(x + Y_{0}) -x\geq \max _{i=0,1}\big((x_{i} - 1) x + x_{i} Y_{0} - \Phi (x_{i})\big)\qquad \mbox{for }x\in \mathbb{R}. $$

Then it is easy to check that

$$ E\Big[\sup _{t\in [0,T]}\big(\Phi ^{*}(x + Y_{t}) - x \big) \Big] > \sup _{\tau \in \mathcal {T}}\rho (Y_{\tau}) $$

whenever

$$ x < a_{\ell} := \min _{i=0,1} \frac{\sup _{\tau \in \mathcal {T}}\rho (Y_{\tau}) + \Phi (x_{i}) - x_{i} Y_{0}}{x_{i} - 1 } $$

or

$$ x > a_{u} := \max _{i=0,1} \frac{\sup _{\tau \in \mathcal {T}}\rho (Y_{\tau}) + \Phi (x_{i}) - x_{i} Y_{0}}{x_{i} - 1 }. $$

Hence any compact set \(K \supseteq [a_{\ell},a_{u}]\) may be used in Theorem 2.2. We can derive a more accessible choice for the set \(K\) in the case of average value at risk \(AV@R_{\alpha}\). By nonnegativity of the process \((Y_{t})\),

$$ \sup _{\tau \in \mathcal {T}}AV@R_{\alpha}(Y_{\tau})\leq \sup _{\tau \in \mathcal {T}}E[Y_{\tau}]/(1-\alpha ). $$

Furthermore, \(\Phi ^{*}(x) = x^{+}/(1-\alpha )\) holds for \(x\in \mathbb{R}\) so that

$$ \Phi ^{*}(x + Y_{0}) - x > \sup _{\tau \in \mathcal {T}}E[Y_{\tau}] \qquad \mbox{for }x\in \mathbb{R}\setminus [a^{\alpha}_{\ell},a^{\alpha}_{u} ], $$

where

$$ a^{\alpha}_{\ell} := - \sup _{\tau \in \mathcal {T}}E[Y_{\tau}] \qquad \mbox{and}\qquad a^{\alpha}_{u} := \Big((1-\alpha ) \sup _{ \tau \in \mathcal {T}}E[Y_{\tau}] - Y_{0}\Big)/\alpha . $$

Thus any compact \(K \supseteq [a^{\alpha}_{\ell},a^{\alpha}_{u} ]\) is a proper choice in Theorem 2.2 for \(\rho = AV@R_{\alpha}\).

In the next section, we propose a Monte Carlo method for solving the dual optimisation problem (2.2) empirically.

3 Dual empirical minimisation

The representation result in Theorem 2.2, in particular (2.2), is the starting point for our method to solve the optimal stopping problem (2.1). We start by fixing a metric space \(\Psi \) and a family \((M_{t}(\psi ))_{t\in [0,T]}\) of martingales parametrised by \(\psi \in \Psi \), adapted to \((\mathcal{F}_{t})_{t \in [0,T]}\) and satisfying \(M_{0}(\psi ) =0\). Define the process \(Z = (Z(x,\psi ))\) via

$$ Z(x,\psi ) := \sup _{s \in [0,T]} \big( \Phi ^{*}(x + Y_{s}) - x-M_{s}( \psi )\big)\qquad \mbox{for}~x\in \mathbb{R}, \psi \in \Psi . $$

We shall find the “best” \(\psi \in \Psi \) by solving the empirical optimisation problem on a set of trajectories. To this end, we define the product space \((\Omega ^{\mathbb{N}},\mathcal {F}^{\mathbb{N}},P^{\mathbb{N}})\) and its natural projections

$$ \Pi _{i}(\underline {\omega })= \omega _{i}, \qquad \underline {\omega }=(\omega _{n})_{n \in \mathbb{N}}\in \Omega ^{\mathbb{N}}, $$

as well as the processes \(Z^{(i)}\), \(i=1,2,\ldots \), on \(\Omega^{\mathbb{N}} \times \mathbb{R}\times \Psi \) via

$$ Z^{(i)}(\underline {\omega },x,\psi ):=\sup _{s \in [0,T]} \Bigl(\Phi ^{*}\big(x + Y_{s}\circ \Pi _{i}(\underline {\omega })\big) - x -M_{s}\big(\Pi _{i}(\underline {\omega }); \psi \big) \Bigr). $$

Fix some \(\lambda >0\) and let \((x_{n},\psi _{n})\) denote one of the random solutions of the random optimisation problem

$$\begin{aligned} \min _{(x,\psi ) \in K \times \Psi}\bigg(&\frac{1}{n}\sum _{i=1}^{n} Z^{(i)}( \,\cdot \,,x,\psi ) \\ &{}+ \frac{\lambda}{n(n-1)}\sum _{1\leq i < j\leq n} \big(Z^{(i)}(\, \cdot \,,x,\psi )-Z^{(j)} (\, \cdot \,,x,\psi )\big)^{2}\bigg), \end{aligned}$$

where \(K\) is a compact set in ℝ as in Theorem 2.2. As \(n\to \infty \), this optimisation problem becomes \(P^{\mathbb{N}}\)-a.s. close to the optimisation problem

$$\begin{aligned} \min _{(x,\psi ) \in K \times \Psi}\big( E[Z(x,\psi )]+\lambda \mathrm {{Var}}[Z(x,\psi )]\big), \end{aligned}$$
(3.1)

and we denote by \((x^{*},\psi ^{*})\) one of the latter’s (deterministic) solutions. The intuition behind (3.1) is simple. Setting \(\xi (x,M):=\sup _{t\in [0,T]}(\Phi ^{*}(x + Y_{t}) - x - M_{t})\), we minimise the expectation of \(\xi (x,M)\) over a family of martingales \(M\) and \(x\in K\). At the same time, we penalise the variance of this random variable. We also have in mind that the variance of \(\xi (x,M)\) is zero if the chosen family of martingales contains the martingale \(M^{*,x^{*}}\) defined in Theorem 2.2 (see Rogers [30]). In this way, a variance reduction effect can be achieved, as we shall illustrate in Sect. 5.
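For a fixed pair \((x,\psi )\), the empirical criterion above is cheap to evaluate: the U-statistic penalty \(\frac{1}{n(n-1)}\sum _{i<j}(Z^{(i)}-Z^{(j)})^{2}\) coincides with the unbiased sample variance of \(Z^{(1)},\dots ,Z^{(n)}\). The following minimal sketch (in Python with NumPy; the function name and data layout are ours) makes this explicit.

```python
import numpy as np

def penalized_objective(Z_paths, lam):
    """Empirical criterion from Sect. 3 for a fixed pair (x, psi):
    sample mean of the Z^(i) plus lam times the U-statistic
    (1/(n(n-1))) * sum_{i<j} (Z^(i) - Z^(j))^2,
    which equals the unbiased sample variance of the Z^(i)."""
    n = len(Z_paths)
    pair_sum = (np.subtract.outer(Z_paths, Z_paths) ** 2).sum() / 2.0  # sum over i<j
    u_stat = pair_sum / (n * (n - 1))
    assert np.isclose(u_stat, Z_paths.var(ddof=1))  # U-statistic = unbiased variance
    return Z_paths.mean() + lam * u_stat
```

Consequently, the empirical problem can be solved with any routine that evaluates sample means and sample variances of \(Z^{(i)}(\,\cdot \,,x,\psi )\) along the optimisation over \((x,\psi )\).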

Let us now analyse the properties of the measurable selector \((x_{n},\psi _{n})\). For any \(n \in \mathbb{N}\), set

$$ \mathcal{D}_{n}: \Omega ^{\mathbb{N}} \to (\mathbb{R}^{n})^{K\times \Psi}, \qquad \underline {\omega }\mapsto \big(Z^{(1)}(\underline {\omega },x,\psi ),\dots ,Z^{(n)}( \underline {\omega },x,\psi )\big)_{(x,\psi ) \in K \times \Psi}. $$

The mapping \(\mathcal{D}_{n}\) can be interpreted as a set of Monte Carlo paths of the process \(Z\) used to construct \((x_{n},\psi _{n})\). In order to formulate our main results, we introduce for a selector \((x_{n},\psi _{n})\) the function

$$\begin{aligned} \mathcal{Q}_{\lambda} (x_{n},\psi _{n} )&:=E[Z(x_{n},\psi _{n}) | \mathcal{D}_{n} ] + \lambda \mathrm {{Var}}[Z(x_{n},\psi _{n}) | \mathcal{D}_{n} ]. \end{aligned}$$

With a slight abuse of notation, we set for \((x,\psi ) \in K \times \Psi \),

$$\begin{aligned} \mathcal{Q}_{\lambda}\left (x,\psi \right )&:=E[Z(x,\psi ) ] + \lambda \mathrm {{Var}}[Z(x,\psi ) ]. \end{aligned}$$

Now let \((K \times \Psi )_{\eta}\) denote the set of centres of a covering of \(K\times \Psi \) by a minimal number of \(\eta \)-balls with respect to the (semi)metric

$$ d\big((x,\psi ),(x',\psi ')\big):=E[| Z(x,\psi )-Z(x',\psi ')| ].$$

Then define

$$ \gamma (K \times \Psi ,n) := \inf \left \{ \varepsilon >0 : \log \mathcal{N}(K \times \Psi , \varepsilon ) \leq n \varepsilon \right \}, $$

where \(\mathcal{N}(K \times \Psi ,\varepsilon )\) stands for the minimal number of open \(d\)-balls with radius \(\varepsilon > 0\) needed to cover the set \(K \times \Psi \). We tacitly set \(\mathcal{N}(K \times \Psi ,\varepsilon ) = \infty \) if no finite cover is available.
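To get a feeling for \(\gamma (K \times \Psi ,n)\), suppose for illustration (this is an assumption, not part of the definition) that the covering entropy grows polynomially, \(\log \mathcal{N}(K \times \Psi ,\varepsilon ) \leq A \varepsilon ^{-\rho}\) for some \(A, \rho > 0\). Balancing \(A\varepsilon ^{-\rho} = n\varepsilon \) then gives

$$ \gamma (K \times \Psi ,n) \leq (A/n)^{1/(1+\rho )}. $$

For \(\rho = (d+1)/s\), as is typical for balls of \(\mathcal{H}_{2}^{s}\) on a \((d+1)\)-dimensional domain, this yields \(\gamma (K \times \Psi ,n) \lesssim n^{-s/(s+d+1)}\), which is exactly the quantity behind the rates in Sect. 4.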

Theorem 3.1

Let \(\delta \in (0,1)\) and \(\lambda >0\). Assume that \(|Z|\leq b <\infty \) with probability 1. Then it holds for all \(n \in \mathbb{N}\) that

$$ P^{\mathbb{N}} [\mathcal{Q}_{\lambda}(x_{n},\psi _{n})-\mathcal{Q}_{ \lambda}(x^{*},\psi ^{*})\leq 8~R_{0}(n,\lambda ,\delta ) ] \geq 1- \delta , $$

where

$$\begin{aligned} R_{0}(n,\lambda ,\delta )&:=b(1+8\lambda b)\bigg(\sqrt{ \frac{\log (8/\delta )}{n}}+\sqrt{\gamma (K\times \Psi ,n)}\bigg) \\ &\phantom{=:}{} +(1+4 b \lambda )\gamma (K\times \Psi ,n). \end{aligned}$$

Corollary 3.2

Let all assumptions of Theorem 3.1 be valid and further assume that

$$\lim _{n \to \infty} \gamma (K \times \Psi ,n)=0. $$

Then for all \(\varepsilon >0\), it holds that

$$ \lim _{n \to \infty}P^{\mathbb{N}} [ |\mathcal{Q}_{\lambda}(x_{n},\psi _{n})- \mathcal{Q}_{\lambda}(x^{*},\psi ^{*}) |\geq \varepsilon ]=0. $$

In some situations, the bounds of Theorem 3.1 can be improved. Suppose that

$$ \min _{(x,\psi ) \in K \times \Psi} \big(E[Z(x,\psi ) ]+ \lambda \mathrm {{Var}}[Z(x,\psi ) ]\big)=\sup _{\tau \in \mathcal {T}}\, \rho (Y_{\tau}), $$

that is, the set \(\Psi \) is assumed to be rich enough such that the solution \((x^{*},\psi ^{*})\) satisfies \(M(\psi ^{*})=M^{*,x^{*}}\), where \(M^{*,x^{*}}\) is the martingale part of the Doob–Meyer decomposition of \(V^{x^{*}}\). As already mentioned above, in this case it holds that

$$ \mathrm {{Var}}[Z (x^{*},\psi ^{*} ) ]=0. $$
(3.2)

In the case that \(M(\psi ^{*})=M^{*,x^{*}}\) for some \(\psi ^{*}\), the selectors \((x_{n},\psi _{n})\) are nothing else but so-called M-estimators. So we may invoke the established theory on asymptotics of M-estimation. The reader is referred to van der Vaart and Wellner [35, Chap. 3] for comprehensive information. In this theory, a starting point is the so-called “well-separated minimum” condition

$$ \inf _{d ((x,\psi ),(x^{*},\psi ^{*}) )\geq \varepsilon} E[Z(x, \psi ) ] > E[Z(x^{*},\psi ^{*}) ]\qquad \mbox{for every}~ \varepsilon > 0. $$
(3.3)

Property (3.3) is a basic assumption to find general criteria which ensure that the sequence \((x_{n},\psi _{n} )_{n\in \mathbb{N}}\) converges in probability to \((x^{*},\psi ^{*})\) (see van der Vaart and Wellner [35, Corollary 3.2.3]).

Since our metric \(d\) is assumed to be totally bounded, the topological closure of the set \(\{Z(x,\psi ) : (x,\psi )\in K\times \Psi \}\) with respect to the \(L^{1}\)-norm is compact. Note then that condition (3.3) is satisfied if and only if the restriction of the expectation operator to the \(L^{1}\)-closure \({\mathrm{{cl}}}(\{Z(x,\psi ): (x,\psi )\in K\times \Psi \})\) of \(\{Z(x,\psi ): (x,\psi )\in K\times \Psi \}\) has a unique minimum at \(Z(x^{*},\psi ^{*})\).

If we are interested in convergence rates, we must complement the “well-separated minimum” condition. The following type of identifiability condition is by now standard in the literature on M-estimation (see van der Vaart and Wellner [35, Theorem 3.2.5]): There exist \(\overline{C}, \delta > 0\) such that

$$\begin{aligned} &E[Z(x,\psi ) - Z(x^{*},\psi ^{*}) ]\geq \overline{C}~d\big((x, \psi ),(x^{*},\psi ^{*})\big) \\ &\mbox{for }d\big((x,\psi ),(x^{*},\psi ^{*})\big) < \delta. \end{aligned}$$
(3.4)

Now we are prepared to improve the convergence rates.

Theorem 3.3

Let \(\delta \in (0,1)\), \(\lambda >0\), \(C_{\lambda ,b}:=64 b^{3}\lambda ^{2}+2\lambda \) and \(| Z |\leq b <\infty \) with probability 1. Then under (3.2)–(3.4), for all \(n \in \mathbb{N}\) satisfying

$$ n> 4b \big(\gamma (K \times \Psi ,n) + C_{\lambda ,b}\log (8/\delta )/n +2/3 \big) \log (8/\delta ), $$

it holds that

$$ P^{\mathbb{N}} [\mathcal{Q}_{\lambda}(x_{n},\psi _{n})-\mathcal{Q}_{ \lambda}(x^{*},\psi ^{*})\leq c_{1} R_{1}(n,\delta )+c_{2} R_{2}(n, \lambda ,\delta ) ] \geq 1-\delta , $$
(3.5)

where \(c_{1},c_{2}>0\) are some universal constants,

$$ R_{1}(n,\delta ):=\gamma (K\times \Psi ,n)+\frac{\log (8/\delta )}{n} $$
(3.6)

and

$$ R_{2}(n,\lambda ,\delta ):=\sqrt{ \frac{b (\gamma (K \times \Psi ,n)+\frac{C_{\lambda ,b}\log (8/\delta )}{n} + 1 )\log (8/\delta )}{n}}. $$
(3.7)

Remark 3.4

Note that if \(\gamma (K \times \Psi ,n)\to 0\) as \(n\to \infty \) in such a way that

$$ \lim _{n\to \infty} n\,\gamma (K \times \Psi ,n)=\infty , $$

then

$$ \lim _{n\to \infty} \frac{R_{1}(n,\delta ) + R_{2}(n,\lambda ,\delta )}{R_{0}(n,\lambda ,\delta )}=0 $$

(see Sect. 7). In this sense, the bound in Theorem 3.3 is better than the one in Theorem 3.1.

4 Specification analysis for the class \(\Psi \)

In this section, we specify the convergence rates in (3.5) depending on the properties of the parameter space \(\Psi \). The convergence rate is driven by the quantity \(\gamma (K \times \Psi , n)\), which in turn is determined by the covering numbers of \(\Psi \); so to analyse the rate, we have to study these covering numbers. In what follows, we consider parametric families of martingales arising in the setting of diffusion processes. Let \((S_{t})_{t\in [0,T]}\) denote a \(d\)-dimensional diffusion process solving the system of SDEs

$$ dS_{t}= \mu (t,S_{t})dt+\sigma (t,S_{t})\,dW_{t}, \qquad S_{0}=x_{0}, $$
(4.1)

where \(\mu :[0,T]\times \mathbb{R}^{d} \to \mathbb{R}^{d}\) and \(\sigma :[0,T]\times \mathbb{R}^{d} \to \mathbb{R}^{d \times m}\) are Lipschitz-continuous in space and \(1/2\)-Hölder-continuous in time, with \(m\) denoting the dimension of the Brownian motion \(W=(W_{1},\ldots ,W_{m})^{\top}\). Then the martingale representation theorem implies that any square-integrable martingale \((M_{t})_{t\in [0,T]}\) with respect to the filtration \((\mathcal{F}_{t})_{t\in [0,T]}\) generated by \((W_{t})_{t\in [0,T]}\) and with \(M_{0}=0\) can be represented as

$$\begin{aligned} M_{t}=\int _{0}^{t} G_{s}\,dW_{s}, \qquad t\in [0,T], \end{aligned}$$
(4.2)

where \((G_{s})_{s\in [0,T]}\) is an \((\mathcal{F}_{t})_{t\in [0,T]}\)-adapted process which is square-integrable on \([0,T] \) in the sense of (4.3) below. Under some conditions, it can be shown by the Itô formula that the Doob martingale \((M_{t}^{*})_{t\in [0,T]}\) of the Snell process

$$ V_{t}=\operatorname *{ess\,sup}_{\tau \in \mathcal {T},\, \tau \geq t} E[f(S_{\tau}) \,|\, \mathcal{F}_{t}], \qquad t\in [0,T], $$

for a function \(f:\mathbb{R}^{d}\to \mathbb{R}\) has a representation (4.2). More specifically, we may choose \(G_{s}=G(s,S_{s})\) for some measurable function \(G: [0,T] \times \mathbb{R}^{d}\to \mathbb{R}^{m}\) such that \((G_{s})_{s\in [0,T]}\) is square-integrable on \([0,T]\) as in (4.3) below; see Ye and Zhou [37, Theorem 5]. Therefore it is reasonable to parametrise a subclass of square-integrable martingales adapted to \((\mathcal{F}_{t})\) by functions \(\psi (t,x)=(\psi _{1}(t,x),\ldots ,\psi _{m}(t,x))\), satisfying

$$ \int _{0}^{T} E[|\psi (t,S_{t})|^{2} ] dt < \infty , $$
(4.3)

via

$$ M_{t}=M_{t}(\psi )=\int _{0}^{t} \psi (u,S_{u})\,dW_{u}. $$

Note that this type of representation was already used to solve optimal stopping/control problems in a dual formulation; see e.g. Wang and Caflisch [36] and Ye and Zhou [37]. Denote by \(\mathcal{H}_{p}^{s}(\mathbb{R}^{d})\) the Sobolev space consisting of all functions \(f \in L^{p}(\mathbb{R}^{d})\) such that for every multi-index \(\alpha \) with \(|\alpha | \leq s\), the mixed partial derivative \(D^{\alpha}f\) exists in the weak sense and is in \(L^{p}(\mathbb{R}^{d})\). Further let \(\beta \in \mathbb{R}\) and \(\langle x \rangle ^{\beta}=(1+|x|^{2})^{\beta /2}\), where \(x\in \mathbb{R}^{d}\). For \(s-d/p>0\), we define the weighted Sobolev space

$$ \mathcal{H}_{p}^{s}(\mathbb{R}^{d}, \langle x \rangle ^{\beta})= \{f : f \langle x \rangle ^{\beta}\in \mathcal{H}_{p}^{s}(\mathbb{R}^{d}) \} $$

and

$$ \mathcal{H}_{p}^{s}([0,T]\times \mathbb{R}^{d}, \langle x \rangle ^{\beta})= \{f:[0,T]\times \mathbb{R}^{d} \to \mathbb{R}^{m} : f \in \mathcal{H}_{p}^{s}(\mathbb{R}^{d+1}, \langle x \rangle ^{\beta}) \}. $$

Let \(\pi _{t}\) denote the density function of \(S_{t}\). We set

$$ \Psi _{\pi}= \{f:[0,T] \times \mathbb{R}^{d} \to \mathbb{R}^{m},~ (t,x) \mapsto \sqrt{\pi _{t}(x)}\psi (t,x) : \psi \in \Psi \}. $$

Let us first look at convergence rates in the case that \(\mathrm {{Var}}[Z(x^{*},\psi ^{*})]\) does not vanish. Based on an integrability condition on the density process \((\pi _{t})\), the rates will be expressed in terms of the degree \(s\) of smoothness of the functions in \(\Psi _{\pi}\) and the dimension \(d\) of their domain. Recalling the process \(Z = (Z(x,\psi ))\) introduced at the beginning of Sect. 3, the following result is an application of Theorem 3.1.

Theorem 4.1

Let \(p=2\), \(\beta \in \mathbb{R}, s \in \mathbb{N}\), \(\delta \in (0,1)\), \(\lambda >0\) and \(d \in \mathbb{N}\). Further, let \(\Psi \) be a set such that \(\Psi _{\pi}\subseteq \mathcal{H}_{2}^{s}([0,T] \times \mathbb{R}^{d}, \langle x \rangle ^{\beta})\) is bounded with respect to the norm

$$ \Vert f \Vert =\sum _{0 \leq |\alpha | \leq s}\sqrt{\int _{[0,T]} \int _{\mathbb{R}^{d}} |D^{\alpha}f(t,x)|^{2}\, dx dt}. $$

In addition, suppose that

$$\begin{aligned} \sqrt{\int _{[0,T]} \int _{\mathbb{R}^{d}} \langle x \rangle ^{\alpha - \beta} \pi _{t}(x) dx dt}< \infty \end{aligned}$$
(4.4)

for some \(\alpha >0\). If \(| Z |\leq b <\infty \) with probability 1 for some \(b\in \mathbb{R}\), then for \(s > (d+ 1)/2\), there exist constants \(\eta _{1}, \eta _{2}, \eta _{3}\) and \(\eta _{4}\), depending on \(\lambda ,b,d,s\) and \(\delta \) as well as on the compact set \(K\) and the function \(\Phi \), such that

1) for \(\alpha >s-(d+1)/2\) and \(s/(s+d+1) \leq 1/2\),

$$\begin{aligned} &P^{\mathbb{N}} \big[\mathcal{Q}_{\lambda} (x_{n},\psi _{n} )-\mathcal{Q}_{ \lambda} (x^{*},\psi ^{*} )\leq \eta _{1}~ n^{-\frac{1}{2} \frac{s}{s+d+1}} \big]\geq 1-\delta ; \end{aligned}$$

2) for \(\alpha >s-(d+1)/2\) and \(s/(s+d+1)> 1/2\),

$$\begin{aligned} &P^{\mathbb{N}} \big[ \mathcal{Q}_{\lambda} (x_{n},\psi _{n} )- \mathcal{Q}_{\lambda} (x^{*},\psi ^{*} )\leq \eta _{2}~ n^{- \frac{1}{4}} \big]\geq 1-\delta ; \end{aligned}$$

3) for \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) \leq 1/2\),

$$\begin{aligned} &P^{\mathbb{N}}\big[\mathcal{Q}_{\lambda} (x_{n},\psi _{n} )-\mathcal{Q}_{ \lambda} (x^{*},\psi ^{*} )\leq \eta _{3}~n^{-\frac{1}{2} \frac{\alpha /(d+1)+1/2}{\alpha /(d+1)+3/2}}\big] \geq 1-\delta ; \end{aligned}$$

4) for \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) > 1/2\),

$$\begin{aligned} &P^{\mathbb{N}}\big[\mathcal{Q}_{\lambda} (x_{n},\psi _{n} )-\mathcal{Q}_{ \lambda} (x^{*},\psi ^{*} )\leq \eta _{4}~n^{-\frac{1}{4}}\big] \geq 1- \delta . \end{aligned}$$

The parameter \(\alpha \) in Theorem 4.1 may be viewed as a degree of integrability for the density process \((\pi _{t})\). The terms \(s/(s + d + 1)\) and \((\alpha /(d+ 1) + 1/2)/(\alpha /(d+ 1) + 3/2)\) occurring in the result are nondecreasing in \(s\) and \(\alpha \), respectively, with

$$ \sup _{s < \alpha + (d+1)/2}\frac{s}{s + d + 1} = \frac{\alpha /(d+1) + 1/2}{\alpha /(d+1) + 3/2} $$

and

$$ \sup _{\alpha < s - (d + 1)/2} \frac{\alpha /(d+1) + 1/2}{\alpha /(d+1) + 3/2} = \frac{s}{s + d + 1}. $$

So Theorem 4.1 tells us that for a fixed degree of integrability, the convergence rates are nondecreasing with respect to the degree of smoothness. However, the second and fourth cases show that in case of a significant degree of smoothness in comparison with the dimension \(d\), there is always a point of saturation where the convergence rates cannot be improved by higher degrees of smoothness. In addition, for a given degree of smoothness, the higher the degree of integrability, the better the convergence rates, with certain points of saturation.

Let us turn to the situation when the assumptions of Theorem 3.3 hold. We may derive from Theorem 3.3 the next result which is qualitatively of the same nature as Theorem 4.1, but with doubled convergence rates.

Theorem 4.2

Let all conditions of Theorem 4.1 be satisfied and in addition suppose that properties (3.2)–(3.4) are valid. For \(s > (d+1)/2\), there exist constants \(\tilde{\eta}_{1},\tilde{\eta}_{2},\tilde{\eta}_{3},\tilde{\eta}_{4}\), depending on \(\lambda ,b,d,s\) and \(\delta \) as well as on the compact set \(K\) and the function \(\Phi \), such that

1) for \(\alpha >s-(d+1)/2\) and \(s/(s+d+1) \leq 1/2\),

$$\begin{aligned} &P^{\mathbb{N}}\big[ \mathcal{Q}_{\lambda} (x_{n},\psi _{n} )-\mathcal{Q}_{ \lambda}(x^{*},\psi ^{*})\leq \tilde{\eta}_{1}~n^{-\frac{s}{s+d+1}} \big]\geq 1-\delta ; \end{aligned}$$

2) for \(\alpha >s-(d+1)/2\) and \(s/(s+d+1)> 1/2\),

$$\begin{aligned} &P^{\mathbb{N}}\big[ \mathcal{Q}_{\lambda} (x_{n},\psi _{n} )-\mathcal{Q}_{ \lambda}(x^{*},\psi ^{*})\leq \tilde{\eta}_{2}~n^{-\frac{1}{2}}\big] \geq 1-\delta ; \end{aligned}$$

3) for \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) \leq 1/2\),

$$\begin{aligned} &P^{\mathbb{N}}\big[\mathcal{Q}_{\lambda} (x_{n},\psi _{n} )-\mathcal{Q}_{ \lambda}(x^{*},\psi ^{*})\leq \tilde{\eta}_{3}~n^{- \frac{\alpha /(d+1)+1/2}{\alpha /(d+1)+3/2}}\big] \geq 1-\delta ; \end{aligned}$$

4) for \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) > 1/2\),

$$\begin{aligned} &P^{\mathbb{N}}\big[\mathcal{Q}_{\lambda} (x_{n},\psi _{n} )-\mathcal{Q}_{ \lambda}(x^{*},\psi ^{*})\leq \tilde{\eta}_{4}~n^{-\frac{1}{2}}\big] \geq 1-\delta . \end{aligned}$$

Remark 4.3

Theorem 4.1 implies that \(\mathcal{Q}_{\lambda}\left (x_{n},\psi _{n}\right )\) converges to \(\mathcal{Q}_{\lambda}\left (x^{*},\psi ^{*}\right )\) at a rate depending on the smoothness of the density \(\pi _{t}(x)\) and its decay for \(|x|\to \infty \). It is well known (see Friedman [21, Theorem 9.8]) that if the diffusion coefficient \(\sigma \) is uniformly elliptic and the coefficients \(\mu \) and \(\sigma \) are infinitely differentiable in \([0,T]\times \mathbb{R}^{d}\) with bounded derivatives of any order, then \(\partial _{t}^{s}\partial ^{r}_{x} \pi _{t}(x)\) exists for all positive integers \(r\) and \(s\). Moreover, it holds for all \(x\in \mathbb{R}^{d}\) and \(t>0\) that

$$\begin{aligned} |\partial _{t}^{s}\partial ^{r}_{x} \pi _{t}(x)|\lesssim \frac{1}{t^{(d+|r|)/2+s}} \exp \left (-c\,\frac{|x-x_{0}|^{2}}{t} \right ) \qquad \mbox{for some}~ c>0. \end{aligned}$$

Here ≲ means that the above inequality holds up to a constant only depending on \(s\) and \(r\). Hence (4.4) holds for an arbitrarily large \(\alpha \geq \beta \) and

$$\begin{aligned} &P^{\mathbb{N}} \big[ \mathcal{Q}_{\lambda} (x_{n},\psi _{n} )- \mathcal{Q}_{\lambda} (x^{*},\psi ^{*} )\leq \eta _{2}~n^{-\frac{1}{4}} \big]\geq 1-\delta \end{aligned}$$

for any norm-bounded class \(\Psi \subseteq \mathcal{H}_{2}^{s}([0,T] \times \mathbb{R}^{d},\langle x \rangle ^{\beta})\) with arbitrary but fixed \(\beta \leq 0\) and \(s>d+1\). Here we refer to the norm introduced in Theorem 4.1.

5 Numerical results

We use the Euler scheme and \(L=200\) discretisation points to approximate the solution of the SDE

$$ dS_{t}= \mu (t,S_{t})dt+\sigma (t,S_{t})\,dW_{t}, \qquad S_{0}=s_{0}. $$

In particular, we discretise the interval \([0,T]\) with

$$ 0=t_{0}< t_{1}< \cdots < t_{L}=T.$$

Then for computational reasons, we smooth our objective function

$$ \widetilde{Z}(x,\psi ) := \sup _{s \in \{t_{0},t_{1},\ldots ,t_{L}\}} \left ( \Phi ^{*}(x + Y_{s}) - x-M_{s}(\psi )\right )$$

using a soft-max type method to get

$$\begin{aligned} \widetilde{Z}_{p}(x,\psi )=p^{-1}\log \bigg(\sum _{i=0}^{L} \exp \Big(p ~\big(\Phi ^{*}(x + Y_{t_{i}}) - x-M_{t_{i}}(\psi )\big)\Big)\bigg), \quad p>0. \end{aligned}$$
(5.1)

Note that for \(p \to \infty \), the pointwise convergence \(\widetilde{Z}_{p} \to \widetilde{Z}\) holds. This follows from the observation that well-known relationships between \(L^{p}\)-norms (see e.g. Aliprantis and Border [1, Lemma 13.1]) yield

$$\begin{aligned} &\lim _{p\to \infty}\bigg(\sum _{i=0}^{L} \exp \big(\Phi ^{*}(x + Y_{t_{i}}) - x-M_{t_{i}}(\psi )- \widetilde{Z}(x,\psi )\big)^{p}\bigg)^{1/p} \\ &= \max _{s\in \{t_{0},\ldots ,t_{L}\}} \exp \big(\Phi ^{*}(x + Y_{s}) - x-M_{s}(\psi )- \widetilde{Z}(x,\psi )\big) = 1. \end{aligned}$$
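The smoothing (5.1) is a standard log-sum-exp construction, and the approximation error is explicit: \(\widetilde{Z} \leq \widetilde{Z}_{p} \leq \widetilde{Z} + \log (L+1)/p\). A minimal numerical check (in Python, assuming SciPy is available; the sample values are made up):

```python
import numpy as np
from scipy.special import logsumexp

vals = np.array([0.3, 1.7, 1.2])   # stands for Phi*(x + Y_{t_i}) - x - M_{t_i}(psi)
for p in (1.0, 10.0, 100.0):
    # (5.1): smoothed maximum; tends to max(vals) = 1.7 from above as p grows
    print(p, logsumexp(p * vals) / p)
```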

For our numerical study, we focus on the optimal stopping problems

$$ \sup _{\tau \in \mathcal {T}}AV@R_{1- \alpha}(Y_{\tau}),\qquad \alpha \in [0,1), $$

where \(AV@R_{1- \alpha}\) denotes the risk measure average value at risk at level \(1-\alpha \) as introduced in Example 2.1. The real-valued martingale

$$ M_{t}(\psi )= \int _{0}^{t} \psi (u,S_{u})\,dW_{u} $$

can be approximated by the sum

$$\begin{aligned} \widetilde{M}_{t}(\psi )&=\sum _{i=0}^{L-1}\sum _{j=1}^{m} \psi _{j} (t_{i},S_{t_{i}} ) (W_{t_{i+1}}^{j}-W_{t_{i}}^{j} ) 1_{\{t_{i+1} \leq t\}}. \end{aligned}$$

For the space \(\Psi \), we take a linear span of trigonometric basis functions and use a gradient-based method to solve the resulting optimisation problem. Next, we present numerical examples of pricing American put and Bermudan max-call options. Some of these examples were discussed for standard optimal stopping in Glasserman [22, Chap. 8] and in Belomestny [6]. Note also that for the stopping problems considered in this section, some examples were presented in Belomestny and Krätschmer [8] albeit with different parameters.
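To make the whole pipeline concrete, here is a minimal self-contained sketch (in Python with NumPy; the Black–Scholes-type coefficients, parameter values and helper names are illustrative choices of ours, not the exact implementation behind the tables below): Euler paths, a trigonometric parametrisation of \(\psi \) in the spirit of Example 5.1 below, the discretised martingale \(\widetilde{M}_{t}(\psi )\), and the smoothed, variance-penalised objective built from (5.1).

```python
import numpy as np

rng = np.random.default_rng(1)
n, L, T, D, alpha = 1000, 50, 1.0, 5, 0.05   # paths, steps, horizon, degree, AV@R_{1-alpha}
t, dt = np.linspace(0.0, T, L + 1), T / L

# Euler scheme for dS = mu(t,S) dt + sigma(t,S) dW with Black-Scholes-type coefficients
dW = rng.normal(scale=np.sqrt(dt), size=(n, L))
S = np.empty((n, L + 1)); S[:, 0] = 100.0
for i in range(L):
    S[:, i + 1] = S[:, i] * (1.0 + 0.05 * dt + 0.2 * dW[:, i])

Y = np.exp(-0.05 * t) * np.maximum(100.0 - S, 0.0)   # discounted put payoff

def basis(i, x):                     # trigonometric features in y_t(x), cf. Example 5.1
    z = np.clip(np.log(x / 100.0) / (2.0 * np.sqrt(T - t[i] + 1e-8)), -2.0, 2.0)
    k = np.arange(D + 1)[:, None]
    return np.concatenate([np.sin(k * z[None, :]), np.cos(k * z[None, :])])

def martingale(coef):                # discretised M_t(psi) with psi = coef . basis
    incr = np.stack([(coef @ basis(i, S[:, i])) * dW[:, i] for i in range(L)], axis=1)
    return np.hstack([np.zeros((n, 1)), np.cumsum(incr, axis=1)])

def objective(x, coef, lam=0.5, p=50.0):
    vals = np.maximum(x + Y, 0.0) / alpha - x - martingale(coef)  # Phi*(x+Y)-x-M
    m = vals.max(axis=1, keepdims=True)                           # stabilised (5.1)
    Zp = m[:, 0] + np.log(np.exp(p * (vals - m)).sum(axis=1)) / p
    return Zp.mean() + lam * Zp.var(ddof=1)

# (x, coef) would now be optimised over K x Psi by any gradient-based routine.
print(objective(0.0, np.zeros(2 * (D + 1))))
```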

Example 5.1

Let \(S_{t}=S_{0} \exp ((r-\delta -\sigma ^{2}/2)t+\sigma W_{t})\) with \(r=0.05\), \(\delta =0.1\), \(\sigma =0.2\) and \(Y_{t}=\exp (-r t)(K_{\mathrm {c},\mathrm {p}}-S_{t})^{+}\), where \(K_{\mathrm {c},\mathrm {p}}\) denotes the strike price. Under these conditions, our algorithm approximates the solution of the optimal stopping problem

$$ \sup _{\tau \in \mathcal {T}} \text{AV@R}_{1-\alpha}(Y_{\tau}). $$

In our implementation, we let \(\Psi \) be a linear space of functions \(\psi :[0,T]\times \mathbb{R}\to \mathbb{R}\) such that

$$ \psi (t,x) \in \hbox{span}\big\{ \xi _{k}\big(y_{t}(x)\big),\zeta _{k} \big(y_{t}(x)\big),~ k=0,\ldots ,D\big\} $$

where

$$ y_{t}(x)=\frac{1}{2\sqrt{T-t}} \log (x/K_{\mathrm {c},\mathrm {p}})$$

and

$$\begin{aligned} \xi _{k}(z)&= \textstyle\begin{cases} 0, \qquad & z < -2, \\ \sin (k z), \quad & \vert z \vert \leq 2, \\ 1,\quad &z>2, \end{cases}\displaystyle \\ \zeta _{k}(z)&= \textstyle\begin{cases} 0, \qquad & z < -2, \\ \cos (k z), \quad & \vert z \vert \leq 2, \\ 1,\quad &z>2. \end{cases}\displaystyle \end{aligned}$$

First we generate \(n= \text{10,000}\) paths to obtain the estimated optimisers \((x_{n},\psi _{n})\). Then we generate 100,000 new paths to test the solution. For \(K_{\mathrm {c},\mathrm {p}}=100\), \(D=10\) and \(\alpha =0.05\), the results are presented in Table 1. It is interesting to see how the upper bounds depend on \(\alpha \). Setting \(S_{0}=100\) and using the same parameter values as above, we obtain Table 2. Here we solve the optimisation problem

$$ E\Big[\sup _{t \in [0,T]} \Big((Y_{t}+x)^{+} -\alpha \big(x+M_{t}(\psi )\big)\Big)\Big]$$

and then divide the result by \(\alpha \). This allows us to increase \(p\) in (5.1) and to obtain better results.

Table 1 Dual upper bounds for different values of \(\lambda \) together with lower bounds in the case of a one-dimensional American put option
Table 2 Dual upper bounds for different values of \(\alpha \) together with lower bounds in the case of a one-dimensional American put option

Example 5.2

Now consider a Bermudan max-call option on two assets. For \(i=1,2\), let

$$ dS_{t}^{i}= (r-\delta )S_{t}^{i}dt+\sigma S_{t}^{i} \,dW_{t}^{i}, \qquad S_{0}^{i}=s_{0}^{i},$$

where \(r,\delta ,\sigma \) are constants. This system of SDEs describes two identically distributed assets, where each underlying yields a dividend rate \(\delta \). At any time \(t \in \{t_{0},\ldots ,t_{I}\}\), the holder of the option may exercise it and receive the payoff

$$ Y_{t}=\exp (-rt)\big(\max (S_{t}^{1},S_{t}^{2})-K_{\mathrm {c},\mathrm {p}}\big)^{+}.$$

In our example, we set \(t_{i}=iT/I\), \(i=0,\ldots ,I\), and choose \(T=3\) as well as \(I=9\). For the linear space \(\Psi _{D}\) of functions \(\psi :[0,T]\times \mathbb{R}^{2} \to \mathbb{R}^{2}\), we consider

$$\begin{aligned} \psi _{1}(t,x) \in \hbox{span}\big\{ &\zeta _{k}\big(y_{t}^{1}(x)\big), \xi _{k}\big(y_{t}^{1}(x)\big),\zeta _{k}\big(y_{t}^{1}(x)\big)1_{\{y_{t}^{1}(x) \leq y_{t}^{2} (x)\}}, \\ &\xi _{k}\big(y_{t}^{1}(x)\big)1_{\{y_{t}^{1} (x) \leq y_{t}^{2} (x) \}}, \\ &\zeta _{k}\big(y_{t}^{1}(x)+y_{t}^{2}(x)\big),\xi _{k}\big(y_{t}^{1}(x)+y_{t}^{2}(x) \big),k=0,\ldots ,D\big\} \end{aligned}$$

and

$$\begin{aligned} \psi _{2}(t,x) \in \hbox{span}\big\{ &\zeta _{k}\big(y_{t}^{2}(x)\big), \xi _{k}\big(y_{t}^{2}(x)\big),\zeta _{k}\big(y_{t}^{2}(x)\big)1_{\{y_{t}^{2} (x) \leq y_{t}^{1} (x)\}}, \\ &\xi _{k}\big(y_{t}^{2}(x)\big)1_{\{y_{t}^{2} (x) \leq y_{t}^{1} (x) \}}, \\ &\zeta _{k}\big(y_{t}^{1}(x)+y_{t}^{2}(x)\big),\xi _{k}\big(y_{t}^{1}(x)+y_{t}^{2}(x) \big),k=0,\ldots ,D\big\} \end{aligned}$$

where \(\xi _{k}\) and \(\zeta _{k}\) are defined in Example 5.1. Now for \(K_{\mathrm {c},\mathrm {p}}=100\), \(r=0.05\), \(\delta =0.1\), \(\alpha =0.05\), \(\sigma =0.2\) and \(D=6\), we obtain Table 3. As in Example 5.1, it is interesting to vary \(\alpha \). By fixing \(S_{0}^{1}=S_{0}^{2}=100\), we get the results presented in Table 4. In order to compare the current approach with the one used in Belomestny and Krätschmer [8, Table 1], we take \(S^{1}_{0}=S_{0}^{2}=90\) and \(\alpha \in \{0.33,0.5,0.67,0.75\}\). The corresponding results are presented in Table 5. The upper bounds are worse than those in [8]. Note that in [8], a nested approach to compute martingales was used.

Table 3 Dual upper bounds for different values of \(\lambda \) together with lower bounds in the case of a Bermudan max-call option on two assets
Table 4 Dual upper bounds for different values of \(\alpha \) together with lower bounds in the case of a Bermudan max-call option on two assets
Table 5 Bounds (with standard deviations) for 2-dimensional Bermudan max-call

Example 5.3

As in the example before, let

$$ dS_{t}^{i}= (r-\delta )S_{t}^{i}dt+\sigma S_{t}^{i} \,dW_{t}^{i}, \qquad S_{0}^{i}=s_{0}^{i}. $$

We define our reward function as

$$ Y_{t}=\exp (-rt)\big(K_{\mathrm {c},\mathrm {p}}-\min (S_{t}^{1},S_{t}^{2})\big)^{+}. $$

For \(I=9\), \(T=0.5\), \(r=0.06\), \(\delta =0\), \(K_{\mathrm {c},\mathrm {p}}=100\), \(\sigma =0.6\) and with the basis functions used in Example 5.2, we get the results presented in Table 6. By varying \(\alpha \), we get the results for \(S_{0}^{1}=S_{0}^{2}=100\) which are presented in Table 7.

Table 6 Dual upper bounds for different values of \(\lambda \) together with lower bounds in the case of a Bermudan min-put option on two assets
Table 7 Dual upper bounds for different values of \(\alpha \) together with lower bounds in the case of a Bermudan min-put option on two assets

In all the above examples, it is important to find a suitable compact subset \(K\) of ℝ in Theorem 2.2. Using the notation of Remark 2.4, this can be reduced to finding a lower estimate for \(a_{\ell}^{1-\alpha}\) and an upper estimate for \(a_{u}^{1-\alpha}\). For this purpose, note that in any of the above examples, the desired estimates may be derived from upper estimates for the quantity \(\sup _{\tau \in \mathcal {T}}E[S_{\tau}] \), where for some \(\mu ,\sigma \in \mathbb{R}\),

$$ dS_{t}= \mu S_{t} dt+\sigma S_{t} \,dW_{t}, \qquad S_{0}=s_{0}.$$

We may invoke the reflection principle for Brownian motion to get

$$\begin{aligned} \sup _{\tau \in \mathcal {T}}E[S_{\tau}] &\leq s_{0}\exp (|\mu - \sigma ^{2}/2| T) E\Big[\exp \Big(\sigma \sup _{t\in [0,T]}W_{t} \Big)\Big] \\ &\leq s_{0}\exp (|\mu - \sigma ^{2}/2| T) \big(1 + 2 E[\exp ( \sigma W_{T} ) ]\big) \\ &\leq s_{0}\exp (|\mu - \sigma ^{2}/2| T) 3 \exp (T \sigma ^{2}/2 ). \end{aligned}$$
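In code, the resulting interval is a one-liner (a minimal sketch in Python; the function names are ours, and the inputs are the crude bound above together with the formulas for \(a^{\alpha}_{\ell}\) and \(a^{\alpha}_{u}\) from Remark 2.4, applied at level \(1-\alpha \)):

```python
import math

def gbm_sup_expectation_bound(s0, mu, sigma, T):
    """Upper bound for sup_tau E[S_tau] from the reflection-principle estimate above."""
    return 3.0 * s0 * math.exp(abs(mu - sigma**2 / 2.0) * T + sigma**2 * T / 2.0)

def avar_interval(sup_EY, y0, level):
    """[a_l, a_u] from Remark 2.4 for rho = AV@R_level (level = 1 - alpha in Sect. 5)."""
    return -sup_EY, ((1.0 - level) * sup_EY - y0) / level

# Illustrative numbers in the spirit of Example 5.2 (r=0.05, delta=0.1, sigma=0.2, T=3):
bound = gbm_sup_expectation_bound(s0=100.0, mu=0.05 - 0.1, sigma=0.2, T=3.0)
print(avar_interval(sup_EY=bound, y0=0.0, level=0.95))
```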

Once we have found a suitable interval \(K := [a_{\ell},a_{u}]\), we proceed in the following way. First we fix a grid \(X=\{\overline{a}_{\ell}=x_{0}< x_{1}<\cdots < x_{J}= \overline{a}_{u} \}\). Then for a fixed \(x\in X\), we use the Longstaff–Schwartz (LS) algorithm to approximate the value

$$ \sup _{\tau \in \mathcal{T}}E\big[\big(f(X_{\tau})+x\big)^{+}/ \alpha -x\big], $$

where \(X\) is the underlying Markov process with values in \(\mathbb{R}^{d}\) and \(f:\mathbb{R}^{d}\to \mathbb{R}\). To this end, we use a time discretisation by fixing a time grid \(0=t_{0}< t_{1}< \cdots <t_{L}=T\) on \([0,T]\). The LS algorithm is now used to obtain estimates \(\widehat{C}_{0}^{x},\ldots ,\widehat{C}^{x}_{L}\) for the corresponding continuation functions based on polynomials of degree 3 and with 100,000 Monte Carlo paths of the process \(X\). After that, we approximate the value of

$$ \sup _{\tau \in \mathcal {T}} \text{AV@R}_{1-\alpha}(Y_{\tau})$$

via

$$\begin{aligned} \inf _{x\in X}\frac{1}{n}\sum _{i=1}^{n}\big((Y^{(i)}_{\tau ^{(i)}(x)}+x)^{+}/ \alpha -x\big) \end{aligned}$$
(5.2)

with \(\tau ^{(i)}(x)=\min \{0\leq \ell \leq L:f(X_{t_{\ell}}^{(i)})\geq \widehat{C}_{\ell}^{x}(X_{t_{\ell}}^{(i)}) \}\). Here \(X^{(i)}_{t_{0}},\ldots ,X^{(i)}_{t_{L}}\) with \(i=1,\ldots ,n\) are \(n\) trajectories of the process \(X\) independent of those used to approximate the continuation values. Note that due to the discretisation in \(x\), we may incur an additional upward bias in the estimate (5.2). On the other hand, the time discretisation introduces a downward bias that can compensate for it. Our numerical experiments suggest that both biases are negligible (for large enough \(J\) and \(L\)) compared to the downward bias due to the error of approximating the underlying continuation functions.
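A minimal sketch of the evaluation step (5.2) (in Python with NumPy; the data layout, the dictionary of fitted continuation functions and the convention of forcing exercise at maturity are ours, and for brevity the fits \(\widehat{C}_{\ell}^{x}\) are evaluated on the values \(f(X_{t_{\ell}})\) rather than on the full state):

```python
import numpy as np

def lower_bound(Y, f_X, C_hat, x_grid, alpha):
    """Evaluate (5.2) on fresh paths, given fitted continuation estimates.

    Y, f_X : arrays of shape (n, L+1) with payoff paths Y_{t_l} and values f(X_{t_l});
    C_hat  : dict mapping x to a list of L+1 callables (the LS fits for that x);
    returns the infimum over the grid in x of the sample means in (5.2).
    """
    n, Lp1 = Y.shape
    best = np.inf
    for x in x_grid:
        exercise = np.stack(
            [f_X[:, l] >= C_hat[x][l](f_X[:, l]) for l in range(Lp1)], axis=1)
        exercise[:, -1] = True               # convention: always stop at maturity
        tau = exercise.argmax(axis=1)        # first exercise date on each path
        vals = np.maximum(Y[np.arange(n), tau] + x, 0.0) / alpha - x
        best = min(best, vals.mean())
    return best
```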

6 Proof of the main results

6.1 Preparations and notations

To prove Theorems 3.1 and 3.3, we need some preparation. Since both proofs start in the same way, this preparation is valid for both of them. Let \(\eta >0\). By \((K \times \Psi )_{\eta}\), we denote the set of centres of a covering of \(K\times \Psi \) by a minimal number of \(\eta \)-balls with respect to the semimetric

$$ d\big((x,\psi ),(x',\psi ')\big)=E[|Z(x,\psi )-Z(x',\psi ')| ].$$

Fix \(n \in \mathbb{N}\) and \(\lambda >0\). By \((x_{n,\eta},\psi _{n,\eta})\), we denote a measurable selector of the set

$$ \operatorname *{arg\,min}_{(x,\psi ) \in (K \times \Psi )_{\eta}}\bigg(\frac{1}{n}\sum _{i=1}^{n} Z^{(i)}(\,\cdot \,,x,\psi ) + \frac{\lambda}{n(n-1)}\sum _{1\leq i < j\leq n} \big(Z^{(i)}(\,\cdot \,,x,\psi )-Z^{(j)} (\,\cdot \,,x,\psi )\big)^{2}\bigg), $$

and by \((x_{\eta}^{*},\psi _{\eta}^{*})\), we denote an element of the set \((K \times \Psi )_{\eta}\) satisfying

$$ d\big((x^{*},\psi ^{*}),(x_{\eta}^{*},\psi _{\eta}^{*})\big)\leq \eta $$
(6.1)

for a solution \((x^{*},\psi ^{*})\) of (3.1). Due to the construction of \((K \times \Psi )_{\eta}\), there always exists such an \((x_{\eta}^{*},\psi _{\eta}^{*})\), but it need not be unique. For \((x,\psi ) \in K \times \Psi \), let

$$\begin{aligned} g_{n}(x,\psi ) &:=\frac{1}{n(n-1)}\sum _{1\leq i< j\leq n}\Big( \big(Z^{(i)}(x, \psi )-Z^{(j)}(x,\psi ) \big)^{2} \\ & \hphantom{:=\frac{1}{n(n-1)}\sum _{1\leq i< j\leq n}\Big(}{} - \big(Z^{(i)}(x^{*},\psi ^{*})-Z^{(j)}(x^{*}, \psi ^{*}) \big)^{2}\Big), \\ h_{n,\lambda}(x,\psi )&:=\frac{1}{n}\sum _{i=1}^{n} \big(Z^{(i)}(x, \psi )-Z^{(i)}(x^{*},\psi ^{*})\big) +\lambda g_{n}(x,\psi ), \end{aligned}$$

as well as

$$\begin{aligned} g(x, \psi )&= \big(Z(x,\psi )-\tilde{Z}(x,\psi )\big)^{2} - \big(Z(x^{*}, \psi ^{*})-\tilde{Z}(x^{*},\psi ^{*})\big)^{2}, \\ h_{\lambda}(x,\psi )&=2\big(Z(x,\psi )-Z(x^{*},\psi ^{*})\big)+ \lambda g(x,\psi ), \end{aligned}$$

where \(\tilde{Z} = (\tilde{Z}(x,\psi ))\) is an independent copy of \(Z\). With the above definitions, we have \(P^{\mathbb{N}}\)-a.s. for \(c\geq 0\) that

$$\begin{aligned} &E[h_{\lambda}(x_{n,\eta},\psi _{n,\eta}) | \mathcal{D}_{n}] \\ &\leq E[h_{\lambda}(x_{n,\eta},\psi _{n,\eta}) | \mathcal{D}_{n}]-(1+c)h_{n, \lambda} (x_{n,\eta},\psi _{n,\eta} )+(1+c)h_{n,\lambda} (x_{\eta}^{*}, \psi _{\eta}^{*} ) \\ &\leq \sup _{(x,\psi ) \in (K \times \Psi )_{\eta}}\big(2E[h_{n, \lambda}(x,\psi )]-(1+c)h_{n,\lambda}(x,\psi )\big)+(1+c)h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*}). \end{aligned}$$

Indeed, the first inequality holds due to

$$ h_{n,\lambda} (x_{n,\eta},\psi _{n,\eta} ) \leq h_{n,\lambda} (x_{\eta}^{*},\psi _{\eta}^{*} ) \qquad P^{\mathbb{N}}\mbox{-a.s.},$$

which follows directly from the definitions of \((x_{n,\eta},\psi _{n,\eta})\) and \(h_{n,\lambda}\). Now we have to analyse

$$ \sup _{(x,\psi ) \in (K \times \Psi )_{\eta}}\big(2E[h_{n,\lambda}(x, \psi )]-(1+c)h_{n,\lambda}(x,\psi )\big)$$

and

$$ (1+c)h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*}).$$

Let us start with the first term. Observe that

$$\begin{aligned} &\sup _{(x,\psi ) \in (K \times \Psi )_{\eta}}\big(2E[h_{n, \lambda}(x,\psi ) ]-(1+c)h_{n,\lambda}(x,\psi )\big) \\ &=\sup _{(x,\psi ) \in (K \times \Psi )_{\eta}}\Big(2D(x,\psi ) +2 \lambda E[g_{n}(x,\psi ) ]-(1+c)\big(\xi _{n}(x,\psi )+\lambda g_{n}(x, \psi )\big)\Big) \\ &\leq \sup _{(x,\psi ) \in (K\times \Psi )_{\eta}}\big(2D(x,\psi )-(1+c) \xi _{n}(x,\psi )\big) \\ &\qquad +\sup _{(x,\psi ) \in (K \times \Psi )_{\eta}}\big(2\lambda E[g_{n}(x,\psi )]-(1+c)\lambda g_{n}(x,\psi )\big), \end{aligned}$$

where

$$\begin{aligned} D(x,\psi )&=E[Z(x,\psi )]-E[Z(x^{*},\psi ^{*})], \\ \xi _{n}(x,\psi )&=\frac{1}{n}\sum _{i=1}^{n}\big(Z^{(i)}(x,\psi )-Z^{(i)}(x^{*}, \psi ^{*})\big). \end{aligned}$$

Note that for all \(n \in \mathbb{N}\),

$$ D(x,\psi )=E\left [\xi _{n}(x,\psi )\right ].$$

At this point, it makes sense to separate the further steps of the proofs of the two theorems. For both proofs, however, we have to analyse the following terms, where the aim is to find upper bounds holding with at least a given probability:

$$\begin{aligned} T_{1}&=\sup _{(x,\psi ) \in (K\times \Psi )_{\eta}} \big(2D(x,\psi )-(1+c) \xi _{n}(x,\psi )\big), \\ \end{aligned}$$
(6.2)
$$\begin{aligned} T_{2}&=\sup _{(x,\psi ) \in (K \times \Psi )_{\eta}}\big(2\lambda E[g_{n}(x,\psi )]-(1+c)\lambda g_{n}(x,\psi )\big), \end{aligned}$$
(6.3)
$$\begin{aligned} T_{3}&=(1+c)h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*}). \end{aligned}$$
(6.4)

6.2 Outline for the proof of Theorem 3.1

The idea is to derive bounds for \(T_{1},T_{2},T_{3}\). To this end, we use concentration inequalities such as Hoeffding's inequality, Bernstein's inequality and a new inequality based on a bounded differences approach.

Let \(c=1\) and fix \(n \in \mathbb{N}\) and \(\eta >0\). We show in Sect. 6.6.1 that

$$\begin{aligned} P^{\mathbb{N}}[T_{1}\geq \epsilon ]&\leq 2\mathcal {N}(K\times \Psi , \eta ) \exp \bigg(\frac{-n\epsilon ^{2}}{8\ b^{2}}\bigg)\qquad \mbox{for}~\epsilon > 0, \end{aligned}$$
(6.5)
$$\begin{aligned} P^{\mathbb{N}}[T_{2}\geq \epsilon ]&\leq 2\mathcal {N}(K\times \Psi , \eta ) \exp \bigg(\frac{-n\epsilon ^{2}}{512~\lambda ^{2}b^{4}}\bigg) \qquad \mbox{for}~\epsilon > 0. \end{aligned}$$
(6.6)

For the analysis of \(T_{3}\), we notice that

$$ T_{3}=2\big(\xi _{n}(x_{\eta}^{*},\psi _{\eta}^{*})-D(x_{\eta}^{*},\psi _{\eta}^{*})+\lambda g_{n}(x_{\eta}^{*},\psi _{\eta}^{*})- \lambda E[g_{n}(x_{\eta}^{*},\psi _{\eta}^{*}) ]+E[h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*}) ]\big)$$

and in Sect. 6.6.1, we obtain for \(\epsilon > 0\) that

$$\begin{aligned} P^{\mathbb{N}}[2\xi _{n}(x_{\eta}^{*},\psi _{\eta}^{*})-2D(x_{\eta}^{*}, \psi _{\eta}^{*})\geq \epsilon ]&\leq 2\exp \bigg( \frac{-n\epsilon ^{2}}{32\ b^{2}}\bigg), \end{aligned}$$
(6.7)
$$\begin{aligned} P^{\mathbb{N}}\big[2\lambda g_{n}(x_{\eta}^{*},\psi _{\eta}^{*})- 2 \lambda E[g_{n}(x_{\eta}^{*},\psi _{\eta}^{*})] \geq \epsilon \big] &\leq 2\exp \bigg( \frac{-n\epsilon ^{2}}{512~b^{4} \lambda ^{2}}\bigg). \end{aligned}$$
(6.8)

With the help of these concentration inequalities, we can derive bounds for \(T_{1},T_{2},T_{3}\) holding with a given probability for a suitable choice of \(\epsilon \). Combining these bounds easily yields a bound for \(T_{1}+T_{2}+T_{3}\) with the given probability. The same bound then holds for \(2\mathcal {Q}_{\lambda}(x_{n},\psi _{n})-2\mathcal {Q}_{\lambda}(x^{*},\psi ^{*})\) because \(2\mathcal {Q}_{\lambda}(x_{n},\psi _{n})-2\mathcal {Q}_{\lambda}(x^{*},\psi ^{*}) \leq T_{1}+T_{2}+T_{3}\) \(P^{\mathbb{N}}\)-a.s.

6.3 Proof of Theorem 3.1

Fix \(n \in \mathbb{N}\) and \(\eta >0\), as well as \(\delta \in (0,1)\). Further, we impose that

$$ \log \mathcal {N}(K\times \Psi ,\eta )\leq n\eta .$$

Then we set

$$\begin{aligned} a_{1}&:=a_{1}(c,b,n,\lambda ,\delta ,\eta ):=\sqrt{8} \, b \bigg( \sqrt{\frac{\log (8/\delta )}{n}}+\sqrt{\eta}\bigg), \\ a_{2}&:=a_{2}(c,b,n,\lambda ,\delta ,\eta ):= 16 \sqrt{2} \, \lambda \, b^{2} \bigg(\sqrt{\frac{\log (8/\delta )}{n}}+\sqrt{\eta}\bigg), \end{aligned}$$

and we derive with the inequalities (6.5) and (6.6) the estimates

$$\begin{aligned} P^{\mathbb{N}}[T_{1} \geq a_{1} ] \leq \delta /4\qquad \mbox{and}\qquad P^{\mathbb{N}}[T_{2}\geq a_{2} ] \leq \delta /4. \end{aligned}$$

Therefore, by elementary calculations, we arrive at

$$ P^{\mathbb{N}}\left [T_{1}+T_{2}\leq a_{1}+a_{2}\right ]\geq 1- \delta /2. $$
(6.9)

Concerning \(T_{3}\), let us set

$$\begin{aligned} a_{3}&:=a_{3}(b,n,\lambda ,\delta ):= 16 \sqrt{2} \lambda b^{2} \sqrt{ \frac{\log (8/\delta )}{n}}, \\ a_{4}&:= a_{4} (b,n,\delta ):= 4 \sqrt{2} b \sqrt{ \frac{\log (8/\delta )}{n}}. \end{aligned}$$

Then we can derive first

$$\begin{aligned} P^{\mathbb{N}}[2\xi _{n}(x_{\eta}^{*},\psi _{\eta}^{*})-2D(x_{\eta}^{*}, \psi _{\eta}^{*})\geq a_{4} ]&\leq \delta /4, \\ P^{\mathbb{N}}\big[2\lambda g_{n}(x_{\eta}^{*},\psi _{\eta}^{*})- 2 \lambda E[g_{n}(x_{\eta}^{*},\psi _{\eta}^{*})] \geq a_{3} \big] & \leq \delta /4. \end{aligned}$$

This leads again with elementary calculations to

$$ P^{\mathbb{N}}\big[T_{3}\leq a_{3}+a_{4}+2E[h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*})]\big]\geq 1-\delta /2. $$
(6.10)

Now we only need an upper estimate of \(E[h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*}) ]\), which we obtain via

$$\begin{aligned} &E[h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*})] \\ & =E[Z(x_{\eta}^{*},\psi _{\eta}^{*})]-E[Z(x^{*},\psi ^{*})] \\ & \quad{} +\frac{\lambda}{2}~E\big[\big(Z(x_{\eta}^{*},\psi _{\eta}^{*}) - \tilde{Z}(x_{\eta}^{*},\psi _{\eta}^{*})\big)^{2} - \big(Z(x^{*}, \psi ^{*}) - \tilde{Z}(x^{*},\psi ^{*})\big)^{2}\big] \\ & \leq E[|Z(x_{\eta}^{*},\psi _{\eta}^{*})-Z(x^{*},\psi ^{*})|] \\ & \quad{} +2b\lambda E[|Z(x_{\eta}^{*},\psi _{\eta}^{*})-\tilde{Z}(x_{\eta}^{*},\psi _{\eta}^{*})-Z(x^{*},\psi ^{*})+\tilde{Z}(x^{*},\psi ^{*})|] \\ & \leq E[ |Z(x_{\eta}^{*},\psi _{\eta}^{*})-Z(x^{*},\psi ^{*}) | ] +4b \lambda E[ |Z(x_{\eta}^{*},\psi _{\eta}^{*})-Z(x^{*},\psi ^{*}) | ] \\ & \leq (1+4 b \lambda )\eta . \end{aligned}$$
(6.11)

Above, the equality follows directly by definition. For the first inequality, we apply the difference-of-squares identity \(x^{2} - y^{2} = (x - y) (x + y)\) with \(x = Z(x_{\eta}^{*},\psi _{\eta}^{*})-\tilde{Z}(x_{\eta}^{*},\psi _{\eta}^{*})\) and \(y = Z(x^{*},\psi ^{*})-\tilde{Z}(x^{*},\psi ^{*})\); the boundedness of \(Z\) and \(\tilde{Z}\) by \(b\) yields \(|x + y|\leq 4b\) and hence the factor \(2 b \lambda \). The second inequality holds because \(Z\) and \(\tilde{Z}\) are identically distributed, so that the triangle inequality gives \(E[|\tilde{Z}(x_{\eta}^{*},\psi _{\eta}^{*})-\tilde{Z}(x^{*},\psi ^{*})|] = E[|Z(x_{\eta}^{*},\psi _{\eta}^{*})-Z(x^{*},\psi ^{*})|]\). The final inequality results from the definition of \((x_{\eta}^{*},\psi _{\eta}^{*})\); see (6.1). So by (6.10) and (6.11), we get for \(T_{3}\) that

$$ P^{\mathbb{N}}\left [T_{3}\leq a_{3}+a_{4}+2(1+ 4 b\lambda )\eta \right ]\geq 1-\delta /2. $$
(6.12)

Now, combining (6.9) and (6.12), we derive

$$ P^{\mathbb{N}}\left [T_{1}+T_{2}+T_{3}\leq a_{1}+a_{2}+a_{3}+a_{4} + 2(1+4 b\lambda )\eta \right ]\geq 1-\delta , $$

and since we have \(P^{\mathbb{N}}\)-a.s. that

$$ 2\big( \mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*}, \psi ^{*})\big) \leq 2\big( \mathcal {Q}_{\lambda}(x_{n,\eta},\psi _{n, \eta})-\mathcal {Q}_{\lambda}(x^{*},\psi ^{*})\big) \leq T_{1}+T_{2}+T_{3}, $$

we finish with

$$ P^{\mathbb{N}}\big[2\big(\mathcal {Q}_{\lambda}(x_{n},\psi _{n})- \mathcal {Q}_{\lambda}(x^{*},\psi ^{*})\big)\leq a_{1}+a_{2}+a_{3}+a_{4}+2(1+4 b \lambda )\eta \big]\geq 1-\delta . $$

Setting \(\eta =\gamma (K\times \Psi ,n)\), the assumption

$$ \log \mathcal {N}(K \times \Psi ,\eta )\leq n\eta $$

is always satisfied, and we have

$$ P^{\mathbb{N}}\big[2\big( \mathcal {Q}_{\lambda}(x_{n},\psi _{n})- \mathcal {Q}_{\lambda}(x^{*},\psi ^{*})\big)\leq A(n,c,\lambda ,\delta ) \big]\geq 1-\delta , $$

where

$$\begin{aligned} A(n,c,\lambda ,\delta )&=b( 4 \sqrt{2} + 16\lambda \sqrt{2} b)\bigg(2\sqrt{ \frac{\log (8/\delta )}{n}}+\sqrt{\gamma (K\times \Psi ,n)}\bigg) \\ &\quad{} +2(1+4 b \lambda )\gamma (K\times \Psi ,n). \end{aligned}$$

Now the statement of Theorem 3.1 follows immediately. □
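To get a quantitative feel for this bound, one may evaluate \(A(n,c,\lambda ,\delta )\) numerically. The following is a minimal sketch (Python); the parameter values and the entropy rate \(\gamma (K\times \Psi ,n)=n^{-1/2}\) are purely illustrative assumptions, not quantities derived in the paper.

```python
import math

def bound_A(n, b, lam, delta, gamma):
    # Evaluates A(n, c, lambda, delta) from the end of the proof of
    # Theorem 3.1; gamma plays the role of gamma(K x Psi, n) and must
    # be supplied by the caller (here it is a hypothetical input).
    root = math.sqrt(math.log(8.0 / delta) / n)
    return (b * (4.0 * math.sqrt(2.0) + 16.0 * lam * math.sqrt(2.0) * b)
            * (2.0 * root + math.sqrt(gamma))
            + 2.0 * (1.0 + 4.0 * b * lam) * gamma)

# illustrative parameters: b = 1, lambda = 0.5, delta = 0.05, and the
# hypothetical entropy rate gamma(K x Psi, n) = n**(-1/2)
for n in (10**2, 10**4, 10**6):
    print(n, bound_A(n, b=1.0, lam=0.5, delta=0.05, gamma=n ** -0.5))
```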

6.4 Outline for the proof of Theorem 3.3

The proof of Theorem 3.3 is similar to the proof of Theorem 3.1, but relies on some different concentration inequalities given below. Let \(c \geq 2\), \(n \in \mathbb{N}\) and \(\eta >0\), as well as \(\epsilon >0\). With \(L\) from (6.18) below, we have

$$\begin{aligned} & P^{\mathbb{N}}\left [T_{1}\geq \epsilon \right ]\leq 2\mathcal{N}(K \times \Psi ,\eta )\exp \bigg(\frac{-n\epsilon}{(1+c)^{2} L}\bigg), \end{aligned}$$
(6.13)
$$\begin{aligned} & P^{\mathbb{N}}\left [T_{2}\geq \epsilon \right ]\leq 2\mathcal{N}(K \times \Psi ,\eta )\exp \bigg( \frac{-n\epsilon}{32 b^{2} (1+c)^{2}\lambda + 8 (1+c)^{2}\lambda} \bigg) \end{aligned}$$
(6.14)

(see Sect. 6.6.2). Moreover, with \(C_{\lambda ,b}\) as in Theorem 3.3, we derive in Sect. 6.6.2 the estimates

$$\begin{aligned} P^{\mathbb{N}}\big[g_{n}(x_{\eta}^{*},\psi _{\eta}^{*})-E[g_{n}(x_{\eta}^{*},\psi _{\eta}^{*})]\geq \eta /\lambda \big] \leq 2\exp \bigg( \frac{-n\eta}{C_{\lambda ,b}}\bigg) \end{aligned}$$
(6.15)

and

$$\begin{aligned} P^{\mathbb{N}}[\xi _{n}(x_{\eta}^{*},\psi _{\eta}^{*})-D(x_{\eta}^{*}, \psi _{\eta}^{*})\geq \kappa ] \leq 2\exp \bigg( \frac{-n\kappa ^{2}}{4~b\eta +\frac{8\ b}{3}\kappa} \bigg)\qquad \mbox{for}~\kappa > 0. \end{aligned}$$
(6.16)

6.5 Proof of Theorem 3.3

In the following, let \(\delta \in (0,1)\), \(c\geq 2\) and let \(n \in \mathbb{N}\) and \(\eta >0\) satisfy the condition

$$ \log \mathcal {N}(K\times \Psi ,\eta )\leq n\eta . $$
(6.17)

Let us introduce

$$ L:=\sup _{ {\scriptstyle (x,\psi )\in K\times \Psi ,\atop\scriptstyle E[Z(x,\psi )] > E[Z(x^{*},\psi ^{*})]}} \frac{2\mathrm {{Var}}[Z(x,\psi )]}{(c-1) (E[Z(x,\psi )]- E[Z(x^{*},\psi ^{*})] )} +\frac{8}{3}b. $$
(6.18)

Lemma 6.1

If (3.2)–(3.4) hold, then \(L <\infty \).

Proof

Let \((\overline{x}_{k},\overline{\psi}_{k})_{k\in \mathbb{N}}\) be any sequence in \(K\times \Psi \) satisfying the strict inequality \(E[Z(\overline{x}_{k},\overline{\psi}_{k})] > E[Z(x^{*}, \psi ^{*})] \) for \(k\in \mathbb{N}\) and

$$ \lim _{k\to \infty} \frac{2\mathrm {{Var}}[Z(\overline{x}_{k},\overline{\psi}_{k})]}{(c-1) (E[Z(\overline{x}_{k},\overline{\psi}_{k})]- E[Z(x^{*},\psi ^{*})] )} +\frac{8}{3}b = L. $$

First of all, the sequence \((\mathrm {{Var}}[Z(\overline{x}_{k},\overline{\psi}_{k})])\) is bounded because the random variables \(Z(x,\psi )\) are \(P^{\mathbb{N}}\)-essentially bounded, uniformly in \((x,\psi )\in K\times \Psi \). Therefore, in order to show the finiteness of \(L\), it suffices to restrict our considerations to the case \(E[Z(\overline{x}_{k},\overline{\psi}_{k})]- E[Z(x^{*}, \psi ^{*})]\to 0\). In this situation, the “well-separated minimum” property (3.3) implies the convergence \(d ((\overline{x}_{k},\overline{\psi}_{k}), (x^{*},\psi ^{*}) )\to 0\). Therefore by the identifiability condition (3.4), we may find some \(\overline{C} > 0\) and \(k_{0}\in \mathbb{N}\) such that

$$ E[Z(\overline{x}_{k},\overline{\psi}_{k})]- E[Z(x^{*}, \psi ^{*})]\geq \overline{C}~ d\big((\overline{x}_{k},\overline{\psi}_{k}), (x^{*},\psi ^{*})\big)\qquad \mbox{for }k\geq k_{0}. $$

Next, in view of (3.2),

$$\begin{aligned} \mathrm {{Var}}[Z(\overline{x}_{k},\overline{\psi}_{k}) ] &= \mathrm {{Var}}[Z( \overline{x}_{k},\overline{\psi}_{k}) - Z(x^{*},\psi ^{*}) ] \\ &\leq E\big[\big(Z(\overline{x}_{k},\overline{\psi}_{k}) - Z(x^{*}, \psi ^{*})\big)^{2}\big] \\ &\leq 2 b~d\big((\overline{x}_{k},\overline{\psi}_{k}), (x^{*},\psi ^{*}) \big) \end{aligned}$$

for \(k\in \mathbb{N}\), and thus

$$\begin{aligned} & \frac{2\mathrm {{Var}}[Z(\overline{x}_{k},\overline{\psi}_{k})]}{(c-1) (E[Z(\overline{x}_{k},\overline{\psi}_{k})]- E[Z(x^{*},\psi ^{*})] )} +\frac{8}{3}b \leq \frac{4 b}{(c - 1)~\overline{C}} + \frac{8}{3}b \qquad \mbox{for }k\geq k_{0}. \end{aligned}$$

This completes the proof due to the choice of the sequence \((\overline{x}_{k},\overline{\psi}_{k})_{k\in \mathbb{N}}\). □

From now on, we assume that the conditions (3.2)–(3.4) are satisfied so that the constant \(L\) defined in (6.18) is finite. In particular, we may introduce

$$\begin{aligned} a_{5}:=a_{5}(c,\delta ,\eta ,L,n):=(1+c)^{2} L\bigg(\eta + \frac{\log (8/\delta )}{n}\bigg). \end{aligned}$$

Furthermore, we set

$$\begin{aligned} a_{6}:=a_{6}(b,c,\delta ,\eta ,\lambda ,n):=\big(32b^{2}(1+c)^{2} \lambda + 8 (1 + c)^{2} \lambda \big)\bigg(\eta + \frac{\log (8/\delta )}{n}\bigg). \end{aligned}$$

Then we may derive from (6.13) and (6.14) along with (6.17) that

$$\begin{aligned} P^{\mathbb{N}}[T_{1}\geq a_{5}]\leq \delta /4\qquad \mbox{and} \qquad P^{\mathbb{N}}[T_{2}\geq a_{6}]\leq \delta /4. \end{aligned}$$
(6.19)

In addition, if

$$ \eta \geq \frac{C_{\lambda ,b}\log (8/\delta )}{n}, $$
(6.20)

we get from (6.15) that

$$ P^{\mathbb{N}}\bigg[g_{n}(x_{\eta}^{*},\psi _{\eta}^{*})-E[g_{n}(x_{\eta}^{*},\psi _{\eta}^{*})]\geq \frac{\eta}{\lambda}\bigg] \leq \delta /4. $$
(6.21)

Setting

$$ \kappa (n)=\left \{ \textstyle\begin{array}{ll} \frac{4 b (\eta +2/3 )\log (8/\delta )}{n} \qquad & \text{for }n \leq 4 b (\eta +2/3 )\log (8/\delta ), \\ \sqrt{\frac{ 4 b(\eta +2/3 )\log (8/\delta )}{n}} \qquad & \text{for }n > 4 b (\eta +2/3 )\log (8/\delta ),\end{array}\displaystyle \right . $$

we may conclude from (6.16) that

$$ P^{\mathbb{N}}[\xi _{n}(x_{\eta}^{*},\psi _{\eta}^{*})-D(x_{\eta}^{*}, \psi _{\eta}^{*})\geq \kappa (n) ] \leq \delta /4. $$
(6.22)
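The two regimes of \(\kappa (n)\) simply reflect which term in the denominator of (6.16) dominates. A minimal sketch (Python) of \(\kappa (n)\), with purely illustrative parameter values:

```python
import math

def kappa(n, b, eta, delta):
    # kappa(n) as defined above: a 1/n regime up to the threshold
    # 4 b (eta + 2/3) log(8/delta), then a square-root regime.
    thr = 4.0 * b * (eta + 2.0 / 3.0) * math.log(8.0 / delta)
    return thr / n if n <= thr else math.sqrt(thr / n)

for n in (10, 10**3, 10**5):  # illustrative values only
    print(n, kappa(n, b=1.0, eta=0.1, delta=0.05))
```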

Next, recall that in view of (6.11), the inequality \(E[h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*})]\leq (1+4 b \lambda )\eta \) holds. Then combining (6.19) and (6.22) with (6.21), we obtain under condition (6.20) that

$$ P^{\mathbb{N}}\big[T_{1}+T_{2}+T_{3}\leq a_{5}+a_{6}+(1+c)\big( \eta +\kappa (n) + (1+4 b\lambda )\eta \big)\big]\geq 1-\delta $$

and therefore

$$\begin{aligned} & P^{\mathbb{N}}\big[2\big(\mathcal {Q}_{\lambda}(x_{n},\psi _{n})- \mathcal {Q}_{\lambda}(x^{*},\psi ^{*})\big)\leq a_{5}+a_{6}+(1+c)\big( \eta +\kappa (n)+ (1+4 b\lambda )\eta \big)\big] \\ &\geq 1-\delta . \end{aligned}$$

Let us set

$$ \eta =\gamma (K \times \Psi ,n)+ \frac{C_{\lambda ,b}\log (8/\delta )}{n}. $$

Then (6.20) and (6.17) are fulfilled. By elementary calculation and the definition of \(\kappa (n)\), we may find for \(n> 4b (\gamma (K \times \Psi ,n) + C_{\lambda ,b}\log (8/\delta )/n +2/3) \log (8/\delta )\) some universal constants \(c_{1},c_{2}>0\) such that

$$ P^{\mathbb{N}} [\mathcal{Q}_{\lambda}(x_{n},\psi _{n})-\mathcal{Q}_{ \lambda}(x^{*},\psi ^{*})\leq c_{1} R_{1}(n,\delta )+c_{2} R_{2}(n, \lambda ,\delta ) ] \geq 1-\delta , $$

where \(R_{1}(n,\delta )\), \(R_{2}(n,\lambda ,\delta )\) are as in (3.6) and (3.7), respectively. The proof is complete. □

6.6 Proofs of the concentration inequalities

Let us at first give an auxiliary result which will turn out to be useful.

Lemma 6.2

Let \(\Lambda (x,\psi )\) be a random variable parametrised by \((x,\psi ) \in K \times \Psi \). If \(\mathcal{N}(K \times \Psi ,\eta )<\infty \), then for every \(z \in \mathbb{R}\), there is a pair \((\overline{x},\overline {\psi }) \in (K\times \Psi )_{\eta}\) (depending on \(z\)) such that

$$ P\Big[\sup _{(x,\psi ) \in (K \times \Psi )_{\eta}} \Lambda (x, \psi ) \geq z\Big]\leq \mathcal{N}(K \times \Psi ,\eta )~P[ \Lambda (\overline{x},\overline {\psi })\geq z]. $$

Proof

Since \((K \times \Psi )_{\eta}\) has finite cardinality \(\mathcal{N}(K \times \Psi ,\eta )\), we have

$$\begin{aligned} P\Big[\sup _{(x,\psi ) \in (K \times \Psi )_{\eta}} \Lambda (x, \psi ) \geq z\Big]&=P\bigg[\bigcup _{(x,\psi ) \in (K \times \Psi )_{\eta}} \{\Lambda (x,\psi ) \geq z\}\bigg] \\ &\leq \sum _{(x,\psi ) \in (K \times \Psi )_{\eta}} P[\Lambda (x, \psi )\geq z] \\ &\leq \mathcal{N}(K \times \Psi ,\eta )P[\Lambda ( \overline{x},\overline {\psi }) \geq z], \end{aligned}$$

where \((\overline{x},\overline {\psi })\in (K\times \Psi )_{\eta}\) is chosen to maximise \(P[\Lambda (x,\psi )\geq z]\) over the finitely many elements of \((K \times \Psi )_{\eta}\). □

6.6.1 Proofs of the concentration inequalities for Theorem 3.1

Let us now prove the concentration inequalities used to prove Theorem 3.1. We start with (6.5).

Proof of (6.5)

Due to Lemma 6.2, there exists \((\overline {x},\overline {\psi }) \in (K\times \Psi)_{\eta}\) such that

$$\begin{aligned} &P^{\mathbb{N}}\Big[\sup _{(x,\psi )\in (K \times \Psi )_{\eta}} \big(2D(x,\psi )-2\xi _{n}(x,\psi )\big)\geq \epsilon \Big] \\ & \leq \mathcal{N}(K\times \Psi ,\eta )~P^{\mathbb{N}}[2D( \overline {x},\overline {\psi })-2\xi _{n}(\overline {x},\overline {\psi })\geq \epsilon ]. \end{aligned}$$

Using Corollary A.3, we derive

$$\begin{aligned} P^{\mathbb{N}}[\vert \xi _{n}(\overline {x},\overline {\psi })-D(\overline {x},\overline {\psi }) \vert \geq \epsilon /2 ]&\leq 2\exp \bigg( \frac{-n \epsilon ^{2}}{8\ b^{2}}\bigg). \end{aligned}$$

So finally, we get

$$\begin{aligned} P^{\mathbb{N}}\Big[\sup _{(x,\psi )\in (K \times \Psi )_{\eta}} \big(2D(x,\psi )-2\xi _{n}(x,\psi )\big)\geq \epsilon \Big] \leq 2 \mathcal{N}(K\times \Psi ,\eta )\exp \bigg( \frac{-n\epsilon ^{2}}{8\ b^{2}}\bigg). \end{aligned}$$

This shows (6.5) since we have chosen \(c = 1\). □

To prove (6.6), we need the following result for preparation.

Theorem 6.3

For \((x,\psi )\in K\times \Psi \) and \(t > 0\), it holds that

$$ P^{\mathbb{N}}\left [\left |g_{n}(x,\psi )-E\left [g_{n}(x, \psi )\right ]\right |>t\right ]\leq 2\exp \bigg( \frac{-nt^{2}}{128 b^{4}}\bigg). $$

Proof

We want to apply the bounded differences inequality (see Boucheron et al. [14, Theorem 6.2]) to the function \(\overline{g}_{n}: ([-b,b]\times [-b,b] )^{n}\rightarrow \mathbb{R}\) defined by

$$ \overline{g}_{n}\big((z_{1},\overline{z}_{1}),\ldots ,(z_{n}, \overline{z}_{n})\big):= \frac{1}{n (n - 1)}\sum _{1\leq i < j\leq n} \big((z_{i} - z_{j})^{2} - (\overline{z}_{i} - \overline{z}_{j})^{2} \big). $$

Therefore it suffices to show that \(\overline{g}_{n}\) satisfies the so-called bounded differences condition (cf. [14]). For this purpose, let \(k\in \{1,\ldots ,n\}\) and consider arbitrary pairs \((z_{1},\overline{z}_{1}),\ldots , (z_{n},\overline{z}_{n})\) and \((z_{k}',\overline{z}_{k}')\) from \([-b,b]^{2}\). With \(\underline{z} := ((z_{1},\overline{z}_{1}),\ldots ,(z_{n}, \overline{z}_{n}) )\) and

$$ \underline{z}' := \big((z_{1},\overline{z}_{1}),\ldots ,(z_{k-1}, \overline{z}_{k-1}),(z_{k}',\overline{z}_{k}'),(z_{k+1},\overline{z}_{k+1}), \ldots ,(z_{n},\overline{z}_{n})\big), $$

we then have

$$\begin{aligned} \big| \overline{g}_{n} (\underline{z}) - \overline{g}_{n} ( \underline{z}')\big| &\leq \frac{1}{n (n - 1)}~\sum _{ {\scriptstyle i=1\atop\scriptstyle i\neq k}}^{n}|(z_{i} - z_{k})^{2} - (z_{i}- z_{k}')^{2}| \\ & \phantom{=:}{} + \frac{1}{n (n - 1)}~\sum _{{\scriptstyle i=1\atop\scriptstyle i\neq k}}^{n}|( \overline{z}_{i} - \overline{z}_{k})^{2} - (\overline{z}_{i}- \overline{z}_{k}')^{2}| \\ &\leq \frac{1}{n(n-1)} \big(8(n-1)b^{2}+8(n-1)b^{2} \big) = \frac{16 b^{2}}{n}. \end{aligned}$$

So obviously \(\overline{g}_{n}\) satisfies the bounded differences condition with \(c_{k}:=16 b^{2}/n\) for \(k \in \{1,\ldots ,n\}\), and

$$ v:=\frac{1}{4}\sum _{k=1}^{n} c_{k}^{2}=\frac{64 b^{4}}{n}.$$

Now [14, Theorem 6.2] provides the estimate

$$\begin{aligned} P^{\mathbb{N}}\big[\big|g_{n}(x,\psi )-E[g_{n}(x,\psi )]\big|>t \big]&\leq 2\exp \bigg(\frac{-t^{2}}{2v}\bigg) =2\exp \bigg( \frac{-nt^{2}}{128 b^{4}}\bigg). \end{aligned}$$

 □
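As a numerical sanity check (not part of the proof), the bounded differences constant \(16 b^{2}/n\) can be probed empirically by perturbing single coordinates at random; a minimal sketch (Python), assuming nothing beyond the definition of \(\overline{g}_{n}\):

```python
import numpy as np

def g_bar(z, zbar):
    # g_bar_n from the proof of Theorem 6.3; the sum over i < j equals
    # half the full symmetric sum, whose diagonal vanishes.
    n = len(z)
    dz = (z[:, None] - z[None, :]) ** 2 - (zbar[:, None] - zbar[None, :]) ** 2
    return dz.sum() / (2.0 * n * (n - 1))

rng = np.random.default_rng(0)
b, n = 1.0, 50
z, zbar = rng.uniform(-b, b, n), rng.uniform(-b, b, n)
worst = 0.0
for _ in range(1000):
    k = rng.integers(n)               # coordinate pair to perturb
    z2, zbar2 = z.copy(), zbar.copy()
    z2[k], zbar2[k] = rng.uniform(-b, b, 2)
    worst = max(worst, abs(g_bar(z, zbar) - g_bar(z2, zbar2)))
print(worst, "<=", 16.0 * b**2 / n)   # bounded differences constant c_k
```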

We are now ready to prove (6.6).

Proof of (6.6)

First of all, by Lemma 6.2,

$$\begin{aligned} &P^{\mathbb{N}}\Big[\sup _{(x,\psi ) \in (K\times \Psi )_{\eta}} \big(2 \lambda E[g_{n}(x,\psi )]-2\lambda g_{n}(x,\psi )\big) \geq t\Big] \\ &\leq \mathcal{N}(K\times \Psi ,\eta )P^{\mathbb{N}}\big[E[g_{n}( \overline {x},\overline {\psi })]-g_{n}(\overline {x},\overline {\psi }) \geq t/(2 \lambda )\big] \end{aligned}$$

for some \((\overline {x},\overline {\psi })\in (K\times \Psi )_{\eta}\). Then the inequality follows immediately from Theorem 6.3. □

Finally, (6.7) may be proved by an application of Corollary A.3, whereas (6.8) follows from Theorem 6.3.

6.6.2 Proofs of the concentration inequalities for Theorem 3.3

Let us now prove the inequalities used in the proof of Theorem 3.3, working under the additional assumption that \(\mathrm {{Var}}[Z(x^{*},\psi ^{*})]=0\). We first give a lemma, recalling the semimetric \(d\) on \(K\times \Psi \) introduced in Sect. 6.1.

Lemma 6.4

Under the condition (3.2), we have

$$ E[g^{2}(x,\psi ) ]\leq 4b^{2}E\left [|g(x,\psi )|\right ] \leq 32 b^{3} d\big((x,\psi ),(x^{*},\psi ^{*})\big)\quad \textit{for}~(x, \psi )\in K\times \Psi . $$

Proof

Assumption (3.2) means that \(Z(x^{*},\psi ^{*})\) and \(\tilde{Z}(x^{*},\psi ^{*})\) coincide \(P\)-a.s. Hence

$$\begin{aligned} E[g^{2}(x,\psi ) ] &=E\Big[\Big(\big(Z(x,\psi )-\tilde{Z}(x, \psi )\big)^{2}-\big(Z(x^{*},\psi ^{*})-\tilde{Z}(x^{*},\psi ^{*}) \big)^{2}\Big)^{2}\Big] \\ &=E\big[\big(Z(x,\psi )-\tilde{Z}(x,\psi )\big)^{4}\big] \\ &\leq 4b^{2} E\big[\big(Z(x,\psi )-\tilde{Z}(x,\psi )\big)^{2} \big] =4 b^{2} E[g(x,\psi )]. \end{aligned}$$

Since \(Z\) and \(\tilde{Z}\) are bounded by the constant \(b\) and are identically distributed, using \(x^{2} - y^{2} = (x - y) (x + y)\) along with the triangle inequality yields

$$\begin{aligned} E\left [g(x,\psi )\right ] &\leq 4 b~E[ |Z(x,\psi )- Z(x^{*}, \psi ^{*}) + \tilde{Z}(x^{*},\psi ^{*})-\tilde{Z}(x,\psi ) | ] \\ &\leq 8 b~E[ |Z(x,\psi )- Z(x^{*},\psi ^{*}) | ]. \end{aligned}$$

This completes the proof. □

The following auxiliary result is a useful consequence of Lemma 6.4.

Lemma 6.5

If (3.2) is satisfied, then for \((x,\psi )\in K\times \Psi \) and \(\epsilon > 0\),

$$\begin{aligned} P^{\mathbb{N}}\big[ |g_{n}(x,\psi ) - E[g_{n}(x,\psi )] | > \epsilon \big] \leq 2\exp \bigg( \frac{- n \epsilon ^{2}}{8 b^{2} E[g(x,\psi )] + 4 \epsilon /3} \bigg). \end{aligned}$$

Proof

Let \((x,\psi )\in K\times \Psi \) and \(\epsilon > 0\). Condition (3.2) implies that

$$ 2 g_{n}(x,\psi ) = \frac{1}{n (n-1)}~\sum _{ {\scriptstyle i,j=1\atop\scriptstyle i\neq j}}^{n} \big(Z^{(i)}(x,\psi ) - Z^{(j)}(x, \psi ) \big)^{2}. $$

In particular, \(2 g_{n}(x,\psi )\) is a so-called U-statistic with kernel \(q:\mathbb{R}^{2}\rightarrow \mathbb{R}\) defined by \(q(s,t) = (s - t)^{2}\). Hence we may draw on a Bernstein inequality for U-statistics (see e.g. Clémençon et al. [16, Appendix A]) to conclude that

$$\begin{aligned} &P^{\mathbb{N}}\big[ |g_{n}(x,\psi ) - E[g_{n}(x,\psi )] | > \epsilon \big] \\ &\leq 2\exp \bigg( \frac{- \lfloor n/2\rfloor (2\epsilon )^{2}}{2 \mathrm {{Var}}[q (Z(x,\psi ), \tilde{Z}(x,\psi ) ) ] + 2 (2\epsilon )/3} \bigg) \\ &\leq 2\exp \bigg( \frac{- n \epsilon ^{2}}{2 E[q (Z(x,\psi ), \tilde{Z}(x,\psi ) )^{2} ] + 4\epsilon /3} \bigg), \end{aligned}$$

where \(\lfloor n/2\rfloor \) denotes the integer part of \(n/2\); the second inequality uses \(4\lfloor n/2\rfloor \geq n\) for \(n\geq 2\) together with \(\mathrm {{Var}}[q (Z(x,\psi ), \tilde{Z}(x,\psi ) ) ]\leq E[q (Z(x,\psi ), \tilde{Z}(x,\psi ) )^{2} ]\). By using (3.2) again, we obtain

$$ E\big[q \big(Z(x,\psi ), \tilde{Z}(x,\psi ) \big)^{2} \big] = E[g^{2}(x, \psi )] $$

(see e.g. the proof of Lemma 6.4). Then the statement of Lemma 6.5 follows immediately from Lemma 6.4. □

Now we are ready to verify the concentration inequalities. Let us start with (6.13), recalling that \(L\) as defined in (6.18) is finite by Lemma 6.1.

Proof of (6.13)

Note first that in view of (3.2),

$$ P^{\mathbb{N}}[2D(x^{*},\psi ^{*})-(1+c)\xi _{n}(x^{*},\psi ^{*}) \geq \epsilon ] = 0. $$

Hence we may assume without loss of generality that \((K \times \Psi )_{\eta}\setminus \{(x^{*},\psi ^{*})\} \neq \emptyset \). In view of Lemma 6.2, there is some \((\overline{x},\overline{\psi})\in (K\times \Psi )_{\eta}\) such that

$$\begin{aligned} &P^{\mathbb{N}}\Big[\sup _{(x,\psi ) \in (K \times \Psi )_{\eta}} \big(2D(x,\psi )-(1+c)\xi _{n}(x,\psi )\big)\geq \epsilon \Big] \\ &\leq \mathcal{N}(K \times \Psi ,\eta )~P^{\mathbb{N}}[2D( \overline{x},\overline {\psi })-(1+c)\xi _{n}(\overline{x},\overline {\psi })\geq \epsilon ]. \end{aligned}$$

If \((\overline{x},\overline{\psi}) = (x^{*},\psi ^{*})\), then (6.13) is shown. So let \((\overline{x},\overline{\psi})\) be different from \((x^{*},\psi ^{*})\). Then \(D(\overline{x},\overline{\psi})\neq 0\) in view of (3.3). This allows us to use the Bernstein inequality (see Corollary A.4 below). Thanks to \(\mathrm {{Var}}[Z(x^{*},\psi ^{*}) ] = 0\), we then arrive at

$$\begin{aligned} &P^{\mathbb{N}}\bigg[D(\overline{x},\overline {\psi })-\xi _{n}(\overline{x}, \overline {\psi }) \geq \frac{\epsilon +(c-1)D(\overline{x},\overline {\psi })}{1+c}\bigg] \\ &\leq 2 \exp \bigg( \frac{-n(\epsilon +(c-1)D(\overline{x},\overline {\psi }))^{2}}{(1+c)^{2} (2\mathrm {{Var}}[Z(\overline {x},\overline {\psi })]+\frac{8}{3}b\frac{\epsilon +(c-1)D(\overline{x},\overline {\psi })}{1+c} )} \bigg). \end{aligned}$$

Since \(L\) satisfies for all \((\overline {x},\overline {\psi }) \in (K \times \Psi )_{\eta}\) with \(D(\overline {x},\overline {\psi })\neq 0\) the inequality

$$ 2\mathrm {{Var}}[Z(\overline {x},\overline {\psi })]+\frac{8}{3}b\big(\epsilon +(c-1)D( \overline{x},\overline {\psi })\big)\leq L \big(\epsilon +(c-1)D(\overline{x}, \overline {\psi })\big), $$

we get

$$\begin{aligned} &\exp \bigg( \frac{-n(\epsilon +(c-1)D(\overline{x},\overline {\psi }))^{2}}{(1+c)^{2} (2\mathrm {{Var}}[Z(\overline {x},\overline {\psi })]+\frac{8}{3}b\frac{\epsilon +(c-1)D(\overline{x},\overline {\psi })}{1+c} )} \bigg) \\ &\leq \exp \bigg( \frac{-n (\epsilon +(c-1)D(\overline{x},\overline {\psi }))}{(1+c)^{2} L}\bigg) \leq \exp \bigg(\frac{-n\epsilon}{(1+c)^{2} L}\bigg). \end{aligned}$$

Then (6.13) may be derived immediately. □

Let us turn now to (6.14).

Proof of (6.14)

For \(c \geq 2\), we may select by Lemma 6.2 some \((\overline {x},\overline {\psi })\in ( K\times \Psi )_{\eta}\) with

$$\begin{aligned} &P^{\mathbb{N}}\Big[\sup _{(x,\psi ) \in (K \times \Psi )_{\eta}} \big(2\lambda E[g_{n}(x,\psi )]-(1+c)\lambda g_{n}(x,\psi ) \big) \geq t\Big] \\ & \leq \mathcal{N}(K \times \Psi ,\eta )~ P^{\mathbb{N}}\bigg[ E[g_{n}(\overline{x},\overline {\psi })]-g_{n}(\overline{x},\overline {\psi })\geq \frac{t+\lambda (c-1) E[g_{n}(\overline{x},\overline {\psi })]}{\lambda (1+c)} \bigg] \\ &\leq \mathcal{N}(K \times \Psi ,\eta )~ P^{\mathbb{N}}\bigg[ E[g_{n}(\overline{x},\overline {\psi })]-g_{n}(\overline{x},\overline {\psi })\geq \frac{t+\lambda (c-1) E[g(\overline{x},\overline {\psi })]}{2 \lambda (1+c)} \bigg]. \end{aligned}$$

Now with \(\epsilon := \big(t+\lambda (c-1) E[g(\overline{x},\overline {\psi })] \big)/\big(2 \lambda (1+c)\big)\), we may invoke Lemma 6.5 to observe that

$$\begin{aligned} &P^{\mathbb{N}}\bigg[E[g_{n}(\overline{x},\overline {\psi })]-g_{n}( \overline{x},\overline {\psi })\geq \frac{t+\lambda (c-1) E[g(\overline{x},\overline {\psi })]}{2 \lambda (1+c)} \bigg] \\ &\leq 2 \exp \bigg( \frac{-n (t + \lambda (c-1) E[g(\overline {x},\overline {\psi })] )^{2}}{32 b^{2}\lambda ^{2}(1 + c)^{2}E[g(\overline {x},\overline {\psi })] + 8 \lambda (1 + c) (t + \lambda (c-1) E[g(\overline {x},\overline {\psi })] )/3} \bigg). \end{aligned}$$

By (3.2), the expectation \(E[g(\overline {x},\overline {\psi })]\) is nonnegative. Since in addition \(c \geq 2\), we may conclude that

$$\begin{aligned} & \frac{-n (t + \lambda (c-1) E[g(\overline {x},\overline {\psi })] )^{2}}{32 b^{2}\lambda ^{2}(1 + c)^{2}E[g(\overline {x},\overline {\psi })] + 8 \lambda (1 + c) (t + \lambda (c-1) E[g(\overline {x},\overline {\psi })] )/3} \\ &\leq \frac{-n (t + \lambda (c-1) E[g(\overline {x},\overline {\psi })] )}{32 b^{2}\lambda (1 + c)^{2} + 8 \lambda (1 + c)/3} \\ &\leq \frac{-n t}{8 \lambda (1 + c)^{2} (4 b^{2} + 1)}. \end{aligned}$$

This completes the proof. □

Concerning (6.15) and (6.16), note first that \(d ((x^{*}_{\eta},\psi ^{*}_{\eta}),(x^{*},\psi ^{*}) ) \leq \eta \) holds due to (6.1). In particular, \(E[g(x^{*}_{\eta},\psi ^{*}_{\eta})]\leq 8 b\eta \) by Lemma 6.4. Then (6.15) follows easily from Lemma 6.5. Moreover,

$$\begin{aligned} \mathrm {{Var}}[ Z(x^{*}_{\eta},\psi ^{*}_{\eta}) - Z(x^{*},\psi ^{*}) ] &\leq E\big[\big(Z(x^{*}_{\eta},\psi ^{*}_{\eta}) - Z(x^{*},\psi ^{*}) \big)^{2}\big] \\ &\leq 2 b d\big((x^{*}_{\eta},\psi ^{*}_{\eta}),(x^{*},\psi ^{*}) \big) \\ &\leq 2 b \eta . \end{aligned}$$

Thus (6.16) may be derived directly from the Bernstein inequality (see Corollary A.4). Hence we have shown all concentration inequalities necessary for our proof of Theorem 3.3.

7 Proof of Remark 3.4

It is easy to check that for a constant \(k\) not depending on \(n\) or on \(\gamma (K \times \Psi ,n)\), we have

$$\begin{aligned} \frac{R_{1}+R_{2}}{R_{0}} &\leq \frac{\gamma (K \times \Psi ,n)}{k\sqrt{\gamma (K \times \Psi ,n)}}+ \frac{\log (8/\delta )}{k n \sqrt{\frac{\log (8/\delta )}{n}}} \\ & \phantom{=:}{} +\sqrt{ \frac{b(\gamma (K \times \Psi ,n)+1+C_{\lambda ,b}~\log (8/\delta )/n)\log (8/\delta )}{n k^{2} \gamma (K \times \Psi ,n)}}. \end{aligned}$$

With the assumption that \(\lim _{n \to \infty} \gamma (K\times \Psi ,n)=0\), we get

$$\begin{aligned} &\lim _{n \to \infty}\frac{R_{1}+R_{2}}{R_{0}} \\ & \leq \lim _{n \to \infty}\sqrt{\left (\frac{b }{k^{2}n} + \frac{b}{k^{2}n \gamma (K \times \Psi ,n)}+ \frac{b ~\log (8/\delta )~ C_{\lambda ,b}}{k^{2}n^{2}\gamma (K \times \Psi ,n)} \right )\log (8/\delta )}. \end{aligned}$$

Then by \(\lim _{n \to \infty} n\gamma (K \times \Psi ,n) = \infty \), we end up with

$$ \lim _{n \to \infty}\sqrt{\left (\frac{b}{k^{2}n} + \frac{b}{k^{2}n \gamma (K \times \Psi ,n)}+ \frac{b~\log (8/\delta )~ C_{\lambda ,b}}{k^{2}n^{2}\gamma (K \times \Psi ,n)} \right )\log (8/\delta )} = 0. $$

 □

8 Proofs of Theorems 4.1 and 4.2

Let the assumptions of Theorem 4.1 be fulfilled, retaining the notation from its formulation. To prove Theorems 4.1 and 4.2, we need some preparations. These mainly concern estimates of different semimetrics, but they may be of independent interest.

8.1 Preparations and notations

Firstly, we endow the space \(K \times \Psi \) with the semimetric

$$ d:(K\times \Psi ) \times (K \times \Psi ) \to [0,\infty ),~ \big((x, \psi ),(x',\psi ')\big) \mapsto E[|Z(x,\psi )-Z(x',\psi ')| ]. $$

This is well defined because \(Z(x,\psi )\) is assumed to be essentially bounded uniformly in \((x,\psi ) \in K\times \Psi \). Secondly, by assumption, we may equip the set \(\Psi _{\pi}\) with the \(L^{2}\)-metric \(d_{\Psi _{\pi}}\) defined by

$$\begin{aligned} d_{\Psi _{\pi}}(f_{1},f_{2})&:=\bigg(\int _{0}^{T}\int _{\mathbb{R}^{d}} | f_{1}(t,x)-f_{2}(t,x)|^{2} ~dx~dt\bigg)^{1/2}. \end{aligned}$$

Next, we want to find a suitable semimetric on the space \(\Psi \). It is based on the following observation.

Lemma 8.1

There exists some \(C_{1} > 0\) such that for \(\psi _{1},\psi _{2}\in \Psi \), the inequality

$$ E\Big[\sup _{t\in [0,T]} |M_{t}(\psi _{1})-M_{t}(\psi _{2})| \Big]\leq C_{1} d_{\Psi _{\pi}}(f_{\psi _{1}},f_{\psi _{2}}) $$
(8.1)

holds, where \(f_{\psi _{i}}(t,x) = \psi _{i}(t,x)\sqrt{\pi _{t}(x)}\) for \(i=1,2\).

Proof

By the Burkholder–Davis–Gundy inequality (for \(p=1\)), we may find some \(C_{1} > 0\) such that for \(\psi _{1}, \psi _{2}\in \Psi \), we have

$$\begin{aligned} E\Big[\sup _{t\in [0,T]} |M_{t}(\psi _{1})-M_{t}(\psi _{2})| \Big] &=E\bigg[\sup _{t\in [0,T]} \bigg|\int _{0}^{t}(\psi _{1}- \psi _{2})(u,S_{u})dW_{u}\bigg|\bigg] \\ &\leq C_{1} E\bigg[\bigg(\int _{0}^{T} |\psi _{1}-\psi _{2} |^{2}(u,S_{u})du \bigg)^{1/2}\bigg]. \end{aligned}$$

Invoking Jensen’s inequality, we end up with

$$\begin{aligned} &E\bigg[\bigg(\int _{0}^{T} \! |\psi _{1}-\psi _{2} |^{2}(u,S_{u})du \bigg)^{1/2}\bigg] \\ &\leq \bigg(E\bigg[\int _{0}^{T} \!| \psi _{1}-\psi _{2}|^{2}(u,S_{u})du \bigg]\bigg)^{1/2} \\ &= \bigg(\int _{0}^{T} \!\!\int _{\mathbb{R}^{d}}|\psi _{1}-\psi _{2}|^{2}(u,x) \pi _{u}(x)~dx~du\bigg)^{1/2}\!. \end{aligned}$$

This completes the proof. □

Lemma 8.1 allows us to introduce the mapping

$$ d_{\Psi}: \Psi \times \Psi \to [0,\infty ), \quad (\psi ,\psi ') \mapsto E\Big[\sup _{u \in [0,T]} |M_{u}(\psi )-M_{u}(\psi ')| \Big]. $$

Obviously, it satisfies the properties of a semimetric. The minimal number of open \(d_{\Psi}\)-balls of radius \(r > 0\) to cover \(\Psi \) is denoted by \(\mathcal{N}(\Psi ,r)\), where \(\mathcal{N}(\Psi ,r) := \infty \) if no finite cover is available.

Let us introduce the mappings

$$\begin{aligned} & \rho _{1}:(K\times \Psi )^{2} \to [0,\infty ),\quad \big((x,\psi ),(x', \psi ')\big) \mapsto d_{\Psi}(\psi ,\psi '), \\ & \rho _{2}:(K\times \Psi )^{2} \to [0,\infty ),\quad \big((x,\psi ),(x', \psi ')\big) \mapsto |x-x'|, \end{aligned}$$

which are obviously alternative semimetrics on \(K\times \Psi \). In the next step, we want to find an upper estimate of the semimetric \(d\) in terms of the semimetrics \(\rho _{1}\) and \(\rho _{2}\).

Theorem 8.2

There exists a constant \(C>1\) such that for \((x,\psi ),(x',\psi ')\in K\times \Psi \),

$$\begin{aligned} d\big((x,\psi ),(x',\psi ')\big) \leq C \rho _{2}\big((x,\psi ),(x', \psi ')\big)+\rho _{1}\big((x,\psi ),(x',\psi ')\big). \end{aligned}$$

Proof

Set \(\overline{x} := (\max K)^{+} + 1\). The proof is based on a representation for convex functions. We use that for any \(x,x_{0} \in \mathbb{R}\) with \(x_{0} < x\), we have

$$ \Phi ^{*}(x)=\Phi ^{*}(x_{0})+\int _{x_{0}}^{x}\Phi ^{*'}_{+}(s)ds,$$

where \(\Phi ^{*'}_{+}\) denotes the right derivative of \(\Phi ^{*}\). Since \(\Phi ^{*}\) and its right derivative are both nondecreasing, we may observe for \(x,x'\in K\) and \(\eta \geq 0\) that

$$\begin{aligned} |\Phi ^{*}(x + \eta ) - \Phi ^{*}(x' + \eta )| &= \int _{x\wedge x' + \eta}^{x\vee x' + \eta}\Phi ^{*'}_{+}(s)~ds \\ &\leq |x - x'|\Phi ^{*'}_{+}(\overline{x} + \eta ) \\ &\leq \frac{|x - x'|}{\overline{x}}\big(\Phi ^{*}(2\overline{x} + \eta ) - \Phi ^{*}(\overline{x} + \eta )\big) \\ &\leq \Phi ^{*}(2\overline{x} + \eta ) |x - x'|, \end{aligned}$$

where the last step additionally uses that \(\overline{x}\geq 1\) and that \(\Phi ^{*}\) is nonnegative on \([0,\infty )\). As a consequence, we may conclude by nonnegativity of \((Y_{t})\) that

$$ \sup _{t\in [0,T]}|\Phi ^{*}(x + Y_{t}) - \Phi ^{*}(x' + Y_{t})|\leq \sup _{t\in [0,T]}\Phi ^{*}(2\overline{x} + Y_{t}) |x - x'|\quad \mbox{for}~x, x'\in K. $$
(8.2)

Since \(|Z|\leq b\) \(P\)-a.s., we further obtain

$$\begin{aligned} |\Phi ^{*}(2\overline{x} + Y_{t})| &= \big(\Phi ^{*}(2\overline{x} + Y_{t}) - 2 \overline{x} - M_{t}(\psi )\big) + 2\overline{x} + M_{t}(\psi ) \\ &\leq b + 2\overline{x} + M_{t}(\psi ) \end{aligned}$$
(8.3)

for every \(\psi \in \Psi \) and any \(t\in [0,T]\). Each martingale \(M(\psi )\) is continuous and square-integrable on \([0,T]\) so that, by Doob’s \(L^{2}\)-inequality,

$$\begin{aligned} C_{\psi} := E\Big[\sup _{t\in [0,T]}|M_{t}(\psi )|\Big]\leq \Big(E\Big[\sup _{t\in [0,T]}|M_{t}(\psi )|^{2}\Big]\Big)^{1/2} < \infty \qquad \mbox{for}~\psi \in \Psi . \end{aligned}$$
(8.4)

Now the proof is straightforward: We fix any \(\overline{\psi}\in \Psi \). Combining (8.2) with (8.3) and (8.4), we end up with

$$\begin{aligned} d\big((x,\psi ),(x',\psi ')\big) &=E[| Z(x,\psi )-Z(x',\psi ')|] \\ &\leq |x-x'| + E\Big[ \sup _{u \in [0,T]} | M_{u}(\psi ) - M_{u}( \psi ')| \Big] \\ & \phantom{=:}{} +E\Big[\sup _{u \in [0,T]} |\Phi ^{*}(x + Y_{u}) - \Phi ^{*}(x'+ Y_{u})|\Big] \\ &\leq (1 + b + 2\overline{x} + C_{\overline{\psi}}) |x-x'| + d_{\Psi}( \psi ,\psi ') \end{aligned}$$

for \(x,x'\in K\) and \(\psi ,\psi '\in \Psi \). The proof is complete. □

Let us introduce some further notation. For any semimetric \(\overline{d}\) on \(K\times \Psi \), we denote by \(\mathcal{N}(K\times \Psi ,r,\overline{d})\) the minimal number of open \(\overline{d}\)-balls of radius \(r > 0\) to cover \(K\times \Psi \). Here we set \(\mathcal{N}(K\times \Psi ,r,\overline{d}) := \infty \) if no finite cover is available. These covering numbers induce the numbers

$$ \gamma (K\times \Psi ,n,\overline{d}) := \inf \{r > 0 : \log \mathcal{N}(K\times \Psi ,r,\overline{d})\leq n r\}, \qquad n\in \mathbb{N}. $$
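In practice, \(\gamma (K\times \Psi ,n,\overline{d})\) can be evaluated numerically once an upper bound for the covering numbers is available, since \(r\mapsto \log \mathcal{N}(K\times \Psi ,r,\overline{d})\) is nonincreasing while \(r\mapsto nr\) is increasing, so the admissible set is an interval. A minimal sketch (Python), assuming a hypothetical entropy bound \(\log \mathcal{N}(r) = (h/r)^{\ell}\):

```python
import math

def gamma_n(log_cov, n, lo=1e-12, hi=1e6, iters=200):
    # Smallest r with log_cov(r) <= n * r, found by geometric bisection;
    # assumes log_cov is nonincreasing, so the admissible set has the
    # form [gamma, infinity).
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if log_cov(mid) <= n * mid:
            hi = mid
        else:
            lo = mid
    return hi

# hypothetical entropy bound log N(r) = (h / r)**l, for which
# gamma(n) behaves like (h**l / n)**(1 / (l + 1))
h, l = 2.0, 1.5
print(gamma_n(lambda r: (h / r) ** l, n=10**4))
```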

Proposition 8.3

Assume that \(\gamma (\Psi ,n) := \inf \{r > 0 : \log \mathcal{N}(\Psi ,r)\leq n r \} \to 0\) as \(n \to \infty \). Then for \(C > 1\), it holds that

$$ \lim _{n\to \infty}\gamma (K \times \Psi ,n,\bar{d}^{C}_{K\times \Psi}) = 0, $$

where \(\bar{d}^{C}_{K\times \Psi}\) denotes the semimetric on \(K\times \Psi \) defined by \(\bar{d}^{C}_{K\times \Psi} := \rho _{1} + C\rho _{2}\).

Proof

We use the notation \(\mathcal{N}(K,r)\) for the minimal number of open intervals of radius \(r > 0\) to cover \(K\). By Buldygin and Kozachenko [15, Lemma 3.2.1], we have

$$\begin{aligned} \log \mathcal{N}(K\times \Psi ,r,\bar{d}^{C}_{K\times \Psi}) & \leq \log \mathcal{N}(K\times \Psi ,r/4,\rho _{1}) + \log \mathcal{N} \big(K\times \Psi ,r/(4C),\rho _{2}\big) \\ & \leq \log \mathcal{N}(\Psi ,r/4) + \log \mathcal{N}\big(K,r/(4C)\big) \quad \, \mbox{for}~r > 0. \end{aligned}$$
(8.5)

By compactness, the set \(K\) is a subset of \([-A,A]\) for some \(A>0\). In particular, the inequality \(\mathcal{N}(K,r) \leq 1 + 2 A/r\) holds for every \(r > 0\). Moreover, the mapping

$$ \varphi :(0,\infty )\rightarrow (0,\infty ), \quad r\mapsto \log (1 + 2 A/r)/r $$

is strictly decreasing and differentiable and satisfies \(\varphi (r)\to 0\) for \(r\to \infty \) as well as \(\varphi (r)\to \infty \) for \(r\to 0\). In particular, \(\varphi \) is a strictly decreasing bijection with inverse \(\varphi ^{-1}\). Then, defining \(\gamma (K,n) := \inf \{r > 0 : \log \mathcal{N}(K,r)\leq n r\}\), we obtain

$$ \gamma (K,n)\leq \varphi ^{-1}(n)\to 0\qquad \mbox{for}~n\to \infty . $$
(8.6)

Now let \(\epsilon > 0\). Since \(\gamma (\Psi ,n)\to 0\) for \(n\to \infty \) by assumption, we may find in view of (8.6) some \(n_{0}\in \mathbb{N}\) such that

$$ \gamma (K,n) < \epsilon /(4 C)\qquad \mbox{and}\qquad \gamma ( \Psi ,n) < \epsilon /4 $$

for \(n\in \mathbb{N}\) with \(n\geq n_{0}\). Then for \(n\in \mathbb{N}\) with \(n\geq n_{0}\), there exist \(r_{n}\in (0,\epsilon /(4C))\) and \(\overline{r}_{n}\in (0,\epsilon /4)\) such that

$$ \log \mathcal{N}\big(K,\epsilon /(4 C)\big)\leq \log \mathcal{N} (K,r_{n} )\leq n r_{n}\leq n \epsilon /(4C) $$

and

$$ \log \mathcal{N} (\Psi ,\epsilon /4 )\leq \log \mathcal{N} (\Psi , \overline{r}_{n} )\leq n \overline{r}_{n}\leq n \epsilon /4. $$

Hence by (8.5),

$$ \log \mathcal{N}(K\times \Psi ,\epsilon ,\bar{d}^{C}_{K\times \Psi}) \leq n \epsilon \big(1/4 + 1/(4 C)\big) < n\epsilon \qquad \mbox{for}~n\in \mathbb{N}, n\geq n_{0}. $$

Hence \(\gamma (K\times \Psi ,n,\bar{d}^{C}_{K\times \Psi})\leq \epsilon \) for \(n\in \mathbb{N}\) with \(n\geq n_{0}\). This completes the proof. □

Henceforth, we denote by \(\mathcal {N}(\Psi _{\pi},\varepsilon )\) the covering number of \(\Psi _{\pi}\) by \(\varepsilon \)-balls with respect to \(d_{\Psi _{\pi}}\). Furthermore, we define

$$ \gamma (\Psi _{\pi},n) := \inf \left \{ \varepsilon >0 : \log \mathcal{N}(\Psi _{\pi}, \varepsilon ) \leq n \varepsilon \right \}. $$

Finally, let us introduce the following notation: \(f(x)\lesssim g(x)\) for two functions \(f,g:\mathbb{R}^{d}\to \mathbb{R}\) means that there exists a constant \(C>0\) such that \(f(x)\leq Cg(x)\) for all \(x \in \mathbb{R}^{d}\).

8.2 Proofs of Theorems 4.1 and 4.2

Setting \(\bar{d}^{C}_{K \times \Psi} := \rho _{1} + C\rho _{2}\), we may find by Theorem 8.2 some \(C > 1\) such that

$$ \gamma (K \times \Psi ,n)\lesssim \gamma ( K \times \Psi ,n,\bar{d}^{C}_{K \times \Psi}). $$
(8.7)

The idea of the proofs of Theorems 4.1 and 4.2 is based on a result given by Nickl and Pötscher [28]. Under the imposed assumptions, this result enables us to give analytical upper estimates for \(\gamma (K \times \Psi ,n,\bar{d}^{C}_{K \times \Psi})\). Then we use these analytical estimates and apply Theorems 3.1 and 3.3, respectively, to derive an analytical bound for the deviations \(\mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*},\psi ^{*})\).

To calculate the bounds for \(\gamma (K \times \Psi ,n,\bar{d}^{C}_{K \times \Psi})\), we already know by (8.5) that

$$ \gamma (K \times \Psi ,n,\bar{d}^{C}_{K \times \Psi} ) \leq \inf \big\{ \epsilon >0 : \log \mathcal{N}(\Psi ,\epsilon /4)+\log \mathcal{N}\big(K,\epsilon /(4C)\big)\leq n\epsilon \big\} , $$

and due to (8.1), there is \(C_{1}>0\) such that

$$\begin{aligned} \gamma (K \times \Psi ,n,\bar{d}^{C}_{K \times \Psi} ) \leq \inf \big\{ \epsilon >0 &: \log \mathcal{N}\big(\Psi _{\pi},\epsilon /(4 C_{1} )\big) \\ & \phantom{::}{} +\log \mathcal{N}\big(K,\epsilon /(4C)\big)\leq n\epsilon \big\} . \end{aligned}$$

On the one hand, we can calculate with \(K\subseteq [-A,A]\) for some \(A > 0\) that

$$ \mathcal{N}\big(K,\epsilon /(4C)\big)\leq 1 + 8CA/\epsilon \leq \exp (8CA/ \epsilon )\qquad \mbox{for}~\epsilon > 0. $$

On the other hand, Nickl and Pötscher [28, Corollary 4] gives for \(\alpha >s-(d+1)/2\) that

$$ \log \mathcal{N}(\Psi _{\pi},\epsilon ) \lesssim \epsilon ^{-(d+1)/s} $$

and for \(\alpha < s-(d+1)/2\) that

$$ \log \mathcal{N}(\Psi _{\pi},\epsilon ) \lesssim \epsilon ^{-(\alpha /(d+1)+1/2)^{-1}}. $$

So in view of (8.7), we end up with

$$\begin{aligned} \gamma (K \times \Psi ,n) & \lesssim \inf \left \{\epsilon >0 : \log \mathcal{N}\big(\Psi _{\pi}, \epsilon /(4C_{1})\big)+\log \mathcal{N}\big(K,\epsilon /(4C)\big) \leq n\epsilon \right \} \\ & \lesssim \inf \bigg\{ \epsilon >0 : \frac{h_{2}^{\ell}}{\epsilon ^{\ell}}+\frac{h_{1}}{\epsilon}\leq n \epsilon \bigg\} , \end{aligned}$$
(8.8)

where \(\ell =(d+1)/s\) in the case \(\alpha >s-(d+1)/2\), \(\ell =(\alpha /(d+1)+1/2)^{-1}\) in the case \(\alpha < s-(d+1)/2\), and \(h_{1}=8CA\) as well as \(h_{2}=4C_{1}\). To calculate the convergence rates, we divide the set \(M:=\{\epsilon >0 : h_{2}^{\ell}/\epsilon ^{ \ell} + h_{1}/\epsilon \leq n\epsilon \}\) into the subsets \(M_{1}\) and \(M_{2}\) satisfying \(M=M_{1} \cup M_{2}\), where

$$\begin{aligned} M_{1}&:=\bigg\{ \epsilon \in (0,1) : \frac{h_{2}^{\ell}}{\epsilon ^{\ell}} + \frac{h_{1}}{\epsilon} \leq n \epsilon \bigg\} , \\ M_{2}&:=\bigg\{ \epsilon \geq 1 : \frac{h_{2}^{\ell}}{\epsilon ^{\ell}} + \frac{h_{1}}{\epsilon} \leq n \epsilon \bigg\} \end{aligned}$$

and note that \(\inf M =\min \{\inf M_{1}, \inf M_{2}\} \). Further we distinguish the following two cases:

Case 1: \(\ell \geq 1\). In this case, we have on \(M_{1}\) that

$$ \frac{h_{2}^{\ell}}{\epsilon ^{\ell}}+\frac{h_{1}}{\epsilon}\leq \frac{h_{2}^{\ell}}{\epsilon ^{\ell}}+\frac{h_{1}}{\epsilon ^{\ell}}= \frac{h_{2}^{\ell}+h_{1}}{\epsilon ^{\ell}}. $$

Therefore

$$ M_{1}':=\bigg\{ \epsilon \in (0,1) : \frac{h_{2}^{\ell}+h_{1}}{\epsilon ^{\ell}}\leq n\epsilon \bigg\} \subseteq M_{1} $$

and so \(\inf M_{1} \leq \inf M_{1}' \). With a bit of calculation, we have

$$ M_{1}' = \left \{ \textstyle\begin{array}{ll} [(h_{2}^{\ell}+h_{1})^{\frac{1}{\ell +1}}n^{-\frac{1}{\ell +1}},1) & \qquad \text{for }n > h_{2}^{\ell}+h_{1}, \\ \emptyset & \qquad \text{for }n \leq h_{2}^{\ell}+h_{1}.\end{array}\displaystyle \right . $$

On \(M_{2}\), we have

$$ \frac{h_{2}^{\ell}}{\epsilon ^{\ell}}+\frac{h_{1}}{\epsilon}\leq \frac{h_{2}^{\ell}}{\epsilon}+\frac{h_{1}}{\epsilon}= \frac{h_{2}^{\ell}+h_{1}}{\epsilon}. $$

With a bit of calculation, we have for \(n \leq h_{2}^{\ell}+h_{1}\) that

$$ M_{2}':=\bigg\{ \epsilon \geq 1 : \frac{h_{2}^{\ell}+ h_{1}}{\epsilon} \leq n\epsilon \bigg\} = \big[(h_{2}^{ \ell}+h_{1})^{1/2}\,n^{-1/2},\infty \big), $$

and for \(n > h_{2}^{\ell}+h_{1}\), we have \(M_{2}'= [1, \infty ) \). Combining these results and setting \(\inf \emptyset := \infty \), we have

$$\begin{aligned} \inf M &=\min \{ \inf M_{1},\inf M_{2}\} \\ & \leq \min \{\inf M_{1}',\inf M_{2}'\} = \left \{ \textstyle\begin{array}{ll} (h_{2}^{\ell}+h_{1})^{\frac{1}{\ell +1}}n^{-\frac{1}{\ell +1}} \qquad & \text{for }n > h_{2}^{\ell}+h_{1}, \\ (h_{2}^{\ell}+h_{1})^{\frac{1}{2}}n^{-\frac{1}{2}} \qquad & \text{for }n \leq h_{2}^{\ell}+h_{1}. \end{array}\displaystyle \right . \end{aligned}$$

Case 2: \(\ell \in (0,1)\). On \(M_{1}\), we have

$$ \frac{h_{2}^{\ell}}{\epsilon ^{\ell}}+\frac{h_{1}}{\epsilon}\leq \frac{h_{2}^{\ell}}{\epsilon}+\frac{h_{1}}{\epsilon}= \frac{h_{2}^{\ell}+h_{1}}{\epsilon}, $$

and so we get

$$ M_{1}'' := \bigg\{ \epsilon \in (0,1) : \frac{h_{2}^{\ell} + h_{1}}{\epsilon}\leq n\epsilon \bigg\} = \left \{ \textstyle\begin{array}{ll} [(h_{2}^{\ell}+h_{1})^{\frac{1}{2}}n^{-\frac{1}{2}},1 ) \qquad & \text{for }n > h_{2}^{\ell}+h_{1}, \\ \emptyset \qquad & \text{for }n \leq h_{2}^{\ell}+h_{1}.\end{array}\displaystyle \right . $$

On \(M_{2}\), we have

$$ \frac{h_{2}^{\ell}}{\epsilon ^{\ell}}+\frac{h_{1}}{\epsilon}\leq \frac{h_{2}^{\ell}}{\epsilon ^{\ell}}+\frac{h_{1}}{\epsilon ^{\ell}}= \frac{h_{2}^{\ell}+h_{1}}{\epsilon ^{\ell}}. $$

So we have for \(n \leq h_{2}^{\ell}+h_{1}\) that

$$ M_{2}'':=\bigg\{ \epsilon \geq 1 : \frac{h_{2}^{\ell}+ h_{1}}{\epsilon ^{\ell}} \leq n\epsilon \bigg\} = \big[(h_{2}^{\ell}+h_{1})^{1/(\ell +1)}\,n^{-1/(\ell +1)},\infty \big) $$

and for \(n > h_{2}^{\ell}+h_{1}\) that \(M_{2}''= [1,\infty ) \). So in conclusion, we have for the second case that

$$\begin{aligned} \inf M & \leq \min \{\inf M_{1}'',\inf M_{2}''\} = \left \{ \textstyle\begin{array}{ll} (h_{2}^{\ell}+h_{1})^{\frac{1}{\ell +1}}n^{-\frac{1}{\ell +1}} \qquad &\text{for } n \leq h_{2}^{\ell}+h_{1}, \\ (h_{2}^{\ell}+h_{1})^{\frac{1}{2}}n^{-\frac{1}{2}} \qquad & \text{for }n > h_{2}^{\ell}+h_{1}. \end{array}\displaystyle \right . \end{aligned}$$

Hence in view of (8.8), we end up with

$$\begin{aligned} \gamma (K \times \Psi ,n)\lesssim \left \{ \textstyle\begin{array}{ll} (h_{2}^{\ell}+h_{1})^{\frac{1}{\ell +1}} (\frac{1}{n} )^{\frac{1}{\ell +1}}, \qquad & \ell \geq 1, \\ (h_{2}^{\ell}+h_{1})^{\frac{1}{2}} (\frac{1}{n} )^{\frac{1}{2}}, \qquad & \ell \in (0,1), \end{array}\displaystyle \right .\quad \mbox{for}~n > h_{2}^{\ell} + h_{1}, \end{aligned}$$
(8.9)

and

$$\begin{aligned} \gamma (K \times \Psi ,n)\lesssim \left \{ \textstyle\begin{array}{ll} (h_{2}^{\ell}+h_{1})^{\frac{1}{\ell +1}} (\frac{1}{n} )^{\frac{1}{\ell +1}}, \qquad & \ell \in (0,1), \\ (h_{2}^{\ell}+h_{1})^{\frac{1}{2}} (\frac{1}{n} )^{\frac{1}{2}}, \qquad & \ell \geq 1, \end{array}\displaystyle \right . \quad \mbox{for}~n \leq h_{2}^{\ell} + h_{1}. \end{aligned}$$
(8.10)
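For orientation, the case distinction in (8.9) can be packaged as a small routine returning the exponent \(r\) with \(\gamma (K\times \Psi ,n)=O(n^{-r})\) for large \(n\); a minimal sketch (Python), purely illustrative:

```python
def gamma_rate_exponent(s, d, alpha):
    # Exponent r with gamma(K x Psi, n) = O(n**(-r)) for large n, read
    # off from (8.9); the boundary case alpha = s - (d+1)/2 is not
    # covered by the estimates above.
    if alpha > s - (d + 1) / 2:
        l = (d + 1) / s
    elif alpha < s - (d + 1) / 2:
        l = 1.0 / (alpha / (d + 1) + 0.5)
    else:
        raise ValueError("boundary case not covered")
    return 1.0 / (l + 1.0) if l >= 1 else 0.5

print(gamma_rate_exponent(s=2, d=1, alpha=3))  # here l = 1, so the rate is 1/2
```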

From this point on, we want to apply the results of Theorems 3.1 and 3.3, respectively. The assumptions imposed in Theorems 4.1 and 4.2 differ accordingly, so as to meet the requirements of Theorems 3.1 and 3.3, respectively.

Let us first prove Theorem 4.1. We apply Theorem 3.1 with the estimates (8.9), (8.10). We may find constants \(\hat{\eta}_{1},\hat{\eta}_{2},\hat{\eta}_{3},\hat{\eta}_{4}\), depending on \(b,\lambda ,\delta , s, d, h_{1}\) and \(h_{2}\) (and thus also on the set \(K\) and \(\Phi \)) such that the following inequalities hold with probability at least \(1-\delta \):

Case 1: \(\alpha >s-(d+1)/2\) and \(s/(s+d+1) \leq 1/2\). Then \(\ell = (d+1)/s \geq 1\) and

$$\begin{aligned} 0\leq \mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*}, \psi ^{*}) &\leq \hat{\eta}_{1} (n^{-1/2}+n^{-s/(2(s+d+1))}+n^{-s/(s+d+1)} ) \\ &\leq 3\hat{\eta}_{1}n^{-s/(2(s+d+1))}. \end{aligned}$$

Case 2: \(\alpha >s-(d+1)/2\) and \(s/(s+d+1)> 1/2\). Then \(\ell = (d+1)/s < 1\) and

$$\begin{aligned} 0 \leq \mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*}, \psi ^{*}) \leq \hat{\eta}_{2} (n^{-1/2}+n^{-1/4}+ n^{-1/2} ) \leq 3 \hat{\eta}_{2}n^{-1/4}. \end{aligned}$$

Case 3: \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) \leq 1/2\). Then \(\ell = (\alpha /(d+1) + 1/2 )^{-1}\geq 1\) and

$$\begin{aligned} 0 \leq \mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*}, \psi ^{*}) &\leq \hat{\eta}_{3} (n^{-1/2}+n^{-1/(2(\ell +1))}+ n^{-1/( \ell +1)} ) \\ &\leq 3\hat{\eta}_{3}n^{-1/(2(\ell +1))} \\ &\leq 3\hat{\eta}_{3}n^{-\frac{1}{2} \frac{\alpha /(d+1)+1/2}{\alpha /(d+1)+3/2}}. \end{aligned}$$

Case 4: \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) > 1/2\). Then \(\ell = (\alpha /(d+1) + 1/2 )^{-1} < 1\) and

$$\begin{aligned} 0 \leq \mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*}, \psi ^{*}) \leq \hat{\eta}_{4} (n^{-1/2}+n^{-1/4}+ n^{-1/2} ) \leq 3 \hat{\eta}_{4}n^{-1/4}. \end{aligned}$$

Now the statement of Theorem 4.1 follows immediately. □

Let us now turn to the proof of Theorem 4.2. We can apply Theorem 3.3 for all \(n \in \mathbb{N}\) satisfying

$$ n> \max \bigg(h_{2}^{\ell}+h_{1} ; 4 b \Big(\gamma (K \times \Psi ,n)+1+ \frac{C_{\lambda ,b}~\log (8/\delta )}{n}\Big)\log (8/\delta )\bigg),$$

where \(C_{\lambda ,b}\) is as in Theorem 3.3. Hence we may select constants \(\overline{\eta}_{1},\overline{\eta}_{2},\overline{\eta}_{3}\), \(\overline{\eta}_{4}\), depending on \(b,\lambda ,\delta , s, d, h_{1}\) and \(h_{2}\) (and thus also on the set \(K\) and \(\Phi \)) such that the following inequalities hold with probability at least \(1-\delta \):

Case 1: \(\alpha >s-(d+1)/2\) and \(s/(s+d+1) \leq 1/2\). Then \(\ell = (d+1)/s\geq 1\) and

$$\begin{aligned} 0\leq \mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*}, \psi ^{*}) \leq \overline{\eta}_{1} (n^{-s/(s+d+1)}+n^{-1/2} ) \leq 2 \overline{\eta}_{1}n^{-s/(s+d+1)}. \end{aligned}$$

Case 2: \(\alpha >s-(d+1)/2\) and \(s/(s+d+1)> 1/2\). Then \(\ell = (d+1)/s < 1\) and

$$\begin{aligned} 0\leq \mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*}, \psi ^{*}) \leq \overline{\eta}_{2} (n^{-1/2}+n^{-1/2} ) \leq 2 \overline{\eta}_{2}n^{-1/2}. \end{aligned}$$

Case 3: \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) \leq 1/2\). Then \(\ell = (\alpha /(d+1) + 1/2 )^{-1}\geq 1\) and

$$\begin{aligned} 0\leq \mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*}, \psi ^{*}) &\leq \overline{\eta}_{3} (n^{-\ell /(\ell +1)}+ n^{-1/2} ) \\ &\leq 2\overline{\eta}_{3}n^{-\ell /(\ell +1)} \\ &= 2\overline{\eta}_{3}n^{-(\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2)}. \end{aligned}$$

Case 4: \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) > 1/2\). Then \(\ell = (\alpha /(d+1) + 1/2 )^{-1} < 1\) and

$$\begin{aligned} 0\leq \mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*}, \psi ^{*}) \leq \overline{\eta}_{4} (n^{-1/2}+n^{-1/2} ) \leq 2 \overline{\eta}_{4}n^{-1/2}. \end{aligned}$$

Note that here, the bounds can even be derived for all \(n \in \mathbb{N}\). To see this, check the definition of \(\kappa \) in the proof of Theorem 3.3. The proof of Theorem 4.2 is complete.  □

9 Proof of Remark 4.3

Let \(\pi (x,t) = \pi _{t}(x)\) denote the density of the diffusion process given in (4.1). Furthermore, let \(\overline {\sigma }:=\frac{1}{2}\sigma \sigma ^{\top}\), where \(\sigma ^{\top}\) denotes the transposed matrix. Then the Fokker–Planck equation states that

$$ \frac{\partial \pi (x,t)}{\partial t}=-\sum _{i=1}^{d} \frac{\partial}{\partial x_{i}}\big(\mu _{i}(x,t)\pi (x,t)\big) + \sum _{i=1}^{d}\sum _{j=1}^{d} \frac{\partial ^{2}}{\partial x_{i} \partial x_{j}} \big(\overline {\sigma }_{i,j}(x,t) \pi (x,t)\big). $$
(9.1)

This is a parabolic partial differential equation. To show that, under some conditions on \(\mu \) and \(\sigma \), the density \(\pi \) is infinitely differentiable in space and time, we want to make use of Friedman [21, Theorem 3.11]. To apply that theorem, we need to impose that \(\overline {\sigma }\) is uniformly elliptic, i.e., there exists \(\lambda >0\) such that for all \(t,x \in [0,T]\times \mathbb{R}^{d}\) and all \(\xi \in \mathbb{R}^{d}\),

$$ \frac{1}{\lambda} |\xi |^{2} \leq \xi ^{\top} \overline {\sigma }(t,x) \xi \leq \lambda |\xi |^{2}. $$
(9.2)

Theorem 9.1

Let \(\overline {\sigma }\) be uniformly elliptic and let \(\sigma \) and \(\mu \) be \(p\) times Hölder-differentiable in space and \(q\) times Hölder-differentiable in time. If \(p=q=\infty \), then the partial derivative

$$ \frac{\partial ^{k+\ell}}{\partial x^{k}\partial t^{\ell}}\pi (x,t)$$

exists for all \(0\leq k,\ell <\infty \) and is Hölder-continuous.

Proof

To use [21, Theorem 3.11], we need a bit of calculation. For brevity, we set

$$\begin{aligned} b_{i}(x,t) :=-\mu _{i}(x,t)+2 \sum _{j=1}^{d} \frac{\partial \overline {\sigma }_{i,j}(x,t)}{\partial x_{j}},\qquad c(x,t) := -\sum _{i=1}^{d} \frac{\partial \mu _{i}(x,t)}{\partial x_{i}} + \sum _{i=1}^{d}\sum _{j=1}^{d} \frac{\partial ^{2}\overline {\sigma }_{i,j}(x,t)}{\partial x_{i}\partial x_{j}}. \end{aligned}$$

By elementary calculus, we may rewrite (9.1) as

$$\begin{aligned} \frac{\partial \pi (x,t)}{\partial t}&=\sum _{i=1}^{d}\sum _{j=1}^{d} \overline {\sigma }_{i,j}(x,t) \frac{\partial ^{2}\pi (x,t)}{\partial x_{i} \partial x_{j}} + \sum _{i=1}^{d} \frac{\partial \pi (x,t)}{\partial x_{i}}b_{i}(x,t)+ \pi (x,t)c(x,t). \end{aligned}$$

Due to our assumptions, \(b(x,t)\) and \(c(x,t)\) are infinitely Hölder-differentiable. Now [21, Theorem 3.11] is applicable and our claim follows. □
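As a sanity check on this rewriting (in particular on the coefficient \(c\)), the identity can be verified symbolically in one spatial dimension. A minimal sketch using sympy; all function names are generic placeholders:

```python
import sympy as sp

x, t = sp.symbols("x t")
mu = sp.Function("mu")(x, t)
sig = sp.Function("sigma_bar")(x, t)   # sigma_bar = (1/2) sigma**2 for d = 1
pi = sp.Function("pi")(x, t)

# right-hand side of the Fokker-Planck equation (9.1) in one dimension
fp = -sp.diff(mu * pi, x) + sp.diff(sig * pi, x, 2)

# rewritten form with b = -mu + 2 * d(sigma_bar)/dx and
# c = -d(mu)/dx + d^2(sigma_bar)/dx^2
b_ = -mu + 2 * sp.diff(sig, x)
c_ = -sp.diff(mu, x) + sp.diff(sig, x, 2)
rewritten = sig * sp.diff(pi, x, 2) + b_ * sp.diff(pi, x) + c_ * pi

print(sp.simplify(fp - rewritten))     # prints 0
```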