Abstract
In this work, we consider optimal stopping problems with model uncertainty incorporated into the formulation of the underlying objective function. Typically, the robust, efficient hedging of American options in incomplete markets may be described as optimal stopping of such kind. Based on a generalisation of the additive dual representation of Rogers (Math. Financ. 12:271–286, 2002) to the case of optimal stopping under model uncertainty, we develop a novel regression-based Monte Carlo algorithm for the approximation of the corresponding value function. The algorithm involves optimising a penalised empirical dual objective functional over a class of martingales. This formulation allows us to construct upper bounds for the optimal value with reduced complexity. Finally, we carry out a convergence analysis of the proposed algorithm and illustrate its performance by several numerical examples.
1 Introduction
In this paper, we consider optimal stopping problems under model uncertainty in terms of ambiguity aversion. By representation results, this means that we look at stochastic optimisation problems of the form
where \(\mathcal {T}\) and \(\mathcal {Q}\) denote the set of stopping times and a set of probability measures, respectively, whereas \(\beta \) stands for a convex penalty function (see Maccheroni et al. [27]). In the special case
the optimal value represents the superhedging price of some American option in an incomplete financial market (see e.g. Trevino-Aguilar [34, Sect. 1.4], Föllmer and Schied [20, Chap. 7]). In the general form, the solution of (1.1) might be interesting for robust, efficient hedging of some American option in an incomplete financial market (see Trevino-Aguilar [34, Sect. 2.1], Föllmer and Schied [20, Chap. 8]). If the seller of this American option is only willing to invest an amount \(c\) strictly smaller than the superhedging price, then for any stopping time \(\tau \in \mathcal {T}\), the random variable \(Y_{\tau}\) may represent the shortfall risk of a hedging strategy with initial investment \(c\) when the American option is exercised at \(\tau \). Then
gives a robust quantification of the shortfall risk at time \(\tau \) reflecting the seller’s model uncertainty.
The aim of the present paper is to solve the stopping problem (1.1) numerically. We restrict ourselves to penalty functions in the form of divergence functionals with respect to a reference probability measure. In this case, (1.1) reads as
where \(\Phi : [0,\infty )\rightarrow [0,\infty ]\) denotes a lower semicontinuous convex function and \(\mathcal {Q}\) consists of all probability measures \(Q\) which are absolutely continuous with respect to some reference probability measure \(P\). Besides the standard optimal stopping, prominent specialisations of (1.3) are optimal stopping under average value at risk and the family of entropic risk measures.
Our investigations are built upon a specific representation of (1.3) established in Belomestny and Krätschmer [8] (with a refinement in Belomestny and Krätschmer [9]). The crucial point is that we may reformulate the optimal stopping problem in terms of a family of standard optimal stopping problems parametrised by a set of real numbers. This allows us to derive a so-called additive dual representation generalising the well-known dual representation of Rogers [30] for the standard optimal stopping problems, given by
where \((Z_{t})\) is an adapted cash-flow process and ℳ is the set of all \((\mathcal{F}_{t})\)-martingales starting in 0 at \(t=0\). We use this new generalised dual representation to efficiently construct Monte Carlo upper bounds for the value of the optimal stopping problems under model uncertainty. As to the standard optimal stopping problems, several Monte Carlo algorithms for constructing upper biased estimators for \(V^{*}\) based on (1.4) were suggested in the literature. They typically consist of two steps:
a) apply some numerical method to construct a martingale \(\widehat{M}\) which is close to optimality;
b) estimate \(E [\sup _{t\in [0,T]}(Z_{t}-\widehat{M}_{t})]\) by the sample mean, using a new independent sample (testing sample).
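Step b) amounts to a plain sample-mean estimator over fresh paths. A minimal sketch, assuming the discretised paths of the cash-flow process \(Z\) and of a candidate martingale \(\widehat{M}\) are available as arrays (all names and the toy data below are hypothetical):

```python
import numpy as np

def dual_upper_bound(Z, M):
    """Estimate E[sup_t (Z_t - M_t)] by the sample mean over an independent
    testing sample.  Z and M are arrays of shape (n_paths, n_times):
    discretised cash-flows and a candidate martingale with M[:, 0] == 0.
    Returns the upper-bound estimate and its Monte Carlo standard error."""
    pathwise_sup = np.max(Z - M, axis=1)      # sup_t (Z_t - M_t) per path
    n = len(pathwise_sup)
    return pathwise_sup.mean(), pathwise_sup.std(ddof=1) / np.sqrt(n)

# toy illustration with the trivial martingale M = 0
rng = np.random.default_rng(0)
Z = np.maximum(np.cumsum(rng.normal(size=(1000, 50)), axis=1), 0.0)
ub, se = dual_upper_bound(Z, np.zeros_like(Z))
```

The reported standard error is what a better (closer to surely optimal) choice of \(\widehat{M}\) drives towards zero.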
All the existing dual Monte Carlo algorithms can be divided into two broad categories depending on how the martingale \(\widehat{M}\) is constructed. In the first class of algorithms, see for example Andersen and Broadie [2] and Glasserman [22, Chap. 8], Belomestny and Schoenmakers [10, Part III] for further references, the choice of the martingale \(\widehat{M}\) is based on approximating the so-called Doob martingale. A particular feature of the Doob martingale is that it solves (1.4) and, moreover, satisfies
Because of (1.5), we say that the Doob martingale is surely or strongly optimal. In the second class of algorithms, one tries to solve the dual optimisation problem (1.4) directly by using methods of stochastic approximation and some parametric subclasses of ℳ. Let us mention Desai et al. [18], where the authors essentially applied the sample average approximation (SAA) approach and used a nested Monte Carlo method to construct a suitable finite-dimensional linear space of martingales, thus casting the resulting minimisation problem into a linear program. However, it was demonstrated later in Belomestny [6] that the approach in Desai et al. [18] may end up with martingales \(\widehat{M}\) that are close to optimal but only in expectation, with the variance of the random variable \(\sup _{t\in [0,T]}(Z_{t}-\widehat{M}_{t})\) being relatively high. In contrast, due to (1.5), for a martingale that is close to the Doob martingale \(M^{\ast}\) (in an \(L^{2}\) sense, for instance), this variance will be close to zero. Consequently, the estimation in step b) can be done more efficiently for such a martingale. Thus it is essential to find martingales that are “close” to the Doob martingale, or at least “close” to a surely optimal martingale. In this respect, Belomestny [6] proposed a modification of the plain SAA based on variance penalisation. The convergence analysis of this algorithm reveals that the variance of the random variable \(\sup _{t\in [0,T]}(Z_{t}-\widehat{M}_{t})\) converges to zero as the number of paths used to build \(\widehat{M}\) increases.
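The sure optimality (1.5) of the Doob martingale can be checked numerically in a toy discrete-time setting. The following sketch, a binomial model with hypothetical parameters, computes the Snell envelope and the Doob martingale by backward induction and verifies that \(\sup _{t}(Z_{t}-M^{*}_{t})\) takes the same constant value \(V_{0}\) on every path:

```python
import itertools

# Binomial model with hypothetical parameters: at each of L steps the
# asset moves up by factor u or down by factor d with probability 1/2.
L, u, d, p = 4, 1.2, 0.9, 0.5
S0, strike = 100.0, 100.0

def payoff(s):                      # discounted cash-flow Z_t (put-type)
    return max(strike - s, 0.0)

# Snell envelope V by backward induction on the recombining tree,
# indexed by (time t, number of up-moves j)
V = {(L, j): payoff(S0 * u**j * d**(L - j)) for j in range(L + 1)}
for t in range(L - 1, -1, -1):
    for j in range(t + 1):
        cont = p * V[(t + 1, j + 1)] + (1 - p) * V[(t + 1, j)]
        V[(t, j)] = max(payoff(S0 * u**j * d**(t - j)), cont)

# Doob martingale increments M*_{t+1} - M*_t = V_{t+1} - E[V_{t+1} | F_t];
# along EVERY path, sup_t (Z_t - M*_t) equals the constant V_0 (cf. (1.5))
sups = []
for moves in itertools.product([0, 1], repeat=L):
    j, M, best = 0, 0.0, payoff(S0)
    for t, m in enumerate(moves):
        cont = p * V[(t + 1, j + 1)] + (1 - p) * V[(t + 1, j)]
        j += m
        M += V[(t + 1, j)] - cont
        best = max(best, payoff(S0 * u**j * d**(t + 1 - j)) - M)
    sups.append(best)
```

Since the pathwise supremum is deterministic, its variance vanishes, which is exactly the property exploited by the variance-penalised algorithms discussed above.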
The contribution of the current work is twofold. First, we generalise the approach of Belomestny [6] to the case of optimal stopping problems under model uncertainty by using the dual representation by Belomestny and Krätschmer [8]. Second, we provide a thorough convergence analysis of the proposed algorithm. The main theoretical challenge is to extend the analysis of Belomestny [6] to objective functions involving empirical expectations and empirical variances of much more complicated objects than in Belomestny [6]; see Sect. 3. We use essentially different techniques (e.g. different concentration inequalities) and derive faster convergence rates that improve upon those in Belomestny [6] for standard optimal stopping problems. We also illustrate our results for the case of martingales in a diffusion setting defined as integrals with respect to the corresponding Brownian motion by the martingale representation theorem. As compared to Belomestny [6], we consider here not only parametric linear families of martingales, but rather general nonparametric ones defined as stochastic integrals with smooth integrands.
Putting our contribution into perspective, it should be emphasised that none of the general devices suggested in the literature can be utilised to analyse the optimal stopping problem (1.1) or even (1.3). To the best of our knowledge, there exist two general strategies, both based on some underlying filtered probability space \((\Omega , \mathcal{F},(\mathcal{F}_{t})_{0 \leq t \leq T},P)\). The first focuses on sets \(\mathcal {Q}\) for which we may find conditional nonlinear expectations extending the functional
and satisfying a property called time-consistency which extends the tower property of conditional expectations. Time-consistency, sometimes also called recursiveness, allows extending the dynamic programming principle from standard optimal stopping problems to optimal stopping problems of the form (1.1). Studies following this line of reasoning may be found e.g. in Trevino-Aguilar [34, Sects. 4.1 and 4.2], Bayraktar and Yao [3], Bayraktar and Yao [4], Ekren et al. [19] and Bayraktar and Yao [5] (see also Riedel [29], Krätschmer and Schoenmakers [25] and Föllmer and Schied [20, Chap. 6] for the discrete-time case). Unfortunately, this approach requires very restrictive conditions that \(\mathcal {Q}\) should satisfy, at least for optimal stopping (1.2) (see e.g. Delbaen [17], or Belomestny et al. [7] for the case where \(\mathcal {Q}\) consists of probability measures that are equivalent to \(P\)). Even worse, it is known from Kupper and Schachermayer [26] that for the optimal stopping problem (1.3), \(\mathcal {Q}\) meets this requirement in two cases only. These choices of \(\Phi \) correspond to standard optimal stopping and optimal stopping under entropic risk measures.
The second approach proposed very recently in Huang and Yu [24] and Huang et al. [23] offers a way to solve the optimal stopping problem (1.2) when a dynamic programming principle cannot be applied. The main idea in these papers is to tackle optimal stopping within a game-theoretic framework and look for Nash subgame perfect equilibria. This line of reasoning refers to a long history in economics on how to deal with time-inconsistent dynamic utility maximisation, going back to Strotz [33], Selten [31] and Selten [32]. It has become popular for applications in stochastic finance due to the contributions by Björk and Murgoci [13] and Björk et al. [12], where the authors treat stochastic control problems which do not admit a Bellman optimality principle. Formally, the expected payoffs corresponding to the equilibria approximate the optimal values of (1.2) from below. However, this approach cannot be used directly for the optimal stopping problem (1.3) since this reduces to (1.2) only in a few cases (see Ben-Tal and Teboulle [11]), with optimal stopping under average value at risk as the outstanding representative. Moreover, a numerical method to calculate the payoffs at the equilibria is missing.
In conclusion, the existing literature on robust optimal stopping does not lead in general to a constructive numerical approach to calculate the optimal values of (1.3). This paper offers a method to deal with this problem and is completed by studying its theoretical properties.
The paper is organised as follows. In Sect. 2, we introduce convex risk measures and give some examples. Then we introduce primal and dual representations for our optimal stopping problems under model uncertainty. In Sect. 3, we develop a Monte Carlo functional optimisation algorithm based on the derived dual representation. Then we analyse its convergence towards the solution, depending on the number of Monte Carlo paths and complexity of the underlying functional class. The results are specified to a setting of diffusion processes in Sect. 4. Afterwards, we present some numerical results in Sect. 5. The proofs of the results from Sects. 3 and 4 are given in Sects. 6–8.
2 Setup
Let \(0< T<\infty \) and let \((\Omega , \mathcal{F},(\mathcal{F}_{t})_{0 \leq t \leq T},P)\) be a filtered probability space, where \((\mathcal {F}_{t})_{t\in [0,T]}\) is a right-continuous filtration with \(\mathcal {F}_{0}\) complete and trivial. We also impose the following requirements:
- \((\Omega ,{\mathcal{F}}_{t},P|_{{\mathcal{F}}_{t}} )\) is atomless for \(t > 0\).
- The \(L^{1}\)-space \(L^{1} (\Omega ,{\mathcal{F}}_{t},P|_{{\mathcal{F}}_{t}} )\) is weakly separable for \(t > 0\).
Consider a lower semicontinuous convex mapping \(\Phi : [0, \infty )\rightarrow [0,\infty ]\) satisfying \(\Phi (x_{0}) < \infty \) for some \(x_{0} > 0\), \(\inf _{x\geq 0}\Phi (x) = 0\) and \(\lim _{x\to \infty}\frac{\Phi (x)}{x} = \infty \). Its Fenchel–Legendre transform
is a finite nondecreasing convex function whose restriction to \([0,\infty )\) is a finite Young function, that is, \(\Phi ^{*}:[0,\infty )\to [0,\infty )\) is convex and satisfies
Consider the space
where \(L^{0}\) is the class of all (equivalence classes of) finite-valued random variables. For abbreviation, let us introduce the functional \(\rho : H^{\Phi ^{*}}\rightarrow \mathbb{R}\) defined by
where \(\mathcal {Q}_{\Phi}\) stands for the set of all probability measures \(Q\) which are absolutely continuous with respect to \(P\) and such that \(\Phi (dQ/dP)\) is \(P\)-integrable. Note that \(X dQ/dP\) is \(P\)-integrable for every \(Q\in \mathcal {Q}_{\Phi}\) and any \(X\in H^{\Phi ^{*}}\) due to Young’s inequality.
Example 2.1
Let us illustrate our setup in the case of the so-called average value at risk, also known as expected shortfall or conditional value at risk. The average value at risk at level \(\alpha \in (0,1]\) is defined as the functional
where \(X\) is \(P\)-integrable and \(F^{\leftarrow}_{X}\) denotes the left-continuous quantile function of the distribution function \(F_{X}\) of \(X\) defined by \(F^{\leftarrow}_{X}(\alpha ) = \inf \{x\in \mathbb{R}: F_{X}(x)\geq \alpha \}\) for \(\alpha \in (0,1)\). Note that \(AV@R_{1}(X) = E[X]\) for any \(P\)-integrable \(X\). Moreover, for \(\alpha \in (0,1)\), it is well known that
where \(\Phi _{\alpha}\) stands for the function defined by \(\Phi _{\alpha}(x) = 0\) for \(x\leq 1/(1-\alpha )\), whereas \(\Phi _{\alpha}(x) = \infty \) otherwise (cf. Föllmer and Schied [20, Theorem 4.52]). Observe that the set \(\mathcal {Q}_{\Phi _{\alpha}}\) consists of all probability measures on ℱ with \(dQ/dP\leq 1/(1-\alpha )\) \(P\)-a.s.
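For illustration, an empirical counterpart of this risk measure can be sketched. We adopt the tail-average convention consistent with the dual bound \(dQ/dP\leq 1/(1-\alpha )\) above, i.e., the mean of the worst \((1-\alpha )\)-fraction of outcomes; the simple estimator below ignores the fractional-weight correction at the tail boundary:

```python
import numpy as np

def avar(sample, alpha):
    """Empirical average value at risk of X at level alpha under the
    tail-average convention matching the dual bound dQ/dP <= 1/(1-alpha):
    the mean of the largest (1-alpha)-fraction of the observations."""
    x = np.sort(np.asarray(sample, dtype=float))
    k = int(np.floor(alpha * len(x)))   # observations left out of the tail
    return x[k:].mean()

values = np.arange(1.0, 11.0)           # toy sample 1, 2, ..., 10
ordinary_mean = avar(values, 0.0)       # dQ/dP <= 1 forces Q = P
tail_mean = avar(values, 0.5)           # mean of the worst half
```

At level 0 the density bound forces \(Q=P\) and the estimator reduces to the sample mean, mirroring the fact that the corresponding risk measure is the plain expectation.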
Consider now a right-continuous nonnegative stochastic process \((Y_{t})\) adapted to \((\mathcal{F}_{t})\). Furthermore, let \(\mathcal {T}\) consist of all \([0,T]\)-valued stopping times \(\tau \) with respect to \((\mathcal{F}_{t})\). The main object of our study is the optimal stopping problem
For fixed \(x\in \mathbb{R}\), we denote by \(V^{x} = (V^{x}_{t})_{t\in [0,T]}\) the Snell envelope of the process \((\Phi ^{*}(x + Y_{t}) - x )_{t\in [0,T]}\) defined via
Let \({\mathrm{int}}({\mathrm{dom}}(\Phi ))\) denote the topological interior of the effective domain of the mapping \(\Phi : [0,\infty )\rightarrow [0,\infty ]\). We assume that \(\Phi \) is a lower semicontinuous convex function satisfying \(1\in {\mathrm{int}}({\mathrm{dom}}(\Phi ))\). Denote by \(\mathcal {M}_{0}\) the set of all martingales \((M_{t})_{t\in [0,T]}\) with \(M_{0}=0\) such that \(\sup _{t\in [0,T]} |M_{t}|\) is \(P\)-integrable. The following result was proved in Belomestny and Krätschmer [8] along with Belomestny and Krätschmer [9]. We point out that the proof relies on the assumptions that \((\Omega ,\mathcal {F}_{t},P|_{\mathcal {F}_{t}})\) is atomless and that \(L^{1} (\Omega ,{\mathcal{F}}_{t},P|_{{\mathcal{F}}_{t}} )\) is weakly separable for each \(t>0\).
Theorem 2.2
If there is some \(p > 1\) such that \(\sup _{t\in [0,T]}|\Phi ^{*}(x + Y_{t})|\) is \(P\)-integrable of order \(p\) for any \(x\in \mathbb{R}\), then we have the dual representations
Here \(M^{*,x}\) is the martingale part of the Doob–Meyer decomposition of the Snell envelope \(V^{x}\) and \(K \subseteq \mathbb{R}\) denotes a suitably chosen compact set.
Remark 2.3
The above dual representation is remarkable for at least two reasons. Firstly, it allows one to construct upper bounds for the value \(V_{0}\) by choosing a martingale \(M\) from the set \(\mathcal {M}_{0}\). Secondly, if the optimal martingale \(M^{*,x}\) is found, then we need a single trajectory of the reward process \(Y\) and the martingale \(M^{*,x}\) to compute \(V_{0}\) with no error. In this sense, such a dual representation can be computationally more efficient than the primal one.
Remark 2.4
We may describe more precisely how to choose the compact set \(K\) in Theorem 2.2. First of all, observe that under the assumptions of this theorem, the representation results imply
Secondly,
holds for any real number \(x\). Next, by assumption we may find \(0\leq x_{0} < 1 < x_{1}\) such that \(x_{0}, x_{1}\) belong to the effective domain of \(\Phi \). Then by the definition of \(\Phi ^{*}\),
Then it is easy to check that
whenever
or
Hence any compact set \(K \supseteq [a_{\ell},a_{u}]\) may be used in Theorem 2.2. We can derive a more accessible choice for the set \(K\) in the case of average value at risk \(AV@R_{\alpha}\). By nonnegativity of the process \((Y_{t})\),
Furthermore, \(\Phi ^{*}(x) = x^{+}/(1-\alpha )\) holds for \(x\in \mathbb{R}\) so that
where
Thus any compact \(K \supseteq [a^{\alpha}_{\ell},a^{\alpha}_{u} ]\) is a proper choice in Theorem 2.2 for \(\rho = AV@R_{\alpha}\).
In the next section, we propose a Monte Carlo method for solving the dual optimisation problem (2.2) empirically.
3 Dual empirical minimisation
The representation result in Theorem 2.2, in particular (2.2), is the starting point for our method to solve the optimal stopping problem (2.1). We start by fixing a metric space \(\Psi \) and a family \((M_{t}(\psi ))_{t\in [0,T]}\) of martingales parametrised by \(\psi \in \Psi \), adapted to \((\mathcal{F}_{t})_{t \in [0,T]}\) and satisfying \(M_{0}(\psi ) =0\). Define the process \(Z = (Z(x,\psi ))\) via
We shall find the “best” \(\psi \in \Psi \) by solving the empirical optimisation problem on a set of trajectories. To this end, we define the product space \((\Omega ^{\mathbb{N}},\mathcal {F}^{\mathbb{N}},P^{\mathbb{N}})\) and its natural projections
as well as the processes \(Z^{(i)}\), \(i=1,2,\ldots \), on \(\Omega^{\mathbb{N}} \times \mathbb{R}\times \Psi \) via
Fix some \(\lambda >0\) and let \((x_{n},\psi _{n})\) denote one of the random solutions of the random optimisation problem
where \(K\) is a compact set in ℝ as in Theorem 2.2. As \(n\to \infty \), this optimisation problem becomes \(P^{\mathbb{N}}\)-a.s. close to the optimisation problem
and we denote by \((x^{*},\psi ^{*})\) one of the latter’s (deterministic) solutions. The intuition behind (3.1) is simple. Setting \(\xi (x,M):=\sup _{t\in [0,T]}(\Phi ^{*}(x + Y_{t}) - x - M_{t})\), we minimise the expectation of \(\xi (x,M)\) over a family of martingales \(M\) and \(x\in K\). At the same time, we penalise the variance of this random variable. We also have in mind that the variance of \(\xi (x,M)\) is zero if the chosen family of martingales contains the martingale \(M^{*,x^{*}}\) defined in Theorem 2.2 (see Rogers [30]). In this way, a variance reduction effect can be achieved, as we shall illustrate in Sect. 5.
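A sketch of the penalised empirical objective in (3.1) for the AV@R case, where \(\Phi ^{*}(y)=y^{+}/(1-\alpha )\) (cf. Remark 2.4). The linear martingale family, the toy data and the crude random search (standing in for a proper SAA/gradient solver) are all hypothetical:

```python
import numpy as np

def penalised_objective(theta, Y, M_basis, alpha, lam):
    """Empirical objective (1/n) sum_i xi_i + lam * Var_n(xi) with
    xi_i = max_t (Phi*(x + Y_t^i) - x - M_t^i(psi)), taking
    Phi*(y) = y^+ / (1 - alpha) as for AV@R, and a martingale family
    linear in psi: M(psi) = sum_k psi_k * M_basis[k].
    theta = (x, psi_1, ..., psi_D)."""
    x, psi = theta[0], theta[1:]
    M = np.tensordot(psi, M_basis, axes=1)          # (n_paths, n_times)
    xi = np.max(np.maximum(x + Y, 0.0) / (1.0 - alpha) - x - M, axis=1)
    return xi.mean() + lam * xi.var(ddof=1)

# toy data: one basis martingale built from Brownian increments
# (the t = 0 grid point with M_0 = 0 is omitted for simplicity)
rng = np.random.default_rng(1)
dW = rng.normal(scale=0.1, size=(500, 40))
M_basis = np.cumsum(dW, axis=1)[None, :, :]         # shape (1, 500, 40)
Y = np.abs(np.cumsum(dW, axis=1))                   # toy reward paths

# crude random search standing in for a gradient-based optimiser
candidates = [np.zeros(2)] + [rng.normal(size=2) for _ in range(200)]
vals = [penalised_objective(th, Y, M_basis, 0.05, 1.0) for th in candidates]
best_theta = candidates[int(np.argmin(vals))]
```

The variance term with weight \(\lambda \) is what steers the search towards surely optimal martingales rather than martingales that are optimal in expectation only.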
Let us now analyse the properties of the measurable selector \((x_{n},\psi _{n})\). For any \(n \in \mathbb{N}\), set
The mapping \(\mathcal{D}_{n}\) can be interpreted as a set of Monte Carlo paths of the process \(Z\) used to construct \((x_{n},\psi _{n})\). In order to formulate our main results, we introduce the function, for a selector \((x_{n},\psi _{n})\),
With a slight abuse of notation, we set for \((x,\psi ) \in K \times \Psi \),
Now let \((K \times \Psi )_{\eta}\) denote the set of centres of a covering of \(K\times \Psi \) by a minimal number of \(\eta \)-balls with respect to the (semi)metric
Then define
where \(\mathcal{N}(K \times \Psi ,\varepsilon )\) stands for the minimal number of open \(d\)-balls with radius \(\varepsilon > 0\) needed to cover the set \(K \times \Psi \). We tacitly set \(\mathcal{N}(K \times \Psi ,\varepsilon ) = \infty \) if no finite cover exists.
Theorem 3.1
Let \(\delta \in (0,1)\) and \(\lambda >0\). Assume that \(|Z|\leq b <\infty \) with probability 1. Then it holds for all \(n \in \mathbb{N}\) that
where
Corollary 3.2
Let all assumptions of Theorem 3.1 be valid and further assume that
Then for all \(\varepsilon >0\), it holds that
In some situations, the bounds of Theorem 3.1 can be improved. Suppose that
that is, the set \(\Psi \) is assumed to be rich enough such that the solution \((x^{*},\psi ^{*})\) satisfies \(M(\psi ^{*})=M^{*,x^{*}}\), where \(M^{*,x^{*}}\) is the martingale part of the Doob–Meyer decomposition of \(V^{x^{*}}\). As already mentioned above, in this case it holds that
In the case that \(M(\psi ^{*})=M^{*,x^{*}}\) for some \(\psi ^{*}\), the selectors \((x_{n},\psi _{n})\) are nothing else but so-called M-estimators. So we may invoke the established theory on asymptotics of M-estimation. The reader is referred to van der Vaart and Wellner [35, Chap. 3] for comprehensive information. In this theory, a starting point is the so-called “well-separated minimum” condition
Property (3.3) is a basic assumption to find general criteria which ensure that the sequence \((x_{n},\psi _{n} )_{n\in \mathbb{N}}\) converges in probability to \((x^{*},\psi ^{*})\) (see van der Vaart and Wellner [35, Corollary 3.2.3]).
Since our metric \(d\) is assumed to be totally bounded, the topological closure of the set \(\{Z(x,\psi ) : (x,\psi )\in K\times \Psi \}\) with respect to the \(L^{1}\)-norm is compact. Note then that condition (3.3) is satisfied if and only if the restriction of the expectation operator to the \(L^{1}\)-closure \({\mathrm{{cl}}}(\{Z(x,\psi ): (x,\psi )\in K\times \Psi \})\) of \(\{Z(x,\psi ): (x,\psi )\in K\times \Psi \}\) has a unique minimum at \(Z(x^{*},\psi ^{*})\).
If we are interested in convergence rates, we must complement the “well-separated minimum” condition. The following type of identifiability condition is now standard in the literature on M-estimation (see van der Vaart and Wellner [35, Theorem 3.2.5]): There exist \(\overline{C}, \delta > 0\) such that
Now we are prepared to improve the convergence rates.
Theorem 3.3
Let \(\delta \in (0,1)\), \(\lambda >0\), \(C_{\lambda ,b}:=64 b^{3}\lambda ^{2}+2\lambda \) and \(| Z |\leq b <\infty \) with probability 1. Then under (3.2)–(3.4), for all \(n \in \mathbb{N}\) satisfying
it holds that
where \(c_{1},c_{2}>0\) are some universal constants,
and
Remark 3.4
Note that if \(\gamma (K \times \Psi ,n)\to 0\) as \(n\to \infty \) in such a way that
then
(see Sect. 7). In this sense, the bound in Theorem 3.3 is better than the one in Theorem 3.1.
4 Specification analysis for the class \(\Psi \)
In this section, we specify the convergence rates in (3.5) depending on the properties of the parameter space \(\Psi \). The convergence rate strongly depends on the quantity \(\gamma (K \times \Psi , n)\). This quantity in turn depends on the set \(\Psi \). Thus to analyse the convergence rate, we have to study the covering number of \(\Psi \). In what follows, we consider parametric families of martingales arising in the setting of diffusion processes. Let \((S_{t})_{t\in [0,T]}\) denote a \(d\)-dimensional diffusion process solving the system of SDEs
where \(\mu :[0,T]\times \mathbb{R}^{d} \to \mathbb{R}^{d}\) and \(\sigma :[0,T]\times \mathbb{R}^{d} \to \mathbb{R}^{d \times m}\) are Lipschitz-continuous in space and \(1/2\)-Hölder-continuous in time, with \(m\) denoting the dimension of the Brownian motion \(W=(W_{1},\ldots ,W_{m})^{\top}\). Then the martingale representation theorem implies that any square-integrable martingale \((M_{t})_{t\in [0,T]}\) with respect to the filtration \((\mathcal{F}_{t})_{t\in [0,T]}\) generated by \((W_{t})_{t\in [0,T]}\) and with \(M_{0}=0\) can be represented as
where \((G_{s})_{s\in [0,T]}\) is an \((\mathcal{F}_{t})_{t\in [0,T]}\)-adapted process which is square-integrable on \([0,T] \) in the sense of (4.3) below. Under some conditions, it can be shown by the Itô formula that the Doob martingale \((M_{t}^{*})_{t\in [0,T]}\) of the Snell process
for a function \(f:\mathbb{R}^{d}\to \mathbb{R}\) has a representation (4.2). More specifically, we may choose \(G_{s}=G(s,S_{s})\) for some measurable function \(G: [0,T] \times \mathbb{R}^{d}\to \mathbb{R}^{d}\) such that \((G_{s})_{s\in [0,T]}\) is square-integrable on \([0,T]\) as in (4.3) below; see Ye and Zhou [37, Theorem 5]. Therefore it is reasonable to parametrise a subclass of square-integrable martingales adapted to \((\mathcal{F}_{t})\) by functions \(\psi (t,x)=(\psi _{1}(t,x),\ldots ,\psi _{m}(t,x))\), satisfying
via
Note that this type of representation was already used to solve optimal stopping/control problems in a dual formulation; see e.g. Wang and Caflisch [36] and Ye and Zhou [37]. Denote by \(\mathcal{H}_{p}^{s}(\mathbb{R}^{d})\) the Sobolev space consisting of all functions \(f \in L^{p}(\mathbb{R}^{d})\) such that for every multi-index \(\alpha \) with \(|\alpha | \leq s\), the mixed partial derivative \(D^{\alpha}f\) exists in the weak sense and is in \(L^{p}(\mathbb{R}^{d})\). Further let \(\beta \in \mathbb{R}\) and \(\langle x \rangle ^{\beta}=(1+|x|^{2})^{\beta /2}\), where \(x\in \mathbb{R}^{d}\). For \(s-d/p>0\), we define the weighted Sobolev space
and
Let \(\pi _{t}\) denote the density function of \(S_{t}\). We set
Now let us first look at convergence rates in the case that \(\mathrm {{Var}}[Z(x^{*},\psi ^{*})]\) does not vanish. Based on an integrability condition on the density process \((\pi _{t})\), the rates will be described in terms of the degree \(s\) of smoothness that the functions in \(\Psi _{\pi}\) fulfil and the dimension \(d\) of their domain. Recalling the process \(Z = (Z(x,\psi ))\) introduced at the beginning of Sect. 3, the following result is an application of Theorem 3.1.
Theorem 4.1
Let \(p=2\), \(\beta \in \mathbb{R}, s \in \mathbb{N}\), \(\delta \in (0,1)\), \(\lambda >0\) and \(d \in \mathbb{N}\). Further, let \(\Psi \) be a set such that \(\Psi _{\pi}\subseteq \mathcal{H}_{2}^{s}([0,T] \times \mathbb{R}^{d}, \langle x \rangle ^{\beta})\) is bounded with respect to the norm
In addition, suppose that
for some \(\alpha >0\). If \(| Z |\leq b <\infty \) with probability 1 for some \(b\in \mathbb{R}\), then for \(s > (d+ 1)/2\), there exist constants \(\eta _{1}, \eta _{2}, \eta _{3}\) and \(\eta _{4}\), depending on \(\lambda ,b,d,s\) and \(\delta \) as well as on the compact set \(K\) and the function \(\Phi \), such that
1) for \(\alpha >s-(d+1)/2\) and \(s/(s+d+1) \leq 1/2\),
2) for \(\alpha >s-(d+1)/2\) and \(s/(s+d+1)> 1/2\),
3) for \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) \leq 1/2\),
4) for \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) > 1/2\),
The parameter \(\alpha \) in Theorem 4.1 may be viewed as a degree of integrability for the density process \((\pi _{t})\). The terms \(s/(s + d + 1)\) and \((\alpha /(d+ 1) + 1/2)/(\alpha /(d+ 1) + 3/2)\) occurring in the result are nondecreasing in \(s\) and \(\alpha \), respectively, with
and
So Theorem 4.1 tells us that for a fixed degree of integrability, the convergence rates are nondecreasing with respect to the degree of smoothness. However, the second and fourth cases show that in the case of a significant degree of smoothness in comparison with the dimension \(d\), there is always a saturation point beyond which the convergence rates cannot be improved by higher degrees of smoothness. In addition, for a given degree of smoothness, the higher the degree of integrability, the better the convergence rates, again with certain saturation points.
Let us turn to the situation when the assumptions of Theorem 3.3 hold. We may derive from Theorem 3.3 the next result which is qualitatively of the same nature as Theorem 4.1, but with doubled convergence rates.
Theorem 4.2
Let all conditions of Theorem 4.1 be satisfied and in addition suppose that properties (3.2)–(3.4) are valid. For \(s > (d+1)/2\), there exist constants \(\tilde{\eta}_{1},\tilde{\eta}_{2},\tilde{\eta}_{3},\tilde{\eta}_{4}\), depending on \(\lambda ,b,d,s\) and \(\delta \) as well as on the compact set \(K\) and the function \(\Phi \), such that
1) for \(\alpha >s-(d+1)/2\) and \(s/(s+d+1) \leq 1/2\),
2) for \(\alpha >s-(d+1)/2\) and \(s/(s+d+1)> 1/2\),
3) for \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) \leq 1/2\),
4) for \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) > 1/2\),
Remark 4.3
Theorem 4.1 implies that \(\mathcal{Q}_{\lambda}\left (x_{n},\psi _{n}\right )\) converges to \(\mathcal{Q}_{\lambda}\left (x^{*},\psi ^{*}\right )\) at a rate depending on the smoothness of the density \(\pi _{t}(x)\) and its decay for \(|x|\to \infty \). It is well known (see Friedman [21, Theorem 9.8]) that if the diffusion coefficient \(\sigma \) is uniformly elliptic and the coefficients \(\mu \) and \(\sigma \) are infinitely differentiable in \([0,T]\times \mathbb{R}^{d}\) with bounded derivatives of any order, then \(\partial _{t}^{s}\partial ^{r}_{x} \pi _{t}(x)\) exists for all positive integers \(r\) and \(s\). Moreover, it holds for all \(x\in \mathbb{R}^{d}\) and \(t>0\) that
Here ≲ means that the above inequality holds up to a constant only depending on \(s\) and \(r\). Hence (4.4) holds for an arbitrarily large \(\alpha \geq \beta \) and
for any norm-bounded class \(\Psi \subseteq \mathcal{H}_{2}^{s}([0,T] \times \mathbb{R}^{d},\langle x \rangle ^{\beta})\) with arbitrary but fixed \(\beta \leq 0\) and \(s>d+1\). Here we refer to the norm introduced in Theorem 4.1.
5 Numerical results
We use the Euler scheme and \(L=200\) discretisation points to approximate the solution of the SDE
In particular, we discretise the interval \([0,T]\) with
Then for computational reasons, we smooth our objective function
using a soft-max type method to get
Note that for \(p \to \infty \), the pointwise convergence \(\widetilde{Z}_{p} \to \widetilde{Z}\) holds. This follows from the observation that well-known relationships between \(L^{p}\)-norms (see e.g. Aliprantis and Border [1, Lemma 13.1]) yield
For our numerical study, we focus on the optimal stopping problems
where \(AV@R_{1- \alpha}\) denotes the risk measure average value at risk at level \(1-\alpha \) as introduced in Example 2.1. The real-valued martingale
can be approximated by the sum
For the space \(\Psi \), we take a linear span of trigonometric basis functions and use a gradient-based method to solve the resulting optimisation problem. Next, we present numerical examples of pricing American put and Bermudan max-call options. Some of these examples were discussed for standard optimal stopping in Glasserman [22, Chap. 8] and in Belomestny [6]. Note also that for the stopping problems considered in this section, some examples were presented in Belomestny and Krätschmer [8] albeit with different parameters.
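Before turning to the examples, the Euler path generation mentioned at the beginning of this section can be sketched. The coefficient functions below are those of the geometric Brownian motion with \(r=0.05\), \(\delta =0.1\), \(\sigma =0.2\) used in Example 5.1; the horizon and path count are hypothetical:

```python
import numpy as np

def euler_paths(mu, sigma, S0, T, L, n_paths, rng):
    """Euler scheme for dS_t = mu(t, S_t) dt + sigma(t, S_t) dW_t
    on the uniform grid t_l = l T / L, l = 0, ..., L."""
    dt = T / L
    S = np.empty((n_paths, L + 1))
    S[:, 0] = S0
    for l in range(L):
        dW = rng.normal(scale=np.sqrt(dt), size=n_paths)
        S[:, l + 1] = (S[:, l] + mu(l * dt, S[:, l]) * dt
                       + sigma(l * dt, S[:, l]) * dW)
    return S

# geometric Brownian motion drift (r - delta) s and volatility sigma s
rng = np.random.default_rng(2)
paths = euler_paths(lambda t, s: (0.05 - 0.1) * s,
                    lambda t, s: 0.2 * s,
                    S0=100.0, T=1.0, L=200, n_paths=1000, rng=rng)
```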
Example 5.1
Let \(S_{t}=S_{0} \exp ((r-\delta -\sigma ^{2}/2)t+\sigma W_{t})\) with \(r=0.05\), \(\delta =0.1\), \(\sigma =0.2\) and \(Y_{t}=\exp (-r t)(K_{\mathrm {c},\mathrm {p}}-S_{t})^{+}\), where \(K_{\mathrm {c},\mathrm {p}}\) denotes the strike price. Under these conditions, our algorithm approximates the solution of the optimal stopping problem
In our implementation, we let \(\Psi \) be a linear space of functions \(\psi :[0,T]\times \mathbb{R}\to \mathbb{R}\) such that
where
and
First we generate \(n=10{,}000\) paths to obtain the optimal values \((x_{n},\psi _{n})\). Then we generate 100,000 new paths to test the solution. For \(K_{\mathrm {c},\mathrm {p}}=100\), \(D=10\) and \(\alpha =0.05\), the results are presented in Table 1. It is interesting to see how the upper bounds depend on \(\alpha \). Setting \(S_{0}=100\) and using the same parameter values as above, we obtain Table 2. Here we solve the optimisation problem
and then divide the result by \(\alpha \). This allows us to increase \(p\) in (5.1) and to obtain better results.
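The discretised stochastic integral over the linear span used in this example can be sketched as follows; the cosine features below are hypothetical stand-ins for the basis functions \(\xi _{k}\), \(\zeta _{k}\) defined above:

```python
import numpy as np

def basis(t, s, D, T, s_scale=100.0):
    """Stand-in cosine features on [0,T] x R (the actual basis xi_k,
    zeta_k is specified in Example 5.1); returns shape (..., D*D)."""
    feats = [np.cos(np.pi * k * t / T) * np.cos(np.pi * j * s / s_scale)
             for k in range(D) for j in range(D)]
    return np.stack(feats, axis=-1)

def martingale_paths(psi_coef, t_grid, S, dW, D, T):
    """Discretised stochastic integral M_{t_m} ~ sum_{l<m} psi(t_l, S_{t_l}) dW_l
    for psi in the linear span of the basis; M starts in 0."""
    B = basis(t_grid[None, :-1], S[:, :-1], D, T)   # (n_paths, L, D*D)
    dM = (B @ psi_coef) * dW                        # integrand times increments
    return np.concatenate([np.zeros((S.shape[0], 1)),
                           np.cumsum(dM, axis=1)], axis=1)

# toy paths of a geometric Brownian motion (all parameters hypothetical)
rng = np.random.default_rng(3)
L, n, T, D = 50, 2000, 1.0, 3
t_grid = np.linspace(0.0, T, L + 1)
dW = rng.normal(scale=np.sqrt(T / L), size=(n, L))
S = np.concatenate([np.full((n, 1), 100.0),
                    100.0 * np.exp(np.cumsum(0.2 * dW, axis=1)
                                   - 0.02 * t_grid[1:])], axis=1)
M = martingale_paths(0.1 * np.ones(D * D), t_grid, S, dW, D, T)
```

Since each increment multiplies a predictable integrand by an independent Brownian increment, the resulting discrete process has mean zero, as a martingale started at zero should.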
Example 5.2
Now consider a Bermudan max-call option on two assets. For \(i=1,2\), let
where \(r,\delta ,\sigma \) are constants. This system of SDEs describes two identically distributed assets, where each underlying yields a dividend rate \(\delta \). At any time \(t \in \{t_{0},\ldots ,t_{I}\}\), the holder of the option may exercise it and receive the payoff
In our example, we set \(t_{i}=iT/I\), \(i=0,\ldots ,I\), and choose \(T=3\) as well as \(I=9\). For the linear space \(\Psi _{D}\) of functions \(\psi :[0,T]\times \mathbb{R}^{2} \to \mathbb{R}^{2}\), we consider
and
where \(\xi _{k}\) and \(\zeta _{k}\) are defined in Example 5.1. Now for \(K_{\mathrm {c},\mathrm {p}}=100\), \(r=0.05\), \(\delta =0.1\), \(\alpha =0.05\), \(\sigma =0.2\) and \(D=6\), we obtain Table 3. As in Example 5.1, it is interesting to vary \(\alpha \). By fixing \(S_{0}^{1}=S_{0}^{2}=100\), we get the results presented in Table 4. In order to compare the current approach with the one used in Belomestny and Krätschmer [8, Table 1], we take \(S^{1}_{0}=S_{0}^{2}=90\) and \(\alpha \in \{0.33,0.5,0.67,0.75\}\). The corresponding results are presented in Table 5. The upper bounds are worse than those in [8]. Note that in [8], a nested approach to compute martingales was used.
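The two-asset setting of this example can be simulated as follows. A minimal sketch with the parameter values quoted above; the reduced path count and the variable names are our illustrative choices.

```python
import numpy as np

# Two i.i.d. GBM assets and the discounted Bermudan max-call payoff
# at exercise dates t_i = i*T/I (parameters as in Example 5.2).
rng = np.random.default_rng(1)
n, I = 1000, 9
T, r, delta, sigma, K = 3.0, 0.05, 0.1, 0.2, 100.0
dt = T / I
t = np.linspace(0.0, T, I + 1)

dW = rng.standard_normal((n, 2, I)) * np.sqrt(dt)
S = 100.0 * np.exp(np.cumsum((r - delta - 0.5 * sigma**2) * dt + sigma * dW,
                             axis=2))
S = np.concatenate([np.full((n, 2, 1), 100.0), S], axis=2)

# Discounted payoff (max(S^1, S^2) - K)^+ at each exercise date.
payoff = np.exp(-r * t) * np.maximum(S.max(axis=1) - K, 0.0)
```

Each row of `payoff` is one exercisable cashflow path, which is the input to the dual optimisation above.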
Example 5.3
As in the example before, let
We define our reward function as
For \(I=9\), \(T=0.5\), \(r=0.06\), \(\delta =0\), \(K_{\mathrm {c},\mathrm {p}}=100\), \(\sigma =0.6\) and with the basis functions used in Example 5.2, we get the results presented in Table 6. By varying \(\alpha \), we get the results for \(S_{0}^{1}=S_{0}^{2}=100\) which are presented in Table 7.
In all the above examples, it is important to find a suitable compact subset \(K\) of ℝ in Theorem 2.2. Using the notation of Remark 2.4, this can be reduced to finding a lower estimate \(a_{\ell}^{1-\alpha}\) and an upper estimate \(a_{u}^{1-\alpha}\). For this purpose, note that in any of the above examples, the desired estimates may be derived from upper estimates for the quantity \(\sup _{\tau \in \mathcal {T}}E[S_{\tau}] \), where for some \(\mu ,\sigma \in \mathbb{R}\),
We may invoke the reflection principle for Brownian motion to get
Once we have found a suitable interval \(K := [a_{\ell},a_{u}]\), we proceed in the following way. First we fix a grid \(X=\{\overline{a}_{\ell}=x_{0}< x_{1}<\cdots < x_{J}= \overline{a}_{u} \}\). Then for a fixed \(x\in X\), we use the Longstaff–Schwartz algorithm to approximate the value
where \(X\) is the underlying Markov process with values in \(\mathbb{R}^{d}\) and \(f:\mathbb{R}^{d}\to \mathbb{R}\). To this end, we use a time discretisation by fixing a time grid \(0=t_{0}< t_{1}< \cdots <t_{L}=T\) on \([0,T]\). The Longstaff–Schwartz algorithm is now used to obtain estimates \(\widehat{C}_{0}^{x},\ldots ,\widehat{C}^{x}_{L}\) for the corresponding continuation functions based on polynomials of degree 3 and with 100,000 Monte Carlo paths of the process \(X\). After that, we approximate the value of
via
with \(\tau ^{(i)}(x)=\min \{0\leq \ell \leq L:f(X_{t_{\ell}}^{(i)})\geq \widehat{C}_{\ell}^{x}(X_{t_{\ell}}^{(i)}) \}\). Here \(X^{(i)}_{t_{0}},\ldots ,X^{(i)}_{t_{L}}\) with \(i=1,\ldots ,n\) are \(n\) trajectories of the process \(X\) independent of those used to approximate the continuation values. Note that due to the discretisation in \(x\), we may incur an additional upward bias in the estimate (5.2). On the other hand, the time discretisation introduces a downward bias that can partly compensate for it. Our numerical experiments suggest that both biases are negligible (for large enough \(J\) and \(L\)) compared to the downward bias due to the error of approximating the underlying continuation functions.
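The Longstaff–Schwartz step just described can be sketched compactly. This is an illustrative stand-alone version for a one-dimensional American put with degree-3 polynomial regression as in the text; the specific strike, rate and path count below are our assumptions, not the paper's.

```python
import numpy as np

# Longstaff-Schwartz sketch: continuation values regressed on
# polynomials of degree 3 over in-the-money paths.
rng = np.random.default_rng(2)
n, L = 20_000, 50
T, r, sigma, K, S0 = 1.0, 0.06, 0.2, 100.0, 100.0
dt = T / L

dW = rng.standard_normal((n, L)) * np.sqrt(dt)
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * dW, axis=1))
S = np.hstack([np.full((n, 1), S0), S])
payoff = np.maximum(K - S, 0.0)

value = payoff[:, -1].copy()                 # cashflows at maturity
for l in range(L - 1, 0, -1):
    value *= np.exp(-r * dt)                 # discount one step back
    itm = payoff[:, l] > 0                   # regress on in-the-money paths
    if itm.sum() > 10:
        X = np.vander(S[itm, l] / K, 4)      # degree-3 polynomial basis
        beta = np.linalg.lstsq(X, value[itm], rcond=None)[0]
        exercise = payoff[itm, l] > X @ beta
        value[itm] = np.where(exercise, payoff[itm, l], value[itm])
price = np.exp(-r * dt) * value.mean()       # lower-biased price estimate
```

An independent set of paths, as described above, would then be used to evaluate the estimated exercise rule without foresight bias.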
6 Proof of the main results
6.1 Preparations and notations
To prove Theorems 3.1 and 3.3, we need some preparation. Since the proofs of both theorems start in the same way, this preparation applies to both. Let \(\eta >0\). We denote by \((K \times \Psi )_{\eta}\) the set of centres of a minimal family of \(\eta \)-balls needed to cover \(K\times \Psi \), with respect to the semimetric
Fix \(n \in \mathbb{N}\) and \(\lambda >0\). By \((x_{n,\eta},\psi _{n,\eta})\), we denote a measurable selector of the set
and by \((x_{\eta}^{*},\psi _{\eta}^{*})\), we denote an element of the set \((K \times \Psi )_{\eta}\) satisfying
for a solution \((x^{*},\psi ^{*})\) of (3.1). Due to the construction of \((K \times \Psi )_{\eta}\), there always exists such an \((x_{\eta}^{*},\psi _{\eta}^{*})\), but it need not be unique. For \((x,\psi ) \in K \times \Psi \), let
as well as
where \(\tilde{Z} = (\tilde{Z}(x,\psi ))\) is an independent copy of \(Z\). With the above definitions, we have \(P^{\mathbb{N}}\)-a.s. for \(c\geq 0\) that
Indeed, the first inequality holds due to
which follows directly from the definitions of \((x_{n,\eta},\psi _{n,\eta})\) and \(h_{n,\lambda}\). Now we have to analyse
and
Let us start with the first term. Observe that
where
Note that for all \(n \in \mathbb{N}\),
At this point, it makes sense to separate the further steps of the proofs of the two theorems. For both proofs, however, we have to analyse the following terms, with the aim of finding upper bounds that hold with a given probability:
6.2 Outline for the proof of Theorem 3.1
The idea is to derive bounds for \(T_{1},T_{2},T_{3}\). To this end, we use several concentration inequalities: the Hoeffding inequality, the Bernstein inequality and a new one based on a bounded differences approach.
Let \(c=1\) and fix \(n \in \mathbb{N}\) and \(\eta >0\). We show in Sect. 6.6.1 that
For the analysis of \(T_{3}\), we notice that
and in Sect. 6.6.1, we obtain for \(\epsilon > 0\) that
With the help of these concentration inequalities, we can derive bounds for \(T_{1},T_{2},T_{3}\) that hold with a given probability by a suitable choice of \(\epsilon \). From bounds on \(T_{1},T_{2},T_{3}\), we easily obtain a bound for \(T_{1}+T_{2}+T_{3}\) holding with the given probability. The same bound then holds for \(2\mathcal {Q}_{\lambda}(x_{n},\psi _{n})-2\mathcal {Q}_{\lambda}(x^{*},\psi ^{*})\) with that probability because \(2\mathcal {Q}_{\lambda}(x_{n},\psi _{n})-2\mathcal {Q}_{\lambda}(x^{*},\psi ^{*}) \leq T_{1}+T_{2}+T_{3}\) holds \(P^{\mathbb{N}}\)-a.s.
6.3 Proof of Theorem 3.1
Fix \(n \in \mathbb{N}\) and \(\eta >0\), as well as \(\delta \in (0,1)\). Further, we impose that
Then we set
and, using the inequalities (6.5) and (6.6), we derive the estimates
Therefore, by elementary calculations, we arrive at
Concerning \(T_{3}\), let us set
Then we first derive
This leads again with elementary calculations to
Now we only need an upper estimate of \(E[h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*}) ]\), which we obtain as follows:
Above, the equality follows directly by definition. The first inequality follows from the factorisation \(x^{2} - y^{2} = (x - y)(x + y)\) together with the boundedness of \(Z\) and \(\tilde{Z}\). The second inequality holds because \(Z\) and \(\tilde{Z}\) are independent with identical distribution. The final inequality results from the definition of \((x_{\eta}^{*},\psi _{\eta}^{*})\); see (6.1). So by (6.10) and (6.11), we get for \(T_{3}\) that
Now, combining (6.9) and (6.12), we derive
and since we have \(P^{\mathbb{N}}\)-a.s. that
we finish with
Setting \(\eta =\gamma (K\times \Psi ,n)\), the assumption
is always satisfied, and we have
where
Now the statement of Theorem 3.1 follows immediately. □
6.4 Outline for the proof of Theorem 3.3
The proof of Theorem 3.3 is similar to the proof of Theorem 3.1, but relies on some different concentration inequalities given below. Let \(c \geq 2\), \(n \in \mathbb{N}\) and \(\eta >0\), as well as \(\epsilon >0\). With \(L\) from (6.18) below, we have
(see Sect. 6.6.2). Moreover, with \(C_{\lambda ,b}\) as in Theorem 3.3, we derive in Sect. 6.6.2 the estimates
and
6.5 Proof of Theorem 3.3
In the following, let \(\delta \in (0,1)\), \(c\geq 2\) and let \(n \in \mathbb{N}\) and \(\eta >0\) satisfy the condition
Let us introduce
Lemma 6.1
If (3.2)–(3.4) hold, then \(L <\infty \).
Proof
Let \((\overline{x}_{k},\overline{\psi}_{k})_{k\in \mathbb{N}}\) be any sequence in \(K\times \Psi \) satisfying the strict inequality \(E[Z(\overline{x}_{k},\overline{\psi}_{k})] > E[Z(x^{*}, \psi ^{*})] \) for \(k\in \mathbb{N}\) and
First of all, the sequence \((\mathrm {{Var}}[Z(\overline{x}_{k},\overline{\psi}_{k})])\) is bounded because the random variables \(Z(x,\psi )\) are \(P^{\mathbb{N}}\)-essentially bounded, uniformly in \((x,\psi )\in K\times \Psi \). Therefore, in order to show the finiteness of \(L\), it suffices to restrict our considerations to the case \(E[Z(\overline{x}_{k},\overline{\psi}_{k})]- E[Z(x^{*}, \psi ^{*})]\to 0\). In this situation, the “well-separated minimum” property (3.3) implies the convergence \(d ((\overline{x}_{k},\overline{\psi}_{k}), (x^{*},\psi ^{*}) )\to 0\). Therefore by the identifiability condition (3.4), we may find some \(\overline{C} > 0\) and \(k_{0}\in \mathbb{N}\) such that
Next, in view of (3.2),
for \(k\in \mathbb{N}\), and thus
This completes the proof due to the choice of the sequence \((\overline{x}_{k},\overline{\psi}_{k})_{k\in \mathbb{N}}\). □
From now on, we assume that the conditions (3.2)–(3.4) are satisfied so that the constant \(L\) defined in (6.18) is finite. In particular, we may introduce
Furthermore, we set
Then we may derive from (6.13) and (6.14) along with (6.17) that
In addition, if
we get from (6.15) that
Setting
we may conclude from (6.16) that
Next, recall that in view of (6.11), the inequality \(E[h_{n,\lambda}(x_{\eta}^{*},\psi _{\eta}^{*})]\leq (1+4 b \lambda )\eta \) holds. Then combining (6.19) and (6.22) with (6.21), we obtain under condition (6.20) that
and therefore
Let us set
Then (6.20) and (6.17) are fulfilled. By elementary calculation and the definition of \(\kappa (n)\), we may find for \(n> 4b (\gamma (K \times \Psi ,n) + C_{\lambda ,b}\log (8/\delta )/n +2/3) \log (8/\delta )\) some universal constants \(c_{1},c_{2}>0\) such that
where \(R_{1}(n,\delta )\), \(R_{2}(n,\lambda ,\delta )\) are as in (3.6) and (3.7), respectively. The proof is complete. □
6.6 Proofs of the concentration inequalities
Let us first give an auxiliary result which will turn out to be useful.
Lemma 6.2
Let \(\Lambda (x,\psi )\) be a random variable parametrised by \((x,\psi ) \in K \times \Psi \). If \(\mathcal{N}(K \times \Psi ,\eta )<\infty \), then for every \(z \in \mathbb{R}\), there is a pair \((\overline{x},\overline {\psi }) \in (K\times \Psi )_{\eta}\) (depending on \(z\)) such that
Proof
Since \((K \times \Psi )_{\eta}\) has finite cardinality \(\mathcal{N}(K \times \Psi ,\eta )\), we have
where \((\overline{x},\overline{\psi})\) denotes a maximiser over the finite set \((K \times \Psi )_{\eta}\). □
6.6.1 Proofs of the concentration inequalities for Theorem 3.1
Let us now prove the concentration inequalities used to prove Theorem 3.1. We start with (6.5).
Proof of (6.5)
Due to Lemma 6.2, there exists \((\overline {x},\overline {\psi }) \in (K\times \Psi)_{\eta}\) such that
Using Corollary A.3, we derive
So finally, we get
This shows (6.5) since we have chosen \(c = 1\). □
To prove (6.6), we need the following result for preparation.
Theorem 6.3
For \((x,\psi )\in K\times \Psi \) and \(t > 0\), it holds that
Proof
We want to apply the bounded differences inequality (see Boucheron et al. [14, Theorem 6.2]) to the function \(\overline{g}_{n}: ([-b,b]\times [-b,b] )^{n}\rightarrow \mathbb{R}\) defined by
Therefore it suffices to show that \(\overline{g}_{n}\) satisfies the so-called bounded differences condition (cf. [14]). For this purpose, let \(k\in \{1,\ldots ,n\}\) and consider arbitrary pairs \((z_{1},\overline{z}_{1}),\ldots , (z_{n},\overline{z}_{n})\) and \((z_{k}',\overline{z}_{k}')\) from \([-b,b]^{2}\). With \(\underline{z} := ((z_{1},\overline{z}_{1}),\ldots ,(z_{n}, \overline{z}_{n}) )\) and
we then have
So obviously \(\overline{g}_{n}\) satisfies the bounded differences condition with \(c_{k}:=16 b^{2}/n\) for \(k \in \{1,\ldots ,n\}\), and
Now [14, Theorem 6.2] provides the estimate
□
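Since the explicit definition of \(\overline{g}_{n}\) appears in a display not reproduced here, the bounded differences property can at least be illustrated numerically for a representative pair-average functional on \([-b,b]^{2}\) (our choice of functional is an assumption made purely for illustration); the constant \(16 b^{2}/n\) from the proof serves as the bound.

```python
import numpy as np

b, n = 1.0, 8
rng = np.random.default_rng(3)

def g_bar(pairs):
    # Representative pair-average functional on ([-b, b] x [-b, b])^n
    # (illustrative stand-in for the omitted definition of g_bar_n).
    return np.mean((pairs[:, 0] - pairs[:, 1]) ** 2)

pairs = rng.uniform(-b, b, size=(n, 2))
worst = 0.0
for _ in range(2000):
    k = rng.integers(n)
    perturbed = pairs.copy()
    perturbed[k] = rng.uniform(-b, b, size=2)   # change one coordinate pair
    worst = max(worst, abs(g_bar(pairs) - g_bar(perturbed)))

# Changing one pair moves this average by at most (2b)^2 / n <= 16 b^2 / n.
```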
We are now ready to prove (6.6).
Proof of (6.6)
First of all, by Lemma 6.2,
for some \((\overline {x},\overline {\psi })\in (K\times \Psi )_{\eta}\). Then the inequality follows immediately from Theorem 6.3. □
Finally, (6.7) may be proved by an application of Corollary A.3, whereas (6.8) follows from Theorem 6.3.
6.6.2 Proofs of the concentration inequalities for Theorem 3.3
Let us now prove the inequalities used for the proof of Theorem 3.3. Under the additional assumption that \(\mathrm {{Var}}[Z(x^{*},\psi ^{*})]=0\), we first give a lemma, recalling the semimetric \(d\) on \(K\times \Psi \) introduced in Sect. 6.1.
Lemma 6.4
Under the condition (3.2), we have
Proof
Assumption (3.2) means that \(Z(x^{*},\psi ^{*})\) and \(\tilde{Z}(x^{*},\psi ^{*})\) coincide \(P\)-a.s. Hence
Since \(Z\) and \(\tilde{Z}\) are bounded by the constant \(b\) and are identically distributed, using \(x^{2} - y^{2} = (x - y) (x + y)\) along with the triangle inequality yields
This completes the proof. □
The following auxiliary result is a useful consequence of Lemma 6.4.
Lemma 6.5
If (3.2) is satisfied, then for \((x,\psi )\in K\times \Psi \) and \(\epsilon > 0\),
Proof
Let \((x,\psi )\in K\times \Psi \) and \(\epsilon > 0\). Condition (3.2) implies that
In particular, \(2 g_{n}(x,\psi )\) is a so-called U-statistic with kernel \(q:\mathbb{R}^{2}\rightarrow \mathbb{R}\) defined by \(q(s,t) = (s - t)^{2}\). Hence we may draw on a Bernstein inequality for U-statistics (see e.g. Clémençon et al. [16, Appendix A]) to conclude that
where \(\lfloor n/2\rfloor \) denotes the integer part of \(n/2\). By using (3.2) again, we obtain
(see e.g. the proof of Lemma 6.4). Then the statement of Lemma 6.5 follows immediately from Lemma 6.4. □
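Behind the kernel \(q(s,t)=(s-t)^{2}\) lies the identity \(E[q(S,T)]=2\,\mathrm{Var}[S]\) for i.i.d. \(S,T\), which links the U-statistic to the variance. A quick simulation check, with a uniform distribution as an illustrative choice:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
z = rng.uniform(-1.0, 1.0, size=400)     # i.i.d. sample, Var = 1/3

# U-statistic with kernel q(s, t) = (s - t)^2 over all unordered pairs.
U = np.mean([(s - v) ** 2 for s, v in combinations(z, 2)])
# For i.i.d. S, T:  E[(S - T)^2] = 2 Var[S], which equals 2/3 here.
```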
Now we are ready to verify the concentration inequalities. Let us start with (6.13), recalling that \(L\) as defined in (6.18) is finite by Lemma 6.1.
Proof of (6.13)
Note first that in view of (3.2),
Hence we may assume without loss of generality that \((K \times \Psi )_{\eta}\setminus \{(x^{*},\psi ^{*})\} \neq \emptyset \). In view of Lemma 6.2, there is some \((\overline{x},\overline{\psi})\in (K\times \Psi )_{\eta}\) such that
If \((\overline{x},\overline{\psi}) = (x^{*},\psi ^{*})\), then (6.13) is shown. So let \((\overline{x},\overline{\psi})\) be different from \((x^{*},\psi ^{*})\). Then \(D(\overline{x},\overline{\psi})\neq 0\) in view of (3.3). This allows us to use the Bernstein inequality (see Corollary A.4 below). Thanks to \(\mathrm {{Var}}[Z(x^{*},\psi ^{*}) ] = 0\), we then arrive at
Since for all \((\overline {x},\overline {\psi }) \in (K \times \Psi )_{\eta}\) with \(D(\overline {x},\overline {\psi })\neq 0\), the constant \(L\) satisfies the inequality
we get
Then (6.13) may be derived immediately. □
Let us turn now to (6.14).
Proof of (6.14)
For \(c \geq 2\), we may select by Lemma 6.2 some \((\overline {x},\overline {\psi })\in ( K\times \Psi )_{\eta}\) with
Now with \(\epsilon := (t+\lambda (c-1) E[g(\overline{x},\overline {\psi })] )/2 \lambda (1+c)\), we may invoke Lemma 6.5 to observe that
By (3.2), the expectation \(E[g(\overline {x},\overline {\psi })]\) is nonnegative. Since in addition \(c \geq 2\), we may conclude that
This completes the proof. □
Concerning (6.15) and (6.16), note first that \(d ((x^{*}_{\eta},\psi ^{*}_{\eta}),(x^{*},\psi ^{*}) ) \leq \eta \) holds due to (6.1). In particular, \(E[g(x^{*}_{\eta},\psi ^{*}_{\eta})]\leq 8 b\eta \) by Lemma 6.4. Then (6.15) follows easily from Lemma 6.5. Moreover,
Thus (6.16) may be derived directly from the Bernstein inequality (see Corollary A.4). Hence we have shown all concentration inequalities necessary for our proof of Theorem 3.3.
7 Proof of Remark 3.4
It is easy to check that for a constant \(k\) not depending on \(n\) and \(\gamma \), we have
With the assumption that \(\lim _{n \to \infty} \gamma (K\times \Psi ,n)=0\), we get
Then by \(\lim _{n \to \infty} n\gamma (K \times \Psi ,n) = \infty \), we end up with
□
8 Proofs of Theorems 4.1 and 4.2
Let the assumptions of Theorem 4.1 be fulfilled, retaining the notation from its formulation. To prove Theorems 4.1 and 4.2, we need some preparations. These mainly concern estimates of different semimetrics, but may also be of independent interest.
8.1 Preparations and notations
Firstly, we endow the space \(K \times \Psi \) with the semimetric
This is well defined because \(Z(x,\psi )\) is assumed to be essentially bounded uniformly in \((x,\psi ) \in K\times \Psi \). Secondly, by assumption, we may equip the set \(\Psi _{\pi}\) with the \(L^{2}\)-metric \(d_{\Psi _{\pi}}\) defined by
Next, we want to find a suitable semimetric on the space \(\Psi \). It is based on the following observation.
Lemma 8.1
There exists some \(C_{1} > 0\) such that for \(\psi _{1},\psi _{2}\in \Psi \), the inequality
holds, where \(f_{\psi _{i}}(t,x) = \psi _{i}(t,x)\sqrt{\pi _{t}(x)}\) for \(i=1,2\).
Proof
By the Burkholder–Davis–Gundy inequality (for \(p=1\)), we may find some \(C_{1} > 0\) such that for \(\psi _{1}, \psi _{2}\in \Psi \), we have
Invoking Jensen’s inequality, we end up with
This completes the proof. □
Lemma 8.1 allows us to introduce the mapping
Obviously, it satisfies the properties of a semimetric. The minimal number of open \(d_{\Psi}\)-balls of radius \(r > 0\) to cover \(\Psi \) is denoted by \(\mathcal{N}(\Psi ,r)\), where \(\mathcal{N}(\Psi ,r) := \infty \) if no finite cover is available.
Let us introduce the mappings
which are obviously alternative semimetrics on \(K\times \Psi \). In the next step, we want to find an upper estimate of the semimetric \(d\) in terms of the semimetrics \(\rho _{1}\) and \(\rho _{2}\).
Theorem 8.2
There exists a constant \(C>1\) such that for \((x,\psi ),(x',\psi ')\in K\times \Psi \),
Proof
Set \(\overline{x} := (\max K)^{+} + 1\). The proof is based on a representation for convex functions. We use that for any \(x,x_{0} \in \mathbb{R}\) with \(x_{0} < x\), we have
where \(\Phi ^{*'}_{+}\) denotes the right derivative of \(\Phi ^{*}\). Since \(\Phi ^{*}\) and its right derivative are both nondecreasing, we may observe for \(x,x'\in K\) and \(\eta \geq 0\) that
where the last step additionally uses that \(\overline{x}\geq 1\) and that \(\Phi ^{*}\) is nonnegative on \([0,\infty )\). As a consequence, we may conclude by nonnegativity of \((Y_{t})\) that
Since \(|Z|\leq b\) \(P\)-a.s., we further obtain
for every \(\psi \in \Psi \) and any \(t\in [0,T]\). Each martingale \(M(\psi )\) is continuous and square-integrable on \([0,T]\) so that, by Doob’s \(L^{2}\)-inequality,
Now the proof is straightforward: We fix any \(\overline{\psi}\in \Psi \). Combining (8.2) with (8.3) and (8.4), we end up with
for \(x,x'\in K\) and \(\psi ,\psi '\in \Psi \). The proof is complete. □
Let us introduce some further notation. For any semimetric \(\overline{d}\) on \(K\times \Psi \), we denote by \(\mathcal{N}(K\times \Psi ,r,\overline{d})\) the minimal number of open \(\overline{d}\)-balls of radius \(r > 0\) to cover \(K\times \Psi \). Here we set \(\mathcal{N}(K\times \Psi ,r,\overline{d}) := \infty \) if no finite cover is available. These covering numbers induce the numbers
Proposition 8.3
Let \(\gamma (\Psi ,n) := \inf \{r > 0 : \log \mathcal{N}(\Psi ,r)\leq n r \} \to 0\) for \(n \to \infty \). Then for \(C > 1\), it holds that
where \(\bar{d}^{C}_{K\times \Psi}\) denotes the semimetric on \(K\times \Psi \) defined by \(\bar{d}^{C}_{K\times \Psi} := \rho _{1} + C\rho _{2}\).
Proof
We use the notation \(\mathcal{N}(K,r)\) for the minimal number of open intervals of radius \(r > 0\) to cover \(K\). By Buldygin and Kozachenko [15, Lemma 3.2.1], we have
By compactness, the set \(K\) is a subset of \([-A,A]\) for some \(A>0\). In particular, the inequality \(\mathcal{N}(K,r) \leq 1 + 2 A/r\) holds for every \(r > 0\). Moreover, the mapping
is strictly decreasing and differentiable and satisfies \(\varphi (r)\to 0\) for \(r\to \infty \) as well as \(\varphi (r)\to \infty \) for \(r\to 0\). In particular, \(\varphi \) is a strictly decreasing bijection with inverse \(\varphi ^{-1}\). Then, defining \(\gamma (K,n) := \inf \{r > 0 : \log \mathcal{N}(K,r)\leq n r\}\), we obtain
Now let \(\epsilon > 0\). Since \(\gamma (\Psi ,n)\to 0\) for \(n\to \infty \) by assumption, we may find in view of (8.6) some \(n_{0}\in \mathbb{N}\) such that
for \(n\in \mathbb{N}\) with \(n\geq n_{0}\). Then for \(n\in \mathbb{N}\) with \(n\geq n_{0}\), there exist \(r_{n}\in (0,\epsilon /(4C))\) and \(\overline{r}_{n}\in (0,\epsilon /4)\) such that
and
Hence by (8.5),
Hence \(\gamma (K\times \Psi ,n)\leq \epsilon \) for \(n\in \mathbb{N}\) with \(n\geq n_{0}\). This completes the proof. □
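The quantity \(\gamma (K,n)\) used above can be evaluated numerically from the covering bound \(\mathcal{N}(K,r)\leq 1+2A/r\): since \(r\mapsto \log (1+2A/r)\) is decreasing and \(r\mapsto nr\) is increasing, the defining infimum is the unique crossing point. A sketch, with \(A=1\) as an illustrative value:

```python
import math
from scipy.optimize import brentq

A = 1.0   # K is contained in [-A, A] (illustrative value)

def gamma_K(n):
    # Smallest r > 0 with log N(K, r) <= n*r, using N(K, r) <= 1 + 2A/r.
    # f is strictly decreasing, positive near 0 and negative for large r,
    # so the crossing point is found by root bracketing.
    f = lambda r: math.log(1.0 + 2.0 * A / r) - n * r
    return brentq(f, 1e-12, 10.0)
```

As expected from (8.6), `gamma_K(n)` decreases to 0 as `n` grows.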
Henceforth, we denote by \(\mathcal {N}(\Psi _{\pi},\varepsilon )\) the covering number of \(\Psi _{\pi}\) by \(\varepsilon \)-balls with respect to \(d_{\Psi _{\pi}}\). Furthermore, we define
Finally, let us introduce the following notation: \(f(x)\lesssim g(x)\) for two functions \(f,g:\mathbb{R}^{d}\to \mathbb{R}\) means that there exists a constant \(C>0\) such that \(f(x)\leq Cg(x)\) for all \(x \in \mathbb{R}^{d}\).
8.2 Proofs of Theorems 4.1 and 4.2
Setting \(\bar{d}^{C}_{K \times \Psi} := \rho _{1} + C\rho _{2}\), we may find by Theorem 8.2 some \(C > 1\) such that
The idea of the proofs of Theorems 4.1 and 4.2 is based on a result given by Nickl and Pötscher [28]. Under the imposed assumptions, this result enables us to give analytical upper estimates for \(\gamma (K \times \Psi ,n,\bar{d}^{C}_{K \times \Psi})\). Then we use these analytical estimates and apply Theorems 3.1 and 3.3, respectively, to derive an analytical bound for the deviations \(\mathcal {Q}_{\lambda}(x_{n},\psi _{n})-\mathcal {Q}_{\lambda}(x^{*},\psi ^{*})\).
To calculate the bounds for \(\gamma (K \times \Psi ,n,\bar{d}^{C}_{K \times \Psi})\), we already know by (8.5) that
and due to (8.1), there is \(C_{1}>0\) such that
On the one hand, we can calculate with \(K\subseteq [-A,A]\) for some \(A > 0\) that
On the other hand, Nickl and Pötscher [28, Corollary 4] gives for \(\alpha >s-(d+1)/2\) that
and for \(\alpha < s-(d+1)/2\) that
So in view of (8.7), we end up with
where \(\ell =(d+1)/s\) in the case \(\alpha >s-(d+1)/2\), \(\ell =(\alpha /(d+1)+1/2)^{-1}\) in the case \(\alpha < s-(d+1)/2\), and \(h_{1}=8CA\) as well as \(h_{2}=4C_{1}\). To calculate the convergence rates, we divide the set \(M:=\{\epsilon >0 : h_{2}^{\ell}/\epsilon ^{ \ell} + h_{1}/\epsilon \leq n\epsilon \}\) into the subsets \(M_{1}\) and \(M_{2}\) satisfying \(M=M_{1} \cup M_{2}\), where
and note that \(\inf M =\min \{\inf M_{1}, \inf M_{2}\} \). Further we distinguish the following two cases:
Case 1: \(\ell \geq 1\). In this case, we have on \(M_{1}\) that
Therefore
and so \(\inf M_{1} \leq \inf M_{1}' \). A short calculation yields
On \(M_{2}\), we have
A short calculation shows for \(n \leq h_{2}^{\ell}+h_{1}\) that
and for \(n > h_{2}^{\ell}+h_{1}\), we have \(M_{2}'= [1, \infty ) \). Combining these results and setting \(\inf \emptyset := \infty \), we have
Case 2: \(\ell \in (0,1)\). On \(M_{1}\), we have
and so we get
On \(M_{2}\), we have
So we have for \(n \leq h_{2}^{\ell}+h_{1}\) that
and for \(n > h_{2}^{\ell}+h_{1}\) that \(M_{2}''= [1,\infty ) \). So in conclusion, we have for the second case that
Hence in view of (8.8), we end up with
and
From this point on, we want to apply the results of Theorems 3.1 and 3.3, respectively. The assumptions imposed in Theorems 4.1 and 4.2 therefore differ so as to meet the requirements of Theorems 3.1 and 3.3, respectively.
Let us first prove Theorem 4.1. We apply Theorem 3.1 with the estimates (8.9), (8.10). We may find constants \(\hat{\eta}_{1},\hat{\eta}_{2},\hat{\eta}_{3},\hat{\eta}_{4}\), depending on \(b,\lambda ,\delta , s, d, h_{1}\) and \(h_{2}\) (and thus also on the set \(K\) and \(\Phi \)) such that the following inequalities hold with probability at least \(1-\delta \):
Case 1: \(\alpha >s-(d+1)/2\) and \(s/(s+d+1) \leq 1/2\). Then \(\ell = (d+1)/s \geq 1\) and
Case 2: \(\alpha >s-(d+1)/2\) and \(s/(s+d+1)> 1/2\). Then \(\ell = (d+1)/s < 1\) and
Case 3: \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) \leq 1/2\). Then \(\ell = (\alpha /(d+1) + 1/2 )^{-1}\geq 1\) and
Case 4: \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) > 1/2\). Then \(\ell = (\alpha /(d+1) + 1/2 )^{-1} < 1\) and
Now the statement of Theorem 4.1 follows immediately. □
Let us now turn to the proof of Theorem 4.2. We can apply Theorem 3.3 for all \(n \in \mathbb{N}\) satisfying
where \(C_{\lambda ,b}\) is as in Theorem 3.3. Hence we may select constants \(\overline{\eta}_{1},\overline{\eta}_{2},\overline{\eta}_{3}\), \(\overline{\eta}_{4}\), depending on \(b,\lambda ,\delta , s, d, h_{1}\) and \(h_{2}\) (and thus also on the set \(K\) and \(\Phi \)) such that the following inequalities hold with probability at least \(1-\delta \):
Case 1: \(\alpha >s-(d+1)/2\) and \(s/(s+d+1) \leq 1/2\). Then \(\ell = (d+1)/s\geq 1\) and
Case 2: \(\alpha >s-(d+1)/2\) and \(s/(s+d+1)> 1/2\). Then \(\ell = (d+1)/s < 1\) and
Case 3: \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) \leq 1/2\). Then \(\ell = (\alpha /(d+1) + 1/2 )^{-1}\geq 1\) and
Case 4: \(\alpha < s-(d+1)/2\) and \((\alpha /(d+1)+1/2)/(\alpha /(d+1)+3/2) > 1/2\). Then \(\ell = (\alpha /(d+1) + 1/2 )^{-1} < 1\) and
Note that here, the bounds can even be derived for all \(n \in \mathbb{N}\); this follows from the definition of \(\kappa \) in the proof of Theorem 3.3. The proof of Theorem 4.2 is complete. □
9 Proof of Remark 4.3
Let \(\pi (x,t) = \pi _{t}(x)\) denote the density of the diffusion process given in (4.1). Furthermore, let \(\overline {\sigma }:=\frac{1}{2}\sigma \sigma ^{\top}\), where \(\sigma ^{\top}\) denotes the transpose of \(\sigma \). Then the Fokker–Planck equation states that
This is a parabolic partial differential equation. To show that, under some conditions on \(\mu \) and \(\sigma \), the density \(\pi \) is infinitely differentiable in space and time, we want to make use of Friedman [21, Theorem 3.11]. To apply that theorem, we need to impose that \(\overline {\sigma }\) is uniformly elliptic, i.e., there exists \(\lambda >0\) such that for all \((t,x) \in [0,T]\times \mathbb{R}^{d}\) and all \(\xi \in \mathbb{R}^{d}\),
Theorem 9.1
Let \(\overline {\sigma }\) be uniformly elliptic and let \(\sigma \) and \(\mu \) be \(p\) times Hölder-differentiable in space and \(q\) times Hölder-differentiable in time. If \(p=q=\infty \), then the partial derivative
exists for all \(0\leq k,\ell <\infty \) and is Hölder-continuous.
Proof
Applying [21, Theorem 3.11] requires a short calculation. For brevity, we set
With elementary analysis, we write (9.1) as
Due to our assumptions, \(b(x,t)\) and \(c(x,t)\) are infinitely Hölder-differentiable. Now [21, Theorem 3.11] is applicable and our claim follows. □
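As a one-dimensional illustration of (9.1) (not part of the proof), one can verify symbolically that the lognormal transition density of geometric Brownian motion \(dX_{t}=\mu X_{t}\,dt+\sigma X_{t}\,dW_{t}\), \(X_{0}=1\), satisfies its Fokker–Planck equation:

```python
import sympy as sp

x, t, mu, sigma = sp.symbols('x t mu sigma', positive=True)

# Lognormal transition density of GBM started at X_0 = 1.
p = sp.exp(-(sp.log(x) - (mu - sigma**2 / 2) * t) ** 2
           / (2 * sigma**2 * t)) / (x * sigma * sp.sqrt(2 * sp.pi * t))

# Fokker-Planck residual: dp/dt + d/dx(mu*x*p) - 1/2 d^2/dx^2(sigma^2 x^2 p).
residual = (sp.diff(p, t) + sp.diff(mu * x * p, x)
            - sp.Rational(1, 2) * sp.diff(sigma**2 * x**2 * p, x, 2))

# The residual vanishes identically; evaluate at generic numeric values.
val = residual.subs({x: 1.3, t: 0.7, mu: 0.1, sigma: 0.4}).evalf()
```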
References
Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis, 3rd edn. Springer, New York (2006)
Andersen, L., Broadie, M.: A primal–dual simulation algorithm for pricing multi-dimensional American options. Manag. Sci. 50, 1222–1234 (2004)
Bayraktar, E., Yao, S.: Optimal stopping for non-linear expectations – part I. Stoch. Process. Appl. 121, 185–211 (2011)
Bayraktar, E., Yao, S.: Optimal stopping for non-linear expectations – part II. Stoch. Process. Appl. 121, 212–264 (2011)
Bayraktar, E., Yao, S.: Optimal stopping with random maturity for nonlinear expectations. Stoch. Process. Appl. 127, 2586–2629 (2017)
Belomestny, D.: Solving optimal stopping problems via empirical dual optimization. Ann. Appl. Probab. 23, 1988–2019 (2013)
Belomestny, D., Hübner, T., Krätschmer, V., Nolte, S.: Minimax theorems for American options without time-consistency. Finance Stoch. 23, 209–238 (2019)
Belomestny, D., Krätschmer, V.: Optimal stopping under model uncertainty: randomized stopping times approach. Ann. Appl. Probab. 26, 1260–1295 (2016)
Belomestny, D., Krätschmer, V.: Addendum to “optimal stopping under model uncertainty: randomized stopping times approach”. Ann. Appl. Probab. 27, 1289–1293 (2017)
Belomestny, D., Schoenmakers, J.: Advanced Simulation-Based Methods for Optimal Stopping and Control. Palgrave Macmillan, London (2018)
Ben-Tal, A., Teboulle, M.: An old-new concept of convex risk measures: the optimized certainty equivalent. Math. Finance 17, 449–476 (2007)
Björk, T., Khapko, M., Murgoci, A.: On time-inconsistent stochastic control in continuous time. Finance Stoch. 21, 331–360 (2017)
Björk, T., Murgoci, A.: A theory of Markovian time-inconsistent stochastic control in discrete time. Finance Stoch. 18, 545–592 (2014)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities – A Nonasymptotic Theory of Independence. Clarendon Press, Oxford (2012)
Buldygin, V.V., Kozachenko, Y.: Metric Characterization of Random Variables and Random Processes. Am. Math. Soc., Providence (2000)
Clémençon, S., Lugosi, G., Vayatis, N.: Ranking and empirical minimization of U-statistics. Ann. Stat. 36, 844–874 (2008)
Delbaen, F.: The structure of m-stable sets and in particular of the set of risk neutral measures. In: Émery, M., Yor, M. (eds.) Memoriam Paul-André Meyer: Séminaire de Probabilités XXXIX. Lecture Notes in Mathematics, vol. 1874, pp. 215–258. Springer, Berlin (2006)
Desai, V.V., Farias, V.F., Moallemi, C.C.: Pathwise optimization for optimal stopping problems. Manag. Sci. 58, 2292–2308 (2012)
Ekren, I., Touzi, N., Zhang, J.: Optimal stopping under nonlinear expectation. Stoch. Process. Appl. 124, 3277–3311 (2014)
Föllmer, H., Schied, A.: Stochastic Finance, 4th edn. de Gruyter, Berlin (2016)
Friedman, A.: Partial Differential Equations of Parabolic Type. Courier Dover Publications, Mineola (2008)
Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, Berlin (2003)
Huang, Y.-J., Nguyen-Huu, A., Zou, Z.: General stopping behaviors of naive and noncommitted sophisticated agents, with application to probability distortion. Math. Finance 30, 310–340 (2020)
Huang, Y.-J., Yu, X.: Optimal stopping under model ambiguity: a time-consistent equilibrium approach. Math. Finance 31, 979–1012 (2021)
Krätschmer, V., Schoenmakers, J.: Representations for optimal stopping under dynamic monetary utility functionals. SIAM J. Financ. Math. 1, 811–832 (2010)
Kupper, M., Schachermayer, W.: Representation results for law invariant time consistent functions. Math. Financ. Econ. 2, 189–210 (2009)
Maccheroni, F., Marinacci, M., Rustichini, A.: Ambiguity, aversion, robustness, and the variational representation of preferences. Econometrica 74, 1447–1498 (2006)
Nickl, R., Pötscher, B.: Bracketing metric entropy rates and empirical central limit theorems for function classes of Besov- and Sobolev-type. J. Theor. Probab. 20, 177–199 (2007)
Riedel, F.: Optimal stopping with multiple priors. Econometrica 77, 857–908 (2009)
Rogers, L.C.G.: Monte Carlo valuation of American options. Math. Finance 12, 271–286 (2002)
Selten, R.: Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit: Teil I: Bestimmung des dynamischen Preisgleichgewichts. Z. Gesamte Staatswiss. 121, 301–324 (1965)
Selten, R.: Reexamination of the perfectness concept for equilibrium points in extensive games. Int. J. Game Theory 4, 25–55 (1975)
Strotz, R.H.: Myopia and inconsistency in dynamic utility maximization. Rev. Econ. Stud. 23, 165–180 (1955)
Trevino-Aguilar, E.: American options in incomplete markets: upper and lower Snell envelopes and robust partial hedging. PhD Thesis, Humboldt University of Berlin (2008). Available online at https://edoc.hu-berlin.de/handle/18452/16472
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, New York (1996)
Wang, Y., Caflisch, R.: Fast computation of upper bounds for American-style options without nested simulation. J. Comput. Finance 13(4), 95–125 (2010)
Ye, F., Zhou, E.: Information relaxation and dual formulation of controlled Markov diffusions. IEEE Trans. Autom. Control 60, 2676–2691 (2015)
Acknowledgements
D.B. was supported by a grant for research centers in the field of AI provided by the Analytical Center for the Government of the Russian Federation (ACRF) in accordance with the agreement on the provision of subsidies (identifier of the agreement 000000D730321P5Q0002) and the agreement No. 70-2021-00139 with HSE University.
The authors thank the reviewers for useful comments and suggestions which have helped to improve an earlier draft.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Appendix
Let \(C > 0\) and let \(X_{1},\dots ,X_{n}\) be independent random variables with \(0\leq X_{i}\leq C\) for \(i=1,\dots ,n\). Adapting Boucheron et al. [14, Theorem 2.8] to these conditions, we obtain the following Hoeffding inequality.
Theorem A.1
Under the above conditions, it holds for every \(t>0\) that
\[ P\bigg(\sum_{i=1}^{n}X_{i}-E\bigg[\sum_{i=1}^{n}X_{i}\bigg]\geq t\bigg)\leq \exp \bigg(-\frac{2t^{2}}{nC^{2}}\bigg). \]
The Bernstein inequality reads as follows (cf. van der Vaart and Wellner [35, Lemma 2.2.9]).
Theorem A.2
Under the above conditions, it holds for every \(t>0\) that
\[ P\bigg(\bigg|\sum_{i=1}^{n}X_{i}-E\bigg[\sum_{i=1}^{n}X_{i}\bigg]\bigg|\geq t\bigg)\leq 2\exp \bigg(-\frac{t^{2}}{2(v+Ct/3)}\bigg), \]
where \(v:=\sum_{i=1}^{n}\operatorname{Var}(X_{i})\).
In our setting, the random variables are not only independent, but also identically distributed. The following corollaries are immediate consequences of the Hoeffding and the Bernstein inequality, respectively.
Corollary A.3
Let \(X_{1},\ldots ,X_{n}\) be i.i.d. on a probability space \((\Omega ,\mathcal {F},P)\) and satisfy \(\sup _{1 \leq i \leq n} |X_{i}| \leq b <\infty \) \(P\)-a.s. Then for every \(t>0\),
\[ P\bigg(\bigg|\frac{1}{n}\sum_{i=1}^{n}\big(X_{i}-E[X_{1}]\big)\bigg|\geq t\bigg)\leq 2\exp \bigg(-\frac{nt^{2}}{2b^{2}}\bigg). \]
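The derivation from the Hoeffding inequality can be sketched as follows, assuming the one-sided bound \(\exp (-2t^{2}/(nC^{2}))\) of Theorem A.1 in its standard form for independent variables with values in \([0,C]\). Setting \(Y_{i}:=X_{i}+b\) gives \(0\leq Y_{i}\leq 2b\), and applying that bound with \(C=2b\) at level \(nt\) to each of the two tails yields
\[ P\bigg(\bigg|\frac{1}{n}\sum_{i=1}^{n}\big(X_{i}-E[X_{1}]\big)\bigg|\geq t\bigg)\leq 2\exp \bigg(-\frac{2(nt)^{2}}{n(2b)^{2}}\bigg)=2\exp \bigg(-\frac{nt^{2}}{2b^{2}}\bigg). \]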
Corollary A.4
Let \(X_{1},\ldots ,X_{n}\) be i.i.d. random variables on some probability space \((\Omega ,\mathcal {F},P)\) and satisfy \(\sup _{1 \leq i \leq n} |X_{i}| \leq b < \infty \) \(P\)-a.s. Then for every \(t>0\),
\[ P\bigg(\bigg|\frac{1}{n}\sum_{i=1}^{n}\big(X_{i}-E[X_{1}]\big)\bigg|\geq t\bigg)\leq 2\exp \bigg(-\frac{nt^{2}}{2(\operatorname{Var}(X_{1})+2bt/3)}\bigg). \]
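As a sanity check, the bounds can be compared against a Monte Carlo estimate of the tail probability of a sample mean. The following Python sketch (not part of the paper; the uniform distribution, the parameter values and all variable names are illustrative choices) uses the Hoeffding-type bound \(2\exp (-nt^{2}/(2b^{2}))\) and a Bernstein-type bound \(2\exp (-nt^{2}/(2(\operatorname{Var}(X_{1})+2bt/3)))\) for i.i.d. variables bounded by \(b\):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 2000        # sample size per repetition
b = 1.0         # almost-sure bound on |X_i|
t = 0.05        # deviation level
n_runs = 2000   # Monte Carlo repetitions

# X_i uniform on [-b, b]: i.i.d., |X_i| <= b a.s., E[X_1] = 0, Var(X_1) = b**2 / 3
sample_means = rng.uniform(-b, b, size=(n_runs, n)).mean(axis=1)
empirical_tail = float(np.mean(np.abs(sample_means) >= t))

# Hoeffding-type bound: 2 * exp(-n t^2 / (2 b^2))
hoeffding = 2.0 * np.exp(-n * t**2 / (2.0 * b**2))
# Bernstein-type bound: 2 * exp(-n t^2 / (2 (Var(X_1) + 2 b t / 3)))
bernstein = 2.0 * np.exp(-n * t**2 / (2.0 * (b**2 / 3.0 + 2.0 * b * t / 3.0)))

print(f"empirical tail:  {empirical_tail:.5f}")
print(f"Hoeffding bound: {hoeffding:.5f}")
print(f"Bernstein bound: {bernstein:.5f}")

# The distribution-free bound must dominate the empirical tail probability.
assert empirical_tail <= hoeffding
```

For these parameters the Bernstein-type bound is considerably tighter than the Hoeffding-type one, since \(\operatorname{Var}(X_{1})=b^{2}/3\) is well below the worst-case variance implicit in Hoeffding's inequality.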
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Belomestny, D., Hübner, T. & Krätschmer, V. Solving optimal stopping problems under model uncertainty via empirical dual optimisation. Finance Stoch 26, 461–503 (2022). https://doi.org/10.1007/s00780-022-00480-z
Keywords
- Model uncertainty
- Optimal stopping
- Dual representation
- Empirical dual optimisation
- Generative models
- Covering numbers
- Concentration inequalities