1 Introduction

Since the works of Kolmogorov, Hardy and Littlewood, Wiener, Doob and many other mathematicians, maximal inequalities have played an important role in analysis and probability. One of the main goals of this paper is to present a method of proving such estimates for continuous-time Hilbert-space-valued local martingales satisfying differential subordination.

We start by introducing the necessary background and notation. Let \((\Omega ,\mathcal{F },\mathbb P )\) be a complete probability space, filtered by a nondecreasing right-continuous family \((\mathcal{F }_t)_{t\ge 0}\) of sub-\(\sigma \)-fields of \(\mathcal{F }\). In addition, we assume that \(\mathcal{F }_0\) contains all the events of probability \(0\). Let \(X,\, Y\) be two adapted local martingales, taking values in a certain separable Hilbert space \(\mathcal H \) with norm \(|\cdot |\) and scalar product \(\langle \cdot ,\cdot \rangle \). With no loss of generality, we may take \(\mathcal H =\ell ^2\). As usual, we assume that the trajectories of the processes are right-continuous and have limits from the left. The symbol \([X,X]\) will stand for the quadratic covariance process of \(X\): this object is given by \([X,X]=\sum _{n=1}^\infty [X^n,X^n]\), where \(X^n\) denotes the \(n\)th coordinate of \(X\) and \([X^n,X^n]\) is the usual square bracket of the real-valued martingale \(X^n\) (see e.g. Dellacherie and Meyer [15] for details). In what follows, \(X^*=\sup _{t\ge 0}|X_t|\) will denote the maximal function of \(X\); we also use the notation \(X^*_t=\sup _{0\le s\le t}|X_s|\). Furthermore, for \(1\le p\le \infty \), we shall write \(||X||_p=\sup _{t\ge 0}||X_t||_p\) and \(|||X|||_p=\sup _\tau ||X_\tau ||_p\), where the second supremum is taken over all adapted bounded stopping times \(\tau \).

Throughout the paper we assume that the process \(Y\) is differentially subordinate to \(X\). This concept was originally introduced by Burkholder [8] in the discrete-time case: a martingale \(g=(g_n)_{n\ge 0}\) is differentially subordinate to \(f=(f_n)_{n\ge 0}\), if for any \(n\ge 0\) we have \(|\text{ d}g_n|\le |\text{ d}f_n|\). Here \(\text{ d}f=(\text{ d}f_n)_{n\ge 0}\), \(\text{ d}g=(\text{ d}g_n)_{n\ge 0}\) are the difference sequences of \(f\) and \(g\), respectively, given by the equations

$$\begin{aligned} f_n=\sum _{k=0}^n \text{ d}f_k\quad \text{ and}\quad g_n=\sum _{k=0}^n \text{ d}g_k,\quad n=0,\,1,\,2,\,\ldots . \end{aligned}$$

The extension of this domination to the continuous-time setting is due to Bañuelos and Wang [3] and Wang [23]. We say that \(Y\) is differentially subordinate to \(X\) if the process \(([X,X]_t-[Y,Y]_t)_{t\ge 0}\) is nondecreasing and nonnegative as a function of \(t\). If we treat given discrete-time martingales \(f\), \(g\) as continuous-time processes (via \(X_t=f_{\lfloor t\rfloor }\) and \(Y_t=g_{\lfloor t\rfloor }\), \(t\ge 0\)), we see that this domination is consistent with Burkholder’s original definition of differential subordination.
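To make the consistency explicit, note that the embedded processes are piecewise constant, so their square brackets are pure jump and

$$\begin{aligned}{}[X,X]_t-[Y,Y]_t=\sum _{k=0}^{\lfloor t\rfloor } \left(|\text{ d}f_k|^2-|\text{ d}g_k|^2\right),\quad t\ge 0, \end{aligned}$$

and the right-hand side is nonnegative and nondecreasing in \(t\) precisely when \(|\text{ d}g_k|\le |\text{ d}f_k|\) for all \(k\ge 0\).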

To illustrate this notion, consider the following example. Suppose that \(X\) is an \(\mathcal{H }\)-valued martingale, \(H\) is a predictable process taking values in the interval \([-1,1]\) and let \(Y\) be given as the stochastic integral \(Y_t=H_0X_0+\int _{0+}^t H_s\text{ d}X_s\), \(t\ge 0\). Then \(Y\) is differentially subordinate to \(X\): we have

$$\begin{aligned}{}[X,X]_t-[Y,Y]_t=(1-H_0^2)|X_0|^2+\int \limits _{0+}^t (1-H_s^2)\text{ d}[X,X]_s. \end{aligned}$$
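As a purely illustrative complement, here is a minimal discrete-time simulation of this example; the particular predictable process \(H\) below is made up for the demonstration, and the final line empirically compares \(\mathbb{E }|g_T|\) with the \(L^1\) bound of Theorem 1.4 below.

```python
import numpy as np

rng = np.random.default_rng(1)
paths, steps = 5000, 300

# increments of a simple real-valued martingale (symmetric random signs)
df = rng.choice([-1.0, 1.0], size=(paths, steps))
f = np.cumsum(df, axis=1)

# a predictable [-1,1]-valued process H: each H_n depends only on the past
# (the particular formula is arbitrary, chosen just for the illustration)
H = np.tanh(np.concatenate([np.zeros((paths, 1)), f[:, :-1]], axis=1) / 10)
g = np.cumsum(H * df, axis=1)

# differential subordination: the running sums of df^2 - dg^2 must be
# nonnegative and nondecreasing along every path
gap = np.cumsum(df**2 - (H * df) ** 2, axis=1)
assert (gap >= 0).all() and (np.diff(gap, axis=1) >= 0).all()

# empirical comparison with the constant beta = 2.585... of Theorem 1.4
print(np.abs(g[:, -1]).mean(), 2.585 * np.abs(f).max(axis=1).mean())
```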

Another example for stochastic integrals, which plays an important role in applications (see e.g., [2, 3, 16]), is the following. Suppose that \(B\) is a Brownian motion in \(\mathbb{R }^d\) and \(H,\, K\) are predictable processes taking values in the spaces of matrices of dimensions \(m\times d\) and \(n\times d\), respectively. For any \(t\ge 0\), define

$$\begin{aligned} X_t=\int \limits _{0+}^t H_s\cdot \text{ d}B_s\quad \text{ and}\quad Y_t=\int \limits _{0+}^t K_s\cdot \text{ d}B_s. \end{aligned}$$

If the Hilbert–Schmidt norms of \(H\) and \(K\) satisfy \(||K_t||_\mathrm{HS}\le ||H_t||_\mathrm{HS}\) for all \(t>0\), then \(Y\) is differentially subordinate to \(X\): this follows from the identity

$$\begin{aligned}{}[X,X]_t-[Y,Y]_t=\int \limits _{0+}^t \left(||H_s||_\mathrm{HS}^2-||K_s||_\mathrm{HS}^2\right)\,\text{ d}s. \end{aligned}$$
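For the record, this identity is the standard bracket computation for stochastic integrals driven by the independent coordinates of \(B\) (so that \([B^i,B^j]_t=\delta _{ij}t\)): writing \(X^i_t=\sum _{j=1}^d\int _{0+}^t H^{ij}_s\,\text{ d}B^j_s\), we get

$$\begin{aligned}{}[X,X]_t=\sum _{i=1}^m [X^i,X^i]_t=\sum _{i=1}^m\sum _{j=1}^d \int \limits _{0+}^t (H^{ij}_s)^2\,\text{ d}s=\int \limits _{0+}^t ||H_s||_\mathrm{HS}^2\,\text{ d}s, \end{aligned}$$

and similarly for \(Y\).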

The differential subordination implies many interesting inequalities comparing the sizes of \(X\) and \(Y\). A celebrated result of Burkholder gives the following information on the \(L^p\)-norms (see [8, 10, 12, 13, 23]).

Theorem 1.1

Suppose that \(X\), \(Y\) are Hilbert-space-valued local martingales such that \(Y\) is differentially subordinate to \(X\). Then

$$\begin{aligned} |||Y|||_p\le (p^*-1)|||X|||_p,\quad 1<p<\infty , \end{aligned}$$
(1.1)

where \(p^*=\max \{p,p/(p-1)\}\). The constant is the best possible, even if \(\mathcal H =\mathbb{R }\).

For \(p=1\), the above moment inequality does not hold with any finite constant, but we have the corresponding weak-type \((1,1)\) estimate. In fact, we have the following result for a wider range of parameters \(p\), proved by Burkholder [8] for \(1\le p\le 2\) and Suh [22] for \(p>2\). See also Wang [23].

Theorem 1.2

Suppose that \(X\), \(Y\) are Hilbert-space-valued local martingales such that \(Y\) is differentially subordinate to \(X\). Then

$$\begin{aligned} \mathbb P (Y^*\ge 1)\le \frac{2}{\Gamma (p+1)}|||X|||_p^p,\quad 1\le p\le 2, \end{aligned}$$

and

$$\begin{aligned} \mathbb P (Y^*\ge 1)\le \frac{p^{p-1}}{2}|||X|||_p^p,\quad 2<p<\infty . \end{aligned}$$

Both inequalities are sharp, even if \(\mathcal H =\mathbb{R }\).

There are many other related results, see e.g., the papers [3] and [4] by Bañuelos and Wang, [11] and [13] by Burkholder and consult the references therein. For more recent works, we refer the interested reader to the papers [18–20] by the author, and [6, 7] by Borichev et al. The estimates have found numerous applications in many areas of mathematics, in particular, in the study of the boundedness of various classes of Fourier multipliers (consult, for instance, [1–3, 12, 16, 17]).

There is a general method, invented by Burkholder, which enables one not only to establish various estimates for differentially subordinated martingales, but is also very efficient in determining the optimal constants in such inequalities. The idea is to construct an appropriate special function, an upper solution to a nonlinear problem corresponding to the inequality under investigation, and then to exploit its properties. See the survey [13] for a detailed description of the technique in the discrete-time setting and consult Wang [23] for the changes needed to make the method work in the continuous-time setting.

The above results can be extended in another, very interesting direction. Namely, in the present paper we will be interested in inequalities involving the maximal functions of \(X\) and/or \(Y\). Burkholder [14] modified his technique so that it could be used to study such inequalities for stochastic integrals, and applied it to obtain the following result, which can be regarded as another version of (1.1) for \(p=1\).

Theorem 1.3

Suppose that \(X\) is a real-valued martingale and \(Y\) is the stochastic integral, with respect to \(X\), of some predictable real-valued process \(H\) taking values in \([-1,1]\). Then we have the sharp estimate

$$\begin{aligned} |||Y|||_1\le \gamma ||X^*||_1, \end{aligned}$$
(1.2)

where \(\gamma =2.536\ldots \) is the unique positive number satisfying

$$\begin{aligned} \gamma =3-\exp \frac{1-\gamma }{2}. \end{aligned}$$

As we have already observed above, if \(X\) and \(Y\) satisfy the assumptions of this theorem, then \(Y\) is differentially subordinate to \(X\). An appropriate modification of the proof in [14] shows that the assertion remains valid under this less restrictive condition on the processes. However, the assertion no longer holds if we pass from the real to the vector-valued case. Here is one of the main results of this paper.

Theorem 1.4

Suppose that \(X\), \(Y\) are Hilbert-space-valued local martingales such that \(Y\) is differentially subordinate to \(X\). Then

$$\begin{aligned} |||Y|||_1\le \beta ||X^*||_1, \end{aligned}$$
(1.3)

where \(\beta =2.585\ldots \) is the unique positive number satisfying

$$\begin{aligned} \beta =2+\log \frac{1+\beta }{2}. \end{aligned}$$
(1.4)

The constant \(\beta \) is the best possible, even for discrete-time martingales taking values in a two-dimensional subspace of \(\mathcal H \).
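Both \(\gamma \) and \(\beta \) are defined only through transcendental fixed-point equations. As a quick numerical sanity check (it plays no role in any of the proofs), one may compute them by bisection; the script below is a minimal sketch:

```python
import math

def bisect(f, lo, hi, tol=1e-12):
    # plain bisection; assumes f changes sign on [lo, hi]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# gamma = 3 - exp((1 - gamma)/2), the constant of Theorem 1.3
gamma = bisect(lambda c: 3 - math.exp((1 - c) / 2) - c, 2.0, 3.0)
# beta = 2 + log((1 + beta)/2), the constant of Theorem 1.4, see (1.4)
beta = bisect(lambda c: 2 + math.log((1 + c) / 2) - c, 2.0, 3.0)
print(gamma, beta)  # 2.536..., 2.585...
```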

This is a very surprising result. In most cases, the inequalities for stochastic integrals of real-valued martingales carry over, with unchanged constants, to the corresponding bounds for vector-valued local martingales satisfying differential subordination. In other words, given a sharp inequality for \(\mathcal H \)-valued differentially subordinated martingales, the extremal processes, i.e. those for which equality is (almost) attained, can usually be realized as stochastic integrals in which the integrator takes values in a one-dimensional subspace of \(\mathcal H \). See e.g., the statements of Theorems 1.1 and 1.2. Here the situation is different: the optimal constant does depend on the dimension of the range of \(X\) and \(Y\).

Finally, let us mention here another related result. In general, the best constants in non-maximal inequalities for differentially subordinated local martingales do not change when we restrict ourselves to continuous-path processes; see e.g., Section 15 in [8] for the justification of this phenomenon. However, if we study the maximal estimates, the best constants may be different: for example, the passage to continuous-path local martingales reduces the constant \(\gamma \) in (1.2) to \(\sqrt{2}\). Specifically, we have the following theorem, which is one of the principal results of [21].

Theorem 1.5

Assume that \(X,\, Y\) are Hilbert-space-valued, continuous-path local martingales such that \(Y\) is differentially subordinate to \(X\). Then

$$\begin{aligned} ||Y||_p\le \sqrt{\frac{2}{p}}\, ||X^*||_p,\quad 1\le p\le 2, \end{aligned}$$

and

$$\begin{aligned} ||Y||_p\le (p-1)||X^*||_p,\quad 2<p<\infty . \end{aligned}$$

Both inequalities are sharp, even if \(\mathcal H =\mathbb{R }\).

We have organized the paper as follows. The next section is devoted to an extension of Burkholder’s method. In Sect. 3 we apply the technique to establish (1.3). In Sect. 4 we prove that the constant \(\beta \) cannot be replaced in (1.3) by a smaller one. The final part of the paper contains the proofs of technical facts needed in the earlier considerations.

2 On the Method of Proof

Burkholder’s method from [14] is a powerful tool for proving maximal inequalities for transforms of discrete-time real-valued martingales. The results in the wider setting of stochastic integrals are then obtained by the use of approximation theorems of Bichteler [5]. This approach has the advantage that it avoids practically all the technicalities which arise naturally in the study of continuous-time processes. On the other hand, it does not allow one to study estimates for (local) martingales under differential subordination; the purpose of this section is to present a refinement of the method which can be used to handle such problems.

The general statement is the following. Let \(V:\mathcal H \times \mathcal H \times [0,\infty )\times [0,\infty )\rightarrow \mathbb{R }\) be a given Borel function and suppose that we want to show the estimate

$$\begin{aligned} \mathbb{E }V(X_t,Y_t,X_t^*,Y_t^*)\le 0 \end{aligned}$$
(2.1)

for any \(t\ge 0\) and any \(\mathcal H \)-valued local martingales \(X,\, Y\) such that \(Y\) is differentially subordinate to \(X\). For technical reasons, we shall deal with a slightly different, localized version of (2.1) (see Theorem 2.2 for the precise statement). Let \(D=\mathcal H \times \mathcal H \times (0,\infty )\times (0,\infty )\). Introduce the class \(\mathcal U (V)\), which consists of all \(C^2\) functions \(U:D\rightarrow \mathbb{R }\) satisfying (2.2)–(2.5) below: for any \((x,y,z,w)\in D\),

$$\begin{aligned} U(x,y,z,w)&\le 0 \quad \text{ if } |x|\le z,\,|y|\le \min \{|x|,w\}, \end{aligned}$$
(2.2)
$$\begin{aligned} U(x,y,z,w)&\ge V(x,y,z,w)\quad \text{ if } |x|\le z,\,|y|\le w. \end{aligned}$$
(2.3)

Furthermore, there is a locally bounded measurable function \(c:D\rightarrow [0,\infty )\) such that for all \((x,y,z,w)\in D\) with \(|x|\le z\), \(|y|\le w\) and all \(h,\,k\in \mathcal H \),

$$\begin{aligned}&\left\langle U_{xx}(x,y,z,w)h,h\right\rangle +2\left\langle U_{xy}(x,y,z,w)h,k\right\rangle +\left\langle U_{yy}(x,y,z,w)k,k\right\rangle \nonumber \\&\quad \le \; c(x,y,z,w)\left(|k|^2-|h|^2\right)\!. \end{aligned}$$
(2.4)

Finally, for all \((x,y,z,w)\in D\) with \(|x|\le z\), \(|y|\le w\) and all \(h,\,k\in \mathcal H \) with \(|k|\le |h|\),

$$\begin{aligned}&U(x+h,y+k,|x+h|\vee z,|y+k|\vee w)\nonumber \\&\qquad \le U(x,y,z,w)+\left\langle U_x(x,y,z,w),h\right\rangle +\left\langle U_y(x,y,z,w),k\right\rangle . \end{aligned}$$
(2.5)

The latter condition implies that

$$\begin{aligned}&U_z(x,y,z,w)\le 0 \quad \text{ if } |x|=z,\nonumber \\&U_w(x,y,z,w)\le 0 \quad \text{ if } |y|=w. \end{aligned}$$
(2.6)

For example, let us establish the bound for \(U_w\). Pick \(x,\,y,\,z,\,w\) with \(|y|=w\); by the continuity of \(U_w\), we may and do assume that \(|x|<z\). Apply (2.5) to \(x,\,y,\,z,\,w\) and \(h=k=sy\) for some \(s>0\). Then, take all the terms on one side, divide throughout by \(s\) and let \(s\rightarrow 0\). Since \(|x+sy|\vee z=z\) for sufficiently small \(s\) and \(|y+sy|\vee w=(1+s)w\) for all \(s>0\), we obtain the second estimate in (2.6); the bound for \(U_z\) is established similarly.
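To spell out the first-order expansion used for the \(U_w\) bound: with all derivatives evaluated at \((x,y,z,w)\), \(|y|=w\) and \(|x|<z\), Taylor's formula gives

$$\begin{aligned} U(x+sy,(1+s)y,z,(1+s)w)=U(x,y,z,w)+s\langle U_x,y\rangle +s\langle U_y,y\rangle +swU_w+o(s), \end{aligned}$$

so (2.5), applied with \(h=k=sy\), forces \(swU_w+o(s)\le 0\); dividing by \(s>0\) and letting \(s\rightarrow 0\) yields \(U_w\le 0\).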

Before we turn to the main result of this section, let us mention here a technical fact, which will be needed later. Recall that for any semimartingale \(X\) there exists a unique continuous local martingale part \(X^c\) of \(X\) satisfying \(X^c_0=0\) and

$$\begin{aligned}{}[X,X]_t=|X_0|^2+[X^c,X^c]_t+\sum _{0< s\le t} |\Delta X_s|^2\quad \text{ for } t\ge 0. \end{aligned}$$

Here \(\Delta X_s=X_s-X_{s-}\) is the jump of \(X\) at time \(s\). Furthermore, \([X^c,X^c]=[X,X]^c\), the pathwise continuous part of \([X,X]\). Here is Lemma 1 of Wang [23].

Lemma 2.1

If \(X\) and \(Y\) are semimartingales, then \(Y\) is differentially subordinate to \(X\) if and only if \(Y^c\) is differentially subordinate to \(X^c\), \(|\Delta Y_t|\le |\Delta X_t|\) for all \(t>0\) and \(|Y_0|\le |X_0|\).

We are ready to study the interplay between the class \(\mathcal U (V)\) and the bound (2.1).

Theorem 2.2

Assume that \(\mathcal U (V)\) is nonempty and \(X,\, Y\) are Hilbert-space-valued local martingales such that \(Y\) is differentially subordinate to \(X\). Then there is a nondecreasing sequence \((\tau _N)_{N\ge 1}\) of stopping times such that \(\lim _{N\rightarrow \infty }\tau _N=\infty \) and

$$\begin{aligned} \mathbb{E }V(X_{\tau _N\wedge t},Y_{\tau _N\wedge t},X_{\tau _N\wedge t}^*\vee \varepsilon ,Y_{\tau _N\wedge t}^*\vee \varepsilon )\le 0 \end{aligned}$$
(2.7)

for all \(N\ge 1,\, t>0\), and \(\varepsilon >0\).

Proof

Let \((\sigma _n)_{n\ge 1}\) be a common localizing sequence for \(X\) and \(Y\). Fix \(t>0,\, \varepsilon >0,\, N\in \{ 1,\,2,\,\ldots \}\) and let

$$\begin{aligned} \tau _N=\sigma _N\wedge \inf \{s>0:|X_s|+|Y_s|+|X_s^c|+|Y_s^c|\ge N\}. \end{aligned}$$

Since \(X^c_{\tau _N\wedge t}\) is bounded, for any \(\delta >0\) there is \(\mathcal D =\mathcal D (\delta , N,t)\ge 1\) such that

$$\begin{aligned} \mathbb{E }\sum _{k>\mathcal D } [X^{kc},X^{kc}]_{\tau _N\wedge t}=\mathbb{E }\sum _{k>\mathcal D } |X^{kc}_{\tau _N\wedge t}|^2<\delta . \end{aligned}$$
(2.8)

For \(0\le s\le t\) and \(d\ge \mathcal D \), put

$$\begin{aligned} X^{(d)}_s&= (X_{s}^1,X_{s}^2,\ldots ,X_{s}^d, 0,0,\ldots ), \\ Y^{(d)}_s&= (Y_{s}^1,Y_{s}^2,\ldots ,Y_{s}^d,0,0,\ldots ) \end{aligned}$$

and

$$\begin{aligned} Z^{(d)}_s=(X^{(d)}_{s},Y^{(d)}_{s},X^{(d)*}_{s}\vee \varepsilon ,Y^{(d)*}_s\vee \varepsilon ). \end{aligned}$$

There is a sequence \((T_{N,j})_{j\ge 1}\) of stopping times with \(T_{N,j}\uparrow \tau _N\), localizing the stochastic integrals \(\int U_x(Z^{(d)}_{s-})\cdot \text{ d}X_s^{(d)},\, \int U_y(Z^{(d)}_{s-})\cdot \text{ d}Y_s^{(d)}\). Since \(X^{(d)},\, Y^{(d)}\) take values in a finite-dimensional subspace, we may apply Itô’s formula to get

$$\begin{aligned} U(Z^{(d)}_{T_{N,j}\wedge t})-U(Z^{(d)}_0)=I_1+I_2+I_3/2+I_4, \end{aligned}$$
(2.9)

where

$$\begin{aligned} I_1&= \int \limits _{0+}^{T_{N,j}\wedge t} U_x(Z^{(d)}_{s-})\cdot \text{ d}X^{(d)}_s+\int \limits _{0+}^{T_{N,j}\wedge t} U_y(Z^{(d)}_{s-})\cdot \text{ d}Y^{(d)}_s,\\ I_2&= \int \limits _{0+}^{T_{N,j}\wedge t} U_z(Z^{(d)}_{s-})\,\text{ d}(X^{(d)*}\vee \varepsilon )^c_s+\int \limits _{0+}^{T_{N,j}\wedge t} U_w(Z^{(d)}_{s-})\,\text{ d}(Y^{(d)*}\vee \varepsilon )^c_s,\\ I_3&= \int \limits _{0+}^{T_{N,j}\wedge t} U_{xx}(Z^{(d)}_{s-})\,\text{ d}[X^{(d)},X^{(d)}]^c_s+2\int \limits _{0+}^{T_{N,j}\wedge t} U_{xy}(Z^{(d)}_{s-})\,\text{ d}[X^{(d)},Y^{(d)}]^c_s\\&+\int \limits _{0+}^{T_{N,j}\wedge t} U_{yy}(Z^{(d)}_{s-})\,\text{ d}[Y^{(d)},Y^{(d)}]^c_s,\\ I_4&= \;\sum _{0<s\le T_{N,j}\wedge t}[U(Z^{(d)}_s)-U(Z^{(d)}_{s-})-\langle U_x(Z^{(d)}_{s-}),\Delta X^{(d)}_s\rangle \\&-\,\langle U_y(Z^{(d)}_{s-}),\Delta Y^{(d)}_s\rangle ]. \end{aligned}$$

Note that the integrals in \(I_2\) are with respect to the continuous parts of the processes \(X^{(d)*}\vee \varepsilon \) and \(Y^{(d)*}\vee \varepsilon \); this is due to the lack of the terms \(U_z(Z^{(d)}_{s-})\Delta (X^{(d)*}_s\vee \varepsilon )\) and \(U_w(Z^{(d)}_{s-})\Delta (Y^{(d)*}_s\vee \varepsilon )\) in \(I_4\).

Let us analyze the terms \(I_1\)–\(I_4\). We have \(\mathbb{E }I_1=0\), since both stochastic integrals are martingales. Next, \(I_2\le 0\): by (2.6), we have \(U_z(Z^{(d)}_{s-})\le 0\) on the set \(\{s: |X^{(d)}_{s-}|=X^{(d)*}_{s-}\vee \varepsilon \}\), in which the support of \(\text{ d}(X^{(d)*}\vee \varepsilon )_s^c\) is contained. This gives that the first integral in \(I_2\) is nonpositive; the second one is handled analogously. To deal with \(I_3\), fix \(0\le s_0<s_1\le t\). For any \(\ell \ge 0\), let \((\eta _i^\ell )_{0\le i\le i_\ell }\) be a nondecreasing sequence of stopping times with \(\eta _0^\ell =s_0,\,\eta _{i_\ell }^\ell =s_1\) such that \(\lim _{\ell \rightarrow \infty }\max _{0\le i\le i_\ell -1}|\eta _{i+1}^\ell -\eta _i^\ell |=0\). Keeping \(\ell \) fixed, we apply, for each \(i=0,\,1,\,2,\,\ldots ,\,i_\ell -1\), the property (2.4) to \(x=X_{s_0-}^{(d)},\, y=Y_{s_0-}^{(d)},\, z=X^{(d)*}_{s_0-}\vee \varepsilon \), \(w=Y^{(d)*}_{s_0-}\vee \varepsilon \) and \(h=h_i^\ell =X^{(d)c}_{T_{N,j}\wedge \eta _{i+1}^\ell }-X^{(d)c}_{T_{N,j}\wedge \eta _{i}^\ell }\), \(k=k_i^\ell =Y^{(d)c}_{T_{N,j}\wedge \eta _{i+1}^\ell }-Y^{(d)c}_{T_{N,j}\wedge \eta _{i}^\ell }\). We sum the \(i_\ell \) inequalities obtained in this way and let \(\ell \rightarrow \infty \). Using the notation \([S,T]_s^u=[S,T]_u-[S,T]_s\), we may write the result in the form

$$\begin{aligned}&\sum _{m=1}^d\sum _{n=1}^d \left[ U_{x_mx_n}(Z^{(d)}_{s_0-})[X^{mc},X^{nc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1} +2U_{x_my_n}(Z^{(d)}_{s_0-})[X^{mc},Y^{nc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}\right.\\&\qquad \qquad \quad + \left. U_{y_my_n}(Z^{(d)}_{s_0-})[Y^{mc},Y^{nc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}\right]\\&\qquad \le \; c\left(Z^{(d)}_{s_0-}\right) \sum _{k=1}^d \left([Y^{kc},Y^{kc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}-[X^{kc},X^{kc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}\right)\\&\qquad \le \; c\left(Z^{(d)}_{s_0-}\right)\left([Y^c,Y^c]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}-[X^c,X^c]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}+\sum _{k>d} [X^{kc},X^{kc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}\right)\\&\qquad \le \; c\left(Z^{(d)}_{s_0-}\right)\sum _{k>d} [X^{kc},X^{kc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}\\&\qquad = \; c\left(Z^{(d)}_{T_{N,j}\wedge s_0-}\right)\sum _{k>d} [X^{kc},X^{kc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}, \end{aligned}$$

where in the third passage we have exploited the differential subordination of \(Y^c\) to \(X^c\). From the local boundedness of \(c\) and the definition of \(\tau _N\), we infer that on the set \(\{T_{N,j}>0\}\), the random variable \(c(Z_{T_{N,j}\wedge s_0-}^{(d)})\) is bounded by a constant \(C\) depending only on \(N\) and \(\varepsilon \). Thus

$$\begin{aligned} I_3 \le C\sum _{k>d} [X^{kc},X^{kc}]_{T_{N,j}\wedge t}, \end{aligned}$$

using a standard approximation of integrals by discrete sums. Finally, we see that each summand in \(I_4\) is nonpositive, directly from (2.5) and the fact that \(|\Delta Y_s|\le |\Delta X_s|\), see Lemma 2.1. Consequently,

$$\begin{aligned}&I_4\le \; U\left(Z^{(d)}_{T_{N,j}\wedge t}\right)-U\left(Z^{(d)}_{T_{N,j}\wedge t-}\right)\\&\qquad \quad -\left\langle U_x(Z^{(d)}_{T_{N,j}\wedge t-}),\Delta X_{T_{N,j}\wedge t}^{(d)}\right\rangle -\left\langle U_y(Z^{(d)}_{T_{N,j}\wedge t-}),\Delta Y_{T_{N,j}\wedge t}^{(d)}\right\rangle \end{aligned}$$

on the set \(\{T_{N,j}>0\}\). Plug all the above estimates into (2.9) and take expectation of both sides. By (2.8), the bound we obtain can be rewritten in the form

$$\begin{aligned}&\mathbb{E }\left[U(Z^{(d)}_{T_{N,j}\wedge t-})-U(Z^{(d)}_0)+\left\langle U_x(Z^{(d)}_{T_{N,j}\wedge t-}),\Delta X^{(d)}_{T_{N,j}\wedge t}\right\rangle \right.\nonumber \\&\qquad \left. +\left\langle U_y(Z^{(d)}_{T_{N,j}\wedge t-}),\Delta Y_{T_{N,j}\wedge t}^{(d)}\right\rangle \right]1_{\{T_{N,j}>0\}}\le C\delta . \end{aligned}$$
(2.10)

For fixed \(N\), the random variables \(Z^{(d)}_{T_{N,j}\wedge t-}\), \(j\ge 1,\, d\ge \mathcal D \), are uniformly bounded on \(\{\tau _N>0\}\), in view of the definition of \(\tau _N\). Moreover, we have

$$\begin{aligned} |\Delta X^{(d)}_{T_{N,j}\wedge t}|&= |\Delta X^{(d)}_{T_{N,j}\wedge t}|1_{\{T_{N,j}=\tau _N\}}+|\Delta X^{(d)}_{T_{N,j}\wedge t}|1_{\{T_{N,j}<\tau _N\}}\\&\le |\Delta X_{\tau _N\wedge t}^{(d)}|1_{\{T_{N,j}=\tau _N\}}+\left(|X^{(d)}_{T_{N,j}\wedge t}|+|X^{(d)}_{T_{N,j}\wedge t-}|\right)1_{\{T_{N,j}<\tau _N\}}\\&\le |\Delta X_{\tau _N\wedge t}|+2N \end{aligned}$$

and, similarly, \(|\Delta Y^{(d)}_{T_{N,j}\wedge t}|\le |\Delta Y_{\tau _N\wedge t}|+2N\). The random variables \(|\Delta X_{\tau _N\wedge t}|\) and \(|\Delta Y_{\tau _N\wedge t}|\) are integrable on \(\{\tau _N>0\}\), since \((\tau _N)_{N\ge 1}\) localizes \(X\) and \(Y\). Thus, if we let \(j\rightarrow \infty \) and then \(d\rightarrow \infty \) in (2.10), we obtain

$$\begin{aligned}&\mathbb{E }\,\left[U(Z_{\tau _N\wedge t-})-U(Z_0)+\langle U_x(Z_{\tau _N\wedge t-}),\Delta X_{\tau _N\wedge t}\rangle \right.\\&\left.\qquad +\langle U_y(Z_{\tau _N\wedge t-}),\Delta Y_{\tau _N\wedge t}\rangle \right]1_{\{\tau _N>0\}}\le C\delta , \end{aligned}$$

by Lebesgue’s dominated convergence theorem. Here \(Z_s=(X_{s},Y_{s},X^*_{s}\vee \varepsilon ,Y^*_s\vee \varepsilon ),\, s\ge 0\). Let \(\delta \rightarrow 0\) and apply (2.5) to get

$$\begin{aligned} \mathbb{E }\left[U(Z_{\tau _N\wedge t})-U(Z_0)\right]=\mathbb{E }\left[U(Z_{\tau _N\wedge t})-U(Z_0)\right]1_{\{\tau _N>0\}}\le 0. \end{aligned}$$

It remains to use (2.2) and (2.3) to complete the proof. \(\square \)

Remark 2.3

A careful inspection of the proof of the above theorem shows that the function \(U\) need not be defined on the whole of \(D=\mathcal H \times \mathcal H \times (0,\infty )\times (0,\infty )\). Indeed, it suffices to define it on a certain neighborhood of the set \(\{(x,y,z,w)\in D: |x|\le z,\,|y|\le w\}\), in which the process \(Z\) takes its values. This can be further relaxed: if we are allowed to work with those \(X,\, Y\) which are bounded away from \(0\), then all we need is a \(C^2\) function \(U\), given on some neighborhood of \(\{(x,y,z,w)\in D: 0<|x|\le z,\,0<|y|\le w\}\), satisfying (2.2)–(2.5) on this set.

3 The Special Function Corresponding to (1.3)

Now we apply the approach described in the previous section to establish (1.3). Let \(V:D\rightarrow \mathbb{R }\) be given by \(V(x,y,z,w)=|y|-\beta (|x|\vee z)\). Furthermore, put

$$\begin{aligned} U(x,y,z,w)=z\Phi \left(\frac{|y|^2-|x|^2}{z^2}+1\right) \end{aligned}$$
(3.1)

on the set \(\{(x,y,z,w)\in D: 0<|x|< \sqrt{|y|^2+z^2},\,|y|>0\}\), where \(\Phi :[0,\infty )\rightarrow \mathbb{R }\) is defined by

$$\begin{aligned} \Phi (t)=\left(1+\frac{1}{\beta }\right) \left[\sqrt{t}-\log (1+\sqrt{t})-(2-\log 2)\right]\!. \end{aligned}$$
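The two properties of \(\Phi \) which drive the whole argument, parts (i) and (ii) of Lemma 3.1 below, are easy to spot-check numerically; the following script is only a sanity check, not a substitute for the proofs in Sect. 5:

```python
import math

# beta from the fixed-point equation (1.4); the iteration converges since
# the map c -> 2 + log((1+c)/2) is a contraction near the root
beta = 2.5
for _ in range(100):
    beta = 2 + math.log((1 + beta) / 2)

def phi(t):
    return (1 + 1 / beta) * (
        math.sqrt(t) - math.log(1 + math.sqrt(t)) - (2 - math.log(2))
    )

# Lemma 3.1 (i): phi(t) <= phi(1) <= 0 for t <= 1 (phi is increasing)
assert phi(1.0) <= 0
assert all(phi(j / 200) <= phi(1.0) + 1e-12 for j in range(201))

# Lemma 3.1 (ii): phi(t) >= sqrt(t) - beta for t >= 0
assert all(phi(j / 10) >= math.sqrt(j / 10) - beta - 1e-12 for j in range(500))
print(phi(beta**2))  # ~0: equality in (ii) at t = beta**2
```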

We start with four technical lemmas, which will be proved in Sect. 5.

Lemma 3.1

  1. (i)

    We have \(\Phi (t)\le \Phi (1)\le 0\) for \(t\le 1\).

  2. (ii)

    We have \(\Phi (t)\ge \sqrt{t}-\beta \) for \(t\ge 0\).

  3. (iii)

    For any \(c\ge 0\) the function

    $$\begin{aligned} f(s)=-\sqrt{s}\log \left(1+\frac{c}{\sqrt{s}}\right)-(2-\log 2)\sqrt{s} \end{aligned}$$

    is convex and nonincreasing.

  4. (iv)

    For any \(c>0\), the function

    $$\begin{aligned} f(s)=\sqrt{s}-c\log \left(1+\frac{\sqrt{s}}{c}\right) \end{aligned}$$

    is concave.

Lemma 3.2

The function \(y\mapsto \Phi (|y|^2)\) is convex on \(\mathcal H \).

Lemma 3.3

  1. (i)

    For any \(y,\,k\in \mathcal H \), we have

    $$\begin{aligned} (2-\log 2)(1-\sqrt{1+|k|^2})&+\,(1-\sqrt{1+|k|^2})\log (\sqrt{1+|k|^2}+|y+k|)\nonumber \\&+\, \sqrt{1+|k|^2}\log (\sqrt{1+|k|^2})\le 0. \end{aligned}$$
    (3.2)
  2. (ii)

    For any \(y,\,k\in \mathcal H \) with \(|y|+1\le \sqrt{1+|k|^2}+|k|-|y|\) we have

    $$\begin{aligned} (2-\log 2)(1-\sqrt{1+|k|^2})&+\sqrt{1+|k|^2}\left[2\frac{|k|-|y|}{\sqrt{1+|k|^2}}-\log \left(1+\frac{|k|-|y|}{\sqrt{1+|k|^2}}\right)\right]\nonumber \\&\quad \le \frac{|k|}{1+|y|}-\log (1+|y|). \end{aligned}$$
    (3.3)

Lemma 3.4

Assume that \(x,\,y,\,h,\,k\in \mathcal H \) and \(z>0\) satisfy \(|x|=z\), \(\langle x,h\rangle \ge 0\) and \(|k|\le |h|\). Then

$$\begin{aligned} U(x+h,y+k,|x+h|\vee z,|y+k|\vee w)&\le U(x,y,z,w)\nonumber \\&\quad +\left(1+\frac{1}{\beta }\right)\frac{\langle y,k\rangle -\langle x,h\rangle }{|x|+|y|}. \end{aligned}$$
(3.4)
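Before giving the proofs, let us record a crude randomized sanity check of the key inequality (3.4) in \(\mathbb{R }^3\); it is of course no substitute for the proof of Lemma 3.4 in Sect. 5, and the sample size and ranges below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

beta = 2.5
for _ in range(100):
    beta = 2 + np.log((1 + beta) / 2)  # the fixed point (1.4)

def Phi(t):
    return (1 + 1 / beta) * (np.sqrt(t) - np.log(1 + np.sqrt(t)) - (2 - np.log(2)))

def U(x, y, z):
    # the special function (3.1); note that it does not depend on w
    return z * Phi((y @ y - x @ x) / z**2 + 1)

for _ in range(20_000):
    x, y, h, k = rng.standard_normal((4, 3))
    if x @ h < 0:
        h = -h                                           # enforce <x,h> >= 0
    k *= 0.99 * np.linalg.norm(h) / np.linalg.norm(k)    # enforce |k| <= |h|
    z = np.linalg.norm(x)                                # Lemma 3.4 assumes |x| = z
    lhs = U(x + h, y + k, max(np.linalg.norm(x + h), z))
    rhs = U(x, y, z) + (1 + 1 / beta) * (y @ k - x @ h) / (z + np.linalg.norm(y))
    assert lhs <= rhs + 1e-9
```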

Equipped with these four lemmas, we turn to the following statement.

Theorem 3.5

The function \(U\) belongs to the class \(\mathcal U (V)\).

Proof

We check each of the conditions (2.2)–(2.5) separately.

  • The estimate (2.2): this follows immediately from the first part of Lemma 3.1.

  • The property (2.4): we derive that the left-hand side of the estimate equals

    $$\begin{aligned} \frac{|k|^2-|h|^2}{z+\sqrt{S}}-\frac{(\langle y,k\rangle -\langle x,h\rangle )^2}{2(z+\sqrt{S})^2\sqrt{S}}\le \frac{|k|^2-|h|^2}{z+\sqrt{S}}, \end{aligned}$$

    with \(S=|y|^2-|x|^2+z^2\). The property follows.

  • The majorization (2.3): observe that (2.4), applied with \(k=0\), implies that for any \(h\) the function \(t\mapsto U(x+th,y,z,w)\) is concave on \([t_-,t_+]\), where \(t_-=\inf \{t: |x+th|\le z\}\) and \(t_+=\sup \{ t:|x+th|\le z \}\). Consequently, it suffices to verify (2.3) only for \((x,y,z,w)\) satisfying \(|x|= z\). But then the majorization reduces to the second part of Lemma 3.1.

  • The condition (2.5): by homogeneity and continuity of both sides, we may assume that \(z=1\) and \(|x|<1\). Define

    $$\begin{aligned} H(t)=U(x+th,y+tk,|x+th|\vee 1,|y+tk|\vee w) \end{aligned}$$

    for \( t\in \mathbb{R }\) and let \(t_-\), \(t_+\) be as above; note that \(t_-<0\) and \(t_+>0\). By (2.4), \(H\) is concave on \([t_-,t_+]\) and hence (2.5) holds if \(|x+h|\le 1\). Suppose then that \(|x+h|>1\) or, in other words, that \(t_+<1\). The vector \(x^{\prime }=x+t_+h\) satisfies \(\langle x^{\prime },h\rangle \ge 0\): this is equivalent to \(\frac{d}{dt}|x+th|^2|_{t=t_+}\ge 0\). Hence, by (3.4), if we put \(y^{\prime }=y+t_+k\), then

    $$\begin{aligned}&U(x+h,\;y+k,|x+h|\vee 1,|y+k|\vee w)\\&\qquad =U(x^{\prime }+(1-t_+)h,y^{\prime }+(1-t_+)k,|x+h|\vee 1,|y+k|\vee w)\\&\qquad \le U(x^{\prime },y^{\prime },1,w)+\left(1+\frac{1}{\beta }\right)\frac{\langle y^{\prime },(1-t_+)k\rangle -\langle x^{\prime },(1-t_+)h\rangle }{1+|y^{\prime }|}\\&\qquad =H(t_+)+H^{\prime }_-(t_+)(1-t_+)\\&\qquad \le H(0)+H^{\prime }(0)t_++H^{\prime }(0)(1-t_+)=H(0)+H^{\prime }(0). \end{aligned}$$

    This is precisely the claim. \(\square \)

Proof of (1.3)

It suffices to establish the estimate for \(X^*\in L^1\), because otherwise there is nothing to prove. Furthermore, we may assume that \(Y\) is bounded away from \(0\). To see this, consider a new Hilbert space \( \mathbb{R }\,\times \, \mathcal H \) and the martingales \((\delta ,X)\) and \((\delta ,Y)\), with \(\delta >0\). These martingales are bounded away from \(0\) and \((\delta ,Y)\) is differentially subordinate to \((\delta ,X)\). Having proved (1.3) for these processes, we let \(\delta \rightarrow 0\) and get the bound for \(X\) and \(Y\), by Lebesgue’s dominated convergence theorem.

We must show that for any bounded stopping time \(\tau \) we have

$$\begin{aligned} \mathbb{E }|Y_\tau |\le \beta \mathbb{E }X^*. \end{aligned}$$

Now we make use of the methodology described in the previous section (in particular, we exploit Remark 2.3). Since \(U\in \mathcal U (V)\), the above estimate follows immediately from (2.7), applied to the local martingales \((X_{\tau \wedge t})_{t\ge 0},\, (Y_{\tau \wedge t})_{t\ge 0}\), upon letting \(N\rightarrow \infty ,\, t\rightarrow \infty \) and \(\varepsilon \rightarrow 0\). \(\square \)

4 Sharpness

The constant \(\beta \) can be shown to be optimal in (1.3) by the use of appropriate examples, but then the calculations are quite involved. To simplify the proof, we use a different approach. Assume that the probability space is the interval \([0,1]\), equipped with its Borel subsets and the Lebesgue measure. Suppose that there is \(\beta _0\in (0,\beta )\) with the following property: for any discrete filtration \((\mathcal{F }_n)_{n\ge 0}\) and any adapted martingales \(f,\, g\) taking values in \(\mathbb{R }^2\) such that \(g\) is differentially subordinate to \(f\), we have

$$\begin{aligned} ||g||_1\le \beta _0||f^*||_1. \end{aligned}$$
(4.1)

We shall show that the validity of this estimate implies the existence of a certain special function, with properties similar to those in the definition of the class \(\mathcal U (V)\). Then, by proper exploitation of these conditions, we shall deduce that \(\beta _0\ge \beta \).

Recall that a sequence \((f_n)_{n\ge 0}\) is called simple if for any \(n\) the term \(f_n\) takes only a finite number of values and there is a deterministic \(N\) such that \(f_N=f_{N+1}=f_{N+2}=\ldots =f_\infty \). For any \((x,y)\in \mathbb{R }^2\times \mathbb{R }^2\), introduce the class \(\mathcal M (x,y)\) which consists of those simple martingale pairs \((f,g)\) with values in \(\mathbb{R }^2\times \mathbb{R }^2\), which satisfy the following two conditions.

  1. (i)

    \((f_0,g_0)\equiv (x,y)\),

  2. (ii)

    for any \(n\ge 1\) we have \(|dg_n|\le |df_n|\).

Here we also allow the filtration \((\mathcal{F }_n)_{n\ge 0}\) to vary. Let \(W:\mathbb{R }^2\times \mathbb{R }^2\times (0,\infty )\rightarrow \mathbb{R }\cup \{\infty \}\) be given by the formula

$$\begin{aligned} W(x,y,z)=\sup \left\{ \mathbb{E }|g_\infty |-\beta _0\mathbb{E }(f^*\vee z)\right\} , \end{aligned}$$

where the supremum is taken over all \((f,g)\in \mathcal M (x,y)\).

Lemma 4.1

The function \(W\) enjoys the following properties.

  1. (i)

    \(W\) is finite.

  2. (ii)

    \(W\) is homogeneous of order \(1\): for any \((x,y,z)\in \mathbb{R }^2\times \mathbb{R }^2\times (0,\infty )\) and \(\lambda \ne 0\),

    $$\begin{aligned} W(\lambda x,\pm \lambda y,|\lambda | z)=|\lambda | W(x,y,z). \end{aligned}$$
  3. (iii)

    We have \(W(x,y,z)=W(x,y,|x|\vee z)\) for all \((x,y,z)\in \mathbb{R }^2\times \mathbb{R }^2\times (0,\infty )\).

  4. (iv)

    We have \(W(x,y,z)\ge |y|-\beta _0(|x|\vee z)\) for \((x,y,z)\in \mathbb{R }^2\times \mathbb{R }^2\times (0,\infty )\).

  5. (v)

    For fixed \(x\in \mathbb{R }^2\) and \(z>0\), the function \(y\mapsto W(x,y,z)\) is convex on \(\mathbb{R }^2\).

  6. (vi)

    For any \((x,y,z)\in \mathbb{R }^2\times \mathbb{R }^2\times (0,\infty )\) with \(|x|\le z\), any \(h,\,k\in \mathbb{R }^2\) with \(|k|\le |h|\) and any \(s,t> 0\),

    $$\begin{aligned} \frac{s}{s+t}W(x+th,y+tk,z)+\frac{t}{s+t}W(x-sh,y-sk,z)\le W(x,y,z).\qquad \end{aligned}$$
    (4.2)

Proof

  1. (i)

    This follows from (4.1): for any \((f,g)\in \mathcal M (x,y)\) the martingale \(g-y=(g_n-y)_{n\ge 0}\) is differentially subordinate to \(f\), so for any \(z>0\),

    $$\begin{aligned} \mathbb{E }|g_\infty |-\beta _0\mathbb{E }(f^*\vee z)\le |y|+\mathbb{E }|g_\infty -y|-\beta _0\mathbb{E }f^*\le |y|. \end{aligned}$$

    Taking the supremum over \((f,g)\in \mathcal M (x,y)\) yields \(W(x,y,z)\le |y|<\infty \).

  2. (ii)

    Use the fact that \((f,g)\in \mathcal M (x,y)\) if and only if \((\lambda f,\pm \lambda g)\in \mathcal M (\lambda x,\pm \lambda y)\).

  3. (iii)

    This follows immediately from the very definition of \(W\).

  4. (iv)

    The constant pair \((x,y)\) belongs to \(\mathcal M (x,y)\).

  5. (v)

    Take any \(x,\,y_1,\,y_2\in \mathbb{R }^2,\, \alpha \in (0,1)\) and let \(y=\alpha y_1+(1-\alpha )y_2\). Pick \((f,g)\in \mathcal M (x,y)\) and observe that \((f,g+y_i-y)\in \mathcal M (x,y_i)\), \(i=1,\,2\). Thus,

    $$\begin{aligned} \mathbb{E }|g_\infty |-\beta _0\mathbb{E }(f^*\vee z)&\le \alpha \left[\mathbb{E }|g_\infty +y_1-y|-\beta _0\mathbb{E }(f^*\vee z)\right]\\&+\;(1-\alpha )\left[\mathbb{E }|g_\infty +y_2-y|-\beta _0\mathbb{E }(f^*\vee z)\right]\\&\le \alpha W(x,y_1,z)+(1-\alpha ) W(x,y_2,z). \end{aligned}$$

    Taking the supremum over \((f,g)\in \mathcal M (x,y)\) gives the desired convexity.

  6. (vi)

    This is a consequence of the so-called “splicing argument” of Burkholder (see e.g., [9, p. 77]). For the convenience of the reader, let us provide the easy proof. Pick \((f^+,g^+)\in \mathcal M (x+th,y+tk),\, (f^-,g^-)\in \mathcal M (x-sh,y-sk)\). These two pairs are spliced together into one pair \((f,g)\) as follows: set \((f_0,g_0)\equiv (x,y)\) and (recall that \(\Omega =[0,1]\))

    $$\begin{aligned} (f_n,g_n)(\omega )={\left\{ \begin{array}{ll} \left(f^+_{n-1},g^+_{n-1}\right)\left(\dfrac{\omega (s+t)}{s}\right)&{}\text{ if } \omega \le \dfrac{s}{s+t},\\ \left(f^-_{n-1},g^-_{n-1}\right)\left(\left(\omega -\dfrac{s}{s+t}\right)\dfrac{s+t}{t}\right)&{}\text{ if } \omega >\dfrac{s}{s+t} \end{array}\right. } \end{aligned}$$

    for \(n=1,\,2,\,\ldots \). It is not difficult to see that \((f,g)\) is a martingale pair with respect to its natural filtration. Furthermore, it is clear that this pair belongs to \(\mathcal M (x,y)\). Finally, since \(|x|\le z\), we have \(f_n^*\vee z=\sup _{1\le k\le n}|f_k|\vee z\) for \(n=1,\,2,\,\ldots \) and therefore

    $$\begin{aligned} W(x,y,z)&\ge \mathbb{E }|g_\infty |-\beta _0 \mathbb{E }(f^*\vee z)\\ \!&= \!\frac{s}{t+s}\left[\mathbb{E }|g_{\infty }^+|\!-\!\beta _0\mathbb{E }(f^{+*}\vee z)\right]\!+\!\frac{t}{t+s}\left[\mathbb{E }|g_{\infty }^-|-\beta _0\mathbb{E }(f^{-*}\vee z)\right]. \end{aligned}$$

It remains to take the supremum over all \((f^-,g^-)\) and \((f^+,g^+)\) to get (4.2). \(\square \)

It will be convenient to work with another special function: for any \(r\ge 0\), define

$$\begin{aligned} \Psi (r)=\inf \{ W(x,y,1):|x|=1,\,|y|=r \}. \end{aligned}$$

We shall establish the following property of this object.

Lemma 4.2

For any \(r>0\) and \(\varepsilon >0\) we have

$$\begin{aligned} \Psi (r)\ge \Psi \left(\sqrt{r^2+\varepsilon }\right)+\frac{\varepsilon \Psi (1)}{2(r+1)}. \end{aligned}$$
(4.3)

Proof

Fix \(\delta >0\). Pick \((x,y,z)\in \mathbb{R }^2\times \mathbb{R }^2\times (0,\infty )\) satisfying \(|x|=z=1,\, |y|=r\) and apply (4.2) with \(h=x\), \(k=-y/r\), \(s=\delta \) and \(t>0\). We obtain

$$\begin{aligned} W(x,y,1)&\ge \frac{\delta }{\delta +t}W(x+tx,y-ty/r,1)+\frac{t}{\delta +t}W(x-\delta x,y+\delta y/r,1)\\&= \frac{\delta }{\delta +t}(1+t)W(x,\frac{y-ty/r}{1+t},1)+\frac{t}{\delta +t}W(x-\delta x,y+\delta y/r,1), \end{aligned}$$

where we have used parts (ii) and (iii) of Lemma 4.1. By part (v) of that lemma, the function \(s\mapsto W(x,sy,1)\), \(s\in \mathbb{R }\), is continuous. Thus, if we let \(t\rightarrow \infty \), we get

$$\begin{aligned} W(x,y,1)&\ge \delta W(x,-y/r,1)+W(x-\delta x,y+\delta y/r,1)\nonumber \\&\ge \delta \Psi (1)+W(x-\delta x,y+\delta y/r,1). \end{aligned}$$
(4.4)

Now we have come to the point where we use the fact that we are in the vector-valued setting. Namely, we pick a vector \(d\in \mathbb{R }^2\setminus \{0\}\), orthogonal to \(y+\delta y/r-(x-\delta x)\). Let \(s,\,t>0\) be uniquely determined by the equalities \( |x-\delta x-sd|=|x-\delta x+td|=1.\) Then

$$\begin{aligned} |y+\delta y/r-sd|^2-|x-\delta x-sd|^2&= |y+\delta y/r|^2-|x-\delta x|^2\\&= |y|^2+2\delta |y|-1+2\delta , \end{aligned}$$

since, as we have assumed at the beginning, \(|y|=r\) and \(|x|=1\). In other words, we have \(|y+\delta y/r-sd|=\sqrt{|y|^2+2\delta |y|+2\delta }\) and, similarly, \(|y+\delta y/r+td|=\sqrt{|y|^2+2\delta |y|+2\delta }\). Therefore, if we apply (4.2) with \(x^{\prime }:=x-\delta x,\, y^{\prime }:=y+\delta y/r,\, z=1,\, h=k=d\) and \(s,\,t\) as above, and combine it with the definition of \(\Psi \), we get

$$\begin{aligned} W(x-\delta x,y+\delta y/r,1)\ge \Psi \left(\sqrt{|y|^2+2\delta |y|+2\delta }\right)\!. \end{aligned}$$

Plugging this into (4.4) and taking the infimum over \(x\), \(y\) (satisfying \(|x|=1,\, |y|=r\)), we arrive at the estimate

$$\begin{aligned} \Psi (r)\ge \Psi \left(\sqrt{r^2+2\delta (r+1)}\right)+\delta \Psi (1). \end{aligned}$$

It suffices to put \(\delta =\varepsilon /(2r+2)\) to get the claim. \(\square \)

Now we are ready to prove that \(\beta _0\ge \beta \); suppose, on the contrary, that this inequality does not hold. By induction, (4.3) yields

$$\begin{aligned} \Psi (1)\ge \Psi (\sqrt{1+n\varepsilon })+\frac{\varepsilon \Psi (1)}{2}\sum _{k=0}^{n-1} \frac{1}{1+\sqrt{1+k\varepsilon }}. \end{aligned}$$

Fix \(t>1\), put \(\varepsilon =(t^2-1)/n\) and let \(n\rightarrow \infty \) to obtain

$$\begin{aligned} \Psi (1)\ge \Psi (t)+\frac{\Psi (1)}{2}\int \limits _1^{t^2} \frac{1}{1+\sqrt{s}}\,\text{ d}s=\Psi (t)+ \Psi (1)\left(t-\log \left(\frac{1+t}{2}\right)-1\right). \end{aligned}$$
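Here the sum in the previous display is a Riemann sum of the integral, and the integral itself is evaluated by the substitution \(s=u^2\):

$$\begin{aligned} \int \limits _1^{t^2}\frac{\text{ d}s}{1+\sqrt{s}}=\int \limits _1^{t}\frac{2u\,\text{ d}u}{1+u}=\Big [2u-2\log (1+u)\Big ]_1^t=2t-2\log \left(\frac{1+t}{2}\right)-2. \end{aligned}$$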

We have \(\Psi (t)\ge t-\beta _0\) by Lemma 4.1 (iv), so the above estimate yields

$$\begin{aligned} \beta _0\ge t+\Psi (1)\left(t-\log \left(\frac{1+t}{2}\right)-2\right) \quad \text{ for all } t>1. \end{aligned}$$
(4.5)

Now we shall choose an appropriate \(t\). We have \(\Psi (1)<-1\); otherwise, we would let \(t\rightarrow \infty \) and obtain a contradiction with the assumption \(\beta _0<\beta \). Furthermore, \(\Psi (1)\ge 1-\beta _0>-2\). Thus, the number \(t\), determined by the equation \(\Psi (1)=-\frac{1+t}{t}\), satisfies \(t>1\). Application of (4.5) with this choice of \(t\) gives

$$\begin{aligned} \beta _0\ge t-\frac{1+t}{t}\left(t-\log \left(\frac{1+t}{2}\right)-2\right). \end{aligned}$$

It remains to note that for any \(t> 1\) the right-hand side is not smaller than \(\beta \). This follows from a standard analysis of the derivative. The proof is complete.
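The final step can also be confirmed numerically: the right-hand side, as a function of \(t\in (1,\infty )\), attains its minimum \(\beta \) at \(t=\beta \) (there the derivative vanishes and, by (1.4), so does the second term). A quick grid check, a sanity check only and no replacement for the calculus argument:

```python
import math

# beta from its fixed-point equation (1.4)
beta = 2.5
for _ in range(100):
    beta = 2 + math.log((1 + beta) / 2)

def rhs(t):
    # the right-hand side of the last displayed estimate for beta_0
    return t - (1 + t) / t * (t - math.log((1 + t) / 2) - 2)

grid = [1 + k / 1000 for k in range(1, 20000)]
assert all(rhs(t) >= beta - 1e-9 for t in grid)
print(min(grid, key=rhs), beta)  # the minimizer is ~beta
```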

5 Proofs of Technical Lemmas

Proof of Lemma 3.1

  1. (i)

    We have \(\Phi ^{\prime }(t)=(1+\beta ^{-1})/(2(1+\sqrt{t}))>0\) and \(\Phi (1)=-(1+\beta ^{-1})<0\).

  2. (ii)

    The claim is equivalent to \(\Psi (t):=\Phi (t^2)-t+\beta \ge 0\) for all \(t\ge 0\). We easily check that \(\Psi \) is convex on \([0,\infty )\) and, by virtue of (1.4), satisfies \(\Psi (\beta )=\Psi ^{\prime }(\beta )=0\); consequently, \(\Psi \) attains its minimum \(0\) at \(t=\beta \).

  3. (iii)

    Since \(\lim _{s\rightarrow \infty }f^{\prime }(s)=0\), it suffices to prove the convexity of \(f\). We have

    $$\begin{aligned} f^{\prime \prime }(s)=\frac{1}{4s^{3/2}}\left[\log \left(1+\frac{c}{\sqrt{s}}\right) -\frac{\sqrt{s}}{c+\sqrt{s}}+\frac{s}{(c+\sqrt{s})^2}\right]+\frac{2-\log 2}{4s^{3/2}} \end{aligned}$$

    and the expression in the square brackets is nonnegative: indeed, the function

    $$\begin{aligned} x\mapsto \log (1+x)-(1+x)^{-1}+(1+x)^{-2},\quad x\ge 0, \end{aligned}$$

    vanishes at \(0\) and is nondecreasing.

  4. (iv)

    We compute that \(f^{\prime \prime }(s)=-[\,4(c+\sqrt{s})^2\sqrt{s}\,]^{-1}\le 0\). \(\square \)

Proof of Lemma 3.2

Pick \(y_1,\,y_2\in \mathcal H \) and \(\alpha \in (0,1)\). By the concavity of the logarithm, we have

$$\begin{aligned}&\alpha \left[|y_1|-\log (1+|y_1|)\right] +(1-\alpha )\left[|y_2|-\log (1+|y_2|)\right]\\&\qquad \ge |\alpha y_1|+|(1-\alpha )y_2|-\log \left(1+|\alpha y_1|+|(1-\alpha )y_2|\right). \end{aligned}$$

This can be further bounded from below by

$$\begin{aligned} |\alpha y_1+(1-\alpha )y_2|-\log \left(1+|\alpha y_1+(1-\alpha )y_2|\right), \end{aligned}$$

since the function \(t\mapsto t-\log (1+t)\) is nondecreasing on \([0,\infty )\). We are done. \(\square \)

Proof of Lemma 3.3

  1. (i)

    This follows easily from the obvious estimates

    $$\begin{aligned} \log \left(\sqrt{1+|k|^2}+|y+k|\right)\ge \log (\sqrt{1+|k|^2}) \end{aligned}$$

    and

    $$\begin{aligned} 1-\sqrt{1+|k|^2}\le -\log (\sqrt{1+|k|^2}). \end{aligned}$$
  2. (ii)

    For simplicity, we shall write \(k,\, y\) instead of \(|k|,\, |y|\), respectively. We consider two major cases.

Case I: Suppose that

$$\begin{aligned} \sqrt{1+k^2}\ge (2-\log 2)(1+y). \end{aligned}$$
(5.1)

Then \(\sqrt{1+k^2}\ge 2-\log 2\), or \(k\ge k_0:=\sqrt{(2-\log 2)^2-1}\). In addition, \(\frac{k-y}{\sqrt{1+k^2}}\le 1\), so using the convexity of the function \(\xi (s)= 2s-\log (1+s),\, s\ge 0\), we have

$$\begin{aligned} \xi \left(\frac{k-y}{\sqrt{1+k^2}}\right)\le \frac{k-y}{\sqrt{1+k^2}}\cdot \xi (1)+\left(1-\frac{k-y}{\sqrt{1+k^2}}\right)\cdot \xi (0), \end{aligned}$$

or

$$\begin{aligned} 2\frac{k-y}{\sqrt{1+k^2}}-\log \left(1+\frac{k-y}{\sqrt{1+k^2}}\right)\le (2-\log 2)\frac{k-y}{\sqrt{1+k^2}}. \end{aligned}$$

Hence it suffices to prove that

$$\begin{aligned} (2-\log 2)(1+k-\sqrt{1+k^2})\le \frac{k}{1+y}-\log (1+y)+(2-\log 2)y. \end{aligned}$$
(5.2)

We consider three possibilities \(y\le 1,\, 1<y<2\), and \(y\ge 2\) separately.

  1. (1)

    If \(y\le 1\), then the function

    $$\begin{aligned} k\mapsto (2-\log 2)(1+k-\sqrt{1+k^2})-\frac{k}{1+y},\quad k\ge k_0, \end{aligned}$$

    is nonincreasing: its derivative at \(k\) equals

    $$\begin{aligned} (2-\log 2)\left(1-\frac{k}{\sqrt{1+k^2}}\right)-\frac{1}{1+y}&\le (2-\log 2)\left(1-\frac{k_0}{\sqrt{1+k_0^2}}\right)-\frac{1}{2}\\&= -0.03\ldots <0. \end{aligned}$$

    Thus, for \(y\le 1\) all we need is to check (5.2) for \(k\) satisfying the equation \(\sqrt{1+k^2}=(2-\log 2)(1+y)\). But then the estimate is equivalent to

    $$\begin{aligned} \left(2-\log 2-\frac{1}{1+y}\right)(k-\sqrt{1+k^2})\le -\log (1+y)+(2-\log 2)y, \end{aligned}$$

    and the left-hand side is negative, while the right-hand side is nonnegative.

  2. (2)

    If \(1<y<2\), then by (5.1) we have \(k\ge \sqrt{4(2-\log 2)^2-1}>2.4\). Consequently, the left-hand side of (5.2) is smaller than \(2-\log 2\), while the right-hand side exceeds

    $$\begin{aligned} \frac{k}{3}+2-2\log 2>0.8+2-2\log 2>2-\log 2. \end{aligned}$$
  3. (3)

    Suppose finally, that \(y\ge 2\). As previously, the left-hand side of (5.2) is bounded from above by \(2-\log 2\). On the other hand, the right-hand side is larger than \(-\log 3+2(2-\log 2)>2-\log 2\).

Case II: Now we assume that

$$\begin{aligned} \sqrt{1+k^2}< (2-\log 2)(1+y). \end{aligned}$$
(5.3)

The inequality (3.3) is equivalent to \(F(k)\le 2y-\log (1+y)\), where

$$\begin{aligned}&F(k)=(2-\log 2)(1-\sqrt{1+k^2})+2(k-y)\\&\qquad \qquad -\,\sqrt{1+k^2}\,\,\log \left(1+\frac{k-y}{\sqrt{1+k^2}}\right)-\frac{k}{1+y}. \end{aligned}$$

We derive that \( F^{\prime }(k)=J_1+J_2\), where

$$\begin{aligned} J_1&= -\frac{k}{\sqrt{1+k^2}}\left[\log \left(1+\frac{k-y}{\sqrt{1+k^2}}\right)-\frac{\frac{k-y}{\sqrt{1+k^2}}}{1+\frac{k-y}{\sqrt{1+k^2}}}\right],\\ J_2&= \frac{y}{1+y}+\frac{k-y}{\sqrt{1+k^2}+k-y}-\frac{(2-\log 2)k}{\sqrt{1+k^2}}. \end{aligned}$$

Since \(\log (1+x)\ge x/(x+1)\) for \(x>-1\), we have \(J_1\le 0\). Furthermore, using the assumption \(\sqrt{k^2+1}+k-y\ge 1+y\), we get

$$\begin{aligned} J_2\le \frac{y}{1+y}+\frac{k-y}{1+y}-\frac{(2-\log 2)k}{\sqrt{1+k^2}}=\frac{k}{\sqrt{1+k^2}}\left(\frac{\sqrt{1+k^2}}{1+y}-(2-\log 2)\right)<0, \end{aligned}$$

due to (5.3). Hence \(F\) is nonincreasing; thus \(F(k)\le F(k_1)\), where \(k_1\) satisfies \(\sqrt{1+k_1^2}=(2-\log 2)(1+y)\); however, by Case I, \(F(k_1)\le 2y-\log (1+y)\). This completes the proof. \(\square \)

Proof of Lemma 3.4

Of course, we may assume that \(h\ne 0\). Furthermore, by homogeneity, it suffices to verify the estimate for \(z=1\). It is convenient to split the reasoning into three parts.

Step 1

First we shall show (3.4) in the case when \(x\) and \(h\) are linearly dependent. Introduce the function \(G:[0,\infty )\rightarrow \mathbb{R }\) given by

$$\begin{aligned} G(t)=|x+th|\Phi \left(\frac{|y+tk|^2}{|x+th|^2}\right). \end{aligned}$$

We shall prove that this function is convex. To do this, fix \(t_1,\,t_2\ge 0,\, \alpha _1,\,\alpha _2\in (0,1)\) with \(\alpha _1+\alpha _2=1\), and let \(t=\alpha _1t_1+\alpha _2t_2\). Using Lemma 3.2, we get

$$\begin{aligned} \alpha _1G(t_1)&+\;\alpha _2G(t_2)\\&\qquad =\alpha _1|x+t_1h|\Phi \left(\frac{|y+t_1k|^2}{|x+t_1h|^2}\right)+\alpha _2|x+t_2h|\Phi \left(\frac{|y+t_2k|^2}{|x+t_2h|^2}\right)\\&\qquad \ge (\alpha _1|x+t_1h|+\alpha _2|x+t_2h|)\Phi \left(\frac{|y+tk|^2}{(\alpha _1|x+t_1h|+\alpha _2|x+t_2h|)^2}\right)\\&\qquad =|x+th|\Phi \left(\frac{|y+tk|^2}{|x+th|^2}\right)=G(t), \end{aligned}$$

where in the third passage we have exploited the linear dependence of \(x\), \(h\) and the inequality \(\langle x,h\rangle \ge 0\). Therefore, using the bound \(|k|\le |h|\) and Lemma 3.1 (i),

$$\begin{aligned} \lim _{t\rightarrow \infty }G^{\prime }(t)&= \lim _{t\rightarrow \infty }\frac{G(t)-G(1)}{t-1}\\&= |h|\Phi \left(\frac{|k|^2}{|h|^2}\right)\\&\le |h|\Phi (1)\\&= -\left(1+\frac{1}{\beta }\right)|h|\\&= -\left(1+\frac{1}{\beta }\right)\frac{|y||h|+|h|}{1+|y|}\\&\le \left(1+\frac{1}{\beta }\right)\frac{\langle y,k\rangle -\langle x,h\rangle }{1+|y|}. \end{aligned}$$

Consequently,

$$\begin{aligned}&U(x+h,y+k,|x+h|\vee 1,|y+k|\vee w)-U(x,y,1,w)\\&\qquad \qquad =G(1)-G(0) \le \left(1+\frac{1}{\beta }\right)\frac{\langle y,k\rangle -\langle x,h\rangle }{1+|y|}. \end{aligned}$$

Step 2

Next we check (3.4) in the case when \(x\) and \(h\) are orthogonal. The inequality becomes

$$\begin{aligned}&|y+k|-\sqrt{1+|h|^2}\log \left(1+\frac{|y+k|}{\sqrt{1+|h|^2}}\right)-(2-\log 2)\sqrt{1+|h|^2}-\frac{\langle y,k\rangle }{1+|y|}\nonumber \\&\qquad \le |y|-\log (1+|y|)-(2-\log 2). \end{aligned}$$
(5.4)

As a function of \(|h|\), the left-hand side of the inequality is nonincreasing (see Lemma 3.1 (iii)), so it suffices to prove the bound for \(|h|=|k|\). Fix \(|y|,\, |k|\) and consider the left-hand side as a function \(F\) of \(\langle y,k\rangle \). This function is concave (Lemma 3.1 (iv)), and

$$\begin{aligned} F^{\prime }(\langle y,k\rangle )=\frac{1}{\sqrt{1+|k|^2}+|y+k|}-\frac{1}{1+|y|}. \end{aligned}$$
(5.5)

Now, if \(|y|+1> \sqrt{1+|k|^2}+|k|-|y|\), then \(F^{\prime }\) vanishes at \(\langle y,k\rangle =(1+|y|)(1-\sqrt{1+|k|^2})\) and hence it suffices to establish (5.4) for \(y\) and \(k\) satisfying this equation. A little calculation transforms the estimate into (3.2). On the other hand, if \(|y|+1\le \sqrt{1+|k|^2}+|k|-|y|\), then \(F^{\prime }\) is nonpositive on \([-|y||k|,|y||k|]\) and we need to verify (5.4) for \(y,\,k\) satisfying \(\langle y,k\rangle =-|y||k|\). Then the bound reduces to (3.3).

Step 3

Finally, we treat (3.4) for general vectors. The bound is equivalent to

$$\begin{aligned}&|y+k|-|x+h|\log \left(1+\frac{|y+k|}{|x+h|}\right)-(2-\log 2)|x+h|+\frac{\langle x,h\rangle -\langle y,k\rangle }{1+|y|}\nonumber \\&\qquad \qquad \le \; |y|-\log (1+|y|)-(2-\log 2). \end{aligned}$$
(5.6)

For fixed \(|x|,\, y,\, h\), and \(k\), the left-hand side, as a function of \(\langle x,h\rangle \), is convex (see Lemma 3.1 (iii)) and hence it suffices to verify the estimate in the cases \(\langle x,h\rangle =|x||h|\) and \(\langle x,h\rangle =0\). These cases have been considered in Steps 1 and 2. \(\square \)