1 Introduction

The theory of nonparametric maximal exponential models centered at a given positive density p started with the seminal work by Pistone and Sempi [18] and is a generalization of the statistical theory of exponential families. An infinite-dimensional or nonparametric exponential family is typically defined by the form \(\exp (u-K_p(u)) p\), where the random variable u varies in an appropriate function space, p is a probability density with respect to a base probability \({\mathbb {P}}\), and \(K_p(u)\) is the logarithm of the normalization constant, which is also known as the cumulant generating function. In [18], the sufficient statistics u in the exponential family are such that \(\exp (\theta u)\) is \(p \cdot {\mathbb {P}}\)-integrable for all \(\theta \) in a real open interval containing 0; in fact, a further restriction is needed to avoid the border of the set \(\{u: K_p(u)<+\infty \}\). This integrability condition, in turn, defines the exponential Orlicz space \(L^{\Phi }(p)\), a Banach space whose various equivalent characterizations are the starting point from which the results of this paper arise.

The geometric theory of statistical models as manifolds modeled on exponential Orlicz spaces was deeply investigated in [1, 6, 7, 17]. Specifically, two densities p and q belonging to the maximal exponential model are connected by an open exponential arc. (By open, we essentially mean that the two connected densities are not the extremal points of the arc.) All densities connected by an open exponential arc form a Banach manifold \({\mathcal {P}}\), and the space \(L^{\Phi }(p)\), \(p\in \mathcal P\), is an expression of the tangent space.

Subsequent developments and applications of maximal exponential models to Statistics, Information Geometry, Physics and also to Finance have been presented in many works, see, e.g., [8, 12, 14, 16, 20,21,22,23,24,25]. In particular, in [20] the authors prove that two densities are connected by an open exponential arc if and only if the corresponding Orlicz spaces are equal. The possibility to switch from one Orlicz space to the other was exploited in [23] to improve some duality results in a problem of exponential utility maximization. Furthermore, since connected densities produce the same Orlicz space with equivalent norms, in [25] robust concentration inequalities of Bernstein type have been obtained.

An equivalent characterization of the maximal exponential model, given in [20], requires the ratio q/p to satisfy an integrability condition with respect to both densities p and q, denoted here by \((P_\alpha )\), with \(\alpha >1\). The finiteness of both Kullback–Leibler divergences \(D(q\Vert p)\) and \(D(p\Vert q)\) follows immediately from \((P_\alpha )\).

Different authors have generalized the original structure of the maximal exponential model by replacing the exponential function with deformed exponentials (see, e.g., [11, 28, 29]) or by modeling the statistical manifold on other function spaces (see, e.g., [4, 13]).

In this paper, new analytical properties of the maximal exponential model in the topology of the exponential Orlicz space are presented. The notion of sub-exponential random variable, which is equivalent to membership in the exponential Orlicz space, leads us to investigate its link with the definition of BMO (bounded mean oscillation) function. In this analysis, we use a suitable norm equivalent to the standard Luxemburg norm, which allows us to obtain a sharp estimate for the distance to \(L^\infty \) of a sub-exponential random variable. This estimate, in turn, allows us to better understand the structure underlying the maximal exponential model, as it identifies the infimum of the values \(\alpha >1\) for which \((P_\alpha )\) holds. A comparison with the BMO theory highlights that condition \((P_\alpha )\), characterizing the maximal exponential model, is a sort of “static” Muckenhoupt condition on \(\log (q/p)\) with respect to both densities p and q. This could be a hint on how to introduce time dependence in the context of maximal exponential models, making it possible to change the perspective from static to dynamic.

An interesting issue in statistical applications is the possibility of knowing the constants that the change of law produces on the equivalent norms of Orlicz spaces centered at two connected densities. We provide an answer to this matter and exploit our results to derive uniform bounds.

The paper is organized as follows. In Sect. 2, we investigate the notion of sub-exponential random variable and its link with the distance in the Orlicz topology to \(L^\infty \). In Sect. 3, we use the results of the previous section to highlight the structure underlying the maximal exponential model. In Sect. 4, we study the relationships between Orlicz spaces centered at connected densities, in terms of how their equivalent norms transform. We provide uniform bounds for the norms and some concentration inequalities of Bernstein type that are robust in an exponential subfamily with respect to a reference density.

2 Sub-exponentiality in Orlicz Spaces

In this section, we first describe the relevant Banach spaces: the exponential Orlicz space and its conjugate, endowed with the Luxemburg norm (2). We then present the exponential space as a space of sub-exponential random variables with a new norm based on moments, equivalent to the Luxemburg norm, and we investigate its connections to BMO spaces. The second part of the section is partly new and is devoted to the study of a bound on the norm-distance of a generic sub-exponential random variable from \(L^\infty \). In fact, it is well known that the space of bounded random variables is not dense in the exponential Orlicz space, unless the state space is finite, cf. [19]. In particular, the sequence of truncations of a generic element of such a space does not converge in general to the original variable. The bound we obtain characterizes, in terms of the moment generating function, those random variables for which the truncations converge. It also provides a better understanding of the structure of maximal exponential models in the next section.

Let \(( \Omega , {{\mathcal {F}}}, {\mathbb {P}})\) be a fixed probability space. Denote by \(L^k\), \(k\ge 1\), the ordinary Lebesgue spaces and by \(L^0\) the set of all random variables u defined on the probability space.

An exponential Orlicz space is a classical Banach space associated with an exponentially growing Young function. Young functions can be seen as generalizations of the power functions, and consequently, Orlicz spaces are generalizations of the Lebesgue spaces. Specifically,

Definition 2.1

A Young function \(\Phi \) is an even, convex function \(\Phi : {\mathbb R}\rightarrow [0,+\infty ]\) such that

$$\begin{aligned} i)\, \Phi (0)=0, \ ii)\, \lim _{x\rightarrow +\infty }\Phi (x)=+\infty , \ iii) \, \Phi (x)<\infty \text{ in } \text{ a } \text{ neighborhood } \text{ of } \text{0 }. \end{aligned}$$

The conjugate function \(\Psi \) of \(\Phi \) is defined as \(\Psi (y)=\underset{x\in {{\mathbb {R}}}}{\sup }\{xy-\Phi (x)\}\), \(\forall y \in {{\mathbb {R}}}\) and is itself a Young function.

The Orlicz space \(L^{\Phi }\), associated with the Young function \(\Phi \), is defined as

$$\begin{aligned} L^{\Phi }=\left\{ u\in L^0 \ : \ \exists \ \beta >0 \ s.t. \ {\mathbb {E}}\left( \Phi (\beta u)\right) < \infty \right\} \end{aligned}$$
(1)

and the corresponding subspace of centered random variables is denoted by \(L^{\Phi }_0\).

\(L^{\Phi }\) is a Banach space when endowed with the Luxemburg norm

$$\begin{aligned} \Vert u\Vert _{\Phi }=\inf \left\{ \alpha >0 \ : \ {\mathbb {E}}\left( \Phi \left( \frac{u}{\alpha }\right) \right) \le 1\right\} . \end{aligned}$$
(2)

We refer to [19] for a general background on the subject.

Here, we focus on the Young function \(\Phi _1(x)=\text {cosh}(x)-1\), whose conjugate function is \(\Psi _1(y)=y\log (y+\sqrt{1+y^2})-\sqrt{1+y^2}+1\). As a consequence, a random variable u belongs to \(L^{\Phi _1}\) if and only if its moment generating function \(M_u(t)= {\mathbb {E}}(e^{tu})\) is finite in a neighborhood of 0. Moreover, the closed unit ball of \(L^{\Phi _1}\) is

$$\begin{aligned} {{\mathcal {B}}}_1=\{u\in L^{\Phi _1} \, : \ {\mathbb {E}}\left( \Phi _1( u)\right) \le 1 \}=\{u\in L^{\Phi _1} \, : \ {\mathbb {E}}\left( \text {cosh}( u)\right) \le 2 \}. \end{aligned}$$

It is worth recalling that the same Orlicz space can be related to different equivalent Young functions. Specifically, the Young function \(\Phi _1(x)\) is equivalent to the more commonly used \(\Phi _2(x)= e^{|x|}-|x|-1\) and its conjugate function \(\Psi _1(y)\) is equivalent to \(\Psi _2(y)= (1+|y|)\log (1+|y|)-|y|\), whose analytic expression is related to the Kullback–Leibler divergence.
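The closed form of \(\Psi _1\) can be checked numerically. The following Python sketch compares it with a brute-force evaluation of the supremum defining the conjugate function. The function names and the grid parameters are illustrative assumptions and do not come from the cited references.

```python
import math

def phi1(x):
    """Young function Phi_1(x) = cosh(x) - 1."""
    return math.cosh(x) - 1.0

def psi1_closed_form(y):
    """Closed form y*log(y + sqrt(1 + y^2)) - sqrt(1 + y^2) + 1, written with asinh."""
    return y * math.asinh(y) - math.sqrt(1.0 + y * y) + 1.0

def psi1_grid(y, x_max=20.0, steps=100_000):
    """Approximate sup_x {x*y - Phi_1(x)} on a uniform grid of [-x_max, x_max]."""
    best = -math.inf
    for i in range(steps + 1):
        x = -x_max + 2.0 * x_max * i / steps
        best = max(best, x * y - phi1(x))
    return best

# Largest discrepancy between the two evaluations on a few test points.
max_gap = max(abs(psi1_grid(y) - psi1_closed_form(y)) for y in (-3.0, -0.5, 0.0, 1.0, 3.0))
```

The supremum is attained at \(x=\text {arcsinh}(y)\), which is why the grid only needs to cover a moderate range for moderate values of y.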

Remark 2.2

We can easily locate the Orlicz spaces \(L^{\Phi _1}\) and \(L^{\Psi _1}\) in the hierarchy of classical \(L^k\) spaces:

$$\begin{aligned} L^\infty \subseteq L^{\Phi _1} \subset L^k \subseteq L^{\Psi _1} \subset L^1, \qquad k>1; \end{aligned}$$

more precisely, each of the corresponding injections is continuous. In particular, if \(u\in L^{\Phi _1}\), then u has finite moments of every order.

The following proposition gives equivalent conditions for a random variable to belong to \( L^{\Phi _1}\) (cf. [30]).

Proposition 2.3

Let \(u\in L^0\). The following conditions are equivalent:

  1. \(u\in L^{\Phi _1}\), i.e., the moment generating function of u is finite in a neighborhood of 0.

  2. There are \(\lambda >0\) and \(\delta =\delta (\lambda )\ge 1\) such that

    $$\begin{aligned} {\mathbb {P}}(|u|\ge t)\le \delta e^{-\lambda t} \quad \text{ for all } t>0. \end{aligned}$$
    (3)

  3. \(\sup \limits _{k\ge 1}\left( {\mathbb {E}}\left( |u|^k\right) /k!\right) ^{1/k}<\infty \).

  4. \(\limsup \limits _{k\rightarrow \infty }\left( {\mathbb {E}}\left( |u|^k\right) /k!\right) ^{1/k}<\infty \).

For \(u\in L^{\Phi _1}\), the mapping

$$\begin{aligned} \Vert u\Vert _{\star }=\sup _{k\ge 1}\left( {\mathbb {E}}\left( \frac{|u|^k}{k!}\right) \right) ^{1/k} \end{aligned}$$
(4)

is a norm on \(L^{\Phi _1}\) equivalent to \(\Vert u\Vert _{\Phi _1}\) and it holds (see [24])

$$\begin{aligned} \frac{2}{3} \ \Vert u\Vert _{\Phi _1} \le \Vert u\Vert _{\star } \le 4 \ \Vert u\Vert _{\Phi _1}. \end{aligned}$$
(5)

In particular, it can be easily checked that for constant random variables we have \(\Vert c\Vert _{\star }=|c|\) and for any bounded random variable u it holds \(\Vert u\Vert _{\star }\le \Vert u\Vert _{\infty }\).
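For a random variable taking finitely many values, both norms can be computed directly. The sketch below is a minimal illustration under that assumption: `luxemburg_norm` solves \({\mathbb {E}}(\Phi _1(u/\alpha ))\le 1\) by bisection and `star_norm` truncates the supremum in (4) at a hypothetical cutoff `k_max`, which is harmless for bounded variables because the k-th term tends to 0. All names and the sample distribution are illustrative choices.

```python
import math

def luxemburg_norm(values, probs, tol=1e-12):
    """Luxemburg norm (2) for Phi_1 = cosh - 1 of a finite discrete random variable."""
    if all(v == 0 for v in values):
        return 0.0

    def modular(alpha):
        # E[Phi_1(u / alpha)]
        return sum(p * (math.cosh(v / alpha) - 1.0) for v, p in zip(values, probs))

    # alpha = max|v| is always admissible because cosh(1) - 1 < 1.
    lo, hi = 0.0, max(abs(v) for v in values)
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

def star_norm(values, probs, k_max=60):
    """sup over 1 <= k <= k_max of (E|u|^k / k!)^(1/k), a truncation of (4)."""
    best = 0.0
    for k in range(1, k_max + 1):
        moment = sum(p * abs(v) ** k for v, p in zip(values, probs))
        best = max(best, (moment / math.factorial(k)) ** (1.0 / k))
    return best

# Assumed sample distribution: u takes the values -1, 0, 2 with probabilities 0.3, 0.5, 0.2.
values, probs = [-1.0, 0.0, 2.0], [0.3, 0.5, 0.2]
lux = luxemburg_norm(values, probs)
star = star_norm(values, probs)
bounds_hold = (2.0 / 3.0) * lux <= star <= 4.0 * lux   # inequality (5)
```

For a constant c the same functions return \(|c|/\text {arccosh}(2)\) and \(|c|\), respectively, in agreement with the remark on constant random variables.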

In the sequel, we will prove our results with respect to the norm \(\Vert \cdot \Vert _{\star }\), which turns out to be a convenient choice in order to obtain sharp estimates (see Theorem 2.8).

Remark 2.4

A random variable u satisfying condition 2 in the previous proposition is also called sub-exponential. If u satisfies the inequality \({\mathbb {P}}(|u|\ge t)\le \delta e^{-\lambda t^2}, \ \forall t>0\), it is called sub-Gaussian. It is immediate to see that a sub-Gaussian random variable u is also sub-exponential and that u is sub-Gaussian if and only if \(u^2\) is sub-exponential. More generally, \(u^n\) is sub-exponential, i.e., \(u^n\in L^{\Phi _1}\), if and only if \({\mathbb {P}}(|u|\ge t)\le \delta e^{-\lambda t^n}, \ \forall t>0\).
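As a concrete instance, a standard normal variable Z is sub-Gaussian and \(Z^2\) is sub-exponential. The sketch below checks the tail bound on a grid with the assumed constants \(\delta =1\) and \(\lambda =1/2\), and encodes the classical formula \({\mathbb {E}}e^{aZ^2}=(1-2a)^{-1/2}\), which is finite exactly for \(a<1/2\). The function names are illustrative.

```python
import math

def normal_two_sided_tail(t):
    """P(|Z| >= t) for a standard normal Z, via the complementary error function."""
    return math.erfc(t / math.sqrt(2.0))

def mgf_of_square(a):
    """E exp(a * Z^2) = (1 - 2a)^(-1/2), finite only for a < 1/2."""
    if a >= 0.5:
        return math.inf
    return 1.0 / math.sqrt(1.0 - 2.0 * a)

# Sub-Gaussian tail bound with delta = 1 and lambda = 1/2, checked on a grid of t values.
grid = [0.1 * i for i in range(1, 81)]
tail_ok = all(normal_two_sided_tail(t) <= math.exp(-0.5 * t * t) for t in grid)
```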

Remark 2.5

Condition 2 of Proposition 2.3 can be rewritten in the form:

2’.:

There is \(\lambda >0\) such that \({\mathbb {P}}(|u|\ge t)\le e^{-\lambda t}, \ \forall t>t_0=t_0(\lambda )\ge 0\),

which turns out to be a probabilistic version of the John–Nirenberg inequality for \(\text {BMO}\)-functions in classical analysis (see [9]). Specifically, we recall that a real-valued locally integrable function f on \({\mathbb {R}}^d\) has bounded mean oscillation, that is, \(f\in \text {BMO}\), if

$$\begin{aligned} \sup _I \frac{1}{|I|}\int _I |f-f_I|\, dx <+\infty , \end{aligned}$$
(6)

where I denotes any cube in \({\mathbb {R}}^d\), |I| its volume and \(f_I\) the mean of f over I. John and Nirenberg proved in [9] that \(f\in \text {BMO}\) if and only if there is \(\lambda >0\) such that

$$\begin{aligned} \sup _I \frac{1}{|I|} \, \big | \{ x\in I: \, |f(x)-f_I | \ge t \} \big | \le e^{-\lambda t}, \ \forall t>t_0=t_0(\lambda )\ge 0. \end{aligned}$$

The probabilistic version of (6) for continuous uniformly integrable martingales \(M=(M_s)_{0\le s\le \infty }\) with \(M_0=0\) is

$$\begin{aligned} \sup _\tau \big \Vert {\mathbb {E}}\left( |M_\infty -M_\tau | \,|\, \mathcal F_{\tau }\right) \big \Vert _\infty <+\infty , \end{aligned}$$

where \(\tau \) denotes any stopping time for the filtration \(\mathcal F\). The probabilistic analogue of the John–Nirenberg inequality was proved by Emery [3]: \(M\in \text {BMO}\) if and only if there is \(\lambda >0\) such that

$$\begin{aligned} {\mathbb {P}} \left( \sup _{s\ge 0} |M_{\tau +s }-M_\tau | \ge \epsilon \,|\, {\mathcal {F}}_{\tau }\right) \le e^{-\lambda \epsilon }, \ \forall \epsilon >\epsilon _0=\epsilon _0(\lambda )\ge 0, \end{aligned}$$

for any stopping time \(\tau \).

The following proposition establishes the equality between the largest range for which the moment generating function of |u| is finite and the largest \(\lambda \) for which condition (3) holds.

Proposition 2.6

For \(u\in L^{\Phi _1}\), define

$$\begin{aligned} a(u)=\sup \{a>0 \, : \, {\mathbb {E}}e^{a |u|}<+\infty \} \end{aligned}$$

and

$$\begin{aligned} \lambda (u)=\sup \{\lambda >0 \, : \, u \text{ satisfies } (3) \}. \end{aligned}$$

Then, it holds

$$\begin{aligned} a(u)=\lambda (u). \end{aligned}$$

Proof

In order to prove \(a(u)\le \lambda (u)\), we show that

$$\begin{aligned} \{a>0 \,: \, {\mathbb {E}}e^{a |u|}<+\infty \} \subseteq \{\lambda >0 \,: \, u \text{ satisfies } (3) \}. \end{aligned}$$

Let us take \(a>0\) such that \({\mathbb {E}}e^{a |u|}<+\infty \). By Markov inequality, for all \(t>0\) we have

$$\begin{aligned} {{\mathbb {P}}}(|u|\ge t)= {{\mathbb {P}}}(e^{a |u|}\ge e^{a t})\le e^{-a t}{{\mathbb {E}}}(e^{a |u|}). \end{aligned}$$

We then obtain (3) by taking \(\lambda =a\) and \(\delta ={{\mathbb {E}}}(e^{a |u|})\).

To prove \(a(u)\ge \lambda (u)\), suppose by contradiction that \(a(u)< \lambda (u)\). We can then select \(\lambda _1, \lambda _2>0\) such that \(a(u)<\lambda _1<\lambda _2<\lambda (u)\). Since \(\lambda _2<\lambda (u)\), it holds

$$\begin{aligned} {\mathbb {P}}(|u|\ge t)\le \delta e^{-\lambda _2 t}, \end{aligned}$$
(7)

for a suitable \(\delta \ge 1\). Using (7), for any \(k=1,2...\) we have

$$\begin{aligned} {{\mathbb {E}}}(|u|^k)&=\int _0^{+\infty } {{\mathbb {P}}}(|u|^k>t) \ \text {d}t=\int _0^{+\infty } {{\mathbb {P}}}(|u|>s) \ k s^{k-1} \ \text {d}s \\ {}&\le k \delta \int _0^{+\infty } s^{k-1} e^{-\lambda _2 s} \ \text {d}s= k!\frac{\delta }{\lambda _2^k}. \end{aligned}$$

As a consequence, we get

$$\begin{aligned} {\mathbb {E}}e^{\lambda _1 |u|}&=\sum _{k=0}^{+\infty }\frac{\lambda _1^k}{k!} {\mathbb E}(|u|^k)\le \delta \sum _{k=0}^{+\infty }\left( \frac{\lambda _1}{\lambda _2}\right) ^k=\delta \frac{1}{1-\frac{\lambda _1}{\lambda _2}}<+\infty , \end{aligned}$$

which implies \(\lambda _1\le a(u)\), contradicting the choice \(a(u)<\lambda _1\). \(\square \)
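The equality \(a(u)=\lambda (u)\) is easy to see on an exponentially distributed variable, for which all the quantities are available in closed form. The sketch below assumes u exponential with a hypothetical parameter `rate`: the tail is \(e^{-\text {rate}\cdot t}\), the moment generating function is finite exactly for \(a<\text {rate}\), and \({\mathbb {E}}(u^k)/k!=\text {rate}^{-k}\), so that \(\Vert u\Vert _{\star }=1/\text {rate}\).

```python
import math

def exp_tail(t, rate):
    """P(u >= t) for u exponentially distributed with the given rate."""
    return math.exp(-rate * t)

def exp_mgf(a, rate):
    """E exp(a*u) = rate / (rate - a) for a < rate, and +infinity otherwise."""
    return rate / (rate - a) if a < rate else math.inf

def exp_star_norm(rate, k_max=50):
    """sup_k (E u^k / k!)^(1/k); here E u^k / k! = rate^(-k), so each term is 1/rate."""
    return max(((1.0 / rate) ** k) ** (1.0 / k) for k in range(1, k_max + 1))

rate = 2.0
# Markov step of the proof: tail <= delta * exp(-a t) with delta = E exp(a u), for a < rate.
markov_ok = all(
    exp_tail(t, rate) <= exp_mgf(a, rate) * math.exp(-a * t)
    for a in (0.5, 1.0, 1.9)
    for t in (0.1 * i for i in range(1, 101))
)
```

In this example both suprema in Proposition 2.6 equal `rate`, and neither is attained by the moment generating function, which diverges at \(a=\text {rate}\).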

The next result is a technical preliminary to our main results.

Lemma 2.7

Let \(u\in L^{\Phi _1}\). If \(\Vert u\Vert _{\star }<1\), then

$$\begin{aligned} {\mathbb {E}}e^{|u|}\le \frac{1}{1-\Vert u\Vert _{\star }}. \end{aligned}$$

Proof

By the definition (4) of \(\Vert \cdot \Vert _{\star }\), we have

$$\begin{aligned} {\mathbb {E}}e^{|u|}\le \sum _{k=0}^\infty \frac{1}{k!}{\mathbb {E}}(|u|^k)\le \sum _{k=0}^\infty \Vert u\Vert _{\star }^k=\frac{1}{1-\Vert u\Vert _{\star }}. \end{aligned}$$

\(\square \)
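The bound of Lemma 2.7 cannot be improved in general. Under the assumption that u is exponentially distributed with rate greater than 1, one has \(\Vert u\Vert _{\star }=1/\text {rate}\) and \({\mathbb {E}}e^{u}=\text {rate}/(\text {rate}-1)\), so the inequality holds with equality. The sketch below checks this, together with a uniform variable on [0, 1] for which the inequality is strict. Both examples and all names are illustrative assumptions.

```python
import math

def lemma_bound(star_norm):
    """Right-hand side 1 / (1 - ||u||_*) of Lemma 2.7, valid when ||u||_* < 1."""
    if not 0.0 <= star_norm < 1.0:
        raise ValueError("Lemma 2.7 requires 0 <= ||u||_* < 1")
    return 1.0 / (1.0 - star_norm)

# Exponential variable with rate 3: ||u||_* = 1/3 and E e^u = 3/2.
rate = 3.0
exp_lhs = rate / (rate - 1.0)
exp_rhs = lemma_bound(1.0 / rate)

# Uniform variable on [0, 1]: E u^k = 1/(k+1), so the k-th term of the supremum
# is (1/(k+1)!)^(1/k), which is largest at k = 1.
unif_star = max((1.0 / math.factorial(k + 1)) ** (1.0 / k) for k in range(1, 40))
unif_lhs = math.e - 1.0            # E e^u for u uniform on [0, 1]
unif_rhs = lemma_bound(unif_star)
```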

The following theorem gives an exact relation between \(a(u)\) \((=\lambda (u))\) and \(d_{\star }(u,L^{\infty })\), where

$$\begin{aligned} d_{\star }(u,L^{\infty })= \inf _{\ell \in L^\infty }\Vert u-\ell \Vert _{\star }. \end{aligned}$$

It is in the spirit of the results obtained in classical and stochastic analysis for the distance in \(\text {BMO}\) to \(L^\infty \), where upper and lower bounds for a(u) in terms of \(d_{\text {BMO}}(u, L^\infty )\) are given (see, e.g., [5] and, for continuous martingales, [27]). In our setting, we can prove that a(u) is exactly the reciprocal of the \(d_{\star }\)-distance of \(u\in L^{\Phi _1}\) to \(L^\infty \).

Theorem 2.8

For \(u\in L^{\Phi _1}\), we have

$$\begin{aligned} a(u)= \frac{1}{d_{\star }(u,L^{\infty })} \end{aligned}$$

Proof

We first prove that \(a(u) \ge \frac{1}{d_{\star }(u,L^{\infty })}\). Let \(0<a<\frac{1}{d_{\star }(u,L^{\infty })}\). Then, \(a \Vert u-\ell \Vert _{\star }<1\) for some \(\ell \in L^{\infty }\). Using Lemma 2.7, we get

$$\begin{aligned} {\mathbb {E}}e^{a|u|}\le {\mathbb {E}}\left( e^{a|u-\ell |}e^{a|\ell |}\right) \le e^{a\Vert \ell \Vert _{\infty }} {\mathbb {E}}e^{a|u-\ell |}\le \frac{e^{a\Vert \ell \Vert _{\infty }}}{1-a \Vert u-\ell \Vert _{\star }}<+\infty , \end{aligned}$$

which implies \(a\le a(u)\). We now prove that \(a(u) \le \frac{1}{d_{\star }(u,L^{\infty })}\). It suffices to verify that if \(0< a < a(u)\), then \(a\le \frac{1}{d_{\star }(u,L^{\infty })}\). Let us fix \(0<a < a(u)\). For any \(\ell \in L^{\infty }\), it holds

$$\begin{aligned} d_{\star }(u,L^{\infty })\le \Vert u-\ell \Vert _{\star } =\sup _{k\ge 1}\left( {\mathbb {E}}\left( \frac{|u-\ell |^k}{k!}\right) \right) ^{1/k}\le \frac{1}{a} \, {\mathbb {E}}e^{a|u-\ell |}, \end{aligned}$$

where the last inequality is due to the fact that for any \(a>0\) and \(k\ge 1\) it holds \(\frac{{\mathbb {E}}|v|^k}{k!}\le \frac{1}{ a^k} {\mathbb {E}} e^{a |v|}\) and thus

$$\begin{aligned} \left( {\mathbb {E}}\left( \frac{|u-\ell |^k}{k!}\right) \right) ^{1/k}\le \frac{1}{a} \, \left( {\mathbb {E}}e^{a|u-\ell |}\right) ^{1/k}\le \frac{1}{a} \, {\mathbb {E}}e^{a|u-\ell |}. \end{aligned}$$

Now, let us choose the sequence of truncations at n of u, i.e., \(u_n=1_{|u|\le n}u\in L^{\infty }\); then, the sequence \(|u-u_n|=1_{|u|>n} |u|\) decreases to 0. Since by hypothesis \(a < a(u)\), the random variables \(e^{a|u-u_n|}\) are bounded by the integrable random variable \(e^{a|u|}\). Applying the Lebesgue dominated convergence theorem, we deduce that \(\lim _{n\rightarrow +\infty }{\mathbb {E}}e^{a|u-u_n|}=1\) and thus \(d_{\star }(u,L^{\infty })\le \frac{1}{a} \). \(\square \)

Remark 2.9

From the theorem above, we immediately see that the sequence of truncations \(u_n\) converges to the original variable \(u\in L^{\Phi _1}\) if and only if the moment generating function of u is defined on the whole real line (cf. [1], Lemma 2).
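The failure of convergence of the truncations can be made explicit. Assume u is exponentially distributed with parameter `rate` and let \(u_n=1_{|u|\le n}u\). A change of variable gives \({\mathbb {E}}(|u-u_n|^k)/k!=Q(k+1,\text {rate}\cdot n)/\text {rate}^k\), where \(Q(k+1,x)=e^{-x}\sum _{j=0}^{k}x^j/j!\) tends to 1 as \(k\rightarrow \infty \). Hence \(\Vert u-u_n\Vert _{\star }=1/\text {rate}=1/a(u)\) for every n, while the \(L^1\) error, which is the term with \(k=1\), vanishes. The following sketch evaluates these quantities; the cutoff `k_max` and the function names are illustrative assumptions.

```python
import math

def poisson_cdf(k, x):
    """e^{-x} * sum_{j=0}^{k} x^j / j!, which equals Q(k+1, x) for integer k."""
    term = math.exp(-x)
    total = term
    for j in range(1, k + 1):
        term *= x / j
        total += term
    return min(total, 1.0)   # guard against rounding slightly above 1

def truncation_error_star_norm(rate, n, k_max=400):
    """Approximate ||u - u_n||_* for u ~ Exp(rate), using the terms with k <= k_max."""
    x = rate * n
    return max(poisson_cdf(k, x) ** (1.0 / k) for k in range(1, k_max + 1)) / rate

def truncation_error_l1(rate, n):
    """E|u - u_n|, i.e. the k = 1 term of the supremum."""
    return poisson_cdf(1, rate * n) / rate

errors = {n: (truncation_error_l1(1.0, n), truncation_error_star_norm(1.0, n))
          for n in (1, 5, 20, 50)}
```

The first component of each pair in `errors` decreases to 0 with n, whereas the second stays at the distance \(d_{\star }(u,L^{\infty })\) predicted by Theorem 2.8.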

In the sequel, we investigate the relationships among \(L^{\Phi _1}\) spaces with respect to different probability measures whose densities with respect to \({\mathbb {P}} \) belong to an open exponential model. For this purpose, we need to recall the notion of open exponential model, as well as its geometric meaning and the corresponding main results within the Orlicz framework.

3 Exponential Models

This section discusses a crucial topic in the geometric theory of statistical models. Here, the main new contributions are represented by Item 8 of Theorem 3.5 and by Proposition 3.8.

Let \({\mathcal {P}}\) denote the set of all densities which are positive \({\mathbb {P}}\)-a.s. For each fixed \(p\in {\mathcal {P}}\), we use \({\mathbb {E}}_p\) to indicate the integral with respect to \(p \cdot {\mathbb {P}}\). Moreover, the corresponding Orlicz space associated with \(\Phi _1\) is denoted by \(L^{\Phi _1}(p)\).

Let us consider the cumulant generating functional \(K_p(u)=\log {\mathbb {E}}_p(e^u)\) defined on the subspace of centered random variables \(L^{\Phi _1}_0(p)\). We recall from [18] that \(K_p\) is a positive convex and lower semicontinuous function, vanishing at zero. In addition, the interior of its proper domain, denoted here by \(\overset{\circ }{\text {dom}\, K_p}\), is a non-empty convex set containing the open unit ball of \(L^{\Phi _1}_{0}(p)\). This allows us to give the following definition.

Definition 3.1

For every density \(p \in {{\mathcal {P}}}\), the maximal exponential model at p is

$$\begin{aligned} {{\mathcal {E}}}(p)=\left\{ q=e^{u-K_p(u)} p: u\in \overset{\circ }{\text {dom}\, K_p} \right\} \subseteq {{\mathcal {P}}}. \end{aligned}$$

Remark 3.2

\(K_p\) is defined on the set \(L^{\Phi _1}_{0}(p)\) because centered random variables guarantee the uniqueness of the representation of \(q\in {{\mathcal {E}}}(p)\).

One of the main results in [18] states that any density belonging to the maximal exponential model centered at p is connected by an open exponential arc to p and vice versa. By open, we mean that the two densities are not the extremal points of the arc:

Definition 3.3

\(p, q \in {{\mathcal {P}}}\) are connected by an open exponential arc if there exists an open interval \(I \supset [0,1]\) such that \(p(\theta )\propto p^{(1-\theta )}q^{\theta }\in {\mathcal {P}}\), \(\forall \theta \in I\).

The connection by an open exponential arc is an equivalence relation. An equivalent definition is provided by the following proposition (see [1]).

Proposition 3.4

\(p, q \in {{\mathcal {P}}}\) are connected by an open exponential arc iff there exists an open interval \(I \supset [0,1]\) such that \(p(\theta )\propto e^{\theta u}p\in {\mathcal {P}}\), \(\forall \theta \in I\), where \(u\in L^{\Phi _1}(p)\) and \(p(0)=p, \ p(1)=q\).

The following theorem gives different equivalent conditions for a density to belong to the maximal exponential model. The proof of assertions 1–4 can be found in [1], while that of assertions 5–7 can be found in [20, 23]. The equivalence between 4 and the new assertion 8 follows from Theorem 2.8 of the previous section.

Theorem 3.5

(Portmanteau Theorem) Let \(p,q \in {\mathcal {P}}\). The following statements are equivalent.

  1. \(q\in {\mathcal {E}}(p)\);

  2. q is connected to p by an open exponential arc;

  3. \({\mathcal {E}}(p)={\mathcal {E}}(q)\);

  4. \(\log (q/p) \in L^{\Phi _1}(p)\cap L^{\Phi _1}(q)\);

  5. \(L^{\Phi _1}(p)=L^{\Phi _1}(q)\);

  6. there exists \(\alpha >1\) such that

    $$\begin{aligned} (P_\alpha ): \qquad q/p\in L^{1/(\alpha -1)}(q) \,\, \text{ and } \,\, p/q \in L^{1/(\alpha -1)}(p); \end{aligned}$$

  7. the map \(^m{\mathbb {U}}^q_p: L^{\Psi _1}(p) \rightarrow L^{\Psi _1}(q)\) defined by \(^m{\mathbb {U}}^q_p(v)=(p/q)v\) is an isomorphism of Banach spaces;

  8. \(d_{\star , p}(\log (q/p), L^\infty )<+\infty \) and \(d_{\star , q}(\log (q/p), L^\infty )<+\infty \).

It is worth noting that the maximal exponential model is a good environment when dealing with divergence between densities. In fact, since \((P_\alpha )\) means \(q/p\in L^{1+\epsilon }(p)\) and \(p/q \in L^{1+\epsilon }(q)\) for some \(\epsilon >0\), it immediately follows that if \(q \in \mathcal {E}(p)\), then Kullback–Leibler divergences \(D(q\Vert p)\) and \(D(p\Vert q)\) are both finite.

The equivalence between the equality of the Orlicz spaces \(L^{\Phi _1}(p)\) and \(L^{\Phi _1}(q)\) and property \((P_\alpha )\) has been exploited in [23] to improve some duality results in the classical problem of exponential utility maximization in incomplete markets. In fact, in many well-known works on relative entropy minimization, the minimal entropy martingale (density) measure \(q^*\) satisfies \((P_\alpha )\), which allows one to switch from the reference Orlicz space \(L^{\Phi _1}(p)\) to \(L^{\Phi _1}(q^*)\), and conversely, as convenient.

Remark 3.6

By Hölder's inequality, \((P_\alpha )\) implies \((P_r)\) for all \(r>\alpha \).

Lemma 3.7

The following equivalence holds:

$$\begin{aligned} (P_\alpha ) \quad \Longleftrightarrow \quad {\mathbb {E}}_q\left( \frac{q}{p}\right) ^{\pm \frac{1}{\alpha -1}}< +\infty , {\mathbb {E}}_p\left( \frac{p}{q}\right) ^{ \pm \frac{1}{\alpha -1}}< +\infty . \end{aligned}$$

Proof

It suffices to observe that if \(1<\alpha \le 2\), then

$$\begin{aligned} {\mathbb {E}}_q\left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}}={\mathbb {E}}_p\left( \frac{q}{p}\right) ^{1+\frac{1}{\alpha -1}}<+\infty \, \Longrightarrow \, {\mathbb {E}}_p\left( \frac{p}{q}\right) ^{- \frac{1}{\alpha -1}}= {\mathbb {E}}_p\left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}}<+\infty . \end{aligned}$$

Similarly, if \(1<\alpha \le 2\), then

$$\begin{aligned} {\mathbb {E}}_p\left( \frac{p}{q} \right) ^{\frac{1}{\alpha -1}}={\mathbb {E}}_q\left( \frac{p}{q} \right) ^{1+\frac{1}{\alpha -1}}<+\infty \, \Longrightarrow \, {\mathbb {E}}_q\left( \frac{q}{p} \right) ^{- \frac{1}{\alpha -1}}= {\mathbb {E}}_q\left( \frac{p}{q} \right) ^{\frac{1}{\alpha -1}}<+\infty . \end{aligned}$$

If \(\alpha > 2\), the function \(x^{\frac{1}{\alpha -1}}\) is concave and, by Jensen inequality, \({\mathbb {E}}_p\left( \frac{p}{q}\right) ^{- \frac{1}{\alpha -1}}= {\mathbb {E}}_p\left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}} \le 1\) and \({\mathbb {E}}_q\left( \frac{q}{p} \right) ^{- \frac{1}{\alpha -1}}= {\mathbb {E}}_q\left( \frac{p}{q} \right) ^{\frac{1}{\alpha -1}}\le 1\). \(\square \)

The following proposition characterizes the infimum of the values \(\alpha >1\) for which \((P_\alpha )\) holds, in terms of the \(d_{\star }\)-distance of \(\log (q/p)\) to \( L^\infty \).

Proposition 3.8

Let \(q\in {\mathcal {E}}(p)\) and define

$$\begin{aligned} \alpha _{p,q}=\inf \{\alpha >1\, : \, (P_\alpha ) \text{ holds }\}. \end{aligned}$$

Then,

$$\begin{aligned} \alpha _{p,q}=1+ \max \left\{ d_{\star , p}\left( \log \left( \frac{q}{p}\right) , L^\infty \right) , \, d_{\star , q}\left( \log \left( \frac{q}{p}\right) , L^\infty \right) \right\} . \end{aligned}$$
(8)

Proof

By Lemma 3.7, we get

$$\begin{aligned} \alpha _{p,q}&= \inf \left\{ \alpha>1\,: \, {\mathbb {E}}_q\left( \frac{q}{p}\right) ^{\pm \frac{1}{\alpha -1}}< +\infty , \ {\mathbb {E}}_p\left( \frac{p}{q}\right) ^{ \pm \frac{1}{\alpha -1}}< +\infty \right\} \\&= \inf \left\{ \alpha >1\,: \, {\mathbb {E}}_q \left( e^{\frac{1}{\alpha -1} |\log {\frac{q}{p}}|}\right)< +\infty , \, {\mathbb {E}}_p \left( e^{\frac{1}{\alpha -1} |\log {\frac{q}{p}}|}\right) < +\infty \right\} . \end{aligned}$$

Let \(a_p\) and \(a_q\) denote the quantity \(a(\cdot )\) of Proposition 2.6 computed with respect to \(p\cdot {\mathbb {P}}\) and \(q\cdot {\mathbb {P}}\), respectively. Both expectations above are finite if \(\frac{1}{\alpha -1}<\min \{a_p(\log \frac{q}{p}), a_q(\log \frac{q}{p})\}\), and at least one of them is infinite if \(\frac{1}{\alpha -1}\) exceeds this minimum. By Theorem 2.8, we then deduce

$$\begin{aligned} \alpha _{p,q}&= 1 + \frac{1}{\min \left\{ a_q\left( \log \frac{q}{p}\right) , a_p\left( \log \frac{q}{p}\right) \right\} } = 1 + \max \left\{ \frac{1}{a_q\left( \log \frac{q}{p}\right) }, \frac{1}{a_p\left( \log \frac{q}{p}\right) } \right\} \\&= 1+ \max \left\{ d_{\star , p}\left( \log \left( \frac{q}{p}\right) , L^\infty \right) , \, d_{\star , q}\left( \log \left( \frac{q}{p}\right) , L^\infty \right) \right\} . \end{aligned}$$

\(\square \)
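The threshold \(\alpha _{p,q}\) can be computed by hand in a simple case. As an assumed example, let p and q be the densities, with respect to a common reference measure, of exponential laws with parameters 1 and `lam` > 1, so that \(q/p=\text {lam}\cdot e^{-(\text {lam}-1)x}\). Elementary integration gives the two moments coded below, and \((P_\alpha )\) holds exactly when \(\alpha >\text {lam}\). Since \(\log (q/p)\) is affine in x, Theorem 2.8 yields \(d_{\star ,p}=\text {lam}-1\) and \(d_{\star ,q}=(\text {lam}-1)/\text {lam}\). In this example the threshold equals 1 plus the larger of the two distances, because both integrability conditions in \((P_\alpha )\) must hold simultaneously. All function names are illustrative.

```python
import math

def moment_q_over_p_under_q(s, lam):
    """E_q (q/p)^s for p = Exp(1), q = Exp(lam), lam > 1; finite for every s > 0."""
    return lam ** (s + 1.0) / (s * (lam - 1.0) + lam)

def moment_p_over_q_under_p(s, lam):
    """E_p (p/q)^s for the same pair; finite only when s * (lam - 1) < 1."""
    if s * (lam - 1.0) >= 1.0:
        return math.inf
    return lam ** (-s) / (1.0 - s * (lam - 1.0))

def p_alpha_holds(alpha, lam):
    """Condition (P_alpha): both moments of order 1/(alpha - 1) are finite."""
    s = 1.0 / (alpha - 1.0)
    return (math.isfinite(moment_q_over_p_under_q(s, lam))
            and math.isfinite(moment_p_over_q_under_p(s, lam)))

def distances(lam):
    """(d_{*,p}, d_{*,q}) of log(q/p) to L^infty, obtained as 1/a_p and 1/a_q."""
    return lam - 1.0, (lam - 1.0) / lam

lam = 2.0
d_p, d_q = distances(lam)
threshold = 1.0 + max(d_p, d_q)   # equals lam in this example
```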

In the next section, we will see that when \(q\in {\mathcal {E}}(p)\), the distances \(d_{\star , p}\) and \(d_{\star , q}\) are equivalent. So, from the proposition above we get that \(\log \frac{q}{p}\) belongs to the closure in \(L^{\Phi _1}(p)=L^{\Phi _1}(q)\) of \(L^\infty \) if and only if \(\alpha _{p,q}=1\), which means that \((P_\alpha )\) holds for all \(\alpha >1\).

Remark 3.9

In the classical setting, it was shown (see [5]) that a locally integrable function f on \({\mathbb {R}}^d\) belongs to the \(\text {BMO}\)-closure of \(L^\infty \) if and only if both \(e^f\) and \(e^{-f}\) satisfy the Muckenhoupt \((A_\alpha )\) condition for all \(\alpha >1\):

$$\begin{aligned}&\sup _I\left( \frac{1}{|I|}\int _Ie^f dx \right) \left( \frac{1}{|I|}\int _Ie^{-\frac{f}{\alpha -1} } dx \right) ^{\alpha -1}<+\infty , \\&\sup _I\left( \frac{1}{|I|}\int _Ie^{-f} dx \right) \left( \frac{1}{|I|}\int _Ie^{\frac{f}{\alpha -1} } dx \right) ^{\alpha -1}<+\infty , \end{aligned}$$

where I denotes any cube in \({\mathbb {R}}^d\) and |I| its volume.

In the stochastic setting, a probabilistic analogue of the above result has been obtained for continuous \(\text {BMO}\)-martingales (see [10]); specifically, a \(\text {BMO}\)-martingale \(M=(M_t)_{0\le t\le \infty }\) belongs to the \(\text {BMO}\)-closure of \(L^\infty \) if and only if both \({\mathbb {E}}\left( e^{M_{\infty }} |{\mathcal {F}}_{\cdot }\right) \) and \({\mathbb {E}}\left( e^{-M_{\infty }} |{\mathcal {F}}_{\cdot }\right) \) satisfy the probabilistic version of the Muckenhoupt \((A_\alpha )\) condition, for all \(\alpha >1\):

$$\begin{aligned}&\sup _\tau \big \Vert {\mathbb {E}}\left( e^{M_{\infty }} |\mathcal F_{\tau }\right) {\mathbb {E}}\left( e^{-\frac{M_{\infty }}{\alpha -1}} |{\mathcal {F}}_{\tau }\right) ^{\alpha -1}\big \Vert _\infty<+\infty , \\&\sup _\tau \big \Vert {\mathbb {E}}\left( e^{-M_{\infty }} |\mathcal F_{\tau }\right) {\mathbb {E}}\left( e^{\frac{M_{\infty }}{\alpha -1}} |{\mathcal {F}}_{\tau }\right) ^{\alpha -1}\big \Vert _\infty <+\infty , \end{aligned}$$

where \(\tau \) denotes any stopping time for the filtration \(\mathcal F\). Observe that the class \(\text {BMO}\) as well as \((A_\alpha )\) condition depends on the underlying probability. For \(\tau =0\), \((A_\alpha )\) implies

$$\begin{aligned} (A_\alpha ^0): \qquad {\mathbb {E}}\left( e^{\pm M_{\infty }}\right)<+\infty , \quad \mathbb E\left( e^{\pm \frac{M_{\infty }}{\alpha -1}}\right) <+\infty . \end{aligned}$$

If we let \(\log \frac{q}{p}\) play the role of \(M_\infty \), we immediately see that requiring the joint validity of \((P_2)\) and \((P_\alpha )\) amounts to requiring the validity of \( (A_\alpha ^0)\) with respect to both p and q:

$$\begin{aligned} (P_2) \wedge (P_\alpha ) \Longleftrightarrow (A_\alpha ^0(p))\wedge (A_\alpha ^0(q)). \end{aligned}$$

In particular, taking into account Remark 3.6, for \(1<\alpha \le 2\) it holds

$$\begin{aligned} (P_\alpha ) \, \Longleftrightarrow \, (A_\alpha ^0(p)) \wedge (A_\alpha ^0(q)). \end{aligned}$$

4 Transformation of Norms by a Change of Law

Using essentially new computations with inequalities, this section derives explicit constants in the bounds between norms of equivalent Orlicz spaces. Since these constants depend on the two densities, they can be viewed as a measure of distance between points in the same maximal exponential model. Two applications of these bounds are proposed in Sects. 4.1 and 4.2. Specifically, concentration inequalities frequently appear in high-dimensional probability and statistics ([26, 30]), but studies on the robustness of these inequalities uniformly in bounded parts of the statistical model are still limited.

We start by recalling the following result proved in Cena and Pistone [1].

Proposition 4.1

If the Orlicz spaces \(L^{\Phi _1}(p)\) and \(L^{\Phi _1}(q)\) are equal as sets, then their Luxemburg norms \(\Vert \cdot \Vert _{\Phi _1, p}\) and \(\Vert \cdot \Vert _{\Phi _1, q}\) are equivalent.

The proof of the equivalence of norms is based on a standard argument for function spaces, namely, showing that the identity map from \(L^{\Phi _1}(p)\) to \(L^{\Phi _1}(q)\) is a homeomorphism. In many statistical applications, however, it is crucial to know the constants that this change of law yields on the norms. Theorem 4.2 provides an answer to this issue. It refers to the norm \(\Vert \cdot \Vert _{\star }\), equivalent to \(\Vert \cdot \Vert _{\Phi _1}\) by (5), since it is more suitable for applying the property \((P_\alpha )\) of the Portmanteau theorem.

Theorem 4.2

Let \(v\in L^{\Phi _1}(p)=L^{\Phi _1}(q)\). Then,

  1. there exists \(\alpha >1 \) such that

    $$\begin{aligned} c_{\alpha }^{-1} \Vert v\Vert _{\star ,p} \le \Vert v\Vert _{\star ,q}\le C_{\alpha } \Vert v\Vert _{\star ,p}, \end{aligned}$$
    (9)

    where

    $$\begin{aligned} c_{\alpha }=\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( {\mathbb {E}}_p \left( \frac{p}{q}\right) ^{\frac{1}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha }, \quad C_{\alpha }=\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( {\mathbb {E}}_q \left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha } \end{aligned}$$
    (10)

    are constants independent of v;

  2. it holds

    $$\begin{aligned} c^{-1} \, \Vert v\Vert _{\star ,p} \le \Vert v\Vert _{\star ,q}\le C \, \Vert v\Vert _{\star ,p}, \end{aligned}$$
    (11)

    where

    $$\begin{aligned} c=\inf _{\alpha>1}c_{\alpha }, \quad C=\inf _{\alpha >1}C_{\alpha } \end{aligned}$$
    (12)

Proof

For any \(\gamma >0\) and any integer \(k\ge 1\), using Hölder's inequality with exponents \(\alpha /(\alpha -1)\) and \(\alpha \), we have

$$\begin{aligned} \frac{{\mathbb {E}}_q|v|^k}{k!}\le & {} \frac{1}{ \gamma ^k} {\mathbb {E}}_q e^{\gamma |v|}=\frac{1}{ \gamma ^k} {\mathbb {E}}_q\left( \left( \frac{q}{p}\right) ^{1/\alpha }\left( \frac{p}{q}\right) ^{1/\alpha } e^{\gamma |v|} \right) \\\le & {} \frac{1}{ \gamma ^k} \left( {\mathbb {E}}_q \left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha } \left( {\mathbb {E}}_p e^{\gamma \alpha |v|}\right) ^{\frac{1}{\alpha }}. \end{aligned}$$

Now, choosing \(\gamma >0\) so that \(\gamma \alpha \Vert v\Vert _{\star ,p} <1\), e.g., \(\gamma = \frac{1}{2 \alpha \Vert v\Vert _{\star ,p}}\), by Lemma 2.7 we deduce

$$\begin{aligned} {\mathbb {E}}_p e^{\gamma \alpha |v|}\le \frac{1}{1-\gamma \alpha \Vert v\Vert _{\star ,p}}=2, \end{aligned}$$
(13)

and therefore

$$\begin{aligned} \frac{{\mathbb {E}}_q|v|^k}{k!}\le 2^{\frac{1}{\alpha }} (2 \alpha )^k \Vert v\Vert ^k_{\star ,p} \left( {\mathbb {E}}_q \left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha }. \end{aligned}$$
(14)

By the first inequality of \((P_\alpha )\), for some \(\alpha >1\) the mean value \({\mathbb {E}}_q \left( q/p\right) ^{\frac{1}{\alpha -1}}\) is finite. Taking into account the definition (4) of \(\Vert \cdot \Vert _{\star ,q}\), from (14) we deduce

$$\begin{aligned} \Vert v\Vert _{\star ,q}=\sup _{k\ge 1}\left( {\mathbb {E}}_q\left( \frac{|v|^k}{k!}\right) \right) ^{1/k}\le 2 \alpha \Vert v\Vert _{\star ,p} \sup _{k\ge 1} \Big \{2^{\frac{1}{\alpha k}} \left( {\mathbb {E}}_q \left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha k}\Big \}. \end{aligned}$$
(15)

Since the function \(x^\frac{\alpha }{\alpha -1}\) is convex, by Jensen’s inequality \({\mathbb {E}}_q \left( q/p\right) ^{\frac{1}{\alpha -1}}= {\mathbb {E}}_p \left( q/p\right) ^{\frac{\alpha }{\alpha -1}} \ge 1\). As a consequence, the supremum in (15) is attained at \(k=1\). The right-hand inequality in (9) then follows.

The left-hand inequality in (9) can be proved by interchanging the roles of p and q and exploiting the second inequality of \((P_\alpha )\). Finally, inequality (11) follows immediately from (9). \(\square \)
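
As a numerical illustration of (9) and (10), the following Python sketch evaluates both norms and both constants on a finite sample space. It is only a sanity check under stated assumptions: the weights `p` and `q` (densities with respect to the counting measure), the variable `v`, the truncation level `kmax` and all function names are hypothetical choices, not taken from the paper. Since v is bounded here, the terms \(({\mathbb {E}}|v|^k/k!)^{1/k}\) tend to 0, so the truncated supremum is expected to coincide with \(\Vert v\Vert _{\star }\) for moderate `kmax`.

```python
import math

def star_norm(v, weights, kmax=60):
    """Truncated sup over 1 <= k <= kmax of (E|v|^k / k!)^(1/k)."""
    best = 0.0
    for k in range(1, kmax + 1):
        moment = sum(w * abs(x) ** k for x, w in zip(v, weights))
        best = max(best, (moment / math.factorial(k)) ** (1.0 / k))
    return best

def change_of_law_constant(num, den, alpha):
    """alpha * 2^((alpha+1)/alpha) * (E_num (num/den)^(1/(alpha-1)))^((alpha-1)/alpha)."""
    mean = sum(a * (a / b) ** (1.0 / (alpha - 1.0)) for a, b in zip(num, den))
    return alpha * 2.0 ** ((alpha + 1.0) / alpha) * mean ** ((alpha - 1.0) / alpha)

# Hypothetical example data on a four-point space.
p = [0.25, 0.25, 0.25, 0.25]
q = [0.10, 0.20, 0.30, 0.40]
v = [-1.0, 0.5, 2.0, -0.3]
alpha = 2.0

norm_p = star_norm(v, p)
norm_q = star_norm(v, q)
C_alpha = change_of_law_constant(q, p, alpha)  # built from E_q (q/p)^(1/(alpha-1))
c_alpha = change_of_law_constant(p, q, alpha)  # built from E_p (p/q)^(1/(alpha-1))

# The three printed numbers should be in nondecreasing order, as in (9).
print(norm_p / c_alpha, norm_q, C_alpha * norm_p)
```

On a finite space every \(\alpha >1\) is admissible, so repeating the computation over several values of `alpha` gives a rough idea of the constants c and C in (12).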

Remark 4.3

In the previous theorem, we have chosen \(\gamma >0\) such that \(\gamma \alpha \Vert v\Vert _{\star ,p}=1/2\), which implies \(\frac{1}{1-\gamma \alpha \Vert v\Vert _{\star ,p}}=2\). But any other \(\gamma >0\) could be chosen such that \(\gamma \alpha \Vert v\Vert _{\star ,p}<1\), and this means that in (10) any number strictly greater than 1 can replace the number 2.

The following proposition is a probabilistic version of Gehring’s inequality (see [2]) and gives conditions for an upper bound on \({\mathbb {E}}_q\left( \frac{q}{p}\right) ^\frac{1}{\alpha -1}\) or (interchanging the roles of p and q) on \({\mathbb {E}}_p\left( \frac{p}{q}\right) ^\frac{1}{\alpha -1}\).

Proposition 4.4

Let \(p,q \in {\mathcal {P}}\). Assume there exist two constants \(\delta >0\) and \(c>0\) such that

$$\begin{aligned} {\mathbb {E}}_q\left( \mathbbm {1}_{\{\frac{q}{p}> \lambda \}} \right) \le c \lambda {\mathbb {E}}_p\left( \mathbbm {1}_{\{\frac{q}{p}> \delta \lambda \}} \right) , \quad \forall \, \lambda > 1. \end{aligned}$$
(16)

Then, there is \(\alpha >1\) depending only on \(c, \delta \) such that \({\mathbb {E}}_q(\frac{q}{p})^\frac{1}{\alpha -1}<+\infty \) and it holds

$$\begin{aligned} {\mathbb {E}}_q\left( \frac{q}{p}\right) ^\frac{1}{\alpha -1}\le \frac{2}{1-\frac{c}{\alpha \delta ^{\frac{\alpha }{\alpha -1}}}}. \end{aligned}$$
(17)

Proof

We may assume \(0<\delta <1\), since the right-hand side of (16) does not decrease when \(\delta \) is decreased. Define \(\Phi _{\delta }(x)=x \delta ^{\frac{x}{x-1}}\) on \((1, +\infty )\). It is a continuous, strictly increasing function such that \(\lim _{x\rightarrow 1^+}\Phi _{\delta }(x)=0\) and \(\lim _{x\rightarrow +\infty }\Phi _{\delta }(x)=+\infty \). Fix \(\alpha > \Phi ^{-1}_{\delta }(c)\). Multiplying both sides of (16) by \(\frac{1}{\alpha -1} \lambda ^{\frac{1}{\alpha -1}-1}\) and integrating with respect to \(\lambda \) on the interval \([1, +\infty )\), we find

$$\begin{aligned} {\mathbb {E}}_q\left( \mathbbm {1}_{\{\frac{q}{p}> 1\}} \left( \left( \frac{q}{p}\right) ^\frac{1}{\alpha -1}-1\right) \right) \le \frac{c}{\alpha } \, {\mathbb {E}}_p\left( \mathbbm {1}_{\{\frac{q}{p} > \delta \}} \left( \left( \frac{q}{p\delta }\right) ^\frac{\alpha }{\alpha -1}-1\right) \right) . \end{aligned}$$
(18)

Since \(\delta < 1\), we have \(\mathbbm {1}_{\{\frac{q}{p}> \delta \}}=\mathbbm {1}_{\{\delta <\frac{q}{p} \le 1\}}+\mathbbm {1}_{\{\frac{q}{p} > 1\}}\). Thus, rearranging the terms in (18), we get

$$\begin{aligned}&\left( 1-\frac{c}{\alpha \delta ^{\frac{\alpha }{\alpha -1}}}\right) {\mathbb {E}}_q\left( \mathbbm {1}_{\{\frac{q}{p}> 1\}} \left( \frac{q}{p}\right) ^\frac{1}{\alpha -1}\right) \le \nonumber \\&\quad \le {\mathbb {E}}_q\left( \mathbbm {1}_{\{\frac{q}{p} > 1\}}\right) + \frac{c}{\alpha } \, {\mathbb {E}}_p\left( \mathbbm {1}_{\{\delta <\frac{q}{p} \le 1\}} \left( \left( \frac{q}{p\delta }\right) ^\frac{\alpha }{\alpha -1}-1\right) \right) \le 1+ \frac{c}{\alpha \delta ^{\frac{\alpha }{\alpha -1}}}, \end{aligned}$$

from which we deduce

$$\begin{aligned} {\mathbb {E}}_q \left( \frac{q}{p}\right) ^\frac{1}{\alpha -1}\le & {} \frac{1+\frac{c}{\Phi _{\delta }(\alpha )}}{1-\frac{c}{\Phi _{\delta }(\alpha )}} + {\mathbb {E}}_q\left( \mathbbm {1}_{\{\frac{q}{p} \le 1\}} \left( \frac{q}{p}\right) ^\frac{1}{\alpha -1}\right) \\\le & {} \frac{1+\frac{c}{\Phi _{\delta }(\alpha )}}{1-\frac{c}{\Phi _{\delta }(\alpha )}} +1=\frac{2}{1-\frac{c}{\Phi _{\delta }(\alpha )}}. \end{aligned}$$

\(\square \)

We immediately deduce the following result.

Corollary 4.5

Let \(p, q=e^{u-K_p(u)} p \in {\mathcal {P}}\). Assume (16) holds true for \(\delta =\left( {\mathbb {E}}_p e^u\right) ^{-1} \in (0,1)\). We have:

  1.

    There is \(\alpha >\max \{1,c\}\) which satisfies (17), and it holds \(K_p(u)<\frac{\alpha -1}{\alpha }\ln \frac{\alpha }{c}\).

  2.

    Any \(\alpha >\max \{1,c\}\) such that \(K_p(u)<\frac{\alpha -1}{\alpha }\ln \frac{\alpha }{c}\) satisfies (17).

The proof follows from the inequality \(\alpha > \Phi ^{-1}_{\delta }(c)\) used in the proof of Proposition 4.4, with \(\Phi _{\delta }(x)=x \delta ^{\frac{x}{x-1}}\) and \(\delta =\left( {\mathbb {E}}_p e^u\right) ^{-1}\).
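
For the reader's convenience, the equivalence behind both items can be spelled out; this is a short worked step that uses only the definitions above. With \(\delta =\left( {\mathbb {E}}_p e^u\right) ^{-1}=e^{-K_p(u)}\), and since \(\Phi _{\delta }\) is strictly increasing on \((1,+\infty )\), for every \(\alpha >1\) we have

$$\begin{aligned} \alpha>\Phi ^{-1}_{\delta }(c) \iff \Phi _{\delta }(\alpha )=\alpha \, e^{-\frac{\alpha }{\alpha -1}K_p(u)}>c \iff K_p(u)<\frac{\alpha -1}{\alpha }\ln \frac{\alpha }{c}. \end{aligned}$$

Since \(\delta <1\) means \(K_p(u)>0\), the last inequality can hold only if \(\alpha >c\), which explains the requirement \(\alpha >\max \{1,c\}\).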

Remark 4.6

Note that a sufficient condition for the validity of (16) is

$$\begin{aligned} {\mathbb {E}}_p\left( \mathbbm {1}_{\{\frac{q}{p} \ge \delta \lambda \}} \right) \ge \frac{1}{c \lambda }, \quad \forall \, \lambda > 1. \end{aligned}$$

Letting \(\delta =\left( {\mathbb {E}}_p e^u\right) ^{-1}\) as in the corollary above, the inequality rewrites as \({\mathbb {E}}_p\left( \mathbbm {1}_{\{ e^u \ge \lambda \}} \right) \ge \frac{1}{c \lambda }\), \(\forall \, \lambda > 1\).

4.1 Robust Transformation of Norms

In the following proposition, we prove some robust transformations of norms when q varies in a subset of \({\mathcal {E}}(p)\).

Proposition 4.7

Given \(0<r<1\), consider the ball \({\mathcal {B}}_{r}(p)= \left\{ u \in L_0^{\Phi _1}(p) \,: \, \Vert u\Vert _{\star ,p}\le r \right\} \) and define

$$\begin{aligned} {\mathcal {E}}_r(p)=\left\{ q=e^{u-K_p(u)}p \ : \ u\in {\mathcal {B}}_{r}(p) \right\} \subseteq {\mathcal {E}}(p). \end{aligned}$$
(19)

Then,

  1.

    for any \(v\in L^{\Phi _1}(p)\) and \(\alpha >\frac{1}{1-r}\)

    $$\begin{aligned} \sup _{q\in {\mathcal {E}}_r(p)}\Vert v\Vert _{\star , q}\le C_{\alpha , r } \Vert v\Vert _{\star ,p}, \end{aligned}$$
    (20)

    where

    $$\begin{aligned} C_{\alpha , r }= \alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{\alpha -1}{\alpha (1-r) -1} \right) ^\frac{\alpha -1}{\alpha }; \end{aligned}$$
    (21)
  2.

    for any \(v\in L^{\Phi _1}(p)\) and \(\alpha > 1+r\)

    $$\begin{aligned} c_{\alpha , r }^{-1} \Vert v\Vert _{\star ,p} \le \inf _{q\in \mathcal E_r(p)}\Vert v\Vert _{\star , q}, \end{aligned}$$
    (22)

    where

    $$\begin{aligned} c_{\alpha , r }= \alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{1}{1-r}\right) ^{\frac{1}{\alpha }} \left( \frac{\alpha -1}{\alpha -1- r }\right) ^\frac{\alpha -1}{\alpha }; \end{aligned}$$
    (23)
  3.

    for any \(v\in L^{\Phi _1}(p)\)

    $$\begin{aligned} c_r^{-1} \Vert v\Vert _{\star ,p} \le \inf _{q\in \mathcal E_r(p)}\Vert v\Vert _{\star , q} \le \sup _{q\in \mathcal E_r(p)}\Vert v\Vert _{\star , q}\le C_r \Vert v\Vert _{\star ,p}, \end{aligned}$$
    (24)

    where

    $$\begin{aligned} c_r= & {} \inf _{\alpha> 1+r} \, \alpha \, 2^{\frac{\alpha +1}{\alpha }}\left( \frac{1}{1-r}\right) ^{\frac{1}{\alpha }} \left( \frac{\alpha -1}{\alpha -1- r }\right) ^\frac{\alpha -1}{\alpha }, \\ C_r= & {} \inf _{\alpha > \frac{1}{1-r}}\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{\alpha -1}{\alpha (1-r) -1}\right) ^\frac{\alpha -1}{\alpha }. \end{aligned}$$

Proof

We first observe that \({\mathcal {E}}_r(p) \subseteq {\mathcal {E}}(p)\) since \({\mathcal {B}}_{r}(p)\subseteq \overset{\circ }{\text {dom}\, K_p}\). In fact, if we take \( u \in L_0^{\Phi _1}(p) \) such that \(\Vert u\Vert _{\star ,p}\le r <1\), by Lemma 2.7 we get \(e^{|u|} \in L^{1+\epsilon }(p)\) for any \(0<\epsilon <1-r\). The inclusion then follows since, from the equivalence between 1 and 6 of the Portmanteau theorem, \(u \in \overset{\circ }{\text {dom}\, K_p}\) if and only if \( u \in L_0^{\Phi _1}(p) \) and \(e^u \in L^{1+\epsilon }(p)\) for some \(\epsilon >0\).

Since \(K_p(u)\ge 0\), we get

$$\begin{aligned} {\mathbb {E}}_q \left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}}={\mathbb {E}}_p \left( \frac{q}{p}\right) ^{\frac{\alpha }{\alpha -1}}\le {\mathbb {E}}_p \, e^{\frac{\alpha }{\alpha -1}u}, \end{aligned}$$

and, by Lemma 2.7, for \(\alpha >1\) such that \(\frac{\alpha }{\alpha -1} \Vert u\Vert _{\star ,p} <1\) we deduce

$$\begin{aligned} C_{\alpha }=\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( {\mathbb {E}}_q \left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha }\le & {} \alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( {\mathbb {E}}_p \, e^{\frac{\alpha }{\alpha -1}u}\right) ^\frac{\alpha -1}{\alpha } \\\le & {} \alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{1}{1- \frac{\alpha }{\alpha -1} \Vert u\Vert _{\star ,p}}\right) ^\frac{\alpha -1}{\alpha } \nonumber \end{aligned}$$
(25)

For \(u\in {\mathcal {B}}_{r}(p)\) and \(\alpha >\frac{1}{1-r}\), we then obtain

$$\begin{aligned} C_{\alpha }\le C_{\alpha ,r}=\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{1}{1- \frac{\alpha }{\alpha -1}\, r}\right) ^\frac{\alpha -1}{\alpha }. \end{aligned}$$

From the right-hand inequality in (9), we finally deduce (20) and, taking the infimum over \(\alpha >\frac{1}{1-r}\), we get the right-hand inequality in (24).

In order to prove (22), we start from the left-hand inequality in (9). Since

$$\begin{aligned} {\mathbb {E}}_p \left( \frac{p}{q}\right) ^{\frac{1}{\alpha -1}}= {\mathbb {E}}_p \, e^{\frac{1}{\alpha -1}\left( -u+K_p(u)\right) }= \left( {\mathbb {E}}_p \,e^u\right) ^{\frac{1}{\alpha -1}} {\mathbb {E}}_p \,e^{\frac{-u}{\alpha -1}}, \end{aligned}$$

again by Lemma 2.7, for \(\alpha >1\) such that \(\frac{1}{\alpha -1} \Vert u\Vert _{\star ,p} <1\) we deduce

$$\begin{aligned} c_{\alpha }=\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( {\mathbb {E}}_p \left( \frac{p}{q}\right) ^{\frac{1}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha }= & {} \alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( {\mathbb {E}}_p \,e^u\right) ^{\frac{1}{\alpha }} \left( {\mathbb {E}}_p \,e^{\frac{-u}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha }\\\le & {} \alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{1}{1- \Vert u\Vert _{\star ,p}}\right) ^\frac{1}{\alpha } \left( \frac{1}{1- \frac{1}{\alpha -1} \Vert u\Vert _{\star ,p}}\right) ^\frac{\alpha -1}{\alpha }. \nonumber \end{aligned}$$
(26)

For \(u\in {\mathcal {B}}_{r}(p)\) and \(\alpha >1+r\), we then obtain

$$\begin{aligned} c_{\alpha }\le c_{\alpha ,r}=\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{1}{1- r}\right) ^\frac{1}{\alpha } \left( \frac{1}{1- \frac{1}{\alpha -1} r}\right) ^\frac{\alpha -1}{\alpha }. \end{aligned}$$

From the left-hand inequality in (9), we finally deduce (22) and, taking the infimum over \(\alpha >1+r\), we get the left-hand inequality in (24). \(\square \)
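
The constants (21) and (23) are elementary functions of \(\alpha \) and r, so they can be evaluated directly. The Python sketch below does this and approximates \(C_r\) and \(c_r\) by a grid search over \(\alpha \). The function names, the grid parameters `span` and `steps`, and the sample value r = 0.5 are illustrative assumptions. Note that a grid search can only overestimate an infimum; the value it returns is nevertheless a valid constant in (20) or (22), because those bounds hold for every admissible \(\alpha \).

```python
def upper_constant(alpha, r):
    """C_{alpha,r} as in (21); requires 0 < r < 1 and alpha > 1/(1-r)."""
    if not (0.0 < r < 1.0 and alpha * (1.0 - r) > 1.0):
        raise ValueError("need 0 < r < 1 and alpha > 1/(1-r)")
    ratio = (alpha - 1.0) / (alpha * (1.0 - r) - 1.0)
    return alpha * 2.0 ** ((alpha + 1.0) / alpha) * ratio ** ((alpha - 1.0) / alpha)

def lower_constant(alpha, r):
    """c_{alpha,r} as in (23); requires 0 < r < 1 and alpha > 1 + r."""
    if not (0.0 < r < 1.0 and alpha > 1.0 + r):
        raise ValueError("need 0 < r < 1 and alpha > 1 + r")
    ratio = (alpha - 1.0) / (alpha - 1.0 - r)
    return (alpha * 2.0 ** ((alpha + 1.0) / alpha)
            * (1.0 / (1.0 - r)) ** (1.0 / alpha)
            * ratio ** ((alpha - 1.0) / alpha))

def grid_infimum(constant, lower_limit, r, span=50.0, steps=20000):
    """Smallest value of constant(alpha, r) on a uniform grid in (lower_limit, lower_limit + span]."""
    best_alpha, best_value = None, float("inf")
    for i in range(1, steps + 1):
        alpha = lower_limit + span * i / steps
        value = constant(alpha, r)
        if value < best_value:
            best_alpha, best_value = alpha, value
    return best_alpha, best_value

r = 0.5  # hypothetical radius of the ball B_r(p)
alpha_C, C_r = grid_infimum(upper_constant, 1.0 / (1.0 - r), r)
alpha_c, c_r = grid_infimum(lower_constant, 1.0 + r, r)
print("approximate C_r:", C_r, "at alpha =", alpha_C)
print("approximate c_r:", c_r, "at alpha =", alpha_c)
```

Both constants blow up as \(\alpha \) approaches its lower limit and grow linearly for large \(\alpha \), so the minimizer lies in a bounded range and a finite grid is a reasonable first approximation.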

Remark 4.8

An equivalent condition to sub-exponentiality that we can add to Proposition 2.3 in the particular case of centered random variables \(u\in L^{\Phi _1}_0(p)\) is the following (cf. [26]):

5.

There exist \(a,b>0\) such that \({\mathbb E}_p(e^{\lambda |u|})\le 2e^{a\lambda ^2}, \forall \, 0<\lambda \le \frac{1}{b}\).

The constants in the bound can be expressed in terms of the norm \(\Vert \cdot \Vert _{\star ,p}\) as follows (cf. [25]):

$$\begin{aligned} {{\mathbb {E}}}_p(e^{\lambda |u|})\le 2e^{\frac{\beta }{\beta -1}\Vert u\Vert _{\star ,p}^2\lambda ^2}, \quad \forall 0<\lambda \le \frac{1}{\beta \Vert u\Vert _{\star ,p}}, \end{aligned}$$
(27)

where \(\beta \) is any number strictly greater than 1. By (25) and (26), we can then obtain robust bounds for \(C_\alpha \) and \(c_\alpha \) in exponential form.

Example 4.9

(The maximal exponential model with Gaussian weight; cf. [15, 25]) Let us consider the real Borel space \(({\mathbb {R}}, \mathcal {B})\) endowed with the probability measure \({{\mathbb {P}}}\) associated with the standard Gaussian density

$$\begin{aligned} d{{\mathbb {P}}}(x)= \frac{1}{\sqrt{2\pi }} e^{-\frac{1}{2}x^2} \ dx \end{aligned}$$

and the Gaussian exponential Orlicz space \(L^{\Phi _1}=L^{\Phi _1}(p)\), with \(p=1\).

The Orlicz space \(L^{\Phi _1}\) trivially contains all second-order polynomials of the form \(P(x)=a_2 x^2+a_1 x+a_0\).

For a centered polynomial \(P(x)\in L_0^{\Phi _1}\), where \(a_2+a_0=0\), we get

$$\begin{aligned} \Vert P(x) \Vert _{\star }\le |a_2| \Vert x^2 \Vert _{\star }+ |a_1| \Vert x \Vert _{\star } + |a_2|= 3|a_2| + \sqrt{\frac{2}{\pi }}\, |a_1|. \end{aligned}$$
(28)

In fact, for \(k=1,2, \dots \)

$$\begin{aligned} {\mathbb {E}}(x^{2k})=\frac{1}{\sqrt{2\pi }} \int _{-\infty }^ \infty x^{2k} e^{-\frac{1}{2}x^2} \ dx=1\cdot 3 \cdot 5 \cdots (2k-1), \end{aligned}$$

and we can write

$$\begin{aligned} \frac{{\mathbb {E}}(x^{2k})}{k!}=1 \cdot (2-\frac{1}{2})\cdot (2- \frac{1}{3}) \cdots (2-\frac{1}{k}). \end{aligned}$$

The sequence \(\left( \frac{{\mathbb {E}}(x^{2k})}{k!}\right) ^{1/k}\) is increasing, and it holds

$$\begin{aligned} \Vert x^2 \Vert _{\star }=\sup \limits _{k\ge 1}\left( {\mathbb {E}}(x^{2k})/k!\right) ^{1/k}=2. \end{aligned}$$

In addition, for \(k=1,2, \dots \)

$$\begin{aligned} {\mathbb {E}}(|x|^{k})=\frac{2}{\sqrt{2\pi }} \int _{0}^ \infty x^{k} e^{-\frac{1}{2}x^2} \ dx= {\left\{ \begin{array}{ll} \sqrt{\frac{2}{\pi }} &{} \text{ if } k=1 \\ \sqrt{\frac{2}{\pi }} (k-1)(k-3) \cdots 2 \cdot 1 &{} \text{ if } k\ge 3, k \text{ odd } \\ (k-1)(k-3) \cdots 3 \cdot 1 &{} \text{ if } k\ge 2, k \text{ even } \end{array}\right. } \end{aligned}$$

which implies

$$\begin{aligned} \Vert x \Vert _{\star }=\sup \limits _{k\ge 1}\left( {\mathbb {E}}(|x|^{k})/k!\right) ^{1/k}=\sqrt{\frac{2}{\pi }}. \end{aligned}$$

By (28), the ball \({\mathcal {B}}_{r}= \left\{ u \in L_0^{\Phi _1} \,: \, \Vert u\Vert _{\star }\le r \right\} \) contains all polynomials \(P(x)=a_2 x^2+a_1 x-a_2\) such that \(3|a_2| +\sqrt{\frac{2}{\pi }} \, |a_1| \le r\).
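
The two norms computed above can be checked numerically. The Python sketch below relies on the standard closed form \({\mathbb {E}}|x|^{s}=2^{s/2}\,\Gamma \left( \frac{s+1}{2}\right) /\sqrt{\pi }\) for a standard Gaussian x and works in log scale to avoid overflow. The function names and the truncation levels are illustrative assumptions; a truncated supremum can of course only suggest, not prove, the limiting value 2.

```python
import math

def log_abs_moment(s):
    """log E|x|^s for a standard Gaussian x, from E|x|^s = 2^(s/2) Gamma((s+1)/2) / sqrt(pi)."""
    return 0.5 * s * math.log(2.0) + math.lgamma(0.5 * (s + 1.0)) - 0.5 * math.log(math.pi)

def star_terms(power, kmax):
    """Terms (E|x^power|^k / k!)^(1/k) for k = 1, ..., kmax, computed in log scale."""
    return [
        math.exp((log_abs_moment(power * k) - math.lgamma(k + 1.0)) / k)
        for k in range(1, kmax + 1)
    ]

terms_x = star_terms(1, 200)     # terms defining the norm of x
terms_x2 = star_terms(2, 2000)   # terms defining the norm of x^2

# Largest term for x, to be compared with sqrt(2/pi).
print(max(terms_x), math.sqrt(2.0 / math.pi))
# First and last computed terms for x^2: the sequence increases towards 2.
print(terms_x2[0], terms_x2[-1])
```

The supremum for x is attained at k = 1, whereas for \(x^2\) it is approached only as \(k\rightarrow \infty \), which is why the second list needs a much larger truncation level.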

More generally, \(L_0^{\Phi _1}\) also includes the functions \(u\in C^2({\mathbb {R}};{\mathbb {R}})\) having \({\mathbb {E}}(u)=0\) and bounded second derivative. In fact, by Taylor's formula, for a suitable \(\xi =\xi (x)\) between 0 and x, these functions can be written in the form

$$\begin{aligned} u(x)=u(0)+ u^{'}(0) x + \frac{1}{2} u^{''}(\xi )x^2, \end{aligned}$$
(29)

with

$$\begin{aligned} u(0)= -\dfrac{1}{2} {\mathbb {E}}\left( u^{''}(\xi )x^2\right) , \quad |u^{''}(x)| \le c, \, \forall x\in {\mathbb {R}}, \end{aligned}$$
(30)

from which we deduce that

$$\begin{aligned} e^{P_1(x)}\le e^{u(x)} \le e^{P_2(x)}, \quad \forall x\in {\mathbb {R}}, \end{aligned}$$
(31)

for suitable second-order polynomials \(P_1\) and \(P_2\).

Since

$$\begin{aligned} \Vert u \Vert _{\star }\le |u(0)|+ |u^{'}(0)| \Vert x \Vert _{\star } + \frac{1}{2} \Vert u^{''}(\xi )x^2\Vert _{\star }, \end{aligned}$$

where \(|u(0)|\le \dfrac{1}{2} c \), \(\Vert x \Vert _{\star }=\sqrt{\frac{2}{\pi }}\) and \(\Vert u^{''}(\xi )x^2\Vert _{\star }\le c \Vert x^2\Vert _{\star }=2c\), we deduce

$$\begin{aligned} \Vert u \Vert _{\star }\le \frac{3}{2} c + \sqrt{\frac{2}{\pi }} \, |u^{'}(0)|. \end{aligned}$$

This means that \({\mathcal {B}}_{r}\) also contains the class of functions u defined by (29), (30) and such that \(\dfrac{3}{2} c + \sqrt{\frac{2}{\pi }} \, |u^{'}(0)|\le r\), and therefore, our results on the equivalence of norms are robust with respect to all densities \(q\propto e^u\) obtained by varying u in this class of functions.

4.2 Robust Concentration Inequalities

In the previous sections, we have seen that densities connected by an open exponential arc produce the same Orlicz space with equivalent norms and we have explicitly stated the related equivalence constants. In addition, a robust equivalence of norms has been obtained when q varies in the subset \({\mathcal {E}}_r(p)\) of the maximal exponential model. An interesting application that we present in this section is to derive robust concentration inequalities of Bernstein type.

We first recall some concentration inequalities of Bernstein type which hold true for centered random variables belonging to the exponential Orlicz space (see, e.g., [25, 26]).

Proposition 4.10

Let \(v\in L^{\Phi _1}_0\). Then, \(\forall t>0\) and for any \(\beta >1\),

$$\begin{aligned}{} & {} {\mathbb {P}}(|v|\ge t)\le 2\exp \left( -\min \left\{ \frac{t}{2\beta \Vert v\Vert _{\star }} , \ \frac{t^2}{4\frac{\beta }{\beta -1} \Vert v\Vert _{\star }^2} \right\} \right) , \end{aligned}$$
(32)
$$\begin{aligned}{} & {} {\mathbb {P}}(|v|\ge t)\le 2\exp \left( -\frac{t^2}{ 2\Vert v\Vert _{\star }(t+2\Vert v\Vert _{\star }) }\right) . \end{aligned}$$
(33)

Since sub-exponentiality is preserved by linear transformations, we can obtain more general concentration inequalities for sums of independent random variables (see [26]).
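
To give a feeling for how (32) and (33) behave, the Python sketch below evaluates both right-hand sides for v = x under the standard Gaussian law, using the value \(\Vert x \Vert _{\star }=\sqrt{2/\pi }\) from Example 4.9, and compares them with the exact tail \({\mathbb {P}}(|x|\ge t)\), which equals `erfc(t / sqrt(2))`. The function names, the choice \(\beta =2\) and the sample values of t are assumptions made for illustration only.

```python
import math

def bernstein_min_bound(t, norm, beta=2.0):
    """Right-hand side of (32) for a centered variable with star-norm `norm`."""
    linear = t / (2.0 * beta * norm)
    quadratic = t ** 2 / (4.0 * beta / (beta - 1.0) * norm ** 2)
    return 2.0 * math.exp(-min(linear, quadratic))

def bernstein_ratio_bound(t, norm):
    """Right-hand side of (33)."""
    return 2.0 * math.exp(-t ** 2 / (2.0 * norm * (t + 2.0 * norm)))

def gaussian_tail(t):
    """Exact P(|x| >= t) for a standard Gaussian x."""
    return math.erfc(t / math.sqrt(2.0))

norm_x = math.sqrt(2.0 / math.pi)  # star-norm of x, as computed in Example 4.9

for t in (0.5, 1.0, 2.0, 4.0):
    print(t, gaussian_tail(t), bernstein_min_bound(t, norm_x), bernstein_ratio_bound(t, norm_x))
```

For a Gaussian variable the bounds are far from sharp, as expected: they depend on the law only through the norm \(\Vert \cdot \Vert _{\star }\) and are designed for the whole sub-exponential class.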

Due to the form of the concentration bounds, when q varies in \({\mathcal {E}}_r(p)\) defined by (19), the robust version of the concentration inequalities can be immediately stated as in the following corollary.

Corollary 4.11

For a fixed \(p\in {\mathcal {P}}\), let \(v\in L^{\Phi _1}(p)\). Then, \(\forall t>0\) and for any \(\beta >1\)

$$\begin{aligned}{} & {} \sup _{q\in \mathcal {E}_r(p) } {\mathbb {E}}_q \left( \mathbbm {1}_{\{|v-{\mathbb {E}}_q(v)|\ge t\}} \right) \le 2\exp \left( -\min \left\{ \frac{t}{4\beta C_r \Vert v \Vert _{\star , p}} , \ \frac{t^2}{16 \frac{\beta }{\beta -1} C_r^2 \Vert v \Vert _{\star , p}^2} \right\} \right) , \end{aligned}$$
(34)
$$\begin{aligned}{} & {} \sup _{q\in \mathcal {E}_r(p) } {\mathbb {E}}_q \left( \mathbbm {1}_{\{|v-{\mathbb {E}}_q(v)|\ge t\}} \right) \le 2\exp \left( -\frac{t^2}{ 4 C_r \Vert v \Vert _{\star , p}\left( t+4 C_r \Vert v \Vert _{\star , p}\right) }\right) . \end{aligned}$$
(35)

Proof

The proof follows the lines of some concentration inequalities stated in [25], so we only sketch it. For \(t>0\) and for any \(\beta >1\), we can write

$$\begin{aligned}{} & {} \sup _{q\in \mathcal {E}_r(p) } {\mathbb {E}}_q \left( \mathbbm {1}_{\{|v-{\mathbb {E}}_q(v)|\ge t\}} \right) \le 2\exp \left( -\min \left\{ \frac{t}{2\beta {{\widetilde{N}}}_{r}(v)} , \ \frac{t^2}{4\frac{\beta }{\beta -1} {{\widetilde{N}}}^2_{r}(v)} \right\} \right) , \end{aligned}$$
(36)
$$\begin{aligned}{} & {} \sup _{q\in \mathcal {E}_r(p) } {\mathbb {E}}_q \left( \mathbbm {1}_{\{|v-{\mathbb {E}}_q(v)|\ge t\}} \right) \le 2\exp \left( -\frac{t^2}{ 2{{\widetilde{N}}}_{r}(v)\left( t+2{\widetilde{N}}_{r}(v)\right) }\right) , \end{aligned}$$
(37)

where

$$\begin{aligned} {{\widetilde{N}}}_{{r}}(v)= \sup _{q\in {\mathcal {E}}_r(p)}\Vert v-{\mathbb {E}}_q(v) \Vert _{\star , q}<\infty . \end{aligned}$$
(38)

Finally, by Proposition 4.7(3) we get

$$\begin{aligned} {{\widetilde{N}}}_{{r}}(v)\le \sup _{q\in {\mathcal {E}}_r(p)}\Vert v \Vert _{\star , q} + \sup _{q\in {\mathcal {E}}_r(p)}{\mathbb {E}}_q(|v|) \le 2 C_r \Vert v \Vert _{\star , p}. \end{aligned}$$

\(\square \)
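
The last step of the proof is a plain substitution: the Bernstein bounds (32) and (33) are nondecreasing in the norm, so the majorant \(2 C_r \Vert v \Vert _{\star , p}\) of \({{\widetilde{N}}}_{r}(v)\) can be plugged into them. The Python sketch below checks that this substitution reproduces the explicit expressions in (34) and (35), reading the multiplier of the linear term in (34) as \(\beta \), consistently with (32). The function names and the numerical inputs (in particular the value used for \(C_r\)) are hypothetical.

```python
import math

def bernstein_bounds(t, norm, beta=2.0):
    """Right-hand sides of (32) and (33) for a centered variable with star-norm `norm`."""
    linear = t / (2.0 * beta * norm)
    quadratic = t ** 2 / (4.0 * beta / (beta - 1.0) * norm ** 2)
    bound_min = 2.0 * math.exp(-min(linear, quadratic))
    bound_ratio = 2.0 * math.exp(-t ** 2 / (2.0 * norm * (t + 2.0 * norm)))
    return bound_min, bound_ratio

def robust_bounds(t, norm_p, C_r, beta=2.0):
    """Uniform bounds over E_r(p): substitute 2 * C_r * ||v||_{star,p} for the norm."""
    return bernstein_bounds(t, 2.0 * C_r * norm_p, beta)

def robust_bounds_explicit(t, norm_p, C_r, beta=2.0):
    """The same bounds written out as in (34) and (35)."""
    scale = C_r * norm_p
    linear = t / (4.0 * beta * scale)
    quadratic = t ** 2 / (16.0 * beta / (beta - 1.0) * scale ** 2)
    bound_34 = 2.0 * math.exp(-min(linear, quadratic))
    bound_35 = 2.0 * math.exp(-t ** 2 / (4.0 * scale * (t + 4.0 * scale)))
    return bound_34, bound_35

# Hypothetical inputs: a norm under p and some admissible constant C_r.
t, norm_p, C_r = 50.0, 0.8, 19.0
print(robust_bounds(t, norm_p, C_r))
print(robust_bounds_explicit(t, norm_p, C_r))
```

Because the bounds are monotone in the constant, any admissible \(C_{\alpha ,r}\) from (21) may be used in place of \(C_r\), at the price of a weaker estimate.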

5 Conclusion

The theoretical results of this paper first highlighted the link existing between the exponential Orlicz space and the space of BMO functions. As a consequence, it was possible to understand that the property \((P_\alpha )\) characterizing the maximal exponential model is a sort of static Muckenhoupt property which holds simultaneously with respect to two densities connected by an open exponential arc. This could help to understand how to introduce time dependence in the study of maximal exponential models, which is still an open issue. The explicit constants we obtained when changing the law of Orlicz spaces centered at connected densities are one of the main contributions of the paper. As an application, for random variables belonging to the exponential Orlicz space we obtained concentration inequalities of Bernstein type robust with respect to densities in an exponential subfamily. Their extension to sums of sub-exponential random variables can be applied to derive uniform exponential bounds for the law of large numbers.