Abstract
Improvements in the study of nonparametric maximal exponential models built on Orlicz spaces are proposed. By exploiting the notion of sub-exponential random variable, we give theoretical results which provide a clearer insight into the structure of these models. The explicit constants we obtain when changing the law of Orlicz spaces centered at connected densities allow us to derive uniform bounds with respect to a reference density.
1 Introduction
The theory of nonparametric maximal exponential models centered at a given positive density p started with the seminal work by Pistone and Sempi [18] and is a generalization of the statistical theory of exponential families. An infinite-dimensional or nonparametric exponential family is typically defined by the form \(\exp (u-K_p(u)) p\), where the random variable u varies in an appropriate function space, p is a probability density with respect to a base probability measure \({\mathbb {P}}\), and \(K_p(u)\) is the logarithm of the normalization constant, which is also known as the cumulant generating function. In [18], the sufficient statistics u in the exponential family are such that \(\exp (\theta u)\) is \(p \cdot {\mathbb {P}}\)-integrable for all \(\theta \) in a real open interval containing 0; in fact, a further restriction is needed to avoid the border of the set \(\{u: K_p(u)<+\infty \}\). This integrability condition, in turn, defines the exponential Orlicz space \(L^{\Phi }(p)\), a Banach space whose various equivalent characterizations are the starting point from which the results of this paper arise.
The geometric theory of statistical models as manifolds modeled on exponential Orlicz spaces was deeply investigated in [1, 6, 7, 17]. Specifically, two densities p and q belonging to the maximal exponential model are connected by an open exponential arc. (By open, we essentially mean that the two connected densities are not the extremal points of the arc.) All densities connected by an open exponential arc form a Banach manifold \({\mathcal {P}}\), and the space \(L^{\Phi }(p)\), \(p\in \mathcal P\), is an expression of the tangent space.
Subsequent developments and applications of maximal exponential models to Statistics, Information Geometry, Physics and also to Finance have been presented in many works, see, e.g., [8, 12, 14, 16, 20, 21, 22, 23, 24, 25]. In particular, in [20] the authors prove that two densities are connected by an open exponential arc if and only if the corresponding Orlicz spaces are equal. The possibility of switching from one Orlicz space to the other was exploited in [23] to improve some duality results in a problem of exponential utility maximization. Furthermore, since connected densities produce the same Orlicz space with equivalent norms, robust concentration inequalities of Bernstein type were obtained in [25].
An equivalent formulation given in [20] for defining the maximal exponential model requires that the ratio q/p has to satisfy with respect to both densities p and q an integrability condition, denoted here by \((P_\alpha )\), with \(\alpha >1\). From \((P_\alpha )\), the finiteness of both the Kullback–Leibler divergences \(D(q\Vert p)\) and \(D(p\Vert q)\) immediately follows.
Different authors have generalized the original structure of the maximal exponential model by replacing the exponential function with deformed exponentials (see, e.g., [11, 28, 29]) or by modeling the statistical manifold on other function spaces (see, e.g., [4, 13]).
In this paper, new analytical properties of the maximal exponential model in the topology of the exponential Orlicz space are presented. The notion of sub-exponential random variable, which is equivalent to membership in the exponential Orlicz space, leads us to investigate its link with the definition of BMO (bounded mean oscillation) function. In this analysis, we use a suitable norm equivalent to the standard Luxemburg norm, which allows us to obtain a sharp estimate for the distance to \(L^\infty \) of a sub-exponential random variable. This estimate, in turn, allows us to better understand the structure underlying the maximal exponential model, as it identifies the smallest \(\alpha >1\) for which \((P_\alpha )\) holds. A comparison with the BMO theory highlights that condition \((P_\alpha )\), characterizing the maximal exponential model, is a sort of “static” Muckenhoupt condition on \(\log (q/p)\) with respect to both densities p and q. This could be a hint on how to introduce time dependence in the context of maximal exponential models, making it possible to change the perspective from static to dynamic.
An interesting issue in statistical applications is the possibility of knowing the constants that the change of law produces on the equivalent norms of Orlicz spaces centered at two connected densities. We provide an answer to this matter and exploit our results to derive uniform bounds.
The paper is organized as follows. In Sect. 2, we investigate the notion of sub-exponential random variable and its link with the distance in the Orlicz topology to \(L^\infty \). In Sect. 3, we use the results of the previous section to highlight the structure underlying the maximal exponential model. In Sect. 4, we study the relationships between Orlicz spaces centered at connected densities, in terms of how their equivalent norms transform. We provide uniform bounds for the norms and some concentration inequalities of Bernstein type that are robust in an exponential subfamily with respect to a reference density.
2 Sub-exponentiality in Orlicz Spaces
In this section, we first describe the relevant Banach spaces: the exponential Orlicz space and its conjugate endowed with the Luxemburg norm (2). We then present the exponential space as a space of sub-exponential random variables with a new norm based on moments, equivalent to the Luxemburg norm, and we investigate its connections to BMO spaces. The second part of the section is partly new and is devoted to the study of a bound on the norm-distance of a generic sub-exponential random variable from \(L^\infty \). In fact, it is well known that the space of bounded random variables is not dense in the exponential Orlicz space, unless the state space is finite, cf. [19]. In particular, the sequence of truncations of a generic element of such a space does not converge in general to the original variable. The bound we obtain characterizes, in terms of the moment generating function, those random variables for which the truncations converge. It also provides a better understanding of the structure of maximal exponential models in the next section.
Let \(( \Omega , {{\mathcal {F}}}, {\mathbb {P}})\) be a fixed probability space. Denote by \(L^k\), \(k\ge 1\), the ordinary Lebesgue spaces and by \(L^0\) the set of all random variables u defined on the probability space.
An exponential Orlicz space is a classical Banach space associated with an exponentially growing Young function. Young functions can be seen as generalizations of the power functions, and consequently, Orlicz spaces are generalizations of the Lebesgue spaces. Specifically,
Definition 2.1
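A Young function \(\Phi \) is an even, convex function \(\Phi : {\mathbb R}\rightarrow [0,+\infty ]\) such that \(\Phi (0)=0\), \(\lim _{x\rightarrow \infty }\Phi (x)=+\infty \), and \(\Phi (x)<+\infty \) in a neighborhood of 0.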
The conjugate function \(\Psi \) of \(\Phi \) is defined as \(\Psi (y)=\underset{x\in {{\mathbb {R}}}}{\sup }\{xy-\Phi (x)\}\), \(\forall y \in {{\mathbb {R}}}\) and is itself a Young function.
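The Orlicz space \(L^{\Phi }\), associated with the Young function \(\Phi \), is defined as
$$\begin{aligned} L^{\Phi }=\left\{ u\in L^0 \,: \, \exists \, \alpha >0 \text{ s.t. } {\mathbb {E}}\left( \Phi (\alpha u)\right) <+\infty \right\} , \end{aligned}$$
and the corresponding subspace of centered random variables is denoted by \(L^{\Phi }_0\).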
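\(L^{\Phi }\) is a Banach space when endowed with the Luxemburg norm
$$\begin{aligned} \Vert u\Vert _{\Phi }=\inf \left\{ k>0 \,: \, {\mathbb {E}}\left( \Phi \left( \frac{u}{k}\right) \right) \le 1\right\} . \end{aligned}$$ (2)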
We refer to [19] for a general background on the subject.
Here, we focus on the Young function \(\Phi _1(x)=\text {cosh}(x)-1\), whose conjugate function is \(\Psi _1(y)=y\log (y+\sqrt{1+y^2})-\sqrt{1+y^2}+1\). As a consequence, a random variable u belongs to \(L^{\Phi _1}\) if and only if its moment generating function \(M_u(t)= {\mathbb {E}}(e^{tu})\) is finite in a neighborhood of 0. Moreover, the closed unit ball of \(L^{\Phi _1}\) is
$$\begin{aligned} \left\{ u\in L^0 \,: \, {\mathbb {E}}\left( \text {cosh}(u)-1\right) \le 1\right\} . \end{aligned}$$
It is worth recalling that the same Orlicz space can be related to different equivalent Young functions. Specifically, the Young function \(\Phi _1(x)\) is equivalent to the more commonly used \(\Phi _2(x)= e^{|x|}-|x|-1\) and its conjugate function \(\Psi _1(y)\) is equivalent to \(\Psi _2(y)= (1+|y|)\log (1+|y|)-|y|\), whose analytic expression is related to the Kullback–Leibler divergence.
Remark 2.2
We can easily locate the Orlicz spaces \(L^{\Phi _1}\) and \(L^{\Psi _1}\) in the hierarchy of classical \(L^k\) spaces:
$$\begin{aligned} L^\infty \subseteq L^{\Phi _1}\subseteq L^k \subseteq L^{\Psi _1}\subseteq L^1, \quad k>1; \end{aligned}$$
more precisely, each of the corresponding embeddings is continuous. In particular, if \(u\in L^{\Phi _1}\), then the moment of u of any order is finite.
The following proposition gives equivalent conditions for a random variable to belong to \( L^{\Phi _1}\) (cf. [30]).
Proposition 2.3
Let \(u\in L^0\). The following conditions are equivalent:
1. \(u\in L^{\Phi _1}\), i.e., the moment generating function of u is finite in a neighborhood of 0.
2. There is \(\lambda >0\) such that
$$\begin{aligned} {\mathbb {P}}(|u|\ge t)\le \delta e^{-\lambda t}, \ \exists \delta =\delta (\lambda )\ge 1 \text{ and } \forall t>0. \end{aligned}$$ (3)
3. \(\sup \limits _{k\ge 1}\left( {\mathbb {E}}\left( |u|^k\right) /k!\right) ^{1/k}<\infty \).
4. \(\limsup \limits _{k\rightarrow \infty }\left( {\mathbb {E}}\left( |u|^k\right) /k!\right) ^{1/k}<\infty \).
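For \(u\in L^{\Phi _1}\), the mapping
$$\begin{aligned} u\mapsto \Vert u\Vert _{\star }=\sup _{k\ge 1}\left( \frac{{\mathbb {E}}\left( |u|^k\right) }{k!}\right) ^{1/k} \end{aligned}$$ (4)
is a norm on \(L^{\Phi _1}\) equivalent to \(\Vert u\Vert _{\Phi _1}\) and it holds (see [24])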
In particular, it can be easily checked that for constant random variables we have \(\Vert c\Vert _{\star }=|c|\) and for any bounded random variable u it holds \(\Vert u\Vert _{\star }\le \Vert u\Vert _{\infty }\).
In the sequel, we will prove our results with respect to the norm \(\Vert \cdot \Vert _{\star }\), which turns out to be a convenient choice in order to obtain sharp estimates (see Theorem 2.8).
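As an illustration of the moment-based norm, the following minimal Python sketch evaluates the supremum in condition 3 of Proposition 2.3 over finitely many k. The function names, the cutoff `k_max`, and the two test distributions (an exponential variable and a uniform variable on \([-1,1]\)) are assumptions made for this example and do not come from the text. For an exponential variable with rate \(\lambda \) one has \({\mathbb {E}}|u|^k=k!/\lambda ^k\), so every term of the supremum equals \(1/\lambda \); for the uniform variable, \({\mathbb {E}}|u|^k=1/(k+1)\) and the maximum is attained at \(k=1\), in agreement with \(\Vert u\Vert _{\star }\le \Vert u\Vert _{\infty }\).

```python
import math

def star_norm(abs_moment, k_max=60):
    """Approximate sup_{k>=1} (E|u|^k / k!)^(1/k) by the maximum over k = 1..k_max.

    abs_moment(k) is assumed to return the exact absolute moment E|u|^k.
    """
    return max(
        (abs_moment(k) / math.factorial(k)) ** (1.0 / k)
        for k in range(1, k_max + 1)
    )

def exponential_abs_moment(rate):
    """E|u|^k = k! / rate^k for an exponential random variable with the given rate."""
    return lambda k: math.factorial(k) / rate ** k

def uniform_abs_moment(k):
    """E|u|^k = 1 / (k + 1) for a random variable uniform on [-1, 1]."""
    return 1.0 / (k + 1)

if __name__ == "__main__":
    # Every term equals 1/rate, so the truncated supremum is exact up to rounding.
    print(star_norm(exponential_abs_moment(4.0)))
    # Bounded variable: the maximum is attained at k = 1.
    print(star_norm(uniform_abs_moment))
```

The finite cutoff only gives a lower bound on the supremum in general; it is exact here because the location of the maximum is known in both cases.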
Remark 2.4
A random variable u satisfying condition 2 in the previous proposition is also called sub-exponential. If u satisfies the inequality \({\mathbb {P}}(|u|\ge t)\le \delta e^{-\lambda t^2}, \ \forall t>0\), it is called sub-Gaussian. It is immediate to see that a sub-Gaussian random variable u is also sub-exponential and that u is sub-Gaussian if and only if \(u^2\) is sub-exponential. More generally, \(u^n\) is sub-exponential, i.e., \(u^n\in L^{\Phi _1}\), if and only if \({\mathbb {P}}(|u|\ge t)\le \delta e^{-\lambda t^n}, \ \forall t>0\).
Remark 2.5
Condition 2 of Proposition 2.3 can be rewritten in the form:
2’. There is \(\lambda >0\) such that \({\mathbb {P}}(|u|\ge t)\le e^{-\lambda t}, \ \forall t>t_0=t_0(\lambda )\ge 0\),

which turns out to be a probabilistic version of the John–Nirenberg inequality for \(\text {BMO}\)-functions in classical analysis (see [9]). Specifically, we recall that a real-valued locally integrable function f on \({\mathbb {R}}^d\) has bounded mean oscillation, that is, \(f\in \text {BMO}\), if
$$\begin{aligned} \sup _{I}\frac{1}{|I|}\int _I |f(x)-f_I|\, dx<+\infty , \end{aligned}$$
where I denotes any cube in \({\mathbb {R}}^d\), |I| its volume and \(f_I\) the mean of f over I. John and Nirenberg proved in [9] that \(f\in \text {BMO}\) if and only if there is \(\lambda >0\) such that
The probabilistic version of (6) for continuous uniformly integrable martingales \(M=(M_s)_{0\le s\le \infty }\) with \(M_0=0\) is
where \(\tau \) denotes any stopping time for the filtration \(\mathcal F\). The probabilistic analogue of the John–Nirenberg inequality was proved by Emery [3]: \(M\in \text {BMO}\) if and only if there is \(\lambda >0\) such that
for any stopping time \(\tau \).
The following proposition establishes the equality between the largest range for which the moment generating function of |u| is finite and the largest \(\lambda \) for which condition (3) holds.
Proposition 2.6
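For \(u\in L^{\Phi _1}\), define
$$\begin{aligned} a(u)=\sup \left\{ a>0 \,: \, {\mathbb {E}}\left( e^{a|u|}\right) <+\infty \right\} \end{aligned}$$
and
$$\begin{aligned} \lambda (u)=\sup \left\{ \lambda >0 \,: \, (3) \text{ holds}\right\} . \end{aligned}$$
Then, it holds
$$\begin{aligned} a(u)=\lambda (u). \end{aligned}$$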
Proof
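In order to prove \(a(u)\le \lambda (u)\), we show that
$$\begin{aligned} \left\{ a>0 \,: \, {\mathbb {E}}\left( e^{a|u|}\right) <+\infty \right\} \subseteq \left\{ \lambda >0 \,: \, (3) \text{ holds}\right\} . \end{aligned}$$
Let us take \(a>0\) such that \({\mathbb {E}}e^{a |u|}<+\infty \). By Markov inequality, for all \(t>0\) we have
$$\begin{aligned} {\mathbb {P}}(|u|\ge t)={\mathbb {P}}\left( e^{a|u|}\ge e^{at}\right) \le e^{-at}\, {\mathbb {E}}\left( e^{a|u|}\right) . \end{aligned}$$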
We then obtain (3) by taking \(\lambda =a\) and \(\delta ={{\mathbb {E}}}(e^{a |u|})\).
To prove \(a(u)\ge \lambda (u)\), suppose by contradiction that \(a(u)< \lambda (u)\). We can then select \(\lambda _1, \lambda _2>0\) such that \(a(u)<\lambda _1<\lambda _2<\lambda (u)\). Since \(\lambda _2<\lambda (u)\), it holds
for a suitable \(\delta \ge 1\). Using (7), for any \(k=1,2...\) we have
As a consequence, we get
which implies \(\lambda _1\le a(u)\). \(\square \)
The next result is a technical preliminary to our main results.
Lemma 2.7
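Let \(u\in L^{\Phi _1}\). If \(\Vert u\Vert _{\star }<1\), then
$$\begin{aligned} {\mathbb {E}}\left( e^{|u|}\right) \le \frac{1}{1-\Vert u\Vert _{\star }}. \end{aligned}$$
Proof
By the definition (4) of \(\Vert \cdot \Vert _{\star }\), we have
$$\begin{aligned} {\mathbb {E}}\left( e^{|u|}\right) =\sum _{k=0}^{\infty }\frac{{\mathbb {E}}\left( |u|^k\right) }{k!}\le \sum _{k=0}^{\infty }\Vert u\Vert _{\star }^k=\frac{1}{1-\Vert u\Vert _{\star }}. \end{aligned}$$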
\(\square \)
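The following theorem gives an exact relation between \(a(u)(=\lambda (u))\) and \(d_{\star }(u,L^{\infty })\), where
$$\begin{aligned} d_{\star }(u,L^{\infty })=\inf _{\ell \in L^{\infty }}\Vert u-\ell \Vert _{\star }. \end{aligned}$$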
It is in the spirit of the results obtained in classical and stochastic analysis for the distance in \(\text {BMO}\) to \(L^\infty \), where upper and lower bounds for a(u) in terms of \(d_{\text {BMO}}(u, L^\infty )\) are given (see, e.g., [5] and, for continuous martingales, [27]). In our setting, we can prove that a(u) is exactly the reciprocal of the \(d_{\star }\)-distance of \(u\in L^{\Phi _1}\) to \(L^\infty \).
Theorem 2.8
For \(u\in L^{\Phi _1}\), we have
$$\begin{aligned} a(u)=\frac{1}{d_{\star }(u,L^{\infty })}. \end{aligned}$$
Proof
We first prove that \(a(u) \ge \frac{1}{d_{\star }(u,L^{\infty })}\). Let \(0<a<\frac{1}{d_{\star }(u,L^{\infty })}\). Then, \(a \Vert u-\ell \Vert _{\star }<1\) for some \(\ell \in L^{\infty }\). Using Lemma 2.7, we get
which implies \(a\le a(u)\). We now prove that \(a(u) \le \frac{1}{d_{\star }(u,L^{\infty })}\). It suffices to verify that if \(0< a < a(u)\), then \(a\le \frac{1}{d_{\star }(u,L^{\infty })}\). Let us fix \(0<a < a(u)\). For any \(\ell \in L^{\infty }\), it holds
where the last inequality is due to the fact that for any \(a>0\) and \(k\ge 1\) it holds \(\frac{{\mathbb {E}}|v|^k}{k!}\le \frac{1}{ a^k} {\mathbb {E}} e^{a |v|}\) and thus
Now, let us choose the sequence of truncations at n of u, i.e., \(u_n=1_{|u|\le n}u\in L^{\infty }\); then, the sequence \(|u-u_n|=1_{|u|>n} |u|\) decreases to 0. Since by hypothesis \(a < a(u)\), the random variables \(e^{a|u-u_n|}\) are bounded by the integrable random variable \(e^{a|u|}\). Applying the Lebesgue dominated convergence theorem, we deduce that \(\lim _{n\rightarrow +\infty }{\mathbb {E}}e^{a|u-u_n|}=1\) and thus \(d_{\star }(u,L^{\infty })\le \frac{1}{a} \). \(\square \)
Remark 2.9
From the theorem above, we immediately see that the sequence of truncations \(u_n\) converges to the original variable \(u\in L^{\Phi _1}\) if and only if the moment generating function of u is defined on the whole real line (cf. [1], Lemma 2).
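The failure of convergence of truncations can be checked numerically in a simple case. The sketch below is an illustration under assumptions not made in the text: u is taken exponentially distributed with rate \(\lambda \), so that its moment generating function is finite only for \(t<\lambda \), and the helper names and the cutoff `k_max` are hypothetical. Since \({\mathbb {E}}(u^k 1_{u>n})/k!=\lambda ^{-k}Q(k+1,\lambda n)\), where \(Q(k+1,x)=e^{-x}\sum _{j=0}^{k}x^j/j!\) is the regularized upper incomplete gamma function, the terms defining \(\Vert u-u_n\Vert _{\star }\) approach \(1/\lambda \) as k grows, for every truncation level n.

```python
import math

def upper_gamma_ratio(k, x):
    """Q(k+1, x) = exp(-x) * sum_{j=0}^{k} x^j / j! for an integer k >= 0."""
    term, total = 1.0, 1.0
    for j in range(1, k + 1):
        term *= x / j
        total += term
    return math.exp(-x) * total

def truncation_residual_norm(rate, n, k_max=400):
    """Maximum over k = 1..k_max of (E[u^k 1_{u>n}] / k!)^(1/k) for u ~ Exp(rate).

    This is a finite-k lower bound on the moment-based norm of u - u_n,
    where u_n is the truncation of u at level n.
    """
    return max(
        upper_gamma_ratio(k, rate * n) ** (1.0 / k) / rate
        for k in range(1, k_max + 1)
    )

if __name__ == "__main__":
    for n in (5, 50):
        # The residual norm stays near 1/rate however large n is.
        print(n, truncation_residual_norm(1.0, n))
```

In this example the residual norm does not decrease with n, which is consistent with Theorem 2.8: here \(a(u)=\lambda \), so the distance of u from \(L^\infty \) equals \(1/\lambda >0\).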
In the sequel, we investigate the relationships among \(L^{\Phi _1}\) spaces with respect to different probability measures whose densities with respect to \({\mathbb {P}} \) belong to an open exponential model. For this purpose, we need to recall the notion of open exponential model, as well as its geometric meaning and the corresponding main results within the Orlicz framework.
3 Exponential Models
This section discusses a crucial topic in the geometric theory of statistical models. Here, the main new contributions are represented by Items 6 and 8 of Theorem 3.5 and by Proposition 3.8.
Let \({\mathcal {P}}\) denote the set of all densities which are positive \({\mathbb {P}}\)-a.s. For each fixed \(p\in {\mathcal {P}}\), we use \({\mathbb {E}}_p\) to indicate the integral with respect to \(p \cdot {\mathbb {P}}\). Moreover, the corresponding Orlicz space associated with \(\Phi _1\) is denoted by \(L^{\Phi _1}(p)\).
Let us consider the cumulant generating functional \(K_p(u)=\log {\mathbb {E}}_p(e^u)\) defined on the subspace of centered random variables \(L^{\Phi _1}_0(p)\). We recall from [18] that \(K_p\) is a positive convex and lower semicontinuous function, vanishing at zero. In addition, the interior of its proper domain, denoted here by \(\overset{\circ }{\text {dom}\, K_p}\), is a non-empty convex set containing the open unit ball of \(L^{\Phi _1}_{0}(p)\). This allows us to give the following definition.
Definition 3.1
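For every density \(p \in {{\mathcal {P}}}\), the maximal exponential model at p is
$$\begin{aligned} {\mathcal {E}}(p)=\left\{ q=e^{u-K_p(u)}\, p \,: \, u\in \overset{\circ }{\text {dom}\, K_p}\right\} . \end{aligned}$$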
Remark 3.2
\(K_p\) is defined on the set \(L^{\Phi _1}_{0}(p)\) because centered random variables guarantee the uniqueness of the representation of \(q\in {{\mathcal {E}}}(p)\).
One of the main results in [18] states that any density belonging to the maximal exponential model centered at p is connected by an open exponential arc to p and vice versa. By open, we mean that the two densities are not the extremal points of the arc:
Definition 3.3
\(p, q \in {{\mathcal {P}}}\) are connected by an open exponential arc if there exists an open interval \(I \supset [0,1]\) such that \(p(\theta )\propto p^{(1-\theta )}q^{\theta }\in {\mathcal {P}}\), \(\forall \theta \in I\).
The connection by an open exponential arc is an equivalence relation. An equivalent definition is provided by the following proposition (see [1]).
Proposition 3.4
\(p, q \in {{\mathcal {P}}}\) are connected by an open exponential arc iff there exists an open interval \(I \supset [0,1]\) such that \(p(\theta )\propto e^{\theta u}p\in {\mathcal {P}}\), \(\forall \theta \in I\), where \(u\in L^{\Phi _1}(p)\) and \(p(0)=p, \ p(1)=q\).
The following theorem gives different equivalent conditions for a density to belong to the maximal exponential model. The proof of assertions 1–4 can be found in [1], while that of assertions 5–7 can be found in [20, 23]. The equivalence between 4 and the new assertion 8 follows from Theorem 2.8 of the previous section.
Theorem 3.5
(Portmanteau Theorem) Let \(p,q \in {\mathcal {P}}\). The following statements are equivalent.
1. \(q\in {\mathcal {E}}(p)\);
2. q is connected to p by an open exponential arc;
3. \({\mathcal {E}}(p)={\mathcal {E}}(q)\);
4. \(\log (q/p) \in L^{\Phi _1}(p)\cap L^{\Phi _1}(q);\)
5. \(L^{\Phi _1}(p)=L^{\Phi _1}(q)\);
6. There exists \(\alpha >1\) such that
$$\begin{aligned} (P_\alpha ): \qquad q/p\in L^{1/(\alpha -1)}(q) \,\, \text{ and } \,\, p/q \in L^{1/(\alpha -1)}(p) \end{aligned}$$
7. \(^m{\mathbb {U}}^q_p: L^{\Psi _1}(p) \rightarrow L^{\Psi _1}(q)\) s.t. \(^m{\mathbb {U}}^q_p(v)=(p/q)v\) is an isomorphism of Banach spaces.
8. \(d_{\star , p}(\log (q/p), L^\infty )<+\infty \) and \(d_{\star , q}(\log (q/p), L^\infty )<+\infty \).
It is worth noting that the maximal exponential model is a good environment when dealing with divergence between densities. In fact, since \((P_\alpha )\) means \(q/p\in L^{1+\epsilon }(p)\) and \(p/q \in L^{1+\epsilon }(q)\) for some \(\epsilon >0\), it immediately follows that if \(q \in \mathcal {E}(p)\), then Kullback–Leibler divergences \(D(q\Vert p)\) and \(D(p\Vert q)\) are both finite.
The equivalence between the equality of the Orlicz spaces \(L^{\Phi _1}(p)\) and \(L^{\Phi _1}(q)\) and property \((P_\alpha )\) has been exploited in [23] to improve some duality results in the classical problem of exponential utility maximization in incomplete markets. In fact, in many well-known works on relative entropy minimization, the minimal entropy martingale (density) measure \(q^*\) satisfies \((P_\alpha )\), allowing to switch from the reference Orlicz space \(L^{\Phi _1}(p)\) to \(L^{\Phi _1}(q^*)\), and conversely, at convenience.
Remark 3.6
From Hölder inequality, \((P_\alpha )\) implies \((P_r)\) \(\forall r>\alpha \).
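A concrete pair of densities may help to see how \((P_\alpha )\) depends on \(\alpha \). The following sketch is an illustration under assumptions that are not part of the text: p is taken to be the standard Gaussian density and q the centered Gaussian density with standard deviation \(\sigma \), so that \(q/p=\sigma ^{-1}\exp \left( \tfrac{x^2}{2}(1-\sigma ^{-2})\right) \), and the function names are hypothetical. Standard Gaussian integrals give \({\mathbb {E}}_q(q/p)^s=\sigma ^{-s}(1-s(\sigma ^2-1))^{-1/2}\) when \(s(\sigma ^2-1)<1\) and \({\mathbb {E}}_p(p/q)^s=\sigma ^{s}(1+s(1-\sigma ^{-2}))^{-1/2}\) when \(1+s(1-\sigma ^{-2})>0\), with both integrals diverging otherwise. With \(s=1/(\alpha -1)\), condition \((P_\alpha )\) then holds exactly for \(\alpha >\max \{\sigma ^2,\sigma ^{-2}\}\), in line with the monotonicity stated in Remark 3.6.

```python
import math

def moment_q_of_ratio(s, sigma):
    """E_q[(q/p)^s] for p = N(0,1), q = N(0, sigma^2); math.inf if it diverges."""
    c = s * (sigma ** 2 - 1.0)
    if c >= 1.0:
        return math.inf
    return sigma ** (-s) / math.sqrt(1.0 - c)

def moment_p_of_inverse_ratio(s, sigma):
    """E_p[(p/q)^s] for the same pair of densities; math.inf if it diverges."""
    d = 1.0 + s * (1.0 - 1.0 / sigma ** 2)
    if d <= 0.0:
        return math.inf
    return sigma ** s / math.sqrt(d)

def holds_P_alpha(alpha, sigma):
    """Check both integrability requirements of (P_alpha) with s = 1/(alpha - 1)."""
    s = 1.0 / (alpha - 1.0)
    return (math.isfinite(moment_q_of_ratio(s, sigma))
            and math.isfinite(moment_p_of_inverse_ratio(s, sigma)))

if __name__ == "__main__":
    sigma = 1.5  # threshold max(sigma^2, 1/sigma^2) = 2.25
    for alpha in (2.0, 2.2, 2.3, 3.0):
        print(alpha, holds_P_alpha(alpha, sigma))
```

For this Gaussian pair the set of admissible \(\alpha \) is an open half-line, so no smallest admissible value is attained.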
Lemma 3.7
The following equivalence holds:
Proof
It suffices to observe that if \(1<\alpha \le 2\), then
Similarly, if \(1<\alpha \le 2\), then
If \(\alpha > 2\), the function \(x^{\frac{1}{\alpha -1}}\) is concave and, by Jensen inequality, \({\mathbb {E}}_p\left( \frac{p}{q}\right) ^{- \frac{1}{\alpha -1}}= {\mathbb {E}}_p\left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}} \le 1\) and \({\mathbb {E}}_q\left( \frac{q}{p} \right) ^{- \frac{1}{\alpha -1}}= {\mathbb {E}}_q\left( \frac{p}{q} \right) ^{\frac{1}{\alpha -1}}\le 1\). \(\square \)
The following proposition gives a characterization of the smallest \(\alpha >1\) for which \((P_\alpha )\) holds, in terms of the \(d_{\star }\)-distance of \(\log (q/p)\) to \( L^\infty \).
Proposition 3.8
Let \(q\in {\mathcal {E}}(p)\) and define
Then,
Proof
By Lemma 3.7, we get
By Theorem 2.8, we then deduce
\(\square \)
In the next section, we will see that when \(q\in {\mathcal {E}}(p)\), the distances \(d_{\star , p}\) and \(d_{\star , q}\) are equivalent. So, from the proposition above we get that \(\log \frac{q}{p}\) belongs to the closure in \(L^{\Phi _1}(p)=L^{\Phi _1}(q)\) of \(L^\infty \) if and only if \(\alpha _{p,q}=1\), which means that \((P_\alpha )\) holds for all \(\alpha >1\).
Remark 3.9
In the classical setting, it was shown (see [5]) that a locally integrable function f on \({\mathbb {R}}^d\) belongs to the \(\text {BMO}\)-closure of \(L^\infty \) if and only if both \(e^f\) and \(e^{-f}\) satisfy the Muckenhoupt \((A_\alpha )\) condition for all \(\alpha >1\):
where I denotes any cube in \({\mathbb {R}}^d\) and |I| its volume.
In the stochastic setting, a probabilistic analogue of the above result has been obtained for continuous \(\text {BMO}\)-martingales (see [10]); specifically, a \(\text {BMO}\)-martingale \(M=(M_t)_{0\le t\le \infty }\) belongs to the \(\text {BMO}\)-closure of \(L^\infty \) if and only if both \({\mathbb {E}}\left( e^{M_{\infty }} |{\mathcal {F}}_{\cdot }\right) \) and \({\mathbb {E}}\left( e^{-M_{\infty }} |{\mathcal {F}}_{\cdot }\right) \) satisfy the probabilistic version of the Muckenhoupt \((A_\alpha )\) condition, for all \(\alpha >1\):
where \(\tau \) denotes any stopping time for the filtration \(\mathcal F\). Observe that the class \(\text {BMO}\) as well as \((A_\alpha )\) condition depends on the underlying probability. For \(\tau =0\), \((A_\alpha )\) implies
Regarding \(\log \frac{q}{p}\) as playing the role of \(M_\infty \), we immediately see that requiring the joint validity of \((P_2)\) and \((P_\alpha )\) amounts to requiring the validity of \( (A_\alpha ^0)\) with respect to p and q
In particular, taking into account Remark 3.6, for \(1<\alpha \le 2\) it holds
4 Transformation of Norms by a Change of Law
Using essentially new computations with inequalities, this section derives explicit constants in the bounds between norms of equivalent Orlicz spaces. Since these constants depend on the points, they can be viewed as a measure of distance between points in the same maximal exponential model. Two applications of these bounds are proposed in Sects. 4.1 and 4.2. Specifically, concentration inequalities frequently appear in high-dimensional probability and statistics ([26, 30]), but studies on the robustness of these inequalities uniformly in bounded parts of the statistical model are still limited.
We start by recalling the following result proved in Cena and Pistone [1].
Proposition 4.1
If the Orlicz spaces \(L^{\Phi _1}(p)\) and \(L^{\Phi _1}(q)\) are equal as sets, then their Luxemburg norms \(\Vert \cdot \Vert _{\Phi _1, p}\) and \(\Vert \cdot \Vert _{\Phi _1, q}\) are equivalent.
The proof of the equivalence of norms is based on a standard argument for function spaces, namely, proving that the identity map from \(L^{\Phi _1}(p)\) to \(L^{\Phi _1}(q)\) is a homeomorphism. In many statistical applications, however, it is crucial to know the constants that this change of law yields on the norms. Theorem 4.2 provides an answer to this issue. It refers to the norm \(\Vert \cdot \Vert _{\star }\), equivalent to \(\Vert \cdot \Vert _{\Phi _1}\) by (5), since it is more suitable for applying the property \((P_\alpha )\) of the Portmanteau theorem.
Theorem 4.2
Let \(v\in L^{\Phi _1}(p)=L^{\Phi _1}(q)\). Then,
1. there exists \(\alpha >1 \) such that
$$\begin{aligned} c_{\alpha }^{-1} \Vert v\Vert _{\star ,p} \le \Vert v\Vert _{\star ,q}\le C_{\alpha } \Vert v\Vert _{\star ,p}, \end{aligned}$$ (9)
where
$$\begin{aligned} c_{\alpha }=\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( {\mathbb {E}}_p \left( \frac{p}{q}\right) ^{\frac{1}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha }, \quad C_{\alpha }=\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( {\mathbb {E}}_q \left( \frac{q}{p}\right) ^{\frac{1}{\alpha -1}}\right) ^\frac{\alpha -1}{\alpha } \end{aligned}$$ (10)
are constants independent of v;
2. it holds
$$\begin{aligned} c^{-1} \, \Vert v\Vert _{\star ,p} \le \Vert v\Vert _{\star ,q}\le C \, \Vert v\Vert _{\star ,p}, \end{aligned}$$ (11)
where
$$\begin{aligned} c=\inf _{\alpha>1}c_{\alpha }, \quad C=\inf _{\alpha >1}C_{\alpha } \end{aligned}$$ (12)
Proof
For any \(\gamma >0\), using Hölder inequality with exponents \(\alpha /(\alpha -1)\) and \(\alpha \) we have
Now, choosing \(\gamma >0\) so that \(\gamma \alpha \Vert v\Vert _{\star ,p} <1\), e.g., \(\gamma = \frac{1}{2 \alpha \Vert v\Vert _{\star ,p}}\), by Lemma 2.7 we deduce
and therefore
By the first inequality of \((P_\alpha )\), for some \(\alpha >1\) the mean value \({\mathbb {E}}_q \left( q/p\right) ^{\frac{1}{\alpha -1}}\) is finite. Taking into account the definition (4) of \(\Vert \cdot \Vert _{\star ,q}\), from (14) we deduce
Since the function \(x^\frac{\alpha }{\alpha -1}\) is convex, by Jensen’s inequality \({\mathbb {E}}_q \left( q/p\right) ^{\frac{1}{\alpha -1}}= {\mathbb {E}}_p \left( q/p\right) ^{\frac{\alpha }{\alpha -1}} \ge 1\). As a consequence, the supremum in (15) is obtained for \(k=1\). The right-hand inequality in (9) then follows.
The left-hand inequality in (9) can be proved by inverting the role of p and q and by exploiting the second inequality of \((P_\alpha )\). Finally, inequality (11) follows immediately from (9). \(\square \)
Remark 4.3
In the previous theorem, we have chosen \(\gamma >0\) such that \(\gamma \alpha \Vert v\Vert _{\star ,p}=1/2\), which implies \(\frac{1}{1-\gamma \alpha \Vert v\Vert _{\star ,p}}=2\). But any other \(\gamma >0\) could be chosen such that \(\gamma \alpha \Vert v\Vert _{\star ,p}<1\), and this means that in (10) any number strictly greater than 1 can replace the number 2.
The following proposition is a probabilistic version of Gehring’s inequality (see [2]) and gives conditions for an upper bound on \({\mathbb {E}}_q\left( \frac{q}{p}\right) ^\frac{1}{\alpha -1}\) or (inverting the role of p and q) on \({\mathbb {E}}_p\left( \frac{p}{q}\right) ^\frac{1}{\alpha -1}\).
Proposition 4.4
Let \(p,q \in {\mathcal {P}}\). Assume there exist two constants \(\delta >0\) and \(c>0\) such that
Then, there is \(\alpha >1\) depending only on \(c, \delta \) such that \({\mathbb {E}}_q(\frac{q}{p})^\frac{1}{\alpha -1}<+\infty \) and it holds
Proof
We may assume \(0<\delta <1\). Define \(\Phi _{\delta }(x)=x \delta ^{\frac{x}{x-1}}\) on \((1, +\infty )\). It is a continuous strictly increasing function such that \(\lim _{x\rightarrow 1^+}\Phi _{\delta }(x)=0\) and \(\lim _{x\rightarrow +\infty }\Phi _{\delta }(x)=+\infty \). Fix \(\alpha > \Phi ^{-1}_{\delta }(c)\). Multiplying both sides of (16) by \(\frac{1}{\alpha -1} \lambda ^{\frac{1}{\alpha -1}-1}\) and integrating with respect to \(\lambda \) on the interval \([1, +\infty )\), we find
Since \(\delta < 1\), we have \(\mathbbm {1}_{\{\frac{q}{p}> \delta \}}=\mathbbm {1}_{\{\delta <\frac{q}{p} \le 1\}}+\mathbbm {1}_{\{\frac{q}{p} > 1\}}\). Thus, rearranging the terms in (18), we get
from which we deduce
\(\square \)
We immediately deduce the following result.
Corollary 4.5
Let \(p, q=e^{u-K_p(u)} p \in {\mathcal {P}}\). Assume (16) holds true for \(\delta =\left( {\mathbb {E}}_p e^u\right) ^{-1} \in (0,1)\). We have:
1. There is \(\alpha >\max \{1,c\}\) which satisfies (17), and it holds \(K_p(u)<\frac{\alpha -1}{\alpha }\ln \frac{\alpha }{c}\).
2. Any \(\alpha >\max \{1,c\}\) such that \(K_p(u)<\frac{\alpha -1}{\alpha }\ln \frac{\alpha }{c}\) satisfies (17).
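The proof follows from inequality \(\alpha > \Phi ^{-1}_{\delta }(c)\), with \(\Phi _{\delta }(x)=x \delta ^{\frac{x}{x-1}}\) and \(\delta =\left( {\mathbb {E}}_p e^u\right) ^{-1}\). Indeed, since \(\Phi _{\delta }\) is strictly increasing, \(\alpha > \Phi ^{-1}_{\delta }(c)\) is equivalent to \(\alpha \, \delta ^{\frac{\alpha }{\alpha -1}}>c\); taking logarithms and using \(\ln \delta =-K_p(u)\), this reads \(K_p(u)<\frac{\alpha -1}{\alpha }\ln \frac{\alpha }{c}\).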
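The threshold \(\Phi ^{-1}_{\delta }(c)\) has no closed form, but it can be located numerically. The sketch below is a minimal illustration, not a procedure taken from the text: it inverts \(\Phi _{\delta }(x)=x \delta ^{\frac{x}{x-1}}\) by bisection, relying only on the monotonicity and limits stated in the proof of Proposition 4.4, and then compares the two equivalent forms of the condition on \(\alpha \) used in Corollary 4.5. The function names, the iteration count, and the sample values of \(\delta \) and c are assumptions.

```python
import math

def phi_delta(x, delta):
    """Phi_delta(x) = x * delta^(x / (x - 1)) on (1, +inf); 0 is its limit at 1+."""
    if x <= 1.0:
        return 0.0
    return x * delta ** (x / (x - 1.0))

def phi_delta_inverse(c, delta, iterations=200):
    """Solve phi_delta(x) = c by bisection, for 0 < delta < 1 and c > 0."""
    lo, hi = 1.0, 2.0
    while phi_delta(hi, delta) < c:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if phi_delta(mid, delta) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cumulant_condition(alpha, c, delta):
    """K_p(u) < ((alpha - 1) / alpha) * ln(alpha / c), with K_p(u) = -ln(delta)."""
    return -math.log(delta) < (alpha - 1.0) / alpha * math.log(alpha / c)

if __name__ == "__main__":
    delta, c = 0.5, 2.0
    alpha0 = phi_delta_inverse(c, delta)
    print(alpha0, phi_delta(alpha0, delta))
    print(cumulant_condition(alpha0 + 0.1, c, delta),
          cumulant_condition(alpha0 - 0.1, c, delta))
```

Bisection is used here only because \(\Phi _{\delta }\) is continuous and strictly increasing; any other root finder for monotone functions would serve equally well.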
Remark 4.6
Note that a sufficient condition for the validity of (16) is
Letting \(\delta =\left( {\mathbb {E}}_p e^u\right) ^{-1}\) as in the corollary above, the inequality rewrites as \({\mathbb {E}}_p\left( \mathbbm {1}_{\{ e^u \ge \lambda \}} \right) \ge \frac{1}{c \lambda }\), \(\forall \, \lambda > 1\).
4.1 Robust Transformation of Norms
In the following proposition, we prove some robust transformations of norms when q varies in a subset of \({\mathcal {E}}(p)\).
Proposition 4.7
Given \(r<1\), consider the ball \({\mathcal {B}}_{r}(p)= \left\{ u \in L_0^{\Phi _1}(p) \,: \, \Vert u\Vert _{\star ,p}\le r \right\} \) and define
Then,
1. for any \(v\in L^{\Phi _1}(p)\) and \(\alpha >\frac{1}{1-r}\)
$$\begin{aligned} \sup _{q\in {\mathcal {E}}_r(p)}\Vert v\Vert _{\star , q}\le C_{\alpha , r } \Vert v\Vert _{\star ,p}, \end{aligned}$$ (20)
where
$$\begin{aligned} C_{\alpha , r }= \alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{\alpha -1}{\alpha (1-r) -1} \right) ^\frac{\alpha -1}{\alpha }; \end{aligned}$$ (21)
2. for any \(v\in L^{\Phi _1}(p)\) and \(\alpha > 1+r\)
$$\begin{aligned} c_{\alpha , r }^{-1} \Vert v\Vert _{\star ,p} \le \inf _{q\in \mathcal E_r(p)}\Vert v\Vert _{\star , q}, \end{aligned}$$ (22)
where
$$\begin{aligned} c_{\alpha , r }= \alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{1}{1-r}\right) ^{\frac{1}{\alpha }} \left( \frac{\alpha -1}{\alpha -1- r }\right) ^\frac{\alpha -1}{\alpha }; \end{aligned}$$ (23)
3. for any \(v\in L^{\Phi _1}(p)\)
$$\begin{aligned} c_r^{-1} \Vert v\Vert _{\star ,p} \le \inf _{q\in \mathcal E_r(p)}\Vert v\Vert _{\star , q} \le \sup _{q\in \mathcal E_r(p)}\Vert v\Vert _{\star , q}\le C_r \Vert v\Vert _{\star ,p}, \end{aligned}$$ (24)
where
$$\begin{aligned} c_r&=\inf _{\alpha> 1+r} \, \alpha \, 2^{\frac{\alpha +1}{\alpha }}\left( \frac{1}{1-r}\right) ^{\frac{1}{\alpha }} \left( \frac{\alpha -1}{\alpha -1- r }\right) ^\frac{\alpha -1}{\alpha }, \\ C_r&=\inf _{\alpha > \frac{1}{1-r}}\alpha \, 2^{\frac{\alpha +1}{\alpha }} \left( \frac{\alpha -1}{\alpha (1-r) -1}\right) ^\frac{\alpha -1}{\alpha }. \end{aligned}$$
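The constants in (21) and (23) are explicit, so they can be tabulated directly. The sketch below evaluates them and searches a finite grid of admissible \(\alpha \) for a small value; the function names, the grid span and the grid size are assumptions made for illustration. A grid search does not compute the infima \(C_r\) and \(c_r\) exactly, but each grid value is still a valid constant, because (20) and (22) hold for every admissible \(\alpha \).

```python
import math

def upper_constant(alpha, r):
    """C_{alpha,r} from (21); requires alpha > 1 / (1 - r)."""
    if alpha <= 1.0 / (1.0 - r):
        raise ValueError("alpha must exceed 1 / (1 - r)")
    base = (alpha - 1.0) / (alpha * (1.0 - r) - 1.0)
    return alpha * 2.0 ** ((alpha + 1.0) / alpha) * base ** ((alpha - 1.0) / alpha)

def lower_constant(alpha, r):
    """c_{alpha,r} from (23); requires alpha > 1 + r."""
    if alpha <= 1.0 + r:
        raise ValueError("alpha must exceed 1 + r")
    base = (alpha - 1.0) / (alpha - 1.0 - r)
    return (alpha * 2.0 ** ((alpha + 1.0) / alpha)
            * (1.0 / (1.0 - r)) ** (1.0 / alpha)
            * base ** ((alpha - 1.0) / alpha))

def grid_minimum(constant, r, lower, span=50.0, n=5000):
    """Smallest value of constant(alpha, r) over a grid of alpha in (lower, lower + span]."""
    candidates = (lower + span * i / n for i in range(1, n + 1))
    return min((constant(a, r), a) for a in candidates)

if __name__ == "__main__":
    r = 0.5
    print(grid_minimum(upper_constant, r, lower=1.0 / (1.0 - r)))
    print(grid_minimum(lower_constant, r, lower=1.0 + r))
```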
Proof
We first observe that \({\mathcal {E}}_r(p) \subseteq {\mathcal {E}}(p)\) since \({\mathcal {B}}_{r}(p)\subseteq \overset{\circ }{\text {dom}\, K_p}\). In fact, if we take \( u \in L_0^{\Phi _1}(p) \) such that \(\Vert u\Vert _{\star ,p}\le r <1\), by Lemma 2.7 we get \(e^{|u|} \in L^{1+\epsilon }(p)\) for any \(0<\epsilon <1-r\). The inclusion then follows since, from the equivalence between 1 and 6 of the Portmanteau theorem, \(u \in \overset{\circ }{\text {dom}\, K_p}\) if and only if \( u \in L_0^{\Phi _1}(p) \) and \(e^u \in L^{1+\epsilon }(p)\) for some \(\epsilon >0\).
Since \(K_p(u)\ge 0\), we get
and, by Lemma 2.7, for \(\alpha >1\) such that \(\frac{\alpha }{\alpha -1} \Vert u\Vert _{\star ,p} <1\) we deduce
For \(u\in {\mathcal {B}}_{r}(p)\) and \(\alpha >\frac{1}{1-r}\), we then obtain
From the right-inequality in (9), we finally deduce (20) and, taking the infimum over \(\alpha >\frac{1}{1-r}\), we get the right-hand inequality in (24).
In order to prove (22), we start from the left-inequality in (9). Since
again by Lemma 2.7, for \(\alpha >1\) such that \(\frac{1}{\alpha -1} \Vert u\Vert _{\star ,p} <1\) we deduce
For \(u\in {\mathcal {B}}_{r}(p)\) and \(\alpha >1+r\), we then obtain
From the left-inequality in (9) we finally deduce (22) and, taking the infimum over \(\alpha >1+r\), we get the left-hand inequality in (24). \(\square \)
Remark 4.8
An equivalent condition to sub-exponentiality that we can add to Proposition 2.3 in the particular case of centered random variables \(u\in L^{\Phi _1}_0(p)\) is the following (cf. [26]):
5. There exist \(a,b>0\) such that \({\mathbb E}_p(e^{\lambda |u|})\le 2e^{a\lambda ^2}, \forall \, 0<\lambda \le \frac{1}{b}\).
The constants in the bound can be expressed in terms of the norm \(\Vert \cdot \Vert _{\star ,p}\) as follows (cf. [25]):
where \(\beta \) is any number strictly greater than 1. By (25) and (26), we can then obtain robust bounds for \(C_\alpha \) and \(c_\alpha \) in exponential form.
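As a concrete instance of condition 5, consider the Gaussian setting of Example 4.9 with \(u(x)=x\). A direct computation gives \({\mathbb E}(e^{\lambda |x|})=2e^{\lambda ^2/2}\,\Phi (\lambda )\), where \(\Phi \) is the standard normal distribution function, so the condition holds with \(a=\frac{1}{2}\) for every \(\lambda >0\). The Python sketch below checks this closed form against a trapezoidal quadrature; the function names, the truncation point and the number of nodes are illustrative assumptions, and the value \(a=\frac{1}{2}\) refers to this particular example only, not to the general constants mentioned above.

```python
import math


def mgf_abs_gaussian(lam):
    """Closed form of E[exp(lam * |x|)] for x ~ N(0, 1)."""
    normal_cdf = 0.5 * (1.0 + math.erf(lam / math.sqrt(2.0)))
    return 2.0 * math.exp(0.5 * lam * lam) * normal_cdf


def mgf_abs_gaussian_quadrature(lam, upper=20.0, n=50_000):
    """Trapezoidal approximation of the same expectation on [0, upper]."""
    h = upper / n

    def g(x):
        # Integrand exp(lam * x) * exp(-x^2 / 2), without the 1/sqrt(2 pi) factor.
        return math.exp(lam * x - 0.5 * x * x)

    total = 0.5 * (g(0.0) + g(upper))
    for i in range(1, n):
        total += g(i * h)
    # Factor 2 accounts for the symmetry of |x|.
    return 2.0 * total * h / math.sqrt(2.0 * math.pi)


def condition5_holds(lam, a=0.5):
    """Check E[exp(lam * |x|)] <= 2 * exp(a * lam^2) for the Gaussian example."""
    return mgf_abs_gaussian(lam) <= 2.0 * math.exp(a * lam * lam)
```

Calling `condition5_holds(lam)` for a few positive values of `lam` illustrates the bound, and comparing the two evaluations of the expectation gives a simple sanity check of the closed form.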
Example 4.9
(The maximal exponential model with Gaussian weight; cf. [15, 25]) Let us consider the real Borel space \(({\mathbb {R}}, \mathcal {B})\) endowed with the probability measure \({{\mathbb {P}}}\) associated with the standard Gaussian density
$$\begin{aligned} \frac{1}{\sqrt{2\pi }}\, e^{-\frac{x^2}{2}}, \quad x\in {\mathbb {R}}, \end{aligned}$$
and the Gaussian exponential Orlicz space \(L^{\Phi _1}=L^{\Phi _1}(p)\), with \(p=1\).
The Orlicz space \(L^{\Phi _1}\) trivially contains all second-order polynomials of the form \(P(x)=a_2 x^2+a_1 x+a_0\).
For a centered polynomial \(P(x)\in L_0^{\Phi _1}\), where \(a_2+a_0=0\), we get
In fact, for \(k=1,2, \dots \)
and we can write
The sequence \(\left( \frac{{\mathbb {E}}(x^{2k})}{k!}\right) ^{1/k}\) is increasing, and it holds
In addition, for \(k=1,2, \dots \)
which implies
By (28), the ball \({\mathcal {B}}_{r}= \left\{ u \in L_0^{\Phi _1} \,: \, \Vert u\Vert _{\star }\le r \right\} \) contains all polynomials \(P(x)=a_2 x^2+a_1 x-a_2\) such that \(3|a_2| +\sqrt{\frac{2}{\pi }} \, |a_1| \le r\).
More generally, \(L_0^{\Phi _1}\) also includes the functions \(u\in C^2({\mathbb {R}};{\mathbb {R}})\) having \({\mathbb {E}}(u)=0\) and bounded second derivative. In fact, for a suitable \(\xi \in {\mathbb {R}} \) in a neighborhood of 0, these functions can be written in the form
with
from which we deduce that
for suitable second-order polynomials \(P_1\) and \(P_2\).
Since
where \(|u(0)|\le \dfrac{1}{2} c \), \(\Vert x \Vert _{\star }=\sqrt{\frac{2}{\pi }}\) and \(\Vert u^{''}(\xi )x^2\Vert _{\star }\le c \Vert x^2\Vert _{\star }=2c\), we deduce
This means that \({\mathcal {B}}_{r}\) also contains every function u defined by (29) and (30) such that \(\dfrac{3}{2} c + \sqrt{\frac{2}{\pi }} \, |u^{'}(0)|\le r\). Our results on the equivalence of norms are therefore robust with respect to all densities \(q\propto e^u\) obtained by varying u in this class of functions.
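The numerical values used in this example can be checked with a few lines of Python. The sketch below assumes that the norm is \(\Vert u\Vert _{\star }=\sup _{k\ge 1}\left( {\mathbb {E}}(|u|^k)/k!\right) ^{1/k}\); this definition is not restated in this section and is inferred from the sequence \(\left( {\mathbb {E}}(x^{2k})/k!\right) ^{1/k}\) and from the values \(\Vert x\Vert _{\star }=\sqrt{\frac{2}{\pi }}\) and \(\Vert x^2\Vert _{\star }=2\) quoted above. All function names are hypothetical, and `polynomial_in_ball` only tests the sufficient condition \(3|a_2| +\sqrt{\frac{2}{\pi }} \, |a_1| \le r\) stated in the example.

```python
import math


def ratio_x(k):
    """(E|x|^k / k!)^(1/k) for x ~ N(0, 1), computed via log-gamma."""
    # E|x|^k = 2^(k/2) * Gamma((k + 1) / 2) / sqrt(pi)
    log_moment = (0.5 * k * math.log(2.0)
                  + math.lgamma(0.5 * (k + 1))
                  - 0.5 * math.log(math.pi))
    return math.exp((log_moment - math.lgamma(k + 1)) / k)


def ratio_x2(k):
    """(E(x^(2k)) / k!)^(1/k) for x ~ N(0, 1), computed via log-gamma."""
    # E(x^(2k)) = (2k)! / (2^k * k!)
    log_moment = (math.lgamma(2 * k + 1)
                  - k * math.log(2.0)
                  - math.lgamma(k + 1))
    return math.exp((log_moment - math.lgamma(k + 1)) / k)


def polynomial_in_ball(a2, a1, r):
    """Sufficient condition for a2*x^2 + a1*x - a2 to lie in the ball of radius r."""
    return 3.0 * abs(a2) + math.sqrt(2.0 / math.pi) * abs(a1) <= r
```

Under the stated assumption, the largest value of `ratio_x(k)` is attained at \(k=1\) and equals \(\sqrt{\frac{2}{\pi }}\), while `ratio_x2(k)` increases with k and approaches 2 from below.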
4.2 Robust Concentration Inequalities
In the previous sections, we have seen that densities connected by an open exponential arc produce the same Orlicz space with equivalent norms and we have explicitly stated the related equivalence constants. In addition, a robust equivalence of norms has been obtained when q varies in the subset \({\mathcal {E}}_r(p)\) of the maximal exponential model. An interesting application that we present in this section is to derive robust concentration inequalities of Bernstein type.
We first recall some concentration inequalities of Bernstein type which hold true for centered random variables belonging to the exponential Orlicz space (see, e.g., [25, 26]).
Proposition 4.10
Let \(v\in L^{\Phi _1}_0\). Then, \(\forall t>0\) and for any \(\beta >1\),
Since sub-exponentiality is preserved by linear transformations, we can obtain more general concentration inequalities for sums of independent random variables (see [26]).
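For orientation, a tail bound of Bernstein type can be derived from a moment generating function bound by the standard Chernoff argument. The sketch below assumes generic constants \(a,b>0\) such that \({\mathbb E}_p(e^{\lambda |v|})\le 2e^{a\lambda ^2}\) for \(0<\lambda \le \frac{1}{b}\), as in condition 5 of Remark 4.8, and writes \({\mathbb P}_p=p\cdot {\mathbb P}\), a notation introduced only for this sketch. It is not the statement of Proposition 4.10 and does not reproduce its constants. For \(t>0\), Markov's inequality and \(e^{\lambda v}\le e^{\lambda |v|}\) give
$$\begin{aligned} {\mathbb P}_p(v\ge t) \le \inf _{0<\lambda \le \frac{1}{b}} e^{-\lambda t}\, {\mathbb E}_p(e^{\lambda v}) \le 2 \inf _{0<\lambda \le \frac{1}{b}} e^{a\lambda ^2-\lambda t} \le 2\exp \left( -\min \left\{ \frac{t^2}{4a}, \frac{t}{2b}\right\} \right) . \end{aligned}$$
The last step takes \(\lambda =\frac{t}{2a}\) when \(t\le \frac{2a}{b}\), which yields the exponent \(-\frac{t^2}{4a}\), and \(\lambda =\frac{1}{b}\) otherwise, in which case \(\frac{a}{b^2}<\frac{t}{2b}\) and the exponent \(\frac{a}{b^2}-\frac{t}{b}\) is smaller than \(-\frac{t}{2b}\). Applying the same argument to \(-v\) gives the two-sided bound \({\mathbb P}_p(|v|\ge t)\le 4\exp \left( -\min \left\{ \frac{t^2}{4a}, \frac{t}{2b}\right\} \right) \).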
Due to the form of the concentration bounds, when q varies in \({\mathcal {E}}_r(p)\) defined by (19), the robust version of the concentration inequalities can be immediately stated as in the following corollary.
Corollary 4.11
For a fixed \(p\in {\mathcal {P}}\), let \(v\in L^{\Phi _1}(p)\). Then, \(\forall t>0\) and for any \(\beta >1\)
Proof
The proof follows the lines of some concentration inequalities stated in [25], so we only sketch it. For \(t>0\) and for any \(\beta >1\), we can write
where
Finally, by Proposition 4.7(3) we get
\(\square \)
5 Conclusion
The theoretical results of this paper first highlighted the link between the exponential Orlicz space and the space of BMO functions. As a consequence, the property \((P_\alpha )\) characterizing the maximal exponential model can be understood as a sort of static Muckenhoupt property which holds simultaneously with respect to two densities connected by an open exponential arc. This could help clarify how to introduce time dependence in the study of maximal exponential models, which is still an open issue. The explicit constants we obtained when changing the law of Orlicz spaces centered at connected densities are one of the main contributions of the paper. As an application, for random variables belonging to the exponential Orlicz space, we obtained concentration inequalities of Bernstein type that are robust with respect to densities in an exponential subfamily. Their extension to sums of sub-exponential random variables can be applied to derive uniform exponential bounds for the law of large numbers.
Data Availability
Not applicable.
References
Cena, A., Pistone, G.: Exponential statistical manifold. Ann. Inst. Stat. Math. 59, 27–56 (2007). https://doi.org/10.1007/s10463-006-0096-y
Doléans-Dade, C., Meyer, P.A.: Inégalités de normes avec poids. Séminaire de Probabilités XIII, Université de Strasbourg (Lecture Notes in Math. 721, pp. 313-331). Springer, Berlin (1979)
Emery, M.: Une définition faible de BMO. Ann. Inst. Henri Poincaré, Prob. Statist. 21(1), 59-71 (1985)
Fukumizu, K.: Exponential manifold by reproducing kernel Hilbert spaces. Algebraic and Geometric methods in Statistics, 291-306. Cambridge University Press (2009). https://doi.org/10.1017/CBO9780511642401.019
Garnett, J.B., Jones, P.W.: The distance in BMO to \(L^\infty \). Ann. Math. 108, 373–393 (1978)
Gibilisco, P., Pistone, G.: Connections on non-parametric statistical manifolds by Orlicz space geometry. Infin. Dimens. Anal. Quantum Probab. Relat. Top. 1(02), 325–347 (1998)
Grasselli, M.R.: Dual connections in nonparametric classical information geometry. Ann. Inst. Stat. Math. 62, 873–896 (2010). https://doi.org/10.1007/s10463-008-0191-3
Imparato, D., Trivellato, B.: Geometry of extended exponential models. In: Algebraic and Geometric Methods in Statistics, pp. 307-326. Cambridge University Press (2009). https://doi.org/10.1017/CBO9780511642401.020
John, F., Nirenberg, L.: On functions of bounded mean oscillation. Commun. Pure Appl. Math. 14, 415–426 (1961)
Kazamaki, N.: Continuous exponential martingales and BMO. Springer, Berlin (1994). https://doi.org/10.1007/BFb0073585
Montrucchio, L., Pistone, G.: A class of non-parametric deformed exponential statistical models. In: Nielsen, F. (eds) Geometric Structures of Information. Signals and Communication Technology, pp. 15-35. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-02520-5_2
Naudts, J.: Exponential arcs in manifolds of quantum states. Front. Phys. 11, 12 (2023). https://doi.org/10.3389/fphy.2023.1042257
Newton, N.J.: A class of non-parametric statistical manifolds modelled on Sobolev space. Inf. Geom. 2, 283–312 (2019). https://doi.org/10.1007/s41884-019-00024-z
Pistone, G.: Examples of the application of nonparametric information geometry to statistical physics. Entropy 15(10), 4042–4065 (2013). https://doi.org/10.3390/e15104042
Pistone, G.: Information geometry of the Gaussian space. In: Ay, N., Gibilisco, P., Matus, F. (eds) Information Geometry and Its Applications. IGAIA IV 2016. Springer Proceedings in Mathematics & Statistics, vol. 252, pp. 119-155. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-97798-0_5
Pistone, G.: A Lecture about the Use of Orlicz Spaces in Information Geometry. In: Barbaresco, F., Nielsen, F. (eds) SPIGL 2020 PROMS 361, pp. 179-195 (2020). https://doi.org/10.48550/arXiv.2012.03376
Pistone, G., Rogantin, M.P.: The exponential statistical manifold: mean parameters, orthogonality and space transformations. Bernoulli 5, 721–760 (1999). https://doi.org/10.2307/3318699
Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Stat. 23(5), 1543–1561 (1995). https://doi.org/10.1214/aos/1176324311
Rao, M.M., Ren, Z.D.: Theory of Orlicz Spaces. Marcel Dekker Inc., New York (1991)
Santacroce, M., Siri, P., Trivellato, B.: New results on mixture and exponential models by Orlicz spaces. Bernoulli 22(3), 1431–1447 (2016). https://doi.org/10.3150/15-BEJ698
Santacroce, M., Siri, P., Trivellato, B.: On mixture and exponential connection by open arcs. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2017. Lecture Notes in Computer Science, vol 10589, pp. 577-584. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68445-1_67
Santacroce, M., Siri, P., Trivellato, B.: An application of maximal exponential models to duality theory. Entropy 20(495), 1–9 (2018). https://doi.org/10.3390/e20070495
Santacroce, M., Siri, P., Trivellato, B.: Exponential models by Orlicz spaces and applications. J. Appl. Probab. 55, 682–700 (2018). https://doi.org/10.1017/jpr.2018.45
Siri, P., Trivellato, B.: Minimization of the Kullback-Leibler divergence over a log-normal exponential arc. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2019. Lecture Notes in Computer Science, vol 11712, pp. 453-461. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26980-7_47
Siri, P., Trivellato, B.: Robust concentration inequalities in maximal exponential models. Stat. Probab. Lett. 170, 109001 (2021). https://doi.org/10.1016/j.spl.2020.109001
Vershynin, R.: High-dimensional probability. An introduction with applications in data science. Cambridge University Press (2018). https://doi.org/10.1017/9781108231596
Varopoulos, N.T.: A probabilistic proof of the Garnett–Jones theorem on BMO. Pac. J. Math. 90, 201–221 (1980)
Vigelis, R.F., Cavalcante, C.C.: On \(\varphi \)-families of probability distributions. J. Theor. Probab. 26(3), 870–884 (2013). https://doi.org/10.1007/s10959-011-0400-5
Vieira, F.L.J., de Andrade, L.H.F., Vigelis, R.F., Cavalcante, C.C.: A deformed exponential statistical manifold. Entropy 21(5), 496 (2019). https://doi.org/10.3390/e21050496
Wainwright, M.J.: High-dimensional statistics. A non-asymptotic viewpoint. Cambridge University Press (2019). https://doi.org/10.1017/9781108627771
Acknowledgements
The author is grateful to a referee for very helpful comments and suggestions on the initial submission of the paper.
Funding
Open access funding provided by Politecnico di Torino within the CRUI-CARE Agreement.
Contributions
100% (single author).
Ethics declarations
Conflict of interest
The author states that there is no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Trivellato, B. Sub-exponentiality in Statistical Exponential Models. J Theor Probab 37, 2076–2096 (2024). https://doi.org/10.1007/s10959-023-01281-6
DOI: https://doi.org/10.1007/s10959-023-01281-6