Abstract
A variational formula for the Cramér transform of series of weighted, independent symmetric Bernoulli random variables (Rademacher series) is given.
1 Introduction
The Cramér transform defines the rate function of the large deviations for empirical means of a sequence of i.i.d. random variables (see [2]). The literature on large deviation principles in much more general contexts is vast (see for instance the monographs [3, 4]). The sole aim of this paper is to establish a variational formula for the Cramér transform of random variables which are series of weighted, independent symmetric Bernoulli random variables.
The Cramér transform is the Legendre–Fenchel transform of the cumulant generating function of a random variable. We will need the general notion of the Legendre–Fenchel transform in topological spaces (see [5] or [1]). Let \(X\) be a real locally convex Hausdorff space and \(X^*\) its dual space. By \(\left\langle \cdot ,\cdot \right\rangle \) we denote the canonical pairing between \(X\) and \(X^*\). Let \(f:X\mapsto \mathbb {R}\cup \{\infty \}\) be a function not identically equal to \(\infty \). By \(\mathcal {D}(f)\) we denote the effective domain of \(f\), i.e. \(\mathcal {D}(f)=\{x\in X:\;f(x)<\infty \}\). A function \(f^*:X^*\mapsto \mathbb {R}\cup \{\infty \}\) defined by
\[f^*(x^*)=\sup _{x\in X}\{\left\langle x,x^*\right\rangle -f(x)\}\quad (x^*\in X^*)\]
is called the Legendre–Fenchel transform (convex conjugate) of \(f\) and a function \(f^{**}:X\mapsto \mathbb {R}\cup \{\infty \}\) defined by
\[f^{**}(x)=\sup _{x^*\in X^*}\{\left\langle x,x^*\right\rangle -f^*(x^*)\}\quad (x\in X)\]
is called the convex biconjugate of \(f\).
The functions \(f^*\) and \(f^{**}\) are convex and lower semicontinuous in the weak* and weak topology on \(X^*\) and \(X\), respectively. Moreover, the biconjugate theorem states that the function \(f:X\mapsto \mathbb {R}\cup \{\infty \}\) not identically equal to \(+\infty \) is convex and lower semicontinuous if and only if \(f=f^{**}\).
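Let \(I\) be a countable set and \((\epsilon _i)_{i\in I}\) be a Bernoulli sequence, i.e. a sequence of i.i.d. symmetric random variables taking values \(\pm 1\). For \(\mathbf{t}=(t_i)_{i\in I}\in \ell ^2(I)\equiv \ell ^2\) the series
\[X_\mathbf{t}:=\sum _{i\in I}t_i\epsilon _i\]
converges a.s. Notice that for \(\mathbf{t}\in \ell ^1\)
\[\vert X_\mathbf{t}\vert \le \sum _{i\in I}\vert t_i\vert =\Vert \mathbf{t}\Vert _1\quad \text{a.s.},\]
i.e. \(X_\mathbf{t}\) is a bounded random variable and we can define its cumulant generating function on the whole of \(\mathbb {R}\), that is,
\[\psi _\mathbf{t}(s)=\ln Ee^{sX_\mathbf{t}}\]
for every \(s\in \mathbb {R}\). Because \((\epsilon _i)_{i\in I}\) is an i.i.d. Bernoulli sequence,
\[\psi _\mathbf{t}(s)=\ln \prod _{i\in I}Ee^{st_i\epsilon _i}=\sum _{i\in I}\ln \cosh (st_i).\]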
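Observe that the Cramér transform of \(X_\mathbf{t}\) then reads
\[\psi _\mathbf{t}^*(\alpha )=\sup _{s\in \mathbb {R}}\Big \{\alpha s-\sum _{i\in I}\ln \cosh (st_i)\Big \}.\]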
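We cannot derive an explicit form of \(\psi _\mathbf{t}^*\) by using the classical Legendre transform, because we cannot solve for \(s\) (that is, invert the derivative \(\psi _\mathbf{t}^\prime \)) the equation
\[\psi _\mathbf{t}^\prime (s)=\sum _{i\in I}t_i\tanh (st_i)=\alpha \tag{1}\]
and find
\[\psi _\mathbf{t}^*(\alpha )=\alpha s_\alpha -\psi _\mathbf{t}(s_\alpha ),\]
where \(s_\alpha \) is a solution of Eq. (1).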
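Although Eq. (1) has no closed-form solution in general, for finitely many weights it can be solved numerically. The following Python sketch illustrates the classical Legendre-transform route under that assumption: it finds \(s_\alpha \) by bisection and evaluates \(\alpha s_\alpha -\psi _\mathbf{t}(s_\alpha )\). The function names (`ln_cosh`, `psi`, `psi_prime`, `cramer_by_root`) are illustrative and do not come from the paper, and the sketch only handles \(\alpha \) in the open interval \((-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)\).

```python
import math

def ln_cosh(x):
    # Overflow-safe ln cosh(x) = |x| + ln(1 + e^{-2|x|}) - ln 2.
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def psi(t, s):
    # psi_t(s) = sum_i ln cosh(s * t_i) for a finite list of weights t (assumption).
    return sum(ln_cosh(s * ti) for ti in t)

def psi_prime(t, s):
    # Left-hand side of Eq. (1): sum_i t_i * tanh(s * t_i).
    return sum(ti * math.tanh(s * ti) for ti in t)

def cramer_by_root(t, alpha, iterations=200):
    # Classical Legendre transform: alpha * s_alpha - psi_t(s_alpha),
    # where s_alpha solves Eq. (1); the root is located by bisection.
    norm1 = sum(abs(ti) for ti in t)
    if not abs(alpha) < norm1:
        raise ValueError("alpha must lie in (-||t||_1, ||t||_1)")
    lo, hi = -1.0, 1.0
    while psi_prime(t, hi) < alpha:   # enlarge the bracket to the right
        hi *= 2.0
    while psi_prime(t, lo) > alpha:   # enlarge the bracket to the left
        lo *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if psi_prime(t, mid) < alpha:
            lo = mid
        else:
            hi = mid
    s_alpha = 0.5 * (lo + hi)
    return alpha * s_alpha - psi(t, s_alpha)
```

For a single weight \(t_1=1\) the equation can be inverted by hand, \(s_\alpha =\mathrm{artanh}(\alpha )\), which gives a convenient sanity check for the numerical routine.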
The following theorem gives a variational expression for \(\psi _\mathbf{t}^*\).
Theorem 1.1
Let \((\epsilon _i)_{i\in I}\) be a Bernoulli sequence and \(\mathbf{t}=(t_i)_{i\in I}\in \ell ^1(I)\). The Cramér transform of the variable \(X_{\mathbf{t}}=\sum _{i\in I}t_i\epsilon _i\) is given by the following variational formula
\[\psi _\mathbf{t}^*(\alpha )=\min _{\mathbf{b}\in \mathcal {D}(\psi _1^*):\;\left\langle \mathbf{t},\mathbf{b}\right\rangle =\alpha }\psi _1^*(\mathbf{b})\]
for \(\alpha \in (-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)\) and \(+\infty \) otherwise, where
\[\psi _1^*(\mathbf{b})=\frac{1}{2}\sum _{i\in I}\big [(1+b_i)\ln (1+b_i)+(1-b_i)\ln (1-b_i)\big ]\]
is the convex conjugate of the functional \(\psi _1:\ell ^1\mapsto \mathbb {R}\) of the form \(\psi _1(\mathbf{t})=\ln Ee^{X_\mathbf{t}}\) and \(\mathcal {D}(\psi _1^*)\subset \ell _\infty (I)\) denotes its effective domain.
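The statement can be checked numerically in the smallest nontrivial case. The Python sketch below is only an illustration under stated assumptions: two positive weights \(\mathbf{t}=(t_1,t_2)\), a bounded search window `s_max` for the supremum over \(s\), and a generic ternary search in place of any method used in the paper. It compares \(\sup _s\{\alpha s-\psi _\mathbf{t}(s)\}\) with the constrained minimum of \(\psi _1^*\) over \(\{\mathbf{b}:\;t_1b_1+t_2b_2=\alpha \}\). All function names are hypothetical.

```python
import math

def ln_cosh(x):
    # Overflow-safe ln cosh(x).
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def xlogx(u):
    # u * ln(u) with the convention 0 * ln 0 = 0.
    return u * math.log(u) if u > 0.0 else 0.0

def psi1_star(b):
    # psi_1^*(b) = 1/2 * sum_i [(1+b_i)ln(1+b_i) + (1-b_i)ln(1-b_i)].
    return 0.5 * sum(xlogx(1.0 + bi) + xlogx(1.0 - bi) for bi in b)

def minimize_convex(g, lo, hi, iterations=200):
    # Ternary search: minimum value of a convex function g on [lo, hi].
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return g(0.5 * (lo + hi))

def cramer_sup(t, alpha, s_max=200.0):
    # sup_s {alpha*s - psi_t(s)}, searched on [-s_max, s_max] (assumed wide enough).
    return -minimize_convex(
        lambda s: sum(ln_cosh(s * ti) for ti in t) - alpha * s, -s_max, s_max)

def cramer_variational(t, alpha):
    # min of psi_1^*(b) over b in [-1,1]^2 with t1*b1 + t2*b2 = alpha,
    # assuming t1 > 0 and t2 > 0; b2 is eliminated through the constraint.
    t1, t2 = t
    lo = max(-1.0, (alpha - t2) / t1)
    hi = min(1.0, (alpha + t2) / t1)

    def h(b1):
        b2 = max(-1.0, min(1.0, (alpha - t1 * b1) / t2))
        return psi1_star((b1, b2))

    return minimize_convex(h, lo, hi)
```

The two routines share no optimization variable (one searches over \(s\), the other over \(\mathbf{b}\)), so their agreement is a genuine, if limited, consistency check of the formula.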
Remark 1.1
The proof techniques presented in the next section are similar, but not identical, to the methods used by Ostaszewska and Zajkowski in [6, 7].
2 Proof of Theorem 1.1
We begin with an observation on the absolute value of the cumulant generating function: \(\vert \psi _\mathbf{t}(s)\vert \le \vert s\vert \Vert \mathbf{t}\Vert _1\). The parameter \(\mathbf{t}\) may be an arbitrary element of \(\ell ^1\). Formally we can define a function \(\psi \) of two variables:
\[\psi (s,\mathbf{t})=\ln Ee^{sX_\mathbf{t}}=\sum _{i\in I}\ln \cosh (st_i),\quad (s,\mathbf{t})\in \mathbb {R}\times \ell ^1.\]
Fixing \(\mathbf{t}\) or \(s\) we write \(\psi (s,\mathbf{t})=\psi _\mathbf{t}(s)\) or \(\psi (s,\mathbf{t})=\psi _s(\mathbf{t})\), respectively. First we derive \(\psi _s^*\) and next we show how \(\psi _\mathbf{t}^*\) is expressed by \(\psi _s^*\).
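The identity \(\ln Ee^{sX_\mathbf{t}}=\sum _i\ln \cosh (st_i)\) behind the function \(\psi \) is easy to confirm for finitely many weights by brute force. The Python sketch below (hypothetical helper names, finite weight list assumed) averages \(e^{sX_\mathbf{t}}\) over all \(2^n\) equally likely sign patterns and compares the result with the closed form.

```python
import itertools
import math

def cgf_by_enumeration(t, s):
    # ln E exp(s * X_t), with the expectation taken as the plain average
    # over all 2^n sign patterns (eps_1, ..., eps_n) in {-1, +1}^n.
    n = len(t)
    total = sum(
        math.exp(s * sum(e * ti for e, ti in zip(eps, t)))
        for eps in itertools.product((-1, 1), repeat=n)
    )
    return math.log(total / 2 ** n)

def cgf_closed_form(t, s):
    # psi(s, t) = sum_i ln cosh(s * t_i).
    return sum(math.log(math.cosh(s * ti)) for ti in t)
```

The enumeration costs \(2^n\) evaluations, so it is meant for small \(n\) only; the closed form is linear in \(n\).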
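In a standard way one can check the convexity of \(\psi _s\) for every \(s\in \mathbb {R}\). Let \(\mathbf{t},\mathbf{u}\in \ell ^1\) and \(\lambda \in (0,1)\). Since \(X_{\lambda \mathbf{t}+(1-\lambda )\mathbf{u}}=\lambda X_\mathbf{t}+(1-\lambda )X_\mathbf{u}\), we have
\[\psi _s(\lambda \mathbf{t}+(1-\lambda )\mathbf{u})=\ln E\big [(e^{sX_\mathbf{t}})^{\lambda }(e^{sX_\mathbf{u}})^{1-\lambda }\big ].\]
Using the Hölder inequality for exponents \(1/\lambda \) and \(1/(1-\lambda )\) we get
\[E\big [(e^{sX_\mathbf{t}})^{\lambda }(e^{sX_\mathbf{u}})^{1-\lambda }\big ]\le \big (Ee^{sX_\mathbf{t}}\big )^{\lambda }\big (Ee^{sX_\mathbf{u}}\big )^{1-\lambda }\]
and, in consequence,
\[\psi _s(\lambda \mathbf{t}+(1-\lambda )\mathbf{u})\le \lambda \psi _s(\mathbf{t})+(1-\lambda )\psi _s(\mathbf{u}).\]
Because \(\psi _s:\ell ^1\mapsto \mathbb {R}\) and \((\ell ^1)^* \simeq \ell _\infty \), we have
\[\psi _s^*:\ell _\infty \mapsto \mathbb {R}\cup \{\infty \}.\]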
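Let \(\mathbf{a}=(a_i)_{i\in I}\in \ell _\infty \). By the definition of the convex conjugate we have
\[\psi _s^*(\mathbf{a})=\sup _{\mathbf{t}\in \ell ^1}\Big \{\left\langle \mathbf{t},\mathbf{a}\right\rangle -\sum _{i\in I}\ln \cosh (st_i)\Big \},\tag{2}\]
where \(\left\langle \mathbf{t},\mathbf{a}\right\rangle =\sum _{i\in I}t_ia_i\).
Note that for \(s=0\) we have \(\psi _0\equiv 0\) and hence
\[\psi _0^*(\mathbf{a})=\sup _{\mathbf{t}\in \ell ^1}\left\langle \mathbf{t},\mathbf{a}\right\rangle =\begin{cases}0 & \text{if } \mathbf{a}=\mathbf{0},\\ +\infty & \text{otherwise}.\end{cases}\]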
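Assume now that \(s\ne 0\). The expression in the curly brackets of (2), denoted by \(w\), is concave, and its partial derivatives along the basis vectors \(e_i=(\delta _{ij})_{j\in I}\) in \(\ell ^1\) (\(\delta _{ij}\) is the Kronecker delta) equal
\[\frac{\partial w}{\partial t_i}(\mathbf{t})=a_i-s\tanh (st_i).\]
The expression \(w\) is a sum of functions with separated variables \((t_i)_{i\in I}\). Concavity of each of these functions implies that the gradient \(\nabla w(\mathbf{t})=(a_i-s\tanh (st_i))_{i\in I}\) belongs to the subgradient \(\partial w(\mathbf{t})\), since
\[w(\mathbf{u})-w(\mathbf{t})\le \sum _{i\in I}\big (a_i-s\tanh (st_i)\big )(u_i-t_i)\quad \text{for every } \mathbf{u}\in \ell ^1.\]
The concave function \(w\) attains its global maximum at the point \(\mathbf{t}\) if and only if \(\mathbf{0}\in \partial w(\mathbf{t})\). It suffices that
\[a_i-s\tanh (st_i)=0\quad \text{for every } i\in I.\]
Because \(\mathrm{artanh}(x)=\frac{1}{2}\ln \frac{1+x}{1-x}\) for \(\vert x \vert <1\), the partial derivatives equal zero when
\[t_i=\frac{1}{s}\,\mathrm{artanh}\Big (\frac{a_i}{s}\Big )=\frac{1}{2s}\ln \frac{s+a_i}{s-a_i},\quad \vert a_i\vert <\vert s\vert .\]
Substituting the above values of the \(t_i\)’s into (2) we get
\[\psi _s^*(\mathbf{a})=\frac{1}{2}\sum _{i\in I}\Big [\Big (1+\frac{a_i}{s}\Big )\ln \Big (1+\frac{a_i}{s}\Big )+\Big (1-\frac{a_i}{s}\Big )\ln \Big (1-\frac{a_i}{s}\Big )\Big ].\]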
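Let us look more closely at the effective domain of \(\psi _s^*\), that is, at the set
\[\mathcal {D}(\psi _s^*)=\Big \{\mathbf{a}\in \ell _\infty :\;\sum _{i\in I}\Big [\Big (1+\frac{a_i}{s}\Big )\ln \Big (1+\frac{a_i}{s}\Big )+\Big (1-\frac{a_i}{s}\Big )\ln \Big (1-\frac{a_i}{s}\Big )\Big ]<\infty \Big \}.\]
The function \(f(x)=(1+x)\ln (1+x)+(1-x)\ln (1-x)\) is even and \(f(0)=0\). Since \(\lim _{|x|\rightarrow 1^-}f(x)=2\ln 2\), we can extend its domain to the interval \([-1,1]\). One can check that \((1+x)\ln (1+x)+(1-x)\ln (1-x)\ge x^2\). It follows that for \(\mathbf{a}\in \mathcal {D}(\psi _s^*)\)
\[\sum _{i\in I}a_i^2\le s^2\sum _{i\in I}f\Big (\frac{a_i}{s}\Big )=2s^2\psi _s^*(\mathbf{a})<\infty \]
and \(|a_i|\le |s|\). Let \(\overline{B}_{\infty }(\mathbf{0};r)\) denote the closed ball with center \(\mathbf{0}\) and radius \(r\) in the space \(\ell _\infty \). The properties of \(f\) give that
\[\mathcal {D}(\psi _s^*)\subset \ell ^2\cap \overline{B}_{\infty }(\mathbf{0};|s|).\]
Let us note that \(\mathcal {D}(\psi _s^*)\) is a symmetric set, that is, \(\mathbf{a}\in \mathcal {D}(\psi _s^*)\) if and only if \(-\mathbf{a}\in \mathcal {D}(\psi _s^*)\). Moreover, it is symmetric with respect to each coordinate \(a_i\) of \(\mathbf{a}\).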
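Two ingredients of this computation lend themselves to a quick numerical check in one coordinate: the closed form \(\frac{1}{2}f(a/s)\) of the conjugate of \(t\mapsto \ln \cosh (st)\), and the bound \(f(x)\ge x^2\). The Python sketch below uses hypothetical helper names and a ternary search on a bounded window `t_max` (an assumption). It also checks the upper bound \(f(x)\le (2\ln 2)x^2\) on \([-1,1]\), which is not stated in the text but follows from the power series \(f(x)=\sum _{k\ge 1}x^{2k}/(k(2k-1))\).

```python
import math

def xlogx(u):
    # u * ln(u) with the convention 0 * ln 0 = 0.
    return u * math.log(u) if u > 0.0 else 0.0

def f(x):
    # f(x) = (1+x)ln(1+x) + (1-x)ln(1-x), extended to x = -1 and x = 1.
    return xlogx(1.0 + x) + xlogx(1.0 - x)

def ln_cosh(x):
    # Overflow-safe ln cosh(x).
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def conjugate_closed_form(a, s):
    # One-coordinate contribution to psi_s^*(a): (1/2) * f(a / s), for |a| <= |s|.
    return 0.5 * f(a / s)

def conjugate_numeric(a, s, t_max=100.0, iterations=200):
    # sup_t {t*a - ln cosh(s*t)} by ternary search on [-t_max, t_max];
    # the maximized function is concave in t.
    def w(t):
        return t * a - ln_cosh(s * t)
    lo, hi = -t_max, t_max
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if w(m1) < w(m2):
            lo = m1
        else:
            hi = m2
    return w(0.5 * (lo + hi))
```

Together the two bounds say that \(f(a_i/s)\) is comparable to \((a_i/s)^2\) on the closed ball, which is why square summability is the natural condition on \(\mathbf{a}\).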
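Let us return to the function \(\psi _\mathbf{t}\). Observe that
\[\psi _\mathbf{t}^\prime (s)=\sum _{i\in I}t_i\tanh (st_i)\]
and \(\lim _{s\rightarrow \pm \infty }\psi _\mathbf{t}^\prime (s)=\pm \Vert \mathbf{t}\Vert _1\). It follows that \(\mathcal {D}(\psi _\mathbf{t}^*)=\psi _\mathbf{t}^\prime (\mathbb {R})=(-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)\). Because \(\psi _\mathbf{t}\) is convex and continuous on \(\mathbb {R}\), by the biconjugate theorem we get
\[\psi _\mathbf{t}(s)=\sup _{\alpha \in (-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)}\{\alpha s-\psi _\mathbf{t}^*(\alpha )\}.\]
On the other hand, applying the biconjugate theorem to \(\psi _s\),
\[\psi _\mathbf{t}(s)=\psi _s(\mathbf{t})=\sup _{\mathbf{a}\in \mathcal {D}(\psi _s^*)}\{\left\langle \mathbf{t},\mathbf{a}\right\rangle -\psi _s^*(\mathbf{a})\}.\]
If we take \(\mathbf{a}=s\mathbf{b}\) then \(\psi _s^*(s\mathbf{b})=\psi _1^*(\mathbf{b})\) with \(\mathbf{b}\in \mathcal {D}(\psi _1^*)\). This means that we can rewrite the above variational principle as follows:
\[\psi _\mathbf{t}(s)=\sup _{\mathbf{b}\in \mathcal {D}(\psi _1^*)}\{s\left\langle \mathbf{t},\mathbf{b}\right\rangle -\psi _1^*(\mathbf{b})\}.\tag{3}\]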
Take now \(\alpha =\left\langle \mathbf{t},\mathbf{b}\right\rangle \). Recall that
\[\mathcal {D}(\psi _\mathbf{t}^*)=(-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1).\]
We show that every number in \((-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)\) is taken by the inner product \(\left\langle \mathbf{t},\mathbf{b}\right\rangle \) over the set \(\mathcal {D}(\psi _1^*)\). Observe that a vector \(\mathbf{b}=\sum _{i\in J}r(\mathrm{sgn}\;t_i)e_i\), where \(J\) is some finite subset of \(I\) and \(r\in [-1,1]\), belongs to \(\mathcal {D}(\psi _1^*)\) (it has only a finite number of nonzero terms). For this vector we have
\[\left\langle \mathbf{t},\mathbf{b}\right\rangle =r\sum _{i\in J}\vert t_i\vert .\]
Since \(\sum _{i\in J}\vert t_i\vert \) can be made arbitrarily close to \(\Vert \mathbf{t}\Vert _1\) by enlarging \(J\), and \(r\) ranges over \([-1,1]\), it follows that the inner product \(\left\langle \mathbf{t},\mathbf{b}\right\rangle \) attains over the set \(\mathcal {D}(\psi _1^*)\) any number belonging to the interval \((-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)\).
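The construction of such a finitely supported \(\mathbf{b}\) is explicit, and can be sketched in Python as follows. The assumptions are that \(\mathbf{t}\) is given as a finite list (for instance a truncation of an \(\ell ^1\) sequence) and that \(J\) is chosen as the shortest initial segment whose absolute sum exceeds \(|\alpha |\); the function name is hypothetical.

```python
import math

def finite_support_b(t, alpha):
    # Choose J = {first n indices} with sum_{i in J} |t_i| > |alpha|,
    # then r = alpha / sum_{i in J} |t_i|, so that |r| < 1 and <t, b> = alpha.
    partial = 0.0
    for n, ti in enumerate(t, start=1):
        partial += abs(ti)
        if partial > abs(alpha):
            break
    else:
        raise ValueError("need |alpha| < ||t||_1")
    r = alpha / partial
    # b = sum_{i in J} r * sgn(t_i) * e_i, zero outside J.
    return [r * math.copysign(1.0, ti) for ti in t[:n]] + [0.0] * (len(t) - n)
```

For example, with weights \(\pm 2^{-k}\) and \(\alpha =0.9\), the first four coordinates already suffice, because \(1/2+1/4+1/8+1/16=0.9375>0.9\).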
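For a fixed \(\mathbf{t}\in \ell ^1\), intersect \(\mathcal {D}(\psi _1^*)\subset \ell _\infty \) with the family of hyperplanes
\[\{\mathbf{b}\in \ell _\infty :\;\left\langle \mathbf{t},\mathbf{b}\right\rangle =\alpha \},\quad \alpha \in (-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1).\]
Now we can divide the supremum in (3) into two parts and get
\[\psi _\mathbf{t}(s)=\sup _{\alpha \in (-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)}\;\sup _{\substack{\mathbf{b}\in \mathcal {D}(\psi _1^*)\\ \left\langle \mathbf{t},\mathbf{b}\right\rangle =\alpha }}\{s\alpha -\psi _1^*(\mathbf{b})\}=\sup _{\alpha \in (-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)}\Big \{s\alpha -\inf _{\substack{\mathbf{b}\in \mathcal {D}(\psi _1^*)\\ \left\langle \mathbf{t},\mathbf{b}\right\rangle =\alpha }}\psi _1^*(\mathbf{b})\Big \}.\tag{4}\]
Define a function
\[\varphi _\mathbf{t}(\alpha )=\begin{cases}\inf \{\psi _1^*(\mathbf{b}):\;\mathbf{b}\in \mathcal {D}(\psi _1^*),\;\left\langle \mathbf{t},\mathbf{b}\right\rangle =\alpha \} & \text{if } \alpha \in (-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1),\\ +\infty & \text{otherwise}.\end{cases}\]
We prove that in the above definition of \(\varphi _\mathbf{t}\) the infimum over the set \(\mathcal {D}(\psi _1^*)\cap \{\mathbf{b}\in \ell _\infty :\;\left\langle \mathbf{t},\mathbf{b}\right\rangle =\alpha \}\) is attained, so that it can be replaced by a minimum over this set; that is, we prove
\[\varphi _\mathbf{t}(\alpha )=\min _{\mathbf{b}\in \mathcal {D}(\psi _1^*):\;\left\langle \mathbf{t},\mathbf{b}\right\rangle =\alpha }\psi _1^*(\mathbf{b})\tag{5}\]
for \(\alpha \in (-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)\) and \(+\infty \) otherwise.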
By the Banach–Alaoglu theorem the closed unit ball \(\overline{B}_{\infty }(\mathbf{0};1)\subset \ell _\infty \simeq (\ell ^1)^*\) is weak* compact, and for each \(\mathbf{t}\) and \(\alpha \in (-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)\) the hyperplane \(H_{\mathbf{t},\alpha }=\{\mathbf{b}\in \ell _\infty :\;\left\langle \mathbf{t},\mathbf{b}\right\rangle =\alpha \}\) is closed in this topology. Hence the intersection \(\overline{B}_{\infty }(\mathbf{0};1)\cap H_{\mathbf{t},\alpha }\) is weak* compact. Let \(\ell _0\) be the space of sequences with finite support. Obviously \(\ell _0\cap \overline{B}_{\infty }(\mathbf{0};1)\subset \mathcal {D}(\psi _1^*)\) and \(H_{\mathbf{t},\alpha }\cap \ell _0\ne \emptyset \). Since \(\psi _1^*=+\infty \) outside \(\mathcal {D}(\psi _1^*)\subset \overline{B}_{\infty }(\mathbf{0};1)\), we have
\[\inf _{\mathbf{b}\in \mathcal {D}(\psi _1^*)\cap H_{\mathbf{t},\alpha }}\psi _1^*(\mathbf{b})=\inf _{\mathbf{b}\in \overline{B}_{\infty }(\mathbf{0};1)\cap H_{\mathbf{t},\alpha }}\psi _1^*(\mathbf{b}).\]
Recall that the function \(\psi _1^*\) is nonnegative and lower semicontinuous in the weak* topology. By the Weierstrass theorem \(\psi _1^*\) attains its minimum on the compact set \(\overline{B}_{\infty }(\mathbf{0};1)\cap H_{\mathbf{t},\alpha }\). Because the intersection of this set with the effective domain of \(\psi _1^*\) is nonempty, this minimum is finite and is attained at some element of \(\mathcal {D}(\psi _1^*)\). It follows that in the definition of \(\varphi _\mathbf{t}\) we can replace the infimum by a minimum, and the formula (5) holds.
The formula (4) means that \(\psi _\mathbf{t}\) is the convex conjugate of \(\varphi _\mathbf{t}\). To prove the equality \(\varphi _\mathbf{t}=\psi _\mathbf{t}^*\) it remains to show that \(\varphi _\mathbf{t}\) is convex and lower semicontinuous.
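First we check the convexity of \(\varphi _\mathbf{t}\). Take \(\alpha _1,\;\alpha _2\in \mathbb {R}\). If \(\alpha _1\) or \(\alpha _2\) does not belong to the interval \((-\Vert \mathbf{t}\Vert _1,\Vert \mathbf{t}\Vert _1)\) then the value of \(\varphi _\mathbf{t}\) at such \(\alpha _k\) equals \(\infty \) and the condition of convexity is trivially satisfied. Otherwise, let \(\mathbf{b}_k\) \((k=1,2)\) be vectors in \(\mathcal {D}(\psi _1^*)\cap H_{\mathbf{t},\alpha _k}\) such that
\[\varphi _\mathbf{t}(\alpha _k)=\psi _1^*(\mathbf{b}_k),\quad k=1,2.\]
Observe that for \(\lambda \in (0,1)\)
\[\left\langle \mathbf{t},\lambda \mathbf{b}_1+(1-\lambda )\mathbf{b}_2\right\rangle =\lambda \alpha _1+(1-\lambda )\alpha _2,\]
that is \(\lambda \mathbf{b}_1+(1-\lambda )\mathbf{b}_2\in H_{\mathbf{t},\lambda \alpha _1+(1-\lambda )\alpha _2}\). The above and the convexity of \(\psi _1^*\) give
\[\varphi _\mathbf{t}(\lambda \alpha _1+(1-\lambda )\alpha _2)\le \psi _1^*(\lambda \mathbf{b}_1+(1-\lambda )\mathbf{b}_2)\le \lambda \psi _1^*(\mathbf{b}_1)+(1-\lambda )\psi _1^*(\mathbf{b}_2)=\lambda \varphi _\mathbf{t}(\alpha _1)+(1-\lambda )\varphi _\mathbf{t}(\alpha _2).\]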
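Now we prove the lower semicontinuity of \(\varphi _\mathbf{t}\). Recall that \(\psi _1^*\) is convex and lower semicontinuous in the weak* topology on \(\ell _\infty \). It means that for any \(c\in \mathbb {R}\) the set
\[\{\mathbf{b}\in \ell _\infty :\;\psi _1^*(\mathbf{b})\le c\}\tag{6}\]
is weak* closed. Since \(\psi _1^*\ge 0\) we can assume that \(c\ge 0\). Because the above set is contained in the weak* compact unit ball \(\overline{B}_{\infty }(\mathbf{0};1)\supset \mathcal {D}(\psi _1^*)\), it is also compact in this topology. Consider the image of the set (6) under the functional \(l_{\mathbf{t}}:=\left\langle \mathbf{t},\cdot \right\rangle \), i.e.
\[l_\mathbf{t}\big (\{\mathbf{b}\in \ell _\infty :\;\psi _1^*(\mathbf{b})\le c\}\big )=\{\left\langle \mathbf{t},\mathbf{b}\right\rangle :\;\psi _1^*(\mathbf{b})\le c\}.\tag{7}\]
Since for each \(\mathbf{t}\in \ell ^1\) the linear functional \(l_\mathbf{t}\) is continuous on \(\ell _\infty \) (also in the weak* topology), by the intermediate and extreme value theorems we get that the set (7) is a closed interval. By symmetry of the set (6) and linearity of the functional \(l_\mathbf{t}\) we get the existence of a real number \(\alpha \) such that
\[l_\mathbf{t}\big (\{\mathbf{b}\in \ell _\infty :\;\psi _1^*(\mathbf{b})\le c\}\big )=[-\alpha ,\alpha ].\]
We show that
\[\varphi _\mathbf{t}^{-1}((-\infty ,c])=[-\alpha ,\alpha ].\]
Let \(\beta \in \varphi _\mathbf{t}^{-1}((-\infty ,c])\). Since \(\psi _1^*\) is lower semicontinuous, the minimum in (5) is attained, so there exists \(\mathbf{b}_\beta \) with \(\left\langle \mathbf{t},\mathbf{b}_\beta \right\rangle =\beta \) such that
\[\psi _1^*(\mathbf{b}_\beta )=\varphi _\mathbf{t}(\beta )\le c.\]
That is, \(\mathbf{b}_\beta \) belongs to the set (6) and \(\left\langle \mathbf{t},\mathbf{b}_\beta \right\rangle =\beta \in [-\alpha ,\alpha ]\). Conversely, let \(\beta \in [-\alpha ,\alpha ]\). Since \(l_{\mathbf{t}}=\left\langle \mathbf{t},\cdot \right\rangle \) is continuous on the connected set \(\{\psi _1^*(\mathbf{b})\le c\}\), there is \(\mathbf{b}_\beta ^\prime \in \{\psi _1^*(\mathbf{b})\le c\}\) such that
\[\left\langle \mathbf{t},\mathbf{b}_\beta ^\prime \right\rangle =\beta .\]
Note that
\[\varphi _\mathbf{t}(\beta )=\min _{\mathbf{b}\in \mathcal {D}(\psi _1^*):\;\left\langle \mathbf{t},\mathbf{b}\right\rangle =\beta }\psi _1^*(\mathbf{b})\le \psi _1^*(\mathbf{b}_\beta ^\prime )\le c,\]
that is \(\beta \in \varphi _\mathbf{t}^{-1}((-\infty ,c])\).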
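Because \(\varphi _\mathbf{t}\) is convex and lower semicontinuous, the biconjugate theorem gives \(\varphi _\mathbf{t}=\varphi _\mathbf{t}^{**}\). Since \(\varphi _\mathbf{t}^*=\psi _\mathbf{t}\) by (4), we obtain \(\psi _\mathbf{t}^*=\varphi _\mathbf{t}\), which completes the proof.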
Remark 2.1
The result of Theorem 1.1 is similar to those obtained by the contraction principle (see for instance [3]) but let us emphasize that we used the space of parameters \(\ell ^1\) to generate the convex conjugate of the investigated function and we did not consider any probability distribution on it.
Remark 2.2
Let us stress that the proof of Theorem 1.1 contains a scheme which allows us to generate, under suitable assumptions, variational formulas for the Cramér transform of other series of random variables.
References
[1] Barbu, V., Precupanu, T.: Convexity and Optimization in Banach Spaces. Springer Monographs in Mathematics, 4th edn. Springer, Dordrecht (2012)
[2] Cramér, H.: Sur un nouveau théorème-limite de la théorie des probabilités, Actualités Scientifiques et Industrielles 736, 5–23 (1938). Colloque consacré à la théorie des probabilités, vol. 3, Hermann, Paris
[3] Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Corrected reprints of the second (1998) edition, Stochastic Modeling and Applied Probability, 38. Springer, Berlin (2010)
[4] Deuschel, J.D., Stroock, D.W.: Large Deviations. Pure and Applied Mathematics, 137. Academic Press Inc, Boston (1989)
[5] Ekeland, I., Témam, R.: Convex Analysis and Variational Problems. Translated from French. Corrected reprint of the 1976 English edition. Classics in Applied Mathematics, 28. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1999)
[6] Ostaszewska, U., Zajkowski, K.: Cramér transform and t-entropy. Positivity 18(2), 347–358 (2014)
[7] Zajkowski, K.: Convex conjugates of analytic functions of logarithmically convex functional. J. Convex Anal. 20(1), 243–252 (2013)
Additional information
The author is supported by the Polish National Science Center, Grant No. DEC-2011/01/B/ST1/03838.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
About this article
Cite this article
Zajkowski, K. Cramér transform of Rademacher series. Positivity 19, 529–537 (2015). https://doi.org/10.1007/s11117-014-0313-5
Keywords
- Rademacher series
- Cramér transform
- Legendre–Fenchel transform
- Large deviations
Mathematics Subject Classification
- 44A15
- 60F10