Abstract
This paper presents an iterative algorithm that bounds the lower and upper partial moments of an arbitrary univariate random variable X by using the information contained in a sequence of finite moments of X. The obtained bounds on the partial moments imply bounds on the moments of the transformation f(X) for a certain function \(f:\mathbb {R}\rightarrow \mathbb {R}\). Two examples illustrate the performance of the algorithm.
1 Introduction
Statistical computations for insurance policy, inventory management, Bayesian point estimation, and other areas often involve the computation of partial moments (Winkler et al. 1972). Bawa and Lindenberg (1977) derive a capital asset pricing model from utility functions based on lower partial moments. The i-th upper (lower) partial moment of a univariate real-valued random variable X with cumulative distribution function F is defined as the i-th moment in excess of (below) a certain threshold \(a\in {\mathbb {R}}\),
$$\begin{aligned} \mu _i^+(a) := {{\mathbb E}\!\left[ \max (X-a,0)^i\right] }, \qquad \mu _i^-(a) := {{\mathbb E}\!\left[ \max (a-X,0)^i\right] }. \end{aligned}$$
(1)
This paper presents an algorithm that bounds the partial moments of X using information contained in a sequence of (full) moments of X. This approach is strongly related to the moment problem.
The moment problem originally considered whether a certain sequence of moments corresponds to at least one univariate probability measure. This problem has been extensively discussed in the mathematical literature (Akhiezer 1965; Kreĭn and Nudelman 1977; Shohat and Tamarkin 1943; Stoyanov 2013). The admissible support of the probability measure splits the moment problem into three different subproblems: the admissible support can be unrestricted (Hamburger moment problem), restricted to the positive half-line (Stieltjes), or restricted to a bounded interval (Hausdorff). The algorithm presented in this paper fits into the Hamburger moment problem as it allows for X with support at both tails.
In case a sequence of moments is known to correspond to at least one probability measure, an important follow-up question involves what information on the distribution is contained in this sequence. Research on this question has focused on several characteristics:
- (i) class of probability measure (Berg 1995; Gut 2002; Lin 1997; Pakes et al. 2001; Stoyanov 2000)
- (ii) tail of the distribution (Goria and Tagliani 2003; Lindsay and Basak 2000)
- (iii) mode (Gavriliadis 2008)
- (iv) cumulative distribution function (Mnatsakanov 2008b; Mnatsakanov and Hakobyan 2009)
- (v) density function (Gavriliadis and Athanassoulis 2009; Mnatsakanov 2008a)
- (vi) Shannon entropy (Milev et al. 2012)
The above-mentioned literature can differ in the admissible support of the probability measure.
By using a sequence of moments, this paper adds to the literature on the moment problem by presenting an iterative algorithm that bounds the partial moments (1) at \(a=0\). More specifically, the algorithm bounds the partial moments from the positive part, \(\max (X,0)\), and the negative part, \(\max (-X,0)\), of a univariate real-valued random variable X by using information contained in known (ratios of) subsequent finite moments of X. In case one is interested in the partial moments at \(a\ne 0\), obtain moments of \(X_a:=X-a\) from the sequence of moments of X and bound the moments of \(\max (X_a,0)\) and \(\max (-X_a,0)\).
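The reduction from a general threshold a to \(a=0\) amounts to a binomial expansion of the moments of X. A minimal sketch (the function name is ours, not from the paper):

```python
from math import comb

def shifted_moments(mu, a):
    """Moments of X_a = X - a from the moments mu[0..n] of X,
    using E[(X-a)^k] = sum_j C(k,j) * mu_j * (-a)^(k-j)."""
    n = len(mu) - 1
    return [sum(comb(k, j) * mu[j] * (-a) ** (k - j) for j in range(k + 1))
            for k in range(n + 1)]

# Example: standard normal moments 1, 0, 1, 0, 3 shifted by a = 1.
mu = [1, 0, 1, 0, 3]
mu_a = shifted_moments(mu, 1)
# E[(X-1)] = -1, E[(X-1)^2] = 2, E[(X-1)^3] = -4, E[(X-1)^4] = 10
```

The partial moments of X at a then equal the partial moments of \(X_a\) at zero.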
The bounds on the partial moments imply bounds on unobserved higher order moments of X. Using a Taylor expansion, one can obtain bounds on the moments of the transformation f(X) for a certain function \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\).
Where applicable, procedures from related literature can be added to our algorithm. If the random variable X belongs to the class of Pearson distributions,Footnote 1 a recursive relationship on partial moments is in Winkler et al. (1972). In case X has a known finite support, the information on the support implies additional bounds (Barnett et al. 2002) and suitable numerical optimization routines can be invoked (Dokov and Morton 2005; Frauendorfer 1988; Kall 1991). Because the sequence of known moments of X typically consists of more than six moments, an unrestricted support can hamper a numerical optimization routine that restricts the sequence of moments.
The paper proceeds as follows. Preliminaries are in Sect. 2. Section 3 presents the iterative algorithm that bounds partial moments, and how these bounds imply bounds on other expressions. In Sect. 4, the algorithm is demonstrated for two example distributions. Section 5 is reserved for concluding remarks and points of discussion.
2 Preliminaries
Let \(\mathbb {R}^+_0 := [0,\infty )\) denote the set of nonnegative real numbers in the set \(\mathbb {R}\) of real numbers. The sets \(\mathbb {N}^+_0 = \{0,1,\ldots \}\) and \(\mathbb {N}^+=\{1,2,\ldots \}\) denote sets of natural numbers. Let \(X:\varOmega \rightarrow \mathbb {R}\) be a univariate real-valued random variable defined on the probability space \((\varOmega ,\mathcal {F},\mathbb {P})\) with unknown cumulative distribution function (cdf) \(F:\mathbb {R}\rightarrow [0,1]\).
The known moments \(\mu _1,\ldots ,\mu _n\) of X are finite. Assume \(\mu _2>0\) to avoid trivial outcomes where all moments are zero. Denote the positive and negative part of X by \(X^{+}:=\max (X,0)\) and \(X^{-}:=\max (-X, 0)\), respectively. Let \(\mu _i^+\) and \(\mu _i^-\) represent the i-th moment of \(X^+\) and \(X^-\), respectively (\(i\in \mathbb {N}^+\)). Thus, \(\mu _i^+:=\mu _i^+(0)\) and \(\mu _i^-:=\mu _i^-(0)\) denote the partial moments of X at \(a=0\) in (1). Set \(\mu _0^+ := {{\mathbb P}\!\left( X > 0 \right) }\), \(\mu _0^- := {{\mathbb P}\!\left( X < 0 \right) }\), and \(\mu _0 := {{\mathbb P}\!\left( X\ne 0 \right) }\). The latter definition is a convention that enables tighter bounds.
For a quasi-degenerate random variable X, there exists \(x\in {\mathbb {R}}\) with \({{\mathbb P}\!\left( X\in \{0,x\} \right) }=1\). A nonquasi-degenerate (nqd) random variable X is a random variable that is not quasi-degenerate, thus \({{\mathbb P}\!\left( X\in \{0,x\} \right) }<1\) for all \(x\in {\mathbb {R}}\). A degenerate X is necessarily quasi-degenerate, while a nqd X is necessarily non-degenerate.
Define the i-th moment ratio of X as \(m_i := \mu _i/\mu _{i-1}\) with \(m_i=\infty \) if \(\mu _{i-1}=0\) (\(i\in \mathbb {N}^+\)). Let \(m_i^+\) and \(m_i^-\) refer to the i-th moment ratio of the positive part \(X^+\) and the negative part \(X^-\), respectively. Both ratios can be referred to as a partial moment ratio of X. Lower bounds and upper bounds are denoted by underlines and bars, respectively. For instance, \(\underline{\mu }_i^+\) is a lower bound on \(\mu _i^+\) and \(\bar{m}_i^-\) is an upper bound on \(m_i^- := \mu _i^- / \mu _{i-1}^-\). The negation of \(s\in \{+,-\}\) is denoted by \(-s\). The operation \(s\, x\) denotes x if \(s=+\) and \(-x\) if \(s=-\). The operation s / x means 1 / x if \(s=+\) and \(-1/x\) if \(s=-\). The equivalence \(X \equiv Y\) means that \(X=Y\) holds almost surely: \({{\mathbb P}\!\left( X=Y \right) } = 1\).
3 The algorithm
This section outlines the algorithm in a number of lemmas and a theorem. By using some simple constraints, the first lemma provides bounds on partial moments and partial moment ratios. This enables an initialization of the bounds.
Lemma 1
[Initialization I] For \(s\in \{+,-\}\),
- (i) The nonnegativity constraint in (2) holds with equality if and only if \(X^s\equiv 0\). The constraint \(\mu _{2n}^s\le \mu _{2n}\) in (2) holds with equality if and only if \(X^{-s}\equiv 0\).
- (ii) Each of the constraints in (3) holds with equality if and only if \(X^{-s}\equiv 0\).
Consider the Christoffel function associated with cdf F of a non-degenerate X:
where \(P_0(x)\equiv 1\) and
Let
where \(x^{(1)}_j \le \cdots \le x^{(j)}_j\) are the j zeros of the j-th degree polynomial \(P_j\).
Using the definitions above, the following lemma from Gavriliadis and Athanassoulis (2009) bounds the distribution function at certain points.
Lemma 2
Suppose the moments \(\mu _1,\ldots ,\mu _{2n}\) of a non-degenerate X are known (\(n\in \mathbb {N}^+\)). The cumulative distribution function F satisfies at the j zeros \(x^{(1)}_j, \ldots , x^{(j)}_j\) of polynomial \(P_j\),
Let \(\tilde{x}^{(1)}< \cdots < \tilde{x}^{(N)}\) represent the strictly increasing sequence of the N distinct zeros of the set of polynomials \(\{P_j\}_{j=1}^n\):
Define the increasing step functions
Notice that for L(x) (respectively U(x)), one can simply take for each j the largest (smallest) i that satisfies \(x^{(i)}_j\le x\) (\(x^{(i)}_j \ge x\)) since \(L^{(i)}_j\) (\(U^{(i)}_j\)) is nondecreasing with i. It follows from Lemma 2 that \(L(x)\le F(x) \le U(x)\) holds for all \(x\in {\mathbb {R}}\). Using these bounds on the cdf F(x), Lemma 3 produces lower bounds on the partial moments:
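The bounds of Lemma 2 are in the spirit of the classical Chebyshev–Markov–Stieltjes inequalities, which bound F at the Gauss quadrature nodes generated by the moments. The sketch below is our own numerical illustration, not the paper's implementation: `gauss_from_moments` is a hypothetical helper based on the standard Golub–Welsch construction (Cholesky of the Hankel moment matrix, then the eigendecomposition of the Jacobi matrix).

```python
import numpy as np

def gauss_from_moments(mu):
    """Gauss quadrature nodes and weights from the moments mu[0..2n],
    via Cholesky of the Hankel matrix (Golub-Welsch)."""
    n = (len(mu) - 1) // 2
    H = np.array([[mu[i + j] for j in range(n + 1)] for i in range(n + 1)], float)
    R = np.linalg.cholesky(H).T                  # upper triangular factor
    alpha = np.zeros(n)                          # three-term recurrence coefficients
    beta = np.zeros(n - 1)
    for k in range(n):
        prev = R[k - 1, k] / R[k - 1, k - 1] if k > 0 else 0.0
        alpha[k] = R[k, k + 1] / R[k, k] - prev
    for k in range(n - 1):
        beta[k] = R[k + 1, k + 1] / R[k, k]
    J = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)   # Jacobi matrix
    nodes, vecs = np.linalg.eigh(J)
    weights = mu[0] * vecs[0, :] ** 2
    return nodes, weights

# Standard normal moments up to order 4 give nodes +-1, weights 1/2 each.
nodes, weights = gauss_from_moments([1, 0, 1, 0, 3])
L = np.cumsum(weights) - weights   # lower step bound at each node
U = np.cumsum(weights)             # upper step bound at each node
```

At each node \(x_k\), the cumulative sum of the weights below (up to) \(x_k\) bounds \(F(x_k)\) from below (above), mirroring the role of L(x) and U(x) above.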
Lemma 3
(Initialization II) Given the notation above,
where \(j_0^+ = \mathop {{{\mathrm{argmin}}}}\nolimits _{j\in \{1,\ldots ,N\}} \{ \tilde{x}^{(j)} > 0 \}\), \(dL_1 = L(\tilde{x}^{(1)})\), \(dU_{N}=1 - U(\tilde{x}^{(N)})\), and
Upper bounds on the partial moments \(\mu _i^+\) and \(\mu _i^-\) are not available along the lines of Lemma 3. For instance, the summation \(\sum _{j=j_0^+}^{N} \left( \tilde{x}^{(j)} \right) ^i dL_j\) is not an upper bound on \(\mu _i^+\) since \(dL_N\) does not bound \(\mathrm{d}F\) on \((\tilde{x}^{(N)}, \infty )\). The iterative algorithm we develop derives upper bounds on \(\mu _i^+\) and \(\mu _i^-\) under certain assumptions. In addition, the algorithm can sharpen the lower bounds obtained in Lemma 3.
The next lemma is useful to apply in the iterative algorithm as well as for proving some inequalities.
Lemma 4
(Moment ordering)
- (i) For nonnegative X and \(n\ge 2\), or for arbitrary X and even \(n \ge 2\),
$$\begin{aligned} \mu _{n-1}^{2} \le \mu _{n}\mu _{n-2}. \end{aligned}$$
- (ii) For nonnegative X with \(\mu _{1}>0\), the moment ratio \(m_i=\mu _i/\mu _{i-1}\) increases with \(i\in \mathbb {N}^+\):
$$\begin{aligned} 0<m_{1}\le m_{2}\le m_{3}\le \cdots . \end{aligned}$$
The inequalities are all strict if and only if X is nqd.
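As a quick numeric illustration (ours, not from the paper), both parts of Lemma 4 can be checked on an Exp(1) variable, whose i-th moment is i!:

```python
from math import factorial

# Moments of Exp(1): mu_i = i!, so the moment ratios are m_i = i.
mu = [factorial(i) for i in range(8)]
ratios = [mu[i] / mu[i - 1] for i in range(1, 8)]   # 1, 2, ..., 7

# Lemma 4(ii): strictly increasing moment ratios for this nqd variable.
assert all(ratios[i] < ratios[i + 1] for i in range(len(ratios) - 1))
# Lemma 4(i): mu_{n-1}^2 <= mu_n * mu_{n-2} (Cauchy-Schwarz ordering).
assert all(mu[n - 1] ** 2 <= mu[n] * mu[n - 2] for n in range(2, 8))
```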
Lemma 5 produces bounds on the moment ratio of certain summations of nonnegative random variables.
Lemma 5
Consider \(Y = \sum _{j\in J} Y_j\) where each \(Y_j:\varOmega \rightarrow \mathbb {R}_0^+\) is a nonnegative random variable with strictly positive first moment. Let \(i \in \mathbb {N}^+\).
- (i) Suppose the following two conditions both hold:
  - (a) Y is a mixture,
  - (b) for each \(j\in J\), \(m_i(Y_j) = g_i h_j\) and \(m_{i+1}(Y_j) = g_{i+1} h_j\) where \(g_i,g_{i+1},h_j\in {\mathbb {R}}\);
  then
  $$\begin{aligned} \max _{j\in J} \frac{m_{i+1}(Y_j)}{m_{i}(Y_j)}&\le \frac{m_{i+1}(Y)}{m_{i}(Y)}. \end{aligned}$$(5)
- (ii) If Y is a mixture, i.e., for almost all outcomes \(\omega \in \varOmega \) at most one \(Y_j(\omega )\) is nonzero, then
  $$\begin{aligned} m_i(Y)&\le \max _{j\in J} m_i(Y_j). \end{aligned}$$(6)
- (iii) If each \(Y_j\) is independent,
  $$\begin{aligned} m_i(Y)&\le \sum _{j\in J} m_i(Y_j). \end{aligned}$$(7)
- (iv) Suppose the following two conditions both hold:
  - (a) each \(Y_j\) is independent,
  - (b) each \(Y_j\) is logconcave;
  then for each nonempty \(I\subseteq J\), \(\sum _{j\in I} Y_j\) is logconcave and
  $$\begin{aligned} m_i\left( \sum _{j\in I} Y_j \right)&\le m_i(Y). \end{aligned}$$(8)
Sufficient condition (b) in (i) reflects that the moment ratio \(m_i(Y_j)\) must be separable into two parts with a common functional form across i and j. Nonnegative distributions whose moment ratios satisfy this condition include the Gamma distribution and distributions with a single scalar parameter, such as the half-normal distribution and uniform distributions of the type U[0, b] with \(b>0\). The moment ratio of the log-normal distribution does not have a separable form.
It is natural to inquire about the necessity of the sufficient conditions in Lemma 5. We provide for each condition a counterexample where the sufficient condition is not satisfied and the corresponding inequality fails to hold.
- (i) (a) Distributions that violate sufficient condition (a) may not satisfy (5), as one can verify using \(Y=Y_1+Y_2\) with independent \(Y_1,Y_2 \sim \exp (1)\). (b) For instance, the mixture Y with \({{\mathbb P}\!\left( Y=Y_1 \right) }={{\mathbb P}\!\left( Y=Y_2 \right) }=\frac{1}{2}\) where \(Y_1 \equiv 1\) and \(Y_2 \sim \exp (1)\) satisfies (a), but violates (b). Inequality (5) is invalid here since \(m_2(Y_2)/m_1(Y_2) = 2 > \frac{3}{2} = m_2(Y)/m_1(Y)\).
- (ii) Inequality (6) can be invalid if the mixture condition on Y is dropped. This follows from the case where \(Y_1 \equiv Y_2 \equiv 1\).
- (iii) Dependence between different \(Y_j\) can make inequality (7) invalid. For instance, the case where \({{\mathbb P}\!\left( (Y_1, Y_2)=\left( 0,\frac{1}{2}\right) \right) } = {{\mathbb P}\!\left( (Y_1,Y_2)=\left( 1,1\right) \right) }=\frac{1}{2}\) leads to \(m_3(Y_1)+m_3(Y_2)=1.9<1.91\cdots =m_3(Y)\).
- (iv) (a) Consider \(Y=Y_1+Y_2\) with the log-concave distributions \(Y_1\sim \mathrm{Exp}(1)\) and \(Y_2\sim \mathrm{Exp}\left( \frac{1}{2}\right) \). To violate (a), impose a perfect negative correlation on the quantiles \(F_{Y_1}(Y_1)\) and \(F_{Y_2}(Y_2)\) by letting \(F_{Y_1}(Y_1) = 1-F_{Y_2}(Y_2)\). In other words, \((Y_1,Y_2)\sim \left\{ \left( -\ln (1-U), -2\ln (U) \right) : U\sim U[0,1] \right\} \) such that \({{\mathbb E}\!\left[ Y^2\right] } = \int _0^1 \ln ^2\left( u^2(1-u)\right) \mathrm{d}u=18-\frac{2}{3}\pi ^2\). It follows that \(m_2(Y)=6-\frac{2}{9}\pi ^2 < 4 = m_2(Y_2)\). (b) For the independent, non-logconcave case where \({{\mathbb P}\!\left( Y_1=0 \right) }=\frac{9}{10}\), \({{\mathbb P}\!\left( Y_1=10 \right) }=\frac{1}{10}\), and \(Y_2\equiv 1\): \(m_2(Y_1+Y_2) = \frac{13}{2} < 10 = m_2(Y_1)\).
The following lemma can be supplemented with bounds from Lemma 5. This is helpful in the examples in Sect. 4.
Lemma 6
(Initialization III) Let \(i\in \mathbb {N}^+\).
- (i) Suppose \(X=\sum _{j\in I} X_j\) is a mixture of random variables \(X_j:\varOmega \rightarrow \mathbb {R}\), i.e., for almost all outcomes \(\omega \in \varOmega \) at most one \(X_j(\omega )\) is nonzero. Then
  $$\begin{aligned} m_i^+ := m_i(X^+)&\le \max _{j\in I} m_i(X_j^+). \end{aligned}$$(9)
- (ii) If \(X = Y - Z\) with \(Y = \sum _j Y_j\) and the following conditions hold:
  - (a) Y and Z are independent and nonnegative random variables,
  - (b) each \(Y_j\) is independent,
  - (c) each \(Y_j\) is log-concave,Footnote 2
  then
  $$\begin{aligned} m_i(X^+)&\le m_i(Y). \end{aligned}$$(10)
- (iii) If \(X=Y-Z\) is a mixture of nonnegative random variables Y and Z, i.e., \({{\mathbb P}\!\left( \{Y=0\} \cup \{Z=0\} \right) }=1\), then \(X^+\equiv Y\) and
  $$\begin{aligned} m_i(X^+)&= m_i(Y). \end{aligned}$$(11)
Similar relations apply to \(m_i^-:=m_i(X^-)\) and \(m_i(Z)\).
Condition (c) in Lemma 6(ii) on logconcave distributions is satisfied by many univariate distributions. Examples include the exponential distribution, the normal distribution, the uniform distribution, and the Gamma distribution with shape parameter at least one. The Gamma distribution with shape parameter less than one, the Student’s t-distribution, and the log-normal distribution are not log-concave.
We provide for each sufficient condition in Lemma 6 an example where the condition is not satisfied and the corresponding inequality is violated.
- (i) Inequality (9) fails to hold for the nonmixture case where \(X\equiv 2\) with \(X_1 \equiv X_2 \equiv 1\). Here, X is not a mixture and \(m_i(X^+) = 2 > 1 = \max (m_i(X_1),m_i(X_2))\) for \(i\in \mathbb {N}^+\).
- (ii) (a) Suppose \(Y\sim U[0,1]\) and \(Z=Y 1_{Y<\frac{1}{2}}\). The independence assumption (a) fails to hold, while assumptions (b) and (c) are satisfied. Inequality (10) is not satisfied: \(m_2(X^+)=\frac{7}{9} > \frac{2}{3} = m_2(Y)\). (b) Consider again the counterexample of Lemma 5(iv) where \(Y=Y_1+Y_2\) with the log-concave distributions \(Y_1\sim \mathrm{Exp}(1)\) and \(Y_2\sim \mathrm{Exp}\left( \frac{1}{2}\right) \), and let \(Z\equiv 5\). This satisfies conditions (a) and (c). To violate (b), impose a perfect negative correlation on the quantiles \(F_{Y_1}(Y_1)\) and \(F_{Y_2}(Y_2)\) by letting \(F_{Y_1}(Y_1) = 1 - F_{Y_2}(Y_2)\). It can be verified that \(m_2(X^+) = 3.88\cdots > 3.81\cdots = 6-\frac{2}{9}\pi ^2 = m_2(Y)\). (c) Suppose \({{\mathbb P}\!\left( Y=1 \right) } = \frac{9}{10}\), \({{\mathbb P}\!\left( Y=9 \right) } = \frac{1}{10}\), and \(Z\equiv 1\). While conditions (a) and (b) are satisfied, condition (c) on logconcavity of each \(Y_j\) fails to hold. Inequality (10) also fails to hold: \(m_2(X^+)= 8 > 5 = m_2(Y)\).
- (iii) The nonmixture case with \(Y\equiv 2\) and \(Z\equiv 1\) corresponds for each \(i\in \mathbb {N}^+\) to \(X\equiv X^+ \equiv 1\) and \(m_i^+ = 1 \ne 2 = m_i(Y)\).
The next theorem presents the theoretical novelty of the paper. For a nqd X, the moment \({\mu }_i^s\) is bounded in terms of the bounds on \(m_i^s\), which are in turn derived in the proof from the bounds on the moments of \(X^{-s}\). The requirement of a nqd X is easy to verify, since the moments of a quasi-degenerate distribution are uniquely characterized by finite \(m_1=m_2=\cdots \) and \(\mu _i = \left( \mu _n\right) ^{i/n}\) (\(i\in \mathbb {N}^+_0\)). To identify a quasi-degenerate X, it therefore suffices that \(\mu _i = \left( \mu _n\right) ^{i/n}\) holds for some odd i and even n. The requirements on i and n exclude X that are restricted to the set \(\{-x, 0, x\}\) for certain \(x\in {\mathbb {R}}\).
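A small sketch of the identification test just described (the function name and numerical tolerance are our own illustration, not part of the paper):

```python
def is_quasi_degenerate(mu, tol=1e-12):
    """Flag X (with moments mu[0..n], mu[0] = P(X != 0)) as quasi-degenerate
    if mu_i = (mu_n)^(i/n) holds for some odd i and even n."""
    n_max = len(mu) - 1
    for n in range(2, n_max + 1, 2):          # even n
        if mu[n] <= 0:
            continue
        for i in range(1, n_max + 1, 2):      # odd i
            if abs(mu[i] - mu[n] ** (i / n)) < tol:
                return True
    return False

# A degenerate X = 2 (mu_i = 2^i) is flagged; Exp(1) (mu_i = i!) is not.
assert is_quasi_degenerate([1, 2, 4, 8, 16])
assert not is_quasi_degenerate([1, 1, 2, 6, 24])
```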
Theorem 1
(\({\mu }_i^s\) in terms of the bounds on \(m_i^s\)) Consider a nqd X with known moments \(\mu _{n-2}\), \(\mu _{n-1}\), \(\mu _{n}\) with \(n\ge 2\) even. Define
with constants (\(s\in \{+,-\}\))
Assume for the bounds on the partial moment ratios
The moments of \(X^s\) are bounded by
The next lemma bounds \(\mu _i^s\) using previously obtained bounds.
Lemma 7
(\({\mu }_i^s\) in terms of the bounds on \(m_j^s\), \({\mu }_j^s\), and \(\mu _j^{-s}\)) For \(i\in \mathbb {N}^+\) and \(s \in \{+,-\}\),
Lemma 8 bounds the moment ratios in terms of the bounds on the moments.
Lemma 8
(\(m_i^s\) in terms of the bounds on \({\mu }_i^s\))
The bounds in (22) hold with equality if and only if the bounds on \(\mu _{i-1}\) and \(\mu _{i}\) hold with equality. The bounds in (23) hold with equality if and only if the bounds on \(\mu _j^s\) (\(j\in \{i-2,\ldots ,i+1\}\)) hold with equality and \(X^s\) is quasi-degenerate.
The algorithm below is the iterative algorithm that bounds the partial moments \(\mu _i^+\) and \(\mu _i^-\) and the corresponding moment ratios \(m_i^+=\mu _i^+/\mu _{i-1}^+\) and \(m_i^-=\mu _i^-/\mu _{i-1}^-\).
Algorithm 1
(Partial moments) The moments \(\mu _1,\ldots ,\mu _n\) of the random variable X are known. Before proceeding to the next step, each step is executed for \(i=1,\ldots ,n\) [and also for \(i=0\) in (iii)–(iv)]. The index j may depend on i.
- (i) Initialize bounds on \(m_i^+\) and \(m_i^-\) (Lemmas 1, 3, and 6).
- (ii) Bound \(m_i^+\) and \(m_i^-\) in terms of the bounds on \(m_j^+\) and \(m_j^-\) (Lemma 4(ii)).
- (iii) Bound \(\mu _i^+\) and \(\mu _i^-\) in terms of the bounds on \(m_j^+\) and \(m_j^-\) (Theorem 1).
- (iv) Bound \(\mu _i^+\) and \(\mu _i^-\) in terms of the bounds on \(\mu _j^+\), \(\mu _j^-\), \(m_j^+\), and \(m_j^-\) (Lemma 7).
- (v) Bound \(m_i^+\) and \(m_i^-\) in terms of the bounds on \(\mu _j^+\) and \(\mu _j^-\) (Lemma 8).
- (vi) Stop if the change in the bounds since step (ii) is smaller than some predetermined tolerance; otherwise go to step (ii).
Within each step, apply all applicable inequalities. The assumptions, in particular those in Lemmas 5 and 6, must be checked in advance: some may be valid for a given random variable X, while others may not.
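The control flow of Algorithm 1 can be sketched schematically as follows; the update functions are placeholders for the inequalities of Lemma 4(ii), Theorem 1, and Lemmas 7–8, which are not spelled out here, so this is an outline of the fixed-point iteration only:

```python
def run_algorithm1(bounds, update_steps, tol=1e-10, max_iter=1000):
    """Iterate the bound-tightening steps (ii)-(v) of Algorithm 1
    until the largest change in any bound is below tol (step (vi)).

    bounds:       dict of current lower/upper bounds on mu_i^s and m_i^s
    update_steps: list of functions, each mapping bounds -> tightened bounds
    """
    for _ in range(max_iter):
        previous = dict(bounds)
        for step in update_steps:
            bounds = step(bounds)
        change = max(abs(bounds[k] - previous[k]) for k in bounds)
        if change < tol:
            break
    return bounds

# Toy illustration with a single contracting update (not a real lemma):
b = run_algorithm1({"ub": 1.0}, [lambda d: {"ub": 0.5 * d["ub"] + 0.25}])
# the iteration converges to the fixed point ub = 0.5
```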
The following algorithm applies Lemma 6(iii) and is helpful for bounding unobserved higher order moments.
Algorithm 2
(Extrapolation) The moments \(\mu _1,\ldots ,\mu _n\) of \(X=Y-Z\) are known with \({{\mathbb P}\!\left( \{Y=0\} \cup \{Z=0\} \right) }=1\). Execute Algorithm 1 to obtain bounds on \(\mu _0^s, \ldots , \mu _n^s\) (\(s\in \{+,-\}\)). For a certain \(\nu \in \mathbb {N}^+\), construct bounds on \(\mu _i^s\) using bounds on \(m_i^s\) where \(i= n+1,\ldots ,n+\nu \):
- (i) Lower bounds: if (a) \(X^s\) is a mixture, and (b) \(m_i(X_j^s) = g_i h_j\) where \(i > n\) and \(g_i, h_j\in {\mathbb {R}}\), apply Lemma 5(i); else apply Lemma 4(ii).
- (ii) Upper bounds: if \(X^s\) is a mixture, apply Lemma 5(ii); else if \(X^s = \sum _j X_j^s\) with each \(X_j^s\) independent, apply Lemma 5(iii).
Conditions on \(X^s\) are needed for an upper bound on the partial moments \(\mu ^s_i\) (\(i>n\)) in Algorithm 2(ii). To see this, note that the tail of any \(X^s\) can admit \({{\mathbb P}\!\left( X^s = x \right) }=1/x^{i-\varepsilon }\) for arbitrarily large x and \(\varepsilon \in (0,i)\). By Markov’s inequality, the probability mass at x of such an \(X^s\) has unbounded impact on the moments of order i and higher as \(x\rightarrow \infty \):
The bounds on the moments of \(X^+\) and \(X^-\) from Algorithm 2 provide bounds on the unobserved higher order moments of X through \(\mu _i=\mu ^+_i + (-1)^i \mu ^-_i\) with \(i=n+1,n+2,\ldots \). This can be used in a Taylor expansion of the transformation f(X). For an expansion of f(X) around a nonzero a, one can consider using \(\max (X_a,0)\) and \(\max (-X_a,0)\) with \(X_a:=X-a\).
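Recombining the partial-moment bounds into bounds on \(\mu _i\) uses the sign pattern in \(\mu _i=\mu ^+_i + (-1)^i \mu ^-_i\): for odd i the upper bound on \(\mu _i^-\) enters the lower bound on \(\mu _i\). A minimal sketch (function name ours):

```python
def full_moment_bounds(i, lo_p, up_p, lo_m, up_m):
    """Bounds on mu_i from bounds on mu_i^+ (lo_p, up_p) and
    mu_i^- (lo_m, up_m), via mu_i = mu_i^+ + (-1)^i * mu_i^-."""
    if i % 2 == 0:
        return lo_p + lo_m, up_p + up_m   # even order: bounds add
    return lo_p - up_m, up_p - lo_m       # odd order: signs flip for mu_i^-

assert full_moment_bounds(4, 1.0, 2.0, 0.5, 1.0) == (1.5, 3.0)
assert full_moment_bounds(5, 1.0, 2.0, 0.5, 1.0) == (0.0, 1.5)
```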
4 Examples
The example in Sect. 4.1 reports bounds on partial moments for a case using Algorithm 1. Section 4.2 contains an example that uses Algorithm 1 and bounds a Taylor series expansion by extrapolating the results to higher order moments.
4.1 Example 1: a sum of random variables
Consider \(X=\frac{1}{2}T-U+Z\) with T an exponential distribution with intensity one, U a uniform distribution on [0, 1], and Z a standard normal distribution. The variables T, U, and Z are independent. Since the odd moments of Z are zero, the moments \(\mu _i\) of X can be obtained from the expansion
where \(i \in \mathbb {N}^+_0\), \({\lfloor i/2 \rfloor }\) denotes the largest natural number not greater than i / 2 \(({\lfloor i/2 \rfloor } = \max _{j\in {\mathbb {N}}_0^+} \{ j : j\le {i}/{2} \})\), and
We obtain the i-th moment of X from (24) for \(i=0, 1, \ldots , 16\).
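As an independent cross-check of the moments used here (our own computation; the paper instead uses expansion (24)), \(\mu _0,\ldots ,\mu _{16}\) can be built by binomially convolving the moment sequences of the independent components:

```python
from math import comb, factorial

N = 16
# Component moments: E[(T/2)^k] = k!/2^k for T ~ Exp(1);
# E[(-U)^k] = (-1)^k/(k+1) for U ~ U[0,1];
# E[Z^k] = 0 for odd k and (k-1)!! for even k (standard normal).
mT = [factorial(k) / 2**k for k in range(N + 1)]
mU = [(-1) ** k / (k + 1) for k in range(N + 1)]
mZ = [0.0 if k % 2 else factorial(k) / (2 ** (k // 2) * factorial(k // 2))
      for k in range(N + 1)]

def convolve(a, b):
    """Moments of A+B for independent A, B: E[(A+B)^n] = sum_k C(n,k) a_k b_{n-k}."""
    return [sum(comb(n, k) * a[k] * b[n - k] for k in range(n + 1))
            for n in range(len(a))]

mu = convolve(convolve(mT, mU), mZ)   # moments of X = T/2 - U + Z
# mu[1] = 0 and mu[2] = Var(X) = 1/4 + 1/12 + 1 = 4/3
```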
We are interested in bounding the moments of \(X^+:=\max (X,0)\) and \(X^-:=\max (-X,0)\). Write \(X=\left( \frac{1}{2}T+Z^+\right) -\left( U+Z^-\right) \), where the components T, U, \(Z^+\), and \(Z^-\) have logconcave distributions. Notice that (i) \(X^+\not \equiv \frac{1}{2}T + Z^+\) and \(X^-\not \equiv U + Z^-\) and (ii) the convolutions \(\frac{1}{2}T + Z^+\) and \(U + Z^-\) are logconcave (Lemma 5(iv)).
Consider the independent distributions (i) \(T_1, T_2 \,\buildrel d \over =T\), (ii) \(U_1, U_2 \,\buildrel d \over =U\), and (iii) \(Z_1, Z_2 \,\buildrel d \over =Z\). Since \({{\mathbb P}\!\left( Z\ge 0 \right) }=\frac{1}{2}\), X is equal in distribution to a mixture of two components:
Both components in (26) consist of three independent log-concave distributions. Upper bounds on \(m_i^+:=m_i^+(X)\) and \(m_i^-:=m_i^-(X)\) are initialized by applying Lemma 6(i)–(ii) and then Lemma 5(iv),
The initial moment ratios in (27)–(28) follow from
where the moments of T and U are in (25), and
As a measure of convergence after iteration k, consider the change in the width of the error bounds:
where the first subscript i of each moment bound refers to the considered order of the partial moments, and the second subscript k refers to the iteration number with iteration 0 immediately after the initialization. A small \(\varepsilon _k\) indicates a small improvement in the error bounds.
Using \(\varepsilon _k < 10^{-10}\) as a stopping condition, Algorithm 1 stops after 83 iterations and 151 ms of CPU time.Footnote 3 Table 1 reports the bounds on the moments \(\mu _i^+\) of \(X^+\) and \(\mu _i^-\) of \(X^-\). By (20)–(21), the difference \(\bar{\mu }_i^s-\underline{\mu }_i^s\) is independent of the sign \(s\in \{+,-\}\) (3rd column). This difference relative to the moment \(\mu _i\) tends to decrease with the order i of the moment (4th column). This can be helpful for bounding Taylor expansions because the moments of the highest observed order are most important for bounding unobserved higher order moments of X (here, the moments greater than order sixteen).
The difference between the bounds relative to \(\underline{\mu }_i^+\) also decreases with the order (7th column). In contrast, the difference between the bounds relative to \(\underline{\mu }_i^-\) increases for the higher order moments to 43.7% (rightmost column). This higher percentage may reflect that the sequence of 16 moments cannot uniquely pin down the characteristics of \(X^-\) in particular. The reason is that \(X^+\) tends to be larger than \(X^-\), as indicated by the moment bounds. As such, the higher order moments of X are mainly determined by the higher order moments of \(X^+\).
Table 2 shows the convergence by iteration for the partial moments of order 10. Here, 25 iterations suffice to bring each bound on the 10-th moment within \(0.03\%\) of its value after the final iteration. The final bounds on \(\mu _{10}^+\) and \(\mu _{10}^-\) differ from each other by 2.2 and \(22.0\%\), respectively. Table 3 indicates that for this example, executing 25 iterations gives bounds with a difference of at most \(0.14\%\) to the final bounds on the 16-th partial moments of X. The relative difference between the final bounds on \(\mu _{16}^+\) is, by coincidence, also \(0.14\%\).
Figure 1 depicts the series \(\varepsilon _k\) as a function of the iteration k. The linear trend in \(\varepsilon _k\) suggests that for some \(\tilde{a}, \tilde{b} \in {\mathbb {R}}\),
A small number of numerical experiments suggest that the exponential convergence of \(\varepsilon _k\) in (31) might hold in general. A proof of this conjecture is beyond the scope of this paper.
The product \(\prod _{k=1}^\infty \left( 1-\varepsilon _k \right) \) is a measure for the cumulative decrease in the width of the error bound over all iterations. Provided (31) is the correct model and \(\tilde{b} \le 0\), the lower bound and upper bound do not converge to each other:
where \(a=e^{\tilde{a}}\) and \(b=e^{\tilde{b}}\). An estimation of the linear regression (31) using \(\varepsilon _{20},\ldots ,\varepsilon _{83}\) of this example gives \(\tilde{a}=1.45\) and \(\tilde{b}=-0.299\). This predicts a small cumulative relative decrease in the width of the error bounds after iteration 83:
Because \(\varepsilon _k\) is a maximum over different orders of moments, the value in (33) can be interpreted as an upper bound on the cumulative decrease of the partial moments \(0,1,\ldots ,16\).
In some cases, initial bounds on the moment ratios can be difficult to obtain. Instead of the initial bounds (27)–(28), suppose we were to initialize the upper bounds on the moment ratios by
By Lemma 5(iii), this gives less strict initial bounds on the moment ratios than (27)–(28) (Table 4).
The effect of the initial bounds on the final bounds is substantial as can be seen by comparing Tables 1 and 5. Each difference \(\bar{\mu }_i^s-\underline{\mu }_i^s\) is higher in Table 5 than in Table 1. This underlines the importance of providing initial bounds in Algorithm 1 that are as strict as possible. Particularly, the bounds on the moments of \(X^-\) are sensitive to the initial bounds. More specifically, the accuracy of the bounds on the highest order moment of \(X^-\) decreases by three orders of magnitude when the less strict initial bounds in (34) are imposed (rightmost column in Tables 1 and 5).
4.2 Example 2: the exponential function on a quadratic form
Suppose one is interested in \({{\mathbb E}\!\left[ \exp (X)\right] }\) with the quadratic form \(X=Z' \tilde{A} Z = \frac{1}{2} Z' \! \left[ \tilde{A} + \tilde{A}' \right] \! Z\) where \(Z\sim N(\mathbf{0},I)\) has dimension d and \(\tilde{A}\) is a \(d\times d\) matrix with eigenvalues less than \(\frac{1}{2}\). Diagonalize the symmetric matrix \(A:=\frac{1}{2}\left[ \tilde{A} + \tilde{A}'\right] \) as \(V\varLambda V'\) with \(\varLambda \) a diagonal matrix with diagonal entries the eigenvalues \(\lambda _1 \le \cdots \le \lambda _d < \frac{1}{2}\). Since \(V'Z \sim Z\), the random variable X is a weighted summation of d independent Chi-squared distributions with one degree of freedom,
We can infer the exact outcome of \({{\mathbb E}\!\left[ \exp (X)\right] }\) from the moment generating function of a Chi-squared distribution:
The expectation of \(\exp (X)\) is infinite if \(\max _i\lambda _i \ge \frac{1}{2}\). The outcome in (35) enables a direct comparison with the bounds that we obtain. It should be stressed that an exact representation of \({{\mathbb E}\!\left[ f(X)\right] }\) is in general unavailable.
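A sketch of the closed form (35), which follows from independence and the \(\chi ^2(1)\) moment generating function \({{\mathbb E}\!\left[ \exp (t\chi ^2(1))\right] }=(1-2t)^{-1/2}\) (function name ours):

```python
from math import prod, sqrt

def expected_exp_quadratic(eigenvalues):
    """E[exp(Z' A Z)] = prod_i (1 - 2*lambda_i)^(-1/2), finite only
    when every eigenvalue of the symmetric part A is below 1/2."""
    if max(eigenvalues) >= 0.5:
        return float("inf")
    return prod(1.0 / sqrt(1.0 - 2.0 * lam) for lam in eigenvalues)

# Single eigenvalue 1/4: E[exp(Z^2/4)] = (1/2)^(-1/2) = sqrt(2).
assert abs(expected_exp_quadratic([0.25]) - sqrt(2)) < 1e-12
```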
Consider the Taylor series expansion
The moments of X are (Magnus 1986, Lemma 3)
where the summation is over all \(\nu = (n_1, \ldots , n_i)\) with each \(n_j\in \mathbb {N}_0\) and \(\sum _{j=1}^i n_j j = i\). This procedure is computationally expensive for moments of a high order i. A procedure based on Algorithm 2 enables us to bound high moments of X, and thus \({{\mathbb E}\!\left[ \exp (X)\right] }\) in (36). The bounds follow from a few additional steps we outline below.
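As a cross-check on (37), the same moments can be computed from the cumulants of the quadratic form, \(\kappa _i = 2^{i-1}(i-1)!\sum _j \lambda _j^i\), via the standard moment–cumulant recursion. This is our alternative sketch, not the paper's procedure:

```python
from math import comb, factorial

def quadratic_form_moments(eigenvalues, n):
    """Moments mu_0..mu_n of X = Z' Lambda Z with Z ~ N(0, I),
    from the cumulants kappa_i = 2^(i-1) * (i-1)! * sum_j lambda_j^i."""
    kappa = [2 ** (i - 1) * factorial(i - 1) * sum(l ** i for l in eigenvalues)
             for i in range(1, n + 1)]
    mu = [1.0]                       # mu_0
    for m in range(1, n + 1):        # mu_m = sum_k C(m-1,k) kappa_{k+1} mu_{m-1-k}
        mu.append(sum(comb(m - 1, k) * kappa[k] * mu[m - 1 - k]
                      for k in range(m)))
    return mu

# One unit eigenvalue: X ~ chi-squared(1), with moments 1, 3, 15, 105, ...
mu = quadratic_form_moments([1.0], 4)
assert mu[1:] == [1.0, 3.0, 15.0, 105.0]
```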
Using a polar coordinate system, it can be verified that \(X \sim Z'\varLambda Z \sim (\sqrt{R} U_d)' \varLambda (\sqrt{R} U_d)\) where the direction vector \(U_d\) follows a uniform distribution on the unit d-sphere \(S_d\) and R, which is the squared distance to the origin, follows a \(\chi ^2(d)\)-distribution. The latter distribution is a Gamma distribution with shape parameter d / 2 and scale parameter 2. This means that X is an infinite mixture of Gamma distributions \(X_{{{\mathbf {u}}}}\) with each component characterized by some \({{\mathbf {u}}}\in S^d\). More specifically, each component \(X_{{\mathbf {u}}}\) in the mixture is a Gamma distribution scaled by \(\lambda ({{\mathbf {u}}}) := {{\mathbf {u}}}'\varLambda {{\mathbf {u}}}\). The scaling parameter \(\lambda (\mathbf{u})\) varies between \(\lambda _{\min }= \min _i \lambda _i=\lambda _1\) and \(\lambda _{\max } = \max _i \lambda _i=\lambda _d\). For each component \(X_{{{\mathbf {u}}}}\), a positive (negative) \(\lambda ({{\mathbf {u}}})\) indicates that the corresponding Gamma distribution is added (subtracted). We loosely write \(X_{{\mathbf {u}}}\sim \mathrm{Gamma}(d/2,2\lambda ({{\mathbf {u}}}))\) for all \(\lambda ({{\mathbf {u}}})\), including negative \(\lambda ({{\mathbf {u}}})\).
Define the scaled remainder \({\xi }_n(Y)\) of a random variable Y by the functional
For a random variable Y that degenerates at x, i.e., \(Y\equiv x\), the functional in (38) can be written as a function of x,
where \(_1F_1(a, b; x)\) is the confluent hypergeometric function of the first kind with parameters a and b. By (36) and (38),
The moments \(\mu _1,\ldots ,\mu _n\) are obtained from (37), while Algorithm 1 produces bounds on \(\mu _n^+\) and \(\mu _n^-\). The following lemma derives bounds on \(\xi _n(X^+)\) and \(\xi _n(-X^-)\) in (39).
Lemma 9
Let \(n\in \mathbb {N}^+\).
- (i) The functionals \(\xi _n(X^+)\) and \(\xi _n(-X^-)\) are bounded by
  $$\begin{aligned} 0 \le \xi _n(\underline{m}^+_{n+1})&\le \xi _n(X^+)\\ \xi _n(-\bar{m}^-_{n+1})&\le \xi _n(-X^-) \le \xi _n(-\underline{m}_n^-) \le 0. \end{aligned}$$
- (ii) If the random variable X is a mixture of Gamma distributions \(T_\theta \sim \mathrm{Gamma}(k, \theta )\) with fixed shape parameter \(k>0\), scale parameter \(\theta \in \left[ \theta _{\min }, \theta _{\max } \right] \subseteq \left( -1,1\right) \), and \(T_{\theta } = -T_{-\theta }\) for \(\theta <0\), then
  $$\begin{aligned}&\xi _n \left( \frac{ (k+n)\underline{m}^+_{n} }{ k+n-1 } \right) \le \xi _n(X^+) \le \frac{1}{n!} \left[ {}_{2} F_{1}\left( 1, k+n ; \, n+1; \, [\theta _{\max }]^+ \right) -1 \right] \\&\frac{1}{n!} \left[ {}_{2} F_{1}\left( 1, k+n ; \, n+1; \, -[\theta _{\min }]^- \right) -1 \right] \le \xi _n(-X^-), \end{aligned}$$
  where \(_2F_1\) is the Gaussian hypergeometric function.
The bounds in Lemma 9(i) can deal with any distribution X, particularly any eigenvalue of A. In contrast, the two bounds in Lemma 9(ii) that are based on the Gaussian hypergeometric function \(_2F_1\) require that each absolute eigenvalue of A is less than \(\frac{1}{2}\), because \(X_\mathbf{u}\) has scale parameter \(\theta (\mathbf{u}) = 2\mathbf{u}'\varLambda \mathbf{u} = 2\lambda (\mathbf{u})\) with \(\mathbf{u} \in S_d\).
We present example cases where the minimal eigenvalue \(\lambda _{\min }\) is \(-0.1\), \(-0.2\), \(-0.3\), or \(-0.4\), while the maximal eigenvalue \(\lambda _{\max }\) is 0.1, 0.2, 0.3, or 0.4. The other eigenvalues are either equally spaced between \(\lambda _{\min }\) and \(\lambda _{\max }\), or equally split at the two extremes.
Define
where \(\underline{\mu }_Y\) and \(\bar{\mu }_Y\) are a lower and an upper bound on the mean \(\mu _Y\) of Y, respectively. The ratio r represents the size of the maximal error relative to the mean \(\mu _Y\). A small r indicates more accurate bounds. The weight \(w\in [0,1]\) is the normalized location of \(\mu _Y\) on the interval \(\left[ {\underline{\mu }}_Y, \bar{\mu }_Y \right] \). A value of w close to zero (one) reflects that the lower (upper) bound is the most accurate bound on \(\mu _Y\). The bounds are equally accurate if \(w=\frac{1}{2}\).
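A helper consistent with this description is sketched below; since the display defining r and w is described only in words here, both formulas are assumptions (r taken as the interval width relative to the mean, w as the normalized location of the mean within the interval):

```python
def accuracy_stats(mu, lower, upper):
    """r: size of the maximal error relative to the mean (assumed (upper-lower)/mu);
    w: normalized location of mu in [lower, upper], so w near 0 means the lower
    bound is the more accurate one."""
    r = (upper - lower) / mu
    w = (mu - lower) / (upper - lower)
    return r, w

r, w = accuracy_stats(1.0, 0.9, 1.3)
assert abs(r - 0.4) < 1e-12
assert abs(w - 0.25) < 1e-12
```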
Algorithm 1 stops after iteration k if \(\varepsilon _k< 10^{-6}\) with \(\varepsilon _k\) as in (30), or if \(k = 100\). Subsequently, the expression in (39) can be bounded. Table 6 reports several statistics for the case with two eigenvalues (\(d=2\)). The mean \(\mu _Y\) is in all cases of the same order of magnitude. The maximal relative error r is minimal when both extreme eigenvalues \(\lambda _{\min }\) and \(\lambda _{\max }\) are close to zero; a similar observation holds with \(d=20\) in Table 7 (eigenvalues equally spaced on the interval \(\left[ \lambda _{\min }, \lambda _{\max } \right] \)) and in Table 8 (10 eigenvalues at both \(\lambda _{\min }\) and \(\lambda _{\max }\)). Indeed, we can perfectly estimate \(\mu _Y = 1\) for the case where each eigenvalue equals zero.
It follows from w in Table 6 that with \(d=2\), the upper bound tends to be more accurate than the lower bound if \(\lambda _{\max }\) is high, thus when \(X^+\) tends to be large. This observation is reversed for \(d=20\) (Tables 7 and 8). Compared to \(\lambda _{\max }\), the value of \(\lambda _{\min }\) has a smaller impact on w. The accuracy decreases with the dimension d and the magnitude of the eigenvalues of A. More specifically, we observe a larger relative error r if \(d=20\) and \(\max ( \left| \lambda _{\min } \right| , \left| \lambda _{\max } \right| ) \ge 0.3\).
The computation time in Tables 6, 7, 8 is the mean CPU time of 5,000 computations of each model. The standard error is at most 0.12 ms. The computation time and the number of iterations are lower in cases where \(\left| \lambda _{\min } \right| = \left| \lambda _{\max } \right| \).
5 Discussion and conclusions
This paper has presented an iterative algorithm that bounds the lower and upper partial moment at \(a\in {\mathbb {R}}\) of the random variable X with a known finite sequence of moments of X. In a numerical example, the higher order partial moments in particular have narrow bounds. The obtained bounds imply bounds on unobserved higher order moments of X, which is useful for bounding moments of the transformation f(X). In another application, the transformation \(f(x)=e^x\) is considered for the quadratic form \(X=Z'AZ\) where Z follows a multivariate normal distribution. Numerical experiments suggest that the obtained bounds on \({{\mathbb E}\!\left[ \exp (X)\right] }\) are most accurate if \(X^+\) is not too large. The accuracy depends on the dimension as well as the eigenvalues of A.
Notes
The density function of a Pearson distribution is proportional to \(\exp \left( -\!\int \!{\frac{x-a}{b_{0}+b_{1}x+b_{2}x^{2}}}\,{\mathrm {d}}x\right) \) with constants a, \(b_0\), \(b_1\), and \(b_2\). Example distributions include the normal, beta, uniform, Gamma, Chi-squared, Student’s t, and Pareto distribution.
The density function of a log-concave distribution is proportional to \(\exp (\psi (y))\) with \(\psi \) a concave function on the domain of the distribution. Log-concave distributions are necessarily unimodal.
All computations are on an i5-6300 CPU with 2.4 GHz dual core and 16 GB RAM.
References
Abramowitz M, Stegun IA (1964) Handbook of mathematical functions: with formulas, graphs, and mathematical tables, vol 55. Courier Corporation, Chelmsford
Akhiezer NI (1965) The classical moment problem: and some related questions in analysis, vol 5. Oliver & Boyd, Edinburgh
Barnett NS, Dragomir SS, Agarwal R (2002) Some inequalities for probability, expectation, and variance of random variables defined over a finite interval. Comput Math Appl 43(10):1319–1357
Bawa VS, Lindenberg EB (1977) Capital market equilibrium in a mean-lower partial moment framework. J Financ Econ 5(2):189–200
Berg C (1995) Indeterminate moment problems and the theory of entire functions. J Comput Appl Math 65(1):27–55
Dokov SP, Morton DP (2005) Second-order lower bounds on the expectation of a convex function. Math Oper Res 30(3):662–677
Frauendorfer K (1988) Solving slp recourse problems with arbitrary multivariate distributions-the dependent case. Math Oper Res 13(3):377–394
Gavriliadis PN (2008) Moment information for probability distributions, without solving the moment problem, I: where is the mode? Commun Stat Theory Methods 37(5):671–681
Gavriliadis PN, Athanassoulis GA (2009) Moment information for probability distributions, without solving the moment problem, II: main-mass, tails and shape approximation. J Comput Appl Math 229(1):7–15
Goria MN, Tagliani A (2003) Bounds on the tail probability and absolute difference between two distributions. Commun Stat Theory Methods 32(3):519–532
Gut A (2002) On the moment problem. Bernoulli 8(3):407–421
Ibragimov IA (1956) On the composition of unimodal distributions. Theory Prob Appl 1(2):255–260
Kall P (1991) An upper bound for SLP using first and total second moments. Ann Oper Res 30(1):267–276
Kreĭn MG, Nudelman AA (1977) The Markov moment problem and extremal problems: ideas and problems of P.L. Čebyšev and A.A. Markov and their further development. American Mathematical Society, Providence
Lin GD (1997) On the moment problems. Stat Prob Lett 35(1):85–90
Lindsay BG, Basak P (2000) Moments determine the tail of a distribution (but not much else). Am Stat 54(4):248–251
Magnus JR (1986) The exact moments of a ratio of quadratic forms in normal variables. Annales d’Économie et de Statistique 4:95–109
Milev M, Novi Inverardi P, Tagliani A (2012) Moment information and entropy evaluation for probability densities. Appl Math Comput 218(9):5782–5795
Mnatsakanov RM (2008a) Hausdorff moment problem: reconstruction of distributions. Stat Prob Lett 78(12):1612–1618
Mnatsakanov RM (2008b) Hausdorff moment problem: reconstruction of probability density functions. Stat Prob Lett 78(13):1869–1877
Mnatsakanov RM, Hakobyan AS (2009) Recovery of distributions via moments. Lect Notes-monogr Ser 57:252–265
Pakes AG, Hung WL, Wu JW (2001) Criteria for the unique determination of probability distributions by moments. Aust N Z J Stat 43(1):101–111
Shohat JA, Tamarkin JD (1943) The problem of moments, vol 1. American Mathematical Society, Providence
Stoyanov J (2000) Krein condition in probabilistic moment problems. Bernoulli 6(5):939–949
Stoyanov JM (2013) Counterexamples in probability. Courier Corporation, Chelmsford
Szegö G (1975) Orthogonal polynomials, 4th edn. American Mathematical Society, Providence
Winkler RL, Roodman GM, Britney RR (1972) The determination of partial moments. Manage Sci 19(3):290–296
Acknowledgements
The author thanks the anonymous reviewer and Ludolf Meester for useful suggestions that helped to improve the paper in many ways. This work was partially supported by Grant COST-2011-01, The Netherlands.
Appendix
Proof of Lemma 1
Since the equality cases are straightforward, we prove strict bounds for X with both \(X^+\not \equiv 0\) and \(X^-\not \equiv 0\). The constraints in (2) and the first inequality in (3) follow from the identity \(\mu _i = \mu _i^+ + (-1)^i\mu _i^-\) (\(i\in \mathbb {N}^+_0\)). The second and third inequality in (3) follow from (\(n\in \mathbb {N}^+\))
\(\square \)
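The identity \(\mu _i = \mu _i^+ + (-1)^i\mu _i^-\) underlying the proof can be illustrated numerically on a discrete distribution with mass in both tails (support, weights, and the threshold \(a=0\) are hypothetical choices for this sketch):

```python
support = [-2.0, -0.5, 1.0, 3.0]   # hypothetical two-tailed discrete X
probs = [0.2, 0.3, 0.3, 0.2]

def mom(i):
    """mu_i = E[X^i]."""
    return sum(p * x**i for x, p in zip(support, probs))

def partial_mom(i, sign):
    """mu_i^+ = E[(X^+)^i] (sign=+1) and mu_i^- = E[(X^-)^i] (sign=-1),
    with X^+ = max(X, 0) and X^- = max(-X, 0) at threshold a = 0."""
    return sum(p * max(sign * x, 0.0)**i for x, p in zip(support, probs))

# X^+ X^- = 0 implies X^i = (X^+)^i + (-1)^i (X^-)^i, hence the identity.
for i in range(1, 6):
    assert abs(mom(i) - (partial_mom(i, 1) + (-1)**i * partial_mom(i, -1))) < 1e-12
```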
Proof of Lemma 2
See Theorem 3.1 in Gavriliadis and Athanassoulis (2009). \(\square \)
Proof of Lemma 3
The j zeros \(x_j^{(1)}< x_j^{(2)}< \cdots < x_j^{(j)}\) of each polynomial \(P_j\) are real and distinct (Akhiezer 1965; Szegö 1975). Lemma 2 implies that the functions L and U are a lower bound and an upper bound of the cdf F of X, respectively:
Therefore,
where \(j_0^+\), \(dL_j\), and \(dU_j\) (\(j=1,\ldots ,N\)) are defined in Lemma 3. \(\square \)
Proof of Lemma 4
- (i):
-
The Cauchy–Schwarz inequality with the inner product \(\langle Y,Z\rangle = {{\mathbb E}\!\left[ YZ\right] }\) is \(\left| {{\mathbb E}\!\left[ Y Z\right] }\right| ^2 \le {{{\mathbb E}\!\left[ Y^2\right] }}{{{\mathbb E}\!\left[ Z^2\right] }}\). Substituting \(Y = X^{n/2-1}\) and \(Z = X^{n/2}\) leads to the desired result for \(n>2\), and for \(n=2\) provided \(\mu _0:={{\mathbb P}\!\left( X\ne 0 \right) }=1\).
An additional step is needed for the case where \(n=2\) and \(\mu _0<1\). Consider the Cauchy–Schwarz inequality \(\tilde{\mu }_1^2={{\mathbb E}\!\left[ \tilde{X}\right] }^2 \le {{{\mathbb E}\!\left[ \tilde{X}^0\right] }}{{{\mathbb E}\!\left[ \tilde{X}^2\right] }}=\tilde{\mu }_0\tilde{\mu }_2\) with \(\tilde{X} \sim X_{X\ne 0}\). Since \(\tilde{\mu }_i =\mu _i/{{\mathbb P}\!\left( X\ne 0 \right) }\), multiplying both hand sides of this inequality by \(\left[ {{\mathbb P}\!\left( X\ne 0 \right) } \right] ^2\) gives the result with \(n=2\) and \(\mu _0 <1\).
The Cauchy–Schwarz inequality holds with equality if and only if (iff) \(Y = \lambda Z\) for some constant \(\lambda \), \(Y\equiv 0\), or \(Z\equiv 0\). None of these three conditions holds if X is nqd. It follows that \(\mu _{n-1}^2=\mu _n\mu _{n-2}\) holds iff X is quasi-degenerate.
- (ii):
-
Follows from \(\mu _1>0\) and (i).
\(\square \)
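The Cauchy–Schwarz step in Lemma 4 can be checked on a nonnegative discrete distribution (support and weights below are hypothetical); it is equivalent to the moment ratios \(m_n=\mu _n/\mu _{n-1}\) being nondecreasing in n:

```python
# Hypothetical nonnegative discrete X; mu(i) = E[X^i], mu(0) = 1.
support = [0.5, 1.0, 2.0, 4.0]
probs = [0.1, 0.4, 0.3, 0.2]

def mu(i):
    return sum(p * x**i for x, p in zip(support, probs))

# Cauchy-Schwarz with Y = X^{n/2-1}, Z = X^{n/2}: mu_{n-1}^2 <= mu_{n-2} * mu_n,
# i.e. the ratios m_n = mu_n / mu_{n-1} are nondecreasing in n.
for n in range(2, 8):
    assert mu(n - 1) ** 2 <= mu(n - 2) * mu(n)
    assert mu(n - 1) / mu(n - 2) <= mu(n) / mu(n - 1)
```

The inequalities are strict here because this X is not quasi-degenerate.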
Proof of Lemma 5
Without loss of generality, assume \(J=\{1,\ldots ,|J|\}\) and \(0<h_1 \le h_2 \le \cdots \). When summing over all \(j\in J\), the set J is omitted for convenience.
- (i):
-
Notice that
$$\begin{aligned} 0&< m_i(Y_1) = g_i h_1 \le m_i(Y_2) = g_i h_2 \le \cdots . \end{aligned}$$A convex combination of the larger \(m_{i}(Y_j)\), \(j \ge j_0\), is at least as large as any convex combination that mixes it with the smaller \(m_{i}(Y_j)\), \(j < j_0\):
$$\begin{aligned} \sum _{j\ge j_0} m_{i}(Y_j) \frac{ {{\mathbb E}\!\left[ Y_j^{i-1}\right] } }{ \sum _{r \ge j_0} {{\mathbb E}\!\left[ Y_r^{i-1} \right] } }&\ge \sum _j m_{i}(Y_j)\frac{ {{\mathbb E}\!\left[ Y_j^{i-1} \right] } }{ \sum _r {{\mathbb E}\!\left[ Y_r^{i-1} \right] } } \end{aligned}$$Rearranging terms,
$$\begin{aligned} \frac{ \sum _{j\ge j_0} {{\mathbb E}\!\left[ Y_j^{i} \right] } }{ \sum _j {{\mathbb E}\!\left[ Y_j^{i}\right] } } = \frac{ \sum _{j\ge j_0} m_{i}(Y_j){{\mathbb E}\!\left[ Y_j^{i-1}\right] } }{ \sum _j m_{i}(Y_j) {{\mathbb E}\!\left[ Y_j^{i-1} \right] } }&\ge \frac{ \sum _{r \ge j_0} {{\mathbb E}\!\left[ Y_r^{i-1} \right] } }{ \sum _r {{\mathbb E}\!\left[ Y_r^{i-1} \right] } }. \end{aligned}$$(41)Let \(w_j^{(i)} := {{\mathbb E}\!\left[ Y_j^{i-1} \right] }/{ \sum _{r\in J} {{\mathbb E}\!\left[ Y_r^{i-1}\right] } }\). By (41), the measure \(w^{(i+1)}\) stochastically dominates the measure \(w^{(i)}\):
$$\begin{aligned} \sum _{j\ge j_0} w_j^{(i+1)} = \sum _{j\ge j_0}\frac{ {{\mathbb E}\!\left[ Y_j^{i} \right] } }{ \sum _r {{\mathbb E}\!\left[ Y_r^{i}\right] } } \ge \frac{ \sum _{j\ge j_0} {{\mathbb E}\!\left[ Y_j^{i-1} \right] } }{ \sum _r {{\mathbb E}\!\left[ Y_r^{i-1}\right] } } = \sum _{j\ge j_0} w_j^{(i)} \end{aligned}$$Write
$$\begin{aligned} m_i(Y)&= \frac{{{\mathbb E}\!\left[ \left( \sum _j Y_j\right) ^i\right] }}{{{\mathbb E}\!\left[ \left( \sum _r Y_r\right) ^{i-1}\right] }} = \sum _j \frac{ {{\mathbb E}\!\left[ Y_j^i\right] } }{ \sum _r {{\mathbb E}\!\left[ Y_r^{i-1}\right] } } = \sum _j \frac{ {{\mathbb E}\!\left[ Y_{j}^{i-1}\right] } }{ \sum _r {{\mathbb E}\!\left[ Y_r^{i-1}\right] } } \frac{{{\mathbb E}\!\left[ Y_j^i\right] }}{ {{\mathbb E}\!\left[ Y_{j}^{i-1}\right] } }\\&= \sum _j w_j^{(i)} m_i(Y_j) = g_i \sum _j w_j^{(i)} h_j. \end{aligned}$$Because \(\{ h_j \}_{j\in J}\) is a positive nondecreasing sequence and \(w^{(i+1)}\) stochastically dominates \(w^{(i)}\), for each \(r\in J\),
$$\begin{aligned} \frac{m_{i+1}(Y)}{m_{i+1}(Y_r)}&= \frac{g_{i+1}\sum _j w_j^{(i+1)} h_j }{g_{i+1} h_r} = \frac{1}{ h_r}\sum _j w_j^{(i+1)} h_j \ge \frac{1}{ h_r}\sum _j w_j^{(i)} h_j = \frac{m_{i}(Y)}{m_{i}(Y_r)}. \end{aligned}$$(42)Inequality (5) follows from rearranging (42) and \(m_{i},m_{i+1}>0\).
- (ii):
-
Under the assumption that Y is a mixture, i.e., \({{\mathbb P}\!\left( Y_j Y_r = 0 \right) }=1\) (\(j \ne r\)), the moment ratio \(m_i(Y)\) is a weighted average of \(m_i(Y_j)\) with positive weights \({{\mathbb E}\!\left[ Y_j^{i-1}\right] }/\sum _r{{\mathbb E}\!\left[ Y_r^{i-1}\right] }\):
$$\begin{aligned} m_i(Y)&= \frac{ {{\mathbb E}\!\left[ ( \sum _j Y_j )^i \right] } }{ {{\mathbb E}\!\left[ ( \sum _r Y_r )^{i-1} \right] } } = \frac{ \sum _j {{\mathbb E}\!\left[ Y_j^i \right] } }{ \sum _r {{\mathbb E}\!\left[ Y_r^{i-1} \right] } } = \sum _j \frac{ {{\mathbb E}\!\left[ Y_j^i\right] } }{ {{\mathbb E}\!\left[ Y_j^{i-1}\right] } } \frac{ {{\mathbb E}\!\left[ Y_j^{i-1}\right] } }{ \sum _r {{\mathbb E}\!\left[ Y_r^{i-1} \right] } } \\&\le \max _j \left( \frac{{{\mathbb E}\!\left[ Y_j^i\right] } }{ {{\mathbb E}\!\left[ Y_j^{i-1}\right] } } \right) = \max _{j} m_i(Y_j) \end{aligned}$$ - (iii):
-
The highest moment order in the expansion of \(\left( \sum _{r\in J} Y_r \right) ^{i-1}\) equals \(i-1\). By Lemma 4(ii),
$$\begin{aligned} {{\mathbb E}\!\left[ Y_j^{r_j+1}\right] }&\le {{\mathbb E}\!\left[ Y_j^{r_j}\right] }\frac{{{\mathbb E}\!\left[ Y_j^i\right] }}{{{\mathbb E}\!\left[ Y_j^{i-1}\right] }}&r_j+1 \le i. \end{aligned}$$(43)Hence, by independence of each \(\{Y_r\}_{r\in J}\) and Lemma 4(ii),
$$\begin{aligned} {{\mathbb E}\!\left[ \left( \sum _{r} Y_{r} \right) ^{i-1} Y_j \right] }&= \sum _{\sum _{j}r_j=i-1} \left( {\begin{array}{c}i-1\\ r_1,\ldots ,r_{|J|}\end{array}}\right) {{\mathbb E}\!\left[ Y_1^{r_1} \right] } \ldots {{\mathbb E}\!\left[ Y_j^{r_j+1} \right] } \ldots {{\mathbb E}\!\left[ Y_{|J|}^{r_{|J|}}\right] } \nonumber \\&= \sum _{\sum _{j}r_j=i-1} \left( {\begin{array}{c}i-1\\ r_1,\ldots ,r_{|J|}\end{array}}\right) {{\mathbb E}\!\left[ Y_1^{r_1} \right] } \ldots {{\mathbb E}\!\left[ Y_{|J|}^{r_{|J|}} \right] } \frac{{{\mathbb E}\!\left[ Y_j^{r_j+1} \right] }}{{{\mathbb E}\!\left[ Y_j^{r_j} \right] }} \nonumber \\&\le \sum _{\sum _{j}r_j=i-1} \left( {\begin{array}{c}i-1\\ r_1,\ldots ,r_{|J|}\end{array}}\right) {{\mathbb E}\!\left[ Y_1^{r_1} \right] } \ldots \,{{\mathbb E}\!\left[ Y_{|J|}^{r_{|J|}} \right] } \frac{{{\mathbb E}\!\left[ Y_j^{i}\right] }}{{{\mathbb E}\!\left[ Y_j^{i-1}\right] }} \nonumber \\&= {{\mathbb E}\!\left[ \left( \sum _{r} Y_r \right) ^{i-1} \right] }\frac{{{\mathbb E}\!\left[ Y_j^i\right] }}{{{\mathbb E}\!\left[ Y_j^{i-1}\right] }}. \end{aligned}$$(44)Substituting (44),
$$\begin{aligned} m_i(Y)&= \frac{ {{\mathbb E}\!\left[ Y^i \right] } }{ {{\mathbb E}\!\left[ Y^{i-1} \right] } } = \frac{ \sum _j{{\mathbb E}\!\left[ ( \sum _r Y_r )^{i-1} Y_j \right] } }{ {{\mathbb E}\!\left[ ( \sum _r Y_r )^{i-1} \right] } } \le \sum _j \frac{ {{\mathbb E}\!\left[ \left( \sum _r Y_r \right) ^{i-1} \right] } }{ {{\mathbb E}\!\left[ \left( \sum _r Y_r \right) ^{i-1} \right] } }\frac{ {{\mathbb E}\!\left[ Y_j^i \right] } }{ {{\mathbb E}\!\left[ Y_j^{i-1} \right] } } \\&= \sum _j \frac{ {{\mathbb E}\!\left[ Y_j^i\right] } }{ {{\mathbb E}\!\left[ Y_j^{i-1}\right] } } = \sum _j m_i(Y_j). \end{aligned}$$ - (iv):
-
As the case \(I=J\) is straightforward, assume \(I\subset J\). Define the complement of I as \(I^c := J \backslash I\). Since each \(Y_j\) is independent and log-concave, the convolutions \(Y_I:=\sum _{j\in I} Y_j\), \(Y_{I^c}:=\sum _{j\in I^c} Y_j\), and \(Y=Y_I+Y_{I^c}\) have log-concave distributions (Ibragimov 1956). Denote the probability density function (pdf) of the nonnegative random variables \(Y_I\), \(Y_{I^c}\), and Y by \(f_{Y_I}(x)\), \(f_{Y_{I^c}}(x)\), and \(f_{Y}(x):= \int _0^\infty f_{Y_I}(x-z) \, \mathrm{d}F_{Y_{I^c}}(z)\), respectively. For any \(i\ge 1\), consider \(\tilde{Y}_I\) and \(\tilde{Y}\) with pdf
$$\begin{aligned} f_{\tilde{Y}_I}(x)&= \frac{ x^{i-1} f_{Y_I}(x) }{\int _0^\infty x^{i-1} f_{Y_I}(x) \, \mathrm{d}x}&f_{\tilde{Y}}(x)&= \frac{ x^{i-1} f_{Y}(x) }{\int _0^\infty x^{i-1} f_{Y}(x) \, \mathrm{d}x}. \end{aligned}$$(45)Because \(m_i(Y_I) = \frac{{{\mathbb E}\!\left[ \left[ Y_I\right] ^i\right] }}{{{\mathbb E}\!\left[ \left[ Y_I\right] ^{i-1}\right] }} = {{\mathbb E}\!\left[ \tilde{Y}_I\right] }\) and \(m_i(Y)=\frac{{{\mathbb E}\!\left[ Y^i\right] }}{{{\mathbb E}\!\left[ Y^{i-1}\right] }} = {{\mathbb E}\!\left[ \tilde{Y}\right] }\), we need to show that \({{\mathbb E}\!\left[ \tilde{Y_I}\right] } \le {{\mathbb E}\!\left[ \tilde{Y}\right] }\) holds. To do this, it suffices to show that the marginal likelihood ratio \(f_{\tilde{Y}_j} / f_{\tilde{Y}} \propto f_{Y_j}/f_{Y}\) is nonincreasing on \((0, \infty )\) as this indicates that \(\tilde{Y}\) stochastically dominates \(\tilde{Y}_j\). Since \(Y_I\) is a log-concave distribution, we have that \(\frac{\mathrm{d}}{\mathrm{d}x}\log f_{Y_I}(x)\) is nonincreasing on \([0,\infty )\). This gives for the density function of \(Y=Y_I+Y_{I^c}\),
$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d} x} \log f_{Y_I}(x)&\le \frac{\mathrm{d}}{\mathrm{d} x}\log {{\mathbb E}\!\left[ f_{Y_I}(x-Y_{I^c})\right] } = \frac{\mathrm{d}}{\mathrm{d} x} \log f_{Y}(x)&x&\ge 0. \end{aligned}$$This proves that \(\log \left( f_{Y_I}(x)/f_Y(x) \right) \), and thus the ratio \(f_{\tilde{Y}_I}/f_{\tilde{Y}}\), is nonincreasing on \([0,\infty )\). Therefore, \({{\mathbb E}\!\left[ \tilde{Y}_I\right] } \le {{\mathbb E}\!\left[ \tilde{Y}\right] }\) and thus \(m_i(Y_I) \le m_i(Y)\) for any \(i\ge 1\).
\(\square \)
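Parts (i) and (iv) of Lemma 5 can be illustrated with Gamma distributions, for which the moment ratios are available in closed form, \(m_i(T_\theta )=\theta (k+i-1)\), so the separable structure \(m_i(Y_j)=g_ih_j\) holds with \(g_i=k+i-1\) and \(h_j=\theta _j\). The shapes, scales, and mixture weights below are hypothetical:

```python
import math

k = 1.5                      # common shape, so m_i(Y_j) = theta_j*(k+i-1) = g_i*h_j
thetas = [0.5, 2.0]          # component scales h_j (hypothetical)
weights = [0.3, 0.7]         # mixture weights (hypothetical)

def comp_moment(theta, i):
    """E[T^i] for T ~ Gamma(k, theta)."""
    return theta**i * math.gamma(k + i) / math.gamma(k)

def m_mix(i):
    """Moment ratio m_i of the two-component mixture."""
    num = sum(w * comp_moment(t, i) for w, t in zip(weights, thetas))
    den = sum(w * comp_moment(t, i - 1) for w, t in zip(weights, thetas))
    return num / den

# (i): m_{i+1}(Y)/m_{i+1}(Y_r) >= m_i(Y)/m_i(Y_r) for every component r.
for i in range(1, 8):
    for t in thetas:
        assert m_mix(i + 1) / (t * (k + i)) >= m_mix(i) / (t * (k + i - 1)) - 1e-12

# (iv): for log-concave independent summands, Y_I ~ Gamma(k1, th) and
# Y = Y_I + Y_{I^c} ~ Gamma(k1 + k2, th) give m_i(Y_I) <= m_i(Y).
k1, k2, th = 1.0, 2.5, 0.8
for i in range(1, 6):
    assert th * (k1 + i - 1) <= th * (k1 + k2 + i - 1)
```

Note that (iv) needs shape parameters of at least one so that each summand is log-concave.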
Proof of Lemma 6
We prove the bounds on \(m_i^+:=m_i(X^+)\) and \(m_i(Y)\); the statements on \(m_i^-\) and \(m_i(Z)\) follow analogously.
- (i):
-
Since \(X = \sum _{j\in I} X_j\) is a mixture, the positive part of X is \(X^+ = \sum _{j\in I} X_j^+\). Apply Lemma 5(ii).
- (ii):
-
Since each \(Y_j\) is independent and log-concave, the convolution \(Y=\sum _{j\in J}Y_j\) is log-concave (Ibragimov 1956). Denote the probability density function (pdf) of X and Y by \(f_X(x):= \int _0^\infty f_Y(x+z) \, \mathrm{d}F_Z(z)\) and \(f_Y(x)\), respectively. For any \(i\ge 1\), consider \(\tilde{X}\) and \(\tilde{Y}\) with pdf
$$\begin{aligned} f_{\tilde{X}}(x)&= \frac{ x^{i-1} f_{X}(x) }{\int _0^\infty x^{i-1} f_{X}(x) \, \mathrm{d}x}&f_{\tilde{Y}}(x)&= \frac{ x^{i-1} f_{Y}(x) }{\int _0^\infty x^{i-1} f_{Y}(x) \, \mathrm{d}x}. \end{aligned}$$(46)Because \(m_i^+ = m_i(X^+) = \frac{{{\mathbb E}\!\left[ \left[ X^+\right] ^i\right] }}{{{\mathbb E}\!\left[ \left[ X^+\right] ^{i-1}\right] }} = {{\mathbb E}\!\left[ \tilde{X}\right] }\) and \(m_i(Y)=\frac{{{\mathbb E}\!\left[ Y^i\right] }}{{{\mathbb E}\!\left[ Y^{i-1}\right] }} = {{\mathbb E}\!\left[ \tilde{Y}\right] }\), we need to show that \({{\mathbb E}\!\left[ \tilde{X}\right] } \le {{\mathbb E}\!\left[ \tilde{Y}\right] }\) holds. To do this, it suffices to show that the marginal likelihood ratio \(f_{\tilde{X}} / f_{\tilde{Y}} \propto f_X/f_Y\) is nonincreasing on \((0, \infty )\) as this indicates that \(\tilde{Y}\) stochastically dominates \(\tilde{X}\). Since Y is a log-concave distribution, \(\frac{\mathrm{d}}{\mathrm{d}x}\log f_Y(x)\) is nonincreasing on \([0,\infty )\). This gives for the density function of \(X=Y-Z\) with Z any nonnegative and independent distribution,
$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d} x} \log f_X(x)&= \frac{\mathrm{d}}{\mathrm{d} x}\log {{\mathbb E}\!\left[ f_Y(x+Z)\right] } \le \frac{\mathrm{d}}{\mathrm{d} x} \log f_Y(x)&x&\ge 0. \end{aligned}$$This proves that \(\log \left( f_X(x)/f_Y(x) \right) \), and thus the ratio \(f_{\tilde{X}}/f_{\tilde{Y}}\), is nonincreasing on \([0,\infty )\). Therefore, \({{\mathbb E}\!\left[ \tilde{X}\right] } \le {{\mathbb E}\!\left[ \tilde{Y}\right] }\) and thus \(m_i(X^+) \le m_i(Y)\) for any \(i\in \mathbb {N}^+\).
- (iii):
-
Follows immediately from \(X^+=\max (X,0)\equiv Y\).
\(\square \)
Proof of Theorem 1
The moments \(\mu _{n-2}\), \(\mu _{n-1}\), and \(\mu _n\) are known and satisfy \(\mu _i = \mu _i^+ + (-1)^i\mu _i^-\) (\(i\in \{n-2,n-1,n\}\) and n even). Combining this with \(0<m_{n-1}^+\le m_n^+\) from Lemma 4(ii) gives for \(t\in \{0,1\}\)
Define for \(s\in \{+,-\}\)
The function \(g^s\) is minimal at \(b:=\sqrt{\mu _n/\mu _{n-2}}\), monotonically decreases on (0, b), and monotonically increases on \((b,\infty )\). Apply Lemma 4(i) to the nqd X,
By (47)–(50) and (12), for \(t\in \{0,1\}\)
Notably, the derivation above starts with bounds on the moments of \(X^{-}\) and ends with bounds on the moments of \(X^+\).
Since \(-X=X^- - X^+\), we can switch all superscript + and – signs and use \({{\mathbb E}\!\left[ (-X)^i\right] }=(-1)^i\mu _{i}\) to generalize (51) to
Using (52), we bound \(\mu _{n-1}^s\) by substituting appropriate values for the unobserved \(m_{n-t}^s\). The function \(u^s\) increases on (0, b), while it decreases on \((b,\infty )\). Therefore, the most conservative upper bound from (52) equals \(u^s(b)=a/g^s(b)\). Possibly tighter bounds on \(\mu _{n-1}^s\) follow from the more general bound \(\min (u^s(c), u^s(d))\) with \(c=\mathop {{{\mathrm{argmin}}}}\nolimits _x \{ | x - {b} | : x\in \left[ \underline{m}_{n-1}^s, \bar{m}_{n-1}^s \right] \} \) and \(d=\mathop {{{\mathrm{argmin}}}}\nolimits _x \{ | x - {b} | : x\in \left[ \underline{m}_{n}^s, \bar{m}_{n}^s \right] \}\). Thus, the upper bound on \(\mu _{n-1}^s\) is \(u^s(b)\) in cases where both \(b\in \left[ \underline{m}_{n-1}^s, \bar{m}_{n-1}^s \right] \) and \(b\in \left[ \underline{m}_n^s, \bar{m}_{n}^s \right] \) hold. By (15), this is equivalent to cases where \(b\in \left[ \underline{m}_n^s , \bar{m}_{n-1}^s \right] \).
If \(\bar{m}_{n-1}^s\le b\le \underline{m}_n^s\), one can use \(\min (u^s(\bar{m}^s_{n-1}), u^s(\underline{m}^s_{n}))\) as a bound, because (15) and the monotonicity of \(u^s\) imply that \(\bar{m}_{n-1}^s\) or \(\underline{m}_{n}^s\) must give the lowest possible upper bound on \(\mu _{n-1}^s\). The bottom two components of \(\nu ^s\) in (16) follow from the monotonicity of the function \(u^s\) on (0, b) and \((b, \infty )\).
The obtained upper bound on \(\mu _{n-1}^s\) implies an upper bound on \(\mu _{n-2}^s\) from \(\mu _{n-2}^s \le {\bar{\mu }_{n-1}^s}/{\underline{m}_{n-1}^s}\). A tighter bound is available if the previous bound on \(\mu _{n-1}^s\) is based on \(\bar{m}^s_{n-1}\). By (52),
where
In (53), an upper bound on \(\mu _{n-2}^s\) corresponds to a lower bound on \(h^s\). The continuous function \(h^s\) monotonically decreases on \((0 ,s\, m_{n-1})\) and monotonically increases on \(([s\, m_{n-1}]^+,\infty )\). By (14), \(\underline{m}_{n-1}^s \ge s\, m_{n-1}\) such that \(\underline{m}_{n-1}^s\) gives the maximal upper bound on \(\mu ^s_{n-2}\) in (53). As a result, using \(\underline{m}^s_{n-1}\) (instead of \(\bar{m}^s_{n-1}\)) can tighten the upper bound on \(\mu _{n-2}^s\). This gives the bound on \(\mu ^s_{n-2}\) in (16).
Similarly, the upper bound on \(\mu _{n-1}^s\) implies the upper bound \(\mu _n^s \le \bar{m}_n^s\bar{\mu }_{n-1}^s\). A tighter bound is available if the previous bound on \(\mu _{n-1}^s\) is based on \(\underline{m}^s_n\). By (52),
where
In (54), an upper bound on \(\mu _{n}^s\) corresponds to a lower bound on \(k^s\). The function \(k^s\) is continuous on \((0,\infty )\) and monotonically increases on the interval \(([s\, m_{n}]^+,\infty )\). Provided \(s\, m_{n}>0\), \(k^s\) monotonically decreases on \((0 ,s\, m_{n})\). By (14), \(1/\bar{m}_{n}^s \ge 1 / (s\,m_{n})\) such that \(\bar{m}_{n}^s\) gives the maximal upper bound on \(\mu ^s_{n}\) in (54). Thus, using \(\bar{m}^s_{n}\) (instead of \(\underline{m}^s_{n}\)) can tighten the upper bound on \(\mu _{n}^s\). This proves the last inequality in (16). \(\square \)
Proof of Lemma 7
The inequality in (17) follows from Lemma 4(i). Next, consider Hölder’s inequality
The first inequality in (18) is obtained by substituting \(Y=(X^s)^j\), \(Z\equiv 1\), \(p=i/j\) and \(q=p/(p-1)\). The first inequality in (19) follows from \(Y=(X^s)^i\), \(p=j/i\), and \(q=p/(p-1)\). The second inequality in both (18) and (19) follows from Lemma 4(i) and \(\mu _i^s = m_i^s \mu _{i-1}^s\) (\(i\in \mathbb {N}^+\)). The inequalities in (20)–(21) are based on \(\mu _i = \mu _i^+ + (-1)^i\mu _i^-\) where \(i\in \mathbb {N}^+_0\). \(\square \)
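The Hölder step with \(Z\equiv 1\) reduces to Lyapunov's inequality, which can be checked on a discrete distribution; the support and weights below are hypothetical, with an atom at zero so that \(\mu _0 = {{\mathbb P}\!\left( X\ne 0 \right) } < 1\):

```python
# Hypothetical nonnegative discrete X with an atom at zero.
support = [0.0, 0.5, 1.5, 3.0]
probs = [0.25, 0.25, 0.3, 0.2]

def mu(i):
    """mu_i = E[X^i] for i >= 1."""
    return sum(p * x**i for x, p in zip(support, probs))

mu0 = sum(p for x, p in zip(support, probs) if x != 0.0)  # mu_0 := P(X != 0)

# Hoelder with Y = X^j, Z = indicator(X != 0), p = i/j:
#   mu_j <= mu_i^{j/i} * mu_0^{1 - j/i}   for 0 < j < i.
for i in range(2, 7):
    for j in range(1, i):
        assert mu(j) <= mu(i) ** (j / i) * mu0 ** (1 - j / i) + 1e-12
```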
Proof of Lemma 8
We prove the case with strict bounds, since the case with equality signs is trivial. Both inequalities in (22) are straightforward by \(m^s_i = \mu ^s_i/\mu ^s_{i-1}\). The two inequalities in (23) follow from Lemma 4(ii):
\(\square \)
Proof of Lemma 9
- (i):
-
Define the function
$$\begin{aligned} r_{m,n}(x)&= {\sum _{i=m}^\infty \frac{x^i}{(n+i)!}}&m&\in \{1,2\}&n&= 0,1,\ldots \end{aligned}$$We prove the properties of \(r_{m,n}\) in Table 9. Notice that
$$\begin{aligned} r_{1,n}(x) = {\xi }_n(x) = \frac{_1F_1(1, n+1; x) - 1}{n!}. \end{aligned}$$(55)Apply Abramowitz and Stegun (1964, Eqs. (13.4.9) and (13.1.27)),
$$\begin{aligned} \frac{\mathrm{d}^k}{\mathrm{d}x^k} r_{1,n}(x)&= \frac{\mathrm{d}^k}{\mathrm{d}x^k} \frac{_1F_1(1, \, n+1; x) - 1}{n!}\nonumber \\&= \frac{k!}{ (k+n)! } \left[ {_1F_1(k+1, \, k+n+1; x)} - 1_{k=0}\right] \end{aligned}$$(56)$$\begin{aligned}&= \frac{ k! }{ (k+n)! } \left[ {e^x} {_1F_1(n, \, k+n+1; -x) - 1_{k=0}} \right]&k,n&\in \mathbb {N}^+_0 \end{aligned}$$(57)with \(1_A\) the indicator function which equals 1 if the event A is true, and 0 otherwise. Equations (56) and (57) indicate \(r_{1,n}(0) = 0\) and \(\frac{\mathrm{d}^k}{\mathrm{d}x^k} r_{1,n}(x) > 0\) for all \(x\in {\mathbb {R}}\) and \(k\in \{1,2\}\). Thus, the function \(r_{1,n}\) is increasing and convex with \(r_{1,n}(x)<0\) if \(x<0\), \(r_{1,n}(0)=0\), \(r_{1,n}(x)>0\) if \(x>0\), and \(r'_{1,n}(0) = 1/(n+1)!\).
Since \(r_{2,n}(x) = x \, r_{1,n+1}(x)\) and \(r_{2,n}(x) = r_{1,n}(x) - {x}/{(n+1)!}\), the function \(r_{2,n}\) is nonnegative and convex with \(r_{2,n}(0)=r'_{2,n}(0)=0\). Therefore, \(r_{2,n}(x)\) decreases on \((-\,\infty ,0)\) and increases on \((0,\infty )\). Next, we apply the properties in Table 9 on \({\xi }_n(x) = r_{1,n}(x)\). Consider \(\tilde{X}_n\) with the cdf \(\tilde{F}_n\) in terms of the cdf F of X,
$$\begin{aligned} \mathrm{d}\tilde{F}_n(x) = \frac{ x^n \, \mathrm{d}F(x) }{ \int x^n \, \mathrm{d}F(x) } = \frac{ x^n \, \mathrm{d}F(x) }{ {{\mathbb E}\!\left[ X^n\right] } }. \end{aligned}$$It follows that \({{\mathbb E}\!\left[ r_{1,n}(\tilde{X}_n)\right] }={\xi }_n(X)\) and \({{\mathbb E}\!\left[ \tilde{X}_n\right] }=m_{n+1}\). Using Lemma 4(ii), Jensen’s inequality, and the convexity of \(r_{1,n}\), we find the lower bounds on \(\xi _n(X^+)\) and \(\xi _n(-X^-)\):
$$\begin{aligned} 0 \le \xi _n(m_{n+1}^+) = r_{1,n}(m_{n+1}^+) = r_{1,n}\left( {{\mathbb E}\!\left[ \tilde{X}^+_n\right] }\right)&\le {{\mathbb E}\!\left[ r_{1,n}(\tilde{X}^+_n)\right] } = {\xi _n(X^+)}\\ \xi _n(-m_{n+1}^-) = r_{1,n}(-m_{n+1}^-) = r_{1,n}\left( {{\mathbb E}\!\left[ -\tilde{X}^-_n\right] }\right)&\le {{\mathbb E}\!\left[ r_{1,n}(-\tilde{X}^-_n)\right] } = {\xi _n(-X^-)}. \end{aligned}$$The upper bound on \(\xi _n(-X^-)\) follows by applying Jensen’s inequality to the convex function \(r_{2,n-1}\):
$$\begin{aligned} \xi _n(-X^-)&= \frac{1}{m_n^-{{\mathbb E}\!\left[ (-X^-)^{n-1}\right] }}\sum _{i=1}^\infty \frac{{{\mathbb E}\!\left[ (- X^-)^{n+i}\right] }}{(n+i)!} = -\frac{1}{m_n^-}\sum _{i=2}^\infty \frac{{{\mathbb E}\!\left[ (-\tilde{X}^-_{n-1})^{i}\right] }}{(n-1+i)!}\\&= -\frac{1}{m_n^-} \, {{\mathbb E}\!\left[ r_{2,n-1}(-\tilde{X}^-_{n-1}) \right] } \\&\le - \frac{1}{m_n^-} \, r_{2,n-1}\left( {{\mathbb E}\!\left[ -\tilde{X}^-_{n-1} \right] } \right) = -\frac{1}{m_n^-}\sum _{i=2}^\infty \frac{ (-m^-_{n})^{i} }{(n-1+i)!} = \sum _{i=1}^\infty \frac{ (-m^-_{n})^{i} }{(n+i)!}\\&= \xi _n(-m_n^-) = r_{1,n}(-m_n^-) \le 0 \end{aligned}$$ - (ii):
-
The moments of \(T_\theta \sim \mathrm{Gamma}(k, \theta )\) are \({{\mathbb E}\!\left[ T_\theta ^n\right] }={ \theta ^n{\Gamma }\left( k+n \right) }/{{\Gamma }\left( k \right) }\), which gives \(m_{n+1}(T_\theta )=\theta (k+n)\). By Lemma 5(i), we have for any mixture of Gamma distributions with fixed shape parameter k,
$$\begin{aligned} \frac{k+n}{k+n-1}\le \frac{m_{n+1}}{m_n}. \end{aligned}$$(58)Combining (58), \(\xi '_n(x) = r'_{1,n}(x)>0\) (\(x\in \mathbb {R}\)), and Lemma 9(i) produces the lower bound on \(\xi _n(X^+)\) in Lemma 9(ii). The scaled remainder of \(T_\theta \) is
$$\begin{aligned} \xi _n(T_\theta )&:= \frac{1}{{{\mathbb E}\!\left[ T_\theta ^{n}\right] }}\sum _{i=1}^\infty \frac{{{\mathbb E}\!\left[ T_\theta ^{n+i}\right] }}{\left( n+i \right) !}\\&= \frac{{\Gamma }\left( k \right) }{ n! \theta ^{n}{\Gamma }\left( k+n \right) }\sum _{i=1}^\infty \frac{ \theta ^{n+i}{\Gamma }\left( k+n+i \right) }{\left( n+1 \right) \ldots \left( n+i \right) \mathrm {\Gamma }(k)}\\&=\frac{1}{n!}\sum _{i=1}^\infty \theta ^{i}\prod \nolimits _{j=1}^i \left( \frac{k -1}{n+j}+1\right) \\&=\frac{ 1 }{n!} \left[ {}_{2} F_{1}\left( 1, k+n ;n+1; \theta \right) -1 \right] . \end{aligned}$$By well-known properties of the Gaussian hypergeometric function \(_2F_1\), the function \(h_n(\theta ):=\xi _n(T_\theta )\) increases monotonically on \(\left( -1, 1\right) \) with \(h_n(0)=0\). Because \(T_{[\theta _{\max }]^+}\) stochastically dominates the mixture \(X^+\) and \(\xi '_n(x)>0\) for \(x\in \mathbb {R}\), we must have \(\xi _n(X^+) \le \xi _n(T_{ [\theta _{\max }]^+})\) if \(\left[ \theta _{\min }, \theta _{\max } \right] \subseteq \left( -1,1\right) \). Similarly, the mixture \(-\,X^-\) stochastically dominates \(-\,T_{[\theta _{\min }]^-}\), which leads to \(\xi _n(T_{-[\theta _{\min }]^-}) \le \xi _n(-\,X^-)\). This proves (ii).
\(\square \)
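As a numerical cross-check of the proof of Lemma 9(i): the series \(r_{1,n}(x)=\sum _{i\ge 1}x^i/(n+i)!\) admits the closed form \([{}_1F_1(1,n+1;x)-1]/n!\) (note the parameters, which match \(r'_{1,n}(0)=1/(n+1)!\)). The sketch below verifies this by direct summation, with a hand-rolled \(_1F_1\) series so that no special-function library is required:

```python
import math

def r1n(n, x, terms=60):
    """r_{1,n}(x) = sum_{i>=1} x^i / (n+i)! summed directly."""
    return sum(x**i / math.factorial(n + i) for i in range(1, terms + 1))

def hyp1f1(a, b, x, terms=60):
    """Confluent hypergeometric 1F1(a,b;x) via its power series."""
    total, term = 1.0, 1.0
    for i in range(terms):
        term *= (a + i) / (b + i) * x / (1.0 + i)
        total += term
    return total

for n in (1, 3):
    for x in (-0.8, 0.5, 2.0):
        closed = (hyp1f1(1.0, n + 1.0, x) - 1.0) / math.factorial(n)
        assert abs(r1n(n, x) - closed) < 1e-12
        # r_{1,n} is increasing with sign(r_{1,n}(x)) = sign(x)
        assert (r1n(n, x) > 0) == (x > 0)
```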
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Muns, S. An iterative algorithm to bound partial moments. Comput Stat 34, 89–122 (2019). https://doi.org/10.1007/s00180-018-0825-8