1 Introduction

In the present note we consider the following problem: Given n real numbers

$$ \beta _{1}<\cdots <\beta _{n}, $$

under what conditions on \(\beta \)’s is there an integrable random variable (r.v.) X such that

$$ \textrm{IE}X_{i:n}=\beta _{i}, \ \ 1\le i\le n ? $$

[Here, \(X_{1:n}\le \cdots \le X_{n:n}\) are the order statistics of independent, identically distributed r.v.’s \(X_1,\ldots ,X_n\), each distributed as X.] Notice that the number n is held fixed; the question for infinite sequences is closely connected to the Hausdorff (1921) moment problem, and its answer is well-known from the works of Huang (1998); Kadane (1971, 1974); Kolodynski (2000); Papadatos (2017). Some related results for the finite case can be found in Mallows (1973).

The outline of the paper is as follows. In Section 2, we establish (Theorem 1) a one-to-one correspondence between the parent distribution of a sample with given expected order statistics \(\beta _{1}<\cdots <\beta _{n}\) and a random variable whose moments depend on \(\beta _{1}<\cdots <\beta _{n}\). Since this characterization is rather hard to check directly, in Section 3 we provide (Theorem 3) explicit conditions on \(\beta _{1}<\cdots <\beta _{n}\) that are necessary and sufficient for them to be expectations of order statistics.

2 A characterization of finite sequences of expected order statistics via binomial moments

Without loss of generality we may consider the numbers

$$ \widetilde{\beta }_{i} =\frac{\beta _{i}-c}{\lambda }, \ \ c\in \textbf{R}, \ \lambda >0, $$

instead of \(\beta _{i}\). Clearly these numbers will be the expected order statistics (=EOS) from \((X-c)/\lambda \) if and only if the \(\beta \)’s are the EOS from X.

First, we seek a necessary condition. Assume that X is a non-degenerate random variable with distribution function (d.f.) F and \(\textrm{IE}|X|<\infty \). Let \(X_1,\ldots ,X_n\) be independent, identically distributed (i.i.d.) random variables with d.f. F, and denote by \(X_{1:n}\le \cdots \le X_{n:n}\) the corresponding order statistics. It is known that

$$\begin{aligned} \mu _k:=\textrm{IE}X_{k:k}= \begin{pmatrix} n\\ k \end{pmatrix}^{-1} \sum _{j=k}^n \begin{pmatrix} j-1 \\ k-1 \end{pmatrix} \mu _{j:n}, \ \ \ k=1,\ldots ,n, \end{aligned}$$
(1)

where \(\mu _{j:n}:=\textrm{IE}X_{j:n}\), \(j=1,\ldots ,n\); this follows by a trivial application of Newton’s formula to the expression \( \mu _k = k \int _{0}^1 u^{k-1} F^{-1}(u) [u+(1-u)]^{n-k} du, \) where \(F^{-1}(u):=\inf \{x:F(x)\ge u\}\), \(0<u<1\), is the left-continuous inverse of F. From (1) with \(k=1,2\),

$$\begin{aligned} \mu _1=\frac{\mu _{1:n}+\cdots +\mu _{n:n}}{n}, \ \ \ \mu _2=\frac{2}{n(n-1)}\sum _{j=2}^n (j-1)\mu _{j:n}. \end{aligned}$$
(2)
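Identity (1) can be verified directly for any distribution with known expected order statistics; for the standard uniform, for instance, \(\textrm{IE}X_{j:n}=j/(n+1)\) and \(\textrm{IE}X_{k:k}=k/(k+1)\). A minimal sketch in Python with exact rational arithmetic (the choice \(n=6\) is ours, for illustration only):

```python
from fractions import Fraction
from math import comb

n = 6  # illustrative sample size (our choice)
# Standard uniform: IE X_{j:n} = j/(n+1) and IE X_{k:k} = k/(k+1).
mu_jn = [Fraction(j, n + 1) for j in range(n + 1)]  # mu_jn[j] = IE X_{j:n}; index 0 unused
for k in range(1, n + 1):
    rhs = sum(comb(j - 1, k - 1) * mu_jn[j] for j in range(k, n + 1)) / comb(n, k)
    assert rhs == Fraction(k, k + 1)  # identity (1): the binomial combination recovers IE X_{k:k}
print("identity (1) verified for the uniform, n =", n)
```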

On the other hand, it is well-known (see Jones and Balakrishnan (2002)) that

$$\begin{aligned} \mu _{j+1:n}-\mu _{j:n} = \begin{pmatrix} n\\ j \end{pmatrix} \int _{\alpha }^\omega F(x)^j (1-F(x))^{n-j} dx>0, \ \ \ j=1,\ldots , n-1, \end{aligned}$$
(3)

where \(\alpha <\omega \) are the endpoints of the support of X; actually this formula goes back to Pearson (1902). Notice that \(-\infty \le \alpha <\omega \le \infty \) (the inequality \(\alpha <\omega \) is strict because F is non-degenerate), and the integral in (3) is finite since X is integrable. From (1) and (3) (applied to \(n=2\)),

$$ \mu _2-\mu _1=\int _{\alpha }^\omega F(x)(1-F(x)) dx, $$

while (2) yields

$$\begin{aligned} \mu _2-\mu _1 =\frac{1}{n(n-1)}\sum _{i=1}^{n-1} i(n-i)(\mu _{i+1:n}-\mu _{i:n}). \end{aligned}$$
(4)

Choosing \(c=n^{-1}\sum _{j=1}^n \mu _{j:n}\) and \(\lambda =\big (n(n-1)\big )^{-1}\sum _{i=1}^{n-1} i(n-i)(\mu _{i+1:n}-\mu _{i:n})>0\), the numbers \(\widetilde{\mu }_{j:n}=(\mu _{j:n}-c)/\lambda \) and \(\widetilde{\mu }_{j}=(\mu _{j}-c)/\lambda \) are, respectively, the EOS and the expected maxima from \(\widetilde{X}=(X-c)/\lambda \), whose mean is 0 and whose Gini mean difference equals 2. Therefore,

$$ \widetilde{\mu }_2-\widetilde{\mu }_1=\int _{\alpha }^\omega \frac{F(x) (1-F(x))}{\lambda }\, dx =1. $$

Since \(F(y)(1-F(y))>0\) for \(y\in (\alpha ,\omega )\), and \(F(y)(1-F(y))=0\) outside \([\alpha ,\omega )\), it follows that \(f_Y(y):=F(y)(1-F(y))/\lambda \) defines a Lebesgue density of a random variable, say Y, supported in the (finite or infinite) interval \((\alpha ,\omega )\). By (3),

$$\begin{aligned} \widetilde{\mu }_{j+1:n}- \widetilde{\mu }_{j:n}&= \begin{pmatrix} n\\ j \end{pmatrix} \int _{\alpha }^\omega F(y)^{j-1} (1-F(y))^{n-j-1} f_Y(y)\, dy\\ &= \begin{pmatrix} n\\ j \end{pmatrix} \textrm{IE}\Big \{ T^{j-1} (1-T)^{n-j-1}\Big \}, \end{aligned}$$

where \(T:=F(Y)\) is a random variable taking values in the interval (0, 1) w.p. 1, because, by definition, \(\Pr (\alpha<Y<\omega )=1\). Hence,

$$ \frac{{\mu }_{j+1:n}- {\mu }_{j:n}}{\lambda } = \frac{n (n-1)}{j(n-j)}\textrm{IE}\Big \{ \begin{pmatrix}n-2\\ j-1\end{pmatrix} T^{j-1} (1-T)^{n-j-1}\Big \}, \quad j=1,\ldots , n-1, $$

and we have shown the following

Proposition 1

If \(X_1,\ldots ,X_n\) are i.i.d. integrable non-degenerate r.v.’s, then there exists an r.v. T, with \(\Pr (0<T<1)=1\), such that

$$\begin{aligned} \frac{(j+1)(n-j-1)(\mu _{j+2:n}-\mu _{j+1:n})}{\sum _{i=1}^{n-1}i(n-i)(\mu _{i+1:n}-\mu _{i:n})} = \textrm{IE}\left\{ \begin{pmatrix}n-2\\ j\end{pmatrix} T^{j} (1-T)^{n-2-j}\right\} , \qquad j=0,\ldots ,n-2. \end{aligned}$$
(5)

It is of interest to observe that the binomial moments of T appear on the r.h.s. of (5). Clearly, the r.v. T in this representation need not be unique; any other r.v. \(T'\) with \(\Pr (0<T'<1)=1\), whose moments up to order \(n-2\) coincide with those of T, will fulfill the same relationship.
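As a concrete illustration of (5), consider a standard exponential parent, for which \(\mu _{j:n}=\sum _{i=n-j+1}^{n}1/i\) and the r.v. T of the above construction turns out to be Beta(2, 1) (cf. Remark 3 below). A minimal exact check in Python (the choice \(n=6\) is ours, for illustration only):

```python
from fractions import Fraction
from math import comb, factorial

n = 6  # illustrative sample size (our choice)
# Exponential expected order statistics: mu[j] = IE X_{j:n} = sum_{i=n-j+1}^{n} 1/i.
mu = [None] + [sum(Fraction(1, i) for i in range(n - j + 1, n + 1)) for j in range(1, n + 1)]
denom = sum(i * (n - i) * (mu[i + 1] - mu[i]) for i in range(1, n))

for j in range(n - 1):  # j = 0, ..., n-2
    lhs = (j + 1) * (n - j - 1) * (mu[j + 2] - mu[j + 1]) / denom
    # With T ~ Beta(2,1) (density 2t on (0,1)), the r.h.s. of (5) equals
    # 2*C(n-2,j)*B(j+2, n-1-j), evaluated here exactly through factorials.
    rhs = 2 * comb(n - 2, j) * Fraction(factorial(j + 1) * factorial(n - 2 - j), factorial(n))
    assert lhs == rhs
print("identity (5) verified for the exponential, n =", n)
```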

Remark 1

For any integrable non-degenerate r.v. X with d.f. F we may define the r.v. T as in the proof of Proposition 1, that is, \(T = F(Y)\) where Y has density \(f_Y(y) = F(y)(1-F(y))/\lambda \) with \(\lambda = \int F (1-F)\). It can be shown, using Lemma 4.1 in Papadatos (2001), that the d.f. of T is specified by

$$\begin{aligned} \Pr (T < t)=\frac{1}{\lambda } \Bigg [t(1-t)F^{-1}(t)-\int _{0}^t (1-2u) F^{-1}(u) d u\Bigg ], \ \ 0 < t < 1. \end{aligned}$$
(6)

Notice that \(\lambda = \int _{0}^1 (2t-1)F^{-1}(t) dt\) and, hence, the function \(F^{-1}\) determines uniquely the d.f. of T. Moreover, (6) shows that the entire location-scale family of X, \(\{c+\lambda X: \ c\in \textbf{R}, \ \lambda > 0\}\), is mapped to a single r.v. \(T\in (0,1)\). Provided that X has (finite or infinite) interval support, non-vanishing density f and differentiable inverse d.f. \(F^{-1}\), we conclude from (6) that a density of T is given by

$$\begin{aligned} f_T(t) = \frac{t(1-t)}{\lambda f(F^{-1}(t))}, \ \ 0 < t < 1. \end{aligned}$$
(7)

Next, we proceed to verify that the preceding procedure can be inverted, showing sufficiency of (5). To this end, we shall make use of the following lemma, which is of independent interest. A detailed proof is postponed to the appendix.

Lemma 1

Let T be an r.v. with d.f. \(F_T\) such that \(\Pr (0<T<1)=1\). Then, there exists a unique, non-degenerate, integrable r.v. X satisfying

$$\begin{aligned} \textrm{IE}X_1 =0 \ \ \text {and} \ \ \ \textrm{IE}X_{k+2:k+2}-\textrm{IE}X_{k+1:k+1}=\textrm{IE}T^k, \ \ k=0,1,\ldots , \end{aligned}$$
(8)

where \(X_{k:k}=\max \{X_1,\ldots ,X_k\}\) with \(X_1,X_2,\ldots \) being i.i.d. copies of X. The inverse distribution function of X is given by

$$\begin{aligned} F_0^{-1}(t) :=\frac{F_T(t-)}{t(1-t)} - 4 F_T(\frac{1}{2}-) - \int _{1/2}^{t} \frac{2u-1}{u^2(1-u)^2} F_T(u)d u - c_T, \end{aligned}$$
(9)

\(0<t<1\), where \(F_T(t-)=\Pr (T<t)\), \(\int _{1/2}^t du=-\int _{t}^{1/2} du\) for \(t<1/2\),

$$\begin{aligned} c_T := \textrm{IE}\Bigg [\frac{1}{T}I(T\ge \frac{1}{2})\Bigg ]- \textrm{IE}\Bigg [\frac{1}{1-T}I(T< \frac{1}{2})\Bigg ], \end{aligned}$$
(10)

and I denotes an indicator function.

Remark 2

Any r.v. \(T\in (0,1)\) can be viewed as the expected order statistics generator of its corresponding r.v. X with inverse d.f. \(F_0^{-1}\) as in (9). This is so because the map \(T\rightarrow X\) (i.e., \(F_T\rightarrow F_0\equiv F_X\)), defined implicitly by Lemma 1, is one-to-one and onto from the space \(\mathcal{T}=\{T: \Pr (0<T<1)=1\}\) to \(\mathcal{H}=\{X: \textrm{IE}X=0, \textrm{IE}X_{2:2}=1\}\), where identically distributed r.v.’s are identified. Its inverse is given by Remark 1 (with \(\lambda =1\), since \(X\in \mathcal{H}\)). In view of (13), below, it is the suitable (and unique) transformation that quantifies the characterization of Hoeffding (1953), which states that the sequence of expected order statistics characterizes the corresponding distribution. It also provides an explicit connection of the (infinite) sequence of expected order statistics to the Hausdorff (1921) moment problem; see Kadane (1971, 1974); Huang (1998); Kolodynski (2000); Papadatos (2017).

Remark 3

Suppose that the r.v. T of Lemma 1 is absolutely continuous with density \(f_T\). Assume also that the corresponding r.v. X (with \(\textrm{IE}X=0\), \(\textrm{IE}X_{2:2}=1\), inverse d.f. \(F_0^{-1}\) as in (9)) is absolutely continuous, admitting a non-vanishing density \(f_0\) in the (finite or infinite) interval support of X, and that \(F_0^{-1}\) is differentiable. Then (see Remark 1),

$$ f_T(t)=\frac{t(1-t)}{f_0(F_0^{-1}(t))}, \ \ \ F_0^{-1}(t)=\int _{1/2}^t\frac{f_T(u)}{u(1-u)} du -c_T, \ \ \ 0<t<1. $$

For example, if T is Beta(2, 2) then X is uniform in \((-3,3)\); if T is Beta(2, 1) then \(X=2\mathcal{E}-2\) where \(\mathcal{E}\) is standard exponential; if T is Beta(1, 2) then \(X=2-2\mathcal{E}\); if T is standard uniform then X is standard logistic with density \(f_0(x)=e^{-x}/(1+e^{-x})^2\), \(x\in \textbf{R}\); if T is degenerate with \(\Pr (T=\rho )=1\) then (9) shows that X is a two-valued r.v. with \(\Pr (X=-1/\rho )=\rho \), \(\Pr (X=1/(1-\rho ))=1-\rho \).
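These correspondences are easy to confirm numerically from the displayed formulas. The sketch below (Python, assuming SciPy is available; the helper F0_inv is ours) checks the fourth example: a standard uniform T gives \(c_T=0\) and \(F_0^{-1}(t)=\log (t/(1-t))\), the standard logistic quantile function.

```python
from math import log
from scipy.integrate import quad  # SciPy assumed available

def F0_inv(t, f_T):
    """F_0^{-1}(t) = int_{1/2}^{t} f_T(u)/(u(1-u)) du - c_T, with c_T as in (10)."""
    main = quad(lambda u: f_T(u) / (u * (1 - u)), 0.5, t)[0]
    c_T = quad(lambda u: f_T(u) / u, 0.5, 1)[0] - quad(lambda u: f_T(u) / (1 - u), 0, 0.5)[0]
    return main - c_T

f_T = lambda u: 1.0  # density of the standard uniform T
for t in (0.1, 0.3, 0.5, 0.8):
    assert abs(F0_inv(t, f_T) - log(t / (1 - t))) < 1e-6  # logistic quantile log(t/(1-t))
print("uniform T reproduces the standard logistic quantile function")
```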

The characterization for finite n reads as follows.

Theorem 1

Given n real numbers \(\beta _{1}<\cdots <\beta _{n}\), the following are equivalent.

  1. (i)

    The \(\beta \)’s are EOS, that is, there exist i.i.d. integrable non-degenerate r.v.’s \(X_1,\ldots ,X_n\) such that \(\textrm{IE}X_{j:n}=\beta _{j}\), \(j=1,\ldots ,n\).

  2. (ii)

    There exists an r.v. T, with \(\Pr (0<T<1)=1\), such that

    $$\begin{aligned} \frac{(j+1)(n-j-1)(\beta _{j+2}-\beta _{j+1})}{\sum _{i=1}^{n-1}i(n-i)(\beta _{i+1}-\beta _{i})} = \textrm{IE}\left\{ \begin{pmatrix}n-2\\ j \end{pmatrix} T^{j} (1-T)^{n-2-j}\right\} , \qquad j=0,\ldots ,n-2. \end{aligned}$$
    (11)
  3. (iii)

    There exists an r.v. T, with \(\Pr (0<T<1)=1\), such that

    $$\begin{aligned} \frac{n-1}{\begin{pmatrix}n-1\\ k+1\end{pmatrix}\sum _{i=1}^{n-1}i(n-i)(\beta _{i+1}-\beta _i)} \sum _{j=k+1}^{n-1}(n-j)\begin{pmatrix}j\\ k+1\end{pmatrix}(\beta _{j+1}-\beta _j) = \textrm{IE}T^k, \qquad k=0,\ldots ,n-2. \end{aligned}$$
    (12)

Proof

The equivalence of (11) and (12) follows by a straightforward computation, while the implication (i)\(\Rightarrow \)(ii) is proved in Proposition 1. In order to verify (ii)\(\Rightarrow \)(i), assume that (11) is satisfied for some T with \(\Pr (0<T<1)=1\), and consider the r.v. X as defined in Lemma 1. Let \(\mu _{j:n}=\textrm{IE}X_{j:n}\) and \(\mu _k=\textrm{IE}X_{k:k}\). Then,

$$\begin{aligned} \mu _{j:n}=n \begin{pmatrix}n-1 \\ j-1\end{pmatrix} \sum _{i=j}^n (-1)^{i-j} \begin{pmatrix}n-j \\ i-j\end{pmatrix} \frac{\mu _i}{i}, \ \ \ j=1,\ldots ,n; \end{aligned}$$
(13)

see Mallows (1973); Arnold et al. (1992); David and Nagaraja (2003). It follows that

$$ \mu _{j+2:n}-\mu _{j+1:n}=\begin{pmatrix}n\\ j+1\end{pmatrix} \sum _{i=j+1}^n (-1)^{i-j} \begin{pmatrix}n-j-1 \\ i-j-1\end{pmatrix} \mu _i, \ \ \ j=0,\ldots ,n-2. $$

By a trivial application of the binomial theorem to \((1-T)^{n-2-j}\), and since \(\textrm{IE}T^k=\mu _{k+2}-\mu _{k+1}\), see (8), we obtain

$$\begin{aligned} \textrm{IE}\left\{ \begin{pmatrix}n-2\\ j\end{pmatrix} T^{j} (1-T)^{n-2-j}\right\} = \begin{pmatrix}n-2\\ j\end{pmatrix} \sum _{i=j+1}^n (-1)^{i-j} \begin{pmatrix}n-j-1 \\ i-j-1\end{pmatrix} \mu _i, \qquad j=0,\ldots ,n-2. \end{aligned}$$

Hence, for \(j=0,\ldots ,n-2\),

$$ \frac{(j+1)(n-j-1)}{n(n-1)}\Big (\mu _{j+2:n}-\mu _{j+1:n}\Big )= \textrm{IE}\left\{ \begin{pmatrix}n-2\\ j\end{pmatrix} T^{j} (1-T)^{n-2-j}\right\} , $$

and (11) implies that for some \(\lambda >0\),

$$ \beta _{j+1}-\beta _{j}=\lambda (\mu _{j+1:n}-\mu _{j:n}), \ \ j=1,\ldots , n-1. $$

It follows by induction on j that \(\beta _{j}=\beta _{1}+\lambda (\mu _{j:n}-\mu _{1:n})\), and therefore, \((\beta _{j}-c)/\lambda =\mu _{j:n}\), with \(c=\beta _{1}-\lambda \mu _{1:n}\). Hence, the numbers \(\big ((\beta _{j}-c)/\lambda \big )_{j=1}^n\) are expected order statistics, and thus, the same is true for \(\beta \)’s.

Remark 4

The r.h.s. of (11) corresponds to a Binomial Mixture (of a particular form, since \(\Pr (T=0)=\Pr (T=1)=0\)). The necessary and sufficient condition (12) is always satisfied for \(n=2\) and \(n=3\). To see this, it suffices to check that if \(n=2,\) then (12) holds whenever \(\Pr (T=c)=1\) with an arbitrary \(c\in (0,1),\) while if \(n=3,\) then (12) is fulfilled when \(\Pr (T=(\beta _2-\beta _1)/(\beta _3 -\beta _1))=1.\) Hence, the true problem begins at \(n=4\).

3 Explicit characterization of sequences of expected order statistics by solving the truncated moment problem for finite open intervals

In this section, we obtain a precise characterization by invoking results from the truncated moment problem for finite intervals. The existing results are limited to compact intervals and are not directly applicable to our case, since, according to the characterization of Theorem 1, a suitable T lies in the open interval (0, 1) w.p. 1.

Definition 1

Given \(n\ge 4\) numbers \(\beta _1<\cdots <\beta _n\), let \({\varvec{\beta }}=(\beta _1,\ldots ,\beta _n)\) and define the vector \((\nu _k)_{k=0}^{n-2}={\varvec{\nu }}={\varvec{\nu }}({\varvec{\beta }})\) by

$$\begin{aligned} \nu _k:=\frac{n-1}{\lambda \begin{pmatrix}n-1 \\ k+1\end{pmatrix}} \sum _{j=k+1}^{n-1}(n-j)\begin{pmatrix}j\\ k+1\end{pmatrix}(\beta _{j+1}-\beta _j), \ \ \ \ k=0,\ldots ,n-2, \end{aligned}$$
(14)

where \(\lambda =\lambda (\varvec{\beta }):=\sum _{i=1}^{n-1}i(n-i)(\beta _{i+1}-\beta _i)>0\).

It is easily checked that \(1=\nu _0>\nu _1>\cdots>\nu _{n-2}>0\), and that the vector \(\varvec{\nu }\) is invariant under location-scale transformations on the \(\beta \)’s.
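For instance, if the \(\beta \)’s are the expected order statistics of the standard exponential, then (14) reduces to \(\nu _k=2/(k+2)\), the moments of the Beta(2, 1) r.v. T of Remark 3, and the chain \(1=\nu _0>\cdots >\nu _{n-2}>0\) is evident. A minimal exact check in Python (the choice \(n=6\) is ours, for illustration only):

```python
from fractions import Fraction
from math import comb

n = 6  # illustrative sample size (our choice)
beta = [sum(Fraction(1, i) for i in range(n - j + 1, n + 1)) for j in range(1, n + 1)]  # exponential EOS
lam = sum(i * (n - i) * (beta[i] - beta[i - 1]) for i in range(1, n))
nu = [Fraction(n - 1, comb(n - 1, k + 1)) / lam
      * sum((n - j) * comb(j, k + 1) * (beta[j] - beta[j - 1]) for j in range(k + 1, n))
      for k in range(n - 1)]
assert nu[0] == 1 and all(nu[k] > nu[k + 1] > 0 for k in range(n - 2))
assert nu == [Fraction(2, k + 2) for k in range(n - 1)]  # Beta(2,1) moments, cf. Remark 3
```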

According to Theorem 1, the \(\beta \)’s are EOS if and only if the \(\nu \)’s fulfill the truncated moment problem in the interval (0, 1). However, for the truncated moment problem, well-known results exist for a compact interval [ab]; see, e.g., Theorem IV.1.1 of Karlin and Studden (1966) or Theorems 10.1, 10.2 in Schmüdgen (2017). In order to obtain the corresponding necessary and sufficient conditions for open intervals, we shall make use of the following

Theorem 2

(Richter-Tchakaloff Theorem; see Schmüdgen (2017), Theorem 1.24). Let \((\mathcal {X},\mathcal {F},\mu )\) be a measure space and V be a finite dimensional linear subspace of the space \(L^1_{\textbf{R}}(\mathcal {X},\mathcal {F},\mu )\) of real-valued \(\mu \)-integrable functions on \(\mathcal {X}\). Define the linear functional \(L_{\mu }\) by \(L_{\mu }(f):=\int f d\mu \), \(f\in V\). Then, there exists a measure \(\mu _0\) in \((\mathcal {X},\mathcal {F})\), supported on \(k\le \dim V\) points of \(\mathcal X\), such that \(L_{\mu _0}\equiv L_{\mu }\) on V, that is, \(\int f d\mu _0=\int f d\mu \) for all \(f\in V\).

A symmetric \(n\times n\) matrix A with real entries is positive definite (denoted by \(A \succ 0\)) if \({\varvec{x}}^T A{\varvec{x}}>0\) for all \({\varvec{x}}\in \textbf{R}^n\setminus \{{\varvec{0}}\}\), where \({\varvec{x}}^T\) denotes the transpose of a column vector \({\varvec{x}}\in \textbf{R}^n\). Similarly, A is positive semi-definite (or nonnegative definite) if \({\varvec{x}}^T A{\varvec{x}}\ge 0\) for all \({\varvec{x}}\in \textbf{R}^n\), and this is denoted by \(A\succeq 0\).

Definition 2

(Hankel matrices). Let \(n\in \{4,5,\ldots \}\), \(0\le \varepsilon < 1/2\), and consider the numbers \(\nu _k\) as in Definition 1.

  1. (i)

    Case \(n=2m+2\): We define

    $$\begin{aligned} A_0(\varepsilon ):=\Big (\nu _{i+j}\Big )_{i,j=0}^m, \ \ \ B_0(\varepsilon ):=\Big (\nu _{i+j+1}-\nu _{i+j+2} -\varepsilon (1-\varepsilon ) \nu _{i+j}\Big )_{i,j=0}^{m-1}, \end{aligned}$$
    (15)

    and \(A_0:=A_0(0)\), \(B_0:=B_0(0)\).

  2. (ii)

    Case \(n=2m+3\): We define

    $$\begin{aligned} A_1(\varepsilon ):=\Big ( \nu _{i+j+1}-\varepsilon \nu _{i+j}\Big )_{i,j=0}^m, \ \ \ B_1(\varepsilon ):=\Big ((1-\varepsilon )\nu _{i+j}-\nu _{i+j+1}\Big )_{i,j=0}^{m}, \end{aligned}$$
    (16)

    and \(A_1:=A_1(0)\), \(B_1:=B_1(0)\).

The notation \(A_0(\varepsilon )\) is used for convenience, although \(A_0(\varepsilon )\) does not depend on \(\varepsilon \). Notice that the matrices \(A_0(\varepsilon ), A_1(\varepsilon ), B_1(\varepsilon )\) are of order \(m+1\), while \(B_0(\varepsilon )\) is of order m. The following theorem contains our main result; compare with Mallows (1973).

Theorem 3

Let \(n\in \{4,5,\ldots \}\), \(\beta _1<\cdots <\beta _n\), and \((\nu _0,\ldots ,\nu _{n-2})\) as in Definition 1.

  1. (i)

    If \(n=2m+2,\) then the \(\beta \)’s are EOS if and only if \(A_0(\varepsilon )\succeq 0\) and \(B_0(\varepsilon )\succeq 0\) for some \(\varepsilon \in (0,1/2)\), where \(A_0(\varepsilon )\) and \(B_0(\varepsilon )\) are given by Definition 2(i).

  2. (ii)

    If \(n=2m+3,\) then the \(\beta \)’s are EOS if and only if \(A_1(\varepsilon )\succeq 0\) and \(B_1(\varepsilon )\succeq 0\) for some \(\varepsilon \in (0,1/2)\), where \(A_1(\varepsilon )\) and \(B_1(\varepsilon )\) are given by Definition 2(ii).

  3. (iii)

    If \(n=2m+2\), the condition \(A_0\succ 0\) and \(B_0\succ 0\) is sufficient, but not necessary, for the \(\beta \)’s to be EOS. Similarly, if \(n=2m+3\), the condition \(A_1\succ 0\) and \(B_1\succ 0\) is sufficient, but not necessary, for the \(\beta \)’s to be EOS.

Note that if either \(A_0\) or \(B_0\) (\(A_1\) or \(B_1\), respectively) is not nonnegative definite, then \(\beta _1<\ldots < \beta _n\) are not the expectations of order statistics. This suggests the following verification procedure. We first analyze \(A_0\) and \(B_0\) (\(A_1\) and \(B_1\), respectively). If both are positive definite, then the \(\beta \)’s are EOS. If either of them is not nonnegative definite, then the \(\beta \)’s are not EOS. Otherwise we use Theorem 3 (i) and (ii) for a more precise analysis.
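The procedure is straightforward to implement. The sketch below (Python, assuming NumPy is available; the helper names nu_vector, hankel_pair and is_eos are ours) computes the vector \(\varvec{\nu }\) of Definition 1, forms the Hankel matrices of Definition 2, and applies the three steps just described; since the existence of a suitable \(\varepsilon \in (0,1/2)\) in Theorem 3(i)-(ii) is checked here only over a finite grid, borderline inputs should be examined analytically.

```python
import numpy as np  # assumed available
from math import comb

def nu_vector(beta):
    """Vector (nu_0, ..., nu_{n-2}) of Definition 1, computed from beta_1 < ... < beta_n."""
    n = len(beta)
    lam = sum(i * (n - i) * (beta[i] - beta[i - 1]) for i in range(1, n))
    return [(n - 1) / (lam * comb(n - 1, k + 1))
            * sum((n - j) * comb(j, k + 1) * (beta[j] - beta[j - 1]) for j in range(k + 1, n))
            for k in range(n - 1)]

def hankel_pair(nu, eps):
    """Hankel matrices A_i(eps), B_i(eps) of Definition 2 (i = 0 if n is even, i = 1 if n is odd)."""
    n = len(nu) + 1
    if n % 2 == 0:  # n = 2m + 2
        m = (n - 2) // 2
        A = np.array([[nu[i + j] for j in range(m + 1)] for i in range(m + 1)])
        B = np.array([[nu[i + j + 1] - nu[i + j + 2] - eps * (1 - eps) * nu[i + j]
                       for j in range(m)] for i in range(m)])
    else:           # n = 2m + 3
        m = (n - 3) // 2
        A = np.array([[nu[i + j + 1] - eps * nu[i + j] for j in range(m + 1)] for i in range(m + 1)])
        B = np.array([[(1 - eps) * nu[i + j] - nu[i + j + 1] for j in range(m + 1)] for i in range(m + 1)])
    return A, B

def is_eos(beta, tol=1e-12):
    """Three-step procedure described above (numerical, hence only heuristic in borderline cases)."""
    nu = nu_vector(beta)
    lmin = lambda M: np.linalg.eigvalsh(M).min()
    A, B = hankel_pair(nu, 0.0)
    if lmin(A) > tol and lmin(B) > tol:
        return True        # A_i, B_i positive definite: EOS, by Theorem 3(iii)
    if lmin(A) < -tol or lmin(B) < -tol:
        return False       # not nonnegative definite: not EOS
    # Borderline case: scan a grid of eps as a stand-in for "some eps in (0,1/2)" in Theorem 3(i)-(ii).
    for eps in np.linspace(1e-6, 0.499, 200):
        A, B = hankel_pair(nu, eps)
        if lmin(A) >= -tol and lmin(B) >= -tol:
            return True
    return False

print(is_eos([0, 2, 5, 7]), is_eos([0, 2, 11, 13]))  # expected: True False (cf. Example 1 below)
```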

Proof of Theorem 3 (i) and (ii): According to Theorems 10.1, 10.2 in Schmüdgen (2017), or Theorem IV.1.1 of Karlin and Studden (1966), the condition \(A_i(\varepsilon )\succeq 0\) and \(B_i(\varepsilon )\succeq 0\) (\(i=0\) or 1) is necessary and sufficient for \((\nu _k)_{k=0}^{n-2}\) to be a truncated moment sequence in the interval \([\varepsilon ,1-\varepsilon ]\).

Assume first that \(A_i(\varepsilon )\succeq 0\) and \(B_i(\varepsilon )\succeq 0\) for some \(\varepsilon \in (0,1/2)\). Since \(\nu _0=1\), any solution (=representing measure) \(\mu \) will be a probability measure. Equivalently, the r.v. T with d.f. \(F_T(x)=\mu \big ((-\infty ,x]\big )\) takes values in \([\varepsilon ,1-\varepsilon ]\subseteq (0,1)\) and satisfies \(\textrm{IE}T^k=\nu _k\), \(k=0,\ldots ,n-2\). From Theorem 1(iii) it follows that the \(\beta \)’s are EOS.

To prove necessity, assume that the \(\beta \)’s are EOS. From Theorem 1(iii) we can find an r.v. T with \(\textrm{IE}T^k=\nu _k\), \(k=0,\ldots ,n-2\), and \(\Pr (0<T<1)=1\). Let \(\mu _T\) be the probability measure of T and consider the probability space \((\mathcal {X},\mathcal {F},\mu ):= ((0,1),\mathcal {B},\mu _T)\), where \(\mathcal {B}\) is the Borel \(\sigma \)-field on (0, 1). Define the space V of real polynomials \(f:(0,1)\rightarrow \textbf{R}\) of degree \(\le n-2\); obviously, V is a linear subspace of \(L^1_{\textbf{R}}((0,1),\mathcal {B},\mu _T)\) of dimension \(n-1\) (finite). Consider also the Riesz functional \(L_{\mu _T}:V\rightarrow \textbf{R}\) defined by \(L_{\mu _T}(f):=\int f d \mu _T=\sum _{k=0}^{n-2}a_k \nu _k\) for \(f(x)=\sum _{k=0}^{n-2}a_k x^k \in V\). From the Richter-Tchakaloff Theorem (see Theorem 2, above), there exists a measure \(\mu _0\), supported on at most \(n-1\) points of \(\mathcal {X}=(0,1)\), such that \(L_{\mu _0}\equiv L_{\mu _T}\) on V; in particular, \(\nu _k=\int _{(0,1)}x^k d {\mu _T}(x)=\int _{(0,1)}x^k d {\mu _0}(x)\), \(k=0,\ldots ,n-2\). Thus, \(\mu _0\) is a probability measure (\(\nu _0=1\)) supported on a finite number of points in (0, 1), possessing the same moments as \(\mu _T\) up to order \(n-2\). This means that \(\mu _0\) solves the truncated moment problem for \((\nu _k)_{k=0}^{n-2}\) in the interval \([t_1,t_{2}]\), where \(t_1\in (0,1)\) is the minimum supporting point of \(\mu _0\) and \(t_{2}\in (0,1)\) the maximum one. Choose \(\varepsilon >0\) such that \(\varepsilon <\min \{t_1,1-t_{2}\}\). Then, the sequence \((\nu _k)_{k=0}^{n-2}\) is the moment sequence of \(\mu _0\), supported in the interval \([\varepsilon ,1-\varepsilon ]\), and Theorems 10.1, 10.2 in Schmüdgen (2017) imply that \(A_i(\varepsilon )\succeq 0\) and \(B_i(\varepsilon )\succeq 0\) (\(i=0\) or 1).

(iii) First we prove sufficiency. Denote by \(\lambda _{\min }(M)\) (resp. \(\lambda _{\max }(M)\)) the smallest (resp. the largest) eigenvalue of a real symmetric matrix M. For the case \(n=2m+2\), the matrix \(A_0(\varepsilon )\) is independent of \(\varepsilon \), hence, \(A_0(\varepsilon )=A_0\succ 0\) by hypothesis. Moreover, \(B_0(\varepsilon )= B_0-\varepsilon (1-\varepsilon )M_0\) for some real symmetric matrix \(M_0\); see (15). Since \(\lambda _{\min }(B_0)>0\) by assumption, it follows that for any \({\varvec{x}}=(x_0,\ldots ,x_{m-1})^T\in \textbf{R}^{m}\), \({\varvec{x}}^T B_0(\varepsilon ) {\varvec{x}}={\varvec{x}}^T B_0 {\varvec{x}} -\varepsilon (1-\varepsilon ) {\varvec{x}}^T M_0 {\varvec{x}}\ge \big [\lambda _{\min }(B_0)-\varepsilon (1-\varepsilon )\lambda _{\max }(M_0)\big ]{\varvec{x}}^T {\varvec{x}}\ge 0\), if \(\varepsilon >0\) is sufficiently small. Hence, the condition of part (i), namely, \(A_0(\varepsilon )\succeq 0\) and \(B_0(\varepsilon )\succeq 0\) for some small \(\varepsilon >0\), is satisfied. Similarly, when \(n=2m+3\) we have \(A_1(\varepsilon )=A_1-\varepsilon M_1\) and \(B_1(\varepsilon )=B_1-\varepsilon M_1\) for some real symmetric matrix \(M_1\); see (16). From \(\lambda _{\min }(A_1)>0\), \(\lambda _{\min }(B_1)>0\), it follows that for any \({\varvec{x}}\in \textbf{R}^{m+1}\), \({\varvec{x}}^T A_1(\varepsilon ){\varvec{x}}={\varvec{x}}^T A_1 {\varvec{x}} -\varepsilon {\varvec{x}}^T M_1 {\varvec{x}}\ge \big [\lambda _{\min }(A_1) -\varepsilon \lambda _{\max }(M_1)\big ]{\varvec{x}}^T {\varvec{x}}\ge 0\), and \({\varvec{x}}^T B_1(\varepsilon ){\varvec{x}}\ge \big [\lambda _{\min }(B_1)-\varepsilon \lambda _{\max }(M_1)\big ]{\varvec{x}}^T {\varvec{x}}\ge 0\), provided \(\varepsilon >0\) is sufficiently small. Hence, the condition of part (ii), \(A_1(\varepsilon )\succeq 0\) and \(B_1(\varepsilon )\succeq 0\) for some small \(\varepsilon >0\), is satisfied. Therefore, in both cases, the condition in (iii) is sufficient for the \(\beta \)’s to be EOS.

Finally, we show that the condition (iii), namely \(A_i\succ 0\) and \(B_i\succ 0\) (\(i=0\) or 1), is not necessary. To this end, consider the sequence \(\beta _j:=\sum _{k=n+1-j}^n \begin{pmatrix}n \\ k\end{pmatrix}\), \(j=1,\ldots ,n\). Then, \(\beta _{j+1}-\beta _j=\begin{pmatrix}n\\ j\end{pmatrix}\) (\(j=1,\ldots ,n-1\)) and a straightforward computation yields \(\nu _k=2^{-k}\), \(k=0,\ldots ,n-2\); see (14). Suppose first that \(n=2m+2\) and let \({\varvec{x}}^T=(x_0,\ldots ,x_m)\in \textbf{R}^{m+1}\). Then, \({\varvec{x}}^T A_0 \varvec{x}=\big (\sum _{k=0}^m x_k/2^{k}\big )^2\), and since \(m\ge 1\), the matrix \(A_0\) is singular (hence, not positive definite). Similarly, for \({\varvec{x}}^T=(x_0,\ldots ,x_{m-1})\in \textbf{R}^{m}\), \({\varvec{x}}^T B_0 \varvec{x}=(1/4)\big (\sum _{k=0}^{m-1} x_k/2^{k}\big )^2\), which is positive definite if and only if \(m=1\) (\(n=4\)). On the other hand, \(A_0(\varepsilon )=A_0\succeq 0\) for all \(\varepsilon \in (0,1/2)\), while \({\varvec{x}}^T B_0(\varepsilon ) \varvec{x}=\big (1/4-\varepsilon (1-\varepsilon )\big ) \big (\sum _{k=0}^{m-1} x_k/2^{k}\big )^2\ge 0\) for small enough \(\varepsilon >0\). According to characterization (i), the given \(\beta \)’s are EOS, although the numbers \(\nu _k({\varvec{\beta }})\) (\(k=0,\ldots ,n-2\)) do not satisfy the condition \(A_0\succ 0\) and \(B_0\succ 0\).

Next, suppose that \(n=2m+3\). Then, \(A_1=B_1\) and it follows that \({\varvec{x}}^T A_1 \varvec{x}={\varvec{x}}^T B_1 \varvec{x}=(1/2) \big (\sum _{k=0}^{m} x_k/2^{k}\big )^2\), showing that \(A_1\) (and \(B_1\)) is singular and positive semi-definite. On the other hand, \({\varvec{x}}^T A_1(\varepsilon ) \varvec{x}={\varvec{x}}^T B_1(\varepsilon ) \varvec{x} =(1/2-\varepsilon ) \big (\sum _{k=0}^{m} x_k/2^{k}\big )^2\ge 0\), and (ii) shows that the \(\beta \)’s are EOS. Hence, although the numbers \(\nu _k({\varvec{\beta }})\) (\(k=0,\ldots ,n-2\)) do not satisfy the condition \(A_1\succ 0\) and \(B_1\succ 0\), the corresponding \(\beta _j\) are EOS.

It can be checked that the given \(\beta \)’s are the EOS from the two-valued r.v. X with \(\Pr (X=0)=\Pr (X=2^n)=1/2\).\(\square \)
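The key computation in this counterexample, \(\nu _k({\varvec{\beta }})=2^{-k}\), is easily confirmed from (14); a minimal exact check in Python (the choice \(n=7\) is ours, any \(n\ge 4\) works):

```python
from fractions import Fraction
from math import comb

n = 7  # illustrative value (our choice); any n >= 4 works
beta = [sum(comb(n, k) for k in range(n + 1 - j, n + 1)) for j in range(1, n + 1)]
lam = sum(i * (n - i) * (beta[i] - beta[i - 1]) for i in range(1, n))
nu = [Fraction((n - 1) * sum((n - j) * comb(j, k + 1) * (beta[j] - beta[j - 1])
                             for j in range(k + 1, n)), lam * comb(n - 1, k + 1))
      for k in range(n - 1)]
assert nu == [Fraction(1, 2**k) for k in range(n - 1)]  # nu_k = 2^{-k}, as claimed
```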

Remark 5

(a) Assume that for \(i=0\) or 1, \(A_i\succeq 0\), \(B_i\succeq 0\), and either \(\det A_i=0\) or \(\det B_i=0\) (or both). Then, the measure \(\mu =\mu _0\) is [0, 1]-determinate from its moments \((\nu _k)_{k=0}^{n-2}\); see Theorem 10.7 in Schmüdgen (2017). Hence, if this is the case, we can find \(\varepsilon \in (0,1/2)\) such that \(A_i(\varepsilon )\succeq 0\) and \(B_i(\varepsilon )\succeq 0\) if and only if the support of (the unique) \(\mu _0\) does not contain any of the endpoints 0 and 1.

(b) The finite supporting set of the discrete measure \(\mu _0\), constructed in the proof of Theorem 3, can be chosen to contain at most \(k\le n/2\) (rather than \(k\le n-1\)) points.

(c) Theorem 3 makes it possible, at least in principle, to calculate sharp upper and lower bounds on distribution functions in terms of expectations of order statistics (see Mallows, 1973).

Example 1

(The case \(n=4\)). Assume we are given \(\beta _1<\beta _2<\beta _3<\beta _4\). Since for \(n=4\) we have \(m=1\), the matrices \(A_0\), \(B_0\) are given by (see (14), (15)),

$$ A_0 = \frac{1}{\lambda } \left(\begin{array}{cc} (\beta _3-\beta _2)+3(\beta _4-\beta _1) & (\beta _4-\beta _3)+2(\beta _4-\beta _2) \\ (\beta _4-\beta _3)+2(\beta _4-\beta _2) & 3 (\beta _4-\beta _3) \end{array}\right) , \ \ B_0= \left(\begin{array}{c} \frac{2}{\lambda }(\beta _3-\beta _2) \end{array} \right) , $$

with \(\lambda \) as in Definition 1. Hence, \(B_0\) is (trivially) positive definite, and by Sylvester’s criterion, \(A_0=A_0(\varepsilon )\) is positive semi-definite if and only if

$$\begin{aligned} (\beta _{2}-\beta _{1})(\beta _{4}-\beta _{3}) \ge \Big (\frac{2}{3} (\beta _{3}-\beta _{2})\Big )^2. \end{aligned}$$
(17)

According to Theorem 3(i), the \(\beta \)’s are EOS if and only if (17) is satisfied. Based on (17) we immediately deduce that, e.g., the numbers \((0,\ 2,\ 5,\ 7)\) are EOS, while the numbers \((0,\ 2,\ 11,\ 13)\) are not. For the first set of numbers, the r.v. T in (11) or (12) is uniquely determined (in fact, \(T\equiv 1/2\)), because \(\varvec{\nu }({\varvec{\beta })}=(1,1/2,1/4)=(1,\textrm{IE}T,\textrm{IE}T^2)\), showing that \(\text {Var}T=0\); see Remark 5(a) and (14) of Definition 1. Consequently, from Lemma 1 we conclude that the corresponding r.v. X, assuming the given expected order statistics, is also unique, namely, \(\Pr (X=-1/2)=\Pr (X=15/2)=1/2\); see Remarks 3, 2.
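The uniqueness claim at the end of the example can also be confirmed directly: the two-valued r.v. with \(\Pr (X=-1/2)=\Pr (X=15/2)=1/2\) indeed has expected order statistics \((0,\ 2,\ 5,\ 7)\). A minimal exact check in Python:

```python
from fractions import Fraction
from math import comb

# IE X_{j:n} for the two-valued r.v. with Pr(X=a)=1-p, Pr(X=b)=p equals
# a + (b-a)*Pr(at least n-j+1 of the n copies equal b).
a, b, p, n = Fraction(-1, 2), Fraction(15, 2), Fraction(1, 2), 4
eos = [a + (b - a) * sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n - j + 1, n + 1))
       for j in range(1, n + 1)]
assert eos == [0, 2, 5, 7]
```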

Example 2

(The case \(n=5\)). It can be checked that for \(n=5\), the \(2\times 2\) matrices \(A_1\), \(B_1\), see (16), (14), are positive semi-definite if and only if \((\beta _3-\beta _2)(\beta _5-\beta _4) \ge \frac{1}{2}(\beta _4-\beta _3)^2\) and \((\beta _2-\beta _1)(\beta _4-\beta _3) \ge \frac{1}{2}(\beta _3-\beta _2)^2\). Moreover, if both inequalities are strict (case \((+,+)\)) then \(A_1\succ 0\), \(B_1\succ 0\), and Theorem 3(iii) shows that the \(\beta \)’s are EOS. If, however, one (or both) of the inequalities reduces to an equality, one has to check the condition (ii) of Theorem 3 in detail. For instance, if both matrices are singular (case (0, 0)), then \(A_1(\varepsilon )\succeq 0\) and \(B_1(\varepsilon )\succeq 0\) for \(0<\varepsilon <\min \{\beta _3-\beta _2,\beta _4-\beta _3\}/(\beta _4-\beta _2)\), and the \(\beta \)’s are again EOS. As an example of the (0, 0)-case consider the numbers \((0,\ 1,\ 5,\ 13,\ 21)\), representing the EOS from the (uniquely defined) r.v. X with \(\Pr (X=-1/10)=2/3\), \(\Pr (X=121/5)=1/3\). However, both cases \((0,+)\) (e.g., \((0,\ 9,\ 11,\ 13,\ 14)\)) and \((+,0)\) (e.g., \((0,\ 1,\ 3,\ 5,\ 14)\)) imply that the \(\beta \)’s are not EOS. To see this, assume that \(2 (\beta _3-\beta _2)(\beta _5-\beta _4) = (\beta _4-\beta _3)^2\) and \(2(\beta _2-\beta _1)(\beta _4-\beta _3)>(\beta _3-\beta _2)^2\); case \((0,+)\). Then, \(B_1\succ 0\) (hence, \(B_1(\varepsilon )\succeq 0\) for small \(\varepsilon >0\)) and \(A_1\succeq 0\) with \(\det A_1=0\). It can be verified that for \({\varvec{x}}^T=(x_0,x_1):=(\beta _4-\beta _3,-(\beta _4-\beta _2))\), \(\varvec{x}^T A_1(\varepsilon ){\varvec{x}}=-\varepsilon \Delta \), where \(\Delta >0\) depends only on \(\beta \)’s, and thus, according to Theorem 3(ii), the \(\beta \)’s cannot be EOS. By the same reasoning, this is also true for the \((+,0)\)-case. Therefore, the complete characterization for \(n=5\) says that, for the \(\beta \)’s to be EOS, it is necessary and sufficient that either \(2 (\beta _2-\beta _1)(\beta _4-\beta _3) = (\beta _3-\beta _2)^2\) and \(2 (\beta _3-\beta _2)(\beta _5-\beta _4)= (\beta _4-\beta _3)^2\), or \(2 (\beta _2-\beta _1)(\beta _4-\beta _3) > (\beta _3-\beta _2)^2\) and \(2 (\beta _3-\beta _2)(\beta _5-\beta _4)> (\beta _4-\beta _3)^2\); that is, either both matrices \(A_1\), \(B_1\) are positive definite, or both are positive semi-definite and singular. We do not know if the situation is similar for odd values of \(n\ge 7\).

Our final result characterizes the binomial mixtures for which the mixing distribution is supported in the open interval (0, 1) (cf. Wood, 1992, 1999). The proof, being an immediate application of Theorems 1, 3, is omitted.

Theorem 4

Let \({\varvec{p}}=(p_0,\ldots ,p_n)\) (\(n\ge 2\)) be a probability vector (\(p_i\ge 0\), \(\sum _{i=0}^n p_i=1\)) and \({\varvec{u}}={\varvec{u}}({\varvec{p}}) =(u_0,\ldots ,u_n)\), where

$$ u_k:= \begin{pmatrix}n\\ k\end{pmatrix}^{-1} \sum _{j=k}^n \begin{pmatrix}j\\ k\end{pmatrix} p_j, \ \ k=0,1,\ldots ,n. $$

If \(n=2m\) set

$$ A(\varepsilon )=\Big (u_{i+j}\Big )_{i,j=0}^m, \ \ \ B(\varepsilon )=\Big (u_{i+j+1}-u_{i+j+2}- \varepsilon (1-\varepsilon )u_{i+j}\Big )_{i,j=0}^{m-1}, $$

and if \(n=2m+1\) set

$$ A(\varepsilon )=\Big (u_{i+j+1}-\varepsilon u_{i+j}\Big )_{i,j=0}^m, \ \ \ B(\varepsilon )=\Big ((1-\varepsilon )u_{i+j}-u_{i+j+1}\Big )_{i,j=0}^{m}. $$

Then, the following are equivalent.

  1. (i)

    \(A(\varepsilon )\succeq 0\) and \(B(\varepsilon )\succeq 0\) for some \(\varepsilon \) with \(0<\varepsilon <1/2\).

  2. (ii)

    \({\varvec{p}}\in \text {Conv}[B_0]\), where

    $$ B_0=\left\{ \left( \begin{pmatrix}n\\ j\end{pmatrix}p^j(1-p)^{n-j}\right) _{j=0}^n, \ 0<p<1 \right\} $$

    is the open binomial probability curve (without its endpoints) and \(\text {Conv}[X]\) denotes the convex hull of \(X\subseteq \textbf{R}^{n+1}\).

  3. (iii)

    There exists an r.v. V with \(\Pr (0<V<1)=1\) such that

    $$ p_j=\textrm{IE}\left\{ \begin{pmatrix}n\\ j\end{pmatrix} V^j (1-V)^{n-j}\right\} , \ \ \ j=0,1,\ldots ,n. $$
  4. (iv)

    \({\varvec{u}}\in \text {Conv}[M_0]\), where \(M_0=\left\{ (1,t,t^2,\ldots ,t^n), \ \ 0<t<1 \right\} \) is the open moment curve (without its endpoints).

  5. (v)

    There exists an r.v. V with \(\Pr (0<V<1)=1\) such that

    $$ u_k=\textrm{IE}V^k, \ \ \ k=0,1,\ldots ,n. $$
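As a quick illustration of the equivalence (iii) \(\Leftrightarrow \) (v), a point of the binomial curve \(B_0\) (i.e., a degenerate \(V\equiv \rho \)) must produce the moment vector \(u_k=\rho ^k\). A minimal exact check in Python (the values \(n=5\) and \(\rho =1/3\) are ours, for illustration only):

```python
from fractions import Fraction
from math import comb

n, rho = 5, Fraction(1, 3)  # illustrative choices (ours)
p = [comb(n, j) * rho**j * (1 - rho)**(n - j) for j in range(n + 1)]  # a point of the binomial curve B_0
u = [sum(comb(j, k) * p[j] for j in range(k, n + 1)) / comb(n, k) for k in range(n + 1)]
assert u == [rho**k for k in range(n + 1)]  # degenerate V = rho gives u_k = IE V^k = rho^k
```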

Let \({\varvec{x}}(t)=(1,t,t^2,\ldots ,t^n)\), \(0\le t\le 1\). A simple application of Theorem 4 shows that, for \(n\ge 3\) (in contrast to the case \(n=2\)) and any \(t_0\in (0,1)\), the line segment \((1-\lambda ){\varvec{x}(0)}+\lambda {\varvec{x}(t_0)}\), \(0\le \lambda <1\), lies outside \(\text {Conv}[M_0]\). Consequently, given \(\lambda _0,t_0\in (0,1)\), it follows from Farkas’ Lemma (see, e.g., Bertsimas and Tsitsiklis (1997), Theorem 4.6) that for any m and any given collection \(\{t_0,\ldots ,t_m\}\subseteq (0,1)\), we can find a polynomial p with \(\deg (p)\le n\) such that \((1-\lambda _0)p(0)+\lambda _0 p(t_0)<0\) and \(p(t_i)\ge 0\), \(i=0,1,\ldots ,m\).