1 Introduction

Minimax estimation of a one-dimensional cumulative distribution function (c.d.f.) was initiated by Aggarwal (1955) and has been extensively studied since then (see the references given in the next paragraph). To the best of our knowledge, extensions of this approach to higher dimensions have not been investigated. In this paper we therefore consider estimating a bivariate c.d.f. and generalize to this case some known results concerning minimax estimation of a univariate c.d.f. We also briefly discuss a multivariate generalization of these results.

Minimax estimation of a univariate c.d.f. F was considered by many authors. Using an invariance structure relative to the group of continuous and strictly increasing transformations, Aggarwal (1955) found the best invariant estimator of a continuous c.d.f. F under the invariant loss \(L(F,{\widehat{F}})=\int \limits _{{\mathbb {R}}} |F-{\widehat{F}}|^r \,dF\), \(r \ge 1\). Here \({\widehat{F}}\) stands for an estimate of F, based on a sample from F. Ferguson (1967, pages 191–197) generalized this result to the case where \(L(F,{\widehat{F}})=\int \limits _{{\mathbb {R}}} (F-{\widehat{F}})^2h(F) \,dF\), where \(h(\cdot )\) is a continuous and positive function. He also asked whether the best invariant estimates are minimax among the larger class of (not necessarily invariant) procedures. Yu (1992b) established the minimaxity of the best invariant procedure in Ferguson’s setup. In particular, he found the minimax estimator of a continuous c.d.f. F under the loss of the form \(L(F,{\widehat{F}})=\int \limits _{{\mathbb {R}}}(F-{\widehat{F}})^2F^{-\delta } (1-F)^{-\gamma }dF \), where \(\delta , \gamma \in \{0,1\}\) are fixed numbers. Analogous minimaxity results were obtained by Mohammadi and van Zwet (2002) (entropy loss), Ning and Xie (2007) (Linex loss), and Stępień-Baran (2010) (strictly convex loss). Phadia and Yu (1992) proved minimaxity of the empirical distribution function under the Kolmogorov–Smirnov loss \(\sup _{t \in {\mathbb {R}}} |F(t)-{\widehat{F}}(t)|\). Jafari Jozani et al. (2014) considered the problem of estimating a continuous distribution function F, as well as meaningful functions \(\tau (F) \), under a large class of loss functions. They obtained best invariant estimators and established their minimaxity for Hölder continuous \(\tau \)’s and strict bowl-shaped losses with a bounded derivative. Phadia (1973) considered a different model in which it is not assumed that F is continuous. He found the minimax estimator of F under the noninvariant loss function \(L(F,{\widehat{F}})=\int \limits _{{\mathbb {R}}} (F-{\widehat{F}})^2F^{-\delta } (1-F)^{-\gamma }dw \), where \(\delta , \gamma \in \{0,1\}\) are fixed numbers and w is a given non-null finite measure on \((\mathbb {R}, \mathcal {B}_{\mathbb {R}} )\), with \(\mathcal {B}_{\mathbb {R}}\) denoting the \(\sigma \)-algebra of Borel sets on \(\mathbb {R}\). Yu (1992a) considered minimax estimation of F with a more general loss function \(L(F,{\widehat{F}})=\int \limits _{{\mathbb {R}}} G(F-{\widehat{F}})h(F) \, dw\), where \(G(\cdot )\) is quadratic and \(h(\cdot )\) is continuous and positive. He proved that this problem is equivalent to that of finding a minimax estimator of a binomial proportion p under the loss \(L_B(p,d)=G(p-d)h(p)\). His result was generalized by Jokiel-Rokita and Magiera (2007) to the case where \(G(\cdot )\) is convex.

In the first part of the paper we generalize the result of Phadia (1973) to the two-dimensional case. Let \((X_1, Y_1), (X_2, Y_2), \ldots , (X_n, Y_n)\,\) be i.i.d. two-dimensional random vectors from an unknown bivariate (not necessarily continuous) c.d.f. F on \({\mathbb {R}}^2\). On the basis of this sample we find a minimax estimator \({\widehat{F}}_1\) of F under the weighted squared error loss

$$\begin{aligned} L_1({\widehat{F}}, F) = \iint \limits _{{\mathbb {R}}^2} ( {\hat{F}}(s,t)- F(s,t))^2(F(s,t))^{-\delta }(1-F(s,t))^{-\gamma }\,dW(s,t). \end{aligned}$$
(1)

Here W is a given non-null finite measure on \((\mathbb {R}^2, \mathcal {B}_{\mathbb {R}^2})\) and \(\delta , \gamma \in \{0,1\}\) are fixed numbers. In the case where \(\delta \ne 0\) or \(\gamma \ne 0\), the loss function \(L_1\) is more sensitive to departures from F in the tails of the distribution. We also show that the decision rule \({\widehat{F}}_1\) remains minimax even if F is assumed to be absolutely continuous with respect to the Lebesgue measure on \({\mathbb {R}}^2\). In the second part of the paper we find a minimax estimator \({\widehat{F}}_2\) of an arbitrary bivariate c.d.f. F under the invariant weighted Cramér–von Mises loss

$$\begin{aligned} L_2({\widehat{F}}, F) = \iint \limits _{{\mathbb {R}}^2} ( {\hat{F}}(s,t)- F(s,t))^2(F(s,t))^{-\delta }(1-F(s,t))^{-\gamma }\,dF(s,t). \end{aligned}$$
(2)

The latter result cannot be viewed as a natural generalization of the above-mentioned result of Yu (1992b), because we derive \({\widehat{F}}_2\) without assuming that F is continuous. Since we work with a much larger class of c.d.f.’s, the minimax inference for the loss \(L_2\) becomes easier, but is still nontrivial.

In the univariate case, the minimax estimators \(\phi \) and \(d_0\) found by Phadia (1973) and Yu (1992b), respectively, are linear functions of the empirical distribution function (e.d.f.). Therefore, it is not surprising that the minimax procedures \({\widehat{F}}_1\) and \({\widehat{F}}_2\) are linear functions of the bivariate e.d.f. Moreover, for each \(\delta , \gamma \in \{0,1\}\), \({\widehat{F}}_1={\widehat{F}}_2\) and \({\widehat{F}}_1\) has the same form as \(\phi \) except that the univariate e.d.f. is replaced by the bivariate e.d.f. On the other hand, \(\phi \ne d_0\) and the relationship between \({\widehat{F}}_2\) and \(d_0\) is slightly different than that between \({\widehat{F}}_1\) and \(\phi \). The reason for this difference is that the parameter space considered by Yu (1992b) contains only continuous c.d.f.’s.

2 Formulation of the problem

In the problem of estimating a univariate distribution function, the action space is restricted to nondecreasing functions \(a(\cdot ): {\mathbb {R}} \rightarrow [0,1]\). To define the appropriate action space for estimating a bivariate distribution function, we note that a two-dimensional analog of a nondecreasing function of one variable is a 2-increasing function.

Definition 1

A function \(a:{\mathbb {R}}^2 \rightarrow [0,1]\) is 2-increasing if

$$\begin{aligned} a(s_2,t_2)-a(s_2,t_1)-a(s_1,t_2)+a(s_1,t_1) \ge 0 \end{aligned}$$

for any rectangle \([s_1,s_2] \times [t_1,t_2] \subseteq {\mathbb {R}}^2\) with \(s_1 \le s_2\) and \(t_1 \le t_2\).
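The following short Python sketch (illustrative only; the helper name, the grid and the example functions are our own choices, not part of the formal development) checks the 2-increasing property of Definition 1 on a finite grid. The product \(\Phi (s)\Phi (t)\) of standard normal c.d.f.'s is 2-increasing, whereas \(\Phi (s)(1-\Phi (t))\) takes values in [0, 1] but is not.

```python
# Illustrative check of Definition 1 on a finite grid (sketch, not part of the paper).
import numpy as np
from scipy.stats import norm

def is_2_increasing(a, s_grid, t_grid, tol=1e-12):
    """Check a(s2,t2) - a(s2,t1) - a(s1,t2) + a(s1,t1) >= -tol for all grid rectangles."""
    for i, s1 in enumerate(s_grid):
        for s2 in s_grid[i + 1:]:
            for j, t1 in enumerate(t_grid):
                for t2 in t_grid[j + 1:]:
                    if a(s2, t2) - a(s2, t1) - a(s1, t2) + a(s1, t1) < -tol:
                        return False
    return True

grid = np.linspace(-3.0, 3.0, 15)
print(is_2_increasing(lambda s, t: norm.cdf(s) * norm.cdf(t), grid, grid))        # True
print(is_2_increasing(lambda s, t: norm.cdf(s) * (1 - norm.cdf(t)), grid, grid))  # False
```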

Let \({\widehat{F}}({\varvec{Z}};s,t)\) be an estimator of a bivariate distribution function F(s,t), based on the sample \({\varvec{Z}}=((X_1, Y_1), \ldots , (X_n, Y_n))\). To simplify the notation we will also write \({\widehat{F}}(\cdot ,\cdot )\) and \({\widehat{F}}(s,t)\) for \({\widehat{F}}({\varvec{Z}}; \cdot ,\cdot )\) and \({\widehat{F}}({\varvec{Z}}; s,t)\), respectively. We assume that for each realization \({\varvec{z}}\) of \({\varvec{Z}}\), the decision rule \({\widehat{F}}({\varvec{z}};\cdot ,\cdot )\) is an element of the action space

$$\begin{aligned} \mathcal{A} = \{ a: a=a(s,t) \; {\text {is a 2-increasing function on }}\; {\mathbb {R}}^2{\text { with values in }}\; [0,1] \}. \end{aligned}$$

It is obvious that any bivariate c.d.f. F belongs to the class \(\mathcal{A}\). However, \(\mathcal{A}\) also contains estimates \(a(\cdot ,\cdot )\) which are not c.d.f.’s, because they need not satisfy the conditions \(a(-\infty ,-\infty ) =0\), \(a(-\infty ,y) =0\), \(a(x,-\infty ) =0\) and \(a(\infty ,\infty ) =1\). Such estimates are often referred to as defective distribution functions. We include them in the action space to obtain satisfactory results. The necessity of using defective distribution functions to estimate a univariate distribution function was recognized by Aggarwal (1955).

We denote by \(\mathcal{D}\) the family of all estimators \({\widehat{F}}\) that satisfy the above condition, i.e. we put

$$\begin{aligned} \mathcal{D} = \{ {\widehat{F}}({\varvec{z}};\cdot ,\cdot ) : {\mathbb {R}}^{2n} \rightarrow \mathcal{A} \}. \end{aligned}$$

We use the symbol \(\mathcal{D}_A\) to denote the subclass of \(\mathcal{D}\) consisting of the affine estimators

$$\begin{aligned} \mathcal{D}_A = \left\{ {\widehat{F}} \in \mathcal{D} : {\widehat{F}}(s,t) = a\sum _{i=1}^{n} {\mathbb {1}}\left( X_i \le s, Y_i \le t\right) +b \;{\text { for some }}\; a,b \in {\mathbb {R}}\right\} . \end{aligned}$$

The estimates from \(\mathcal{D}_A\) are computationally tractable. An important member of \(\mathcal{D}_A\) is the empirical bivariate c.d.f. defined by

$$\begin{aligned} {\widehat{F}}_{emp}(s,t)=\frac{\sum _{i=1}^{n} {\mathbb {1}}\left( X_i \le s, Y_i \le t\right) }{n}, \; (s,t) \in {\mathbb {R}}^2. \end{aligned}$$
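As a small illustration (a minimal sketch; the helper names below are ours and not part of the paper), the bivariate e.d.f. and the estimators from \(\mathcal{D}_A\) can be evaluated directly from the sample counts.

```python
# Sketch: the bivariate e.d.f. and an affine estimator from D_A evaluated at a point (s, t).
import numpy as np

def empirical_cdf_2d(x, y, s, t):
    """F_emp(s, t) = (1/n) * #{i : X_i <= s and Y_i <= t}."""
    x, y = np.asarray(x), np.asarray(y)
    return np.mean((x <= s) & (y <= t))

def affine_estimator(x, y, s, t, a, b):
    """F_hat_{ab}(s, t) = a * #{i : X_i <= s, Y_i <= t} + b."""
    return a * len(x) * empirical_cdf_2d(x, y, s, t) + b

rng = np.random.default_rng(0)
x, y = rng.standard_normal(50), rng.standard_normal(50)
print(empirical_cdf_2d(x, y, 0.0, 0.0))             # close to P(X <= 0, Y <= 0) = 0.25
print(affine_estimator(x, y, 0.0, 0.0, 1 / 50, 0))  # a = 1/n, b = 0 recovers F_emp
```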

Let \({\mathbb {E}} _F\) denote the expectation with respect to the c.d.f. F. Then, the risk function of an estimate \({\widehat{F}}\) under the loss function \(L_i\), \(i=1,2\) [see (1) and (2)], is given by

$$\begin{aligned} R_i({\widehat{F}}, F) = {\mathbb {E}} _F ( L_i({\widehat{F}}, F) ). \end{aligned}$$

When the value of i is clear from the context, we write R instead of \(R_i\). We are interested in finding the minimax estimator of F, i.e. the estimator \({\widehat{F}}_N \in \mathcal{D}\), for which the following equation holds

$$\begin{aligned} \sup _{F \in \mathcal{F}} R({\widehat{F}}_N, F)= \inf _{{\widehat{F}} \in \mathcal{D} } \sup _{F \in \mathcal{F}} R({\widehat{F}}, F):=\rho _N. \end{aligned}$$

Here \(\mathcal{F}\) is the family of all bivariate distribution functions on \({\mathbb {R}}^2\), i.e.

$$\begin{aligned} \mathcal{F}= \{F: F \; {\text {is a cumulative distribution function on}} \; {\mathbb {R}}^2 \}. \end{aligned}$$

Unfortunately, many estimators from the class \(\mathcal{D}\) are not computationally tractable and finding \({\widehat{F}}_N\) may be a difficult task. Therefore, we first look for affine minimax estimators. We say that a decision rule \({\widehat{F}}_A \in \mathcal{D}_A\) is a minimax affine estimator of F if

$$\begin{aligned} \sup _{F \in \mathcal{F}} R({\widehat{F}}_A, F)= \inf _{{\widehat{F}} \in \mathcal{D}_A } \sup _{F \in \mathcal{F}} R({\widehat{F}}, F):=\rho _A. \end{aligned}$$

The quantities \( \rho _N\) and \( \rho _A\) are called the minimax risk and minimax affine risk, respectively. Clearly, \( \rho _N \le \rho _A\). We prove that under both \(L_1\) and \(L_2\) these two risks are equal, i.e. \(\rho _A=\rho _N\). Hence \({\widehat{F}}_A\) is minimax among all estimators from the class \(\mathcal{D}\).

We also discuss the minimax approach in the case where the unknown distribution F is assumed to be absolutely continuous with respect to the Lebesgue measure. To this end, we denote

$$\begin{aligned} \mathcal{F}_{AC}= \{F: F \; {\text {is an absolutely continuous distribution function on}} \; {\mathbb {R}}^2 \}. \end{aligned}$$

3 Auxiliary results

As we have mentioned above, Yu (1992a) proved that results on minimax estimation of a binomial proportion p can help to find minimax estimators of a univariate c.d.f. It is not surprising that these results can also be applied in the bivariate case. For the sake of completeness we recall here the following well-known facts concerning inference on a binomial proportion, which will be used in the next sections (see, e.g. Hodges and Lehmann 1950, Olkin and Sobel 1979, Lehmann and Casella 1998, pages 311–312 or Phadia 1973).

Suppose that on the basis of an observation X from the binomial distribution \(B(n,p)\) we wish to estimate the unknown success probability p under the weighted squared error loss \( L_B(d,p) =\frac{(d-p)^2}{p^{\delta }(1-p)^{\gamma }}\), with \(\delta , \gamma \in \{0,1\}\). Let the numbers \(a_1=a_1(\delta ,\gamma )\), \(b_1=b_1(\delta ,\gamma )\) and \(r_1=r_1(\delta ,\gamma )\) satisfy the following equation

$$\begin{aligned}&\inf _{a,b \in {\mathbb {R}}} \sup _{p \in [0,1]} \left[ \frac{ a^2n p(1-p) + (b+(an-1)p)^2}{p^{\delta } (1-p)^{\gamma }} \right] \nonumber \\&\quad \quad = \sup _{p \in [0,1]} \left[ \frac{ a_1^2n p(1-p) + (b_1+(a_1n-1)p)^2}{p^{\delta } (1-p)^{\gamma }} \right] =r_1 \end{aligned}$$
(3)

(the expression in square brackets in (3) is the risk function of the affine estimator \(d(X)=aX+b\)). Then \(d_1(X)=a_1X+b_1\) is the minimax estimator of p and \(r_1\) is the minimax risk, i.e.

$$\begin{aligned} \inf _{d} \sup _{p \in [0,1]} {\mathbb {E}}_p \left[ \frac{(d(X)-p)^2}{p^{\delta } (1-p)^{\gamma }} \right] = \sup _{p \in [0,1]} {\mathbb {E}}_p \left[ \frac{(a_1X+b_1-p)^2}{p^{\delta } (1-p)^{\gamma }} \right] =r_1, \end{aligned}$$
(4)

where the infimum is over measurable functions \(d:{\mathbb {R}} \rightarrow {\mathbb {R}}\). The constants \(a_1,b_1,r_1\) are given by

$$\begin{aligned} \begin{aligned} \delta =0, \gamma =0{:}\,\,&a_1=\frac{1}{n+\sqrt{n}}, \; b_1=\frac{1}{2}\frac{1}{\sqrt{n}+1}, \; r_1= \frac{1}{4(\sqrt{n}+1)^2},\\ \delta =1, \gamma =0{:}\,\,&a_1=\frac{1}{n+\sqrt{n}}, \; b_1=0, \; r_1= \frac{1}{(\sqrt{n}+1)^2},\\ \delta =0, \gamma =1{:}\,\,&a_1=\frac{1}{n+\sqrt{n}},\; b_1=\frac{1}{\sqrt{n}+1}, \; r_1=\frac{1}{(\sqrt{n}+1)^2} ,\\ \delta =1, \gamma =1{:}\,\,&a_1=\frac{1}{n},\; b_1=0, \; r_1= \frac{1}{n}. \end{aligned} \end{aligned}$$
(5)

In each of the four cases, \(d_1(X)\) has constant risk and is the Bayes estimator of p when p has the beta prior \(B(\alpha ,\beta )\) with \((\alpha ,\beta )\) equal to \((\sqrt{n}/2,\sqrt{n}/2)\) in the first case, \((1,\sqrt{n})\) in the second, \((\sqrt{n},1)\) in the third and (1, 1) in the fourth. Therefore, \(d_1(X)\) is minimax (see Lehmann and Casella 1998, Corollary 1.5, page 311).
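A direct numerical check of (3)–(5) is straightforward. The sketch below (our own code, not part of the formal argument; the sample size is an arbitrary example) evaluates the risk of \(d_1(X)=a_1X+b_1\) on a grid of p and confirms that it is constant and equal to \(r_1\) in all four cases.

```python
# Sketch: verify that d_1(X) = a_1*X + b_1 has constant risk r_1 under L_B for each (delta, gamma).
import numpy as np

def constants(n, delta, gamma):
    """The constants a_1, b_1, r_1 of (5)."""
    r = np.sqrt(n)
    if (delta, gamma) == (0, 0):
        return 1 / (n + r), 0.5 / (r + 1), 1 / (4 * (r + 1) ** 2)
    if (delta, gamma) == (1, 0):
        return 1 / (n + r), 0.0, 1 / (r + 1) ** 2
    if (delta, gamma) == (0, 1):
        return 1 / (n + r), 1 / (r + 1), 1 / (r + 1) ** 2
    return 1 / n, 0.0, 1 / n                       # delta = gamma = 1

def risk(a, b, n, p, delta, gamma):
    # E_p[(aX + b - p)^2] = a^2 n p (1-p) + (b + (a n - 1) p)^2 for X ~ B(n, p)
    return (a ** 2 * n * p * (1 - p) + (b + (a * n - 1) * p) ** 2) / (p ** delta * (1 - p) ** gamma)

n, p_grid = 25, np.linspace(0.01, 0.99, 99)
for delta in (0, 1):
    for gamma in (0, 1):
        a1, b1, r1 = constants(n, delta, gamma)
        print(delta, gamma, np.allclose(risk(a1, b1, n, p_grid, delta, gamma), r1))  # True
```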

4 Minimax estimator under the loss \(L_1\)

Since for each \(s,t \in {\mathbb {R}}\) the random variables \(\mathbb {1}\left( X_1 \le s, Y_1 \le t\right) , \mathbb {1}\left( X_2 \le s, Y_2 \le t\right) , \ldots , \mathbb {1}\left( X_n \le s, Y_n \le t\right) \) are i.i.d. Bernoulli trials with probability of success F(s,t), it follows that the risk function of an affine estimator \({\widehat{F}}_{ab}(s,t)=a\sum _{i=1}^{n} {\mathbb {1}}\left( X_i \le s, Y_i \le t\right) +b \) has the form

$$\begin{aligned} R_1({\widehat{F}}_{ab}, F)= & {} {\mathbb {E}}_F \left[ \iint \limits _{{\mathbb {R}}^2} \left( {\widehat{F}}_{ab}(s,t) - F(s,t)\right) ^2 h(F(s,t)) \,dW(s,t) \right] \\= & {} \iint \limits _{{\mathbb {R}}^2} \frac{ [ a^2nF(s,t)(1-F(s,t)) + (b+(an-1)F(s,t))^2 ]}{ [F(s,t)]^{\delta } [1-F(s,t)]^{\gamma } } \,dW(s,t), \nonumber \end{aligned}$$
(6)

where \(h(p)= p^{-\delta } (1-p)^{-\gamma }, \;p \in (0,1)\). To find the minimax affine rule, we first derive a lower bound for the minimax affine risk \(\rho _A\). The method used here is closely related to that of Phadia (1973). For any fixed integer \(k \ge 1\) and any p in (0, 1), let \(F_{kp}\) be the bivariate c.d.f. defined by \(F_{kp}=pF_{k1}+(1-p)F_{k2}\), where \(F_{k1}\) and \(F_{k2}\) are the c.d.f.’s corresponding to uniform distributions over the squares \(A_1=[-(k+1), -k]^2\) and \(A_2=[k, k+1]^2\), respectively. Since the integrand in (6) is nonnegative and since \(F_{kp}(s,t)=p\) on \([-k,k]^2\), it follows that the risk of \({\widehat{F}}_{ab}\) at the point \(F_{kp}\) satisfies the following inequality

$$\begin{aligned}&R_1({\widehat{F}}_{ab}, F_{kp}) \ge \iint \limits _{ [-k,k]^2 }\left[ a^2nF_{kp}(s,t)(1-F_{kp}(s,t)) + (b\right. \nonumber \\&\left. \qquad +(an-1)F_{kp}(s,t))^2 \right] h(F_{k,p}(s,t)) \,dW(s,t) \nonumber \\&\quad = \iint \limits _{[-k,k]^2} \left[ a^2n p(1-p) + (b+(an-1)p)^2 \right] h(p) \,dW(s,t)\nonumber \\&\quad =\left[ \frac{ a^2n p(1-p) + (b+(an-1)p)^2}{p^{\delta } (1-p)^{\gamma }} \right] \; \iint \limits _{[-k,k]^2} \,dW(s,t). \end{aligned}$$
(7)
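The least favourable family \(F_{kp}\) used in (7) is easy to examine numerically; the following minimal sketch (our own code, with arbitrary values of k and p) builds the mixture \(pF_{k1}+(1-p)F_{k2}\) and confirms that it equals p on the central square \([-k,k]^2\).

```python
# Sketch: F_kp = p*F_k1 + (1-p)*F_k2 is identically p on the central square [-k, k]^2.
import numpy as np

def uniform_square_cdf(s, t, lo, hi):
    """C.d.f. of the uniform distribution on the square [lo, hi]^2."""
    u = np.clip((s - lo) / (hi - lo), 0.0, 1.0)
    v = np.clip((t - lo) / (hi - lo), 0.0, 1.0)
    return u * v

def F_kp(s, t, k, p):
    return (p * uniform_square_cdf(s, t, -(k + 1), -k)
            + (1 - p) * uniform_square_cdf(s, t, k, k + 1))

k, p = 3, 0.37
pts = np.linspace(-k, k, 7)
print(all(np.isclose(F_kp(s, t, k, p), p) for s in pts for t in pts))  # True
```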

We use this inequality to prove the following lemma.

Lemma 1

Let \(\delta , \gamma \in \{0,1\}\) be fixed and let \(a_1\), \(b_1\), \(r_1\) be the corresponding numbers defined by (5). Then \({\widehat{F}}_{A}=a_1\sum _{i=1}^{n} {\mathbb {1}}\left( X_i \le s, Y_i \le t\right) +b_1 \) is the minimax affine rule under the loss function \(L_1\) and the minimax affine risk is given by

$$\begin{aligned} \rho _A=\inf _{a,b \in {\mathbb {R}}} \sup _{F \in \mathcal{F}} R_1({\widehat{F}}_{ab}, F) = \sup _{F \in \mathcal{F}} R_1({\widehat{F}}_{A}, F) =r_1 \iint \limits _{{\mathbb {R}}^2} \,dW(s,t). \end{aligned}$$

Proof

Since (7) holds for any positive integer k, we conclude by (3) that

$$\begin{aligned} \rho _A= & {} \inf _{a,b \in {\mathbb {R}}} \sup _{F \in \mathcal{F}} R_1({\widehat{F}}_{ab}, F) \ge \inf _{a,b \in {\mathbb {R}}} \sup _{p \in [0,1]} \sup _{k \ge 1} R_1({\widehat{F}}_{ab}, F_{kp}) \\\ge & {} \inf _{a,b \in {\mathbb {R}}} \sup _{p \in [0,1]} \left[ \frac{ a^2n p(1-p) + (b+(an-1)p)^2}{p^{\delta } (1-p)^{\gamma }} \right] \;\iint \limits _{{\mathbb {R}}^2} \,dW(s,t) \\= & {} r_1 \iint \limits _{{\mathbb {R}}^2} \,dW(s,t). \end{aligned}$$

By straightforward calculations it is easy to verify that the integrand in the second line of (6) does not depend on F and equals \(r_1\) when \(a=a_1\) and \(b=b_1\) (see again Phadia 1973). Therefore, the risk function of the estimator \({\widehat{F}}_{A}={\widehat{F}}_{a_1b_1}\) is constant and equal to \(r_1 \iint \limits _{{\mathbb {R}}^2} \,dW(s,t)\). This completes the proof, because

$$\begin{aligned} \inf _{a,b \in {\mathbb {R}}} \sup _{F \in \mathcal{F}} R_1({\widehat{F}}_{ab}, F)= & {} \rho _A \ge r_1 \iint \limits _{{\mathbb {R}}^2} \,dW(s,t) \\= & {} \sup _{F \in \mathcal{F}} R_1({\widehat{F}}_{a_1b_1}, F) \ge \inf _{a,b \in {\mathbb {R}}} \sup _{F \in \mathcal{F}} R_1({\widehat{F}}_{ab}, F). \end{aligned}$$

\(\square \)

Remark 1

Note that the constants \(a_1\), \(b_1\) and \(r_1\), which define both the form of the minimax affine rule and the corresponding minimax affine risk, depend on both \(\delta \) and \(\gamma \). However, for simplicity of notation, this dependence is not indicated explicitly throughout the paper.

The next theorem, which is the main result of this section, states that the estimator \({\widehat{F}}_{a_1b_1}\) is minimax in \(\mathcal{D}\). The proof is based on a method of Yu (1992a) (cf. also Jokiel-Rokita and Magiera 2007).

Theorem 1

Let \(\delta , \gamma \in \{0,1\}\) be fixed and let \(a_1\), \(b_1\), \(r_1\) be the corresponding numbers defined by (5). Then the estimator \({\widehat{F}}_{A}=a_1\sum _{i=1}^{n} {\mathbb {1}}\left( X_i \le s, Y_i \le t\right) +b_1\) of an arbitrary bivariate c.d.f. F is minimax under the loss function \(L_1\) and the minimax risk is given by

$$\begin{aligned} \rho _N=\inf _{{\widehat{F}} \in \mathcal{D} } \sup _{F \in \mathcal{F}} R_1({\widehat{F}}, F) = \sup _{F \in \mathcal{F}} R_1({\widehat{F}}_{A}, F) =r_1\iint \limits _{{\mathbb {R}}^2} \,dW(s,t). \end{aligned}$$

Proof

To prove the theorem, it suffices to show that \(\sup _{F \in \mathcal{F}} R_1({\widehat{F}}, F) \ge \rho _A\) for any \({\widehat{F}} \in \mathcal{D}\). Let \(k>0\) be a fixed integer and let \((X_1,Y_1), \ldots , (X_n,Y_n)\) be i.i.d. random vectors from the c.d.f. \(F_{kp}\) defined above. Since \(F_{kp}(s,t)=p\) on \( [-k,k]^2\), it follows that for any \({\widehat{F}} \in \mathcal{D}\),

$$\begin{aligned}&R_1({\widehat{F}}, F_{kp})= {\mathbb {E}}_{F_{kp}} \left[ L_1({\widehat{F}}, F_{kp}) \right] \nonumber \\&\quad \ge {\mathbb {E}}_{F_{kp}} \left[ \;\;\iint \limits _{[-k,k]^2} ({\widehat{F}}(s,t)-F_{kp}(s,t))^2 h(F_{k,p}(s,t)) \, dW(s,t) \right] \nonumber \\&\quad = {\mathbb {E}}_{F_{kp}} \left[ \;\;\iint \limits _{[-k,k]^2} \frac{({\widehat{F}}(s,t)-p)^2}{p^{\delta } (1-p)^{\gamma }}\,dW(s,t) \right] . \end{aligned}$$
(8)

Note first that the joint density of the vector \({\varvec{Z}}=((X_1, Y_1), \ldots , (X_n, Y_n))\) is given by

$$\begin{aligned}&\prod _{i=1}^{n} [p 1_{A_1}(x_i,y_i) +(1-p) 1_{A_2}(x_i,y_i)] \\&\quad = \, \prod _{\{i:(x_i,y_i) \in A_1\}} \,p1_{A_1}(x_i,y_i) \,\times \, \prod _{\{i:(x_i,y_i) \in A_2\}} \, 1_{A_2}(x_i,y_i)(1-p) = p^{n_1} (1-p)^{n-n_1}, \end{aligned}$$

where \(n_1\) is the value of the random variable \(N_1\) which counts the number of observations \((x_j,y_j)\) that fall into the square \(A_1\). Since \(N_1\) is a sufficient statistic for p, we may assume that \({\widehat{F}}\) depends on \({\varvec{Z}}\) only through \(N_1\), i.e. \({\widehat{F}}({\varvec{Z}};s,t)={\widetilde{F}}(N_1;s,t)\) for some Borel measurable function \({\widetilde{F}}:{\mathbb {R}}^3 \rightarrow {\mathbb {R}}\). Let the numbers \(\delta (i)\), \(i=0,\ldots ,n\), corresponding to \({\widehat{F}}\), be defined by \( \delta (i) \cdot \iint \limits _{[-k,k]^2} dW(s,t) = \iint \limits _{[-k,k]^2} {\widetilde{F}}(i;s,t) \, dW(s,t). \) Then,

$$\begin{aligned}&\iint \limits _{[-k,k]^2} ({\widetilde{F}}(i;s,t)-p)^2 dW(s,t) = \iint \limits _{[-k,k]^2} \left[ {\widetilde{F}}(i;s,t)-\delta (i)+ \delta (i)-p \right] ^2 \,dW(s,t) \\&\quad = \iint \limits _{[-k,k]^2} \left[ {\widetilde{F}}(i;s,t)-\delta (i) \right] ^2 dW(s,t)+2[\delta (i)-p ] \iint \limits _{[-k,k]^2} \left[ {\widetilde{F}}(i;s,t)-\delta (i) \right] dW(s,t) \\&\qquad + \iint \limits _{[-k,k]^2} [\delta (i)-p]^2 dW(s,t) \\&\quad = \iint \limits _{[-k,k]^2} \left[ {\widetilde{F}}(i;s,t)-\delta (i) \right] ^2 dW(s,t)+2[\delta (i)-p ] \cdot 0 + \iint \limits _{[-k,k]^2} [\delta (i)-p]^2 dW(s,t)\\&\qquad \ge \iint \limits _{[-k,k]^2} [\delta (i)-p]^2 dW(s,t). \end{aligned}$$

This immediately shows that

$$\begin{aligned} {\mathbb {E}}_{F_{kp}} \left[ \;\; \iint \limits _{[-k,k]^2} ({\widetilde{F}}(N_1;s,t)-p)^2 dW(s,t) \right] \ge {\mathbb {E}}_{F_{kp}} \left[ \;\; \iint \limits _{[-k,k]^2} (\delta (N_1)-p)^2 dW(s,t) \right] . \end{aligned}$$

Since \(N_1\) has the binomial distribution \(B(n,p)\), the last inequality implies by (8) and (4) that

$$\begin{aligned} \inf _{{\widehat{F}} \in \mathcal{D}} \sup _{p \in [0,1]} R_1({\widehat{F}}, F_{kp})\ge & {} \inf _{\delta } \sup _{p \in [0,1]} {\mathbb {E}}_p \left[ \frac{(\delta (N_1)-p)^2}{p^{\delta } (1-p)^{\gamma }} \right] \;\iint \limits _{[-k,k]^2}dW(s,t) \\= & {} r_1\iint \limits _{[-k,k]^2}dW(s,t). \end{aligned}$$

Here we use the fact that \({\widehat{F}}({\varvec{Z}};\cdot ,\cdot )={\widetilde{F}}(N_1;\cdot ,\cdot )\). Letting \(k \rightarrow \infty \) we obtain the lower bound

$$\begin{aligned} \sup _{F \in \mathcal{F}} R_1({\widehat{F}}, F)\ge & {} \lim _{k \rightarrow \infty } \inf _{{\widehat{F}} \in \mathcal{D}} \sup _{p \in [0,1]} R_1({\widehat{F}}, F_{kp})\\\ge & {} \lim _{k \rightarrow \infty } r_1\iint \limits _{[-k,k]^2}dW(s,t) = r_1 \iint \limits _{{\mathbb {R}}^2}dW(s,t) \end{aligned}$$

which proves that under the loss \(L_1\), the decision rule \({\widehat{F}}_{A}:={\widehat{F}}_{a_1b_1}\) is minimax among all estimators. \(\square \)

Remark 2

Since for any integer \(k \ge 1\) and any \(p \in (0,1)\) the c.d.f. \(F_{kp}\) is absolutely continuous (with respect to Lebesgue measure), we have in fact proved a stronger result than the statement of the last theorem: under the loss \(L_1\), \({\widehat{F}}_{A}\) is the minimax estimator of an unknown absolutely continuous bivariate c.d.f. F. Moreover, Theorem 1 can be easily generalized to dimensions \(d>2\). In a slight modification of the proof, \(F_{kp}\) is replaced by a d-variate c.d.f. which equals p on the hypercube \([-k,k]^d\).
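For completeness, the minimax estimator of Theorem 1 is trivial to compute in practice. The sketch below (our own code, with the constants of (5) hard-coded and a simulated sample as an example) evaluates \({\widehat{F}}_{A}(s,t)\) for each choice of \(\delta , \gamma \).

```python
# Sketch: evaluating the minimax estimator F_hat_A(s, t) = a_1 * #{i : X_i<=s, Y_i<=t} + b_1.
import numpy as np

def minimax_constants(n, delta, gamma):
    """a_1 and b_1 of (5)."""
    r = np.sqrt(n)
    if (delta, gamma) == (1, 1):
        return 1.0 / n, 0.0
    b1 = {(0, 0): 0.5 / (r + 1), (1, 0): 0.0, (0, 1): 1.0 / (r + 1)}[(delta, gamma)]
    return 1.0 / (n + r), b1

def minimax_cdf_estimate(x, y, s, t, delta=0, gamma=0):
    x, y = np.asarray(x), np.asarray(y)
    a1, b1 = minimax_constants(len(x), delta, gamma)
    return a1 * np.sum((x <= s) & (y <= t)) + b1

rng = np.random.default_rng(2)
x, y = rng.standard_normal(100), rng.standard_normal(100)
# True value at (0, 0) is 0.25 for independent standard normal coordinates.
print([round(minimax_cdf_estimate(x, y, 0.0, 0.0, d, g), 3) for d in (0, 1) for g in (0, 1)])
```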

5 Minimax estimation under the loss function \(L_2\)

Under the loss \(L_2\) the risk of an affine decision rule \({\widehat{F}}_{ab}\) is given by [cf. (6)]

$$\begin{aligned} R_2({\widehat{F}}_{ab}, F) = \iint \limits _{{\mathbb {R}}^2} \frac{ [ a^2nF(s,t)(1-F(s,t)) + (b+(an-1)F(s,t))^2 ]}{ [F(s,t)]^{\delta } [1-F(s,t)]^{\gamma } } \,dF(s,t).\nonumber \\ \end{aligned}$$
(9)

To find the minimax affine estimator, we first derive a lower bound for the minimax affine risk \(\rho _A\). For this purpose, we choose a suitable family of bivariate c.d.f.’s and consider estimation of F in the resulting submodel. Let \(({\tilde{x}}_n)_{n \ge 1}\) be a given increasing sequence of points from (0, 1) and let \(m \ge 1\) be a fixed integer. We put \({\tilde{x}}_0={\tilde{y}}_0=0\) and \({\tilde{y}}_i = 1-{\tilde{x}}_i\) for \(i \ge 1\). Let the set \(\mathcal{S}_m \) be defined by

$$\begin{aligned} \mathcal{S}_m=\{{\varvec{s}}=(s_0,\ldots ,s_m) \in [0,1]^{m+1} : s_0+\cdots +s_m=1 \}. \end{aligned}$$

For any probability vector \({\varvec{p}}=(p_0,\ldots ,p_m) \in \mathcal{S}_m\), we denote by \(F_{m,p}\) the bivariate c.d.f. which corresponds to a discrete random vector (X, Y) with the support \(\{({\tilde{x}}_0,{\tilde{y}}_0),\ldots ,({\tilde{x}}_m,{\tilde{y}}_m)\}\) and with the joint probability mass function given by

$$\begin{aligned} f_{m,p}({\tilde{x}}_i,{\tilde{y}}_i) =\Pr (X={\tilde{x}}_i,Y={\tilde{y}}_i )= p_i, \quad i=0,\ldots ,m. \end{aligned}$$

The c.d.f. \(F_{m,p}\) satisfies

$$\begin{aligned} F_{m,p}({\tilde{x}}_i,{\tilde{y}}_i) = \Pr (X\le {\tilde{x}}_i, Y \le {\tilde{y}}_i) = \left\{ \begin{array}{ll} p_0, &{} \quad {\text {when }}i=0, \\ p_0+p_i, &{} \quad {\text {when }}i=1,\ldots , m, \end{array} \right. \end{aligned}$$

which implies that for any integer \(k \ge 0\),

$$\begin{aligned}&\iint \limits _{{\mathbb {R}}^2} \frac{\left[ \,F_{m,p}(s,t)\,\right] ^k }{[F_{m,p}(s,t)]^{\delta } [1-F_{m,p}(s,t)]^{\gamma }} \,dF_{m,p}(s,t) \nonumber \\&\quad = \frac{ (p_0)^{k}}{(p_0)^{\delta }(1-p_0)^{\gamma }}\,p_0+\sum _{i=1}^m\frac{ (p_0+p_i)^{k}}{(p_0+p_i)^{\delta }(1-p_0-p_i)^{\gamma }}\, p_i . \end{aligned}$$
(10)
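Identity (10) can also be checked numerically. In the minimal sketch below (our own code, with an arbitrary Dirichlet-generated probability vector as an example), Monte Carlo draws from the discrete c.d.f. \(F_{m,p}\) approximate the integral on the left-hand side, which is then compared with the sum on the right-hand side of (10).

```python
# Sketch: Monte Carlo check of identity (10) for a randomly chosen probability vector p.
import numpy as np

rng = np.random.default_rng(1)
m = 5
p_vec = rng.dirichlet(np.ones(m + 1))            # (p_0, ..., p_m)
p0, rest = p_vec[0], p_vec[1:]
F_atoms = np.concatenate(([p0], p0 + rest))      # values of F_{m,p} at its support points

def rhs_of_10(k, delta, gamma):
    out = p0 ** k / (p0 ** delta * (1 - p0) ** gamma) * p0
    out += np.sum((p0 + rest) ** k / ((p0 + rest) ** delta * (1 - p0 - rest) ** gamma) * rest)
    return out

idx = rng.choice(m + 1, size=400_000, p=p_vec)   # sample the support points of F_{m,p}
F_draw = F_atoms[idx]
for k in (0, 1, 2):
    for delta, gamma in ((0, 0), (1, 0), (0, 1), (1, 1)):
        mc = np.mean(F_draw ** k / (F_draw ** delta * (1 - F_draw) ** gamma))
        print(k, delta, gamma, round(mc, 3), round(rhs_of_10(k, delta, gamma), 3))  # agree up to MC error
```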

We use equality (10) to find a lower bound for the minimax affine risk \(\rho _A\). For each \(p_0 \in [0,1]\) and each integer \(m \ge 1\), let \({\varvec{p}}_0^{(m)}\) denote the corresponding vector from \(\mathcal{S}_m\) given by

$$\begin{aligned} {\varvec{p}}_0^{(m)}=\left( p_0,\frac{1-p_0}{m},\ldots ,\frac{1-p_0}{m} \right) . \end{aligned}$$

Then, for any integer \(k \ge 0\),

$$\begin{aligned}&\lim _{m \rightarrow \infty } \iint \limits _{{\mathbb {R}}^2} \frac{\left[ \,F_{m,p_0^{(m)}}(s,t)\,\right] ^k }{\left[ F_{m,p_0^{(m)}}(s,t)\right] ^{\delta } \left[ 1-F_{m,p_0^{(m)}}(s,t)\right] ^{\gamma }} \,dF_{m,p_0^{(m)}}(s,t) \\&\quad = \lim _{m \rightarrow \infty } \left[ \frac{ (p_0)^{k}\,p_0}{(p_0)^{\delta }(1-p_0)^{\gamma }}+\frac{ [p_0+(1-p_0)/m]^{k}(1-p_0)}{[p_0+(1-p_0)/m]^{\delta }[1-p_0-(1-p_0)/m]^{\gamma }}\, \right] \\&\quad = \frac{ (p_0)^{k}}{(p_0)^{\delta }(1-p_0)^{\gamma }}. \end{aligned}$$

Using (9) and applying the last equality with \(k=0,1,2\), we therefore find that

$$\begin{aligned} \lim _{m \rightarrow \infty } R_2 \left( {\widehat{F}}_{ab},F_{m,p_0^{(m)}} \right) = \left[ \frac{ a^2n p_0(1-p_0) + [\,b+(an-1)p_0\,]^2}{p_0^{\delta } (1-p_0)^{\gamma }} \right] . \end{aligned}$$

Hence, by (3) and (5), we obtain the following lower bound for the minimax affine risk

$$\begin{aligned} \rho _A= & {} \inf _{a,b}\sup _{F \in \mathcal{F}} R_2\left( {\widehat{F}}_{a,b},F \right) \ge \inf _{a,b} \sup _{p_0 \in [0,1]} \sup _{m \ge 1} R_2 \left( {\widehat{F}}_{ab},F_{m,p_0^{(m)}} \right) \\\ge & {} \inf _{a,b} \sup _{p_0 \in [0,1]} \lim _{m \rightarrow \infty } R_2 \left( {\widehat{F}}_{ab},F_{m,p_0^{(m)}} \right) \\= & {} \inf _{a,b \in {\mathbb {R}}} \sup _{p_0 \in [0,1]} \left[ \frac{ a^2n p_0(1-p_0) + (b+(an-1)p_0)^2}{p_0^{\delta } (1-p_0)^{\gamma }} \right] =r_1. \end{aligned}$$

Lemma 2

Let \(\delta , \gamma \in \{0,1\}\) be fixed and let \(a_1\), \(b_1\), \(r_1\) be the corresponding constants defined by (5). Then \({\widehat{F}}_{A}=a_1\sum _{i=1}^{n} {\mathbb {1}}\left( X_i \le s, Y_i \le t\right) +b_1 \) is the minimax affine rule under the loss \(L_2\) and the minimax affine risk is given by

$$\begin{aligned} \rho _A=\inf _{a,b \in {\mathbb {R}}} \sup _{F \in \mathcal{F}} R_2({\widehat{F}}_{ab}, F) = \sup _{F \in \mathcal{F}} R_2({\widehat{F}}_{A}, F) =r_1. \end{aligned}$$

Proof

The risk function of \({\widehat{F}}_{A}={\widehat{F}}_{a_1b_1}\) is constant and equal to \(r_1\), because the integrand in (9) does not depend on F and equals \(r_1\) when \(a=a_1\) and \(b=b_1\) (cf. the proof of Lemma 1). This completes the proof, because \(r_1\) is a lower bound for \(\rho _A\) and hence

$$\begin{aligned} \inf _{a,b \in {\mathbb {R}}} \sup _{F \in \mathcal{F}} R_2({\widehat{F}}_{ab}, F)= & {} \rho _A \ge r_1 = \sup _{F \in \mathcal{F}} R_2({\widehat{F}}_{A}, F) \ge \inf _{a,b \in {\mathbb {R}}} \sup _{F \in \mathcal{F}} R_2({\widehat{F}}_{ab}, F). \end{aligned}$$

\(\square \)

To prove that \({\widehat{F}}_{A}\) is minimax among all estimators we use the Bayes approach. More precisely, we take a specific sequence of priors on \(\mathcal{F}\) such that the corresponding sequence of Bayes risks converges to the supremum of the risk of \({\widehat{F}}_{A}\). Since this limit of Bayes risks is a lower bound for the minimax risk, we conclude that \({\widehat{F}}_{A}\) is minimax.

Suppose that we know a priori that \((X_1,Y_1), \ldots , (X_n,Y_n)\) are i.i.d. random vectors from \(F_{m,p}\), where m is a fixed positive integer and \({\varvec{p}}=(p_0,\ldots ,p_m) \in \mathcal{S}_m\) is an unknown probability vector. Then, the joint distribution of \({\varvec{Z}}=((X_1, Y_1), \ldots , (X_n, Y_n))\) is given by

$$\begin{aligned} \prod _{i=1}^{n}{f_{m,p}(x_i,y_i)} = \prod _{j=0}^{m}{p_j^{N_j}} , \end{aligned}$$

where \(N_k = \#\left\{ i:\left( x_i, y_i\right) = ({\tilde{x}}_k,{\tilde{y}}_k) \right\} \) for \(k=0, \ldots ,m\). It is clear that \({\varvec{N}} =(N_0,\ldots ,N_m)\) has the multinomial distribution on \(m+1\) categories with n draws and probability vector \({\varvec{p}}=(p_0,\ldots ,p_m)\). Note that for any estimator \({\widehat{F}} \in \mathcal{D}\), we obtain

$$\begin{aligned} L_2({\widehat{F}}, F_{m,p})= & {} \frac{({\widehat{F}}({\tilde{x}}_0,{\tilde{y}}_0)-p_0 )^2}{(p_0)^{\delta } (1-p_0)^{\gamma }} p_0+ \sum _{i=1}^m \frac{( {\widehat{F}}({\tilde{x}}_i,{\tilde{y}}_i)-p_0-p_i )^2 }{(p_0+p_i)^{\delta } (1-p_0-p_i)^{\gamma }} p_i \end{aligned}$$

Since \({\varvec{N}}\) is a sufficient statistic for \({\varvec{p}}\), we may assume that \({\widehat{F}}({\tilde{x}}_i,{\tilde{y}}_i)\), \( i=0,\ldots ,m\), depends on \({\varvec{Z}}\) only through \({\varvec{N}}\), i.e. \({\widehat{F}}({\varvec{Z}}; {\tilde{x}}_i,{\tilde{y}}_i) = d_i({\varvec{N}})\) for some real-valued Borel measurable function \(d_i\). Therefore, the problem of estimating \(F_{m,p}\) with the loss \(L_2\) and the sample \({\varvec{Z}}\) from \(F_{m,p}\) is equivalent to estimation of the multinomial probabilities \({\varvec{p}}=(p_0,\ldots ,p_m)\) under the loss

$$\begin{aligned} L({\varvec{d}},{\varvec{p}}) = \frac{(d_0-p_0 )^2}{(p_0)^{\delta } (1-p_0)^{\gamma }} \, p_0+ \sum _{i=1}^m \frac{( d_i-p_0-p_i )^2 }{(p_0+p_i)^{\delta } (1-p_0-p_i)^{\gamma }} \,p_i \end{aligned}$$
(11)

and the sample \({\varvec{N}}\). This means that the corresponding two minimax risks are equal, i.e.

$$\begin{aligned} \inf _{{\widehat{F}} \in \mathcal{D}} \sup _{p \in \mathcal{S}_m } R_2\left( {\widehat{F}},F_{m,p} \right) =\inf _{d \in \mathcal{D}_m} \sup _{p \in \mathcal{S}_m } R\left( {\varvec{d}},{\varvec{p}} \right) , \end{aligned}$$

where \( R({\varvec{d}},{\varvec{p}}) = {\mathbb {E}}_{p} \left[ \, L({\varvec{d}}({\varvec{N}}),{\varvec{p}}) \, \right] \) and \(\mathcal{D}_m\) is the set of all estimators of \({\varvec{p}}=(p_0,\ldots ,p_m) \in \mathcal{S}_m\).

To find the minimax risk in the latter problem we use the Bayes approach. Let \(\alpha _0,\ldots ,\alpha _m\) be any given positive numbers and let \(\alpha =\sum _{i=0}^m \alpha _i\). Suppose that the unknown vector \({\varvec{p}}=(p_0,\ldots ,p_m)\) has the Dirichlet distribution \(\pi =\mathcal{D}(\alpha _0, \ldots , \alpha _m)\) with the parameter vector \(\left( \alpha _0, \ldots , \alpha _m \right) \). Then, the random vector \((p_0,\ldots ,p_{m-1})\) has the Lebesgue p.d.f.

$$\begin{aligned} f(p_0,\ldots ,p_{m-1}) = \frac{\varGamma \left( \alpha \right) }{ \prod \limits _{i=0}^{m}{\varGamma \left( \alpha _i \right) }}\prod _{i=0}^{m}{p_i^{\alpha _i-1}}\,\times \,I_{\mathcal{S}_m}({\varvec{p}}), \end{aligned}$$

where \(p_m=1-(p_0+\cdots +p_{m-1})\) and \(I_{\mathcal{S}_m}(\cdot )\) is the indicator function of the set \(\mathcal{S}_m\). Since the Dirichlet prior is conjugate to the multinomial distribution it follows that the posterior of \({\varvec{p}}\) given \({\varvec{N}}={\varvec{n}}\) is \(\mathcal{D}(\alpha _0+n_0,\ldots ,\alpha _m+n_m )\), i.e.

$$\begin{aligned} f(p_0,\ldots ,p_{m-1}|{\mathbf {n}} )= \frac{\varGamma \left( \alpha + n\right) }{ \prod \limits _{i=0}^{m}{\varGamma \left( n_i+\alpha _i \right) }}\prod _{i=0}^{m}{p_i^{n_i+\alpha _i-1}}\,\times \,I_{\mathcal{S}_m}({\varvec{p}}). \end{aligned}$$

Let \({\varvec{d}}^{\pi }=(d_0^{\pi },\ldots ,d_m^{\pi })\) be the Bayes estimator of \({\varvec{p}}=(p_0,\ldots ,p_m)\) under the loss \(L({\varvec{d}},{\varvec{p}})\), given by (11). Then

$$\begin{aligned} d_i^{\pi } ({\varvec{N}})= \left\{ \begin{array}{ll} \displaystyle \frac{{\mathbb {E}} \left[ \, (p_0)^{1-\delta } (1-p_0)^{-\gamma }p_0\,| \,{\varvec{N}} \,\right] }{{\mathbb {E}} \left[ \, (p_0)^{-\delta } (1-p_0)^{-\gamma }p_0 \,| \,{\varvec{N}} \,\right] }, &{} \quad i=0, \\ \displaystyle \frac{{\mathbb {E}} \left[ \, (p_0+p_i)^{1-\delta } (1-p_0-p_i)^{-\gamma }p_i\,| \,{\varvec{N}} \,\right] }{{\mathbb {E}} \left[ \, (p_0+p_i)^{-\delta } (1-p_0-p_i)^{-\gamma }p_i\,| \,{\varvec{N}} \,\right] }, &{} \quad i=1,\ldots ,m, \end{array} \right. \end{aligned}$$

provided that these posterior moments are finite. If \(\delta =\gamma =0\), then both integrals and the resulting Bayes risk can be easily calculated, because for each \(k,l \in \{0,1,2,3\}\),

$$\begin{aligned}&{\mathbb {E}}_{\pi } \left( p_i^{\,k} p_0^{\,l} \right) \nonumber \\&\quad = \frac{\alpha _i(\alpha _i+1) \cdots (\alpha _i+k-1)\alpha _0(\alpha _0+1) \cdots (\alpha _0+l-1)}{\alpha (\alpha +1) \cdots (\alpha +k+l-1)}. \end{aligned}$$
(12)

To find these moments in the case where \(\delta =1\) or \(\gamma =1\), we use the following Liouville formula (cf. Fichtenholz 1992). Let a function \(\phi :[0,1] \rightarrow {\mathbb {R}}\) be continuous and let p and q be any positive real numbers. If the integral \(\int \limits _{0}^1 |\phi (u)|u^{p+q-1} du\) is finite, then the following identity holds

$$\begin{aligned} \iint \limits _{\begin{array}{c} x \ge 0, \,y\ge 0 \\ x+y \le 1 \end{array}} \phi (x+y)x^{p-1}y^{q-1} \,dxdy =\frac{\varGamma (p)\varGamma (q)}{\varGamma (p+q)} \int \limits _{0}^1 \phi (u)u^{p+q-1} du. \end{aligned}$$
(13)
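The identity (13) is classical; the following short sketch (our own code, with an arbitrary choice of \(\phi \), p and q) checks it numerically by quadrature.

```python
# Sketch: numerical check of the Liouville identity (13) for phi(u) = exp(-u), p = 1.7, q = 2.3.
import numpy as np
from scipy.integrate import dblquad, quad
from scipy.special import gamma as G

phi = lambda u: np.exp(-u)
p, q = 1.7, 2.3

# Left-hand side: dblquad integrates func(y, x) over y in [gfun(x), hfun(x)] and x in [a, b].
lhs, _ = dblquad(lambda y, x: phi(x + y) * x ** (p - 1) * y ** (q - 1),
                 0.0, 1.0, lambda x: 0.0, lambda x: 1.0 - x)

# Right-hand side of (13).
rhs = G(p) * G(q) / G(p + q) * quad(lambda u: phi(u) * u ** (p + q - 1), 0.0, 1.0)[0]

print(np.isclose(lhs, rhs))  # True
```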

Let \(a, b\) be any real numbers such that \(\alpha _0+\alpha _i+a+1>0\) and \(\alpha -\alpha _0-\alpha _i+b>0\). Since the vector \((p_0,p_i,1-p_0-p_i)\) is distributed according to \(\mathcal{D}(\alpha _0, \alpha _i, \alpha -\alpha _0-\alpha _i)\), it follows from (13) that

$$\begin{aligned}&C_i(a,b)\nonumber \\&\quad := {\mathbb {E}}_{\pi }\left[ \, (p_0+p_i)^a(1-p_0-p_i)^bp_i \, \right] = \frac{\varGamma (\alpha )}{\varGamma (\alpha _0)\varGamma (\alpha _i)\varGamma (\alpha -\alpha _0-\alpha _i)} \\&\qquad \,\times \, \iint \limits _{\begin{array}{c} p_0 \ge 0, p_i \ge 0\\ p_0+p_i \le 1 \end{array}} (p_0+p_i)^a(1-p_0-p_i)^{b} p_i p_0^{\alpha _0-1}p_i^{\alpha _i-1} (1-p_0 -p_i)^{\alpha -\alpha _0-\alpha _i-1} \,dp_0dp_i \nonumber \\ \nonumber&\quad = \frac{\varGamma (\alpha )\alpha _i}{\varGamma (\alpha _0+\alpha _i+1)\varGamma (\alpha -\alpha _0-\alpha _i)} \frac{ \varGamma (\alpha _0+\alpha _i+a+1) \varGamma (\alpha -\alpha _0-\alpha _i+b)}{ \varGamma (\alpha +a+b+1) } \end{aligned}$$
(14)

Moreover, if \(\alpha _0+a+1>0\) and \(\alpha -\alpha _0+b>0\), then we also get

$$\begin{aligned} C_0(a,b):= & {} {\mathbb {E}}_{\pi }\left[ \, (p_0)^a(1-p_0)^bp_0 \, \right] \nonumber \\= & {} \frac{\varGamma (\alpha )}{\varGamma (\alpha _0)\varGamma (\alpha -\alpha _0)} \frac{\varGamma (\alpha _0+a+1)\varGamma (\alpha -\alpha _0+b)}{ \varGamma (\alpha +a+b+1) }, \end{aligned}$$
(15)

because \((p_0,1-p_0)\) has the distribution \(\mathcal{D}(\alpha _0, \alpha -\alpha _0)\). In particular, since \(\delta , \gamma \in \{0,1\}\), \(C_i(-\delta ,-\gamma )\) and \(C_i(1-\delta ,-\gamma )\) are finite for all \( i =0, \ldots ,m\) if \((\alpha _0,\ldots ,\alpha _m)\) belongs to the set \(\mathcal{A}_m^{\delta ,\gamma }\) defined by

$$\begin{aligned}&\mathcal{A}_m^{\delta ,\gamma }=\{ (\alpha _0,\ldots ,\alpha _m) \in {\mathbb {R}}^{m+1}_+: \alpha -\alpha _0-\alpha _i-\gamma >0, \quad {\text {for all}} \quad i =1,\ldots ,m \}.\nonumber \\ \end{aligned}$$
(16)

Let \((\alpha _0,\ldots ,\alpha _m) \in \mathcal{A}_m^{\delta ,\gamma }\). Then, since the posterior of \({\varvec{p}}\) given \({\varvec{N}}={\varvec{n}}\) is \(\mathcal{D}(\alpha _0+n_0,\ldots ,\alpha _m+n_m )\), it follows by (14) and (15) that the Bayes estimator \({\varvec{d}}^{\pi }=(d_0^{\pi },\ldots ,d_m^{\pi })\) of \({\varvec{p}}=(p_0,\ldots ,p_m)\) is

$$\begin{aligned}&d_i^{\pi } ({\varvec{N}})\\&\quad = \left\{ \begin{array}{ll} \displaystyle \frac{{\mathbb {E}} \left[ \, (p_0)^{1-\delta } (1-p_0)^{-\gamma }p_0\,| \,{\varvec{N}} \,\right] }{{\mathbb {E}} \left[ \, (p_0)^{-\delta } (1-p_0)^{-\gamma }p_0 \,| \,{\varvec{N}} \,\right] } = \frac{N_0+\alpha _0+1-\delta }{n+\alpha +1-\delta -\gamma }, &{} \quad i=0,\\ \displaystyle \frac{{\mathbb {E}} \left[ \, (p_0+p_i)^{1-\delta } (1-p_0-p_i)^{-\gamma }p_i\,| \,{\varvec{N}} \,\right] }{{\mathbb {E}} \left[ \, (p_0+p_i)^{-\delta } (1-p_0-p_i)^{-\gamma }p_i\,| \,{\varvec{N}} \,\right] } = \frac{N_0+N_i+\alpha _0+\alpha _i +1-\delta }{n+\alpha +1-\delta -\gamma }, &{} \quad i=1,\ldots ,m, \end{array} \right. \end{aligned}$$

Clearly, the finiteness of the above posterior means is implied by the fact that if \((\alpha _0,\ldots ,\alpha _m) \in \mathcal{A}_m^{\delta ,\gamma }\) then \((\alpha _0+n_0,\ldots ,\alpha _m+n_m ) \in \mathcal{A}_m^{\delta ,\gamma }\), because \(\alpha _i+n_i \ge \alpha _i>0\) and \(\alpha +n-(\alpha _0+n_0)-(\alpha _i+n_i)-\gamma = \alpha -\alpha _0-\alpha _i-\gamma +(n-n_0-n_i) \ge \alpha -\alpha _0-\alpha _i-\gamma >0 \) for each \( i=1,\ldots ,m\).
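In code, the Bayes estimator \({\varvec{d}}^{\pi }\) reduces to the simple ratios displayed above. The sketch below (our own helper, with an illustrative sample and prior) evaluates it from the multinomial counts.

```python
# Sketch: the Bayes estimator d^pi(N) under a Dirichlet prior, using the closed-form ratios above.
import numpy as np

def bayes_estimator(N, alphas, delta, gamma):
    """Return (d_0^pi(N), ..., d_m^pi(N))."""
    N, alphas = np.asarray(N, float), np.asarray(alphas, float)
    n, alpha = N.sum(), alphas.sum()
    denom = n + alpha + 1 - delta - gamma
    d = np.empty_like(alphas)
    d[0] = (N[0] + alphas[0] + 1 - delta) / denom
    d[1:] = (N[0] + N[1:] + alphas[0] + alphas[1:] + 1 - delta) / denom
    return d

# Example: n = 10 observations over m + 1 = 4 support points; the prior
# alpha_i = sqrt(n)/(m+1) (used later in the proof of Theorem 2 for delta=1, gamma=0)
# lies in A_m^{1,0}.
N = np.array([4, 3, 2, 1])
alphas = np.full(4, np.sqrt(10) / 4)
print(bayes_estimator(N, alphas, delta=1, gamma=0))
```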

Since the random variables \(N_0\) and \(N_0+N_i\), \(i=1,\ldots ,m\), have the binomial distributions \(B(n,p_0)\) and \(B(n,p_0+p_i)\), respectively, the risk of \({\varvec{d}}^{\pi }\) is given by

$$\begin{aligned}&R({\varvec{d}}^{\pi },{\varvec{p}}) = {\mathbb {E}}_{p} \left[ \, L({\varvec{d}}^{\pi }({\varvec{N}}),{\varvec{p}}) \, \right] = \frac{ n p_0 (1-p_0) + (\,u_0 -u p_0 \,)^2}{(n+u)^2} \frac{p_0}{(p_0)^{\delta } (1-p_0)^{\gamma }} \\&\qquad +\sum _{i=1}^m \frac{ n (p_0+p_i) (1-p_0-p_i) + [ \,u_i- u(p_0+p_i) \,]^2}{(n+u)^2} \frac{p_i}{(p_0+p_i)^{\delta } (1-p_0-p_i)^{\gamma }}, \end{aligned}$$

where for simplicity of notation we write

$$\begin{aligned} u_i = \left\{ \begin{array}{ll} \alpha _0 + 1-\delta &{} \quad {\text {when }}i=0, \\ \alpha _0 +\alpha _i+ 1-\delta , &{} \quad {\text {when }}i=1,\ldots , m, \end{array} \right. \quad \text {and} \quad u=\alpha +1-\delta -\gamma .\qquad \end{aligned}$$
(17)

Then, by (14) and (15), the Bayes risk \(r(\pi ) = {\mathbb {E}}_{\pi } \left[ \,R({\varvec{d}}^{\pi },{\varvec{p}}) \, \right] \) can be written as

$$\begin{aligned}&r(\pi )\\&\quad = \sum _{i=0}^m \left[ \frac{nC_i(1-\delta ,1-\gamma )+u_i^2 C_i(-\delta ,-\gamma ) -2u_iu\,C_i(1-\delta ,-\gamma )+u^2C_i(2-\delta ,-\gamma )}{(n+u)^2} \right] . \end{aligned}$$

To simplify the last formula, we note that by (14) and (15)

$$\begin{aligned} \displaystyle C_i(1-\delta ,-\gamma )=\frac{u_i}{u}\,C_i(-\delta ,-\gamma ),\; C_i(2-\delta ,-\gamma )=\frac{u_i(u_i+1)}{u(u+1)}\,C_i(-\delta ,-\gamma ). \end{aligned}$$

Moreover, we also have \(\displaystyle C_i(1-\delta ,1-\gamma )=\frac{u_i(u-u_i)}{u(u+1)}\,C_i(-\delta ,-\gamma )\), which implies that

$$\begin{aligned}&r(\pi )\\&\quad = \sum _{i=0}^m \left[ \frac{nC_i(1-\delta ,1-\gamma )+C_i(-\delta ,-\gamma ) \left( u_i^2 -2u_i^2+uu_i(u_i+1)/(u+1) \right) }{(n+u)^2} \right] \\&\quad = \sum _{i=0}^m \frac{ nC_i(1-\delta ,1-\gamma ) + uC_i(1-\delta ,1-\gamma )}{(n+u)^2} =\sum _{i=0}^m \frac{ C_i(1-\delta ,1-\gamma ) }{n+u} . \end{aligned}$$

Finally, by (14), (15) and (17), we obtain the following formula for the Bayes risk of \({\varvec{d}}^{\pi }\)

$$\begin{aligned}&r(\pi )\nonumber \\&\quad = \frac{1}{n+\alpha +1-\delta -\gamma } \left[ \frac{\varGamma (\alpha )}{\varGamma (\alpha _0)\varGamma (\alpha -\alpha _0)} \frac{\varGamma (\alpha _0+2-\delta )\varGamma (\alpha -\alpha _0+1-\gamma )}{ \varGamma (\alpha +3-\delta -\gamma ) } \right. \nonumber \\&\qquad + \sum _{i=1}^m \frac{\varGamma (\alpha )\alpha _i}{\varGamma (\alpha _0+\alpha _i+1)\varGamma (\alpha -\alpha _0-\alpha _i)} \left. \frac{ \varGamma (\alpha _0+\alpha _i+2-\delta ) \varGamma (\alpha -\alpha _0-\alpha _i+1-\gamma )}{ \varGamma (\alpha +3-\delta -\gamma ) } \right] .\nonumber \\ \end{aligned}$$
(18)
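Formula (18) is easy to evaluate numerically via log-gamma functions. The sketch below (our own implementation, with an arbitrary n) does so and, as a preview of the proof of Theorem 2, shows that the Dirichlet prior with \(\alpha _i=\sqrt{n}/(m+1)\) yields a Bayes risk close to \(r_1(1,0)=1/(\sqrt{n}+1)^2\) for large m.

```python
# Sketch: evaluate the Bayes risk r(pi) of (18) with log-gamma functions for numerical stability.
import numpy as np
from scipy.special import gammaln

def bayes_risk(alphas, n, delta, gamma):
    alphas = np.asarray(alphas, float)
    a0, rest, alpha = alphas[0], alphas[1:], alphas.sum()
    log_den = gammaln(alpha + 3 - delta - gamma)
    term0 = np.exp(gammaln(alpha) - gammaln(a0) - gammaln(alpha - a0)
                   + gammaln(a0 + 2 - delta) + gammaln(alpha - a0 + 1 - gamma) - log_den)
    terms = np.exp(gammaln(alpha) + np.log(rest)
                   - gammaln(a0 + rest + 1) - gammaln(alpha - a0 - rest)
                   + gammaln(a0 + rest + 2 - delta)
                   + gammaln(alpha - a0 - rest + 1 - gamma) - log_den)
    return (term0 + terms.sum()) / (n + alpha + 1 - delta - gamma)

n, m = 25, 20_000
alphas = np.full(m + 1, np.sqrt(n) / (m + 1))   # prior used in the proof for delta=1, gamma=0
print(bayes_risk(alphas, n, delta=1, gamma=0))  # approx. 0.02777...
print(1 / (np.sqrt(n) + 1) ** 2)                # r_1(1, 0) = 1/36 = 0.02777...
```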

Before stating the main result of this section, we introduce some notation. If \(\delta =\gamma =0\), we put \(n_0=4\); when \(\delta =1\) or \(\gamma =1\), we set \(n_0=1\).

Theorem 2

Let \(\delta , \gamma \in \{0,1\}\) be fixed and let \(a_1\), \(b_1\), \(r_1\) be the corresponding constants defined by (5). If \(n \ge n_0\), then the estimator \({\widehat{F}}_{A}=a_1\sum _{i=1}^{n} {\mathbb {1}}\left( X_i \le s, Y_i \le t\right) +b_1\) of F is minimax under the weighted Cramér–von Mises loss function \(L_2({\hat{F}}, F)\) and the minimax risk is given by

$$\begin{aligned} \rho _N=\inf _{{\widehat{F}} \in \mathcal{D} } \sup _{F \in \mathcal{F}} R_2({\widehat{F}}, F) = \sup _{F \in \mathcal{F}} R_2({\widehat{F}}_{A}, F) =r_1. \end{aligned}$$

Proof

Lemma 2 implies that for any integer \(m>1\) and any \((\alpha _0,\ldots ,\alpha _m) \in \mathcal{A}_m^{\delta ,\gamma }\), we have

$$\begin{aligned} r_1= & {} \sup _{F \in \mathcal{F}} R_2({\widehat{F}}_{A}, F) \ge \inf _{d \in \mathcal{D}_m} \sup _{ p \in \mathcal{S}_m} R({\varvec{d}},{\varvec{p}}) \ge \inf _{d \in \mathcal{D}_m} {\mathbb {E}}_{\pi } \left[ \,R({\varvec{d}},{\varvec{p}}) \, \right] \\= & {} {\mathbb {E}}_{\pi } \left[ \,R({\varvec{d}}^{\pi },{\varvec{p}}) \, \right] =r(\alpha _0,\ldots ,\alpha _m), \end{aligned}$$

where \(r(\alpha _0,\ldots ,\alpha _m)\) stands for the Bayes risk given by the right-hand side of (18). It is clear that to prove minimaxity of \({\widehat{F}}_{A}\) it suffices to find a sequence of points \(((\alpha _0^{(m)},\ldots ,\alpha _m^{(m)} )) \) such that

$$\begin{aligned}&\left( \alpha _0^{(m)},\ldots ,\alpha _m^{(m)} \right) \in \mathcal{A}_m^{\delta ,\gamma } \quad \text {for each} \quad m>1 \quad \text {and} \quad \lim _{m \rightarrow \infty } r\left( \alpha _0^{(m)},\ldots ,\alpha _m^{(m)} \right) \nonumber \\&\quad =r_1(\delta ,\gamma ) \end{aligned}$$
(19)

(see, e.g. Lehmann and Casella 1998, Theorem 1.12, page 316).

Suppose first that \(\delta =1\) and \(\gamma =0\). Then, for any integer \(m>1\) and any \((\alpha _0,\ldots ,\alpha _m) \in \mathcal{A}_m^{\delta ,\gamma }\),

$$\begin{aligned} r(\alpha _0,\ldots ,\alpha _m)= & {} \frac{\alpha _0(\alpha -\alpha _0)+\sum _{i=1}^m \alpha _i(\alpha -\alpha _0 -\alpha _i) }{\alpha (\alpha +1)(n+\alpha )} \\= & {} \frac{\alpha (\alpha -\alpha _0) - \sum _{i=1}^m \alpha _i^2}{\alpha (\alpha +1)(n+\alpha )} \end{aligned}$$

[cf. (18)]. Now, let \( (\alpha _0^{(m)},\ldots ,\alpha _m^{(m)}) = \frac{1}{m+1}\left( \sqrt{n},\ldots ,\sqrt{n} \right) \). Then, by (16), \((\alpha _0^{(m)},\ldots ,\alpha _m^{(m)}) \in \mathcal{A}_m^{\delta ,\gamma }\), because \(\alpha _0^{(m)}, \ldots , \alpha _m^{(m)}\) are positive numbers that satisfy

$$\begin{aligned} \alpha ^{(m)}-\alpha _0^{(m)}-\alpha _i^{(m)}-\gamma =\alpha ^{(m)}-\alpha _0^{(m)}-\alpha _i^{(m)}=\sqrt{n} -\frac{2\sqrt{n}}{m+1} >0, \quad i=1,\ldots m. \end{aligned}$$

To complete the proof of the theorem for the case \(\delta =1\) and \(\gamma =0\), we note that

$$\begin{aligned} r(\alpha _0^{(m)},\ldots ,\alpha _m^{(m)} )= & {} \frac{\sqrt{n}\,\left( \sqrt{n}-\frac{\sqrt{n}}{m+1}\, \right) -m\left( \frac{\sqrt{n}}{m+1} \right) ^2}{\sqrt{n}(\sqrt{n}+1)(n+\sqrt{n})} \xrightarrow [m \rightarrow \infty ]{} \frac{1}{(\sqrt{n}+1)^2} \\= & {} r_1(1,0). \end{aligned}$$

Now we consider the case \(\delta =0\) and \(\gamma =1\). Then, for any integer \(m>1\) and any \((\alpha _0,\ldots ,\alpha _m) \in \mathcal{A}_m^{\delta ,\gamma } \),

$$\begin{aligned} r(\alpha _0,\ldots ,\alpha _m)= & {} \frac{\alpha _0(\alpha _0+1)+\sum _{i=1}^m \alpha _i(\alpha _0+\alpha _i+1) }{\alpha (\alpha +1)(n+\alpha )}\\= & {} \frac{\alpha (\alpha _0+1)+ \sum _{i=1}^m \alpha _i^2}{\alpha (\alpha +1)(n+\alpha )}, \end{aligned}$$

[again cf. (18)]. Let \((\varepsilon _m)\) be any sequence of real numbers converging to zero such that \(\varepsilon _m>2/(m-1)\) when \(m>2\). Let \( (\alpha _0^{(m)},\ldots ,\alpha _m^{(m)}) =\big (\sqrt{n}-1,(1+\varepsilon _m)/m,\ldots ,(1+\varepsilon _m)/m \big )\). If \(n>1\) then, by (16), \((\alpha _0^{(m)},\ldots ,\alpha _m^{(m)}) \in \mathcal{A}_m^{\delta ,\gamma }\), because \(\alpha _0^{(m)}, \ldots , \alpha _m^{(m)}\) are positive numbers that satisfy

$$\begin{aligned} \alpha ^{(m)}-\alpha _0^{(m)}-\alpha _i^{(m)}-\gamma= & {} \left( \sqrt{n}+\varepsilon _m\right) -\left( \sqrt{n}-1\right) -\left( 1+\varepsilon _m\right) /m -1\\= & {} \varepsilon _m-(1+\varepsilon _m)/m>0 \end{aligned}$$

for each \(i=1,\ldots ,m\). A simple calculation yields

$$\begin{aligned}&r\left( \alpha _0^{(m)},\ldots ,\alpha _m^{(m)} \right) \\&\quad =\frac{\left( \sqrt{n}+\varepsilon _m\right) \sqrt{n}+m\left[ (1+\varepsilon _m)/m \right] ^2}{\left( \sqrt{n}+\varepsilon _m\right) \left( \sqrt{n}+\varepsilon _m+1\right) \left( n+\sqrt{n}+\varepsilon _m\right) } \xrightarrow [m \rightarrow \infty ]{} \frac{1}{(\sqrt{n}+1)^2} = r_1(0,1), \end{aligned}$$

which proves minimaxity of \({\widehat{F}}_{A}\) when \(\delta =0\) and \(\gamma =1\).

Suppose now that \(\delta =1\) and \(\gamma =1\). Then, for any integer \(m>1\) and any \((\alpha _0,\ldots ,\alpha _m) \in \mathcal{A}_m^{\delta ,\gamma } \), we have

$$\begin{aligned} r(\alpha _0,\ldots ,\alpha _m)= & {} \frac{\alpha _0+\sum _{i=1}^m \alpha _i }{\alpha (n+\alpha -1)} = \frac{\alpha }{\alpha (n+\alpha -1)} =\frac{1}{n+\alpha -1}. \end{aligned}$$

Define \( \displaystyle (\alpha _0^{(m)},\ldots ,\alpha _m^{(m)}) =\frac{1}{m+1}\left( 1+\varepsilon _m,\ldots ,1+\varepsilon _m \right) \), where \((\varepsilon _m)\) is the sequence given above. Then, by (16), \((\alpha _0^{(m)},\ldots ,\alpha _m^{(m)}) \in \mathcal{A}_m^{\delta ,\gamma }\), because \(\alpha _0^{(m)}, \ldots , \alpha _m^{(m)}\) are positive numbers and

$$\begin{aligned} \alpha ^{(m)}-\alpha _0^{(m)}-\alpha _i^{(m)}-\gamma =(1+\varepsilon _m) -\frac{2(1+\varepsilon _m)}{(m+1)}-1 >0 \quad i=1,\ldots ,m. \end{aligned}$$

To prove minimaxity of \({\widehat{F}}_{A}\) in the case where \(\delta =1, \gamma =1\), we note that

$$\begin{aligned} r\left( \alpha _0^{(m)},\ldots ,\alpha _m^{(m)} \right) =\frac{1}{n+1+\varepsilon _m-1} \xrightarrow [m \rightarrow \infty ]{} \frac{1}{n} = r_1(1,1). \end{aligned}$$

Consider now the last case where \(\delta =0\) and \(\gamma =0\) and assume that \(n \ge 4\). Then, for any integer \(m>1\) and any point \((\alpha _0,\ldots ,\alpha _m) \in \mathcal{A}_m^{\delta ,\gamma } \), the Bayes risk (18) can be rewritten as

$$\begin{aligned} r(\alpha _0,\ldots ,\alpha _m)= & {} \frac{\alpha _0(\alpha _0+1)(\alpha -\alpha _0)+\sum _{i=1}^m \alpha _i(\alpha _0+\alpha _i+1)(\alpha -\alpha _0-\alpha _i)}{(n+\alpha +1)\alpha (\alpha +1)(\alpha +2)} \\= & {} \frac{\alpha (\alpha _0+1)(\alpha -\alpha _0)+(\alpha -1-2\alpha _0)\sum _{i=1}^m \alpha _i^2- \sum _{i=1}^m \alpha _i^3}{(n+\alpha +1)\alpha (\alpha +1)(\alpha +2)} \end{aligned}$$

Define \(\displaystyle (\alpha _0^{(m)},\ldots ,\alpha _m^{(m)}) =\left( \frac{\sqrt{n}+\varepsilon _m}{2}-1,\frac{\sqrt{n}+\varepsilon _m}{2m},\ldots ,\frac{\sqrt{n}+\varepsilon _m}{2m}\right) \), where \((\varepsilon _m)\) is the sequence given above. Then, by (16), \((\alpha _0^{(m)},\ldots ,\alpha _m^{(m)}) \in \mathcal{A}_m^{\delta ,\gamma }\), because, for \(n \ge n_0=4\), \(\alpha _0^{(m)}, \ldots , \alpha _m^{(m)}\) are positive numbers and for each \(i=1,\ldots ,m\) the following condition holds

$$\begin{aligned} \alpha ^{(m)}-\alpha _0^{(m)}-\alpha _i^{(m)}-\gamma= & {} \left( \sqrt{n}+\varepsilon _m-1\right) -\left( \frac{\sqrt{n}+\varepsilon _m}{2}-1\right) -\frac{\sqrt{n}+\varepsilon _m}{2m}-0 \\= & {} \frac{\sqrt{n}+\varepsilon _m}{2} -\frac{\sqrt{n}+\varepsilon _m}{2m} >0. \end{aligned}$$

Since \(\alpha ^{(m)}=\sqrt{n}+\varepsilon _m-1\), we have \(\alpha ^{(m)}-1-2\alpha _0^{(m)}=0\) and \( \alpha _0^{(m)}+1=\alpha ^{(m)}-\alpha _0^{(m)} =(\sqrt{n}+\varepsilon _m)/2\), and it follows that

$$\begin{aligned}&r\left( \alpha _0^{(m)},\ldots ,\alpha _m^{(m)} \right) \\&\quad = \frac{\left( \sqrt{n}+\varepsilon _m-1\right) \left[ \,\left( \sqrt{n}+\varepsilon _m\right) /2 \, \right] ^2 - m \left[ \, \left( \sqrt{n}+\varepsilon _m\right) /(2m) \, \right] ^3 }{\left( n+\sqrt{n}+\varepsilon _m\right) \left( \sqrt{n}+\varepsilon _m-1 \right) \left( \sqrt{n}+\varepsilon _m\right) \left( \sqrt{n}+\varepsilon _m+1\right) } \xrightarrow [m \rightarrow \infty ]{} \frac{1}{4\left( \sqrt{n}+1\right) ^2} \\&\quad = r_1(0,0). \end{aligned}$$

This proves minimaxity of \({\widehat{F}}_{A}\) in the case where \(\delta =0, \gamma =0\). \(\square \)

Remark 3

Theorem 2 can be generalized to higher dimensions. In order to do this, we must modify the c.d.f.’s \(F_{m,p}\) defined at the beginning of this section. For simplicity of notation, we consider only the three-dimensional case. Let \(({\tilde{x}}_n)_{n \ge 1}\) be an increasing sequence of points from (0, 1), and let \({\tilde{x}}_0={\tilde{y}}_0={\tilde{z}}_0=0\) and \({\tilde{y}}_i = {\tilde{z}}_i =1-{\tilde{x}}_i\) for \(i \ge 1\). For any \(m \ge 1\) and any probability vector \({\varvec{p}}=(p_0,\ldots ,p_m) \in \mathcal{S}_m\), let \(F_{m,p}\) be the c.d.f. of a discrete random vector (XYZ) with the support \(\{({\tilde{x}}_0,{\tilde{y}}_0,{\tilde{z}}_0),\ldots ,({\tilde{x}}_m,{\tilde{y}}_m,{\tilde{z}}_m)\}\) and with the joint probability mass function given by

$$\begin{aligned} f_{m,p}\left( {\tilde{x}}_i,{\tilde{y}}_i,{\tilde{z}}_i\right) =\Pr \left( X={\tilde{x}}_i,Y={\tilde{y}}_i,Z={\tilde{z}}_i \right) = p_i, \quad i=0,\ldots ,m. \end{aligned}$$

Then analogously to the bivariate case, the c.d.f. \(F_{m,p}\) satisfies

$$\begin{aligned} F_{m,p}\left( {\tilde{x}}_i,{\tilde{y}}_i,{\tilde{z}}_i\right) = \Pr \left( X\le {\tilde{x}}_i, Y \le {\tilde{y}}_i,Z \le {\tilde{z}}_i\right) = \left\{ \begin{array}{ll} p_0, &{} \quad {\text {when }}i=0, \\ p_0+p_i, &{} \quad {\text {when }}i=1,\ldots , m, \end{array} \right. \end{aligned}$$

which implies that the 3-variate analog of the equality (10) holds. The rest of the generalization is straightforward.