
Statistical inference using rank-based post-stratified samples in a finite population


Abstract

In this paper, we consider statistical inference based on post-stratified samples from a finite population. We first select a simple random sample (SRS) of size n and identify the population ranks of its units. Conditioning on these population ranks, we construct probability mass functions for the sample ranks of the n units within a larger sample of size \(M > n\). The n units in the SRS are then post-stratified into d classes using these conditional sample ranks. The sample ranks are constructed from two different conditional distributions, leading to two different sampling designs. The first design uses the conditional distribution given all n ordered population ranks. The second design uses the conditional distribution given a single (marginal) unordered population rank. The paper introduces unbiased estimators of the population mean and total, and of their variances, based on the post-stratified samples from these two designs. The conditional distributions of the sample ranks are used to construct Rao–Blackwell estimators of the population mean and total. We show that the Rao–Blackwell estimators outperform the corresponding estimators constructed from a systematic sample.


References

  • Chen M, Ahn S, Wang X, Lim J (2014) Generalized isotonized mean estimators for judgment post-stratification with multiple rankers. J Agric Biol Environ Stat 19:405–418

  • Frey J (2012) Constrained nonparametric estimation of the mean and the CDF using ranked-set sampling with a covariate. Ann Inst Stat Math 64:439–456

  • Frey J, Feeman TG (2012) An improved mean estimator for judgment post-stratification. Comput Stat Data Anal 56:418–426

  • Frey J, Feeman TG (2013) Variance estimation using judgment post-stratification. Ann Inst Stat Math 65:551–569

  • Frey J, Ozturk O (2011) Constrained estimation using judgment post-stratification. Ann Inst Stat Math 63:769–789

  • Hollander M, Wolfe DA, Chicken E (2014) Nonparametric statistical methods, 3rd edn. Wiley, Hoboken

  • MacEachern SN, Stasny EA, Wolfe DA (2004) Judgment post-stratification with imprecise rankings. Biometrics 60:207–215

  • Ozturk O (2013) Combining multi-observer information in partially rank-ordered judgment post-stratified and ranked set samples. Can J Stat 41:304–324

  • Ozturk O (2014) Statistical inference for population quantiles and variance in judgment post-stratified samples. Comput Stat Data Anal 77:188–205

  • Ozturk O (2016) Statistical inference based on judgment post-stratified samples in finite population. Surv Methodol 42:239–262

  • Ozturk O (2017) Statistical inference with empty strata in judgment post-stratified samples. Ann Inst Stat Math 69:1029–1057

  • Ozturk O, Bayramoglu-Kavlak K (2018a) Model based inference using ranked set samples. Surv Methodol 44:1–16

  • Ozturk O, Bayramoglu-Kavlak K (2018b) Model based inference using judgment post-stratified samples. Sankhya (under review)

  • Stokes SL, Wang X, Chen M (2007) Judgment post-stratification with multiple rankers. J Stat Theory Appl 6:344–359

  • Wang X, Wang K, Lim J (2012) Isotonized CDF estimation from judgment poststratification data with empty strata. Biometrics 68:194–202

  • Wang X, Stokes SL, Lim J, Chen M (2006) Concomitants of multivariate order statistics with application to judgment post-stratification. J Am Stat Assoc 101:1693–1704

  • Wang X, Lim J, Stokes SL (2008) A nonparametric mean estimator for judgment post-stratified data. Biometrics 64:355–363

Author information

Correspondence to Omer Ozturk.

Appendix

Proof of Theorem 1:

The first expression follows from Eq. (1). For the second equation, we observe that

$$\begin{aligned} P(R_{s_i}^{A_1}=u,s_i=k)&= P(R_{s_i}^{A_1}=u|s_i=k)P(s_i=k)\nonumber \\&= \frac{\binom{k-i}{u-i}\binom{N-n-k+i}{M-n-u+i}}{\binom{N-n}{M-n}}\, \frac{\binom{k-1}{i-1}\binom{N-k}{n-i}}{\binom{N}{n}} \nonumber \\&= \frac{\binom{N-k}{M-u}\binom{M-u}{n-i}\binom{k-1}{u-1}\binom{u-1}{i-1}}{\binom{N-n}{M-n}\binom{N}{n}}, \nonumber \\ P(R_{s_i}^{A_1}=u)&= \sum _{k=1}^N \frac{\binom{N-k}{M-u}\binom{M-u}{n-i}\binom{k-1}{u-1}\binom{u-1}{i-1}}{\binom{N-n}{M-n}\binom{N}{n}} = \frac{\binom{u-1}{i-1}\binom{M-u}{n-i}}{\binom{M}{n}}. \end{aligned}$$
(7)
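Eq. (7) can be sanity-checked numerically. The sketch below (Python; the values of N, M and n are arbitrary test choices, not from the paper) sums the joint probability over k, compares the result with the closed-form marginal, and checks that the marginal pmf sums to one.

```python
from math import comb

def C(a, b):                      # binomial coefficient, zero outside its range
    return comb(a, b) if 0 <= b <= a else 0

N, M, n = 20, 12, 5               # arbitrary small design with N > M > n

def joint(u, k, i):               # P(R_{s_i}^{A_1} = u, s_i = k) as factored above
    return (C(k - i, u - i) * C(N - n - k + i, M - n - u + i) / C(N - n, M - n)
            * C(k - 1, i - 1) * C(N - k, n - i) / C(N, n))

for i in range(1, n + 1):
    total = 0.0
    for u in range(1, M + 1):
        lhs = sum(joint(u, k, i) for k in range(1, N + 1))
        rhs = C(u - 1, i - 1) * C(M - u, n - i) / C(M, n)
        assert abs(lhs - rhs) < 1e-12
        total += rhs
    assert abs(total - 1.0) < 1e-12   # the marginal pmf of R_{s_i}^{A_1} sums to one
print("Eq. (7) verified")
```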

We now consider the conditional distribution of \(X_{s_i}\) given \(R_{s_i}^{A_1}\). Using Eq. (7), \(P(R^{A_1}_{s_i}=u)\), and the identity \(\binom{N}{n}\binom{N-n}{M-n}=\binom{N}{M}\binom{M}{n}\), we write

$$\begin{aligned} P(X_{s_i}=x_k|R_{s_i}^{A_1}=u)&= \frac{P(X_{s_i}=x_k,R_{s_i}^{A_1}=u)}{P(R_{s_i}^{A_1}=u)}\\&= \frac{\binom{N-k}{M-u}\binom{M-u}{n-i}\binom{k-1}{u-1}\binom{u-1}{i-1}}{\binom{N-n}{M-n}\binom{N}{n}}\Bigg / \frac{\binom{u-1}{i-1}\binom{M-u}{n-i}}{\binom{M}{n}} \\&= \frac{\binom{k-1}{u-1}\binom{N-k}{M-u}}{\binom{N}{M}}. \end{aligned}$$

For the joint distribution of the sample ranks in Eq. (4) for design \(A_1\), we first observe that

$$\begin{aligned} P(s_i=k,s_j=t)= \frac{\binom{k-1}{i-1}\binom{t-k-1}{j-i-1}\binom{N-t}{n-j}}{\binom{N}{n}}. \end{aligned}$$

The joint distribution of the sample and population ranks is then given by

$$\begin{aligned} P(R_{s_i}^{A_1}=u, R_{s_j}^{A_1}=u',s_i=k,s_j=t)&= \frac{\binom{k-1}{i-1}\binom{t-k-1}{j-i-1}\binom{N-t}{n-j}}{\binom{N}{n}}\, \frac{\binom{k-i}{u-i}\binom{t-k-j+i}{u'-u-j+i}\binom{N-t-n+j}{M-n+j-u'}}{\binom{N-n}{M-n}}\\&= \frac{\binom{k-1}{u-1}\binom{u-1}{i-1}\binom{t-k-1}{u'-u-1}\binom{u'-u-1}{j-i-1}\binom{N-t}{M-u'}\binom{M-u'}{n-j}}{\binom{N}{n}\binom{N-n}{M-n}}. \end{aligned}$$

The marginal distribution of sample ranks then follows from summation over k and t

$$\begin{aligned} P(R_{s_i}^{A_1}=u, R_{s_j}^{A_1}=u')&= \sum _{t=1}^N \sum _{k=1}^{t-1}\frac{\binom{k-1}{u-1}\binom{u-1}{i-1}\binom{t-k-1}{u'-u-1}\binom{u'-u-1}{j-i-1}\binom{N-t}{M-u'}\binom{M-u'}{n-j}}{\binom{N}{n}\binom{N-n}{M-n}} \\&= \frac{\binom{N}{M}\binom{u-1}{i-1}\binom{u'-u-1}{j-i-1}\binom{M-u'}{n-j}}{\binom{N}{n}\binom{N-n}{M-n}} = \frac{\binom{u-1}{i-1}\binom{u'-u-1}{j-i-1}\binom{M-u'}{n-j}}{\binom{M}{n}}, \end{aligned}$$

where the last equality uses the identity \(\binom{N}{n}\binom{N-n}{M-n}=\binom{N}{M}\binom{M}{n}\) again.

The conditional distribution of \(X_{s_i}\) and \(X_{s_j}\) given their sample ranks is obtained as follows

$$\begin{aligned} P(X_{s_i}=x_k,X_{s_j}=x_t|R_{s_i}^{A_1}=u, R_{s_j}^{A_1}=u')&= \frac{P(R_{s_i}^{A_1}=u, R_{s_j}^{A_1}=u',s_i=k,s_j=t)}{P(R_{s_i}^{A_1}=u, R_{s_j}^{A_1}=u')}\\&= \frac{\binom{k-1}{u-1}\binom{t-k-1}{u'-u-1}\binom{N-t}{M-u'}}{\binom{N}{M}}. \end{aligned}$$

\(\square \)
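As with Eq. (7), the joint and conditional rank distributions above can be checked by brute force. The sketch below (arbitrary small N, M, n and positions i, j; not from the paper) verifies that the double sum collapses to the closed form, that the joint pmf of the two sample ranks sums to one, and that the conditional distribution of \((X_{s_i},X_{s_j})\) is a proper pmf.

```python
from math import comb

def C(a, b):                      # binomial coefficient, zero outside its range
    return comb(a, b) if 0 <= b <= a else 0

N, M, n = 15, 10, 4               # arbitrary test values, N > M > n
i, j = 2, 4                       # positions 1 <= i < j <= n

def quad(u, up, k, t):            # P(R_{s_i}=u, R_{s_j}=u', s_i=k, s_j=t)
    return (C(k-1, u-1) * C(u-1, i-1) * C(t-k-1, up-u-1) * C(up-u-1, j-i-1)
            * C(N-t, M-up) * C(M-up, n-j)) / (C(N, n) * C(N-n, M-n))

total = 0.0
for u in range(1, M + 1):
    for up in range(u + 1, M + 1):
        marg = sum(quad(u, up, k, t) for t in range(2, N + 1) for k in range(1, t))
        closed = C(u-1, i-1) * C(up-u-1, j-i-1) * C(M-up, n-j) / C(M, n)
        assert abs(marg - closed) < 1e-12
        total += marg
        if closed > 0:            # conditional pmf of (X_{s_i}, X_{s_j}) sums to one
            cond = sum(C(k-1, u-1) * C(t-k-1, up-u-1) * C(N-t, M-up)
                       for t in range(2, N + 1) for k in range(1, t)) / C(N, M)
            assert abs(cond - 1.0) < 1e-12
assert abs(total - 1.0) < 1e-12   # joint pmf of the two sample ranks sums to one
print("joint and conditional rank distributions verified")
```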

Proof of Lemma 1:

For the proofs of (i) and (ii) when \(q=1\), we first observe that \(\frac{I_h^{A_1}I_{hu}^{A_1}}{n_h^{A_1}d_n^{A_1}}; u \in D_h\), are identically distributed. Let \(A= E\left( \frac{I_h^{A_1}I_{hu}^{A_1}}{n_h^{A_1}d_n^{A_1}}\right) \) for \( u \in D_h\). Note that the following equality holds

$$\begin{aligned} \sum _{u \in D_h} \frac{I_h^{A_1}I_{hu}^{A_1}}{n_h^{A_1}d_n^{A_1}}= \frac{I_h^{A_1}}{d_n^{A_1}}. \end{aligned}$$

We take the expected value on both sides of the above equation and write

$$\begin{aligned} E\left\{ \sum _{u \in D_h} \frac{I_h^{A_1}I_{hu}^{A_1}}{n_h^{A_1}d_n^{A_1}} \right\}&= E\left( \frac{I_h^{A_1}}{d_n^{A_1}}\right) , \\ \sum _{u \in D_h} E \left( \frac{I_h^{A_1}I_{hu}^{A_1}}{n_h^{A_1}d_n^{A_1}}\right)&= E\left( \frac{I_h^{A_1}}{d_n^{A_1}}\right) , \\ H E \left( \frac{I_h^{A_1}I_{h1}^{A_1}}{n_h^{A_1}d_n^{A_1}}\right)&= E\left( \frac{I_h^{A_1}}{d_n^{A_1}}\right) , \\ A&= \frac{1}{H} E\left( \frac{I_h^{A_1}}{d_n^{A_1}}\right) . \end{aligned}$$

We again observe that \(I_h^{A_1}/d_n^{A_1}\); \(h=1,\ldots , d\), are identically distributed. Let \(B=E(I_h^{A_1}/d_n^{A_1})\); \(h=1,\ldots ,d\). Since \(\sum _{h=1}^d I_h^{A_1}/d_n^{A_1}=1\), it follows that \(dB=1\), i.e., \(B= 1/d\). When \(q=2\), the proof of (i) is given in Ozturk (2014). This completes the proofs of (i) and (ii).

Proof of (iii): To simplify the notation, we drop the superscript \(A_1\) in parts (iii) and (iv). All random variables in these parts are constructed from design \(A_1\), even though this is not stated explicitly. We first observe that

$$\begin{aligned}&\displaystyle \sum _{u \in D_h} \sum _{v \in D_h} \frac{I_h^2I_{hu}I_{hv}}{d_n^2n_h^2} = \frac{I_h^2}{d_n^2}, \\&\displaystyle \sum _{u \in D_h}\sum _{(v \ne u) \in D_h} \frac{I_h^2I_{hu}I_{hv}}{d_n^2n_h^2} + \sum _{u \in D_h} \frac{I_h^2I^2_{hu}}{d_n^2n_h^2} =\frac{I_h^2}{d_n^2}. \end{aligned}$$

Taking the expected value on both sides of the above equation, after some simplification, we obtain

$$\begin{aligned} {\text {Cov}}\left( \frac{I_1I_{11}}{d_nn_1},\frac{I_1I_{12}}{d_nn_1}\right) = -\frac{1}{H-1} {\text {Var}}\left( \frac{I_1I_{11}}{d_nn_1}\right) +{\text {Var}}\left( I_1/d_n\right) \frac{1}{H\left( H-1\right) }. \end{aligned}$$

Proof of (iv): We again observe the following equation

$$\begin{aligned} \sum _{u \in D_h} \sum _{v \in D_{h'}} \frac{I_{h}I_{hu}I_{h'}I_{h'v}}{n_hn_{h'}d_n^2} = \frac{I_hI_{h'}}{d_n^2}. \end{aligned}$$

Taking the expected values on both sides, we write

$$\begin{aligned} H^2E\left( \frac{I_{h}I_{ha}I_{h'}I_{h'b}}{n_hn_{h'}d_n^2}\right)&= E(I_hI_{h'}/d_n^2)\nonumber \\ {\text {Cov}}\left( \frac{I_{h}I_{ha}}{d_nn_h}, \frac{I_{h'}I_{h'b}}{d_nn_{h'}}\right) + \frac{1}{d^2H^2}&= \frac{1}{H^2}\left\{ {\text {Cov}}(I_h/d_n, I_{h'}/d_n)+\frac{1}{d^2}\right\} \nonumber \\ {\text {Cov}}\left( \frac{I_{h}I_{ha}}{d_nn_h}, \frac{I_{h'}I_{h'b}}{d_nn_{h'}}\right)&= -\frac{{\text {Var}}(I_1/d_n)}{H^2(d-1)}; \quad a \in D_h,\ b \in D_{h'}. \end{aligned}$$
(8)

The last equality in Eq. (8) uses \({\text {Cov}}(I_1/d_n,I_2/d_n)=-{\text {Var}}(I_1/d_n)/(d-1)\), which follows from \(\sum _{h=1}^d I_h/d_n=1\) and the exchangeability of the \(I_h/d_n\). The proofs of (v) and (vi) are given in Ozturk (2014) in a slightly different context. \(\square \)
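Parts (i)–(iv) can be checked by exact enumeration. Under design \(A_1\), the set of sample ranks is uniform over the n-subsets of \(\{1,\ldots ,M\}\) (consistent with the marginal and pairwise rank distributions derived in Theorem 1), so all indicator moments are finite averages over \(\binom{M}{n}\) equally likely subsets. The sketch below is a verification aid only; M, n and d are arbitrary choices with \(M=dH\).

```python
from itertools import combinations
from math import comb

M, n, d = 12, 5, 3                 # arbitrary test values with M = d * H
H = M // d                         # strata D_h = {hH + 1, ..., (h + 1)H}

r_a, r_12, r_cross, r_b = [], [], [], []
for R in combinations(range(1, M + 1), n):
    nh = [0] * d                   # stratum counts n_h
    for u in R:
        nh[(u - 1) // H] += 1
    dn = sum(c > 0 for c in nh)    # number of non-empty strata d_n
    r_b.append((nh[0] > 0) / dn)                                   # I_1/d_n
    r_a.append((1 in R) / (nh[0] * dn) if nh[0] else 0.0)          # I_1 I_11/(n_1 d_n)
    r_12.append((2 in R) / (nh[0] * dn) if nh[0] else 0.0)         # I_1 I_12/(n_1 d_n)
    r_cross.append((H + 1 in R) / (nh[1] * dn) if nh[1] else 0.0)  # I_2 I_21/(n_2 d_n)

mean = lambda v: sum(v) / len(v)
var = lambda v: mean([w * w for w in v]) - mean(v) ** 2
cov = lambda v, w: mean([a * b for a, b in zip(v, w)]) - mean(v) * mean(w)

assert abs(mean(r_b) - 1 / d) < 1e-10                              # part (ii)
assert abs(mean(r_a) - mean(r_b) / H) < 1e-10                      # part (i)
assert abs(cov(r_a, r_12)                                          # part (iii)
           - (-var(r_a) / (H - 1) + var(r_b) / (H * (H - 1)))) < 1e-10
assert abs(cov(r_a, r_cross)                                       # part (iv), Eq. (8)
           + var(r_b) / (H ** 2 * (d - 1))) < 1e-10
print("Lemma 1 identities verified")
```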

Proof of Theorem 2:

We only consider the unbiasedness of the estimator \(\bar{X}_n^{A_1}\). The proof for the other estimator is similar. To simplify the notation, we again drop the superscript \(A_1\) in \(\bar{X}_n^{A_1}\). Using conditional expectation, we write

$$\begin{aligned} E(\bar{X}_n)= & {} E \left\{ \frac{1}{d_n} \sum _{h=1}^d \frac{I_h}{n_h} \sum _{u \in D_h} \sum _{i=1}^n I(R_{s_i}=u) E(X_{s_i}|R_{s_i}=u)\right\} \\= & {} E \left\{ \frac{1}{d_n} \sum _{h=1}^d \frac{I_h}{n_h} \sum _{u \in D_h} \sum _{i=1}^n I(R_{s_i}=u) \mu _{u:M} \right\} . \end{aligned}$$

Note that \(\sum _{i=1}^n I(R_{s_i}=u)= I_{hu}\) for \(u \in D_h\). Using this notation, the expected value of \(\bar{X}_n\) reduces to

$$\begin{aligned} E(\bar{X}_n)= & {} E \left\{ \frac{1}{d_n} \sum _{h=1}^d \frac{I_h}{n_h} \sum _{u \in D_h} I_{hu} \mu _{u:M} \right\} \end{aligned}$$
(9)

The ratios \( \frac{I_hI_{hu}}{n_hd_n}\); \(u \in D_h\) are identically distributed. From Lemma 1, we have \( E(\frac{I_hI_{hu}}{n_hd_n})= \frac{1}{H} E(\frac{I_h}{d_n})\), for \(u \in D_h\). Using this equality, we write

$$\begin{aligned} E(\bar{X}_n)&= E \left\{ \frac{1}{d_n} \sum _{h=1}^d \frac{I_h}{n_h} \sum _{u \in D_h} I_{hu} \mu _{u:M} \right\} \\&= E\left\{ \sum _{h=1}^d \sum _{u \in D_h}\frac{I_h I_{hu}}{n_hd_n} \mu _{u:M} \right\} = \sum _{h=1}^d\sum _{u \in D_h}E\left\{ \frac{I_hI_{h1}}{n_hd_n}\right\} \mu _{u:M} \\&= \sum _{h=1}^d \frac{1}{H}\sum _{u \in D_h} E\left( \frac{I_h}{d_n}\right) \mu _{u:M}. \end{aligned}$$

Again \( \frac{I_h}{d_n}\); \(h=1,\ldots, d\), are identically distributed and \(E(\frac{I_1}{d_n}) =1/d\). Hence, we write

$$\begin{aligned} E(\bar{X}_n)= \sum _{h=1}^d \frac{1}{H}\sum _{u \in D_h} E(\frac{I_h}{d_n}) \mu _{u:M}= \frac{1}{dH} \sum _{h=1}^d \sum _{u \in D_h} \mu _{u:M} = \mu . \end{aligned}$$

This completes the proof. \(\square \)
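The final step uses \(\sum _{h=1}^d \sum _{u \in D_h} \mu _{u:M}=\sum _{u=1}^M \mu _{u:M}=M\mu =dH\mu \), since the M order statistics exhaust the size-M sample. A short numerical confirmation (the population values below are made up):

```python
from math import comb

def C(a, b):
    return comb(a, b) if 0 <= b <= a else 0

x = [0.5, 1.9, 2.2, 3.1, 4.4, 5.3, 6.7, 7.2, 8.0, 9.9]   # made-up sorted population
N, M = len(x), 6
# mu_{u:M} from the pmf of the u-th order statistic of an SRSWOR of size M
mu_uM = [sum(x[k - 1] * C(k - 1, u - 1) * C(N - k, M - u) for k in range(1, N + 1))
         / C(N, M) for u in range(1, M + 1)]
assert abs(sum(mu_uM) / M - sum(x) / N) < 1e-10           # (1/(dH)) sum_u mu_{u:M} = mu
print("order-statistic means average to the population mean")
```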

Proof of Theorem 3:

Again to simplify the notation, and without loss of generality, we write \(\bar{X}_n=\bar{X}_n^{A_1}\) and drop the superscript \(A_1\). Using the law of total variance, we write

$$\begin{aligned} {\text {Var}}(\bar{X}_n)= {\text {Var}}( E(\bar{X}_n|R))+ E( {\text {Var}}(\bar{X}_n|R)) \end{aligned}$$

From Eq. (9), we write

$$\begin{aligned} E(\bar{X}_n|R)= \sum _{h=1}^d \sum _{u \in D_h} \frac{I_hI_{hu}}{d_nn_h} \mu _{u:M} = \sum _{h=1}^d T_h, \quad T_h=\sum _{u \in D_h} \frac{I_hI_{hu}}{d_nn_h} \mu _{u:M}. \end{aligned}$$

Using the above expression, we obtain

$$\begin{aligned} {\text {Var}}(E(\bar{X}_n|R))= \sum _{h=1}^d {\text {Var}}(T_h) + \sum _{h=1}^d \sum _{h' \ne h} ^d {\text {Cov}}(T_h,T_{h'}) \end{aligned}$$

The variance of \(T_h\) can be expressed as

$$\begin{aligned} {\text {Var}}(T_h)&= {\text {Var}}\left\{ \sum _{u \in D_h} \frac{I_hI_{hu}}{d_nn_h} \mu _{u:M}\right\} \\&= \sum _{u \in D_h} \mu _{u:M}^2 {\text {Var}}\left( \frac{I_hI_{hu}}{d_nn_h}\right) +\sum _{(u \ne v) \in D_h}\mu _{u:M} \mu _{v:M} {\text {Cov}}\left( \frac{I_hI_{hu}}{d_nn_h}, \frac{I_hI_{hv}}{d_nn_h}\right). \end{aligned}$$

The ratios \(\frac{I_hI_{hu}}{d_nn_h}\); \(u \in D_h\), are identically distributed. The variance of \(T_h\) then reduces to

$$\begin{aligned} {\text {Var}}(T_h)= & {} {\text {Var}}\left( \frac{I_hI_{h1}}{d_nn_h}\right) \sum _{u \in D_h} \mu _{u:M}^2+ {\text {Cov}}\left( \frac{I_hI_{h1}}{d_nn_h}, \frac{I_hI_{h2}}{d_nn_h}\right) \sum _{(u \ne v) \in D_h}\mu _{u:M} \mu _{v:M} \\= & {} {\text {Var}}\left( \frac{I_hI_{h1}}{d_nn_h}\right) \sum _{u \in D_h} \mu _{u:M}^2 +\left\{ \frac{{\text {Var}}(\frac{I_1}{d_n})}{H}-{\text {Var}}(\frac{I_hI_{h1}}{d_nn_h})\right\} \sum _{(u \ne v) \in D_h}\frac{\mu _{u:M} \mu _{v:M}}{H-1} \\= & {} {\text {Var}}\left( \frac{I_hI_{h1}}{d_nn_h}\right) \left\{ \sum _{u \in D_h} \mu _{u:M}^2 - \sum _{(u \ne v) \in D_h}\frac{\mu _{u:M} \mu _{v:M}}{H-1}\right\} \\&\quad + \frac{{\text {Var}}(\frac{I_1}{d_n})}{H(H-1)} \sum _{(u \ne v) \in D_h}\mu _{u:M} \mu _{v:M}. \end{aligned}$$

We now observe that

$$\begin{aligned} \sum _{(u \ne v) \in D_h}\mu _{u:M} \mu _{v:M} =\left( \sum _{u \in D_h} \mu _{u:M}\right) ^2- \sum _{u \in D_h}\mu _{u:M}^2= \mu _h^2- \sum _{u \in D_h}\mu _{u:M}^2, \end{aligned}$$

where \(\mu _h =\sum _{u \in D_h} \mu _{u:M}\). Using the equation above, we write

$$\begin{aligned} {\text {Var}}(T_h)= & {} \frac{{\text {Var}}(\frac{I_hI_{h1}}{d_nn_h})}{H-1}\left\{ H \sum _{u \in D_h}\mu _{u:M}^2 - \mu _h^2 \right\} + \frac{{\text {Var}}(\frac{I_1}{d_n})}{H(H-1)} \left\{ \mu _h^2 -\sum _{u \in D_h}\mu _{u:M}^2 \right\} \nonumber \\= & {} \frac{{\text {Var}}(\frac{I_hI_{h1}}{d_nn_h})}{H-1} H\sum _{u \in D_h}(\mu _{u:M}-\bar{\mu }_h)^2 \nonumber \\&+ \, {\text {Var}}(\frac{I_1}{d_n}) \left\{ \frac{H\bar{\mu }^2_h}{H-1}- \frac{\sum _{u \in D_h} \mu _{u:M}^2}{H(H-1)}\right\} , \end{aligned}$$
(10)

where \(\bar{\mu }_h=\mu _h/H\).
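Eq. (10) is pure algebra once part (iii) of Lemma 1 is in hand; it can be spot-checked with made-up inputs (H, the two variances, and the \(\mu _{u:M}\) values below are arbitrary placeholders, not quantities from any design):

```python
H = 4
Va, Vb = 0.013, 0.021               # stand-ins for Var(I_h I_h1/(d_n n_h)), Var(I_1/d_n)
mus = [1.3, 2.9, 4.1, 7.6]          # stand-ins for mu_{u:M}, u in D_h
Cab = -Va / (H - 1) + Vb / (H * (H - 1))              # Lemma 1, part (iii)
s2 = sum(m * m for m in mus)
off = sum(mi * mj for mi in mus for mj in mus) - s2   # sum over u != v
lhs = Va * s2 + Cab * off           # Var(T_h) before simplification
mb = sum(mus) / H                   # bar-mu_h
rhs = (Va * H / (H - 1) * sum((m - mb) ** 2 for m in mus)
       + Vb * (H * mb ** 2 / (H - 1) - s2 / (H * (H - 1))))   # Eq. (10)
assert abs(lhs - rhs) < 1e-12
```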

In a similar fashion, we can write

$$\begin{aligned} {\text {Cov}}(T_h,T_{h'})= & {} {\text {Cov}}\left( \sum _{u \in D_h} \frac{I_hI_{hu} \mu _{u:M}}{d_nn_h}, \sum _{v \in D_{h'}} \frac{I_{h'}I_{h'v} \mu _{v:M}}{d_nn_{h'}}\right) \\= & {} \sum _{u \in D_h} \sum _{v \in D_{h'}} \mu _{u:M} \mu _{v:M} {\text {Cov}}\left( \frac{I_hI_{hu} }{d_nn_h} , \frac{I_{h'}I_{h'v} }{d_nn_{h'}}\right) \\= & {} {\text {Cov}}\left( \frac{I_hI_{h1} }{d_nn_h} , \frac{I_{h'}I_{h'1}}{d_nn_{h'}}\right) \sum _{u \in D_h} \sum _{v \in D_{h'}} \mu _{u:M} \mu _{v:M} \end{aligned}$$

Note that from Lemma  1, we have

$$\begin{aligned} {\text {Cov}}\left( \frac{I_hI_{h1} }{d_nn_h} , \frac{I_{h'}I_{h'1}}{d_nn_{h'}}\right) =-\frac{1}{H^2(d-1)} {\text {Var}}(\frac{I_1}{d_n}). \end{aligned}$$

Using the above expression, we obtain

$$\begin{aligned} {\text {Cov}}(T_h,T_{h'}) = -\frac{1}{H^2(d-1)} {\text {Var}}(\frac{I_1}{d_n}) \sum _{u \in D_h} \sum _{v \in D_{h'}} \mu _{u:M} \mu _{v:M} \end{aligned}$$

and

$$\begin{aligned} \sum _{h=1}^d \sum _{h' \ne h}^d {\text {Cov}}(T_h,T_{h'})= & {} -\frac{{\text {Var}}(I_1/d_n)}{H^2(d-1)} \left\{ H^2d^2\mu ^2-H^2\sum _{h=1}^d \bar{\mu }^2_h\right\} . \end{aligned}$$
(11)

Note that Eq. (11) uses \(\sum _{h=1}^d \bar{\mu }_h = \frac{1}{H}\sum _{u=1}^M \mu _{u:M}=d\mu \). Combining Eqs. (10) and (11) in \({\text {Var}}(E(\bar{X}_n|R))\), after some simplifications, we obtain

$$\begin{aligned} {\text {Var}}(E(\bar{X}_n|R))= & {} \frac{H{\text {Var}}(\frac{I_1I_{11}}{d_nn_1})}{H-1} \sum _{h=1}^d\sum _{u \in D_h}(\mu _{u:M}-\bar{\mu }_h)^2 \\&+ {\text {Var}}(\frac{I_1}{d_n}) \sum _{h=1}^d \left\{ \frac{H\bar{\mu }^2_h}{H-1}- \frac{\sum _{u \in D_h} \mu _{u:M}^2}{H(H-1)}\right\} \\&- \frac{{\text {Var}}(I_1/d_n)}{(d-1)} \left\{ d^2\mu ^2-\sum _{h=1}^d \bar{\mu }^2_h\right\} \\= & {} \frac{H}{H-1} {\text {Var}}\left( \frac{I_1I_{11}}{d_nn_1}\right) \sum _{h=1}^d \sum _{u \in D_h}( \mu _{u:M} -\bar{\mu }_h)^2 \\&+\,{\text {Var}}(\frac{I_1}{d_n})\sum _{h=1}^d \bar{\mu }_h^2\left( \frac{H}{H-1}+\frac{1}{d-1}\right) \\&-\,{\text {Var}}(\frac{I_1}{d_n})\left\{ \sum _{h=1}^d \sum _{u \in D_h}\frac{\mu _{u:M}^2}{H(H-1)} +\frac{d^2}{d-1}\mu ^2\right\} . \end{aligned}$$

Note that \( \sum _{u \in D_h}\mu _{u:M}^2= \sum _{u \in D_h}(\mu _{u:M}-\bar{\mu }_h)^2+H\bar{\mu }^2_h\). Using this equality in the above expression and combining similar terms, we write

$$\begin{aligned} {\text {Var}}(E(\bar{X}_n|R))= & {} \frac{H}{H-1} {\text {Var}}\left( \frac{I_1I_{11}}{d_nn_1}\right) \sum _{h=1}^d \sum _{u \in D_h}\left( \mu _{u:M} -\bar{\mu }_h\right) ^2 \nonumber \\&-\,\frac{1}{H(H-1)}{\text {Var}}\left( \frac{I_1}{d_n}\right) \sum _{h=1}^d \sum _{u \in D_h}\left( \mu _{u:M}-\bar{\mu }_h\right) ^2 \nonumber \\&+{\text {Var}}(\frac{I_1}{d_n}) \left\{ \sum _{h=1}^d\bar{\mu }_h^2\left( \frac{H(d-1)+(H-1)}{(H-1)(d-1)}-\frac{1}{H-1}\right) -\frac{d^2 \mu ^2}{d-1}\right\} \nonumber \\= & {} \sum _{h=1}^d \sum _{u \in D_h}( \mu _{u:M} -\bar{\mu }_h)^2 \left\{ \frac{H}{H-1} {\text {Var}}(\frac{I_1I_{11}}{d_nn_1})- \frac{1}{H(H-1)}{\text {Var}}\left( \frac{I_1}{d_n}\right) \right\} \nonumber \\&+\,\frac{d}{d-1} {\text {Var}}(\frac{I_1}{d_n})\sum _{h=1}^d(\bar{\mu }_h -\mu )^2. \end{aligned}$$
(12)

We now consider \(E({\text {Var}}(\bar{X}_n|R))\). Let

$$\begin{aligned} L_h= \sum _{u \in D_h}\sum _{i=1}^n \frac{I(R_{s_i}=u)}{d_nn_h}X_{s_i}. \end{aligned}$$

The conditional variance of \(\bar{X}_n\) given ranks R can be written as follows

$$\begin{aligned} {\text {Var}}(\bar{X}_n|R)= {\text {Var}}\left( \sum _{h=1}^d L_h\right) = \sum _{h=1}^d{\text {Var}}(L_h) +\sum _{h=1}^d \sum _{h'\ne h}^d {\text {Cov}}(L_h,L_{h'}). \end{aligned}$$

For the joint rankings, the variance of \(L_h\) is given by

$$\begin{aligned} {\text {Var}}(L_h)= \sum _{u \in D_h} \frac{I_h^2I_{hu}^2}{d_n^2n_h^2} \sigma ^2_{u:M}+ \sum _{(u \ne v) \in D_h} \frac{I_h^2 I_{hu}I_{hv}}{d_n^2n_h^2} \sigma _{u,v:M}^2 \end{aligned}$$

From the above equations, we write

$$\begin{aligned} E({\text {Var}}(L_h))= & {} E\left( \frac{I_h^2I_{ha}^2}{d_n^2n_h^2}\right) \sum _{u \in D_h} \sigma ^2_{u:M} +E\left( \frac{I_h^2I_{ha}I_{hb}}{d_n^2n_h^2}\right) \sum _{(u \ne v) \in D_h} \sigma _{u,v:M}^2; \quad (a,b) \in D_h\\= & {} {\text {Var}}\left( \frac{I_hI_{ha}}{d_nn_h}\right) \sum _{u \in D_h} \sigma ^2_{u:M} + {\text {Cov}}\left( \frac{I_hI_{ha}}{d_nn_h},\frac{I_hI_{hb}}{d_nn_h}\right) \sum _{(u \ne v) \in D_h} \sigma _{u,v:M}^2\\&+ \frac{1}{d^2H^2} \sum _{(u , v) \in D_h} \sigma _{u,v:M}^2; \quad (a,b) \in D_h \end{aligned}$$

Using part (iv) of Lemma 1, after some simplification, we obtain

$$\begin{aligned} E({\text {Var}}(L_h))= & {} \frac{\sum _{(u,v) \in D_h}\sigma _{u,v:M}^2}{H-1} \left\{ \frac{H-1}{d^2H^2}+\frac{{\text {Var}}(I_h/d_n)}{H}-{\text {Var}}\left( \frac{I_hI_{ha}}{d_nn_h}\right) \right\} \\&+ \,\frac{\sum _{u \in D_h} \sigma _{u:M}^2}{H-1} \left\{ H {\text {Var}}\left( \frac{I_hI_{ha}}{d_nn_h}\right) -{\text {Var}}(I_h/d_n)/H\right\} . \end{aligned}$$

For \(h \ne h'\), using similar arguments, we obtain

$$\begin{aligned} E({\text {Cov}}(L_h,L_{h'}))= & {} \sum _{u \in D_h} \sum _{v \in D_{h'}} E\left( \frac{I_hI_{h'}I_{hu}I_{h'v}}{d_n^2n_hn_{h'}}\right) \sigma ^2_{u,v:M} \\= & {} E\left( \frac{I_hI_{h'}I_{ha}I_{h'b}}{d_n^2n_hn_{h'}}\right) \sum _{u \in D_h} \sum _{v \in D_{h'}} \sigma _{u,v:M}^2; \quad a \in D_h, b \in D_{h'} \\= & {} \frac{1}{H^2} \left[ {\text {Cov}}(I_h/d_n,I_{h'}/d_n)+1/d^2\right] \sum _{u \in D_h} \sum _{v \in D_{h'}} \sigma _{u,v:M}^2 \end{aligned}$$

The last equality follows from Eq. (8). Using the equality \({\text {Cov}}(I_1/d_n,I_2/d_n)=-{\text {Var}}(I_1/d_n)/(d-1)\), we obtain

$$\begin{aligned} E({\text {Cov}}(L_h,L_{h'}))= \frac{1}{H^2d^2 }\sum _{u \in D_h} \sum _{v \in D_{h'}} \sigma _{u,v:M}^2-\frac{{\text {Var}}(I_1/d_n)}{H^2(d-1)} \sum _{u \in D_h} \sum _{v \in D_{h'}} \sigma _{u,v:M}^2 \end{aligned}$$

Combining the variances and covariances, after some simplifications, we obtain

$$\begin{aligned} E({\text {Var}}(\bar{X}_n|R))= & {} \frac{M(N-M)\sigma ^2}{NH^2}\left\{ 1/d^2-{\text {Var}}(I_1/d_n)/(d-1)\right\} \\&+\sum _{h=1}^d \sum _{(u,v) \in D_h}\frac{\sigma _{u,v:M}^2}{H-1} \left\{ \frac{(Hd-1){\text {Var}}(I_1/d_n)}{H^2(d-1)}-{\text {Var}}(\frac{I_1I_{11}}{d_nn_1})\right\} \\&+\sum _{h=1}^d \sum _{u \in D_h}\frac{\sigma ^2_{u:M}}{H(H-1)}\left\{ H^2{\text {Var}}(\frac{I_1I_{11}}{d_nn_1})-{\text {Var}}(I_1/d_n)\right\} . \end{aligned}$$

Finally, the variance of \(\bar{X}_n\) is obtained as

$$\begin{aligned} {\text {Var}}(\bar{X}_n)= & {} {\text {Var}}(E(\bar{X}_n|R))+E({\text {Var}}(\bar{X}_n|R)) \\= & {} \sum _{h=1}^d \sum _{u \in D_h}\frac{( \mu _{u:M} -\bar{\mu }_h)^2}{H-1} \left\{ H {\text {Var}}\left( \frac{I_1I_{11}}{d_nn_1}\right) -\frac{{\text {Var}}(\frac{I_1}{d_n})}{H} \right\} \\&+\,\frac{d\,{\text {Var}}(\frac{I_1}{d_n})}{d-1} \sum _{h=1}^d (\bar{\mu }_h-\mu )^2 \\&+\,\frac{M(N-M)\sigma ^2}{NH^2}\left\{ 1/d^2-{\text {Var}}(I_1/d_n)/(d-1)\right\} \\&+\,\sum _{h=1}^d \sum _{(u,v) \in D_h}\frac{\sigma _{u,v:M}^2}{H-1} \left\{ \frac{(Hd-1){\text {Var}}(I_1/d_n)}{H^2(d-1)}-{\text {Var}}\left( \frac{I_1I_{11}}{d_nn_1}\right) \right\} \\&+\,\sum _{h=1}^d \sum _{u \in D_h}\frac{\sigma ^2_{u:M}}{H(H-1)}\left\{ H^2{\text {Var}}\left( \frac{I_1I_{11}}{d_nn_1}\right) -{\text {Var}}(I_1/d_n)\right\} . \end{aligned}$$
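Because every quantity in this formula is a finite average, the whole expression can be verified by exact enumeration on a tiny population. The sketch below is a verification aid under three assumptions not spelled out above: (a) \(\sigma ^2\) is the finite-population variance with divisor \(N-1\); (b) \(\sigma ^2_{u,v:M}\) denotes \({\text {Cov}}(X_{u:M},X_{v:M})\) and the double sum over \((u,v) \in D_h\) includes the diagonal terms \(u=v\); and (c) design \(A_1\) amounts to drawing a uniform M-subset of the population and then a uniform n-subset of it as the measured SRS. The population values and design sizes are arbitrary.

```python
from itertools import combinations
from math import comb

def C(a, b):
    return comb(a, b) if 0 <= b <= a else 0

x = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0, 19.0]   # arbitrary sorted population
N, M, n, d = len(x), 6, 4, 2
H = M // d
mu = sum(x) / N
S2 = sum((v - mu) ** 2 for v in x) / (N - 1)       # sigma^2, divisor N - 1 assumed
mean = lambda v: sum(v) / len(v)

# Exact order-statistic moments of an SRSWOR of size M from the population.
p1 = lambda u, k: C(k - 1, u - 1) * C(N - k, M - u) / C(N, M)
m1 = [sum(x[k - 1] * p1(u, k) for k in range(1, N + 1)) for u in range(1, M + 1)]
s_u = [sum(x[k - 1] ** 2 * p1(u, k) for k in range(1, N + 1)) - m1[u - 1] ** 2
       for u in range(1, M + 1)]                   # sigma^2_{u:M}
def cov_uv(u, v):                                  # sigma^2_{u,v:M} for u < v
    e = sum(x[k - 1] * x[t - 1] * C(k - 1, u - 1) * C(t - k - 1, v - u - 1)
            * C(N - t, M - v) for t in range(2, N + 1) for k in range(1, t)) / C(N, M)
    return e - m1[u - 1] * m1[v - 1]

# Indicator moments, enumerated over all C(M, n) equally likely rank sets.
ra, rb = [], []
for R in combinations(range(1, M + 1), n):
    nh = [0] * d
    for u in R:
        nh[(u - 1) // H] += 1
    dn = sum(c > 0 for c in nh)
    rb.append((nh[0] > 0) / dn)
    ra.append((1 in R) / (nh[0] * dn) if nh[0] else 0.0)
Va = mean([r * r for r in ra]) - mean(ra) ** 2     # Var(I_1 I_11/(d_n n_1))
Vb = mean([r * r for r in rb]) - mean(rb) ** 2     # Var(I_1/d_n)

# Right-hand side: the variance formula above, term by term.
mb = [sum(m1[h * H + j] for j in range(H)) / H for h in range(d)]
within = sum(s_u[u - 1] if u == v else cov_uv(min(u, v), max(u, v))
             for h in range(d)
             for u in range(h * H + 1, (h + 1) * H + 1)
             for v in range(h * H + 1, (h + 1) * H + 1))
rhs = (sum((m1[h * H + j] - mb[h]) ** 2 for h in range(d) for j in range(H))
       / (H - 1) * (H * Va - Vb / H)
       + d * Vb / (d - 1) * sum((b - mu) ** 2 for b in mb)
       + M * (N - M) * S2 / (N * H * H) * (1 / d ** 2 - Vb / (d - 1))
       + within / (H - 1) * ((H * d - 1) * Vb / (H * H * (d - 1)) - Va)
       + sum(s_u) / (H * (H - 1)) * (H * H * Va - Vb))

# Left-hand side: enumerate the design (a uniform M-subset of the population,
# then a uniform n-subset of it as the measured SRS) and compute Var directly.
vals = []
for S in combinations(range(N), M):
    xs = [x[idx] for idx in S]                     # sorted, since x is sorted
    for R in combinations(range(M), n):
        nh = [0] * d
        sh = [0.0] * d
        for r in R:
            nh[r // H] += 1
            sh[r // H] += xs[r]
        dn = sum(c > 0 for c in nh)
        vals.append(sum(sh[h] / nh[h] for h in range(d) if nh[h]) / dn)
assert abs(mean(vals) - mu) < 1e-9                 # unbiasedness (Theorem 2)
lhs = mean([v * v for v in vals]) - mean(vals) ** 2
assert abs(lhs - rhs) < 1e-9
print("Theorem 3 variance formula verified:", lhs)
```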

For the proof of \(\sigma ^2_{A_2}\), we first define a random variable \(Z_{R_{s_i},h}= I(R_{s_i} \in D_h) X_{s_i}\). The conditional distribution of \(Z_{R_{s_i},h}\) given that \(R_{s_i} \in D_h\) under design \(A_2\) is given by

$$\begin{aligned} P(Z_{R_{s_i},h}=z|R_{s_i} \in D_h) =\frac{1}{H}\sum _{u \in D_h} P(X_{u:M}=z). \end{aligned}$$

Using this conditional distribution, we can easily establish the following equalities

$$\begin{aligned} E(Z_{R_{s_i},h}|R_{s_i} \in D_h)&= \frac{1}{H} \sum _{u \in D_h} \mu _{u:M} =\bar{\mu }_h, \\ {\text {Var}}(Z_{R_{s_i},h}|R_{s_i} \in D_h)&= \frac{1}{H} \sum _{u \in D_h} \sigma ^2_{u:M} +\frac{1}{H} \sum _{u \in D_h} (\mu _{u:M} -\bar{\mu }_h)^2. \end{aligned}$$
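These are the mean and variance of a uniform mixture over the H order statistics in \(D_h\) (the law of total variance); a toy check with made-up moments:

```python
# Made-up per-rank moments for one stratum D_h with H = 3.
H = 3
mus = [1.0, 4.0, 7.0]               # stand-ins for mu_{u:M}, u in D_h
sig2 = [0.5, 1.5, 2.5]              # stand-ins for sigma^2_{u:M}
mb = sum(mus) / H                   # bar-mu_h
var_mix = sum(s + m * m for s, m in zip(sig2, mus)) / H - mb ** 2
assert abs(var_mix - (sum(sig2) / H + sum((m - mb) ** 2 for m in mus) / H)) < 1e-12
```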

Again to simplify the notation, we drop the superscript \(A_2\) in \(\bar{X}^{A_2}\). All random variables in the proof of \(\sigma ^2_{A_2}\) are defined based on design \(A_2\), even if this is not explicitly stated. From the law of total variance, we write

$$\begin{aligned} {\text {Var}}(\bar{X}^{A_2})={\text {Var}}(E(\bar{X}^{A_2}|R))+E({\text {Var}}(\bar{X}^{A_2}|R)). \end{aligned}$$

We first consider

$$\begin{aligned} {\text {Var}}(E(\bar{X}^{A_2}|R))&= {\text {Var}}\left( \sum _{h=1}^d \frac{I_h}{d_nn_h} \sum _{i=1}^n E(Z_{R_{s_i},h}|R_{s_i} \in D_h)\right) \\&= {\text {Var}}\left( \frac{I_1}{d_n}\right) \sum _{h=1}^d \bar{\mu }^2_h+ {\text {Cov}}\left( \frac{I_1}{d_n},\frac{I_2}{d_n}\right) \sum _{h=1}^d \sum _{h' \ne h}^d \bar{\mu }_h \bar{\mu }_{h'} \\&= {\text {Var}}\left( \frac{I_1}{d_n}\right) \sum _{h=1}^d \bar{\mu }^2_h- \frac{1}{d-1}{\text {Var}}\left( \frac{I_1}{d_n}\right) \left( d^2\mu ^2-\sum _{h=1}^d \bar{\mu }_h^2 \right) \\&= \frac{d}{d-1} {\text {Var}}\left( \frac{I_1}{d_n}\right) \sum _{h=1}^d (\bar{\mu }_h-\mu )^2 . \end{aligned}$$
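The last equality again uses \(\sum _{h=1}^d \bar{\mu }_h=d\mu \); the algebra can be spot-checked with arbitrary numbers (d, the variance, and the stratum means below are placeholders, not design quantities):

```python
d = 3
Vb = 0.04                            # stand-in for Var(I_1/d_n)
mb = [2.0, 5.0, 8.0]                 # stand-ins for bar-mu_h, h = 1,...,d
mu = sum(mb) / d                     # since sum_h bar-mu_h = d * mu
lhs = Vb * sum(b * b for b in mb) \
      - Vb / (d - 1) * (d * d * mu * mu - sum(b * b for b in mb))
rhs = d / (d - 1) * Vb * sum((b - mu) ** 2 for b in mb)
assert abs(lhs - rhs) < 1e-12
```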

In a similar fashion, we can show that

$$\begin{aligned} E({\text {Var}}(\bar{X}^{A_2}|R))= & {} E\left( \sum _{h=1}^d \frac{I_h^2}{d_n^2n_h^2} \sum _{i=1}^n {\text {Var}}(Z_{R_{s_i},h}|R_{s_i} \in D_h)\right) \\= & {} E\left( \frac{I_1^2}{d_n^2n_1}\right) \sum _{h=1}^d {\text {Var}}(Z_{R_{s_1},h}|R_{s_1} \in D_h) \\= & {} E\left( \frac{I_1^2}{d_n^2n_1}\right) \left\{ \frac{1}{H} \sum _{h=1}^d\sigma ^2_h+ \frac{1}{H} \sum _{h=1}^d \sum _{u \in D_h} (\mu _{u:M}-\bar{\mu }_h)^2\right\} . \end{aligned}$$

We complete the proof by combining expressions \({\text {Var}}(E(\bar{X}^{A_2}|R))\) and \(E({\text {Var}}(\bar{X}^{A_2}|R))\).

\(\square \)

Proof of Theorem 4:

We first consider the expected value of \(W_1\)

$$\begin{aligned} E(W_1)&= d\sum _{h=1}^d\sum _{i=1}^n \sum _{j\ne i}^nE\left( \frac{I_h^*}{ d_n^*n_h(n_h-1)}\right) E\left\{ (Z_{R_{s_i},h}-Z_{R_{s_j},h})^2|(R_{s_i},R_{s_j})\in D_h\right\} \\&= d \sum _{h=1}^dE\left( \frac{I^*_h}{d_n^*}\right) E\left\{ (Z_{R_{s_1},h}-Z_{R_{s_2},h})^2 |(R_{s_1},R_{s_2})\in D_h\right\} \\&= 2dE\left( \frac{I^*_1}{d_n^*}\right) \sum _{h=1}^d {\text {Var}}\left( Z_{R_{s_1},h}|R_{s_1}\in D_h\right) = 2 \sum _{h=1}^d {\text {Var}}\left( Z_{R_{s_1},h}|R_{s_1}\in D_h\right) . \end{aligned}$$

In a similar fashion, the expected value of \(W_2\) is given by

$$\begin{aligned} E(W_2)&= \sum _{h=1}^d \sum _{h'\ne h}^d \sum _{i=1}^n \sum _{j=1}^n E\left( \frac{I_hI_{h'}}{d_n^2n_hn_{h'}}\right) E\left\{ (Z_{R_{s_i},h}-Z_{R_{s_j},h'})^2|R_{s_i}\in D_h,\ R_{s_j}\in D_{h'}\right\} \\&= E\left( \frac{I_1I_2}{d_n^2}\right) \sum _{h=1}^d \sum _{h' \ne h}^d E\left\{ (Z_{R_{s_1},h}-Z_{R_{s_2},h'})^2|R_{s_1}\in D_h,\ R_{s_2}\in D_{h'}\right\} \\&= E\left( \frac{I_1I_2}{d_n^2}\right) \left\{ 2(d-1) \sum _{h=1}^d {\text {Var}}\left( Z_{R_{s_1},h}|R_{s_1}\in D_h\right) + 2(d-1) \sum _{h=1}^d (\bar{\mu }_h-\mu )^2\right\} \\&\quad - 2 E\left( \frac{I_1I_2}{d_n^2}\right) \sum _{h=1}^d\sum _{h'\ne h}^d(\bar{\mu }_h-\mu )(\bar{\mu }_{h'}-\mu ) \\&= E\left( \frac{I_1I_2}{d_n^2}\right) \left\{ 2(d-1) \sum _{h=1}^d {\text {Var}}(Z_{R_{s_1},h}|R_{s_1}\in D_h)+ 2d \sum _{h=1}^d (\bar{\mu }_h-\mu )^2\right\} . \end{aligned}$$

We now combine \(E(W_1)\) and \(E(W_2)\) in \(\hat{\sigma }^2_{A_2}\) to write

$$\begin{aligned} E(\hat{\sigma }^2_{A_2})= & {} E(W_1/2)\left\{ E(\frac{I_1^2}{d_n^2n_1})-{\text {Var}}(\frac{I_1}{d_n})\right\} +\frac{E(W_2/2){\text {Var}}(I_1/d_n)}{E(\frac{I_1I_2}{d_n^2})(d-1)} \\= & {} \left\{ \sum _{h=1}^d {\text {var}}\left( Z_{R_{s_1},h}|R_{s_1}\in D_h\right) \right\} \left\{ E(\frac{I_1^2}{d_n^2n_1})-{\text {Var}}(\frac{I_1}{d_n})\right\} \\&+ \left\{ \sum _{h=1}^d {\text {var}}(Z_{R_{s_1},h}|R_{s_1}\in D_h)+\frac{d}{d-1} \sum _{h=1}^d(\bar{\mu }_h-\mu )^2\right\} {\text {Var}}(I_1/d_n) \\= & {} E\left( \frac{I_1^2}{d_n^2n_1}\right) \sum _{h=1}^d {\text {var}}\left( Z_{R_{s_1},h}|R_{s_1}\in D_h\right) +\frac{d}{d-1} {\text {Var}}(I_1/d_n)\bar{\tau }^2 \\= & {} \frac{d}{d-1} {\text {Var}}(I_1/d_n)\bar{\tau }^2 +\frac{1}{H}E\left( \frac{I_1^2}{d_n^2n_1}\right) \left\{ \sum _{h=1}^d \sigma _h^2 +\sum _{h=1}^d \tau ^2_h \right\} . \end{aligned}$$

This completes the proof. \(\square \)


Cite this article

Ozturk, O. Statistical inference using rank-based post-stratified samples in a finite population. TEST 28, 1113–1143 (2019). https://doi.org/10.1007/s11749-018-0618-y
