
Order Statistics Based on a Combined Simple Random Sample from a Finite Population and Applications to Inference

Published in Sankhya A.

Abstract

In this paper, we study probability distributions of order statistics from a set obtained by combining several simple random samples (SRS) selected from the same finite population. Each simple random sample is taken using without replacement selection procedure and does not contain any ties. On the other hand, in the combined sample, the same observation may appear more than once since each SRS is selected from the same finite population. Consequently, the number of the distinct observations in the combined sample is a discrete random variable. We provide the probability mass function of this discrete random variable. Next, using the order statistics in the combined SRSs, we construct confidence intervals for the quantiles and outer-inner confidence intervals for the quantile interval of a finite population. Finally, we also present a prediction interval for a future observation from the same finite population.



Acknowledgments

This research was supported by the Australian Grain Research and Development Corporation (GRDC) as part of the Statistics for the Australian Grains Industry project (UA00164). The data were collected by Bachelor of Agriculture Science students of the School of Agriculture, Food and Wine, University of Adelaide in 2019 under the supervision of Mr. Peter Kasprzak. The data were kindly made available for our purposes and other publications, as well as to improve management and research decisions in the Coombe vineyard.

Author information

Correspondence to Omer Ozturk.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

Proof 1 (Proof of Theorem 1).

Suppose j = 0. Then there are no ties in the combined sample. The number of combined samples with no ties is obtained by selecting n1 units out of N for the first sample and then n2 units out of the remaining N − n1 for the second sample, that is,

$$ \begin{array}{@{}rcl@{}} \text{ Number of sampled}~S_{1:2}~\text{with no ties}=\left( \begin{array}{c}{N} \\ {n_{1}}\end{array}\right)\left( \begin{array}{c} {N-n_{1}} \\ {n_{2}}\end{array}\right). \end{array} $$

Since the samples are selected independently, we readily have

$$ \begin{array}{@{}rcl@{}} P(j=0)= \frac{\left( \begin{array}{c} {N} \\ {n_{1}}\end{array}\right)\left( \begin{array}{c}{N-n_{1}} \\ {n_{2}}\end{array}\right)}{ \left( \begin{array}{c} {N} \\ {n_{1}}\end{array}\right) \left( \begin{array}{c}{N} \\ {n_{2}}\end{array}\right)}. \end{array} $$

Now, suppose 0 < j ≤ n1. We must have j tied pairs in the combined sample S1:2, so suppose that j specific units in the population appear in both samples. Then, there are

$$ \begin{array}{@{}rcl@{}} \left( \begin{array}{c} {N-j} \\ {n_{1}-j}\end{array}\right)\left( \begin{array}{c} {N-n_{1}} \\ {n_{2}-j}\end{array}\right) \end{array} $$

different ways to select the remaining untied units in both samples. Since j units in the population can be selected in \(\left (\begin {array}{c} {N} \\ {j}\end {array}\right )\) different ways, using the product rule we then obtain

$$ \begin{array}{@{}rcl@{}} P(j~\text{tied pairs in } S_{1:2})= \frac{\left( \begin{array}{c}{N} \\ {j}\end{array}\right) \left( \begin{array}{c} {N-j} \\ {n_{1}-j}\end{array}\right)\left( \begin{array}{c} {N-n_{1}} \\ {n_{2}-j}\end{array}\right)}{\left( \begin{array}{c} {N} \\ {n_{1}}\end{array}\right) \left( \begin{array}{c} {N} \\ {n_{2}}\end{array}\right)}, \end{array} $$

which completes the proof. □
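The tie-count pmf derived above is easy to check numerically. The following Python sketch (the function name tie_pmf is ours, not from the paper) evaluates P(j tied pairs in S1:2) for two SRSWOR samples and confirms that the probabilities sum to one, which follows from the Vandermonde identity:

```python
from math import comb

def tie_pmf(j, N, n1, n2):
    """P(exactly j units appear in both SRSWOR samples of sizes n1 and n2)."""
    if j > min(n1, n2):
        return 0.0
    return (comb(N, j) * comb(N - j, n1 - j) * comb(N - n1, n2 - j)
            / (comb(N, n1) * comb(N, n2)))

N, n1, n2 = 20, 5, 7
pmf = [tie_pmf(j, N, n1, n2) for j in range(min(n1, n2) + 1)]
# P(j = 0) reduces to C(N - n1, n2) / C(N, n2), matching the first display
assert abs(pmf[0] - comb(N - n1, n2) / comb(N, n2)) < 1e-12
assert abs(sum(pmf) - 1.0) < 1e-12  # a valid probability mass function
```

The population size and sample sizes here are arbitrary illustrative values.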

Proof 2 (Proof of Theorem 2).

Theorem 2 is a special case of Theorem 3, so its proof is omitted. □

Proof 3 (Proof of Theorem 3).

By conditioning on uK = u, we can write

$$ \begin{array}{@{}rcl@{}} P(Z_{(i:u)}=x_{t}|u_{K}=u)= f_{(i:u)}(x_{t}), \end{array} $$

where f(i:u) is the pmf of the i-th order statistic from a sample of size u selected without replacement from the population \(\mathcal {P}\). If i < nK, the unconditional distribution follows from the joint distribution as

$$ \begin{array}{@{}rcl@{}} P(Z_{(i:n_{T})}=x_{t})= \sum\limits_{u=n_{K}}^{n_{T}} f_{(i:u)}(x_{t}) P(u_{K}=u), \quad i=1, \ldots, n_{K}-1, \end{array} $$

since P(uK ≥ i) = 1 whenever i < nK. If i ≥ nK, the i-th order statistic is observed only if there are at least i distinct observations in the combined sample S1:K. Hence, we must use the truncated distribution of uK to marginalize the distribution of \(Z_{(i:n_{T})}\) and get

$$ \begin{array}{@{}rcl@{}} P(Z_{(i:n_{T})}=x_{t})= \sum\limits_{u=n_{K}}^{n_{T}} \frac{f_{(i:u)}(x_{t}) P(u_{K}=u)}{P(u_{K} \ge i)}, \quad i=n_{K}, \ldots, n_{T}. \end{array} $$

The proof is completed by combining these two cases. □
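When the N population values are distinct, the pmf f(i:u) appearing above has the standard hypergeometric-type form f(i:u)(x(t)) = C(t−1, i−1)C(N−t, u−i)/C(N, u). The Python sketch below (function names are ours, and the pmf supplied for uK is a hypothetical stand-in) implements the marginalization in the proof; dividing by P(uK ≥ i) covers both branches, since the truncation factor equals one whenever i < nK:

```python
from math import comb

def f_order(i, u, t, N):
    """pmf of the i-th order statistic of an SRSWOR sample of size u from
    N distinct values, evaluated at the t-th smallest population value."""
    if i > u or t < 1 or t > N:
        return 0.0
    return comb(t - 1, i - 1) * comb(N - t, u - i) / comb(N, u)

def combined_os_pmf(i, t, N, u_pmf):
    """P(Z_(i:nT) = x_(t)), marginalizing over the number of distinct
    observations; u_pmf maps u to P(u_K = u)."""
    trunc = sum(p for u, p in u_pmf.items() if u >= i)  # P(u_K >= i)
    return sum(f_order(i, u, t, N) * p / trunc
               for u, p in u_pmf.items() if u >= i)

u_pmf = {2: 0.4, 3: 0.6}  # hypothetical pmf of u_K for N = 5
total = sum(combined_os_pmf(2, t, 5, u_pmf) for t in range(1, 6))
assert abs(total - 1.0) < 1e-12  # the marginal pmf sums to one over t
```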

R functions


#############################
# This function is used by StepKF: it returns the probability of
# exactly k ties when two SRSWOR samples of sizes n and m are
# combined from a population of size N.
#############################
tieF <- function(KV, N, n, m) {
  ret <- rep(0, length(KV))
  ki <- 1
  for (k in KV) {
    if (k > min(n, m)) {
      stop("k must be less than the minimum of n and m")
    }
    ret[ki] <- choose(N, k) * choose(N - k, n - k) * choose(N - n, m - k) /
      (choose(N, n) * choose(N, m))
    ki <- ki + 1
  }
  return(ret)
}

########################################
# This function computes the probability mass function of the number
# of distinct observations in the K-th combined sample.
# NV: sample size vector, ordered from smallest to largest
# N : population size (passed explicitly; the original listing relied
#     on a global N)
########################################
StepKF <- function(NV, N) {
  K <- length(NV)                      # length of the sample size vector
  P1 <- matrix(c(NV[1], 1), ncol = 2)  # pmf after the first step: u_1 = n_1
  for (k in 2:K) {
    MU <- sum(NV[1:k])                 # maximum number of distinct obs in step k
    Ck <- NV[k]:MU                     # range of the number of distinct obs
    P2 <- matrix(0, ncol = 2, nrow = length(Ck))  # updated pmf in step k
    P2[, 1] <- Ck                      # values of the number of distinct obs
    dv <- P1[, 1]                      # values of the number of distinct obs
                                       # in step k - 1
    for (dd in dv) {
      Jk <- 0:min(dd, NV[k])           # possible numbers of ties in step k,
                                       # given dd distinct obs in step k - 1
      dold <- which(dv == dd)
      MV <- sort(c(dd, NV[k]))
      for (jkn in Jk) {
        dnew <- which(Ck == (dd + NV[k] - jkn))
        # update the probability of the number of distinct obs in step k
        P2[dnew, 2] <- P2[dnew, 2] + P1[dold, 2] * tieF(jkn, N, MV[1], MV[2])
      }
    }
    P1 <- P2
  }
  colnames(P1) <- c("u_K", "P(U_K)")
  return(P1)
}
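As a cross-check on the R listing, here is a Python transcription of the same recursion (function names are ours). Like the R code, it treats the distinct units carried over from step k−1 as an SRSWOR sample when computing tie probabilities, and the resulting pmf of the number of distinct observations is proper:

```python
from math import comb

def tie_pmf(j, N, n, m):
    """P(exactly j common units between two SRSWOR samples of sizes n, m)."""
    if j > min(n, m):
        return 0.0
    return (comb(N, j) * comb(N - j, n - j) * comb(N - n, m - j)
            / (comb(N, n) * comb(N, m)))

def step_k(NV, N):
    """pmf of the number of distinct units after combining SRSWOR samples
    of sizes NV[0] <= NV[1] <= ... from a population of size N."""
    pmf = {NV[0]: 1.0}                       # after the first sample, u_1 = n_1
    for nk in NV[1:]:
        new = {}
        for d, p in pmf.items():             # d distinct units so far
            for j in range(min(d, nk) + 1):  # j ties with the new sample
                u = d + nk - j
                new[u] = new.get(u, 0.0) + p * tie_pmf(j, N, d, nk)
        pmf = new
    return pmf

pmf = step_k([3, 4, 5], N=15)
assert abs(sum(pmf.values()) - 1.0) < 1e-9
assert min(pmf) == 5 and max(pmf) == 12  # support runs from max(NV) to sum(NV)
```

The sample sizes and population size are illustrative; the dictionary-based recursion mirrors the two-column matrix P1 updated inside StepKF.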


About this article


Cite this article

Ozturk, O., Balakrishnan, N. & Kravchuk, O. Order Statistics Based on a Combined Simple Random Sample from a Finite Population and Applications to Inference. Sankhya A 85, 77–101 (2023). https://doi.org/10.1007/s13171-020-00228-x

