# Order Statistics Based on a Combined Simple Random Sample from a Finite Population and Applications to Inference

## Abstract

In this paper, we study the probability distributions of order statistics from a set obtained by combining several simple random samples (SRSs) selected from the same finite population. Each SRS is drawn without replacement and therefore contains no ties. In the combined sample, however, the same observation may appear more than once, since each SRS is selected from the same finite population. Consequently, the number of distinct observations in the combined sample is a discrete random variable, and we provide its probability mass function. Next, using the order statistics of the combined SRSs, we construct confidence intervals for the quantiles and outer-inner confidence intervals for the quantile interval of a finite population. Finally, we present a prediction interval for a future observation from the same finite population.



## Acknowledgments

This research is supported by the Australian Grain Research and Development Corporation (GRDC) as part of the Statistics for the Australian Grains Industry project (UA00164). The data were collected by Bachelor of Agriculture Science students of the School of Agriculture, Food and Wine, University of Adelaide in 2019 under the supervision of Mr. Peter Kasprzak. The data were kindly made available for this work and other publications, as well as to improve management and research decisions in the Coombe vineyard.

## Author information


### Corresponding author

Correspondence to Omer Ozturk.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Appendices

### Proof 1 (Proof of Theorem 1).

Suppose j = 0. Then there is no tie in the combined sample. The number of combined samples with no ties is obtained by selecting n1 units out of N in the first sample and then n2 units out of the remaining N − n1 in the second sample, that is,

$$\text{Number of samples}~S_{1:2}~\text{with no ties} = \binom{N}{n_{1}}\binom{N-n_{1}}{n_{2}}.$$

Since the samples are selected independently, we readily have

$$P(j=0)= \frac{\binom{N}{n_{1}}\binom{N-n_{1}}{n_{2}}}{\binom{N}{n_{1}}\binom{N}{n_{2}}}.$$

Now, suppose 0 < j ≤ min(n1, n2), so that the combined sample S1:2 contains j tied pairs, and suppose that j specific units in the population appear in both samples. Then, there are

$$\binom{N-j}{n_{1}-j}\binom{N-n_{1}}{n_{2}-j}$$

different ways to select the remaining untied units in both samples. Since j units in the population can be selected in $$\binom{N}{j}$$ different ways, using the product rule we then obtain

$$P(j~\text{tied pairs in}~S_{1:2})= \frac{\binom{N}{j}\binom{N-j}{n_{1}-j}\binom{N-n_{1}}{n_{2}-j}}{\binom{N}{n_{1}}\binom{N}{n_{2}}},$$

which completes the proof. □
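As a quick numerical check of this pmf (not part of the original proof), the following R snippet verifies that the tie-count probabilities sum to one for an assumed small example with N = 10, n1 = 3, and n2 = 4; the identity follows from Vandermonde's convolution.

```r
# Sanity check (illustrative only): the pmf of the number of tied
# pairs j in Theorem 1 should sum to 1 over j = 0, ..., min(n1, n2).
N <- 10; n1 <- 3; n2 <- 4  # assumed example values
j <- 0:min(n1, n2)
pj <- choose(N, j) * choose(N - j, n1 - j) * choose(N - n1, n2 - j) /
  (choose(N, n1) * choose(N, n2))
sum(pj)  # should print 1
```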

### Proof 2 (Proof of Theorem 2).

Theorem 2 is a special case of Theorem 3, so a separate proof is not needed. □

### Proof 3 (Proof of Theorem 3).

By conditioning on $$u_{K}=u$$, we can write

$$P(Z_{(i:u)}=x_{t} \mid u_{K}=u)= f_{(i:u)}(x_{t}),$$

where $$f_{(i:u)}$$ is the pmf of the i-th order statistic from a sample of size u selected without replacement from the population $$\mathcal{P}$$. If $$i < n_{K}$$, the unconditional distribution follows from the joint distribution as

$$P(Z_{(i:n_{T})}=x_{t})= \sum\limits_{u=n_{K}}^{n_{T}} f_{(i:u)}(x_{t})\, P(u_{K}=u), \quad i=1, \ldots, n_{K}-1,$$

where $$P(u_{K} \ge i) = 1$$ whenever $$i < n_{K}$$. If $$i \ge n_{K}$$, the i-th order statistic is observed only if there are at least i distinct observations in the combined sample $$S_{1:K}$$. Hence we must use the truncated distribution of $$u_{K}$$ to marginalize the distribution of $$Z_{(i:n_{T})}$$ and get

$$P(Z_{(i:n_{T})}=x_{t})= \sum\limits_{u=n_{K}}^{n_{T}} \frac{f_{(i:u)}(x_{t})\, P(u_{K}=u)}{P(u_{K} \ge i)}, \quad i=n_{K}, \ldots, n_{T}.$$

Combining these two pieces completes the proof. □
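To make the two-piece marginalization concrete, here is a minimal R sketch, not the authors' code. It assumes the population values $$x_{1} < \ldots < x_{N}$$ are distinct, so that the conditional pmf takes the standard finite-population form $$f_{(i:u)}(x_{t}) = \binom{t-1}{i-1}\binom{N-t}{u-i}/\binom{N}{u}$$, and it reuses the StepKF function given in the next subsection (with N passed as an argument).

```r
# Illustrative sketch only. Assumption: the population values
# x_1 < ... < x_N are distinct, so x_t is identified by its rank t.
# f_iu is the standard pmf of the i-th order statistic of an SRS of
# size u drawn without replacement from a population of size N.
f_iu <- function(t, i, u, N) {
  choose(t - 1, i - 1) * choose(N - t, u - i) / choose(N, u)
}

# Unconditional pmf P(Z_(i:n_T) = x_t), mixing over the pmf of u_K
# returned by StepKF (next subsection); NV holds n_1 <= ... <= n_K.
pmf_Z <- function(t, i, NV, N) {
  pu <- StepKF(NV, N)            # columns: u_K and P(U_K = u)
  u  <- pu[, 1]
  p  <- pu[, 2]
  terms <- f_iu(t, i, u, N) * p  # f_iu is 0 whenever u < i
  if (i < NV[length(NV)]) {
    sum(terms)                   # i < n_K: no truncation is needed
  } else {
    sum(terms) / sum(p[u >= i])  # i >= n_K: condition on u_K >= i
  }
}
```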

### R-function

```r
#############################
# tieF: used by StepKF.
# Computes P(k tied pairs) when two SRSs of sizes n and m are drawn
# without replacement from a population of size N (Theorem 1).
# KV may be a vector of tie counts k.
tieF <- function(KV, N, n, m) {
  ret <- rep(0, length(KV))
  ki <- 1
  for (k in KV) {
    if (k > min(n, m)) stop("k must be at most the minimum of n and m")
    ret[ki] <- choose(N, k) * choose(N - k, n - k) * choose(N - n, m - k) /
      (choose(N, n) * choose(N, m))
    ki <- ki + 1
  }
  return(ret)
}
```

```r
########################################
# StepKF: computes the probability mass function of the number of
# distinct observations u_K in the combined sample S_{1:K}.
# NV: vector of the sample sizes n_1, ..., n_K, ordered from smallest
#     to largest
# N : population size
StepKF <- function(NV, N) {
  K <- length(NV)
  # After the first step, the number of distinct observations is n_1
  # with probability 1.
  P1 <- matrix(c(NV[1], 1), ncol = 2)
  for (k in 2:K) {
    MU <- sum(NV[1:k])  # maximum number of distinct observations in step k
    Ck <- NV[k]:MU      # range of the number of distinct observations
    P2 <- matrix(0, ncol = 2, nrow = length(Ck))  # updated pmf in step k
    P2[, 1] <- Ck
    dv <- P1[, 1]       # numbers of distinct observations in step k-1
    for (dd in dv) {
      dold <- which(dv == dd)
      # possible numbers of ties in step k, given that step k-1
      # produced dd distinct observations
      Jk <- 0:min(dd, NV[k])
      MV <- sort(c(dd, NV[k]))
      # probability of each tie count when the dd distinct observations
      # are combined with the sample of size n_k
      TPkV <- tieF(Jk, N, MV[1], MV[2])
      for (ji in seq_along(Jk)) {
        jkn <- Jk[ji]
        dnew <- which(Ck == (dd + NV[k] - jkn))
        # update the pmf of the number of distinct observations in step k
        P2[dnew, 2] <- P2[dnew, 2] + P1[dold, 2] * TPkV[ji]
      }
    }
    P1 <- P2
  }
  colnames(P1) <- c("u_K", "P(U_K)")
  return(P1)
}
```
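For illustration (with assumed values, not taken from the paper), the pmf of the number of distinct observations when SRSs of sizes 3, 4, and 5 are combined from a population of size N = 20 can be computed as follows; the returned probabilities should sum to one.

```r
# Example usage with assumed inputs.
pmf <- StepKF(NV = c(3, 4, 5), N = 20)
print(pmf)
sum(pmf[, "P(U_K)"])  # should print 1
```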
