## Abstract

In this paper, we study the probability distributions of order statistics from a set obtained by combining several simple random samples (SRSs) selected from the same finite population. Each simple random sample is drawn using a without-replacement selection procedure and therefore contains no ties. In the combined sample, however, the same observation may appear more than once, since each SRS is selected from the same finite population. Consequently, the number of distinct observations in the combined sample is a discrete random variable, and we provide its probability mass function. Next, using the order statistics in the combined SRSs, we construct confidence intervals for the quantiles and outer-inner confidence intervals for the quantile interval of a finite population. Finally, we also present a prediction interval for a future observation from the same finite population.


## Acknowledgments

This research was supported by the Australian Grain Research and Development Corporation (GRDC) as part of the Statistics for the Australian Grains Industry project (UA00164). The data were collected by Bachelor of Agriculture Science students of the School of Agriculture, Food and Wine, University of Adelaide in 2019 under the supervision of Mr. Peter Kasprzak. The data were kindly made available for our purposes and other publications, as well as to improve the management and research decisions in the Coombe vineyard.

## Appendices

### Appendix

### Proof 1 (Proof of Theorem 1).

Suppose *j* = 0. Then there are no ties in the combined sample. The number of combined samples with no ties can be obtained by selecting *n*_{1} units out of *N* in the first sample and then *n*_{2} units out of the remaining *N* − *n*_{1} in the second sample, that is,

\(\binom{N}{n_{1}}\binom{N-n_{1}}{n_{2}}.\)

Since the samples are selected independently, the probability of no ties is readily seen to be

\(\frac{\binom{N}{n_{1}}\binom{N-n_{1}}{n_{2}}}{\binom{N}{n_{1}}\binom{N}{n_{2}}}=\frac{\binom{N-n_{1}}{n_{2}}}{\binom{N}{n_{2}}}.\)

Now, suppose 0 < *j* ≤ *n*_{1}. We must have *j* tied pairs in the combined sample *S*_{1,2}; suppose that *j* specific units of the population appear in both samples. Then, there are

\(\binom{N-j}{n_{1}-j}\binom{N-n_{1}}{n_{2}-j}\)

different ways to select the remaining untied units in the two samples. Since the *j* tied units can be selected from the population in \(\binom{N}{j}\) different ways, the product rule then yields the probability

\(\frac{\binom{N}{j}\binom{N-j}{n_{1}-j}\binom{N-n_{1}}{n_{2}-j}}{\binom{N}{n_{1}}\binom{N}{n_{2}}},\)

which completes the proof. □
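As a quick numerical sanity check of the formula above (with hypothetical sizes *N* = 10, *n*_{1} = 3, *n*_{2} = 4, chosen purely for illustration), the tie probabilities sum to one over *j* = 0,…,*n*_{1}:

```r
# Tie probabilities for two SRSs (sizes n1 and n2) drawn without
# replacement from a population of size N. Illustrative values only.
N <- 10; n1 <- 3; n2 <- 4
j <- 0:min(n1, n2)
pj <- choose(N, j) * choose(N - j, n1 - j) * choose(N - n1, n2 - j) /
  (choose(N, n1) * choose(N, n2))
sum(pj)  # the tie counts j exhaust all cases, so the probabilities sum to 1
```

The *j* = 0 term reduces, as in the proof, to \(\binom{N-n_{1}}{n_{2}}/\binom{N}{n_{2}}\).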

### Proof 2 (Proof of Theorem 2).

This is a special case of Theorem 3, and so its proof is omitted. □

### Proof 3 (Proof of Theorem 3).

By conditioning on *U*_{K} = *u*, we can write

\(P\left(Z_{(i:n_{T})}=z\right)={\sum}_{u} f_{(i:u)}(z)\, P\left(U_{K}=u\right),\)

where *f*_{(i:u)} is the pmf of the *i*-th order statistic from a sample of size *u* selected without replacement from the population \(\mathcal {P}\). If *i* < *n*_{K}, the unconditional distribution follows from the joint distribution as

\(P\left(Z_{(i:n_{T})}=z\right)={\sum}_{u} f_{(i:u)}(z)\, \frac{P\left(U_{K}=u\right)}{P\left(U_{K}\ge i\right)},\)

where *P*(*U*_{K} ≥ *i*) = 1 when *i* < *n*_{K}. If *i* ≥ *n*_{K}, the *i*-th order statistic is observed only if there are at least *i* distinct observations in the combined sample *S*_{1:K}. Hence, we must use the truncated distribution of *U*_{K} to marginalize the distribution of \(Z_{(i:n_{T})}\), obtaining

\(P\left(Z_{(i:n_{T})}=z\right)={\sum}_{u\ge i} f_{(i:u)}(z)\, \frac{P\left(U_{K}=u\right)}{P\left(U_{K}\ge i\right)}.\)

Combining these two pieces completes the proof. □
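The truncation step can be illustrated numerically; the pmf values below are hypothetical and serve only to show how the truncated weights are formed:

```r
# Truncated pmf of U_K given U_K >= i, as used to marginalize Z_(i:n_T)
# when i >= n_K. Hypothetical pmf over u = 4, 5, 6 for illustration only.
u  <- c(4, 5, 6)
pu <- c(0.2, 0.5, 0.3)
i  <- 5
w  <- ifelse(u >= i, pu / sum(pu[u >= i]), 0)  # P(U_K = u | U_K >= i)
w  # c(0, 0.625, 0.375): mass below i is dropped, the rest is renormalized
```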

### R functions

```r
# This function is used by StepKF.
# It computes the probability of k ties when SRSs of sizes n and m are
# drawn without replacement from a population of size N.
tieF <- function(KV, N, n, m) {
  ret <- rep(0, length(KV))
  ki <- 1
  for (k in KV) {
    if (k > min(n, m)) {
      stop("k must be at most the minimum of n and m")
    }
    ret[ki] <- choose(N, k) * choose(N - k, n - k) * choose(N - n, m - k) /
      (choose(N, n) * choose(N, m))
    ki <- ki + 1
  }
  return(ret)
}
```

```r
# This function computes the probability mass function of the number of
# distinct observations in the K-th combined sample.
# NV: vector of sample sizes, ordered from smallest to largest
# N:  population size (passed explicitly rather than read from the
#     global environment)
StepKF <- function(NV, N) {
  K <- length(NV)
  P1 <- matrix(c(NV[1], 1), ncol = 2)  # pmf after the first sample
  for (k in seq_along(NV)[-1]) {
    MU <- sum(NV[1:k])  # maximum number of distinct observations at step k
    Ck <- NV[k]:MU      # range of the number of distinct observations
    P2 <- matrix(0, ncol = 2, nrow = length(Ck))  # updated pmf at step k
    P2[, 1] <- Ck
    dv <- P1[, 1]       # distinct-observation counts at step k - 1
    for (dd in dv) {
      Jk <- 0:min(dd, NV[k])  # possible numbers of ties at step k, given
                              # dd distinct observations at step k - 1
      dold <- which(dv == dd)
      MV <- sort(c(dd, NV[k]))
      for (jkn in Jk) {
        # jkn ties leave dd + NV[k] - jkn distinct observations
        dnew <- which(Ck == (dd + NV[k] - jkn))
        P2[dnew, 2] <- P2[dnew, 2] +
          P1[dold, 2] * tieF(jkn, N, MV[1], MV[2])
      }
    }
    P1 <- P2
  }
  colnames(P1) <- c("u_K", "P(U_K)")
  return(P1)
}
```
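One way to check the tie-probability computation for the *K* = 2 case is a self-contained Monte Carlo comparison against the closed-form pmf; the sizes below (*N* = 8, samples of 3 and 4) are hypothetical and chosen only for illustration:

```r
# Monte Carlo check of the distinct-count pmf for two combined SRSs.
# Illustrative sizes only: N = 8, n1 = 3, n2 = 4.
set.seed(1)
N <- 8; n1 <- 3; n2 <- 4
sims <- replicate(20000, {
  length(unique(c(sample(N, n1), sample(N, n2))))
})
j <- 0:min(n1, n2)                    # number of ties
exact <- choose(N, j) * choose(N - j, n1 - j) * choose(N - n1, n2 - j) /
  (choose(N, n1) * choose(N, n2))
u <- n1 + n2 - j                      # distinct count when there are j ties
mc <- as.numeric(table(factor(sims, levels = u))) / length(sims)
round(cbind(u = u, exact = exact, mc = mc), 3)
```

With 20,000 replicates the empirical frequencies should agree with the exact probabilities to within roughly two decimal places.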

## About this article

### Cite this article

Ozturk, O., Balakrishnan, N. & Kravchuk, O. Order Statistics Based on a Combined Simple Random Sample from a Finite Population and Applications to Inference.
*Sankhya A* (2021). https://doi.org/10.1007/s13171-020-00228-x

### Keywords and phrases

- Order statistics
- inner confidence interval
- outer confidence interval
- prediction interval
- quantile intervals

### AMS (2000) subject classification

- Primary 62D05
- Secondary 62G30