Abstract
Partially rank-ordered set (PROS) sampling is a generalization of ranked set sampling in which rankers are not required to fully rank the sampling units in each set, which gives them more flexibility in carrying out the judgment ranking process. PROS sampling has a wide range of applications in fields ranging from environmental and ecological studies to medical research, and it has been shown to be superior to ranked set sampling and simple random sampling for estimating the population mean. We study the Fisher information content and uncertainty structure of PROS samples and compare them with those of their simple random sample (SRS) and ranked set sample (RSS) counterparts of the same size from the underlying population. We study the uncertainty structure in terms of the Shannon entropy, Rényi entropy and Kullback–Leibler (KL) discrimination measures.
References
Arslan, G., Ozturk, O.: Parametric inference based on partially rank ordered set samples. J. Indian Stat. Assoc. 51(1), 1–24 (2013)
Barabesi, L., El-Sharaawi, A.: The efficiency of ranked set sampling for parameter estimation. Stat. Probab. Lett. 53(2), 189–199 (2001)
Barreto, M.C.M., Barnett, V.: Best linear unbiased estimators for the simple linear regression model using ranked set sampling. Environ. Ecol. Stat. 6(2), 119–133 (1999)
Chen, Z.: The efficiency of ranked-set sampling relative to simple random sampling under multi-parameter families. Stat. Sin. 10(1), 247–264 (2000)
Chen, Z., Wang, Y.G.: Efficient regression analysis with ranked-set sampling. Biometrics 60(4), 997–1004 (2004)
Chen, Z., Bai, Z., Sinha, B.K.: Ranked Set Sampling: Theory and Applications, vol. 176. Springer, New York (2004)
Dell, T.R., Clutter, J.L.: Ranked set sampling theory with order statistics background. Biometrics 28(2), 545–555 (1972)
Frey, J.: Nonparametric mean estimation using partially ordered sets. Environ. Ecol. Stat. 19(3), 309–326 (2012)
Hatefi, A., Jafari Jozani, M., Ozturk, O.: Mixture model analysis of partially rank ordered set samples. Scand. J. Stat. 42, 848–871 (2015)
Hatefi, A., Jafari Jozani, M.: Fisher information in different types of perfect and imperfect ranked set samples from finite mixture models. J. Multivar. Anal. 119, 16–31 (2013)
Hatefi, A., Jafari Jozani, M., Ziou, D.: Estimation and classification for finite mixture models under ranked set sampling. Stat. Sin. 24, 675–698 (2014)
Hill, B.M.: Information for estimating the proportions in mixtures of exponential and normal distributions. J. Am. Stat. Assoc. 58(304), 918–932 (1963)
Jafari Jozani, M., Ahmadi, J.: On uncertainty and information properties of ranked set samples. Inf. Sci. 260, 1–16 (2014)
Johnson, O.: Information Theory and the Central Limit Theorem. Imperial College Press, London (2004)
Lehmann, E.L., Casella, G.: Theory of Point Estimation, vol. 31. Springer, New York (1998)
McIntyre, G.A.: A method for unbiased selective sampling, using ranked sets. Crop Pasture Sci. 3(4), 385–390 (1952)
McIntyre, G.A.: A method for unbiased selective sampling, using ranked sets. Am. Stat. 59(3), 230 (2005)
Mode, N.A., Conquest, L.L., Marker, D.A.: Ranked set sampling for ecological research: accounting for the total costs of sampling. Environmetrics 10(2), 179–194 (1999)
Muttlak, H.A., McDonald, L.L.: Ranked set sampling and the line intercept method: A more efficient procedure. Biom. J. 34(3), 329–346 (1992)
Ozturk, O.: Sampling from partially rank-ordered sets. Environ. Ecol. Stat. 18(4), 757–779 (2011)
Ozturk, O.: Combining multi-observer information in partially rank-ordered judgment post-stratified and ranked set samples. Can. J. Stat. 41(2), 304–324 (2013)
Ozturk, O., Bilgin, O.C., Wolfe, D.A.: Estimation of population mean and variance in flock management: a ranked set sampling approach in a finite population setting. J. Stat. Comput. Simul. 75(11), 905–919 (2005)
Stokes, S.L.: Ranked set sampling with concomitant variables. Commun. Stat. Theory Methods 6(12), 1207–1211 (1977)
Stokes, S.L.: Estimation of variance using judgment ordered ranked set samples. Biometrics 36(1), 35–42 (1980)
Wang, Y.G., Ye, Y., Milton, D.A.: Efficient designs for sampling and subsampling in fisheries research based on ranked sets. ICES J. Marine Sci. J. du Conseil 66(5), 928–934 (2009)
Wolfe, D.A.: Ranked set sampling: its relevance and impact on statistical inference. ISRN Probab. Stat. 2012, 1–32 (2012)
Acknowledgments
We would like to thank two anonymous reviewers and an associate editor for their constructive comments and suggestions. Mohammad Jafari Jozani gratefully acknowledges research support from NSERC Canada. Armin Hatefi acknowledges partial support through the University of Manitoba Graduate Fellowship, the Manitoba Graduate Scholarship (during his PhD program) and the Fields Ontario Postdoctoral Fellowship.
Appendix: FI of unbalanced PROS and the effect of misplacement errors
In this section, we study the FI matrix of the unbalanced PROS sampling design in a general setting where the subsets are allowed to be of different sizes. To obtain an unbalanced PROS sample, we first determine the sample size K and the set size S. The judgment sub-setting process is then applied to create K sets. We group these K sets into N cycles \(G_i=\{S_{1,i},\ldots ,S_{n_i,i}\};\, i=1,\ldots ,N\), where \(\sum _{i=1}^{N}n_i=K\). Let \(D_{r,i}=\{{d_{r[1]i}},\ldots ,d_{r[n_i]i}\}\) be the design parameter associated with set \(S_{r,i}\), where \({d_{r[l]i}}; l=1,\ldots ,n_i\) is the l-th judgment subset in the set \(S_{r,i}\). In each cycle \(G_i; i=1,\ldots ,N\), we randomly select one unit from each set \(S_{r,i}\), taken from the judgment subset \(d_{r[r]i}; r=1,\ldots , n_i\), for full measurement. We denote this unit by \(X_{[d_r]i}\) and the number of unranked units in subset \(d_{r[r]i}\) by \(m_{ri}; r=1,\ldots ,n_i; i=1,\ldots ,N\). The collection of measured observations \(\{X_{[d_r]i};r=1,\ldots ,n_i;i=1,\ldots ,N\}\) is then an unbalanced PROS sample of size \(K=\sum _{i=1}^{N}n_i\). Table 9 illustrates the construction of an unbalanced PROS sample of size \(K=5\) with set size \(S=6\) and \(N=2\) cycles, in which three subsets are declared in the first cycle (\(n_1=3\)) and two in the second (\(n_2=2\)), the subsets being of different sizes. In each set, \(m_{ri}\) represents the number of unranked units in the selected subset. For more details about this kind of design, see Ozturk (2011).
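The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sub-setting is perfect (units are ranked on their true values), the population is standard normal, ranks are 0-based, and the particular subsets in `design` are hypothetical choices matching \(K=5\), \(S=6\), \(n_1=3\), \(n_2=2\) rather than the entries of Table 9. The function names are not from the paper.

```python
import numpy as np

def draw_pros_unit(rng, sampler, set_size, subset):
    """Measure one unit from a fresh set of `set_size` units.

    `subset` holds the 0-based ranks that form the selected judgment subset.
    Ranking uses the true values, i.e. perfect sub-setting is assumed.
    """
    units = np.sort(sampler(rng, set_size))  # rank the whole set
    rank = rng.choice(subset)                # ranks inside the subset are not distinguished
    return units[rank]

def draw_unbalanced_pros(rng, sampler, set_size, design):
    """`design[i][r]` is the selected subset for set r of cycle i."""
    return [
        [draw_pros_unit(rng, sampler, set_size, subset) for subset in cycle]
        for cycle in design
    ]

# Hypothetical design: cycle 1 uses three subsets of sizes 1, 2, 3;
# cycle 2 uses two subsets of size 3 each.
design = [[(0,), (1, 2), (3, 4, 5)], [(0, 1, 2), (3, 4, 5)]]
rng = np.random.default_rng(1)
normal = lambda rng, size: rng.standard_normal(size)
sample = draw_unbalanced_pros(rng, normal, 6, design)
print(sample)
```

Each measured value comes from its own independently drawn set, so the five observations are independent but not identically distributed.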
In light of Lemma 8 (pointed out in Ozturk 2011), which we prove through latent variables, we show the difference between the complete PROS (Sect. 2, for the perfect case) and the incomplete PROS (Sect. 3.2).
Lemma 8
Let \(Y_{ri}=X_{[d_r]i}\) be an observation from unbalanced PROS sampling design from a continuous distribution with pdf \(f(\cdot ;{\varvec{\theta }})\). With knowledge of the design parameter \(D_{r,i}\), the pdf of \(Y_{ri}\) is given by
where \(f^{[v:S]}(y;{\varvec{\theta }})\) is the pdf of the v-th judgment order statistic in a set of size S.
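Lemma 9 refers to the density of Lemma 8 as \(f_{[r;m_{ri}]}(\cdot ;{\varvec{\theta }})\). Assuming the measured unit is drawn uniformly from the \(m_{ri}\) unranked units of subset \(d_{r[r]i}\), one form consistent with the definitions above would be the equal-weight mixture below. It is a sketch inferred from the surrounding notation and from Ozturk (2011), not necessarily the exact display of the lemma.

```latex
f_{[r;m_{ri}]}(y;{\varvec{\theta}})
  = \frac{1}{m_{ri}} \sum_{v \in d_{r[r]i}} f^{[v:S]}(y;{\varvec{\theta}})
```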
Proof
For each \(Y_{ri}\) define the latent vector \({\varvec{\Delta }}^{[d_r]i}= (\Delta ^{[d_r]i}(v), v\in d_{r[r]i})\), where
with \(\sum _{v \in d_{r[r]i}} \Delta ^{[d_r]i}(v)=1\). The joint pdf of \((Y_{ri},{\varvec{\Delta }}^{[d_r]i})\) is given by
Furthermore, by summing the joint distribution of \((Y_{ri},{\varvec{\Delta }}^{[d_r]i})\) over \({\varvec{\Delta }}^{[d_r]i}={\varvec{\delta }}^{[d_r]i}\), the marginal distribution of \(Y_{ri}\) is obtained as follows:
\(\square \)
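The latent-variable argument can be written out as follows. This is a sketch under the assumption that the measured unit is equally likely to hold any of the \(m_{ri}\) judgment ranks in \(d_{r[r]i}\); the indicator coding is an assumed reading of the proof rather than a quotation of it.

```latex
% Indicator of the unobserved judgment rank of the measured unit
\Delta^{[d_r]i}(v) =
  \begin{cases}
    1, & \text{if } Y_{ri} \text{ has judgment rank } v \text{ in its set},\\
    0, & \text{otherwise},
  \end{cases}
  \qquad v \in d_{r[r]i}

% Joint pdf of the observation and its latent rank indicator
f\bigl(y, {\varvec{\delta}}^{[d_r]i}; {\varvec{\theta}}\bigr)
  = \prod_{v \in d_{r[r]i}}
    \Bigl\{ \tfrac{1}{m_{ri}}\, f^{[v:S]}(y;{\varvec{\theta}}) \Bigr\}^{\delta^{[d_r]i}(v)}

% Summing over the m_{ri} admissible unit vectors gives the marginal
\sum_{{\varvec{\delta}}^{[d_r]i}} f\bigl(y, {\varvec{\delta}}^{[d_r]i}; {\varvec{\theta}}\bigr)
  = \frac{1}{m_{ri}} \sum_{v \in d_{r[r]i}} f^{[v:S]}(y;{\varvec{\theta}})
```

Because exactly one indicator equals 1, each admissible value of the latent vector contributes a single term of the sum.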
Using Lemma 8, the likelihood function under an unbalanced PROS design is now given by
where \(\Omega = ({\varvec{\theta }}, \varvec{\alpha })\), \(f^{(u:S)}(\cdot ;{\varvec{\theta }})\) is the pdf of the u-th order statistic and, in a similar vein to Sect. 3.2, \(\alpha _{[d_r,d_h]i}\) is the misplacement probability of a unit from subset \(d_{h[h]i}\) into subset \(d_{r[r]i}\), so that \(\sum _{h=1}^{n_i} \alpha _{[d_r,d_h]i}=\sum _{r=1}^{n_i} \alpha _{[d_r,d_h]i} =1; i=1,\ldots ,N\).
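For orientation, a likelihood consistent with these definitions would mix the subset densities over the true subset of origin, weighted by the misplacement probabilities. The display below is a sketch under that assumption; the weight \(1/m_{hi}\) assumes a unit originating from subset \(d_{h[h]i}\) is equally likely to hold any of its ranks, and the authors' exact expression may differ.

```latex
L(\Omega)
  = \prod_{i=1}^{N} \prod_{r=1}^{n_i}
    \Biggl\{ \sum_{h=1}^{n_i} \alpha_{[d_r,d_h]i}\,
      \frac{1}{m_{hi}} \sum_{u \in d_{h[h]i}} f^{(u:S)}(y_{ri};{\varvec{\theta}}) \Biggr\}
```

With no misplacement, \(\alpha _{[d_r,d_h]i}=1\) for \(h=r\) and 0 otherwise, so each factor reduces to an equal-weight mixture over the ranks in \(d_{r[r]i}\).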
Similarly, one can re-write the likelihood function of unbalanced PROS data (13) as:
where
As in Sect. 3.2, to obtain the FI matrix of an unbalanced PROS sample and compare it with its SRS and RSS counterparts, we need the following results, whose proofs are straightforward and are left to the reader.
Lemma 9
Let \(Y_{r,i}=X_{[d_r]i}\), \(r=1,\ldots , n_i; i=1,\ldots ,N\), be observed from a continuous distribution with pdf \(f(\cdot ; {\varvec{\theta }})\) using an unbalanced PROS sampling design. Suppose \(f_{[r;m_{ri}]}(\cdot ;{\varvec{\theta }})\) and \(g_{ri}(\cdot ;{\varvec{\theta }})\) are defined as in Lemma 8 and (14), respectively. Under the regularity conditions of Chen et al. (2004b), we have
(i) \(\sum _{i=1}^{N}\sum _{r=1}^{n_i}E \left\{ \frac{D^2_{{\varvec{\theta }}}g_{ri}(X_{[d_r]i};{\varvec{\theta }})}{g_{ri}(X_{[d_r]i};{\varvec{\theta }})} \right\} = \sum _{i=1}^{N}\sum _{r=1}^{n_i} E \left\{ {D^2_{{\varvec{\theta }}}g_{ri}(X;{\varvec{\theta }})} \right\} ,\)
(ii) \(\sum _{i=1}^{N}\sum _{r=1}^{n_i} E \left\{ \frac{[D_{{\varvec{\theta }}} g_{ri}(X_{[d_r]i};{\varvec{\theta }})][D_{{\varvec{\theta }}} g_{ri}(X_{[d_r]i};{\varvec{\theta }})]^{\top }}{g_{ri}^2(X_{[d_r]i};{\varvec{\theta }})} \right\} = { \sum _{i=1}^{N}\sum _{r=1}^{n_i}} E \left\{ \frac{[D_{{\varvec{\theta }}} g_{ri}(X;{\varvec{\theta }})][D_{{\varvec{\theta }}} g_{ri}(X;{\varvec{\theta }})]^{\top }}{g_{ri}(X;{\varvec{\theta }})} \right\} .\)
Theorem 4
Under the conditions of Lemma 9, the FI matrix of an unbalanced PROS sample about unknown parameters \(\Omega =(\varvec{\alpha },{\varvec{\theta }})\) is given by
Through numerical studies, Table 10 compares the FI content of unbalanced PROS data with that of its RSS (with set size \(n\in \{2,3\}\) and cycle size \(N=1\)) and SRS counterparts in the case of the normal distribution. The parameters of the PROS design are cycle size \(N=1\), set size \(S=6\) and number of subsets \(n\in \{2,3\}\) with subset sizes \(m\in \{2, 3\}\). As in the previous simulation studies, the misplacement ranking error models are obtained following Dell and Clutter (1972) for \(\rho \in \{0.25,0.5,0.75,0.9,1\}\).
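A comparison of this kind can be approximated numerically. The sketch below is not the authors' computation: it covers only perfect ranking (the \(\rho =1\) case), only the location parameter of a normal distribution with known unit variance, and it uses a finite-difference derivative on a grid. The function names and the grid are illustrative assumptions.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)

def order_stat_pdf(x, u, S):
    """pdf of the u-th order statistic (1-based) among S standard normal draws."""
    pdf = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * _erfc(-x / math.sqrt(2.0))  # F(x)
    sf = 0.5 * _erfc(x / math.sqrt(2.0))    # 1 - F(x), accurate in the upper tail
    return u * math.comb(S, u) * cdf**(u - 1) * sf**(S - u) * pdf

def subset_pdf(x, subset, S):
    """Equal-weight mixture of the order-statistic pdfs whose ranks lie in `subset`."""
    return sum(order_stat_pdf(x, u, S) for u in subset) / len(subset)

def location_fi(g, dx):
    """Fisher information about a location shift: integral of g'(x)^2 / g(x)."""
    integrand = np.gradient(g, dx) ** 2 / g
    return float(np.sum(integrand[1:] + integrand[:-1]) * dx / 2.0)  # trapezoid rule

x = np.linspace(-8.0, 8.0, 16001)
dx = x[1] - x[0]

fi_srs = location_fi(order_stat_pdf(x, 1, 1), dx)  # one SRS observation
fi_rss = sum(location_fi(order_stat_pdf(x, u, 2), dx) for u in (1, 2))  # RSS, n = 2
fi_pros = sum(  # PROS, S = 6, n = 2 subsets of size m = 3
    location_fi(subset_pdf(x, d, 6), dx) for d in [(1, 2, 3), (4, 5, 6)]
)
print(fi_srs, fi_rss, fi_pros)
```

Both `fi_rss` and `fi_pros` should be compared with twice `fi_srs`, since each design measures two units. Handling \(\rho <1\) would require the misplacement probabilities, which this sketch does not compute.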
Cite this article
Hatefi, A., Jozani, M.J. Information content of partially rank-ordered set samples. AStA Adv Stat Anal 101, 117–149 (2017). https://doi.org/10.1007/s10182-016-0277-9
DOI: https://doi.org/10.1007/s10182-016-0277-9
Keywords
- Fisher information
- Shannon entropy
- Rényi entropy
- Kullback–Leibler information
- Misplacement probability matrix