Abstract
Kvam and Samaniego (J Am Stat Assoc 89: 526–537, 1994) derived an estimator that they billed as the nonparametric maximum likelihood estimator (MLE) of the distribution function based on a ranked-set sample. However, we show here that the likelihood used by Kvam and Samaniego (1994) is different from the probability of seeing the observed sample under perfect rankings. By appealing to results on order statistics from a discrete distribution, we write down a likelihood that matches the probability of seeing the observed sample. We maximize this likelihood by using the EM algorithm, and we show that the resulting MLE avoids certain unintuitive behavior exhibited by the Kvam and Samaniego (1994) estimator. We find that the new MLE outperforms both the Kvam and Samaniego (1994) estimator and the unbiased estimator due to Stokes and Sager (J Am Stat Assoc 83: 374–381, 1988) in terms of integrated mean squared error under perfect rankings.
References
Bohn, L. L., & Wolfe, D. A. (1992). Nonparametric two-sample procedures for ranked-set samples data. Journal of the American Statistical Association, 87, 552–561.
Bohn, L. L., & Wolfe, D. A. (1994). The effect of imperfect judgment rankings on properties of procedures based on the ranked-set sample analog of the Mann-Whitney-Wilcoxon statistic. Journal of the American Statistical Association, 89, 168–176.
Dell, T. R., & Clutter, J. L. (1972). Ranked set sampling theory with order statistics background. Biometrics, 28, 545–555.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
Frey, J. (2007a). A note on a probability involving independent order statistics. Journal of Statistical Computation and Simulation, 77, 969–975.
Frey, J. (2007b). Distribution-free statistical intervals via ranked-set sampling. Canadian Journal of Statistics, 35, 585–596.
Frey, J., Ozturk, O., & Deshpande, J. V. (2007). Nonparametric tests for perfect judgment rankings. Journal of the American Statistical Association, 102, 708–717.
Frey, J., & Zhang, Y. (2017). Testing perfect rankings in ranked-set sampling with binary data. Canadian Journal of Statistics, 45, 326–339.
Frey, J., & Zhang, Y. (2019). An omnibus two-sample test for ranked-set sampling data. Journal of the Korean Statistical Society, 48, 106–116.
Gemayel, N. M., Stasny, E. A., Tackett, J. A., & Wolfe, D. A. (2012). Ranked set sampling: An auditing application. Review of Quantitative Finance and Accounting, 39, 413–422.
Halls, L. K., & Dell, T. R. (1966). Trial of ranked-set sampling for forage yields. Forest Science, 12, 22–26.
Howard, R. W., Jones, S. C., Mauldin, J. K., & Beal, R. H. (1982). Abundance, distribution, and colony size estimates for Reticulitermes spp. (Isoptera: Rhinotermitidae) in Southern Mississippi. Environmental Entomology, 11, 1290–1293.
Kvam, P. H. (2003). Ranked set sampling based on binary water quality data with covariates. Journal of Agricultural, Biological, and Environmental Statistics, 8, 271–279.
Kvam, P. H., & Samaniego, F. J. (1993). On the inadmissibility of empirical averages as estimators in ranked set sampling. Journal of Statistical Planning and Inference, 36, 39–55.
Kvam, P. H., & Samaniego, F. J. (1994). Nonparametric maximum likelihood estimation based on ranked set samples. Journal of the American Statistical Association, 89, 526–537.
MacEachern, S. N., Ozturk, O., Wolfe, D. A., & Stark, G. V. (2002). A new ranked set sample estimator of variance. Journal of the Royal Statistical Society, Series B, 64, 177–188.
MacEachern, S. N., Stasny, E. A., & Wolfe, D. A. (2004). Judgment post-stratification with imprecise rankings. Biometrics, 60, 207–215.
McIntyre, G. A. (1952). A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research, 3, 385–390.
McIntyre, G. A. (2005). A method for unbiased selective sampling, using ranked sets. The American Statistician, 59, 230–232. (Originally appeared in Australian Journal of Agricultural Research, 3, 385–390.)
Modarres, R., Hui, T. P., & Zheng, G. (2006). Resampling methods for ranked set samples. Computational Statistics and Data Analysis, 51, 1039–1050.
Nagaraja, H. N. (1992). Order statistics from discrete distributions. Statistics, 23, 189–216.
Stokes, S. L., & Sager, T. W. (1988). Characterization of a ranked-set sample with application to estimating distribution functions. Journal of the American Statistical Association, 83, 374–381.
Wolfe, D. A. (2004). Ranked set sampling: An approach to more efficient data collection. Statistical Science, 19, 636–643.
Wolfe, D. A. (2010). Ranked set sampling. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 460–466.
Wolfe, D. A. (2012). Ranked set sampling: Its relevance and impact on statistical inference. ISRN Probability and Statistics, 568385, 1–32.
Acknowledgements
The authors thank the reviewers for helpful suggestions that have improved the paper.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Supplementary Information
Electronic supplementary material for this article is available online.
Appendix
We will demonstrate here that, for the three estimators considered in Sect. 4, \(MSE_{{\hat{F}}}(x)\) depends on x only through F(x) if F is continuous and the rankings follow the fraction-of-random-rankings model. It then follows that the IMSE and RIMSE values from Sect. 4 are distribution-free for continuous distributions. The perfect-rankings case of this result is implicit in Sect. 4 of Kvam and Samaniego (1993), though it is not stated formally there.
Let \(Y_1,\ldots ,Y_N\) be the independent judgment order statistics to be included in the sample, with associated in-set ranks \(s_1,\ldots ,s_N\) and set sizes \(m_1,\ldots ,m_N\). Because the three estimators considered in Sect. 4 are step functions whose values \({\hat{F}}(x)\) depend only on the ranks \(s_i\) attached to the ordered sample values and on which two consecutive ordered values x falls between, we have that

\[ MSE_{{\hat{F}}}(x) = \sum _{\pi \in S_N} \sum _{i=0}^{N} P\left( Y_{\pi (1)}< \cdots < Y_{\pi (N)},\; Y_{\pi (i)} \le x < Y_{\pi (i+1)}\right) \left( {\hat{F}}(\pi ,i) - F(x)\right) ^2, \]
where \(S_N\) is the permutation group on the integers \(\{1,\ldots ,N\}\), \(Y_{\pi (0)}\) and \(Y_{\pi (N+1)}\) are given by \(Y_{\pi (0)} \equiv -\infty\) and \(Y_{\pi (N+1)} \equiv \infty\), and \({\hat{F}}(\pi ,i)\) is the estimated value for F(x) on the interval \(x \in [Y_{\pi (i)},Y_{\pi (i+1)})\) when the judgment order statistics are ordered as \(Y_{\pi (1)}< \cdots < Y_{\pi (N)}\). In other words, \(MSE_{{\hat{F}}}(x)\) can be obtained by running through all possible values for \({\hat{F}}(x)\) and weighting the corresponding squared errors by the probability of the particular \({\hat{F}}(x)\) value occurring. In Sect. 4, we did this for a specific example in Table 2 and the associated discussion.
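Because the probabilities in this sum are awkward to compute directly, the sum can also be approximated by simulation: averaging \(({\hat{F}}(x)-F(x))^2\) over many simulated samples weights each configuration \((\pi ,i)\) by its probability automatically. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code. It uses the pooled empirical distribution function as a stand-in step-function estimator (it is not the new MLE), draws each judgment order statistic by inverse-CDF sampling with `lam` as the probability of a true order statistic, and the helper names `judgment_order_statistic`, `pooled_edf`, and `mc_mse` are hypothetical. Running it for a Uniform(0,1) parent and an Exponential(1) parent at points with the same F(x), using a shared seed, gives matching values, in line with the distribution-free claim made at the start of this appendix.

```python
import math
import random


def judgment_order_statistic(r, m, lam, inv_cdf, rng):
    """Draw one judgment order statistic with in-set rank r and set size m.

    Fraction-of-random-rankings model (lam = probability of a true order
    statistic): with probability lam return the true r-th order statistic of
    m i.i.d. draws; otherwise return the first draw of the set, which is a
    random draw from the parent. Both branches consume the same uniforms, so
    two parents run with a shared seed see identical underlying uniforms.
    """
    us = [rng.random() for _ in range(m)]
    if rng.random() < lam:
        u = sorted(us)[r - 1]  # true r-th order statistic on the uniform scale
    else:
        u = us[0]  # an unranked unit: a plain draw from the parent
    return inv_cdf(u)  # increasing map, so orderings are preserved


def pooled_edf(sample, x):
    """Illustrative step-function estimator: share of observations <= x."""
    return sum(y <= x for y in sample) / len(sample)


def mc_mse(x, cdf, inv_cdf, ranks, sizes, lam, reps=20000, seed=1):
    """Monte Carlo estimate of MSE_Fhat(x) = E[(Fhat(x) - F(x))^2]."""
    rng = random.Random(seed)
    target = cdf(x)
    total = 0.0
    for _ in range(reps):
        sample = [judgment_order_statistic(r, m, lam, inv_cdf, rng)
                  for r, m in zip(ranks, sizes)]
        total += (pooled_edf(sample, x) - target) ** 2
    return total / reps


# Balanced design: set size 3, ranks 1, 2, 3, two cycles (N = 6).
ranks = [1, 2, 3, 1, 2, 3]
sizes = [3] * 6
u = 0.3  # target value of F(x)

mse_uniform = mc_mse(u, lambda t: t, lambda v: v, ranks, sizes, lam=0.8)
x_exp = -math.log(1.0 - u)  # point where the Exponential(1) CDF equals u
mse_expon = mc_mse(x_exp, lambda t: 1.0 - math.exp(-t),
                   lambda v: -math.log(1.0 - v), ranks, sizes, lam=0.8)
print(mse_uniform, mse_expon)  # agree up to floating-point rounding
```

As a sanity check, with `lam=0.0` the sample is a simple random sample, so the estimate should be close to \(F(x)(1-F(x))/N\); with `lam=1.0` it comes out smaller for this balanced design.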
By the probability integral transform, if Y is a random draw from the distribution with continuous distribution function F(x), then \(F(Y) \sim \text{ Uniform }(0,1)\). Similarly, if \(Y_{r:m}\) is the rth order statistic from a random sample of size m, then \(F(Y_{r:m}) \sim \text{ Beta }(r,m+1-r)\). Under the fraction-of-random-rankings model, \(Y_i\) is, with probability \(\lambda\), a true order statistic and, with probability \(1-\lambda\), a random draw from the parent distribution. Thus, the values \(F(Y_1),\ldots ,F(Y_N)\) are independently distributed, with \(F(Y_i)\) being a mixture of the \(\text{ Uniform }(0,1)\) and \(\text{ Beta }(s_i,m_i+1-s_i)\) distributions in which the components get weights \(1-\lambda\) and \(\lambda\), respectively; none of these mixture distributions depends on F. Applying \(F(\cdot )\) to each part of the inequality in the expression for \(MSE_{{\hat{F}}}(x)\), setting \(F(Y_{\pi (0)}) \equiv 0\) and \(F(Y_{\pi (N+1)}) \equiv 1\), and using the fact that, by continuity, the \(F(Y_i)\) values are all distinct with probability 1, we have that

\[ MSE_{{\hat{F}}}(x) = \sum _{\pi \in S_N} \sum _{i=0}^{N} P\left( F(Y_{\pi (1)})< \cdots < F(Y_{\pi (N)}),\; F(Y_{\pi (i)}) \le F(x) < F(Y_{\pi (i+1)})\right) \left( {\hat{F}}(\pi ,i) - F(x)\right) ^2, \]
where the only dependence on x is through the two instances of F(x).
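The probability integral transform step can be checked numerically. The following is a minimal sketch, assuming an Exponential(1) parent (chosen only for illustration; any continuous F should behave the same way) and the weighting used in the text, with weight \(\lambda\) on the Beta component and \(1-\lambda\) on the Uniform component. The function names `transformed_judgment_os` and `beta_mean_var` are hypothetical. Under perfect rankings, the simulated values of \(F(Y_{1:3})\) should have mean and variance close to those of Beta(1, 3).

```python
import math
import random
import statistics


def transformed_judgment_os(r, m, lam, reps=20000, seed=2):
    """Simulate F(Y) for a judgment order statistic Y (rank r, set size m).

    Illustrative parent: Exponential(1), with F(y) = 1 - exp(-y).
    Fraction-of-random-rankings model: with probability lam, Y is the true
    r-th order statistic; otherwise Y is a randomly selected unit of the set.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        ys = sorted(rng.expovariate(1.0) for _ in range(m))
        y = ys[r - 1] if rng.random() < lam else rng.choice(ys)
        values.append(1.0 - math.exp(-y))  # probability integral transform
    return values


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    return a / (a + b), a * b / ((a + b) ** 2 * (a + b + 1))


r, m = 1, 3
# Perfect rankings (lam = 1): F(Y_{r:m}) should be Beta(r, m + 1 - r).
perfect = transformed_judgment_os(r, m, lam=1.0)
print(statistics.mean(perfect), statistics.pvariance(perfect))
print(beta_mean_var(r, m + 1 - r))  # (0.25, 0.0375)

# lam = 0.7: weight 0.7 on Beta(r, m + 1 - r) and 0.3 on Uniform(0, 1).
lam = 0.7
mixed = transformed_judgment_os(r, m, lam=lam)
expected_mean = lam * r / (m + 1) + (1 - lam) * 0.5
print(statistics.mean(mixed), expected_mean)
```

Selecting a unit at random from the set, independently of the values, leaves its marginal distribution equal to the parent, which is why `rng.choice(ys)` stands in for a random draw.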
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Frey, J., Zhang, Y. Nonparametric maximum likelihood estimation of the distribution function using ranked-set sampling. J. Korean Stat. Soc. 52, 901–920 (2023). https://doi.org/10.1007/s42952-023-00229-0
DOI: https://doi.org/10.1007/s42952-023-00229-0