Nonparametric maximum likelihood estimation of the distribution function using ranked-set sampling

  • Research Article
  • Journal of the Korean Statistical Society

Abstract

Kvam and Samaniego (J Am Stat Assoc 89: 526–537, 1994) derived an estimator that they billed as the nonparametric maximum likelihood estimator (MLE) of the distribution function based on a ranked-set sample. However, we show here that the likelihood used by Kvam and Samaniego (1994) is different from the probability of seeing the observed sample under perfect rankings. By appealing to results on order statistics from a discrete distribution, we write down a likelihood that matches the probability of seeing the observed sample. We maximize this likelihood by using the EM algorithm, and we show that the resulting MLE avoids certain unintuitive behavior exhibited by the Kvam and Samaniego (1994) estimator. We find that the new MLE outperforms both the Kvam and Samaniego (1994) estimator and the unbiased estimator due to Stokes and Sager (J Am Stat Assoc 83: 374–381, 1988) in terms of integrated mean squared error under perfect rankings.
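For orientation, the Stokes and Sager (1988) estimator mentioned above is commonly written, for a balanced design in which each judgment rank 1,…,m is measured equally often, as the empirical CDF of all observations pooled together. The sketch below assumes such a balanced design and uses hypothetical helper names (not taken from the paper or its supplementary file). It computes the pooled estimate and checks the identity behind its unbiasedness: averaging \(P(U_{r:m} \le u) = \sum_{k \ge r} \binom{m}{k} u^k (1-u)^{m-k}\) over r = 1,…,m gives \(\frac{1}{m}\sum_k k \binom{m}{k} u^k (1-u)^{m-k} = u\).

```python
from math import comb

def pooled_ecdf(sample, x):
    """Fraction of all measured values at or below x.

    `sample` holds (value, judgment_rank, set_size) triples. Ranks and set sizes
    are carried along but not used: under a balanced design (assumed here) the
    pooled empirical CDF needs only the values.
    """
    return sum(1 for value, _, _ in sample if value <= x) / len(sample)

def order_stat_cdf(u, r, m):
    """P(U_{r:m} <= u) for the r-th smallest of m iid Uniform(0,1) values."""
    return sum(comb(m, k) * u**k * (1 - u) ** (m - k) for k in range(r, m + 1))

def mean_of_rank_cdfs(u, m):
    """E[pooled ECDF at x] when F(x) = u, for a balanced design with set size m."""
    return sum(order_stat_cdf(u, r, m) for r in range(1, m + 1)) / m

sample = [(1.2, 1, 3), (2.5, 2, 3), (4.0, 3, 3)]  # one cycle, set size 3
print(pooled_ecdf(sample, 2.5))   # two of three values are <= 2.5
print(mean_of_rank_cdfs(0.3, 4))  # equals 0.3 up to rounding error
```

The identity holds for every u and m, which is why the pooled estimator is unbiased under perfect rankings in the balanced case; it says nothing about its efficiency relative to the MLEs discussed in the paper.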

Fig. 1
Fig. 2

References

  • Bohn, L. L., & Wolfe, D. A. (1992). Nonparametric two-sample procedures for ranked-set samples data. Journal of the American Statistical Association, 87, 552–561.

  • Bohn, L. L., & Wolfe, D. A. (1994). The effect of imperfect judgment rankings on properties of procedures based on the ranked-set sample analog of the Mann-Whitney-Wilcoxon statistic. Journal of the American Statistical Association, 89, 168–176.

  • Dell, T. R., & Clutter, J. L. (1972). Ranked set sampling theory with order statistics background. Biometrics, 28, 545–555.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.

  • Frey, J. (2007a). A note on a probability involving independent order statistics. Journal of Statistical Computation and Simulation, 77, 969–975.

  • Frey, J. (2007b). Distribution-free statistical intervals via ranked-set sampling. Canadian Journal of Statistics, 35, 585–596.

  • Frey, J., Ozturk, O., & Deshpande, J. V. (2007). Nonparametric tests for perfect judgment rankings. Journal of the American Statistical Association, 102, 708–717.

  • Frey, J., & Zhang, Y. (2017). Testing perfect rankings in ranked-set sampling with binary data. Canadian Journal of Statistics, 45, 326–339.

  • Frey, J., & Zhang, Y. (2019). An omnibus two-sample test for ranked-set sampling data. Journal of the Korean Statistical Society, 48, 106–116.

  • Gemayel, N. M., Stasny, E. A., Tackett, J. A., & Wolfe, D. A. (2012). Ranked set sampling: An auditing application. Review of Quantitative Finance and Accounting, 39, 413–422.

  • Halls, L. K., & Dell, T. R. (1966). Trial of ranked-set sampling for forage yields. Forest Science, 12, 22–26.

  • Howard, R. W., Jones, S. C., Mauldin, J. K., & Beal, R. H. (1982). Abundance, distribution, and colony size estimates for Reticulitermes spp. (Isoptera: Rhinotermitidae) in Southern Mississippi. Environmental Entomology, 11, 1290–1293.

  • Kvam, P. H. (2003). Ranked set sampling based on binary water quality data with covariates. Journal of Agricultural, Biological, and Environmental Statistics, 8, 271–279.

  • Kvam, P. H., & Samaniego, F. J. (1993). On the inadmissibility of empirical averages as estimators in ranked set sampling. Journal of Statistical Planning and Inference, 36, 39–55.

  • Kvam, P. H., & Samaniego, F. J. (1994). Nonparametric maximum likelihood estimation based on ranked set samples. Journal of the American Statistical Association, 89, 526–537.

  • MacEachern, S. N., Ozturk, O., Wolfe, D. A., & Stark, G. V. (2002). A new ranked set sample estimator of variance. Journal of the Royal Statistical Society Series B, 64(part 2), 177–188.

  • MacEachern, S. N., Stasny, E. A., & Wolfe, D. A. (2004). Judgment post-stratification with imprecise rankings. Biometrics, 60, 207–215.

  • McIntyre, G. A. (1952). A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research, 3, 385–390.

  • McIntyre, G. A. (2005). A method for unbiased selective sampling, using ranked sets. The American Statistician, 59, 230–232. Originally appeared in Australian Journal of Agricultural Research 3:385–390.

  • Modarres, R., Hui, T. P., & Zheng, G. (2006). Resampling methods for ranked set samples. Computational Statistics and Data Analysis, 51, 1039–1050.

  • Nagaraja, H. N. (1992). Order statistics from discrete distributions. Statistics, 23, 189–216.

  • Stokes, S. L., & Sager, T. W. (1988). Characterization of a ranked-set sample with application to estimating distribution functions. Journal of the American Statistical Association, 83, 374–381.

  • Wolfe, D. A. (2004). Ranked set sampling: An approach to more efficient data collection. Statistical Science, 19, 636–643.

  • Wolfe, D. A. (2010). Ranked set sampling. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 460–466.

  • Wolfe, D. A. (2012). Ranked set sampling: Its relevance and impact on statistical inference. ISRN Probability and Statistics, 568385, 1–32.

Acknowledgements

The authors thank the reviewers for helpful suggestions that have improved the paper.

Author information

Corresponding author

Correspondence to Jesse Frey.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Supplementary Information

The electronic supplementary material for this article consists of the following file.

Supplementary file 1 (TXT 6 KB)

Appendix

We demonstrate here that, for the three estimators considered in Sect. 4, \(MSE_{{\hat{F}}}(x)\) depends on x only through F(x) if F is continuous and the rankings follow the fraction-of-random-rankings model. It then follows that the IMSE and RIMSE values from Sect. 4 are distribution-free for continuous distributions. The perfect-rankings case of this result is implicit in Sect. 4 of Kvam and Samaniego (1993), though it is not stated formally there.

Let \(Y_1,\ldots ,Y_N\) be the independent judgment order statistics included in the sample, with associated judgment ranks \(s_1,\ldots ,s_N\) and set sizes \(m_1,\ldots ,m_N\). The three estimators considered in Sect. 4 are step functions, and their \({\hat{F}}(x)\) values depend only on the judgment ranks attached to the ordered sample values and on which two consecutive ordered values x falls between. Using this fact, we have that

$$\begin{aligned} MSE_{{\hat{F}}}(x) = \sum _{\pi \in S_N} \sum _{i=0}^N P\left( Y_{\pi (1)}< \cdots< Y_{\pi (N)}, Y_{\pi (i)}< x < Y_{\pi (i+1)}\right) \left( {\hat{F}}(\pi ,i) - F(x)\right) ^2, \end{aligned}$$

where \(S_N\) is the permutation group on the integers \(\{1,\ldots ,N\}\), \(Y_{\pi (0)}\) and \(Y_{\pi (N+1)}\) are given by \(Y_{\pi (0)} \equiv -\infty\) and \(Y_{\pi (N+1)} \equiv \infty\), and \({\hat{F}}(\pi ,i)\) is the estimated value for F(x) on the interval \(x \in [Y_{\pi (i)},Y_{\pi (i+1)})\) when the judgment order statistics are ordered as \(Y_{\pi (1)}< \cdots < Y_{\pi (N)}\). Because F is continuous, ties among the \(Y_i\) and the event that x coincides with an observation have probability zero, so strict inequalities may be used in the display. In other words, \(MSE_{{\hat{F}}}(x)\) is obtained by running through all possible values of \({\hat{F}}(x)\) and weighting each corresponding squared error by the probability that the value occurs. Sect. 4 carries out this calculation for a specific example in Table 2 and the associated discussion.
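As a concrete instance of this weighting, the sketch below computes \(MSE_{{\hat{F}}}(x)\) exactly for a simpler stand-in estimator, the pooled empirical CDF \({\hat{F}}(x) = K/N\), where K is the number of \(Y_i \le x\), under perfect rankings. It is not claimed to be any of the three estimators of Sect. 4, and the function names are hypothetical. Because this estimator depends on the ordering only through K, the double sum over \(\pi\) and i collapses to a sum over k = 0,…,N. For a continuous parent, \(P(Y_i \le x) = P(U_{s_i:m_i} \le F(x))\), where \(U_{s:m}\) is a uniform order statistic, and the \(Y_i\) are independent, so K has a Poisson-binomial distribution.

```python
from math import comb

def order_stat_cdf(u, r, m):
    """P(U_{r:m} <= u) for the r-th smallest of m iid Uniform(0,1) values."""
    return sum(comb(m, k) * u**k * (1 - u) ** (m - k) for k in range(r, m + 1))

def count_pmf(probs):
    """Poisson-binomial pmf of K, the number of independent events that occur."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, w in enumerate(pmf):
            new[k] += w * (1 - p)   # this Y_i lies above x
            new[k + 1] += w * p     # this Y_i lies at or below x
        pmf = new
    return pmf

def exact_mse_pooled_ecdf(u, ranks, set_sizes):
    """Exact MSE at a point x with F(x) = u, perfect rankings, continuous F."""
    probs = [order_stat_cdf(u, s, m) for s, m in zip(ranks, set_sizes)]
    n = len(probs)
    # Run through every possible estimate k/n and weight its squared error.
    return sum(w * (k / n - u) ** 2 for k, w in enumerate(count_pmf(probs)))

# Balanced design with set size 3, one measured unit per judgment rank.
print(exact_mse_pooled_ecdf(0.4, [1, 2, 3], [3, 3, 3]))
```

Note that the inputs are u = F(x), the ranks, and the set sizes; no other feature of the parent distribution enters, which previews the distribution-free property established in this appendix.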

By the probability integral transform, if Y is a random draw from a continuous distribution with distribution function F(x), then \(F(Y) \sim \text{ Uniform }(0,1)\). Similarly, if \(Y_{r:m}\) is the rth order statistic from a set of size m, then \(F(Y_{r:m}) \sim \text{ Beta }(r,m+1-r)\). Under the fraction-of-random-rankings model, \(Y_i\) is, with probability \(\lambda\), a true order statistic and, with probability \(1-\lambda\), a random draw from the parent distribution. Thus, the values \(F(Y_1),\ldots ,F(Y_N)\) are independently distributed, with \(F(Y_i)\) being a mixture of the \(\text{ Uniform }(0,1)\) and \(\text{ Beta }(s_i,m_i+1-s_i)\) distributions with weights \(1-\lambda\) and \(\lambda\), respectively. Since F is nondecreasing, applying \(F(\cdot )\) to each part of the inequality in the expression for \(MSE_{{\hat{F}}}(x)\) preserves the ordering up to ties; using the fact that, by continuity, the \(F(Y_i)\) values are all distinct, and distinct from F(x), with probability 1, we have that

$$\begin{aligned} MSE_{{\hat{F}}}(x)= & {} \sum _{\pi \in S_N} \sum _{i=0}^N P\left( F(Y_{\pi (1)})< \cdots< F(Y_{\pi (N)}),\right. \\{} & {} \left. F(Y_{\pi (i)})< F(x)< F(Y_{\pi (i+1)})\right) \left( {\hat{F}}(\pi ,i) - F(x)\right) ^2, \end{aligned}$$

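where the only dependence on x is through the two instances of F(x). Because the estimated values \({\hat{F}}(\pi ,i)\) do not involve F, and the joint distribution of \(F(Y_1),\ldots ,F(Y_N)\) is the same for every continuous F, \(MSE_{{\hat{F}}}(x)\) is a function of F(x) alone.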
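A small simulation can illustrate this argument under stated assumptions. The sketch generates \(U_i = F(Y_i)\) directly on the uniform scale under the fraction-of-random-rankings model, taking \(\lambda\) as the probability of a correct ranking so that the Beta component has weight \(\lambda\), as in the mixture stated above. It then maps the same \(U_i\) through two hypothetical parent quantile functions (Exponential(1) and a Pareto law with shape 2 and scale 1) and checks that a step-function estimator evaluated at \(x = F^{-1}(u_0)\) gives exactly the same value in every replicate. The pooled empirical CDF serves only as a stand-in estimator, and all helper names are illustrative.

```python
import math
import random

def draw_uniform_scale(s, m, lam, rng):
    """One draw of U_i = F(Y_i) under the fraction-of-random-rankings model.

    With probability lam the unit is correctly ranked, so U_i is the s-th smallest
    of m Uniform(0,1) values (a Beta(s, m+1-s) draw); otherwise the ranking is
    random and U_i is a single Uniform(0,1) value.
    """
    if rng.random() < lam:
        return sorted(rng.random() for _ in range(m))[s - 1]
    return rng.random()

def pooled_ecdf(values, x):
    """Stand-in step-function estimator: fraction of values at or below x."""
    return sum(v <= x for v in values) / len(values)

def exp_quantile(u):
    """Quantile function of Exponential(1): F^{-1}(u) = -log(1 - u)."""
    return -math.log1p(-u)

def pareto_quantile(u):
    """Quantile function of a Pareto law with shape 2, scale 1: (1 - u)^(-1/2)."""
    return (1.0 - u) ** -0.5

def estimates_agree(u0, ranks, set_sizes, lam, reps, seed=1):
    """Common random numbers: compare the estimate at x = F^{-1}(u0) across parents."""
    rng = random.Random(seed)
    for _ in range(reps):
        u = [draw_uniform_scale(s, m, lam, rng) for s, m in zip(ranks, set_sizes)]
        base = pooled_ecdf(u, u0)  # the Uniform(0,1) parent itself
        for q in (exp_quantile, pareto_quantile):
            y = [q(v) for v in u]  # Y_i = F^{-1}(U_i) has the intended judgment law
            if pooled_ecdf(y, q(u0)) != base:
                return False
    return True

print(estimates_agree(0.5, [1, 2, 3], [3, 3, 3], lam=0.6, reps=300))
```

Because the estimates coincide realization by realization, their mean squared errors at x coincide as well. An estimator that uses the numerical sample values, rather than only their ordering relative to x, would not in general pass this check.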

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Frey, J., Zhang, Y. Nonparametric maximum likelihood estimation of the distribution function using ranked-set sampling. J. Korean Stat. Soc. 52, 901–920 (2023). https://doi.org/10.1007/s42952-023-00229-0

  • DOI: https://doi.org/10.1007/s42952-023-00229-0
