Nonparametric maximum likelihood estimation of the distribution function using ranked-set sampling

Frey, Jesse; Zhang, Yimin

doi:10.1007/s42952-023-00229-0

Research Article
Published: 20 September 2023

Volume 52, pages 901–920, (2023)
Journal of the Korean Statistical Society

Abstract

Kvam and Samaniego (J Am Stat Assoc 89: 526–537, 1994) derived an estimator that they billed as the nonparametric maximum likelihood estimator (MLE) of the distribution function based on a ranked-set sample. However, we show here that the likelihood used by Kvam and Samaniego (1994) is different from the probability of seeing the observed sample under perfect rankings. By appealing to results on order statistics from a discrete distribution, we write down a likelihood that matches the probability of seeing the observed sample. We maximize this likelihood by using the EM algorithm, and we show that the resulting MLE avoids certain unintuitive behavior exhibited by the Kvam and Samaniego (1994) estimator. We find that the new MLE outperforms both the Kvam and Samaniego (1994) estimator and the unbiased estimator due to Stokes and Sager (J Am Stat Assoc 83: 374–381, 1988) in terms of integrated mean squared error under perfect rankings.

References

Bohn, L. L., & Wolfe, D. A. (1992). Nonparametric two-sample procedures for ranked-set samples data. Journal of the American Statistical Association, 87, 552–561.
Bohn, L. L., & Wolfe, D. A. (1994). The effect of imperfect judgment rankings on properties of procedures based on the ranked-set sample analog of the Mann-Whitney-Wilcoxon statistic. Journal of the American Statistical Association, 89, 168–176.
Dell, T. R., & Clutter, J. L. (1972). Ranked set sampling theory with order statistics background. Biometrics, 28, 545–555.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
Frey, J. (2007a). A note on a probability involving independent order statistics. Journal of Statistical Computation and Simulation, 77, 969–975.
Frey, J. (2007b). Distribution-free statistical intervals via ranked-set sampling. Canadian Journal of Statistics, 35, 585–596.
Frey, J., Ozturk, O., & Deshpande, J. V. (2007). Nonparametric tests for perfect judgment rankings. Journal of the American Statistical Association, 102, 708–717.
Frey, J., & Zhang, Y. (2017). Testing perfect rankings in ranked-set sampling with binary data. Canadian Journal of Statistics, 45, 326–339.
Frey, J., & Zhang, Y. (2019). An omnibus two-sample test for ranked-set sampling data. Journal of the Korean Statistical Society, 48, 106–116.
Gemayel, N. M., Stasny, E. A., Tackett, J. A., & Wolfe, D. A. (2012). Ranked set sampling: An auditing application. Review of Quantitative Finance and Accounting, 39, 413–422.
Halls, L. K., & Dell, T. R. (1966). Trial of ranked-set sampling for forage yields. Forest Science, 12, 22–26.
Howard, R. W., Jones, S. C., Mauldin, J. K., & Beal, R. H. (1982). Abundance, distribution, and colony size estimates for Reticulitermes spp. (Isoptera: Rhinotermitidae) in Southern Mississippi. Environmental Entomology, 11, 1290–1293.
Kvam, P. H. (2003). Ranked set sampling based on binary water quality data with covariates. Journal of Agricultural, Biological, and Environmental Statistics, 8, 271–279.
Kvam, P. H., & Samaniego, F. J. (1993). On the inadmissibility of empirical averages as estimators in ranked set sampling. Journal of Statistical Planning and Inference, 36, 39–55.
Kvam, P. H., & Samaniego, F. J. (1994). Nonparametric maximum likelihood estimation based on ranked set samples. Journal of the American Statistical Association, 89, 526–537.
MacEachern, S. N., Ozturk, O., Wolfe, D. A., & Stark, G. V. (2002). A new ranked set sample estimator of variance. Journal of the Royal Statistical Society Series B, 64(part 2), 177–188.
MacEachern, S. N., Stasny, E. A., & Wolfe, D. A. (2004). Judgment post-stratification with imprecise rankings. Biometrics, 60, 207–215.
McIntyre, G. A. (1952). A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research, 3, 385–390.
McIntyre, G. A. (2005). A method for unbiased selective sampling, using ranked sets. The American Statistician, 59, 230–232. (Originally appeared in Australian Journal of Agricultural Research, 3, 385–390.)
Modarres, R., Hui, T. P., & Zheng, G. (2006). Resampling methods for ranked set samples. Computational Statistics and Data Analysis, 51, 1039–1050.
Nagaraja, H. N. (1992). Order statistics from discrete distributions. Statistics, 23, 189–216.
Stokes, S. L., & Sager, T. W. (1988). Characterization of a ranked-set sample with application to estimating distribution functions. Journal of the American Statistical Association, 83, 374–381.
Wolfe, D. A. (2004). Ranked set sampling: An approach to more efficient data collection. Statistical Science, 19, 636–643.
Wolfe, D. A. (2010). Ranked set sampling. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 460–466.
Wolfe, D. A. (2012). Ranked set sampling: Its relevance and impact on statistical inference. ISRN Probability and Statistics, 568385, 1–32.

Acknowledgements

The authors thank the reviewers for helpful suggestions that have improved the paper.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, Villanova University, Villanova, PA, 19085, USA
Jesse Frey & Yimin Zhang

Corresponding author

Correspondence to Jesse Frey.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (TXT 6 KB)

Appendix

We will demonstrate here that, for the three estimators considered in Sect. 4, $MSE_{{\hat{F}}}(x)$ depends on x only through F(x) if F(x) is continuous and if the rankings are done using the fraction-of-random-rankings model. It then follows that the IMSE and RIMSE values from Sect. 4 are distribution-free for continuous distributions. The perfect rankings case of this result is implicit in Sect. 4 of Kvam and Samaniego (1993), though not stated in a formal way.

Let $Y_1,\ldots ,Y_N$ be the independent judgment order statistics to be included in the sample, with the associated ranks and set sizes being $s_1,\ldots ,s_N$ and $m_1,\ldots ,m_N$. Using the fact that the three estimators considered in Sect. 4 are step functions with ${\hat{F}}(x)$ values that depend only on the ranks of the ordered sample values and on which ordered values x falls between, we have that

$$\begin{aligned} MSE_{{\hat{F}}}(x) = \sum _{\pi \in S_N} \sum _{i=0}^N P\left( Y_{\pi (1)}< \cdots< Y_{\pi (N)}, Y_{\pi (i)}< x < Y_{\pi (i+1)}\right) \left( {\hat{F}}(\pi ,i) - F(x)\right) ^2, \end{aligned}$$

where $S_N$ is the permutation group on the integers $\{1,\ldots ,N\}$, $Y_{\pi (0)}$ and $Y_{\pi (N+1)}$ are given by $Y_{\pi (0)} \equiv -\infty$ and $Y_{\pi (N+1)} \equiv \infty$, and ${\hat{F}}(\pi ,i)$ is the estimated value for F(x) on the interval $x \in [Y_{\pi (i)},Y_{\pi (i+1)})$ when the judgment order statistics are ordered as $Y_{\pi (1)}< \cdots < Y_{\pi (N)}$. In other words, $MSE_{{\hat{F}}}(x)$ can be obtained by running through all possible values for ${\hat{F}}(x)$ and weighting the corresponding squared errors by the probability of the particular ${\hat{F}}(x)$ value occurring. In Sect. 4, we did this for a specific example in Table 2 and the associated discussion.
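
As a minimal illustration of this weighting, which is not one of the cases studied in Sect. 4, take $N=1$, so that $S_1$ contains only the identity permutation, and assume purely for illustration that the estimator is the empirical distribution function, so that ${\hat{F}}(\pi ,0)=0$ and ${\hat{F}}(\pi ,1)=1$. The double sum then reduces to two terms,

$$\begin{aligned} MSE_{{\hat{F}}}(x) = P\left( x< Y_1\right) F(x)^2 + P\left( Y_1 < x\right) \left( 1 - F(x)\right) ^2, \end{aligned}$$

with the first term coming from $i=0$ and the second from $i=1$.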

By the probability integral transform, if Y is a random draw from the distribution with distribution function F(x), then $F(Y) \sim \text{ Uniform }(0,1)$. Similarly, if $Y_{r:m}$ is an rth order statistic from a set of size m, then $F(Y_{r:m}) \sim \text{ Beta }(r,m+1-r)$. Under the fraction-of-random-rankings model, $Y_i$ is either a true order statistic, with probability $\lambda$, or a random draw from the parent distribution, with probability $1-\lambda$. Thus, the values $F(Y_1),\ldots ,F(Y_N)$ are independently distributed, with $F(Y_i)$ being a mixture of the $\text{ Uniform }(0,1)$ and $\text{ Beta }(s_i,m_i+1-s_i)$ distributions where the components get weights $1-\lambda$ and $\lambda$, respectively. Applying $F(\cdot )$ to each part of the inequality in the expression for $MSE_{{\hat{F}}}(x)$ and using the fact that, by continuity, the $F(Y_i)$ values are all distinct with probability 1, we have that

$$\begin{aligned} MSE_{{\hat{F}}}(x)= & {} \sum _{\pi \in S_N} \sum _{i=0}^N P\left( F(Y_{\pi (1)})< \cdots< F(Y_{\pi (N)}),\right. \\{} & {} \left. F(Y_{\pi (i)})< F(x)< F(Y_{\pi (i+1)})\right) \left( {\hat{F}}(\pi ,i) - F(x)\right) ^2, \end{aligned}$$

where the only dependence on x is through the two instances of F(x).
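
The distribution-free property can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the empirical distribution function of the sample as a stand-in step-function estimator (not necessarily any of the three estimators from Sect. 4), it puts weight `lam` on the order-statistic component and `1 - lam` on the random-draw component, and the names `order_stat_cdf`, `edf_mse`, and `simulate_mse` are hypothetical helpers introduced here. The exact MSE is computed as a function of $u = F(x)$ alone, using the binomial-sum form of the $\text{ Beta }(r,m+1-r)$ distribution function, and is then compared with Monte Carlo estimates under two different continuous parent distributions at points x with the same F(x).

```python
import math
import random

def order_stat_cdf(u, r, m):
    """P(F(Y_{r:m}) <= u): the Beta(r, m+1-r) CDF written as a binomial tail sum."""
    return sum(math.comb(m, j) * u**j * (1 - u)**(m - j) for j in range(r, m + 1))

def edf_mse(u, ranks, sizes, lam=1.0):
    """Exact MSE at F(x) = u of the sample EDF (illustrative estimator only)."""
    n = len(ranks)
    # Assumed mixture: weight lam on the order statistic, 1 - lam on a random draw.
    p = [lam * order_stat_cdf(u, r, m) + (1 - lam) * u for r, m in zip(ranks, sizes)]
    bias = sum(p) / n - u
    var = sum(q * (1 - q) for q in p) / n**2  # independent Bernoulli indicators
    return var + bias**2

def simulate_mse(x, cdf, draw, ranks, sizes, lam, reps, rng):
    """Monte Carlo MSE at x when the parent distribution is sampled by draw(rng)."""
    total = 0.0
    for _ in range(reps):
        count = 0
        for r, m in zip(ranks, sizes):
            if rng.random() < lam:
                y = sorted(draw(rng) for _ in range(m))[r - 1]  # true r-th order statistic
            else:
                y = draw(rng)  # random draw from the parent distribution
            count += y <= x
        total += (count / len(ranks) - cdf(x)) ** 2
    return total / reps

# One balanced cycle with set size 3, evaluated where F(x) = 0.5.
ranks, sizes, lam = [1, 2, 3], [3, 3, 3], 0.8
rng = random.Random(0)
exact = edf_mse(0.5, ranks, sizes, lam)

exp_cdf = lambda x: 1 - math.exp(-x)
norm_cdf = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
approx_exp = simulate_mse(math.log(2), exp_cdf, lambda g: g.expovariate(1.0),
                          ranks, sizes, lam, 20000, rng)
approx_norm = simulate_mse(0.0, norm_cdf, lambda g: g.gauss(0.0, 1.0),
                           ranks, sizes, lam, 20000, rng)
print(exact, approx_exp, approx_norm)
```

Under these assumptions, the two simulated values should agree with the exact value up to Monte Carlo error, even though the exponential and normal parents differ, because each computation involves x only through F(x) = 0.5.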

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Frey, J., Zhang, Y. Nonparametric maximum likelihood estimation of the distribution function using ranked-set sampling. J. Korean Stat. Soc. 52, 901–920 (2023). https://doi.org/10.1007/s42952-023-00229-0

Received: 27 March 2023
Accepted: 30 August 2023
Published: 20 September 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s42952-023-00229-0
