Abstract
The performance of a number of robust estimators in the presence of distinct secondary subsets of data is assessed. Estimators examined include the kernel mode recommended by IUPAC, the MM-estimator described by Yohai and, for comparison, the mean, median, and Huber estimate. The performance of the estimators was compared by application to simulated data with one major and one minor mode, and with known minor mode location and proportion of data in the minor mode. The MM-estimator generally performed better than classical and Huber estimates and also provided better precision than the kernel mode at lower minor mode proportions (20% or less). At high minor mode proportion (30%), the kernel density mode provided smaller mean bias and better precision at modest minor mode offsets.
References
ISO 13528:2005 (2005) Statistical methods for use in proficiency testing by interlaboratory comparisons. ISO, Geneva
Analytical Methods Committee (1989) Analyst 114:1693
Thompson M, Ellison SLR, Wood R (2006) Pure Appl Chem 78:145–196
Lowthian PJ, Thompson M (2002) Analyst 127:1359–1364
Huber PJ (1981) Robust statistics. Wiley, New York
Maronna RA, Martin RD, Yohai VJ (2006) Robust statistics: theory and methods. Wiley, Chichester
Yohai VJ (1987) Ann Stat 15:642–656
R Development Core Team (2008) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0, URL http://www.R-project.org
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York. ISBN 0-387-95457-0
Scott DW (1992) Multivariate density estimation: theory, practice, and visualization. Wiley, Chichester
Rousseeuw PJ, Croux C (1993) J Am Stat Assoc 88:1273–1283
ISO/TS 20612:2007 (2007) Water quality—interlaboratory comparisons for proficiency testing of analytical chemistry laboratories. ISO, Geneva
Acknowledgments
Preparation of this paper was supported under contract with the UK Department for Innovation, Universities and Skills National Measurement System (NMS) Chemical and Biological Metrology Programme. The author would additionally like to thank Dr Antonio Possolo (NIST, USA) for suggesting the possible use of MM-estimators in inter-laboratory studies.
Additional information
Presented at the Eurachem PT Workshop, October 2008, Rome, Italy.
Appendix: Formulae and algorithm for MM-estimation
Tukey’s bisquare
Tukey’s bisquare is defined by setting ρ in equation 1 to
\( \rho (z) = \begin{cases} 1 - \left[ 1 - (z/c)^{2} \right]^{3}, & \left| z \right| \le c \\ 1, & \left| z \right| > c \end{cases} \)
where \( z = (x - \hat{\mu })/\hat{\sigma } \), \( \hat{\mu } \) is the robust estimate of location and \( \hat{\sigma } \) a robust estimate of scale. This leads to the weight function
\( w(z) = \left[ 1 - \min \left( 1, (z/c)^{2} \right) \right]^{2} \)
where c is a tuning parameter which takes the values shown in Table 1 for different desired efficiencies.
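As an illustration, the weight function above can be written directly (a minimal sketch in Python with numpy, rather than the R environment used for the study; the function name is ours):

```python
import numpy as np

def bisquare_weight(x, mu, sigma, c):
    """Tukey bisquare weights: w = [1 - min(1, ((x - mu)/(c*sigma))^2)]^2.

    Points within c robust standard deviations of mu get weights in (0, 1];
    points beyond c*sigma get weight exactly zero.
    """
    z2 = np.minimum(1.0, ((np.asarray(x, dtype=float) - mu) / (c * sigma)) ** 2)
    return (1.0 - z2) ** 2
```

A point at the location estimate itself receives weight 1; any point further than c scale units away is rejected entirely, which is what gives the estimator its bounded influence.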
MM-estimate for location with equal prior weights
Yohai’s “MM-estimate” uses simple initial estimates of scale and location as input to an S-estimate, which refines both, providing a robust estimate of scale with high breakdown point and a somewhat improved estimate of location. In a subsequent step, the S-estimate of scale is used in an M-estimate of location, which applies the bisquare weight function above with the tuning parameter set to the desired higher efficiency. The implementation used for the simulation study above is that of Venables and Ripley [9], which provides MM-estimates for the general case of regression with (optionally) differing prior weights. For the simpler case of a single location estimate with equal prior weights, as in the present comparison, the procedure given below to illustrate the method provides essentially identical results within rounding and convergence error.
The two principal steps are:
(a) Obtain initial S-estimates of scale and location.
For n data with no reported uncertainties (and therefore equal prior weights) the initial S-estimate can be implemented by iterative reweighting as follows:
(1) set \( w_{i} = \left[ 1 - \min \left( 1, \left( \frac{x_{i} - \hat{\mu }_{-1}}{k_{0} \hat{\sigma }_{-1}} \right)^{2} \right) \right]^{2} \), where \( k_{0} \) is a tuning constant (set to 1.548 in the present implementation) and \( \hat{\mu }_{-1} \) and \( \hat{\sigma }_{-1} \) are the previous or initial estimates of location and scale, respectively. For a simple location estimate without reported uncertainties, initial values can be set from the median and the scaled median absolute deviation (described as MADE in reference [2]), respectively.
(2) set \( \hat{\mu }_{0} = \sum {w_{i} x_{i} } / \sum {w_{i} } \)
(3) set \( u_{i} = \left[ \left( x_{i} - \hat{\mu }_{0} \right) / k_{0} \hat{\sigma }_{-1} \right]^{2} \)
(4) set \( \hat{\sigma }_{0} = \hat{\sigma }_{-1} \sqrt{ 2\sum {\min \left( 1,\,3u_{i} - 3u_{i}^{2} + u_{i}^{3} \right)} / (n - 1) } \)
Steps (1)–(4) are repeated until the value of \( \hat{\sigma }_{0} \) converges. For the implementation in the present paper, the value was considered to have converged when \( \left| 1 - \hat{\sigma }_{0} / \hat{\sigma }_{-1} \right| < 10^{-5} \). The choice of \( k_{0} \) provides a scale estimate with 50% breakdown.
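Steps (1)–(4) can be sketched as follows (a minimal Python/numpy illustration under the equal-prior-weights assumption; the study itself used R, and the function name, iteration cap, and MAD scaling constant 1.4826 here are ours):

```python
import numpy as np

def s_estimate(x, k0=1.548, tol=1e-5, max_iter=500):
    """Initial S-estimates of location and scale by iterative reweighting,
    following steps (1)-(4) above. Starting values are the median and the
    scaled median absolute deviation (1.4826 * MAD)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu = np.median(x)
    sigma = 1.4826 * np.median(np.abs(x - mu))
    for _ in range(max_iter):
        # step (1): bisquare weights using the previous location and scale
        z2 = np.minimum(1.0, ((x - mu) / (k0 * sigma)) ** 2)
        w = (1.0 - z2) ** 2
        # step (2): weighted-mean update of the location
        mu = np.sum(w * x) / np.sum(w)
        # step (3): squared scaled deviations from the updated location
        u = ((x - mu) / (k0 * sigma)) ** 2
        # step (4): scale update for 50% breakdown; rho is capped at 1
        rho = np.minimum(1.0, 3 * u - 3 * u ** 2 + u ** 3)
        sigma_new = sigma * np.sqrt(2.0 * np.sum(rho) / (n - 1))
        converged = abs(1.0 - sigma_new / sigma) < tol
        sigma = sigma_new
        if converged:
            break
    return mu, sigma
```

With a clear outlier in a small data set, the weight in step (1) falls to zero for the outlying point, so the returned location sits close to the centre of the main body of the data.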
(b) Improve the location estimate.
The final stage uses the scale and location estimates \( \hat{\sigma }_{0} \) and \( \hat{\mu }_{0} \) from the S-estimate, as follows:
(1) set \( \hat{\mu } = \hat{\mu }_{0} \)
(2) set \( w_{i} = \left[ 1 - \min \left( 1, \left( \frac{x_{i} - \hat{\mu }}{c \hat{\sigma }_{0}} \right)^{2} \right) \right]^{2} \), where c is chosen from Table 1 for the desired efficiency (for example, for 95% efficiency c = 4.69)
(3) recalculate \( \hat{\mu } \) from \( \hat{\mu } = \sum {w_{i} x_{i} } / \sum {w_{i} } \)
Steps (2) and (3) are repeated until the value of \( \hat{\mu } \) converges. In the present implementation, the value was considered to have converged if the change in \( \hat{\mu } \) was less than \( 10^{-5} \hat{\sigma }_{0} / \sqrt{n} \).
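The final M-step, taking the S-estimates of location and scale as input, can be sketched as follows (again a Python/numpy illustration rather than the R implementation used in the paper; the function name and iteration cap are ours):

```python
import numpy as np

def m_step_location(x, mu0, sigma0, c=4.69, tol=1e-5, max_iter=500):
    """Refine the location estimate by steps (1)-(3) above, holding the
    S-estimate of scale sigma0 fixed. c = 4.69 gives roughly 95% efficiency
    at the normal distribution."""
    x = np.asarray(x, dtype=float)
    mu = mu0  # step (1): start from the S-estimate of location
    for _ in range(max_iter):
        # step (2): bisquare weights at the current location estimate
        z2 = np.minimum(1.0, ((x - mu) / (c * sigma0)) ** 2)
        w = (1.0 - z2) ** 2
        # step (3): weighted-mean update of the location
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol * sigma0 / np.sqrt(len(x)):
            return mu_new
        mu = mu_new
    return mu
```

Because the scale is held fixed at the high-breakdown S-estimate while the weights use the larger tuning constant c, the refined location keeps the 50% breakdown of the S-step while gaining the higher efficiency associated with c.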
Cite this article
Ellison, S.L.R. Performance of MM-estimators on multi-modal data shows potential for improvements in consensus value estimation. Accred Qual Assur 14, 411–419 (2009). https://doi.org/10.1007/s00769-009-0571-2