Accreditation and Quality Assurance, Volume 13, Issue 4–5, pp 193–216

A comparison of location estimators for interlaboratory data contaminated with value and uncertainty outliers

General Paper


While estimation of measurement uncertainty (MU) is increasingly acknowledged as an essential component of the chemical measurement process, there is little agreement on how best to use even nominally well-estimated MU. There are philosophical and practical issues involved in defining what is “best” for a given data set; however, there is remarkably little guidance on how well different MU-using estimators perform with imperfect data. This report characterizes the bias, efficiency, and robustness properties of several commonly used or recently proposed estimators of true location, μ, using Monte Carlo (MC) evaluation of “measurement” data sets drawn from well-defined distributions. These synthetic models address a number of issues pertinent to interlaboratory comparison studies. While the MC results do not provide specific guidance on “which estimator is best” for any given set of real data, they do provide insight into the expected relative performance within broadly defined scenarios. Perhaps the most emphatic guidance from the present study is that (1) well-estimated measurement uncertainties can be used to improve the reliability of location determination and (2) some approaches to using measurement uncertainties are better than others. The traditional inverse squared uncertainty-weighted estimators perform well only in the absence of unrepresentative values (value outliers) and underestimated uncertainties (uncertainty outliers); even modest contamination by such outliers may result in relatively inaccurate estimates. In contrast, some inverse total variance-weighted estimators and probability density function area-based estimators perform well for all scenarios evaluated, including underestimated uncertainties, extreme value outliers, and asymmetric contamination.
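The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the study's actual design: the contamination model (one result shifted by 5σ that also reports an uncertainty ten times too small), the data set size, the number of simulated sets, the seed, and the function names are all assumptions chosen for illustration. The unweighted median stands in as a simple robust comparator; it is not one of the inverse total variance or probability density function estimators evaluated in the paper.

```python
import random
import statistics


def isu_weighted_mean(values, uncertainties):
    """Inverse squared uncertainty-weighted mean, with weights w_i = 1 / u_i**2."""
    weights = [1.0 / u ** 2 for u in uncertainties]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def simulate_dataset(rng, n, mu, sigma, n_outliers):
    """Draw n (value, uncertainty) pairs around a true location mu.

    Assumed contamination model (hypothetical, for illustration only):
    the first n_outliers results are centred at mu + 5*sigma and report
    an uncertainty of sigma / 10, i.e. they are both value outliers and
    uncertainty outliers. All other results are N(mu, sigma) with a
    correctly stated uncertainty of sigma.
    """
    values, uncertainties = [], []
    for i in range(n):
        if i < n_outliers:
            values.append(rng.gauss(mu + 5.0 * sigma, sigma))
            uncertainties.append(sigma / 10.0)
        else:
            values.append(rng.gauss(mu, sigma))
            uncertainties.append(sigma)
    return values, uncertainties


def rmse_of_estimators(n_outliers, n_sets=2000, n=15, mu=0.0, sigma=1.0, seed=1):
    """Root-mean-square error of each estimator over n_sets synthetic data sets."""
    rng = random.Random(seed)
    sq_err = {"isu_mean": 0.0, "median": 0.0}
    for _ in range(n_sets):
        x, u = simulate_dataset(rng, n, mu, sigma, n_outliers)
        sq_err["isu_mean"] += (isu_weighted_mean(x, u) - mu) ** 2
        sq_err["median"] += (statistics.median(x) - mu) ** 2
    return {name: (total / n_sets) ** 0.5 for name, total in sq_err.items()}


if __name__ == "__main__":
    for k in (0, 1):  # clean data, then one contaminating result
        print(k, rmse_of_estimators(k))
```

Under these assumed settings, the weighted mean should be the more efficient estimator for clean data, while a single overconfident outlier should dominate it and leave the median comparatively unaffected, which is the qualitative pattern the abstract reports for inverse squared uncertainty weighting.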


Consensus value; Interlaboratory comparisons; Measurement uncertainty; Mixture models; Monte Carlo evaluation; Probability density function; Robustness; Weighting function



Comité Consultatif pour la Quantité de Matière


Inverse total variance


Inverse squared uncertainty


Key comparison


Least power


Median absolute deviation from the median, expressed as a standard deviation


Monte Carlo


Mixture model


Measurement uncertainty


Number of measurements in a data set


Number of bootstrap pseudo-data sets


Number of Monte Carlo simulation data sets


Normal (Gaussian) distribution having mean μ and standard deviation σ


National Metrology Institute


Level of confidence


Principal component


Probability density function


Uniform (rectangular) distribution of integers having lower limit and upper limit u


Uniform (rectangular) distribution of real numbers having lower limit and upper limit u


Estimate of dispersion, expressed as a standard deviation

\( s(\hat{\mu}) \)

Estimate of the variability of an estimator on replicate sampling of a population, expressed as a standard deviation


Uncertainty component of the ith measurement, expressed as a standard deviation


Uncertainty component of the ith measurement, expressed as a coverage interval providing approximately P% level of confidence


Weight in a given calculation given to the ith measurement

\( \bar{x} \)

Arithmetic mean


Value component of the ith measurement


True location of a population

\( \hat{\mu } \)

Estimate of location


True dispersion of a population, expressed as a standard deviation
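As an illustration of one entry in the list above, the median absolute deviation from the median can be expressed as a standard deviation by multiplying it by the conventional normal-consistency factor of approximately 1.4826. The sketch below assumes that conventional factor; the exact constant used in the paper is not stated in this excerpt, and the function name `made` is hypothetical.

```python
import statistics

# Conventional factor that scales the MAD to the standard deviation of a
# normal distribution: 1 / Phi^-1(0.75), approximately 1.4826 (assumed here).
MAD_TO_SD = 1.0 / statistics.NormalDist().inv_cdf(0.75)


def made(values):
    """Median absolute deviation from the median, scaled to a standard deviation."""
    center = statistics.median(values)
    mad = statistics.median(abs(x - center) for x in values)
    return MAD_TO_SD * mad


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one extreme value outlier
    print(made(data), statistics.stdev(data))
```

Because it is built from medians, this dispersion estimate is barely affected by the single extreme value in the example, whereas the ordinary sample standard deviation is inflated by it.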



Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  1. Analytical Chemistry Division, Stop 8390, Chemical Science and Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, USA
