# A comparison of location estimators for interlaboratory data contaminated with value and uncertainty outliers

## Abstract

While estimation of measurement uncertainty (MU) is increasingly acknowledged as an essential component of the chemical measurement process, there is little agreement on how best to use even nominally well-estimated MU. There are philosophical and practical issues involved in defining what is “best” for a given data set; however, there is remarkably little guidance on how well different MU-using estimators perform with imperfect data. This report characterizes the bias, efficiency, and robustness properties of several commonly used or recently proposed estimators of true location, *μ*, using “Monte Carlo” (MC) evaluation of “measurement” data sets drawn from well-defined distributions. These synthetic models address a number of issues pertinent to interlaboratory comparison studies. While the MC results do not provide specific guidance on “which estimator is best” for any given set of real data, they do provide broad insight into the expected relative performance within broadly defined scenarios. Perhaps the broadest and most emphatic guidance from the present study is that (1) well-estimated measurement uncertainties can be used to improve the reliability of location determination and (2) some approaches to using measurement uncertainties are better than others. The traditional inverse squared uncertainty-weighted estimators perform well only in the absence of unrepresentative values (value outliers) or underestimated uncertainties (uncertainty outliers); even modest contamination by such outliers may result in relatively inaccurate estimates. In contrast, some inverse total variance-weighted estimators and probability density function area-based estimators perform well for all scenarios evaluated, including underestimated uncertainties, extreme value outliers, and asymmetric contamination.
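To make the kind of evaluation described above concrete, the following is a minimal sketch of a Monte Carlo comparison, not the study's actual procedure. It assumes every laboratory value is drawn from *N*(*μ*,*σ*) and that an "uncertainty outlier" is a laboratory whose stated uncertainty is a fraction of the true *σ*. The function names, the contamination scheme, and all parameter values (`n_bad`, `shrink`, `n_mc`, the seed) are hypothetical choices made for illustration only.

```python
import random
import statistics


def iu2_weighted_mean(values, uncertainties):
    """Inverse squared uncertainty weighted mean, with w_i = 1 / u(x_i)^2."""
    weights = [1.0 / (u * u) for u in uncertainties]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def plain_median(values, uncertainties):
    """Median of the values; the stated uncertainties are ignored."""
    return statistics.median(values)


def simulate_data_set(rng, n, mu, sigma, n_bad, shrink):
    """Draw n values from N(mu, sigma).

    Assumption for this sketch: the first n_bad laboratories understate
    their uncertainty as sigma * shrink; the rest state sigma correctly.
    """
    values = [rng.gauss(mu, sigma) for _ in range(n)]
    uncertainties = [sigma * shrink if i < n_bad else sigma for i in range(n)]
    return values, uncertainties


def mc_rmse(estimator, n_mc, n, mu, sigma, n_bad, shrink, seed=1):
    """Root-mean-square error of an estimator over n_mc synthetic data sets."""
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(n_mc):
        x, u = simulate_data_set(rng, n, mu, sigma, n_bad, shrink)
        squared_error += (estimator(x, u) - mu) ** 2
    return (squared_error / n_mc) ** 0.5


# Hypothetical scenario: 10 laboratories, at most one uncertainty outlier.
settings = dict(n_mc=2000, n=10, mu=0.0, sigma=1.0, shrink=0.1)
print("IU2 mean, clean:       ", mc_rmse(iu2_weighted_mean, n_bad=0, **settings))
print("IU2 mean, contaminated:", mc_rmse(iu2_weighted_mean, n_bad=1, **settings))
print("median, contaminated:  ", mc_rmse(plain_median, n_bad=1, **settings))
```

Under these assumptions a single laboratory that understates its uncertainty tenfold receives about 100 times the weight of each other laboratory, so the weighted mean is nearly that one value and its error grows accordingly, while the median is unaffected because it never uses the stated uncertainties. The sketch covers only this one contamination scenario and none of the inverse total variance or probability density function based estimators mentioned above.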

### Keywords

- Consensus value
- Interlaboratory comparisons
- Measurement uncertainty
- Mixture models
- Monte Carlo evaluation
- Probability density function
- Robustness
- Weighting function

### Abbreviations

- CCQM: Comité Consultatif pour la Quantité de Matière
- ITV: Inverse total variance
- IU²: Inverse squared uncertainty
- KC: Key comparison
- Lp: Least power
- MADe: Median absolute deviation from the median, expressed as a standard deviation
- MC: Monte Carlo
- MM: Mixture model
- MU: Measurement uncertainty
- *n*: Number of measurements in a data set
- *n*_{BS}: Number of bootstrap pseudo-data sets
- *n*_{MC}: Number of Monte Carlo simulation data sets
- *N*(*μ*,*σ*): Normal (Gaussian) distribution having mean *μ* and standard deviation *σ*
- NMI: National Metrology Institute
- *P*: Level of confidence
- PC: Principal component
- PDF: Probability density function
- *U*_{I}[*ℓ*,*u*]: Uniform (rectangular) distribution of integers having lower limit *ℓ* and upper limit *u*
- *U*_{R}[*ℓ*,*u*]: Uniform (rectangular) distribution of real numbers having lower limit *ℓ* and upper limit *u*
- *s*: Estimate of dispersion, expressed as a standard deviation
- \( s(\hat{\mu}) \): Estimate of the variability of an estimator on replicate sampling of a population, expressed as a standard deviation
- *u*(*x*_{i}): Uncertainty component of the *i*th measurement, expressed as a standard deviation
- *U*_{P}(*x*_{i}): Uncertainty component of the *i*th measurement, expressed as a coverage interval providing approximately *P*% level of confidence
- *w*_{i}: Weight in a given calculation given to the *i*th measurement
- \( \bar{x} \): Arithmetic mean
- *x*_{i}: Value component of the *i*th measurement
- *μ*: True location of a population
- \( \hat{\mu} \): Estimate of location
- *σ*: True dispersion of a population, expressed as a standard deviation
