Replicated observations in metrology and testing: modelling repeated and non-repeated measurements

  • General Paper
  • Published in Accreditation and Quality Assurance

Abstract

A basic issue in choosing the right statistical tool for the accurate analysis of any specific problem is to understand whether the set of replicated measurement results under examination is to be considered as pertaining to repeated measurements or not. This issue is particularly important because most of the traditional tools are valid only for repeated measurements, while in many cases, such as laboratory comparisons (in metrology and in testing), the measurements necessary to assess the measurand value and the associated uncertainty correctly do not represent repeated measurements. The analysis performed in this paper aims to shed some light on these issues, starting with a review of the basic concepts, such as repeatability, reproducibility, accuracy, systematic error and bias, as defined in international documents and as used in the literature. The paper shows that a full consensus on a common language and understanding has not yet been achieved, and then shows how this fact is reflected in the basic data models, especially those concerning inter-comparison data.

Notes

  1. In this paper, the term “replicated measurements” is used to indicate, in a general way, the “determination of a value more than once” (1993) [7]. The term “repeated” has a specific statistical meaning and potential confusion should be avoided. In fact, replicated measurements can be either repeated or “non-repeated,” depending on the conditions. See, for example, the statement “to verify control of precision, the laboratory may perform a number of replicate measurements under repeatability conditions” in [8].

  2. (2.13) “A statistical analysis of the quantity values obtained by measurements under repeatability conditions.” The reference in the text to the VIM 2004 draft is solely for the purpose of indicating recent significant changes in the definitions.

  3. But see footnote 7 below.

  4. Here, I distinct “experimental units”, or groups of measurements, are obtained in a single laboratory at different times (see the first sketch after these notes). Under certain special conditions, one can instead consider the case where I laboratories perform the measurements, each on its own standard, all pertaining to the same “experimental unit”: in this case, the subscript i refers to the ith laboratory and, in the remainder of this section, any reference to “group” should be read as “laboratory.” See more about these conditions, and their applicability, in Viewpoint A.

  5. “The statistical conclusions are conditional on the assumed model. Therefore, the conclusions are justified only to the extent that the assumed model is justified” [16].

  6. That is, if J is the same for all i.

  7. However, the GUM refers this sentence to repeated measurements: (3.1.5) “variations in repeated observations are assumed to arise from not being able to hold completely constant each influence quantity that can affect the measurement results,” a concept repeated in (3.2.2). This is inconsistent with the prevailing definition of “repeated measurements.”

  8. For example, once an MRA key comparison is done and Draft A is distributed, outlying data become evidence of “known systematic effects that significantly influence the estimate” and so, according to the GUM, should be “corrected”: this is not allowed by the MRA.

  9. Obviously, these “influences” are only those under the control of the experimenter. Time is almost never an influence quantity in itself, but the influence quantities can show variability over time.

  10. But, until the 2004 version, the VIM definition was the “mean that would result from an infinite number of measurements of the same measurand carried out under repeatability conditions minus a true value of the measurand” (i.e. a random variable that carries the very same uncertainty as the repeated measurements).

  11. Zero expectation after correction. The uncertainties ε_ij in Eq. 1 “include components of uncertainty associated with the corrections” [17].

  12. Actually, there are contrasting definitions of reproducibility: QUAM defines it as the “variability obtained when different laboratories analyse the same sample,” while “intermediate precision relates to the variation in results observed when one or more factors, such as time, equipment and operator, are varied within a laboratory” [20] (the usual definition being given for repeatability: “variability observed within a laboratory, over a short time, using a single operator, item of equipment, etc.”); the GUM [6], as seen, uses the reproducibility definition but for “observations ... obtained under the same conditions of measurement” (4.2.1).

  13. This difficulty is pointed out in ISO 21749 [21] reported above.

  14. As in the definition of systematic error of the VIM [11] and QUAM [20], reported above.

  15. Notice again that the weaker the control on the influence factors, the smaller the conceptual difference between the concepts of “repeatability” and “reproducibility.”

  16. This is basically the definition up until 2004. It is worth reporting that, in the 2006 draft, two alternative definitions had been considered: “〈classical [error] approach〉 closeness of agreement between a measured quantity value and a true quantity value of the measurand” and noting “the concept “measurement accuracy” is not given a numerical value, but a measurement is said to be more accurate when it offers a smaller measurement uncertainty. Measures of measurement accuracy are found in ISO 5725”; (2.14) “〈uncertainty approach〉 closeness of agreement between measured quantity values that are being attributed to the measurand” and noting “the concept measurement accuracy is not given a numerical value, but a measurement is said to be more accurate when it offers a smaller measurement uncertainty.” Then, the 3rd edition adopted the “Uncertainty Approach.”

  17. Actually, “consistent bias,” as indicated by NIST: “bias that is significant and persists consistently over time for a specific instrument, operator, or configuration should be corrected if it can be reliably estimated from repeated measurements” (2.5.3.3.2) [22] (notice the incorrect use of “repeated”).

  18. One has to note that, when applied to sets of intra-laboratory data, model 4 should also replace model 3 for the non-repeated measurements performed to obtain a measure of reproducibility, when the intra-laboratory knowledge is supplemented by inter-laboratory knowledge arising, e.g. from a comparison operation. In model 4, ε_i becomes ε_i + η_i (see the variance-components sketch after these notes).

  19. In fact, in metrology, the b_i remain as unknown as a is; only the differences (b_h − b_k) between pairs of laboratories are measured (see the pairwise-difference sketch after these notes).

  20. According to the VIM [11], “metrological compatibility” is defined (2.47) by the condition that the “absolute value of the difference of any pair of measured quantity values from two different measurement results is smaller than some chosen multiple of the standard measurement uncertainty of that difference” (a compatibility-check sketch is given after these notes), also noting that the “metrological compatibility of measurement results replaces the traditional concept of “staying within the error,” as it represents the criterion for deciding whether two measurement results refer to the same measurand or not. If in a set of measurements of a measurand, thought to be constant, a measurement result is not compatible with the others, either the measurement was not correct (e.g. its measurement uncertainty was assessed as being too small) or the measured quantity changed between measurements.” Until 2004, it was a “property satisfied by all the measurement results of the same quantity, characterised by an adequate overlap of their corresponding sets of quantity values.”

  21. Since these repeated measurements are performed in different laboratories, it may be difficult to apply the current “repeatability condition” definition to them.

  22. A term used in the MRA but not defined by the VIM.

  23. The test hypotheses are generally based on confidence levels or intervals. In metrology, the indication of a threshold for the definition of “outlier” would appear less arbitrary if a risk level were used instead. In fact, assessing the level of the risk of a failure (e.g. a wrong value in a certificate of calibration or of a test) by indicating how critical (risky) that value is (consider, e.g. a medical or contaminant analysis) is much closer to the intended use. “Correct, safe results may be obtained only by deriving proper information concerning acceptable risk from the real situation, and evaluating accordingly the boundaries of the relevant confidence interval. When considering a set of experimental data obtained from a population described in terms of a statistical distribution, a result may fall in a low probability tail owing to chance only, or to the occurrence of an exceptional phenomenon, or a combination of both. No matter which is the real cause, an outlier is produced; should the existence of a perturbing phenomenon be ruled out, chance is left as the only explanation of an unlikely occurrence” [31] (see the tail-probability sketch after these notes).

  24. In the “standard” model of ISO 5725, “it is assumed that ρ and σ are constant over i” [24] because a standard method is used, but that does not, in general, apply to metrology.
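
Illustrative sketches

The Python sketches below illustrate points made in notes 4, 18, 19, 20 and 23. They are sketches only: the numerical values, function names and model forms are assumptions made for demonstration, not taken from the paper.

The first sketch mirrors the two-level layout of note 4: I distinct experimental units (“groups”), each providing J replicated measurements obtained in a single laboratory at different times.

```python
# A minimal sketch (all values assumed) of the two-level layout of
# note 4: I distinct experimental units ("groups"), each providing
# J replicated measurements y[i][j] in a single laboratory.
import random

random.seed(1)

I, J = 5, 10          # number of groups, replicates per group
a = 100.0             # measurand value assumed for the simulation
sigma_within = 0.02   # within-group (repeatability) standard deviation

y = [[a + random.gauss(0.0, sigma_within) for _ in range(J)]
     for _ in range(I)]

group_means = [sum(row) / J for row in y]
print("group means:", [round(m, 3) for m in group_means])
print("grand mean: ", round(sum(group_means) / I, 3))
```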
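
The variance-components sketch below illustrates note 18's substitution ε_i → ε_i + η_i. The forms assumed here for models 3 and 4 are patterned on a standard random-effects model; they are my assumption, not the paper's definitions.

```python
# A hedged sketch of note 18's substitution: in "model 4" the error
# eps_i of "model 3" is supplemented by a between-group effect eta_i
# (eps_i -> eps_i + eta_i). Model forms and values are assumptions.
import random

random.seed(2)

I = 8                 # groups (e.g. laboratories)
a = 50.0              # common measurand value (assumed)
sigma_eps = 0.05      # within-group standard deviation
sigma_eta = 0.20      # between-group standard deviation

# model 3 (repeated measurements): y_i = a + eps_i
model3 = [a + random.gauss(0.0, sigma_eps) for _ in range(I)]
# model 4 (non-repeated): y_i = a + eps_i + eta_i
model4 = [a + random.gauss(0.0, sigma_eps) + random.gauss(0.0, sigma_eta)
          for _ in range(I)]

def sample_std(v):
    m = sum(v) / len(v)
    return (sum((x - m) ** 2 for x in v) / (len(v) - 1)) ** 0.5

print("model 3 spread:", round(sample_std(model3), 3))  # ~ sigma_eps
print("model 4 spread:", round(sample_std(model4), 3))  # ~ sqrt(eps^2 + eta^2)
```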
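
The pairwise-difference sketch below illustrates note 19's identifiability point: shifting every bias b_i by the same constant changes no pairwise difference, so the individual b_i cannot be recovered from a comparison, only their differences. The bias values are hypothetical.

```python
# Note 19: only pairwise differences (b_h - b_k) are observable.
# A common shift of all hypothetical biases leaves every pairwise
# difference unchanged, so the individual b_i stay unknown.
b = [0.10, -0.05, 0.30, 0.00]   # hypothetical laboratory biases

def pairwise_differences(biases):
    n = len(biases)
    return {(h, k): biases[h] - biases[k]
            for h in range(n) for k in range(h + 1, n)}

shifted = [bi + 0.42 for bi in b]   # an arbitrary common shift
d1, d2 = pairwise_differences(b), pairwise_differences(shifted)
assert all(abs(d1[p] - d2[p]) < 1e-12 for p in d1)
print(d1)
```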
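
The compatibility-check sketch below renders the VIM (2.47) criterion quoted in note 20 directly in code, assuming the two results are uncorrelated so that u(y_h − y_k) = sqrt(u_h² + u_k²); the function name and example values are hypothetical.

```python
# A direct rendering of the VIM (2.47) compatibility criterion quoted
# in note 20, assuming uncorrelated results so that
# u(y_h - y_k) = sqrt(u_h**2 + u_k**2).
from math import sqrt

def compatible(y_h, u_h, y_k, u_k, kappa=2.0):
    """True if |y_h - y_k| < kappa * u(y_h - y_k)."""
    return abs(y_h - y_k) < kappa * sqrt(u_h ** 2 + u_k ** 2)

# Two hypothetical laboratory results for the same measurand:
print(compatible(273.1601, 0.0004, 273.1594, 0.0005))  # True: within 2u
print(compatible(273.1601, 0.0004, 273.1570, 0.0005))  # False: beyond 2u
```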
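
Finally, the tail-probability sketch below works out note 23's observation that a result may fall in a low-probability tail owing to chance alone: for n independent observations from a normal population, the probability that at least one lands outside ±3σ purely by chance is 1 − (1 − p)^n (a standard calculation, added here as an assumption about the intended scenario).

```python
# Note 23: a result may fall in a low-probability tail by chance alone.
# For n independent normal observations, the chance that at least one
# exceeds +/- 3 sigma is 1 - (1 - p)**n.
from math import erf, sqrt

def two_tailed_p(k_sigma):
    """P(|Z| > k_sigma) for a standard normal Z."""
    return 1.0 - erf(k_sigma / sqrt(2.0))

p = two_tailed_p(3.0)   # about 0.0027
for n in (10, 100, 1000):
    print(n, round(1.0 - (1.0 - p) ** n, 3))
# As n grows, a spurious "outlier" becomes almost certain, which is why
# a risk-based threshold can be more meaningful than a fixed confidence
# level.
```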

References

  1. Pavese F, Filipe E (2006) Some metrological considerations about replicated measurements on standards. Metrologia 43:419–425

  2. European Accreditation (2003) EA guidelines on the expression of uncertainty in quantitative testing, EA-4/16, December 2003, rev00

  3. Désenfant M, Priel M (2006) Road map for measurement uncertainty evaluation. Measurement 39:841–848. Special Issue: Pavese F (ed) Advanced mathematical tools for measurement in metrology and testing

  4. Kaarls R (1981) Procès-Verbaux des séances du Comité International des Poids et Mesures, vol 49, pp A1–A12 (in French)

  5. Giacomo P (1981) Metrologia 18:43–44

  6. BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, OIML (1995) Guide to the expression of uncertainty in measurement (GUM), 2nd edn. International Organization for Standardization, Geneva, Switzerland

  7. ISO 3534-2 (1993) 2nd edn and (2006) 3rd edn Statistics—vocabulary and symbols—Part 2: applied statistics. International Organization for Standardization, Geneva, Switzerland

  8. American Association for Laboratory Accreditation (A2LA) (2002) Guide for the estimation of measurement uncertainty in testing

  9. Lira I, Woeger W (2006) Evaluation of repeated measurements from the viewpoints of conventional and Bayesian statistics. In: Ciarlini P et al (eds) Advanced mathematical and computational tools in metrology VII (AMCTM VII), Series “Advances in mathematics for applied sciences,” vol 72. World Scientific, Singapore, pp 73–84

  10. Lira I, Woeger W (2006) Comparison between the conventional and Bayesian approaches to evaluate measurement data. Metrologia 43:S249–S259

  11. BIPM/ISO (2007) International vocabulary of basic and general terms in metrology (VIM), 3rd edn

  12. ISO 5725 (1994) Accuracy (trueness and precision) of measurement methods and results. International Organization for Standardization, Geneva, Switzerland

  13. ISO 3534-3 (1999) Statistics—vocabulary and symbols—Part 3: design of experiments, 2nd edn. International Organization for Standardization, Geneva, Switzerland

  14. CIPM (1999) Mutual recognition of national measurement standards and of calibration and measurement certificates issued by national metrology institutes. Bureau International des Poids et Mesures, Sèvres, France

  15. Pavese F (2006) A metrologist viewpoint on some statistical issues concerning the comparison of non-repeated measurement data, namely MRA Key Comparisons. Measurement 39:821–828

  16. Kacker RN (2004) Combining information from interlaboratory evaluations using a random effects model. Metrologia 41:132–136

  17. Kacker RN, Datla RU, Parr AC (2003) Statistical interpretation of Key Comparison reference value and degrees of equivalence. J Res Natl Inst Stand Technol 108:439–446

  18. Grabe M (1987) Principles of “Metrological Statistics.” Metrologia 23:213–219

  19. Grabe M (2005) Measurement uncertainties in science and technology. Springer, Berlin, Germany

  20. Eurachem CITAC (2000) Guide CG4. Quantifying uncertainty in analytical measurements (QUAM 2000.1), 2nd edn

  21. ISO 21749 (2003) Measurement uncertainty for metrological applications—simple replication and nested experiments. International Organization for Standardization, Geneva, Switzerland

  22. National Institute of Standards and Technology (NIST) (2006) Engineering statistics handbook (e-Handbook). Available online at http://www.nist.gov/stat.handbook/

  23. Willink R (2006) Meaning and models in key comparisons, with measures of operability and interoperability. Metrologia 43:S220–S230

  24. Forbes AB, Perruchet C (2006) Measurement systems analysis: concepts and computational approaches. IMEKO World Congress, CD-ROM Proceedings, Sociedade Brasileira de Metrologia, Rio de Janeiro, Brazil, September 2006, session TC21

  25. DIN 1319-1 (1995) Fundamentals of metrology—Part I: basic terminology

  26. Pavese F (2005) Comments on ‘Statistical analysis of CIPM key comparisons based on the ISO Guide.’ Metrologia 42:L10–L12

  27. Kacker RN, Datla RU, Parr AC (2004) Statistical analysis of CIPM key comparisons based on the ISO Guide. Metrologia 41:340–352

  28. White DR (2000) CPEM, Sydney, Australia, CPEM Conference digest, pp 325–326

  29. White DR (2004) Metrologia 41:122–131

  30. Steele AG, Douglas RJ (2006) Simplicity with advanced mathematical tools for metrology and testing. Measurement 39:795–807

  31. Barbato G, Barini E, Levi R (2007) Management of outliers in experimental data measurement. Measurement 40 (in press)

  32. Pavese F (2007) The definition of the measurand in key comparisons: lessons learnt with thermal standards. Metrologia 44 (in press)

  33. Forbes AB (2006) Measurement uncertainty and optimized conformance assessment. Measurement 39:808–814

  34. Willink R (2006) Principles of probability and statistics for metrology. Metrologia 43:S211–S219

  35. Paule RC, Mandel J (1982) Consensus values and weighting factors. J Res Natl Bur Stand 87:377–385

  36. Rukhin AL, Vangel MG (1998) J Am Stat Assoc 93:303–308

  37. Schiller SB, Eberhardt KR (1991) Spectrochim Acta 46B:1607–1613

  38. Iyer HK, Wang CM, Mathew T (2004) J Am Stat Assoc 99:1060–1071

  39. Wang CM, Iyer HK (2006) A generalized confidence interval for a measurand in the presence of type-A and type-B uncertainties. Measurement 39:856–863

Author information

Correspondence to Franco Pavese.

About this article

Pavese, F. Replicated observations in metrology and testing: modelling repeated and non-repeated measurements. Accred Qual Assur 12, 525–534 (2007). https://doi.org/10.1007/s00769-007-0303-4
