Basic concepts for measurement and testing and related quality assurance
A number of basic measurement and testing concepts are explained in this annex. This will help the reader of the paper to better understand some of the terms used in the main body of the text. The authors also wanted to illustrate the similarities between the concepts behind the different terms used in different fields of measurement and (regulatory) testing.
The terms ‘testing’ and ‘measurement’
Two fundamental concepts in this article are ‘test’ and ‘measurement’, which may have different connotations in different scientific disciplines.
For regulatory testing, the OECD guidance document 34 (OECD 2005) defines a test method as (p. 17): ‘… an experimental system that can be used to obtain a range of information from chemical properties through the adverse effects of a substance. The term ‘test method’ may be used interchangeably with ‘assay’ for ecotoxicity as well as for human health studies. …’. Testing means applying a test method.
Measurement is defined (JCGM 2008) as the process of experimentally obtaining one or more quantity values (number and reference together expressing magnitude) of a quantity, the latter being a property of a phenomenon, body, or substance, where the property has a magnitude that can be expressed as a number and a reference. The science of measurement and its application is called metrology.
When speaking in the most general sense, there are tests that are not usually seen as measurements, and measurements that are not usually called tests. However, the basic elements of the above definitions are highly similar.
Substances, materials and samples
The term substance generally describes a chemical element or compound.
This paper uses the REACH (EC 2006) definition of substance. In the EU regulatory context (EC 1967), now part of REACH, the term ‘substance’ has a more specific meaning: ‘a chemical element and its compounds in the natural state or obtained by any manufacturing process, including any additive necessary to preserve its stability and any impurity deriving from the process used, but excluding any solvent which may be separated without affecting the stability of the substance or changing its composition’. Hence, REACH substances may consist of one or more chemical compounds; guidance is available at http://echa.europa.eu.
There is no unique definition of the term material. Its exact meaning varies between different scientific fields or industrial sectors (ISO 2012), but most of the existing ISO definitions [for example in ISO 1182:2010 (ISO 2010b)] describe a material as a single substance or a uniformly dispersed mixture of substances. This definition implies that the term material has a broader coverage than the term substance as defined in REACH. Examples of materials that are not typically called substances are concrete, timber, stone and milk powder. Nevertheless, as REACH addresses all forms of substances placed on the market, materials, including nanomaterials, are covered by REACH.
Both materials and substances may be quite complex in composition, including for example impurities and necessary stabilisers, and may consist of multiple phases or components with a certain variability in composition. In any case, material and substance are collective terms for kinds of matter with properties that are considered uniform above a certain scale. A relevant question is when substances or materials produced in different batches, in different places, or by different processes can be considered identical, equivalent or similar. The answer depends, among other things, on the acceptable heterogeneity, for example in the variation of property values between different batches, as well as on the context, which may for example relate to specific legal requirements. To assess the heterogeneity within or between batches, tests need to be performed on samples of the material(s): a sample is a specific, unique portion of material or substance selected from a larger amount of parent material. A representative sample is a sample that reflects the average property of the parent material.
The terms test sample, test material, and test substance are straightforward combined terms, to be interpreted in accordance with the above understanding of the terms sample, material, or substance. Hence, a test material or test substance is a material or substance to be tested. A test sample is a portion of a test material or test substance.
The terms test item [see e.g. Good Laboratory Practice (GLP) (OECD 1998)], test piece [see e.g. ISO 148–2 (ISO 2008b)] and test specimen are used synonymously, and have the same meaning as test sample.
Steps in a test or measurement process
On the most generic level, any test or measurement consists of the following steps:
A representative sample is taken from the material to be tested and often undergoes pre-treatment (dispersion, dissolution, digestion, etc.).
The test sample is fed into a test system where it creates a response. A wide range of test systems exists, for example physical instruments (electron microscopes, etc.) or biological systems (microbiological assays for antibiotics, rats for toxicity testing, etc.). The nature of the response can be chemical (e.g. production of a precipitate), physical (e.g. absorption of light of a certain wavelength) or biological (e.g. change of growth of bacteria in the presence of the sample).
The response is expressed in quantitative terms (e.g. mass of precipitate is 1.245 g) or in qualitative terms (e.g. there is a blue precipitate).
The response is compared with the response of a (sample of a) second material, with known properties and response, either in a quantitative or in a qualitative way. As the discussion below shows, numerous terms like ‘calibrant’, ‘calibrator’, ‘standard’, ‘reference substance’, ‘reference chemical’ or ‘certified reference material’ and ‘control’ exist for this second sample (Emons et al. 2004), and possibly each term has a slightly different meaning in its context of origin. This comparison is an important part of the metrological traceability chain of the test result (see “Metrological traceability” section).
In addition to this generic procedure, there are of course features which are particular to the test system (e.g. higher variability in observed responses when using biological test systems as compared to physical test systems). Also in these cases, the above generic procedure would still provide a descriptive framework.
Metrological traceability
A key concept in any testing is ‘metrological traceability’, defined by the Joint Committee for Guides in Metrology (JCGM 2008) as ‘property of a measurement result whereby the result can be related to a reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty’. The level of traceability establishes the degree of comparability of the measurement, for example over time and across locations: whether the result of a measurement can be compared to the previous one, to a measurement result from a year ago, or to the result of a measurement performed anywhere else in the world.
Metrological traceability includes the identity of the measurand (the property to be measured) and its quantity value. The identity of the measurand can be structurally defined (such as mass, or bond lengths in a specific molecule) or procedurally defined, i.e. defined by the test procedure itself. Examples of procedurally defined measurands are the mass fraction of dietary fibre in fruits, enzyme activity, or skin sensitization according to OECD Test Guideline 429, where the measurand is the proliferation of lymphocytes in the lymph node in test groups of animals exposed to the test substance. This proliferation is compared to that in test groups of vehicle-treated control animals, and the comparison gives a Stimulation Index.
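As an illustration of a procedurally defined measurand, the comparison between treated and vehicle-control groups can be sketched in a few lines of Python. The sketch assumes that the Stimulation Index is the ratio of the mean proliferation in the treated group to the mean proliferation in the control group; the function name and all readings are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def stimulation_index(treated, vehicle_controls):
    """Ratio of mean proliferation (treated group) to mean proliferation (vehicle controls).

    Assumption: the index is a simple ratio of group means.
    """
    control_mean = mean(vehicle_controls)
    if control_mean <= 0:
        raise ValueError("mean control response must be positive")
    return mean(treated) / control_mean


# Hypothetical proliferation readings, one value per animal (arbitrary units).
vehicle = [400.0, 500.0, 450.0, 450.0]
treated = [1500.0, 1700.0, 1600.0, 1600.0]

si = stimulation_index(treated, vehicle)
print(f"Stimulation Index: {si:.2f}")
```

Because the result is a ratio of two responses from the same test system, it is a relative value: its meaning is tied to the procedure and to the vehicle control used.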
Traceability of the quantity value is defined by the calibration system. In its simplest form, an international agreement designates a certain material as the standard. A well-known example is the international kilogram stored in Paris, and many primary WHO preparations (WHO 2011) also fall into this category.
Quality assurance of test and measurement methods
In order to ensure that the testing is as reliable as required by the user of the test results, methods are validated, calibrated and subjected to continuous quality control, as relevant. The main aspects of such quality assurance efforts, as well as the requirements of the corresponding benchmark materials, are described below.
Calibration is the process where the response of the calibrant or reference substance is related to its property value. The curve of the known property values of a set of calibrants versus responses of the test system allows assigning a quantity value to other substances measured thereafter. A first key requirement is that the calibrant or reference substance is homogeneous and stable to ensure that any portion of it gives the same result. A second key requirement is that the value of the measurand of interest is well established. The value of the measurand can for example be a concentration or an antibiotic potency. The measurement unit does not need to be a physical unit, and for example, the result of a test may be that a substance A is ‘x times more potent than reference substance B’. Often, calibrants or reference substances are idealised samples, for example consisting of a single chemical compound. For (eco-) toxicological testing (using for example the OECD test guidelines), the tests may not have designs that lend themselves to classical calibration, and the term ‘calibrant/calibration’ is not used.
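The role of the calibration curve can be made concrete with a minimal sketch. It assumes a linear relationship between the property value and the response, which is only one of several possible calibration models; the function names, units and numbers are hypothetical.

```python
def fit_line(values, responses):
    """Ordinary least-squares fit of response = intercept + slope * value."""
    n = len(values)
    v_bar = sum(values) / n
    r_bar = sum(responses) / n
    s_vv = sum((v - v_bar) ** 2 for v in values)
    s_vr = sum((v - v_bar) * (r - r_bar) for v, r in zip(values, responses))
    slope = s_vr / s_vv
    intercept = r_bar - slope * v_bar
    return slope, intercept


def quantity_from_response(response, slope, intercept):
    """Invert the calibration line to assign a quantity value to a new response."""
    return (response - intercept) / slope


# Hypothetical calibrants: known concentrations (mg/L) and observed responses.
calibrant_values = [0.0, 1.0, 2.0, 4.0]
calibrant_responses = [0.02, 0.22, 0.42, 0.82]

slope, intercept = fit_line(calibrant_values, calibrant_responses)
unknown = quantity_from_response(0.52, slope, intercept)
print(f"assigned value: {unknown:.2f} mg/L")
```

The assigned value is only as good as the calibrant values entering the fit, which is why the homogeneity, stability and well-established values of the calibrants are key requirements.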
For (eco-) toxicological testing, the OECD Guidance Document 34 (OECD 2005) defines test method validation as ‘… a process based on scientifically sound principles by which the reliability and relevance of a particular test, approach, method or process are established for a specific purpose. Reliability is defined as the extent of reproducibility of results from a test within and among laboratories over time, when performed using the same standardised protocol. The relevance of a test method describes the relationship between the test and the effect in the target species and whether the test method is meaningful and useful for a defined purpose, with the limitations identified. In brief, it is the extent to which the test method correctly measures or predicts the (biological) effect of interest, as appropriate. Regulatory need, usefulness and limitations of the test method are aspects of its relevance. New and updated test methods need to be both reliable and relevant, i.e. validated’.
Validation of regulatory test methods, for example within the OECD, takes place once to ensure that the methods are reliable and relevant for the endpoint and that the method is recognised within the relevant jurisdiction.
Method validation also has a more generic meaning: laboratories validate test methods, i.e. they verify whether the specified method is adequate for an intended use. According to ISO and others, method validation requires materials to test robustness, precision and trueness. Robustness and precision together correspond to the term reliability as used by the OECD (OECD 2005), and trueness is conceptually similar to the OECD term relevance (OECD 2005).
Testing of robustness comprises including slight variations in the test protocol and observing the effect of this on the outcome of the test. The materials used for this test should resemble the actual test samples as closely as possible. In addition, the material needs to be stable over the time of the study and must be sufficiently homogeneous, so that a variation of results, if detected, indeed reflects the effect of the variations in the test protocol and not sample heterogeneity.
In a precision study, tests are performed on different subsamples of the same material to investigate the test result variation within one test series in the same laboratory (repeatability), between series in the same laboratory (intermediate precision) and between laboratories (reproducibility). Here too, the sample should closely resemble actual test samples in composition and response. To make the results meaningful, the material chosen must be stable over the time of the study and must be homogeneous, to ensure that observed variations reflect the variability of the method within or between laboratories, and not heterogeneity or changes of the material. The material tested does not need previously established quantity values for the measurands of interest, as the goal is a relative assessment.
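The separation of within-laboratory and between-laboratory variation can be sketched with a one-way analysis of variance. The example below is a simplified illustration that assumes a balanced design (the same number of replicates in every laboratory) and covers only repeatability and reproducibility, not intermediate precision; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def precision_estimates(results_by_lab):
    """Return (repeatability SD, reproducibility SD) for a balanced design.

    results_by_lab maps a laboratory label to its list of replicate results.
    """
    labs = list(results_by_lab.values())
    p = len(labs)        # number of laboratories
    n = len(labs[0])     # replicates per laboratory
    if any(len(lab) != n for lab in labs):
        raise ValueError("this sketch assumes a balanced design")

    lab_means = [mean(lab) for lab in labs]
    grand_mean = mean(lab_means)

    # Mean squares within and between laboratories.
    ms_within = sum(
        (x - m) ** 2 for lab, m in zip(labs, lab_means) for x in lab
    ) / (p * (n - 1))
    ms_between = n * sum((m - grand_mean) ** 2 for m in lab_means) / (p - 1)

    var_repeat = ms_within
    var_between_labs = max(0.0, (ms_between - ms_within) / n)
    var_reprod = var_repeat + var_between_labs
    return sqrt(var_repeat), sqrt(var_reprod)


# Hypothetical results: three laboratories, two replicates each.
data = {"A": [10.0, 10.2], "B": [10.4, 10.6], "C": [9.8, 10.0]}
s_r, s_R = precision_estimates(data)
print(f"repeatability SD: {s_r:.3f}, reproducibility SD: {s_R:.3f}")
```

No assigned value of the material appears anywhere in the calculation, which reflects the point that a precision study is a relative assessment.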
In a trueness experiment, the response of the test system to a sample is compared to an assigned or expected response to this sample, and this necessarily requires a well-established value of the measurand in question. The material used to check trueness must be stable until the time of use and sufficiently homogeneous to ensure that different portions give the same result. Typically, a material for a trueness experiment is more complex, or resembles the real-life materials to be tested later more closely, than a material used for calibration. The reason for this is that the trueness experiment must cover not only the signal-producing step in the measurement process, but also all sample preparation steps.
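A trueness check can be sketched as a comparison of the mean of replicate results with the assigned value. The acceptance rule used below is an assumption made for illustration: the bias is compared with twice the combined standard uncertainty of the assigned value and of the mean of the results. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def trueness_check(results, assigned_value, u_assigned, coverage_factor=2.0):
    """Return (bias, expanded uncertainty of the bias, bias_is_significant).

    Assumption: the bias is judged against coverage_factor times the
    combination of the assigned-value uncertainty and the standard error
    of the mean of the replicate results.
    """
    bias = mean(results) - assigned_value
    u_mean = stdev(results) / sqrt(len(results))
    expanded = coverage_factor * sqrt(u_assigned ** 2 + u_mean ** 2)
    return bias, expanded, abs(bias) > expanded


# Hypothetical replicate results on a material with an assigned value of 10.2.
replicates = [9.8, 10.1, 10.0, 10.1]
bias, expanded, significant = trueness_check(replicates, 10.2, u_assigned=0.1)
print(f"bias = {bias:.2f}, limit = {expanded:.2f}, significant: {significant}")
```

The check is only meaningful if the assigned value and its uncertainty are well established, which is the key requirement for a trueness material.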
Quality control of methods
For quality control of a routinely used method, a known sample is analysed periodically or as part of every test series and results are compared with target values. If the result is within specified limits, the method is under control. The material used must be homogeneous, so that heterogeneity does not contribute to the variation of results. The material must also be stable, often over months or even years, as otherwise degradation would make comparison of results obtained over time invalid. Target values and acceptance limits are usually obtained from repeated tests.
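Deriving target values and acceptance limits from repeated tests can be sketched as follows. The sketch assumes limits at the mean plus or minus three standard deviations of the historical results, a common control-chart convention that the text above does not prescribe; the function names and data are hypothetical.

```python
from statistics import mean, stdev


def control_limits(historical_results, k=3.0):
    """Return (target, lower limit, upper limit) from repeated tests.

    Assumption: limits are set at target +/- k standard deviations.
    """
    target = mean(historical_results)
    spread = stdev(historical_results)
    return target, target - k * spread, target + k * spread


def under_control(result, lower, upper):
    """True if a new control-sample result lies within the limits."""
    return lower <= result <= upper


# Hypothetical results of repeated tests on the quality control material.
history = [5.0, 5.2, 4.8, 5.1, 4.9]
target, lower, upper = control_limits(history)

for new_result in (5.3, 5.6):
    status = "under control" if under_control(new_result, lower, upper) else "out of control"
    print(f"{new_result}: {status}")
```

If the control material were heterogeneous or unstable, the limits would widen or drift for reasons unrelated to the method, which is why homogeneity and stability are required.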
Interlaboratory comparisons face the same issues as interlaboratory method validation: a homogeneous set of samples, stable over the duration of the comparison, is required to ensure that any differences between participants’ test results are not due to changes in the inherent material properties or to differences between the distributed samples.
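One common way of summarising participants’ results in an interlaboratory comparison is a z-score, sketched below. The scoring scheme is an assumption made for illustration and is not taken from the text above: each result is compared with an assigned value and scaled by a standard deviation chosen by the organiser. All names and numbers are hypothetical.

```python
def z_scores(results_by_lab, assigned_value, sigma):
    """Return a z-score per laboratory: (result - assigned value) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return {
        lab: (result - assigned_value) / sigma
        for lab, result in results_by_lab.items()
    }


# Hypothetical results reported by three participants.
reported = {"lab1": 10.1, "lab2": 9.7, "lab3": 11.0}
scores = z_scores(reported, assigned_value=10.0, sigma=0.3)

for lab, z in scores.items():
    print(f"{lab}: z = {z:+.2f}")
```

Such scores can only be attributed to laboratory performance if the distributed samples are homogeneous and stable; otherwise a large score may simply reflect a different sample.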