Assessing PTs by a quality index \(P_{\text{Q}}\) derived from probabilities
This approach makes use of the precision parameters repeatability standard deviation \(\sigma_{\text{r}}\) and reproducibility standard deviation \(\sigma_{\text{R}}\) of automated fluoro-optic SCC measurement as reported in the international standard ISO 13366-2 | IDF 148-2 [6].
Assume that in a given PT the estimates \(s_{\text{r}}\) and \(s_{\text{R}}\) (or the standard deviation between laboratories, \(s_{\text{L}}\)) of the repeatability and reproducibility standard deviations, \(\sigma_{\text{r}}\) and \(\sigma_{\text{R}}\), respectively, are computed (for one level) using the results from p laboratories. Each laboratory measures the test material n times. Then, a quality index \(P_{\text{Q}}\) based on the probabilities derived from Chi-square distributions can be constructed.
From standard statistical results, the following equation relating the estimated and the population repeatability variances with the Chi-square distribution with ν degrees of freedom holds for normally distributed measurements (see also ISO 5725-4 [13]):
$$\hat{\chi }_{{({\text{r}})}}^{2} = \frac{{\nu s_{\text{r}}^{2} }}{{\sigma_{\text{r}}^{2} }}\sim\chi_{\nu }^{2} \quad \quad \nu = p\left( {n - 1} \right),$$
(1)
and similarly
$$\hat{\chi }_{{({\text{R,r}})}}^{2} = \frac{{\nu \left( {s_{\text{R}}^{2} - \left( {1 - \frac{1}{n}} \right)s_{\text{r}}^{2} } \right)}}{{\sigma_{\text{R}}^{2} - \left( {1 - \frac{1}{n}} \right)\sigma_{\text{r}}^{2} }}\sim\chi_{\nu }^{2} \quad \quad \nu = p - 1,$$
(2)
which by \(s_{\text{L}}^{2} = s_{\text{R}}^{2} - s_{\text{r}}^{2}\) is the same as
$$\hat{\chi }_{{({\text{L,r}})}}^{2} = \frac{{\nu \left( {s_{\text{L}}^{2} + \frac{{s_{\text{r}}^{2} }}{n}} \right)}}{{\sigma_{\text{L}}^{2} + \frac{{\sigma_{\text{r}}^{2} }}{n}}}\sim\chi_{\nu }^{2} \quad \quad \nu = p - 1.$$
(3)
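The step from Eq. (2) to Eq. (3) can be spelled out as follows. This is only a restatement of the substitution named above, assuming that the same relation \(\sigma_{\text{L}}^{2} = \sigma_{\text{R}}^{2} - \sigma_{\text{r}}^{2}\) is applied to the population variances in the denominator:
$$s_{\text{R}}^{2} - \left( {1 - \frac{1}{n}} \right)s_{\text{r}}^{2} = \left( {s_{\text{L}}^{2} + s_{\text{r}}^{2} } \right) - s_{\text{r}}^{2} + \frac{{s_{\text{r}}^{2} }}{n} = s_{\text{L}}^{2} + \frac{{s_{\text{r}}^{2} }}{n},\quad \quad \sigma_{\text{R}}^{2} - \left( {1 - \frac{1}{n}} \right)\sigma_{\text{r}}^{2} = \sigma_{\text{L}}^{2} + \frac{{\sigma_{\text{r}}^{2} }}{n}.$$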
Therefore, we can estimate the probabilities \(P_{{({\text{r}})}}\) and \(P_{{({\text{L,r}})}}\):
$$P_{{({\text{r}})}} = P\left( {\chi_{\nu }^{2} > \hat{\chi }_{{({\text{r}})}}^{2} } \right) = 1 - P\left( {\hat{\chi }_{{({\text{r}})}}^{2} } \right) = 1 - P\left( {\frac{{\nu s_{\text{r}}^{2} }}{{\sigma_{\text{r}}^{2} }}} \right)$$
(4)
$$P_{{({\text{L,r}})}} = P\left( {\chi_{\nu }^{2} > \hat{\chi }_{{({\text{L,r}})}}^{2} } \right) = 1 - P\left( {\hat{\chi }_{{({\text{L,r}})}}^{2} } \right) = 1 - P\left( {\frac{{\nu \left( {s_{\text{L}}^{2} + \frac{{s_{\text{r}}^{2} }}{n}} \right)}}{{\sigma_{\text{L}}^{2} + \frac{{\sigma_{\text{r}}^{2} }}{n}}}} \right) .$$
(5)
The known variances \(\sigma_{\text{r}}^{2}\) and \(\sigma_{\text{L}}^{2} = \sigma_{\text{R}}^{2} - \sigma_{\text{r}}^{2}\) are derived from the values of \(\sigma_{\text{r}}\) and \(\sigma_{\text{R}}\), as published in standard ISO 13366-2 | IDF 148-2 [6].
\(P_{{({\text{r}})}}\) and \(P_{{({\text{L,r}})}}\) may then be combined to define the PT quality index \(P_{\text{Q}}\) as the product of these probabilities:
$$P_{\text{Q}} = P_{{({\text{r}})}} P_{{({\text{L,r}})}} .$$
(6)
\(P_{\text{Q}}\) can be (approximately) interpreted as an estimate of the probability that the set of p laboratories within the PT can achieve a repeatability standard deviation as small as \(\sigma_{\text{r}}\) and simultaneously a standard deviation between laboratories as small as \(\sigma_{\text{L}}\).
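As a minimal sketch of Eqs. (1) and (3) to (6), the following Python code computes \(P_{{({\text{r}})}}\), \(P_{{({\text{L,r}})}}\) and \(P_{\text{Q}}\) using only the standard library. The function names `chi2_sf` and `pt_quality_index`, and the numbers in the example call, are illustrative assumptions and are not taken from the study, which used a spreadsheet. The helper `chi2_sf` is restricted to positive integer degrees of freedom.

```python
import math


def chi2_sf(x, nu):
    """Upper-tail probability P(chi2_nu > x) for a positive integer nu."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    # Closed-form start values: nu = 2 (even case) or nu = 1 (odd case).
    if nu % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1).
    while a < nu / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def pt_quality_index(s_r, s_L, sigma_r, sigma_L, p, n):
    """Return (P_r, P_Lr, P_Q) for one PT level with p laboratories and n replicates."""
    nu_r = p * (n - 1)
    chi2_r = nu_r * s_r**2 / sigma_r**2                       # Eq. (1)
    nu_L = p - 1
    chi2_Lr = nu_L * (s_L**2 + s_r**2 / n) / (sigma_L**2 + sigma_r**2 / n)  # Eq. (3)
    P_r = chi2_sf(chi2_r, nu_r)                               # Eq. (4)
    P_Lr = chi2_sf(chi2_Lr, nu_L)                             # Eq. (5)
    return P_r, P_Lr, P_r * P_Lr                              # Eq. (6)


# Hypothetical PT level (all values in SCC/mL are made up for illustration).
sigma_r, sigma_R = 9590.0, 14450.0
sigma_L = math.sqrt(sigma_R**2 - sigma_r**2)
print(pt_quality_index(s_r=8000.0, s_L=9000.0,
                       sigma_r=sigma_r, sigma_L=sigma_L, p=20, n=2))
```

Estimates smaller than the published standard deviations give Chi-square statistics below their degrees of freedom and therefore larger probabilities, which is why a larger \(P_{\text{Q}}\) is read as better PT quality.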
If the reference value θ of the test material is known, or the assigned value θ is accepted as reliable, then the z-scores (based on an accepted standard deviation for proficiency assessment, \(\sigma_{\text{p}}\) [11]) of the p laboratories can be combined. To reduce the influence of extreme z-scores, a robust mean estimator \(\bar{z}_{{({\text{rob}})}}\) according to Huber is needed. It is known as A15 when the robust estimate of the standard deviation is not updated iteratively, and as ‘Huber proposal 2’ or H15 when it is updated iteratively (Algorithm A, described in Annex C of ISO 13528 [12]; see also [14, 15]). The robust sum of z-scores is therefore
$$Z_{p} = p \cdot \bar{z}_{{({\text{rob}})}} ,$$
(7)
and a probability \(P(Z_{p} )\) for \(\left| Z \right| \cdot \sqrt p\) larger than \(\left| {Z_{p} } \right|\) may be derived on the basis of the realisation \(\hat{Z}\) of the standard normal random variable Z, i.e. \(\hat{Z} = Z_{p} /\sqrt p \sim N(0,1)\):
$$P(Z_{p} ) = 2P\left( {Z > \left| {\hat{Z}} \right|} \right) = 2P\left( {Z > \frac{{\left| {Z_{p} } \right|}}{\sqrt p }} \right) = 2\left( {1 - \Phi \left( {\frac{{\left| {Z_{p} } \right|}}{\sqrt p }} \right)} \right) ,$$
(8)
where P(·) stands for probability and Φ(·) indicates the distribution function of the standard normal distribution.
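A possible implementation of Eqs. (7) and (8) is sketched below. The robust mean follows the iterative winsorising scheme usually quoted for Algorithm A of ISO 13528 (start from the median and 1.483 times the median absolute deviation, clip at 1.5 robust standard deviations, rescale by 1.134). These constants and the convergence tolerance are assumptions of this sketch and should be checked against the standard; the function names are hypothetical.

```python
import math
import statistics


def algorithm_a(values, tol=1e-9, max_iter=100):
    """Robust mean and standard deviation by iterative winsorisation."""
    x = statistics.median(values)
    s = 1.483 * statistics.median(abs(v - x) for v in values)
    for _ in range(max_iter):
        delta = 1.5 * s
        # Pull values outside x +/- delta back to the limits.
        clipped = [min(max(v, x - delta), x + delta) for v in values]
        x_new = sum(clipped) / len(clipped)
        ss = sum((c - x_new) ** 2 for c in clipped)
        s_new = 1.134 * math.sqrt(ss / (len(clipped) - 1))
        converged = abs(x_new - x) <= tol and abs(s_new - s) <= tol
        x, s = x_new, s_new
        if converged:
            break
    return x, s


def p_of_zp(z_scores):
    """Return the robust sum Z_p (Eq. 7) and its two-sided probability (Eq. 8)."""
    p = len(z_scores)
    z_rob, _ = algorithm_a(z_scores)
    Z_p = p * z_rob
    # 2 * (1 - Phi(x)) equals erfc(x / sqrt(2)) with x = |Z_p| / sqrt(p).
    prob = math.erfc(abs(Z_p) / math.sqrt(2.0 * p))
    return Z_p, prob


# Made-up z-scores with one extreme value.
print(p_of_zp([-0.5, 0.2, 0.1, -0.3, 0.4, 6.0]))
```

In this example the extreme z-score of 6.0 is pulled back towards the bulk of the data before averaging, so it has far less influence on \(Z_{p}\) than it would have on a plain sum.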
An alternative combination of z-scores is possible because the sum \(S_{p}\) of the squared z-scores is Chi-square distributed with p degrees of freedom [11]: \(S_{p} = \sum\nolimits_{i = 1}^{p} z_{i}^{2} \sim \chi_{p}^{2}\).
The quality index \(P_{\text{Q}}\) has three components in this case: two are related to precision measures and one is related to the trueness of the p mean values.
$$P_{\text{Q}} = P_{{({\text{r}})}} P_{{({\text{L,r}})}} P(Z_{p} )$$
(9)
It is still possible to modify this quality measure by multiplication with a further expression (factor) \(q = f(q_{1}, q_{2}, q_{3}, \ldots, q_{m})\) made up of the PT-specific quality indices \(q_{1}, q_{2}, q_{3}, \ldots, q_{m}\) to obtain
$$P_{\text{Q}} = P_{{({\text{r}})}} P_{{({\text{L,r}})}} P(Z_{p} )q .$$
(10)
The m quality indices \(q_{i1}, q_{i2}, q_{i3}, \ldots, q_{im}\) may be used to model m criteria characterising \({\text{PT}}_{i}\). The components of \(q_{i} = f(q_{i1}, q_{i2}, q_{i3}, \ldots, q_{im})\) could be defined in such a way that higher values in the resulting \(q_{i}\) indicate higher quality.
To compare up to k PTs in such a way, it may be better to compute normalised values, especially if the \(P_{\text{Q}}\) values were calculated according to Eq. (10):
$$\tilde{P}_{{{\text{Q}},i}} = \frac{{P_{{{\text{Q}},i}} }}{{\sum\nolimits_{j = 1}^{k} {P_{{{\text{Q}},j}} } }} .$$
(11)
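The normalisation in Eq. (11) amounts to dividing each index by the sum over all k PTs. A minimal Python sketch, with a hypothetical function name and made-up input values, could look like this:

```python
def normalise(indices):
    """Divide each quality index by the sum of all indices (Eq. 11)."""
    total = sum(indices)
    if total == 0:
        raise ValueError("the indices sum to zero and cannot be normalised")
    return [value / total for value in indices]


# Three hypothetical P_Q values from k = 3 PTs.
print(normalise([0.20, 0.05, 0.50]))
```

The normalised values sum to 1, so they express the standing of each PT relative to the others in the compared set and not an absolute probability.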
Comparing PT schemes over time based on the quality index \(P_{\text{Q}}\) or its elements
There are various possibilities to construct quality control charts for a given PT scheme.
The following quality or performance characteristics may be plotted versus the number of rounds, 1, 2, …, t:
- \(s_{\text{r}}\) or \(s_{\text{r}}^{2}\) or \(\hat{\chi }_{{({\text{r}})}}^{2}\) or \(P_{{({\text{r}})}}\)
- \(s_{\text{L}}\) or \(s_{\text{L}}^{2}\) (or \(s_{\text{R}}\) or \(s_{\text{R}}^{2}\)) or \(\hat{\chi }_{{({\text{L,r}})}}^{2}\) or \(P_{{({\text{L,r}})}}\)
- \(Z_{p}\) or \(P(Z_{p} )\)
- \(P_{\text{Q}}\)
- the fraction of ‘satisfactory’ z-scores, i.e. |z| ≤ 2, as proposed by Gaunt and Whetton [16].
The sums or cumulative averages of these characteristics over t rounds may be used as numerical indices to compare PT schemes quantitatively over time.
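The two numerical summaries mentioned above, the cumulative average of a characteristic over rounds and the fraction of ‘satisfactory’ z-scores, can be sketched in a few lines of Python. The function names and the example values are assumptions made for illustration only.

```python
from itertools import accumulate


def cumulative_averages(values):
    """Average of the first 1, 2, ..., t values of a characteristic."""
    return [total / (i + 1) for i, total in enumerate(accumulate(values))]


def fraction_satisfactory(z_scores, limit=2.0):
    """Share of z-scores with |z| <= limit."""
    return sum(1 for z in z_scores if abs(z) <= limit) / len(z_scores)


# Hypothetical P_Q values of one PT scheme over four rounds.
print(cumulative_averages([0.30, 0.50, 0.40, 0.60]))
# Hypothetical z-scores of one round.
print(fraction_satisfactory([0.5, -2.0, 2.5, 1.0]))
```

Plotting the output of `cumulative_averages` against the round number gives one of the control charts described in the list above.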
Assessing laboratories by a quality index \(P_{\text{L}}\) derived from probabilities
Again, this approach makes use of the precision parameters repeatability standard deviation \(\sigma_{\text{r}}\) and reproducibility standard deviation \(\sigma_{\text{R}}\) of automated SCC measurements, as reported in the international standard ISO 13366-2 | IDF 148-2 [6].
Assume that the values of \(\sigma_{\text{r}}\) and \(\sigma_{\text{R}}\), as published in standard ISO 13366-2 | IDF 148-2 [6], are known and that an accepted reference value θ has been established.
A single laboratory within a PT can be rated in a way similar to the PT rating shown above if it provides a repeatability standard deviation \(s_{\text{r}}\) and a mean value \(\bar{y}\) of n replicates at a given level (\(s_{\text{r}}\) and \(\bar{y}\) being estimates of \(\sigma_{\text{r}}\) and θ, respectively).
With
$$\hat{\chi }_{{({\text{r}})}}^{2} = \frac{{\nu \, s_{\text{r}}^{2} }}{{\sigma_{\text{r}}^{2} }}\sim\chi_{\nu }^{2} ,\quad \nu = n - 1$$
(12)
we can estimate the probability \(P_{{({\text{r}})}}\)
$$P_{{({\text{r}})}} = P\left( {\chi_{\nu }^{2} > \hat{\chi }_{{({\text{r}})}}^{2} } \right) = 1 - P\left( {\hat{\chi }_{{({\text{r}})}}^{2} } \right) = 1 - P\left( {\frac{{\nu s_{\text{r}}^{2} }}{{\sigma_{\text{r}}^{2} }}} \right) .$$
(13)
The difference \(\bar{y} - \theta\), standardised by \(\left[ {\sigma_{\text{R}}^{2} - \left( {1 - \frac{1}{n}} \right)\sigma_{\text{r}}^{2} } \right]^{{\frac{1}{2}}}\), is a standard normal variate:
$$\tilde{z}_{n} = \frac{{\bar{y} - \theta }}{{\left[ {\sigma_{\text{R}}^{2} - \left( {1 - \frac{1}{n}} \right)\sigma_{\text{r}}^{2} } \right]^{{\frac{1}{2}}} }}{\sim}N\left( {0,1} \right) ,$$
(14)
which is used to compute the probability
$$P(\tilde{z}_{n} ) = 2P\left( {Z > \left| {\tilde{z}_{n} } \right|} \right) = 2\left( {1 - \Phi \left( {\left| {\tilde{z}_{n} } \right|} \right)} \right) .$$
(15)
\(P_{{({\text{r}})}}\) and \(P(\tilde{z}_{n} )\) may be combined to define the laboratory quality index \(P_{\text{L}}\) as the product of these probabilities:
$$P_{\text{L}} = P_{{({\text{r}})}} P(\tilde{z}_{n} ) .$$
(16)
\(P_{\text{L}}\) can be (approximately) interpreted as an estimate of the probability that a certain laboratory having participated in a PT can achieve a repeatability standard deviation as small as \(\sigma_{\text{r}}\) and simultaneously a difference between the assigned value of the PT θ and its own mean value \(\bar{y}\) as small as the standard deviation between laboratories \(\sigma_{\text{L}}\).
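Eqs. (12) to (16) can be sketched in Python as follows, using only the standard library. The function names and the example values are hypothetical; the helper `chi2_sf` handles positive integer degrees of freedom only, so the laboratory must report at least n = 2 replicates.

```python
import math


def chi2_sf(x, nu):
    """Upper-tail probability P(chi2_nu > x) for a positive integer nu."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if nu % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1).
    while a < nu / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def lab_quality_index(s_r, y_bar, theta, sigma_r, sigma_R, n):
    """Return (P_r, P_z, P_L) for one laboratory at one level."""
    nu = n - 1
    chi2_r = nu * s_r**2 / sigma_r**2                            # Eq. (12)
    P_r = chi2_sf(chi2_r, nu)                                    # Eq. (13)
    sd_mean = math.sqrt(sigma_R**2 - (1.0 - 1.0 / n) * sigma_r**2)
    z_n = (y_bar - theta) / sd_mean                              # Eq. (14)
    P_z = math.erfc(abs(z_n) / math.sqrt(2.0))                   # Eq. (15)
    return P_r, P_z, P_r * P_z                                   # Eq. (16)


# Made-up laboratory result at a level of 162 000 SCC/mL.
print(lab_quality_index(s_r=7000.0, y_bar=170000.0, theta=162000.0,
                        sigma_r=9590.0, sigma_R=14450.0, n=3))
```

A laboratory whose mean equals θ obtains \(P(\tilde{z}_{n} ) = 1\), so in that case \(P_{\text{L}}\) is determined by its repeatability alone.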
Again, it is possible to modify this quality measure by multiplication with a further expression (factor) \(q = f(q_{1}, q_{2}, q_{3}, \ldots, q_{m})\) made up of the laboratory-specific quality indices \(q_{1}, q_{2}, q_{3}, \ldots, q_{m}\) to obtain
$$P_{\text{L}} = P_{{({\text{r}})}} P(\tilde{z}_{n} )q .$$
(17)
The components \(q_{i1}, q_{i2}, q_{i3}, \ldots, q_{im}\) of \(q_{i}\) should be defined in such a way that higher values in the resulting \(q_{i}\) indicate higher quality.
A normalised quality index \(\tilde{P}_{{{\text{L}},i}}\) may be preferred to compare a set of p laboratories, especially if the \(P_{\text{L}}\) values were calculated according to Eq. (17):
$$\tilde{P}_{{{\text{L}},i}} = \frac{{P_{{{\text{L}},i}} }}{{\sum\nolimits_{j = 1}^{p} {P_{{{\text{L}},j}} } }} .$$
(18)
Comparing laboratories over time based on the quality index \(P_{\text{L}}\) or its elements
There are various possibilities to construct quality control charts for a given laboratory (see also ISO 13528 [12]). The following quality or performance characteristics may be plotted versus the number of rounds, 1, 2, …, t:
- \(s_{\text{r}}\) or \(s_{\text{r}}^{2}\) or \(\hat{\chi }_{{({\text{r}})}}^{2}\) or \(P_{{({\text{r}})}}\)
- \(\tilde{z}_{n}\) or \(P(\tilde{z}_{n} )\) (or z-scores as reported by the PT provider)
- \(P_{\text{L}}\)
- the fraction of ‘satisfactory’ z-scores, i.e. |z| ≤ 2, as proposed by Gaunt and Whetton [16].
The sums or cumulative averages of these characteristics over t rounds may be used as numerical indices to compare laboratories quantitatively.
Data
For the testing of the assessment schemes for PTs and laboratories using the probabilistic approach, the data from five national and international PTs were chosen (see Table 1). The PTs took place between September 2010 and October 2011. The data sets were well known, meaning that the evaluation had been finished and feedback had been received.
Table 1 PTs used for the calculation of the quality indices \(P_{\text{Q}}\) and \(P_{\text{L}}\)
Each level of a PT was handled as an individual comparison. PTs and laboratories were anonymised, and, where known, the multiple participations of a certain laboratory were each handled as an individual participant.
An Excel® spreadsheet was used for the evaluation. Firstly, the data of the different PTs and levels were arranged according to the necessary information, which included laboratory labels/codes (and the instrument type, if known), number of replicates n, mean values \(\bar{y}\) as reported by the laboratories, repeatability and reproducibility standard deviations \(s_{\text{r}}\) and \(s_{\text{R}}\) of the laboratories and reference values (consensus or ‘true’ values) θ as well as the \(s_{\text{r}}\) of the PT or PT level. Additionally, the robust sum of the z-scores was calculated according to Eq. (7).
Secondly, the quality indices \(P_{\text{Q}}\) (assessing PTs) were calculated by inserting the data into the specific Excel® spreadsheets. Additionally, the population repeatability standard deviations \(\sigma_{\text{r}}\) and the population reproducibility standard deviations \(\sigma_{\text{R}}\) from ISO 13366-2 | IDF 148-2:2006 [6] had to be entered. As the reference values θ mostly lie between the levels published in the ISO | IDF standard, an interpolation table was used to calculate the relevant \(\sigma_{\text{r}}\) and \(\sigma_{\text{R}}\). For example, ISO 13366-2 | IDF 148-2:2006 [6] gives, for the levels of 150 000 SCC/mL and 300 000 SCC/mL, repeatability values of 6 % and 5 % and reproducibility values of 9 % and 8 %, respectively. For a reference value of 162 000 SCC/mL, a \(\sigma_{\text{r}}\) of 5.92 % or 9 590 SCC/mL and a \(\sigma_{\text{R}}\) of 8.92 % or 14 450 SCC/mL were interpolated. Quality indices \(q_{1} \ldots q_{m}\), as proposed in Eq. (10), were not used because the nature and values of these factors have not yet been considered. Therefore, the weight w for the difference 1 − q has no meaning here. The upper part of Fig. 1 shows a calculation example (with p being the number of laboratories participating in the PT).
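The interpolated values quoted for a reference value of 162 000 SCC/mL are consistent with linear interpolation between the two tabulated levels. The following Python sketch reproduces them under that assumption; the table contains only the two levels quoted in the text, and the names `ISO_LEVELS` and `interpolate_sigmas` are hypothetical.

```python
# (level in SCC/mL, repeatability in %, reproducibility in %), as quoted in the text.
ISO_LEVELS = [
    (150_000, 6.0, 9.0),
    (300_000, 5.0, 8.0),
]


def interpolate_sigmas(theta, table=ISO_LEVELS):
    """Linearly interpolate relative and absolute sigma_r and sigma_R at theta."""
    for (x0, r0, R0), (x1, r1, R1) in zip(table, table[1:]):
        if x0 <= theta <= x1:
            t = (theta - x0) / (x1 - x0)
            rel_r = r0 + t * (r1 - r0)
            rel_R = R0 + t * (R1 - R0)
            # Convert percentages to absolute values in SCC/mL.
            return rel_r, rel_R, rel_r / 100.0 * theta, rel_R / 100.0 * theta
    raise ValueError("theta lies outside the tabulated levels")


rel_r, rel_R, abs_r, abs_R = interpolate_sigmas(162_000)
print(round(rel_r, 2), round(rel_R, 2), round(abs_r), round(abs_R))
```

Rounded to the nearest 10 SCC/mL, the absolute values agree with the 9 590 SCC/mL and 14 450 SCC/mL given above.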
Thirdly, the quality indices \(P_{\text{L}}\) (assessing the laboratories) were calculated by inserting the data into the specific Excel® spreadsheets. Additionally, the population repeatability standard deviations \(\sigma_{\text{r}}\) and the population reproducibility standard deviations \(\sigma_{\text{R}}\) from ISO 13366-2 | IDF 148-2:2006 [6] had to be entered. As described above for the calculation of the quality indices \(P_{\text{Q}}\) for the PTs, an interpolation table is needed to calculate the relevant \(\sigma_{\text{r}}\) and \(\sigma_{\text{R}}\). Again, a weight of w ∈ [0,1] for the difference 1 − q could be chosen, but, as mentioned above, the nature and values of the factors have not yet been considered. Figures 2 and 3 show graphical evaluation and calculation examples.
In addition to the evaluation of the participating laboratories in a specific PT by calculating the individual quality indices \(P_{\text{L}}\), it is also possible to calculate, for example, the median quality indices from different PTs in order to have an indicator regarding the comparability of a certain laboratory or instrument over time and in different PTs (see Fig. 4).
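Such a median per laboratory could be computed as in the following Python sketch. The record layout (laboratory code, PT label, \(P_{\text{L}}\) value), the function name and all data are made up for illustration and do not come from the PTs in Table 1.

```python
from collections import defaultdict
from statistics import median


def median_index_per_lab(results):
    """Group P_L values by laboratory code and return the median per laboratory."""
    by_lab = defaultdict(list)
    for lab, _pt, p_l in results:
        by_lab[lab].append(p_l)
    return {lab: median(values) for lab, values in by_lab.items()}


# Hypothetical (laboratory, PT, P_L) records.
records = [
    ("A", "PT1", 0.2), ("A", "PT2", 0.6), ("A", "PT3", 0.4),
    ("B", "PT1", 0.1), ("B", "PT2", 0.3),
]
print(median_index_per_lab(records))
```

The median is less sensitive than the mean to a single poor or exceptionally good round, which suits an indicator meant to summarise a laboratory across several PTs.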