Probabilistic comparison and assessment of proficiency testing schemes and laboratories in the somatic cell count of raw milk
Abstract
The somatic cell count (SCC) of milk is one of the main indicators of the udder health status of lactating mammals and is a hygiene criterion of raw milk used to manufacture dairy products. An increase in SCC is regarded as one of the primary indicators of inflammation of the mammary gland. SCC is therefore relevant in food legislation as well as in the payment of ex-farm raw milk, and it has a major impact on farm management and breeding programs. Its determination is one of the most frequently performed analytical tests worldwide. Routine measurements of SCC are almost exclusively done using automated fluoro-opto-electronic counting. However, certified reference materials for SCC are lacking, and the microscopic reference method is not reliable because of serious inherent weaknesses. A reference system approach may help to largely overcome these deficiencies and to assure equivalence in SCC worldwide. The approach is characterised as a positioning system fed by different types of information from various sources. A statistical approach is proposed for comparing proficiency tests (PTs) by assessing them using a quality index P_Q and assessing participating laboratories using a quality index P_L, both derived from probabilities. The basic assumption is that PT schemes are conducted according to recognised guidelines in order to compute performance characteristics such as z-scores and repeatability and reproducibility standard deviations. These standard deviations are compared with the method validation data from the ISO method. Input quantities close to or smaller than the reference data of the method validation or the assigned value of the PT result in values of P_Q and P_L close to the maximum value. Evaluation examples of well-known PTs show the practicability of the proposed approach.
Keywords
Reference system · Somatic cell count · Proficiency testing · Statistical approach · Quality index
Introduction
The somatic cell count (SCC) of milk is one of the main indicators of the udder health status of lactating mammals and one of the hygiene criteria of raw milk used to manufacture dairy products. Somatic cells excreted through milk include various types of white blood cells and some epithelial cells. Their composition and concentration change dramatically during periods of inflammation. An increase in SCC is therefore regarded as one of the primary indicators of inflammation of the mammary gland [1]. SCC is thus relevant in food legislation [2, 3, 4] and in the payment of ex-farm raw milk, where it serves as a price-setting quality parameter; when measured in individual animals, it also has a major impact on farm management and breeding programs. Consequently, somatic cell count determination is one of the most frequently performed analytical tests in dairy laboratories worldwide, with an estimated more than 500 000 000 tests per year [5].
Routine SCC data are nowadays almost exclusively obtained through automated fluoro-opto-electronic counting. Guidance on this application is available through ISO 13366-2 | IDF 148-2 [6]. Part of the guidelines focuses on calibration and calibration control; however, certified reference materials (CRMs) for SCC are lacking. Laboratories therefore calibrate with ‘secondary’ reference materials (types of milk, more or less well defined in their properties) using assigned ‘reference values’ for counting. These reference values may derive from the application of the reference method, direct microscopic somatic cell counting according to ISO 13366-1 | IDF 148-1 [7], often in combination with the results of automated counting. Routine testing laboratories usually rely on these secondary reference materials and their assigned values. Others base their calibration on their performance in proficiency tests (PTs), and some rely on the standard settings of the instrument manufacturer. The reasons for the lack of full reliance on the microscopic reference method are an insufficient definition of the measurand and poor precision [5]. To overcome the large uncertainty of the microscopic reference method, reference material providers can additionally rely on a set of routine measurement data, often coming from a selected group of laboratories. However, such reliance bears the risk of circular calibration [8, 9]. If at least some of the participating laboratories do not also rely on other PTs, they may start correcting their instruments towards the assigned value, and an undefined drift within the large uncertainty of the reference method begins. The existing PTs therefore need to be interlinked based on a quantitative scale. At this juncture, there is no ‘true’ value with which to assess the competence of a laboratory.
A reference system approach may help to largely overcome these deficiencies and help to assure equivalence in somatic cell counting worldwide. A reference system is characterised as a positioning system fed by different types of information from various sources—that is, from reference materials, reference method analysis, routine method results and PT results of laboratories operating in a laboratory network structure [10].
The purpose of this work is to propose a statistical approach for comparing PTs by assessing them using a quality index P_Q and assessing participating laboratories using a quality index P_L, both derived from probabilities. The approach was developed by the participating organisations in the framework of the SCC Reference System Working Group (International Dairy Federation [IDF] and the International Committee on Animal Recording [ICAR] [5, 10]). The basic assumption is that the PT schemes are conducted according to recognised guidelines such as the Harmonized Protocol [11] and ISO 13528 [12] or ISO 5725 [13] in order to compute performance characteristics such as z-scores and repeatability and reproducibility standard deviations. The existence of a CRM (as an estimate of a ‘true’ value) is not required in the following considerations. The situation is comparable to the summarising assessment of medical and similar studies, where meta-analysis is a well-proven tool that uses variances and frequencies for weighting and as objective criteria. However, given that reliable estimates of the population variances are available (see below), we preferred to develop a probabilistic approach.
Method
Assessing PTs by a quality index P_Q derived from probabilities
This approach makes use of the precision parameters repeatability standard deviation σ_r and reproducibility standard deviation σ_R of automated fluoro-opto-electronic SCC measurement, as reported in the international standard ISO 13366-2 | IDF 148-2 [6].
Assume that in a given PT the estimates s_r and s_R (or the between-laboratory standard deviation, s_L) of the repeatability and reproducibility standard deviations σ_r and σ_R, respectively, are computed (for one level) using the results from p laboratories, each of which measures the test material n times. Then a quality index P_Q based on probabilities derived from chi-square distributions can be constructed.
The known variances \(\sigma_{\text{r}}^{2}\) and \(\sigma_{\text{L}}^{2}\) are derived from the values of σ_r and σ_R published in ISO 13366-2 | IDF 148-2 [6].
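The probability attached to the repeatability can be sketched numerically. The paper's defining equations are not reproduced in this excerpt, so the following Python sketch rests on a stated assumption: the scaled ratio ν·s_r²/σ_r² with ν = p(n − 1) degrees of freedom is treated as chi-square distributed, and P_(r) is taken as its upper-tail probability, so that s_r close to or below σ_r yields values near 1. All function names are ours.

```python
import math

def _reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), via the classic
    series (x < a + 1) / continued-fraction (x >= a + 1) split."""
    if x < 0.0 or a <= 0.0:
        raise ValueError("require x >= 0 and a > 0")
    if x == 0.0:
        return 0.0
    log_pre = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        term = 1.0 / a
        total = term
        k = a
        while abs(term) > abs(total) * 1e-15:
            k += 1.0
            term *= x / k
            total += term
        return total * math.exp(log_pre)
    # modified Lentz continued fraction for the upper tail Q(a, x)
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_pre) * h

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    return 1.0 - _reg_lower_gamma(df / 2.0, x / 2.0)

def p_repeatability(s_r: float, sigma_r: float, p_labs: int, n: int) -> float:
    """P_(r)-style index: near 1 when the PT's pooled s_r is close to
    or below the sigma_r of the ISO method validation data."""
    nu = p_labs * (n - 1)  # degrees of freedom
    return chi2_sf(nu * s_r ** 2 / sigma_r ** 2, nu)
```

With s_r equal to σ_r the index sits a little below 0.5, and it approaches 1 as s_r shrinks relative to σ_r, matching the qualitative behaviour described in the Discussion.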
An alternative combination of z-scores is possible because the sum S_p of the squared z-scores is chi-square distributed with p degrees of freedom [11]: \(S_{p} = \sum\nolimits_{i = 1}^{p} z_{i}^{2} \sim \chi_{p}^{2}.\)
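This combination can be sketched directly. As a simplification, the Wilson–Hilferty cube-root normal approximation to the chi-square tail is used here instead of the exact distribution; function names are ours.

```python
import math

def chi2_sf_approx(x: float, df: int) -> float:
    """Upper-tail chi-square probability via the Wilson-Hilferty
    cube-root normal approximation (adequate for moderate df)."""
    h = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def p_sum_of_squared_z(z_scores) -> float:
    """Probability of exceeding S_p = sum(z_i^2) under a chi-square
    law with p degrees of freedom; large when the p z-scores are
    collectively small."""
    s_p = sum(z * z for z in z_scores)
    return chi2_sf_approx(s_p, len(z_scores))
```

For ten z-scores all equal to 1, S_p = 10 with 10 degrees of freedom and the probability is about 0.44; large |z| values drive it towards 0.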
The m quality indices q_{i1}, q_{i2}, q_{i3}, …, q_{im} may be used to model m criteria characterising PT_i. The components of q_i = f(q_{i1}, q_{i2}, q_{i3}, …, q_{im}) could be defined in such a way that higher values of the resulting q_i indicate higher quality.
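The combination rule f itself is not reproduced in this excerpt. One monotone choice that satisfies the stated requirement (higher component values give a higher q_i) is the geometric mean, sketched here purely as an illustration:

```python
import math

def combined_quality_index(components) -> float:
    """One possible monotone combination of m quality indices in (0, 1]:
    the geometric mean, so that higher component values always yield a
    higher combined index q_i. Illustrative only; not the paper's f."""
    if not components or any(c <= 0 for c in components):
        raise ValueError("components must be positive")
    return math.prod(components) ** (1.0 / len(components))
```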
Comparing PT schemes over time based on the quality index P_Q or its elements
There are various possibilities for constructing quality control charts for a given PT scheme, for example based on:

s_r or \(s_{\text{r}}^{2}\) or \(\hat{\chi }_{{({\text{r}})}}^{2}\) or P_(r)

s_L or \(s_{\text{L}}^{2}\) (or s_R or \(s_{\text{R}}^{2}\)) or \(\hat{\chi }_{{({\text{L,r}})}}^{2}\) or P_(L,r)

Z_p or P(Z_p)

P_Q

the fraction of ‘satisfactory’ z-scores, i.e. |z| ≤ 2, as proposed by Gaunt and Whetton [16].
The sums or cumulative averages of these characteristics over t rounds may be used as numerical indices to compare PT schemes quantitatively over time.
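The round-by-round bookkeeping for these last two characteristics is straightforward; a minimal sketch (helper names are ours):

```python
def satisfactory_fraction(z_scores) -> float:
    """Fraction of z-scores with |z| <= 2, the conventional
    'satisfactory' criterion for proficiency testing."""
    return sum(1 for z in z_scores if abs(z) <= 2.0) / len(z_scores)

def cumulative_averages(round_values):
    """Running mean of a per-round characteristic over t rounds,
    usable as a numerical index for comparing PT schemes over time."""
    out, total = [], 0.0
    for t, v in enumerate(round_values, start=1):
        total += v
        out.append(total / t)
    return out
```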
Assessing laboratories by a quality index P_L derived from probabilities
Again, this approach makes use of the precision parameters repeatability standard deviation σ_r and reproducibility standard deviation σ_R of automated SCC measurements, as reported in the international standard ISO 13366-2 | IDF 148-2 [6].
Assume that the values of σ_r and σ_R published in ISO 13366-2 | IDF 148-2 [6] are known and that an accepted reference value θ has been established.
A single laboratory within a PT can be rated similarly to the rating shown above if it provides a repeatability standard deviation s_r and a mean value \(\bar{y}\) of n replicates at a given level (s_r and \(\bar{y}\) being estimates of σ_r and θ, respectively).
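A sketch of such a rating follows. The paper's equation for \(\tilde{z}_{n}\) is not reproduced in this excerpt, so the denominator sqrt(σ_L² + σ_r²/n) is used here as a plausible standard error for a mean of n replicates; treat it, and the function names, as assumptions.

```python
import math

def lab_z_score(y_bar: float, theta: float, sigma_L: float,
                sigma_r: float, n: int) -> float:
    """z-score of a laboratory mean of n replicates against the accepted
    reference value theta. The denominator sqrt(sigma_L^2 + sigma_r^2/n)
    is an assumption, not the paper's elided Eq. for z~_n."""
    return (y_bar - theta) / math.sqrt(sigma_L ** 2 + sigma_r ** 2 / n)

def two_sided_probability(z: float) -> float:
    """2 * (1 - Phi(|z|)): equals 1 at z = 0 and shrinks as |z| grows,
    so laboratory means close to theta score near the maximum."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```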
The components q _{ i1}, q _{ i2}, q _{ i3}, …, q _{ im } of q _{ i } should be defined in such a way that higher values in the resulting q _{ i } indicate higher quality.
Comparing laboratories over time based on the quality index P_L or its elements

s_r or \(s_{\text{r}}^{2}\) or \(\hat{\chi }_{{({\text{r}})}}^{2}\) or P_(r)

\(\tilde{z}_{n}\) or \(P(\tilde{z}_{n} )\) (or zscores as reported by the PT provider)

P_L

the fraction of ‘satisfactory’ z-scores, i.e. |z| ≤ 2, as proposed by Gaunt and Whetton [16].
The sums or cumulative averages of these characteristics over t rounds may be used as numerical indices to compare laboratories quantitatively.
Data
PTs used for the calculation of the quality indices P_Q and P_L
Name | Organiser | Date | No. of levels | No. of participating laboratories
AIA Isl | Associazione italiana allevatori (AIA), Laboratorio Standard Latte (http://www.aia.it/lsl) | March 2011 | 6 | 27
Characterisation of Agroscope SCC Standard | Agroscope, Institute for Food Sciences (http://www.agroscope.ch) | September 2010 | 2 | 21
Characterisation of Agroscope SCC Standard | Agroscope, Institute for Food Sciences (http://www.agroscope.ch) | March 2011 | 2 | 21
Cornell | Cornell University, Department of Food Science (http://foodscience.cals.cornell.edu/extensior/dairymilkproducts) | October 2011 | 8 | 8
ICAR | Actalia-Cecalait (http://www.cecalait.fr; http://www.icar.org/pages/Sub_Committees/sc_milk_laboratories.htm) | September 2011 | 10 | 15
Each level of a PT was handled as an individual comparison. PTs and laboratories were anonymised, and, where known, the multiple participations of a certain laboratory were each handled as an individual participant.
An Excel^{®} spreadsheet was used for the evaluation. First, the data of the different PTs and levels were arranged according to the necessary information: laboratory labels/codes (and the instrument type, if known), the number of replicates n, the mean values \(\bar{y}\) reported by the laboratories, the repeatability and reproducibility standard deviations s_r and s_R of the laboratories, the reference values (consensus or ‘true’ values) θ and the s_r of the PT or PT level. Additionally, the robust sum of the z-scores was calculated according to Eq. (7).
Discussion
P_Q and P_L are influenced by their input variables. The three input variables and performance characteristics (z-score, repeatability standard deviation and reproducibility standard deviation) are calculated according to recognised standards and are compared with the specific method validation data from the ISO standard. It follows that input quantities close to or smaller than the reference data of the method validation or the assigned value of the PT result in values of P_Q and P_L close to the maximum value of 1.
The outcome of a PT is influenced by the competence of the participating laboratories. If the laboratories perform well and the overall repeatability s_r of the p laboratories is close to or even smaller than the σ_r of the standard, then the probability P_(r) and the quality index P_Q of the PT or PT level concerned become large or close to the maximum value of 1 (solid circle in Fig. 1, PT no. 6). Conversely, if a large part or most of the laboratories show poor performance and s_r is therefore larger than σ_r, the probability P_(r) and the index P_Q become smaller (dashed circle, PT no. 16). The same holds for P_Q and the probability related to the interlaboratory standard deviation, P_(L,r), calculated from the PT's reproducibility s_R (solid and dashed circles in Fig. 1, PTs nos. 4 and 28). If the mean values of the laboratories in the PT are close to the assigned value, then the robust absolute sum of the p z-scores, Z_p, according to Eq. (7) becomes small, and the related probability P(Z_p) and the index P_Q become large or close to the maximum value of 1 (solid circle, PT no. 26). For large values of Z_p, the probability P(Z_p) and the index P_Q become small (dashed circle, PT no. 1). The summarising quality index P_Q is almost equally influenced by the probabilities P_(r), P_(L,r) and P(Z_p) and therefore allows no conclusion on the PT's performance concerning the repeatability, the interlaboratory standard deviation or the z-scores achieved by the participating laboratories.
Regarding the assessment of a laboratory, the influence of its repeatability s_r and its mean value \(\bar{y}\) is shown in Figs. 2 and 3. If s_r is larger than σ_r, the probability related to the repeatability standard deviation, P_(r), becomes small, as does the corresponding quality index P_L. Where s_r is close to or smaller than σ_r, the opposite is true, and the probability P_(r) as well as the quality index P_L become large or close to the maximum value of 1. If the mean value \(\bar{y}\) is larger or smaller than the reference value (consensus value, ‘true’ value) θ, then the absolute z-score \(\left| {\tilde{z}_{n} } \right|\) becomes larger, and the related probability \(P\left( {\tilde{z}_{n} } \right)\) as well as the corresponding quality index P_L become small. Where the mean value \(\bar{y}\) is close or equal to the reference value θ, the absolute z-score \(\left| {\tilde{z}_{n} } \right|\) becomes small, and the related probability \(P\left( {\tilde{z}_{n} } \right)\) as well as the corresponding quality index P_L become large or close to the maximum value of 1. The summarising quality index P_L is almost equally influenced by the probabilities P_(r) and \(P\left( {\tilde{z}_{n} } \right)\) and therefore allows no conclusions on the laboratory's performance concerning repeatability and comparability to the assigned value (this differentiation is provided by the PT results reported to the participants).
In Eqs. (10) and (17), the possibility of modifying the quality measure by multiplication with further expressions is mentioned. Such expressions (factors) q = f(q_1, q_2, q_3, …, q_m), made up of m PT- and laboratory-specific quality indices q_{i1}, q_{i2}, q_{i3}, …, q_{im}, may be used to model m criteria characterising PT_i (e.g. the frequency of the PT, the number of participants, the number of test levels, interlinkage with other PTs, a [summarised] competence index of the participating laboratories and of the PT provider, the frequency of laboratories' PT participation, the competence of the laboratory and the laboratory bias [by considering the z-score, e.g. q_i(z_i) = 2(1 − Φ(|z_i|))]). Further criteria are mentioned by Golze [18]. The components of q_i need to be defined in such a way that higher values of the resulting q_i indicate higher quality. Such indices have not yet been established by experts in the field of automated somatic cell counting, and experience in this regard is lacking. The need for such indices might arise as soon as a system like the one described in this paper is set up and more data than are presented here are integrated. The brackets in the graphical evaluation of the median quality indices in Fig. 5 mark groups of laboratories and instruments and their numbers of participations. The median quality indices show a tendency to decline with higher numbers of participations. If such a tendency were to become obvious with more data sets, the use of specific quality indices might be necessary.
A model such as the one described here can be used for all types of PTs in which measurands are quantified. To set up such a system, a neutral and trustworthy body is needed to collect the sensitive data from the PT organisers. Participating laboratories need to authorise the evaluation of their data. Results must be anonymised, and it would be the responsibility of PT providers and laboratories to communicate their codes to their customers in order to demonstrate their competence.
Supplementary material
References
 1. Pyörälä S (2003) Indicators of inflammation in the diagnosis of mastitis. Vet Res 34:565–578
 2. Regulation (EC) No 853/2004 of the European Parliament and of the Council of 29 April 2004 laying down specific hygiene rules for food of animal origin. Off J Eur Union L 139/55, Annex II, Section IX, Brussels
 3. US Department of Health and Human Services, Public Health Service, Food and Drug Administration (2009) Grade “A” Pasteurized Milk Ordinance, 2009 Revision. Silver Spring
 4. Beal R, Eden M, Gunn I, Hook I, Lacy-Hulbert J, Morris G, Mylrea G, Woolford M (2001) Managing mastitis: a practical guide for New Zealand dairy farmers. Livestock Improvement, Hamilton
 5. Baumgartner C (2008) Architecture of reference systems, status quo of somatic cell counting and concept for the implementation of a reference system for somatic cell counting. Bull IDF 427
 6. ISO 13366-2 | IDF 148-2:2006. Milk – Enumeration of somatic cells – Part 2: Guidance on the operation of fluoro-opto-electronic counters. International Organization for Standardization, Geneva, and International Dairy Federation, Brussels
 7. ISO 13366-1 | IDF 148-1:2008. Milk – Enumeration of somatic cells – Part 1: Microscopic method (reference method). International Organization for Standardization, Geneva, and International Dairy Federation, Brussels
 8. Petley BW (1985) Fundamental physical constants and the frontier of measurement. Adam Hilger, Bristol
 9. Pendrill LR (2005) Meeting future needs for metrological traceability – a physicist's view. Accred Qual Assur 10:133–139
 10. Orlandini S, van den Bijgaart H (2011) Reference system for somatic cell counting in milk. Accred Qual Assur 16:415–420
 11. Thompson M, Ellison SLR, Wood R (2006) The international harmonized protocol for the proficiency testing of analytical chemistry laboratories. Pure Appl Chem 78:145–196
 12. ISO 13528:2005. Statistical methods for use in proficiency testing by interlaboratory comparisons. International Organization for Standardization, Geneva
 13. ISO 5725:1994. Accuracy (trueness and precision) of measurement methods and results. Parts 2, 4 and 6. International Organization for Standardization, Geneva
 14. Analytical Methods Committee of the Royal Society of Chemistry (1989) Robust statistics: how not to reject outliers. Part 1. Basic concepts. Analyst 114:1693–1697
 15. Analytical Methods Committee of the Royal Society of Chemistry (2001) MS Excel add-in for robust statistics. http://www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/Software/RobustStatistics.asp
 16. Gaunt W, Whetton M (2009) Regular participation in proficiency testing provides long term improvements in laboratory performance: an assessment of data over time. Accred Qual Assur 14:449–454
 17. Thompson M, Lowthian PJ (1998) The frequency of rounds in a proficiency test: does it affect the performance of participants? Analyst 123:2809–2812
 18. Golze M (2001) Information system and qualifying criteria for proficiency testing schemes. Accred Qual Assur 6:199–202
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.