1 Introduction

An appropriate approach for the validation of qualitative methods will often differ considerably from that of quantitative methods. Nevertheless, core concepts from the validation of quantitative methods can be successfully carried over to qualitative methods. This paper shows how the reproducibility of a method—a performance characteristic usually associated with quantitative methods—can be determined in collaborative studies for qualitative methods in microbiology.

In analytical chemistry, one of the fundamental indicators of the performance of a quantitative method is the reproducibility of test results, as described in ISO 5725 (ISO 1994). While the concept of reproducibility is easily interpreted for qualitative methods in terms of consistent test results across laboratories for samples with the same level of contamination, it is not clear at all how to describe or characterize a qualitative method’s reproducibility in such a way as to make possible a comparison to criteria or other methods. In the last few years, however, novel validation approaches have been proposed for the characterization of the reproducibility of a qualitative method (Uhlig et al. 2011, 2013, 2015; Grohmann et al. 2015).

Why is it important to determine a method’s reproducibility? In order to answer this question, consider the case that a level of detection (LOD) of 3 colony forming units (CFU) per mL is determined in the validation study of a qualitative microbiological method, but that the LOD is sometimes much higher depending on the laboratory or measurement conditions. In such a case, failing to detect the occasional unreliability of the method could lead to mistakes in routine laboratory determinations. On the other hand, if a LOD of 300 CFU/mL is obtained in the validation study, the method will not be accepted even if this excessive LOD is not representative of its average performance. Accordingly, both the average LOD value and the reproducibility parameter—describing the variability of the LOD across laboratories or measurement conditions—capture important information about the performance of the method and should be determined in the course of the validation process.

In the case of microbiological methods, an exact determination of absolute contamination levels is often not possible. For this reason, ISO 16140-2 (ISO 2016) proposes an approach based on the ratio of the LOD values of the alternative method and a reference method. Just as in the case of the LOD, both an average value and a reproducibility parameter can be calculated for this relative LOD (RLOD).

In order to determine the reproducibility of a qualitative method, a suitable approach must be identified for the conversion of the qualitative results into quantitative ones. In this paper, the case will be considered where the CFU contamination levels follow a Poisson distribution. The reliability and robustness of the validation can be enhanced by means of a systematic study of the effect of influence factors. Such an approach also reduces the workload, since reliable validation parameters can be obtained with as few as 5 participating laboratories.

2 Materials and methods

The approach presented here is based on the computation of a power curve, which plots the probability of detection POD (probability that the target microorganism is detected) as a function of the contamination level x (in CFU/mL). The level of detection \( LOD_{95\%} \) or \( LOD_{50\%} \) is then defined as the contamination level corresponding to \( POD(LOD_{95\%}) = 0.95 \) or \( POD(LOD_{50\%}) = 0.5 \).

In the case of the detection of target microorganisms, it cannot be assumed that, for a particular dilution level, the CFU contamination level is the same from one test sample to the next. In the context of a collaborative method validation, it is thus necessary to distinguish between the theoretical or nominal CFU contamination level and the unknown actual CFU contamination level in a given test sample. The fundamental assumption is that, for a given nominal CFU contamination level, the actual contamination level in a particular test sample is subject to random variation and follows a Poisson distribution. More specifically, with x denoting the nominal CFU contamination level in CFU/mL, the probability that a test sample has a contamination level of k CFU/mL is

$$ p_{k} = \frac{{x^{k} }}{k!}\exp ( - x),\quad {\text{for}}\;{\text{all}}\;k = 0,{\mkern 1mu} 1,{\mkern 1mu} 2,{\mkern 1mu} 3,{\mkern 1mu} 4, \ldots . $$
(1)

On the assumption that every colony is detected, the probability of detection is thus

$$ POD = 1 - \exp ( - x). $$
(2)

This model is refined by introducing an extra parameter \( 0 \le a \le 1 \) (referred to as the sensitivity parameter) to account for unsuccessful detection:

$$ POD = 1 - \exp \left( { - a \cdot x} \right). $$
(3)

As can be seen, the POD increases with a. The value \( a = 0 \) corresponds to \( POD = 0 \) whatever the nominal contamination level (i.e. the method is useless), while, at the other extreme, the value \( a = 1 \) corresponds to \( POD = 1 - \exp ( - x) \) (i.e. every colony is detected and the method is perfect).
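
As a minimal numerical sketch of Eqs. (1) to (3), the following Python snippet evaluates the Poisson probabilities and the POD for a given nominal contamination level and sensitivity parameter. The function names `pod` and `poisson_pmf` are illustrative and are not taken from any particular software package.

```python
import math

def pod(x, a=1.0):
    """Probability of detection, Eq. (3): POD = 1 - exp(-a * x).

    x: nominal contamination level in CFU/mL (x >= 0)
    a: sensitivity parameter (a = 1 reproduces Eq. (2))
    """
    # -expm1(-t) equals 1 - exp(-t) but is more accurate for small t
    return -math.expm1(-a * x)

def poisson_pmf(k, x):
    """Eq. (1): probability that a test sample contains k CFU/mL."""
    return x ** k / math.factorial(k) * math.exp(-x)

# With a = 1, the POD is the probability of at least one CFU: 1 - p_0
x = 0.8
print(pod(x), 1.0 - poisson_pmf(0, x))
# Halving the sensitivity lowers the POD at the same nominal level
print(pod(x, a=0.5))
```

The two values printed on the first line agree, which illustrates that Eq. (2) is simply the complement of the probability \( p_{0} \) of an uncontaminated test sample.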

Taking consecutive logarithms and rearranging, one obtains

$$ \ln ( - \ln (1 - {\text{POD(}}x ))) = \ln a + \ln x. $$
(4)
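
The linearization in Eq. (4) can be checked numerically. The sketch below defines the complementary log-log transform and its inverse and compares both sides of Eq. (4). The values a = 0.61 and x = 0.8 are chosen for illustration only, and the function names are hypothetical.

```python
import math

def cloglog(p):
    """Complementary log-log transform: ln(-ln(1 - p)), for 0 < p < 1."""
    return math.log(-math.log1p(-p))

def inv_cloglog(eta):
    """Inverse transform: p = 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))

a, x = 0.61, 0.8                      # illustrative values
p = 1.0 - math.exp(-a * x)            # POD from Eq. (3)
lhs = cloglog(p)                      # left-hand side of Eq. (4)
rhs = math.log(a) + math.log(x)       # right-hand side of Eq. (4)
print(lhs, rhs)
```

On this transformed scale the sensitivity enters additively as \( \ln a \), which is what allows laboratory and run effects to be added to the model as further additive terms.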

This equation will now be expanded in two directions. First, in the framework of a collaborative validation study, it will be assumed that different laboratories have different sensitivities \( a_{i} \) and that \( \ln a_{i} \) follows a normal distribution with

$$ \ln a_{\text{i}} \sim {\text{N(}}\mu ,\sigma_{lab}^{2} ). $$
(5)

The parameter μ represents the average (log) sensitivity parameter across laboratories and the variance \( \upsigma_{\text{lab}}^{2} \) characterizes the variability of (log) sensitivity across laboratories. Accordingly, the model can now be written:

$$ \ln ( - \ln (1 - {\text{POD}}_{i} (x))) = \ln a_{i} + \ln x, $$
(6)

where the subscript i represents the laboratory. This POD model is known as a GLMM (generalized linear mixed model) with “complementary log–log” link function and is similar to the one described in Uhlig et al. (2015). For further information on generalized linear mixed models, the reader is referred to Nelder and McCullagh (1983), McCulloch and Searle (2001) and Jiang (2007).

In practice, it may occur that \( a_{i} \) values greater than 1 are observed (or, equivalently, \( \ln a_{i} > 0 \)). Theoretically, such an occurrence is not compatible with the Poisson distribution assumption, since, for a given nominal concentration x, the corresponding POD would be greater than \( 1 - \exp \left( { - x} \right) \). Accordingly, it may seem desirable to constrain the sensitivity parameter estimates to values \( a_{i} \le 1 \). However, \( a_{i} > 1 \) can be interpreted as an indication that the average target microorganism concentration is greater than the nominal concentration or that the number of false positives is too large. In the framework of a validation study, this constitutes useful information and, for this reason, it was decided not to build in an extra constraint (note that \( a_{i} > 0 \) is ensured by applying the exponential function to the \( \ln a_{i} \) estimate).

The second model expansion consists in the implementation of a factorial experimental design. In this approach, different influence factors are identified as probable sources of variability, e.g. different operators or reagent batches. These factors are then systematically varied in the design. Typically, each factor is varied across 2 levels, e.g. 2 operators or 2 different reagent batches. If five factors are included in the design, each with two levels, there are thus \( 2^{5} = 32 \) different combinations or settings. Particularly efficient designs called orthogonal designs make it possible to reduce the number of settings, e.g. from 32 to 8. An example of an orthogonal design with 8 settings is provided in Table 1. For further information on orthogonal designs, the reader is referred to Tamhane (2009).
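
To illustrate how 32 settings can be reduced to 8, the following sketch constructs one possible orthogonal design for five two-level factors. The generators used here (factor 4 derived from factors 1 and 2, factor 5 from factors 1 and 3) are an assumption made for this example, so the resulting settings need not coincide with those of Table 1.

```python
from itertools import product

def fractional_design():
    """Return 8 settings for 5 two-level factors (levels coded 1 and 2).

    Factors A, B, C form a full 2^3 design; D and E are derived from the
    (hypothetical) generators D = A*B and E = A*C in -1/+1 coding.
    """
    settings = []
    for a, b, c in product((-1, 1), repeat=3):
        row = (a, b, c, a * b, a * c)
        settings.append(tuple(1 if v < 0 else 2 for v in row))
    return settings

def is_orthogonal(settings):
    """Check that every pair of factors shows each level combination equally often."""
    n_factors = len(settings[0])
    for i in range(n_factors):
        for j in range(i + 1, n_factors):
            counts = {}
            for s in settings:
                counts[(s[i], s[j])] = counts.get((s[i], s[j]), 0) + 1
            if len(counts) != 4 or len(set(counts.values())) != 1:
                return False
    return True

design = fractional_design()
for number, setting in enumerate(design, start=1):
    print(number, setting)
print("orthogonal:", is_orthogonal(design))
```

In such a design, every pair of factors shows each of the four level combinations in exactly 2 of the 8 settings, so the main effect of each factor can be estimated without being confounded with the main effect of another factor.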

Table 1 Study design in the case of five factors for each participating laboratory

Typically, for each setting, replicate measurements are carried out at 3 different contamination levels. The nominal contamination levels can be selected for instance as \( L_{0} = {\text{Blank}} \), \( L_{1} = 0.8 \;{\text{CFU}}/{\text{mL}} \) and \( L_{2} = 10\; {\text{CFU}}/{\text{mL}} \). The term run will be used to refer to the performance of all the measurements at the 3 contamination levels for one particular setting. If n replicate measurements are performed at each contamination level, there are thus 3n test results per run. On the basis of the replicates, ROD (rate of detection) values are calculated. For instance, if k of the n results are positive (i.e. the organism was detected), then the ROD value is k/n. The results can be entered in a table such as Table 2, where each empty cell corresponds to one ROD value.
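
The calculation of ROD values for one run can be sketched as follows. The test results below are hypothetical and serve only to show the bookkeeping. The numbers of replicates per level (1, 4 and 1) are likewise chosen for illustration.

```python
def rod(results):
    """Rate of detection: share of positive results among n replicates.

    results: iterable of booleans (True = target organism detected).
    """
    results = list(results)
    if not results:
        raise ValueError("at least one test result is required")
    return sum(results) / len(results)

# One hypothetical run: 1 result for the blank, 4 replicates at L1, 1 at L2
run = {
    "L0 (blank)": [False],
    "L1 (0.8 CFU/mL)": [True, False, True, True],
    "L2 (10 CFU/mL)": [True],
}
for level, results in run.items():
    print(level, rod(results))
```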

Table 2 Summary of ROD values for each participating laboratory

Alternatively, an overview of the data for one method across laboratories can conveniently be displayed in a table such as Table 3.

Table 3 Positive results for one method

Taking into account the different runs, the model described by Eq. (6) is now expanded as follows:

$$ \ln ( - \ln (1 - {\text{POD}}_{ij} (x))) = \ln a_{i} + \ln x + \eta_{ij} , $$
(7)

where the subscript j represents the run, and the laboratory-specific run effect \( \eta_{ij} \) consists of a sum of factor effects \( \eta_{ij} = \gamma_{i11} \cdot z_{j11} + \gamma_{i12} \cdot z_{j12} + \cdots + \gamma_{iq1} \cdot z_{jq1} + \gamma_{iq2} \cdot z_{jq2} \), where \( \gamma_{ikl} \) is the effect of factor k (k = 1, …, q) in laboratory i for factor level l and \( z_{jkl} \) is the design matrix element (0 or 1) for run j, factor k and factor level l (it is assumed that every factor has two levels).

Note that in a validation study, the design matrix elements are constants, i.e. they are not subject to random variation. They are systematically selected in order to reflect the spectrum of measurement conditions in the laboratory. However, in routine measurements no such deliberate control is exercised over measurement conditions, and the \( z_{jkl} \) values can be seen as independent realizations of a random variable with zero mean and unit variance.

The within-laboratory effects \( \gamma_{ikl} \) are modelled as independent normal random effects with \( \gamma_{ikl} \sim {\text{N}}(0,\sigma_{k}^{2} ) \).

On the basis of the model described in Eq. (7), the variance components \( \sigma_{lab}^{2} \) and \( \sigma_{k}^{2} \) (\( k = 1, \ldots ,q \)) can be estimated in standard software such as R. Once they have been calculated, the total variance is obtained as

$$ \sigma_{total}^{2} = {\text{Var(}}\ln a_{i} + \ln x + \eta_{ij} )= \sigma_{lab}^{2} + \sigma_{1}^{2} + \cdots + \sigma_{q}^{2} . $$
(8)

The \( \sigma_{total}^{2} \) parameter thus characterizes the reproducibility of the method.

As far as the interpretation of the sensitivity parameter a [see Eq. (3)] is concerned, note that by definition of LOD95% (and using \( \ln\; 0.05 \cong - 3 \)), it follows that

$$ LOD_{95\% } = - \frac{\ln 0.05}{a} \cong \frac{3}{a}. $$
(9)

This establishes a direct relationship between the average sensitivity a [calculated as \( e^{\mu} \), see Eq. (5)] and \( LOD_{95\%} \). Thus, in the ideal case (a = 1), we obtain \( LOD_{95\% } \cong 3 \). On the other hand, if the sensitivity parameter a drops to 1/2, \( LOD_{95\%} \) increases to \( \cong 6 \).

By the same token, one obtains

$$ LOD_{50\% } = - \frac{\ln 0.5}{a} \cong \frac{0.69}{a}. $$
(10)
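
Eqs. (9) and (10) are two instances of the same inversion of Eq. (3). A minimal sketch follows, with the hypothetical function name `lod`, which uses the exact logarithms rather than the rounded constants 3 and 0.69.

```python
import math

def lod(a, p=0.95):
    """Contamination level x at which POD(x) = p under Eq. (3).

    Solving p = 1 - exp(-a * x) for x gives x = -ln(1 - p) / a.
    """
    if a <= 0:
        raise ValueError("sensitivity parameter a must be positive")
    return -math.log1p(-p) / a

print(lod(1.0, 0.95))   # exact value for a = 1, close to the rounded value 3
print(lod(0.5, 0.95))   # halving a doubles the LOD
print(lod(1.0, 0.50))   # equals ln 2
```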

As far as the interpretation of the reproducibility parameter \( \sigma_{total}^{2} \) is concerned, it is noted that Eq. (9) implies that \( \ln LOD_{95\% } = \ln 3 - \ln a. \) For the upper and lower limits of the \( 95 \, \% \) prediction interval of the LOD estimate, it follows that

$$ \ln LOD_{95\% ,upper} = \ln 3 - (\ln a - 1.96 \cdot \sigma_{total} ) $$

and

$$ \ln LOD_{95\% ,lower} = \ln 3 - (\ln a + 1.96 \cdot \sigma_{total} ). $$

Accordingly, one obtains

$$ \ln \frac{{LOD_{95\% ,upper} }}{{LOD_{95\% ,lower} }} = 3.92 \cdot \sigma_{total} . $$
(11)

One obtains the same result with \( LOD_{50\%} \) instead of \( LOD_{95\%} \).

Thus, the (log) reproducibility variability of the \( LOD_{95\%} \) (or \( LOD_{50\%} \)), defined as the logarithmic ratio between the upper and lower limits of the 95 % prediction interval, is proportional to \( \sigma_{total} \).
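
The following sketch combines Eq. (8) with the prediction limits derived above and verifies Eq. (11) numerically. The variance components are hypothetical and are not taken from the study data. The function names are likewise illustrative.

```python
import math

def sigma_total(sigma_lab, factor_sigmas):
    """Eq. (8): combine laboratory and factor standard deviations."""
    variance = sigma_lab ** 2 + sum(s ** 2 for s in factor_sigmas)
    return math.sqrt(variance)

def lod_prediction_limits(a, sigma, p=0.95, z=1.96):
    """Lower and upper 95 % prediction limits for the LOD at POD = p."""
    centre = -math.log1p(-p) / a          # LOD for the average sensitivity
    spread = math.exp(z * sigma)          # multiplicative half-width
    return centre / spread, centre * spread

# Hypothetical variance components (not taken from the paper's tables)
s_tot = sigma_total(0.5, [0.3, 0.2, 0.1, 0.1, 0.1])
lower, upper = lod_prediction_limits(a=0.61, sigma=s_tot, p=0.5)
print(s_tot, lower, upper)
print(math.log(upper / lower), 3.92 * s_tot)   # both sides of Eq. (11)
```

Because the limits are symmetric on the logarithmic scale, the ratio between the upper and lower limit depends on \( \sigma_{total} \) only, and not on the average sensitivity or on the chosen POD level.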

Simulation studies were conducted in order to assess the reliability of the \( \sigma_{total} \) estimate. With 5 participating laboratories, a relative standard error of less than 30 % was observed for the \( \sigma_{total} \) estimate. It can thus be concluded that reliable reproducibility estimates are achieved with as few as 5 laboratories.

Finally, it is important to take into account the fact that, in the case of microbiological methods, sufficient sample stability is difficult to achieve. As a result, the reliability of sensitivity and reproducibility estimates can be compromised. This difficulty can be overcome by including test results from a reference method in the validation study. Indeed, if, for each sample and laboratory, test results from both the alternative method (i.e. the method being validated) and the reference method are obtained, then it is reasonable to expect that the instability of the samples will affect both methods in the same manner. In order to assess the performance of the alternative method, the study of the relative level of detection (in accordance with ISO 16140-2 (ISO 2016), Section 5.1.4) needs to be conducted. In this approach, a reliable indicator of the performance of the alternative method is obtained by determining the ratio of the LOD values corresponding to the 2 methods:

$$ RLOD_{50\% } = \frac{{LOD_{50\% , alt} }}{{LOD_{50\% ,ref} }}. $$
(12)

There are 2 approaches for the determination of the \( RLOD_{50\%} \). If the contamination levels are not known, only a direct estimation of RLOD is possible, see Section 5.1.4.2 of ISO 16140-2 (ISO 2016).

The mathematical model for the determination of RLOD is derived from the model for LOD described above. In accordance with this model, we have

$$ \ln LOD_{50\% ,alt} \sim N(\ln 0.69 - \ln a_{alt} , \sigma_{alt}^{2} ) $$
(13)

and

$$ \ln LOD_{50\% ,ref} \sim N\left( {\ln 0.69 - \ln a_{ref} , \sigma_{ref}^{2} } \right). $$
(14)

This implies

$$ \ln RLOD_{50\% } \sim N\left( {\ln a_{ref} - \ln a_{alt} ,\sigma_{RLOD}^{2} } \right), $$
(15)

where

$$ \sigma_{RLOD}^{2} = \sigma_{alt}^{2} + \sigma_{ref}^{2} - 2 \cdot \varrho \cdot \sigma_{alt} \cdot \sigma_{ref} $$
(16)

and where \( \varrho \) denotes the correlation between the two methods. If the 2 methods are independent, then we have \( \varrho = 0 \), corresponding to the case of an “unpaired study” in the wording of ISO 16140-2 (ISO 2016). Thus, in the case of an unpaired study, \( \sigma_{alt}^{2} \) can be obtained from \( \sigma_{RLOD}^{2} \) if \( \sigma_{ref}^{2} \) is available (e.g. from an earlier validation study).
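
Eq. (16) and its use in the unpaired case can be sketched as follows. The standard deviations and the correlation are hypothetical values chosen for illustration, and the function names are not taken from any existing software.

```python
import math

def sigma_rlod(sigma_alt, sigma_ref, rho=0.0):
    """Eq. (16): standard deviation of ln RLOD for correlation rho."""
    variance = sigma_alt ** 2 + sigma_ref ** 2 - 2.0 * rho * sigma_alt * sigma_ref
    return math.sqrt(variance)

def sigma_alt_unpaired(sigma_rlod_value, sigma_ref):
    """Solve Eq. (16) with rho = 0 for the alternative method's sigma."""
    variance = sigma_rlod_value ** 2 - sigma_ref ** 2
    if variance < 0:
        raise ValueError("sigma_ref exceeds sigma_RLOD; rho = 0 is implausible")
    return math.sqrt(variance)

# Hypothetical standard deviations, for illustration only
s_alt, s_ref = 0.6, 0.4
print(sigma_rlod(s_alt, s_ref))             # unpaired study (rho = 0)
print(sigma_rlod(s_alt, s_ref, rho=0.8))    # positive correlation reduces sigma_RLOD
print(sigma_alt_unpaired(sigma_rlod(s_alt, s_ref), s_ref))
```

A positive correlation between the two methods, as would be expected if sample instability affects both in the same manner, reduces \( \sigma_{RLOD} \) relative to the unpaired case.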

3 Results and discussion

Five laboratories take part in an interlaboratory validation study for a newly developed culture method, referred to as the alternative method. The laboratories obtain yes/no test results at 3 contamination levels and 8 settings. Within each laboratory and for each contamination level, the design provided in Table 4 is implemented.

Table 4 Design with five factors and eight settings to be implemented within each laboratory and for each contamination level

The 3 nominal contamination levels taken into consideration in this study are \( {\text{L}}_{0} = {\text{Blank}} \), \( {\text{L}}_{1} = 0.8\,\,{\text{CFU}}/{\text{mL}} \) and \( {\text{L}}_{2} = 10\,\,{\text{CFU}}/{\text{mL}} \). For the 2 contamination levels L0 and L2, only one test result is obtained. For the contamination level L1, 4 replicates are obtained. The test results are provided in Table 5. Table 6 provides the corresponding ROD values for one of the laboratories (see Table 2).

Table 5 Data for example
Table 6 ROD values for laboratory 1

The estimation of the model parameters is carried out in the statistical software R. Alternatively, the computations can be performed by means of an extended version of the software PROLab POD (QuoData). The mean sensitivity estimate is 0.61. It follows that \( LOD_{50\%} \) is approximately 1.13 [see Eq. (10)]. Finally, \( \sigma_{total} \) is estimated as 0.76. It follows that \( \ln \frac{{LOD_{50\% ,upper} }}{{LOD_{50\% ,lower} }} = 3.92 \times 0.76 \cong 2.98 \) [see Eq. (11)].

The \( \sigma_{total} \) estimate of 0.76 is relatively high and may constitute sufficient ground to call into question the fitness of the method. It is important to note, however, that the high \( \sigma_{total} \) value may be due, to some extent, to a lack of sample homogeneity. In order to investigate this question, the method is compared to a reference method. The test results for the reference method are provided in Table 7. Tables 8 and 9 provide overviews across laboratories for the two methods (see Table 3).

Table 7 Data for example
Table 8 Positive results for the alternative method
Table 9 Positive results for the reference method

The LOD of the reference method is calculated as \( LOD_{50\% ,ref} = 0.88 \). As can be seen, it is lower than that of the alternative method (\( LOD_{50\% ,alt} = 1.13 \)), i.e. the reference method is more sensitive. The corresponding \( RLOD_{50\%} \) value is calculated as 1.28 [see Eq. (12)].

The possible effect of sample instability is offset by considering not the reproducibility with respect to the LOD, but rather with respect to the RLOD. In order to determine the 2 methods’ reproducibility with respect to \( RLOD_{50\%} \), in a first step, for each laboratory and for each setting, maximum likelihood estimates for \( LOD_{50\% ,alt} \) and for \( LOD_{50\% ,ref} \) are calculated along with corresponding \( \log_{10} RLOD_{50\% } \) values. In order to avoid implausible sensitivity estimates when all the test results are negative for a particular laboratory and setting, a minimum of 0.15 is stipulated for the sensitivity parameter a [see Eq. (3)] (this minimum corresponds to the upper confidence limit). The \( \log_{10} RLOD_{50\% } \) estimates are provided in Table 10 (note that these estimates only take on five values, depending on the number of positive results at the middle concentration level). In a second step, a linear mixed model is fitted to these \( \log_{10} RLOD_{50\% } \) estimates. The resulting reproducibility standard deviation is \( \sigma_{total,RLOD} = 0.49 \). This is considerably less than the reproducibility estimate for the alternative method (calculated as 0.76). The \( RLOD_{50\%} \) estimate obtained from fitting the linear mixed model to the \( \log_{10} RLOD_{50\% } \) values provided in Table 10 is 1.27, which matches well with the value 1.28 calculated directly from the \( LOD_{50\% , alt} \) and \( LOD_{50\% , ref} \) values according to Eq. (12) (see previous paragraph).
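
The first step described above can be sketched as follows for a single laboratory and setting. This is a minimal illustration under stated assumptions and not the procedure implemented in R or PROLab POD: the likelihood is maximized by a simple golden-section search, the search bounds `lo` and `hi` are arbitrary, the blank level is left out, the test results are hypothetical, and the case in which all results are positive (no finite maximum) is handled by returning the upper search bound.

```python
import math

def log_likelihood(a, data):
    """Binomial log-likelihood under POD = 1 - exp(-a * x), Eq. (3).

    data: list of (x, k, n) with nominal level x > 0, k positives out of n.
    """
    total = 0.0
    for x, k, n in data:
        if k > 0:
            total += k * math.log(-math.expm1(-a * x))   # k positive results
        total += -(n - k) * a * x                          # n - k negative results
    return total

def ml_sensitivity(data, floor=0.15, lo=1e-3, hi=1e3, iterations=200):
    """Maximise the log-likelihood over ln a by golden-section search.

    floor is the minimum of 0.15 mentioned in the text; the search bounds
    lo and hi are assumptions made for this sketch.
    """
    if all(k == n for _, k, n in data):
        return hi   # no finite maximum: report the upper search bound
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = math.log(lo), math.log(hi)
    for _ in range(iterations):
        m1 = right - ratio * (right - left)
        m2 = left + ratio * (right - left)
        if log_likelihood(math.exp(m1), data) < log_likelihood(math.exp(m2), data):
            left = m1
        else:
            right = m2
    return max(floor, math.exp(0.5 * (left + right)))

def log10_rlod(a_alt, a_ref):
    """log10 of RLOD = LOD_alt / LOD_ref = a_ref / a_alt."""
    return math.log10(a_ref / a_alt)

# Hypothetical results for one laboratory and setting (blank level omitted)
alt = [(0.8, 2, 4), (10.0, 1, 1)]
ref = [(0.8, 3, 4), (10.0, 1, 1)]
a_alt, a_ref = ml_sensitivity(alt), ml_sensitivity(ref)
print(a_alt, a_ref, log10_rlod(a_alt, a_ref))
```

Since \( LOD_{50\%} \) is inversely proportional to a [see Eq. (10)], the RLOD reduces to the ratio of the two sensitivity estimates, which is why only the sensitivities need to be estimated in this step.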

Table 10 \( { \log }_{10} {\text{RLOD}}_{50\% } \) values for each laboratory and setting

4 Conclusions

In this paper, a validation approach is presented for microbiological qualitative methods where the CFU contamination levels follow a Poisson distribution. In this approach, the method’s reproducibility is a measure of the variability of the LOD parameter across laboratories and measurement conditions. Since a microbiological qualitative method cannot be reliably validated without determining the variability of the LOD, the method’s reproducibility, calculated as \( \sigma_{total} \), provides essential information about the method’s performance.

Moreover, the factorial design presented here constitutes a systematic approach to measurement conditions which, over and above ensuring that the full range of measurement conditions is represented in the validation study, makes it possible to reduce the workload: reliable reproducibility estimates can be obtained with as few as 5 laboratories. In addition, the factorial approach also allows a quantitative analysis of the impact of different influence factors.

If, as is often the case for microbiological methods, sufficient stability of the samples is not ensured, then test results from a reference method should be taken into consideration, and the assessment of the reproducibility is carried out with respect to the two methods’ relative level of detection. Since it can be expected that sample instability will affect both methods in the same manner, considering the ratio of the 2 LOD values should offset any bias in the estimate of reproducibility caused by sample instability. The reproducibility of the RLOD parameter only provides information regarding the reproducibility of the LOD of the alternative and reference methods if the two measurement procedures can be considered independent, e.g. involving different culture media, reagents and instruments.

Finally, it needs to be noted that the approach presented here can be adapted to in-house validation studies. The factor “Laboratory” can be replaced by the factor “Day” or “Week”. The variability between the laboratories would then correspond to the variability between days or weeks.