Introduction

Antibiotics are widely used in veterinary medicine, and drug residues may subsequently persist in foods derived from animals, which may pose a health risk to the consumer. Screening of food products of animal origin for the presence of antimicrobial residues started soon after the introduction of antibacterial therapy in veterinary medicine. Initially it mainly concerned process monitoring in the dairy industry to prevent problems in fermentative dairy production, but from the early 1970s regulatory residue screening in slaughter animals also became more commonly introduced.

An efficient screening method needs to be low-cost and high-throughput, able to effectively identify potential noncompliant samples from a large set of negative samples.

Microbial inhibition assays were the earliest methods used for the detection of antibiotic residues [1, 2] and they are still widely used. They are very cost-effective and in contrast to, for example, immunological or receptor-based tests, they have the potential to cover the entire antibiotic spectrum within one test. Two main test formats can be distinguished: the tube test and the (multi-) plate test. A tube (or vial, or ampoule) test consists of a growth medium inoculated with (spores of) a sensitive test bacterium, supplemented with a pH or redox indicator. At the appropriate temperature, the bacteria start to grow and produce acid, which will cause a color change. The presence of antimicrobial residues will prevent or delay bacterial growth, and thus is indicated by the absence or delay of the color change. This format is commonly applied in routine screening of milk [3, 4], but it is also increasingly used for analysis of other matrices [5–7]. A plate test consists of a layer of inoculated nutrient agar, with samples applied on top of the layer, or in wells in the agar. Bacterial growth will turn the agar into an opaque layer, which yields a clear growth-inhibited area around the sample if it contains antimicrobial substances. In Europe this has been the main test format since screening of slaughter animals for the presence of antibiotics started [8, 9].

One of the first official methods was the Sarcina lutea kidney test of van Schothorst [10], which became the statutorily prescribed method in the Netherlands in 1973. At approximately the same time, Germany introduced a Bacillus subtilis BGA test and other countries adopted similar test methods [8]. In 1980 a standardized method for the detection of antibacterial substances was proposed by a working group of the Scientific Veterinary Commission of the European Commission [9]. This EU four-plate test (EU4pt) comprises three plates of agar medium inoculated with B. subtilis BGA spores at pH 6, 7.2, and 8, and a Kocuria rhizophila (formerly known as Micrococcus luteus [11]) ATCC 9341 plate at pH 8. The pH 7.2 medium is supplemented with trimethoprim (TMP) to increase the sensitivity for sulfonamides. For a long time the result of this test was used as an unofficial tolerance level: meat testing negative on all four test plates was considered compliant.

The EU4pt was developed for detection of residues in meat and was considered less suitable for analysis of kidney because it caused too many false-positive results with this matrix. Also a test comprising four plates was considered rather laborious, so in several countries one-plate alternatives were introduced [12–14]. These tests, based on B. subtilis, used renal pelvis fluid or kidney as a test matrix, since residue levels in this organ are generally higher than in meat, allowing a somewhat reduced sensitivity of the test, while the results were still comparable with the EU4pt results for meat [12]. Introduction of a membrane between the kidney sample and the test plate was used to prevent problems with natural growth-inhibiting compounds [14].

Ongoing harmonization of European legislation has led to a collective approach with respect to the approval of veterinary drugs (EU Council Regulation 2377/90 [15]) and monitoring programs (EU Council Directive 96/23/EC [16]). Before a veterinary medicinal product is allowed on the market, it has to undergo a safety and residue evaluation, after which maximum residue limits (MRLs) can be defined. This process started in 1992 and currently the list of antibacterial substances (category B1 substances) for which an MRL has been established comprises over 50 antimicrobial compounds (Table 1). EU Council Directive 96/23/EC prescribes mandatory screening of a fixed percentage of all animal products for the presence of residues of antimicrobial drugs. The vast majority of the screening methods used for monitoring the presence of antimicrobial compounds today are still microbial inhibition tests [17]. However, the establishment of MRLs has made us reconsider the original microbial screening methods, such as the EU4pt and one-plate B. subtilis assays, employed in the pre-MRL era, as it should be concluded that for many residues these tests are insufficiently sensitive [12, 18, 19].

Table 1 Overview of EU maximum residue limits (MRLs) (μg kg−1), established until 1 January 2009

Consequently the last decade has shown a significant development of improved methods. This paper reviews the efforts to develop methods that are (as far as possible) in compliance with EU legislation. An overview of the methods referred to is provided in Table 2.

Table 2 Overview of the methods referred to in this paper

Method development

Multiplate (broad-spectrum) methods

The most important trend that can be observed in the development of microbial detection methods for antibiotics is acknowledgement of the fact that adequate detection of a broad spectrum of antibiotics is only possible using multiplate assays based on a combination of different test bacteria.

Okerman et al. [20] presented an inhibition test for detection and presumptive identification of tetracyclines, β-lactams, and quinolones in poultry. The method comprises three pH 6 plates, inoculated with B. cereus, K. rhizophila, and Escherichia coli. The detection limits of a limited number of residues were compared with those of a B. subtilis pH 6 test and were found to be lower for all compounds, although the differences between B. subtilis and B. cereus sensitivity for tetracyclines were remarkably small. As the authors mentioned, this method should be considered a limited-spectrum method, because aminoglycosides and sulfonamides will not be detected. Also adequate detection of macrolides would probably require a higher pH.

Tsai and Kondo [21] evaluated the detection levels of 31 antimicrobial agents on various combinations of seven bacteria and five media. These included the somewhat uncommon test organisms Clostridium perfringens and Photobacterium phosphoreum. On the basis of the results a method comprising B. stearothermophilus, B. subtilis, K. rhizophila, and E. coli was proposed. B. cereus was included in the evaluation but was not found to be essential as B. subtilis grown on minimum medium showed better sensitivity to the tetracyclines tested (oxytetracycline (OTC) and chlortetracycline).

An interesting study was presented by Myllyniemi et al. [22], who evaluated the regulatory prescribed Finnish two-plate test, supplemented with a B. subtilis pH 7.2 + TMP plate. Kidney and muscle were taken from animals that were emergency-slaughtered during the withdrawal period of an antibiotic treatment. The samples were chemically confirmed, and the 68 out of 89 animals that contained residues showed a wide range of penicillin, OTC, and enrofloxacin concentrations below and above the MRL. This study provided valuable data on the correlation between the concentrations found in kidney and muscle in the same carcass. It was concluded that the B. subtilis assays used were not sensitive enough to allow OTC and enrofloxacin screening in muscle; only penicillin could be screened adequately from muscle tissue.

The costs of chemical confirmation can be considerably reduced by introducing a preliminary microbial identification procedure. For this reason the activity patterns of 15 different antibiotics were assessed on 18 combinations of test bacteria, varying growth medium pH values and antagonistic compounds [23]. This approach generated data for these specific antibiotics on a wide range of test plates, yielding a much better view of the specificity of a test plate. Activity patterns appeared to be sufficiently specific for group identification of the antibiotics tested. Additional data were generated with incurred kidney and muscle samples containing penicillin, OTC, and enrofloxacin (including the metabolite ciprofloxacin) and for these compounds the microbiological identification and the chemical identification were in good agreement. Cluster analysis on the inhibition zones caused by the different antimicrobial compounds on each of the 18 test plates revealed that the number of plates required for effective preliminary identification could be narrowed down to six [24]. It was shown that group identification of standard solutions and incurred samples of penicillin, OTC, and enrofloxacin remained correct; unfortunately no data for the other antibiotic groups are available.

The possibility for preliminary identification from activity profiles was also explored for the US Food Safety and Inspection Service (USDA-FSIS) method [25, 26]. This method consists of seven test plates and is used by the USDA-FSIS as a microbial confirmatory procedure for samples which tested positive in initial screening tests such as STOP [27], CAST [28], and FAST [29]. The method does not use the commonly applied K. rhizophila ATCC 9341, but uses two erythromycin- or (dihydro)streptomycin-resistant derivatives, which may improve the identification of macrolides and aminoglycosides. The method lacks a TMP-supplemented test plate and should therefore be considered insufficient with respect to the detection of sulfonamides. The B. subtilis plate in this method allows adequate detection of enrofloxacin. However, the lack of a specific test plate for this antibiotic group will probably lead to a situation in which most other veterinary quinolones remain undetected.

Under EU Council Directive 96/23/EC, AFSSA Fougères was designated as the Community Reference Laboratory (CRL) for (among others) the B1 substances [16]. The CRL proposed an improved method for screening of meat, the screening test for antibiotic residues (STAR) [30, 31], which is based on five individual test plates containing B. cereus, B. stearothermophilus, B. subtilis, K. rhizophila, and E. coli as the indicator organisms. The results of an initial collaborative study with a small number of residues in pig muscle were mainly satisfactory, although gentamicin at 5 times the MRL was not detected and the B. stearothermophilus plate showed inhibition with blank samples [30]. Additionally the STAR was validated with fortified milk samples [31]. The compounds that could not be detected at levels less than or equal to the MRL were mainly sulfonamides and β-lactams; for the latter group, however, the MRLs in milk are generally lower than MRLs for other matrices. A validation study with spiked muscle tissue samples is ongoing, but preliminary results show that the detection capability for several of the substances tested appears to exceed the MRL [32].

Ferrini et al. [33] presented a six-plate method, the combined plate microbial assay (CPMA), which essentially consists of the EU4pt, extended with additional B. cereus and E. coli plates. When the proposed strategy is applied, i.e., applying samples in twofold or fourfold and supplementing them with one of the confirmatory solutions penicillinase, 4-aminobenzoate, or magnesium sulfate, the test allows presumptive group identification and initial screening in one step. This same approach was presented earlier for a limited range of residue groups in milk [34] and meat [35].

Reviewing the activity profiles is a relatively simple way to achieve preliminary identification [23, 26]. The ultimate form of it is presented in the Nouws antibiotic test (NAT) [36]. This method comprises five test plates, each one specific for one or two groups of antibiotics, with the plate showing the largest inhibition zone revealing the group identity of a residue. The method is based on the analysis of renal pelvis fluid. It uses a format that slightly differs from most plate tests, as it does not apply samples on top of the agar layer, but in punch holes which are supplemented with a plate-specific buffer. This approach yields good sensitivity, though the procedure becomes more complex, which might be a disadvantage in terms of robustness. In accordance with this same principle, postscreening methods for the analysis of kidney and meat [37, 38] were developed, and are used for screening of slaughter animals within the framework of the National Monitoring Program in the Netherlands.
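The "largest inhibition zone" rule described for the NAT can be illustrated with a minimal sketch. The plate labels, the plate-to-group mapping, and the zone threshold below are hypothetical placeholders chosen for illustration; they are not taken from the published method, which should be consulted for the actual plate composition and reading criteria.

```python
from typing import Dict, Optional

# Hypothetical plate labels and group assignments (illustrative only,
# not the published NAT configuration).
PLATE_GROUPS: Dict[str, str] = {
    "plate_1": "tetracyclines",
    "plate_2": "quinolones",
    "plate_3": "beta-lactams/macrolides",
    "plate_4": "aminoglycosides",
    "plate_5": "sulfonamides",
}


def presumptive_group(zones_mm: Dict[str, float],
                      min_zone_mm: float = 0.0) -> Optional[str]:
    """Return the group linked to the plate with the largest inhibition zone.

    zones_mm maps a plate label to the measured inhibition zone (mm).
    min_zone_mm is an assumed cut-off below which a plate counts as negative.
    Returns None when no plate exceeds the cut-off (sample screens negative).
    """
    positive = {plate: zone for plate, zone in zones_mm.items()
                if zone > min_zone_mm}
    if not positive:
        return None
    best_plate = max(positive, key=positive.get)
    return PLATE_GROUPS[best_plate]


# Example with made-up zone sizes: the first plate shows the largest zone.
sample = {"plate_1": 12.0, "plate_2": 3.0, "plate_3": 0.0,
          "plate_4": 0.0, "plate_5": 1.5}
print(presumptive_group(sample))
```

Such a rule only yields a presumptive group identity; as noted above, chemical confirmation remains necessary.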

It can be concluded that the increased number of test plates, required to achieve adequate detection, has resulted in more laborious methods. However, they bring the advantage of enhancing the possibilities and the accuracy of presumptive antibiotic group identification, which may reduce confirmatory costs and efforts.

Additional methods developed for quinolones

Some of the studies concerning microbial screening methods do not intend to cover the entire spectrum of antimicrobial residues, but only a specific group of antibiotics. This is particularly true for the quinolones, a major antibiotic group that only became veterinarily relevant during the last decade of the twentieth century.

Ellerbroek [39] compared the sensitivity of B. subtilis BGA and E. coli (Bay) 14 towards enrofloxacin, ciprofloxacin, and flumequine. He proposed an extension of the German three-plate (B. subtilis) method with E. coli, which was found to be 3–30 times more sensitive depending on the quinolone residue. Similarly, Choi et al. [40] compared several other E. coli strains with B. subtilis ATCC 3491, which was the official test organism for antibiotic screening in Canada. Besides enrofloxacin, ciprofloxacin, and flumequine, the study also included sarafloxacin and difloxacin. For all these residues E. coli ATCC 128 appeared to be the most sensitive, and this organism was recommended for supplementing the existing microbial screening tests. A third comparative study evaluated the susceptibility of the same organisms as in [39] for ten different quinolones [41]. Only difloxacin appeared to be detected more sensitively using B. subtilis as the test organism. The paper also shows the differences between growth medium at pH 6 and pH 8. Detection of nalidixic acid, flumequine, oxolinic acid, and difloxacin appeared to be optimal at pH 6; for the others pH 8 is favorable. It was concluded that the addition of an E. coli pH 8 test is the best option to include in existing screening methods. However, depending on the matrix it might be necessary to include a pH 6 plate for adequate detection of flumequine, since the MRL of flumequine in muscle differs between species (600 μg kg-1 in fish, 400 μg kg-1 in poultry, and 200 μg kg-1 in other species). Most of the broad-spectrum multiplate methods mentioned in the previous section comprise a specific E. coli test plate for quinolone detection, either E. coli (Bay) 14 at pH 6 [20] or E. coli 11303 at pH 7.2 [23, 24, 33] or pH 8 [31].

Two alternative bacterial species have been proposed for the detection of quinolones, Klebsiella pneumoniae ATCC 10031 [42, 43] and Yersinia ruckeri NCIMB 13282 [44, 45]. For K. pneumoniae only data on its sensitivity towards enrofloxacin have been published [42, 43]. The Y. ruckeri assay was originally developed for the detection of oxolinic acid in fish [44]; the detection capability of a Y. ruckeri-based pH 6.5 assay for several additional quinolones in egg and poultry muscle was published later [45]. The Nouws antibiotic test is the only multiplate method so far that has implemented this species for quinolone detection [36–38]. It has been claimed that the use of this organism provides a better balance between sensitivity towards enrofloxacin and oxolinic acid and flumequine [45], though a straightforward comparison between the two species is lacking so far.

Tube tests

From a practical perspective, tube tests form an attractive alternative to multiplate methods. Almost without exception these tests use B. stearothermophilus var. calidolactis as the indicator organism. The only equipment needed is a device (e.g., garlic press) to obtain tissue fluid and an incubator or water bath at the appropriate temperature. Assay results are available within 4 h, and the use of spores instead of vegetative cells allows prolonged shelf life, which makes commercial distribution feasible. Initially, commercially available B. stearothermophilus tube tests were developed for the analysis of milk, but for several years tests intended for other animal matrices have also become commercially available; e.g., Premi®Test (DSM), Explorer (Zeu-Inmunotech), and Kidney Inhibition Swab (KIS™) test (Charm Sciences). The only test for which a substantial amount of literature is available is Premi®Test.

B. stearothermophilus is widely used for detection of antibiotics in milk, because it is very sensitive to what is considered the most important group of antimicrobials for this matrix, the β-lactam antibiotics. Popelka et al. [46] showed that Premi®Test exhibits excellent sensitivity for penicillin, amoxicillin, ampicillin, oxacillin, and cloxacillin. The study includes results of poultry muscle samples originating from animals treated with amoxicillin. Premi®Test appeared to be capable of detecting residue levels down to 21 μg kg-1.

Premi®Test recently received AFNOR (French Association for Normalization) certification. The AFNOR validation mark certifies the analytical effectiveness of commercial methods for a defined field of application, which should be comparable to the effectiveness of a reference method. The organization mainly certifies microbiological detection methods in food and water. Certification of antibiotic detection methods is limited so far; besides Premi®Test only a receptor assay for β-lactam antibiotics, beta-STAR (Neogen), has received an AFNOR certificate.

The results of the validation study, which was performed by the CRL, have been published [47]. The study comprised several steps. In the first step, the detection capability of the test for amoxicillin, ceftiofur, sulfamethazine, OTC, tylosin, and gentamicin in fortified meat juice samples was analyzed. Detection of amoxicillin, ceftiofur, and tylosin at their respective MRLs was satisfactory, sulfamethazine and OTC were adequately detected at twice the MRL, but gentamicin was not. The false-positive rate was fairly high, with six “doubtful” results out of 40 measurements.

The second step concerned a comparison between Premi®Test and the EU4pt with incurred muscle samples. Since the EU4pt is the French official method for monitoring muscle samples, it was assigned as the reference method. Incurred samples containing 750 μg kg-1 tylosin (MRL 100 μg kg-1), 270 μg kg-1 amoxicillin (MRL 50 μg kg-1), and a combination of 760 μg kg-1 OTC and 150 μg kg-1 sulfadimethoxine (MRL for both 100 μg kg-1) were compared. The method performance was evaluated in terms of relative accuracy, relative specificity, and relative sensitivity and it was concluded that the results were similar, with Premi®Test yielding fewer false-negative and false-positive results. In more detail, Premi®Test detected all incurred samples, while the EU4pt showed a false-negative rate of 20% for the amoxicillin and 80% for the tylosin incurred tissue samples. These results form the major argument on which the AFNOR certification is based.

From these results it may seem fair to conclude that Premi®Test performs equally well as or better than the reference method. However, evaluating the performance of a method against a reference method that, although it is still used on a large scale, is widely recognized to be insufficiently sensitive is arguable. Moreover, the samples that were used for the comparative study contained residue concentrations that were considerably higher than “the level of interest,” the MRL, so performing better than the reference method is no guarantee of performing adequately at the relevant residue concentrations. Finally, the number of different residues evaluated was very limited, which makes the outcome only of limited value in judging the application as a broad-spectrum-antibiotic screening method.
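The distance between the incurred sample concentrations and the MRL can be made explicit with a short calculation. The sketch below uses only the concentrations and MRLs quoted above for the comparative study; the variable and function names are illustrative.

```python
# Concentration and MRL (both in ug/kg) of the incurred muscle samples
# quoted in the text for the Premi Test / EU4pt comparison.
incurred = {
    "tylosin": (750, 100),
    "amoxicillin": (270, 50),
    "OTC": (760, 100),
    "sulfadimethoxine": (150, 100),
}


def fold_mrl(concentration: float, mrl: float) -> float:
    """Express a residue concentration as a multiple of its MRL."""
    return concentration / mrl


ratios = {name: fold_mrl(conc, mrl) for name, (conc, mrl) in incurred.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} x MRL")
```

Tylosin, amoxicillin, and OTC were thus present at more than five times their MRL, while sulfadimethoxine (1.5 times the MRL) was only tested in combination with OTC.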

Additionally field samples testing positive with either Premi®Test or an additional B. cereus test were retested by the CRL using Premi®Test, the EU4pt, and the STAR method [30, 31]. Besides a disturbingly high false-positive rate for Premi®Test (62%), this part of the study also showed clearly that the STAR method is much more sensitive than the EU4pt. Since 2006 field laboratories in France have been authorized to use Premi®Test as a prescreening test, under the condition that all positive results are reanalyzed with the EU4pt.

Premi®Test claims to be suitable for matrices such as kidney, fish, eggs, and feed, but literature data on these matrices are very limited so far. Residue detection in eggs has only been studied with sulfadimidine [48], and detection in fish fluid was only tested with four antibiotics [7]. A much more extensive study, including the comparison with a one-plate B. subtilis test, was performed with 18 different antibiotics in kidney fluid [6]. Using kidney as a matrix may be an advantage, since the MRLs of several antibiotics are higher for this organ. Initially the detection capability was determined using antibiotic standard solutions. The sensitivity for most of the antibiotics tested appeared to be below the kidney MRL, except for chlortetracycline, sulfamethazine, streptomycin, and flumequine (and the banned residue chloramphenicol). In particular, for β-lactam antibiotics and sulfonamides the Premi®Test outperformed the one-plate test. However, when the same comparison was repeated using fortified kidney fluid samples, a considerable matrix effect was observed. The sensitivity of Premi®Test for β-lactam antibiotics remained below their MRLs, but with the exception of doxycycline, all other antimicrobials were no longer detected adequately.

Okerman et al. [49] compared several methods, including Premi®Test, for the detection of tetracyclines in animal tissue. The Premi®Test results of chicken muscle spiked with 100 μg kg-1 of all four veterinarily used tetracyclines were negative. The study included the analysis of incurred samples. Unfortunately the highest doxycycline concentration (108.8 μg kg-1) yielded a negative Premi®Test result, so no conclusions regarding the detection limit for this compound could be made. OTC incurred samples were available at a much wider range and with this residue Premi®Test gave a positive result between 192.8 and 427 μg kg-1. Using a B. cereus based plate test detection of OTC at half the MRL appeared feasible. It should be mentioned that all the microbial test methods evaluated in this study were outcompeted by a commercial receptor test (Tetrasensor).

A comparison between two multiplate tests and Premi®Test also revealed insufficient sensitivity of Premi®Test with respect to the detection of tetracyclines [38]. Analysis of 591 slaughter animals yielded four MRL violations, of which three were tetracyclines that remained undetected by Premi®Test. A study on quinolone detection in poultry and eggs by the same group [45] showed that Premi®Test is also not suitable for this group of antibiotic residues, as all compounds tested remained undetected at their MRL.

Stead et al. [5] proposed an acetonitrile/acetone extraction to enhance the sensitivity of Premi®Test. Detection limits for a broad range of antibiotics and matrices were presented using this sample pretreatment. An advantage of using ampoule-based tests is the potential for objective automated processing of the results, using scanner technology as an alternative to subjective visual assessment [50]. Like one-plate tests, a tube test lacks the possibility for group identification. However, secondary screening for antibiotic group identification by repeating the assay supplemented with 4-aminobenzoate or β-lactamase can selectively identify the presence of sulfonamides or β-lactams [5]. In the same way, group identification of tetracyclines can be obtained after addition of a calcium-containing buffer [51].

The performances of Premi®Test and a similar B. stearothermophilus tube test, KIS™, were evaluated to assess the possibility of replacing FAST [29], a B. megatherium one-plate test operated by the US Food Safety and Inspection Service [52, 53]. In addition to kidney fluid, which is the test matrix for FAST, serum was also tested since it would allow antemortem screening. KIS™ is specifically designed for the analysis of kidney and in practice employs a disposable swab format, but for this study samples were directly pipetted onto the test tube. Initially eight antibiotics (penicillin, sulfadimethoxine, OTC, tylosin, danofloxacin, streptomycin, neomycin, and spectinomycin) were tested [52]. As may be expected from the fact that they exploit the same test organism, differences in the results of the two B. stearothermophilus tests were only minor. FAST appeared significantly more sensitive for the aminoglycosides, but for most of the other residues the B. stearothermophilus tests showed better sensitivity. KIS™ showed a considerable number of false-positive responses, but this may be attributed to the fact that the kidney juice samples were not subjected to a preincubation step at 80 °C. This step is included in the Premi®Test protocol when one is analyzing kidney (and egg) to inactivate natural growth-inhibiting compounds.

Subsequently a very thorough study was carried out on suspect carcasses obtained from a meat inspection program [53]. Kidney and serum were subjected to each of the three microbial tests and were also analyzed by liquid chromatography–tandem mass spectrometry. From an analytical perspective, the range of compounds and concentrations found was somewhat disappointing. Only 39 out of 235 carcasses contained residues, mainly dihydrostreptomycin, penicillin, OTC, pirlimycin, and desfurylceftiofur cysteine disulfide, at very low levels. However, the three samples showing concentrations above US tolerance levels were effectively detected by both KIS™ and Premi®Test, while FAST missed a sample containing 141 μg kg-1 sulfamethazine. It should be noted that the relatively low sensitivity of B. stearothermophilus for tetracyclines is not an issue in the US situation, where tolerance levels for tetracycline in kidney were set at 12,000 μg kg-1.

It can be concluded that tube tests can be used as a broad-spectrum screening method, but that in many cases parallel tests covering, for example, tetracycline and quinolone residues will be required.

Method validation and proficiency testing

Determining the suitability and applicability of a method for a specific matrix is obviously an important issue. It has probably become clear from the previous sections that it is difficult to compare the performances of individual methods on the basis of literature data, because factors such as the type of matrix and the specific residues investigated differ between studies. To come to more standardized procedures for method evaluation, the European Commission has issued a decision on method validation, 2002/657/EC [54], which describes how analytical methods should be validated according to common procedures and performance criteria. A method has to fulfill a defined subset of performance criteria, depending on whether it concerns a qualitative or a quantitative method, and a screening or confirmatory method (Table 3). Qualitative screening methods, such as the microbiological antibiotic detection methods, should be validated with respect to the following parameters: detection capability (CCβ), specificity/selectivity, ruggedness, and stability.

Table 3 Performance characteristics that should be determined in method validation according to 2002/657/EC

Detection capability

Two main dilemmas emerge when considering the validation of the detection capability of a broad-spectrum microbial screening method: the type of sample (matrix) and the number of compounds that should be assessed.

Although characterization with antibiotic standard solutions is relatively simple and provides valuable information on the bioactivity of individual compounds within a group, CCβ values obtained with such an approach cannot be considered representative for practical samples [6, 24, 45, 55]. The presence of an animal matrix may affect the detection capability of a method through various factors, such as the addition of growth components, local pH change, degradation, and protein binding. Validation should therefore (also) be performed with the matrix samples. Fortifying liquid matrices such as milk and egg is straightforward, but validating the detection capability of a method for meat or kidney is somewhat more complicated. For methods based on the detection of meat or kidney fluid, fortification is relatively easy, although two different approaches are being employed: fortification of extracted fluid [6, 49] and fortification of tissue with subsequent extraction [5, 37]. Most methods for the screening of meat and kidney, however, rely on the analysis of intact pieces of tissue. Although the use of frozen pieces of fortified minced tissue, referred to as “simulated tissue,” has been reported [32], it remains difficult to find a proper fortification strategy for this type of test.

It would be highly preferable, especially with tests analyzing intact tissue, to assess the detection capability using incurred samples. Some studies evaluating the performance of microbial screening methods use tissues originating from animal medication experiments [47, 49, 56], but the production of incurred materials for each antibiotic at the appropriate concentration is a difficult and expensive task. Alternatively, samples originating from monitoring programs have been used for method evaluation [20, 22–24, 38, 53, 57]. The most fruitful approach was to use emergency-slaughtered animals for which medication information indicates they were slaughtered before the end of the withdrawal period [22]. In general, however, these studies yielded only limited numbers of positive samples representing only a very limited group of substances, raising the question whether other residues were not present, or were not found because the method was too insensitive.

It has been proposed to limit the number of compounds to be validated for broad-spectrum methods by assigning “representative compounds” [37, 58]. It is assumed that one or two compounds within an antibiotic group can act as representatives for the entire group. This may be a legitimate assumption, but it should be treated with care. The relative bioactivity of compounds within a group may differ when they are exposed to a different test bacterium. If the representative compound should be the one that is detected least sensitively with respect to its MRL, the fact that MRLs vary between matrices may also have consequences for the choice of representative.

So far only a few microbial screening methods claim to have been validated “according to 2002/657/EC” [36, 37, 51, 59]. All of them determine CCβ using fortified concentration series. CCβ is determined as the lowest concentration for which 20 measurements (or more) give less than 5% false-negative results, so it would probably be more correct to state that CCβ is smaller than the established concentration.
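The way CCβ is derived from a fortified concentration series, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not a validated implementation of 2002/657/EC: the function name, the data layout, and the example figures are hypothetical, and the strict "less than 5%" criterion follows the wording used in this text.

```python
from typing import Dict, List, Optional


def estimate_ccbeta(results: Dict[float, List[bool]],
                    min_replicates: int = 20,
                    max_false_negative_rate: float = 0.05) -> Optional[float]:
    """Return the lowest tested concentration meeting the CCbeta criterion.

    results maps a fortification level (e.g. ug/kg) to the qualitative
    outcomes of replicate measurements (True = detected, False = missed).
    Levels with fewer than min_replicates measurements are ignored.
    Simplification: a lower level is accepted even if a higher level failed.
    """
    for concentration in sorted(results):
        outcomes = results[concentration]
        if len(outcomes) < min_replicates:
            continue
        false_negative_rate = outcomes.count(False) / len(outcomes)
        if false_negative_rate < max_false_negative_rate:
            return concentration
    return None


# Made-up concentration series for illustration only.
series = {
    10.0: [True] * 10 + [False] * 10,   # 50% false negatives
    25.0: [True] * 19 + [False],        # 5% false negatives, not below 5%
    50.0: [True] * 20,                  # no false negatives
}
print(estimate_ccbeta(series))
```

With 20 replicates, the strict criterion is only met when no sample is missed, and the value obtained depends on the spacing of the tested levels. This illustrates why the true CCβ can only be said to be at or below the established concentration.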

Other validation criteria

The other criteria for qualitative screening methods, specificity/selectivity, ruggedness, and stability, can be interpreted in many ways. Pikkemaat et al. [36, 37] determined specificity by analyzing high (2–5 times the MRL) concentrations of all residues on each of the test plates. LeBreton et al. [59] argued that a microbial inhibition tube test is by definition not specific, and tested only additional blank milk samples. The same assay was also validated according to the ISO/IDF 183 guideline [60]. In that study specificity was tested as the susceptibility to interfering substances (differing levels of fat, high somatic cell count, different species, etc.).

Ruggedness was defined by LeBreton et al. [59] as the reproducibility using different batches of tests, two analysts, different days, and spikes from different standard solutions. The test was found to be rugged, provided that the result is judged against a positive control. Ruggedness aspects that were examined more specifically concerned variation in application volume and incubation temperature [60]. Ruggedness can also be shown by successful interlaboratory assessment [51].

Finally, the 2002/657/EC parameter stability, although a very relevant aspect in practice, is a disputable requirement. Since it is independent of the method used for the analysis, it cannot be considered a characteristic of a method. Assessing stability with a qualitative method is even more disputable, as the type of method implies that no absolute values can be assigned. Nevertheless, Okerman et al. [41] analyzed the stability of frozen stock solutions of several β-lactams, tetracyclines, and quinolones using a B. subtilis plate assay. Under the assumption that a reduction of more than 25% was significant, it was concluded that tetracycline, OTC, ceftiofur, and cefapirin were stable for less than 6 months, while amoxicillin and penicillin already showed a significant reduction after 2 months. The stability of antibiotic residues may vary between matrices, and results at other storage temperatures will also be relevant, as in practice samples are, for example, stored at 4 °C for several days. This problem has been recognized by the CRL, which proposed that “Stability data can be extracted from other laboratories’ studies, performed with other analytical methods, because they do not depend on the method used for analysis” [61].

Proficiency testing

Proficiency testing is another closely related quality control aspect, which is not in the 2002/657/EC criteria, but was prescribed in the earlier Commission Decision 98/179/EC, which states that “approved laboratories must prove their competence by regular and successful participation in adequate proficiency testing schemes recognized or organized by the national or Community reference laboratories” [62].

In contrast to other microbiological methods, there are currently no regular proficiency testing programs in operation for microbial residue screening methods, although these are considered highly necessary to reveal the inevitable shortcomings in this area. Proficiency testing is available for chemical analysis of antibiotics, but the samples used in these studies often combine several residues in one sample, which will yield an additive or even synergistic effect when analyzed with effect-based microbial methods. Moreover, the materials are often homogenized, and therefore unsuitable for tests that operate on intact tissue.

In 2005 the CRL organized a proficiency test among 22 laboratories of which 21 performed microbiological screening [17]. Even though the residue concentrations in that study (195 μg kg⁻¹ danofloxacin, 376 μg kg⁻¹ tylosin, and a combination of 227 μg kg⁻¹ OTC and 343 μg kg⁻¹ sulfadimethoxine, along with two blank samples) were considerably above the MRLs for these compounds, only 13 laboratories correctly identified all three positive samples; additionally, five more laboratories produced false-positive results. This outcome may even be considered optimistic with respect to the situation in practice, since the laboratories involved were national reference laboratories, while in many countries the initial screening is delegated to routine field laboratories.

Conclusions

This paper provides an overview of the developments in the field of microbial screening methods for antibiotic residues in slaughter animals since the early 1990s, when the establishment of MRLs at levels below the sensitivity of the established and generally applied methods made us reconsider these existing screening methods.

Although the literature may show improved methods, the lack of validation data on incurred samples hampers an accurate evaluation of their true performance. It also remains difficult to get a clear picture of the extent to which improved methods have actually been implemented in practice. The results of a proficiency test organized among the EU national reference laboratories in 2005 showed that in an alarming number of laboratories the screening methods used were not sufficiently sensitive. The EU Standing Committee on the Food Chain and Animal Health produces a yearly report on the outcome of the national monitoring programmes [63]. For B1 substances the percentage of noncompliant results remains rather stable, around 0.2–0.3%. Considering the shortcomings of the currently applied screening methods, this figure is likely to be a serious underestimation of the actual noncompliance rate. It should be noted, however, that these data also include results of additional control programs for which the result of the microbial test is sufficient to reject the carcass. For some categories of animals these results represent over 50% of the total noncompliant results.

Because different methods are used and the target organs also differ, it is impossible to compare the results between countries. One could argue that the change from prescribing routine or reference methods to an approach in which performance criteria and procedures for the validation of detection methods are established (2002/657/EC) has not made this easier. Moreover, despite the attempt to standardize validation procedures, 2002/657/EC still leaves a lot of room for interpretation and is not considered very suitable for microbial methods.

Chemical methods were generally considered too specific and expensive to be applied as an initial screening. However, liquid chromatography–tandem mass spectrometry methods capable of simultaneous detection of multiple classes of antibiotics are increasingly becoming available [52, 64–66] and may in some situations represent a cost-effective alternative. Their use should certainly be considered feasible within a national reference laboratory, as is already the case in Sweden, for example (K. Granelli, personal communication). However, in particular for those countries that rely upon a monitoring infrastructure including dozens of routine field laboratories, it can be concluded that there is still a strong need for the development and implementation of adequate microbial screening methods, and for more regular proficiency testing to reveal the shortcomings in the currently applied screening methods. It should be realized that these methods form the first line of defense in antibiotic residue monitoring, so it is essential to have accurate screening methods in place.