Introduction

In general, most analytical methods determine the concentration of an analyte in a sample by comparing the signal (e.g. absorbance in the case of ELISA) attributable to the analyte in the sample with the signal from one or more calibrators containing known concentrations of the analyte. Prerequisites for traceability are that the analyte is exactly defined and identical in sample and calibrator. Otherwise, more or less inaccurate results will occur.
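The comparison of a sample signal with calibrator signals can be sketched as follows. This is a minimal illustration only: it assumes a signal that increases strictly with concentration and uses linear interpolation between neighbouring calibrators, whereas commercial test kits usually prescribe their own curve model. The calibrator concentrations and absorbance values are invented for the example.

```python
from bisect import bisect_left


def interpolate_concentration(signal, calibrators):
    """Estimate a concentration from a signal by linear interpolation.

    calibrators: list of (concentration, signal) pairs. The signal must
    increase strictly with concentration (an assumption of this sketch).
    """
    points = sorted(calibrators, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the calibrated range")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Hypothetical calibrators: (concentration in mg/kg, absorbance)
CALIBRATORS = [(0.0, 0.05), (2.5, 0.30), (5.0, 0.55), (10.0, 1.05), (20.0, 1.85)]
sample_concentration = interpolate_concentration(0.80, CALIBRATORS)
```

The result is only as meaningful as the calibrators: if the analyte in the calibrator differs from the analyte in the sample, the interpolated value is traceable to the calibrator, not to the sample's true content.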

Allergens therefore cannot at present be measured in a defined way like other analytes (for example, mycotoxins), because they meet neither of the two prerequisites mentioned above. By nature, an allergen is nearly always a mixture of different proteins, with varying mass fractions of each single protein, in a commodity with varying total protein content, embedded in a complex and poorly characterized matrix [1]. These points are illustrated below using ‘milk’ as an example.

From a technological point of view, the protein fraction of milk consists of two groups of proteins, the caseins and the whey proteins. On the basis of their electrophoretic behaviour, the caseins are divided into four subgroups: αS1-, αS2-, β- and κ-casein. The whey protein fraction consists of β-lactoglobulin, α-lactalbumin, serum albumins and immunoglobulins.

These proteins can be further distinguished into different isoforms arising from genetic variants. The genetic variations result in amino acid exchanges and therefore in different physico-chemical properties of the protein, which become apparent, for example, during cheese clotting [2]. If all caseins and the main whey proteins β-lactoglobulin and α-lactalbumin are taken into account, a total of 49 isoforms was reported in 2004 [3]. During the last two decades, mass spectrometry and capillary zone electrophoresis of proteins have revealed an additional level of complexity arising from different post-translational modifications, for example, lactosylation and phosphorylation [4–6]. Furthermore, seasonal and lactation-dependent changes in the relative quantities of the different proteins occur [7, 8].

Sometimes, these protein mixtures are fractionated by the food industry (e.g. into whey proteins and caseins). In other cases, reactions with other food constituents take place during manufacturing, resulting in different degrees of denaturation due to processing [1].

To summarize this description, milk is a highly complex mixture of proteins in an oil-in-water emulsion.

Biological variability, medical issues and legislation versus reference materials

In principle, the most sophisticated methods currently available could be used to quantify all the different isoforms and post-translational modifications. Even the synthesis of every single isoform is possible in theory. One theoretical way to circumvent the difficulties that milk presents as a measurand would therefore be to choose one highly defined protein as a representative of milk as a whole, or to reconstitute an artificial milk protein mixture from defined proteins. However, milk, as the product of a biological process, will vary in the ratios of its proteins, isoforms and post-translational modifications, as described above. An absolute definition of milk and the selection of a representative protein (with a fixed and known ratio to all other milk proteins) are thus impossible. Furthermore, this Herculean task would undoubtedly fail to meet legal and medical requirements. The purpose of allergen legislation worldwide is to enable consumers to recognize and choose a diet that has been tested for allergens. Labelling regulations and mandatory orders require allergens in food to be declared as the full commodity, for example, ‘milk’. Obviously, a sensitized consumer is not interested in the content of, for example, β-casein isoform A3 in its lactosylated state, and such a label would be of no practical use to consumers or even to medical scientists. Additionally, different sensitized consumers can be allergic to different milk proteins, and they usually do not know which is or are ‘their’ allergenic protein(s). Highly specific information is therefore not reasonable from a medical point of view either.

In summary, the main problem remains that milk is a very complex but poorly defined and variable emulsion. While it is possible in theory to reduce the complexity of milk as an analyte by ‘tracing’ an amount of ‘milk’ to the concentration of a single analyte or of a mixture of defined (and therefore artificially produced) proteins, we do not advise this as the preferred approach.

Despite these shortcomings, two important requirements of the food industry and of sensitized consumers still need to be fulfilled in the near future. On the one hand, the food industry needs methods to prove that its products carry a known and therefore calculable risk for sensitized consumers; on the other hand, consumers need a clear declaration of the remaining risk before consuming these products. While the declaration of the remaining risk will result in threshold levels for most major allergens [9], the allergen content of food is often determined using commercial antibody-based methods. To relate the results of different test kits to each other and to threshold values, an analytical ‘reference point’ is needed. As stated above, a certified reference material in the classical sense described in ISO standards or guidelines cannot be obtained, because allergens (or the commodity) are in no case a single, defined and invariable substance [10]. Nor can the detection of allergens be replaced by the detection of a single representative analyte. Pragmatic solutions are therefore needed. A ‘relative reference material’, comparable to the International Prototype Kilogram adopted in 1889, would undoubtedly be a step forward. Both set a reference point to which all users can relate their results, and it makes no difference what ‘a true 1 kg’ is. This concept is best explained by describing the history of a few standardization efforts of the last decade.

Standardization efforts in the last decade

Peanut: successful standardization by common reference material

There are two examples in which the standardization of antibody-based assay systems was a step forward and resulted in less bias in subsequent proficiency tests. In 2003, the AOAC Research Institute started a collaborative study on peanut measurement using different commercial ELISA test kits. At that time, the peanut butter material NIST SRM 2387, intended for the determination of fatty acids, vitamins, elements, amino acids and aflatoxins, was used by definition as the calibrator by all participating test kit manufacturers. To this day, FAPAS proficiency tests show remarkably low bias between the test kits that were successful in the collaborative study (Table 1), especially in the lower mass fraction range (<20 mg/kg).

Table 1 Results of FAPAS® proficiency testing rounds for the quantification of peanut using different commercial ELISA test kits

Gluten/gliadin: successful standardization by a common reference trinity

The second example is the determination of gluten, the trigger of celiac disease.

In this case, not only is an internationally accepted ‘standard’ material available [11]. By using this ‘standard’ material to provoke celiac disease in patients, a threshold value was also defined and internationally accepted by the Codex Alimentarius in 2008 (CODEX STAN 118-1979) and by the European Union in 2009 [12], together with a reference method (‘Mendez method’) based on a special monoclonal antibody (R5) that detects defined, potentially toxic peptide sequences and a so-called cocktail solution for the extraction of processed samples [13, 14]. The extraction method is crucial: many gluten proteins are incorporated into aggregates and large protein complexes, so harsh conditions are necessary to achieve good extraction. Since gluten is usually detected with monoclonal antibodies, the additional agreement on a specific monoclonal antibody was not only possible but, in the authors’ view, advisable. Compared with polyclonal antibodies, a monoclonal antibody has fewer target sequences or structures; the use of different monoclonal antibodies in different test kits may therefore lead to higher variability than the use of different polyclonal antibodies. The agreement on the monoclonal R5 antibody, which targets highly repetitive sequences in gluten, ensures a very large number of targets for a monoclonal antibody and also reduces the bias due to antibody variability. The R5 monoclonal antibody, the cocktail extraction and the ‘reference material’ form one reference trinity and made the definition of a threshold value possible.

Using well-homogenized materials and a detailed description of the test procedure (including mandatory use of the cocktail solution), an international collaborative test of gluten in different matrices using the R5 antibody was successful in 2011 (Table 2); only a naturally contaminated flour (sample 6) showed higher variability. This showed that the method is fit for purpose and that laboratories all over the world are able to measure with fit-for-purpose accuracy and precision.

Table 2 Performance statistics for the RIDASCREEN® Gliadin (R 7001) with s(r) (repeatability standard deviation), s(R) (reproducibility standard deviation), RSD(r) repeatability relative standard deviation and RSD(R) reproducibility relative standard deviation
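The precision parameters named in the caption of Table 2 can be illustrated with a short calculation. The sketch below assumes a balanced design (the same number of replicates in every laboratory) and the usual one-way analysis of variance used for collaborative studies; outlier tests are omitted, and the data are invented and do not reproduce Table 2.

```python
from statistics import mean, variance


def precision_statistics(results_by_lab):
    """Return the mean, s(r), s(R), RSD(r) and RSD(R) for a balanced design.

    results_by_lab: one list of replicate results per laboratory,
    all of the same length n >= 2.
    """
    n = len(results_by_lab[0])
    if n < 2 or any(len(lab) != n for lab in results_by_lab):
        raise ValueError("balanced design with at least two replicates required")
    lab_means = [mean(lab) for lab in results_by_lab]
    grand_mean = mean(lab_means)
    # Repeatability variance: pooled within-laboratory variance.
    var_r = mean([variance(lab) for lab in results_by_lab])
    # Between-laboratory variance, set to zero if the estimate is negative.
    var_lab = max(variance(lab_means) - var_r / n, 0.0)
    # Reproducibility variance: repeatability plus between-laboratory part.
    var_repro = var_r + var_lab
    s_r, s_repro = var_r ** 0.5, var_repro ** 0.5
    return {
        "mean": grand_mean,
        "s_r": s_r,
        "s_R": s_repro,
        "RSD_r": 100 * s_r / grand_mean,
        "RSD_R": 100 * s_repro / grand_mean,
    }


# Invented duplicate results (mg/kg) from three hypothetical laboratories
example = precision_statistics([[10.0, 12.0], [14.0, 16.0], [9.0, 11.0]])
```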

In contrast, FAPAS rounds show remarkably inconsistent results, possibly due to poor homogeneity and the poorly characterized nature of the spiking material, a problem also recognized for other allergens such as β-lactoglobulin. Since the organizers of these proficiency testing rounds never recommended an extraction procedure or a specific antibody, the outcomes of such proficiency tests are difficult for the participating laboratories to interpret. Furthermore, the inhomogeneity of the samples is higher than the measurement uncertainty. Values in Table 3 were taken from the official report of the FAPAS® proficiency test 2747 (gluten in chocolate cake mix) from the year 2008.

Table 3 Testing for homogeneity of FAPAS material 2747-A (gluten test material)

The calculation of the parameters to estimate the homogeneity was performed according to the harmonized IUPAC protocol from 2006 [15].

Homogeneity is assumed if the sampling variance $s_{\mathrm{sam}}^2$ is smaller than the critical value $c$. In this case, $s_{\mathrm{sam}}^2$ is negative because it is calculated from the difference between $V_S$ (the variance of the sums $S_i$) and $s_{\mathrm{an}}^2$ (the variance of the replicates). Even if one simulates a relative standard deviation (RSD) for ‘fit-for-purpose’ of only 5 % (as shown in the right part of Table 3), the material still seems homogeneous, because the critical value is positive and therefore always larger than a negative $s_{\mathrm{sam}}^2$. It would be helpful in the future to use well-homogenized and well-characterized samples so that the results can be evaluated better.
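The effect described above can be reproduced with a small calculation. The sketch below follows our reading of the duplicate-based homogeneity test in the harmonized protocol [15]; the formulas, the factor 0.3 for the allowable sampling standard deviation and the default factors F1 and F2 (quoted from memory for ten duplicated samples) are assumptions that should be checked against the protocol before use. The data are invented and are not those of Table 3.

```python
from statistics import variance


def homogeneity_test(duplicates, sigma_p, f1=1.88, f2=1.01):
    """Duplicate-based homogeneity check (sketch, see assumptions above).

    duplicates: list of (a, b) duplicate results, one pair per sample.
    sigma_p: target ('fit-for-purpose') standard deviation.
    f1, f2: tabulated factors; defaults assumed valid for ten samples.
    """
    m = len(duplicates)
    sums = [a + b for a, b in duplicates]
    diffs = [a - b for a, b in duplicates]
    # Analytical variance from the differences between duplicates.
    var_an = sum(d * d for d in diffs) / (2 * m)
    # Variance of the sums S_i.
    v_s = variance(sums)
    # Sampling variance; negative when the analytical variance dominates.
    var_sam = (v_s / 2 - var_an) / 2
    # Critical value; it grows with the analytical variance.
    critical = f1 * (0.3 * sigma_p) ** 2 + f2 * var_an
    return {"var_an": var_an, "var_sam": var_sam,
            "critical": critical, "passes": var_sam <= critical}


# Invented duplicates (mg/kg): large scatter within pairs, none between sums
DUPLICATES = [(18.0, 22.0), (22.0, 18.0), (19.0, 21.0), (21.0, 19.0),
              (17.0, 23.0), (23.0, 17.0), (20.0, 20.0), (18.0, 22.0),
              (22.0, 18.0), (20.0, 20.0)]
# Target standard deviation: 5 % of an assumed assigned value of 20 mg/kg
result = homogeneity_test(DUPLICATES, sigma_p=0.05 * 20.0)
```

With these numbers the sampling variance is negative and the test is passed regardless of the chosen target standard deviation, which illustrates why a passed test says little about homogeneity when the analytical precision is poor.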

It seems quite peculiar that a discussion about the status of the ‘Mendez method’ took place some years ago. Because it is a proprietary method, a downgrading was initiated to allow the use of more than one method. If this process continues, the situation will again resemble the years before standardization was decided. At that stage, all assays measured different values, and the suppliers of gluten-free products for celiac patients could not relate a measured gluten value to the likelihood that one of their products would be toxic to celiac patients. In the end, only celiac patients will suffer from a decision not to standardize gluten determination. The standardization has to include the reference material, the monoclonal antibody and the extraction method, since these are crucial parameters for gluten detection, as stated above.

To complicate matters, there is some confusion about the declaration of gluten and the declaration of wheat. In the first case, a product is ‘free’ of gluten if the concentration is below 20 mg/kg. In the second case, a sample containing less than 20 mg/kg of gluten still needs to be declared as containing wheat, because analytical methods can detect these concentrations. This situation confuses celiac patients and, conversely, allergic patients.

Casein, egg-white protein (‘ovalbumin’) and lysozyme in wine: misleading legislation

Fining processes during winemaking are used to clarify must and wine or to remove substances which would otherwise influence the flavour of the wine in a negative way. The fining agents bind to these substances and precipitate together with them. Whereas milk and its constituents are traditionally used for fining white wine, egg is applied to red wine. In the case of egg, the whole raw egg white (consisting of ovalbumin, ovomucoid, ovotransferrin and lysozyme) can be used. A special case is the addition of pure lysozyme, which is added to stop bacterial growth that would otherwise result in undesirable aroma compounds, for example, ethylphenol, or in uncontrolled malolactic fermentation [16].

In recent years, reasonable suspicion has emerged that residues of the proteins used for fining are not removed completely and can remain in the wine, where they could elicit an allergic reaction in sensitized consumers [17, 18].

According to EU Directive 2007/68/EC, all wines labelled after 31 May 2009 must declare whether allergens like egg and milk were used during production and are still detectable in the wine [19]. An extension of the exemption from declaration until 30 June 2012 was decided by the Standing Committee on the Food Chain and Animal Health of the European Union. In the meantime, the International Organisation of Vine and Wine (OIV) set up requirements (mainly limit of detection, limit of quantification and recovery) for antibody-based methods in its resolutions OIV-MA-A315-23 and OIV-COMEX-502-2012. Unfortunately, the resolution names ovalbumin rather than egg-white proteins, which is misleading because commercial fining products contain a dry egg-white powder. This is the first case in which special minimum requirements for an antibody-based method and fractionated allergens (casein from milk, egg-white protein and lysozyme from egg) were laid down by law.

However, not only caseins but also whey proteins are used, and these are not covered at all by the current legislation (Paschke-Kratzin, University of Hamburg, Germany, pers. comm.). For both egg and milk proteins as fining agents, the reference chosen by legislation does not reflect the real situation during production. As shown in the following section, the reference for milk is a general problem for many food products, not only for fining agents in wine.

Milk: different reference bases

A prominent ‘negative’ example of different and therefore unclear reference bases is milk. The underlying problems are the following: the food industry uses not only milk but also fractions of it (e.g. whey powder, caseins and hydrolysates), and legislation demands a declaration of the absence or presence of milk without further specification of the different milk proteins and without a threshold value. It is therefore not surprising that users have difficulties applying commercial test kits and interpreting their results. If they use a specific test kit, for example, one that measures caseins, they will obtain a more or less exact value for caseins only, depending on the quality of the test kit as documented in a validation report. The same is true for β-lactoglobulin, because both kinds of test are based on specific antibodies and defined proteins. The calibrators of these test kits are traceable to a mass of a reasonably well-defined protein (β-lactoglobulin) or of a group of related proteins (caseins). If, on the other hand, they use a ‘total milk assay’, the result will be correct only if the sample contains the unfractionated milk proteins. If the sample contains caseins and β-lactoglobulin in a ratio different from that in milk, or contains only whey proteins, the quantitative value will not be comparable to that of a true milk sample.

Furthermore, the available commercial test kits are all calibrated in different ways, and their results are related to different bases (mass of protein or mass of commodity). Users of these assays will never be able to recalculate the results as ‘milk’, because they do not know the ingredients of an unknown sample, and even if they do, the calculation of, for example, casein to ‘milk’ remains a theoretical possibility only. A comparison of different test kits will therefore always tend to produce disparate results, as happened during the collaborative test funded by the Food Standards Agency in the UK in 2011 (Johnson, University of Manchester, UK, to be published in J. Agric. Food Chem.). Without conversion factors for the different test kits to calculate comparable results, these collaborative tests would not have yielded any useful information. We now understand that we need (1) to characterize the spiking material for collaborative tests and (2) to standardize the results to a common basis.

Allergen ‘reference material’?

The International Vocabulary of Metrology defines a ‘reference material’ as a ‘material, sufficiently homogeneous and stable with reference to specified properties, which has been established to be fit for its intended use in measurement or in examination of nominal properties’ and further describes in a note that ‘the specifications of a reference material should include its material traceability, indicating its origin and processing’. These definitions should be discussed for allergens and may need to be adapted to avoid misinterpretations regarding the ‘property’ of the material.

The main idea is to establish a reasonable, pragmatic compromise to which all tests should be standardized. As a step in the right direction, a skimmed milk powder from the food industry could be chosen as a possible material with a known protein content (determined by an accepted method such as Kjeldahl or Dumas). The mass fractions of the main proteins (caseins, β-lactoglobulin and α-lactalbumin) and their degree of lactosylation could be characterized further by LC–MS/MS methods. Since caseins, β-lactoglobulin and α-lactalbumin account for more than 90 % of the milk proteins and are the major milk allergens, this characterization should be sufficient. The material should be obtained from as many different sources of milk as possible in order to take regional differences into account and to ensure a high degree of reproducibility (complete reproducibility will never be possible for a substance or a mixture which is the product of a biological ‘production’ process). With this kind of material, all assays that detect ‘milk’ in its natural composition of caseins and whey proteins will be standardized, provided that the sample contains a comparable milk protein composition. The result will be obtained in mg/kg milk proteins. If only caseins or only whey proteins are present in the sample, the same material can be used to standardize a casein- or whey protein-specific test. The result obtained in mg/kg caseins can be recalculated to mg/kg milk proteins using standard conversion factors which are known from the characterization of the material. Most problematic are samples which contain caseins and whey proteins in an unnatural ratio to each other. In this case, the sample could be measured in specific tests for caseins and β-lactoglobulin. The results are summed and multiplied by 1.1 (caseins and β-lactoglobulin account for approximately 90 % of milk proteins, and 1/0.9 ≈ 1.1), giving the result in mg/kg milk proteins.
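The recalculations described above can be sketched as follows. The function names are hypothetical, the 90 % share is the approximate figure given in the text, and the casein share of 0.80 used in the example is an illustrative assumption; in practice it would be taken from the characterization of the reference material.

```python
def milk_protein_from_specific_tests(casein_mg_per_kg, blg_mg_per_kg,
                                     covered_fraction=0.90):
    """Sum casein and beta-lactoglobulin results and scale to milk protein.

    covered_fraction: approximate share of caseins plus beta-lactoglobulin
    in total milk protein (about 90 % according to the text). Dividing by
    0.90 corresponds to multiplying by roughly 1.1.
    """
    return (casein_mg_per_kg + blg_mg_per_kg) / covered_fraction


def milk_protein_from_casein(casein_mg_per_kg, casein_fraction):
    """Recalculate a casein result to milk protein.

    casein_fraction: mass fraction of caseins in the milk protein of the
    reference material; no default, since it must come from the
    characterization of that material.
    """
    if not 0 < casein_fraction <= 1:
        raise ValueError("casein_fraction must be in (0, 1]")
    return casein_mg_per_kg / casein_fraction


# Invented results: 8.0 mg/kg caseins and 1.0 mg/kg beta-lactoglobulin
total_from_both = milk_protein_from_specific_tests(8.0, 1.0)
# Illustrative assumption: caseins make up 80 % of the milk protein
total_from_casein = milk_protein_from_casein(8.0, 0.80)
```

Both routes give a value in mg/kg milk proteins, but the second one is valid only if the sample contains the proteins in the natural ratio assumed by the conversion factor.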

Some of these ideas were already established by the introduction of the NIST reference material 8445. The material is described as a spray-dried whole egg for allergen detection. It is primarily intended for use in evaluating test kits for the determination of the presence of allergenic egg proteins. The determination of the total protein by the Dumas method revealed a mass fraction of (48 ± 1) %.
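A known protein mass fraction makes it possible to convert results between the two reporting bases mentioned earlier (mass of commodity and mass of protein). The sketch below uses the mass fraction of 48 % stated for this material; the function names and the example result are hypothetical, and the stated uncertainty of ±1 % is ignored for simplicity.

```python
def commodity_to_protein(commodity_mg_per_kg, protein_mass_fraction):
    """Convert a result on a commodity basis to a protein basis."""
    return commodity_mg_per_kg * protein_mass_fraction


def protein_to_commodity(protein_mg_per_kg, protein_mass_fraction):
    """Convert a result on a protein basis to a commodity basis."""
    if protein_mass_fraction <= 0:
        raise ValueError("protein_mass_fraction must be positive")
    return protein_mg_per_kg / protein_mass_fraction


EGG_POWDER_PROTEIN_FRACTION = 0.48  # Dumas value quoted in the text

# Invented result: 10 mg whole egg powder per kg of sample
egg_protein = commodity_to_protein(10.0, EGG_POWDER_PROTEIN_FRACTION)
egg_powder = protein_to_commodity(egg_protein, EGG_POWDER_PROTEIN_FRACTION)
```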

Assays which are related to the same material will give results which are closer together. A common material reduces the variability of results between the various test systems of one method of analysis, and it could also serve as the basis for comparing results between different methods.

Beyond allergen ‘reference materials’

The most complicated cases are hydrolysed samples, for example, hydrolysed caseins. Hydrolysed caseins are present, for example, in cheese because chymosin (rennet) and/or bacteria and/or yeast are added at a very early step of cheese production. Chymosin and/or the proteases of these organisms digest the milk proteins to an unknown extent, and every fragment could contain one or more epitopes. In this case, the variation already caused by the biological ‘production’ process of milk (different protein composition, different isoforms and post-translational modifications) is overlaid by the variation in the degree of hydrolysis. Besides these naturally occurring hydrolysates, the food industry uses ‘technical’ protein hydrolysates with different degrees of hydrolysis to benefit from their technological properties, for example, foaming or emulsifying. The similarity between calibrator and sample will therefore always be very limited, so that every chosen calibrator is a compromise and a recalculation to ‘milk’ is fraught with uncertainty in every case.

An attempt to measure hydrolysed proteins has been made for hydrolysed gluten proteins, which occur, for example, in beer due to digestion by yeast proteases: gluten proteins were digested in vitro with trypsin and pepsin and used as calibrators. This simulates the digestion by yeast proteases during the brewing process and by gastrointestinal proteases in consumers sensitized to gluten proteins. Again, this is not a perfect calibrator, but it is a reasonable relative reference material. Additionally, this material enables recalculation to gluten and thus fulfils the demand of the legislative authorities to express the result as mg gluten/kg.

However, the case of gluten proteins cannot (yet) be applied to allergens in general. First, the legal situation for gluten proteins is the reverse of that for allergens, and much clearer. What is regulated is not the undefined presence or absence of gluten proteins but a gluten threshold of 20 mg/kg. A recalculation to the commodity cereals is therefore not necessary, and the methods can be, or already are, adapted to this value. Immunoassays for allergen detection, in contrast, normally aim to measure as low as possible to ensure that no sensitized consumer will be harmed. Second, gluten proteins are fairly well-defined proteins, and a non-hydrolysed relative reference material already exists. Third, the antibody and the extraction method to be used are also standardized. Nevertheless, this example shows a possible way of dealing with hydrolysed proteins.

In any case, the detection of hydrolysed proteins will be the most demanding discipline of standardization and can only be its final stage, after the definition of non-hydrolysed materials has been completed.

Conclusion

The term ‘reference material’ needs to be discussed regarding its suitability for allergens. The main subjects of this discussion would be the ‘property’ of a reference material, the determination of its ‘stability’ and the possibility of reproducing this kind of material. Another point of discussion should be broad applicability, so that a material is available not only for antibody-based methods but also for DNA-based methods and LC–MS/MS methods. A decision is necessary to standardize the different allergen determination methods. As shown in the case of the peanut ‘reference’ material, this might already be sufficient to reach an acceptable level of detection, ensuring suitable information for allergic consumers.

However, the example of gluten shows that the detection of some proteins might need additional standardization. Detection by monoclonal antibodies, as is common for gluten, has some advantages from a practical point of view, but tests using different monoclonal antibodies will target different single epitopes and lead to very different results. The standardization of gluten detection must therefore be extended to include the monoclonal antibody to be used. Furthermore, since extraction is crucial in the case of gluten proteins, the extraction method must be included as well. The current state of gluten detection is close to optimal, and as long as no independent data show that there exists, for example, an objectively better monoclonal antibody or a better extraction method, it should not be changed. In any case, a move towards less defined detection, for example, by downgrading the R5 Mendez method without comparable compensation, would only mean a change for the worse. Collaborative trials should use only samples with characterized spikes, or naturally contaminated samples with the limitation that the allergen is not characterized.

The examples of allergens in wine and the detection of milk show that legal requirements must be clear before any standardization. Labelling requirements must relate to the allergens which are expected in foods, such as egg-white proteins in wine rather than ovalbumin. If fractionation of allergenic proteins is common, a regulation must be found on how to handle such samples, for example, caseins in sausages.

Once legal requirements have been clarified, suitable allergen ‘reference materials’ can be defined, and all test kits can use this material directly as calibrator or can be calibrated against it. Validation reports for tests must include this reference material. Only once this stage is completed can legislators define thresholds, which have to be correlated directly to the allergen ‘reference material’. These thresholds must also include information on how to deal with fractionated allergens. Either after or prior to threshold definition, the detection of hydrolysed proteins can be standardized.

In order to achieve these goals, an international and widely accepted group of experts is necessary. This group also has to decide which food matrices are necessary for reference materials, since not all kinds of matrices can be covered in the first years.

Finally, it should be noted that the problem of allergen ‘reference materials’ is not limited to a specific detection method but affects all currently used methods, since all of these methods measure the concentration in comparison with a calibrator.