Introduction

Element air pollution

The longest-standing attention to widespread air pollution stems from concern over its ecological effects. The environmental impact of atmospheric deposition has been studied for more than a century: probably the first effect described on a scientific basis was the decline of epiphytic lichens in areas with high levels of atmospheric pollution. Ever since Nylander’s (1886) [1] classical report on the epiphytic lichens of Paris and its surroundings, extensive studies have been performed in many areas [2–4]. Although these research efforts have greatly improved scientific understanding of the abiotic effects of atmospheric deposition, especially in the fields of atmospheric chemistry [5], soil chemistry [6] and water chemistry [7], many of the biotic effects are still poorly understood, particularly in the terrestrial environment. Although many changes in vegetation are now generally attributed to atmospheric deposition, dose–effect relationships are usually poorly known [8]. General information on the monitoring, behaviour and impact of terrestrial trace element pollution can be found in various reviews [9–16].

In addition to the ongoing concern for ecosystem performance as such, attention has become increasingly focused on human health. This may be ascribed to the generally recognized impact of ecosystem performance on human well-being; furthermore, health care has progressively developed towards approaches that include nutrition and the social and environmental surroundings [17]. As a result, epidemiological studies on air pollution, mortality rates and respiratory health effects were set up throughout the world, initially mostly on air particulates, ozone, acid rain, NOx and sulphur oxides [18–21], but today the attention also includes contaminants such as heavy metals, polycyclic aromatic compounds and halogenated organics [22–28], all of which differ widely with respect to their environmental and health impact properties [29–35].

This increasing body of epidemiological data consistently demonstrates the adverse effect of airborne particulate matter on human health [29, 36, 37]. In the early 1990s, epidemiological findings started to be accepted even though no plausible mechanisms could be identified [38]. Nowadays the key questions still concern both particle size and chemical composition [39]. These two issues are largely addressed in aerosol studies [40–44], and in epidemiological studies health outcomes are increasingly associated with these two particle characteristics [29, 39, 45].

Biomonitoring of element air pollution: general arguments and analytical techniques

Emission

Concern about atmospheric pollutants underlies the efforts to establish control programmes in many countries. Policies may be both source-oriented (e.g. technology-based emission management) and effect-oriented (e.g. risk assessment). In most countries, various regulatory instruments are combined into a co-ordinated control programme. In practice, controlling (anthropogenic) air pollutants is a very complex problem: sources and emissions have to be identified, analytical methods have to be evaluated, risks have to be assessed, critical emissions have to be controlled, and economic aspects have to be integrated [46–48].

The necessary information on air pollutants can be obtained by dispersion modeling (source orientation, a priori known emission sources) and by field measurements of the emissions (receptor/effect orientation) [49]. In many countries, dispersion modeling has gained more and more interest, partly for economic reasons: technical field measurements require equipment and manpower and are generally associated with high costs [50, 51]. Emission measurements, however, should be regarded as necessary and indispensable: they may be used to validate dispersion models, and the data obtained may indicate the presence of sources which are not known or registered [48]. Emission measurements require long-term sampling at large numbers of sampling sites. Such measurements using technical equipment have been few, mainly because of the high costs and the lack of sufficiently sensitive and inexpensive techniques which permit the simultaneous measurement of many air contaminants [52]. It is here that biomonitoring comes in.

Biomonitoring, in a general sense, may be defined as the use of bio-organisms or biomaterials to obtain (quantitative) information on certain characteristics of the biosphere. The relevant information in biomonitoring (e.g. using plants or animals) is commonly deduced either from changes in the behaviour of the monitor organism (impact: species composition and/or richness, physiological and/or ecological performance, morphology) or from the concentrations of specific substances in the monitor tissues. With proper selection of organisms, the general advantage of the biomonitoring approach lies primarily in the permanent and common occurrence of the organism in the field, even in remote areas, the ease of sampling, and the absence of a need for expensive technical equipment.

Mosses and lichens are generally assumed to be among the most suitable biomonitors of atmospheric pollution by heavy metals and other elements [53–55]. Both obtain their nutrients mainly from atmospheric deposition (both wet and dry). Because they have no root system, contributions from the soil are assumed to be negligible, except for wind-blown soil material. Furthermore, mosses and lichens have a high capacity to retain ions, particularly those of the heavy metals. Since the pioneering work of Rühling and Tyler [52, 56], mosses have been frequently used in large-scale monitoring surveys in Scandinavia [57–61]. Since 1977, these surveys have been performed regularly (1977, 1985, 1990), eventually leading to various time-trend reports [62–64]. The 1990 survey was extended to other North-European countries, while similar studies were performed during 1990–2005 in several countries in central and southern Europe, among which was The Netherlands. A joint account of the results of these surveys is given by Rühling et al. [59, 65]; more recent reports were compiled by Berg [66], Buse [64] and Harmens [67].

Single elements or multielements?

The selection of elements to be analyzed and/or used in data interpretation should be closely linked to the objectives of the study [68]. A survey may be dedicated to a single element or to only a limited number of elements, but it may also be set up to gather information about sources or effects from a broader point of view. The latter set-up may be regarded as the most effective because, first, the analysis of a large number of elements generally widens the options for interpretation, permits a more reliable recognition of source fingerprints, and may reveal effects which are not anticipated a priori [69, 70], while, second, the resources needed for the field work hardly depend on the number of elements of interest. This indicates that the selection of a large number of elements follows naturally from many surveys’ goals: a clear and unequivocal interpretation of data on specific elements may largely depend on the simultaneous availability of data on various other elements.

The above can be illustrated by data taken from several multi-element air pollution biomonitor surveys carried out at IRI, Delft, The Netherlands. These surveys commonly include a number of soil-associated elements (e.g. Al, Fe, Sc, Cr, Th) and several rare earth elements [71, 72]. In the Factor Analysis interpretation of the data on all 20 selected elements, the soil indicator elements serve to extract a “soil factor” [73], from which, for all individual elements, site-specific soil-associated fractions of the total concentrations can be calculated [71, 72].
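
As an aside, a simpler and widely used relative of this idea is the crustal enrichment-factor calculation. The sketch below is illustrative only, not the Factor Analysis-based procedure of refs [71, 72]; the crustal and moss concentration values are hypothetical examples. It shows how a soil reference element such as Sc can be used to estimate the soil-associated fraction of another element.

```python
# Illustrative sketch (not the FA-based method of refs [71, 72]):
# estimate the soil-derived fraction of an element in a moss sample
# using a crustal enrichment factor with Sc as the soil reference element.
# All concentrations below are hypothetical example values (mg/kg).

CRUST = {"Sc": 11.0, "As": 4.8, "Zn": 67.0}   # approximate upper-crust levels

def soil_fraction(sample_conc, element, ref="Sc"):
    """Fraction of `element` attributable to soil/crustal dust, assuming the
    reference element in the sample is exclusively soil-derived."""
    expected_from_soil = sample_conc[ref] * CRUST[element] / CRUST[ref]
    return min(1.0, expected_from_soil / sample_conc[element])

moss = {"Sc": 0.15, "As": 0.40, "Zn": 35.0}   # hypothetical moss concentrations
for el in ("As", "Zn"):
    print(f"{el}: ~{100 * soil_fraction(moss, el):.0f}% soil-associated")
```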

The selection of a relatively large number of elements occurring in high-temperature processes (e.g. V, Sb, Se, As) helped to discriminate between processes such as waste incineration, coal combustion and other high-temperature sources [71, 72], while the simultaneous analysis of elements like Br, I and Na suggested a sea-aerosol-associated, long-distance origin of As, accounting for about 25% of its total occurrence in mosses in a 1992 survey [74].

One of the clearest and simplest examples can be given for the Zn smelting industries in the south-eastern part (Kempen area) of The Netherlands: here, Cd occurs as a by-product in a zinc ore at a characteristic relative abundance of roughly 1 part Cd to 200 parts Zn. The analysis of a large number of elements, combined with the application of Factor Analysis, yielded a well-defined fingerprint for these Zn smelting industries: they could be characterized by a Zn/Cd factor, in which Cd and Zn were obtained at a relative abundance of 1 to 212 [75].

These data suggest that a careful selection of elements facilitates the interpretation of the results for each individual element; the larger the number of relevant elements involved in the eventual analysis, the more detailed the information that may be present in the data set. Thus, the principal choice may be multi-elemental analysis. The problem then is how to extract the wealth of information from a set which may contain thousands of analytical data. A fast and functional approach may be found in the application of Factor Analysis techniques [71, 72].
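
As a minimal illustration of this step (synthetic data only; the element list, the factor count and the scaling choices are assumptions, not the actual protocol of refs [71, 72]), a (sites × elements) concentration matrix can be reduced to a few factors whose loadings read as candidate source profiles, e.g. a “soil factor” loading on Al, Fe, Sc and Th.

```python
# Minimal factor-analysis sketch: reduce a (sites x elements) matrix to two factors
# and inspect which elements load on each factor. Synthetic data; illustrative only.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
elements = ["Al", "Fe", "Sc", "Th", "Zn", "Cd", "V", "Sb"]

# Synthetic survey: 100 sites, two hidden sources (soil dust and a smelter).
soil = rng.lognormal(0.0, 0.5, 100)
smelter = rng.lognormal(0.0, 0.8, 100)
profiles = np.array([
    [1.0, 0.9, 0.8, 0.7, 0.1, 0.0, 0.2, 0.1],   # soil-like profile
    [0.0, 0.1, 0.0, 0.0, 1.0, 0.9, 0.3, 0.4],   # smelter-like profile
])
X = np.outer(soil, profiles[0]) + np.outer(smelter, profiles[1])
X += rng.normal(0, 0.05, X.shape)                # analytical noise

# Log-transform and standardize per element, then extract two factors.
Z = StandardScaler().fit_transform(np.log(np.clip(X, 1e-6, None)))
fa = FactorAnalysis(n_components=2, random_state=0).fit(Z)

for i, load in enumerate(fa.components_):
    top = sorted(zip(elements, load), key=lambda t: abs(t[1]), reverse=True)[:4]
    print(f"Factor {i + 1}:", ", ".join(f"{el} ({w:+.2f})" for el, w in top))
```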

Multi-element biomonitoring of element air pollution: element analysis

Although no sharp lines can be drawn between nuclear and non-nuclear analytical techniques (NATs and non-NATs) [76], the principle of a nuclear technique is that the analytical information on element and concentration originates from the nucleus rather than from the electron shell of the atom. As such, chemical binding, chemical compound or matrix composition have no essential influence on the accuracy of the results [70, 77]. It should be noted that although techniques such as Particle/Proton Induced X-ray Emission (PIXE) and X-Ray Fluorescence spectrometry (XRF) basically derive their signals from the behaviour of inner orbital electrons rather than the nucleus itself, they are often counted as nuclear techniques, primarily because inner orbital electrons contribute little to the atom’s chemical behaviour (but see also [77] for NMR and Mössbauer techniques).

As said, a key feature of most NATs is the principal absence of effects of the element’s chemical form; the absence of sample digestion procedures is a further advantage [77]. Apart from being an analytical advantage, the first property should be regarded as a drawback where information on a particular form of an element is required (e.g. methylmercury rather than total mercury), and the second only applies to nuclear signals of high penetrative power. Nevertheless, the intrinsic accuracy of most NATs means that they are in regular use in the validation of analytical methods, the development of new reference materials, and the establishment of baseline elemental information in a variety of (health-related) environmental issues.

Nuclear analytical techniques such as Instrumental Neutron Activation Analysis (γ-ray-oriented INAA, [78]), PIXE and XRF are all multielemental and non-destructive. Advantages of the first characteristic are discussed in the previous section; the second distinguishes them from a large number of other widely used (non-nuclear) analytical techniques, such as atomic absorption spectroscopy (AAS), Inductively Coupled Plasma spectrometry (ICP) and mass spectrometry (MS). It also means that INAA, PIXE and XRF are in principle well suited for the routine analysis of the solid samples often encountered in environmental research: it is not necessary to bring the sample into solution, with all the associated problems ranging from incomplete digestion (elemental losses) to impurities in the applied chemicals (elemental contamination) [70, 76, and see section on Problems in digestion and the down-scaling of sample masses in analytical protocols]. Clearly, the absence of effects of chemical form and the non-destructive character of the nuclear techniques pay off particularly in (environmental) baseline surveys comprising large numbers of samples and/or strongly varying sample matrices (e.g. ecosystem research, including samples of soils, plants and animal tissues). It should be noted, however, that nuclear techniques may also be applied successfully in mechanistic (process-dynamic) environmental studies, owing to the isotope-specific responses in nuclear analytical approaches [77].

Multielemental analysis by INAA and PIXE

INAA stands out for its non-destructiveness, multi-element capability and adequate limits of detection for the majority of elements of environmental interest. The absence of digestion steps and the independence of chemical form mean that a high level of accuracy can be obtained [77, 78]. The intrinsic characteristics of the technique imply that this high accuracy can be maintained over a large dynamic range, from ppb to % level [70].

Gamma radiation is the radiation of choice in neutron activation analysis, since it is mono-energetic and in most cases characteristic of the emitting nucleus. The other advantage of γ-radiation is its high penetrating power, so that it is hardly absorbed in the radioactive material itself [78]. INAA is routinely used for samples in the mg to g mass range [79]; only when the sample matrix has a high overall atomic number Z, and/or when large samples are analyzed, may problems arise due to self-attenuation of the induced γ-radiation and, even more exceptionally, self-shielding of the neutrons during irradiation. These phenomena are well understood, and proper corrections can be applied, making INAA applicable over a mg to kg sample mass range [78, 80, 81, and see sections on Reducing sample numbers in analytical processing and Analytical instrumentation for large volume samples].
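
As an illustration of the kind of correction involved, the sketch below uses the standard average-transmission expression for a uniform slab-shaped sample; the geometry is simplified, and the attenuation coefficient and thickness are hypothetical example values, not parameters taken from refs [78, 80, 81].

```python
# Sketch of a gamma self-attenuation correction for a uniform slab-shaped sample.
# For emission distributed evenly through a slab of thickness d and linear attenuation
# coefficient mu, the average transmission towards the detector (simplified,
# perpendicular geometry) is (1 - exp(-mu*d)) / (mu*d).
import math

def self_attenuation_factor(mu_cm_inv, thickness_cm):
    """Average fraction of gamma rays escaping a uniform slab sample."""
    x = mu_cm_inv * thickness_cm
    return 1.0 if x == 0 else (1.0 - math.exp(-x)) / x

# Hypothetical example: mu = 0.2 cm^-1 at the gamma energy of interest, 5 cm thick sample.
f = self_attenuation_factor(0.2, 5.0)
print(f"transmission ~{f:.2f}; corrected count rate = measured / {f:.2f}")
```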

Although INAA may be calibrated for a large number of elements (up to 70 elements calibrated at IRI, Delft, The Netherlands, see [70]), INAA is not capable of determining low-Z elements of environmental interest, such as Be, B and Li, or high-Z elements such as Pb, Bi and Tl. Here, complementary techniques should be applied.

PIXE and INAA overlap and partly complement each other with respect to elements. The main difference between INAA and PIXE is that the X-ray energies associated with PIXE are much smaller than the energies of the emitted γ-rays used in routine INAA. These differences translate into differences in self-absorption (absorption of the radiation within the sample), which means that PIXE must in practice be regarded as a “thin-layer” (surface-related) technique, with energy-related depth profiles that also depend on the general sample matrix characteristics. This means that for all sample materials, including environmental ones such as bio-organisms, biomonitor materials and air particulate matter, key steps in PIXE analyses are sample homogenization and pelletization: eventual elemental determinations are generally carried out on the basis of very small (lower mg range) sample masses (see section on Consequences of the use of more than a single analytical technique, or more than a single analytical laboratory for an INAA vs. PIXE comparison).

Both INAA and PIXE are intrinsically accurate, but for PIXE the small X-ray energies involved, taken together with the small sample masses in actual analysis, imply a strong dependence on both sample matrix characteristics and bulk sample homogeneity. As a result, particular difficulties may be encountered for PIXE in quantitative calibration procedures. An associated problem is that existing Certified Reference Materials (CRMs) are generally certified for much larger sample masses: they are mostly inadequate for quality control in PIXE [82]. Therefore, in both sample and CRM preparations, much effort is devoted to increasing the number of sample particles in analyzed sub-samples by grinding to smaller individual particle sizes and avoiding so-called “nuggets” [82].

Although high-energy-resolution detectors are in use in both PIXE and INAA, peak overlaps are encountered in both techniques. The resulting doublets or multiplets can often be resolved mathematically without too many difficulties [78]; in both techniques, however, irresolvable multiplets remain, for which parallel information should be available (e.g. Si–Na–Al–Mg and P–Al in INAA, or the As (Kα)–Pb (Lα) interference in PIXE) [83]. Here, too, both techniques may complement each other in resolving overlaps [84].

Epidemiology and biomonitoring

Epidemiology

Epidemiology is a medical science which studies the frequency of diseases, cure, mortality, etc. [16]: it is the dynamic study of the determinants, occurrence, distribution and control of health and disease in a population. It studies patterns of disease occurrence and tries to explain these occurrences in terms of etiological, diagnostic and therapeutic issues; in short, it tries to relate disease frequencies to the risk factors (determinants) which affect these frequencies. The basic approach in epidemiology is comparison. This comparison can be observational (descriptive, non-experimental; disease in terms of time, place and person) or experimental, the latter mostly in experiments in which one deliberately changes a single factor (the cause) and studies the predicted alteration in the effect, not attributable to chance. In human studies, however, experimental manipulation of determinants often raises ethical difficulties and is therefore hardly ever fully possible. Because of this, the obtained correlation between factor and frequency is often a topic of discussion: the question addressed is always the question of causality.

Of importance in causality issues is the so-called “confounder”. Confounding is the mixing of the effect of a certain determinant with that of another. Confounding determinants mostly correlate with both the determinant of interest and the effect. In addressing confounders, the researcher should have a priori knowledge, should measure all variables, and should match groups in experiments; yet confounding remains one of the most difficult issues in epidemiology.

As an example, consider smoking and lung cancer, and consider also the confounding determinant “carrying of matches”. Without setting aside the “matches” confounder a priori in a group of smokers, the causal relationship between smoking and lung cancer incidence may remain unresolved.

Another example is the study by Cislaghi and Nimis [85]: they reported a strong inverse relationship between a regional lichen biodiversity index and lung cancer mortality. In effect, they reported the relationship between a confounder and the event. The underlying cause was possibly the atmospheric SO2 concentration, which may have caused both the decline in lichen biodiversity and the rise in lung cancer mortality. (It should be noted here, however, that SO2 may be confounding for lung cancer mortality and at the same time be the actual cause of the decline in the lichen biodiversity index.)

Analytical epidemiology can be performed in (1) (retrospective) case-control studies, in which those affected with a disease are compared to a control group similar in all known ways except for the disease itself: the present effect could for example be hepatitis, and the cause in the past may be a certain blood product; or (2) (prospective) cohort studies, in which those exposed and those not exposed to a certain cause are followed to see the effects of that cause: for example, recipients of a blood product could be followed to see the possible future occurrence of hepatitis.

Ecological epidemiology is a branch of epidemiology which views disease as a result of (ecological) interactions between populations. The bias that may occur (ecological fallacy) is the observation of an association between variables at that higher, aggregate population level which does not necessarily represent an association that exists at the individual level.

Whatever the approach, epidemiology relates a dose to an effect, and reliable estimation of the dose is as important as reliable estimation of the effect. This means that the analytical uncertainty should not add appreciably to the sampling uncertainty [68], and, owing to the very nature of the epidemiological approaches, the studies should allow for (analytical) assessments that are feasible for large numbers of samples (throughput, costs, etc.) and for extended areas and/or periods of time. Here, element assessment comprises sampling, storage, milling and homogenization, and analysis.

Biomonitoring

Larger-scale biomonitoring surveys of element air pollution often cover large geographical areas [57–67], which implies large numbers of samples combined with extensive elemental assessment and analysis. Considered in an epidemiological context [86, 87], these surveys should be seen as “cause” terms in an ecological epidemiological study: the outcomes should be related to disease occurrences with continuous consideration of the possibility of ecological fallacies. The outcomes, however, may definitely serve to tune further epidemiological study, and may form a large-scale frame to direct more refined experimental epidemiology.

Current literature

The present paper reviews literature relevant to the issues raised in the foregoing paragraphs. It presents biomonitoring of element air pollution, both in general and in the health context, discusses (analytical) problems and possibilities, and tries to identify the points which need to be addressed or controlled in future larger-scale biomonitoring in a health-related (epidemiological) context. It should be noted that biomonitors are here regarded as reflectors of atmospheric deposition or atmospheric element concentrations rather than as impact sensors in a biological or ecological context. Furthermore, although most issues are discussed here in a spatial context, the very same reasoning applies to larger-scale biomonitoring in time.

The dose–effect relationship in epidemiology

In general terms, studying the possible association of a dose with an effect calls for the set-up of a dose–effect relationship: a regression analysis is carried out to judge the validity of the association. In epidemiology, associations are mostly expressed as relative risk (RR), or risk ratio, which defines the risk of an event (disease, mortality, …) relative to an exposure. RR is the ratio of the probability of the event occurring in the exposed group to that in a non-exposed group. For example, if the probability of developing lung cancer among smokers is 20% and among non-smokers 1%, then the relative risk of cancer associated with smoking would be 20: smokers would be twenty times as likely as non-smokers to develop lung cancer.

The term odds ratio (OR) is also commonly used in epidemiology; it is the ratio of the odds of an event occurring in one group to the odds of it occurring in another group. For example, suppose that in a sample of 100 men, 90 have drunk wine in the previous week, while in a sample of 100 women only 20 have drunk wine in the same period. The odds of a man drinking wine are 90 to 10, or 9:1, while the odds of a woman drinking wine are only 20 to 80, or 1:4, or 0.25:1. Then 9/0.25 = 36, so the odds ratio is 36. This example shows how odds ratios can sometimes seem to overstate relative positions: in the example, men are 90/20 = 4.5 times as likely to have drunk wine as women, but have 36 times the odds.
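
The arithmetic behind the two measures can be summarized in a few lines, using the numbers from the examples above:

```python
# The arithmetic behind RR and OR, with the numbers from the text.
def relative_risk(p_exposed, p_unexposed):
    """RR: probability of the event in the exposed group over that in the unexposed group."""
    return p_exposed / p_unexposed

def odds_ratio(p_group1, p_group2):
    """OR: odds of the event in group 1 over the odds in group 2."""
    odds1 = p_group1 / (1 - p_group1)
    odds2 = p_group2 / (1 - p_group2)
    return odds1 / odds2

print(relative_risk(0.20, 0.01))   # smokers vs non-smokers: RR = 20
print(odds_ratio(0.90, 0.20))      # wine example: 9 / 0.25 = 36
print(relative_risk(0.90, 0.20))   # same data as a risk ratio: 4.5
```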

In the Dutch NLCS-AIR cohort study, running over the 1987–1996 period, in which 984,589 persons were followed, black smoke and PM2.5 were taken as the cause and various mortalities were considered as the event [88]: mortality due to natural causes and due to cardiovascular, respiratory, lung and other diseases respectively showed RRs to black smoke of 1.05, 1.04, 1.22, 1.03 and 1.04, and to PM2.5 of 1.06, 1.04, 1.07, 1.06 and 1.08. These data show the relatively small RRs, and indicate that the analytical equipment (black smoke, PM2.5) must have been maintained at a very high and steady level of quality and performance over the full 10-year study period to allow reliable determination of the RRs.

In another study, on toenail mercury (Hg) levels (the expression of the cause) and the risk of myocardial infarction (the event) [89], persons were followed in 10 European cities from Spain to Russia (to maximize the range of the cause). The toenail Hg levels were strongly skewed (with only few data above the 0.50 μg g−1 level), the odds ratios for the majority of the people involved were <1.25, and the mean toenail Hg levels in patients were <20% higher than in controls. This again underlines the need to maintain a very high level of analytical performance.

Consideration of confounders in epidemiology implies that in many cases a variety of possible causes must be brought into the data-processing protocols: each of the considered “doses” may explain a certain fraction of the occurring event, and the validity of the cause “of interest” must still be established in the multivariate regression analysis.
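
A minimal sketch of this adjustment idea is given below, with synthetic data; the cited studies use far more elaborate (e.g. survival or Poisson) models, so this only shows how a crude (confounded) estimate differs from an adjusted one.

```python
# Sketch of the multivariable regression idea: the outcome is regressed on the exposure
# of interest together with a candidate confounder, so the exposure coefficient is
# estimated "adjusted for" the other term. Synthetic data; illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n = 500
exposure   = rng.normal(10, 2, n)                       # e.g. a biomonitor-derived element level
confounder = 0.5 * exposure + rng.normal(0, 1, n)       # correlates with the exposure
outcome    = 0.30 * exposure + 1.0 * confounder + rng.normal(0, 1, n)

def ols_coefs(y, *covariates):
    X = np.column_stack([np.ones_like(y)] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [intercept, coefficients...]

crude    = ols_coefs(outcome, exposure)[1]              # confounded estimate (~0.80)
adjusted = ols_coefs(outcome, exposure, confounder)[1]  # adjusted estimate   (~0.30)
print(f"crude effect: {crude:.2f}, adjusted effect: {adjusted:.2f}")
```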

Some examples of data robustness

From the work of Sarmento et al. [90] two examples may be taken to discuss data robustness (quality) and to judge possible effects on eventual results in terms of dose–effect relationships. A first example is a time series of ozone measurements in Lisbon, Portugal, ranging from 1999 to 2004 (Fig. 1, unpublished results). Over the whole period some periodicity can be seen, reflecting the O3 fluctuations throughout the year, but what is remarkable is a gap in early 2002 and the sudden change in average level from that moment onwards. The explanation of the gap is a breakdown of the equipment. After repair, the subsequent rise in observed O3 levels may be a true change but may also be a shift in the equipment’s settings or calibration: the question here is whether or not shifts like this may lead to false negatives or false positives in the dose–effect associations.

Fig. 1 Ozone measurements in Lisbon, Portugal, during the 1999–2004 period (Sarmento, S.F.M. et al., unpublished results)

Another example is the effect of clustered extreme values on the slope of dose–effect curves [90, 91]. Here, in a simulated data set of 200 observations, 0–7 clustered extreme values were introduced, set at values of no more than 2.5× the data average. Slopes already started to change at the introduction of 1–2 extremes, which suggests that data quality is of paramount importance in maintaining the anticipated dose–effect curves.
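
A small simulation in the spirit of this example (not the actual data of refs [90, 91]; the true slope, noise level and placement of the extremes are assumptions) illustrates how quickly the fitted slope shifts:

```python
# Simulation in the spirit of the example above: 200 paired observations with a known
# dose-effect slope; a few clustered "extreme" y-values, set at 2.5x the data average,
# are introduced and the refitted slope is compared with the original one.
import numpy as np

rng = np.random.default_rng(2)
n = 200
dose = np.linspace(1, 10, n)
effect = 2.0 * dose + rng.normal(0, 1.0, n)      # true slope = 2.0

def slope(x, y):
    return np.polyfit(x, y, 1)[0]

print(f"0 extremes: slope = {slope(dose, effect):.3f}")
for k in range(1, 8):
    y = effect.copy()
    y[:k] = 2.5 * effect.mean()                  # clustered extremes at the low-dose end
    print(f"{k} extremes: slope = {slope(dose, y):.3f}")
```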

Consequences of up-scaling and dynamics of biomonitoring: the status quo and future of general biomonitoring

From the early days of biomonitoring of element air pollution, surveys have been made larger, in both geographical and time scales [57–67]. The underlying reasoning was that air pollution is trans-boundary [92] (which made biomonitoring international), that local variances in outcomes should be made as small as possible relative to the survey’s variances so as to maximize the signal relative to the noise, and, last but not least, that larger surveys capture more extreme differences in ambient conditions [69, 93–96]. Furthermore, more and more attention was focused on possible source terms, to permit emission-related interpretations of the results obtained: factor analysis (FA) was introduced to help organize the data into a limited number of factors (possible source profiles) [49, 71, 72, 74, 97, 98]. In later years FA was also used to clean up data (removal of specific factors) and/or to select specific factors. The latter was carried out to focus on specific sources, and to try to make the data more robust, in the sense that full source profiles were less likely to suffer from possible element-specific outliers than single-element approaches [69, 99].

A problem with up-scaling was that data comparability became an issue. In larger geographical areas, studies comprised more than a single biomonitor species, because a single one could not be found throughout the whole area [100, 101]; species calibrations were introduced, and tracking of condition-related variability in vitality, morphology and physiology (and hence response curves!) attracted more and more attention. Needless to say, all of these (interspecies) calibrations introduced further variance in the biomonitoring data [100, 101]. Various authors attempted to establish lichen properties such as vitality, pK values and binding capacity as properties to be used and judged in lichen comparisons [102–111]. Godinho et al. [102] could find hardly any differences in pK values and exchange capacities between lichens from remote background sites and those transplanted to polluted sites, while others used vitality changes as an indication of air pollution stress [112–117].

A rather relevant difference in understanding is encountered when the dynamics of biomonitor organisms are reviewed. In moss monitoring, a certain growth increment of the plant is sampled, for which the growth period can be judged. In (quantitative) performance calibrations, the moss elemental concentrations are then regarded relative to the metal deposition over that full growth period, and a calibration curve is set up, with an assigned “efficiency” of metal retention in the moss tissues [57–61, 66, 118]: the results show a strongly element-dependent quality of these calibration curves.

In apparent contrast, for lichens the approach is that any biomonitoring organism will always be in a dynamic process towards equilibrium with its ambient conditions, the dynamics of which may be condition-, species- and element-specific [119–122]. This means that the biomonitor will gain and release elements depending on the conditions. The Chernobyl accident serves as a clear example: reports show a clear rise in radionuclide levels in biomonitors at the time of the accident and a slow but progressive decline in concentrations in the years thereafter [123–125].

The question to be answered when using biomonitor organisms is clearly which period of ambient conditions they actually reflect. This reflection period may be the reason for the element-dependence of the quality of moss calibrations (they fit or do not fit the adopted periods of deposition). Results by Reis et al. [119] and Godinho et al. [121] with lichens clearly show element-specific differences in dynamics: for example, Godinho [121] reported short reflection periods for As and Rb, and long reflection periods for Sb and Hf.

The above prompts some thoughts on the status quo and future of biomonitoring. After 40 years of biomonitoring and research into the biomonitoring approach, we may still be far from a situation where biomonitor organisms can be applied on a routine basis as a (quantitative) tool in air pollution studies. In the meantime, alternatives have been developed, both technically and economically: new tools such as small, solar-powered air filter set-ups (www.freepatentsonline.com/4924762.html) or fibrous ion-exchange collector materials (ifoch.bas-net.by/fiban_eng2.htm) are now becoming attractive alternatives to, for example, biomonitor transplants. Biomonitors have an important and unique advantage, though: they are in place before one knows their information is needed, and they are cheap to use. Even if that information is only semi-quantitative or of a qualitative nature, biomonitors are the only tool capable of providing information about unforeseen accidental releases, and they are an economical tool for screening large areas for possible air pollution problems and for source identification. This implies that (future) use should be focused mainly on in-situ growing bio-organisms as retrospective accident “recorders” and on larger-scale surveys: for small-sized or short-term surveys, present-day alternatives may be preferred.

When considering larger-scale biomonitoring surveys in an (ecological) epidemiological context, both the difficulties related to up-scaling (the difficulties in estimating the comparability of biomonitor performance over larger areas, and in interspecies calibration) and the specific dynamics of the biomonitor should be seriously addressed. If possible, events of interest (“disease, mortality, …”) should be compatible with the reflection period of the biomonitor (the “cause”), in the sense that events with a long incubation period (e.g. certain cancers) are difficult to associate with relatively short reflection periods, unless these “cause” conditions can be shown to be stable over a long period of time.

Consequences of the use of more than a single analytical technique, or more than a single analytical laboratory

Investigating the IAEA-336 lichen reference material, Marques et al. [126] compared the PIXE and INAA analytical methods. INAA is an analytical technique in which sample masses up to 500 mg were used; PIXE is a micro-analytical technique in which, due to the very nature of the low X-ray energies involved, only μg masses of the pressed sample pellets are actually “seen” in analysis [127]. Marques et al. [126] performed multiple element assessments of various individual samples of various particle sizes, milled from material taken out of the IAEA reference material, and judged the number of replicates necessary in PIXE to match the corresponding outcomes for INAA. Based on a z-score criterion, they concluded that the possible matches were variable and element-specific: for Mn only 2–3 PIXE replicates sufficed to match INAA irrespective of particle size, for Fe no match was obtained except for the 125 μm particle size and 10 PIXE replicates, for Zn 2–6 PIXE replicates were necessary, and for K up to 10 PIXE replicates were needed for the smallest 20–40 μm particle sizes.
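
One plausible form of such a z-score criterion is sketched below; the exact criterion used in ref. [126] is assumed rather than reproduced, and all numbers are hypothetical.

```python
# Assumed z-score criterion (illustrative): find the smallest number of PIXE replicates
# whose mean agrees with the INAA value within |z| <= 2, combining both uncertainties.
import numpy as np

def replicates_needed(pixe_values, inaa_value, inaa_uncertainty, z_limit=2.0):
    pixe_values = np.asarray(pixe_values, dtype=float)
    for n in range(2, len(pixe_values) + 1):
        sub = pixe_values[:n]
        sem = sub.std(ddof=1) / np.sqrt(n)                     # uncertainty of the PIXE mean
        z = (sub.mean() - inaa_value) / np.hypot(sem, inaa_uncertainty)
        if abs(z) <= z_limit:
            return n
    return None  # no match within the available replicates

# Hypothetical Mn results (mg/kg): 10 PIXE replicates vs. one INAA value.
pixe_mn = [62, 58, 65, 60, 63, 57, 66, 61, 59, 64]
print(replicates_needed(pixe_mn, inaa_value=61.0, inaa_uncertainty=1.5))
```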

In the ongoing European moss surveys [64–67], a large number of countries participate and a variety of analytical techniques is used: AAS, AFS, AMA, CVAAS, CVAFS, ETAAS, FAAS, GFAAS, ICP-ES, ICP-MS, INAA. General laboratory performance is judged by issuing the moss M2 and M3 reference materials, but differences in outcomes occur because INAA yields total element concentrations whereas the other methods rely on the elemental fractions dissolved in strong nitric acid digestion procedures. In mapping, INAA is therefore left out [67], but differences between countries may still be due to differences in analytical approach, differences in laboratory quality, or both. As these differences occasionally show up as sudden trans-boundary shifts in concentrations (which of course is not compatible with the idea of a continuous atmospheric metal abundance), the question may be raised whether it would be wise to bring all element assessment into the hands of a single or only very few high-quality laboratories. The point here is that the goals of the survey should dictate what serves these goals best: if the data are to be used in any epidemiological context, the highest data quality is necessary.

For interlaboratory data, the reader is referred to the relevant information associated with the numerous certified reference materials (NIST, BCR, CRM, IAEA, etc.).

Problems in digestion and the down-scaling of sample masses in analytical protocols

In epidemiology, many studies relate events to the dietary intake of elements. Zukowska et al. [128] compared the direct method (duplicate diets for analysis) with indirect methods (market basket approach, with information on consumption). The direct method was regarded as accurate and simple, but costly, time-consuming and not yielding information on individual components, whereas the indirect method was regarded as rapid and inexpensive, but vulnerable to systematic errors and insensitive. The same authors also discussed sample mineralization methods and compared dry ashing with wet digestion (Table 1). Dry ashing is suspected of frequently leading to elemental losses to vessel walls and carries a substantial risk of losing elements through volatilization; the wet digestion method was regarded as complex, laborious and costly, requiring high reagent volumes and subsequent sample dilution, allowing for small sample sizes only, and regularly showing problems with incomplete digestion.

Table 1 Sample mineralization methods

The above shows some of the problems associated with the use of, for example, AAS, ICP and other methods (see section on Consequences of the use of more than a single analytical technique, or more than a single analytical laboratory), but the down-scaling of the sample mass taken into the actual analysis poses an additional potential problem. Here the work of Wilhelm et al. [129] may serve to illustrate the point: in a study on the dietary intake of As, Hg and Se by children, Wilhelm et al. used duplicate diet sampling. In the study, 14 children were observed, 98 duplicate diets were processed, and food samples were weighed, homogenized, lyophilized, stored at −20°C, and, finally, subsamples of 1000 mg each were processed for metal assessment. Considering the data, the day-averaged food intake by the children was ca 250 g. This means that of the 250 g portions, only 1 g portions were taken into analysis. The question is how to ensure that these 1 g portions actually represent the initial 250 g food mass.

The key issue here is the presumed effectiveness of the homogenization procedure. In a study on tree bark, Wolterbeek and Verburg [92] collected 32 individual tree bark samples. They assessed the elemental content of each individual sample by INAA, and afterwards mixed all samples into a 1 kg total sample mass. This mass was homogenized, 32 subsamples were taken again, and these were assessed by INAA. The authors concluded that the level of homogenization was element-specific, ranging from no effect to a decrease in variance by a factor of 7. These results indicate that the eventual data quality may strongly depend on the sample processing in relation to the mass actually taken into elemental assessment. Considering that these masses may not always adequately represent the initial sample, a further discussion on the minimal sample mass to be taken into element assessment may be called for.
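
The comparison made in ref. [92] can be summarized, for one element, as a relative-variance ratio before and after homogenization; the sketch below uses hypothetical placeholder values rather than the published data.

```python
# Sketch of the before/after comparison: relative variance of the 32 individual field
# samples versus that of the 32 subsamples taken after bulk homogenization.
# Hypothetical placeholder data, not the published results of ref. [92].
import numpy as np

def variance_reduction(before, after):
    """Factor by which the relative variance dropped after homogenization."""
    rsd = lambda x: np.std(x, ddof=1) / np.mean(x)
    return (rsd(before) / rsd(after)) ** 2

rng = np.random.default_rng(3)
individual  = rng.lognormal(mean=3.0, sigma=0.5, size=32)                       # 32 field samples
homogenized = rng.normal(loc=individual.mean(), scale=0.4 * individual.std(), size=32)
print(f"variance reduced by a factor of ~{variance_reduction(individual, homogenized):.1f}")
```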

A last example to illustrate the necessity of addressing these issues comes from a study by Emond et al. [130]. Hepatic iron concentrations (HIC) were assessed from biopsy-sized liver samples (ca 3 mg dry weight). Several biopsies were taken from a whole liver (Table 2); a local variance was determined (within 5 mm distance), a remote variance was estimated (3–5 cm distance, in excess of the local variance) and the total variance was assessed (local plus remote). Local variances ranged up to 30%, remote ones totalled up to nearly 100%, which indicates that, unless the liver is adequately homogenized, no representative sample can be taken.

Table 2 Variance in the hepatic iron concentrations (HIC, Fe mg kg−1)

Biomonitor sample variability and Ingamells’ sampling constant

Zeisler et al. [131], studying both human liver and Mytilus edulis (mussel) tissues, took out 5 g subsamples, neutron-irradiated them, and mixed the subsamples back into the full sample mass. After thorough homogenization, subsamples of increasing mass were taken out to assess the mass needed to ensure a 1% sub-sampling error. For human liver, using a Teflon ball mill for homogenization, 32 g subsamples were needed; for mussel tissue, using a Teflon disk mill, 0.95 g subsamples already gave the 1% subsampling error. Apparently, a minimal subsample size should be chosen to ensure a certain representation, a size which may depend on the sample material, the element and the homogenization method.

Wolterbeek et al. [69] and Sloof et al. [93, 94] studied the site variability of element concentrations in biomonitor samples: they reported local variances in moss, lichen and tree bark of up to 60%. Such site variabilities are not limited to biomonitor materials: Chaubey et al. [132], studying the outcomes of multiple local rainwater gauges, reported variances in recorded amounts of rainfall of up to ca 50%.

If an individual site is to be regarded as an individual “cause” condition, as the basic unit of the epidemiological survey, how many local subsamples should be taken to ensure local representation? And if that number can be assessed, relative to a pre-set local uncertainty in the representation, should all subsamples be taken into element assessment on an individual basis, with possible subsequent problems in the capacity needed for sample processing and analysis, or should they be combined into a homogenized mixed mass, from which an adequate subsample is taken into the assessment routines?

Although Wolterbeek [69] argued that the signal-to-noise ratio determines the quality of a survey rather than the individual signal or the noise, thereby implicitly stating that survey quality cannot be judged on the basis of small local variances alone, the point here concerns site representation: this is why Ingamells’ sampling constant deserves attention. Ingamells gave the relationship between subsample mass and subsample error, related to each other by the sampling constant Ks: w R² = Ks, where w is the subsample mass and R the relative subsampling error (%); Ks thus equals the subsample mass ensuring a relative subsampling error of 1% (68% CI) in a single determination [133, and see 126, 131, 134, 135].
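
A short worked example of this relation (with a Ks value taken from the ranges quoted in the next paragraph, purely for illustration) shows how the required subsample mass scales with the acceptable error:

```python
# Worked example of the Ingamells relation w * R^2 = Ks: given a sampling constant Ks,
# the subsample mass w (g) needed for a target relative subsampling error R (%) follows
# directly, and vice versa.
def mass_for_error(Ks_g, target_rsd_percent):
    """Subsample mass (g) giving the target relative subsampling error (68% CI)."""
    return Ks_g / target_rsd_percent ** 2

def error_for_mass(Ks_g, mass_g):
    """Relative subsampling error (%) for a given subsample mass."""
    return (Ks_g / mass_g) ** 0.5

# Illustrative Ks: IAEA-336 lichen, Eu, Ks ~ 260 g (see Table 3).
print(mass_for_error(260, 1.0))    # 260 g needed for a 1% error (by definition of Ks)
print(mass_for_error(260, 5.0))    # ~10.4 g suffices if a 5% error is acceptable
print(error_for_mass(260, 0.2))    # a 200 mg subsample would carry ~36% subsampling error
```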

Table 3 shows a large variability in Ks values, depending on the type of material and, undoubtedly, on the way the material was processed and homogenized. Values range from 62 to 488 g for selected gold ores, range in an element-specific manner from 3 g (Mn) to 260 g (Eu) in the IAEA-336 lichen reference material, up to 10 kg in leaf litter, and up to ca 80 kg for Cu in soil samples. The high Ks values show the difficulty of homogenizing initial bulk samples. Although high-tech homogenization procedures could bring Ks down to lower levels, it may be questioned whether such approaches are compatible with high numbers of samples and a necessarily high throughput; they certainly imply, once more, the necessity of input from a single or only a few high-quality analytical laboratories rather than a distribution of all tasks over all participating survey groups.

Table 3 The sampling constant Ks (Ingamells, C.O., Switzer, P. [97])

Homogenization and the selection of a particle size class

In certified reference materials (CRMs), element concentrations are certified or recommended with associated uncertainties, and in many cases the material is prepared for the CRM within a limited particle size distribution. The underlying reasoning is clear: the CRM must meet specified limits on elemental concentration variance, and these are best met within a selected size class. Owing to the level of homogenization (Ks), CRMs generally refer to a certain minimal sample mass for which the material is certified.

In an attempt to visualize the consequences, Marques et al. [126] milled and sieved IAEA-336 lichen reference material (<125 μm) down to size classes of <64, <41 and <20 μm. Surprisingly, they found the resulting fractional chlorophyll content (representing the algal component relative to the fungal component of the lichen) to increase with decreasing particle size class. This result clearly demonstrates that the milling and sieving involved fractionation of the original material, in the sense that more and more fungal hyphae were sieved out. Depending on the metal content of each component, this means that metal concentrations also vary with particle size class. Mn may serve as an example: with decreasing particle size, its concentration decreased by more than 50%. This result indicates that homogenization and sieving processes may alter the constitution of the material. It translates directly into the need for strict protocols for any milling and sieving: any variability in that process may cause (unnoticed) additional variability in the assessed element concentrations.

Do samples fully represent local sites and survey properties?

In any survey, an important issue to consider is the local site as the basic unit of the survey; the survey set-up itself should also receive ample attention, because it should represent the area [69, 136, 137]. Boquete et al. [138] argued that the sampling grid of any survey interacts with the scale at which contamination (emission) processes occur for each contaminant, so that the grid will only enable characterization of those processes occurring at scales larger than the grid. The direct consequence is that some elements ask for small grids, while for others larger grids may suffice. It is therefore difficult to establish a standard grid size for all elements. Spatial structures of the element concentrations in biomonitor materials may be used to define the necessary grid size; grid sizes may also be standardized, since the size is implicitly associated with the survey’s objectives. Of course, although fine grids allow for the inclusion of smaller-scale processes, they imply larger numbers of observations. Furthermore, the higher the skewness and kurtosis of the real situation (the data population, which should be reflected by the samples taken), the more observations are necessary. A problem in selecting the size or the number of samples taken at a “local site” is that these sites are often assumed to be homogeneous and to exhibit normal distributions, but in reality one does not know. This again calls for relatively high numbers of samples.

As an example, consider populations of 2000 simulated element concentrations, varying in skewness and kurtosis. Any of these populations may be regarded as a local site or as a survey: at a local site the aim is to be sure that the assessed element concentration represents the population average, while at survey level the aim is to obtain outcomes which represent the population’s variance. Figure 2 shows the frequency distributions of three such simulated populations: one with skewness and kurtosis of 0.69 and 0.51 respectively (very near a normal distribution), one with 1.92 and 8.19 respectively, and one with 6.60 and 93.22 respectively. The last two distributions commonly occur in biomonitoring surveys of element air pollution; the first one is often assumed for local sites, but local distributions such as the other two cannot be excluded a priori.

Fig. 2 Frequency distributions of populations (n = 2000) of simulated element concentrations. a Average 10.03, SD 25%, skewness 0.69, kurtosis 0.51. b Average 9.96, SD 50%, skewness 1.92, kurtosis 8.19. c Average 10.14, SD 113%, skewness 6.60, kurtosis 93.22 (Sarmento, Verburg and Wolterbeek, unpublished results)

Table 4 shows the numbers of samples that need to be taken to represent either the population’s average or its variance, with a certain margin of error and a certain level of significance. Suppose the local population exhibits a skewness and kurtosis of 0.69 and 0.51 respectively (Fig. 2a), and the local population average is to be represented within a 5% error at 95% significance; then 116 local samples are needed (Table 4, and see [69, 136, 137]). Suppose further that the survey population exhibits a skewness and kurtosis of 1.92 and 8.19 respectively (Fig. 2b) and its variance is to be represented within a 5% error at 95% significance; then 1432 survey samples are needed (Table 4). In total, this multiplies to 116 × 1,432 = 166,112 samples to be processed and analyzed. It may be clear that such numbers readily exceed analytical capacities, or that the processing time exceeds the time available for the project. Questions to be answered here concern the possibilities of setting up surveys at higher aggregate levels (smaller numbers of samples, meaning larger grid sizes, either by accepting a loss of accuracy and precision or by determining an acceptable level of accuracy and precision), of increasing sample throughput, or of developing protocols in which the assessment of element concentrations is turned into approaches which permit fewer individual samples to be processed. It should again be noted that, throughout larger survey areas, full comparability of biomonitor response behaviour is needed [139–141].
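
A simulation in the spirit of Table 4 is sketched below; the population is synthetic and the routine only treats the population average, so it will not reproduce the table’s exact numbers.

```python
# Find, by repeated resampling, the smallest n for which the sample mean falls within a
# 5% margin of the population mean in 95% of draws. Synthetic right-skewed population.
import numpy as np

rng = np.random.default_rng(4)
population = rng.lognormal(mean=2.2, sigma=0.5, size=2000)   # right-skewed, like Fig. 2b

def n_for_mean(pop, margin=0.05, confidence=0.95, trials=1000):
    target = pop.mean()
    for n in range(10, len(pop), 10):
        means = np.array([rng.choice(pop, size=n, replace=True).mean()
                          for _ in range(trials)])
        hit_rate = np.mean(np.abs(means - target) <= margin * target)
        if hit_rate >= confidence:
            return n
    return None

print(f"samples needed to pin down the mean within 5%: ~{n_for_mean(population)}")
```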

Table 4 Calculations of necessary sample size (number of samples) to represent the population average or variance (within a certain margin of error, with a certain level of significance), of populations of 2000 simulated concentrations, varying in skewness and kurtosis (Sarmento, Verburg and Wolterbeek, unpublished results)

Reducing sample numbers in analytical processing

Biomonitoring in an epidemiological context implies that the biomonitoring data (the “cause”) are of the highest possible quality, with an adequate sample throughput. Especially in ecological epidemiology, the number of samples is high and, depending on the biomonitor reflection periods, sampling should be performed within a relatively short period of time. In the example given in the previous section, the resulting number of samples makes it virtually impossible to meet the quality and throughput demands analytically, especially if samples also have to be processed in terms of milling and homogenizing, with subsample masses down to micro-amounts for the eventual element assessment.

As stated in the previous section, the number of samples could be reduced by performing surveys at higher aggregate levels (larger grid sizes), but this may easily come at the expense of quality and spatial sensitivity. Furthermore, survey details should always match and remain compatible with the detail of the epidemiological information. To bring surveys in line with epidemiological data of greater detail, empty survey grid cells may be filled in by approaches such as Kriging [136, 137, 142, 143], but Kriging routines (like all others) invariably result in a decreased eventual survey variance [137]. This decrease may be determined and found acceptable, but another, and perhaps the only reasonable, way out of the capacity problem is to reduce the number of samples to be processed from local sites. How can this be done? Considering the work on local tree bark and other biomaterials [69, 93, 94], the observed local standard deviations are in line with the simulated data in Table 4, suggesting that 10–100 local samples are needed, easily leading to kg-sized total local sample masses. These masses should either be milled, homogenized, digested and subsampled down to the small masses taken into element assessment (see the problems discussed in the sections from Consequences of up-scaling and dynamics of biomonitoring: the status quo and future of general biomonitoring up to Homogenization and the selection of a particle size class), or one should consider the only alternative, namely analytical approaches which permit the assessment of concentrations in kg-sized samples.
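
The variance-reducing effect of such map interpolation can be illustrated as follows; scipy’s griddata is used here as a simple stand-in for Kriging proper, and the site coordinates and concentrations are synthetic.

```python
# Sparse "measured" sites are interpolated onto a fine grid and the variance of the
# interpolated surface is compared with that of the measured points. Synthetic data;
# linear interpolation is used as a simple stand-in for Kriging.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(5)
x, y = rng.uniform(0, 100, 60), rng.uniform(0, 100, 60)        # 60 sampled sites
z = 10 + 0.05 * x + rng.lognormal(0, 0.4, 60)                  # noisy concentrations

gx, gy = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
zi = griddata((x, y), z, (gx, gy), method="linear")            # fill the empty grid cells

print(f"variance of measured sites  : {np.nanvar(z):.2f}")
print(f"variance of interpolated map: {np.nanvar(zi):.2f}")    # typically smaller
```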

It should be noted that the masses needed may imply that biomonitor materials other than the “conventional” lichens or mosses should be investigated. The early biomonitor materials were primarily selected on the basis of their accumulative behaviour, their common occurrence (notwithstanding their low masses) and the availability of only relatively insensitive analytical techniques [69]. Nowadays, analytical approaches have gained in sensitivity. Moreover, materials such as tree bark or pine needles may serve equally effectively as biomonitors [69, 98, 144, 145], and much more mass or larger numbers of samples may be obtained in field work. Further study may be needed to arrive at “easy-to-sample” and “cheap” new biomonitor organisms or tissues of high mass or number availability.

Analytical instrumentation for large volume samples

Today, most analytical instrumentation is focused on handling minute amounts of sample. To the authors’ knowledge, there are only a few places in the world where larger-volume samples can be processed analytically on a routine basis, and these are nuclear facilities. Table 5 gives entries compiled from Bode [146]: the data indicate that only four of today’s nuclear facilities can handle sample sizes above kg mass, of which only BARC and DELFT-RID [80, 81] are in actual operation (Bode, personal communication). The data show that these facilities operate at relatively low neutron fluence rates, but this drawback is compensated by the use of large sample masses. Table 5 also shows that all of them are operated in manual mode. Considering also that large numbers of samples should be processed, these facilities would have to be upgraded if they are to serve in larger-scale biomonitoring: they should be equipped with automated sample changers, both for multiple sample irradiations and for the measurements.

Table 5 Irradiation facilities for Large Sample Neutron Activation Analysis (LSNAA)

Conclusions

Today’s availability of small solar-powered air filter samplers and fibrous ion-exchange materials is regarded as an adequate alternative to the biomonitor transplant materials used in small-scale set-ups, but biomonitors remain valuable in larger-scale set-ups and in unforeseen releases and accidental situations. In the latter case, in-situ biomonitoring is seen as the only option for a retrospective study: biomonitors are there before one even knows that they are needed. For biomonitoring, nuclear analytical techniques are discussed as key techniques, especially because of the necessary multi-element assessments in both source recognition and single-element interpretation. To meet the demands of an epidemiological context, larger-scale in-situ biomonitoring asks for large numbers of samples and, consequently, for a large total sample mass, all this to ensure representation of both local situations and survey area characteristics. This may direct studies towards new “easy-to-sample” biomonitor organisms of which high masses and numbers may be obtained in field work, rather than continuing with biomonitors such as lichens. It also means that sample handling and processing are of key importance in these studies. To avoid problems in the comparability of general analytical procedures in milling, homogenization and digestion, the paper proposes to involve only a few high-quality laboratories in the total element assessment routines. Here, facilities that can handle large sample masses in the assessment of element concentrations are to be preferred. All this highlights the involvement of large-sample-volume nuclear facilities, which, however, should be upgraded and automated in their operation to ensure the necessary sample throughput in larger-scale biomonitoring.