Background

Airborne transmission of pathogens in the outdoor environment is characterized by dispersion by the wind (horizontally and vertically). Such pathogens are either isolated or clustered cells or spores, or cells or spores attached to particulate matter or dust [1,2]. Well-known examples of airborne pathogens include:

  • The foot-and-mouth disease virus (FMDV) (livestock). Major outbreaks have occurred in countries including the UK and France (1981) [3], Italy (1994) [4], the Netherlands (2001) [5], and the UK (2001 and 2007) [6,7].

  • Coxiella burnetii (livestock), a highly pathogenic agent causing Q fever in humans and animals. Major outbreaks have occurred in countries including Switzerland (1983) with 415 human cases [8], the UK (1989) with 137 human cases [9], France (1998–1999) with 73 human cases [10], Germany (2005) with 331 cases [11], and the Netherlands (2007–2010) with >4,000 human cases [12].

  • Legionella pneumophila (cooling towers and industrial sources). Major outbreaks have occurred, for example, in France (2003–2004) with 86 human cases including 18 fatalities [13], Norway (2005) with 56 human cases including 10 fatalities [14], and the Netherlands (2006) with 31 human cases [15].

  • Avian influenza virus (livestock). Outbreaks have occurred worldwide [16].

  • Bacillus anthracis (‘anthrax’). One described outbreak occurred in the former Soviet Union (1979) [17].

In the early phase of a (future) pathogen outbreak or release, generally related to (animal) industries or to bio-terrorism, it is necessary, from a public and animal health perspective and for economic reasons [18], to gain insight into 1) the physical spatial spread of the pathogen, 2) the population at risk, and 3) the concentrations (infectious dose) to which persons and/or animals are exposed.

Traditional epidemiological spatial analysis techniques, such as attack rate analysis, are only useful for retrospective analyses and do not incorporate meteorological information [19]. Atmospheric dispersion models (ADMs), mechanistic models developed to model the spread of particles and gases spatially and temporally as a function of meteorological conditions including wind speed and wind direction, may however be instrumental in simulating the spatial spread of pathogens released from a known source. Currently, three types of investigations using ADMs to simulate farm-to-farm, human-to-human, farm-to-human, or industrial-to-human airborne transmission may be distinguished: (1) qualitative investigations, in which airborne spread was modelled visually (e.g., [13,14,17,20]); (2) quantitative investigations, in which modelled concentrations were converted to doses and a quantitative microbial risk assessment was elaborated using dose–response models to calculate infection probabilities (e.g., [21,22]); and (3) the development of emergency preparedness systems and decision-support systems to be used during future outbreaks or releases (e.g., [23-25]). With respect to points 1 and 2, airborne transmission was generally inferred when modelled concentrations near infected farms or humans exceeded threshold values or when infection probabilities were non-zero.

However, to our knowledge no studies have been published that analyse the relationship between reported incidence rates and concentrations modelled by an ADM using proper quantitative statistical measures. We wish to answer the question: are meteorological models indeed useful to explain observed incidence rates or disease notifications, or could the observed data also be explained by simpler models containing no meteorological information?

Therefore, we aimed to assess quantitatively whether ADMs improve the correlation between modelled concentrations and observed human disease incidence rates. We used data from the large Q fever outbreaks in the Netherlands [12], correlated ADM concentration levels to human disease incidence, and compared these fits to those of simpler concentration models that do not contain meteorological information, namely 1) a model with a spatially uniform concentration, and 2) a model with concentration levels that depend only on the distance from the source.

If the ADM concentration levels correlate better with the human Q fever incidences than the concentration levels of the simple models, we then conclude that ADMs might be useful to predict and visualize the spatial and temporal pathogenic spread in case of an outbreak or release.

Method

Data

Human case data

Human case data have been made available by the Municipal Health Services in the Netherlands at the six-digit zip code level (PC6), i.e. street-level. Following [26], we focused on three relatively isolated Q fever outbreak areas where humans were exposed to C. burnetii from a large dairy goat farm as the unique source in 2009. These areas are located in the Dutch provinces of Utrecht (area A), Noord-Brabant (area B), and Limburg (area C) [19,27] (Figure 1). The epidemic curves per week number in 2009 are shown in Figure 2A, B and C. The specific farms were classified as sources in different investigations based on bulk tank milk tests, reported abortion numbers, epidemiological research, and a source detection method [19,26-30].

Figure 1

Q fever incidence map. Map of the Q fever incidence (per 100,000 inhabitants) in the Netherlands in 2009, and the location of the three selected areas with their main source of exposure.

Figure 2

Epidemic curves and emission profiles. (A/B/C) Epidemic curves of the three selected areas in number of cases per week; (D) lognormal emission profile of area C (lNormEpi), both as fit of the epidemic curve (dotted) and shifted 20.7 days back in time (solid); (E) steady-state emission profile during 2009 (conYear, dotted) and steady-state emission profile during epidemic (conEpi, solid) in area C; (F) idem as emission profile conEpi in subplot E, but with a threshold wind velocity of 4 m/s.

Population density data

Population density data at the PC6-level (reference date 1 January 2010) have been made available by Statistics Netherlands (CBS). The typical maximum distance between a C. burnetii infected farm and a case’s home address is 5–10 km [19,26]. We therefore selected two data sets: all PC6’s up to 5 km from the source, and all PC6’s up to 10 km from the source (Table 1). Dutch legislation allows the use of this case information for research purposes if the information is not traceable to individual patients. In that case, consent of cases is not required. The case information can, however, not be made publicly available.

Table 1 Number of cases, number of inhabitants (including cases), incidence per 100,000 inhabitants, and number of zip codes within 5 km and 10 km from the sources in areas A, B, and C

Farm data

Coordinates of the source farms have been made available by the Ministry of Economic Affairs (reference date November 2009).

Concentration data

Simple models

The simple concentration models include:

  1) A NULL model with a homogeneous concentration in space and time.

  2) A DISTANCE based model with concentrations proportional to the inverse square of the distance between residential addresses of Q fever patients and the farms.

Atmospheric dispersion model

An atmospheric dispersion model (ADM) is a mechanistic model that calculates the physical dispersion of particles and gases over space and time as a function of emission data and meteorological conditions. We used the Operational Priority Substances Short Term model (OPS-ST, version 10.3.2), developed by the Netherlands National Institute for Public Health and the Environment (RIVM) (e.g., [31-34]). We considered particulate matter (PM10) to be a substitute for C. burnetii. The atmospheric dispersion model takes into account both dry and wet deposition of particles.

The OPS model requires hourly meteorological data (temperature, relative humidity, wind speed, wind direction, precipitation amounts, precipitation duration, global/solar radiation, and snow cover status) as input for the calculations. These data were retrieved from the Royal Netherlands Meteorological Institute (KNMI) and were measured at the meteorological stations. We spatially interpolated these data to obtain values at farm locations (see Additional file 1: Text S1 and Additional file 2: Figure S16, Additional file 3: Figure S17 and Additional file 4: Figure S18 for a detailed description of the meteorological data preparation). Precipitation data were derived from precipitation radar images and were available at a 1 km resolution.

The output of the OPS-ST model consisted of hourly averaged PM10-concentration matrices (250 m resolution), which we converted to period-specific (see next section) averaged concentration maps. We normalized the concentration values per PC6 to the maximum concentration in the grid (i.e. the concentration at the source).
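The averaging and normalization step described above can be sketched as follows. This is a minimal illustration only: the function names `period_mean` and `normalize_to_max` and the tiny 2×2 grids are hypothetical, and the real OPS-ST output format is not described in the text.

```python
def period_mean(hourly_grids):
    """Average equally shaped 2-D grids (lists of rows) cell by cell."""
    n = len(hourly_grids)
    rows, cols = len(hourly_grids[0]), len(hourly_grids[0][0])
    return [[sum(g[r][c] for g in hourly_grids) / n for c in range(cols)]
            for r in range(rows)]


def normalize_to_max(grid):
    """Divide every cell by the grid maximum (assumed to lie at the source)."""
    peak = max(max(row) for row in grid)
    return [[value / peak for value in row] for row in grid]


# Two hypothetical hourly concentration grids (arbitrary units).
hourly = [
    [[0.0, 1.0], [2.0, 8.0]],
    [[0.0, 3.0], [2.0, 4.0]],
]
mean_grid = period_mean(hourly)         # period-averaged concentration map
relative = normalize_to_max(mean_grid)  # values between 0 and 1
print(mean_grid)
print(relative)
```

Because only relative concentrations enter the later log-linear fit, the normalization constant is absorbed by the intercept of that fit.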

Emission profiles (ADM)

No data were available on the emission strength of C. burnetii. Therefore we defined three simple emission profiles (Table 2):

Table 2 Characteristics of the three simple emission profiles as input for the ADM

  A) Emission profile “conYear”: steady-state emission strength during the entire period (year 2009) (Figure 2E).

  B) Emission profile “lNormEpi”: a lognormal emission profile based on the epidemic curve per area, which corresponds well with the lambing season at the farms (Figure 2D).

  C) Emission profile “conEpi”: constant emission strength starting at the day of the 2.5th percentile and ending at the day of the 97.5th percentile of the lagged profile “lNormEpi” (Figure 2E), that is from 27 March to 5 July (area A), 23 February to 8 June (B), and 6 February to 4 October (C).
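To make the construction of the “conEpi” window concrete, the sketch below computes the 2.5th and 97.5th percentiles of a lognormal curve and shifts them back in time. The parameters `mu` and `sigma` are hypothetical placeholders, not fitted values from this study; the 20.7-day lag is the shift quoted for area C in the caption of Figure 2, and whether the same lag applies to the other areas is not stated here.

```python
from math import exp, log
from statistics import NormalDist


def lognormal_quantile(p, mu, sigma):
    """Quantile of a lognormal distribution with log-mean mu and log-sd sigma."""
    return exp(mu + sigma * NormalDist().inv_cdf(p))


def con_epi_window(mu, sigma, lag_days):
    """Start and end day of a constant-emission window (lagged 2.5-97.5% range)."""
    start = lognormal_quantile(0.025, mu, sigma) - lag_days
    end = lognormal_quantile(0.975, mu, sigma) - lag_days
    return start, end


# Hypothetical lognormal fit to an epidemic curve, in days since 1 January.
mu, sigma = log(120.0), 0.25
start, end = con_epi_window(mu, sigma, lag_days=20.7)
print(f"constant emission from day {start:.1f} to day {end:.1f}")
```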

Although the actual emission profiles will have been much more complex, there is some biological justification for the simple profiles. If one ignores wind direction and meteorological conditions, then a steady-state emission profile is related to a steady-state exposure level. A steady-state exposure seems plausible if goats were shedding bacteria successively during a certain period and/or the farm’s surrounding environment was contaminated as well (the inactivation rate of C. burnetii is very low [35]), therefore leading to multiple sources.

A lognormal emission profile could be related to a combination of processes: (1) a (very) short period of high shedding occurred, leading to a normal epidemic curve since the incubation period is normally distributed, or to a lognormal epidemic curve as a result of the normally distributed incubation period in combination with a contaminated environment; (2) the shedding rate was time-dependent and followed a (log)normal curve, potentially leading to a lognormal epidemic curve if a contaminated environment is considered as well.

In addition, we considered four threshold wind speeds for emission of C. burnetii, namely 0, 2, 4 and 6 m/s (profiles V0, V2, V4, and V6 respectively). When the hourly wind speed at the farm was lower than the threshold value, we assumed that bacteria would accumulate in the stable and would be released during the next hour in which the wind speed threshold was exceeded (Figure 2F). The reason for this choice is that stables of large dairy goat farms are very open to the outdoor environment; thus, pathogens deposited on stable floors and surfaces can easily be aerosolized by sufficiently strong winds and then be dispersed to the farm’s surrounding environment.
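The accumulate-and-release rule can be illustrated with a short sketch. The function name `thresholded_emission`, the unit production rate, and the treatment of a wind speed exactly equal to the threshold as sufficient for release are assumptions made for illustration.

```python
def thresholded_emission(wind_speeds, threshold, base_rate=1.0):
    """Hourly released amounts under an accumulate-and-release rule.

    Each hour base_rate units are produced. They are released only in hours
    whose wind speed reaches the threshold; otherwise they are stored and
    released together during the next hour that qualifies.
    """
    released = []
    stored = 0.0
    for speed in wind_speeds:
        stored += base_rate
        if speed >= threshold:
            released.append(stored)
            stored = 0.0
        else:
            released.append(0.0)
    return released


winds = [1.0, 3.0, 5.0, 2.0, 6.0]                  # hypothetical hourly wind speeds (m/s)
print(thresholded_emission(winds, threshold=0.0))  # [1.0, 1.0, 1.0, 1.0, 1.0]
print(thresholded_emission(winds, threshold=4.0))  # [0.0, 0.0, 3.0, 0.0, 2.0]
```

With a threshold of 0 m/s the rule reduces to the underlying emission profile, while higher thresholds concentrate the same total emission into fewer, windier hours.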

Statistical analysis

Incidence versus concentration

The dose–response relationship for infectious micro-organisms like C. burnetii is given by (e.g., [36]):

$$ {p}_i=1- \exp \left(-\kappa {\lambda}_i\right) $$
(1)

with \( p_i \) being the probability of infection at PC6 \( i \), \( \kappa \) being the single-hit probability of initiating infection, and \( \lambda_i \) being the dose at PC6 \( i \) [number of pathogens]. Since the observed overall incidence of Q fever during the Dutch epidemic is relatively small (Table 1), we can assume that the doses \( \lambda \) were relatively small too. Since \( \exp\left(-\kappa\lambda\right) \cong 1-\kappa\lambda \) for small values of \( \kappa\lambda \), equation [1] approaches a linear equation:

$$ {p}_i\cong \kappa \cdot {\lambda}_i $$
(2)
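A quick numerical check shows how good the linear approximation of equation [2] is at low doses. The value of `kappa` below is a hypothetical placeholder, not an estimate for C. burnetii.

```python
from math import expm1


def p_exact(kappa, dose):
    """Equation (1): 1 - exp(-kappa * dose), computed stably for small arguments."""
    return -expm1(-kappa * dose)


def p_linear(kappa, dose):
    """Equation (2): low-dose linear approximation."""
    return kappa * dose


def relative_error(kappa, dose):
    exact = p_exact(kappa, dose)
    return (p_linear(kappa, dose) - exact) / exact


kappa = 0.5  # hypothetical single-hit probability
for dose in (0.001, 0.01, 0.1, 1.0):
    print(f"dose={dose:<6} exact={p_exact(kappa, dose):.6f} "
          f"linear={p_linear(kappa, dose):.6f} "
          f"rel. error={relative_error(kappa, dose):.4f}")
```

The relative error grows roughly as half the product of κ and the dose, so the linear form is only appropriate when that product is well below 1.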

For each PC6 \( i \) we determined the number of cases \( k_i \) and inhabitants \( n_i \). Assuming that the probability of infection \( p \) is equal to the incidence \( I \), and that the logarithm of the dose \( \lambda \) is proportional to the log-concentration, one can test which concentration model (NULL, DISTANCE, or ADM with the emission and wind speed threshold configurations) gives the best fit to the incidence by means of a Poisson generalized linear model (R version 3.0.3):

$$ {k}_i\sim \mathrm{Poisson}\left({\mu}_i\right) $$

$$ \log \left(\overrightarrow{\mu}\right)= \log \left(\overrightarrow{n}\right)+{\beta}_0+{\beta}_1\cdot \log \left[f\left(\overrightarrow{x}\right)\right] $$
(3)

where \( \mu \) is the expected outcome, \( \beta_0 \) and \( \beta_1 \) are the intercept and slope of the log-linear fit, and \( f(x) \) is the concentration function. In order to fulfil the linear conditions of equation [2], the slope should approximate 1 (i.e. \( \beta_1 \approx 1 \)). This follows from exponentiating both sides of equation [3]:

$$ \exp \left[ \log \left(\overrightarrow{\mu}\right)\right]= \exp \left[ \log \left(\overrightarrow{n}\right)+{\beta}_0+{\beta}_1 \log \left[f\left(\overrightarrow{x}\right)\right]\right] $$
(4)

which equals

$$ \frac{\overrightarrow{\mu}}{\overrightarrow{n}}= \exp \left[{\beta}_0\right]\cdot f{(x)}^{\beta_1} $$
(5)

and thus \( \exp\left[{\beta}_0\right]\sim \kappa \).

For the NULL model, we defined f(x) = 1, for the DISTANCE model \( f(x)={\overrightarrow{r}}^{-2} \), and for the ADM-models f(x) is a function of a large set of meteorological equations.
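The log-linear model of equation [3] can be fitted with a few lines of Newton iteration, shown here for the DISTANCE model. This is a sketch only: the study used a Poisson generalized linear model in R, whereas the function `fit_poisson_offset` below is a hypothetical stand-alone re-implementation, and the data are synthetic and noise-free (expected counts are used in place of observed counts) so that the known parameters can be recovered. The NULL model, where \( \log f(x) = 0 \) everywhere, has no slope to estimate and is not covered by this function.

```python
from math import exp, log


def fit_poisson_offset(k, n, x, iterations=100, tol=1e-10):
    """Fit log(mu_i) = log(n_i) + b0 + b1 * x_i by Newton's method.

    k: case counts, n: inhabitants (offset), x: log of the concentration
    function. x must not be constant, otherwise b1 is not identifiable.
    """
    def loglik(b0, b1):
        return sum(ki * (log(ni) + b0 + b1 * xi) - ni * exp(b0 + b1 * xi)
                   for ki, ni, xi in zip(k, n, x))

    b0, b1 = log(sum(k) / sum(n)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [ni * exp(b0 + b1 * xi) for ni, xi in zip(n, x)]
        # Score vector and (negative) Hessian of the Poisson log-likelihood.
        g0 = sum(ki - mi for ki, mi in zip(k, mu))
        g1 = sum((ki - mi) * xi for ki, mi, xi in zip(k, mu, x))
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step while it would lower the log-likelihood.
        step, current = 1.0, loglik(b0, b1)
        while loglik(b0 + step * d0, b1 + step * d1) < current and step > 1e-8:
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1


# Synthetic example: incidence = kappa * r^-2 with a hypothetical kappa.
kappa_true = 0.01
r = [0.5, 1.0, 2.0, 3.0, 5.0]             # distance to the source (km)
n = [200, 300, 400, 500, 600]             # inhabitants per PC6
k = [ni * kappa_true * ri ** -2 for ni, ri in zip(n, r)]
x = [log(ri ** -2) for ri in r]           # log f(x) for the DISTANCE model

b0, b1 = fit_poisson_offset(k, n, x)
print(f"exp(b0) = {exp(b0):.4f}, b1 = {b1:.4f}")
```

For an ADM, `x` would instead hold the logarithm of the normalized modelled concentration per PC6.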

Model comparison

To compare the performance of the NULL, DISTANCE and ADM models we applied a cross validation test [37]. That is, for each of these models we randomly selected 2/3 of the PC6’s (training data) and estimated the intercept (\( \beta_0 \)) and the slope (\( \beta_1 \)) of equation [3]. Subsequently, we applied the fitted model to the remaining 1/3 of the data (test data), predicted their outcome, and calculated the residual deviances (\( \delta_q \)) of the test PC6’s \( q \) for each cross validation test \( c \). We then calculated the total residual deviance (\( d_c \)), being the sum of these residual deviances. We repeated this cross validation test 10,000 times, and calculated the mean total residual deviance \( D \) per concentration model:

$$ D=\frac{1}{v}\cdot \sum_{c=1}^{v}{d}_c=\frac{1}{v}\cdot \sum_{c=1}^{v}{\left(\sum_q{\delta}_q\right)}_c $$
(6)

with \( v \) = 10,000 and \( c = 1\dots v \).

Finally, we compared the D-values of the different concentration models by means of a two-sample t-test at the 5% significance level.
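The repeated 2/3 versus 1/3 split and the mean total residual deviance of equation [6] can be sketched as below. To keep the example short it departs from the study in two labelled ways: the slope is fixed at 1, so that the intercept has a closed-form estimate, and only 1,000 repetitions are run. The data, the function names and the seed are hypothetical.

```python
import random
from math import log


def poisson_deviance(k, mu):
    """Residual deviance contribution of one unit under a Poisson model."""
    term = k * log(k / mu) if k > 0 else 0.0
    return 2.0 * (term - (k - mu))


def fit_rate(train):
    """Closed-form estimate of exp(b0) when the slope b1 is fixed at 1."""
    return sum(k for k, _, _ in train) / sum(n * f for _, n, f in train)


def mean_total_deviance(units, v=1000, seed=1):
    """Mean over v random splits of the summed test-set deviance (D)."""
    rng = random.Random(seed)
    n_train = (2 * len(units)) // 3
    total = 0.0
    for _ in range(v):
        shuffled = units[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        rate = fit_rate(train)
        total += sum(poisson_deviance(k, rate * n * f) for k, n, f in test)
    return total / v


# Hypothetical PC6 data: cases k, inhabitants n, distance r to the source (km).
r = [0.5, 0.7, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
k = [12, 6, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0]
n = [300] * len(r)

null_units = [(ki, ni, 1.0) for ki, ni in zip(k, n)]                      # f(x) = 1
distance_units = [(ki, ni, ri ** -2) for ki, ni, ri in zip(k, n, r)]      # f(x) = r^-2

d_null = mean_total_deviance(null_units)
d_distance = mean_total_deviance(distance_units)
print(f"D(NULL) = {d_null:.2f}, D(DISTANCE) = {d_distance:.2f}")
```

A lower D indicates better out-of-sample prediction; in this synthetic example the cases were constructed to fall off with distance, so the DISTANCE model is expected to score lower than the NULL model.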

Results

Mean total residual deviance (D)

Figure 3 and Table 3 show that in area A the ADMs with profiles conEpi-V0 and conEpi-V2 have the lowest D (approximately 7.8% lower than the NULL model). The DISTANCE model has a 4.5% lower D than the NULL model. The ADMs with an annual constant emission profile (conYear) always performed worse than the DISTANCE model; this corresponds well with the observed short duration of the epidemic in this area (Figure 2A). The ADMs with profile V6 performed approximately equal to the NULL model. Increasing the selection radius to 10 km (Figure 4, Table 4) did not lead to major changes, although the indexed D’s are lower. Note that D for the conEpi-V0 model is significantly lower than that of the conEpi-V2 model.

Figure 3

D-values (5 km). D-values of the NULL, DISTANCE and ADM-models, relative to the D-value of the NULL model, based on all PC6’s within 5 km of the source. The vertical black line represents the D-value of the DISTANCE model. The Roman numerals refer to groups of models whose D-values do not differ significantly from each other (I = significantly lowest D-value; II = D significantly higher than those of I, etc.).

Table 3 Five km results with mean total residual deviances ( D ) and each model’s ranked position

Figure 4

D-values (10 km). Idem as Figure 3, but based on all PC6’s within 10 km of the source.

Table 4 Ten km results with mean total residual deviances ( D ) and each model’s ranked position

In area B, the results differ considerably from those in area A. The D-values of all models, except for the ADMs with V6, are approximately 10% lower than that of the NULL model. The differences in D between the DISTANCE model and the ADMs with V0, V2 and V4 are small, but the model with conYear-V4 still performs significantly better than the DISTANCE model (p < 0.05).

For the 10 km selection radius (Figure 4), the D-values are approximately 17% lower than that of the NULL model (ADM V6-models not included). In this case, conEpi-V0 gives the best performance, but the differences in D with respect to the DISTANCE model and the ADMs with V0, V2 and V4 remain small.

In area C the difference in D between the DISTANCE and the NULL model is 11.5%, and all ADMs (except for lNormEpi-V6) performed better, with a D approximately 16–23% lower than that of the NULL model. The best performance is given by conYear-V2 (which seems to correspond with the longer duration of the epidemic as depicted in Figure 2C). Note that in general the ADMs with lNormEpi have the highest D, and that the ADMs with conYear all result in relatively low D-values, even with V4 and V6.

For the 10 km selection radius (Figure 4) the D-values of all models improve compared to the NULL model, but all ADMs with V4 and V6 have a significantly higher D than the DISTANCE model. The ADMs with profiles conEpi-V0 and conYear-V2 have the lowest D-values.

Additional files 5 to 7 (Figures S1 to S3) show the ADM concentration plots for all emission curves and threshold wind velocities of areas A, B and C. Additional files 8 to 13 (Figures S4 to S9) show the predicted versus observed incidence rates per PC6 of areas A, B and C as a function of the selection radii of 5 and 10 km. Additional files 14 to 19 (Figures S10 to S15) show the geographical plots of the observed and predicted incidence rates.

Validation of the dose–response linearity

From equations [2] and [5] we inferred that ideally the slope of the log-linear fits (\( {\beta}_1 \)) would approximate 1 if the overall doses were relatively low. In area A, this boundary condition is met for the four best models (considering the 95% confidence interval) (Tables 3 and 4). In areas B and C, the condition is (almost) met for the majority of the models. These results support a linear dose–response relation.

Discussion

In the current study, we correlated observed Q fever incidence numbers to modelled averaged concentrations of C. burnetii from two simple concentration models (NULL and DISTANCE) and from an atmospheric dispersion model with varying emission profiles and threshold wind speeds. If all cases were uniformly distributed over the outbreak area, the model with the spatially homogeneous concentration (NULL model) should have had the best fit.

Instead, the DISTANCE model always performed better than the NULL model, which is possibly due to clustering of cases around the infected farms (a similar pattern was observed in a previous study [26]). In addition, the observed incidence numbers correlated significantly better with the concentrations of some ADMs in all areas (but with small differences compared to the DISTANCE model in one of the three areas), indicating that meteorological conditions might have played a role during the Dutch Q fever epidemic.

The best fitting ADMs all had a slope in the log-linear fit (\( \beta_1 \)) approximately equal to 1 or of the same order of magnitude. This might be an indication of relatively low doses, as is supported by C. burnetii measurements that were performed in 2009 [38].

In this study we applied a threshold wind speed for emission for two reasons. First, turbulent movements of the air are required to aerosolize the bacteria deposited on stable surfaces by dairy goats. Secondly, introduction of a threshold wind speed caused the annually averaged concentrations to be direction dependent (Figures S1-S3).

We conclude that, in general, the ADMs with a threshold wind speed of 0 or 2 m/s performed best. However, higher threshold wind speeds are generally required for aerosolization [39], especially in the case of rough terrain such as a farm environment. We think two physical explanations exist for the relatively better performance of the V0 and V2 profiles. Firstly, a sufficient amount of bacteria may have been aerosolized in the stable not only by the wind, but also by physical activity within the stable (e.g., feeding operations and movements of goats). Secondly, the surrounding environment of the goat stables may have been contaminated during the epidemic, given the high persistence of C. burnetii in the environment [35], and thus a larger surface source may have developed. Thus, cases could have been infected from a wider range of wind directions.

Additional files 8 to 13 (Figures S4 to S9) show the scatterplots of the predicted versus observed incidence rates in the three areas; Additional files 14 to 19 (Figures S10 to S15) and Additional file 2 (Figure S16) show the geographical prediction plots. These plots make clear that the Q fever incidence prediction should be improved further in a statistical sense. In general, the data are rather scattered and not close to the 1:1 line. We think several causes may explain the moderate predictability:

  1) Lack of actual emission data, as a result of which actual concentrations might have deviated significantly from modelled concentrations.

  2) Timing of concentration values (linked to point 1). Since time-dependent emission curves were unknown, we correlated observed incidence rates to cumulative concentration values. However, in reality, cases might have been infected after exposure to a particular dose or particular cumulative dose.

  3) Presence of a contaminated environment. The total Q fever epidemic in the Netherlands lasted four years with seasonal outbreaks. Although we specifically focused on 2009, infections also occurred in other years in the selected areas. Results from both bulk tank milk sampling [28] and goat vaginal swab sampling [38,40,41] indicated that the source farms had already been positive in 2008. Indeed, the farm in area B had already caused a human outbreak in 2008 [19]. This may have resulted in an already contaminated environment prior to 2009 [42,43], favoured by the relatively low decay rate of C. burnetii [35]. A combination of emission from infected farms as well as emission from contaminated surrounding environments could have resulted in the clear absence of a (wind) direction dependent incidence pattern in area B.

  4) Complex human mobility patterns: the health outcome of exposure to an infectious agent is generally dependent on the type and virulence of the pathogen, its concentration (or infectious dose), the exposure (both frequency and duration) and the immune status and general health status of the susceptible host. From that perspective, actual exposure is more difficult to assess in the case of humans than in the case of animals, since humans are very mobile. Although Dutch people spend approximately 70% of their time at home [44], it is very well possible that cases were infected at locations other than their own PC6, or that non-cases ‘missed’ out on days with high exposure at their PC6. In this study we did not have information on case activity patterns, although this might be important for a better risk analysis as a very small number of bacteria is able to cause infection [45,46]. Nevertheless, since the ADM correlation results were best for area C and since the results in our previous distance-based study showed much more contrast in area C compared to the other areas [26], we think that either the fraction of infected persons in area C that were indeed infected within their PC6 was higher than in the other areas, or other sources of C. burnetii were not present in this area (or a combination of both).

  5) Spatially heterogeneous awareness of Q fever among the population, resulting in a bias in the observed incidence.

  6) Protective immunity from childhood [30] or immunity caused by infections in 2007 or 2008. A recent study confirmed that in humans with acute Q fever the level of antibodies remains high for several years [47].

  7) Infection by other sources, either by a (small) unknown source in the areas themselves, or by a source in another part of the country.

Nevertheless, we showed that concentrations based on meteorological conditions correlated better with observed incidences than those of the NULL and DISTANCE based models, despite the fact that (1) actual emission data were lacking (thus simple emission profiles were useful), (2) the total exposure time was quite long, and (3) probable transmission by a contaminated environment could have influenced the observations.

We recommend repeating this study using similar data sets, and repeating it for outbreaks or releases with airborne transmission during a relatively short period. That way, airborne pathogenic transmission to humans could be separated more easily from transmission from a contaminated surrounding environment. In addition, it would be necessary to determine realistic emission strengths for C. burnetii to calculate exposure levels and infection probabilities using dose–response models [46].

To our knowledge this is the first study that attempts to quantify the applicability of an atmospheric dispersion model to a pathogen outbreak using human infections observed during that outbreak. Our results indicate that ADMs yield some promising results and that they can be used for livestock related outbreaks, although more extensive validation work under different circumstances is needed. This may allow ADMs to serve as tools for environmental planning purposes to visualize and predict the spread of microbes from farms and industries to surrounding human populations.