1 Introduction

The results of the UDINEE urban dispersion-modelling exercise are based on the Joint Urban 2003 (JU2003) experimental campaign conducted in Oklahoma City, U.S.A. (see Allwine and Flaherty 2006, 2007; Hernández-Ceballos et al. 2018a, b, and references therein). The observational data are characterized by a high-resolution timestep (0.5 s) and peaks in measured concentrations when the tracer traverses the station, resulting in measurement values sensitive to local fluctuations. However, as the peaks are often not long lasting, many models predict the highest tracer-concentration values at times either prior to or after the measured peaks, with some of the models underpredicting, while others overpredicting, the concentration magnitude.

Since the appearance of the ensemble technique applied to atmospheric dispersion modelling, many questions have been raised, such as how to present simulation results versus measurements, how to compare different models, and how to analyze the ensemble of the models (Straume et al. 1998; Bellasio et al. 1999; Dabberdt and Miller 2000; Delle Monache and Stull 2003; Galmarini et al. 2001, 2004). A number of indices can be applied, such as the factor-of-two index FAC2, which counts the model-predicted values lying between 0.5 and 2 times the measured value, and data can be compared using scatter diagrams and correlation coefficients. These indices have been used extensively, for example, in analyzing the results of the European Tracer Experiment (ETEX) (Girardi et al. 1998; Graziani et al. 1998; Van Dop and Nodop 1998; Mosca et al. 1998). Nevertheless, the optimal method for comparing model data and observations is not obvious. For urban dispersion simulations specifically, this problem has been investigated, for example, in Zhou and Hanna (2007) and Hanna and Chang (2012), whose work is the basis for the analysis of the UDINEE dispersion-modelling-exercise results presented in Hernández-Ceballos et al. (2018a, b). Motivated by theoretical considerations, we propose here a general approach for comparing modelled and observed data, specifically tailored to the analysis of ensemble-dispersion models and measurements of high temporal resolution. The approach can also be adapted for boundary-layer processes that are investigated using the ensemble-modelling technique.
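As an illustration of the factor-of-two index mentioned above, the following minimal sketch evaluates FAC2 as a fraction of paired values, which is a common convention and an assumption relative to the wording used here. The function name `fac2`, the sample values, and the choice to skip pairs with a zero observation are all hypothetical.

```python
def fac2(predicted, observed):
    """Fraction of pairs with 0.5 * O <= C <= 2 * O.

    Pairs with a non-positive observation are skipped (an assumption).
    """
    pairs = [(c, o) for c, o in zip(predicted, observed) if o > 0.0]
    if not pairs:
        return float("nan")
    hits = sum(1 for c, o in pairs if 0.5 * o <= c <= 2.0 * o)
    return hits / len(pairs)


# Hypothetical paired values: two of the four predictions are within a factor of two.
score = fac2([1.0, 3.0, 0.6, 10.0], [1.0, 1.0, 1.0, 4.0])
```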

Section 2 describes a general concept for model comparison, Sect. 3 deals with the proposed method for ensemble analysis, Sect. 4 presents a simple example of such an analysis related to the UDINEE exercise, and Sect. 5 contains conclusions.

2 Comparison of the Results and Presentation

Assume two datasets are \( \{ C_{i} \}_{i = 0}^{n} , \{ O_{i} \}_{i = 0}^{n} \), where Ci and Oi are the simulated and observed concentrations, respectively, at time ti=t0 + iΔt, with i = 0,…,n (the timestep Δt is fixed, but could also be variable in principle), and define interpolation functions for the interval [t0, tn] for both the modelled data and observations in general as

$$ C\left( t \right) = {\text{Interpol}}(\{ t_{i} \}_{i = 0}^{n} ,\left\{ {C_{i} } \right\}_{i = 0}^{n} ), $$
(1a)
$$ O\left( t \right) = {\text{Interpol}}(\{ t_{i} \}_{i = 0}^{n} ,\left\{ {O_{i} } \right\}_{i = 0}^{n} ), $$
(1b)

where Interpol is an interpolation function (for example linear) over the whole interval [t0, tn]. The typical norm for integrable functions,

$$ \left\| C \right\|_{t,\tau } = \mathop \int \limits_{t - \tau }^{t} \left| {C\left( t \right)} \right|dt, $$
(2a)
$$ \left\| O \right\|_{t,\tau } = \mathop \int \limits_{t - \tau }^{t} \left| {O\left( t \right)} \right|dt , $$
(2b)

can be used to estimate the modelled and measured integrated concentrations, respectively, in the time interval [t − τ, t], where the parameter τ is the length of the interval over which the results are integrated (i.e. the time-integration interval).
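A minimal sketch of Eqs. 1 and 2 may help fix ideas. It assumes linear interpolation, non-negative concentrations (so that the absolute value can be dropped), and values held constant outside the sampled range. The names `interpol` and `integrated_concentration` and the sample data are illustrative only.

```python
from bisect import bisect_right


def interpol(times, values, t):
    """Linear interpolation at time t; clamped outside the data range (an assumption)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    k = bisect_right(times, t) - 1
    w = (t - times[k]) / (times[k + 1] - times[k])
    return (1.0 - w) * values[k] + w * values[k + 1]


def integrated_concentration(times, values, t, tau):
    """Integral of the linear interpolant over [t - tau, t] for non-negative values."""
    a, b = t - tau, t
    # Split the window at the sample times so each piece is linear
    # and the trapezoidal rule is exact on it.
    knots = [a] + [s for s in times if a < s < b] + [b]
    total = 0.0
    for left, right in zip(knots, knots[1:]):
        mean_height = 0.5 * (interpol(times, values, left) + interpol(times, values, right))
        total += mean_height * (right - left)
    return total


times = [0.0, 1.0, 2.0, 3.0]   # hypothetical sample times (s)
conc = [0.0, 2.0, 2.0, 0.0]    # hypothetical concentrations
dose = integrated_concentration(times, conc, t=2.5, tau=2.0)
```

Because the timestep is not required to be uniform in this sketch, the same routine applies to variable Δt.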

We also define the positive and negative parts of the difference between the modelled and observed values as

$$ \left( {C{-}O} \right)^{ + } \left( t \right) = { \hbox{max} }\left\{ {C\left( t \right) \, {-} \, O(t); \, 0} \right\},\left( {C{-}O} \right)^{ - } (t) = {-}{ \hbox{min} }\left\{ {C\left( t \right) \, {-} \, O(t); \, 0} \right\}, $$
(3)

with

$$ \left\| {\left( {C - O} \right)^{ + } } \right\|_{{_{t,\tau } }} = \mathop \int \limits_{t - \tau }^{t} (C - O)^{ + } \left( t \right)dt,\left\| {\left( {C - O} \right)^{ - } } \right\|_{t,\tau } = \mathop \int \limits_{t - \tau }^{t} (C - O)^{ - } \left( t \right)dt . $$
(4)

The positive (negative) part shows the integrated concentration resulting from model overprediction (underprediction) in the interval [t − τ, t]. The total error between the measured and simulated concentrations in this interval is

$$ \left\| {C - O} \right\|_{t,\tau } = \mathop \int \limits_{t - \tau }^{t} \left| {C\left( t \right) - O\left( t \right)} \right|dt , $$
(5)

which is equal to the sum \( \left\| {\left( {C - O} \right)^{ + } } \right\|_{t,\tau } + \left\| {\left( {C - O} \right)^{ - } } \right\|_{t,\tau } \).

Let us define the following indicators,

$$ R_{\tau } \left( t \right) = \frac{{\left\| C \right\|_{t,\tau } }}{{\left\| O \right\|_{t,\tau } }} , $$
(6a)
$$ N_{\tau }^{ + } \left( t \right) = \frac{{\left\| {(C - O)^{ + } } \right\|_{t,\tau } }}{{\left\| {(C - O)^{ + } } \right\|_{t,\tau } + \left\| {(C - O)^{ - } } \right\|_{t,\tau } }}, $$
(6b)
$$ N_{\tau }^{ - } \left( t \right) = \frac{{\left\| {(C - O)^{ - } } \right\|_{t,\tau } }}{{\left\| {(C - O)^{ + } } \right\|_{t,\tau } + \left\| {(C - O)^{ - } } \right\|_{t,\tau } }}, $$
(6c)

where Rτ(t) is a general index determining whether the model overpredicts or underpredicts concentration values in the time interval [t − τ, t], and \( N_{\tau }^{ + } \left( t \right), N_{\tau }^{ - } \left( t \right) \) show the mutual relation between the overpredictions and underpredictions, respectively, with \( N_{\tau }^{ + } \left( t \right) + N_{\tau }^{ - } \left( t \right) = 1 \) for any t and τ.
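The indicators of Eq. 6 can be sketched for a single integration window as follows. This is a simplified illustration: it assumes uniformly spaced samples, non-negative concentrations, and the trapezoidal rule applied to the samples (which ignores sign changes of C − O between samples). The function names and data are hypothetical.

```python
def trapezoid(values, dt):
    """Trapezoidal rule for uniformly spaced samples (spacing dt)."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def indicators(c, o, dt):
    """Return (R, N_plus, N_minus) for one window of samples.

    Assumes a non-zero observed integral and at least some
    disagreement between model and observation.
    """
    diff = [ci - oi for ci, oi in zip(c, o)]
    over = trapezoid([max(d, 0.0) for d in diff], dt)    # ||(C - O)^+||
    under = trapezoid([max(-d, 0.0) for d in diff], dt)  # ||(C - O)^-||
    r = trapezoid(c, dt) / trapezoid(o, dt)
    return r, over / (over + under), under / (over + under)


c = [0.0, 2.0, 4.0, 2.0, 0.0]   # hypothetical model values
o = [0.0, 1.0, 5.0, 3.0, 0.0]   # hypothetical observations
r, n_plus, n_minus = indicators(c, o, dt=0.5)
```

For these values the model underpredicts overall (R below one), and two thirds of the total error comes from underprediction.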

Depending on the chosen integration interval, the index Rτ(t) can be treated either globally or locally, which shows global behaviour in the case when the interval τ is the length of the whole period considered, revealing in principle to what extent the total integrated concentration is in accordance with the observed value. It is probably more reasonable to select an integration time longer than the temporal resolution, but not too long, to still enable the capture of local effects such as peaks (depending on the duration of the peak). The index Rτ(t) can also be further analyzed statistically for all measurement stations, for example, by taking the mean, median, and standard deviation of the concentration. Note that the constant value Rτ(t) = 1 demonstrates the best agreement between observations and simulated values. The comparison of simulated values with measurements can be presented as a function of time for the consecutive integration times. The same time series can be obviously used for comparison among models and for the analysis of the whole ensemble of models, which may be illustrated by presentation of hypothetical measurement and model data in Fig. 1. Figure 2 shows the values of the index Rτ(t) for an integration time five times longer than the temporal resolution of the data.

Fig. 1

Hypothetical observations and model results as a function of time with a temporal resolution of 0.1 s

Fig. 2

The index Rτ(t) for the integration time τ = 0.5 s (five times longer than the temporal resolution)

However, it is also possible to present on the same axes both the index Rτ(t) and additional information on the relation between the overpredictions and underpredictions via the indicators \( N_{\tau }^{ + } \left( t \right), N_{\tau }^{ - } \left( t \right) \) by defining an angle \( \varphi \left( t \right) \) based on the ratio between these two indices, \( { \tan }\varphi \left( t \right) = \frac{{N_{\tau }^{ - } \left( t \right)}}{{N_{\tau }^{ + } \left( t \right)}} \). Here, the angle \( \varphi \left( t \right) = \pi /4 \) corresponds to an equal amount of overprediction and underprediction, with φ = 0 for only overprediction, and φ = π/2 for only underprediction (see Fig. 3). For clearer presentation, we transform this angle to ψ = \( 2\varphi \), giving ψ = π/2 as the balance between over- and underprediction, where ψ < π/2 for overprediction, and ψ > π/2 for underprediction. Note that the value ψ = π/2 does not imply the absence of overprediction or underprediction, but only an equivalent degree of overprediction and underprediction.

Fig. 3

Construction of the coordinates (Rτ(t), ψτ(t))

Now we present values in the polar coordinate system (Rτ(t), ψτ(t)) in the upper half-space as 0 ≤ ψτ(t) ≤ π. The observational values are at the fixed point (1, π/2) (i.e. (0, 1) in the Cartesian coordinate system) for all times and the integration interval τ. The function t → (Rτ(t), ψτ(t)) describes how far and on which side of the observational value the model-predicted concentrations are located: points located on the right-hand (left-hand) side imply a greater overprediction (underprediction). The arc value Rτ(t) = 1 shows the threshold between the total overpredicted and underpredicted integrated concentrations. One can also add curves presenting factors of two or five—if necessary, a logarithmic scale can also be used. As an example, the curve t → (Rτ(t), ψτ(t)) with an integration time of τ = 0.5 s is presented in Fig. 4 (for the hypothetical data shown in Fig. 1), noting that the measurement point is always at a fixed point (at (0, 1) in the Cartesian coordinate system). In Fig. 4, two t → (Rτ(t),ψτ(t)) curves are shown for two models: the scales of the axes are not preserved for better visualization. Such a graph enables observation of the behaviour of the models in consecutive timesteps for the assumed integration time.

Fig. 4

The curve t → (Rτ(t),ψτ(t)) for two hypothetical models (integration time τ = 0.5 s), with x = Rτ(t)cosψτ(t) and y = Rτ(t)sinψτ(t)
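The mapping to the plotted coordinates can be sketched as below. The sketch follows the convention stated in the text that φ = 0 corresponds to pure overprediction and φ = π/2 to pure underprediction, so that points with ψ < π/2 fall on the right-hand side. The function name `polar_point` is hypothetical, and `atan2` is used only to avoid a division by zero when one of the indicators vanishes.

```python
import math


def polar_point(r, n_plus, n_minus):
    """Cartesian coordinates (x, y) of the point (R, psi) with psi = 2 * phi."""
    phi = math.atan2(n_minus, n_plus)   # in [0, pi/2] for non-negative inputs
    psi = 2.0 * phi
    return r * math.cos(psi), r * math.sin(psi)


balanced = polar_point(1.0, 0.5, 0.5)     # the location of the observation
pure_over = polar_point(2.0, 1.0, 0.0)    # overprediction only, R = 2
pure_under = polar_point(0.5, 0.0, 1.0)   # underprediction only, R = 0.5
```

With these conventions a perfect model sits at the same location as the observation, near (0, 1), while pure overprediction lies on the positive x axis and pure underprediction on the negative x axis.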

A few general remarks on this presentation method:

1. In some cases, it can be relevant to consider only measurements and model data above some threshold. A possible reason can be the removal of low values considered as noise, or the interest in simply identifying peaks in the data (then a percentage of the peak value can be defined as the threshold), or the situation when the exceedance of some limit values is the main purpose of the analysis. When a threshold value is used, an appropriate choice of the analyzed time period should be made, for example, by taking the first and the last time points when the threshold is exceeded, for either observations or model data.

2. The time-integration interval τ can be any value starting from the timestep Δt up to the whole interval [t0, tn]. In any case, integration over an interval is related to taking an average for this interval. Obviously, a shorter interval time enables a more detailed analysis.

3. One of the general drawbacks of such an analysis is related to the fact that it is based on only point measurements. Since the model spatial resolution is usually only a few metres, and concentration peaks can appear in a short period of time, the peak concentration predicted by the model may also be shifted in space by a few metres compared with the measured peak concentration. Therefore, it would be reasonable to consider an average concentration over some area, and formally this leads to the relation (instead of Eq. 5),

$$ \left\| {C - O} \right\|_{t,\tau } = \mathop \int \limits_{t - \tau }^{t} \left| {\frac{1}{\left| A \right|}\mathop \smallint \limits_{A}^{{}} C\left( {t,x} \right)dx - O\left( t \right)} \right|dt , $$
(5′)

where the model concentration is integrated over some area A and averaged, with |A| the area measure. Actually, while models often use such a formulation, the choice of the value of A can be a delicate matter.
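A small sketch of Eq. 5′ shows why area averaging can matter. It assumes that the area A is covered by grid cells of equal size, so that the area average reduces to an arithmetic mean over cells, and that samples are uniformly spaced in time. The function name and the data are hypothetical.

```python
def area_averaged_error(model_cells, o, dt):
    """Trapezoidal approximation of Eq. 5' for equally sized cells.

    model_cells[k][i] is the model concentration in cell k at time index i.
    """
    n_cells = len(model_cells)
    avg = [sum(cell[i] for cell in model_cells) / n_cells for i in range(len(o))]
    err = [abs(a - oi) for a, oi in zip(avg, o)]
    return dt * (sum(err) - 0.5 * (err[0] + err[-1]))


# Two neighbouring cells: the modelled peak passes through them at different times.
cells = [[0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
obs = [0.0, 2.0, 2.0]
error_single_cell = area_averaged_error([cells[0]], obs, dt=1.0)
error_area = area_averaged_error(cells, obs, dt=1.0)
```

In this constructed example the single-cell comparison reports a sizeable error, while the two-cell average matches the observation exactly. Real cases will rarely be this clean, and the result depends on the choice of A.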

3 How to Analyze the Ensemble of the Models?

While there are various ways to perform an analysis of the whole model ensemble, statistical analysis is usually the preferred method. However, the question arises as to which indicators should be applied and what they mean. Here, the error between the measurement O and the single model M is determined by the integral \( \left\| {M - O} \right\|_{t,\tau } = \mathop \int \nolimits_{t - \tau }^{t} \left| {M\left( t \right) - O\left( t \right)} \right|dt \). Suppose we are interested in finding the representative function for the ensemble of the models. Using the same measure of error, we seek a function that minimizes the total error between this representative and the individual models (the ensemble members). In other words, we look for the function defined by the following optimization problem (see also Galmarini and Potempski 2012), i.e. find the function M* such that

$$ \mathop \sum \limits_{j = 1}^{m} \left\| {M^{*} - M_{j} } \right\|_{t,\tau } = \mathop {\inf }\limits_{M} \mathop \sum \limits_{j = 1}^{m} \left\| {M - M_{j} } \right\|_{t,\tau }, $$

where {Mj} is the set of the (interpolation) functions representing the results from m models.

The quantity \( \mathop \sum \nolimits_{j = 1}^{m} \left\| {M^{*} - M_{j} } \right\|_{t,\tau } \) can be considered as a total spread of the ensemble—hence logically the function M*, which is supposed to reflect the behaviour of the whole ensemble, should be chosen to minimize this spread. It should be stressed that this does not necessarily imply that the value \( \left\| {M^{*} - O} \right\|_{t,\tau } \) minimizes the error between the ensemble representative M* and the measurement. The function M* is that which best characterizes the full set of ensemble models based on this measure.

The relation between the ensemble spread and the error between the measurement and ensemble representative can be expressed using the following inequalities,

$$ \begin{aligned} \left\| {M^{*} - O} \right\|_{t,\tau } \le \left\| {M^{*} - M_{j} } \right\|_{t,\tau } + \left\| {M_{j} - O} \right\|_{{_{t,\tau } }} ,{\text{for}}\;j \, = 1, \ldots ,m, \end{aligned} $$
(7)

which, summed over j and divided by m, yield

$$ \begin{aligned} \left\| {M^{*} - O} \right\|_{t,\tau } \le \frac{1}{m}\sum\limits_{j = 1}^{m} {\left\| {M^{*} - M_{j} } \right\|_{t,\tau } } + \frac{1}{m}\sum\limits_{j = 1}^{m} {\left\| {M_{j} - O} \right\|_{t,\tau } } \end{aligned} . $$
(8)

The above relation shows that the error between the ensemble representative and the measurements is bounded from above by the sum of the average spread and the average error between the ensemble models and observations. The meaning of this relation is analogous to the accuracy-diversity equation applied in many contexts, such as in the fields of machine learning and neural networks (see Krogh and Vedelsby 1995 or Opitz and Shavlik 1996), where this type of expression is specifically used for the ensemble mean and mean squared error. The norms we use here are related to the time-integrated concentration (or doses) rather than to the concentration itself.
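Inequality (8) can be checked numerically on a toy ensemble. The sketch below assumes uniformly spaced samples, approximates each integral by a simple rectangle rule, and takes the pointwise median as the representative M*. The three model series and the observations are invented for illustration.

```python
from statistics import median


def l1_norm(f, g, dt):
    """Rectangle-rule approximation of the integral of |f - g|."""
    return dt * sum(abs(a - b) for a, b in zip(f, g))


models = [
    [0.0, 1.0, 4.0, 1.0],
    [0.0, 2.0, 2.0, 2.0],
    [1.0, 3.0, 5.0, 0.0],
]
obs = [0.0, 2.0, 3.0, 1.0]
dt = 1.0

rep = [median(col) for col in zip(*models)]   # pointwise ensemble median
lhs = l1_norm(rep, obs, dt)                   # error of the representative
avg_spread = sum(l1_norm(rep, m, dt) for m in models) / len(models)
avg_error = sum(l1_norm(m, obs, dt) for m in models) / len(models)
```

Here the left-hand side equals 1, while the right-hand side is 8/3 + 3, so the bound holds with a wide margin. The inequality is an upper bound only and says nothing about how tight it is in a given case.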

As the second term in (8) is independent of the chosen ensemble representative, it is justified that the function minimizing the first term of the right-hand side (i.e. the average spread) should be taken as M*. To prove that the median function is the solution of the minimization problem posed above, let us first define the median function formally,

$$ \begin{aligned} {\text{Med}}\left( t \right) = \left\{ {\begin{array}{*{20}l} {M_{{\left[ {\frac{m}{2}} \right] + 1}} \left( t \right)} \hfill & {\quad {\text{if}}\; m \; {\text{is}}\;{\text{odd}}} \hfill \\ {\dfrac{{M_{{\left[ {\frac{m}{2}} \right]}} \left( t \right) + M_{{\left[ {\frac{m}{2}} \right] + 1}} \left( t \right)}}{2}} \hfill & {\quad {\text{if}}\; m\; {\text{is}}\;{\text{even}}} \hfill \\ \end{array} } \right. \hfill \\ \end{aligned} $$
(9)

where [x] denotes the largest integer ≤ x and, for each fixed t, the values Mj(t) are indexed in ascending order (in fact, any value between \( M_{{\left[ {\frac{m}{2}} \right]}} \) and \( M_{{\left[ {\frac{m}{2}} \right] + 1}} \) can be chosen for an even number of models).

We need to show that for any function M, we have

$$ \mathop \sum \limits_{j = 1}^{m} \left\| {{\text{Med}} - M_{j} } \right\|_{t,\tau } \le \mathop \sum \limits_{j = 1}^{m} \left\| {M - M_{j} } \right\|_{t,\tau } , $$
(10)

and since

$$ \mathop \sum \limits_{j = 1}^{m} \left\| {M - M_{j} } \right\|_{t,\tau } = \mathop \sum \limits_{j = 1}^{m} \mathop \int \limits_{t - \tau }^{t} |M\left( t \right) - M_{j} \left( t \right)|dt = \mathop \int \limits_{t - \tau }^{t} \mathop \sum \limits_{j = 1}^{m} \left| {M\left( t \right) - M_{j} \left( t \right)} \right|dt , $$
(11)

it is sufficient to show that the sum \( \mathop \sum \nolimits_{j = 1}^{m} \left| {M\left( t \right) - M_{j} \left( t \right)} \right| \) for each point t is minimized by the median. Without loss of generality, we assume that the values Mj(t) are in ascending order, i.e., Mj(t)≤ Mj+1(t) for any fixed point t. Obviously, for any value v outside the interval [M1(t), Mm(t)], we have \( \mathop \sum \nolimits_{j = 1}^{m} \left| {{\text{Med}}\left( t \right) - M_{j} \left( t \right)} \right| \le \mathop \sum \nolimits_{j = 1}^{m} \left| {v - M_{j} \left( t \right)} \right| \). Hence, suppose that \( v \in [M_{k} \)\( ,M_{k + 1} ] \) for some k, and assume that \( k \le \left[ {\frac{m}{2}} \right] \), then \( \mathop \sum \nolimits_{j = 1}^{m} \left| {{\text{Med}}\left( t \right) - M_{j} \left( t \right)} \right| = \mathop \sum \nolimits_{j = 1}^{{\left[ {m/2} \right]}} \left( {M_{m - j + 1} \left( t \right) - M_{j} \left( t \right)} \right) \), so that for the value \( v \in \left[ {M_{k} ,M_{k + 1} } \right] \), we have

$$ \begin{aligned} & \mathop \sum \limits_{j = 1}^{m} \left| {v - M_{j} \left( t \right)} \right| = \mathop \sum \limits_{j = 1}^{k} \left( {M_{m - j + 1} \left( t \right) - M_{j} \left( t \right)} \right) + \mathop \sum \limits_{j = k + 1}^{{\left[ {m/2} \right]}} \left[ {\left( {M_{j} \left( t \right) - v} \right) + (M_{m - j + 1} \left( t \right) - v} )\right] \\ & = \mathop \sum \limits_{j = 1}^{k} \left( {M_{m - j + 1} \left( t \right) - M_{j} \left( t \right)} \right) + \mathop \sum \limits_{j = k + 1}^{{\left[ {m/2} \right]}} \left[ {\left( {M_{m - j + 1} \left( t \right) - M_{j} \left( t \right)} \right) + 2(M_{j} \left( t \right) - v} )\right] \end{aligned} . $$
(12)

As the last term in the second sum is always non-negative (since Mj(t) ≥ v for j > k), we obtain \( \mathop \sum \nolimits_{j = 1}^{m} \left| {{\text{Med}}\left( t \right) - M_{j} \left( t \right)} \right| \le \mathop \sum \nolimits_{j = 1}^{m} \left| {v - M_{j} \left( t \right)} \right| \). The case \( k > m/2 \) can be treated analogously, and, as this reasoning is valid for any time t and any value \( v \), inequality (10) follows.
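The pointwise statement behind inequality (10) can be checked by brute force at a single time t. The sketch below compares the sum of absolute deviations at the median with the same sum at a grid of candidate values. The ensemble values and the grid are hypothetical, and a grid search is of course only a sanity check and not a proof.

```python
from statistics import median


def abs_spread(v, values):
    """Sum of absolute deviations of the ensemble values from v."""
    return sum(abs(v - x) for x in values)


values = [0.2, 1.5, 0.9, 7.0, 1.1]               # model values at one fixed time
med = median(values)
candidates = [k * 0.01 for k in range(0, 801)]   # trial values v in [0, 8]
median_is_best = all(
    abs_spread(med, values) <= abs_spread(v, values) + 1e-12 for v in candidates
)
```

Note that the outlier 7.0 does not shift the median, which stays at 1.1.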

The main purpose of using an ensemble approach concerns the problem of model predictability. It is expected that the average of the ensemble represents the most probable realization of physical processes, while the spread is related to the inherent uncertainty, and shows the range of other possible realizations. Having this in mind, we assume that the ensemble spread is an indicator representing the uncertainty of the results. In order to examine this quantitatively, we use the following quantity,

$$ \frac{{\mathop \sum \nolimits_{j = 1}^{m} \left\| {M^{*} - M_{j} } \right\|_{t,\tau } }}{{\left\| {M^{*} } \right\|_{t,\tau } }} $$
(13)

or the average spread, to observe the degree of discrepancy between models in comparison with the average concentration determined by the ensemble (according to the above consideration, the median is a good choice for M*). These indicators depend on the problem under consideration—in principle, if the spread is a small percentage of the concentration values, then there is quite good agreement among models, which suggests low uncertainty. In contrast, several reasons can cause a higher spread, such as inherent uncertainties associated with the problem, limitations of the models, and various difficulties in modelling physical phenomena. In the case when measurements are additionally available, taking into account the inequality (8), one can check the relation between the ensemble spread \( \mathop \sum \nolimits_{j = 1}^{m} \left\| {M^{*} - M_{j} } \right\|_{t,\tau } \) and the ensemble error \( \mathop \sum \nolimits_{j = 1}^{m} \left\| {M_{j} - O} \right\|_{t,\tau } \) or \( \left\| {M^{*} - O} \right\|_{t,\tau } \), representing the error of the whole ensemble, where the median function can be taken as M*.

Three cases can be considered:

  • the ensemble spread is small in comparison with the ensemble error: this shows the situation when there are probably more fundamental difficulties with modelling the problem;

  • the ensemble spread and the ensemble error are comparable: this is the case when similar agreement is within the ensemble and with the measurements; hence, the uncertainty should not be too high;

  • the ensemble spread is high in comparison with the ensemble error: this indicates high uncertainty, which could be caused by different factors, as already mentioned.
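To place a given case among the three situations above, one can compute the ratio of the total spread to the size of the representative (Eq. 13) and the ratio of the total spread to the error of the representative. The sketch below does this under simplifying assumptions: uniformly spaced samples, a rectangle-rule integral, the median as M*, and a non-zero error. No threshold separating "small" from "high" is implied. The names and data are hypothetical.

```python
from statistics import median


def l1_norm(f, g, dt):
    """Rectangle-rule approximation of the integral of |f - g|."""
    return dt * sum(abs(a - b) for a, b in zip(f, g))


def spread_ratios(models, obs, dt):
    """Return (spread / ||M*||, spread / ||M* - O||) with M* the pointwise median."""
    rep = [median(col) for col in zip(*models)]
    zero = [0.0] * len(rep)
    spread = sum(l1_norm(rep, m, dt) for m in models)   # total ensemble spread
    return spread / l1_norm(rep, zero, dt), spread / l1_norm(rep, obs, dt)


models = [
    [0.0, 1.0, 4.0, 1.0],
    [0.0, 2.0, 2.0, 2.0],
    [1.0, 3.0, 5.0, 0.0],
]
obs = [0.0, 2.0, 3.0, 1.0]
relative_spread, spread_to_error = spread_ratios(models, obs, dt=1.0)
```

In this toy case the spread is eight times the error of the median, which would correspond to the third situation in the list.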

We now present a few additional remarks concerning the representation of the ensemble results. If we are interested in the root-mean-square error expressed by the norm in L2 space, i.e. \( \left\| {M - O} \right\|_{t,\tau } = \left( {\mathop \int \nolimits_{t - \tau }^{t} |M\left( t \right) - O\left( t \right)|^{2} dt} \right)^{1/2} \) and, consequently, wish to find the representative function by looking for the solution of the optimization problem, \( \sqrt {\mathop \sum \nolimits_{j = 1}^{m} \left\| {M^{*} - M_{j} } \right\|^{2}_{t,\tau } } = \mathop { \inf }\nolimits_{M} \sqrt {\mathop \sum \nolimits_{j = 1}^{m} \left\| {M - M_{j} } \right\|^{2}_{t,\tau } } \), it can then be shown that the mean function \( {\text{Mean}}\left( t \right) = \frac{1}{m}\mathop \sum \nolimits_{j = 1}^{m} M_{j} \left( t \right) \) should be the representative function for the whole ensemble. The simplest formal proof considers the functional (on a suitable function space) \( F\left( M \right) = \mathop \int \nolimits_{t - \tau }^{t} \mathop \sum \nolimits_{j = 1}^{m} \left( {M\left( t \right) - M_{j} \left( t \right)} \right)^{2} dt \) and sets its Gâteaux derivative in an arbitrary direction P to zero,

$$ \frac{{dF\left( {M + hP} \right)}}{dh}|_{h = 0} = \mathop \int \nolimits_{t - \tau }^{t} \mathop \sum \limits_{j = 1}^{m} \frac{d}{dh}\left( {M\left( t \right) + hP\left( t \right) - M_{j} \left( t \right)} \right)^{2} |_{h = 0} dt = $$
$$ \mathop \int \limits_{t - \tau }^{t} 2\mathop \sum \nolimits_{j = 1}^{m} \left( {M\left( t \right) - M_{j} \left( t \right)} \right)P\left( t \right)dt = \, 0. $$
(14)

As this equation should be valid for any function P(t), we immediately obtain \( \mathop \sum \nolimits_{j = 1}^{m} \left( {M\left( t \right) - M_{j} \left( t \right)} \right) = 0 \), which produces the relation for the mean. A general framework for finding the optimal combination of ensembles can be found in Potempski and Galmarini (2009).

Similarly, one can use the supremum norm (Chebyshev norm) for the error, i.e. \( \left\| {M - O} \right\|_{t,\tau } = \mathop {\sup}\nolimits_{{z \in \left[ {t - \tau ,t} \right]}} \left| {M\left( z \right) - O\left( z \right)} \right| \) and, consequently, consider the optimization problem for this norm. It is easy to check that the midpoint function

$$ {\text{Mid}}\left( t \right) = \frac{{\mathop {\hbox{min} }\limits_{j} M_{j} \left( t \right) + \mathop {\hbox{max} }\limits_{j} M_{j} \left( t \right)}}{2} $$
(15)

is the solution of this minimization problem.
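The two statements above, for the mean and for the midpoint, can be checked pointwise in the same brute-force manner. The sketch assumes a fixed time t, invented ensemble values, and a finite grid of trial values. For the supremum case it checks that the midpoint minimizes the largest deviation from any ensemble member.

```python
def squared_spread(v, values):
    """Sum of squared deviations of the ensemble values from v."""
    return sum((v - x) ** 2 for x in values)


def sup_spread(v, values):
    """Largest absolute deviation of any ensemble value from v."""
    return max(abs(v - x) for x in values)


values = [0.2, 1.5, 0.9, 7.0, 1.1]               # model values at one fixed time
mean = sum(values) / len(values)
mid = 0.5 * (min(values) + max(values))
candidates = [k * 0.01 for k in range(0, 801)]   # trial values v in [0, 8]

mean_is_best = all(
    squared_spread(mean, values) <= squared_spread(v, values) + 1e-9 for v in candidates
)
mid_is_best = all(
    sup_spread(mid, values) <= sup_spread(v, values) + 1e-12 for v in candidates
)
```

For these values the single outlier 7.0 pulls the mean to about 2.14 and the midpoint to 3.6, well above the bulk of the ensemble.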

Summarizing all these points, different representations of the ensemble are related to a different metric applied for measuring the spread of the ensemble, which, due to the properties of these metrics, can be expressed in the following ways:

  a. The midpoint with the corresponding spread (expressed as \( \mathop {\sup }\nolimits_{{z \in \left[ {t - \tau ,t} \right]}} \left| {{\text{Mid}}\left( z \right) - M_{j} \left( z \right)} \right| \)) defines the rectangular region containing all model values. This spread can be considered as the worst-case scenario as it shows the maximum discrepancy between the ensemble representative and the model, and, hence, is the most sensitive to the outliers.

  b. The mean value with its spread \( \sqrt {\mathop \sum \nolimits_{j = 1}^{m} \left\| {{\text{Mean}} - M_{j} } \right\|^{2}_{t,\tau } } \) defines the circle containing the model values, which would correspond to the minimization of the variance if the model results were treated as random variables, and, hence, in some sense, gives the value loaded with the smallest uncertainty. This also corresponds to the fact that the mean squared error for the ensemble mean is less than that for any ensemble member (a more exhaustive explanation is given in Rougier 2016). In relation to climate models, see Christiansen (2018), where the detailed investigations are based on the assumptions that the models have a normal distribution with different variances. Concerning the problem of dimensionality described in Christiansen (2018), one can also mention Riccio et al. (2012), where the problem of the reduction of data complexity has been investigated to deal with this issue.

  c. The median is the ensemble representative being possibly the least sensitive to the outliers. A general property of the median is such that it minimizes the mean absolute error associated with the random variable, i.e., it corresponds in some way to the bias.

However, there are also other possible less sensitive ensemble representatives than the mean, such as the winsorized mean or the trimmed mean. In both cases, the first step is to choose the limit percentile and either replace the outliers defined by this percentile by the nearest values within the percentile (for the winsorized mean) or simply remove them (for the trimmed mean) before calculating the mean. However, while this approach needs the definition of the limit percentile that, in general, is case dependent, the method can nonetheless be useful as a reasonable estimator if properly applied. The winsorized mean can also be presented in the form based on the notion of norms, but with a specific weighted norm that depends on the chosen percentile.
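The two robust means can be sketched as follows. For simplicity the limit is expressed here as a number k of values treated as outliers in each tail and not as a percentile, which is an assumption of this sketch. It also assumes 2k is smaller than the ensemble size. Function names and data are hypothetical.

```python
def trimmed_mean(values, k):
    """Mean after removing the k smallest and k largest values."""
    s = sorted(values)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values
    by the nearest remaining value."""
    s = sorted(values)
    lo, hi = s[k], s[len(s) - k - 1]
    clipped = [min(max(v, lo), hi) for v in s]
    return sum(clipped) / len(clipped)


values = [0.2, 0.9, 1.1, 1.5, 7.0]    # model values at one fixed time
plain = sum(values) / len(values)     # about 2.14, dominated by the outlier
trimmed = trimmed_mean(values, 1)     # mean of 0.9, 1.1, 1.5
winsorized = winsorized_mean(values, 1)
```

Both estimates stay close to the median (1.1) for this example, whereas the plain mean is pulled towards the outlier.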

4 Example for the UDINEE Exercise

Here, a simple example of the application of the analysis method described above is demonstrated for the UDINEE exercise, using data from the second puff release of IOP4 (IOP – intensive operating period: time horizon of puff release) at sensor location L11, the fourth puff of IOP6 at station L15, and the first and second releases of IOP9 at the stations L05 and L17, respectively (see Fig. 5, and see Hernández-Ceballos et al. (2018a, b) for a full description of the UDINEE project). This selection was made because of the relatively good quality of the measurements. The main peaks are observed in the following periods: 160–180 s for IOP4, 110–180 s for IOP6, 330–370 s for the first puff of IOP9, and 210–320 s for the second puff of IOP9. Data are available from five models in the first two cases, and from six models in the last two cases.

Fig. 5

Measurement data for a IOP 4, puff 2, station L11; b IOP6, puff 4, station L15; c IOP9, puff 1, station L5 and d IOP9, puff 2, station L17

First we present in Fig. 6 the values of the index Rτ(t) for τ = 20 s and τ = 30 s in consecutive timesteps for the five models, the mean and median for IOP4 (due to the high values, logarithmic values of concentration have been used in Fig. 6). Both values of τ give a decrease in the values of the index Rτ(t) starting at about 150 s, with higher values observed for several models starting from 600 s, illustrating that the choice of τ is important—a sharp peak for model 2 at time 100 s appears only in Fig. 6a for the shorter interval time τ = 20 s (model 2 also produces two zero values from 1000–1200 s for τ = 20 s).

Fig. 6

Indicator values Rτ(t) for a τ = 20 s and b τ = 30 s for IOP 4, puff 2, station L11

Values of Rτ(t) for τ = 30 s are presented in Fig. 7 for IOP6 and two puffs of IOP9, showing the common feature that, in the peak period, the values of Rτ(t) for all models improve (i.e. move closer to one), whereas outside the peak periods there are models for which this value is reduced.

Fig. 7

Indicator values Rτ(t) for τ = 30 s: a IOP6, puff 4, station L15; b IOP9, puff 1, station L05; c IOP9, puff 2, station L17

In Fig. 8, t → (Rτ(t), ψτ(t)) curves for IOP4 (corresponding to Fig. 6b) are shown for times up to 720 s, illustrating that most of the values are located on the right-hand side of the diagram, which corresponds in general to model overprediction. Restricting the time horizon to the interval from 130–230 s in Fig. 9, corresponding to the time of peak concentration, we demonstrate that the curves for τ = 10 s have a better time resolution.

Fig. 8

The t → (Rτ(t), ψτ(t)) curves for τ = 30 s for IOP 4, puff 2, station L11

Fig. 9

The t → (Rτ(t), ψτ(t)) curves for IOP 4, puff 2, station L11 from 130–230 s showing the five models in the left column, (a) and (c), and the mean and median models in the right column, (b) and (d); τ = 30 s: plots (a) and (b); τ = 10 s: plots (c) and (d)

From these plots, one concludes that model results are closer to the measurements in the time frame corresponding to the peak time than in the other time periods. In general, the median model better characterizes the whole ensemble than the mean model, which is much more sensitive to the peculiar values of a single model. Similar diagrams are presented in Fig. 10 for IOP6, and two puff releases of IOP9 (for τ = 10 s), illustrating the median as better representing the ensemble than the mean. An extreme case can be observed for IOP6 (Fig. 10a) when model 2 forces the mean to completely overpredict the concentration.

Fig. 10

The t → (Rτ(t), ψτ(t)) curves for IOP6, puff 4, station L15 (a) and (b); IOP9, puff 1, station L05 (c) and (d); IOP9, puff 2, station L17 (e) and (f). Presented are the ensemble members in the left column (a), (c), (e), and the mean and median in the right column (b), (d), (f) for τ = 10 s (IOP time periods as for Fig. 7)

A cumulative column diagram enables a detailed analysis of the behaviour of the models, for which Fig. 11 shows overprediction and underprediction in consecutive timesteps for the integration time τ = 20 s for five models (see the diagram for the precise values of \( N_{\tau }^{ + } \left( t \right), N_{\tau }^{ - } \left( t \right) \)). The mean and median of the ensemble for IOP4, IOP6, and two puffs of IOP9 are presented in Fig. 12.

Fig. 11

Cumulative column diagram showing the overprediction and underprediction for five models for the case IOP4

Fig. 12

Cumulative column diagram showing the overprediction and underprediction for the mean (left column) and median (right column) for cases IOP4 station L11 (a), (b), IOP6 station L15 (c), (d) and two puffs of IOP9 stations L05 and L17, respectively (e), (f) and (g), (h)

The mean model generally gives higher values than the median model, which is particularly distinct for IOP6, and is explained by the deterioration of the mean by model 2 (see Fig. 10). Interestingly, at the times of peak concentration, more underprediction is observed. As models have difficulties in timing the exact peak, if the integrated values are considered (doses), the models better predict the higher values (which, in principle, is generally true), and the results are more dispersed in time. Such behaviour may result from difficulties in modelling very local phenomena, and from the use of parametrizations based on values averaged in both time and space. As already mentioned, another problem is related to the point measurements; it would be more appropriate to compare the model results with measurements from devices able to perform area scans.

Finally for IOP4 and for IOP6, Fig. 13 compares the spread with the median and the error of the ensemble (expressed as the error between the observation and median) in terms of the ratios between the respective quantities, illustrating that the spread is generally high, and usually the better agreement between the models is in the time interval related to the peak period; the spread is also higher in comparison with the ensemble error (again the smallest spread is in the peak period). In contrast, the spread may also depend on the value of the concentration—if the peak values are not high and sharp enough, the impact of noise can usually be observed. In general, this confirms the previous finding that the ensemble behaves better in the peak interval than in other periods.

Fig. 13

The ratios of the ensemble spread to the median ((a) and (c)) and to the error between the median and the measurement ((b) and (d)) in consecutive time steps for IOP4 ((a) and (b)) and IOP6 ((c) and (d))

While this simple analysis cannot be treated as representative of the results of the whole UDINEE modelling exercise, it does give indications relating to the behaviour of the ensemble, and illustrates one possible way of performing such an analysis. A valuable feature is the possibility of using various integration intervals in a unified way while observing the variability in time. Most presentations rely on global indicators, which can make such nuances difficult to capture.

5 Conclusions

Many different indicators can be used for the analysis and presentation of high-resolution results of dispersion simulations, including the model ensemble. A general concept has been elaborated here based on typical mathematical notions. The methodology is unified in the sense that it can be applied for different degrees of accuracy, which can be defined by changing simple parameters, such as the integration time and the time horizon of the analysis, with the proper selection of the period of the integrated concentration playing an important role. However, in general, the method enables analysis of the ensemble results by taking into account the variability in time of the required precision. Simple theoretical considerations have shown that the median model can be treated as representative of the ensemble for this type of analysis. The multi-model ensemble approach gives additional information related to the uncertainty of the simulation results, while finding areas requiring further model improvement. Application of the presented method to selected cases from the UDINEE dispersion-modelling exercise reveals that, although the analysis cannot be treated as representative for all cases, the method describes the typical behaviour of the ensemble fairly well, with a quite large spread of model predicted concentrations in general, and better agreement in the time window related to the measured peak concentration. The proposed method may also be applied to other types of ensembles, such as those built by perturbing the initial parameters, and can be used in sensitivity studies. The sensitivity analysis for each model can be very useful in the construction of an optimal ensemble (see Potempski and Galmarini 2009).

For this type of experiment, it would generally be better if the observations were a continuous function in time and space, resulting from, for example, a very dense network of sensors combined with interpolation based on geospatial techniques, which can also give quite a good estimation of the measurement uncertainties, or devices scanning the concentration over some area.