Introduction

Volcanic hazard studies have a prominent impact on society and volcanology itself, being an area where the “rubber hits the road”, that is, where science is applied to an important societal problem. Despite its importance, volcanic hazard assessment is still commonly presented in many different ways, ranging from maps of past deposits of the volcano to more quantitative probabilistic assessment (e.g., Scandone et al. 1993; Newhall and Hoblitt 2002; Marzocchi et al. 2004; Martin et al. 2004; Neri et al. 2008; Marti et al. 2008). The latter, being quantitative, has remarkable advantages: (1) it allows comparisons among different volcanoes and with other natural and nonnatural hazards, (2) its reliability can be tested through statistical procedures, and (3) it provides a basic component for rationale decision making (e.g., Marzocchi and Woo 2007, 2009; Woo 2008).

In order to unambiguously distinguish the quantitative approach from others which are more qualitative, it has been suggested to call it probabilistic volcanic hazard assessment (PVHA; see Marzocchi et al. 2007). The term “probabilistic” means that the extreme complexity, nonlinearities, limited knowledge, and the large number of degrees of freedom of a volcanic system result in difficult, if not impossible, deterministic prediction of the evolution of volcanic processes (see, e.g., Marzocchi 1996; Sparks 2003). In other words, volcanic systems are stochastic, and hazardous volcanic phenomena involve so many uncertainties that a probabilistic approach is needed.

That being said, we note that full PVHA is still quite rare (Magill et al. 2006; Ho et al. 2006; Neri et al. 2008; Marti et al. 2008 are among the few exceptions). Generally, hazard assessment represents, at best, a conditional probability of one specific hazard conditioned to the occurrence of one specific event (for instance, the most likely event; Cioni et al. 2003; Macedonio et al. 2008), or it is focused on one specific aspect of volcanic hazard, such as vent opening (Martin et al. 2004; Jaquet et al. 2008; Selva et al. 2010). In other cases, as mentioned above, hazard assessment merely consists of maps of volcanic deposits of past events. By contrasts, a full PVHA requires the assessment of the impact of hazardous phenomena associated with every possible “eruptive setting” (ES) and eventually the merging of all ESs, each of them weighted with its own probability of occurrence. Hereafter, with the terms “eruptive setting”, we mean the occurrence of an eruption of a specific size or type from a specific vent.

Another basic feature of PVHA is that it needs to account for all relevant sources of uncertainty. Indeed, the great importance of PVHA is due to its practical implications for society. In this perspective, it is fundamental that PVHA is “accurate” (i.e., without significant biases) because a biased estimation would be useless in practice. On the other hand, PVHA may have a low “precision” (i.e., a large uncertainty) that would reflect limited knowledge of some physical processes involved, from the preparation of an impending eruption to the derived impact on the surrounding area.

Here, we present a probabilistic tool, named Bayesian Event Tree for Volcanic Hazard (BET_VH), to calculate and to visualize long-term PVHA, accounting for the features described above. In this paper, “long term” refers to the timescale of the expected significant variations in volcanic processes. While during unrest the time variations occur in short timescales (from hours to few months), the changes expected during a quiet phase of the volcano are much longer. In input, BET_VH takes the output of different models (ES-based or not) and merges them with available data collected in the field. The result is a long-term probabilistic estimate of the hazard posed by different volcanic hazardous processes (e.g., either ashfall, pyroclastic density currents, lava flows, or lahars, etc.), accounting for all possible sources of uncertainty in a Bayesian structure. BET_VH has an event tree structure (Newhall and Hoblitt 2002; Marzocchi et al. 2004; see Fig. 1a), that is, a graphical tree representation of events in which individual branches are alternative steps from a general prior event, state, or condition, through increasingly specific subsequent events (intermediate outcomes) to final outcomes. In this way, the scheme shows all relevant possible outcomes of volcanic unrest at progressively higher degrees of detail. This structure allows different processes and relative uncertainties to be modeled and accounted for separately, merging output of models and field data (Marzocchi et al. 2008).

Fig. 1
figure 1

a General scheme of event tree for BET_VH. The selected path (black solid arrows connecting blue labels) is related to the probability of an eruption in location i, of size class j (see text for more explanations), generating tephra that is able to reach area k and overcome the threshold set for tephra accumulation; b a snapshot from the main selection window of BET_VH code for an imaginary volcano called Monte Donato. Here, the selection is related to the probability of an eruption in any location, of size class 2 or greater, generating tephra that is able to reach proximal or distal areas and overcome the threshold of 5 cm

In the following sections, we describe the structure of the event tree for BET_VH, and the basic rules to estimate the probability distributions at each node. Finally, we show how a full PVHA is achieved, combining the probabilities of each node. A tutorial application of these concepts is outlined in the companion paper (Selva et al. 2010), where a PVHA for ashfall at Campi, Flegrei, Italy, is reported.

The Bayesian Event Tree scheme for long-term volcanic hazard

BET_VH scheme for long-term volcanic hazard is a natural evolution of the short-term eruption forecasting code BET_EF described in Marzocchi et al. (2008) and devoted to eruption forecasting purposes. Here, the event tree is expanded to consider also the probability of occurrence of the typical hazardous phenomena accompanying eruptions and impacting the territory, such as lava flows, pyroclastic density currents, ash fall, lahars, tsunami, and so on. Its basic structure can be described as follows (Fig. 1a):

  • Node 1–2–3: there is an eruption, or not, in the time interval (t 0,t 0 + τ], where t 0 is the present time and τ is the forecasting time window. This node condenses the probabilities of nodes 1 (unrest), 2 (presence of magma given an unrest), and 3 (eruption given a magmatic unrest) in BET_EF by Marzocchi et al. (2008) as regards the nonmonitoring part.

  • Node 4: the eruptive vent will open in a specific location, provided there is an eruption.

  • Node 5: the eruption will be of a certain size or type, provided that there is an eruption in a given location.

  • Node 6: a particular hazardous phenomenon will be generated, or not, given that an eruption of a specific size or type occurs. Several hazardous phenomena can be uploaded in BET_VH at a time; volcanic hazard due to different phenomena can be juxtaposed and compared.

  • Node 7: a selected area around the volcano will be reached, provided the occurrence of a specific hazardous phenomenon produced by an eruption of given size or type and location.

  • Node 8: a specific intensity threshold will be overcome, or not, provided that the area selected at node 7 is reached by a specific hazardous phenomenon generated by an eruption with a given size or type and location.

Note that since the definition of event tree is mainly driven by its practical utility, the branches at each node point to the whole set of different possible events, regardless of their probabilistic features. In other words, the events at each node need not be mutually exclusive (see upper part of Fig. 1). This makes the combination of nodes a little bit more complicated, but it keeps a more logical and comprehensible structure (see Marzocchi et al. 2004, 2006, 2008 for more details).

In the following section, we describe how the probability distribution at each node is assigned. We will try to make use of the same symbols and terminology already published in Marzocchi et al. (2008) and in the Electronic supplementary material of that paper.

Estimating the probability distribution at each node

BET_VH focuses on long-term PVHA only. Because of this choice, the information to be used is related to geological and/or physical models and past data. We do not account for monitoring measures. The latter, in fact, are related to short-term variations in the state of the volcano. This is one of the basic differences with BET_EF by Marzocchi et al. (2008), and it implies a simpler formalism for the probability computation at the nodes of the event tree in common with BET_EF. On the other hand, BET_VH has a more complex structure than BET_EF because it accounts also for the impact of hazardous phenomena on the territory; this leads to a more complex dependence on the selected path, i.e., on the events selected at the previous nodes. In order to keep the notation as simple as possible, hereinafter, we set specific indexes for specific selected outcomes at the different nodes: i indicates the vent location, j the eruption size or type, p the hazardous phenomenon, k the area, and s the threshold related to the phenomenon p. Despite the description of the following nodes should be self-consistent, we recommend to refer to Marzocchi et al. (2008) for more detailed description of the statistical distributions used (Beta and Dirichlet distributions).

Node 1–2–3

This node has two possible outcomes: eruption or no eruption, in the time interval (t 0,t 0 + τ]. A suitable time window for long-term hazard purposes is τ = 1 year. The long-term probability of eruption is estimated similarly to Marzocchi et al. (2008) for the nonmonitoring probability. Here, BET_VH condenses the first three nodes of that tree in one. In particular, BET_VH assigns a prior Beta distribution for the two possible outcomes:

$$ \label{eq:node123_prior} \left[\theta_{1-2-3}\right]_{\rm prior}=\textrm{Beta}(\alpha_1,\beta_1) $$
(1)

where the parameters α 1 and β 1 are determined from the average Θ1 (i.e., the best guess for the long-term probability of eruption) and from the equivalent number of data, Λ1, by inverting the system of equations (see Marzocchi et al. 2008 for more details):

$$ \label{eq:system_alphabeta} \left \{ \begin{array}{l} \Theta_1=\displaystyle\frac{\alpha_1}{\alpha_1+\beta_1}\\ \Lambda_1=\alpha_1+\beta_1 - 1 \end{array} \right. $$
(2)

While Θ1 comes from models and/or a priori considerations, Λ1 is set according to the degree of confidence that the user puts on the estimate of the best guess. The parameter Λ i (here the index i is relative to the node considered) is a friendly measure of the confidence on the prior distribution, or, in other terms, of the epistemic uncertainty. In general, the higher the Λ i , the larger our confidence on the reliability of the model, so that the number of past data needed to modify significantly the prior must be larger. On the contrary, if we believe that the prior is poorly informative (i.e., our a priori information is very scarce), Λ i must be small, so that even a small number of past data can drastically modify the prior (for more information see Marzocchi et al. 2008).

In case a catalog of past eruptions is available, BET_VH updates the prior distribution into the posterior through:

$$ \begin{array}{lll} \label{eq:node123_post} \left[\theta_{1-2-3}\right]&=&\left[\theta_{1-2-3}|y_{1-2-3}\right]\\&=&\textrm{Beta}\left(\alpha_1+y_{1-2-3},\beta_1+n_{1-2-3}-y_{1-2-3}\right)\\ \end{array} $$
(3)

where y 1 − 2 − 3 represents the number of past eruptions having occurred and n 1 − 2 − 3 represents the total number of not overlapping time windows of length τ (for example, years) available in the catalog, starting without unrest or eruption (see Electronic supplementary material in Marzocchi et al. 2008).

Node 4

At this node, BET_VH estimates the spatial probability of vent opening, given that an eruption occurs. The basic assumption here is that only one vent at a time will erupt. Because of this, this node has I 4 possible and mutually exclusive outcomes, corresponding to the number of possible vent locations defined by the user.

The probability estimation at this node is identical to the nonmonitoring case in Marzocchi et al. (2008). In practical terms, BET_VH assigns a prior Dirichlet distribution for the I 4 vent locations:

$$ \label{eq:node4_prior} \left[\theta_{4}\right]_{\rm prior}=\textrm {Di}_{I_4}\left(\alpha_{4}^{(1)},...,\alpha_{4}^{(I_4)}\right) $$
(4)

where each \(\alpha_{4}^{(i)}\) (i=1,...,I 4) is determined (as in Marzocchi et al. 2008) on the basis of \(\Theta_{4}^{(i)}\) (i.e., the expected value, or best guess, of the long-term probability of eruption in vent location i) and of Λ4 (i.e., the equivalent number of data), set similarly to node 1–2–3. The parameters \(\alpha_{4}^{(i)}\) are obtained by inverting the system of equations (see Marzocchi et al. 2008 for more details):

$$ \label{eq:system_alphaDiri} \left \{ \begin{array}{l} \Theta_4^{(1)}=\displaystyle\frac{\alpha_4^{(1)}}{\sum_{i=1}^{I_4} \alpha_4^{(i)}}\\ ... \\ \Theta_4^{(I_4)}=\displaystyle\frac{\alpha_4^{(I_4)}}{\sum_{i=1}^{I_4} \alpha_4^{(i)}}\\[18pt] \Lambda_4=\sum_{i=1}^{I_4} \alpha_4^{(i)} - I_4 +1. \end{array} \right. $$
(5)

In case a catalog of past data is available, BET_VH updates the prior distribution into the posterior through:

$$ \label{eq:node4_post} \left[\theta_4\right]=\left[\theta_4|y_4\right]=\textrm {Di}_{I_4}\left(\alpha_{4}^{(1)}+y_{4}^{(1)},...,\alpha_{4}^{(I_4)}+y_{4}^{(I_4)}\right) $$
(6)

where \(y_{4}^{(i)}\) represents the number of past eruptions occurred in vent location i.

Node 5

Here, we examine the probabilities related to the size or type of the eruption. The magnitude can be represented either by the type of the eruption (explosive, effusive, phreatomagmatic, and so on), or by the size (e.g., VEI), or by groups of types or sizes that for the user’s purposes are considered homogeneous (e.g., VEI ≥4). In the following, we use the term size class, meaning anyone of these parametrizations. Thus, the number of possible outcomes at this node is J 5, corresponding to the number of possible size classes defined.

At this node, we estimate the probability of a specific size class of the eruption, given its occurrence and given that the vent opens in a specific location. Here, we propose a substantial improvement over Marzocchi et al. (2008) because in BET_VH (and in the version 2.0 of BET_EF that can be downloaded from the website: http://www.bo.ingv.it/bet), we allow the probability of each size class to vary as a function of the vent location. In this way, it is possible to account for potential differences among vent locations. For example, for a certain volcano, we might want to assign a higher probability of a hydromagmatic type of eruption if the vent is located under shallow water than on land. Similarly, with this improvement, we can take into account flank instability in case the vent opens in a radial sector of a central, steep flank volcano rather than in a central crater. Note that J 5 does not depend on vent location since only the probabilities of the specific size classes do.

The computation of the prior and posterior distributions is similar to the one at node 4, except for the fact that each vent location has its own distribution that might be different from those relative to other locations. If the user defines J 5 size classes, BET_VH assigns a prior Dirichlet distribution for the size classes at the i-th vent location:

$$ \label{eq:node5_prior} [\theta_{5;i}]_{\rm prior}=\textrm{Di}_{J_5}\left(\alpha_{5;i}^{(1)},...,\alpha_{5;i}^{(J_5)}\right) $$
(7)

where each \(\alpha_{5,i}^{(j)}\) (j=1,...J 5) is determined (as for node 4, Eq. 5) on the basis of \(\Theta_{5;i}^{(j)}\) (the best guess of the long-term probability of an eruption having the j-th size class in the i-th location) and Λ5;i (the equivalent number of data for the i-th location). These parameters are set similarly to node 1–2–3. Note that the parameter Λ5;i can differ from vent locations since we may be more confident on the best guess related to a vent location than to another.

In case a catalog of past data is available, BET_VH updates the prior distribution for the i-th vent location into the posterior through:

$$ \label{eq:node5_post} \left[\theta_{5;i}\right]=\left[\theta_{5;i}|y_{5;i}\right]=\textrm{Di}_{J_5}\left(\alpha_{5;i}^{(1)}+y_{5;i}^{(1)},...,\alpha_{5;i}^{(J_5)}+y_{5;i}^{(J_5)}\right) $$
(8)

where \(y_{5;i}^{(j)}\) represents the number of past eruptions of size class j occurred in vent location i.

Node 6

For a specific hazardous phenomenon p, this node has two possible outcomes: generation or no generation of the phenomenon from an eruption of size class j. The generation of a specific phenomenon depends only on the size class, and not on the vent location. The prior distribution is set, similarly to node 1–2–3, as

$$ \label{eq:node6_prior} \left[\theta_{6;j}^{(p)}\right]_{\rm prior}=\textrm{Beta}\left(\alpha_{6;j}^{(p)},\beta_{6;j}^{(p)}\right) $$
(9)

where the parameters \(\alpha_{6;j}^{(p)}\) and \(\beta_{6;j}^{(p)}\) are determined (as for node 1–2–3, Eq. 2) on the basis of \(\Theta_{6;j}^{(p)}\) (i.e., the best guess for the long-term probability of generation of the phenomenon from a size class j eruption) and of \(\Lambda_{6;j}^{(p)}\) (i.e., the equivalent number of data). The parameters \(\Theta_{6;j}^{(p)}\) and \(\Lambda_{6;j}^{(p)}\) are set similarly to node 1–2–3. Note that the apex p on θ is necessary because we have different (and independent) distributions depending on the phenomenon considered.

In case a catalog of past data is available, BET_VH updates the prior distribution into the posterior through:

$$ \label{eq:node6_post} \left[\theta_{6;j}^{(p)}\right]\!=\!\left[\theta_{6;j}^{(p)}|y_{6;j}^{(p)}\right]\!=\!\textrm{Beta}\left(\alpha_{6;j}^{(p)}\!+\!y_{6;j}^{(p)},\beta_{6;j}^{(p)}\!+\!n_{6;j}\!-\!y_{6;j}^{(p)}\right) $$
(10)

where \(y_{6;j}^{(p)}\) represents the number of past eruptions of size class j that generated the phenomenon p and n 6;j represents the total number of past eruptions of size class j available in the catalog.

Nodes 7 and 8

These two nodes have a similar structure. First, the surrounding of the volcano is divided into a number (K 7) of areas (not necessarily equal and equally spaced). For each area k, at node 7, we have two possible outcomes: area k is reached or area k is not reached by the hazardous phenomenon p generated by an eruption of size class j and location i. At node 8, the two outcomes are: the selected threshold s is overcome or the selected threshold s is not overcome, considering an area k reached by the hazardous phenomenon p, generated from an eruption of size class j and location i. Both probabilities are assumed to be homogeneous all over the area k.

Each prior distribution is respectively set as

$$ \label{eq:node7_prior} \left[\theta_{7;i,j,p}^{(k)}\right]_{\rm prior}=\textrm{Beta}\left(\alpha_{7;i,j,p}^{(k)},\beta_{7;i,j,p}^{(k)}\right) $$
(11)

and

$$ \label{eq:node8_prior} \left[\theta_{8;i,j,p,k}^{(s)}\right]_{\rm prior}=\textrm{Beta}\left(\alpha_{8;i,j,p,k}^{(s)},\beta_{8;i,j,p,k}^{(s)}\right) $$
(12)

where the parameters α * and β * are determined (as for node 1–2–3, Eq. 2) by the correspondent Θ* (i.e., the best guess) and Λ* (i.e., the equivalent number of data) as for the previous nodes. The use of the Beta distribution for these nodes implies that each area is independent from the others; this assumption requires some additional comments. Let us consider node 7; obviously, the probability that a specific area will be hit by a hazardous phenomenon is somehow correlated to the probability on adjacent cells. This sort of dependency is accounted for, in the prior distribution, by the parameters Θ (the best guesses) for every area that are set by a model. In other words, a reliable model will produce probability estimations that are strongly spatially correlated (see the companion paper Selva et al. 2010), and therefore, also the Θ values for adjacent areas will be correlated. The characteristics of such a correlation are set by the physics of the model used. The use of a Beta distribution for each area implies some sort of independence also among adjacent areas. In particular, the average probability of hitting an area will be very similar to the average probability of hitting an adjacent area, but each single probability value is sampled from the Beta distribution independently from the sampled values in adjacent areas. The rationale of this choice is mostly technical because it greatly simplifies the computations of the code; nevertheless, this choice does not introduce any bias in the final assessment because the BET_VH code never combines the probabilities of different areas.

In case a catalog of past data is available, BET_VH updates the prior distributions into the posterior respectively through:

$$ \begin{array}{lll} \label{eq:node7_post} \lefteqn{ \left[\theta_{7;i,j,p}^{(k)}\right]=\left[\theta_{7;i,j,p}^{(k)}|y_{7;i,j,p}^{(k)}\right]= } \\ & = \textrm{Beta}\left(\alpha_{7;i,j,p}^{(k)}+y_{7;i,j,p}^{(k)},\beta_{7;i,j,p}^{(k)}+n_{7;i,j,p}-y_{7;i,j,p}^{(k)}\right) \end{array} $$
(13)

and

$$ \begin{array}{lll} \label{eq:node8_post} \lefteqn{ \left[\theta_{8;i,j,p,k}^{(s)}\right]=\left[\theta_{8;i,j,p,k}^{(s)}|y_{8;i,j,p,k}^{(s)}\right]= } \\ & = \textrm{Beta}\!\left(\alpha_{8;i,j,p,k}^{(s)}\!+\!y_{8;i,j,p,k}^{(s)},\beta_{8;i,j,p,k}^{s)}\!+\!n_{8;i,j,p,k}\!-\!y_{8;i,j,p,k}^{(s)}\right)\\ \end{array} $$
(14)

where y * represents the number of successes (reaching area k for node 7, overcoming threshold s for node 8) and n * represents the total number of past data (past eruptions of size class j and vent location i generating the hazardous phenomenon p for node 7 and number of such eruptions in which it reached area k for node 8).

Estimating PVHA

In the previous chapters, we have described the general features of the probability distributions at each node. Their combination allows a full and complete PVHA to be determined. In order to accomplish that, we have to still explore in detail three issues: (1) how model output can be used to set prior distributions for the nodes described before, (2) how to combine the conditional probability at each node to get absolute probabilities, and (3) how to account for different ESs. The following three subsections are devoted to describe these three issues. The last subsection reports the kind of outputs provided by the code.

Numerical models to define prior distribution

As we have seen so far, the setup of prior probability distributions is mainly based on models, through the best guess Θ* and the number of equivalent data Λ* that represents a sort of confidence on our best guess (see Marzocchi et al. 2008). For the prior distribution of nodes 1 to 5, we refer to the estimation of the nonmonitoring part in Marzocchi et al. (2008). For nodes 6, 7, and 8, we can use results from numerical models that are available for most of the hazardous phenomena related to volcanic eruptions (e.g., Favalli et al. 2005 for lava flows; Neri et al. 2007 for pyroclastic density currents; Pfeiffer et al. 2005 and Costa et al. 2006 for ash dispersion). For a more detailed discussion about this point, see (Selva et al. 2010). Here, we highlight the basic philosophy behind models’ usage in PVHA.

A single realization of a model very rarely represents a reliable forecast of the future activity. Several factors, acting simultaneously or separately, are responsible for this lack of determinism:

  1. 1.

    Intrinsic stochasticity of the process (the so-called aleatory uncertainty)

  2. 2.

    Epistemic uncertainty in the model parameters and in the boundary conditions at the time and during the eruption (e.g., for an ash fall model, the uncertainties in wind conditions)

  3. 3.

    Epistemic uncertainty in the (volcanological) input parameters (e.g., for an ash fall model, the uncertainties in the relevant eruption parameters given a specific eruption size class)

  4. 4.

    Any model is always a simplification of the reality, leading to unavoidable uncertainties into the forecasting.

These issues have a different impact depending on the model that we are considering. The compelling necessity to include all sources of uncertainties requires to explore the whole range of possible variations in the relevant parameters and conditions. In practice, this need calls for the use of a model able to run thousands of times in a reasonable CPU time, accounting for many different sets of initial/boundary conditions and model parameters realizations. Usually, faster models are simpler than more complex models. Anyway, we argue that errors due to the use of a simple model (see point 4 above) are often smaller than errors introduced by uncertainties in model parameters and initial and/or boundary conditions (points 1, 2, and 3; see also Grezio et al. 2010).

If different models can be used, BET_VH may use the output of each one of them recursively. In the companion paper (Selva et al. 2010), we propose a general scheme to introduce models’ results from a large number of runs and from more than one model, presenting also a practical application for tephra fall hazard estimation around the Campi Flegrei caldera.

Combination of the probabilities to obtain PVHA

The probability distributions at each node are conditional on the selection of a well-defined path (see Section “The Bayesian Event Tree scheme for long-term volcanic hazard”). In general, PVHA, as well as the evaluation of probability of each event for which we may be interested, requires their combination. With BET_VH, it is possible to compute the probability associated to a single ES (an eruption of size class and location specified) or to a combination of possible ESs. In the latter case, at nodes 4 and 5, we can select more than one branch at the same time (i.e., a set \(\mathcal{J}\) of J possible eruption size classes and/or a set \(\mathcal{I}\) containing I possible vent locations; see also Fig. 1b for a snapshot of the main window of BET_VH code, where it is possible to see that more than one location and/or size class can be selected). This feature of the code is quite remarkable because these new combined probabilities are usually very important for practical purposes and for a full PVHA (Selva et al. 2010). In the following, we will carefully describe how to obtain meaningful probabilities for ES combinations.

Absolute probability

In order to compute the absolute probability Φ of an outcome at node m, we multiply the probabilities along all the selected path, from node 1–2–3 to the selected outcome at node m (see Marzocchi et al. 2008). For example, the long-term absolute probability [Φ a ] of an area k being impacted by a specific hazardous phenomenon p overcoming the selected threshold s, generated by an eruption of size class j occurring in location i (i.e., a single ES), in any time window of duration τ in the future, is

$$ \label{eq:scenario_abs} [\Phi_a]=[\theta_{1-2-3}]\! \left[\theta_4^{(i)}\right]\! \left[\theta_{5;i}^{(j)}\right]\! \left[\theta_{6;j}^{(p)}\right]\! \left[\theta_{7;i,j,p}^{(k)}\right]\! \left[\theta_{8;i,j,p,k}^{(s)}\right] $$
(15)

while the long-term absolute probability [Φ b ] of an area k being impacted by the same phenomenon p overcoming the selected threshold s, associated to a combination of possible ESs (see above) in any time window of duration τ in the future, is

$$ \begin{array}{l} \label{eq:multscenarios_abs} [\Phi_b]\\ =[\theta_{1-2-3}]\! \sum_{i \in \mathcal{I}}\! \left(\! \left[\theta_4^{(i)}\right] \sum_{j \in \mathcal{J}} \left[\theta_{5;i}^{(j)}\right]\! \left[\theta_{6;j}^{(p)}\right]\! \left[\theta_{7;i,j,p}^{(k)}\right]\! \left[\theta_{8;i,j,p,k}^{(s)}\right]\! \right) \end{array} $$
(16)

Note that the latter estimation is particularly important in practical applications because the risk in a specific area depends on the phenomenon reaching that area, not the specific characteristic of the eruption that creates the phenomenon.

Conditional probability

Beyond the absolute probabilities, in many practical applications, some conditional probabilities are particularly important and useful, like the conditional probabilities that can be obtained by the combination of different ESs (see Selva et al. 2010). To this purpose, the code BET_VH gives the possibility to average, with proper weights, the conditional probabilities for different size classes, and/or locations.

Here, we describe how BET_VH calculates these new conditional probabilities ([φ]) starting from the conditional probability distributions at each node. In order to distinguish clearly these new conditional probabilities from the conditional probabilities of the nodes, we decide to indicate them with the symbol φ instead of θ. Let us consider all the possibilities offered by the code. For node 5 (size classes), the probability \([\theta_{5;i}^{(j)}]\) is by definition the conditional probability of an eruption of size class j, given the occurrence of an eruption at a specific location i. If a set \(\mathcal{I}\) of I possible vent locations is considered, then the conditional probability [φ a ] of an eruption of size class j occurring in a location belonging to the set \(\mathcal{I}\), given an eruption occurs, becomes

$$ \label{eq:size_cond_allventlocs} [\phi_a]=\frac {\sum_{i \in \mathcal{I}} \left( \left[\theta_4^{(i)}\right] \left[\theta_{5;i}^{(j)}\right] \right)} {\sum_{i \in \mathcal{I}} \left[\theta_4^{(i)}\right] } $$
(17)

If a set of size classes is selected at the same time, the resulting conditional probability is the algebraic sum of the conditional probability of the single size classes.

For nodes 6, 7, and 8, the conditional probabilities are always conditioned to the ES chosen (single or combined). We define the conditional ES probability [φ ES] (given an eruption occurs) as:

$$ \label{eq:scenario} [\phi_{\rm ES}]\equiv \left[\theta_{\{i \in \mathcal{I}, j \in \mathcal{J}\}}\right]=\sum_{i \in \mathcal{I}} \left( \left[\theta_4^{(i)}\right] \sum_{j \in \mathcal{J}} \left[\theta_{5;i}^{(j)}\right] \right) $$
(18)

The conditional ES probability [φ ES] represents the probability of occurrence of the selected single or combined ES, given an eruption occurs. This probability is particularly important because it gives the “weight ” to every possible ES selected; this is necessary either to compare or to merge them (see Selva et al. 2010, for more details).

With this definition, we have

  • At node 6, the conditional probability [φ b ] of a specific phenomenon p, given the occurrence of an eruption within the selected combination of ESs is

    $$ \label{eq:multscenarios_cond6} [\phi_b] =\frac {\sum_{i \in \mathcal{I}} \left( \left[\theta_{4}^{(i)}\right] \sum_{j \in \mathcal{J}} \left[\theta_{5,i}^{(j)}\right] \left[\theta_{6,j}^{(p)}\right] \right)} {\phi_{\rm ES}} $$
    (19)
  • At node 7, the conditional probability [φ c ] of a specific area k being reached by the phenomenon p, given the occurrence of an eruption within the selected combination of ESs is

    $$ \label{eq:multscenarios_cond7} [\phi_c]=\frac {\sum_{i \in \mathcal{I}} \left( \left[\theta_{4}^{(i)}\right] \sum_{j \in \mathcal{J}} \left[\theta_{5,i}^{(j)}\right] \left[\theta_{6,j}^{(p)}\right] \left[\theta_{7,i,j,p}^{(k)}\right] \right)} {\phi_{\rm ES}} $$
    (20)
  • At node 8, the conditional probability [φ d ] of overcoming the selected threshold s in a specific area k reached by the phenomenon p, given the occurrence of an eruption within the selected combination of ESs is

    $$ \label{eq:multscenarios_cond8} [\phi_d]=\frac {\sum_{i \in \mathcal{I}} \left( \left[\theta_{4}^{(i)}\right]\! \sum_{j \in \mathcal{J}} \! \left[\theta_{5,i}^{(j)}\right] \!\left[\theta_{6,j}^{(p)}\right]\! \left[\theta_{7,i,j,p}^{(k)}\right]\! \left[\theta_{8,i,j,p,k}^{(s)}\right] \right)} {\phi_{\rm ES}} $$
    (21)

In analogy to what said for node 5, such new combined conditional probabilities (Eqs. 19, 20, and 21) represent conditional probabilities averaged for every possible ES; in other words, the conditional probabilities for each ES are summed up weighting them with the ES probability. In case of a single ES (i-th vent location, j-th size class), \(\mathcal{I}=\{ i \}\) and \(\mathcal{J}=\{ j \}\).

PVHA output maps

As we have seen, BET_VH code allows any conditional or absolute probability of interest to be computed. Some probability maps have usually a practical immediate usefulness. Because of this, BET_VH is able to display them in gif or GoogleEarth format, or to export them in a raster file that can be uploaded and visualized in a GIS program. Such probability maps are:

  • Spatial probability of vent opening (conditional on eruption occurrence), which is related to the same probability of the susceptibility map proposed by Felpeto et al. (2007); see Fig. 2 for a snapshot example from the code

  • Absolute spatial probability of a specific size class or a specific phenomenon (in the code, it is called absolute map); this map shows, for each area, the absolute probability of a specific eruption size class, or the absolute occurrence probability of a specific phenomenon; see Fig. 3 for a snapshot example from the code.

  • ES conditional probability (conditional on eruption occurrence): this map (called sizes map in the code) displays, for each area (or group of areas), the probability of a specific eruption size class, weighted with the spatial probability of vent opening, conditional on eruption occurrence in that area (or group of areas). In practice, if one class is selected, the map displays [φ a ] (see Eq. 17); if more size classes are selected, the map shows the sum of [φ a ] for all size classes belonging to \({j \in \mathcal{J}}\). Figure 4 displays a snapshot example of the code.

  • Absolute or conditional probability of the phenomenon impacting the territory, in terms of “reaching” proximal or distal areas (node 7; [φ c ]), or in terms of overcoming a specific threshold (node 8; [φ d ]) in proximal or distal areas; this is called outcome map in the code; see Fig. 5 for a snapshot example from the code

Fig. 2
figure 2

A snapshot from the BET_VH code applied to Monte Donato, showing the conditional probability of vent opening in every possible vent location (conditioned to the occurrence of an eruption). The figure reports the average of the probability distribution. The table below the figure reports the 10th, 50th (median), and 90th percentiles. The term “vent loc” means vent location and “perc” means percentile

Fig. 3
figure 3

A snapshot from BET_VH code applied to Monte Donato, showing the absolute probability of an eruption of size class (see text for more explanations) 2 or greater, in every possible vent location. The figure reports the average of the probability distribution. The table below the figure reports also the 10th, 50th (median), and 90th percentiles. The term “vent loc” means vent location and “perc” means percentile

Fig. 4
figure 4

A snapshot from BET_VH code applied to Monte Donato, showing the conditional probability of an eruption of size class 2 or greater in every possible vent location, weighted by the probability of vent opening in each location, given the occurrence of an eruption. Note the difference in palette scale compared to Fig. 3. The figure reports the average of the probability distribution. The table below the figure reports the 10th, 50th (median), and 90th percentiles. The term “vent loc” means vent location and “perc” means percentile

Fig. 5
figure 5

Outcome map of the BET_VH code applied to Monte Donato. The map is visualized through Google Earth (2010 Europa Technologies; 2010 Tele Atlas). Here, we display the absolute probability related to the selection of Fig. 1, b, i.e., the absolute probability of overcoming 5 cm of tephra due to an eruption in any possible vent location with size class (see text for more details) 2 or greater. The probability is computed for different areas around the imaginary volcano. The centers of the areas are marked by white squares. The description of each area can be retrieved by clicking the relative white square as shown in the figure. Note the different spatial scale compared to Figs. 2, 3, and 4

The probabilities reported in each map are represented as distributions. The average value represents the “best guess” of such a probability; the dispersion around it gives the uncertainty about this guess.

Discussion and final remarks

This paper aims at introducing a new statistical tool, named BET_VH, to calculate and visualize the long-term PVHA. BET_VH has some paramount features:

  • It provides PVHA in a user-friendly and transparent way (it is not a black box).

  • It estimates almost all probabilities useful for hazard and risk applications. In particular, it calculates and visualizes ES maps, as well as weighted combinations of all possible ESs.

  • It is based on the Bayesian approach; this enables us to take into account different sources of information, such as models output, field data, and relevant geological and historical information. Moreover, the Bayesian approach allows aleatory and epistemic uncertainties to be estimated and visualized.

  • The outputs are provided in different formats (maps in GoogleEarth, GIS, gif formats), in order to make easy the use of the results.

In the companion paper (Selva et al. 2010), an extensive application of BET_VH to ash hall hazard at Campi Flegrei, Italy, is reported.