# Encyclopedia of Complexity and Systems Science

Living Edition | Editors: Robert A. Meyers

# Tsunamis: Bayesian Probabilistic Analysis

Living reference work entry

DOI: https://doi.org/10.1007/978-3-642-27737-5_645-2

## Glossary

Aleatory variability

In the present context, it is the assumed random variability of the parameters characterizing the future hazardous events or, in other words, the random variability in the model describing the physical system under investigation.

Bayesian statistics

An approach to statistics which represents unknown quantities with probability distributions that, in one interpretation, represent the degree of belief that the unknown quantity takes any particular value. Data are considered fixed, and the parameters of the distributions representing the state of the world or hypotheses are updated as evidence is collected.

Bias

The tendency of a measurement process or statistical estimate to over- or underestimate the value of a population parameter on average.

Conditional probability

The probability that an event will occur under the condition or given knowledge that another event occurs.

Conjugacy

In Bayesian statistics, the property of parametric families of distributions for the prior and likelihood that leads the posterior distribution to be of the same family as the prior distribution.

Completeness

It is the extent to which all needed statistics are available. In geophysics, it usually refers to catalogs of past events, denoting the spatiotemporal windows in which virtually no events in a given energy range are missing.

Epistemic uncertainty

The uncertainty deriving from limited knowledge of the physical process, usually treated with alternative models of the same process.

Estimation

The process by which we make inferences about a population, based on information obtained from a sample.

Exceedance probability

The probability that a given parameter will be larger than a threshold value over a time interval of interest.

Frequentist statistics

An approach to statistical reasoning which considers the observed sample to be one realization of a repeatable random experiment. The parameters to be estimated are considered to be constants, in contrast with Bayesian statistics, where the parameters are treated as random variables.

Inference

It is the act of generalizing from the data (“sample”) to a larger phenomenon (“population”).

Joint probability distribution

Describes the simultaneous occurrence of two or more events treated as random variables.

Likelihood function

A function of the unknown parameters conditioned on the given fixed observed data, which returns the likelihood that the parameters assume specific values.

Probability density function (PDF)

Also known as the density of a continuous random variable, it is a function that describes the relative likelihood that the random variable takes a given value. The probability of the random variable falling within a particular range of values is given by the integral of this function over that range. The PDF is nonnegative everywhere, and its integral over the entire space is equal to one.

Recurrence interval or average return period

It is the average time interval between events of a similar size or intensity.

Run-up

It is the maximum topographic height reached by inundation above a reference sea level, usually measured at the horizontal inundation limit during a tsunami event.

## Definition of the Subject

Tsunamis are low-frequency, high-consequence natural threats: rare events that can devastate vast coastal regions both near to and far from their generation areas. They may be caused by coseismic seafloor motions, subaerial and submarine mass movements, volcanic activity (such as explosions, pyroclastic flows, and caldera collapses), meteorological phenomena, and meteorite ocean impacts. The probability of tsunami occurrence and/or impact on a given coast may be treated formally by combining calculations based on empirical observations and on models; this probability can be updated in light of new/independent information. This is the general concept of the Bayesian method applied to tsunami probabilistic hazard analysis, which also provides a direct quantification of forecast uncertainties. This entry presents a critical overview of Bayesian procedures with a primary focus on their appropriate and relevant applicability to tsunami hazard analyses.

## Introduction

Bayesian inference is a process of learning from data and from experience (Gelman et al. 2013). We define prior information as the knowledge held before observing new data and posterior information as the understanding reached after considering new data, i.e., the improvement of the prior knowledge due to the evidence in the data. Named after the British mathematician the Reverend Thomas Bayes (1701–1761), Bayesian inference is based on statements of conditional probability. Bayes’ formula, at the root of Bayesian methods, was introduced in special cases and was formalized in a posthumous paper presented at the Royal Society of London. Stigler (1983) attributes the principle of Bayesian inference to Saunderson (1683–1739), a professor of optics who published a large number of mathematics papers. From 1774 to 1812, Laplace independently studied the general concept of conditional probability, considering inductive probability by reassessing prior estimates as new relevant evidence emerged. Due to the work of Laplace, Bayesian statistics came into common use for practical applications around the late nineteenth to early twentieth centuries.

Awareness of the tsunami threat to coastal communities has increased worldwide after the catastrophic, global-scale tsunamis generated by great megathrust earthquakes in the Indian Ocean (26 December 2004), in Chile (27 February 2010), and in Japan (11 March 2011), and by several other seismic events in the last decade (Lay 2015; Lorito et al. 2016). Hence, progressively more intense efforts have been devoted to estimating long-term tsunami hazard and risk. Probabilistic tsunami hazard analysis (PTHA) provides a quantitative basis for tsunami hazard assessment, for tsunami risk mitigation plans, and for the implementation of tsunami early warning systems (Geist and Parsons 2006; Geist and Lynett 2014; Grezio et al. 2017).

PTHA estimates the probability of exceeding specific tsunami intensities (wave heights, flow-depth, run-up, velocity, etc.; see, e.g., TPSWG 2006) within a certain time period (exposure time) at given locations (key sites). The most common approaches to PTHA are either based on combining source probability with numerical modeling of the ensuing tsunamis (computationally based approach: Annaka et al. 2007; Burbidge et al. 2008; Davies et al. 2016; Gonzalez et al. 2009; Heidarzadeh and Kijko 2011; Hoechner et al. 2016; Horspool et al. 2014; Knighton and Bastidas 2015; Lane et al. 2013; Lorito et al. 2015; Mueller et al. 2015; Omira et al. 2015; Power et al. 2013; Sakai et al. 2006; Selva et al. 2016; Sørensen et al. 2012; Suppasri et al. 2012; Thio et al. 2010; Thio and Li 2015) or, less frequently, based on the observed tsunami frequency at a given coastal site (empirical approach: Geist et al. 2014; Orfanogiannaki and Papadopoulos 2007; Tinti et al. 2005; Yadav et al. 2013). Empirical methods are often inhibited by a paucity of information. For this reason, computationally based methods are often preferred for tsunamis; however, they can be tremendously computationally intensive, and the statistics and physics of the sources can be difficult to constrain and model (see discussion on the different approaches and their advantages/limitations in Geist and Lynett 2014).

Bayesian statistics allows these different approaches to be merged, homogeneously integrating the different kinds of information, and thus may represent an effective method for assessing tsunami hazard. When computationally based and empirical approaches are both available, the computationally based analyses may constitute the core assessment, a first attempt at a complete description of the probabilistic tsunami hazard. The computationally based approach covers all the tsunami sources that are considered possible and their (natural) aleatory variability. Direct numerical modeling of the tsunamis generated by each assumed source scenario is then performed. The exceedance probability at a given site is finally assessed by combining the simulation results with the source probability, while alternative (source and propagation) models span the epistemic uncertainty. This prior information may then be updated with the statistics of the observed tsunamis. Bayesian techniques enable the merging of models and observations into a coherent probabilistic framework, and they may be adopted at some stage of the computationally based PTHA, for example, to constrain the earthquake or tsunami magnitude-frequency relationships and associated uncertainties or the likelihood of the earthquake focal mechanism at a given location (e.g., Shin et al. 2015; Selva et al. 2016; Yadav et al. 2013), or for the final results of the PTHA (e.g., Grezio et al. 2012; Parsons and Geist 2009).

Bayesian approaches have been applied not only for long-term PTHA but also for short-term and/or time-dependent tsunami hazard forecast, in the context of tsunami early warning (Blaser et al. 2011; Tatsumi et al. 2014).

Here we present the general Bayesian methodology, and two examples of Bayes’ theorem applied to the tsunami hazard analysis: (i) developing a tsunami forecast from numerical modeling and an empirical catalog and (ii) implementing weighting factors based on past tsunami data for statistical models estimating run-up exceedance rates and run-up forecast with subjective estimation of the variance. Finally, we discuss the advantages and the limitations of the Bayesian approach in tsunami probabilistic analysis, and we outline some future directions of the Bayesian approach in tsunami studies.

## Methodology

In general terms, Bayesian statistics identifies a given number of unknown parameters Θ and tries to infer their values θ by accounting for some measurable data Y = {y1,…,yi, …, yn} and for the knowledge, independent from data, that we may have about their values. In all the steps of the analysis, the uncertainty is expressed through probability density functions (PDFs), and, in Bayesian analyses, all the parameters (including probability values) can be treated as unknowns (Draper 2009; Gelman et al. 2013).

In PTHA, the parameters under investigation are generally the exceedance probabilities of a given tsunami intensity at one site (Θ = P(Z > z; x, DT)), where the tsunami intensity is, for example, the run-up. The data may be observed intensities at given sites due to past tsunamis or the observed frequency of exceedance in a past time interval DT (Grezio et al. 2010). The parameters in question may also be intermediate quantities required for building the PTHA, like earthquake magnitudes and focal mechanisms (Θ = {M,strike,dip,rake}) or landslide volume or shape (Grezio et al. 2012).

The current state of knowledge about the parameters Θ (i.e., the relative ignorance of the parameter values) and the relative uncertainty before considering new observations are expressed through the prior PDF, hereinafter indicated with P(Θ). This term makes it possible to account for any source of information, from theoretical models to expert beliefs, and for both quantitative assessments and qualitative information. If no independent information is available, it is possible to set noninformative prior distributions (Box and Tiao 1992).

In order to understand what the observables say about the parameters Θ, we have to set a parametric statistical model that links the observables Y to the parameters. Then, the likelihood of the observables, given a specific value of Θ, needs to be evaluated. This information is expressed through the likelihood PDF, hereinafter indicated as P(Y|Θ), which gives the probability density of the observed data for any choice of model parameters and follows from the assumed parametric statistical model.

The final goal is the quantification of the probability of the different values of the parameters Θ given the prior knowledge and the observations Y. This information is expressed through the posterior PDF, hereinafter indicated as P(Θ|Y). The posterior distribution is computed from the prior and the likelihood distributions through Bayes’ theorem, discussed in the following paragraph. In practice, the past data are utilized to compute the likelihood function and then update the prior beliefs. If new evidence is then gathered, such as new data, the described procedure may be applied iteratively, and the old posterior can play the role of the new prior, to be combined with the new likelihood to obtain a new posterior. The results of the Bayesian analysis are conditional on the assumptions on which the statistical models for the prior PDFs and likelihood functions are built. The results allow for multiple interpretations like hypothesis evaluation, inverse probability problem, prediction process, model evaluation, parameter ranges, and sensitivity analysis (Box and Tiao 1992; Congdon 2006; Gelman et al. 2013).
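The updating cycle just described — prior, likelihood, posterior, with the old posterior serving as the new prior — can be sketched in a few lines. The following is a minimal illustration, not taken from the entry itself: a Beta prior on an exceedance probability Θ updated with Binomially distributed observations; all counts and hyper-parameters are hypothetical.

```python
def beta_binomial_update(alpha, beta, exceedances, trials):
    """Posterior Beta hyper-parameters after observing `exceedances`
    run-up exceedances in `trials` independent time windows."""
    return alpha + exceedances, beta + (trials - exceedances)

# Vague prior belief about the exceedance probability Theta.
alpha, beta = 1.0, 1.0

# A first batch of observations updates the prior ...
alpha, beta = beta_binomial_update(alpha, beta, exceedances=2, trials=10)

# ... and the old posterior plays the role of the new prior.
alpha, beta = beta_binomial_update(alpha, beta, exceedances=1, trials=10)

posterior_mean = alpha / (alpha + beta)  # E[Theta | Y] = 4 / 22
```

The same function is applied twice, showing the iterative character of the procedure: each new dataset refines the previous posterior.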

### Bayes’ Theorem

Being Θ the model parameters and Y the observed data, Bayes’ theorem (Gelman et al. 2013) enunciates that the updated posterior probability distribution P(Θ | Y) is
$$P\left(\Theta |Y\right)=\frac{P\left(\Theta \right)P\left(Y|\Theta \right)}{P(Y)}$$
(1)
where:
• the notation P(·) denotes a probability density function and P(·|·) a conditional PDF; the parameters of these distributions are usually referred to as hyper-parameters. Conditional PDFs are central elements of the Bayesian framework.

• P(Θ) is the prior probability distribution. The Θ parameter spans a range of possible values θ and defines the hypothesis space. The parameters are not estimated as a single point in the parameter space but are instead represented by a distribution and its statistics (e.g., prior mode or prior mean).

• P(Y | Θ) is the likelihood function. It represents the information about Θ contained in the data Y.

• P(Y) is a normalization constant ensuring that the posterior probability integrates to 1. For this reason, it is often omitted from the notation, with “proportional to” reported instead of “equal to” in Eq. 1 (Gelman et al. 2013).

The posterior distribution P(Θ | Y) quantifies the Bayesian inference about the parameters obtained through Bayes’ theorem from the prior and likelihood. As for the prior, the parameters are not estimated as a single point in the parameter space but are instead represented by a distribution. The posterior distribution may be seen as an update of the prior (a novel estimate of the same quantity) in light of new data.
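Eq. 1 can be illustrated numerically on a discretized parameter grid. In this sketch (with hypothetical grid, prior, and data), the posterior is simply the prior times the likelihood, renormalized, which makes the role of P(Y) explicit.

```python
import math

# Candidate values of the parameter (a Poisson rate, events per year).
thetas = [0.1 * k for k in range(1, 51)]
prior = [1.0 / len(thetas)] * len(thetas)      # noninformative prior P(Theta)

def poisson_pmf(n, lam):
    return math.exp(-lam) * lam ** n / math.factorial(n)

n_events, years = 3, 10.0                      # the observed data Y
likelihood = [poisson_pmf(n_events, th * years) for th in thetas]

unnorm = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(unnorm)                         # plays the role of P(Y)
posterior = [u / evidence for u in unnorm]     # Eq. 1, term by term
```

With 3 events in 10 years, the posterior peaks at a rate near 0.3 events per year, as expected.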

### Distribution Forms and Mathematical Techniques

The distribution forms of the prior and likelihood should reflect the probabilistic representations of the tsunami parameters and/or statistical descriptions of observed values. In tsunami investigations, probability distributions of a selected parameter Θ commonly make use of:
• Normal density (Θ ~ Nor (y | μ, σ) where μ is the mean and σ the standard deviation)

• Poisson density (Θ ~ Pois (y | λ) where λ is a constant rate of occurrence in the considered time interval)

• Gamma density (Θ ~ Gam (y | α, β) where the hyper-parameters α and β, respectively, control the probability distribution immediately after each sub-event and describe the longer-term rate)

• Binomial density (Θ ~ Bin (y | n, θ) where n are the trials, given that the probability of success in one trial is θ)

• Beta density (with Θ ~ Beta (y | α, β) where the two positive hyper-parameters α and β are the exponents of the random variable)

• Uniform density (with Θ ~ Unif (y | α, β) with all values between α and β equally probable)

Bayesian statistics may utilize the valuable mathematical properties of conjugate analysis, finding prior distributions matched to the likelihoods (denominated conjugate priors) so that the results take an analytic form that simplifies the computations. When possible, this is obtained by picking a prior distribution from the same “family” as the likelihood function, so that the resulting posterior distribution is also in that family. In this way, a closed-form expression for the posterior distribution is obtained, and, for example, numerical integration can be avoided. In tsunami problems, such as estimating the frequency of events where the rate parameter is unknown but can be somewhat constrained with data, conjugate families are sometimes used.

Geist and Parsons (2010) applied a Poisson-Gamma conjugate pair to model the probability of potentially tsunamigenic submarine landslides in the Santa Barbara Channel (Southern California), Port Valdez (Alaska), and the Storegga Slide complex (Norwegian Sea). The landslide probability problem assumes that the landslides are independent and thus occur randomly in time according to a Poisson distribution (characterized by a rate parameter λ); earthquake probability is eventually considered as well. Landslide inter-event time uncertainties associated with age dating of individual events and open time intervals were estimated. The seismically imaged landslides typically exhibited only the ages of the youngest and oldest underlying events; through the Bayesian approach, however, even such indirect information can be included. The most likely mean return time (1/λ) of the submarine landslides was estimated by this Poisson-Gamma model using the number of landslide occurrences and the observation period. Grezio et al. (2010) combined a prior Beta distribution with a Binomial likelihood function in the Messina Strait Area; the posterior distribution is a modified Beta distribution constrained by the past run-up observations.
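The Poisson-Gamma conjugacy used in the landslide example can be sketched in a few lines. The hyper-parameters and counts below are illustrative, not those of Geist and Parsons (2010): with a Gamma(α, β) prior on the Poisson rate λ and n events observed over a time T, the posterior is Gamma(α + n, β + T).

```python
def gamma_poisson_update(alpha, beta, n_events, obs_time):
    """Gamma(alpha, beta) prior on the Poisson rate lambda, with n_events
    observed over obs_time, gives a Gamma(alpha + n, beta + T) posterior."""
    return alpha + n_events, beta + obs_time

alpha0, beta0 = 1.0, 100.0        # weak prior: roughly 1 event per 100 yr
n_events, obs_time = 5, 8000.0    # hypothetical landslide count and record length

alpha1, beta1 = gamma_poisson_update(alpha0, beta0, n_events, obs_time)
mean_rate = alpha1 / beta1          # posterior mean of lambda (events per year)
mean_return_time = 1.0 / mean_rate  # estimate of the mean return time 1/lambda
```

No numerical integration is needed: the posterior stays in the Gamma family, so its mean and the implied return time follow directly from the updated hyper-parameters.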
In a similar formulation, Knighton and Bastidas (2015) employed the Poisson-Gamma conjugate model in a first step to estimate a likelihood function for the Poisson rate parameter of tsunamigenic events given in a historical catalog. The likelihood function for the Poisson parameter was then used to find the probability of tsunamigenic events causing a hazard exceeding a critical value, and the resulting probability was fed into a Beta-Binomial scheme as in Grezio et al. (2012). Selva et al. (2016) proposed an event tree procedure to quantify source uncertainties in a seismic PTHA. One level of the event tree models the uncertainty on the potential focal mechanisms of earthquakes: at this node, a Dirichlet distribution represents the prior knowledge about the probability of occurrence of the different combinations of discrete intervals of strike, dip, and rake angles. This distribution is then updated with observations of such angles from two earthquake catalogs of the Ionian Sea region (central Mediterranean Sea), naturally distributed following a Multinomial distribution. The resulting posterior distribution for the probability of the different intervals of angles is again a Dirichlet distribution.
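The Dirichlet-Multinomial update at the focal-mechanism node can likewise be sketched. The mechanism classes and counts below are hypothetical, not those of Selva et al. (2016): the prior pseudo-counts are simply incremented by the observed counts, and the posterior is again a Dirichlet distribution.

```python
# Dirichlet prior pseudo-counts over discrete focal-mechanism classes
# (hypothetical bins standing in for strike/dip/rake interval combinations).
prior_alphas = {"thrust": 2.0, "normal": 1.0, "strike-slip": 1.0}

# Observed mechanism counts from a catalog (Multinomially distributed).
observed_counts = {"thrust": 14, "normal": 2, "strike-slip": 4}

# Conjugate update: posterior Dirichlet pseudo-counts.
posterior_alphas = {k: prior_alphas[k] + observed_counts[k] for k in prior_alphas}

# Posterior mean probability of each class.
total = sum(posterior_alphas.values())
posterior_mean = {k: a / total for k, a in posterior_alphas.items()}
```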

If the parameter Θ is an m-dimensional vector {θ1, …, θm}, as in the last example of Selva et al. (2016), the probability distributions and the normalization constant P(Y) are m-dimensional problems. If the conjugacy property of the prior/likelihood pair cannot be used to obtain the posterior distribution directly in closed form, advanced sampling techniques should be adopted. Markov chain Monte Carlo (MCMC) methods are computationally intensive techniques used to approximate the high-dimensional integrals associated with the posterior probability distribution in Bayes’ theorem: a Markov chain samples from the posterior distribution for a sufficiently long time to reach equilibrium within the required approximation (Draper 2009). These techniques extend the single-parameter sampling method to multivariate situations in which each parameter, or subset of parameters, in the overall posterior density may have a different density (Congdon 2006). Knighton and Bastidas (2015) evaluated the hazard to a hypothetical coastal facility within a 30-year time period by Monte Carlo analysis, sampling the likelihood distribution of the inter-event timing of the tsunami sources and the Beta distribution that pertains to the Binomial distribution of the hazard parameter.
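A bare-bones Metropolis sampler illustrates the MCMC idea: the chain needs the posterior only up to the normalization constant P(Y). The target here is the Gamma posterior of a Poisson rate, with illustrative tuning values; this is a sketch of the general technique, not of any particular published implementation.

```python
import math
import random

random.seed(0)

def log_posterior(lam, n_events=5, obs_time=8000.0, a0=1.0, b0=100.0):
    """Log of prior times likelihood (Gamma prior, Poisson likelihood),
    up to additive constants; the normalization P(Y) is never needed."""
    if lam <= 0.0:
        return -math.inf
    return ((a0 - 1.0) * math.log(lam) - b0 * lam
            + n_events * math.log(lam * obs_time) - lam * obs_time)

samples, lam = [], 1e-3                         # arbitrary starting point
for _ in range(20000):
    proposal = lam + random.gauss(0.0, 2e-4)    # symmetric random-walk proposal
    if math.log(random.random()) < log_posterior(proposal) - log_posterior(lam):
        lam = proposal                          # Metropolis acceptance rule
    samples.append(lam)

burned = samples[5000:]                         # discard burn-in
mcmc_mean = sum(burned) / len(burned)           # approaches (a0 + n) / (b0 + T)
```

Because the target is conjugate, the chain can be checked against the exact posterior mean (a0 + n)/(b0 + T); in realistic multidimensional PTHA problems no such closed form exists, which is precisely why MCMC is used.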

### Epistemic and Aleatory Uncertainties

All probabilistic analyses typically address the problem of epistemic and aleatory uncertainties. Many authors have supported this division, based on either theoretical (Marzocchi and Jordan 2014) or practical (Paté-Cornell 1996) reasoning. In its general interpretation, aleatory uncertainty represents the irreducible natural variability of the studied phenomenon, while epistemic uncertainty arises from the limited knowledge of the system, which does not allow the aleatory uncertainty to be perfectly quantified. In this way, it is possible to distinguish the uncertainty that may be reduced by increasing the knowledge of the modeled system (the epistemic uncertainty) from the irreducible unpredictability of the system itself (the aleatory uncertainty). The separation also allows the effective variability of the results to be reported in a more robust format and makes any probabilistic analysis a testable experiment (Marzocchi and Jordan 2014; Marzocchi et al. 2015).

In probabilistic tsunami hazard analysis, epistemic uncertainty emerges from the substantial lack of understanding of the tsunamigenic processes (e.g., the long-term earthquake rates or the dynamics of “tsunami earthquakes”; see, e.g., Polet and Kanamori 2009) and of the tsunami evolution after generation, or even from approximations in the tsunami numerical modeling made for the sake of practical feasibility (e.g., the common shallow-water approximation), or from the lack of accurate enough digital elevation models.

Both physics-based and data-driven approaches should rest on appropriate hypotheses about the settings of the statistical experiment. Tsunami events of the largest intensity are rare, and the data are often not sufficient to properly constrain the variability of the controlling parameters. As a consequence, a large epistemic uncertainty arises, and many scientifically acceptable alternative models may be formulated.

In the Bayesian paradigm, both types of uncertainty are automatically quantified, and the epistemic uncertainty may potentially be reduced by accounting for all relevant available information. In a Bayesian analysis, the epistemic uncertainties are represented as uncertain parameters, whereas the aleatory uncertainties are represented by the choice of probability density functions appearing in the selected parametric statistical model. Since relatively few data are generally available in tsunami applications, the parameters are usually poorly constrained, and few additional data can feed the likelihood; thus, the weight of the prior is large compared to that of the likelihood function.

Advanced approaches tend to extend the exploration of epistemic uncertainty by including alternative statistical models of the aleatory uncertainty, that is, developing the prior probability distributions by implementing different statistical models (Knighton and Bastidas 2015) or implicitly adopting an ensemble modeling approach (Marzocchi et al. 2015; Selva et al. 2016).

## Bayesian PTHA

Forecasting tsunamis is typically an underinformed exercise because the mean return time of a given event is often longer than the period we have had to observe it. Thus, we can only rarely develop a complete empirical distribution that satisfactorily captures the aleatory variation. We then rely on numerical models, indirect paleo-evidence, and/or incomplete historical observations to develop probability density function parameters. Many of these models have a stochastic component that attempts to cover the possible range of behaviors. Each of these datasets or model results may capture different aspects of the hazard process. Here we discuss two paradigmatic analyses in which the Bayesian method aggregates a variety of information sources, specifically using likelihood functions shaped from measurement uncertainties and/or stochastic distributions to integrate and weight results. If advanced models, additional and even sparse data, improved instrumental measurements, or new observations become available, an update of the posterior inferences is possible in the Bayesian statistical framework, keeping track of the assumptions on the prior knowledge and the introduced further information.

### Example of Tsunami Forecast from Numerical Modeling and an Empirical Catalog

The exceedance rate of run-up or tsunami waves at a selected site is typically calculated by assessing local and distant tsunamigenic sources, modeling wave height and/or flooding, and then aggregating the expected probabilities. It can be difficult to account for every source, especially in the case of submarine landslides. Thus, independent empirical observations, such as those from tide gauges, eyewitness accounts, or paleotsunami evidence, are valuable. In this example (after Parsons and Geist 2009), we discuss a spatial tsunami forecast for the Caribbean region (Fig. 1) made from numerical models of wave height (Geist 2002; Geist and Parsons 2006) and an ~500-year-long empirical catalog (O’Loughlin and Lander 2003). Moreover, we discuss how Bayesian methods handle cases where, at a given location, there may be multiple independent run-up rate distributions derived from different models, only one, or none. We gridded the region uniformly and began with a noninformative prior. In the case presented here, each geographic cell had between zero and two rate distributions, described by likelihood functions. When there were no estimates for a given cell, the posterior distribution was zeroed. When one model provided rates, its likelihood function was used to update the prior, and when more than one rate estimate was available, the posterior distribution was developed through combination and renormalization and was then used to update the prior.

Fig. 1 Top: individual run-up observations from O’Loughlin and Lander (2003); circle size represents run-up in m. Bottom: summed number of run-up observations per 20 by 20 km cell and the corresponding empirical Poisson probability
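The per-cell logic just described (zero, one, or two rate distributions per cell) can be sketched on a discretized rate grid; the grid and likelihood shapes below are hypothetical.

```python
rates = [0.001 * k for k in range(1, 11)]   # candidate run-up rates (events/yr)
prior = [1.0 / len(rates)] * len(rates)     # noninformative prior for each cell

def update(dist, likelihood):
    """One Bayesian update: multiply by the likelihood and renormalize."""
    unnorm = [p * l for p, l in zip(dist, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def cell_posterior(prior, likelihoods):
    if not likelihoods:                     # no estimate available: zero the cell
        return [0.0] * len(prior)
    post = prior
    for lk in likelihoods:                  # one update per available estimate
        post = update(post, lk)
    return post                             # combined and renormalized

# A cell with both a model-based and an empirical likelihood:
model_lk = [1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0]
empirical_lk = [1.0, 1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0, 1.0, 1.0]
post = cell_posterior(prior, [model_lk, empirical_lk])
```

Sequentially updating with each available likelihood is equivalent to the combination-and-renormalization step in the text, and it degrades gracefully to the one-source and no-source cases.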
Table 1 A 30-year probability of tsunami run-up in excess of 0.5 m, in 20 by 20 km cells that contain population concentrations, for representative Caribbean countries and territories. Population is given as a relative measure of risk throughout the region. Values were calculated as uniform over cell areas and are not intended to convey detail at the selected cities; they are presented for comparison purposes. Dashes indicate negligible calculated probability

| Country | Nearest coastal city | Latitude | Longitude | Population | 30-yr probability r ≥ 0.5 m (%) |
|---|---|---|---|---|---|
| Antigua and Barbuda | St. John’s | 17.1167° | −61.8500° | 24,226 | 5.74 |
| Belize | Belize City | 17.4847° | −88.1833° | 70,800 | – |
| Cayman Islands | George Town | 19.3034° | −81.3863° | 20,626 | 10.79 |
| Colombia | Cartagena | 10.4000° | −75.5000° | 895,400 | 0.08 |
| Costa Rica | Puerto Limon | 10.000° | −83.0300° | 78,909 | 8.32 |
| Cuba | Santiago de Cuba | 20.0198° | −75.8139° | 494,337 | 2.31 |
| Dominica | Roseau | 15.3000° | −61.3833° | 14,847 | 11.94 |
| Dominican Republic | Santo Domingo | 18.5000° | −69.9833° | 913,540 | 17.56 |
| France, Guadeloupe | Basse-Terre | 16.2480° | −61.5430° | 44,864 | 11.79 |
| France, Martinique | Fort-de-France | 14.5833° | −61.0667° | 94,049 | 5.33 |
| Grenada | St. George’s | 12.0500° | −61.7500° | 7500 | 2.48 |
| Guatemala | Puerto Barrios | 15.7308° | −88.5833° | 40,900 | – |
| Haiti | Port-au-Prince | 18.5333° | −72.3333° | 1,277,000 | 0.01 |
| Honduras | La Ceiba | 15.7667° | −86.8333° | 250,000 | – |
| Jamaica | Kingston | 17.9833° | −76.8000° | 660,000 | 21.95 |
| Netherlands Antilles | Willemstad | 12.1167° | −68.9333° | 125,000 | 7.04 |
| Nicaragua | Bluefields | 12.0000° | −83.7500° | 45,547 | – |
| Panama | Colon | 9.3333° | −79.9000° | 204,000 | 17.56 |
| St. Kitts and Nevis | Basseterre | 17.3000° | −62.7333° | 15,500 | 6.95 |
| St. Lucia | Castries | 14.0167° | −60.9833° | 10,634 | 5.52 |
| St. Vincent and the Grenadines | Kingstown | 13.1667° | −61.2333° | 25,307 | 11.32 |
| Trinidad and Tobago | Port of Spain | 10.6667° | −61.5167° | 49,031 | – |
| Turks and Caicos | Cockburn Town | 21.4590° | −71.1390° | 5567 | 3.57 |
| UK, Virgin Islands | Road Town | 18.4333° | −64.5000° | 9400 | 13.85 |
| USA, Puerto Rico | San Juan | 18.4500° | −66.0667° | 434,374 | 22.24 |
| USA, Virgin Islands | Charlotte Amalie | 18.3500° | −64.9500° | 18,914 | 17.56 |
| Venezuela | Cumaná | 10.4564° | −64.1675° | 305,000 | 6.27 |

The Caribbean region has not produced many large earthquakes in the modern catalog era, which means we have virtually no knowledge of the causative earthquake magnitude-frequency distribution, nor do we have any information about slip distributions. Thus, a wide array of possible earthquake scenarios, constrained by the fault geometry and the moment rate from plate motions, must be modeled. In this case, a 3D finite-element model of the subduction zone and other major faults was constrained by GPS and plate rates/directions to calculate expected fault slip rates, and a set of stochastic earthquake rate models and slip distributions was developed from that. An associated group of 50 500-year simulated catalogs of tsunami run-ups, made with a finite-difference approach, was then calculated to capture likely variability and modeling uncertainties (see Parsons and Geist 2009 for full details) (Fig. 2). A general aggregation equation for determining the rate (λ) at which tsunamis will exceed a certain run-up (R0) at a coastal location was used to develop synthetic tsunami catalogs as
$$\lambda \left(R>{R}_0\right)=\sum \limits_{\mathrm{zone}=j}{\nu}_j\underset{m_t}{\overset{\infty }{\int }}P\left(R>{R}_0|{m}_j\right){f}_j(m) dm.$$
(2)

Fig. 2 Example calculation of expected run-up (≥0.5 m) frequency over a 4442-year period, calculated from the expected seismic moment rate
The propagation distance was included in the term P(R > R0| mj) since this term is computed by numerical propagation models. The moment-frequency distribution for earthquakes in a given zone j is described by the term fj(m), where a tapered Gutenberg-Richter (G-R) distribution with the complementary cumulative (survivor) distribution Fj(m) of Kagan (2002a) and Kagan and Jackson (2000) was used as
$${F}_j(m)={\left({m}_t/m\right)}^{\beta}\exp \left(\frac{m_t-m}{m_c}\right),\quad m\ge {m}_t$$
(3)
where β is the shape parameter for the distribution, mt is the threshold moment, and mc is the corner moment that controls the tail of the distribution. The source rate parameter for each zone (νj) was defined as the activity rate for earthquakes of m ≥ mt and is related to the seismic moment rate ($${\dot{m}}_s$$) as described by Kagan (2002b) as
$$\upsilon (m)=\frac{\left(1-\beta \right){\dot{m}}_s}{m^{\beta }{m}_c^{1-\beta}\Gamma \left(2-\beta \right){e}^{m/{m}_c}}$$
(4)
where Γ is the gamma function. The “tectonic” moment rate ($${\dot{m}}_t$$) is given by $${\dot{m}}_t=\mu A\dot{u}$$, where μ is the shear modulus, A is the area of the seismogenic part of the fault zone, and $$\dot{u}$$ is the long-term slip rate along the fault determined from finite-element modeling; $${\dot{m}}_s$$ and $${\dot{m}}_t$$ are related by a seismic coupling parameter (0 ≤ c ≤ 1): $${\dot{m}}_s=c{\dot{m}}_t$$. We implemented Eq. 2 using a Monte Carlo-type procedure in which synthetic earthquake catalogs of fixed duration were prepared from random samples of the distribution defined by Eqs. 3 and 4. The primary sources of epistemic uncertainty captured by this process are the geographic moment distribution in the form of hypocentral locations, the spatial magnitude distribution, and the stochastic slip distributions, all of which variably impact the fraction of moment that is tsunamigenic.
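The random-sampling step can be sketched by drawing synthetic earthquake moments from the tapered G-R survivor function of Eq. 3, numerically inverted on a moment grid; the β, mt, and mc values below are illustrative, not those used in the study.

```python
import math
import random

random.seed(1)
beta_s, m_t, m_c = 0.66, 1.0e17, 1.0e21   # shape, threshold, corner moment (N m)

def survivor(m):
    # F_j(m) = (m_t / m)^beta * exp((m_t - m) / m_c),  m >= m_t   (Eq. 3)
    return (m_t / m) ** beta_s * math.exp((m_t - m) / m_c)

# Log-spaced moment grid from m_t upward (six decades).
grid = [m_t * 10.0 ** (0.01 * k) for k in range(601)]

def sample_moment():
    u = random.random()            # uniform draw in [0, 1)
    for m in grid:                 # first moment whose survivor prob drops below u
        if survivor(m) < u:
            return m
    return grid[-1]                # truncate at the top of the grid

# A small synthetic catalog of earthquake moments.
catalog = [sample_moment() for _ in range(1000)]
```

Repeating this draw, with rates set by Eq. 4 and a fixed catalog duration, yields the synthetic earthquake catalogs that feed the tsunami simulations.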

For each earthquake in the synthetic catalogs, vertical and horizontal coseismic seafloor displacements are the initial conditions for tsunami modeling (Tanioka and Satake 1996). Displacements are calculated using Okada’s (1985) analytic functions. A finite rise time of 20 s was applied uniformly, with no preferred rupture propagation direction. The propagation of the tsunami wavefield is modeled using a finite-difference approximation to the linear long-wave equations (Aida 1969; Satake 2002). A 2 arc minute bathymetric grid (Smith and Sandwell 1997) was used with an 8 s time step that satisfied the Courant-Friedrichs-Lewy stability criterion for the Caribbean region. A reflection boundary condition was imposed at the 250 m isobath, whereas a radiation boundary condition was imposed along the open-ocean boundaries of the model (Reid and Bodine 1968). Run-up (R0) was approximated from the coarse-grid model by finding the nearest model grid point to the coastline and then multiplying the peak offshore tsunami amplitude by a factor of 3 that roughly accounts for shoaling amplification and the run-up process itself (Satake 1995, 2002; Shuto 1991).

We conducted two experiments: one with a single 4442-year synthetic run-up catalog and another with fifty 500-year catalogs. We found that the fifty 500-year catalogs captured more variability in the spatial run-up distribution than did the single 4442-year catalog. This resulted from the multiple catalogs containing a greater variety of earthquake locations: a few very large events can dominate the distribution of moment, and consequently the regional tsunami run-up distribution, because of the Gutenberg-Richter constraint. We thus used the set of fifty 500-year catalogs to determine mean rates and uncertainties in the probability calculations.

The empirical catalog, while unusually long, is spatially incomplete because not all coastlines were populated over its duration, meaning that any given spatial cell contains fewer than five observations. Therefore, Monte Carlo methods were applied (e.g., Parsons 2008) to extrapolate recurrence parameters (Fig. 3).

Fig. 3 Normalized histograms of the Monte Carlo sequences that matched the indicated event frequencies over 500-year intervals. Ranges of exponential rate parameters are shown (expressed as the inverse, which is recurrence interval) that can match observed frequencies of Caribbean tsunami run-ups (≥0.5 m), which range from 1 to 4 events in ~500 years
It is not uncommon for empirical tsunami rate observations to be higher than numerical model predictions in places where sources are not fully accounted for by the models. In the Caribbean region, this is seen at Puerto Rico, Jamaica, Costa Rica, and Panama, whereas the numerical rates are higher at the Antilles (Fig. 4). In these instances, it is likely that the empirical model has captured localized tsunami events that were caused by landslides and/or accommodating faults associated with the plate boundary that were not specifically included in the numerical model sources. Many of the secondary earthquake sources not included have very slow and uncertain slip rates, making implementation into a numerical model difficult. Here, we discuss how these disparate and incomplete distributions can be combined and weighted using likelihood functions.

Fig. 4 Comparison between (top) model-derived 30-year Poisson probability (calculated from modeled or observed rates in 20 × 20 km cells) of tsunami run-ups (≥0.5 m) and (bottom) empirically derived values
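The 30-year probabilities compared in Fig. 4 follow from each cell's mean annual rate under the Poisson assumption, P = 1 − exp(−λT). A one-line sketch (the function name is ours):

```python
import math

def poisson_exceedance_probability(rate_per_year, window_years=30.0):
    """Probability of at least one tsunami run-up (>= 0.5 m) in a cell
    over the time window, given the cell's mean annual rate, under a
    Poisson occurrence model: P = 1 - exp(-rate * window)."""
    return 1.0 - math.exp(-rate_per_year * window_years)
```

For example, a cell with one run-up per 500 years on average has a 30-year exceedance probability of about 5.8%.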

The primary sources of epistemic uncertainty include (1) tsunami sources not explicitly known or included in the model, (2) the seismic coupling coefficient of the Caribbean plate boundary zones, and (3) the degree of completeness of the empirical tsunami catalog. To incorporate these uncertainties into the probability estimates, a Bayesian framework is created to build tsunami run-up rate estimates within 20 by 20 km cells that contain coastlines throughout the Caribbean region. The key advantage of this approach is that the empirical and model results are combined and weighted by their attendant uncertainties.

Having independent empirical and model-derived rate estimates in each spatial cell enables some of the run-up-rate uncertainty to be addressed. Monte Carlo fitting of empirical intervals as shown in Fig. 3 along with results from 50 numerical model runs (e.g., Fig. 2) provides arrays of possible run-up rate values at each cell. Unknown/unaccounted-for tsunami sources can be partly accounted for because some of the empirical rates result from sources not accounted for in the numerical model (the most affected areas can be seen by comparing the panels of Fig. 4); the forecast may suffer from incomplete knowledge if events not covered by numerical models have also not occurred in the empirical catalog over the past 500 years. Seismic coupling is a difficult parameter to estimate with certainty; a broad range is captured because the historic earthquake catalog implies a low coupling value of 0.32 (found by comparing seismic moment release to expected slip on Caribbean faults), whereas the numerical models have coupling coefficients of 1.0. Completeness is addressed because low-rate plate-boundary events potentially not seen in the empirical catalog are accounted for with the 50 numerical model runs.

Model-derived run-up rates are combined with empirical rates in the following way: in cells where there are no empirical values, the numerical-model-derived rates are given full weight. Conversely, empirical rates are given full weight where numerical model rates are zero. Lastly, where there are empirical and model rate estimates within the same cells, likelihood functions are used to weight the two models. Distributions shown in Fig. 3 give the relative probability of different rates for a Poisson model that could have caused the empirical observations. Similarly, results from the 50 numerical model runs produce relative probabilities (Fig. 5) of different rates in each model cell.

Fig. 5 (a) Normalized histogram (likelihood) of tsunami run-up (R0 ≥ 0.5 m) rates in 214 20 km by 20 km cells defined using likelihood functions from empirical rates (Fig. 3) and from 50 numerical modeling simulations. (b) Normalized histogram of run-up rates from numerical modeling in 685 cells where there are no empirical observations. Mean values from these distributions are used in the best-estimate probability calculations mapped in Fig. 6

Fig. 6 30-year tsunami run-up (R0 ≥ 0.5 m) probability in 20 by 20 km cells at coastal sites in the Caribbean region made from combined rate estimates from empirical and numerical models. Lower panel shows locations of major cities listed in Table 1
To rank different rate models for each cell where more than one estimate exists (e.g., modeled and observed), a likelihood calculation is made to weight the models. In the simplest, binomial case, likelihood is defined as proportional to the probability of obtaining results A given a fixed hypothesis H resulting from a set of fixed data (equivalent to the sampling distribution as defined in section “Methodology”). If A1 and A2 are two possible, mutually exclusive results, then
$$P\left({A}_1\; or\ {A}_2|H\right)=P\left({A}_1|H\right)+P\left({A}_2|H\right),$$
(5)
and likelihood of a specific outcome A|H is defined as its probability; thus,
$$L\left(H|A\right)= kP\left(A|H\right),$$
(6)
where k is an arbitrary constant. In the current example, A1|H is a spatial distribution of numerical model run-up rates, each of which might be correct, whereas A2|H is a spatial distribution of Monte Carlo-modeled rates based on direct observations.
The results from likelihood functions are used to obtain the final weights using Bayes’ rule, where the posterior distribution is proportional to the likelihood function multiplied by the prior. For this example, we begin with a uniform (noninformative) prior, which assumes equal probability of all rates in each coastal 20 km by 20 km cell. Next, we update the prior with the empirical results and the numerical model results. Since the prior is updated twice, the same result is achieved by simply multiplying the two likelihood functions. Thus, the likelihood of a given rate λ where there were empirical estimates (e1) and numerical-modeled estimates (e2) is
$$L\left(\lambda |{e}_1,{e}_2\right)=k\left[{p}_1\left({e}_1|\lambda \right)\right]\left[{p}_2\left({e}_2|\lambda \right)\right],$$
(7)
where p1(e1|λ) is the likelihood of rate λ based on the Monte Carlo fits shown in Fig. 3 and p2(e2|λ) is the likelihood of rate λ from the 50 numerical model runs. The constant k is used for normalizing the weights so that they sum to 1. This can be expanded indefinitely if there are more information sources.
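In practice, Eq. 7 amounts to a bin-by-bin product of the two normalized rate histograms, followed by renormalization (the constant k); the cell's best-estimate rate is then the weighted mean. A minimal sketch over a discretized rate grid (function name and example bin values are illustrative, not from the original study):

```python
def combine_rate_likelihoods(rates, like_empirical, like_model):
    """Eq. 7 with a uniform prior: multiply the two likelihood histograms
    bin by bin over a common rate grid, renormalize so the weights sum
    to 1, and return the weights with the weighted-mean rate."""
    weights = [p1 * p2 for p1, p2 in zip(like_empirical, like_model)]
    total = sum(weights)
    if total == 0.0:
        return None, None  # the two estimates share no common support
    weights = [w / total for w in weights]
    mean_rate = sum(r * w for r, w in zip(rates, weights))
    return weights, mean_rate
```

For instance, if the empirical fits favor 1-2 events per 500 years while the model runs favor 0-1, the combined weights concentrate on the overlap of the two histograms.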

Likelihood functions are used to weight rate models over a range from 0 to 10 events in the 500-year observation period. Rates between 0 and 10 events in 500 years are considered for all cells, assuming no further prior information. Final rates are found as weighted means of the posterior rates. To summarize the process: where model and empirical values are both absent for a given rate, the posterior distribution is zeroed; when only one model provides rates, its likelihood function is used to update the prior; and when both empirical and numerical rate estimates are available, the likelihood is developed through combination and renormalization using Eq. 7, which is then used to update the prior. Combining empirical and modeled rates makes up for some of the deficiencies in each approach; the empirical catalog is likely not a complete record of all possible interplate tsunami sources, whereas the numerical model did not account for accommodating intraplate faults and/or landslide sources that appear to be likely causes of tsunamis in the empirical record.

### Example of Past Data Weighting Factors for Statistical Models and Subjective Estimation of the Variance

Bayesian inference can be applied to the probability Θ that a destructive event exceeds a specified threshold zt of a selected physical parameter Z within a given time period, here set equal to 1 year. In this example for the Messina Strait Area, Southern Italy (Fig. 7), the Bayesian PTHA is calculated as the annual probability that the run-up Z exceeds a selected threshold run-up zt at important sites at least once. Run-up is chosen as the intensity measure because historical run-up data are reported by the Italian Tsunami Catalogue (Tinti et al. 2004) and other studies (full details in Grezio et al. 2012). In past centuries, this area was struck by important tsunamis generated both by regional seismic sources and by non-seismic sources (landslides and total or partial collapses of volcanic edifices due to volcanic eruptions). Information from regional seismotectonic studies, the marine geology background, and recent instrumental records is integrated in order to identify the potential tsunamigenic events generated by both submarine seismic sources (SSSs) and submarine mass failures (SMFs) (Grezio et al. 2010, 2012), and historical indications are used for selecting the key sites. In this example, the prior PDF is derived by (simplified) simulation of the tsunamis generated by SSSs and SMFs. In the area, these multiple sources are considered the predominant tsunamigenic sources and are examined to reduce biases and underestimations in the hazard that may arise from assuming only a single type of source as primary (Grezio et al. 2015). The likelihood function is used for modeling the past tsunami run-up data. The posterior probability distribution summarizes the updated estimate of the parameter Z.

Fig. 7 Messina Strait Area: dots are the SSS epicenters and stars the SMFs; the major cities are indicated by squares on the coast where past tsunami events occurred

The SSSs are localized on active faults, and the relative epicenters are extracted from the instrumental Catalogue of the Italian Seismicity, with a completeness magnitude of 2.5 (Castello et al. 2007), at depths smaller than 15 km within the shallow part of the crust. Instrumental magnitudes have been recorded since 1981, and no tsunami occurred in this short interval. In order to consider a large set of potentially tsunamigenic SSSs, magnitudes in the range 5.5–7.5 Mw were introduced, consistent with the regional seismotectonic studies, and weighted using the Gutenberg-Richter distributions (Gutenberg and Richter 1944). Finally, the magnitudes are associated with the catalog epicenters, and the seafloor deformations are calculated via the analytical formulas of Okada (1992) in order to compute the initial tsunami sea surface waves. The relative fault parameters (width, length, and slip) and focal mechanisms (strike, dip, and rake) are provided, respectively, by the empirical relationships in Wells and Coppersmith (1994) and the Earthquake Mechanisms of the Mediterranean Area database (Vannucci and Gasperini 2004). The SSS spatial distribution is considered uniform.

The SMFs are spatially identified using marine geology background knowledge. Their propensity to fail is evaluated on the basis of the mean slope and mean depth, and it is associated with bathymetry cells. In each cell, potentially tsunamigenic SMFs are simulated with volumes spanning from 5 × 105 to 5 × 1010 m3, as indicated by the historical SMF sizes identified in the Tyrrhenian and Ionian basins. Additionally, spatial conditional probabilities are introduced considering that past SMF scars represent instability areas. The other geometric parameters and the initial tsunami waves are estimated, respectively, by the rigid-body approximation and the empirical formulas in Grilli and Watts (2005) and Watts et al. (2005). In analogy with subaerial mass failures, the SMF frequency-size relationship is assumed to be a power law.

The run-ups Z caused by the SSS and SMF tsunamigenic sources were calculated through empirical formulas (Synolakis 1987). Uncertainties related to the Z parameter could be reduced through the modeling of source directivity (tsunami energy is not spread isotropically around the source), wave propagation effects (refraction, diffraction, etc.), and other, possibly nonlinear, processes during shoaling and coastal inundation (wave breaking, bores, friction, etc.).

The prior probability model of the parameter Z encompasses the theoretical assumptions (e.g., the tsunami modeling), the background knowledge (e.g., the SMF spatial distribution and their propensity to fail and the digital elevation model), and the instrumental data on the sources (e.g., the historical seismicity); it does not include the historical tsunami events (which enter in the likelihood distribution). Physical and statistical considerations suggest that the Beta distribution is an adequate subjective choice for the functional form of the prior. Also, the Beta distributions are a convenient conjugate family for the Binomial distribution used for the likelihood, which simplifies the calculations. Assuming that the probability Θ does not vary in the time interval, the prior distribution for Θ is approximated by a Beta distribution with positive α and β at each key site in the Messina Strait Area
$$P{\left(\Theta \right)}_{\mathrm{prior}}\sim \mathrm{Beta}\left(\alpha, \beta \right).$$
(8)
To choose values of α and β for the prior, we find it easier to work with the expected value E and the variance V for the Beta distribution, which are, respectively,
$$E\left(\Theta \right)=\frac{\alpha }{\alpha +\beta}\quad \mathrm{and}\quad V\left(\Theta \right)=\frac{E\left(\Theta \right)\left(1-E\left(\Theta \right)\right)}{\alpha +\beta +1}.$$
(9)
The prior mean E is set equal to the weighted fraction of run-ups Z > zt generated by the simulated potential tsunamigenic sources
$$E\left(\Theta \right)={\sum}_i{p}_iH\left(Z>{z}_t\right)$$
(10)
where H is the Heaviside function, equal to 1 if the simulated run-up Z is higher than the threshold (zt = 0.5 m) and 0 otherwise, and pi is the probability of occurrence of the i-th tsunamigenic source in a time window of 1 year.
The variance V encodes the degree of confidence in the prior information through the equivalent number of data Λ (= α + β − 1) (Marzocchi et al. 2004, 2008; Grezio et al. 2010)
$$V\left(\Theta \right)=\frac{E\left(\Theta \right)\left(1-E\left(\Theta \right)\right)}{\Lambda +2}$$
(11)

By setting the parameter Λ to specific values, we assign both the subjective reliability of the prior model and the relative confidence interval. The parameter Λ weights the prior model and represents an estimate of the epistemic uncertainties due to the limited knowledge of the process. In general, a large Λ value corresponds to a large reliability of the prior model, so that the prior distribution requires a great number of past data or observations in the likelihood to be modified significantly. On the contrary, Λ must be small if the prior model is only a first-order approximation of the process, so that even a limited number of observations in the likelihood can heavily modify the prior distribution. The minimum possible value of Λ is 1, representing the maximum possible epistemic uncertainty, or maximum level of ignorance. As Λ increases, the Beta function becomes more and more peaked around the given mean; the end-member is a Dirac delta function, which judges the epistemic uncertainty negligible when a large amount of data is available (Marzocchi and Lombardi 2008). Here, Λ is assumed equal to 10 on the basis of practical and expert judgment, meaning that more than 10 real observations can drastically change the prior probability distribution (Grezio et al. 2012). After computing the expected value E and the variance V, the α and β hyper-parameters of the Beta distribution are finally constrained by Eq. 9, and the prior PDF is determined.
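Combining Eqs. 9 and 11 gives the hyper-parameters in closed form: since Λ = α + β − 1, matching the variance constraint yields α = E(Λ + 1) and β = (1 − E)(Λ + 1). A sketch of this conversion (the function name is ours):

```python
def beta_prior_from_mean_and_lambda(E, lam):
    """Beta prior hyper-parameters from the prior mean E (Eq. 10) and the
    equivalent number of data lam = alpha + beta - 1 (Eq. 11).
    With alpha + beta = lam + 1, the Beta variance E(1-E)/(alpha+beta+1)
    automatically equals E(1-E)/(lam+2) as required."""
    alpha = E * (lam + 1.0)
    beta = (1.0 - E) * (lam + 1.0)
    return alpha, beta
```

With E ≈ 3 × 10⁻⁴ (typical of the SSS prior means above) and Λ = 10, this gives a broad prior of roughly Beta(0.0033, 11.0).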

The prior Beta PDFs for SSSs and SMFs are shown separately in Fig. 8 for each key site along the Eastern Sicily and Southern Calabria coasts (Messina Strait Area). The prior probability that a tsunami event exceeds the 0.5 m threshold at each key site is in the interval [2.2 × 10−4–6.4 × 10−4] × year−1 for the SSSs and [0.7 × 10−7–1.5 × 10−7] × year−1 for the SMFs. The relative prior variance is in the range [1.8 × 10−5–5.3 × 10−5] × year−1 for the SSSs and [0.5 × 10−8–1.3 × 10−8] × year−1 for the SMFs.

Fig. 8 Prior Beta distributions for the (a) SSSs and (b) SMFs
The likelihood is based on the historical and/or instrumental records of run-ups that occurred in the Sicily and Calabria regions in the last 500 years (Maramai et al. 2005a, b; Favalli et al. 2009; Tinti et al. 2007; Grezio et al. 2012). The events are assumed independent and are investigated over the historical record of the last 500 years. The set of observations consists of the total number of 1-year time windows in which Z > zt in the historical catalog. The y years in which a tsunami occurred are counted as successes and the (n − y) years without tsunami data as failures. This is formalized in the likelihood function by using a Binomial model
$$P{\left(Y|\Theta =\theta \right)}_{\mathrm{likel}}\sim Bin\left(n,\theta \right)$$
(12)
where θ is the random variable defined in the interval [0, 1]. From the catalog, we use only entries that have (i) tsunami reliability equal to 4, meaning that a definite tsunami occurred, and (ii) tsunami intensity equal to 2 or 3 on the Ambraseys-Sieberg scale, recognizing that an event of intensity equal to 3 generally produces run-ups of approximately 1 m (Tinti et al. 2005). The impact of the tsunami waves was assumed large in the case of intensity 3, reaching all key sites in the Messina Strait Area, and lower in the case of intensity 2, relevant only for the closest key sites indicated by the catalog. In this case, even though the catalog does not explicitly provide run-up measures, values higher than 0.5 m were assigned to the local key sites.
The prior distribution (Beta distribution based on physical models, background knowledge, and marine geological information) is modified by the likelihood (Binomial distribution based on historical records) producing the posterior distribution (Beta distribution computed by point-wise multiplication). Then, the posterior distribution is
$$P{\left({\Theta}^i|Y={y}^i\right)}_{\mathrm{post}}\sim \mathrm{Beta}\left(\alpha +{y}^i,\beta +n-{y}^i\right)$$
(13)
where n is the number of years and yi the past observed events, with i = SSS, SMF. The posterior Beta PDFs are shown separately in Fig. 9 for each key site. When the historical run-ups are considered in the cities of the Messina Strait Area (Messina, Reggio Calabria, Pellaro, Catania, Augusta, Siracusa, Milazzo, Capo d’Orlando, Cefalù, Capo Vaticano, and Roccella), the posterior means increase substantially, with values between 2.0 × 10−3 × year−1 and 7.9 × 10−3 × year−1 in the case of the SSSs, and at the same time, the variances are reduced to the order of 10−6 × year−1. Similarly, the posterior means for the SMFs increase in Messina, Reggio Calabria, Pellaro, Catania, Augusta, Siracusa, and Stromboli, which experienced mass failures producing tsunamis, with values in the interval [2.0 × 10−3–11.9 × 10−3] × year−1 and posterior variances of the order of 10−10 × year−1.

Fig. 9 Posterior Beta distributions for the (a) SSSs and (b) SMFs
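The conjugate update of Eq. 13 is a one-liner, which makes the order-of-magnitude jump in the posterior means easy to verify. A sketch (the function name and example counts are ours):

```python
def beta_binomial_update(alpha, beta, n, y):
    """Eq. 13: y tsunami years observed out of n catalog years update
    Beta(alpha, beta) to Beta(alpha + y, beta + n - y)."""
    a_post = alpha + y
    b_post = beta + n - y
    mean = a_post / (a_post + b_post)
    variance = mean * (1.0 - mean) / (a_post + b_post + 1.0)
    return a_post, b_post, mean, variance
```

For instance, a prior with mean ~3 × 10⁻⁴ and Λ = 10 (α ≈ 0.0033, β ≈ 11.0), updated with a hypothetical y = 3 tsunami years in n = 500 years, yields a posterior mean of ~5.9 × 10⁻³, the order of magnitude of the SSS values reported above.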

The analysis shows that the SSS and SMF posterior probabilities generally increase by one or more orders of magnitude, and both types of tsunamigenic sources present the same order of magnitude in the Messina Strait Area. Therefore, both sources must be considered and combined in order to produce a reliable PTHA in this area. Conversely, the posterior variances are reduced by one order of magnitude in the SSS case and by two orders of magnitude in the SMF case. The epistemic uncertainty decreases as the number of past data and/or historical information increases, and the Beta distribution becomes more peaked because Λpost = Λ + yi.

Finally, if Θ defines the probability of occurrence of at least one tsunami event in the time interval, then 1 − Θ is the probability that no tsunami occurs. The final posterior distribution is
$$P{\left(\Theta \right)}_{\mathrm{post}}=1-\left[\left(1-{\Theta^{\mathrm{SSS}}}_{\mathrm{post}}\right)\left(1-{\Theta^{\mathrm{SMF}}}_{\mathrm{post}}\right)\right]$$
(14)
and evaluates the probability, for the Sicily and Calabria cities, that a tsunami run-up exceeding the threshold zt occurs in a time interval of 1 year, caused by either SSSs or SMFs. The mean and the variance of the final posterior distributions are reported in Table 2.
Table 2. Final means and variances of the posterior probability distribution that a tsunami run-up exceeding 0.5 m occurs in a 1-year time interval due to the SSSs and SMFs

| Key sites | Mean (× 10−3) | Variance (× 10−5) |
| --- | --- | --- |
| Messina | 5.9 | 1.1 |
| Reggio Calabria | 7.9 | 1.5 |
| Pellaro | 5.9 | 1.1 |
| Catania | 5.9 | 1.2 |
| Augusta | 5.9 | 1.2 |
| Siracusa | 3.9 | 0.8 |
| Milazzo | 2.0 | 0.4 |
| Capo d’Orlando | 2.0 | 0.4 |
| Cefalù | 2.0 | 0.4 |
| Stromboli | 11.9 | 2.3 |
| Capo Vaticano | 5.9 | 1.2 |
| Roccella | 7.9 | 1.6 |
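Eq. 14 combines the two source types under the independence assumption; a minimal sketch (the function name is ours):

```python
def combined_annual_probability(p_sss, p_smf):
    """Eq. 14: probability that at least one run-up exceeding the
    threshold occurs in 1 year from either source type, assuming the
    SSS and SMF posterior probabilities are independent."""
    return 1.0 - (1.0 - p_sss) * (1.0 - p_smf)
```

For annual probabilities as small as those in Table 2, this is close to the simple sum p_SSS + p_SMF, since the cross term is negligible.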

## Discussion and Conclusions

In Bayesian methods applied to tsunami hazard, the following issues should be taken into account and discussed:
• A mathematical consequence of the Bayesian procedure is that the results always lie within the range of the hypotheses, as with most statistical techniques. The results depend on the assumed prior statistical model as well as on the observed data, so Bayesian methods cannot be regarded as fully objective. In fact, different prior parameter determinations (with their own probability distributions, considered as the random variables) may lead to different conclusions, in particular when few past data are available.

• Information at the base of prior PDFs and likelihood functions sometimes cannot be completely independent in practical applications, leading to potential double counting. For example, in long-term applications where all the data on rare events should be considered, this issue can seriously affect Bayesian inferences if not properly accounted for (e.g., discussion in Selva and Sandri 2013). To overcome this issue, in practice, two guidelines can be adopted: (i) the tsunami data used to define the prior probability models should be extracted from different catalogs (e.g., generic earthquakes, not necessarily tsunamigenic earthquakes) and/or used only in aggregated forms (e.g., the prior information may be derived from tsunami events that occurred globally and support general knowledge about tsunami processes), whereas (ii) the data used to create the likelihood functions should be mainly local, in order to reduce as much as possible the potential effect of double counting. Therefore, all assumptions made in formulating the prior probabilities and in selecting the parameters should be stated explicitly so that the results can be properly assessed.

At the same time, Bayesian statistics presents several advantages in the present context. In fact:
• It enables merging of several kinds of available information in a homogeneous framework. Different statistical methods, theoretical deductions, background knowledge, physical beliefs, empirical laws, numerical models, analytical results, historical data, and instrumental measurements are combined and integrated. Bayesian techniques make use of all information, even sparse data, while keeping track of the assumptions about the prior knowledge or the level of ignorance. For a given probability model, an update of the final inferences is possible as soon as new models and/or additional data become available.

• It enables accounting for different sources of uncertainty, i.e., aleatory and epistemic uncertainty. The uncertainties are specified and synthesized in the statistical distributions, and the Bayesian procedure, considering all potential sources of information, enables a quantification and, in principle, a controlled reduction of the inherent epistemic uncertainties.

• It allows for propagating all the uncertainties from all the levels of the assessment. The most relevant sources of uncertainty from the tsunami source generation process to wave propagation and impact on the coasts may be reported and incorporated in the tsunami hazard computation (Marzocchi et al. 2004, 2008; Grezio et al. 2010, 2012; Gelman et al. 2013; Knighton and Bastidas 2015; Selva et al. 2016). Additionally, different types of potentially tsunamigenic sources may be included in the analysis in order to reduce biases (Grezio et al. 2015).

## Future Directions

A key issue for the future is PTHA testability against real and independent data. In its Bayesian interpretation, probability represents a state of knowledge, and it is intrinsically subjective because all probabilities are degrees of belief that cannot be measured (Lindley 2000) and/or eventually rejected (Jaynes 2003). In PTHA, this means that the probabilistic quantification strictly refers to the next time window, and its results cannot be tested. The frequentist interpretation instead intrinsically connects the probability definition to a measurable quantity (the past frequency) that can be theoretically known by analyzing an “infinite” sequence of outcomes of a repeatable event (Popper 1983). This makes such frequencies formally testable against real data. A unificationist approach (Marzocchi and Jordan 2014) may then be adopted for PTHA, in which the expert opinion is regarded as a model distribution describing the long-run frequencies determined by the data-generating process. These frequencies, which characterize the aleatory variability, have epistemic uncertainty described by the experts’ distributions. As the knowledge of the system increases, our capability of assessing the true value of such frequencies is refined; that is, the epistemic uncertainty is reduced. Therefore, following this definition of PTHA and its related uncertainty, if an “infinite” dataset were made available, Bayesian and classical PTHA would lead to equivalent results, since any subjective choice regarding priors would be completely overcome by the infinite dataset perfectly constraining the long-run frequencies. However, we are unfortunately far from this case, since tsunamis are relatively rare events compared to our observation window.

## Acknowledgments

We wish to thank Gareth Davies and Eric Geist for the constructive comments during the review process.

## Bibliography

1. Aida I (1969) Numerical experiments for the tsunami propagation – the 1964 Niigata tsunami and the 1968 Tokachi-Oki tsunami. Bull Earthquake Res Inst 47:673–700
2. Annaka T, Satake K, Sakakiyama T, Yanagisawa K, Shuto N (2007) Logic-tree approach for probabilistic tsunami hazard analysis and its applications to the Japanese coasts. Pure Appl Geophys 164:577–592
3. Blaser L, Ohrnberger M, Riggelsen C, Babeyko A, Scherbaum F (2011) Bayesian network for tsunami early warning. Geophys J Int 185(3):1431–1443.
4. Box GEP, Tiao GC (1992) Bayesian inference in statistical analysis. Wiley Classics Library, New York, p 588
5. Burbidge D, Cummins PR, Mleczko R, Thio HK (2008) A probabilistic tsunami hazard assessment for Western Australia. Pure Appl Geophys.
6. Castello B, Olivieri M, Selvaggi G (2007) Local and duration magnitude determination for the Italian earthquake catalogue (1981–2002). Bull Seismol Soc Am 97:128–139
7. Congdon P (2006) Bayesian statistical modelling, Wiley series in probability and statistics. Wiley, Chichester, p 529
8. Davies G, Griffin J, Løvholt F, Glymsdal S, Harbitz C, Thio HK, Lorito S, Basili R, Selva J, Geist E, Baptista MA (2016) A global probabilistic tsunami hazard assessment from earthquake sources, Accepted Manuscript, “Tsunamis: geology, hazards and risks”. GSL Special Publications, London
9. Draper D (2009) Bayesian statistics. Enc Complexity Earth Syst Sci 1:445–476
10. Favalli M, Boschi E, Mazzarini F, Pareschi MT (2009) Seismic and landslide source of the 1908 Straits of Messina tsunami (Sicily, Italy). Geophys Res Lett 36:L16304.
11. Geist EL (2002) Complex earthquake rupture and local tsunamis. J Geophys Res 107:ESE2–1–ESE 2–16
12. Geist EL, Lynett PJ (2014) Source processes for the probabilistic assessment of tsunami hazards. Oceanography 27(2):86–93.
13. Geist EL, Oglesby DD (2014) Tsunamis: stochastic models of occurrence and generation mechanisms. In: Meyers RA (ed) Encyclopedia of complexity and systems science. Springer, New York.
14. Geist EL, Parsons T (2006) Probabilistic analysis of tsunami hazards. Nat Hazards 37:277–314.
15. Geist EL, Parsons T (2010) Estimating the empirical probability of submarine landslide occurrence. In: Mosher DC et al (eds) Submarine mass movements and their consequences. Advances in natural and technological hazards research, vol 28. Springer, Dordrecht, p 377
16. Geist EL, Ten Brink US, Gove M (2014) A framework for the probabilistic analysis of meteotsunamis. Nat Hazards 74:123–142.
17. Gelman A, Carlin JB, Stern HS, Rubin DB (2013) Bayesian data analysis. Chapman & Hall/CRC Press, Boca Raton, p 667
18. Gonzalez FI, Geist EL, Jaffe B, Kânoğlu U, Mofjeld H, Synolakis CE, Titov VV, Arcas D, Bellomo D, Carlton D, Horning T, Johnson J, Newman J, Parsons T, Peters R, Peterson C, Priest G, Venturato A, Weber J, Wong F, Yalciner A (2009) Probabilistic tsunami hazard assessment at Seaside, Oregon, for near- and far-field seismic sources. J Geophys Res 114:C11023.
19. Gregory P (2005) Bayesian logical data analysis for the physical sciences. Cambridge University Press, Cambridge, p 468
20. Grezio A, Marzocchi W, Sandri L, Gasparini P (2010) A Bayesian procedure for Probabilistic Tsunami Hazard Assessment. Nat Hazards 53:159–174.
21. Grezio A, Marzocchi W, Sandri L, Argnani A, Gasparini P (2012) Probabilistic tsunami hazard assessment for messina strait area (Sicily – Italy). Nat Hazards.
22. Grezio A, Tonini R, Sandri L, Pierdominici S, Selva J (2015) A methodology for a comprehensive probabilistic tsunami hazard assessment: multiple sources and short-term interactions. J Mar Sci Eng 3:23–51.
23. Grezio A, Babeyko A, Baptista MA, Behrens J, Costa A, Davies G, Geist EL, Glimsdal S, González FI, Griffin J, Harbitz CB, LeVeque RJ, Lorito S, Løvholt F, Omira R, Mueller C, Paris R, Parsons T, Polet J, Power W, Selva J, Sørensen MB, Thio HK (2017) Probabilistic tsunami hazard analysis: multiple sources and global applications. Rev Geophys 55.
24. Grilli ST, Watts P (2005) Tsunami generation by submarine mass failure, I: modeling, experimental validation, and sensitivity analyses. J Waterway Port Coast Ocean Eng 131:283–297
25. Gutenberg B, Richter C (1944) Frequency of earthquakes in California. Bull Seismol Soc Am 34:185–188
26. Heidarzadeh M, Kijko A (2011) A probabilistic tsunami hazard assessment for the Makran subduction zone at the northwestern Indian Ocean. Nat Hazards 56:577–593
27. Hoechner A, Babeyko AY, Zamora N (2016) Probabilistic tsunami hazard assessment for the Makran region with focus on maximum magnitude assumption. Nat Hazards Earth Syst Sci 16:1339–1350.
28. Horspool N, Pranantyo I, Griffin J, Latief H, Natawidjaja DH, Kongko W, Cipta A, Bustaman B, Anugrah SD, Thio HK (2014) A probabilistic tsunami hazard assessment for Indonesia. Nat Hazards Earth Syst Sci 14:3105–3122.
29. Jaynes ET (2003) Probability theory: the logic of science. Bretthorst GL (ed). Cambridge University Press, Cambridge, p 727
30. Kagan YY (2002a) Seismic moment distribution revisited: I, statistical results. Geophys J Int 148:520–541
31. Kagan YY (2002b) Seismic moment distribution revisited: II, moment conservation principle. Geophys J Int 149:731–754
32. Kagan YY, Jackson DD (2000) Probabilistic forecasting of earthquakes. Geophys J Int 143:438–453
33. Knighton J, Bastidas LA (2015) A proposed probabilistic seismic tsunami hazard analysis methodology. Nat Hazards.
34. Lane EM, Gillibrand PA, Wang X, Power W (2013) A probabilistic tsunami hazard study of the Auckland region, part II: inundation modelling and hazard assessment. Pure Appl Geophys 170:1635–1646.
35. Lay T (2015) The surge of great earthquakes from 2004 to 2014. Earth Planet Sci Lett 409:133–146.
36. Lindley DV (2000) The philosophy of statistics. Statistician 49:293–337
37. Lorito S, Selva J, Basili R, Romano F, Tiberti MM, Piatanesi A (2015) Probabilistic hazard for seismically induced tsunamis: accuracy and feasibility of inundation maps. Geophys J Int 200(1):574–588.
38. Lorito S, Romano F, Lay T (2016) Tsunamigenic earthquakes (2004–2013): source processes from data inversion. In: Meyers RA (ed) Encyclopedia of complexity and systems science, vol 2015. Springer Science+Business Media, New York.
39. Maramai A, Graziani L, Alessio G, Burrato P, Colini L, Cucci L, Nappi R, Nardi A, Vilardo G (2005a) Near- and far-field survey report of the 30 December 2002 Stromboli (Southern Italy) tsunami. Mar Geol 215:93–106
40. Maramai A, Graziani L, Tinti S (2005b) Tsunami in the Aeolian Islands (southern Italy): a review. Mar Geol 215:11–21
41. Marzocchi W, Jordan TH (2014) Testing for ontological errors in probabilistic forecasting models of natural systems. PNAS 111(33):11973–11978.
42. Marzocchi W, Lombardi AM (2008) A double branching model for earthquake occurrence. J Geophys Res 113:B08317.
43. Marzocchi W, Sandri L, Gasparini P, Newhall C, Boschi E (2004) Quantifying probabilities of volcanic events: the example of volcanic hazard at Mount Vesuvius. J Geophys Res 109:B11201.
44. Marzocchi W, Sandri L, Selva J (2008) BET_EF: a probabilistic tool for long- and short-term eruption forecasting. Bull Volcanol 70:623–632.
45. Marzocchi W, Taroni M, Selva J (2015) Accounting for epistemic uncertainty in PSHA: logic tree and ensemble modeling. Bull Seismol Soc Am 105:2151–2159.
46. Mueller C, Power W, Fraser S, Wang X (2015) Effects of rupture complexity on local tsunami inundation: implications for probabilistic tsunami hazard assessment by example. J Geophys Res Solid Earth 120:488–502.
47. O’Loughlin KF, Lander JF (2003) Caribbean tsunamis: a 500-year history from 1498–1998. Kluwer Academic Publishers, Dordrecht
48. Okada Y (1985) Surface deformation due to shear and tensile faults in a half-space. Bull Seismol Soc Am 75:1135–1154
49. Okada Y (1992) Internal deformation due to shear and tensile faults in a half-space. Bull Seismol Soc Am 82:1018–1040
50. Omira R, Baptista MA, Matias L (2015) Probabilistic tsunami hazard in the Northeast Atlantic from near- and far-field tectonic sources. Pure Appl Geophys 172:901–920.
51. Orfanogiannaki K, Papadopoulos G (2007) Conditional probability approach of the assessment of tsunami potential: application in three tsunamigenic regions of the Pacific Ocean. Pure Appl Geophys 164:593–603
52. Parsons T (2008) Monte Carlo method for determining earthquake recurrence parameters from short paleoseismic catalogs: example calculations for California. J Geophys Res 113.
53. Parsons T, Geist EL (2009) Tsunami probability in the Caribbean region. Pure Appl Geophys 165:2089–2116.
54. Paté-Cornell M (1996) Uncertainties in risk analysis: six levels of treatment. Reliab Eng Syst Saf 54:95–111
55. Polet J, Kanamori H (2009) Tsunami earthquakes. In: Meyers RA (ed) Encyclopedia of complexity and systems science. Springer, New York.
56. Popper KR (1983) Realism and the aim of science. Hutchinson, London
57. Power W, Wang X, Lane EM, Gillibrand PA (2013) A probabilistic tsunami hazard study of the Auckland region, part I: propagation modelling and tsunami hazard assessment at the shoreline. Pure Appl Geophys 170:1621.
58. Reid RO, Bodine BR (1968) Numerical model for storm surges in Galveston Bay. J Waterways and Harbors Div ASCE 94:33–57
59. Sakai T, Takeda T, Soraoka H, Yanagisawa K, Annaka T (2006) Development of a probabilistic tsunami hazard analysis in Japan. In: Proceedings of ICONE14 international conference on nuclear engineering, Miami, 17–20 July, ICONE14-89183
60. Satake K (1995) Linear and nonlinear computations of the 1992 Nicaragua earthquake tsunami. Pure Appl Geophys 144:455–470
61. Satake K (2002) Tsunamis. In: Lee WHK, Kanamori H, Jennings PC, Kisslinger C (eds) International handbook of earthquake and engineering seismology, vol 81A. Academic Press, Amsterdam, pp 437–451
62. Selva J, Sandri L (2013) Probabilistic seismic hazard assessment: combining Cornell-like approaches and data at sites through Bayesian inference. Bull Seismol Soc Am 103(3):1709–1722.
63. Selva J, Tonini R, Molinari I, Tiberti MM, Romano F, Grezio A, Melini D, Piatanesi A, Basili R, Lorito S (2016) Quantification of source uncertainties in seismic probabilistic tsunami hazard analysis (SPTHA). Geophys J Int.
64. Shin JY, Chen S, Kim T-W (2015) Application of Bayesian Markov chain Monte Carlo method with mixed Gumbel distribution to estimate extreme magnitude of tsunamigenic earthquake. KSCE J Civ Eng 19(2):366–375.
65. Shuto N (1991) Numerical simulation of tsunamis – its present and near future. Nat Hazards 4:171–191
66. Smith WHF, Sandwell DT (1997) Global seafloor topography from satellite altimetry and ship depth soundings. Science 277:1957–1962
67. Sørensen MB, Spada M, Babeyko A, Wiemer S, Grünthal G (2012) Probabilistic tsunami hazard in the Mediterranean Sea. J Geophys Res 117:B01305.
68. Stigler SM (1983) Who discovered Bayes Theorem? Am Stat 37:290–296
69. Suppasri A, Imamura F, Koshimura S (2012) Probabilistic tsunami hazard analysis and risk to coastal populations in Thailand. J Earthq Tsunami 6:1250011.
70. Synolakis CE (1987) The runup of solitary waves. J Fluid Mech 185:523–545
71. Tanioka Y, Satake K (1996) Tsunami generation by horizontal displacement of ocean bottom. Geophys Res Lett 23:861–865
72. Tatsumi D, Calder CA, Tomita T (2014) Bayesian near-field tsunami forecasting with uncertainty estimates. J Geophys Res Oceans 119:2201–2211.
73. Thio HK, Li W (2015) Probabilistic tsunami hazard analysis of the Cascadia subduction zone and the role of epistemic uncertainties and aleatory variability. In: 11th Canadian conference on earthquake engineering, Victoria, pp 21–24
74. Thio HK, Somerville P, Polet J (2010) Probabilistic tsunami hazard in California. PEER Report 2010/108, Pacific Earthquake Engineering Research Center
75. Tinti S, Maramai A, Graziani L (2004) The new catalogue of Italian tsunamis. Nat Hazards 33:439–465
76. Tinti S, Armigliato A, Tonini R, Maramai A, Graziani L (2005) Assessing the hazard related to tsunamis of tectonic origin: a hybrid statistical-deterministic method applied to Southern Italy coasts. ISET J Earthquake Tech 42:189–201
77. Tinti S, Argnani A, Zaniboni F, Pagnoni G, Armigliato A (2007) Tsunamigenic potential of recently mapped submarine mass movements offshore eastern Sicily (Italy): numerical simulations and implications for the 1693 tsunami. IASPEI, JSS002, abstract no. 8235, IUGG XXIV General Assembly, Perugia, 2–13 July 2007
78. TPSWG Tsunami Pilot Study Working Group (2006) Seaside, Oregon tsunami pilot study – modernization of FEMA flood hazard maps. NOAA OAR Special Report, NOAA/OAR/PMEL, Seattle, p 94 + 7 appendices
79. Vannucci G, Gasperini P (2004) The new release of the database of earthquake mechanisms of the Mediterranean area (EMMA2). Ann Geophys Suppl 47:307–334
80. Watts P, Grilli ST, Tappin D, Fryer GJ (2005) Tsunami generation by submarine mass failure, II: predictive equations and case studies. J Waterway Port Coast Ocean Eng 131:298–310
81. Wells DL, Coppersmith KJ (1994) New empirical relationships among magnitude, rupture length, rupture width, rupture area and surface displacement. Bull Seismol Soc Am 84:974–1002
82. Yadav RBS, Tsapanos TM, Tripathi JN, Chopra S (2013) An evaluation of tsunami hazard using Bayesian approach in the Indian Ocean. Tectonophysics 593:172–182

## Copyright information

© Springer Science+Business Media LLC 2019

## Authors and Affiliations

• Anita Grezio (1), corresponding author
• Stefano Lorito (2)
• Tom Parsons (3)
• Jacopo Selva (1)

1. Istituto Nazionale di Geofisica e Vulcanologia, Bologna, Italy
2. Istituto Nazionale di Geofisica e Vulcanologia, Rome, Italy
3. USGS, Menlo Park, USA