# Inference for ETAS models with non-Poissonian mainshock arrival times

## Abstract

The Hawkes process is a widely used statistical model for point processes which produce clustered event times. A specific version known as the ETAS model is used in seismology to forecast earthquake arrival times under the assumption that mainshocks follow a Poisson process, with aftershocks triggered via a parametric kernel function. However, this Poissonian assumption contradicts several aspects of seismological theory which suggest that the arrival time of mainshocks instead follows alternative renewal distributions such as the Gamma or Brownian Passage Time. We hence show how the standard ETAS/Hawkes process can be extended to allow for non-Poissonian distributions by introducing a dependence based on the underlying process’ behaviour. Direct maximum likelihood estimation of the resulting models is not computationally feasible in the general case, so we also present a novel Bayesian MCMC algorithm for efficient estimation using a latent variable representation.

## Keywords

ETAS, Stress release, Renewal process, Hawkes process, Brownian passage times, RHawkes

## 1 Introduction

The Epidemic Type Aftershock Sequence (ETAS) model is commonly used for studying and forecasting the occurrence of earthquakes in a geographical region of interest (Ogata 1988). It assumes that earthquakes follow a self-exciting marked point process governed by a conditional intensity function \(\lambda (t | {\mathscr {H}}_t)\) which defines the instantaneous probability of an earthquake occurring at each time point *t* based on the historical earthquake sequence \({\mathscr {H}}_t=\{(t_1,m_1), (t_2,m_2),\ldots : t_i<t\}\), where \(t_i\) and \(m_i\), respectively, denote the time and magnitude of the *i*th previous earthquake.

Since the ETAS model assumes that the immigrant earthquakes follow a Poisson process with constant intensity \(\mu \), this implies that they occur completely at random, i.e. that an immigrant event is equally likely to occur at each point in time, and that the time between successive immigrant events (known as the ‘inter-arrival time’) follows a time-independent Exponential(\(\mu \)) distribution. However, this conflicts with findings elsewhere in the seismology literature, where there is substantial doubt over whether the occurrence times of mainshock earthquakes are really Poissonian (Tahernia et al. 2014; Ordaz and Arroyo 2016; Marzocchi and Taroni 2014). Although ETAS immigrant events are not strictly equivalent to mainshocks as defined elsewhere in the seismology literature (since there is no requirement that an ETAS immigrant should have larger magnitude than its offspring), this still casts some doubt on the Poissonian assumption.

The concept of stress release (SR) suggests that the mainshock arrival times instead follow a renewal process that has a time-dependent hazard function, with inter-event times following a distribution such as the Gamma, Weibull, or Brownian Passage Times (BPT). Stress release models (SRM) are a representation of Reid’s elastic rebound theory (Reid 1910) and were fully described by Isham and Westcott (1979) as a self-correcting point process which is updated after every event occurrence. They were introduced to seismology by Vere-Jones (1978), who developed them to formalise Reid’s theory that earthquakes occur through the release of strain energy previously accumulated along faults. Because of this solid physical background, SRMs have been used in many locations to implement the elastic rebound theory. As outlined in Varini and Rotondi (2015), examples of such implementations exist for the following countries: China (Yang et al. 2000; Liu et al. 1998; Xiaogu and Vere-Jones 1994), Greece (Rotondi and Varini 2006), Iran (Xiaogu and Vere-Jones 1994), Italy (Rotondi and Varini 2007; Varini and Rotondi 2015), Japan (Imoto 2001; Lu et al. 1999; Xiaogu and Vere-Jones 1994), New Zealand (Yang et al. 2000) and Taiwan (Zhu and Shi 2002).

SRMs are primarily applied to declustered sequences of mainshock events with large magnitudes, rather than to the full seismic sequences that are commonly used to fit ETAS models. In this paper we develop a new class of ETAS models, which we call SR-ETAS (stress release ETAS), that improve on standard ETAS models by incorporating time-dependent inter-arrival distributions. We explore two different formulations of SR-ETAS, which differ in how they define the inter-event time used when calculating the immigrant event intensity. The first formulation is simpler to estimate but harder to simulate from, and applies Reid’s elastic rebound theory directly to all events in the catalogue. The second is harder to estimate because it assumes that Reid’s theory applies only to immigrant events and therefore depends entirely on the branching structure, making direct maximum likelihood estimation impossible.

A model which is closely related to our SR-ETAS was proposed by Wheatley et al. (2016), who considered a Hawkes process with a renewal immigration process, which they call Renewal Hawkes (RHawkes). The authors proposed an Expectation Maximisation (EM) algorithm for parameter estimation. However, as pointed out by Wheatley (2017) and Wheatley (2016), their approach crucially exploited the Markovian properties of the Exponential offspring density \(g(\cdot )\) that they considered, which leads to instability when this is replaced by a heavy-tailed alternative such as the Omori law used in the ETAS model (Oakes 1975; Filimonov and Sornette 2015). To mitigate this, they suggest that such heavy-tailed densities should be approximated by a sum of weighted exponential kernels. Further, simulation studies found that their EM algorithm performs poorly even for the simpler Renewal Immigration Hawkes process when the offspring clusters overlap heavily, which is inevitably the case for seismic sequences. To correct this, Chen and Stindl (2018) provided a direct maximum likelihood optimisation, as well as some conceptual corrections to the method proposed by Wheatley et al. However, both methods fail to address two fundamental issues. The first is the potential multimodality of the ETAS model likelihood. As discussed in Rasmussen (2013), Veen and Schoenberg (2008) and Ross (2018a), such numerical instabilities can be tackled using an MCMC sampler. The second, and probably more important, problem is the lack of discussion regarding the numerical stability of the evaluation of Eq. 2. This form is needed whenever the intensity has no single closed-form expression, i.e. it has to be evaluated as a ratio of two functions. The problem arises because the denominator of Eq. 2 approaches zero for large time lags.

Since the existing Expectation Maximisation (EM) and Direct Maximum Likelihood Estimation algorithms lead to either poor or limited estimation of the SR-ETAS model, we instead propose a novel Bayesian inference algorithm which uses latent variables to allow for computationally efficient inference using a Gibbs sampler, which is an extension of that proposed for the standard Hawkes process by Ross (2018a).

The remainder of this paper proceeds as follows. In Sect. 2 we review the standard ETAS model in more detail. The SR-ETAS models are fully introduced in Sect. 3, and we discuss different choices for the immigrant process in Sect. 4. The methods for parameter estimation are presented in Sect. 5. In Sect. 6, we introduce the goodness-of-fit tests which will be used to compare the performance of SR-ETAS to standard ETAS models. Finally, in Sect. 7, we study the application of SR-ETAS models and compare their performance to standard ETAS using real earthquake data from the New Madrid and the North California seismic sequences.

## 2 Standard ETAS model

*i*. All magnitudes are assumed to independently follow the Gutenberg–Richter law, which corresponds to a shifted Exponential(\(\beta \)) distribution with lower bound \(M_0\). The \(\mu \) parameter specifies the intensity of the homogeneous point process governing the immigrant events, while \(g(\cdot )\) is a kernel function specifying how the effect of each earthquake on the intensity decays over time. It is usually taken to be the Omori law:

*c* and *p* are parameters controlling the decay rate, while *k* controls the average productivity. The magnitude kernel \(\kappa (m_i)\) determines how the magnitude of each earthquake affects the intensity and is usually defined as:

*k*, and \(M_0\) is the catalogue’s magnitude of completeness, i.e. the minimum magnitude above which it is assumed that no events are missing due to physical limitations in the earthquake detection system. The unknown parameter set of the standard ETAS model is hence: \(\theta =\{\mu , \alpha , c, p, k\}\).

Note that the form of the conditional intensity function in Eq. 1 is equivalent to a branching process, as discussed in the previous section. Suppose that at some time point *t* there have been \(n_t\) previous earthquakes. Then, the process intensity at *t* can be viewed as a linear superposition of the immigrant process with intensity \(\mu \) and the \(n_t\) processes associated with each previous event, each contributing an intensity of \(g(t-t_i)\). It can hence be seen that this formulation is equivalent to assuming that the immigrant events follow a homogeneous Poisson process with intensity \(\mu \), and hence have exponentially distributed inter-event times.
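As a rough illustration of this superposition, the sketch below evaluates the intensity at a time *t* as the immigrant rate plus one kernel term per earlier event. The functional forms are assumptions made for the example (a normalised Omori density and an exponential magnitude kernel with reference magnitude \(M_0\)), and all function names are hypothetical; they are not taken from the paper's equations.

```python
import math


def omori_density(dt, c, p):
    """Assumed normalised Omori kernel; integrates to 1 over dt > 0 when p > 1."""
    return (p - 1) * c ** (p - 1) / (dt + c) ** p


def magnitude_kernel(m, k, alpha, m0):
    """Assumed productivity law kappa(m) = k * exp(alpha * (m - m0))."""
    return k * math.exp(alpha * (m - m0))


def etas_intensity(t, times, mags, mu, k, alpha, c, p, m0):
    """Immigrant rate mu plus one triggered contribution per event before t."""
    total = mu
    for t_i, m_i in zip(times, mags):
        if t_i < t:
            total += magnitude_kernel(m_i, k, alpha, m0) * omori_density(t - t_i, c, p)
    return total
```

With no earlier events the function returns \(\mu \) alone, which is the homogeneous Poisson immigrant rate described above.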

The standard ETAS model can be generalised to include a space component, giving the spatiotemporal ETAS model (Ogata 1998). For simplicity and ease of both simulation and computation, we only consider the original temporal ETAS model in this paper rather than its spatiotemporal extension, although our model could be extended to the spatial version without difficulty.

## 3 SR-ETAS models

*t* depends on the time at which the last mainshock occurred. To make this clearer, we introduce the following notation. For each earthquake *i*, let \(B_i\) denote the index of its parent earthquake in the branching structure, with \(B_i=0\) if it has no parent (i.e. if earthquake *i* is an immigrant). We hence have the branching vector \(B=(B_1,\ldots ,B_n)\). For example, in Fig. 1, \(B=(0,1,1,3,1,0,6,7,6,0,8)\).

*t* we write the occurrence time of the last immigrant event prior to \(t_i\) as \(t_{I_{[i]}}\), where \(I_{[i]}=\max _j \{j | t_j < t_i \text{ and } B_j = 0\}\). Similarly, the amount of time which has elapsed since that immigrant event, known as the waiting time, is given by:

Under the definition provided by Eq. 2, the probability of an immigrant event occurring depends on the time which has elapsed since the previous immigrant event. This is consistent with SR theory: in terms of Reid’s rebound theory, the ground state level is reached only at immigrant events, while all other events have a smaller impact on the strain accumulation/reduction. Since the branching structure is used to determine the time of the last immigrant event, we will refer to this model as the B-SR-ETAS model (Branched-SR-ETAS).

However, in practice when working with real earthquake catalogues, we do not know which events in the sequence are mainshocks since we do not have access to the true branching structure. Indeed, the branching structure is usually estimated as a by-product of the standard algorithms used to estimate the ETAS model (Ross 2018a; Rasmussen 2013; Veen and Schoenberg 2008). However, we cannot use this idea directly since we are caught in a vicious circle: our parameter estimation requires access to the branching structure in order to define the mainshock earthquakes, but we cannot get the branching structure without first estimating the model parameters! One approach is to marginalise the branching structures out of the joint distribution by summing over all \(2^{n-1}\) unique branching structures, for a catalogue with length *n*. However, this is computationally intractable for even a moderate value of *n*. As such, we will instead introduce a Monte Carlo approach for performing this inference in a computationally tractable way.

*t*, regardless of whether it was an immigrant or an offspring. At time *t*, the time of the last event is given by \(t_E\) where \(E = \max \{i | t_i < t\}\). The waiting time in this case is hence:

The previously introduced parameter set \(\theta \) can be adapted for both SR-ETAS models as \(\theta =\{\theta _{SR}, \alpha , c, p, k\}\), where \(\theta _{SR}\) denotes the parameters of the waiting time distribution \(F_w(\cdot )\).
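The difference between the two waiting-time definitions can be sketched on the branching vector of Fig. 1. The event times below are hypothetical (unit spacing, chosen only so the result is easy to read), and the helper names are not from the paper. Indices are 1-based in the function arguments to match the notation \(B_i\) and \(I_{[i]}\).

```python
def last_immigrant_index(B, i):
    """1-based index of the last immigrant strictly before event i, else None."""
    for j in range(i - 1, 0, -1):
        if B[j - 1] == 0:
            return j
    return None


def waiting_times(times, B, branched=True):
    """Waiting time evaluated at each event time t_i for i >= 2.

    branched=True  measures from the last immigrant (B-SR-ETAS).
    branched=False measures from the previous event (F-SR-ETAS).
    Assumes the first event is an immigrant, i.e. B[0] == 0.
    """
    out = []
    for i in range(2, len(times) + 1):
        j = last_immigrant_index(B, i) if branched else i - 1
        out.append(times[i - 1] - times[j - 1])
    return out


# Branching vector from Fig. 1; the times are hypothetical.
B_fig1 = [0, 1, 1, 3, 1, 0, 6, 7, 6, 0, 8]
times_demo = [float(t) for t in range(1, 12)]
branched_w = waiting_times(times_demo, B_fig1, branched=True)
full_w = waiting_times(times_demo, B_fig1, branched=False)
```

Under these made-up times, the branched waiting times grow until the next immigrant (events 6 and 10) resets them, whereas the full waiting times are all equal to the unit spacing.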

## 4 Waiting time distributions

Regardless of which of the two approaches (B-SR-ETAS or F-SR-ETAS) we take when defining the waiting times \(w_t\), we must specify a probability model \(F_w\) which governs their distribution. In standard ETAS, the Poisson assumption results in a memoryless Exponential distribution. In contrast, the SR approach implies other forms of distributions with nonconstant hazard rate. There is some controversy in the seismological literature over the appropriate waiting time distribution for modelling the time between mainshocks. As such, we will consider two different distributions which have been found to have strong empirical support: the Brownian Passage Time, and the Gamma.

### 4.1 Brownian passage times (BPT) immigration

1. The mean waiting time of the (immigrant) events in the catalogue of interest, \(\lambda \), provides a threshold up to which the probability of event occurrence is continuously increasing. After reaching the mean waiting time, the conditional probability of occurrence is time independent and depends only on the aperiodicity parameter, \(\nu \), which is associated with the scaling of the Brownian motion.

2. Earthquake occurrence corresponds to an immediate stress release to the ground base level. Thus, the probability of immediate event recurrence is zero.
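A minimal sketch of the BPT hazard \(f_w(w)/(1-F_w(w))\) follows, assuming the usual inverse Gaussian parameterisation of the BPT distribution with mean \(\lambda \) and aperiodicity \(\nu \). The function names are hypothetical. The survival function is written with the complementary error function so that the small tail probabilities in the denominator are computed directly rather than as one minus a number close to one.

```python
import math


def bpt_pdf(w, lam, nu):
    """BPT (inverse Gaussian) density with mean lam and aperiodicity nu."""
    scale = math.sqrt(lam / (2.0 * math.pi * nu ** 2 * w ** 3))
    return scale * math.exp(-((w - lam) ** 2) / (2.0 * lam * nu ** 2 * w))


def _normal_upper_tail(x):
    """P(Z > x) for a standard normal Z, computed via erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bpt_survival(w, lam, nu):
    """1 - F_w(w), built from upper tails only.

    Note: exp(2 / nu**2) overflows for very small nu; this sketch does not
    guard against that case.
    """
    r = math.sqrt(w / lam)
    u1 = (r - 1.0 / r) / nu
    u2 = (r + 1.0 / r) / nu
    return _normal_upper_tail(u1) - math.exp(2.0 / nu ** 2) * _normal_upper_tail(u2)


def bpt_hazard(w, lam, nu):
    """Hazard f_w(w) / (1 - F_w(w)) of the BPT waiting time."""
    return bpt_pdf(w, lam, nu) / bpt_survival(w, lam, nu)
```

For waiting times close to zero the density underflows to zero while the survival probability stays at one, so the hazard is numerically zero, in line with the second property above.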

### 4.2 Gamma process immigration

## 5 Estimation

We now consider parameter estimation for the SR-ETAS models. This includes estimating the ETAS model parameters \(\theta _{\Phi }=(\alpha , c,p,k)\), as well as \(\theta _{\mathrm{SR}}\), the parameters of the waiting time distribution \(F_w\). Let \(\theta = (\theta _{\mathrm{SR}}, \theta _{\Phi })\) denote the full set of unknown parameters. We perform Bayesian inference for the model parameters by developing a latent variable MCMC scheme that allows sampling from the full posterior.

### 5.1 Likelihood function

*n* indicating whether each event is an immigrant (1) or not (0). For the branching structure introduced in Fig. 1, the immigrant information is \(Z=\{1,0,0,0,0,1,0,0,0,1,0\}\).
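As a small worked check, the immigrant indicator can be read off the branching vector by flagging every event whose parent index is 0. The helper name is hypothetical.

```python
def immigrant_indicators(B):
    """Z_i = 1 if event i has no parent (B_i == 0), otherwise 0."""
    return [1 if parent == 0 else 0 for parent in B]


# Branching vector from Fig. 1.
B_fig1 = [0, 1, 1, 3, 1, 0, 6, 7, 6, 0, 8]
Z_fig1 = immigrant_indicators(B_fig1)
```

Applied to the Fig. 1 branching vector, this reproduces the indicator vector quoted in the text.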

*Z* for B-SR-ETAS since the intensity of the background process depends on the time at which the last immigrant event occurred. However, since the true branching structure is not known in practice, it must be marginalised out by summing over all \(2^{n-1}\) possible values. Therefore, the log-likelihood of the B-SR-ETAS model is:

*n*. As such, we will instead use a latent variable formulation where the unknown branching vector *Z* is treated as a parameter to be learned. In order to evaluate this quantity, we can either use a single “best” value or provide a Monte Carlo approximation of it by sampling multiple branching structures given the true/optimised parameters \(\theta \).

While the log-likelihood function proposed by Wheatley et al. (2016) is conceptually the same as the one shown above, Chen and Stindl (2018, Sect. 3, Remark 1) claim that this form of the log-likelihood is wrong for the RHawkes process they examine. The full algorithm proposed by Chen and Stindl (2018) for calculating the (log-)likelihood of RHawkes is provided in “Appendix A”. This method requires the calculation of probabilities associated with all possible inheritance structures. In other words, the immigrant intensity Eq. 2 has to be evaluated for all possible temporal lags in the calculation of Eqs. 6–8, Sect. 3 of Chen and Stindl (2018). As discussed before, such an expression cannot be evaluated for immigrant distributions that do not have an explicit intensity function (Eq. 2). However, from a Bayesian perspective, the branching structure is a feature that we learn. Rather than being an unknown quantity, it is a data characteristic that we evaluate based on our inheritance beliefs. Thus, the log-likelihood function provided in Eq. 5 is feasible within the scope of a Bayesian algorithm.

### 5.2 Bayesian analysis

*i* we would like to propose a new sampled value of \(\theta ^{(i)}\) based on \(\theta ^{(i-1)}\). For example we might consider a white noise perturbation such as \(\theta ^{(i)}=\theta ^{(i-1)}+\epsilon \) where \(\epsilon \sim N(0, \sigma ^2)\). For such a symmetric proposal, the acceptance probability of the proposed value \(\theta ^{(i)}\) is \(\min \{1, \pi (\theta ^{(i)}|{\mathscr {H}}_t )/\pi (\theta ^{(i-1)}|{\mathscr {H}}_t )\}\). If the value is rejected, we fail to obtain a new sample at this step, assign the \((i-1)\)st sample to the *i*th (i.e. \(\theta ^{(i)}=\theta ^{(i-1)}\)) and repeat the procedure for the next step.
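The update described here is a random-walk Metropolis step. The sketch below shows one generic way to write it for a scalar parameter, working with the log posterior for numerical safety; the function name, the `log_post` callable and the default step size are assumptions for illustration and do not reproduce the paper's sampler.

```python
import math
import random


def random_walk_metropolis(log_post, theta0, n_iter, sigma=0.1, seed=0):
    """Random-walk Metropolis for a scalar parameter.

    log_post: callable returning the log posterior density up to a constant
              (-inf outside the support). theta0 must lie inside the support.
    """
    rng = random.Random(seed)
    theta = theta0
    lp = log_post(theta)
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, sigma)   # theta + epsilon
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, lp_prop - lp))
        if rng.random() < accept_prob:
            theta, lp = proposal, lp_prop
        samples.append(theta)                      # on rejection, repeat old value
    return samples
```

A rejected proposal leaves the chain at its previous value, which is recorded again, so the output always has `n_iter` entries.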

*g*() that now integrates to 1. The log-likelihood function as of Eq. 6 is then:

#### 5.2.1 Branching procedure

*B* denote the branching structure vector where \(B_i=j\) indicates that the *i*-th event in the sequence is caused by the *j*-th event (\(j<i\)). Immigrant events are denoted as uncaused, i.e. caused by an event with index 0. Referring again to the branching structure introduced in Fig. 1, the corresponding values of \(B_i\) are \(B=\{0, 1, 1, 3, 1, 0, 6, 7, 6, 0, 8\}\). The immigrant events come from an inhomogeneous Poisson process with intensity function \(\mu (\cdot )\), while the offspring events of the *j*-th event are generated from an inhomogeneous Poisson process with intensity \(h(t_i-t_j) \iota (m_j)\). Assuming that each event in the sequence is generated by a single process, we can assign a probability distribution to each event with respect to its branching pedigree and therefore sample a branching structure from its conditional posterior as follows:

1. Initialise the branching by setting \(B_1=0\), since the first event is always assumed to be an immigrant.

2. Sample each \(B_i\) in turn from \(P(B_i | {\mathscr {H}}_t, \theta , B_{1:(i-1)})\).

3. Return the sequence of generated \(B_i\)s.

*i*-th event in order to obtain branching for B-SR-ETAS, and \(I_{[i]}=i-1\) for F-SR-ETAS and ETAS models.
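The three steps can be sketched as follows, assuming that the conditional probability of each candidate parent is proportional to the intensity it contributes at \(t_i\): \(\mu (\cdot )\) evaluated at the current waiting time for the immigrant option, and \(h(t_i-t_j)\iota (m_j)\) for each earlier event *j*. The callables `mu_fn`, `h` and `iota` and the function name are hypothetical placeholders, and event times are assumed to be strictly increasing.

```python
import random


def sample_branching(times, mags, mu_fn, h, iota, branched=True, seed=0):
    """Sample a branching vector B (parents are 1-based, 0 = immigrant).

    mu_fn(w): immigrant intensity at waiting time w
    h(dt):    offspring kernel at lag dt
    iota(m):  magnitude productivity of a parent with magnitude m
    """
    rng = random.Random(seed)
    B = [0]                                   # step 1: first event is an immigrant
    for i in range(1, len(times)):            # i is the 0-based position
        if branched:                          # B-SR-ETAS: last immigrant so far
            ref = max(j for j in range(i) if B[j] == 0)
        else:                                 # F-SR-ETAS / ETAS: previous event
            ref = i - 1
        weights = [mu_fn(times[i] - times[ref])]
        weights += [h(times[i] - times[j]) * iota(mags[j]) for j in range(i)]
        # Index 0 means immigrant; index j + 1 means parent with 1-based label j + 1.
        B.append(rng.choices(range(i + 1), weights=weights)[0])   # step 2
    return B                                  # step 3
```

In the branched case the reference time depends on the part of *B* already sampled, which is why the \(B_i\) are drawn in turn rather than independently.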

#### 5.2.2 Log-likelihood latent variable transformations

*t* is a sum of the contribution of \(\mu (t)\) from the background process, and a contribution of \(h(t-t_i)\) for each of the previous events \(t_i\). Let us define \(S_0\) to be the set of all immigrant events (conditional on the branching structure), and \(S_i\) to be the set of all events triggered by event \(t_i\). We write \(|S_i|\) to denote the number of events in each set. For a given branching structure, the likelihood function can then be rewritten as:

*B* is a full branching structure realisation, and the integrals are summed over all immigrant events except the first one since there is no waiting time for the first event. The permutation over \(\mu (s)\) is a permutation of the spot values of \(\mu (\cdot )\) at the triggering times of all immigrant events in the catalogue and \(\theta _{SR}\) represents the parameter set of the chosen SR distribution. Note that \(\mu (t)\) in this case is actually \(\mu (t|w_t)=\frac{f_w(w_t)}{1-F_w(w_t)}\), where \(w_t\) is the waiting time from the last immigration for the B-SR-ETAS model and the waiting time between every event for the F-SR-ETAS. The \(f_w(\cdot )\) and \(F_w(\cdot )\) are the corresponding PDF and CDF of the candidate immigration distribution (SR). A further approximation can be obtained based on the previously mentioned infinite time assumption, namely that the end time of the catalogue is very large and as such the integral over the Modified Omori law (\(h(\cdot )\)) for the range of values in the catalogue converges to 1, or in other words

#### 5.2.3 Choice of prior and proposal distributions

The (SR-ETAS) parameter estimates in this paper were obtained by running a latent variable MCMC for each of the 5 proposed models: ETAS, B-B-ETAS, F-B-ETAS, B-G-ETAS and F-G-ETAS. We use noninformative priors for all (SR-)ETAS parameters. For the standard ETAS model there exists a conjugate Gamma prior for the fixed ground intensity \(\mu \) (Ross 2018a). We used a flat Uniform prior for \(\theta _{\mathrm{SR}}\), \(\alpha \), \(\log (c)\), \(\log (p)\) and \(\log (K)\) with bounds \(\alpha \in [0,10], c \in [0,10], p \in [1,30], K \in [0,\infty )\), although more informative priors could be used if desired. For an infinite time catalogue, the overall productivity of the offspring decay, i.e. the mean number of offspring of every event, is *K*, so we might want to restrict it to be smaller than 1 for simulation purposes. Since in reality the time is not infinite, the overall productivity is no longer *K*. It is catalogue dependent, and as such we decided to use a higher upper bound for *K*. The support for the other parameters is greatly influenced by the potential multimodality, and the bounds were taken to be identical to those used in the Bayesian ETAS R package (Ross 2018a).

We use as a proposal distribution a Normal with standard deviation of 0.1 for all parameters that require Metropolis–Hastings updates. The parameter chains for the New Madrid catalogue have an overall length of 15,000 after burn-in of 5, 100 and 100 for \(\theta _{\mathrm{SR}}\), \(\{K, \alpha \}\) and \(\{c, p\}\), respectively. The branching structure was sampled from its conditional posterior at every iteration. The North California catalogue is much larger, so we updated the branching structure less frequently, at every 20 iterations of the Gibbs sampler; overall, 12,000 parameter sets were obtained after burn-in of 4, 100 and 20 for \(\theta _{\mathrm{SR}}\), \(\{K, \alpha \}\) and \(\{c, p\}\), respectively.

## 6 Model comparison: diagnostic tests

In order to compare the performance of the two SR-ETAS models to the standard ETAS model for the purpose of modelling real earthquake data, we require model comparison metrics. In this section, we discuss the various tests which we will use for the comparison.

### 6.1 Bayesian information criterion (BIC)

*d* is the number of free model parameters, i.e. \(d=|\theta |\), \(l(\theta )\) is the log-likelihood value evaluated at the MLE \({\hat{\theta }}\) and *n* is the number of observations. The best model is associated with the lowest value of the BIC coefficient.
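As a hedged worked example, the sketch below computes a BIC on the scale \(-l({\hat{\theta }}) + \tfrac{d}{2}\log n\). This scaling is an inference from the reported goodness-of-fit tables rather than something stated in the surviving text, and it is half of the more common \(-2l + d\log n\) convention; the ranking of models is the same under either. The function name is hypothetical.

```python
import math


def bic(log_lik, d, n):
    """BIC on the assumed scale -l + (d / 2) * log(n); lower is better."""
    return -log_lik + 0.5 * d * math.log(n)


# North California catalogue: n = 3442 events.
bic_etas = bic(-109.32, 5, 3442)
bic_bb = bic(-64.83, 6, 3442)
```

With the North California log-likelihoods, this gives values that round to the 129.68 (ETAS) and 89.26 (B-B-ETAS) reported for that catalogue.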

#### 6.1.1 Deviance information criterion (DIC)

### 6.2 Time rescaling residuals

#### 6.2.1 Cramér-Von Mises test

*w*(*x*) is a weight function which is assumed to be equal to 1 in the standard CVM test.

#### 6.2.2 Ljung–Box test

*m* autocorrelations of the residuals. The test statistic is:

*n* is the length of the data, \({\hat{\rho }}_k\) is the estimated autocorrelation at the lag of interest *k* with respect to the number of lags (*m*) that are taken into account. The choice of an appropriate number of lags *m* is critical for obtaining adequate test results (Hyndman 2014). Given that the data is expected to be nonseasonal and always have more than 200 observations, we always use 10 lags, i.e. \(m=10\).
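For reference, the standard form of the Ljung–Box statistic is \(Q = n(n+2)\sum _{k=1}^{m} {\hat{\rho }}_k^2/(n-k)\), which is compared against a chi-squared distribution. The sketch below computes *Q* only (no p-value) from a sequence of residuals; the function name is hypothetical, and the series is assumed to be non-constant and longer than *m*.

```python
def ljung_box(x, m=10):
    """Ljung-Box Q statistic using the first m sample autocorrelations."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)              # lag-0 sum of squares (must be > 0)
    q = 0.0
    for k in range(1, m + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k))
        rho_k = ck / c0                       # sample autocorrelation at lag k
        q += rho_k * rho_k / (n - k)
    return n * (n + 2) * q


# A perfectly alternating series has lag-1 autocorrelation of -0.99 at n = 100.
alternating = [1.0, -1.0] * 50
q_lag1 = ljung_box(alternating, m=1)
```

For the alternating example, \({\hat{\rho }}_1=-0.99\) and the statistic equals \(100 \times 102 \times 0.9801/99 = 100.98\), a very large value that signals strong serial dependence.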

#### 6.2.3 Engle Russell excess dispersion test

Table 1 Goodness-of-fit summary for New Madrid: ETAS, BPT and Gamma-based SR-ETAS

| | ETAS | F-G-ETAS | B-G-ETAS | F-B-ETAS | B-B-ETAS |
|---|---|---|---|---|---|
| Log-likelihood | \(-260.75\) | \(-256.44\) | \(-250.13^\mathrm{a}\) | \(-251.28\) | \(-245.01^\mathrm{a}\) |
| Number of parameters | 5 | 6 | 6 | 6 | 6 |
| BIC | 275.05 | 273.23 | \(-267.32^\mathrm{a}\) | 268.47 | \(-262.20^\mathrm{a}\) |
| DIC | 519.82 | 515.47 | 484.28\(^\mathrm{a}\) | 507.60 | 483.80\(^\mathrm{a}\) |

## 7 Applications

In this section, we discuss and compare the model fit across ETAS-based models on two seismic catalogues of interest. The first is the New Madrid catalogue, which is much smaller but of great interest to the underwriting community, while the second, the North California catalogue, is denser and should behave similarly to a typical single fault catalogue.

### 7.1 New Madrid seismic sequence

We first compare the performance of the ETAS and SR-ETAS models on the catalogue of New Madrid earthquakes obtained from The University of Memphis website http://www.memphis.edu/ceri/seismic/catalog.php. This catalogue starts on 29/06/1974 and ends on 23/02/2017. Only earthquakes of magnitude greater than 3 are considered since smaller ones are typically considered harmless. The resulting catalogue contains 308 events. We fit the ETAS model and BPT and Gamma-based SR-ETAS models to this catalogue.

Figure 2 shows how the sequence of log-likelihoods for each model evolves over each iteration of the Gibbs sampler (after convergence). There is a clear difference in overall fit between the 5 models. Moreover, the overall mixing for branched-SR models is greater and relatively more symmetric. Figure 3 plots the posterior distribution of the model parameters for the B-B-ETAS, which is the most difficult model to estimate due to the need to estimate the unknown branching structure. The posterior distributions for the parameters in the other models are similar. As expected, the obtained parameter distributions are smooth, symmetric and close to bell-shaped.

The goodness-of-fit and model comparison results are shown in Table 1. Amongst all ETAS-based models it appears that SR-ETAS models are superior to the standard ETAS model according to both BIC and DIC. BPT-based models are superior to their corresponding Gamma alternatives and B-SR-ETAS models are superior to the F-SR-ETAS. The best of all the examined models is evidently the B-B-ETAS.

Figure 4 presents the informal diagnostic plots of the time residuals. On the left are the raw time residuals for all 5 models versus a diagonal line. Ideally, these should overlap. The overall pattern is very similar for all models. They all experience a bias towards the middle of the catalogue which might indicate a potential minor nonstationarity in the data (Kumazawa and Ogata 2013). The right part of Fig. 4 shows a Q–Q plot for the residuals of all 5 models versus Exponential(1) distribution. All 5 models behave similarly, with minor spread from the expected results for large quantiles.

The time rescaling diagnostic tests conclude that the CVM and ER tests were passed by all 5 models at the 5% significance level. The LB test is passed only by F-B-ETAS at 5% significance level, while all other models pass it at 1% significance level. Thus, there might be minor dependence in the residuals which we believe is negligible.

### 7.2 North California seismic sequence

The previous analysis was repeated using a North California seismic sequence. The historical catalogue of earthquake events can be obtained from http://www.ncedc.org/ncedc/catalog-search.html. We took into account all events from 01/01/1987 until 31/12/2015, with magnitude of completeness of 3.5. This created a catalogue consisting of 3442 events.

The full sequences of the log-likelihood calculated using the Gibbs sampler are shown in Fig. 5. As before, all the SR-ETAS models appear to give substantial improvements over the basic ETAS model. Again we decided to report the posterior densities only for B-B-ETAS, which are shown in Fig. 6. The heavy tails that appeared for the New Madrid catalogue are not present. Overall the shapes of all 6 parameters appear to be roughly symmetric. The goodness-of-fit results of the un-simplified (finite time) runs are shown in Table 2.

According to the BIC, the F-B-ETAS is the worst model while all other SR models are slightly better than the standard ETAS. Due to the larger number of observations in this catalogue, we decided to examine the DICalt, which depends on the previously defined \(p_{\mathrm{DICalt}}\). According to it, the branched-SR models provide a considerable performance improvement over the full models, while the standard ETAS performs the worst. For this catalogue BPT-based models are no longer superior to their corresponding Gamma alternatives. Interestingly, F-B-ETAS is now the worst model under the BIC. This is probably because Gamma-SR-ETAS models are guaranteed to fit at least as well as the ETAS model: the Exponential inter-arrival time distribution used in the standard ETAS is nested inside the Gamma distribution, so they can reduce to it. It is clear that B-B-ETAS has a great advantage over all other models.

Table 2 Goodness-of-fit summary for North California: ETAS, BPT and Gamma-based SR-ETAS

| | ETAS | F-G-ETAS | B-G-ETAS | F-B-ETAS | B-B-ETAS |
|---|---|---|---|---|---|
| Log-likelihood | \(-109.32\) | \(-103.68\) | \(-90.61^\mathrm{a}\) | \(-110.58\) | \(-64.83^\mathrm{a}\) |
| Number of parameters | 5 | 6 | 6 | 6 | 6 |
| BIC | 129.68 | 128.11 | 115.04\(^\mathrm{a}\) | 135.01 | 89.26\(^\mathrm{a}\) |
| DICalt | 1108.51 | 799.57 | 725.33\(^\mathrm{a}\) | 553.48 | 338.07\(^\mathrm{a}\) |

## 8 Conclusion

The ETAS model has proved to be one of the most widely used tools for modelling seismic activity in terms of both capturing specific features of interest and forecasting future events. Its estimation can be considered challenging due to identifiability issues. In this work, we introduced the concept of temporally variable ground intensity based on stress release modelling. In this, we specified two families of SR-ETAS model that depend on either the occurrence time of the previous event in the sequence (Full-SR-ETAS), or the elapsed time from the last immigrant (main) event (Branched-SR-ETAS). Our experimental results suggest that these models capture observed features of real earthquake catalogues that the standard ETAS model does not.

In particular, the features captured relate to the crustal strain budget. So far, we have examined a single-fault, nonspatial setting of the kind typically used by seismologists for the analysis of seismic fault activity. All concepts are directly applicable to the spatial extension of ETAS. However, there are many alternative formulations of the spatial component(s) of the standard ETAS, and they differ considerably from one another, which makes a direct comparison of the introduced family of models impractical. Overall, the nonspatial alternative introduced in this paper will provide excellent results as long as there are no strong nonlinear or nonuniform patterns in the spatial distribution of the earthquakes along the fault of interest.

All methods are introduced for a general distribution, as such the SR-ETAS family can grow very quickly to accommodate the modelling needs of any sort of data. Direct application to stock daily changes, insurance claims, fraud and terrorist threats is feasible.

## Notes

### Acknowledgements

We would like to express sincere appreciation for the detailed comments of the two reviewers. Their immense contribution helped us to address very critical issues, which resulted in substantial improvements to the paper.

## References

- Brown, E.N., Barbieri, R., Ventura, V., Kass, R.E., Frank, L.M.: The time-rescaling theorem and its application to neural spike train data analysis. Neural Comput. **14**(2), 325–346 (2002)
- Chen, C.-H., Wang, J.-P., Wu, Y.-M., Chan, C.-H., Chang, C.-H.: A study of earthquake inter-occurrence times distribution models in Taiwan. Nat. Hazards **69**(3), 1335–1350 (2013)
- Chen, F., Stindl, T.: Direct likelihood evaluation for the renewal Hawkes process. J. Comput. Graph. Stat. **27**(1), 119–131 (2018)
- Chib, S., Greenberg, E.: Understanding the Metropolis–Hastings algorithm. Am. Stat. **49**(4), 327–335 (1995)
- Ellsworth, W.L., Matthews, M.V., Nadeau, R.M., Nishenko, S.P., Reasenberg, P.A., Simpson, R.W.: A physically based earthquake recurrence model for estimation of long-term earthquake probabilities. US Geol. Surv. **522**, 23 (1999)
- Engle, R.F., Russell, J.R.: Autoregressive conditional duration: a new model for irregularly spaced transaction data. Econometrica **66**, 1127–1162 (1998)
- Filimonov, V., Sornette, D.: Apparent criticality and calibration issues in the Hawkes self-excited point process model: application to high-frequency financial data. Quant. Finance **15**(8), 1293–1314 (2015)
- Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.: Bayesian Data Analysis, vol. 2. CRC Press, Boca Raton (2014)
- Hamra, G., MacLehose, R., Richardson, D.: Markov chain Monte Carlo: an introduction for epidemiologists. Int. J. Epidemiol. **42**(2), 627–634 (2013)
- Hyndman, R.J.: Thoughts on the Ljung–Box test. (2014) https://robjhyndman.com/hyndsight/ljung-box-test/. Accessed 4 Aug 2017
- Imoto, M.: Application of the stress release model to the Nankai earthquake sequence, southwest Japan. Tectonophysics **338**(3), 287–295 (2001)
- Isham, V., Westcott, M.: A self-correcting point process. Stoch. Process. Appl. **8**(3), 335–347 (1979)
- Johnson, N.L., Kemp, A.W., Kotz, S.: Univariate Discrete Distributions, vol. 444. Wiley, Hoboken (2005)
- Kagan, Y., Knopoff, L.: A stochastic model of earthquake occurrence. In: Proceedings of the Eighth International Conference on Earthquake Engineering, vol. **1**, pp. 295–302 (1984)
- Kumazawa, T., Ogata, Y.: Quantitative description of induced seismic activity before and after the 2011 Tohoku–Oki earthquake by nonstationary ETAS models. J. Geophys. Res. Solid Earth **118**(12), 6165–6182 (2013)
- Lallouache, M., Challet, D.: The limits of statistical significance of Hawkes processes fitted to financial data. Quant. Finance **16**(1), 1–11 (2016)
- Liu, J., Vere-Jones, D., Ma, L., Shi, Y.-L., Zhuang, J.-C.: The principle of coupled stress release model and its application. Acta Seismologica Sinica **11**(3), 273–281 (1998)
- Ljung, G.M., Box, G.E.: On a measure of lack of fit in time series models. Biometrika **65**(2), 297–303 (1978)
- Lu, C., Harte, D., Bebbington, M.: A linked stress release model for historical Japanese earthquakes: coupling among major seismic regions. Earth Planets Space **51**(9), 907–916 (1999)
- Marzocchi, W., Taroni, M.: Some thoughts on declustering in probabilistic seismic-hazard analysis. Bull. Seismol. Soc. Am. **104**, 1838–1845 (2014)
- Matthews, M.V., Ellsworth, W.L., Reasenberg, P.A.: A Brownian model for recurrent earthquakes. Bull. Seismol. Soc. Am. **92**(6), 2233–2250 (2002)
- Oakes, D.: The Markovian self-exciting process. J. Appl. Probab. **12**(1), 69–77 (1975)
- Ogata, Y.: Statistical models for earthquake occurrences and residual analysis for point processes. J. Am. Stat. Assoc. **83**(401), 9–27 (1988)
- Ogata, Y.: Space-time point-process models for earthquake occurrences. Ann. Inst. Stat. Math. **50**(2), 379–402 (1998)
- Ordaz, M., Arroyo, D.: On uncertainties in probabilistic seismic hazard analysis. Earthq. Spectra **32**(3), 1405–1418 (2016)
- Rasmussen, J.G.: Bayesian inference for Hawkes processes. Methodol. Comput. Appl. Probab. **15**(3), 623–642 (2013)
- Reid, H.F.: The Mechanics of the Earthquake, vol. 2. Carnegie Institution of Washington, Washington (1910)
- Ross, G.: Bayesian estimation of the ETAS model for earthquake occurrences. Preprint (2018a)
- Ross, G.: Nonparametric Bayesian inference for the Hawkes process with seasonal event data. Preprint (2018b)
- Rotondi, R., Varini, E.: Bayesian analysis of marked stress release models for time-dependent hazard assessment in the western Gulf of Corinth. Tectonophysics **423**(1), 107–113 (2006)
- Rotondi, R., Varini, E.: Bayesian inference of stress release models applied to some Italian seismogenic zones. Geophys. J. Int. **169**(1), 301–314 (2007)
- Schwarz, G., et al.: Estimating the dimension of a model. Ann. Stat. **6**(2), 461–464 (1978)
- Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Linde, A.: The deviance information criterion: 12 years on. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **76**(3), 485–493 (2014)
- Stephens, M.A.: Use of the Kolmogorov-Smirnov, Cramér-Von Mises and related statistics without extensive tables. J. R. Stat. Soc. Ser. B (Methodol.) **32**, 115–122 (1970)
- Tahernia, N., Khodabin, M., Mirzaei, N.: Non-Poisson probabilistic seismic hazard assessment. Arab. J. Geosci. **7**(8), 3259–3269 (2014)
- Varini, E., Rotondi, R.: Probability distribution of the waiting time in the stress release model: the Gompertz distribution. Environ. Ecol. Stat. **22**(3), 493–511 (2015)
- Veen, A., Schoenberg, F.P.: Estimation of space-time branching process models in seismology using an EM-type algorithm. J. Am. Stat. Assoc. **103**(482), 614–624 (2008)
- Vere-Jones, D.: Earthquake prediction-a statistician’s view. J. Phys. Earth **26**(2), 129–146 (1978)
- Wang, J.-H., Chen, K.-C., Lee, S.-J., Huang, W.-G., Wu, Y.-H., Leu, P.-L.: The frequency distribution of inter-event times of \(m \ge 3\) earthquakes in the Taipei metropolitan area: 1973–2010. Terr. Atmos. Ocean. Sci. **23**(3), 269–281 (2012)
- Wheatley, S.: Extending the Hawkes Process, A General Outlier Test, Case Studies in Extreme Risk. Ph.D. Thesis, ETH Zurich, Zurich (2016)
- Wheatley, S.: Personal communication (2017)
- Wheatley, S., Filimonov, V., Sornette, D.: The Hawkes process with renewal immigration and its estimation with an EM algorithm. Comput. Stat. Data Anal. **94**, 120–135 (2016)
- Xiaogu, Z., Vere-Jones, D.: Further applications of the stochastic stress release model to historical earthquake data. Tectonophysics **229**(1–2), 101–121 (1994)
- Yang, W.-Z., Vere-Jones, D., Ma, L., Liu, J.: A method for locating the critical region of a future earthquake using the critical earthquake concept. Earthquake **20**(4), 28–38 (2000)
- Zhu, S.-B., Shi, Y.-L.: Improved stress release model: application to the study of earthquake prediction in Taiwan area. Acta Seismologica Sinica **15**(2), 171–178 (2002)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.