1 Introduction

Strong earthquakes are usually observed to cause a pronounced spatio-temporal pattern of aftershocks. More precisely, according to the Omori-Utsu Law (Utsu et al. 1995), the temporal aftershock rate is subject to a power law decrease with time \(t-t_{main}\) after the main triggering event, that is,

$$\begin{aligned} g(t-t_{main}) = (t-t_{main}+c)^{-p} \end{aligned}$$
(1)

with the delay parameter \(c>0\) (usually a few minutes to hours) and the exponent \(p\) (usually between 0.8 and 1.2). This means that the temporal pattern of aftershocks is dominated by events occurring within a short time after the mainshock. Figure 1a demonstrates this temporal behavior for the Ridgecrest sequence in California, which produced an M6.4 foreshock on July 4, 2019, followed by an M7.1 mainshock within 34 hours on July 6, 2019.
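To make the decay in Eq. (1) concrete, the following minimal Python sketch evaluates the unnormalized Omori-Utsu rate at two elapsed times. The function name `omori_rate` and the values \(c = 0.01\) days and \(p = 1.1\) are illustrative assumptions, not estimates from this study.

```python
def omori_rate(dt, c, p):
    """Unnormalized Omori-Utsu rate (dt + c)^(-p) of Eq. (1); dt is the time since the mainshock."""
    if dt < 0:
        raise ValueError("dt must be non-negative")
    return (dt + c) ** (-p)


# Illustrative (assumed) parameters, with times measured in days.
c, p = 0.01, 1.1
rate_early = omori_rate(0.01, c, p)  # roughly 15 minutes after the mainshock
rate_late = omori_rate(10.0, c, p)   # ten days after the mainshock

# Ratio of early to late rate; equals ((10.0 + c) / (0.01 + c)) ** p.
ratio = rate_early / rate_late
print(f"early/late rate ratio: {ratio:.1f}")
```

Under these assumed values the rate drops by several hundred times within ten days, which is why early aftershocks dominate the sequence.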

Fig. 1

a Magnitudes versus event times of Ridgecrest Mw6.4 (red dots) and Mw7.1 (blue dots) aftershock sequences. Event times are denoted in days before/after Mw7.1 mainshock, the dashed black line represents the time origin (M7.1 event time). Light blue and light red dots mark aftershocks with magnitudes larger than 5. Yellow pentagram symbolizes the Mw6.4 foreshock, and yellow hexagram marks the Mw7.1 mainshock. b Aftershock locations of the Ridgecrest Mw6.4 and Mw7.1 sequences. Legend as in a. c Magnitudes versus logarithmic event times of Ridgecrest Mw6.4 sequence. The dashed red line represents a manually fitted estimate of the empirical completeness function \(M_c(t)\). d Magnitudes versus logarithmic event times of Ridgecrest Mw7.1 sequence. The dashed red line represents a manually fitted estimate of the empirical completeness function \(M_c(t)\)

The observed spatial patterns of aftershock sequences stem from their tendency to occur on or close to the mainshock rupture plane (Marsan and Lengliné 2008). The larger the length-to-width ratio of this plane, the more elongate the typical aftershock region becomes. In addition, a higher dip angle reduces the width of the 3D-to-2D projection of the rupture plane onto the earth’s surface and therefore results in a scatter of two-dimensional aftershock epicenters that can be increasingly well approximated by a line segment.

The prevailing continental tectonic regime in southern California with typically steep, strike-slip faulting favors such elongated aftershock patterns in this region. With the exception of the M6.7 1994 Northridge earthquake, all of the most prominent mainshock-aftershock sequences of the last 40 years (M6.6 1987 Superstition Hills, M7.3 1992 Landers, M7.1 1999 Hector Mine, M7.2 2010 Baja California, M7.1 2019 Ridgecrest) demonstrate distinct linearly elongate scattering of aftershock locations (Hainzl 2021).

In this context, the Ridgecrest sequence is a special case as the M6.4 foreshock simultaneously ruptured two almost orthogonal faults, leading to a double pattern of separate linearly elongate aftershock clouds (Marsan and Ross 2021). Fig. 1b shows that the triggering M6.4 event (yellow pentagram) is located close to the intersection of the two ruptured faults. In contrast, the M7.1 mainshock (yellow hexagram) ruptured only one fault which appears to be the extension of one of the faults activated by the foreshock.

Analyzing and forecasting clustered seismicity is an established discipline in seismological research. Its goal is to understand the evolution of large aftershock sequences and to predict their size, largest aftershock magnitude, spatial distribution etc. A prominent approach to model clustered seismicity is the so-called Epidemic Type Aftershock Sequence (ETAS) model, which describes earthquake records as a superposition of independent background seismicity and triggered earthquake sequences (Ogata 1988, 1998). The earthquake triggering component is designed in terms of a branching process and characterized by the triad of (1) trigger-magnitude dependent aftershock productivity, (2) a temporal distribution of aftershock times typically derived from the Omori Law (see Eq. 1), and (3) a usually isotropic spatial distribution of aftershock locations (e.g. Zhuang et al. 2002; Jalilian 2019). In particular, the aftershock productivity (i.e. the expected number of offspring) for a trigger event with magnitude m is

$$\begin{aligned} k_{A,\alpha }(m) = A\, \exp (\alpha \, (m-M_c)), \end{aligned}$$
(2)

where parameters \(A > 0\) and \(\alpha > 0\) control the exponential growth of the trigger potential and \(M_c\) is the cut-off magnitude of the analyzed earthquake catalog.
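The exponential scaling in Eq. (2) can be illustrated with a short Python sketch. The function name `productivity` and the values of \(A\), \(\alpha\) and \(M_c\) used here are assumptions chosen for illustration only.

```python
import math


def productivity(m, A, alpha, m_c):
    """Expected number of direct aftershocks of a magnitude-m trigger (Eq. 2)."""
    return A * math.exp(alpha * (m - m_c))


# Assumed values for illustration only.
A, m_c = 0.01, 2.95
for alpha in (1.0, 2.0):
    ratio = productivity(7.1, A, alpha, m_c) / productivity(6.4, A, alpha, m_c)
    # The ratio depends only on alpha and the magnitude difference: exp(alpha * 0.7).
    print(f"alpha={alpha}: M7.1 / M6.4 productivity ratio = {ratio:.2f}")
```

The comparison shows why an underestimated \(\alpha\) matters: it shrinks the modeled productivity gap between strong and weak triggers.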

Despite generally producing successful and insightful estimation and forecast results, ETAS models are known to be limited by a number of potential biases. In this article, we present an approach that combines solutions for three main shortcomings of the conventional ETAS model: (1) the isotropic spatial aftershock distribution, (2) the infinite extent of the spatial kernel, and (3) the short-term incompleteness of earthquake records after strong triggering events.

1.1 Bias 1: isotropic spatial distribution

The common assumption in ETAS models is that spatial aftershock locations are distributed isotropically around the triggering event. This assumption is named as a shortcoming in many publications because it conflicts with the abovementioned observation that aftershocks tend to occur close to the (elongate) rupture plane of the triggering event (Ogata 1998, 2011; Ogata and Zhuang 2006; Hainzl et al. 2008, 2013; Seif et al. 2017; Zhang et al. 2018). The assumption of isotropy is reasonably valid for weak earthquakes with small rupture extensions, but becomes problematic for larger magnitudes (see, e.g., the spatial pattern of the Ridgecrest sequence in Fig. 1b). It has been shown that inadequate spatial models can lead to an underestimation of the productivity parameter \(\alpha\) (Eq. 2) because the numerous small events take over the role of mimicking the “true” anisotropic distribution (Hainzl et al. 2008, 2013; Grimm et al. 2021).

1.2 Bias 2: infinite spatial extent

Under the common definition of an infinite spatial kernel, aftershock triggering is disproportionately associated with the more numerous small events, which can mimic anisotropic event alignments more flexibly than the few strong mainshocks. This can lead to an unrealistically far-reaching trigger impact of small-magnitude events and to a substantial underestimation of the direct aftershock productivity of strong events, resulting in a smoothing of temporal event distributions (Grimm et al. 2021).

1.3 Bias 3: short-term aftershock incompleteness (STAI)

Strong earthquakes typically cause incomplete aftershock records immediately after their occurrence, mainly due to an overlap of coda waves (Hainzl 2016a; de Arcangelis et al. 2018). Figures 1c and 1d confirm this phenomenon for the aftershock sequences of the M6.4 and M7.1 Ridgecrest events, respectively. Records of smaller aftershocks are apparently missing in the first minutes to hours, which obscures the power-law decay of event rates expected from the Omori-Utsu law (Eq. 1). The short-term incomplete event records can therefore largely hide both the “true” Omori Law decay (Eq. 1) and the “true” aftershock productivity of the trigger event (Eq. 2), and lead to an overestimation of the Omori parameter \(c\) and an underestimation of the productivity parameter \(\alpha\) (Hainzl 2016b, 2021; Page et al. 2016; Seif et al. 2017).

Data-driven uncertainties of event locations and cut-off magnitude as well as the assumption of a time-invariant seismic background may lead to further inaccuracies in the parameter estimation (Harte 2013, 2016; Seif et al. 2017). However, they can be neglected in our study because they are either expected to be small in southern California datasets (e.g. location and magnitude uncertainty) or do not apply in an isolated sequence analysis (background misspecification).

1.4 Scope of this article

In this article, we combine an ETAS approach accounting for short-term incomplete event records with the application of a generalized, anisotropic spatial model that restricts the spatial kernel to the local surrounding of the trigger source. We demonstrate the functionality and superiority of our approaches over the conventional, isotropic ETAS model by means of forecasting experiments for the Ridgecrest sequence.

We utilize the generalized anisotropic and locally restricted spatial kernel suggested by Grimm et al. (2021), which assumes uniform trigger density along an estimated rupture line segment, with power-law decay to the sides and at the end points of the rupture. Zhang et al. (2018) pursued an even more detailed approach, which assumed constant trigger rate in the entire rupture plane, with power-law decay outside of it. Different versions of elliptic Gaussian distributions were introduced and discussed by Ogata (1998, 2011) and Ogata and Zhuang (2006). The latter approaches successfully modeled spatial aftershock patterns; however, they require a new set of parameters and are therefore not flexibly combinable with the conventional, isotropic functionality. In contrast, the kernel of Grimm et al. (2021) represents a generalization of the isotropic function and therefore allows simultaneous anisotropic modeling of some events (e.g. above a certain magnitude threshold) and isotropic modeling of the rest. In order to address the abovementioned particularity of the M6.4 Ridgecrest foreshock, which ruptured two almost orthogonal faults, we further generalize the approach by allowing a spatial kernel composed of a weighted superposition of two distinct rupture line segments.

Additionally, we account for STAI by applying an ETAS model version that incorporates rate-dependent incompleteness of event records. Recognizing alternative approaches that will be briefly described in the Methods section, we choose the ETAS-Incomplete (ETASI) model as recently suggested by Hainzl (2021). For simplicity and to sharpen its focus on the incompleteness detection, Hainzl (2021) neglected the space dimension in his model. As this article combines the ETASI time model of Hainzl (2021) with an adequate, anisotropic spatial kernel, it can be seen as the space-including extension of the latter. The focus of this study, however, is on the benefit of modeling the spatial aftershock distribution by a generalized anisotropic spatial kernel, rather than the benefit of the ETASI model.

This article is structured as follows. In the Methods section, we introduce the conventional ETAS model and its ETASI extension and define the anisotropic, locally restricted spatial kernel. This section includes a description of the estimation procedures for strikes and rupture positions and the spatial integral over anisotropic kernels. Next, the Application section explains the three forecasting experiments, introducing the data and time-space windows for the parameter estimation and forward simulations. Finally, we interpret and discuss our forecasting results and draw our conclusions. Source codes for model estimation and simulation are freely available in a GitHub repository (see Data and resources).

2 Methods

The ETAS model, first introduced by Ogata (1988, 1998), is a branching-tree type model which describes clustered earthquake occurrences by consecutive triggering evolving over multiple parent-child generations (i.e. allowing secondary aftershocks). The triggered seismicity is overlaying a time-invariant background process.

In this section, we will first introduce the conventional, isotropic ETAS model approach. Next, we will extend the model to obtain a time-space version of the ETASI model suggested by Hainzl (2021), which incorporates STAI into the parameter estimation. Notations are mostly consistent with Hainzl (2021). We will then define the anisotropic generalization of the spatial kernel, which is compatible with both the ETAS and ETASI model, and introduce the local restriction of the kernel. Finally, we explain the fitting algorithm for the strike angle and rupture position of anisotropic trigger events and the methods for spatial integral estimation.

2.1 ETAS-model

In the conventional ETAS model approach, the occurrence rate of an earthquake with magnitude \(m\), occurring at time \(t\) and at location \((x,y)\), is modeled by an inhomogeneous Poisson process with a time-space-magnitude dependent intensity function

$$\begin{aligned} \lambda (t,x,y,m) = f_0(m)\, R_0(t,x,y) \end{aligned}$$

where

$$\begin{aligned} f_0(m) = \beta \, e^{-\beta (m-M_c)} \end{aligned}$$
(3)

is the “true” probability density function (pdf) of the frequency-magnitude distribution (FMD) with Gutenberg-Richter parameter \(b = \beta / \ln (10)\) (Gutenberg and Richter 1944), and

$$\begin{aligned} \begin{aligned}&R_0(t,x,y) = \mu \, u(x,y) \\&\quad + \sum _{i:t_i<t} k_{A,\alpha }(m_i)\, g_{c,p}(t-t_i)\, h_{D,\gamma ,q}(r_i(x,y),m_i,l_i) \end{aligned} \end{aligned}$$
(4)

is the “true” occurrence rate of events with magnitude \(m \ge M_c\), at time \(t\) and at location \((x,y)\). The “true” event rate is modeled by a superposition of the time-invariant seismic background rate \(\mu \, u(x,y)\) with parameter \(\mu > 0\) and a sum of the trigger rate contributions of all events i that occurred prior to the current time \(t\). The functions \(k_{A,\alpha }(m_i)\) and \(g_{c,p}(t-t_i)\) denote the aftershock productivity and Omori-Utsu Law decay functions as defined in Eqs. (2) and (1), respectively. The function \(h_{D,\gamma ,q}(r_i(x,y),m_i,l_i)\) models the distribution of aftershock locations triggered by event i, with parameters \(D, \gamma\) and \(q\). The precise inputs and shape of the spatial kernel are discussed later.

The term “true” means that the (physical) relationships are expected to be observed with perfect earthquake records. The main assumption of the conventional ETAS model is that STAI does not significantly distort the “true” magnitude distribution and the “true” event rates.

2.2 ETASI model

2.2.1 Rate-dependent incompleteness

The concept of rate-dependent earthquake record incompleteness assumes that the “true” relationships underlying \(f_0(m)\) and \(R_0(t,x,y)\) are not accurately identifiable in available earthquake catalogs because especially events with small magnitudes are detected with lower probability in periods of high seismic activity. In these periods, the detection ability is limited typically due to overlapping seismic waves (Hainzl 2016a, 2021).

Fitting the “true” relationships to incomplete data records may therefore lead to significantly biased parameter estimates (Hainzl 2016a, b; Page et al. 2016; Seif et al. 2017; Hainzl 2021).

In recent years, there has been growing research interest in how to account for short-term incomplete datasets. For instance, Zhuang et al. (2017) developed a replenishment algorithm to fill up likely incomplete time intervals with simulated events, in order to obtain artificially complete pseudo-records. Other authors, notably Omi et al. (2013, 2014), Lippiello et al. (2016), de Arcangelis et al. (2018), Mizrahi et al. (2021) and Hainzl (2021), incorporated STAI directly into the ETAS model fit. A rather simple workaround is to remove likely incomplete time periods from the fitted time interval using empirical completeness functions, as performed in Hainzl et al. (2013) and Grimm et al. (2021). A comprehensive discussion and comparison of various ETASI models is beyond the scope of this article. The ETASI model proposed by Hainzl (2021) was chosen for rather practical reasons, mainly because of its compatibility with existing code.

2.2.2 Model formulation

The working assumption of the ETASI model described here is that an earthquake with magnitude m and occurring at time t can only be detected by the operating seismic network if no event of equal or larger magnitude occurred within the blind time \([t-T_b,t]\), where \(T_b\) is typically in the range of some seconds to few minutes (Hainzl 2021). Similar assumptions have formerly been formulated by Lippiello et al. (2016), de Arcangelis et al. (2018) and Hainzl (2016a).

Let \(N_0(t)\) be the expected number of events occurring within the entire spatial window S during blind time \([t-T_b, t]\),

$$\begin{aligned}&N_0(t) = \int _{t-T_b}^{t} \iint _S R_0(t,x,y) dx\, dy\, dt \approx T_b\, \\&\quad \iint _S R_0(t,x,y)\, dx\, dy, \end{aligned}$$

where the approximation holds under the assumption that event rates are approximately constant during the blind time (Hainzl 2021). According to the “true” FMD (Eq. 3), each of the \(N_0(t)\) events has a probability of \(e^{-\beta \, (m-M_c)}\) to exceed magnitude m. We define the detection probability \(p_d(m,t)\) of an earthquake at time t with magnitude m as the probability that no equal or larger event occurred during the blind time \(T_b\), i.e.

$$\begin{aligned} p_d(m,t) = e^{-N_0(t)\, e^{-\beta \, (m-M_c)}}. \end{aligned}$$

Following the derivations in Hainzl (2016b, 2021), we obtain the “apparent”, incompleteness-biased FMD

$$\begin{aligned} \begin{aligned} f(m,t) :&= f_0(m)\, N_0(t)\, \frac{p_d(m,t)}{1-e^{-N_0(t)}} \end{aligned} \end{aligned}$$

and the “apparent” event rate

$$\begin{aligned} R(t,x,y) := \frac{R_0(t,x,y)}{N_0(t)}\, \left( 1 - e^{-N_0(t)}\right) . \end{aligned}$$

The term “apparent” signals that the functions f and R do not represent the “true”, but the observable relationships that are possibly distorted by rate-dependent record incompleteness. In periods of high seismic activity, the “apparent” FMD exhibits a larger relative frequency of strong events (because they are more likely to be detected) and an event rate lowered by detection capacity. We obtain the ETASI intensity function

$$\begin{aligned} \begin{aligned}&\lambda (t,x,y,m) = f(m,t)\, R(t,x,y)\\&\quad = f_0(m)\, R_0(t,x,y)\, p_d(m,t) \end{aligned} \end{aligned}$$

The two underlying, simplifying assumptions in the ETASI model are that (1) the blind time \(T_b\) is magnitude-independent, which Hainzl (2021) justifies by typically shorter source durations than travel times of coda waves, and (2) that the seismic network is equally occupied for blind time \(T_b\) by any event in the entire investigated spatial window. The second assumption is reasonable for a small spatial window, e.g. when analyzing an isolated sequence. When fitting the ETASI model over a larger region, it needs to be checked that relevant clusters do not evolve at the same time but at distinct locations as they would be assumed to simultaneously occupy the entire seismic network. A reasonable approach to prevent undesired biases is to choose a larger cut-off magnitude.
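The relations of this subsection can be checked numerically with a minimal Python sketch. The function names, the blind time of 60 s and the spatially integrated rate of 0.1 events per second are assumptions for illustration; \(N_0(t)\) is computed with the constant-rate approximation \(N_0 \approx T_b \iint _S R_0\, dx\, dy\).

```python
import math


def true_fmd(m, beta, m_c):
    """'True' Gutenberg-Richter density f_0(m) of Eq. (3)."""
    return beta * math.exp(-beta * (m - m_c))


def detection_probability(m, n0, beta, m_c):
    """p_d(m,t): probability that none of the n0 expected blind-time events reaches magnitude m."""
    return math.exp(-n0 * math.exp(-beta * (m - m_c)))


def apparent_fmd(m, n0, beta, m_c):
    """'Apparent' density f(m,t); requires n0 > 0."""
    p_d = detection_probability(m, n0, beta, m_c)
    return true_fmd(m, beta, m_c) * n0 * p_d / (-math.expm1(-n0))


def apparent_rate(r0, n0):
    """'Apparent' rate R = R_0 / N_0 * (1 - exp(-N_0)); requires n0 > 0."""
    return r0 / n0 * (-math.expm1(-n0))


# Assumed example: b = 1, blind time 60 s, spatially integrated true rate 0.1 events/s.
beta, m_c = math.log(10.0), 2.05
t_b, total_rate = 60.0, 0.1
n0 = t_b * total_rate  # N_0(t) under the constant-rate approximation

for m in (2.05, 3.05, 4.05):
    print(f"m={m}: detection probability {detection_probability(m, n0, beta, m_c):.4f}")
```

The product `apparent_fmd(...) * apparent_rate(...)` reproduces \(f_0(m)\, R_0\, p_d(m,t)\), which is the ETASI intensity stated above.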

2.2.3 Log-likelihood optimization

The parameter vector \(\theta = \{ \mu , A, \alpha , c, p, D, \gamma , q, \beta , T_b \}\) of the ETASI model is estimated by maximizing its log-likelihood function \(LL = LL_1 - LL_2\) with

$$\begin{aligned} \begin{aligned} LL_1&= \sum _{events~j} \ln \left( \, f_0(m_j)\, R_0(t_j,x_j,y_j)\, p_d(m_j,t_j) \,\right) , \\ LL_2&= \int _{M_c}^{\infty } \int _{T_1}^{T_2} \iint _{S} \lambda (t,x,y,m)\, dx\, dy\, dt\, dm \\&\approx \frac{T_2 - T_1}{T_b} - \frac{1}{T_b}\, \int _{T_1}^{T_2} e^{-T_b\, \iint _{S} R_0(t,x,y)\, dx\, dy} dt \end{aligned} \end{aligned}$$
(5)

where the sum in \(LL_1\) goes over all target events in the magnitude-time-space window \([M_c, \infty ) \times [T_1, T_2] \times S\) and \(LL_2\) integrates over this model space. In our study we optimized the parameter vector \(\theta\) using the gradient-based Davidon-Fletcher-Powell algorithm (Ogata 1998; Zhuang et al. 2002; Jalilian 2019).
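The approximation of \(LL_2\) in Eq. (5) reduces to a one-dimensional time integral once the spatially integrated rate is available. The sketch below evaluates it with the trapezoidal rule; the function name `ll2_etasi`, the callable `total_rate` (standing for \(\iint _S R_0(t,x,y)\, dx\, dy\)) and all numerical values are assumptions for illustration, not the implementation used in this study.

```python
import math


def ll2_etasi(total_rate, t1, t2, t_b, n_steps=1000):
    """Approximate LL_2 of Eq. (5) with the trapezoidal rule.

    total_rate: callable t -> spatially integrated 'true' rate at time t.
    """
    h = (t2 - t1) / n_steps
    vals = [math.exp(-t_b * total_rate(t1 + k * h)) for k in range(n_steps + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return (t2 - t1) / t_b - integral / t_b


# Assumed example: constant background plus an Omori-type decay, times in days.
def example_rate(t):
    return 2.0 + 500.0 * (t + 0.01) ** (-1.1)


t_b_days = 60.0 / 86400.0  # assumed blind time of 60 s
print(ll2_etasi(example_rate, 0.0, 10.0, t_b_days, n_steps=100000))
```

For a constant rate \(\Lambda\) the expression becomes \(\frac{T_2-T_1}{T_b}\,(1-e^{-T_b \Lambda })\), which tends to the conventional ETAS value \((T_2-T_1)\,\Lambda\) as \(T_b \rightarrow 0\).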

2.3 Generalized anisotropic spatial kernel

2.3.1 Conventional isotropic kernel

The spatial kernel \(h_{D,\gamma ,q}(r_i,m_i,l_i)\) in Eq. (4) models the 2D-distribution of aftershock locations. In conventional ETAS model approaches, the triggering event is assumed to be a point source, distributing its offspring isotropically around its epicenter. A classical definition of an isotropic kernel (see Ogata 1998; Grimm et al. 2021; Jalilian 2019) is

$$\begin{aligned}&h_{D,\gamma ,q}^{iso}(r_i(x,y),m_i) := \frac{q-1}{D\, \exp (\gamma (m_i-M_c))}\, \\&\quad \left( 1 + \frac{\pi \, r_i(x,y)^2}{D\, \exp (\gamma (m_i-M_c))} \right) ^{-q} \end{aligned}$$

where \(r_i(x,y)\) denotes the point-to-point distance between a potential aftershock location \((x,y)\) and the coordinates \((x_i,y_i)\) of the triggering event i, and \(m_i\) is the magnitude of the event i. The kernel is constrained by the parameters D and \(\gamma\) that control the magnitude-dependent width of the kernel, and parameter q that describes the power-law decay of the function with growing spatial distance.

2.3.2 Anisotropic generalization

Here we use the anisotropic generalization of the spatial kernel that was first introduced by Grimm et al. (2021),

$$\begin{aligned}&h_{D,\gamma ,q}(r_i(x,y),m_i,l_i) := \frac{q-1}{D\, \exp (\gamma (m_i-M_c))}\, \\&\quad \left( 1 + \frac{2\, l_i\, r_i(x,y)\, +\, \pi \, r_i(x,y)^2}{D\, \exp (\gamma (m_i-M_c))} \right) ^{-q}. \end{aligned}$$

In this spatial model, the distance term \(r_i(x,y)\) denotes the point-to-line distance between the potential aftershock location \((x,y)\) and the estimated rupture segment of triggering event i with length \(l_i\). That is, the kernel assigns constant density along the rupture line segment, with a power-law decay to the sides. Note that

$$\begin{aligned} h_{D,\gamma ,q}(r_i(x,y),m_i,0) = h_{D,\gamma ,q}^{iso}(r_i(x,y),m_i), \end{aligned}$$

i.e. the anisotropic kernel is a generalization and collapses to the isotropic model if the triggering location is assumed to be a point source with rupture extension \(l_i = 0\). Therefore, the generalized spatial model can be used for mixing approaches of both kernels, e.g. applying anisotropy to events i with magnitudes \(m_i \ge M_{aniso}\):

$$\begin{aligned} l_i = {\left\{ \begin{array}{ll} 0, &{}\text {for}~m_i < M_{aniso}, \quad \text {(isotropic~trigger)} \\ 10^{-2.57 + 0.62m_i}, \qquad &{}\text {for}~m_i \ge M_{aniso}, \quad \text {(anisotropic~trigger)} \end{array}\right. } \end{aligned}$$
(6)

The scaling relationship for anisotropic events is taken from the estimate of subsurface rupture lengths for strike-slip faulting events, provided in Wells and Coppersmith (1994). Alternative relationships can be applied, too, but are not expected to impact results.
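A minimal Python sketch of the generalized kernel and the rupture length rule of Eq. (6) is given below. Function names are hypothetical, the threshold \(M_{aniso} = 6.0\) is an assumed example value, and distances are assumed to be in kilometers (the unit of the Wells and Coppersmith (1994) relation); the spatial parameters are the initial guesses quoted later in the text and serve only as placeholders.

```python
import math


def rupture_length_km(m, m_aniso):
    """Rupture length l_i of Eq. (6): zero below M_aniso, Wells-Coppersmith scaling above."""
    return 10.0 ** (-2.57 + 0.62 * m) if m >= m_aniso else 0.0


def spatial_kernel(r, m, l, D, gamma, q, m_c):
    """Generalized kernel h(r, m, l); r is the point-to-line distance to the rupture segment."""
    sigma = D * math.exp(gamma * (m - m_c))
    return (q - 1.0) / sigma * (1.0 + (2.0 * l * r + math.pi * r * r) / sigma) ** (-q)


def isotropic_kernel(r, m, D, gamma, q, m_c):
    """Conventional isotropic kernel; r is the point-to-point distance to the epicenter."""
    sigma = D * math.exp(gamma * (m - m_c))
    return (q - 1.0) / sigma * (1.0 + math.pi * r * r / sigma) ** (-q)


# Placeholder parameters (assumed for illustration).
D, gamma, q, m_c, m_aniso = 0.0025, 1.78, 1.71, 2.95, 6.0
for m in (5.0, 6.4, 7.1):
    l = rupture_length_km(m, m_aniso)
    print(f"M{m}: l = {l:.1f} km, h(r = 1 km) = {spatial_kernel(1.0, m, l, D, gamma, q, m_c):.3e}")
```

With \(l_i = 0\) the function `spatial_kernel` returns exactly the isotropic value, which is the property that allows mixing both trigger types in one model.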

2.3.3 Local spatial restriction

Both the conventional isotropic and the generalized anisotropic kernels are defined in terms of a probability density function (pdf) over infinite space. Realistically, however, small earthquakes should exert only a locally restricted trigger influence. Grimm et al. (2021) showed that an infinite spatial extent may lead to an underestimation of the aftershock productivity parameter \(\alpha\) because it overestimates the triggering power of smaller events at the cost of the larger events. A manual analysis of the spatial aftershock patterns of the six Californian mainshocks named in the introduction shows that the cloud of potential aftershocks typically lies within one estimated rupture length (by Wells and Coppersmith 1994) from the epicenter. In case of an anisotropic shape of the kernel, the area of half a rupture length around the extended rupture segment seems sufficient. According to this observation, we choose a spatial restriction \(R_i\) for event i according to

$$\begin{aligned} R_i := {\left\{ \begin{array}{ll} 10^{-2.57 + 0.62m_i}, &{}\text {for}~m_i < M_{aniso}, \quad \text {(isotropic~trigger)} \\ 0.5\, \cdot \, 10^{-2.57 + 0.62m_i}, &{}\text {for}~m_i \ge M_{aniso}, \quad \text {(anisotropic~trigger)} \end{array}\right. } \end{aligned}$$
(7)

where again we use the strike-slip faulting subsurface rupture length scaling from Wells and Coppersmith (1994).

In other words, the spatial kernel for event i is only defined in the restricted area

$$\begin{aligned} S_i(R_i) := \{(x,y)\in \mathbb {R}^2 | r_i(x,y) \le R_i\} \end{aligned}$$

and set to 0 outside of it. Note that the restricted area \(S_i(R_i)\) can assume isotropic and anisotropic shapes, depending on the point-to-point or point-to-line definition of the distance term \(r_i(x,y)\). In order to retain the property of a pdf, we need to rescale the kernel within the restricted area by its analytical integral

$$\begin{aligned} \begin{aligned} H_{D,\gamma ,q}(R_i,m_i,l_i) :&= \iint _{S_i(R_i)} h_{D,\gamma ,q}(r_i(x,y),m_i,l_i)\, dx\, dy \\&= 1 - \left( 1 + \frac{2\, l_i\, R_i\, +\, \pi \, R_i^2}{D\, \exp (\gamma (m_i-M_c))} \right) ^{1-q}. \end{aligned} \end{aligned}$$
(8)

The integral term holds true for both isotropic (\(l_i=0\)) and anisotropic triggers (\(l_i>0\)). We obtain the generalized, restricted and anisotropic spatial kernel

$$\begin{aligned} h^{restr}_{D,\gamma ,q}(r_i(x,y),m_i,l_i) = {\left\{ \begin{array}{ll} \frac{h_{D,\gamma ,q}(r_i(x,y),m_i,l_i)}{H_{D,\gamma ,q}(R_i,m_i,l_i)}, \quad &{}\text {if}\, r_i(x,y) \le R_i, \\ 0, \quad &{}\text {if}\, r_i(x,y) > R_i. \end{array}\right. } \end{aligned}$$
(9)
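The normalization in Eqs. (7) to (9) can be verified numerically. The following Python sketch is an illustration under assumptions: function names are hypothetical, \(M_{aniso} = 6.0\) is an assumed threshold, distances are taken in kilometers, and the spatial parameters are the initial guesses quoted in the next subsection. The check uses the fact that the area within distance \(r\) of a segment of length \(l\) is \(2\, l\, r + \pi \, r^2\), so the area element is \((2\, l + 2\, \pi \, r)\, dr\).

```python
import math


def kernel(r, m, l, D, gamma, q, m_c):
    """Unrestricted generalized kernel h(r, m, l)."""
    sigma = D * math.exp(gamma * (m - m_c))
    return (q - 1.0) / sigma * (1.0 + (2.0 * l * r + math.pi * r * r) / sigma) ** (-q)


def kernel_integral(R, m, l, D, gamma, q, m_c):
    """Analytical integral H of Eq. (8) over the region within distance R of the rupture."""
    sigma = D * math.exp(gamma * (m - m_c))
    return 1.0 - (1.0 + (2.0 * l * R + math.pi * R * R) / sigma) ** (1.0 - q)


def restriction_radius(m, m_aniso):
    """Spatial restriction R_i of Eq. (7)."""
    length = 10.0 ** (-2.57 + 0.62 * m)
    return length if m < m_aniso else 0.5 * length


def restricted_kernel(r, R, m, l, D, gamma, q, m_c):
    """Restricted kernel of Eq. (9): rescaled inside the restriction, zero outside."""
    if r > R:
        return 0.0
    return kernel(r, m, l, D, gamma, q, m_c) / kernel_integral(R, m, l, D, gamma, q, m_c)


def numeric_mass(R, m, l, D, gamma, q, m_c, n=200000):
    """Midpoint-rule integral of the restricted kernel over S_i(R); should be close to 1."""
    h = R / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += restricted_kernel(r, R, m, l, D, gamma, q, m_c) * (2.0 * l + 2.0 * math.pi * r) * h
    return total


# D, gamma, q as quoted in the text; M_aniso is an assumed example value.
D, gamma, q, m_c, m_aniso = 0.0025, 1.78, 1.71, 2.95, 6.0
m = 7.1
l = 10.0 ** (-2.57 + 0.62 * m)
R = restriction_radius(m, m_aniso)
print(numeric_mass(R, m, l, D, gamma, q, m_c))
```

The same check with \(l_i = 0\) covers the isotropic case, since Eq. (8) holds for both trigger types.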

2.4 Estimation of strike and epicenter location

The restricted, anisotropic spatial kernel in Eq. (9) requires a strike angle to define the orientation of the extended rupture for anisotropic triggers with \(l_i > 0\). In retrospect, the strike angle could be taken from one of the numerous publications about the Ridgecrest sequence or from focal mechanism datasets such as the Global Moment Tensor Catalog, the ISC-GEM Global Instrumental Earthquake Catalog or from local datasets of the Southern California Earthquake Data Center (SCEDC). In order to perform a realistic forecasting test case, however, we should build upon instantaneous strike estimates such as from local agencies (e.g. the United States Geological Survey), which are typically available within several minutes to hours.

Here, we used the strike estimation algorithm proposed by Grimm et al. (2021), which optimally fits the rupture segment through the cloud of early aftershock locations by maximizing the contributed spatial rate under initial spatial parameter guesses. To be more precise, we ran through possible strikes in \(1^{\circ }\) steps (i.e. \(\{1^{\circ }, ..., 180^{\circ }\}\), where we can neglect all strikes larger than \(180^{\circ }\) because we do not account for dip direction in our model). We also moved the rupture along each strike angle in order to test different positions of the rupture segment relative to the fixed epicenter. Here, we go through possible relative positions in 0.01-steps (i.e. \(\{0, 0.01, 0.02, ..., 0.99, 1\}\)), where 0 and 1 mean that one of the ends of the rupture segment coincides with the epicenter, and 0.5 denotes the situation where the epicenter lies exactly at the center of the rupture. Among all combinations, we searched for the orientation and rupture position that maximize the forward trigger contribution of the anisotropic event i to subsequent events j within a fixed duration \(\Delta t = 1\) hour, i.e. with \(t_i< t_j < t_i+\Delta t\). The forward trigger contribution of event i is computed as

$$\begin{aligned} \sum _{j:\, t_i< t_j < t_i+\Delta t} h^{restr}_{D,\gamma ,q}(r_i(x_j,y_j),m_i,l_i). \end{aligned}$$
(10)

In order to avoid that the rupture orientation and position is dominated by single events that occurred very close to the segment candidate, we applied a minimum distance of 0.2 kilometers.

Here, we use the initial spatial parameters \(D=0.0025\), \(\gamma =1.78\) and \(q=1.71\) derived from the results of an isotropic ETAS model for a long-term California dataset, locally restricted to \(R=2.5\) rupture lengths, by Grimm et al. (2021). Tests have shown that modified initial parameters changed the level of the sum of forward rate contributions, but led to similar strike and epicenter location estimates. We also tested multiple durations \(\Delta t\) up to 30 hours and found that the estimation procedure provided very similar estimates for strike and rupture position. This shows that the rupture orientation and position can be appropriately identified soon after the trigger event occurred.
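The grid search described in this subsection can be sketched in Python as follows. This is a simplified illustration, not the code of Grimm et al. (2021): function names are hypothetical, coordinates are assumed to be planar (x east, y north, in kilometers) with strike measured clockwise from north, the minimum distance is implemented as a lower bound on \(r\), and the normalization \(H\) of Eq. (9) is omitted because it is identical for all candidate segments of one event and does not change the maximizer.

```python
import math


def point_to_segment_distance(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def kernel(r, length, sigma, q):
    """Unrestricted anisotropic kernel; sigma stands for D * exp(gamma * (m_i - M_c))."""
    return (q - 1.0) / sigma * (1.0 + (2.0 * length * r + math.pi * r * r) / sigma) ** (-q)


def fit_rupture(epicenter, aftershocks, length, sigma, q, r_max, r_min=0.2):
    """Grid search over strike (1..180 degrees) and relative rupture position (0..1)."""
    ex, ey = epicenter
    best_score, best_strike, best_pos = -math.inf, None, None
    for strike in range(1, 181):
        ang = math.radians(strike)
        ux, uy = math.sin(ang), math.cos(ang)  # unit vector along strike
        for k in range(101):
            pos = k / 100.0
            # Segment start A and end B such that the epicenter sits at fraction pos.
            ax, ay = ex - pos * length * ux, ey - pos * length * uy
            bx, by = ax + length * ux, ay + length * uy
            score = 0.0
            for x, y in aftershocks:
                r = max(r_min, point_to_segment_distance(x, y, ax, ay, bx, by))
                if r <= r_max:  # local restriction: no contribution beyond r_max
                    score += kernel(r, length, sigma, q)
            if score > best_score:
                best_score, best_strike, best_pos = score, strike, pos
    return best_strike, best_pos, best_score


# Synthetic early aftershocks aligned along a 45-degree strike, starting at the epicenter.
u = (math.sin(math.radians(45.0)), math.cos(math.radians(45.0)))
aftershocks = [(s * u[0], s * u[1]) for s in range(1, 30)]
best_strike, best_pos, _ = fit_rupture((0.0, 0.0), aftershocks, length=30.0,
                                       sigma=1.0, q=1.71, r_max=15.0)
print(best_strike, best_pos)
```

In this synthetic case the search should recover the imposed strike and place the epicenter near one end of the segment; ties between equally good candidates are resolved in favor of the first one visited.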

In the Application section we present the strike and rupture position estimation for the example of the M6.4 and M7.1 Ridgecrest events.

2.5 Estimation of spatial integral

The computation of the log-likelihood function in Eq. (5) requires the estimation of the spatial integral of \(R_0\) and therefore \(h^{restr}_{D,\gamma ,q}\).

In the isotropic case, the integral can be estimated semi-analytically by the radial partitioning method suggested by Ogata (1998) and applied in the R package ETAS (Jalilian 2019). It uses the property that the isotropic spatial kernel can be integrated analytically over circular areas \(S_i(R)\), according to Eq. (8). As Fig. 2a illustrates, the boundary of the polygon defining the spatial window S is partitioned into a fine grid, with two neighboring boundary grid points having approximately equal distances \(\tilde{R}\) to the point source coordinate of event i. The integral of \(h^{restr}_{D,\gamma ,q}\) over each of these thin segments of a circle can then be approximated by the analytical full integral, weighted by the ratio of the circle segment \(\phi /360^{\circ }\), where \(\phi\) is the angle enclosed by the circle segment (Jalilian 2019; Ogata 1998).
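The radial partitioning idea for the isotropic case can be illustrated with the Python sketch below. It is written in the spirit of the method described above and is not taken from the R package ETAS: function names are hypothetical, the kernel is used without the local restriction for brevity, `sigma` stands for \(D\, \exp (\gamma (m_i-M_c))\), and a brute-force grid integral is included only as a cross-check.

```python
import math


def disc_mass(r, sigma, q):
    """Analytical integral of the isotropic kernel over a disc of radius r (Eq. 8 with l_i = 0)."""
    return 1.0 - (1.0 + math.pi * r * r / sigma) ** (1.0 - q)


def polygon_mass(source, polygon, sigma, q, n_sub=500):
    """Integral of the isotropic kernel over a polygon via radial partitioning of its boundary."""
    sx, sy = source
    total = 0.0
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        for k in range(n_sub):
            # End points of one small boundary piece, relative to the source.
            ax = x0 + (x1 - x0) * k / n_sub - sx
            ay = y0 + (y1 - y0) * k / n_sub - sy
            bx = x0 + (x1 - x0) * (k + 1) / n_sub - sx
            by = y0 + (y1 - y0) * (k + 1) / n_sub - sy
            # Signed angle subtended at the source by this boundary piece.
            phi = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
            r_mid = math.hypot(0.5 * (ax + bx), 0.5 * (ay + by))
            total += phi / (2.0 * math.pi) * disc_mass(r_mid, sigma, q)
    return abs(total)  # abs() makes the result independent of vertex orientation


def grid_mass(source, x_range, y_range, sigma, q, n=400):
    """Brute-force midpoint-rule integral over an axis-aligned rectangle, for comparison."""
    sx, sy = source
    hx = (x_range[1] - x_range[0]) / n
    hy = (y_range[1] - y_range[0]) / n
    total = 0.0
    for i in range(n):
        x = x_range[0] + (i + 0.5) * hx - sx
        for j in range(n):
            y = y_range[0] + (j + 0.5) * hy - sy
            total += (q - 1.0) / sigma * (1.0 + math.pi * (x * x + y * y) / sigma) ** (-q)
    return total * hx * hy


# Assumed example: unit-scale kernel, square window, off-center source.
square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
src = (0.3, -0.2)
print(polygon_mass(src, square, 1.0, 1.71), grid_mass(src, (-1.0, 1.0), (-1.0, 1.0), 1.0, 1.71))
```

The boundary-based estimate needs only a one-dimensional loop over boundary pieces, which is the practical advantage over a two-dimensional grid integral.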

Similarly, an anisotropic spatial kernel can be integrated analytically over an anisotropic region \(S_i(\tilde{R})\) with maximum distance \(\tilde{R}\) to the extended rupture. Due to the non-circular shape of region \(S_i(R)\) for anisotropic triggers, radial partitioning can only be performed at both ends of the rupture segment. As Fig. 2b illustrates, in a similar way we use “bin partitioning” in the space orthogonal to the rupture. Unfortunately, in the anisotropic case, the weights \(\phi /360^{\circ }\) of the circle segments at both ends of the rupture only relate to the part of the integral that is estimated by radial partitioning. Similarly, the weight of a bin of size \(\Delta \, l\) is \(\frac{\Delta \, l}{2\, l_i}\) relative to only the orthogonal space on both sides of the rupture segment. In each iteration of the parameter estimation, the shares of the radial and the orthogonal integral parts change and need to be determined numerically before each iteration. This comes at the computational cost of approximately doubled run-time, given that only the minority of strong earthquakes with magnitude \(M \ge M_{aniso}\) are modelled anisotropically.

Fig. 2

Visualization of the spatial integral estimation needed for computing the log-likelihood function (Eq. 5) for a isotropic triggers and b anisotropic triggers. The box represents the boundary of the spatial target region (polygon), gridded into small intervals. Red crosses symbolize the epicenters of the triggering events. In a, the red circle around the event represents the contour lines of an isotropic spatial kernel and the shaded segments illustrate the radial partitioning method. In (b), the red box and semi-circles symbolize the contour lines of the anisotropic spatial kernel, and the shaded segments illustrate the radial and bin partitioning method

3 Application

We carry out three forecasting experiments to check the quality of the previously defined models in predicting the observed Ridgecrest M6.4 and M7.1 sequences. Each forecasting experiment consists of three main steps, represented as blue boxes in Fig. 3:

  • Parameter Estimation: Estimate model parameters for a specified event sub-set of southern Californian earthquake data.

  • Forward Simulation: Use the fitted model parameters to conduct 10,000 forward simulations of the Ridgecrest M6.4 or M7.1 sequence, respectively.

  • Evaluation: Analyze simulated sequences and compare to the observation.

Fig. 3

Summary of the forecasting experiments (from left to right): The five model versions, listed in Table 1, are fitted to a long-term California event sub-set (Experiments 1 and 2) and to the local M6.4 Ridgecrest sequence (Experiment 3). The estimated parameters are applied to forward simulations of the Ridgecrest M6.4 sequence (Experiment 1) and the Ridgecrest M7.1 sequence (Experiments 2 and 3). The predicted sequences are compared to the observed ones with respect to three attributes, further described in the Attributes and Measures section

Table 1 Overview of the model variants tested in this paper

In the following, we first introduce the basic earthquake event set for California underlying this study, and define the time-space windows used to obtain the event sub-sets applied for parameter estimation. Next, we describe the three forecasting experiments, rigorously defining the magnitude-time-space windows applied for parameter estimation and forward simulations. Each forecasting experiment is repeated for five versions of the models introduced in the Methods section, which are defined in detail. Finally, we specify the forward simulation process and attributes and measures to assess the quality of the forecasts. Here, we also give an example of the estimation of strike angles and rupture positions for the Ridgecrest M6.4 and M7.1 events.

3.1 Data

As our basic event set, we downloaded the earthquake catalog from the Southern California Earthquake Data Center (SCEDC, Hauksson et al. 2012). The entire dataset comprises 699,175 event occurrences between January 1, 1981, and December 31, 2019. Because magnitudes seem to be clustered at values with one decimal, we round all magnitudes to one decimal and use the cut-off magnitude \(M_c = 2.05\) (Hutton et al. 2010; Hainzl 2021). We remove events at depths larger than 40 km to avoid completeness issues.

3.2 Forecasting experiments

Here, we describe in detail the design of the forecasting experiments, summarized in Fig. 3.

3.2.1 Experiment 1

We estimate generic, long-term California model parameters within the hexagonal polygon of southern California defined in Hutton et al. (2010). In order to mitigate computational costs, we restrict the time window to the period between January 1, 1987, and December 31, 2018, which includes the five prominent earthquake sequences before Ridgecrest named in the Introduction section, and choose the larger cut-off magnitude \(M_c = 2.95\). This cut-off magnitude is a trade-off between too large and too small event record sizes that ensures a reasonable run-time and statistical robustness of the parameter estimates. Additionally, it avoids potentially biased estimates of the blind time parameter \(T_b\) in large spatial areas due to simultaneous clustering. The magnitude-time-space window contains 7,215 fitted target events. We account for external triggering by including complementary events that occurred after January 1, 1986, and within a margin of 0.5\(^\circ\) around the fitted area.

The estimated models are then applied to forecast the Ridgecrest M6.4 foreshock sequence above cut-off magnitude \(M_c = 2.05\), within a circular region with radius 40 km centered on the M6.4 event location. The simulated time window starts at the time of the M6.4 event (July 4, 2019) and ends at the M7.1 mainshock event time (July 6, 2019); thus, it has a duration of approximately 34 hours. We initialize triggering seismicity by the event history from June 1, 2019.

3.2.2 Experiment 2

In the second experiment, we use the same set of generic, long-term California parameters, but apply it in a forecast of the Ridgecrest M7.1 mainshock sequence above cut-off magnitude \(M_c = 2.95\), starting at the M7.1 event time for a duration of ten days. The spatial simulation window is defined by a disk with a radius of 75 km, centered on the M7.1 event location. Again, we initialize triggering seismicity by the event history from June 1, 2019, here until the M7.1 event time.

3.2.3 Experiment 3

In the third experiment, we simulate Ridgecrest M7.1 sequences with the same settings as for Experiment 2, but based on parameter estimates fitted over the immediately preceding M6.4 foreshock sequence. For the parameter estimation, we use the same magnitude-time-space target window as for the M6.4 sequence simulations in Experiment 1. We account for external triggering by including complementary events that occurred after June 1, 2019, and within a disk with increased radius of 50 km.

3.3 Fitted models

Each forecasting experiment is carried out for five different versions of the model introduced in the Methods section, summarized in Table 1. The “ETAS conventional” model serves as our benchmark and uses the most standard set-up of an ETAS model (e.g. Ogata 1998; Zhuang et al. 2002; Jalilian 2019). It applies an isotropic spatial kernel with infinite spatial extent to all triggers. The “ETAS iso-r” model also applies an isotropic kernel, but restricts the spatial extent to one rupture length for all events, according to Eq. (7). In the “ETAS aniso-r” model, all events with magnitudes \(m_i \ge M_{aniso} = 6.0\) are modeled as anisotropic trigger sources with a spatial restriction to half a rupture length (Eqs. 6 and 7). The other events are modeled as isotropic triggers, restricted to one rupture length. The “ETASI iso-r” and “ETASI aniso-r” models combine the spatial kernel settings of the latter two models with the ETASI approach accounting for STAI.

3.4 Simulation process

For each forecasting experiment and model version, we carry out 10,000 realizations of synthetic sequences to obtain statistically stable results. At the beginning of each simulation, we distribute the Poisson-sampled number of background events, scaled by the size of the spatial area, uniformly over space and time. The assumption of a uniform spatio-temporal background event distribution appears plausible within the short and small space-time simulation windows.
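The background step described above can be illustrated with a minimal Python sketch. It is not the Matlab implementation used in this study: the function name `sample_background`, the rate unit (events per km² per day) and the numerical values in the usage lines are hypothetical, and the simulation window is assumed to be a disk treated in planar coordinates.

```python
import numpy as np

def sample_background(rate_per_km2_day, radius_km, duration_days, rng):
    """Draw background events uniformly over a disk and a time window.

    rate_per_km2_day is a hypothetical background rate; the expected
    count scales with the area of the disk and the window duration.
    """
    area_km2 = np.pi * radius_km ** 2
    expected = rate_per_km2_day * area_km2 * duration_days
    n = rng.poisson(expected)  # Poisson-sampled number of events
    times = rng.uniform(0.0, duration_days, size=n)
    # Taking the square root of a uniform variate makes the points
    # uniform in area rather than clustered near the center.
    r = radius_km * np.sqrt(rng.uniform(0.0, 1.0, size=n))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return times, r * np.cos(theta), r * np.sin(theta)

rng = np.random.default_rng(0)
# Illustrative values only: 75 km disk, 10-day window.
times, x, y = sample_background(1e-4, 75.0, 10.0, rng)
```

Sampling the radius as the square root of a uniform variate is what keeps the event density constant per unit area; drawing the radius uniformly would over-populate the center of the disk.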

Next, we sample the numbers of offspring for the initiating event history and the background events. The number of offspring, depending on trigger magnitude m, is drawn from a Poisson distribution with expected value

$$\begin{aligned} N(m) = k(m)\, \frac{1 }{1-p}\, \left( (T+c)^{1-p} - c^{1-p} \right) , \end{aligned}$$
(11)

where k(m) is the aftershock productivity function in Eq. (2) and the latter term is the integral from \(t=0\) to a maximum trigger duration \(t=T\) (in days) over the Omori-Utsu function in Eq. (1). We need to rescale the aftershock productivity to obtain the expected number of offspring within T days, because the Omori-Utsu law is not normalized (it is not a pdf) and, therefore, typically does not integrate to one. Thus, it interacts with the scaling parameter A of the productivity function.
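Equation (11) can be evaluated as in the following minimal Python sketch. The productivity value `k_m` is passed in directly, so no particular form of Eq. (2) is assumed; the function names and the separate branch for \(p = 1\), where the closed form becomes a logarithm, are additions for illustration.

```python
import math

def omori_integral(c, p, T):
    """Integral of (t + c)**(-p) from t = 0 to t = T (T in days)."""
    if math.isclose(p, 1.0):
        # Limit of the closed form for p -> 1.
        return math.log((T + c) / c)
    return ((T + c) ** (1.0 - p) - c ** (1.0 - p)) / (1.0 - p)

def expected_offspring(k_m, c, p, T):
    """Expected number of direct aftershocks N(m) as in Eq. (11).

    k_m is the already evaluated productivity k(m) of the trigger.
    """
    return k_m * omori_integral(c, p, T)
```

The actual number of offspring of a trigger would then be drawn from a Poisson distribution with this expected value.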

Each triggered event is then assigned an event time and location according to inversion sampling from the (rescaled) Omori-Utsu law and the spatial kernel. The magnitude is sampled by the inversion method from the estimated FMD in Eq. (3), applying a maximum magnitude of 7.5 for California. The aftershock sampling routine is repeated for every newly triggered event in the simulated time-space window until no more events are sampled.
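The inversion method mentioned above can be sketched as follows in Python. The temporal part inverts the Omori-Utsu law of Eq. (1), rescaled to the window \([0, T]\). For the magnitudes we assume, for illustration only, that the FMD of Eq. (3) is a Gutenberg-Richter law truncated at the maximum magnitude; the function names are hypothetical and the case \(p = 1\) is not covered.

```python
import numpy as np

def sample_omori_times(u, c, p, T):
    """Map uniform variates u in [0, 1) to delays in [0, T), p != 1.

    Inverts F(t) = ((t+c)**(1-p) - c**(1-p)) / ((T+c)**(1-p) - c**(1-p)).
    """
    lo = c ** (1.0 - p)
    hi = (T + c) ** (1.0 - p)
    return (lo + u * (hi - lo)) ** (1.0 / (1.0 - p)) - c

def sample_magnitudes(u, b_value, m_c, m_max):
    """Inverse cdf of a Gutenberg-Richter law truncated at m_max.

    The truncated form is an assumption made for this sketch.
    """
    norm = 1.0 - 10.0 ** (-b_value * (m_max - m_c))
    return m_c - np.log10(1.0 - u * norm) / b_value

rng = np.random.default_rng(0)
u_t = rng.uniform(size=1000)
u_m = rng.uniform(size=1000)
# Illustrative parameter values, not the estimates of Table 2.
delays = sample_omori_times(u_t, c=0.01, p=1.1, T=10.0)
mags = sample_magnitudes(u_m, b_value=1.0, m_c=2.05, m_max=7.5)
```

Both functions are monotone in `u`, so a uniform variate of 0 maps to the lower end of the support (zero delay, cut-off magnitude) and a value approaching 1 maps to the upper end (T, maximum magnitude).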

In order to make fair comparisons of simulated sequences with the observed ones, we need to consider the implications of STAI in the forecasts. The ETASI models account for incomplete records in the parameter estimation and therefore forecast the “true”, i.e. complete, aftershock sequence. According to the ETASI definition of event detectability, we would need to delete all events that occurred within the blind time \(T_b\) of an earlier event with larger magnitude.

For the sake of transparency and consistency with the observations, we used an alternative approach and manually fitted empirical magnitude completeness functions

$$\begin{aligned} M_c(t) = {\left\{ \begin{array}{ll} -1.4\, \log _{10}(t) + M - M_c - 4.75, &{}\text {(Ridgecrest M6.4)}, \\ -0.99\, \log _{10}(t) + M - M_c - 3.75, &{}\text {(Ridgecrest M7.1)} \end{array}\right. } \end{aligned}$$
(12)

to the logarithmic event time-magnitude scatter data of the observed Ridgecrest M6.4 and Ridgecrest M7.1 sequences in Fig. 1c and d.

In the forecasts generated by the ETASI iso-r and aniso-r models, we delete all events of the simulated sequence that fall in the supposedly undetected range below the completeness line \(M_c(t)\). In contrast, ETAS models estimate STAI-biased aftershock productivities and therefore lead to predictions of the incomplete, rather than the “true” size of the sequence. Therefore, in forecasts generated by these models we do not delete events.

3.4.1 Attributes and measures

For each model version and experiment, we want to assess the quality of the forecasts with respect to three attributes, in comparison with the observed sequence evaluated over the same magnitude-time-space window.

We compute the predicted cumulative distribution function (cdf) of the number of aftershocks and the predicted pdf of the largest aftershock magnitude from the 10,000 forecasted sequences. As a quantitative measure of the fit, we determine the exceedance probability, i.e. the probability that the predicted distribution yields a value larger than or equal to the observed one. Extreme exceedance probabilities, close to either 0 or 1, indicate an inadequate prediction of the attribute.
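In terms of the simulated samples, the exceedance probability is simply the fraction of simulation runs whose attribute value reaches or exceeds the observation. A minimal Python sketch, with a hypothetical function name and made-up toy numbers in the usage line:

```python
import numpy as np

def exceedance_probability(simulated, observed):
    """Fraction of simulated values that are >= the observed value."""
    simulated = np.asarray(simulated)
    return float(np.mean(simulated >= observed))

# Toy example: four simulated sequence sizes, observed size 3.
p_exc = exceedance_probability([1, 2, 3, 4], 3)
```

The same function applies to the number of aftershocks and to the largest aftershock magnitude of each run.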

To test the spatial distribution of aftershock locations, we define a 1 km \(\times\) 1 km spatial grid over the spatial simulation window of the forecasting experiment and, for each simulation run, count the number of aftershocks that occurred closest to each grid point. We determine the spatial distribution \(D_{ij}\) of the i-th simulation run by dividing the number of events that occurred at each grid point j, \(N_{ij}\), by the total number of events in the i-th simulation run, \(N_i\), i.e.

$$\begin{aligned} D_{ij} = N_{ij} / N_i. \end{aligned}$$

By repeating the same procedure for each simulation run, we obtain 10,000 simulated spatial distributions \(D_{ij}\) for each model version. Finally, we average the individual simulated distributions to obtain the predicted probability \(P_j\) that an event occurs at grid point j.

The more events of the observed sequence have occurred at grid points with high predicted probabilities \(P_j\), the better the forecast. Therefore, we quantify the goodness of the spatial fit with the likelihood \(L_{space} = \prod _{grid~points~j}\, P_j^{N_j^{obs}}\), where \(N_j^{obs}\) is the number of observed events at grid point j, with corresponding log-likelihood

$$\begin{aligned} LL_{space} = \sum _{grid~points~j}\, N_j^{obs}\, ln(P_j). \end{aligned}$$

We compute the information gain of the models’ spatial predictions relative to the ETAS conventional model by

$$\begin{aligned} IG = \frac{LL_{space} - LL_{space}^0}{N_{obs}} \end{aligned}$$

where \(LL_{space}^0\) is the benchmark result for the ETAS conventional model (Hainzl 2021; Rhoades et al. 2014).
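The spatial measures defined above can be sketched in Python as follows. The input is assumed to be an array of counts \(N_{ij}\) with one row per simulation run and one column per grid point. Two choices are assumptions of this sketch and are not specified in the text: runs without any events are skipped when averaging, and grid points without observed events are left out of the sum so that they contribute zero.

```python
import numpy as np

def predicted_probability(counts_per_run):
    """Average the per-run distributions D_ij = N_ij / N_i to get P_j."""
    counts = np.asarray(counts_per_run, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    nonempty = totals[:, 0] > 0  # assumption: skip runs with N_i = 0
    d = counts[nonempty] / totals[nonempty]
    return d.mean(axis=0)

def spatial_log_likelihood(p, n_obs):
    """LL_space = sum_j N_j^obs * ln(P_j) over cells with observations."""
    p = np.asarray(p, dtype=float)
    n_obs = np.asarray(n_obs, dtype=float)
    hit = n_obs > 0
    # A cell with observations but P_j = 0 yields -inf by design.
    with np.errstate(divide="ignore"):
        return float(np.sum(n_obs[hit] * np.log(p[hit])))

def information_gain(ll_model, ll_benchmark, n_observed):
    """Log-likelihood gain per observed event relative to the benchmark."""
    return (ll_model - ll_benchmark) / n_observed

# Toy example with two runs and two grid points.
p_model = predicted_probability([[2, 0], [1, 1]])
n_obs = np.array([3, 1])
ig = information_gain(
    spatial_log_likelihood(p_model, n_obs),
    spatial_log_likelihood([0.5, 0.5], n_obs),
    n_obs.sum(),
)
```

In the toy example, the model concentrates probability on the first grid point, where most observed events fall, so its information gain relative to the uniform benchmark is positive.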

3.4.2 Strike and rupture position estimates

For anisotropic models, both the parameter estimation and the forward simulations of the Ridgecrest M6.4 and M7.1 sequences require estimates of the strike angle and rupture position of all events with magnitude \(M \ge M_{aniso} = 6.0\).

Figure 4a demonstrates the methodology, described in the Methods section, for the Ridgecrest M6.4 foreshock. The forward trigger rate contribution (y axis) from Eq. (10) is plotted against the strike sample (x axis) and the sample of relative rupture positions (red lines). The curves clearly show a bi-modal shape, with peaks at strikes \(34^{\circ }\) and \(132^{\circ }\) and relative rupture positions 0.76 and 0.77. Figure 4c visualizes the optimized rupture orientation and position as a fit through the cloud of potential aftershocks within the first hour (red) or within 30 hours (yellow). It confirms the earlier mentioned particularity of two almost orthogonally ruptured faults. The strike \(34^{\circ }\) rupture segment does not perfectly fit the aftershock alignment, as the segment must go through the fixed M6.4 epicenter location, which seems to be slightly off the ruptured fault. Apparently, later aftershocks have a very similar spatial distribution to the events that occurred within the first hour. For larger \(\Delta t\), the M6.4 strike estimates would vary by only \(1^{\circ }\) or \(2^{\circ }\), respectively.

Figure 4b shows the analogous analysis for the M7.1 Ridgecrest mainshock. Here, the maximizing properties are strike \(142^{\circ }\) and a relatively central rupture position 0.55. The M7.1 event ruptured a single fault, resulting in a uni-modal shape of the forward trigger contribution curves.

Fig. 4

Strike and relative rupture position optimization using initial ETAS parameter guesses \(D=0.0025, \gamma =1.78, q=1.71\). a, b: Sum of forward trigger rate contributions to events within one hour against strike sample (x axis) and relative rupture position sample (curves) for a the M6.4 Ridgecrest foreshock and b the M7.1 Ridgecrest mainshock. Text boxes show strike and relative rupture position estimates at the curve maxima. c, d: Fitted rupture segments through cloud of aftershocks after c the M6.4 Ridgecrest foreshock and d the M7.1 Ridgecrest mainshock. Darker red and blue points represent aftershocks within the first hour after the respective trigger event, yellow and lighter blue points represent aftershocks within the first 30 hours. Yellow pentagram symbolizes Mw6.4 foreshock, and yellow hexagram marks Mw7.1 mainshock. Thick black lines represent estimated rupture locations according to the strikes and relative rupture positions estimated in a and b.

4 Results and discussion

In this section, we analyze and discuss the results of the three forecasting experiments, summarized in Fig. 3. We use the attributes and measures presented in the Application section to evaluate the quality of the forecasts, compared to the observed sequences. The model parameter estimation results of both the generic California and the Ridgecrest M6.4 sequence parameter fits are listed in Table 2 and will help us to understand and explain features in the forecasts. After a rigorous discussion of the forecasting results, we will mention some sensitivity tests that we applied to check the robustness of our findings.

Table 2 Overview of model results for generic (long-term) California and Ridgecrest M6.4 parameter estimation

4.1 Forecasting experiment 1

In the first forecasting experiment, we simulated the Ridgecrest M6.4 sequence, starting at the time of the M6.4 earthquake occurrence, based on generic parameters fitted to a long-term and spatially extensive Californian event set. The simulation period covers the 34 hours until (but not including) the occurrence of the M7.1 mainshock.

4.1.1 Predicted aftershock productivity

Figure 5a shows the predicted cdfs of the number of aftershocks for each model, compared to the observed M6.4 sequence, which produced 633 events in the same time-space window. Evidently, the ETAS conventional model (with isotropic, unlimited spatial kernel) provides a very inappropriate estimate, as it does not reach the observed number in any of the 10,000 simulations. According to the ETAS iso-r and ETAS aniso-r models, the observed number of events would be a possible, but rather unlikely scenario, with approximately 2.4 and 3.7% probability to exceed the observed value. The ETASI models tend to only moderately (ETASI iso-r) or slightly (ETASI aniso-r) underestimate the observed number.

Fig. 5

Predicted cdfs of the number of aftershocks (a, c, e) and predicted pdfs of the largest aftershock magnitude (b, d, f) for Experiment 1 (a, b), Experiment 2 (c, d) and Experiment 3 (e, f). Each predicted distribution is based on 10,000 simulated forecasts of the Ridgecrest M6.4 sequence (a, b) and the Ridgecrest M7.1 sequence (c–f), using the models indicated in the legend in the top left figure. Vertical gray lines show the value of the observed sequence

To explain this observation, we consider that the size of this relatively short sequence is predominantly influenced by the number of direct aftershocks of the initial M6.4 trigger event. According to the model parameter estimates in Table 2 and Eq. (11), the M6.4 trigger event would only produce approximately 17 direct aftershocks in the ETAS conventional model, compared to 46 (ETAS iso-r), 49 (ETAS aniso-r), 66 (ETASI iso-r) and 74 (ETASI aniso-r) in the other models. The larger the parameter \(\alpha\), the more direct aftershocks are expected for the M6.4 event.

As argued in the Methods section, the local restriction of the spatial kernels prevents a disproportionate triggering power of small events and in return increases the direct aftershock productivity of the stronger events, characterized by a considerable increase of the parameter \(\alpha\) in the non-conventional models (Grimm et al. 2021). Besides, the application of the ETASI model accounts for missing aftershock records after strong trigger events and corrects for the biased, underestimated aftershock productivity, leading to an additional increase of \(\alpha\) (Hainzl 2021). Finally, we note that the majority of the \(M>6\) mainshocks included in the estimation time window from 1987 until 2018 are characterized by anisotropic aftershock patterns. Consequently, more events are associated as direct aftershocks of the strong mainshocks when we estimate the parameters with the ETAS aniso-r or the ETASI aniso-r model.

4.1.2 Predicted largest aftershock magnitude

Figure 5b shows the predicted pdfs of the largest aftershock magnitude in the synthetic sequences, assuming that the Gutenberg-Richter distribution holds over the entire magnitude range up to the largest values. For each of the five models, a kernel density function was computed for the 10,000 largest magnitude samples. In all models, the observed M7.1 event would have been an extremely rare case, with exceedance probabilities \(P(largest~magnitude \ge 7.1) \le 0.43\%\). Even the second largest, observed aftershock magnitude (\(M=5.4\)) was not reached in approximately 75% of the simulations of the best model (ETASI aniso-r).

To interpret this result, think of the largest aftershock magnitude as the largest order statistic of the sample of simulated events in a simulation run. Then, the expected value of the sample maximum (i.e. the largest aftershock) increases if (1) the sequence size becomes larger or (2) if the magnitudes in the sample are distributed in a way that they favor high values.

The underestimations of the observed sequence size, shown in Fig. 5a and discussed earlier, cannot sufficiently explain the mismatch of the predicted largest aftershock magnitudes. Even the observed sample size (633 events) would produce an M7.1 event with a probability of less than 1%, given the generic California estimates for the FMD with \(b = 0.98\) (ETAS models) or \(b = 1.01\) (ETASI models; see Table 2). If \(b = 1\), then each magnitude increment by 1 leads to a 10 times smaller probability of occurrence. Since M7.1 lies about five magnitude units above the cut-off magnitude \(M_c = 2.05\), one M7.1 event is, on average, only obtained for a sequence with roughly 100,000 aftershocks.
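The order of magnitude of this argument can be checked with a short Python sketch. It assumes independent magnitudes that follow an untruncated Gutenberg-Richter law above the cut-off magnitude, which is a simplification of the simulation set-up; the function name is hypothetical.

```python
import math

def prob_max_exceeds(n_events, b_value, m_c, m_target):
    """P(at least one of n_events magnitudes is >= m_target).

    Assumes independent magnitudes from an untruncated
    Gutenberg-Richter law above the cut-off magnitude m_c.
    """
    p_single = 10.0 ** (-b_value * (m_target - m_c))
    return 1.0 - (1.0 - p_single) ** n_events

# Observed M6.4 sequence size with the generic b-value estimates.
p_etas = prob_max_exceeds(633, 0.98, 2.05, 7.1)
p_etasi = prob_max_exceeds(633, 1.01, 2.05, 7.1)
```

Under these assumptions, both probabilities stay below one percent, in line with the statement above, and the probability grows as the b-value decreases.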

According to the results in Table 2, all models estimated significantly smaller values \(b < 0.8\) for the observed Ridgecrest M6.4 sequence, which favors the occurrence of strong events. Note that the b estimates of the three ETAS models are biased, because they are fitted for the “true” FMD using the evidently short-term incomplete M6.4 sequence event record (see Fig. 1c). The ETASI models account for the missing smaller magnitudes and therefore lead to corrected, larger b values.

If we simulated the Ridgecrest M6.4 sequence using its own estimation results (note that this is not a valid forecasting experiment, but is used for illustration purposes), we would obtain an \(M \ge 7.1\) event with a probability of 10.0% (ETAS conventional), 25.9% (ETAS iso-r), 53.7% (ETAS aniso-r), 15.6% (ETASI iso-r) and 25.3% (ETASI aniso-r).

4.1.3 Criticality

The branching ratios \(\nu _{branch}\), i.e. the average number of direct aftershocks per event, clearly exceed 1 in each model (see Table 2). According to these models, an earthquake would trigger more than one direct aftershock on average, which would cause the sequence to be unstable, with exponentially increasing activity. The large branching ratios are mainly driven by the small b values, which substantially increase the occurrence probability of the more productive, strong earthquakes.

The instability of the M6.4 sequence could be interpreted as an indication of an imminent, strong mainshock. It is unlikely that the instability is merely a model artifact, e.g. a substantial misfit of the b value caused by a few magnitude outliers. First, the FMD is estimated accounting for all earthquakes at equal weights, regardless of their magnitude. Therefore, the b value estimate is primarily controlled by the more numerous, small magnitudes. Secondly, the M7.1 event magnitude was not included in the b value estimation.

In summary, the generic California parameters are fitted to a long-term catalog mainly consisting of stable earthquake sequences and seismically quiet periods. Therefore, they cannot adequately predict the magnitude distribution of the M6.4 foreshock sequence, which is characterized by instability due to a particularly flat FMD.

4.1.4 Spatial distribution

Figure 6a and b show the predicted spatial event distributions, averaged over the 10,000 simulation runs and evaluated on the 1 km \(\times\) 1 km grid described in the Application section, for the ETASI iso-r model (in (a)) and the ETASI aniso-r model (in (b)). We overlay the observed event locations on the logarithmic heat map of grid cell probabilities. At first glance, the anisotropic spatial forecast in (b) fits the observed, clearly non-isotropic event distribution much better than its isotropic counterpart in (a).

Fig. 6

Predicted spatial event distributions for Experiment 1 (a, b), Experiment 2 (c, d) and Experiment 3 (e, f). Each predicted distribution is averaged over 10,000 simulated forecasts of the Ridgecrest M6.4 sequence (a, b) and the Ridgecrest M7.1 sequence (c–f), based on the ETASI iso-r model (a, c, e) and the ETASI aniso-r model (b, d, f). The color bar indicates the predicted, logarithmic probability that an event occurs at the respective grid point

In the isotropic model, all direct aftershocks are distributed point-symmetrically around the M6.4 trigger event. Subsequent secondary triggering then takes place around the new initiators. In the anisotropic model, the direct aftershocks are distributed around the fitted rupture segments of the two orthogonal faults (see Fig. 4). Subsequent trigger generations then spread isotropically (if \(M < M_{aniso}\)) or anisotropically (if \(M \ge M_{aniso}\)) around their new initiators. In both plots, we can see a pronounced boundary from green to blue color, indicating the spatial restriction to one rupture length (isotropic model) and half a rupture length (anisotropic model) around the trigger source, according to Eq. (7). Spatial grid cells outside of this boundary can only be activated by secondary triggering or background seismicity.

Fig. 7

Summary plots of forecasting results. Predicted probabilities per model that a the number of aftershocks exceeds the observation (633 for Ridgecrest M6.4; 3,273 for Ridgecrest M7.1) and b the largest aftershock magnitude exceeds the observation (7.1 for Ridgecrest M6.4; 5.5 for Ridgecrest M7.1). Dashed horizontal lines represent \(2.5\%\) and \(97.5\%\) quantiles. c Spatial information gains relative to the ETAS conventional model prediction for the same experiment. Legend in a holds for all plots

To quantify the quality of the spatial forecasts, we computed the information gains relative to the ETAS conventional model as described in the Application section. Figure 7c shows the results for Experiment 1 in the left group of bars. Both anisotropic models lead to a pronounced improvement, which confirms the visual impression in Fig. 6a and b. The ETAS and ETASI iso-r models, which differ from the conventional approach in terms of the local spatial restriction, show a small information gain. As we can see in Fig. 6a, none of the observed events occurred outside of the spatial restriction. Therefore, the restriction leads to a slightly more pronounced accumulation of simulated event locations closer to the M6.4 trigger, which coincides better with the observation.

4.2 Forecasting experiment 2

In the second forecasting experiment, we simulated the Ridgecrest M7.1 sequence for a duration of 10 days based on the same generic California parameters as used for Experiment 1.

4.2.1 Predicted aftershock productivity

Figure 5c compares the number of aftershocks, predicted by the five models, to the observed number of 3,273 events. The forecasts show a very similar arrangement of curves as in Experiment 1 (see Fig. 5a). The ETAS conventional model clearly underestimates the observed number of events. The ETAS iso-r and aniso-r models reach the observation in about 6.5 and 14.1% of the simulation runs, respectively. Again, the ETASI models provide the best approximations.

According to Eq. (11), the M7.1 trigger event would on average trigger only roughly 53 direct aftershocks in the ETAS conventional model, compared to 219 in the ETAS iso-r, 242 in the ETAS aniso-r, 328 in the ETASI iso-r and 387 in the ETASI aniso-r model. As explained in detail for Experiment 1, the reason is found in the parameter estimate for \(\alpha\).

4.2.2 Predicted largest aftershock magnitude

Figure 5d shows the predicted pdfs for the largest aftershock magnitude of the Ridgecrest M7.1 sequence. In contrast to Experiment 1, all but the conventional model provide very good forecasts, indicating that the generic, long-term California estimates of the FMD with \(b \approx 1\) coincide well with the FMD of the Ridgecrest M7.1 sequence and the instability of the sequence ended with the occurrence of the M7.1 mainshock. The moderate underestimation by the ETAS conventional model can be explained by the underestimated sequence size, which substantially reduces the sample size of event magnitudes.

4.2.3 Spatial distribution

Figure 6c and d show the predicted spatial distributions of aftershock locations, again for the ETASI iso-r and aniso-r models. The visual impression that the anisotropic model provides a substantially better forecast is confirmed by the bar plot in Fig. 7c. The information gain by the anisotropic models is more pronounced for the Ridgecrest M7.1 sequence, because the M7.1 event has a longer rupture extension (\(\sim 68\) km by Wells and Coppersmith 1994) than the M6.4 event and, unlike the M6.4 event, did not rupture two orthogonal faults, a geometry that can be approximated more easily by an isotropic kernel.

4.3 Forecasting experiment 3

In the third forecasting experiment, we simulated the 10-day Ridgecrest M7.1 sequence based on the parameters fitted to the local Ridgecrest M6.4 foreshock sequence. Since the instability of the sequence would lead to exploding forecasts, we assumed the long-term estimated FMD with \(b=1\).

4.3.1 Predicted aftershock productivity

Figure 5e shows that the number of aftershocks is predicted much more similarly by the five models than in Experiments 1 and 2. It suggests that the particular features of the model versions play a smaller role in the estimation over a closed, local sequence than in the generic fit over a long-term catalog with several sequences and seismically quiet periods in between. The ETAS conventional model reaches the observation in 4.4% of the simulation runs, whereas the ETASI aniso-r model even overestimates the size of the sequence in 94.1% of the simulations. The other models show very good predictions.

4.3.2 Predicted largest aftershock magnitude

According to Fig. 5f, our manual choice of \(b=1\) led to very realistic predictions of the largest aftershock magnitude. Together with the results for the number of aftershocks, it shows that the Ridgecrest sequence left the unstable state after the M7.1 event by returning to the generic FMD, while retaining a similar structure of aftershock productivity.

4.3.3 Spatial distribution

Finally, Fig. 6e and f suggest that, compared to Experiment 2, the spatial kernels fitted over the Ridgecrest M6.4 sequence are much narrower than those coming from the generic, long-term model fit. This is confirmed by the larger estimates of q and the smaller estimates of \(\gamma\) in Table 2. Figure 7c shows that the narrower spatial distribution leads to a more pronounced information gain by the local restriction and the anisotropy, relative to the ETAS conventional model.

Note that, to some extent, the predicted spatial distributions show traces of late or secondary aftershocks triggered along the orthogonal M6.4 Ridgecrest fault, in contrast to very few observed events in that area. This might be an indication of an underestimated Omori parameter p or an overestimated c, favoring pronounced triggering over a longer time period.

4.4 Summary of forecast quality

Figure 7 shows a summary of the quality measures for the three experiments, with respect to the predicted number of aftershocks in Fig. 7a, largest aftershock magnitude in Fig. 7b and spatial aftershock distribution in Fig. 7c. The conventional model scores worst in each category. It confirms the results in Grimm et al. (2021), who argued that the isotropic and unlimited spatial kernel assumes an implausibly far trigger reach and leads to underestimated cluster sizes.

According to Fig. 7a, the ETASI models seem to predominantly estimate more realistic aftershock productivities than the ETAS models when fitted over the long-term Californian catalog (see Experiments 1 and 2). When fitted over the specific Ridgecrest M6.4 sequence, the bias of an underestimated aftershock productivity seems to be balanced out by not cutting out undetected events. Anisotropic models always lead to larger predicted sequence sizes, in the case of Experiment 3 even to a substantial overestimation.

The predictions of the largest aftershock magnitude, shown in Fig. 7b, are reasonable for all but Experiment 1. Apparently, the short-term incompleteness bias in the ETAS models is of much less consequence for the FMD than for the aftershock productivity.

According to Fig. 7c, as expected, the anisotropic models predict more realistic spatial event distributions. The spatial restriction leads to a much smaller improvement.

4.5 Sensitivity of results

As a sensitivity study, we forecasted the Ridgecrest M7.1 sequence over a duration of 50 days. In a longer time window, direct aftershock productivity is less dominant and is increasingly displaced by secondary triggering. The underestimation of direct aftershock productivity (e.g. in the ETAS conventional model) typically goes along with more pronounced secondary triggering, characterized by larger estimates of the productivity constant A (see Table 2). Therefore, we observed that the ETAS conventional model caught up on the missing events over time. On the other hand, this indicates a temporal distribution of aftershocks which is not in agreement with the observations. Other sensitivity tests, such as the model estimation with varying cut-off magnitudes \(M_c\) or different time windows for the generic California estimates, showed generally stable results.

5 Conclusion

In this article, we combined an ETAS approach with generalized anisotropic and locally restricted spatial kernels (Grimm et al. 2021) with the ETASI time model of Hainzl (2021). The new features aim to resolve three major biases of the conventional ETAS model, which stem from the isotropic and spatially unlimited kernel as well as from the neglect of short-term incompleteness in the fitted event records.

We fitted five different versions of the new ETASI time-space model to a generic, long-term Californian event set and to the specific Ridgecrest M6.4 foreshock sequence. Then, we applied the fitted model parameters to generate synthetic forecasts of the Ridgecrest M6.4 and the M7.1 sequences, which we analyzed regarding the predicted size of the sequence, largest aftershock magnitude and spatial aftershock distribution.

The results indicate that the ETAS conventional model leads to a substantial underestimation of the number of aftershocks due to its disproportionately small estimates of the direct aftershock productivity for the M6.4 and M7.1 trigger events. The locally restricted ETAS models without ETASI-extension provide more realistic, but still underestimated predictions, as they are affected by the bias of short-term incomplete event sequences in the fitted event set. The combination of the ETASI model with a locally restricted spatial kernel seems to solve the bias and provides the most robust predictions in the forecasting experiments. The anisotropy of the spatial kernel has a positive impact on the model estimation, but shows its strength more clearly in the prediction of the spatial distribution of aftershocks.

As a by-product, we find that the Ridgecrest M6.4 foreshock sequence showed unstable behavior, with a small Gutenberg-Richter parameter \(b < 0.8\) favoring strong aftershocks. The instability of the foreshock sequence can be interpreted as an indication of an imminent strong mainshock. As a consequence, models fitted to the long-term, stable Californian event records cannot adequately predict the magnitude distribution of this sequence.
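To illustrate why a small b-value favors strong aftershocks, the following minimal sketch computes the standard Aki-Utsu maximum-likelihood estimate of b and the Gutenberg-Richter ratio \(10^{-b\,\Delta m}\) between the numbers of events above two magnitude thresholds. It is not taken from the authors' Matlab code. The function names, the `bin_width` argument and the toy magnitudes are assumptions made for illustration only.

```python
import math


def b_value_aki(magnitudes, m_c, bin_width=0.0):
    """Aki-Utsu maximum-likelihood estimate of the Gutenberg-Richter b-value.

    magnitudes : iterable of event magnitudes
    m_c        : cut-off (completeness) magnitude
    bin_width  : magnitude binning of the catalog (0.0 for continuous values);
                 hypothetical convenience argument for the usual half-bin correction
    """
    above = [m for m in magnitudes if m >= m_c]
    if not above:
        raise ValueError("no events at or above the cut-off magnitude")
    mean_excess = sum(above) / len(above) - (m_c - bin_width / 2.0)
    if mean_excess <= 0.0:
        raise ValueError("mean magnitude must exceed the corrected cut-off")
    return math.log10(math.e) / mean_excess


def exceedance_ratio(b, delta_m):
    """Expected ratio N(>= m + delta_m) / N(>= m) under the Gutenberg-Richter law."""
    return 10.0 ** (-b * delta_m)


if __name__ == "__main__":
    # Toy magnitudes for illustration only, not data from the Ridgecrest sequence.
    toy_magnitudes = [2.0, 2.5, 3.0, 3.5]
    print("b estimate:", b_value_aki(toy_magnitudes, m_c=2.0))
    # Share of events that are at least one magnitude unit above a threshold.
    print("b = 0.8:", exceedance_ratio(0.8, 1.0))
    print("b = 1.0:", exceedance_ratio(1.0, 1.0))
```

Under these assumptions, lowering b from 1.0 to 0.8 raises the expected share of events at least one magnitude unit above a given threshold from 10% to roughly 16%, which is the sense in which a small b-value favors strong aftershocks.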

The new model provides a better understanding of spatio-temporal earthquake clustering and addresses three major biases of the conventional ETAS model at once. In particular, it leads to better estimates of the aftershock productivity and to improved forecasts of the size of a sequence and the spatial distribution of aftershocks. These improvements may be of major interest for short-term risk assessment during an ongoing aftershock sequence, particularly for the risk of a second, damaging earthquake following the initial trigger event. The anisotropic spatial forecast of aftershock locations enables disaster response managers to take action in areas at risk where particularly high aftershock activity is expected.

Future work should test the forecast quality for other earthquake sequences. It would be interesting to investigate whether the ETASI time-space model can be used to reliably detect the instability of a live sequence, which could benefit emergency management during ongoing sequences. An evaluation of the goodness of fit for the temporal event distribution should be included in such analyses.

6 Data and resources

The earthquake event set for California was downloaded from the Southern California Earthquake Data Center (https://scedc.caltech.edu/data/alt-2011-dd-hauksson-yang-shearer.html, last accessed on October 25, 2021). Results and figures were produced using Matlab software. The source code for model estimation and simulation is made freely available by the first author in the GitHub repository https://github.com/ChrGrimm/ETASanisotropic.