1 Introduction

Modeling earthquake sequencing is an interesting and challenging problem. The main purpose of modeling is to provide earthquake forecasts using the available earthquake catalogues. Earthquake occurrences are generally modeled as Poisson processes. However, choosing a distribution that fits earthquake sequencing remains a contested problem. The local or regional behavior of earthquake occurrences varies from one region to another. While much of the effort goes into studying regional earthquakes from the perspective of their damage potential, treating the phenomenon from a global perspective puts to the test all available statistical distributions of earthquake occurrences and models that satisfy certain empirical observations.

In the present paper, we use the plate-tectonics model to define a framework for earthquake sequencing. The occurrence of earthquakes in and around the plate boundaries falls into tectonically well-defined zones (Bird 2003; Kagan et al. 2010). Kagan et al. (2010) simplified Bird’s (2003) digital model of 52-plate boundaries into a five-zone plate boundary template (see Table 1) to examine global seismicity. The five zones differ in seismicity rates, upper magnitude thresholds, and numbers of triggered earthquakes. One of the main objectives of this paper is to study them with a simple model, a Markov chain, to gain a better understanding of the relationships among different tectonic boundaries. We have chosen this selection of zones with a view to understanding how the Markov chain model works.

Earthquake sequencing has been treated as a Markov process to study the behaviour of aftershocks (Vere-Jones 1966; Hagiwara 1975; Ogata 1988; Fujinawa 1991). A Markov chain model for earthquake sequence analysis (Tsapanos and Papadopoulou 1999; Tsapanos 2001; Nava et al. 2005; Herrera et al. 2006) has recently been explored for earthquake forecasting in a regional context. No such attempt has been made with the observed global seismicity, except for the recent work of Cavers and Vasudevan (2012, 2013) and Vasudevan and Cavers (2012). This is another reason for conducting the present study. Our starting mathematical model is that of Nava et al. (2005).

Finite state Markov chains can be represented as a directed graph (Jarvis and Shier 1996). A graph-theoretic approach provides an easy mechanism to understand the visual representation of the process. Furthermore, one can use the properties of directed graphs to draw certain conclusions about the earthquake sequencing in the global context. This gives us another motivation to undertake a graph theoretic approach to study earthquake sequencing.

Another useful aspect of a directed graph is that the attributes associated with its nodes and arcs can be relaxed. For example, a simple directed graph of the Markov chain for the 5-zone scenario adapted here is restrictive in that it may link successive earthquakes that are separated by a large distance within a given zone or between zones. Drawing causal connections from such links for forecasting purposes would be problematic. Adding new features to the graph, such as the spatio-temporal complexity of seismic events in relation to recurring events in the record-breaking sense, should alleviate this problem. Record-breaking statistics have been studied in connection with extreme statistics (Tata 1969; Glick 1978; Nevzorov 1987; Vogel et al. 2001; Davidsen et al. 2008; Yoder et al. 2010; Edery et al. 2013). These studies addressed the temporal sequence of extreme events observed in nature. To improve the Markov chain model for earthquake sequencing, we include the spatio-temporal complexity of record-breaking events, or earthquakes, in the present methodology. Such a study has not previously been incorporated into any forecasting work.

Davidsen et al. (2008) have recently studied the spatio-temporal complexity of earthquakes in the southern California region by examining the recurrence of earthquakes in the record-breaking sense and searched for signs of causal structure in the data. Extension of this study to global seismicity has yet to be investigated. It is in this context that we would like to incorporate the spatio-temporal complexity of recurring events for each seismic event by assigning weights to the arcs of the graph or state transition probabilities represented in the Markov chain. It would be interesting to find out how introducing this feature in graphs and, hence, in Markov chains would play a role in the forecasting problem. Hence, we undertake the present study of the Markov chains of earthquake sequences with and without the spatio-temporal complexity component of the recurring events in the record-breaking sense.

2 Markov Chains of Earthquake Sequences and Directed Graphs

In this section, we introduce the Markov chain model to understand earthquake sequencing by partitioning the earthquake region into a meaningful number of zones and show how the spatio-temporal complexity is introduced into this model.

2.1 Without the Inclusion of Spatio-Temporal Complexity of Recurring Events

A Markov chain, \(M\), for an earthquake sequence is modeled by partitioning an earthquake region into zones, examining the system, \(S\), in a finite number of states, \(s_i\), and building the transition probabilities, \(p_{ij}\), between states, \(i\) and \(j\), at discrete time intervals, \(\Delta t\). When there are \(R\) zones (labeled \(0\) to \(R-1\)), each state, \(s_i\), corresponding to a distinct time interval is expressed as a right-to-left concatenation of binary digits \(b_{R-1}b_{R-2}\cdots b_1b_0\), with \(b_L=1\) or \(0\) according to whether or not there is an earthquake occurrence in region \(L\) in the time interval corresponding to \(s_i\). We let \(\theta _{ij}\) be the number of observed transitions from state \(i\) to state \(j\), \(\Theta =[\theta _{ij}]\) denote the transition frequency matrix and \(s(n)\) denote the state for interval number \(n\). The matrix \(P=[p_{ij}]\) is the probability transition matrix consisting of the transition probabilities, \(p_{ij}\), which are given by

$$\begin{aligned} p_{ij}&= \mathbf{Pr}\{s(n+1)=j|s(n)=i\}=\mathbf{Pr}\{j|i\},\end{aligned}$$
(1)
$$\begin{aligned} p_{ij}&= \frac{\theta _{ij}}{\xi _i},\hbox { where }\xi _i=\sum _{j}\theta _{ij}. \end{aligned}$$
(2)

Given \(p_{ij}\) and that the system is in state \(i\), we express the conditional probability of an earthquake occurring in region \(L\) (Nava et al. 2005), or regional active probability, \(p_{iL}\), as

$$\begin{aligned} p_{iL}=\mathbf{Pr}\{L|i\}=\sum _{j\supset L}p_{ij} \end{aligned}$$
(3)

and regional forecast quality, \(q_{ij}\), as

$$\begin{aligned} q_{ij}=\frac{1}{R}\left[ \sum _{L\subset j} p_{iL}+\sum _{L\not \subset j}(1-p_{iL}) \right] , \end{aligned}$$
(4)

where \(L\subset j\) means that state \(j\) includes seismicity in region \(L\). We use Markov chain probabilities resulting from Eqs. (1) to (4) for analysis.
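As an illustration, Eqs. (1) to (4) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the MATLAB implementation referred to in Sect. 4: the function names and the toy state sequence for \(R=2\) are hypothetical, and rows with \(\xi _i=0\) are simply left as zeros.

```python
def transition_counts(states, n_states):
    """theta[i][j] = number of observed transitions from state i to state j."""
    theta = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(states, states[1:]):
        theta[i][j] += 1
    return theta


def transition_probabilities(theta):
    """Eq. (2): p_ij = theta_ij / xi_i, with xi_i the i-th row sum.

    Assumption: a row that was never visited (xi_i = 0) is left as zeros.
    """
    P = []
    for row in theta:
        xi = sum(row)
        P.append([c / xi if xi else 0.0 for c in row])
    return P


def regional_active_probability(P, i, L):
    """Eq. (3): sum of p_ij over all states j whose bit L is set."""
    return sum(p for j, p in enumerate(P[i]) if (j >> L) & 1)


def forecast_quality(P, i, j, R):
    """Eq. (4): mean over zones of p_iL (zone active in j) or 1 - p_iL (inactive)."""
    total = 0.0
    for L in range(R):
        p_iL = regional_active_probability(P, i, L)
        total += p_iL if (j >> L) & 1 else 1.0 - p_iL
    return total / R


# Hypothetical toy sequence of states for R = 2 zones (states 0..3).
states = [0, 1, 3, 3, 2, 0, 1, 3]
theta = transition_counts(states, 4)
P = transition_probabilities(theta)
print(P[3])                          # row of state 3
print(forecast_quality(P, 3, 3, 2))  # quality of forecasting 3 after 3
```

In this toy sequence, state 3 is followed once by state 3 and once by state 2, so Zone 1 is active after state 3 in every observed case while Zone 0 is active in half of them.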

Such a finite-state Markov chain \(M\) can be represented in the form of a directed graph, \(G\), having nodes consisting of all possible states, i.e., binary strings of length \(R\). There is a set of arcs, \(E\), connecting different states, and \(G\) contains an arc \((i,j)\in E\) if and only if \(p_{ij}>0\) (Jarvis and Shier 1996). Figure 1 shows an example of a Markov chain and associated directed graph corresponding to \(R=2\) zones, where the \(2^2=4\) states \(\{00,01,10,11\}\) are written in decimal format \(\{0,1,2,3\}\), respectively.

Fig. 1 An example of a probability transition matrix, \(P\), for a \(4\)-state Markov chain, \(M\), and the directed graph, \(G\), associated with \(M\)

In this figure, we do not show all of the possible transitions between states and typically an arc \((i,j)\) is omitted when \(p_{ij}=0\). For example, in Fig. 1 the arc \((1,2)\not \in E\) as \(p_{12}=0\), the arc \((3,3)\in E\) as \(p_{33}>0\) and the arc \((1,1)\not \in E\) as \(p_{11}=0\). The combinatorial structure of the directed graph contains important information about the Markov chain model used for earthquake sequencing. Often, it is useful to apply a weight \(w_{ij}\) to each arc of the underlying directed graph to get a weighted directed graph. The weights come from the Markov chain and typically have the form \(w_{ij}=\theta _{ij}\) or \(w_{ij}=p_{ij}\).
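The arc rule \((i,j)\in E\) if and only if \(p_{ij}>0\) can be sketched as follows. The matrix values below are hypothetical; they are chosen only to reproduce the three arc statements made about Fig. 1 and are not the entries of that figure. The function name and the optional weight argument are likewise assumptions.

```python
def arcs_from_matrix(P, weights=None):
    """Return {(i, j): w_ij} for every pair with p_ij > 0.

    By default w_ij = p_ij; pass a matrix such as the transition
    frequencies to use w_ij = theta_ij instead.
    """
    arcs = {}
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            if p > 0:
                arcs[(i, j)] = p if weights is None else weights[i][j]
    return arcs


# Hypothetical 4-state matrix with p_12 = 0, p_11 = 0 and p_33 > 0.
P = [
    [0.25, 0.25, 0.25, 0.25],
    [0.50, 0.00, 0.00, 0.50],
    [0.00, 0.50, 0.00, 0.50],
    [0.25, 0.25, 0.25, 0.25],
]
arcs = arcs_from_matrix(P)
print(sorted(arcs))
```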

2.2 With the Inclusion of Spatio-Temporal Complexity of Recurring Events

To apply the Markov chain model to global earthquake sequences we modify the weights \(w_{ij}\) in the weighted directed graph by considering recurrences in the sense of Davidsen et al. (2008). The purpose of this is to introduce spatio-temporal complexity into the model so that transitions with earthquake occurrences at large distances have less of an impact on our model than transitions with earthquake occurrences at short distances. Each event (i.e., earthquake) in a zone may have several recurring events in the record-breaking sense. The recurring events for one event in a given zone may be in the same zone or may fall into other zones. This flexibility allows for interactions among zones. As described by Davidsen et al. (2008), we first form a directed graph representing the network of recurrent events. In this network, each earthquake represents a node \(a_i\), and each recurrence gives a link between pairs of nodes directed according to the time ordering of the earthquakes. Each recurrence between earthquakes corresponding to nodes \(a_i\) and \(a_j\) at distance \(r\) is then assigned a weight, its recurrence weight, as a function of the number of record-breaking events from the region corresponding to \(a_i\) to the region corresponding to \(a_j\) at distance at most \(r\). Such a function should be decreasing in \(r\) so that short distance recurrences are emphasized in the model. In this paper, an empirically derived relation for recurrence weights is used as outlined in Sect. 6.
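The network of recurrences can be sketched as follows. The text above does not spell out the record-breaking criterion, so the sketch assumes the usual reading of Davidsen et al. (2008): a later event \(b\) is a recurrence of event \(a\) if \(b\) lies closer to \(a\) than every event that occurred between them. The planar coordinates, the Euclidean distance and the function name are illustrative assumptions; a global catalogue would call for distances on the sphere.

```python
import math


def recurrences(events):
    """Return links (a, b, distance) of the network of recurrences.

    events: list of (time, x, y) tuples sorted by time.
    Assumed criterion: b is a recurrence of a when its distance to a
    is smaller than that of every event occurring between a and b.
    """
    links = []
    for a in range(len(events)):
        record = math.inf  # smallest distance to event a seen so far
        for b in range(a + 1, len(events)):
            d = math.dist(events[a][1:], events[b][1:])
            if d < record:
                record = d
                links.append((a, b, d))
    return links


# Hypothetical events on a line (time, x, y); not catalogue data.
events = [(0, 0.0, 0.0), (1, 10.0, 0.0), (2, 5.0, 0.0), (3, 7.0, 0.0), (4, 1.0, 0.0)]
links = recurrences(events)
print([(a, b) for a, b, _ in links])
```

Under this criterion the first event after \(a\) is always a recurrence of \(a\), and each further recurrence breaks the previous distance record.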

We use prime notation for the Markov chain model that takes recurrence weights into account. In particular, we let \(\theta '_{ij}\) be the sum of the recurrence weights corresponding to occurrences from state \(i\) to state \(j\) in the Markov chain described in Sect. 2.1 and \(\Theta '=[\theta '_{ij}]\). We define \(p'_{ij}\), \(p'_{iL}\) and \(q'_{ij}\) as in Eqs. (2)–(4), respectively, using \(\theta '_{ij}\) in place of \(\theta _{ij}\) in Eq. (2), \(p'_{ij}\) in place of \(p_{ij}\) in Eq. (3) and \(p'_{iL}\) in place of \(p_{iL}\) in Eq. (4).

3 Plate Boundary Zones and Earthquake Occurrences

Since the acceptance of plate tectonics as a useful model for understanding earth processes, much knowledge has been gained in relating most earthquakes to strong plate motions and increased strain rates at plate boundaries (Stein and Freymueller 2002). Geodetic measurements and the determination of plate motion vectors have become common practice in many seismically active areas (DeMets et al. 1990, 2010). Bird (2003) presented an updated model of plate boundary earthquakes.

Global seismic catalogues such as the CMT catalogue contain the sequence information of the earthquakes around the globe. Earthquake and statistical seismologists study these catalogues to identify physical attributes and statistics of seismicity. Understanding differing levels of seismic activity at different zones of plate boundaries is not a simple, closed problem. One aspect of it, earthquake forecasting, is a challenging research topic. In particular, relating two earthquakes that occur in sequence in two distant earthquake zones for forecasting purposes demands a causal connection. However, examining the zone to zone transition probabilities with an earthquake catalogue of events over a well-defined time-interval, instead of looking at an event to event transition probability, might shed new light on the causality connection. Here, we explore the Markov chain model in conjunction with the spatio-temporal recurrence of events in the record-breaking sense, included as additional weights on the arcs of the graph, to understand the earthquake processes.

4 Global Earthquake Catalogue and Kagan–Bird–Jackson Template

The platform on which the Markov chain model is to be applied to earthquake sequencing originates from Bird’s 52 zones plate-boundary model, PB2002 (Bird 2003). Within the graph theory framework considered for the Markov chain, one needs to consider \(2^{52}\) states with all possible transitions between \(i\) and \(j\), including no transitions, for a chosen time-interval, \(\Delta t\). While this will be our end objective, in this paper we consider a simplified version of Bird’s template defined by Kagan et al. (2010), henceforth known as the Kagan–Bird–Jackson (KBJ) template. This template defines five tectonic zones (Kagan et al. 2010) (see Table 1): Zone 4: Trench (including incipient subduction, and earthquakes in outer rise or upper plate); Zone 3: Fast-spreading ridges (oceanic crust, spreading rate \(>\)40 mm/a; includes transforms); Zone 2: Slow-spreading ridges (oceanic crust, spreading rate \(<\)40 mm/a; includes transforms); Zone 1: Active continent (including continental parts of all orogens of PB2002, plus continental plate boundaries of PB2002); and Zone 0: Plate-interior or the rest of the Earth’s surface. As Kagan et al. (2010) pointed out, the tectonic zones are defined by objective rules. The directed graph of the Markov chain that results from these zones, implemented in MATLAB code, is easily reproducible, expandable and revisable to accommodate changes to the structure of the graph as we test existing empirical relationships and new emerging concepts.

Table 1 Tectonic zone identifier, tectonic zone and the number of earthquakes, \(N\), considered for \(M_w>5.6\) and depth \(\le 70\) km from 1982/01/01 to 2007/03/31

We use both the global CMT and the NEIC catalogues to extract the required data using the latitude-longitude grid definition (Kagan et al. 2010). For this particular study, we used the published result of the global tectonic zones classification, as found in: http://bemlar.ism.ac.jp/wiki/index.php/Bird’s_Zones. For this classification, Kagan et al. (2010) partitioned the shallow (≤70 km-depth) events with moment magnitude \(M_\mathrm{w}>5.6\) from the Global CMT catalogue (1982/01/01–2007/03/31) into five zone sub-catalogues using their grid-assignment schemes. The selected catalogue of 6,752 earthquakes contains 4,407 from Zone 4 (trenches), 723 from Zone 3 (fast-spreading ridges), 487 from Zone 2 (slow-spreading ridges), 898 from Zone 1 (active continent), and 237 from Zone 0 (plate interior).

The present work is carried out without removing aftershocks since they are part of the activity during the period in which the main shock had taken place. Since a large threshold magnitude is used for the construction of the Markov chain, aftershocks below this threshold are neglected. So, for the succeeding periods, forecasting large magnitude aftershocks is possible (Herrera et al. 2006).

For the five zones considered, there are \(2^5\) states that define the Markov chain. For example, state \(0\) (representing \(00000\) in binary) corresponds to no earthquake occurrence in all five zones in the chosen time interval, \(\Delta t\), state \(31\) (representing \(11111\) in binary) points to earthquake occurrences in all five zones, and state \(30\) (representing \(11110\) in binary) corresponds to earthquake occurrences in Zone \(4\), Zone \(3\), Zone \(2\) and Zone \(1\), with no earthquake occurrence in Zone \(0\). All other states, \(1\) to \(29\), are defined similarly (see Table 2 for details).
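The state labelling can be made concrete with a short sketch. The bit convention (bit \(L\) set when Zone \(L\) is active) follows the text; the function names, the event tuples and the use of a strict inequality against the threshold magnitude are assumptions made for illustration, and the toy events are not catalogue data.

```python
def encode_state(active_zones):
    """Map a collection of active zone numbers to the decimal state label."""
    state = 0
    for L in active_zones:
        state |= 1 << L
    return state


def decode_state(state, R=5):
    """List the zones that are active in a given state."""
    return [L for L in range(R) if (state >> L) & 1]


def state_sequence(events, thresholds, dt, n_intervals):
    """Build s(n) for n = 0..n_intervals-1.

    events: iterable of (time_in_days, zone, magnitude), time measured
    from the start of the catalogue. An event counts only if its
    magnitude exceeds the threshold of its zone (assumed strict).
    """
    states = [0] * n_intervals
    for t, zone, mag in events:
        n = int(t // dt)
        if 0 <= n < n_intervals and mag > thresholds[zone]:
            states[n] |= 1 << zone
    return states


thresholds = {4: 6.0, 3: 5.6, 2: 5.6, 1: 5.6, 0: 5.6}
# Hypothetical events: (day, zone, magnitude).
events = [(1.0, 4, 6.5), (2.0, 2, 5.8), (3.0, 0, 5.7), (10.0, 4, 6.2), (12.0, 2, 6.0)]
print(encode_state([4, 3, 2, 1]))                  # state with Zones 4..1 active
print(state_sequence(events, thresholds, 9.0, 2))  # two 9-day intervals
```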

Table 2 Zone and state definition used in the construction of a directed graph of a Markov chain

5 Determination of Regional Threshold Magnitudes, and Time Interval, \(\Delta t\)

The earthquake hazard threshold magnitudes may differ from region to region. They should be chosen so that the hazard estimations are useful, and they should not be so small that qualifying events appear in most of the considered intervals (Nava et al. 2005). Due to the variation of earthquake hazards in the five zones, we chose a threshold magnitude of 6.0 for Zone 4 as representative of the earthquake hazard in this region and magnitude 5.6 for the remaining zones; that is, we used the threshold magnitude vector [6.0, 5.6, 5.6, 5.6, 5.6] for our analysis. We defer an assessment of the choice of regional or zone threshold magnitudes to a future study on the Markov chain model of global earthquake sequencing. Using \(M_\mathrm{w}>6.0\) for Zone 4 decreases the number of Zone 4 earthquakes with depth ≤70 km to 1,806 during the period from 1 January 1982 to 31 March 2007.

For the Markov chain structure given earlier for the five zones, the computation of transition frequencies and, hence, transition probabilities depends on the chosen time-interval, \(\Delta t\). We use the simple rules outlined by Nava et al. (2005) to choose \(\Delta t\):

  1. \(\Delta t\) should be small enough such that the hazard estimations are useful;
  2. \(\Delta t\) should not be too small that the most frequently occurring transition is from state \(0\) to state \(0\);
  3. \(\Delta t\) should not be too large that state \(31\) to state \(31\) transitions are dominant.

So, for the threshold magnitudes chosen, \(\Delta t\) should be large enough to allow interaction among regions and make estimates of Markov chain transition probabilities robust.

We tested three functions for the Markov chain model without recurrences to determine \(\Delta t\). Function 1: The difference between the number of transitions from state \(0\) to state \(0\) and the number of transitions from state \(31\) to state \(31\). Function 2: The difference between the total number of transitions from state \(0\) and the total number of transitions from state \(31\). Function 3: The function given in the equation

$$\begin{aligned} F_3(S) = -\sum _i \pi _i \sum _j p_{ij} \log _2 p_{ij}, \end{aligned}$$

based on the maximum entropy principle (Jaynes 2003) as it is applied to finite Markov chains (Ünal and Çelebioğlu 2011), where \(p_{ij}\) is given by Eq. (2) and \(\pi =[\pi _i]\) is the limiting distribution of the Markov chain.
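Function 3 can be evaluated with the following sketch. It approximates the limiting distribution \(\pi\) by power iteration, which is one common choice and an assumption here (the text does not say how \(\pi\) was obtained); it presupposes an irreducible, aperiodic chain. Terms with \(p_{ij}=0\) are skipped, following the convention \(0\log 0=0\). The function names and the two-state matrices are hypothetical.

```python
import math


def stationary_distribution(P, iters=10000, tol=1e-12):
    """Approximate pi with pi = pi P by repeated multiplication."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def entropy_function(P):
    """F_3 = -sum_i pi_i sum_j p_ij log2 p_ij, skipping zero entries."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * p * math.log2(p)
        for i, row in enumerate(P)
        for p in row
        if p > 0
    )


# Hypothetical two-state chains used only to exercise the functions.
P_uniform = [[0.5, 0.5], [0.5, 0.5]]
P_sticky = [[0.9, 0.1], [0.5, 0.5]]
print(entropy_function(P_uniform))
print(stationary_distribution(P_sticky))
```

Scanning \(\Delta t\) would then amount to rebuilding \(P\) for each candidate interval and recording the value of this function.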

To satisfy the rules (1)–(3) outlined above, \(\Delta t\) is chosen so that both Functions \(1\) and \(2\) show an output close to \(0\) and for which Function 3 is maximum. Figure 2 is a plot that represents the functional behaviour of the three functions used when applied to a threshold magnitude vector of [6.0, 5.6, 5.6, 5.6, 5.6].

Fig. 2 Plot of the three relations using threshold magnitude vector [6.0, 5.6, 5.6, 5.6, 5.6] and the 5-zone KBJ template for finite Markov chain without recurrences

Considering Function 3 as a function of the time-interval, with \(\Delta t\) ranging from 1 to 20 days in increments of 1 day, we observe that the entropy is largest when \(\Delta t=6\) days and \(\Delta t=9\) days, with a difference of 0.1 between these two entropy values. Thus, the maximum entropy principle suggests a time interval of six days or nine days. The zeros of Functions 1 and 2 occur when \(\Delta t\) is between nine and ten days. In this paper, as is evident from Fig. 2, we settle on \(\Delta t=9\) days for our analysis.

6 Determination of Recurrence Weights

In order to incorporate recurrences into the Markov chain we first form the network of recurrences (Davidsen et al. 2008) using the earthquake catalogue described in Sects. 4 and 5 with threshold magnitude vector [6.0, 5.6, 5.6, 5.6, 5.6] and time interval \(\Delta t=9\) days. Figure 3 shows the first 23 earthquakes in this sequence along with recurrences amongst those events.

Fig. 3 Network of recurrences for the first 23 earthquakes in the catalogue partitioned according to \(\Delta t=9\) days. Below each earthquake event is the zone corresponding to that earthquake’s location along with the state descriptions for the first five time-intervals

The weight applied to each arc in the network of recurrences is derived empirically by using a total count of record breaking events between the corresponding earthquake zones and the distance involved. In particular, we define \(L_{jk}(r)\) to be the number of record-breaking events from zone \(j\) to zone \(k\) at distance at most \(r\) in the network of recurrences. We plot these 25 relations in Fig. 4, namely, the relations \(L_{jk}(r)\) for \(j,k=0,1,2,3,4\).

Fig. 4 Plot of number of recurrences from zone \(j\) to zone \(k\) (for \(j,k=0,1,2,3,4\)) corresponding to earthquake recurrences at distance at most \(r\)

Observe that each relation is increasing since the total number of record-breaking events increases as we increase the distance threshold allowed between events.

Each recurrence from an earthquake \(a\) to an earthquake \(b\) in the sequence is given a weight \(W_{ab}\) between \(0\) and \(1\), with a weight equal to \(1\) if the distance between \(a\) and \(b\) is less than \(50\) km. If the distance is \(r\) with \(r\ge 50\) km and earthquakes \(a\) and \(b\) occur in Zones \(j\) and \(k\), respectively, a weight of

$$\begin{aligned} W_{ab}=\frac{L_{jk}(20{,}000)-L_{jk}(r)}{L_{jk}(20{,}000)-L_{jk}(50)} \end{aligned}$$

is given. Each weight \(W_{ab}\) uses the record-breaking counts for each zone to zone interaction. For small values of \(r\), the assigned weight \(W_{ab}\) is close to \(1\), whereas, for large values of \(r\), the assigned weight \(W_{ab}\) is close to zero. In particular, note that for \(r=50\) km, an output of \(1\) is given while for \(r=20{,}000\) km, an output of \(0\) is given.
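The weight formula can be sketched as follows. Here \(L_{jk}(r)\) is represented as a cumulative count built from a list of recurrence distances for one zone pair; that representation, the function names, the guard against a zero denominator and the toy distances are all assumptions made for illustration.

```python
from bisect import bisect_right


def cumulative_count(distances):
    """Return L(r): the number of recurrences at distance at most r (km)."""
    ordered = sorted(distances)
    return lambda r: bisect_right(ordered, r)


def recurrence_weight(r, L_jk, r_min=50.0, r_max=20000.0):
    """W_ab for a recurrence at distance r between Zones j and k."""
    if r < r_min:
        return 1.0
    denominator = L_jk(r_max) - L_jk(r_min)
    if denominator == 0:
        return 0.0  # assumption: no recurrences beyond r_min to scale against
    # Clamp at zero in case r exceeds r_max.
    return max(0.0, (L_jk(r_max) - L_jk(r)) / denominator)


# Hypothetical recurrence distances (km) for one zone pair.
L_jk = cumulative_count([10, 40, 100, 500, 2000, 8000, 15000])
for r in (30, 50, 100, 20000):
    print(r, recurrence_weight(r, L_jk))
```

Because \(L_{jk}(r)\) is non-decreasing in \(r\), the resulting weight is non-increasing in \(r\), as required in Sect. 2.2.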

A Markov chain with the inclusion of spatio-temporal complexity of recurring events is then derived by summing the weights of the recurrence arcs corresponding to occurrences from state \(i\) to state \(j\) in consecutive time-intervals as described in Sect. 2.2. For simplicity, in the case that \(i\) or \(j\) corresponds to state \(0\) and there are no recurrence weights, we take \(\theta '_{0k}=\theta _{0k}\) and \(\theta '_{k0}=\theta _{k0}\). To describe how \(\theta _{ij}'\) is obtained in more detail, define the indicator function \({\mathbf {1}}_{ij}(n)\) for each interval number \(n\) as

$$\begin{aligned} {\mathbf {1}}_{ij}(n)=\left\{ \begin{array}{ll} 1 &{} \hbox {if } s(n)=i \hbox { and } s(n+1)=j,\\ 0 &{} \hbox {otherwise,}\\ \end{array}\right. \end{aligned}$$

where, as defined in Sect. 2.1, \(s(n)\) is the state for interval number \(n\). Note that in the Markov chain model without the inclusion of recurring events, we have

$$\begin{aligned} \theta _{ij}=\sum _{n=0}^{N-1} {\mathbf {1}}_{ij}(n) \end{aligned}$$

where \(N\) is the total number of transitions. In the Markov chain model with the inclusion of spatio-temporal complexity of recurring events we use

$$\begin{aligned} \theta _{ij}'=\left\{ \begin{array}{ll} \theta _{ij} &{} \hbox {if } i=0 \hbox { or } j=0,\\ \sum \nolimits _{n=0}^{N-1} {\mathbf {1}}_{ij}(n)W(n) &{} \hbox {otherwise,}\\ \end{array}\right. \end{aligned}$$

where \(W(n)\) is the sum of the recurrence weights \(W_{ab}\) for every earthquake recurrence that occurs between earthquakes \(a\) and \(b\) with earthquake \(a\) occurring in the time interval \(n\) and earthquake \(b\) occurring in time interval \(n+1\). For example, Fig. 3 shows that the first two time-intervals contribute a \(1\) to \(\theta _{21,20}\) in the Markov chain model without recurrences, whereas a contribution of \(W(0)=1.64\) is given to \(\theta '_{21,20}\) determined by summing the empirically derived weights of the seven recurrences from the first time-interval to the second time-interval.
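The construction of \(\theta '_{ij}\) can be sketched as below, assuming that \(W(n)\) has already been computed for each pair of consecutive intervals. The function name and the toy inputs are hypothetical; only the first transition (state 21 to state 20 with \(W(0)=1.64\)) echoes the example from Fig. 3.

```python
def weighted_transition_counts(states, W, n_states=32):
    """theta'[i][j]: unit count if i == 0 or j == 0, else the sum of W(n).

    states: s(0), s(1), ..., and W[n] is the summed recurrence weight
    between interval n and interval n + 1.
    """
    theta_w = [[0.0] * n_states for _ in range(n_states)]
    for n in range(len(states) - 1):
        i, j = states[n], states[n + 1]
        theta_w[i][j] += 1.0 if (i == 0 or j == 0) else W[n]
    return theta_w


# Hypothetical short sequence; W values after the first are made up.
states = [21, 20, 0, 20, 20]
W = [1.64, 0.0, 0.0, 0.5]
theta_w = weighted_transition_counts(states, W)
print(theta_w[21][20], theta_w[20][0], theta_w[0][20], theta_w[20][20])
```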

7 Matrices Arising from the Markov Chain

With the chosen time-interval, \(\Delta t=9\) days, and threshold magnitude vector, [6.0, 5.6, 5.6, 5.6, 5.6], we computed the transition frequency matrix \(\Theta\) (see Fig. 5a) and the associated transition probability matrix (see Fig. 5b) for the Markov chain.

Fig. 5 a Transition frequency matrix \(\Theta\) for finite Markov chain without recurrences. b Transition probability matrix using Eq. (2). c Regional probability matrix using Eq. (3). d Regional forecasting quality matrix using Eq. (4)

It is evident from Fig. 5a, b that the presence or absence of Zone 4 in states is significant, as is reflected in higher transition probability values. As defined by Nava et al. (2005), we computed both the regional or zone probability and the regional or zone forecasting quality values (Fig. 5c, d, respectively).

We also computed the corresponding matrices for the Markov chain that incorporates record-breaking events using the relations described in Sect. 6. In particular, we computed the transition frequency matrix \(\Theta '\) (see Fig. 6a) and the associated transition probability matrix (see Fig. 6b), along with the regional or the zone probability and the regional or the zone forecasting quality values (Fig. 6c, d, respectively).

Fig. 6 a Transition frequency matrix \(\Theta '\) for finite Markov chain with recurrences. b Transition probability matrix using Eq. (2) with \(p'_{ij}\). c Regional probability matrix using Eq. (3) with \(p'_{iL}\). d Regional forecasting quality matrix using Eq. (4) with \(q'_{ij}\)

When a finite-state Markov chain is irreducible and aperiodic, there is a unique stationary (limiting) distribution. In our case, both Markov chains satisfy these conditions and, hence, each has a unique limiting distribution. We let \(\pi\) (or \(\pi '\)) denote the limiting distribution for the Markov chain without (or with) recurrences.

8 Statistical Analysis of Results

The results given in the previous section for a finite Markov chain provide a convenient basis for comparison with two memoryless distributions, uniform and Poisson. We use Eqs. (5) to (7) (Nava et al. 2005) to describe the three models for transition probability computations, namely the uniform, Poisson, and fixed Markov chain models, respectively. The uniform probability is

$$\begin{aligned} p_{ij}^U\equiv p^U = (R+1)^{-1} = \langle p_{ij}\rangle . \end{aligned}$$
(5)

The uniform probability for a 32-state system turns out to be 0.0303, and is too small to represent the true system. For the Poisson model, since we know the number of earthquakes that had taken place in the time period considered, we computed the Poisson parameter, \(\lambda _L\), for each region, and, hence, the Poisson transition probability, the latter with equation:

$$\begin{aligned} p_{ij}^P \equiv p_j^P = \prod _{L\subset j}(1-\exp ({-\lambda _L\Delta t}))\prod _{L\not \subset j}\exp ({-\lambda _L\Delta t}). \end{aligned}$$
(6)

For the fixed Markov chain model, we use Eq. (7) to compute its transition probability:

$$\begin{aligned} p_{ij} \equiv p_j^0 = \frac{\xi _j}{\sum \nolimits _i \xi _i}, \end{aligned}$$
(7)

where \(\xi _i\) is given in Eq. (2). Given the transition probability matrix for a \(32\)-state fixed Markov chain, it is possible to compute the \(k\)-step transition probability matrix, \(p^k_{ij}\), or the \(k\)th matrix power of the transition probability matrix \(P\). For \(k = 10\), we computed \(P^{10}\) and arrived at the corresponding memoryless transition probability matrix (to six decimal places). We summarize the results for the five-zone KBJ template in Fig. 7a.
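Eqs. (6) and (7) and the \(k\)-step matrix power can be sketched as follows. The rates \(\lambda _L\) below are hypothetical per-day values, not estimates from the catalogue, and the function names and the two-state matrix are assumptions made for illustration.

```python
import math


def poisson_state_probability(j, rates, dt):
    """Eq. (6): probability of state j when zone L has Poisson rate rates[L]."""
    p = 1.0
    for L, lam in enumerate(rates):
        quiet = math.exp(-lam * dt)  # probability of no event in zone L
        p *= (1.0 - quiet) if (j >> L) & 1 else quiet
    return p


def fixed_markov_probabilities(theta):
    """Eq. (7): p_j^0 = xi_j / sum_i xi_i, with xi_i the row sums of theta."""
    xi = [sum(row) for row in theta]
    total = sum(xi)
    return [x / total for x in xi]


def mat_power(P, k):
    """k-th matrix power of P by repeated multiplication."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = [
            [sum(result[i][m] * P[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return result


rates = [0.03, 0.10, 0.05, 0.08, 0.20]  # hypothetical events per day, Zones 0..4
dt = 9.0
poisson = [poisson_state_probability(j, rates, dt) for j in range(32)]
print(sum(poisson))  # the 32 state probabilities sum to one

P_toy = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical two-state chain
P10 = mat_power(P_toy, 10)
print(P10)  # rows are nearly identical: the start state is almost forgotten
```

When the rows of \(P^{k}\) agree to the desired precision, any one of them can serve as the memoryless distribution for that chain.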

Fig. 7 a Transition probability associated with the 32-states for the Poisson model, the fixed Markov chain model, and the 10th period of the Markov chain (memoryless) model. b A comparison of two residual probability curves

It is apparent that the fixed Markov chain model of earthquake occurrences deviates in relative probability values from the commonly assumed Poisson model. The \(P^{10}\) memoryless model also deviates from the Poisson model. For the five-zone case, the difference plot in Fig. 7b suggests that the 10th step, or 90 days for \(\Delta t=9\) days, takes the system to a memoryless state. It is important to note that all three models illustrated in Fig. 7a share a common shape of distribution. This emphasizes the relative importance of Zone 4 in enhancing the probability for states where it is represented (states 16 to 31) and the relative insignificance of the states where Zone 0 appears (states \(2i+1\) for \(i=0,1,\ldots ,15\)).

9 Centrality Measures of Directed Graphs

There are various centrality measures for graphs that measure the importance of a node within the graph. Two of the most widely used in network analysis are betweenness centrality (White and Borgatti 1994) and degree centrality. For a directed graph \(G\) with node set \(V\), a path in \(G\) is a sequence of arcs which connect a sequence of nodes distinct from one another. Over the set of paths from a node \(s\in V\) to a node \(t\in V\), a shortest path from \(s\) to \(t\) is one that uses the minimum number of arcs. Note that there may be several shortest paths from \(s\) to \(t\), each of which uses the same minimum number of arcs. For distinct nodes \(s,t,v\in V\), we define \(\sigma _{st}\) to be the total number of shortest paths from \(s\) to \(t\) and \(\sigma _{st}(v)\) the number of those paths that pass through node \(v\). Now, for each node \(v\in V\), the betweenness of \(v\), denoted by \(C_\mathrm{B}(v)\), is defined to be

$$\begin{aligned} C_\mathrm{B}(v)= \sum _{s \ne v \ne t \in V}\frac{\sigma _{st}(v)}{\sigma _{st}}, \end{aligned}$$

where the summation ranges over all distinct pairs of nodes \(s,t\) different from \(v\).

Nodes with a high betweenness have a high probability of occurring on a randomly chosen shortest path between two randomly chosen nodes. Such nodes are critical to the graph since their removal would destroy many short paths in the graph. The betweenness of each node can be scaled to lie between 0 and 1 by dividing by the total number of possible ordered pairs of nodes \(s\) and \(t\) not including \(v\). That is, \(C_\mathrm{B}(v)\) is divided by \((N-1)(N-2)\), where \(N\) is the number of nodes in the directed graph. The scaled betweenness was computed for each node in the present underlying directed graph associated with \(P\) having parameters \(\Delta t = 9\) days and threshold magnitude vector [6.0, 5.6, 5.6, 5.6, 5.6]. The results are summarized in Fig. 8a, and support the significance of Zone 4, the trench zone of the KBJ template, in states where it appears.
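The scaled betweenness can be computed as in the sketch below. It uses Brandes' accumulation scheme with a breadth-first search from every node, which is a standard technique and an assumption here, since the text does not state which algorithm was used. The adjacency-list input format, the function name and the toy graph are likewise hypothetical.

```python
from collections import deque


def betweenness(adj, scaled=True):
    """Betweenness of each node of an unweighted directed graph.

    adj: dict mapping every node to a list of its successors.
    Self-loops are harmless: they never lie on a shortest path.
    """
    nodes = list(adj)
    cb = {v: 0.0 for v in nodes}
    for s in nodes:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in nodes}      # predecessors on shortest paths
        sigma = {v: 0 for v in nodes}       # number of shortest paths from s
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}     # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    if scaled and len(nodes) > 2:
        norm = (len(nodes) - 1) * (len(nodes) - 2)
        cb = {v: c / norm for v, c in cb.items()}
    return cb


# Hypothetical graph: two equally short routes from node 0 to node 3.
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(betweenness(adj, scaled=False))
print(betweenness(adj))
```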

Fig. 8 a A comparison of the betweenness centrality of the associated underlying directed graph and the limiting distributions for the Markov chains with and without recurrences. b A plot of the indegree and outdegree of the underlying directed graph associated with the Markov chains using \(\Delta t=9\) and threshold magnitude vector, [6.0, 5.6, 5.6, 5.6, 5.6]. c A plot of the indegree and outdegree of the weighted directed graph associated with the Markov chain without recurrences using weights \(w_{ij}=\theta _{ij}\). d A plot of the indegree and outdegree of the weighted directed graph associated with the Markov chain with recurrences using weights \(w_{ij}=\theta '_{ij}\)

Curiously, the scaled betweenness of the underlying directed graph associated with the Markov chains provides a similar ranking of states to that of the limiting distributions \(\pi\) and \(\pi '\). This indicates that the combinatorial structure of the directed graph contains important information about the earthquake sequencing. One possible extension would be to analyze a modified betweenness centrality for the weighted directed graphs associated with Markov chains. Although there have been a number of attempts to extend the betweenness to weighted networks, we defer this to a future study.

The degree centrality is another widely used measure of node centrality. The indegree is the number of arcs directed to a node and the outdegree is the number of arcs that leave it. The degree centrality can easily be generalized to weighted graphs by summing the weights of the arcs entering or leaving a node. Figure 8b shows a plot of the indegree and outdegree of each node of the underlying directed graph, that is, each state of the associated Markov chain having time-interval \(\Delta t = 9\) and threshold magnitude vector [6.0, 5.6, 5.6, 5.6, 5.6]. Figure 8c, d show plots of the indegree and outdegree of each node of the weighted directed graphs corresponding to the Markov chain model without recurrences and the Markov chain that incorporates record-breaking events. Figure 8c shows that for almost every state the indegree and outdegree are equal. This is due to the construction of the weighted directed graph, namely, each sequence of transitions \(s_i\rightarrow s_j\rightarrow s_k\) contributes 1 to both the indegree and the outdegree of state \(j\). Figure 8d shows differences between the indegree and outdegree for the weighted directed graph when recurrences are taken into account.
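The equality of weighted indegree and outdegree noted for Fig. 8c can be checked on a small example. The sketch below assumes that the arc weights are simple transition counts built from a state sequence; the helper names and the short sequence are hypothetical.

```python
def transition_counts(sequence, n_states):
    """Count transitions i -> j between consecutive states of a sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    return counts

def weighted_degrees(weights):
    """Return (indegree, outdegree) lists; weights[i][j] is the weight of arc i -> j."""
    n = len(weights)
    outdeg = [sum(weights[i]) for i in range(n)]
    indeg = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    return indeg, outdeg

# Hypothetical sequence over three states (not catalogue data).
seq = [0, 2, 1, 2, 2, 0, 1]
counts = transition_counts(seq, 3)
indeg, outdeg = weighted_degrees(counts)
```

Here state 2 has equal weighted indegree and outdegree, whereas the first state of the sequence (state 0) has one extra outgoing transition and the last state (state 1) has one extra incoming transition, which is why the equality holds for almost every state rather than all of them.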

10 Discussion and Conclusions

In this work, we have brought together ideas developed by several earthquake seismologists to explore questions on the analysis of earthquake sequencing for forecasting purposes. Understanding earthquake sequencing with a Markov chain model in seismogenic zones in the regional sense has been a subject of study in recent years (Tsapanos and Papadopoulou 1999; Nava et al. 2005; Herrera et al. 2006; Ünal and Çelebioğlu 2011). Kagan and Jackson (2012) have examined long-term forecasts based on the available earthquake catalogues in seismic zones throughout the globe. They suggest that the sequence of events occurring in one geographic region is independent of the sequence of events in another geographic region. Extending the regional studies of earthquake sequencing done with a Markov chain model to global earthquake sequencing is new. The directed graph representation of the Markov chain that includes the distance-constrained recurring events in the record-breaking sense, as presented in this paper, points to the participation of not only recurring events such as aftershocks following a large one within a given zone but also recurring events across zones (Fig. 3).

Forecasting with a Markov chain without recurring events in the record-breaking sense might call into question the validity of long-distance causality between successive events within a zone and between zones. However, inclusion of distance-dependent recurring events within the framework of the Markov chain implicitly points to a very low probability of events occurring over long distances. Also, within this framework, there is a general alteration in the physical interaction between states, i.e., in the transition probability values between states. State-to-state transition frequencies without and with the inclusion of recurrence events, as shown in Figs. 5 and 6, confirm the change in the interaction between states. Also, in regional studies (Nava et al. 2005; Ünal and Çelebioğlu 2011), certain state-to-state transition frequencies, and hence state-to-state probabilities, are very small. In fact, the variability in the state-to-state transition frequencies present in this study is relevant to the forecasting work.

We follow Nava et al. (2005) to evaluate the performance of the two models by counting the number of successes of forecasted and aftcasted transitions. Using (5), high probabilities are defined as those above the threshold probability

$$\begin{aligned} p^H=(1+\mu )p^U, \end{aligned}$$

where \(\mu \ge 0\) is an arbitrary constant. The purpose of \(\mu\) is to separate the criterion for high probability from the random guess probability \(p^U\). We call a transition a forecasted transition if its forecast satisfies \(p_{ij}>p^H\), where \(p_{ij}\) has been calculated based on information existing up to the time of the forecast. If \(p_{ij}\) has been evaluated based on all available information, then we call the transition an aftcasted transition. The number of successes is the number of occurrences of forecasted (or aftcasted) transitions.
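As a rough illustration of this counting rule, the sketch below takes a transition probability matrix as given and counts how many observed transitions exceed the threshold \(p^H\). The function name `count_successes`, the three-state matrix, the sequence, and the value used for \(p^U\) are all hypothetical; in the paper \(p^U\) comes from (5) and is not recomputed here.

```python
def count_successes(P, sequence, p_uniform, mu=0.1):
    """Count observed transitions i -> j whose probability P[i][j] exceeds p^H.

    p_uniform plays the role of the random-guess probability p^U and must be
    supplied by the caller; mu >= 0 widens the margin above it.
    """
    p_high = (1.0 + mu) * p_uniform
    successes = 0
    for i, j in zip(sequence, sequence[1:]):
        if P[i][j] > p_high:
            successes += 1
    return successes, p_high

# Hypothetical three-state example (not catalogue data).
P = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.4, 0.4, 0.2],
]
seq = [0, 1, 0, 2, 2, 1]
successes, p_high = count_successes(P, seq, p_uniform=1.0 / 3.0, mu=0.1)
```

In this toy case three of the five observed transitions have probabilities above the threshold. For aftcasting, `P` would be estimated from the whole sequence; for forecasting, only from the part of the sequence preceding each forecast.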

Using \(\mu =0.1\), we compute the results of aftcasting and forecasting. For the Markov chain method without recurrences (see Table 3), aftcasting of the whole catalog results in an average observed occurrence probability of \({\hat{p}}=0.0881\), which is greater than the average expected probability \(p^{U} = 0.0303\) for the uniform null hypothesis. The resulting 872 successes out of 1,024 transitions (85 %) have a negligible probability of resulting from purely random guessing, considering an average \({\bar{\kappa }} = 9.75\) and a corresponding \(p={\bar{\kappa }}/S=0.305\). Calculating 10 state forecasts, the number of successes, 6, has an occurrence probability of 0.039272. With 50 state forecasts the number of successes, 28, has an occurrence probability of 0.000105. With 100 state forecasts the number of successes, 54, has an occurrence probability of 0.000000543.
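The occurrence probabilities quoted above can be approximated with a short calculation. The sketch assumes that each value is the binomial probability of obtaining exactly \(k\) successes in \(n\) forecasts when each random guess succeeds with probability \(p={\bar{\kappa }}/S\), with \(S=32\) states; this reading is an inference from the numbers and is not stated explicitly in the text.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

kappa_bar = 9.75     # average number of high-probability transitions per state
S = 32               # number of states for five zones (2**5)
p = kappa_bar / S    # success probability of a purely random guess

# Ten state forecasts with six successes (model without recurrences).
prob_ten = binomial_pmf(6, 10, p)
```

Under this assumption the ten-forecast case evaluates to approximately 0.039, in line with the value reported for the model without recurrences.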

Table 3 Results of state aftcasting and forecasting for the Markov chain model without recurrences

Performing the same calculations for the Markov chain method with recurrences (see Table 4), aftcasting of the whole catalog results in an average observed occurrence probability of \({\hat{p}}=0.0861\), which is greater than the average expected probability \(p^{U} = 0.0303\) for the uniform null hypothesis. The resulting 818 successes out of 1,024 transitions (80 %) have a negligible probability of resulting from purely random guessing, considering an average \({\bar{\kappa }} = 8.56\) and a corresponding \(p={\bar{\kappa }}/S=0.268\). Calculating 10 state forecasts, the number of successes, 6, has an occurrence probability of 0.02218. With 50 state forecasts the number of successes, 25, has an occurrence probability of 0.000255. With 100 state forecasts the number of successes, 52, has an occurrence probability of 0.000000051.

Table 4 Results of state aftcasting and forecasting for the Markov chain model with recurrences

Tables 3 and 4 show that the number of successes is higher for the Markov chain model without recurrences when using a threshold probability of \(p^H=(1+\mu )p^U=0.0333\). As this threshold probability \(p^H\) is increased further, we observe that the model with recurrences outperforms the model without recurrences in terms of the number of successes (see Fig. 9).

Fig. 9

A comparison of the two models considered in this paper plotting number of successes against threshold probability for \(10\) state forecasts and \(50\) state forecasts, respectively

The present results indicate that the finite Markov chain model for the 5-zone earthquake sequencing differs from the Poisson model. To examine how sensitive this observation is to the value of \(\Delta t\), we examined the cumulative probability distribution for the Poisson model and the finite Markov model with four different \(\Delta t\) values. For transitions involving small probabilities with \(p_{ij}<0.05\), there is no difference between the Poisson model and the Markov chain model. However, as the transition probability increases beyond 0.05, differences between the models occur (Fig. 10).

Fig. 10

A comparison of the Poisson model and the Markov chain model for four different \(\Delta t\) values (namely 6, 8, 10 and 12)

It can be inferred from Fig. 10 that the differences between the two models are not negligible for \(\Delta t=9\) days.

A directed graph representation of the finite Markov chain of state transitions affords a mechanism to visualize the interaction of the five zones at different scales, as long as the number of transitions for any given magnitude vector is large enough to make this procedure robust. The arcs of the directed graph give a simple picture of which transition probabilities are positive.

For simplicity, we restricted ourselves to 5 tectonic zones or 32 available states for transitions. One possible extension would be to look at Bird’s (2003) 52-plate description of the plate boundaries to gain further insight into the usefulness of the directed graph of a Markov chain.

We present a simple method to understand earthquake sequencing. There are still many unanswered questions. With regard to the choice of magnitude threshold for different zones, we believe that the directed graph representation affords us a simple tool to work with. The present study can easily be extended to understand the influence of the magnitude threshold, for a chosen \(\Delta t\), on the transition frequency matrix, so that hazard estimates based on the state-to-state transition frequency matrix become useful.

In summary, we have modified the approach of Nava et al. (2005) to assign weights to the arcs of the directed graph that represents the Markov chain model used in the study of earthquake sequencing. This creates an avenue to quantify the directed graph further, based on physical and observed properties of earthquakes, by assigning meaningful and interpretable weights to the nodes and arcs. We find that the transition frequency matrix provides a basis for understanding the fluctuations in state-to-state transitions. It will also help us recognize certain patterns in an evolving graph for forecasting purposes. Over the time window of the earthquake catalogue considered, the present work suggests the dominance of subduction-style earthquakes. Also, all state-to-state transitions that involve the intra-plate boundary zone have lower probabilities of earthquakes (see Fig. 7a).

Fig. 11

a The transition frequency matrix for the Markov chain without recurrences using the first half of the catalogue. b The transition frequency matrix for the Markov chain without recurrences using the second half of the catalogue. c A comparison of the limiting distributions arising from the Markov chains in a and b along with the limiting distribution using the entire catalogue

We observe that the statistics of the directed graph are invariant across two different time spans. Figure 11a, b show a comparison of the transition frequency matrices corresponding to the Markov chain model applied to the first half of the catalogue and the second half of the catalogue, respectively. Figure 11c compares the limiting distributions for the first and second halves of the catalogue with the limiting distribution obtained from the entire catalogue, and is indicative of the invariance of the statistics to different time spans.
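A comparison of this kind can be sketched as follows. The example assumes that the limiting distribution is obtained by row-normalizing a transition frequency matrix and iterating \(\pi \leftarrow \pi P\) until convergence; the function names and the two small count matrices are hypothetical stand-ins for the two halves of a catalogue.

```python
def row_normalize(counts):
    """Turn transition counts into transition probabilities.

    Assumes every state has at least one outgoing transition.
    """
    return [[c / sum(row) for c in row] for row in counts]

def limiting_distribution(P, tol=1e-12, max_iter=10000):
    """Approximate the limiting distribution by repeated multiplication pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Hypothetical two-state transition counts for two halves of a catalogue.
counts_first = [[8, 2], [4, 6]]
counts_second = [[7, 3], [6, 4]]
pi_first = limiting_distribution(row_normalize(counts_first))
pi_second = limiting_distribution(row_normalize(counts_second))
```

In this toy case the two count matrices differ, yet both yield a limiting distribution of approximately (2/3, 1/3), which illustrates how limiting distributions can agree across time spans even when individual transition frequencies do not.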