Parametric and non-parametric estimation of extreme earthquake event: the joint tail inference for mainshocks and aftershocks

In an earthquake event, the combination of a strong mainshock and damaging aftershocks is often the cause of severe structural damages and/or high death tolls. The objective of this paper is to provide estimation for the probability of such extreme events where the mainshock and the largest aftershocks exceed certain thresholds. Two approaches are illustrated and compared -- a parametric approach based on previously observed stochastic laws in earthquake data, and a non-parametric approach based on bivariate extreme value theory. We analyze the earthquake data from the North Anatolian Fault Zone (NAFZ) in Turkey during 1965-2018 and show that the two approaches provide unifying results.


Introduction
In a seismically active area, a strong earthquake, namely the mainshock, is often followed by subsequent damaging earthquakes, known as the aftershocks. These aftershocks may occur in numerous quantity and with magnitudes equivalent to powerful earthquakes on their own. For instance, in the 1999İzmit earthquake, a magnitude 7.6 mainshock triggered hundreds of aftershocks with magnitudes greater than or equal to 4 in the first six days, cf. [25]. In the 2008 Sichuan earthquake, a mainshock of magnitude 8.0 induced a series of aftershocks with magnitudes up to 6.0. The results are severe structural damage and loss of life, especially when the area has already been weakened by the mainshock. Theİzmit earthquake killed over 17,000 people and left half a million homeless [22]. The Sichuan earthquake caused over 69,000 deaths and damages of over 150 billion US dollars [9].
The goal of this paper is to provide a statistical analysis for the joint event of an extreme mainshock and extreme aftershocks. Throughout the paper, we denote the magnitude of a mainshock with X and that of the largest aftershock with Y . We estimate via two approaches the probability of for large values of s and t. The first approach uses a parametric model based on a series of well-know stochastic laws that describe the empirical relationships of the aftershocks and the mainshock, which we briefly review in Section 3.1. In the second approach, we apply bivariate extreme value theory to estimate the joint tail. Both methods are applied to the extreme earthquake events in the North Anatolian Fault Zone (NAFZ) in Turkey, the region where the 1999İzmit earthquake occurred.
The remainder of the paper is structured as follows. In Section 2, we present the earthquake data in NAFZ and describe the relevant data processing. Section 3 provides the parametric and non-parametric estimation procedures for the joint main-/aftershock distribution. The detailed data analysis and results are presented in Section 4. We conclude in Section 5 with some discussions.

Data description
We use the North Anatolian Fault Zone (NAFZ) as an area of investigation due to its long and extensive historical record of large earthquakes [1; 2]. Extending from eastern Turkey to Greece, the 1,500-kilometer-long rip sustained several cycle-like sequences of large-magnitude (M > 7) earthquakes over the past centuries [29], several resulting in high death tolls and severe economic losses. The most recent activities include thė Izmit (Mw 7.6) and Düzce (Mw 7.1) earthquakes of 1999 [24; 28]. 1 1 Regarding the earthquake scale in our data: Before 1977, all earthquakes were recorded in the body-wave magnitude scale (mb) or the surface-wave magnitude scale (Ms), depending on the depth of the earthquake. Following the development of the moment magnitude scale (Mw) by [18; 17], earthquakes with magnitude larger than 5 Table 1: Window specification L(X), T (X) for aftershock labelling from [15].
We obtain data from the Presidential of Earthquake Department database of the Turkish Disaster and Emergency Management Authority (https://deprem.afad.gov. We now label earthquake events by identifying the mainshocks and their corresponding aftershocks. Being interested in extreme events, we only consider earthquake events with significant mainshocks such that X ≥ 5. We use the window algorithm proposed in [15] as follows. For each shock with magnitude X ≥ 5, we scan the window within distance L(X) and time T (X). If a larger shock exists, we move on to that shock and perform the same scan. If not, then the shock is labelled as the mainshock and all shocks within the specified window are pronounced as its aftershocks. Table 1 provides the values for L(X) and T (X). For example, for an earthquake of magnitude 6.0, any shock following T = 510 days and within L = 54 km radius, with a magnitude less than 6, is considered to be its aftershock.
The right panel of Figure 1 shows the labelled mainshocks in the time series. The algorithm identifies n = 180 earthquake events with mainshocks X ≥ 5 among which 129 have aftershocks with magnitude greater than 4. Note that a few large earthquakes in the early years are not labelled as mainshocks, instead they are identified as aftershocks of mainshocks before the year 1965 from earlier data.

Parametric approach
It is agreed in the literature that the distribution of aftershocks in space, time and magnitude can be characterized by stochastic laws, see [30; 31] and [32] for a summary with detailed empirical studies. In this section, we propose a simple parametric model for the joint magnitudes of the mainshock and the largest aftershock based on these relationships. This derivation is similar to that in [33].
The following empirical evidence for aftershocks have been noted in prior literature.
1. The frequency g(t) of the aftershocks per time unit at time t after the mainshock follows the modified Omori's law: where K, c, p are constants [30].
2. The magnitude of the aftershocks follows Gutenberg-Richter's law [16], that is, the number of aftershocks N (m) with magnitude m follows where a, b are constants.
3. The magnitude difference between a mainshock and its largest aftershock is approximately constant, independent of the mainshock magnitude and typically between 1.1 and 1.2 [3].
Based on the above, [30] modelled the intensity rate of aftershocks with magnitude m as where m 0 is the mainshock magnitude and a, b, c, p are constants. By defintion, m ≤ m 0 such that aftershocks are always smaller than the mainshock. This modelling is used widely in ensuing literature, cf. [27], and is the basis of the ETAS (epidemic-type aftershock sequence) simulation model, cf. [23]. It is common to model the occurrences of aftershocks as a Poisson point process.
On the other hand, the mainshocks can be considered as independent events and their magnitude can also be modelled by the Gutenberg-Richter's law in equation (2) [32]. In the following, we model the magnitude of the mainshocks X using an exponential distribution with distribution and density functions

The model
Let X A denote the magnitude of an aftershock. Given the mainshock X = m 0 , we assume that the aftershocks sequence follows a non-homogeneous Poisson process with intensity function (3). We derive the following.
• The total number of aftershocks N follows a Poisson random variable with mean where β = b ln 10 and C = 1 β 10 a ∞ t=1 1 (t+c) p .
• The conditional distribution of X A follows and is conditionally independent of N .
Observe that the largest aftershock Y = max 1≤i≤N X A i . Therefore, by the conditional independence of N and X A , it follows that for m ∈ [0, m 0 ], This suggests that Z follows a Gompertz distribution, that is, −Z follows a Gumbel distribution conditional to be negative. If we impose the convention that {Z > m 0 } = {X A < 0} represents the event that no aftershock occurs, then we can model Z as independent of X. Note that when m 0 is large the probability of {Z > m 0 } is negligible.
Combining (4) and (5) yields the joint model for (X, Y ) given by where f X (x) is as defined in (4) and f Z (z) is the density function of Z from (5). Given data, the parameters (α, β, C) can be estimated through maximum likelihood.

Bivariate extreme value approach
Multivariate extreme statistics has been exhibited to be a powerful tool for inference on multidimensional risk factors. Examples of applications can be found in [10; 21] and [26] among others. Recall that the goal is to estimate the probability: P(X > t, Y > s).
To this end, we assume that the joint distribution of (X, Y ) is in the max domain of a bivariate extreme distribution introduced in [12]. This is a common condition in multivariate tail analysis and includes distributions with various types of copulas. Let F 1 and F 2 denote the marginal distribution functions of X and Y , respectively. The assumption implies that for any (x, y) ∈ [0, ∞] 2 \ (∞, ∞), the following limit exists: The function R characterizes the extremal dependence between X and Y and it can be expressed via other extremal dependence measures. For instance, it is linked to the stable tail dependence function L and the Pickand function A: For a general review on the multivariate extreme value theory, see for example Chapter 6 in [11] and Chapter 8 in [4].
The limit relation in (6) guarantees the regularity in the right tail of the copula of (X, Y ), which enables us to do the bivariate extrapolation to the range far beyond the historical observations. Let s and t be sufficiently large and denote that p 1 = P(X > s) and p 2 = P(X > t).
Then the problem transforms to estimating p 1 , p 2 and R(x, 1). Due to the relation in (7), the various methods of estimating L or A can be applied to estimate R(x, 1); for instance see [8; 13; 7; 14; 5] among many others. Because of the particular features of earthquake data -that they have been rounded to the first digit and censored belowwe use a basic non-parametric estimator of R(x, 1), which requires least assumptions on the data and is the basis of other more advanced estimation approaches. Let n be the sample size and k = k(n) be a sequence of integers such that k → ∞ and k/n → 0 as n → ∞. Let R X i and R Y i denote the ranks of X i and Y i in their respective samples. The estimator of R(x, 1) is given bŷ As for estimating p 1 and p 2 , we fit exponential distributions to both margins, which is a typical choice for modeling earthquake magnitude justified by the Gutenberg-Richter law [16]. A natural alternative is to apply univariate extrem value theory to estimate these tail probabilities. Many studies have been devoted to study the tail distribution or the endpoint of earthquake magnitude; see for instance [19] and [6]. However, due to the small sample size and the rounding issue, we choose to fit parametric margins.

Results
From Section 2 we extract from the NAFZ dataset time series of mainshocks magnitude (x i ) where x i ≥ 5. For the time series of the corresponding largest aftershock, we only observe the values that are above 4, that is, we observe y i 1 {y i ≥4} . The two time series are plotted in Figure 2.

Parametric approach
The mainshock sequence (x i ) is fitted with an exponential distribution truncated at 4.95 -we take into consideration the continuity correction. Since all observations are discrete by 0.1 increment, from now on whenever we show the fit of a distribution or calculate the goodness-of-fit p-value, we jitter all observations by uniform noises between (−.05, .05). The fit is shown on the left panel of Figure 3 and the Komogorov-Smirnov p-value is 0.83, indicating a good fit.
Next we fit a Gompertz distribution to the difference x i − y i by maximizing the following censored likelihood: where F Z is as defined in (5) and f Z is the corresponding density. To assess the goodness-of-fit, we first approximate the complete set of maximum aftershock sequence byỹ i as follows . When y i ≥ 4, setỹ i := y i . When y i < 4, simulate z i from F conditional on z i ≥ x i − 4 and setỹ i := x i − z i . The histogram of jitteredỹ i is shown on the right panel of Figure 3 with the fitted density. The p-value is 0.95.
The scatterplot of the jittered (x i ,ỹ i ) is plotted in Figure 4, with the red point indicating the simulation for the censored observations. As we can see, the simulation is in agreement with the pattern of the observed pairs. We also note that the Båth's law [3] -the empirical evidence that the magnitude difference between the mainshock and the largest aftershock is constant between 1.1 and 1.2 -can be well-justified by the fitted model. The fitted mean of Y − X is 1.1. The x = y + 1.1 line is shown in as the dotted line in Figure 4.

Extreme value approach
For the extreme value analysis approach, we use the same estimate for the marginal distribution of X as in the parametric approach. We fit an exponential distribution truncated at 4.55 to the marginal distribution Y . The fitted density is shown in Figure  5 with the Komogorov-Smirnov p-value 0.11. The fitted distributions are used to compute p 1 and p 2 in (8).
When estimating R(x, 1), we note that there are ties in the data as they are rounded to one decimal place. For this, we randomly assign ranks to the tied observations. The missing values of Y (they are censored above 4) does not effect the estimator in (9) provided that k is smaller than n 1 , the number of observed Y i 's. Obviously, the ranks of the missing values are smaller than n − n 1 , thus the corresponding indicator function in (9) equals to zero regardless of the precise value of R Y i . The left panel of Figure 6 shows the estimates of R(x, 1) for three different values of x and k ∈ [10, 100]. Also note that R(1, 1) is a commonly used quantity to distinguish tail dependence (R(1, 1) > 0) and tail independence (R(1, 1) = 0). Roughly, tail dependence says that the extremes of X and Y tend to occur simultaneously while joint extremes rarely occur under tail independence. This plot clearly suggests tail dependence between X and Y because the estimates of R(1, 1) are clearly above zero. Based on these three curves, we choose our k = 40.
With the choice of k = 40, we obtain the non-parametric estimate of R(x, 1) for x ∈ [0.02, 5] plotted in the black curve in right panel of Figure 6. The wiggly behaviour of this estimator motivates us to consider a smoothing method. We adopt the smoothing method introduced in [20], which makes use of the beta copula. This smoothed estimator, denoted asR b (x, 1), respects the pointwise upper bounds of the function, that is R(x, 1) ≤ max(x, 1) and it does not require smoothing parameter such as bandwidth. The resulted estimates are represented by the red curve in right panel of Figure 6. The two estimators are coherent with each other.R b (x, 1) is only used in obtaining the level curves in Figure 7. the results. We emphasize that the two approaches only share one common assumption, that is, the marginal distribution of X. The distribution of Y and the dependence between the (X, Y ) are modelled separately.
Next we obtain the level curves of (X, Y ) for the tail probability sequence (10 −3 , 5 · 10 −4 , 10 −4 , 5 · 10 −5 , 10 −5 , 5 · 10 −6 , 10 −6 ), as shown in Figure 7. The points (x, y) on each curve represents such that P(X > x, Y > y) = p for the given probability level. Albeit based on different theories, the two approaches provide coinciding prediction results. The two dot lines in Figure 7 correspond to x = y and x = y + 1.1. The horizontal shape of the curves between these two lines indicates that the results respect the Båth's law. This is particularly remarkable for the non-parametric approach, which does not impose any dependence structure for (X, Y ).

Discussion
In this paper we consider estimating the tail probability of an extreme earthquake event where the mainshock magnitude X and the largest aftershock magnitude Y both exceed certain thresholds. We approach the problems from two directions. On one   hand, based on the well-known stochastic rules for aftershocks, we propose a joint parametric model for (X, Y ), estimate the model using (censored) maximum likelihood, and from the model, calculate the desired probabilities. On the other hand, we use non-parametric methods from bivariate extreme value analysis to extrapolate tail probabilities. We illustrates both methods using the earthquake data in the North Anatolian Fault Zone (NAFZ) in Turkey from 1965 to 2018. The two approaches produce surprisingly agreeing results.
This is an exploratory effort in applying multivariate extreme value analysis to seismology problems and much extension is possible. For example, the occurrences of the earthquake events can be modelled in time and return level of extreme events can be estimated. Further information, such as distance between shocks and other geological covariates, can be incorporated into the analysis to provide more accurate R (x, 1) R b(x, 1) Figure 6: Left: The estimator of R(x, 1) is given in (9). Right: The estimator of R(x, 1) is given in (9).
or customized results.
This paper serves as a confirmation that simple techniques from multivariate extreme value analysis, though with little expert knowledge behind the data, is able to provide useful information in the analysis of extreme events.