1. Introduction

Seismicity patterns vary substantially from place to place, showing various clustering features, though some of the fundamental physical processes leading to earthquakes may be common to all events. Kanamori (1981) postulates that fault zone heterogeneity and complexity are responsible for the observed variations. Such complex features have been tackled in terms of stochastic point-process models for earthquake occurrence. To be useful, the stochastic models have to be accurate enough that they adapt well in space and time to the various local patterns of normal activity and predict them. The epidemic-type aftershock sequence (ETAS) model and its space-time extension have been introduced for this purpose (Ogata, 1985, 1988, 1993, 1998).

However, these models postulate that the parameter values are the same throughout the whole region and time span considered. Experience shows that the differences in parameter values between subregions become more significant as the catalog size increases, either by lowering the magnitude threshold or by enlarging the area of investigation. For example, the p-value of the aftershock decay varies from place to place (Utsu, 1969), in addition to the background seismicity, which obviously depends on the location. If the space-time ETAS model is fitted to such a dataset, the parameter estimates represent an average over the seismicity of the whole area, and they lead to biased seismicity prediction in the subregions where the seismicity pattern is significantly different from the one estimated for the whole area (see Ogata, 1988, for example).

Therefore, the best fitted case among the candidates of the space-time ETAS models in Ogata (1998) was extended to the hierarchical version of the model (the hierarchical space-time ETAS model, HIST-ETAS model in short) in which the parameters depend on the location of the earthquakes (Ogata et al., 2003; Ogata, 2004). The software package of the computing programs is in preparation for publishing (Ogata et al., 2010).

Using the present HIST-ETAS model together with the Gutenberg-Richter magnitude frequency law (Gutenberg and Richter, 1944) with location-dependent b-values, we are able to forecast the baseline seismic activity more accurately than before, and thus we take part in the Earthquake Forecast Testing Experiment in Japan (EFTEJ) for short-term, intermediate-term and long-term forecasts in and around Japan (http://www.eic.eri.u-tokyo.ac.jp/ZISINyosoku/wiki.en/wiki.cgi). This manuscript describes the sequence of procedures needed to undertake the short-, intermediate- and long-term forecasting: pre-treatment (recompiling) of the space-time data, parameter estimation of the HIST-ETAS model, and estimation of the location-dependent b-values.

2. Location Dependent Space-Time ETAS Model

First of all, we are concerned with statistical models for the data of occurrence times and locations of earthquakes whose magnitudes are equal to or larger than a certain cut-off magnitude Mc. We define the occurrence rate λ(t, x, y | H_t) of an earthquake at time t and location (x, y) conditional on the past history of the occurrences, satisfying the relation

$$\Pr\{\text{an event in } [t, t+dt) \times [x, x+dx) \times [y, y+dy) \mid H_t\} = \lambda(t, x, y \mid H_t)\,dt\,dx\,dy + o(dt\,dx\,dy) \tag{1}$$

where H_t = {(t_i, x_i, y_i, M_i); t_i < t} is the history of earthquake occurrence times {t_i} up to time t associated with the corresponding epicenters (x_i, y_i) and magnitudes {M_i}. Thus a space-time probability forecast can be provided by the conditional occurrence rate function as a seismicity model.

We would like to predict the standard short-term seismicity for a region A using models with location-dependent parameters that reflect different regional and physical characteristics of the Earth’s crust. Namely, we consider a space-time ETAS model whose parameter values vary from place to place depending on the location (x, y). Consider the space-time occurrence rate conditioned on the occurrence history H_t up to time t such that

(2)

where (x_j, y_j) and S_j are the aftershock centroid and the normalized variance-covariance matrix of the spatial cluster, respectively, which are specified in the next section. We are particularly concerned with the spatial estimates of the first two parameters of the model. Namely, the background seismicity μ(x, y) is useful for long-term prediction of large earthquakes (Ogata, 2008). Also, the model with normalized aftershock productivity K(x, y) could be more useful for immediate aftershock probability forecasts than the one implemented in Marzocchi and Lombardi (2009), especially when anisotropic features are not neglected. The rationale for the basic structure of the model in (2) and its utility are demonstrated in Ogata (1998).
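As a rough illustration of how a rate of the kind described for (2) could be evaluated, the following Python sketch assumes one commonly used functional form: a background term μ(x, y) plus, for each past event j, a temporal power-law decay K/(t − t_j + c)^p multiplied by a spatial inverse power law [(quadratic form in S_j) / e^(α(M_j − Mc)) + d]^(−q), with K, α, p and q evaluated at the location of event j. This functional form, the function name `etas_rate` and the event dictionary layout are assumptions made for the sketch, not a reproduction of the equation in the paper.

```python
import math

def etas_rate(t, x, y, history, mu, K, c, alpha, p, d, q, Mc):
    """Conditional intensity at (t, x, y) under an ASSUMED space-time ETAS form.

    history: list of dicts with keys "t", "x", "y", "M" and optionally "S"
             (a 2x2 matrix as nested tuples; the identity is used if absent).
    mu, K, alpha, p, q: callables of (x, y); c, d, Mc: constants.
    """
    rate = mu(x, y)  # background seismicity at the target location
    for ev in history:
        if ev["t"] >= t:
            continue  # only events strictly before t belong to the history
        xj, yj = ev["x"], ev["y"]
        S = ev.get("S", ((1.0, 0.0), (0.0, 1.0)))
        dx, dy = x - xj, y - yj
        # Quadratic form (dx, dy) S (dx, dy)^T describing elliptic contours.
        quad = dx * (S[0][0] * dx + S[0][1] * dy) + dy * (S[1][0] * dx + S[1][1] * dy)
        time_decay = K(xj, yj) / (t - ev["t"] + c) ** p(xj, yj)
        scale = math.exp(alpha(xj, yj) * (ev["M"] - Mc))
        space_decay = (quad / scale + d) ** (-q(xj, yj))
        rate += time_decay * space_decay
    return rate
```

With constant callables for μ, K, α, p and q this reduces to a model with spatially uniform parameters, which is the starting point used later in the text.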

As will be specifically described in Section 5, each of the parameters μ(x, y), K(x, y), α(x, y), p(x, y) and q(x, y) is represented by a piecewise linear function whose value at any location (x, y) is interpolated from the three values (the coefficients) at the locations of the nearest three earthquakes (Delaunay triangle vertices) on the plane tessellated by epicenters. The coefficients of the parameter functions are simultaneously estimated by maximizing a penalized log-likelihood function that determines the optimum trade-off between the goodness of fit to the data and uniformity constraints of the functions (i.e., facets of each piecewise linear function being as flat as possible). Here, the optimum trade-off is objectively attained by minimizing the Akaike Bayesian Information Criterion (ABIC; Akaike, 1980; see Section 4), which evaluates the expected predictive error of Bayesian models based on the data used for the estimation (e.g., Ogata, 2004).

3. Data Processing for Anisotropic Clusters

According to the format required by the EFTEJ, we use the hypocenter catalog of the Japan Meteorological Agency (JMA) for the period 1926–2008 as the original source. Furthermore, we combine the catalog with the Utsu catalog (Utsu, 1982, 1985) for the period 1886–1925, whose magnitudes are consistent with the JMA catalog. The detection rate of smaller earthquakes is low in the early period. Nevertheless, we utilize the large earthquakes of the precursory period as the history in the ETAS model because they possibly influence the seismicity in the target period. The accuracy of the hypocenter depths in the JMA catalog is not satisfactory, especially in offshore regions, so we ignore the depth axis and consider only longitude and latitude for the location of an earthquake, restricting ourselves to shallow events down to 100 km depth. Also, we should watch for and avoid constrained epicenters, that is, epicenters successively located at the same place or on lattice coordinates, because these cause odd or biased estimates of the space-time ETAS models.

We preprocess the data in the original JMA catalog to fit the space-time ETAS model (2) as follows. First of all, to predict a possible anisotropic spatial cluster, we utilize the data of all detected earthquakes with depths shallower than 100 km throughout the whole of Japan; that is, within the rectangular region bounded by the 120°E and 150°E meridians and the 20°N and 50°N parallels. Then, instead of using the epicenter location in hypocenter catalogs, which is the location of rupture initiation, we adopt the centroid coordinates of aftershocks for the model (2). Furthermore, aftershocks are approximately elliptically distributed (Utsu, 1969), as represented by a quadratic function using the matrix S_j in the model, which reflects the ratio of the length to width of the ruptured fault, its dip angle and the location errors of aftershock epicenters. To determine the matrix S_j, we consider each large earthquake as a cluster parent (mainshock) that is followed by a sufficient number of clustered events (aftershocks) within a short time span (say, one hour) and within the square domain of side distance 3.33 × 10^(0.5M−2) + 66.6 km centered at the epicenter location, taking the epicenter errors in early days into consideration (see Utsu, 1969; Ogata et al., 1995; hereafter called the Utsu Spatial Distance). Specifically, for the cluster parents, we consider all earthquakes of M ≥ 5 for the short- and intermediate-term forecasts and M ≥ 6 for the long-term forecast, which are at least one magnitude unit above the cut-off magnitude (M 4 for short- and intermediate-term and M 5 for long-term, as assigned by the EFTEJ). On the other hand, we use all earthquakes located by the JMA as cluster members in the following analysis. Figure 1 shows several examples of such spatial clusters of earthquakes that took place within an hour.
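The cluster window described above can be sketched in a few lines of Python. The sketch assumes that event coordinates have already been projected to kilometres on a local plane, that times are in hours, and that the stated distance is the full side length of the square (so the half-side is used for the membership test); the function names are hypothetical and chosen only for this illustration.

```python
def utsu_spatial_distance_km(magnitude, offset_km=66.6):
    """Side distance 3.33 * 10**(0.5*M - 2) + offset, in km.

    offset_km is the allowance for epicenter errors quoted in the text;
    pass 0.0 to get the bare magnitude-dependent term.
    """
    return 3.33 * 10.0 ** (0.5 * magnitude - 2.0) + offset_km

def is_cluster_member(parent, event, window_hours=1.0):
    """Return True if `event` falls in the space-time window of `parent`.

    parent, event: tuples (t_hours, x_km, y_km, magnitude).
    ASSUMPTION: the square is centered on the parent epicenter and its
    side length equals the Utsu Spatial Distance of the parent.
    """
    t0, x0, y0, m0 = parent
    t, x, y, _ = event
    half_side = utsu_spatial_distance_km(m0) / 2.0
    in_time = 0.0 < t - t0 <= window_hours
    in_space = abs(x - x0) <= half_side and abs(y - y0) <= half_side
    return in_time and in_space
```

For an M 6 parent the side distance is 33.3 + 66.6 = 99.9 km under this reading.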

Fig. 1.

These panels show aftershocks occurring during the first hour after the mainshock that is indicated by a star. The occurrence date and magnitude of the mainshock are printed. The AIC values of Models 0–3 relative to the largest one (see text) are listed in each panel, where the model of the smallest value is adopted for the forecast of the aftershock cluster anisotropy.

To predict whether the cluster develops isotropically or anisotropically, we fit a bivariate normal distribution to the epicenter coordinates of the aftershocks in each cluster to obtain the maximum likelihood estimates of the mean vector and of the covariance matrix, whose elements are the standard deviations σ1 and σ2 and the correlation coefficient ρ, which determine S_j in (2).

Model 0 represents the null model with the original epicenter location and with σ1 = σ2 = 1 and ρ = 0. Alternatively, the epicenter coordinates of the cluster parent are replaced by the centroid coordinates of its immediate aftershocks (Model 1), or the identity matrix is replaced by the normalized variance-covariance matrix (Model 2), or both are replaced (Model 3). The model with the smallest AIC value among Models 0–3 is adopted. All the other events, including the cluster members, are treated as in the null model (Model 0); namely, each keeps the coordinates of its epicenter in the original catalog, associated with the identity matrix for S_j. This selection procedure is comparable to the projection of the centroid moment tensor solution (Dziewonski et al., 1981) onto the surface.
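The selection among Models 0–3 can be illustrated with a small Python sketch that fits the four bivariate normal variants and compares AIC = −2 ln L + 2k. It assumes that the coordinates are already scaled so that unit variance is a meaningful null, that the free-parameter counts are k = 0, 2, 3 and 5 for Models 0–3, and that the cluster has at least three non-collinear points; these choices and all function names are assumptions of the sketch rather than details confirmed by the text.

```python
import math

def _loglik(points, mean, s1, s2, rho):
    """Log-likelihood of a bivariate normal with std devs s1, s2, correlation rho."""
    mx, my = mean
    one_m_r2 = 1.0 - rho * rho
    log_norm = -math.log(2.0 * math.pi * s1 * s2 * math.sqrt(one_m_r2))
    total = 0.0
    for x, y in points:
        u, v = (x - mx) / s1, (y - my) / s2
        total += log_norm - (u * u - 2.0 * rho * u * v + v * v) / (2.0 * one_m_r2)
    return total

def _cov_mle(points, mean):
    """Maximum likelihood (s1, s2, rho) about a given mean."""
    mx, my = mean
    n = len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    s1, s2 = math.sqrt(sxx), math.sqrt(syy)
    return s1, s2, sxy / (s1 * s2)

def select_cluster_model(points, epicenter):
    """Return (best_model, aic_by_model) for Models 0-3."""
    n = len(points)
    centroid = (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    identity = (1.0, 1.0, 0.0)
    candidates = {
        0: (epicenter, identity, 0),                      # null model
        1: (centroid, identity, 2),                       # centroid only
        2: (epicenter, _cov_mle(points, epicenter), 3),   # covariance only
        3: (centroid, _cov_mle(points, centroid), 5),     # both replaced
    }
    aic = {m: -2.0 * _loglik(points, mean, *cov) + 2 * k
           for m, (mean, cov, k) in candidates.items()}
    return min(aic, key=aic.get), aic
```

An elongated cluster displaced from the mainshock epicenter should favor Model 3, whereas a compact cluster around the epicenter leaves little to gain over the null model.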

As requested by the EFTEJ, we consider two target periods with different threshold magnitudes for the long- and short-term forecasts, taking into account the evolution of the earthquake detection capability of the JMA seismic network. The former is 1926–2008 with threshold magnitude M 5.0, and the latter is 2000–2008 with threshold magnitude M 4.0. Earthquakes above these thresholds are regarded as almost completely detected throughout the respective target period and the Japan area, except for the offshore northern end and the southern end of the Izu-Ogasawara (Izu-Bonin) Islands in early years. We use a moderate number of large earthquakes (M 6 or larger) in the precursory period preceding the target period of the analysis as the history of the ETAS model. Then, based on these earthquake data, we form the Delaunay tessellation that is necessary to apply the location-dependent space-time ETAS model as specified in Section 5.

4. Optimization and Selection of Bayesian Models

We are concerned with statistical models to describe space-time heterogeneity, which require a large number of parameters. Consider the case where such models with parameters {θ = (θ_i) ∈ Θ} are given by the likelihood L(θ | data). To estimate the parameters, we often use the penalized log likelihood (Good and Gaskins, 1971)

$$R(\theta \mid \tau) = \ln L(\theta \mid \mathrm{data}) - Q(\theta \mid \tau) \tag{3}$$

where the function Q represents a positive-valued penalty function, and τ = (τ_1, …, τ_K) is a vector of hyper-parameters that control the strength of the constraints between the parameters θ. The crucial point here is the tuning of τ. From the Bayesian viewpoint, the penalty function is related to the prior probability density π(θ | τ) = e^(−Q(θ|τ)) / ∫_Θ e^(−Q(θ|τ)) dθ, and the exponential of the penalized log likelihood function R is proportional to the posterior function. For determining suitable values of the hyper-parameters τ, consider the posterior probability density function p(θ | data; τ) = L(θ | data) π(θ | τ) / Λ(τ | data) with normalizing factor

$$\Lambda(\tau \mid \mathrm{data}) = \int_{\Theta} L(\theta \mid \mathrm{data})\,\pi(\theta \mid \tau)\,d\theta \tag{4}$$

Maximization of this normalizing factor, or of its logarithm, with respect to the hyper-parameters τ is called the Type II maximum likelihood method, due to Good (1965). Given a set of data, one seeks to compare the goodness of fit of Bayesian models that have distinct likelihoods or distinct priors and to search for the optimal hyper-parameter values. For instance, Ogata et al. (1991) compared the use of different priors for isotropic and anisotropic smoothness constraints, which need two and five hyper-parameters, respectively. For such a purpose, Akaike (1980) justified and developed Good’s method based on the entropy maximization principle (Akaike, 1978) and defined ABIC = −2 max_τ ln Λ(τ | data) + 2 dim(τ) for consistent use with the Akaike Information Criterion (AIC; Akaike, 1974). Here, dim(τ) is the number of hyper-parameters. ABIC and AIC are minimized to compare Bayesian models and ordinary likelihood-based models, respectively; a smaller value indicates a better fit to the data. The normalizing factor Λ(τ | data) in Eq. (4) is called the likelihood of the Bayesian model with respect to the hyper-parameters τ. The Bayes factor (e.g., O’Hagan, 1994) corresponds to the likelihood ratio of the Bayesian models.
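To make the Type II maximum likelihood idea concrete, here is a deliberately tiny Python example in which the marginal likelihood has a closed form. It is a toy model chosen for this illustration and is not the model of the paper: observations y_i ~ N(θ, 1) with a single parameter θ, penalty Q(θ | τ) = τθ²/2 (that is, prior θ ~ N(0, 1/τ)), and one hyper-parameter τ searched on a grid.

```python
import math

def log_marginal_likelihood(y, tau):
    """ln Lambda(tau | data) for the toy model y_i ~ N(theta, 1), theta ~ N(0, 1/tau).

    Integrating theta out gives y ~ N(0, I + (1/tau) * ones), whose
    log-density has the closed form used below.
    """
    n = len(y)
    s = sum(y)
    ss = sum(v * v for v in y)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * math.log(1.0 + n / tau)
            - 0.5 * (ss - s * s / (n + tau)))

def abic(y, tau_grid):
    """ABIC = -2 max_tau ln Lambda + 2 dim(tau), with dim(tau) = 1 here."""
    best_tau = max(tau_grid, key=lambda tau: log_marginal_likelihood(y, tau))
    value = -2.0 * log_marginal_likelihood(y, best_tau) + 2 * 1
    return value, best_tau

# Usage: search tau over powers of ten.
data = [2.1, 1.9, 2.3, 1.7]
grid = [10.0 ** k for k in range(-3, 4)]
abic_value, tau_hat = abic(data, grid)
```

In the setting of the paper the integral in (4) has no closed form, so the same comparison has to rely on an approximation of the posterior; only the logic of choosing τ by the marginal likelihood carries over.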

5. Hierarchical Modelling on Tessellated Spatial Region

5.1 Delaunay interpolation functions

Consider the location-dependent space-time ETAS model where the five parameters in (2) are expressed by

$$\mu(x,y) = \bar{\mu}\,e^{\phi_1(x,y)},\quad K(x,y) = \bar{K}\,e^{\phi_2(x,y)},\quad \alpha(x,y) = \bar{\alpha}\,e^{\phi_3(x,y)},\quad p(x,y) = \bar{p}\,e^{\phi_4(x,y)},\quad q(x,y) = \bar{q}\,e^{\phi_5(x,y)} \tag{5}$$

Here, the constants μ̄, K̄, ᾱ, p̄ and q̄ are baseline parameter values, and the functions ϕ1(x, y), ϕ2(x, y), ϕ3(x, y), ϕ4(x, y) and ϕ5(x, y) are expanded using sufficiently many coefficients. The exponential of each ϕ-function is adopted to avoid negative values of the parameter functions. A two-dimensional cubic B-spline expansion could be used as in Ogata and Katsura (1988, 1993) and Ogata et al. (1991). However, the spatial distribution of the epicenters, such as that shown in Fig. 2(a), appears too highly clustered for a bi-cubic spline function to give well-adapted and locally unbiased estimates of the seismicity rate in such active regions. This is even more difficult for recent data, in which earthquakes are accurately located.

Fig. 2.

(a) Epicenter locations (dots) of earthquakes of M ≥ 4.0 in and around Japan for the target period 2000–2008 together with those of M ≥ 6.0 from the period 1885–1999 that are used as the history of the ETAS model, and (b) Delaunay tessellation connecting the epicenters and some points on the boundary.

Therefore, our alternative proposal for the present case is as follows. Consider the Delaunay triangulation (e.g., Green and Sibson, 1978); that is to say, the whole rectangular region A is tessellated by triangles whose vertices are the earthquake locations and some additional points, {(x_i, y_i), i = 1, …, N + n}, where N is the number of earthquakes and n is the number of additional points on the rectangular boundary including the corners. Here, to obtain a valid Delaunay tessellation, we sometimes need a very small perturbation of the epicenters to avoid lattice structures or duplicated locations in a local domain. Figure 2(b) shows such a tessellation based on the epicenters of the present dataset (Fig. 2(a)) and the additional points on the boundaries.

Then, define the piecewise linear function ϕ(x, y) on the tessellated region such that its value at any location (x, y) in each triangle is linearly interpolated from the three values at the vertices. Specifically, consider a Delaunay triangle and the coordinates of its vertices (x_i, y_i), i = 1, 2, 3. Then, for the values ϕ_i = ϕ(x_i, y_i), i = 1, 2, 3, the function value at any location (x, y) inside the triangle is given as follows. Consider the linear equations

$$x = a_1 x_1 + a_2 x_2 + a_3 x_3,\qquad y = a_1 y_1 + a_2 y_2 + a_3 y_3,\qquad a_1 + a_2 + a_3 = 1 \tag{6}$$

to obtain the non-negative solution a_1, a_2 and a_3, so that we have

$$\phi(x, y) = a_1 \phi_1 + a_2 \phi_2 + a_3 \phi_3 \tag{7}$$

Such a function suitably represents the variation of the samples on a highly non-homogeneous or clustered point pattern. That is to say, we can estimate detailed changes of rate in a region where the observations are densely populated.
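The interpolation inside a single triangle amounts to computing barycentric weights and taking the weighted sum of the vertex values. The Python sketch below does this for one triangle; the weight names a1, a2, a3 and the function names are chosen for the illustration, and locating the enclosing triangle in the full tessellation is left out.

```python
def barycentric_weights(point, triangle):
    """Solve x = sum(a_i x_i), y = sum(a_i y_i), sum(a_i) = 1 for (a1, a2, a3)."""
    x, y = point
    (x1, y1), (x2, y2), (x3, y3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)  # twice the signed area
    a1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    a2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return a1, a2, 1.0 - a1 - a2

def interpolate(point, triangle, values, tol=1e-12):
    """Linearly interpolate vertex `values` at `point` inside `triangle`."""
    weights = barycentric_weights(point, triangle)
    if min(weights) < -tol:
        # A negative weight means the point lies outside this triangle.
        raise ValueError("point is outside the triangle")
    return sum(a * v for a, v in zip(weights, values))

# Usage: vertex values 1, 2, 3 on the unit right triangle.
tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
value = interpolate((0.25, 0.25), tri, (1.0, 2.0, 3.0))
```

The non-negativity check doubles as a point-in-triangle test, which is how the enclosing triangle can be identified when scanning candidate triangles.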

5.2 Spatial ETAS with all parameters constant

We start with the simplest space-time ETAS model, in which all the parameters θ = (μ, K, c, α, p, d, q) in (2) are constant throughout the whole region; equivalently, all the functions ϕ_k(x, y) in (5), k = 1, 2, …, 5, are equal to zero. The maximum likelihood estimates (MLE) are obtained by maximizing the log-likelihood function

$$\ln L(\theta) = \sum_{\{i:\, S < t_i < T\}} \ln \lambda_\theta(t_i, x_i, y_i \mid H_{t_i}) - \int_S^T\!\!\iint_A \lambda_\theta(t, x, y \mid H_t)\,dx\,dy\,dt \tag{8}$$

for the earthquakes in the target period [S, T], where H_t is the history of earthquake occurrences before time t, including those from the precursory period [0, S]. We use a quasi-Newton method (e.g., Fletcher and Powell, 1963) for the numerical maximization. When the number of earthquakes is very large, the computation takes a substantially long time due to the double sum in the first term of the log likelihood (8). One may be interested in a quicker but approximate computation that takes the double sum only over the earthquake pairs closer than a certain distance, such as 4 times the Utsu Spatial Distance 3.33 × 10^(0.5M−2) km (cf. Section 3). This restriction considerably reduces the required calculations because the intensity at the location of a subsequent event is influenced by a historical event only if that location lies within the threshold distance associated with the historical event. We adopt this restriction as an approximation throughout the present paper, although the computations can be performed without it at the cost of longer CPU time. The MLEs for the datasets with magnitude thresholds M 4 and M 5 are given in Tables 1 and 2, respectively. It should be noted here that the space-time ETAS models with constant parameters, including μ and K, appear to provide biased estimates for the other parameters (see Tables 1 and 2, and Section 7). In particular, the p-values of these models are less than 1.0, while the Bayesian models take p > 1, as obtained below. Nevertheless, the obtained MLEs are then used as the initial guess to estimate the restricted HIST-ETAS model specified in the next section.

Table 1. Estimates of the models applied to the M ≥ 4 data.
Table 2. The estimates of the models applied to the M ≥ 5 data. The same caption as for Table 1.

5.3 ETAS: Spatially varying μ and K

The MLEs obtained under a constant background seismicity parameter μ are highly biased for the baseline parameters in (5) as well as for c and d. Without an appropriately unbiased initial guess of the baseline parameters, it is not easy to stably obtain a converging solution for the five location-dependent parameters in (5), because the search takes place in a very high-dimensional coefficient space. Therefore, before applying the model (2) with (5), we use the MLEs of the space-time ETAS model as the initial guess of the baseline parameters of a special version of the model (2) in which only the background rate and the aftershock productivity are location dependent; namely, the other functions ϕ_k(x, y), k = 3, 4, 5, in (5) are fixed to zero. Hereafter we call this restricted model the μK-HIST-ETAS model. To estimate ϕ_k(x, y) for k = 1, 2, we use more than twice as many coefficients as the number of earthquakes.

For stable estimation of such functions, we need to constrain the freedom of the coefficients toward uniformity, or less variability, of the functions. These requirements lead us to maximize the penalized log-likelihood function (3), where ln L(θ) is the log-likelihood function in (8), Q(θ | τ) is a penalty function against the roughness of the ϕ-functions, and τ = (w1, w2) is the set of weights that serve as tuning parameters (hyper-parameters). The penalty function Q represents the strength of the constraints against the variability in the first derivatives of the ϕ-functions as follows:

(9)

where the index j runs across all the Delaunay triangles with areas Δ_j, and the remaining quantities are the values of the ϕ-functions at the vertex coordinates of each triangle.
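One natural reading of a first-derivative penalty on a piecewise linear function is the integral of the squared gradient, which on each triangle equals the triangle area times the squared norm of the (constant) gradient. The Python sketch below implements that reading for a single weight; it is an assumed discretization for illustration, and the exact expression (9) of the paper may differ in detail.

```python
def triangle_roughness(triangle, values):
    """Area times squared gradient norm of the linear interpolant on one triangle."""
    (x1, y1), (x2, y2), (x3, y3) = triangle
    f1, f2, f3 = values
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)  # twice the signed area
    # Constant gradient of the plane through the three vertex values.
    gx = ((f2 - f1) * (y3 - y1) - (f3 - f1) * (y2 - y1)) / det
    gy = ((x2 - x1) * (f3 - f1) - (x3 - x1) * (f2 - f1)) / det
    area = abs(det) / 2.0
    return area * (gx * gx + gy * gy)

def roughness_penalty(triangles, vertex_values, weight):
    """Weighted sum of per-triangle roughness (ASSUMED form of the penalty)."""
    return weight * sum(triangle_roughness(t, v)
                        for t, v in zip(triangles, vertex_values))
```

A flat facet contributes nothing, so increasing the weight pushes the estimated function toward a constant, which matches the role of the weights described in the text.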

The penalized log-likelihood defines a trade-off between the goodness of fit to the data and the uniformity of each function, namely, the facets of the piecewise linear function being as flat as possible. A smaller weight leads to a higher regional variability of the ϕ-functions. The optimal weights, together with the maximizing baseline parameters (including c, α, p, d and q), are obtained by a Bayesian principle of maximizing the integrated posterior function (see Appendix). Note that the baseline parameters are automatically determined by the zero-sum constraint on the corresponding ϕ-function. This overall maximization can be attained by alternately repeating separate maximizations with respect to the parameters (coefficients) and the hyper-parameters (weights), as described below.

First of all, we use the obtained MLEs of the space-time ETAS model as the initial baseline parameters and set ϕ1(x, y) = ϕ2(x, y) = 0 for the initial coefficients. Then, we maximize the penalized log-likelihood (3) with respect to the coefficients of the ϕ-functions (see Appendix). For the maximization, we adopt a line search procedure in conjunction with the incomplete Cholesky conjugate gradient (ICCG) method for the 2(N + n)-dimensional coefficient vector, using a suitable approximate Hessian matrix (see Appendix), where N is the number of earthquakes and n is the number of additional points on the rectangular boundary including the corners (see Fig. 2(b)). This makes the convergence very rapid regardless of the high dimensionality of θ, provided that the Gaussian approximation of the posterior function is adequate.

Having attained such convergence for given hyper-parameters τ = (w1, w2, c, α, p, d, q), we then need to maximize Λ(τ) defined in (4) with respect to τ by a direct search such as the simplex method in the 7-dimensional space. These two optimizations are repeated in turn until the latter maximization converges. The whole optimization procedure usually converges when the initial values of τ are set in such a way that the penalty is effective enough; otherwise, it may take very many steps to reach the solution. Finally, assuming unimodality of the posterior function, one obtains the optimal maximum posterior solution for the maximum likelihood estimate of the hyper-parameters.

5.4 ETAS: Spatial variation in 5 parameters

Having obtained the optimal weights and the coefficients of ϕ1 and ϕ2, as well as the baseline parameters in the μK-HIST-ETAS model, we use these as initial inputs to stably estimate the HIST-ETAS model in (2) with the five location-dependent parameters in (5) by the same optimization procedure as stated above. Specifically, we first set the initial estimates of ϕ1 and ϕ2 to those obtained above and also set ϕ3(x, y) = ϕ4(x, y) = ϕ5(x, y) = 0 with the baseline values of the μK-HIST-ETAS model obtained by the above-stated procedure. Then, we consider the penalized log-likelihood function (3) with the extended penalty function

(10)

of τ = (w1, …, w5). Here, the baseline values are fixed throughout the region and period. The optimal weights are obtained by maximizing the integrated posterior function (see Appendix), in a similar way to the procedure applied to the μK-HIST-ETAS model in Section 5.3. This maximization can be attained sequentially and alternately as follows. First, we maximize the penalized log-likelihood (3) with respect to the coefficients of the ϕ-functions (see Appendix). For the calculation, we adopt a line search using the incomplete Cholesky conjugate gradient (ICCG) method for the 5(N + n)-dimensional coefficient vector, where N + n is the same number as given in Section 5.3. Alternately, we apply the simplex algorithm in the 5-dimensional space of τ to maximize Λ(τ) until it converges. Before the 5-dimensional simplex search, we recommend first making a lattice search of (w3, w4, w5) in logarithmic orders, such as (10^i, 10^j, 10^k) for possible sets of integers i, j and k, to compare the respective ABIC values, while (w1, w2) remain fixed to the values obtained in Section 5.3. A limitation of this procedure is that the maximization may not converge for small integers, because the convergence relies on the quadratic approximation of the penalized log likelihood (see Appendix and the ICCG method). From our experience, exponents of 2 or 3 or larger are a reasonable starting choice. Then, using the set of weights with the smallest ABIC value, we can carry out the 3-dimensional simplex search of (w3, w4, w5) or even the 5-dimensional simplex search of (w1, w2, w3, w4, w5) for global minimization. Here it is important to use the previously converged solutions of the parameters (coefficients) as the next initial parameters in such high dimensions.
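The recommended lattice search over logarithmic orders can be written as a short loop. In the Python sketch below, `abic_fn` is a hypothetical callable standing in for the whole inner optimization (maximizing the penalized log-likelihood for given weights and returning the resulting ABIC); the default exponents start at 2, following the practical advice in the text.

```python
import itertools

def lattice_search(abic_fn, exponents=(2, 3, 4, 5)):
    """Return (best_abic, (w3, w4, w5)) over the lattice (10**i, 10**j, 10**k).

    abic_fn: HYPOTHETICAL callable mapping a weight triple to an ABIC value.
    """
    best = None
    for i, j, k in itertools.product(exponents, repeat=3):
        weights = (10.0 ** i, 10.0 ** j, 10.0 ** k)
        value = abic_fn(weights)
        if best is None or value < best[0]:
            best = (value, weights)
    return best
```

The weights returned by this coarse search would then serve as the starting point of the simplex refinement, reusing the previously converged coefficients as initial values.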

It is also useful to examine whether or not the characteristic parameters are significantly uniform (i.e., spatially invariant). For this we can calculate the Akaike Bayesian Information Criterion (ABIC; see Appendix) as a byproduct of the above simplex optimization. A model with a smaller ABIC value indicates a better fit. For example, we can compare the ABIC value of the HIST-ETAS model for the optimal weights with the one for which the weight on the ϕ-function of q is set to a very large value such as 10^8, to examine whether the q-value is location dependent or not.

Figure 3 with Table 1 and Fig. 4 with Table 2 provide the optimal estimates of the HIST-ETAS model applied to the processed JMA data of Section 3 for the target period 2000–2008 with threshold magnitude M 4.0 and for 1926–2008 with threshold magnitude M 5.0, respectively.

Fig. 3.

Maximum posterior estimates of the respective parameter functions (see text) of the hierarchical space-time ETAS model and b-values of the G-R frequency law, applied to the reprocessed JMA data (see Section 3) with earthquakes of M 4.0 or larger during the target period 2000–2008; in addition, earthquakes of M 6.0 or larger from the precursory period 1885–1999 are used as the occurrence history of the space-time ETAS model. The colors represent the estimated coefficient values of the parameter functions μ, K, α, p, q and the b-values. The dimension of μ and K is the number of events per degree per day.

The estimated images of the corresponding parameters in Figs. 3 and 4 appear similar to each other in spite of the different target periods and different cutoff magnitudes. Although the earthquakes above the cutoff magnitudes are mostly complete, the q-value images in both Figs. 3 and 4 show an apparently artificial feature. Namely, the inverse power q-values for distances between a mainshock and its aftershocks are lower at the margin of the Japanese islands than in the interior region. This seems attributable to the difference in epicenter location accuracy between the land and the margin. The images of the other parameters seem to be genuine except at the very margin of the region, such as in Taiwan and the southern part of the Ogasawara islands, owing to the magnitude incompleteness there. Incidentally, contour images and color images of these parameters on a lattice covering the whole area can be obtained by the interpolation (7) over the Delaunay triangles, as shown in Ogata et al. (2003) and Ogata (2004).

Fig. 4.

Maximum posterior estimates of the respective parameter functions of the hierarchical space-time ETAS model and b-values, applied to the reprocessed JMA data with earthquakes of M 5.0 or larger during the period 1926–2008; in addition, earthquakes of M 6.0 or larger from the precursory period 1885–1925 are used as the occurrence history of the ETAS model. See Fig. 3 for the additional caption.

6. Modeling the Spatially Varying b-Values

We further consider that the b-value of the Gutenberg-Richter magnitude frequency law is location dependent. Historically, based on the moment method, Utsu (1965) proposed the estimator b̂ = N log10 e / Σ_{i=1}^{N} (M_i − Mc) for an observed magnitude sequence {M_i, i = 1, …, N}, where Mc is the lower bound of the magnitudes above which almost all earthquakes are detected. This was modified by Utsu (1970), who replaced Mc by Mc − 0.05 to obtain an unbiased estimate of the b-value when the given magnitudes are rounded to 0.1 units; hereafter we follow this modification for the JMA catalog.

Aki (1965) showed that Utsu’s b-estimator is nothing but the maximum likelihood estimate (MLE) that maximizes the likelihood function L(β) = Π_{i=1}^{N} β e^(−β(M_i − Mc)), M_i > Mc and β = b ln 10. Wiemer and Wyss (1997) use the MLE in the ZMAP software to obtain location-dependent b-values using data from a moving disk whose radius is adjusted to include the same number of earthquakes. However, there remain the issues of the optimal selection of the number of earthquakes in the disk and the evaluation of the significance of the b-value changes.
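For a spatially uniform b-value, the maximum likelihood estimate reduces to log10(e) divided by the mean magnitude excess over the cutoff. The short Python sketch below computes it, taking as input the already corrected cutoff (for example 4.0 − 0.05 = 3.95 for magnitudes rounded to 0.1 units); the function name is hypothetical.

```python
import math

def b_value_mle(magnitudes, cutoff):
    """b = N * log10(e) / sum(M_i - cutoff) over events with M_i > cutoff.

    cutoff: the corrected lower bound, e.g. Mc - 0.05 for 0.1-unit rounding.
    """
    selected = [m for m in magnitudes if m > cutoff]
    if not selected:
        raise ValueError("no magnitudes above the cutoff")
    mean_excess = sum(m - cutoff for m in selected) / len(selected)
    return math.log10(math.e) / mean_excess

# Usage: five events of M >= 4.0 with the corrected cutoff 3.95.
b_hat = b_value_mle([4.0, 4.2, 4.5, 5.0, 4.1], cutoff=3.95)
```

In this small example the mean excess is 0.41, so the estimate is slightly above 1.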

We would like to solve these problems by the Bayesian procedure. Here, we assume that the b-value, or the coefficient of the exponential distribution of magnitude, depends on the location in such a way that β_θ(x, y) = b_θ(x, y) ln 10, where θ is a parameter vector characterizing the function (Ogata et al., 1991). Then, having observed the magnitude data M_i at each hypocenter’s coordinates (x_i, y_i) with i = 1, 2, …, N, the likelihood function of θ can be written as

$$L(\theta) = \prod_{i=1}^{N} \beta_\theta(x_i, y_i)\,e^{-\beta_\theta(x_i, y_i)(M_i - M_c)}$$

for M_i > Mc. Since β, or b, is positive valued, we re-parameterize the function as β_θ(x, y) = e^(ϕ_θ(x, y)), so that the estimate of the b-values in space is given by b̂(x, y) = e^(ϕ̂(x, y)) / ln 10, where the ϕ-function is the piecewise linear function on the Delaunay tessellation, as given above. For a set of clusters of earthquakes, the Delaunay-based function fits better than the bi-cubic B-spline function that was used in Ogata et al. (1991). The coefficients are estimated by maximizing the penalized log-likelihood, where the penalty is tuned by a similar Bayesian procedure based on the ABIC (see Section 4 and Appendix). The last panels in Figs. 3 and 4 together with Table 3 provide the optimal estimates of the b-values for the period 2000–2008 with cutoff magnitude Mc = 3.95 and for 1926–2008 with cutoff magnitude Mc = 4.95, respectively. These appear similar to each other on the whole.

Table 3. The estimates for magnitude frequency.

7. Implications of Tables and Figures

We can compare the AIC and ABIC values among the MLE based models and among the Bayesian models, respectively, although we cannot directly compare the AIC value with ABIC values here because we did not adjust the difference in the normalization factors between AIC and ABIC in the considered models. By the entropy concept from which both AIC and ABIC (Akaike, 1974, 1978, 1980) are derived, we can expect a better forecast among the MLE-based models or among the Bayesian models with a smaller AIC or ABIC, respectively, under the assumption that the stochastic structure of future seismicity will not change from the past as the baseline seismicity.

Thus, Tables 1 and 2 imply several consequences of the present fitting of the models. First, the models fitted to the data from the target period together with the occurrence history of large earthquakes in the precursory period will forecast better than those fitted to the data from the target period only. Second, the models that take the anisotropic clusters into consideration will forecast better than the models with isotropic clusters only, which use the original JMA hypocenter data. Third, the five-parameter HIST-ETAS models will forecast better than the μK-HIST-ETAS models. Accordingly, we expect the best forecasting performance from the five-parameter HIST-ETAS models that take account of the anisotropic clustering and the effect of the history in the precursory periods. Finally, the p < 1 estimate obtained with a spatially uniform background rate μ becomes p > 1 with the location-dependent μ estimate. The reason for the p < 1 estimate is that, to compensate for the spatially uniform background rate, a heavier-tailed aftershock decay in time makes it easier for the modelled seismicity to concentrate in the active regions.

Figure 5 shows the pair plots of the parameter values of the HIST-ETAS model, together with the b-value, at the same locations. First, each parameter of the HIST-ETAS model seems to have little correlation with the b-value. The correlations among the HIST-ETAS parameters are not clear on the whole. It may not make sense to look for correlations throughout the entire Japan region, unlike the cases in Guo and Ogata (1996), in which only aftershock sequences are compared among the classified locations of inter- and intra-plate mainshocks. Nevertheless, we may see a weak correlation between the μ and K parameters on a logarithmic scale. This is consistent with the observation that the asperity regions and mainshocks are complementary to the regions of high intensity of aftershock productivity (Ogata, 2004, 2008).

Fig. 5. Plots of the pairs of parameter values in Figs. 3 and 4 (except for the q-values) at the corresponding locations. The upper-triangle panels (black dots) and the lower-triangle panels (gray dots) are from Fig. 3 (M ≥ 4.0) and Fig. 4 (M ≥ 5.0), respectively. The parameters μ and K are on a logarithmic scale while the others are on a linear scale.

8. Forecasting

8.1 Short-term forecast

For the short-term forecast, we first reprocess the JMA data in real time as described in Section 3. Namely, during a certain time span (say, one hour) immediately after a large earthquake, the cluster analysis is implemented automatically; during that span, we can only make a real-time forecast using the generic (null hypothesis model) procedure with the original JMA epicenter coordinates and the identity matrix for isotropic clustering.

Then the short-term probability forecast is calculated from the joint distribution that combines the space-time intensity of the HIST-ETAS model with the magnitude frequency distribution, where the spatial values of both the ETAS coefficients and the b-values at any location (x, y) are obtained by solving the relation in (6) and then interpolating by (7). Incidentally, the CSEP testing centers, including the EFTEJ, commonly ask us to submit the forecast probability at each voxel [t, t + Δt) × [x, x + Δx) × [y, y + Δy) × [M, M + ΔM), with sizes in time (Δt = 1 day), space (Δx = Δy = 0.1 degree) and magnitude (ΔM = 0.1 magnitude unit). Therefore, we forecast the probability for each such unit time-space-magnitude volume (voxel).
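A minimal sketch of how a voxel probability might be evaluated is given here, under assumptions that go beyond the text: the space-time rate of M ≥ Mc events is treated as constant within the voxel, the number of events in the voxel is treated as Poisson, and the reported quantity is the probability of at least one event. The magnitude factor integrates the exponential (Gutenberg-Richter) density over the magnitude bin. The function names and the numerical inputs are hypothetical.

```python
import math


def gr_bin_fraction(beta, m_lo, dm, mc):
    """Fraction of M >= mc events whose magnitude lies in [m_lo, m_lo + dm).

    Integral of beta * exp(-beta * (m - mc)) over the magnitude bin.
    """
    return math.exp(-beta * (m_lo - mc)) - math.exp(-beta * (m_lo + dm - mc))


def voxel_probability(rate, beta, m_lo, mc, dt=1.0, dx=0.1, dy=0.1, dm=0.1):
    """Probability of at least one event in a time-space-magnitude voxel.

    rate: events of M >= mc per day per square degree, assumed constant
          over the voxel (an assumption of this sketch).
    """
    expected = rate * dt * dx * dy * gr_bin_fraction(beta, m_lo, dm, mc)
    return 1.0 - math.exp(-expected)  # Poisson assumption


if __name__ == "__main__":
    # Hypothetical rate and beta, for illustration only.
    print(voxel_probability(rate=0.5, beta=2.0, m_lo=4.0, mc=3.95))
```

When the expected number in a voxel is small, the probability is close to the expected number itself, so the two are nearly interchangeable for one-day voxels of this size.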

8.2 Intermediate-term forecast

Suppose that the current time is S and that we forecast the probability for the period up to time T. For an intermediate period [S, T], we forecast the probability for each space-magnitude voxel using the quantity Λ(S, T; x, y), which is obtained by the following procedure: (i) calculate the intensity λ(t, x, y | H_S) conditioned on the history H_S up to time S from the HIST-ETAS model; (ii) integrate it over the time span [S, T]; (iii) normalize the result by its spatial integral over the whole region; and (iv) multiply this by the average number of earthquakes of M ≥ Mc for a period of length T − S. The normalization and multiplication in steps (iii) and (iv) are necessary to correct the bias of the forecast probability, because the conditional intensity function over [S, T] in the integration step (ii) takes no account of events that may occur in the history H_t, S < t < T.
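Steps (i)–(iv) can be sketched on a discretized region as follows. This is an illustration under stated assumptions rather than the implementation used here: the region is represented by a list of cells with representative coordinates and areas, the time integral is approximated by the midpoint rule, and the conditional intensity is passed in as a hypothetical callable `intensity(t, x, y)` that already encodes the history up to S.

```python
def intermediate_term_counts(intensity, cells, s, t_end, mean_rate, n_steps=100):
    """Expected number of M >= Mc events per cell over [s, t_end].

    intensity: callable (t, x, y) -> rate density conditioned on history up to s
    cells:     list of (x, y, area) tuples covering the whole region
    mean_rate: average number of M >= Mc events per unit time in the whole region
    """
    h = (t_end - s) / n_steps
    # Steps (i)-(ii): evaluate the conditional intensity and integrate over [s, t_end].
    integrated = []
    for x, y, area in cells:
        time_integral = h * sum(
            intensity(s + (k + 0.5) * h, x, y) for k in range(n_steps)
        )
        integrated.append(time_integral * area)
    # Step (iii): normalize by the spatial integral over the whole region.
    norm = sum(integrated)
    if norm <= 0.0:
        raise ValueError("integrated intensity must be positive")
    # Step (iv): scale by the average number of events for a period of length T - S.
    expected_total = mean_rate * (t_end - s)
    return [expected_total * value / norm for value in integrated]
```

By construction, the returned values sum to the average number of events expected in the whole region over [S, T], so only the spatial pattern is taken from the conditional intensity.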

8.3 Long-term forecast

During the period [S, T] for a sufficiently large time span T − S, λ(t, x, y | H_S) is essentially equal to the background seismicity rate μ(x, y) for any location and time. Therefore, the intermediate-term probability above should take a very similar value when we use the background seismicity rate μ(x, y) in place of λ(t, x, y | H_S) in the above-stated procedure (i)–(iv). Thus, we adopt this as the probability of the long-term forecast for each space-magnitude voxel per unit time.

Relatedly, Ogata (2008) argues, on the basis of retrospective prediction performance, that the background rate provides better long-term forecasts of large earthquakes (M ≥ 6.7, 15-year period) than the ordinary average occurrence intensity in space. This is mainly because such large earthquakes mostly occurred in regions complementary to those of high K-values (e.g., Ogata, 2004), which contribute substantially to the total intensity λ(t, x, y | H_S).

9. Concluding Remarks

We applied the hierarchical space-time ETAS (HIST-ETAS) model to the short-, intermediate- and long-term forecasts of baseline seismicity in and around Japan. Each parameter of the space-time ETAS model is described by a two-dimensional piecewise linear function whose value at a location is interpolated from the three values at the locations of the nearest three earthquakes (Delaunay triangle vertices) on the tessellated plane. Such modeling by Delaunay tessellation is suited to observations of highly clustered points with accurate locations, and therefore we can expect locally unbiased probability evaluation there. We are particularly concerned with the spatial estimates of the first two parameters of the space-time ETAS model: namely, the μ-values of the background seismicity and the K-values of aftershock productivity. The former are useful for the long-term prediction of large earthquakes, and the latter for the short-term aftershock probability forecast immediately after a large earthquake.
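The interpolation inside a single Delaunay triangle can be illustrated with barycentric weights. The following is a minimal sketch with hypothetical function names; it assumes that the triangle containing the location has already been identified and that the three vertex values are known, and it does not construct the tessellation itself.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (x, y), (xa, ya), (xb, yb), (xc, yc) = p, a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    return wa, wb, 1.0 - wa - wb


def interpolate(p, vertices, values):
    """Piecewise linear value at p from the values at the three triangle vertices."""
    wa, wb, wc = barycentric_weights(p, *vertices)
    return wa * values[0] + wb * values[1] + wc * values[2]


if __name__ == "__main__":
    # Hypothetical vertex coordinates and parameter values, for illustration only.
    triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(interpolate((0.5, 0.0), triangle, (1.0, 2.0, 3.0)))
```

Inside the triangle all three weights lie between 0 and 1, so the interpolated value never leaves the range of the vertex values. This keeps the surface well behaved where earthquakes are densely clustered.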

It is noteworthy that there is an extended version of the original space-time ETAS model, with the same structure as the HIST-ETAS model in (2), that uses an additional parameter γ (see Ogata and Zhuang, 2006; Zhuang et al., 2005). In principle, we can further extend this to the case where the parameter γ is also location dependent, in addition to the five parameters in (5). Although estimating the six location-dependent parameters becomes unstable, mainly because of the strong correlation between the parameters α and γ, this could be a challenging task for better forecasting.

For the joint probability of the space-time-magnitude forecast, we have assumed that the sequence of magnitudes is independent of the history of the occurrence times, while the occurrence times depend strongly on the magnitudes, as described by the ETAS model. Furthermore, we have adopted the exponential distribution (Gutenberg-Richter law) for the magnitude frequency. However, I believe that these postulates do not always hold. Indeed, the magnitudes in the sequence of global large earthquakes are not at all independent of one another but possess long-range autocorrelation (Ogata and Abe, 1989). Moreover, Ogata (1989) considered a model for the magnitude sequence in which the b-value varies in time depending on the history of both the magnitudes and the occurrence times of earthquakes. We also know that the magnitude frequency in a local area is not necessarily exponentially distributed, as we see in many swarm activities. These anomalies may provide some hints for a better prediction of large earthquakes than the present models for baseline seismicity.