1. Introduction

Seismicity models are among the most useful products of earthquake prediction research, and incorporating various predictive parameters could make them perform better. Utsu (1977, 1982) and many others (Rhoades and Evison, 1979; Aki, 1981; Hamada, 1983; Grandori et al., 1988) have formulated expressions for earthquake probabilities based on precursory anomalies from a variety of measurements. Imoto (2006, 2007) proposed a method for building models from multiple predictive parameters without assuming, as previous studies had, that the parameters are mutually independent. His results imply that, under certain conditions, mutual correlations among predictive parameters can yield better performance than would be expected if the parameters were independent.

With the development of dense seismic networks and increased computational power, seismic velocity structures can now be modeled at higher resolution than was previously possible. Seismogenesis must be closely related to the physical properties of focal areas (e.g., pressure, temperature, and geological properties). Of these properties, the P-wave velocity is the most widely and systematically sampled. Many issues are involved in the relationship between seismic velocity perturbations and seismicity, and only some of these have been discussed, in limited terms (e.g., Al-Shukri and Mitchell, 1988; Michael, 1988; Kaufmann and Long, 1996; Hauksson and Haase, 1997).

In the 1970s, a large body of literature was published on changes in seismic velocity before earthquakes. Changes in P-wave velocity were interpreted in terms of the dilatancy hypothesis, according to which rocks undergo dilatancy in the final stage before failure (Nur, 1972; Scholz et al., 1973). To date, however, no systematic and homogeneous measurements have detected such variations. Consequently, temporal variations in seismic velocity are not considered in the present study.

After the Hi-net seismic network was established in Japan (Obara et al., 2005), Matsubara et al. (2008) revealed the fine structure of P- and S-wave velocities beneath Japan. Among their notable findings are the following. The high-velocity Pacific and Philippine Sea plates are clearly imaged down to a depth of 150 km beneath northeastern and southwestern Japan, respectively. Zones of high Vp/Vs (the ratio of P-wave to S-wave velocity) are widely distributed beneath the volcanic front, where swarm activity, including moderate-sized earthquakes, has often been observed. Non-volcanic tremors occur in a high-Vp/Vs zone at depths of 30–40 km beneath southwestern Japan, where the oceanic crust of the Philippine Sea plate meets the mantle wedge of the Eurasian plate.

Matsubara and Obara (2008) reported characteristic features of the velocity perturbations in zones beneath active faults. This information might be incorporated into a seismicity model that performs better than existing models. Before building such a model, however, the information contributed by each parameter must be evaluated appropriately.

In this study, we evaluate model performance in terms of the information gain per event (IGpe; Daley and Vere-Jones, 2003; Imoto, 2004) in order to incorporate information on the P-wave velocity structure into current seismicity models. The results provide an example in which correlations among predictive parameters yield greater predictive power than the same parameters would provide if they were uncorrelated.

2. Method

The seismic hazard function is defined as the expected number of earthquakes above a threshold magnitude per unit space-time volume dx (Daley and Vere-Jones, 2003). We consider both the unconditional and the conditional probability densities of observing a value θ of a potentially predictive parameter, denoted g(θ) (background density) and f(θ) (conditional density), respectively. Both are determined empirically: g(θ) from random samples of cells throughout the study volume, and f(θ) from samples conditioned on the occurrence of earthquakes in the cells. The hazard function at a space-time point x, conditioned on the value θ(x), is given by

$$\lambda(x \mid \theta(x)) = \frac{m_0}{V_0}\,\frac{f(\theta(x))}{g(\theta(x))} \tag{1}$$

where m0 is the number of earthquakes above the threshold, and V0 is the space-time volume being studied (see Appendix).

Taking the Poisson model as the baseline, the IGpe (Daley and Vere-Jones, 2003; Imoto, 2004, 2007) for a large number of earthquakes is given by

$$\mathrm{IGpe} = \int_R f(\theta)\,\log\frac{f(\theta)}{g(\theta)}\,d\theta \tag{2}$$

where the integral is taken over R, the whole space on which θ is defined. Equation (2) shows that the IGpe is equivalent to the Kullback-Leibler divergence, which measures the discrepancy between two probability distributions. Assuming that f(θ) and g(θ) are multivariate normal distributions, Imoto (2007) derived an analytical expression for the IGpe.
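As a numerical check on Eq. (2), the following Python sketch evaluates it for one standardized parameter in two ways: as a direct integral, and as the average of log f/g over simulated target events drawn from f, which is the sense in which the per-event gain approaches Eq. (2) for a large number of earthquakes. The normal densities, the illustrative values μ = 0.5 and σ = 0.8, the natural logarithm, and the function names are assumptions for illustration only.

```python
import math
import random

def log_normal_pdf(x, mu, sigma):
    """Natural log of the univariate normal density N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def igpe_quadrature(mu, sigma, lo=-12.0, hi=12.0, n=100_000):
    """Eq. (2) by a midpoint sum of f*log(f/g), with f = N(mu, sigma^2), g = N(0, 1)."""
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        log_f = log_normal_pdf(x, mu, sigma)
        log_g = log_normal_pdf(x, 0.0, 1.0)
        total += math.exp(log_f) * (log_f - log_g) * dx
    return total

def igpe_per_event(mu, sigma, n_events=100_000, seed=1):
    """Average log(f/g) over simulated target events drawn from f.
    For many events this average converges to Eq. (2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        x = rng.gauss(mu, sigma)
        total += log_normal_pdf(x, mu, sigma) - log_normal_pdf(x, 0.0, 1.0)
    return total / n_events

print(igpe_quadrature(0.5, 0.8))   # deterministic numerical integral
print(igpe_per_event(0.5, 0.8))    # Monte Carlo estimate, close to the line above
```

Working on the log scale avoids underflow of the densities far in the tails.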

For convenience, the main results of Imoto (2007) are summarized below. For a single parameter θ1, IGpe(θ1) can be represented as

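$$\mathrm{IGpe}(\theta_1) = -\frac{1}{2}\log\sigma_1^2 + \frac{1}{2}\sigma_1^2 + \frac{1}{2}\mu_1^2 - \frac{1}{2} \tag{3}$$

where μ1 and σ1² are the mean and variance of f(θ1), and g(θ1) is scaled to have mean 0 and variance 1. Here and below, log denotes the natural logarithm.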
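Equation (3) can be coded directly. This is a minimal sketch; the function name is hypothetical and the inputs are illustrative, not the values in Table 2. It also shows that, with the variance unchanged, an IGpe of about 0.03 corresponds to a mean shift of only about a quarter of a background standard deviation.

```python
import math

def igpe_single(mu1, sigma1):
    """Eq. (3): IGpe of one parameter whose conditional density is N(mu1, sigma1^2)
    and whose background density is standardized to N(0, 1). Natural log."""
    var1 = sigma1 ** 2
    return -0.5 * math.log(var1) + 0.5 * var1 + 0.5 * mu1 ** 2 - 0.5

# A conditional distribution identical to the background carries no information.
print(igpe_single(0.0, 1.0))    # 0.0
# Shifting the mean by 0.25 background SD gives 0.25**2 / 2.
print(igpe_single(0.25, 1.0))   # 0.03125
```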

Next, we consider n variables θ1, θ2, …, θn with joint densities f(θ1, θ2, …, θn) and g(θ1, θ2, …, θn), and denote the marginal distributions of θi by fi(θi) for the conditional distribution and gi(θi) for the background distribution. If θ1, θ2, …, θn are mutually independent in both distributions and normally distributed, with mean μi and variance σi² in the conditional distribution and mean 0 and variance 1 in the background distribution, the IGpe is the sum of the single-parameter values:

$$\mathrm{IGpe} = \sum_{i=1}^{n}\mathrm{IGpe}(\theta_i) = -\frac{1}{2}\sum_{i=1}^{n}\log\sigma_i^2 + \frac{1}{2}\sum_{i=1}^{n}\sigma_i^2 + \frac{1}{2}\sum_{i=1}^{n}\mu_i^2 - \frac{n}{2} \tag{4}$$
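Under the independence assumption, Eq. (4) is simply the sum of Eq. (3) over the parameters. A short sketch (hypothetical function name, natural log assumed):

```python
import math

def igpe_independent(mus, sigmas):
    """Eq. (4): IGpe of n mutually independent parameters whose conditional densities
    are N(mu_i, sigma_i^2) and background densities N(0, 1); natural log."""
    if len(mus) != len(sigmas):
        raise ValueError("mus and sigmas must have the same length")
    return (-0.5 * sum(math.log(s * s) for s in sigmas)
            + 0.5 * sum(s * s for s in sigmas)
            + 0.5 * sum(m * m for m in mus)
            - 0.5 * len(mus))

# The second parameter matches the background exactly, so it adds nothing.
print(igpe_independent([0.25, 0.0], [1.0, 1.0]))   # 0.03125
```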

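We assume here that correlations among the n variables θ1, θ2, …, θn occur only in the conditional density f(θ1, θ2, …, θn):

$$f(\theta_1, \theta_2, \ldots, \theta_n) = \frac{1}{(2\pi)^{n/2}\,|C|^{1/2}}\exp\!\left[-\frac{1}{2}(\boldsymbol{\theta}-\boldsymbol{\mu})^{\mathrm{T}}C^{-1}(\boldsymbol{\theta}-\boldsymbol{\mu})\right] \tag{5}$$

where θ and μ are the column vectors of the variables and of their conditional means, the superscripts T and −1 denote the transpose and the inverse of a matrix, |C| is the determinant of C, and the covariance matrix C can be expressed as

$$C = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1n}\sigma_1\sigma_n \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 & \cdots & \rho_{2n}\sigma_2\sigma_n \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1n}\sigma_1\sigma_n & \rho_{2n}\sigma_2\sigma_n & \cdots & \sigma_n^2 \end{pmatrix} \tag{6}$$

where ρij is the correlation coefficient between θi and θj.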
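The covariance matrix of Eq. (6) can be assembled from the standard deviations and the correlation coefficients as C = D P D, where D is the diagonal matrix of σi and P the matrix of ρij (with ρii = 1), and the density of Eq. (5) is best evaluated on the log scale. The sketch below assumes NumPy; the function names are hypothetical, and the Cholesky factorization is an implementation choice, not part of the original method.

```python
import numpy as np

def covariance_from_correlation(sigmas, rho):
    """Eq. (6): C_ij = rho_ij * sigma_i * sigma_j (rho_ii = 1), i.e. C = D P D."""
    s = np.asarray(sigmas, dtype=float)
    return np.asarray(rho, dtype=float) * np.outer(s, s)

def log_conditional_density(theta, mu, C):
    """Natural log of Eq. (5). The Cholesky factor L (C = L L^T) raises
    numpy.linalg.LinAlgError if C is not positive definite."""
    theta = np.asarray(theta, dtype=float)
    L = np.linalg.cholesky(np.asarray(C, dtype=float))
    z = np.linalg.solve(L, theta - np.asarray(mu, dtype=float))  # z @ z is the quadratic form
    log_det = 2.0 * np.sum(np.log(np.diag(L)))                    # log |C|
    return -0.5 * (theta.size * np.log(2.0 * np.pi) + log_det + z @ z)

C = covariance_from_correlation([1.0, 2.0], [[1.0, 0.5], [0.5, 1.0]])
print(C)                                             # [[1. 1.] [1. 4.]]
print(log_conditional_density([0.3, -0.4], [0.0, 0.0], C))
```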

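Because C is symmetric, it can be diagonalized by transforming the θ coordinate system with an appropriate orthogonal matrix. The vector μ is transformed into μ′ by the same orthogonal matrix, while the background density, being the standard normal distribution with identity covariance, is unchanged by the transformation. Proceeding as in the independent case, the IGpe is represented by

$$\mathrm{IGpe} = -\frac{1}{2}\sum_{i=1}^{n}\log\nu_i + \frac{1}{2}\,\mathrm{trace}(C) + \frac{1}{2}\sum_{i=1}^{n}\mu_i'^2 - \frac{n}{2} \tag{7}$$

where ν1, ν2, …, νn are the eigenvalues of C, trace(C) denotes the sum of the diagonal elements of C, and μ′i are the components of μ′. Both trace(C) = σ1² + … + σn² and Σμ′i² = Σμi² are invariant under an orthogonal transformation, so for the same means and variances the last three terms of Eqs. (7) and (4) are identical. For the first term, Σ log νi = log|C| = Σ log σi² + log|P|, where P is the correlation matrix with elements ρij (ρii = 1). Because |P| ≤ 1, with equality only when every ρij (i ≠ j) is zero (Hadamard's inequality), the first term of Eq. (7) exceeds that of Eq. (4) unless all correlations vanish. Therefore, for given means and variances, the IGpe for a conditional distribution of correlated variables always exceeds that for uncorrelated variables.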
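The comparison between Eqs. (7) and (4) can be checked numerically. In this sketch (NumPy assumed; function names hypothetical), the uncorrelated case keeps the same means and variances but sets every ρij to zero; for two parameters the difference reduces to −½ log(1 − ρ12²).

```python
import numpy as np

def igpe_correlated(mu, C):
    """Eq. (7): conditional density N(mu, C), background N(0, I), natural log.
    Uses the eigenvalues nu_i of C; trace(C) and mu @ mu are the rotation-invariant terms."""
    mu = np.asarray(mu, dtype=float)
    C = np.asarray(C, dtype=float)
    nu = np.linalg.eigvalsh(C)   # C is symmetric
    return (-0.5 * np.sum(np.log(nu)) + 0.5 * np.trace(C)
            + 0.5 * mu @ mu - 0.5 * mu.size)

def igpe_uncorrelated(mu, C):
    """Eq. (4): same means and variances as C, but every rho_ij (i != j) set to zero."""
    return igpe_correlated(mu, np.diag(np.diag(np.asarray(C, dtype=float))))

C = np.array([[1.0, 0.5], [0.5, 1.0]])
print(igpe_uncorrelated([0.0, 0.0], C))   # 0.0
print(igpe_correlated([0.0, 0.0], C))     # -0.5 * log(0.75), about 0.1438
```

Even with means and variances identical to the background, correlation alone produces a positive IGpe.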

In general, correlations among parameters may be observed in both distributions. In that case, the background covariance matrix is first transformed into the identity matrix by an orthogonal transformation of the coordinate system followed by a rescaling with a diagonal matrix; the procedure from Eq. (5) to Eq. (7) can then be applied.
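The general case can be sketched as follows. Because the IGpe is a Kullback-Leibler divergence, it is unchanged by any invertible affine change of coordinates applied to both densities, so whitening the background and then applying Eq. (7) should reproduce the closed-form divergence between two multivariate normals. NumPy, the eigendecomposition-based whitening, and the function names are assumptions.

```python
import numpy as np

def whiten(mu_f, C_f, mu_g, C_g):
    """Change coordinates so that the background N(mu_g, C_g) becomes N(0, I).
    With C_g = Q diag(w) Q^T, the map theta -> diag(w)**-0.5 Q^T (theta - mu_g) is an
    orthogonal rotation followed by a diagonal rescaling."""
    w, Q = np.linalg.eigh(np.asarray(C_g, dtype=float))
    T = np.diag(1.0 / np.sqrt(w)) @ Q.T
    mu = T @ (np.asarray(mu_f, dtype=float) - np.asarray(mu_g, dtype=float))
    C = T @ np.asarray(C_f, dtype=float) @ T.T
    return mu, C

def igpe_general(mu_f, C_f, mu_g, C_g):
    """Whiten the background, then apply Eq. (7) to the transformed conditional density."""
    mu, C = whiten(mu_f, C_f, mu_g, C_g)
    nu = np.linalg.eigvalsh(C)   # eigenvalues of the transformed covariance
    return (-0.5 * np.sum(np.log(nu)) + 0.5 * np.trace(C)
            + 0.5 * mu @ mu - 0.5 * mu.size)

C_g = [[1.0, 0.3], [0.3, 2.0]]
C_f = [[0.8, 0.4], [0.4, 1.5]]
print(igpe_general([0.2, -0.1], C_f, [0.0, 0.1], C_g))
```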

Once we have estimated the means and variances of the parameters, together with the correlation matrices for both the conditional and the background distributions, we can construct f(θ) and g(θ) and thus calculate the hazard function of Eq. (1). This function estimates the hazard rate at any point of interest, conditioned on the parameter values observed at that point.
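Once the densities are specified, Eq. (1) can be evaluated at any grid point. The sketch below (NumPy assumed; names and arguments hypothetical) works with log densities to avoid underflow and returns the Poisson rate m0/V0 scaled by the probability gain f(θ)/g(θ).

```python
import numpy as np

def log_mvn(theta, mu, C):
    """Natural log of the multivariate normal density N(mu, C) at theta."""
    d = np.asarray(theta, dtype=float) - np.asarray(mu, dtype=float)
    C = np.asarray(C, dtype=float)
    sign, log_det = np.linalg.slogdet(C)
    if sign <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    maha = d @ np.linalg.solve(C, d)   # (theta - mu)^T C^{-1} (theta - mu)
    return -0.5 * (d.size * np.log(2.0 * np.pi) + log_det + maha)

def hazard(theta, m0, V0, mu_f, C_f, mu_g, C_g):
    """Eq. (1): the Poisson rate m0/V0 times the probability gain f(theta)/g(theta)."""
    log_gain = log_mvn(theta, mu_f, C_f) - log_mvn(theta, mu_g, C_g)
    return (m0 / V0) * np.exp(log_gain)

# Hypothetical numbers: 198 events in a unit-normalized volume of 1000.
I2 = np.eye(2)
print(hazard([0.5, 0.0], 198, 1000.0, [0.5, 0.0], 0.5 * I2, [0.0, 0.0], I2))
```

Dividing the result by m0/V0 recovers the probability gain f(θ)/g(θ).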

3. Data

We consider a seismicity model for earthquakes of M ≥ 5.0 in Japan based on P-wave velocity perturbation data. We use hypocenter parameters for 1961–2008 determined by the Japan Meteorological Agency (JMA); this period was selected to balance the number of earthquakes against the accuracy of the estimated locations. Because of the complex tectonic setting in and around Japan, we restrict the analysis to earthquakes shallower than 30 km.

Matsubara et al. (2008) constructed three-dimensional P- and S-wave velocity models beneath the whole of Japan at depths of 0–40 km, with a horizontal grid spacing of 0.2° and a vertical spacing of 5–10 km. They also constructed a velocity model down to a depth of 400 km on a coarser grid. Because P- and S-wave velocities depend strongly on depth, velocity perturbations relative to a standard velocity model are generally estimated. Comparing perturbations at the same depth may therefore be useful for identifying characteristic seismogenic features of a focal area.

Accordingly, we consider a two-dimensional seismicity model in which hazard rates are defined at horizontal grid points. For each point, P-wave velocity perturbations at four depths (10, 15, 20, and 25 km) are used, and this set of four parameters is expected to serve as the predictive parameters. For the background distributions, more than 3,000 points with reliable velocity anomalies are selected from a 0.1° × 0.1° grid; they mostly cover inland Japan, except for Hokkaido Island. For the conditional distributions, we select the epicenters of 198 earthquakes with magnitudes ≥5.0 that occurred between 1961 and 2008 (Table 1).

Table 1. List of target earthquakes used for the conditional distribution.

4. Information Gain per Event

Figure 1 shows the empirical cumulative background distributions (dark solid lines) of the four parameters and the normal distributions fitted to them (light dashed lines). Each background distribution is generally well approximated by a normal distribution. Similarly, Fig. 2 shows the empirical cumulative conditional distributions (dark solid lines) and the normal distributions fitted to them (light dashed lines). For the conditional distributions at 20 and 25 km, the normal approximation is less close.

Fig. 1.

Cumulative background distributions for the four parameters. Empirical background distributions (dark line) and normal distributions (light dashed line) fitted to the background distributions.

Fig. 2.

Cumulative conditional distributions for the four parameters. Empirical conditional distributions (dark line) and normal distributions (light dashed line) fitted to the conditional distributions.

A chi-square goodness-of-fit test was performed under the null hypothesis that the P-wave velocity perturbations at each depth follow a normal distribution. For the samples at 10 and 15 km, the hypothesis is not rejected at the 10% significance level. For the samples at 20 and 25 km, it is not rejected at the 1% significance level; this is a less stringent criterion than usual, but it is judged adequate for fitting a function with two parameters.
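The text does not specify how the observations were binned for the chi-square test. The sketch below is one common variant, offered only as an assumption: equiprobable bins under the fitted normal, with two degrees of freedom removed for the estimated mean and standard deviation. With 10 bins there are 7 degrees of freedom, for which the critical values at the 10% and 1% levels are about 12.02 and 18.48. The data here are synthetic.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def chi_square_normal_fit(samples, n_bins=10):
    """Chi-square statistic for a normal fit whose mean and SD are estimated from the
    samples. Bins are equiprobable under the fitted normal, so each expected count is
    len(samples) / n_bins; degrees of freedom are n_bins - 1 - 2."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / (n - 1))
    observed = [0] * n_bins
    for x in samples:
        u = normal_cdf(x, mu, sigma)                     # fitted CDF value in [0, 1]
        observed[min(int(u * n_bins), n_bins - 1)] += 1  # clamp u == 1.0 into the last bin
    expected = n / n_bins
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, n_bins - 3

rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(3000)]   # synthetic, normally distributed
stat, dof = chi_square_normal_fit(sample)
print(dof, stat < 18.48)   # not rejected at the 1% level if stat < 18.48 (7 dof)
```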

The parameters of these normal distributions are summarized in Table 2. The last column of the table gives the IGpe of each predictive parameter, calculated using Eq. (3) under the assumption that both distributions are normal. No large differences exist between the conditional and background distributions. If each predictive parameter is used separately, the largest IGpe, 0.03, is obtained for the parameter at a depth of 25 km.

Table 2. Terms of normal distributions for each parameter and its IGpe value.

However, correlations among the four parameters are observed in both distributions. Table 3 summarizes the correlation matrices of the background (lower left) and conditional (upper right) distributions. Each coefficient in the conditional distribution exceeds the corresponding coefficient in the background distribution. For example, the correlation coefficient between the parameters at depths of 10 and 25 km is 0.498 in the conditional distribution but only 0.103 in the background distribution. As Eq. (7) indicates, these features of the correlation matrices suggest that the correlated parameters have greater predictive power than would be expected from the parameters taken individually. From the values in Tables 2 and 3, we can construct the background and conditional densities. The hazard function is obtained by multiplying f(θ)/g(θ) by the average (Poisson) rate. Using the formula developed by Imoto (2007), we estimate an IGpe of 0.30 for the seismicity model based on this hazard function. This value corresponds to a probability gain of exp(0.30) ≈ 1.35 per target earthquake.
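The steps from sample values to an IGpe can be sketched as below, assuming NumPy and Gaussian densities. The parameters are standardized by the background means and standard deviations, so that the background has zero mean and unit variance as assumed in Eq. (3) while keeping its correlation matrix; the IGpe is then converted to a probability gain by exponentiation, since it is an average natural-log gain. The function name and array layout are hypothetical.

```python
import math
import numpy as np

def igpe_from_samples(background, conditional):
    """background: (N, n) parameter values at grid points; conditional: (m, n) values
    at target epicenters. Returns the Gaussian IGpe (natural log) of f against g."""
    bg = np.asarray(background, dtype=float)
    cd = np.asarray(conditional, dtype=float)
    # Standardize with the background means and SDs: g gets zero means, unit variances.
    m_g, s_g = bg.mean(axis=0), bg.std(axis=0, ddof=1)
    z_cd = (cd - m_g) / s_g
    R_g = np.corrcoef(bg, rowvar=False)   # background correlation matrix (scale-free)
    mu_f = z_cd.mean(axis=0)              # conditional means in standardized units
    C_f = np.cov(z_cd, rowvar=False)      # conditional covariance in standardized units
    n = mu_f.size
    _, logdet_g = np.linalg.slogdet(R_g)
    _, logdet_f = np.linalg.slogdet(C_f)
    # Kullback-Leibler divergence KL(f || g) between N(mu_f, C_f) and N(0, R_g), cf. Eq. (2).
    return 0.5 * (np.trace(np.linalg.solve(R_g, C_f))
                  + mu_f @ np.linalg.solve(R_g, mu_f)
                  - n + logdet_g - logdet_f)

# IGpe is a mean natural-log gain, so exp(IGpe) is the corresponding probability gain.
print(round(math.exp(0.30), 2))   # 1.35
```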

Table 3. Correlation matrices. Lower left: Observed in background distribution. Upper right: Observed in conditional distribution.

Figure 3 plots the joint distributions of the parameters at 10 and 25 km for both cases. The conditional distribution (dark circles) is more concentrated than the background distribution (light circles) and lies roughly in the lower-right part of the background distribution. This is consistent with Table 2: beneath epicenters, the velocity tends to be higher than average at a depth of 10 km but lower than average at a depth of 25 km. A moderate correlation between the two parameters appears in the conditional distribution, whereas no clear correlation appears in the background distribution (Table 3). In the present case, the contributions of the individual depths sum to a negligible IGpe of 0.06, whereas accounting for the correlations among parameters yields a useful IGpe of 0.3.

Fig. 3.

Joint distributions of the predictive parameters at depths of 10 and 25 km. The ordinate represents P-wave velocity at a depth of 25 km, and the abscissa represents P-wave velocity at a depth of 10 km. Dark symbols indicate plots of the conditional distributions, and light symbols indicate those of background distributions.

5. Discussion and Conclusions

To check the above IGpe estimate, we performed a simulation study in which the distribution of IGpe values was obtained by a bootstrap method. In generating each set of samples, we vary only the parameter values of the conditional distribution (Table 2, right group; Table 3, upper right), keeping the background distribution fixed. Two data sources are used: the conditional distribution and the background distribution. In both simulations, we randomly select 198 samples from the source distribution and treat them as a simulated conditional distribution. After calculating the means, variances, and correlation coefficients of the parameters, we estimate the IGpe. Repeating this for 10,000 sets gives the distribution of IGpe values, from which the mean and standard deviation are obtained. When samples are drawn from the conditional distribution, the mean is 0.36 and the standard deviation is 0.06 (right curve in Fig. 4); the observed IGpe of 0.30 thus lies within about one standard deviation of the mean. In contrast, when samples are drawn from the background distribution, the mean is 0.04 and the standard deviation is 0.02 (left curve in Fig. 4). These results suggest that the observed IGpe is unlikely to have been obtained by chance from the background distribution of the parameters.
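A minimal version of this bootstrap is sketched below. We assume resampling with replacement, Gaussian densities, and a background held fixed at zero means, unit variances, and correlation matrix R_g; the arrays passed in are placeholders, not the Hi-net or JMA data, and the function names are hypothetical.

```python
import numpy as np

def gaussian_igpe(sample, R_g):
    """IGpe of a standardized sample, modeled as N(mean, cov), against the fixed
    background N(0, R_g)."""
    mu = sample.mean(axis=0)
    C = np.cov(sample, rowvar=False)
    _, logdet_g = np.linalg.slogdet(R_g)
    _, logdet_f = np.linalg.slogdet(C)
    return 0.5 * (np.trace(np.linalg.solve(R_g, C)) + mu @ np.linalg.solve(R_g, mu)
                  - mu.size + logdet_g - logdet_f)

def bootstrap_igpe(source, R_g, m=198, n_iter=10_000, seed=0):
    """Resample m rows (with replacement) from `source`, treat them as a simulated
    conditional sample, and return the mean and SD of the IGpe over n_iter sets."""
    rng = np.random.default_rng(seed)
    source = np.asarray(source, dtype=float)
    R_g = np.asarray(R_g, dtype=float)
    values = np.empty(n_iter)
    for k in range(n_iter):
        idx = rng.integers(0, len(source), size=m)
        values[k] = gaussian_igpe(source[idx], R_g)
    return values.mean(), values.std(ddof=1)
```

Even when the source carries no signal, the mean bootstrap IGpe stays slightly above zero, because estimated means, variances, and correlations from 198 samples never match the background exactly.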

Fig. 4.

Cumulative distributions of IGpe values simulated by a bootstrap method. The left curve (mean 0.04, standard deviation 0.02) shows IGpe values obtained from the background distributions, and the right curve (mean 0.36, standard deviation 0.06) shows IGpe values obtained from the conditional distributions. The vertical line on the right curve indicates the IGpe estimated from the observed data.

Matsubara and Obara (2008) studied the relationship between the seismic velocity structure and active faults in the Japanese Islands. They first estimated velocity perturbations at depths of 5, 10, 15, and 20 km and then compared the values beneath fault zones with the nationwide averages. They found that, beneath fault zones, the velocity is higher than average in the shallow part but lower than average in the deeper part, and on this basis suggested that seismic velocity anomalies could help detect blind active faults. Although their finding has not been examined quantitatively, it implies that a P-wave velocity model could contribute to assessing the seismogenesis of shallow earthquakes of moderate and large magnitude.

Given the close relationship between large earthquakes and active faults, we use the epicenters of shallow earthquakes with magnitudes ≥5.0 as the conditional group. Fault zones could also serve as a conditional group, but earthquake epicenters are more precisely defined and more easily selected than fault zones. Even with these simple choices, we are able to construct a seismicity model that may help assess the seismogenesis of shallow earthquakes. More effective models may become possible once various other predictive parameters have been examined, although how such models would perform remains to be seen.

Figure 5 shows the probability gains at every point of a 0.1° × 0.1° grid. A probability gain is defined as the ratio of the hazard function (Eq. (1)) to the Poisson rate (m0/V0), which in the present case equals f(θ)/g(θ). Although some parts of the map have probability gains as high as 2.0, the average over the 198 epicenters (Table 1) is 1.35. Imoto and Rhoades (2010) combined two models, the Every Earthquake a Precursor According to Scale model (EEPAS; Rhoades and Evison, 2006) and a three-parameter model (Imoto, 2008), into a better-performing model in which the hazard rate of the EEPAS model is treated as a surrogate precursor. In a similar way, the present parameters could be combined with an appropriate seismicity model to obtain a better-performing model; this will be addressed in a future study.

Fig. 5.

A map of probability gains (f(θ)/g(θ) values) at every point of a 0.1° × 0.1° grid.

In summary, we have assessed the performance of a seismicity model for shallow earthquakes in Japan based on a P-wave velocity model. Applying the formula derived by Imoto (2007) to the P-wave velocity data and incorporating the correlations among the parameters, we estimate the IGpe of the model to be 0.3. The bootstrap analysis suggests that this IGpe value is unlikely to have been obtained by chance from the background distribution.