1 Introduction

Probabilistic seismic hazard is the possibility of the occurrence of ground motion caused by seismicity, expressed in terms of probabilities. This possibility results from the probabilistic properties of the seismic source, of the propagation of seismic waves from the source to a receiver, and of the receiving site. The probabilistic seismic hazard analysis (PSHA) problem can be stated as

$$\Pr \left[ {{\text{amp}}\left( {x_{0} ,y_{0} } \right) \ge a\left( {x_{0} ,y_{0} } \right)\;{\text{in}}\;D\;{\text{time}}\;{\text{units}}} \right] = p,$$
(1)

where D and p are given values and one looks for the value of the amplitude parameter of ground motion, \(a(x_{0}, y_{0})\), at the given point \((x_{0}, y_{0})\) whose exceedance probability in D time units is p.

The classic formulation of PSHA assumes that the earthquake occurrence process is Poissonian (e.g. Cornell 1968; Cornell and Toro 1970; Reiter 1991). Numerous papers indicate the non-Poissonian character of tectonic (e.g. Shlien and Toksöz 1970; Vere-Jones 1970; Kiremidjian and Anagnos 1984; Cornell and Winterstein 1988; Parvez and Ram 1997; Lana et al. 2005; Xu and Burton 2006; Chang et al. 2006; Jimenez 2011; Martin-Montoya et al. 2015) as well as anthropogenic seismic processes (e.g. Lasocki 1992; Weglarczyk and Lasocki 2009; Marcak 2013). However, the classic formulation with the Poisson model for earthquake occurrence is still often used (e.g. Petersen et al. 2014, 2015), as it may be appropriate for cases involving broad averages of mixtures of seismic processes. Nonetheless, its practical application should be preceded by a rigorous check of the applicability of the Poisson model. When the Poisson model for earthquake occurrences is accepted, the exceedance probability, that is the probability that the amplitude parameter of ground motion will exceed \(a\) at \((x_{0}, y_{0})\) in any time interval of length D time units, is:

$$\Pr \left[ {{\text{amp}}\left( {x_{0} ,y_{0} } \right) \ge a\left( {x_{0} ,y_{0} } \right)} \right] = \int\limits_{0}^{\infty } {\int\limits_{\rm M} {\Pr \left[ {{\text{amp}}\left( {x_{0} ,y_{0} } \right) \ge a(x_{0} ,y_{0} )|r,M} \right]f\left( r \right)f\left( {M\left| {N\left( D \right) \ne 0} \right.} \right){\text{d}}M} {\text{d}}r,}$$
(2)

where N(D) is the number of seismic events in the time interval of length D time units, M is the event magnitude, f(M|N(D) ≠ 0) is the probability density function of M conditional upon the occurrence of seismic events in D, r is the distance between the seismic source and the point \((x_{0}, y_{0})\), f(r) is the probability density function of r, and \(\Pr [{\text{amp}}(x_{0}, y_{0}) \ge a(x_{0}, y_{0})|r, M]\) is the probability that the ground motion amplitude at \((x_{0}, y_{0})\) will be greater than or equal to \(a(x_{0}, y_{0})\) when an event of magnitude M is located at the distance r from the point \((x_{0}, y_{0})\). It is also assumed in (2) that the event magnitude and location are independent. The conditional magnitude density reads:

$$f\left( {M\left| {N\left( D \right) \ne 0{\kern 1pt} } \right.} \right) = - \frac{\text{d}}{{{\text{d}}M}}\left[ {\frac{{R\left( {M,D} \right)}}{{1 - \exp \left( { - \lambda D} \right)}}} \right],$$
(3)

where,

$$R\left( {M,D} \right) = 1 - \exp \left[ { - \lambda D\left( {1 - F\left( M \right)} \right)} \right],$$
(4)

R(M, D), referred to as the exceedance probability, is the total probability that in D time units there will be events of magnitude M or greater, F(M) is the cumulative distribution function (CDF) of magnitude, and λ is the mean activity rate, that is the parameter of the Poisson distribution of earthquake occurrence. R(M, D) represents the potential of the seismic source.

The other function often used to express the probabilistic properties of seismic sources when the Poisson occurrence model is applied is the reciprocal of the rate of occurrence of earthquakes of magnitude M or greater, referred to as the mean return period (e.g. Lomnitz 1974; Baker 2008):

$$T\left( M \right) = \left\{ {\lambda \left[ {1 - F\left( M \right)} \right]} \right\}^{ - 1} .$$
(5)

The mean return period is the average recurrence time of events of magnitude M or greater. Both hazard functions, R(M, D) and T(M), depend on the mean event rate of the Poisson temporal occurrence of earthquakes and on the distribution of magnitude. The interval estimation of the magnitude CDF, F(M), and subsequently of R(M, D) and T(M), where the Poisson occurrence model is accepted but only the F(M) uncertainty is taken into consideration, has been presented in Orlecka-Sikora (2004, 2008). Extending these works, we propose here a method for the interval estimation of the R(M, D) and T(M) functions that accounts for the aggregated uncertainty resulting from the uncertainty of the estimate of the Poisson mean event rate, λ, and the uncertainty of the estimate of the magnitude CDF, F(M). On synthetic and actual seismicity cases we analyze the improvements introduced by such an integrated approach.
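Both hazard functions follow directly from Eqs. 4 and 5 once λ and F(M) are given. A minimal sketch in Python (the parameter values β = 2.3, \(M_{\min}\) = 1.0 and λ = 0.3 are illustrative, not taken from the text):

```python
import math

def exceedance_probability(M, D, lam, F):
    """R(M, D) = 1 - exp(-lam * D * (1 - F(M))), Eq. 4."""
    return 1.0 - math.exp(-lam * D * (1.0 - F(M)))

def mean_return_period(M, lam, F):
    """T(M) = 1 / (lam * (1 - F(M))), Eq. 5."""
    return 1.0 / (lam * (1.0 - F(M)))

# Illustrative exponential magnitude CDF (Eq. 12)
beta, M_min = 2.3, 1.0
F = lambda M: 1.0 - math.exp(-beta * (M - M_min))

R = exceedance_probability(3.0, 30.0, 0.3, F)  # P(at least one M >= 3.0 event in D = 30)
T = mean_return_period(3.0, 0.3, F)            # average recurrence time of M >= 3.0 events
```

Note that R saturates toward 1 as λD grows, while T does not depend on D at all.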

2 Interval Estimation of Seismic Hazard Parameters

When taking into account the aggregated uncertainty in the activity rate and magnitude CDF estimates, the confidence intervals (CI) of the hazard functions are evaluated on the plug-in curve in the following way:

  1. First, the percentiles of the mean activity rate distribution, \(\lambda^{(\alpha )}\), are estimated, where \(\alpha\) is the percentile order;

  2. Next, at each value of M the percentiles of the CDF, \(F\left( M \right)^{(\alpha )}\), are calculated for each \(\alpha\);

  3. The values of \(R\left( {M, D} \right)\) and \(T\left( M \right)\) (or other hazard functions) are computed for all combinations of the percentiles \(\hat{\lambda }^{(\alpha )}\) with the percentiles \(\hat{F}\left( M \right)^{(\alpha )}\);

  4. The confidence intervals on the plug-in \(\hat{R}\left( {M, D} \right)\), \(\hat{T}\left( M \right)\) are determined directly from the sorted values of \(\hat{R}^{k} \left( {M, D} \right)\), \(\hat{T}^{k} \left( M \right)\) obtained for the particular value of M, where \(k = 1,2, \ldots , l^{2}\) indexes the combinations of \(l\) evenly spaced percentiles of λ with the same \(l\) percentiles of the magnitude CDF. The interval of intended coverage \(1 - 2\alpha\) is given by the \(\left\lfloor\alpha \cdot l^{2} \right\rfloor\)-th and \(\left\lceil \left( {1 - \alpha } \right) \cdot l^{2} \right\rceil\)-th values of the series of \(\hat{R}^{k} \left( {M, D} \right)\), \(\hat{T}^{k} \left( M \right)\), where \(\left\lfloor a \right\rfloor /\left\lceil a \right\rceil\) denotes the largest/smallest integer less/greater than or equal to \(a\), respectively.
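Steps 3 and 4 of the procedure above can be sketched as follows; the arguments `lam_percentiles` and `F_M_percentiles` stand for the percentile values produced in steps 1 and 2, and the sketch is an illustrative reading of the text, not the authors' code:

```python
import math
from itertools import product

def hazard_ci(lam_percentiles, F_M_percentiles, D, alpha):
    """CI of R(M, D) at one magnitude M from all combinations of the l
    activity-rate percentiles with the l magnitude-CDF percentiles."""
    R_values = sorted(
        1.0 - math.exp(-lam * D * (1.0 - F_M))  # Eq. 4 for each combination
        for lam, F_M in product(lam_percentiles, F_M_percentiles)
    )
    k2 = len(R_values)                                   # l**2 combinations
    lo = R_values[max(math.floor(alpha * k2) - 1, 0)]    # floor(alpha*l^2)-th sorted value
    hi = R_values[min(math.ceil((1.0 - alpha) * k2) - 1, k2 - 1)]
    return lo, hi
```

Repeating the call over a grid of M values traces the full confidence band around the plug-in curve.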

For the Poisson earthquake occurrence assumed here, the mean activity rate estimate is λ = N(D)/D. In this case, the standard method of confidence interval construction for the Poisson mean is based on inverting an equal-tailed test for the null hypothesis \(H_{0} :\lambda = \lambda_{0}\) using the exact distribution. However, Patil and Kulkarni (2012) show that this approach provides conservative, too wide confidence intervals. After a review of the existing methods for obtaining Poisson confidence intervals, they recommend choosing a method suited to the value of the mean activity rate. In the case where the value of the mean activity rate is lower than 2, they propose to use one of the following methods:

  (a) Modified Wald (Barker 2002):

    $${\text {CI:\,}}\left\{ {\begin{array}{*{20}c} {\left[ {0; - \log \left( {\frac{\alpha }{2}} \right)} \right]\quad \quad \quad \quad \quad \quad \quad {\text{for}} \,x = 0 } \\ {\left[ {x + z_{{\frac{\alpha }{2}}} \cdot \sqrt x ;x + z_{{1 - \frac{\alpha }{2}}} \cdot \sqrt x } \right]\quad {\text{for}}\, x > 0} \\ \end{array} } \right.$$
    (6)
  (b) Wald continuity correction (Schwertman and Martinez 1994):

    $${\text{CI:}}\left[ {\left( {x - 0.5} \right) + z_{{\frac{\alpha }{2}}} \cdot \sqrt {x - 0.5} ;\left( {x + 0.5} \right) + z_{{1 - \frac{\alpha }{2}}} \cdot \sqrt {x + 0.5} } \right],$$
    (7)

where \(x\) is the number of observations in the considered time period, and \(z_{{\frac{\alpha }{2}}}\) and \(z_{{1 - \frac{\alpha }{2}}}\) are the quantiles of the standard Gaussian distribution of the orders \(\frac{\alpha }{2}\) and \(1 - \frac{\alpha }{2}\), respectively. In the case where the mean activity rate is larger, the authors suggest using one of the following four methods:

  (a) Garwood (1936):

    $${\text{CI:}}\left[ {\frac{{\chi^{2}_{{\left( {2x,\frac{\alpha }{2}} \right)}} }}{2}; \frac{{\chi^{2}_{{\left( {2\left( {x + 1} \right),1 - \frac{\alpha }{2}} \right)}} }}{2}} \right]$$
    (8)
  (b) Wilson and Hilferty (1931):

$${\text{CI:\,}}\left[ {x\left( {1 - \frac{1}{9x} + \frac{{z_{{\frac{\alpha }{2}}} }}{{3\sqrt x }}} \right)^{3} ;\left( {x + 1} \right)\left( {1 - \frac{1}{{9\left( {x + 1} \right)}} + \frac{{z_{{1 - \frac{\alpha }{2}}} }}{{3\sqrt {x + 1} }}} \right)^{3} } \right]$$
    (9)
  (c) Molenaar (1970):

    $${\text{CI:}}\left[ {\left( {x - 0.5} \right) + \frac{{\left( {2z_{{\frac{\alpha }{2}}}^{2} + 1} \right)}}{6} + z_{{\frac{\alpha }{2}}} \sqrt {\left( {x - 0.5} \right) + \frac{{\left( {z_{{\frac{\alpha }{2}}}^{2} + 2} \right)}}{18}} ;\left( {x + 0.5} \right) + \frac{{\left( {2z_{{1 - \frac{\alpha }{2}}}^{2} + 1} \right)}}{6} + z_{{1 - \frac{\alpha }{2}}} \sqrt {\left( {x + 0.5} \right) + \frac{{\left( {z_{{1 - \frac{\alpha }{2}}}^{2} + 2} \right)}}{18}} } \right]$$
    (10)
  (d) Begaud et al. (2005):

    $${\text{CI:}}\left[ {\left( {\sqrt {x + 0.02} + \frac{{z_{{\frac{\alpha }{2}}} }}{2}} \right)^{2} ;\left( {\sqrt {x + 0.96} + \frac{{z_{{1 - \frac{\alpha }{2}}} }}{2}} \right)^{2} } \right],$$
    (11)

    where \(\chi^{2}_{{\left( {n,\alpha } \right)}}\) are the quantiles of the \(\alpha\) order of the \(\chi^{2}\) distribution with \(n\) degrees of freedom.

We consider two approaches to model the magnitude distribution. The first is the most popular exponential magnitude distribution model, which results from the Gutenberg–Richter relation and reads:

$$f\left( M \right) = \beta e^{{ - \beta \left( {M - M_{ \hbox{min} } } \right)}} ;\;F\left( M \right) = 1 - e^{{ - \beta \left( {M - M_{\hbox{min} } } \right)}} \;\quad M \ge M_{\hbox{min} } ,$$
(12)

f(M) = F(M) = 0 for M < M min, \(\beta = b\ln 10\), where b is the Gutenberg–Richter constant and M min, known as the magnitude of completeness, is the lower limit of magnitude above which statistically all events are present in the analyzed sample of earthquakes. For this model, the interval estimation of its parameter is usually based on the asymptotic normality of the maximum likelihood estimator.
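For the exponential model of Eq. 12 the maximum likelihood estimator of β is \(\hat{\beta} = 1/(\bar{M} - M_{\min})\), the Aki–Utsu estimator; this form is standard in the literature rather than stated in the text above, so the following sketch of the normal-approximation interval rests on that assumption:

```python
import math
from statistics import NormalDist

def beta_mle_ci(magnitudes, M_min, alpha=0.05):
    """Asymptotic-normality CI for beta of the exponential model (Eq. 12).

    beta_hat is the Aki-Utsu maximum likelihood estimator, whose
    asymptotic standard error is beta_hat / sqrt(n).
    """
    n = len(magnitudes)
    beta_hat = 1.0 / (sum(magnitudes) / n - M_min)   # Aki-Utsu MLE
    se = beta_hat / math.sqrt(n)                     # asymptotic standard error
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return beta_hat - z * se, beta_hat + z * se
```

The interval is symmetric around the point estimate, which is adequate only for samples large enough for the normal approximation to hold.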

The second approach is applicable to multicomponent seismic processes in which the magnitude distribution does not follow the Gutenberg–Richter relation but is more complex, often multimodal. It is then proposed to use the nonparametric kernel estimation of the magnitude distribution (e.g. Lasocki et al. 2000; Kijko et al. 2001; Orlecka-Sikora and Lasocki 2005; Lasocki and Papadimitriou 2006; Lasocki 2008; Quintela-del-Rio 2010; Francisco-Fernandez et al. 2011; Francisco-Fernandez and Quintela-del-Rio 2011). The adaptive kernel estimate of the magnitude probability density function (PDF), \(f\left( M \right)\), is constructed by summing up Gaussian kernel functions:

$$\hat{f}^{a} \left( M \right) = \left\{ \begin{aligned} 0\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \;\;\quad {\text{for}}\quad M < M_{\hbox{min} } \hfill \\ \frac{{\frac{1}{{\sqrt {2\pi } }}\sum\nolimits_{i = 1}^{n} {\frac{1}{{\omega_{i} h}}\exp \left[ { - 0.5\left( {\frac{{M - M_{i} }}{{\omega_{i} h}}} \right)^{2} } \right]} }}{{n - \sum\nolimits_{i = 1}^{n} {\varPhi \left( {\frac{{M_{\hbox{min} } - M_{i} }}{{\omega_{i} h}}} \right)} }}\quad \quad{\text{for}}\quad M \ge M_{\hbox{min} } \hfill \\ \end{aligned} \right.,$$
(13)

where \(n\) is the number of events greater than or equal to \(M_{\hbox{min} }\), \(M_{i}\) are the magnitudes of these events, \(\varPhi ( \bullet )\) denotes the standard Gaussian cumulative distribution function, and \(h\) is the smoothing factor, automatically selected from the data using the least squares cross-validation technique (Silverman 1986). For the Gaussian kernel function and this \(h\) selection method, \(h\) is the root of the equation (Kijko et al. 2001):

$$\begin{aligned} \sum\limits_{i,j} {\left\{ {2^{ - 0.5} \left[ {\frac{{\left( {M_{i} - M_{j} } \right)^{2} }}{{2h^{2} }} - 1} \right]\exp \left[ { - \frac{{\left( {M_{i} - M_{j} } \right)^{2} }}{{4h^{2} }}} \right]} \right.} \hfill \\ \left. {\quad - 2\left[ {\frac{{\left( {M_{i} - M_{j} } \right)^{2} }}{{h^{2} }} - 1} \right]\exp \left[ { - \frac{{\left( {M_{i} - M_{j} } \right)^{2} }}{{2h^{2} }}} \right]} \right\} - 2n = 0. \hfill \\ \end{aligned}$$
(14)

The local bandwidth factors \(\omega_{i} , i = 1, \ldots ,n\), cause the smoothing factor to adapt to the uneven data density along the magnitude range. They are estimated as follows:

$$\omega_{i} = \left[ {\frac{{\tilde{f}\left( {M_{i} } \right)}}{g}} \right]^{ - 0.5} ,$$
(15)

where \(\tilde{f}\left( \bullet \right)\) is the pilot constant kernel estimator

$$\tilde{f}\left( M \right) = \frac{1}{{\sqrt {2\pi } }} \cdot \frac{1}{nh}\sum\limits_{i = 1}^{n} {\exp \left[ { - 0.5\left( {\frac{{M - M_{i} }}{h}} \right)^{2} } \right]} ,$$
(16)

and \(g = \left[ {\prod\nolimits_{i = 1}^{n} {\tilde{f}\left( {M_{i} } \right)} } \right]^{{\frac{1}{n}}}\) is the geometric mean of all constant kernel estimates. Such an adaptive approach improves the effectiveness of the nonparametric estimator in high magnitude intervals, where the data are sparse. The corresponding magnitude CDF estimator is:

$$\hat{F}^{a} \left( M \right) = \left\{ \begin{aligned} 0\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad {\text{for}}\quad M < M_{\hbox{min} } \hfill \\ \frac{{\sum\nolimits_{i = 1}^{n} {\left[ {\varPhi \left( {\frac{{M - M_{i} }}{{\omega_{i} h}}} \right) - \varPhi \left( {\frac{{M_{\hbox{min} } - M_{i} }}{{\omega_{i} h}}} \right)} \right]} }}{{n - \sum\nolimits_{i = 1}^{n} {\varPhi \left( {\frac{{M_{\hbox{min} } - M_{i} }}{{\omega_{i} h}}} \right)} }}\quad \quad{\text{for}}\quad M \ge M_{\hbox{min} } \hfill \\ \end{aligned} \right..$$
(17)

Further details on the nonparametric estimator and its adoption for magnitude distribution estimation are provided in Lasocki et al. (2000), Kijko et al. (2001), and Orlecka-Sikora and Lasocki (2005) and the references therein.
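Equations 15–17 can be sketched as follows; the smoothing factor h is assumed to be already available (solving the cross-validation equation, Eq. 14, would additionally require a numerical root finder), and only the Python standard library is used:

```python
import math

def _phi(x):   # standard Gaussian PDF
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x):   # standard Gaussian CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def local_bandwidth_factors(data, h):
    """Local bandwidth factors omega_i, Eqs. 15-16."""
    n = len(data)
    # Pilot constant-kernel density at each data point (Eq. 16)
    pilot = [sum(_phi((m - mj) / h) for mj in data) / (n * h) for m in data]
    g = math.exp(sum(math.log(p) for p in pilot) / n)   # geometric mean
    return [(p / g) ** -0.5 for p in pilot]             # Eq. 15

def adaptive_kernel_cdf(M, data, M_min, h, omega):
    """Adaptive kernel magnitude CDF estimator truncated at M_min, Eq. 17."""
    if M < M_min:
        return 0.0
    n = len(data)
    tail = sum(_Phi((M_min - mi) / (wi * h)) for mi, wi in zip(data, omega))
    num = sum(_Phi((M - mi) / (wi * h)) for mi, wi in zip(data, omega)) - tail
    return num / (n - tail)
```

By construction the estimate is 0 at \(M_{\min}\) and tends to 1 for large M.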

For the nonparametric modeling of the magnitude distribution we propose the iterated bias corrected and accelerated method (IBCa method) for interval estimation (Orlecka-Sikora 2004, 2008). This procedure is based on the smoothed bootstrap and second-order bootstrap samples. The algorithm begins from the so-called bias corrected and accelerated method (BCa method, Efron 1987). The BCa intervals are second-order accurate and transformation respecting (Efron 1987; Efron and Tibshirani 1998). To improve the accuracy of the magnitude CDF confidence interval estimation we use the iterated bootstrap for estimating the bias-correction parameter. According to the iterated BCa method, for any magnitude value the interval of intended coverage \(1 - 2\alpha\) of the non-parametric magnitude CDF is given by:

$$\left( {\hat{F}_{\alpha 1}^{a*} ,\hat{F}_{\alpha 2}^{a*} } \right),$$
(18)

where \(\hat{F}_{\alpha 1}^{a*}\) and \(\hat{F}_{\alpha 2}^{a*}\) are bootstrap estimated percentiles of the distribution of nonparametric magnitude CDF estimator, \(\hat{F}_{{}}^{a}\). The orders of percentiles, \(\alpha 1\) and \(\alpha 2\), are calculated from the equations:

$$\alpha 1 = \varPhi \left( {\hat{z}_{0} + \frac{{\hat{z}_{0} + z_{\alpha } }}{{1 - \hat{a}\left( {\hat{z}_{0} + z_{\alpha } } \right)}}} \right),$$
(19)
$$\alpha 2 = \varPhi \left( {\hat{z}_{0} + \frac{{\hat{z}_{0} + z_{1 - \alpha } }}{{1 - \hat{a}\left( {\hat{z}_{0} + z_{1 - \alpha } } \right)}}} \right),$$
(20)

where \(z_{\alpha }\) and \(z_{1 - \alpha }\) are percentiles of the standard Gaussian distribution, \(\hat{z}_{0}\) is the estimate of the bias-correction, and \(\hat{a}\) is the estimate of the acceleration constant. The bias-correction, \(z_{0}\), measures the discrepancy, in normal units, between the median of \(\hat{F}_{i}^{a*}\) and \(\hat{F}_{i}^{a}\). According to the IBCa method, \(\hat{z}_{0}\) is estimated as the mean value of the bootstrap estimates of \(z_{0}\), \(\hat{z}_{0}^{*}\). Each value of \(\hat{z}_{0}^{*}\) is obtained from the proportion of the second-order bootstrap CDF estimates, \(\hat{F}_{i}^{a**}\), that are less than the magnitude CDF estimated from the \(b\)-th bootstrap data sample, \(\hat{F}_{{}}^{a} \left( m \right)_{b}^{*}\), where \(b = 1,\;2,\; \ldots ,\;B\) and \(B\) is the number of the first-order bootstrap samples (Orlecka-Sikora 2008):

$$\hat{z}_{0}^{*} = \varPhi^{ - 1} \left( {\frac{{{\text{number}}\;{\text{of}}\;{\text{II}}\;{\text{order}}\;{\text{bootstrap}}\;{\text{estimates}}\;\;\hat{F}_{{}}^{a} \left( M \right)_{i}^{**} < \hat{F}_{{}}^{a} \left( M \right)_{b}^{*} }}{j}} \right),$$
(21)

where \(\varPhi^{ - 1} \left( \bullet \right)\) denotes the inverse of the standard Gaussian CDF, and \(j\) is the number of second-order bootstrap samples drawn from every bootstrap sample and used to estimate the magnitude CDF, \(\left\{ {\,\hat{F}_{{}}^{a} \left( M \right)_{i}^{**} ,\;i = 1,\;2,\; \ldots ,\;j\,} \right\}\).

The acceleration constant refers to the rate of change of the standard error of \(\hat{F}_{i}^{a}\) with respect to the actual value of magnitude CDF. The acceleration constant can be evaluated in various ways, for instance from the equation (Efron and Tibshirani 1998):

$$\hat{a} = \frac{{\sum\nolimits_{i = 1}^{n} {\left( {\hat{F}_{{_{\left( \bullet \right)} }}^{a} - \hat{F}_{{_{{\left( {\text{ijack}} \right)}} }}^{a} } \right)^{3} } }}{{6\left\{ {\sum\nolimits_{i = 1}^{n} {\left( {\hat{F}_{{_{\left( \bullet \right)} }}^{a} - \hat{F}_{{_{{\left( {\text{ijack}} \right)}} }}^{a} } \right)^{2} } } \right\}^{3/2} }},$$
(22)

where \(\hat{F}_{{_{{\left( {\text{ijack}} \right)}} }}^{a}\) denotes the \(i\)-th jackknife nonparametric estimate of magnitude CDF, and \(\hat{F}_{{_{\left( \bullet \right)} }}^{a}\) is the arithmetic mean of all jackknife estimates.
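Given the estimates \(\hat{z}_{0}\) and \(\hat{a}\), the corrected percentile orders of Eqs. 19–20 reduce to a few lines; a sketch using `statistics.NormalDist` for \(\varPhi\) and \(\varPhi^{-1}\):

```python
from statistics import NormalDist

_nd = NormalDist()

def bca_orders(z0_hat, a_hat, alpha=0.025):
    """Corrected percentile orders alpha1, alpha2 of the BCa method (Eqs. 19-20)."""
    z_lo, z_hi = _nd.inv_cdf(alpha), _nd.inv_cdf(1.0 - alpha)
    a1 = _nd.cdf(z0_hat + (z0_hat + z_lo) / (1.0 - a_hat * (z0_hat + z_lo)))
    a2 = _nd.cdf(z0_hat + (z0_hat + z_hi) / (1.0 - a_hat * (z0_hat + z_hi)))
    return a1, a2
```

With \(\hat{z}_{0} = 0\) and \(\hat{a} = 0\) the orders reduce to the plain percentile interval, α and 1 − α.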

The bootstrap samples are generated by sampling \(n\)-times with replacement from the original data set. Given a data sample \(M = \left\{ {M_{i} } \right\}\), \(i = 1,\;2,\; \ldots ,\;n\), the bootstrap sample is obtained from the formula:

$$y_{i} = M'_{i} + h \cdot \omega_{i} \cdot \varepsilon ,$$
(23)

where \(M'_{i}\) represents the results of resampling with replacement from the original data points, the smoothing factor \(h\) is estimated on the basis of the original data sample, the local bandwidth factors \(\omega_{i}\) are calculated on the basis of the original data sample for the \(M'_{i}\) values, and \(\varepsilon\) is a standard normal random variable (Silverman 1986). The \(i\)-th jackknife sample is defined as the original sample with the \(i\)-th data point removed (Efron and Tibshirani 1998).
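The smoothed bootstrap draw of Eq. 23 can be sketched as follows; h and the local bandwidth factors ω_i are assumed to have been computed from the original sample beforehand:

```python
import random

def smoothed_bootstrap_sample(data, omega, h, rng=None):
    """One smoothed bootstrap sample, Eq. 23: y_i = M'_i + h * omega_i * eps_i."""
    rng = rng or random.Random()
    sample = []
    for _ in range(len(data)):
        i = rng.randrange(len(data))  # resample with replacement
        # Perturb the resampled point with kernel-scaled Gaussian noise
        sample.append(data[i] + h * omega[i] * rng.gauss(0.0, 1.0))
    return sample
```

Repeating the call B times yields the first-order bootstrap samples; applying it again to each of those yields the second-order samples used by the IBCa method.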

To achieve a desired level of accuracy of the quantile level of the CI of the magnitude CDF, the number of bootstrap samples can be calculated using the three-step method (Andrews and Buchinsky 2002; Orlecka-Sikora 2008). Further details on the IBCa interval estimation and the justification of its use for magnitude CDF estimation in the nonparametric approach can be found in the cited works and the references therein.

3 Performance of the Algorithm

The performance of the proposed approach is studied on Monte Carlo generated seismic catalogues linked to three models of magnitude distribution. The functional form of the first two models is the one-sided truncated exponential distribution of magnitude, Eq. 12. The parameters for the simulations are b = 1.7 (\(\beta = 3.8\)), \(M_{\hbox{min} } = 1.1\) for the first model and b = 0.6 (\(\beta = 1.4\)), \(M_{\hbox{min} } = 1.0\) for the second one. An actual example of a magnitude distribution like that of the first model is the seismic sequence that occurred in connection with a geothermal well in Basel, Switzerland (e.g. Haege et al. 2012; Urban et al. 2015). The second model corresponds, for instance, to the seismicity triggered by the impoundment of the surface reservoir of the Song Tranh 2 hydropower plant in Central Vietnam (e.g. Wiszniowski et al. 2015; Urban et al. 2015). The third model is a mixture of two one-sided truncated exponential distributions, and reads:

$$f(x) = \left\{ \begin{aligned} \lambda \cdot \beta_{1} \cdot e^{{ - \beta_{1} x}} \quad {\text{for}}\quad 0 \le x \le x_{c} \hfill \\ \mu \cdot \beta_{2} \cdot e^{{ - \beta_{2} x}} \quad {\text{for}}\quad x \ge x_{c} \hfill \\ \end{aligned} \right.,$$
(24)

where \(x = M - M_{\hbox{min} }\), \(x_{c} = M_{c} - M_{\hbox{min} }\), \(M_{c}\) is the magnitude at which the break of linear scaling is observed, \(\lambda = \left\{ {1 - \left( {1 - \frac{{\beta_{1} }}{{\beta_{2} }}} \right) \cdot e^{{ - \beta_{1} x_{c} }} } \right\}^{ - 1}\), and \(\mu = \lambda \cdot \frac{{\beta_{1} }}{{\beta_{2} }} \cdot \frac{{e^{{\beta_{2} x_{c} }} }}{{e^{{\beta_{1} x_{c} }} }}\). This function models complex magnitude generation processes. The parameters for the simulation are \(b_{1} = 1.05\;(\beta_{1} = 2.42)\), \(b_{2} = 1.55\;(\beta_{2} = 3.57)\), \(M_{\hbox{min} } = 3.5\), \(M_{c} = 5.0\).
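The density of Eq. 24 together with its normalizing constants can be sketched as follows; `lam_c` stands for the constant called λ in Eq. 24, renamed here to avoid confusion with the activity rate, and the test values β₁ = 2.42, β₂ = 3.57, x_c = 1.5 follow from the simulation parameters above:

```python
import math

def mixture_pdf(x, beta1, beta2, x_c):
    """Bimodal magnitude density of Eq. 24, with x = M - M_min."""
    # Normalizing constants lambda (lam_c) and mu as defined in the text
    lam_c = 1.0 / (1.0 - (1.0 - beta1 / beta2) * math.exp(-beta1 * x_c))
    mu = lam_c * (beta1 / beta2) * math.exp((beta2 - beta1) * x_c)
    if x < 0.0:
        return 0.0
    if x <= x_c:
        return lam_c * beta1 * math.exp(-beta1 * x)
    return mu * beta2 * math.exp(-beta2 * x)
```

The choice of μ makes the density continuous at \(x_{c}\), and the choice of λ makes it integrate to 1 over \([0, \infty)\).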

From each of these model distributions we draw 50 samples of 50 elements each and 50 samples of 100 elements each. Every sample is used to estimate the cumulative distribution, \(F(M)\), and the seismic hazard functions, \(R\left( {M, D} \right)\) and \(T\left( M \right)\). The estimation is done by fitting the parametric exponential model, Eq. 12, to the data drawn from models 1 and 2 and by using the adaptive nonparametric kernel estimator, Eqs. 13–17, for the data drawn from model 3. We use mean activity rate values from the range 0.1–10 events/time unit. In this way, we can track scenarios stemming from combinations of: (a) a seismic sequence with a low activity rate, (b) a seismic sequence with a high activity rate, (c) a seismic sequence with a low value of the magnitude CDF for the specified \(M\), and (d) a seismic sequence with a high value of the magnitude CDF for the specified \(M\).

In Figs. 1, 2, 3 and 4 the exact values of the exceedance probability, \(R\left( {M, D} \right)\), and of the mean return period function, T(M), are compared with the 95% CI estimates calculated with and without inclusion of the activity rate uncertainty. The results come from one of the above-mentioned samples: a 100 event sample and a 50 event sample drawn from the model 1 and model 2 distributions, respectively, and a 100 event sample drawn from the model 3 distribution. The estimation of \(R\left( {M, D} \right)\) has been performed for magnitude \(M_{p} = 2.0\) in models 1 and 2 and for \(M_{p} = 4.5\) in model 3.

Fig. 1

Exact values (solid red) and 95% CI-s estimated with (dashed blue) and without (dotted black) the inclusion of the mean activity rate uncertainty of a the exceedance probability of events of magnitude \(M_{p} = 2.0\) and b the mean return period function. The results have been obtained for a 100 event sample drawn from the model 1 distribution. The mean activity rate has been assumed as 10 events/arbitrary unit

Fig. 2

Exact values (solid red) and 95% CI-s estimated with (dashed blue) and without (dotted black) the inclusion of the mean activity rate uncertainty of a the exceedance probability of events of magnitude \(M_{p} = 2.0\) and b the mean return period function. The results have been obtained for a 50 event sample drawn from the model 2 distribution. The mean activity rate has been assumed as 0.1 events/arbitrary unit

Fig. 3

Exact values (solid red) and 95% CI-s estimated with (dashed blue) and without (dotted black) the inclusion of the mean activity rate uncertainty of the exceedance probability of events of magnitude \(M_{p} = 4.5\). The results have been obtained for a 50 and b 100 event sample drawn from the model 3 distribution. The mean activity rate has been assumed as 2.1 and 3 events/arbitrary unit for a and b, respectively

Fig. 4

Exact values (solid red) and 95% CI-s estimated with (dashed blue) and without (dotted black) the inclusion of the mean activity rate uncertainty of the mean return period function. The results have been obtained for a 50 and b 100 event sample drawn from the model 3 distribution. The mean activity rate has been assumed as 2.1 and 3 events/arbitrary unit for a and b, respectively

Figures 5, 6, 7 and 8 show the relative disparity of mean upper/lower bound of 95% CI of the exceedance probability when assuming an aggregated uncertainty of the activity rate and magnitude CDF and when accounting only for CDF uncertainty. The disparity is evaluated by:

$$\delta_{U/L} \left( {M_{p} } \right) = \frac{{\bar{R}_{U/L} \left( {M_{p} ,D} \right) - \bar{R}_{U/L}^{B} \left( {M_{p} ,D} \right)}}{{\bar{R}_{U/L}^{B} \left( {M_{p} ,D} \right)}},$$
(25)

where \(\bar{R}_{U/L}^{B} \left( {M_{p} ,D} \right)\) is the mean of 50 estimates of the upper/lower bound of 95% CI of exceedance probability when assuming the aggregated uncertainty, and \(\bar{R}_{U/L} \left( {M_{p} ,D} \right)\) is this mean when the mean activity rate estimate is assumed to be error free.
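Equation 25 is a plain relative difference between the two mean bounds; a trivial sketch (the bound values in the usage comment are illustrative):

```python
def relative_disparity(R_bar, R_bar_aggregated):
    """delta_U/L of Eq. 25: relative difference between the mean CI bound
    obtained without (R_bar) and with (R_bar_aggregated) the inclusion of
    the mean activity rate uncertainty."""
    return (R_bar - R_bar_aggregated) / R_bar_aggregated

# e.g. relative_disparity(0.55, 0.5) gives a +10% disparity of the bound
```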

Fig. 5

The relative disparity between the mean 95% CI-s of exceedance probability estimated with and without inclusion of the mean activity rate uncertainty. Red lines correspond to the upper bound and blue lines to the lower bound of CI-s. The calculations have been done for \(M_{p} = 3.5\), D = 12 arbitrary units and for λ ranging from 0.1 to 10. The a 50 and b 100 element magnitude samples have been drawn from model 1 of magnitude distribution with parameters: \(\beta = 3.8\), \(M_{\hbox{min} } = 1.1\)

Fig. 6

The relative disparity between the mean 95% CI-s of exceedance probability estimated with and without inclusion of the mean activity rate uncertainty. Red lines correspond to the upper bound and blue lines to the lower bound of CI-s. The calculations have been done for \(M_{p} = 3.0\) (a) and for \(M_{p} = 2.0\) (b), for D = 12 arbitrary units and λ ranging from 0.1 to 10. The 50 element magnitude samples have been drawn from model 1 of magnitude distribution with parameters: \(\beta = 3.8\), \(M_{\hbox{min} } = 1.1\)

Fig. 7

The relative disparity between the mean 95% CI-s of exceedance probability estimated with and without inclusion of the mean activity rate uncertainty. Red lines correspond to the upper bound and blue lines to the lower bound of CI-s. The calculations have been done for \(M_{p} = 2\), D = 12 arbitrary units and λ ranging from 0.1 to 10. The a 50 and b 100 element magnitude samples have been drawn from model 2 of magnitude distribution with parameters: \(\beta = 1.4\), \(M_{\hbox{min} } = 1\)

Fig. 8

The results for the induced seismicity episode from the G11/8 panel in Rudna Mine. a Time changes of the estimated exceedance probability, \(R(M_{p}, D)\), for \(M_{p} = 3.0\) and D = 30 days calculated in a moving time window of 100 events advancing by 1 event. b The mean return period estimates for the time window No 20. The mean activity rate for this window is 0.36 events/day. The solid green lines represent the point estimates, the blue dashed lines represent the 95% CI estimates when the mean activity rate uncertainty has been taken into account and the black dotted lines represent the 95% CI estimates when the mean activity rate uncertainty has been neglected

The analysis shows that the uncertainty of the mean activity rate significantly affects the interval estimates of the hazard functions only when the product λD is small, that is, when the activity rate is small and the inference does not concern a very long time period D. With increasing λ, the impact of the magnitude CDF uncertainty dominates the impact of the λ uncertainty in the aggregated uncertainty of the hazard functions, and the interval estimates with and without inclusion of the λ uncertainty converge. This agrees with and stems from the functional forms of the hazard functions: for larger M, 1 − F(M) tends to zero and, for moderate λ and D, it dominates λD in the exponent of R(M, D) (see Eq. 4).

When the λ uncertainty effect is significant, neglecting it results in an underestimation of the upper bound of the CI of \(R(M_{p} ,D)\) and an overestimation of its lower bound. Increasing the sample size reduces the level of this misestimation. For the same sample size we observe that the effect of the λ uncertainty becomes greater for smaller magnitudes. This is due to the smaller magnitude CDF uncertainty at smaller magnitudes and hence a reduction of its share in the total uncertainty due to both factors, λ and F(M) (Figs. 5a, 6a, b).

4 Practical Examples

The two considered approaches to the CI estimation of hazard functions have been applied to two actual sets of earthquakes related to anthropogenic seismicity accompanying, respectively, (1) the underground exploitation of copper ore in the Legnica-Głogów Copper District (LGCD) in Poland (Orlecka-Sikora et al. 2012) and (2) the impoundment of the Song Tranh 2 reservoir in Vietnam (Wiszniowski et al. 2015; Urban et al. 2015).

The first dataset, from LGCD, is associated with mining exploitation in section G-11/8 of the Rudna mine. An in-mine seismic monitoring system records all events of magnitude 1.2 and above from this section. Mining works in section G-11/8 began in 2002 and continue to the present. In this study, 242 seismic events that occurred in the period from 2.01.2004 to 30.12.2005 are analyzed. The strongest tremors, of local magnitudes 3.7 and 3.5, took place on 7.01 and 20.01.2005, respectively. The b-value for this dataset is very low, 0.32, and the mean activity rate is 0.3 events/day. Detailed analyses of the empirical frequency–magnitude relations of the seismicity from the LGCD area revealed that the magnitude distribution did not follow the Gutenberg–Richter relation but had a complex structure (e.g. Orlecka-Sikora and Lasocki 2005; Lasocki and Orlecka-Sikora 2008; Orlecka-Sikora 2008). In such cases, the nonparametric kernel estimator is used to estimate the magnitude CDF. We calculate point and interval estimates of R(3.0, 30) and T(M) in a moving time window of 100 events advancing by one event. For each time window 10,000 bootstrap replicas of the data in the window are used to evaluate the 95% CI of the mentioned hazard functions. The final results of the analysis are shown in Fig. 8. The presented T(M) estimates have been obtained for the window No 20.

The Song Tranh 2 dam, the basis for the second practical example, is located on the Song Tranh River in Quang Nam province in central Vietnam. The dam was built as part of a hydropower plant. Filling of the reservoir started in November 2010. By the beginning of 2011, the seismic activity in this area had increased significantly. The two strongest earthquakes, of magnitudes 4.6 and 4.7, took place on 22 October and 15 November 2012, respectively. The seismic activity continues to the present. We analyze a set of 822 earthquakes recorded from 1.09.2012 to 10.11.2014. The range of magnitudes is [1.0; 4.7] and the set is complete. The b-value from the whole set is 0.82; however, Urban et al. (2015) ascertained a statistically highly significant deviation of the observed magnitude distribution from the Gutenberg–Richter related exponential model, Eq. 12. Therefore, we also use the nonparametric approach to estimate the seismic hazard functions and their uncertainties. We calculate the estimates in a moving time window comprising 200 events and advancing by 10 events. The mean activity rate varies between the windows within the range of 0.6–3.5 events/day. We calculate point and interval estimates of R(3, 7 days) and T(M). For each time window 10,000 bootstrap replicas of the data in the window are used to evaluate the 95% CI of the hazard functions. The results are shown in Fig. 9. The presented mean return period estimates have been obtained for the window No 5.

Fig. 9

The results for the induced seismicity episode from Song Tranh 2 reservoir. a Time changes of the estimated exceedance probability, \(R(M_{p}, D)\) for \(M_{p} = 3.0\) and D = 30 days, calculated in a moving time window of 200 events advancing by 10 events. b The mean return period estimates for the time window No. 5. The mean activity rate for this window is 1.36 events/day. The solid green lines represent the point estimates, the blue dashed lines represent the 95% CI estimates when the mean activity rate uncertainty has been taken into account, and the black dotted lines represent the 95% CI estimates when the mean activity rate uncertainty has been neglected

The first observation drawn from Figs. 8 and 9 is that in both considered cases the exceedance probability varies considerably in time. In the example from the Rudna mine, during the first 60 windows this probability, that is the seismic hazard, was much higher than during windows 70 to 120. In the Song Tranh 2 case the hazard was initially quite high, then steadily decreased until the 33rd time window, and increased again within the period of the last 16 windows.

Second, the confidence intervals are generally wide; that is, the uncertainty of the hazard function estimates is considerable. For instance, in the Rudna mine case, the point estimate of the exceedance probability of M3 events in a month is 0.4 for window No. 58, and the 95% CI is [0.01, 0.65]. For the Song Tranh 2 case, in window No. 1 we have 0.37 for the point estimate and [0.18, 0.52] for the 95% CI of the exceedance probability of M3 events in a month. These results underline the need for interval estimation of hazard functions, illustrating how much one can be misled about the hazard when only point estimates are at hand.

Third, there are no significant differences between the 95% CI estimates that include and those that neglect the uncertainty of the mean activity rate, λ. The only visible differences are small: the biggest, reaching 12 per cent of \(R(M_{p} = 3, D = 30)\) in the Rudna mine case, occur in the time periods when the mean activity rate is lowest, 0.22–0.28 events per day.
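This behaviour admits a first-order (delta-method) explanation, which is a sketch rather than part of the study's derivation; writing \(\theta = 1 - F(M_{p})\) for the exceedance probability of the magnitude itself,

$$R\left( {M_{p} ,D} \right) = 1 - e^{ - \lambda D\theta } ,\quad \frac{\partial R}{\partial \lambda } = D\theta \,e^{ - \lambda D\theta } .$$

For \(\lambda D\theta \ll 1\), \(R \approx \lambda D\theta\), so the relative error of the λ estimate transfers one-to-one into R. For larger \(\lambda D\theta\) the sensitivity to λ decays exponentially as R approaches 1, so neglecting the λ uncertainty barely changes the width of the CI.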

5 Conclusions

We have presented a way to integrate the uncertainty of the mean activity rate and magnitude CDF estimates into the interval estimation of the most widely used seismic hazard functions, namely the exceedance probability, R(M, D), and the mean return period, T(M). The proposed algorithm can be used in both situations, either when a parametric model of the magnitude distribution is accepted or when nonparametric estimation is in use. The performance of this algorithm, and the changes resulting from this integrated approach with respect to the approach that neglects the uncertainty of the mean activity rate estimate, have been studied on synthetic and actual datasets. The following conclusions can be drawn:

  1.

    Assuming that earthquake occurrences are governed by the Poisson distribution, the algorithm deals with the uncertainty of seismic hazard functions, which depend on the magnitude distribution and the Poisson mean activity rate, both of which are uncertain. However, the algorithm is generic, and hence can also be applied to capture the propagation of the uncertainty of parameter estimates of any multiparameter function onto that function.

  2.

    Taking the uncertainty of the mean activity rate into account in the interval estimation of hazard functions makes a difference only when the product λD is small, about 5.0 or less. In such cases, the CI of the considered seismic hazard functions should be estimated capturing the uncertainty of both of their random components: the mean activity rate and the magnitude CDF.

  3.

    When λD is larger, the impact of the uncertainty of the magnitude CDF dominates the confidence intervals of the hazard functions. This results from the particular forms of the hazard functions and hence is specific to these functions. In such cases, the uncertainty of λ can be safely neglected.

  4.

    In all cases, the variance of the hazard function estimates, resulting from the variance of the estimates of their components, is significant. Further developments of PSHA should aim at including this source of uncertainty in seismic hazard assessments.