1 Introduction

Ground motion models (GMMs) are a key component of a variety of scientific and engineering products, from seismic hazard and risk analyses, to shakemaps and magnitude scales. GMMs are developed using empirical data, either through direct regression, or in the case of simulation-based models, through calibration to recorded data. The quality of the underlying ground motion dataset is therefore of significant importance. The main issues that may reduce the quality of ground motion records are the instrument and datalogger (i.e. the record’s fidelity), and the background noise. The use of modern instrumentation, with broadband flat sensitivity, high-resolution dataloggers and reliable data transmission means that the main constraint on the usability of an earthquake ground motion record now lies with background noise. A great deal of attention has been paid to the processing of earthquake records in strong ground motion datasets to avoid or minimise the impact of noise on response spectra at long periods (Akkar and Bommer 2006) and short periods (Douglas and Boore 2011). However, little attention has been paid to the impact of noise on weak-motion records, which are increasingly utilised in studies developing application- and location-specific GMMs (e.g. Bommer et al. 2016; Novakovic et al. 2018; Edwards et al. 2021).

The recent increase in induced seismicity related to emerging industries, such as hydraulic fracturing, geothermal energy and CO2 sequestration, is of particular global concern, especially within the context of the transition to low-carbon economies. Induced seismicity tends to be of smaller magnitude, but it often occurs at shallow depths and in close proximity to urban areas. This means that smaller magnitude events contribute to the seismic hazard and risk of local population centres. Furthermore, there is increasing recognition that ground motions are regionally variable, particularly for small to moderate earthquakes occurring at shallow depths (Douglas and Edwards 2016). As a result, engineers must either modify existing models, or develop local GMMs using regional data or, better still, data local to the target site. This spatial limitation means that, if uncertainty and biases in the developed models are to be reduced, the available motions will inevitably be of smaller amplitude.

Seismic noise is a continuous, but variable, vibration with various sources. At low frequency, the microseism dominates and is related to natural phenomena such as ocean waves (Webb 1992). At high frequency, sources of seismic noise tend to be anthropogenic, owing to their lower propagation efficiency. Monitoring instruments in close proximity to urban environments are therefore susceptible to both low- and high-frequency disturbances. Cauzzi and Clinton (2013) and Peterson (1993) provide an overview of seismic noise and develop models for ‘high’ and ‘low’ noise cases. In terms of the impact of seismic noise on ground motion time-histories or, equivalently, Fourier spectra, an unambiguous assessment is possible by comparing earthquake records with ‘noise records’. Noise records are typically taken from the immediate pre-event time-history (such that transient noise at the time of recording may be captured), but equally an estimate could be reconstructed based on the high- and low-noise models (Cauzzi and Clinton 2013; Peterson 1993). The ratio of signal (plus noise) to noise Fourier spectral amplitudes, typically referred to as the signal-to-noise ratio (SNR), provides a useful measure of signal contamination. Thresholds above three are usually considered suitable, but this varies between applications, with authors typically striking a balance between data quality and quantity.

The impact of noise on response spectral ordinates such as pseudo-spectral acceleration (PSA), which form the basis of seismic hazard and risk analyses, is more difficult to quantify than for Fourier amplitude spectra (FAS). This is because of the non-linear transform between the Fourier domain (representing the signal amplitude at a given signal frequency) and response spectral amplitudes (representing peak motions of an oscillator with characteristic period). Bora et al. (2016) show that this transform results in a roughly linear relationship between low Fourier frequencies and long oscillator periods (i.e. FAS(f = 1/T) ∝ PSA(T) for T > 0.5 s), but quickly becomes non-linear as periods reduce. At long periods, Akkar and Bommer (2006) showed that a usability limit of Tmax = [0.7 to 0.97]/fl was required to avoid the impact of long-period noise on PSA, with fl (the minimum uncontaminated signal frequency) defined, for example, by an SNR threshold.

On the other hand, Bora et al. (2016) show that the shortest oscillator periods of engineering interest (e.g. 0.01–0.1 s) are driven by motions with longer Fourier periods than those of the corresponding response-spectrum oscillator period. In fact, noise present in the time series at very short periods, such as 0.01–0.03 s does not necessarily affect the corresponding response spectrum at all, even in that specific period range. For instance, PSA at T = 0.01 s (often assumed equivalent to PGA) is typically related to ground motions at 20–30 Hz or lower, and nowhere near the 100 Hz implied by the reciprocal of the oscillator period. This was explored in detail by Douglas and Boore (2011), who concluded, through simulations of moderate to large events, that contaminating records with high-frequency noise had a negligible impact on their response spectra. This is frequently used as justification to ignore high-frequency noise and Tmin when dealing with PSA.

Douglas and Boore (2011) investigated records for earthquakes typically found in strong ground motion datasets (M > 4.5) and, furthermore, made use of site conditions representative of such records, typically soil or stiff-soil sites. Both the moderate to high magnitude of events and the relatively high damping result in records with Fourier spectral amplitudes naturally lacking in high-frequency content. The records used by Douglas and Boore (2011) therefore exhibited low source corner frequencies (f0 < 1 Hz) and strong exponential decay at high frequency due to damping. For application to smaller events (weak-motion data), with higher source corner frequencies, or to records from rock or hard rock sites with low damping, we must consider signals with very different spectral content to those investigated by Douglas and Boore (2011). As noted previously, in such cases, the degree to which short oscillator period PSA is driven by longer period motions reduces and the conclusions of Douglas and Boore (2011) may, therefore, not be transferable to weak-motion data.

It is the aim of this study to investigate the impact of high-frequency noise on the response spectrum and propose a robust workflow for defining the usable bandwidth of both FAS and PSA from weak-motion records. For clarity, throughout this manuscript, we refer to PSA in terms of oscillator period, T, and FAS in terms of signal frequency, f, as per convention. A parametric model for the lowest usable (uncontaminated) period, Tmin, is initially developed using simulations that account for the way in which Fourier spectral shape controls the impact of noise on a record’s 5% damped response spectrum. The model is then used in direct application to investigate the impact of high-frequency noise on response spectral ordinates from weak-motion records of induced seismicity in the Groningen Gas Field, the Netherlands. The database consists of 803 triaxial recordings from events between 2006 and 2020 with local magnitudes ranging from ML 2.5 to ML 3.6 and is being used in the framework of the Groningen gas field hazard and risk analyses (van Elk et al. 2017) to develop a GMM (Bommer et al. 2017). The recordings are from high-quality digital accelerographs at 98 sites belonging to the B- and G-networks of the Royal Netherlands Meteorological Institute (KNMI; see Ntinalexis et al. 2019; Dost et al. 2017; KNMI 1993). By virtue of the small magnitude of the events, the recordings contain small-amplitude motions, with as-recorded horizontal PGA values ranging from 0.068 cm/s2 (7 × 10−5 g) to 108.68 cm/s2 (0.11 g).

2 Noise and its impact on FAS and PSA

The assessment of noise and its impact in the Fourier domain is relatively straightforward. Typically, pre-event noise samples are taken and compared with the record in the Fourier domain. It is important to account for differences in signal duration when sampling the time-history for noise. Authors sometimes ensure that both the earthquake time-history and the noise time-history are of equal length, but this is not always possible. In this case, the noise FAS should be scaled by the square root of the ratio of the earthquake time-history duration to the noise time-history duration (after Parseval’s theorem) to provide FAS amplitudes that are consistent (i.e. corresponding to equivalent signal lengths). Due to the characteristic ‘trapezoidal’ shape of the earthquake acceleration spectrum, it stands out over the broadly flat noise floor within the passband that can be considered acceptable (Fig. 1). Lower (fl) and upper (fu) usable frequency limits are therefore clearly identifiable.
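As a minimal illustration of the duration correction described above, the following sketch scales a noise FAS to the length of the earthquake window and then forms the SNR. The function names and the representation of spectra as plain lists are assumptions made for this example only.

```python
import math


def scale_noise_fas(noise_fas, signal_duration_s, noise_duration_s):
    """Scale noise FAS amplitudes to the duration of the earthquake window.

    The factor is the square root of the ratio of the earthquake window
    duration to the noise window duration.
    """
    factor = math.sqrt(signal_duration_s / noise_duration_s)
    return [amplitude * factor for amplitude in noise_fas]


def signal_to_noise(signal_fas, noise_fas):
    """Ratio of signal (plus noise) FAS to duration-consistent noise FAS."""
    return [s / n for s, n in zip(signal_fas, noise_fas)]


# Hypothetical values: a 40 s earthquake window and a 10 s pre-event window.
noise = scale_noise_fas([1.0, 2.0], signal_duration_s=40.0, noise_duration_s=10.0)
snr = signal_to_noise([6.0, 4.0], noise)
```

With a 40 s earthquake window and a 10 s noise window, the noise amplitudes are doubled before the ratio is taken, so an uncorrected SNR would be overestimated by a factor of two.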

Fig. 1

Example simulation for the Groningen GMM (with κ0 = 0.01 s). Top: FAS of signal (black) and noise (red). Middle: SNR (black) and threshold (magenta). Green-dashed lines show the lower and upper usable frequencies. Bottom: the corresponding synthetic time-history

As noted earlier, for the response spectrum (PSA), the correlation with FAS amplitudes at corresponding oscillator periods decreases dramatically as signal frequencies increase above the record’s apparent corner frequency (roughly the peak of the FAS) (Bora et al. 2016). While at long periods we can therefore assume a correspondence of the minimum usable frequency of the record’s FAS (fl) and the maximum usable period of the record’s response spectrum (i.e. Tmax = [0.7 to 0.97]/fl, after Akkar and Bommer 2006), at high signal frequencies and short oscillator periods, we cannot make this assumption.

In order to assess the impact of high-frequency noise on PSA, we initially work with synthetic data. This allows us an unambiguous definition of the true signal amplitude and corresponding uncontaminated response spectrum, which is not afforded with real data. Time-domain stochastic simulations have been performed using EXSIM (Motazedian and Atkinson 2005) as modified by Boore (2009). Simulations have been performed using the GMMs for:

  i) the Groningen gas field at a buried rock horizon (Vs30 = 1400 m/s), as detailed in Edwards et al. (2019), and

  ii) Eastern North America (ENA, Atkinson and Boore 2006).

For the Groningen simulations, we investigate the impact of damping by varying the simulation parameter κ0 (Anderson and Hough 1984), using κ0 = 0.01, 0.03 and 0.05 s (roughly equivalent to damping expected at competent rock outcrops through to low Vs soil site conditions). The ENA GMM specifically allows the investigation of weakly damped motions, with a very hard-rock site condition (κ0 = 0.005 s) implicit in the GMM. In both cases, the GMMs are calibrated against local empirical data in the magnitude range of interest for this study (approx. ML < 4), and the simulations can therefore be considered to be representative, yet diverse, in terms of amplitude and frequency content, of real earthquake records.

Noise-free acceleration time series from earthquakes with moment magnitudes from 1.0 to 6.0 (in 0.5 unit increments) are simulated at 20 log-spaced Joyner-Boore distances from 0.1 to 60 km (Fig. 1). Noise is subsequently applied to the simulations in increasing amplitude until the signals are completely lost. We use two noise forms: (i) white noise and (ii) the noise model of Cauzzi and Clinton (2013). The white noise is generated in the time domain and defined by a normal distribution with zero mean and standard deviations of 0.01, 0.1, 1, 10 and 100 cm/s2. The higher values are not intended to reflect typical noise levels, but to ensure that all records are affected by noise. The model of Cauzzi and Clinton (2013) is used to consider a realistic high-noise scenario, using their high-noise power-spectrum model converted from dB/Hz to absolute units of spectral acceleration. We scale those amplitudes incrementally by factors of 0.1, 0.2, 0.5, 0.75 and 1.0, to generate a realistic stochastic-phase noise time-history that is added to the noise-free simulation in the time domain. With this approach, we retain acceleration time-histories for the noise-free simulation, the background noise and the contaminated ‘noisy’ simulation, with the latter referred to in the following as the ‘synthetic’ time series.
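The white-noise contamination step can be sketched as below. This is an illustration under stated assumptions rather than the simulation code used in the study: the function name, the use of Python's standard random generator and the fixed seed are all choices made for this example.

```python
import random

# Standard deviations of the white noise, in cm/s2, as listed in the text.
NOISE_SIGMAS_CM_S2 = [0.01, 0.1, 1.0, 10.0, 100.0]


def add_white_noise(accel_cm_s2, sigma_cm_s2, seed=0):
    """Return (noise, synthetic) for a noise-free acceleration series.

    The noise is zero-mean Gaussian with standard deviation sigma_cm_s2 and
    is added sample by sample in the time domain. The seed is an assumption
    of this sketch, used only to make the example repeatable.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma_cm_s2) for _ in accel_cm_s2]
    synthetic = [a + n for a, n in zip(accel_cm_s2, noise)]
    return noise, synthetic


# Hypothetical noise-free samples, contaminated at each noise level in turn.
clean = [0.0, 1.0, -2.0, 0.5]
contaminated = {s: add_white_noise(clean, s)[1] for s in NOISE_SIGMAS_CM_S2}
```

Keeping the noise series alongside the synthetic series mirrors the approach in the text, where the noise-free, noise-only and contaminated time-histories are all retained.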

From each of the synthetic time series’ FAS, various measures are determined to allow investigation of the impact of the noise level:

  • fu: the upper usable FAS frequency in Hz (defined by a signal to noise ratio of 3);

  • fpeak: the frequency at the FAS peak;

  • Apeak: the natural logarithm FAS amplitude at fpeak;

  • Au: the natural logarithm FAS amplitude at fu;

  • ΔA: the amplitude difference, Apeak − Au;

  • Δf: the frequency difference, fu − fpeak.
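The measures listed above can be extracted from a sampled FAS along the following lines. This is a minimal sketch: the function name, the dictionary keys and the choice of taking Au at the frequency sample closest to fu are assumptions of the example, and fu is supplied by the caller.

```python
import math


def spectral_measures(freqs_hz, fas, fu_hz):
    """Return fpeak, Apeak, Au and their differences for a sampled FAS."""
    # Index of the FAS peak.
    i_peak = max(range(len(fas)), key=lambda i: fas[i])
    # Index of the frequency sample closest to the upper usable frequency.
    i_u = min(range(len(freqs_hz)), key=lambda i: abs(freqs_hz[i] - fu_hz))
    a_peak = math.log(fas[i_peak])  # natural-log amplitude at fpeak
    a_u = math.log(fas[i_u])        # natural-log amplitude at fu
    return {
        "f_peak": freqs_hz[i_peak],
        "A_peak": a_peak,
        "A_u": a_u,
        "delta_A": a_peak - a_u,
        "delta_f": freqs_hz[i_u] - freqs_hz[i_peak],
    }


# Hypothetical spectrum with its peak at 2 Hz and fu = 8 Hz.
measures = spectral_measures(
    [1.0, 2.0, 4.0, 8.0, 16.0],
    [1.0, math.exp(2.0), math.exp(1.0), 1.0, 0.5],
    fu_hz=8.0,
)
```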

We measure Tmin by determining the lowest period at which the 5% damped response spectrum of the acceleration time-history is unaffected by noise. This is defined as the period down to which the response spectrum of the synthetic time-history remains within a 5% tolerance of the true value (Fig. 2). This is a conservative estimate, as PSA at shorter periods than the subsequently defined Tmin may return to within the defined threshold. Our observations show that while this is often the case, PSA then tends to fluctuate within and outside the acceptable tolerance level at periods below Tmin (Fig. 2). Low-pass frequency filtering of the time series at, or around, fu has a severe impact on the PSA for weak-motion data and, as such, should not be used. It is clear that unfiltered (or high-pass frequency filtered: band-pass f > fl) time series allow calculation of PSA to periods well below 1/fu. In fact, at the 5% tolerance level, the PSA from both these cases in Fig. 2 (from the simulation shown in Fig. 1) only just fail, with most PSA amplitudes being within ~10% of the true values. On the other hand, the low-pass frequency filtered time series result in up to 50% underestimation of PSA.
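Given response spectra that have already been computed for the synthetic and noise-free time series, the measurement of Tmin can be sketched as follows. The function name and the list-based inputs are assumptions of this example; the response spectrum calculation itself is outside its scope.

```python
def measure_tmin(periods_s, psa_synthetic, psa_noise_free, tolerance=0.05):
    """Shortest period down to which PSA stays within the tolerance.

    Periods are scanned from longest to shortest. The scan stops at the first
    period where the ratio of synthetic to noise-free PSA leaves the
    +/- tolerance band, so later returns inside the band are ignored
    (the conservative definition). Returns None if even the longest period
    is outside the band.
    """
    order = sorted(range(len(periods_s)), key=lambda i: periods_s[i], reverse=True)
    tmin = None
    for i in order:
        ratio = psa_synthetic[i] / psa_noise_free[i]
        if abs(ratio - 1.0) > tolerance:
            break
        tmin = periods_s[i]
    return tmin


# Hypothetical spectra: the ratio leaves the band at 0.02 s and returns at 0.01 s.
periods = [0.01, 0.02, 0.05, 0.1, 1.0]
tmin = measure_tmin(periods, [1.02, 1.2, 1.04, 1.0, 1.0], [1.0] * 5)
```

In this hypothetical case the result is 0.05 s, even though the ratio at 0.01 s is back inside the band, which reflects the conservative choice described in the text.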

Fig. 2

(Top) 5% damped response spectra corresponding to the unfiltered synthetic and filtered data (with legend indicating the frequency band-pass), in addition to PSA from the noise-free time series (the target) in Fig. 1. (Bottom) Ratios of PSA (as top) with respect to the noise-free target. Tmin (dashed vertical line) is defined as the period at which the unfiltered PSA first deviates (with decreasing period) from the ± 5% tolerance level (horizontal dashed lines).

Figure 3 shows the suite of measured fu and Tmin from simulations using the Groningen GMMs with alternative damping (κ0), along with the ENA GMM. Clearly, as expected, for signals with higher fu, we obtain PSA with smaller Tmin. However, the spectral shape has a significant impact on the usability of the response spectra: low κ0 (weakly damped) records require much higher values of fu to maintain usability of PSA down to 0.01 s. In terms of estimating Tmin from the time series, we therefore require knowledge of not only the usable FAS bandwidth (i.e. fu, which can be directly measured from field records), but also the spectral shape. In the following, we therefore develop a model for determining Tmin that accounts for spectral shape by using easily measured characteristics of a waveform and its FAS.

Fig. 3

Plot of measured Tmin versus fu for simulations from the three Groningen and ENA GMMs

First, we define an adjusted upper usable FAS frequency (\( {f}_u^{\ast } \)) in Hz. The adjustment normalises fu to that expected, given the same noise and peak-signal amplitude, for a signal with a reference damping, defined by κref = 0.03 s. It therefore results in a predictor for Tmin that is unbiased. \( {f}_{\mathrm{u}}^{\ast } \) is given by:

$$ f_{\mathrm{u}}^{\ast} = f_{\mathrm{u}} \cdot \max\left[ 0.4,\ \exp\left\{ f_{\mathrm{u}} \left[ -0.25 \log_{\mathrm{e}}\left( \kappa_{\mathrm{ref}} + 0.005 \right) - 0.17 \right] \left( \frac{\Delta A}{\pi \Delta f} - \left( \kappa_{\mathrm{ref}} + 0.005 \right) \right) \right\} \right] $$
(1)
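A direct transcription of Eq. 1 into code is sketched below. The function and argument names are assumptions of this example; the constants are those of Eq. 1, with κref = 0.03 s as the default. The quantity ΔA/(πΔf) is labelled here as an apparent high-frequency decay, which is our reading of the term rather than terminology taken from the text.

```python
import math


def adjusted_fu(fu_hz, delta_a, delta_f_hz, kappa_ref_s=0.03):
    """Adjusted upper usable FAS frequency (Eq. 1), in Hz.

    delta_a is the natural-log amplitude difference Apeak - Au and
    delta_f_hz is the frequency difference fu - fpeak.
    """
    kappa_term = kappa_ref_s + 0.005
    slope = -0.25 * math.log(kappa_term) - 0.17
    apparent_decay = delta_a / (math.pi * delta_f_hz)
    factor = math.exp(fu_hz * slope * (apparent_decay - kappa_term))
    # The adjustment factor is not allowed to fall below 0.4.
    return fu_hz * max(0.4, factor)


# Hypothetical record whose spectral decay matches the reference damping:
# the exponent vanishes and fu is returned unchanged.
unchanged = adjusted_fu(20.0, delta_a=math.pi * 10.0 * 0.035, delta_f_hz=10.0)
# Hypothetical weakly damped record with no decay between fpeak and fu.
reduced = adjusted_fu(20.0, delta_a=0.0, delta_f_hz=10.0)
```

The two hypothetical calls show the intended behaviour: a record whose decay matches the reference term is left unadjusted, while a flatter (weakly damped) spectrum has its fu reduced.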

Figure 4 shows the resulting \( {f}_{\mathrm{u}}^{\ast } \) for the three alternative Groningen GMM simulations (κ0 = 0.01, 0.03 and 0.05 s). Note that \( {f}_{\mathrm{u}}^{\ast } \) values for the κ0 = 0.03 s simulations (the same as our selected reference, κref) are broadly consistent with measured fu. For the κ0 = 0.01 s simulations, \( {f}_{\mathrm{u}}^{\ast } \) are systematically reduced with respect to fu, while for the strongly damped κ0 = 0.05 s simulations, \( {f}_{\mathrm{u}}^{\ast } \) are higher than measured fu. Note that the downward adjustment in Eq. 1 is limited to a factor of 0.4 (i.e. \( {f}_{\mathrm{u}}^{\ast } \) is never less than 0.4 fu), a value chosen by trial and error after observing over-correction of very weakly damped (low κ0) signals.

Fig. 4

Comparison of measured (fu) and adjusted upper FAS frequencies \( {f}_{\mathrm{u}}^{\ast } \) according to Eq. 1

Figure 5 compares the original fu (as Fig. 3) and adjusted \( {f}_{\mathrm{u}}^{\ast } \) against Tmin. The use of \( {f}_{\mathrm{u}}^{\ast } \) clearly removes the dependence of the correlation on spectral shape. Based on \( {f}_{\mathrm{u}}^{\ast } \) and Tmin for the Groningen GMM simulations (Fig. 5), a best-estimate lower usable period, \( \overline{T_{\mathrm{min}}} \) (in seconds), within an acceptable tolerance is defined by:

$$ \overline{T_{\min}} = \begin{cases} \max\left( 0.01,\ \exp\left( a_2 + a_1 \log_{\mathrm{e}}\left( f_{\mathrm{u}}^{\ast} \right) \right) \right) & f_{\mathrm{u}}^{\ast} < a_3 \\ 0.01 & f_{\mathrm{u}}^{\ast} \ge a_3 \end{cases} $$
(2)

where a3 is the log-mean \( {f}_{\mathrm{u}}^{\ast } \) for 0.01 < Tmin < 0.02 s, and a1 and a2 are determined through log-linear regression of \( {f}_{\mathrm{u}}^{\ast } \) versus Tmin. Bounds on Tmin are then given by introducing a scaling factor, c, on \( {f}_{\mathrm{u}}^{\ast } \):

$$ T_{\min} = \begin{cases} \exp\left( a_2 + a_1 \log_{\mathrm{e}}\left( \frac{f_{\mathrm{u}}^{\ast}}{c^n} \right) \right) & \frac{f_{\mathrm{u}}^{\ast}}{c^n} < a_3 \\ 0.01 & \frac{f_{\mathrm{u}}^{\ast}}{c^n} \ge a_3 \end{cases} $$
(3)

with the factors c = 1.113 (upper bound) and c = 1/1.113 (lower bound) designed to encapsulate the data (where 0.01 < Tmin < 0.02 s) at n standard deviations of \( {\log}_{\mathrm{e}}\left({f}_{\mathrm{u}}^{\ast}\right) \). Based on an average over three Groningen GMM simulation scenarios with 1100 simulations in each, and using only data where Tmin > 0.01 s, we determine a1 = −1.753, a2 = 1.946 and a3 = 25.41 Hz (Fig. 5c). In addition, a maximum threshold of Tmin = 0.1 s is imposed beyond which it is not possible to reliably estimate Tmin from \( {f}_{\mathrm{u}}^{\ast } \). Predicted values of Tmin are therefore deemed unresolved if they exceed 0.1 s.
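Equations 2 and 3, together with the 0.1 s threshold, can be combined into a single function as sketched below. The function name, the use of None to flag unresolved values and the single-function layout are assumptions of this example; the coefficients are those quoted above for the 5% tolerance.

```python
import math

# Coefficients quoted in the text (5% tolerance, white noise).
A1, A2, A3_HZ = -1.753, 1.946, 25.41
C_UPPER = 1.113  # scaling factor for the upper bound; use 1 / 1.113 for the lower


def predict_tmin(fu_star_hz, c=1.0, n=0):
    """Tmin in seconds from Eq. 2 (n = 0) or Eq. 3 (n > 0).

    Returns None when the prediction exceeds 0.1 s and is therefore
    treated as unresolved.
    """
    f_scaled = fu_star_hz / c ** n
    if f_scaled >= A3_HZ:
        return 0.01
    tmin = max(0.01, math.exp(A2 + A1 * math.log(f_scaled)))
    return tmin if tmin <= 0.1 else None


best = predict_tmin(20.0)                   # best estimate (Eq. 2)
upper = predict_tmin(20.0, c=C_UPPER, n=3)  # +3 sigma bound (Eq. 3)
lower = predict_tmin(20.0, c=1.0 / C_UPPER, n=3)
```

For an adjusted frequency of 20 Hz the best estimate is close to 0.037 s, whereas 10 Hz gives a prediction above 0.1 s, which the function reports as unresolved.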

Fig. 5

Tmin versus (a) fu for the Groningen GMM simulations, (b) versus \( {f}_u^{\ast } \) (Eq. 1), and (c) showing the linear best-fit (using \( {f}_{\mathrm{u}}^{\ast } \)) to the individual simulation datasets, along with the best-estimate Tmin model (Eq. 2) and ±3-sigma (Eq. 3, n = 3). Predicted Tmin (black lines) is assumed unresolved if greater than 0.1 s and not plotted

In order to test if there is any sensitivity of the model to the selected magnitude-distance range, the data are split into subsets with magnitude 1–4.5 and 4.5–6 and distances 0–30 and 30–60 km. The model was found to be equally applicable to all of the data subsets. An example is shown for M = 4.5–6 at all distances in Fig. 6. These simulations were subject to unrealistically high levels of noise in order to obtain relatively low fu for such large events. Interestingly, the model appears equally valid for these very noisy records of larger events, in addition to weak-motion data. While not routinely useful for strong-motion datasets (since noise levels rarely reach such amplitudes), the model would be useful for cases where significant anthropogenic noise levels are present, such as those used for earthquake early warning in industrial settings (Cauzzi et al. 2016). As a further validation exercise, we apply the model to the ENA dataset (Fig. 7), which has so far been withheld from the model development. Disregarding the very noisy data with Tmin > 0.1 s (which, as noted previously, show very little correlation with fu), the consistency with results from the Groningen simulations is very good, and the model for Tmin is clearly suitable independent of the region.

Fig. 6

Tmin versus (a) fu for the Groningen GMM simulations for the 3 damping levels with M = 4.5–6, (b) versus \( {f}_{\mathrm{u}}^{\ast } \), and (c) showing the linear best-fit to the individual simulation datasets along with the model (Eqs. 2, 3, n = 3). The black lines show the model for the whole dataset, as Fig. 5, and almost completely overlap the model that would result from using only M = 4.5–6 (magenta). Note that predicted Tmin is assumed unresolved if greater than 0.1 s

Fig. 7

Tmin versus (a) fu for the κ0 = 0.03 s Groningen GMM simulations (red) alongside the ENA data (cyan) and (b) versus \( {f}_{\mathrm{u}}^{\ast } \) for the same datasets. Dashed red and cyan lines show the best fit of each. The black lines are the proposed model (as Fig. 5) for predicting Tmin calibrated using only the Groningen simulations.

Our choice of 5% tolerance for selecting observed Tmin will clearly have an impact on the results discussed previously: allowing a larger tolerance when measuring Tmin means that lower values of fu are required (for a given Tmin). In order to facilitate choice when implementing the Tmin model, we have also calibrated coefficients for Eqs. 2 and 3 using alternative tolerances of 10 and 15% (Table 1, Fig. 8). An alternative to white noise was also explored by implementing the high-noise model of Cauzzi and Clinton (2013), which is somewhat more forgiving in the mid-period range than white noise. Here the noise is more realistic, but the larger events, particularly those simulated at near distances, are unaffected by the noise and are therefore not included in the derivation of the alternative model (since Tmin = 0.01 s for those records). Using the high-noise model, a3 (the frequency \( {f}_{\mathrm{u}}^{\ast } \) above which Tmin = 0.01 s) is almost unchanged, being instead strongly related to the acceptable tolerance with respect to the true PSA. For tolerance values of 5, 10 and 15%, we observe a3 values of 24.4–25.4, 19.3–20.3 and 17.0–17.1 Hz, respectively. The slope of the Tmin versus \( {f}_{\mathrm{u}}^{\ast } \) relationship does change depending on the noise model used, however. This suggests that the shape of the noise spectrum itself, as well as that of the earthquake time series, has an impact on the usability of PSA.

Table 1 Coefficients of Eqs. 2 and 3 for various tolerances. HNM indicates the use of the high-noise model of Cauzzi and Clinton (2013); otherwise, the white noise model is used
Fig. 8

Comparison of modelled Tmin versus \( {f}_{\mathrm{u}}^{\ast } \) for various acceptable tolerances on PSA (see Table 1) using (a) the high-noise model (HNM) of Cauzzi and Clinton (2013) and (b) white noise

3 Workflow: usable frequency range of FAS

In the following sections, we detail the application of a workflow used to define usable frequency (for FAS) and period (for PSA) for an induced seismicity dataset, specifically, a database of 803 recordings from the B- and G-networks of the KNMI in the Groningen region, the Netherlands. The recording networks and instrumentation used to record the acceleration time series are described in Ntinalexis et al. (2019). Prior to 2014, the monitoring network consisted of several GeoSig digital accelerographs. As a consequence of an ML3.6 earthquake that occurred in Huizinge on 16 August 2012, more detailed seismic studies were commissioned for the area. A significant upgrade and expansion of the existing network, as well as the installation of new networks, became part of this effort (Ntinalexis et al. 2019). The KNMI networks now consist of almost 100 modern Kinemetrics accelerometer stations with high-rate 24-bit data-logging. We can therefore safely assume that the predominant source of signal contamination in the dataset analysed will be external noise. The recordings examined were obtained during induced events of local magnitudes ranging from ML2.5 to ML3.6 that occurred between 2006 and 2020 in Groningen. The as-recorded horizontal PGA values of the records range from 0.068 cm/s2 to 108.68 cm/s2 and were recorded at epicentral distances ranging from 0.4 to 34 km (Fig. 9).

Fig. 9

Peak ground acceleration of the Groningen horizontal components plotted against distance (upper) and magnitude-distance distribution of the Groningen database (lower)

3.1 Maximum usable frequency

As mentioned previously, the maximum usable FAS frequency, fu, can be selected via a signal-to-noise ratio analysis. We choose to select fu as the maximum frequency of the continuous frequency window with SNR above 3. This is the simplest method to select the maximum usable frequency and is also widely employed in engineering and seismology. To conduct the SNR analysis, it is first necessary to obtain a noise model representative of the noise in the record. This is routinely determined as the FAS of the pre-event time series. In most modern recording networks, continuous data streams are available via online services and data portals, which allows the user to select a time window of their choice around the event. In these cases, it suffices for the user to select a time window with a long pre-event memory and select the first several seconds of that window to sample noise adequately. However, in networks operating on a triggering-only basis, such as the KNMI B-network in Groningen prior to 2014 (see Ntinalexis et al. 2019), limited time lengths of the pre-event memory may be available. In small-amplitude records such as those included in the Groningen database, the SNR at frequencies above 20 Hz can also be very sensitive to the selection of the noise window due to transient signals, and hence it is important to make sure that the noise window is carefully selected.
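The selection of fu as the upper edge of a continuous window with SNR above 3 can be sketched as follows. Growing the window outwards from a reference sample (for example, the FAS peak) is an assumption of this sketch, as are the function and argument names; the text does not specify how the window is anchored.

```python
def usable_band(freqs_hz, snr, i_ref, threshold=3.0):
    """Edges (f_low, f_high) of the continuous SNR > threshold window.

    The window is grown outwards from index i_ref. Isolated samples above
    the threshold that are separated from the window are ignored.
    Returns None if the reference sample itself does not pass.
    """
    if snr[i_ref] <= threshold:
        return None
    lo = hi = i_ref
    while lo > 0 and snr[lo - 1] > threshold:
        lo -= 1
    while hi < len(snr) - 1 and snr[hi + 1] > threshold:
        hi += 1
    return freqs_hz[lo], freqs_hz[hi]


# Hypothetical SNR curve: the isolated value of 5 at 40 Hz lies outside the
# continuous window and does not extend fu.
freqs = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 30.0, 40.0]
snr_values = [1.0, 4.0, 10.0, 20.0, 8.0, 3.5, 2.0, 5.0]
band = usable_band(freqs, snr_values, i_ref=3)
```

In this hypothetical case the upper edge is 20 Hz rather than 40 Hz, because the SNR drops below the threshold at 30 Hz and the window is required to be continuous.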

A technique of dynamic noise window selection is employed in our analyses. We use the vertical component motion to determine the noise window to ensure we avoid P wave energy in the selected analysis window. While small in amplitude on the horizontal components, the P wave has non-negligible high-frequency energy that may bias the noise estimate (and therefore fu). We begin by locating the time window from the beginning of the record to the point where the Arias Intensity is 0.5% of the total. We then determine short-term (−1 to +0.5 s) and long-term (−3 to +0.5 s) moving averages (STA and LTA, respectively) and compute the ratio (STA/LTA). A ratio above 1.2 signifies a significant amplitude change that can be associated with the first observable arrivals of the earthquake signal. We choose the end of the noise window to be the earlier of the 0.5% Arias Intensity time and the STA/LTA trigger (assumed to be the P wave). The noise window, as defined on the vertical component, is then used for the horizontal components. An example is shown in Fig. 10.
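The noise window selection described above can be sketched as below. Several details are assumptions of this example rather than statements from the text: the averages are taken over absolute acceleration, the window limits are measured relative to each sample, windows are truncated at the ends of the record, and the function returns a sample index.

```python
from itertools import accumulate


def noise_window_end(acc_vertical, dt, ai_fraction=0.005, trigger=1.2,
                     sta_before_s=1.0, lta_before_s=3.0, after_s=0.5):
    """Index of the end of the noise window on the vertical component."""
    n = len(acc_vertical)
    # Build-up of Arias Intensity (constant factors cancel in the fraction).
    energy = list(accumulate(a * a for a in acc_vertical))
    i_arias = next(i for i, e in enumerate(energy) if e >= ai_fraction * energy[-1])

    # Prefix sums of |a| give each moving average in constant time.
    csum = [0.0] + list(accumulate(abs(a) for a in acc_vertical))

    def window_mean(i, before_s):
        lo = max(0, i - round(before_s / dt))
        hi = min(n, i + round(after_s / dt) + 1)
        return (csum[hi] - csum[lo]) / (hi - lo)

    # First sample before the Arias point where STA/LTA exceeds the trigger.
    for i in range(i_arias):
        lta = window_mean(i, lta_before_s)
        if lta > 0.0 and window_mean(i, sta_before_s) / lta > trigger:
            return i
    return i_arias


# Hypothetical record: 10 s of low-level noise followed by 10 s of signal.
dt = 0.01
record = [0.01 * (-1) ** i for i in range(1000)] + [(-1.0) ** i for i in range(1000)]
end_index = noise_window_end(record, dt)
```

Because the averaging windows extend 0.5 s ahead of each sample, the STA/LTA trigger in this hypothetical record fires 0.5 s before the onset, so the noise window ends at 9.5 s, ahead of the 0.5% Arias Intensity point.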

Fig. 10

Time-histories and selected noise windows of the MID1 recording of the ML3.2 Westeremden event on 30 October 2008

For the Groningen dataset, we found that often the G-station sensors were located close to the electricity mains network. In this case, it was very likely that the record was contaminated with 50-Hz noise. For small-amplitude records, this may result in a significant peak in the FAS (Fig. 11) and affect the calculation of fu, as well as the response spectra of the record (Fig. 12). Douglas and Boore (2011) recommend the removal of this peak at 50 Hz with a narrow notch filter and in our case, we found it absolutely necessary to remove the 50-Hz noise in order to obtain correct estimates of short-period PSA (Fig. 12).
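A narrow notch at 50 Hz can be sketched with a second-order IIR section applied forwards and backwards so that no phase shift is introduced. This is an illustrative stand-in, not the filter used in the study: the coefficient formulas follow the widely used audio-EQ-cookbook notch design, and the quality factor of 30 and all function names are assumptions of this example.

```python
import math


def notch_coefficients(f0_hz, fs_hz, q=30.0):
    """Normalised biquad coefficients (b, a) for a notch centred on f0_hz."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(b, a, x):
    """Apply a second-order IIR section (direct form I) to the sequence x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def notch_zero_phase(x, f0_hz, fs_hz, q=30.0):
    """Forward-backward application, giving a zero-phase notch."""
    b, a = notch_coefficients(f0_hz, fs_hz, q)
    forward = biquad(b, a, x)
    return biquad(b, a, forward[::-1])[::-1]


# Hypothetical 200 samples-per-second record containing only mains hum.
fs = 200.0
hum = [math.sin(2.0 * math.pi * 50.0 * i / fs) for i in range(2000)]
cleaned = notch_zero_phase(hum, 50.0, fs)
```

No padding is applied here, so a short transient remains at each end of the output; in practice a library routine such as SciPy's `iirnotch` combined with `filtfilt` would be the more usual choice.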

Fig. 11

Example of the presence and amplitude of mains electricity noise in a record’s FAS

Fig. 12

Response spectra of record G040 from the ML3.1 Hellum earthquake in Groningen (30 September 2015), with (blue) and without (red) removal of the 50-Hz noise peak

3.2 Minimum usable frequency

Determining the lower usable frequency (fl) by employing the same SNR > 3 criterion as used for fu is a common choice. However, because the SNRs of small-amplitude records are smaller and the resulting bandwidth can be very limited, it is desirable in our case to use a method that results in more forgiving estimates of fl. The first step is to obtain an initial estimate of fl. This is defined as the first point (with decreasing frequency) at which the linear trend of the recording’s FAS is observed to systematically decay more slowly than a theoretical Brune (1970) spectrum. The next step is to low-cut filter the record using fl as the filter corner-frequency and then compute the displacement trace through double integration of the acceleration time series. The filter used is an 8th order acausal Butterworth filter. Any low-frequency noise can then easily be observed in the time domain. If the final displacement is zero and long-period noise cannot be readily observed in the displacement trace, then the initial estimate is selected as the final fl value. If the user judges the displacement trace to still be unacceptably contaminated with noise, a higher frequency is selected, and the process is iterated until a value of fl is found that results in a noise-free displacement time series.
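One pass of the filter-and-integrate step can be sketched as below, using NumPy. This is a simplified stand-in for the processing described: the low-cut is applied as a zero-phase gain in the frequency domain using the amplitude response of an order-n Butterworth filter, no zero pads are added, and integration uses the trapezoidal rule. Conventions differ on whether a quoted filter order refers to one pass or to the combined forward and backward passes, so the order argument here is an assumption, as are the function names.

```python
import numpy as np


def lowcut_zero_phase(acc, dt, fl_hz, order=8):
    """Zero-phase Butterworth-type low-cut applied in the frequency domain."""
    acc = np.asarray(acc, dtype=float)
    freqs = np.fft.rfftfreq(acc.size, d=dt)
    gain = np.zeros_like(freqs)  # the zero-frequency term is removed entirely
    gain[1:] = 1.0 / np.sqrt(1.0 + (fl_hz / freqs[1:]) ** (2 * order))
    return np.fft.irfft(np.fft.rfft(acc) * gain, n=acc.size)


def integrate(x, dt):
    """Cumulative trapezoidal integration starting from zero."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([0.0], np.cumsum(0.5 * (x[1:] + x[:-1]) * dt)))


# Hypothetical record: a 5 Hz signal with a 0.1 Hz long-period contaminant.
dt = 0.01
t = np.arange(10000) * dt
signal = np.sin(2.0 * np.pi * 5.0 * t)
drift = 0.5 * np.sin(2.0 * np.pi * 0.1 * t)
filtered = lowcut_zero_phase(signal + drift, dt, fl_hz=1.0)
displacement = integrate(integrate(filtered, dt), dt)
```

The displacement trace is what would be inspected visually; if long-period noise remains, the call is repeated with a higher `fl_hz`, following the iterative procedure in the text.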

An example of the application of the iterative selection of fl is shown in Figs. 13 and 14. Figure 13 shows the FAS of the North-South component of recording KANT from the ML3.2 Garrelsweer earthquake of 27 June 2011. The identification of different possible low-cut filter frequencies from the FAS of the record is illustrated. The displacement traces obtained after the application of the different filters are compared in Fig. 14. It is obvious that applying a filter of 0.342 Hz (the initial estimate based on spectral shape) is insufficient, as long-period waves are still clearly observable in the displacement trace (Fig. 14). After iterating through increased values of fl, we observe that a frequency of 1.611 Hz is excessive as it results in a reduction in the amplitude of the record. Low-cut frequencies of 0.635 Hz and 0.732 Hz both produce acceptable results; hence, the lower of the two, 0.635 Hz, is selected.

Fig. 13

Fourier amplitude spectrum of acceleration for the North-South component of recording KANT from the ML3.2 Garrelsweer earthquake of 27 June 2011 (blue), the fitted Brune model (purple) and proposed frequencies for the selection of fl (as Fig. 14)

Fig. 14

Acceleration, velocity and displacement traces of the North-South component of recording KANT from the ML3.2 Garrelsweer earthquake of 27 June 2011 after the application of different low-cut filters. The displacement traces are shown in individual panels

3.3 Removal criteria

Figure 15 shows ratios of the PGV and PSA of noise-contaminated synthetic recordings to the noise-free versions using the Groningen GMM with κ0 = 0.03 s. The ratios are plotted as a function of the maximum usable frequency (fu). It is immediately apparent that, when fu is low, PGV and the short-period spectral ordinates have significantly increased amplitudes. We therefore recommend that records with fu below 15 Hz should not be used at all and should be discarded from ground-motion databases. As shown in Fig. 14, low-cut filtering with an excessively high cutoff frequency can result in a reduction in amplitude and should be avoided. Therefore, when fl is identified above 2 Hz, we also consider a record to be unusable. When either horizontal component fulfils at least one of these removal criteria, we discard the entire triaxial recording, as both horizontal components are required to compute the intensity measures commonly predicted by GMPEs and GMMs.
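The removal criteria can be expressed compactly as in the sketch below. The function name and the representation of each horizontal component as an (fu, fl) pair are assumptions of this example; the thresholds are those given in the text.

```python
FU_MIN_HZ = 15.0  # records with fu below this value are discarded
FL_MAX_HZ = 2.0   # records with fl above this value are discarded


def recording_is_usable(horizontal_limits):
    """True if every horizontal component passes both criteria.

    horizontal_limits is a sequence of (fu_hz, fl_hz) pairs, one per
    horizontal component. A single failing component removes the whole
    triaxial recording.
    """
    return all(
        fu_hz >= FU_MIN_HZ and fl_hz <= FL_MAX_HZ
        for fu_hz, fl_hz in horizontal_limits
    )


# Hypothetical recordings.
kept = recording_is_usable([(32.0, 0.6), (28.0, 0.7)])
dropped = recording_is_usable([(32.0, 0.6), (12.0, 0.7)])
```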

Fig. 15
Ratio of the PGV and PSA of noise-contaminated synthetic recordings (using the Groningen GMM with κ0 = 0.03 s) to the noise-free versions

Figure 16 illustrates which records of the database were removed entirely by applying the constraints on fl and fu. A total of 96 out of the 800 records (12%) from the Groningen database were removed. As expected, these recordings correspond to the relatively weaker motions within the database, which come from the lower end of the magnitude range and stations at longer epicentral distances (Fig. 16).

Fig. 16
Peak ground acceleration of the Groningen as-recorded horizontal components plotted against distance (upper) and magnitude-distance distribution of the Groningen database (lower). Unusable recordings are shown in red

4 Workflow: usable period range of PSA

4.1 Maximum usable period

Once the usable bandwidth of the FAS is defined, the next step is to low-cut filter the records. We recommend the use of an 8th order acausal (zero phase) Butterworth filter, which has been found to be more suitable for use on digital records (Boore and Akkar 2003). For the correct use of this type of filter, it is necessary to zero-pad both ends of the record (Boore and Bommer 2005). The pad length is calculated using the function of Converse and Brady (1992), which depends on both the chosen filter corner frequency and the order of the filter. We apply the same filter to both horizontal components, using the lowest cutoff, fl, of the two components, as they are typically used in conjunction when calculating intensity parameters for use in GMPE/GMM development.
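The zero-padding step can be illustrated with a short sketch. It assumes the commonly quoted form of the Converse and Brady (1992) pad length, a total duration of 1.5n/fc for a filter of order n and corner frequency fc, split equally between the start and end of the record. Both the formula as written here and the equal split are assumptions of this example, as is the choice of which order (nominal or effective after the two-pass filter) to use for n. The function names are hypothetical, and the filtering itself is not shown.

```python
import math


def pad_duration_s(fc_hz: float, order: int) -> float:
    """Total zero-pad duration in seconds, assumed to be 1.5 * n / fc."""
    if fc_hz <= 0.0:
        raise ValueError("corner frequency must be positive")
    return 1.5 * order / fc_hz


def zero_pad(samples, dt: float, fc_hz: float, order: int):
    """Return the padded series and the number of zeros added on each side.

    The total pad duration is split equally between both ends (assumption).
    """
    n_side = math.ceil(0.5 * pad_duration_s(fc_hz, order) / dt)
    zeros = [0.0] * n_side
    return zeros + list(samples) + zeros, n_side
```

Under these assumptions, a corner frequency of 0.635 Hz and n = 8 give a total pad of roughly 19 s, which illustrates why low corner frequencies require long pads relative to the duration of a weak-motion record.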

The amplitudes of long-period spectral ordinates are highly sensitive to the application of low-cut filters. As the filter removes both signal and noise, an unknown combination of both is left behind by the filter at frequencies below and close to the cutoff frequency. Therefore, the response spectra are reliable for use only up to a certain period, Tmax, shorter than the long-period cutoff (Tc, the inverse of the cutoff frequency, fl). Different studies have employed schemes to define this usable period limit. Some examples are described in Boore and Bommer (2005) and Akkar and Bommer (2006). The most widely employed technique, and the one adopted in this study, is to define the usable period limit through the ratio Tmax/Tc.

According to Akkar and Bommer (2006), for digital records from soft soil sites such as those in Groningen, this ratio is between 0.7 and 0.97. The method we adopted to select from this range consists of comparing the PSA before and after filtering and only using the spectral ordinates where the change in amplitude is within a certain threshold. For the Groningen data, we selected this threshold to be 5%. Figure 17 shows ratios of PSA post- to pre-filtering, plotted as a function of the ratio of each period to the cutoff period. In this case, it can be observed that more than 95% of the response spectra have changed by less than 5% up to a period of 70% of the cutoff period. Hence, we selected the ratio of 0.7 and define the maximum usable period for each record as Tmax = 0.7Tc = 0.7/fl. It must be noted that, for databases with a small number of available records, it may be preferable to define a larger ratio to maximise the available data, using a more generous threshold.
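The selection of the ratio can be sketched as follows. This is an illustrative reading of the procedure, assuming the post- to pre-filtering PSA ratios of all records have been grouped by the normalised period T/Tc; the function names and the dictionary layout are hypothetical.

```python
def fraction_within(psa_ratios, tol=0.05):
    """Fraction of records whose PSA changed by no more than tol."""
    return sum(abs(r - 1.0) <= tol for r in psa_ratios) / len(psa_ratios)


def select_period_ratio(ratios_by_t_over_tc, tol=0.05, min_fraction=0.95):
    """Largest T/Tc up to which at least min_fraction of records stay within tol.

    ratios_by_t_over_tc maps each normalised period T/Tc to the list of
    post/pre-filtering PSA ratios across records. Returns None if even the
    shortest normalised period fails the test.
    """
    selected = None
    for t_ratio in sorted(ratios_by_t_over_tc):
        if fraction_within(ratios_by_t_over_tc[t_ratio], tol) < min_fraction:
            break
        selected = t_ratio
    return selected


def max_usable_period(fl_hz, ratio=0.7):
    """Tmax = ratio * Tc = ratio / fl."""
    return ratio / fl_hz
```

With the ratio of 0.7 adopted here, a record filtered at fl = 0.635 Hz would have a maximum usable period of about 1.10 s.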

Fig. 17
Ratios of PSA post- to pre-filtering, plotted against a ratio of period to cutoff period

4.2 Minimum usable period

Filtering high frequencies prior to computing PSA is not recommended as it may have a knock-on effect on a wide range of periods (see Fig. 2). However, as shown earlier, it is still necessary to define a minimum usable period in order to exclude noise-contaminated PSA from use. The first estimate of Tmin is the result of the upper-bound Tmin model presented earlier at n = 3 (Eq. 3), which we apply for a threshold of 5% using the white noise model (Table 1).

In addition to the parametric Tmin model, we devise additional measures to constrain Tmin. We create two hybrid-synthetic records using the FAS of each record under analysis. To create the first synthetic, we fit an idealised Brune (1970) spectrum to the FAS of the record (Fig. 18), and use the FAS of the record within its usable frequency range (fl to fu) and the Brune spectrum in the unusable frequencies. Thus, we create an idealised ‘noise-free’ version of the record when performing an inverse Fourier transform. To create the second synthetic, we use the full FAS of the record but double it for frequencies higher than fu. In this way, we obtain a noisier version of the same record.
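The construction of the two hybrid amplitude spectra can be sketched as below. The example operates on amplitudes only and assumes that the record FAS and the fitted Brune spectrum are sampled at the same frequencies; the handling of phase and the inverse Fourier transform are deliberately left out, and the function name is hypothetical.

```python
def hybrid_fas(freqs, fas, brune, fl, fu, noise_factor=2.0):
    """Return the ('noise-free', 'noisier') amplitude spectra of a record.

    freqs : frequencies in Hz
    fas   : record Fourier amplitudes at freqs
    brune : fitted Brune-model amplitudes at the same freqs
    fl,fu : limits of the usable frequency range in Hz
    """
    # 'Noise-free': keep the record inside [fl, fu], use the Brune fit outside.
    noise_free = [a if fl <= f <= fu else b
                  for f, a, b in zip(freqs, fas, brune)]
    # 'Noisier': keep the full record FAS but double it above fu.
    noisier = [a * noise_factor if f > fu else a
               for f, a in zip(freqs, fas)]
    return noise_free, noisier
```

Both outputs coincide with the record FAS inside the usable band, so any difference between the resulting response spectra can be attributed to the frequencies outside it.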

Fig. 18
FAS of record G780 from the ML3.4 Zeerijp event of 8 January 2018 in Groningen

By comparing the response spectrum of the original record to the idealised ‘noise-free’ version, we obtain an estimate of the periods that are affected by noise. At the same time, by comparing the original response spectrum with the ‘noisier’ version, we can observe which periods are sensitive to additional noise. From these comparisons, we can define two additional estimates of Tmin, based on the divergence (with 5% tolerance) of the hybrid-synthetic and the original response spectra. Finally, we select Tmin using the following logic (Fig. 19):

  • If the parametric Tmin model is 0.01 s (the shortest period defined), we retain that value.

  • If two of the three Tmin estimates are within 10% of one another, we retain the average value of those Tmin.

  • Otherwise, we select the result of the parametric Tmin model, but restrict Tmin between the values calculated using the two hybrid-synthetics.
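The selection logic above can be expressed as a short function. This is one possible reading, with two assumptions that the text does not specify: the 10% agreement is measured relative to the larger of the two estimates, and when more than one pair agrees, the closest pair is averaged. The function and argument names are hypothetical.

```python
from itertools import combinations

T_SHORTEST = 0.01  # shortest period defined, in seconds


def select_tmin(t_param, t_noise_free, t_noisier, rel_tol=0.10):
    """Select Tmin from the parametric and the two hybrid-synthetic estimates."""
    # Rule 1: the parametric model already returns the shortest period.
    if t_param <= T_SHORTEST:
        return T_SHORTEST

    # Rule 2: two of the three estimates agree to within rel_tol
    # (relative to the larger value; closest pair wins, both assumptions).
    estimates = (t_param, t_noise_free, t_noisier)
    close_pairs = [
        (abs(a - b) / max(a, b), a, b)
        for a, b in combinations(estimates, 2)
        if abs(a - b) <= rel_tol * max(a, b)
    ]
    if close_pairs:
        _, a, b = min(close_pairs)
        return 0.5 * (a + b)

    # Rule 3: use the parametric value, restricted to the range spanned
    # by the two hybrid-synthetic estimates.
    lower, upper = sorted((t_noise_free, t_noisier))
    return min(max(t_param, lower), upper)
```

For example, with a parametric estimate of 0.02 s and hybrid-synthetic estimates of 0.05 s and 0.10 s, no pair agrees within 10%, so the parametric value is raised to the lower hybrid-synthetic bound of 0.05 s.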

Fig. 19
Spectral ratios and selection of Tmin for the H2 component of record G310 of the 2 May 2020 ML2.5 Zijldijk earthquake in Groningen

The number of usable PSA, as defined by Tmin and Tmax, is shown in Fig. 20 over 13 approximately linearly spaced periods from 0.01 to 1.5 s. The largest number of usable spectral accelerations corresponds to the intermediate periods (0.1–0.7 s); a smaller number (498) is available at 0.01 s, and a rapid decay can be observed with increasing period from 0.85 s onward. At 1.5 s, the number of usable spectral accelerations is 184, which can still be considered sufficient for the limited distance (Repi < 35 km) and magnitude range covered by the database. In total, 206 records (29.2% of the 704 usable records) are unusable at 0.01 s due to noise.

Fig. 20
Number of usable records as a function of oscillator period, showing the total number and those corresponding to different earthquake magnitude ranges

5 Conclusions

Short-period noise in acceleration time series has the potential to influence response spectral accelerations at short oscillator periods. This has previously been investigated by Douglas and Boore (2011) in the context of data found in typical strong-motion datasets. Analysis of ‘strong-motion’ data, however, generally avoids the influence of high-frequency noise. This is due both to the relative amplitude of signal and noise, and to the fact that the dominant frequency of motion of strong-motion data is much lower than any high-frequency noise. Our simulations show that PSA from noisy weak-motion records, as present in many ground motion databases such as those for induced seismicity, is susceptible to high-frequency noise. This is particularly so for weakly damped records, such as those on ‘hard-rock’ sites. The impact of high-frequency noise on PSA should be considered by assigning a record-specific Tmin, without any form of low-pass frequency filtering. A parametric Tmin model, based on easily measurable properties of waveform FAS (peak/noise amplitudes, frequencies), is proposed herein and can be used as a guide to assign Tmin. We additionally propose an easily implementable approach to assess the impact of noise using hybrid-synthetic records, which modify the ‘unusable’ noisy portion of the records’ FAS, before reconstructing time series and subsequently PSA for comparison with the original spectrum. An example of the full workflow used to define usable FAS frequencies and PSA periods was presented for the Groningen induced seismicity database. We showed that only 12% of the records (96 out of the 800 available) had to be removed in their entirety due to excessive noise. Further to the removal of records in the long-period range (based on Tmax), which is already common practice for GMPE/GMM databases, we showed that 29% of the usable records of the database are unusable at 0.01 s due to the influence of high-frequency noise.