Defining the usable bandwidth of weak-motion records: application to induced seismicity in the Groningen Gas Field, the Netherlands

Seismic hazard and risk analyses are increasingly tapping into the previously underused resource of local weak-motion records. This is facilitating the development of local- or even application-specific models for the characterisation of earthquake ground motion. In turn, this offers the opportunity to derive non- or partially non-ergodic models and significantly reduce bias and uncertainty. However, weak-motion data, while carrying important information about local earthquake source, path and site effects, are susceptible to noise. We show that high-frequency noise has a record-, or region-specific, impact on pseudo-spectral acceleration (PSA). This impact depends on the shape of the records’ Fourier amplitude spectrum (FAS): PSA from moderately to highly damped ‘soil’ records (e.g. Groningen, the Netherlands) is much less susceptible to high-frequency noise than PSA from weakly damped ‘rock’ records (e.g. Eastern North America). We make use of simulated ground motion records to develop a parametric model for the lower usable period of PSA (Tmin). The model accounts for the impact of high-frequency noise on PSA, conditional on easily measured parameters characterising the shape of a record’s FAS. We then present a workflow, describing processing undertaken for records of induced seismicity from the Groningen gas field. The workflow includes the definition of maximum and minimum usable frequencies and periods of FAS and PSA, respectively. As part of the workflow, we present an approach that considers multiple estimates of Tmin. These include the parametric model and, additionally, record-specific hybrid simulations that artificially extend or modify time series’ FAS beyond the noise floor to assess subsequent impacts on PSA.


Introduction
Ground motion models (GMMs) are a key component of a variety of scientific and engineering products, from seismic hazard and risk analyses to shakemaps and magnitude scales. GMMs are developed using empirical data, either through direct regression or, in the case of simulation-based models, through calibration to recorded data. The quality of the underlying ground motion dataset is therefore of significant importance. The main issues that may reduce the quality of ground motion records are the instrument and datalogger (i.e. the record's fidelity), and the background noise. The use of modern instrumentation, with broadband flat sensitivity, high-resolution dataloggers and reliable data transmission, means that the main constraint on the usability of an earthquake ground motion record now lies with background noise. A great deal of attention has been paid to the processing of earthquake records in strong ground motion datasets to avoid or minimise the impact of noise on response spectra at long periods (Akkar and Bommer 2006) and short periods (Douglas and Boore 2011). However, little attention has been paid to the impact of noise on weak-motion records, which are increasingly utilised in studies developing application- and location-specific GMMs (e.g. Bommer et al. 2016; Novakovic et al. 2018; Edwards et al. 2021).
https://doi.org/10.1007/s10950-021-10010-7
The recent increase in induced seismicity related to emerging industries, such as hydraulic fracturing, geothermal energy and CO2 sequestration, is of particular global concern, especially within the context of the transition to low-carbon economies. Induced seismicity tends to be of smaller magnitude, but often occurs at shallow depths and in close proximity to urban areas. This means that smaller magnitude events contribute to the seismic hazard and risk of local population centres. Furthermore, there is increasing recognition that ground motions are regionally variable, particularly for small to moderate earthquakes occurring at shallow depths (Douglas and Edwards 2016). As a result, engineers must either modify existing models or develop local GMMs using regional data or, better still, data local to the target site. This spatial limitation means that the motions used will inevitably be of smaller amplitude if uncertainty and biases in developed models are to be reduced.
Seismic noise is a continuous, but variable, vibration with various sources. At low frequency, the microseism dominates and is related to natural phenomena such as ocean waves (Webb 1992). At high frequency, sources of seismic noise tend to be anthropogenic, owing to their lower propagation efficiency. Monitoring instruments in close proximity to urban environments are therefore susceptible to both low- and high-frequency disturbances. Cauzzi and Clinton (2013) and Peterson (1993) provide an overview of seismic noise and develop models for 'high' and 'low' noise cases. In terms of the impact of seismic noise on ground motion time-histories or, equivalently, Fourier spectra, an unambiguous assessment is possible by comparing earthquake records with 'noise records'. Noise records are typically taken from the immediate pre-event time-history (such that transient noise at the time of recording may be captured), but equally an estimate could be reconstructed based on the high- and low-noise models (Cauzzi and Clinton 2013; Peterson 1993). The ratio of signal (plus noise) to noise Fourier spectral amplitudes, typically referred to as the signal-to-noise ratio (SNR), provides a useful measure of signal contamination. Thresholds above three are usually considered suitable, but this varies between applications, with authors typically striking a balance between data quality and quantity.
The impact of noise on response spectral ordinates such as pseudo-spectral acceleration (PSA), which form the basis of seismic hazard and risk analyses, is more difficult to quantify than for Fourier amplitude spectra (FAS). This is because of the non-linear transform between the Fourier domain (representing the signal amplitude at a given signal frequency) and response spectral amplitudes (representing peak motions of an oscillator with characteristic period). Bora et al. (2016) show that this transform results in a roughly linear relationship between low Fourier frequencies and long oscillator periods (i.e. FAS(1/f)∝PSA(T) for T>0.5 s), but quickly becomes non-linear as periods reduce. At long periods, Akkar and Bommer (2006) showed that a usability limit of T max = [0.7 to 0.97]/f l was required to avoid the impact of long period noise on PSA, with f l (the minimum uncontaminated signal frequency) defined, for example, by a SNR threshold.
On the other hand, Bora et al. (2016) show that the shortest oscillator periods of engineering interest (e.g. 0.01-0.1 s) are driven by motions with longer Fourier periods than those of the corresponding response-spectrum oscillator period. In fact, noise present in the time series at very short periods, such as 0.01-0.03 s, does not necessarily affect the corresponding response spectrum at all, even in that specific period range. For instance, PSA at T = 0.01 s (often assumed equivalent to PGA) is typically related to ground motions at 20-30 Hz or lower, and nowhere near the 100 Hz implied by the reciprocal of the oscillator period. This was explored in detail by Douglas and Boore (2011), who concluded, through simulations of moderate to large events, that contaminating records with high-frequency noise had a negligible impact on their response spectra. This is frequently used as justification to ignore high-frequency noise and T min when dealing with PSA. Douglas and Boore (2011) investigated records for earthquakes typically found in strong ground motion datasets (M > 4.5) and, furthermore, made use of site conditions representative of such records, typically soil or stiff-soil sites. Both the moderate to high magnitude of events and the relatively high damping result in records with Fourier spectral amplitudes naturally lacking in high-frequency content. The records used by Douglas and Boore (2011) therefore exhibited low source corner frequencies (f 0 < 1 Hz) and strong exponential decay at high frequency due to damping. For application to smaller events (weak-motion data), with higher source corner frequencies, or to records from rock or hard rock sites with low damping, we must consider signals with very different spectral content to those investigated by Douglas and Boore (2011).
As noted previously, in such cases, the degree to which short oscillator period PSA is driven by longer period motions reduces and the conclusions of Douglas and Boore (2011) may, therefore, not be transferable to weak-motion data.
It is the aim of this study to investigate the impact of high-frequency noise on the response spectrum and propose a robust workflow for defining the usable bandwidth of both FAS and PSA from weak-motion records. For clarity, throughout this manuscript, we refer to PSA in terms of oscillator period, T, and FAS in terms of signal frequency, f, as per convention. A parametric model for the lowest usable (uncontaminated) period, T min , is initially developed using simulations that account for how Fourier spectral shape controls the impact of noise on a record's 5% damped response spectrum. The model is then used in direct application to investigate the impact of high-frequency noise on response spectral ordinates from weak-motion records of induced seismicity in the Groningen Gas Field, the Netherlands. The database consists of 803 triaxial recordings from events between 2006 and 2020 with local magnitudes ranging from M L 2.5 to M L 3.6 and is being used in the framework of the Groningen gas field hazard and risk analyses to develop a GMM. The recordings are from high-quality digital accelerographs at 98 sites belonging to the B- and G-networks of the Royal Netherlands Meteorological Institute (KNMI; see Ntinalexis et al. 2019; Dost et al. 2017; KNMI 1993). By virtue of the small magnitude of the events, the recordings contain small-amplitude motions, with as-recorded horizontal PGA values ranging from 0.068 cm/s 2 (7 × 10 −5 g) to 108.68 cm/s 2 (0.11 g).

Noise and its impact on FAS and PSA
The assessment of noise and its impact in the Fourier domain is relatively straightforward. Typically, pre-event noise samples are taken and compared with the record in the Fourier domain. It is important to account for differences in signal duration when sampling the time-history for noise. Authors sometimes ensure that both the earthquake time-history and the noise time-history are of equal length, but this is not always possible. In this case, noise FAS should be scaled by the square root of the ratio of duration between the earthquake and noise time-history (after Parseval's theorem) to provide FAS amplitudes that are consistent (i.e. corresponding to equivalent signal lengths). Due to the characteristic 'trapezoidal' shape of the earthquake acceleration spectrum, it stands out over the broadly flat noise floor within the passband that can be considered acceptable (Fig. 1). Lower (f l ) and upper (f u ) usable frequency limits are therefore clearly identifiable.
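The duration scaling described above can be illustrated with a minimal Python sketch. The function names and the list-based representation of the FAS are illustrative assumptions, not part of the original processing code; the sketch also assumes the noise window is stationary, so that its FAS grows with the square root of window length.

```python
import math

def scale_noise_fas(noise_fas, signal_duration_s, noise_duration_s):
    """Scale noise FAS amplitudes to the duration of the earthquake window.

    Assumes stationary noise, so FAS amplitude grows with sqrt(duration).
    """
    if signal_duration_s <= 0 or noise_duration_s <= 0:
        raise ValueError("durations must be positive")
    factor = math.sqrt(signal_duration_s / noise_duration_s)
    return [amp * factor for amp in noise_fas]

def signal_to_noise(signal_fas, noise_fas, signal_duration_s, noise_duration_s):
    """SNR per frequency after putting the noise FAS on the signal's footing."""
    scaled = scale_noise_fas(noise_fas, signal_duration_s, noise_duration_s)
    return [s / n for s, n in zip(signal_fas, scaled)]
```

For a 40 s earthquake window and a 10 s noise window, the noise FAS is doubled before the ratio is formed, so an unscaled ratio of 6 becomes an SNR of 3.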
As noted earlier, for the response spectrum (PSA), the correlation with FAS amplitudes at corresponding oscillator periods decreases dramatically as signal frequencies increase above the record's apparent corner frequency (roughly the peak of the FAS) (Bora et al. 2016). While at long periods we can therefore assume a correspondence of the minimum usable frequency of the record's FAS (f l ) and the maximum usable period of the record's response spectrum (i.e. T max = [0.7 to 0.97]/f l , after Akkar and Bommer 2006), at high signal frequencies and short oscillator periods, we cannot make this assumption.
In order to assess the impact of high-frequency noise on PSA, we initially work with synthetic data. This allows us an unambiguous definition of the true signal amplitude and corresponding uncontaminated response spectrum, which is not afforded with real data. Time-domain stochastic simulations have been performed using EXSIM (Motazedian and Atkinson 2005) as modified by Boore (2009). Simulations have been performed using the GMMs for (i) the Groningen gas field at a buried rock horizon (V s30 = 1400 m/s), as detailed in Edwards et al. (2019), and (ii) Eastern North America (ENA, Atkinson and Boore 2006).
For the Groningen simulations, we investigate the impact of damping by varying the simulation parameter κ 0 (Anderson and Hough 1984), using κ 0 = 0.01, 0.03 and 0.05 s (roughly equivalent to damping expected at competent rock outcrops through to low V s soil site conditions). The ENA GMM specifically allows the investigation of weakly damped motions, with a very hard-rock site condition (κ 0 = 0.005 s) implicit in the GMM. In both cases, the GMMs are calibrated against local empirical data in the magnitude range of interest for this study (approx. M L < 4), and the simulations can therefore be considered to be representative, yet diverse, in terms of amplitude and frequency content, of real earthquake records.
Noise-free acceleration time series from earthquakes with moment magnitudes from 1.0 to 6.0 (in 0.5 unit increments) are simulated at 20 log-spaced Joyner-Boore distances from 0.1 to 60 km (Fig. 1). Noise is subsequently applied to the simulations in increasing amplitude until the signals are completely lost. We use two noise forms: (i) white noise and (ii) the noise model of Cauzzi and Clinton (2013). The white noise is generated in the time domain and defined by a normal distribution with zero mean and standard deviation of 0.01, 0.1, 1, 10 or 100 cm/s 2 . The higher values are not intended to reflect typical noise levels, but to ensure that all records are affected by noise. The model of Cauzzi and Clinton (2013) is used to consider a realistic high-noise scenario, using their high-noise power-spectrum model converted from dB/Hz to absolute units of spectral acceleration. We scale those amplitudes incrementally by factors of 0.1, 0.2, 0.5, 0.75 and 1.0 to generate a realistic stochastic-phase noise time-history that is added to the noise-free simulation in the time domain. With this approach, we retain acceleration time-histories for the noise-free simulation, the background noise and the contaminated 'noisy' simulation, with the latter referred to in the following as the 'synthetic' time series.
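As an illustration of the simulation design above, the following Python sketch builds the magnitude-distance grid and adds zero-mean Gaussian white noise to a time series. It is a sketch only: the function names, the fixed random seed and the list-based time series are assumptions, and the EXSIM simulation itself is not reproduced. Counting one simulation per magnitude, distance and white-noise level gives 11 × 20 × 5 = 1100 cases, which appears consistent with the 1100 simulations per scenario quoted for the model calibration.

```python
import math
import random

WHITE_NOISE_SIGMAS = [0.01, 0.1, 1.0, 10.0, 100.0]  # standard deviations, cm/s^2

def simulation_grid():
    """Magnitudes 1.0-6.0 in 0.5 steps and 20 log-spaced distances, 0.1-60 km."""
    magnitudes = [1.0 + 0.5 * i for i in range(11)]
    log_min, log_max = math.log10(0.1), math.log10(60.0)
    distances = [10 ** (log_min + (log_max - log_min) * i / 19) for i in range(20)]
    return magnitudes, distances

def add_white_noise(accel, sigma, seed=0):
    """Return (noisy series, noise series) for zero-mean Gaussian noise."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    noise = [rng.gauss(0.0, sigma) for _ in accel]
    noisy = [a + n for a, n in zip(accel, noise)]
    return noisy, noise

magnitudes, distances = simulation_grid()
n_simulations = len(magnitudes) * len(distances) * len(WHITE_NOISE_SIGMAS)
```

Keeping the noise series alongside the noisy series mirrors the text, where the noise-free, noise-only and contaminated time-histories are all retained.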
From each of the synthetic time series' FAS, various measures are determined to allow investigation of the impact of the noise level:
• f u : the upper usable FAS frequency in Hz (defined by a signal-to-noise ratio of 3);
• f peak : the frequency at the FAS peak;
• A peak : the natural logarithm of the FAS amplitude at f peak ;
• A u : the natural logarithm of the FAS amplitude at f u ;
• ΔA: the amplitude difference, A peak − A u ;
• Δf: the frequency difference, f u − f peak .
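The measures listed above could be extracted from a sampled FAS along the following lines. This Python sketch makes several assumptions that are not stated in the text: the FAS is given on an ascending frequency grid, the noise FAS is already scaled to the signal duration, and f u is found by walking upward from the spectral peak until the SNR first fails to exceed the threshold. The function and key names are hypothetical.

```python
import math

def fas_shape_measures(freqs, signal_fas, noise_fas, snr_threshold=3.0):
    """Return f_peak, f_u, A_peak, A_u, dA and df for one record.

    freqs must be ascending (Hz); amplitudes must be positive.
    """
    i_peak = max(range(len(freqs)), key=lambda i: signal_fas[i])
    # Assumption: f_u is the top of the contiguous band above the peak
    # in which SNR stays strictly above the threshold.
    i_u = i_peak
    for i in range(i_peak + 1, len(freqs)):
        if signal_fas[i] / noise_fas[i] <= snr_threshold:
            break
        i_u = i
    a_peak = math.log(signal_fas[i_peak])
    a_u = math.log(signal_fas[i_u])
    return {
        "f_peak": freqs[i_peak],
        "f_u": freqs[i_u],
        "A_peak": a_peak,
        "A_u": a_u,
        "dA": a_peak - a_u,
        "df": freqs[i_u] - freqs[i_peak],
    }
```

The SNR at the peak itself is not checked in this sketch; a record whose peak is below the noise floor would need to be rejected separately.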
We measure T min by determining the lowest period at which the 5% damped response spectrum of the acceleration time-history is unaffected by noise. This is defined as where the response spectrum of the synthetic time-history is within a 5% tolerance of the true value (Fig. 2). This is a conservative estimate, as PSA at shorter periods than the subsequently defined T min may return to within the defined threshold. Our observations show that while this is often the case, PSA then tends to fluctuate within and outside the acceptable tolerance level at periods below T min (Fig. 2). Low-pass frequency filtering of the time series at, or around, f u results in severe impact on the PSA for weak-motion data and, as such, should not be used. It is clear that unfiltered (or high-pass frequency filtered, i.e. retaining f > f l ) time series allow calculation of PSA to periods well below 1/f u . In fact, at the 5% tolerance level, the PSA from both these cases in Fig. 2 (from the simulation shown in Fig. 1) only just fail, with most PSA amplitudes being within 10% of the true values. On the other hand, the low-pass frequency filtered time series result in up to 50% underestimation of PSA.
Figure 3 shows the suite of measured f u and T min from simulations using the Groningen GMMs with alternative damping (κ 0 ), along with the ENA GMM. Clearly, as expected, for signals with higher f u , we obtain PSA with smaller T min . However, the spectral shape has a significant impact on the usability of the response spectra: low κ 0 (weakly damped) records require much higher values of f u to maintain usability of PSA down to 0.01 s. In terms of estimating T min from the time series, we therefore require knowledge of not only the usable FAS bandwidth (i.e. f u , which can be directly measured from field records), but also the spectral shape.
In the following, we therefore develop a model for determining T min that accounts for spectral shape by using easily measured characteristics of a waveform and its FAS.
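The conservative T min measurement described above can be sketched in Python as follows. The sketch assumes PSA values are available for the noisy and noise-free simulations on a common, ascending period grid; the function name is illustrative. It walks from long to short periods and stops at the first ordinate outside the tolerance, so any shorter periods that drift back inside the tolerance are deliberately ignored.

```python
def measure_t_min(periods, psa_noisy, psa_true, tolerance=0.05):
    """Lowest period (s) above which every PSA ordinate is within tolerance.

    periods must be ascending. Returns None if the longest period already fails.
    """
    t_min = None
    for i in range(len(periods) - 1, -1, -1):
        if abs(psa_noisy[i] / psa_true[i] - 1.0) > tolerance:
            break  # first contaminated ordinate when moving to shorter periods
        t_min = periods[i]
    return t_min
```

For example, with ratios of 1.02, 1.10, 1.03 and 1.04 at 0.01, 0.02, 0.05 and 0.1 s, the measured T min is 0.05 s even though the ordinate at 0.01 s is back within 5%.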
First, we define an adjusted upper usable FAS frequency (f * u ) in Hz. The adjustment normalises f u to that expected, given the same noise and peak-signal amplitude, for a signal with a reference damping, defined by κ ref = 0.03 s. It therefore results in a predictor for T min that is unbiased. f * u is given by Eq. 1.
Figure 4 shows the resulting f * u for the three alternative Groningen GMM simulations (κ 0 = 0.01, 0.03 and 0.05 s). Note that f * u values for the κ 0 = 0.03 s simulations (the same as our selected reference, κ ref ) are broadly consistent with measured f u . For the κ 0 = 0.01 s simulations, f * u are systematically reduced with respect to f u , while for the strongly damped κ 0 = 0.05 s simulations, f * u are higher than measured f u . Note that we define a maximum adjustment factor, 0.4, in Eq. 1 based on trial and error after observing over-correction of very weakly damped (low κ 0 ) signals.
Figure 5 compares the original f u (as Fig. 3) and adjusted f * u against T min . The use of f * u clearly removes the dependence of the correlation on spectral shape. Based on f * u and T min for the Groningen GMM simulations (Fig. 5), a best-estimate lower usable period, T min (in seconds), within an acceptable tolerance is defined by Eq. 2, where a 3 is the log-mean f * u for 0.01 < T min < 0.02 s and a 1 and a 2 are determined through log-linear regression of f * u versus T min . Bounds on T min are then given by introducing a scaling factor, c, on f * u (Eq. 3), with the factors c = 1.113 (upper bound) and c = 1/1.113 (lower bound) designed to encapsulate the data (where 0.01 < T min < 0.02 s) at n standard deviations of log e (f * u ). Based on an average over three Groningen GMM simulation scenarios with 1100 simulations in each, and using only data where T min > 0.01 s, we determine a 1 = −1.753, a 2 = 1.946, and a 3 = 25.41 Hz (Fig. 5c).
In addition, a maximum threshold of T min = 0.1 s is imposed beyond which it is not possible to reliably estimate T min from f * u . Predicted values of T min are therefore deemed unresolved if they exceed 0.1 s.
In order to test if there is any sensitivity of the model to the selected magnitude-distance range, the data are split into subsets with magnitude 1-4.5 and 4.5-6 and distances 0-30 and 30-60 km. The model was found to be equally applicable to all of the data subsets. An example is shown for M = 4.5-6 at all distances in Fig. 6. These simulations were subject to unrealistically high levels of noise in order to obtain relatively low f u for such large events. Interestingly, the model appears equally valid for these very noisy records of larger events, in addition to weak-motion data. While not routinely useful for strong-motion datasets (since noise levels rarely reach such amplitudes), the model would be useful for cases where significant anthropogenic noise levels are present, such as those used for earthquake early warning in industrial settings (Cauzzi et al. 2016). As a further validation exercise, we apply the model to the ENA dataset (Fig. 7), which has so far been withheld from the model development.
Disregarding the very noisy data with T min > 0.1 s (which, as noted previously, show very little correlation with f u ), the consistency with results from the Groningen simulations is very good, and the model for T min appears suitable independent of the region.
Our choice of 5% tolerance for selecting observed T min will clearly have an impact on the results discussed previously: allowing a larger tolerance when measuring T min means that lower f u are required (for a given T min ). In order to facilitate choice when implementing the T min model, we have also calibrated coefficients for Eqs. 2 and 3 using alternative tolerances of 10 and 15% (Table 1, Fig. 8). An alternative to white noise was also explored by implementing the high-noise model of Cauzzi and Clinton (2013), which is somewhat more forgiving in the mid-period range than white noise. Here the noise is more realistic, but the larger events, particularly those simulated at near distances, are unaffected by the noise and are therefore not included in the derivation of the alternative model (since T min = 0.01 s for those records). Using the high-noise model, a 3 , defining the frequency f * u above which T min = 0.01 s, is almost unchanged, being instead strongly related to the acceptable tolerance on the true PSA. For tolerance values of 5, 10 and 15%, we observe a 3 values of 24.4-25.4, 19.3-20.3 and 17.0-17.1 Hz, respectively. The slope of T min versus f * u does change depending on the noise model used, however. This suggests that the shape of the noise spectrum itself, as well as that of the earthquake time series, has an impact on the usability of PSA.
In the following sections, we detail the application of a workflow used to define usable frequency (for FAS) and period (for PSA) for an induced seismicity dataset, specifically, a database of 803 recordings from the B- and G-networks of the KNMI in the Groningen region, the Netherlands. The recording networks and instrumentation used to record the acceleration time series are described in Ntinalexis et al. (2019). Prior to 2014, the monitoring network consisted of several GeoSig digital accelerographs.
As a consequence of an M L 3.6 earthquake that occurred in Huizinge on 16 August 2012, more detailed seismic studies were commissioned for the area. A significant upgrade and expansion of the existing network as well as the installation of new networks became part of this effort (Ntinalexis et al. 2019). The KNMI networks now consist of almost 100 modern Kinemetrics accelerometer stations with high-rate 24-bit data-logging. We can therefore safely assume that the predominant source of signal contamination in the dataset analysed will be external noise. The recordings examined were obtained during induced events of local magnitudes ranging from M L 2.5 to M L 3.6 that occurred between 2006 and 2020 in Groningen. The as-recorded horizontal PGA values of the records range from 0.068 cm/s 2 to 108.68 cm/s 2 and were recorded at epicentral distances ranging from 0.4 to 34 km (Fig. 9).

Maximum usable frequency
As mentioned previously, the maximum usable FAS frequency, f u , can be selected via a signal-to-noise ratio analysis. We choose to select f u as the maximum frequency of the continuous frequency window with SNR above 3. This is the simplest method to select the maximum usable frequency and is also widely employed in engineering and seismology. To conduct the SNR analysis, it is first necessary to obtain a noise model representative of the noise in the record. This is routinely determined as the FAS of the pre-event time series. In most modern recording networks, continuous data streams are available via online services and data portals, which allows the user to select a time window of their choice around the event. In these cases, it suffices for the user to select a time window with a long pre-event memory and select the first several seconds of that window to sample noise adequately. However, in networks operating on a triggering-only basis, such as the KNMI B-network in Groningen prior to 2014 (see Ntinalexis et al. 2019), only a limited length of pre-event memory may be available. In small-amplitude records such as those included in the Groningen database, the SNR at frequencies above 20 Hz can also be very sensitive to the selection of the noise window due to transient signals, and hence it is important to make sure that the noise window is carefully selected.
[Figure caption fragment: … Table 1) using (a) the high-noise model (HNM) of Cauzzi and Clinton (2013) and (b) white noise]
Fig. 9 Peak ground acceleration of the Groningen horizontal components plotted against distance (upper) and magnitude-distance distribution of the Groningen database (lower)
A technique of dynamic noise window selection is employed in our analyses. We use the vertical component motion to determine the noise window to ensure we avoid P wave energy in the selected analysis window. While small in amplitude on the horizontal components, the P wave has non-negligible high-frequency energy that may bias the noise estimate (and therefore f u ). We begin by locating the time window from the beginning of the record to the point where the Arias Intensity is 0.5% of the total. We then determine short-term (−1 to +0.5 s) and long-term (−3 to +0.5 s) moving averages (STA and LTA, respectively) and compute the ratio (STA/LTA). A ratio above 1.2 signifies a significant amplitude change that can be associated with the first observable arrivals of the earthquake signal. We choose the end of the noise window to be the earliest of either the 0.5% Arias Intensity or the STA/LTA trigger (assumed to be the P wave). The noise window, as defined on the vertical component, is then used for the horizontal components. An example is shown in Fig. 10.
[Figure caption fragment: … removing (blue) and without removing (red) the 50-Hz noise peak]
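A minimal Python sketch of the noise window selection described above is given here. It is not the authors' implementation: it assumes the moving averages are taken over absolute acceleration, that the windows are defined relative to each sample and clipped at the record ends, and that the cumulative sum of squared acceleration is an adequate proxy for normalised Arias Intensity. All function and parameter names are hypothetical.

```python
def noise_window_end(accel, dt, arias_fraction=0.005,
                     sta=(-1.0, 0.5), lta=(-3.0, 0.5), trigger_ratio=1.2):
    """End time (s) of the noise window on a vertical-component record."""
    n = len(accel)
    cum_sq = [0.0]   # running sum of a^2, proportional to Arias Intensity
    cum_abs = [0.0]  # running sum of |a|, used for the moving averages
    for a in accel:
        cum_sq.append(cum_sq[-1] + a * a)
        cum_abs.append(cum_abs[-1] + abs(a))

    # First sample at which the normalised Arias Intensity reaches the fraction.
    target = arias_fraction * cum_sq[-1]
    i_arias = next(i for i in range(n) if cum_sq[i + 1] >= target)

    def window_mean(i, window):
        lo = max(0, i + round(window[0] / dt))
        hi = min(n - 1, i + round(window[1] / dt))
        return (cum_abs[hi + 1] - cum_abs[lo]) / (hi - lo + 1)

    # First sample at which STA/LTA exceeds the trigger ratio.
    i_trigger = n - 1
    for i in range(n):
        long_avg = window_mean(i, lta)
        if long_avg > 0.0 and window_mean(i, sta) / long_avg > trigger_ratio:
            i_trigger = i
            break

    return min(i_arias, i_trigger) * dt
```

Because both averaging windows extend 0.5 s ahead of the current sample, the STA/LTA trigger in this sketch fires shortly before an abrupt onset, which keeps the first arrivals out of the noise window.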
For the Groningen dataset, we found that the G-station sensors were often located close to the electricity mains network. In this case, it was very likely that the record was contaminated with 50-Hz noise. For small-amplitude records, this may result in a significant peak in the FAS (Fig. 11) and affect the calculation of f u , as well as the response spectra of the record (Fig. 12). Douglas and Boore (2011) recommend the removal of this peak at 50 Hz with a narrow notch filter and, in our case, we found it absolutely necessary to remove the 50-Hz noise in order to obtain correct estimates of short-period PSA (Fig. 12).
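One possible way to remove a narrow mains peak is sketched below in Python. The text does not specify the notch design, so the second-order (biquad) notch, the quality factor of 30 and the forward-backward (zero-phase) application are all assumptions made for illustration.

```python
import math

def notch_coefficients(f0, fs, q=30.0):
    """Normalised biquad notch coefficients centred on f0 (Hz) at rate fs (Hz)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def _filter_once(x, b, a):
    """Apply the biquad difference equation with zero initial conditions."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def notch_zero_phase(x, f0, fs, q=30.0):
    """Filter forwards then backwards so the notch adds no phase shift."""
    b, a = notch_coefficients(f0, fs, q)
    forward = _filter_once(x, b, a)
    backward = _filter_once(forward[::-1], b, a)
    return backward[::-1]
```

A higher quality factor narrows the notch but lengthens the filter transient at the record ends, so in practice the choice would need checking against the FAS before and after filtering.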

Minimum usable frequency
Determining the lower usable frequency (f l ) by employing the same SNR > 3 criterion as used for f u is a choice that is often employed. However, because the SNRs of small-amplitude records are smaller and the resulting bandwidth can be very limited, it is desirable in our case to use a method that results in more forgiving estimates of f l . The first step is to obtain an initial estimate of f l . This is defined as the first point (with decreasing frequency) at which the linear trend of the recording's FAS is observed to systematically decay more slowly than a theoretical Brune (1970) spectrum. The next step is to low-cut filter the record using f l as the filter corner frequency and then compute the displacement trace through double integration of the acceleration time series. The filter used is an 8th order acausal Butterworth filter. Any low-frequency noise can then easily be observed in the time domain. If the total displacement is zero and long-period noise cannot be readily observed in the displacement trace, then the initial estimate is selected as the final f l value. If the user judges the displacement trace to still be unacceptably contaminated with noise, a higher frequency is selected, and the process is iterated until an f l is found that results in a noise-free displacement time series.
An example of the application of the iterative selection of f l is shown in Figs. 13 and 14. Figure 13 shows the FAS of the North-South component of recording KANT from the M L 3.2 Garrelsweer earthquake of 27 June 2011. The identification of different possible low-cut filter frequencies from the FAS of the record is illustrated. The displacement traces obtained after the application of the different filters are compared in Fig. 14. It is obvious that applying a filter of 0.342 Hz (the initial estimate based on spectral shape) is insufficient, as long-period waves are still clearly observable in the displacement trace (Fig. 14). After iterating through increased values of f l , we observe that a frequency of 1.611 Hz is excessive as it results in a reduction in the amplitude of the record. Low-cut frequencies of 0.635 Hz and 0.732 Hz both produce acceptable results; hence, the lower, 0.635 Hz, is selected.

Removal criteria
Figure 15 shows ratios of the PGV and PSA of noise-contaminated synthetic recordings to the noise-free versions using the Groningen GMM with κ 0 = 0.03 s. The ratios are plotted as a function of the maximum usable frequency (f u ). It is immediately apparent that, when f u is low, PGV and the short-period spectral ordinates have significantly increased amplitudes. We therefore recommend that records with f u below 15 Hz should not be used at all and should be discarded from ground-motion databases. As shown in Fig. 14, low-cut filtering with an excessively high cutoff frequency can result in a reduction in amplitude and should be avoided. Therefore, when f l is identified above 2 Hz, we also consider a record to be unusable. When either horizontal component fulfils at least one of these removal criteria, we discard the entire triaxial recording. Figure 16 illustrates which records of the database were removed entirely by applying the constraints on f l and f u . A total of 96 out of the 803 records (12%) from the Groningen database were removed. As expected, these recordings correspond to the relatively weaker motions within the database, which come from the lower end of the magnitude range and stations at longer epicentral distances (Fig. 16).

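The removal criteria for the Groningen records (f u below 15 Hz or f l above 2 Hz on either horizontal component) can be expressed compactly. The Python sketch below is illustrative only; the function names and the representation of a recording as a list of (f l , f u ) pairs are assumptions.

```python
F_U_MIN_HZ = 15.0  # components with f_u below this are treated as unusable
F_L_MAX_HZ = 2.0   # components with f_l above this are treated as unusable

def component_is_unusable(f_l, f_u):
    """True if a single horizontal component meets either removal criterion."""
    return f_u < F_U_MIN_HZ or f_l > F_L_MAX_HZ

def keep_recording(horizontal_limits):
    """Keep a triaxial recording only if no horizontal component is unusable.

    horizontal_limits: iterable of (f_l, f_u) pairs in Hz, one per component.
    """
    return not any(component_is_unusable(f_l, f_u)
                   for f_l, f_u in horizontal_limits)
```

For instance, a recording whose components have (f l , f u ) of (0.635, 30) and (0.7, 12) Hz would be discarded in full, because the second component has f u below 15 Hz.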
Workflow: usable period range of PSA

Maximum usable period
Once the usable bandwidth of the FAS is defined, the next step is to low-cut filter the records. We recommend the use of an 8th order acausal (zero phase) Butterworth filter, which has been found to be more suitable for use on digital records (Boore and Akkar 2003). For the correct use of this type of filter, it is necessary to zero-pad both ends of the record (Boore and Bommer 2005). The pad length is calculated using the function of Converse and Brady (1992), which depends on both the chosen filter corner frequency and the order of the filter. We apply the same filter to both horizontal components, using the lowest cutoff, f l , of the two components, as they are typically used in conjunction when calculating intensity parameters for use in GMPE/GMM development.
The amplitudes of long-period spectral ordinates are highly sensitive to the application of low-cut filters. As the filter removes both signal and noise, an unknown combination of both is left behind by the filter at frequencies lower than and close to the cutoff frequency. Therefore, the response spectra are reliable for use only up to a certain period, lower than the long-period cutoff (T c , the inverse of the cutoff frequency, f l ). Different studies have employed schemes to define this usable period limit. Some examples are described in Boore and Bommer (2005) and Akkar and Bommer (2006). The most widely employed technique, and the one adopted in this study, is to define the usable period limit through the ratio between T max and T c .
According to Akkar and Bommer (2006), for digital records from soft soil sites such as those in Groningen, this ratio is between 0.7 and 0.97. The method we adopted to select from this range consists of comparing the PSA before and after filtering and only using the spectral ordinates where the change in amplitude is within a certain threshold. For the Groningen data, we selected this threshold to be 5%. Figure 17 shows ratios of PSA post- to pre-filtering, plotted as a function of the ratio of each period to the cutoff period. In this case, it can be observed that more than 95% of the response spectra have changed by less than 5% up to a period of 70% of the cutoff period. Hence, we selected the ratio of 0.7 and define the maximum usable period for each record as T max = 0.7T c = 0.7/f l . It must be noted that, for databases with a small number of available records, it may be preferable to define a larger ratio to maximise the available data, using a more generous threshold.
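The selection of the period ratio and the resulting T max can be illustrated with the following Python sketch. It assumes that post- to pre-filter PSA ratios have been tabulated for every record on a common, ascending grid of T/T c values; the function names and the exact acceptance rule (at least 95% of records within 5% at every grid point up to the selected ratio) are assumptions based on the description above.

```python
def t_max(f_l, ratio=0.7):
    """Maximum usable period (s) for a low-cut corner frequency f_l (Hz)."""
    return ratio / f_l

def select_ratio(period_ratios, psa_ratios, tolerance=0.05, min_fraction=0.95):
    """Largest T/T_c up to which enough records change by less than tolerance.

    period_ratios: ascending T/T_c grid.
    psa_ratios: one list per record of post-/pre-filter PSA at each grid point.
    Returns None if even the smallest grid value fails.
    """
    best = None
    for j, ratio in enumerate(period_ratios):
        within = sum(1 for rec in psa_ratios if abs(rec[j] - 1.0) <= tolerance)
        if within / len(psa_ratios) < min_fraction:
            break
        best = ratio
    return best
```

As a worked example, the low-cut frequency of 0.635 Hz selected for the KANT recording gives T c = 1/0.635 ≈ 1.57 s and T max = 0.7/0.635 ≈ 1.10 s.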

Minimum usable period
Filtering high frequencies prior to computing PSA is not recommended as it may have a knock-on effect on a wide range of periods (see Fig. 2). However, as shown earlier, it is still necessary to define a minimum usable period in order to exclude noise-contaminated PSA from use. The first estimate of T min is the result of the upper-bound T min model presented earlier at n = 3 (Eq. 3), which we apply for a threshold of 5% using the white noise model (Table 1).
Fig. 16 Peak ground acceleration of the Groningen as-recorded horizontal components plotted against distance (upper) and magnitude-distance distribution of the Groningen database (lower). Unusable recordings are shown in red
Fig. 17 Ratios of PSA post- to pre-filtering, plotted against a ratio of period to cutoff period
In addition to the parametric T min model, we devise additional measures to constrain T min . We create two hybrid-synthetic records using the FAS of each record under analysis. To create the first synthetic, we fit an idealised Brune (1970) spectrum to the FAS of the record (Fig. 18), and use the FAS of the record within its usable frequency range (f l to f u ) and the Brune spectrum in the unusable frequencies. Thus, we create an idealised 'noise-free' version of the record when performing an inverse Fourier transform. To create the second synthetic, we use the full FAS of the record but double it for frequencies higher than f u . In this way, we obtain a noisier version of the same record.
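The construction of the two hybrid amplitude spectra can be sketched as below. The sketch covers amplitudes only: fitting the model spectrum, recombining with a phase spectrum and the inverse Fourier transform are not shown. The spectral shape used here (an omega-squared acceleration spectrum with an optional exp(-pi kappa f) decay) and all function and parameter names are assumptions made for illustration; the fitted spectrum of Fig. 18 may be parameterised differently.

```python
import math


def brune_acceleration_fas(freq_hz, amplitude, corner_freq_hz, kappa_s=0.0):
    """Assumed omega-squared acceleration FAS with exp(-pi*kappa*f) decay."""
    source = freq_hz ** 2 / (1.0 + (freq_hz / corner_freq_hz) ** 2)
    return amplitude * source * math.exp(-math.pi * kappa_s * freq_hz)


def noise_free_fas(freqs, record_fas, f_l, f_u, model):
    """Record FAS inside [f_l, f_u]; fitted model spectrum outside it."""
    return [a if f_l <= f <= f_u else model(f)
            for f, a in zip(freqs, record_fas)]


def noisier_fas(freqs, record_fas, f_u, factor=2.0):
    """Record FAS, multiplied by `factor` at frequencies above f_u."""
    return [a * factor if f > f_u else a
            for f, a in zip(freqs, record_fas)]


# Hypothetical four-point spectrum with a usable band of 0.5 to 20 Hz.
freqs = [0.1, 1.0, 10.0, 50.0]
fas = [1.0, 2.0, 3.0, 4.0]


def fitted_model(f):
    # Hypothetical, already fitted, model parameters.
    return brune_acceleration_fas(f, amplitude=1.0, corner_freq_hz=2.0,
                                  kappa_s=0.03)


clean = noise_free_fas(freqs, fas, f_l=0.5, f_u=20.0, model=fitted_model)
noisy = noisier_fas(freqs, fas, f_u=20.0)
```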
By comparing the response spectrum of the original record to the idealised 'noise-free' version, we obtain an estimate of the periods that are affected by noise. At the same time, by comparing the original response spectrum with the 'noisier' version, we can observe which periods are sensitive to additional noise. From these comparisons, we can define two additional estimates of T min , based on the divergence (with 5% tolerance) of the hybrid-synthetic and the original response spectra. Finally, we select T min using the following logic (Fig. 19):
• If the parametric T min model is 0.01 s (the shortest period defined), we retain that value.
• If two of the three T min estimates are within 10% of one another, we retain the average value of those T min .
• Otherwise, we select the result of the parametric T min model, but restrict T min between the values calculated using the two hybrid-synthetics.
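The selection logic can be written compactly as in the sketch below. Two details are not specified in the text and are assumptions here: 'within 10%' is measured relative to the larger of the two estimates, and if more than one pair qualifies, the closest pair is averaged. The function and argument names are hypothetical.

```python
def within_fraction(a, b, fraction=0.10):
    """True if a and b differ by no more than `fraction` of the larger."""
    return abs(a - b) <= fraction * max(a, b)


def select_t_min(t_parametric, t_noise_free, t_noisier,
                 shortest_period=0.01):
    """Combine three T min estimates (all in seconds)."""
    # Rule 1: the parametric model already returns the shortest period.
    if t_parametric <= shortest_period:
        return t_parametric

    # Rule 2: average two estimates that agree to within 10%.
    estimates = [t_parametric, t_noise_free, t_noisier]
    pairs = [(estimates[i], estimates[j])
             for i in range(3) for j in range(i + 1, 3)]
    close = [p for p in pairs if within_fraction(*p)]
    if close:
        a, b = min(close, key=lambda p: abs(p[0] - p[1]))
        return 0.5 * (a + b)

    # Rule 3: parametric value, restricted to the hybrid-synthetic range.
    low, high = sorted((t_noise_free, t_noisier))
    return min(max(t_parametric, low), high)


# Hypothetical estimates illustrating each rule in turn.
rule_1 = select_t_min(0.01, 0.05, 0.10)
rule_2 = select_t_min(0.05, 0.052, 0.20)
rule_3 = select_t_min(0.30, 0.05, 0.10)
```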
The number of usable PSA, as defined by T min and T max , is shown in Fig. 20 over 13 approximately linearly spaced periods from 0.01 to 1.5 s. The largest number of usable spectral accelerations corresponds to the intermediate periods (0.1-0.7 s). A smaller number (498) is available at 0.01 s, and a rapid decay can be observed with increasing period from 0.85 s onward. At 1.5 s, the number of usable spectral accelerations is 184, which can still be considered sufficient for the limited distance (R epi < 35 km) and magnitude range covered by the database. In total, 206 records (29.2% of the 704 usable records) are unusable at 0.01 s due to noise.
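Counts of this kind follow directly from the per-record period limits. A minimal sketch, assuming each record is summarised by a hypothetical (T min , T max ) pair in seconds and that a spectral ordinate is usable when its period lies within those limits (inclusive):

```python
def usable_counts(period_limits, periods):
    """Number of records usable at each oscillator period.

    period_limits: list of (t_min, t_max) tuples, one per record.
    periods: oscillator periods (s) at which to count usable records.
    """
    return {
        period: sum(1 for t_min, t_max in period_limits
                    if t_min <= period <= t_max)
        for period in periods
    }


# Hypothetical limits for three records.
limits = [(0.01, 2.8), (0.05, 1.0), (0.01, 0.7)]
counts = usable_counts(limits, periods=[0.01, 0.1, 0.85, 1.5])
```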

Conclusions
Short-period noise in acceleration time series has the potential to influence response spectral accelerations at short oscillator periods. This has previously been investigated by Douglas and Boore (2011) in the context of data typically found in strong-motion datasets. Analysis of 'strong-motion' data, however, generally avoids the influence of high-frequency noise. This is due both to the relative amplitude of signal and noise, and to the fact that the dominant frequency of motion of strong-motion data is much lower than that of any high-frequency noise. Our simulations show that PSA from noisy weak-motion records, as present in many ground motion databases such as those for induced seismicity, is susceptible to high-frequency noise. This is particularly so for weakly damped records, such as those on 'hard-rock' sites. The impact of high-frequency noise on PSA should be considered by assigning a record-specific T min , without any form of low-pass frequency filtering. A parametric T min model, based on easily measurable properties of waveform FAS (peak/noise amplitudes, frequencies), is proposed herein and can be used as a guide to assign T min . We additionally propose an easily implementable approach to assess the impact of noise using hybrid-synthetic records, which modify the 'unusable' noisy portion of the records' FAS, before reconstructing time series and subsequently PSA for comparison with the original spectrum. An example of the full workflow used to define usable FAS frequencies and PSA periods was presented for the Groningen induced seismicity database. We showed that only 12% of the available records (96 out of 800) had to be removed in their entirety due to excessive noise. Further to the removal of records in the long-period range (based on T max ), which is already common practice for GMPE/GMM databases, we showed that 29% of the usable records of the database are unusable at 0.01 s due to the influence of high-frequency noise.
Code availability The software EXSIM used for this study is available on request from the author, and online at http://www.daveboore.com/software_online.html (last accessed January 2021).
Funding This work has been funded by the Nederlandse Aardolie Maatschappij (NAM).

Declarations
Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative