Selecting time windows of seismic phases and noise for engineering seismology applications: a versatile methodology and algorithm

Seismic signal windowing is the preliminary step for many analysis procedures in engineering seismology (standard spectral ratio, quality factor, general inversion techniques, etc.). Moreover a noise window is often necessary for the data quality control through the signal-to-noise verification. Selecting the noise window can be challenging when large heterogeneous datasets are considered, especially when they include short pre-event noise signals. This study proposes a fully automatic and configurable (i.e., with default parameters that can also be user-defined) algorithm to windowing the noise and the P, S, coda and full signal once the P-wave (TP) and S-wave (TS) first arrivals are known. An application example is given on a KiK-net dataset. A Matlab language implementation of this algorithm is proposed as an online resource.


Introduction
Selecting specific signal phases (i.e., P, S, or coda waves) is required for diverse applications in seismology. For instance, the early part of shear-wave phase is often used for site effects assessment (e.g., Borcherdt 1970;Jongmans and Campillo 1993;Horike et al. 2001;Satoh et al. 2001) and is the basis of the evaluation of the kappa parameter (Anderson and Hough 1984;Ktenidou et al. 2014), while the quality factor (Q) related to the attenuation is regularly estimated from the later coda arrivals (e.g., Aki and Chouet 1975;Mayor et al. 2016). Moreover, to estimate the quality of a signal and its frequency range of validity, the signal-to-noise ratio (SNR) is computed from the ratio between the Fourier amplitude spectrum (FAS) estimated on parts of the signal and the FAS generally evaluated for a noise window of the same duration that is often selected before the event.
While numerous studies have proposed automatic picking algorithms to determine the P-wave (T P ) and/or S-wave (T S ) first arrivals (e.g., Baer and Kradolfer 1987;Sleeman and van Eck 1999;Zhao and Takano 1999;Zhang et al. 2003;Stefano et al. 2006;Wong et al. 2009;Küperkoch et al. 2010;Tan and He 2016), there have only been a few that have offered a solution for windowing the different phases of the earthquake signal. Most studies have considered a constant window duration for every event, without taking into account the earthquake rupture duration or the expansion of the signal duration as waves are being propagated to larger distances (Phillips and Aki 1986;Bonilla et al. 1997;Drouet et al. 2010;Douglas et al. 2010). Recently, some studies have proposed more complex approaches based on the signal analysis (e.g., Maggi et al. 2009) or based on a model using the information extracted from seismic bulletins (e.g., Kishida et al. 2016).
When working with a large and heterogeneous dataset, once the T P and T S first arrivals have been picked, defining a specific window can be complicated. This complexity increases when a noise window has to be assessed for SNR computation. Indeed, timeseries extracted from triggered instruments and/or generated automatically from regional or national networks, often present short and variable pre-event noise durations. When a large dataset has to be processed, as for generalized inversion techniques (GIT-e.g., Drouet et al. 2008) or ground motion prediction equations (e.g., Laurendeau et al. 2013), then a complex automatic procedure has to be used to avoid the introduction of poor quality data into the processing and to minimize the number of rejected data due to difficult window selection.
The motivation behind the present study was to provide a suitable windowing tool for spectral estimation on different phases with due account to signal-to-noise ratio issues. A method to select the phase windows of any dataset for which the T P and T S first arrivals are known is proposed, and a suitable solution to estimate the noise window from heterogeneous datasets with variable noise level and duration is provided. An automatic Matlab algorithm was developed and tested on a KiK-net Japanese dataset that is composed of more than 2000 manually picked events with short and variable durations of pre-event noise (Laurendeau et al. 2013). The records are accelerograms from local to regional events that are used between 0.25 and 30 Hz mostly. This study was initially developed for the application of GIT to a specific KiK-net subset (Foundotos et al. 2016-same issue), and for the correction of the KiK-net surface records for local site effects for prediction of hard-rock reference ground motion prediction (Laurendeau et al. 2016-same issue): more details on the dataset can be found in these papers.
After a short reminder on the relationships between window duration and the associated minimum frequency valid for the FAS, (f min ), we present the model proposed for the seismic-phase windowing in a first step, and the methodology used for the noise window selection in a second step. In a final step, some examples are given to discuss the windowing obtained for local, regional and teleseimic events, as well as for complex examples including after-or fore-shocks.

Spectral resolution and sensitivity to time window duration
Many studies have tested the sensitivity of their data to signal window duration (e.g., Satoh et al. 2001;Ktenidou 2010;Douglas et al. 2010). These have mostly reported only limited dependence, provided that the same seismic phase is considered. These observations have sometimes been used to justify the choice of a constant window duration for every event.
In addition to the potential differences in the input seismic signal delimited by different windows, the resolution of the FAS differs as well. Indeed, the longer the selected time window, the higher the number of wavelengths considered for the FAS at each frequency. For instance, a 10-s-long window contains only one wavelength of a 10-s period, but 10 wavelengths of a 1-s period, and 100 wavelengths of a 0.1-s period, while a 1-s-long window contains one tenth of the wavelengths at each of these periods. If N is the number of wavelengths necessary to insure a good spectral resolution, then the minimal reliable frequency for the FAS is given by: where D is the duration of the window considered. The higher is N, the better is the spectral resolution. Based on our experience, taking N = 10 is enough to give the assurance of good spectral resolution. However, taking such a high N number is not always possible, as this depends on the seismic phase or noise duration available, and on the minimal frequency necessary for the application. When it is required, the N-value can be optimized by tested it values for different signals. Figure 1 shows the sensitivity of the FAS to the S-wave window duration (D S ). The colors from blue to yellow show the results for window lengths from 2.5 to 60 s. In this example and for various other KiK-net signals tested (not represented here), the minimal N-value for this dataset is around three. Indeed, clear discrepancies appear at low frequency for the shortest window, generally just below the f min criteria (vertical lines) obtained with N = 3. Because the KiK-net dataset present very limited duration of noise for the analyses and tests with N = 3 appear satisfying, we keep this N-value in the following examples.
In agreement with the literature, we find a good stability of the FAS over the frequency range where the N-value criterion is satisfied (Fig. 1). Thus, the FAS seems weakly dependent on the duration of the signal considered, provided the most energetic part of the signal is common to every window. It means that small variations on the duration of the selected phase window would lead to negligible change on the FAS.

Seismic-phase windowing
Phase windowing consists of using the first arrival time of P waves and S waves to automatically select different windows: for P, S, coda, and full signals. The nomenclature for the phase intervals and the different times considered is given in Fig. 2. In addition to T P and T S , the ending signal time (T end ) can be defined as well. This time is used as an upper limit for the duration of the S phase, the coda phase, and the full signal. We recommend selecting T end directly from the spectrogram with precautions in the chosen color scale, to be able to detect the end of the coda waves at low frequency, as well as eventual strong noise or aftershocks at each frequency. If the T end value is not provided, then it is automatically taken as the time corresponding to 95% of the cumulated energy evaluated on the three components between T P and the end of the record. It is particularly useful to pick T end for low SNR records and in the case of close aftershocks or strong transient noise, which can be included by the cumulated energy approach. Moreover, the cumulated energy approach presents the drawback that it depends on the duration and level of noise included in the record around the signal, especially when the latter is weak. The time of the initial sample of the record is denoted T i , while the final sample is given by T f . The time for the earthquake occurrence is noted as T 0 . The method and its associated algorithm have been developed for FAS processing purposes. The window's edge must be tapered to satisfy the infinite signal assumption made for the Fourier transform on a finite window. Thus, the rate of tapering (tx) can be specified in the window selection process, to enlarge the windows and apply the tapering outside the accurate delimitation of the phase windows. a The time histories of the three components and the selected window are shown. The window is a little larger because the cosine taper is applied outside the limit interval of the defined S-wave window. b The corresponding FAS are shown. FAS in gray correspond to the noise spectrum. The vertical lines indicate the minimum frequency associated with the S-wave window and allowed to have at least three wavelengths (N = 3). In this example, it is necessary to have at least 10 s of signal to have a reliable spectrum at 0.3 Hz

P-wave windowing
The P-wave window is the easiest phase to delimit as it starts at T P and ends at T S . The duration of the P-wave phase (D P ) can be written as: where tx takes into account the single edge enlargement in the noise before the P-wave onset, to apply the tapering on this pre-event noise and thus to avoid losing some parts of the P-wave phase in the tapering process for the FAS evaluation. Finally, the P-wave window interval is defined as: Fig. 2 East-West component record from a M L 2.2 earthquake at 56 km epicentral distance and occurring at T 0 . The P, S, coda and all phases are represented by the gray bands indicating I P , I S , I C and I All , respectively, with their corresponding durations (D P , D S , D C , D All ) and considering a rate of tapering of 5% (tx = 0.05). The first arrival times (T P , T S , T C ), phases ending (T end ) and beginning (T i ) and ending (T f ) of the record are also shown. Bottom spectrogram for the seismic energy as a function of the time and frequency

S-wave windowing
The S-wave window duration is given according to a source term through the inverse of the corner frequency (f c -Brune 1970), and a propagation term taking into account the difference time between the P-wave and S-wave first arrivals (T S -T P ): Here, the window is being enlarged by a factor tx for both edges. Thus, the enlargement of the first edge includes a small portion of P waves in the S window, although P waves are already included in the S-wave phase anyway. A minimal duration (D smin ) can be defined for the question of spectral resolution at low frequency. f c is estimated directly by the Brune (1970) relationship [Eq. (4)] from the seismic moment (M 0 ), considering a stress drop (Dr) of 10 bars and a mean shear-wave velocity (b S ) for the crust of 3500 m/s: Dr and b S can be, however, easily adapt to the target region if needed. The seismic moment can be deduced from the moment magnitude (M W ) according to Eq. (6) (Hanks and Kanamori 1979): If the moment magnitude is not available, M W can be approximated by the local magnitude (M L ) extracted from the seismic catalog. This estimation of the source duration is anyway approximate, but is supported by the observed stability in the spectrum evaluated from windows with different D S as shown in Fig. 1. Kishida et al. (2016) proposed a similar formulation for D S , with a part related to the source with different durations defined empirically according to the magnitude, and a part related to the propagation defined as one tenth of the hypocentral distance (0.1R h ). First, for the source term, f c is high for small and moderate earthquakes (M \ 5), and thus D S is only controlled by the propagation term. Then, the source duration can be neglected for M \ 5, making Eq. (4) usable without the need for parameters other than T S and T P . The use of Eq. (4) for earthquakes with M [ 7.5 can lead to very large source durations (see Kishida et al. 2016). A maximal S-wave window duration (D smax ) can be chosen in this context. Secondly, for the propagation term, (T S -T P ) is widely accepted to be close to a 1/8 of the hypocentral distance given in kilometers, making both expressions similar. However, the formulation in Eq. (3) has the advantage of being independent of uncertainties in the source localization (especially the depth) given by the seismic catalog.
The S-wave interval is finally defined as: Figure 3 shows the variation of source and path component duration of D S as defined by Eq. (4) with the magnitude and rupture distance for the KiK-net dataset example. The maximum duration due to the source is around 17 s, and that due to the propagation is around 38 s. The total duration has a minimum of 1.2 s and a maximum of 55 s. However, in the following, we consider a target minimal frequency of 0.3 Hz and a minimum of three wavelengths included inside the window (N = 3). Equation (1) finally gives a minimal duration D Smin = 10 s for applications on the KiK-net dataset.

Coda-wave windowing
Aki (1969) defined the beginning of the coda phase as twice the S-wave travel time (2(T S -T 0 )) after earthquake occurrence T 0 . To be independent of the information extracted from the seismic bulletin, we propose a formulation equivalent to the Aki (1969) definition, but based only on the T P and T S parameters. Using the approximation commonly accepted that R h / 8ðT S À T P Þ and that b S % 3:5 km=s and considering R h =b S % ðT S À T 0 Þ, we easily find 2 T S À T 0 ð Þ%4:6 T S À T P ð Þ . The time of the first 'arrival' of the coda phase (T C ) is finally T C ¼ 4:6 T S À T P ð Þþ T 0 , which is equivalent to T C ¼ 2:3 T S À T P ð Þþ T S . This formulation is also empirically confirmed through the good coefficient of determination R 2 = 0.98 for the linear regression on the local events (R h \ 500 km) between Aki (1969) and our T C expression. This definition of T C has the advantage that it is independent of the uncertainty on the time origin of the earthquake. Finally, the coda-wave interval can be written as: Its associate duration is simply: A minimal coda wave window duration (D Cmin ) can be defined. If D C \ D Cmin , then no coda wave window is returned by the algorithm since the signal is too weak and the amplitude of the coda waves falls rapidly below the noise level. A coda signal is generally available only for events with good SNR (i.e. [10) in a large frequency range.

Full-signal windowing
A full-signal interval (I All ) is also proposed that starts at T P and finishes at T end . This takes into account the enlargement on the first edge for the tapering, and it is defined as: This full signal window is particularly useful when no specific phase is mandatory, and to obtain long enough windows (D All ) to assess spectra with good resolution up to low frequencies.

Noise window selection
The noise window selection generally consists of taking the duration of the target window and reporting it before the first P-wave arrival. Here, a more complex scheme has been developed, to take into account the availability of data with short windows of pre-event noise. Figure 4 shows the pre-event noise duration that is available for the KiK-net dataset. This shows generally that these durations are short and variable, which makes the noise window selection difficult. Only one noise window of duration (D N ) is selected to assess the SNR of several seismic phase windows of different durations. The duration of the target noise window (D t ) has to be at least as long as the longest seismic signal window requested (D P , D S , D C or D All ), or long enough to satisfy the f min criterion. Thus, FAS estimated from these windows of different durations have to be normalized by the square root of the duration to obtain the Fourier amplitude spectrum density (FASD ¼ FAS= ffiffiffi ffi D p ) that is length-independent, for SNR purposes especially. Fig. 4 The number of records versus the pre-event noise duration available In the present example, we consider a minimal noise duration D min of 10 s to be consistent with the minimal S-wave window duration taken previously. However, 10 s of noise is not available before the event for all of the records of the KiK-net dataset, as shown in Fig. 4. The dataset contains 311 records without 10 s of pre-event noise. Thus, different noise window definitions are tested, for which the energy was then compared. The idea was to select a noise window with sufficient duration, and also a window with a representative level of energy (without strong seismic signal included). To do this, one preevent noise window (I N1 ) of duration (D N1 ) is tested as well as two post-events noise windows: a short window I N2 of duration D N2 , and a long window I N3 of duration D N3 .
; T P À 0:1 2 4 3 5 ; ð11Þ The pre-event window (I N1 ) is the preferred one, as no earthquake signal can be introduced in the noise evaluation. However, if the pre-event noise that is available is too short, or if there is a fore-shock before the mainshock, then other windows have to be considered. The post-event window I N3 is longer than I N2 only when the target phase duration (D t ) is greater than D min and D N1 . No S wave can be introduced in the post-event window, while the coda wave can be accepted to minimize the number of rejected signals due to a lack of noise available. An example is given in Fig. 5 that illustrates the noise window selecting process for the S-wave duration targeted on a record presenting a limited pre-event noise duration available. In this example, I N3 is preferable to the two other noise windows because it is the only one that has the same duration as the targeted S-wave window while it provides similar FASD than the two other noise windows even if some coda waves may be included inside I N3 . This latter verification is carried out on the comparison of the mean energy of the different windows. A minimum pre-event noise duration of 1 s is mandatory for these comparisons. To be able to compare just the representativeness of each noise window, the mean energy (E) is estimated for the three noise windows (E N1 , E N2 , E N3 ) according to the following definition: where N min is the index of the minimum frequency (f min ), Nf is the number of frequency samples, and FAS are the Fourier amplitude spectra computed for the three components. A scheme can take into account the length of each window as well as their mean energy, to determine the best noise window for the noise FASD evaluation, as detailed in Fig. 6. The energy for the three noise window comparisons is weighted by some factors defined by the user (F 1 , F 2 , F 3 , F 4 ). This allows I N1 , I N2 , or I N3 to be favored, depending on the duration available for each one, and the number of rejected records to be minimized due to lack of noise for the SNR evaluation. In addition to the noise and the P, S, coda and all signal windows, the algorithm returns a Flag value that indicates which noise window has been selected and how. For the KiK-net application the scheme of the noise window selection process is configured with: D min ¼ 10 s; F 1 ¼ 5; F 2 ¼ 3; F 3 ¼ 2; and F 4 = 0.67. Fig. 6 Scheme of the noise window selection methodology. D N1 , D N2 and D N3 are the pre-event and the short and the long post-event noise window duration respectively. The corresponding spectral energy of these noise windows are given by E N1 , E N2 and E N3 . Few other parameters can be easily adapted to each dataset: the minimal and target duration (D min and D t ) and the weighted factors F 1 -F 4 The main idea is that the pre-event window is favored when both the pre-event and postevent windows are longer than D min and D t (Flag 1). However if the pre-event window is shorter than D t , then the longest post-event window is preferred (Flag 3). In the same way, if I N1 is shorter than D min , then the long post-event is preferred if not too much signal is included inside it (Flag-3), otherwise the short post-event window is chosen (Flag-2).
These Flags for the noise selection are indicated in Fig. 6 and are given in Table 1, with the corresponding number of events that were selected for the KiK-net dataset example. Most of the noise windows selected were pre-event windows. The post-event windows selected were mostly constituted by the long window. Only a few signals are removed from the dataset due to the absence of a noise window for the SNR evaluation. All of the parameters and factors given in this article can be adjusted as an input of the algorithm.

Application examples and discussion
The advantage of our windowing formulation is illustrated in Fig. 7 by the comparison with two simple formulations: a 30-and a 10-s constant S-wave window. Three earthquakes recorded in Provence (Southeastern France) with increasing epicentral distance are presented. The influence of the S-wave window duration is visible on the FASD and the SNR. For the closest earthquake (7a), the 30-s window includes a lot of coda waves and underestimates greatly the FASD obtained with the 10-s window and our formulation. For the intermediate epicentral distance (7b), our formulation is in between the two constant window leading to a FASD that is close to the average between the three window definitions. Only the beginning of the S-wave window is included in the two constant windows for the regional earthquake (7c) while our formulation provides longer duration leading to a slightly lower FASD amplitude even if the three curves are very similar. Thus, a constant window duration is not adapted when working with datasets composed by both local and regional earthquakes while our formulation gives a suitable solution. Concerning the other phases of the signal, we used the classical P wave formulation that seems appropriate, while our coda wave window formulation begins very close to the one predicted by Aki (1969), as expected. The T end95 cumulative energy estimation is always later than the  manually picked one, and this duration increase may be accentuated for longer records.
The three noise windows have the same duration and present similar FASD. However, long pre-event noise is available and this may not be true for each record of every dataset. Figure 8 presents two examples recorded during the 2014 Cephalonia seismic crisis in Greece, for which the selection of noise and phase windows is uneasy and might be unreliable. Here, the phase ending time has not been picked and is defined by T end95 . In the 458 km (c) from the recording site. The P-wave window (I P ), the S-wave window (I S ), the coda wave window (I C ) and the full signal window (I All ) are represented. To make the comparison between our formulation and simple constant window formulation, 30-and 10-s long windows (I S30 and I S10 ) are also represented as well as their corresponding amplitude spectral density (FASD) and associated signal-to-noise ratio (SNR). The picked phase ending (T end ) and the one predicted by the 95% fractile of the cumulative energy (T end95 ) are represented as well as the Aki (1969) coda beginning formulation (T Caki ) that can be compared to our formulation (I C ). Identically to Fig. 5, the noise window selection process is also represented by the pre-event noise window (I N1 ) and the short and long post-event noise window (I N2 and I N3 ) and their associated FASD and SNR. Here, the pre-event noise duration is long enough to target D S leading to the same duration for the three noise windows (D N1 = D N2 = D N3 = D S ) first case (a), the target earthquake is followed by a stronger one that strongly biases the T end95 estimation and leads to include the P-and S-wave phases of the second earthquake in the coda and the full signal phase of the first event. However, the duration of the coda wave is not long enough here to satisfy the minimal coda wave duration criteria (D C \ D Cmin = 10 s). Moreover, for this particular example, the noise selection leads to the rejection of this record (Flag = 0) since the pre-event noise window is too short and the FASD of the two post-event noise windows are too different from the pre-event one.
In the second example, a small fore-shock is present in the pre-event noise while the record is cut before the actual end of the signal. Here the signal phases are not biased but the noise selection is complex. Indeed, I N1 is enriched at high frequency compared to I N2 and I N3 while it is the opposite for low frequency due to the presence of longperiod coda waves in the post-event noise window. When using our parameterization (D min ¼ 10 s; F 1 ¼ 5; F 2 ¼ 3; F 3 ¼ 2; and F 4 = 0.67), the algorithm selects the preevent noise window since it exhibits sufficient duration for good spectral resolution. The SNR is, however, significantly lower at higher frequency. To avoid such difficulty, we strongly recommend visualizing and flag such records when picking P-and S-wave first arrivals. Picking T end or testing more accurate automatic procedures than the cumulative energy one is also required for an accurate coda and full signal phase window ending.  Fig. 8 Examples of uneasy signal phase and noise windowing when no phase ending (T end ) has been picked and when either after-(a) or fore-shock (b) are present in the record. Similarly to Fig. 7, all the signal phases (I P , I S , I C and I All ) and noise windows (I N1 , I N2 and I N3 ) are represented with their corresponding FASD and SNR. Here, the pre-event noise is limited, leading to test different post-event noise windows durations

Conclusions
Seismic signal windowing is the preliminary step for many applications in seismology and for SNR verification. While the vast majority of previous studies have used very simple windowing formulations, such as constant duration, we propose a more complex method that takes into account source and propagation terms. This study provides a suitable solution for heterogeneous datasets where the P-wave and S-wave first arrivals have been picked beforehand. The earthquake signal phases are selected exclusively from the T P and T S parameters for the majority of the events, which makes the windowing independent of the uncertainties present in the information given by the seismic bulletin. For strong earthquakes (M [ 6) that have source durations that are not negligible, a source term is estimated through the inverse of the corner frequency evaluated from the magnitude. Selecting the noise window can be challenging when large heterogeneous datasets are considered, especially if for some events the duration of the available pre-event noise is short. The noise window has to be the most representative of the noise level, and long enough to allow SNR estimation with good resolution at low frequency. To get around this issue, we defined and tested three different windows: one pre-event and two post-event windows. A scheme is proposed for selecting of the best noise window in terms of duration (as long as possible) and mean energy (as low as possible), without including undesirable transients. This approach gave good results on the KiK-net dataset, with a very limited number of rejected signals.
A Matlab algorithm was developed and can be adapted to each dataset through a few parameters to be defined by the user. This algorithm is freely available as electronic online supplementary material of this paper and ready to be used for every windowing application.