Comparing Methods and Defining Practical Requirements for Extracting Harmonic Tidal Components from Groundwater Level Measurements

The groundwater pressure response to the ubiquitous Earth and atmospheric tides provides a largely untapped opportunity to passively characterize and quantify subsurface hydro-geomechanical properties. However, this requires reliable extraction of closely spaced harmonic components with relatively subtle amplitudes but well-known tidal periods from noisy measurements. The minimum requirements for the suitability of existing groundwater records for analysis are unknown. This work systematically tests and compares the ability of two common signal processing methods, the discrete Fourier transform (DFT) and harmonic least squares (HALS), to extract harmonic component properties. First, realistic conditions are simulated by analyzing a large number of synthetic data sets with variable sampling frequencies, record durations, sensor resolutions, noise levels and data gaps. Second, a model of two real-world data sets with different characteristics is validated. The results reveal that HALS outperforms the DFT in all aspects, including the ability to handle data gaps. While there is a clear trade-off between sampling frequency and record duration, sampling rates should not be less than six samples per day and records should not be shorter than 20 days when simultaneously extracting tidal constituents. The accuracy of detection is degraded by increasing noise levels and decreasing sensor resolution. However, a resolution of the same magnitude as the expected component amplitude is sufficient in the absence of excessive noise. The results provide a practical framework to determine the suitability of existing groundwater level records and can optimize future groundwater monitoring strategies to improve passive characterization using tidal signatures.

of detection is degraded by increasing noise levels and decreasing sensor resolution. However, a resolution of the same magnitude as the expected component amplitude is sufficient in the absence of excessive noise. The results provide a practical framework to determine the suitability of existing groundwater level records and can optimize future groundwater monitoring strategies to improve passive characterization using tidal signatures.
Keywords Tidal subsurface analysis · Tidal constituents · Signal analysis · Harmonic least squares · Non-uniform sampling

Introduction
Reliable detection and extraction of harmonic components embedded in measurements is crucial for a range of different applications in the geosciences. These include, but are not limited to, the prediction of ocean tides (Pawlowicz et al. 2002), investigating the propagation of seismic waves (Tary et al. 2014), identifying oscillations in climate signals (Ghil et al. 2002) and quantifying water flux in near-surface sediments using temperature measurements (Wörman et al. 2012;Rau et al. 2014;Halloran et al. 2016).
One emerging application is the characterization of the subsurface using the groundwater response to Earth and atmospheric tides. The impacts of astronomical tides on groundwater systems have long been observed, and methods have been developed and applied to characterize subsurface systems (Bredehoeft and Papaopulos 1965;Hsieh et al. 1987; Van der Kamp and Gale 1983;Xue et al. 2016). The advantage of such techniques is that they are passive (Allègre et al. 2016) and thus can be widely applied to existing data sets (McMillan et al. 2019). This approach enables the quantification of subsurface hydro-geomechanical properties such as hydraulic conductivity, specific storage or compressibility, and is termed tidal subsurface analysis (TSA) (McMillan et al. 2019). However, the first step towards property quantification is the extraction of amplitudes and phases of distinct harmonic constituents from measurements that contain other signals or noise that is not white.
Many different signal analysis methods have been developed, each tailored to the challenges of specific applications. Generally speaking, the suitability of any particular methodology depends on the requirements for the spectral analysis (Tary et al. 2014). One of the oldest and most popular approaches is the Fourier transform, a theorem stating that any continuous periodic signal can be decomposed into a sum of properly chosen sinusoidal functions (Stein and Shakarchi 2011). For discrete measurements, the reformulated discrete Fourier transform (DFT) was introduced and translated into the fast Fourier transform (FFT), an algorithm designed for computational efficiency (Nussbaumer 1981). The FFT is implemented in all major software platforms designed for scientific computations, such as MATLAB, Scientific Python (SciPy) or R, which allows for easy implementation in data analysis workflows that require signal processing. The FFT has been used in a broad range of scientific fields over many decades and has therefore been hailed as one of the most important numerical algorithms ever created (Rockmore 2000).
With the DFT, the frequency content of a signal is estimated without a priori knowledge. While this can be a major advantage, it comes at significant costs; for example, estimates of harmonic components are prone to error, especially when their frequencies do not comply with the discretization in the frequency domain which is dictated by the record length in the time domain. As a result, the decomposition of harmonic components is influenced by aliasing and spectral leakage (Havin and Jöricke 1994;Smith et al. 2001;Stoica and Moses 2005). Separating harmonics with nearby frequencies, such as those caused by different constituents (equivalent to components) of the astronomical tides (McMillan et al. 2019), is difficult to achieve. To overcome such frequency separation issues, the minimum record duration is estimated based on the Nyquist theorem (Havin and Jöricke 1994;Acworth et al. 2016).
The presence of data gaps inherent to real-world measurements can complicate the analysis, because DFT requires a uniform sampling rate. Data gaps are usually filled by interpolation which adds to the processing steps and can affect the results in undesired ways (Munteanu et al. 2016). Finally, the magnitude of harmonic components may be smaller than the sensor resolution, and therefore these components may fall below detectability . Despite these limitations, the DFT remains one of the most popular approaches applied in the geosciences. However, its reliability for analyzing real-world data sets is often neglected, probably because signal processing is considered as a stepping stone rather than the research subject.
Many other signal analysis techniques have been developed to overcome the shortcomings of the DFT, such as methods that apply fitting between a model and data based on minimizing a metric that quantifies the difference; for example, the least squares of the differences between a signal and a sum of harmonic functions at defined frequencies evaluated at the times at which the signal was sampled. The popular Lomb-Scargle approach (Lomb 1976;Scargle 1982) falls within this category and was specifically developed for non-uniformly sampled data (VanderPlas 2018). It has received much attention and is a well-established tool applied to data sets from across the scientific disciplines. However, Stoica et al. (2009) illustrated that it does not have any particular advantages for spectral analysis based on least-squares periodograms (LSP). Further, the limitations of least-squares-based approaches when analyzing real-world data sets have not been established.
With signal extraction being a necessary step of TSA, the quality of subsequent natural property quantification is directly proportional to the reliability and accuracy of the signal analysis approach deployed. The aim of this paper is to (i) systematically compare the performance of the DFT with that of a harmonic least-squares (HALS) approach when estimating amplitudes and phases of harmonic constituents with known frequencies, and (ii) define practical limits for record duration, sampling rate, measurement resolution (here termed signal quantization), signal-to-noise ratio and gap fraction for the reliable extraction of harmonic constituents from real-world time-series measurements. The results provide important criteria and guidelines for the types of groundwater data sets that can be analyzed using TSA and for future groundwater monitoring strategies (Rau et al. 2020b).

Discrete Fourier Transform (DFT)
In the geosciences the DFT is a commonly applied methodology. The use of the DFT does not require any primary information about the signal before processing. If a data set is complete, i.e., regularly sampled in time and without gaps, the DFT can provide a good estimate of the frequency spectrum. Because of this simplicity, the Fourier transform is widely used in many scientific fields to extract frequency information about a process that is not easily accessible in the time domain. The frequency spectrum can be expressed as follows (Havin and Jöricke 1994;Stoica and Moses 2005) where y n are discrete samples at times t n with the time index n; N is the number of discrete samples;Ŷ j is a complex coefficient at a discrete frequency with index j. The amplitudes and phases for each identified frequency component can be quantified aŝ where and denote the real and imaginary parts of a complex number, respectively, and phase values always fall within −π ≤ φ k ≤ π . The reader is reminded that the frequency bins are uniformly spaced, with where f d is the sampling frequency. Further, the frequency resolution Δf is the minimum interval between each frequency bin of the DFT that can be resolved and is defined as where τ is the total duration of the time series. The inverse relationship between the frequency resolution and the total duration of the time series justifies taking a preferably long data set (i.e., the longer the duration, the better the DFT resolves the components in the spectrum). Energy from frequency components whose frequencies lie between bins is distributed to the neighboring bins. This is referred to as spectral leakage (Havin and Jöricke 1994). Furthermore, the DFT assumes the input data set to be finite (i.e., a continuous spectrum with one complete period of a periodic signal). Periodic continuation of the discrete signal beyond the considered series duration can lead to discontinuities at the transition (i.e., the two endpoints of the waveform). These affect our ability to distinguish the frequencies of the original spectrum using DFT, as they appear in the spectrum as additional high-frequency components. This well-known effect was minimized by multiplying the time record with a Hanning window (Fig. 1h), which tapers the magnitude at the beginning and at the end of a finite-length record towards zero and therefore prevents discontinuities. While many different window shapes exist, the Hanning window is often used to reduce interference from leakage (Harris 1978). For a comprehensive discussion of the DFT please refer to relevant signal analysis literature, for example Smith (2007).

Harmonic Least Squares (HALS)
The harmonic least-squares method for amplitude and phase estimation (HALS) is an optimization approach which aims to minimize the sum of the squared residuals between a model combining harmonics with known frequencies and some discretely measured data points (Stoica and Moses 2005) and is defined as follows Here, N is the number of discrete samples, y(t n ) is the value of a sample at time t n and K is the total number of tidal constituents k with angular frequency ω k = 2π f k . By adding up the harmonics (K ), multiple tidal constituents, such as those found in groundwater measurements (see Table 1), are taken into account. The sample timings y(t n ) in Eq. 6 are much more flexible compared to the requirement for equally spaced samples y n (t n ) in Eq. 1. The solution to this minimization problem can be simply obtained by solving a system of linear equations for the coefficients a k and b k . It is worth noting that the solution can be affected by numerical errors that arise from the limited precision of the arithmetic operations performed by standard computers. The coefficients a k and b k are converted into amplitudê and phaseφ with phase values always within −π ≤ φ k ≤ π .

De-trending of Records
Both DFT (Eq. (1)) and HALS (Eq. (6)) work best when a signal can be approximated by a summation of harmonic components and white noise. Most real-world ground-

IV g DFT
For uniformly sampled data 1 2 3 4 5 0 1 0 1 5 2 0 Magnitude Magnitude  Fig. 1 A workflow for the systematic generation and analysis of the synthetic signal data set. In sub-figure (h), the windowed de-trending function is applied to a non-uniformly sampled input signal (black dots). The resulting trend (green line) is subtracted from the original signal to get the de-trended signal (green dots). The grey vertical bars illustrate the missing samples for a time series with a TGP of 0.2, while the change in sampling frequency f s is shown in the second plot below. Finally, at the bottom, the HALS analysis results for three frequencies and their deviation from the true value (grey broken line) are shown water head measurements contain components with random but longer periods and higher magnitudes compared to the target constituents (Table 1). A de-trend function should precede application of both methods to improve signal extraction. This was specifically developed for non-uniformly sampled data. The function fits a linear trend to the data using least squares (Oliphant 2006) and then subtracts the resulting trend from the original time series (Fig. 1h). In order to eliminate any frequency components lower than a specific cut-off value, the de-trending was done in segments by moving an averaging window with a predefined time length across the record. By defining the Table 1 Overview of the main tidal constituents reported in groundwater head measurements (Merritt 2004;McMillan et al. 2019), their frequency spacing and minimum required record length to resolve harmonic properties. The latter is calculated for the two closest constituents (top to bottom), using Eq. (5), with the frequency spacing being the frequency resolution window size in terms of a time length instead of a number of samples, this approach works well for non-uniformly sampled time series. The approach acts as a high-pass filter, where the cut-off frequency ( f ≥ 0.2 cpd) was selected to preserve the target tidal frequencies ( f k > 0.89 cpd). Boundary effects were reduced by defining the window step size as a fraction of its own size, resulting in a window overlap. Here, a window size of 5 d with an overlap of 3 was chosen as a compromise between an effective high-pass filter and reasonable computing times. De-trending can be optimized by reducing the window size (i.e., by approximating the cut-off frequency to the smallest f k ) and increasing the overlap (i.e., by smoothing the de-trending curve).
In the present case, the high-pass filter removes all trends caused by constituents with a period d ≥ 5, whereby all the tidal constituents sought are retained in the signal. For availability of this de-trending function please refer to the acknowledgements.

Characteristics of Tidal Constituents
The frequencies of astronomical tides are measured in cycles per day (cpd) and have been well documented in the literature (Agnew 2018), including their impact on groundwater systems (Merritt 2004;McMillan et al. 2019). Table 1 shows the frequencies of significant constituents that have been reported in the literature (Merritt 2004;McMillan et al. 2019) and will therefore be used in this analysis. A major challenge for TSA is the fact that tidal constituents are buried in many other signals. Further, the frequencies of tidal constituents embedded in groundwater head measurements can be quite close. Table 1 illustrates the frequency spacing between the two closest components and the theoretically required minimum record duration to resolve the respective peaks using Eq. (5). Acworth et al. (2016) demonstrated that M2 and S2 have the highest information content, making them the most useful constituents that can be resolved with a uniformly sampled groundwater level record of just 15 days. However, separating other constituents such as P1, S1 and K 1 would require a record length of 1 year. The uniformly spaced frequency resolution of the DFT presents a particular challenge when identifying and resolving tidal constituents (Table 1), for two reasons: (i) it is impossible to find a record duration which is a multiple of all the estimated frequencies, and (ii) it is impractical to limit the analysis to a defined record duration. Despite these shortcomings, the DFT is commonly used (Acworth et al. 2016;Allègre et al. 2016;Xue et al. 2016;Rau et al. 2018;Qu et al. 2020). It is suggested that HALS would be a better solution since the problem is well-posed because the desired tidal frequencies are known.

Synthetic Data Set for Method Testing
To compare and analyze the performance of DFT and HALS in extracting harmonic tidal constituents, synthetic signals were generated with the following form Here, t is time, and A k and φ k are the amplitude and phase of each mode k, with values drawn at random from uniform distribution U A [0.1, 10] and U φ [−π, π], respectively. The synthetic signal S consists of K = 12 superposed sinusoidal modes which represent typical tidal constituents that have been documented in the literature and are summarized in Table 1 (Merritt 2004;McMillan et al. 2019). Moreover, white noise was added to the signal in Eq. (9) such that where Here, C is a scaling factor and μ is a Gaussian distributed random variable with zero mean and unit standard deviation [N (0, 1)]. The scalable noise term in Eq. (10) defines the noise level of the signal and was calculated based on the total signal power P signal and the signal-to-noise ratio S N R = {1, 10, ∞}. To express the power ratio between signal and noise (S N R) more conveniently, a logarithmic decibel (dB) scale was chosen such that d B = 10 lg P signal P noise .
Thus, a signal with d B = 0 corresponds to a power ratio of 1. A set of 900 synthetic signal realizations was generated from Eq. (10) (Fig. 1), based on 100 different signal combinations S from A k and φ k in Eq. (9), while additionally considering three random noise realizations for every given S N R in Eq. (11). The random noise was generated separately for every signal realization. The resulting data set was then systematically sampled at a frequency f s over a record duration τ with a total gap proportion (TGP) and quantization q to create 1,860 time series for each element of the set. The complete parameter space is defined in Table 2. In total, the data set consists of 1,674,000 time-series realizations that were generated and analyzed. The different time-series configurations are designed to represent real-world groundwater head measurements and can be broadly classified as: i uniformly sampled time series, and ii non-uniformly sampled time series that are characterized by missing values at random locations, resulting in small data gaps.
While all time series with uniform sampling were analyzed by both DFT and HALS, the analysis of the non-uniformly sampled time series was limited to HALS. It should be noted that the latter configuration is representative of most time-series measurements in groundwater monitoring. The gaps are caused, for example, by temporarily removing the logger for maintenance purposes or by replacing it due to failure . For the synthetic time series under consideration, these gaps were created by removing sample points from the uniformly sampled time series of the signal at random locations and of different sizes (Fig. 1e). For this purpose, a similar approach as in Munteanu et al. (2016) was chosen, whereby the gap size distribution was defined by a gamma function, so that where α and β are the shape and scale parameter of the gamma probability density function (PDF), respectively. The actual probability of each gap size P(s) was controlled by drawing both mean (x = α × β) and variance (s 2 = α × β 2 ) of the gamma function at random from a uniform distribution U g [1,5]. Each gap size probability was then scaled by the TGP, which represents the total number of points removed from the time series and is defined as a proportion of the original number of sampling points N . Thus, the total number of gaps G of a certain size s that are removed from the signal at a random location x as successive samples (n g = [n x , n x+s )) can be described as follows where the resulting real number was rounded to the nearest integer value. The overlap of the randomly created gaps is minimized while by only selecting gap locations that are buffered by at least one sample value on either side of the gap. Non-uniform sampling was limited to f d = {6, 12, 24, 48} and τ = {60, 120, 180} to minimize the computational load while sampling representative time series at the lower end of N . In practice, every measurement device has a limited resolution, which leads to a rounding effect on the digits of the signal magnitude (Fig. 1f). In signal theory this is referred to as quantization, and the approximation leads to a small error. This effect was considered by applying a uniform quantizer Q to each discrete time series, which can be expressed using a floor function where x is a real number and q is the quantization step size. The floor function returns the greatest integer that is less than or equal to the enclosed term. Thus, for q = 1, the quantizer is simply rounding to the nearest integer. The quantization range represents the range of groundwater pressure transducer resolutions that are found on the market ).

Description and Analysis of Field Measurements
The two signal processing methods were further applied to real groundwater head measurements from two boreholes at different locations, which are representative of frequently occurring subsurface and measurement conditions:   The accuracy of DFT and HALS in quantifying harmonic constituents was analyzed and compared in the following way: (a) The model of both methods was trained on the first 75% of each data set. Thus, the model was trained on 12,512 and 80,961 samples for BLM-1 and BH3, respectively. To evaluate the performance of each method, the error variance of the residual between model and measurements was calculated.
(b) For the DFT, the modeled signals were calculated using only the frequency bins closest to the tidal frequencies. The other bins in the spectrum were discarded to exclude artifacts originating from the de-trending or from noise and also to increase practicability. (c) Both models are validated on the remaining 25% of each data set by calculating the variance of the residual between each model prediction and the corresponding measurements. Thus, the validation was based on 4,171 and 26,987 samples for BLM-1 and BH3, respectively.
The error variance σ is determined by where the vectors M and D contain the modeled and de-trended signal, respectively. Comparing the error variance of the residual allows for a quantitative assessment of model performance, whereby a smaller variance indicates better performance.

Comparison of DFT and HALS
When applying TSA, an accurate assessment of amplitudes and phases of the constituents M2 and S2 is particularly important in order to derive reliable subsurface properties. The efficacy of both the DFT and HALS methods was evaluated as a function of the detection accuracy, with results shown in Fig. 3. The direct comparison was limited to time series with τ ≥ 30 days. Since the true parameter value is known, the accuracy was defined in terms of the relative error R E as and with the nominator in Eq. (19) taking into account that the maximum distance between two phases is π . A target value (TV) of 10% relative error was defined as the maximum acceptable relative error (i.e., results with a R E less than 10% are regarded as sufficiently accurate). While this is an arbitrary threshold, it is a small value compared with the compound uncertainties typical of active hydraulic investigations that are standard in hydrogeology (Raghavan 2004). It should be noted that the R E distribution of both methods is strongly right-skewed. In Fig. 3 this is evident from the off-centered median value and interquartile range. In order to take account of the skewness in the data and to give less weight to outliers, the mean relative error for each grid cell in Figs. 4, 5, 6 and 7 was calculated using the log-transformed R E values. The result was inverted again for better readability. Overall, 86% and 62% ofÂ M2 , and 95% and 90% ofφ M2 were below the TV for HALS and DFT, respectively, whereas only 76% and 18% ofÂ S2 , and 86% and 40% ofφ S2 were below the TV for HALS and DFT, respectively. Thus, both methods are consistently more accurate in estimating the M2 properties compared to S2.
The outliers for R E A estimates of the HALS method compared to a narrow and low interquartile range in Fig. 3 indicates that the accuracy of the HALS estimation deteriorates under certain boundary conditions, while it stays robust for most of the investigated signal and sampling parameter range. While HALS clearly outperformed the DFT in simultaneously estimating amplitudes and phases of the tidal constituents, the estimation routine requires proper constraints (see Sect. 3.6). In contrast to the DFT method, in which the maximum ofÂ is limited by the signal power, HALS is an optimization problem that minimizes the residuals and thus has no reasonable inherent constraint onÂ. Instead, the goodness of fit depends on the polynomial order (i.e., the degrees of freedom) given by the number of estimated constituents, but also on the record duration and the frequency separation of the estimated constituents. The poor performance of DFT in estimating the S2 properties, on the other hand, is likely due to spectral leakage from other nearby frequencies, for example S1, M2 and K 2 ( Table 1). The method's ability to discern nearby frequencies is strongly limited by the frequency binning, which depends on sample frequency and record duration, which is further analyzed below.

Practical Data Set Requirements for Detecting Harmonic Properties
When assessing whether a data set of measurements is suitable for TSA, record duration τ or sampling frequency f s are used as indicators in practice because they are easy to determine and related to criteria such as the Nyquist-Shannon sampling theorem. Figure 4 shows the interrelated effect of f s and τ on both R E A and R E φ for HALS and DFT, respectively. HALS can reliably extract the constituents phase and amplitude at f s ≥ 6 n/d, while not exceeding the TV of 10% relative error. Only for τ < 50 d, the sampling frequency increasingly becomes an issue in determiningÂ S2 (Fig. 4c) and should be taken into consideration. In essence, τ and f s are inversely correlated for τ < 50 d, and f s needs to be increased if M2 and S2 are to be reliably detected. Figure 4 shows that even if f s = 288 d −1 , the absolute minimum duration for records analyzed with the HALS method is 20 days.
A detection trade-off between τ and f s was not discernible with the DFT method. This is in accord with the well-known fact that the DFT frequency resolution depends on record duration. Instead, the detection accuracy increased with the length of the record only. Further, the DFT method appears much less robust compared to HALS, and it is not possible to determine a minimum criterion for R E A within the realistic constraints given by our analysis. Acworth et al. (2016) suggested that records as short as 16 days could be used to resolve the properties of S2 and M2, based simply on considering the Nyquist-Shannon theorem (Table 1). However, our analysis clearly demonstrates that under realistic measurement conditions, the presence of additional tidal constituents renders this limit impossible.

Effect of Measurement Resolution and Noise Level
The influence of measurement resolution and noise level on the accuracy of HALS is shown in Figs. 5, 6 and 7. In order to facilitate a comparison of the effect of the quantization step size q on the accuracy of the amplitude estimate R E A across all time series, q was normalized by the amplitude A k of the constituent of the signal under consideration, namely M2 and S2, such that The resulting quantization ratio q k thus reflects the relative influence of the rounding error introduced by q on the identifiability of the constituent k, which depends on the magnitude of k itself. From Fig. 5 it becomes apparent that the effect of the sample parameters τ and f s on the detection accuracy changes with the q k . In fact, as q k increases, there is a clear dependency between the accuracy and both sample parameters. For q k < 2.0, the minimum sample criteria for HALS are identical to the previously determined limits (Fig. 4). In other words, the quantization step size should be less than twice the amplitude of the constituent under consideration, without impairing the accuracy of the method. Conversely, at higher values of q k , the minimum sample criteria need to be adapted for a reliable estimate, even more so forÂ S2 . An increase in the sampling rate seems to be particularly effective here. In practice, the actual measurement resolution that can be achieved is of course limited, which is why this relationship is not transferred to arbitrarily small amplitudes A k 0.1 mm. Fig. 7 Effect of sampling frequency and record duration on the mean accuracy of HALS in estimating S2 amplitude (a-c) and phase (d-f) for three different noise levels A similar overall tendency in the effect of the sample parameters on detection accuracy was observed for a change in noise level. With an increasing level of noise, the sampling criteria require adjustment. In contrast to q k , however, accurate estimates for the properties of both target constituents are feasible even at the highest investigated noise levels of 0 dB. Furthermore, for a noise level approaching zero (db = ∞), HALS is effectively limited by the accuracy of the S2 estimate requiring f s ≥ 6 d −1 and τ ≥ 20 (Fig. 7).

Effect of Non-uniform Sampling on HALS Performance
One of the main advantage of the HALS method is that non-uniformly sampled data records can be analyzed directly without further preprocessing steps. The performance of the HALS method in estimating the target constituent properties for time series with increasing TGP is presented in Fig. 8, as the fraction of estimates with a relative error below the aforementioned TV of 10% (R E < 10%). On average, 72% and 70% of all A and 86% and 85% of allφ met the TV criteria for M2 and S2, respectively. Thus, the HALS method performs well in extracting the target constituent properties for the majority of the investigated time series, even with gaps. However, the fraction of estimates that meet the TV criteria are highly dependent on the TGP, which is apparent from a sharp decrease in accuracy at T G P ≈ 0.5. This is especially the case for the amplitude estimates (Fig 8a). For T G P < 0.5, on the other hand, the estimates are highly reliable, with about 79 ± 25% and 94 ± 15% of all R E A and R E φ meeting the TV criteria, respectively. Thus, the general dependency of the estimation accuracy on the TGP is much less pronounced for the phase (Fig. 8b). However, the standard deviation indicates that the accuracy of the estimate at any given gap proportion can deviate, depending on the underlying signal realization.
Furthermore, the effect of the gap proportion on the accuracy of the estimate is again dependent on the sampling of the time series (Fig. 9). Data gaps decrease the effective sampling rate, so that even for a T G P < 0.2, the minimum sampling criteria are slightly more strict than for a time series without gaps. It should also be noted that while the average gap size is controlled by the gamma function (Eq. (13)), once Eq. (15) Fig. 8 Effect of small gap proportion on the performance of HALS in estimating amplitude (a) and phase (b) of the two target constituents M2 and S2. The performance is evaluated in terms of the fraction of time series where the relative error is below the TV (R E < 0.1). The vertical bars indicate the standard deviation for the different signal realizations Fig. 9 Effect of sampling frequency and record duration on the mean accuracy of HALS in estimating the amplitude of M2 and S2 combined, for different gap proportions. The mean accuracy was explicitly indicated for values of R E > 0.2 is violated, the average gap size increases dramatically due to the aggregation of smaller gaps. Thus, the observed steep decrease in estimation accuracy above T G P ≈ 0.5 can be partly attributed to the emergence of ever larger gaps.
Groundwater head measurements are generally made at regular intervals, but often contain small, randomly distributed gaps, with the TGP remaining well below the critical 50% . While these gaps are usually interpolated as a preprocessing step for the DFT method, this not only adds an extra step to the data analysis, but also increases the uncertainty of the parameter estimation. The previous analysis shows that HALS, on the other hand, makes interpolation superfluous and works consistently well for records with small gaps, providing another advantage for HALS as a standard approach to extracting harmonic constituent properties.

Fig. 10
Comparison of the amplitudes and phases determined by the DFT and HALS for a the BLM-1 data set from Inyo County (California, USA) and for b the BH3 data set at Baldry (New South Wales, Australia). The grey lines represent the differences between the DFT and HALS estimates. Note that the amplitude scale is logarithmic for improved visibility

Comparing the Performance of DFT and HALS Using Groundwater Level Measurements
Both methods were tested on the real-world groundwater head measurements described in Sect. 3.2. Figure 10 shows the amplitude and phase estimates obtained for all tidal constituents and both data sets. A comparison of the two data sets clearly shows that the record length in particular has a major impact on the estimation accuracy. Indeed, the two methods result in rather differentÂ estimates for the short data set (Fig. 10a), while they give almost identicalÂ estimates for the longer data set (Fig. 10b). However, it is also interesting to note that the results for some of the constituents are consistently very close, while others differ significantly. For example, the determinedÂ S2 andφ S2 are in compliance with each other for both DFT and HALS and in both data sets. On the other hand,Â K 2 andÂ S1 in the BLM-1 data set are rather different for the two methods. These differences are mainly due to spectral leakage inherent to the DFT. This means that the energy of tidal constituents with nearby frequencies, for example, K 1 and S1, are contained within the same frequency bin, especially for records with shorter duration (e.g., BLM-1), leading to an inaccurate frequency resolution. In this regard, the ability of HALS to model specific frequencies generally provides a better performance. Figure 10 qualitatively illustrated that there can be significant differences in the amplitudes and phases detected using DFT or HALS. When used to characterize groundwater systems based on TSA, this can lead to erroneous interpretations. For example, Hsieh et al. (1987) used the DFT to extract amplitudes but then calculated phases using a HALS-like approach. Figure 10 and the synthetic data analysis indicate  1 (a, b) and BH3 (c, d). The model for both methods was trained on the first 75% of the record and validated using the remaining 25% (grey-shaded). Error variances were calculated for the training and the validation periods and are listed in Table 3   Table 3 Summary of the training and validation of the DFT and HALS models for both real-world groundwater head measurements. The values represent the error variance calculated for each subset of the data depicted in Fig. 11 BLM-1 BH3 Training Validation Training Validation DFT 9.99 · 10 −4 1.09 · 10 −3 3.93 · 10 −4 3.55 · 10 −4 HALS 1.56 · 10 −4 4.62 · 10 −4 2.87 · 10 −4 2.97 · 10 −4 that this could be inconsistent. Further, the synthetic data analysis clearly shows that HALS provides more accurate results for real-world groundwater records. Figure 10 begs the question of which method is more accurate in resolving harmonic component properties. To answer this question, the prediction accuracy of each method was analyzed and compared on the basis of the error variance of the model residuals. The results are summarized in Fig. 11 and Table 3. Overall, the variance of the residuals between model and measurements is small for HALS compared to DFT. Of course, this should come as no surprise, as minimizing the residuals is inherent to the least-squared method, and it is clearly superior in this regard. Nevertheless, it also reflects the earlier results that were obtained by testing the methods with synthetic data and demonstrates that HALS performs better in quantifying the properties of tidal constituents. Furthermore, the model residuals of the validation data (Fig. 11), and thus the error variance (Table 3), are higher for the DFT than for the HALS method. This is particularly pronounced for the short data set, where the DFT residuals also contain prominent oscillations (Fig. 11a). These oscillations are a result of the spectral leakage stemming from the discrete and fixed frequency resolution inherent to the DFT. Figure 11panels (c) and (d) show that the residuals of both methods become more similar as the length of the data set increases. Thus, as the frequency resolution of the DFT increases, the spectral leakage is reduced, which leads to an improved ability to detect harmonic components. In contrast, the HALS model residuals are consistently low, indicating that it is more accurate in resolving the harmonic components contained in both records.

General Considerations for the Use of HALS
The analysis presented in this work demonstrates that HALS is superior to the DFT when extracting tidal constituents from realistic groundwater head measurements. While the minimum record duration required to distinguish components appears lower compared to the DFT, it is still the most significant constraint for HALS. It is important to note that Eq. 6 is an optimization problem where the solution can be affected by the limited precision of arithmetic computations, leading to compounded numerical errors. The risk of this occurring increases with decreasing record duration, where a solution can become ill-conditioned. For example, the combination of closely spaced frequency components and short duration records will lead to similar values in the columns of the linear system of equations that is to be solved. This can degrade the accuracy of the solution and lead to overall biased estimates.
Similar to the DFT, there is a trade-off between record duration and minimum frequency spacing for the components determined by HALS (i.e., the smaller the frequency spacing, the longer the record duration must be). To determine whether the solution is ill-conditioned, the conditioning number can be evaluated. This is the ratio of the maximum and minimum singular values of the matrix in the least-squares problem (Eq. 6) and should not exceed a large number, such as 10 8 (i.e., depending on the computing environment). If a system becomes ill-conditioned, a solution would be to reduce the number of desired tidal constituents. For example, tidal constituents P1, S1 and K 1 (Table 1) could first be unified into one harmonic component, as they are closely spaced. Constituents S2 and K 2 are the next closest components that could be merged if the previous merger does not improve the conditioning. For TSA, however, the two components M2 and S2 are of primary interest, which is why an analysis with HALS only makes sense as long as the properties of both can be reliably extracted.

Conclusions
The systematic analysis and comparison of the methods revealed that the harmonic least-squares (HALS) approach is very robust and outperforms the discrete Fourier transform (DFT) in extracting the amplitude and phase of harmonic constituents of known and closely spaced frequencies from time-series measurements. Only for rather short data sets can the variance of the least-squares amplitude estimate increase signif-icantly. The analysis focused on distinguishing the closely spaced frequencies of the Earth and atmospheric tidal constituents M2 (1.93227 cpd) and S2 (2.0 cpd). Both are generally present as dominant constituents in barometric and groundwater pressure measurements and can be used to quantify subsurface properties. As a benchmark of accuracy, an average relative error of 10% was considered to be acceptable when estimating tidal constituent amplitudes and phases.
It is well known that the DFT frequency domain resolution increases with longer duration of a time series and remains independent of the sampling frequency f s . For HALS on the other hand, an accurate estimate of the S2 amplitude requires f s to increase as τ decreases. This effect was slightly less pronounced for the amplitude estimate of M2. Furthermore, the results of this study suggested that an absolute minimum record duration of at least 20 days is needed for resolving the properties of both constituents M2 and S2 at more than 24 n/d (for a normalized measurement resolution below 2). However, f s should not be lower than 6 n/d, unless the noise level approaches zero (S N R = ∞). In the case of S N R = ∞, HALS can exceed these sampling limitations, in particular when determining the properties of M2. However, this scenario is unlikely in practice, as environmental measurements are generally noisy.
The synthetic data analysis further included noise and signal resolution, factors that represent the characteristics of measurement hardware used in environmental sensing such as groundwater pressure. As expected, increasing noise levels degrade the detection of harmonic properties, but more so for amplitudes than phases. Even a noise level of 0 dB, which is considered very high for standard sensors, resulted in excellent HALS performance. Furthermore, sensor resolution (here simulated as signal quantization) also degraded the performance. However, results illustrated that a quantization ratio below 2.0 (i.e., quantization normalized by the amplitude of the target constituent) is recommended.
Finally, the influence of small data gaps inherent to real-world data sets was analyzed. The results demonstrated that the accuracy of the HALS method in detecting the harmonic constituents was reliable as long as the TGP remained below 50%. This relatively high threshold is one of the main advantages of the HALS method, which makes further preprocessing steps such as interpolation or resampling superfluous. The superiority of HALS was further underpinned by a comparison of amplitude and phase estimates from two different real-world groundwater head records.
The analysis presented herein suggests that when applying tidal subsurface analysis (TSA) (McMillan et al. 2019), HALS should be used instead of DFT. In combination, these approaches provide a powerful tool for groundwater resource investigations. The results establish practical criteria which can be used to determine the suitability of existing groundwater head records for TSA. In addition, recommendations for future groundwater monitoring strategies can be derived, with which the accuracy of the passive characterization can be maximized (Rau et al. 2020b).
for providing the groundwater pressure data set from BLM-1. This data set is also available online: https:// nwis.waterdata.usgs.gov/nwis/gwlevels/?site_no=362402116280901. Some of the data used in this paper were collected with equipment provided by the Australian Federal Government-financed National Collaborative Research Infrastructure Strategy (NCRIS). The groundwater data are available through the NCRIS Groundwater Database: http://groundwater.anu.edu.au.
Funding Open Access funding enabled and organized by Projekt DEAL. This project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie grant agreement no. 835852. PS was partly supported by the Swedish Research Council (grants VR 2017-04610 and 2016-06079).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/ by/4.0/.