A Wavelet-Based Outlier Detection and Noise Component Analysis for GNSS Position Time Series

Various signals of crustal deformation and mass loading deformation are contained in a GNSS position time series. However, a GNSS position time series is also polluted by outliers and various colored noise, which must be reasonably modelled before estimating deformation signals. Since temporal signals of the GNSS position time series are non-linear and complicated, we propose a wavelet-based approach for outlier detection, which ﬁrst retrieves the temporal signals from the GNSS position time series by using wavelet analysis, and then detect outliers in the residual position time series by using the interquartile range. After the detected outliers are eliminated from the residual time series, the noise components, including white noise and ﬂicker noise, are estimated by using MINQUE approach. Our proposed approach is used to process the real GNSS position time series of the Crustal Movement Observation Network of China (CMONOC) over the period spanning 1999–2018. The results demonstrate that our approach can detect the outliers more efﬁciently than the traditional approach, which retrieves the temporal signals by using a functional model with trend and periodic variations. As a result, the noise components estimated with our proposed approach are smaller than those with the traditional approach for the GNSS position time series of all CMONOC stations.


Introduction
The position time series of various GNSS station networks are widely used to study the geophysical phenomena such as plate tectonics (Tobita 2016), post-glacial rebound (Peltier et al. 2015) and sea level change (Wöppelmann et al. 2007). Due to multipath effects, station-related error (such as electromagnetic interference), orbital anomaly and other unknown reasons, outliers inevitably exist in the GNSS position time series, which will lead to bias estimates in both functional and stochastic models (Koch 1999; K. Ji · Y. Shen ( ) College of Surveying and Geo-Informatics, Tongji University, Shanghai, China e-mail: yzshen@tongji.edu.cn Khodabandeh et al. 2012). There are several approaches for detecting outliers in the GNSS position time series, such as three sigma method (3 ) (Mao et al. 1999), Bayesian method (Zhang and Gui 2013), as well as Detection Identification Adaptation (DIA) procedure (Amiri-Simkooei et al. 2015). Besides these methods, the window-opening test algorithm based on the Interquartile Range (IQR) statistic is another commonly used approach for outlier detection in the GNSS position time series (Nikolaidis 2002;Li and Shen 2018). This algorithm is fast and robust since the median and IQR values of a time series are less affected by outliers. Due to its superior performance, the outlier detection approach based on IQR criterion has been widely applied in the open source software or packages for GNSS position time series analysis, such as iGPS (Tian 2011), Hector (Bos et al. 2013) and TSAnalyzer (Wu et al. 2017).
Apart from outliers, the GNSS position time series are also polluted by temporally correlated noise, which is a combination of white noise plus flicker noise (Mao et al. 1999). The maximum likelihood estimation (MLE) is widely used for estimating the noise components of a GNSS time series. Besides, the existing methods of Variance Component Estimation (VCE), such as Helmert (1907), Minimum Norm Quadratic Unbiased Estimation (MINQUE) (Rao 1971), Best Invariant Quadratic Unbiased Estimation (BIQUE) (Koch 1999), as well as LS_VCE (Teunissen and Amiri-Simkooei 2008), are identical under the normal distribution (Teunissen and Amiri-Simkooei 2008). Therefore we use MINQUE method to estimate noise components in this paper.
The traditional least squares (LS) outlier detection based on IQR criterion (LS_IQR) and noise component estimation based on MINQUE method (LS_MINQUE) are all based on the harmonic functional model (Nikolaidis 2002) in which a position time series is described as a combination of linear trend, quasi-annual and semi-annual signals with constant amplitude and phase. However, the amplitudes and phases of seasonal variation signals in GNSS position time series also vary slightly over time due to the variation of surface-mass loading (Blewitt and Lavallée 2002), atmospheric and hydrological loadings (Bogusz and Figurski 2014). Consequently, a harmonic model isn't sufficient to reflect the nonlinear variation signals of GNSS position time series, especially the time-varying seasonal variation due to the irrationality of the model itself. Therefore, when a harmonic functional model is used to describe the GNSS position time series, the LS residuals still contains partial signal, which will affect the performance of outlier detection and lead to imprecise estimation of noise components. For this reason, we propose a wavelet-based algorithm for outlier detection and noise component analysis, which extracts the time variable signals by wavelet analysis and thereby named as WA_IQR and WA_MINQUE for the correspondent outlier detection and noise component algorithm. The remainder of the paper is organized as follows. Section 2 presents the main methodology, including dyadic wavelet analysis, outlier detection based on IQR criterion and noise component estimation using MINQUE method. Section 3 presents the results of real data analysis of CMONOC over the period from 1999 to 2018, and conclusions are summarized in Sect. 4.

Dyadic Wavelet Analysis
When '(t) is denoted as a basic wavelet function, a set of wavelet functions can be derived by means of dilation a and translation b of '(t) as (Daubechies 1992) Taking a D 2 j , b D 2 j k, where j, k are integers, we can obtain the dyadic wavelet functions as where w(j, k) is the k-th value of j-th wavelet coefficient and S i is the i-th sampling interval. Rewriting Eq.
(3) with vector and matrix form as . Stacking the wavelet coefficients from small to large scale and subjoining the scale coefficient v J 1 , where J denotes the number of layers to be decomposed. For the GNSS position time series, the reconstructed seventh and eighth components of basic wavelet function represent time-varying signals with periods of about 182 and 365 days, which denote the semi-annual and annual signals (Klos et al. 2018), respectively. For this reason, we take J as 8. Then we obtain the wavelet transform of x in matrix form as V J 1 is the scale transform matrix, which is orthogonal to W j and the wavelet transform matrix W is a standard orthogonal matrix. The original time series x can be reconstructed by the wavelet coefficients and transform matrix as follows: where d j D W T j w j represents the j-th detail component and a J 1 D V T J 1 v J 1 represents the appropriate component of the time series.

Outlier Detection in Residual with IQR
The original time series x can be decomposed into components of different frequencies which represent either signal or noise after multi-resolution analysis (Mallat 1988). The signal and noise can be separated by the correlation coefficient method (Zhang et al. 2018), which calculates the correlation coefficient between the original time series and the reconstructed component of each layer, and the layer where the correlation coefficient firstly appears local minimum is considered to be the boundary layer. The correlation coefficient between x and i-th reconstructed component d i can be calculated as where x t and d t, i represent t-th element of x and d i , x and d i represent the average value of x and d i , respectively.
After the multi-resolution analysis of the original time series, we obtain the residual vector v, in which outliers are mostly reflected. Sorting residual in ascending order, and then dividing it into several equal parts with the window length L, which was commonly taken as 182 (Nikolaidis 2002;Wu et al. 2017). Performing a window check on each part of the data set using the Z-ratio statistic (Nikolaidis 2002).
where v i represents the i-th residual, med( ) and IQR( ) denote the operators for computing the median and interquartile range of a series, respectively. According to IQR criterion (Nikolaidis 2002;Bos et al. 2013), when Z > 3, the i-th value of the original time series is detected as an outlier.

Noise Component Estimation Using MINQUE Approach
After the outliers are detected and then eliminated, the noise amplitudes of residual time series, including white noise and flicker noise are estimated by MINQUE method. The fundamental equation of variance component estimation (VCE) is (Li et al. 2010) where v D Ry; R D I A A T˙ 1 y A 1 A T˙ 1 y , A is the coefficient matrix of the observational equation. The covariance matrix † y is a combination of two cofactor matrices for white noise and flicker noise aṡ where 2 w ; 2 f are the white and flicker noise components to be estimated, Q f is the cofactor matrix of flicker noise. For the calculation of Q f , one can refer to Mao et al. (1999).
According to the MINQUE estimation by Rao (1971), the equation to compute the white and flicker noise components is given as follows where, ™ D 2 w ; 2 f Á T . N is a 2 2 matrix and q is a 2 vector, the elements are given by n 11 D t r .WW/ ; n 12 D n 21 D t r WWQ f ; where W D˙ 1 y R, tr( ) is the operator for computing the trace of a matrix. Since R contains unknown noise components, Eq. (11) needs to be iteratively solved with given initial value of noise components.

Real GNSS Position Time Series Analysis
The real position time series of 27 permanent GNSS stations of CMONOC are processed with our proposed approach and their locations are shown in Fig. 1. All the GNSS position  Figure 2a presents position time series of Up, North, and East coordinates for BJFS station and it shows that position time series of three coordinates contain some outliers. Wavelet analysis requires that involved time series should be stable and equally spaced (Walnut 2013), however missing data inevitably occur in the position time series (Shen et al. 2014). We adopt the iterative interpolation scheme to handle data missing problem. Besides, some abrupt changes called discontinuities or offsets occur in the GNSS position time series due to various reasons such as brakes in station operation and change of antennas. Vitti (2012) provided a tool (sigseg) for the detection of position discontinuities in geodetic time series based on Blake-Zisserman variational model. This tool is used to detect and repair the discontinuities in position time series. The new position time series after complementing the missing values and correcting the discontinuities are presented in Fig. 2b.

Signal and Noise Separation
The detrend BJFS time series in Fig. 2b is then decomposed with coif-5 wavelet, and the reconstructed components of each layer are presented in Fig. 3 and correlation coefficients between the original time series and the reconstructed component of each layer are presented in Table 1. Signals extracted by WA and LS estimation are presented in Fig. 4. Obviously, WA can well capture the nonlinear variation of position time series, while LS estimation based on harmonic model characterizes the nonlinear variation as a periodic

Outlier Detection
The IQR criterion is used to detect outliers in the residuals of three coordinates by WA and LS estimation, and results are presented in Fig. 5. Obviously, WA_IQR can detect much more outliers than LS_IQR. In Fig. 5, LS_IQR fails to detect a lot of outliers, especially in the epochs of the non-stationary part, which are caused by the poor fitting to the harmonic model. Figure 5 also presents the detected outliers by the 3 method, it seems that the 3 method can only detect a few outliers. The new time series after eliminating outliers from the original position time series are presented in Fig. 6, from which we can see that more outliers remain in the LS_IQR and the 3 detected time series (i.e. between 1999 and 2003) than WA_IQR detected time series. However, none of them can recognize some outliers, of which the magnitude is quite small (i.e. outliers near epoch of 2015). Figure 7 presents the proportion of detected outliers in position time series of 27 stations for three coordinates. For the BJFS station, the proportion of detected outliers for the whole data for three coordinates are 0.77%, 0.19% and 0.84% by 3 , 1.78%, 1.47% and 2.11% by LS_IQR, and 4.50%, 5.55% and 3.65% by WA_IQR, respectively. From the remaining stations in Fig. 7, we can clearly see that WA_IQR can detect more outliers than LS_IQR and 3 for all stations, the mean detected proportion of 27 stations are 0.16%, 0.50% and 0.39% by 3 , 1.62%, 1.92% and 1.62% by LS_IQR, and 4.61%, 4.65% and 2.59% by WA_IQR, respectively.

Conclusions and Remarks
The traditional LS_IQR for outlier detection and LS_MINQUE for noise component estimation are all based on the harmonic functional model, which cannot well describe the timevariable seasonal signals of GNSS position time series. Consequently, the residuals derived by traditional LS estimation still contain partial signal, which will definitely affect the performance of outlier detection and lead to an imprecise estimate of the noise component. This paper develops a wavelet-based algorithm of outlier detection and noise component estimation, namely WA_IQR and WA_MINQUE. The basic idea of our new algorithm is to separate the signal and noise of the GNSS position time series by wavelet analysis firstly, then detect outliers in residual time series using IQR statistic and then estimate noise components of the residual time series after outliers eliminated. The new algorithm is verified by the real data of CMONOC and the results show that WA_IQR is more effective than LS_IQR to detect outliers and WA_MINQUE can obtain the more reasonable noise component estimates than LS_MINQUE. The noise components estimated by WA_MINQUE approach are all smaller than those by the traditional LS_MINQUE approach for all 27 CMONOC stations.