SVR and ARIMA models as machine learning solutions for solving the latency problem in real-time clock corrections

Real-time precise point positioning (PPP) has become a prevalent technique in global navigation satellite systems (GNSS). However, GNSS real-time users must receive space state representation (SSR) products to correct for satellite clock, orbit, and phase biases. The International GNSS Service (IGS) provides GNSS users with real-time services (RTSs) through different real-time correction SSR products. These products arrive at the GNSS users with some latency, which affects the quality of real-time PPP positioning. The autoregressive integrated moving average (ARIMA) and support vector regression (SVR) models are used in this research to predict those corrections to eliminate the latency effect. ARIMA model reduces the standard deviation by 28% and 13% for GPS and GLONASS constellations, respectively, compared to the real-time solution, which includes the latency effect, the research simulated the latency effect and named it a forced-latency solution, and the SVR model reduces the standard deviation by 28% and 23% for GPS and GLONASS constellations, respectively. The results for the permanent GNSS stations used in this study across different years 2013, 2014, 2015, 2019, and 2021 show a mean reduction in the 3D positioning standard deviation by 13% compared with the forced-latency solution for the ARIMA solution and 9% for the SVR solution. The potential of both models to overcome the latency effect is apparent based on the findings.


Introduction
The GNSS is widely used to determine positioning through various methods, such as single point positioning, relative positioning, and PPP. The main principles of those methods can be found in Zumberge et al. 1997;Hofmann-Wellenhof et al. 2008;Enge and Misra 2011;Sanz et al. 2013;or Leick et al. 2015. PPP allows GNSS users to locate themselves globally since it does not rely on a local network of GNSS receivers or a base station. This is also free from the local effects resulting from the movement of reference stations.
Nevertheless, users must precisely measure pseudoranges and carrier phases from GNSS satellites to perform post-processing and real-time PPP. GNSS biases must be accurately estimated, such as multipath, ionospheric delay, tropospheric delays, earth tides, relativistic effects, and antenna variation. A detailed study of these effects can be found in Zumberge et al. (1997), Ge et al. (2008), Cai and Gao (2013), Li et al. (2015), and Teunissen and Khodabandeh (2015). Finally, code and phase biases and satellite orbital and clock errors can be mitigated by using adequate biases and orbital and clock correction products (Teunissen and Khodabandeh 2015; Henkel et al. 2018;Ye et al. 2018). The IGS and other Analysis Centers (ACs) provide precise respective products (Dow et al. 2009).
The PPP coordinate accuracy could reach a centimeter or sub-decimeter depending on the operational mode of postprocessing or real-time (RT-PPP).
The International GNSS Services (IGS) initiated a pilot project for real-time activities at the beginning of 2001 to improve RT-PPP accuracy (Dow et al. 2009). The goal was to provide the RTS for GNSS users with accurate real-time products, mainly orbital and clock corrections, to reduce the satellite ephemeris and the onboard satellite clock errors.
Consequently, some ACs using the GNSS permanent reference network are involved in the real-time tracking, computation, and broadcasting of real-time products (Grinter and Roberts 2013). Various ACs produce and disseminate their real-time products. IGS combines those products to produce official real-time products such as IGS01, IGS02, and IGS03. IGS01 is a single-epoch combination solution, IGS02 is a Kalman filter combination solution, and IGS03 is an experimental Kalman filter combination for GPS and GLONASS solutions.
The orbit and clock products are used by the user with different latencies, i.e., the sum of the time required to generate the products in the ACs, to combine these products for IGS RTS Service (Johnston et al. 2017), broadcast them over the Internet using Networked Transport of RTCM via Internet Protocol (NTRIP) as RTCM state-space representation (SSR) correction streams, and the time required for the local computer where the RT-PPP solution is implemented. As the latency value increases, the corrections become outdated, leading to applying older corrections to realtime observations. Martín et al. (2015b) found a 3D error of approximately 0.15 m and 0.30 m if latencies of 30 and 40 s are employed, respectively. The latency problem is also addressed by Hadas and Bosy (2014). This study showed a negative relationship between latency values and the corrections accuracies. The latency values for the individual products produced by ACs reach 10-12 s. However, this value increases to 30-40 s for the combined IGS products since IGS needs extra time to receive all AC solutions and combine them.
Different prediction models are discussed in the literature. Examples are quadratic polynomial or linear models with sinusoid terms found in Huang et al. (2014), El-Mowafy et al. (2017, El-Mowafy (2019a, b), Yang et al. (2019) to predict orbital and clock corrections during data communication failure or discontinuity periods, and genetic algorithms with autoregressive moving average models to predict 15 min of corrections during data loss of the IGS02 stream in Kim and Kim (2017). A limited number of studies have investigated prediction models for navigation satellite systems other than GPS. In Qafisheh et al. (2020), the GLO-NASS satellite system is included in the prediction. Still, both Qafisheh et al. (2020) and Kim and Kim (2017) studies are missing the latency effects on coordinate accuracy.
Clock correction values are the most challenging to model compared to orbital corrections because they are highly correlated to the onboard GNSS clock oscillator behavior, which suffers from frequency instability, jumps, offsets, outliers, and frequency drift. Additional research about the characteristics model of the clocks and their behaviors can be found in Daly (1990), Senior et al. (2008), Hauschild et al. (2013), andMaciuk (2019). Different real-time schemes are utilized to adapt clock offsets, jumps, noise, and speed using the Kalman filter (Huang and Zhang 2012).
This research aims to overcome the latency in the IGS03 real-time clock product using the ARIMA and the SVR models over a short period. The prediction model and the methodology used in this research can be extended to cover other combined real-time products made available by IGS or other ACs. Various open-source software can handle SSR correction streams such as RTKLIB (Takasu 2009, http:// www. rtklib. com/ rtklib_ tutor ial. htm), GNSSSurfer (SAPOS®-Berlin 2020, http:// 217.9. 43. 196/ Downl oad/), or BNC from the Kartographie und Geodäsie Agency (BKG) (Weber and Mervart 2007). In this research, BNC software is employed. It can be used to accomplish different GNSS tasks, including satellite coordinate comparison, broadcast, combine and upload corrections, PPP post-processing, RT-PPP, and decode the SSR messages to obtain the required correction values to eliminate orbit and clock errors (Kouba and Héroux 2001;Weber and Mervart 2007;Weber and Mervart 2007). The PPP technique uses ionospheric-free combinations to mitigate the ionospheric effects by combining pseudoranges codes and carrier observations of different frequencies. The tropospheric error, antenna phase center, and other biases are well modeled in the BNC software.

Experimental data
First, the RT-PPP measurements from December 13-16, 2019 were stored. We used the IGS03 correction product, a 5-s sampling interval, and the Brest (France) permanent station observations. Before experimenting, the BNC software stored observation, navigation, and correction files in a local computer synchronized to the Internet time.
Latency values were also stored. The mean value of the latency was 31.68 s. However, the range during the implementation varied between 31.34 and 32.21 s. The stored correction file contains data for 52 observed satellites with approximately 26 thousand correction values. Several satellites are affected by data unavailability during some periods.
The research extended to cover more stations over several years to evaluate the prediction models properly. Consequently, one day of RT-PPP data was used, and nine stations distributed globally were selected. We investigated the following years: 2013, 2014, 2015, 2019, and 2021. Table 1 shows the percentages of the null value for various satellite blocks for different years, and Fig. 1 shows the distribution of the selected IGS stations.

Methodology
One of the most important aspects to take into account in any machine learning or data mining project is the proper choice of the method or mathematical model to use. In this case, it is about making future predictions about the studied signal itself, so the choice of the method will depend on the characteristics of the signal. Based on the results obtained in the signal analysis, there will be a greater foundation in the choice of the prediction method.

Signal analysis
Statistical analysis for stationary inspection of the signal is needed to select the proper machine learning algorithms. Our goal is to predict future values in a temporal series (time series forecasting). If the signal is stationary, that is, having no trends, seasonality, or cyclic patterns, most of the machine learning methods, including random forest, neural networks, or XGB, fail to fit the data. In those cases, the predictions have the same values as the last observation (Bownlee 2017).
Augmented Dickey-Fuller, Phillips-Perron, and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) statistical tests can examine a time-series stationarity property (Kwiatkowski et al. 1992). It is possible to improve the stationarity decision by combining different statistical tests (Schlitzer 1995). However, the KPSS test is applied to the stored GNSS clock correction data to simplify the code and reduce processing time. The stationarity was tested for various time windows of clock data split from the IGS03 clock correction files obtained between December 13-and 16, 2019. Table 2 presents the stationarity results for the 8-min data windows, including the combined results for different satellite blocks and the percentage of stationary and not-stationarity windows. In this research, we conclude that the clock corrections are mostly stationary signals, but these results could  vary with respect to clock oscillators or signal length, considerably restricting the machine learning model for forecasting. Therefore, SVR and ARIMA models are the best candidates (Hyndman and Athanasopoulos 2018, Bownlee 2017). Furthermore, they can also be used to predict nonstationary signals.

Prediction models
Once the prediction models have been chosen based on the signal analysis, we are going to describe them briefly, adjusting the explanation to what is really necessary to understand the implementation and results properly.

Support vector regression model
The support vector machine (SVM) and derived SVR methods have been widely used in machine learning due to their simplicity (Clarkson et al. 2012). SVM development allows applying the traditional support vector classifier in a higherdimensional space (Drucker et al. 1997). The adaptation of higher-dimensional space can be made by harnessing the mapping function (kernel). Radial-base function, polynomial, linear, and other kernels can transform data into higher-dimensional space. Given a data set containing {(x i , y i , i = 1,2, …, m)} where x i and y i ∈ R, x i are the features and y i the label, both normalized to (+ 1, − 1), the classifier categorizes data according to its label. For data categorization, the hyperplane must be established to separate the data. Due to the nature of the data, many hyperplanes can classify the same dataset. The chosen hyperplane must maximize the margin, defined using the nearest points near the margin, called support vectors. The margin can be classified as a hard or soft margin. The so-called soft SVM is generally used because it allows isolated values and misclassified samples, making it more realistic. Smola and Schölkopf (2004) defined the SVM formula as:

Subjected to
where w represents the margin width, b the bias, ξ denotes the slack variable, which is associated with the defined soft SVM, allowing some value to fall in the margin, and C is the trade-off margin width.

ARIMA model
The ARIMA model has been extensively used to forecast time series (Sneeuw et al. 2012;Ye et al. 2012;Moreira et al. 2013;Xin et al. 2018; Van Le and Nishio 2019). The ARIMA is a combination of autoregressive (AR), moving average (MA), and the differencing term (I). The ARIMA model depends on the well-known Box-Jenkins methodology (Box et al. 2011;Hyndman and Athanasopoulos 2018). The I-term differentiates the time series to ensure the stationarity, and the AR indicates that it is a regression of the variable against itself. For example, an autoregressive model of order p can be defined as: where a is a constant, ε t is the noise, and b p are the parameters to be identified. This expression is similar to a multiple regression but considers the lagged values of y t as a predictor.
The MA suggests a moving average model that uses past forecast errors in a regression model. For example, a model average model or order q can be expressed by: where c is a constant, ε t is the noise, and r p are the parameters to be determined. In this case, the prediction y t can be computed based on a weighted moving average of the past q forecast errors.
Finally, the integration of AR, I, and MA generates the ARIMA model described as: where m is a constant, ε t is the noise, and y ′ t signifies that the series is differenced, and it may have been differenced by more than one.
The ARIMA model is usually denoted as ARIMA (p, d, q) where the p, d, and q represent the number of lags applied to the MA model, the differencing degree, and the order of the MA model, respectively (Piccolo 1990;Box et al. 2011;Hyndman and Athanasopoulos 2018).

Implementation
The stored RT-PPP correction data contain orbital and clock corrections. The prediction is centered on the clock corrections since orbital corrections can be well projected based on the radial, along-track, and cross-track components and their velocities broadcast in the navigation messages (Hadas and Bosy 2015). Clock corrections also need to be estimated at a high rate (Ge et al. 2012) due to the frequency instability of onboard GNSS clocks, mainly caused by temperature and gravitational variations.
A balance between the training set data length and computational time is required to control the prediction accuracy and process time if real-time predictions are required. Rolling sliding window (RSW), expanding window, or fixed splitting are various testing and training methods to accomplish this. The clock corrections experience a period of high oscillations, with periodic jumps due to the clocks' frequency drift, instability, and changes in gravitational forces. Consequently, the forward validation and the RSW technique is more suitable testing and training method. It guarantees computational time reduction and mitigates the effect of outdated data, which clearly emerges by using an expanding window. Using a rolling window instead of an expanding window allows GNSS users to keep the computational load at the minimum level and improve the prediction accuracy.
This research aims to predict clock corrections in approximately 30 s to overcome the latency effect, so three essential aspects must be addressed. First, we must fix the impact of choosing different RSWs with various lengths on both SVR and ARIMA model predictions in the prediction accuracy. Second, the model parameterization in the prediction model definition must be updated by searching and adjusting the optimal hyperparameters in the model definition based on the stored data. The question here is to search for the best rate of updating hyperparameters and then search for the best values for these hyperparameters. This led to evaluating the prediction accuracy with a different combination of C and gamma hyperparameters in the SVR model and (p,q,d) in the ARIMA model, considering a large hyperparameter search space in both models to ensure more stable predictions.
Regarding SVR, a large C value means that the model did not allow errors to violate the margin, resulting in margin shrinking; the smaller it is, the more isolated values are allowed in the margin area (soft SVM). The gamma parameter establishes the variance of the Gaussian function used for class separation; it must be tuned to control interpolation, extrapolation, and nonlinearly separable classes (Guyon et al. 1993). For the ARIMA model, p controls the number of lags required to implement the linear regression, d controls the degree of differencing mandatory to ensure signal stationarity, and q is devoted to controlling signal error propagation. Finally, the third essential aspect is to compare, in a post-process experiment, the clock values stored in realtime with the SVR and ARIMA predictions, computing the standard deviation and the range for residuals and the effect on the final PPP computed coordinates. This experiment was performed by reprocessing the stored RT-PPP files using BNC software in the static mode by eliminating the latency effect since the clock and orbital corrections production and transmission are eliminated. So, the latency effect was eliminated to obtain the so-called latency-free solution.
The stored clock correction file was modified to hold new clock correction values from SVR and ARIMA predictions. BNC is rerun again with the files containing predictions to compare the coordinates between the stored files in postprocessing (latency-free solution) and SVR and ARIMA predictions, solutions with the latency eliminated by prediction. A simulation of the latency effect is computed to complete the comparisons by shifting the stored correction file ahead by 30 s, which more or less represents the stored mean latency, the so-called forced-latency solution, and the normal RT-PPP situation. Figures 2 and 3 show an overview of the two central parts of the methodology applied in this research. Figure 2 includes the methodology part related to real-time implementation. The BNC software used the broadcasted corrections and navigation information with the station observation stream to produce the real-time coordinates solution. The clock corrections predictions were obtained from both models according to the rolling sliding values. The clock corrections residuals were calculated from different solutions with respect to real-time corrections.
The post-processing was described in Fig. 3. The stored navigation and station observation information with the stored clock corrections predictions and clock corrections stored in real-time and the simulated clock corrections were used to produce the station coordinates with respect to the different solutions or clock corrections. The sampling value for real-time and forced-latency clock corrections files was 10 s. SVR and ARIMA models predict the clock corrections with the same sampling interval.

Results and discussion
The initial portion of the results, following the implementation section, is related to the RSW length effect. Different sliding windows are probed with sizes of 1, 2, 4, 8, and 16 min to fit the SVR and ARIMA models. One hour of clock correction data was implemented for this experiment on December 13, 2019. The standard deviation for clock correction residuals obtained by subtracting the SVR and ARIMA predictions was compared with the latency-free solution. Based on the results, an 8-min-long RSW for the ARIMA model and a 1-min RSW for the SVR model long can be selected as a good compromise between prediction accuracy and processing time. Tables 3 and 4 include the standard deviation comparison for both models. One satellite represents each satellite block, and the required time for processing is also included. Those tables also contain, for comparison purposes, the standard deviation of the residual between the forced-latency solution, as a reproduction of the real-time process, and the latencyfree solution. It should be mentioned that the results are the same if we choose different one-hour intervals of the stored data. It can be seen from the tables that there is a negative relation between the RSW length and the error values obtained by the ARIMA model. This shows that the ARIMA model is highly dependent on the length of previous observations to adjust the hyperparameters properly and construct the prediction model. STDP in Tables 3 and 4 denote the standard deviation of predictions in comparison with the free-latency solution, and STDL denotes the forced-latency standard deviation in comparison with the free-latency solutions. For both tables, the standard deviation unit is in meters, and the processing time corresponds to a one-hour of predictions.
The SVR model can be implemented through different kernels such as linear, polynomial, and radial base functions. The reason behind picking the radial base function as a kernel is that it is more suitable for short-term predictions like latency. In this research, the radial base function was used, and as the RSW length increased, the influence of outdated observations became more notable. It is worth mentioning that the different types of clocks used in different satellite blocks that led to negative and positive trends are not valid for all RSWs.
According to the implementation section, the second vital position to consider is related to selecting the best updating rate for hyperparameters. Updating rates of 0.25, 0.50, 1, 2, 3, 4, 5, and 6 h were examined from December 13 to December 16, 2019. Based on the experiments, a onehour updating rate for SVR is selected. The list of hyperparameters to resolve in the SVR method should be quite extensive in order to ensure that correct values are chosen. Consequently, a one-hour rate is a good balance between   computation time and accuracy. However, the ARIMA model hyperparameters list can be considered fairly small: 0 to 3 for p and d parameters and 0 to 1 in the q parameter. Thus, the hyperparameter search is a high-speed operation, and their fixation can be executed for every RSW of 8 min without loss of computational time. It has also been realized that the rolling sliding window length has a greater impact on the prediction accuracy than the parameter updating rate. The same data and products for the ARIMA model examination were employed to analyze the SVR model. The third and final aspect to consider is related to the comparisons between prediction models and the forced-latency solution regarding the latency-free clock correction values. Figures 4 and 5 demonstrate the range calculated for the clock correction residuals. The residuals were calculated by subtracting the clock correction values of forced-latency, ARIMA prediction, and SVR prediction clock files with respect to the latency-free file. Figures 6 and 7 illustrate the standard deviation for the same residuals. It is apparent from those figures that the range reduction is negligible due to clock correction jumps. However, a remarkable reduction in standard deviation, especially for GPS satellites, has been discovered. For Figs. 4, 5, 6, and 7, the differences are computed with respect to the latency-free clock correction values. From the previous experiments, it can be concluded that the ARIMA model has lower standard deviations than the SVR model, except for the GLONASS M satellite block, where SVR is superior. However, the SVR model shows a faster execution time than the ARIMA model, and Tables 3  and 4 refer to the processing time needed for one-hour predictions.    Coordinates free from latency (latency-free solution), simulated real-time (forced-latency solution), SVR predictions, and ARIMA predictions were acquired using the postprocess BNC software module for the permanent stations of Fig. 1, and one day of real-time data for the years 2013, 2014, 2015, 2019, and 2021. Table 5 shows the average of the 3D residuals statistical summary in terms of the mean, standard deviation, and range of the forced-latency, SVR, and ARIMA solutions compared with the latency-free solution. Figure 8 is a scatter plot that expresses the X and Y planar coordinates of all four solutions and includes the fixed coordinates of station IGS BREST during one day period. Both of the prediction models show remarkable improvement in the accuracy of coordinates in comparison with the forced-latency solution, which simulates the real-time situation. Figure 8 confirms that the SVR and ARIMA coordinate solutions are more precise and much denser around the true coordinate than the forced-latency solution, especially the ARIMA solution. For visualization purposes, X and Y coordinates were shifted with (423,100 and 332,700) meters.     Tables 6 and 7 show the statistical summary throughout the research days of the years into consideration of standard deviation and range for the residuals obtained by subtracting the clock corrections values from different solutions (forced-latency, ARIMA, and SVR predictions) with respect to latency-free values. Tables 6 and 7 does not include the mean value analysis as the mean values for the residuals are near zero for all solutions.
Finally, the data collected with the highest clock correction jumps were examined. Each clock correction value was subtracted from the previous values so that the maximum clock correction jumps could be determined; consequently, the results were grouped for each satellite block by calculating the mean of the maximum values. Figure 9 shows the clock corrections values for 24 h between December 13 and 14, 2019. The figure shows the value for corrections concerning the GPS satellite G05. It can be seen from the figure the clock correction values dropped around 0.40 m within seconds. Figure 10 represents 30 min of clock corrections, the black line represents the free-latency solution, and the gray dash line denotes the forced-latency solution, while the thick gray and black dash lines show the prediction models. Table 8 presents the mean value of the maximum consecutive difference between successive clock correction values. The results proved that the jumps could reach 11.68 m for satellites in GPS block GPS-IIF meters in 2013. Table 8 shows that the clock corrections consecutive jumps have fewer values in recent years, especially for GPS satellites.
From the GNSS real-time user point of view, the implementation should be as follows. If the SVR method is selected, the user should store observations for the first 2 min to generate the first valid SVR model. An initial search for the hyperparameters and fixed model prediction is carried out. Next, the fitted model can predict the next clock correction according to the stored latency value, keeping the size of the rolling sliding windows constant at 1 min. Thus, the new observations are stored, and the old observations are automatically deleted in the SRW window. In addition, the GNSS users need to update the C and gamma parameters for the SVR model each hour to ensure accuracy. Therefore, the user is required to store one-hour latency values continuously. For the first hour of observations, the hyperparameters can be updated every 15-20 min to ensure accuracy in the prediction.
In case the ARIMA method is selected, the user should store the observation for the first 2 min. The first search for hyperparameters should be executed for the first 2 min of observations. The respective hyperparameters are utilized for the first 8 min of observation. Afterward, an SRW window of 8 min can be employed, including the searchand-fix of hyperparameters. In both models, the GNSS user should use the clock corrections as they are received in the first 2 min with a latency effect; then, the GNSS user can use a prediction model and results.

Conclusions
The SVR and ARIMA prediction models are applied to IGS03 clock corrections. Both models are used to estimate the clock corrections to overcome the latency effect. In this research, the latency effects on orbital corrections were not studied, as they vary at a low rate compared to the clock corrections and are not severely affected by latency. Three days of real-time data were utilized initially in this study in order to obtain the correct time dimension for the rolling sliding window and the correct rate to update the ARIMA and SVR hyperparameters. Similarly, one day of real-time data for the years 2013, 2014, 2015, 2019, and 2021 is used to confirm the validity of the proposed methods. The results prove that both models can be used to overcome the latency.
According to Tables 6 and 7, the ARIMA model reduces the standard deviation by 28% and 13% for GPS and GLO-NASS constellations, respectively, compared to the forcedlatency solution. The SVR model reduces the standard deviation by 28% and 23% for GPS and GLONASS constellations, respectively. Both models showed robust behavior during clock correction jumps because the models rely on 1 min of clock correction observations for the SVR model and 8 min for the ARIMA model. The uses for the RSWs mitigate the effect of jumps and improve the availability of the RT-PPP solution derived from the BNC software and maintain the coordinate convergence during jump periods.
Finally, based on the results of Table 5, the average of the 3D standard deviations, both models are successful in eliminating the latency effect, especially the ARIMA model. However, the SVR model displays superiority in processing time compared to ARIMA. It is approximately eight to nine times faster. Thus, it could be an excellent solution to overcome the latency due to simplicity and computational speed, which is important since GNSS receivers need to predict clock corrections for approximately 10-20 satellites that are above the horizon during the observation when using GPS and GLONASS signals. The proposed prediction models could also forecast clock corrections during periods of data loss or discontinuity. In this research, the range analysis is included in order to investigate outlier predictions. Both prediction models experienced this phenomenon to prevent such behavior in a limited number of predictions. They affect both clock corrections and 3D calculated residuals. To eliminate possible outliers, a threshold detector must be implemented for both predicting models. Additionally, the research methodology could improve the accuracy of dynamic GNSS receivers such as self-driving vehicles and mobile users. Those users rely on mobile data connections where the internet connections are more affected by the latency.