Introduction

With the rapid development of global navigation satellite systems (GNSS) over the past decades, real-time precise point positioning (PPP) has extended to multiple fields, such as real-time atmospheric sounding (Li et al. 2014; Liu et al. 2018), tsunami prediction (Samper and Merino. 2013; Saito et al. 2016), earthquake monitoring and early warning (Allen and Ziv. 2011; Li et al. 2013). Real-time precise orbit and clock products are indispensable prerequisites to support the real-time precise positioning service. Aiming at this, the International GNSS Service (IGS) initiated the Real-Time Working Group (RTWG) in 2001 and formally launched the Real-time Service (RTS) in 2013 (http://www.igs.org/rts). Since precise orbit can be predicted based on observations in the recent past with high precision, most of the efforts have been carried out in the aspect of real-time clock estimation (Ge et al. 2012; Dai et al. 2019). The target accuracy of real-time clock products is 0.3 ns (Weber et al. 2007; Hadas and Bosy. 2015). However, the predicted clocks of IGS ultra-rapid products can only reach a precision of 1.5 ns for GPS (http://www.igs.org/products). Therefore, quantities of globally distributed real-time stations are usually used in the real-time precise clock estimation (Dixon 2006; Hauschild and Montenbruck 2009), and the precision of multi-GNSS real-time clock products is generally better than 0.2 ns compared to IGS final products (Liu et al. 2019; Cao et al. 2022).

Three primary methods are commonly used in real-time clock estimation, including the undifferenced (UD) method, epoch-differenced (ED) method (Mervart et al. 2008; Zhang et al. 2011), and mix-differenced (MD) method (Zhang et al. 2016; Chen et al. 2018). Both ED and MD methods use epoch-differenced carrier phase observations to remove large quantities of ambiguity parameters and speed up the computation. Although ED and MD methods are efficient to realize and can reach comparable accuracy with that of UD method, high computation performance is achieved at the loss of some useful information. The float ambiguity parameters, for example, can be further fixed to get more reliable clock estimation and improve the performance of real-time PPP (Li et al. 2019b; Dai et al. 2019). As to UD method, satellite clocks are estimated based on undifferenced phase and range observations, and all the useful information is kept. However, a large number of ambiguity parameters are estimated together with clock parameters, which leads to the heavy computation burden. In addition, to meet the needs of time-critical applications, real-time satellite clock corrections need to be broadcast to the users at an update rate of 5 s or higher. Nowadays, abundant GNSS tracking stations and modern constellations contribute to better observation geometry which ensures a higher clock precision and stability (Zuo et al. 2021), whereas the huge number of observations burden the computation and hamper its real-time applications. Improving the computation efficiency of UD method becomes a challenging issue to be urgently solved.

Benefitting from the development of computing technologies, parallel data processing and high performance of software libraries provide a new brainstorming for the GNSS network solutions (Cui et al. 2015; Li et al. 2019a). Gao et al. (2017) introduced the OpenMP (Open Multi-Processing) to accelerate the estimation of real-time GPS satellite clocks. Gong et al. (2018) improved the efficiency of square root information filter (SRIF) by taking advantage of the block QR factorization and decreasing the algorithm complexity of GNSS data processing. In the study of Kuang (2019), the sequential extended Kalman filter (EKF) with OpenMP was introduced to improve the efficiency of real-time GNSS orbit and clock estimation. Liu et al. (2019) developed an efficient UD method for precise clock estimation (PCE), wherein two-thread processing is adopted. The full-parameter thread is time-consuming and estimates all the parameters, including the large number of ambiguity parameters. The high-rate thread estimates the clock corrections only with phase observations, in which the latest ambiguity estimates and corresponding variance information are introduced as phase corrections. Fu et al. (2019) implemented an adapted online quality control strategy based on sequential least square (LSQ) adjustment to meet the 5 s update rate of the IGS real-time clock product. In fact, with the formal commission of the global BeiDou navigation satellite system (BDS-3) in 2020, more satellites have become available for PCE and further burden the computation. Since the computation time should be shorter than the interval of the real-time clock products, it is a challenging work to finish the multi-GNSS clock estimation within a few seconds. Hence, it is essential to carry out more studies on improving the computation efficiency of multi-GNSS real-time clock estimation based on undifferenced observations.

In this contribution, an efficient strategy is proposed to achieve high processing efficiency for multi-GNSS real-time clock estimation, wherein the UD method based on sequential LSQ is adopted. In the proposed strategy, parallel data processing and high-performance matrix operations are introduced to accelerate the processing of multi-GNSS clock estimation. The former is based on OpenMP, since its API (Application Program Interface) gives programmers a simple and flexible interface for developing parallel applications on different platforms (http://www.openmp.org). The latter is achieved by the implementation of the Schur complement and the open-source library OpenBLAS (http://www.openblas.net).

After introducing the undifferenced real-time clock estimation observation model, the parallel processing strategy of PCE using sequential LSQ is presented and discussed. Subsequently, multiple experiments are conducted to evaluate the computation efficiency of the proposed strategy. The performance of precise clock products is then analyzed by clock comparison and real-time PPP validation. Finally, the summary and conclusions are drawn.

Methodology

This section introduces the undifferenced mathematical model for multi-GNSS clock estimation. Then, the principle of parameter elimination in sequential LSQ is presented. Finally, a parallel processing strategy based on OpenMP and a parallel processing procedure for multi-GNSS real-time clock estimation is summarized in detail.

Undifferenced model for multi-GNSS clock estimation

The original undifferenced GNSS observation equations for pseudorange \(P_{r,j}^{s}\) and carrier phase \(L_{r,j}^{s}\) can be expressed as follows,

$$P_{r,j}^{s} = \rho_{r}^{s} + c(t_{r} - t^{s} ) + c(d_{r,j} - d^{s} ) + I_{r,j}^{s} + T_{r}^{s} + \varepsilon_{P,r,j}^{s}$$
(1)
$$L_{r,j}^{s} = \rho_{r}^{s} + c(t_{r} - t^{s} ) + \lambda_{j} (b_{r,j} - b_{j}^{s} ) + \lambda_{j} N_{r,j}^{s} - I_{r,j}^{s} + T_{r}^{s} + \varepsilon_{L,r,j}^{s}$$
(2)

where \(s\), \(r\), and \(j\) refer to satellite, receiver, and frequency band, respectively; \(\rho_{r}^{s}\) is the geometric distance from the satellite to the receiver; \(c\) is the speed of light in vacuum; \(t^{s}\) and \(t_{r}\) are the clock offsets for the satellite and receiver in seconds; \({\text{d}}_{r,j}\) and \({\text{d}}_{{\text{j}}}^{{\text{s}}}\) are the code bias of the receiver and satellite in seconds; \(b_{{{\text{r}},{\text{j}}}}\) and \(b_{j}^{s}\) are the phase delay of the receiver and satellite in cycles; \(\lambda_{j}\) denotes the wavelength of the carrier for frequency \(j\) in meters; \(N_{{{\text{r}},{\text{j}}}}^{{\text{s}}}\) denotes the integer ambiguity in cycles; \(I_{{{\text{r}}.j}}^{{\text{s}}}\) is the ionospheric delay on the frequency band \(j\) in meters along the signal path; \(T_{r}^{s}\) is the slant tropospheric delay in meters; \(\varepsilon_{P,r,j}^{s}\) and \(\varepsilon_{L,r,j}^{s}\) denote the sum of observation noise, multi-path error, and other unmodeled errors for the pseudorange and carrier phase observations, respectively. The systematic error corrections such as the earth tides, ocean tide loading, relativistic delays, phase wind-up as well as phase center offsets (PCOs) and variations (PCVs) can be corrected according to the existing models (Schaer et al. 1999; Dach et al. 2006), and they are already applied in the equations.

In order to eliminate the first-order ionospheric delay, the ionosphere-free (IF) combination is widely used in real-time clock estimation. As for the tropospheric delay, its dry component is usually corrected with an a priori model (Li et al. 2019b). In this way, the IF observation model can be expressed as follows,

$$P_{{r,{\text{IF}}}}^{s} = \rho_{r}^{s} + c(t_{{r,{\text{IF}}}} - t_{{{\text{IF}}}}^{s} ) + c(d_{{r,{\text{IF}}}} - d_{{{\text{IF}}}}^{s} ) + m_{r}^{s} Z_{r} + \varepsilon_{{P,r,{\text{IF}}}}^{s}$$
(3)
$$L_{{r,{\text{IF}}}}^{s} = \rho_{r}^{s} + c(t_{{r,{\text{IF}}}} - t_{{{\text{IF}}}}^{s} ) + \lambda_{{{\text{IF}}}}^{s} (b_{{r,{\text{IF}}}} - b_{{{\text{IF}}}}^{s} ) + m_{r}^{s} Z_{r} + \lambda_{{{\text{IF}}}}^{s} N_{{r,{\text{IF}}}}^{s} + \varepsilon_{{L,r,{\text{IF}}}}^{s}$$
(4)

where \({\text{d}}_{{{\text{r}},{\text{IF}}}}\) and \({\text{d}}_{{{\text{IF}}}}^{{\text{s}}}\) refer to the IF code bias of receiver and satellite; \(b_{{{\text{r}},{\text{IF}}}}\) and \(b_{{{\text{IF}}}}^{{\text{s}}}\) are the IF phase delay of the receiver and satellite signal; \(N_{{{\text{r}},{\text{IF}}}}^{{\text{s}}}\) refer to the carrier phase IF ambiguity; \(Z_{r}\) is the zenith wet delay and \(m_{r}^{s}\) is the corresponding mapping function; \(\varepsilon_{{P,{\text{r}},{\text{IF}}}}^{{\text{s}}}\) and \(\varepsilon_{{L,{\text{r}},{\text{IF}}}}^{{\text{s}}}\) are the sum of IF observation noise, multi-path error, and other unmodeled errors for the pseudorange and carrier phase observations.

In clock estimation, the satellite orbits and station coordinates are held fixed. It should be noted that the satellite clocks are not estimable directly. Two types of the linear dependencies cause rank defects, the linear dependency between satellite clocks and receiver clocks (De Jonge 1998; Zhang et al. 2014) and the linear dependency between clocks and code hardware delays (Odijk et al 2016, 2017). To eliminate rank defects caused by the linear dependency between the satellite and receiver clocks, one clock datum should be chosen and the other clocks are aligned to the datum. In this study, one specific satellite clock is set as the datum and constrained according to the broadcast ephemeris. To eliminate rank defects caused by the linear dependency between clocks and code hardware delays, satellite and receiver code biases are absorbed by the corresponding clock offset parameters, which is also an IGS recognized convention (Kouba and Héroux 2001; Dach et al. 2015). The phase delays are merged into the phase ambiguity. Hence, the observation model can be expressed as:

$$v_{{{\text{Pc}}}}^{{\text{s}}} = \delta \overline{t}_{{\text{r}}} - \delta t^{{\text{s}}} + m_{{\text{r}}}^{{\text{s}}} Z_{{\text{r}}} + \varepsilon_{{{\text{Pc}}}}$$
(5)
$$v_{{{\text{Lc}}}}^{{\text{s}}} = \delta \overline{t}_{{\text{r}}} - \delta t^{{\text{s}}} + \lambda_{{{\text{IF}}}}^{{\text{s}}} \overline{N}_{{{\text{r}},{\text{IF}}}}^{{\text{s}}} + m_{{\text{r}}}^{{\text{s}}} Z_{{\text{r}}} + \varepsilon_{{{\text{Lc}}}}$$
(6)

where \(\delta \overline{t} = c(t_{{{\text{r}},{\text{IF}}}} + {\text{d}}_{{{\text{r}},{\text{IF}}}} )\) and \(\delta t^{{\text{s}}} = c(t_{{{\text{IF}}}}^{{\text{s}}} + {\text{d}}_{{{\text{IF}}}}^{{\text{s}}} )\) refer to the receiver clock and satellite clock in meters, respectively; \(\overline{N}_{{{\text{IF}}}}^{{\text{s}}} = N_{{{\text{IF}}}}^{{\text{s}}} + b_{{{\text{r}},{\text{IF}}}} - b_{{{\text{IF}}}}^{{\text{s}}} - c({\text{d}}_{{{\text{r}},{\text{IF}}}} - {\text{d}}_{{{\text{IF}}}}^{{\text{s}}} )/\lambda_{{{\text{IF}}}}^{{\text{s}}}\) is the re-parameterized IF phase ambiguity; \(v_{{{\text{Pc}}}}^{{\text{s}}}\) and \(v_{{{\text{Lc}}}}^{{\text{s}}}\) denote "observed minus computed" (OMC) residuals of phase and pseudo-range observation, respectively.

In the multi-GNSS scenario, the measurements could be systematically biased from satellite to satellite due to the signal delay caused by hardware. For the GNSS using code division multiple access (CDMA) signals, including GPS, BDS, and Galileo, the inter-system biases (ISB) are added to handle the biases between different systems. As for the GLONASS satellites using frequency division multiple access (FDMA) signals, the biases between different satellites are parameterized as the inter-frequency bias (IFB). Using GPS time as time reference, the combined multi-GNSS clock estimation models can be expressed as:

$$\left\{ {\begin{array}{*{20}l} {v_{{{\text{Pc}}}}^{G} = \delta \overline{t}_{r} - \delta t^{s,G} + m_{r}^{s} Z_{r} + \varepsilon_{{P_{c} }}^{G} } \hfill \\ {v_{{{\text{Pc}}}}^{E} = \delta \overline{t}_{r} - \delta t^{s,E} + {\text{ISB}}^{E - G} + m_{r}^{s} Z_{r} + \varepsilon_{{P_{c} }}^{E} } \hfill \\ {v_{{{\text{Pc}}}}^{C} = \delta \overline{t}_{r} - \delta t^{s,C} + {\text{ISB}}^{C - G} + m_{r}^{s} Z_{r} + \varepsilon_{{P_{c} }}^{C} } \hfill \\ {v_{{{\text{Pc}}}}^{{R_{k} }} = \delta \overline{t}_{r} - \delta t^{{s,R_{k} }} + {\text{IFB}}^{{R_{k} - G}} + m_{r}^{s} Z_{r} + \varepsilon_{{P_{c} }}^{{R_{k} }} } \hfill \\ \end{array} } \right.$$
(7)
$$\left\{ {\begin{array}{*{20}l} {v_{{{\text{Lc}}}}^{G} = \delta \overline{t}_{r} - \delta t^{s,G} + \lambda_{{{\text{IF}}}}^{G} \overline{N}_{{r,{\text{IF}}}}^{s,G} + m_{r}^{s} Z_{r} + \varepsilon_{{L_{c} }}^{G} } \hfill \\ {v_{{{\text{Lc}}}}^{E} = \delta \overline{t}_{r} - \delta t^{s,E} + {\text{ISB}}^{E - G} + \lambda_{{{\text{IF}}}}^{E} \overline{N}_{{r,{\text{IF}}}}^{s,E} + m_{r}^{s} Z_{r} + \varepsilon_{{L_{c} }}^{E} } \hfill \\ {v_{{{\text{Lc}}}}^{C} = \delta \overline{t}_{r} - \delta t^{s,C} + {\text{ISB}}^{C - G} + \lambda_{{{\text{IF}}}}^{C} \overline{N}_{{r,{\text{IF}}}}^{s,C} + m_{r}^{s} Z_{r} + \varepsilon_{{L_{c} }}^{C} } \hfill \\ {v_{{{\text{Lc}}}}^{{R_{k} }} = \delta \overline{t}_{r} - \delta t^{{s,R_{k} }} + {\text{IFB}}^{{R_{k} - G}} + \lambda_{{{\text{IF}}}}^{{R_{k} }} \overline{N}_{{r,{\text{IF}}}}^{{s,R_{k} }} + m_{r}^{s} Z_{r} + \varepsilon_{{L_{c} }}^{{R_{k} }} } \hfill \\ \end{array} } \right.$$
(8)

where \(G\),\(E\), and \(C\) refer to GPS, Galileo, and BDS systems, respectively; \(R_{k}\) is the GLONASS satellite with frequency factor \(k\); \({\text{ISB}}^{E - G} = {\text{d}}_{{r,{\text{IF}}}}^{E} - {\text{d}}_{{r,{\text{IF}}}}^{G}\); \({\text{ISB}}^{C - G} = {\text{d}}_{{r,{\text{IF}}}}^{C} - {\text{d}}_{{r,{\text{IF}}}}^{G}\); \({\text{IFB}}^{{R_{k} - G}} = {\text{d}}_{{r,{\text{IF}}}}^{{R_{k} }} - {\text{d}}_{{r,{\text{IF}}}}^{G}\). It should be noted that GLONASS code IFBs are usually neglected to simplify data processing, especially in real time (Cai and Gao 2013; Chen et al. 2017, 2018).

Sequential least square with parameter elimination

In GNSS data processing, the unknown parameter vector \(x\) can be divided into the active parameter vector \(x_{1}\) and the inactive parameter vector \(x_{2}\). The Gauss-Markov model of the aforementioned observation equations can be expressed in the following as:

$$x = [\begin{array}{*{20}c} {x_{1} } & {x_{2} } \\ \end{array} ]^{T}$$
(9)
$$v = Ax_{1} + Bx_{2} + \varepsilon ,P$$
(10)

where \(A\) and \(B\) are the corresponding design matrices with full column rank of \(x_{1}\) and \(x_{2}\); \(v\) denotes OMC residuals of observation; \(\varepsilon\) is normally distributed observation error vector with zero mean; \(P\) denotes the weight matrix of observables.

In the sequential LSQ adjustment, inactive parameters, i.e., satellite and receiver clocks, and tropospheric delays, are usually estimated as white noise or random-walk process, and they will be eliminated before new parameters are added in the normal equation (NEQ). The NEQ of the least squares can be expressed as:

$$\left( {\begin{array}{*{20}c} {N_{11} } & {N_{12} } \\ {N_{21} } & {N_{22} } \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\hat{x}_{1} } \\ {\hat{x}_{2} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {A^{T} Pv} \\ {B^{T} Pv} \\ \end{array} } \right)$$
(11)

where

$$N = \left( {\begin{array}{*{20}c} {N_{11} } & {N_{12} } \\ {N_{21} } & {N_{22} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {A^{T} {\text{PA}}} & {A^{T} {\text{PB}}} \\ {B^{T} {\text{PA}}} & {B^{T} {\text{PB}}} \\ \end{array} } \right)$$
(12)

and \(\hat{x}_{2}\) refers to the inactive parameters which will never be used for the observations afterward.

In fact, not all the inactive parameters are eliminated at the same epoch and parameters to be eliminated at current epoch are usually nonadjacent. Hence, eliminating parameters one by one is the traditional way in the sequential LSQ implementation. However, to fully exploit the power of modern CPUs, the granularity of the operations should be larger (Schreiber and Loan 1989; Demmel 1997). Figure 1 shows a typical multi-level memory of the cache-based processors. Data must be moved from memory layers lower in the pyramid into the registers for CPU to perform floating point operations. Since such movement represents unproductive overhead, reusing data that is up the memory pyramid many times before allowing it to slide back down becomes important (Low and van de Geijn 2004). Consequently, casting algorithms in terms of larger granularity operations, i.e., matrix–matrix multiplication, will achieve better performance for data processing (Gong et al. 2018). Concerning the parameter elimination in sequential LSQ, inactive parameters at the current epoch are rearranged in a continuous row and eliminated thoroughly by the implementation of the Schur complement.

Fig. 1
figure 1

Hierarchical memories of modern CPU viewed as a pyramid

Note that \(N_{11}\) and \(N_{22}\) are square and invertible, the Schur complement of \(N_{22}\) in \(N\) is \(N_{11} - N_{12} N_{22}^{ - 1} N_{21}\) (Gabe et at. 2010). Reducing the parameters \(\hat{x}_{2}\) onto the parameters \(\hat{x}_{1}\) gives,

$$\left( {\begin{array}{*{20}c} {N_{11} - N_{12} N_{22}^{ - 1} N_{21} } & 0 \\ {N_{21} } & {N_{22} } \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\hat{x}_{1} } \\ {\hat{x}_{2} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {A^{T} Pv - N_{12} N_{22}^{ - 1} B^{T} Pv} \\ {B^{T} Pv} \\ \end{array} } \right)$$
(13)

Let \(J = {\text{BN}}_{22}^{ - 1} B^{T} P\) and \(\tilde{A} = (I - J)A\), then,

$$\begin{gathered} \tilde{N}_{11} = N_{11} - N_{12} N_{22}^{ - 1} N_{21} \\ \quad = A^{T} PA - A^{T} {\text{PBN}}_{22}^{ - 1} B^{T} {\text{PA}} \\ \quad = A^{T} (I - J)^{T} P(I - J)A \\ \end{gathered}$$
(14)
$$\tilde{N}_{11} \hat{x}_{1} = \tilde{A}^{T} Pv$$
(15)
$$\tilde{A}\hat{x}_{1} = v$$
(16)

Inactive parameters are marginalized out, and active parameters are stored for the next epoch processing.

In addition, to further improve the performance of matrix operations, the open-source library OpenBLAS is implemented. Since the Basic Linear Algebra Subprograms (BLAS) is one of the most basic linear algebra kernels and provides numerous matrix–matrix operations with a portable interface (Low and van de Geijn 2004), OpenBLAS optimized level 3 BLAS (matrix–matrix operations; Wang et al. 2013). Compared to other open-source libraries commonly used in the traditional serial strategy, such as Eigen (http://www.eigen.tuxfamily.org), OpenBLAS shows significant improvements in the following experiments.

Parallel processing strategy based on OpenMP

To fully exploit the power of modern CPUs with multi-core and multi-thread, parallel processing strategy based on OpenMP, multi-GNSS real-time clock estimation is implemented. Figure 2 shows the OpenMP parallel working principle, namely a fork-join design pattern. OpenMP is an implementation of multithreading, a method of parallelizing whereby a primary thread (a series of instructions executed consecutively) forks a specified number of sub-threads and the system divides a task among them.

Fig. 2
figure 2

Illustration of the working principle of OpenMP

The parallel processing flowchart of multi-GNSS real-time clock estimation is summarized in Fig. 3. At the beginning, real-time observation data and broadcast ephemeris are derived from BKG Ntrip Client (BNC, https://igs.bkg.bund.de/). The satellite orbits are retrieved and updated from IGS ultra-rapid (predicted half) products. Since epoch observations of different stations are independent, a parallel region based on OpenMP is created when performing data preprocessing and NEQ construction. In the step of data preprocessing, quality control is carried out to detect the gross errors and cycle slips. Next, in the parameter updating step, inactive parameters are marginalized out by the implementation of the Schur complement and new parameters are initialized and added in. Subsequently, parallelization based on OpenMP is introduced to accelerate the step of NEQ construction. As shown in Fig. 4, parallelization is implemented before the outer station loop in consideration of the overheads due to the synchronization. Thereafter, OpenBLAS matrix library contributes to the high performance of matrix operations and speeds up the step of sequential LSQ solving. It should be noted that OpenBLAS also supports multithreading based on OpenMP to accelerate matrix operations. Finally, multi-GNSS real-time clock products are generated and broadcast to support real-time precise positioning service at the user side.

Fig. 3
figure 3

Parallel processing procedure for multi-GNSS real-time clock estimation

Fig. 4
figure 4

Pseudo-code of OpenMP to accelerate the step of NEQ construction

Software and hardware

In order to satisfy various demands of scientific and engineering applications in geodesy and navigation fields, the GREAT (GNSS + REsearch, Application and Teaching) software was designed and developed at the school of Geodesy and Geomatics of Wuhan University (Li et al. 2022). Written in standard C++ language and compiled with Object-Oriented principles, GREAT software can achieve multiple applications such as GNSS real-time orbit and clock determination, PPP-RTK (real-time kinematic), LEO augmentation, multi-sensor integration navigation, etc. The proposed strategy is implemented in real-time clock estimation with the GREAT software package in this contribution. Specifications of the processing strategy for multi-GNSS real-time clock estimation are summarized in Table 1. To verify the performance of the proposed strategy, we perform all computations on a cloud server. Detailed information about the hardware resources is listed in Table 2.

Table 1 Detailed processing strategy for multi-GNSS real-time clock estimation
Table 2 CPU architecture information of the workstation

Experiments on computation efficiency

Series of experiments were designed to validate the effectiveness and reliability of the proposed computation strategy. First of all, the overall elapsed time per epoch of different parts in the traditional serial strategy of PCE is analyzed. Afterward, high-performance matrix operations based on the Schur complement and OpenBLAS library are added to accelerate the processing of parameter elimination and NEQ solving, respectively. Then, parallel processing based on OpenMP is introduced to speed up data preprocessing and NEQ construction and further accelerate matrix operations. Finally, elapsed time with different computation strategies against the growth in the number of stations is compared and analyzed.

This contribution uses 85 globally distributed multi-GNSS stations with real-time streams. The distribution and supported constellations of selected tracking stations are shown in Fig. 5. All stations provide GPS and GLONASS observations, while 83 and 78 stations can receive Galileo and BDS signals, respectively. The real-time streams are obtained by the BNC software in real time. This section uses observation data from DOY 50 to 52, 2022.

Fig. 5
figure 5

Distribution and supported constellations of the selected IGS tracking stations (purple for GPS-capable stations, green for GLONASS, red for Galileo, and blue for BDS)

Elapsed time of the traditional serial strategy

Figure 6 shows the overall elapsed time per epoch and average number of parameters for PCE using the traditional serial strategy with different GNSS constellations. With serial algorithm and open-source matrix library Eigen, it takes 1.46, 2.99, and 10.30 s to process the GPS-only, GE, and GREC systems for each epoch, respectively. It can be seen that, with the increase of GNSS satellites, the number of parameters and elapsed time increase rapidly.

Fig. 6
figure 6

Overall elapsed time per epoch and average number of parameters for PCE with different GNSS constellations using the traditional serial strategy

Specifically, the overall computation time is subdivided into four parts: preprocessing, parameter elimination, constructing NEQ, and solving NEQ. Figure 7 shows the elapsed time and percentages of the four parts in GREC quad-system scene. Solving NEQ consumes most of the computation time and costs 6.06 s, while constructing NEQ, preprocessing, and parameter elimination costs 3.45, 0.33, and 0.46 s, respectively. Therefore, it is impossible for the traditional strategy to meet the 5 s update rate of the IGS real-time clock product.

Fig. 7
figure 7

Elapsed time and percentages of the four parts in GREC quad-system scene

Optimization of matrix operations

Considering the analysis mentioned above, solving NEQ takes most of the computation time, especially the inversion of the epoch-wise high-order NEQ. With the implementation of optimized Cholesky factorization, OpenBLAS can significantly accelerate the step of solving NEQ. Figure 8 shows the elapsed time of solving NEQ with different GNSS constellations using OpenBLAS and Eigen, respectively. With the increase of GNSS satellites, the elapsed time for solving NEQ using OpenBLAS grows much slower than that of using Eigen, which makes OpenBLAS more competitive in the multi-GNSS scene. Compared with Eigen, the average time for the GPS-only, GE, and GREC using OpenBLAS is 0.09, 0.32, and 1.91 s, with an improvement of 63.6%, 65.2%, and 68.3%, respectively.

Fig. 8
figure 8

Elapsed time of solving NEQ with different GNSS constellations using OpenBLAS and Eigen

As for the step of parameter elimination, the inactive parameters from the last epoch are eliminated one by one in the traditional serial strategy, namely the cycle scheme. With the implementation of the Schur complement, the proposed computing strategy eliminates inactive parameters one time and shortens the processing time. Matrix-remove scheme is used to describe the proposed strategy for parameter elimination. Table 3 shows the elapsed time of parameter elimination with different GNSS constellations using the cycle scheme and matrix-remove scheme. It can be seen that the two schemes show comparable performance for GPS-only and GE. With the increase of GNSS satellites, inactive parameters increase and matrix-remove scheme shows much better performance than the cycle scheme, with a remarkable improvement of 45.7% in the quad-system scene.

Table 3 Elapsed time of parameter elimination with different GNSS constellations using the cycle scheme and matrix-remove scheme, respectively

OpenMP-based parallel optimization

In order to speed up the step of preprocessing and constructing NEQ, parallel processing based on OpenMP is utilized. Since the epoch observations of different stations are independent, parallel regions are created and OpenMP is implemented to fully exploit the power of modern CPUs with multiple cores and threads. Meanwhile, OpenBLAS also supports multithreading based on OpenMP to accelerate matrix operations.

Figure 9 shows the overall elapsed time for different GNSS constellations with various numbers of threads. It can be seen that the increase in the number of satellites burdens the computation, while the increase in threads can accelerate the processing. Besides, the improvement of parallelization is rather significant when the number of threads increases from 1 to 4, compared to that from 6 to 12. Taking the quad-system scene as an example, the former improvement reaches 51.2%, while the latter is only 10.8%. In the serial single thread scheme, the overall elapsed time is over 5 s and cannot meet the 5 s update rate of the IGS real-time clock product. When the number of threads reaches 12, the average time per epoch is less than 2.5 s, with a remarkable improvement of 62.0%.

Fig. 9
figure 9

Overall elapsed time for different GNSS constellations with various numbers of threads

To be more intuitionistic, Table 4 summarizes the performance of the traditional serial strategy and the proposed strategy with 12 threads in single-, dual-, and quad-system scenarios. Compared to the traditional serial strategy, the computation efficiency is significantly improved by 76.0%, 77.3%, and 77.7%, respectively.

Table 4 Elapsed time with the traditional strategy and the proposed strategy using 12 threads with different GNSS constellations

Table 5 shows the components of elapsed time for GREC systems with the traditional serial strategy and the proposed strategy using 12 threads. With the proposed strategy, elapsed time for preprocessing, parameter elimination, constructing NEQ and solving NEQ is improved by 27.3%, 65.2%, 84.9%, and 77.2%, respectively. Multi-GNSS real-time clock estimation can meet the 5 s update rate of the IGS real-time clock product.

Table 5 Components of elapsed time for GREC system with the traditional strategy and the proposed strategy using 12 threads

Computation efficiency with different number of stations

Abundant GNSS networks contribute to better observation geometry and ensure a higher clock precision and stability. However, such a huge number of observations greatly burdens the processing. To further assess the computation efficiency of multi-GNSS clock estimation with the proposed strategy, the average processing time for different numbers of stations is compared and analyzed. Additional 15 multi-GNSS stations are utilized to enrich the experiment results. Figure 10 shows the elapsed time of quad-system clock estimation with 55-, 70-, 85-, and 100- station networks based on the traditional serial and proposed strategies. When the number of stations is more than 70, the elapsed time with the traditional serial strategy is over 6 s. By contrast, the elapsed time using the proposed strategy grows quite slowly with the increase of stations. The computation efficiency is significantly improved by 76.1%, 76.9%, 77.7%, and 77.5%, respectively. When the network size reaches 100 stations, the elapsed time with the proposed strategy is 3.06 s and can still meet the 5 s update rate of the IGS real-time clock product.

Fig. 10
figure 10

Elapsed time of quad-system clock estimation with 55-, 70-, 85-, and 100-station networks based on two computing strategies

Evaluation of real-time clock products

In this section, the accuracy of real-time clock products is evaluated, and real-time PPP validation experiments are performed to further verify the reliability of the proposed method. All data processing for precise clock estimation and PPP validation are fully based on real-time data streams.

Clock comparison

To evaluate the accuracy of multi-GNSS real-time clock products, the derived clock estimates are compared with the IGS final GPS products and the GFZ final multi-GNSS products (GBM0MGXRAP). The comparison lasts from DOYs 50–52, DOYs 164–166, and DOYs 250–252, 2022. The clock time series are aligned to a reference satellite in order to eliminate the systematic errors caused by the clock datum. G01, R01, E02, and C21 are selected as the reference clock for GPS, GLONASS, Galileo, and BDS, respectively. Note that the performance of orbit products, especially the radial component, greatly influences on the performance of real-time clock estimation. The radial orbit differences between the different orbit products used in the proposed solution and IGS post-processing products, are removed. Figures 11 and 12 present the average STDs of the clock differences with respect to IGS final GPS products and GFZ final multi-GNSS products, respectively.

Fig. 11
figure 11

STDs of the clock difference with respect to IGS final GPS products

Fig. 12
figure 12

STDs of the clock difference with respect to GFZ final multi-GNSS products (GBM0MGXRAP) for different constellations (blue for GPS, red for Galileo, green for GLONASS, and brown for BDS)

For GPS satellites, the STDs are all smaller than 0.16 ns and the average value is 0.095 ns and 0.106 ns compared to IGS final GPS products and GFZ final multi-GNSS products, respectively. Comparable to that of GPS, the average STD of Galileo satellites is 0.102 ns. Due to the poor quality of GLONASS orbits, the average STD is only 0.325 ns and shows the worst performance. In terms of BDS satellites, the STDs are generally smaller than 0.25 ns and the average STD is 0.215 ns. The results indicate that multi-GNSS real-time products generated by the proposed strategy are of high quality and have a good agreement with the IGS post-processing products.

Real-time PPP validation

In this contribution, real-time clock and orbit corrections are encoded to state-space representative (SSR) format and broadcast through the Ntrip protocol to support the real-time PPP service. In order to further validate the positioning performance of real-time clock products, 30 multi-GNSS stations are selected to perform the real-time PPP experiments and the distribution of the stations is shown in Fig. 13.

Fig. 13
figure 13

Distribution of 30 multi-GNSS stations selected for real-time PPP experiments

The experiments last from 2022-09-06 04:17:20-2022-09-09 04:17:15 (GPST), and the positioning accuracy is determined by the difference between PPP results and IGS weekly combined solutions. To evaluate the performance of real-time PPP solutions using the estimated clock (GREAT clock), real-time PPP solutions using the CNES clock (SSRA00CNE0) are also listed for comparisons. The time series of the real-time PPP results for stations TOW2 and KIR8 are shown as examples in Figs. 14 and 15, respectively.

Fig. 14
figure 14

Time series of the real-time PPP positioning errors for station TOW2 (top for GREAT clock, and bottom for CNES clock)

Fig. 15
figure 15

Time series of the real-time PPP positioning errors for station KIR8 (top for GREAT clock, and bottom for CNES clock)

It can be seen that the positioning errors of the stations in the east, north, and up components are within ± 10 cm for most epochs. The average RMS of positioning errors and convergence time for 30 stations are summarized in Table 6. For CNES clock, the average RMS values are 2.34, 3.49, and 6.47 cm in the east, north, and up components, respectively, and the average convergence time is 9.50 min. For GREAT clock, the average RMS values are 1.83, 3.00, and 4.85 cm in the east, north, and up components, respectively, and the average convergence time is 8.79 min. All results indicate that high positioning accuracy can be achieved using the estimated multi-GNSS real-time clock products.

Table 6 Average RMS of positioning errors and convergence time for 30 stations using CNES and GREAT real-time clock

Conclusion

Real-time precise clock products are essential prerequisites to support real-time precise positioning service. In this contribution, an efficient strategy based on the UD method is proposed to achieve high processing efficiency of multi-GNSS clock estimation, wherein parallel data processing and high performance of matrix operations are introduced to accelerate the processing of multi-GNSS clock estimation. Benefitting from parallelization and effective use of modern CPUs, the computation time can be significantly reduced compared to the traditional serial strategy. Multi-GNSS observations from 85 globally distributed tracking stations are utilized for the generation of real-time clock products with the proposed strategy.

The computation efficiency of the proposed strategy is analyzed. Specifically, the overall computation time is subdivided into four parts: preprocessing, parameter elimination, constructing NEQ, and solving NEQ. With the implementation of optimized Cholesky factorization, OpenBLAS can accelerate the step of solving NEQ significantly. Compared with the open-source library Eigen, the average time for solving NEQ in GPS-only, GE, and GREC scene is 0.09, 0.32, and 1.91 s, with an improvement of 63.6%, 65.2%, and 68.3%, respectively. With the implementation of the Schur complement, the elapsed time of parameter elimination is shortened by 45.7% in the quad-system scene. To fully exploit the power of modern CPUs with multi-core and multi-thread, parallel regions applying OpenMP are created in the step of preprocessing and constructing NEQ. With the proposed strategy with 12 threads, the average elapsed time per epoch is 0.35, 0.68, and 2.30 s for GPS-only, dual-system, and quad-system solutions, respectively. Compared to the traditional serial strategy, the computation efficiency is significantly improved by 76.0%, 77.3%, and 77.7%, respectively. Moreover, the computation efficiency with different numbers of stations is compared and analyzed. When the network size reaches 100 stations, the elapsed time of quad-system clock estimation with the proposed strategy is 3.06 s and can still meet the 5 s update rate of the IGS real-time clock product.

Furthermore, the performance of the real-time clock products is evaluated to verify the reliability of the proposed strategy. Compared to IGS final GPS products, the average STD value is 0.095 ns, and compared to GFZ final multi-GNSS satellite clock products, the average STD values are 0.106, 0.102, 0.325, and 0.215 ns for GPS, Galileo, GLONASS, and BDS, respectively. Based on the real-time clock corrections, 30 multi-GNSS stations are employed for real-time PPP experiments. The average RMS values of the stations are 1.83, 3.00, and 4.85 cm in the east, north, and up components, respectively. All results demonstrate that the proposed strategy is effective and accurate and can guarantee high-rate multi-GNSS real-time clock estimation.