1 Introduction

Zumberge et al. (1997) introduced the precise point positioning (PPP) concept by using precise satellite orbit and clock products to provide positioning with centimeter accuracy for a stand-alone receiver. Its real-time realization was demonstrated by Héroux et al. (2004). To speed up this development toward a real-time precise positioning service, the International GNSS Service (IGS) (Johnston et al. 2017) launched its real-time pilot project (IGS-RTPP) (Cassey et al. 2012), aiming to acquire and distribute GNSS data and products in real-time based on the NTRIP protocol in June 2007. Serving as the essential information for real-time precise positioning, satellite orbits are usually estimated with the latest available observations and predicted for real-time applications (Elsobeiey et al. 2016; Li et al. 2022).

The orbit estimation can be carried out by the least-squares batch processing or epoch-wise filtering. The least-squares batch solution is adopted by most IGS Analysis Centers (ACs), including Center for Orbit Determination in Europe CODE (Schaer et al. 2021) and German Research Centre for Geosciences GFZ (Deng and Schuh 2018), while filtering is implemented by some of the ACs dedicated for real-time purpose, such as the square root information filter (SRIF) at JPL and Wuhan University (Bertiger et al. 2020; Lou et al. 2022) and Kalman filter at CNES (Laurichesse et al. 2013). For the filter-based method, the stochastic constraint is introduced in the orbit state elements update, which is formed as a linear blend of the previous estimate and the current measurement information (Laurichesse et al. 2013; Parkinson et al. 1996). Currently, the available real-time orbits are mostly predicted from the batch least-squares solutions owing to its feasibility of routine processing at most IGS ACs. However, with the extension of prediction time, the orbit accuracy drops progressively (Dai et al. 2019; Duan et al. 2019). In order to improve the orbit quality, there are two aspects to consider: one is to refine the modeling, especially the non-conservative forces, and the other one is to shorten the prediction time, i.e., shorten the orbit update latency.

IGS began providing ultra-rapid (IGU) GPS orbits in November 2000 and then reduced the update latency of ultra-rapid orbits from 12 to 6 h in April 2004 (Kouba 2009; Springer and Hugentobler 2001). As a member of the Multi-GNSS Experiment (MGEX) ACs, GFZ started to provide five GNSS system ultra-rapid products with 3-h update rate since November 2015, including GPS, Globalnaya Navigazionnaya Sputnikovaya Sistema (GLONASS), Galileo, BeiDou Navigation Satellite System (BDS), and Quasi-Zenith Satellite System (QZSS) (Deng et al. 2016). Wuhan University (WHU) provided hourly updated multi-GNSS orbits with an accuracy of 3 to 5 cm (Zhao et al. 2018).

With more satellites and stations involved, a huge number of parameters must be handled, especially for the undifferenced (UD) measurements. In that case, the computational time is always a burden. The simplest way to reduce the computing time is parallel processing, including the parallel processing of sub-networks, individual satellite constellations, and sub-sessions (session in this study refers to the time length of a processing job). The parallel processed tasks will be combined later to generate final orbit products. In the network parallel processing, the network is divided into several sub-networks to be processed in parallel, and the normal equations (NEQs) of sub-networks are combined by utilizing a certain number of common stations to derive a final solution (Beutler et al. 1996; Bruni et al. 2018; Pintori et al. 2021; Zurutuza et al. 2019). As the satellite and receiver clocks are eliminated during parallel processing, they cannot be combined by stacking the NEQs and thus it is not equivalent to the integrated solution. In the constellation parallel processing, each constellation is processed separately in parallel and delivered to users (Chen et al. 2021). The constellation solutions can be combined via common parameters, such as earth rotation parameters (ERP) and station coordinates, which would potentially improve the consistency and precision of orbits. However, it is still not possible to combine other processing parameters such as tropospheric parameters and receiver clocks, as they are pre-eliminated once deactivated. Another method is to split an undivided processing into several sub-sessions and each sub-session is processed separately but in parallel to generate the sub-session NEQs. All the sub-session NEQs are stacked, with all necessary parameters combined, including both global parameters such as ERP, orbits, and station coordinates, and the processing parameters such as ambiguities covering different sub-sessions. This method is much more efficient as the computation burden can be shared by several computing nodes. Jiang et al. (2021) applied the epoch-parallel into multi-GNSS POD with double-differenced observations. However, they keep all parameters in the NEQ, including the tropospheric delays and ambiguities, which can hardly be realized in the multi-GNSS POD with undifferenced observations due to the large amounts of ambiguities and clocks. In addition, keeping the ambiguities in the NEQ leads to an enormous NEQ, slowing the speed of inversion.

In addition, special algorithms are also developed for solving a GNSS network with a huge number of stations. By keeping only active parameters in the NEQ, Ge et al. (2006) reduces the requirement of computer memory and computational burden. The optimized algorithms based on newly developed processors, i.e., block-partitioned algorithms (Gong et al. 2017; Quintana-Orti et al. 2008) and open multi-processing (openMP) (Chandra et al. 2001; Chen et al. 2022) continue improving the computation efficiency. Since most active parameters in GNSS POD are ambiguities, a carrier-range method is brought up and successfully implemented in the huge network processing with the UD ambiguity resolution (Blewitt et al. 2010; Chen et al. 2014). However, precise orbits and clocks are the prerequisite for applying this method, which cannot improve ultra-rapid POD with around 100 stations. Cui et al. (2021) also designed a parallel computing of large GNSS network, which is suitable for multi-core and multi-node environments, by decomposing the GNSS modeling tasks of each epoch to different nodes and cores. However, the efficiency was demonstrated with PPP and baseline processing of large GNSS networks instead of network solution which is essential for ultra-rapid GNSS POD. Of course, with the improvement of modern computer power like central processing unit (CPU) and solid-state drive (SSD), the efficiency in GNSS POD can be further improved (Li et al. 2018).

Currently, the processing of around 120 satellites and 100 stations based on above methods still costs nearly one hour. We aim at achieving ultra-rapid multi-GNSS batch POD solution within 30 min, without losing the consistency of sequential batch solution. Therefore, in this study, an optimized multi-GNSS POD processing strategy is proposed based on the epoch-parallel processing. Different from the method introduced by Jiang et al. (2021), our proposed strategy considers parameter elimination as soon as they are deactivated, which significantly decreases memory requirement and computation burden.

The article is organized as follows. We first present an overview of the batch POD strategy and then propose the epoch-parallel processing strategy, demonstrating its equivalence to the entire sequential batch solution in Sect. 2. The analysis of the computation time and the benefits of using 30-min updated orbit against the 1-hourly updated orbits are illustrated in Sect. 3. The summary and conclusions are presented in Sect. 4.

2 Methodology

2.1 Precise orbit determination in batch solution

Suppose that the UD measurement equation of ionosphere-free (IF) phase and pseudo-range observables from satellite to receiver is described by Eq. (1)

$$ \left\{ {\begin{array}{*{20}l} {P = \left| {{\varvec{x}}^{{{\text{sat}}}} \left( {t_{s} } \right) - {\mathbf{R}}_{{\varvec{x}}} {\varvec{x}}_{{{\text{rec}}}} \left( {t_{r} } \right)} \right| + c\left( {{\text{d}}t_{{{\text{rec}}}} - {\text{d}}t^{{{\text{sat}}}} } \right) + t_{{{\text{trop}}}} + \varepsilon_{P} } \hfill \\ {L = \left| {{\varvec{x}}^{{{\text{sat}}}} \left( {{\text{t}}_{{\text{s}}} } \right) - {\mathbf{R}}_{{\varvec{x}}} {\varvec{x}}_{{{\text{rec}}}} \left( {t_{r} } \right)} \right| + c\left( {{\text{d}}t_{{{\text{rec}}}} - {\text{d}}t^{{{\text{sat}}}} } \right) + t_{{{\text{trop}}}} + N_{{{\text{rec}}}}^{{{\text{sat}}}} + \varepsilon_{L} } \hfill \\ \end{array} } \right., $$
(1)

where \(P\) and \(L\) denote pseudo-range and phase observations, respectively; \({\mathbf{x}}^{sat} \left( {t_{s} } \right)\) the satellite position at the signal transmitting time in Geocentric Celestial Reference System (GCRS); \({\mathbf{x}}_{{{\text{rec}}}} \left( {t_{r} } \right)\) the station position at the signal receiving time in International Terrestrial Reference System (ITRS) and \({\mathbf{R}}_{{\varvec{x}}}\) the transformation matrix from ITRS to GCRS; \(c\) the speed of light; \(dt_{rec}\) the receiver clock offset; \(dt^{sat}\) the satellite clock offset; \(t_{trop} \) the slant tropospheric delay; \(N_{rec}^{sat}\) the phase ambiguities; and \(\varepsilon_{p}\) and \(\varepsilon_{l}\) the measurement noise and other unmodelled errors for the pseudo-range and carrier phase observations, respectively. The inter-system/frequency-dependent code biases relative to the GPS biases at the receiver end will be considered in multi-GNSS POD. As is well known, the satellite orbit is fully determined by the initial state and necessary force model parameters after the force models are specified. The satellite position at any time can be expressed by its reference orbit and the corrections of the initial state and force model parameters as:

$$ {\varvec{x}}^{{{\text{sat}}}} \left( t \right) = \tilde{\user2{x}}^{{{\text{sat}}}} \left( t \right) + \left[ {\begin{array}{*{20}c} {\frac{{\partial {\varvec{x}}^{{{\text{sat}}}} \left( t \right)}}{{\partial {\varvec{x}}^{{{\text{sat}}}} \left( {t_{0} } \right)}}} & {\frac{{\partial {\varvec{x}}^{{{\text{sat}}}} \left( t \right)}}{{\partial \dot{\user2{x}}^{{{\text{sat}}}} \left( {t_{0} } \right)}}} & {\frac{{\partial {\varvec{x}}^{{{\text{sat}}}} \left( t \right)}}{{\partial {\varvec{q}}_{0}^{{{\text{sat}}}} }}} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\delta {\varvec{x}}^{{{\text{sat}}}} \left( {t_{0} } \right)} \\ {\delta \dot{\user2{x}}^{{{\text{sat}}}} \left( {t_{0} } \right)} \\ {\delta {\varvec{q}}_{0}^{{{\text{sat}}}} } \\ \end{array} } \right] $$
(2)

where \(\tilde{\user2{x}}^{{{\text{sat}}}}\) is the reference orbit at certain time \(t\); \(t_{0}\) the reference time; \({\varvec{x}}_{0}^{{{\text{sat}}}} \left( {t_{0} } \right)\) the satellite initial conditions; \(\dot{\user2{x}}^{{{\text{sat}}}} \left( {t_{0} } \right)\) the satellite initial velocities; and \({\varvec{q}}_{0}^{{{\text{sat}}}}\) the force parameters. Note that the satellite orbits need to be expressed in CRS, and the rotation matrix is needed for transforming the receiver position vector from terrestrial reference system (TRS) into CRS (Petit and Luzum 2010). The ERP is therefore introduced and can be estimated.

As soon as the linearized observation equations are available, they can be contributed into the NEQ epoch-by-epoch. The sequential least-squares adjustment only keeping the active parameters in the NEQ can be found in the studies by Ge et al. (2006) and Chen et al. (2022). The eliminated parameters will be recovered backwards from the last epoch to the first epoch after the final NEQ is solved and also the observation residuals for parameter update and observation quality control.

After obtaining the float solution, integer double-differenced ambiguities are resolved and then applied as a strong constraint on the corresponding four related UD ambiguities to obtain the integer ambiguity resolution (Ge et al. 2005). Hence, the pseudo-observation equation of introducing resolved double-differenced ambiguities reads as

$$ {\varvec{v}}_{b} = {\mathbf{D}}{\varvec{x}} - \overline{\user2{b}}, {\mathbf{P}}_{b} , $$
(3)

where D denotes the ambiguity mapping matrix, \(\overline{\user2{b}}\) the fixed IF double-ambiguities, and \({\mathbf{P}}_{b}\) the weights of pseudo-observations.

2.2 Epoch-parallel processing

Currently, the ultra-rapid POD solution is usually performed using 24-h observations, which ensures a precise and reliable solution, but can hardly be shortened due to the large number of processing parameters. Instead of sequential processing from the beginning to the end, the epoch-parallel processing divides the 24-h session into a set of sub-sessions processed parallelly. Each sub-session is processed using the sequential least squares (LSQ) discussed in Sect. 2.1. Only the active parameters are introduced to NEQ, and deactivated parameters are eliminated immediately. Within a sub-session processing, parameters which are not active in the prior or next sub-session can be eliminated before connecting sub-sessions, and the eliminating equation can be used to recover the eliminated parameters to obtain the same result as for sequential processing.

To demonstrate this, we assume all the deactivated parameters are eliminated at once and the NEQs of the ith and \((i + 1){\text{th}}\) sub-session are expressed as

$$ \left[ {\begin{array}{*{20}c} {{\mathbf{N}}_{11,i} } & {{\mathbf{N}}_{12} } \\ {{\mathbf{N}}_{21} } & {{\mathbf{N}}_{22} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\varvec{x}}_{1} } \\ {{\varvec{x}}_{2} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\varvec{w}}_{1,i} } \\ {{\varvec{w}}_{2} } \\ \end{array} } \right] $$
(4)
$$ \left[ {\begin{array}{*{20}c} {{\mathbf{N}}_{11,i + 1} } & {{\mathbf{N}}_{13} } \\ {{\mathbf{N}}_{31} } & {{\mathbf{N}}_{33} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\varvec{x}}_{1} } \\ {{\varvec{x}}_{3} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\varvec{w}}_{1,i + 1} } \\ {{\varvec{w}}_{3} } \\ \end{array} } \right] $$
(5)

where \({\varvec{x}}_{1}\) are the active parameters which cover at least the \(i^{th}\) and \((i + 1)^{th}\) session, for example, satellite state parameters, \({\varvec{x}}_{2}\) and \({\varvec{x}}_{3}\) are the inactivated parameters in \((i + 1)^{th}\) and \(i^{th}\) session, respectively, for example, ambiguities and epoch-wise clocks.

For sequential processing, we eliminate \({\varvec{x}}_{2}\) from Eq. (4):

$$ {\varvec{x}}_{2} = {\mathbf{N}}_{22}^{ - 1} \left( {{\varvec{w}}_{2} - {\mathbf{N}}_{21} {\varvec{x}}_{1} } \right) $$
(6)
$$ \left( {{\mathbf{N}}_{11,i} - {\mathbf{N}}_{12} {\mathbf{N}}_{22}^{ - 1} {\mathbf{N}}_{21} } \right){\varvec{x}}_{1} = {\varvec{w}}_{1,i } - {\mathbf{N}}_{12,i} {\mathbf{N}}_{22}^{ - 1} {\varvec{w}}_{2} $$
(7)

contribute the next sub-session into the NEQ:

$$ \left[ {\begin{array}{*{20}c} {{\overline{\mathbf{N}}}_{11} } & {{\mathbf{N}}_{13} } \\ {{\mathbf{N}}_{31} } & {{\mathbf{N}}_{33} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\varvec{x}}_{1} } \\ {{\varvec{x}}_{3} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\overline{\user2{w}}_{11} } \\ {{\varvec{w}}_{3} } \\ \end{array} } \right] $$
(8)

with

$$ {\overline{\mathbf{N}}}_{11} = {\mathbf{N}}_{11,i} + {\mathbf{N}}_{11,i + 1} - {\mathbf{N}}_{12} {\mathbf{N}}_{22}^{ - 1} {\mathbf{N}}_{21} $$
(9)
$$ \overline{\user2{w}}_{11} = {\varvec{w}}_{1,i} + {\varvec{w}}_{1,i + 1} - {\mathbf{N}}_{12} {\mathbf{N}}_{22}^{ - 1} {\varvec{w}}_{2} $$
(10)

Then, eliminate \({\varvec{x}}_{3}\) from Eq. (8):

$$ {\varvec{x}}_{3} = {\mathbf{N}}_{33}^{ - 1} \left( {{\varvec{w}}_{3} - {\mathbf{N}}_{31} } \right){\varvec{x}}_{1} $$
(11)
$$ \left( {{\overline{\mathbf{N}}}_{11} - {\mathbf{N}}_{13} {\mathbf{N}}_{33}^{ - 1} {\mathbf{N}}_{31} } \right){\varvec{x}}_{1} = \overline{\user2{w}}_{11} - {\mathbf{N}}_{13} {\mathbf{N}}_{33}^{ - 1} {\varvec{w}}_{3} $$
(12)

After solving Eq. (12), the eliminated parameters can be recovered with Eqs. (6) and (11).

For the parallel processing, \({\varvec{x}}_{2}\) and \({\varvec{x}}_{3}\) are eliminated in parallel. The elimination of \({\varvec{x}}_{2}\) is expressed by Eqs. (6) and (7), while that for \({\varvec{x}}_{3} \) is similar as

$$ {\varvec{x}}_{3} = {\mathbf{N}}_{33}^{ - 1} \left( {{\varvec{w}}_{3} - {\mathbf{N}}_{31} } \right){\varvec{x}}_{1} $$
(13)
$$ \left( {{\mathbf{N}}_{11,i + 1} - {\mathbf{N}}_{13} {\mathbf{N}}_{33}^{ - 1} {\mathbf{N}}_{31} } \right){\varvec{x}}_{1} = {\varvec{w}}_{1,i + 1} - {\mathbf{N}}_{13,i} {\mathbf{N}}_{33}^{ - 1} {\varvec{w}}_{3} $$
(14)

Combining the NEQs of the sub-sessions, i.e., Eqs. (7) and (14) results in the same NEQ of the sequential processing, i.e., Eq. (12). Therefore, both eliminating equations and the final NEQ are the same.

To guarantee the equivalence of epoch-parallel and sequential batch processing, the parameter elimination must be carried out carefully, especially when temporal constraints are involved. First of all, in each sub-session processing, only parameters which are not used in the previous and subsequent sub-sessions can be eliminated, such as ambiguities and epoch-wise clocks. However, for the parameterization of the stochastic process, such as random-walk process (RW), temporal constraints between adjacent parameters should be imposed. These parameters, for example, tropospheric delay parameters, can only be removed after both, all related observations and the corresponding temporal constraints, are added.

It should also be pointed out that to keep the consistency of active ambiguities in the adjacent NEQs, phase windup corrections should be prepared in advance of NEQ generation, so possible integer jumps could be corrected. The phase windup correction is accumulated along with time starting from a fractional cycle at the beginning. Therefore, there could be integer cycle differences from one sub-session to another sub-session, that prevent the connection of ambiguities of continuous data arc.

From the above discussion, epoch-parallel processing strategy divides the long session into short sub-sessions and each sub-session can be processed parallelly using sequential processing software and all the sub-session results can be combined into the final solution. It is obvious that it can take full advantage of multi-cores and multiple computers with only a minor modification of GNSS sequential processing strategies. Hence, it can be easily implemented.

2.3 POD processing strategy

The flowchart of the optimized POD strategy is shown in Fig. 1, including data preparation, data preprocessing, parameter estimation and update, ambiguity resolution, and product generation. In the data preparation, hourly observation and navigation files for specified stations are downloaded from the IGS data centers and recorded from IGS real-time streams; they are merged after preliminary quality control to session-files. The main function of data preprocessing is initializing satellite orbits, generating phase windup correction files, and initial quality control using the single station editing method referred to as TurboEdit (Blewitt 1990), all in a parallel way. The parameter estimation part is optimized by assigning and coordinating sub-session generation, sub-session stacking and solving, orbit update and post-fit residual-based quality control. To continue shortening the computational time, the OpenMP multithreading model is adopted for parameter elimination process during sub-session generation and sub-session stacking. It must be pointed out that the estimation part is repeated several times, typically four iterations, for data cleaning and parameter update and a final iteration for generating the integer ambiguity resolution.

Fig. 1
figure 1

The optimized processing strategy for POD. The rectangle filled with dark orange color means the processes are implemented in a parallel way. Note that the module name in the Positioning And Navigation Data Analyst (PANDA) software is in italic here and in the following sections, e.g., TurboEdit, EdtRes and OI

In the stacking of sub-session NEQs, it is also very important to eliminate deactivated parameters in a timely manner for computational efficiency. Consequently, after any two sub-session NEQs are stacked, the parameters which become deactivated in the subsequent combination should be eliminated immediately. Abide by this rule, there are usually two available approaches, adjacent stacking and sequential stacking. The adjacent stacking approach stacks two adjacent sub-session NEQs whenever available, until all sub-session NEQs are combined. In contrast, the sequential stacking approach stacks the sub-session NEQs one by one and it can be optimized by two parallel stacking processing starting from both ends toward the middle. Taking the stacking of 24 1-h NEQs generated by each sub-session as an example, Fig. 2 illustrates the two different stacking approaches. Besides the 1-h NEQ generation step (step 1), the adjacent stacking approach requires another five steps to obtain the final 24-h NEQ, while the sequential stacking approach needs 12 steps. For the adjacent stacking approach, there are 12 2-h NEQs needed to be generated in step 2, then six 4-h NEQs in step 3, three 8-h NEQs in step 4, one 16-h NEQ and 8-h NEQ in step 5, and a 24-h NEQ in the last step. Assuming that there are sufficient threads available and the parallel combinations take the same time of a single combination, the adjacent stacking should take less time than the sequential stacking at the first glance. However, it is in fact opposite and the computational efficiency of two approaches is evaluated in Sect. 3.2.

Fig. 2
figure 2

Different ways of NEQ stacking. h stands for hour. 1-h NEQ is generated by sequential POD of a sub-session

For this processing, multiple computer nodes should be included, among them one serves as the master or coordinator to distribute tasks, monitor processing status (start and complete, or any disrupt) and check results. The parallel tasks are distributed to the available nodes along with all necessary files for computation. The number of involved nodes is determined by how many sub-session NEQs are divided, for example, 12 nodes are required for epoch-parallel processing of 12 sub-session NEQs. On each node, the proposed strategy can also take advantage of multi-threads, for example, accelerating parameter elimination by OpenMP.

3 Results

The epoch-parallel processing strategy is evaluated in this section. We first introduce the data processing strategy for the experiment and the involved hardware configuration. Subsequently, the feasibility and timeliness of the proposed POD strategy on a cluster are verified in Sect. 3.2 and the benefits of the 30-min updated orbits are presented in Sect. 3.3.

3.1 Data processing strategies

Table 1 shows the CPU types equipped in computer nodes used in this study. The CPU frequency of each node ranges from 3.20 to 4.10 GHz. For a convenient and clear statement, we need to specify that one node here means a single server or computer. All nodes are equipped with SSDs for high-speed data input–output (IO).

Table 1 CPU architecture information

A network of 120 multi-GNSS ground stations are selected for multi-GNSS POD solutions, as shown in Fig. 3. In addition, we select a subnet using 90, 100, or 110 stations within the 120 stations, to further evaluate the POD efficiency and accuracy of different networks. The time span of our experiments ranges from day of year (DOY) 335 to 365, 2021.

Fig. 3
figure 3

Station distribution of the experimental GNSS tracking network with 120 stations

The proposed POD strategy in Sect. 2.2 is implemented on the Positioning And Navigation Data Analyst (PANDA) software package (Liu and Ge 2003), which is capable of multi-GNSS post and real-time processing (Zuo et al. 2021) and recently upgraded to handle multiple space geodetic techniques (Wang et al. 2022). Table 2 gives the processing details, which generally follow the IGS standard. For the solar radiation pressure, a hybrid estimation model with the extended code orbit model (ECOM), which originated from the model proposed by Beutler et al. (1994), and a prior box-wing model is applied. The 5-parameter ECOM is termed as ECOM1 (Springer et al. 1999), while the 7-parameter ECOM is termed as ECOM2 (Arnold et al. 2015). We do not apply integer ambiguity resolution to the GLONASS or BDS GEO satellites. Considering the CPU information, the number of threads for parallel parameter elimination in a node is set to four. In the following experiments, the session length is 24 h, similar to IGS ACs, even though a longer session might bring marginal improvement. The entire session is divided into at most 24 sub-sessions, as the processing of 1-h observations only takes a few seconds. Unless otherwise noted, the experiments in this section are carried out using 100 tracking stations and 120 satellites on the 4.1 GHz node.

Table 2 Strategy and parameters of POD processing

3.2 The timeliness of multi-GNSS POD on multiple-nodes

As the length of a sub-session becomes shorter, more NEQs are generated and the NEQ stacking tends to be more time-consuming. But if a sub-session is too long, the benefit of epoch-parallel is less significant. Therefore, the balance between generating and stacking of sub-sessions needs to be further investigated. Note that the number of nodes mentioned in the following equals the number of sub-sessions as there is only one process of sub-session NEQ generation running on a node, e.g., the number of four nodes means that a 24-h session is split into four 6-h sub-sessions.

Taking into consideration all the optimized strategies mentioned in Sect. 2.3, along with parallel parameter elimination utilizing OpenMP, the computational efficiency on different numbers of sub-sessions is shown in Fig. 4a. With the assistance of parallel processing of TurboEdit, OI, and EdtRes, the sequential batch POD strategy with and without parallel parameter elimination costs 54 min and 73 min, respectively. Compared with sequential batch solutions, all epoch-parallel cases have the priority in computational efficiency. When there are more than six nodes (six sub-sessions) available, the computational time of the new POD processing strategy is less than 30 min, while the improvement of using 12 nodes reaches up to 49% compared to using one node. The increased time using 24 nodes is caused by the stacking of too many sub-sessions coupled with many active ambiguities. Even for the two nodes case, the computational efficiency is improved by 25%. Obviously, the most time-consuming part in a POD is the process of parameter estimation. Figure 4b illustrates the computational time of one iteration of parameter estimation, including NEQ generation and NEQ stacking, using different numbers of nodes. If four or more nodes are used, then one iteration of parameter estimation takes only around 5 min and the difference caused by more nodes is less than 1 min. The computational efficiency on two nodes is reduced by more than 2 min compared with the sequential POD method.

Fig. 4
figure 4

Computational efficiency of POD with various numbers of nodes over the period from December 1 to 7, 2021, using 100 stations on a 4.1 GHz node. a The running time of an entire POD (total), where “Preprocessing” stands for processing steps before parameter estimation, “Estimation(float)” for four iterations of parameter estimation without ambiguity resolution; “Estimation(fixed)” for one time of parameter estimation with ambiguity resolution. b The running time of one iteration of parameter estimation including the generation of NEQs (yellow) and NEQ stacking (green). Sequential stacking method is selected here. The number of the sub-sessions is identical with the number of nodes, so that each node processes only one session for NEQ generation. Note that “1(np)” stands for using one node but non-parallel parameter elimination

When the number of sub-sessions increases, the time-consumption of sub-session stacking becomes more prominent. As aforesaid, there are mainly two ways of stacking NEQs: sequential stacking and adjacent stacking. Their time-consumings are presented in Fig. 5. The step of stacking two sub-sessions only costs around 17 s. When there are more than two sub-sessions, the stacking time increases sharply. In case of six or more sub-sessions, sequential stacking is more efficient, e.g., 67 s is saved in the situation of stacking 24 sub-sessions, even though the sequential stacking requires more stacking steps. The reason is that in the sequential stacking only the active ambiguities of one side (either starting or ending part of the sub-session) are kept in NEQ, while in adjacent stacking, many parameters which are also active in the adjacent two sub-session NEQs must be kept for further stacking, leading to a larger dimension of the NEQ and more time-consuming in parameter elimination. When the number of sub-sessions is no more than four, the time-consuming of two methods are the same as they perform identically. Although both three sub-sessions and four sub-sessions need two times stacking, the latter costs much more time, as the additional third and fourth NEQ stacking spends more time than the first and second NEQ stacking. Therefore, sequential stacking is adopted in the following POD results. It is worth mentioning that the sequential stacking can be easily performed on a single node while the benefit of using two nodes is marginal.

Fig. 5
figure 5

Computational efficiency for different stacking methods

Apart from the NEQ generation and stacking, multi-nodes are profitable for the other steps, e.g., preprocessing of GNSS observations (TurboEdit), orbit integration (OI), and posterior residual-based quality control (EdtRes), which can be easily realized station-wise and/or satellite-wise. As shown in Fig. 6, the computational time of TurboEdit, OI, and EdtRes decreases gradually along with the increasing of the node number. When three or more nodes are involved, TurboEdit of 100 stations requires less than 1 min. However, with more nodes involved, the improvement of computational efficiency for OI and EdtRes decreases progressively as each program needs a little time to process single station or satellite, e.g., files reading which cannot be omitted. Provided that there are 100 stations and 120 satellites, four nodes equipped with eight threads processing at most four times of OI and EdtRes, costs around merely 10 s. The improvement of the available nodes more than four is less than 5 s.

Fig. 6
figure 6

Computational efficiency of TurboEdit, OI, and EdtRes with various nodes. “1(seq)” in the horizontal axis stands for sequential processing without station or satellite parallel

The above comparison and analysis are derived from processing 100 tracking stations based on the 4.1 GHZ CPU nodes. The computational efficiency of the new strategy using different types of CPU is further shown in Fig. 7. The process of POD is usually faster with higher CPU performance. Compared with the results based on nodes equipped with 3.2 GHZ CPU, the time-consuming on the 4.1 GHz CPU of three solutions, including four, two, and one sub-sessions is down by 37%, 33%, and 29%, respectively. Consistent with the above conclusion, using four nodes is always faster than using two, especially for larger number of stations. In general, the new strategy can secure a processing time within one hour except for few special cases with too many stations and/or outdated CPU such as the case of processing 120 stations on 3.2 GHz node, whereas the sequential strategy has to reduce the number of stations and on a high-performance node to satisfy the 1-h requirement, such as the case using 90 stations, and that using 100 stations on 4.1 GHz node. More important is that the half-hourly update can be achieved using the new strategy, for instance, using 90 stations on a 3.8 GHz or 4.1 GHz node.

Fig. 7
figure 7

Computational efficiency with different number of nodes and stations over the period from December 1 to 7, 2021. More information of the different types of nodes is provided in Table 1. One session means a sequential batch POD solution

3.3 Assessment of multi-GNSS orbits

According to the timeliness analysis in Sect. 3.2, the epoch-parallel strategy with around 90 stations and 120 satellites can realize half-hourly orbit update, while for the sequential batch processing the computation costs nearly 50 min. Currently, most IGS ACs select around 100 stations to realize the routine ultra-rapid orbit solution with a latency between one and three hours, whereas our proposed strategy can increase the number of tracking stations to 120 to achieve hourly orbits. Therefore, four solutions over a period of one month in December, 2021 are designed to evaluate the performance of epoch-parallel processing strategy, as shown in Table 3. The EP_090_030 and EP_120_060 solutions are based on the epoch-parallel strategy, updated in 30 and 60 min, respectively. The other two solutions based on the sequential batch strategy serve as a reference. Note that the sequential batch and epoch-parallel strategy are strictly equivalent, with the only difference being in processing time. Hence, the major difference between SB_090_060 and EP_120_060 is the number of stations included in processing. The SB_090_060 solution indicates that 90 stations can be processed in one hour, while 120 stations can be processed at the same time in EP_120_060 solution.

Table 3 Four cases of ultra-rapid POD using different POD strategies

For real-time positioning purpose, only the predicted orbits are available for users, therefore it is of great importance. However, here both estimated and predicted orbits are evaluated. Figure 8 shows the processing and update scenarios of ultra-rapid orbits.

Fig. 8
figure 8

Update latency for ultra-rapid orbits

Let the processing of session one start at \(t_{1}\), the session time of estimated part is \(\left( {t_{1} - 24^{h} ,t_{1} } \right)\), the processing period is \({\Delta }t_{c}\), then the orbits are available from \(t_{1} + {\Delta }t_{u}\). \({\Delta }t_{u}\) could be different from \({\Delta }t_{c}\) as it is determined by product provider. Afterward, the session two starts at \(t_{1} + {\Delta }t_{u}\), in which \({\Delta }t_{u}\) is the orbit update interval, the orbits are available from \(t_{1} + 2{\Delta }t_{u}\). From this epoch, the predicted orbits from session one will be replaced by that of session two. The inconsistency at switch epoch of the two consecutive predicted orbits are the boundary discontinuities, that are investigated. The current used predicted orbits can also be compared with the later available estimated orbits for quality validation, i.e., the predicted part of session one with the estimated part of session three, termed as orbit overlap. Compared with session three, orbit overlap in session one starts from \(t_{1} + {\Delta }t_{u}\) to \(t_{1} + 2{\Delta }t_{u}\), which is represented as shade part in Fig. 8.

The estimated orbits are first evaluated and compared with the IGS Final products and Rapid products provided by GFZ, known as GBM, is shown in Fig. 9 and Table 4. The orbit accuracy for GPS, GLONASS, Galileo, and BDS MEO satellites in terms of averaged 1D RMS is 1.4, 4.7, 2.2, and 3.7 cm, respectively, while for BDS GEO and IGSO satellites the value is 117 cm and 11.6 cm, respectively. Due to the uneven tracking stations and insufficient force models, e.g., solar radiation pressure, the performance of BDS is inferior to that of GPS and Galileo, especially for BDS GEO and IGSO satellites (Zhao et al. 2022). As expected, there is almost no difference between the epoch-parallel and traditional batch solutions.

Fig. 9
figure 9

RMS values of estimated part for different solutions. GPS orbits are compared with the IGS Final products, while the other constellations are compared with the GBM Rapid products. Note the different y-axis scales between different panels

The averaged RMS values of orbit overlaps are provided in Fig. 10 and Table 5. The averaged 1D RMS values in the EP_090_030 solution for the GPS, GLONASS, Galileo, BDS GEO, BDS IGSO, and BDS MEO satellites are 1.3, 2.3, 1.8, 19.7, 11.4, and 2.8 cm, respectively, which is better than the SB_090_060 solution by 39%, 35%, 41%, 47%, 47%, and 37%. After the number of tracking stations increases to 120, although a slight improvement is observed with respect to the SB_090_060 solution, in which BDS IGSO and MEO satellites show the largest improvement, increased by 6% and 9%, respectively, adding more stations to the network with 90 stations is not critical. When the multi-GNSS orbits are updated every two hours, a significant accuracy reduction is observed. Compared with the EP_120_060 solution, the averaged 1D RMS values in the SB_120_120 solution for GPS, GLONASS Galileo, BDS GEO, BDS IGSO, and BDS MEO satellites are decreased by 33%, 35%, 37%, 52%, 49%, and 36%, respectively. For MEO satellites, the along component exhibits the worst accuracy among the three directions and drops rapidly as the update latency becomes longer. In addition, the radial component for BDS GEO and IGSO satellites also shows a notable deterioration from the 30-min updated solution to the 60-min and 120-min updated ones, decreased by 73% and 66%, respectively.

Fig. 10
figure 10

RMS values of orbit overlaps for different solutions. Note the different y-axis scales between different panels

Figure 11 shows the distribution of 1D RMS values of orbit overlaps for different constellations. For solution EP_090_030, around 80%, 54%, 67%, and 40% of the overlap RMS values are within 2 cm for GPS, GLONASS, Galileo, and BDS MEO, respectively, which is better than the SB_090_060 solution by 26%, 19%, 27%, and 20%, respectively. Similar to the orbit results compared with GBM, the BDS MEO satellites show relatively poor performance. Despite the overall poor overlaps of BDS GEO and IGSO satellites, 79% and 89% of the overlap RMS values are within 20 cm in the EP_090_030 solution, respectively, whereas the corresponding value in the SB_090_060 solution is 49% and 68%. Comparing the SB_090_060 and EP_120_060 solutions, the station number does not contribute significantly to the orbit overlap for GPS and Galileo constellations, whereas for GLONASS and BDS satellites a slight improvement of 2% is observed. When the orbit prediction time is extended to two hours, i.e., solution SB_120_120, the overlap RMS values are much worse, and only 32%, 18%, 20%, and 9% of the values are within 2 cm for GPS, GLONASS, Galileo, and BDS MEO satellites, respectively. The four solutions demonstrate the necessity and benefits of using low-latency ultra-rapid orbits.

Fig. 11
figure 11

Distribution of the 1D RMS values of orbit overlaps for different solutions. Note that the dashed orange lines are covered by the green line in most subplots, except for the last plot in the bottom-right corner. The x-axis scales of MEO and GEO/IGSO satellites are different in the lower panel

The positioning performance could be affected by the update of the GNSS ultra-rapid orbits, i.e., switching from one to the next orbits inevitably leads to residual fluctuations in positioning solutions due to the discontinuities of two consecutive sessions. Hence, orbit discontinuity is a key indicator for evaluating orbit quality. Figure 12 and Table 6 show the averaged 1D RMS values of orbit discontinuities. By comparing SB_090_060 and EP_120_060 solutions, increasing the number of stations from 90 to 120 only has small improvements for the discontinuities of BDS satellites. For the satellite orbits updated per 30 min, i.e., solution EP_090_030, the averaged 1D RMS values for GPS, GLONASS, Galileo, BDS GEO, BDS IGSO, and BDS MEO satellites are 1.1, 1.9, 1.6, 15.5, 9.8, and 5.6 cm, respectively. When the orbit update is extended to one hour, the averaged RMS values for all satellites decrease by a factor of 1.8 to 2.3. The accuracy of orbits updated every two hours, i.e., solution SB_120_120 is even three to six times worse than the orbits update in 30 min, especially for BDS GEO and IGSO satellites, decreased by a factor of around five. In terms of the 95th percentile, the orbit discontinuities in EP_090_030 solution for GPS, GLONASS, Galileo, BDS GEO, BDS IGSO, and BDS MEO satellites are 2.2, 3.7, 3.1, 31.9, 20.1, and 5.1 cm, respectively, which show an obvious drop if the update latency is lengthened, from 30 min to 60 min and from 60 min to 120 min. Both the orbit overlap and the discontinuity show that satellite orbits with half-hourly update perform best among the four solutions.

Fig. 12
figure 12

Averaged 1D RMS (solid bar) values of orbit discontinuities for different solutions. Dashed box stands for the 95th percentile. Note the different y-axis scales in different panels

Taking the IGS Final products and GBM products, the statistics of user-available part of predicted orbits are shown in Fig. 13 and Table 7. The orbits in EP_090_030 solution show the best performance among four solutions. The average 1D RMS values are 2.7 cm, 6.5 cm, 3.7 cm, 129.2 cm, 26.7 cm, and 6.5 cm for GPS, GLONASS, Galileo, BDS GEO, BDS IGSO, and BDS MEO satellites, respectively. When the orbit update interval gets longer, the accuracy of predicted orbits decreases gradually, especially in the along component. Compared to the EP_090_030 solution, the along component of Galileo and BDS IGSO in the SB_120_120 solution is decreased by a factor of two and three, respectively. The orbit difference between the SB_090_060 and EP_120_060 solution is negligible, less than 1%. Meanwhile, the accuracy decrease in the other two components is slightly smaller, except for the radial component of BDS GEO and IGSO satellites.

Fig. 13
figure 13

RMS values of user-available part for different solutions. GPS orbits are compared with the IGS Final products, while the other constellations are compared with the GBM Rapid products. Note the different y-axis scales between different panels

4 Conclusions and outlook

In this contribution, an epoch-parallel processing strategy for multi-GNSS observations is proposed, which is suitable for low-latency multi-GNSS POD. The proposed method is rigorously equivalent to the sequential batch processing strategy. Compared with the sequential batch processing, the overall computation time is improved by 25% to 49% with more than two nodes utilized. However, the benefits of using more nodes are less significant if more than four nodes are used. With optimized epoch-parallel POD strategy, the ultra-rapid orbits can be updated within 30 min for a global network consisting of 90 tracking stations and one hour for a global network consisting of 120 tracking stations.

In addition to the orbit computational efficiency, the orbit accuracy of user-available part is improved visibly as well. The 30-min updated orbit is better than the 60-min updated ones by half of the corresponding RMS, and the 120-min updated ones with only a quarter of the RMS value. The averaged 1D RMS of orbit overlaps of the 30-min orbits for the GPS, GLONASS, Galileo, BDS GEO, BDS IGSO, and BDS MEO satellites is 1.3, 2.3, 1.8, 19.7, 11.4, and 2.8 cm, respectively, and the corresponding values when compared with GBM products is 1.9, 6.2, 2.2, 144, 11.4, and 3.7 cm. Increasing the number of tracking stations only improves slightly to BDS satellites.

In summary, our results show that the proposed strategy can shorten orbit update time effectively without destroying the consistency of all estimated parameters. Hence, the accuracy of predicted orbit can be improved. In future, we will further optimize the POD strategies and explore the benefits of half-hourly updated orbits in real-time clock estimation, precise point positioning (Tang et al. 2023), and atmospheric sounding.