1 Introduction

The ionosphere is the upper part of the Earth’s atmosphere, at altitudes between about 40 and 2000 km. It is ionized by solar radiation, which means that free electrons and positive ions are present in this region (Schaer 1999). In addition to playing an important role in the Earth’s magnetic field, the ionosphere influences the propagation of radio signals from satellites to the Earth.

The ionospheric delay is one of the major contributors to the error budget of GNSS (Global Navigation Satellite System) measurements. The effect depends on the number of free electrons along the signal path through the ionosphere, expressed as the Total Electron Content (TEC). With multi-frequency receivers, the first-order delay can be removed by the so-called ionosphere-free linear combination (Hofmann-Wellenhof et al. 2008). However, the majority of devices still operate on a single frequency, so ionosphere modelling is crucial to mitigate the effect.

One of the earliest and simplest models is the Klobuchar model (Klobuchar 1987), whose parameters are broadcast by the GPS satellites in the navigation message. In general, the Klobuchar model can describe only part of the ionospheric effect (Feess and Stephens 1987; Filjar et al. 2009), and in extreme conditions, for instance during ionospheric storms, its accuracy degrades even further (Bergeot et al. 2010; Gordienko et al. 2005; Hu et al. 2005). The most recent satellite system, the European Galileo, broadcasts its own ionosphere model, NeQuick (Angrisano et al. 2013). A thorough investigation and comparison of the Klobuchar and NeQuick models (Farah 2008) concluded that, in general, NeQuick performs better when the ionosphere is stable but slightly worse when the ionosphere is more variable or close to maximum TEC values.

High-quality TEC maps can be derived from the measurements of permanent stations through post-processing (Schaer 1999). Spherical harmonic and B-spline based TEC models are suitable for both global and regional representation, while polynomial based models are limited to regional modelling. These models describe the TEC of the ionosphere as a function of time and space (Hernández-Pajares et al. 2009). The commonly accepted exchange format is IONEX (Schaer et al. 1998), which represents 2- and 3-dimensional TEC maps on a geographic grid. Thanks to the rapid development of computer processing, global and regional TEC maps can now be provided in near-real-time (Coster et al. 1992; Bergeot et al. 2014; Erdogan et al. 2017; Chang et al. 2019; Renga et al. 2018).

Gaussian process regression (GPR) is well known in the machine learning community and has proved its utility in various domains. It is also worth mentioning that GPR and the popular Kriging method of geostatistics share the same mathematical background (Stein 1999). Data from a single station in India were used to forecast the ionospheric delay with GPR, with a reported accuracy of almost 100% (Lakshmi et al. 2020). Another GPR approach was used to predict daily TEC values from the TEC values recorded at various permanent GNSS stations in Turkey (Inyurt et al. 2020). Ackermann et al. (2011) compared Gaussian process (GP) and neural network models over a test area in South Africa. The GP framework showed several advantages over competing modelling strategies, such as powerful and convenient ways of incorporating prior knowledge and requiring less training data than neural networks. Another recent study shows that GPs can enhance positioning performance through improved real-time TEC estimation (Lin et al. 2019).

Fig. 1 RIMS positions across the European region

Safety-critical applications demand high reliability in addition to accuracy, and augmentation services are used to meet these stringent requirements. For the European region, EGNOS (European Geostationary Navigation Overlay Service) provides real-time corrections derived from the measurements of the RIMS (Ranging and Integrity Monitoring Stations) (Ventura-Traveset and Flament 2007). The stations form a sparse network across the European region (Fig. 1). The very first initiative to monitor the accuracy and integrity achievable with EGNOS was the EGNOS Data Collection Network (EDCN) project (Soley et al. 2004). EGNOS performance can be slightly degraded at the boundary of the coverage area because of the lack of reliable ionosphere modelling over the periphery (Grunwald et al. 2016; Lupsic and Takács 2019).

This paper presents a novel GPR-based ionospheric model for real-time TEC map generation from the RIMS data available in the European region. The accuracy of the developed model is investigated by comparing it with widely used global and regional reference models derived from raw GNSS observations.

1.1 Algorithm outline

Fig. 2 Algorithm outline

A brief overview of the developed algorithm is presented in Fig. 2, and the mathematical details of each block are given in the following sections. A regional receiver network, in our case the EGNOS RIMS network, provides code and phase measurements on two frequencies. Combining these measurements cancels the geometry dependency, so the result directly reflects the state of the ionosphere together with the combined instrumental delays of the receiver and the tracked satellite. This is the so-called \(L_4\) combination, and the rigorous mathematical description of its calculation is presented in Sect. 2.1. If the instrumental delays are known, the slant ionospheric delays can be calculated from the \(L_4\) observation. A single layer model converts the slant ionospheric delay to a vertical ionospheric delay and determines the location of the ionospheric pierce point (IPP) (Sect. 2.1). The vertical ionospheric delays with their corresponding locations are the input to the Gaussian process regression model described in Sects. 2.2 and 2.3. The GPR model creates the Total Electron Content map and updates it in real time. Because the algorithm is designed to run in real time, the instrumental delays must be estimated and monitored continuously in parallel with the GPR. For that purpose, a polynomial model was used to describe the vertical total electron content (VTEC), and the data were processed iteratively with a Kalman filter; Sect. 2.4 presents this part of the algorithm. Any other near-real-time estimation method that provides adequate accuracy for the instrumental delays could be used instead.

2 Mathematical background

2.1 Ionospheric TEC measurements from GNSS

The ionospheric refraction is frequency-dependent; therefore, with a dual-frequency receiver, this delay can be eliminated to first order by the ionosphere-free combination. Conversely, the geometry-free combination removes all frequency-independent terms and retains the ionospheric delay, so it can be used to extract information about the state of the ionosphere (Ciraolo et al. 2007). The combination can be formed from either the pseudorange or the phase measurements. The noise of the phase observation is typically two to three orders of magnitude lower than that of the code, but the phase suffers from an unknown initial ambiguity. This ambiguity can be resolved, for example, by the quasi-ionosphere-free (QIF) approach (Teunissen 1995; Zhang et al. 2016).
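Both combinations can be illustrated numerically. The minimal Python sketch below assumes the GPS L1/L2 frequencies as the signal pair and synthetic, noise-free code observations; the helper names `iono_delay`, `iono_free` and `geometry_free` are hypothetical. It uses the first-order delay \(40.3\,STEC/f^2\) of Eq. (2) and checks that the ionosphere-free combination recovers the frequency-independent range, while the geometry-free combination keeps only the ionospheric term, equal to \((1-f_1^2/f_2^2)\) times the L1 delay (the constant \(\alpha\) of Eq. (5)).

```python
# Illustrative dual-frequency pair: GPS L1 and L2 carrier frequencies [Hz]
F1 = 1575.42e6
F2 = 1227.60e6
TECU = 1e16  # 1 TEC unit [electrons/m^2]


def iono_delay(stec, freq):
    """First-order ionospheric group delay [m] for a given STEC [el/m^2], Eq. (2)."""
    return 40.3 * stec / freq**2


def iono_free(p1, p2, f1=F1, f2=F2):
    """Ionosphere-free code combination: cancels the first-order ionospheric term."""
    return (f1**2 * p1 - f2**2 * p2) / (f1**2 - f2**2)


def geometry_free(p1, p2):
    """Geometry-free code combination P1 - P2: range, clocks and troposphere cancel."""
    return p1 - p2


# Synthetic code observations sharing the same frequency-independent part
rho = 22_000_000.0  # range + clocks + troposphere [m], arbitrary example value
stec = 50 * TECU
p1 = rho + iono_delay(stec, F1)
p2 = rho + iono_delay(stec, F2)

alpha = 1.0 - F1**2 / F2**2
print(iono_free(p1, p2) - rho)                              # close to 0: ionosphere removed
print(geometry_free(p1, p2), alpha * iono_delay(stec, F1))  # both equal alpha * I_1
print(iono_delay(TECU, F1))                                 # L1 delay per TECU [m]
```

With these frequencies, one TECU corresponds to roughly 0.16 m of delay on L1, and the geometry-free value is negative because the L2 delay exceeds the L1 delay.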

The ionospheric delay depends on the frequency and the total electron content along the signal path between satellite and receiver.

$$\begin{aligned} STEC = \int _{r}^{s}N_e(l)dl, \end{aligned}$$
(1)

where \(N_e\) is the electron density in \(el/m^3\), which varies along the ray path, so that STEC is expressed in \(el/m^2\). The vertical total electron content (VTEC) is defined in the same way as Eq. (1), but with a vertical integration path.
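As a numerical illustration of Eq. (1), the sketch below approximates the path integral with the trapezoidal rule. The electron density profile is a hypothetical \(\alpha\)-Chapman layer (peak density \(10^{12}\,el/m^3\) at 350 km, scale height 60 km) chosen only for illustration, and the path is vertical, so the result is a VTEC; for a slant path the same function would be applied to densities sampled along the line of sight.

```python
import numpy as np


def integrate_tec(ne, path_m):
    """Trapezoidal approximation of Eq. (1): integral of N_e [el/m^3] along the path [m]."""
    ne = np.asarray(ne, dtype=float)
    s = np.asarray(path_m, dtype=float)
    return float(np.sum(0.5 * (ne[1:] + ne[:-1]) * np.diff(s)))


def chapman_profile(h_km, n_max=1e12, h_max_km=350.0, scale_km=60.0):
    """Hypothetical alpha-Chapman layer, used only as an example density profile."""
    zr = (h_km - h_max_km) / scale_km
    return n_max * np.exp(0.5 * (1.0 - zr - np.exp(-zr)))


h_km = np.linspace(60.0, 2000.0, 5000)  # vertical path between 60 and 2000 km
vtec = integrate_tec(chapman_profile(h_km), h_km * 1e3)
print(f"VTEC = {vtec / 1e16:.1f} TECU")
```

For this profile the integral over all heights is \(\sqrt{2\pi e}\,N_{max}H_s \approx 24.8\) TECU, which the numerical result approaches.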

$$\begin{aligned} I_{r,j}^s=\frac{40.3 \cdot STEC}{f_j^2}=\frac{f_i^2}{f_j^2}I_{r,i}^s \end{aligned}$$
(2)

Eq. (2) converts the ionospheric delay from one frequency to another, where j is the target and i the reference frequency, and \(I_{r,i}^s\) is the delay on the reference signal. The code and phase observations can be split into frequency-dependent and frequency-independent parts.

$$\begin{aligned} R_{r,j}^s= & {} \rho _r^s+c(dt_r-dt^s)+T_r^s+I_{r,j}^s+\xi _{r,j}^s+d_{r,j}+d_j^s+\varepsilon _{r,j}^s \end{aligned}$$
(3)
$$\begin{aligned} \varPhi _{r,j}^s= & {} \rho _r^s+c(dt_r-dt^s)+T_r^s-I_{r,j}^s+\zeta _{r,j}^s+\delta _{r,j}+\delta _j^s+\lambda _jN_{r,j}^s+\epsilon _{r,j}^s, \end{aligned}$$
(4)

where j indicates the carrier frequency, \(R_{r,j}^s, \varPhi _{r,j}^s\) are the code and phase measurements respectively, \(\rho _r^s\) is the geometric distance between receiver r and satellite s, \(dt_r\) is the receiver clock bias, \(dt^s\) is the satellite clock bias, \(T_r^s\) is the tropospheric delay, \(I_{r,j}^s\) is the ionospheric delay on frequency j, \(d_{r,j}, \delta _{r,j}\) are the receiver hardware biases on frequency j for the code and phase observations respectively, \(d_j^s, \delta _j^s\) are the satellite hardware biases on frequency j for the code and phase measurements respectively, \(\varepsilon _{r,j}^s, \epsilon _{r,j}^s\) denote stochastic Gaussian noise, \(N_{r,j}^s\) is the integer phase ambiguity, \(\lambda _j\) is the wavelength of frequency j, and \(\xi _{r,j}^s,\zeta _{r,j}^s\) denote the code and phase multipath. Differencing the code measurements on frequencies 1 and 2 (and, analogously, the phase measurements) yields a geometry-free combination:

$$\begin{aligned} R_{r,I}^s =R_{r,1}^s-R_{r,2}^s=\alpha I_{r,1}^s+b_r+b^s+M_{r,P}^s+\varepsilon _{r,P}^s \end{aligned}$$
(5)
$$\begin{aligned} \alpha&=\left( 1-\frac{f_1^2}{f_2^2}\right) ,\\ b_r&=d_{r,1}-d_{r,2},\\ b^s&=d_{1}^s-d_{2}^s,\\ \varepsilon _{r,P}^s&=\varepsilon _{r,1}^s-\varepsilon _{r,2}^s,\\ M_{r,P}^s&=\xi _{r,1}^s-\xi _{r,2}^s, \end{aligned}$$

where \(\alpha\) is a frequency-dependent constant, \(b_r, b^s\) are the receiver and satellite differential code biases (DCBs), and \(\varepsilon _{r,P}^s, M_{r,P}^s\) are the combined noise and multipath delay, respectively. The same combination can be created from the phase measurements as well,

$$\begin{aligned} \varPhi _{r,I}^s=\varPhi _{r,2}^s-\varPhi _{r,1}^s=\alpha I_{r,1}^s+B_r+B^s+M_{r,\varPhi }^s+C_{arc,r}^s+\epsilon _{r,\varPhi }^s, \end{aligned}$$
(6)
$$\begin{aligned} C_{arc,r}^s&=\lambda _2N_{r,2}^s-\lambda _1N_{r,1}^s,\\ B_r&=\delta _{r,2}-\delta _{r,1},\\ B^s&=\delta _2^s-\delta _1^s,\\ \epsilon _{r,\varPhi }^s&=\epsilon _{r,2}^s-\epsilon _{r,1}^s,\\ M_{r,\varPhi }^s&=\zeta _{r,2}^s-\zeta _{r,1}^s,\\ \end{aligned}$$

where \(C_{arc,r}^s\) is the ambiguity bias, \(B_r,B^s\) are the receiver and satellite interfrequency biases (IFB), and \(M_{r,\varPhi }^s,\epsilon _{r,\varPhi }^s\) are the combined multipath delay and noise. The significantly lower noise level of the phase observation can be exploited in the \(L_4\) combination. The carrier phase bias \(CPB_r^s\) is the offset between the \(\varPhi _{r,I}^s\) and \(R_{r,I}^s\) combinations, and it can be estimated by averaging the differences between the geometry-free phase and code combinations over a continuous, cycle-slip-free arc:

$$\begin{aligned} CPB_r^s \approx \frac{1}{N}\sum _{n=1}^{N}(\varPhi _{r,I}^s-R_{r,I}^s)_n, \end{aligned}$$
(7)

where N is the number of observations in the continuously observed arc. The levelled geometry-free combination \(L_{r,4}^s\) for a given arc is

$$\begin{aligned} L_{r,4}^s = \varPhi _{r,I}^s-CPB_r^s = \alpha STEC + b_r +b^s+\epsilon _{L_4} \end{aligned}$$
(8)
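A minimal sketch of the levelling in Eqs. (7) and (8) for a single cycle-slip-free arc is given below. The synthetic arc (a slowly varying ionospheric signal, a constant code bias, a constant phase offset standing in for the ambiguity and phase biases, and Gaussian noise with 0.5 m code and 5 mm phase standard deviation) is an assumption made for illustration, and `level_arc` is a hypothetical name.

```python
import numpy as np


def level_arc(phi_gf, code_gf):
    """Level one continuous arc, Eqs. (7)-(8).

    phi_gf:  geometry-free phase combination Phi_I over the arc [m]
    code_gf: geometry-free code combination R_I at the same epochs [m]
    Returns the levelled L4 series and the carrier phase bias CPB.
    """
    phi_gf = np.asarray(phi_gf, dtype=float)
    code_gf = np.asarray(code_gf, dtype=float)
    cpb = float(np.mean(phi_gf - code_gf))  # Eq. (7): mean phase-code offset
    return phi_gf - cpb, cpb                # Eq. (8): phase shifted to the code level


rng = np.random.default_rng(0)
k = np.arange(600)
iono = -5.0 + 0.002 * k                 # alpha * STEC term [m], synthetic
code_bias = 1.5                         # b_r + b^s [m], synthetic
code = iono + code_bias + rng.normal(0.0, 0.5, k.size)
phase = iono + code_bias + 12.3 + rng.normal(0.0, 0.005, k.size)  # 12.3 m: ambiguity-type offset

l4, cpb = level_arc(phase, code)
print(cpb, np.std(l4 - (iono + code_bias)))
```

The levelled series keeps the millimetre-level phase noise, while its absolute level inherits the code biases and the averaged code noise.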

The levelling method is not well suited to near-real-time ionosphere estimation (Xiang et al. 2017), because the carrier phase bias in Eq. (7) is an average over the complete arc and is therefore only available once the arc has ended. Instead, Hatch-filter smoothing was applied. The smoothed geometry-free code \({\hat{R}}_{r,I}^s(k)\) at epoch k is calculated as:

$$\begin{aligned} \begin{aligned} {\hat{R}}_{r,I}^s(k)=\frac{1}{n}R_{r,I}^s(k)+\frac{n-1}{n}\left[ {\hat{R}}_{r,I}^s(k-1)+\varPhi _{r,I}^s(k)-\varPhi _{r,I}^s(k-1) \right] , \end{aligned} \end{aligned}$$
(9)

where \(n=k\) when \(k<N\) and \(n=N\) when \(k \ge N\). In this paper N was set to 50. To prevent the not yet converged smoothed code \({\hat{R}}_{r,I}^s(k)\) from introducing a significant bias into the estimation, the first 10 samples of each new arc were not used for TEC modelling. Cycle-slip detection is essential to avoid abrupt biases in the \(L_4\) smoothing; when a cycle slip is detected, the Hatch filter is reset.
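The recursion of Eq. (9), including the reset on cycle slips and the exclusion of the first samples of each arc, can be sketched as follows. The cycle-slip flags are assumed to come from a separate detector that is not shown, and `hatch_smooth` is a hypothetical name; N = 50 and the 10-sample warm-up follow the values given above.

```python
import numpy as np


def hatch_smooth(code_gf, phi_gf, window=50, slips=None, warmup=10):
    """Hatch filter of Eq. (9) applied to the geometry-free code combination.

    slips:  optional boolean array, True where a cycle slip was detected
            before epoch k (the filter restarts at that epoch)
    Returns the smoothed code, the sample counter n and a mask of epochs
    usable for TEC modelling (counter beyond the warm-up).
    """
    code_gf = np.asarray(code_gf, dtype=float)
    phi_gf = np.asarray(phi_gf, dtype=float)
    if slips is None:
        slips = np.zeros(code_gf.size, dtype=bool)
    smoothed = np.empty_like(code_gf)
    counter = np.empty(code_gf.size, dtype=int)
    n = 0
    for k in range(code_gf.size):
        if k == 0 or slips[k]:
            n = 1                       # new arc: start from the raw code
            smoothed[k] = code_gf[k]
        else:
            n = min(n + 1, window)      # n = k for k < N, n = N afterwards
            predicted = smoothed[k - 1] + phi_gf[k] - phi_gf[k - 1]
            smoothed[k] = code_gf[k] / n + (n - 1) / n * predicted
        counter[k] = n
    return smoothed, counter, counter > warmup
```

The phase difference carries the epoch-to-epoch change of the ionosphere, so the code noise is averaged down without lagging the signal; with N = 50, the steady-state noise reduction is substantial while a restart after a slip needs only a short re-convergence.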

Once \(L_4\) is calculated and smoothed, it is directly linked to the corresponding STEC value and to the satellite and receiver biases:

$$\begin{aligned} L_{r,4}^s(t) = \alpha STEC(t) + b_r(t) +b^s(t)+\epsilon _{L_4}, \end{aligned}$$
(10)

where \(\alpha\), \(b_r\) and \(b^s\) are the same terms as in Eq. (5), and \(\epsilon _{L_4}\) is the noise of the smoothed \(L_{r,4}^s\) measurement. The vertical total electron content is one of the main properties of the ionosphere. One of the most widely used and simplest methods to relate slant and vertical TEC is the single layer model (SLM):

$$\begin{aligned} STEC(t) = m(z)VTEC(\lambda ,\phi ,t), \end{aligned}$$
(11)

where STEC is the total electron content between a receiver and a satellite and m(z) is the mapping function, which depends on the zenith angle z. Fig. 3 depicts the concept of the SLM, in which the vertical extent of the ionosphere is reduced to a single thin layer. The slant total electron content is assigned to the ionospheric pierce point, where the line of sight crosses the single layer. The mapping function m(z) converts STEC to VTEC, so that measurements along different lines of sight become comparable. Each IPP is characterised by the latitude and longitude of its sub-ionospheric point together with its VTEC value. Substituting Eq. (11) into Eq. (10) yields:

$$\begin{aligned} L_{r,4}^s(t) = \alpha m(z)VTEC(\lambda ,\phi ,t) + b_r(t) +b^s(t)+\epsilon _{L_4}, \end{aligned}$$
(12)

Fig. 3 Single layer ionospheric model (SLM)

If \(b_r\) and \(b^s\) are known, the VTEC value can be calculated by rearranging Eq. (12):

$$\begin{aligned} VTEC(\lambda ,\phi ,t) = \frac{L_{r,4}^s(t) - b^s(t)-b_r(t) -\epsilon _{L_4}}{\alpha m(z)} \end{aligned}$$
(13)
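Eq. (13) can be evaluated as in the following sketch, which drops the unknown noise term. It assumes that \(L_4\) and the DCBs are expressed in metres on the GPS L1/L2 pair, so that the product \(\alpha \, STEC\) in Eq. (12) is read as \(\alpha \cdot 40.3\,STEC/f_1^2\) (the L1 delay of Eq. (2) scaled by \(\alpha\) of Eq. (5)); DCBs given in nanoseconds would first have to be multiplied by the speed of light. The mapping function is that of Eq. (42), and the helper names are hypothetical.

```python
import numpy as np

F1, F2 = 1575.42e6, 1227.60e6               # assumed GPS L1/L2 frequencies [Hz]
ALPHA = 1.0 - F1**2 / F2**2                 # alpha of Eq. (5)
M_PER_EL = 40.3 / F1**2                     # L1 delay per el/m^2 [m], Eq. (2)
R_E, H_SL, ALPHA_M = 6371.0, 506.7, 0.9782  # constants of Eq. (42)


def mapping(zenith_rad):
    """Mapping function m(z) of Eq. (42); z is the zenith angle at the receiver."""
    s = R_E / (R_E + H_SL) * np.sin(ALPHA_M * zenith_rad)
    return 1.0 / np.sqrt(1.0 - s**2)


def vtec_from_l4(l4_m, b_rec_m, b_sat_m, zenith_rad):
    """VTEC [TECU] from a smoothed L4 value and known DCBs, Eq. (13) without noise."""
    stec = (l4_m - b_rec_m - b_sat_m) / (ALPHA * M_PER_EL)  # [el/m^2]
    return stec / mapping(zenith_rad) / 1e16


# Forward-simulate one observation with Eq. (12), then invert it
z = np.radians(60.0)
l4 = ALPHA * M_PER_EL * mapping(z) * 20.0e16 + 0.8 - 1.2
print(vtec_from_l4(l4, 0.8, -1.2, z))  # approximately the assumed 20 TECU
```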

The latitude and longitude of the IPP can be calculated from the single layer model and the known positions of the receiver and the satellite. These are the inputs to the GPR model; for the RIMS network, 200–400 separate IPPs with their VTEC values can be calculated in each epoch (Fig. 4). The next section shows how a consistent TEC map can be created from these snapshot data with Gaussian process regression.
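The IPP position can be obtained with spherical trigonometry. The sketch below assumes a spherical Earth with the radius and shell height of Eq. (42), takes the azimuth and geometric zenith angle of the satellite as seen from the receiver as inputs, and uses the standard great-circle destination formulas; it illustrates the idea and is not necessarily the exact procedure used in this paper. `ipp_coordinates` is a hypothetical name.

```python
import numpy as np

R_E, H_SL = 6371.0, 506.7  # Earth radius and shell height [km], as in Eq. (42)


def ipp_coordinates(lat_r, lon_r, azimuth, zenith):
    """IPP latitude, longitude and zenith angle on a spherical thin shell.

    All angles in radians; azimuth and zenith describe the satellite as seen
    from the receiver at (lat_r, lon_r).
    """
    z_ipp = np.arcsin(R_E / (R_E + H_SL) * np.sin(zenith))  # zenith angle at the IPP
    psi = zenith - z_ipp                                    # Earth-central angle receiver-IPP
    lat_i = np.arcsin(np.sin(lat_r) * np.cos(psi)
                      + np.cos(lat_r) * np.sin(psi) * np.cos(azimuth))
    lon_i = lon_r + np.arctan2(np.sin(azimuth) * np.sin(psi) * np.cos(lat_r),
                               np.cos(psi) - np.sin(lat_r) * np.sin(lat_i))
    return lat_i, lon_i, z_ipp


lat_i, lon_i, z_i = ipp_coordinates(np.radians(47.5), np.radians(19.0),
                                    np.radians(135.0), np.radians(60.0))
print(np.degrees(lat_i), np.degrees(lon_i))
```

The returned `z_ipp` is the angle whose secant gives the unmodified single-layer mapping function; Eq. (42) additionally scales z by \(\alpha_m\).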

Fig. 4 VTEC map created by the GPR model. Green circles represent the positions of the RIMS stations, and the spherical star markers are the positions of the ionospheric pierce points. Their colours and heights depict the corresponding VTEC values derived from the \(L_4\) observations and the known DCB values according to Eq. (13). The stars represent a snapshot of the input data set of the GPR model, and the grid surface is the VTEC output of the GPR model.

2.2 Gaussian process regression

The Gaussian process builds on the multivariate normal distribution. A random vector \({\varvec{y}}=(y_1,\ldots ,y_K)\) has a multivariate normal distribution if it can be obtained from \({\varvec{z}}=(z_1,\ldots ,z_M)\), where the \(z_m \sim {\mathcal {N}}(0,1)\) are independent, by the following transformation (Blitzstein and Hwang 2019):

$$\begin{aligned} {\varvec{y}}={\varvec{A}}{\varvec{z}}+\varvec{\mu }, \end{aligned}$$
(14)

where \({\varvec{A}}\) is a \(K\times M\) matrix and \(\varvec{\mu } \in {\mathbb {R}}^K\). This is written as \({\varvec{y}} \sim {\mathcal {N}}(\varvec{\mu },\varvec{\varSigma })\), where \(\varvec{\mu }\) is the mean vector and \(\varvec{\varSigma }=Cov({\varvec{y}},{\varvec{y}})={\varvec{A}}{\varvec{A}}^T\) is the covariance matrix. If \(\varvec{\varSigma }\) is non-singular, the probability density function of \({\varvec{y}}\) is

$$\begin{aligned} f({\varvec{y}})=\frac{1}{(2\pi )^{K/2} \left| \varvec{\varSigma } \right| ^{\frac{1}{2}}}\exp \left( -\frac{1}{2}({\varvec{y}}-\varvec{\mu })^T\varvec{\varSigma }^{-1}({\varvec{y}}-\varvec{\mu })\right) \end{aligned}$$
(15)
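Eqs. (14) and (15) can be checked numerically. The sketch below draws samples through the linear transformation of Eq. (14), compares their empirical covariance with \({\varvec{A}}{\varvec{A}}^T\), and evaluates the logarithm of Eq. (15). The matrix \({\varvec{A}}\), the mean vector and the sample size are arbitrary example values, and the function names are hypothetical.

```python
import numpy as np


def sample_mvn(a, mu, size, rng):
    """Draw `size` samples of y = A z + mu (Eq. 14) with z ~ N(0, I)."""
    z = rng.standard_normal((a.shape[1], size))
    return (a @ z).T + mu


def mvn_logpdf(y, mu, sigma):
    """Logarithm of the density in Eq. (15) for a non-singular covariance."""
    diff = np.asarray(y, float) - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (mu.size * np.log(2.0 * np.pi) + logdet + maha)


rng = np.random.default_rng(42)
a = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, -0.3, 0.7]])   # example K x M matrix (K = M = 3)
mu = np.array([10.0, 12.0, 8.0])
samples = sample_mvn(a, mu, 200_000, rng)
emp_cov = np.cov(samples, rowvar=False)   # approaches Sigma = A A^T
print(np.round(emp_cov - a @ a.T, 3))
```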

Consider splitting a multivariate normal vector \({\varvec{y}}\) into two sub-vectors \(({\varvec{y}}_1,{\varvec{y}}_2)\). Each sub-vector is itself multivariate normal, because the rows of \({\varvec{A}}\) and the entries of \(\varvec{\mu }\) can be split accordingly. Writing \({\varvec{y}}\) in terms of its sub-vectors gives

$$\begin{aligned} {\varvec{y}}= \begin{pmatrix} {\varvec{y}}_1 \\ {\varvec{y}}_2 \end{pmatrix}\sim {\mathcal {N}}\left( \begin{pmatrix} \varvec{\mu }_1 \\ \varvec{\mu }_2 \end{pmatrix}, \begin{pmatrix} \varvec{\varSigma }_{11} &{} \varvec{\varSigma }_{12}\\ \varvec{\varSigma }_{21} &{} \varvec{\varSigma }_{22} \end{pmatrix} \right) . \end{aligned}$$
(16)

The conditional distribution of \({\varvec{y}}_2\) given \({\varvec{y}}_1\) is \(P({\varvec{y}}_2 | {\varvec{y}}_1)={\mathcal {N}}(\varvec{\mu }_{2|1},\varvec{\varSigma }_{2|1})\), where

$$\begin{aligned} \varvec{\mu }_{2|1}= & {} \varvec{\mu }_2+\varvec{\varSigma }_{21}\varvec{\varSigma }_{11}^{-1}({\varvec{y}}_1-\varvec{\mu }_1), \end{aligned}$$
(17)
$$\begin{aligned} \varvec{\varSigma }_{2|1}= & {} \varvec{\varSigma }_{22}-\varvec{\varSigma }_{21}\varvec{\varSigma }_{11}^{-1}\varvec{\varSigma }_{12}. \end{aligned}$$
(18)
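Eqs. (17) and (18) translate directly into a few lines of linear algebra. In the sketch below (function name hypothetical), the observed block \({\varvec{y}}_1\) is selected by an index list, and \(\varvec{\varSigma }_{21}\varvec{\varSigma }_{11}^{-1}\) is obtained with a linear solve rather than an explicit inverse.

```python
import numpy as np


def condition_gaussian(mu, sigma, idx1, y1):
    """Mean and covariance of y2 given y1 (Eqs. 17-18).

    idx1: indices of the observed block y1; the remaining indices form y2.
    """
    mu = np.asarray(mu, float)
    sigma = np.asarray(sigma, float)
    idx1 = np.asarray(idx1)
    idx2 = np.setdiff1d(np.arange(mu.size), idx1)
    s11 = sigma[np.ix_(idx1, idx1)]
    s12 = sigma[np.ix_(idx1, idx2)]
    s22 = sigma[np.ix_(idx2, idx2)]
    gain = np.linalg.solve(s11, s12).T          # Sigma21 Sigma11^{-1} (Sigma symmetric)
    mu_cond = mu[idx2] + gain @ (np.asarray(y1, float) - mu[idx1])  # Eq. (17)
    sigma_cond = s22 - gain @ s12                                    # Eq. (18)
    return mu_cond, sigma_cond


mu_c, sigma_c = condition_gaussian([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], [0], [1.0])
print(mu_c, sigma_c)
```

For a standard bivariate normal with correlation 0.8, observing \(y_1 = 1\) gives a conditional mean of 0.8 and a conditional variance of 0.36.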

A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. Formally, a function \({\varvec{f}}\) defined on a domain S is a Gaussian process if, for any finite set of inputs \({\varvec{x}}_1,\ldots ,{\varvec{x}}_N \in S\), the values \(f({\varvec{x}}_1),\ldots ,f({\varvec{x}}_N)\) have a multivariate normal distribution. A GP is specified by a mean function \(m({\varvec{x}})\) and a covariance function \(k({\varvec{x}},\varvec{x'})\), denoted as \(f\sim \mathcal {GP}(m({\varvec{x}}),k({\varvec{x}},\varvec{x'}))\). For any \({\varvec{x}},\varvec{x'}\), the mean and covariance of the GP f are \(m({\varvec{x}})=E\left[ f({\varvec{x}})\right]\) and \(k({\varvec{x}},\varvec{x'})=Cov(f({\varvec{x}}),f(\varvec{x'}))\). The goal is to estimate function values of the GP conditioned on training data. Denote the set of inputs as \({\varvec{X}}\in {\mathbb {R}}^{N\times D}\) and the corresponding function values as \({\varvec{f}}\in {\mathbb {R}}^{N}\), where D is the dimension of the domain. In the simplest case, the mean function is assumed to be zero and the \({\varvec{f}}\) values are noise-free. The covariance function \(k({\varvec{x}},\varvec{x'})\) is governed by its hyperparameters (kernel parameters) \(\varvec{\theta }\). One of the most popular kernel functions is the squared exponential:

$$\begin{aligned} k({\varvec{x}},\varvec{x'})=\sigma _f^2\exp \left( -\frac{\Vert {\varvec{x}}-\varvec{x'}\Vert ^2}{2l^2}\right) , \end{aligned}$$
(19)

where \(\sigma _f^2\) and l are the hyperparameters, denoted by \(\varvec{\theta }\) above. To estimate the function values \({\varvec{f}}_*\) for a new set of inputs \({\varvec{X}}_*\), assume, similarly to Eq. (16), that

$$\begin{aligned} \begin{pmatrix} {\varvec{f}} \\ {\varvec{f}}_* \end{pmatrix}\sim {\mathcal {N}}\left( \begin{pmatrix} {\varvec{0}} \\ {\varvec{0}} \end{pmatrix}, \begin{pmatrix} {\varvec{K}}_{{\varvec{X}}{\varvec{X}}} &{} {\varvec{K}}_{{\varvec{X}}\varvec{X_*}}\\ {\varvec{K}}_{\varvec{X_*}{\varvec{X}}} &{} {\varvec{K}}_{\varvec{X_*}\varvec{X_*}} \end{pmatrix} \right) . \end{aligned}$$
(20)

The covariance matrices \({\varvec{K}}_{{\varvec{X}}{\varvec{X}}},{\varvec{K}}_{{\varvec{X}}\varvec{X_*}},{\varvec{K}}_{\varvec{X_*}{\varvec{X}}},{\varvec{K}}_{\varvec{X_*}\varvec{X_*}}\) are constructed by evaluating the kernel function at every pair of inputs. The conditional distribution of \({\varvec{f}}_*\) then follows from the conditional properties of the multivariate normal distribution, Eqs. (17) and (18):

$$\begin{aligned} P({\varvec{f}}_*|\varvec{X_*},{\varvec{X}},{\varvec{f}},\varvec{\theta })={\mathcal {N}}({\varvec{K}}_{\varvec{X_*}{\varvec{X}}}{\varvec{K}}_{{\varvec{X}}{\varvec{X}}}^{-1}{\varvec{f}},{\varvec{K}}_{\varvec{X_*}\varvec{X_*}}-{\varvec{K}}_{\varvec{X_*}{\varvec{X}}}{\varvec{K}}_{{\varvec{X}}{\varvec{X}}}^{-1}{\varvec{K}}_{{\varvec{X}}\varvec{X_*}}) \end{aligned}$$
(21)

This formula is called the predictive distribution; it models the distribution of the function values at an unobserved set of inputs. The function values are estimated by its mean, while its covariance quantifies the uncertainty of these estimates. In practice, the function values are not observed directly, only with some noise:

$$\begin{aligned} y_n=f({\varvec{x}}_n)+\epsilon _n, \end{aligned}$$
(22)

where \(\epsilon _n \sim {\mathcal {N}}(0,\sigma _n^2)\). The observation noise can be incorporated into the kernel function by adding \(\sigma _n^2\) to the diagonal terms of the covariance matrix, i.e. wherever \({\varvec{x}}=\varvec{x'}\). The new kernel takes the form

$$\begin{aligned} k'({\varvec{x}},\varvec{x'})=k({\varvec{x}},\varvec{x'})+\sigma _n^2 I({\varvec{x}}=\varvec{x'}), \end{aligned}$$
(23)

where \(I(\cdot )\) is the indicator function. The predictive distribution of the unobserved values \({\varvec{y}}_*\) at the inputs \(\varvec{X_*}\) is derived in the same way as Eq. (21):

$$\begin{aligned} \begin{aligned} P({\varvec{y}}_*|\varvec{X_*},{\varvec{X}},{\varvec{y}},\varvec{\theta })={\mathcal {N}}\left( {\varvec{K}}_{\varvec{X_*}{\varvec{X}}}({\varvec{K}}_{{\varvec{X}}{\varvec{X}}}+\sigma _n^2{\varvec{I}})^{-1}{\varvec{y}},\;{\varvec{K}}_{\varvec{X_*}\varvec{X_*}}+\sigma _n^2{\varvec{I}}-{\varvec{K}}_{\varvec{X_*}{\varvec{X}}}({\varvec{K}}_{{\varvec{X}}{\varvec{X}}}+\sigma _n^2{\varvec{I}})^{-1}{\varvec{K}}_{{\varvec{X}}\varvec{X_*}}\right) \end{aligned} \end{aligned}$$
(24)
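The predictive equations can be sketched for the squared exponential kernel of Eq. (19) as below. This is a minimal zero-mean implementation of Eq. (24) with assumed, fixed hyperparameters; a Cholesky factorisation replaces the explicit matrix inverse for numerical stability, and the function names are hypothetical. The toy data (samples of a sine curve) serve only as an example.

```python
import numpy as np


def se_kernel(xa, xb, sigma_f=1.0, length=1.0):
    """Squared exponential kernel of Eq. (19); xa: (n, d), xb: (m, d)."""
    d2 = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)


def gp_predict(x, y, x_new, sigma_f, length, sigma_n):
    """Mean and covariance of noisy predictions y_* at x_new, Eq. (24)."""
    k_xx = se_kernel(x, x, sigma_f, length) + sigma_n**2 * np.eye(len(x))
    k_sx = se_kernel(x_new, x, sigma_f, length)
    k_ss = se_kernel(x_new, x_new, sigma_f, length)
    chol = np.linalg.cholesky(k_xx)                     # K + sigma_n^2 I = L L^T
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    v = np.linalg.solve(chol, k_sx.T)
    mean = k_sx @ weights
    cov = k_ss + sigma_n**2 * np.eye(len(x_new)) - v.T @ v
    return mean, cov


x = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y = np.sin(x[:, 0])
x_new = np.array([[1.0], [2.5], [20.0]])
mean, cov = gp_predict(x, y, x_new, sigma_f=1.0, length=1.0, sigma_n=1e-3)
print(mean, np.sqrt(np.diag(cov)))
```

Inside the data range the mean follows the sine curve with a small predictive standard deviation, while at the distant input 20.0 the prediction falls back to the zero prior mean with a standard deviation close to \(\sigma_f\).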

2.3 TEC map estimation with Gaussian process regression

In the previous section, the model was considered with a zero mean function; in many cases, however, a trend function describes the relation between the input and output data. Consider a training data set \(\{({\varvec{x}}_i,y_i);\;i=1,2,\ldots ,n\}\) with the following linear relation:

$$\begin{aligned} y=h({\varvec{x}})^T\varvec{\beta }+\varepsilon , \end{aligned}$$
(25)

where \({\varvec{x}}_i\in {\mathbb {R}}^d\), \(y_i\in {\mathbb {R}}\) and \(\varepsilon \sim {\mathcal {N}}(0,\sigma ^2)\). \(h({\varvec{x}})\) is a chosen basis function, and the coefficients \(\varvec{\beta }\) and the standard deviation \(\sigma\) are estimated from the training data set. In ionosphere modelling, \({\varvec{x}}_i\equiv \{\phi _i,\lambda _i\}\) and \(y_i\equiv VTEC_i\), \(i=1,2,\ldots ,n\), where n is the number of observations used for the TEC map construction, \(\phi _i,\lambda _i\) are the latitude and longitude of the IPP, and \(VTEC_i\) is the corresponding vertical total electron content derived from the observations and the single-layer mapping model (Dach et al. 2007). The function \(h({\varvec{x}})\) can be a polynomial or a spherical harmonic expansion, both of which have proved viable in TEC modelling (Komjathy et al. 2005; Sun et al. 2020); a polynomial variant is used for DCB estimation in Eq. (44). Instead of solving Eq. (25) for \(\varvec{\beta }\) alone, a location-dependent GP term is introduced. The noise \(\varepsilon\) is included implicitly in the kernel, as in Eq. (23). Consider the following model, where \({\varvec{x}}\) and y are defined as in Eq. (25):

$$\begin{aligned} y=h({\varvec{x}})^T\varvec{\beta }+f({\varvec{x}}), \end{aligned}$$
(26)

where \(f({\varvec{x}})\) is a zero-mean GP with covariance function \(k({\varvec{x}},{\varvec{x}}')\), i.e. \(f({\varvec{x}})\sim GP(0,k({\varvec{x}},{\varvec{x}}'))\). The basis functions \(h({\varvec{x}})\) transform the original vector \({\varvec{x}}\) into a feature space \({\mathbb {R}}^{p}\), and \(\varvec{\beta }\) is the \(p\times 1\) vector of basis coefficients. The GPR model for a single response \(y_i\) is:

$$\begin{aligned} P(y_i|f({\varvec{x}}_i),{\varvec{x}}_i)\sim {\mathcal {N}}(y_i|h({\varvec{x}}_i)^T\varvec{\beta } + f({\varvec{x}}_i),\sigma ^2) \end{aligned}$$
(27)

The GPR model is nonparametric: it introduces a latent variable \(f({\varvec{x}}_i)\) for each observation \({\varvec{x}}_i\). In vector form, the model is

$$\begin{aligned} P({\varvec{y}}|{\varvec{f}},{\varvec{X}})\sim {\mathcal {N}}({\varvec{y}}|{\varvec{H}}\varvec{\beta } + {\varvec{f}},\sigma ^2{\varvec{I}}), \end{aligned}$$
(28)

where

$$\begin{aligned} {\varvec{X}} = \begin{pmatrix} {\varvec{x}}_1^T \\ {\varvec{x}}_2^T \\ \vdots \\ {\varvec{x}}_n^T \end{pmatrix}, {\varvec{y}} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, {\varvec{H}} = \begin{pmatrix} h({\varvec{x}}_1)^T \\ h({\varvec{x}}_2)^T \\ \vdots \\ h({\varvec{x}}_n)^T \end{pmatrix}, {\varvec{f}}= \begin{pmatrix} f({\varvec{x}}_1) \\ f({\varvec{x}}_2) \\ \vdots \\ f({\varvec{x}}_n) \end{pmatrix}. \end{aligned}$$
(29)

The basis matrix \({\varvec{H}}\) has dimension \(n \times p\), where n is the number of observations (training data) and p is the dimension of the feature space. Without the basis term \({\varvec{H}}\varvec{\beta }\), the model reduces to Eq. (22). The simplest choice of basis function is a constant, so the feature space is one-dimensional (p = 1) and the basis matrix \({\varvec{H}}={\varvec{1}}\) is an \(n\times 1\) vector of ones. In this paper the constant basis was used. Linear and quadratic bases are also common, with \({\varvec{H}}_{lin}=[{\varvec{1}},{\varvec{X}}]\) and \({\varvec{H}}_{quad}=[{\varvec{1}},{\varvec{X}},{\varvec{X}}^2]\) respectively, where \({\varvec{X}}^2\) contains the element-wise squares of \({\varvec{X}}\). For ionosphere modelling, these matrices take the explicit form

$$\begin{aligned} {\varvec{H}}_{lin}= & {} [{\varvec{1}}, {\varvec{X}} ]= \begin{pmatrix} 1 &{}\phi _1&{} \lambda _1 \\ 1 &{}\phi _2 &{}\lambda _2 \\ \vdots &{} \vdots &{} \vdots \\ 1&{}\phi _n &{}\lambda _n \\ \end{pmatrix},\\ {\varvec{H}}_{quad}= & {} [{\varvec{1}}, {\varvec{X}} ,{\varvec{X}}^2]= \begin{pmatrix} 1 &{}\phi _1&{} \lambda _1 &{}\phi _1^2 &{}\lambda _1^2 \\ 1 &{}\phi _2 &{}\lambda _2 &{}\phi _2^2 &{}\lambda _2^2 \\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ 1&{} \phi _n &{}\lambda _n &{}\phi _n^2 &{}\lambda _n^2 \\ \end{pmatrix}. \end{aligned}$$
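Constructing the basis matrices from IPP coordinates is straightforward. The sketch below returns the constant basis used in this paper as well as the linear and quadratic alternatives listed above, with columns in the order \((1, \phi, \lambda, \phi^2, \lambda^2)\); the function name and the `kind` argument are hypothetical.

```python
import numpy as np


def basis_matrix(lat, lon, kind="constant"):
    """Basis matrix H of Eq. (29) for IPP latitudes (phi) and longitudes (lambda)."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    ones = np.ones_like(lat)
    if kind == "constant":
        return ones[:, None]                                   # n x 1
    if kind == "linear":
        return np.column_stack([ones, lat, lon])               # n x 3
    if kind == "quadratic":
        return np.column_stack([ones, lat, lon, lat**2, lon**2])  # n x 5
    raise ValueError(f"unknown basis kind: {kind}")


lat = np.array([45.0, 50.0, 55.0])
lon = np.array([5.0, 10.0, 15.0])
print(basis_matrix(lat, lon, "quadratic"))
```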

By definition of the GP, the latent vector \({\varvec{f}}\) in Eqs. (28) and (29) is jointly Gaussian:

$$\begin{aligned} P({\varvec{f}}|{\varvec{X}})\sim {\mathcal {N}}({\varvec{f}}|{\varvec{0}},K({\varvec{X}},{\varvec{X}})), \end{aligned}$$
(30)

where \(K({\varvec{X}},{\varvec{X}})\) has the form

$$\begin{aligned} K({\varvec{X}},{\varvec{X}})= \begin{pmatrix} k({\varvec{x}}_1,{\varvec{x}}_1) &{} k({\varvec{x}}_1,{\varvec{x}}_2) &{} \ldots &{} k({\varvec{x}}_1,{\varvec{x}}_n)\\ k({\varvec{x}}_2,{\varvec{x}}_1) &{} k({\varvec{x}}_2,{\varvec{x}}_2) &{} \ldots &{} k({\varvec{x}}_2,{\varvec{x}}_n)\\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ k({\varvec{x}}_n,{\varvec{x}}_1) &{} k({\varvec{x}}_n,{\varvec{x}}_2) &{} \ldots &{} k({\varvec{x}}_n,{\varvec{x}}_n)\\ \end{pmatrix} \end{aligned}$$
(31)

The covariance function \(k({\varvec{x}},{\varvec{x}}')\) used in this paper is the Matérn 5/2 kernel, defined as

$$\begin{aligned} k({\varvec{x}}_i,{\varvec{x}}_j)=\sigma _f^2 \left( 1 + \frac{\sqrt{5}r}{\sigma _l} + \frac{5r^2}{3\sigma _l^2} \right) \exp \left( - \frac{\sqrt{5}r}{\sigma _l}\right) , \end{aligned}$$
(32)

where r is the Euclidean distance between \({\varvec{x}}_i\) and \({\varvec{x}}_j\). The hyperparameters \(\varvec{\theta }\) of this kernel are \(\sigma _f\) and \(\sigma _l\); the notation \(k({\varvec{x}}_i,{\varvec{x}}_j|\varvec{\theta })\) highlights the dependency of the kernel on \(\varvec{\theta }\). To create a TEC map, the basis function coefficients \(\varvec{\beta }\), the noise variance \(\sigma ^2\) and the kernel hyperparameters \(\varvec{\theta }\) must first be estimated from the training data set \(({\varvec{X}},{\varvec{y}})\). They are obtained by maximizing the likelihood \(P({\varvec{y}}|{\varvec{X}})\) as a function of \(\varvec{\beta }\), \(\varvec{\theta }\) and \(\sigma ^2\), i.e. by maximizing the marginal log likelihood:

$$\begin{aligned} \hat{\varvec{\beta }},\hat{\varvec{\theta }},{\hat{\sigma }}^2=\arg \underset{\varvec{\beta },\varvec{\theta },\sigma ^2}{\max }\log P({\varvec{y}}|{\varvec{X}},\varvec{\beta },\varvec{\theta },\sigma ^2). \end{aligned}$$
(33)

The marginal log likelihood function has the form

$$\begin{aligned} \begin{aligned} \log P({\varvec{y}}|{\varvec{X}},\varvec{\beta },\varvec{\theta },\sigma ^2)=-\frac{1}{2}({\varvec{y}}-{\varvec{H}}\varvec{\beta })^T {\varvec{M}}^{-1}({\varvec{y}}-{\varvec{H}}\varvec{\beta }) - \frac{n}{2} \log 2\pi - \frac{1}{2}\log |{\varvec{M}}|, \end{aligned} \end{aligned}$$
(34)

where

$$\begin{aligned} {\varvec{M}} = K({\varvec{X}},{\varvec{X}}|\varvec{\theta }) + \sigma ^2{\varvec{I}}. \end{aligned}$$
(35)

First, the log likelihood is maximized with respect to \(\varvec{\beta }\) for given \(\varvec{\theta }\) and \(\sigma ^2\), which gives the estimate

$$\begin{aligned} \hat{\varvec{\beta }}(\varvec{\theta }, \sigma ^2) =[{\varvec{H}}^T {\varvec{M}}^{-1}{\varvec{H}}]^{-1}\,{\varvec{H}}^T {\varvec{M}}^{-1}{\varvec{y}}. \end{aligned}$$
(36)

Substituting Eq. (36) into Eq. (34) yields the \(\varvec{\beta }\)-profiled log likelihood:

$$\begin{aligned} \begin{aligned} \log P({\varvec{y}}|{\varvec{X}},\varvec{\theta },\sigma ^2)=-\frac{1}{2}({\varvec{y}}-{\varvec{H}}\varvec{{\hat{\beta }}})^T {\varvec{M}}^{-1}({\varvec{y}}-{\varvec{H}}\varvec{{\hat{\beta }}}) - \frac{n}{2} \log 2\pi - \frac{1}{2}\log |{\varvec{M}}| \end{aligned} \end{aligned}$$
(37)
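The estimation steps of Eqs. (33) to (37) can be sketched as follows with the Matérn 5/2 kernel of Eq. (32). The \(\varvec{\beta }\)-profiled log likelihood is evaluated through a Cholesky factorisation of \({\varvec{M}}\), and a coarse grid search over \(\sigma_f\), \(\sigma_l\) and \(\sigma^2\) stands in for the numerical optimiser one would normally use. The grid values, the synthetic VTEC field and the function names are assumptions made only for this example.

```python
import numpy as np


def matern52(xa, xb, sigma_f, sigma_l):
    """Matern 5/2 kernel of Eq. (32); r is the Euclidean distance."""
    r = np.sqrt(np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1))
    a = np.sqrt(5.0) * r / sigma_l
    return sigma_f**2 * (1.0 + a + a**2 / 3.0) * np.exp(-a)


def profiled_loglik(x, y, h, sigma_f, sigma_l, sigma2):
    """Return the log likelihood of Eq. (37) and beta_hat of Eq. (36)."""
    n = y.size
    m = matern52(x, x, sigma_f, sigma_l) + sigma2 * np.eye(n)   # Eq. (35)
    chol = np.linalg.cholesky(m)

    def m_solve(b):  # M^{-1} b via the Cholesky factor
        return np.linalg.solve(chol.T, np.linalg.solve(chol, b))

    beta = np.linalg.solve(h.T @ m_solve(h), h.T @ m_solve(y))  # Eq. (36)
    resid = y - h @ beta
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    ll = -0.5 * resid @ m_solve(resid) - 0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet
    return ll, beta


def fit_grid(x, y, h, grid_f, grid_l, grid_s2):
    """Coarse grid search maximising Eq. (37); returns (ll, sigma_f, sigma_l, sigma2, beta)."""
    best = None
    for sf in grid_f:
        for sl in grid_l:
            for s2 in grid_s2:
                ll, beta = profiled_loglik(x, y, h, sf, sl, s2)
                if best is None or ll > best[0]:
                    best = (ll, sf, sl, s2, beta)
    return best


rng = np.random.default_rng(3)
x = rng.uniform([35.0, -10.0], [70.0, 30.0], size=(60, 2))  # IPP (lat, lon) [deg], synthetic
y = 15.0 + 3.0 * np.sin(np.radians(4.0 * x[:, 0])) + 0.05 * x[:, 1] + rng.normal(0.0, 0.3, 60)
h = np.ones((60, 1))                                          # constant basis
best = fit_grid(x, y, h, [1.0, 3.0, 10.0], [5.0, 10.0, 20.0], [0.01, 0.1, 1.0])
print(best[:4], best[4])
```

A useful sanity check: when \(\sigma_f\) is negligible, \({\varvec{M}}\) is proportional to the identity and Eq. (36) reduces to the ordinary mean of the observations for the constant basis.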

Equation (37) is maximized over \(\varvec{\theta }\) and \(\sigma ^2\) to find their estimates, which are then substituted back into Eq. (36) to fix the coefficients \(\varvec{\beta }\). The last step is to predict the TEC values \({\varvec{y}}_{g}\) at the grid points \({\varvec{X}}_{g}\) to create a TEC map. The prediction without observation noise is

$$\begin{aligned} P({\varvec{y}}_{g}|{\varvec{y}},{\varvec{X}},{\varvec{X}}_{g}) = {\mathcal {N}}({\varvec{y}}_{g}|{\varvec{H}}_{{\varvec{X}}_{g}}\varvec{\beta }+\varvec{\mu },\varvec{\varSigma }), \end{aligned}$$
(38)

where

$$\begin{aligned} \varvec{\varSigma } = K({\varvec{X}}_{g},{\varvec{X}}_{g})-K({\varvec{X}}_{g},{\varvec{X}})(K({\varvec{X}},{\varvec{X}})+\sigma ^2{\varvec{I}})^{-1}K({\varvec{X}},{\varvec{X}}_{g}), \end{aligned}$$
(39)

and \({\varvec{y}}_g,{\varvec{X}}_g,{\varvec{H}}_{{\varvec{X}}_g}\) have the same form as in Eq. (29), except that \({\varvec{X}}_g = ({\varvec{x}}_1^g, {\varvec{x}}_2^g, \ldots ,{\varvec{x}}_{n_g}^g)\) contains the grid coordinates and \({\varvec{y}}_g\) the corresponding VTEC values. The expected TEC values of the created map follow from Eq. (38) as

$$\begin{aligned} E({\varvec{y}}_{g}|{\varvec{y}},{\varvec{X}},{\varvec{X}}_{g},\varvec{\beta },\varvec{\theta },\sigma ^2) = {\varvec{H}}_{{\varvec{X}}_{g}}\varvec{\beta }+K({\varvec{X}}_{g},{\varvec{X}}|\varvec{\theta })\alpha , \end{aligned}$$
(40)

where

$$\begin{aligned} \alpha = (K({\varvec{X}},{\varvec{X}}|\varvec{\theta })+\sigma ^2{\varvec{I}})^{-1}({\varvec{y}}-{\varvec{H}}\varvec{\beta }). \end{aligned}$$
(41)
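Once \(\hat{\varvec{\beta }}\), \(\hat{\varvec{\theta }}\) and \({\hat{\sigma }}^2\) are available, Eqs. (39) to (41) give the map and its uncertainty. The sketch below assumes the constant basis used in this paper, repeats the Matérn 5/2 kernel of Eq. (32) so that it is self-contained, and evaluates the map on a regular 2.5° × 5° latitude/longitude grid chosen purely for illustration. The training points, hyperparameter values and function names are hypothetical, and the sample mean stands in for \(\hat{\varvec{\beta }}\).

```python
import numpy as np


def matern52(xa, xb, sigma_f, sigma_l):
    """Matern 5/2 kernel of Eq. (32)."""
    r = np.sqrt(np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1))
    a = np.sqrt(5.0) * r / sigma_l
    return sigma_f**2 * (1.0 + a + a**2 / 3.0) * np.exp(-a)


def predict_map(x, y, x_grid, beta, sigma_f, sigma_l, sigma2):
    """Expected VTEC (Eqs. 40-41) and covariance (Eq. 39) at x_grid, constant basis."""
    h = np.ones((len(x), 1))
    h_grid = np.ones((len(x_grid), 1))
    m = matern52(x, x, sigma_f, sigma_l) + sigma2 * np.eye(len(x))
    k_gx = matern52(x_grid, x, sigma_f, sigma_l)
    weights = np.linalg.solve(m, y - h @ beta)      # alpha of Eq. (41)
    mean = h_grid @ beta + k_gx @ weights           # Eq. (40)
    cov = matern52(x_grid, x_grid, sigma_f, sigma_l) - k_gx @ np.linalg.solve(m, k_gx.T)  # Eq. (39)
    return mean, cov


# Hypothetical training IPPs on a coarse lattice (lat, lon in degrees) with a smooth VTEC field
lat_t, lon_t = np.meshgrid(np.arange(36.0, 71.0, 7.0), np.arange(-9.0, 31.0, 8.0), indexing="ij")
x = np.column_stack([lat_t.ravel(), lon_t.ravel()])
y = 12.0 + 0.2 * (x[:, 0] - 50.0) + 0.1 * x[:, 1]
beta = np.array([y.mean()])  # stand-in for beta_hat of Eq. (36)

lat_g, lon_g = np.meshgrid(np.arange(35.0, 70.1, 2.5), np.arange(-10.0, 30.1, 5.0), indexing="ij")
x_grid = np.column_stack([lat_g.ravel(), lon_g.ravel()])
mean, cov = predict_map(x, y, x_grid, beta, sigma_f=3.0, sigma_l=10.0, sigma2=1e-6)
vtec_map = mean.reshape(lat_g.shape)
print(vtec_map.round(1))
```

Far from all IPPs the kernel terms vanish, so the map reverts to \({\varvec{H}}_{{\varvec{X}}_g}\varvec{\beta }\) and the predictive variance to \(\sigma_f^2\); this is the expected behaviour at the periphery of the RIMS network.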

2.4 Differential code bias estimation with Kalman filter

The Kalman filter is a common choice for real-time estimation. Consider Eq. (12) with the following mapping function m(z) of the modified single layer model (Dach et al. 2007):

$$\begin{aligned} m(z) = \left( 1-\left( \frac{R_e}{R_e+H} sin(\alpha _mz)\right) ^2\right) ^{-\frac{1}{2}}, \end{aligned}$$
(42)

with single-layer height \(H = 506.7\) km, Earth radius \(R_e=6371\) km and \(\alpha _m=0.9782\). The VTEC is modelled with second-degree polynomials in latitude \(\phi\) and longitude \(\lambda\). The following equation can be derived from the general form of Eq. (25), with polynomials as the basis function \(h({\varvec{x}})\) and \(\varvec{\beta }\) as the adjustable polynomial coefficients.

$$\begin{aligned} \begin{aligned} VTEC(\lambda ,\phi ) = \sum _{j=0}^{5} \omega ^j(\lambda ,\phi ) VTEC^j(\lambda ,\phi |\lambda _0^j,\phi _0^j), \end{aligned} \end{aligned}$$
(43)

where

$$\begin{aligned} \begin{aligned}&VTEC^j(\lambda ,\phi |\lambda _0^j,\phi _0^j) = a_0^j + a_1^j (\lambda _0^j - \lambda ) + a_2^j (\phi _0^j - \phi ) \\&\quad +a_3^j (\lambda _0^j - \lambda )^2 + a_4^j (\phi _0^j - \phi )^2 \\&\quad + a_5^j (\lambda _0^j - \lambda ) (\phi _0^j - \phi ) \end{aligned} \end{aligned}$$
(44)

The coefficients \(a_n^j\), \(n,j \in \{0, \ldots , 5\}\), are estimated by the Kalman filter and correspond to \(\varvec{\beta }\) in Eq. (25). The points \((\lambda _0^j, \phi _0^j)\), \(j \in \{0, \ldots , 5\}\), are the centers of the polynomials, and \(\omega ^j(\lambda ,\phi )\) is a weight factor. The inverse distance between the IPP and the center of a polynomial is a common choice for the weight factor, but in this paper only the closest polynomial contributes:

$$\begin{aligned} \omega ^j(\lambda ,\phi )= {\left\{ \begin{array}{ll} 1,&{} \text {if } d^j = \min \limits _{n \in \{0,\dots ,5\}} d^n \\ 0, &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$
(45)

where \(d^j = \left\| (\lambda ,\phi ) - (\lambda _0^j,\phi _0^j) \right\|\) is the Euclidean distance between the IPP coordinates \((\lambda ,\phi )\) and the origin of the j-th polynomial.
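A minimal sketch of Eqs. (42)-(45) follows: the single-layer mapping function with the constants quoted above, the second-degree basis around a polynomial center, and the nearest-center weighting. The centers and coefficients below are hypothetical placeholders (the values of Table 1 are not reproduced here), coordinates are taken as \((\lambda ,\phi )\) in degrees following the convention of the text, and \(d^j\) is computed directly in degrees as in Eq. (45). All helper names are illustrative.

```python
import numpy as np

R_E = 6371.0       # Earth radius [km], from the text
H_SL = 506.7       # single-layer height [km], from the text
ALPHA_M = 0.9782   # scaling factor alpha_m, from the text

def mapping_function(z):
    """Single-layer mapping function m(z) of Eq. (42); z = zenith angle [rad]."""
    s = R_E / (R_E + H_SL) * np.sin(ALPHA_M * z)
    return 1.0 / np.sqrt(1.0 - s**2)

def poly_basis(lam, phi, center):
    """Second-degree basis of Eq. (44) around the center (lambda_0^j, phi_0^j)."""
    dl, dp = center[0] - lam, center[1] - phi
    return np.array([1.0, dl, dp, dl**2, dp**2, dl * dp])

def nearest_poly(lam, phi, centers):
    """Index j with omega^j = 1 in Eq. (45), i.e. the closest polynomial center."""
    d = np.hypot(centers[:, 0] - lam, centers[:, 1] - phi)
    return int(np.argmin(d))

def vtec(lam, phi, centers, coeffs):
    """VTEC of Eq. (43) with nearest-center weights; coeffs has shape (J, 6)."""
    j = nearest_poly(lam, phi, centers)
    return poly_basis(lam, phi, centers[j]) @ coeffs[j]

# Hypothetical centers (lambda_0, phi_0) and coefficients a_n^j
centers = np.array([[45.0, 0.0], [45.0, 20.0], [60.0, 0.0], [60.0, 20.0]])
coeffs = np.zeros((4, 6))
coeffs[:, 0] = [12.0, 13.0, 8.0, 9.0]   # only a_0^j set: constant VTEC per polynomial
```

Because only one weight is non-zero, each IPP adds \(m(z)\) times one six-element basis row to the columns of a single polynomial in the design matrix, which keeps the observation equations sparse.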

Table 1 Center coordinates of the polynomials

The first four polynomials cover the European region. Their number and positions were chosen arbitrarily and are summarized in Table 1. During model tuning, it was found that even a single polynomial is able to follow the main trend of the ionosphere, which suffices for DCB estimation; however, using four polynomials gave better and more consistent results. Six and nine polynomials were also tested, but due to the non-uniform spatial distribution of the IPPs, some polynomials lacked nearby measurement points and struggled to converge, causing biases in the DCBs. The 4th and 5th polynomials are dedicated to the HBK and MON stations, respectively. These stations are located in North America and South Africa; therefore, their observations cannot contribute to the TEC calculation over the European region, but they serve well for the DCB estimation of the satellites.

The Kalman filter consists of a prediction step and an update step. The estimated state vector \({\varvec{x}}\) contains the polynomial coefficients \(a_n^j\) and the DCBs of the satellites and permanent stations. The equations are as follows:

$$\begin{aligned} \begin{aligned} {\varvec{x}}_k&={\varvec{F}}_k{\varvec{x}}_{k-1}+\varvec{\omega }_{k-1} \\ {\varvec{y}}_k&= {\varvec{D}}_k {\varvec{x}}_k + {\varvec{e}}_k, \end{aligned} \end{aligned}$$
(46)

where \({\varvec{F}}_k\) is the transition matrix, which in this model is simply the identity matrix. \({\varvec{y}}_k\) is the observation vector containing the \(L_4\) combinations, and the design matrix \({\varvec{D}}_k\) maps the unknown parameters \({\varvec{x}}_k\) from the state space to the observation space according to Eqs. (12) and (43).
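One prediction/update cycle of the filter in Eq. (46) with \({\varvec{F}}_k = {\varvec{I}}\) could look as follows. This is a generic linear Kalman filter sketch, not the authors' code: the gain is obtained with a linear solve, and the Joseph-form covariance update is an assumed choice for keeping the covariance symmetric. The helper name `kf_step` is hypothetical, and building the rows of \({\varvec{D}}_k\) from Eqs. (12) and (43) is left to the caller.

```python
import numpy as np

def kf_step(x, P, y, D, Sigma_w, Sigma_y):
    """One prediction/update cycle for the model of Eq. (46) with F_k = I."""
    # Prediction: the state is carried over unchanged, its uncertainty grows
    x_pred = x
    P_pred = P + Sigma_w
    # Update with the L4 observations y = D x + e
    S = D @ P_pred @ D.T + Sigma_y                # innovation covariance
    K = np.linalg.solve(S, D @ P_pred).T          # gain P D^T S^{-1} (S, P symmetric)
    x_new = x_pred + K @ (y - D @ x_pred)
    A = np.eye(len(x)) - K @ D
    P_new = A @ P_pred @ A.T + K @ Sigma_y @ K.T  # Joseph form
    return x_new, P_new
```

With an identity transition, the process noise alone controls how quickly the polynomial coefficients and DCBs are allowed to drift between epochs.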

\(\varvec{\omega }_{k}\) is the process noise vector and \({\varvec{e}}_k\) is the measurement noise vector. Both are assumed to be zero-mean white noise, and the two vectors are mutually independent:

$$\begin{aligned} \begin{aligned}&E[\varvec{\omega }_k] = 0;\; E[\varvec{\omega }_k\varvec{\omega }_l^T] = \delta _{k,l} \varvec{\varSigma }_{\omega } \\&E[{\varvec{e}}_k] = 0;\; E[{\varvec{e}}_k{\varvec{e}}_l^T] = \delta _{k,l} \varvec{\varSigma }_{y}\\&E[\varvec{\omega }_k{\varvec{e}}_l^T] =0, \end{aligned} \end{aligned}$$
(47)

where \(\delta _{k,l}\) is the Kronecker delta. The process noise covariance matrix \(\varvec{\varSigma }_{\omega }\) consists of three sub-blocks: \(\varvec{\varSigma }_{\omega }^{poly}\) for the polynomial coefficients, \(\varvec{\varSigma }_{\omega }^{sv}\) for the satellites’ DCBs, and \(\varvec{\varSigma }_{\omega }^{stat}\) for the stations’ DCBs.

$$\begin{aligned} \varvec{\varSigma }_{\omega }= \begin{pmatrix} \varvec{\varSigma }_{\omega }^{poly} &{} 0 &{} 0\\ 0 &{} \varvec{\varSigma }_{\omega }^{sv} &{} 0\\ 0 &{} 0 &{} \varvec{\varSigma }_{\omega }^{stat} \end{pmatrix}, \end{aligned}$$
(48)

where \(\varvec{\varSigma }_{\omega }^{sv}\) and \(\varvec{\varSigma }_{\omega }^{stat}\) are diagonal matrices with \(\sigma _{sv}^2\) and \(\sigma _{stat}^2\), respectively, on the diagonal. The matrix \(\varvec{\varSigma }_{\omega }^{poly}\) is block diagonal:

$$\begin{aligned} \varvec{\varSigma }_{\omega }^{poly}= \begin{pmatrix} \varvec{\varSigma }_{\omega }^{poly_0} &{} &{} 0\\ &{} \ddots &{} \\ 0 &{} &{} \varvec{\varSigma }_{\omega }^{poly_5} \end{pmatrix}, \end{aligned}$$
(49)

where the blocks \(\varvec{\varSigma }_{\omega }^{poly_i}\), \(i \in \{0,\ldots ,5\}\), are diagonal, but their diagonal elements differ from coefficient to coefficient. An elevation-dependent weighting scheme was applied to the observation covariance matrix \(\varvec{\varSigma }_{y}\). The off-diagonal elements were set to zero, and the diagonal variances \((\sigma _y^i)^2\) were calculated by the following formula (adapted from Wang et al. 2015):

$$\begin{aligned} (\sigma _y^i)^2 = \sigma ^2(1+\sin ^2(z_i)), \end{aligned}$$
(50)

where \(z_i\) is the zenith angle corresponding to the i-th IPP, and \(\sigma ^2\) comes from the likelihood maximization presented in Eq. (33).
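The covariance matrices of Eqs. (48)-(50) can be assembled as sketched below, assuming the state vector is ordered as [polynomial coefficients, satellite DCBs, station DCBs]. The numeric variances, the numbers of satellites and stations, and the helper names are hypothetical tuning values chosen only for illustration.

```python
import numpy as np

def process_noise(poly_var, n_sat, n_stat, var_sv, var_stat):
    """Sigma_w of Eqs. (48)-(49) for the state [a_n^j ..., sat DCBs, station DCBs].

    poly_var: (J, 6) array holding the diagonal of each block Sigma_w^{poly_j}.
    All sub-blocks are diagonal, so Sigma_w is diagonal as a whole.
    """
    diag = np.concatenate([np.ravel(poly_var),
                           np.full(n_sat, var_sv),
                           np.full(n_stat, var_stat)])
    return np.diag(diag)

def observation_cov(zenith, sigma2):
    """Diagonal Sigma_y with variances sigma^2 (1 + sin^2 z_i), Eq. (50)."""
    return np.diag(sigma2 * (1.0 + np.sin(zenith) ** 2))

# Hypothetical tuning: a_0 may drift faster than the higher-order terms
poly_var = np.tile([1e-2, 1e-4, 1e-4, 1e-6, 1e-6, 1e-6], (6, 1))
Sw = process_noise(poly_var, n_sat=32, n_stat=40, var_sv=1e-8, var_stat=1e-8)
Sy = observation_cov(np.deg2rad(np.array([0.0, 45.0, 70.0])), sigma2=0.25)
```

Since \(\sin ^2(z_i)\) grows with the zenith angle, low-elevation observations receive up to twice the variance of zenith observations.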

3 Results

3.1 Concept validation with simulation

This section demonstrates the GPR’s capability to generate TEC maps from the VTEC values available at the IPP coordinates. The assessment consists of simulating VTEC values at the IPPs for a given epoch from a known reference map. The IPP positions were calculated from the positions of the RIMS stations and GPS satellites, with an ionospheric layer height of 350 km. The reference map is the product of the Royal Observatory of Belgium (ROB) Bergeot et al. (2014) for 2020-01-23. The published ROB map has a 15-minute update interval and a \(0.5^{\circ } \times 0.5^{\circ }\) spatial resolution in latitude and longitude. The VTEC values are taken from the ROB product, and zero-mean Gaussian noise scaled by the obliquity factor is then added to the simulated data. As a first step, the \(L_4^s\) combination was calculated from the reference ROB map.

$$\begin{aligned} L_4^s=\alpha _f m(z_{ipp} )VTEC(\lambda _{ipp},\phi _{ipp})^{R}+b_r+b^s+\varepsilon _{L_4}^s, \end{aligned}$$
(51)

where \(L_4^s\) is the simulated measurement and \(\varepsilon _{L_4}^s\sim {\mathcal {N}}(0,\sigma _s^2)\) is zero-mean Gaussian noise. In the second step, the VTEC values are recovered from \(L_4^s\) for a given epoch. From Eq. (51), \(VTEC(\lambda _{ipp},\phi _{ipp})^s\) is obtained as

$$\begin{aligned} VTEC(\lambda _{ipp},\phi _{ipp})^s=\frac{L_4^s-b_r-b^s}{\alpha _f m(z_{ipp})} \end{aligned}$$
(52)

The DCBs are treated as known values; therefore, the spread of \(VTEC^s\) comes only from the noise of \(L_4^s\), scaled by the inverse of the obliquity factor:

$$\begin{aligned} VTEC(\lambda _{ipp},\phi _{ipp})^s \sim {\mathcal {N}}\left( VTEC(\lambda _{ipp},\phi _{ipp})^{R},\frac{\sigma _s^2}{m(z_{ipp})^2}\right) \end{aligned}$$
(53)
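The simulation chain of Eqs. (51)-(52) can be sketched as follows. `simulate_vtec` is a hypothetical helper, and the obliquity factors \(m(z_{ipp})\) are passed in directly (for example from Eq. (42)). The recovered values scatter around the reference with standard deviation \(\sigma _s/(\alpha _f m)\); with \(\alpha _f = 1\), i.e. \(L_4\) expressed in TECu (an assumption made here), this is exactly the spread stated in Eq. (53).

```python
import numpy as np

def simulate_vtec(vtec_ref, m, b_r, b_s, sigma_s, alpha_f=1.0, rng=None):
    """Simulate L4 with Eq. (51) and invert it with Eq. (52).

    vtec_ref: reference VTEC at the IPPs; m: obliquity factor m(z_ipp).
    The DCBs b_r, b^s are known, so they cancel exactly in the inversion.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, sigma_s, size=np.shape(vtec_ref))   # eps ~ N(0, sigma_s^2)
    L4 = alpha_f * m * vtec_ref + b_r + b_s + eps             # Eq. (51)
    return (L4 - b_r - b_s) / (alpha_f * m)                   # Eq. (52)

# Toy check of Eq. (53): sigma_s = 4, m = 2 should give a spread of about 2
rng = np.random.default_rng(1)
n = 200_000
v = simulate_vtec(np.full(n, 20.0), m=2.0, b_r=3.0, b_s=5.0, sigma_s=4.0, rng=rng)
```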

The \(VTEC^s\) values and the corresponding IPP coordinates are collected for a given epoch and fed to the GPR model to estimate a TEC map at the matching grid points of the ROB map. A day-long data set was used; therefore, \(24\times 4 = 96\) TEC maps were estimated with \(2.5^{\circ } \times 2.5^{\circ }\) spatial resolution, altogether more than 20000 points.

After differencing with the reference VTEC values, the discrepancies were evaluated. The standard deviation of the added noise varied from 0 to 10 TECu. In the noise-free case, an almost perfect match with the reference map is expected. Besides the GPR, two additional interpolation methods were used to put the results into perspective: radial basis function (RBF) interpolation with a multiquadric function, and a third-degree polynomial fit (POLY), adapted from (Yilmaz et al. 2009; Yu et al. 2015). \(d_n^k\) denotes the differenced VTEC value, where k is the time index and n is the grid point index:

$$\begin{aligned} d_n^{k}(m) = VTEC_n^{ref}(t_k)-VTEC_n^{m}(t_k) \end{aligned}$$
(54)

where \(m\in [GPR,RBF,POLY]\) indicates the type of method.
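For orientation, the two baseline interpolators and the difference of Eq. (54) might be sketched as below. These are generic textbook formulations, not the exact variants of Yilmaz et al. (2009) or Yu et al. (2015): the multiquadric shape parameter `eps`, the total-degree-3 basis with 10 monomials, the toy field and all function names are assumptions.

```python
import numpy as np

def rbf_multiquadric(X, y, Xg, eps=1.0):
    """RBF interpolation with the multiquadric phi(r) = sqrt(1 + (eps * r)^2)."""
    def phi(A, B):
        r = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return np.sqrt(1.0 + (eps * r) ** 2)
    w = np.linalg.solve(phi(X, X), y)   # weights that reproduce y at the IPPs
    return phi(Xg, X) @ w

def poly3_fit(X, y, Xg):
    """Least-squares bivariate polynomial of total degree 3 (10 monomials)."""
    def basis(P):
        u, v = P[:, 0], P[:, 1]
        return np.column_stack([u**i * v**j for i in range(4) for j in range(4 - i)])
    coef, *_ = np.linalg.lstsq(basis(X), y, rcond=None)
    return basis(Xg) @ coef

def vtec_diff(vtec_ref, vtec_est):
    """d_n^k(m) of Eq. (54): reference VTEC minus the estimate of method m."""
    return np.asarray(vtec_ref) - np.asarray(vtec_est)

def toy_field(P):
    """Hypothetical smooth test field (a cubic, so POLY can fit it exactly)."""
    u, v = P[:, 0], P[:, 1]
    return 1.0 + 2.0 * u - v + u * v**2 + 0.5 * u**3

# Normalized coordinates (scaling lat/lon to [0, 1] first helps conditioning)
g = np.linspace(0.0, 1.0, 5)
X = np.array([(a, b) for a in g for b in g])   # 25 "IPPs" on a grid
Xg = np.random.default_rng(2).uniform(0.0, 1.0, size=(10, 2))
d_rbf = vtec_diff(toy_field(Xg), rbf_multiquadric(X, toy_field(X), Xg))
d_poly = vtec_diff(toy_field(Xg), poly3_fit(X, toy_field(X), Xg))
```

The RBF passes exactly through every (noisy) input value, whereas the polynomial smooths them with only 10 free parameters; which behaviour is preferable depends on the noise level and the sampling density.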

Fig. 5

Histograms of the differences derived from the GPR, RBF and POLY methods with added Gaussian noise of \(\sigma _s = 0\) and 6 TECu

The \(d_n^{k}(GPR)\), \(d_n^{k}(RBF)\) and \(d_n^{k}(POLY)\) values were calculated with added noise of \(\sigma _s \in \{0,1,2,3,4,6,8,10\}\) TECu. In Fig. 5, the two-dimensional histograms show the \(d_n^{k}\) values with respect to the reference values for the three interpolation methods with zero and \(\sigma _s = 6\) TECu added noise. Fig. 5 contains six subplots: Fig. 5a, b for the GPR, Fig. 5c, d for the RBF and Fig. 5e, f for the POLY method, the first panel of each pair with zero and the second with 6 TECu simulation noise. The Gaussian process regression model demonstrates its capability to estimate TEC maps effectively from a snapshot measurement dataset. In the error-free scenario, the GPR has better characteristics than the RBF or polynomial fit method (Fig. 5a). In the case of large noise, the polynomial fit is inferior to the other two examined methods (Fig. 5f).

Fig. 6

Global mean of absolute differences with respect to the simulation noise

Fig. 6 shows the global mean of \(|d_n^{k}|\) as a function of the added noise. The error bars represent the global standard deviation of \(d_n^{k}\). In terms of both quality indicators, the GPR performs better over the whole range of increasing measurement noise. However, the performance of the polynomial and radial basis function methods could certainly be improved with better parameter tuning strategies. This simulation is not intended as a comprehensive quality assessment of the three chosen estimation methods, only to show the GPR’s ability to create regional TEC maps from epoch-wise observation data of a sparse monitoring network.

3.2 Real-time ionosphere map creation from RIMS observations

The performance of the GPR model was investigated on nine days in 2020, grouped into three sets of three consecutive days. The first set is January 23-25, when the sunspot numbers were at their lowest in the current 11-year cycle McIntosh et al. (2020). The second set, April 19-21, was chosen because of the high ionospheric activity observed on April 20. The last set is July 19-21. The EGNOS RIMS observation data were used to run the Kalman-filter-aided GPR model. The created TEC maps were compared to the ROB and CODE products, which were regarded as references. To put the derived quality indicators into perspective, the same comparison was also performed between the two reference ionospheric maps. Three types of VTEC differences are derived based on Eq. (55).

$$\begin{aligned} d_n^{k}(m_i,m_j) = VTEC_n^{m_i}(t_k)-VTEC_n^{m_j}(t_k) \end{aligned}$$
(55)

where \(m_i,m_j \in [ROB,CODE,GPR]\) indicates the type of TEC map.
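The quality indicators used in the following figures (global mean, mean absolute value and standard deviation of the differences, plus their per-grid-point counterparts) can be computed from Eq. (55) roughly as follows. The array layout (epochs by grid points), the population standard deviation (ddof = 0) and the helper name are assumptions; for daily statistics, the function would be called once per day on that day's maps.

```python
import numpy as np

def diff_stats(vtec_i, vtec_j):
    """Statistics of d_n^k(m_i, m_j) = VTEC^{m_i} - VTEC^{m_j}, Eq. (55).

    Inputs have shape (K, N): K epochs (e.g. one day of maps) by N grid points.
    Returns overall indicators and per-grid-point maps of mean and std.
    """
    d = np.asarray(vtec_i) - np.asarray(vtec_j)
    overall = {"mean": d.mean(), "abs_mean": np.abs(d).mean(), "std": d.std()}
    per_grid = {"mean": d.mean(axis=0), "std": d.std(axis=0)}
    return overall, per_grid
```

Reporting both the mean and the mean absolute value separates a systematic bias from errors that merely cancel on average: in the test case below the overall mean is zero although every single difference is 1 TECu in magnitude.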

Fig. 7

Daily statistics of the ROB-GPR, CODE-GPR and ROB-CODE differences, depicted by blue, red and yellow lines, respectively. The upper subfigures refer to data from January, the middle subfigures to data from April, and the lower subfigures to data from July

The global mean, the mean absolute value and the standard deviation of the \(d_n^{k}\) values derived from Eq. (55) are shown in Fig. 7a, b and c, respectively, for each day and combination. The mean absolute values of the ROB-CODE combination are close to 0.5 TECu for each day, whereas the combinations of the GPR with the references have a mean absolute value of about 1 TECu. These discrepancies are consistent over the analyzed time intervals. The standard deviation of ROB-CODE varies around 0.6-0.7 TECu. The standard deviations of ROB-GPR and CODE-GPR are less than twice that of ROB-CODE, approximately 0.9-1.2 TECu. The global means of ROB-CODE are also closer to the expected zero than their GPR counterparts.

Fig. 8

Mean of TEC value differences between the ROB-GPR, CODE-GPR and ROB-CODE products at the assessed grid points over the nine examined days

Fig. 9

Standard deviation of TEC value differences between the ROB-GPR, CODE-GPR and ROB-CODE products at the assessed grid points over the nine examined days

Figs. 8 and 9 depict the spatial characteristics of the mean and standard deviation, respectively. There is a visible trend in the mean values. The GPR method tends to overestimate the ionospheric delay in the northern region; hence the reference-minus-GPR differences are negative there. On the other hand, the GPR underestimates the TEC values in the Mediterranean region. This skew does not appear when the two reference maps are compared (Fig. 8c). The mean absolute discrepancies of the GPR are less than 2 TECu at all grid points. The gridwise standard deviations are smaller over land and correlate with the means for both ROB-GPR and CODE-GPR.

Fig. 10

Two-dimensional histograms of the differences between the ROB-GPR, CODE-GPR and ROB-CODE TEC maps with respect to the TEC values of the ROB product

The two-dimensional histograms help to visualize the distribution of TEC differences with respect to the estimated values (Fig. 10). The plotted data are collected from all nine investigated days. A skewness in the GPR data can again be observed in the lower TEC region (Fig. 10a, b). The outliers are contained within \(\pm 5\) TECu. This distortion is not visible in the differences between the reference maps (Fig. 10c).

Fig. 11

Daily TEC variations at six selected locations, assessed from the GPR model and the CODE and ROB products on 2020 April 20

To gain better insight, six grid points were chosen to show the daily variation of the VTEC values (Fig. 11). The shaded area represents the \(2\sigma\) band. There is no shading for the ROB product because no variance information is available. The GPR correctly follows the TEC variation in each case. Note, however, the bias at lower and higher latitudes (Fig. 11a, b and e, f). This flaw in the estimation was already visible in Fig. 8; it is present during the whole day and is not the result of a temporally localized disturbance. The authors assume that the main source of this systematic error is the inaccurate DCB estimation of the RIMS receivers.

4 Conclusion

The presented Gaussian process regression approach is a novel and promising method for deriving ionospheric models from multi-frequency GNSS measurements. It can accurately estimate regional Total Electron Content (TEC) maps from snapshot measurements of a relatively sparse network of monitoring stations. The accuracy of the GPR-based models in this paper is about \(\pm 2\) TECu, which is comparable to the ±1 TECu accuracy of the corresponding models in the literature. In addition to the TEC modelling, the hardware delays were also estimated continuously by a Kalman filter, independently of the GPR. In further research, this loosely coupled setup could be tightened by estimating the DCBs directly within the GPR. The hyperparameters of the GPR are calculated epoch-wise without taking the previous values into account, which opens the possibility of narrowing the hyperparameter search space based on the previous values.