1 Introduction

Knowledge of temporal variations of geoid heights is vitally important in geodesy and Earth science. It is needed for dynamic and kinematic updates of the static geoid model that serves as a reference surface for heights, as well as for the transformation between geometrical ellipsoidal heights obtained from Global Navigation Satellite System (GNSS) measurements and gravity-related heights, e.g. orthometric and normal heights, determined with the use of spirit levelling. Moreover, temporal variations of geoid heights are needed for modelling a precise regional geoid/quasigeoid of sub-centimetre accuracy, which is one of the activities of Commission 2 “Gravity Field” of the International Association of Geodesy (IAG), the Joint Study Group 0.15 (JSG 0.15) “Regional geoid/quasi-geoid modelling—Theoretical framework for the sub-centimetre accuracy” of the Intercommission Committee on Theory (ICCT), established for the period from 2015 to 2019 (see Drewes et al. 2016). Because temporal variations of geoid heights are geographically dependent, they need to be analysed and modelled in different parts of the world using appropriate data and methods.

The Gravity Recovery and Climate Experiment (GRACE) mission data have provided very useful information on temporal variations of mass distribution in the Earth’s system and thereby on temporal variations of geoid heights (Tapley et al. 2004). Thus, since the beginning of this century, several investigations on the determination of temporal variations of geoid heights using GRACE mission data have been conducted. For example, Rangelova (2007) combined GRACE mission data with GNSS, tide gauge/altimetry and absolute gravimetry data to develop a dynamic geoid model for Canada. Rangelova and Sideris (2008) estimated secular geoid height changes in North America using GRACE mission and terrestrial geodetic data. The dynamic geoid model resulting from those studies was implemented as a vertical datum for orthometric heights in Canada (cf. Rangelova et al. 2010).

For the area of Poland, intensive research on modelling the geoid/quasigeoid has been conducted in the last two decades (for more details see Krynski 2007). Currently, the fit of the static quasigeoid model developed over this area to different sets of GNSS/levelling data is estimated at 1.4–2.2 cm in terms of the standard deviation of the differences (e.g. Szelachowska and Krynski 2014). With such a fit, temporal variations of geoid heights are worth investigating and taking into consideration. Krynski et al. (2014) analysed temporal variations of the Earth’s gravity field over the whole area of Europe, including Poland and surrounding areas. The authors analysed release RL04 GRACE-based Global Geopotential Models (GGMs) using Fourier analysis (e.g. Bloomfield 2000) and seasonal decomposition (cf. Makridakis et al. 1983) methods. They showed that amplitudes of temporal variations of geoid heights within the area of Central Europe reach up to 7 mm. Godah et al. (2017a, b) analysed and modelled temporal variations of geoid heights determined from the latest release, i.e. release 5 (RL05), of monthly GRACE-based GGMs for the area of Poland divided into four 3° × 5° subareas, using the same methods as Krynski et al. (2014) but for substantially smaller subareas. They revealed that temporal variations of geoid heights reach up to 11 mm. The authors indicated that these variations can be modelled with an accuracy of 0.5 mm using the seasonal decomposition method. They also showed that models of temporal variations of geoid heights developed with the use of the seasonal decomposition method were highly correlated, i.e. 96.56–97.56%, with temporal variations of geoid heights computed using monthly RL05 GRACE-based GGMs. Moreover, Godah et al. (2017a) showed that the seasonal component and the trend (long-term) component of temporal variations of geoid heights obtained with the seasonal decomposition method are preferable for the prediction of temporal variations of geoid heights over the area of Poland.

The main limitation of the Fourier analysis and the seasonal decomposition methods is that they cannot be implemented without prior information, in particular, information concerning periodic signal, i.e. repeated seasonal cycles, of temporal variations of geoid heights. One of the popular analysis and modelling methods used to overcome that limitation is the so-called Principal Component Analysis or Empirical Orthogonal Function (PCA/EOF) method (e.g. Preisendorfer and Mobley 1988; Jolliffe 2002). The PCA/EOF method is one of the statistical decomposition methods that are data driven, thus, one would hope it can model trends and seasonal components of temporal variations of geoid heights quite well. This method has successfully been used by different authors for the analysis and modelling temporal variations of mass distribution within the Earth’s system obtained from GRACE mission data. For example, De Viron et al. (2006) used this method to study the inter-annual continental hydrology signal related to the El Niño–Southern Oscillation (ENSO) obtained from GRACE mission data. Rangelova (2007) and Rangelova and Sideris (2008) applied the PCA/EOF method for modelling secular rates of geoid changes in North America. Anjasmara and Kuhn (2010) implemented this method to analyse the equivalent water height variations obtained from GRACE mission data. Overall, those studies demonstrate the usefulness of the PCA/EOF method for analysing and modelling GRACE mission data. Moreover, Forootan and Kusche (2012) examined the PCA/EOF method and introduced the independent component analysis (ICA) method for separating global time-variable gravity signals. Rangelova et al. (2010) studied the capabilities of the multi-channel singular spectrum analysis (MSSA) method, which is mathematically equivalent to the extended empirical orthogonal functions (EOFs) for extracting water mass anomalies from GRACE data on a global scale and on the Amazon, Congo and Mississippi river basins. 
Those studies revealed that the ICA and the MSSA are superior to the PCA/EOF method on a global scale and over the investigated river basins. None of those statistical decomposition methods has yet been implemented for analysing and modelling temporal variations of mass distribution within the Earth’s system obtained from GRACE mission data over the area of Poland, which is relatively small (rather local in scale) and where mass variation dynamics is smaller than over large river basins. The main objective of this contribution is to analyse and model temporal variations of geoid heights determined from GRACE mission data over the area of Poland using the PCA/EOF method.

2 Data set

The area of Poland, bounded by the parallels of 49°N and 55°N and the meridians of 14°E and 24°E, has been chosen as the investigation area. For this area, the appropriate spatial resolution of RL05 GRACE-based GGMs filtered using the decorrelation (DDK3; see Kusche et al. 2009) filter and truncated at degree/order 60 is about 3° × 3° (Godah et al. 2017a). Thus, the area of investigation has been divided into four subareas (Fig. 1). Moreover, Godah et al. (2017a) revealed that temporal variations of geoid heights obtained from RL05 GRACE-based GGMs developed by the CSR and JPL centres over the area of Poland are very similar to the corresponding ones of the GeoForschungsZentrum (GFZ) centre. The differences between temporal variations of geoid heights calculated from GGMs provided by these centres and the respective ones from the Water Global Assessment and Prognosis (WaterGAP) Global Hydrology Model (WGHM; Döll et al. 2003) do not exceed 0.2 mm in terms of standard deviation values. Thus, monthly RL05 GRACE-based GGMs of the GFZ centre (Dahle et al. 2014), developed using the tide-free system and the conventional constants Earth radius R = 6,378,136.460 m and GM = 0.3986004415 × 10^15 m^3 s^−2, have been used in this investigation. They were downloaded from the International Centre for Global Earth Models (ICGEM, http://icgem.gfz-potsdam.de/ICGEM/ICGEM.html). The geoid heights N on a monthly basis for the period between 04/2002 and 03/2016 were determined at the centre point of each of those four subareas using the ICGEM service (cf. Barthelmes 2013). Then, temporal variations of geoid heights ΔN (GGM) were obtained as follows:

$$ \Delta N_{{({\text{GGM}})i}} = N_{i} - N_{mean} $$
(1)

where i = 1, 2, 3, …, 168 is the index of the monthly solution within the period from 04/2002 to 03/2016, and N_mean is the mean value of the time series of N_i.
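Equation (1) can be illustrated with a minimal Python/NumPy sketch. The function name `geoid_height_variations` and the sample values are hypothetical and serve only as an illustration; they are not GRACE-derived geoid heights.

```python
import numpy as np

def geoid_height_variations(n_series):
    """Return dN_i = N_i - N_mean for a 1-D series of monthly geoid heights."""
    n_series = np.asarray(n_series, dtype=float)
    return n_series - n_series.mean()

# Hypothetical geoid heights in metres (illustrative values only, not GRACE data)
n_monthly = np.array([30.1234, 30.1251, 30.1262, 30.1240, 30.1219, 30.1228])
delta_n_mm = 1000.0 * geoid_height_variations(n_monthly)  # metres -> millimetres
```

By construction, the resulting series has zero mean over the period considered.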

Fig. 1 The area of investigation and its four subareas

Within the period investigated, i.e. from 04/2002 to 03/2016, there were thirteen gaps in GFZ RL05 GRACE-based GGMs time series resulting from missing GRACE mission observations for some periods. In order to get continuous time series of ΔN (GGM) for the analysis and modelling tasks (cf. Sect. 4), temporal variations of geoid heights at those gaps were interpolated using the second degree polynomial (cf. Godah et al. 2017b). Time series of temporal variations of geoid heights determined with the use of GFZ RL05 GRACE-based GGMs and gaps in these series are shown in Fig. 2. A distinctive seasonal pattern of temporal variations of geoid heights with maximum values at the beginning of spring and minimum values at the end of summer can be observed. Figure 2 also reveals that ΔN (GGM) differences can reach 2 mm between two subareas at the same epoch, and from epoch to epoch ΔN (GGM) differences can reach 10 mm at the same subarea and 11 mm between two subareas (cf. Godah et al. 2017a, b).
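The gap filling can be sketched as follows. The text only states that a second-degree polynomial was used, so the local fitting window (the hypothetical parameter `window`, i.e. the number of available epochs taken on each side of a gap) and the synthetic test series are assumptions of this sketch rather than the procedure of Godah et al. (2017b).

```python
import numpy as np

def fill_gaps_quadratic(t, y, window=3):
    """Fill NaN entries of y with a local second-degree polynomial fit.

    `window` (number of valid neighbours taken on each side of a gap) is a
    hypothetical choice for this sketch, not a value from the paper.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    valid = np.flatnonzero(~np.isnan(y))           # epochs with observations
    for i in np.flatnonzero(np.isnan(y)):          # epochs to be interpolated
        left = valid[valid < i][-window:]
        right = valid[valid > i][:window]
        idx = np.concatenate([left, right])
        # Fit in time offsets relative to the gap epoch for better conditioning
        coeffs = np.polyfit(t[idx] - t[i], y[idx], deg=2)
        y[i] = coeffs[-1]                          # polynomial value at offset 0
    return y

# Synthetic example: an exact quadratic with two missing months
t = np.arange(12.0)
truth = 0.05 * t**2 - 0.4 * t + 1.0
series = truth.copy()
series[[4, 8]] = np.nan
filled = fill_gaps_quadratic(t, series)
```

At least three available epochs around a gap are needed for the quadratic fit to be determined; the sketch does not guard against shorter neighbourhoods.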

Fig. 2 Time series of temporal variations of geoid heights ΔN (GGM); grey columns indicate gaps in ΔN (GGM) time series

3 The PCA/EOF method

The fundamentals of the PCA/EOF method have widely been discussed in the Earth-science related textbooks (e.g. Preisendorfer and Mobley 1988; Jolliffe 2002) as well as by many different authors (e.g. De Viron et al. 2006; Rangelova 2007; Rangelova and Sideris 2008; Anjasmara and Kuhn 2010; Kusche et al. 2011; Forootan and Kusche 2012). The general idea of the PCA/EOF method is to reduce the dimensionality and represent the original data matrix X by the product of two matrices T and P

$$ {\mathbf{X}} = {\mathbf{TP}}^{T} + {\varvec{\upvarepsilon}} $$
(2)

where T consists of the PCA modes, P defines the EOF loading patterns and reflects the contributions of the original variables to the various PCA modes, and ε represents the unmodelled part of the signal. In this investigation, the matrix X is obtained from the ΔN (GGM) time series of the four subareas, normalized by their standard deviations.

The solution obtained from the PCA/EOF method relies on determining the matrices P and T. It should be emphasized that this study is the first to analyse and model temporal variations of geoid heights over Poland using the PCA/EOF method. Thus, a comparison of different algorithms for estimating the PCA modes and EOF loading patterns is needed. In this study, the Non-linear Iterative Partial Least Squares (NIPALS) and the Singular Value Decomposition (SVD) algorithms were investigated (e.g. Wu et al. 1997). The NIPALS algorithm decomposes the data and calculates PCA modes and EOF loading patterns sequentially, whereas the SVD algorithm extracts them simultaneously. NIPALS is an iterative algorithm and can be stopped after any number of PCA modes; it is therefore very efficient when only the first few PCA modes are required. The NIPALS and SVD algorithms are numerically different and can thereby lead to different estimates of the PCA modes and EOF loading patterns. Moreover, the accuracy of the PCA modes and EOF loading patterns can be controlled by a threshold that reflects the convergence of the solution. For more details about the differences between the NIPALS and the SVD algorithms, readers are referred to Wu et al. (1997). The algorithms are briefly described below (e.g. Wold et al. 1987; Cordella 2012):

The NIPALS algorithm is an iterative procedure to estimate PCA modes and EOF loading patterns. The vector t contains a PCA mode while the vector p contains the corresponding loadings. A convergence threshold, kept constant throughout the procedure (e.g. threshold = 10^−4), is set beforehand.

The following steps are performed in the NIPALS algorithm:

  1. Set up t as a column of X, e.g. the one with the largest variance.

  2. Project X onto t to calculate the corresponding loading p

    $$ {\mathbf{p}} = \frac{{{\mathbf{X}}^{\text{T}} {\mathbf{t}}}}{{{\mathbf{t}}^{\text{T}} {\mathbf{t}}}} $$

  3. Normalize the loading vector p to length 1

    $$ {\mathbf{p}} = \frac{{\mathbf{p}}}{{\sqrt {{\mathbf{p}}^{\text{T}} {\mathbf{p}}} }} $$

  4. Project X onto p to calculate the corresponding new vector t

    $$ {\mathbf{t}} = \frac{{{\mathbf{X}} {\mathbf{p}}}}{{{\mathbf{p}}^{\text{T}} {\mathbf{p}}}} $$

  5. Check the convergence. If the difference between τ_new = t^T t and τ_old (from the previous iteration) is larger than threshold · τ_new, return to step 2.

  6. Remove the estimated PCA mode and EOF loading pattern from X:

    $$ {\mathbf{E}} = {\mathbf{X}} - {\mathbf{tp}}^{\text{T}} $$

  7. To estimate further PCA modes and EOF loading patterns, repeat the procedure from step 1 using the obtained matrix E in place of X.
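The seven steps above can be sketched in Python/NumPy as follows. This is a minimal illustration rather than the STATISTICA implementation used in the study; the function name `nipals_pca`, the safety cap `max_iter` and the synthetic 168 × 4 input matrix are assumptions introduced here.

```python
import numpy as np

def nipals_pca(X, n_modes=3, threshold=1e-4, max_iter=500):
    """Sequentially estimate PCA modes T and EOF loading patterns P with NIPALS."""
    E = np.asarray(X, dtype=float).copy()
    n, m = E.shape
    T = np.zeros((n, n_modes))
    P = np.zeros((m, n_modes))
    for k in range(n_modes):
        t = E[:, np.argmax(E.var(axis=0))].copy()  # step 1: column with largest variance
        tau_old = 0.0
        for _ in range(max_iter):                  # max_iter is a safety cap (assumption)
            p = E.T @ t / (t @ t)                  # step 2: loading
            p /= np.sqrt(p @ p)                    # step 3: normalize to unit length
            t = E @ p / (p @ p)                    # step 4: new mode
            tau_new = t @ t
            if abs(tau_new - tau_old) <= threshold * tau_new:  # step 5: convergence
                break
            tau_old = tau_new
        T[:, k], P[:, k] = t, p
        E = E - np.outer(t, p)                     # step 6: remove the estimated mode
    return T, P                                    # step 7 is the loop over k

# Synthetic stand-in for the 168 x 4 matrix of normalized dN series (not GRACE data)
rng = np.random.default_rng(0)
months = np.arange(168)
signal = np.sin(2 * np.pi * months / 12.0)
X = np.outer(signal, [1.0, 0.9, 1.1, 0.95]) + 0.1 * rng.standard_normal((168, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
T, P = nipals_pca(X)
```

Because the loop over `k` can stop after any mode, only the leading modes need to be computed, which is the efficiency argument made in the text.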

The SVD algorithm is based on a theorem from linear algebra, in which a rectangular matrix X is decomposed into the product of three new matrices:

$$ {\mathbf{X}} = {\mathbf{USV}}^{\text{T}} $$

where the columns of U are orthonormal eigenvectors of XX^T, the columns of V are orthonormal eigenvectors of X^T X, and S is a diagonal matrix containing the singular values of X, i.e. the square roots of the eigenvalues of X^T X (which coincide with the non-zero eigenvalues of XX^T), in decreasing order. The column vectors of V define the EOF loading patterns. The PCA modes are obtained from the column vectors of the matrix T = US.

The computation steps of the SVD algorithm are as follows:

  1. Compute X^T and X^T X.

  2. Compute the eigenvalues of X^T X, and sort them in descending order, by solving

    $$ \left| {{\mathbf{X}}^{\text{T}} {\mathbf{X}} - \lambda {\mathbf{I}}} \right| = 0 $$

  3. Compute the square roots of the eigenvalues of X^T X to obtain the singular values of X.

  4. Build a diagonal matrix S by placing the singular values in descending order along its diagonal, and compute S^−1.

  5. Use the eigenvalues from step 2 in descending order to compute the eigenvectors of X^T X. Place these eigenvectors along the columns of V.

  6. Compute U = XVS^−1 and compute the PCA modes T = US.
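The six steps above can be followed literally in Python/NumPy, as in the sketch below. The function name `pca_via_eig` and the random test matrix are hypothetical, and the sketch is not the MATLAB implementation used in the study. The inverse of S exists only if X has full column rank, which is assumed here.

```python
import numpy as np

def pca_via_eig(X):
    """Follow the listed steps: eigen-decompose X^T X, then form S, V, U and T."""
    X = np.asarray(X, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)      # steps 1-2 (eigh returns ascending order)
    order = np.argsort(eigvals)[::-1]               # sort in descending order
    eigvals, V = eigvals[order], eigvecs[:, order]  # step 5: eigenvectors as columns of V
    s = np.sqrt(np.clip(eigvals, 0.0, None))        # step 3: singular values of X
    S = np.diag(s)                                  # step 4: diagonal matrix S
    U = X @ V @ np.linalg.inv(S)                    # step 6 (assumes full column rank)
    T = U @ S                                       # PCA modes
    return U, s, V, T

# Random stand-in for a normalized 168 x 4 data matrix (not GRACE data)
rng = np.random.default_rng(1)
X = rng.standard_normal((168, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, V, T = pca_via_eig(X)
s_ref = np.linalg.svd(X, compute_uv=False)          # library SVD for comparison
```

In practice, calling a library SVD routine directly, such as `np.linalg.svd`, is usually preferable numerically to forming X^T X explicitly; the step-by-step version is shown only because it mirrors the description.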

In this study, the NIPALS algorithm has been employed using the STATISTICA software (http://www.statsoft.pl/), whilst the SVD algorithm has been implemented with the use of the MATLAB software, in particular, the function “pca” (https://www.mathworks.com/help/stats/pca.html).

The percentages of the total variance of temporal variations of geoid heights σ (total) were estimated as follows:

$$ \sigma_{k}^{(total)} = \frac{{\lambda_{k} }}{{\Delta^{2} }} $$
(3)

where λ_k denotes the k-th eigenvalue estimated from the matrix X, k is the index of the PCA mode and ∆^2 is the total variance of temporal variations of geoid heights.
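A minimal Python/NumPy sketch of Eq. (3) is given below. It assumes that the eigenvalues are those of X^T X (the squared singular values of X), that the total variance equals the sum of all eigenvalues, and that the ratio is multiplied by 100 to express it in percent; the function name `variance_percentages` and the small test matrix are hypothetical.

```python
import numpy as np

def variance_percentages(X):
    """Percentage of total variance carried by each PCA mode of X."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    eigvals = s ** 2                      # eigenvalues of X^T X, in descending order
    return 100.0 * eigvals / eigvals.sum()

# Small illustrative matrix with orthogonal columns: X^T X = diag(8, 2)
X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]])
pct = variance_percentages(X)             # 80 % and 20 %
```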

4 Results

4.1 Analysis of temporal geoid height variations

With the use of the PCA/EOF method, temporal variations of geoid heights obtained from RL05 GRACE-based GGMs for four subareas and normalized by their standard deviations were analysed. The percentages of the total variance of temporal variations of geoid heights reflected by three PCA modes are given in Table 1. They reveal that, over the area of the investigation, ~99.93% of ΔN (GGM) variance can be obtained using the first three PCA modes and EOF loading patterns. The first PCA/EOF accounts for the most significant variance of these variations, i.e. ~96.4%. The second and third PCA/EOF reflect less than 3.5% of total variance of temporal variations of geoid heights over the area investigated.

Table 1 The total variance reflected by the first three PCA modes [%]

Figure 3 shows the first three PCA modes and their corresponding EOF loading patterns of temporal variations of geoid heights over the area of Poland. The first PCA mode reveals a clear seasonal pattern of temporal variations of geoid heights, with maximum values in March and minimum values in July–September. This seasonal pattern is strongly correlated with the increases/decreases of water masses over the area investigated, which are due to the melting of snow that had been accumulated in the winter season, and the water evaporation during dry months within the summer season (cf. Krynski et al. 2014; Godah et al. 2017a, b). Figure 3 also shows that the first EOF loading pattern ranges from ~0.8526 to ~1.0741. It indicates very similar loading patterns of temporal variations of geoid heights obtained for all four subareas. This is because the characteristics of ΔN (GGM) are very similar, i.e. at the same epoch ΔN (GGM) differences between subareas do not exceed 2 mm, for all four subareas (see Fig. 2).

Fig. 3 The PCA modes (right panels) and EOF loading patterns (left panels; blue numbers estimated using the NIPALS algorithm and red numbers estimated using the SVD algorithm) of time series of normalized ΔN (GGM). (Color figure online)

The second and third PCA/EOF do not show a clear pattern, e.g. a seasonal (periodic) pattern, a trend (secular) pattern or a consistent loading pattern, in the ΔN (GGM) time series investigated. This might be due to the facts that (1) the percentages of total variance reflected by the second PCA/EOF (less than 2.22%) and the third PCA/EOF (less than 1.41%) are much smaller than the total variance reflected by the first PCA/EOF (greater than 96.32%), and (2) temporal mass variations within the Earth system over the area investigated are driven mainly by seasonal water mass changes, which are already clearly revealed by the first PCA/EOF. Thus, the second and the third PCA/EOF are most likely insignificant and can be interpreted merely as white noise in the ΔN (GGM) time series investigated.

The results obtained indicate that the PCA modes estimated using the SVD algorithm and the NIPALS algorithm are almost identical. The differences between those modes are depicted in Fig. 4. They are at the level of ±0.03 for the first PCA mode and ±0.05 for the second and third PCA modes. The results also show that the differences between the first EOF loading patterns estimated with the use of the SVD algorithm and the NIPALS algorithm are approximately −0.124, +0.037, −0.024 and +0.096 for subareas 1, 2, 3 and 4, respectively.

Fig. 4 Histograms showing differences between the PCA modes of normalized ΔN (GGM) estimated using the SVD algorithm and the NIPALS algorithm

4.2 Modelling of temporal geoid height variations

Models of temporal variations of geoid heights ΔN (PCA/EOF) were developed for the aforementioned four subareas (see Fig. 1) on the basis of the first PCA/EOF estimated using the SVD algorithm and the NIPALS algorithm. The differences between ΔN (PCA/EOF) estimated using the SVD algorithm and the NIPALS algorithm are shown in Fig. 5.
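A model based on the first PCA/EOF can be sketched in Python/NumPy as the rank-one product of the first PCA mode and the first EOF loading pattern. Rescaling by the standard deviations to return from normalized values to millimetres is an assumption of this sketch, as are the function name `first_mode_model` and the synthetic input series; the sketch uses the SVD variant only.

```python
import numpy as np

def first_mode_model(dn):
    """Rank-1 PCA/EOF model of an (epochs x subareas) array of dN values."""
    dn = np.asarray(dn, dtype=float)
    mean = dn.mean(axis=0)
    std = dn.std(axis=0, ddof=1)
    X = (dn - mean) / std                      # normalization by standard deviations
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    t1 = U[:, 0] * s[0]                        # first PCA mode
    p1 = Vt[0]                                 # first EOF loading pattern
    return np.outer(t1, p1) * std + mean       # undo the normalization (assumption)

# Synthetic seasonal series for four subareas, in mm (illustrative, not GRACE data)
rng = np.random.default_rng(2)
months = np.arange(168)
seasonal = 4.0 * np.sin(2 * np.pi * months / 12.0)
dn = np.outer(seasonal, [1.0, 0.9, 1.1, 0.95]) + 0.3 * rng.standard_normal((168, 4))
dn -= dn.mean(axis=0)                          # zero-mean series, as in Eq. (1)
model = first_mode_model(dn)
```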

Fig. 5 Differences between ΔN (PCA/EOF) estimated using the SVD algorithm and the corresponding ones estimated using the NIPALS algorithm

The results presented in Fig. 5 illustrate that ΔN (PCA/EOF) obtained with the use of the SVD algorithm and the NIPALS algorithm are very similar. The dispersion (maximum minus minimum) of the differences between them does not exceed ±0.01 mm, which is consistent with the differences between the first PCA modes obtained with the two algorithms being at the level of ±0.03 (cf. Fig. 4). In terms of mean values, the differences can reach up to ±0.02 mm; these mean differences are mainly due to differences in the EOF loading patterns (cf. Fig. 3). The overall similarity might be due to the fact that the characteristics of ΔN for the four subareas investigated are quite similar (cf. Fig. 2) and the first PCA mode reflects more than 96.3% of the total variance of ΔN (cf. Table 1). Thus, it can be concluded that the differences between modelled temporal variations of geoid heights developed on the basis of the PCA/EOF method using the SVD algorithm and the NIPALS algorithm are practically negligible.

Figure 6 shows models of ΔN (PCA/EOF) determined using the NIPALS algorithm and their corresponding temporal variations of geoid heights obtained using GFZ RL05 GRACE-based GGMs, i.e. ΔN (GGM).

Fig. 6 Time series of temporal variations of geoid heights ΔN (GGM) and models of temporal variations of geoid heights ΔN (PCA/EOF) obtained using the PCA/EOF method and the NIPALS algorithm

The results presented in Fig. 6 show that temporal variations of geoid heights ΔN (PCA/EOF) modelled using the first PCA/EOF are in good agreement with temporal variations of geoid heights ΔN (GGM) obtained from monthly RL05 GRACE-based GGMs. This is consistent with the results in Table 1, which indicate that the first PCA mode accounts for over 96% of the total variance of ΔN (GGM).

The correlations between ΔN (GGM) and modelled temporal variations of geoid heights obtained using the first PCA/EOF, expressed in percent, are given in Table 2. The estimated correlations between ΔN (GGM) and ΔN (PCA/EOF) range from 97.65 to 98.63%. Table 3 shows statistics of the differences between ΔN (GGM) and ΔN (PCA/EOF).

Table 2 The correlation between ΔN (GGM) and ΔN (PCA/EOF) (%)
Table 3 Statistics of the differences between ΔN (GGM) and ΔN (PCA/EOF) (mm)

Differences between ΔN (GGM) and ΔN (PCA/EOF) range from − 1.1 to + 1.3 mm, and estimated standard deviations of these differences are 0.3–0.4 mm (Table 3). The comparison between ΔN (PCA/EOF) and models of temporal variations of geoid heights presented in Godah et al. (2017a) indicated that when using the PCA/EOF method for modelling temporal variations of geoid heights, the dispersion and the standard deviation of the differences can be reduced by 0.3–0.5 mm and 0.1–0.2 mm, respectively. Furthermore, the correlation coefficients between ΔN (GGM) and ΔN (PCA/EOF) are ~1.1% higher than the corresponding ones indicated in Godah et al. (2017b). This may reveal that the PCA/EOF method is slightly better than the seasonal decomposition method for modelling temporal variations of geoid heights over the area investigated.
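Agreement measures of the kind quoted above (correlation in percent, extreme values, standard deviation and dispersion of the differences) can be computed as in the following Python/NumPy sketch. The function name `agreement_stats`, the choice of statistics returned and the four-element sample series are assumptions made for illustration; they do not reproduce the contents of Tables 2 and 3.

```python
import numpy as np

def agreement_stats(observed, modelled):
    """Correlation (in percent) and statistics of the differences observed - modelled."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    diff = observed - modelled
    return {
        "correlation_percent": 100.0 * np.corrcoef(observed, modelled)[0, 1],
        "min": diff.min(),
        "max": diff.max(),
        "mean": diff.mean(),
        "std": diff.std(ddof=1),
        "dispersion": diff.max() - diff.min(),   # maximum minus minimum
    }

# Hypothetical values in mm (illustrative only)
observed = np.array([1.0, 2.0, 3.0, 4.0])
modelled = np.array([1.1, 1.9, 3.2, 3.8])
stats = agreement_stats(observed, modelled)
```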

5 Conclusions

This paper discusses the analysis and modelling of temporal variations of geoid heights over the area of Poland using the Principal Component Analysis/Empirical Orthogonal Function (PCA/EOF) method. Temporal variations of geoid heights ∆N for the period from 04/2002 to 03/2016 over the area of Poland, divided into four subareas of 3° × 5°, were determined from GRACE mission data. From epoch to epoch, differences in ∆N can reach 10 mm within the same subarea and 11 mm between two subareas. These ∆N variations should be considered when determining orthometric heights from GNSS-derived ellipsoidal heights in different seasons. They should also be considered to improve the static quasigeoid model developed over the area of Poland, which currently fits GNSS/levelling data with an accuracy of 14–22 mm in terms of the standard deviation of the differences.

Two algorithms: the Singular Value Decomposition (SVD) algorithm and the Non-linear Iterative Partial Least Squares (NIPALS) algorithm, were implemented to estimate the PCA modes and their corresponding EOF loading patterns.

The results revealed that ~99.93% of total variance of temporal variations of geoid heights can be obtained using the first three PCA modes and EOF loading patterns. The significant signal, i.e. greater than 96.3% in terms of total variance, of temporal variations of geoid heights over the area of Poland can be obtained from the first PCA mode and EOF loading pattern.

Models of temporal variations of geoid heights developed using the PCA/EOF method are satisfactory. The fit, in terms of standard deviations of the differences between the models ΔN (PCA/EOF) obtained with the PCA/EOF method and the respective values ΔN (GGM) determined from RL05 GRACE-based GGMs, is 0.3–0.4 mm. The ΔN (PCA/EOF) values are highly correlated, i.e. 97.65–98.63%, with the ΔN (GGM).

Overall, the PCA/EOF method is recommended for modelling temporal variations of geoid heights over the area of Poland. It provides slightly better results than the other methods implemented so far over the area investigated, i.e. the Fourier analysis and the seasonal decomposition methods. The differences between ΔN (PCA/EOF) models obtained with the SVD algorithm and the NIPALS algorithm are practically negligible: their dispersion does not exceed ±0.01 mm.