A new hybrid method to improve the ultra-short-term prediction of LOD

Accurate, short-term predictions of Earth orientation parameters (EOP) are needed for many real-time applications including precise tracking and navigation of interplanetary spacecraft, climate forecasting, and disaster prevention. Out of the EOP, the LOD (length of day), which represents the changes in the Earth’s rotation rate, is the most challenging to predict since it is largely affected by the torques associated with changes in atmospheric circulation. In this study, the combination of Copula-based analysis and singular spectrum analysis (SSA) method is introduced to improve the accuracy of the forecasted LOD. The procedure operates as follows: First, we derive the dependence structure between LOD and the Z component of the effective angular momentum (EAM) arising from atmospheric, hydrologic, and oceanic origins (AAM + HAM + OAM). Based on the fitted theoretical Copula, we then simulate LOD from the Z component of EAM data. Next, the difference between LOD time series and its Copula-based estimation is modeled using SSA. Multiple sets of short-term LOD prediction have been done based on the IERS 05 C04 time series to assess the capability of our hybrid model. The results illustrate that the proposed method can efficiently predict LOD.

space geodesy orbitography (i.e., precise orbit determination) because of its coupling with the orbit node. However, the LOD prediction is extremely difficult due to extreme events such as El Niño which demonstrated themselves in the LOD signals (Holton and Dmowska 1989;Gross et al. 1996).
Several techniques have been developed to improve the accuracy of LOD prediction. These algorithms could be classified into two groups: first, the methods that use the information within the LOD time series, e.g., auto-covariance (AC) (Kosek et al. 1998;Kosek 2002), wavelet decomposition (Akyilmaz et al. 2011), or neural network (Schuh et al. 2002;Liao et al. 2012;Lei et al. 2015Lei et al. , 2017. Besides, this group includes the hybrid methods using the combination of least squares (LS) and auto-regressive (AR), auto-regressive moving average (ARMA), auto-covariance, and neural network (Kosek et al. 2005;Xu and Zhou 2015;Wu et al. 2019). In the second group, we cast the methods that take into account the axial component of effective angular momentum (EAM Z ) (Freedman et al. 1994;Gross et al. 1998;Niedzielski and Kosek 2008;Kosek 2012;Nastula et al. 2012;Dill et al. 2019). Freedman et al. (1994) showed that the use of atmospheric angular momentum (AAM) wind terms in the Kalman filter technique to predict LOD variations improved near-term predictions.  used UT1-like observations determined by AAM in the UT1-UTC combination solution to predict UT1 which showed a significant reduction in the prediction errors when compared with the previous prediction method ( McCarthy and Luzum 1991). Also, Dill et al. (2019) used 6 days long predicted EAM values for the PM and UT1-UTC prediction using LS extrapolation and AR model. The Earth orientation parameters prediction comparison campaign (EOP PCC) took place within [2005][2006][2007][2008][2009], and it was organized to assess the various prediction techniques under the same conditions and rules. One of the main results of the EOP PCC was that there is not a specific method preferred for all EOP and all prediction intervals (ultra-short term and long term). Also, as the EOP prediction accuracy benefits from AAM forecast data, EOP PCC recommended paying more attention to the analysis and prediction of atmospheric angular momentum (AAM), continental hydrology angular momentum (HAM), and ocean angular momentum (OAM) (Kalarus et al. 2010). Therefore, a new prediction method is required to fully capture the dependence structure between AAM, HAM, OAM, and EOP. Although there are approaches to quantify the dynamical relation between the geophysical fluids (the atmosphere, the ocean, and the land hydrology) and the LOD variation (Gross 2015), we ignore them in our work as we exactly want to assess this relation directly independent of theoretical implications through comparing the LOD and the effective angular momentum function z-component by the Copula method. In this paper, we introduce an algorithm to improve the LOD prediction for reaching the accuracy goals pursued by the Global Geodetic Observing System (GGOS) of the International Association of Geodesy (IAG), i.e., 1 mm accuracy and 0.1 mm/year stability on global scales in terms of the International Terrestrial Reference Frame (ITRF) defining parameters (Plag and Pearlman 2009). We explored the combination of Copula-based analysis and singular spectrum analysis (SSA) to predict LOD. In Modiri et al. (2018), we applied the combination of SSA and Copula for the first time as a novel deterministic-stochastic tool for PM prediction. In this method, deterministic part is estimated by SSA, whereas Copula is used for modeling the stochastic part. The results indicated that the proposed approach can efficiently predict PM. Moreover, the improvement in PM prediction accuracy up to 365 days in the future is found to be 40% on average compared to the current PM prediction data from the International Earth Rotation and Reference Systems Service (IERS) Rapid Service/Prediction Center (RS/PC), hosted by the US Naval Observatory (USNO) (Petit and Luzum 2010). The Copula method contains both linear and nonlinear dependence structures between variables, and it is a powerful tool for dealing with multi-dimensional data and for modeling the relationship between parameters (Joe 1997). SSA is a subspace-based technique which makes use of empirical functions derived from the data to model the time series in a pre-specified level of details. It can be used for trend extraction and extrapolation (Alexandrov 2009;Modiri et al. 2018), periodicity detection, seasonal adjustment, smoothing, noise reduction (Golyandina et al. 2001;Ghil et al. 2002) as well as for change point detection (Moskvina and Zhigljavsky 2003;Hoseini et al. 2020). First, we derive the dependence structure between LOD and EAM Z . Based on the fitted theoretical Copula, we then simulate LOD from the EAM Z data. Next, the difference between LOD time series and its Copula-based estimation is modeled using SSA. After that, the LOD will be computed from predicted EAM Z . Finally, the difference will be predicted and will be added to LOD predicted by Copula. Multiple sets of ultra-short-term (10 days) LOD prediction have been made based on IERS 05 C04 time series to assess the capability of our hybrid model. We consider the same conditions as EOP PCC to show the effectiveness of the presented method. We compare the prediction results with those of existing techniques of EOP PCC, and the results evidence that the proposed approach can efficiently predict LOD.

Methods
The combination of stochastic and deterministic methods is used for LOD prediction. The Copula-based analysis technique aims to estimate the models for capturing the dependence structure between observed LOD and the EAM Z . Also, SSA is used as a deterministic technique to obtain stochastic residuals (the difference between the observed data and the Copula generated data). Finally, let us remark that we used the IERS EOP time series available at the time of the EOP PCC so that our results could be easily compared to the former analyses. In the following section, the theoretical background of Copula theory and SSA is briefly sketched.

Copula-based analysis
The Copula approach exploits linear and nonlinear dependency between variables. Copula is a flexible tool offering an enormous improvement in capturing the real correlation pattern. This technique provides the grounds for dealing with multi-dimensional data and modeling the relation between parameters based on the marginal distribution functions of the variables (Embrechts et al. 2002).
Copula appeared in the mathematics context for the first time by Sklar (1959). Sklar's theorem indicates that a Copula function C connects a given multivariate distribution function with its univariate marginal. For bivariate distribution, there is a bivariate Copula C which models the joint cumulative probability distribution function of two variables X and Y based on the marginal cumulative distribution functions F X (x) and F Y (y).
where C describes the joint distribution function F X ,Y (x, y). The variables u and v are transformations of X and Y to uniform distribution, respectively. The Copula is unique when the marginals are continuing functions. As the Copula is a reflection of the dependence structure itself, its construction is reduced to the study of the relationship between the variables, giving freedom for the choice of the univariate marginal distribution. Further information about Copula can be found, e.g., in Joe (1997) and Nelsen (2006). For many years, the Copula method has been used for modeling the dependence structure between random variables in different types of studies, such as Economics (Rachev and Mittnik 2000;Patton 2006Patton , 2009, Biomedicine (Wang and Wells 2000;Escarela and Carriere 2003), Hydrology (Bárdossy and Li 2008;Bárdossy and Pegram 2009;Verhoest et al. 2015), Meteorology (Laux et al. 2011;Vogl et al. 2012), Hydro-geodesy, and Geodesy (Modiri et al. 2015(Modiri et al. , 2018. Six different bivariate Copula families are used in this research: the Archimedean 12, Archimedean 14, Clayton, Frank, Gumbel, and Joe. Further information about mathematical details on these families can be found in Appendix A, Nelsen (2006), or in Salvadori and De Michele (2007).

Singular Spectrum Analysis
SSA is a time series analysis tool which can be used to retrieve robust components of a dataset aiming to provide an easier to interpret picture of complex observations. The method diagonalizes a lag-covariance matrix concerning a basis of orthogonal eigenvectors and computes the corresponding eigenvalues (Groth and Ghil 2015). SSA is able to reveal useful information about hidden underlying processes of a time series. Within its four steps, SSA groups correlated information in a time series and offers the opportunity of reproducing new versions of the time series based on their different characteristics [see Appendix B, Golyandina et al. (2001), and Ghil et al. (2002), for more details].

Error analysis
The mean absolute error (MAE) is used in order to evaluate the prediction accuracy. The MAE is calculated for the kth day in the future as follows: where P i is the predicted value of the ith prediction day, O i is the corresponding observed value, and k is the total prediction number.

Length of day (LOD)
Daily time series of LOD in this contribution are from IERS EOP 05 C04 series (Gambis 2004). The LOD time series are available from (http://hpiers.obspm.fr/eop-pc) and span the time interval from 1996 to 2008.

Effective angular momentum (EAM) functions
EAM functions in both mass and motion terms explain the non-tidal geophysical excitation of the Earth's rotation due to mass redistribution in atmospheric and terrestrial hydrosphere, and in the oceans. The EAM data consist of three main components X , Y , and Z . The

Data processing and analysis
In this paper, we defined an algorithm for LOD prediction as shown in Fig. 1. It is important to note that LOD can be decomposed into several components (e.g., variations related to zonal components of solid Earth tides and ocean tides, atmospheric circulation, internal effects, and transfer of angular momentum to the Moon orbital motion). Taking into account that the accuracy of the different models may be not homogeneous, we decide to include in the modeling of this study the total variation in LOD. We are aware that it may be more challenging for testing the method performance that relying partially in previous models for certain components, but we preferred not to remove too many difficulties when testing the method as a mean to get more insight into its capabilities. Having said that, the methodology is structured as follows. We analyze the EAM Z which is the sum of mass and motion terms of AAM, HAM, and OAM (see Fig. 2). The dependence structure between observed LOD and EAM Z is captured and modeled by using Copula-based analysis. The difference between the observed LOD and Copula LOD estimated data is modeled using the SSA method. After that, the EAM Z is predicted as described in detail in Modiri et al. (2018). The prediction algorithm is demonstrated through the following steps: 1. Copula-based joint distribution function of LOD and EAM Z -Model the dependency structure between LOD and EAM Z -Model the periodic residual using SSA -Calibrate the Copula + SSA model 2. EAM Z prediction -SSA periodic terms estimation -Copula anomaly modeling 3. LOD prediction using the calibrated Copula + SSA model -Sample random data from the conditional Copula; as it is shown in Fig. 3, the training interval is between 1996 and 2003.

Copula-based joint distribution function of LOD and EAM Z
In this study, the training dataset which ranges from 1996 to 2003 is used to fit a Copula-based joint distribution function; both EAM Z and LOD are transformed to uniformly distributed values between (0, 1) interval through their empirical cumulative distribution function. Thereafter, the dependence structure between the EAM Z and LOD is

EAM Z prediction
For this step, we defined an algorithm for EAM Z prediction as shown in Fig. 1. The (Z AAM + Z HAM + Z OAM ) time series can be split up into two parts. The first part is dealing with periodic effects such as annual and semiannual variations due to the spectral analysis of EAM Z (illustrated in Figure 5). The SSA models the periodic terms of the (Z AAM + Z HAM + Z OAM ) (see Fig. 6). Then, the difference between the observed EAM Z and SSA estimated data is modeled by using the Copula-based analysis method. First, the window length (L = 365 days) is selected considering the main periodicity (see Figure 5). After that, the number of anomaly and has a stochastic nature. This anomaly part is predicted using the Copula-based model. The anomaly part is displayed in Fig. 6 (lower panel). This part is formed with the same window length L. The dependence structure between the column i and column i+1 in the residual matrix is investigated. As can be seen in Fig. 7, the scatter plot illustrates a linear dependence between the two adjacent columns which are modeled by Archimedean Copula. Then, the empirical Copula is determined using Eq. 3. The next step is fitting a bivariate Archimedean Copula. In this study, Frank Copula is selected for predicting the EAM Z anomaly due to its ability to capture the linear dependence structure. Finally, the Copula-based predicted anomalies are added to the pre-viously described deterministic part. Here, we utilized seven years of EAM Z time series from September 1998 to September 2005 for the 10 days ahead forecasting during the interval between October 2005 and 2008, i.e., the same interval as it has been used for the EOP PCC. As can be seen in Fig. 8, the MAE of the EAM Z prediction is up to the 0.76 for the next days which is two orders of magnitude smaller than the EAM Z magnitude. In Fig. 7, the scatter plot shows a linear relationship between column i and column i+1 . Then, the corresponding empirical Copula is estimated. Finally, the anomalies are added to the SSA-forecasted time series.

LOD prediction from predicted EAM Z using the calibrated Copula + SSA model
The predicted EAM Z dataset from 2005 to 2008 is employed as input time series for the calibrated model. As can be seen in Fig. 1, the periodic terms in the residual part are predicted using SSA extrapolation as well. Finally, the Copula-based predicted data are added to the SSA-forecasted residual. To asses the proposed method, the results are compared with the EOP PCC solutions.

Discussion of the results
In this paper, a hybrid LOD prediction method Copula+SSA has been tested. The proposed combination method is examined based on the hind-cast experiments using the data from the past, i.e., the LOD data are predicted using the same time span (2005)(2006)(2007)(2008) as the EOP PCC. Figure 9 shows the MAE of ultra-short-term prediction. The MAE of our hybrid Copula + SSA models indicates fewer errors compared to the EOP PCC solutions. However, the Kalman filter with forecasted AAM shows a comparable performance with our proposed model with smaller MAE for the first 4 days in the future. Table 1 presents the MAE of Copula + SSA and EOP PCC results in numbers. Adoptive transform, AR, LS collocation, and NN show errors of more than 0.1 ms/day for the first day of prediction. Also, the MAE of the wavelet, LS extrapolation LS + AR, and He approaches reach more than 0.1 ms/day after 2 or 3 days in the future. From all contributions to the EOP PCC solution, Kalman filter provides the best accuracy. However, the MAE of the Kalman filter gets larger than 0.1 ms/day after 7 days. All Copula + SSA models show MAE smaller than 0.1 ms/day over 10-day prediction, except smaller Joe Copula + SSA. Figure 10 presents the MAE of 10-day-ahead prediction between 2005 and 2008 for all six hybrid models to understand better the prediction performance and its causes. Different features can be seen, and there are some common patterns such as high errors from the beginning of 2007 which might be caused by the El Niño effect or certain geomagnetic jerk events as pointed out by Shirai et al. (2005) or Malkin (2013). As NOAA National Centers for Environmental Information, State of the Climate reported, the El Niño warm event had a peak in December Archi 12 Kalarus et al. (2010) suggested benefiting from the prediction of the AAM, OAM, and HAM. Therefore, our better prediction performance may be due to considering both mass and motion terms of EAM Z for modeling the dependence structure between LOD and EAM Z . It is important to note that the Copula + SSA Archimedean 12 and 14 Copula provide significantly smaller errors than the other methods. On the other hand, the Joe Copula exhibits slightly larger errors than the two aforesaid models. This may have been caused by Archimedean 12 and 14 Copula's ability to capture the upper and lower heavy tail dependence structure.

Summary and Conclusion
LOD represents the variation in Earth's rotation rate which is most difficult to predict, because of the occurrence of extreme events in the LOD signal. In this paper, we introduce several approaches based on Copulas which were applied to bivariate frequency analysis. Using Copula is promising since it allows to take into account a wide range of correlation, frequently observed in time series. The presented work here is aimed at the possibility of utilizing the EAM Z data to predict LOD data due to the existing relationship between them. In order to study this relationship, two datasets were compared: the observed LOD from IERS EOP C05 and EAM Z derived from GFZ. The comparison with results of other methods indicates that the Copula + SSA can efficiently and precisely predict the LOD parameter at ultra-short term. All of our methods introduced here provide comparable error with the existing methods used for their evaluation in the time interval considered of up to 10 days. Besides, it is clearly demonstrated that the predicted AAM, HAM, OAM time series as additional input information can improve the LOD prediction. Among the analyzed combinations, the Archimedean 12 + SSA and Archimedean 14 + SSA show the most sophisticated performance with low errors. As the Kalman filter prediction provides better results within the first 3 to 4 days, we will investigate this topic further in future in order to find out how the EAM functions can deliver even better LOD predictions. The EOP PCC proved once more to be very useful. As long as the data are still available for post-processing, new methods can adequately be compared in a consistent way to the methods applied in the past.
Author contributions SM did most of the data analysis and writing of the manuscript. MH carried out the SSA studies and wrote a part of the manuscript. SB conceived and designed the study. RH, JMF, and HS participated in the design of the study and helped to improve the manuscript. All authors read and approved the final manuscript.

Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.

A.1 Empirical Copula
The empirical Copula approximates the unknown theoretical Copula distribution. The empirical Copula is purely based on the data, and it is defined in the rank space as follows (Genest and Rivest 1993;Genest and Favre 2007;Laux et al. 2011): where (r 1 ), (r 2 ) . . . , (r n ) denote the pairs of ranks of the variable (x 1 ), (x 2 ), . . . , (x n ), (s 1 ), (s 2 ) . . . , (s n ) denote the pairs of ranks of the variable (y 1 ), (y 2 ), . . . , (y n ), n is the length of the data vector, 1(…) is the indicator function. If the condition is true, the indicator function is equal to 1. Otherwise, the indicator function is equal to 0.

A.2 Archimedean Copula
Some Copulas can be estimated directly with the simple form. They are named Archimedean Copulas. An Archimedean Copula can be described in the following form: where θ is the Copula parameter and the function φ is the generator of the Copula with the following characteristics (Nelsen 2006): -for all u ∈ (0, 1), φ(u) < 0, φ is decreasing and convex (Table 2), -φ(1) = 0, and φ −1 is defined by

A.2.1 The relationship between Copula parameter Â and classical dependence parameter
There is a functional relationship between classical dependence parameters such as Kendall τ and Copula parameters. For one-parametric Copulas, the functional relationship between the Kendall τ Copula functions, namely: Equation 7 can be used to estimate the Copula parameter.
Kendal τ for Archimedean Copula with the generator Φ(t) is shown: Thus, the relations between the Kendall τ and the Archimedean Copula parameters are illustrated in Table 3. Here D is Debye functions.

A.3 Simulating from Copula-based conditional random data
This subsection provides the essential steps for data simulation using Copula-based conditional random data. The following steps are taken to fit the proper theoretical Copula function and simulation data (Laux et al. 2011;Vogl et al. 2012;Modiri et al. 2018).
is the Debye function for any positive integer k 1. Independent identical distribution (iid)-transformation of input time series. 2. Compute the marginal distributions F X (x) and F Y (y) of the input data x and y. 3. Transform data to rank space using the estimated marginal distributions of data with u i and v i in rank space. 4. Compute the empirical Copula to the dependence structure of random variables using the rank-transformed data. 5. Fit a theoretical Copula function C θ (u, v). 6. Compute the conditional Copula function. 7. Sample random data from the conditional Copula cumulative distribution function (CDF). 8. Transfer the sample back to the data space using the inverse marginal.

B Singular Spectrum Analysis
In the first step which is called embedding, we form a trajectory matrix (X) by moving a window of length L over the elements of the time series ( f i ), i.e., The matrix X is a symmetric matrix having identical elements on anti-diagonals. In the next step, we apply a singular value decomposition (SVD) to the trajectory matrix X, i.e., with the superscript T the transpose operator. The matrices U and V are orthonormal and are called left and right singular vectors, respectively. Σ Σ Σ is a diagonal matrix containing nonnegative entries, the singular values, which reflect the importance of the singular vectors.
If λ 1 ≥ λ 2 ≥ · · · ≥ λ L ≥ 0 denote diagonal entries of Σ Σ Σ, the trajectory matrix can be written as: where λ i is the corresponding singular value of X i and d is the number of nonzero singular values (d ≤ L).
In the next step, the grouping step, we select a group from {X 1 , X 2 , . . . , X d } which will be used for reconstructing a new version of the time series. For instance, for reproducing a representative trend of the time series, we would choose the most critical part of X which is retrievable using the first few singular vectors and their corresponding singular values.
The last step is dedicated to the reconstruction of the time series using the selected group. Based on the fact that the original trajectory matrix had the same entries on anti-diagonals, we reconstruct a model of the time series (G = (g 1 , g 2 , . . . , g N )) by anti-diagonal averaging: wherex i, j is an estimation of the element f i+ j−1 of the original time series.