
1 Introduction

In statistical modeling and hypothesis testing, models and procedures exist for estimating parameters from observations. These models often comprise a functional model, a correlation model, and a stochastic model, the latter usually assumed to be normally distributed. However, this assumption can lead to incorrect results if outliers are present. Various approaches exist to address outliers in observations. One such approach is robust parameter estimation, which aims to reduce the impact of outliers on the estimation result. In Bayesian inference, robust estimation is achieved by substituting the initially assumed distribution of the observations with a heavy-tailed distribution. Consequently, to obtain a robust estimator under the assumption of normally distributed observations, one option is to replace the normal distribution with a heavy-tailed t-distribution (Lange et al. 1989).

Modeling multivariate time series has been approached by Alkhatib et al. (2018) and Kargoll et al. (2020). Alkhatib et al. (2018) proposed a nonlinear functional model with a t-distributed error model, while Kargoll et al. (2020) introduced two different outlier models for nonlinear deterministic and vector-autoregressive (VAR) models. The VAR process models auto- and cross-correlations, but neither Alkhatib et al. (2018) nor Kargoll et al. (2020) considers prior knowledge of the parameters. Kargoll et al. (2020) derived a generalized expectation maximization (GEM) algorithm to approximate the parameters, but it does not include prior knowledge, and the variance-covariance matrix (VCM) of the parameters can only be estimated with a computationally intensive bootstrap algorithm. In Bayesian inference, however, it is possible to integrate prior knowledge and to estimate the VCM of the parameters, provided that the prior knowledge is available in the form of a distribution function.

In Dorndorf et al. (2021), the model of Alkhatib et al. (2018) was extended to consider prior information using Bayesian inference. This paper focuses on the VAR model. For an overview of Bayesian time series analysis models, see Steel (2010). In Bayesian time series analysis, the VAR coefficients are treated as random variables (Box and Jenkins 2015) and require a prior density for estimation. A Bayesian AR model with a non-informative prior and normally distributed white noise is presented in Box and Jenkins (2015). Ni and Sun (2005) introduced a Bayesian VAR model structured similarly to Kargoll et al. (2020), which is solved with a Gibbs sampler but requires the time series to be detrended.

The algorithm from Dorndorf et al. (2021) will be extended to handle a VAR process (described in detail in Sect. 2), and the posterior density function will be approximated through a Markov chain Monte Carlo (MCMC) algorithm (outlined in Sect. 3). A multivariate time series model for laser tracker observations of a circle in 3D will be proposed in Sect. 4 and evaluated through a Monte Carlo simulation. The findings will be used to evaluate the performance of the implemented Metropolis-within-Gibbs algorithm.

2 The Bayesian Time Series Model

The observations \(\boldsymbol{\ell}_{t}\) are collected in the observation matrix \({\boldsymbol{L} = \begin{bmatrix} \boldsymbol{\ell}_{1} & \cdots & \boldsymbol{\ell}_{n} \end{bmatrix}^T}\). The observation model is defined to be a regression time series

$$\displaystyle \begin{aligned}{} \mathcal{L}_t & = \tilde{\boldsymbol{\ell}}_t + \mathcal{E}_t = {\boldsymbol{h}}_t \left(\tilde{{\boldsymbol{\beta}}} \right) +\mathcal{E}_t, \quad t=1,\dots,n, \end{aligned} $$
(1)

where the random variable \(\mathcal{L}_t\) consists of a deterministic part \(\tilde{\boldsymbol{\ell}}_t\) and a stochastic part \(\mathcal{E}_t\). Here \(\tilde{\boldsymbol{\ell}}_t\) is the true value and can be described by an arbitrary, possibly nonlinear (but differentiable) function \(\boldsymbol{h}_t \left( \cdot \right)\) of the true functional parameters \(\tilde{\boldsymbol{\beta}}\). The stochastic component \(\mathcal{E}_t\) represents colored noise for the time series \(\mathcal{L}_t\), obtained from a VAR model with

$$\displaystyle \begin{aligned}{} \mathcal{E}_{t} & = {\tilde{\boldsymbol{A}}}_{1}\mathcal{E}_{t-1} + \ldots + {\tilde{\boldsymbol{A}}}_{{p}} \mathcal{E}_{t-{p}} + \mathcal{U}_{t}. \end{aligned} $$
(2)

The matrix \(\tilde{\boldsymbol{A}}_j\) contains the true VAR coefficients of the j-th lag of the VAR process of order p and is thus given by

$$\displaystyle \begin{aligned}{} \tilde{\boldsymbol{A}}_{j} = \begin{bmatrix} \tilde{\alpha}_{j;1,1} & \cdots & \tilde{\alpha}_{j;1,N} \\ \vdots & \ddots & \vdots \\ \tilde{\alpha}_{j;N,1} & \cdots & \tilde{\alpha}_{j;N,N} \end{bmatrix}, \quad j = 1,\dots,p, \end{aligned} $$

(3)

where \(\mathcal{U}_{t}\) in Eq. 2 is the white noise, assumed to follow a multivariate Student t-distribution \(\mathcal{U}_{t} \sim t\left(\boldsymbol{0}, \tilde{\boldsymbol{\Psi}} , \tilde{\nu} \right)\). The expectation value of the white noise \(\mathcal{U}_{t}\) is \(\boldsymbol{0}\), \(\tilde{\boldsymbol{\Psi}}\) denotes the true scale matrix of the white noise, \(\tilde{\nu}\) is the true degree of freedom of the multivariate t-distribution, and N in Eq. 3 is the dimension of the multivariate time series. The scaling matrix has the structure of a VCM, resulting in

$$\displaystyle \begin{aligned}{} \tilde{\boldsymbol{\Psi}} = \begin{bmatrix} \tilde{\psi}^2_1 & \tilde{\rho}_{1,2}\,\tilde{\psi}_1 \tilde{\psi}_2 & \cdots & \tilde{\rho}_{1,N}\,\tilde{\psi}_1 \tilde{\psi}_N \\ \tilde{\rho}_{1,2}\,\tilde{\psi}_1 \tilde{\psi}_2 & \tilde{\psi}^2_2 & \cdots & \tilde{\rho}_{2,N}\,\tilde{\psi}_2 \tilde{\psi}_N \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{\rho}_{1,N}\,\tilde{\psi}_1 \tilde{\psi}_N & \tilde{\rho}_{2,N}\,\tilde{\psi}_2 \tilde{\psi}_N & \cdots & \tilde{\psi}^2_N \end{bmatrix} \end{aligned} $$

(4)

with the true correlation coefficients \(\tilde{\rho}_{i,k}\) and the true scaling factors \(\tilde{\psi}^2_i\) on the diagonal.

It follows that the random variable \(\mathcal{L}_t\) of Eq. 1 can be specified by the parameters \(\tilde{\boldsymbol{\beta}}\), \(\tilde{\boldsymbol{\Psi}}\), \(\tilde{\boldsymbol{A}}_j\), and \(\tilde{\nu}\), where the scaling matrix \(\tilde{\boldsymbol{\Psi}}\) according to Eq. 4 consists of \(\tilde{\psi}^2_{i}\) (with \(i = 1,\dots,N\)) and \(\tilde{\rho}_{k,o}\) (with \(k = 1,\dots,N-1\) and \(o = k+1,\dots,N\)). These parameters are now grouped into the true parameter vector:

$$\displaystyle \begin{aligned} {} \tilde{\boldsymbol{\theta}} = \begin{bmatrix} \tilde{\boldsymbol{\beta}}^T & \left(\tilde{\boldsymbol{\psi}}^2\right)^T & \tilde{\boldsymbol{\rho}}^T & \tilde{\boldsymbol{a}}^T & \tilde{\nu} \end{bmatrix}^T. \end{aligned} $$
(5)

Thus, this vector consists of \(\tilde{\boldsymbol{\beta}} = \left[ \tilde{\beta}_1 , \dots , \tilde{\beta}_m \right]^T\), \(\tilde{\boldsymbol{\psi}}^2 = \left[ \tilde{\psi}^2_{1}, \dots , \tilde{\psi}^2_{N} \right]^T\), \(\tilde{\boldsymbol{\rho}} = \left[ \tilde{\rho}_{1,2} ,\dots ,\tilde{\rho}_{1,N},\tilde{\rho}_{2,3},\dots , \tilde{\rho}_{N-1,N} \right]^T\) and \(\tilde{\boldsymbol{a}} = \left[ \tilde{\alpha}_{1;1,1}, \dots , \tilde{\alpha}_{1;N,1} , \tilde{\alpha}_{1;1,2} , \dots , \tilde{\alpha}_{1;N,N} , \tilde{\alpha}_{2;1,1}, \dots , \tilde{\alpha}_{p;N,N} \right]^T\). Hence, the dimension of the parameter vector \(\tilde{\boldsymbol{\theta}}\) is \(B = m + N + \frac{N^2-N}{2} + N^2 \cdot p + 1\), where m is the total number of the functional parameters \(\tilde{\boldsymbol{\beta}}\), N is the dimension of the multivariate time series and p is the order of the VAR process.
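As a worked check of this dimension formula, consider the simulation setup of Sect. 4.1 with \(N = 3\), \(p = 1\), and \(m = 8\) functional parameters (the six circle parameters plus \(\kappa_O\) and \(\kappa_{\Delta}\); counting the latter two among \(\tilde{\boldsymbol{\beta}}\) is an assumption of this example):

$$\displaystyle \begin{aligned} B = 8 + 3 + \frac{3^2-3}{2} + 3^2 \cdot 1 + 1 = 8 + 3 + 3 + 9 + 1 = 24. \end{aligned} $$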

In general, all parameters in Eq. 5 are unknown for the observed data \(\boldsymbol{L}\), and thus the estimated values \(\hat{\boldsymbol{\theta}}\) need to be calculated. In the context of Bayesian inference, the parameter vector \(\hat{\boldsymbol{\theta}}\) is estimated based on the corresponding random variable \(\boldsymbol{\Theta}\), whose density function is also unknown. Assuming that the time series data \(\mathcal{L}_t\) from the model in Eq. 1 are given, these data depend on the random variable \(\boldsymbol{\Theta}\); this relationship is expressed by the likelihood function \(f_{{\mathcal{L} \mid \boldsymbol{\Theta}}}\). Assume further that we possess prior knowledge about the random variable \(\boldsymbol{\Theta}\), which can be represented as a probability distribution \(f_{\boldsymbol{\Theta}}\) known as the prior density function. We can then update this prior knowledge with the observed data \(\boldsymbol{L}\) using Bayes' theorem, resulting in the posterior density function \(f_{\boldsymbol{\Theta} \mid \mathcal{L}}\) as follows:

$$\displaystyle \begin{aligned}{} f_{\boldsymbol{\Theta} \mid \mathcal{L}} \left( \boldsymbol{\beta} , \boldsymbol{\psi}^2, \boldsymbol{\rho} , \boldsymbol{a} , \nu \mid \boldsymbol{L} \right) \propto f_{\boldsymbol{\Theta}} \left(\boldsymbol{\beta} , \boldsymbol{\psi}^2, \boldsymbol{\rho} , \boldsymbol{a} , \nu \right)\\ \cdot f_{\mathcal{L} \mid \boldsymbol{\Theta}} (\boldsymbol{L} \mid \boldsymbol{\beta} , \boldsymbol{\psi}^2, \boldsymbol{\rho} , \boldsymbol{a} , \nu). \end{aligned} $$
(6)

According to Kargoll et al. (2020), the joint likelihood function is given by:

$$\displaystyle \begin{aligned}{} &f_{\mathcal{L} \mid \boldsymbol{\Theta}} (\boldsymbol{L} \mid \boldsymbol{\theta} )\\ &\quad = \prod_{t=1}^n \left( \frac{\Gamma \left( \frac{\nu+N}{2} \right) }{ \Gamma \left( \frac{\nu}{2} \right) \sqrt{(\nu \pi)^N} } \vert \boldsymbol{\Psi} \vert^{-1/2} \left[ 1 + \frac{ \boldsymbol{u}_t^T \boldsymbol{\Psi}^{-1} \boldsymbol{u}_t }{\nu} \right]^{-\frac{\nu+N}{2}} \right) \end{aligned} $$
(7)

where \(\Gamma \) is the gamma function, and \(\boldsymbol {u}_{t}\) in Eq. 7 is:

$$\displaystyle \begin{aligned}{} \boldsymbol{u}_{t} &= \boldsymbol{e}_{t} - \sum_{j=1}^p \left( \boldsymbol{A}_{j} \cdot \boldsymbol{e}_{t-j} \right)\\ &= \boldsymbol{\ell}_{t} - \boldsymbol{h}_{t}(\boldsymbol{\beta}) - \sum_{j=1}^p \left( \boldsymbol{A}_{j} \cdot \left[ \boldsymbol{\ell}_{t-j} - \boldsymbol{h}_{t-j} \left(\boldsymbol{\beta}\right) \right] \right). \end{aligned} $$
(8)
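The evaluation of Eqs. 7 and 8 can be illustrated by the following minimal Python sketch. It assumes NumPy/SciPy, a hypothetical function h(beta) returning the \(n \times N\) matrix of deterministic values \(\boldsymbol{h}_t(\boldsymbol{\beta})\), and that residuals \(\boldsymbol{e}_t\) with \(t \le 0\) are treated as zero; none of these conventions is fixed by the paper.

```python
import numpy as np
from scipy.special import gammaln

def log_likelihood(L, h, beta, A, Psi, nu):
    """Log of the joint likelihood in Eq. 7 for a multivariate
    t-distributed VAR(p) regression time series (sketch)."""
    n, N = L.shape
    p = len(A)                            # list of VAR coefficient matrices
    E = L - h(beta)                       # colored residuals e_t (n x N)
    Psi_inv = np.linalg.inv(Psi)
    logdet = np.linalg.slogdet(Psi)[1]
    const = (gammaln((nu + N) / 2) - gammaln(nu / 2)
             - 0.5 * N * np.log(nu * np.pi) - 0.5 * logdet)
    ll = 0.0
    for t in range(n):
        u = E[t].copy()                   # white-noise residual u_t (Eq. 8)
        for j in range(1, p + 1):
            if t - j >= 0:
                u -= A[j - 1] @ E[t - j]
        ll += const - 0.5 * (nu + N) * np.log1p(u @ Psi_inv @ u / nu)
    return ll
```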

Thus, the calculation of the likelihood function for the observations \(\boldsymbol{L}\) is based on the product over the unknown white-noise terms \(\boldsymbol{u}_t\) that result from the relation in Eq. 8. Due to the assumed stochastic independence, the joint prior density function in Eq. 6 can be written as

$$\displaystyle \begin{aligned}{} &f_{\boldsymbol{\Theta}} (\boldsymbol{\beta} , \boldsymbol{\psi}^2, \boldsymbol{\rho} , \boldsymbol{a} , \nu)\\ &\quad = f_{\boldsymbol{\Theta}_{\boldsymbol{\beta}}} \left(\boldsymbol{\beta} \right) \cdot f_{\boldsymbol{\Theta}_{\boldsymbol{\psi}^2}} \left(\boldsymbol{\psi}^2 \right) \cdot f_{\boldsymbol{\Theta}_{\boldsymbol{\rho}}} \left( \boldsymbol{\rho} \right) \cdot f_{\boldsymbol{\Theta}_{\boldsymbol{a}}} \left( \boldsymbol{a} \right) \cdot f_{{\Theta}_{\nu}} \left( \nu \right). \end{aligned} $$
(9)

In this paper, we only consider the case of a non-informative prior density because the Bayesian model is compared to a comparable classical adjustment model for validation purposes. The non-informative prior densities used for the parameters are:

$$\displaystyle \begin{aligned}{} f_{\boldsymbol{\Theta}_{\boldsymbol{\beta}}} \left(\boldsymbol{\beta}\right) \propto 1, \quad f_{\boldsymbol{\Theta}_{\boldsymbol{\psi}^2}} \left(\boldsymbol{\psi}^2\right) \propto 1 \ \text{for} \ \psi^2_i > 0, \quad f_{\boldsymbol{\Theta}_{\boldsymbol{\rho}}} \left(\boldsymbol{\rho}\right) \propto 1 \ \text{for} \ \rho_{k,o} \in \left(-1,1\right), \quad f_{\boldsymbol{\Theta}_{\boldsymbol{a}}} \left(\boldsymbol{a}\right) \propto 1, \end{aligned} $$

(10)

together with a proper prior density \(f_{{\Theta}_{\nu}} \left( \nu \right)\) for the degree of freedom as specified in Kargoll et al. (2020).

The prior densities used are defined on unbounded supports and are therefore improper; however, combining these densities with the likelihood results in a proper posterior density. For the parameter \(\boldsymbol{\rho}\), an unrestricted improper density could have been used as well, but due to mathematical constraints a correlation coefficient can only lie between \(-1\) and \(1\). Similarly, the scaling factor \(\psi^2\) is excluded from taking values smaller than zero. For the degree of freedom \(\nu\), a proper density was defined according to Kargoll et al. (2020) to prevent an improper posterior density and to ensure a fair comparison between the Bayesian and the classical GEM model. The computation of the exact posterior density function given in Eq. 6 is not feasible, so approximation techniques must be used. To solve this problem, an MCMC algorithm has been developed, which is described in the next section.

3 The Developed MCMC Algorithm

The goal of the MCMC method is to generate the random numbers \(\boldsymbol{\theta}^{(1)}\rightarrow \boldsymbol{\theta}^{(2)}\rightarrow \dots \rightarrow \boldsymbol{\theta}^{(b)}\) as a Markov chain, i.e., a total of b random numbers as realizations of \(\boldsymbol{\Theta}\). The Markov chain defined here relates to the full parameter vector \(\boldsymbol{\theta}^{\left(y \right)}\), where y is the current sample index of the Markov chain. However, since the density function \(f_{\boldsymbol{\Theta} \mid \mathcal{L}} \left( \boldsymbol{\theta}^{\left(y\right)} \mid \boldsymbol{\theta}^{\left(y-1\right)}, \boldsymbol{L} \right)\) is unknown here, the Markov chain cannot be generated directly with the Gibbs sampler. For this reason, this density function at step y of the Markov chain is decomposed into univariate conditional density functions of the form

$$\displaystyle \begin{aligned}{} f_{\boldsymbol{\Theta} \mid \mathcal{L}} \left( \theta_z^{\left(y\right)} \mid \theta_1^{\left(y\right)},\dots,\theta_{z-1}^{\left(y\right)},\theta_{z+1}^{\left(y-1\right)},\dots \theta_B^{\left(y-1\right)}, \boldsymbol{L} \right). \end{aligned} $$
(11)

Applying the decomposition in Eq. 11 to the posterior density from Eq. 6 means, for example, for the parameter \(\beta_2\) that the conditional posterior density is \(f_{\boldsymbol{\Theta}_{{\beta}_2} \mid \mathcal{L}} \left( {\beta}^{\left(y\right)}_2 \mid {\beta}^{\left(y\right)}_1, {\beta}^{\left(y-1\right)}_3,\dots, {\beta}^{\left(y-1\right)}_m, \left(\boldsymbol{\psi}^2\right)^{\left(y-1\right)} , \boldsymbol{\rho}^{\left(y-1\right)}, \boldsymbol{a}^{\left(y-1\right)}, \nu^{\left(y-1\right)} , \boldsymbol{L} \right)\).

To generate the random numbers \(\boldsymbol{\theta}^{(y)}\) using MCMC, one of the different existing Monte Carlo algorithms for generating Markov chains needs to be selected. In this paper, the Metropolis-Hastings algorithm and the Gibbs sampler are chosen for this purpose, two of the most important algorithms for MCMC methods (refer to Gelman et al. (2013)). Both algorithms can be combined into a so-called Metropolis-within-Gibbs algorithm, shown in Algorithm 1. The Gibbs sampler is the "For loop" in line 2 of the algorithm; it requires as input the length b of the Markov chain to be generated. The Metropolis algorithm starts in line 4 and is executed once for each component z of \(\boldsymbol{\theta}\) via the "For loop" in line 3. To run the Metropolis algorithm, the proposal density \(f_{{\Theta}_z^{*} \mid {\theta}_z } \left( {\theta}_z^{*} \mid {\theta}_z^{\left(y-1 \right)} \right)\) is required for generating the random realization \(\theta_z^*\). In this paper, a normal distribution is always used as the proposal density, i.e., \({\Theta}_{{\theta}_z}^{*} \sim \mathcal{N}\left({\theta}_z^{\left(y-1 \right)}, \lambda_{{\theta}_z}^2 \right)\).

Algorithm 1 Metropolis-within-Gibbs

The mean of the proposal distribution is set by the previous value in the Markov chain, and the variance \(\lambda_{{\theta}_z}^2\) determines the jump distance and thus affects convergence to the desired posterior density. Optimal convergence is achieved when the acceptance rate is around 44% for a normal distribution as the proposal density, as shown in Gelman et al. (1996).

Automating the selection of \(\lambda_{{\theta}_z}^2\) with adaptive MCMC algorithms, such as the one presented in Roberts and Rosenthal (2009), is recommended when B is large. In Algorithm 1, the acceptance probability decides whether \(\boldsymbol{\theta}^{\left(*\right)}\) or \(\boldsymbol{\vartheta}\) is accepted as the realization in step y of the chain. To optimize the jump values, the approach presented in Dorndorf et al. (2019) was used to achieve an acceptance rate between 40% and 50%. This method requires initial values, the posterior density, and the conditional density functions to be constructed.
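The following minimal Python sketch illustrates the Metropolis-within-Gibbs scheme of Algorithm 1. The function name log_post (the logarithm of the unnormalized posterior of Eq. 6), the start vector theta0, and the jump values lam are illustrative assumptions, not names from the paper, and the sketch does not reproduce the exact line numbering of Algorithm 1.

```python
import numpy as np

def metropolis_within_gibbs(log_post, theta0, lam, b, seed=0):
    """Sketch: one univariate random-walk Metropolis step per component z
    of theta, repeated for b Gibbs sweeps (cf. Algorithm 1)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    B = theta.size
    chain = np.empty((b, B))
    lp = log_post(theta)
    for y in range(b):                    # Gibbs loop over the chain length
        for z in range(B):                # Metropolis step for component z
            prop = theta.copy()
            prop[z] += lam[z] * rng.standard_normal()   # normal proposal
            lp_prop = log_post(prop)
            # accept with probability min(1, posterior ratio)
            if np.log(rng.uniform()) < lp_prop - lp:
                theta, lp = prop, lp_prop
        chain[y] = theta
    return chain
```

In practice, the jump values lam would be tuned, e.g., with the adaptive approach of Dorndorf et al. (2019), until the acceptance rate lies in the 40-50% range.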

The mean value of the different parameter groups can be estimated from the generated Markov chain \(\boldsymbol {\theta }^{\left (y \right )}\) using:

$$\displaystyle \begin{aligned}{} \hat{{\theta}}_{z} = \frac{1}{b-o} \sum_{y = o + 1 }^b \theta_{z}^{\left( y \right)} \quad \text{for} \quad z = 1, \dots , B, \end{aligned} $$
(12)

where o is the length of the warm-up phase. Based on the mean values estimated in Eq. 12 and the realizations of the Markov chain generated by Algorithm 1, the VCM of the parameters can be estimated (refer to Gelman et al. (2013)):

$$\displaystyle \begin{aligned}{} \hat{\boldsymbol{\Sigma}}_{\hat{\boldsymbol{\theta}} \hat{\boldsymbol{\theta}} ; {z,i}} = \frac{1}{b-o} \sum_{y = o+1}^{b} \left( \theta_{z}^{\left( y \right)} - \hat{{\theta}}_{z} \right) \left( \theta_{i}^{\left( y \right)} - \hat{{\theta}}_{i} \right) \quad \text{for} \quad z = 1, \dots , B \text{ ; } i = 1, \dots , B. \end{aligned} $$
(13)
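Continuing the hypothetical chain array from the sketch above, Eqs. 12 and 13 amount to a component-wise mean and an empirical covariance of the post-warm-up draws:

```python
import numpy as np

def posterior_summary(chain, o):
    """Posterior mean (Eq. 12) and VCM of the parameters (Eq. 13),
    discarding the first o warm-up samples of the chain."""
    draws = chain[o:]                     # theta^(y) for y = o+1, ..., b
    theta_hat = draws.mean(axis=0)        # Eq. 12
    # ddof=0 yields the 1/(b - o) normalization of Eq. 13
    Sigma_hat = np.cov(draws, rowvar=False, ddof=0)
    return theta_hat, Sigma_hat
```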

4 Closed Loop Monte Carlo Simulation

The Closed Loop Simulation (CLS) in this chapter is based on an experiment conducted at the Geodetic Institute Hanover using a multi-sensor system consisting of a laser scanner and GNSS equipment. The experiment aimed to determine the transformation parameters between the global coordinate system defined by the GNSS equipment and the laser scanner's local, sensor-defined coordinate system using a high-precision laser tracker. For further details, refer to Paffenholz (2012). The CLS was developed to estimate the expected accuracy of the parameters and was used to validate the Bayesian model presented in Algorithm 1. The advantage of a CLS is that the true functional and stochastic models in Eqs. 1 to 4 are known. Real data processing is beyond the scope of this paper.

4.1 The Framework of the Simulation

The CLS involves a 3D non-linear regression model of a circle with 6 parameters: two for orientation (\(\varphi \) and \(\omega )\), one for radius (r), and three for center (\(c_x,c_y,c_z\)). The observable 3D circle points are described by

$$\displaystyle \begin{aligned} \tilde{\ell}_{x,t} &:= \tilde{\ell}_{1,t} = h_{1,t}\left(\tilde{\boldsymbol{\beta}}\right) = \tilde{r}\sin{\left(\tilde{\kappa}_t\right)}\cos{\left(\tilde{\varphi}\right)}+\tilde{c}_x, \end{aligned} $$
(14)
$$\displaystyle \begin{aligned} \tilde{\ell}_{y,t} &:= \tilde{\ell}_{2,t} = h_{2,t}\left(\tilde{\boldsymbol{\beta}}\right) = \tilde{r}\sin{\left(\tilde{\kappa}_t\right)}\sin{\left(\tilde{\varphi}\right)} \sin{\left(\tilde{\omega}\right)}+\tilde{r}\cos{\left(\tilde{\kappa}_t\right)} \cos{\left(\tilde{\omega}\right)}+\tilde{c}_y, \end{aligned} $$
(15)
$$\displaystyle \begin{aligned} \tilde{\ell}_{z,t} &:= \tilde{\ell}_{3,t} = h_{3,t}\left(\tilde{\boldsymbol{\beta}}\right) = -\tilde{r}\sin{\left(\tilde{\kappa}_t\right)}\sin{\left(\tilde{\varphi}\right)} \cos{\left(\tilde{\omega}\right)}+\tilde{r}\cos{\left(\tilde{\kappa}_t\right)} \sin{\left(\tilde{\omega}\right)}+\tilde{c}_z, {} \end{aligned} $$
(16)

where \(t=1,\ldots, n\) (with \(n=1000\)) and \(\tilde{\kappa}_t= \tilde{\kappa}_O + \tilde{\kappa}_{\Delta} \cdot \left(t - 1 \right)\). In these equations, the parameter \(\kappa_O\) is the unknown orientation and the parameter \(\kappa_{\Delta}\) is the angle of rotation of the TLS between two consecutive observations.

In this simulation, the functional parameters are the 3D circle parameters \(\boldsymbol{\beta}\), which were assumed to take the true values \(\tilde{c}_x=0.12\left[\text{m}\right]\), \(\tilde{c}_y=-3.36\left[\text{m}\right]\), \(\tilde{c}_z=-0.10 \left[\text{m} \right]\), \(\tilde{r}= 0.50 \left[\text{m} \right]\), \(\tilde{\omega}=-0.05 \left[\text{deg} \right]\), \(\tilde{\varphi}=0.01 \left[\text{deg} \right]\), \(\tilde{\kappa}_O=184.00 \left[\text{deg} \right]\) and \(\tilde{\kappa}_{\Delta}= 0.36 \left[\text{deg} \right]\). The model of Eq. 2 with a VAR order of \(p=1\) is used in the CLS as the stochastic model for generating the realizations of the colored noise. The VAR matrix \(\tilde{\boldsymbol{A}}_{1}\) then results according to Eq. 3 from the chosen true coefficients \(\tilde{\alpha}_{1;1,1}=0.50\), \(\tilde{\alpha}_{1;1,2}= -0.10\), \(\tilde{\alpha}_{1;1,3}=0.15\), \(\tilde{\alpha}_{1;2,1}=0.10\), \(\tilde{\alpha}_{1;2,2}=-0.20\), \(\tilde{\alpha}_{1;2,3}=0.25\), \(\tilde{\alpha}_{1;3,1}=0.20\), \(\tilde{\alpha}_{1;3,2}=-0.05\) and \(\tilde{\alpha}_{1;3,3}=0.75\). For the generation of the random white noise, the stochastic model \(\mathcal{U}_{t} \sim t\left(\boldsymbol{0},\tilde{\boldsymbol{\Psi}} , \tilde{\nu} \right)\) is used in the CLS. The scaling matrix in Eq. 4 is initialized with the scaling factors \({\tilde{\boldsymbol{\psi}} = \left[ \begin{matrix} 8.8 & 6.1 & 11.9 \end{matrix} \right]^T \left[\upmu \text{m}\right]}\) and the correlation coefficients \({\tilde{\boldsymbol{\rho}} = \left[ \begin{matrix} 0.37 & -0.15 & 0.09 \end{matrix} \right]^T}\). The degree of freedom of the Student distribution is fixed to \(\tilde{\nu} = 4.14\).
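The data generation of the CLS can be sketched compactly in Python as follows. The conversion of the angles to radians and the initialization of the colored noise with the first white-noise draw are assumptions of this sketch; the paper does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 1000, 3

# true functional parameters (angles converted to radians)
r, cx, cy, cz = 0.50, 0.12, -3.36, -0.10
omega, phi = np.deg2rad(-0.05), np.deg2rad(0.01)
kap = np.deg2rad(184.00 + 0.36 * np.arange(n))        # kappa_t

# noise-free circle points (Eqs. 14-16)
ell = np.column_stack([
    r*np.sin(kap)*np.cos(phi) + cx,
    r*np.sin(kap)*np.sin(phi)*np.sin(omega) + r*np.cos(kap)*np.cos(omega) + cy,
    -r*np.sin(kap)*np.sin(phi)*np.cos(omega) + r*np.cos(kap)*np.sin(omega) + cz,
])

# white noise from the multivariate t-distribution t(0, Psi, nu)
nu = 4.14
psi = np.array([8.8e-6, 6.1e-6, 11.9e-6])             # scaling factors [m]
R = np.array([[1.00, 0.37, -0.15],
              [0.37, 1.00, 0.09],
              [-0.15, 0.09, 1.00]])                   # correlation matrix
Psi = R * np.outer(psi, psi)                          # scale matrix (Eq. 4)
g = rng.chisquare(nu, n) / nu
U = rng.multivariate_normal(np.zeros(N), Psi, n) / np.sqrt(g)[:, None]

# colored noise from the VAR(1) model (Eq. 2)
A1 = np.array([[0.50, -0.10, 0.15],
               [0.10, -0.20, 0.25],
               [0.20, -0.05, 0.75]])
E = np.zeros((n, N))
E[0] = U[0]
for t in range(1, n):
    E[t] = A1 @ E[t - 1] + U[t]

L_obs = ell + E                                       # simulated observations
```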

4.2 Results of the Simulation

In the CLS, the results of the Bayesian MCMC Algorithm 1 developed in Sect. 3 are compared with the results of the GEM algorithm of Kargoll et al. (2020). The GEM model in the CLS was run twice: once with the estimation of the parameter \(\nu\) (referred to as GEM), and once with a fixed degree of freedom \(\nu = 10{,}000{,}000\) (referred to as GEM \(\nu = \infty\)). The latter scenario represents the case where the likelihood function corresponds to a multivariate normal distribution. The purpose of these runs is to show the effect of an incorrect noise assumption on the estimation results. The chain settings for the MCMC algorithm were \(b=5000\) and \(o=2000\), and the initial values for the parameters were set to the true parameters of the CLS to avoid any biases in the results.

In the following analysis, the true values \(\tilde{\boldsymbol{\theta}}\) given in Sect. 4.1 are subtracted from the parameters estimated by MCMC Algorithm 1 and the GEM algorithm. From this follows \(\Delta \hat{\theta}_{z,s} = \hat{\theta}_{z,s} - \tilde{\theta}_{z}\) with \(z = 1, \dots, B\), where \(s=1,\dots, 10{,}000\) is the index of the Monte Carlo run of the CLS and B is the total number of parameters. These differences are shown for selected parameters in Fig. 1 as boxplots. The results show that the parameters estimated by the MCMC and GEM algorithms are almost the same, with wider confidence intervals for the functional parameters in the GEM estimator with \(\nu = \infty\). The estimated VAR coefficients are comparable among all estimators, with unbiased estimates for MCMC and GEM. The estimated correlation coefficients of the GEM estimator with \(\nu = \infty\) scatter more around the true values than those of the other two estimators. The MCMC and GEM solutions are similar overall, but there are differences in the degree of freedom (\(\nu\)) and the parameters of the scaling matrix (\(\boldsymbol{\Psi}\)): the median of the boxplots for \(\nu\) and \(\psi_{x}\) deviates more for the GEM than for the MCMC, whereas no such deviation is noticeable for the correlation coefficients (\(\boldsymbol{\rho}\)). However, the larger deviation of the median in the GEM does not affect the dispersion of the estimated parameters, which is comparable for both MCMC and GEM.

Fig. 1 Differences between the estimated values and true values for the 10,000 CLS runs as a box plot. For GEM \(\nu = \infty\) the degree of freedom is fixed at \(\nu = 10{,}000{,}000\)

The performance of both algorithms was compared by using the estimated \(\hat{\boldsymbol{\beta}}\) to predict \(\hat{\boldsymbol{\ell}}\) and then calculating the residuals \(\tilde{v}_{i,t,s}\) (with \(i=x,y,z\), \(t=1,\dots,1000\), and \(s=1,\dots,10{,}000\)) to see how well the predictions match the true observations. The mean, standard deviation, minimum, and maximum of the residuals were determined and are shown in Table 1.

Table 1 Descriptive statistics for estimated residuals between predicted observations and true observations

The GEM algorithm generates residuals with slightly smaller values compared to the MCMC algorithm. The differences between the estimated parameters \(\Delta \hat{\nu}\) and \(\Delta \hat{\psi}_x\) presented in Fig. 1 have no influence on the results in Table 1, because the residuals \(\tilde{\boldsymbol{v}}_x\), \(\tilde{\boldsymbol{v}}_y\) and \(\tilde{\boldsymbol{v}}_z\) are calculated solely on the basis of the estimated parameters \(\hat{\boldsymbol{\beta}}\). Moreover, the differences shown in Table 1 are not significant compared to the parameters \(\tilde{\boldsymbol{\psi}}\), which were used to create the white noise for the CLS. The predictions of \(\tilde{\boldsymbol{\ell}}_{x}\) and \(\tilde{\boldsymbol{\ell}}_{y}\) deviate less than the \(\tilde{\boldsymbol{\ell}}_{z}\) component because the circle trajectory is symmetric in the x-y plane, which supports the estimation; in the z-component, inaccuracies in the estimated parameters have a stronger effect.

5 Conclusions

In this paper, a robust Bayesian model with VAR process was presented and compared to a classical model based on a GEM algorithm and a VAR model with multivariate normal distribution assumption for the white noise. The robust Bayesian model showed almost identical results to the robust classical model, with differences arising from the use of different estimators for the parameters. The robust Bayesian model offers the advantage of being able to determine the precision of the parameters and to apply different estimators for the parameters, which is not possible with the classical model without a significant increase in computational cost. The limitations of the robust Bayesian model and its future applications, such as investigating the quality of the VCM and the convergence of the Markov chains, as well as defining an informative prior density and validating the model on real data, will be explored in future work.