Abstract
We construct a flexible dynamic linear model for the analysis and prediction of multivariate time series, assuming a two-piece normal initial distribution for the state vector. We derive a novel Kalman filter for this model, obtaining a two-component mixture as the predictive and filtering distributions. To estimate the covariances of the error sequences, we develop a Gibbs-sampling algorithm to perform Bayesian inference. The proposed approach is validated and compared with a Gaussian dynamic linear model in simulations and on a real data set.
1 Introduction
State-space models have been extensively considered in diverse areas of application for modeling and forecasting time series. An important special case is the class of dynamic linear models (hereafter dlm). This class of models includes the ordinary static linear model as a special case, and assumes that the parameters can change over time, thus incorporating in the observational system variations that can significantly affect the observed behavior of the process of interest. The dlm is defined by
for \(t=1,\ldots , T\), where \({\textbf{Y}}_t\) is a \(r \times 1\) response vector, \({\textbf{F}}_t\) is a \(p \times r\) matrix that links the observed data with \(\varvec{\theta }_t\), a \(p \times 1\) vector of latent states at time t, and \({\textbf{G}}_t\) is a \(p\times p\) transition matrix describing the evolution of the state parameters. The terms \(\varvec{\nu }_t\) and \(\varvec{\omega }_t\) are mutually independent white-noise vectors of dimension \(r\times 1\) and \(p \times 1\), respectively, with zero means and constant variance-covariance matrices \({\textbf{V}}\) and \({\textbf{W}}\). The most popular case is the Gaussian dlm, which assumes that
all being mutually independent for each \(t=1,\ldots , T\), where \(N_k(\varvec{\mu },\varvec{\Sigma })\) denotes the k-variate normal distribution with mean vector \(\varvec{\mu }\) and variance-covariance matrix \(\varvec{\Sigma }\). For an extensive introduction to dlms from a Bayesian perspective, refer to West and Harrison (1997), Petris et al. (2009). In this article, we focus on a setting where \({\textbf{F}}_t\) and \({\textbf{G}}_t\) are known, consistently with the classical Kalman filter setting (Kalman 1960) and recent developments in state-space models (e.g., Fasano et al. 2021).
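The defining display equations did not survive this extraction, but from the notation above the model reads \({\textbf{Y}}_t = {\textbf{F}}_t^{\top }\varvec{\theta }_t + \varvec{\nu }_t\) and \(\varvec{\theta }_t = {\textbf{G}}_t\varvec{\theta }_{t-1} + \varvec{\omega }_t\). As a reference point, the Gaussian dlm can be simulated directly; the sketch below (function name and NumPy setup are ours, with time-invariant \({\textbf{F}}\) and \({\textbf{G}}\) assumed for simplicity) illustrates the generative mechanism:

```python
import numpy as np

def simulate_dlm(T, F, G, V, W, m0, C0, rng=None):
    """Simulate a Gaussian dlm with constant matrices:
    observation  y_t = F' theta_t + nu_t,           nu_t ~ N_r(0, V),
    state        theta_t = G theta_{t-1} + omega_t, omega_t ~ N_p(0, W),
    and initial state theta_0 ~ N_p(m0, C0)."""
    rng = np.random.default_rng(rng)
    p, r = F.shape
    theta = rng.multivariate_normal(m0, C0)
    thetas = np.empty((T, p))
    ys = np.empty((T, r))
    for t in range(T):
        theta = G @ theta + rng.multivariate_normal(np.zeros(p), W)
        thetas[t] = theta
        ys[t] = F.T @ theta + rng.multivariate_normal(np.zeros(r), V)
    return thetas, ys
```

For instance, `simulate_dlm(50, np.ones((1, 1)), np.ones((1, 1)), ...)` reproduces the univariate local-level setup used later in the simulation study.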
Leveraging the properties of the multivariate normal distribution and the structure of the Gaussian dlm, it is possible to derive closed-form expressions for the predictive and filtering distributions and to conduct dynamic inference on the states \(\varvec{\theta }_t\) via the Kalman filter, conditioning on \({\textbf{F}}_t,{\textbf{G}}_t, {\textbf{W}}\) and \({\textbf{V}}\). However, Naveau et al. (2005) observed that the Gaussian assumption may be questionable in a large number of applications, as many distributions used in a state-space model can be skewed. To mitigate this issue, Naveau et al. (2005) assumed that the initial state parameter vector \(\varvec{\theta }_0\) follows a multivariate closed skew-normal distribution, preserving the typical assumptions of independence and normality for the error sequences \(\varvec{\nu }_t\) and \(\varvec{\omega }_t\). Building on this work, several authors have proposed different mechanisms to obtain dlms with skewness. For instance, Kim et al. (2014) extended the results in Naveau et al. (2005) by assuming a scale mixture of closed skew-normal distributions for the initial state parameter vector \(\varvec{\theta }_0\); Cabral et al. (2014) proposed a Bayesian dlm relaxing the assumption of normality and assuming an extended skew-normal distribution (Azzalini and Capitanio 1999) for the initial distribution of the state parameter; Arellano-Valle et al. (2019) proposed a dlm in which the error sequence \(\varvec{\nu }_t\) in the observational equation is assumed to have a multivariate skew-normal distribution. Several other authors have dealt with similar problems; see, for example, Gualtierotti (2005), Pourahmadi (2007), Corns and Satchell (2007), among many others.
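Conditionally on \({\textbf{F}}_t,{\textbf{G}}_t,{\textbf{V}},{\textbf{W}}\), the closed-form recursions mentioned above are the classical Kalman filter. A single forward step can be sketched as follows (a minimal illustration with time-invariant matrices; the function name is ours):

```python
import numpy as np

def kalman_step(y, m, C, F, G, V, W):
    """One forward step of the classical Kalman filter for the Gaussian dlm:
    given theta_{t-1} | D_{t-1} ~ N(m, C), return the moments of theta_t | D_t
    under y_t = F' theta_t + nu_t and theta_t = G theta_{t-1} + omega_t."""
    a = G @ m                        # predictive state mean
    R = G @ C @ G.T + W              # predictive state covariance
    f = F.T @ a                      # one-step-ahead forecast mean
    Q = F.T @ R @ F + V              # forecast covariance
    K = R @ F @ np.linalg.inv(Q)     # Kalman gain
    m_new = a + K @ (y - f)          # filtered mean
    C_new = R - K @ F.T @ R          # filtered covariance
    return m_new, C_new
```

Iterating this step over \(t=1,\ldots ,T\) yields the filtering distributions \(N(m_t, C_t)\) that the tpn-dlm of Sect. 3 generalizes.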
In this work we take a similar perspective, and derive a novel dlm that allows one to induce asymmetry by means of a scalar parameter, inducing a skewed initial distribution for the state space parameter \(\varvec{\theta }_0\). Our purpose is to replace the normal distribution for \(\varvec{\theta }_0\) by a more flexible one, incorporating asymmetry via a two-piece normal (tpn) mixing distribution. Using this simple method, we obtain an extension of the classic Kalman filter, and closed form expressions for one-step-ahead and filtering distributions. These results are further combined into a Markov-Chain Monte Carlo procedure via a forward filtering–backward sampling algorithm to provide inference on the covariances \(\textbf{V}\) and \(\textbf{W}\) of the error terms, providing posterior inference on the unknown quantities.
2 Two-piece normal and skew normal distributions
2.1 Two-piece normal distributions
According to Arellano-Valle et al. (2005), a continuous random variable Y follows a two-piece normal (tpn) distribution with location \(\mu\), scale \(\sigma\) and asymmetry parameter \(\gamma\) if its density function for any \(y\in {\mathbb {R}}\) can be written as
where \(\phi (x)\) denotes the density of a standard Gaussian and \(I_A(x)\) denotes the indicator function of the set A; we write such a distribution compactly as \(Y\sim TPN(\mu , \sigma , \gamma )\). Skewness is controlled via two functions \(a(\gamma )\) and \(b(\gamma )\) satisfying the following properties:
(i) \(a(\gamma )\) and \(b(\gamma )\) are positive-valued functions for \(\gamma \in (\gamma _L,\gamma _U)\), a possibly infinite interval;

(ii) one of the functions is (strictly) increasing and the other is (strictly) decreasing;

(iii) there exists a unique value \(\gamma _* \in (\gamma _L, \gamma _U)\) such that \(a(\gamma _*) = b(\gamma _*)\), and therefore the tpn density (1) becomes
$$\begin{aligned} h(y)=\frac{2}{\sigma a(\gamma _*)}\phi \left( \frac{y-\mu }{\sigma a(\gamma _*)}\right) . \end{aligned}$$
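The two branches of density (1) can be checked numerically. In the sketch below, the parametrization (scale \(\sigma a(\gamma )\) to the right of \(\mu\), \(\sigma b(\gamma )\) to the left, normalizing constant \(2/\{\sigma (a(\gamma )+b(\gamma ))\}\)) is inferred from property P2 in the list that follows, and is therefore an assumption of this sketch rather than a transcription of (1); the choices \(a(\gamma )=1+\gamma\), \(b(\gamma )=1-\gamma\) match Scenario 1 of the simulation study:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def tpn_pdf(y, mu, sigma, gamma, a=lambda g: 1 + g, b=lambda g: 1 - g):
    """TPN(mu, sigma, gamma) density: scale sigma*a(gamma) to the right of mu,
    sigma*b(gamma) to the left, normalized by 2 / (sigma * (a + b))."""
    ag, bg = a(gamma), b(gamma)
    z = (y - mu) / sigma
    branch = norm.pdf(z / ag) if y >= mu else norm.pdf(z / bg)
    return 2.0 / (sigma * (ag + bg)) * branch

# sanity check: mass b/(a+b) falls to the left of mu, a/(a+b) to the right
left, _ = quad(lambda y: tpn_pdf(y, 3.0, np.sqrt(3.0), 0.5), -np.inf, 3.0)
right, _ = quad(lambda y: tpn_pdf(y, 3.0, np.sqrt(3.0), 0.5), 3.0, np.inf)
```

With \(\gamma =0.5\) the right tail carries probability \(1.5/2 = 0.75\), so the density is skewed to the right.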
In addition, the tpn distribution has several interesting formal properties in terms of stochastic construction (Arellano-Valle et al. 2005). The following list includes those most relevant for the purposes of this work:
P1. The tpn density (1) can be expressed as a finite mixture of two truncated normal densities \(f_a\) and \(f_b\) given by
That is,
where
P2. If \(Y \sim h\), then \(Y \buildrel d\over = \mu + \sigma W_\gamma V\), where the notation \(\buildrel d\over =\) indicates equality in distribution. Specifically, \(V \sim TN(0,1, [0, \infty ))\), a normal with location 0 and scale 1 truncated to the positive real line, while \(W_\gamma\) is an independent discrete random variable with probability function
which can be rewritten as
with \(s = \text{ sign }(w)\) and \(\pi _s\) defined by (3). Equivalently, if \(Y = \mu + \sigma W_\gamma |X|\), where \(X \sim N(0,1)\) and is independent of \(W_\gamma\), then \(Y \sim h\). This stochastic representation allows one to obtain the mean and variance of Y leveraging the law of total expectation; refer to Arellano-Valle et al. (2020) for further details.
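Property P2 yields a direct sampler, which we sketch below (function name ours); \(W_\gamma\) takes the value \(a(\gamma )\) with probability \(a(\gamma )/\{a(\gamma )+b(\gamma )\}\) and \(-b(\gamma )\) otherwise:

```python
import numpy as np

def rtpn(n, mu, sigma, gamma, a=lambda g: 1 + g, b=lambda g: 1 - g, rng=None):
    """Draw n variates from TPN(mu, sigma, gamma) via Y = mu + sigma*W*|X| (P2):
    W = a(gamma) with probability a/(a+b), W = -b(gamma) otherwise, X ~ N(0,1)."""
    rng = np.random.default_rng(rng)
    ag, bg = a(gamma), b(gamma)
    w = np.where(rng.random(n) < ag / (ag + bg), ag, -bg)
    return mu + sigma * w * np.abs(rng.standard_normal(n))
```

By the law of total expectation, \(E(Y)=\mu +\sigma \,E(W_\gamma )\sqrt{2/\pi }\); for \(\mu =3\), \(\sigma =\sqrt{3}\), \(\gamma =0.5\) (the Scenario 1 parameters) this gives \(E(W_\gamma )=1\) and \(E(Y)\approx 4.38\), which the sampler reproduces.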
2.2 Skew-normal distribution
A random vector \({\textbf{Y}}\) has a multivariate Skew-Normal sn distribution with location vector \(\varvec{\xi }\), positive definite scale matrix \(\varvec{\Omega }\) and skewness/shape vector \(\varvec{\lambda }\), denoted by \({\textbf{Y}} \sim SN_p (\varvec{\xi }, \varvec{\Omega }, \varvec{\lambda })\), if its density function is given by
Here, \(\phi _p(\cdot ;\varvec{\xi },\varvec{\Omega })\) denotes the density function of the p-variate normal distribution with mean vector \(\varvec{\xi }\) and variance-covariance matrix \(\varvec{\Omega }\), and \(\Phi (\cdot )\) is the cumulative distribution function of a standard normal. The sn random vector \({\textbf{Y}} \sim SN_p (\varvec{\xi }, \varvec{\Omega }, \varvec{\lambda })\) can be introduced as the location-scale transformation \({\textbf{Y}} = \varvec{\xi } + {\varvec{\Omega }}^{1/2}{\textbf{X}}\), where \({\textbf{X}}\) has the following stochastic representation:
where \(\varvec{\delta } = \varvec{\lambda }/{(1 + {\varvec{\lambda }}^{\top }\varvec{\lambda })}^{1/2}\), \(X_0 \sim N(0,1)\) and \({\textbf{X}}_1 \sim N_p (\varvec{0}, \varvec{I}_p - \varvec{\delta }{\varvec{\delta }}^{\top })\), which are independent. By (4) we can get that if \({\textbf{Y}} \sim SN_p (\varvec{\xi }, \varvec{\Omega }, \varvec{\lambda })\), then there are two independent random quantities Z and \({\textbf{U}}\), with \(Z \buildrel d\over = \vert X_0 \vert\) and \({\textbf{U}} \buildrel d\over = {\varvec{\Omega }}^{1/2}{\textbf{X}}_1\), such that
where \(\varvec{\Delta } = {\varvec{\Omega }}^{1/2}\varvec{\delta }\). Note that \(Z \sim HN(0,1)\) and \({\textbf{U}} \sim N_p(\varvec{0}, \varvec{\Omega } - \varvec{\Delta }{\varvec{\Delta }}^{\top })\). Thus, using (5), it can be shown that the mean vector and variance-covariance matrix of \({\textbf{Y}}\) are given respectively by
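Representation (5) also gives a direct sampler for the sn distribution. The sketch below (function name ours) uses the symmetric eigendecomposition square root as one convenient choice for \({\varvec{\Omega }}^{1/2}\):

```python
import numpy as np

def rsn(n, xi, Omega, lam, rng=None):
    """Sample SN_p(xi, Omega, lambda) via Y = xi + Delta*Z + U (Eq. 5),
    with Z ~ HN(0, 1) and U ~ N_p(0, Omega - Delta Delta')."""
    rng = np.random.default_rng(rng)
    xi, lam = np.asarray(xi, float), np.asarray(lam, float)
    delta = lam / np.sqrt(1.0 + lam @ lam)
    vals, vecs = np.linalg.eigh(Omega)            # symmetric square root of Omega
    Delta = vecs @ np.diag(np.sqrt(vals)) @ vecs.T @ delta
    Z = np.abs(rng.standard_normal(n))            # half-normal component
    U = rng.multivariate_normal(np.zeros(len(lam)),
                                Omega - np.outer(Delta, Delta), size=n)
    return xi + Z[:, None] * Delta + U
```

Since \(E(Z)=\sqrt{2/\pi }\), the sample mean converges to \(\varvec{\xi } + \sqrt{2/\pi }\,\varvec{\Delta }\), matching the mean formula referenced above.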
3 A two-piece normal dynamic linear model
3.1 The initial state distribution
Our proposal in this section is to derive a more flexible dlm that regulates asymmetry through a simple scalar parameter. Specifically, preserving the classical independence assumptions, we consider the dlm defined by
for \(t=1,\ldots , T\), replacing the initial state parameter \(\varvec{\theta }_0\) distribution with the following hierarchical specification:
The model defined by the Eqs. (6, 7) and (8, 9) will be referred to as two-piece normal dynamic linear model (hereafter tpn-dlm).
As a first important result, we note that the hierarchical specification (8, 9) leads to a mixture of two multivariate skew-normals as the initial distribution for \(\varvec{\theta }_0\). The proof can be derived as a direct extension of Proposition 2 in Arellano-Valle et al. (2020), and is therefore omitted here.
Proposition 3.1
Under the hierarchical representation defined by Eqs. (8, 9), the initial density of \(\varvec{\theta }_0\) is given by
where, for \(s=a,b\), \(\pi _s\) is defined by (3), and
Here it should be noted that from the well-known matrix inversion formula
we get, for \(s=a,b\), that \(1-{\varvec{\alpha }}^\top _s{\varvec{\Omega }}^{-1}_s\varvec{\alpha }_s = {(1+{\varvec{\alpha }}^\top _s{\varvec{C}_0}^{-1}\varvec{\alpha }_s)}^{-1}>0\) and \({\varvec{\alpha }}^\top _s{\varvec{\Omega }}^{-1}_s={(1+{\varvec{\alpha }}^\top _s{\varvec{C}_0}^{-1}\varvec{\alpha }_s)}^{-1}{\varvec{\alpha }}^\top _s{\varvec{C}_0}^{-1}\), so that the term \(\varvec{\eta }_s\) defined in Proposition 3.1 can be rewritten as
The distribution of the initial random vector \(\varvec{\theta }_0\) can be written as
which corresponds to the density of a two-component mixture of the multivariate skew-normal densities introduced in Sect. 2.2. Specifically, from Proposition 3.1, we see that the initial state parameter is distributed as
where
3.2 The Kalman filter
Our next step is to develop a Kalman filter based on the new initial distribution given by (12), assuming that the conditional distribution of \(\varphi\) corresponds to a mixture of two truncated Gaussian densities.
Let \(D_t = \{{\textbf{y}}_1,\cdots ,{\textbf{y}}_t\}\) denote the available information at time t, where \({\textbf{y}}_i\) indicates a realization of the random variable \({\textbf{Y}}_i\). In the proposed tpn-dlm we consider a conditionally normal distribution for \(\varvec{\theta }_0\) given \(\varphi\), with a tpn initial distribution for \(\varphi\) (8). Furthermore, we assume by induction that
Specifically, the conditional distribution of \(\varphi\) corresponds to a mixture of two truncated Gaussians with locations \(\eta _{t-1}^s\), scales \(\tau _{t-1}^s\) and mixing weights \(\pi ^s_{t-1}\), for \(s=a,b\), and truncation point \(\mu\), defined by the initial distribution given in (9).
Leveraging the conditional independence properties of the tpn-dlm outlined in Eq. 7, the one-step-ahead predictive distribution of \({\varvec{\theta }}_t\) given \((\varphi ,D_{t-1})\) is given by
where
Similarly, using (6) we find that the one-step-ahead predictive distribution of \({\textbf{Y}}_t\) given \((\varphi ,D_{t-1})\) becomes
where
In other words, from (14) and (16) we have that
and therefore
Finally, by applying the properties of the conditional normal distribution, we obtain the following filtering distribution of \(\varvec{\theta }_t\) given \((\varphi , D_{t})\):
where
with \(\varvec{a}_t\), \(\varvec{b}_t\), \(\varvec{R}_t\) and \(\varvec{\Sigma }_t\) defined as in (15) and (17), respectively.
The above results are formalized below:
Proposition 3.2
Consider the TPN-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumptions (13). Then:
(i) The one-step-ahead conditional predictive distribution of the states is

$$\begin{aligned} \varvec{\theta }_{t} \vert (\varphi , D_{t-1}) \sim N_p (\varvec{a}_{t} + \varphi \varvec{b}_{t}, \varvec{R}_{t}); \end{aligned}$$

(ii) The one-step-ahead conditional predictive distribution of the observations is

$$\begin{aligned} {\textbf{Y}}_{t} \vert (\varphi , D_{t-1}) \sim N_r ( \varvec{F}^{\top }_t\varvec{a}_{t} + \varphi \varvec{F}^{\top }_t\varvec{b}_{t}, \varvec{\Sigma }_{t}); \end{aligned}$$

(iii) The conditional filtering distribution of the states is
$$\begin{aligned} \varvec{\theta }_{t} \vert (\varphi , D_{t}) \sim N_p ( \varvec{m}_{t} + \varphi \varvec{\beta }_{t}, \varvec{C}_{t}). \end{aligned}$$
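The explicit update formulas for \(\varvec{a}_t\), \(\varvec{b}_t\), \(\varvec{R}_t\), \(\varvec{\Sigma }_t\), \(\varvec{m}_t\), \(\varvec{\beta }_t\) and \(\varvec{C}_t\) did not survive this extraction. Since the model is Gaussian conditionally on \(\varphi\), one step of the filter in Proposition 3.2 can nonetheless be sketched with the standard conditionally Gaussian forms; the recursions below are our reconstruction under that assumption, not a verbatim transcription of the paper's formulas:

```python
import numpy as np

def tpn_kalman_step(y, m, beta, C, F, G, V, W):
    """One conditional Kalman step for the tpn-dlm: given
    theta_{t-1} | (phi, D_{t-1}) ~ N(m + phi*beta, C), return (m_t, beta_t, C_t)
    such that theta_t | (phi, D_t) ~ N(m_t + phi*beta_t, C_t)."""
    a = G @ m                          # phi-free part of the predictive mean
    b = G @ beta                       # coefficient of phi in the predictive mean
    R = G @ C @ G.T + W                # predictive covariance R_t
    Sigma = F.T @ R @ F + V            # forecast covariance Sigma_t
    K = R @ F @ np.linalg.inv(Sigma)   # gain
    return a + K @ (y - F.T @ a), b - K @ (F.T @ b), R - K @ F.T @ R
```

With \(\varvec{\beta }_{t-1}=\varvec{0}\) the step collapses to the ordinary Kalman filter, consistent with the Gaussian dlm as a special case.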
The next proposition establishes the conditional distribution of \(\varphi \vert D_{t}\).
Proposition 3.3
Consider the TPN-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumptions (13). Then the conditional distribution of \(\varphi \vert D_{t}\) has a finite mixture density of two truncated Gaussian distributions, given by
where, for \(s=a,b\),
and
where
This representation allows one to characterize the expected value and variance of \(\varphi \mid D_t\), which can be expressed as
and
A simpler expression for \(Var\{\varphi \mid D_t\}\) can be obtained in terms of the Chi-square cumulative distribution function, adapting Barr and Sherrill (1999).
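Component-wise, these are moments of truncated normals, which standard routines compute; the mixture moments then follow from the laws of total expectation and variance. In the sketch below (function names ours), which component is truncated above versus below \(\mu\) follows our reading of the signs inside the \(\Phi\) terms of Proposition 3.3:

```python
import numpy as np
from scipy.stats import truncnorm

def tn_moments(eta, tau, mu, upper_tail=True):
    """Mean and variance of N(eta, tau) truncated at mu (tau is a variance,
    matching the tau_t^s notation); upper_tail=True restricts to [mu, inf),
    otherwise to (-inf, mu]."""
    sd = np.sqrt(tau)
    lo, hi = ((mu - eta) / sd, np.inf) if upper_tail else (-np.inf, (mu - eta) / sd)
    d = truncnorm(lo, hi, loc=eta, scale=sd)
    return d.mean(), d.var()

def mixture_moments(pi_a, pi_b, mom_a, mom_b):
    """Moments of a two-component mixture via total expectation/variance."""
    (ma, va), (mb, vb) = mom_a, mom_b
    mean = pi_a * ma + pi_b * mb
    var = pi_a * (va + ma ** 2) + pi_b * (vb + mb ** 2) - mean ** 2
    return mean, var
```

For instance, an even mixture of the two halves of a standard normal recovers mean 0 and variance 1, as it must.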
Immediate consequences of these results are given in the following proposition.
Proposition 3.4
Consider the TPN-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumptions (13). Then:
(i) The one-step-ahead predictive distribution of \(\varvec{\theta }_{t}\) given \(D_{t-1}\) is

$$\begin{aligned} p(\varvec{\theta }_{t} \vert D_{t-1})&= \pi ^a_{t-1}\phi _p(\varvec{\theta }_{t}; \varvec{a}_t+\eta ^a_{t-1}\varvec{b}_t,\varvec{R}_t+\tau ^a_{t-1}\varvec{b}_t \varvec{b}^\top _t)\Phi \left( -\frac{\mu -\chi _t^a}{\sqrt{\vartheta _t^a}}\right) \\&\quad +\pi ^b_{t-1}\phi _p(\varvec{\theta }_{t};\varvec{a}_t+\eta ^b_{t-1}\varvec{b}_t, \varvec{R}_t+\tau ^b_{t-1}\varvec{b}_t\varvec{b}^\top _t)\Phi \left( \frac{\mu -\chi _t^b}{\sqrt{\vartheta _t^b}}\right) , \end{aligned}$$

where, for \(s=a, b\),

$$\begin{aligned} \chi _t^s =\frac{\eta ^s_{t-1}+\tau ^s_{t-1}\varvec{b}^\top _t\varvec{R}_t^{-1} (\varvec{\theta }_t-\varvec{a}_t)}{1+\tau _{t-1}^s \varvec{b}^\top _t\varvec{R}_t^{-1} \varvec{b}_t},\quad \vartheta _t^s=\frac{\tau _{t-1}^s}{1+\tau _{t-1}^s \varvec{b}^\top _t\varvec{R}_t^{-1}\varvec{b}_t}. \end{aligned}$$

(ii) The one-step-ahead predictive distribution of \({\textbf{y}}_{t}\) given \(D_{t-1}\) is

$$\begin{aligned} p({\textbf{y}}_{t} \vert D_{t-1})&=\pi ^a_{t-1}\phi _r({\textbf{y}}_t ; \varvec{F}^{\top }_t(\varvec{a}_{t} + \eta ^a_{t-1}\varvec{b}_{t}) , {\varvec{\Sigma }}_t +\tau ^a_{t-1}\varvec{F}^{\top }_t\varvec{b}_{t}\varvec{b}_t^{\top } \varvec{F}_t)\Phi \left( -\frac{\mu -\eta _t^a}{\sqrt{\tau _t^a}}\right) \\&\quad +\pi ^b_{t-1}\phi _r({\textbf{y}}_t ; \varvec{F}^{\top }_t(\varvec{a}_{t} + \eta ^b_{t-1}\varvec{b}_{t}) , {\varvec{\Sigma }}_t + \tau ^b_{t-1}\varvec{F}^{\top }_t\varvec{b}_{t}\varvec{b}_t^{\top }\varvec{F}_t) \Phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau _t^b}}\right) , \end{aligned}$$

where \(\eta _t^s\) and \(\tau ^s_t\), for \(s=a,b\), are defined in Proposition 3.3.

(iii) The filtering distribution is

$$\begin{aligned} p(\varvec{\theta }_{t} \vert D_{t})&= \pi ^a_t\phi _p(\varvec{\theta }_{t}; \varvec{m}_t+\eta _t^a\varvec{\beta }_t,\varvec{C}_t + \tau ^a_t\varvec{\beta }_t\varvec{\beta }^\top _t) \frac{\Phi \left( -\frac{\mu -\delta _t^a}{\sqrt{\upsilon _t^a}}\right) }{\Phi \left( -\frac{\mu -\eta ^a_t}{\sqrt{\tau ^a_t}}\right) }\\&\quad +\pi ^b_t\phi _p(\varvec{\theta }_{t}; \varvec{m}_t+\eta _t^b\varvec{\beta }_t,\varvec{C}_t + \tau _t^b\varvec{\beta }_t\varvec{\beta }^\top _t) \frac{\Phi \left( \frac{\mu -\delta _t^b}{\sqrt{\upsilon _t^b}}\right) }{\Phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau ^b_t}}\right) }, \end{aligned}$$(23)

where \(\pi _t^s\), \(\eta ^s_t\) and \(\tau ^s_t\), for \(s=a, b\), are defined in Proposition 3.3, and
$$\begin{aligned} \delta _t^s=\frac{\eta _t^s+\tau _t^s\varvec{\beta }^\top _t\varvec{C}_t^{-1}(\varvec{\theta }_{t}-\varvec{m}_t)}{1+\tau _t^s\varvec{\beta }^\top _t\varvec{C}_t^{-1}\varvec{\beta }_t},\quad \upsilon _t^s=\frac{\tau _t^s}{1+\tau _t^s\varvec{\beta }^\top _t\varvec{C}_t^{-1}\varvec{\beta }_t}. \end{aligned}$$
Proposition 3.4 shows that the one-step-ahead predictive distribution of the states is typically skewed, and the same holds for the analogous predictive distribution of the response and for the filtering distribution. This can be seen by comparing the results of Proposition 3.4 with those of the usual dlm (see, e.g., Petris et al. 2009). Finally, since
and
then from equations (14), (16) and the property P3. of the tpn distribution (see Sect. 2.1), we obtain the following results:
Proposition 3.5
Under the tpn-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumptions (13), the one-step-ahead expected filtering and prediction distributions and their covariance matrices are, respectively, given by
where \(E\{\varphi \mid D_t\}\) and \(Var\{\varphi \mid D_t\}\) are derived in Eqs. 21 and 22, respectively.
4 Outline of Bayesian computation
In this section we combine the results obtained above into a forward filtering backward sampling (ffbs) algorithm to conduct full Bayesian inference on the model parameters \(\varvec{\Theta }= (\varvec{\theta }_1, \dots , \varvec{\theta }_T)\), \(\varvec{V}\) and \(\varvec{W}\) via Markov-chain Monte Carlo (mcmc). In particular, we assign Inverse-Wishart priors to the error covariances \(\varvec{V}\) and \(\varvec{W}\) as
where \(\varvec{M}\) and \(\varvec{Z}\) are positive definite matrices of size \(r\times r\) and \(p\times p\), respectively, while \(\ell\) and g are scalars such that \(\ell >(r-1)/2\) and \(g>(p-1)/2\). This choice guarantees that the covariance matrices \({\textbf{V}}\) and \({\textbf{W}}\) are positive definite.
Conditionally on the latent states, the model is Gaussian and these priors are conjugate. Therefore, the full conditional distributions of \(\varvec{V}\) and \(\varvec{W}\) are again Inverse-Wishart with
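The full-conditional parameters were elided in this extraction, but the conditionally Gaussian structure fixes their standard form. The sketch below states them in SciPy's IW(df, scale) parametrization, which is a reparametrization of the \((\ell ,\varvec{M})\) and \((g,\varvec{Z})\) priors above; the exact correspondence of hyperparameters is our assumption:

```python
import numpy as np
from scipy.stats import invwishart

def sample_V_W(y, theta, F, G, nu_V, S_V, nu_W, S_W, rng=None):
    """Gibbs step for the error covariances given the states:
      V | ... ~ IW(nu_V + T, S_V + sum_t e_t e_t'),  e_t = y_t - F' theta_t,
      W | ... ~ IW(nu_W + T, S_W + sum_t w_t w_t'),  w_t = theta_t - G theta_{t-1}.
    theta has T+1 rows (theta_0, ..., theta_T); y has T rows."""
    T = y.shape[0]
    e = y - theta[1:] @ F                 # observation residuals, one row per t
    w = theta[1:] - theta[:-1] @ G.T      # state innovations
    V = invwishart(df=nu_V + T, scale=S_V + e.T @ e).rvs(random_state=rng)
    W = invwishart(df=nu_W + T, scale=S_W + w.T @ w).rvs(random_state=rng)
    return np.atleast_2d(V), np.atleast_2d(W)
```

In the univariate case these draws reduce to inverse-gamma sampling, matching the scalar priors used in the simulation study.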
In order to sample from \(\varvec{\Theta }|(D_T,\varphi )\), we rely on backward recursions and decompose the filtered distribution of the state parameters, following Carter and Kohn (1994) and Frühwirth-Schnatter (1994), as
where
and with
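The backward-recursion quantities were elided in this extraction; conditionally on \(\varphi\) the model is Gaussian, so one draw of the backward pass can be sketched with the standard Carter-Kohn recursion (our reconstruction under that assumption):

```python
import numpy as np

def backward_sample_step(theta_next, m, beta, C, G, W, phi, rng=None):
    """Draw theta_t | (theta_{t+1}, D_t, phi): the Carter-Kohn recursion
    applied with filtered mean m + phi*beta and filtered covariance C."""
    rng = np.random.default_rng(rng)
    mean_t = m + phi * beta               # filtered mean of theta_t | (phi, D_t)
    R_next = G @ C @ G.T + W              # predictive covariance at t+1
    B = C @ G.T @ np.linalg.inv(R_next)   # smoothing gain
    h = mean_t + B @ (theta_next - G @ mean_t)
    H = C - B @ G @ C
    return rng.multivariate_normal(h, H)
```

Running this step for \(t = T-1, \dots , 1\), each time conditioning on the draw just obtained, produces a joint sample of \(\varvec{\Theta }|(D_T,\varphi )\).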
4.1 mcmc algorithm
Posterior sampling can be performed combining the above results in a mcmc algorithm, alternating the Kalman filter with sampling from the conditional distributions. The following pseudo-code illustrates the steps of a single mcmc iteration:
1. Sample \(\varvec{\Theta }\) using the following modified ffbs algorithm:

   1a. For \(t=1,\dots , T\), update the parameters of the distribution \(\varvec{\theta }_t|D_t\) using the Kalman filter given in Sect. 3.2 (forward filtering);

   1b. For \(t=1, \dots , T\), sample \(\varphi |D_t\) from the conditional distribution outlined in Eq. 20;

   1c. Sample \(\varvec{\theta }_T|D_T\) from the filtering distribution reported in Eq. 23;

   1d. For \(t = T-1, T-2, \dots , 1\), sample \(\varvec{\theta }_t|(\varvec{\theta }_{t+1},D_t, \varphi )\) from the distribution outlined in Eq. 28, conditioning on the \(\varvec{\theta }_{t+1}\) sampled in the previous step (backward smoothing);

2. Sample \(\varvec{V}\) from its Inverse-Wishart full-conditional distribution, outlined in Eq. (24);

3. Sample \(\varvec{W}\) from its Inverse-Wishart full-conditional distribution, outlined in Eq. (25).
5 Simulation
We propose a simulation study to compare the performance of the proposed approach against a Gaussian dlm, focusing on different settings with varying sample sizes. We focus on univariate settings, assuming that the matrices \(\{\varvec{G}_t\}\) and \(\{\varvec{F}_t\}\) are one-dimensional and time-invariant, namely \(\varvec{F}=\varvec{G}=1\). We simulated \(T=50\) observations from the dlm defined by
with different specifications of the initial distribution of \(\theta _0\) and of the disturbances \(\nu _t\) and \(\omega _t\). Specifically, we focus on the following settings:
(1) Scenario 1: data are generated from a two-piece dlm, with initial distribution \(\theta _0|\varphi \sim N(-3+2\varphi , 2)\) and \(\varphi \sim TPN(3,\sqrt{3},0.5)\), letting \(a(\gamma ) = 1 + \gamma\) and \(b(\gamma )= 1-\gamma\), with Gaussian errors \(\nu _t \sim N(0,5)\) and \(\omega _t \sim N(0,3)\);

(2) Scenario 2: data are generated from a Gaussian dlm, with \(\theta _0 \sim N(-3, 2)\) and Gaussian errors \(\nu _t \sim N(0,5)\), \(\omega _t \sim N(0,3)\);

(3) Scenario 3: data are generated from a dlm with heavy tails, simulating \(\theta _0\), \(\nu _t\) and \(\omega _t\) from independent Student's t distributions with 3 degrees of freedom.
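Scenario 1 can be generated directly from the hierarchical specification; the sketch below (function name ours) draws \(\varphi\) via property P2 and reads the second argument of each \(N(\cdot ,\cdot )\) as a variance, which is our assumption about the notation:

```python
import numpy as np

def simulate_scenario1(T=50, rng=None):
    """Generate one series from Scenario 1: F = G = 1,
    theta_0 | phi ~ N(-3 + 2*phi, 2), phi ~ TPN(3, sqrt(3), 0.5) with
    a(g) = 1 + g and b(g) = 1 - g, nu_t ~ N(0, 5), omega_t ~ N(0, 3)."""
    rng = np.random.default_rng(rng)
    a, b = 1.5, 0.5                                  # a(0.5), b(0.5)
    w = a if rng.random() < a / (a + b) else -b
    phi = 3.0 + np.sqrt(3.0) * w * abs(rng.standard_normal())  # TPN draw via P2
    theta = rng.normal(-3.0 + 2.0 * phi, np.sqrt(2.0))
    y = np.empty(T)
    for t in range(T):
        theta = theta + rng.normal(0.0, np.sqrt(3.0))  # state equation
        y[t] = theta + rng.normal(0.0, np.sqrt(5.0))   # observation equation
    return y
```

Scenarios 2 and 3 follow the same loop with \(\theta _0 \sim N(-3,2)\) or with Student's t draws in place of the Gaussian ones.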
We chose diffuse inverse-gamma priors for V and W, which in this case are scalars, with parameters \(\ell =M=g=Z=0.001\). We compare our approach with a Gaussian dlm under the same prior distributions, running both algorithms for 5000 iterations after 500 burn-in samples and focusing on the one-step-ahead predictions and state parameters. Examination of traceplots of the parameters, auto-correlation functions and Rubin's diagnostics showed no evidence against convergence.
Figures 1, 2 and 3 show the one-step-ahead predictions and filtered estimates in the three scenarios. The empirical findings indicate that, as expected, the main advantage of the proposed approach is most evident in the initial part of the series, where the impact of the initial distribution is substantial. This is clearly seen in Fig. 1, where the tpn-dlm is correctly specified and the Gaussian dlm tends to underestimate the state parameter and the one-step-ahead predictions. When data are generated from a Gaussian dlm, as in Fig. 2, the tpn initial distribution is incorrectly specified; however, its impact vanishes after a few steps, and its one-step-ahead predictions become indistinguishable from those of a Gaussian dlm. Lastly, Fig. 3 focuses on a setting where both models are incorrectly specified, in terms of both the initial distribution and the distribution of the errors. We observe that the proposed tpn-dlm is robust against such misspecification, obtaining one-step-ahead predictions and state estimates that are closer to the true level.
These findings are further explored by replicating the simulation scenarios for different sample sizes \(T\in \{10,50,100\}\), with \(T=50\) corresponding to the results in Figs. 1, 2 and 3. Results are reported in Table 1, comparing the mean squared error (mse) of the expected value of the one-step-ahead distributions under both approaches. Empirical results are consistent with the previous discussion, with the tpn-dlm performing particularly well for small sample sizes, under correct specification, and for heavy-tailed processes.
6 Analysis of real data
Finally, we illustrate the tpn-dlm by analyzing the quarterly earnings in dollars per Johnson and Johnson share from 1960 to 1980 (Shumway et al. 2000, Example 1.1).
The data are characterized by seasonality that is stronger in the early and late years and almost absent in the central years, while the trend is increasing and regular. Following Shumway et al. (2000), the time series is modelled as the sum of a trend component, a seasonal component and white noise
The trend component is modelled as follows
and we assume that the seasonal component is expected to sum to zero over a complete period of four quarters
We may express the model in state-space form by choosing \([T_t, S_t, S_{t-1}, S_{t-2}]^{\top }\) as the state vector:
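With this state vector, the observation equation picks out \(T_t + S_t\) and the transition matrix propagates the trend and shifts the seasonal lags. A sketch of the resulting matrices (function name ours):

```python
import numpy as np

def jj_matrices(phi):
    """State-space matrices for state x_t = [T_t, S_t, S_{t-1}, S_{t-2}]':
    y_t = T_t + S_t + nu_t, T_t = phi*T_{t-1} + omega_{t1}, and the quarterly
    seasonal constraint S_t = -(S_{t-1} + S_{t-2} + S_{t-3}) + omega_{t2}."""
    G = np.array([[phi, 0.0, 0.0, 0.0],
                  [0.0, -1.0, -1.0, -1.0],
                  [0.0, 1.0, 0.0, 0.0],     # shift S_t into the S_{t-1} slot
                  [0.0, 0.0, 1.0, 0.0]])    # shift S_{t-1} into the S_{t-2} slot
    F = np.array([[1.0], [1.0], [0.0], [0.0]])  # y_t = F' x_t + nu_t
    return F, G
```

Only the \((1,1)\) and \((2,2)\) entries of \(\textbf{W}\) are nonzero under this specification, which is why \(\textbf{Z}=\text{ diag }\{0.05,0.05,0,0\}\) is used below.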
The parameters to be estimated are the observation noise variance, V, and the state noise variances associated with the trend, \(W_{11}\), and the seasonal components, \(W_{22}\). In addition, we need to estimate the transition parameter associated with the growth rate, \(\phi\). Following Shumway et al. (2000), Example 6.27, we write \(\phi =1+\zeta\), where \(0<\zeta \le 1\), and we rewrite the trend component as
so that, conditionally on the states, \(\zeta\) is the slope of the linear regression of \(T_t-T_{t-1}\) on \(T_{t-1}\) and \(\omega _{t1}\) is the error term. We choose a reference uninformative prior on \((\zeta ,\omega _{t1})\) and weakly informative priors for the remaining parameters by letting \(\ell =M=0.001\), \(g=0.05\) and \(\textbf{Z}=\text{ diag }\{0.05,0.05,0,0\}\). We ran the algorithm for 5000 iterations collected after 5000 burn-in samples. Examination of traceplots of the parameters, auto-correlation functions and Rubin's diagnostics showed no evidence against convergence. Figure 4 displays the trend (\(T_t\)) and trend-plus-seasonal (\(T_t + S_t\)) estimates, along with \(99\%\) credible intervals, for the Gaussian dlm and the tpn-dlm. Figure 5 displays the data and the one-step-ahead predictions for the time series \(Y_t\), again with \(99\%\) credible intervals for both models.
Figures 4 and 5 show that the \(99\%\) credible intervals for the states and the response differ between the two-piece dlm and the Gaussian dlm; as a consequence, the entire distributions of these quantities differ. In addition, we note that the skewness of the predictive distributions is maintained as time increases, showing the usefulness of the two-piece dlm.
The mean squared error was 0.2131 for the two-piece dlm and 0.3512 for the Gaussian dlm, showing an advantage in favour of the two-piece dlm. We also considered a BIC criterion: for competing models \(k = 1, 2,\ldots , K\), the smaller-is-better criterion is \(\hbox {BIC}_k = n \log MSE_k + m_k \log (n)\), where \(MSE_k\) is the predictive mean squared error and \(m_k\) is the number of independent parameters used to fit model k. We obtained \(-70.18\) for the Gaussian dlm, with \(m_k=4\), and \(-107.69\) for the two-piece dlm, with \(m_k=5\), confirming that, also accounting for model complexity, the two-piece dlm is preferable.
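The reported BIC values can be reproduced from the stated MSEs; here \(n=84\) is the number of quarterly observations over 1960-1980, a value we infer since the text does not state \(n\) explicitly:

```python
import numpy as np

def bic(mse, m_k, n):
    """Smaller-is-better criterion BIC_k = n*log(MSE_k) + m_k*log(n)."""
    return n * np.log(mse) + m_k * np.log(n)

# n = 84 quarterly observations (1960-1980); this value is our assumption
bic_dlm = bic(0.3512, 4, 84)
bic_tpn = bic(0.2131, 5, 84)
```

These evaluate to roughly \(-70.2\) and \(-107.7\), matching the values reported above to rounding.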
7 Conclusion
In this article we proposed a flexible dynamic linear model (dlm) for modeling and forecasting multivariate time series, relaxing the assumption of normality for the initial distribution of the state parameter and replacing it by a more flexible class of distributions, the two-piece normal distributions. This model allows the initial distribution of the state parameter to be skewed, with the asymmetry controlled by a scalar parameter. We derived a Kalman filter for this model, obtaining a two-component mixture as the predictive and filtering distributions, which maintain skewness.
In our opinion, the main contribution of this article is a simple and effective tool for modeling time series with possibly skewed distributions, such as Example 1.1 in Shumway et al. (2000) analyzed here. Moreover, since the predictive and filtering distributions are two-component mixtures, the model can simultaneously deal with several departures from normality, such as skewness, heavy tails, and multimodality.
References
Arellano-Valle RB, Azzalini A, Ferreira CS, Santoro K (2020) A two-piece normal measurement error model. Comput Stat Data Anal 144:106863
Arellano-Valle RB, Contreras-Reyes JE, Quintero FOL, Valdebenito A (2019) A skew-normal dynamic linear model and Bayesian forecasting. Comput Stat 34(3):1055–1085
Arellano-Valle RB, Gómez HW, Quintana FA (2005) Statistical inference for a general class of asymmetric distributions. J Stat Plan Inference 128(2):427–443
Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew normal distribution. J R Stat Soc Ser B Stat Methodol 61:579–602
Barr DR, Sherrill ET (1999) Mean and variance of truncated normal distributions. Am Stat 53(4):357–361
Cabral CRB, Da-Silva CQ, Migon HS (2014) A dynamic linear model with extended skew-normal for the initial distribution of the state parameter. Comput Stat Data Anal 74:64–80
Carter CK, Kohn R (1994) On Gibbs sampling for state space models. Biometrika 81(3):541–553
Corns T, Satchell S (2007) Skew Brownian motion and pricing European options. Eur J Financ 13(6):523–544
Fasano A, Rebaudo G, Durante D, Petrone S (2021) A closed-form filter for binary time series. Stat Comput 31(4):1–20
Frühwirth-Schnatter S (1994) Data augmentation and dynamic linear models. J Time Ser Anal 15(2):183–202
Gualtierotti A (2005) Skew-normal processes as models for random signals corrupted by Gaussian noise. Int J Pure Appl Math 20:109–142
Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 82:35–45
Kim H-M, Ryu D, Mallick BK, Genton MG (2014) Mixtures of skewed Kalman filters. J Multivar Anal 123:228–251
Naveau P, Genton MG, Shen X (2005) A skewed Kalman filter. J Multivar Anal 94(2):382–400
Petris G, Petrone S, Campagnoli P (2009) Dynamic linear models with R. Springer, Berlin
Pourahmadi M (2007) Skew-normal ARMA models with nonlinear heteroscedastic predictors. Commun Stat Theory Methods 36(9):1803–1819
Shumway RH, Stoffer DS, Stoffer DS (2000) Time series analysis and its applications, vol 3. Springer, Berlin
West M, Harrison J (1997) Bayesian forecasting and dynamic models. Springer Science & Business Media, Berlin
Funding
Open access funding provided by Università degli Studi di Padova within the CRUI-CARE Agreement.
Appendix: Proofs
Proof of Proposition 3.1
Using the hierarchical representation defined by Eqs. (8)-(9), and the fact that \(\varphi\) has a tpn density (2), we obtain
By making the change of variables \(\varphi _a=\frac{{\varphi -\mu }}{\sigma _0 a(\gamma _0)}\) and \(\varphi _b=\frac{-({\varphi -\mu })}{\sigma _0 b(\gamma _0)}\) we have
\(\square\)
Proof of Proposition 3.3
By the induction hypotheses, the conditional distribution of \(\varphi\) given \(D_t\) is
Using (6) and (16), we get
From the marginal/conditional representation of the multivariate normal distribution, we find, for \(s=a,b\), the following identity:
where, for \(s=a,b\),
Hence, we have
from which the inverse of the normalization constant becomes
Therefore, the conditional distribution of \(\varphi \mid D_t\) can be written as
with
which corresponds to the density of
and the result is proved. \(\square\)
Proof of Proposition 3.4
Part (i): by using (14) and (6) we have
where, for \(s=a, b\), we use that
by considering again the marginal/conditional factorization of the multivariate normal density, and
which leads to the proof of part (i).
Part (ii): The proof of this part is direct from (29) since \(c_t^{-1}=p({\textbf{y}}_{t} \vert D_{t-1})\).
Part (iii): Proceeding similarly to part (i), but now by using part (iii) of Proposition 3.2 and Proposition 3.3, we get
where, for
This proves part (iii). \(\square\)
Aliverti, E., Arellano-Valle, R.B., Kahrari, F. et al. A flexible two-piece normal dynamic linear model. Comput Stat 38, 2075–2096 (2023). https://doi.org/10.1007/s00180-023-01355-3