1 Introduction

State-space models have been extensively used in diverse areas of application for modeling and forecasting time series. An important special case is the class of dynamic linear models (hereafter dlm). This class includes the ordinary static linear model as a special case and assumes that the parameters can change over time, thus incorporating into the observational system variations that can significantly affect the observed behavior of the process of interest. The dlm is defined by

$$\begin{aligned} {\textbf{Y}}_t= & {} {\textbf{F}}^{\top }_t\varvec{\theta }_t + \varvec{\nu }_t\quad ({\text {observation equation}}),\\ \varvec{\theta }_t= & {} {\textbf{G}}_t\varvec{\theta }_{t-1} + \varvec{\omega }_t\quad ({\text {system or state equation}}), \end{aligned}$$

for \(t=1,\ldots , T\), where \({\textbf{Y}}_t\) is an \(r \times 1\) response vector, \({\textbf{F}}_t\) is a \(p \times r\) matrix that links the observed data with \(\varvec{\theta }_t\), a \(p \times 1\) vector of latent states at time t, and \({\textbf{G}}_t\) is a \(p\times p\) transition matrix that describes the evolution of the state parameters. The terms \(\varvec{\nu }_t\) and \(\varvec{\omega }_t\) are mutually independent white-noise vectors of dimension \(r\times 1\) and \(p \times 1\), respectively, with zero means and constant variance-covariance matrices \({\textbf{V}}\) and \({\textbf{W}}\). The most popular case is the Gaussian dlm, which assumes that

$$\begin{aligned}&\varvec{\nu }_t \buildrel \text {ind}.\over \sim N_r(\varvec{0},{\textbf{V}}),\qquad \varvec{\omega }_t \buildrel \text {ind}.\over \sim N_p(\varvec{0},{\textbf{W}}),\qquad \varvec{\theta }_0 \sim N_p(\varvec{m}_0,\varvec{C}_0), \end{aligned}$$

all mutually independent for each \(t=1,\ldots , T\), where \(N_k(\varvec{\mu },\varvec{\Sigma })\) denotes the k-variate normal distribution with mean vector \(\varvec{\mu }\) and variance-covariance matrix \(\varvec{\Sigma }\). For an extensive introduction to dlms from a Bayesian perspective, see West and Harrison (1997) and Petris et al. (2009). In this article, we focus on a setting where \({\textbf{F}}_t\) and \({\textbf{G}}_t\) are known, consistent with the classical Kalman filter setting (Kalman 1960) and with recent developments in state-space models (e.g., Fasano et al. 2021).

Leveraging the properties of the multivariate normal distribution and the structure of the Gaussian dlm, it is possible to derive closed-form expressions for the predictive and filtering distributions and conduct dynamic inference on the states \(\varvec{\theta }_t\) via the Kalman filter, conditioning on \({\textbf{F}}_t,{\textbf{G}}_t, {\textbf{W}}\) and \({\textbf{V}}\). However, Naveau et al. (2005) observed that the Gaussian assumption may be questionable in many applications, as the distributions involved in a state-space model can be skewed. To mitigate this issue, Naveau et al. (2005) assumed that the initial state parameter vector \(\varvec{\theta }_0\) follows a multivariate closed skew-normal distribution, preserving the usual assumptions of independence and normality for the error sequences \(\varvec{\nu }_t\) and \(\varvec{\omega }_t\). Building on this work, several authors have proposed different mechanisms to obtain dlms with skewness. For instance, Kim et al. (2014) extended the results in Naveau et al. (2005) by assuming a scale mixture of closed skew-normal distributions for the initial state parameter vector \(\varvec{\theta }_0\); Cabral et al. (2014) proposed a Bayesian dlm that relaxes the normality assumption by taking an extended skew-normal (Azzalini and Capitanio 1999) initial distribution for the state parameter; and Arellano-Valle et al. (2019) proposed a dlm in which the errors \(\varvec{\nu }_t\) in the observation equation have a multivariate skew-normal distribution. Several other authors have dealt with similar problems; see, for example, Gualtierotti (2005), Pourahmadi (2007) and Corns and Satchell (2007), among many others.

In this work we take a similar perspective and derive a novel dlm that induces asymmetry by means of a scalar parameter, yielding a skewed initial distribution for the state parameter \(\varvec{\theta }_0\). Our purpose is to replace the normal distribution for \(\varvec{\theta }_0\) with a more flexible one, incorporating asymmetry via a two-piece normal (tpn) mixing distribution. With this simple device, we obtain an extension of the classical Kalman filter and closed-form expressions for the one-step-ahead and filtering distributions. These results are then combined into a Markov chain Monte Carlo procedure via a forward filtering–backward sampling algorithm, providing posterior inference on the error covariances \(\textbf{V}\) and \(\textbf{W}\).

2 Two-piece normal and skew normal distributions

2.1 Two-piece normal distributions

According to Arellano-Valle et al. (2005), a continuous random variable Y follows a two-piece normal (tpn) distribution with location \(\mu\), scale \(\sigma\) and asymmetry parameter \(\gamma\) if its density function for any \(y\in {\mathbb {R}}\) can be written as

$$\begin{aligned} h(y) = \frac{2}{\sigma (a(\gamma ) + b(\gamma ))}\left\{ \phi \left( \frac{y-\mu }{\sigma a(\gamma )}\right) I_{[\mu ,\infty )}(y) + \phi \left( \frac{y-\mu }{\sigma b(\gamma )}\right) I_{(-\infty ,\mu )}(y)\right\} , \end{aligned}$$

where \(\phi (x)\) denotes the density of a standard Gaussian and \(I_A(x)\) denotes the indicator function of the set A; we write such a distribution compactly as \(Y\sim TPN(\mu , \sigma , \gamma )\). Skewness is controlled via two functions \(a(\gamma )\) and \(b(\gamma )\) satisfying the following properties:

  (i) \(a(\gamma )\) and \(b(\gamma )\) are positive-valued functions for \(\gamma \in (\gamma _L,\gamma _U)\), a possibly infinite interval;

  (ii) one of the functions is (strictly) increasing and the other is (strictly) decreasing;

  (iii) there exists a unique value \(\gamma _* \in (\gamma _L, \gamma _U)\) such that \(a(\gamma _*) = b(\gamma _*)\), in which case the tpn density (1) becomes

    $$\begin{aligned} h(y)=\frac{2}{\sigma a(\gamma _*)}\phi \left( \frac{y-\mu }{\sigma a(\gamma _*)}\right) . \end{aligned}$$
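As a quick numerical sanity check, the density above can be evaluated in a few lines. The sketch below is our own illustration, not code from the paper; it fixes the admissible choice \(a(\gamma )=1+\gamma\) and \(b(\gamma )=1-\gamma\), the same adopted later in the simulation study.

```python
import numpy as np
from scipy.stats import norm

def tpn_pdf(y, mu=0.0, sigma=1.0, gamma=0.0):
    """Two-piece normal density h(y), assuming a(g) = 1 + g and b(g) = 1 - g."""
    ag, bg = 1.0 + gamma, 1.0 - gamma
    z = (np.asarray(y, dtype=float) - mu) / sigma
    # phi((y - mu)/(sigma a)) on [mu, inf); phi((y - mu)/(sigma b)) on (-inf, mu)
    core = np.where(z >= 0, norm.pdf(z / ag), norm.pdf(z / bg))
    return 2.0 / (sigma * (ag + bg)) * core

# The density integrates to one for any admissible gamma
grid = np.linspace(-10.0, 10.0, 20001)
print(tpn_pdf(grid, mu=0.5, sigma=1.2, gamma=0.4).sum() * (grid[1] - grid[0]))  # ~1
```

Since \(a(0)=b(0)=1\), the value \(\gamma _*=0\) recovers the symmetric Gaussian case of property (iii).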

In addition, the tpn distribution has several interesting formal properties in terms of stochastic construction (Arellano-Valle et al. 2005). The following list includes those most relevant for the purposes of this work:

P1. The tpn density (1) can be expressed as a finite mixture of two truncated normal densities \(f_a\) and \(f_b\) given by

$$\begin{aligned} f_a(y) = \frac{2}{\sigma a(\gamma )}\phi \left( \frac{y-\mu }{\sigma a(\gamma )}\right) I_{[\mu ,\infty )}(y),\quad f_b(y) = \frac{2}{\sigma b(\gamma )}\phi \left( \frac{y-\mu }{\sigma b(\gamma )}\right) I_{(-\infty ,\mu )}(y). \end{aligned}$$

That is,

$$\begin{aligned} h(y)=\pi _a f_a(y) + \pi _b f_b(y),\quad y \in {\mathbb {R}}, \end{aligned}$$

where
$$\begin{aligned} \pi _s = \frac{s(\gamma )}{a(\gamma ) + b(\gamma )}, \quad s = a, b. \end{aligned}$$

P2. If \(Y \sim h\), then \(Y \buildrel d\over = \mu + \sigma W_\gamma V\), where the notation \(\buildrel d\over =\) indicates equality in distribution. Specifically, \(V \sim TN(0,1, [0, \infty ))\), a standard normal truncated to the positive real line, while \(W_\gamma\) is an independent discrete random variable with probability function

$$\begin{aligned} p{(w;\gamma )} = {\left\{ \begin{array}{ll} \pi _a &{} \quad \text {if } w = a(\gamma ), \\ \pi _b &{} \quad \text {if } w = - b(\gamma ), \\ 0 &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$

which can be rewritten as

$$\begin{aligned} p{(w;\gamma )} = \pi _a^{(1+s)/2}\pi _b^{(1-s)/2}I_{\{-1,1\}}(s), \end{aligned}$$

with \(s = \text{ sign }(w)\) and \(\pi _s\) defined by (3). Equivalently, if \(Y = \mu + \sigma W_\gamma |X|\), where \(X \sim N(0,1)\) is independent of \(W_\gamma\), then \(Y \sim h\). This stochastic representation yields the mean and variance of Y via the law of total expectation; see Arellano-Valle et al. (2020) for further details.
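Property P2 translates directly into a sampler. The sketch below, again under the assumed choice \(a(\gamma )=1+\gamma\), \(b(\gamma )=1-\gamma\), draws \(W_\gamma\) and \(\vert X\vert\) independently; the empirical frequency of \(\{Y\ge \mu \}\) then matches the mixing weight \(\pi _a\).

```python
import numpy as np

rng = np.random.default_rng(0)

def tpn_sample(n, mu=0.0, sigma=1.0, gamma=0.0):
    """Draw n variates via Y = mu + sigma * W * |X| (property P2),
    assuming a(g) = 1 + g and b(g) = 1 - g."""
    a, b = 1.0 + gamma, 1.0 - gamma
    pi_a = a / (a + b)                            # P(W = a(gamma))
    w = np.where(rng.random(n) < pi_a, a, -b)     # discrete mixing variable W
    return mu + sigma * w * np.abs(rng.standard_normal(n))

y = tpn_sample(200_000, mu=1.0, sigma=2.0, gamma=0.5)
print(np.mean(y >= 1.0))   # close to pi_a = 0.75
```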

2.2 Skew-normal distribution

A random vector \({\textbf{Y}}\) has a multivariate skew-normal (sn) distribution with location vector \(\varvec{\xi }\), positive definite scale matrix \(\varvec{\Omega }\) and skewness/shape vector \(\varvec{\lambda }\), denoted by \({\textbf{Y}} \sim SN_p (\varvec{\xi }, \varvec{\Omega }, \varvec{\lambda })\), if its density function is given by

$$\begin{aligned} f({\textbf{y}}; \varvec{\xi },\varvec{\Omega },\varvec{\lambda }) = 2\phi _p({\textbf{y}}; \varvec{\xi },\varvec{\Omega })\Phi ({\varvec{\lambda }}^{\top } {\varvec{\Omega }}^{-1/2}({\textbf{y}} - \varvec{\xi })),\quad \quad {\textbf{y}} \in {\mathbb {R}}^p. \end{aligned}$$

Here, \(\phi _p(\cdot ;\varvec{\xi },\varvec{\Omega })\) denotes the density function of the p-variate normal distribution with mean vector \(\varvec{\xi }\) and variance-covariance matrix \(\varvec{\Omega }\), and \(\Phi (\cdot )\) is the cumulative distribution function of a standard normal. The sn random vector \({\textbf{Y}} \sim SN_p (\varvec{\xi }, \varvec{\Omega }, \varvec{\lambda })\) can be introduced as the location-scale transformation \({\textbf{Y}} = \varvec{\xi } + {\varvec{\Omega }}^{1/2}{\textbf{X}}\), where \({\textbf{X}}\) has the following stochastic representation:

$$\begin{aligned} {\textbf{X}} \buildrel d\over = \varvec{\delta }\vert X_0\vert + {\textbf{X}}_1, \end{aligned}$$

where \(\varvec{\delta } = \varvec{\lambda }/{(1 + {\varvec{\lambda }}^{\top }\varvec{\lambda })}^{1/2}\), with \(X_0 \sim N(0,1)\) and \({\textbf{X}}_1 \sim N_p (\varvec{0}, \varvec{I}_p - \varvec{\delta }{\varvec{\delta }}^{\top })\) independent. From (4) it follows that if \({\textbf{Y}} \sim SN_p (\varvec{\xi }, \varvec{\Omega }, \varvec{\lambda })\), then there exist two independent random quantities Z and \({\textbf{U}}\), with \(Z \buildrel d\over = \vert X_0 \vert\) and \({\textbf{U}} \buildrel d\over = {\varvec{\Omega }}^{1/2}{\textbf{X}}_1\), such that

$$\begin{aligned} {\textbf{Y}} = \varvec{\xi } + \varvec{\Delta }Z + {\textbf{U}}, \end{aligned}$$

where \(\varvec{\Delta } = {\varvec{\Omega }}^{1/2}\varvec{\delta }\). Note that \(Z \sim HN(0,1)\), a standard half-normal, and \({\textbf{U}} \sim N_p(\varvec{0}, \varvec{\Omega } - \varvec{\Delta }{\varvec{\Delta }}^{\top })\). Thus, using (5), it can be shown that the mean vector and variance-covariance matrix of \({\textbf{Y}}\) are given respectively by

$$\begin{aligned}&E({\textbf{Y}}) = \varvec{\xi } + \sqrt{\frac{2}{\pi }}\varvec{\Delta }\quad \text{ and } \quad \text{ var }({\textbf{Y}}) = \varvec{\Omega } - {\frac{2}{\pi }}\varvec{\Delta }{\varvec{\Delta }}^{\top }. \end{aligned}$$
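The representation (4)-(5) suggests a simple sn sampler. The sketch below is an illustration under our own naming, taking a symmetric square root of \(\varvec{\Omega }\); its sample mean can be checked against \(E({\textbf{Y}})=\varvec{\xi }+\sqrt{2/\pi }\,\varvec{\Delta }\).

```python
import numpy as np

rng = np.random.default_rng(1)

def sn_sample(n, xi, omega_sqrt, lam):
    """Draw from SN_p(xi, Omega, lambda) via X = delta |X0| + X1 and
    Y = xi + Omega^{1/2} X; omega_sqrt is a symmetric square root of Omega."""
    xi, lam = np.asarray(xi, float), np.asarray(lam, float)
    p = xi.size
    delta = lam / np.sqrt(1.0 + lam @ lam)
    x0 = np.abs(rng.standard_normal(n))                          # half-normal |X0|
    x1 = rng.multivariate_normal(np.zeros(p),
                                 np.eye(p) - np.outer(delta, delta), n)
    x = x0[:, None] * delta + x1
    return xi + x @ omega_sqrt.T

# With Omega = I, Delta = delta, so E(Y) = xi + sqrt(2/pi) * delta
Y = sn_sample(200_000, xi=[1.0, -1.0], omega_sqrt=np.eye(2), lam=[2.0, 0.0])
```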

3 A two-piece normal dynamic linear model

3.1 The initial state distribution

Our proposal in this section is to derive a more flexible dlm that regulates asymmetry through a simple scalar parameter. Specifically, preserving the classical independence assumptions, we consider the dlm defined by

$$\begin{aligned} {\textbf{Y}}_t= & {} {\textbf{F}}^{\top }_t\varvec{\theta }_t + \varvec{\nu }_t,\qquad \varvec{\nu }_t \sim N_r(\varvec{0},{\textbf{V}}),\end{aligned}$$
$$\begin{aligned} \varvec{\theta }_t= & {} {\textbf{G}}_t\varvec{\theta }_{t-1} + \varvec{\omega }_t,\qquad \varvec{\omega }_t \sim N_p(\varvec{0},{\textbf{W}}), \end{aligned}$$

for \(t=1,\ldots , T\), replacing the initial state parameter \(\varvec{\theta }_0\) distribution with the following hierarchical specification:

$$\begin{aligned} \varvec{\theta }_0 \vert \varphi\sim & {} N_p(\varvec{m}_0 + \varphi \varvec{\beta }_0,\varvec{C}_0), \end{aligned}$$
$$\begin{aligned} \varphi\sim & {} {TPN} (\mu ,\sigma _0,\gamma _0). \end{aligned}$$

The model defined by Eqs. (6, 7) and (8, 9) will be referred to as the two-piece normal dynamic linear model (hereafter tpn-dlm).

As a first important result, we note that the hierarchical specification (8, 9) leads to a mixture of two multivariate skew-normals as the initial distribution for \(\varvec{\theta }_0\). The proof is a direct extension of Proposition 2 in Arellano-Valle et al. (2020) and is therefore omitted.

Proposition 3.1

Under the hierarchical representation defined by Eqs. (8, 9), the initial density of \(\varvec{\theta }_0\) is given by

$$\begin{aligned} p(\varvec{\theta }_0)=2\pi _a\phi _p(\varvec{\theta }_0;\varvec{\xi }_0,\varvec{\Omega }_a) \Phi (\varvec{\eta }^{\top }_a(\varvec{\theta }_0 -\varvec{\xi }_0))+2\pi _b\phi _p(\varvec{\theta }_0;\varvec{\xi }_0,\varvec{\Omega }_b) \Phi (\varvec{\eta }^{\top }_b(\varvec{\theta }_0-\varvec{\xi }_0)), \end{aligned}$$

where, for \(s=a,b\), \(\pi _s\) is defined by (3), and

$$\begin{aligned}{} & {} \varvec{\xi }_0=\varvec{m}_0+\mu \varvec{\beta }_0,\,\, \varvec{\alpha }_s=\sigma _0 s(\gamma _0)\varvec{\beta }_0,\,\, \\{} & {} \varvec{\Omega }_s=\varvec{C}_0+\varvec{\alpha }_s\varvec{\alpha }^\top _s,\,\, \varvec{\eta }_s=(1-\varvec{\alpha }^\top _s\varvec{\Omega }^{-1}_s\varvec{\alpha }_s)^{-1/2} {\varvec{\Omega }_s}^{-1}\varvec{\alpha }_s. \end{aligned}$$

Note that, from the well-known Sherman–Morrison matrix inversion formula,

$$\begin{aligned} {\varvec{\Omega }}^{-1}_s=(\varvec{C}_0+\varvec{\alpha }_s{\varvec{\alpha }}^\top _s)^{-1}={\varvec{C}_0}^{-1} -\frac{{\varvec{C}_0}^{-1}\varvec{\alpha }_s{\varvec{\alpha }}^\top _s {\varvec{C}_0}^{-1}}{1+{\varvec{\alpha }}^\top _s{\varvec{C}_0}^{-1}\varvec{\alpha }_s}, \end{aligned}$$

we get, for \(s=a,b\), that \(1-{\varvec{\alpha }}^\top _s{\varvec{\Omega }}^{-1}_s\varvec{\alpha }_s = {(1+{\varvec{\alpha }}^\top _s{\varvec{C}_0}^{-1}\varvec{\alpha }_s)}^{-1}>0\) and \({\varvec{\alpha }}^\top _s{\varvec{\Omega }}^{-1}_s={(1+{\varvec{\alpha }}^\top _s{\varvec{C}_0}^{-1}\varvec{\alpha }_s)}^{-1}{\varvec{\alpha }}^\top _s{\varvec{C}_0}^{-1}\), so that the term \(\varvec{\eta }_s\) defined in Proposition 3.1 can be rewritten as

$$\begin{aligned} \varvec{\eta }_s={(1+{\varvec{\alpha }}^\top _s{\varvec{C}_0}^{-1}\varvec{\alpha }_s)}^{-1/2} {\varvec{C}_0}^{-1}{\varvec{\alpha }}_s. \end{aligned}$$

The distribution of the initial random vector \(\varvec{\theta }_0\) can be written as

$$\begin{aligned} p(\varvec{\theta }_0)= & {} 2\pi _a\phi _p(\varvec{\theta }_0;\varvec{m}_0 +\mu \varvec{\beta }_0,\varvec{C}_0+\varvec{\alpha }_a\varvec{\alpha }^\top _a) \Phi \left( \frac{{\varvec{\alpha }}^\top _a{\varvec{C}}^{-1}_0(\varvec{\theta }_0-\varvec{\xi }_0)}{\sqrt{1+{\varvec{\alpha }}^\top _a{\varvec{C}}^{-1}_0{\varvec{\alpha }}_a}}\right) \nonumber \\{} & {} +2\pi _b\phi _p(\varvec{\theta }_0;\varvec{m}_0+\mu \varvec{\beta }_0,\varvec{C}_0 +\varvec{\alpha }_b\varvec{\alpha }^\top _b)\Phi \left( \frac{{\varvec{\alpha }}^\top _b {\varvec{C}}^{-1}_0(\varvec{\theta }_0-\varvec{\xi }_0)}{\sqrt{1+{\varvec{\alpha }}^\top _b{\varvec{C}}^{-1}_0{\varvec{\alpha }}_b}}\right) \quad , \end{aligned}$$

which corresponds to the density of a two-component mixture of the multivariate skew-normal densities introduced in Sect. 2.2. Specifically, from Proposition 3.1, we see that the initial state parameter is distributed as

$$\begin{aligned} \varvec{\theta }_0\sim \pi _a SN_p(\varvec{\xi }_a,\varvec{\Omega }_a,\varvec{\lambda }_a) + \pi _b SN_p(\varvec{\xi }_b,\varvec{\Omega }_b,\varvec{\lambda }_b), \end{aligned}$$

where
$$\begin{aligned} \varvec{\xi }_s={\varvec{\xi }}_0\quad \textrm{and}\quad \varvec{\lambda }_s=(1-{\varvec{\alpha }}^\top _s{\varvec{\Omega }}^{-1}_s \varvec{\alpha }_s)^{-1/2}{\varvec{\Omega }}^{-1/2}_s\varvec{\alpha }_s,\quad \quad s=a,b. \end{aligned}$$
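Proposition 3.1 can be checked by simulation: draw \(\varphi\) from (9) and then \(\varvec{\theta }_0\) from (8). The sketch below is our own helper, assuming \(a(\gamma )=1+\gamma\), \(b(\gamma )=1-\gamma\) and \(\varvec{C}_0\) supplied through a Cholesky factor.

```python
import numpy as np

rng = np.random.default_rng(2)

def theta0_sample(n, m0, beta0, c0_chol, mu, sigma0, gamma0):
    """Hierarchical draw: phi ~ TPN(mu, sigma0, gamma0), then
    theta_0 | phi ~ N_p(m0 + phi * beta0, C0), with a(g) = 1 + g, b(g) = 1 - g."""
    m0, beta0 = np.asarray(m0, float), np.asarray(beta0, float)
    a, b = 1.0 + gamma0, 1.0 - gamma0
    w = np.where(rng.random(n) < a / (a + b), a, -b)
    phi = mu + sigma0 * w * np.abs(rng.standard_normal(n))     # phi ~ TPN
    eps = rng.standard_normal((n, m0.size)) @ c0_chol.T        # N_p(0, C0) noise
    return m0 + phi[:, None] * beta0 + eps, phi

theta0, phi = theta0_sample(200_000, m0=[0.0, 0.0], beta0=[1.0, 2.0],
                            c0_chol=np.eye(2), mu=0.0, sigma0=1.0, gamma0=0.5)
```

By the law of total expectation, the sample mean of \(\varvec{\theta }_0\) is close to \(\varvec{m}_0 + E(\varphi )\varvec{\beta }_0\).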

3.2 The Kalman filter

Our next step is to develop a Kalman filter based on the new initial distribution given by (12), under which the conditional distribution of \(\varphi\) is a mixture of two truncated Gaussian densities.

Let \(D_t = \{{\textbf{y}}_1,\ldots ,{\textbf{y}}_t\}\) denote the information available at time t, where \({\textbf{y}}_i\) is a realization of the random variable \({\textbf{Y}}_i\). In the proposed tpn-dlm we consider a conditionally normal distribution for \(\varvec{\theta }_0\) given \(\varphi\) (8), with a tpn initial distribution for \(\varphi\) (9). Furthermore, we assume by induction that

$$\begin{aligned}{} & {} \varvec{\theta }_{t-1} \vert \varphi ,D_{t-1} \sim N_p(\varvec{m}_{t-1} + \varphi \varvec{\beta }_{t-1},\varvec{C}_{t-1}), \nonumber \\{} & {} \varphi \vert D_{t-1} \sim \pi _{t-1}^a TN(\eta _{t-1}^a,\tau _{t-1}^a, [\mu ,\infty )) + \pi _{t-1}^b TN(\eta _{t-1}^b,\tau _{t-1}^b,(-\infty ,\mu )). \end{aligned}$$

Specifically, the conditional distribution of \(\varphi\) corresponds to a mixture of two truncated Gaussians with locations \(\eta _{t-1}^s\), scales \(\tau _{t-1}^s\), mixing weights \(\pi ^s_{t-1}\), \(s=a,b\), and truncation point \(\mu\), consistent with the initial distribution given in (9).

Leveraging the conditional independence properties of the tpn-dlm and the state equation (7), the one-step-ahead predictive distribution of \({\varvec{\theta }}_t\) given \((\varphi ,D_{t-1})\) is given by

$$\begin{aligned} \varvec{\theta }_{t} \vert (\varphi ,D_{t-1}) \buildrel d\over = \varvec{G}_t(\varvec{\theta }_{t-1} \vert (\varphi ,D_{t-1}))+\varvec{\omega }_t \sim N_p(\varvec{a}_t + \varphi \varvec{b}_t, \varvec{R}_t), \end{aligned}$$

where
$$\begin{aligned} \varvec{a}_t = \varvec{G}_t\varvec{m}_{t-1},\quad \varvec{b}_t = \varvec{G}_t\varvec{\beta }_{t-1},\quad \varvec{R}_t = \varvec{G}_t\varvec{C}_{t-1} \varvec{G}^{\top }_t + \varvec{W}. \end{aligned}$$

Similarly, using (6) we find that the one-step-ahead predictive distribution of \({\textbf{Y}}_t\) given \((\varphi ,D_{t-1})\) becomes

$$\begin{aligned} {\textbf{Y}}_{t} \vert (\varphi ,D_{t-1}) \buildrel d\over = \varvec{F}^{\top }_t(\varvec{\theta }_{t} \vert (\varphi ,D_{t-1}))+\varvec{\nu }_t \sim N_r(\varvec{F}^{\top }_t\varvec{a}_t + \varphi \varvec{F}^{\top }_t\varvec{b}_t, \varvec{\Sigma }_t), \end{aligned}$$

where
$$\begin{aligned} \varvec{\Sigma }_t = \varvec{F}^{\top }_t\varvec{R}_{t} \varvec{F}_t + \varvec{V}. \end{aligned}$$

In other words, from (14) and (16) we have that

$$\begin{aligned} \begin{bmatrix}{\textbf{Y}}_t\\ \varvec{\theta }_t\end{bmatrix} \vert (\varphi ,D_{t-1})\buildrel d\over = \begin{bmatrix} \varvec{F}^{\top }_t\varvec{G}_{t} \\ \varvec{G}_t\\ \end{bmatrix}(\varvec{\theta }_{t-1} \vert (\varphi ,D_{t-1})) + \begin{bmatrix} \varvec{F}^{\top }_t &{} \varvec{I}_r \\ \varvec{I}_p &{} \varvec{0}\\ \end{bmatrix} \begin{bmatrix} \varvec{\omega }_{t} \\ \varvec{\nu }_t\\ \end{bmatrix}, \end{aligned}$$

and therefore

$$\begin{aligned} \begin{bmatrix}{\textbf{Y}}_t\\ \varvec{\theta }_t\end{bmatrix} \vert (\varphi ,D_{t-1}) \sim N_{p+r} \left( \begin{bmatrix} \varvec{F}^{\top }_t\varvec{a}_{t} \\ \varvec{a}_t\\ \end{bmatrix} + \varphi \begin{bmatrix} \varvec{F}^{\top }_t\varvec{b}_{t}\\ \varvec{b}_t\\ \end{bmatrix}, \begin{bmatrix} \varvec{\Sigma }_t &{} \varvec{F}^{\top }_t\varvec{R}_{t} \\ \varvec{R}_t\varvec{F}_{t} &{} \varvec{R}_t\\ \end{bmatrix} \right) . \end{aligned}$$

Finally, by applying the properties of the conditional normal distribution, we obtain the following filtering distribution of \(\varvec{\theta }_t\) given \((\varphi , D_{t})\):

$$\begin{aligned} \varvec{\theta }_t \vert (\varphi , D_{t}) \sim N_p ({\varvec{m}}_t + \varphi {\varvec{\beta }}_t, \varvec{C}_t), \end{aligned}$$

where
$$\begin{aligned} \left. \begin{array}{l} \varvec{m}_{t} = \varvec{a}_{t} + \varvec{R}_t\varvec{F}_{t}{\varvec{\Sigma }}^{-1}_t ({\textbf{y}}_t-\varvec{F}^{\top }_t\varvec{a}_{t}), \\ \varvec{\beta }_t = \varvec{b}_{t} - \varvec{R}_t\varvec{F}_{t} {\varvec{\Sigma }}^{-1}_t\varvec{F}^{\top }_t\varvec{b}_{t},\\ \varvec{C}_t = \varvec{R}_{t} - \varvec{R}_t\varvec{F}_{t} {\varvec{\Sigma }}^{-1}_t\varvec{F}^{\top }_t\varvec{R}_{t}, \end{array}\right\} \end{aligned}$$

with \(\varvec{a}_t\), \(\varvec{b}_t\), \(\varvec{R}_t\) and \(\varvec{\Sigma }_t\) defined as in (15) and (17), respectively.
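The recursions (15), (17) and (19) amount to a standard Kalman step applied jointly to the location \(\varvec{m}_t\) and the skewness loading \(\varvec{\beta }_t\); a minimal sketch (function name ours):

```python
import numpy as np

def tpn_filter_step(m, beta, C, y, F, G, V, W):
    """One conditional filtering step of the tpn-dlm (Eqs. 15, 17, 19):
    maps (m_{t-1}, beta_{t-1}, C_{t-1}) and y_t to (m_t, beta_t, C_t)."""
    a = G @ m                                  # a_t
    b = G @ beta                               # b_t
    R = G @ C @ G.T + W                        # R_t
    Sigma = F.T @ R @ F + V                    # Sigma_t
    K = R @ F @ np.linalg.inv(Sigma)           # common gain R_t F_t Sigma_t^{-1}
    m_new = a + K @ (y - F.T @ a)
    beta_new = b - K @ (F.T @ b)
    C_new = R - K @ (F.T @ R)
    return m_new, beta_new, C_new
```

In the scalar setting of the simulation study (F = G = 1, V = 5, W = 3), starting from \((m,\beta ,C)=(0,1,2)\) and observing \(y_t=1\) gives \(m_t=0.5\), \(\beta _t=0.5\) and \(C_t=2.5\).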

The above results are formalized below:

Proposition 3.2

Consider the TPN-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumptions (13). Then:


1. The one-step-ahead conditional predictive distribution of the states is

$$\begin{aligned} \varvec{\theta }_{t} \vert (\varphi , D_{t-1}) \sim N_p (\varvec{a}_{t} + \varphi \varvec{b}_{t}, \varvec{R}_{t}); \end{aligned}$$

2. The one-step-ahead conditional predictive distribution of the response is

$$\begin{aligned} {\textbf{Y}}_{t} \vert (\varphi , D_{t-1}) \sim N_r ( \varvec{F}^{\top }_t\varvec{a}_{t} + \varphi \varvec{F}^{\top }_t\varvec{b}_{t}, \varvec{\Sigma }_{t}); \end{aligned}$$

3. The conditional filtering distribution of the states is

$$\begin{aligned} \varvec{\theta }_{t} \vert (\varphi , D_{t}) \sim N_p ( \varvec{m}_{t} + \varphi \varvec{\beta }_{t}, \varvec{C}_{t}). \end{aligned}$$

The next proposition establishes the conditional distribution of \(\varphi \vert D_{t}\).

Proposition 3.3

Consider the TPN-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumptions (13). Then the conditional distribution of \(\varphi \vert D_{t}\) is a finite mixture of two truncated Gaussian distributions, given by

$$\begin{aligned} \varphi \vert D_{t} \sim \,\pi _{t}^a \, TN(\eta _{t}^a,\tau _{t}^a, [\mu ,\infty )) + \pi _{t}^b \,TN(\eta _{t}^b,\tau _{t}^b,(-\infty ,\mu )) \end{aligned}$$

where, for \(s=a,b\),

$$\begin{aligned} \eta ^s_t&=\frac{\eta ^s_{t-1}+\tau _{t-1}^s\varvec{b}^\top _t\varvec{F}_t \varvec{\Sigma }^{-1}_t({\textbf{y}}_t-\varvec{F}^\top _t\varvec{a}_t)}{1+\tau _{t-1}^s\varvec{b}^\top _t\varvec{F}_t\varvec{\Sigma }^{-1}_t\varvec{F}^\top _t \varvec{b}_t},\quad \tau ^s_t=\frac{\tau ^s_{t-1}}{1+\tau ^s_{t-1} \varvec{b}^\top _t\varvec{F}_t\varvec{\Sigma }^{-1}_t\varvec{F}^\top _t\varvec{b}_t}, \end{aligned}$$

and
$$\begin{aligned} \pi ^a_t&=c_t \pi ^a_{t-1}\phi _r({\textbf{y}}_t ; \varvec{F}^{\top }_t(\varvec{a}_{t} + \eta ^a_{t-1}\varvec{b}_{t}) , {\varvec{\Sigma }}_t + \tau ^a_{t-1} \varvec{F}^{\top }_t\varvec{b}_{t}\varvec{b}_t^{\top }\varvec{F}_t)\Phi \left( -\frac{\mu -\eta _t^a}{\sqrt{\tau _t^a}}\right) , \\ \pi ^b_t&=c_t \pi ^b_{t-1}\phi _r({\textbf{y}}_t ; \varvec{F}^{\top }_t(\varvec{a}_{t} + \eta ^b_{t-1}\varvec{b}_{t}) , {\varvec{\Sigma }}_t + \tau ^b_{t-1} \varvec{F}^{\top }_t\varvec{b}_{t}\varvec{b}_t^{\top }\varvec{F}_t)\Phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau _t^b}}\right) , \end{aligned}$$

where the normalizing constant satisfies
$$\begin{aligned} c_t^{-1}&=\pi ^a_{t-1}\phi _r({\textbf{y}}_t ; \varvec{F}^{\top }_t(\varvec{a}_{t} + \eta ^a_{t-1}\varvec{b}_{t}) , {\varvec{\Sigma }}_t + \tau ^a_{t-1} \varvec{F}^{\top }_t\varvec{b}_{t}\varvec{b}_t^{\top }\varvec{F}_t)\Phi \left( -\frac{\mu -\eta _t^a}{\sqrt{\tau _t^a}}\right) \\&\quad +\pi ^b_{t-1}\phi _r({\textbf{y}}_t ; \varvec{F}^{\top }_t(\varvec{a}_{t} + \eta ^b_{t-1}\varvec{b}_{t}) , {\varvec{\Sigma }}_t + \tau ^b_{t-1} \varvec{F}^{\top }_t\varvec{b}_{t}\varvec{b}_t^{\top }\varvec{F}_t)\Phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau _t^b}}\right) \end{aligned}$$

This representation allows one to characterize the expected value and variance of \(\varphi \mid D_t\), which can be expressed as

$$\begin{aligned} E\{\varphi \mid D_t\} = \pi _t^a\left[ \eta _t^a + \sqrt{\tau _t^a} \frac{\phi \left( \frac{\mu -\eta _t^a}{\sqrt{\tau _t^a}}\right) }{\Phi \left( -\frac{\mu -\eta _t^a}{\sqrt{\tau _t^a}}\right) }\right] + \pi _t^b\left[ \eta _t^b - \sqrt{{\tau _t^b}} \frac{\phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau _t^b}}\right) }{\Phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau _t^b}}\right) }\right] \end{aligned}$$

and
$$\begin{aligned} Var\{\varphi \mid D_t\}= & {} \pi _t^a\left[ (\eta _t^a)^2 + \tau _t^a + \sqrt{\tau _t^a}(\mu + \eta _t^a) \frac{\phi \left( \frac{\mu -\eta _t^a}{\sqrt{\tau _t^a}}\right) }{\Phi \left( -\frac{\mu -\eta _t^a}{\sqrt{\tau _t^a}}\right) }\right] \nonumber \\{} & {} +\pi _t^b\left[ (\eta _t^b)^2 + \tau _t^b - \sqrt{\tau _t^b}(\mu + \eta _t^b) \frac{\phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau _t^b}}\right) }{\Phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau _t^b}}\right) }\right] \nonumber \\{} & {} -\left[ E\{\varphi \mid D_t\} \right] ^2. \end{aligned}$$

A simpler expression for \(Var\{\varphi \mid D_t\}\) can be obtained in terms of the chi-square cumulative distribution function, adapting Barr and Sherrill (1999).
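The mixture form of \(\varphi \mid D_t\) makes its posterior mean easy to evaluate via standard truncated-normal identities; the sketch below implements Eq. (21) (function name ours).

```python
import numpy as np
from scipy.stats import norm

def phi_post_mean(pi_a, eta_a, tau_a, pi_b, eta_b, tau_b, mu):
    """E(phi | D_t) for the mixture pi_a TN(eta_a, tau_a, [mu, inf)) +
    pi_b TN(eta_b, tau_b, (-inf, mu)), as in Eq. (21)."""
    za = (mu - eta_a) / np.sqrt(tau_a)
    zb = (mu - eta_b) / np.sqrt(tau_b)
    mean_a = eta_a + np.sqrt(tau_a) * norm.pdf(za) / norm.cdf(-za)   # lower truncation
    mean_b = eta_b - np.sqrt(tau_b) * norm.pdf(zb) / norm.cdf(zb)    # upper truncation
    return pi_a * mean_a + pi_b * mean_b

# Degenerate check: one component truncated at mu = eta gives the
# half-normal mean eta + sqrt(tau) * sqrt(2/pi)
print(phi_post_mean(1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0))   # ~0.7979 = sqrt(2/pi)
```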

Immediate consequences of these results are given in the following proposition.

Proposition 3.4

Consider the TPN-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumptions (13). Then:


The one-step-ahead predictive distribution of \(\varvec{\theta }_{t}\) given \(D_{t-1}\) is

$$\begin{aligned} p(\varvec{\theta }_{t} \vert D_{t-1})&= \pi ^a_{t-1}\phi _p(\varvec{\theta }_{t}; \varvec{a}_t+\eta ^a_{t-1}\varvec{b}_t,\varvec{R}_t+\tau ^a_{t-1}\varvec{b}_t \varvec{b}^\top _t)\Phi \left( -\frac{\mu -\chi _t^a}{\sqrt{\vartheta _t^a}}\right) \\&\quad +\pi ^b_{t-1}\phi _p(\varvec{\theta }_{t};\varvec{a}_t+\eta ^b_{t-1}\varvec{b}_t, \varvec{R}_t+\tau ^b_{t-1}\varvec{b}_t\varvec{b}^\top _t)\Phi \left( \frac{\mu -\chi _t^b}{\sqrt{\vartheta _t^b}}\right) , \end{aligned}$$

where, for \(s=a, b\),

$$\begin{aligned} \chi _t^s =\frac{\eta ^s_{t-1}+\tau ^s_{t-1}\varvec{b}^\top _t\varvec{R}_t^{-1} (\varvec{\theta }_t-\varvec{a}_t)}{1+\tau _{t-1}^s \varvec{b}^\top _t\varvec{R}_t^{-1} \varvec{b}_t},\quad \vartheta _t^s=\frac{\tau _{t-1}^s}{1+\tau _{t-1}^s \varvec{b}^\top _t\varvec{R}_t^{-1}\varvec{b}_t}. \end{aligned}$$

The one-step-ahead predictive distribution of \({\textbf{y}}_{t}\) given \(D_{t-1}\) is

$$\begin{aligned} p({\textbf{y}}_{t} \vert D_{t-1})&=\pi ^a_{t-1}\phi _r({\textbf{y}}_t ; \varvec{F}^{\top }_t(\varvec{a}_{t} + \eta ^a_{t-1}\varvec{b}_{t}) , {\varvec{\Sigma }}_t +\tau ^a_{t-1}\varvec{F}^{\top }_t\varvec{b}_{t}\varvec{b}_t^{\top } \varvec{F}_t)\Phi \left( -\frac{\mu -\eta _t^a}{\sqrt{\tau _t^a}}\right) \nonumber \\&\quad +\pi ^b_{t-1}\phi _r({\textbf{y}}_t ; \varvec{F}^{\top }_t(\varvec{a}_{t} + \eta ^b_{t-1}\varvec{b}_{t}) , {\varvec{\Sigma }}_t + \tau ^b_{t-1}\varvec{F}^{\top }_t\varvec{b}_{t}\varvec{b}_t^{\top }\varvec{F}_t) \Phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau _t^b}}\right) \nonumber \\ \end{aligned}$$

where \(\eta _t^s\) and \(\tau ^s_t\), for \(s=a,b\), are defined above in Proposition 3.3.


The filtering distribution is

$$\begin{aligned} p(\varvec{\theta }_{t} \vert D_{t})= & {} \pi ^a_t\phi _p(\varvec{\theta }_{t}; \varvec{m}_t+\eta _t^a\varvec{\beta }_t,\varvec{C}_t + \tau ^a_t\varvec{\beta }_t\varvec{\beta }^\top _t) \frac{\Phi \left( -\frac{\mu -\delta _t^a}{\sqrt{\upsilon _t^a}}\right) }{\Phi \left( -\frac{\mu -\eta ^a_t}{\sqrt{\tau ^a_t}}\right) }\,\nonumber \\{} & {} +\pi ^b_t\phi _p(\varvec{\theta }_{t}; \varvec{m}_t+\eta _t^b\varvec{\beta }_t,\varvec{C}_t + \tau _t^b\varvec{\beta }_t\varvec{\beta }^\top _t) \frac{\Phi \left( \frac{\mu -\delta _t^b}{\sqrt{\upsilon _t^b}}\right) }{\Phi \left( \frac{\mu -\eta _t^b}{\sqrt{\tau ^b_t}}\right) }, \end{aligned}$$

where \(\pi _t^s\), \(\eta ^s_t\) and \(\tau ^s_t\), for \(s=a, b\), are defined in Proposition 3.3, and

$$\begin{aligned} \delta _t^s=\frac{\eta _t^s+\tau _t^s\varvec{\beta }^\top _t\varvec{C}_t^{-1}(\varvec{\theta }_{t}-\varvec{m}_t)}{1+\tau _t^s\varvec{\beta }^\top _t\varvec{C}_t^{-1}\varvec{\beta }_t},\quad \upsilon _t^s=\frac{\tau _t^s}{1+\tau _t^s\varvec{\beta }^\top _t\varvec{C}_t^{-1}\varvec{\beta }_t}. \end{aligned}$$

Proposition 3.4 shows that the one-step-ahead predictive distribution of the states is typically skewed, and the same is true for the analogous predictive distribution of the response and for the filtering distribution. This can be seen by comparing Proposition 3.4 with the corresponding results for the usual dlm (see, e.g., Petris et al. 2009). Finally, since

$$\begin{aligned}&E(\varvec{\theta }_t \vert D_{t-1})=E\{E(\varvec{\theta }_t \vert \varphi , D_{t-1})\},\\&E({\textbf{Y}}_t \vert D_{t-1})=E\{E({\textbf{Y}}_t \vert \varphi , D_{t-1})\}, \end{aligned}$$

and
$$\begin{aligned}&Var(\varvec{\theta }_t \vert D_{t-1})=E\{Var(\varvec{\theta }_t \vert \varphi , D_{t-1})\}+Var\{E(\varvec{\theta }_t \vert \varphi , D_{t-1})\},\\&Var({\textbf{Y}}_t \vert D_{t-1})=E\{Var({\textbf{Y}}_t \vert \varphi , D_{t-1})\}+Var\{E({\textbf{Y}}_t \vert \varphi , D_{t-1})\}, \end{aligned}$$

then from Eqs. (14), (16) and property P2 of the tpn distribution (see Sect. 2.1), we obtain the following results:

Proposition 3.5

Under the tpn-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumptions (13), the means and variance-covariance matrices of the one-step-ahead predictive distributions of the states and of the response are given, respectively, by

$$\begin{aligned} E(\varvec{\theta }_t \vert D_{t-1})&= \varvec{a}_{t} + E\{\varphi \mid D_{t-1}\} \varvec{b}_{t},\\ Var(\varvec{\theta }_t \vert D_{t-1})&= {\textbf{R}}_t+ Var\{\varphi \mid D_{t-1}\} \varvec{b}_{t}\varvec{b}^{\top }_t,\\ E({\textbf{Y}}_t \vert D_{t-1})&= \varvec{F}^{\top }_t\varvec{a}_{t} + E\{\varphi \mid D_{t-1}\} \varvec{F}^{\top }_t\varvec{b}_{t},\\ Var({\textbf{Y}}_t \vert D_{t-1})&= \varvec{\Sigma }_t+ Var\{\varphi \mid D_{t-1}\} \varvec{F}^{\top }_t\varvec{b}_{t}\varvec{b}^{\top }_t\varvec{F}_{t}, \end{aligned}$$

where \(E\{\varphi \mid D_{t-1}\}\) and \(Var\{\varphi \mid D_{t-1}\}\) are given by Eqs. (21) and (22), evaluated at time \(t-1\).

4 Outline of Bayesian computation

In this section we combine the results obtained above into a forward filtering backward sampling (ffbs) algorithm to conduct full Bayesian inference on the model parameters \(\varvec{\Theta }= (\varvec{\theta }_1, \dots , \varvec{\theta }_T)\), \(\varvec{V}\) and \(\varvec{W}\) via Markov chain Monte Carlo (mcmc). In particular, we assign Inverse-Wishart priors to the error covariances \(\varvec{V}\) and \(\varvec{W}\):

$$\begin{aligned} \varvec{V} \sim IW_r(\ell , \varvec{M}) \qquad \varvec{W} \sim IW_p(g, {\varvec{Z}}), \end{aligned}$$

where \(\varvec{M}\) and \(\varvec{Z}\) are positive definite matrices of size \(r\times r\) and \(p\times p\), respectively, while \(\ell\) and g are scalars such that \(\ell >(r-1)/2\) and \(g>(p-1)/2\). This choice ensures that the sampled covariance matrices \({\textbf{V}}\) and \({\textbf{W}}\) are positive definite.

Conditionally on the latent states, these priors are conjugate, as the model is conditionally Gaussian. Therefore, the full conditional distributions of \(\varvec{V}\) and \(\varvec{W}\) are again Inverse-Wishart:

$$\begin{aligned} \varvec{V}|(D_T,\varvec{\theta })&\sim IW_r\left( \ell +\frac{T}{2}, \frac{1}{2} \sum _{t=1}^{T}(\varvec{y}_t-\varvec{F}_t^{\top }\varvec{\theta }_t) (\varvec{y}_t-\varvec{F}_t^{\top }\varvec{\theta }_t)^{\top }+\varvec{M}\right) , \end{aligned}$$
$$\begin{aligned} \varvec{W}|(D_T,\varvec{\theta })&\sim IW_p\left( g+\frac{T}{2},\frac{1}{2} \sum _{t=1}^{T}(\varvec{\theta }_t-\varvec{G}_t\varvec{\theta }_{t-1}) (\varvec{\theta }_t-\varvec{G}_t\varvec{\theta }_{t-1})^{\top }+\varvec{Z}\right) . \end{aligned}$$
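Conditionally on the states, drawing \(\varvec{V}\) (and analogously \(\varvec{W}\)) is a routine conjugate update. The sketch below uses scipy.stats.invwishart with its (df, scale) convention, which may differ from the \((\ell , \varvec{M})\) parameterization above by a constant shift in the degrees of freedom; that mapping is an assumption and should be checked against the density convention adopted here. A constant observation matrix F is also assumed.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(3)

def sample_V_fullcond(Y, theta, F, nu0, Psi0):
    """Draw V from its conjugate full conditional given the states (cf. Eq. 24),
    in scipy's (df, scale) Inverse-Wishart convention.
    Y is T x r, theta is T x p, F is the constant p x r observation matrix."""
    T = Y.shape[0]
    resid = Y - theta @ F          # rows: y_t - F^T theta_t
    S = resid.T @ resid            # sum of squared residuals
    return invwishart.rvs(df=nu0 + T, scale=Psi0 + S, random_state=rng)
```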

In order to sample from \(\varvec{\Theta }|(D_T,\varphi )\) we rely on backward recursions, decomposing the filtered distribution of the state parameters following Carter and Kohn (1994) and Frühwirth-Schnatter (1994) as

$$\begin{aligned} p(\varvec{\Theta }|D_T,\varphi )&= \prod _{t=0}^{T}p(\varvec{\theta }_t|\varvec{\theta }_{t+1},D_t, \varphi ) \end{aligned}$$
$$\begin{aligned}&= p(\varvec{\theta }_T|D_T,\varphi )\prod _{t=0}^{T-1}p(\varvec{\theta }_t|\varvec{\theta }_{t+1},D_t, \varphi ), \end{aligned}$$

with
$$\begin{aligned} \varvec{\theta }_t \mid (\varvec{\theta }_{t+1}, D_t, \varphi ) \sim N(\varvec{h}_t, \varvec{H}_t) \end{aligned}$$

and with

$$\begin{aligned} \varvec{h}_t= & {} \varvec{m}_t + \varphi \varvec{\beta }_t + \varvec{C}_t \varvec{G}_{t+1}^{\top } \varvec{R}_{t+1}^{-1} (\varvec{\theta }_{t+1} - \varvec{a}_{t+1} - \varphi \varvec{b}_{t+1}),\\ \varvec{H}_t= & {} \varvec{C}_t - \varvec{C}_t \varvec{G}_{t+1}^{\top } \varvec{R}_{t+1}^{-1} \varvec{G}_{t+1} \varvec{C}_t. \end{aligned}$$
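A single backward-sampling step of Eq. (28) can be sketched as follows (function name and argument layout are ours; the inputs come from the forward pass):

```python
import numpy as np

rng = np.random.default_rng(4)

def backward_draw(theta_next, phi, m, beta, C, a_next, b_next, G_next, R_next):
    """Draw theta_t | theta_{t+1}, D_t, phi ~ N(h_t, H_t) (Eq. 28)."""
    B = C @ G_next.T @ np.linalg.inv(R_next)       # C_t G_{t+1}^T R_{t+1}^{-1}
    h = m + phi * beta + B @ (theta_next - a_next - phi * b_next)
    H = C - B @ G_next @ C
    return rng.multivariate_normal(h, H), h, H
```

As a scalar check, with \(C_t=2\), \(G_{t+1}=1\), \(R_{t+1}=5\), \(m_t=0\), \(\beta _t=1\), \(\varphi =2\), \(\theta _{t+1}=3\), \(a_{t+1}=0\) and \(b_{t+1}=1\), the moments are \(h_t=2.4\) and \(H_t=1.2\).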

4.1 mcmc algorithm

Posterior sampling can be performed by combining the above results in an mcmc algorithm, alternating the Kalman filter with sampling from the full conditional distributions. The following pseudo-code illustrates the steps of a single mcmc iteration:

  1. Sample \(\varvec{\Theta }\) using the following modified ffbs algorithm:

     1a. For \(t=1,\dots , T\), update the parameters of the distribution \(\varvec{\theta }_t|D_t\) using the Kalman filter given in Sect. 3.2 (forward filtering);

     1b. For \(t=1, \dots , T\), sample \(\varphi |D_t\) from the conditional distribution outlined in Eq. 20;

     1c. Sample \(\varvec{\theta }_T|D_T\) from the filtering distribution reported in Eq. 23;

     1d. For \(t = T-1, T-2, \dots , 1\), sample \(\varvec{\theta }_t|(\varvec{\theta }_{t+1},D_t, \varphi )\) from the distribution outlined in Eq. 28, conditioning on the \(\varvec{\theta }_{t+1}\) sampled in the previous step (backward smoothing);

  2. Sample \(\varvec{V}\) from its Inverse-Wishart full-conditional distribution, outlined in Equation (24);

  3. Sample \(\varvec{W}\) from its Inverse-Wishart full-conditional distribution, outlined in Equation (25).
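For intuition, the forward-filtering and backward-smoothing steps (1a, 1c, 1d) can be sketched for the scalar local level model (\(F = G = 1\)) using a standard Gaussian ffbs; the \(\varphi\)-related corrections of the modified algorithm (step 1b and the \(\varvec{\beta }_t\), \(\varvec{b}_t\) terms) are specific to the tpn construction and are omitted in this simplified sketch.

```python
import numpy as np

def ffbs_local_level(y, V, W, m0=0.0, C0=1e2, rng=None):
    """Forward-filtering backward-sampling for the local level model
    y_t = theta_t + nu_t, theta_t = theta_{t-1} + omega_t (F = G = 1).
    Returns one joint draw of (theta_1, ..., theta_T)."""
    rng = np.random.default_rng(rng)
    T = len(y)
    m = np.empty(T)                       # filtered means
    C = np.empty(T)                       # filtered variances
    for t in range(T):
        a = m0 if t == 0 else m[t - 1]    # prediction step: a_t
        R = (C0 if t == 0 else C[t - 1]) + W          # R_t
        K = R / (R + V)                   # Kalman gain
        m[t] = a + K * (y[t] - a)         # filtered mean
        C[t] = (1 - K) * R                # filtered variance
    theta = np.empty(T)
    theta[T - 1] = rng.normal(m[T - 1], np.sqrt(C[T - 1]))  # step 1c
    for t in range(T - 2, -1, -1):                          # step 1d
        R_next = C[t] + W
        h = m[t] + C[t] / R_next * (theta[t + 1] - m[t])    # backward mean
        H = C[t] - C[t] ** 2 / R_next                       # backward variance
        theta[t] = rng.normal(h, np.sqrt(H))
    return theta
```

The backward moments `h` and `H` are the scalar specialization of \(\varvec{h}_t\) and \(\varvec{H}_t\) above with \(\varphi\)-terms dropped.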

5 Simulation

We propose a simulation study to compare the performance of the proposed approach against a Gaussian dlm, focusing on different settings with varying sample sizes. We focus on univariate settings, assuming that the matrices \(\{\varvec{G}_t\}\) and \(\{\varvec{F}_t\}\) are one-dimensional and do not depend on time, namely \(\varvec{F}=\varvec{G}=1\). We simulated \(T=50\) observations from the dlm defined by

$$\begin{aligned} \begin{aligned}&Y_t = \theta _t + \nu _t, \\&\theta _t = \theta _{t-1} + \omega _t, \\ \end{aligned} \end{aligned}$$

with different specifications of the initial distribution of \(\theta _0\) and of the disturbances \(\nu _t\) and \(\omega _t\). Specifically, we focus on the following settings:

  (1) Scenario 1: data are generated from a two-piece dlm, with initial distribution \(\theta _0|\varphi \sim N(-3+2\varphi , 2)\) and \(\varphi \sim TPN(3,\sqrt{3},0.5)\), letting \(a(\gamma ) = 1 + \gamma\) and \(b(\gamma )= 1-\gamma\), with Gaussian errors \(\nu _t \sim N(0,5)\) and \(\omega _t \sim N(0,3)\);

  (2) Scenario 2: data are generated from a Gaussian dlm, with \(\theta _0 \sim N(-3, 2)\) and Gaussian errors \(\nu _t \sim N(0,5)\), \(\omega _t \sim N(0,3)\);

  (3) Scenario 3: data are generated from a dlm with heavy tails, simulating \(\theta _0\), \(\nu _t\) and \(\omega _t\) from independent Student's t distributions with 3 degrees of freedom.
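Scenarios 2 and 3 can be simulated in a few lines; a minimal sketch follows, treating the second argument of \(N(\cdot ,\cdot )\) as a variance (an assumption, since the convention is not restated here) and omitting Scenario 1, which requires the tpn construction.

```python
import numpy as np

def simulate_dlm(T=50, scenario=2, rng=None):
    """Simulate from the local level model Y_t = theta_t + nu_t,
    theta_t = theta_{t-1} + omega_t, under Scenario 2 (Gaussian)
    or Scenario 3 (independent Student's t with 3 df)."""
    rng = np.random.default_rng(rng)
    if scenario == 2:
        theta = rng.normal(-3, np.sqrt(2))          # theta_0 ~ N(-3, 2)
        omega = rng.normal(0, np.sqrt(3), size=T)   # omega_t ~ N(0, 3)
        nu = rng.normal(0, np.sqrt(5), size=T)      # nu_t ~ N(0, 5)
    elif scenario == 3:
        theta = rng.standard_t(3)                   # heavy-tailed theta_0
        omega = rng.standard_t(3, size=T)
        nu = rng.standard_t(3, size=T)
    else:
        raise ValueError("only Scenarios 2 and 3 are sketched here")
    states = np.empty(T)
    for t in range(T):
        theta = theta + omega[t]                    # state equation
        states[t] = theta
    y = states + nu                                 # observation equation
    return y, states
```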

We chose diffuse inverse-gamma distributions as priors for V and W, which in this case are scalars, with parameters \(\ell =M=g=Z=0.001\). We compare our approach with a Gaussian dlm under the same prior distributions, running both algorithms for 5000 iterations after 500 burn-in samples, and focusing on the one-step-ahead predictions and the state parameters. Examination of the traceplots of the parameters, the auto-correlation functions and the Gelman–Rubin diagnostic showed no evidence against convergence.

Fig. 1: One-step-ahead predictions and filtered estimates for Scenario 1. Black lines denote the observed time series and the true state parameters

Fig. 2: One-step-ahead predictions and filtered estimates for Scenario 2. Black lines denote the observed time series and the true state parameters

Figures 1, 2, and 3 show the one-step-ahead predictions and filtered estimates in the three scenarios. Current empirical findings indicate that, as expected, the main advantage of the proposed approach is most evident in the initial part of the series, where the impact of the initial distribution is substantial. This result is clearly seen in Fig. 1, where the tpn-dlm is correctly specified, and the Gaussian dlm tends to underestimate the state parameters and the one-step-ahead predictions. When data are generated from a Gaussian dlm, as in Fig. 2, the tpn initial distribution is incorrectly specified. However, its impact vanishes after a few steps, and its one-step-ahead predictions are indistinguishable from those of a Gaussian dlm. Lastly, Fig. 3 focuses on a setting where both models are incorrectly specified, in terms of both the initial distribution and the distribution of the errors. We observe that the proposed tpn-dlm is robust against such misspecification, obtaining one-step-ahead predictions and estimates of the state parameters that are closer to the true level.

These findings are further explored by replicating the simulation scenarios for different sample sizes, \(T\in \{10,50,100\}\), with \(T=50\) corresponding to the results of Figs. 1, 2 and 3. Results are reported in Table 1, comparing the Mean Squared Error (mse) of the expected value of the one-step-ahead distributions under both approaches. Empirical results are consistent with the previous discussion, with the tpn-dlm performing particularly well with small sample sizes, under correct specification, and with heavy-tailed processes.
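For the Gaussian benchmark, the one-step-ahead forecast errors entering the mse can be computed directly from the Kalman filter recursions; a minimal scalar sketch (\(F = G = 1\)), not the authors' implementation:

```python
import numpy as np

def one_step_mse(y, V, W, m0=0.0, C0=1e2):
    """Mean squared one-step-ahead forecast error from the Kalman
    filter for the scalar local level model (F = G = 1)."""
    m, C = m0, C0
    sq_err = []
    for yt in y:
        a, R = m, C + W              # one-step-ahead state moments
        f = a                        # one-step-ahead forecast mean
        sq_err.append((yt - f) ** 2)
        K = R / (R + V)              # Kalman gain
        m, C = a + K * (yt - f), (1 - K) * R   # filtering update
    return float(np.mean(sq_err))
```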

Fig. 3: One-step-ahead predictions and filtered estimates for Scenario 3. Black lines denote the observed time series and the true state parameters

Table 1 Mean squared error for the expected value of the one-step-ahead distribution

6 Analysis of real data

Finally, we illustrate the tpn-dlm by analyzing the quarterly earnings in dollars per Johnson and Johnson share from 1960 to 1980 (Shumway et al. 2000, Example 1.1).

The data are characterized by a seasonal component that is larger in the early and late years and almost absent in the central years, while the trend is increasing and regular. Following Shumway et al. (2000), we model the time series as the sum of trend and seasonal components plus white noise,

$$\begin{aligned} Y_t = T_t + S_t + \nu _t; \end{aligned}$$

where the trend is modelled as

$$\begin{aligned} T_t = \phi T_{t-1} + \omega _{t1} \end{aligned}$$

and the seasonal component is assumed to sum to zero, up to noise, over a complete period of four quarters,

$$\begin{aligned} S_t + S_{t-1} + S_{t-2} + S_{t-3} = \omega _{t2}. \end{aligned}$$

We may express the model in state-space form by choosing \([T_t, S_t, S_{t-1}, S_{t-2}]^{\top }\) as the state vector:

$$\begin{aligned} Y_t&= \left[ \begin{array}{cccc} 1&1&0&0 \end{array}\right] \left[ \begin{matrix} T_t \\ S_{t} \\ S_{t-1} \\ S_{t-2} \end{matrix} \right] + v_t\qquad \text{ where } \qquad v_t \sim N(0,V)\\ \left[ \begin{matrix} T_t \\ S_{t} \\ S_{t-1} \\ S_{t-2} \end{matrix} \right]&= \left[ \begin{matrix} \phi &{} 0 &{} 0 &{} 0 \\ 0 &{} -1 &{} -1 &{} -1 \\ 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \\ \end{matrix} \right] \left[ \begin{matrix} T_{t-1} \\ S_{t-1} \\ S_{t-2} \\ S_{t-3} \end{matrix} \right] + \left[ \begin{matrix} \omega _{t1} \\ \omega _{t2} \\ 0 \\ 0 \end{matrix} \right] \\&\quad \text{ where } \qquad \left[ \begin{matrix} \omega _{t1} \\ \omega _{t2} \\ 0 \\ 0 \end{matrix} \right] \sim N_4 \left( \left[ \begin{matrix} 0 \\ 0 \\ 0 \\ 0 \end{matrix} \right] , \left[ \begin{matrix} W_{11} &{} 0 &{} 0 &{} 0 \\ 0 &{} W_{22} &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \\ \end{matrix} \right] \right) \end{aligned}$$
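The observation vector and transition matrix above can be written down and sanity-checked in a few lines; the value of \(\phi\) used below is a hypothetical placeholder, not an estimate.

```python
import numpy as np

phi = 1.03  # hypothetical growth parameter, phi = 1 + zeta

# Observation vector F and transition matrix G for the
# state vector [T_t, S_t, S_{t-1}, S_{t-2}]^T
F = np.array([1.0, 1.0, 0.0, 0.0])
G = np.array([
    [phi,  0.0,  0.0,  0.0],
    [0.0, -1.0, -1.0, -1.0],
    [0.0,  1.0,  0.0,  0.0],
    [0.0,  0.0,  1.0,  0.0],
])

# Sanity check: the noiseless transition sets
# S_t = -(S_{t-1} + S_{t-2} + S_{t-3}), so four consecutive
# seasonal effects sum to zero, as required.
prev = np.array([2.0, 0.5, -1.0, 0.8])   # [T_{t-1}, S_{t-1}, S_{t-2}, S_{t-3}]
new = G @ prev
assert abs(new[1] + prev[1] + prev[2] + prev[3]) < 1e-12
```

The second and third rows of `G` simply shift the stored seasonal effects, so only \(\omega _{t1}\) and \(\omega _{t2}\) enter the state noise, matching the singular covariance matrix above.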

The parameters to be estimated are the observation noise variance, V, and the state noise variances associated with the trend, \(W_{11}\), and the seasonal components, \(W_{22}\). In addition, we need to estimate the transition parameter associated with the growth rate, \(\phi\). Following Shumway et al. (2000), Example 6.27, we write \(\phi =1+\zeta\), where \(0<\zeta \le 1\), and we rewrite the trend component as

$$\begin{aligned} T_t-T_{t-1}=\zeta T_{t-1}+\omega _{t1}, \end{aligned}$$

so that, conditionally on the states, \(\zeta\) is the slope of the linear regression of \((T_t-T_{t-1})\) on \(T_{t-1}\), with error \(\omega _{t1}\). We choose a reference uninformative prior on \((\zeta ,\omega _{t1})\) and weakly informative priors for the remaining parameters, letting \(\ell =M=0.001\), \(g=0.05\) and \(\textbf{Z}=\text{ diag }\{0.05,0.05,0,0\}\). We ran the algorithm for 5000 iterations, collected after 5000 burn-in samples. Examination of the traceplots of the parameters, the auto-correlation functions and the Gelman–Rubin diagnostic showed no evidence against convergence. Figure 4 compares the trend (\(T_t\)) and the trend plus season (\(T_t + S_t\)), along with \(99\%\) credible intervals, for the Gaussian dlm and the tpn-dlm. Figure 5 displays the data and the one-step-ahead predictions for the time series \(Y_t\), again along with 99% credible intervals for the Gaussian dlm and the two-piece dlm.

Fig. 4: Posterior estimate of trend (\(T_t\)) and trend plus season (\(T_t + S_t\)) along with corresponding 99% credible intervals for the Johnson & Johnson data

Figures 4 and 5 show that the \(99\%\) credible intervals of the states and the response differ between the two-piece dlm and the Gaussian dlm; as a consequence, the entire distributions of those quantities differ. In addition, we note that the skewness of the predictive distributions is maintained as time increases, showing the usefulness of the two-piece dlm.

The mean squared error was 0.2131 for the two-piece dlm and 0.3512 for the Gaussian dlm, showing an advantage for the two-piece dlm. We also considered a BIC criterion for competing models \(k = 1, 2,\ldots , K\): the smaller-is-better criterion is \(\hbox {BIC}_k = n \log MSE_k + m_k \log (n)\), where \(MSE_k\) is the predictive mean squared error and \(m_k\) is the number of independent parameters used to fit model k. We obtained \(-70.18\) for the Gaussian dlm, where \(m_k=4\), and \(-107.69\) for the two-piece dlm, where \(m_k=5\), confirming that, even accounting for model complexity, the two-piece dlm is preferable.
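The BIC comparison above is easy to reproduce from the reported mse values; the sample size \(n = 84\) below is an assumption based on 21 years of quarterly data (1960 to 1980), and with it the computation approximately recovers the reported figures.

```python
import numpy as np

def bic(mse, m_k, n):
    """Smaller-is-better criterion BIC_k = n * log(MSE_k) + m_k * log(n)."""
    return n * np.log(mse) + m_k * np.log(n)

n = 84  # assumed: 21 years x 4 quarters of Johnson & Johnson earnings
print(bic(0.3512, 4, n))   # Gaussian dlm, approx. -70.2
print(bic(0.2131, 5, n))   # two-piece dlm, approx. -107.7
```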

Fig. 5: One-step-ahead predictions for the Johnson & Johnson quarterly earnings series. Dotted lines refer to \(99\%\) credible intervals

7 Conclusion

In this article we proposed a flexible dynamic linear model (dlm) for modeling and forecasting multivariate time series, relaxing the assumption of normality for the initial distribution of the state parameters and replacing it with the more flexible class of two-piece normal distributions. This model allows the initial distribution of the state parameters to be skewed, with the asymmetry controlled by a scalar parameter. We derived a Kalman filter for this model, obtaining two-component mixtures as predictive and filtering distributions, which preserve skewness.

In our opinion, the main contribution of this article is a simple and effective tool for modeling time series with possibly skewed distributions, such as Example 1.1 in Shumway et al. (2000) considered here. Moreover, since the predictive and filtering distributions are two-component mixtures, the model can simultaneously handle several departures from normality, such as skewness, heavy tails and multi-modality.