A flexible two-piece normal dynamic linear model

We construct a flexible dynamic linear model for the analysis and prediction of multivariate time series, assuming a two-piece normal initial distribution for the state vector. We derive a novel Kalman filter for this model, obtaining a two-component mixture as predictive and filtering distributions. To estimate the covariances of the error sequences, we develop a Gibbs-sampling algorithm to perform Bayesian inference. The proposed approach is validated and compared with a Gaussian dynamic linear model in simulations and on a real data set.


Introduction
State-space models have been extensively considered in diverse areas of application for modeling and forecasting time series. An important special case is the class of dynamic linear models (hereafter dlm). This class includes the ordinary static linear model as a special case, and assumes that the parameters can change over time, thus incorporating into the observational system variations that can significantly affect the observed behavior of the process of interest. The dlm is defined by

Y_t = F_t^⊤ θ_t + ν_t (observation equation),
θ_t = G_t θ_{t−1} + ω_t (system or state equation),

for t = 1, …, T, where Y_t is an r × 1 response vector, F_t is a p × r matrix that links the observed data with θ_t, a p × 1 vector of latent states at time t, and G_t is a p × p transition matrix, which describes the evolution of the state parameters. The terms ν_t and ω_t are mutually independent white-noise vectors of dimension r × 1 and p × 1, respectively, with zero means and constant variance-covariance matrices V and W, respectively. The most popular case is the Gaussian dlm, which assumes ν_t ∼ N_r(0, V) and ω_t ∼ N_p(0, W), mutually independent for each t = 1, …, T, where N_k(μ, Σ) denotes the k-variate normal distribution with mean vector μ and variance-covariance matrix Σ. For an extensive introduction to dlms from a Bayesian perspective, refer to West and Harrison (1997) and Petris et al. (2009). In this article we focus on a setting where F_t and G_t are known, consistently with the classical Kalman filter setting (Kalman 1960) and recent developments in state-space models (e.g., Fasano et al. 2021).
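The observation and state equations above can be simulated directly. The sketch below is a minimal univariate illustration (the function name `simulate_dlm` and the default values are ours, not from the paper), with F = G = 1 as in the simulation study later in the article:

```python
import math
import random

def simulate_dlm(T, F=1.0, G=1.0, V=5.0, W=3.0, theta0=-3.0, seed=1):
    """Simulate a univariate Gaussian DLM:
       y_t     = F * theta_t     + nu_t,    nu_t    ~ N(0, V)
       theta_t = G * theta_{t-1} + omega_t, omega_t ~ N(0, W)."""
    rng = random.Random(seed)
    theta, ys, thetas = theta0, [], []
    for _ in range(T):
        theta = G * theta + rng.gauss(0.0, math.sqrt(W))      # state equation
        thetas.append(theta)
        ys.append(F * theta + rng.gauss(0.0, math.sqrt(V)))   # observation equation
    return ys, thetas
```

The returned lists hold the observed series and the latent states, which is convenient for checking filtering output against the truth.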
Leveraging the properties of the multivariate normal distribution and the structure of the Gaussian dlm, it is possible to derive closed-form expressions for the predictive and filtering distributions and to conduct dynamic inference on the states θ_t via the Kalman filter, conditioning on F_t, G_t, W and V. However, Naveau et al. (2005) observed that the Gaussian assumption may be questionable in a large number of applications, as many distributions arising in a state-space model can be skewed. To mitigate this issue, Naveau et al. (2005) assumed that the initial state parameter vector θ_0 follows a multivariate closed skew-normal distribution, preserving the usual assumptions of independence and normality for the error sequences ν_t and ω_t. Building on this work, several authors have proposed different mechanisms to obtain dlms with more skewness. For instance, Kim et al. (2014) extended the results in Naveau et al. (2005) by assuming a scale mixture of closed skew-normal distributions for the initial state parameter vector θ_0; Cabral et al. (2014) proposed a Bayesian dlm relaxing the assumption of normality and assuming an extended skew-normal distribution (Azzalini and Capitanio 1999) for the initial distribution of the state parameter; Arellano-Valle et al. (2019) proposed a dlm in which the error sequence ν_t in the observation equation is assumed to have a multivariate skew-normal distribution. Several other authors have dealt with similar problems; see, for example, Gualtierotti (2005), Pourahmadi (2007), and Corns and Satchell (2007), among many others.
In this work we take a similar perspective and derive a novel dlm in which asymmetry is induced by a single scalar parameter, yielding a skewed initial distribution for the state parameter θ_0. Our purpose is to replace the normal distribution for θ_0 with a more flexible one, incorporating asymmetry via a two-piece normal (tpn) mixing distribution. Using this simple device, we obtain an extension of the classical Kalman filter, together with closed-form expressions for the one-step-ahead and filtering distributions. These results are further combined into a Markov chain Monte Carlo procedure via a forward filtering-backward sampling algorithm that provides posterior inference on the covariances V and W of the error terms.


Two-piece normal and skew-normal distributions

Two-piece normal distributions
According to Arellano-Valle et al. (2005), a continuous random variable Y follows a two-piece normal (tpn) distribution with location μ, scale σ and asymmetry parameter γ if its density function for any y ∈ ℝ can be written as

h(y; μ, σ, γ) = 2 / (σ[a(γ) + b(γ)]) { φ((y − μ)/(σ a(γ))) I_{[μ,∞)}(y) + φ((y − μ)/(σ b(γ))) I_{(−∞,μ)}(y) },    (1)

where φ(x) denotes the density of a standard Gaussian and I_A(x) denotes the indicator function of the set A; we write such a distribution compactly as Y ∼ TPN(μ, σ, γ). Skewness is controlled via two functions a(γ) and b(γ) satisfying the following properties: (i) a(γ) and b(γ) are positive-valued functions for γ ∈ (γ_L, γ_U), a possibly infinite interval; (ii) one of the functions is (strictly) increasing and the other is (strictly) decreasing; (iii) there exists a unique value γ* ∈ (γ_L, γ_U) such that a(γ*) = b(γ*), at which the tpn density (1) reduces to a symmetric normal density.

In addition, the tpn distribution has several interesting formal properties in terms of stochastic constructions (Arellano-Valle et al. 2005). The following are the most relevant for the purposes of this work:

P1. The tpn density (1) can be expressed as a finite mixture of two truncated normal densities,

h(y; μ, σ, γ) = π_a(γ) f_a(y) + π_b(γ) f_b(y),    (2)

where f_a is the N(μ, σ²a(γ)²) density truncated to [μ, ∞), f_b is the N(μ, σ²b(γ)²) density truncated to (−∞, μ), and the mixing weights are

π_a(γ) = a(γ)/(a(γ) + b(γ)),  π_b(γ) = b(γ)/(a(γ) + b(γ)).    (3)

P2. If Y ∼ TPN(μ, σ, γ), then Y =d μ + W|X|, where =d denotes equality in distribution, X ∼ N(0, 1), and W is an independent discrete random variable taking the value σa(γ) with probability π_a(γ) and the value −σb(γ) with probability π_b(γ); note that |X| ∼ TN(0, 1, [0, ∞)), a standard normal truncated to the positive real line.

P3. The stochastic representation in P2 allows one to obtain the mean and variance of Y via the laws of total expectation and total variance, namely E(Y) = μ + (2/π)^{1/2} σ (a(γ) − b(γ)) and Var(Y) = σ²{(a(γ)³ + b(γ)³)/(a(γ) + b(γ)) − (2/π)(a(γ) − b(γ))²}; refer to Arellano-Valle et al. (2020) for further details.
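Property P2 gives a direct sampler for the tpn distribution. A minimal sketch, assuming the choice a(γ) = 1 + γ and b(γ) = 1 − γ used later in the paper (the function name `rtpn` is ours):

```python
import math
import random

def rtpn(mu, sigma, gam, a=lambda g: 1 + g, b=lambda g: 1 - g, rng=random):
    """Draw from TPN(mu, sigma, gam) via P2: Y = mu + W|X| with X ~ N(0,1) and
    W = sigma*a(gam) with prob pi_a, W = -sigma*b(gam) with prob pi_b."""
    pa = a(gam) / (a(gam) + b(gam))            # mixing weight pi_a from (3)
    x = abs(rng.gauss(0.0, 1.0))               # |X|, a half-normal draw
    w = sigma * a(gam) if rng.random() < pa else -sigma * b(gam)
    return mu + w * x
```

By P3 the sample mean of many such draws should approach μ + (2/π)^{1/2} σ (a(γ) − b(γ)), which provides a quick sanity check of the representation.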

Skew-normal distribution
A random vector Y has a multivariate skew-normal (sn) distribution with location vector ξ, positive definite scale matrix Ω and skewness/shape vector α, denoted by Y ∼ SN_p(ξ, Ω, α), if its density function is given by

f(y) = 2 φ_p(y; ξ, Ω) Φ(α^⊤ Ω^{−1/2}(y − ξ)),  y ∈ ℝ^p.

Here, φ_p(⋅; ξ, Ω) denotes the density function of the p-variate normal distribution with mean vector ξ and variance-covariance matrix Ω, and Φ(⋅) is the cumulative distribution function of a standard normal. The sn random vector Y ∼ SN_p(ξ, Ω, α) can be obtained through the location-scale transformation Y = ξ + Ω^{1/2} X, where X has the stochastic representation

X = δ|X_0| + X_1,    (4)

with δ = α/(1 + α^⊤α)^{1/2}, X_0 ∼ N(0, 1) and X_1 ∼ N_p(0, I_p − δδ^⊤), which are independent. By (4) we get that, if Y ∼ SN_p(ξ, Ω, α), then there exist two independent random quantities Z and U such that

Y = ξ + ψZ + U,    (5)

where ψ = Ω^{1/2}δ. Note that Z ∼ HN(0, 1), a standard half-normal, and U ∼ N_p(0, Ω − ψψ^⊤). Thus, using (5), it can be shown that the mean vector and variance-covariance matrix of Y are given respectively by

E(Y) = ξ + (2/π)^{1/2} ψ,  Var(Y) = Ω − (2/π) ψψ^⊤.


A flexible two-piece normal dynamic linear model
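The stochastic representation (4) can be turned directly into a sampler. A minimal univariate sketch (p = 1, so Ω and α are scalars; the function name `rsn` is ours), whose sample mean can be checked against E(Y) = ξ + (2/π)^{1/2} ψ:

```python
import math
import random

def rsn(xi, omega, alpha, rng=random):
    """Draw from a univariate SN(xi, omega, alpha) using representation (4):
    X = delta*|X0| + X1, delta = alpha/sqrt(1+alpha^2),
    X0 ~ N(0,1), X1 ~ N(0, 1 - delta^2); then Y = xi + sqrt(omega)*X."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    x0 = abs(rng.gauss(0.0, 1.0))                        # half-normal part
    x1 = rng.gauss(0.0, math.sqrt(1.0 - delta * delta))  # independent Gaussian part
    return xi + math.sqrt(omega) * (delta * x0 + x1)
```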

The initial state distribution
Our proposal in this section is to derive a more flexible dlm that regulates asymmetry through a simple scalar parameter. Specifically, preserving the classical independence assumptions, we consider the dlm defined by

Y_t = F_t^⊤ θ_t + ν_t,  ν_t ∼ N_r(0, V),    (6)
θ_t = G_t θ_{t−1} + ω_t,  ω_t ∼ N_p(0, W),    (7)

for t = 1, …, T, replacing the distribution of the initial state parameter θ_0 with the following hierarchical specification:

θ_0 | γ ∼ N_p(a_0 + b_0 γ, Σ_0),    (8)
γ ∼ TPN(μ, σ, λ),    (9)

where a_0 and b_0 are p × 1 vectors. The model defined by Eqs. (6)-(7) and (8)-(9) will be referred to as the two-piece normal dynamic linear model (hereafter tpn-dlm).
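A draw from the hierarchical prior (8)-(9) combines the tpn sampler with a conditional Gaussian draw. A univariate sketch (the function name `draw_theta0` is ours; `a0`, `b0`, `tau2` play the role of a_0, b_0, Σ_0, and a(γ) = 1 + γ, b(γ) = 1 − γ are assumed as in the simulation study):

```python
import math
import random

def draw_theta0(a0, b0, tau2, mu, sigma, lam, rng=random):
    """One draw from the hierarchical prior (8)-(9), univariate case:
    gamma ~ TPN(mu, sigma, lam) with a(l)=1+l, b(l)=1-l, then
    theta_0 | gamma ~ N(a0 + b0*gamma, tau2)."""
    ag, bg = 1 + lam, 1 - lam
    pa = ag / (ag + bg)                                  # mixing weight pi_a
    x = abs(rng.gauss(0.0, 1.0))
    w = sigma * ag if rng.random() < pa else -sigma * bg
    gamma = mu + w * x                                   # TPN draw via P2
    return a0 + b0 * gamma + rng.gauss(0.0, math.sqrt(tau2))
```

With the Scenario-1 values (a0 = −3, b0 = 2, Σ_0 = 2, γ ∼ TPN(3, √3, 0.5)), the sample mean of many draws should approach a0 + b0·E(γ) by the law of total expectation.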
As a first important result, we note that the hierarchical specification (8)-(9) leads to a mixture of two multivariate skew-normals as the initial distribution of θ_0. The proof is a direct extension of Proposition 2 in Arellano-Valle et al. (2020), and is therefore omitted.

Proposition 3.1 Under the hierarchical representation defined by Eqs. (8)-(9), the initial density of θ_0 is a two-component mixture of multivariate skew-normal densities, where, for s = a, b, the mixing weight π_s is defined by (3) and the component parameters (ξ_s, Ω_s, α_s) depend on a_0, b_0, Σ_0, μ and the scales σa(λ) and σb(λ).

Here it should be noted that, from the well-known matrix inversion (Sherman-Morrison) formula, the inverses Ω_s^{−1}, for s = a, b, can be obtained from Σ_0^{−1} through a rank-one update, so that the shape terms α_s in Proposition 3.1 can be rewritten in a computationally convenient form. The distribution of the initial random vector θ_0 can therefore be written as

θ_0 ∼ π_a SN_p(ξ_a, Ω_a, α_a) + π_b SN_p(ξ_b, Ω_b, α_b),    (12)

which corresponds to a two-component mixture of the multivariate skew-normal distributions introduced in Sect. 2.2.

The Kalman filter
Our next step is to develop a Kalman filter based on the new initial distribution given by (12), assuming that the conditional distribution of γ given the data corresponds to a mixture of two truncated Gaussian densities. Let D_t = {y_1, …, y_t} denote the information available at time t, where y_i indicates a realization of the random variable Y_i. In the proposed tpn-dlm we consider a conditionally normal distribution for θ_0 given γ, with the tpn initial distribution (9) for γ. Furthermore, we assume by induction that the conditional distribution of γ given D_{t−1} corresponds to a mixture of two truncated Gaussians with locations μ^s_{t−1}, scales σ^s_{t−1} and mixing weights π^s_{t−1}, for s = a, b, and truncation point μ defined by the initial distribution (9).    (13)
Leveraging the conditional independence properties of the tpn-dlm outlined in Eq. (7), the one-step-ahead predictive distribution of θ_t given (γ, D_{t−1}) is

θ_t | γ, D_{t−1} ∼ N_p(a_t + b_t γ, R_t),    (14)

where a_t, b_t and R_t are obtained from the usual Kalman prediction step applied conditionally on γ.    (15)

Similarly, using (6), we find that the one-step-ahead predictive distribution of Y_t given (γ, D_{t−1}) becomes

Y_t | γ, D_{t−1} ∼ N_r(F_t^⊤(a_t + b_t γ), Q_t),    (16)

where Q_t = F_t^⊤ R_t F_t + V.    (17)

In other words, from (14) and (16) both predictive distributions remain conditionally Gaussian, with means that are affine in γ. Finally, by applying the properties of the conditional normal distribution, we obtain the following filtering distribution of θ_t given (γ, D_t):

θ_t | γ, D_t ∼ N_p(m_t(γ), C_t),    (18)

where m_t(γ) and C_t follow from the standard Kalman update, with a_t, b_t, R_t and Q_t defined as in (15) and (17), respectively.
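Conditionally on γ, the tpn-dlm reduces to a Gaussian dlm, so one predict/update cycle is the classical Kalman recursion. A univariate sketch (the function name `kalman_step` is ours, and the defaults match the simulation study):

```python
def kalman_step(m_prev, C_prev, y, F=1.0, G=1.0, V=5.0, W=3.0):
    """One conditional Kalman predict/update cycle for a univariate dlm."""
    a = G * m_prev                 # one-step-ahead state mean
    R = G * C_prev * G + W         # one-step-ahead state variance
    f = F * a                      # predictive mean of y_t
    Q = F * R * F + V              # predictive variance of y_t, as in (17)
    K = R * F / Q                  # Kalman gain
    m = a + K * (y - f)            # filtered mean
    C = R - K * F * R              # filtered variance
    return m, C, f, Q
```

In the tpn-dlm this recursion is run with the γ-dependent mean a_t + b_t γ; here m_prev stands for the conditional filtered mean at t − 1.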
The above results are formalized below.

Proposition 3.2 Consider the tpn-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumption (13). Then: (i) the one-step-ahead conditional predictive distribution of θ_t given (γ, D_{t−1}) is given by (14); (ii) the one-step-ahead conditional predictive distribution of Y_t given (γ, D_{t−1}) is given by (16), with Q_t defined in (17); (iii) the conditional filtering distribution of θ_t given (γ, D_t) is given by (18).

The next proposition establishes the conditional distribution of γ | D_t.
Immediate consequences of these results are given in the following proposition.
Proposition 3.4 Consider the tpn-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumption (13). Then: (i) the one-step-ahead predictive distribution of θ_t given D_{t−1} is a two-component mixture of skew-normal densities with weights π^s_{t−1}, s = a, b; (ii) the one-step-ahead predictive distribution of Y_t given D_{t−1} is a two-component mixture of the same form, where μ^s_t and σ^s_t, for s = a, b, are defined in Proposition 3.3; (iii) the filtering distribution of θ_t given D_t is a two-component mixture of skew-normal densities, where μ^s_t, σ^s_t and π^s_t, for s = a, b, are defined in Proposition 3.3.

Proposition 3.4 shows that the one-step-ahead predictive distribution of the states is typically skewed, and the same is true for the analogous predictive distribution of the response; the filtering distribution is also typically skewed. This can be seen by comparing the results of Proposition 3.4 with those for the usual dlm (see, e.g., Petris et al. (2009)). Finally, from Eqs. (14) and (16) and property P3 of the tpn distribution (see Sect. 2.1), we obtain the following results.

Proposition 3.5 Under the tpn-dlm defined by Eqs. (6)-(7) and (8)-(9), with the induction assumption (13), the means and covariance matrices of the one-step-ahead predictive and filtering distributions follow from the laws of total expectation and total variance, conditioning on γ, where E{γ | D_t} and Var{γ | D_t} are derived in Eqs. (21) and (22), respectively.

Outline of Bayesian computation
In this section we combine the results obtained above into a forward filtering backward sampling (ffbs) scheme to conduct full Bayesian inference on the model parameters θ = (θ_1, …, θ_T), V and W via Markov chain Monte Carlo (mcmc). In particular, we assign inverse-Wishart priors to the error covariances,

V ∼ IW(λ, M),  W ∼ IW(g, Z),

where M and Z are positive definite matrices of size r × r and p × p, respectively, while λ and g are scalars such that λ > (r − 1)/2 and g > (p − 1)/2. This choice guarantees that the covariance matrices V and W are positive definite.
Conditionally on the latent states, these distributions are conjugate, as the model is conditionally Gaussian. Therefore, the full conditional distributions of V and W are again inverse-Wishart,

V | θ, D_T ∼ IW(λ + T/2, M + (1/2) Σ_{t=1}^T (y_t − F_t^⊤θ_t)(y_t − F_t^⊤θ_t)^⊤),    (24)
W | θ, D_T ∼ IW(g + T/2, Z + (1/2) Σ_{t=1}^T (θ_t − G_tθ_{t−1})(θ_t − G_tθ_{t−1})^⊤).    (25)

In order to sample from θ | (D_T, γ) we rely on backward recursions and decompose the filtered distribution of the state parameters, following Carter and Kohn (1994) and Frühwirth-Schnatter (1994), as

p(θ_1, …, θ_T | D_T, γ) = p(θ_T | D_T, γ) ∏_{t=1}^{T−1} p(θ_t | θ_{t+1}, D_t, γ),

where each backward conditional p(θ_t | θ_{t+1}, D_t, γ) is Gaussian, with moments obtained from the filtering quantities of Sect. 3.2 conditionally on γ (Eq. 28).

mcmc algorithm
Posterior sampling can be performed by combining the above results in an mcmc algorithm, alternating the Kalman filter with sampling from the full conditional distributions. The following pseudo-code illustrates the steps of a single mcmc iteration:

1. Sample (θ, γ) using the following modified ffbs algorithm:
   1a. For t = 1, …, T, update the parameters of the distribution θ_t | γ, D_t using the Kalman filter given in Sect. 3.2 (forward filtering);
   1b. For t = 1, …, T, sample γ | D_t from the conditional distribution outlined in Eq. 20;
   1c. Sample θ_T | D_T from the filtering distribution reported in Eq. 23;
   1d. For t = T − 1, T − 2, …, 1, sample θ_t | (θ_{t+1}, D_t, γ) from the distribution outlined in Eq. 28, conditioning on the θ_{t+1} sampled in the previous step (backward smoothing).
2. Sample V from its inverse-Wishart full conditional distribution, outlined in Eq. (24).
3. Sample W from its inverse-Wishart full conditional distribution, outlined in Eq. (25).
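The overall structure of the sampler can be sketched for the simpler, purely Gaussian local-level case (F = G = 1). The code below is a hypothetical simplification of the algorithm above: the γ-related steps 1b-1c are omitted, so it alternates only a standard FFBS pass with the inverse-gamma updates for V and W (the function name `gibbs_local_level` is ours):

```python
import math
import random

def gibbs_local_level(ys, n_iter=200, seed=0):
    """Gibbs sketch for a local-level Gaussian dlm: FFBS for the states,
    then inverse-gamma full conditionals for V and W (diffuse IG priors)."""
    rng = random.Random(seed)
    T = len(ys)
    V, W = 1.0, 1.0
    draws = []
    for _ in range(n_iter):
        # forward filtering (flat initial prior m0 = 0, C0 = 1e6)
        ms, Cs = [], []
        m, C = 0.0, 1e6
        for y in ys:
            R = C + W                       # prediction variance
            Q = R + V                       # predictive variance of y_t
            K = R / Q                       # Kalman gain
            m = m + K * (y - m)             # filtered mean
            C = R - K * R                   # filtered variance
            ms.append(m)
            Cs.append(C)
        # backward sampling (step 1d)
        theta = [0.0] * T
        theta[T - 1] = rng.gauss(ms[-1], math.sqrt(Cs[-1]))
        for t in range(T - 2, -1, -1):
            R = Cs[t] + W
            B = Cs[t] / R
            h = ms[t] + B * (theta[t + 1] - ms[t])
            H = Cs[t] - B * Cs[t]
            theta[t] = rng.gauss(h, math.sqrt(H))
        # inverse-gamma full conditionals (steps 2-3, scalar case)
        sse_v = sum((y - th) ** 2 for y, th in zip(ys, theta))
        V = 1.0 / rng.gammavariate(0.001 + T / 2, 1.0 / (0.001 + sse_v / 2))
        sse_w = sum((theta[t] - theta[t - 1]) ** 2 for t in range(1, T))
        W = 1.0 / rng.gammavariate(0.001 + (T - 1) / 2, 1.0 / (0.001 + sse_w / 2))
        draws.append((V, W))
    return draws
```

The full tpn-dlm sampler wraps this skeleton with the γ updates, so every Gaussian quantity above becomes conditional on the sampled γ.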

Simulation
We propose a simulation study to compare the performance of the proposed approach against a Gaussian dlm, focusing on different settings with varying sample size. We focus on univariate settings, assuming that the matrices {G_t} and {F_t} are unidimensional and do not depend on time, namely F = G = 1. We simulated T = 50 observations from the dlm defined by (6)-(7) with different specifications of the initial distribution of θ_0 and of the disturbances ν_t and ω_t. Specifically, we focus on the following settings:

(1) Scenario 1: data are generated from a two-piece dlm, with initial distribution θ_0 | γ ∼ N(−3 + 2γ, 2) and γ ∼ TPN(3, √3, 0.5), letting a(γ) = 1 + γ and b(γ) = 1 − γ, with Gaussian errors ν_t ∼ N(0, 5) and ω_t ∼ N(0, 3);
(2) Scenario 2: data are generated from a Gaussian dlm, with θ_0 ∼ N(−3, 2) and Gaussian errors ν_t ∼ N(0, 5), ω_t ∼ N(0, 3);
(3) Scenario 3: data are generated from a dlm with heavy tails, simulating θ_0, ν_t and ω_t from independent Student's t distributions with 3 degrees of freedom.
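Scenario 1 can be reproduced with a short script combining the tpn draw for γ with the Gaussian recursions; a minimal sketch under the stated settings (the function name `simulate_scenario1` and the seed are ours):

```python
import math
import random

def simulate_scenario1(T=50, seed=42):
    """Scenario 1: theta_0 | gamma ~ N(-3 + 2*gamma, 2),
    gamma ~ TPN(3, sqrt(3), 0.5) with a(g)=1+g, b(g)=1-g,
    errors nu_t ~ N(0,5), omega_t ~ N(0,3), and F = G = 1."""
    rng = random.Random(seed)
    ag, bg = 1.5, 0.5                                     # a(0.5), b(0.5)
    x = abs(rng.gauss(0.0, 1.0))
    w = math.sqrt(3) * ag if rng.random() < ag / (ag + bg) else -math.sqrt(3) * bg
    gamma = 3.0 + w * x                                   # TPN(3, sqrt(3), 0.5) draw
    theta = -3.0 + 2.0 * gamma + rng.gauss(0.0, math.sqrt(2.0))
    ys, thetas = [], []
    for _ in range(T):
        theta = theta + rng.gauss(0.0, math.sqrt(3.0))    # state equation
        thetas.append(theta)
        ys.append(theta + rng.gauss(0.0, math.sqrt(5.0))) # observation equation
    return ys, thetas
```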
We chose diffuse inverse-gamma priors for V and W, which in this case are scalar, with parameters λ = M = g = Z = 0.001. We compare our approach with a Gaussian dlm under the same prior distributions, running both algorithms for 5000 iterations after 500 burn-in samples, and focusing on the one-step-ahead predictions and the state parameters. Examination of traceplots of the parameters, autocorrelation functions and Gelman-Rubin diagnostics showed no evidence against convergence.
Figures 1, 2 and 3 show the one-step-ahead predictions and filtered estimates in the three scenarios. Empirical findings indicate that, as expected, the main advantage of the proposed approach is most evident in the initial part of the series, where the impact of the initial distribution is substantial. This is clearly seen in Fig. 1, where the tpn-dlm is correctly specified and the Gaussian dlm tends to underestimate the state parameter and the one-step-ahead predictions. When data are generated from a Gaussian dlm, as in Fig. 2, the tpn initial distribution is incorrectly specified; however, its impact vanishes after a few steps, and the resulting one-step-ahead predictions are indistinguishable from those of a Gaussian dlm. Lastly, Fig. 3 focuses on a setting where both models are incorrectly specified, in terms of both the initial distribution and the distribution of the errors. We observe that the proposed tpn-dlm is robust against such misspecification, yielding one-step-ahead predictions and state estimates that are closer to the true level.
These findings are further explored by replicating the simulation scenarios for different sample sizes, T ∈ {10, 50, 100}, with T = 50 corresponding to the results in Figs. 1, 2 and 3. Results are reported in Table 1, comparing the mean squared error (mse) of the expected value of the one-step-ahead distributions under both approaches. Empirical results are consistent with the previous findings.

Real data application

We illustrate the proposed model on the Johnson & Johnson quarterly earnings series (Shumway et al. 2000). The data are characterized by a seasonality that is larger in the starting and ending years and almost absent in the central years, while the trend is increasing and regular. Following Shumway et al. (2000), the time series is modelled with trend and seasonal components added to a white noise,

Y_t = T_t + S_t + ν_t,

where the trend is modelled as T_t = φ T_{t−1} + ω_{t1}, and the seasonal component is expected to sum to zero over a complete period of four quarters,

S_t + S_{t−1} + S_{t−2} + S_{t−3} = ω_{t2}.

We may express the model in state-space form by choosing [T_t, S_t, S_{t−1}, S_{t−2}]^⊤ as the state vector. The parameters to be estimated are the observation noise variance, V, and the state noise variances associated with the trend, W_11, and the seasonal component, W_22. In addition, we need to estimate the transition parameter associated with the growth rate, φ. Following Shumway et al. (2000), Example 6.27, we write φ = 1 + β, where 0 < β ≤ 1, and rewrite the trend component as T_t − T_{t−1} = β T_{t−1} + ω_{t1}, so that, conditionally on the states, β is the slope of the linear regression of (T_t − T_{t−1}) on T_{t−1} and ω_{t1} is the error. We choose a reference uninformative prior on (β, ω_{t1}) and weakly informative priors for the remaining parameters by letting λ = M = 0.001, g = 0.05 and Z = diag{0.05, 0.05, 0, 0}. We ran the algorithm for 5000 iterations collected after 5000 burn-in samples. Examination of traceplots of the parameters, autocorrelation functions and Gelman-Rubin diagnostics showed no evidence against convergence.
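The state-space matrices implied by the choice of [T_t, S_t, S_{t−1}, S_{t−2}]^⊤ as state vector can be written out explicitly; a minimal sketch (the function name `jj_state_space` is ours):

```python
def jj_state_space(phi):
    """State-space matrices for the trend + seasonal model with state
    x_t = [T_t, S_t, S_{t-1}, S_{t-2}]: y_t = F x_t + nu_t, x_t = G x_{t-1} + w_t."""
    F = [1.0, 1.0, 0.0, 0.0]                  # y_t = T_t + S_t + nu_t
    G = [[phi,  0.0,  0.0,  0.0],             # T_t = phi * T_{t-1} + w_t1
         [0.0, -1.0, -1.0, -1.0],             # S_t = -(S_{t-1}+S_{t-2}+S_{t-3}) + w_t2
         [0.0,  1.0,  0.0,  0.0],             # carry S_{t-1} forward
         [0.0,  0.0,  1.0,  0.0]]             # carry S_{t-2} forward
    return F, G
```

The second row of G encodes the zero-sum seasonal constraint, while the last two rows simply shift the lagged seasonal terms through the state vector.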
Figure 4 displays the comparison of the trend (T_t) and trend plus season (T_t + S_t) estimates, along with 99% credible intervals, for the Gaussian dlm and the tpn-dlm. Figure 5 displays the data and the one-step-ahead predictions for the time series Y_t, again along with 99% credible intervals for the Gaussian dlm and the two-piece dlm.
Figures 4 and 5 show that the 99% credible intervals for the states and the response differ between the two-piece dlm and the Gaussian dlm; as a consequence, the entire distributions of those quantities are different. In addition, we note that the skewness of the predictive distributions is maintained as time increases, showing the usefulness of the two-piece dlm.
The mean squared error was 0.2131 for the two-piece dlm and 0.3512 for the Gaussian dlm, showing an advantage for the two-piece dlm. We also considered a BIC criterion for comparing alternative models k, k = 1, 2, …, K: the smaller-is-better criterion is BIC_k = n log MSE_k + m_k log(n), where MSE_k is the predictive mean squared error and m_k is the number of independent parameters used to fit model k. We obtained −70.18 for the Gaussian dlm, where m_k = 4, and −107.69 for the two-piece dlm, where m_k = 5, confirming that, even accounting for model complexity, the two-piece dlm is preferable.
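The criterion above is a one-liner; a small helper (the function name `bic` is ours, with n the length of the series being predicted):

```python
import math

def bic(mse, m, n):
    """Smaller-is-better criterion: BIC_k = n * log(MSE_k) + m_k * log(n)."""
    return n * math.log(mse) + m * math.log(n)
```

Because the penalty term m_k log(n) grows with the parameter count, the two-piece dlm's lower BIC despite one extra parameter indicates that its fit gain outweighs the added complexity.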

Conclusion
In this article we proposed a flexible dynamic linear model (dlm) for modeling and forecasting multivariate time series, relaxing the assumption of normality for the initial distribution of the state parameter and replacing it with a more flexible class of distributions, the two-piece normal. This model allows the initial distribution of the state parameter to be skewed, with the asymmetry controlled by a scalar parameter. We derived a Kalman filter for this model, obtaining two-component mixtures as predictive and filtering distributions that maintain skewness. In our opinion, the main contribution of this article is a simple and effective tool for modeling time series with possibly skewed distributions, such as Example 1.1 in Shumway et al. (2000) considered here. Moreover, since the predictive and filtering distributions are two-component mixtures, the model can simultaneously handle several departures from normality, such as skewness, heavy tails and multimodality.

Appendix: Proofs
Proof of Proposition 3.1 Using the hierarchical representation defined by Eqs. (8)-(9), and the fact that γ has the tpn mixture density (2), the result follows by integrating the conditional density of θ_0 given γ against each truncated-normal mixture component.

Fig. 1 One-step-ahead predictions and filtered estimates for Scenario 1. Black lines denote the observed time series and the true state parameters

Fig. 4 Posterior estimates of the trend (T_t) and trend plus season (T_t + S_t), along with corresponding 99% credible intervals, for the Johnson & Johnson data

Fig. 5 One-step-ahead predictions for the Johnson & Johnson quarterly earnings series. Dotted lines refer to 99% credible intervals

Table 1
Mean squared error for the expected value of the one-step-ahead distribution

From the proof of Proposition 3.3, the conditional posterior of γ satisfies

p(γ | D_t) ∝ p(y_t | γ, D_{t−1}) p(γ | D_{t−1})
∝ 2π^a_{t−1} φ_r(y_t; F_t^⊤(a_t + b_t γ), Q_t) φ(γ; μ^a_{t−1}, σ^a_{t−1}) I_{[μ,∞)}(γ) + 2π^b_{t−1} φ_r(y_t; F_t^⊤(a_t + b_t γ), Q_t) φ(γ; μ^b_{t−1}, σ^b_{t−1}) I_{(−∞,μ)}(γ).
Completing the square in γ yields, for s = a, b,

φ_r(y_t; F_t^⊤(a_t + b_t γ), Q_t) φ(γ; μ^s_{t−1}, σ^s_{t−1}) = φ_r(y_t; F_t^⊤(a_t + μ^s_{t−1} b_t), Q_t + σ^s_{t−1} F_t^⊤ b_t b_t^⊤ F_t) φ(γ; μ^s_t, σ^s_t),

so that the updated mixing weights π^s_t are proportional to π^s_{t−1} φ_r(y_t; F_t^⊤(a_t + μ^s_{t−1} b_t), Q_t + σ^s_{t−1} F_t^⊤ b_t b_t^⊤ F_t) times the Gaussian probability of the corresponding truncation region.