# Robust time series models with trend and seasonal components


## Abstract

We describe observation driven time series models for Student-t and EGB2 conditional distributions in which the signal is a linear function of past values of the score of the conditional distribution. These specifications produce models that are easy to implement and deal with outliers by what amounts to a soft form of trimming in the case of t and a soft form of Winsorizing in the case of EGB2. We show how a model with trend and seasonal components can be used as the basis for a seasonal adjustment procedure. The methods are illustrated with US and Spanish data.

## Keywords

Fat tails, EGB2, Score, Robustness, Student’s t, Trimming, Winsorizing

## JEL Classification

C22, G17

## 1 Introduction

Time series are often subject to observations that, when judged by the Gaussian yardstick, are outliers. This is a very real issue for many economic time series. Agustin Maravall’s seasonal adjustment program, TRAMO-SEATS, which he developed jointly with Victor Gomez, tackles the problem by identifying outliers and, where appropriate, replacing them by dummy variables. Here we take a different approach in which we employ a new class of robust models where the dynamics of the level, or location, are driven by the score of the conditional distribution of the observations. These so-called dynamic conditional score (DCS) models have recently been developed by Creal et al. (2011, 2013) and Harvey (2013). They are relatively easy to implement and their form facilitates the development of a comprehensive and relatively straightforward theory for the asymptotic distribution of the maximum likelihood estimator.

The changing level of a Gaussian time series is usually obtained from an ARIMA process or explicitly modeled as an unobserved component. The statistical treatment of linear Gaussian unobserved components models is straightforward, with the Kalman filter playing a key role. Additive outliers may be captured by dummy variables. A different way forward is to let the noise have a Student *t*-distribution, thereby accommodating the outliers. However, the treatment of such a model requires computationally intensive procedures, as described in Durbin and Koopman (2012). The DCS-t model proposed by Harvey and Luati (2014) provides an alternative approach which is observation-driven in that the conditional distribution of the observations is specified. A model of this kind may be compared and contrasted with the methods in the robustness literature; see Maronna, Martin and Yohai (2006, ch. 8) and McDonald and Newey (1988), where a parametric approach is called ‘partially adaptive’. Robust procedures for guarding against additive outliers typically respond to large observations in one of two ways: either the response function converges to a positive (negative) constant for observations tending to plus (minus) infinity or it goes to zero. These two procedures are usually classified as Winsorizing and trimming, respectively. The score for a t-distribution converges to zero and so can be regarded as a parametric form of trimming. Similarly, a parametric form of Winsorizing is given by the exponential generalized beta distribution of the second kind (EGB2). The article by Caivano and Harvey (2014) sets out the theory for the DCS location model with an EGB2 distribution and illustrates its practical value.

The article is organized as follows. Section 2 reviews the idea behind the DCS location model and expands on the reason for using the conditional score. The various classifications of distributions in terms of their tails are then set out and this is followed by a discussion of the score as it relates to location and scale and the link with robust estimation. This material leads on to the contrast between the DCS t and EGB2 models in Sect. 3. Section 4 compares the fit of EGB2 and t-distributions for two US macroeconomic time series. A comparison between the way in which Gaussian and DCS models adapt to structural breaks is made in Sect. 5. This issue is important, because it may be thought that the price paid by DCS models for their robustness is a slow response to structural breaks.

DCS models with trend and seasonal components are described in Sect. 6. These models can be regarded as robust counterparts to the unobserved components ‘basic structural model’ (BSM). Seasonal adjustment can be carried out with the BSM by extracting smoothed components using the standard Kalman filter and smoother. Because DCS models only give filtered components, it is necessary to devise a method for smoothing. In Sect. 7 we apply this method to a monthly series of tourists arriving in Spain and compare the extracted trend and seasonal components with those obtained by fitting a BSM with the outliers handled by dummy variables.

## 2 Filters, heavy tails and robust estimation

The first sub-section below sets out a simple unobserved components model and shows how the innovations form of the Kalman filter may be adapted to form a DCS model. The second sub-section provides a rationale for the use of the conditional score. The way in which tails of distributions may be classified is reviewed in the third sub-section and in the fourth, tail behaviour is related to the considerations of robustness. The treatment of these topics is more general and integrated than in Harvey (2013).

### 2.1 Unobserved components and filters

*q* may be estimated by maximum likelihood (ML), with the likelihood function constructed from the one-step ahead prediction errors. The KF can be expressed as a single equation which combines \(\mu _{t\mid t-1},\) the optimal estimator of \(\mu _{t}\) based on information at time \(t-1,\) with \(y_{t}\) in order to produce the best estimator of \(\mu _{t+1}\). Writing this equation together with an equation that defines the one-step ahead prediction error, \(v_{t},\) gives the innovations form of the KF:

*q*. In the steady-state, \( k_{t}\) is constant. Setting it equal to \(\kappa \) in (3) and re-arranging gives the ARMA model (2) with \(\xi _{t}=v_{t}\) and \( \phi -\kappa =\theta .\)

When the noise in (1) comes from a heavy-tailed distribution such as Student’s *t* it can give rise to observations which, when judged against the yardstick of a Gaussian distribution, are additive outliers. As a result fitting a Gaussian model is inefficient and may even yield estimators which are inconsistent. Simulation methods, such as Markov chain Monte Carlo (MCMC) and particle filtering, provide the basis for a direct attack on such non-Gaussian models; see Durbin and Koopman (2012). However, simulation-based estimation can be time-consuming and subject to a degree of uncertainty. In addition the statistical properties of the estimators are not easy to establish.

*t*-th observation conditional on past observations. The time-varying parameter is then updated by a suitably defined filter. Such a model is said to be observation driven. In a linear Gaussian UC model, the KF depends on the one step-ahead prediction error. The main ingredient in the DCS filter for non-Gaussian distributions is the replacement of \(v_{t}\) in the KF equation by a variable, \(u_{t},\) that is proportional to the score of the conditional distribution; compare Maronna, Martin and Yohai (2006, p. 272–4) and the references therein. Thus the second equation in (3) becomes

### 2.2 Why the Score?

*t* observations and the last expression follows because of (5). For a Gaussian distribution a single update goes straight to the ML estimate at time *t* (recursive least squares).

*Remark 1*

We may sometimes be able to choose a link function so that the information quantity does not depend on \(\theta .\)

*t* by a constant, which may be denoted as \(\kappa .\) Thus

### 2.3 Heavy tails

The Gaussian distribution has kurtosis of three and a distribution is said to exhibit *excess kurtosis,* or to be *leptokurtic,* if its kurtosis is greater than three. Although some researchers take excess kurtosis as defining heavy tails, it is not, in itself, an ideal measure, particularly for asymmetric distributions. Most classifications in the insurance and finance literature begin with the behaviour of the upper tail for a non-negative variable, or one that is only defined above a minimum value; see Embrechts et al. (1997). The two which are relevant here are as follows.

*heavy-tailed* if

*y* has an exponential distribution, \(\overline{F}(y)=\exp (-y/\alpha ),\) so \( \exp (y/\alpha )\overline{F}(y)=1\) for all *y*. Thus the exponential distribution is not heavy-tailed.

*fat-tailed* if, for a fixed positive value of \(\eta ,\)

*c* is a non-negative constant and *L*(*y*) is slowly varying,^{1} that is

*power law* PDF

*m*-th moment exists if \(m< \eta \). The Pareto distribution is a simple case in which \( \overline{F}(y)=y^{-\eta }\) for \(y>1.\) If a distribution is fat-tailed then it must be heavy-tailed, but the converse is not true; see Embrechts, Kluppelberg and Mikosch (1997, p. 41–2).

*y* divided by a scale parameter, \(\varphi \), so that \(\overline{F}(y/\varphi )=cL(y/\varphi )(y/\varphi )^{-\eta }\) and \(f(y)\sim cL(y)\varphi ^{-1}\eta (y/\varphi )^{-\eta -1}.\) Then

*x* denote a variable with a fat-tailed distribution in which the scale is written as \(\varphi =\exp (\mu )\) and let \(y=\ln x.\) Then for large *y*

*y* is not heavy-tailed, but it may exhibit excess kurtosis. The score with respect to location, \(\mu ,\) is the same as the original score with respect to the logarithm of scale and so tends to \(\eta \) as \(y\rightarrow \infty .\)

### 2.4 Robust estimation

The ML estimators are asymptotically efficient, assuming certain regularity conditions hold. More generally \(\rho (.)\) may be any function deemed to yield estimators with good statistical properties. In particular, the estimators should be robust to observations which would be considered to be outliers for a normal distribution. When normality is assumed, the ML estimators of the mean and variance are just the corresponding sample moments, but these can be subject to considerable distortion when outliers are present. Robust estimators, on the other hand, are resistant to outliers while retaining relatively high efficiency when the data are from a normal distribution.

The M-estimator, which features prominently in the robustness literature, has a Gaussian response until a certain threshold, *K*, whereupon it is constant; see Maronna, Martin and Yohai (2006, p. 25–31). This is known as Winsorizing as opposed to trimming, where observations greater than *K* in absolute value are given a weight of zero.^{2}
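The distinction between the two response functions can be made concrete with a minimal sketch. The function names and the threshold value below are illustrative assumptions, and the input `z` is taken to be an observation already centred and divided by a pre-computed robust scale estimate.

```python
def winsorizing_response(z, K):
    """Gaussian (linear) response up to the threshold K, constant beyond it."""
    return max(-K, min(K, z))


def trimming_response(z, K):
    """Linear response up to the threshold K, zero weight beyond it."""
    return z if abs(z) <= K else 0.0


if __name__ == "__main__":
    K = 2.0  # illustrative threshold, in units of the robust scale estimate
    for z in (0.5, 1.5, 2.5, 10.0):
        print(z, winsorizing_response(z, K), trimming_response(z, K))
```

Both functions agree with the Gaussian response inside the threshold; they differ only in whether a large observation keeps a bounded influence or loses its influence altogether.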

## 3 DCS location models

^{3} and \( u_{t} \) is proportional to the conditional score, that is \(u_{t}=k\,\partial \ln f(y_{t}\mid y_{t-1},y_{t-2},\ldots )/\partial \mu _{t\mid t-1},\) where *k* is a constant.

*p*, *r*) is

*q* is defined as \(\max (p,r+1),\) then \(y_{t}\) is an *ARMA*(*p*, *q*) process with MA coefficients \(\theta _{i}=\phi _{i}-\kappa _{i-1}\), \(i=1,\ldots ,q.\) Nonstationary ARIMA-type models may also be constructed, as may structural time series models with trend and seasonal components. Explanatory variables can be introduced into DCS models, as described in Harvey and Luati (2014).

Maronna, Martin and Yohai (2006, Sect. 8.6 and 8.8) give a robust algorithm for AR and ARMA models with additive outliers. For a first-order model their filter is essentially the same as (13) except that their dynamic equation is driven by a robust \(\psi \)-function and they regard the model as an approximation to a UC model.^{4}

### 3.1 Student *t* model

The Student \(t_{\nu }\) distribution has fat tails for finite degrees of freedom, \(\nu ,\) with the tail index given by \(\nu \). Moments exist only up to and including \(\nu -1\). The excess kurtosis, that is the amount by which the normal distribution’s kurtosis of three is exceeded, is \(6/(\nu -4),\) provided that \(\nu >4.\)
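As a quick numerical check of these moment formulae, the sketch below evaluates the excess kurtosis and the standard deviation implied by a log-scale \(\lambda \) and degrees of freedom \(\nu \). The function names are illustrative assumptions; the standard-deviation formula is the one stated in footnote 3.

```python
import math


def t_excess_kurtosis(nu):
    """Excess kurtosis 6 / (nu - 4) of a Student t; requires nu > 4."""
    if nu <= 4:
        raise ValueError("excess kurtosis only exists for nu > 4")
    return 6.0 / (nu - 4.0)


def t_std_dev(lam, nu):
    """Standard deviation: sqrt(nu / (nu - 2)) times the scale exp(lam)."""
    if nu <= 2:
        raise ValueError("the variance only exists for nu > 2")
    return math.exp(lam) * math.sqrt(nu / (nu - 2.0))


if __name__ == "__main__":
    print(t_excess_kurtosis(7.0))            # nu = 7, the value used for Fig. 1
    print(round(t_std_dev(-4.88, 6.49), 4))  # lambda and nu reported for GDP in Sect. 4
```

With \(\nu =7\) the excess kurtosis is two, and the GDP estimates of \(\lambda \) and \(\nu \) reported in Sect. 4 give a standard deviation of about 0.0091, in line with the tabulated \(\sigma \).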

All moments of \(u_{t}\) exist and the existence of moments of \(y_{t}\) is not affected by the dynamics. The autocorrelations can be found from the infinite MA representation; the patterns are as they would be for a Gaussian model.

Maximum likelihood estimation is straightforward and for a first-order dynamic equation, as in (13), an analytic expression for the information matrix is available.
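A minimal sketch of how the likelihood of a first-order DCS-t location model might be evaluated is given below. The display equations did not survive in this copy of the text, so the filter form is an assumption: the observation is taken as \(y_{t}=\omega +\mu _{t\mid t-1}+v_{t}\), the dynamic equation as \(\mu _{t+1\mid t}=\phi \mu _{t\mid t-1}+\kappa u_{t}\), and the score variable as \(u_{t}=v_{t}/(1+v_{t}^{2}/(\nu e^{2\lambda }))\). The function name, the start-up value `mu0` and the example data are hypothetical.

```python
import math


def dcs_t_filter(y, omega, phi, kappa, lam, nu, mu0=0.0):
    """First-order DCS-t location filter (assumed form, see lead-in).

    Returns the filtered signals mu_{t|t-1}, the score variables u_t and
    the Student t log-likelihood of the sample.
    """
    scale2 = math.exp(2.0 * lam)
    # Constant part of the Student t log-density with scale exp(lam)
    const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(math.pi * nu) - lam)
    mu, loglik = mu0, 0.0
    signals, scores = [], []
    for obs in y:
        v = obs - omega - mu                      # one-step prediction error
        w = 1.0 / (1.0 + v * v / (nu * scale2))   # weight in (0, 1]
        u = w * v                                 # soft-trimmed prediction error
        loglik += const + 0.5 * (nu + 1.0) * math.log(w)
        signals.append(mu)
        scores.append(u)
        mu = phi * mu + kappa * u                 # score-driven update
    return signals, scores, loglik


if __name__ == "__main__":
    y = [0.0, 0.1, -0.1, 8.0, 0.0, 0.1]           # made-up data with one outlier
    _, u_t, ll_t = dcs_t_filter(y, 0.0, 0.8, 0.5, 0.0, 5.0)
    _, u_g, _ = dcs_t_filter(y, 0.0, 0.8, 0.5, 0.0, 1e8)  # near-Gaussian case
    print(u_t[3], u_g[3], ll_t)
```

In practice the returned log-likelihood would be passed to a numerical optimizer over \((\omega ,\phi ,\kappa ,\lambda ,\nu )\). The example shows the outlier entering the update with a much smaller value of \(u_{t}\) than in the near-Gaussian case, where \(u_{t}\) is essentially the prediction error itself.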

There are a number of ways in which skewness may be introduced into a t-distribution. One possibility is the method proposed by Fernandez and Steel (1998). There is a minor technical issue in that the score is not differentiable at the mode but, as Zhu and Galbraith (2011) show, the asymptotic theory for the ML estimator still goes through in the usual way. The asymptotic theory for the DCS skew-t location model also goes through; see Harvey (2013, Sect. 3.11).

### 3.2 Exponential generalized beta distribution model

Figure 1 shows the score functions for EGB2 and t distributions with a standard deviation of one and an excess kurtosis of two. The shape parameters for the two distributions are \(\xi =0.5\) and \(\nu =7\). As can be seen, the score for the *t*-distribution is redescending, reflecting the fact that it has fat tails, whereas the EGB2 score is bounded.
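The contrast between a redescending and a bounded score can be checked numerically with the sketch below, which uses the shape parameters quoted above (\(\xi =0.5\), \(\nu =7\)) and unit variance. The closed forms are standard results for the derivative of the log-density with respect to location and are not taken from the text; whether Fig. 1 plots exactly these quantities is an assumption, as are the function names. For the symmetric EGB2 the code writes the inverse scale as `inv_scale` to avoid a clash with the degrees of freedom \(\nu \) of the t, and uses the fact that the trigamma function at 0.5 equals \(\pi ^{2}/2\).

```python
import math


def t_score_unit_variance(y, nu=7.0):
    """Location score of a Student t with unit variance, at deviation y from the mean."""
    scale2 = (nu - 2.0) / nu            # so that scale^2 * nu / (nu - 2) = 1
    return (nu + 1.0) * y / (nu * scale2 + y * y)


def egb2_score_unit_variance(y):
    """Location score of a symmetric EGB2 with xi = 0.5 and unit variance."""
    xi = 0.5
    # variance = 2 * trigamma(xi) / inv_scale^2 and trigamma(0.5) = pi^2 / 2
    inv_scale = math.pi
    return inv_scale * xi * math.tanh(inv_scale * y / 2.0)


if __name__ == "__main__":
    for y in (0.5, 1.0, 2.0, 5.0, 10.0, 100.0):
        print(y, round(t_score_unit_variance(y), 4), round(egb2_score_unit_variance(y), 4))
```

The t score rises, peaks and then falls back towards zero, whereas the EGB2 score levels off at \(\pi /2\) under this standardization.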

## 4 Macroeconomic time series

Dynamic location models were fitted to the growth rate of US GDP and industrial production using EGB2, Student’s t and normal distributions. GDP is quarterly, ranging from 1947q1 to 2012q4. Industrial production data are monthly and range from January 1960 to February 2013. All data are seasonally adjusted and taken from the Federal Reserve Economic Data (FRED) database of the Federal Reserve Bank of St. Louis.

*p* value of 0.24). Comparing the residuals with a fitted normal shows them to have a higher peak at the mean, as well as heavier tails; see Fig. 2.

*t* model and the EGB2 outperform the Gaussian model with the shape parameter, \(\nu \) or \(\xi ,\) confirming the excess kurtosis. The \(\lambda \) parameter is the logarithm of scale, \(\nu ,\) but the estimates of \(\sigma \) are shown because these are comparable across different distributions. For GDP the EGB2 model gives a slightly better fit, whereas the t-distribution is better for industrial production. However, the differences between the two are small compared with the Gaussian model.

US GDP (quarterly)

| | \(\kappa \) | \(\phi \) | \(\omega \) | \(\lambda \) | \(\xi \) (or \(\nu \)) | \(\sigma \) |
|---|---|---|---|---|---|---|
| EGB2 | 0.30 | 0.50 | 0.008 | \(-5.40\) | 0.88 | 0.0091 |
| Num SE | (0.063) | (0.103) | (0.001) | (0.324) | (0.394) | |
| Asy SE | (0.054) | (0.143) | (0.001) | (0.415) | (0.502) | |
| t | 0.50 | 0.50 | 0.008 | \(-4.88\) | 6.49 | 0.0091 |
| Num SE | (0.094) | (0.103) | (0.001) | (0.071) | (2.364) | |
| Asy SE | (0.089) | (0.141) | (0.001) | (0.056) | (1.887) | |
| Gaussian | 0.35 | 0.49 | 0.008 | \(-4.70\) | \(-\) | 0.0091 |
| Num SE | (0.058) | (0.112) | (0.001) | (0.044) | | |
| Asy SE | (0.061) | (0.141) | (0.001) | (0.044) | | |

US Industrial production (monthly)

| | \(\kappa \) | \(\phi \) | \(\omega \) | \(\lambda \) | \(\xi \) (or \(\nu \)) | \(\sigma \) |
|---|---|---|---|---|---|---|
| EGB2 | 0.20 | 0.85 | 0.003 | \(-6.05\) | 0.55 | 0.0069 |
| Num SE | (0.033) | (0.036) | (0.001) | (0.214) | (0.147) | |
| Asy SE | (0.027) | (0.040) | (0.001) | (0.287) | (0.196) | |
| t | 0.40 | 0.85 | 0.002 | \(-5.25\) | 4.49 | 0.0071 |
| Num SE | (0.060) | (0.036) | (0.001) | (0.046) | (0.743) | |
| Asy SE | (0.055) | (0.040) | (0.001) | (0.038) | (0.634) | |
| Gaussian | 0.25 | 0.83 | 0.002 | \(-4.95\) | \(-\) | 0.0071 |
| Num SE | (0.032) | (0.041) | (0.001) | (0.028) | | |
| Asy SE | (0.035) | (0.046) | (0.001) | (0.028) | | |

US macroeconomic series: model comparison

| | Log-Likelihood | AIC | BIC |
|---|---|---|---|
| GDP | | | |
| EGB2 | 868.376 | \(-6.566\) | \(-6.498\) |
| t | 868.242 | \(-6.565\) | \(-6.497\) |
| Gaussian | 862.212 | \(-6.526\) | \(-6.472\) |
| Industrial production | | | |
| EGB2 | 2291.66 | \(-7.168\) | \(-7.133\) |
| t | 2293.56 | \(-7.174\) | \(-7.139\) |
| Gaussian | 2255.21 | \(-7.057\) | \(-7.029\) |
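The information criteria in the model comparison appear to be reported per observation. The sketch below reproduces the GDP entries under two assumptions that are not stated in the text: the sample size is taken as 263 (the quarterly growth rates from 1947q1 to 2012q4 lose one observation), and the parameter count is five for the EGB2 and t models and four for the Gaussian model. The function name is illustrative.

```python
import math


def aic_bic_per_obs(loglik, n_params, n_obs):
    """AIC and BIC divided by the sample size."""
    aic = (-2.0 * loglik + 2.0 * n_params) / n_obs
    bic = (-2.0 * loglik + n_params * math.log(n_obs)) / n_obs
    return aic, bic


if __name__ == "__main__":
    n_obs = 263  # assumed number of GDP growth-rate observations
    for name, loglik, k in (("EGB2", 868.376, 5), ("t", 868.242, 5), ("Gaussian", 862.212, 4)):
        aic, bic = aic_bic_per_obs(loglik, k, n_obs)
        print(name, round(aic, 3), round(bic, 3))
```

Under these assumptions the rounded values agree with the GDP rows of the table, which also makes clear that the ranking of the EGB2 and t models rests on a very small difference in log-likelihood.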

## 5 Structural breaks

It might be thought that the EGB2 and t filters will be less responsive to a permanent change in the level than the linear Gaussian filter. However, for moderate size shifts, the score functions in Fig. 1 suggest that this might not be the case, because only for large observations is the Gaussian response bigger than the response of the robust filters. For example, for the logistic (EGB2 with unit shape parameters), the score is only smaller than the observation (and hence the linear filter) when it is more than (approximately) 1.6 standard deviations from the mean. The behaviour of the t-filter is similar.
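The figure of roughly 1.6 standard deviations can be checked with a short calculation. The sketch below assumes that "the score" means the derivative of the log-density with respect to location for a distribution standardized to unit variance, and finds by bisection the point at which it equals the observation itself. The function names are illustrative, and the t case uses \(\nu =7\) purely as an example.

```python
import math


def logistic_score_unit_variance(y):
    """Location score of a logistic (EGB2 with unit shape parameters), unit variance."""
    inv_scale = math.pi / math.sqrt(3.0)   # logistic variance is pi^2 / (3 * inv_scale^2)
    return inv_scale * math.tanh(inv_scale * y / 2.0)


def t_score_unit_variance(y, nu=7.0):
    """Location score of a Student t with unit variance."""
    return (nu + 1.0) * y / ((nu - 2.0) + y * y)


def crossing_point(score, lo=0.5, hi=5.0, tol=1e-10):
    """Bisection for the positive y at which score(y) = y.

    Assumes score(lo) > lo and score(hi) < hi.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(crossing_point(logistic_score_unit_variance))
    print(crossing_point(t_score_unit_variance))
```

Below the crossing point the robust filters respond at least as strongly as the linear Gaussian filter; only beyond it is their response damped.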

## 6 Trend and seasonality

*t* filter

Generalizing the above model to include a slope and seasonals provides the basis for a robust treatment of seasonal adjustment.

### 6.1 Basic structural model

*j* at time *t* and define \({\gamma } _{t}=(\gamma _{1t},...,\gamma _{st})^{\prime }\). The full set of seasonals evolves as a multivariate random walk

*s* seasonal components are continually changing, only one affects the observations at any particular time, that is \(\gamma _{t}=\gamma _{jt}\) when season *j* is prevailing at time *t*. The requirement that the seasonal components always sum to zero is enforced by the restriction that the disturbances sum to zero at each *t*. This restriction is implemented by the correlation structure of \({\omega }_{t}\), where \(Var\left( {\mathbf {i}} ^{\prime }{\omega }_{t}\right) =0,\) coupled with initial conditions constraining the seasonals to sum to zero at \(t=0.\)

### 6.2 Stochastic trend and seasonal in the DCS model

*t* model by treating \(\kappa _{1}=\kappa \) as the unknown parameter, but without unity imposed as an upper bound.

*j*th element of \({\kappa }_{t}\), then in season *j* we set \(\kappa _{jt}=\kappa _{s},\) where \(\kappa _{s}\) is a non-negative unknown parameter, whereas \(\kappa _{it}=-\kappa _{s}/(s-1)\), \( i\ne j,\quad i=1,\ldots ,s.\) The amounts by which the seasonal effects change therefore sum to zero. The initial conditions at time \(t=0\) are estimated by treating them as parameters.
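The construction of the seasonal gain vector, and the reason the seasonals keep summing to zero, can be illustrated with the sketch below. The update \(\gamma _{t+1\mid t}=\gamma _{t\mid t-1}+\kappa _{t}u_{t}\) is an assumed form, since the display equation is not available in this copy; the function names, the zero-based season index and the numbers in the example are hypothetical.

```python
def seasonal_gains(j, s, kappa_s):
    """Gain vector when season j (0-based) prevails among s seasons.

    The prevailing season receives kappa_s; every other season receives
    -kappa_s / (s - 1), so the gains sum to zero.
    """
    off_season = -kappa_s / (s - 1)
    return [kappa_s if i == j else off_season for i in range(s)]


def update_seasonals(gamma, j, kappa_s, u):
    """One assumed filter step: gamma_{t+1|t} = gamma_{t|t-1} + kappa_t * u_t."""
    gains = seasonal_gains(j, len(gamma), kappa_s)
    return [g + k * u for g, k in zip(gamma, gains)]


if __name__ == "__main__":
    gamma = [1.0, -0.5, -0.5, 0.0]          # quarterly seasonals summing to zero
    new_gamma = update_seasonals(gamma, j=0, kappa_s=0.3, u=2.0)
    print(new_gamma, sum(new_gamma))
```

Because the gains sum to zero for any value of the score, seasonals that start out summing to zero continue to do so after every update.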

The above filter may be regarded as a robust version of the well-known Holt-Winters filter; see Harvey (1989, p. 31). However, it differs from Holt-Winters in the Gaussian case by enforcing the restriction that the seasonals sum to zero. This is an important advantage.

### 6.3 Seasonal adjustment

In contrast to the Gaussian BSM, the DCS model has no exact solution for smoothing. Some possibilities are suggested in Harvey (2013, Sect 3.7), but these are difficult to generalize beyond the local level model. The best way to employ the DCS model for seasonal adjustment is to use it to mitigate the effects of outliers by modifying them rather than eliminating them by dummy variables. A dummy variable effectively means that the corresponding observation is treated as though it were missing; in other words it corresponds to hard trimming.

The above procedure could be implemented with TRAMO-SEATS rather than the unobserved components BSM.
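The idea of modifying outliers rather than deleting them can be sketched as follows. This is only one plausible reading and is not confirmed by the text available here: each observation is replaced by the filtered prediction plus the score variable \(u_{t}\), so that the prediction error \(v_{t}\) is swapped for its soft-trimmed counterpart. For brevity the sketch uses a local level DCS-t filter rather than the full trend and seasonal model; the function name, parameter values and data are hypothetical.

```python
def soft_trim_adjust(y, phi, kappa, scale, nu, mu0=0.0):
    """Replace each observation by prediction + u_t (assumed adjustment rule).

    u_t = v_t / (1 + v_t^2 / (nu * scale^2)) is the t score variable, so
    small prediction errors are left almost untouched while very large
    ones are pulled back towards the prediction.
    """
    mu = mu0
    adjusted = []
    for obs in y:
        v = obs - mu                                   # prediction error
        u = v / (1.0 + v * v / (nu * scale * scale))   # soft-trimmed error
        adjusted.append(mu + u)                        # assumed adjusted observation
        mu = phi * mu + kappa * u                      # score-driven update
    return adjusted


if __name__ == "__main__":
    y = [0.1, -0.2, 0.0, 9.0, 0.1]                     # made-up data with one outlier
    print(soft_trim_adjust(y, phi=1.0, kappa=0.5, scale=1.0, nu=5.0))
```

Hard trimming by a dummy variable would correspond to replacing the outlying observation by the prediction itself; here it is instead moved part of the way, and the adjusted series could then be passed to a standard Gaussian smoother.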

## 7 Tourists in Spain

The logarithm of the number of tourists entering Spain from January 2000 to April 2014 (source: Frontur) is plotted in Fig. 5. Comparing a Gaussian unobserved components model with dummy variables with a DCS-t model provides a contrast between hard and soft trimming.

## 8 Conclusions

This article has shown how DCS models with changing location and/or scale can be successfully extended to cover EGB2 conditional distributions. Most of the theoretical results on the properties of DCS-t models, including the asymptotic distribution of ML estimators, carry over to EGB2 models. However, whereas the t-distribution has fat tails, and hence subjects extreme observations to a form of soft trimming, the EGB2 distribution has light tails (but excess kurtosis) and hence gives a gentle form of Winsorizing. The examples show that the EGB2 distribution can give a better fit to some macroeconomic series.

The way in which DCS models respond to breaks was examined and it was shown that, contrary to what might be expected, they adjust almost as rapidly as Gaussian models.

A seasonal adjustment procedure may be carried out with DCS models that include both trend and seasonal components. Having fitted the DCS model, the scores are used to adjust the data before smoothing using a standard Gaussian model. Two or three iterations seem to be sufficient. The method was illustrated with data on tourists in Spain. There is a case for a structural break in April, 2008, but the DCS model quickly adjusts to it and indeed it could be reasonably argued that it is better to let the change take place over several months rather than assigning it to just one. In summary, our new DCS procedure, like TRAMO-SEATS, provides a practical approach to seasonal adjustment in the presence of outliers.

## Footnotes

1. More generally, *L* is regularly varying if \(\lim _{y\rightarrow \infty }(L(ky)/L(y))=k^{\beta };\) see Embrechts, Kluppelberg and Mikosch (1997, p. 37, 564). Fat-tailed distributions are regularly varying with \(\eta =-\beta >0.\)

2. In both cases a (robust) estimate of scale needs to be pre-computed and the process of computing M-estimates is then often iterated to convergence.

3. The standard deviation is \(\sqrt{\nu /(\nu -2)}\) times the scale.

4. Muler, Peña and Yohai (2009, p. 817) note two shortcomings of the estimates obtained in this way. They write: ‘First, these estimates are asymptotically biased. Second, there is not an asymptotic theory for these estimators, and therefore inference procedures like tests or confidence regions are not available.’ They then suggest a different approach and show that it allows an asymptotic theory to be developed.

## Notes

### Acknowledgments

We are grateful to Gabriele Fiorentini and a referee for helpful comments on the first draft.

## References

- Caivano M, Harvey AC (2014) Time series models with an EGB2 conditional distribution. J Time Ser Anal 35:558–571
- Creal D, Koopman SJ, Lucas A (2013) Generalized autoregressive score models with applications. J Appl Econ 28:777–795
- Creal D, Koopman SJ, Lucas A (2011) A dynamic multivariate heavy-tailed model for time-varying volatilities and correlations. J Bus Econ Stat 29:552–563
- Durbin J, Koopman SJ (2012) Time series analysis by state space methods, 2nd edn. Oxford Statistical Science Series, Oxford
- Embrechts P, Kluppelberg C, Mikosch T (1997) Modelling extremal events. Springer, Berlin
- Fernandez C, Steel MFJ (1998) On Bayesian modeling of fat tails and skewness. J Am Stat Assoc 93:359–371
- Harvey AC (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge
- Harvey AC (2013) Dynamic models for volatility and heavy tails. Econometric Society Monograph. Cambridge University Press, Cambridge
- Harvey AC, Luati A (2014) Filtering with heavy tails. J Am Stat Assoc 109:1112–1122
- Koopman SJ, Harvey AC, Doornik JA, Shephard N (2009) STAMP 8.2 structural time series analysis modeller and predictor. Timberlake Consultants Ltd, London
- Maravall A (1985) On structural time series models and the characterization of components. J Bus Econ Stat 3:350–355
- Maronna R, Martin D, Yohai V (2006) Robust statistics: theory and methods. Wiley, Chichester
- McDonald JB, Newey WK (1988) Partially adaptive estimation of regression models via the generalized t distribution. Econ Theory 4:428–457
- Muler N, Peña D, Yohai VJ (2009) Robust estimation for ARMA models. Ann Stat 37:816–840
- Zhu D, Galbraith JW (2011) Modelling and forecasting expected shortfall with the generalised asymmetric student-t and asymmetric exponential power distributions. J Empir Financ 18:765–778

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.