Abstract
Stationary time series models built from parametric distributions are, in general, limited in scope due to the assumptions imposed on the residual distribution and autoregression relationship. We present a modeling approach for univariate time series data, which makes no assumptions of stationarity, and can accommodate complex dynamics and capture non-standard distributions. The model for the transition density arises from the conditional distribution implied by a Bayesian nonparametric mixture of bivariate normals. This results in a flexible autoregressive form for the conditional transition density, defining a time-homogeneous, non-stationary Markovian model for real-valued data indexed in discrete time. To obtain a computationally tractable algorithm for posterior inference, we utilize a square-root-free Cholesky decomposition of the mixture kernel covariance matrix. Results from simulated data suggest that the model is able to recover challenging transition densities and non-linear dynamic relationships. We also illustrate the model on time intervals between eruptions of the Old Faithful geyser. Extensions to accommodate higher order structure and to develop a state-space model are also discussed.
Acknowledgments
The work of the first author was supported by the National Science Foundation under award SES 1131897. The work of the second author was supported in part by the National Science Foundation under awards DMS 1310438 and DMS 1407838. The authors wish to thank three reviewers for constructive feedback and for comments that improved the presentation of the material in this paper.
Appendices
Appendix 1: The Markov chain Monte Carlo algorithm
Here, we provide the details of the MCMC method for posterior simulation from the nonparametric mixture model developed in Sect. 2.1.
The posterior full conditional distributions for \(\alpha \) and the components of the vector \(\psi \) are standard, as they are assigned conditionally conjugate priors. Each \(U_t\), for \(t=2,\ldots ,n\), is sampled from a discrete distribution on \(\{1,\ldots ,L\}\) with probabilities \((\tilde{p}_{1,t},\ldots ,\tilde{p}_{L,t})\), where \(\tilde{p}_{l,t}\propto p_l\mathrm {N}(z_t \mid \mu _{l}^y-\beta _{l}(z_{t-1}-\mu _{l}^x),\delta _{l}^y) \mathrm {N}(z_{t-1} \mid \mu _l^x,\delta _l^x)\), for \(l=1,\ldots ,L\).
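As an illustration of this categorical update, the following sketch (our notation and array layout, not the authors' code; all \(\delta \)'s are variances) draws each \(U_t\) from the discrete distribution with log-weights computed as above:

```python
import numpy as np

def sample_config(z, p, mu_x, mu_y, beta, delta_x, delta_y, rng):
    """Draw the latent configuration variables U_t, t = 2, ..., n.

    Component l is selected with probability proportional to
    p_l * N(z_t | mu_y_l - beta_l (z_{t-1} - mu_x_l), delta_y_l)
        * N(z_{t-1} | mu_x_l, delta_x_l).
    """
    n, L = len(z), len(p)
    U = np.empty(n - 1, dtype=int)
    for t in range(1, n):  # t = 2, ..., n in the paper's indexing
        cond_mean = mu_y - beta * (z[t - 1] - mu_x)     # length-L vector
        log_w = (np.log(p)
                 - 0.5 * np.log(2 * np.pi * delta_y)
                 - 0.5 * (z[t] - cond_mean) ** 2 / delta_y
                 - 0.5 * np.log(2 * np.pi * delta_x)
                 - 0.5 * (z[t - 1] - mu_x) ** 2 / delta_x)
        w = np.exp(log_w - log_w.max())                 # normalize stably
        U[t - 1] = rng.choice(L, p=w / w.sum())
    return U
```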
Next, consider the mixing parameters. Letting \(\{U_j^*:j=1,\ldots ,n^*\}\) be the \(n^*\) distinct values of \((U_2,\ldots ,U_n)\), and \(M_l=|\{t:U_t=l\}|\), we obtain the full conditional
\[
p(\mu _l^y \mid \ldots ,\mathrm {data}) \propto \mathrm {N}(\mu _l^y \mid m^y,v^y) \prod _{\{t:U_t=l\}} \mathrm {N}(z_t \mid \mu _{l}^y-\beta _{l}(z_{t-1}-\mu _{l}^x),\delta _{l}^y).
\]
Therefore, if \(l\in \{U_j^*\}\), \(\mu _l^y\) is sampled from a normal distribution with variance \((v^y)^*=[(v^y)^{-1}+M_l(\delta _l^y)^{-1}]^{-1}\), and mean \((v^y)^*[(v^y)^{-1}m^y+(\delta _l^y)^{-1}\sum _{\{t:U_t=l\}}(z_t+\beta _l(z_{t-1}-\mu _l^x))]\). If component \(l\) is empty, that is, \(l\notin \{U_j^*\}\), then \(\mu _l^y\sim \mathrm {N}(m^y,v^y)\). The updates for \(\delta _l^y\) and \(\beta _l\) also require only Gibbs sampling. If \(l\in \{U_j^*\}\), then \(\delta _l^y\sim \mathrm {IG}(\nu ^y+0.5M_l,s^y+0.5\sum _{\{t:U_t=l\}}(z_t-\mu _l^y+\beta _l(z_{t-1}-\mu _l^x))^2)\) and \(\beta _l\) is sampled from a normal with variance \(c^*=[c^{-1}+(\delta _l^y)^{-1}\sum _{\{t:U_t=l\}}(z_{t-1}-\mu _l^x)^2]^{-1}\) and mean \(c^*[c^{-1}\theta +(\delta _l^y)^{-1}\sum _{\{t:U_t=l\}}(z_{t-1}-\mu _l^x)(\mu _l^y-z_t)]\). If \(l\notin \{U_j^*\}\), then we sample from \(G_0\): \(\delta _l^y\sim \mathrm {IG}(\nu ^y,s^y)\) and \(\beta _l\sim \mathrm {N}(\theta ,c)\).
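The conjugate update for \(\mu _l^y\) can be sketched as follows (an illustrative fragment under our indexing conventions; the updates for \(\delta _l^y\) and \(\beta _l\) follow the same pattern):

```python
import numpy as np

def update_mu_y(l, z, U, mu_x, beta, delta_y, m_y, v_y, rng):
    """Gibbs update for mu_y_l under the N(m_y, v_y) component of G_0.

    z has length n; U[j] is the configuration variable U_t for t = j + 2,
    so z[j + 1] is z_t and z[j] is z_{t-1}. All delta's and v's are variances.
    """
    idx = np.where(U == l)[0] + 1            # z-indices of the z_t with U_t = l
    if idx.size == 0:                        # empty component: draw from the prior
        return rng.normal(m_y, np.sqrt(v_y))
    # z_t + beta_l (z_{t-1} - mu_x_l) ~ N(mu_y_l, delta_y_l)
    resid = z[idx] + beta[l] * (z[idx - 1] - mu_x[l])
    v_star = 1.0 / (1.0 / v_y + idx.size / delta_y[l])
    m_star = v_star * (m_y / v_y + resid.sum() / delta_y[l])
    return rng.normal(m_star, np.sqrt(v_star))
```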
No matter the choice of \(G_0\), the full conditionals for \(\mu _l^x\) and \(\delta _l^x\) are not proportional to any standard distribution, as these parameters are contained in the sum of \(L\) terms in the denominator of \(q_l(z_{t-1})\). The posterior full conditional \(p(\mu _l^x \mid \ldots ,\mathrm {data})\), when \(l\in \{U_j^*\}\), is given by
\[
p(\mu _l^x \mid \ldots ,\mathrm {data}) \propto \mathrm {N}(\mu _l^x \mid m^x,v^x) \left[ \prod _{t=2}^n \sum _{m=1}^L p_m\mathrm {N}(z_{t-1} \mid \mu _m^x,\delta _m^x)\right] ^{-1} \prod _{\{t:U_t=l\}} \mathrm {N}(z_t \mid \mu _{l}^y-\beta _{l}(z_{t-1}-\mu _{l}^x),\delta _{l}^y)\, \mathrm {N}(z_{t-1} \mid \mu _l^x,\delta _l^x).
\]
This can be written as \(p(\mu _l^x|\ldots ,\mathrm {data})\propto \) \(\mathrm {N}(\mu _l^x \mid (m^x)^*,(v^x)^*)(\prod _{t=2}^n \sum _{m=1}^L p_m\mathrm {N}(z_{t-1} \mid \mu _m^x,\delta _m^x))^{-1}\), with \((v^x)^*=[(v^x)^{-1}+M_l(\delta _l^x)^{-1}+M_l\beta _l^2(\delta _l^y)^{-1}]^{-1}\) and \((m^x)^*=(v^x)^*((v^x)^{-1}m^x+(\delta _l^x)^{-1}\sum _{\{t:U_t=l\}}z_{t-1}+(\delta _l^y)^{-1}\beta _l^2\sum _{\{t:U_t=l\}}(z_{t-1}+(z_t-\mu _l^y)/\beta _l))\). We use a random-walk Metropolis step to update \(\mu _l^x\). For \(l\notin \{U_j^*\}\), \(p(\mu _l^x \mid \ldots ,\mathrm {data})\) is proportional to \(\mathrm {N}(\mu _l^x \mid m^x,v^x)\left[ \prod _{t=2}^n \sum _{m=1}^L p_m\mathrm {N}(z_{t-1} \mid \mu _m^x,\delta _m^x)\right] ^{-1}\), and in this case we use a Metropolis–Hastings algorithm, proposing a candidate value \(\mu _l^x\) from the base distribution \(\mathrm {N}(m^x,v^x)\).
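A hypothetical implementation of this Metropolis step (our naming and array layout, not the authors' code) evaluates the log full conditional directly, including the product-over-time normalization term:

```python
import numpy as np

def norm_logpdf(x, mean, var):
    """Log density of N(mean, var), with var a variance."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def log_target_mu_x(val, l, z, U, p, mu_x, mu_y, beta, delta_x, delta_y, m_x, v_x):
    """Unnormalized log full conditional of mu_x_l for an active component l."""
    mu_x = mu_x.copy()
    mu_x[l] = val
    lp = norm_logpdf(val, m_x, v_x)                       # G_0 contribution
    idx = np.where(U == l)[0] + 1                         # t with U_t = l
    lp += norm_logpdf(z[idx - 1], val, delta_x[l]).sum()  # x-kernel terms
    lp += norm_logpdf(z[idx],
                      mu_y[l] - beta[l] * (z[idx - 1] - val),
                      delta_y[l]).sum()                   # y-kernel terms
    # the inverse term: prod_t sum_m p_m N(z_{t-1} | mu_x_m, delta_x_m)
    dens = sum(p[m] * np.exp(norm_logpdf(z[:-1], mu_x[m], delta_x[m]))
               for m in range(len(p)))
    return lp - np.log(dens).sum()

def rw_metropolis(cur, log_target, step, rng):
    """One random-walk Metropolis update for a scalar parameter."""
    prop = cur + step * rng.standard_normal()
    return prop if np.log(rng.uniform()) < log_target(prop) - log_target(cur) else cur
```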
The full conditional and sampling strategy for \(\delta _l^x\) are similar to those for \(\mu _l^x\). We have
\[
p(\delta _l^x \mid \ldots ,\mathrm {data}) \propto \mathrm {IG}(\delta _l^x \mid \nu ^x,s^x) \left[ \prod _{t=2}^n \sum _{m=1}^L p_m\mathrm {N}(z_{t-1} \mid \mu _m^x,\delta _m^x)\right] ^{-1} \prod _{\{t:U_t=l\}} \mathrm {N}(z_{t-1} \mid \mu _l^x,\delta _l^x),
\]
which, for an active component, is written as proportional to
\[
\mathrm {IG}\big (\delta _l^x \,\big |\, \nu ^x+0.5M_l,\; s^x+0.5{\textstyle \sum _{\{t:U_t=l\}}}(z_{t-1}-\mu _l^x)^2\big ) \left[ \prod _{t=2}^n \sum _{m=1}^L p_m\mathrm {N}(z_{t-1} \mid \mu _m^x,\delta _m^x)\right] ^{-1}.
\]
For non-active components, the full conditional is \(\mathrm {IG}(\delta _l^x \mid \nu ^x,s^x)\left[ \prod _{t=2}^n \sum _{m=1}^L p_m\mathrm {N}(z_{t-1} \mid \mu _m^x,\delta _m^x)\right] ^{-1}\). We use a similar strategy for sampling \(\delta _l^x\) as for \(\mu _l^x\): for the active components, a random-walk Metropolis step on the log scale, sampling \(\log (\delta _l^x)\); for the non-active components, Metropolis–Hastings proposals from \(G_0(\delta _l^x)=\mathrm {IG}(\nu ^x,s^x)\).
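The log-scale random walk can be sketched as follows (an illustrative fragment, our naming; the point is that a proposal symmetric in \(\log \delta \) requires the Jacobian correction \(\log \delta ' - \log \delta \) in the acceptance ratio):

```python
import numpy as np

def rw_metropolis_log_scale(cur, log_target, step, rng):
    """One Metropolis update for a positive scalar (e.g., a variance delta),
    random-walking on log(delta). The multiplicative proposal is symmetric
    in log(delta), so the acceptance ratio includes the Jacobian term
    log(prop) - log(cur)."""
    prop = cur * np.exp(step * rng.standard_normal())
    log_acc = log_target(prop) - log_target(cur) + np.log(prop) - np.log(cur)
    return prop if np.log(rng.uniform()) < log_acc else cur
```

Omitting the Jacobian term would leave the chain targeting the wrong distribution on the original scale.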
We next discuss the updating scheme for the vector \(\varvec{p}=(p_1,\ldots ,p_L)\), which poses the main challenge for posterior simulation. The full conditional for \(\varvec{p}\) has the form
\[
p(\varvec{p} \mid \ldots ,\mathrm {data}) \propto f(\varvec{p} \mid \alpha ) \left( \prod _{l=1}^L p_l^{M_l}\right) \left[ \prod _{t=2}^n \sum _{m=1}^L p_m\mathrm {N}(z_{t-1} \mid \mu _m^x,\delta _m^x)\right] ^{-1}.
\]
In standard DP mixture models, the implied generalized Dirichlet prior \(f(\varvec{p} \mid \alpha )\) combines with \(\prod _{l=1}^L p_l^{M_l}\) to form another generalized Dirichlet distribution. However, in this case there is an additional term. Metropolis–Hastings algorithms with various proposal distributions were explored to sample the vector \(\varvec{p}\), all resulting in very low acceptance rates. We instead devise an alternative sampling scheme, working directly with the latent beta-distributed random variables that determine the probability vector \(\varvec{p}\) under the DP truncation approximation. Recall that \(p_1=v_1\), \(p_l=v_l\prod _{r=1}^{l-1}(1-v_r)\), for \(l=2,\ldots ,L-1\), and \(p_L=\prod _{r=1}^{L-1}(1-v_r)\), where \(v_1,\ldots ,v_{L-1}\mathop {\sim }\limits ^{i.i.d.}\mathrm {beta}(1,\alpha )\). Equivalently, let \(\zeta _1,\ldots ,\zeta _{L-1}\mathop {\sim }\limits ^{i.i.d.}\mathrm {beta}(\alpha ,1)\), and define \(p_1=1-\zeta _1\), \(p_l =(1-\zeta _l)\prod _{r=1}^{l-1}\zeta _r\), for \(l=2,\ldots ,L-1\), and \(p_L=\prod _{r=1}^{L-1}\zeta _r\). Rather than updating \(\varvec{p}\) directly, we work with the \(\zeta _{l}\), a sample of which implies a particular probability vector \(\varvec{p}\).
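The map from \((\zeta _1,\ldots ,\zeta _{L-1})\) to \(\varvec{p}\) is a one-line computation; as a sketch (our naming):

```python
import numpy as np

def probs_from_zeta(zeta):
    """Map zeta_1, ..., zeta_{L-1} (iid beta(alpha, 1)) to the weights p.

    p_1 = 1 - zeta_1, p_l = (1 - zeta_l) * prod_{r<l} zeta_r for l < L,
    and p_L = prod_{r<L} zeta_r; the weights sum to one by telescoping.
    """
    zeta = np.asarray(zeta, dtype=float)
    cum = np.concatenate(([1.0], np.cumprod(zeta)))  # prod_{r<l} zeta_r
    return np.append(cum[:-1] * (1.0 - zeta), cum[-1])
```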
The full conditional for \(\zeta _l\), \(l=1,\ldots ,L-1\), has the form
\[
p(\zeta _{l} \mid \ldots ,\mathrm {data}) \propto \zeta _{l}^{\alpha +\sum _{r=l+1}^{L}M_{r}-1}\, (1-\zeta _{l})^{M_{l}} \prod _{t=2}^{n} (d(z_{t-1}))^{-1}, \qquad (8)
\]
where \(d(z_{t-1})=\sum _{m=1}^{L}p_{m}\mathrm {N}(z_{t-1} \mid \mu _{m}^{x},\delta _{m}^{x})\), with the weights \(p_{m}\) expressed in terms of \((\zeta _{1},\ldots ,\zeta _{L-1})\).
Also, let \(c_{t,l}=\mathrm {N}(z_{t-1} \mid \mu _{l}^{x},\delta _{l}^{x})\), which is constant with respect to each \(\zeta _l\). The form of the full conditional in (8) suggests the use of a slice sampler to update each \(\zeta _l\) one at a time. The slice sampler is implemented by drawing auxiliary random variables \(u_{t}\sim \mathrm {uniform}(0,(d(z_{t-1}))^{-1}),\) \(t=2,\ldots ,n,\) and then sampling \(\zeta _{l}\sim \mathrm {beta}(\alpha +\sum _{r=l+1}^{L}M_{r},M_{l}+1)\), but restricted to the set \(\{\zeta _{l}:u_{t}<(d(z_{t-1}))^{-1},t=2,\ldots ,n\}\). The term \(d(z_{t-1})\) can be expressed as
\(d(z_{t-1})=\zeta _{l} w_{1t}+w_{0t}\), for any \(l=1,\ldots ,L-1\), where
\[
w_{1t}=\sum _{s=l+1}^{L-1}c_{t,s}(1-\zeta _{s})\prod _{r=1,r\ne l}^{s-1}\zeta _{r}+c_{t,L}\prod _{r=1,r\ne l}^{L-1}\zeta _{r}-c_{t,l}\prod _{r=1}^{l-1}\zeta _{r},
\]
and, if \(l=1\), \(w_{0t}=c_{t,1}\); otherwise \(w_{0t}=c_{t,1}(1-\zeta _{1})+\sum _{s=2}^{l-1}c_{t,s}(1-\zeta _{s})\prod _{r=1}^{s-1}\zeta _{r}+c_{t,l}\prod _{s=1}^{l-1}\zeta _{s}\). Then, the set \(\{\zeta _{l}:d(z_{t-1})<u_{t}^{-1}\}\) is \(\{\zeta _{l}:\zeta _{l}w_{1t}<u_{t}^{-1}-w_{0t}\}.\) This takes the form of \(\{\zeta _{l}:\zeta _{l}<(u_{t}w_{1t})^{-1}-w_{0t}(w_{1t})^{-1}\}\) when \(w_{1t}\) is positive, and has the form \(\{\zeta _{l}:\zeta _{l}>(u_{t}w_{1t})^{-1}-w_{0t}(w_{1t})^{-1}\}\) otherwise. Therefore, the truncated beta random draw for \(\zeta _l\) must lie in the interval \((\max _{\{t:w_{1t}<0\}}[(u_{t}w_{1t})^{-1}-w_{0t}(w_{1t})^{-1}],\min _{\{t:w_{1t}>0\}}[(u_{t}w_{1t})^{-1}-w_{0t}(w_{1t})^{-1}])\). The inverse CDF random variate generation method can be used to sample from these truncated beta random variables. This strategy results in direct draws for the \(\zeta _l\).
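The interval computation and the inverse-CDF draw from the truncated beta can be sketched as follows (illustrative only, our naming; we use scipy.stats for the beta CDF and quantile function):

```python
import numpy as np
from scipy.stats import beta as beta_dist

def slice_interval(u, w1, w0):
    """Intersection over t of {zeta : zeta * w1_t < 1/u_t - w0_t} with (0, 1)."""
    bound = (1.0 / u - w0) / w1
    lo = max(0.0, np.max(bound[w1 < 0], initial=0.0))  # lower bounds: w1_t < 0
    hi = min(1.0, np.min(bound[w1 > 0], initial=1.0))  # upper bounds: w1_t > 0
    return lo, hi

def truncated_beta(a, b, lo, hi, rng):
    """Inverse-CDF draw from beta(a, b) restricted to (lo, hi)."""
    f_lo, f_hi = beta_dist.cdf([lo, hi], a, b)
    return float(beta_dist.ppf(rng.uniform(f_lo, f_hi), a, b))
```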
Appendix 2: Computing posterior predictive ordinates
We describe here an approach to computing the one-step-ahead posterior predictive ordinates, \(p(z_{t} \mid \varvec{z}_{(t-1)})\), where \(\varvec{z}_{(m)}=(z_{2},\ldots ,z_{m})\), for \(m=2,\ldots ,n\), is the observed series up to time \(m\). The objective is to compute \(p(z_{t} \mid \varvec{z}_{(t-1)})\) at any desired time points \(t\), using the samples from the posterior distribution given the full data vector \(\varvec{z}_{(n)}\).
Denote by \(\varvec{\Theta }=(\{ \varvec{\eta }_{l}: l=1,\ldots ,L \},\varvec{p},\alpha ,\varvec{\psi })\) all model parameters, excluding the latent configuration variables. We abbreviate \(f(z_t \,{\mid }\, z_{t-1},G)\) in (3) to \(f(z_t \mid z_{t-1})\), but note that, given the \(\varvec{\eta }_{l}\) and \(\varvec{p}\), the mixture model for the transition density can be computed at any values \(z_{t}\) and \(z_{t-1}\). Let \(B_{(m)}\) be the normalizing constant of the posterior distribution for \(\varvec{\Theta }\) given \(\varvec{z}_{(m)}\), and \(p(\varvec{\Theta })=\left\{ \prod _{l=1}^{L} G_0(\varvec{\eta }_l \mid \varvec{\psi })\right\} f(\varvec{p}\,{\mid }\, \alpha ) p(\alpha ) p(\varvec{\psi })\) be the prior for \(\varvec{\Theta }\). Then,
\[
p(\varvec{\Theta } \mid \varvec{z}_{(n-1)}) = \frac{p(\varvec{\Theta }) \prod _{t=2}^{n-1} f(z_{t} \mid z_{t-1})}{B_{(n-1)}} = \frac{B_{(n)}\, p(\varvec{\Theta } \mid \varvec{z}_{(n)})}{B_{(n-1)}\, f(z_{n} \mid z_{n-1})},
\]
and therefore \(p(z_{n} \mid \varvec{z}_{(n-1)}) =\) \(\int f(z_{n} \mid z_{n-1}) p(\varvec{\Theta } \mid \varvec{z}_{(n-1)}) \, \text {d}\varvec{\Theta }=\) \(B_{(n)}/B_{(n-1)}\). In addition, \(\int \{ f(z_{n} \mid z_{n-1}) \}^{-1} p(\varvec{\Theta } \mid \varvec{z}_{(n)}) \, \text {d}\varvec{\Theta }=\) \(B_{(n-1)}/B_{(n)}\), and thus
\[
p(z_{n} \mid \varvec{z}_{(n-1)}) = \left[ \int \{ f(z_{n} \mid z_{n-1}) \}^{-1}\, p(\varvec{\Theta } \mid \varvec{z}_{(n)}) \, \text {d}\varvec{\Theta } \right] ^{-1}. \qquad (9)
\]
Similarly, \(p(\varvec{\Theta } \mid \varvec{z}_{(n-2)})=\) \(\{ B_{(n)} p(\varvec{\Theta } \mid \varvec{z}_{(n)}) \}/\{ B_{(n-2)} f(z_{n} \mid z_{n-1}) f(z_{n-1} \mid z_{n-2}) \}\). Hence, \(p(z_{n-1} \mid \varvec{z}_{(n-2)})=\) \(\int f(z_{n-1} \mid z_{n-2}) p(\varvec{\Theta } \mid \varvec{z}_{(n-2)}) \, \text {d}\varvec{\Theta }=\) \(\frac{ B_{(n)} }{ B_{(n-2)} } \int \{ f(z_{n} \mid z_{n-1}) \}^{-1} p(\varvec{\Theta } \mid \varvec{z}_{(n)}) \, \text {d}\varvec{\Theta }\). Then, observing that \(\int \{ f(z_{n} \mid z_{n-1}) f(z_{n-1} \mid z_{n-2}) \}^{-1} p(\varvec{\Theta } \mid \varvec{z}_{(n)}) \, \text {d}\varvec{\Theta }=\) \(B_{(n-2)}/B_{(n)}\), we obtain an expression for \(p(z_{n-1} \mid \varvec{z}_{(n-2)})\) that involves the ratio of the two integrals above. Extending the derivation for \(p(z_{n-1} \mid \varvec{z}_{(n-2)})\), we obtain
\[
p(z_{t} \mid \varvec{z}_{(t-1)}) = \frac{\int \left\{ \prod _{s=t+1}^{n} f(z_{s} \mid z_{s-1})\right\} ^{-1} p(\varvec{\Theta } \mid \varvec{z}_{(n)}) \, \text {d}\varvec{\Theta }}{\int \left\{ \prod _{s=t}^{n} f(z_{s} \mid z_{s-1})\right\} ^{-1} p(\varvec{\Theta } \mid \varvec{z}_{(n)}) \, \text {d}\varvec{\Theta }},
\]
for any \(t=3,\ldots ,n-1\), with the expression for \(t=n\) given in (9). These expressions allow us to estimate any posterior predictive ordinate \(p(z_{t} \mid \varvec{z}_{(t-1)})\), using Monte Carlo integration based on the samples from \(p(\varvec{\Theta } \mid \varvec{z}_{(n)})\).
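As a sketch of the resulting Monte Carlo estimator (our array layout, not the authors' code: logf[s, j] holds \(\log f(z_t \mid z_{t-1})\), for \(t = j+3\), evaluated at the \(s\)-th draw from \(p(\varvec{\Theta } \mid \varvec{z}_{(n)})\)), the averaged inverse products can be computed stably with reverse cumulative sums and log-sum-exp:

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """log of the mean of exp(a), computed stably."""
    amax = np.max(a, axis=axis)
    return amax + np.log(np.mean(np.exp(a - amax), axis=axis))

def log_predictive_ordinates(logf):
    """Estimate log p(z_t | z_(t-1)), t = 3, ..., n, from S posterior draws.

    logf is an (S, n-2) array; column j holds log f(z_t | z_{t-1}), t = j + 3.
    Returns the log predictive ordinates as ratios of posterior averages of
    inverse products of transition densities.
    """
    S, T = logf.shape
    # C[:, j] = sum_{t'=t}^{n} log f(z_{t'} | z_{t'-1}); C[:, T] = 0
    C = np.hstack([np.cumsum(logf[:, ::-1], axis=1)[:, ::-1],
                   np.zeros((S, 1))])
    num = log_mean_exp(-C[:, 1:], axis=0)   # averages over posterior draws
    den = log_mean_exp(-C[:, :-1], axis=0)
    return num - den
```

With a single posterior draw the estimator reduces, as it should, to the plug-in transition density itself.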
DeYoreo, M., Kottas, A. A Bayesian nonparametric Markovian model for non-stationary time series. Stat Comput 27, 1525–1538 (2017). https://doi.org/10.1007/s11222-016-9702-x