SNP is a method of multivariate nonparametric time series analysis. SNP is an abbreviation of ‘semi-nonparametric’ which was introduced by Gallant and Nychka (1987) to suggest the notion of a statistical inference methodology that lies halfway between parametric and nonparametric inference. The method employs an expansion in Hermite functions to approximate the conditional density of a multivariate process.

The leading term of this expansion can be chosen, through selection of model parameters, to be a Gaussian vector autoregression (VAR) model, a semiparametric VAR model, a Gaussian ARCH model (Engle 1982), a semiparametric ARCH model, a Gaussian GARCH model (Bollerslev 1986), or a semiparametric GARCH model, either univariate or multivariate in each case. The unrestricted SNP expansion is more general than any of these models. The SNP model is fitted by maximum likelihood together with a model selection strategy that determines the appropriate order of expansion. Because the SNP model possesses a score, it is an ideal candidate for the auxiliary model in connection with efficient method of moments estimation (Gallant and Tauchen 1996). Due to its leading term, the SNP approach does not suffer from the curse of dimensionality to the same extent as kernels and splines. In regions where data are sparse, the leading term helps to fill in smoothly between data points. Where data are plentiful, the higher-order terms accommodate deviations from the leading term. The method was first proposed by Gallant and Tauchen (1989) in connection with an asset pricing application. A C++ implementation of SNP, together with a User’s Guide that is an excellent tutorial introduction to the method, is available at http://econ.duke.edu/webfiles/arg/snp/.

Important adjuncts to SNP estimation are a rejection method for simulating from the SNP density developed in Gallant and Tauchen (1992), which can be used, for example, to set bootstrapped confidence intervals as in Gallant et al. (1992); nonlinear error shock analysis as described in Gallant et al. (1993), which develops the nonlinear analog of conventional error shock analysis for linear VAR models; and re-projection, which is a form of nonlinear Kalman filtering that can be used to forecast the unobservables of nonlinear latent variables models (Gallant and Tauchen 1998).

As stated above, the SNP method is based on the notion that a Hermite expansion can be used as a general purpose approximation to a density function. Letting z denote an M–vector, we can write the Hermite density as \( h(z)\propto {\left[P(z)\right]}^2\phi (z) \), where P(z) denotes a multivariate polynomial of degree Kz and φ(z) denotes the density function of the (multivariate) Gaussian distribution with mean zero and the identity matrix as its variance. Denote the coefficients of P(z) by a, which is a vector whose length depends on Kz and M. When we wish to call attention to the coefficients, we write P(z|a).

The constant of proportionality is \( 1/\int {\left[P(s)\right]}^2\phi (s)\, ds \), which makes h(z) integrate to one. As seen from the expression that results, namely

$$ h(z)=\frac{{\left[P(z)\right]}^2\phi (z)}{\int {\left[P(s)\right]}^2\phi (s)\, ds}, $$

we are effectively expanding the square root of the density in Hermite functions of the form \( P(z)\sqrt{\phi (z)} \). Because the square root of a density is always square integrable, and because Hermite functions of this form are dense in the collection of square integrable functions (Fenton and Gallant 1996), every density has such an expansion. Because \( {\left[P(z)\right]}^2/\int {\left[P(s)\right]}^2\phi (s)\, ds \) is a homogeneous function of the coefficients of the polynomial P(z), the coefficients can only be determined to within a scalar multiple. To achieve a unique representation, the constant term of the polynomial part is put to one. Customarily the Hermite density is written with its terms orthogonalized, and the C++ code is written in the orthogonalized form for numerical efficiency, but reflecting that here would lead to cluttered notation and add nothing to the ideas.
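For concreteness, the following is a minimal univariate (M = 1) sketch of the Hermite density above, with the normalizing constant computed by numerical integration. The function name snp_density and the coefficient values are ours, chosen only for illustration; they are not part of the SNP distribution.

```python
import numpy as np
from numpy.polynomial.polynomial import polyval
from scipy.integrate import quad
from scipy.stats import norm

def snp_density(z, a):
    """Univariate SNP density h(z) = [P(z)]^2 phi(z) / int [P(s)]^2 phi(s) ds.

    a -- coefficients (a_0, ..., a_Kz) of P(z); a_0 = 1 pins down the
         scale, since the coefficients are identified only up to a
         scalar multiple (see text).
    """
    const = quad(lambda s: polyval(s, a) ** 2 * norm.pdf(s), -np.inf, np.inf)[0]
    return polyval(z, a) ** 2 * norm.pdf(z) / const

# Kz = 2 with illustrative coefficients (1, 0.3, -0.2): a skewed tilt of the normal.
grid = np.linspace(-4.0, 4.0, 801)
h = snp_density(grid, [1.0, 0.3, -0.2])
print(h.sum() * (grid[1] - grid[0]))  # approximately 1: h integrates to one
```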

A change of variables using the location-scale transformation y = Rz + μ, where R is an upper triangular matrix and μ is an M-vector, gives

$$ f\left(y|\theta \right)\propto {\left\{P\left[{R}^{-1}\left(y-\mu \right)\right]\right\}}^2\phi \left[{R}^{-1}\left(y-\mu \right)\right]/\left|\det (R)\right|. $$

The constant of proportionality is the same as above, \( 1/\int {\left[P(s)\right]}^2\phi (s)\, ds \). Because \( \phi \left[{R}^{-1}\left(y-\mu \right)\right]/\left|\det (R)\right| \) is the density function of the M-dimensional, multivariate, Gaussian distribution with mean μ and variance-covariance matrix Σ = RR′, and because the leading term of the polynomial part is one, the leading term of the entire expansion is proportional to the multivariate Gaussian density function. Denote the Gaussian density of dimension M with mean vector μ and variance matrix Σ by nM(y|μ, Σ) and write

$$ f\left(y|\theta \right)\propto {\left[P(z)\right]}^2{n}_M\left(y|\mu, \Sigma \right), $$

where \( z={R}^{-1}\left(y-\mu \right) \) for the density above.

When Kz is put to zero, one gets f(y|θ) = nM(y|μ, Σ) exactly. When Kz is positive, one gets a Gaussian density whose shape is modified by multiplication by a polynomial in \( z={R}^{-1}\left(y-\mu \right) \). The shape modifications thus achieved are rich enough to approximate accurately densities from a large class that includes densities with fat, t-like tails, densities with tails thinner than the Gaussian, and skewed densities (Gallant and Nychka 1987).
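Continuing the univariate sketch above, the location-scale step amounts to one line, and setting Kz = 0, that is a = (1), recovers the Gaussian leading term exactly; snp_location_scale is again an illustrative name of ours.

```python
def snp_location_scale(y, a, mu, r):
    """Location-scale SNP density for M = 1: with y = r*z + mu,
    f(y|theta) = h((y - mu)/r) / |r|."""
    return snp_density((y - mu) / r, a) / abs(r)

# Kz = 0 gives the Gaussian density n(y | mu, r^2) exactly:
print(snp_location_scale(0.5, [1.0], mu=0.1, r=2.0))
print(norm.pdf(0.5, loc=0.1, scale=2.0))  # identical value
```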

The parameters θ of f(y|θ) are made up of the coefficients a of the polynomial P(z) plus μ and R, and are estimated by maximum likelihood, which is accomplished by minimizing \( {s}_n\left(\theta \right)=\left(-1/n\right){\sum}_{t=1}^n\log \left[f\left({y}_t|\theta \right)\right] \). As mentioned above, if the number of parameters \( {p}_{\theta } \) grows with the sample size n, the true density and various features of it such as derivatives and moments are estimated consistently (Gallant and Nychka 1987).
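A sketch of the estimation step, under the same univariate setup: sn(θ) is coded directly and handed to a general-purpose optimizer. The parameter packing and the simulated data are assumptions made for illustration; the actual SNP distribution handles optimization and model selection itself.

```python
from scipy.optimize import minimize

def s_n(theta, data):
    """s_n(theta) = -(1/n) sum_t log f(y_t | theta) for the univariate
    location-scale SNP density; theta packs (a_1, ..., a_Kz, mu, r),
    with a_0 fixed at 1."""
    *a_tail, mu, r = theta
    a = np.concatenate(([1.0], a_tail))
    f = snp_location_scale(np.asarray(data), a, mu, r)
    return -np.mean(np.log(f))

rng = np.random.default_rng(0)
data = rng.standard_t(df=5, size=500)   # stand-in for observed data

# Kz = 2: two free polynomial coefficients plus mu and r.
fit = minimize(s_n, x0=[0.0, 0.0, 0.0, 1.0], args=(data,), method="Nelder-Mead")
print(fit.x)                            # fitted (a_1, a_2, mu, r)
```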

This basic approach can be adapted to the estimation of the conditional density of a multiple time series {yt} that has a Markovian structure. Here, the term ‘Markovian structure’ is taken to mean that the conditional density of the M–vector yt given the entire past yt−1, yt−2, ... depends only on L lags from the past. For convenience, we will presume that the data are from a process with a Markovian structure, but one should be aware that, if L is sufficiently large, then non-Markovian data can be well approximated by an SNP density (Gallant and Long 1997). Collect these lags together as xt−1 = (yt−1, yt−2, ..., yt−L), where L exceeds all lags in the following discussion.

To approximate the conditional density of {yt} using the ideas above, begin with a sequence of innovations {zt}. First consider the case of homogeneous innovations; that is, the distribution of zt does not depend on xt−1. Then, as above, the density of zt can be approximated by \( h(z)\propto {\left[P(z)\right]}^2\phi (z) \), where P(z) is a polynomial of degree Kz. Follow with the location-scale transformation yt = Rzt + μx, where μx is a linear function that depends on Lu lags:

$$ {\mu}_x={b}_0+{Bx}_{t-1}. $$

(If Lu < L, then some elements of B are zero.) The density that results is

$$ f\left(y|x,\theta \right)\propto {\left[P(z)\right]}^2{n}_M\left(y|{\mu}_x,\Sigma \right), $$

where \( z={R}^{-1}\left(y-{\mu}_x\right) \). The constant of proportionality is as above, \( 1/\int {\left[P(s)\right]}^2\phi (s)\, ds \). The leading term of the expansion is nM(y|μx, Σ), which is a Gaussian vector autoregression, or Gaussian VAR. When Kz is put to zero, one gets nM(y|μx, Σ) exactly. When Kz is positive, one gets a semiparametric VAR density.
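The conditional-mean function of the VAR leading term is simple enough to state as code. The sketch below, with hypothetical dimensions, just stacks the lags into xt−1 and applies μx = b0 + Bxt−1.

```python
import numpy as np

def var_mean(y_lags, b0, B):
    """mu_x = b0 + B x_{t-1}, where x_{t-1} stacks the lagged M-vectors
    y_{t-1}, ..., y_{t-L} (most recent first).
    y_lags -- shape (L, M); b0 -- shape (M,); B -- shape (M, M*L)."""
    x = y_lags.reshape(-1)       # x_{t-1} as an (M*L,)-vector
    return b0 + B @ x

# Hypothetical M = 2, L = 2:
rng = np.random.default_rng(1)
y_lags = rng.standard_normal((2, 2))
mu_x = var_mean(y_lags, b0=np.zeros(2), B=0.1 * rng.standard_normal((2, 4)))
print(mu_x)                      # the conditional mean mu_x
```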

To approximate conditionally heterogeneous processes, proceed as above but let each coefficient of the polynomial P(z) be a polynomial of degree Kx in x. A polynomial in z of degree Kz whose coefficients are polynomials of degree Kx in x is, of course, a polynomial in (z, x) of degree Kz + Kx. Denote this polynomial by P(z, x). Denote the mapping from x to the coefficients a of P(z) such that \( P\left(z|{a}_x\right)=P\left(z,x\right) \) by \( {a}_x \), and denote the number of lags on which it depends by Lp. The form of the density with this modification is

$$ f\left(y|x,\theta \right)\propto {\left[P\left(z,x\right)\right]}^2{n}_M\left(y|{\mu}_x,\Sigma \right), $$

where \( z={R}^{-1}\left(y-{\mu}_x\right) \). The constant of proportionality is \( 1/\int {\left[P\left(s,x\right)\right]}^2\phi (s)\, ds \). When Kx is zero, the density reverts to the density above. When Kx is positive, the shape of the density will depend upon x. Thus, all moments can depend upon x and the density can, in principle, approximate any form of conditional heterogeneity (Gallant and Tauchen 1989).
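In code, the move from P(z) to P(z, x) is just a change in how the coefficient vector is produced: ax is computed from x before the density is evaluated. The sketch below does this for M = 1 with a scalar x; the array A and the function names are illustrative assumptions of ours, not the SNP code’s conventions.

```python
def coeff_map(x, A):
    """a_x: each coefficient of P(z) is a degree-Kx polynomial in x,
    a_j(x) = sum_k A[j, k] * x**k.  Fixing A[0, :] = (1, 0, ..., 0)
    keeps the constant term of the polynomial part equal to one."""
    return polyval(x, A.T)   # evaluate each coefficient's polynomial at x

def snp_conditional(y, x, A, mu_x, r):
    """f(y|x, theta): the homogeneous density with x-dependent coefficients."""
    return snp_density((y - mu_x) / r, coeff_map(x, A)) / abs(r)

# Kz = 2, Kx = 1: the shape of the density now varies with x.
A = np.array([[1.0, 0.0],      # a_0(x) = 1
              [0.2, 0.1],      # a_1(x) = 0.2 + 0.1 x
              [-0.1, 0.05]])   # a_2(x) = -0.1 + 0.05 x
print(snp_conditional(0.3, x=-1.0, A=A, mu_x=0.0, r=1.0))
print(snp_conditional(0.3, x=+1.0, A=A, mu_x=0.0, r=1.0))  # differs: heterogeneity
```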

In practice the second moment can exhibit marked dependence upon x, and in an attempt to track the second moment, Kx can get quite large. To keep Kx small when data are markedly conditionally heteroskedastic, the leading term nM(y|μx, Σ) of the expansion can be put to a Gaussian GARCH rather than a Gaussian VAR. SNP uses a modified BEKK expression, as described in Engle and Kroner (1995); the modifications are to add leverage and level effects:

$$ {\displaystyle \begin{array}{ll}{\Sigma}_{x_{t-1}}=& {R}_0{R}_0^{\prime }+\sum \limits_{i=1}^{L_g}{Q}_i{\Sigma}_{x_{t-1-i}}{Q}_i^{\prime}\hfill \\ {}& +\sum \limits_{i=1}^{L_r}{P}_i\left({y}_{t-i}-{\mu}_{x_{t-1-i}}\right){\left({y}_{t-i}-{\mu}_{x_{t-1-i}}\right)}^{\prime }{P}_i^{\prime}\hfill \\ {}& +\sum \limits_{i=1}^{L_v}\max \left[0,{V}_i\left({y}_{t-i}-{\mu}_{x_{t-1-i}}\right)\right]\max {\left[0,{V}_i\left({y}_{t-i}-{\mu}_{x_{t-1-i}}\right)\right]}^{\prime}\hfill \\ {}& +\sum \limits_{i=1}^{L_w}{W}_i{x}_{\left(1\right),t-i}{x}_{\left(1\right),t-i}^{\prime }{W}_i^{\prime }.\hfill \end{array}} $$

Above, R0 is an upper triangular matrix. The matrices Pi, Qi, Vi, and Wi can be scalar, diagonal, or full M by M matrices. The notation x(1),t−i indicates that only the first column of xt−i enters the computation. The max(0, x) function is applied elementwise. Because \( {\Sigma}_{x_{t-1}} \) must be differentiable with respect to the parameters of \( {\mu}_{x_{t-1-i}} \), the max(0, x) function is approximated by a twice continuously differentiable cubic spline. Defining \( {R}_{x_{t-1}} \) by the factorization \( {\Sigma}_{x_{t-1}}={R}_{x_{t-1}}{R}_{x_{t-1}}^{\prime } \) and writing x for xt−1, the SNP density becomes

$$ f\left(y|x,\theta \right)\propto {\left[P\left(z,x\right)\right]}^2{n}_M\left(y|{\mu}_x,{\Sigma}_x\right), $$

where \( z={R}_x^{-1}\left(y-{\mu}_x\right) \). The constant of proportionality is \( 1/\int {\left[P\left(s,x\right)\right]}^2\phi (s)\, ds \). The leading term nM(y|μx, Σx) is Gaussian ARCH if Lg = 0 and Lr > 0, and Gaussian GARCH if both Lg > 0 and Lr > 0 (leaving aside the implications of Lv and Lw).
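A sketch of one step of the variance recursion, under hypothetical shapes and names: it follows the displayed formula term by term, except that the plain elementwise max(0, ·) is used where the SNP code substitutes a twice continuously differentiable cubic spline.

```python
import numpy as np

def bekk_step(R0, Qs, Ps, Vs, Ws, Sigma_lags, resid_lags, x1_lags):
    """One step of the modified BEKK recursion for Sigma_{x_{t-1}}.
    R0 -- upper triangular (M, M); Qs, Ps, Vs, Ws -- lists of (M, M)
    matrices of lengths Lg, Lr, Lv, Lw; Sigma_lags -- past Sigma's,
    most recent first; resid_lags -- past residuals y_{t-i} - mu_{x_{t-1-i}};
    x1_lags -- past x_(1),t-i vectors (first column of x only)."""
    Sigma = R0 @ R0.T
    for Q, S in zip(Qs, Sigma_lags):            # GARCH terms
        Sigma = Sigma + Q @ S @ Q.T
    for Pmat, e in zip(Ps, resid_lags):         # ARCH terms
        Sigma = Sigma + np.outer(Pmat @ e, Pmat @ e)
    for V, e in zip(Vs, resid_lags):            # leverage terms
        u = np.maximum(0.0, V @ e)              # spline-smoothed in SNP itself
        Sigma = Sigma + np.outer(u, u)
    for W, x1 in zip(Ws, x1_lags):              # level terms
        Sigma = Sigma + np.outer(W @ x1, W @ x1)
    return Sigma
```

Factoring the result as \( {\Sigma}_{x_{t-1}}={R}_{x_{t-1}}{R}_{x_{t-1}}^{\prime } \), for example by a Cholesky decomposition, then yields the \( {R}_{x_{t-1}} \) needed to form z.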

See Also