
An Adaptive Sparse Grid Approach for Time Series Prediction

  • Conference paper
Sparse Grids and Applications

Part of the book series: Lecture Notes in Computational Science and Engineering (LNCSE, volume 88)

Abstract

A real-valued, deterministic and stationary time series can be embedded in a (sometimes high-dimensional) real vector space. This leads to a one-to-one relationship between the embedded, time-dependent vectors in \({\mathbb{R}}^{d}\) and the states of the underlying, unknown dynamical system that determines the time series. The embedded data points are located on an m-dimensional manifold (or even fractal) called the attractor of the time series. Takens’ theorem then states that an upper bound for the embedding dimension d is given by d ≤ 2m + 1. Together with an estimate of the manifold dimension m, the task of predicting future values thus becomes a scattered data regression problem in d dimensions. In contrast to most common regression algorithms, such as support vector machines (SVMs) or neural networks, which follow a data-based approach, we employ in this paper a sparse grid-based discretization technique. This allows us to efficiently handle huge amounts of training data in moderate dimensions. Extensions of the basic method lead to space- and dimension-adaptive sparse grid algorithms. They become useful if the attractor is located only in a small part of the embedding space or if its dimension was chosen too large. We discuss the basic features of our sparse grid prediction method and give the results of numerical experiments for time series with both synthetic and real-life data.
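The delay embedding behind this setup can be illustrated with a minimal sketch. This is not the paper's implementation; the function `delay_embed`, the toy sine series, and the choices d = 3, τ = 1 are illustrative assumptions showing how the prediction task turns into a regression problem on embedded vectors:

```python
import numpy as np

def delay_embed(series, d, tau=1):
    """Map a scalar series to delay vectors
    x_t = (s_t, s_{t-tau}, ..., s_{t-(d-1)tau}) in R^d."""
    n = len(series) - (d - 1) * tau
    # column i (from the left) holds the lag-i values, newest first
    return np.column_stack(
        [series[i * tau : i * tau + n] for i in range(d - 1, -1, -1)]
    )

# Toy series; in the paper the values come from an unknown dynamical system.
s = np.sin(0.1 * np.arange(200))
d = 3
X = delay_embed(s[:-1], d)   # embedded points x_t in R^d
y = s[d:]                    # targets s_{t+1}: prediction as scattered data regression
```

Any regressor fitted on the pairs (X, y) then predicts the next series value from the current delay vector.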


Notes

  1.

    Here, “generically” means the following: If \({X}_{l} := \left \{\bf{x} \in {M}_{0}\mid {\phi }^{l}\left (\bf{x}\right ) = \bf{x}\right \}\) fulfills \(\left \vert {X}_{l}\right \vert < \infty \) for all l ≤ 2m and if the Jacobian matrix \({\left (\text{ D}{\phi }^{l}\right )}_{\bf{x}}\) of \({\phi }^{l}\) at \(\bf{x}\) has pairwise distinct eigenvalues for all \(l \leq 2m\), \(\bf{x} \in {X}_{l}\), then the set of all \(o \in {C}^{2}\left ({M}_{0}, \mathbb{R}\right )\) for which the embedding property of Theorem 1 does not hold is a null set. As \({C}^{2}\left ({M}_{0}, \mathbb{R}\right )\) is an infinite-dimensional vector space, the term “null set” may not be straightforward. It should be understood in the sense that every set \(Y \supset \left \{o \in {C}^{2}\left ({M}_{0}, \mathbb{R}\right )\mid {\rho }_{\left (\phi ,o\right )}\text{ is an embedding }\right \}\) is prevalent.

  2.

    All functions on the right hand side of (3) are at least twice differentiable. As M 0 is compact, the concatenation of these functions lies in the standard Sobolev space \({H}_{2}({\rho }_{\left (\phi ,o\right )}({M}_{0}))\), where \({\rho }_{\left (\phi ,o\right )}({M}_{0}) \subset {\mathbb{R}}^{2m+1}\) denotes the image of M 0 under \({\rho }_{\left (\phi ,o\right )}\).

  3.

    An alternative would be to simulate a time series with 15-min gaps by omitting intermediate values, which would lead to a considerable reduction of the number of points. This is, however, not advantageous, as more points usually lead to better prediction results for the numerical algorithm.

  4.

    Here, “generically” means the following: If \(\tilde{{X}}_{l} := \left \{\bf{x} \in A\mid {\phi }^{l}\left (\bf{x}\right ) = \bf{x}\right \}\) fulfills \(\widehat{\dim }\left (\tilde{{X}}_{l}\right ) \leq \frac{l}{2}\) for all \(l \leq \lfloor 2m + 1\rfloor \) and if \({\left (\text{ D}{\phi }^{l}\right )}_{\bf{x}}\) has pairwise distinct eigenvalues for all \(l \leq \lfloor 2m + 1\rfloor \), \(\bf{x} \in \tilde{{X}}_{l}\), then the set of all \(o \in {C}^{2}\left ({M}_{0}, \mathbb{R}\right )\) for which the properties in Theorem 2 do not hold is a null set.

  5.

    Other cost functions can be used as well, but they might lead to non-quadratic or even non-convex minimization problems.

  6.

    If this is not the case, we can choose a linearly independent subsystem and continue analogously.

  7.

    See [28] for several reproducing kernels and their corresponding Hilbert spaces.

  8.

    Note that the use of the combination technique [16] even allows here for a slight improvement to \(O\left (N \cdot {t}^{d-1}\right )\). In both cases, however, the constant in the O-notation grows exponentially with d.
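    The point counts behind these complexity statements can be checked directly. A minimal sketch, under the standard assumption that the hierarchical subspace with multilevel index l contributes \(\prod_{j} 2^{l_j - 1}\) interior points; the function names are ours, not the paper's:

```python
from itertools import product

def sparse_grid_size(n, d):
    """Interior points of a regular sparse grid of level n in d dimensions:
    sum over multi-indices l with |l|_1 <= n + d - 1 of prod_j 2^(l_j - 1)."""
    total = 0
    for l in product(range(1, n + 1), repeat=d):
        if sum(l) <= n + d - 1:
            size = 1
            for lj in l:
                size *= 2 ** (lj - 1)
            total += size
    return total

def full_grid_size(n, d):
    """Interior points of the corresponding full tensor-product grid."""
    return (2 ** n - 1) ** d

# The sparse grid grows like O(2^n * n^(d-1)); the full grid like O(2^(n*d)).
```

    In one dimension both counts coincide; the gap opens exponentially as d grows, which is exactly why the constant in the O-notation matters.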

  9.

    Note here that it is not enough to check the surplus of points which have been inserted in the last iteration. The hierarchical surplus of all other points can change as well when calculating the solution on the refined grid.

  10.

    Note that \({W}_{\bf{l}}\) and \(\tilde{{W}}_{\bf{l}}\) are the same for a multilevel index \(\bf{l}\) with \(l_{j} \geq 1\) for all \(j = 1, \ldots, d\).

  11.

    For the one-dimensional case, one simply defines \(x_{0,1}\) to be the single child node of \(x_{-1,0}\). The generalization to the multi-dimensional case is straightforward.

  12.

    To this end, the system matrix from (17) is first transformed into the prewavelet basis, see e.g. [4]; then the inverse of its diagonal is taken as the preconditioner.

  13.

    One can easily see that \(\tilde{{X}}_{l}\) is finite for l = 1, 2, 3. Nevertheless, there exist points \(\bf{x} \in {\mathbb{R}}^{2}\) for which \({\left (\text{ D}{\phi }^{l}\right )}_{\bf{x}}\) has eigenvalues with algebraic multiplicity 2 for l = 2, 3.
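    The eigenvalue condition in this note can be probed numerically for the Hénon map [18]. A sketch with the classical parameters a = 1.4, b = 0.3; the chain-rule Jacobian and the fixed-point computation are standard, and all function names are our own illustrations:

```python
import numpy as np

A, B = 1.4, 0.3   # classical Henon parameters

def henon(p):
    x, y = p
    return np.array([1.0 - A * x * x + y, B * x])

def jacobian(p):
    x, _ = p
    return np.array([[-2.0 * A * x, 1.0], [B, 0.0]])

def jacobian_iterate(p, l):
    """D(phi^l) at p via the chain rule along the orbit."""
    J = np.eye(2)
    for _ in range(l):
        J = jacobian(p) @ J
        p = henon(p)
    return J

# Fixed points of phi: x solves A x^2 + (1 - B) x - 1 = 0, with y = B x,
# so the set X_1 of period-1 points is finite (two points).
fixed = [np.array([x, B * x]) for x in np.roots([A, 1.0 - B, -1.0])]

# At both fixed points D(phi) has two distinct real eigenvalues.
eigs = [np.linalg.eigvals(jacobian_iterate(p, 1)) for p in fixed]
```

    The same `jacobian_iterate` can be evaluated at arbitrary points \(\bf{x} \in {\mathbb{R}}^{2}\) for l = 2, 3 to search for the multiplicity-2 eigenvalues mentioned above.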

  14.

    Since we restricted ourselves to d ≤ 3 in this experiment, we did not apply the dimension-adaptive algorithm to this problem.

  15.

    Further information concerning the setting and the dataset can be found at http://www.neural-forecasting-competition.com/NN3/index.htm.

References

  1. H.-J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, 13:147–269, 2004.

  2. M. Casdagli, T. Sauer, and J. Yorke. Embedology. Journal of Statistical Physics, 65:576–616, 1991.

  3. C. Chang and C. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

  4. C. Feuersänger. Sparse Grid Methods for Higher Dimensional Approximation. PhD thesis, Institute for Numerical Simulation, University of Bonn, 2010.

  5. C. Feuersänger and M. Griebel. Principal manifold learning by sparse grids. Computing, 85(4), 2009.

  6. J. Garcke. Maschinelles Lernen durch Funktionsrekonstruktion mit verallgemeinerten dünnen Gittern. PhD thesis, Institute for Numerical Simulation, University of Bonn, 2004.

  7. J. Garcke. A dimension adaptive combination technique using localised adaptation criteria. In H. Bock, X. Hoang, R. Rannacher, and J. Schlöder, editors, Modeling, Simulation and Optimization of Complex Processes, pages 115–125. Springer, 2012.

  8. J. Garcke, T. Gerstner, and M. Griebel. Intraday foreign exchange rate forecasting using sparse grids. In J. Garcke and M. Griebel, editors, Sparse Grids and Applications, pages 81–105. Springer, 2012.

  9. J. Garcke and M. Hegland. Fitting multidimensional data using gradient penalties and the sparse grid combination technique. Computing, 84(1–2):1–25, 2009.

  10. T. Gerstner and M. Griebel. Dimension-adaptive tensor-product quadrature. Computing, 71(1):65–87, 2003.

  11. G. Golub and W. Kahan. Calculating the singular values and pseudo-inverse of a matrix. Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis, 2(2):205–224, 1965.

  12. P. Grassberger and I. Procaccia. Measuring the strangeness of strange attractors. Physica, D9:189–208, 1983.

  13. M. Griebel. Adaptive sparse grid multilevel methods for elliptic PDEs based on finite differences. Computing, 61(2):151–179, 1998.

  14. M. Griebel and M. Hegland. A finite element method for density estimation with Gaussian priors. SIAM Journal on Numerical Analysis, 47(6), 2010.

  15. M. Griebel and P. Oswald. Tensor product type subspace splitting and multilevel iterative methods for anisotropic problems. Adv. Comput. Math., 4:171–206, 1995.

  16. M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems. In P. de Groen and R. Beauwens, editors, Iterative Methods in Linear Algebra, pages 263–281. IMACS, Elsevier, North Holland, 1992.

  17. M. Hegland. Adaptive sparse grids. ANZIAM J., 44:C335–C353, 2003.

  18. M. Hénon. A two-dimensional mapping with a strange attractor. Communications in Mathematical Physics, 50:69–77, 1976.

  19. J. Huke. Embedding nonlinear dynamical systems: A guide to Takens’ theorem, 2006. Manchester Institute for Mathematical Sciences EPrint: 2006.26.

  20. J. Imai and K. Tan. Minimizing effective dimension using linear transformation. In Monte Carlo and Quasi-Monte Carlo Methods 2002, pages 275–292. Springer, 2004.

  21. H. Kantz and T. Schreiber. Nonlinear Time Series Analysis. Cambridge University Press, 2nd edition, 2004.

  22. A. Krueger. Implementation of a fast box-counting algorithm. Computer Physics Communications, 98:224–234, 1996.

  23. L. Liebovitch and T. Toth. A fast algorithm to determine fractal dimensions by box counting. Physics Letters A, 141(8–9):386–390, 1989.

  24. B. Schölkopf and A. Smola. Learning with Kernels – Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press, Cambridge, Massachusetts, 2002.

  25. F. Takens. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, volume 898 of Lecture Notes in Mathematics, pages 366–381, 1981.

  26. J. Theiler. Efficient algorithm for estimating the correlation dimension from a set of discrete points. Physical Review A, 36(9):4456–4462, 1987.

  27. A. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Math. Dokl., 4:1035–1038, 1963.

  28. G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM: Society for Industrial and Applied Mathematics, 1990.


Author information

Correspondence to Bastian Bohn.


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bohn, B., Griebel, M. (2012). An Adaptive Sparse Grid Approach for Time Series Prediction. In: Garcke, J., Griebel, M. (eds) Sparse Grids and Applications. Lecture Notes in Computational Science and Engineering, vol 88. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31703-3_1
