Kernel methods for the approximation of discrete-time linear autonomous and control systems

Hamzi, Boumediene; Colonius, Fritz

doi:10.1007/s42452-019-0701-3

Research Article
Open access
Published: 07 June 2019

Volume 1, article number 674, (2019)
SN Applied Sciences

Abstract

Methods from learning theory are used in the state space of linear dynamical and control systems in order to estimate relevant matrices and some relevant quantities such as the topological entropy. An application to stabilization via algebraic Riccati equations is included by viewing a control system as an autonomous system in an extended space of states and control inputs. Kernel methods are the main techniques used in this paper and the approach is illustrated via a series of numerical examples. The advantage of kernel methods is that they make it possible to approximate functions from data and, as illustrated in this paper, to approximate linear discrete-time autonomous and control systems from data.

1 Introduction

This paper discusses several problems in dynamical systems and control, where methods from learning theory are used in the state space of linear systems. This is in contrast to previous approaches in the frequency domain [10, 11]. We refer to [11] for a general survey on applications of machine learning to system identification where similar problems have been treated using different techniques.

Basically, learning theory makes it possible to deal with problems where only data from a given system are available. Reproducing kernel Hilbert spaces (RKHS) make it possible to work in a very high-dimensional space in order to simplify the underlying problem. We will discuss this in the simple case when the matrix A describing a linear discrete-time system is unknown, but a time series from the underlying linear dynamical system is given. We propose a method to estimate the underlying matrix using kernel methods. Applications are given in the stable and unstable case and for estimating the eigenvalues and topological entropy for a linear map. Furthermore, in the control case, estimation of the relevant matrices for a linear control system is done by viewing a linear control system as a dynamical system in the extended space of states and control inputs. Stabilization via linear-quadratic optimal control is discussed.

The emphasis of the present paper is on formulating a number of problems in dynamical systems and control and on illustrating the applicability of our approach via a series of numerical examples. This paper should be viewed as a preliminary step to extend these results to nonlinear discrete-time systems within the spirit of [3, 4] where the authors showed that RKHSs act as “linearizing spaces” and that this approach offers tools for a data-based theory for nonlinear (continuous-time) dynamical systems. The approach used in these papers is based on embedding a nonlinear system in a high (or infinite) dimensional reproducing kernel Hilbert space (RKHS) where linear theory is applied. To illustrate this approach, consider a polynomial in $\mathbb {R}$, $p(x)=\alpha + \beta x +\gamma x^2$ where $\alpha , \beta , \gamma$ are real numbers. If we consider the map $\phi {:}\, \mathbb {R}\rightarrow \mathbb {R}^3$ defined as $\phi (x)=[1 \; x \; x^2]^{T}$ then, with the coefficient vector $\boldsymbol{\alpha }=[\alpha \; \beta \; \gamma ]$, $p(x) = \boldsymbol{\alpha } \cdot [1 \; x \; x^2]^{T}= \boldsymbol{\alpha } \cdot \phi (x)$ is linear in the variable $\phi (x)$. Similarly, consider the nonlinear discrete-time system $x(k+1)=x(k)+x^2(k)$. By rewriting it as $x(k+1)=[1 \; 1] \left[ \begin{array}{c}x(k) \\ x(k)^2 \end{array} \right]$, the nonlinear system becomes linear in the variable $[x(k) \; x(k)^2]$.
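The lifting idea in the preceding paragraph can be checked numerically. The following is a minimal sketch; the coefficient values and the helper names `phi`, `dot`, `step_nonlinear` and `step_lifted` are illustrative assumptions and do not come from the paper.

```python
def phi(x):
    """Feature map phi: R -> R^3 with phi(x) = [1, x, x^2]."""
    return [1.0, x, x * x]


def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


# Illustrative coefficients (alpha, beta, gamma); the values are assumptions.
coeffs = [2.0, -1.0, 0.5]


def p(x):
    """The polynomial p(x) = alpha + beta*x + gamma*x^2, evaluated directly."""
    alpha, beta, gamma = coeffs
    return alpha + beta * x + gamma * x * x


def step_nonlinear(x):
    """One step of the nonlinear system x(k+1) = x(k) + x(k)^2."""
    return x + x * x


def step_lifted(x):
    """The same step written as [1 1] . [x, x^2]^T, linear in the lifted variable."""
    return dot([1.0, 1.0], [x, x * x])


x = 0.3
print(p(x), dot(coeffs, phi(x)))          # direct and lifted evaluation agree
print(step_nonlinear(x), step_lifted(x))  # direct and lifted update agree
```

In both cases the nonlinearity is moved into the feature map, and the remaining dependence on the lifted variable is a plain inner product.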

The contents are as follows: In Sect. 2, the problem is stated formally and an algorithm based on kernel methods is given for the stable case. In Sect. 3, the algorithm is extended to the unstable case. In particular, the topological entropy of linear maps is computed (which boils down to computing unstable eigenvalues). In Sect. 4, identification of linear control systems is considered and Sect. 5 discusses their stabilization. Every section contains several numerically computed examples (via MATLAB) illustrating the approach. Section 6 draws some conclusions from the numerical experiments. For the reader’s convenience we have collected in the appendix basic concepts from learning theory as well as some hints to the relevant literature. A preliminary version of this article appeared in Hamzi and Colonius [9].

2 Statement of the problem

Consider the linear discrete-time system

$$\begin{aligned} x(k+1)=Ax(k), \end{aligned}$$

(1)

where $A=[a_{i,j}]\in {\mathbb {R}}^{n\times n}$. We want to estimate A from the time series $x(1)+\eta _{1}$, $\ldots$, $x(N)+\eta _{N}$ where the initial condition x(0) is known and $\eta _{i}$ are distributed according to a probability measure $\rho _{x}$ that satisfies the following condition (this is the Special Assumption in [18]).

Assumption The measure $\rho _{x}$ is the marginal on $X={\mathbb {R}} ^{n}$ of a Borel measure $\rho$ on $X\times {\mathbb {R}}$ with zero mean supported on $[-\,M_{x},M_{x}],M_{x}>0$.

One obtains from (1) for the components of the time series that

$$\begin{aligned} x_{i}(k+1)=\sum _{j=1}^{n}a_{ij}x_{j}(k). \end{aligned}$$

(2)

For every i we want to estimate the coefficients $a_{ij},j=1,\ldots ,n$. They are determined by the linear maps $f^{*}_{i}{:}\,{\mathbb {R}}^{n} \rightarrow {\mathbb {R}}$ given by

$$\begin{aligned} (x_{1},\ldots ,x_{n})\mapsto \sum _{j=1}^{n}a_{ij}x_{j}. \end{aligned}$$

(3)

This problem can be reformulated as a learning problem as described in the “Appendix” where $f^{*}_{i}$ in (3) plays the role of the unknown function (74) and $(x(k),x_{i}(k+1)+\eta _{i})$ are the samples in (76).

We note that in [18], the authors do not consider time series and that we apply their results to time series.

In order to approximate $f^{*}_{i}$, we minimize the criterion in (79). For a positive definite kernel K, let $f_{i}$ be the kernel expansion of $f^{*}_{i}$ in the corresponding RKHS ${\mathcal {H}}_{K}$. Then $f_{i}=\sum _{j=1}^{\infty }c_{i,j}\phi _{j}$ with certain coefficients $c_{ij}\in {\mathbb {R}}$ and

$$\begin{aligned} \Vert f_{i}\Vert _{{\mathcal {H}}_{K}}^{2}=\displaystyle \sum _{j=1}^{\infty }\frac{c_{i,j}^{2} }{\lambda _{j}}, \end{aligned}$$

(4)

where $(\lambda _{j},\phi _{j})$ are the eigenvalues and eigenfunctions of the integral operator $L_{K}{:}\,{\mathcal {L}}_{\nu }^{2}({\mathcal {X}})\rightarrow {\mathcal {C}}({\mathcal {X}})$ given by $(L_{K}f)(x)=\int K(x,t)f(t)d\nu (t)$ with a Borel measure $\nu$ on ${\mathcal {X}}$. Thus $L_{K}\phi _{j}=\lambda _{j}\phi _{j}$ for $j\in {\mathbb {N}}^{*}$ and the eigenvalues $\lambda _{j}\ge 0$.

Then we consider the problem of minimizing over $(c_{i,1},\ldots ,c_{i,N})$ the functional

$$\begin{aligned} {\mathcal {E}}_{i}=\frac{1}{N}\sum _{k=1}^{N}(y_{i}(k)-f_{i} (x(k)))^{2}+\gamma _{i}\Vert f_{i}\Vert _{{\mathcal {H}}_{K}}^{2}, \end{aligned}$$

(5)

where $y_{i}(k):=x_{i}(k+1)+\eta _{i}=f^{*}_{i}(x(k))+\eta _{i}$ and $\gamma _{i}$ is a regularization parameter.

Since we are dealing with a linear problem, it is natural to choose the linear kernel $k(x,y)=\langle x,y\rangle$. Then the solution of the above optimization problem is given by the kernel expansion of $x_{i}(k+1)$, $i=1,\ldots ,n$,

$$\begin{aligned} y_{i}(k):=x_{i}(k+1)=\sum _{j=1}^{N}c_{ij}\langle x(j),x(k)\rangle , \end{aligned}$$

(6)

where the $c_{ij}$ satisfy the following set of equations:

$$\begin{aligned} \left[ \begin{array} [c]{c} x_{i}(1)\\ \vdots \\ x_{i}(N) \end{array} \right] =\Bigg (N\gamma _{i} I_{d}+{\mathbb {K}}\Bigg )\left[ \begin{array} [c]{c} c_{i1}\\ \vdots \\ c_{iN} \end{array} \right] , \end{aligned}$$

(7)

with

$$\begin{aligned} {\mathbb {K}}:=\left[ \begin{array} [c]{ccc} \sum _{\ell =1}^{n}x_{\ell }(1)x_{\ell }(0) &{}\quad \cdots &{}\quad \sum _{\ell =1}^{n}x_{\ell }(N)x_{\ell }(0)\\ \vdots &{}\quad \cdots &{}\quad \vdots \\ \sum _{\ell =1}^{n}x_{\ell }(1)x_{\ell }(N-1) &{}\quad \cdots &{}\quad \sum _{\ell =1}^{n}x_{\ell }(N)x_{\ell }(N-1) \end{array} \right] . \end{aligned}$$

(8)

This is a consequence of Theorem 2.

From (6), we have

$$\begin{aligned} x_{i}(k+1)&=\sum _{j=1}^{N}c_{ij}\langle x(j),x(k)\rangle =\sum _{j=1} ^{N}c_{ij}x(j)^{T}\cdot x(k)\\&=\sum _{j=1}^{N}\sum _{\ell =1}^{n}c_{ij}x_{\ell }(j)x_{\ell }(k)\\&=\sum _{\ell =1}^{n}\sum _{j=1}^{N}c_{ij}x_{\ell }(j)x_{\ell }(k). \end{aligned}$$

Comparing the coefficient of $x_{\ell }(k)$ with (2), an estimate of the entries of A is given by

$$\begin{aligned} \hat{a}_{i\ell }=\sum _{j=1}^{N}c_{i,j}x_{\ell }(j). \end{aligned}$$

(9)

This discussion leads us to the following basic algorithm.

Algorithm ${\mathcal {A}}$ If the eigenvalues of A are all inside the unit circle, one proceeds as follows in order to estimate A. Given the time series $x(1),\ldots ,x(N)$, solve the system of equations (7) to find the numbers $c_{ij}$ and then compute $\hat{a}_{i\ell }$ from (9).
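Algorithm ${\mathcal {A}}$ can be sketched in a few lines. The code below is a hedged illustration, not the authors' MATLAB implementation: it assumes the samples are the pairs $(x(k),x_{i}(k+1))$ for $k=0,\ldots ,N-1$, builds the Gram matrix of the inputs with the linear kernel, and uses a small plain-Python Gaussian elimination (`solve`) so that no external library is needed. The test matrix, the initial condition and the function names are assumptions chosen for illustration.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def simulate(A, x0, N):
    """Return the list x(0), ..., x(N) generated by x(k+1) = A x(k)."""
    xs = [list(x0)]
    for _ in range(N):
        x = xs[-1]
        xs.append([sum(a * v for a, v in zip(row, x)) for row in A])
    return xs


def estimate_A(xs, gamma):
    """Estimate A row by row via (N*gamma*I + K) c_i = y_i and a_hat = sum_j c_ij x(j)."""
    inputs, targets = xs[:-1], xs[1:]          # samples (x(k), x(k+1))
    N, n = len(inputs), len(inputs[0])
    # Gram matrix of the linear kernel k(x, y) = <x, y> on the inputs
    K = [[sum(p * q for p, q in zip(u, v)) for v in inputs] for u in inputs]
    M = [[K[j][k] + (N * gamma if j == k else 0.0) for k in range(N)]
         for j in range(N)]
    A_hat = []
    for i in range(n):
        y = [t[i] for t in targets]            # i-th component of x(k+1)
        c = solve(M, y)
        A_hat.append([sum(c[j] * inputs[j][ell] for j in range(N))
                      for ell in range(n)])
    return A_hat


# Illustrative stable test matrix (an assumption, not one of the paper's examples)
A = [[0.5, 1.0], [0.0, -0.8]]
xs = simulate(A, [1.0, 1.0], 20)
A_hat = estimate_A(xs, gamma=1e-6)
for row in A_hat:
    print(["%.4f" % v for v in row])
```

For noise-free data and a small regularization parameter, the rows of `A_hat` should be close to those of `A`; in practice a linear algebra library would replace the hand-written solver.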

Before we present numerical examples and modifications and applications of this algorithm, it is worthwhile to note the following preliminary remarks indicating what may be expected.

The stability assumption in algorithm ${\mathcal {A}}$ is imposed, since otherwise the time series will diverge exponentially. Then, already for a moderately sized number of data points ($N\approx 10^{2}$) Eq. (7) will be ill conditioned. Hence for unstable A, modifications of algorithm ${\mathcal {A}}$ are required.

While for test examples one can compare the entries of the matrix A and its approximation $\hat{A}$, it may appear more realistic to compare the values $x(1),\ldots ,x(N)$ of the data series and the values $\hat{x}(1),\ldots ,\hat{x}(N)$ generated by the iteration of the matrix $\hat{A}$.

In general, one should not expect that increasing the number of data points will lead to better approximations of the matrix A. If the matrix A is diagonalizable, for generic initial points $x(0)\in {\mathbb {R}}^{n}$ the data points x(N) will approach, for $N\rightarrow \infty$, the eigenspace for the eigenvalue with maximal modulus. For general A and generic initial points $x(0)\in {\mathbb {R}}^{n}$, the data points x(N) will approach, for $N\rightarrow \infty$, the largest Lyapunov space (i.e., the sum of the real generalized eigenspaces for eigenvalues with maximal modulus). Thus in the limit for $N\rightarrow \infty$, only part of the matrix can be approximated. A detailed discussion of this (well known) limit behavior is, e.g., given in Colonius and Kliemann [6]. A consequence is that a medium length of the time series should be adequate.

This problem can be overcome by choosing the regularization parameter $\gamma$ in (5) and (7) using the method of cross validation described in [12]. Briefly, in order to choose $\gamma$, we consider a set of values of regularization parameters: we run the learning algorithm over a subset of the samples for each value of the regularization parameter and choose the one that performs the best on the remaining data set. Cross validation helps also in the presence of noise and to improve the results beyond the training set.
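The selection of $\gamma$ described above can be sketched for the scalar case. This is a minimal illustration under several assumptions that are not from the paper: a hold-out split into even- and odd-indexed samples, a grid of powers of two, a refit on all samples with the selected value, and the helper names `ridge_scalar` and `holdout_gamma`. For scalar data and the linear kernel, the kernel estimate $\sum _{j}c_{j}x(j)$ reduces to the closed form used in `ridge_scalar`.

```python
import random


def ridge_scalar(xs, ys, gamma):
    """Regularized estimate of alpha in y = alpha * x for scalar samples.

    Closed form of the linear-kernel solution in one dimension:
    sum(x*y) / (N*gamma + sum(x*x)).
    """
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (n * gamma + sum(x * x for x in xs))


def holdout_gamma(xs, ys, grid):
    """Train on even-indexed samples, validate on odd-indexed ones, return (gamma, alpha_hat)."""
    tr_x, tr_y = xs[0::2], ys[0::2]
    va_x, va_y = xs[1::2], ys[1::2]

    def val_error(gamma):
        a = ridge_scalar(tr_x, tr_y, gamma)
        return sum((y - a * x) ** 2 for x, y in zip(va_x, va_y))

    best = min(grid, key=val_error)
    # Refit on all samples with the selected gamma (a design choice of this sketch)
    return best, ridge_scalar(xs, ys, best)


# Data from x(k+1) = 0.5 x(k), x(0) = -0.5
alpha, N = 0.5, 100
traj = [-0.5]
for _ in range(N):
    traj.append(alpha * traj[-1])
xs, ys = traj[:-1], traj[1:]

grid = [2.0 ** k for k in range(-30, 21)]  # assumed grid of candidate values

best_gamma, alpha_hat = holdout_gamma(xs, ys, grid)
print(best_gamma, alpha_hat)

# Same data with a bounded perturbation on the targets (fixed seed for repeatability)
rng = random.Random(0)
noisy_ys = [y + rng.uniform(-0.1, 0.1) for y in ys]
noisy_gamma, noisy_alpha_hat = holdout_gamma(xs, noisy_ys, grid)
print(noisy_gamma, noisy_alpha_hat)
```

Other splitting schemes (several folds, contiguous blocks) fit the same pattern; only `holdout_gamma` would change.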

A theoretical justification of our algorithm is provided by the error estimates in Theorem 5. In fact, for the linear dynamical system (1), we have that $f^{*}$ in (74) is the linear map $f^{*}(x)=f^{*}_{i}(x)$ in (3) and the samples ${\mathbf {s}}$ in (76) are $(x(k),x_{i}(k+1)+\eta _{i})$. Moreover, by choosing the linear kernel $k(x,y)=\langle x,y\rangle$ we get that $f^{*}\in {\mathcal {H}}_{K}$. In this case, (84) has the form

$$\begin{aligned} \Vert \hat{x}_{i}(k+1)-x_{i}(k+1)\Vert ^{2}\le 2C_{\bar{x}}{\mathcal {E}}_{\text{ samp }} +2\Vert x(k+1)\Vert _{K}^{2}(\gamma +8C_{\bar{x}}\varDelta ), \end{aligned}$$

(10)

where $\Vert x_{i}(k+1)\Vert _{{\mathcal {H}}_{K}}^{2}=\sum _{j=1}^{\infty }\frac{c_{i,j}^{2} }{\lambda _{j}}$. See [3] for more details about error estimates in the general nonlinear case.

The first term on the right-hand side of inequality (10) represents the error due to the noise (sampling error) and the second term represents the error due to regularization (regularization error) and the finite number of samples (integration error).

Next, we discuss several numerical examples, beginning with the following scalar equation.

Example 1

Consider $x(k+1)=\alpha x(k)$ with $\alpha =0.5$. With the initial condition $x(0)=-\,0.5$, we generate the time series $x(1),\ldots ,x(100)$. Applying algorithm ${\mathcal {A}}$ with the regularization parameter $\gamma =10^{-6}$ we compute $\hat{\alpha }=0.4997$. Using cross validation, we get that $\hat{\alpha }=0.5$ with regularization parameter $\gamma =1.5259\,\times \,10^{-5}$. When we introduce an i.i.d perturbation signal $\eta _{i}\in [-\,0.1,0.1]$, the algorithm does not behave well when we fix the regularization parameter. With cross validation, the algorithm works quite well and the regularization parameter adapts to the realization of the signal $\eta _{i}$. Here, for $e(k)=x(k)-\hat{x}(k)$ with $x(k+1)=\alpha x(k)$ and $\hat{x}(k+1)=\hat{\alpha }\hat{x}(k)$, we get that $\Vert e(300)\Vert =\sqrt{\sum _{i=1}^{300}e^{2}(i)}=0.0914$ and $\sqrt{\sum _{i=100}^{300}e^{2} (i)}=1.8218\,\times \,10^{-30}$.

We observe an analogous behavior of the algorithm when the data are generated from $x(k+1)=\alpha x(k)+\varepsilon x(k)^{2}$ where the algorithm works well in the presence of noise and structural perturbations when using cross validation. When $\varepsilon =0.1$ and with an i.i.d perturbation signal $\eta _{i}\in [-\,0.1,0.1]$, $\hat{\alpha }$ varies between 0.38 and 0.58 depending on the realization of $\eta _{i}$ but $\Vert e(300)\Vert =\sqrt{\sum _{i=1}^{300}e^{2}(i)}=0.2290$ and $\sqrt{\sum _{i=100}^{300}e^{2} (i)}=2.8098\,\times \,10^{-30}$ which shows that the error e decreases exponentially and the generalization properties of the algorithm are quite good.

Example 2

Consider $x(k+1)=Ax(k)$ with matrix A given by

$$\begin{aligned} A:=\left[ \begin{array} [c]{cccc} -\,0.5 &{}\quad 1 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 0.6 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 0.7 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -\,0.8 \end{array} \right] . \end{aligned}$$

(11)

For the initial condition $x=[-\,0.9,0.1,15,0.2]^{\prime }$ and with $N=100$ data points, we get

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} -\,0.5000 &{}\quad 1.0000 &{}\quad 0.0000 &{}\quad -\,0.0000\\ 0.0000 &{}\quad 0.6000 &{}\quad 1.0000 &{}\quad 0.0000\\ 0.0000 &{}\quad -\,0.0000 &{}\quad 0.7000 &{}\quad 0.9994\\ -\,0.0000 &{}\quad 0.0000 &{}\quad -\,0.0000 &{}\quad -\,0.7995 \end{array} \right] . \end{aligned}$$

(12)

We then simulate $x(k+1)=Ax(k)$ and $\hat{x}(k+1)=\hat{A}\hat{x}(k)$ for $k=0,\ldots ,200$ to test the accuracy of our approximation beyond the interval $k=0,\ldots ,100$. Then the norm of the error $e_{j}(k)=x_{j}(k)-\hat{x} _{j}(k)$, for $j=1,\ldots ,4$, $\Vert e_{j}(300)\Vert =\sqrt{\sum _{i=1}^{300}e_{j} ^{2}(i)}$ is of the order of $10^{-3}$ and $\sqrt{\sum _{i=100}^{300}e_{j} ^{2}(i)}$ is of the order of $10^{-11}$ which shows that the error e decreases exponentially and the generalization properties of the algorithm are quite good. The regularization parameters are $\gamma _{i}=0.9313\,\times \,10^{-9}$ for $i=1,\ldots ,4$.

Also in the presence of small noise $\eta _{i}\in [-\,0.01,0.01]$, the algorithm behaves well and the regularization parameters adapt to the realization of $\eta _{i}$. For example, for a certain realization of $\eta _{i}$, we obtain the regularization parameters

$$\begin{aligned} \gamma _{1}& = 0.0039, \gamma _{2}=2.4114\,\times \,10^{-4},\nonumber \\ \gamma _{3}& = 9.3132\,\times \, 10^{-10}, \gamma _{4}=2 \,\times \,10^{-3} \end{aligned}$$

(13)

and the error $\Vert e_{j}(300)\Vert =\sqrt{\sum _{i=1}^{300}e_{j}^{2}(i)}$ is of the order of $10^{-1}$ and $\sqrt{\sum _{i=100}^{300}e_{j}^{2}(i)}$ is of the order of $10^{-9}$.

Suppose that in addition to a small noise $\eta _{i}\in [-0.01,0.01],$ there is a quadratic structural perturbation, i.e.,

$$\begin{aligned} x(k+1)=Ax(k)+\varepsilon \left[ \begin{array} [c]{c} x_{1}(k)^{2}\\ x_{2}(k)^{2}\\ x_{3}(k)^{2}\\ x_{4}(k)^{2} \end{array} \right] . \end{aligned}$$

(14)

Then with cross validation for $\varepsilon =0.001$ the algorithm behaves well. For a particular realization of $\eta$, the error $\Vert e_{j}(300)\Vert =\sqrt{\sum _{i=1}^{300}e_{j}^{2}(i)}$ is between 5 and 15 but $\sqrt{\sum _{i=100}^{300}e_{j}^{2}(i)}$ is of the order of $10^{-9}$ and the regularization parameters are

$$\begin{aligned} \gamma _{1}& = 0.5,\gamma _{2}=9.3132\,\times \,10^{-10},\gamma _{3}=9.3132\,\times \, 10^{-10},\nonumber \\ \gamma _{4}& = 9.3132\,\times \,10^{-10}. \end{aligned}$$

(15)

These examples show a very good behavior of the algorithm.
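The error norms reported in the examples above can be computed as in the following sketch. It uses the scalar system of Example 1 with the noise-free estimate $\hat{\alpha }=0.4997$ obtained there for the fixed regularization parameter; the function names are assumptions, and the printed values are not meant to reproduce the numbers quoted for the noisy runs.

```python
import math


def trajectory(alpha, x0, steps):
    """Return x(0), ..., x(steps) for the scalar system x(k+1) = alpha * x(k)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(alpha * xs[-1])
    return xs


def error_norm(xs, xhats, start, stop):
    """sqrt( sum_{k=start}^{stop} (x(k) - xhat(k))^2 )."""
    return math.sqrt(sum((xs[k] - xhats[k]) ** 2 for k in range(start, stop + 1)))


x = trajectory(0.5, -0.5, 300)        # true system
xhat = trajectory(0.4997, -0.5, 300)  # system iterated with the estimate

total = error_norm(x, xhat, 1, 300)   # error over the whole horizon
tail = error_norm(x, xhat, 100, 300)  # error beyond the training interval
print(total, tail)
```

Because both systems are stable, the tail of the error is many orders of magnitude smaller than the total error, which is the pattern reported in Examples 1 and 2.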

3 Unstable case

Consider

$$\begin{aligned} x(k+1)=Ax(k){\text { with }}A\in {\mathbb {R}}^{n\times n}, \end{aligned}$$

(16)

where some of the eigenvalues of A are outside the unit circle. Again, we want to estimate A when the following data are given,

$$\begin{aligned} x(1),x(2),\ldots ,x(N), \end{aligned}$$

(17)

which are generated by system (16), thus $x(k)=A^{k-1}x(1)$.

As remarked above, a direct application of the algorithm ${\mathcal {A}}$ will not work, since the time series diverges fast. Instead, we construct a new time series from (17) associated to an auxiliary stable system.

For a constant $\sigma >0$ we define the auxiliary system by

$$\begin{aligned} y(k+1)=\tilde{A}y(k){\text { with }}\tilde{A}:=\frac{1}{\sigma }A. \end{aligned}$$

(18)

Thus

$$\begin{aligned} y(k)=\left( \frac{A}{\sigma }\right) ^{k-1}y(1) \end{aligned}$$

(19)

and with $y(1)=x(1)$ one finds

$$\begin{aligned} y(k)=\frac{1}{\sigma ^{k-1}}A^{k-1}x(1)=\frac{1}{\sigma ^{k-1}}x(k). \end{aligned}$$

(20)

If we choose $\sigma >0$ such that the eigenvalues of $\frac{A}{\sigma }$ are in the unit circle, we can apply algorithm ${\mathcal {A}}$ to this stable matrix and hence we would obtain an estimate of $\frac{A}{\sigma }$ and hence of A. However, since the eigenvalues of the matrix A are unknown, we will be content with a somewhat weaker condition than stability of $\frac{A}{\sigma }$.

The data (17) for system (16) yield the following data for system (18):

$$\begin{aligned} y(1):=x(1),y(2):=\frac{1}{\sigma }x(2),\ldots ,y(N):=\frac{1}{\sigma ^{N-1}}x(N). \end{aligned}$$

(21)

We propose to choose $\sigma$ as follows: Define

$$\begin{aligned} \sigma :=\max \left\{ \frac{\left\| x(k+1)\right\| }{\left\| x(k)\right\| },k\in \{0,1,\ldots ,N\}\right\} . \end{aligned}$$

(22)

Clearly, the inequality $\sigma \le \left\| A\right\|$ holds. We apply algorithm ${\mathcal {A}}$ to the time series y(k). This yields an estimate of $\frac{A}{\sigma }$ and hence an estimate $\hat{A}$ of A.

For general A, this choice of $\sigma$ certainly does not guarantee that the eigenvalues of $\frac{A}{\sigma }$ are within the unit circle. However, as mentioned above, a generic data sequence $x(k),k\in {\mathbb {N}}$, will converge to the eigenspace of the eigenvalue with maximal modulus. Hence $\frac{\left\| x(k+1)\right\| }{\left\| x(k)\right\| }$ will approach the maximal modulus of an eigenvalue, thus this choice of $\sigma$ will lead to a matrix $\frac{A}{\sigma }$ which is not “too unstable”.
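The rescaling step can be sketched for a scalar unstable system. This is an illustration under stated assumptions, not the authors' code: the series is indexed from $k=0$, the maximum in (22) is taken over the ratios available in the data, and the scalar closed form of the linear-kernel estimate (`ridge_scalar`, a hypothetical helper name) replaces the solution of (7).

```python
def ridge_scalar(xs, ys, gamma):
    """Closed form of the linear-kernel estimate for scalar samples (x, y)."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (n * gamma + sum(x * x for x in xs))


alpha = 11.46          # unstable scalar system x(k+1) = alpha * x(k)
N = 100
x = [-0.5]
for _ in range(N):
    x.append(alpha * x[-1])   # x(0), ..., x(N); grows very fast

# sigma as in (22): largest growth ratio between consecutive samples
sigma = max(abs(x[k + 1]) / abs(x[k]) for k in range(N))

# Rescaled series y(k) = x(k) / sigma^k, cf. (20) and (21), indexed from 0 here
y = [x[k] / sigma ** k for k in range(N + 1)]

# Estimate alpha / sigma from the bounded series, then undo the scaling
ratio_hat = ridge_scalar(y[:-1], y[1:], gamma=1e-6)
alpha_hat = sigma * ratio_hat
print(sigma, alpha_hat)
```

In the scalar case $\sigma$ coincides with $|\alpha |$, so the rescaled series stays bounded; for matrices the same rescaling would be applied before running algorithm ${\mathcal {A}}$ on $y(k)$.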

Example 3

Consider $x(k+1)=\alpha x(k)$ with $\alpha =11.46$. With the initial condition $x(0)=-\,0.5$, we generate the time series $x(1),\ldots ,x(100)$. The algorithm above with the regularization parameter $\gamma =10^{-6}$ yields the estimate $\hat{\alpha }=11.4086$. Cross validation leads to the regularization parameter $\gamma =9.5367\,\times \,10^{-7}$ and the estimate $\hat{\alpha }=11.4599$. In the presence of a small noise $\eta \in [-\,0.1,0.1]$, cross validation yields the regularization parameter $\gamma =0.002$ and the slightly worse estimate $\hat{\alpha }=11.1319$.

We observe the same behavior in higher dimensional systems where the eigenvalues are of the same order of magnitude.

Example 4

Consider $x(k+1)=Ax(k)$ with

$$\begin{aligned} A=\left[ \begin{array} [c]{cccc} 20 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad -\,10 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 15 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -\,25 \end{array} \right] \end{aligned}$$

(23)

Using cross validation, we get that

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} 20.0000 &{}\quad 0.0000 &{}\quad 0.0001 &{}\quad 0.0000\\ -\,0.0000 &{}\quad -\,10.0000 &{}\quad 0.0000 &{}\quad -\,0.0000\\ 0.0000 &{}\quad -\,0.0000 &{}\quad 14.9998 &{}\quad 0.0000\\ -\,0.0000 &{}\quad -\,0.0000 &{}\quad -\,0.0000 &{}\quad -\,25.0003 \end{array} \right] \end{aligned}$$

(24)

for $\gamma _{i}=0.9313\,\times \,10^{-9}$, $i=1,\ldots ,4$.

For different realizations of a noise $\eta _{i}$ of magnitude $0.5\,\times \,10^{-4}$, cross validation gives a good approximation of A and the eigenvalues of $A-\hat{A}$ are all within the unit disk with amplitude of the order of $10^{-3}$ showing that the dynamics of the error $e(k)=x(k)-\hat{x}(k)$ is asymptotically stable. For example, for a particular realization of $\eta _{i}$ of magnitude $0.5\,\times \,10^{-4}$, we get

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} 19.9635 &{}\quad 0.0086 &{}\quad 0.1365 &{}\quad -\,0.0007\\ -\,0.0177 &{}\quad -\,10.0025 &{}\quad 0.0379 &{}\quad -\,0.0007\\ -\,0.0177 &{}\quad -\,0.0025 &{}\quad 15.0376 &{}\quad -\,0.0007\\ -\,0.0132 &{}\quad -\,0.0167 &{}\quad 0.0065 &{}\quad -\,25.0000 \end{array} \right] \end{aligned}$$

(25)

with regularization parameters

$$\begin{aligned} \gamma _{1}& = 1.9073\,\times \,10^{-6},\gamma _{2}=9.3132\,\times \,10^{-10},\gamma _{3}=9.3132\,\times \,10^{-10},\nonumber \\ \gamma _{4}& = 1.2207\,\times \,10^{-4}. \end{aligned}$$

(26)

The algorithm fails in the presence of quadratic structural perturbations. This is due to the choice of a linear kernel. A polynomial kernel, for example, would allow for nonlinear perturbations but this would require a complete reformulation of our algorithm. We leave the extension of our algorithm to the nonlinear case for future work.

The next example is an unstable system with a large gap between the eigenvalues.

Example 5

Consider the system $x(k+1)=Ax(k)$ with

$$\begin{aligned} A=\left[ \begin{array} [c]{cc} 20 &{}\quad 0\\ 0 &{}\quad -\,0.1 \end{array} \right] . \end{aligned}$$

(27)

With the initial condition $x(0)=[-\,1.9,1]$, we generate the time series $x(1),\ldots ,x(100)$. The algorithm above yields the (excellent) estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} 20.0000 &{}\quad 0.0000\\ -\,0.0000 &{}\quad -\,0.1000 \end{array} \right] . \end{aligned}$$

(28)

In the presence of noise of maximal amplitude $10^{-4}$, the algorithm approximates well only the large entry $a_{11}=20$. For a first realization of $\eta _{i}$ and with cross validation, we get

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} 19.9997 &{}\quad -\,0.0111\\ 0.0000 &{}\quad -\,0.1104 \end{array} \right] , \end{aligned}$$

(29)

with $\gamma _{1}=1.5259\,\times \,10^{-5}$ and $\gamma _{2}=2^{20}$. However another realization of $\eta _{i}$ leads to

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} 19.9994 &{}\quad -\,0.0011\\ 0.0000 &{}\quad -\,0.0000 \end{array} \right] , \end{aligned}$$

(30)

with $\gamma _{1}=3.0518\,\times \,10^{-5}$ and $\gamma _{2}=2.8147\,\times \,10^{14}$. This is due to the fact that the data converge to the eigenspace generated by the largest eigenvalue $\lambda =20$. However, the eigenvalues of $A-\hat{A}$ are within the unit disk with small amplitude which guarantees that the error dynamics of $e(k)=x(k)-\hat{x}(k)$ converges to the origin quite quickly. We observe the same phenomenon with

$$\begin{aligned} A=\left[ \begin{array} [c]{cc} -\,0.5 &{}\quad 0\\ 0 &{}\quad 25 \end{array} \right] . \end{aligned}$$

(31)

Here, in the absence of noise, we obtain the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} -\,0.5000 &{}\quad 0.0000\\ -\,0.0000 &{}\quad 25.0000 \end{array} \right] , \end{aligned}$$

(32)

with $\gamma _{1}=\gamma _{2}=0.9313\,\times \,10^{-9}$. In the presence of noise $\eta _{i}$ with amplitude $10^{-4}$, the data converge to the eigenspace corresponding to the largest eigenvalue $\lambda =25$: for some realization of $\eta _{i}$ one obtains the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} -\,0.4809 &{}\quad 0.0008\\ 0.0164 &{}\quad 24.9960 \end{array} \right] , \end{aligned}$$

(33)

while for another realization of $\eta$

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} -\,0.0000 &{}\quad -\,0.0000\\ -\,1.0067 &{}\quad 24.8696 \end{array} \right] . \end{aligned}$$

(34)

The regularization parameters $\gamma _{1}$ and $\gamma _{2}$ adapt to the realization of the noise.

As already remarked at the end of Sect. 2, we see that “more data” does not necessarily lead to better results, since the data sequence converges to the eigenspace generated by the largest eigenvalue. However, whether with or without noise, the approximations of A are good enough to reduce the error between $x(k+1)=Ax(k)$ and $\hat{x}(k+1)=\hat{A}\hat{x}(k)$ outside of the training examples, since cross-validation determines a good regularization parameter $\gamma$ that balances between good fitting and good prediction properties.

The next example has an eigenvalue on the unit circle.

Example 6

Consider $x(k+1)=Ax(k)$ with

$$\begin{aligned} A=\left[ \begin{array} [c]{cccc} 2.2500 &{}\quad -\,1.2500 &{}\quad 1.2500 &{}\quad -\,49.5500\\ 3.7500 &{}\quad -\,2.7500 &{}\quad 13.1500 &{}\quad -\,20.6500\\ 0 &{}\quad 0 &{}\quad 10.4000 &{}\quad -\,32.3000\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -\,21.9000 \end{array} \right] . \end{aligned}$$

(35)

The set of eigenvalues of A is $\text{ spec }(A)=\{-\,1.5000,1.0000,10.4000,-\,21.9000\}$. In the absence of noise and with the initial condition $x=[-\,0.9,15,1.5,2.5]$ and $N=100$ points, we compute the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} 2.2500 &{}\quad -\,1.2500 &{}\quad 1.2498 &{}\quad -\,49.5499\\ 3.7500 &{}\quad -\,2.7500 &{}\quad 13.1498 &{}\quad -\,20.6499\\ 0.0000 &{}\quad 0.0000 &{}\quad 10.3998 &{}\quad -\,32.2999\\ 0.0000 &{}\quad 0.0000 &{}\quad -\,0.0001 &{}\quad -\,21.8999 \end{array} \right] , \end{aligned}$$

(36)

and regularization parameters $\gamma _{1}=\gamma _{2}=0.9313\,\times \,10^{-9}$. In this case, the set of eigenvalues of $\hat{A}$ is

$$\begin{aligned} \text{ spec }(\hat{A})=\{-21.9000,10.3999,-1.5000,1.0000\}. \end{aligned}$$

(37)

For a given realization of $\eta \in [-10^{-4},10^{-4}]$, we obtain the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} 2.2551 &{}\quad -\,1.2490 &{}\quad 1.2187 &{}\quad -\,49.5304\\ 3.7554 &{}\quad -\,2.7489 &{}\quad 13.1175 &{}\quad -\,20.6297\\ 0.0055 &{}\quad 0.0011 &{}\quad 10.3669 &{}\quad -\,32.2794\\ 0.0053 &{}\quad 0.0010 &{}\quad -\,0.0325 &{}\quad -\,21.8797 \end{array} \right] \end{aligned}$$

(38)

with $\gamma _{1}=0.0745\,\times \,10^{-7}$ and $\gamma _{2}=0.1490\,\times \,10^{-7}$. The eigenvalues of $A-\hat{A}$ are of the order of $10^{-4}$ which guarantees that the error dynamics converges quickly to the origin. However, the set of eigenvalues of $\hat{A}$ is

$$\begin{aligned} \text{ spec }(\hat{A})=\{-21.8996,10.3999,-1.5026,1.0134\}. \end{aligned}$$

(39)

Hence an additional unstable eigenvalue occurs.

Example 7

Consider $x(k+1)=Ax(k)$ with

$$\begin{aligned} A=\left[ \begin{array} [c]{cccc} -\,0.8500 &{}\quad 0.4500 &{}\quad -\,0.4500 &{}\quad -\,77.8500\\ -\,1.3500 &{}\quad 0.9500 &{}\quad 14.3500 &{}\quad -\,11.6500\\ 0 &{}\quad 0 &{}\quad 15.3000 &{}\quad -\,55.3000\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -\,40.0000 \end{array} \right] . \end{aligned}$$

(40)

The eigenvalues of A are given by

$$\begin{aligned} \text{ spec }(A)=\{-\,0.4000,0.5000,15.3000,-\,40.0000\}. \end{aligned}$$

(41)

For an initial condition $x=[-\,0.9;15;1.5;2.5]$ and with $N=100$ data points, we get

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} -\,0.8498 &{}\quad 0.4501 &{}\quad -\,0.4499 &{}\quad -\,77.8504\\ -\,1.3499 &{}\quad 0.9500 &{}\quad 14.3501 &{}\quad -\,11.6502\\ 0.0001 &{}\quad 0.0001 &{}\quad 15.3001 &{}\quad -\,55.3004\\ -\,0.0004 &{}\quad -\,0.0002 &{}\quad -\,0.0004 &{}\quad -\,39.9987 \end{array} \right] \end{aligned}$$

(42)

with eigenvalues given by

$$\begin{aligned} \text{ spec }(\hat{A})=\{-\,40.0000,-\,0.3974,0.4982,15.3008\}. \end{aligned}$$

(43)

Here we used $\gamma _{i}=10^{-12}$, $i=1,\ldots ,4$. Moreover, the eigenvalues of $A-\hat{A}$ are quite small and such that the error dynamics converges quickly to the origin. In the presence of noise $\eta$, the algorithm approximates the largest eigenvalues of A but does not approximate the smaller (stable) ones. For example, for a particular realization of noise with amplitude $10^{-4}$, we get the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} -\,2.1100 &{}\quad -\,0.0993 &{}\quad -\,1.3259 &{}\quad -\,74.4543\\ -\,1.7053 &{}\quad 0.7777 &{}\quad 13.9397 &{}\quad -\,10.5308\\ -\,0.8277 &{}\quad -\,0.3692 &{}\quad 14.6466 &{}\quad -\,52.9920\\ -\,0.8283 &{}\quad -\,0.3694 &{}\quad -\,0.6539 &{}\quad -\,37.6904 \end{array} \right] \end{aligned}$$

(44)

and $\text{ spec }(\hat{A})=\{-\,40.0009,0.1620\pm 0.8438i,15.3008\}$.

For another realization of noise with amplitude $10^{-2}$, we get the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} -\,138.0893 &{}\quad -\,60.7052 &{}\quad -\,105.8111 &{}\quad 301.5029\\ -\,0.2435 &{}\quad 0.9101 &{}\quad 12.9638 &{}\quad -\,12.6745\\ -\,71.1408 &{}\quad -\,31.9557 &{}\quad -\,40.3842 &{}\quad 142.3170\\ -\,71.1408 &{}\quad -\,31.9557 &{}\quad -\,55.6843 &{}\quad 157.6172 \end{array} \right] \end{aligned}$$

(45)

and $\text{ spec }(\hat{A})=\{-\,40.1391,3.9326,0.9601,15.3002\}$.

The algorithm introduced above also allows us to compute the topological entropy of linear systems, since it is determined by the unstable eigenvalues. Recall that the topological entropy of a linear map on ${\mathbb {R}}^{n}$ is defined in the following way:

Fix a compact subset $K\subset {\mathbb {R}}^{n}$, a time $\tau \in {\mathbb {N}}$ and a constant $\varepsilon >0$. Then a set $R\subset {\mathbb {R}}^{n}$ is called $(\tau ,\varepsilon )$-spanning for K if for every $y\in K$ there is $x\in R$ with

$$\begin{aligned} \left\| A^{j}y-A^{j}x\right\| <\varepsilon {\text { for all }}j=0,\ldots ,\tau . \end{aligned}$$

(46)

By compactness of K, there are finite $(\tau ,\varepsilon )$-spanning sets. Let R be a $(\tau ,\varepsilon )$-spanning set of minimal cardinality $\#R=r_{\min }(\tau ,\varepsilon ,K)$. Then

$$\begin{aligned} h_{top}(K,A,\varepsilon ):=\lim _{\tau \rightarrow \infty }\frac{1}{\tau }\log r_{\min }(\tau ,\varepsilon ,K),\quad h_{top}(K,A):=\underset{\varepsilon \rightarrow 0^{+}}{\lim }h_{top}(K,A,\varepsilon ). \end{aligned}$$

(47)

(the limits exist). Finally, the topological entropy of A is

$$\begin{aligned} h_{top}(A):=\sup _{K}h_{top}(K,A), \end{aligned}$$

(48)

where the supremum is taken over all compact subsets K of ${\mathbb {R}}^{n}.$

A classical result due to Bowen (cf. [21, Theorem 8.14]) shows that the topological entropy is determined by the sum of the unstable eigenvalues, i.e.,

$$\begin{aligned} h_{top}(A)=\sum \max (1,\left| \lambda \right| ), \end{aligned}$$

(49)

where summation is over all eigenvalues of A counted according to their algebraic multiplicity.

Hence, when we approximate the unstable eigenvalues of A by those of the matrix $\hat{A}$, we also get an approximation of the topological entropy.

Example 8

For Example 6, we get that $h_{top}(A)=34.80$ while for the estimate $\hat{A}$ one obtains $h_{top}(\hat{A})=34.7999$. For Example 7, we get that $h_{top}(A)=55.30$ and $h_{top}(\hat{A})=55.3008$. These estimates appear reasonably good.

4 Identification of linear control systems

Consider the linear control system

$$\begin{aligned} x(k+1)=Ax(k)+Bu(k), \end{aligned}$$

(50)

with $A\in {\mathbb {R}}^{n\times n}$ and $B\in {\mathbb {R}}^{n\times 1}$. We want to estimate the matrices A and B from the time series $x(1)+\eta _{1},$ $\ldots ,x(N)+\eta _{N}$ where $\eta$ satisfies the Assumption in Sect. 2. The initial condition x(0) and the control sequence u(0), $\ldots ,u(N)$ are assumed to be known.

In order to estimate A and B, we will extend algorithm ${\mathcal {A}}$. The ith component of system (50) is given by

$$\begin{aligned} x_{i}(k+1)=\sum _{j=1}^{n}a_{ij}x_{j}(k)+b_{i}u(k). \end{aligned}$$

(51)

For every i we want to estimate the coefficients $b_{i}$ and $a_{ij},j=1,$ $\ldots ,n$. Thus the linear map $f_{i}:{\mathbb {R}}^{n+1}\rightarrow {\mathbb {R}}$ given by

$$\begin{aligned} (x_{1},\ldots ,x_{n},u)\mapsto \sum _{j=1}^{n}a_{ij}x_{j} +b_{i}u \end{aligned}$$

(52)

is unknown. To extend algorithm ${\mathcal {A}}$, we will view system (51) as a system of the form (2) where the state x is the extended state $\underline{x}=(x,u)\in {\mathbb {R}}^{n}\times {\mathbb {R}}$ for (50). Hence, the kernel expansion (6) becomes

$$\begin{aligned} {x}_{i}(k+1)=\sum _{j=1}^{N}c_{ij}\langle \underline{x} (j),\underline{x}(k)\rangle \end{aligned}$$

(53)

where $\underline{x}_{n+1}=u$ and the $c_{ij}$ satisfy the following set of equations:

$$\begin{aligned} \left[ \begin{array} [c]{c} x_{i}(1)\\ \vdots \\ x_{i}(N) \end{array} \right] =\Bigg (N\lambda I_{d}+\underline{\mathbb {K}}\Bigg )\left[ \begin{array} [c]{c} c_{i1}\\ \vdots \\ c_{iN} \end{array} \right] , \end{aligned}$$

(54)

with

$$\begin{aligned} \underline{\mathbb {K}}=\left[ \begin{array} [c]{ccc} \sum \nolimits _{\ell =1}^{n+1}\underline{x}_{\ell }(1)\underline{x}_{\ell }(0) &{} \cdots &{} \sum \nolimits _{\ell =1}^{n+1}\underline{x}_{\ell }(N)\underline{x}_{\ell }(0)\\ \vdots &{} \cdots &{} \vdots \\ \sum \nolimits _{\ell =1}^{n+1}\underline{x}_{\ell }(1)\underline{x}_{\ell }(N-1) &{} \cdots &{} \sum \nolimits _{\ell =1}^{n+1}\underline{x}_{\ell }(N)\underline{x}_{\ell }(N-1) \end{array} \right] . \end{aligned}$$

(55)

Let us emphasize that $u=\underline{x}_{n+1}$ does not appear on the left-hand side of (53)–(54).
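The extended-state regression (53)–(55) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names `identify_control_system` and `simulate` are hypothetical, a single regularization parameter `lam` is shared by all components (the examples below select one parameter per component by cross validation), and the data are indexed so that the extended states $(x(k),u(k))$, $k=0,\ldots ,N-1$, are the inputs and $x(k+1)$ are the targets. For the linear kernel, the estimated row $[a_{i1},\ldots ,a_{in},b_{i}]$ is recovered as $\sum _{j}c_{ij}\underline{x}(j)^{\top }$.

```python
import numpy as np

def identify_control_system(X, U, lam=1e-8):
    """Estimate (A, B) from states X[k] = x(k), k = 0..N, and inputs U[k] = u(k), k = 0..N-1.

    Assumptions of this sketch: linear kernel on the extended state (x, u)
    and one shared regularization parameter lam for all components.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float).reshape(len(X) - 1, -1)
    N, n = X.shape[0] - 1, X.shape[1]
    Z = np.hstack([X[:-1], U])      # extended states (x(k), u(k)), k = 0..N-1
    Y = X[1:]                       # targets x(k+1); the input u is never a target
    K = Z @ Z.T                     # Gram matrix of the linear kernel
    C = np.linalg.solve(K + N * lam * np.eye(N), Y)  # column i holds the c_ij
    W = Z.T @ C                     # f_i(z) = sum_j C[j, i] <z_j, z> = <W[:, i], z>
    return W[:n].T, W[n:].T         # A_hat (n x n), B_hat (n x m)

def simulate(A, B, x0, U):
    """Generate x(0), ..., x(N) for x(k+1) = A x(k) + B u(k)."""
    X = [np.asarray(x0, dtype=float)]
    for u in U:
        X.append(A @ X[-1] + B @ np.atleast_1d(u))
    return np.array(X)

# Noise-free data from the system (57) with the input used in the examples;
# the initial condition x(0) = (1, 1, 1) is an assumption of this sketch.
A = np.array([[-0.9, 1.0, 0.0], [0.0, -0.1, 1.0], [0.0, 0.0, 0.8]])
B = np.array([[-2.5], [-3.5], [4.5]])
k = np.arange(100)
U = np.sin(k) + np.cos(k)
X = simulate(A, B, np.ones(3), U)
A_hat, B_hat = identify_control_system(X, U)
print(np.round(A_hat, 4))
print(np.round(B_hat, 4))
```

Solving for all components at once is only a convenience; a per-component parameter would require one linear solve per column of the target matrix.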

For the case when A has eigenvalues outside the unit circle, we adopt the same method as in Sect. 3 and define

$$\begin{aligned} \underline{\sigma }:=\max \left\{ \frac{\left\| \underline{x}(k+1)\right\| }{\left\| \underline{x}(k)\right\| },k\in \{0,1,\ldots ,N\}\right\} . \end{aligned}$$

(56)
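As a small illustration, the ratio in (56) can be computed directly from the recorded data. The sketch below assumes that states and inputs are stored row-wise in arrays of equal length and uses only those ratios for which both $\underline{x}(k)$ and $\underline{x}(k+1)$ are available; the function name `growth_ratio` is hypothetical, and how $\underline{\sigma }$ is then used follows Sect. 3 and is not reproduced here.

```python
import numpy as np

def growth_ratio(X, U):
    """Largest one-step growth ratio of the extended state (x(k), u(k))."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    U = np.asarray(U, dtype=float).reshape(len(X), -1)
    Z = np.hstack([X, U])                  # rows are the extended states
    norms = np.linalg.norm(Z, axis=1)      # Euclidean norm of each extended state
    if np.any(norms[:-1] == 0.0):
        raise ValueError("an extended state with zero norm makes the ratio undefined")
    return float(np.max(norms[1:] / norms[:-1]))

# Toy data: norms 1, 2, 8 give the ratios 2 and 4.
sigma = growth_ratio([[1.0], [2.0], [8.0]], [0.0, 0.0, 0.0])
print(sigma)
```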

Example 9

(One dimensional case) Consider $x(k+1)=-\,0.9x(k)+3.5u(k)$. For an input $u(k)=\sin (k)+\cos (k)$ and for 100 points we obtain the estimate $\hat{A}=-\,0.9$ and $\hat{B}=3.5$ when there is no noise $\eta _{i}$. Here cross validation gives $\gamma _{1}=1.5259\,\times \,10^{-5}$ and $\gamma _{2}=1$. For a certain realization of the noise $\eta _{i}$ with amplitude 0.1, we get $\hat{A}=-\,0.9008$ and $\hat{B}=3.4983$. Here cross validation gives $\gamma _{1}=0.0078$ and $\gamma _{2}=1$.

Example 10

(Three dimensional stable case) Consider control system (50) with

$$\begin{aligned} A=\left[ \begin{array} [c]{ccc} -\,0.9 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad -\,0.1 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 0.8 \end{array} \right] {\text { and }}B=\left[ \begin{array} [c]{c} -\,2.5\\ -\,3.5\\ 4.5 \end{array} \right] . \end{aligned}$$

(57)

With the input $u(k)=\sin (k)+\cos (k)$ and 100 points, one computes the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,0.9000 &{}\quad 1.0000 &{}\quad 0.0000\\ 0.0000 &{}\quad -\,0.1000 &{}\quad 1.0000\\ -\,0.0000 &{}\quad -\,0.0000 &{}\quad 0.8000 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} -\,2.5000\\ -\,3.5000\\ 4.5000 \end{array} \right] . \end{aligned}$$

(58)

Here cross validation gives the regularization parameters $\gamma _{i}=0.1526\,\times \,10^{-4}$ for $i=1,\ldots ,4$. For some realization of perturbations $\eta _{i}$ with amplitude 0.1, one computes the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,0.9047 &{}\quad 0.9984 &{}\quad -\,0.0029\\ -\,0.0047 &{}\quad -\,0.1016 &{}\quad 0.9971\\ -\,0.0048 &{}\quad -\,0.0018 &{}\quad 0.7971 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} -\,2.5326\\ -\,3.5321\\ 4.4661 \end{array} \right] . \end{aligned}$$

(59)

Here cross validation gives $\gamma _{1}=9.7656\,\times \,10^{-4}$, $\gamma _{2}=9.7656\,\times \,10^{-4}$, $\gamma _{3}=1.5259\,\times \,10^{-5}$, $\gamma _{4}=4$.

Example 11

(Three dimensional unstable case) Consider control system (50) with

$$\begin{aligned} A=\left[ \begin{array} [c]{ccc} -\,20 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 1 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 20 \end{array} \right] {\text { and }}B=\left[ \begin{array} [c]{c} 1\\ 2\\ 3 \end{array} \right] . \end{aligned}$$

(60)

The input $u(k)=\sin (k)+\cos (k)$ and 100 points give the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,19.9945 &{}\quad 1.0009 &{}\quad -\,0.0137\\ 0.0013 &{}\quad 0.9995 &{}\quad 0.9919\\ 0.0155 &{}\quad -\,0.0171 &{}\quad 19.7835 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} 0.9898\\ 1.9898\\ 2.9333 \end{array} \right] . \end{aligned}$$

(61)

Here cross validation yields the regularization parameters $\gamma _{i}=0.8882\,\times \,10^{-15}$ for $i=1,\ldots ,4$. For some realization of perturbations $\eta _{i}$ with amplitude $10^{-4}$, one computes the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,20.0000 &{}\quad 0.9334 &{}\quad -\,0.0058\\ -\,0.0008 &{}\quad 0.9382 &{}\quad 0.9939\\ -\,0.0008 &{}\quad -\,0.0590 &{}\quad 19.9937 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} 0.9819\\ 1.9814\\ 2.9811 \end{array} \right] . \end{aligned}$$

(62)

Here cross validation gives $\gamma _{1}=\gamma _{2}=0.2384\,\times \,10^{-6}$, $\gamma _{3}=\gamma _{4}=0.0596\,\times \,10^{-6}$.

These results show that algorithm ${\mathcal {A}}$ works quite well in these cases.

5 Stabilization via linear-quadratic optimal control

A basic problem for linear control systems is stabilization by state feedback. A standard method is to use linear quadratic optimal control, where the feedback is computed using the solution of an algebraic Riccati equation. In this section, we propose to replace in the algebraic Riccati equation the system matrix A by the estimate $\hat{A}$ obtained by learning theory.

The linear quadratic optimal control problem has the following form:

Minimize over all (continuous) inputs u

$$\begin{aligned} J_{\infty }(x_{0};u)=\sum _{k=0}^{\infty }\left[ x(k)^{\top }Qx(k)+u(k)^{\top }Ru(k)\right] \end{aligned}$$

(63)

with $x(\cdot )$ given by

$$\begin{aligned} {x}(k+1)=Ax(k)+Bu(k),\ k\ge 0,\ x(0)=x_{0}; \end{aligned}$$

(64)

here $Q\in {\mathbb {R}}^{n\times n}$ is positive semidefinite and $R\in {\mathbb {R}}^{m\times m}$ is positive definite, and $A\in {\mathbb {R}}^{n\times n},B\in {\mathbb {R}}^{n\times m}$.

Consider the discrete algebraic Riccati equation (DARE)

$$\begin{aligned} A^{\top }\left( P-PB(R+B^{\top }PB)^{-1}B^{\top }P\right) A+Q=P. \end{aligned}$$

(65)

Obviously, every solution P is positive semi-definite. We cite the following theorem from [1].

Theorem Suppose that for every $x_{0}\in {\mathbb {R}}^{n}$ there is an input u such that $J_{\infty }(x_{0};u)<\infty$. Then the following holds:

(i)
There is a unique solution P of the DARE.
(ii)
For every $x_{0}\in {\mathbb {R}}^{n}$ one has $J^{*}(x_{0}):=\inf \{J_{\infty }(x_{0};u)\mid u \text{ an input}\}=x_{0}^{\top }Px_{0}$ and there is a unique optimal input $u^{*}$ with $J^{*}(x_{0})=J_{\infty }(x_{0};u^{*})$. This optimal input is generated by the feedback $F=(R+B^{\top }PB)^{-1}B^{\top }PA$ and

$$\begin{aligned} u(k)=-Fx(k),k\ge 0. \end{aligned}$$

(66)

In particular, the feedback F stabilizes the system, i.e., ${x} (k+1)=(A-BF)x(k)$ is stable.
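To make the construction concrete, the DARE (65) and the feedback F can be computed numerically. The following is a minimal sketch that iterates the Riccati map until it reaches a fixed point; this simple iteration is our choice for illustration and not necessarily the solver used for the examples below. The function names `solve_dare` and `lq_feedback` are hypothetical, and the weights $Q=I$ and $R=1$ are assumptions, since the weights are not stated in this section.

```python
import numpy as np

def solve_dare(A, B, Q, R, tol=1e-12, max_iter=10000):
    """Fixed-point iteration P <- A^T (P - P B (R + B^T P B)^{-1} B^T P) A + Q."""
    A, B, Q, R = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (A, B, Q, R))
    P = Q.copy()
    for _ in range(max_iter):
        G = R + B.T @ P @ B
        P_next = A.T @ (P - P @ B @ np.linalg.solve(G, B.T @ P)) @ A + Q
        if np.max(np.abs(P_next - P)) <= tol * max(1.0, np.max(np.abs(P_next))):
            return P_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")

def lq_feedback(A, B, P, R):
    """Feedback F = (R + B^T P B)^{-1} B^T P A, to be used as u(k) = -F x(k)."""
    A, B, P, R = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (A, B, P, R))
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Scalar system x(k+1) = -0.9 x(k) + 3.5 u(k) with assumed weights Q = 1, R = 1.
A1, B1, Q1, R1 = [[-0.9]], [[3.5]], [[1.0]], [[1.0]]
P1 = solve_dare(A1, B1, Q1, R1)
F1 = lq_feedback(A1, B1, P1, R1)
closed_loop = -0.9 - 3.5 * F1[0, 0]
print(P1[0, 0], F1[0, 0], closed_loop)
```

With these assumed weights, the closed-loop value for the scalar system is close to the value $-\,0.0643$ reported in Example 12, which suggests, but does not prove, that the examples use the same weights.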

Now we use estimates $\hat{A}$ and $\hat{B}$ (obtained by kernel methods) instead of A and B in the algebraic Riccati equation and obtain the solution $\hat{P}$. Will the corresponding feedback $u=-\hat{F}x$ with $\hat{F}:=(R+\hat{B}^{\top }\hat{P}\hat{B})^{-1}\hat{B}^{\top }\hat{P}\hat{A}$ also stabilize the true system, i.e., is the following system stable:

$$\begin{aligned} {x}(k+1)=(A-B\hat{F})x(k)? \end{aligned}$$

(67)

Example 12

Consider the one-dimensional system $x(k+1)=-\,0.9x(k)+3.5u(k)$ in Example 9. In the absence of noise, we get $\hat{A}=-\,0.9$ and $\hat{B}=3.5$. We have that $A-B\hat{F}=\hat{A}-\hat{B}\hat{F}=-\,0.0643$. When there is noise of amplitude 0.1, we get that $\hat{A}=-\,0.9002$ and $\hat{B}=3.4929$ and $A-B\hat{F}=-\,0.0643$ while $\hat{A}-\hat{B}\hat{F}=-\,0.0610$. Hence, the controller improves stability.

Example 13

Consider control system (50) with

$$\begin{aligned} A=\left[ \begin{array} [c]{ccc} -\,0.9 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad -\,0.1 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 0.8 \end{array} \right] {\text { and }}B=\left[ \begin{array} [c]{c} -\,2.5\\ -\,3.5\\ 4.5 \end{array} \right] . \end{aligned}$$

(68)

As illustrated in Example 10, without noise we get excellent approximations of A and B. In this case, both closed-loop matrices $\hat{A}-\hat{B}\hat{F}$ and $A-B\hat{F}$ have the set of eigenvalues $\{-\,0.6172,0.4049,-\,0.0018\}$. With noise of maximal amplitude 0.1, the estimates $\hat{A}$ and $\hat{B}$ are given in Example 10. For the feedback system one finds

$$\begin{aligned} \text{ spec }(\hat{A}-\hat{B}\hat{F})&=\{-\,0.6204,0.4053,-\,0.0018\},\\ \text{ spec }(A-{B}\hat{F})&=\{-\,0.6240,-\,0.0062,0.4111\}. \end{aligned}$$

In this example the feedback based on the estimate also stabilizes the original system.

Example 14

Consider control system (50) with

$$\begin{aligned} A=\left[ \begin{array} [c]{ccc} -\,20 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 1 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 20 \end{array} \right] {\text { and }}B=\left[ \begin{array} [c]{c} 1\\ 2\\ 3 \end{array} \right] . \end{aligned}$$

(69)

As Example 11 illustrates, without noise we get excellent approximations of A and B. For the feedback system one finds

$$\begin{aligned} \text{ spec }(\hat{A}-\hat{B}\hat{F})&=\{0.1994,0.0483,-0.0501\},\\ \text{ spec }(A-{B}\hat{F})&=\{-0.1234\pm 2.0777i,0.5279\}. \end{aligned}$$

When there is noise of amplitude $10^{-4}$, one computes the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,19.9805 &{}\quad 0.7484 &{}\quad 0.0135\\ -\,0.0062 &{}\quad 0.7969 &{}\quad 1.0107\\ -\,0.0229 &{}\quad 0.9851 &{}\quad 19.6776 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} 1.0194\\ 2.0114\\ 2.6673 \end{array} \right] . \end{aligned}$$

(70)

These are poor approximations of A and B. Furthermore, for the feedback system one finds

$$\begin{aligned} \text{ spec }(\hat{A}-\hat{B}\hat{F})&=\{0.1929,0.0477,-0.0501\},\\ \text{ spec }(A-{B}\hat{F})&=\{1.4510\pm 3.0103i,-2.5232\}. \end{aligned}$$

Thus the stabilizing controller for the approximate system does not stabilize the true system.
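Stability of a discrete-time closed loop means that all eigenvalues lie strictly inside the unit circle, so the spectra listed in Examples 13 and 14 can be checked by comparing moduli with 1. The sketch below does this for the reported values of $\text{spec}(A-B\hat{F})$; the helper names `spectral_radius` and `is_schur_stable` are hypothetical, and the numbers are copied from the examples rather than recomputed.

```python
import numpy as np

def spectral_radius(eigs):
    """Largest modulus among the given eigenvalues."""
    return float(np.max(np.abs(np.asarray(eigs, dtype=complex))))

def is_schur_stable(eigs):
    """True if all eigenvalues lie strictly inside the unit circle."""
    return spectral_radius(eigs) < 1.0

# Reported spectra of A - B F_hat (true system with feedback from the estimate).
reported = {
    "Example 13, noise 0.1": [-0.6240, -0.0062, 0.4111],
    "Example 14, no noise": [-0.1234 + 2.0777j, -0.1234 - 2.0777j, 0.5279],
    "Example 14, noise 1e-4": [1.4510 + 3.0103j, 1.4510 - 3.0103j, -2.5232],
}
for label, eigs in reported.items():
    print(label, round(spectral_radius(eigs), 4), is_schur_stable(eigs))
```

Applied to the reported values, the spectrum in Example 13 lies inside the unit circle, whereas both spectra of $A-B\hat{F}$ listed in Example 14 contain a complex pair of modulus larger than 1.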

6 Conclusions

This paper has introduced the algorithm ${\mathcal {A}}$ based on kernel methods to identify a stable linear dynamical system from a time series. The numerical experiments give excellent results in the absence of noise and structural perturbations. In the presence of noise and structural perturbations the algorithm works well in the stable case. In the unstable case, a modified algorithm works quite well in the presence of noise but cannot handle structural perturbations.

Then we have extended algorithm ${\mathcal {A}}$ to identify linear control systems. In particular, we have used estimates obtained by kernel methods to stabilize linear systems using linear-quadratic control and the algebraic Riccati equation. Here the numerical experiments seem to indicate that the same conclusions on applicability of the algorithm apply.

Extensions of the considered algorithms to nonlinear systems appear feasible and are left to future work.

Notes

1. A suggestion in [18] is to consider the $\rho _{X}$-volume of the Voronoi cell associated with $\bar{x}$. Another example is $w=1$ or if $|\bar{x}|=m<\infty$, $w=\frac{1}{m}$.
2. This assumption is true if X is compact and the inclusion map of ${\mathcal {H}}_{K,\bar{t}}$ into the space of Lipschitz functions on X is bounded, which is the case when K is a $C^{2}$ Mercer kernel [22]. In fact, if $\Vert f\Vert _{\text{ Lip }(X)}\le C_{0}\Vert f\Vert _{K}$ for each $f\in {\mathcal {H}}_{K,\bar{t}}$, then $C_{\bar{x}}\le C_{0}^{2} \rho _{X}(X)$.

References

1. Antsaklis PJ, Michel AN (2006) Linear systems. Birkhäuser, Boston
2. Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68:337–404
3. Bouvrie J, Hamzi B (2017) Kernel methods for the approximation of nonlinear systems. SIAM J Control Optim 55(4):2460–2492
4. Bouvrie J, Hamzi B (2017) Kernel methods for the approximation of some key quantities of nonlinear systems. J Comput Dyn 4(1–2):1–19
5. Cheney W, Light W (2009) A course in approximation theory. Graduate Studies in Mathematics, vol 101. American Mathematical Society, Providence
6. Colonius F, Kliemann W (2014) Dynamical systems and linear algebra. Graduate Studies in Mathematics, vol 158. American Mathematical Society, Providence
7. Cucker F, Smale S (2001) On the mathematical foundations of learning. Bull Am Math Soc 39:1–49
8. Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. Adv Comput Math 13(1):1–50
9. Hamzi B, Colonius F (2019) Kernel methods for discrete-time linear equations. Springer Lecture Notes in Computer Science (LNCS), vol 11539
10. Li L, Zhou D-X (2015) Learning theory approach to a system identification problem involving atomic norm. J Fourier Anal Appl 21:734
11. Pillonetto G, Dinuzzo F, Chen T, De Nicolao G, Ljung L (2014) Kernel methods in system identification, machine learning and function estimation: a survey. Automatica 50(3):657–682
12. Rifkin RM, Lippert A (2007) Notes on regularized least squares. Computer Science and Artificial Intelligence Laboratory technical report, MIT, MIT-CSAIL-TR-2007-025, CBCL-268
13. Schoenberg IJ (1935) Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espace distanciés vectoriellement applicable sur l’espace de Hilbert”. Ann Math 36:724–732
14. Schoenberg IJ (1937) On certain metric spaces arising from Euclidean spaces by a change of metric and their imbedding in Hilbert space. Ann Math 38:787–793
15. Schoenberg IJ (1938) Metric spaces and positive definite functions. Trans Am Math Soc 44:522–536
16. Schölkopf B, Smola AJ (2002) Learning with kernels. MIT Press, Cambridge
17. Smale S, Zhou D-X (2003) Estimating the approximation error in learning theory. Anal Appl 1(1):17
18. Smale S, Zhou D-X (2004) Shannon sampling and function reconstruction from point values. Bull Am Math Soc 41:279–305
19. Smale S, Zhou D-X (2005) Shannon sampling II: connections to learning theory. Appl Comput Harmon Anal 19(3):285–302
20. Wahba G (1990) Spline models for observational data. In: SIAM CBMS-NSF regional conference series in applied mathematics, vol 59
21. Walters P (1982) An introduction to ergodic theory. Springer, Berlin
22. Zhou D-X (2003) Capacity of reproducing kernel spaces in learning theory. IEEE Trans Inf Theory 49(7):1743–1752

Acknowledgements

BH thanks the European Commission for financial support received through Marie Curie Fellowships.

Author information

Authors and Affiliations

Department of Mathematics, Imperial College London, London, UK
Boumediene Hamzi
Institut für Mathematik, Universität Augsburg, Augsburg, Germany
Fritz Colonius

Corresponding author

Correspondence to Boumediene Hamzi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Elements of learning theory

In this section, we give a brief overview of Reproducing Kernel Hilbert Spaces (RKHS) as used in statistical learning theory. The discussion here borrows heavily from Cucker and Smale [7], Wahba [20], and Schölkopf and Smola [16]. Early work developing the theory of RKHS was undertaken by Schoenberg [13,14,15] and then Aronszajn [2]. Historically, RKHS arose from the question of when it is possible to embed a metric space into a Hilbert space.

Definition 1

Let ${\mathcal {H}}$ be a Hilbert space of functions on a set ${\mathcal {X}}$ which is a closed subset of ${\mathbb {R}}^{n}$. Denote by $\langle f, g \rangle$ the inner product on ${\mathcal {H}}$ and let $\Vert f\Vert = \langle f, f \rangle ^{1/2}$ be the norm in ${\mathcal {H}}$, for f and $g \in {\mathcal {H}}$. We say that ${\mathcal {H}}$ is a reproducing kernel Hilbert space (RKHS) if there exists $K:{\mathcal {X}} \times {\mathcal {X}} \rightarrow {\mathbb {R}}$ such that

(i)
K has the reproducing property, i.e., $f(x)=\langle f(\cdot ),K(\cdot ,x)\rangle$ for all $f\in {\mathcal {H}}$.
(ii)
K spans ${\mathcal {H}}$, i.e., ${\mathcal {H}}=\overline{\text{ span }\{K(x,\cdot )|x\in {\mathcal {X}}\}}$.

K will be called a reproducing kernel of ${\mathcal {H}}$ and ${\mathcal {H}}_{K}$ will denote the RKHS ${\mathcal {H}}$ with reproducing kernel K.

Definition 2

Given a kernel $K:{\mathcal {X}}\times {\mathcal {X}}\rightarrow {\mathbb {R}}$ and inputs $x_{1},\ldots ,x_{n}\in {\mathcal {X}}$, the $n\times n$ matrix

$$\begin{aligned} k:=(K(x_{i},x_{j}))_{ij}, \end{aligned}$$

(71)

is called the Gram matrix of K with respect to $x_{1},\ldots ,x_{n}$. If for all $n\in {\mathbb {N}}$ and distinct $x_{i}\in {\mathcal {X}}$ the kernel K gives rise to a strictly positive definite Gram matrix, it is called strictly positive definite.

Definition 3

(Mercer kernel map) A function $K:{\mathcal {X}} \times {\mathcal {X}} \rightarrow {\mathbb {R}}$ is called a Mercer kernel if it is continuous, symmetric and positive definite.

The important properties of reproducing kernels are summarized in the following proposition.

Proposition 1

If K is a reproducing kernel of a Hilbert space ${\mathcal {H}}$, then

(i)
K(x, y) is unique.
(ii)
For all $x,y\in {\mathcal {X}}$, $K(x,y)=K(y,x)$ (symmetry).
(iii)
$\sum _{i,j=1}^{m}\alpha _{i}\alpha _{j}K(x_{i},x_{j}) \ge 0$ for $\alpha _{i} \in {\mathbb {R}}$ and $x_{i} \in {\mathcal {X}}$ (positive definiteness).
(iv)
$\langle K(x,\cdot ),K(y,\cdot ) \rangle _{\mathcal {H}}=K(x,y)$.
(v)
The following kernels, defined on a compact domain ${\mathcal {X}} \subset {\mathbb {R}}^{n}$, are Mercer kernels: $K(x,y)=x\cdot y^{\top }$ (Linear), $K(x,y)=(1+x\cdot y^{\top })^{d},\quad d\in {\mathbb {N}}$ (Polynomial), $K(x,y)=e^{-\frac{\Vert x-y\Vert ^{2}}{\sigma ^{2}}},\quad \sigma >0$ (Gaussian).
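Properties (ii) and (iii) can be checked numerically on the Gram matrix of Definition 2. The sketch below builds the Gram matrices of the three kernels in (v) on a few sample points and inspects symmetry and the smallest eigenvalue; the function names and the sample points are illustrative assumptions, and a finite check of this kind is of course not a proof of positive definiteness.

```python
import numpy as np

def linear_kernel(X, Y):
    """K(x, y) = x . y for rows x of X and y of Y."""
    return X @ Y.T

def polynomial_kernel(X, Y, d=2):
    """K(x, y) = (1 + x . y)^d."""
    return (1.0 + X @ Y.T) ** d

def gaussian_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / sigma^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def gram_matrix(kernel, X, **params):
    """Gram matrix (K(x_i, x_j))_ij for the rows x_i of X."""
    return kernel(X, X, **params)

# Six sample points in the unit square (an arbitrary choice for illustration).
points = np.random.default_rng(1).uniform(0.0, 1.0, size=(6, 2))
for name, kernel in [("linear", linear_kernel),
                     ("polynomial", polynomial_kernel),
                     ("gaussian", gaussian_kernel)]:
    G = gram_matrix(kernel, points)
    print(name, np.allclose(G, G.T), np.linalg.eigvalsh(G).min())
```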

Theorem 1

Let $K:{\mathcal {X}}\times {\mathcal {X}}\rightarrow {\mathbb {R}}$ be a symmetric and positive definite function. Then there exists a Hilbert space of functions ${\mathcal {H}}$ defined on ${\mathcal {X}}$ admitting K as a reproducing kernel. Moreover, there exists a function $\varPhi :{\mathcal {X}}\rightarrow {\mathcal {H}}$ such that

$$\begin{aligned} K(x,y)=\langle \varPhi (x),\varPhi (y)\rangle _{\mathcal {H}}\quad \text{ for }\quad x,y\in {\mathcal {X}}. \end{aligned}$$

(72)

$\varPhi$ is called a feature map.

Conversely, let ${\mathcal {H}}$ be a Hilbert space of functions $f:{\mathcal {X}} \rightarrow {\mathbb {R}}$, with ${\mathcal {X}}$ compact, satisfying

$$\begin{aligned} {\text {For all }}x\in {\mathcal {X}}{\text { there is }}\kappa _{x} >0,\,\text{ such } \text{ that }\,|f(x)|\le \kappa _{x}\Vert f\Vert _{\mathcal {H}}. \end{aligned}$$

(73)

Then ${\mathcal {H}}$ has a reproducing kernel K.

Remark 1

(i)
The dimension of the RKHS can be infinite and corresponds to the dimension of the eigenspace of the integral operator $L_{K}:{\mathcal {L}}_{\nu }^{2}({\mathcal {X}})\rightarrow {\mathcal {C}}({\mathcal {X}})$ defined as $(L_{K}f)(x)=\int K(x,t)f(t)d\nu (t)$ if K is a Mercer kernel, for $f\in {\mathcal {L}}_{\nu }^{2}({\mathcal {X}})$ and $\nu$ a Borel measure on ${\mathcal {X}}$.
(ii)
In Theorem 1, and using property (iv) in Proposition 1, we can take $\varPhi (x):=K_{x}:=K(x,\cdot )$, in which case the feature space ${\mathcal {F}}$ coincides with the RKHS ${\mathcal {H}}$. This is called the canonical feature map.
(iii)
The fact that Mercer kernels are positive definite and symmetric shows that kernels can be viewed as generalized Gramians and covariance matrices.
(iv)
In practice, we choose a Mercer kernel, such as the ones in (v) of Proposition 1; Theorem 1 then guarantees the existence of a Hilbert space admitting such a function as a reproducing kernel.

RKHS play an important role in learning theory whose objective is to find an unknown function

$$\begin{aligned} f^{*}:X\rightarrow Y \end{aligned}$$

(74)

from random samples

$$\begin{aligned} {\mathbf {s}}=(x_{i},y_{i})|_{i=1}^{m}. \end{aligned}$$

(75)

In the following we review results from [18] (for a more general setting, cf. [7]) in the special case when the data samples ${\mathbf {s}}$ are such that the following assumption holds.

Assumption 1

The samples in (75) have the special form

$$\begin{aligned} {\mathcal {S: }}\quad {\mathbf {s}}=(x,y_{x})|_{x \in \bar{x}}, \end{aligned}$$

(76)

where $\bar{x}=\{x_{i}\}|_{i=1}^{d+1}$ and $y_{x}$ is drawn at random from $f^{*}(x)+\eta _{x}$, where $\eta _{x}$ is drawn from a probability measure $\rho _{x}$.

Here for each $x \in X$, $\rho _{x}$ is a probability measure with zero mean, and its variance $\sigma _{x}^{2}$ satisfies $\sigma ^{2} :=\sum _{x \in \bar{x}} \sigma _{x}^{2} < \infty$. Let X be a closed subset of ${\mathbb {R}}^{n}$ and let $\bar{t} \subset X$ be a discrete subset. Now, consider a kernel $K: X \times X \rightarrow {\mathbb {R}}$ and define a matrix (possibly infinite) $K_{\bar{t},\bar{t}} : \ell ^{2}(\bar{t}) \rightarrow \ell ^{2}(\bar{t})$ as

$$\begin{aligned} (K_{\bar{t},\bar{t}}a)_{s} = \sum _{t \in \bar{t}}K(s,t)a_{t}, \quad s \in \bar{t}, a \in \ell ^{2}(\bar{t}), \end{aligned}$$

(77)

where $\ell ^{2}(\bar{t})$ is the set of sequences $a=(a_{t})_{t \in \bar{t}}: \bar{t} \rightarrow {\mathbb {R}}$ with $\langle a,b \rangle =\sum _{t \in \bar{t} }a_{t} b_{t}$ defining an inner product. For example, we can take $X={\mathbb {R}}$ and $\bar{t}=\{0,1,\ldots ,d\}$.

In the case of a linear dynamical system (1), we are interested in learning the map $x(k)\mapsto x(k+1)$. Here we can apply the following results.

The problem of approximating a function $f^{*}\in {\mathcal {H}}_{K}$ from samples ${\mathbf {s}}$ of the form (75) has been studied in [18, 19]. It is reformulated as the minimization problem

$$\begin{aligned} \bar{f}_{{\mathbf {s}},\gamma }:=\text{ arg }{\text{ min }} _{f\in {\mathcal {H}}_{K,\bar{t}}}\bigg \{\sum _{x\in \bar{x}}(f(x)-y_{x})^{2} +\gamma \Vert f\Vert _{K}^{2}\bigg \}, \end{aligned}$$

(78)

where $\gamma \ge 0$ is a regularization parameter. Moreover, when $\bar{x}$ is not defined by a uniform grid on X, the authors of [18] introduced a weighting $w:=\{w_{x}\}_{x\in \bar{x}}$ on $\bar{x}$ with $w_{x}>0$ (see Note 1). Let $D_{w}$ be the diagonal matrix with diagonal entries $\{w_{x}\}_{x\in \bar{x}}$. Then, $\Vert D_{w}\Vert \le \Vert w\Vert _{\infty }$.

In this case, the regularization scheme (78) becomes

$$\begin{aligned} \bar{f}_{{\mathbf {s}},\gamma }:=\text{ arg }{\text{ min }} _{f\in {\mathcal {H}}_{K,\bar{t}}}\bigg \{\sum _{x\in \bar{x}}w_{x}(f(x)-y_{x} )^{2}+\gamma \Vert f\Vert _{K}^{2}\bigg \}. \end{aligned}$$

(79)

Theorem 2

Assume $f^{*}\in {\mathcal {H}}_{K,\bar{t}}$ and the standing hypotheses with X, K, $\bar{t}$, $\rho$ as above, y as in (76). Suppose $K_{\bar{t},\bar{x}}D_{w}K_{\bar{x},\bar{t}}+\gamma K_{\bar{t} ,\bar{t}}$ is invertible. Define ${\mathcal {L}}$ to be the linear operator ${\mathcal {L}}=(K_{\bar{t},\bar{x}}D_{w}K_{\bar{x},\bar{t}}+\gamma K_{\bar{t},\bar{t}})^{-1}K_{\bar{t},\bar{x}}D_{w}$. Then problem (79) has the unique solution

$$\begin{aligned} f_{{\mathbf {s}},\gamma }=\sum _{t\in \bar{t}}({\mathcal {L}}y)_{t}K_{t} \end{aligned}$$

(80)
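The closed-form solution (80) translates directly into a linear solve. The following is a minimal sketch for points on the real line, assuming a Gaussian kernel, uniform weights and noise-free samples; the names `weighted_kernel_fit` and `evaluate`, the node sets and all parameter values are illustrative choices and are not taken from [18].

```python
import numpy as np

def gaussian_kernel(S, T, sigma=0.2):
    """Matrix (K(s, t)) for 1-D point sets S and T, K(s, t) = exp(-(s - t)^2 / sigma^2)."""
    S, T = np.asarray(S, dtype=float), np.asarray(T, dtype=float)
    return np.exp(-((S[:, None] - T[None, :]) ** 2) / sigma**2)

def weighted_kernel_fit(kernel, t_nodes, x_samples, y, w, gamma):
    """Coefficients c = L y with L = (K_tx D_w K_xt + gamma K_tt)^{-1} K_tx D_w."""
    K_tx = kernel(t_nodes, x_samples)
    K_tt = kernel(t_nodes, t_nodes)
    D_w = np.diag(np.asarray(w, dtype=float))
    M = K_tx @ D_w @ K_tx.T + gamma * K_tt
    return np.linalg.solve(M, K_tx @ D_w @ np.asarray(y, dtype=float))

def evaluate(kernel, t_nodes, c, z):
    """f(z) = sum_t c_t K(t, z)."""
    return kernel(z, t_nodes) @ c

# Target f* = sum_t a_t K_t lies in H_{K, t_nodes}, so it should be recovered
# almost exactly from noise-free samples when gamma is tiny.
t_nodes = np.linspace(0.0, 1.0, 5)
x_samples = np.linspace(0.0, 1.0, 9)
a_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = evaluate(gaussian_kernel, t_nodes, a_true, x_samples)
w = np.full(len(x_samples), 1.0 / len(x_samples))
c = weighted_kernel_fit(gaussian_kernel, t_nodes, x_samples, y, w, gamma=1e-10)
print(np.round(c, 6))
```

Larger values of `gamma` shrink the coefficients and trade data fit for a smaller norm $\Vert f\Vert _{K}$, which is the balance quantified by the error estimates that follow.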

Assumption 2

For each $x \in X$, $\rho _{x}$ is a probability measure with zero mean supported on $[-\,M_{x},M_{x}]$ with ${\mathcal {B}}_{w} :=(\sum _{x \in \bar{x}}w_{x} M_{x}^{2})^{\frac{1}{2}} < \infty$.

The next theorems give estimates for the different sources of errors.

Theorem 3

(Sample error) [18, Theorem 4, Propositions 2 and 3] Let Assumptions 1 and 2 be satisfied, suppose that $K_{\bar{t},\bar{x}}D_{w}K_{\bar{x},\bar{t}}+\gamma K_{\bar{t},\bar{t}}$ is invertible and let $f_{{\mathbf {s}},\gamma }=\sum _{t\in \bar{t}}c_{t}K_{t}$ be the solution of (79) given in Theorem 2 by $c={\mathcal {L}}y$. Define

$$\begin{aligned} {\mathcal {L}}_{w}&:=(K_{\bar{t},\bar{x}}D_{w}K_{\bar{x},\bar{t}}+\gamma K_{\bar{t},\bar{t}})^{-1}K_{\bar{t},\bar{x}}D_{w}^{1/2}\\ \kappa&:=\Vert K_{\bar{t},\bar{t}}\Vert \;\Vert (K_{\bar{t},\bar{x}}D_{w}K_{\bar{x},\bar{t}}+\gamma K_{\bar{t},\bar{t}})^{-1}\Vert ^{2}. \end{aligned}$$

Then for every $0<\delta <1$, with probability at least $1-\delta$ we have the sample error estimate

$$\begin{aligned} \Vert f_{{\mathbf {s}},\gamma }-f_{\bar{x},\gamma }\Vert _{K}^{2} \le {\mathcal {E}}_{\text{ samp }}:=\kappa \sigma _{w}^{2}\alpha ^{-1}\bigg (\frac{2\Vert K_{\bar{t},\bar{t}}{\mathcal {L}}_{w}\Vert \;\Vert {\mathcal {L}}_{w}\Vert \;{\mathcal {B}} _{w}^{2}}{\kappa \sigma _{w}^{2}}\;\log {\frac{1}{\delta }}\bigg ), \end{aligned}$$

(81)

where $\alpha (u):=(u-1)\log u$ for $u>1$. In particular, ${\mathcal {E}} _{\text{ samp }}\rightarrow 0$ when $\gamma \rightarrow \infty$ or $\sigma _{w} ^{2}\rightarrow 0$.

Theorem 4

(Regularization error and integration error) [18, Proposition 4 and Theorem 5] Let Assumptions 1 and 2 be satisfied and let $\bar{X}=(X_{x})_{x\in \bar{x}}$ be the Voronoi cell of X associated with $\bar{x}$ and $w_{x}=\rho _{X}(X_{x})$. Define the Lipschitz norm on a subset $X^{\prime }\subset X$ as $\Vert f\Vert _{\text{ Lip }(X^{\prime })}:=\Vert f\Vert _{L^{\infty }(X^{\prime })}+\sup _{s,u\in X^{\prime }}\frac{|f(s)-f(u)|}{\Vert s-u\Vert _{\ell ^{\infty }({\mathbb {R}}^{n})}}$ and assume that the inclusion map of ${\mathcal {H}}_{K,\bar{t}}$ into the Lipschitz space satisfies (see Note 2)

$$\begin{aligned} C_{\bar{x}}:=\sup _{f\in {\mathcal {H}}_{K,\bar{t}}}\frac{\sum _{x\in \bar{x}} w_{x}\Vert f\Vert _{\text{ Lip }(X_{x})}^{2}}{\Vert f\Vert _{K}^{2}}<\infty . \end{aligned}$$

(82)

Suppose that $\bar{x}$ is $\varDelta -$dense in X, i.e., for each $y\in X$ there is some $x\in \bar{x}$ satisfying $\Vert x-y\Vert _{\ell ^{\infty }({\mathbb {R}}^{n})} \le \varDelta$.

Then for $f^{*}\in {\mathcal {H}}_{K,\bar{t}}$

$$\begin{aligned} \Vert f_{\bar{x},\gamma }-f^{*}\Vert ^{2}\le \Vert f^{*}\Vert _{K}^{2}(\gamma +8C_{\bar{x}}\varDelta ) \end{aligned}$$

(83)

Theorem 5

(Sample, regularization and integration errors) [18, Corollary 5] Under the assumptions of Theorems 3 and 4, let $\bar{X}=(X_{x})_{x\in \bar{x}}$ be the Voronoi cell of X associated with $\bar{x}$ and $w_{x}=\rho _{X}(X_{x})$. Suppose that $\bar{x}$ is $\varDelta -$dense, $C_{\bar{x}}<\infty$, and $f^{*}\in {\mathcal {H}}_{K,\bar{t}}$. Then, for every $0<\delta <1$, with probability at least $1-\delta$ there holds

$$\begin{aligned} \Vert f_{{\mathbf {s}},\gamma }-f^{*}\Vert ^{2}\le 2C_{\bar{x} }{\mathcal {E}}_{\text{ samp }}+2\Vert f^{*}\Vert _{K}^{2}(\gamma +8C_{\bar{x}}\varDelta ), \end{aligned}$$

(84)

where ${\mathcal {E}}_{\text{ samp }}$ is given in (81).

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Hamzi, B., Colonius, F. Kernel methods for the approximation of discrete-time linear autonomous and control systems. SN Appl. Sci. 1, 674 (2019). https://doi.org/10.1007/s42452-019-0701-3

Received: 15 March 2019
Accepted: 03 June 2019
Published: 07 June 2019
DOI: https://doi.org/10.1007/s42452-019-0701-3
