1 Introduction

This paper discusses several problems in dynamical systems and control where methods from learning theory are applied in the state space of linear systems. This is in contrast to previous approaches in the frequency domain [10, 11]. We refer to [11] for a general survey on applications of machine learning to system identification, where similar problems have been treated using different techniques.

Broadly speaking, learning theory makes it possible to deal with problems where only data from a given system are available. Reproducing kernel Hilbert spaces (RKHSs) allow one to work in a very high-dimensional space in order to simplify the underlying problem. We discuss this in the simple case where the matrix A describing a linear discrete-time system is unknown, but a time series from the underlying linear dynamical system is given. We propose a method to estimate the underlying matrix using kernel methods. Applications are given for the stable and the unstable case and for estimating the eigenvalues and the topological entropy of a linear map. Furthermore, in the control case, the relevant matrices of a linear control system are estimated by viewing the control system as a dynamical system on the extended space of states and control inputs. Stabilization via linear-quadratic optimal control is also discussed.

The emphasis of the present paper is on the formulation of a number of problems in dynamical systems and control and on illustrating the applicability of our approach via a series of numerical examples. This paper should be viewed as a preliminary step toward extending these results to nonlinear discrete-time systems in the spirit of [3, 4], where the authors showed that RKHSs act as “linearizing spaces” and that this approach offers tools for a data-based theory of nonlinear (continuous-time) dynamical systems. The approach used in these papers is based on embedding a nonlinear system in a high (or infinite) dimensional reproducing kernel Hilbert space (RKHS) where linear theory is applied. To illustrate this approach, consider a polynomial on \(\mathbb {R}\), \(p(x)=\alpha + \beta x +\gamma x^2\), where \(\alpha , \beta , \gamma\) are real numbers. If we consider the map \(\phi {:}\, \mathbb {R}\rightarrow \mathbb {R}^3\) defined by \(\phi (x)=[1 \; x \; x^2]^{T}\), then \(p(x) = [\alpha \; \beta \; \gamma ] \cdot [1 \; x \; x^2]^{T}= [\alpha \; \beta \; \gamma ] \cdot \phi (x)\) is linear in the variable \(\phi (x)\). Similarly, consider the nonlinear discrete-time system \(x(k+1)=x(k)+x^2(k)\). By rewriting it as \(x(k+1)=[1 \; 1] \left[ \begin{array}{c}x(k) \\ x(k)^2 \end{array} \right]\), the nonlinear system becomes linear in the variable \([x(k) \; x(k)^2]^{T}\).
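To make the lifting idea concrete, the following minimal Python sketch (ours, not taken from [3, 4]) checks, for the scalar example above, that one step of the nonlinear recursion coincides with a map that is linear in the lifted variable \([x \; x^2]^{T}\).

```python
import numpy as np

# Illustrative sketch: the lifting x -> (x, x^2) turns the scalar system
# x(k+1) = x(k) + x(k)^2 into a map that is linear in the lifted variable.
def step_nonlinear(x):
    return x + x**2

def step_lifted(x):
    phi = np.array([x, x**2])     # lifted variable [x, x^2]
    c = np.array([1.0, 1.0])      # row vector [1 1] from the text
    return c @ phi                # linear in phi

x = 0.1
assert np.isclose(step_nonlinear(x), step_lifted(x))
```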

The contents are as follows: In Sect. 2, the problem is stated formally and an algorithm based on kernel methods is given for the stable case. In Sect. 3, the algorithm is extended to the unstable case. In particular, the topological entropy of linear maps is computed (which boils down to computing the unstable eigenvalues). In Sect. 4, identification of linear control systems is considered, and Sect. 5 discusses their stabilization. Every section contains several numerically computed examples (via MATLAB) illustrating the approach. Section 6 draws some conclusions from the numerical experiments. For the reader’s convenience, we have collected in the appendix basic concepts from learning theory as well as some pointers to the relevant literature. A preliminary version of this article appeared in Hamzi and Colonius [9].

2 Statement of the problem

Consider the linear discrete-time system

$$\begin{aligned} x(k+1)=Ax(k), \end{aligned}$$
(1)

where \(A=[a_{i,j}]\in {\mathbb {R}}^{n\times n}\). We want to estimate A from the time series \(x(1)+\eta _{1}\), \(\ldots\), \(x(N)+\eta _{N}\), where the initial condition x(0) is known and the \(\eta _{i}\) are distributed according to a probability measure \(\rho _{x}\) that satisfies the following condition (this is the Special Assumption in [18]).

Assumption The measure \(\rho _{x}\) is the marginal on \(X={\mathbb {R}} ^{n}\) of a Borel measure \(\rho\) on \(X\times {\mathbb {R}}\) with zero mean supported on \([-\,M_{x},M_{x}],M_{x}>0\).

One obtains from (1) for the components of the time series that

$$\begin{aligned} x_{i}(k+1)=\sum _{j=1}^{n}a_{ij}x_{j}(k). \end{aligned}$$
(2)

For every i we want to estimate the coefficients \(a_{ij},j=1,\ldots ,n\). They are determined by the linear maps \(f^{*}_{i}{:}\,{\mathbb {R}}^{n} \rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} (x_{1},\ldots ,x_{n})\mapsto \sum _{j=1}^{n}a_{ij}x_{j}. \end{aligned}$$
(3)

This problem can be reformulated as a learning problem as described in the “Appendix” where \(f^{*}_{i}\) in (3) plays the role of the unknown function (74) and \((x(k),x_{i}(k+1)+\eta _{i})\) are the samples in (76).

We note that the authors of [18] do not consider time series; here we apply their results in the time series setting.

In order to approximate \(f^{*}_{i}\), we minimize the criterion in (79). For a positive definite kernel K, let \(f_{i}\) be the kernel expansion of \(f^{*}_{i}\) in the corresponding RKHS \({\mathcal {H}}_{K}\). Then \(f_{i}=\sum _{j=1}^{\infty }c_{i,j}\phi _{j}\) with certain coefficients \(c_{ij}\in {\mathbb {R}}\) and

$$\begin{aligned} \Vert f_{i}\Vert _{{\mathcal {H}}_{K}}^{2}=\displaystyle \sum _{j=1}^{\infty }\frac{c_{i,j}^{2} }{\lambda _{j}}, \end{aligned}$$
(4)

where \((\lambda _{j},\phi _{j})\) are the eigenvalues and eigenfunctions of the integral operator \(L_{K}{:}\,{\mathcal {L}}_{\nu }^{2}({\mathcal {X}})\rightarrow {\mathcal {C}}({\mathcal {X}})\) given by \((L_{K}f)(x)=\int K(x,t)f(t)d\nu (t)\) with a Borel measure \(\nu\) on \({\mathcal {X}}\). Thus \(L_{K}\phi _{j}=\lambda _{j}\phi _{j}\) for \(j\in {\mathbb {N}}^{*}\) and the eigenvalues \(\lambda _{j}\ge 0\).

Then we consider the problem of minimizing over \((c_{i,1},\) \(\ldots ,c_{i,N})\) the functional

$$\begin{aligned} {\mathcal {E}}_{i}=\frac{1}{N}\sum _{k=1}^{N}(y_{i}(k)-f_{i} (x(k)))^{2}+\gamma _{i}\Vert f_{i}\Vert _{{\mathcal {H}}_{K}}^{2}, \end{aligned}$$
(5)

where \(y_{i}(k):=x_{i}(k+1)+\eta _{i}=f^{*}_{i}(x(k))+\eta _{i}\) and \(\gamma _{i}\) is a regularization parameter.

Since we are dealing with a linear problem, it is natural to choose the linear kernel \(k(x,y)=\langle x,y\rangle\). Then the solution of the above optimization problem is given by the kernel expansion of \(x_{i}(k+1)\), \(i=1,\ldots ,n\),

$$\begin{aligned} y_{i}(k):=x_{i}(k+1)=\sum _{j=1}^{N}c_{ij}\langle x(j),x(k)\rangle , \end{aligned}$$
(6)

where the \(c_{ij}\) satisfy the following set of equations:

$$\begin{aligned} \left[ \begin{array} [c]{c} x_{i}(1)\\ \vdots \\ x_{i}(N) \end{array} \right] =\Bigg (N\gamma _{i}I_{N}+{\mathbb {K}}\Bigg )\left[ \begin{array} [c]{c} c_{i1}\\ \vdots \\ c_{iN} \end{array} \right] , \end{aligned}$$
(7)

with

$$\begin{aligned} {\mathbb {K}}:=\left[ \begin{array} [c]{ccc} \sum _{\ell =1}^{n}x_{\ell }(1)x_{\ell }(0) &{}\quad \cdots &{}\quad \sum _{\ell =1}^{n}x_{\ell }(N)x_{\ell }(0)\\ \vdots &{}\quad \cdots &{}\quad \vdots \\ \sum _{\ell =1}^{n}x_{\ell }(1)x_{\ell }(N-1) &{}\quad \cdots &{}\quad \sum _{\ell =1}^{n}x_{\ell }(N)x_{\ell }(N-1) \end{array} \right] . \end{aligned}$$
(8)

This is a consequence of Theorem 2.

From (6), we have

$$\begin{aligned} x_{i}(k+1)&=\sum _{j=1}^{N}c_{ij}\langle x(j),x(k)\rangle =\sum _{j=1} ^{N}c_{ij}x(j)^{T}\cdot x(k)\\&=\sum _{j=1}^{N}\sum _{\ell =1}^{n}c_{ij}x_{\ell }(j)x_{\ell }(k)\\&=\sum _{\ell =1}^{n}\sum _{j=1}^{N}c_{ij}x_{\ell }(j)x_{\ell }(k). \end{aligned}$$

Comparing with (2), an estimate of the entries of A is given by

$$\begin{aligned} \hat{a}_{i\ell }=\sum _{j=1}^{N}c_{i,j}x_{\ell }(j). \end{aligned}$$
(9)

This discussion leads us to the following basic algorithm.

Algorithm \({\mathcal {A}}\) If the eigenvalues of A are all within the unit circle, one proceeds as follows in order to estimate A. Given the time series \(x(1),\ldots ,x(N)\), solve the system of equations (7) to find the coefficients \(c_{ij}\) and then compute \(\hat{a}_{i\ell }\) from (9).
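A minimal numerical sketch of algorithm \({\mathcal {A}}\) for the linear kernel is given below (Python/NumPy); the function name and the data layout are our choices and not part of the paper.

```python
import numpy as np

def estimate_A(X, gamma):
    """Algorithm A with the linear kernel, following (7)-(9).

    X     : array of shape (N+1, n) whose rows are x(0), x(1), ..., x(N).
    gamma : regularization parameter (gamma_i in (5), taken equal for all i).
    """
    X0, X1 = X[:-1], X[1:]                    # rows x(0..N-1) and x(1..N)
    N = X1.shape[0]
    # Kernel matrix (8): entry (k, j) is <x(j), x(k-1)>.
    K = X0 @ X1.T
    # Linear system (7); column i of C holds the coefficients c_{i1}, ..., c_{iN}.
    C = np.linalg.solve(N * gamma * np.eye(N) + K, X1)
    # Estimate (9): a_hat[i, l] = sum_j c_{ij} x_l(j).
    return C.T @ X1
```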

Before we present numerical examples, modifications, and applications of this algorithm, the following preliminary remarks indicate what may be expected.

The stability assumption in algorithm \({\mathcal {A}}\) is imposed since otherwise the time series diverges exponentially. Then, already for a moderate number of data points (\(N\approx 10^{2}\)), Eq. (7) will be ill-conditioned. Hence for unstable A, modifications of algorithm \({\mathcal {A}}\) are required.

While for test examples one can compare the entries of the matrix A and its approximation \(\hat{A}\), it may appear more realistic to compare the values \(x(1),\ldots ,x(N)\) of the data series and the values \(\hat{x}(1),\ldots ,\hat{x}(N)\) generated by the iteration of the matrix \(\hat{A}\).

In general, one should not expect that increasing the number of data points leads to better approximations of the matrix A. If the matrix A is diagonalizable, then for generic initial points \(x(0)\in {\mathbb {R}}^{n}\) the data points x(k) approach, as \(k\rightarrow \infty\), the eigenspace for the eigenvalue with maximal modulus. For general A and generic initial points \(x(0)\in {\mathbb {R}}^{n}\), the data points x(k) approach, as \(k\rightarrow \infty\), the largest Lyapunov space (i.e., the sum of the real generalized eigenspaces for the eigenvalues with maximal modulus). Thus, in the limit, only part of the matrix can be approximated. A detailed discussion of this (well-known) limit behavior is given, e.g., in Colonius and Kliemann [6]. A consequence is that a time series of medium length should be adequate.
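A small numerical illustration of this limit behavior (ours, with an arbitrary \(2\times 2\) example matrix):

```python
import numpy as np

# For a generic initial point, the iterates of x(k+1) = A x(k) align with the
# dominant eigenspace, and successive norm ratios approach the spectral radius.
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])          # eigenvalues 2 and 0.5
x = np.array([1.0, 1.0])
for _ in range(30):
    x_next = A @ x
    ratio = np.linalg.norm(x_next) / np.linalg.norm(x)
    x = x_next
print(ratio)                         # close to 2, the eigenvalue of maximal modulus
print(x / np.linalg.norm(x))         # close to the dominant eigenvector [1, 0]
```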

This problem can be overcome by choosing the regularization parameter \(\gamma\) in (5) and (7) using the method of cross validation described in [12]. Briefly, in order to choose \(\gamma\), we consider a set of candidate values of the regularization parameter: for each candidate value, we run the learning algorithm on a subset of the samples and choose the value that performs best on the remaining data. Cross validation also helps in the presence of noise and improves the results beyond the training set.
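A simple hold-out variant of this selection rule, built on the `estimate_A` sketch above, could look as follows (the candidate grid and the 70/30 split are illustrative choices of ours, not taken from [12]):

```python
import numpy as np

def choose_gamma(X, gammas, train_frac=0.7):
    """Pick the regularization parameter by hold-out validation.

    For each candidate gamma, A is estimated from the first part of the
    trajectory (using estimate_A above) and scored by the one-step
    prediction error on the remaining part; the best-scoring gamma wins.
    """
    N_train = int(train_frac * (X.shape[0] - 1))
    best_gamma, best_err = None, np.inf
    for gamma in gammas:
        A_hat = estimate_A(X[:N_train + 1], gamma)
        X0_val, X1_val = X[N_train:-1], X[N_train + 1:]
        err = np.linalg.norm(X1_val - X0_val @ A_hat.T)
        if err < best_err:
            best_gamma, best_err = gamma, err
    return best_gamma

# e.g. gamma = choose_gamma(X, gammas=2.0 ** np.arange(-30, 21))
```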

A theoretical justification of our algorithm is provided by the error estimates in Theorem 5. In fact, for the linear dynamical system (1), the function \(f^{*}\) in (74) is the linear map \(f^{*}_{i}\) in (3), and the samples \({\mathbf {s}}\) in (76) are \((x(k),x_{i}(k+1)+\eta _{i})\). Moreover, by choosing the linear kernel \(k(x,y)=\langle x,y\rangle\) we get that \(f^{*}\in {\mathcal {H}}_{K}\). In this case, (84) has the form

$$\begin{aligned} \Vert \hat{x}_{i}(k+1)-x_{i}(k+1)\Vert ^{2}\le 2C_{\bar{x}}{\mathcal {E}}_{\text{ samp }} +2\Vert x(k+1)\Vert _{K}^{2}(\gamma +8C_{\bar{x}}\varDelta ), \end{aligned}$$
(10)

where \(\Vert x_{i}(k+1)\Vert _{{\mathcal {H}}_{K}}^{2}=\sum _{j=1}^{\infty }\frac{c_{i,j}^{2} }{\lambda _{j}}\). See [3] for more details about error estimates in the general nonlinear case.

The first term on the right-hand side of inequality (10) represents the error due to the noise (sampling error), and the second term represents the error due to regularization (regularization error) and the finite number of samples (integration error).

Next, we discuss several numerical examples, beginning with the following scalar equation.

Example 1

Consider \(x(k+1)=\alpha x(k)\) with \(\alpha =0.5\). With the initial condition \(x(0)=-\,0.5\), we generate the time series \(x(1),\ldots ,x(100)\). Applying algorithm \({\mathcal {A}}\) with the regularization parameter \(\gamma =10^{-6}\) we compute \(\hat{\alpha }=0.4997\). Using cross validation, we get \(\hat{\alpha }=0.5\) with regularization parameter \(\gamma =1.5259\,\times \,10^{-5}\). When we introduce an i.i.d. perturbation signal \(\eta _{i}\in [-\,0.1,0.1]\), the algorithm does not behave well when we fix the regularization parameter. With cross validation, the algorithm works quite well and the regularization parameter adapts to the realization of the signal \(\eta _{i}\). Here, for \(e(k)=x(k)-\hat{x}(k)\) with \(x(k+1)=\alpha x(k)\) and \(\hat{x}(k+1)=\hat{\alpha }\hat{x}(k)\), we get \(\Vert e(300)\Vert =\sqrt{\sum _{i=1}^{300}e^{2}(i)}=0.0914\) and \(\sqrt{\sum _{i=100}^{300}e^{2} (i)}=1.8218\,\times \,10^{-30}\).

We observe an analogous behavior of the algorithm when the data are generated from \(x(k+1)=\alpha x(k)+\varepsilon x(k)^{2}\); with cross validation, the algorithm works well in the presence of noise and structural perturbations. For \(\varepsilon =0.1\) and an i.i.d. perturbation signal \(\eta _{i}\in [-\,0.1,0.1]\), \(\hat{\alpha }\) varies between 0.38 and 0.58 depending on the realization of \(\eta _{i}\), but \(\Vert e(300)\Vert =\sqrt{\sum _{i=1}^{300}e^{2}(i)}=0.2290\) and \(\sqrt{\sum _{i=100}^{300}e^{2} (i)}=2.8098\,\times \,10^{-30}\), which shows that the error e decreases exponentially and the generalization properties of the algorithm are quite good.
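For illustration, the noise-free part of Example 1 can be reproduced with the `estimate_A` sketch of Sect. 2 (a hypothetical script; the printed value should be close to the reported 0.4997):

```python
import numpy as np

alpha, N = 0.5, 100
x = np.array([-0.5 * alpha ** k for k in range(N + 1)]).reshape(-1, 1)  # x(0..N)
print(estimate_A(x, gamma=1e-6))    # approximately [[0.4997]]
```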

Example 2

Consider \(x(k+1)=Ax(k)\) with matrix A given by

$$\begin{aligned} A:=\left[ \begin{array} [c]{cccc} -\,0.5 &{}\quad 1 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 0.6 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 0.7 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -\,0.8 \end{array} \right] . \end{aligned}$$
(11)

For the initial condition \(x(0)=[-\,0.9,0.1,15,0.2]^{\prime }\) and with \(N=100\) data points, we get

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} -\,0.5000 &{}\quad 1.0000 &{}\quad 0.0000 &{}\quad -\,0.0000\\ 0.0000 &{}\quad 0.6000 &{}\quad 1.0000 &{}\quad 0.0000\\ 0.0000 &{}\quad -\,0.0000 &{}\quad 0.7000 &{}\quad 0.9994\\ -\,0.0000 &{}\quad 0.0000 &{}\quad -\,0.0000 &{}\quad -\,0.7995 \end{array} \right] . \end{aligned}$$
(12)

We then simulate \(x(k+1)=Ax(k)\) and \(\hat{x}(k+1)=\hat{A}\hat{x}(k)\) for \(k=0,\ldots ,200\) to test the accuracy of our approximation beyond the interval \(k=0,\ldots ,100\). For the error \(e_{j}(k)=x_{j}(k)-\hat{x} _{j}(k)\), \(j=1,\ldots ,4\), the norm \(\Vert e_{j}(300)\Vert =\sqrt{\sum _{i=1}^{300}e_{j} ^{2}(i)}\) is of the order of \(10^{-3}\) and \(\sqrt{\sum _{i=100}^{300}e_{j} ^{2}(i)}\) is of the order of \(10^{-11}\), which shows that the error decreases exponentially and the generalization properties of the algorithm are quite good. The regularization parameters are \(\gamma _{i}=0.9313\,\times \,10^{-9}\) for \(i=1,\ldots ,4\).

Also in the presence of small noise \(\eta _{i}\in [-\,0.01,0.01]\), the algorithm behaves well and the regularization parameters adapt to the realization of \(\eta _{i}\). For example, for a certain realization of \(\eta _{i}\), we obtain the regularization parameters

$$\begin{aligned} \gamma _{1}& = 0.0039, \gamma _{2}=2.4114\,\times \,10^{-4},\nonumber \\ \gamma _{3}& = 9.3132\,\times \, 10^{-10}, \gamma _{4}=2 \,\times \,10^{-3} \end{aligned}$$
(13)

and the error \(\Vert e_{j}(300)\Vert =\sqrt{\sum _{i=1}^{300}e_{j}^{2}(i)}\) is of the order of \(10^{-1}\) and \(\sqrt{\sum _{i=100}^{300}e_{j}^{2}(i)}\) is of the order of \(10^{-9}\).

Suppose that in addition to a small noise \(\eta _{i}\in [-0.01,0.01],\) there is a quadratic structural perturbation, i.e.,

$$\begin{aligned} x(k+1)=Ax(k)+\varepsilon \left[ \begin{array} [c]{c} x_{1}(k)^{2}\\ x_{2}(k)^{2}\\ x_{3}(k)^{2}\\ x_{4}(k)^{2} \end{array} \right] . \end{aligned}$$
(14)

Then with cross validation for \(\varepsilon =0.001\) the algorithm behaves well. For a particular realization of \(\eta\), the error \(\Vert e_{j}(300)\Vert =\sqrt{\sum _{i=1}^{300}e_{j}^{2}(i)}\) is between 5 and 15 but \(\sqrt{\sum _{i=100}^{300}e_{j}^{2}(i)}\) is of the order of \(10^{-9}\) and the regularization parameters are

$$\begin{aligned} \gamma _{1}& = 0.5,\gamma _{2}=9.3132\,\times \,10^{-10},\gamma _{3}=9.3132\,\times \, 10^{-10},\nonumber \\ \gamma _{4}& = 9.3132\,\times \,10^{-10}. \end{aligned}$$
(15)

These examples show very good behavior of the algorithm.

3 Unstable case

Consider

$$\begin{aligned} x(k+1)=Ax(k){\text { with }}A\in {\mathbb {R}}^{n\times n}, \end{aligned}$$
(16)

where some of the eigenvalues of A are outside the unit circle. Again, we want to estimate A when the following data are given,

$$\begin{aligned} x(1),x(2),\ldots ,x(N), \end{aligned}$$
(17)

which are generated by system (16), thus \(x(k)=A^{k-1}x(1)\).

As remarked above, a direct application of algorithm \({\mathcal {A}}\) will not work, since the time series diverges fast. Instead, we construct from (17) a new time series associated with a rescaled auxiliary system.

For a constant \(\sigma >0\) we define the auxiliary system by

$$\begin{aligned} y(k+1)=\tilde{A}y(k){\text { with }}\tilde{A}:=\frac{1}{\sigma }A. \end{aligned}$$
(18)

Thus

$$\begin{aligned} y(k)=\left( \frac{A}{\sigma }\right) ^{k-1}y(1) \end{aligned}$$
(19)

and with \(y(1)=x(1)\) one finds

$$\begin{aligned} y(k)=\frac{1}{\sigma ^{k-1}}A^{k-1}x(1)=\frac{1}{\sigma ^{k-1}}x(k). \end{aligned}$$
(20)

If we choose \(\sigma >0\) such that the eigenvalues of \(\frac{A}{\sigma }\) are within the unit circle, we can apply algorithm \({\mathcal {A}}\) to this stable matrix and obtain an estimate of \(\frac{A}{\sigma }\), and hence of A. However, since the eigenvalues of the matrix A are unknown, we will be content with a somewhat weaker condition than stability of \(\frac{A}{\sigma }\).

The data (17) for system (16) yield the following data for system (18):

$$\begin{aligned} y(1):=x(1),y(2):=\frac{1}{\sigma }x(2),\ldots ,y(N):=\frac{1}{\sigma ^{N-1}}x(N). \end{aligned}$$
(21)

We propose to choose \(\sigma\) as follows: Define

$$\begin{aligned} \sigma :=\max \left\{ \frac{\left\| x(k+1)\right\| }{\left\| x(k)\right\| },k\in \{0,1,\ldots ,N-1\}\right\} . \end{aligned}$$
(22)

Clearly, the inequality \(\sigma \le \left\| A\right\|\) holds. We apply algorithm \({\mathcal {A}}\) to the time series y(k). This yields an estimate of \(\frac{A}{\sigma }\) and hence an estimate \(\hat{A}\) of A.

For general A, this choice of \(\sigma\) certainly does not guarantee that the eigenvalues of \(\frac{A}{\sigma }\) are within the unit circle. However, as mentioned above, a generic data sequence \(x(k),k\in {\mathbb {N}}\), will converge to the eigenspace of the eigenvalue with maximal modulus. Hence \(\frac{\left\| x(k+1)\right\| }{\left\| x(k)\right\| }\) will approach the maximal modulus of an eigenvalue, and thus this choice of \(\sigma\) leads to a matrix \(\frac{A}{\sigma }\) which is not “too unstable”.
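A sketch of the resulting procedure, reusing the `estimate_A` function of Sect. 2 (names and data layout are again our choices):

```python
import numpy as np

def estimate_A_unstable(X, gamma):
    """Rescale the trajectory as in (21)-(22) and apply algorithm A.

    X holds x(0), ..., x(N) as rows.  The auxiliary series
    y(k) = x(k) / sigma^(k-1) is formed with sigma from (22), algorithm A
    (estimate_A above) is applied to y, and the result is scaled back.
    """
    ratios = [np.linalg.norm(X[k + 1]) / np.linalg.norm(X[k])
              for k in range(X.shape[0] - 1)]
    sigma = max(ratios)                              # sigma from (22)
    scales = sigma ** (1.0 - np.arange(X.shape[0]))  # sigma^{-(k-1)}, k = 0..N
    Y = X * scales[:, None]                          # rescaled data (21)
    return sigma * estimate_A(Y, gamma)              # A = sigma * (A/sigma)
```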

Example 3

Consider \(x(k+1)=\alpha x(k)\) with \(\alpha =11.46\). With the initial condition \(x(0)=-\,0.5\), we generate the time series \(x(1),\ldots ,x(100)\). The algorithm above with the regularization parameter \(\gamma =10^{-6}\) yields the estimate \(\hat{\alpha }=11.4086\). Cross validation leads to the regularization parameter \(\gamma =9.5367\,\times \,10^{-7}\) and the estimate \(\hat{\alpha }=11.4599\). In the presence of a small noise \(\eta \in [-\,0.1,0.1]\), cross validation yields the regularization parameter \(\gamma =0.002\) and the slightly worse estimate \(\hat{\alpha }=11.1319\).

We observe the same behavior in higher dimensional systems where the eigenvalues are of the same order of magnitude.

Example 4

Consider \(x(k+1)=Ax(k)\) with

$$\begin{aligned} A=\left[ \begin{array} [c]{cccc} 20 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad -\,10 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 15 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -\,25 \end{array} \right] \end{aligned}$$
(23)

Using cross validation, we get that

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} 20.0000 &{}\quad 0.0000 &{}\quad 0.0001 &{}\quad 0.0000\\ -\,0.0000 &{}\quad -\,10.0000 &{}\quad 0.0000 &{}\quad -\,0.0000\\ 0.0000 &{}\quad -\,0.0000 &{}\quad 14.9998 &{}\quad 0.0000\\ -\,0.0000 &{}\quad -\,0.0000 &{}\quad -\,0.0000 &{}\quad -\,25.0003 \end{array} \right] \end{aligned}$$
(24)

for \(\gamma _{i}=0.9313\,\times \,10^{-9}\), \(i=1,\ldots ,4\).

For different realizations of a noise \(\eta _{i}\) of magnitude \(0.5\,\times \,10^{-4}\), cross validation gives a good approximation of A and the eigenvalues of \(A-\hat{A}\) are all within the unit disk with amplitude of the order of \(10^{-3}\) showing that the dynamics of the error \(e(k)=x(k)-\hat{x}(k)\) is asymptotically stable. For example, for a particular realization of \(\eta _{i}\) of magnitude \(0.5\,\times \,10^{-4}\), we get

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} 19.9635 &{}\quad 0.0086 &{}\quad 0.1365 &{}\quad -\,0.0007\\ -\,0.0177 &{}\quad -\,10.0025 &{}\quad 0.0379 &{}\quad -\,0.0007\\ -\,0.0177 &{}\quad -\,0.0025 &{}\quad 15.0376 &{}\quad -\,0.0007\\ -\,0.0132 &{}\quad -\,0.0167 &{}\quad 0.0065 &{}\quad -\,25.0000 \end{array} \right] \end{aligned}$$
(25)

with regularization parameters

$$\begin{aligned} \gamma _{1}& = 1.9073\,\times \,10^{-6},\gamma _{2}=9.3132\,\times \,10^{-10},\gamma _{3}=9.3132\,\times \,10^{-10},\nonumber \\ \gamma _{4}& = 1.2207\,\times \,10^{-4}. \end{aligned}$$
(26)

The algorithm fails in the presence of quadratic structural perturbations. This is due to the choice of a linear kernel. A polynomial kernel, for example, would allow for nonlinear perturbations but this would require a complete reformulation of our algorithm. We leave the extension of our algorithm to the nonlinear case for future work.

The next example is an unstable system with a large gap between the eigenvalues.

Example 5

Consider the system \(x(k+1)=Ax(k)\) with

$$\begin{aligned} A=\left[ \begin{array} [c]{cc} 20 &{}\quad 0\\ 0 &{}\quad -\,0.1 \end{array} \right] . \end{aligned}$$
(27)

With the initial condition \(x(0)=[-\,1.9,1]\), we generate the time series \(x(1),\ldots ,x(100)\). The algorithm above yields the (excellent) estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} 20.0000 &{}\quad 0.0000\\ -\,0.0000 &{}\quad -\,0.1000 \end{array} \right] . \end{aligned}$$
(28)

In the presence of noise of maximal amplitude \(10^{-4}\), the algorithm approximates well only the large entry \(a_{11}=20\): for a first realization of \(\eta _{i}\) and with cross validation, we get

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} 19.9997 &{}\quad -\,0.0111\\ 0.0000 &{}\quad -\,0.1104 \end{array} \right] , \end{aligned}$$
(29)

with \(\gamma _{1}=1.5259\,\times \,10^{-5}\) and \(\gamma _{2}=2^{20}\). However another realization of \(\eta _{i}\) leads to

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} 19.9994 &{}\quad -\,0.0011\\ 0.0000 &{}\quad -\,0.0000 \end{array} \right] , \end{aligned}$$
(30)

with \(\gamma _{1}=3.0518\,\times \,10^{-5}\) and \(\gamma _{2}=2.8147\,\times \,10^{14}\). This is due to the fact that the data converge to the eigenspace generated by the largest eigenvalue \(\lambda =20\). However, the eigenvalues of \(A-\hat{A}\) are within the unit disk with small amplitude, which guarantees that the error dynamics of \(e(k)=x(k)-\hat{x}(k)\) converges to the origin quite quickly. We observe the same phenomenon with

$$\begin{aligned} A=\left[ \begin{array} [c]{cc} -\,0.5 &{}\quad 0\\ 0 &{}\quad 25 \end{array} \right] . \end{aligned}$$
(31)

Here, in the absence of noise, we obtain the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} -\,0.5000 &{}\quad 0.0000\\ -\,0.0000 &{}\quad 25.0000 \end{array} \right] , \end{aligned}$$
(32)

with \(\gamma _{1}=\gamma _{2}=0.9313\,\times \,10^{-9}\). In the presence of noise \(\eta _{i}\) with amplitude \(10^{-4}\), the data converge to the eigenspace corresponding to the largest eigenvalue \(\lambda =25\): for some realization of \(\eta _{i}\) one obtains the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} -\,0.4809 &{}\quad 0.0008\\ 0.0164 &{}\quad 24.9960 \end{array} \right] , \end{aligned}$$
(33)

while for another realization of \(\eta\)

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cc} -\,0.0000 &{}\quad -\,0.0000\\ -\,1.0067 &{}\quad 24.8696 \end{array} \right] . \end{aligned}$$
(34)

The regularization parameters \(\gamma _{1}\) and \(\gamma _{2}\) adapt to the realization of the noise.

As already remarked at the end of Sect. 2, we see that “more data” does not necessarily lead to better results, since the data sequence converges to the eigenspace generated by the largest eigenvalue. However, whether with or without noise, the approximations of A are good enough to reduce the error between \(x(k+1)=Ax(k)\) and \(\hat{x}(k+1)=\hat{A}\hat{x}(k)\) outside of the training examples, since cross-validation determines a good regularization parameter \(\gamma\) that balances good fitting against good prediction properties.

The next example has an eigenvalue on the unit circle.

Example 6

Consider \(x(k+1)=Ax(k)\) with

$$\begin{aligned} A=\left[ \begin{array} [c]{cccc} 2.2500 &{}\quad -\,1.2500 &{}\quad 1.2500 &{}\quad -\,49.5500\\ 3.7500 &{}\quad -\,2.7500 &{}\quad 13.1500 &{}\quad -\,20.6500\\ 0 &{}\quad 0 &{}\quad 10.4000 &{}\quad -\,32.3000\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -\,21.9000 \end{array} \right] . \end{aligned}$$
(35)

The set of eigenvalues of A is \(\text{ spec }(A)=\{-\,1.5000,1.0000,10.4000,-\,21.9000\}\). In the absence of noise and with the initial condition \(x(0)=[-\,0.9,15,1.5,2.5]\) and \(N=100\) points, we compute the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} 2.2500 &{}\quad -\,1.2500 &{}\quad 1.2498 &{}\quad -\,49.5499\\ 3.7500 &{}\quad -\,2.7500 &{}\quad 13.1498 &{}\quad -\,20.6499\\ 0.0000 &{}\quad 0.0000 &{}\quad 10.3998 &{}\quad -\,32.2999\\ 0.0000 &{}\quad 0.0000 &{}\quad -\,0.0001 &{}\quad -\,21.8999 \end{array} \right] , \end{aligned}$$
(36)

and regularization parameters \(\gamma _{1}=\gamma _{2}=0.9313\,\times \,10^{-9}\). In this case, the set of eigenvalues of \(\hat{A}\) is

$$\begin{aligned} \text{ spec }(\hat{A})=\{-21.9000,10.3999,-1.5000,1.0000\}. \end{aligned}$$
(37)

For a given realization of \(\eta \in [-10^{-4},10^{-4}]\), we obtain the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} 2.2551 &{}\quad -\,1.2490 &{}\quad 1.2187 &{}\quad -\,49.5304\\ 3.7554 &{}\quad -\,2.7489 &{}\quad 13.1175 &{}\quad -\,20.6297\\ 0.0055 &{}\quad 0.0011 &{}\quad 10.3669 &{}\quad -\,32.2794\\ 0.0053 &{}\quad 0.0010 &{}\quad -\,0.0325 &{}\quad -\,21.8797 \end{array} \right] \end{aligned}$$
(38)

with \(\gamma _{1}=0.0745\,\times \,10^{-7}\) and \(\gamma _{2}=0.1490\,\times \,10^{-7}\). The eigenvalues of \(A-\hat{A}\) are of the order of \(10^{-4}\), which guarantees that the error dynamics converges quickly to the origin. However, the set of eigenvalues of \(\hat{A}\) is

$$\begin{aligned} \text{ spec }(\hat{A})=\{-21.8996,10.3999,-1.5026,1.0134\}. \end{aligned}$$
(39)

Hence an additional unstable eigenvalue occurs.

Example 7

Consider \(x(k+1)=Ax(k)\) with

$$\begin{aligned} A=\left[ \begin{array} [c]{cccc} -\,0.8500 &{}\quad 0.4500 &{}\quad -\,0.4500 &{}\quad -\,77.8500\\ -\,1.3500 &{}\quad 0.9500 &{}\quad 14.3500 &{}\quad -\,11.6500\\ 0 &{}\quad 0 &{}\quad 15.3000 &{}\quad -\,55.3000\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad -\,40.0000 \end{array} \right] . \end{aligned}$$
(40)

The eigenvalues of A are given by

$$\begin{aligned} \text{ spec }(A)=\{-\,0.4000,0.5000,15.3000,-\,40.0000\}. \end{aligned}$$
(41)

For the initial condition \(x(0)=[-\,0.9;15;1.5;2.5]\) and with \(N=100\) data points, we get

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} -\,0.8498 &{}\quad 0.4501 &{}\quad -\,0.4499 &{}\quad -\,77.8504\\ -\,1.3499 &{}\quad 0.9500 &{}\quad 14.3501 &{}\quad -\,11.6502\\ 0.0001 &{}\quad 0.0001 &{}\quad 15.3001 &{}\quad -\,55.3004\\ -\,0.0004 &{}\quad -\,0.0002 &{}\quad -\,0.0004 &{}\quad -\,39.9987 \end{array} \right] \end{aligned}$$
(42)

with eigenvalues given by

$$\begin{aligned} \text{ spec }(\hat{A})=\{-\,40.0000,-\,0.3974,0.4982,15.3008\}. \end{aligned}$$
(43)

Here we used \(\gamma _{i}=10^{-12}\), \(i=1,\ldots ,4\). Moreover, the eigenvalues of \(A-\hat{A}\) are quite small and such that the error dynamics converges quickly to the origin. In the presence of noise \(\eta\), the algorithm approximates the largest eigenvalues of A but does not approximate the smaller (stable) ones. For example, for a particular realization of noise with amplitude \(10^{-4}\), we get the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} -\,2.1100 &{}\quad -\,0.0993 &{}\quad -\,1.3259 &{}\quad -\,74.4543\\ -\,1.7053 &{}\quad 0.7777 &{}\quad 13.9397 &{}\quad -\,10.5308\\ -\,0.8277 &{}\quad -\,0.3692 &{}\quad 14.6466 &{}\quad -\,52.9920\\ -\,0.8283 &{}\quad -\,0.3694 &{}\quad -\,0.6539 &{}\quad -\,37.6904 \end{array} \right] \end{aligned}$$
(44)

and \(\text{ spec }(\hat{A})=\{-\,40.0009,0.1620\pm 0.8438i,15.3008\}\).

For another realization of noise with amplitude \(10^{-2}\), we get the estimate

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{cccc} -\,138.0893 &{}\quad -\,60.7052 &{}\quad -\,105.8111 &{}\quad 301.5029\\ -\,0.2435 &{}\quad 0.9101 &{}\quad 12.9638 &{}\quad -\,12.6745\\ -\,71.1408 &{}\quad -\,31.9557 &{}\quad -\,40.3842 &{}\quad 142.3170\\ -\,71.1408 &{}\quad -\,31.9557 &{}\quad -\,55.6843 &{}\quad 157.6172 \end{array} \right] \end{aligned}$$
(45)

and \(\text{ spec }(\hat{A})=\{-\,40.1391,3.9326,0.9601,15.3002\}\).

The algorithm introduced above also allows us to compute the topological entropy of linear systems, since it is determined by the unstable eigenvalues. Recall that the topological entropy of a linear map on \({\mathbb {R}}^{n}\) is defined in the following way:

Fix a compact subset \(K\subset {\mathbb {R}}^{n}\), a time \(\tau \in {\mathbb {N}}\) and a constant \(\varepsilon >0\). Then a set \(R\subset {\mathbb {R}}^{n}\) is called \((\tau ,\varepsilon )\)-spanning for K if for every \(y\in K\) there is \(x\in R\) with

$$\begin{aligned} \left\| A^{j}y-A^{j}x\right\| <\varepsilon {\text { for all }}j=0,\ldots ,\tau . \end{aligned}$$
(46)

By compactness of K, there are finite \((\tau ,\varepsilon )\)-spanning sets. Let R be a \((\tau ,\varepsilon )\)-spanning set of minimal cardinality \(\#R=r_{\min }(\tau ,\varepsilon ,K)\). Then

$$\begin{aligned} h_{top}(K,A,\varepsilon ):=\lim _{\tau \rightarrow \infty }\frac{1}{\tau }\log r_{\min }(\tau ,\varepsilon ,K),\quad h_{top}(K,A):=\underset{\varepsilon \rightarrow 0^{+}}{\lim }h_{top}(K,A,\varepsilon ). \end{aligned}$$
(47)

(the limits exist). Finally, the topological entropy of A is

$$\begin{aligned} h_{top}(A):=\sup _{K}h_{top}(K,A), \end{aligned}$$
(48)

where the supremum is taken over all compact subsets K of \({\mathbb {R}}^{n}.\)

A classical result due to Bowen (cf. [21, Theorem 8.14]) shows that the topological entropy is determined by the sum of the unstable eigenvalues, i.e.,

$$\begin{aligned} h_{top}(A)=\sum \max (1,\left| \lambda \right| ), \end{aligned}$$
(49)

where summation is over all eigenvalues of A counted according to their algebraic multiplicity.

Hence, when we approximate the unstable eigenvalues of A by those of the matrix \(\hat{A}\), we also get an approximation of the topological entropy.

Example 8

For Example 6, we get that \(h_{top}(A)=34.80\) while for the estimate \(\hat{A}\) one obtains \(h_{top}(\hat{A})=34.7999\). For Example 7, we get that \(h_{top}(A)=55.30\) and \(h_{top}(\hat{A})=55.3008\). These estimates appear reasonably good.

4 Identification of linear control systems

Consider the linear control system

$$\begin{aligned} x(k+1)=Ax(k)+Bu(k), \end{aligned}$$
(50)

with \(A\in {\mathbb {R}}^{n\times n}\) and \(B\in {\mathbb {R}}^{n\times 1}\). We want to estimate the matrices A and B from the time series \(x(1)+\eta _{1},\) \(\ldots ,x(N)+\eta _{N}\) where \(\eta\) satisfies the Assumption in Sect. 2. The initial condition x(0) and the control sequence u(0),  \(\ldots ,u(N)\) are assumed to be known.

In order to estimate A and B, we will extend algorithm \({\mathcal {A}}\). The ith component of system (50) is given by

$$\begin{aligned} x_{i}(k+1)=\sum _{j=1}^{n}a_{ij}x_{j}(k)+b_{i}u(k). \end{aligned}$$
(51)

For every i we want to estimate the coefficients \(b_{i}\) and \(a_{ij},j=1,\) \(\ldots ,n\). Thus the linear map \(f_{i}{:}\,{\mathbb {R}}^{n+1}\rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} (x_{1},\ldots ,x_{n},u)\mapsto \sum _{j=1}^{n}a_{ij}x_{j} +b_{i}u \end{aligned}$$
(52)

is unknown. To extend algorithm \({\mathcal {A}}\), we view system (51) as a system of the form (2) whose state is the extended state \(\underline{x}=(x,u)\in {\mathbb {R}}^{n}\times {\mathbb {R}}\) of (50). Hence, the kernel expansion (6) becomes

$$\begin{aligned} {x}_{i}(k+1)=\sum _{j=1}^{N}c_{ij}\langle \underline{x} (j),\underline{x}(k)\rangle \end{aligned}$$
(53)

where \(\underline{x}_{n+1}=u\) and the \(c_{ij}\) satisfy the following set of equations:

$$\begin{aligned} \left[ \begin{array} [c]{c} x_{i}(1)\\ \vdots \\ x_{i}(N) \end{array} \right] =\Bigg (N\gamma _{i}I_{N}+\underline{\mathbb {K}}\Bigg )\left[ \begin{array} [c]{c} c_{i1}\\ \vdots \\ c_{iN} \end{array} \right] , \end{aligned}$$
(54)

with

$$\begin{aligned} \underline{\mathbb {K}}=\left[ \begin{array} [c]{ccc} \sum \nolimits _{\ell =1}^{n+1}\underline{x}_{\ell }(1)\underline{x}_{\ell }(0) &{} \cdots &{} \sum \nolimits _{\ell =1}^{n+1}\underline{x}_{\ell }(N)\underline{x}_{\ell }(0)\\ \vdots &{} \cdots &{} \vdots \\ \sum \nolimits _{\ell =1}^{n+1}\underline{x}_{\ell }(1)\underline{x}_{\ell }(N-1) &{} \cdots &{} \sum \nolimits _{\ell =1}^{n+1}\underline{x}_{\ell }(N)\underline{x}_{\ell }(N-1) \end{array} \right] . \end{aligned}$$
(55)

Let us emphasize that \(u=\underline{x}_{n+1}\) does not appear on the left-hand side of (53)–(54).

For the case when A has eigenvalues outside the unit circle, we adopt the same method as in Sect. 3 and define

$$\begin{aligned} \underline{\sigma }:=\max \left\{ \frac{\left\| \underline{x}(k+1)\right\| }{\left\| \underline{x}(k)\right\| },k\in \{0,1,\ldots ,N-1\}\right\} . \end{aligned}$$
(56)
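A sketch of this extension for the single-input case, mirroring the `estimate_A` function of Sect. 2 and following (53)–(55) (the function name and data layout are our choices):

```python
import numpy as np

def estimate_AB(X, U, gamma):
    """Estimate A and B in x(k+1) = A x(k) + B u(k) from data.

    X : array of shape (N+1, n) with states x(0), ..., x(N) as rows.
    U : array of shape (N+1,)  with inputs u(0), ..., u(N).
    """
    Z = np.column_stack([X, U])          # extended state (x, u)
    Z0, Z1 = Z[:-1], Z[1:]
    X1 = X[1:]                           # targets x_i(k+1); u is not predicted
    N = X1.shape[0]
    K = Z0 @ Z1.T                        # kernel matrix (55), linear kernel
    C = np.linalg.solve(N * gamma * np.eye(N) + K, X1)
    AB = C.T @ Z1                        # row i holds [a_{i1} ... a_{in}  b_i]
    return AB[:, :-1], AB[:, -1:]        # A_hat (n x n), B_hat (n x 1)
```

For an unstable A one would first rescale the extended series by \(\underline{\sigma }\) from (56), exactly as in Sect. 3.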

Example 9

(One dimensional case) Consider \(x(k+1)=-\,0.9x(k)+3.5u(k)\). For the input \(u(k)=\sin (k)+\cos (k)\) and 100 points, we obtain the estimates \(\hat{A}=-\,0.9\) and \(\hat{B}=3.5\) when there is no noise \(\eta _{i}\). Here cross validation gives \(\gamma _{1}=1.5259\,\times \,10^{-5}\) and \(\gamma _{2}=1\). For a certain realization of the noise \(\eta _{i}\) with amplitude 0.1, we get \(\hat{A}=-\,0.9008\) and \(\hat{B}=3.4983\). Here cross validation gives \(\gamma _{1}=0.0078\) and \(\gamma _{2}=1\).

Example 10

(Three dimensional stable case) Consider control system (50) with

$$\begin{aligned} A=\left[ \begin{array} [c]{ccc} -\,0.9 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad -\,0.1 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 0.8 \end{array} \right] {\text { and }}B=\left[ \begin{array} [c]{c} -\,2.5\\ -\,3.5\\ 4.5 \end{array} \right] . \end{aligned}$$
(57)

With the input \(u(k)=\sin (k)+\cos (k)\) and 100 points, one computes the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,0.9000 &{}\quad 1.0000 &{}\quad 0.0000\\ 0.0000 &{}\quad -\,0.1000 &{}\quad 1.0000\\ -\,0.0000 &{}\quad -\,0.0000 &{}\quad 0.8000 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} -\,2.5000\\ -\,3.5000\\ 4.5000 \end{array} \right] . \end{aligned}$$
(58)

Here cross validation gives the regularization parameters \(\gamma _{i}=0.1526\,\times \,10^{-4}\) for \(i=1,\ldots ,4\). For some realization of perturbations \(\eta _{i}\) with amplitude 0.1, one computes the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,0.9047 &{}\quad 0.9984 &{}\quad -\,0.0029\\ -\,0.0047 &{}\quad -\,0.1016 &{}\quad 0.9971\\ -\,0.0048 &{}\quad -\,0.0018 &{}\quad 0.7971 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} -\,2.5326\\ -\,3.5321\\ 4.4661 \end{array} \right] . \end{aligned}$$
(59)

Here cross validation gives \(\gamma _{1}=9.7656\,\times \,10^{-4}\), \(\gamma _{2}=9.7656\,\times \,10^{-4}\), \(\gamma _{3}=1.5259\,\times \,10^{-5}\), \(\gamma _{4}=4\).

Example 11

(Three dimensional unstable case) Consider control system (50) with

$$\begin{aligned} A=\left[ \begin{array} [c]{ccc} -\,20 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 1 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 20 \end{array} \right] {\text { and }}B=\left[ \begin{array} [c]{c} 1\\ 2\\ 3 \end{array} \right] . \end{aligned}$$
(60)

The input \(u(k)=\sin (k)+\cos (k)\) and 100 points give the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,19.9945 &{}\quad 1.0009 &{}\quad -\,0.0137\\ 0.0013 &{}\quad 0.9995 &{}\quad 0.9919\\ 0.0155 &{}\quad -\,0.0171 &{}\quad 19.7835 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} 0.9898\\ 1.9898\\ 2.9333 \end{array} \right] . \end{aligned}$$
(61)

Here cross validation yields the regularization parameters \(\gamma _{i}=0.8882\,\times \,10^{-15}\) for \(i=1,\ldots ,4\). For some realization of perturbations \(\eta _{i}\) with amplitude \(10^{-4}\), one computes the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,20.0000 &{}\quad 0.9334 &{}\quad -\,0.0058\\ -\,0.0008 &{}\quad 0.9382 &{}\quad 0.9939\\ -\,0.0008 &{}\quad -\,0.0590 &{}\quad 19.9937 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} 0.9819\\ 1.9814\\ 2.9811 \end{array} \right] . \end{aligned}$$
(62)

Here cross validation gives \(\gamma _{1}=\gamma _{2}=0.2384\,\times \,10^{-6}\), \(\gamma _{3}=\gamma _{4}=0.0596\,\times \,10^{-6}\).

These results show that algorithm \({\mathcal {A}}\) works quite well in these cases.

5 Stabilization via linear-quadratic optimal control

A basic problem for linear control systems is stabilization by state feedback. A standard method is to use linear quadratic optimal control, where the feedback is computed using the solution of an algebraic Riccati equation. In this section, we propose to replace the system matrices A and B in the algebraic Riccati equation by the estimates \(\hat{A}\) and \(\hat{B}\) obtained by learning theory.

The linear quadratic optimal control problem has the following form:

Minimize over all input sequences u

$$\begin{aligned} J_{\infty }(x_{0};u)=\sum _{k=0}^{\infty }\left[ x(k)^{\top }Qx(k)+u(k)^{\top }Ru(k)\right] \end{aligned}$$
(63)

with \(x(\cdot )\) given by

$$\begin{aligned} {x}(k+1)=Ax(k)+Bu(k),\ k\ge 0,\ x(0)=x_{0}; \end{aligned}$$
(64)

here \(Q\in {\mathbb {R}}^{n\times n}\) is positive semidefinite and \(R\in {\mathbb {R}}^{m\times m}\) is positive definite, and \(A\in {\mathbb {R}}^{n\times n},B\in {\mathbb {R}}^{n\times m}\).

Consider the discrete-time algebraic Riccati equation (DARE)

$$\begin{aligned} A^{\top }\left( P-PB(R+B^{T}PB)^{-1}B^{\top }P\right) A+Q=P. \end{aligned}$$
(65)

Obviously, every solution P is positive semi-definite. We cite the following theorem from [1].

Theorem Suppose that for every \(x_{0}\in {\mathbb {R}}^{n}\) there is an input u, such that \(J(x_{0},u)<\infty\). Then the following holds:

  1. (i) There is a unique solution P of the DARE.

  2. (ii) For every \(x_{0}\in {\mathbb {R}}^{n}\) one has \(J^{*}(x_{0}):=\inf \{J(x_{0},u):u{\text { an input}}\}=x_{0}^{\top }Px_{0}\), and there is a unique optimal input \(u^{*}\) with \(J^{*}(x_{0})=J(x_{0},u^{*})\). This optimal input is generated by the feedback \(F=(R+B^{\top }PB)^{-1}B^{\top }PA\) and

$$\begin{aligned} u(k)=-Fx(k),k\ge 0. \end{aligned}$$
(66)

In particular, the feedback F stabilizes the system, i.e., \({x} (k+1)=(A-BF)x(k)\) is stable.

Now we use the estimates \(\hat{A}\) and \(\hat{B}\) (obtained by kernel methods) instead of A and B in the algebraic Riccati equation, obtain the solution \(\hat{P}\), and form the corresponding feedback \(\hat{F}=(R+\hat{B}^{\top }\hat{P}\hat{B})^{-1}\hat{B}^{\top }\hat{P}\hat{A}\). Will the feedback \(u=-\hat{F}x\) also stabilize the true system, i.e., is the following system stable:

$$\begin{aligned} {x}(k+1)=(A-B\hat{F})x(k)? \end{aligned}$$
(67)
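A sketch of this stabilization test (Python/SciPy; `scipy.linalg.solve_discrete_are` solves the DARE (65), while the helper names and the choice of Q and R are ours):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_feedback(A, B, Q, R):
    """Solve the DARE (65) and return the feedback F from the theorem above."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def closed_loop_stable(A_true, B_true, F):
    """Check whether the closed loop A - B F in (67) is stable."""
    return bool(np.all(np.abs(np.linalg.eigvals(A_true - B_true @ F)) < 1.0))

# Typical use: F_hat = lqr_feedback(A_hat, B_hat, Q, R) with the estimates from
# Sect. 4, followed by closed_loop_stable(A, B, F_hat) on the true system.
```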

Example 12

Consider the one-dimensional system \(x(k+1)=-\,0.9x(k)+3.5u\) in Example 9. In the absence of noise, we get \(\hat{A}=-\,0.9\) and \(\hat{B}=3.5\). We have that \(A-B\hat{F}=\hat{A}-\hat{B}\hat{F}=-\,0.0643\). When there is noise of amplitude 0.1, we get that \(\hat{A}=-\,0.9002\) and \(\hat{B}=3.4929\) and \(A-B\hat{F}=-\,0.0643\) while \(\hat{A}-\hat{B}\hat{F}=-\,0.0610\). Hence, the controller improves stability.

Example 13

Consider control system (50) with

$$\begin{aligned} A=\left[ \begin{array} [c]{ccc} -\,0.9 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad -\,0.1 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 0.8 \end{array} \right] {\text { and }}B=\left[ \begin{array} [c]{c} -\,2.5\\ -\,3.5\\ 4.5 \end{array} \right] . \end{aligned}$$
(68)

As illustrated in Example 10, without noise we get excellent approximations of A and B. For both cases, the set of eigenvalues of the closed-loop system is \(\{-\,0.6172,0.4049,-\,0.0018\}\). With a noise of maximal amplitude 0.1, the estimates \(\hat{A}\) and \(\hat{B}\) are given in Example 10. For the feedback system one finds

$$\begin{aligned} \text{ spec }(\hat{A}-\hat{B}\hat{F})&=\{-\,0.6204,0.4053,-\,0.0018\},\\ \text{ spec }(A-{B}\hat{F})&=\{-\,0.6240,-\,0.0062,0.4111\}. \end{aligned}$$

In this example the feedback based on the estimate also stabilizes the original system.

Example 14

Consider control system (50) with

$$\begin{aligned} A=\left[ \begin{array} [c]{ccc} -\,20 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 1 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad 20 \end{array} \right] {\text { and }}B=\left[ \begin{array} [c]{c} 1\\ 2\\ 3 \end{array} \right] . \end{aligned}$$
(69)

As Example 11 illustrates, without noise we get excellent approximations of A and B. For the feedback system one finds

$$\begin{aligned} \text{ spec }(\hat{A}-\hat{B}\hat{F})&=\{0.1994,0.0483,-0.0501\},\\ \text{ spec }(A-{B}\hat{F})&=\{-0.1234\pm 2.0777i,0.5279\}. \end{aligned}$$

When there is noise of amplitude \(10^{-4}\), one computes the estimates

$$\begin{aligned} \hat{A}=\left[ \begin{array} [c]{ccc} -\,19.9805 &{}\quad 0.7484 &{}\quad 0.0135\\ -\,0.0062 &{}\quad 0.7969 &{}\quad 1.0107\\ -\,0.0229 &{}\quad 0.9851 &{}\quad 19.6776 \end{array} \right] {\text { and }}\hat{B}=\left[ \begin{array} [c]{c} 1.0194\\ 2.0114\\ 2.6673 \end{array} \right] . \end{aligned}$$
(70)

These are poor approximations of A and B. Furthermore, for the feedback system one finds

$$\begin{aligned} \text{ spec }(\hat{A}-\hat{B}\hat{F})&=\{0.1929,0.0477,-0.0501\},\\ \text{ spec }(A-{B}\hat{F})&=\{1.4510\pm 3.0103i,-2.5232\}. \end{aligned}$$

Thus the stabilizing controller for the approximate system does not stabilize the true system.

6 Conclusions

This paper has introduced the algorithm \({\mathcal {A}}\) based on kernel methods to identify a stable linear dynamical system from a time series. The numerical experiments give excellent results in the absence of noise and structural perturbations. In the presence of noise and structural perturbations the algorithm works well in the stable case. In the unstable case, a modified algorithm works quite well in the presence of noise but cannot handle structural perturbations.

Then we have extended algorithm \({\mathcal {A}}\) to identify linear control systems. In particular, we have used estimates obtained by kernel methods to stabilize linear systems via linear-quadratic optimal control and the algebraic Riccati equation. Here the numerical experiments indicate that the same conclusions on the applicability of the algorithm hold.

Extensions of the considered algorithms to nonlinear systems appear feasible and are left to future work.