1 Introduction

A finite impulse response (FIR) model is a reasonable choice for identifying stable linear time-invariant discrete-time systems, since it is linear in its parameters and can approximate any such system by increasing the number of parameters (see e.g., Sect. 3.3 of [1]). The main drawback of the FIR model is overfitting: when the number of data is small, the least squares estimate is significantly affected by noise. To avoid overfitting, kernel regularization methods have been proposed and have attracted much attention in recent years [1,2,3,4,5]. Kernel regularization methods estimate the FIR parameters through regularized least squares, where the regularization term is designed based on a priori knowledge of the system, e.g., stability [1, 2], McMillan degree [6], relative degree [7, 8], or frequency response [9, 10].

One of the important challenges in kernel regularization is the identification of unstable systems. In general, an FIR model is not suitable for unstable systems because the impulse response of an unstable system diverges; for the same reason, kernels designed for stable systems [11, 12] are not appropriate for unstable systems.

This paper proposes a simple and effective way to identify unstable systems with kernel regularization. The main idea of the proposed method comes from the joint input–output identification approach (see e.g., Sect. 13.5 of [13]). We consider a closed-loop system stabilized by a controller, and first identify two stable transfer functions, one from the reference to the input and one from the reference to the output. The unstable target system is then estimated as the ratio of these two transfer functions. Note that the two transfer functions are stable if the controller stabilizes the target system, and thus kernel regularization is applicable.

The advantages of the proposed method compared to the typical joint input–output identification are the following.

  • Compared to parametric approaches of joint input–output identification, the exact knowledge on the structure of the target transfer function is not required.

  • Compared to the case where two transfer functions are estimated by the least squares with FIR model, kernel regularization improves identification accuracy.

A preliminary version of this paper was presented at the SICE Annual Conference 2022 [14]. The main differences between the preliminary version and this paper are as follows: (1) the main part of the paper (Sect. 4) is significantly revised, (2) the target in the numerical simulation is changed to a more complicated one, and (3) a practical experiment is added.

Notation   The set of real numbers is denoted by \({\mathbb {R}}\). The transpose of a matrix A is denoted by \(A^\textrm{T}\). In this paper, z denotes the complex frequency in the z-transform. The discrete convolution between two time series \(g_t\) and \(u_t\) (\(g_t=u_t=0\) for \(t<0\)), i.e., \(\sum _{i=0}^{\infty } g_i u_{t-i}\), is denoted by \(g_t * u_t\). We also use this notation for truncated time series; if \(g_t=0\) for \(t\ge m\), then \(g_t*u_t=\sum _{i=0}^{m-1} g_i u_{t-i}\).

2 Problem setting

This paper focuses on a single-input-single-output linear discrete-time system G(z). The system G(z) can be unstable, so there may be a pole of G(z) outside the unit circle in the complex plane. We consider the closed-loop setting as illustrated in Fig. 1.

Fig. 1 Block diagram of closed-loop system

Here, \(u_t, y_t, r_t\) and \(\varepsilon _t\) denote the input, output, reference and observation noise at time step t, respectively. The controller K(z) is assumed to stabilize the system, i.e., the poles of \(\frac{G(z)K(z)}{1+G(z)K(z)}\) lie inside the unit circle.

Based on these assumptions, this paper discusses the following problem.

Problem 1

Assume that the data \(\{(r_t, u_t, y_t)\}_{t=0}^{N-1}\) generated from the system illustrated in Fig. 1 are given. Estimate the frequency response of the system, i.e., \(G(e^{ {\mathrm i}\omega })\).

Remark 1

This paper assumes \(r_t=0\) for \(t<0\) for ease of notation. The result can be extended to the case with \(r_t\ne 0\) for \(t<0\) by revising the matrix R defined later in (9).

3 Brief introduction of kernel regularization

This section briefly summarizes kernel regularization methods.

Fig. 2 Input–output relationship in kernel regularization

Since kernel regularization is employed in the open-loop experiment setting, we consider the system illustrated in Fig. 2. To avoid confusion with the closed-loop signals, we use separate notation, \(u^o_t, y^o_t,\) and \(\varepsilon _t^o\), for the input, output, and noise at time step t, respectively. The target to be estimated is denoted by H(z), its impulse response is denoted by \(h_t\), and we assume \(h_t =0\) for \(t\ge m\). With this notation, the input–output relationship is given by the discrete convolution

$$\begin{aligned} y^o_t = h_t*u^o_t +\varepsilon _t^o = \textstyle \sum \limits _{i=0}^t h_i u^o_{t-i} + \varepsilon _t^o. \end{aligned}$$
(1)

The goal of kernel regularization is to estimate \(h_0, \ldots , h_{m-1}\) from \(\{(u_t^o, y_t^o)\}_{t=0}^{N-1}\).

Let

$$\begin{aligned} U=&\begin{bmatrix} u^o_0&{} &{} &{} \\ u^o_1&{}u^o_0&{} &{}{\text {0}} \\ \vdots &{}\vdots &{}\ddots &{} \\ u^o_{N-1}&{}u^o_{N-2}&{}\cdots &{} u^o_{N-m} \end{bmatrix} \in {\mathbb {R}}^{N\times m}, \end{aligned}$$
(2)
$$\begin{aligned} y=&\begin{bmatrix} y^o_0&y^o_1&\cdots&y^o_{N-1} \end{bmatrix}^\textrm{T} \in {\mathbb {R}}^{N}. \end{aligned}$$
(3)

The estimate of impulse response

$$\begin{aligned} \hat{h}=\begin{bmatrix} \hat{h}_0&\cdots&\hat{h}_{m-1} \end{bmatrix}^\textrm{T} \in {\mathbb {R}}^m, \end{aligned}$$
(4)

by kernel regularization is then given as follows.

$$\begin{aligned} \hat{h} =&\mathop {\textrm{argmin}}\limits _{h}\ \Vert y-Uh\Vert ^2+h^\textrm{T}K^{-1}h \nonumber \\ =&\left( U^\textrm{T}U+K^{-1}\right) ^{-1}U^\textrm{T}y. \end{aligned}$$
(5)

Here, \(K \in {\mathbb {R}}^{m\times m}\) is a positive definite matrix whose (i, j) element is given by a bivariate function k(i, j). This function k is called a kernel function, and an appropriate design of the kernel function improves the identification accuracy significantly.

The three most widely used kernels are the tuned-correlated (TC) [1], diagonal-correlated (DC) [1], and stable-spline (SS) [2] kernels, defined as

$$\begin{aligned} k_\textrm{TC}(i,j) =&\beta \alpha ^{\max (i,j) }, \ \beta >0,~ 0<\alpha <1, \end{aligned}$$
(6)
$$\begin{aligned} k_\textrm{DC}(i,j) =&\beta \alpha ^{\frac{i+j}{2}} \rho ^{|i-j|}, \ \beta >0, ~0<\alpha<1, ~|\rho |<1, \end{aligned}$$
(7)
$$\begin{aligned} k_\textrm{SS}(i,j) =&\beta \left( \frac{\alpha ^{i+j+\max (i,j)}}{2}-\frac{\alpha ^{3\max (i,j)}}{6}\right) , \nonumber \\&\beta >0, ~0<\alpha <1. \end{aligned}$$
(8)

Here, the pair \((\beta , \alpha )\) (or the triplet \((\beta , \alpha , \rho )\)) is called the hyperparameter.

It should be noted that these kernels are derived for stable systems, i.e., exponentially converging impulse response. Therefore, these kernels are not applicable to unstable systems in a straightforward manner.
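The estimate (5) with the TC kernel (6) can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names, the zero-based kernel indices, and the synthetic data are assumptions made for the example. The solve step uses the identity \((U^\textrm{T}U+K^{-1})^{-1}U^\textrm{T}y=(KU^\textrm{T}U+I)^{-1}KU^\textrm{T}y\), so the kernel matrix is never inverted explicitly.

```python
import numpy as np


def tc_kernel(m, alpha, beta):
    """TC kernel matrix with entries beta * alpha**max(i, j), cf. (6).

    Indices run over 0, ..., m-1 here; this indexing is an assumption.
    """
    idx = np.arange(m)
    return beta * alpha ** np.maximum.outer(idx, idx)


def input_matrix(u, m):
    """N x m lower-triangular Toeplitz matrix of (2), with u_t = 0 for t < 0.

    Requires m <= len(u).
    """
    u = np.asarray(u, dtype=float)
    n = len(u)
    U = np.zeros((n, m))
    for j in range(m):
        U[j:, j] = u[: n - j]
    return U


def regularized_fir(u, y, K):
    """Regularized least squares estimate (5).

    Uses (U^T U + K^{-1})^{-1} U^T y = (K U^T U + I)^{-1} K U^T y,
    which avoids inverting the kernel matrix.
    """
    m = K.shape[0]
    U = input_matrix(u, m)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(K @ U.T @ U + np.eye(m), K @ U.T @ y)


# Toy usage with synthetic data (all numbers below are illustrative).
rng = np.random.default_rng(0)
h_true = 0.8 ** np.arange(20)          # a stable, exponentially decaying response
u = rng.standard_normal(60)
y = np.convolve(u, h_true)[:60] + 0.1 * rng.standard_normal(60)
h_hat = regularized_fir(u, y, tc_kernel(20, alpha=0.8, beta=1.0))
print(np.linalg.norm(h_hat - h_true) / np.linalg.norm(h_true))
```

The printed value is the relative estimation error for this particular noise realization.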

4 Proposed method

Let us consider the transfer functions from \(r_t\) to \(u_t\) and \(y_t\), namely \(G_{ru}(z)=\frac{K(z)}{1+G(z)K(z)}\) and \(G_{ry}(z)=\frac{G(z)K(z)}{1+G(z)K(z)}\). The target system G(z) is then computed from these two stable transfer functions as \(G(z)=\frac{G_{ry}(z)}{G_{ru}(z)}\). This idea is used in system identification: we can estimate G(z) by estimating the two transfer functions \(G_{ru}(z)\) and \(G_{ry}(z)\). Such an identification method is called joint input–output identification (see e.g., Sect. 13.5 of [13]).

Typical difficulties of joint input–output identification are the following:

  • The structure of G(z) is unknown in general. Therefore, if we employ a parametric approach, selecting the structures of \(G_{ru}(z)\) and \(G_{ry}(z)\) is itself a difficult task.

  • If we employ a nonparametric approach, overfitting arises, especially when the amount of data is insufficient.

To solve these difficulties, this paper proposes to connect joint input–output identification and kernel regularization. Note that the transfer functions \(G_{ru}(z)\) and \(G_{ry}(z)\) are stable since K(z) stabilizes the closed-loop system, thus kernel regularization is available.

Let \(g^{ru}_t\) and \(g^{ry}_t\) be impulse responses of \(G_{ru}(z)\) and \(G_{ry}(z)\), respectively. This paper proposes the following procedure to identify the unstable system G(z).

  1. Step 1

    Select a sufficiently large natural number \(m\le N\) such that \(g^{ru}_t\) and \(g^{ry}_t\) can be truncated for \(t\ge m\).

  2. Step 2

    Estimate \(g^{ru}_t\) and \(g^{ry}_t\) (\(0\le t \le m-1\)) by a kernel regularization method. In more detail, let

$$\begin{aligned} R={\begin{bmatrix} r_{0} &{} &{} &{} \\ r_1 &{} r_{0}&{} &{}{\text {0}} \\ \vdots &{}&{}\ddots \\ r_{N-1}&{}r_{N-2} &{} \cdots &{} r_{N-m} \end{bmatrix} \in {\mathbb {R}}^{N\times m}. } \end{aligned}$$
(9)

Then,

$$\begin{aligned} \begin{bmatrix} \hat{g}^{ru}_0 \\ \hat{g}^{ru}_1 \\ \vdots \\ \hat{g}^{ru}_{m-1} \end{bmatrix}=&\left( R^\textrm{T}R + K_{ru}^{-1}\right) ^{-1}R^\textrm{T} \begin{bmatrix} u_0 \\ u_1 \\ \vdots \\ u_{N-1} \end{bmatrix},\end{aligned}$$
(10)
$$\begin{aligned} \begin{bmatrix} \hat{g}^{ry}_0 \\ \hat{g}^{ry}_1 \\ \vdots \\ \hat{g}^{ry}_{m-1} \end{bmatrix}=&\left( R^\textrm{T}R + K_{ry}^{-1}\right) ^{-1}R^\textrm{T} \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{bmatrix}, \end{aligned}$$
(11)

where \(K_{ru} \in {\mathbb {R}}^{m\times m}\) and \(K_{ry} \in {\mathbb {R}}^{m\times m}\) are kernel matrices with respect to \(g^{ru}_t\) and \(g^{ry}_t\), respectively, whose (i, j) element is defined by k(i, j).

Remark 2

Hyperparameters for \(K_{ru}\) and \(K_{ry}\) can be different. How to tune these hyperparameters is discussed later.

  3. Step 3

    Compute the discrete-time Fourier transform of \(\hat{g}^{ru}_t\) and \(\hat{g}^{ry}_t\):

    $$\begin{aligned} \hat{G}_{ru}(e^{{\mathrm i}\omega })=&\textstyle \sum \limits _{j=0}^{m-1} \hat{g}^{ru}_j e^{-{\mathrm i}j \omega }, \end{aligned}$$
    (12)
    $$\begin{aligned} \hat{G}_{ry}(e^{{\mathrm i}\omega })=&\textstyle \sum \limits _{j=0}^{m-1} \hat{g}^{ry}_j e^{-{\mathrm i}j \omega }. \end{aligned}$$
    (13)

Construct \(\hat{G}(e^{{\mathrm i}\omega })\) by \(\frac{\hat{G}_{ry}(e^{{\mathrm i}\omega })}{\hat{G}_{ru}(e^{{\mathrm i}\omega })}\).
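Step 3 can be illustrated with a small NumPy sketch. The function names and the toy plant are assumptions chosen for the example: with \(G(z)=\frac{1}{z-1.5}\) (unstable) and \(K(z)=1\), the closed-loop transfer functions are \(G_{ry}(z)=\frac{1}{z-0.5}\) and \(G_{ru}(z)=\frac{z-1.5}{z-0.5}\), whose impulse responses are known in closed form. The sketch uses these exact (truncated) impulse responses in place of estimates, only to show that the ratio of two stable FIR frequency responses recovers the unstable plant.

```python
import numpy as np


def fir_frequency_response(g, omega):
    """Evaluate sum_j g_j * exp(-i j omega) for each frequency in omega (rad/sample)."""
    g = np.asarray(g, dtype=float)
    j = np.arange(len(g))
    return np.exp(-1j * np.outer(omega, j)) @ g


def plant_frequency_response(g_ru, g_ry, omega):
    """Ratio of the two FIR frequency responses, i.e. the estimate of G(e^{i omega})."""
    return fir_frequency_response(g_ry, omega) / fir_frequency_response(g_ru, omega)


# Toy example (assumed for illustration): G(z) = 1/(z - 1.5), K(z) = 1.
m = 60
t = np.arange(1, m)
g_ry = np.concatenate(([0.0], 0.5 ** (t - 1)))      # impulse response of 1/(z - 0.5)
g_ru = np.concatenate(([1.0], -(0.5 ** (t - 1))))   # impulse response of (z - 1.5)/(z - 0.5)

omega = np.linspace(0.1, np.pi, 50)
G_hat = plant_frequency_response(g_ru, g_ry, omega)
G_true = 1.0 / (np.exp(1j * omega) - 1.5)
print(np.max(np.abs(G_hat - G_true)))
```

In practice the division is only meaningful at frequencies where \(\hat{G}_{ru}(e^{{\mathrm i}\omega })\) is not close to zero.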

The main advantages of the proposed method are as follows:

  • The structure of G(z) need not be known.

  • Thanks to the kernel regularization, the identification accuracy is improved compared to the least squares estimate, especially when the amount of data is limited.

The remaining problem is how to tune the hyperparameter for \(K_{ru}\) and \(K_{ry}\). To this end, the effect of observation noise \(\varepsilon _t\) should be noted. From Fig. 1, the relations between \(r_t\) and \(u_t, y_t\) are given as

$$\begin{aligned} u_t =&G_{ru} r_t - \frac{K}{1+KG}\varepsilon _t, \end{aligned}$$
(14)
$$\begin{aligned} y_t =&G_{ry} r_t + \frac{1}{1+KG}\varepsilon _t \end{aligned}$$
(15)

with a slight abuse of notation (Footnote 1). This suggests that the observation noise appears as colored noise with unknown spectrum when we consider the relations between \(r_t\) and \(u_t, y_t\). Consequently, standard hyperparameter tuning methods such as empirical Bayes cannot be used, since they assume that the observation noise is white (Footnote 2).

Based on this, this paper employs validation data to tune the hyperparameter. In the following, we use \(\rho \) to denote the hyperparameter as a whole, i.e., the pair or triplet introduced in Sect. 3, and not only the DC kernel parameter in (7). The estimation of \(g_t^{ru}\) and \(g_t^{ry}\) including hyperparameter tuning is summarized as follows.

  1. Step 0

    Prepare two sets of observed data with the fixed controller as \(\{(r_t, u_t, y_t)\}_{t=0}^{N-1}\) and \(\{(\bar{r}_t, \bar{u}_t, \bar{y}_t)\}_{t=0}^{\bar{N}-1}\). Here, \(\bar{r}_t, \bar{u}_t, \bar{y}_t\) are the reference, input, and output signals used in the second experiment.

  2. Step 1

    Generate candidates of hyperparameter \(\{\rho _i\}_{i=1}^{N_{\rho }}\).

  3. Step 2

    Estimate the impulse response of \(G_{ru}(z)\) and \(G_{ry}(z)\) from the first data set with \(\rho _i\), and denote them by \(\hat{g}^{ru}_t(\rho _i)\) and \(\hat{g}^{ry}_t(\rho _i)\).

  4. Step 3

    Compute

    $$\begin{aligned} E_{ru}(\rho _i)=&\textstyle \sum \limits _{t=0}^{\bar{N}-1} \left( \bar{u}_t-\hat{g}^{ru}_t(\rho _i)*\bar{r}_t\right) ^2,\end{aligned}$$
    (16)
    $$\begin{aligned} E_{ry}(\rho _i)=&\textstyle \sum \limits _{t=0}^{\bar{N}-1} \left( \bar{y}_t-\hat{g}^{ry}_t(\rho _i)*\bar{r}_t\right) ^2. \end{aligned}$$
    (17)
  5. Step 4

    Select

    $$\begin{aligned} \rho _{ru}^*=\mathop {\textrm{argmin}}\limits _{\rho _i}\ E_{ru}(\rho _i), \end{aligned}$$
    (18)
    $$\begin{aligned} \rho _{ry}^*=\mathop {\textrm{argmin}}\limits _{\rho _i}\ E_{ry}(\rho _i) , \end{aligned}$$
    (19)

and return \(\hat{g}_t^{ru}(\rho _{ru}^*)\) and \(\hat{g}_t^{ry}(\rho _{ry}^*)\) as the estimates.
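Steps 1 to 4 above amount to a grid search scored on the second data set. The following NumPy sketch shows one way to organize it under stated assumptions: the helper names are hypothetical, the TC kernel is used with zero-based indices, the validation length is at least m, and the data are synthetic. The same routine would be called once with the input data (for \(G_{ru}\)) and once with the output data (for \(G_{ry}\)), so that the two hyperparameters are selected separately.

```python
import numpy as np


def toeplitz_matrix(r, m):
    """N x m lower-triangular Toeplitz matrix as in (9), with r_t = 0 for t < 0."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    R = np.zeros((n, m))
    for j in range(m):
        R[j:, j] = r[: n - j]
    return R


def tc_kernel(m, alpha, beta):
    """TC kernel matrix, entries beta * alpha**max(i, j) (zero-based indices assumed)."""
    idx = np.arange(m)
    return beta * alpha ** np.maximum.outer(idx, idx)


def estimate_fir(R, z, K):
    """Regularized estimate as in (10)/(11), written without inverting K."""
    m = K.shape[0]
    return np.linalg.solve(K @ R.T @ R + np.eye(m), K @ R.T @ z)


def tune_by_validation(r, z, r_val, z_val, m, candidates):
    """Return (validation error, best (alpha, beta), estimate) over the candidates."""
    R, R_val = toeplitz_matrix(r, m), toeplitz_matrix(r_val, m)
    z = np.asarray(z, dtype=float)
    z_val = np.asarray(z_val, dtype=float)
    best = None
    for alpha, beta in candidates:
        g = estimate_fir(R, z, tc_kernel(m, alpha, beta))
        # R_val @ g is the truncated convolution of g with the validation reference,
        # so this is the squared error of (16) or (17).
        err = float(np.sum((z_val - R_val @ g) ** 2))
        if best is None or err < best[0]:
            best = (err, (alpha, beta), g)
    return best


# Synthetic illustration: z plays the role of u_t (or y_t) driven by r_t.
rng = np.random.default_rng(1)
g_true = 0.7 ** np.arange(15)
r, r_val = rng.standard_normal(40), rng.standard_normal(40)
z = np.convolve(r, g_true)[:40] + 0.1 * rng.standard_normal(40)
z_val = np.convolve(r_val, g_true)[:40] + 0.1 * rng.standard_normal(40)
cands = [(a, b) for a in (0.5, 0.7, 0.9) for b in (0.01, 1.0, 100.0)]
err, rho_star, g_hat = tune_by_validation(r, z, r_val, z_val, 15, cands)
print(rho_star, err)
```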

Remark 3

There are several alternative approaches to the above hyperparameter tuning method such as two-fold cross validation, SURE, or generalized cross validation (see e.g., [15, 16]). As shown in the following examples, however, the above simple tuning method also works well.

5 Numerical example

To demonstrate the effectiveness of the proposed method, this section shows a numerical example with the magnetic levitation model used in [17]:

$$\begin{aligned} P(z)= \frac{-3.1854\times 10^{-4}(z+3.696)(z+0.2652)}{(z-1.069)(z-0.9608)(z-0.9355)}. \end{aligned}$$
(20)

A stabilizing controller for this system is selected as

$$\begin{aligned} K(z)= \frac{-14.124(z\!-\!0.9816)(z^2\!-\!1.907z\!+\!0.9109)}{(z\!-\!1)(z\!-\!0.3012)(z^2\!-\!1.266z\!+\!0.4824)}. \end{aligned}$$
(21)

The experiment length N and the FIR length m are both set to 255, and an M-sequence (maximum-length sequence) is used as the reference signal \(r_t\). The noise variance is selected so that the SNR becomes 20 dB. The TC kernel is employed as the kernel function, and the candidates of the hyperparameter are \(\{\alpha _i\}_{i=1}^{10} \times \{\beta _i\}_{i=1}^{10}\), where \(\alpha _i\) and \(\beta _i\) are logarithmically spaced from 0.8 to 0.99 and from \(10^{-4}\) to \(10^4\), respectively.
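The candidate grid described above can be generated as follows. This is a sketch under the assumption that "logarithmically spaced" means geometrically spaced between the stated endpoints, with both endpoints included; the variable names are illustrative.

```python
import itertools

import numpy as np

# 10 values of alpha, geometrically spaced between 0.8 and 0.99 (endpoints included).
alphas = np.geomspace(0.8, 0.99, 10)
# 10 values of beta, geometrically spaced between 1e-4 and 1e4.
betas = np.logspace(-4, 4, 10)

# Cartesian product: 100 candidate pairs (alpha, beta) for the TC kernel.
candidates = list(itertools.product(alphas, betas))
print(len(candidates))
```

Each pair is then scored on the validation data, separately for \(K_{ru}\) and \(K_{ry}\).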

Based on the above setting, we identified the system 30 times with independent noise realizations.

Fig. 3 Impulse response of \(G_{ru}(z)\)

Fig. 4 Impulse response of \(G_{ry}(z)\)

Figures 3 and 4 show the impulse responses of \(G_{ru}(z), G_{ry}(z)\) and their estimates. The horizontal and vertical axes show time and impulse response, respectively (Footnote 3). The 30 estimates are shown with gray lines, and the true ones are shown with red lines.

We have several observations from these figures.

  • The behavior of these impulse responses is complicated, so a parametric approach to joint input–output identification is not easy if we do not know the structure of P (Footnote 4).

  • Although the impulse responses behave in a complicated way, the estimates approximate the true ones well thanks to kernel regularization.

  • Convergence rates and scales of the impulse responses of \(G_{ru}(z)\) and \(G_{ry}(z)\) are different. Hence it is natural to tune hyperparameters of \(K_{ru}\) and \(K_{ry}\) separately.

Based on these estimates, we computed \(\hat{G}(e^{{\mathrm i}\omega })\).

Fig. 5 Estimated frequency responses with proposed method

Figure 5 shows the estimated Bode diagram. The horizontal axes show the frequency in rad/sample, and the vertical axes show the gain and phase, respectively. The red lines show the true responses, and the gray lines show the 30 estimated responses. We can confirm that the estimates approximate the true response well.

This result suggests the following advantages of the proposed method.

  • Although the proposed method is a nonparametric method whose number of parameters m equals the number of data N, the estimated result approximates the true response well, and thus overfitting is avoided.

  • An unstable system can be identified by the proposed method with kernel regularization, which was developed for stable systems.

  • Unlike [18], the periodicity of the reference signal is not required.

To show the effectiveness of the proposed method more clearly, we compare the result with a parametric approach to joint input–output identification. We employed the tfest function in the MATLAB System Identification Toolbox (ver 9.14). The numbers of poles of \(G_{ru}(z)\) and \(G_{ry}(z)\) are set to 6 and 7, and the numbers of zeros of \(G_{ru}(z)\) and \(G_{ry}(z)\) are set to 5 and 6, respectively. Note that these values are the true ones.

Fig. 6 Impulse response of \(G_{ru}\): parametric approach

Fig. 7 Impulse response of \(G_{ry}\): parametric approach

Figures 6 and 7 show the estimated impulse responses of \(G_{ru}(z)\) and \(G_{ry}(z)\) with the parametric approach, respectively. The definitions of axes and the lines are the same as Figs. 3 and 4. We can see that \(G_{ru}(z)\) is approximated well, but \(G_{ry}(z)\) is not well approximated.

Fig. 8 Estimated frequency responses with parametric approach

Figure 8 shows the estimated frequency response with the parametric approach. The definitions of axes and the lines are the same as Fig. 5. Since models are parametric ones, frequency responses are smooth. However, the identification accuracy becomes worse compared to the proposed method especially in the low frequency range.

Table 1 Mean and variance of Fit values

Table 1 shows the mean, variance, and median of Fit values of estimated impulse responses where Fit is defined by

$$\begin{aligned} 100 \times \left( 1-\frac{\sqrt{\sum \nolimits _{t=0}^{m-1}\left( \hat{g}^{\bullet }_t - g^{\bullet }_t\right) ^2}}{\sqrt{\sum \nolimits _{t=0}^{m-1}\left( g^{\bullet }_t\right) ^2}}\right) , \end{aligned}$$

where \(\hat{g}^{\bullet }_t\) and \(g^{\bullet }_t\) are estimated and true impulse responses, respectively. Here, \(\bullet \) indicates either ru or ry. This table indicates that the proposed method estimates the impulse responses with small variance compared to the parametric approach in this case. In particular, \(G_{ry}\) is much better estimated by the proposed method than the parametric approach.
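For concreteness, a Fit value of this kind can be computed as below. The sketch assumes that the error norm is normalized by the Euclidean norm of the true impulse response (some references instead subtract the mean of the true response in the denominator); the function name is hypothetical.

```python
import numpy as np


def fit_value(g_hat, g_true):
    """Fit in percent: 100 * (1 - ||g_hat - g_true|| / ||g_true||).

    The normalization by ||g_true|| is an assumption of this sketch.
    """
    g_hat = np.asarray(g_hat, dtype=float)
    g_true = np.asarray(g_true, dtype=float)
    return 100.0 * (1.0 - np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true))


# A perfect estimate gives 100, the all-zero estimate gives 0,
# and a poor estimate can give a large negative value.
print(fit_value([3.0, 4.0], [3.0, 4.0]))
print(fit_value([0.0, 0.0], [3.0, 4.0]))
print(fit_value([3.0, 1.0], [3.0, 4.0]))
```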

These results show the effectiveness of the proposed method clearly; even if the structure of the target system is known, the proposed method shows slightly better identification accuracy compared to the parametric approach. As mentioned above, the structure of the target system itself is often unknown in practice. Thus the above result clearly shows the importance of the proposed method.

Remark 4

Since \(N=m\) and the SNR is 20 (dB), the least squares method shows overfitting. Since the results cannot be plotted in the same range and the order of Fit becomes about \(10^{20}\), we omit the results with the least squares method.

6 Practical experiment

To demonstrate the effectiveness of the proposed method, we show a practical experiment with a DC motor in this section.

The target system is the Quanser QUBE Servo 2 shown in Fig. 9. The input is the voltage, and the angular velocity is chosen as the output. The sampling time is 0.002 s. We added a 15 g load to the motor and considered a model around the inverted position to make the system unstable.

Fig. 9 Quanser QUBE Servo 2 used in experiment

Fig. 10 Definition of angle \(\theta \)

To describe the experimental setup clearly, we define \(\theta \) as the angle of the motor measured from the upward direction, as shown in Fig. 10. With this notation, our goal is to model the behavior of \(\dot{\theta }\) around \(\theta =0\).

Fig. 11 Closed-loop setup in experiment

To achieve this, we employed the closed-loop system shown in Fig. 11. The block \(\frac{1}{z-1}\) represents an integrator. Although the output \(y_t\) is the angular velocity, the signal \(r_t\) gives a reference for the angle. Note that the proposed method can also be used in this setting since \(G_{ru}(z)\) and \(G_{ry}(z)\) become \(\frac{K(z)}{1+P(z)K(z)\frac{1}{z-1}}\) and \(\frac{P(z)K(z)}{1+P(z)K(z)\frac{1}{z-1}}\), respectively, and the relation \(G(z)=\frac{G_{ry}(z)}{G_{ru}(z)}\) still holds.
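The last claim can be checked directly, since both transfer functions share the same denominator and it cancels in the ratio. Writing \(P(z)\) for the target system (the role played by G(z) in Sect. 4) and assuming \(K(z)\) is not identically zero:

$$\begin{aligned} \frac{G_{ry}(z)}{G_{ru}(z)} = \frac{P(z)K(z)}{1+P(z)K(z)\frac{1}{z-1}} \cdot \frac{1+P(z)K(z)\frac{1}{z-1}}{K(z)} = P(z). \end{aligned}$$

The integrator therefore changes the two closed-loop transfer functions but not their ratio.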

Based on the above setting, we collected the input/output data. We first set \(\theta =\frac{1}{16}\pi \), and then set \(\theta =0\).

Fig. 12 Input signal used in experiment

Fig. 13 Output signal measured in experiment

We applied the proposed method with \(N=m=750\). The TC kernel is employed as the kernel function, and the candidates of the hyperparameters are the same as those used in the simulation. Figures 12 and 13 show the measured input and output together with the signals reproduced from \(\hat{g}_t^{ru}\) and \(\hat{g}^{ry}_t\), respectively. The solid lines show the measured signals, and the broken lines show the estimated ones. These results indicate that the estimated impulse responses reproduce the measured signals well.

Fig. 14 Estimated impulse response of \(G_{ru}(z)\) in experiment

Fig. 15 Estimated impulse response of \(G_{ry}(z)\) in experiment

Fig. 16 Estimated Bode diagram of Quanser QUBE Servo 2 with proposed method

Figures 14 and 15 show the estimated impulse responses of \(G_{ru}(z)\) and \(G_{ry}(z)\), respectively. Thanks to the kernel regularization, the estimated results seem to avoid overfitting.

Figure 16 shows the Bode diagram of the estimated target transfer function. The horizontal axis shows the frequency in Hz. The gain diagram suggests that the gain decays at about 20 dB/dec at high frequencies. This is reasonable since (stable) DC motors are often modeled as \(\frac{b}{s+a}\), where s denotes the complex frequency in the Laplace transform.
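The slope can be justified with a short calculation. This is a sketch that assumes the first-order continuous-time model above with \(a, b>0\) and frequencies well below the Nyquist frequency of the sampled system:

$$\begin{aligned} \left| \frac{b}{{\mathrm i}\omega +a}\right| = \frac{b}{\sqrt{\omega ^2+a^2}} \approx \frac{b}{\omega } \quad (\omega \gg a), \qquad 20\log _{10}\frac{b}{10\omega } - 20\log _{10}\frac{b}{\omega } = -20~\text {dB}. \end{aligned}$$

Hence a tenfold increase in frequency lowers the gain by about 20 dB in this range.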

From the above result, the effectiveness of the proposed method is experimentally validated.

7 Conclusion

This paper proposes an identification procedure for unstable systems with kernel regularization. In particular, the proposed method employs joint input–output identification framework, i.e., estimates two transfer functions from reference signal to input and output, and then constructs the model of system with these models. A numerical example is shown to demonstrate the effectiveness of the proposed method. In particular, the proposed method gives slightly better estimates even when the parametric approach is available, i.e., when the true structure of the system is known. When the structure is unknown, the proposed method would be much better than the parametric approach. In addition to the simulation, a practical experiment with a DC motor is also shown.

Estimation of noise model is one of the future tasks.