1 Introduction

A main limitation of traditional adaptive control theory is its dependence on a mathematical model of the controlled plant and on assumptions about the un-modeled dynamics. Designing an adaptive control system based only on the plant's input/output (I/O) data is therefore of great significance, both theoretically and in applications. In recent years, PID-type control techniques based on classical PID control, as well as applications of artificial neural networks to adaptive control [1], have been well developed. Although both can tackle control problems without first establishing a mathematical model of the plant under control, each suffers from its own limitations. In this paper, an adaptive data driven control method is proposed based upon a gradient approximation method, simultaneous perturbation stochastic approximation (SPSA) [2]. Simulation comparison tests have been conducted on typical discrete-time non-linear systems, covering both non-linear tracking problems and near-optimal control problems. The simulation results fully illustrate the effectiveness of the novel data driven control strategy.

The data driven control approach discussed here uses the system measurements to determine a controller without the need to estimate or assume a separate model of the system. The controller acts as a function, taking as input past and current system information and producing as output a control value to affect future system performance. The approach here is based on using a function approximation (FA) technique to construct the controller. Popular FAs include, for example, polynomials, multilayered feed-forward neural networks, splines, and trigonometric series. By results such as the well-known Stone–Weierstrass approximation theorem (e.g., Rudin [3]), it can be shown (e.g., Narendra and Parthasarathy [4]) that many of these techniques share the property that they can be used to approximate any continuous function to any degree of accuracy. Each FA technique tends to have advantages and disadvantages, some of which are discussed for control problems in Narendra and Parthasarathy [4], Lane et al. [5], and Spall [6].

Neural networks are a convenient means of representation because they are universal approximators that can learn from data by example [7] or reinforcement [8], either in batch or sequential mode. They can be easily trained to map multidimensional non-linear functions because of their parallel architecture. Other parametric structures, such as splines and wavelets, have become standard tools in regression and signal analysis involving input spaces with up to three dimensions [9–12]. However, much of univariate approximation theory does not generalize well to higher-dimensional spaces [13]. For example, the majority of spline-based solutions for multivariate approximation problems involve tensor product spaces that are highly dependent on the coordinate system of choice [14–16]. Neural networks can be used effectively for the identification and control of dynamical systems, mapping the input-output representation of an unknown system and, possibly, its control law [17, 18]. For example, they have been used in combination with an estimation-before-modeling paradigm to perform online identification of aerodynamic coefficients [19]. Derivative information is included in the training process, producing smooth and differentiable aerodynamic models that can then be used to design adaptive non-linear controllers.

In the past decade, intelligent control methods using function approximation, such as neural networks, fuzzy networks, and wavelet networks, have been proposed, opening a new avenue toward more generic solutions and better control performance. The most profound feature of these function approximation methods is that the nonparametric function f(x) is given a representation in a parameter space. Various structures based on neural networks, radial basis networks, wavelet networks, and hinging hyperplanes have been used as black-box non-linear dynamical models [20, 21] with the selected function approximation. The problem of function approximation is also central to the solution of differential equations. Neural networks can provide differentiable closed-analytic-form solutions that have very good generalization properties and are widely applicable [22]. Thus, in this paper, the function approximator is fixed as a neural network.

There are examples [23–27] of approaches that rely on controller FAs but introduce stronger modeling assumptions (e.g., deterministic systems or specific knowledge of how the controller enters the system dynamics) as a means of performing a stability analysis. However, for systems where only a flawed (if any) model is available, attempts at such analysis can lead to suboptimal (or worse) controllers and to faulty stability and controllability conclusions. It is such cases that are of interest here.

The remainder of the paper is organized as follows: Sect. 2 briefly reviews the SPSA algorithm employed in this study. Section 3 proposes our adaptive data driven control approach with convergence analysis. Section 4 provides simulation experiments and an analysis of their findings. Section 5 draws conclusions.

2 Simultaneous perturbation stochastic approximation (SPSA)

First, we give a brief review of the SPSA approach [2]. Consider the problem of finding a root θ of the gradient equation:

$$ g(\theta)\equiv\frac{\partial L(\theta)}{\partial \theta}=0 $$
(1)

for some differentiable loss function \(L:R^{p}\rightarrow R^{1}\). Letting \(\hat{\theta}_{k}\) denote the estimate for θ at the kth iteration, the Stochastic Approximation (SA) algorithm has the standard form:

$$ \hat{\theta}_{k+1}={\hat{\theta}}_{k}-{a_k} \hat{g}_k(\hat{\theta}_k) $$
(2)

where a k is the gain sequence satisfying certain well-known conditions [2]. Note the close relationship of (2) to the method of steepest descent, the difference being that in steepest descent g k (⋅) replaces \(\hat{g}_{k}(\cdot)\). Let Δ k ∈R p be a vector of p mutually independent mean-zero random variables {Δ k1,Δ k2,…,Δ kp }. Furthermore, let {Δ k } be a mutually independent sequence with Δ k independent of \(\hat{\theta}_{0}, \hat{\theta}_{1}, \ldots ,\hat{\theta}_{k}\). No assumptions regarding the specific type of distribution for Δ ki need to be made here. Consistent with the usual SA framework, noisy measurements of L(⋅) are available. In particular, at the design levels \(\hat{\theta}_{k}\pm c_{k}\varDelta_{k}\), with c k a positive scalar, let

$$ y_k^{(\pm)}=L(\hat{\theta}_k\pm c_k\varDelta_k)+\varepsilon_k^{(\pm)} $$
(3)

where \(\varepsilon_{k}^{(+)},\varepsilon_{k}^{(-)}\) represent the measurement noise terms that satisfy

$$ E\bigl(\varepsilon_k^{(+)}-\varepsilon_k^{(-)}\bigm|\mathcal{F}_k,\varDelta_k\bigr)=0 \quad \mbox{a.s. } \forall k $$
(4)

where \(\mathcal{F}_{k}=\{\hat{\theta}_{0},\hat{\theta}_{1},\ldots,\hat{\theta}_{k}\}\).

One form for the estimate of g(⋅) at the kth iteration is then

$$ \hat{g}_k(\hat{\theta}_k)=\frac{y_k^{(+)}-y_k^{(-)}}{2c_k}\bigl[\varDelta_{k1}^{-1},\varDelta_{k2}^{-1},\ldots,\varDelta_{kp}^{-1}\bigr]^T $$
(5)

Note that this estimate differs from the usual finite-difference approximation in that only two measurements (instead of 2p) are used. (The name simultaneous perturbation as applied to (5) arises from the fact that all elements of the \(\hat{\theta}_{k}\) vector are varied simultaneously.) Equation (2) is evaluated with \(\hat{g}_{k}(\cdot)\) as in (5), and can also be used with several (conditional on \(\mathcal{F}_{k}\)) independent simultaneous perturbation approximations averaged at each iteration [2].
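As an illustration, the two-measurement gradient estimate (5) and the update (2) can be sketched as follows; the gain decay exponents (0.602, 0.101) and the Bernoulli ±1 perturbation distribution are common choices from the SPSA literature, assumed here rather than prescribed by the text:

```python
import numpy as np

def spsa_minimize(loss, theta0, num_iter=500, a=0.1, c=0.1,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize `loss` with SPSA: only two loss evaluations per iteration."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(num_iter):
        a_k = a / (k + 1) ** alpha              # gain sequence a_k
        c_k = c / (k + 1) ** gamma              # perturbation magnitude c_k
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # mean-zero, symmetric
        # Two measurements instead of 2p, as in the simultaneous perturbation form (5).
        g_hat = (loss(theta + c_k * delta) - loss(theta - c_k * delta)) / (2.0 * c_k * delta)
        theta = theta - a_k * g_hat             # standard SA update (2)
    return theta
```

Note that all p components of θ are perturbed at once, so the cost per iteration is independent of the dimension p.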

3 Adaptive data driven control strategy

In this section, an adaptive data driven control strategy is proposed, together with a convergence analysis. The controller is fixed as a function approximator, and the novel control strategy is built upon the SPSA method [2]. This adaptive control approach does not require a mathematical model of the controlled plant to be established in advance, and through its cost function the controller takes on an integral structure.

3.1 Overview of approach to control without system model

Consider a general discrete-time state space system of the form:

$$ x_{k+1}=\phi_k(x_k,u_k,w_k),\qquad y_k=h_k(x_k,v_k) $$
(6)

where ϕ k (⋅) and h k (⋅) are generally unknown non-linear functions governing the system dynamics and measurement process, u k is the control input applied at time k to govern the system at time k+1, and w k and v k are noise terms (not necessarily serially independent or independent of each other). Based on the information contained in measurements and controls up to y k and u k−1, the goal is to choose a control u k so as to minimize some loss function related to the next measurement y k+1. Often, this loss function compares y k+1 against a target value t k+1, penalizing deviations between the two.

In the approach here, a function approximator (e.g., neural network, polynomial) will be used to produce the control u k . Consider a FA (function approximator) of fixed structure across time (e.g., the number of layers and nodes in a neural network is fixed) but allow for underlying parameters in the FA (e.g., connection weights in a neural network) to be updated. Since past information will serve as input to the control, this fixed structure requires that u k be based on only a fixed number of previous measurements and controls (in contrast to using all information available up to time k for each k).

3.2 Formulation of estimation problem for determining FA

Associated with the FA generating u k is a parameter vector θ k that must be estimated (e.g., the connection weights in a neural network). Recall the assumption that the FA’s structure is fixed across time; hence, the adaptive control problem of finding the optimum control at time k is equivalent to finding the θ k ∈ℜ p (p not a function of k) that minimizes some loss function L k (θ k ). Here, a common loss function, the one-step-ahead quadratic tracking error, is taken as an example:

$$ L_k(\theta_k)=E\bigl[(y_{k+1}-t_{k+1})^TA_k(y_{k+1}-t_{k+1})+u_k^TB_ku_k\bigr] $$
(7)

where A k and B k are positive semi-definite matrices reflecting the relative weight to put on deviations from the target and on the cost associated with larger values of u k . The approach also applies with non-quadratic and/or non-target-tracking loss functions. Such functions might arise, e.g., in constrained optimal control problems where the goal is to minimize some cost (without regard to a specific target value) and penalty functions are used for certain values of y k+1 and/or u k to reflect problem constraints. For convenience, the approach here will focus on the target-tracking problem exemplified by (7). Minimizing L k (θ k ) implies that, for each k, we seek the globally optimal \(\theta_{k}^{*}\) such that

$$ g_k(\theta_k)=\frac{\partial L_k}{\partial \theta_k}=\frac{\partial u_k^T}{\partial \theta_k}\cdot \frac{\partial L_k}{\partial u_k}=0 \quad \mbox{at } \theta_k=\theta_k^* $$
(8)

Since ϕ k (⋅) and h k+1(⋅) are incompletely known functions, the term \(\frac{\partial L_{k}}{\partial u_{k}}\), which involves the terms \(\frac{\partial h_{k+1}}{\partial x_{k+1}}\) and \(\frac{\partial \phi_{k}}{\partial u_{k}}\), is not generally computable. Hence, g k (θ k ) is not generally available. To illustrate this fact, consider a simple scalar deterministic version of system (6). Then, under the standard squared-error loss:

$$ \frac{\partial L_k}{\partial u_k}=2(x_{k+1}-t_{k+1})\frac{\partial \phi_k(x_k,u_k)}{\partial u_k} $$
(9)

Because gradient-descent-type algorithms are not generally feasible in the model-free setting, we consider the SPSA algorithm of the form in (2) to estimate {θ k }, where \(\hat{\theta}_{k}\) denotes the estimate at the given iteration, {a k } is a scalar gain sequence satisfying certain regularity conditions, and the gradient approximation is such that it does not require knowledge of ϕ k (⋅) and h k+1(⋅) in (6).
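For concreteness, a single-realization version of the one-step-ahead quadratic tracking loss discussed above can be written as follows; the weight matrices A and B passed in are example choices, not values fixed by the paper:

```python
import numpy as np

def tracking_loss(y_next, target, u, A, B):
    """Single-realization one-step-ahead quadratic tracking loss:
    (y_{k+1} - t_{k+1})^T A (y_{k+1} - t_{k+1}) + u_k^T B u_k."""
    dev = np.asarray(y_next, dtype=float) - np.asarray(target, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(dev @ A @ dev + u @ B @ u)
```

In practice the expectation in the loss is unavailable, so the controller update works with noisy single-realization values of exactly this kind.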

3.3 Adaptive data driven control strategy

In the data driven control strategy here, the mathematical model of the controlled plant is unknown and the controller is a function approximator (FA). FAs generally take two common forms: polynomials and neural networks. Each has its advantages and disadvantages. A polynomial approximator has a relatively simple structure and a clear physical meaning, as in the commonly used PID controller. Neural networks, by the famous Stone–Weierstrass approximation theorem, can theoretically approximate any continuous function to arbitrary precision, and thus can solve non-linear control problems effectively. Here, following [28], the FA is taken to be a multi-layer feed-forward neural network producing the control u k . The number of layers and the number of nodes in each layer are fixed in advance. The connection weights then form the control parameter θ, which is allowed to be updated. The whole structure is shown in Fig. 1.

Fig. 1
figure 1

Schematic of the data driven control

Suppose the “sliding window” of previous information available at time k, say I k , contains M previous measurements and N previous controls. Thus at time k, the input of neural network is

$$ I_k=\bigl[y(k),y(k-1),\ldots,y(k-M+1),u(k-1),u(k-2),\ldots,u(k-N),y_d(k)\bigr] $$
(10)

where u(k) is the output and y d (k) is the expected output. The whole structure of the control strategy can be seen in Fig. 2. A suitable choice of M and N should ensure the completeness of the information data while avoiding values that are too large, which would increase the complexity of the FA. How to decide an optimal structure of the FA (the number of hidden layers and nodes in the neural network) is also an important issue. Some researchers have conducted related research in this area [29–31], but it is not our focus in this paper. For convenience, we use neural networks with fixed structures here. Referring to [32], one non-linear hidden layer is shown to be enough for approximating a non-linear mapping, while using more than one hidden layer may significantly reduce the number of hidden neurons needed for the same task. Thus, in this paper, we fix the FA as a two-hidden-layer feed-forward neural network.
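A two-hidden-layer feed-forward controller of the kind just fixed can be sketched in NumPy; the layer sizes (an N 4,3,2,1 structure of the sort used in Sect. 4) and the weight-initialization scale are illustrative assumptions:

```python
import numpy as np

class ControllerNet:
    """Two-hidden-layer feed-forward controller mapping I_k to u(k).

    All connection weights and biases are packed into the flat vector
    `theta`, which plays the role of the control parameter to be updated.
    """

    def __init__(self, sizes=(4, 3, 2, 1), seed=0):
        rng = np.random.default_rng(seed)
        self.shapes = list(zip(sizes[:-1], sizes[1:]))
        n_params = sum(m * n + n for m, n in self.shapes)
        self.theta = rng.normal(scale=0.1, size=n_params)

    def __call__(self, x, theta=None):
        """Evaluate the network at input x with parameters theta (default: stored)."""
        th = self.theta if theta is None else theta
        h, i = np.asarray(x, dtype=float), 0
        for j, (m, n) in enumerate(self.shapes):
            W = th[i:i + m * n].reshape(m, n)
            b = th[i + m * n:i + m * n + n]
            i += m * n + n
            h = h @ W + b
            if j < len(self.shapes) - 1:
                h = np.tanh(h)  # non-linear hidden layers, linear output layer
        return h
```

Evaluating the network at θ ± c k Δ k requires only passing a perturbed copy of the flat parameter vector, which is what the SPSA update exploits.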

Fig. 2
figure 2

Structure of the control strategy

y(k) is the output of the controlled plant, while u(k) is its input control at time k. y d (k+1) is the expected output at time k+1. The aim is to find an optimized control parameter \(\theta^{*}_{k}\) that minimizes the objective function J.

In the data-based control strategy here, the parameter θ k is estimated by (2), where \({\hat{g_{k}}}({\hat{\theta}}_{k})\) is the simultaneous perturbation approximation to \({g_{k}}({\hat{\theta}}_{k})\). In particular the lth component of \({\hat{g_{k}}}({\hat{\theta}}_{k})\), l=1,2,…,p, is given by

$$ {\hat{g_{kl}}}({\hat{\theta}}_{k})=\frac{{{\hat{J}}^{(+)}_k}-{\hat{J}}^{(-)}_k}{ 2c_k{\varDelta}_{kl}} $$
(11)

\({{\hat{J}}^{(\pm)}_{k}}\) are estimated values of \(J_{k}(\hat{\theta}_{k}\pm c_{k}{\varDelta}_{k})\) using the observed \(y_{k+1}^{(\pm)}\) and \(u_{k}^{(\pm)}\). \(u_{k}^{(\pm)}\) are controls based on the parameter vector \(\theta_{k+1}={\hat{\theta}}_{k}\pm c_{k}{\varDelta}_{k}\), where Δ k =[Δ k1,Δ k2,…,Δ kp ]T, with the Δ kl being independent, bounded, identically distributed random variables, symmetrically distributed about 0, ∀k,l. {c k } is a sequence of positive numbers satisfying certain regularity conditions (typically c k →0 or c k =c ∀k, depending on whether the system equations are stationary or non-stationary).

According to [28], the choice of a k and c k is important for adequate performance of the algorithm (analogous to choosing the step-size in back-propagation). Choosing an a k too small may lead to an excessively slow reduction in error, while choosing it too large may cause the system to go unstable. Thus, it might be appropriate to begin with a relatively small a k and gradually increase it until there is an adequate convergence rate but little chance of going unstable. Inspired by this idea, an adaptive tuning method for a k is proposed here by defining

$$ a_k=\gamma /\bigl(1+\lambda/k^2\bigr) $$
(12)
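The adaptive gain rule (12) can be sketched directly; the defaults γ=0.005 and λ=0.05 are the values used later in the simulations (Sect. 4):

```python
def adaptive_gain(k, gamma=0.005, lam=0.05):
    """Adaptive gain a_k = gamma / (1 + lam / k**2) of Eq. (12).

    Starts near gamma/(1 + lam) at k = 1 and increases monotonically
    toward gamma as k grows, matching the begin-small-then-increase idea.
    """
    return gamma / (1.0 + lam / k**2)
```

Because λ/k² shrinks with k, the gain rises smoothly toward its ceiling γ, so the early iterations are conservative while later iterations retain a useful step size.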

The convergence proof of this newly proposed adaptive control method is given in the following subsection.

In the approach above, only two closed-loop tests are needed at each iteration step to obtain the estimate \({\hat{g}_{k}}({\hat{\theta}}_{k})\) of \({g_{k}}({\hat{\theta}}_{k})\). The model of the controlled plant is not needed during the whole control procedure. For clarity, the proposed control strategy can be summarized as follows:

Step 1 :

Give random values to θ and initialize the whole control system;

Step 2 :

Calculate the control signal u(k) from the neural network;

Step 3 :

Apply the control u(k) to the real-time non-linear plant under control and get its output y(k+1);

Step 4 :

Calculate \(\widehat{J}_{k}^{(\pm)}\) according to the definition of the objective function;

Step 5 :

Update θ k by the proposed adaptive method;

Step 6 :

Let k=k+1, repeat Step 2–Step 5.
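Steps 1–6 can be sketched end to end on a hypothetical scalar plant; the plant y(k+1)=y(k)/(1+y²(k))+u(k), the linear-in-parameters stand-in for the FA, and all gain values here are illustrative assumptions (the plant equation is used only to generate data, never inside the parameter update):

```python
import numpy as np

def run_control(T=500, gamma=0.05, lam=0.05, c0=0.2, seed=1):
    """Sketch of Steps 1-6: SPSA-tuned controller tracking y_d = 1."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.1, size=3)        # Step 1: random initialization
    y, y_d = 0.0, 1.0
    errs = []
    for k in range(1, T + 1):
        a_k = gamma / (1.0 + lam / k**2)         # adaptive gain, Eq. (12)
        c_k = c0 / k**0.101
        delta = rng.choice([-1.0, 1.0], size=3)
        feats = np.array([y, y_d, 1.0])          # Step 2: controller inputs
        # Steps 3-4: two closed-loop tests at theta +/- c_k * delta
        J = []
        for s in (1.0, -1.0):
            u_test = float((theta + s * c_k * delta) @ feats)
            y_test = y / (1.0 + y * y) + u_test  # plant response (data only)
            J.append((y_d - y_test) ** 2)        # squared tracking error
        # Step 5: SPSA parameter update, Eq. (11) plugged into Eq. (2)
        theta = theta - a_k * (J[0] - J[1]) / (2.0 * c_k * delta)
        # Step 6: apply the updated control and advance the plant
        u = float(theta @ feats)
        y = y / (1.0 + y * y) + u
        errs.append((y_d - y) ** 2)
    return theta, errs
```

Only the two closed-loop evaluations per step touch the plant; the update itself never uses the plant equations, which is the model-free character of the method.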

3.4 Convergence analysis

In this section, we present a Proposition that establishes conditions under which \(\hat{\theta}_{k}\) converges to θ ∗ almost surely. The error term is defined as

$$ e_k(\hat{\theta}_k)=\hat{g_k}(\hat{ \theta}_k)-E\bigl(\hat{g_k}(\hat{\theta}_k) \bigm|\hat{\theta}_k\bigr) $$
(13)

Then, defining the bias term \(b_{k}(\hat{\theta}_{k})\equiv E(\hat{g}_{k}(\hat{\theta}_{k})\mid\hat{\theta}_{k})-g_{k}(\hat{\theta}_{k})\), (2) can be rewritten as:

$$ \hat{\theta}_{k+1}=\hat{\theta}_k-a_k \bigl[g_k(\hat{\theta}_k)+b_k(\hat{ \theta}_k)+e_k(\hat{\theta}_k)\bigr] $$
(14)

First, we introduce the following assumptions:

  I

    Consider all k≥K for some K<∞. Suppose that for each such k, the Δ ki are i.i.d. (i=1,2,…,p) and symmetrically distributed about 0, with |Δ ki |≤α 0 a.s. and \(E|\varDelta_{ki}^{-1}|\leq \alpha_{1}\). For almost all \(\hat{\theta}_{k}\) (at each k≥K), suppose that ∀θ in an open neighborhood of \(\hat{\theta}_{k}\) that is not a function of k or ω, \(L^{(3)}(\theta)\equiv \partial^{3}L/\partial\theta^{T}\partial\theta^{T}\partial\theta^{T}\) exists continuously with individual elements satisfying \(|L^{(3)}_{i_{1}i_{2}i_{3}}(\theta)|\leq \alpha_{2}\) (L denotes the differentiable loss function).

  II

    c k >0 ∀k; c k →0 as k→∞; and \(\sum_{k=0}^{\infty}(\frac{a_{k}}{c_{k}})^{2} <\infty\).

  III

    For some α 0,α 1,α 2>0 and ∀k, \(E[\epsilon_{k}^{(\pm)}]^{2}\leq \alpha_{0}\), \(E[L(\hat{\theta}_{k}\pm c_{k}\varDelta_{k})]^{2} \leq \alpha_{1}\), and \(E [\varDelta_{kl}^{-2}] \leq \alpha_{2}~(l=1,2,\dots,p)\).

  IV

    \(\|\hat{\theta}_{k} \|<\infty\) a.s. ∀k.

  V

    θ is an asymptotically stable solution of the differential equation dx(t)/dt=−g(x).

  VI

    Let D(θ )={x 0:lim t→∞ x(t|x 0)=θ }, where x(t|x 0) denotes the solution to the differential equation of V based on initial conditions x 0 (i.e., D(θ ) is the domain of attraction). There exists a compact SD(θ ), such that \(\hat{\theta}_{k}\in S\) infinitely often for almost all sample points.

Proposition

Let Assumptions I–VI hold. Then as k→∞,

$$ \hat{\theta}_{k} \rightarrow \theta^*\quad \mbox{\emph{for almost all} } \omega\in \varOmega $$
(15)

Proof

Based on the theory given in [33, Lemma 2.2.1 and Theorem 2.3.11], we find that, given II, IV–VI, if

  (i)

    \(\|b_{k}(\hat{\theta}_{k})\|<\infty\ \forall k\), and \(b_{k}(\hat{\theta}_{k})\rightarrow 0\) a.s.

  (ii)

    \(\lim_{k\rightarrow{\infty}} P(\sup_{m\geq k} \| \sum _{i=k}^{m}a_{i}e_{i}(\hat{\theta}_{i}) \|\geq \eta)=0\) for any η>0 are satisfied, then (15) holds.  □

From [2, Lemma 1], we know that

$$ b_k(\hat{\theta}_k)=O\bigl(c_k^2\bigr) \quad \mbox{a.s.} $$
(16)

It can then be seen that (i) follows immediately from (16) and Assumption II.

For (ii), since here a k is defined as a k =γ/(1+λ/k 2) and \(\{\sum_{i=k}^{m} a_{i}e_{i}\}_{m\geq k}\) is a martingale sequence, we have from an inequality in [34, p. 315] (see also [33, p. 27])

$$ P\Bigl(\sup_{m\geq k}\Bigl\|\sum_{i=k}^{m}a_ie_i(\hat{\theta}_i)\Bigr\|\geq\eta\Bigr)\leq\frac{1}{\eta^2}\sum_{i=k}^{\infty}a_i^2E\bigl\|e_i(\hat{\theta}_i)\bigr\|^2 $$
(17)

As defined in (3) and (5), we have

$$ \hat{g}_{kl}(\hat{\theta}_k)=\frac{L(\hat{\theta}_k+c_k\varDelta_k)-L(\hat{\theta}_k-c_k\varDelta_k)+\varepsilon_k^{(+)}-\varepsilon_k^{(-)}}{2c_k\varDelta_{kl}} $$
(18)

Then:

$$ E\bigl[\hat{g}_{kl}(\hat{\theta}_k)\bigr]^2\leq\frac{1}{c_k^2}E\bigl[\bigl(L(\hat{\theta}_k+c_k\varDelta_k)^2+L(\hat{\theta}_k-c_k\varDelta_k)^2+\bigl(\varepsilon_k^{(+)}\bigr)^2+\bigl(\varepsilon_k^{(-)}\bigr)^2\bigr)\varDelta_{kl}^{-2}\bigr] $$
(19)

From Assumption III, we know that \(E[\epsilon_{k}^{(\pm)}]^{2} \leq \alpha_{0}\), \(E[L(\hat{\theta}_{k}\pm c_{k}\varDelta_{k})]^{2} \leq \alpha_{1}\), and \(E[\varDelta_{kl}^{-2}] \leq \alpha_{2}\ (l=1,2,\dots,p)\). Then we have

$$ E\bigl[\hat{g}_{kl}(\hat{\theta}_k)\bigr]^2\leq\frac{2(\alpha_0+\alpha_1)\alpha_2}{c_k^2} $$
(20)

As

$$ e_k(\hat{\theta}_{k})=\hat{g}_k(\hat{ \theta}_{k})-E\bigl(\hat{g}_k(\hat{\theta}_{k})| \hat{\theta}_{k}\bigr) $$
(21)

we have

$$ \bigl\|e_k(\hat{\theta}_{k})\bigr\|^2=\sum _{l=1}^{p}\bigl(\hat{g}_{kl}(\hat{\theta}_{k})-E\bigl(\hat{g}_{kl}(\hat{\theta}_{k})\bigm| \hat{\theta}_{k}\bigr)\bigr)^2 $$
(22)

Note that \(E(e_{i}^{T}e_{j})=E(e_{i}^{T}E(e_{j}|\hat{\theta}_{j}))=0,\ \forall i<j\). Now, for any l∈{1,2,…,p}, we have

$$ E\Bigl(\sum_{i=k}^{m}a_ie_{il}(\hat{\theta}_i)\Bigr)^2=\sum_{i=k}^{m}a_i^2E\bigl[e_{il}(\hat{\theta}_i)\bigr]^2\leq2(\alpha_0+\alpha_1)\alpha_2\sum_{i=k}^{\infty}\Bigl(\frac{a_i}{c_i}\Bigr)^2 $$
(23)

Now, from (17), (23), and Assumption II, (ii) has been shown, and the proof is completed.

4 Simulation results

In order to test the performance of the proposed adaptive data driven control strategy, simulation tests have been conducted on typical non-linear systems [35, 36]. The original SPSA-based model-free control method [6] is introduced for comparison.

4.1 Non-linear tracking problem

First, this novel approach is applied to solve non-linear tracking problems and the objective function is defined as one-step-ahead quadratic tracking error.

The output error is written as

$$ e_k(\theta_k)=y_d(k)-y(k) $$
(24)

And here, we fix the objective function J as:

$$ J_k(\theta_k)=E\bigl[e_k^2( \theta_k)\bigr] $$
(25)

Thus, the aim of our proposed method is to minimize the system output error. First, consider the following system:

(26)

The controller is fixed as a multi-layer neural network with two hidden layers, and an N 4,3,2,1 neural network is used for system (26) for testing. Decaying gain c k =0.15/k 0.101 is used in order to fulfill the requirements for convergence [28] and the parameter a k is fixed as: a k =0.005/(1+0.05/k 2) according to the proposed adaptive control strategy.

Simulation tests were conducted 50 independent times and the mean value of the output error Err is evaluated:

$$ \mathit{Err}=\Biggl(\,\sum_{k=1}^{P} \bigl(y(k)-y_{d}(k)\bigr)^2\Biggr)\bigg/P $$
(27)

where P is the number of running steps. The given signal is a step signal of amplitude 2 and the mean value of the output error is calculated. First, apply the original SPSA-based model-free control method [6] to the non-linear plant (26); the tracking result and its corresponding control input are shown in Fig. 3 and Fig. 4. Then, apply the adaptive data driven control strategy to the same non-linear plant; the results are shown in Fig. 5 and Fig. 6. The SPSA-based method and our proposed method were each run 50 times; the comparison of their average output errors on plant (26) is shown in Table 1.
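The error measure (27) is straightforward to compute from logged trajectories; a minimal sketch:

```python
import numpy as np

def mean_squared_tracking_error(y, y_d):
    """Err of Eq. (27): mean squared deviation over the P running steps."""
    y = np.asarray(y, dtype=float)
    y_d = np.asarray(y_d, dtype=float)
    return float(np.mean((y - y_d) ** 2))
```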

Fig. 3
figure 3

Performance of the original SPSA-based model-free control method on plant (26)

Fig. 4
figure 4

The corresponding control input

Fig. 5
figure 5

Performance of the adaptive data driven control strategy on plant (26)

Fig. 6
figure 6

The corresponding control input

Table 1 Comparison results on plant (26) out of 50 runs

From the above comparison results, it can be seen that the newly proposed adaptive data driven control strategy achieves a much smaller output error and much better tracking results. To further test the performance of the proposed adaptive control strategy, another two typical non-linear systems, (28) and (29), are introduced:

$$ y(k+1)=\frac{y(k)}{1+y^2(k)}+u(k)+5u(k-1) $$
(28)
$$ y(k+1)=\frac{y(k)y(k-1)(y(k)-2.5)}{1+y^2(k)+y^2(k-1)}+u(k) $$
(29)

For system (28), an N 3,3,3,1 neural network is used as controller and the control parameters a k and c k are fixed the same as for system (26).

First, apply the SPSA-based model-free control strategy [6] to the non-linear plant (28). The expected output is a sinusoidal signal of amplitude 6, and the controlled system response is shown in Fig. 7, with the corresponding control input shown in Fig. 8.

Fig. 7
figure 7

Performance of the original SPSA-based model-free control method on plant (28)

Fig. 8
figure 8

The corresponding control input

Then, apply the novel adaptive control strategy to the same non-linear plant. The controlled system response and the corresponding control input signal are shown in Fig. 9 and Fig. 10, respectively.

Fig. 9
figure 9

Performance of the adaptive data driven control strategy on plant (28)

Fig. 10
figure 10

The corresponding control input

The same comparison tests were also conducted on plant (29). The expected output is chosen as a square signal of amplitude 1. An N 3,3,2,1 neural network is used as controller and the control parameters a k and c k are fixed the same as for system (26). Simulation comparison results can be seen in Figs. 11, 12, 13, and 14.

Fig. 11
figure 11

Performance of the original SPSA-based model-free control method on plant (29)

Fig. 12
figure 12

The corresponding control input

Fig. 13
figure 13

Performance of the adaptive data driven control strategy on plant (29)

Fig. 14
figure 14

The corresponding control input

For plants (28) and (29), the comparison results of their average output errors out of 50 runs are shown in Table 2 and Table 3, respectively.

Table 2 Comparison results on plant (28) out of 50 runs
Table 3 Comparison results on plant (29) out of 50 runs

In order to test the performance of the proposed control method when the characteristics of the plant change, a hybrid system (30) is introduced:

(30)

An N 3,3,3,1 neural network is used as controller and the control parameters a k and c k are fixed the same as for system (26).

First, apply the SPSA-based model-free control strategy [6] to the non-linear plant (30). The expected output is a square signal of amplitude 0.5, and the response of the controlled system is shown in Fig. 15, with the corresponding control input shown in Fig. 16.

Fig. 15
figure 15

Performance of the original SPSA-based model-free control method on plant (30)

Fig. 16
figure 16

The corresponding control input

Then, apply the novel adaptive control strategy to the same plant. The controlled system response and the corresponding control input signal are shown in Fig. 17 and Fig. 18, respectively. The comparison results of their average output errors out of 50 runs are given in Table 4.

Fig. 17
figure 17

Performance of the adaptive data driven control strategy on plant (30)

Fig. 18
figure 18

The corresponding control input

Table 4 Comparison results on plant (30) out of 50 runs

From all the above simulation results, it is evident that the proposed adaptive data driven control strategy performs much better than the SPSA-based method and achieves much smoother tracking results. The comparisons in Tables 1, 2, 3 and 4 further illustrate the effectiveness of the proposed adaptive control strategy. In all the simulation tests, the novel adaptive method achieves higher tracking accuracy. In particular, in the third simulation test, where the original SPSA-based model-free control method cannot obtain a satisfactory tracking result after the turning point of the given signal, the newly proposed adaptive data driven control strategy achieves good tracking performance with high accuracy, which further demonstrates the strength of the proposed data driven control strategy.

4.2 Near-optimal control problem

In this section, an example of a near-optimal control problem is provided to demonstrate the effectiveness of the control scheme developed in this paper. Consider the near-optimal control problem for a discrete-time non-linear mass-spring system:

(31)

where x(k) is the state vector and u(k) is the control input.

Define the performance functional as:

$$ J^*_k = x^TQx $$
(32)

where the weight matrix Q is chosen as:

(33)

An N 3,2,1,1 neural network is used as controller, γ=5 and λ=0.05 are chosen for a k , and c k is fixed as 0.15/k 0.101. The state curves are shown in Fig. 19, and the corresponding control input is shown in Fig. 20. The simulation results demonstrate that, by applying the adaptive data driven control strategy, the mass-spring system's performance converges quickly to a value very close to the optimum.
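The performance functional (32) is a standard quadratic form; a minimal sketch, where the weight matrix Q passed in is an example choice (the paper's specific Q of Eq. (33) is not reproduced here):

```python
import numpy as np

def quadratic_cost(x, Q):
    """Performance functional J* = x^T Q x of Eq. (32)."""
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)
```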

Fig. 19
figure 19

The state curves

Fig. 20
figure 20

The corresponding control input

5 Conclusion

This paper has proposed a novel adaptive data driven control strategy with convergence analysis. The proposed control method was applied to solve non-linear tracking problems and a near-optimal control problem. Simulation tests were conducted on typical non-linear plants and a mass-spring system. The effectiveness of the novel data driven control strategy is fully illustrated through the simulation comparison tests.

In future research, we will further discuss how to choose better control parameters γ and λ, and will focus on more complex non-linear systems. Meanwhile, we will seek more application areas for the proposed adaptive data driven control strategy. Another open problem, common to many applications of function approximators, is to develop guidelines for determining the (at least approximately) optimal structure of the FA, e.g., the optimal number of hidden layers and nodes in a neural network. We will also conduct further research in this area.