Learning-based nonlinear model predictive control with accurate uncertainty compensation

A learning-based nonlinear model predictive control (LBNMPC) method is proposed in this paper for general nonlinear systems under system uncertainties and subject to state and input constraints. The proposed LBNMPC strategy decouples the robustness and performance requirements by employing an additional learned model and introducing it into the MPC framework along with the nominal model. The nominal model helps to ensure the closed-loop system's safety and stability, and the learned model aims to improve the tracking performance. As the core of the learned model construction, an online parameter estimator is designed to deal with system uncertainties. This estimation process effectively evaluates both the current and historical effects of uncertainties, leading to superior estimation performance compared with conventional methods. By constructing an invariant terminal constraint set, we prove that the LBNMPC is recursively feasible and robustly asymptotically stable. Numerical verifications for a two-link manipulator are conducted to validate the effectiveness and robustness of the proposed control scheme.


Introduction
The model predictive control (MPC) technique has received extensive attention over recent decades [1][2][3], given its ability to optimize user-defined cost functions and to handle state and input constraints. Balancing the robustness and performance requirements is a challenging task in the design of advanced MPC schemes [4]. Though robust MPC methods [5], such as min-max MPC [6] and tube-based MPC [7], can ensure robustness and constraint handling, these methods usually over-prioritize robustness properties, causing performance degradation and conservativeness. Several control strategies for uncertain systems have been investigated recently [8][9][10][11]. Considering their efficiency and capability against uncertainties [12][13][14], adaptive MPC methods were proposed to reduce system conservativeness and improve the closed-loop performance in [15][16][17]. Moreover, a learning-based MPC (LBMPC) was proposed in [18] for linear systems. LBMPC combines the advantages of both adaptive control and robust MPC, making it possible to improve performance under system uncertainties while guaranteeing robustness and safety requirements. The main principle of LBMPC is to minimize a cost function by using two parallel models: i) the learned model, which is modified online for performance enhancement purposes; ii) the nominal model, which is employed to ensure safety and stability [19]. This strategy, to some extent, decouples and balances the robustness and the performance of the closed-loop system. Given these merits, LBMPC has been employed in various practical applications [20][21][22].
However, existing LBMPC schemes [18][19][20][21][22] are only designed for linear nominal models. Moreover, only conventional certainty-equivalence (CE)-based adaptive control [23] or statistics-based estimation methods [24] were employed in the existing LBMPC frameworks. It is well understood that adaptive controllers synthesized through the CE principle cannot guarantee the convergence of parameter estimation errors to zero unless the reference signals additionally satisfy strong persistent-excitation (PE) conditions [25]. Therefore, the CE-based adaptive law may lead to poor transient performance and fail to estimate the true values of unknown parameters. Recently, the concurrent-learning adaptive control (CLAC) method was proposed to address the drawbacks of CE-based adaptive controllers [26]. Stated in a nutshell, this design innovatively uses specially selected and online-recorded data concurrently with instantaneously incoming measurements for adaptation. Thus, the CLAC strategy can effectively estimate the unknown parameters based on both current and historical effects, resulting in superior estimation performance under relaxed excitation conditions. However, how to embed the CLAC technique into the MPC framework (especially for discrete-time models) is still an open problem.
Motivated by these facts, a learning-based nonlinear MPC (LBNMPC) scheme with a concurrent-learning estimator is proposed in this paper for uncertain nonlinear systems subject to input and state constraints. To meet the robustness and performance requirements of the closed-loop system simultaneously, an additional learned model enriched by the learning-based uncertainty estimator is introduced into the MPC framework. The main contributions are as follows: (1) In contrast to existing LBMPC schemes [18,22] that only consider linear nominal models, this work deals with nonlinear nominal models, and the stability and robustness of the closed-loop system can still be strictly guaranteed by utilizing robust MPC theory. The terminal penalty and inequality constraints force the system states into a terminal region, and the recursive feasibility of the system guarantees the asymptotic stability of the closed-loop system. (2) A novel concurrent-learning estimator is designed in discrete-time form and introduced into the LBNMPC. By employing both instantaneous and historical state data (the historical state data are recorded online), this estimator ensures superior estimation performance compared with conventional methods. It guarantees the exponential convergence of parameter estimation errors, subject to the satisfaction of a relaxed excitation condition.
The remainder of this paper is organized as follows. Section 2 establishes the main framework of our LBNMPC algorithm. Section 3 analyzes the recursive feasibility and stability of the LBNMPC. Section 4 presents the adaptive estimation procedure with stability analysis. In Sect. 5, numerical simulations are presented to show the advantages of LBNMPC. Finally, the paper ends with some conclusions in Sect. 6.
Notations. $\mathbb{R}_{\ge a}$ signifies the set of real numbers greater than or equal to $a$. The operation $\|x\|_Q^2$ means $\|x\|_Q^2 = x^T Q x$, where $Q$ is a positive-definite matrix. The operator $X \oplus Y = \{x + y \mid x \in X, y \in Y\}$ denotes the Minkowski sum, where $x$ and $y$ are elements of the sets $X$ and $Y$, respectively. The operator $X \ominus Y = \{z \mid z + y \in X, \forall y \in Y\}$ denotes the Pontryagin difference [27]. We also denote $\mathrm{int}(F)$ as the interior of the set $F$. Also, $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimum and maximum eigenvalues of the corresponding matrices.

Definitions
A function $\alpha : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ is of class $\mathcal{K}$ if it is continuous, strictly increasing, and $\alpha(0) = 0$. Moreover, the function $\alpha$ belongs to class $\mathcal{K}_\infty$ if $\alpha(s) \to \infty$ as $s \to \infty$ [27]. Furthermore, a function $\beta : \mathbb{R}_{\ge 0} \times \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ is of class $\mathcal{KL}$ if $\beta(\cdot, t)$ is of class $\mathcal{K}$ for each fixed $t \ge 0$, $\beta(s, \cdot)$ is nonincreasing for each fixed $s \ge 0$, and $\beta(s, t) \to 0$ as $t \to \infty$. Finally, a set is a C-set if it is compact and convex, and it is a PC-set if it additionally contains the origin.

LBNMPC strategy
As the main contribution of this work, a LBNMPC framework is developed in this section for general nonlinear systems under system uncertainties. The LBNMPC scheme aims to minimize the quadratic objective function consisting of a stage cost and a terminal cost, subject to nonlinear dynamics, state constraints, input constraints, and a terminal inequality constraint.

Problem statement
A LBMPC method was proposed in Ref. [18] to improve the system performance while ensuring robustness. It considers a linear nominal model and a learned model with uncertainties. The linear nominal model is used to guarantee the stability of the system, and an identification tool is employed for the learned model to improve performance. The LBMPC was implemented in heating, ventilation, and air-conditioning systems in Ref. [20] and in the real-time control of quadrotor helicopters in Refs. [19, 21, 22]. However, all these elegant results are built on linear nominal models. Considering that many practical systems exhibit strong nonlinearities, this paper aims to address the optimal control problem for nonlinear systems subject to multiple constraints and system uncertainties. The main challenge is how to balance adaptability and robustness while strictly guaranteeing closed-loop stability. Motivated by these facts, a LBNMPC scheme with a high-performance learning estimator is proposed in this paper to solve optimal control problems for uncertain nonlinear systems under constraints. Firstly, the optimal control problem is formulated as follows.
Consider a discrete-time nonlinear system
$$x^+ = f(x, u) + w(x, u) \quad (1)$$
where $x \in \mathbb{R}^n$ is the state vector, $u \in \mathbb{R}^m$ is the control input vector, and $x^+ \in \mathbb{R}^n$ denotes the successor state of $x$. Besides, $f(\cdot)$ denotes the nominal model of the system, which is assumed to be twice continuously differentiable, and $w(\cdot)$ denotes the uncertainty dynamics of the system.
Without loss of generality, we assume the system's equilibrium $(x_e, u_e)$ is at the origin. The system in (1) is subject to the state and control constraints $x \in \mathbb{X}$ and $u \in \mathbb{U}$, where $\mathbb{X} \subset \mathbb{R}^n$ and $\mathbb{U} \subset \mathbb{R}^m$ are PC-sets. Besides, the uncertainty dynamics is assumed to be bounded for $x \in \mathbb{X}$ and $u \in \mathbb{U}$; thus, there exists a C-set $\mathbb{W}$ such that $w(x, u) \in \mathbb{W}$ whenever $x \in \mathbb{X}$ and $u \in \mathbb{U}$.
The LBNMPC is constructed to handle the stabilization of (1). As mentioned in the introduction, a nominal model and a learned model are employed for the controller design. The nominal model of (1) is described by
$$\bar{x}^+ = f(\bar{x}, \bar{u}) \quad (2)$$
where $\bar{x}$ and $\bar{u}$ are the state and input of the nominal model. The learned model is defined by
$$\hat{x}^+ = f(\hat{x}, \hat{u}) + \hat{w}(\hat{x}, \hat{u}) \quad (3)$$
where $\hat{x}$ and $\hat{u}$ are the induced state and control input based on the learned model, and $\hat{w}$ denotes the estimate of $w$. In our LBNMPC, an iterative optimization procedure is required to obtain the control sequence by optimizing a finite-horizon quadratic cost function, defined for an initial state $x_0$ and a desired state $x_e$ as
$$V_N(\hat{x}, k, \mathbf{u}; x_0, x_e) \triangleq V_f(\hat{x}(N), x_e) + \sum_{k=0}^{N-1} \ell(\hat{x}(k) - x_e, u(k)) \quad (4)$$
where $N$ is the prediction horizon and $\ell(x, u)$ is the stage cost defined by $\ell(x, u) \triangleq \|x\|_Q^2 + \|u\|_R^2$, with $Q$ and $R$ positive-definite weight matrices. Besides, we denote $\mathbf{u}(\hat{x}, k; x_0, x_e) \triangleq \{u(\hat{x}, 0; x_0, x_e), u(\hat{x}, 1; x_0, x_e), \ldots, u(\hat{x}, N-1; x_0, x_e)\}$ as the control sequence over the whole prediction horizon. Moreover, $V_f(\cdot)$ is the terminal cost function,
$$V_f(x, x_e) = \|x - x_e\|_P^2 \quad (5)$$
where $P$ is the terminal penalty matrix. We also define a terminal constraint set $\mathbb{X}_f$, designed to draw the states at the end of the finite prediction horizon into a neighborhood of the origin [28]. The construction of this terminal constraint set $\mathbb{X}_f$ is described in detail in the following subsection.
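The finite-horizon cost above can be evaluated by simply rolling the learned model forward. The following sketch illustrates this; the dynamics `f`, learned term `w_hat`, and all weights are placeholder choices for illustration, not the paper's models.

```python
import numpy as np

def finite_horizon_cost(x0, x_e, u_seq, f, w_hat, Q, R, P):
    """Evaluate V_N by rolling out the learned model x+ = f(x,u) + w_hat(x,u),
    with stage cost l(x,u) = ||x||_Q^2 + ||u||_R^2 applied to the tracking
    error, plus a terminal penalty V_f(x) = ||x - x_e||_P^2."""
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for u in u_seq:
        e = x - x_e
        cost += e @ Q @ e + u @ R @ u      # stage cost on the tracking error
        x = f(x, u) + w_hat(x, u)           # learned-model prediction step
    e = x - x_e
    return cost + e @ P @ e                 # terminal penalty V_f

# Toy usage with placeholder scalar dynamics (illustrative only):
f = lambda x, u: 0.9 * x + u
w_hat = lambda x, u: 0.05 * x**2
Q = np.eye(1); R = 0.1 * np.eye(1); P = 2 * np.eye(1)
V = finite_horizon_cost(np.array([1.0]), np.zeros(1),
                        [np.array([0.0])] * 3, f, w_hat, Q, R, P)
```

With an empty control sequence the function returns only the terminal penalty, which provides a quick sanity check of the implementation.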

Construction of terminal constraint set
In Ref. [29], a novel robust MPC was proposed for a linear system with additive uncertainties to track changing targets. This controller is able to steer the uncertain system to a neighborhood of the target. Reference [30] designed an MPC method for constrained systems with detailed stability and optimality analysis. Reference [31] introduced a robust MPC approach to guarantee the feasibility and robustness of linear systems under bounded disturbances and various constraints. Reference [32] presented a maximal output admissible set for linear MPC methods. Reference [33] proved the asymptotic closed-loop stability of nonlinear MPC. These results provide fundamental design principles for the construction of terminal constraints and show how to ensure the stability and feasibility of robust MPC controllers under disturbances and constraints.
Specifically, the terminal constraint set aims to block the move at the end of the prediction horizon and restrict the inherent behavior of the finite-horizon control. It is critical in providing stability, safety, robustness, and feasibility of MPC [34]. The terminal penalty matrix $P$ and the terminal constraint set $\mathbb{X}_f$ can be determined off-line. To this end, we consider the Jacobian linearization of the nominal dynamics at the equilibrium $(x_e, u_e)$:
$$\bar{x}^+ = A\bar{x} + B\bar{u} \quad (7)$$
where $A = (\partial f / \partial x)(x_e, u_e)$ and $B = (\partial f / \partial u)(x_e, u_e)$. Assuming this linearized system is controllable, there exists a local linear feedback controller $u = Kx$, where $K$ is the feedback gain. Then, the terminal penalty matrix $P$ and the gain $K$ can be determined by solving the following equation [28]:
$$(A^*)^T P A^* - P = -(Q^* + sI) \quad (8)$$
where $A^* = A + BK$, $Q^* = Q + K^T R K$, and $s$ is a user-defined positive constant.
To guarantee the stability of the system, the invariant terminal constraint set $\mathbb{X}_f$ is designed as
$$\mathbb{X}_f = \{x \mid \|x - x_e\|_P^2 \le \alpha\} \quad (9)$$
where $\alpha$ is a constant computed by the method proposed in Ref. [28].
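The off-line computation of $P$ from (8) can be sketched with scipy. The paper does not spell out how $K$ is chosen, so discrete-time LQR via the algebraic Riccati equation is used here as one common choice; the matrices $A$, $B$ and the weights below are placeholders (a discrete double integrator), not the manipulator linearization.

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

# Placeholder linearization (A, B): a discrete double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])
s = 0.01  # user-defined positive constant in eq. (8)

# One common way to obtain a stabilizing gain: discrete-time LQR.
P_are = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P_are @ B, B.T @ P_are @ A)  # u = K x
A_star = A + B @ K                                          # closed-loop matrix

# Terminal penalty from (A*)^T P A* - P = -(Q* + s I), Q* = Q + K^T R K.
Q_star = Q + K.T @ R @ K
# solve_discrete_lyapunov solves a X a^T - X = -q, so pass a = A*^T.
P = solve_discrete_lyapunov(A_star.T, Q_star + s * np.eye(2))
```

Since $A^*$ is Schur stable, the Lyapunov equation has a unique positive-definite solution $P$, which then defines the terminal penalty and, together with a suitable $\alpha$, the set $\mathbb{X}_f$ in (9).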
Remark 1 Note that the local linear feedback controller is only applied to calculate the terminal penalty matrix $P$ and the terminal constraint set $\mathbb{X}_f$ off-line and to ensure asymptotic stability [33]. It is not directly employed in the actual control system. Besides, it should be emphasized that the terminal constraint set $\mathbb{X}_f$ chosen by the linear feedback control is invariant for the nonlinear system under the MPC control law.

LBNMPC Strategy
The LBNMPC strategy proposed in this paper is inspired by tube MPC, a type of robust MPC, which ensures that the nonlinear system's real trajectory lies in a tube surrounding the nominal trajectory. The width of the tube is restricted to a set $\mathcal{C}$, and the constraint set $\mathbb{X}$ is shrunk by the width of the tube. Thereby, the nominal trajectory lies in $\mathbb{X} \ominus \mathcal{C}$ and the real trajectory lies in $\mathbb{X}$. Similarly, for LBNMPC, the nominal and real trajectories lie in $\mathbb{X} \ominus \mathcal{C}$ and $\mathbb{X}$, respectively. Given the nominal model in (2) and the learned model in (3), our LBNMPC is formulated as
$$\min_{\tilde{\mathbf{u}}} \; V_f(\hat{x}(N), x_e) + \sum_{k=0}^{N-1} \ell(\hat{x}(k) - x_e, \tilde{u}(k)) \quad (10)$$
subject to
$$\bar{x}(0) = \hat{x}(0) = x \quad (11)$$
$$\bar{x}^+ = f(\bar{x}, \tilde{u}) \quad (12)$$
$$\hat{x}^+ = f(\hat{x}, \tilde{u}) + \hat{w}(\hat{x}, \tilde{u}) \quad (13)$$
$$\bar{x}(k) \in \mathbb{X} \ominus \mathcal{C}, \quad \tilde{u}(k) \in \mathbb{U} \ominus K\mathcal{C} \quad (14)$$
$$\bar{x}(N) \in \mathbb{X}_f \quad (15)$$
where $\tilde{\mathbf{u}}$ is the control sequence, whose first element $\tilde{u}$ is applied to both the nominal model and the learned model. The set $\mathcal{C}$ is defined by $\mathcal{C}_0 = \{0\}$ and $\mathcal{C}_k = \bigoplus_{j=0}^{k-1} (A + BK)^j \mathbb{W}$ [18]. Also, $\bar{x}(N) \in \mathbb{X}_f$ denotes the terminal constraint, and $\mathbb{X}_f$ is an invariant set under the local linear feedback controller [35]. In addition, Eq. (11) indicates that, after the optimal control input is applied to the system, the observed state $x$ should be fed back to both the nominal state $\bar{x}$ and the learned state $\hat{x}$ to construct the subsequent optimization problem.
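For the common case where $\mathbb{W}$ is an axis-aligned box $\{w : |w_i| \le \bar{w}_i\}$, a conservative box outer approximation of the tube cross-sections $\mathcal{C}_k$ can be computed elementwise, since the image of a box under $(A+BK)^j$ lies in the box with half-widths $|(A+BK)^j|\,\bar{w}$. This is a sketch under that box assumption, not the exact Minkowski-sum computation:

```python
import numpy as np

def tube_halfwidths(A_cl, w_bar, k):
    """Elementwise half-widths c with |x_i| <= c[i] for all x in
    C_k = ⊕_{j=0}^{k-1} A_cl^j W, assuming W = {w : |w_i| <= w_bar[i]}.

    Outer (conservative) approximation: each term A_cl^j W is covered
    by the box with half-widths |A_cl^j| @ w_bar, and Minkowski sums of
    boxes add their half-widths."""
    n = A_cl.shape[0]
    c = np.zeros(n)
    M = np.eye(n)             # A_cl^j, starting at j = 0
    for _ in range(k):
        c += np.abs(M) @ w_bar
        M = A_cl @ M
    return c

A_cl = np.array([[0.5, 0.1], [0.0, 0.5]])   # illustrative A + BK
c3 = tube_halfwidths(A_cl, np.array([0.1, 0.1]), 3)
```

Note that `tube_halfwidths(..., 0)` returns the zero vector, matching $\mathcal{C}_0 = \{0\}$, and the half-widths grow monotonically with $k$, which is why a Schur-stable $A + BK$ keeps the tube bounded.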

Remark 2
The terminal invariant set $\mathbb{X}_f$ is a set such that, for all $x \in \mathbb{X}_f$, there exists a control $u \in \mathbb{U}$ such that $x^+ \in \mathbb{X}_f$ [36,37]. Therefore, all trajectories of the system starting in $\mathbb{X}_f$ always stay in $\mathbb{X}_f$.
The solution of the above problem is denoted by $\phi(k; x, \tilde{\mathbf{u}})$ for the initial state $x_0$ and the control sequence $\tilde{\mathbf{u}}$. The state constraint $\bar{x}(k) \in \mathbb{X} \ominus \mathcal{C}$, the control constraint $\tilde{u}(k) \in \mathbb{U} \ominus K\mathcal{C}$, and the terminal inequality constraint $\phi(N; x, \tilde{\mathbf{u}}) \in \mathbb{X}_f$ lead to the admissible control set
$$\mathcal{U}_N(x; x_e) \triangleq \{\tilde{\mathbf{u}} \mid \bar{x}(k) \in \mathbb{X} \ominus \mathcal{C}, \; \tilde{u}(k) \in \mathbb{U} \ominus K\mathcal{C}, \; \phi(N; x, \tilde{\mathbf{u}}) \in \mathbb{X}_f\} \quad (16)$$
The optimal control problem $P_N(\hat{x}, k; x_0, x_e)$ is defined by
$$V_N^0(\hat{x}, k; x_0, x_e) = \min_{\tilde{\mathbf{u}} \in \mathcal{U}_N} V_N(\hat{x}, k, \tilde{\mathbf{u}}; x_0, x_e) \quad (17)$$
Its solution is
$$\tilde{\mathbf{u}}^0(\hat{x}, k; x_0, x_e) = \arg\min_{\tilde{\mathbf{u}} \in \mathcal{U}_N} V_N(\hat{x}, k, \tilde{\mathbf{u}}; x_0, x_e) \quad (18)$$
and the associated state sequence is $\tilde{\mathbf{x}}^0(\hat{x}, k; x_0, x_e)$. Then, the first element $\tilde{u}^0(0; \hat{x}, k, x_0, x_e)$ is applied to the LBNMPC. At the next sampling instant, this procedure is repeated for the successor state.
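The receding-horizon loop just described can be sketched compactly. The example below is purely illustrative: a scalar toy system, `scipy.optimize.minimize` as the solver, only input bounds enforced (the constraint-tightening sets and terminal constraint are omitted), and an uncertainty term that is assumed perfectly learned so that the cost rollout uses the learned model while the plant update mimics the feedback in (11).

```python
import numpy as np
from scipy.optimize import minimize

# Toy uncertain scalar system: x+ = f(x,u) + w(x) with hypothetical w.
f = lambda x, u: 0.9 * x + 0.5 * u
w_true = lambda x: 0.1 * np.sin(x)
w_hat = lambda x: 0.1 * np.sin(x)   # assume a perfectly learned model here

N, Q, R, P = 5, 1.0, 0.1, 2.0
u_lo, u_hi = -1.0, 1.0              # input constraint U = [-1, 1]

def cost(u_seq, x0):
    """Finite-horizon cost along the learned model (eq. (10)-style)."""
    x, c = x0, 0.0
    for u in u_seq:
        c += Q * x**2 + R * u**2
        x = f(x, u) + w_hat(x)
    return c + P * x**2

x = 2.0
for _ in range(20):                 # receding-horizon loop
    res = minimize(cost, np.zeros(N), args=(x,),
                   bounds=[(u_lo, u_hi)] * N)
    u0 = res.x[0]                   # apply only the first control move
    x = f(x, u0) + w_true(x)        # true plant update (state feedback)
x_final = x
```

Each iteration re-solves the finite-horizon problem from the newly observed state, which is exactly the repetition "for the successor state" described above.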

Remark 3
The difference between the LBNMPC and other MPC methods is that the LBNMPC is formulated based on two models (the nonlinear nominal model and the learned model) simultaneously, which makes it possible to deal with uncertainties while preserving the properties of robust MPC.
Remark 4 Note that in the LBNMPC, the cost function is constructed from the states of the learned model, while the constraints are imposed on the nominal model. This design ensures robustness even when the learned model does not match the true dynamics.
Remark 5 An important characteristic of LBNMPC is that the system safety, stability, and robustness are only related to the state, input, and terminal constraints based on the nominal model. Therefore, the safety & robustness requirement (guaranteed by the nominal model) and the performance enhancement (provided by the learned model) can be decoupled. As a result, the LBNMPC can make a trade-off between the system's robustness and performance.

Stability analysis
In this section, the conditions for guaranteeing the stability of LBNMPC are presented in detail. It is noteworthy that the stability and robustness of the LBNMPC scheme are independent of the system uncertainties and the learning tools involved in the learned model. Some necessary definitions are given as follows for the subsequent stability analysis.
Definition 1 [38] (Asymptotically Stable) The system is said to be asymptotically stable (AS) about $x_e$ on $F \subseteq \mathbb{R}^n$ if there exists a class-$\mathcal{KL}$ function $\beta$ such that, for all $x_0 \in F$, the condition $\|x_k - x_e\| \le \beta(\|x_0 - x_e\|, k)$ holds for all $k \ge 0$.
Definition 2 [39] (Robustly Asymptotically Stable) The system is said to be robustly asymptotically stable (RAS) about $x_e$ on $\mathrm{int}(F)$ with respect to the measurement error (additive disturbance) $e_k$ if there exists a class-$\mathcal{KL}$ function $\beta$ such that, for each $\epsilon > 0$ and each compact set $\Phi \subset \mathrm{int}(F)$, there exists $\delta > 0$ such that, for all measurement errors satisfying i) $\max_k \|e_k\| < \delta$ and all initial states in $\Phi$, it holds that ii) $x_k \in \Phi$ and $\|x_k - x_e\| \le \beta(\|x_0 - x_e\|, k) + \epsilon$ for all $k \ge 0$.

Remark 6
The main difference between AS and RAS is that RAS considers the measurement error (additive disturbance) and guarantees the asymptotic stability of the system with respect to that disturbance.

Stability of open-loop system
The trajectory generated by (12) with feasible solutions satisfies the terminal inequality constraint in (15) and ensures the boundedness of the objective function in (10). Therefore, feasibility at each time step should be analyzed. We assume that the feasible point is $p_n$; the system state and input predicted by the nominal model are denoted by $\bar{x}(x, k; x_0, x_e, p_n) \in \mathbb{X} \ominus \mathcal{C}$ and $\tilde{u}(x, k; x_0, x_e, p_n) \in \mathbb{U} \ominus K\mathcal{C}$, respectively. By introducing the uncertain modeling error (denoted by $d_n \in \mathbb{W}$) into the nominal model, $\bar{x}^+ = f(\bar{x}, \tilde{u}) + d_n$ should be considered for the stability analysis. For the next feasible point $p_{n+1}$, we have $\bar{x}(x, k; x_0, x_e, p_{n+1}) = \bar{x}(x, k; x_0, x_e, p_n) + d_{n+1}$ and $\tilde{u}(x, k; x_0, x_e, p_{n+1}) = \tilde{u}(x, k; x_0, x_e, p_n) + K d_{n+1}$, $d_{n+1} \in \mathbb{W}$.
Lemma 1 Consider the set of states
$$\mathbb{X}_N \triangleq \{x \mid \mathcal{U}_N(x; x_e) \ne \emptyset\} \quad (20)$$
for which there exists at least one control sequence $\tilde{\mathbf{u}}$ meeting the state, control, and terminal constraints. Then, for the nominal system, the feasibility of the open-loop optimal control problem at $k = 0$ implies its feasibility for all $k > 0$.
Proof Suppose there exists an optimal control sequence $\tilde{\mathbf{u}}^*(x, k; x_0, x_e)$ for the optimal control problem $P_N(x, k; x_0, x_e)$ with the associated state sequence $\bar{x}^*(x, k; x_0, x_e) \in \mathbb{X}_N$ over $[k, k+N]$. The state $\bar{x}^*(x, k+N; x_0, x_e)$ belongs to the terminal constraint set $\mathbb{X}_f$ owing to feasibility. For the next optimal control problem $P_N(x, k+r; x_0, x_e)$ ($r > 0$ is a small sampling step), the initial state satisfies $\bar{x}_0(x, k+r; x_0, x_e) = \bar{x}^*(x, k+r; x_0, x_e)$. Then, a candidate control sequence $\tilde{\mathbf{u}}(x, k+r; x_0, x_e)$ under the local linear feedback controller for the problem $P_N(x, k+r; x_0, x_e)$ over $[k+r, k+r+N]$ can be chosen by shifting the previous optimal sequence and appending the local feedback law, i.e.,
$$\tilde{\mathbf{u}}(x, k+r; x_0, x_e) = \{\tilde{u}^*(1), \ldots, \tilde{u}^*(N-1), K\bar{x}^*(x, k+N; x_0, x_e)\} \quad (21)$$
Note that the terminal constraint set $\mathbb{X}_f$ is an invariant set under the local linear feedback controller. Therefore, $\bar{x}^*(x, k+N; x_0, x_e) \in \mathbb{X}_f$ indicates that the candidate terminal state satisfies $\bar{x}(x, k+r+N; x_0, x_e) \in \mathbb{X}_f$ [42][43][44]. This completes the proof.
To sum up, for each prediction horizon of the optimal control problem in (10)-(15) with feasible solutions, the terminal penalty $V_f(\hat{x}(N), x_e)$ in (10) and the terminal constraint in (15) force the states at the end of the prediction horizon to lie within the terminal region.
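The candidate sequence used in Lemma 1 is also what practical implementations use as a warm start: drop the first move of the previous optimizer and append the local linear feedback evaluated at the previous terminal nominal state. A minimal sketch, with a hypothetical gain $K$ and sequence shapes chosen for illustration:

```python
import numpy as np

def shifted_candidate(u_star, x_bar_N, K):
    """Candidate sequence for the next problem (Lemma 1's construction):
    drop the first move of the previous optimal sequence and append the
    local linear feedback u = K x at the previous terminal state."""
    return np.vstack([u_star[1:], (K @ x_bar_N).reshape(1, -1)])

K = np.array([[-0.5, -0.2]])                # hypothetical feedback gain
u_star = np.array([[0.3], [0.1], [-0.2]])   # previous optimal sequence, N = 3
x_bar_N = np.array([0.4, 0.1])              # previous terminal nominal state
u_cand = shifted_candidate(u_star, x_bar_N, K)
```

Because the appended move keeps the state inside the invariant set $\mathbb{X}_f$, this candidate is always feasible, which is precisely the recursive-feasibility argument.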

Stability of closed-loop system
We choose the cost function $V_N(\hat{x}, k; x_0, x_e)$ as the Lyapunov function to analyze the closed-loop stability. Note that the cost function is related to the learned states at each prediction horizon. After solving the optimal control problem for each prediction horizon, the state $x_0(x, k+1; x_0, x_e)$ for the next prediction horizon can be acquired from the real system model based on the solution $\tilde{\mathbf{u}}^*(x, k; x_0, x_e)$. Then, we consider the following assumptions.
Assumption 2 [42] For the stage cost $\ell(\cdot)$ and the terminal cost $V_f(\cdot)$, there exist class-$\mathcal{K}_\infty$ functions $\alpha_\ell$ and $\alpha_f$ satisfying
$$\ell(x, \tilde{u}) \ge \alpha_\ell(|x|), \quad \forall x \in \mathbb{X}_N, \; \tilde{u} \in \mathbb{U} \quad (22)$$
$$V_f(x, x_e) \le \alpha_f(|x - x_e|), \quad \forall x \in \mathbb{X}_f \quad (23)$$
where $\mathbb{X}_N \triangleq \{x \mid \mathcal{U}_N \ne \emptyset\}$, since the set $\mathcal{U}_N$ encodes the state, input, and terminal constraints.
Lemma 2 [45,46] Suppose the cost function satisfies the following condition: there exists a $u = \kappa_f(x) \in \mathbb{U}$ such that $V_f(f(x, \kappa_f(x)), x_e) + \ell(x - x_e, \kappa_f(x)) \le V_f(x, x_e)$ for all $x \in \mathbb{X}_f(x_e)$. Then, the closed-loop system is AS.
Based on all these preliminaries, we propose the following theorem.
Theorem 1 Suppose that Assumptions 1 and 2 hold and the optimal control problem is feasible. Then the optimal cost $V_N^0$ is nonincreasing along the closed-loop trajectories, and the closed-loop system is AS.
Proof The solution of $P_N(\hat{x}, k; x_0, x_e)$ is $\tilde{\mathbf{u}}^0(\hat{x}, k; x_0, x_e)$, and the associated state sequence is $\tilde{\mathbf{x}}^0(\hat{x}, k; x_0, x_e)$. The first element $\tilde{u}^0(0; \hat{x}, k, x_0, x_e)$ is applied to the LBNMPC, and we denote $x^+ = \tilde{x}^0(1; \hat{x}, k, x_0, x_e)$. Let $\tilde{\mathbf{u}}$ denote the shifted control sequence
$$\tilde{\mathbf{u}}(x^+, k; x_0, x_e) \triangleq \{\tilde{u}^0(1; \hat{x}, k, x_0, x_e), \tilde{u}^0(2; \hat{x}, k, x_0, x_e), \ldots\}$$
which is feasible for $P_N(x^+, k; x_0, x_e)$ but not necessarily optimal. It then follows from Lemma 2 that
$$V_N^0(x^+, k; x_0, x_e) \le V_N^0(\hat{x}, k; x_0, x_e) - \ell(\hat{x} - x_e, \tilde{u}^0(0; \hat{x}, k, x_0, x_e))$$
Therefore, the cost function is nonincreasing, and the closed-loop system is AS [18,42,45].
Remark 7 If there exists an additive disturbance $w \in \mathbb{W}$, then the asymptotic stability condition for the closed-loop system is modified as follows: there exists a $u = \kappa_f(x) \in \mathbb{U}$ such that $V_f(f(x, \kappa_f(x)) + w, x_e) + \ell(x - x_e, \kappa_f(x)) \le \delta V_f(x, x_e)$ for all $x \in \mathbb{X}_f(x_e)$, where $\delta \in (0, 1)$. Under this condition, the LBNMPC is RAS. Therefore, the LBNMPC is RAS when i) Assumption 1 and Assumption 2 are satisfied; ii) the system uncertainty is bounded; iii) the terminal cost and the terminal invariant set force the state into a neighborhood of the origin at the end of each prediction horizon; and iv) the closed-loop system is feasible.

Concurrent-learning estimator
From Sect. 3, robust asymptotic stability has been guaranteed with the nonlinear nominal model. However, the performance of LBNMPC relies on accurate estimation of the unmodeled dynamics. Some examples to this end are given in [20,47,48]. In this section, a novel concurrent-learning-based estimation strategy is proposed, and its convergence property is proved.

Estimator development
We aim to develop an estimator to compensate for the uncertain dynamics $w(x, u)$. In many applications, $w(x, u)$ admits an affine representation; i.e., there exist a regressor matrix $z(x, u) \in \mathbb{R}^{n \times p}$ and an unknown constant vector $\phi \in \mathbb{R}^p$ such that $w(x, u) = z(x, u)\phi$, where $p$ is the total number of unknown parameters. When $w(x, u)$ does not admit an affine representation, a single-layer neural network can be employed to reconstruct it. Specifically, based on the Weierstrass approximation theorem [49], $w(x, u)$ can be reconstructed in a form very similar to the affine representation, $w(x, u) = z(x, u)\phi + \epsilon$, but now $z(x, u) \in \mathbb{R}^{n \times p}$ is a set of basis functions, $\phi \in \mathbb{R}^p$ is a vector containing the weights of the corresponding basis functions, and $\epsilon$ is the reconstruction error. It is well understood that, by choosing a sufficiently rich set of basis functions, the reconstruction error $\epsilon$ is bounded on $\mathbb{X} \times \mathbb{U}$ and can be made arbitrarily small, hence negligible. Based on these facts, we assume the uncertain dynamics in this paper satisfies $w(x, u) = z(x, u)\phi$, regardless of whether $z(x, u)$ is a regressor matrix or a set of user-defined basis functions. The objective of the estimator is then to evaluate the true value of $\phi$.
Based on $w(x, u) = z(x, u)\phi$, Eq. (1) can be rewritten as
$$x^+ = f(x, u) + z(x, u)\phi \quad (33)$$
Then, we design the following filtered variables:
$$x_f^+ = \mu_f x_f + x \quad (34)$$
$$z_f^+ = \mu_f z_f + z(x, u) \quad (35)$$
$$f_f^+ = \mu_f f_f + f(x, u) \quad (36)$$
where $\mu_f$ is the estimator coefficient satisfying $0 < \mu_f < 1$. Thus, $x_f^+$, $z_f^+$, and $f_f^+$ are bounded variables when $x_f$, $z_f$, and $f_f$ are bounded. Substituting (34)-(36) into (1) yields
$$x_f^+ = f_f + z_f \phi + y \quad (37)$$
We denote $y = x_f^+ - f_f - z_f \phi$, and then (37) indicates
$$x_f^+ - f_f - z_f \phi = y \quad (38)$$
Since $y$ is an exponentially vanishing term that converges to zero quickly, we have $x_f^+ - f_f - z_f \phi \approx 0$. Then, the estimate of $\phi$ (denoted by $\hat{\phi}$) is divided into two terms,
$$\hat{\phi} = \hat{\phi}_1 + \hat{\phi}_2 \quad (39)$$
The successors of $\hat{\phi}_1$ and $\hat{\phi}_2$ are designed as
$$\hat{\phi}_1^+ = \hat{\phi}_1 + \Gamma z_f^T (x_f^+ - f_f - z_f \hat{\phi}) \quad (40)$$
$$\hat{\phi}_2^+ = \hat{\phi}_2 + \Gamma \sum_{i=1}^{q} z_f^T(t_i) \big(x_f^+(t_i) - f_f(t_i) - z_f(t_i)\hat{\phi}\big) \quad (41)$$
where the positive-definite adaptation-gain matrix $\Gamma$ is employed for ease of notation. Besides, $t_i$ denotes a set of past time indices with $0 \le t_i < t$, $i = 1, 2, \ldots, q$, where $q$ is a constant denoting the total number of recorded historical data points. Based on (40) and (41), the estimator $\hat{\phi}$ follows
$$\hat{\phi}^+ = \hat{\phi} + \Gamma z_f^T (x_f^+ - f_f - z_f \hat{\phi}) + \Gamma \sum_{i=1}^{q} z_f^T(t_i) \big(x_f^+(t_i) - f_f(t_i) - z_f(t_i)\hat{\phi}\big) \quad (42)$$
Equation (42) shows that not only real-time data but also past measurements are concurrently introduced into the estimator; the motivation of this design is explained in the convergence analysis.
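A scalar sketch of the filtered concurrent-learning update in (34)-(42) is given below. The dynamics, filter/gain values, and data-selection rule are illustrative assumptions, not the paper's; the adaptation gain is normalized by the instantaneous information content for step-size safety, which is one common implementation choice rather than the paper's exact law.

```python
import numpy as np

# Scalar illustration: true system x+ = f(x,u) + z(x) * phi, phi unknown.
f = lambda x, u: 0.8 * x + u
z = lambda x: np.sin(x)
phi_true = 2.0

mu_f = 0.5            # filter coefficient, 0 < mu_f < 1
gamma0 = 0.5          # normalized adaptation gain (illustrative)
q = 20                # size of the recorded-data stack
T = 1000

x = 0.0
x_f = z_f = f_f = 0.0
phi_hat = 0.0
stack = []            # recorded triples (x_f_plus, f_f, z_f)

for t in range(T):
    u = np.sin(0.3 * t)                       # exciting input
    zt, ft = z(x), f(x, u)
    x_next = ft + zt * phi_true               # true plant

    # Filtered regression (eqs. (34)-(38)-style): the mismatch
    # x_f+ - f_f - z_f*phi vanishes exponentially, so x_f+ ≈ f_f + z_f*phi
    # is a measurable linear regression in phi.
    x_f_plus = mu_f * x_f + x

    # Record informative data online (a simple selection rule).
    if len(stack) < q and abs(z_f) > 0.1:
        stack.append((x_f_plus, f_f, z_f))

    # Concurrent update: instantaneous error plus errors on recorded data,
    # both evaluated at the current estimate (eqs. (40)-(42)-style).
    e_now = x_f_plus - f_f - z_f * phi_hat
    grad = z_f * e_now
    S = z_f**2
    for xfp_i, ff_i, zf_i in stack:
        grad += zf_i * (xfp_i - ff_i - zf_i * phi_hat)
        S += zf_i**2
    phi_hat += gamma0 / (1.0 + S) * grad

    # Advance the filters and the plant.
    z_f = mu_f * z_f + zt
    f_f = mu_f * f_f + ft
    x_f = x_f_plus
    x = x_next
```

Note that once the stack holds informative data, the recorded term keeps the update contracting even at instants where the current regressor $z_f$ is nearly zero, which is exactly the benefit of concurrent learning over a purely instantaneous gradient update.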

Convergence analysis
Theorem 2 Consider the nonlinear system with uncertain dynamics as in (1), with the learning-based estimator designed as in (42). Then the estimation error $\tilde{\phi} = \hat{\phi} - \phi$ is bounded. Moreover, if $\lambda_{\min}(\Theta) > 0$, where $\Theta \triangleq \sum_{i=1}^{q} z_f^T(t_i) z_f(t_i)$ denotes the information matrix of the recorded data, then $\tilde{\phi}$ converges exponentially to zero.
Remark 8 Unlike the conventional CLAC, which relies on the accurate approximation of immeasurable variables (i.e., the variable $x^+$ in this paper), the proposed estimator adopts the filtered states and regressor matrices in (34)-(36) and therefore circumvents the variable-approximation requirement.
In summary, the detailed design procedure of the LBNMPC has been described in this section. The implementation procedure of the LBNMPC scheme is presented in Table 1, and the optimal control problem can be solved by the solver MOSEK [50]. For a better understanding of the proposed controller, the architecture of the LBNMPC is illustrated in Fig. 1 as well.

Remark 9
In Theorem 2, we show that the estimation error $\tilde{\phi}$ converges exponentially to zero provided that $\lambda_{\min}(\Theta) > 0$, where $\Theta \triangleq \sum_{i=1}^{q} z_f^T(t_i) z_f(t_i)$. Recalling the definition of FE (finite excitation) conditions, this requirement can be guaranteed if $z_f$ satisfies an FE condition. To ensure that $\Theta$ is full rank when $z_f$ satisfies an FE condition, the simplest way is to add all incoming data of $z_f$ into the history stack until $\mathrm{rank}(\Theta) = p$ (so that $\lambda_{\min}(\Theta) > 0$). A more sophisticated method is to design a selection algorithm; some examples are shown in [26,33]. Moreover, without the past measurements, the estimator in (42) reduces to (47). Under this condition, $z_f$ is required to satisfy the PE condition as in Definition 3 to ensure the convergence of $\tilde{\phi}$. Note that this is a common requirement in conventional estimator/identifier designs. However, PE conditions are quite strong from the standpoint of practical engineering. In this paper, we relax the PE condition to the FE condition by employing past measurements.
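The simplest selection rule mentioned above can be sketched as follows. Here `Theta` denotes the information matrix of recorded regressor rows (the quantity whose minimum eigenvalue appears in Theorem 2), and the regressor rows are illustrative values, not data from the paper's simulations.

```python
import numpy as np

def update_history(Theta, rows, z_f_new, p):
    """Add an incoming filtered regressor row to the history stack until the
    information matrix Theta = sum_i z_f(t_i)^T z_f(t_i) reaches full rank p
    (the 'simplest way' in Remark 9); afterwards, stop recording."""
    if np.linalg.matrix_rank(Theta) < p:
        rows.append(z_f_new)
        Theta = Theta + np.outer(z_f_new, z_f_new)
    return Theta, rows

p = 2
Theta = np.zeros((p, p))
rows = []
for z_row in [np.array([1.0, 0.0]), np.array([1.0, 0.0]),
              np.array([0.0, 2.0]), np.array([3.0, 1.0])]:
    Theta, rows = update_history(Theta, rows, z_row, p)
lam_min = np.linalg.eigvalsh(Theta)[0]
```

Once `lam_min > 0`, the FE condition of Theorem 2 is met and no further recording is needed; a more sophisticated algorithm would instead swap in rows that increase `lam_min`.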

Applications to the control of a two-link manipulator
In this section, the effectiveness of the proposed LBNMPC scheme is validated via numerical simulations. A typical two-link robot manipulator model [51] is considered, in which $x_1 = [q_1 \; q_2]^T$ and $x_2 = [q_3 \; q_4]^T$ denote the position and velocity vectors, respectively, and $u = [u_1 \; u_2]^T$ is the control torque vector. The continuous-time model (48) is discretized using the Euler approximation as
$$x_1(k+1) = x_1(k) + T_s x_2(k), \quad x_2(k+1) = x_2(k) + T_s\big(f(k) + g(k)u(k)\big) \quad (49)$$
where $T_s$ is the sampling period, $f(k) = -M^{-1}[(V + F)x_2(k) + C]$, and $g(k) = -M^{-1}$.
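One Euler update of the form in (49) can be sketched directly; the numeric values below are placeholders, not the paper's manipulator parameters, and the drift/input terms are passed in pre-evaluated so the sketch is agnostic to the model's sign conventions.

```python
import numpy as np

def euler_step(x1, x2, u, f_k, g_k, Ts):
    """One Euler step of the discretization in eq. (49):
       x1+ = x1 + Ts * x2,  x2+ = x2 + Ts * (f(k) + g(k) u)."""
    return x1 + Ts * x2, x2 + Ts * (f_k + g_k @ u)

# Illustrative values (not the paper's manipulator parameters):
Ts = 0.01
x1 = np.array([0.1, -0.2])   # joint positions q1, q2
x2 = np.array([0.0, 0.0])    # joint velocities q3, q4
f_k = np.array([0.5, -0.3])  # drift term evaluated at step k
g_k = np.eye(2)              # input matrix evaluated at step k
u = np.array([1.0, 2.0])
x1n, x2n = euler_step(x1, x2, u, f_k, g_k, Ts)
```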
In this simulation case, we assume that the parameters $p_5$ and $p_7$ are unknown for the controller design, and thus they need to be estimated by the proposed learning procedure. Their initial guesses are $\hat{p}_5(0) = 0.6$ and $\hat{p}_7(0) = 1.8$. The coefficient $\mu_f$ in the filtered regressor is selected as $\mu_f = 0.2$. The prediction horizon is set to $N = 7$, and the weighting matrices $Q$ and $R$ are chosen as $Q = 3000 \times I_{4\times4}$ and $R = I_{2\times2}$.
The constant $\alpha$ is computed as $\alpha = 1.34$. Through the Jacobian linearization of the nominal system at the origin, the matrices $A$ and $B$ are calculated by (7), and the terminal penalty matrix $P$ and the local linear feedback gain $K$ are then calculated by (8) (the numerical values are omitted here). The set $\mathbb{U}$ is selected as $\mathbb{U} = [-5, 5]$. The set $\mathbb{W}$ is chosen via the algorithm in [52], which constructs $\mathbb{W}$ based on the Taylor remainder theorem with a "safety margin" [18]. Then, the set $\mathcal{C}$ can be calculated by its definition ($\mathcal{C}_0 = \{0\}$, $\mathcal{C}_k = \bigoplus_{j=0}^{k-1}(A + BK)^j \mathbb{W}$ [18]). Therefore, the constraint in (14) can be determined, which ensures the closed-loop system's safety. This reflects the main characteristic of LBNMPC: the system's safety and the performance enhancement are decoupled.
Under these conditions, the simulation results under LBNMPC without the estimation procedure are presented in Fig. 2a, while the results with the uncertainty learning method are shown in Fig. 2b. The thin solid lines in Fig. 2 represent the reference trajectories. It can be observed that the system states successfully track the desired trajectories in both cases; the corresponding tracking-error comparison is given in Fig. 3. The uncertainty learning performance for the unknown parameters $p_5$ and $p_7$ is presented in Fig. 4a. The solid lines in Fig. 4a depict the real values of $p_5$ and $p_7$, and the dashed lines denote the estimated values $\hat{p}_5$ and $\hat{p}_7$. It can be seen from Fig. 4a that the estimated parameters converge to their real values. The estimation errors ($\tilde{p}_5$ and $\tilde{p}_7$) in Fig. 4b show that the proposed uncertainty learning scheme ensures that the estimation errors converge to zero.
The optimal control sequence is presented in Fig. 5; it satisfies the input constraints formulated in (14). The cost function variations are presented in Fig. 6. The cost function decreases gradually and finally converges to zero, reflecting the asymptotic stability of the LBNMPC approach.
In addition to the unknown parameters, the robustness of the LBNMPC is verified under external disturbances on the velocity measurements. Two different cases are considered: i) Case 1: the velocity measurements are polluted by zero-mean Gaussian noise with a standard deviation of $1.0 \times 10^{-3}$; ii) Case 2: the velocity measurements are polluted by zero-mean Gaussian noise with a standard deviation of $3.0 \times 10^{-3}$. The simulation results are shown in Fig. 7. Figures 7a and 7c show the tracking trajectories, and Figs. 7b and 7d show the corresponding tracking errors. It can be seen that the position states precisely track the reference trajectories with good performance even under polluted velocity measurements. All these results show that the proposed LBNMPC approach maintains good performance under disturbances.
To evaluate the properties of the proposed LBNMPC in detail, we also consider a stabilization case by setting $x_d = [0 \; 0 \; 0 \; 0]^T$. The results are given in Fig. 8. One can see that all the states converge to the origin under the proposed controller. Figure 8b illustrates the tracking errors and the terminal region $\mathbb{X}_f$. It shows that the closed-loop system satisfies the constraint in (15), validating the stability and convergence properties of our LBNMPC.
Moreover, two different MPC approaches are also employed for comparison: i) adaptive model predictive control (AMPC) with the proposed estimation procedure, and ii) nonlinear model predictive control (NMPC) without uncertainty compensation. The comparative results of the tracking trajectories and cost functions are presented in Figs. 9 and 10, respectively. It can be observed that both the LBNMPC and AMPC have good performance in tracking the desired trajectories, while the NMPC leads to larger tracking errors. Moreover, it can be seen from Fig. 10 that the proposed LBNMPC controller leads to a lower cost than the AMPC and NMPC.
To sum up, the proposed LBNMPC is capable of achieving good performance, subject to uncertainties, disturbances, and various constraints.

Conclusion
A LBNMPC method was proposed in this paper to solve the optimal control problems of nonlinear systems subject to multiple constraints and system uncertainties. The control strategy was based on two models, i.e., the nonlinear nominal model and the learned model. The nominal model guarantees the stability and robustness of the LBNMPC, while the learned model improves the control performance via a novel concurrent-learning estimator. The key feature of our estimator is that it includes not only real-time data but also past measurements in the estimation framework, achieving precise estimation under a relaxed excitation condition. We showed that our LBNMPC can decouple the robustness and performance requirements and ensure the feasibility, stability, and convergence of the closed-loop system. Extensive simulations and comparative analyses illustrated that LBNMPC leads to superior tracking performance and robustness compared with other methods.
Funding This work has received funding from the UK Engineering and Physical Sciences Research Council (Grant Number: EP/S001905/1).

Declarations
Conflicts of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.