Abstract
In this paper, the problem of inverse quadratic optimal control over a finite time horizon for discrete-time linear systems is considered. Our goal is to recover the corresponding quadratic objective function using noisy observations. First, the identifiability of the model structure for the inverse optimal control problem is analyzed under a relative degree assumption, and we show that the model structure is strictly globally identifiable. Next, we study the inverse optimal control problem in which the initial state distribution and the observation noise distribution are unknown, yet exact observations of the initial states are available. We formulate the problem as a risk minimization problem and approximate the problem using the empirical average. It is further shown that the solution to the approximated problem is statistically consistent under the assumption of relative degrees. We then study the case where exact observations of the initial states are not available, yet the observation noises are known to be white and Gaussian and the distribution of the initial state is also Gaussian (with unknown mean and covariance). The EM algorithm is used to estimate the parameters in the objective function. The effectiveness of our results is demonstrated by numerical examples.
Introduction
Since it was first proposed by [1], inverse optimal control (IOC) has found numerous applications [2,3,4]. For example, IOC has been widely developed as a powerful tool to help us understand the optimality criteria underlying biological motions, which can then be used for control synthesis of bionic robots. The goal of the “forward” optimal control problem is to find the optimal control input and the corresponding state trajectory given the cost function, the system dynamics and the initial value of the states. In contrast, for a given system dynamics, the IOC problem focuses on recovering the undetermined parameters in the cost function that correspond to the observed system states or output measurements (possibly noisy).
In this paper, the problem of IOC for discrete-time linear quadratic regulators (LQRs) over a finite time horizon is considered. As an extension of our earlier work [5], the identifiability of the corresponding model structure is also fully studied. Specifically, our goal is to reconstruct the unknown parameters in the quadratic objective function using the given discrete-time linear system dynamics and noisy measurements of its output.
The inverse linear quadratic optimal control problem has been widely studied during the past decades, particularly in the continuous-time infinite time-horizon case, e.g., [6,7,8]. In most works, it is assumed that the optimal feedback gain K, which is a constant matrix in the infinite time-horizon case, is measured exactly. A classical method [9] is to formulate the problem as an optimization problem with linear matrix inequality (LMI) constraints and solve for the optimal Q and R when the feedback gain K is known. Nevertheless, in many scenarios, the feedback gain K cannot be known exactly and we can only measure the optimal state sequence or the corresponding system output. In [10], given noisy state observations, the discrete-time infinite time-horizon case is studied, in which the optimal feedback gain K is time-invariant. Their approach is to first identify the feedback matrix K from the noisy observations by a maximum-likelihood estimator; the matrices Q and R can then be solved for in a similar way to the method proposed in [9].
However, note that in the finite time-horizon case the feedback gain \(K_t\) is time-varying. Since identifying a time-varying system is in general not an easy task, such a “two-step” approach, i.e., “first identify the feedback gain using classical system identification methods, then search for the corresponding parameters in the quadratic objective function”, cannot be directly applied. Furthermore, one critical property ignored in the first identification step is that the \(K_t\)’s are actually generated by an LQR. Hence, it is difficult to guarantee the performance of such a method in the finite time-horizon case, and the optimality behind the observations could have been utilized to further improve the performance.
Since linear quadratic regulators can be seen as a special case of nonlinear optimal control problems, more generally speaking, the problem of IOC for quadratic regulators also lies within the scope of IOC problems for nonlinear systems. Usually, the considered objective function is composed of a weighted sum of basis functions, to which various parameter identification methods or learning techniques are applied. For example, in [11, 12], the optimality conditions for the forward optimal control problem are analyzed and the undetermined parameters in the cost function are estimated by minimizing the violation of these conditions. Nevertheless, [13] showed that such methods are not statistically consistent and are sensitive to observation noise. Instead, a statistically consistent formulation is proposed in [13], though the resulting optimization problem is difficult to solve. The discrete finite-time case is also considered in [14, 15] based on Pontryagin’s Maximum Principle (PMP) for the optimal control problem. An optimization problem is then formulated whose constraints are two of the three conditions of PMP, and the residual of the third PMP condition is minimized. It is assumed therein that the optimal control input signal is known exactly, while in our case we only assume that the initial values and noisy system outputs are known. Moreover, though the authors claim that the problem of identifiability is considered, the “identifiability” considered therein does not coincide with the traditional definition of identifiability. In addition, the statistical consistency of the estimation is not established therein, hence the performance under noisy observations cannot be guaranteed. In [16], the authors consider discrete-time IOC for nonlinear systems when some segments of the trajectory and input observations are missing.
However, their analysis of identifiability likewise does not accord with the traditional definition of identifiability, and statistical consistency cannot be guaranteed.
The aforementioned problems of estimating the parameters of a nonlinear optimal control problem do indeed include inverse LQ optimal control as a special case. However, taking advantage of the special structure of the LQR, we can further prove the identifiability of the model structure in Sect. 3 as well as the statistical consistency of the estimation in Sect. 4. Furthermore, under Gaussian assumptions, we use the EM algorithm, exploiting the problem’s special structure, to obtain a maximum-likelihood estimate of the objective function, as shown in Sect. 5. The performance of the algorithms is shown by numerical examples in Sect. 6.
Notations We denote the cone of n-dimensional positive semidefinite matrices by \({\mathbb {S}}^n_+\), while \({\mathbb {S}}^n_{++}\) is the set of positive definite matrices. \(\Vert \cdot \Vert\) denotes the \(l_2\) norm of a vector. We denote the Frobenius norm of a matrix by \(\Vert \cdot \Vert _F\). \(\mathrm{tr}(\cdot )\) denotes the trace of a matrix, and \(\mathrm{tr}(H_1^\mathrm{T}H_1)=\Vert H_1\Vert _F^2\). \(\otimes\) denotes the Kronecker product and \([H]_i\) denotes the ith row of matrix H.
Problem formulation
The “forward” optimal LQ problem reads
where \(S,Q\in {\mathbb {S}}^n_+\), \(R\in {\mathbb {S}}^m_{++}\), \(x_t\in {\mathbb {R}}^n\) and \(u_t\in {\mathbb {R}}^m\).
Suppose the noisy output \(y_t=Cx_t+v_t\) is available, where \(v_t\in {\mathbb {R}}^l\) denotes the noise term, \(y_t\in {\mathbb {R}}^l\) and the matrices (A, B, C) are known. For the sake of simplicity, we only consider the case in which \(R=I\) and \(S=0\) in this paper.
The IOC problem considered in this paper aims to find Q in the objective function using:

1.
exact samples of the initial value \(x_1={\bar{x}}\) and the noisy outputs \(y_{2:N}\), or

2.
the noisy output \(y_{1:N}\) under Gaussian assumption.
We further assume that (A, B) is controllable and B has full column rank. Moreover, a discrete-time system is usually sampled from a continuous linear system \({\dot{x}}={\hat{A}}x+{\hat{B}}u\), which means that \(A=\mathrm{e}^{{\hat{A}}{\Delta } t}\) and \(B=\int \nolimits _0^{{\Delta } t}\mathrm{e}^{{\hat{A}}\tau }{\hat{B}}\mathrm{d}\tau\); since the matrix exponential is invertible, it is reasonable to further suppose that A is invertible.
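For concreteness, the sampling argument and the forward problem can be sketched in code. The following is our own illustrative Python sketch (the continuous-time pair \({\hat{A}}\), \({\hat{B}}\), the horizon and all numerical values are assumptions, not taken from the paper): it discretizes via the matrix exponential and solves the forward LQR with \(R=I\), \(S=0\) by the standard backward Riccati recursion.

```python
import numpy as np

def expm_taylor(M, terms=40):
    # Truncated Taylor series for the matrix exponential (adequate for the
    # small-norm matrices used here; not a production-grade expm).
    E, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        E = E + term
    return E

# Hypothetical continuous-time system (our illustrative values).
A_hat = np.array([[0.0, 1.0], [-1.0, -0.5]])
B_hat = np.array([[0.0], [1.0]])
dt, n, m = 0.1, 2, 1

# Van Loan trick: one augmented exponential yields A = e^{A_hat dt} and
# B = (int_0^dt e^{A_hat tau} d tau) B_hat simultaneously.
M_aug = np.zeros((n + m, n + m))
M_aug[:n, :n], M_aug[:n, n:] = A_hat, B_hat
E = expm_taylor(M_aug * dt)
A, B = E[:n, :n], E[:n, n:]

def lqr_gains(A, B, Q, N):
    """Backward DRE with R = I, S = 0:
    P_N = 0, K_t = -(I + B^T P_{t+1} B)^{-1} B^T P_{t+1} A,
    P_t = Q + A^T P_{t+1} (A + B K_t). Returns [K_1, ..., K_{N-1}]."""
    P, Ks = np.zeros_like(A), []
    for _ in range(N - 1):
        K = -np.linalg.solve(np.eye(B.shape[1]) + B.T @ P @ B, B.T @ P @ A)
        Ks.append(K)
        P = Q + A.T @ P @ (A + B @ K)
    return Ks[::-1]

Ks = lqr_gains(A, B, np.eye(n), N=40)
```

Since \(A=\mathrm{e}^{{\hat{A}}{\Delta } t}\) has determinant \(\mathrm{e}^{\mathrm{tr}({\hat{A}}){\Delta } t}\ne 0\), the invertibility of A claimed above can be confirmed numerically on this instance.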
Model identifiability
Before moving on, we would like to investigate a fundamental question: given the observation data, can we uniquely reconstruct the parameter Q in the objective function? This question actually involves two aspects: identifiability of the model structure and persistent excitation. Identifiability is a property of the model structure that determines whether the model structure can generate the same output under two different parameters, while persistent excitation of the input signal ensures that we can actually recover the undetermined parameters of an identifiable model from the measured output. Hence, before further analysis, we need to define the model structure for the IOC problem.
A model structure is nothing but a parameterized candidate model that describes the input–output relationship [17]. In the IOC problem just formulated, we regard the initial value \({\bar{x}}\) as the input signal and the system output \(y_t\) as the output signal. Therefore, the model structure \({\mathcal {M}}_{2:N-1}(Q)\) considered in this paper is defined as follows:
Definition 1
(Model structure) For \({\mathcal {M}}_t:{\mathbb {R}}^n\mapsto {\mathbb {R}}^l,t=2,\ldots ,N-1\), the model structure \({\mathcal {M}}_{2:N-1}(\cdot )\) takes the form:
where \(A_{cl}(k;Q)\) is the closedloop system matrix generated by the “forward problem” (1) and (2).
We further make the following assumption for the system:
Assumption 1
The discretetime linear system defined by (A, B, C) is a square system, i.e., \(l=m\) and has relative degree \((r_1,\ldots ,r_m)\). Namely,
where \(c_i\) denotes the ith row of C.
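Assumption 1 can be checked numerically from the Markov parameters \(c_iA^kB\). A minimal sketch (our own helper, using the common discrete-time convention that output i has relative degree \(r_i\) if \(c_iA^kB=0\) for \(k<r_i-1\) and \(c_iA^{r_i-1}B\ne 0\)):

```python
import numpy as np

def relative_degrees(A, B, C, tol=1e-10):
    """Per-output relative degrees: the smallest r with c_i A^{r-1} B != 0.
    Returns None for an output whose first n Markov parameters all vanish,
    i.e., no relative degree found within n steps (a convention we assume)."""
    n = A.shape[0]
    degrees = []
    for c in C:
        r, Ak = None, np.eye(n)
        for k in range(n):
            if np.max(np.abs(c @ Ak @ B)) > tol:
                r = k + 1
                break
            Ak = Ak @ A
        degrees.append(r)
    return degrees

# Discrete-time double integrator (illustrative): c_1 B = 0 but
# c_1 A B != 0, so the single output has relative degree 2.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
```

This is the kind of screening used later in the numerical section, where randomly generated systems are kept only if the relative degree condition holds.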
We first look into more details of the optimal control sequences.
Theorem 1
Suppose that \(N\geqslant n+1\) and (A, B) is controllable. If \(Q\ne Q^\prime , Q,Q^\prime \in {\mathbb {S}}^n_+\), then \(K_{t}(Q)= K_{t}(Q^\prime )\) cannot happen consecutively for more than \(n-1\) steps.
Proof
Recall that the optimal solution to (2) is derived via the discrete-time Riccati equation (DRE)
whose solution sequence is denoted by DRE(Q, S). We then prove the theorem by contradiction. Let \(P_t=DRE(Q,0)\), \(P^\prime _t=DRE(Q^\prime ,0)\), \({\Delta } Q=Q-Q^\prime\) and \({\Delta } P_t=P_t-P^\prime _t\) for \(t=2:N\). Assume there exists \(t_1\in [1,N-1]\) such that \(K_{t}(Q)= K_{t}(Q^\prime )\) holds for \(t\in [t_1-n+1,t_1] \subset [1,N-1]\). Then it holds that
where \(t=t_1-n+1:t_1\).
Using (4) and the fact that A is invertible, \(B^\mathrm{T} {\Delta } P_{t+1}=0\) can be computed recursively for \(t=t_1-n+1:t_1\) and stacked into the following compact form:
Since (A, B) is controllable, \({\varGamma }\) has full column rank. Hence, \({\Delta } P_{t_1+1}\) is the unique solution of the linear equation \({\varGamma } {\mathscr {P}} = {\varOmega }\).
On the other hand, consider the following algebraic equation:
We can apply the same techniques to (5) and obtain the same equation \({\varGamma } {\mathscr {P}} = {\varOmega }\), of which \({\Delta } P\) is a solution. Since \({\varGamma } {\mathscr {P}} = {\varOmega }\) has a unique solution, it follows that \({\Delta } P_{t_1+1}={\Delta } P\). Moreover, recall that it holds for any \(t_1-n+1\leqslant t\leqslant t_1\) that \(A^\mathrm{T}{\Delta } P_{t+1}A+{\Delta } Q = {\Delta } P_t\) and \(A^\mathrm{T}{\Delta } PA+{\Delta } Q={\Delta } P\); thus we have \({\Delta } P_t = {\Delta } P,\forall t = t_1-n+2:t_1+1\).
Using (5) and \(P^\prime _t=DRE(Q^\prime ,0)\), we can then show that \({\bar{P}}_t\triangleq P_t^\prime +{\Delta } P\) is in fact the solution to \(DRE(Q,{\Delta } P)\) for all \(t\in [1,N-1]\), i.e.,
where \({\bar{P}}_N=P_N^\prime +{\Delta } P={\Delta } P\).
Recall that on the time interval \(t =t_1-n+2:t_1+1\), \(P_t\) and \({\bar{P}}_t\) satisfy the same DRE with weighting matrix Q and boundary \({P}_{t_1+1}={\bar{P}}_{t_1+1}=P_{t_1+1}^\prime +{\Delta } P\). Then, based on the backward recursion of the DRE, it is clear that \(P_t={\bar{P}}_t\) for all \(1 \leqslant t \leqslant t_1+1\). This means that \(P_t\) and \({\bar{P}}_t\) are generated by the same forward Riccati equation with initial condition \(P_1\). We further show uniqueness of the solution to the forward DRE for \(t\in [1,N-1]\) by considering its adjoint system and using the fact that A is nonsingular
whose solution can be uniquely determined for all \(t\geqslant 1\). Recall that \(P_t=DRE(Q,0)\) and let \(\lbrace X_t \rbrace\) be the regular matrix solution of its closed-loop system \(X_{t+1}=(A+BK_t)X_t\) with \(X_1=I\). Since \(A+BK_t\) is always nonsingular [18], the nonsingular \(X_t\) together with \(Y_t=P_t X_t\) composes the unique solution to (6). Therefore, for the given \(P_1\), there exists a unique solution \(Y_tX_t^{-1}\) that satisfies the forward DRE for \(t=1:N\), which contradicts the fact that \({\bar{P}}_N={\Delta } P \ne P_N\). Hence the claim follows. \(\square\)
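Theorem 1 can be illustrated numerically on a toy instance. The sketch below (system matrices and weighting matrices are our own illustrative choices) computes the gain sequences for two different Q and checks that they never coincide on n consecutive steps; note that the terminal gains \(K_{N-1}\) do coincide, since \(P_N=S=0\) makes both of them zero.

```python
import numpy as np

def dre_gains(A, B, Q, N):
    # Backward DRE with R = I, S = 0; returns [K_1, ..., K_{N-1}].
    P, Ks = np.zeros_like(A), []
    for _ in range(N - 1):
        K = -np.linalg.solve(np.eye(B.shape[1]) + B.T @ P @ B, B.T @ P @ A)
        Ks.append(K)
        P = Q + A.T @ P @ (A + B @ K)
    return Ks[::-1]

# Illustrative controllable pair with invertible A (our choice).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
n, N = 2, 40
K_a = dre_gains(A, B, np.diag([1.0, 2.0]), N)
K_b = dre_gains(A, B, np.diag([2.0, 1.0]), N)
```

Scanning every window of n consecutive time steps of `K_a` and `K_b` exhibits at least one differing gain per window, consistent with the theorem's bound of at most \(n-1\) consecutive coincidences.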
Now, we are ready to present the theorem on the identifiability of the model structure \({\mathcal {M}}_{2:N-1}(Q)\).
Theorem 2
Under Assumption 1, suppose that (A, B) is controllable, \(N\geqslant 2n\), A is invertible, and B has full column rank; then the model structure (3) is strictly globally identifiable.
Proof
We prove the statement by contradiction. Suppose two different weighting matrices \(Q_1 \ne Q_2\) could generate the same model structure, namely, \({\mathcal {M}}_{2:N-1}(Q_1)={\mathcal {M}}_{2:N-1}(Q_2)\). Let \({\mathcal {M}}_{t,i}(Q)\) denote the model structure that corresponds to the ith system output at time instant t. By Theorem 1, we may assume \(t^\prime \leqslant n-1\) is the first time instant such that \(K_{t^\prime }(Q_1)\ne K_{t^\prime }(Q_2)\). Therefore, we have
Hence, if it holds for the model structures that \({\mathcal {M}}_t(Q_1)= {\mathcal {M}}_t(Q_2)\) at all \(t=1:N-1\), then it must hold that
Using the fact that \(A_{cl}(t;Q)\) is invertible for all \(t=1:N-1\) [5], we can cancel the product of closed-loop system matrices in the above equation and get
Since \({\mathscr {L}}\) is invertible by definition, it holds that \(K_{t^\prime }(Q_1)=K_{t^\prime }(Q_2)\) and we have a contradiction. Hence the model structure is globally identifiable at \(Q_1\). Since \(Q_1\) is arbitrarily chosen, this means that the model structure is strictly globally identifiable. \(\square\)
Remark 1
From the above theorem, it is easy to see that it is actually sufficient for the time-horizon length to satisfy \(N\geqslant n+\max _i(r_i)\) for the model structure to be strictly globally identifiable. However, we state the theorem as it is to make it easier to comprehend. On the other hand, for the case \(l>m\) (the output dimension is greater than the input dimension), we just need to assume that there is a subset of outputs that makes the system have relative degree \((r_1,\ldots ,r_m)\).
Here, we would like to remark that observability of the system is not assumed in our problem formulation. This is due to the fact that lack of observability can happen in various situations, such as sensor unavailability, data shared under privacy constraints, uncertain environments, or tasks with a limited number of descriptive features, and it is not necessary to impose reconstructability of the state from noisy observations. On the other hand, for the identifiability of the IOC problem, it is required that the influence of different cost functions be reflected sufficiently in the observation sequences. More specifically, from the proof of Theorem 2, we can see that the existence of a relative degree plays a key role in guaranteeing the distinguishability of the system output under different parameter specifications, namely, \({\bar{y}}_t({\bar{Q}},{\bar{x}})={\bar{y}}_t(Q,{\bar{x}})\) for \(t=2,\ldots ,N\) if and only if \(Q={{\bar{Q}}}\). This means that under such an assumption, the LQR problems generate exactly the same output sequences in the noiseless case only when the same weighting matrix Q is used. However, such a property cannot always be guaranteed for systems without a relative degree, as illustrated by the following example.
Example 1
Consider the following controllable square system
which, however, does not have a relative degree.
For the quadratic cost function, we take \(N=5\). We can show that the following weighting matrices generate the same output sequences \({\bar{y}}_t\):
First, we check the first output. Since \(c_1B=0\), \(c_1AB \ne 0\), and \({\bar{y}}_{t+1,i}(Q,{\bar{x}})=c_i\Pi _{j=1}^t(A+BK_j){\bar{x}}\), straightforward computation gives
By computing \(K_t\) from the DRE, one can easily show that \({\bar{y}}_{t,1}({\bar{Q}},{\bar{x}})={\bar{y}}_{t,1}(Q,{\bar{x}})\) for any \({{\bar{x}}}\) and \(t=1,\ldots ,N\), since all the coefficient matrices of \({{\bar{x}}}\) on the right-hand side of the above equations are equal for \(K_t(Q)\) and \(K_t({{\bar{Q}}})\). The same claim also holds for \({\bar{y}}_{t,2}\) and the statement follows. \(\square\)
IOC using exact initial values and noisy output
We first consider the IOC problem in which exact samples of the initial value \(x_1={\bar{x}}\) and the noisy outputs \(y_{2:N}\) are available. Suppose that \({\bar{x}}\in {\mathbb {R}}^n\) and \(\{v_t\in {\mathbb {R}}^l\}_{t=2}^N\) are independent random vectors with unknown distributions. We further make the following assumptions:
Assumption 2
It holds that \(N\geqslant 2n\), \(l\geqslant m\), and there exists a subset of outputs that makes the system have relative degree \((r_1,\ldots ,r_m)\).
Assumption 3
(Persistent excitation) \(\forall \eta \in {\mathbb {R}}^n\), \(\exists r(\eta )\in {\mathbb {R}}\backslash \{0\}\) such that \({\mathbb {P}}\left( {\bar{x}}\in {\mathscr {B}}_\varepsilon (r\eta )\right)>0,\forall \varepsilon >0\), where \({\mathscr {B}}_\varepsilon (r\eta )\) is the open \(\varepsilon\)ball centered at \(r\eta\).
Assumption 4
The “real” \({\bar{Q}}\) belongs to a compact set \(\bar{{\mathbb {S}}}^n_+(\varphi )=\{Q\in {\mathbb {S}}^n_+ \mid \Vert Q\Vert ^2_F\leqslant \varphi \}\).
Assumption 5
\({\mathbb {E}}(\Vert {\bar{x}}\Vert ^2)<+\infty\), \({\mathbb {E}}(v_t)=0\) and \({\mathbb {E}}(\Vert v_t\Vert ^2)<+\infty\).
With the assumptions above, the forward LQR problem (1) and (2) can actually be expressed as
where \({\varOmega }\) is the sample space. Hence the optimal trajectory \(\{x_t^*\}\) as well as the observed system output \(\{y_t^* = Cx_t^*+v_t\}\) are random vectors implicitly determined by the random variable \({\bar{x}}\) and the parameter Q. Based on (7) we can define a risk function as
where \(\xi =[{\bar{x}}^\mathrm{T},Y^\mathrm{T}]^\mathrm{T}\), \(Y=[y_2^\mathrm{T},\ldots ,y_N^\mathrm{T}]^\mathrm{T}\) and \(f:\bar{{\mathbb {S}}}^n_+(\varphi )\times {\mathbb {R}}^{n+(N-1)l}\mapsto {\mathbb {R}}\),
and \(x_{2:N}^*(Q;\,{\bar{x}})\) is the optimal trajectory to (7).
We minimize the risk function to obtain an estimate of the Q matrix, namely,
The main question now is whether the risk minimization problem has a unique solution. For a given closed-loop system matrix \(A+BK_t\) of a finite-horizon LQR problem, the uniqueness of the corresponding quadratic cost function has been studied for discrete-time systems [18] and continuous-time systems [19], respectively. However, additional difficulties arise when we can only measure the system output. The following lemma shows the uniqueness of the solution to (10) under the assumption of relative degrees.
Lemma 1
Under Assumptions 2–5, the risk minimization problem (10) has a unique optimizer \({\bar{Q}}\), where \({\bar{Q}}\) is the true value used in the forward problem (7).
Proof
Since \(y_t=Cx^*_t({\bar{Q}},{\bar{x}})+v_t\) and \({\mathbb {E}}(v_t)=0,t=2:N\), (8) can be simplified as \({\mathscr {R}}(Q)=L(Q)+\textstyle \sum \nolimits _{t=2}^N{\mathbb {E}}(\Vert v_t\Vert ^2)\), where \(L(Q)={\mathbb {E}}( \textstyle \sum \nolimits _{t=2}^N\Vert {\bar{y}}_t({\bar{Q}},{\bar{x}})-{\bar{y}}_t(Q,{\bar{x}})\Vert ^2)\), \({\bar{y}}_t({\bar{Q}},{\bar{x}})=Cx_t^*({\bar{Q}},{\bar{x}})\) and \({\bar{y}}_t(Q,{\bar{x}})=Cx_t^*(Q,{\bar{x}})\). It is clear that \(Q={\bar{Q}}\) minimizes the risk function \({\mathscr {R}}(Q)\). What remains is to show uniqueness.
Recall that by our assumption, there exists a subset of outputs that makes the system have relative degree \((r_1,\ldots ,r_m)\). On the other hand, it holds that \({\bar{y}}_t(Q,{\bar{x}})={\mathcal {M}}_t(Q){\bar{x}}\). Note that
By Theorem 2, we know that given \(N\geqslant 2n\), if \(Q\ne {\bar{Q}}\), then there exist some time instants \(t_i\) such that \({\mathcal {M}}_{t_i}(Q)\ne {\mathcal {M}}_{t_i}({\bar{Q}})\). On the other hand, by Assumption 3, \(\exists r(\eta )\ne 0\) such that \({\mathbb {P}}({\bar{x}}\in {\mathscr {B}}_\varepsilon (r\eta ))>0,\forall \varepsilon >0\). Since \(r\ne 0\), it holds that \({\mathcal {M}}_{t_i}({\bar{Q}})(r\eta )\ne {\mathcal {M}}_{t_i}(Q)(r\eta )\). Furthermore, since \({\mathcal {M}}_t(Q)\eta\) is continuous with respect to \(\eta\) for all \(Q\in {\mathbb {S}}^n_+\), this implies that \(\exists \varepsilon _1\) such that \({\mathcal {M}}_{t_i}(Q){\bar{x}}\ne {\mathcal {M}}_{t_i}({\bar{Q}}){\bar{x}},\forall {\bar{x}}\in {\mathscr {B}}_{\varepsilon _1}(r\eta )\) and \({\mathbb {P}}\left( {\bar{x}}\in {\mathscr {B}}_{\varepsilon _1}(r\eta )\right) >0\). Thus,
Hence, \({\bar{Q}}\) is the unique minimizer to (8) and the statement follows. \(\square\)
The risk minimization problem is expected to be solved numerically. To tackle the issue that the distributions of \({\bar{x}}\) and \(v_{2:N}\) are unknown, we approximate (8) using the empirical mean
where the \(\xi ^{(i)}\) are i.i.d. samples of \(\xi\).
We first consider the optimal control problem (7). By Pontryagin’s Maximum Principle (PMP), if \(u_{1:N-1}^*\) and \(x_{1:N}^*\) are the optimal control and corresponding trajectory, then there exist adjoint variables \(\lambda _{2:N}^*\) such that
Since what we consider is a forward LQR problem, the PMP conditions are also sufficient. Hence, we can express \(x_{2:N}^*\) in (9) using (12). Thus, the approximated risk minimization problem reads
The superscript “star” is omitted for simplicity. Note that the optimizer \((Q_M^*(\omega ),x_{2:N}^{(i)*}(\omega ),\lambda _{2:N}^{(i)*}(\omega ))\) is stochastic and is optimal in the sense that it optimizes (11) for every \(\omega \in {\varOmega }\).
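The risk minimization and its empirical approximation can be illustrated on a toy instance. The sketch below is ours (our own system, a one-parameter family \(Q=qI\), and a plain grid search standing in for a constrained nonlinear solver): it generates noisy outputs from a “true” q and recovers it by minimizing the empirical risk.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
C = np.array([[1.0, 0.0]])
n, N, M = 2, 20, 100

def gains(Q):
    # Backward DRE with R = I, S = 0; returns [K_1, ..., K_{N-1}].
    P, Ks = np.zeros((n, n)), []
    for _ in range(N - 1):
        K = -np.linalg.solve(np.eye(1) + B.T @ P @ B, B.T @ P @ A)
        Ks.append(K)
        P = Q + A.T @ P @ (A + B @ K)
    return Ks[::-1]

def outputs(Ks, x0):
    # Noise-free outputs ybar_{2:N} of the closed loop from x_1 = x0.
    x, ys = x0, []
    for K in Ks:
        x = (A + B @ K) @ x
        ys.append(C @ x)
    return np.concatenate(ys)

# Synthetic data from a "true" Q = q_true * I (our parameterization),
# with i.i.d. initial states and additive output noise.
q_true = 1.5
Ks_true = gains(q_true * np.eye(n))
data = [(x0, outputs(Ks_true, x0) + 0.01 * rng.standard_normal(N - 1))
        for x0 in rng.uniform(-5, 5, size=(M, n))]

def empirical_risk(q):
    Ks = gains(q * np.eye(n))
    return sum(np.sum((y - outputs(Ks, x0)) ** 2) for x0, y in data) / M

grid = np.linspace(0.5, 3.0, 26)
q_hat = grid[np.argmin([empirical_risk(q) for q in grid])]
```

The minimizer of the empirical risk lands at (or next to) the true parameter, which is the behavior the consistency result below formalizes as \(M\rightarrow \infty\).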
Based on Lemma 1, we could then show the statistical consistency of the approximated risk minimization problem (13).
Theorem 3
Suppose \(Q_M^*\in \bar{{\mathbb {S}}}^n_+(\varphi )\), \(\{x_{2:N}^{(i)*}\}\) and \(\{\lambda _{2:N}^{(i)*}\}\) solve (13), then \(Q_M^*\overset{p}{\rightarrow }{\bar{Q}}\), where \({\bar{Q}}\) is the true value used in the “forward” problem (7).
Proof
Denote \(z_t=\left( x_t^\mathrm{T},\lambda _t^\mathrm{T}\right) ^\mathrm{T},t=2:N\), then the first two equations in (12) can be written as the following implicit dynamics
Therefore, we can rewrite (12) in the following compact matrix form as
where
Note that for an arbitrary \(Q\in {\mathbb {S}}^n_+\), since (14) is a necessary and sufficient condition for the optimality of the corresponding forward LQR problem, which in turn has a unique solution, \({\mathscr {F}}(Q)\) must be invertible for all \(Q\in {\mathbb {S}}^n_+\). Thus, it follows that \(Z={\mathscr {F}}(Q)^{-1}b({\bar{x}})={\mathscr {F}}(Q)^{-1}{\tilde{A}}{\bar{x}}\), where \({\tilde{A}}=[A^\mathrm{T},0,\ldots ,0]^\mathrm{T}\). Hence \(f(Q;\xi )\) can be rewritten as
where \(G=I_{N-1}\otimes \left[ C,0_{l\times n}\right]\) and \(Y=[y_2^\mathrm{T},\ldots ,y_N^\mathrm{T}]^\mathrm{T}\).
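The stacked system \({\mathscr {F}}(Q)Z=b({\bar{x}})\) can be assembled explicitly. The sketch below uses one self-consistent convention of our own for the PMP conditions (cost \(\sum _{t=1}^{N-1}(x_t^\mathrm{T}Qx_t+u_t^\mathrm{T}u_t)\) with \(S=0\), and \(u_t\) eliminated via \(u_t=-\frac{1}{2}B^\mathrm{T}\lambda _{t+1}\)); the exact block layout in the paper may differ, but solving the stacked linear system must reproduce the Riccati-based closed-loop trajectory.

```python
import numpy as np

# PMP conditions after eliminating u_t (our convention):
#   x_2 + H lam_2 = A xbar,               H := 0.5 B B^T,
#   x_{t+1} + H lam_{t+1} - A x_t = 0,    t = 2, ..., N-1,
#   lam_t - 2 Q x_t - A^T lam_{t+1} = 0,  t = 2, ..., N-1,
#   lam_N = 0.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.diag([1.0, 2.0])
n, N = 2, 10
H = 0.5 * (B @ B.T)
xbar = np.array([1.0, -2.0])

dim = 2 * n * (N - 1)          # Z = (x_2, lam_2, ..., x_N, lam_N)
F = np.zeros((dim, dim))
b = np.zeros(dim)

def xcol(i):                   # columns of x_{i+2} in Z
    return slice(2 * n * i, 2 * n * i + n)

def lcol(i):                   # columns of lam_{i+2} in Z
    return slice(2 * n * i + n, 2 * n * (i + 1))

r = 0
F[r:r + n, xcol(0)] = np.eye(n); F[r:r + n, lcol(0)] = H
b[r:r + n] = A @ xbar; r += n
for i in range(N - 2):         # couples z_t and z_{t+1}, t = i + 2
    F[r:r + n, xcol(i)] = -A
    F[r:r + n, xcol(i + 1)] = np.eye(n); F[r:r + n, lcol(i + 1)] = H
    r += n
    F[r:r + n, xcol(i)] = -2 * Q; F[r:r + n, lcol(i)] = np.eye(n)
    F[r:r + n, lcol(i + 1)] = -A.T
    r += n
F[r:r + n, lcol(N - 2)] = np.eye(n)   # lam_N = 0
Z = np.linalg.solve(F, b)             # Z = F(Q)^{-1} b(xbar)
```

The cross-check against the backward Riccati recursion confirms that the stacked PMP system and the feedback solution encode the same unique optimal trajectory, which is exactly why \({\mathscr {F}}(Q)\) is invertible.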
It can be shown in a similar way to [5] that the uniform law of large numbers [20] applies, namely,
In addition to (15), by Lemma 1 we have also shown that \({\bar{Q}}\) is the unique optimizer to (10), then \(Q_M^*\overset{p}{\rightarrow }{\bar{Q}}\) follows directly from Theorem 5.7 in [21]. \(\square\)
IOC under Gaussian assumption
In this section, we consider the case in which exact samples of the initial values are not available and we can only observe the noisy measurement \(y_1=Cx_1+v_1\). Recall that in Sect. 3 we regard the initial value as the “input signal” of the model structure and the system output observation as the “output signal”. This means that in this scenario, the problem is actually an errors-in-variables system identification problem, which is hard in general. To tackle this, we make the following assumption in the remainder of this section.
Assumption 6
\({\bar{x}}=x_1\sim {\mathcal {N}}(m_1,{\varSigma} _1)\), \(v_t\sim {\mathcal {N}}(0,{\varSigma} _v)\) and \(v_{1:N}\) are white. \({\varSigma} _v\) is known, but \(m_1\) and \({\varSigma} _1\) are not a priori known.
Recall that in the proof of Theorem 3, we were able to write \(Y=[y_2^\mathrm{T},\ldots ,y_N^\mathrm{T}]^\mathrm{T}=G{\mathscr {F}}(Q)^{-1}{\tilde{A}}{\bar{x}}+[v_2^\mathrm{T},\ldots ,v_N^\mathrm{T}]^\mathrm{T}\). Namely, it holds that
where \(G_t=\left[ G\right] _{l(t-1)+1:lt}\). We can regard \(y_{1:N}\) as the output of the following system:
where \({\tilde{C}}_1(Q)=C\) and \({\tilde{C}}_t(Q)=G_t{\mathscr {F}}(Q)^{-1}{\tilde{A}},t=2:N\). This means that under Assumption 6, the IOC problem can be viewed as a system identification problem for a linear Gaussian model, which can be solved by the maximum-likelihood method. If M sets of output sequences are available, we have M i.i.d. samples \(x_1^{(1:M)}\) of the initial value. Since the \(v_t^{(1:M)}\) are also i.i.d., \(y_t^{(i)} = Cx_t^{(i)}+v_t^{(i)}\) and \(x_t^{(i)}\) depends only on the initial value and Q, \(y_t^{(i)}\) is independent of \(y_t^{(j)}\) for all \(i\ne j\).
The EM algorithm will be used to obtain a maximum-likelihood estimate of Q. Based on (16), \(\eta _1\) is chosen as the latent variable and we use \(\theta\) to parametrize Q, \(m_1\) and \({\varSigma} _1\). We are now ready to compute the auxiliary function \({\mathcal {Q}}(\theta ,\theta _k)\) for the EM algorithm.
Proposition 1
Under Assumption 6, the auxiliary function \({\mathcal {Q}}(\theta ,\theta _k)\) that corresponds to the model (16) is given by
where
and \({\hat{\eta }}_{1\mid N}^{(i)}:={\mathbb {E}}_{\theta _k}[\eta _1^{(i)}\mid y_{1:N}^{(i)}]\), \(P_{1\mid N}^{(i)}={{\,\mathrm{cov}\,}}_{\theta _k}[\eta _1^{(i)}\mid y_{1:N}^{(i)}]\).
Proof
By Bayes’ rule, it follows that
Hence by the definition of the auxiliary function \({\mathcal {Q}}(\theta ,\theta _k)\), we have
Note that \(y_1^{(i)}=C\eta _1^{(i)}+v_1^{(i)}\), hence \(p_\theta (y_1^{(i)}\mid \eta _1^{(i)})\sim {\mathcal {N}}(C\eta _1^{(i)},{\varSigma} _v)\), which does not depend on \(\theta\). By using the cyclic permutation property of the matrix trace and the fact that \({\mathbb {E}}_{\theta _k}[\eta _1^{(i)}\eta _1^{(i){\rm T}}\mid y_{1:N}^{(i)}]={\hat{\eta }}_{1\mid N}^{(i)}{\hat{\eta }}_{1\mid N}^{(i){\rm T}}+P_{1\mid N}^{(i)}\), we have
where the constant and irrelevant terms are ignored. \(\square\)
We can obtain the smoothed mean \({\hat{\eta }}_{1\mid N}^{(i)}\) and covariance \(P_{1\mid N}^{(i)}\) of the initial value by fixed-point smoothing [22]. Each iteration of the EM algorithm involves maximizing \({\mathcal {Q}}(\theta ,\theta _k)\) with respect to \(\theta\). Since \(m_1\) and \({\varSigma} _1\) appear only in \({\mathcal {Q}}_1(\theta ,\theta _k)\) while Q appears only in \({\mathcal {Q}}_2(\theta ,\theta _k)\), they can be optimized separately. The first-order optimality conditions for maximizing \({\mathcal {Q}}_1(\theta ,\theta _k)\) with respect to \(m_1\) and \({\varSigma} _1\) read:
Since (21) and (22) give the unique stationary point of \({\mathcal {Q}}_1(\theta ,\theta _k)\), it is also the global maximizer of \({\mathcal {Q}}_1(\theta ,\theta _k)\). In contrast, an analytic solution for the optimizer of \({\mathcal {Q}}_2(\theta ,\theta _k)\) is not available, thus we have to solve for \(Q^*\) numerically. The problem of maximizing \({\mathcal {Q}}_2(\theta ,\theta _k)\) can be rewritten as
where \(Y^{(i)}=[y_2^{(i)\rm T},\ldots ,y_N^{(i)\rm T}]^\mathrm{T}\) and \({\varSigma} _V=I_{N-1}\otimes {\varSigma} _v\). Using the function \({\hat{f}}_\varepsilon\) proposed in [23] and applying the same “trick” mentioned at the end of Sect. 4, we can use standard nonlinear optimization solvers to solve (23).
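Equations (21) and (22) are referenced but not displayed above; for the Gaussian initial-state term they correspond to the standard EM updates, which we sketch here as our own reconstruction (the sample mean of the smoothed means, and the average smoothed covariance plus the scatter of the smoothed means about the new mean):

```python
import numpy as np

def m_step_initial(eta_hat, P_smooth):
    """Closed-form maximizer of the Gaussian initial-state part of the
    auxiliary function (hedged reconstruction of the unshown (21)-(22)):
      eta_hat:  (M, n) smoothed means  E[eta_1^(i) | y_{1:N}^(i)],
      P_smooth: (M, n, n) smoothed covariances P_{1|N}^(i).
    Returns the updated (m_1, Sigma_1)."""
    m1 = eta_hat.mean(axis=0)
    diffs = eta_hat - m1
    Sigma1 = (P_smooth.sum(axis=0) + diffs.T @ diffs) / eta_hat.shape[0]
    return m1, Sigma1
```

This mirrors the familiar M-step of EM for linear Gaussian state-space models, where the smoothed moments come from the fixed-point smoother of the preceding paragraph.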
Numerical examples
In our numerical simulation, a group of discrete-time linear systems sampled from continuous linear systems \({\dot{x}}={\hat{A}}x+{\hat{B}}u\) with the sampling period \({\Delta } t=0.1\) is generated, where
and \(a_1,a_2,c_1,c_2\) are randomly generated from \({\mathcal {N}}(0,3)\), respectively. However, we need to screen for satisfaction of the relative degree condition. The “real” \({\bar{Q}}\) is generated by letting \({\bar{Q}}={\hat{Q}}{\hat{Q}}^\mathrm{T}\), where each element of \({\hat{Q}}\) is sampled from the uniform distribution on \([-1,1]\), and the time-horizon length N is set to 40. The feasible compact set for Q is set as \(\bar{{\mathbb {S}}}^n_+(5)\), and any randomly generated \({\bar{Q}}\) that does not belong to \(\bar{{\mathbb {S}}}^n_+(5)\) is discarded and regenerated until we have 200 groups of randomly generated “forward” problems. Each element of the initial conditions \({\bar{x}}^{(1:M)}\) is sampled from a uniform distribution supported on \([-5,5]\). For each forward problem, 200 trajectories are generated, i.e., \(M=200\). We add white Gaussian noise at 20 dB SNR to \(Cx_{2:N}^{(1:M)}\) to obtain \(y_{2:N}^{(1:M)}\). The MATLAB function fmincon is used to solve the risk minimization problem. We use \(Q=I\) as the initial iterate when solving the optimization problem in MATLAB for all IOC estimations. Denote the estimate of \({\bar{Q}}\) by \(Q_\mathrm{est}\).
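The 20 dB noise level can be realized by scaling the noise power to the measured signal power. A small sketch of one common convention (similar in spirit to MATLAB's awgn with the 'measured' option; the helper name is ours):

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise at the given SNR (dB), with noise power
    set relative to the empirical power of `signal` (one common 'awgn'
    convention; this helper and its name are our own)."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + np.sqrt(p_noise) * rng.standard_normal(signal.shape)
```

At 20 dB, the noise power is thus one hundredth of the signal power.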
From Fig. 1, we see that the relative error \(\Vert Q_\mathrm{est}-{\bar{Q}}\Vert _F/\Vert {{\bar{Q}}}\Vert _F\) decreases markedly as M increases. This empirically corroborates the statistical consistency established in Theorem 3.
Next, the results of IOC estimation under Gaussian assumptions are illustrated. Here, 100 groups of forward problems are generated, with 50 trajectories for each group, i.e., \(M=50\). The initial values are randomly generated by sampling from \({\mathcal {N}}(m_1=0,{\varSigma} _1=10I)\). White Gaussian noise is added to \(Cx_{1:N}^{(1:M)}\) to obtain \(y_{1:N}^{(1:M)}\), with \(v_t\sim {\mathcal {N}}(0,{\varSigma} _v=0.01I)\). We use \(m_1=I\), \({\varSigma} _1=20I\) and \(Q=I\) as the initial guesses for the EM algorithm for all groups. The stopping criterion for the iteration is \(\Vert Q_{k+1}-Q_k\Vert _F\leqslant 10^{-4}\). The results are shown in Fig. 2.
As illustrated in Fig. 2, the relative error \(\Vert Q_\mathrm{est}-{\bar{Q}}\Vert _F/\Vert {{\bar{Q}}}\Vert _F\) is small in most cases, but we do obtain a few estimates with huge relative errors. This is because the EM algorithm only guarantees that the likelihood does not decrease; it guarantees neither finding the global maximizer of the log-likelihood function nor the convergence of \(Q_k\) to a local maximum. Mitigating this issue will be part of our future work.
Conclusions
In this investigation, IOC for finite-horizon discrete-time LQRs is studied. We first investigate the identifiability of the model structure, and conditions for identifiability are given. Next, the problem of IOC is considered under two scenarios. We first consider the case where the distributions of the initial state and the observation noise are completely unknown, yet the initial states and noisy system outputs are available. The problem is formulated as a risk minimization problem. By utilizing the identifiability property, we are able to prove the statistical consistency of this estimation method. We next consider the case where the initial states are not available, but the distribution of the initial state as well as that of the observation noises is Gaussian. A maximum-likelihood estimate is obtained by the EM algorithm. The performance of our methods is illustrated by numerical examples.
References
Kalman, R. E. (1964). When is a linear control system optimal? Journal of Basic Engineering, 86(1), 51–60.
Mombaur, K., Truong, A., & Laumond, J.-P. (2010). From human to humanoid locomotion: an inverse optimal control approach. Autonomous Robots, 28(3), 369–383.
Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In International conference on machine learning, pp. 49–58. New York, NY, USA.
Berret, B., & Jean, F. (2016). Why don’t we move slower? The value of time in the neural control of action. Journal of Neuroscience, 36(4), 1056–1070.
Zhang, H., Li, Y., & Hu, X. (2019). Inverse optimal control for finite-horizon discrete-time linear quadratic regulator under noisy output. In IEEE 58th conference on decision and control (CDC), pp. 6663–6668. Nice, France.
Anderson, B. D. O., & Moore, J. B. (2007). Optimal control: Linear quadratic methods. Courier Corporation.
Jameson, A., & Kreindler, E. (1973). Inverse problem of linear optimal control. SIAM Journal on Control, 11(1), 1–19.
Fujii, T. (1987). A new approach to the LQ design from the viewpoint of the inverse regulator problem. IEEE Transactions on Automatic Control, 32(11), 995–1004.
Boyd, S., El Ghaoui, L., Feron, E., & Balakrishnan, V. (1994). Linear matrix inequalities in system and control theory (vol. 15). SIAM.
Priess, M. C., Conway, R., Choi, J., Popovich, J. M., & Radcliffe, C. (2015). Solutions to the inverse LQR problem with application to biological systems analysis. IEEE Transactions on Control Systems Technology, 23(2), 770–777.
Keshavarz, A., Wang, Y., & Boyd, S. (2011). Imputing a convex objective function. In IEEE international symposium on intelligent control (ISIC), pp. 613–619. Denver, CO, USA.
Bertsimas, D., Gupta, V., & Paschalidis, I. C. (2015). Data-driven estimation in equilibrium using inverse optimization. Mathematical Programming, 153(2), 595–633.
Aswani, A., Shen, Z.-J., & Siddiq, A. (2018). Inverse optimization with noisy data. Operations Research, 66(3), 870–892.
Molloy, T. L., Tsai, D., Ford, J. J., & Perez, T. (2016). Discrete-time inverse optimal control with partial-state information: A soft-optimality approach with constrained state estimation. In IEEE 55th conference on decision and control (CDC), pp. 1926–1932. Las Vegas, NV, USA.
Molloy, T. L., Ford, J. J., & Perez, T. (2018). Finite-horizon inverse optimal control for discrete-time nonlinear systems. Automatica, 87, 442–446.
Jin, W., Kulić, D., Mou, S., & Hirche, S. (2018). Inverse optimal control with incomplete observations. arXiv: 1803.07696 (arXiv Preprint).
Ljung, L., & Chen, T. (2013). Convexity issues in system identification. In The 10th IEEE international conference on control and automation (ICCA), pp. 1–9. Hangzhou, China.
Zhang, H., Umenberger, J., & Hu, X. (2019). Inverse optimal control for discrete-time finite-horizon linear quadratic regulators. Automatica, 110, 108593.
Li, Y., Yao, Y., & Hu, X. (2020). Continuous-time inverse quadratic optimal control problem. Automatica, 117, 108977.
Jennrich, R. I. (1969). Asymptotic properties of nonlinear least squares estimators. The Annals of Mathematical Statistics, 40(2), 633–643.
van der Vaart, A. W. (2000). Asymptotic statistics (vol. 3). Cambridge University Press.
Söderström, T. (2012). Discrete-time stochastic systems: Estimation and control. Springer.
Nesterov, Y. (2007). Smoothing technique and its applications in semidefinite optimization. Mathematical Programming, 110(2), 245–259.
Zhang, H., Li, Y. & Hu, X. Discrete-time inverse linear quadratic optimal control over finite time-horizon under noisy output measurements. Control Theory Technol. 19, 563–572 (2021). https://doi.org/10.1007/s11768021000668
Keywords
 Inverse optimal control
 Linear quadratic regulator
 Statistical consistency
 EM algorithm