Abstract
A supremum-of-quadratics representation for a class of extended real valued barrier functions is developed and applied in the context of solving a continuous time linear regulator problem subject to a single state constraint of bounded norm. It is shown that this very simple state constrained regulator problem can be equivalently formulated as an unconstrained two-player game. By demonstrating equivalence of the upper and lower values, and exploiting existence and uniqueness of the optimal actions for both players, state feedback characterizations for the corresponding optimal policies for both players are developed. These characterizations are illustrated by a simple example.
1 Introduction
Finite horizon continuous time linear quadratic regulator (LQR) problems and their solution can be considered classical in systems theory, see for example [2, 16, 18]. The value function that attends their solution as an optimal control problem is quadratic in the initial system state, with its Hessian determined by the unique solution of a differential Riccati equation (DRE) subject to a terminal condition set by the terminal cost. Standard tools exist for the efficient solution of DREs, and thus for such LQR problems.
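To make the role of the DRE concrete, the following sketch integrates the DRE for a one-dimensional LQR problem backwards from its terminal condition. The scalar problem data are hypothetical, chosen purely for illustration.

```python
import numpy as np

# Illustrative scalar LQR sketch (all values hypothetical): minimize
#   int_0^t (q*x_s^2 + r*u_s^2)/2 ds + (p_T/2)*x_t^2,  dx/ds = a*x_s + b*u_s.
# The value function is W(s, x) = p(s)*x^2/2, where p solves the DRE
#   -dp/ds = 2*a*p - (b**2/r)*p**2 + q,  p(t) = p_T,
# integrated backwards from the terminal condition.
a, b, q, r, p_T, t = -1.0, 1.0, 1.0, 1.0, 0.5, 2.0

n = 200_000
h = t / n
p = p_T
for _ in range(n):          # explicit Euler sweep in reversed time tau = t - s
    p += h * (2.0 * a * p - (b**2 / r) * p**2 + q)
p0 = p                      # Hessian of the value function at time s = 0

# p converges toward the stabilizing root of 2*a*p - p^2 + q = 0, i.e. sqrt(2) - 1
assert 0.4 < p0 < 0.5 and abs(p0 - (np.sqrt(2.0) - 1.0)) < 0.01
```

The optimal feedback is then \(u_s^* = -(b/r)\, p(s)\, x_s\); in the matrix case the same backward sweep applies with \(p\) replaced by a symmetric matrix.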
The inclusion of nonlinear dynamics, non-quadratic costs, and/or a constraint into an LQR problem introduces a non-quadratic term into the associated Hamiltonian. This non-quadratic term prevents the Hamilton–Jacobi–Bellman (HJB) partial differential equation (PDE) from simplifying to a DRE, so that computationally more expensive numerical methods are usually required [20, 21]. However, if the non-quadratic term introduced is semiconvex [15, 21], it can be equated with a supremum over a family of quadratic functions, which can be exploited in the subsequent analysis [22]. An example of this type of sup-of-quadratics representation is illustrated in Fig. 1.
The purpose of this paper is to explore the inclusion of a semiconvex extended real valued barrier function in the cost function for an LQR problem via its sup-of-quadratics representation, with the proviso that convexity of the cost function (with respect to the control input) is retained. The intention is to implement an elementary state constraint of bounded norm, containing the origin, using computations that involve a DRE. The development reported is a compendium of earlier efforts [11, 12] by the authors, augmented with detailed proofs to support its validity. The focus is deliberately on the sup-of-quadratics representation for the barrier function involved, and the implications its use has in the formulation and solution of the optimal control problems of interest.
1.1 Approach and Contribution
A typical extended real valued barrier function, and its finite approximation, is illustrated in Fig. 2. This type of barrier function is intended for the implementation of an elementary state constraint of bounded norm, containing the origin. Under reasonable conditions, either an exact or an approximate sup-of-quadratics representation can be constructed for such a barrier function, analogously to Fig. 1. Adding either representation to the cost function for an otherwise standard LQR problem, with the proviso that convexity must be retained, yields a non-quadratic value function for an exact or an approximate state constrained optimal control problem (respectively). Exploiting the aforementioned convexity, and other properties of the cost functions, yields convergence of the approximate problem to the exact problem, in terms of their value functions and their corresponding optimal controls and state trajectories. Satisfaction of the state constraint for the exact problem also follows.
The exact and approximate sup-of-quadratics representation can be further interpreted, via measurable selection, as encoding the actions of an auxiliary player in corresponding games. The value of the approximate game is shown to exist, and to coincide with that of the approximate regulator problem. The earlier convergence results allow corresponding conclusions in the exact case. Further consideration of the lower value of the approximating game leads to state feedback characterizations for the optimal policies of both players. These policies are shown to explicitly depend on the solution of the state dynamics driven by the optimal control, and the solution of a DRE facilitated by the sup-of-quadratics representation. As the state dynamics and the DRE evolve their respective solutions forwards and backwards in time, application of these policies is admitted via solution of a two point boundary value problem.
1.2 Context
The value function of a general state constrained control problem, including for the simple case considered in this paper, can be characterized as the viscosity solution of the aforementioned non-stationary HJB PDE. Uniqueness of this solution can be guaranteed in the presence of suitable controllability assumptions on the constraint set, for example, via inward pointing [8, 26, 27] or related boundary conditions [19]. Alternatively, the value function can be characterized uniquely as the viscosity solution of a variational inequality, in very general settings, using viability theory and non-smooth analysis [1, 4]. Moreover, consistent approximations generated via temporal and spatial discretization provide the foundation for finite difference methods for numerical computation, see for example [7, 9]. In the context of the curse-of-dimensionality, recent advances in optimization based approaches exploiting (for example) reproducing kernels [3] and optimistic planning [5] have provided very promising computational improvements over those earlier grid-based methods.
In the context of duality and barrier function approaches to state constrained optimal control, immediately relevant prior works include [6, 10, 13, 17, 24, 29]. Specifically, [29] develops an approximation scheme for general convex costs, and studies consistency of this approximation, while [10] considers continuous time constrained control in a model predictive control setting, subject to an inward pointing condition on the feedback policy. One of many related investigations exploiting barrier functions in the implementation of constraints is detailed in [13], via a discrete time setting. Duality and saddle point properties are explored in a more general setting in [6, 17], albeit in the case of control constraints. The tools of convex analysis are employed in the general treatment of a closely related class of continuous time problems in [24] that addresses both control and state constraints. A key contribution of the current work relative to [24] concerns the sup-of-quadratics representation developed, and the exploitation of the game formulation that follows from it via measurable selection.
1.3 Organization and Notation
The regulator problem of interest is posed in Sect. 2, along with the barrier functions involved. This is followed in Sect. 3 by development of the exact and approximate sup-of-quadratics representations for these barrier functions, and the introduction of the approximate regulator problem. Existence and uniqueness of optimal trajectories for the exact and approximate regulator problems are considered in Sect. 4, along with their behaviour relative to the state constraint of interest. Exact and approximate two-player games are formulated in Sect. 5, and their respective equivalences with the exact and approximate regulator problems are demonstrated. A further equivalence of the upper and lower values is demonstrated in each case. This in turn motivates characterization of the optimal policies involved via solution of a two-point boundary value problem defined in terms of a DRE. This characterization is illustrated by example in Sect. 6. The paper concludes with some minor summarizing remarks in Sect. 7. Some additional technicalities and proofs appear in Appendices A and B, while a summary of significant notations is included in Appendix C.
Throughout, \({\mathbb {R}}\), \({\mathbb {N}}\), \({\mathbb {Z}}\) denote the reals, natural numbers, and integers, while \({\mathbb {R}}_{\ge 0}\), \({\mathbb {R}}_{>0}\), and \({\overline{{\mathbb {R}}}}\) denote the non-negative, positive, and extended reals respectively, with the latter defined by \({\overline{{\mathbb {R}}}}\doteq {\overline{{\mathbb {R}}}}^-\cup {\overline{{\mathbb {R}}}}^+\), \({\overline{{\mathbb {R}}}}^\pm \doteq {\mathbb {R}}\cup \{\pm \infty \}\). For convenience, \({\mathbb {R}}_{\ge a} \doteq [a,\infty )\) and \({\mathbb {R}}_{>a}\doteq (a,\infty )\) for any \(a\in {\mathbb {R}}\). An n-dimensional Euclidean space is denoted by \({\mathbb {R}}^n\). The space of \(n\times m\) matrices on \({\mathbb {R}}\) is denoted by \({\mathbb {R}}^{n\times m}\). The set of positive semidefinite symmetric matrices in \({\mathbb {R}}^{n\times n}\) is denoted by \(\varSigma ^n\). The Euclidean and induced matrix norms are denoted by \(|\cdot |\) and \(\Vert \cdot \Vert \) respectively. Otherwise, the norm on a Banach space \({{\mathscr {U}}}\) is denoted by \(\Vert \cdot \Vert _{{{\mathscr {U}}}}\), or simply \(\Vert \cdot \Vert \) if the space is contextually clear. Open and closed balls of radius \(r\in {\mathbb {R}}_{\ge 0}\) in \({{\mathscr {U}}}\) are denoted respectively by \({{\mathscr {B}}}_{{{\mathscr {U}}}}(0;r)\) and \({{\mathscr {B}}}_{{{\mathscr {U}}}}[0;r]\). Weak convergence of a sequence \(\{u_k\}_{k\in {\mathbb {N}}}\subset {{\mathscr {U}}}\) to some \({\bar{u}}\in {{\mathscr {U}}}\) is denoted by \(u_k \rightharpoonup {\bar{u}}\) (as \(k\rightarrow \infty \)). The product space \({{\mathscr {U}}}\times \cdots \times {{\mathscr {U}}}\) of \(k\in {\mathbb {N}}\) instances of \({{\mathscr {U}}}\) is denoted by \({{\mathscr {U}}}^k\). The space of bounded linear operators between Banach spaces \({{\mathscr {U}}}\) and \({{\mathscr {V}}}\) is denoted by \({\mathcal {L}}({{\mathscr {U}}};{{\mathscr {V}}})\). 
The spaces of continuous and k-times continuously differentiable functions mapping \({{\mathscr {U}}}\) to \({{\mathscr {V}}}\) are denoted by \(C({{\mathscr {U}}};{{\mathscr {V}}})\) and \(C^{k}({{\mathscr {U}}};{{\mathscr {V}}})\) for \(k\in {\mathbb {N}}\cup \{\infty \}\). Differentiability at a closed left or right end-point of an interval is interpreted throughout to mean right- or left-differentiability respectively. The space of (Lebesgue) square integrable mappings from \([0,t]\subset {\mathbb {R}}_{\ge 0}\) to \({{\mathscr {U}}}\) is denoted by \({{{{\mathscr {L}}}}_2}([0,t];{{\mathscr {U}}})\). Unless otherwise specified, \(C([0,t];{{\mathscr {U}}})\) is equipped with the sup norm. A function \(f:{{\mathscr {U}}}\rightarrow {\overline{{\mathbb {R}}}}\) has (possibly empty) domain \({{\,\textrm{dom}\,}}f\doteq \{ u\in {{\mathscr {U}}}\, | \, f(u)< \infty \}\), and is proper if \({{\,\textrm{dom}\,}}f \ne \emptyset \) and \(f\) is finite on \({{\,\textrm{dom}\,}}f\). \(f:{{\mathscr {U}}}\rightarrow {\overline{{\mathbb {R}}}}\) is lower semicontinuous if \({{\,\textrm{lsc}\,}}f(u) \doteq \liminf _{{\tilde{u}}\rightarrow u} f({\tilde{u}}) \ge f(u)\) for all \(u\in {{\mathscr {U}}}\), and (lower) closed if its epigraph is closed. \(f:{{\mathscr {U}}}\rightarrow {\overline{{\mathbb {R}}}}\) is (strictly) convex if \(f:{{\,\textrm{dom}\,}}f\rightarrow {\overline{{\mathbb {R}}}}^-\) is (strictly) convex, and coercive if \(\lim _{\Vert u\Vert \rightarrow \infty } f(u) / \Vert u\Vert = \infty \).
2 State Constrained Linear Regulator Problem
Interest is restricted to optimal control problems defined on a finite time horizon \(t\in {\mathbb {R}}_{\ge 0}\), with respect to linear dynamics and a convex barrier state constraint. The value function \({\overline{W}}_t:{\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}^+\) involved is defined by
for all \(x\in {\mathbb {R}}^n\), in which \({{\mathscr {U}}}[0,t] \doteq {{{{\mathscr {L}}}}_2}([0,t];{\mathbb {R}}^m)\) is the space of open loop controls, and \({\bar{J}}_t\) is a cost function defined with respect to the integrated running costs \({\bar{I}}_t\) and \(I_t^\kappa \), \(\kappa \in {\mathbb {R}}_{>0}\), and a terminal cost \(\varPsi \). Specifically, \({\bar{J}}_t, {\bar{I}}_t:{\mathbb {R}}^n\times {{\mathscr {U}}}[0,t]\rightarrow {\overline{{\mathbb {R}}}}^+\), \(I_t^\kappa :{{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}_{\ge 0}\), and \(\varPsi :{\mathbb {R}}^n\rightarrow {\mathbb {R}}_{\ge 0}\), are defined by
for all \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\), in which \(K\in {\mathbb {R}}\) and \(P_t\in \varSigma ^n\) are a priori fixed, and \(\varPhi \) is an extended real valued barrier function to be specified below. The map \(s\mapsto \xi _s\in {\mathbb {R}}^n\), \(s\in [0,t]\), describes the unique trajectory of a linear dynamical system corresponding to an initial state \(x\in {\mathbb {R}}^n\) and input \(u\in {{\mathscr {U}}}[0,t]\), given explicitly via a map \(\chi :{\mathbb {R}}^n\times {{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}^n\), where
for all \(s\in [0,t]\), given \(A\in {\mathbb {R}}^{n\times n}\), \(B\in {\mathbb {R}}^{n\times m}\), \(B\ne 0\). The barrier function \(\varPhi :{\mathbb {R}}\rightarrow {\overline{{\mathbb {R}}}}^+\) is defined by
for fixed \(b\in {\mathbb {R}}_{>0}\), in which \(\phi :[0,b^2)\rightarrow {\mathbb {R}}\) satisfies the following properties:
Note in particular that (iii)–(v) follow as a consequence of (i)–(ii), see for example [25, Theorem 2.13, p. 46]. As a consequence, \(\phi \) has a well-defined convex dual \(a:{\mathbb {R}}_{\ge \phi '(0)}\rightarrow {\mathbb {R}}_{\ge -\phi (0)}\) given by
for all \(\beta \in {\mathbb {R}}_{\ge \phi '(0)}\), that satisfies a variety of properties, including invertibility; see Lemma 13 in Appendix A. It defines a useful change of coordinates in the sup-of-quadratics representation that is developed for barrier \(\varPhi \) in Sect. 3. Two preliminary lemmas concerning (1), (5) are included prior to commencing this development. Their proofs are standard and are omitted for brevity.
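As a concrete instance of (6)–(8) (the particular choice of \(\phi \) below is ours, purely for illustration, and is not prescribed by the development), a log-type barrier satisfies the stated properties, and its convex dual can be checked against a direct numerical evaluation of the supremum defining it:

```python
import numpy as np

# Hypothetical barrier satisfying the standing assumptions on phi:
#   phi(rho) = -ln(1 - rho/b^2) on [0, b^2), so phi(0) = 0, phi'(0) = 1/b^2,
# with phi strictly convex and phi(rho) -> infinity as rho -> b^2.
b = 2.0
phi = lambda rho: -np.log(1.0 - rho / b**2)

def a_grid(beta, n=400_000):
    """Dual a(beta) = sup_{rho in [0,b^2)} { beta*rho - phi(rho) }, via a fine grid."""
    rho = np.linspace(0.0, b**2 * (1.0 - 1e-9), n)
    return float(np.max(beta * rho - phi(rho)))

def a_closed(beta):
    """For this phi, stationarity beta = phi'(rho) gives rho* = b^2 - 1/beta,
    whence a(beta) = beta*b^2 - 1 - ln(beta*b^2) on [phi'(0), infinity)."""
    return beta * b**2 - 1.0 - np.log(beta * b**2)

for beta in (1.0 / b**2, 0.5, 1.0, 3.0):        # beta >= phi'(0) = 1/b^2
    assert abs(a_grid(beta) - a_closed(beta)) < 1e-4
assert a_closed(1.0 / b**2) == 0.0              # a(phi'(0)) = -phi(0) = 0
```

In particular, the dual here is strictly increasing on its domain, so its inverse \(a^{-1}\), used throughout the sequel, is available by monotone root finding.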
Lemma 1
Given any \(t\in {\mathbb {R}}_{\ge 0}\), \(x\in {\mathbb {R}}^n\), \({\overline{U}}\in {\mathbb {R}}_{\ge 0}\), \(\{u_k\}_{k\in {\mathbb {N}}}\subset {{\mathscr {B}}}_{{{\mathscr {U}}}[0,t]}[0;{\overline{U}}]\), with \(\xi _k\doteq \chi (x,u_k)\) defined via (5) for all \(k\in {\mathbb {N}}\), the following properties hold:
- (i):
-
\(\xi \doteq \chi (x,u):[0,t]\rightarrow {\mathbb {R}}^n\) is uniformly continuous, given \(u\in {{\mathscr {U}}}[0,t]\);
- (ii):
-
\(\chi (x,\cdot )\in C^1({{\mathscr {U}}}[0,t];C([0,t];{\mathbb {R}}^n))\), with Fréchet derivative
$$\begin{aligned} \begin{aligned}&{[}D_u\, \chi (x,u)\, h]_s = [{\mathcal {A}}\, h]_s \doteq \int _0^s e^{A\, (s-\sigma )}\, B\, h_\sigma \, d\sigma , \\&\Vert {\mathcal {A}}\, h\Vert _{C([0,t];{\mathbb {R}}^n)} \le \sup _{s\in [0,t]} \Vert e^{A\, s}\, B \Vert \, \sqrt{t}\, \Vert h\Vert _{{{\mathscr {U}}}[0,t]}, \end{aligned} \end{aligned}$$(9) for all \(u,h\in {{\mathscr {U}}}[0,t]\), \(s\in [0,t]\); and
- (iii):
-
\(\{ \xi _k \}_{k\in {\mathbb {N}}}\subset C([0,t];{\mathbb {R}}^n)\) is uniformly equicontinuous and bounded. Furthermore, there exists a \({\bar{u}}\in {{\mathscr {U}}}[0,t]\) and subsequences \(\{v_k\}_{k\in {\mathbb {N}}}\subset \{ u_k \}_{k\in {\mathbb {N}}}\) and \(\{y_k\}_{k\in {\mathbb {N}}} \subset \{ \xi _k \}_{k\in {\mathbb {N}}}\) such that \(v_k\rightharpoonup {\bar{u}}\) weakly and \(y_k\rightarrow {\bar{\xi }}\doteq \chi (x,{\bar{u}})\) uniformly, in which \(y_k = \chi (x,v_k)\) for all \(k\in {\mathbb {N}}\).
Lemma 2
\(0\in {{\,\textrm{dom}\,}}{\overline{W}}_t\) for all \(t\in {\mathbb {R}}_{\ge 0}\).
In view of (3), (5), (6), (7), it is emphasized that attention is restricted to the very simple case of a single state constraint of bounded norm, i.e. \(|\xi _s|\le b\) for all \(s\in [0,t]\). While this case is seemingly trivial, it is sufficient to demonstrate the details of how the barrier function implementing this constraint can be relaxed in the optimal control problem to yield an unconstrained game. The treatment of more general convex constraints in [11], defined with respect to the intersection of a family of ellipses, is founded on this development.
3 Barrier Representations and an Approximate Regulator Problem
Exact and approximate sup-of-quadratics representations for closed convex barrier functions of the form of \(\varPhi \) of (6) can be established via convex duality [23, 25]. These representations are fundamental to the development of a convergent approximation for the state constrained regulator problem (1), and its representation via unconstrained linear quadratic games. The development of these representations and the approximate regulator problem follow below.
3.1 Exact Sup-of-Quadratics Representation for Convex Barriers
Convex duality and the asserted properties (7) of barrier (6) yield the following, via some rudimentary calculations.
Lemma 3
The barrier function \(\varPhi :{\mathbb {R}}\rightarrow {\overline{{\mathbb {R}}}}^+\) of (6) is closed and convex, and there exists a closed and convex function \({\varTheta }:{\mathbb {R}}\rightarrow {\mathbb {R}}\) such that
for all \(\rho ,\beta \in {\mathbb {R}}\), with a as per (8). Furthermore, the optimizers \({\hat{\beta }}^*:{\mathbb {R}}\rightarrow {\overline{{\mathbb {R}}}}\) and \({\hat{\rho }}^*:{\mathbb {R}}\rightarrow {\mathbb {R}}\) in (10), defined by \({\hat{\beta }}^*(\rho ) \doteq \mathop {\textrm{arg}\,\textrm{max}}\nolimits _{\beta \in {\mathbb {R}}} \{ \beta \, \rho - {\varTheta }(\beta ) \}\) and \({\hat{\rho }}^*(\beta ) \doteq \mathop {\textrm{arg}\,\textrm{max}}\nolimits _{\rho \in {\mathbb {R}}} \{ \beta \, \rho - \varPhi (\rho ) \}\), are given by
for all \(\beta ,\rho \in {\mathbb {R}}\).
A change of coordinates via (8) yields the sup-of-quadratics representation, the details of which are analogous to the later proof of Proposition 2.
Proposition 1
The barrier function \(\varPhi (|\cdot |^2):{\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}^+\) appearing in (1) via (2), (3), and defined by (6), has the exact sup-of-quadratics representation
for all \(x\in {\mathbb {R}}^n\), in which \(a^{-1}\) is defined via (8). Furthermore, the optimizer \({\hat{\alpha }}^*(|\cdot |^2):{\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}_{\ge -\phi (0)}^+\) in (12) is defined via \(\phi '\), \(a\) of (7), (8) by
for all \(x\in {\mathbb {R}}^n\).
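The representation (12) can be probed numerically. The sketch below uses a hypothetical log-type barrier \(\phi (\rho ) = -\ln (1-\rho /b^2)\) purely as an illustration: for \(|x| < b\) the (truncated) supremum recovers \(\phi (|x|^2)\), while for \(|x| > b\) it grows without bound as the \(\alpha \)-range is enlarged, consistent with \(\varPhi (|x|^2) = \infty \) there.

```python
import numpy as np

# Numerical probe of (12) for the hypothetical barrier phi(rho) = -ln(1 - rho/b^2),
# whose dual is a(beta) = beta*b^2 - 1 - ln(beta*b^2) on [1/b^2, infinity).
b = 2.0
phi = lambda rho: -np.log(1.0 - rho / b**2)
a   = lambda beta: beta * b**2 - 1.0 - np.log(beta * b**2)

def a_inv(alpha):
    """Invert the strictly increasing dual a on [phi'(0), inf) by bisection."""
    lo, hi = 1.0 / b**2, 1e8
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if a(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)

def Phi_sup(x_sq, alpha_max, n=4000):
    """Truncation of sup_{alpha >= -phi(0)} { a^{-1}(alpha)*|x|^2 - alpha }."""
    alphas = np.linspace(0.0, alpha_max, n)     # -phi(0) = 0 for this phi
    return max(a_inv(al) * x_sq - al for al in alphas)

for x_sq in (0.0, 1.0, 3.0, 3.9):               # |x| < b: the sup recovers phi(|x|^2)
    assert abs(Phi_sup(x_sq, 200.0) - phi(x_sq)) < 1e-2
# |x| > b: the sup grows without bound as the alpha-range is enlarged
assert Phi_sup(4.5, 200.0) > Phi_sup(4.5, 50.0) + 1.0
```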
Remark 1
While the barrier map \(\rho \mapsto \varPhi (\rho ):{\mathbb {R}}\rightarrow {\overline{{\mathbb {R}}}}^+\) of (6) is guaranteed to be convex by Lemma 3, the corresponding map \(x\mapsto \varPhi (|x|^2):{\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}^+\) need not be convex. However, Proposition 1 implies that this map is uniformly semiconvex [15, 21]. Choosing any \(\eta \ge -2\, a^{-1}(-\phi (0))\), (12) yields
for all \(x\in {\mathbb {R}}^n\), in which \(a^{-1}(\alpha ) + {\textstyle {\frac{\eta }{2}}} \ge a^{-1}(\alpha ) - a^{-1}(-\phi (0)) \ge 0\) for all \(\alpha \ge -\phi (0)\), as \(a^{-1}\) is strictly increasing by Lemma 13. The right-hand side of (14) is thus a supremum of convex functions, which is thus also convex, see [23, p. 7]. That is, (14) implies that there exists an \(\eta \in {\mathbb {R}}\) such that \(\varPhi (|\cdot |^2) + {\textstyle {\frac{\eta }{2}}}\, |\cdot |^2\) is convex, so that \(\varPhi (|\cdot |^2)\) is uniformly semiconvex by definition, see [15, 21].
3.2 Approximate Sup-of-Quadratics Representation for Convex Barriers
An approximate sup-of-quadratics representation can be obtained by restricting the interval over which the supremum is evaluated in the first equation in (10). To this end, define \(\varPhi ^M:{\mathbb {R}}\rightarrow {\overline{{\mathbb {R}}}}^+\) and \({\hat{\rho }}:[-\phi (0),\infty )\rightarrow [0,b^2)\) by
for all \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(\rho \in {\mathbb {R}}\), with \(\phi '\), \(a\), \({\varTheta }\) as per (7), (8), (10), and with the range of \(\varPhi ^M\) to be verified. The proof of the following appears in Appendix B.
Lemma 4
With \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(\varPhi ^M:{\mathbb {R}}\rightarrow {\overline{{\mathbb {R}}}}^+\) of (15) satisfies the following:
- (i):
-
\(\varPhi ^M\) has the explicit representation
$$\begin{aligned}&\varPhi ^M(\rho ) = \left\{ \begin{array}{rl} \infty , &{} \rho \in {\mathbb {R}}_{<0}, \\ \phi (\rho ), &{} \rho \in [0,{\hat{\rho }}(M)], \\ a^{-1}(M)\, \rho - M, &{} \rho \in {\mathbb {R}}_{>{\hat{\rho }}(M)}, \end{array} \right. \end{aligned}$$(16) for all \(\rho \in {\mathbb {R}}\), with the maximizer \({\hat{\beta }}^{M*}:{\mathbb {R}}\rightarrow {\overline{{\mathbb {R}}}}^-\) given by
$$\begin{aligned} {\hat{\beta }}^{M*}(\rho )&\doteq \left\{ \begin{array}{rl} -\infty , &{} \rho \in {\mathbb {R}}_{<0}, \\ \phi '(\rho ), &{} \rho \in [0,{\hat{\rho }}(M)], \\ a^{-1}(M), &{} \rho \in {\mathbb {R}}_{>{\hat{\rho }}(M)}. \end{array} \right. \end{aligned}$$(17)
- (ii):
-
\(\varPhi ^M\in C({\mathbb {R}}_{\ge 0};{\mathbb {R}})\cap C^1({\mathbb {R}}_{>0};{\mathbb {R}})\), and it is closed and strictly convex;
- (iii):
-
\(\varPhi ^M\) is pointwise non-decreasing in \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), with
$$\begin{aligned} \varPhi (\rho ) = \sup _{M \ge -\phi (0)} \varPhi ^M(\rho ) = \lim _{M\rightarrow \infty } \varPhi ^M(\rho ), \end{aligned}$$for all \(\rho \in {\mathbb {R}}\), with \(\varPhi \) as per (6);
- (iv):
-
There exist \(M_1\in {\mathbb {R}}_{\ge -\phi (0)}\), \(c\in {\mathbb {R}}\) such that \(\inf _{M\ge M_1} \inf _{\rho \in {\mathbb {R}}} \varPhi ^M(\rho )> c\).
- (v):
-
\(\hat{\rho }\) of (15) is strictly increasing, and satisfies \(\lim _{M\rightarrow \infty } {\hat{\rho }}(M) = b^2\).
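The explicit representation (16) and the monotone convergence of Lemma 4(iii) can be illustrated numerically; the specific barrier below is a hypothetical log-type choice, not fixed by the development.

```python
import numpy as np

# Piecewise representation (16) of Phi^M and the monotonicity of Lemma 4(iii),
# for the hypothetical barrier phi(rho) = -ln(1 - rho/b^2).
b = 2.0
phi = lambda rho: -np.log(1.0 - rho / b**2)
a   = lambda beta: beta * b**2 - 1.0 - np.log(beta * b**2)

def a_inv(alpha):
    lo, hi = 1.0 / b**2, 1e8        # bisection on the strictly increasing dual a
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if a(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)

def Phi_M(rho, M):
    """Explicit form (16); for this phi, rho_hat(M) = (phi')^{-1}(a^{-1}(M))."""
    if rho < 0.0:
        return np.inf
    rho_hat = b**2 - 1.0 / a_inv(M)
    return phi(rho) if rho <= rho_hat else a_inv(M) * rho - M

# (iii): Phi^M is pointwise non-decreasing in M, saturating at Phi
for rho in (0.5, 2.0, 3.5, 5.0):    # rho = 5.0 lies beyond b^2 = 4, where Phi = inf
    vals = [Phi_M(rho, M) for M in (1.0, 5.0, 25.0, 125.0)]
    assert all(v1 <= v2 + 1e-12 for v1, v2 in zip(vals, vals[1:]))
assert Phi_M(2.0, 125.0) == phi(2.0)                # inside: Phi^M already equals phi
assert Phi_M(5.0, 125.0) > Phi_M(5.0, 1.0) + 10.0   # outside: grows without bound
```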
As per the exact case of Proposition 1, application of this lemma along with a change of coordinates defined by (8) admits an approximate sup-of-quadratics representation.
Proposition 2
Given \(b\in {\mathbb {R}}_{>0}\), the following holds:
- (i):
-
Given \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), the approximation \(\varPhi ^M\) of the barrier function \(\varPhi \) of (6), represented in (15), (16), has the sup-of-quadratics representation
$$\begin{aligned} \varPhi ^M(|x|^2)&= \sup _{\alpha \in [-\phi (0),M]} \{ a^{-1}(\alpha )\, |x|^2 - \alpha \} \end{aligned}$$(18) for all \(x\in {\mathbb {R}}^n\), in which \(a^{-1}\) is as per (8), with the maximizer given by
$$\begin{aligned}&{\hat{\alpha }}^{M*}(|x|^2) \doteq \mathop {\textrm{arg}\,\textrm{max}}\limits _{\alpha \in [-\phi (0),M]} \{ a^{-1}(\alpha )\, |x|^2 - \alpha \} = \left\{ \begin{array}{rl} a\circ \phi '(|x|^2)\,, &{} |x|^2\le {\hat{\rho }}(M)\,, \\ M\,, &{} |x|^2>{\hat{\rho }}(M)\,, \end{array} \right. \end{aligned}$$(19) where \(\phi '\), \(a\), \({\hat{\rho }}\) are as per (7), (8), (15); and
- (ii):
-
\(\varPhi ^M(|\cdot |^2):{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) defined by (18) is pointwise non-decreasing in \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), and converges pointwise to \(\varPhi (|\cdot |^2):{\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}^+\) of (12) in the limit as \(M\rightarrow \infty \).
Proof
(i) Fix \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(x\in {\mathbb {R}}^n\). Applying Lemma 4(i), note that the optimizer (17) in (15) satisfies \({\hat{\beta }}^{M*}(|x|^2) \in [\phi '(0), a^{-1}(M)]\), as \(|x|^2\in {\mathbb {R}}_{\ge 0}\). Meanwhile, a of (8) defines a change of variable \(\alpha = a(\beta )\) for all \(\beta \in [\phi '(0),\infty )\), via Lemma 13. Hence, \(\varPhi ^M(|x|^2)\) transforms from (15) to \( \varPhi ^M(|x|^2) = \sup _{\beta \in [\phi '(0), a^{-1}(M)]} \{ \beta \, |x|^2 - {\varTheta }(\beta ) \} = \sup _{\beta \in [a^{-1}(-\phi (0)), a^{-1}(M)]} \{ \beta \, |x|^2 - a(\beta ) \} \) via (10), which yields (18). The same change of variable applied to \({\hat{\beta }}^{M*}(|x|^2)\) of (17) yields (19). (ii) is immediate by Lemma 4(iii).\(\square \)
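The change of variable used in the proof also suggests a direct numerical check of the maximizer formula (19): evaluate the supremum (18) in \(\beta \)-coordinates and map the argmax back through \(a\). The barrier below is again the hypothetical log-type example.

```python
import numpy as np

# Check of the maximizer formula (19) against a brute-force argmax, using the
# hypothetical barrier phi(rho) = -ln(1 - rho/b^2). Following the proof, the
# supremum (18) is evaluated in beta-coordinates over
#   beta in [a^{-1}(-phi(0)), a^{-1}(M)] = [1/b^2, a^{-1}(M)].
b, M = 2.0, 3.0
phi_p = lambda rho: 1.0 / (b**2 - rho)                  # phi' for this phi
a     = lambda beta: beta * b**2 - 1.0 - np.log(beta * b**2)

def a_inv(alpha):
    lo, hi = 1.0 / b**2, 1e8
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if a(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)

beta_hi = a_inv(M)
rho_hat = b**2 - 1.0 / beta_hi                          # switching point in (19)
betas = np.linspace(1.0 / b**2, beta_hi, 200_001)

for x_sq in (0.5, 2.0, 3.8):
    g = betas * x_sq - a(betas)                         # beta-form of (18)
    alpha_num = a(betas[np.argmax(g)])
    alpha_thy = a(phi_p(x_sq)) if x_sq <= rho_hat else M
    assert abs(alpha_num - alpha_thy) < 1e-3
```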
Some useful bounds follow by Proposition 2 and (7), (8).
Corollary 1
Given \(K\in {\mathbb {R}}_{\ge -\phi '(0)}\) as per (3), (7),
for all \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(\rho \in {\mathbb {R}}_{\ge 0}\).
3.3 Approximate Regulator Problem and Its Convergence to the Exact Problem
The sup-of-quadratics representation (12) for the convex barrier function \(\varPhi \) of (6), and its convergent approximation (18), can be used to formulate an approximate regulator problem for (1). Given \(t\in {\mathbb {R}}_{\ge 0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), the value function \({\overline{W}}_t^M:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) for this approximate problem is defined by
for all \(x\in {\mathbb {R}}^n\), with \({\bar{J}}_t^M:{\mathbb {R}}^n\times {{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}\) defined with respect to \(I_t^\kappa \) and \(\varPsi \) of (3), (4) and \({\bar{I}}_t^M:{\mathbb {R}}^n\times {{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}\) by
for all \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\), in which \(\xi \doteq \chi (x,u)\) and \(\varPhi ^M\) are as per (5) and (15), (16), (18) respectively, and \(K\in {\mathbb {R}}_{\ge -\phi '(0)}\), \(\kappa \in {\mathbb {R}}_{>0}\) are fixed. This approximate problem recovers the original problem of (1) in the limit as \(M\rightarrow \infty \), as formalized by the theorem below. For convenience, \({\bar{J}}_t^\infty :{\mathbb {R}}^n\times {{\mathscr {U}}}[0,t]\rightarrow {\overline{{\mathbb {R}}}}^+\) and \({\overline{W}}_t^\infty :{\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}^+\) are defined by
for all \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\).
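The monotonicity in \(M\) at the level of the cost (21) can be sketched numerically before it is formalized. All scalar data below (dynamics, barrier, weights) are hypothetical, with \(u = 0\) so that the \(\kappa \)-term vanishes.

```python
import numpy as np

# Sketch of the monotonicity of J_t^M in M for hypothetical scalar data:
# dynamics A = -0.5 with u = 0 (so xi_s = e^{A s} x0 and the kappa-term drops),
# barrier phi(rho) = -ln(1 - rho/b^2), K = 1 >= -phi'(0), Psi(x) = (P_t/2) x^2.
b, K, P_t, t, x0, A = 2.0, 1.0, 1.0, 2.0, 1.9, -0.5
phi = lambda rho: -np.log(1.0 - rho / b**2)
a   = lambda beta: beta * b**2 - 1.0 - np.log(beta * b**2)

def a_inv(alpha):
    lo, hi = 1.0 / b**2, 1e8
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if a(mid) < alpha else (lo, mid)
    return 0.5 * (lo + hi)

def Phi_M(rho, M):                          # piecewise form (16)
    rho_hat = b**2 - 1.0 / a_inv(M)
    return phi(rho) if rho <= rho_hat else a_inv(M) * rho - M

def J_M(M, n=2001):
    """Trapezoidal approximation of J_t^M(x0, 0) assembled from (21), (22)."""
    s = np.linspace(0.0, t, n)
    xi = np.exp(A * s) * x0
    run = 0.5 * K * xi**2 + 0.5 * np.array([Phi_M(r, M) for r in xi**2])
    integral = float(np.sum(0.5 * (run[:-1] + run[1:]) * (s[1] - s[0])))
    return integral + 0.5 * P_t * xi[-1]**2

costs = [J_M(M) for M in (1.0, 10.0, 100.0)]
assert costs[0] <= costs[1] <= costs[2] + 1e-12     # non-decreasing in M
```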
Theorem 1
Given \(t\in {\mathbb {R}}_{\ge 0}\), the cost and value functions \({\bar{J}}_t^M\), \({\overline{W}}_t^M\) of (21), (20) are pointwise non-decreasing in \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), and satisfy
for all \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\), where \({\bar{J}}_t,\, {\bar{J}}_t^\infty :{\mathbb {R}}^n\times {{\mathscr {U}}}[0,t]\rightarrow {\overline{{\mathbb {R}}}}^+\) and \({\overline{W}}_t,\, {\overline{W}}_t^\infty :{\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}^+\) are defined by (1), (2), and (23).
Proof
Fix \(t\in {\mathbb {R}}_{\ge 0}\), \(x\in {\mathbb {R}}^n\). [Non-decreasing property] This is immediate by inspection of (20), (21), (22), and the non-decreasing property of \(\varPhi ^M(|\cdot |^2)\) provided by Proposition 2(ii).
[Left-hand inequalities in (24), (25)] Immediate by the definition of \({\bar{J}}_t^\infty \), \({\overline{W}}_t^\infty \) in (23). Also, Corollary 1 implies that \(-\infty < {\textstyle {\frac{\phi (0)}{2}}} \, t \le {\bar{J}}_t^M(x,u)\). Moreover, as \(u\) is arbitrary here, \(-\infty < {\textstyle {\frac{\phi (0)}{2}}} \, t \le {\overline{W}}_t^M(x) = \inf _{u\in {{\mathscr {U}}}[0,t]} {\bar{J}}_t^M(x,u)\).
[Domain properties in (25)] Fix \(M\in {\mathbb {R}}_{\ge -\phi (0)}\). It is immediate by the left-hand inequality in (25) and Lemma 2 that \({{\,\textrm{dom}\,}}{\overline{W}}_t^M\supset {{\,\textrm{dom}\,}}{\overline{W}}_t\ne \emptyset \) holds. For the remaining assertion, fix \(u\in {{\mathscr {U}}}[0,t]\), and recall that \(\chi (x,u)\in C([0,t];{\mathbb {R}}^n)\) via (5) and Lemma 1. Applying Lemma 4(ii), \(\varPhi ^M(|\chi (x,u)|^2)\in C([0,t];{\mathbb {R}})\), so that \({\bar{J}}_t^M(x,u) < \infty \) by inspection of (21). Hence, as \(u\) is arbitrary, \({\overline{W}}_t^M(x) = \inf _{u\in {{\mathscr {U}}}[0,t]} {\bar{J}}_t^M(x,u) <\infty \), and as \(x\in {\mathbb {R}}^n\) is arbitrary, \({{\,\textrm{dom}\,}}{\overline{W}}_t^M = {\mathbb {R}}^n\).
[Limits in (24), (25) ] Immediate from the non-decreasing property above.
[Right-hand equality in (24)] Fix \(u\in {{\mathscr {U}}}[0,t]\). In view of Corollary 1 and (22), it follows that \( {\bar{I}}_t^{M}(x,u) = \int _0^t \nu _s^M\, ds + {\textstyle {\frac{\phi (0)}{2}}}\, t \), where \(s\mapsto \nu _s^M \doteq {\textstyle {\frac{K}{2}}} \, |\xi _s|^2 + {{\textstyle {\frac{1}{2}}}}\, \varPhi ^M(|\xi _s|^2) - {\textstyle {\frac{\phi (0)}{2}}}\) is nonnegative by Corollary 1, non-decreasing in \(M\in {\mathbb {R}}_{\ge -\phi (0)}\) by Proposition 2(ii), and continuous (and hence measurable). Applying the monotone convergence theorem,
in which the final equality follows by Lemma 4(iii). Hence, recalling (21), (23),
in which it is noted that \(x\in {\mathbb {R}}^n\) and \(u\in {{\mathscr {U}}}[0,t]\) are arbitrary.
[Right-hand equality in (25)] Applying (23), (24),
It remains to demonstrate the opposite inequality. To this end, fix an arbitrary \({\epsilon }\in {\mathbb {R}}_{>0}\), and any non-decreasing sequence \(\{M_k\}_{k\in {\mathbb {N}}}\subset {\mathbb {R}}_{\ge -\phi (0)}\) such that \(\lim _{k\rightarrow \infty } M_k = \infty \). Define a sequence \(\{u_k^{\epsilon }\}_{k\in {\mathbb {N}}}\subset {{\mathscr {U}}}[0,t]\) by
and note by definition (20) of \({\overline{W}}_t^{M_k}(x)\) that this is always possible. Suppose that \(\{u_k^{\epsilon }\}_{k\in {\mathbb {N}}}\) is unbounded. Applying Corollary 1 in the definition (22) of \({\bar{I}}_t^{M_k}(x,\cdot )\), note that \({\bar{I}}_t^{M_k}(x,u) \ge {\textstyle {\frac{\phi (0)}{2}}} \, t\) for all \(u\in {{\mathscr {U}}}[0,t]\). Combining this with (21), (27) yields \( {\overline{W}}_t^\infty (x) = \lim _{k\rightarrow \infty } {\overline{W}}_t^{M_k}(x) \ge \lim _{k\rightarrow \infty } {\bar{J}}_t^{M_k}(x,u_k^{\epsilon }) - {\epsilon }\ge {\textstyle {\frac{\phi (0)}{2}}} \, t - {\epsilon }+ {\textstyle {\frac{\kappa }{2}}}\, \lim _{k\rightarrow \infty } \Vert u_k^{\epsilon }\Vert _{{{\mathscr {U}}}[0,t]}^{2} = \infty \), which yields \({\overline{W}}_t^\infty (x) \ge {\overline{W}}_t(x)\), as required to complete the proof in the unbounded case.
Alternatively, suppose that \(\{u_k^{\epsilon }\}_{k\in {\mathbb {N}}}\) is bounded, i.e. there exists \({\overline{U}}\in {\mathbb {R}}_{\ge 0}\) such that \(\{u_k^{\epsilon }\}_{k\in {\mathbb {N}}}\subset {{\mathscr {B}}}_{{{\mathscr {U}}}[0,t]}[0;{\overline{U}}]\). Lemma 1(iii) implies that there exists a \(\bar{u}^{\epsilon }\in {{\mathscr {U}}}[0,t]\) and a subsequence \(\{{\tilde{u}}_k^{\epsilon }\}_{k\in {\mathbb {N}}} \subset \{u_k^{\epsilon }\}_{k\in {\mathbb {N}}}\) such that \({\tilde{u}}_k^{\epsilon }\rightharpoonup \bar{u}^{\epsilon }\) and \({\tilde{\xi }}_k^{\epsilon }\rightarrow {\bar{\xi }}^{\epsilon }\) uniformly as \(k\rightarrow \infty \), where \({\tilde{\xi }}_k^{\epsilon }\doteq \chi (x,{\tilde{u}}_k^{\epsilon })\). In view of (3), (22), and Corollary 1, define a sequence \(\{{\tilde{\nu }}_k^{\epsilon }\}_{k\in {\mathbb {N}}}\) of maps from [0, t] to \({\mathbb {R}}_{\ge 0}\), and candidate limit \({\bar{\nu }}^{\epsilon }:[0,t]\rightarrow {\overline{{\mathbb {R}}}}_{\ge 0}^+\), by
for all \(s\in [0,t]\), \(k\in {\mathbb {N}}\). Fix any \(s\in [0,t]\), \(j\in {\mathbb {N}}\). Note that by monotonicity of \(\{\varPhi ^{M_k}\}_{k\in {\mathbb {N}}}\), see Lemma 4(iii) or Proposition 2(ii), \( \varPhi ^{M_k}(|[{\tilde{\xi }}_k^{\epsilon }]_s|^2) = [\varPhi ^{M_k}(|[{\tilde{\xi }}_k^{\epsilon }]_s|^2) - \varPhi ^{M_{j}}(|[{\tilde{\xi }}_k^{\epsilon }]_s|^2)] + \varPhi ^{M_{j}}(|[{\tilde{\xi }}_k^{\epsilon }]_s|^2) \ge \varPhi ^{M_{j}}(|[{\tilde{\xi }}_k^{\epsilon }]_s|^2) \) for all \(k\ge j\). Hence, as \(\varPhi ^{M_j}\) is continuous, \(\liminf _{k\rightarrow \infty } \varPhi ^{M_k}(|[{\tilde{\xi }}_k^{\epsilon }]_s|^2) \ge \varPhi ^{M_{j}}(|[\bar{\xi }^{\epsilon }]_s|^2)\), so that \(\liminf _{k\rightarrow \infty } \varPhi ^{M_k}(|[{\tilde{\xi }}_k^{\epsilon }]_s|^2) \ge \lim _{j\rightarrow \infty }\varPhi ^{M_{j}}(|[\bar{\xi }^{\epsilon }]_s|^2) = \varPhi (|[\bar{\xi }^{\epsilon }]_s|^2)\). As \(\lim _{k\rightarrow \infty } |[{\tilde{\xi }}_k^{\epsilon }]_s|^2= |[{\bar{\xi }}^{\epsilon }]_s|^2\), (28) subsequently yields that
By inspection, \({\bar{\nu }}_s^{\epsilon }= \infty \) implies that \(\lim _{k\rightarrow \infty } [{\tilde{\nu }}_k^{\epsilon }]_s = \infty = {\bar{\nu }}_s^{\epsilon }\).
Alternatively, suppose that \({\bar{\nu }}_s^{\epsilon }<\infty \). In view of (28), define \([{\hat{\nu }}_k^{\epsilon }]_s \doteq {\textstyle {\frac{K}{2}}}\, |[{\tilde{\xi }}_k^{\epsilon }]_s|^2 + {{\textstyle {\frac{1}{2}}}}\, \varPhi (|[{\tilde{\xi }}_k^{\epsilon }]_s|^2) - {\textstyle {\frac{\phi (0)}{2}}}\) for all \(k\in {\mathbb {N}}\). As \({\bar{\nu }}_s^{\epsilon }<\infty \), there exists an open interval containing \(|[\bar{\xi }^{\epsilon }]_s|^2\) on which \(\varPhi \) is continuous, and \(\lim _{k\rightarrow \infty } [{\tilde{\xi }}_k^{\epsilon }]_s = [{\bar{\xi }}^{\epsilon }]_s\), so that \([{\hat{\nu }}_k^{\epsilon }]_s<\infty \) for all \(k\in {\mathbb {N}}\) sufficiently large, and \(\lim _{k\rightarrow \infty } [{\hat{\nu }}_k^{\epsilon }]_s = {\bar{\nu }}_s^{\epsilon }\). Note further that \([{\tilde{\nu }}_k^{\epsilon }]_s \le [{\hat{\nu }}_k^{\epsilon }]_s\) for all \(k\in {\mathbb {N}}\), again by Lemma 4(iii). Hence,
Consequently, combining (29) and (30), and recalling the \({\bar{\nu }}_s^{\epsilon }= \infty \) case above, \(\lim _{k\rightarrow \infty } [{\tilde{\nu }}_k^{\epsilon }]_s = {\bar{\nu }}_s^{\epsilon }\) in both the \({\bar{\nu }}_s^{\epsilon }=\infty \) and \({\bar{\nu }}_s^{\epsilon }<\infty \) cases.
Next, recall by definition (28) and Corollary 1, that \(\{{\tilde{\nu }}_k^{\epsilon }\}_{k\in {\mathbb {N}}}\) defines a non-negative sequence of functions in \(C([0,t];{\mathbb {R}})\). Consequently, every element of this sequence is measurable and non-negative, so that Fatou’s lemma yields \( \int _0^t {\bar{\nu }}_s^{\epsilon }\, ds = \int _0^t \liminf _{k\rightarrow \infty } [{\tilde{\nu }}_k^{\epsilon }]_s \, ds \le \liminf _{k\rightarrow \infty } \int _0^t [{\tilde{\nu }}_k^{\epsilon }]_s\, ds \). Hence, recalling (28), the definitions of \({\tilde{\xi }}_k^{\epsilon }\), \({\bar{\xi }}^{\epsilon }\) prior, and (3), (22),
Meanwhile, by weak convergence of \({\tilde{u}}_k^{\epsilon }\) to \({\bar{u}}^{\epsilon }\), \(\Vert {\bar{u}}^{\epsilon }\Vert _{{{\mathscr {U}}}[0,t]} \le \liminf _{k\rightarrow \infty } \Vert {\tilde{u}}_k^{\epsilon }\Vert _{{{\mathscr {U}}}[0,t]}\), so that (3) implies
Moreover, continuity of \([\chi (x,\cdot )]_t\) by Lemma 1(ii), along with continuity of \(\varPsi _t\) of (4), imply that
Combining (31), (32), (33) via (2), (21) yields
Hence, applying (27) and (34) while recalling that \(\{{\tilde{u}}_k^{\epsilon }\}_{k\in {\mathbb {N}}} \subset \{ u_k^{\epsilon }\}_{k\in {\mathbb {N}}}\) is a subsequence of the near-optimal inputs involved, and noting that \({\bar{u}}^{\epsilon }\) is suboptimal in the definition (1) of \({\overline{W}}_t(x)\), yields
As \({\epsilon }\in {\mathbb {R}}_{>0}\) is arbitrary, it follows that \({\overline{W}}_t(x) \le {\overline{W}}^\infty (x)\). Recalling (26) and the fact that \(t\in {\mathbb {R}}_{\ge 0}\) and \(x\in {\mathbb {R}}^n\) are also arbitrary completes the proof of the equality in (25).\(\square \)
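The Fatou's lemma step used in the proof above (the integral of a pointwise liminf is dominated by the liminf of the integrals) can be checked numerically on a generic stand-in sequence, unrelated to the trajectories of the paper: an alternating pair of indicators on [0, 1] whose pointwise liminf vanishes almost everywhere while every integral equals 1/2.

```python
import numpy as np

# Fatou's lemma: int_0^1 liminf_k f_k <= liminf_k int_0^1 f_k.
# f_k alternates between the indicators of [0, 1/2] and [1/2, 1], so the
# pointwise liminf is 0 (except at s = 1/2) while each integral is 1/2.
s = np.linspace(0.0, 1.0, 10001)

def f(k):
    return np.where(s <= 0.5, 1.0, 0.0) if k % 2 == 0 else np.where(s >= 0.5, 1.0, 0.0)

tail = [f(k) for k in range(10, 30)]       # a tail of the sequence
liminf_pointwise = np.min(tail, axis=0)    # liminf = eventual min (period 2 here)
lhs = liminf_pointwise.mean()              # Riemann estimate of the left integral
rhs = min(fk.mean() for fk in tail)        # liminf of the integrals

print(lhs, rhs)  # lhs ~ 0, rhs ~ 0.5: the inequality is strict here
```

The strict gap shows why the argument only yields an inequality, which must then be combined with the suboptimality of \({\bar{u}}^{\epsilon }\) to close the proof.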
4 Optimal Trajectories and Constraint Satisfaction
Existence and uniqueness of the optimal trajectories in (1), (20) is demonstrated via analysis of the attendant cost functions (2), (22). In particular, these cost functions are shown to be proper, lower semicontinuous, strictly convex, and coercive. These properties are demonstrated to be sufficient for the required existence and uniqueness of the optimal controls involved, as summarised in Theorem 2. The relevant behaviour of the corresponding optimal trajectories, with respect to the state constraint, is concluded in Theorem 3.
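The role of these four properties can be sketched in a finite-dimensional analogue, with an illustrative stand-in cost f in place of the paper's functionals: properness and coercivity make the sub-level sets non-empty and bounded for levels above the infimum, and they nest and shrink onto the unique minimiser as the level decreases.

```python
import numpy as np

# Level sets Lambda_ell = {u : f(u) <= ell} of a proper, continuous,
# strictly convex, coercive stand-in f on R. As ell decreases towards
# ell_0 = inf f, the (discretised) level sets are nested and bounded,
# and their diameters shrink, isolating the unique minimiser.
f = lambda u: 0.5 * u**2 + np.exp(u)     # strictly convex and coercive

u = np.linspace(-10.0, 10.0, 200001)
fu = f(u)
ell0 = fu.min()                          # numerical stand-in for inf f

diams = []
for d in (1.0, 0.1, 0.01):
    level = u[fu <= ell0 + d]            # discretised Lambda_{ell0 + d}
    diams.append(level.max() - level.min())

print(diams)  # strictly decreasing diameters
```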
4.1 Existence and Uniqueness of the Optimal Controls
In order to demonstrate that the cost functions \({\bar{J}}_t(x,\cdot ), \, {\bar{J}}_t^M(x,\cdot ): {{\mathscr {U}}}[0,t]\rightarrow {\overline{{\mathbb {R}}}}^+\) of (2), (22) are proper, convex, and coercive for fixed \(t\in {\mathbb {R}}_{\ge 0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(x\in {\mathbb {R}}^n\), it is useful to consider the map \(\gamma _x^{s,\alpha }:{{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}\) defined for fixed \(x\in {\mathbb {R}}^n\), \(s\in [0,t]\), \(\alpha \in {\mathbb {R}}_{\ge -\phi (0)}\) by
for all \(u\in {{\mathscr {U}}}[0,t]\), in which \(\chi \) is as per (5).
Lemma 5
Given \(t\in {\mathbb {R}}_{\ge 0}\), \(x\in {\mathbb {R}}^n\), \(s\in [0,t]\), \(\alpha \in {\mathbb {R}}_{\ge -\phi (0)}\), \(\gamma _x^{s,\alpha }:{{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}\) of (35) is convex.
Proof
Fix \(t\in {\mathbb {R}}_{\ge 0}\), \(x\in {\mathbb {R}}^n\), \(s\in [0,t]\), \(\alpha \in {\mathbb {R}}_{\ge -\phi (0)}\). As \(u\mapsto [\chi (x,u)]_s\) is affine by (5), convexity of \(\gamma _x^{s,\alpha }\) follows by inspection of (35), (7), and Lemma 13. \(\square \)
Lemma 6
Given any \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), the cost functions \({\bar{J}}_t(x,\cdot )\), \({\bar{J}}_t^M(x,\cdot ): {{\mathscr {U}}}[0,t]\rightarrow {\overline{{\mathbb {R}}}}^+\) defined for \(x\in {\mathbb {R}}^n\) by (2), (22) satisfy the following:
- (i):
-
\({\bar{J}}_t^M(x,\cdot )\) and \({\bar{J}}_t(x,\cdot )\) are respectively continuous and lower semicontinuous for all \(x\in {\mathbb {R}}^n\);
- (ii):
-
Both are strictly convex and coercive for all \(x\in {\mathbb {R}}^n\); and
- (iii):
-
\({\bar{J}}_t^M(x,\cdot )\) and \({\bar{J}}_t(y,\cdot )\) are proper for all \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t^M = {\mathbb {R}}^n\) and all \(y\in {{\,\textrm{dom}\,}}{\overline{W}}_t\).
Proof
Fix \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\). (i) Fix \(x\in {\mathbb {R}}^n\). [Continuity of \({\bar{J}}_t^M(x,\cdot )\) ] By inspection of (21), (22), and (3), (4), continuity of \({\bar{J}}_t^M(x,\cdot )\) on \({{\mathscr {U}}}[0,t]\) follows from continuity of its constituent maps \({\bar{I}}_t^M(x,\cdot )\), \(I_t^\kappa \), and \(\varPsi ([\chi (x,\cdot )]_t)\) on \({{\mathscr {U}}}[0,t]\). This is immediate for \(I_t^\kappa \) and \(\varPsi ([\chi (x,\cdot )]_t)\), by (3), (4), and Lemma 1(ii). The same conclusion follows for \({\bar{I}}_t^M(x,\cdot )\), by application of Lemma 1(ii) and Lemma 4(ii).
[Lower semicontinuity of \({\bar{J}}_t(x,\cdot )\) ] Fix \(u\in {{\mathscr {U}}}[0,t]\), and any sequence \(\{{\tilde{u}}_i\}_{i\in {\mathbb {N}}}\subset {{\mathscr {U}}}[0,t]\) such that \(\lim _{i\rightarrow \infty } \Vert u - {\tilde{u}}_i\Vert _{{{\mathscr {U}}}[0,t]} = 0\). By continuity of \({\bar{J}}_t^M(x,\cdot )\), note that \({\bar{J}}_t^M(x,u) = \lim _{i\rightarrow \infty } {\bar{J}}_t^M(x,{\tilde{u}}_i)\). Hence, applying Theorem 1, and in particular (23), (24),
As \(u\in {{\mathscr {U}}}[0,t]\) and \(\{{\tilde{u}}_i\}_{i\in {\mathbb {N}}}\subset {{\mathscr {U}}}[0,t]\) are arbitrary, the assertion follows.
(ii) Fix \(x\in {\mathbb {R}}^n\). [Convexity of \({\bar{J}}_t^M(x,\cdot )\)] Fix \(u\in {{\mathscr {U}}}[0,t]\), and \(\xi \doteq \chi (x,u)\) by (5). By (3), (18), (22),
where \(\gamma _{x}^{s,\alpha }\) is as per (35). Recall by Lemma 5 that \(\gamma _{x}^{s,\alpha }:{{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}\) is convex for any \(s\in [0,t]\), \(\alpha \in {\mathbb {R}}_{\ge -\phi (0)}\). As convexity is preserved under suprema and integration, see [23, Theorem 3 and (2.6), p. 7], it follows by (36) that \({\bar{I}}_t^M(x,\cdot ):{{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}\) is convex. As \(\varPsi \) of (4) is convex by definition of \(P_t\in \varSigma ^n\), and \([\chi (x,\cdot )]_t:{{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}^n\) is affine, \(\varPsi ([\chi (x,\cdot )]_t):{{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}\) is also convex. By (21), \({\bar{J}}_t^M(x,\cdot ) - {\textstyle {\frac{\kappa }{2}}} \Vert \cdot \Vert _{{{\mathscr {U}}}[0,t]}^2\) is convex, and so \({\bar{J}}_t^M(x,\cdot )\) is strictly convex, as \({\textstyle {\frac{\kappa }{2}}} \Vert \cdot \Vert _{{{\mathscr {U}}}[0,t]}^2\) is strictly convex for \(\kappa \in {\mathbb {R}}_{>0}\).
[Convexity of \({\bar{J}}_t(x,\cdot )\) ] Recalling the convexity argument immediately above, \({\bar{J}}_t^M(x,\cdot ) - {\textstyle {\frac{\kappa }{2}}} \Vert \cdot \Vert _{{{\mathscr {U}}}[0,t]}^2\) is convex. Moreover, Theorem 1 implies that (24), (23) hold. Hence, as convexity is preserved under suprema [23, (2.6), p. 7], \({\bar{J}}_t(x,\cdot ) - {\textstyle {\frac{\kappa }{2}}} \Vert \cdot \Vert _{{{\mathscr {U}}}[0,t]}^2\) is convex, so that \({\bar{J}}_t(x,\cdot )\) is strictly convex.
[Coercivity of \({\bar{J}}_t^M(x,\cdot )\) ] Recall by Corollary 1 that \({\textstyle {\frac{K}{2}}} \, |\cdot |^2 + {{\textstyle {\frac{1}{2}}}}\, \varPhi ^M(|\cdot |^2) \ge {\textstyle {\frac{\phi (0)}{2}}}\). Applying (2), (4),
for all \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\). Hence, \({\bar{J}}_t^M(x,\cdot )\) is coercive, as \(\kappa \in {\mathbb {R}}_{>0}\).
[Coercivity of \({\bar{J}}_t(x,\cdot )\) ] Follows by coercivity of \({\bar{J}}_t^M(x,\cdot )\) and Theorem 1.
(iii) Lemma 2 demonstrates that \({{\,\textrm{dom}\,}}{\overline{W}}_t\ne \emptyset \). Fix any \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\). Select a near-optimal input \({\tilde{u}}\in {{\mathscr {U}}}[0,t]\) in the definition (1) of \({\overline{W}}_t(x)\), such that \({\bar{J}}_t^M(x,{\tilde{u}}) \le {\bar{J}}_t(x,{\tilde{u}})< {\overline{W}}_t(x) + 1<\infty \), and note that this is always possible by Theorem 1, i.e. (24). Hence, \({{\,\textrm{dom}\,}}\, {\bar{J}}_t^M(x,\cdot ) \ne \emptyset \ne {{\,\textrm{dom}\,}}\, {\bar{J}}_t(x,\cdot )\). Again recalling (24), along with (37), note also that \( -\infty < {\textstyle {\frac{\phi (0)}{2}}}\, t + {\textstyle {\frac{\kappa }{2}}}\, \Vert u\Vert _{{{\mathscr {U}}}[0,t]}^2 \le {\bar{J}}_t^M(x,u) \le {\bar{J}}_t(x,u) \) for all \(u\in {{\mathscr {U}}}[0,t]\). Hence, \({\bar{J}}_t(x,\cdot ), \, {\bar{J}}_t^M(x,\cdot ): {{\mathscr {U}}}[0,t]\rightarrow {\overline{{\mathbb {R}}}}^+\) of (2), (22) are proper for any \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\). Finally, recalling (37) yields that \({\bar{J}}_t^M(y,\cdot )\) is also proper for any \(y\in {{\,\textrm{dom}\,}}{\overline{W}}_t^M = {\mathbb {R}}^n\). \(\square \)
With \(t\in {\mathbb {R}}_{>0}\), existence and uniqueness of the optimal controls in (1), (20) may now be established.
Theorem 2
Given any \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\), \(y\in {{\,\textrm{dom}\,}}{\overline{W}}_t^M = {\mathbb {R}}^n\), there exist unique optimal controls \(u^*, u^{M*}\in {{\mathscr {U}}}[0,t]\) for the respective optimal control problems (1), (20), with
Moreover, for \(x=y\in {{\,\textrm{dom}\,}}{\overline{W}}_t\), these optimal controls converge strongly in the limit as \(M\rightarrow \infty \), i.e. \(\lim _{M\rightarrow \infty } \Vert u^{M*} - u^*\Vert _{{{\mathscr {U}}}[0,t]} = 0\).
Proof
For the first assertion, i.e. (38), as the existence and uniqueness arguments for the two optimal controls are analogous, only the first is included. The proof involves two main steps, outlined as follows: (I) confirm that the left-hand argmin in (38) is non-empty, by constructing a sequence of elements of decreasing level sets of \({\bar{J}}_t(x,\cdot )\), using the properness, lower semicontinuity, and coercivity properties of the latter; and (II) verify that this argmin is a singleton, via strict convexity. The details of the proof follow.
(I) Fix any \(t\in {\mathbb {R}}_{>0}\), and recall that \({{\,\textrm{dom}\,}}{\overline{W}}_t\ne \emptyset \) by Lemma 2. Fix any \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\), and recall by Lemma 6 that \({\bar{J}}_t(x,\cdot ):{{\mathscr {U}}}[0,t]\rightarrow {\overline{{\mathbb {R}}}}^+\) is proper, lower semicontinuous, strictly convex, and coercive. Given \(\ell \in {\overline{{\mathbb {R}}}}^+\), define the level set \(\varLambda _\ell \subset {{\mathscr {U}}}[0,t]\) by
As \({\bar{J}}_t(x,\cdot ):{{\mathscr {U}}}[0,t]\rightarrow {\overline{{\mathbb {R}}}}^+\) is proper and coercive, and (37) holds, there exists \({\hat{u}}\in {{\mathscr {U}}}[0,t]\) such that \(-\infty< {\textstyle {\frac{\phi (0)}{2}}}\, t + {\textstyle {\frac{\kappa }{2}}}\, \Vert {\hat{u}}\Vert _{{{\mathscr {U}}}[0,t]}^2 \le {\bar{J}}_t(x,{\hat{u}}) < \infty \). Consequently, \(\ell _0\doteq \inf _{u\in {{\mathscr {U}}}[0,t]} {\bar{J}}_t(x,u)\) is finite, i.e. \(\ell _0\in {\mathbb {R}}\), and \(\varLambda _\ell \) of (39) is guaranteed to be non-empty for all \(\ell >\ell _0\). Moreover, as \(\kappa \in {\mathbb {R}}_{>0}\), (37) implies that \(\varLambda _\ell \) is bounded for all \(\ell >\ell _0\), with \(\varLambda _\ell \subset {{\mathscr {B}}}_{{{\mathscr {U}}}[0,t]}[0;r_\ell ]\), \(r_\ell \doteq [{\textstyle {\frac{2}{\kappa }}}\, (\ell - {\textstyle {\frac{\phi (0)}{2}}}\, t)]^\frac{1}{2}\) (and note by inspection that \(\ell _0\ge {\textstyle {\frac{\phi (0)}{2}}}\, t\)). Define a decreasing sequence \(\{\ell _k\}_{k\in {\mathbb {N}}}\subset {\mathbb {R}}\) such that \(\lim _{k\rightarrow \infty } \ell _k = \ell _0\), and a corresponding sequence \(\{u_k\}_{k\in {\mathbb {N}}}\subset {{\mathscr {U}}}[0,t]\) such that \(u_k\in \varLambda _{\ell _k}\). Note in particular that \(u_k\in \varLambda _{\ell _1}\subset {{\mathscr {B}}}_{{{\mathscr {U}}}[0,t]}[0;r_{\ell _1}]\) as \(\varLambda _{\ell _k}\supset \varLambda _{\ell _{k+1}}\ne \emptyset \), for all \(k\in {\mathbb {N}}\). That is, \(\{u_k\}_{k\in {\mathbb {N}}}\) is bounded. As per the proof of Theorem 1, this implies the existence of a subsequence \(\{ {\hat{u}}_k \}_{k\in {\mathbb {N}}} \subset \{ u_k \}_{k\in {\mathbb {N}}}\) and a \({\bar{u}}\in {{\mathscr {U}}}[0,t]\) such that \({\hat{\xi }}_k\rightarrow {\bar{\xi }}\) uniformly as \(k\rightarrow \infty \), where \({\hat{\xi }}_k\doteq \chi (x,{\hat{u}}_k)\), \({\bar{\xi }} \doteq \chi (x,{\bar{u}})\).
Define a sequence of maps \(\{{\hat{\nu }}_k\}_{k\in {\mathbb {N}}}\) from [0, t] to \({\overline{{\mathbb {R}}}}^+\) and its candidate limit \({\bar{\nu }}:[0,t]\rightarrow {\overline{{\mathbb {R}}}}^+\) by
for all \(k\in {\mathbb {N}}\), \(s\in [0,t]\). By inspection, note that \(\lim _{k\rightarrow \infty } [{\hat{\nu }}_k]_s = {\bar{\nu }}_s\), irrespective of finiteness of \({\bar{\nu }}_s\), for all \(s\in [0,t]\). Repeating the Fatou’s Lemma argument of Theorem 1, \( \int _0^t {\bar{\nu }}_s \, ds = \int _0^t \liminf _{k\rightarrow \infty } [{\hat{\nu }}_k]_s \, ds \le \liminf _{k\rightarrow \infty } \int _0^t [{\hat{\nu }}_k]_s \, ds \). Hence, (3), (40) imply that \( {\bar{I}}_t(x,{\bar{u}}) = \int _0^t {\bar{\nu }}_s \, ds + {\textstyle {\frac{\phi (0)}{2}}}\, t \le \liminf _{k\rightarrow \infty } \int _0^t [{\hat{\nu }}_k]_s \, ds + {\textstyle {\frac{\phi (0)}{2}}}\, t = \liminf _{k\rightarrow \infty } {\bar{I}}_t(x,{\hat{u}}_k)\), which, again following the proof of Theorem 1, yields \({\bar{J}}_t(x,{\bar{u}}) \le \liminf _{k\rightarrow \infty } {\bar{J}}_t(x,{\hat{u}}_k)\). Abuse notation by relabelling \(\{\ell _k\}_{k\in {\mathbb {N}}}\) to match the subsequence \(\{{\hat{u}}_k\}_{k\in {\mathbb {N}}}\) of \(\{u_k\}_{k\in {\mathbb {N}}}\), and note that \({\hat{u}}_k\in \varLambda _{\ell _k}\). Hence, \({\bar{J}}_t(x,{\bar{u}}) \le \liminf _{k\rightarrow \infty } {\bar{J}}_t(x,{\hat{u}}_k) \le \liminf _{k\rightarrow \infty } \ell _k = \ell _0\). Consequently, recalling the definition of \(\ell _0\), \({\bar{J}}_t(x,{\bar{u}}) = \ell _0 = \inf _{u\in {{\mathscr {U}}}[0,t]} {\bar{J}}_t(x,u)\), so that \({\bar{u}}\in \mathop {\textrm{arg}\,\textrm{min}}\nolimits _{u\in {{\mathscr {U}}}[0,t]} {\bar{J}}_t(x,u)\) and the argmin is non-empty.
(II) Suppose there exists a \({\tilde{u}}\in \mathop {\textrm{arg}\,\textrm{min}}\nolimits _{u\in {{\mathscr {U}}}[0,t]} {\bar{J}}_t(x,u)\) such that \({\tilde{u}} \ne {\bar{u}}\), and define \(\breve{u}\doteq {{\textstyle {\frac{1}{2}}}}\, ({\bar{u}} + {\tilde{u}})\in {{\mathscr {U}}}[0,t]\). By strict convexity, \({\bar{J}}_t(x,\breve{u}) < {{\textstyle {\frac{1}{2}}}}\, {\bar{J}}_t(x,{\bar{u}}) + {{\textstyle {\frac{1}{2}}}}\, {\bar{J}}_t(x,{\tilde{u}}) = {\bar{J}}_t(x,{\bar{u}})\), contradicting \({\bar{u}}\in \mathop {\textrm{arg}\,\textrm{min}}\nolimits _{u\in {{\mathscr {U}}}[0,t]} {\bar{J}}_t(x,u)\). Hence, the argmin is a singleton, with \(\{ u^* \} = \mathop {\textrm{arg}\,\textrm{min}}\nolimits _{u\in {{\mathscr {U}}}[0,t]} {\bar{J}}_t(x,u)\), as \(u^* \doteq {\bar{u}} = {\tilde{u}}\).
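The contradiction in step (II) rests on the strict midpoint inequality for strictly convex functions, which can be checked numerically on an illustrative strictly convex f (not the paper's cost functional):

```python
import numpy as np

# For strictly convex f and a != b, f((a+b)/2) < (f(a) + f(b))/2, so two
# distinct minimisers would both be strictly beaten by their midpoint.
f = lambda u: 0.5 * u**2 + np.exp(u)     # strictly convex stand-in

rng = np.random.default_rng(0)
for _ in range(1000):
    a, b = rng.uniform(-5.0, 5.0, size=2)
    if a != b:
        assert f(0.5 * (a + b)) < 0.5 * (f(a) + f(b))
print("strict midpoint inequality verified on 1000 random pairs")
```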
For the second assertion, i.e. convergence of the optimal controls, fix \(t>0\), \(x=y\in {{\,\textrm{dom}\,}}{\overline{W}}_t\), and a sequence \(\{M_k\}_{k\in {\mathbb {N}}}\subset [-\phi (0),\infty )\) such that \(\lim _{k\rightarrow \infty } M_k = \infty \). Note by the first assertion that \(u^*,u^{M_k*}\in {{\mathscr {U}}}[0,t]\) exist and are unique. Define \(\breve{u}^{M_k} \doteq {{\textstyle {\frac{1}{2}}}}\, (u^* + u^{M_k*})\), \(k\in {\mathbb {N}}\), and note that
As \({\bar{J}}_t^{M_k}(x,\cdot ) - {\textstyle {\frac{\kappa }{2}}} \, \Vert \cdot \Vert _{{{\mathscr {U}}}[0,t]}^2\) is convex [see the proof of Lemma 6(ii)],
Consequently, by the sub-optimality of \(\breve{u}^{M_k}\) in \({\overline{W}}_t^{M_k}(x)\), optimality of \(u^*\) and \(u^{M_k*}\) in \({\overline{W}}_t(x)\) and \({\overline{W}}_t^{M_k}(x)\) respectively, and Theorem 1,
As \(\kappa \in {\mathbb {R}}_{>0}\), taking the limit as \(k\rightarrow \infty \) and again applying Theorem 1 yields \(\limsup _{k\rightarrow \infty } \Vert u^* - u^{M_k*}\Vert _{{{\mathscr {U}}}[0,t]} \le 0\), and hence \(\lim _{k\rightarrow \infty } \Vert u^* - u^{M_k*}\Vert _{{{\mathscr {U}}}[0,t]} = 0\), as required. \(\square \)
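The convergence \(u^{M*}\rightarrow u^*\) has a simple static analogue, which is ours rather than the paper's regulator problem: a quadratic objective with a hard constraint, approximated by quadratic penalties of growing weight M, whose minimisers converge to the constrained minimiser.

```python
# Static analogue of u^{M*} -> u*: minimise (u - 2)^2 subject to u <= 1,
# versus the penalised problems J^M(u) = (u - 2)^2 + M * max(0, u - 1)^2.
# For u > 1, stationarity 2(u - 2) + 2M(u - 1) = 0 gives the minimiser:
def u_penalised(M):
    return (2.0 + M) / (1.0 + M)

u_star = 1.0                              # constrained minimiser
errors = [abs(u_penalised(M) - u_star) for M in (1.0, 1e2, 1e4, 1e6)]
print(errors)  # decreasing to 0 as M grows
```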
4.2 Constraint Satisfaction
With existence of the optimal controls in (1), (20) guaranteed by Theorem 2, the corresponding state trajectories can be examined to determine their compliance with the intended state constraint. To this end, given \(t\in {\mathbb {R}}_{>0}\), \(x\in {\mathbb {R}}^n\), \({\epsilon }\in {\mathbb {R}}_{>0}\), define the sets of \({\epsilon }\)-optimal inputs in the definitions (1), (20) of \({\overline{W}}_t(x)\), \({\overline{W}}_t^M(x)\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), respectively by
Define the map \(\varDelta _t:{\mathbb {R}}^n\times {{\mathscr {U}}}[0,t]\rightarrow [0,t]\), where
for all \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\). Observe that \(\varDelta _t(x,u)\) aggregates the times in [0, t] at which the state constraint is violated for the trajectory (5), given its initial state x and control u. The following theorem provides bounds on the measure of this set \(\varDelta _t(x,u)\) for any control, for the near-optimal controls of (41), (42), and subsequently for the optimal controls of (38).
Theorem 3
The following properties concerning the map \(\varDelta _t\) of (43) hold for any \(t\in {\mathbb {R}}_{>0}\):
- (i):
-
There exist constants \(M_1\in {\mathbb {R}}_{>-\phi (0)}\) and \(\eta _t, \, \lambda _t,\, {\varXi _t}\in {\mathbb {R}}_{>0}\) and non-increasing \(\beta :{\mathbb {R}}_{>M_1}\rightarrow {\mathbb {R}}_{>0}\) satisfying \(\lim _{M\rightarrow \infty } \beta (M) = 0\), such that for any \(M\in {\mathbb {R}}_{>M_1}\), \({\epsilon }\in {\mathbb {R}}_{>0}\),
$$\begin{aligned} \sup _{u\in {{\mathscr {U}}}_x^{M,{\epsilon }}[0,t]} \mu ( \varDelta _t(x,u) )&\le \beta (M) \left[ \eta _t \left( {\overline{W}}_t^M(x) + {\epsilon }\right) + \lambda _t + {\varXi _t}\, |x|^2 \right] \end{aligned}$$(44)for all \(x\in {\mathbb {R}}^n\), in which \(\mu \) denotes the Lebesgue measure, and \({{\mathscr {U}}}_x^{M,{\epsilon }}\) is as per (42);
- (ii):
-
Given any \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\), and any \({\epsilon }\in {\mathbb {R}}_{>0}\),
$$\begin{aligned}&\lim _{M\rightarrow \infty } \sup _{u\in {{\mathscr {U}}}_x^{M,{\epsilon }}[0,t]} \mu ( \varDelta _t(x,u) ) = 0 = \!\!\! \sup _{u\in {{\mathscr {U}}}_x^{\epsilon }[0,t]} \mu (\varDelta _t(x,u) )\,, \end{aligned}$$(45)in which \({{\mathscr {U}}}_x^{\epsilon }[0,t]\) is as per (41); and
- (iii):
-
Given any \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\) and a strictly increasing sequence \(\{M_k\}_{k\in {\mathbb {N}}}\subset {\mathbb {R}}_{>-\phi (0)}\), there exist unique \(u^*\in {{\mathscr {U}}}[0,t]\) and sequence \(\{u^{M_k*}\}_{k\in {\mathbb {N}}}\subset {{\mathscr {U}}}[0,t]\), specified by (38), such that
$$\begin{aligned}&\lim _{k\rightarrow \infty } \mu ( \varDelta _t(x,u^{M_k*}) ) = 0 = \mu ( \varDelta _t(x,u^*) )\,. \end{aligned}$$
Proof
Fix any \(t\in {\mathbb {R}}_{>0}\). Select \(M_1\in {\mathbb {R}}_{\ge -\phi (0)}\), \(c\in {\mathbb {R}}\) as per Lemma 4(iv). Fix any \(M\in {\mathbb {R}}_{>M_1}\). By definition of M, \(M_1\), c, Lemma 4(iii) and (iv) imply that
Motivated by (46), define \(\beta :{\mathbb {R}}_{>M_1}\rightarrow {\mathbb {R}}_{>0}\) by
for all \(M>M_1\), and note that it is non-increasing by Lemma 4(iii). Furthermore, (18) and Lemma 14 imply that \(\lim _{M\rightarrow \infty } \varPhi ^M(b^2) \ge \lim _{M\rightarrow \infty } \{ a^{-1}(M)\, b^2 - M \} = \lim _{M\rightarrow \infty } \gamma _{b^2}(M) = \infty \), where \(\gamma _{b^2}(M)\) is as per (97). Hence, by inspection of (47), \(\lim _{M\rightarrow \infty } \beta (M) = 0\).
(i) Fix any \(x\in {\mathbb {R}}^n\), \({\epsilon }\in {\mathbb {R}}_{>0}\), and \(u\in {{\mathscr {U}}}_x^{M,{\epsilon }}[0,t]\), and denote the corresponding near-optimal trajectory by \(\xi \doteq \chi (x,u)\) as per (5). Applying (42) and Corollary 1, note that \( {\overline{W}}_t^M(x) + {\epsilon }> {\bar{J}}_t^M(x,u) \ge \int _0^t {\textstyle {\frac{K}{2}}}\, |\xi _s|^2 + {{\textstyle {\frac{1}{2}}}}\, \varPhi ^M(|\xi _s|^2) \, ds + {\textstyle {\frac{\kappa }{2}}}\, \Vert u\Vert _{{{\mathscr {U}}}[0,t]}^2 \ge {\textstyle {\frac{\phi (0)}{2}}}\, t + {\textstyle {\frac{\kappa }{2}}}\, \Vert u\Vert _{{{\mathscr {U}}}[0,t]}^2\), so that \( \Vert u\Vert _{{{\mathscr {U}}}[0,t]}^2 \le {\textstyle {\frac{2}{\kappa }}} \, [ {\overline{W}}_t^M(x) + {\epsilon }- {\textstyle {\frac{\phi (0)}{2}}}\, t ] \). Recalling (5), there exist \({{{\overline{\varXi }}_t}},{\bar{\gamma }}_t\in {\mathbb {R}}_{>0}\) such that \(\Vert \xi \Vert _{{{{{\mathscr {L}}}}_2}([0;t];{\mathbb {R}}^n)}^2 \le {{{\overline{\varXi }}_t}}\, |x|^2 + {\bar{\gamma }}_t\, \Vert u\Vert _{{{\mathscr {U}}}[0,t]}^2\), so that
Consequently, returning to the definition (42) of near optimality, and applying (43), Corollary 1, and the bound \(K+\phi '(0)\ge 0\) adopted in (7) (ii),
That is, with \(\beta \) as per (47),
from which (44) immediately follows by selecting \(\eta _t\doteq 1 + {\textstyle {\frac{{\bar{\gamma }}_t\, |\phi '(0)|}{\kappa }}}\), \(\lambda _t\doteq {{\textstyle {\frac{1}{2}}}}\, [ |\phi '(0)|+|c| + {\textstyle {\frac{{\bar{\gamma }}_t}{\kappa }}}\, |\phi '(0)|\, |\phi (0)| ] \, t\), and \({\varXi _t}\doteq {{\textstyle {\frac{1}{2}}}}\, {{{\overline{\varXi }}_t}}\, |\phi '(0)|\).
(ii) Fix any \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\), \({\epsilon }\in {\mathbb {R}}_{>0}\). The left-hand equality of (45) follows by (16), (25), and assertion (i), i.e. (44). In particular,
It remains to show that the right-hand equality in (45) holds. Fix any \(u\in {{\mathscr {U}}}_x^{\epsilon }[0,t]\). Suppose there exists \(\delta \in {\mathbb {R}}_{>0}\) such that \(\mu ( \varDelta _t(x,u) )\ge \delta > 0\). An analogous calculation to (48), with \({\overline{W}}_t^M\) and \(\varPhi ^M\) replaced with \({\overline{W}}_t\) and \(\sup _{M\ge -\phi (0)} \varPhi ^M\), yields
in which the equalities follow as \(\delta \in {\mathbb {R}}_{>0}\), and by (18) and Lemma 14. Hence, \({\overline{W}}_t(x) = \infty \), which contradicts the definition of \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\). Consequently, no such \(\delta \in {\mathbb {R}}_{>0}\) exists, so that \(\mu ( \varDelta _t(x,u) ) = 0\). As \(u\in {{\mathscr {U}}}_x^{\epsilon }[0,t]\) is arbitrary, the right-hand equality in (45) follows as required.
(iii) Immediate by assertion (ii) and Theorem 2. \(\square \)
Remark 2
Theorem 3 indicates that the regulator problem defined by \({\overline{W}}_t\) of (1) implements the required state constraint for almost every time for those initial states \(x\in {\mathbb {R}}^n\) for which \({\overline{W}}_t(x)<\infty \), and the approximating regulator problem defined by \({\overline{W}}_t^M\) of (20) implements the same constraint in the limit as \(M\rightarrow \infty \). This constraint can be violated for \(M<\infty \), as the finite approximation \(\varPhi ^M\) of the extended real valued barrier \(\varPhi \) does not impose an infinite cost penalty on such violations in the definition (20) of \({\overline{W}}_t^M\). This is reflected in the non-zero right-hand side of (44), which allows for constraint violations on time intervals of non-zero measure for \(M<\infty \).
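On a time grid, the measure \(\mu (\varDelta _t(x,u))\) appearing in Theorem 3 and Remark 2 can be estimated by counting the samples at which the state norm exceeds the bound. The trajectory below is a synthetic stand-in for \(\xi = \chi (x,u)\), with \(|\xi _s| = |\sin s|\) and bound b = 1/2.

```python
import numpy as np

# Discretised estimate of mu(Delta_t(x, u)), the Lebesgue measure of the
# set of times at which the state constraint |xi_s| <= b is violated.
def violation_measure(xi_norm, dt, b):
    return dt * np.count_nonzero(xi_norm > b)

b = 0.5
t = np.pi
s = np.linspace(0.0, t, 1_000_001)
dt = s[1] - s[0]
xi_norm = np.abs(np.sin(s))            # |sin s| > 1/2 exactly on (pi/6, 5*pi/6)

est = violation_measure(xi_norm, dt, b)
print(est)  # ~ 2*pi/3 ~ 2.0944
```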
5 Equivalent Unconstrained Game
The sup-of-quadratics representation (12) for the convex barrier function \(\varPhi \) in (6) is used to demonstrate equivalence of the value function for the state constrained regulator problem (1) with the upper value of an unconstrained two-player game, as summarised in Theorem 4. Similarly, the approximate sup-of-quadratics representation (18) is used to demonstrate an equivalence between the value function for the approximate regulator problem (20) and the corresponding upper value of an approximate two-player game, see Theorem 5. It is further demonstrated that this approximate game has equal upper and lower values, which is used to establish the corresponding equivalence for the exact game, see Theorem 6 and Corollary 2, via the convergence results of Theorem 1. The lower value is subsequently exploited to examine solutions of the state constrained regulator problem (1) via DREs, see Theorems 7 and 8.
5.1 Exact Unconstrained Game and Its Upper Value
Given a horizon \(t\in {\mathbb {R}}_{\ge 0}\), define a function space by
Motivated by (2), (3), (12), define the upper value \(W_t:{\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}^+\) of a two-player unconstrained linear quadratic game by
for all \(x\in {\mathbb {R}}^n\), in which \(J_t\) is a cost function defined with respect to a new integrated running cost function \(I_t\) motivated by (3), and the existing integrated running cost \(I_t^\kappa \) of (3) and terminal cost \(\varPsi \) of (4). In particular, define \(J_t, I_t:{\mathbb {R}}^n\times {{\mathscr {U}}}[0,t]\times {{\mathscr {A}}}[0,t]\rightarrow {\mathbb {R}}\) and \(\nu :{\mathbb {R}}^n\times {\mathbb {R}}_{\ge -\phi (0)}\rightarrow {\mathbb {R}}\) by
for all \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\), \(\alpha \in {{\mathscr {A}}}[0,t]\), \({\hat{\alpha }}\in {\mathbb {R}}_{\ge -\phi (0)}\).
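The pointwise supremum underlying (52), (53) is an instance of convex duality: a convex barrier equals the supremum of the affine (in \(|\xi _s|^2\)) functions generated by its conjugate. Since the paper's \(\varPhi \) and a are specified elsewhere, the check below uses an illustrative log barrier \(\varPhi (p) = -\log (1-p)\), whose conjugate is \(\varPhi ^*(q) = q - 1 - \log q\).

```python
import numpy as np

# Sup-of-affine representation via the convex conjugate:
#   Phi(p) = -log(1 - p) = sup_{q > 0} { q*p - q + 1 + log q },
# with pointwise maximiser q* = 1/(1 - p). Evaluating the sup on a dual
# grid recovers the barrier on its domain p < 1.
q = np.logspace(-4.0, 4.0, 200001)       # grid over the dual variable

def phi_sup(p):
    return np.max(q * p - q + 1.0 + np.log(q))

for p in (0.0, 0.5, 0.9):
    assert abs(phi_sup(p) - (-np.log(1.0 - p))) < 1e-6
print("sup-of-affine representation matches -log(1 - p)")
```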
The value functions (1) and (50) defining the exact regulator problem and the exact unconstrained game are in fact equivalent, as stated in the following theorem.
Theorem 4
Given \(t\in {\mathbb {R}}_{\ge 0}\), the value functions \({\overline{W}}_t\), \(W_t\) of (1), (50) are equivalent, with \({\overline{W}}_t(x) = W_t(x)\) for all \(x\in {\mathbb {R}}^n\).
The proof of Theorem 4 follows as a consequence of the following measurable selection lemma, see for example [14].
Lemma 7
Given \(t\in {\mathbb {R}}_{\ge 0}\), \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\), \(\xi \doteq \chi (x,u)\in C([0,t];{\mathbb {R}}^n)\), the following hold:
- (i):
-
The cost functions \({\bar{I}}_t\), \(I_t\) of (3), (52) associated with the exact regulator problem (1) and game upper value (50) satisfy
$$\begin{aligned}&{\bar{I}}_t(x,u) = \int _0^t \sup _{{\hat{\alpha }}\ge -\phi (0)} \nu (\xi _s,{\hat{\alpha }}) \, ds = \sup _{\alpha \in {{\mathscr {A}}}[0,t]} I_t(x,u,\alpha ), \end{aligned}$$(54)in which \(\nu \) is as per (53);
- (ii):
-
If \(\mu (\varDelta _t(x,u))=0\), see (43), then \(\alpha ^*\in {{\mathscr {A}}}[0,t]\) given for any \(M\in {\mathbb {R}}_{\ge -\phi (0)}\) by
$$\begin{aligned} \alpha _s^{*} = {\hat{\alpha }}^{\varDelta *}(|\xi _s|^2)&\doteq \left\{ \begin{array}{rl} a\circ \phi '(|\xi _s|^2), &{} s\in [0,t]\setminus \varDelta _t(x,u),\\ M, &{} s\in \varDelta _t(x,u), \end{array} \right. \end{aligned}$$(55)satisfies
$$\begin{aligned} {\bar{I}}_t(x,u)&= I_t(x,u,\alpha ^*)\, ; \end{aligned}$$(56) - (iii):
-
If \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\) and \(u\in {{\mathscr {U}}}_x^{\epsilon }[0,t]\), \({\epsilon }\in {\mathbb {R}}_{>0}\), see (41), then (56) holds with \(\alpha ^*\in {{\mathscr {A}}}[0,t]\) as per (55), for arbitrary \(M\in {\mathbb {R}}_{\ge -\phi (0)}\).
Proof
Fix \(t\in {\mathbb {R}}_{\ge 0}\), \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\), and \(\xi \doteq \chi (x,u)\in C([0,t];{\mathbb {R}}^n)\). (i) The left-hand equality in (54) is immediate by (3), (52), (53), and Proposition 1, in particular (12). For the right-hand equality, first fix any \(\alpha \in {{\mathscr {A}}}[0,t]\), and note that it is pointwise suboptimal in the supremum over \({\hat{\alpha }}\ge -\phi (0)\). That is, \(\int _0^t \sup _{{\hat{\alpha }}\ge -\phi (0)} \nu (\xi _s, {\hat{\alpha }}) \, ds \ge \int _0^t \nu (\xi _s, \alpha _s) \, ds\) for all \(\alpha \in {{\mathscr {A}}}[0,t]\). Hence,
In order to prove the opposite inequality, fix \(\delta \in {\mathbb {R}}_{>0}\), and suppose that \(u\in {{\mathscr {U}}}[0,t]\) is such that \(\mu (\varDelta _t(x,u))\ge \delta >0\), see (43). Given any \(\alpha ^-,\alpha ^+\in {\mathbb {R}}_{\ge -\phi (0)}\), define piecewise constant \(\breve{\alpha }\in {{\mathscr {A}}}[0,t]\) by
for all \(s\in [0,t]\). Note that \(\breve{\alpha }\) is suboptimal insofar as
As this is true for any \(\alpha ^+\in {\mathbb {R}}_{\ge -\phi (0)}\), it follows immediately that
Lemma 14 implies that \(\nu (\xi _s,\cdot ) = {\textstyle {\frac{K}{2}}}\, |\xi _s|^2 + \gamma _{|\xi _s|^2}(\cdot )\) is strictly increasing for any \(s\in \varDelta _t(x,u)\) fixed, as \(|\xi _s|^2\ge b^2\) by (43), with \(\lim _{\alpha ^+\rightarrow \infty } \nu (\xi _s,\alpha ^+) = {\textstyle {\frac{K}{2}}}\, |\xi _s|^2 + \lim _{\alpha ^+\rightarrow \infty } \gamma _{|\xi _s|^2}(\alpha ^+) = \infty \). Consequently, there exists an \(M_0\in {\mathbb {R}}_{\ge -\phi (0)}\) such that \(\nu (\xi _s,\alpha ^+) \ge a^{-1}(\alpha ^+)\, b^2 - \alpha ^+ > 0\) for all \(\alpha ^+>M_0\). Hence, the monotone convergence theorem implies that
As the left-hand side here is the right-hand side of (57), it follows immediately that the opposite inequality to (57) holds whenever \(\mu (\varDelta _t(x,u))\ge \delta >0\).
Alternatively, suppose \(u\in {{\mathscr {U}}}[0,t]\) is such that \(\mu ( \varDelta _t(x,u) )=0\), and let \(\alpha ^*\in {{\mathscr {A}}}[0,t]\) be defined by (55). Recalling the left-hand equality of (54), and the definition (13) of \({\hat{\alpha }}^*(\cdot )\) in Proposition 1,
Combining this inequality with (57) yields (54).
(ii) Immediate by the fourth equality of (58).
(iii) Fix \(x\in {{\,\textrm{dom}\,}}{\overline{W}}_t\), \({\epsilon }\in {\mathbb {R}}_{>0}\), \(u\in {{\mathscr {U}}}_x^{\epsilon }[0,t]\). Theorem 3(ii) implies that \(\mu ( \varDelta _t(x,u) ) = 0\), so that assertion (ii) above applies. \(\square \)
Theorem 4 follows by Lemma 7(i) and by comparison of (1)–(3), (50)–(53).
5.2 Approximate Game and Its Upper and Lower Values
Given \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(t\in {\mathbb {R}}_{\ge 0}\), define \({{\mathscr {A}}}^M[0,t]\doteq C([0,t];[-\phi (0),M])\). Analogous to the exact game defined by (50), define the upper value \(W_t^M:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) of an approximating two-player unconstrained linear quadratic game by
for all \(x\in {\mathbb {R}}^n\), where cost \(J_t\) is as per (51). As in the exact case, the value function (20) of the approximating regulator problem and the upper value (59) of the approximating game are equivalent.
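The effect of capping the maximising player at M, as in \({{\mathscr {A}}}^M[0,t]\), can be mimicked with an illustrative log barrier \(\varPhi (p) = -\log (1-p)\) and its conjugate \(\varPhi ^*(q) = q - 1 - \log q\) (not the paper's \(\varPhi \), a): truncating the dual variable gives an everywhere-finite approximation that increases towards the exact barrier, and matches it at feasible points once the cap exceeds the pointwise maximiser.

```python
import numpy as np

# Truncated sup-of-affine analogue of Phi^M: cap the dual variable at qmax,
#   Phi^M(p) ~ sup_{0 < q <= qmax} { q*p - q + 1 + log q }.
def phi_M(p, qmax):
    q = np.linspace(1e-6, qmax, 400001)
    return np.max(q * p - q + 1.0 + np.log(q))

# At an infeasible point (p >= 1) the truncated value is finite but grows
# without bound as the cap increases, recovering the infinite barrier.
vals = [phi_M(1.2, qmax) for qmax in (1e1, 1e2, 1e3)]
print(vals)  # finite and strictly increasing

# At a feasible point the maximiser q* = 1/(1 - p) = 2 is captured once
# qmax >= 2, so the truncated value already equals -log(1 - p) there.
print(phi_M(0.5, 1e1) - np.log(2.0))  # ~ 0
```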
Theorem 5
Given \(t\in {\mathbb {R}}_{\ge 0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), the value functions \({\overline{W}}_t^M\), \(W_t^M\) of (20), (59) are equivalent, with \({\overline{W}}_t^M(x) = W_t^M(x)\) for all \(x\in {\mathbb {R}}^n\).
The proof of Theorem 5 follows as a consequence of a corresponding measurable selection lemma, the proof of which is similar to that of Lemma 7.
Lemma 8
Given any \(t\in {\mathbb {R}}_{\ge 0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\), and \(\xi \doteq \chi (x,u)\in C([0,t];{\mathbb {R}}^n)\), the cost functions \({\bar{I}}_t^M\), \(I_t\) and \(\bar{J}_t^M\), \(J_t\) of (22), (52) and (21), (51) satisfy
in which \(\alpha ^{M*}\in {{\mathscr {A}}}^M[0,t]\) is defined via (19) by, and satisfies,
Proof
Fix any \(t\in {\mathbb {R}}_{\ge 0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(x\in {\mathbb {R}}^n\), \(u\in {{\mathscr {U}}}[0,t]\), and define \(\xi \doteq \chi (x,u)\in C([0,t];{\mathbb {R}}^n)\). Define \(\alpha ^{M*}\) as per (61), and note in particular that \(\alpha ^{M*}\in {{\mathscr {A}}}^M[0,t]\) by Lemma 13 and (7), (15), (19).
[(60) and the left-hand argmax in (62)] The first equality in (60) is immediate by Proposition 2(i), i.e. (18). For the remaining equalities, any \(\alpha \in {{\mathscr {A}}}^M[0,t]\) is pointwise suboptimal in the supremum over \({\hat{\alpha }}\in [-\phi (0),M]\), so that
for all \(\alpha \in {{\mathscr {A}}}^M[0,t]\). Hence,
In order to prove the opposite inequality, recall by (61) and Proposition 2, i.e. (19), that \(\alpha _s^{M*}\) is the pointwise maximizer of \(\nu (\xi _s,\cdot )\). That is,
Hence, combining inequalities (63) and (64) yields (60), and the left-hand argmax in (62).
[(60) and the right-hand argmax in (62)] Immediate by definitions (21), (22), (51), (52) of \({\bar{J}}_t^M\), \({\bar{I}}_t^M\), \(J_t\), \(I_t\), (60), (61), and the left-hand argmax in (62). (Note in particular that the dependence of \(J_t(\cdot ,\cdot ,\alpha )\) on \(\alpha \in {{\mathscr {A}}}^M[0,t]\) comes only through \(I_t(\cdot ,\cdot ,\alpha )\).) \(\square \)
Theorem 5 subsequently follows by Lemma 8 and comparison of (20)–(22) and (51)–(53), (59).
With a view to addressing computation, the remaining objective is to demonstrate equivalence of the upper and lower values for the game (59). To this end, a number of useful properties of the cost function \(J_t\) of (51) are summarised via the following two lemmas.
Lemma 9
Given any \(t\in {\mathbb {R}}_{>0}\), \(x\in {\mathbb {R}}^n\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(\alpha \in {{\mathscr {A}}}^M[0,t]\), the cost function \(J_t(x,\cdot ,\alpha ):{{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}\) defined by (51) is Fréchet differentiable, strictly convex, and coercive.
Proof
The differentiability assertion follows by Lemma 1(ii) and the chain rule, while the strict convexity and coercivity assertions follow via analogous arguments to the proof of Lemma 6. The details are omitted. \(\square \)
Remark 3
Strict convexity in Lemma 9 requires \(t\in {\mathbb {R}}_{>0}\).
Lemma 10
Given \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), and \(x\in {\mathbb {R}}^n\), let \(u^{M*}\in {{\mathscr {U}}}[0,t]\) be defined as per (38), and let \(\alpha ^{M*}\doteq {\hat{\alpha }}^{M*}(|\chi (x,u^{M*})|^2)\in {{\mathscr {A}}}^M[0,t]\) be defined via (5), (61). Then, \(u^{M*}\) and \(\alpha ^{M*}\) are unique, and together satisfy
Proof
The proof involves three main steps, prefaced as follows: (I) given the unique optimal control \(u^{M*}\) as indicated, construct a unique \(\alpha ^{M*}\) from the corresponding optimal trajectory via (5), (61); (II) verify by strict convexity that the cost \(J_t(x,\cdot ,\alpha ^{M*})\) has a unique minimizer; and (III) show that this minimizer must be \(u^{M*}\), by showing that perturbations away from \(u^{M*}\) always result in a higher cost, via a non-negative directional derivative of \(J_t(x,\cdot ,\alpha ^{M*})\).
Step (III) is complicated by two features of the problem: firstly, perturbations in the control yield perturbations in the trajectory, which in turn yield perturbations in the evaluated \(\alpha ^{M*}\) via (61); and secondly, the statement of (61) involves a number of cases, see (19). In dealing with the first complication, it is convenient to bound the directional derivative of \(J_t(x,\cdot ,\alpha ^{M*})\) below by that of \({\bar{J}}_t^M(x,\cdot )\), for which \(u^{M*}\) is the known unique minimizer. In dealing with the second complication, the cases involved must be exhaustively enumerated.
The details of the proof follow.
(I) Fix \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\). Define \(u^{M*}\in {{\mathscr {U}}}[0,t]\) uniquely as per (38), i.e. as per the left-hand equality in the lemma statement. Given this \(u^{M*}\), define \(\alpha ^{M*}\in {{\mathscr {A}}}^M[0,t]\) as per the lemma statement, and note by Lemma 8 that \(\alpha ^{M*}\) is unique by definition.
(II) Recall by Lemma 9 that \(J_t(x,\cdot ,\alpha ^{M*}):{{\mathscr {U}}}[0,t]\rightarrow {\mathbb {R}}\) is Fréchet differentiable and strictly convex. Hence, in order to verify that \(J_t(x,\cdot ,\alpha ^{M*})\) is (uniquely) minimized at \(u^{M*}\), it is sufficient to show that the directional derivative of \(J_t(x,\cdot ,\alpha ^{M*})\) is nonnegative in all directions when evaluated at \(u^{M*}\). The details follow in (III) below.
(III) Fix any \({\tilde{u}}\in {{\mathscr {U}}}[0,t]\) with \(\Vert {\tilde{u}}\Vert _{{{\mathscr {U}}}[0,t]} = 1\). The Frèchet derivative and its Riesz representation at \(u^{M*}\in {{\mathscr {U}}}[0,t]\), denoted by \(D_u J_t(x,u^{M*},\alpha ^{M*})\in {\mathcal {L}}({{\mathscr {U}}}[0,t];{\mathbb {R}})\) and \({\nabla }_u J_t(x,u^{M*},\alpha ^{M*})\in {{\mathscr {U}}}[0,t]\), satisfy
Fix any \({\epsilon }\in {\mathbb {R}}_{>0}\) with \({\epsilon }^2<\min (1,{\hat{\rho }}(M))\), and \({\hat{\rho }}(M)\) as per (15). Let \(L_t\doteq \Vert {\mathcal {A}}\Vert _{{\mathcal {L}}({{\mathscr {U}}}[0,t];C([0,t];{\mathbb {R}}^n))} \in {\mathbb {R}}_{>0}\), with \({\mathcal {A}}\) as per (9), and \({\bar{\delta }}^{\epsilon }\doteq {\epsilon }/ (4\, L_t) \in {\mathbb {R}}_{>0}\). Fix any \(\delta \in (0,{\bar{\delta }}^{\epsilon }]\). Define
for all \(s\in [0,t]\), with \(\chi \), \({\hat{\alpha }}^{M*}\) as per (5), (19). Recalling (5) and Lemma 1, note in particular that
By (51), (52), (53), and Lemma 8,
so that, by subtraction,
As per the prefaced first complication of step (III), the first two terms on the right-hand side of (68) are a prelude to a directional derivative of \({\bar{J}}_t^M(x,\cdot )\), in which it is noted that \(u^{M*}\) is the minimizer of \({\bar{J}}_t^M(x,\cdot )\), see Theorem 2 and (38). Hence, a lower bound for the integral term in the right-hand side of (68) is sought, as a function of \(\delta \), using Taylor’s theorem. This Taylor’s theorem argument brings in the second prefaced complication of step (III), due to the cases involved in the definition of \(\alpha ^{M*}\), and this dominates the remainder of the proof. Persevering, as a first step, it may be shown with some calculation via (19), (53), (66), (97) that
in which the derivatives follow by Lemma 14 and the identity \(\nu ({\tilde{\xi }}_s^{M*},\alpha ) = \frac{K}{2}\, | {\tilde{\xi }}_s^{M*}|^2 + {{\textstyle {\frac{1}{2}}}}\, \gamma _{| {\tilde{\xi }}_s^{M*}|^2}(\alpha )\) for all \(\alpha \in {\mathbb {R}}_{\ge -\phi (0)}\), with \(\gamma _{(\cdot )}\) as per (97).
In evaluating these cases, observe that the second partial derivative in (69) is unbounded if \(|{\tilde{\xi }}_s^{M*}|\rightarrow 0\). Two cases are thus considered, (i) \(s\in \varDelta _0^{\epsilon }\), and (ii) \(s\in [0,t]\setminus \varDelta _0^{\epsilon }\), in which
(i) Fix \(s\in \varDelta _0^{\epsilon }\). The triangle inequality, (67), (70) imply that \(|{\tilde{\xi }}_s^{M*}| \le |{\tilde{\xi }}_s^{M*} - \xi _s^{M*}| + |\xi _s^{M*}|\le {\textstyle {\frac{{\epsilon }}{4}}} + {\textstyle {\frac{{\epsilon }}{2}}} = {\textstyle {\frac{3\, {\epsilon }}{4}}}\), so that \(\max ( |{\tilde{\xi }}_s^{M*}|^2, |\xi _s^{M*}|^2 ) \le {\epsilon }^2 < {\hat{\rho }}(M)\) by definition of \({\epsilon }\). Hence, (19), (66) yield \({\tilde{\alpha }}_s^{M*} = a\circ \phi '(|{\tilde{\xi }}_s^{M*}|^2)\), \(\alpha _s^{M*} = a\circ \phi '(|\xi _s^{M*}|^2)\), so that \(a^{-1}({\tilde{\alpha }}_s^{M*}) = \phi '(|{\tilde{\xi }}_s^{M*}|^2)\), \(a^{-1}(\alpha _s^{M*}) = \phi '(|\xi _s^{M*}|^2)\). Note also that \(a\circ \phi '\) is differentiable by Lemma 13, with \((a\circ \phi ')'(\rho ) = [a'\circ \phi '(\rho )]\, \phi ''(\rho ) = \rho \, \phi ''(\rho )\) for all \(\rho \in [0,{\epsilon }]\). Hence, (53), the triangle inequality, Taylor’s theorem, and (67) together imply that the integrand in (68) satisfies
in which \({\epsilon }<1\) by definition, and \(\mu _s,\rho _s\in [0,{\epsilon }]\subset [0,{\hat{\rho }}(M)]\) lie in an interval defined by the end points \(|\xi _s^{M*}|^2\) and \(|{\tilde{\xi }}_s^{M*}|^2\), and
As \(s\in \varDelta _0^{\epsilon }\) is arbitrary, integration yields
(ii) Fix any \(s\in [0,t]\setminus \varDelta _0^{\epsilon }\). As \(|\xi _s^{M*}| > {\textstyle {\frac{{\epsilon }}{2}}}\), by definition of \(\varDelta _0^{\epsilon }\), the triangle inequality and (67) imply that \(|{\tilde{\xi }}_s^{M*}| \ge |\xi _s^{M*}| - |{\tilde{\xi }}_s^{M*} - \xi _s^{M*}| > {\textstyle {\frac{{\epsilon }}{2}}} - {\textstyle {\frac{{\epsilon }}{4}}} = {\textstyle {\frac{{\epsilon }}{4}}}\), so that \({\tilde{\xi }}_s^{M*}, \xi _s^{M*} \not \in {{\mathscr {B}}}_{{\mathbb {R}}^n}[0;{\textstyle {\frac{{\epsilon }}{4}}}]\). Consequently, (66), (69) imply that \({\frac{\partial {\nu }}{\partial {\alpha }}}({\tilde{\xi }}_s^{M*},{\tilde{\alpha }}_s^{M*})\) and \({\frac{\partial ^2 {\nu }}{\partial {\alpha }^2}}({\tilde{\xi }}_s^{M*},{\bar{\alpha }}_s)\) exist and are uniformly bounded for \(s\in [0,t]\), given any \({\bar{\alpha }}_s\) contained in the interval defined by the end points \({\tilde{\alpha }}_s^{M*}\) and \(\alpha _s^{M*}\). By Taylor’s theorem, such an \({\bar{\alpha }}_s\) exists, and satisfies
Note by inspection of the various cases in (69) that the first order term is equivalently given by
Let \(R^M\doteq \Vert \xi ^{M*}\Vert _{C([0,t];{\mathbb {R}}^n)}>{\textstyle {\frac{{\epsilon }}{2}}}\), and note that \(|{\tilde{\xi }}_s^{M*}| \le |\xi _s^{M*}| + |{\tilde{\xi }}_s^{M*} - \xi _s^{M*}| \le R^M + {\textstyle {\frac{{\epsilon }}{4}}}\), by (67). In the non-zero case above, as \(M = a\circ \phi '({\hat{\rho }}(M))\), a second application of Taylor’s theorem yields
in which \(\mu _s\in [|\xi _s^{M*}|^2, {\hat{\rho }}(M)]\). Similarly,
Hence, combining these inequalities yields a lower bound for the first order term, with
with \(K_1^M({\epsilon }) \doteq 2\, L_1^M \left( R^M + {\textstyle {\frac{{\epsilon }}{8}}} \right) ^2 L_t^2\).
The second order term in (73) has the same form as (69), with
in which \({\bar{\alpha }}_s\) is in the interval defined by the end points \({\tilde{\alpha }}_s^{M*}\) and \(\alpha _s^{M*}\). As \({\tilde{\xi }}_s^{M*}, \xi _s^{M*}\not \in {{\mathscr {B}}}_{{\mathbb {R}}^n}[0;{\textstyle {\frac{{\epsilon }}{4}}}]\), (66) implies that \(\alpha _s^{M*}, {\tilde{\alpha }}_s^{M*}\in [a\circ \phi '(\frac{{\epsilon }^2}{16}), M]\), so that \({\bar{\alpha }}_s\in [a\circ \phi '(\frac{{\epsilon }^2}{16}), M]\). Hence, Lemma 13, i.e. (92), yields
with \(L_2^M \doteq \sup _{\rho \in [0,{\hat{\rho }}(M)]} [2\, \phi ''(\rho )]^{-1}\). Furthermore, in each of the four cases listed for \(\alpha _s^{M*} - {\tilde{\alpha }}_s^{M*}\) in (69), Taylor’s theorem again yields
in which \(\rho _s\in [0,{\hat{\rho }}(M)]\) in every case. Hence, a lower bound for the second order term is
with \(K_2^M({\epsilon })\doteq {{\textstyle {\frac{1}{2}}}}\, L_2^M\, ({\textstyle {\frac{16}{{\epsilon }^2}}})^3\, (R^M+{\textstyle {\frac{{\epsilon }}{4}}})^2\, [{\hat{\rho }}(M)\, L_1^M\, (R^M + {\textstyle {\frac{{\epsilon }}{8}}})\, L_t]^2\). Thus, integrating (73) via (74), (75),
Cases (i) and (ii) may now be combined, via (72) and (76), in (68), to finally deal with the second prefaced complication of step (III). In particular,
Recalling (66), a lower bound for the directional derivative (65) can subsequently be evaluated, with
in which the second inequality follows by Theorem 2, i.e. (38), effectively dealing with the first prefaced complication of step (III). Furthermore, as \({\epsilon }\in {\mathbb {R}}_{>0}\) can be selected arbitrarily small, cf. its definition prior to (66), and \(K_0^M(0) = 0\) by (71), it follows that
in which \({\tilde{u}}\in {{\mathscr {U}}}[0,t]\), \(\Vert {\tilde{u}}\Vert _{{{\mathscr {U}}}[0,t]} = 1\), is arbitrary. Hence, \(u^{M*}\in {{\mathscr {U}}}[0,t]\) minimizes \(J_t(x,\cdot ,\alpha ^{M*})\). \(\square \)
Theorem 6
Given \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(x\in {\mathbb {R}}^n\), and \(u^{M*}\), \(\alpha ^{M*}\) as per Lemma 10,
Proof
Fix \(t\in {\mathbb {R}}_{> 0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(x\in {\mathbb {R}}^n\), and \(\alpha ^{M*}\), \(u^{M*}\) as per Lemma 10. Recalling Theorem 5, \( {\overline{W}}_t^M(x) = W_t^M(x) = \inf _{u\in {{\mathscr {U}}}[0,t]} \sup _{\alpha \in {{\mathscr {A}}}^M[0,t]} J_t(x,u,\alpha ) \ge \sup _{\alpha \in {{\mathscr {A}}}^M[0,t]} \inf _{u\in {{\mathscr {U}}}[0,t]} J_t(x,u,\alpha ) \). For the opposite inequality, and existence of the minimizer and maximizer as per the final equality in (77), note by Theorem 2, Lemma 8, the definition of \(\alpha ^{M*}\), and finally Lemma 10, that
\(\square \)
Corollary 2
Given \(t\in {\mathbb {R}}_{>0}\) and \(x\in {\mathbb {R}}^n\), the game upper value defined by \(W_t\) of (50) and the corresponding game lower value are equivalent, with
Proof
Fix \(t\in {\mathbb {R}}_{>0}\), \(x\in {\mathbb {R}}^n\). Applying Theorem 4, followed by Theorems 1 and 6,
\(\square \)
5.3 Computation via the Lower Value
Theorems 4, 5, and 6, together with Corollary 2, establish the equivalence of the exact and approximate regulator problems (1) and (20) with the corresponding exact and approximate games (50), (59), (77), (78), and the equivalence of the upper and lower values of these games in both cases. With a view to computation, via the value function and optimal trajectories corresponding to the approximate regulator problem (20), it is useful to consider the lower value of the approximate game explicitly. To this end, given \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(\alpha \in {{\mathscr {A}}}^M[0,t]\), define an auxiliary value function \({\widehat{W}}_t^{\alpha }\) by
for all \(t\in {\mathbb {R}}_{\ge 0}\), \(x\in {\mathbb {R}}^n\). The following is then immediate.
Lemma 11
Given any \(t\in {\mathbb {R}}_{\ge 0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), the value functions \(W_t^M,{\widehat{W}}_t^{\alpha }:{\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}^+\), \(\alpha \in {{\mathscr {A}}}^M[0,t]\), of (59), (79) satisfy \(W_t^M(x) = \sup _{\alpha \in {{\mathscr {A}}}^M[0,t]} {\widehat{W}}_t^{\alpha }(x)\) for all \(x\in {\mathbb {R}}^n\).
By inspection of (51), (52), (53), \({\widehat{W}}_t^{\alpha }\) of (79) defines the value of an LQR problem, parameterized by \(\alpha \in {{\mathscr {A}}}^M[0,t]\). In order to demonstrate that \({\widehat{W}}_t^{\alpha }\) has an explicit quadratic representation, it is convenient to consider the final value problem (FVP)
for all \(s\in [0,t]\), in which \({\hat{A}}, {\hat{P}}_s, {\hat{P}}_t, {\hat{V}}_s\in {\mathbb {R}}^{(n+1)\times (n+1)}\), \(s\in [0,t]\), \({\hat{B}}\in {\mathbb {R}}^{(n+1)\times m}\), \({\hat{C}}\in {\mathbb {R}}^{n\times (n+1)}\) are defined via \(\kappa \), K of (2), A, B of (5), and \(P_t\) of (4) by
in which \(I_n\in {\mathbb {R}}^{n\times n}\) and \(0_n\in {\mathbb {R}}^{n\times 1}\) denote the identity matrix and zero vector respectively.
Remark 4
FVP (80) may be expressed as three component FVPs
subject to \(P_t^\alpha = P_t\), \(Q_t^\alpha = Q_t\), and \(R_t^\alpha = R_t\), by (4), (81).
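To illustrate Remark 4, the following sketch integrates the Riccati component of the FVP backward in time by an explicit Euler scheme. It is a scalar sketch under assumed data (\(A=0\), \(B=1\), \(\kappa =1\), and a constant state weight \(q\) standing in for \(K + a^{-1}(\alpha _s)\)); the equation form and data are assumptions of the sketch, not a reproduction of (82). With terminal condition \(P_t = 0\), this scalar FVP has the closed form \(P_s = \sqrt{q}\tanh (\sqrt{q}\,(t-s))\), which the scheme reproduces.

```python
import math

def solve_riccati_scalar(q, t, P_t, n_steps=20000):
    """Integrate the scalar Riccati FVP  -dP/ds = 2*A*P - (B**2/kappa)*P**2 + q
    backward from P(t) = P_t, with assumed data A = 0, B = 1, kappa = 1,
    so that dP/ds = P**2 - q.  Returns the initial value P(0)."""
    ds = t / n_steps
    P = P_t
    for _ in range(n_steps):
        P -= ds * (P * P - q)   # Euler step from s to s - ds
    return P

# analytic check: with P(t) = 0, the solution is P(s) = sqrt(q)*tanh(sqrt(q)*(t - s))
q, t = 1.0, 1.0
P0 = solve_riccati_scalar(q, t, 0.0)
print(P0, math.tanh(1.0))
```

The linear component FVP for \(Q^\alpha \) and the scalar component FVP for \(R^\alpha \) are integrated backward in the same fashion, reusing the stored values of \(P^\alpha \).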
Lemma 12
Given fixed \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), and any \(\alpha \in {{\mathscr {A}}}^M[0,t]\), there exists a unique \({\hat{P}}^\alpha \in C([0,t];\varSigma ^{n+1})\cap C^1((0,t);\varSigma ^{n+1})\) of the form (81) that satisfies FVP (80).
Proof
See for example [28, Theorem 37, p. 364].\(\square \)
Theorem 7
Given any \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), \(\alpha \in {{\mathscr {A}}}^M[0,t]\), the auxiliary value function \({\widehat{W}}_t^{\alpha }\) of (79) satisfies \({\widehat{W}}_t^{\alpha }(x) = {\breve{W}}_t^{\alpha }(0,x)\) for all \(x\in {\mathbb {R}}^n\), where \({\breve{W}}_t^{\alpha }:[0,t]\times {\mathbb {R}}^n\rightarrow {\overline{{\mathbb {R}}}}^+\) is given by
for all \(s\in [0,t]\), \(x\in {\mathbb {R}}^n\), in which \({\hat{P}}^{\alpha }\in C([0,t];\varSigma ^{n+1})\cap C^1((0,t);\varSigma ^{n+1})\) is the unique solution of FVP (80). Furthermore, the optimal input \(u^\alpha \in {{\mathscr {U}}}[0,t]\) in (79) has the state feedback characterization
for any \(x\in {\mathbb {R}}^n\), where \(P_s^\alpha \), \(Q_s^\alpha \) are as per (82), (83).
Proof
Fix arbitrary \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), and \(\alpha \in {{\mathscr {A}}}^M[0,t]\). Applying Lemma 12, there exists a unique \({\hat{P}}^\alpha \in C([0,t];\varSigma ^{n+1})\cap C^1((0,t);\varSigma ^{n+1})\) of the form (81) that satisfies FVP (80). Consequently, given any \(s\in (0,t)\), \(x\in {\mathbb {R}}^n\), (80), (81), (82), (83), (84) imply that
Define the Hamiltonian \(H^\alpha :[0,t]\times {\mathbb {R}}^n\times {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) by
for all \(x,p\in {\mathbb {R}}^n\), \(s\in [0,t]\). Combining (86), (87), note that \(-{\frac{\partial {{\breve{W}}_t^\alpha }}{\partial {s}}}(s,x) = H^\alpha (s,x,{\nabla }_x {\breve{W}}_t^\alpha (s,x))\). Fix any \({\bar{u}}\in {{\mathscr {U}}}[0,t]\). Define \({\bar{\xi }} \doteq \chi (x,{\bar{u}})\) via (5), and observe via (87) that \({\bar{u}}_s\) is pointwise suboptimal in \(H^\alpha (s,{\bar{\xi }}_s,{\nabla }_x {\breve{W}}_t^\alpha (s,{\bar{\xi }}_s))\) for any \(s\in [0,t]\). Consequently, \( 0 \le {\textstyle {{\frac{\partial {{\breve{W}}_t^\alpha }}{\partial {s}}}}}(s,{\bar{\xi }}_s) + \langle {\nabla }_x {\breve{W}}_t^\alpha (s,{\bar{\xi }}_s),\, A\, {\bar{\xi }}_s + B\, {\bar{u}}_s \rangle + {\textstyle {\frac{\kappa }{2}}}\, |{\bar{u}}_s|^2 + {{\textstyle {\frac{1}{2}}}}\, [K + a^{-1}(\alpha _s)]\, |{\bar{\xi }}_s|^2 - {\textstyle {\frac{\alpha _s}{2}}} = {\textstyle {{\frac{d {}}{d {s}}}}} {\breve{W}}_t^\alpha (s,{\bar{\xi }}_s) + {\textstyle {\frac{\kappa }{2}}}\, |{\bar{u}}_s|^2 + {{\textstyle {\frac{1}{2}}}}\, [K + a^{-1}(\alpha _s)]\, |{\bar{\xi }}_s|^2 - {\textstyle {\frac{\alpha _s}{2}}} \). Integrating with respect to \(s\in [0,t]\), and observing that \(\breve{W}_t^\alpha (t,x) = \varPsi (x)\), yields \( {\breve{W}}_t^\alpha (0,x) \le \int _0^t {\textstyle {\frac{\kappa }{2}}}\, |{\bar{u}}_s|^2 + {{\textstyle {\frac{1}{2}}}}\, [K + a^{-1}(\alpha _s)]\, |{\bar{\xi }}_s|^2 - {\textstyle {\frac{\alpha _s}{2}}}\, ds + \varPsi ({\bar{\xi }}_t) = J_t(x,{\bar{u}},\alpha ) \). As \({\bar{u}}\in {{\mathscr {U}}}[0,t]\) is arbitrary, it follows by (79) that
for all \(x\in {\mathbb {R}}^n\). Consider the initial value problem (85). By Lemma 12, note that \(\xi ^\alpha \in {{{{\mathscr {L}}}}_2}([0,t];{\mathbb {R}}^n)\) and \(u^\alpha \in {{\mathscr {U}}}[0,t]\). Note further that \(u_s^\alpha \in {\mathbb {R}}^m\) is pointwise optimal in \(H^\alpha (s,\xi _s^\alpha ,{\nabla }_x {\breve{W}}_t^\alpha (s,\xi _s^\alpha ))\) for any \(s\in [0,t]\). Hence, repeating the above argument and applying (79), (88), \( {\widehat{W}}_t^\alpha (x) \le J_t(x,u^\alpha ,\alpha ) = {\breve{W}}_t^\alpha (0,x) \le {\widehat{W}}_t^\alpha (x) \). Recalling that \(x\in {\mathbb {R}}^n\) is arbitrary completes the proof. \(\square \)
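The verification argument above can be mirrored numerically: integrate the component FVPs backward, then simulate the closed loop under the optimal state feedback and compare the accumulated cost with the quadratic value \({\breve{W}}_t^{\alpha }(0,x)\). The sketch below is a scalar stand-in: the data \(A\), \(B\), \(\kappa \), the constant state weight \(q\) (standing in for \(K + a^{-1}(\alpha _s)\)), the constant running offset \(\alpha \), and the terminal data \(P_t\), \(z\) (with \(\varPsi (x) = \frac{1}{2}P_t(x-z)^2\) assumed) are all illustrative assumptions, and the component equations are the scalar forms implied by the Hamiltonian (87).

```python
def verify_value(A=0.0, B=1.0, kappa=1.0, q=2.0, alpha=0.2,
                 P_T=1.0, z=1.0, t=1.0, x0=2.0, n=20000):
    ds = t / n
    # backward sweep of the assumed scalar component FVPs:
    #   -P' = 2*A*P - (B^2/kappa)*P^2 + q,         P(t) = P_T
    #   -Q' = (A - (B^2/kappa)*P)*Q,               Q(t) = -P_T*z
    #   -R' = -(B^2/(2*kappa))*Q^2 - alpha/2,      R(t) = P_T*z*z/2
    P = [0.0] * (n + 1); Q = [0.0] * (n + 1); R = [0.0] * (n + 1)
    P[n], Q[n], R[n] = P_T, -P_T * z, 0.5 * P_T * z * z
    for k in range(n, 0, -1):
        P[k-1] = P[k] + ds * (2*A*P[k] - (B*B/kappa)*P[k]**2 + q)
        Q[k-1] = Q[k] + ds * ((A - (B*B/kappa)*P[k]) * Q[k])
        R[k-1] = R[k] + ds * (-(B*B/(2*kappa))*Q[k]**2 - alpha/2)
    value = 0.5*P[0]*x0*x0 + Q[0]*x0 + R[0]   # quadratic value at (0, x0)
    # forward sweep: closed loop under the feedback u = -(B/kappa)*(P*x + Q)
    x, cost = x0, 0.0
    for k in range(n):
        u = -(B/kappa) * (P[k]*x + Q[k])
        cost += ds * (0.5*kappa*u*u + 0.5*q*x*x - alpha/2)
        x += ds * (A*x + B*u)
    cost += 0.5 * P_T * (x - z)**2   # assumed terminal cost Psi
    return value, cost

value, cost = verify_value()
print(value, cost)   # the two quantities agree to discretization accuracy
```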
Theorem 8
Given any \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), suppose there exists a solution \(P^*\in C([0,t];\varSigma ^n)\cap C^1((0,t);\varSigma ^n)\), \(Q^*, \xi ^*\in C([0,t];{\mathbb {R}}^n)\cap C^1((0,t);{\mathbb {R}}^n)\) of the two point boundary value problem (TPBVP)
for all \(s\in (0,t)\), subject to \(P_t^* = P_t\), \(Q_t^* = Q_t = -P_t\, z\), and \(\xi _0 = x\), where \(P_t\), z are as per (4). Then, the optimal inputs \(u^{M*}\in {{\mathscr {U}}}[0,t]\), \(\alpha ^{M*}\in {{\mathscr {A}}}^M[0,t]\) in (38), (77), and Lemma 10 are given by the state feedback characterizations
for all \(s\in [0,t]\), in which \({\hat{\alpha }}^{M*}\) is as per (19), (61).
Proof
Fix \(t\in {\mathbb {R}}_{>0}\), \(M\in {\mathbb {R}}_{\ge -\phi (0)}\). Suppose that a solution of TPBVP (89) exists as per the theorem statement, and denote it by \(s\mapsto (P_s^+, Q_s^+,\xi _s^+)\). Note in particular that \(P_t^+ = P_t\), \(Q_t^+ = -P_t\, z\), and \(\xi _0^+ = x\). Define the corresponding inputs \(u^+\in {{\mathscr {U}}}[0,t]\) and \(\alpha ^+\in {{\mathscr {A}}}^M[0,t]\) analogously to (90), i.e.
for all \(s\in [0,t]\). Observe that \(s\mapsto (P_s^+, Q_s^+)\) satisfy the FVPs (82), (83), as these are identical to the first two equations of (89). Augmenting these FVPs with (84) yields FVP (80), so that Theorem 7 may be applied. In particular, \(u^+ = u^{\alpha ^+}\) is the optimal control (85) in \({\widehat{W}}_t^{\alpha ^+}(x)\). Hence, by Lemma 8,
Hence, \({\overline{W}}_t^M(x) = J_t(x, u^+, \alpha ^+)\), and the uniqueness assertions of Theorem 6 and Lemma 10 imply that \(u^+ = u^{M*}\) and \(\alpha ^+ = \alpha ^{M*}\) as required. \(\square \)
Remark 5
Theorem 8 implies that the unique optimal inputs \(u^{M*}\) and \(\alpha ^{M*}\) of Theorem 2 and Lemma 10 can be computed via the state feedback characterizations (90), which depend on the solution of TPBVP (89). Consequently, as expected, a shooting method applied to TPBVP (89) will yield numerical approximations of these optimal inputs in specific examples.
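The shooting idea of Remark 5 can be sketched on an assumed toy TPBVP (not (89) itself): integrate backward from a guessed terminal state, and adjust the guess until the computed initial state matches the prescribed one. For the scalar system \(\ddot{\xi } = \xi \) with terminal data \(\xi _t = g\), \(\dot{\xi }_t = 0\), and prescribed initial state \(\xi _0 = x_0\), the matching guess is \(g = x_0/\cosh (t)\). Since this toy residual is scalar and monotone in \(g\), bisection suffices in place of a general-purpose simplex search.

```python
import math

def integrate_backward(g, t, n=2000):
    """Integrate xi'' = xi backward from the terminal data xi(t) = g, xi'(t) = 0
    (a stand-in for the final-value sweep of a TPBVP) and return xi(0)."""
    ds = t / n
    xi, v = g, 0.0
    for _ in range(n):
        # Euler step s -> s - ds for the first-order system xi' = v, v' = xi
        xi, v = xi - ds * v, v - ds * xi
    return xi

def shoot(x0, t, lo=-10.0, hi=10.0, iters=60):
    """Bisect on the terminal guess g until the backward sweep hits xi(0) = x0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if integrate_backward(mid, t) < x0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0, t = 1.0, 1.0
g = shoot(x0, t)
print(g, x0 / math.cosh(t))   # recovered vs analytic terminal state
```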
Remark 6
Preliminary work [11] by the authors has illustrated how the approach in this paper may be generalized to include linear time-varying dynamics, and convex constraints defined by the intersection of a finite collection of ellipses. The latter generalization involves an increase in the dimension of the range of the actions of the barrier penalty negotiating player, i.e. \(\alpha _s\in {\mathbb {R}}^p\), \(s\in [0,t]\), where p is the number of ellipses. Crucially, the dimension of the DRE (80), or equivalently the DREs (82), (83), (84), does not change, so that the dimension of the dynamics underlying the TPBVP involved does not increase beyond that presented here. The interested reader is referred to [11] for those preliminary details and examples.
6 Illustrative Example
In illustrating the application of Theorems 4, 6, and 8 to the approximate solution of a state constrained regulator problem (1), via the approximate problem (20) and the corresponding game (59), a simple example is considered. The linear dynamics (5) and a hyperbolic barrier (6) are specified by
while the running cost (3) and its approximation (22) are specified by \(t \doteq 4\), \(\kappa \doteq 1\), \(K\doteq 0.1\), and \(M\doteq 50\). Some straightforward calculations yield that
so that (7) holds. Note in particular that \(\phi (0) = 0\), \(\phi '(0) = \frac{1}{9}\), and \(K\ge -\phi '(0)\).
Using this data, the sup-of-quadratics representation for \(\varPhi ^M\) provided by Proposition 2 is illustrated in Fig. 2a. The trajectory defined by TPBVP (89) is computed using a standard shooting method, which integrates the state dynamics (5) and FVP (80) backward in time from the known terminal cost \({\hat{P}}_t^* = {\hat{P}}_t\in \varSigma ^3\) and a candidate terminal state \(\xi _t^* = \xi _t\in {\mathbb {R}}^2\). The squared error \(|x - \xi _0^*|^2\) in the resulting initial state is iteratively minimized by varying \(\xi _t\) via a Nelder–Mead simplex method. The state feedback \(a^{-1}\circ {\hat{\alpha }}^{M*}(|\xi _s^*|^2)\) appearing in (89) is evaluated using (19) and the explicit expressions above.
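The capped sup-of-quadratics representation of Fig. 2a can also be reproduced numerically. The sketch below assumes the hyperbolic barrier \(\phi (\rho ) = \rho /(b^2-\rho )\) with \(b^2 = 9\), which is consistent with the stated values \(\phi (0)=0\), \(\phi '(0)=\frac{1}{9}\) (the exact form of (6) is not reproduced here, so this choice is an assumption); then \(a(\beta ) = (b\sqrt{\beta } - 1)^2\) and \(a^{-1}(\alpha ) = (1+\sqrt{\alpha })^2/b^2\) in closed form. Below the threshold \({\hat{\rho }}(M)\), the supremum \(\sup _{\alpha \in [0,M]}\{a^{-1}(\alpha )\rho - \alpha \}\) recovers \(\phi (\rho )\); beyond it, the cap \(\alpha = M\) is active and the supremum is affine in \(\rho \), lying below the barrier.

```python
import math

b2, M = 9.0, 50.0           # b^2 and the cap M from the example data

def phi(rho):               # assumed hyperbolic barrier: phi(0) = 0, phi'(0) = 1/9
    return rho / (b2 - rho)

def a_inv(alpha):           # inverse of a(beta) = (sqrt(b2*beta) - 1)^2
    return ((1.0 + math.sqrt(alpha)) ** 2) / b2

def sup_of_quadratics(rho, n=100000):
    """Evaluate sup over alpha in [0, M] of a^{-1}(alpha)*rho - alpha on a grid."""
    return max(a_inv(k * M / n) * rho - k * M / n for k in range(n + 1))

# below rho_hat(M) (about 7.88 here) the sup recovers the barrier;
# above it the cap alpha = M is active and the sup is affine in rho
print(sup_of_quadratics(4.0), phi(4.0))
print(sup_of_quadratics(8.5), a_inv(M) * 8.5 - M)
```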
Case I Terminal cost (4) with \(z\doteq 0\) and \(P_t \doteq I_2\). A pair of optimal trajectories for this terminal cost case is illustrated in Fig. 3a, corresponding to the barrier cost being active or inactive, i.e. included or excluded, in the cost (2), (22). The circle included identifies the boundary of the state constraint imposed. An initial state of \(x \doteq [\begin{array}{ccc} \frac{4}{3}&\,&-\frac{4}{3} \end{array}]'\) for dynamics (5) is assumed. Figures 4a and 5a illustrate the optimal inputs \({\tilde{\alpha }}^*\) and \({\tilde{u}}^*\) of (90) respectively. By inspection of the unconstrained case, \({\tilde{\alpha }}^*\) attains its maximum value of \(M=50\) where the constraint is violated. However, as \({\tilde{\alpha }}^*\) does not influence the control in the unconstrained case, the trajectory is not adjusted accordingly. In contrast, in the active constraint case, \({\tilde{\alpha }}^*\) attains a maximum of approximately 35 as the trajectory approaches the constraint. By inspection, the state constraint is not violated, due to the intervention evident in the large actuated control \({\tilde{u}}^*\) that ensues.
Case II Terminal cost (4) with \(z\doteq [\, 1 \ \ 1 \,]'\) and \(P_t \doteq 10\, I_2\). An initial state of \(x \doteq [\begin{array}{ccc} 0&\,&\frac{5}{3} \end{array}]'\) for dynamics (5) is assumed. The terminal cost is adjusted so as to encourage the trajectory to move towards the non-zero terminal state \(\xi _t = z = [\, 1 \ \ 1 \,]'\), while respecting the state constraint. Figures 3b, 4b, and 5b illustrate respectively the corresponding state trajectories, the optimal input \({\tilde{\alpha }}^*\), and the optimal control \({\tilde{u}}^*\) obtained, by solving TPBVP (89), with the constraint inactive and active.
7 Conclusions
A sup-of-quadratics representation is developed for a class of convex barrier functions for encoding a simple state constraint in a linear regulator problem. Using this representation, an equivalent unconstrained two player linear quadratic game is constructed. By demonstrating equivalence of its upper and lower values, an approach to computation is presented, and illustrated by example.
References
Altarovici, A., Bokanowski, O., Zidani, H.: A general Hamilton-Jacobi framework for non-linear state-constrained control problems. ESAIM Control Optim. Calc. Var. 19, 337–357 (2013)
Anderson, B.D.O., Moore, J.B.: Linear Optimal Control. Prentice-Hall, Englewood Cliffs (1971)
Aubin-Frankowska, P.-C.: Linearly-constrained linear quadratic regulator from the viewpoint of kernel methods. SIAM J. Control Optim. 59(4), 2693–2716 (2021)
Bokanowski, O., Forcadel, N., Zidani, H.: Deterministic state-constrained optimal control problems without controllability assumptions. ESAIM Control Optim. Calc. Var. 17, 995–1015 (2011)
Bokanowski, O., Gammoudi, N., Zidani, H.: Optimistic planning algorithms for state-constrained optimal control problems. Comp. Math. Appl. 109, 158–179 (2022)
Burachik, R.S., Kaya, C.Y., Majeed, S.N.: A duality approach for solving control-constrained linear-quadratic optimal control problems. SIAM J. Control Optim. 52(3), 1423–1456 (2014)
Camilli, F., Falcone, M.: Approximation of optimal control problems with state constraints: estimates and applications. In: Nonsmooth Analysis and Geometric Methods in Deterministic Optimal Control, pp. 23–57. Springer (1996)
Capuzzo-Dolcetta, I., Lions, P.-L.: Hamilton-Jacobi equations with state constraints. Trans. Am. Math. Soc. 318(2), 643–683 (1990)
Cardaliaguet, P., Quincampoix, M., Saint-Pierre, P.: Numerical schemes for discontinuous value functions of optimal control. Set-Valued Anal. 8, 111–126 (2000)
DeHaan, D., Guay, M.: A new real-time perspective on non-linear model predictive control. J. Process Control 16, 615–624 (2006)
Dower, P.M., Cantoni, M.: State constrained optimal control of linear time-varying systems. In: Proc. \(56^{th}\) IEEE Conference on Decision & Control, pp. 1338–1343. Melbourne, Australia (2017)
Dower, P.M., McEneaney, W.M., Cantoni, M.: A game representation for state constrained linear regulator problems. In: Proc. \(55^{th}\) IEEE Conference on Decision & Control, pp. 1074–1079. Las Vegas NV, USA (2016)
Feller, C., Ebenbauer, C.: Relaxed logarithmic barrier function based model predictive control of linear systems. IEEE Trans. Autom. Control 62(3), 1223–1238 (2017)
Fleming, W., Rishel, R.: Deterministic and Stochastic Optimal Control. Applications of Mathematics. Springer, New York (1975)
Fleming, W.H., McEneaney, W.M.: A max-plus-based algorithm for a Hamilton-Jacobi-Bellman equation of nonlinear filtering. SIAM J. Control. Optim. 38(3), 683–710 (2000)
Garcia, C.E., Prett, D.M., Morari, M.: Model predictive control: theory and practice—a survey. Automatica 25(3), 335–348 (1989)
Goebel, R., Subbotin, M.: Continuous time linear quadratic regulator with control constraints via convex duality. IEEE Trans. Autom. Control 52(5), 886–892 (2007)
Green, M., Limebeer, D.J.N.: Linear Robust Control. Information and Systems Sciences. Prentice-Hall, New York (1995)
Ishii, H., Koike, S.: A new formulation of state constraint problems for first-order PDEs. SIAM J. Control Optim. 34(2), 554–571 (1996)
Kushner, H.J., Dupuis, P.: Numerical Methods for Stochastic Control Problems in Continuous Time. Springer, New York (2001)
McEneaney, W.M.: Max-Plus Methods for Nonlinear Control and Estimation. Birkhäuser, Boston (2006)
McEneaney, W.M., Dower, P.M.: The principle of least action and fundamental solutions of mass-spring and \(n\)-body two-point boundary value problems. SIAM J. Control Optim. 53(5), 2898–2933 (2015)
Rockafellar, R.T.: Conjugate Duality and Optimization. SIAM Regional Conf. Series in Applied Math., vol. 16 (1974)
Rockafellar, R.T., Goebel, R.: Linear-convex control and duality. Geom. Control Non-smooth Anal. 76, 280–299 (2008)
Rockafellar, R.T., Wets, R.J.: Variational Analysis. Springer, New York (1997)
Soner, H.M.: Optimal control with state-space constraint I. SIAM J. Control Optim. 24(3), 552–561 (1986)
Soner, H.M.: Optimal control with state-space constraint II. SIAM J. Control Optim. 24(3), 1110–1122 (1986)
Sontag, E.D.: Mathematical Control Theory. Springer, New York (1998)
Wright, S.E.: Consistency of primal-dual approximations for convex optimal control problems. SIAM J. Control Optim. 33(5), 1489–1509 (1995)
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix A: Some Useful Properties of the Barrier and Its Dual
Lemma 13
Given \(\phi \) satisfying (7), the function a of (8) is well-defined, differentiable, and strictly increasing, and has well-defined, differentiable, and strictly increasing derivative \(a'\) and inverse \(a^{-1}\), and well-defined, strictly positive second derivative \(a''\), satisfying
Proof
By inspection of (8), and the properties of \((\phi ')^{-1}\) provided by (7) (v), it is evident that a is well-defined on \({\mathbb {R}}_{\ge \phi '(0)}\). Note further that \( [(\phi ')^{-1}]'(\beta ) = 1/[\phi ''\circ (\phi ')^{-1}(\beta )] \) for all \(\beta \in {\mathbb {R}}_{\ge \phi '(0)}\), in which the denominator is strictly positive by (7) (i), (v). Hence, a is differentiable by inspection of (8), and the chain rule yields \( a'(\beta ) = (\phi ')^{-1}(\beta ) + \beta \, [(\phi ')^{-1}]'(\beta ) - [\phi \circ (\phi ')^{-1}(\beta )] \, [(\phi ')^{-1}]'(\beta ) = (\phi ')^{-1}(\beta ) \) for all \(\beta \in {\mathbb {R}}_{\ge \phi '(0)}\). Consequently, \(a' = (\phi ')^{-1}\) is well-defined and strictly increasing, with \(a':{\mathbb {R}}_{\ge \phi '(0)}\rightarrow [0,b^2)\), by (7) (v). That is, (92) holds. As \(a'(\phi '(0)) = 0\) (by substitution), the strict increase property of \(a'\) implies that \(a'(\beta )\in {\mathbb {R}}_{>0}\) for all \(\beta \in (\phi '(0),\infty )\). Hence, a is also strictly increasing, and so (8) implies that \(a(\beta ) \ge a(\phi '(0)) = -\phi (0)\) for all \(\beta \in [\phi '(0),\infty )\). By the same strict increase property of \(a'\), note further that there exists an \({\epsilon }>0\) and \(\beta _{\epsilon }>\phi '(0)\) such that \(a'(\beta ) \ge {\epsilon }>0\) for all \(\beta \ge \beta _{\epsilon }\). Consequently, \(\lim _{\beta \rightarrow \infty } a(\beta ) \ge \lim _{\beta \rightarrow \infty } [ (\beta - \beta _{\epsilon })\, {\epsilon }+ a(\beta _{\epsilon })] = \infty \). Hence, \(a(\beta )\in [-\phi (0),\infty )\) for all \(\beta \in [\phi '(0),\infty )\), which confirms the range of a specified in (8).
By inspection of (92) and (7) (i), \(a'\) is differentiable with derivative \(a''\) given by \( a''(\beta ) = [(\phi ')^{-1}]'(\beta ) = 1/[\phi ''\circ (\phi ')^{-1}(\beta )] \) for all \(\beta \in {\mathbb {R}}_{\ge \phi '(0)}\), which (as indicated previously) is strictly positive by (7) (i), (v). Hence, (93) holds.
As a is strictly increasing, the existence of its strictly increasing inverse \(a^{-1}\), with domain and range specified by (94), follows immediately from (8). The chain rule and (92) subsequently imply that \(a^{-1}\) is also differentiable, with derivative (95). The range of this derivative follows from (7) (v) and (92). The two limits in (96) follow directly from (95), with
These limits, along with the fact that \((a^{-1})'\) is decreasing, confirm the range in (95).\(\square \)
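The proof above manipulates a, \(a'\), \(a''\) purely through the Legendre-type identity (8). As a concrete sanity check, the following sketch instantiates a hypothetical barrier \(\phi(s) = -\log(1 - s/b^2)\) with \(b = 2\) (an illustrative choice with properties of the type required by (7), not the paper's general \(\phi\)) and verifies numerically that \(a' = (\phi')^{-1}\) and \(a'' > 0\), in line with the assertions of (92), (93).

```python
import math

b = 2.0                      # illustrative constraint bound (assumption)

# Hypothetical barrier on [0, b^2): phi(s) = -log(1 - s/b^2),
# so phi'(s) = 1/(b^2 - s), phi'(0) = 1/b^2, and
# (phi')^{-1}(beta) = b^2 - 1/beta for beta >= 1/b^2.
def phi(s):
    return -math.log(1.0 - s / b**2)

def phi_prime_inv(beta):
    return b**2 - 1.0 / beta

def a(beta):
    # a(beta) = beta*(phi')^{-1}(beta) - phi((phi')^{-1}(beta)), as in (8)
    s = phi_prime_inv(beta)
    return beta * s - phi(s)

# Check a'(beta) = (phi')^{-1}(beta) and a''(beta) > 0 by central
# finite differences, on a grid of beta > phi'(0) = 1/b^2.
h = 1e-5
for beta in [0.3, 0.5, 1.0, 2.0, 5.0]:
    fd1 = (a(beta + h) - a(beta - h)) / (2 * h)
    fd2 = (a(beta + h) - 2 * a(beta) + a(beta - h)) / h**2
    assert abs(fd1 - phi_prime_inv(beta)) < 1e-5   # a' = (phi')^{-1}
    assert fd2 > 0.0                               # a'' > 0

print("a' = (phi')^{-1} and a'' > 0 verified on grid")
```

The sketch also exhibits the normalization \(a(\phi'(0)) = -\phi(0)\) used in the proof, since \(a(1/b^2) = 0 = -\phi(0)\) for this choice of \(\phi\).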
The following two lemmas are stated without proof.
Lemma 14
Given \(\rho \in {\mathbb {R}}_{\ge 0}\), and \(a^{-1}\) as per (94), the map \(\gamma _\rho :{\mathbb {R}}_{\ge -\phi (0)}\rightarrow {\mathbb {R}}\) defined by
is twice differentiable with \(\gamma _\rho ':{\mathbb {R}}_{>-\phi (0)} \rightarrow (\rho /b^2-1,\infty )\), \(\gamma _\rho '':{\mathbb {R}}_{>-\phi (0)} \rightarrow {\mathbb {R}}_{<0}\) given by
with \({\hat{\rho }}\) as per (15). With \(\rho \ge b^2\), \(\alpha \mapsto \gamma _\rho (\alpha )\) is strictly increasing with \(\lim _{\alpha \rightarrow \infty }\gamma _\rho (\alpha ) = \infty \).
Lemma 15
Given \(M\in {\mathbb {R}}_{\ge -\phi (0)}\),
in which a, \(a^{-1}\), \({\hat{\rho }}(M)\) are given by (8), (94), (15), and \(\lambda _+^M:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is defined by
Appendix B: Proof of Lemma 4
Proof
[Lemma 4] (i) Fix \(M\in {\mathbb {R}}_{\ge -\phi (0)}\). By the monotonicity of \(a^{-1}\) (see Lemma 13 and (94)), note that \(a^{-1}(M) \ge a^{-1}(-\phi (0)) = \phi '(0)\). Hence, recalling (15) and (10),
with \(\varGamma _+^M(\rho ) \doteq \sup _{\beta \in [\phi '(0), a^{-1}(M)]} \{ \beta \, \rho - a(\beta ) \}\). The supremum in (104) is achieved at
Following some further rudimentary calculations,
with the supremum achieved at the \(\beta = {\hat{\beta }}_+^{M*}(\rho )\) specified. The pointwise maximum (104) may be evaluated via (106), and the inequalities (99), (100) of Lemma 15. Indeed, inspection of (104), (106), (99), (100) immediately yields (16). The optimizer (17) that achieves the supremum in (15) follows by matching the corresponding cases in (105), (106).
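Part (i) hinges on evaluating \(\varGamma _+^M(\rho ) = \sup _{\beta \in [\phi '(0), a^{-1}(M)]} \{ \beta \rho - a(\beta ) \}\): since a is convex with \(a' = (\phi ')^{-1}\) (Lemma 13), the concave objective is maximized by clamping the stationary point \(\beta = \phi '(\rho )\) to the interval. The sketch below checks this against a brute-force grid search, again using the hypothetical barrier \(\phi(s) = -\log(1 - s/b^2)\) (an illustrative assumption, not the paper's general \(\phi\)), with \(a^{-1}\) computed by bisection.

```python
import math

b = 2.0                                           # illustrative bound

def phi(s):  return -math.log(1.0 - s / b**2)     # hypothetical barrier
def dphi(s): return 1.0 / (b**2 - s)              # phi'
def a(beta):                                      # convex dual, as in (8)
    s = b**2 - 1.0 / beta                         # (phi')^{-1}(beta)
    return beta * s - phi(s)

def a_inv(M, hi=1e8, tol=1e-12):
    """Invert the strictly increasing a by bisection (cf. (94))."""
    lo = dphi(0.0)                                # domain starts at phi'(0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if a(mid) < M else (lo, mid)
    return 0.5 * (lo + hi)

def Gamma_plus(rho, M):
    """Gamma_+^M(rho): maximize the concave beta*rho - a(beta) over
    [phi'(0), a^{-1}(M)] by clamping the stationary point phi'(rho)."""
    bhi = a_inv(M)
    if rho >= b**2:               # a'(beta) < b^2 <= rho: increasing in beta
        beta_star = bhi
    else:
        beta_star = min(max(dphi(rho), dphi(0.0)), bhi)
    return beta_star * rho - a(beta_star)

# Compare the clamped closed form against a brute-force grid search.
M, rho = 3.0, 1.5
bhi = a_inv(M)
grid = [dphi(0.0) + k * (bhi - dphi(0.0)) / 20000 for k in range(20001)]
brute = max(beta * rho - a(beta) for beta in grid)
assert abs(Gamma_plus(rho, M) - brute) < 1e-6
print("clamped maximizer matches grid search")
```

When \(\rho\) is large enough that the clamp is active at the upper end, the value reduces to \(a^{-1}(M)\,\rho - M\), mirroring the case split in (105), (106).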
(ii) In view of \(\varPhi ^M\), \({\hat{\rho }}(M)\) of (16), (15), define
for \(M\in {\mathbb {R}}_{\ge -\phi (0)}\). With \({\bar{\beta }}\doteq a^{-1}(M)\), note that \(L = \phi \circ (\phi ')^{-1}({\bar{\beta }})\) and \(U = {\bar{\beta }} \, (\phi ')^{-1}({\bar{\beta }}) - M\), so that \(U - L = [{\bar{\beta }} \, (\phi ')^{-1}({\bar{\beta }}) - \phi \circ (\phi ')^{-1}({\bar{\beta }})] - M = a({\bar{\beta }}) - M = a\circ a^{-1}(M) - M = 0\), via (8). That is, \(\varPhi ^M\) is continuous at \({\hat{\rho }}(M)\), and \(\varPhi ^M\in C({\mathbb {R}}_{\ge 0};{\mathbb {R}})\). By inspection of (16),
and \(\lim _{\rho \uparrow {\hat{\rho }}(M)} (\varPhi ^M)'(\rho ) = \phi '\circ {\hat{\rho }}(M) = a^{-1}(M) = \lim _{\rho \downarrow {\hat{\rho }}(M)} (\varPhi ^M)'(\rho )\) via (15). Hence, \(\varPhi ^M\in C({\mathbb {R}}_{\ge 0};{\mathbb {R}})\cap C^1({\mathbb {R}}_{>0};{\mathbb {R}})\). As \((\varPhi ^M)'\) is non-decreasing on \({\mathbb {R}}_{>0}\), and infinite elsewhere, \(\varPhi ^M:{\mathbb {R}}\rightarrow {\overline{{\mathbb {R}}}}^+\) is (lower) closed convex on \({\mathbb {R}}\), see for example [23, (3.8), pp. 15,17].
(iii) Follows by inspection of (6), (10), (15), via Lemma 3.
(iv) The following claim is first demonstrated.
Claim: Given \(M\in {\mathbb {R}}_{\ge -\phi (0)}\), there exists \({\varTheta }^M:{\mathbb {R}}\rightarrow {\mathbb {R}}_{\ge -\phi (0)}^+\) such that
for all \(\rho ,\beta \in {\mathbb {R}}\), with \(\phi \), a as per (7), (8).
Proof of Claim
Convexity assertion (ii) and [23, Theorem 5, p. 16] imply that there exists a one-to-one pairing between \(\varPhi ^M\) and its Fenchel transform \({\varTheta }^M:{\mathbb {R}}\rightarrow {\overline{{\mathbb {R}}}}^+\) as per (107) and the left-hand equation in (108). For the right-hand equation in (108), note by (16) that the supremum in the left-hand equation in (108) is never achieved at \(\rho \in {\mathbb {R}}_{<0}\). Hence,
for all \(\beta \in {\mathbb {R}}\). Some rudimentary calculations subsequently yield
By inspection of (110), and recalling (103),
Hence, the pointwise maximum in (109) may be evaluated by (112), (113) and inequalities (101), (102) from Lemma 15 in Appendix A, which yields the right-hand equation in (108). \(\square \)
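The claim rests on the Fenchel duality pairing invoked from [23, Theorem 5]: a closed convex function coincides with its biconjugate, so \(\varPhi ^M\) and \({\varTheta }^M\) determine one another. Since the explicit form of \(\varPhi ^M\) in (16) depends on the general barrier, the sketch below illustrates the mechanism generically, with a placeholder convex f (an assumption for illustration only): applying a discrete Legendre transform twice recovers f on the grid, up to discretization error.

```python
# Numerical illustration of the Fenchel biconjugation underlying
# (107)-(108): for closed convex f, f*(s) = sup_x { s*x - f(x) }
# satisfies f** = f. Here f is a generic convex placeholder, not Phi^M.
import math

def f(x):
    return math.exp(x) - x                      # convex on R (placeholder)

xs = [i * 0.01 - 2.0 for i in range(401)]       # primal grid on [-2, 2]

def conjugate(g, pts):
    """Discrete Legendre transform: g*(s) = max over grid of s*x - g(x)."""
    return lambda s: max(s * x - g(x) for x in pts)

fstar = conjugate(f, xs)

# Dual grid covering the slope range f'([-2, 2]) = [e^-2 - 1, e^2 - 1].
n = int((math.exp(2) - math.exp(-2)) / 0.01)
ss = [(math.exp(-2) - 1) + i * 0.01 for i in range(n)]
fss = conjugate(fstar, ss)                      # biconjugate f**

# f** recovers f at interior grid points (up to discretization error).
for x in [-1.0, 0.0, 0.5, 1.0]:
    assert abs(fss(x) - f(x)) < 1e-2
print("f** = f verified numerically")
```

The same one-to-one pairing is what permits passing between \(\varPhi ^M\) and \({\varTheta }^M\) in (108) without loss of information.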
Returning to the proof of (iv), by Lemma 13, there exists an \(M_1\in {\mathbb {R}}_{\ge -\phi (0)}\) such that \(a^{-1}(M)\in {\mathbb {R}}_{>0}\) for all \(M\in {\mathbb {R}}_{\ge M_1}\). Meanwhile, applying the above claim, in particular (108), \({\varTheta }^M(0) = \sup _{\rho \in {\mathbb {R}}} \{ -\varPhi ^M(\rho ) \}\) for any \(M\in {\mathbb {R}}_{\ge -\phi (0)}\). Hence, recalling (108),
so that the assertion is proved for any \(c\in {\mathbb {R}}\) satisfying \(c<{\hat{c}}\), as required.
(v) Monotonicity of \({\hat{\rho }}\) follows by applying the chain rule and Lemma 13 to (15). That it converges to \(b^2\) in the limit as \(M\rightarrow \infty \) follows by a simple contradiction argument, whose details are omitted. \(\square \)
Appendix C: Summary of Significant Notations
Significant notations, and references to their introduction, are summarized in the table that follows.
| Notation | Description | Reference |
| --- | --- | --- |
| **Exact regulator problem** | | |
| \({\overline{W}}_t\) | Value function | (1) |
| \({\bar{J}}_t\), \({\bar{I}}_t\) | Cost function and its trajectory dependence | |
| \(I_t^\kappa \), \(\varPsi \) | Input and terminal costs | |
| K, \(\kappa \), \(P_t\), z | Cost function parameters / data | |
| A, B | Linear system matrices | (5) |
| \(\varPhi \), \(\varTheta \) | Extended barrier function and its convex dual | |
| \(\phi \), a | Real valued barrier function and its convex dual | |
| \(u^*\) | Optimal open loop control | (38) |
| \({{\mathscr {U}}}[0,t]\), \({{\mathscr {U}}}_x^{\epsilon }[0,t]\) | Admissible and near-optimal controls | |
| **Approximate regulator problem** | | |
| \({\overline{W}}_t^M\) | Value function | (20) |
| \({\bar{J}}_t^M\), \({\bar{I}}_t^M\) | Cost function and its trajectory dependence | |
| \(\varPhi ^M\) | Sup-of-quadratics barrier function approximation | (15) |
| \(u^{M*}\) | Optimal open loop control | (38) |
| \({{\mathscr {U}}}[0,t]\), \({{\mathscr {U}}}_x^{M,{\epsilon }}[0,t]\) | Admissible and near-optimal controls | |
| **Exact and approximate game problems** | | |
| \(W_t^M\) | Value function | (59) |
| \(J_t\), \(I_t\) | Cost function and its trajectory dependence | |
| \(\nu \) | Running cost | (53) |
| \({\hat{\alpha }}^*\), \(\alpha _s^*\) | Auxiliary player action (exact) | |
| \({\hat{\alpha }}^{M*}\), \(\alpha _s^{M*}\) | Auxiliary player action (approx.) | |
| \({{\mathscr {A}}}[0,t]\) | Auxiliary player actions | (59) |
| \(\widehat{W}_t^\alpha \) | Auxiliary value function | (79) |
| \(P_s^\alpha \), \(Q_s^\alpha \), \(R_s^\alpha \) | Quadratic form for \(\widehat{W}_t^\alpha \) | |
Cite this article
Dower, P.M., McEneaney, W.M. & Cantoni, M. A Game Representation for a Finite Horizon State Constrained Continuous Time Linear Regulator Problem. Appl Math Optim 88, 19 (2023). https://doi.org/10.1007/s00245-023-09972-6