A Game Representation for a Finite Horizon State Constrained Continuous Time Linear Regulator Problem

A supremum-of-quadratics representation for a class of extended real valued barrier functions is developed and applied in the context of solving a continuous time linear regulator problem subject to a single state constraint of bounded norm. It is shown that this very simple state constrained regulator problem can be equivalently formulated as an unconstrained two-player game. By demonstrating equivalence of the upper and lower values, and exploiting existence and uniqueness of the optimal actions for both players, state feedback characterizations for the corresponding optimal policies for both players are developed. These characterizations are illustrated by a simple example.


Introduction
Finite horizon continuous time linear quadratic regulator (LQR) problems and their solution can be considered classical in systems theory, see for example [2,16,18]. The value function that attends their solution as an optimal control problem is quadratic in the initial system state, with its Hessian determined by the unique solution of a differential Riccati equation (DRE) subject to a terminal condition set by the terminal cost. Standard tools exist for the efficient solution of DREs, and thus for such LQR problems.
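Although no numerical scheme is prescribed here, the backward DRE integration referred to can be sketched as follows. All data below (A, B, Q, R, P_T and the scalar test case) are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

def solve_dre(A, B, Q, R, P_T, T):
    """Integrate the differential Riccati equation backwards from t = T.

    In the backward variable s = T - t,
        dP/ds = A'P + P A - P B R^{-1} B' P + Q,   P(s=0) = P_T.
    """
    n = A.shape[0]
    Rinv = np.linalg.inv(R)

    def rhs(s, p):
        P = p.reshape(n, n)
        dP = A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q
        return dP.ravel()

    sol = solve_ivp(rhs, (0.0, T), P_T.ravel(),
                    dense_output=True, rtol=1e-8, atol=1e-10)
    # Return P as a callable in forward time t: P(t) = P_backward(T - t).
    return lambda t: sol.sol(T - t).reshape(n, n)

# Scalar test: dx = u dt, running cost x^2 + u^2, zero terminal cost.
# The exact solution is P(t) = tanh(T - t), so P(0) approaches 1 for large T.
P = solve_dre(np.array([[0.0]]), np.array([[1.0]]),
              np.array([[1.0]]), np.array([[1.0]]),
              P_T=np.array([[0.0]]), T=10.0)
print(float(P(0.0)[0, 0]))
```

The scalar case admits the closed-form solution P(t) = tanh(T − t), which makes the sketch easy to validate.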
The inclusion of nonlinear dynamics, non-quadratic costs, and/or a constraint into an LQR problem introduces a non-quadratic term into the associated Hamiltonian. This non-quadratic term prevents the Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE) from simplifying to a DRE, so that computationally more expensive numerical methods are usually required [20,21]. However, if the non-quadratic term introduced is semiconvex [15,21], it can be equated with a supremum over a family of quadratic functions, which can be exploited in the subsequent analysis [22]. An example of this type of sup-of-quadratics representation is illustrated in Fig. 1.
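The sup-of-quadratics idea of Fig. 1 can be checked numerically. The sketch below uses an assumed semiconvex example f(x) = −cos(x) with semiconvexity constant K = 1 (neither appears in the paper): since g(x) = f(x) + (K/2)x² is convex, tangent lines to g translate into quadratics with Hessian −K that touch f from below, and their supremum recovers f.

```python
import numpy as np

# f(x) = -cos(x) is semiconvex: f''(x) = cos(x) >= -K with K = 1, so
# g(x) = f(x) + (K/2) x^2 is convex, and
#     f(x) = sup_a [ f(a) + (f'(a) + K a)(x - a) - (K/2)(x^2 - a^2) ].
K = 1.0
f  = lambda x: -np.cos(x)
df = lambda x: np.sin(x)

def quad(a, x):
    """Quadratic minorant of f generated by the tangent to g at a."""
    return f(a) + (df(a) + K * a) * (x - a) - 0.5 * K * (x**2 - a**2)

xs = np.linspace(-3.0, 3.0, 601)
alphas = np.linspace(-4.0, 4.0, 2001)
sup_q = np.max(np.stack([quad(a, xs) for a in alphas]), axis=0)
err = np.max(np.abs(sup_q - f(xs)))
print(err)  # small: the supremum of quadratics recovers f
```

Each quadratic lies below f everywhere (a tangent line lies below the convex g), with equality at its generating point, so the pointwise supremum reproduces f up to grid resolution.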
The purpose of this paper is to explore the inclusion of a semiconvex extended real valued barrier function in the cost function for an LQR problem via its sup-of-quadratics representation, with the proviso that convexity of the cost function (with respect to the control input) is retained. The intention is to implement an elementary state constraint of bounded norm, containing the origin, using computations that involve a DRE. The development reported is a compendium of earlier efforts [11,12] by the authors, augmented with detailed proofs to support its validity. The focus is deliberately on the sup-of-quadratics representation for the barrier function involved, and the implications its use has in the formulation and solution of the optimal control problems of interest.

Approach and Contribution
A typical extended real valued barrier function, and its finite approximation, is illustrated in Fig. 2. This type of barrier function is intended for the implementation of an elementary state constraint of bounded norm, containing the origin. Under reasonable conditions, either an exact or an approximate sup-of-quadratics representation can be constructed for such a barrier function, analogously to Fig. 1. Adding either representation to the cost function for an otherwise standard LQR problem, with the proviso that convexity must be retained, yields a non-quadratic value function for an exact or an approximate state constrained optimal control problem (respectively). Exploiting the aforementioned convexity, and other properties of the cost functions, yields convergence of the approximate problem to the exact problem, in terms of their value functions and their corresponding optimal controls and state trajectories. Satisfaction of the state constraint for the exact problem also follows.

The exact and approximate sup-of-quadratics representations can be further interpreted, via measurable selection, as encoding the actions of an auxiliary player in corresponding games. The value of the approximate game is shown to exist, and to coincide with that of the approximate regulator problem. The earlier convergence results allow corresponding conclusions in the exact case. Further consideration of the lower value of the approximating game leads to state feedback characterizations for the optimal policies of both players. These policies are shown to depend explicitly on the solution of the state dynamics driven by the optimal control, and on the solution of a DRE facilitated by the sup-of-quadratics representation. As the state dynamics and the DRE evolve their respective solutions forwards and backwards in time respectively, application of these policies is admitted via solution of a two point boundary value problem.

Context
The value function of a general state constrained control problem, including for the simple case considered in this paper, can be characterized as the viscosity solution of the aforementioned non-stationary HJB PDE. Uniqueness of this solution can be guaranteed in the presence of suitable controllability assumptions on the constraint set, for example, via inward pointing [8,26,27] or related boundary conditions [19]. Alternatively, the value function can be characterized uniquely as the viscosity solution of a variational inequality, in very general settings, using viability theory and non-smooth analysis [1,4]. Moreover, consistent approximations generated via temporal and spatial discretization provide the foundation for finite difference methods for numerical computation, see for example [7,9]. In the context of the curse-of-dimensionality, recent advances in optimization based approaches exploiting (for example) reproducing kernels [3] and optimistic planning [5] have provided very promising computational improvements over those earlier grid-based methods.
In the context of duality and barrier function approaches to state constrained optimal control, immediately relevant prior works include [6,10,13,17,24,29]. Specifically, [29] develops an approximation scheme for general convex costs, and studies consistency of this approximation, while [10] considers continuous time constrained control in a model predictive control setting, subject to an inward pointing condition on the feedback policy. One of many related investigations exploiting barrier functions in the implementation of constraints is detailed in [13], via a discrete time setting. Duality and saddle point properties are explored in a more general setting in [6,17], albeit in the case of control constraints. The tools of convex analysis are employed in the general treatment of a closely related class of continuous time problems in [24] that addresses both control and state constraints. A key contribution of the current work relative to [24] concerns the sup-of-quadratics representation developed, and the exploitation of the game formulation that follows from it via measurable selection.

Organization and Notation
The regulator problem of interest is posed in Sect. 2, along with the barrier functions involved. This is followed in Sect. 3 by development of the exact and approximate sup-of-quadratics representations for these barrier functions, and the introduction of the approximate regulator problem. Existence and uniqueness of optimal trajectories for the exact and approximate regulator problems are considered in Sect. 4, along with their behaviour relative to the state constraint of interest. Exact and approximate two player games are formulated in Sect. 5, and their respective equivalences with the exact and approximate regulator problems are demonstrated. A further equivalence of the upper and lower values is demonstrated in each case. This in turn motivates characterization of the optimal policies involved via solution of a two-point boundary value problem defined in terms of a DRE. This characterization is illustrated by example in Sect. 6. The paper concludes with some minor summarizing remarks in Sect. 7. Some additional technicalities and proofs appear in Appendices A and B, while a summary of significant notations is included in Appendix C.
Throughout, R, N, Z denote the reals, natural numbers, and integers, while R ≥0 , R >0 , and R denote the non-negative, positive, and extended reals respectively, with the convention R >a .= (a, ∞) for any a ∈ R. An n-dimensional Euclidean space is denoted by R n . The space of n × m matrices on R is denoted by R n×m . The set of positive semidefinite symmetric matrices in R n×n is denoted by Σ n . The Euclidean and induced matrix norms are denoted by |·| and ‖·‖ respectively. Otherwise, the norm on a Banach space U is denoted by ‖·‖ U , or simply ‖·‖ if the space is contextually clear. Open and closed balls of radius r ∈ R ≥0 in U are denoted respectively by B U (0; r ) and B U [0; r ]. Weak convergence of a sequence {u k } k∈N ⊂ U to some ū ∈ U is denoted by u k ⇀ ū (as k → ∞). The product space U × · · · × U of k ∈ N instances of U is denoted by U k . The space of bounded linear operators between Banach spaces U and V is denoted by L(U ; V ). The spaces of continuous and k-times continuously differentiable functions mapping U to V are denoted by C(U ; V ) and C k (U ; V ) respectively. Differentiability at a closed left or right end-point of an interval is interpreted throughout to mean right- or left-differentiability respectively. The space of (Lebesgue) square integrable mappings from [0, t] to R m is denoted by L 2 ([0, t]; R m ).

State Constrained Linear Regulator Problem
Interest is restricted to optimal control problems defined on a finite time horizon t ∈ R ≥0 , with respect to linear dynamics and a convex barrier state constraint. The value function W t : R n → R + involved is defined in (1), in which U [0, t] is the space of open loop controls, and J t is a cost function defined with respect to the integrated running costs Ī t and I κ t , κ ∈ R >0 , and a terminal cost Ψ , see (2), (3), (4), for all x ∈ R n , u ∈ U [0, t], in which K ∈ R and P t ∈ Σ n are a priori fixed, and Φ is an extended real valued barrier function to be specified below. The map s → ξ s ∈ R n , s ∈ [0, t], describes the unique trajectory of a linear dynamical system corresponding to an initial state x ∈ R n and input u ∈ U [0, t], given explicitly via the map χ of (5) for all s ∈ [0, t], given A ∈ R n×n , B ∈ R n×m , B ≠ 0. The barrier function Φ : R → R + is defined in (6) for fixed b ∈ R >0 , in which φ : [0, b 2 ) → R satisfies the properties (i)-(v) listed in (7); in particular, its derivative exists and is strictly increasing.
Note in particular that (iii)-(v) follow as a consequence of (i)-(ii), see for example [25, Theorem 2.13, p. 46]. As a consequence, φ has a well-defined convex dual a, defined for all β ∈ R ≥φ (0) , that satisfies a variety of properties, including invertibility, see Lemma 13 in Appendix A. It defines a useful change of coordinates in the sup-of-quadratics representation that is developed for the barrier Φ in Sect. 3. Two preliminary lemmas concerning (1), (5) are included prior to commencing this development. Their proofs are standard and are omitted for brevity. (5) for all k ∈ N, the following properties hold:

Lemma 1 Given any t
In view of (3), (5), (6), (7), it is emphasized that attention is restricted to the very simple case of a single state constraint of bounded norm, i.e. |ξ s | ≤ b for all s ∈ [0, t]. While this case is seemingly trivial, it is sufficient to demonstrate the details of how the barrier function implementing this constraint can be relaxed in the optimal control problem to yield an unconstrained game. The treatment of more general convex constraints in [11], defined with respect to the intersection of a family of ellipses, is founded on this development.
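As a concrete illustration of such a barrier (the paper specifies a hyperbolic barrier only in Sect. 6; the particular choice φ(ρ) = ρ/(b² − ρ) and the value b = 2 below are assumptions made for illustration), the constraint |x| ≤ b can be encoded as follows.

```python
import numpy as np

# Illustrative hyperbolic barrier (an assumed concrete choice; the paper only
# requires phi : [0, b^2) -> R convex and increasing, blowing up at rho = b^2):
#     phi(rho) = rho / (b^2 - rho),   Phi(x) = phi(|x|^2) if |x| < b, +inf otherwise.
b = 2.0

def phi(rho):
    return rho / (b**2 - rho)

def Phi(x):
    rho = np.dot(x, x)
    return phi(rho) if rho < b**2 else np.inf

# Discrete checks: phi is strictly increasing and convex on [0, b^2).
rhos = np.linspace(0.0, 0.99 * b**2, 1000)
vals = phi(rhos)
assert np.all(np.diff(vals) > 0)           # strictly increasing
assert np.all(np.diff(vals, 2) > -1e-12)   # convex (second differences)

print(Phi(np.array([0.0, 0.0])))   # zero at the origin
print(Phi(np.array([1.99, 0.0])))  # large near the boundary |x| = b
print(Phi(np.array([2.5, 0.0])))   # infinite outside the constraint set
```

The running cost thus penalizes proximity to the constraint boundary smoothly, while assigning infinite cost to any state outside the closed ball of radius b.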

Barrier Representations and an Approximate Regulator Problem
Exact and approximate sup-of-quadratics representations for closed convex barrier functions of the form of Φ of (6) can be established via convex duality [23,25]. These representations are fundamental to the development of a convergent approximation for the state constrained regulator problem (1), and its representation via unconstrained linear quadratic games. The development of these representations and the approximate regulator problem follow below.

Exact Sup-of-Quadratics Representation for Convex Barriers
Convex duality and the asserted properties (7) of barrier (6) yield the following, via some rudimentary calculations.

Lemma 3
The barrier function Φ : R → R + of (6) is closed and convex, and there exists a closed and convex function Θ : R → R such that the duality relations (10) hold for all ρ, β ∈ R, with a as per (8). Furthermore, the optimizers β * : R → R and ρ * : R → R are characterized for all β, ρ ∈ R.
A change of coordinates via (8) yields the sup-of-quadratics representation, the details of which are analogous to the later proof of Proposition 2.

Proposition 1 The barrier function
and defined by (6), has the exact sup-of-quadratics representation (12) for all x ∈ R n , in which a −1 is defined via (8). Furthermore, the optimizer α * (| · | 2 ) is characterized for all x ∈ R n .

Approximate Sup-of-Quadratics Representation for Convex Barriers
An approximate sup-of-quadratics representation can be obtained by restricting the interval over which the supremum is evaluated in the first equation in (10). To this end, define Φ M : R → R + by (15) for all M ∈ R ≥−φ(0) , ρ ∈ R, with φ , a, Θ as per (7), (8), (10), and with the range of Φ M to be verified. The proof of the following appears in Appendix B.

Lemma 4
With M ∈ R ≥−φ(0) , Φ M : R → R + of (15) satisfies the stated properties for all ρ ∈ R, with the maximizer β = β M * : R → R given accordingly, and it is closed and strictly convex. As per the exact case of Proposition 1, application of this lemma along with a change of coordinates defined by (8) admits an approximate sup-of-quadratics representation, see (18), (19) and Proposition 2.
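The effect of truncating the supremum can be illustrated numerically in a generic Fenchel-duality setting. This is a sketch of the truncation mechanism only, not the paper's formula (15); the hyperbolic barrier h and the discretization grids below are assumptions. Restricting the dual variable to [0, M] yields finite values that increase towards the barrier as M grows.

```python
import numpy as np

b = 1.0
h = lambda r: r / (b**2 - r)   # hyperbolic barrier on [0, b^2)

# Numerical Fenchel conjugate h*(beta) = sup_r [ r*beta - h(r) ] on a fine grid.
rs = np.linspace(0.0, b**2 * (1 - 1e-6), 20_000)
hvals = h(rs)
betas = np.linspace(0.0, 100.0, 2_001)
h_conj = np.array([np.max(rs * bb - hvals) for bb in betas])

def h_M(r, M):
    """Truncated biconjugate: sup over beta in [0, M] of r*beta - h*(beta)."""
    keep = betas <= M
    return np.max(r * betas[keep] - h_conj[keep])

# At r0 = 0.9 b^2, h(r0) = 9; the truncated values increase towards it.
r0 = 0.9 * b**2
print([round(h_M(r0, M), 3) for M in (2.0, 10.0, 100.0)], h(r0))
```

Since the biconjugate of a closed convex function recovers the function itself, enlarging the truncation interval monotonically improves the approximation, mirroring the monotone convergence used in Theorem 1.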

Approximate Regulator Problem and Its Convergence to the Exact Problem
The sup-of-quadratics representation (12) for the convex barrier function Φ of (6), and its convergent approximation (18), can be used to formulate an approximate regulator problem for (1).
[Right-hand equality in (24)] In view of Corollary 1 and (22), the integrand involved is monotone in M by Proposition 2 (ii), and continuous (and hence measurable). Applying the monotone convergence theorem then yields the asserted equality, in which the final step follows by Lemma 4 (iii). Hence, recalling (21), (23), the result follows, in which it is noted that x ∈ R n and u ∈ U [0, t] are arbitrary.

Optimal Trajectories and Constraint Satisfaction
Existence and uniqueness of the optimal trajectories in (1), (20) is demonstrated via analysis of the attendant cost functions (2), (22). In particular, these cost functions are shown to be proper, lower semicontinuous, strictly convex, and coercive. These properties are demonstrated to be sufficient for the required existence and uniqueness of the optimal controls involved, as summarised in Theorem 2. The relevant behaviour of the corresponding optimal trajectories, with respect to the state constraint, is concluded in Theorem 3.

Existence and Uniqueness of the Optimal Controls
In order to demonstrate that the cost functions J t , J M t of (2), (22) are proper, convex, and coercive for fixed x ∈ R n , some preliminary definitions and a lemma are required, in which χ is as per (5).

Lemma 6 Given any t
(i)J M t (x, ·) andJ t (x, ·) are respectively continuous and lower semicontinuous for all x ∈ R n ; (ii) Both are strictly convex and coercive for all x ∈ R n ; and (iii)J M t (x, ·) andJ t (y, ·) are proper for all x ∈ dom W M t = R n and all y ∈ dom W t .

Theorem 2 Given any t
Moreover, for x = y ∈ dom W t , these optimal controls converge strongly in the limit as M → ∞.

Proof For the first assertion, i.e. (38), as the existence and uniqueness arguments for the two optimal controls are analogous, only the first is included. The proof involves two main steps, prefaced as follows: (I) confirm that the left-hand argmin in (38) is nonempty, by constructing an element of a sequence of decreasing level sets of J t (x, ·), using the available properness, lower semicontinuity, and coercivity properties of the latter; and (II) verify that this argmin is a singleton, via convexity. The details of the proof follow.
(II) Suppose there exists a ũ ∈ arg min u∈U [0,t] J t (x, u) such that ũ ≠ ū, and define ŭ .= 1 2 (ū + ũ). Strict convexity of J t (x, ·), per Lemma 6, then implies that J t (x, ŭ) is strictly smaller than the minimum, which is a contradiction. Hence, the argmin is a singleton, with {u * } = arg min u∈U [0,t] J t (x, u), as u * .= ū = ũ. For the second assertion, i.e. convergence of the optimal controls, fix t > 0, x = y ∈ dom W t , and a sequence {M k } k∈N ⊂ [−φ(0), ∞) such that lim k→∞ M k = ∞. Note by the first assertion that u * , u M k * ∈ U [0, t] exist and are unique. Define ŭ M k .= 1 2 (u * + u M k * ), k ∈ N. Consequently, by the sub-optimality of ŭ M k in W M k t (x), optimality of u * and u M k * in W t (x) and W M k t (x) respectively, and Theorem 1, the required inequality holds. As κ ∈ R >0 , taking the limit as k → ∞ and again applying Theorem 1 yields lim k→∞ ‖u * − u M k * ‖ U [0,t] ≤ 0, as required.
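The strict convexity argument in step (II) rests on the usual midpoint inequality, which may be written out as follows (notation as in (2), (38)).

```latex
% Suppose \bar u \neq \tilde u are both minimizers of the strictly convex
% map \bar J_t(x,\cdot), and set \breve u := \tfrac{1}{2}(\bar u + \tilde u). Then
\bar J_t(x,\breve u)
  < \tfrac{1}{2}\,\bar J_t(x,\bar u) + \tfrac{1}{2}\,\bar J_t(x,\tilde u)
  = \min_{u \in \mathscr{U}[0,t]} \bar J_t(x,u),
% contradicting optimality of \bar u. Hence the minimizer is unique.
```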

Constraint Satisfaction
With existence of the optimal controls in (1), (20) guaranteed by Theorem 2, the corresponding state trajectories can be examined to determine their compliance with the intended state constraint. To this end, given t ∈ R >0 , x ∈ R n , ε ∈ R >0 , define the sets of ε-optimal inputs in the definitions (1), (20), for all x ∈ R n , u ∈ U [0, t]. Observe that Δ t (x, u) aggregates the times in [0, t] at which the state constraint is violated for the trajectory (5), given its initial state x and control u. The following theorem provides bounds on the measure of this set Δ t (x, u) for any control, for the near optimal controls (41), (42), and subsequently for the optimal controls of (38).
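A discretized reading of the violation set Δ_t(x, u) is immediate; the sketch below (trajectory, time grid, and bound b are all assumed for illustration) approximates its Lebesgue measure by counting grid times at which |ξ_s| > b.

```python
import numpy as np

def violation_measure(xi, dt, b):
    """Approximate the Lebesgue measure of { s : |xi_s| > b }.

    xi: (N, n) array of states sampled every dt seconds.
    """
    norms = np.linalg.norm(xi, axis=1)
    return dt * np.count_nonzero(norms > b)

# Toy trajectory: xi_s = (1 + s, 0) on [0, 1] with dt = 1e-3 and b = 1.5
# leaves the ball |xi| <= b for s > 0.5, so the measure is roughly 0.5.
ts = np.arange(0.0, 1.0, 1e-3)
xi = np.stack([1.0 + ts, np.zeros_like(ts)], axis=1)
print(violation_measure(xi, 1e-3, 1.5))
```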

Remark 2
Theorem 3 indicates that the regulator problem defined by W t of (1) implements the required state constraint for almost every time for those initial states x ∈ R n for which W t (x) < ∞, and the approximating regulator problem defined by W M t of (20) implements the same constraint in the limit as M → ∞. This constraint can be violated for M < ∞, as the finite approximation Φ M of the extended real valued barrier Φ does not impose an infinite cost penalty on such violations in the definition (20) of W M t . This is reflected in the non-zero right-hand side of (44), which allows for constraint violations on time intervals of non-zero measure for M < ∞.

Equivalent Unconstrained Game
The sup-of-quadratics representation (12) for the convex barrier function Φ in (6) is used to demonstrate equivalence of the value function for the state constrained regulator problem (1) with the upper value of an unconstrained two player game, as summarised in Theorem 4. Similarly, the approximate sup-of-quadratics representation (18) is used to demonstrate an equivalence between the value function for the approximate regulator problem (20) with the corresponding upper value of an approximate two player game, see Theorem 5. It is further demonstrated that this approximate game has equivalent upper and lower values, which is used to demonstrate the corresponding equivalence for the exact game, see Theorem 6 and Corollary 2, via the convergence results of Theorem 1. The lower value is subsequently exploited to examine solutions of the state constrained regulator problem (1) via DREs, see Theorems 7 and 8.

Exact Unconstrained Game and Its Upper Value
Given a horizon t ∈ R ≥0 , define a function space for the second player's inputs. Motivated by (2), (3), (12), define the upper value W t : R n → R + of a two player unconstrained linear quadratic game by (50) for all x ∈ R n , in which J t is a cost function defined with respect to a new integrated running cost function I t motivated by (3), and the existing integrated running cost I κ t of (3) and terminal cost Ψ of (4). In particular, J t and the running payoff ν(x, α) are defined accordingly.
The value functions (1) and (50) defining the exact regulator problem and the exact unconstrained game are in fact equivalent, as stated in the following theorem.

Theorem 4
Given t ∈ R ≥0 , the value functions W t , W t of (1), (50) are equivalent, i.e. they coincide on R n .

The proof of Theorem 4 follows as a consequence of the following measurable selection lemma, see for example [14].
(ii) Immediate by the fourth equality of (58).

Approximate Game and Its Upper and Lower Values
for all x ∈ R n , where cost J t is as per (51). As in the exact case, the value function (20) of the approximating regulator problem and the upper value (59) of the approximating game are equivalent.
The proof of Theorem 5 follows as a consequence of a corresponding measurable selection lemma, the proof of which is similar to that of Lemma 7.
[(60) and the left-hand argmax in (62)] The first equality in (60) is immediate by Proposition 2 (i), i.e. (18). For the remaining equalities, any α ∈ A M [0, t] is pointwise suboptimal in the supremum over α ∈ [−φ(0), M], so that one inequality follows. In order to prove the opposite inequality, recall by (61) and Proposition 2, i.e. (19), that α M * s is the pointwise maximizer of ν(ξ s , ·), which yields the reverse inequality.

With a view to addressing computation, the remaining objective is to demonstrate equivalence of the upper and lower values for the game (59). To this end, a number of useful properties of the cost function J t of (51) are summarised via the following two lemmas, the first of which asserts that J t of (51) is Fréchet differentiable, strictly convex, and coercive.

Lemma 9 Given any t
Proof The differentiability assertion follows by Lemma 1(ii) and the chain rule, while the strict convexity and coercivity assertions follow via analogous arguments to the proof of Lemma 6. The details are omitted.

Proof (of Lemma 10) The proof involves three main steps, prefaced as follows: (I) given the unique optimal control u M * as indicated, construct a unique α M * from the corresponding optimal trajectory via (5), (61); (II) verify by strict convexity that the cost J t (x, ·, α M * ) has a unique minimizer; and (III) show that this minimizer must be u M * , by showing that perturbations away from u M * always result in a higher cost, via a non-negative directional derivative of J t (x, ·, α M * ).
Step (III) is complicated by two features of the problem: firstly, perturbations in the control yield perturbations in the trajectory, which in turn yield perturbations in the evaluated α M * via (61); and secondly, the statement of (61) involves a number of cases, see (19). In dealing with the first complication, it is convenient to bound the directional derivative of J t (x, ·, α M * ) below by that ofJ M t (x, ·), for which u M * is the known unique minimizer. In dealing with the second complication, the cases involved must be exhaustively enumerated.
The details of the proof follow.
for all s ∈ [0, t], with χ , α M * as per (5), (19). Recalling (5) and Lemma 1, note in particular the bounds indicated. By (51), (52), (53), and Lemma 8, the stated expansions hold, so that, by subtraction, (68) follows. As per the prefaced first complication of step (III), the first two terms on the right-hand side of (68) are a prelude to a directional derivative of J M t (x, ·), in which it is noted that u M * is the minimizer of J M t (x, ·), see Theorem 2 and (38). Hence, a lower bound for the integral term in the right-hand side of (68) is sought, as a function of δ, using Taylor's theorem. This Taylor's theorem argument brings in the second prefaced complication of step (III), due to the cases involved in the definition of α M * , and this dominates the remainder of the proof. Persevering, as a first step, the required pointwise bounds may be shown with some calculation via (19), (53).
As s ∈ Δ 0 is arbitrary, integration yields the corresponding bound. Hence, combining these inequalities yields a lower bound for the first order term, with constant K M 1 (ε) as indicated. The second order term in (73) has the same form as (69), in which ρ s ∈ [0, ρ̄(M)] in every case. Hence, a lower bound for the second order term follows, with constant K M 2 (ε). Cases (i) and (ii) may now be combined, via (72) and (76), in (68), to finally deal with the second prefaced complication of step (III). Recalling (66), a lower bound for the directional derivative (65) can subsequently be evaluated, in which the second inequality follows by Theorem 2, i.e. (38), effectively dealing with the first prefaced complication of step (III). Furthermore, as ε ∈ R >0 can be selected arbitrarily small, cf. its definition prior to (66), and K M 0 (0) = 0 by (71), the assertion follows.

Theorem 6 Given t ∈ R >0 , M ∈ R ≥−φ(0) , x ∈ R n , and u M * , α M * as per Lemma 10, the upper and lower values of the game (59) coincide, with the minimizer and maximizer attained, see (77).

Proof Fix t ∈ R >0 , M ∈ R ≥−φ(0) , x ∈ R n , and α M * , u M * as per Lemma 10. One inequality is immediate, as the lower value never exceeds the upper value of J t (x, u, α). For the opposite inequality, and existence of the minimizer and maximizer as per the final equality in (77), note by Theorem 2, Lemma 8, the definition of α M * , and finally Lemma 10, that the required relations hold.

for all s ∈ [0, t], in which Â, P̂ s , P̂ t , V̂ s ∈ R (n+1)×(n+1) , s ∈ [0, t], B̂ ∈ R (n+1)×m , C ∈ R n×(n+1) are defined via κ, K of (2), A, B of (5), and P t of (4), in which I n ∈ R n×n and 0 n ∈ R n×1 denote the identity matrix and zero vector respectively.
for any x ∈ R n , where P α s , Q α s are as per (82), (83).
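The forward-backward structure noted above (state forwards, DRE or costate backwards) is what produces a TPBVP. The following scalar sketch is a stand-in only: it uses the Pontryagin costate form of an unconstrained LQ problem with assumed data (a, b, q, r, p_T, T, x0), not the augmented (n+1)-dimensional system of (82)-(89), and solves it by single shooting.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Scalar LQ data (all assumed): min int_0^T (q x^2 + r u^2) ds + p_T x(T)^2,
# dynamics dx/ds = a x + b u, initial state x(0) = x0.
a, b_, q, r, p_T, T, x0 = 0.0, 1.0, 1.0, 1.0, 0.0, 2.0, 1.0

def terminal_residual(lam0):
    """Integrate the Pontryagin system forwards; return the terminal mismatch.

    x' = a x - (b^2 / (2r)) lam,  lam' = -2 q x - a lam,  lam(T) = 2 p_T x(T).
    """
    def rhs(s, z):
        x, lam = z
        return [a * x - (b_**2 / (2 * r)) * lam, -2 * q * x - a * lam]
    sol = solve_ivp(rhs, (0.0, T), [x0, lam0], rtol=1e-10, atol=1e-12)
    xT, lamT = sol.y[:, -1]
    return lamT - 2 * p_T * xT

# Single shooting on the unknown initial costate lam(0).
lam0 = brentq(terminal_residual, -10.0, 10.0)
# Riccati cross-check: lam(0) = 2 P(0) x0 with P(t) = tanh(T - t) here.
print(lam0, 2 * np.tanh(T) * x0)
```

The shooting residual is affine in the unknown initial costate for linear dynamics, so bracketing root-finding converges quickly; the recovered lam(0) agrees with the Riccati route lam(0) = 2 P(0) x0.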

Illustrative Example
In illustrating an application of Theorems 4, 6, and 8 to the approximate solution of a state constrained regulator problem (1) via the approximate problem (20) and corresponding game (59), a simple example is considered. The linear dynamics (5) and a hyperbolic barrier (6) are specified as indicated, while the running cost (3) and its approximation (22) follow accordingly; an initial state for (5) is assumed. The terminal cost is adjusted so as to encourage the trajectory to move towards the non-zero terminal state ξ t = z = [ 1 1 ] , while respecting the state constraint. Figures 3b, 4b, and 5b illustrate respectively the corresponding state trajectories, the optimal input α * , and the optimal control ũ * obtained by solving TPBVP (89), with the constraint inactive and active.

Conclusions
A sup-of-quadratics representation is developed for a class of convex barrier functions for encoding a simple state constraint in a linear regulator problem. Using this representation, an equivalent unconstrained two player linear quadratic game is constructed. By demonstrating equivalence of its upper and lower values, an approach to computation is presented, and illustrated by example.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.