Gradient Methods on Strongly Convex Feasible Sets and Optimal Control of Affine Systems
Abstract
The paper presents new results about convergence of the gradient projection and the conditional gradient methods for abstract minimization problems on strongly convex sets. In particular, linear convergence is proved, although the objective functional does not need to be convex. Such problems arise, in particular, when a recently developed discretization technique is applied to optimal control problems which are affine with respect to the control. This discretization technique has the advantage of providing higher accuracy of discretization (compared with the known discretization schemes) and involves strongly convex constraints and a possibly nonconvex objective functional. The applicability of the abstract results is proved in the case of linear-quadratic affine optimal control problems. A numerical example is given, confirming the theoretical findings.
Keywords
Optimal control · Mathematical programming · Numerical methods · Gradient methods · Affine control systems · Bang–bang control
Mathematics Subject Classification
49M25 · 90C25 · 90C48 · 49M37
1 Introduction
Solving numerically optimal control problems in which the control function appears linearly, and performing error analysis, are still challenging issues due to the typical discontinuity of the optimal control. Considerable progress was made in the past decade in the analysis of discretization schemes in combination with various methods of solving the resulting discrete-time optimization problems. The papers [1, 2, 25, 27] apply to problems with linear dynamics, while [3, 11] address nonlinear affine (in the control) dynamics. Usually the discretization is performed by Runge–Kutta schemes (mainly the Euler scheme) and the accuracy is at most of first order due to the discontinuity of the optimal control. Discretization schemes of higher accuracy were recently proposed in [21, 24] for systems with linear dynamics and Mayer or Bolza problems. In both cases the error analysis is based on the assumption that the optimal control is of purely bang–bang type.
For solving the above problem one can apply the high-order discretization scheme developed in [21, 24]. It results in a discrete-time optimal control problem (a mathematical programming problem), where the gradient of the objective function can be calculated following a standard procedure involving the solution of the associated adjoint system, so that gradient-type methods are conveniently applicable. And here we encounter a remarkable fact: although neither the objective functional (1) of the continuous-time problem (1)–(3) nor the control constraints (3) are strongly convex, it turns out that the feasible set of the discretized problem is strongly convex. This brings into consideration the issue of convergence of gradient methods for problems with strongly convex feasible sets and possibly nonconvex objective functions (even if the functional J in (1) is convex on the set of admissible control–trajectory pairs, the discretized problem may fail to be convex!).
Versions of the gradient projection method (GPM) and the conditional gradient method (CGM) are widely studied (see e.g. [18, 19] and the references therein), but results about linear convergence of the generated sequence of iterates seem to be available only for problems with strongly convex objective functions. Exceptions are the papers [6, 15], where strong convexity is assumed for the feasible set instead of the objective function. However, as clarified at the end of Sect. 2.1 below, the additional assumptions in these two papers are rather strong and are not fulfilled for the problem arising in the optimal control context as described above.
In this paper we present convergence results for the gradient projection and the conditional gradient methods for minimization problems in a Hilbert space, where the feasible set is strongly convex but the objective functional is not necessarily convex. These results are new even for convex or strongly convex objective functionals, but we relax the convexity assumption due to the needs of our main goal: to cover the problems arising in optimal control of affine systems, as described above. For that we consider objective functionals that we call, for brevity, \((\varepsilon ,\delta )\)-approximately convex. These functions constitute a larger class than that of the weakly convex functions (see e.g. [4]). In Sect. 2.1 we prove linear convergence of the sequence of approximate solutions generated by the GPM, provided that the step sizes are appropriately chosen. Apart from the applicability to nonconvex objective functionals, this result does not require the additional conditions in [6, 15]. As usual, the “appropriate” choice of the step sizes is expressed by some constants related to the data of the problem, which are often not available (or only roughly estimated). Therefore, we present an additional convergence result involving a rather general and constructive condition for the step sizes (well known in the literature).
The conditional gradient method may have some advantages (compared with the GPM) in our optimal control application. For this reason we also prove a linear convergence result for the CGM. This is done in Sect. 2.2.
In Sect. 3 we turn back to the optimal control problem (1)–(3). The first two subsections are preliminary: we introduce notations, formulate assumptions, and present the discrete approximation introduced in [21, 24] and the error estimate proved in [24]. All this is needed for understanding the implementation of the GPM and the CGM and the proofs of the error estimates. Then, in Sects. 3.3 and 3.4 we prove the applicability of the abstract convergence results, obtained in Sect. 2, to our discretized optimal control problem and present details about the implementation of the GPM and the CGM. A numerical example that confirms the theoretical findings is given in Sect. 3.5.
The paper concludes with indication of some open problems for further research (Sect. 4).
2 Gradient Methods for Problems with Strongly Convex Feasible Set
As usual, \(\left<\cdot ,\cdot \right>\) denotes the inner product in H, and \(\Vert \cdot \Vert \) the induced norm.
We recall the following notions below.
Definition 2.1
An alternative definition is often used in the literature: a set is strongly convex (with respect to the number \(R > 0\)) if it coincides with the intersection of all balls of radius R containing this set. The two definitions are equivalent (see e.g. [28, Theorem 1]) and the relation between \(\gamma \) and R is that \(R = 1/\gamma \).^{1}
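The ball-intersection characterization can be made concrete in the simplest example, the disc. The sketch below is illustrative only: the disc and the constant \(1/(8R)\) are specific to this elementary example (and to a particular convention), not quantities from the paper. It checks numerically that the midpoint of any two boundary points of a disc of radius R lies at depth at least \(\Vert x-y\Vert ^2/(8R)\) inside the disc, which is the hallmark of a strongly convex set.

```python
import numpy as np

def midpoint_depth(x, y, R):
    """Distance from the midpoint of x and y to the boundary of the disc of radius R."""
    m = 0.5 * (np.asarray(x, dtype=float) + np.asarray(y, dtype=float))
    return R - np.linalg.norm(m)

def strong_convexity_margin(x, y, R):
    """Lower bound ||x - y||^2 / (8R) on the midpoint depth, valid for the disc."""
    d = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return d * d / (8.0 * R)

# sample random pairs of boundary points of the unit disc and verify the inequality
rng = np.random.default_rng(0)
R = 1.0
ok = True
for _ in range(1000):
    a, b = rng.uniform(0.0, 2.0 * np.pi, size=2)
    x = R * np.array([np.cos(a), np.sin(a)])
    y = R * np.array([np.cos(b), np.sin(b)])
    ok = ok and (midpoint_depth(x, y, R) >= strong_convexity_margin(x, y, R) - 1e-12)
```

For a polytope, by contrast, the midpoint of two points on a common face has depth zero, so no such margin exists; this is exactly what strong convexity of the set adds.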
Definition 2.2
The following definition introduces a property that is usually called “weak convexity” or “paraconvexity” (see e.g. [4]).
Definition 2.3
Definition 2.4
Notice that \(\delta \) can be taken equal to zero in the above definition, in which case the \((\varepsilon ,\delta )\)-approximate convexity reduces to \(\varepsilon \)-convexity.
The following three results provide the ground for the error analysis of the GPM and the CGM.
Proposition 2.1
Proof
Property (7) will play an important role in the further analysis. In fact, the \((\varepsilon ,\delta )\)-approximate convexity of f and the strong convexity of K were needed just to ensure the existence of \(\nu > 0\) and \(\delta \ge 0\) for which condition (7) is fulfilled. We mention that (7) is always fulfilled if the set K is convex and the function f is strongly convex, which is not the case here.
Lemma 2.1
Let f be differentiable on K and let condition (7) be fulfilled with some \(\nu > 0\). If for some \(w \in K\) and \(\lambda > 0\) it holds that \(P_K(w - \lambda \nabla f(w)) = w\), then \(\Vert w - \hat{w}\Vert \le \delta \).
Proof
Lemma 2.2
Let f be differentiable on K and let condition (7) be fulfilled with some \(\nu > 0\). If for some \(w \in K\) it holds that \(\nabla f(w) = 0\), then \(\Vert w - \hat{w}\Vert \le \delta \).
Proof
2.1 The Gradient Projection Method
For solving the minimization problem (4), we first consider the most classical algorithm, the gradient projection method (GPM), stated below. In the formulation of the algorithm we only assume that f is L-smooth.

Step 0: Choose \(w_0\in K\). Set \(k=0\).

Step 1: If \(w_{k}=P_K\left( w_k-\nabla f(w_k)\right) \) then Stop. Otherwise, go to Step 2.
Step 2: Choose \(\lambda _k >0 \) and calculate
$$\begin{aligned} w_{k+1}=P_K \left( w_k-\lambda _k \nabla f(w_k)\right) . \end{aligned}$$
(9)
Replace k by \(k+1\); go to Step 1.
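For concreteness, the iteration (9) can be sketched in a few lines of Python. The problem data below are illustrative assumptions, not data from the paper: we minimize the smooth quadratic \(f(w)=\frac{1}{2}\Vert w-c\Vert ^2\) with c outside the unit disc, so K is the (strongly convex) unit disc and the minimizer lies on its boundary; a constant step size and a fixed-point test stand in for Step 1.

```python
import numpy as np

def gpm(grad, proj, w0, steps, tol=1e-10, max_iter=1000):
    """Gradient projection method: w_{k+1} = P_K(w_k - lambda_k * grad(w_k))."""
    w = np.asarray(w0, dtype=float)
    for k in range(max_iter):
        w_new = proj(w - steps(k) * grad(w))
        if np.linalg.norm(w_new - w) <= tol:   # fixed point of the projected step
            return w_new
        w = w_new
    return w

# illustrative data (not from the paper): minimize f(w) = 0.5*||w - c||^2
# over the unit disc; c lies outside, so the solution is on the boundary
c = np.array([2.0, 0.0])
grad = lambda w: w - c

def proj_disc(w):
    """Projection onto the closed unit disc."""
    n = np.linalg.norm(w)
    return w if n <= 1.0 else w / n

w_hat = gpm(grad, proj_disc, np.array([0.0, 0.5]), steps=lambda k: 0.5)
```

With these data the iterates converge to the boundary point (1, 0), the projection of c onto the disc, at a linear rate, in line with the results of this subsection.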
In this subsection, we prove that if condition (7) is fulfilled with \(\nu > 0\), then the sequence \(\left\{ w_k \right\} \) generated by the GPM linearly approaches \(\hat{w}\), at least until entering a \(\delta \)-neighborhood of \(\hat{w}\). Proposition 2.1 gives conditions for the existence of such \(\nu \) in terms of strong convexity of the set K and \((\varepsilon ,\delta )\)-approximate convexity of the function f. We mention that if the above algorithm of the GPM stops at Step 1 for some k, then, according to Lemma 2.1, \(\Vert w_k - \hat{w}\Vert \le \delta \), that is, a \(\delta \)-approximate solution is attained (obviously this is meaningful only if \(\delta \) is sufficiently small).
Proposition 2.2
Proof
Now we can state and prove the main convergence result for the GPM.
Theorem 2.1
Before proving the theorem, we mention that in the case of an \(\varepsilon \)-convex function f (that is, if \(\delta = 0\)) the first claim of the theorem means that the sequence generated by the GPM converges linearly to the (unique) solution \(\hat{w}\). In the case \(\delta > 0\) we also have linear convergence, at least until the generated sequence enters the \(\delta \)-neighborhood of \(\hat{w}\). Thus in this case the theorem is meaningful only if \(\delta \) is reasonably small.
Proof
Remark 2.1
If the constants L and \(\nu \) can be reasonably estimated, then inequalities (19) and (20) can be used to estimate the number of iterations of the GPM needed to achieve a given accuracy.
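To make the remark concrete: under a linear bound of the form \(\Vert w_k - \hat{w}\Vert \le \mu ^k \Vert w_0 - \hat{w}\Vert \) (the bound itself is the assumption here; the constants come from estimates such as (19) and (20)), the iteration count for a prescribed accuracy follows by taking logarithms.

```python
import math

def iterations_for_accuracy(mu, e0, eps):
    """Smallest k with mu**k * e0 <= eps, assuming the linear bound
    ||w_k - w_hat|| <= mu**k * ||w_0 - w_hat|| with rate mu in (0, 1)."""
    if e0 <= eps:
        return 0
    return math.ceil(math.log(eps / e0) / math.log(mu))

k = iterations_for_accuracy(mu=0.5, e0=1.0, eps=1e-6)
```

For instance, with \(\mu = 0.5\) and a unit initial error, an accuracy of \(10^{-6}\) requires 20 iterations; as \(\mu \) approaches 1 the count grows like \(1/(1-\mu )\), which is relevant for the discussion of Sect. 3.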
Remark 2.2
Since the parameters L and \(\nu \) are usually not known in advance, we can take the step size sequence \(\left\{ \lambda _k \right\} \) to be any nonsummable sequence of positive real numbers converging to zero, as in the next theorem.
Theorem 2.2
Clearly, in the case \(\delta = 0\) the first claim of the theorem implies strong convergence of the sequence \(\{w_k\}\).
Proof
Remark 2.3
(i) For any k, there exists a unit vector \(n(w_k) \in N_K(w_k)\) such that
$$\begin{aligned} \left\langle n(w_k), \nabla f(w_k) \right\rangle \le 0, \end{aligned}$$
where \(N_K(w_k)\) is the normal cone to K at \(w_k\), defined as
$$\begin{aligned} N_K(w_k):={\left\{ \begin{array}{ll} \emptyset &{} \text{ if } \quad w_k \notin K, \\ \{l\in H:\langle l,v-w_k \rangle \le 0\ \forall v\in K \} &{} \text{ if } \quad w_k\in K. \end{array}\right. } \end{aligned}$$
 (ii)
The problem (4) has a unique solution and it belongs to the boundary of K.
2.2 The Conditional Gradient Method
In this subsection, we consider the conditional gradient method (CGM) for solving problem (4) with a \(\gamma \)-strongly convex set K and an \((\varepsilon ,\delta )\)-approximately convex and L-smooth function f. This method dates back to the original work of Frank and Wolfe [13], which presented an algorithm for minimizing a quadratic function over a polytope using only linear optimization steps over the feasible set. The CGM for solving (strongly) convex problems was investigated in [8, 9, 14].

Step 0: Choose \(w_0\in K\). Set \(k=0\).
 Step 1: If \(\nabla f(w_k) = 0\), then Stop. Otherwise, find a solution \(x_k\) of the problem$$\begin{aligned} \min _{y \in K} \, \left\langle \nabla f(w_k), y\right\rangle . \end{aligned}$$(25)

Step 2: If \(x_k=w_k\), then Stop. Otherwise, go to Step 3.
Step 3: If \(\nabla f(w_k) \not = 0\), choose \(\eta _k \in (0, \min \lbrace 1,\frac{\gamma \Vert \nabla f(w_k)\Vert }{4L}\rbrace ] \), calculate
$$\begin{aligned} w_{k+1}=(1-\eta _k)w_k+\eta _k x_k, \end{aligned}$$
(26)
replace k by \(k+1\), and go to Step 1. Else the iteration process terminates.
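The CGM steps above can be sketched as follows, on an illustrative instance that is not taken from the paper: K is the unit disc, so the linear subproblem (25) has the closed-form solution \(-g/\Vert g\Vert \); f is a smooth quadratic with \(L = 1\); and \(\eta _k\) is taken at the upper end of the interval prescribed in Step 3.

```python
import numpy as np

def cgm(grad, lin_min, w0, gamma, L, max_iter=500, tol=1e-10):
    """Conditional gradient method; eta_k is taken at the upper end of the
    interval (0, min{1, gamma*||grad f(w_k)||/(4L)}] prescribed in Step 3."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad(w)
        gn = np.linalg.norm(g)
        if gn <= tol:                         # Step 1 stopping test
            return w
        x = lin_min(g)                        # solution of min_{y in K} <g, y>
        if np.linalg.norm(x - w) <= tol:      # Step 2 stopping test
            return w
        eta = min(1.0, gamma * gn / (4.0 * L))
        w = (1.0 - eta) * w + eta * x         # the update (26)
    return w

# illustrative instance: K is the unit disc (strongly convex) and
# f(w) = 0.5*||w - c||^2 with c outside the disc, so L = 1
c = np.array([2.0, 0.0])
grad = lambda w: w - c
lin_min = lambda g: -g / np.linalg.norm(g)    # minimizer of <g, y> over the disc
w_hat = cgm(grad, lin_min, np.array([0.0, 0.5]), gamma=1.0, L=1.0)
```

On this instance the iterates converge to the boundary point (1, 0); note that only linear minimizations over K are required, never projections.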
In general, problem (25) may fail to have a solution, in which case the CGM is not executable.
Remark 2.4
The objective function in the subproblem (25) in the CGM is linear; thus, if K is a polytope, we encounter a linear programming problem, which should be easier to solve than the quadratic programming subproblem (9) in the GPM. In the case considered in this paper the set K is not a polytope, thus (25) is not a linear programming problem. However, in our main application (see the next section) the set K is a product of a (possibly large) number of simple two-dimensional strongly convex sets, so that (25) decomposes into two-dimensional subproblems that are easy to solve.
We will use the following global version of \((\varepsilon ,\delta )\)-approximate convexity.
Definition 2.5
We begin the convergence analysis of the CGM with an inequality which will play a key role in obtaining convergence results. For convenience we assume that if the CGM terminates at some finite iteration \(k = i\) (due to \(\nabla f(w_i) = 0\)), then the sequence \(\{ w_k \}\) is extended as \(w_k = w_i\) for \(k > i\).
Proposition 2.3
Proof
If \(\nabla f(w_i) = 0\) for some i, we have \(x_k=w_k\) and \(\Delta _k = 0\) for all \(k \ge i\), hence (28) holds. Thus we may assume that \(\nabla f(w_k) \not = 0\) for the arbitrarily fixed k in the considerations below.
We are now in a position to establish the convergence results for the CGM.
Theorem 2.3
Clearly, in the case \(\delta = 0\), the first and the second claims of the theorem mean that the sequences \(\left\{ f(w_k)\right\} \) and \(\left\{ w_k\right\} \) converge linearly to \(\hat{f}\) and \(\hat{w}\), respectively. In the case \(\delta > 0\) we also have linear convergence, at least until the generated sequence enters the \(\delta \)-neighborhood of \(\hat{w}\).
Proof
3 The Affine Optimal Control Problem
In this section we turn back to the control-affine linear-quadratic problem (1)–(3) and prove that the gradient projection method considered in the previous section is applicable to the (high-order) discretization of the problem recently developed in [21, 24]. (This also applies to the conditional gradient method, where the analysis is similar.) We also provide error estimates regarding both the errors due to discretization and those due to truncation of the gradient projection iterations.
The first two subsections reproduce assumptions and results from [24] that are necessary for understanding the implementation of the GPM to the discretized version of problem (1)–(3). The next subsections prove the applicability of the abstract results obtained above, present details about the implementation of the gradient methods, and provide results of computational experiments.
3.1 Notations and Assumptions
As usual, \(L_2([0,T];{\mathbb {R}}^m)\) denotes the Hilbert space of all measurable squareintegrable functions \([0,T] \rightarrow {\mathbb {R}}^m\) with scalar product \(\langle u_1, u_2 \rangle = \int _0^T \langle u_1(t), u_2(t) \rangle {\mathrm{\,d}}t\) and the corresponding norm is denoted again by \(\Vert \cdot \Vert _2\).
We begin with some assumptions concerning the problem (1)–(3).
Assumption A1
The matrix functions A(t), B(t), W(t) and S(t), \(t \in [0,T]\), have Lipschitz continuous first derivatives, Q and W(t) are symmetric. Moreover, the matrix \(B(t)^{\top }S(t)\) is symmetric for all \(t\in [0,T]\).
Denote by \({\mathcal {F}}\) the set of all admissible control–trajectory pairs (u, x), that is, all pairs of an admissible control u and the corresponding (absolutely continuous) solution x of (2). By a standard argument, problem (1)–(3) has a solution, \((\hat{x},\hat{u}) \in {\mathcal {F}}\), which from now on will be considered as fixed.
Assumption A2
The first part of Assumption (A1) is standard, while the last requirement is demanding but known from the literature, usually expressed in terms of the Lie brackets of the involved controlled vector fields; see e.g. [26]. It is certainly fulfilled in the case of single-input systems, \(m = 1\). Assumption (A2) is a directional convexity assumption at \((\hat{x},\hat{u})\), which is somewhat weaker than the usual convexity assumption for the functional J in (1) regarded as a functional on the set of admissible controls (viewing x as a function of u).
Assumption A3
(strict bang–bang property)
3.2 HighOrder TimeDiscretization
In this subsection we recall the discretization scheme for problem (1)–(3) presented in [24], which has a higher accuracy than the Euler scheme without a substantial increase of the numerical complexity of the discretized problem. The approach uses second-order truncated Volterra–Fliess series. The discretization scheme is described as follows.
For the subsequent analysis it will be important that the set \(Z \subset {\mathbb {R}}^2\) is strongly convex. This is evident from Fig. 1, but the calculation of a modulus \(\gamma \) is cumbersome and we skip the details. In this calculation we use Theorem 1 in [28] (expressing \(\gamma \) by the Lipschitz constant of the mapping that maps a unit vector to that point on the boundary of Z at which this vector is normal to Z) and the explicit formula for the normal cone to Z given in [21, Sect. 4]. The number \(\gamma = 1/\sqrt{32}\) turns out to be a modulus of strong convexity of Z.
The following theorem is extracted from Theorem 3.1 in [24].
Theorem 3.1
We mention that the above discretization scheme is meaningful even without assuming (A2) and (A3). These assumptions are only needed for the error estimate in Theorem 3.1.
3.3 Applicability of the Results About GradientType Methods
In this subsection we prove that the assumptions needed for applicability of the results in Sect. 2 to the above problem are fulfilled.
Next, we present five technical lemmas which are needed in the proof of the main result in this section, Proposition 3.1. In the proofs, \(c_1, c_2, \ldots \) denote nonnegative constants that may depend on the data of the problem (1)–(3) (and their derivatives) but are independent of N. These constants may have different values in different proofs.
Lemma 3.1
Proof
Lemma 3.2
Proof
Lemma 3.3
Proof
The Fréchet differentiability of \(f^h\) was established in [24], together with the representation (55) of its derivative. The Lipschitz continuity on K follows from this representation, together with (57) and (58) (the notations are as in the proof of Lemma 3.1). \(\square \)
Recall that in Theorem 3.1, \(\hat{w}^N \in K\) denotes an optimal control sequence of the discrete problem (53). In what follows, it will be convenient to drop the superscript N in this notation.
Lemma 3.4
Proof
Now, let us define \(\delta _1 := M \sqrt{2mr/T}\), where M is the diameter of the set Z (which is \(\sqrt{5}\)). Moreover, define the natural number \(N_0\) as larger than \(4 \bar{c} T/\alpha \), so that \(\bar{c}h \le \alpha /4\).
Lemma 3.5
Proof
Proposition 3.1
On the assumptions (A1)–(A3), the function \(f^h\) is L-smooth on K and there exist numbers \(N_0\), \(\nu _0 > 0\) and \(\delta _0\) such that for every \(N \ge N_0\) condition (7) in Proposition 2.1 (hence, also the assumptions in Proposition 2.2 and Theorems 2.1 and 2.2) is fulfilled for problem (53) with \(\nu = \nu _0 h\) and \(\delta = \delta _0 \sqrt{h}\).
Proof
Let us interpret the above proposition in view of Theorem 2.1 for convergence of the gradient projection method (GPM) applied to the discrete problem (52) and (53). The linear rate of convergence, \(\mu \), as estimated in this theorem, may approach 1 when \(\nu \) approaches zero. At the same time, Proposition 3.1 estimates \(\nu \) as proportional to h. Thus, although the convergence is linear, its rate, \(\mu \), may be close to one. Moreover, this rate of convergence is valid only until an accuracy \(\delta \) is achieved (see Theorem 2.1). The number \(\delta \) in Proposition 3.1 is estimated as proportional to \(\sqrt{h}\). Thus the convergence of the GPM does not seem to be consistent with the \(O(h^2)\)-approximation that the discretization method provides. On the other hand, the fact that the GPM is proved to converge (even linearly, in the sense of Theorem 2.1) is remarkable. Indeed, if the Euler discretization scheme is applied to the original problem (1)–(3) (as in most of the literature), the resulting discrete-time problem may fail to be convex, and no results about the rate of convergence of the GPM are available in the literature, to the authors’ knowledge.
We do not present the convergence analysis of the CGM for problem (52) and (53), which is rather similar.
3.4 Implementation of the Gradient Methods
Now, we describe the implementation of the GPM and the CGM for the specific mathematical programming problem defined by (53) and (52).
The two key points in the implementation of the gradient methods are: (i) calculation of the gradient \(\nabla f^h(w)\); (ii) calculation of projections on K (for the GPM) or solving a linear optimization problem on K (for the CGM). We do not discuss here the issue of the choice of the step sizes \(\lambda _k\), for which numerous possibilities are known from the literature.
1. Calculation of \(\nabla f^h(w)\). Since \(f^h\) represents the objective function of a discrete-time optimal control problem as a function of the control variables (the state being implicitly regarded as a function of the control), we employ the way of calculating its gradient that is well known in control theory: \(\nabla f^h(w)\) is the derivative of the Hamiltonian with respect to the control, evaluated at the current control–trajectory pair, together with the corresponding solution of the adjoint equation. The explicit formula is given in (55), reproducing [24, Sect. 3.2].
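To illustrate this adjoint-based computation on a generic discrete-time linear-quadratic problem (the dynamics \(x_{k+1} = Ax_k + Bu_k\) and the quadratic cost below are assumptions chosen for the example, not the specific formula (55) of [24]), one forward sweep produces the states, one backward sweep the adjoints, and the gradient is read off the derivative of the Hamiltonian with respect to the control:

```python
import numpy as np

def ocp_cost(A, B, Q, R, x0, u):
    """J(u) = 0.5*sum_k (x_k^T Q x_k + u_k^T R u_k) + 0.5*x_N^T Q x_N
    for the dynamics x_{k+1} = A x_k + B u_k."""
    J, x = 0.0, np.asarray(x0, dtype=float)
    for uk in u:
        J += 0.5 * (x @ Q @ x + uk @ R @ uk)
        x = A @ x + B @ uk
    return J + 0.5 * (x @ Q @ x)

def ocp_gradient(A, B, Q, R, x0, u):
    """Gradient of J via one forward (state) and one backward (adjoint) sweep;
    the k-th component is the Hamiltonian derivative R u_k + B^T p_{k+1}."""
    N = len(u)
    x = [np.asarray(x0, dtype=float)]
    for k in range(N):                      # forward sweep: state trajectory
        x.append(A @ x[k] + B @ u[k])
    p = [None] * (N + 1)
    p[N] = Q @ x[N]                         # terminal adjoint condition
    for k in range(N - 1, -1, -1):          # backward sweep: adjoint trajectory
        p[k] = Q @ x[k] + A.T @ p[k + 1]
    return np.array([R @ u[k] + B.T @ p[k + 1] for k in range(N)])
```

A convenient correctness check for any implementation of this kind is to compare the adjoint gradient against central finite differences of the cost.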
2. Calculation of the projection on K
The set K is a product of \(m\times N\) copies of the strongly convex set Z; thus the projection of a vector \(w \in H\) onto K is obtained by projecting the two-dimensional components of w onto Z. Hence we only have to calculate projections \(P_Z(u,v)\) onto Z, where \((u,v)^\top \in {\mathbb {R}}^2\).
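This componentwise structure is easy to exploit in code. In the sketch below, the unit disc is used as a stand-in for the set Z (whose actual description comes from [21, 24]); any two-dimensional projector can be plugged in:

```python
import numpy as np

def project_product(w, proj_Z):
    """Project w, viewed as a stack of 2-d blocks, onto Z x ... x Z by
    projecting each two-dimensional component independently."""
    blocks = np.asarray(w, dtype=float).reshape(-1, 2)
    return np.concatenate([proj_Z(p) for p in blocks])

def proj_disc(p):
    """Stand-in 2-d projector: projection onto the closed unit disc."""
    n = np.linalg.norm(p)
    return p if n <= 1.0 else p / n

pw = project_product(np.array([3.0, 4.0, 0.2, -0.1]), proj_disc)
```

The cost of the projection step thus grows only linearly in the number of two-dimensional blocks, which matters because the product has \(m\times N\) factors.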
3. Solving the auxiliary subproblem in the CGM
Now, we consider the subproblem \(\min _{y \in K} \, \langle \nabla f^h(w), y\rangle \) which appears in the implementation of the CGM (see (25)).
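Because K is a product of copies of Z, this subproblem also decomposes: minimizing \(\langle g, y\rangle \) over the product amounts to minimizing \(\langle g_i, y_i\rangle \) over Z for each two-dimensional block \(g_i\) of g. A sketch, again with the unit disc as a stand-in for Z (on the disc the minimizer of a linear function g is \(-g/\Vert g\Vert \)):

```python
import numpy as np

def lin_min_product(g, lin_min_Z):
    """Solve min <g, y> over Z x ... x Z by solving an independent
    two-dimensional linear minimization over Z for each block of g."""
    blocks = np.asarray(g, dtype=float).reshape(-1, 2)
    return np.concatenate([lin_min_Z(gi) for gi in blocks])

def lin_min_disc(gi):
    """Stand-in for Z: on the unit disc, <g, y> is minimized at -g/||g||."""
    n = np.linalg.norm(gi)
    return -gi / n if n > 0.0 else np.zeros(2)

y = lin_min_product(np.array([1.0, 0.0, 0.0, -2.0]), lin_min_disc)
```

For the actual set Z of the paper, each two-dimensional block would be handled by the explicit description of Z mentioned in Sect. 3.2, but the decomposition itself is identical.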
3.5 Numerical Examples
In this subsection, we present some numerical experiments for the example of an affine linear-quadratic optimal control problem given in [24].
Example 3.1
Table 1 Convergence rates for the GPM

N            10      20      30      40      50      60      70      80      90      100
\(\mu _N \)  0.2744  0.4687  0.5742  0.6477  0.6874  0.7166  0.7327  0.8038  0.8736  0.8778
Table 1 indicates that the (numerically obtained) rate of linear convergence, \(\mu _N\), of the GPM depends on the mesh size N: it is monotone increasing and likely approaches 1 when N increases. This is to be expected since, according to Theorem 2.1, the rate \(\mu _N\) of linear convergence approaches 1 when \(\nu \) goes to zero, and according to Proposition 3.1, \(\nu \) is estimated as proportional to h. Actually, the convergence of \(\mu _N\) to 1 is also consistent with the fact that the GPM applied (theoretically) to the continuous-time problem (1)–(3) converges sublinearly, as recently established in [22, Theorem 3.2]. We emphasize that due to the second-order accuracy of the discretization, the mesh size N does not need to be taken large; therefore the rate of linear convergence may be reasonably good (see Table 1 for \(N = 10\)–30).
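Rates such as the \(\mu _N\) reported in Table 1 can be extracted from an observed error sequence by a log-linear fit, since linear convergence means \(e_k \approx C\mu ^k\). A small sketch of this estimation (the synthetic data below are only for checking the fit, not results of the paper):

```python
import numpy as np

def estimate_linear_rate(errors):
    """Fit e_k ~ C * mu**k by least squares on log e_k versus k
    and return the estimated rate mu."""
    e = np.asarray(errors, dtype=float)
    k = np.arange(len(e))
    slope, _ = np.polyfit(k, np.log(e), 1)
    return float(np.exp(slope))

# synthetic check: an exactly linear sequence with rate 0.7
mu = estimate_linear_rate([0.7 ** k for k in range(1, 20)])
```

In practice one would apply the fit only to the iterations before the sequence enters the \(\delta \)-neighborhood of \(\hat{w}\), where the linear regime of Theorem 2.1 holds.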
Table 2 Convergence rates for the CGM

N               10      20      30      40      50      60      70      80      90      100
\(\theta _N \)  0.8946  0.8999  0.9016  0.9023  0.9028  0.9030  0.9032  0.9034  0.9035  0.9036
4 Concluding Remarks
In this paper we obtain a number of new results about the convergence of gradient methods for general optimization problems on strongly convex feasible sets. The main motivation is the application of a recently developed discretization scheme [21, 24] for linear-quadratic affine optimal control problems, which results in discrete-time problems of the same type, however, with strongly convex pointwise control constraints having rather simple representations by means of quadratic inequalities. This opens several directions of further research.
First, to develop more efficient (than gradient projection) methods using the specific linearquadratic structure of the objective function and of the constraints.
Second, to investigate the applicability of gradient projection methods to discretized nonlinear optimal control problems with the control appearing linearly. As indicated in [17], our discretization approach is also applicable to such problems, and results in mathematical programming problems with strongly convex feasible sets. The general convergence results obtained in the present paper are also applicable, in principle. The main open problem here is that the error analysis of the discretization is not developed for nonlinear problems, which also makes it difficult to justify the applicability and the convergence of gradient methods.
Acknowledgements
Open access funding provided by Austrian Science Fund (FWF).
References
1. Alt, W., Baier, R., Lempio, F., Gerdts, M.: Approximations of linear control problems with bang-bang solutions. Optimization 62, 9–32 (2013)
2. Alt, W., Schneider, C., Seydenschwanz, M.: Regularization and implicit Euler discretization of linear-quadratic optimal control problems with bang-bang solutions. Appl. Math. Comput. 287, 104–124 (2016)
3. Alt, W., Felgenhauer, U., Seydenschwanz, M.: Euler discretization for a class of nonlinear optimal control problems with control appearing linearly. Comput. Optim. Appl. (2017). https://doi.org/10.1007/s10589-017-9969-7
4. Attouch, H., Aze, D.: Approximation and regularization of arbitrary functions in Hilbert spaces by the Lasry–Lions method. Ann. Inst. Henri Poincaré 3, 289–312 (1993)
5. Balashov, M.V.: Maximization of a function with Lipschitz continuous gradient. J. Math. Sci. 209, 12–18 (2015)
6. Balashov, M.V., Golubev, M.O.: About the Lipschitz property of the metric projection in the Hilbert space. J. Math. Anal. Appl. 394, 545–551 (2012)
7. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
8. Demyanov, V.F., Rubinov, A.M.: Approximate Methods in Optimization Problems. Elsevier, New York (1970)
9. Dunn, J.C.: Rates of convergence for conditional gradient algorithms near singular and nonsingular extremals. SIAM J. Control Optim. 17, 187–211 (1979)
10. Felgenhauer, U.: On stability of bang-bang type controls. SIAM J. Control Optim. 41, 1843–1867 (2003)
11. Felgenhauer, U.: Discretization of semilinear bang-singular-bang control problems. Comput. Optim. Appl. 64, 295–326 (2016)
12. Felgenhauer, U.: A Newton-type method and optimality test for problems with bang-singular-bang optimal control. Pure Appl. Funct. Anal. 1, 197–215 (2016)
13. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Q. 3, 149–154 (1956)
14. Garber, D., Hazan, E.: Faster rates for the Frank–Wolfe method over strongly-convex sets. In: ICML'15, vol. 37, pp. 541–549 (2015)
15. Golubev, M.O.: Gradient projection method for convex function and strongly convex set. IFAC-PapersOnLine 48, 202–205 (2015)
16. Kinderlehrer, D., Stampacchia, G.: An Introduction to Variational Inequalities and Their Applications. Academic Press, New York (1980)
17. Lempio, F., Veliov, V.M.: Discrete approximations of differential inclusions. Bayreuth. Math. Schr. 54, 149–232 (1998)
18. Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming. Springer, New York (2008)
19. Nesterov, Y.: Introductory Lectures on Convex Optimization. Springer, New York (2013)
20. Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer, Dordrecht (2015)
21. Pietrus, A., Scarinci, T., Veliov, V.M.: High order discrete approximations to Mayer's problems for linear systems. SIAM J. Control Optim. 56, 102–119 (2018)
22. Preininger, J., Vuong, P.: On the convergence of the gradient projection method for optimal control problems with bang-bang solutions. Comput. Optim. Appl. 70, 221–238 (2018)
23. Preininger, J., Scarinci, T., Veliov, V.M.: Metric regularity properties in bang-bang type linear-quadratic optimal control problems. Set-Valued Var. Anal. https://doi.org/10.1007/s11228-018-0488-1. Also available as Research Report 2017-07, ORCOS, TU Wien: https://orcos.tuwien.ac.at/fileadmin/t/orcos/Research_Reports/201707.pdf (2017)
24. Scarinci, T., Veliov, V.M.: Higher-order numerical schemes for linear quadratic problems with bang-bang controls. Comput. Optim. Appl. 69, 403–422 (2018). https://doi.org/10.1007/s10589-017-9948-z
25. Seydenschwanz, M.: Convergence results for the discrete regularization of linear-quadratic control problems with bang-bang solutions. Comput. Optim. Appl. 61, 731–760 (2015)
26. Veliov, V.M.: On the time-discretization of control systems. SIAM J. Control Optim. 35, 1470–1486 (1997)
27. Veliov, V.M.: Error analysis of discrete approximation to bang-bang optimal control problems: the linear case. Control Cybern. 34, 967–982 (2005)
28. Vial, J.P.: Strong convexity of sets and functions. J. Math. Econ. 9, 187–205 (1982)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.