Receding-Horizon Control of Constrained Switched Systems with Neural Networks as Parametric Function Approximators

This work studies receding-horizon control of discrete-time switched linear systems subject to polytopic constraints on the continuous states and inputs. The objective is to approximate the optimal receding-horizon control strategy for cases in which the online computation is intractable due to the necessity of solving mixed-integer quadratic programs at each discrete time instant. The proposed approach builds upon an approximated optimal finite-horizon control law in closed-loop form with guaranteed constraint satisfaction. The paper establishes recursive feasibility and asymptotic stability for the proposed approach. A numerical example is provided for illustration and evaluation of the approach.


Introduction
Applications from many domains, such as flexible production systems, smart grids, autonomous vehicles, logistic systems, smart cities, etc., are nowadays termed cyber-physical systems (CPS), see e.g. [1]. These systems are characterized by the integration of physical objects with digital components for information processing and control, interacting over communication networks. A typical property of CPS is the co-existence and interaction of continuous dynamics (most often stemming from the physical objects) and discrete-event dynamics (mostly arising from the digital components). Mixed continuous-discrete dynamics, also referred to as hybrid dynamics, have been the subject of intensive research over the last decades [2].
A special class of hybrid dynamic systems are switched systems, for which the transition between different continuous dynamics is defined by a switching logic. With respect to this logic, one can distinguish between internally forced and externally forced switching [3]. The former type typically depends on the continuous state, and a frequently considered example is piecewise affine systems [4]. In contrast, the paper at hand focuses on switched systems with externally forced switching, where an external decision unit decides when to execute a transition to a new mode of continuous dynamics. This class of switched systems is relevant for applications in which, e.g., a discrete or supervisory controller (such as a programmable logic controller) coexists with continuous control loops. More precisely, this paper studies the control of discrete-time switched linear systems (DSLS) with mixed inputs, where discrete control inputs have to be selected to determine the continuous dynamics, and continuous inputs serve to achieve control goals for the continuous state, in particular the satisfaction of state constraints.
Discrete-time linear quadratic regulation problems for switched linear systems without constraints, here referred to as DSLQR problems, have been intensively studied before. It is shown in [5] that any finite-horizon value function of the DSLQR problem is a pointwise minimum of a finite number of quadratic functions. The quadratic functions are exactly described by a set of positive definite matrices obtained from the dynamic programming (DP) solution of the DSLQR problem. Even for a finite time horizon, the computation of the exact solution to the DSLQR problem is NP-hard [6]. The critical point is that the number of possible discrete input sequences, and thus also the number of quadratic functions, grows exponentially with the time horizon. DP with pruning [7] and relaxed DP [8,9] have been shown to reduce complexity drastically, as is also described in the generalizing work [4,10].
It is proven in [5] that the finite-horizon value function converges, under certain conditions, exponentially fast to the infinite-horizon value function. The work proposes a relaxation framework to solve the infinite-horizon DSLQR problem with guaranteed closed-loop stability but suboptimal performance. The use of a stabilizing base policy and the concept of rollout are exploited in [11] to address the problem of finding low-complexity policies with (preferably tight) performance bounds for the infinite-horizon DSLQR problem. Recently, a Q-learning algorithm with customized Q-function approximation has been proposed in [12]. The approach is based on analytic results for the value function, and it addresses the infinite-horizon DSLQR problem for higher-dimensional cases, which are currently intractable for state-of-the-art methods. In general, receding-horizon control (RHC) constitutes an approach to approximating infinite-horizon control problems [13]. According to [10], an RHC strategy can be expressed explicitly as a piecewise linear state feedback control law defined over (usually non-convex) regions.
The DSLQR problem for switched linear systems with polytopic constraints on the continuous states and inputs, here referred to as DCSLQR, is less studied. In [14], the infinite-horizon DCSLQR problem is addressed by splitting the problem into an unconstrained DSLQR problem and a finite-horizon DCSLQR problem. The finite-horizon DCSLQR part is then formulated as a mixed-integer quadratic programming (MIQP) problem. In general, MIQP problems are known to be NP-hard. The online solution of such MIQP problems for the control of switched systems is addressed in [15] by determining a trade-off between performance and computation time through a tailored tree search with cost bounds and search heuristics. The finite-horizon DCSLQR problem has recently been considered in a previous paper of the authors [16]. There, the optimal closed-loop control law has been approximated by neural networks (NN) which are trained offline. This allows a fast determination of guaranteed admissible and (preferably optimal) continuous and discrete inputs for any state of the DSLS. In general, NN have the attractive property of being able to approximate functions arbitrarily closely when used with general activation functions [17][18][19], and thus have contributed significantly to the recent success of machine learning applications [20][21][22][23]. The use of NN as function approximators for policies (control laws) and value (cost-to-go) functions, as in DeepMind's popular computer program AlphaZero [23], is the core of numerous approximate dynamic programming and reinforcement learning algorithms [24][25][26][27][28][29][30].
Moreover, NN have also been considered recently for approximating receding-horizon (or model predictive) control laws [31][32][33][34]. Missing so far are techniques that efficiently solve and represent the result of the constrained problem (DCSLQR) over infinite horizons; proposing a corresponding technique by approximating the solution by NN while ensuring all constraints is the goal and contribution of this paper. Note that for the case without constraints, the work in [10] has shown that solving an MIQP problem at each time instant is necessary, leading to high computation times for larger problem instances. However, the MIQP problem to be solved at each time step can be transformed into one for DSLS with finite time horizons, as proposed in [16]. This motivates the concept of approximating the optimal RHC strategy by use of NN, and of exploiting the results from [16] for efficient computation. The objective thus is to suggest an approach which makes the computation of the control inputs (based on the approximating RHC strategy) significantly faster while guaranteeing constraint satisfaction. Thus, the present paper extends the work in [16] from a single optimization over a finite horizon to a setting of receding (finite) horizons in order to cover infinite time spans. As a consequence of this extension, recursive feasibility and asymptotic stability have to be considered as additional aspects, and this paper shows how these properties can be proven, requiring in addition, of course, that all state and input constraints are guaranteed to be satisfied throughout.
The paper is structured such that Section "Problem Formulation and Preliminaries" first introduces the RHC problem and analyzes the properties and challenges of its solution. Motivated by the challenges, a simplified RHC problem is introduced, which can be transformed into the finite-horizon control problem considered in [16]. Section "Finite-Horizon Control with Parametric Function Approximators" reminds the reader of fundamental results from [16], which are then used to develop the new approach in Section "Receding-Horizon Control with Parametric Function Approximators". A numerical example is provided for illustration in Section "Numerical Example", and the paper is concluded in Section "Conclusion". The appendix "A. Proofs of the Propositions" contains all proofs for the results established in the mentioned sections.

Problem Formulation and Preliminaries
For defining discrete-time switched linear systems (DSLS), let x_k ∈ ℝ^{n_x} denote the continuous state, u_k ∈ ℝ^{n_u} the continuous control input, and v_k ∈ 𝕄 := {1, …, M} the discrete control input, all for time k ∈ ℕ_0 := ℕ ∪ {0}. The latter selects for any k the parameterization (A_i, B_i) of a linear dynamics with system matrix A_i ∈ ℝ^{n_x × n_x} and input matrix B_i ∈ ℝ^{n_x × n_u}. The DSLS is then written as:

x_{k+1} = f(x_k, u_k, v_k) := A_{v_k} x_k + B_{v_k} u_k,   (1)

with initial state x_0 ∈ ℝ^{n_x}. The states and inputs are subject to constraints:

x_k ∈ X ⊂ ℝ^{n_x},   u_k ∈ U ⊂ ℝ^{n_u},   (2)

where it is required that X and U are polytopic and contain the origin in their interior. Subsequently, let x_{j|k} denote a continuous state at time k + j predicted at time k. According to (1), the predicted state at time k + j + 1 is given by:

x_{j+1|k} = f(x_{j|k}, u_{j|k}, v_{j|k}) = A_{v_{j|k}} x_{j|k} + B_{v_{j|k}} u_{j|k},

where u_{j|k} and v_{j|k} denote the continuous and discrete inputs at time k + j predicted at time k, respectively. Predicted input sequences at k over a time span {k + j, …, k + N − 1} are written as:

u_{j→N−1|k} := (u_{j|k}, …, u_{N−1|k}),   v_{j→N−1|k} := (v_{j|k}, …, v_{N−1|k}).

Consider the quadratic cost-to-go over the horizon from j to N:

J_{j→N}(x_{j|k}, u_{j→N−1|k}, v_{j→N−1|k}) := g_N(x_{N|k}) + Σ_{i=j}^{N−1} g(x_{i|k}, u_{i|k}),   (3)

subject to (1), with terminal cost function:

g_N(x_{N|k}) := x_{N|k}ᵀ Q_N x_{N|k},   (4)

and stage cost function:

g(x_{i|k}, u_{i|k}) := x_{i|k}ᵀ Q x_{i|k} + u_{i|k}ᵀ R u_{i|k},   (5)

with positive definite weighting matrices Q_N, Q, and R. For the sake of simplicity, the shorter notation J_j instead of J_{j→N}, u_{j|k} instead of u_{j→N−1|k}, and v_{j|k} instead of v_{j→N−1|k} is used when appropriate.
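The dynamics (1) and the cost (3)-(5) can be illustrated by a minimal sketch for a scalar DSLS (n_x = n_u = 1), where the matrices reduce to numbers. All numerical values below (the two modes, the weights Q, R, Q_N, and the horizon) are illustrative assumptions, not data from this paper:

```python
# Scalar sketch of the DSLS (1) and the quadratic cost (3)-(5).
# Modes, weights, and horizon are illustrative assumptions.
A = {1: 1.2, 2: 0.5}   # "system matrices" A_i per mode i
B = {1: 1.0, 2: 0.8}   # "input matrices" B_i per mode i
Q, R, Q_N = 1.0, 0.1, 2.0

def step(x, u, v):
    """One step of the switched dynamics x_{k+1} = A_v x_k + B_v u_k."""
    return A[v] * x + B[v] * u

def cost(x0, u_seq, v_seq):
    """Cost (3): stage costs g(x, u) = Q x^2 + R u^2 plus terminal cost g_N."""
    x, J = x0, 0.0
    for u, v in zip(u_seq, v_seq):
        J += Q * x**2 + R * u**2
        x = step(x, u, v)
    return J + Q_N * x**2

J_example = cost(1.0, [0.0, 0.0], [2, 2])  # open loop with mode 2, zero input
```

Since mode 2 is stable (A_2 = 0.5 < 1), even the zero-input sequence yields a finite, small cost here; the role of the discrete input v_k as a mode selector is visible in the dictionary lookups.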
This work aims at determining online a control strategy based on a receding-horizon principle for steering the DSLS from the initial state x 0 into the origin.
Problem 1 (RHC Problem) For the current state x_k at time k and the DSLS (1) subject to (2), find a continuous input sequence u*_{0|k} := (u*_{0|k}, …, u*_{N−1|k}) and a discrete input sequence v*_{0|k} := (v*_{0|k}, …, v*_{N−1|k}) with prediction horizon N, such that the system state reaches the origin within N time steps, while the cost function (3) is minimized:

{u*_{0|k}, v*_{0|k}} := arg min_{u_{0|k}, v_{0|k}} J_0(x_{0|k}, u_{0|k}, v_{0|k})
s.t. x_{0|k} = x_k,  x_{j+1|k} = f(x_{j|k}, u_{j|k}, v_{j|k}),  x_{j|k} ∈ X,  u_{j|k} ∈ U,  v_{j|k} ∈ 𝕄,  x_{N|k} = 0.   (6)

If a feasible solution to the problem exists and an optimal receding-horizon control (RHC) strategy is obtained, it can be applied by imposing the first element u*_{0|k} of the continuous input sequence u*_{0|k} and the first element v*_{0|k} of the discrete input sequence v*_{0|k} to (1) at time k:

u_k = u*_{0|k}(x_k),   v_k = v*_{0|k}(x_k).   (7)

The closed-loop dynamics for the DSLS controlled by this strategy is then:

x_{k+1} = f_cl(x_k) := A_{v*_{0|k}} x_k + B_{v*_{0|k}} u*_{0|k}.   (8)

Let J*_0(x_k) := J_0(x_k, u*_{0|k}, v*_{0|k}) denote the optimal cost-to-go for steering x_k into the origin within N steps.

Recursive Feasibility and Stability
The symbol X_{j→N} is introduced to denote the set of states that can be steered into the origin within N − j steps. A recursive definition of X_{j→N} is given by:

X_{N→N} := {0},   X_{i→N} := {x ∈ X | ∃ u ∈ U, ∃ v ∈ 𝕄 : A_v x + B_v u ∈ X_{i+1→N}},   (9)

with i ∈ {N − 1, …, j}. When appropriate, the shorter notation X_j is used instead of X_{j→N}. Problem 1 has a feasible solution if and only if x_k is an element of X_0.
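For a scalar DSLS with symmetric interval constraints, every set in the recursion is itself a symmetric interval [−r, r], so only the radius r has to be propagated. The following hedged sketch applies the recursion under that assumption (modes with A_i > 0; all numbers are illustrative, not from the paper):

```python
# Grid-free sketch of the recursion for the feasible sets X_j, specialized to
# a scalar DSLS with X = [-x_max, x_max], U = [-u_max, u_max], and modes
# (A, B) with A > 0. Because the union over modes of symmetric intervals is
# again a symmetric interval here, the largest per-mode radius wins.
def backward_radii(N, modes, x_max=1.0, u_max=4.0):
    """Radii of X_0, ..., X_N: X_N = {0}, and x lies in X_i iff x is in X and
    some mode (A, B) with some admissible u gives A*x + B*u in X_{i+1}."""
    r, radii = 0.0, [0.0]
    for _ in range(N):
        r = min(x_max, max((r + u_max * B) / A for A, B in modes))
        radii.append(r)
    return radii[::-1]  # [radius of X_0, ..., radius of X_N]

radii = backward_radii(3, [(1.2, 1.0), (0.5, 0.8)])
```

With the assumed modes, the input authority is large enough that every X_j except X_N = {0} already equals the full state constraint interval, which mirrors the growth of X_j as j decreases discussed below.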
An RHC strategy is called recursively feasible if x_0 ∈ X_0 implies feasibility of Problem 1 for all future states x_k, k > 0. From the definition of the RHC strategy it follows that x_k ∈ X_0 implies x_{k+1} ∈ X_1. Hence, a sufficient condition for recursive feasibility is that X_0 ⊇ X_1.
Let X^{(v_j, …, v_{N−1})}_j be the set of states that can be steered into the origin within N − j steps for a fixed discrete input sequence (v_j, …, v_{N−1}). For brevity, the more compact notation:

X^{(v_j, …, v_{N−1})}_j = Pre^{(v_j)}(X^{(v_{j+1}, …, v_{N−1})}_{j+1})   (10)

is used, where the operator Pre^{(v_j)}(S) returns the set of predecessor states to the set S ⊆ ℝ^{n_x} for a fixed discrete input v_j:

Pre^{(v_j)}(S) := {x ∈ X | ∃ u ∈ U : A_{v_j} x + B_{v_j} u ∈ S}.

It follows that X_j is the union of all possible sets X^{(v_j, …, v_{N−1})}_j:

X_j = ⋃_{(v_j, …, v_{N−1}) ∈ 𝕄^{N−j}} X^{(v_j, …, v_{N−1})}_j,   (11)

and the following propositions can be established (see the Appendix for the corresponding proofs).

Proposition 1 For a given prediction horizon N and a fixed discrete input sequence (v_j, …, v_{N−1}), the set X^{(v_j, …, v_{N−1})}_j is a polytope. The sets X_j are non-convex in the general case, with the property that X_j ⊇ X_{j+1} for all j ∈ {0, …, N − 1}.

Proposition 2 The RHC strategy (7) is recursively feasible.

Proposition 3 The origin of the closed-loop system (8) is asymptotically stable with domain of attraction X_0.
Propositions 1-3 are extensions of results known from the literature on RHC for systems without switching, as e.g. in [35]. More particularly, the sets X_j are extensions (of the feasible sets considered there) with respect to the discrete inputs. If X_N is control-invariant (as defined in the proof of Proposition 1), then the sets X_j share the property (with the sets of the case without switching) that X_j grows as j decreases, and stops growing when reaching the maximal control-invariant set (see e.g. [35, Remark 11.3]). Note that X_N is a singleton here, which contains the origin and thus is control-invariant.

Properties of the RHC Problem
Let V*_j(x_{j|k}, v_{j|k}) denote the optimal cost-to-go for steering x_{j|k} into the origin within N − j steps for a chosen discrete input sequence v_{j|k}:

V*_j(x_{j|k}, v_{j|k}) := min_{u_{j|k}} J_j(x_{j|k}, u_{j|k}, v_{j|k})
s.t. x_{i+1|k} = f(x_{i|k}, u_{i|k}, v_{i|k}),  x_{i|k} ∈ X,  u_{i|k} ∈ U,  x_{N|k} = 0.   (12)

The optimization problem (12) is a quadratic program (QP), and has a feasible solution if and only if x_{j|k} ∈ X^{(v_{j|k}, …, v_{N−1|k})}_j. As a convention, V*_j(x_{j|k}, v_{j|k}) is set to infinity for the case that (12) has no feasible solution.
The optimal cost-to-go J*_j(x_{j|k}) for steering x_{j|k} into the origin within N − j steps, as well as the corresponding input sequences u*_{j|k} = Φ^{u*}_j(x_{j|k}) and v*_{j|k} = Φ^{v*}_j(x_{j|k}), may be obtained by solving a QP for each possible discrete input sequence v_{j|k}:

J*_j(x_{j|k}) = min_{v_{j|k} ∈ 𝕄^{N−j}} V*_j(x_{j|k}, v_{j|k}).   (13)

Denote by U^{v_j}(x_{j|k}, v_{j|k}) the set that contains the admissible continuous inputs for a state x_{j|k} and a discrete input v_{j|k} to reach the state set X_{j+1}:

U^{v_j}(x_{j|k}, v_{j|k}) := {u ∈ U | A_{v_{j|k}} x_{j|k} + B_{v_{j|k}} u ∈ X_{j+1}}.   (14)

Instead of solving a QP for each possible discrete input sequence, the optimal cost-to-go J*_0(x_k) and the optimal RHC strategy (7) may be computed, in principle, by setting j = 0 and solving the optimization problem:

J*_j(x_{j|k}) = min_{v_{j|k} ∈ 𝕄} min_{u_{j|k} ∈ U^{v_j}(x_{j|k}, v_{j|k})} [ g(x_{j|k}, u_{j|k}) + J*_{j+1}(f(x_{j|k}, u_{j|k}, v_{j|k})) ]   (17)

for each possible discrete input v_{j|k} ∈ 𝕄 and just one step, leading to:

Φ^{v*}_j(x_{j|k}) ∈ arg min_{v_{j|k} ∈ 𝕄} min_{u_{j|k} ∈ U^{v_j}(x_{j|k}, v_{j|k})} [ g(x_{j|k}, u_{j|k}) + J*_{j+1}(f(x_{j|k}, u_{j|k}, v_{j|k})) ]   (19)

SN Computer Science
and:

Φ^{u*}_j(x_{j|k}) ∈ arg min_{u_{j|k} ∈ U^{v_j}(x_{j|k}, Φ^{v*}_j(x_{j|k}))} [ g(x_{j|k}, u_{j|k}) + J*_{j+1}(f(x_{j|k}, u_{j|k}, Φ^{v*}_j(x_{j|k}))) ].   (20)

This requires, of course, that J*_{j+1} and U^{v_j}(x_{j|k}, v_{j|k}) are already known and that a globally optimal solution is found. By definition, J*_N is equal to the terminal cost (4), i.e. J*_N(x_{N|k}) := g_N(x_{N|k}). The following propositions establish that the optimization problem (17) is a difficult one in the general case, since it constitutes a nonlinear program with non-convex objective function J*_{j+1} and non-convex constraints U^{v_j}(x_{j|k}, v_{j|k}). (The notion of a pointwise minimum used below is defined analogously to the pointwise maximum in [36].)

Proposition 4
The optimal cost-to-go function J*_j, j ∈ {0, …, N − 1}, is in the general case a pointwise minimum of functions that are convex and piecewise quadratic on polyhedra.

Proposition 5
The set U^{v_j}(x_{j|k}, v_{j|k}) is in the general case a union of polytopes and thus non-convex.

Challenges and Objective
Problem 1 is an MIQP and known to be NP-hard. The number of possible discrete input sequences, given by |𝕄^N| = M^N, grows exponentially with the prediction horizon N. Thus, the trivial approach of solving a QP for each possible discrete input sequence to solve Problem 1 is computationally intractable in almost all cases. While more efficient approaches to solving MIQPs exist (such as branch-and-bound or branch-and-cut techniques), these approaches typically still require too much time for the online optimization in RHC.
At first sight, the approach of computing the optimal RHC strategy (7) according to (19) and (20) requires only the solution of M optimization problems. However, these optimization problems are in general nonlinear programs with non-convex objective function J*_1 and non-convex constraints U^{v_0}(x_k, v_k), and thus challenging to solve. Moreover, J*_1 has a complicated form (pointwise minimum of functions that are convex and piecewise quadratic on polyhedra, see Proposition 4), and the derivation of an analytic solution is usually not possible. Last but not least, the proof of Proposition 5 provides the insight that the determination of U^{v_0}(x_k, v_k) relies on the computation of the union of M^{N−1} polytopes, and requires the expensive offline computation of M^{N−1} polytopes X^{(v_1, …, v_{N−1})}_1. Thus, the objective of the further derivations of this work is to efficiently approximate the optimal RHC strategy to make the computation of the control inputs faster, while guaranteeing properties like constraint satisfaction, recursive feasibility, and asymptotic stability.
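The combinatorial explosion can be made concrete with a brute-force sketch: all M^N discrete sequences are enumerated and, for each one, the continuous inputs are optimized. The per-sequence QP is replaced here by a coarse grid search over u, purely for illustration; the system data are the same illustrative assumptions as before, not from the paper:

```python
# Brute-force illustration of why Problem 1 explodes: M^N discrete sequences,
# each requiring its own continuous optimization (here a grid-search stand-in
# for the QP). All numerical data are illustrative assumptions.
import itertools

A = {1: 1.2, 2: 0.5}
B = {1: 1.0, 2: 0.8}
Q, R, Q_N = 1.0, 0.1, 2.0

def seq_cost(x0, u_seq, v_seq):
    x, J = x0, 0.0
    for u, v in zip(u_seq, v_seq):
        J += Q * x**2 + R * u**2
        x = A[v] * x + B[v] * u
    return J + Q_N * x**2

def enumerate_best(x0, N, u_grid):
    best_J, best_u, best_v = float("inf"), None, None
    for v_seq in itertools.product(A.keys(), repeat=N):      # M^N sequences
        for u_seq in itertools.product(u_grid, repeat=N):    # QP stand-in
            J = seq_cost(x0, u_seq, v_seq)
            if J < best_J:
                best_J, best_u, best_v = J, u_seq, v_seq
    return best_J, best_u, best_v

best_J, best_u, best_v = enumerate_best(1.0, 2, [-1.0, -0.5, 0.0, 0.5, 1.0])
```

Even in this toy setting the work doubles with every added horizon step; for the M = 4, N = 6 example later in the paper, 4^6 = 4096 QPs would have to be solved per time instant.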

Simplified RHC Problem
As discussed above, the optimal RHC strategy could be computed by solving (19) and (20). The problem is, however, that the set of admissible continuous inputs U^{v_0}(x_k, v_k) is non-convex. While established methods exist for nonlinear programs with convex constraints, the solution of a nonlinear program with non-convex constraints (arising for each discrete input sequence) is computationally intractable for online application in most cases. Moreover, the determination of U^{v_0}(x_k, v_k) is computationally demanding. The objective of this section is to introduce a simplified RHC problem based on convex control-invariant subsets X̃_j of the (generally) non-convex sets X_j. By doing so, the set of admissible continuous inputs Ũ^{v_0}(x_k, v_k) is convex as well, and its computation is less demanding. Let the sets X̃_j be (recursively) defined by:

X̃_N := {0},   X̃_j := {x ∈ X | ∀ v ∈ 𝕄 ∃ u ∈ U : A_v x + B_v u ∈ X̃_{j+1}}.   (21)

Thus, for any state x_{j|k} ∈ X̃_j and an arbitrary choice of the discrete input v_{j|k} ∈ 𝕄, at least one admissible continuous input u_{j|k} ∈ U exists such that f(x_{j|k}, u_{j|k}, v_{j|k}) ∈ X̃_{j+1}. It follows from the definition that X̃_j is the intersection of all possible polytopes X^{(v_j, …, v_{N−1})}_j. Hence, X̃_j is a polytope as well, with the property that X̃_j ⊆ X_j. It is worth mentioning that it is possible, in principle, to determine a further polytopic inner approximation of X̃_j with a smaller number of facets, if it is necessary to reduce complexity. Algorithm 1 provides a method for computing the sets X̃_j recursively.
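For the scalar-interval setting sketched earlier, the recursion for the convex subsets X̃_j differs from that for X_j in a single point: since an admissible u must now exist for every mode, the per-mode radii are combined by a minimum instead of a maximum. The modes below are illustrative assumptions with A_i > 0:

```python
# Scalar-interval sketch of the recursion for the convex subsets X~_j:
# intersection over modes -> the smallest per-mode radius wins.
# Modes are illustrative assumptions, not taken from the paper's example.
def tilde_radii(N, modes, x_max=1.0, u_max=4.0):
    """Radii of X~_0, ..., X~_N for a scalar DSLS with X = [-x_max, x_max],
    U = [-u_max, u_max]: x lies in X~_j iff for EVERY mode (A, B) some
    admissible u gives A*x + B*u in X~_{j+1}."""
    r, radii = 0.0, [0.0]
    for _ in range(N):
        r = min(x_max, min((r + u_max * B) / A for A, B in modes))
        radii.append(r)
    return radii[::-1]  # [radius of X~_0, ..., radius of X~_N]

radii = tilde_radii(2, [(1.5, 0.1), (0.5, 0.8)])
```

With a mode that combines fast dynamics and weak input authority (A = 1.5, B = 0.1), the sets X̃_j stay strictly smaller than the state constraint interval, illustrating the price of the inner approximation X̃_j ⊆ X_j.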
Problem 2 (Simplified RHC Problem) For a current state x_k at time k and the DSLS (1) subject to (2), find a continuous input sequence ũ*_{0|k} := (ũ*_{0|k}, …, ũ*_{N−1|k}) and a discrete input sequence ṽ*_{0|k} := (ṽ*_{0|k}, …, ṽ*_{N−1|k}) for a prediction horizon N that steers the state into the origin within N time steps, while satisfying x_{j|k} ∈ X̃_j and minimizing the quadratic cost function (3):

{ũ*_{0|k}, ṽ*_{0|k}} := arg min_{u_{0|k}, v_{0|k}} J_0(x_{0|k}, u_{0|k}, v_{0|k})
s.t. x_{0|k} = x_k,  x_{j+1|k} = f(x_{j|k}, u_{j|k}, v_{j|k}),  x_{j|k} ∈ X̃_j,  u_{j|k} ∈ U,  v_{j|k} ∈ 𝕄,  x_{N|k} = 0.   (22)

In case a feasible solution exists, the application of the first elements of the input sequences to (1) at time k:

u_k = ũ*_{0|k}(x_k),   v_k = ṽ*_{0|k}(x_k),   (23)

leads to the closed-loop dynamics:

x_{k+1} = f̃_cl(x_k) := A_{ṽ*_{0|k}} x_k + B_{ṽ*_{0|k}} ũ*_{0|k}.   (24)

Recursive feasibility of the RHC strategy (23) and asymptotic stability of the origin of the closed-loop system (24) with a domain of attraction X̃_0 can be proven in accordance with Propositions 2 and 3. Again, the optimal cost-to-go J̃*_j(x_{j|k}) and the corresponding input sequences ũ*_{j|k} = Φ̃^{u*}_j(x_{j|k}) and ṽ*_{j|k} = Φ̃^{v*}_j(x_{j|k}) may be obtained by solving a QP:

Ṽ*_j(x_{j|k}, v_{j|k}) := min_{u_{j|k}} J_j(x_{j|k}, u_{j|k}, v_{j|k})  s.t. the constraints of (22) for fixed v_{j|k},   (25)

for each possible discrete input sequence v_{j|k} ∈ 𝕄^{N−j}. The polytopes U and X̃_j can be written in half-space representation as:

U = {u ∈ ℝ^{n_u} | H_U u ≤ h_U},   X̃_j = {x ∈ ℝ^{n_x} | H_{X̃_j} x ≤ h_{X̃_j}},

with matrices H_U, H_{X̃_j} and vectors h_U, h_{X̃_j} of appropriate dimensions. By use of these sets, the set of admissible continuous inputs for a state x_{j|k} and a discrete input v_{j|k} at prediction time k + j can be written here as:

Ũ^{v_j}(x_{j|k}, v_{j|k}) := {u ∈ ℝ^{n_u} | H_U u ≤ h_U,  H_{X̃_{j+1}}(A_{v_{j|k}} x_{j|k} + B_{v_{j|k}} u) ≤ h_{X̃_{j+1}}}.

Obviously, Ũ^{v_j}(x_{j|k}, v_{j|k}) is a polytope. The optimal cost-to-go J̃*_0(x_k) and the optimal RHC strategy (23) of the simplified RHC problem may alternatively be computed by setting j = 0 and solving the nonlinear program:

J̃*_j(x_{j|k}) = min_{v_{j|k} ∈ 𝕄} min_{u_{j|k} ∈ Ũ^{v_j}(x_{j|k}, v_{j|k})} Q̃*_j(x_{j|k}, u_{j|k}, v_{j|k}),
Q̃*_j(x_{j|k}, u_{j|k}, v_{j|k}) := g(x_{j|k}, u_{j|k}) + J̃*_{j+1}(f(x_{j|k}, u_{j|k}, v_{j|k})).   (30)

The constraints Ũ^{v_j}(x_{j|k}, v_{j|k}) are convex in this case, and it applies that:

Φ̃^{v*}_j(x_{j|k}) ∈ arg min_{v_{j|k} ∈ 𝕄} min_{u_{j|k} ∈ Ũ^{v_j}(x_{j|k}, v_{j|k})} Q̃*_j(x_{j|k}, u_{j|k}, v_{j|k})   (32)

and:

Φ̃^{u*}_j(x_{j|k}) ∈ arg min_{u_{j|k}} J_j(x_{j|k}, u_{j|k}, Φ̃^{v*}_j(x_{j|k}))  subject to (25b).   (33)

Again, J̃*_N is equal to the terminal cost (4) by definition, i.e. J̃*_N(x_{N|k}) := g_N(x_{N|k}). The optimal cost-to-go J̃*_j is still a pointwise minimum of functions that are convex and piecewise quadratic on polyhedra, such that the challenge of deriving an analytical expression remains.
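In one dimension, the polytope Ũ^{v_j}(x, v) is simply an interval: the two half-space conditions u ∈ U and A x + B u ∈ X̃_{j+1} intersect to a closed interval (possibly empty). The following sketch computes it under the assumption B > 0, with illustrative numbers:

```python
# 1-D illustration of the admissible-input set U~^{v_j}(x, v): for scalar
# dynamics and interval sets, the intersection of u in [-u_max, u_max] and
# A*x + B*u in [-r_next, r_next] is itself an interval (a 1-D polytope).
# All numbers are illustrative assumptions; B > 0 is assumed.
def admissible_interval(x, A, B, r_next, u_max=4.0):
    """Return (lo, hi) describing {u : |u| <= u_max, |A*x + B*u| <= r_next},
    or None if the intersection is empty."""
    lo = max(-u_max, (-r_next - A * x) / B)
    hi = min(u_max, (r_next - A * x) / B)
    return (lo, hi) if lo <= hi else None

iv = admissible_interval(0.5, 1.2, 1.0, 0.3)
```

The emptiness check mirrors infeasibility of the QP (25) for a given discrete input: if the state is too far out or the input bound too tight, no admissible continuous input reaches X̃_{j+1}.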
Problem 2 can be transformed into the finite-horizon control problem considered in previous work of the authors [16]. There, the optimal finite-horizon control laws have been approximated by (deep) neural networks. These finite-horizon control laws are fundamental for the RHC approach presented in Section "Receding-Horizon Control with Parametric Function Approximators"; thus, the relevant results from [16] are summarized in Section "Finite-Horizon Control with Parametric Function Approximators" and tailored to the problem formulated before.

Finite-Horizon Control with Parametric Function Approximators
For the DSLS with finite horizon N:

x_{j+1} = f(x_j, u_j, v_j) = A_{v_j} x_j + B_{v_j} u_j,   j ∈ {0, …, N − 1},   (34)

subject to the constraints:

x_j ∈ X̃_j,   u_j ∈ U,   v_j ∈ 𝕄,   (35)

consider the following problem, into which the simplified RHC Problem 2 can be transformed with x_k as initial state x_0.

Problem 3 (Finite-Horizon Control Problem) For a given initial state x_0 at time j = 0, the DSLS (34) subject to (35), and a finite time horizon N, find input sequences ũ*_0 and ṽ*_0 that steer x_0 into the origin within N time steps, while minimizing (3):

{ũ*_0, ṽ*_0} := arg min_{u_0, v_0} J_0(x_0, u_0, v_0)  s.t. (34), (35), x_N = 0.   (36)

The optimal finite-horizon control law:

u_j = ũ*_j(x_j),   v_j = ṽ*_j(x_j),   (37)

with ṽ*_j(x_j) and ũ*_j(x_j) as defined in (32) and (33), does not only produce the optimal sequences ṽ*_0 and ũ*_0 for a single initial state, but for all initial states x_0 ∈ X̃_0. This control law is, however, not readily applicable, as discussed in Section "Problem Formulation and Preliminaries".
In [16], the functions ũ*_j and ṽ*_j are approximated with the help of neural networks. The main ideas required for the further method development in Section "Receding-Horizon Control with Parametric Function Approximators" are briefly recalled here.
The approximation of the cost-to-go functions J̃*_j by parametric functions Ĵ_j with real-valued parameter vectors r^J_j constitutes a so-called approximation in value space [30]. This allows approximating the function Q̃*_j defined in (30) by solving the following one-step look-ahead optimization problem with convex constraints:

min_{u_j ∈ Ũ^{v_j}(x_j, v_j)} [ g(x_j, u_j) + Ĵ_{j+1}(f(x_j, u_j, v_j), r^J_{j+1}) ].   (38)

Let ũ_VS,j(x_j, v_j) denote the solution of the optimization problem (38):

ũ_VS,j(x_j, v_j) ∈ arg min_{u_j ∈ Ũ^{v_j}(x_j, v_j)} [ g(x_j, u_j) + Ĵ_{j+1}(f(x_j, u_j, v_j), r^J_{j+1}) ].   (39)

The finite-horizon control law (37) can then be approximated by:

u_j = ũ_VS,j(x_j, ṽ_VS,j(x_j)),   v_j = ṽ_VS,j(x_j),   (40)

with:

ṽ_VS,j(x_j) ∈ arg min_{v_j ∈ 𝕄} [ g(x_j, ũ_VS,j(x_j, v_j)) + Ĵ_{j+1}(f(x_j, ũ_VS,j(x_j, v_j), v_j), r^J_{j+1}) ].   (41)

If a closed-form expression for the partial derivative ∂Ĵ_{j+1}/∂x_{j+1} is available, well-established gradient methods can be used to solve (38). The satisfaction of the convex constraints u_j ∈ Ũ^{v_j}(x_j, v_j) in methods of this type is not a problem, see e.g. [37, Chapter 3]. This approach with neural networks approximating the cost-to-go has been proposed in [38] to guarantee constraint satisfaction for systems without switching. Note that satisfaction of the constraints (35) is guaranteed even in case of imperfect approximations of the optimal cost-to-go functions, and even if the iterative procedure of the gradient method is stopped before finding a local minimum. The alternative approach of approximating the functions ũ*_j, ṽ*_j directly by parametric functions ũ_PS,j, ṽ_PS,j with real-valued parameter vectors r^u_j, r^v_j constitutes a so-called approximation in policy space [30]. In what follows, a possible realization of ṽ_PS,j is presented. Motivated by classification tasks, parametric functions:

p_j(x_j, r^v_j) := (p_{1,j}(x_j), …, p_{M,j}(x_j))

are introduced, which are trained to predict the probability p_{v_j,j} of a discrete input v_j being optimal for state x_j at time j. Note that p_j(x_j) represents by definition a valid probability distribution. The function ṽ_PS,j can be defined as the one which assigns to each state x_j at time j the discrete input v_j with the highest predicted probability of being optimal. The procedure of establishing p_j as a neural network is described in Section "Neural Networks as Parametric Approximators".
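The one-step look-ahead minimization behind the value-space approximation can be sketched with projected gradient descent: minimize g(x, u) + Ĵ(f(x, u, v)) over an admissible interval of u. A simple quadratic surrogate Ĵ(x) = p·x² stands in for the neural-network cost-to-go; all constants are illustrative assumptions:

```python
# Projected gradient sketch for the one-step look-ahead problem (38)-style:
# minimize Q*x^2 + R*u^2 + p*(A*x + B*u)^2 over u in a box. The quadratic
# surrogate Jhat(x) = p*x^2 and all constants are illustrative assumptions.
def u_value_space(x, A, B, Q=1.0, R=0.1, p=2.0, u_box=(-4.0, 4.0),
                  step=0.2, iters=200):
    u = 0.0
    for _ in range(iters):
        x_next = A * x + B * u
        grad = 2.0 * R * u + 2.0 * p * B * x_next   # d/du [g + Jhat(f)]
        u = u - step * grad
        u = min(max(u, u_box[0]), u_box[1])         # projection onto the box
    return u

u_star = u_value_space(0.5, 1.2, 1.0)
```

Because the iterate is projected back into the admissible set after every step, the result is feasible even if the iteration is stopped early, which mirrors the constraint-satisfaction argument made above for imperfect approximations.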
The finite-horizon control law (37) can be approximated on the basis of approximation in policy space by:

u_j = ũ_PS,j(x_j), projected onto Ũ^{v_j}(x_j, ṽ_PS,j(x_j)) if inadmissible,   v_j = ṽ_PS,j(x_j).   (45)

The projection of an inadmissible input ũ_PS,j(x_j) ∉ Ũ^{v_j}(x_j, ṽ_PS,j(x_j)) onto the polytope Ũ^{v_j}(x_j, ṽ_PS,j(x_j)) in (45) can be computed efficiently as a QP. This has been proposed in [31] to guarantee constraint satisfaction of neural network controllers; here, the approach guarantees the satisfaction of the constraints (35).
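A minimal sketch of the policy-space scheme, under scalar assumptions: a softmax classifier proposes the discrete input with the highest predicted probability, and the proposed continuous input is projected onto the admissible interval (in one dimension, the projection QP reduces to clipping). The two "networks" below are hypothetical linear stand-ins, not trained models:

```python
# Policy-space sketch: softmax classifier for the discrete input, projection
# (here: clipping) of the continuous input onto the admissible interval.
# logits_fn and u_raw are hypothetical stand-ins for p_j and u_PS,j.
import math

def softmax(z):
    m = max(z)                      # shift for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def policy(x, u_box=(-0.5, 0.5)):
    logits = [2.0 * x, -x]          # hypothetical stand-in for p_j(x)
    probs = softmax(logits)
    v = 1 + probs.index(max(probs))           # discrete input in {1, 2}
    u_raw = -1.5 * x                           # stand-in for u_PS,j(x)
    u = min(max(u_raw, u_box[0]), u_box[1])    # projection = clip in 1-D
    return u, v, probs

u, v, probs = policy(1.0)
```

In higher dimensions the clip is replaced by the QP min ‖u − ũ_PS,j(x)‖² subject to the half-space constraints of Ũ^{v_j}, which is exactly the projection mentioned above.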

Training and Training Data Generation
The prediction of the optimal cost-to-go J̃*_j by Ĵ_j for a state x_j at time j can be treated as a regression task. Assume for the moment that a parametric function Ĵ_j and a data set consisting of state-cost pairs (x^s_j, J^s_j), s ∈ {1, …, q^J_j}, are available. Each value J^s_j denotes a regression target that represents a cost sample for the corresponding sample state x^s_j. The parameter vector r^J_j can then be adapted with the aim of improving the performance on the considered regression task by learning from the data set. Of course, a performance measure is required for this, and the mean-squared error is a typical choice. The adaptation procedure, typically named training, is an instance of supervised learning, for which several established algorithms exist, see e.g. [39]. The parameter vectors r^u_j and r^v_j can be adapted by supervised learning, too, requiring that data sets (x^s_j, u^s_j), s ∈ {1, …, q^u_j}, and (x^s_j, v^s_j), s ∈ {1, …, q^v_j}, are available.
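The regression step can be illustrated with a deliberately tiny model: fit a one-parameter surrogate Ĵ(x) = w·x² to state-cost samples by closed-form least squares and evaluate the mean-squared error. The target coefficient and data are illustrative assumptions:

```python
# Minimal regression sketch for the value-function training step: fit
# Jhat(x) = w * x^2 to state-cost pairs by closed-form least squares.
# The target w_true and sample grid are illustrative assumptions.
def fit_quadratic(samples):
    """samples: list of (x_s, J_s); returns w minimizing sum (w*x^2 - J)^2."""
    num = sum((x * x) * J for x, J in samples)
    den = sum((x * x) ** 2 for x, _ in samples)
    return num / den

def mse(samples, w):
    return sum((w * x * x - J) ** 2 for x, J in samples) / len(samples)

w_true = 1.7
data = [(x / 10.0, w_true * (x / 10.0) ** 2) for x in range(1, 11)]
w_fit = fit_quadratic(data)
```

A neural-network Ĵ_j replaces the single feature x² by a learned nonlinear basis, but the training objective (mean-squared error on state-cost pairs) is exactly the same.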

The training data may originate from offline solutions of the considered MIQP problem. This approach, however, may take too much time due to the exponential growth of the number of possible discrete input sequences with N. An alternative is the use of approximate dynamic programming or reinforcement learning methods. The offline procedure in Algorithm 2 constitutes an approximate dynamic programming example that extends the sequential dynamic programming procedure from [30] to DSLS.

The parametric approximators are realized as feedforward neural networks with L layers, where layer l maps its input h^(l−1) to its output by an affine transformation followed by an activation function:

h^(l) = σ^(l)(W^(l) h^(l−1) + b^(l)),   l ∈ {1, …, L}.   (46)

Such an affine transformation arises also in softmax output units, in which the i-th output h^(L)_i is set to:

h^(L)_i = exp(z_i) / Σ_{m} exp(z_m),   z = W^(L) h^(L−1) + b^(L).   (52)

The neural network (46) belongs to the family of parametric functions, whose shape is formed by the parameter vector that consists of the weights and biases:

r = (W^(1), b^(1), …, W^(L), b^(L)).
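The network structure can be sketched concretely for one hidden tanh layer and a softmax output unit with two classes; the weights below are hypothetical, chosen only to make the forward pass executable:

```python
# Forward pass of a tiny feedforward network: one hidden tanh layer and a
# softmax output over M = 2 classes. All weights are hypothetical.
import math

W1, b1 = [0.5, -1.0, 2.0], [0.1, 0.0, -0.2]   # hidden layer parameters
W2 = [[1.0, -0.5, 0.3], [-1.0, 0.5, -0.3]]    # output layer weights
b2 = [0.0, 0.1]                                # output layer biases

def forward(x):
    h = [math.tanh(w * x + b) for w, b in zip(W1, b1)]   # hidden layer
    z = [sum(wi * hi for wi, hi in zip(row, h)) + bo
         for row, bo in zip(W2, b2)]                     # affine output
    m = max(z)
    e = [math.exp(zi - m) for zi in z]                   # stable softmax
    s = sum(e)
    return [ei / s for ei in e]

p = forward(0.7)
```

The outputs are strictly between 0 and 1 and sum to 1, i.e., they form a valid probability distribution over the discrete inputs, as required for p_j.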

Neural Networks as Parametric Approximators
Subsequently, the neural network structure (46) is considered as the basis for the parametric approximators. For the cost-to-go function approximators Ĵ_j, the use of continuous and continuously differentiable activation functions (such as sigmoid functions) and linear output units is proposed. As shown in [38], this allows one to derive closed-form expressions for the partial derivatives of h with respect to its arguments via the chain rule, since each layer is an affine map composed with a differentiable activation function. Linear output units are further proposed for establishing ũ_PS,j. Here, it is not necessarily required that the activation functions are continuous and continuously differentiable. The softmax output unit, on the other hand, is proposed as output unit for p_j. It is common to use softmax units as output units to represent probability distributions over different classes [39]. According to (52), each output of a neural network with softmax output units lies between 0 and 1, and all outputs sum up to 1, leading to a valid probability distribution.
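The closed-form input derivative that enables the gradient method can be sketched for a scalar one-hidden-layer tanh network: h(x) = Σ_i w2_i tanh(w1_i x + b_i) gives dh/dx = Σ_i w2_i (1 − tanh(·)²) w1_i by the chain rule. The weights are hypothetical; the derivative is checked against a central finite difference:

```python
# Closed-form input derivative of a scalar tanh network, verified against a
# finite difference. All weights are hypothetical.
import math

w1, b, w2 = [0.5, -1.0, 2.0], [0.1, 0.0, -0.2], [1.0, -0.5, 0.3]

def h(x):
    return sum(w2i * math.tanh(w1i * x + bi)
               for w1i, bi, w2i in zip(w1, b, w2))

def dh_dx(x):
    # chain rule: d/dx tanh(a*x + c) = (1 - tanh(a*x + c)^2) * a
    return sum(w2i * (1.0 - math.tanh(w1i * x + bi) ** 2) * w1i
               for w1i, bi, w2i in zip(w1, b, w2))

x0, eps = 0.3, 1e-6
fd = (h(x0 + eps) - h(x0 - eps)) / (2 * eps)
```

Having this derivative in closed form is what makes the projected gradient solution of the one-step look-ahead problem cheap at runtime.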

Receding-Horizon Control with Parametric Function Approximators
The RHC strategy (23) can be computed by solving a QP for each possible discrete input sequence. As already mentioned, this procedure becomes computationally intractable rapidly for increasing N, due to the exponential growth of the number of possible discrete input sequences. The approach presented in this section aims at approximating the RHC strategy to make online application possible, and is based on the idea of solving a QP only for a small number of discrete input sequences. Of course, a procedure is desirable which selects those discrete input sequences that are promising candidates for being the true optimal one(s). The procedure proposed in this section is based on the ideas for approximating the finite-horizon control law by neural networks, as presented in the previous section.

Let V(k) denote a small set of selected discrete input sequences at time k, and suppose for a moment that this set was available. For a given state x_k, the approach computes the input sequences ṽ_{0|k} = (ṽ_{0|k}, …, ṽ_{N−1|k}) and ũ_{0|k} = (ũ_{0|k}, …, ũ_{N−1|k}) by solving the QP defined in (25) for each discrete input sequence v_{0|k} ∈ V(k):

ṽ_{0|k} ∈ arg min_{v_{0|k} ∈ V(k)} Ṽ*(x_k, v_{0|k}).   (56)

The approximated RHC strategy is obtained by applying the first element ũ_{0|k} of the continuous input sequence ũ_{0|k} and the first element ṽ_{0|k} of the discrete input sequence ṽ_{0|k} to the DSLS (1) at time k:

u_k = ũ_{0|k}(x_k),   v_k = ṽ_{0|k}(x_k).   (57)

The closed-loop dynamics for the DSLS (1) controlled by the approximated RHC strategy is then:

x_{k+1} = f̂_cl(x_k) := A_{ṽ_{0|k}} x_k + B_{ṽ_{0|k}} ũ_{0|k}.   (58)

Subsequently, J̃_RHC,0 is defined as:

J̃_RHC,0(x_k) := Ṽ*(x_k, ṽ_{0|k}),   (59)

and constitutes, obviously, an upper bound on the optimal cost-to-go J̃*_0. For the determination of V(k), M candidate sequences v^[i]_{0|k} = (v^[i]_{0|k}, …, v^[i]_{N−1|k}), i ∈ {1, …, M}, are generated by a combination of approximation in value space and approximation in policy space, as described next. First, for each possible first discrete input v^[i]_{0|k} = i, the remaining subsequence is determined recursively as illustrated in Fig. 2, where x^[i]_{0|k} is the current state x_k at time k, i.e. x^[i]_{0|k} = x_k.
Recall that the value of the function ũ_VS,j for state x^[i]_{j|k} and discrete input v^[i]_{j|k} results according to (39) from the solution of the nonlinear program (38) with convex constraints. The application of well-established gradient methods is possible here due to the availability of the closed-form expression for ∂Ĵ_{j+1}/∂x_{j+1}. In addition, one further discrete input sequence v^[M+1]_{0|k} = (v^[M+1]_{0|k}, …, v^[M+1]_{N−1|k}) is selected, chosen to guarantee asymptotic stability. Motivated by the proof of Proposition 3, it is obtained for k > 0 by shifting the sequence applied at time k − 1; for k = 0, on the other hand, it is arbitrarily selected from 𝕄^N. Algorithm 3 summarizes the procedure of determining the set V(k).
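The selection step itself can be sketched directly: instead of all M^N discrete sequences, only a small candidate set V(k) is evaluated, and the sequence with the smallest cost is applied. The per-sequence continuous optimization is again replaced by a grid search as a stand-in for the QP (25); all data are illustrative assumptions:

```python
# Sketch of the receding-horizon selection step: evaluate only a small
# candidate set V(k) of discrete sequences and keep the cheapest one.
# Grid search stands in for the per-sequence QP; data are illustrative.
import itertools

A = {1: 1.2, 2: 0.5}
B = {1: 1.0, 2: 0.8}
Q, R, Q_N = 1.0, 0.1, 2.0
U_GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]

def best_cost_for_sequence(x0, v_seq):
    best = float("inf")
    for u_seq in itertools.product(U_GRID, repeat=len(v_seq)):  # QP stand-in
        x, J = x0, 0.0
        for u, v in zip(u_seq, v_seq):
            J += Q * x**2 + R * u**2
            x = A[v] * x + B[v] * u
        best = min(best, J + Q_N * x**2)
    return best

def approx_rhc_step(x0, candidates):
    costs = {v_seq: best_cost_for_sequence(x0, v_seq) for v_seq in candidates}
    v_best = min(costs, key=costs.get)
    return v_best, costs[v_best]

V_k = [(1, 2), (2, 2), (2, 1)]      # small candidate set V(k)
v_best, J_rhc = approx_rhc_step(1.0, V_k)
```

The returned cost plays the role of J̃_RHC,0(x_k): it is the best cost over V(k) only, and therefore an upper bound on the optimal cost over all discrete sequences.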
Theorem 1 Let x_0 ∈ X̃_0. Then the approximated RHC strategy (57) is recursively feasible, and the origin of the closed-loop system (58) is asymptotically stable with domain of attraction X̃_0.

Proof of Theorem 1 Given x_k at time k, the approximated RHC strategy solves the QP defined in (25) for all v_{0|k} ∈ V(k). The definition of X̃_0 ensures that the QP has a feasible solution for any v_{0|k} ∈ 𝕄^N ⊇ V(k) if x_k ∈ X̃_0, such that x_k ∈ X̃_0 ensures feasibility of the approximated RHC strategy, too. If feasible, the constraints of the QP enforce that f̂_cl(x_k) ∈ X̃_1. It follows immediately from Proposition 1 and (21) that X̃_0 ⊇ X̃_1, such that recursive feasibility of the approximated RHC strategy is guaranteed for x_0 ∈ X̃_0.

Now consider an x_0 ∈ X̃_0, and let x̃_{0|0} = (x̃_{0|0}, x̃_{1|0}, …, x̃_{N|0}) be the state sequence with x̃_{N|0} = 0 and x̃_{0|0} := x_0 that results from the input sequences ũ_{0|0} = Φ̃^u_0(x_0) and ṽ_{0|0} = Φ̃^v_0(x_0). Due to u_0 = ũ_{0|0} and v_0 = ṽ_{0|0}, it follows from (1) that x_1 = x̃_{1|0}. The shifted discrete input sequence v_{0|1} corresponds to the continuous input sequence u_{0|1} = (ũ_{1|0}, …, ũ_{N−1|0}, 0) and the state sequence x_{0|1} = (x̃_{1|0}, …, x̃_{N|0}, 0). Since the sequences x_{0|1} and u_{0|1} satisfy x_{j|1} ∈ X̃_j and u_{j|1} ∈ U, respectively, they are admissible, such that the cost:

J_0(x_1, u_{0|1}, v_{0|1}) = J̃_RHC,0(x_0) − g(x_0, ũ_{0|0})

constitutes an upper bound on Ṽ*(x_1, v_{0|1}). On the other hand, v_{0|1} ∈ V(1), such that Ṽ*(x_1, v_{0|1}) constitutes an upper bound on J̃_RHC,0(x_1) = Ṽ*(x_1, ṽ_{0|1}). Hence:

J̃_RHC,0(x_1) ≤ J̃_RHC,0(x_0) − g(x_0, ũ_{0|0}),

and it follows by induction that:

J̃_RHC,0(x_{k+1}) ≤ J̃_RHC,0(x_k) − g(x_k, ũ_{0|k}).   (62)

Since x_k ∈ X̃_0 implies f̂_cl(x_k) ∈ X̃_0, the state sequence of the closed-loop system (58) lies within X̃_0 for any x_0 ∈ X̃_0. Note that the stage cost g and the terminal cost g_N are continuous and positive definite functions, and that J̃_RHC,0(x_k) is lower bounded by zero. Hence, J̃_RHC,0(x_k) decreases according to (62) along any state sequence that starts in X̃_0, i.e., convergence to the origin without leaving X̃_0 is guaranteed.

Numerical Example
This section provides a numerical example for the illustration and the evaluation of the proposed approach, inspired by the numerical example considered in [16] for the finite-horizon case.
The switched system (1) is parameterized by the matrices:

and is subject to polytopic constraints (2) with X = {x ∈ ℝ² | |x_i| ≤ 1, i ∈ {1, 2}} and U = {u ∈ ℝ | |u| ≤ 4}. Furthermore, a quadratic cost function of type (3) is chosen with prediction horizon N = 6 and:

As in [16], all neural networks required for the proposed approach consist of one hidden layer with 50 units (i.e. S^(1) = 50). In each hidden unit, the hyperbolic tangent has been chosen as activation function. The neural networks have been trained offline according to Algorithm 2, choosing q_j = 1000.
To evaluate the approximation quality of the approximated RHC strategy, 1000 states x_p have been generated by gridding the set X̃_0. The latter has been determined with Algorithm 1 and is marked by the shaded polytope in Fig. 3.
For each x p , the optimal cost-to-go J * 0 (x p ) and its approximation J RHC,0 (x p ) for = 1 have been computed. The distribution of the optimal cost J * 0 (x p ) for the different states is shown in Fig. 4. The comparison of the optimal cost J * 0 (x p ) with its approximation J RHC,0 (x p ) yield a mean-squared error of only 6.99 × 10 −5 .
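The evaluation procedure above (gridding candidate states and comparing optimal against approximated costs) can be sketched as follows. The grid resolution is arbitrary, and the membership test against the polytope X_0 from Algorithm 1 is omitted for brevity; only the box X = {|x_i| ≤ 1} is gridded here.

```python
import numpy as np

# Grid candidate states over the box X = {|x_i| <= 1}; in the paper the
# points are additionally filtered for membership in X_0 (omitted here).
n_side = 32
g = np.linspace(-1.0, 1.0, n_side)
states = np.array([[x1, x2] for x1 in g for x2 in g])

def mse(j_opt, j_approx):
    """Mean-squared error between optimal and approximated cost-to-go."""
    j_opt, j_approx = np.asarray(j_opt, float), np.asarray(j_approx, float)
    return float(np.mean((j_opt - j_approx) ** 2))
```

Evaluating `mse` on the two cost arrays over all grid points yields the scalar quality measure reported in the text.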
The average online computation time for the determination of the optimal costs by complete enumeration over all 4^N possible discrete input sequences (and solving one QP each) was 19.8 s on a standard notebook (Intel® Core™ i5-7200U processor), where the CPLEX QP solver from the IBM® ILOG® CPLEX® Optimization Studio has been used for the solution of the QPs. In contrast, when applying the proposed scheme, the average online computation time for determining J_{RHC,0}(x_p) (and thus the control inputs) was only 0.96 s. Figure 3 shows a state sequence obtained from the approximated RHC strategy for an exemplary initial state x_0, and demonstrates the asymptotic stability of the origin as proven in Theorem 1.
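The enumeration baseline described above can be sketched as follows: iterate over all possible discrete input sequences of length N and keep the cheapest. The mode matrices below are hypothetical, and the constrained QP solved per sequence in the paper is replaced by a trivial zero-continuous-input placeholder so that the sketch stays self-contained.

```python
import itertools
import numpy as np

# Hypothetical two-mode autonomous dynamics (stand-ins for the paper's
# matrices); the inner per-sequence QP is replaced by u = 0 throughout.
A = {0: np.array([[1.0, 0.1], [0.0, 1.0]]),
     1: np.array([[0.9, 0.0], [0.1, 0.9]])}
Q = np.eye(2)  # quadratic stage and terminal weight
N = 4

def cost_of_sequence(x0, seq):
    """Accumulated quadratic cost for one fixed discrete input sequence."""
    x, cost = np.asarray(x0, float), 0.0
    for v in seq:
        cost += x @ Q @ x
        x = A[v] @ x            # autonomous update (inner QP omitted)
    return float(cost + x @ Q @ x)  # add terminal cost

def enumerate_best(x0, modes=(0, 1), horizon=N):
    """Complete enumeration over all len(modes)**horizon sequences."""
    return min(((cost_of_sequence(x0, seq), seq)
                for seq in itertools.product(modes, repeat=horizon)),
               key=lambda t: t[0])

best_cost, best_seq = enumerate_best([1.0, -0.5])
```

With 4 modes and N = 6 this enumeration visits 4^6 = 4096 sequences, each requiring a QP solve in the paper's setting, which explains the large gap between the 19.8 s baseline and the 0.96 s of the approximated strategy.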

Conclusion
This paper has considered the optimal control of discrete-time constrained switched linear systems (with externally forced switching) using the receding-horizon control principle. Building on previous work of the authors, an approach for approximating the optimal receding-horizon control strategy with neural networks as parametric function approximators has been developed. Important properties such as the guaranteed satisfaction of the polytopic constraints, recursive feasibility, and asymptotic stability of the origin of the closed-loop system under the approximated receding-horizon control strategy have been proven. The numerical example has shown that the proposed approach allows for the computation of approximating but close-to-optimal receding-horizon control strategies, and that this computation is considerably faster than solving the same problem by mixed-integer quadratic programming in each step. The focus of this work has been to provide theoretical guarantees for the proposed approach, rather than to reduce the computation time through an efficient implementation. A streamlined implementation of the proposed algorithm certainly allows for a further reduction of the computation time. Furthermore, an in-depth treatment of efficient training data generation has been out of the scope of this work. Investigating alternative schemes to sequential dynamic programming for the efficient generation of training data is a worthwhile direction of future work for improving the applicability of the proposed scheme.

Proofs of the Propositions
Proof of Proposition 1 The terminal set X_N ⊆ X is a singleton that contains only the origin, and hence a polytope. Suppose that X_{j+1}^{(v_{j+1},…,v_{N−1})} is a polytope. Then X_j^{(v_j,…,v_{N−1})} is the result of linear operations on the polytopes X_{j+1}^{(v_{j+1},…,v_{N−1})} and U (see e.g. [35, Chapter 10]), and hence a polytope, too. Moreover, since X is a polytope by definition, the intersection X_j^{(v_j,…,v_{N−1})} ∩ X is polytopic as well. The union of polytopes is non-convex in the general case, however, such that the sets X_j are non-convex in the general case.
A set S ⊆ X is called control-invariant for the DSLS (1) subject to the constraints (2) if:

x_k ∈ S ⇒ ∃u_k ∈ U, ∃v_k ∈ 𝒱 such that f(x_k, u_k, v_k) ∈ S, for all k ∈ ℕ_0.

Suppose that X_{j+1} is control-invariant. According to the definition, X_j is the set of states x_{j|k} ∈ X for which at least one u_{j|k} ∈ U and at least one v_{j|k} ∈ 𝒱 exist such that f(x_{j|k}, u_{j|k}, v_{j|k}) ∈ X_{j+1}. If X_{j+1} is control-invariant, then for each x_{j|k} ∈ X_{j+1} at least one u_{j|k} ∈ U and at least one v_{j|k} ∈ 𝒱 exist such that f(x_{j|k}, u_{j|k}, v_{j|k}) is in X_{j+1} again. Consequently, X_{j+1} is a subset of X_j, i.e. X_j ⊇ X_{j+1}. Moreover, X_j is control-invariant, since it is always possible to reach the control-invariant set X_{j+1} in one time step and to remain therein up to step N. Here, X_N is a singleton that contains only the origin. Furthermore, the origin is in the interior of U. Hence, x̄ = f(x̄, ū, v) for x̄ = 0, ū = 0, and arbitrary v ∈ 𝒱, such that X_N is control-invariant according to the definition above. The fact that X_j ⊇ X_{j+1} thus follows by induction. ◻

Proof of Proposition 2 Problem 1 is feasible for x_k ∈ X_0. From the definition of the RHC strategy, it follows that x_k ∈ X_0 implies x_{k+1} ∈ X_1. Hence, a sufficient condition for recursive feasibility is that X_0 ⊇ X_1, and this inclusion follows from Proposition 1 and (11). ◻

Proof of Proposition 3 (The proof is similar to the one for systems without switching, as can be found in [35, Chapter 12].) Let x_0 be an element of X_0, such that recursive feasibility of the RHC strategy is guaranteed by Proposition 2. Further, let u*_{0|0} = (u*_{0|0}, …, u*_{N−1|0}) and v*_{0|0} = (v*_{0|0}, …, v*_{N−1|0}) be the optimal solution of Problem 1 for the current state x_0 at time 0, and (x*_{0|0}, x*_{1|0}, …, x*_{N|0}) the corresponding state sequence with x*_{0|0} = x_0 and x*_{N|0} = 0. Hence, the optimal cost J*_0(x_0) is:

J*_0(x_0) = g_N(x*_{N|0}) + Σ_{j=0}^{N−1} g(x*_{j|0}, u*_{j|0}).

Since u_0 = u*_{0|0} and v_0 = v*_{0|0}, it follows from (1) that x_1 = x*_{1|0}. The state sequence (x_{0|1}, …, x_{N−1|1}, x_{N|1}) := (x*_{1|0}, …, x*_{N|0}, 0) corresponds to the input sequences u_{0|1} = (u_{0|1}, …, u_{N−1|1}) := (u*_{1|0}, …, u*_{N−1|0}, 0) and v_{0|1} := (v*_{1|0}, …, v*_{N−1|0}, v_N) with an arbitrary v_N ∈ 𝒱. The sequences u_{0|1} and v_{0|1} are feasible, but generally not an optimal solution of Problem 1 for the current state x_1 = x_{0|1} at time 1, such that:

J*_0(x_0) − g(x_0, u_0)

is an upper bound on J*_0(x_1). Hence:

J*_0(x_1) ≤ J*_0(x_0) − g(x_0, u_0),

and it follows from induction that:

J*_0(x_{k+1}) ≤ J*_0(x_k) − g(x_k, u_k) for all k ∈ ℕ_0. (A1)

Since x_k ∈ X_0 implies f*_cl(x_k) ∈ X_0, the state sequence of the closed-loop system (8) lies within X_0 for any x_0 ∈ X_0. Note that the stage cost g and the terminal cost g_N are continuous and positive definite functions, and J*_0(x_k) is lower bounded by zero. Hence, J*_0(x_k) decreases according to (A1) along any state sequence that starts from X_0, such that convergence to the origin without leaving X_0 is guaranteed as k → ∞. ◻
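The control-invariance condition x_k ∈ S ⇒ ∃u_k ∈ U, ∃v_k ∈ 𝒱: f(x_k, u_k, v_k) ∈ S used in the proofs above can be checked numerically on samples. The sketch below uses a hypothetical two-mode system and a box-shaped candidate set S, not the sets computed in the paper, and tests gridded inputs for the existential quantifiers.

```python
import itertools
import numpy as np

# Hypothetical two-mode DSLS f(x,u,v) = A_v x + B_v u (stand-ins for the
# paper's matrices) and admissible continuous inputs U = {|u| <= 4}.
A = {0: np.array([[1.0, 0.1], [0.0, 1.0]]),
     1: np.array([[0.9, 0.0], [0.1, 0.9]])}
B = {0: np.array([0.0, 0.1]), 1: np.array([0.1, 0.0])}
u_grid = np.linspace(-4.0, 4.0, 41)

def in_S(x, bound=0.5):
    """Candidate set S: a box {|x_i| <= 0.5} (tolerance for rounding)."""
    return bool(np.all(np.abs(x) <= bound + 1e-9))

def one_step_invariant(x):
    """True if some admissible (u, v) maps x back into S in one step."""
    return any(in_S(A[v] @ x + B[v] * u)
               for v, u in itertools.product(A, u_grid))

# Sample states in S, including its corners, and check the condition.
samples = [np.array([sx * 0.5, sy * 0.5])
           for sx, sy in itertools.product((-1, 0, 1), repeat=2)]
all_ok = all(one_step_invariant(x) for x in samples)
```

Such a sampled check cannot prove invariance (that requires the polytopic set computations of the paper), but it is a quick way to falsify a candidate set S.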

Proof of Proposition 4
Let J*_j^{(v_j,…,v_{N−1})} denote the value function V*_j for a fixed discrete input sequence (v_j, …, v_{N−1}) ∈ 𝒱^{N−j}. The optimization problem in (12) subject to the fixed discrete input sequence can then be transformed into a multiparametric QP (mp-QP) problem of the form:

J*_j^{(v_j,…,v_{N−1})}(x_{j|k}) = min_z ½ z^T H_j z + x_{j|k}^T F_j z + ½ x_{j|k}^T Y_j x_{j|k}
s.t. G_j z ≤ w_j + E_j x_{j|k}.

In here, H_j = H_j^T ≻ 0, and F_j, Y_j, G_j, w_j, and E_j are defined similarly as in [35, Chapter 11]. It is proven there that the value function obtained as the solution of the mp-QP problem is convex and piecewise quadratic on polyhedra. This transfers here to the function J*_j^{(v_j,…,v_{N−1})}. According to (13), J*_{j→N} is the pointwise minimum over the functions J*_j^{(v_j,…,v_{N−1})}.
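The pointwise-minimum construction can be sketched numerically: each fixed discrete sequence contributes a (piecewise) quadratic candidate value function, and the overall value function evaluates to the smallest candidate at each state. The two quadratics below are hypothetical placeholders, not the mp-QP solutions of the paper.

```python
import numpy as np

# Two hypothetical quadratic candidate value functions x' H_j x, each
# standing in for one fixed discrete input sequence (v_j, ..., v_{N-1}).
H = [np.array([[2.0, 0.0], [0.0, 1.0]]),
     np.array([[1.0, 0.0], [0.0, 3.0]])]

def value(x):
    """Pointwise minimum over the candidates; also return the argmin index."""
    x = np.asarray(x, float)
    costs = [float(x @ Hj @ x) for Hj in H]
    return min(costs), int(np.argmin(costs))

v, idx = value([1.0, 0.0])
```

Because different candidates attain the minimum in different regions of the state space, the resulting function is piecewise quadratic and, as noted in the proposition, generally non-convex even though each candidate is convex.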