Introduction

Applications from many domains, such as flexible production systems, smart grids, autonomous vehicles, logistic systems, and smart cities, are nowadays termed cyber-physical systems (CPS), see e.g. [1]. These systems are characterized by the integration of physical objects with digital components for information processing and control, interacting over communication networks. A typical property of CPS is the co-existence and interaction of continuous dynamics (most often stemming from the physical objects) and discrete-event dynamics (mostly arising from the digital components). Mixed continuous-discrete dynamics, also referred to as hybrid dynamics, has been the subject of intensive research over the last decades [2].

A special class of hybrid dynamic systems are switched systems, for which the transition between different continuous dynamics is defined by a switching logic. With respect to this logic, one can distinguish between internally forced and externally forced switching [3]. The former type typically depends on the continuous state, with piecewise affine systems [4] as a frequently considered example. In contrast, the paper at hand focuses on switched systems with externally forced switching, where an external decision unit decides when to execute a transition to a new mode of continuous dynamics. This class of switched systems is relevant for applications in which, e.g., a discrete or supervisory controller (such as a programmable logic controller) coexists with continuous control loops. More precisely, this paper studies the control of discrete-time switched linear systems (DSLS) with mixed inputs, where discrete control inputs have to be selected to determine the continuous dynamics, and continuous inputs serve to achieve control goals for the continuous state, in particular the satisfaction of state constraints.

Discrete-time linear quadratic regulation problems for switched linear systems without constraints, here referred to as DSLQR problems, have been studied intensively before. It is shown in [5] that any finite-horizon value function of the DSLQR problem is a pointwise minimum of a finite number of quadratic functions. The quadratic functions are exactly described by a set of positive definite matrices obtained from the dynamic programming (DP) solution of the DSLQR problem. Even for a finite time horizon, the computation of the exact solution to the DSLQR problem is \({\mathcal {N}}{\mathcal {P}}\)-hard [6]. The critical point is that the number of possible discrete input sequences, and thereby also the number of quadratic functions, grows exponentially with the time horizon. DP with pruning [7] and relaxed DP [8, 9] have been shown to reduce the complexity drastically, as is also described in the generalizing works [4, 10].

It is proven in [5] that the finite-horizon value function converges under certain conditions exponentially fast to the infinite-horizon value function. That work also proposes a relaxation framework to solve the infinite-horizon DSLQR problem with guaranteed closed-loop stability but suboptimal performance. The use of a stabilizing base policy and the concept of rollout are exploited in [11] to find low-complexity policies with (preferably tight) performance bounds for the infinite-horizon DSLQR problem. Recently, a Q-learning algorithm with customized Q-function approximation has been proposed in [12]. The approach is based on analytic results for the value function, and it addresses the infinite-horizon DSLQR problem for higher-dimensional cases, which are currently intractable for state-of-the-art methods. In general, receding-horizon control (RHC) constitutes an approach to approximating infinite-horizon control problems [13]. According to [10], an RHC strategy can be expressed explicitly as a piecewise linear state feedback control law defined over (usually non-convex) regions.

The DSLQR problem for switched linear systems with polytopic constraints on the continuous states and inputs, here referred to as DCSLQR, is less studied. In [14], the infinite-horizon DCSLQR problem is addressed by splitting it into an unconstrained DSLQR problem and a finite-horizon DCSLQR problem. The finite-horizon part is then formulated as a mixed-integer quadratic programming (MIQP) problem. In general, MIQP problems are known to be \({\mathcal {N}}{\mathcal {P}}\)-hard. The online solution of such MIQP problems for the control of switched systems is addressed in [15] by determining a trade-off between performance and computation time through a tailored tree search with cost bounds and search heuristics. The finite-horizon DCSLQR problem has recently been considered in a previous paper of the authors [16]. There, the optimal closed-loop control law is approximated by neural networks (NN) which are trained offline. This enables the fast determination of guaranteed admissible and (preferably optimal) continuous and discrete inputs for any state of the DSLS. In general, NN have the attractive property of being able to approximate functions arbitrarily closely when used with general activation functions [17,18,19], and have thus contributed significantly to the recent success of machine learning applications [20,21,22,23]. The use of NN as function approximators for policies (control laws) and value (cost-to-go) functions, as in DeepMind’s popular computer program AlphaZero [23], is the core of numerous approximate dynamic programming and reinforcement learning algorithms [24,25,26,27,28,29,30]. Moreover, NN have also been considered recently for approximating receding-horizon (or model predictive) control laws [31,32,33,34].

Missing so far are techniques that efficiently solve and represent the result of the DSLQR problem for the case with constraints over infinite horizons; proposing a corresponding technique, which approximates the solution by NN while ensuring satisfaction of all constraints, is the goal and contribution of this paper. Note that even for the case without constraints, the work in [10] has shown that an MIQP problem must be solved at each time instant, leading to high computation times for larger problem instances. However, the MIQP problem to be solved at each time step can be transformed into one for DSLS with finite time horizons, as proposed in [16]. This motivates the concept of approximating the optimal RHC strategy by use of NN, exploiting the results from [16] for efficient computation. The objective thus is to suggest an approach which makes the computation of the control inputs (based on the approximating RHC strategy) significantly faster while guaranteeing constraint satisfaction. Thus, the present paper extends the work in [16] from a single optimization over a finite horizon to a setting of receding (finite) horizons in order to cover infinite time spans. As a consequence of this extension, recursive feasibility and asymptotic stability have to be considered as additional aspects, and this paper shows how these properties can be proven while, of course, guaranteeing that all state and input constraints are satisfied throughout.

The paper is structured as follows: Section “Problem Formulation and Preliminaries” first introduces the RHC problem and analyzes the properties and challenges of its solution. Motivated by these challenges, a simplified RHC problem is introduced, which can be transformed into the finite-horizon control problem considered in [16]. Section “Finite-Horizon Control with Parametric Function Approximators” reminds the reader of fundamental results from [16], which are then used to develop the new approach in Section “Receding-Horizon Control with Parametric Function Approximators”. A numerical example is provided for illustration in Section “Numerical Example”, and the paper is concluded in Section “Conclusion”. Appendix A contains the proofs of all propositions established in the mentioned sections.

Problem Formulation and Preliminaries

For defining discrete-time switched linear systems (DSLS), let \(x_k \in \mathbb {R}^{n_x}\) denote the continuous state, \(u_k \in \mathbb {R}^{n_u}\) the continuous control input, and \(v_k \in \mathbb {M} = \{1, \ldots , M\}\) the discrete control input, all for time \(k \in \mathbb {N}_0:=\mathbb {N}\cup \{0\}\). The latter selects for any k the parameterization \((A_i, B_i)\) of a linear dynamics with system matrix \(A_i \in \mathbb {R}^{n_x \times n_x}\) and input matrix \(B_i \in \mathbb {R}^{n_x \times n_u}\). The DSLS is then written as:

$$\begin{aligned} x_{k+1} = A_{v_k}x_k + B_{v_k}u_k =: f\left( x_k, u_k, v_k\right) , \end{aligned}$$
(1)

with initial state \(x_0\in \mathbb {R}^{n_x}\). The states and inputs are subject to constraints:

$$\begin{aligned} x_k \in \mathcal {X}, \quad u_k \in \mathcal {U}, \quad \forall k \in \mathbb {N}_0, \end{aligned}$$
(2)

where it is required that \(\mathcal {X}\) and \(\mathcal {U}\) are polytopic and contain the origin in their interior. Subsequently, let \(x_{j \vert k}\) denote a continuous state at time \(k+j\) predicted at time k. According to (1), the predicted state at time \(k+j+1\) is given by:

$$\begin{aligned} x_{j+1 \vert k}&= f\left( x_{j \vert k}, u_{j \vert k}, v_{j \vert k}\right) , \\&\quad x_{0 \vert k} := x_k, \quad j \in \mathbb {N}_0, \end{aligned}$$

where \(u_{j \vert k}\) and \(v_{j \vert k}\) denote the continuous and discrete inputs at time \(k+j\) predicted at time k, respectively. Predicted input sequences at k over a time span \(\{ k+j, \ldots , k+N-1 \}\) are written as:

$$\begin{aligned}&\phi _{j \rightarrow N-1 \vert k}^u := \left( u_{j \vert k}, \ldots , u_{N-1 \vert k} \right) , \\&\quad \phi _{j \rightarrow N-1 \vert k}^v := \left( v_{j \vert k}, \ldots , v_{N-1 \vert k} \right) . \end{aligned}$$

Consider the quadratic cost-to-go over the horizon from j to N:

$$\begin{aligned}&J_{j \rightarrow N}\left( x_{j \vert k}, \phi _{j \rightarrow N-1 \vert k}^u, \phi _{j \rightarrow N-1 \vert k}^v \right) \nonumber \\&\quad = g_N \left( x_{N \vert k} \right) + \sum _{i = j}^{N-1} g \left( x_{i \vert k}, u_{i \vert k}, v_{i \vert k} \right) \end{aligned}$$
(3)

subject to (1) with terminal cost function:

$$\begin{aligned} g_N \left( x_{N \vert k} \right) = x_{N \vert k}^\mathrm{T} P x_{N \vert k}, \end{aligned}$$
(4)

and stage cost function:

$$\begin{aligned} g \left( x_{j \vert k}, u_{j \vert k}, v_{j \vert k} \right) = x_{j \vert k}^\mathrm{T} Q_{v_{j \vert k}} x_{j \vert k} + u_{j \vert k}^\mathrm{T} R_{v_{j \vert k}} u_{j \vert k}, \end{aligned}$$
(5)

where \(P = P^\mathrm{T} \succ 0\), \(Q_{v_k} = Q_{v_k}^\mathrm{T} \succ 0\), and \(R_{v_k} = R_{v_k}^\mathrm{T} \succ 0\) with \(v_k \in \mathbb {M}\) are (switched) weighting matrices. For the sake of simplicity, the shorter notation \(J_j\) instead of \(J_{j \rightarrow N}\), \(\phi _{j \vert k}^u\) instead of \(\phi _{j \rightarrow N-1 \vert k}^u\), and \(\phi _{j \vert k}^v\) instead of \(\phi _{j \rightarrow N-1 \vert k}^v\) is used when appropriate.
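For illustration, the dynamics (1) and the cost (3)-(5) can be evaluated in a few lines of Python; the following minimal sketch uses placeholder matrices for a DSLS with \(M = 2\) modes (note that the modes are 0-indexed in the code, whereas \(\mathbb {M}\) starts at 1).

```python
import numpy as np

def f(x, u, v, A, B):
    """Switched dynamics (1): the discrete input v selects (A_v, B_v)."""
    return A[v] @ x + B[v] @ u

def cost(x0, us, vs, A, B, P, Q, R):
    """Cost-to-go (3) with terminal cost (4) and stage cost (5)."""
    x, J = x0, 0.0
    for u, v in zip(us, vs):
        J += x @ Q[v] @ x + u @ R[v] @ u   # stage cost g(x, u, v)
        x = f(x, u, v, A, B)               # predicted successor state
    return J + x @ P @ x                   # terminal cost g_N(x_N)

# Illustrative DSLS with M = 2 modes, n_x = 2, n_u = 1 (placeholder data):
A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[1.0, 0.2], [0.0, 0.5]])]
B = [np.array([[0.0], [0.1]]), np.array([[0.0], [0.2]])]
P = np.eye(2); Q = [np.eye(2), 2 * np.eye(2)]; R = [np.eye(1), np.eye(1)]
us = [np.array([0.5]), np.array([-0.2])]   # continuous inputs u_0, u_1
vs = [0, 1]                                # discrete inputs v_0, v_1
print(cost(np.array([1.0, 0.0]), us, vs, A, B, P, Q, R))
```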

This work aims at determining online a control strategy based on a receding-horizon principle for steering the DSLS from the initial state \(x_0\) into the origin.

Problem 1

(Receding-Horizon Control Problem) For the current state \(x_k\) at time k and the DSLS (1) subject to (2), find a continuous input sequence \(\phi _{0 \vert k}^{u^{*}} := \left( u_{0 \vert k}^{*}, \ldots , u_{N-1 \vert k}^{*}\right)\) and a discrete input sequence  \(\phi _{0 \vert k}^{v^{*}} := \left( v_{0 \vert k}^{*}, \ldots , v_{N-1 \vert k}^{*}\right)\) with prediction horizon N, such that the system state reaches the origin within N time steps, while the cost function (3) is minimized:

$$\begin{aligned} \left( \phi _{0 \vert k}^{u^{*}}, \phi _{0 \vert k}^{v^{*}}\right) \in \mathop {\mathrm {arg\,min}}\limits _{\left( \phi _{0 \vert k}^u, \phi _{0 \vert k}^v \right) }&\quad J_0\left( x_{0 \vert k}, \phi _{0 \vert k}^u, \phi _{0 \vert k}^v\right) \nonumber \\ \mathrm {subject\,to:}&\quad x_{0 \vert k} = x_k, \quad x_{j+1 \vert k} = f\left( x_{j \vert k}, u_{j \vert k}, v_{j \vert k} \right) , \nonumber \\&\quad x_{j \vert k} \in \mathcal {X}, \quad u_{j \vert k} \in \mathcal {U}, \quad v_{j \vert k} \in \mathbb {M}, \nonumber \\&\quad x_{N \vert k} = 0, \quad j \in \{0, \ldots , N-1\}. \end{aligned}$$
(6)

If a feasible solution to the problem exists and an optimal receding-horizon control (RHC) strategy is obtained, it can be applied by imposing the first element \(u_{0 \vert k}^{*}\) of the continuous input sequence \(\phi _{0 \vert k}^{u^{*}}\) and the first element \(v_{0 \vert k}^{*}\) of the discrete input sequence \(\phi _{0 \vert k}^{v^{*}}\) to (1) at k:

$$\begin{aligned} (u_k, v_k) = \left( u_{0 \vert k}^{*}, v_{0 \vert k}^{*} \right) =: \left( \mu _0^{u^{*}}(x_k), \mu _0^{v^{*}}(x_k)\right) . \end{aligned}$$
(7)

The closed-loop dynamics for the DSLS controlled by the strategy is then:

$$\begin{aligned} x_{k+1} =f\left( x_k, \mu _0^{u^{*}}(x_k), \mu _0^{v^{*}}(x_k)\right) =: f_{\text {cl}}^{*}(x_k), \quad k \in \mathbb {N}_0. \end{aligned}$$
(8)

Subsequently, let \(J_0^{*}(x_k) = J_0\left( x_k, \phi _{0 \vert k}^{u^{*}}, \phi _{0 \vert k}^{v^{*}}\right)\) denote the optimal cost-to-go for steering \(x_k\) into the origin within N steps.

Recursive Feasibility and Stability

The symbol \(\mathcal {X}_{j \rightarrow N}\) is introduced to denote the set of states that can be steered into the origin within \(N-j\) steps. A recursive definition of \(\mathcal {X}_{j \rightarrow N}\) is given by:

$$\begin{aligned} \mathcal {X}_{i}&= \{ x \in \mathcal {X} \,\vert \, \exists u \in \mathcal {U}, \exists v \in \mathbb {M} \text { such that } f(x, u, v) \in \mathcal {X}_{i+1} \}, \\&\quad \mathcal {X}_{N} := \{0\}, \end{aligned}$$

with \(i\in \{N-1,\ldots ,j\}\). When appropriate, the shorter notation \(\mathcal {X}_{j}\) is used instead of \(\mathcal {X}_{j \rightarrow N}\). Problem 1 has a feasible solution if and only if \(x_k\) is an element of \(\mathcal {X}_{0}\).

An RHC strategy is called recursively feasible if \(x_0 \in \mathcal {X}_0\) implies feasibility of Problem 1 for all future states \(x_k\), \(k>0\). From the definition of the RHC strategy, it follows that \(x_k \in \mathcal {X}_0\) implies \(x_{k+1} \in \mathcal {X}_1\). Hence, a sufficient condition for recursive feasibility is that \(\mathcal {X}_0 \supseteq \mathcal {X}_1\).

Let \(\mathcal {X}_j^{\,(v_j, \ldots , v_{N-1})}\) be the set of states that can be steered into the origin within \(N-j\) steps for a fixed discrete input sequence \((v_j, \ldots , v_{N-1})\):

$$\begin{aligned} \mathcal {X}_j^{\left( v_j, v_{j+1}, \ldots , v_{N-1}\right) }&= \left\{ x \in \mathcal {X} \,\bigl \vert \, \exists u \in \mathcal {U} \text { such that } f\left( x, u, v_j \right) \in \mathcal {X}_{j+1}^{\left( v_{j+1}, \ldots , v_{N-1}\right) } \right\} , \\ \mathcal {X}_{N-1}^{\left( v_{N-1}\right) }&= \left\{ x \in \mathcal {X} \,\bigl \vert \, \exists u \in \mathcal {U} \text { such that } f\left( x, u, v_{N-1} \right) \in \mathcal {X}_N \right\} . \end{aligned}$$

For brevity, the more compact notation:

$$\begin{aligned} \mathcal {X}_j^{\left( v_j, v_{j+1}, \ldots , v_{N-1}\right) }&= \text {Pre}^{(v_j)}\left( \mathcal {X}_{j+1}^{\left( v_{j+1}, \ldots , v_{N-1}\right) }\right) \cap \mathcal {X}, \nonumber \\ \mathcal {X}_{N-1}^{\left( v_{N-1}\right) }&= \text {Pre}^{\left( v_{N-1}\right) }\left( \mathcal {X}_{N}\right) \cap \mathcal {X} \end{aligned}$$
(9)

is used, where the operator \(\text {Pre}^{(v_j)}(\mathcal {S})\) returns the set of predecessor states to the set \(\mathcal {S} \subseteq \mathbb {R}^{n_x}\) for a fixed discrete input \(v_j\):

$$\begin{aligned} \text {Pre}^{\left( v_j\right) }(\mathcal {S})&= \left\{ x \in \mathbb {R}^{n_x} \,\vert \, \exists u \in \mathcal {U} \text { such that } f\left( x, u, v_j\right) \in \mathcal {S} \right\} . \end{aligned}$$
(10)
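In H-representation, the operator (10) amounts to a polytope projection: with \(\mathcal {S} = \{x \,\vert \, H^{\mathcal {S}} x \le h^{\mathcal {S}}\}\), the set \(\{(x,u) \,\vert \, f(x,u,v_j) \in \mathcal {S},\, u \in \mathcal {U}\}\) is a polytope in \((x,u)\)-space, and \(\text {Pre}^{(v_j)}(\mathcal {S})\) is its projection onto the x-coordinates. The following numpy sketch assembles the lifted inequality system; the projection step itself (e.g. Fourier-Motzkin elimination or a polytope library) is assumed to be available and is not shown.

```python
import numpy as np

def pre_lifted(HS, hS, HU, hU, Av, Bv):
    """Lifted H-representation, in (x, u)-space, of the set
    {(x, u) : A_v x + B_v u in S, u in U} for S = {x : HS x <= hS}.
    Pre^(v)(S) from (10) is the projection of this polytope onto the
    x-coordinates; the projection routine is assumed to exist."""
    n_x, n_u = Av.shape[1], Bv.shape[1]
    G = np.block([[HS @ Av, HS @ Bv],
                  [np.zeros((HU.shape[0], n_x)), HU]])
    g = np.concatenate([hS, hU])
    return G, g   # Pre^(v)(S) = { x : exists u with G [x; u] <= g }
```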

It follows that \(\mathcal {X}_{j}\) is the union of all possible sets \(\mathcal {X}_j^{(v_j, \ldots , v_{N-1})}\):

$$\begin{aligned} \mathcal {X}_{j} = \bigcup \limits _{\left( v_j, \ldots , v_{N-1}\right) \in \mathbb {M}^{N-j}} \mathcal {X}_j^{\left( v_j, \ldots , v_{N-1}\right) }, \end{aligned}$$
(11)

and the following propositions can be established (see the Appendix for the corresponding proofs).

Proposition 1

For a given prediction horizon N and a fixed discrete input sequence \((v_j, \ldots , v_{N-1}) \in \mathbb {M}^{N-j}\), the sets \(\mathcal {X}_j^{(v_j, \ldots , v_{N-1})}\) are polytopes. The sets \(\mathcal {X}_{j}\) are non-convex in the general case, with the property that:

$$\begin{aligned} \mathcal {X}_j \supseteq \mathcal {X}_{j+1}, \quad j \in \left\{ 0, \ldots , N-1 \right\} . \end{aligned}$$

Proposition 2

If \(x_0 \in \mathcal {X}_0\), then the optimal RHC strategy (7) is recursively feasible.

Proposition 3

The origin of the closed-loop system (8) is asymptotically stable with domain of attraction \(\mathcal {X}_0\).

Propositions 1–3 are extensions of results known from the literature on RHC for systems without switching, see e.g. [35]. More particularly, the sets \(\mathcal {X}_j\) extend the feasible sets considered there with respect to the discrete inputs. If \(\mathcal {X}_N\) is control-invariant (as defined in the proof of Proposition 1), then the sets \(\mathcal {X}_j\) share with the non-switched case the property that \(\mathcal {X}_j\) grows as j decreases, and stops growing when reaching the maximal control-invariant set (see e.g. [35, Remark 11.3]). Note that \(\mathcal {X}_N\) is a singleton here, which contains only the origin and thus is control-invariant.

Properties of the RHC Problem

Let \(V_{j}^{*}\) denote the optimal cost-to-go for steering \(x_{j \vert k}\) into the origin within \(N-j\) steps for a chosen discrete input sequence \(\phi _{j \vert k}^v\):

$$\begin{aligned} V_{j}^{*}\left( x_{j \vert k}, \phi _{j \vert k}^v \right) = \min _{\phi _{j \vert k}^u}&\quad J_j\left( x_{j \vert k}, \phi _{j \vert k}^u, \phi _{j \vert k}^v \right) \end{aligned}$$
(12a)
$$\begin{aligned} \text {subject to:}&\quad x_{i+1 \vert k} = f\left( x_{i \vert k}, u_{i \vert k}, v_{i \vert k}\right) , \nonumber \\&\quad x_{i \vert k} \in \mathcal {X}, \quad u_{i \vert k} \in \mathcal {U}, \quad v_{i \vert k} \in \mathbb {M}, \nonumber \\&\quad x_{N \vert k} = 0, \quad i \in \{j, \ldots , N-1\}. \end{aligned}$$
(12b)

The optimization problem (12) is a quadratic program (QP), and has a feasible solution if and only if \(x_{j \vert k} \in \mathcal {X}_j^{(v_{j \vert k}, \ldots , v_{N-1 \vert k})}\). As a convention, \(V_{j}^{*}\left( x_{j \vert k}, \phi _{j \vert k}^v \right)\) is set to infinity for the case that (12) has no feasible solution.

The optimal cost-to-go \(J_j^{*}\left( x_{j \vert k} \right)\) for steering \(x_{j \vert k}\) into the origin within \(N-j\) steps, as well as the corresponding input sequences \(\phi _{j \vert k}^{u^{*}} = \Phi _j^{u^{*}}\left( x_{j \vert k} \right)\) and \(\phi _{j \vert k}^{v^{*}} = \Phi _j^{v^{*}}\left( x_{j \vert k} \right)\) may be obtained by solving a QP for each possible discrete input sequence \(\phi _{j \vert k}^v\):

$$\begin{aligned} J_j^{*}\left( x_{j \vert k} \right)&= \min _{\phi _{j \vert k}^v} V_j^{*}(x_{j \vert k}, \phi _{j \vert k}^v) \text { subject to: } \phi _{j \vert k}^v \in \mathbb {M}^{N-j}, \end{aligned}$$
(13)
$$\begin{aligned} \Phi _j^{v^{*}}\left( x_{j \vert k} \right)&\in \mathop {\mathrm {arg\,min}}\limits _{\phi _{j \vert k}^v} V_{j}^{*}\left( x_{j \vert k}, \phi _{j \vert k}^v \right) \text { subject to: } \phi _{j \vert k}^v \in \mathbb {M}^{N-j}, \end{aligned}$$
(14)
$$\begin{aligned} \Phi _j^{u^{*}}\left( x_{j \vert k} \right)&\in \mathop {\mathrm {arg\,min}}\limits _{\phi _{j \vert k}^u} J_j\left( x_{j \vert k}, \phi _{j \vert k}^u, \Phi _j^{v^{*}}\left( x_{j \vert k} \right) \right) \text { subject to: (12b)}. \end{aligned}$$
(15)
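A brute-force realization of (13)-(15), solving one QP per discrete input sequence, can be sketched with cvxpy as follows; this is tractable only for small M and N, and the terminal cost can be omitted because the constraint \(x_{N \vert k} = 0\) makes it vanish. Function names and data layout are illustrative.

```python
import itertools
import numpy as np
import cvxpy as cp

def V_star(x0, vs, A, B, Q, R, Hx, hx, Hu, hu, N):
    """QP (12): optimal continuous inputs for a fixed discrete sequence vs."""
    nx, nu = A[0].shape[0], B[0].shape[1]
    x = [cp.Variable(nx) for _ in range(N + 1)]
    u = [cp.Variable(nu) for _ in range(N)]
    cons, J = [x[0] == x0, x[N] == 0], 0
    for j, v in enumerate(vs):
        J += cp.quad_form(x[j], Q[v]) + cp.quad_form(u[j], R[v])
        cons += [x[j + 1] == A[v] @ x[j] + B[v] @ u[j],
                 Hx @ x[j] <= hx, Hu @ u[j] <= hu]
    prob = cp.Problem(cp.Minimize(J), cons)
    prob.solve()
    return prob.value, [uj.value for uj in u]

def solve_by_enumeration(x0, A, B, Q, R, Hx, hx, Hu, hu, N, M):
    """(13)-(15): one QP per discrete sequence, i.e. M**N QPs in total."""
    best = (np.inf, None, None)
    for vs in itertools.product(range(M), repeat=N):
        val, us = V_star(x0, vs, A, B, Q, R, Hx, hx, Hu, hu, N)
        if val is not None and val < best[0]:
            best = (val, us, vs)
    return best   # (J*, continuous sequence, discrete sequence)
```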

Denote as \(\mathcal {U}_j^v \left( x_{j \vert k}, v_{j \vert k} \right)\) the set that contains the admissible continuous inputs for a state \(x_{j \vert k}\) and a discrete input \(v_{j \vert k}\) to reach the state set \(\mathcal {X}_{j+1}\):

$$\begin{aligned} \mathcal {U}_j^v\left( x_{j \vert k}, v_{j \vert k} \right) = \left\{ u \in \mathcal {U} \,\bigl \vert \, f\left( x_{j \vert k}, u, v_{j \vert k} \right) \in \mathcal {X}_{j+1} \right\} . \end{aligned}$$
(16)

Instead of solving a QP problem for each possible discrete input sequence, the optimal cost-to-go \(J_0^{*}(x_k)\) and the optimal RHC strategy (7) may be computed, in principle, by setting \(j=0\) and solving the optimization problem:

$$\begin{aligned} Q_j^{*}\left( x_{j \vert k}, v_{j \vert k} \right) = & \min _{u_{j \vert k}} g\left( x_{j \vert k}, u_{j \vert k}, v_{j \vert k} \right) \nonumber \\&\quad + J_{j+1}^{*}\left( f\left( x_{j \vert k}, u_{j \vert k}, v_{j \vert k} \right) \right) \nonumber \\ \text {subject to:}&\quad u_{j \vert k} \in \mathcal {U}_j^{v}\left( x_{j \vert k}, v_{j \vert k} \right) , \end{aligned}$$
(17)

for each possible discrete input \(v_{j \vert k} \in \mathbb {M}\) and only for a single step, leading to:

$$\begin{aligned} J_j^{*}\left( x_{j \vert k} \right)&= \min _{v_{j \vert k}} Q_j^{*}\left( x_{j \vert k}, v_{j \vert k} \right) \text { subject to: } v_{j \vert k} \in \mathbb {M}, \end{aligned}$$
(18)
$$\begin{aligned} \mu _j^{v^{*}}\left( x_{j \vert k} \right)&\in \mathop {\mathrm {arg\,min}}\limits _{v_{j \vert k}} Q_j^{*}\left( x_{j \vert k}, v_{j \vert k} \right) \text { subject to: } v_{j \vert k} \in \mathbb {M}, \end{aligned}$$
(19)

and:

$$\begin{aligned} \mu _j^{u^{*}}\left( x_{j \vert k} \right) \in \mathop {\mathrm {arg\,min}}\limits _{u_{j \vert k}}&\quad g\left( x_{j \vert k}, u_{j \vert k}, \mu_j^{v^{\ast}}\left( x_{j \vert k} \right) \right) \nonumber \\ &\quad + J_{j+1}^{\ast}\left( f\left( x_{j \vert k}, u_{j \vert k}, \mu_j^{v^{\ast}}\left( x_{j \vert k} \right) \right) \right) \nonumber \\ \text {subject to:}&\quad u_{j \vert k} \in \mathcal {U}_j^{v}\left( x_{j \vert k}, \mu _j^{v^{*}}\left( x_{j \vert k} \right) \right) . \end{aligned}$$
(20)

This requires, of course, that \(J_{j+1}^{*}\) and \(\mathcal {U}_j^{v}\left( x_{j \vert k}, v_{j \vert k} \right)\) are already known and that a globally optimal solution is found. By definition, \(J_N^{*}\) equals the terminal cost (4), i.e. \(J_N^{*}\left( x_{N \vert k} \right) := g_N\left( x_{N \vert k} \right)\). The following propositions establish that the optimization problem (17) is, in the general case, a difficult one, since it constitutes a nonlinear program with non-convex objective function \(J^{*}_{j+1}\) and non-convex constraint set \(\mathcal {U}_j^v\left( x_{j \vert k}, v_{j \vert k} \right)\). (The pointwise minimum of two functions \(J_1\) and \(J_2\) is defined as \(J(x) = \min \{ J_1(x), J_2(x) \}\), in analogy to the definition of the pointwise maximum in [36].)

Proposition 4

The optimal cost-to-go function \(J_{j}^{*}\), \(j \in \{ 0, \ldots , N-1 \}\) is in the general case a pointwise minimum of functions that are convex and piecewise quadratic on polyhedra.

Proposition 5

The sets \(\mathcal {U}_j^v\left( x_{j \vert k}, v_{j \vert k} \right)\), \(j \in \{0, \ldots , N-2\}\) are non-convex in general.

Challenges and Objective

Problem 1 is an MIQP and known to be \(\mathcal {N}\mathcal {P}\)-hard. The number of possible discrete input sequences, given by \(\vert \mathbb {M}^N \vert = M^N\), grows exponentially with the prediction horizon N. Thus, the trivial approach of solving a QP for each possible discrete input sequence is computationally intractable in almost all cases. While more efficient approaches to solving MIQPs exist (such as branch-and-bound or branch-and-cut techniques), these approaches typically still require too much time for the online optimization in RHC.

At first sight, the approach of computing the optimal RHC strategy (7) according to (19) and (20) requires only the solution of M optimization problems. However, these optimization problems are in general nonlinear programs with non-convex objective function \(J_1^{*}\) and non-convex constraint set \(\mathcal {U}_0^v\left( x_k, v_k \right)\), and thus challenging to solve. Moreover, \(J_1^{*}\) has a complicated form (a pointwise minimum of functions that are convex and piecewise quadratic on polyhedra, see Proposition 4), and the derivation of an analytic solution is usually not possible. Last but not least, the proof of Proposition 5 provides the insight that the determination of \(\mathcal {U}_0^v\left( x_k, v_k \right)\) relies on the union of the \(M^{N-1}\) polytopes \(\mathcal {X}_1^{(v_1, \ldots , v_{N-1})}\), whose offline computation is expensive.

Thus, the objective of the further derivations of this work is to efficiently approximate the optimal RHC strategy to make the computation of the control inputs faster, while guaranteeing properties like constraint satisfaction, recursive feasibility, and asymptotic stability.

Simplified RHC Problem

As discussed above, the optimal RHC strategy could be computed by solving (19) and (20). The problem is, however, that the set of admissible continuous inputs \(\mathcal {U}_0^v\left( x_k, v_k \right)\) is non-convex. While established methods exist for nonlinear programs with convex constraints, the solution of a nonlinear program with non-convex constraints (arising for each discrete input sequence) is computationally intractable for online application in most cases. Moreover, the determination of \(\mathcal {U}_0^v\left( x_k, v_k \right)\) is computationally demanding. The objective of this section is to introduce a simplified RHC problem based on convex control-invariant subsets \(\tilde{\mathcal {X}}_j\) of the (generally) non-convex sets \(\mathcal {X}_j\). By doing so, the set of admissible continuous inputs \(\tilde{\mathcal {U}}_0^v\left( x_k, v_k \right)\) is convex as well, and the computation of \(\tilde{\mathcal {U}}_0^v\left( x_k, v_k \right)\) is less demanding.

Let the sets \(\tilde{\mathcal {X}}_j\) be (recursively) defined by:

$$\begin{aligned} \tilde{\mathcal {X}}_j&= \left\{ x \in \mathcal {X} \,\vert \, \forall v \in \mathbb {M} : \exists u \in \mathcal {U} \text { such that } f(x, u, v) \in \tilde{\mathcal {X}}_{j+1} \right\} , \\&\quad \tilde{\mathcal {X}}_N = \mathcal {X}_N. \end{aligned}$$

Thus, for any state \(x_{j \vert k} \in \tilde{\mathcal {X}}_j\) and an arbitrary choice of the discrete input \(v_{j \vert k} \in \mathbb {M}\), at least one admissible continuous input \(u_{j \vert k} \in \mathcal {U}\) exists such that \(f\left( x_{j \vert k}, u_{j \vert k}, v_{j \vert k} \right) \in \tilde{\mathcal {X}}_{j+1}\). It follows from the definition that \(\tilde{\mathcal {X}}_j\) is the intersection of all possible polytopes \(\mathcal {X}_j^{(v_j, \ldots , v_{N-1})}\):

$$\begin{aligned} \tilde{\mathcal {X}}_j = \bigcap \limits _{\left( v_j, \ldots , v_{N-1}\right) \in \mathbb {M}^{N-j}} \mathcal {X}_j^{\left( v_j, \ldots , v_{N-1}\right) }. \end{aligned}$$
(21)

Hence, \(\tilde{\mathcal {X}}_j\) is a polytope as well, with the property that \(\tilde{\mathcal {X}}_j \subseteq \mathcal {X}_j\). It is worth mentioning that it is possible, in principle, to determine a further polytopic inner approximation of \(\tilde{\mathcal {X}}_j\) with a smaller number of facets if it is necessary to reduce complexity. Algorithm 1 provides a method for computing the sets \(\tilde{\mathcal {X}}_j\) recursively.

Algorithm 1: Recursive computation of the sets \(\tilde{\mathcal {X}}_j\)
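A possible realization of the recursion behind Algorithm 1 is sketched below; the routine `pre` for the \(\text {Pre}^{(v)}\) operator (10) is assumed to be given, e.g. along the lines of the lifted H-representation sketched earlier, and the intersection of H-polytopes is realized by stacking inequalities.

```python
import numpy as np

def intersect(H1, h1, H2, h2):
    """Intersection of two H-polytopes: stack the inequality systems
    (redundant rows could be removed by LP-based pruning)."""
    return np.vstack([H1, H2]), np.concatenate([h1, h2])

def compute_X_tilde(HX, hX, n_x, N, modes, pre):
    """Hedged sketch of Algorithm 1: backward recursion
    X~_N = {0} and X~_j = X intersected with Pre^(v)(X~_{j+1}) for
    ALL modes v, for j = N-1, ..., 0. `pre(H, h, v)` is an assumed
    routine returning the H-representation of Pre^(v)({x: H x <= h})."""
    # X~_N = {0} in H-representation: x <= 0 and -x <= 0.
    sets = [(np.vstack([np.eye(n_x), -np.eye(n_x)]), np.zeros(2 * n_x))]
    for _ in range(N):
        Hn, hn = sets[0]                   # X~_{j+1}
        Hj, hj = HX, hX                    # start from the state set X
        for v in modes:                    # quantify over ALL modes v
            Hj, hj = intersect(Hj, hj, *pre(Hn, hn, v))
        sets.insert(0, (Hj, hj))
    return sets                            # [X~_0, X~_1, ..., X~_N]
```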

Problem 2

(Simplified RHC Problem) For a current state \(x_k\) at time k and the DSLS (1) subject to (2), find a continuous input sequence \(\phi _{0 \vert k}^{\tilde{u}^{*}} := \left( \tilde{u}_{0 \vert k}^{*}, \ldots , \tilde{u}_{N-1 \vert k}^{*}\right)\) and a discrete input sequence \(\phi _{0 \vert k}^{\tilde{v}^{*}} := \left( \tilde{v}_{0 \vert k}^{*}, \ldots , \tilde{v}_{N-1 \vert k}^{*}\right)\) for a prediction horizon N that steers the state into the origin within N time steps, while satisfying \(x_{j \vert k} \in \tilde{\mathcal {X}}_j\) and minimizing the quadratic cost function (3):

$$\begin{aligned} \left( \phi _{0 \vert k}^{\tilde{u}^{*}}, \phi _{0 \vert k}^{\tilde{v}^{*}}\right) \in \mathop {\mathrm {arg\,min}}\limits _{\left( \phi _{0 \vert k}^u, \phi _{0 \vert k}^v \right) }&\quad J_0\left( x_{0 \vert k}, \phi _{0 \vert k}^u, \phi _{0 \vert k}^v\right) \nonumber \\ \mathrm {subject\,to:}&\quad x_{0 \vert k} = x_k, \quad x_{j+1 \vert k} = f\left( x_{j \vert k}, u_{j \vert k}, v_{j \vert k} \right) , \nonumber \\&\quad x_{j \vert k} \in \tilde{\mathcal {X}}_{j}, \quad u_{j \vert k} \in \mathcal {U}, \quad v_{j \vert k} \in \mathbb {M}, \nonumber \\&\quad x_{N \vert k} = 0, \quad j \in \{0, \ldots , N-1\}. \end{aligned}$$
(22)

In case a feasible solution exists, the application of the first elements of the input sequences to (1) at time k:

$$\begin{aligned} \left( \tilde{u}^{*}_k, \tilde{v}^{*}_k \right) = \left( \tilde{u}_{0 \vert k}^{*}, \tilde{v}_{0 \vert k}^{*} \right) =: \left( \mu _0^{\tilde{u}^{*}}\left( x_k\right) , \mu _0^{\tilde{v}^{*}}\left( x_k\right) \right) \end{aligned}$$
(23)

leads to the closed-loop dynamics:

$$\begin{aligned} x_{k+1} = f\left( x_k, \mu _0^{\tilde{u}^{*}}\left( x_k\right) , \mu _0^{\tilde{v}^{*}}(x_k) \right) =: \tilde{f}_{\text {cl}}^{*}\left( x_k \right) . \end{aligned}$$
(24)

Recursive feasibility of the RHC strategy (23) and asymptotic stability of the origin of the closed-loop system (24) with a domain of attraction \(\tilde{\mathcal {X}}_0\) can be proven in accordance with Propositions 2 and 3. Again, the optimal cost-to-go \(\tilde{J}_j^{*}\left( x_{j \vert k} \right)\) and the corresponding input sequences \(\phi _{j \vert k}^{\tilde{u}^{*}} = \Phi _j^{\tilde{u}^{*}}\left( x_{j \vert k} \right)\) and \(\phi _{j \vert k}^{\tilde{v}^{*}} = \Phi _j^{\tilde{v}^{*}}\left( x_{j \vert k} \right)\) may be obtained by solving a QP:

$$\begin{aligned} \tilde{V}_{j}^{*}\left( x_{j \vert k}, \phi _{j \vert k}^v \right) = \min _{\phi _{j \vert k}^u}&\quad J_j\left( x_{j \vert k}, \phi _{j \vert k}^u, \phi _{j \vert k}^v \right) \end{aligned}$$
(25a)
$$\begin{aligned} \text {subject to:}&\quad x_{i+1 \vert k} = f\left( x_{i \vert k}, u_{i \vert k}, v_{i \vert k}\right) , \nonumber \\&\quad x_{i \vert k} \in \tilde{\mathcal {X}}_i, \quad u_{i \vert k} \in \mathcal {U}, \quad v_{i \vert k} \in \mathbb {M}, \nonumber \\&\quad x_{N \vert k} = 0, \quad i \in \{j, \ldots , N-1\} \end{aligned}$$
(25b)

for each possible discrete input sequence:

$$\begin{aligned} \tilde{J}_j^{*}\left( x_{j \vert k} \right)&= \min _{\phi _{j \vert k}^v} \tilde{V}_j^{*}\left( x_{j \vert k}, \phi _{j \vert k}^v \right) \text { subject to: } \phi _{j \vert k}^v \in \mathbb {M}^{N-j}, \end{aligned}$$
(26)
$$\begin{aligned} \Phi _j^{\tilde{v}^{*}}\left( x_{j \vert k} \right)&\in \mathop {\mathrm {arg\,min}}\limits _{\phi _{j \vert k}^v} \tilde{V}_{j}^{*}\left( x_{j \vert k}, \phi _{j \vert k}^v \right) \text { subject to: } \phi _{j \vert k}^v \in \mathbb {M}^{N-j}, \end{aligned}$$
(27)
$$\begin{aligned} \Phi _j^{\tilde{u}^{*}}\left( x_{j \vert k} \right)&\in \mathop {\mathrm {arg\,min}}\limits _{\phi _{j \vert k}^u} J_j\left( x_{j \vert k}, \phi _{j \vert k}^u, \Phi _j^{\tilde{v}^{*}}\left( x_{j \vert k} \right) \right) \text { subject to: (25b)}. \end{aligned}$$
(28)

The polytopes \(\mathcal {U}\) and \(\tilde{\mathcal {X}}_j\) can be written in half-space representation as:

$$\begin{aligned} \mathcal {U} = \left\{ u \in \mathbb {R}^{n_u} \,\bigr \vert \, H^{\mathcal {U}} u \le h^{\mathcal {U}} \right\} , \quad \tilde{\mathcal {X}}_j = \left\{ x \in \mathbb {R}^{n_x} \,\bigr \vert \, H^{\tilde{\mathcal {X}}_j} x \le h^{\tilde{\mathcal {X}}_j} \right\} , \end{aligned}$$

with matrices \(H^{\mathcal {U}}\), \(H^{\tilde{\mathcal {X}}_j}\) and vectors \(h^{\mathcal {U}}\), \(h^{\tilde{\mathcal {X}}_j}\) of appropriate dimensions. By use of these sets, the set of admissible continuous inputs for a state \(x_{j \vert k}\) and a discrete input \(v_{j \vert k}\) at prediction time \(k+j\) can be written here as:

$$\begin{aligned} \tilde{\mathcal {U}}_j^v\left( x_{j \vert k}, v_{j \vert k} \right)&= \left\{ u \in \mathcal {U} \,\bigl \vert \, f\left( x_{j \vert k}, u, v_{j \vert k} \right) \in \tilde{\mathcal {X}}_{j+1} \right\} \nonumber \\&= \left\{ u \in \mathbb {R}^{n_u} \,\biggl \vert \, \begin{bmatrix} H^{\tilde{\mathcal {X}}_{j+1}} B_{v_{j \vert k}} \\ H^{\mathcal {U}} \end{bmatrix} u\right. \nonumber \\&\quad \left. \le \begin{bmatrix} h^{\tilde{\mathcal {X}}_{j+1}} - H^{\tilde{\mathcal {X}}_{j+1}} A_{v_{j \vert k}} x_{j \vert k} \\ h^{\mathcal {U}} \end{bmatrix} \right\} . \end{aligned}$$
(29)
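The H-representation (29) translates directly into code; the following sketch assembles the inequality system from given half-space data (the function name is illustrative, and all arguments are assumed to be numpy arrays):

```python
import numpy as np

def admissible_input_polytope(x, v, A, B, HU, hU, HX_next, hX_next):
    """H-representation (29) of the admissible-input set U~_j^v(x, v):
    the conditions f(x, u, v) in X~_{j+1} and u in U are stacked into
    one system of linear inequalities in u."""
    H = np.vstack([HX_next @ B[v], HU])
    h = np.concatenate([hX_next - HX_next @ (A[v] @ x), hU])
    return H, h   # U~_j^v(x, v) = {u : H u <= h}
```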

Obviously, \(\tilde{\mathcal {U}}_j^v\left( x_{j \vert k}, v_{j \vert k} \right)\) is a polytope. The optimal cost-to-go \(\tilde{J}_0^{*}\left( x_k \right)\) and the optimal RHC strategy (23) of the simplified RHC problem may alternatively be computed by setting \(j=0\) and solving the nonlinear program:

$$\begin{aligned} \tilde{Q}_j^{*}\left( x_{j \vert k}, v_{j \vert k} \right) = \min _{u_{j \vert k}}&\quad g\left( x_{j \vert k}, u_{j \vert k}, v_{j \vert k} \right) \nonumber \\&+ \tilde{J}_{j+1}^{*}\left( f\left( x_{j \vert k}, u_{j \vert k}, v_{j \vert k} \right) \right) \nonumber \\ \text {subject to:}&\quad u_{j \vert k} \in \tilde{\mathcal {U}}_j^v\left( x_{j \vert k}, v_{j \vert k} \right) . \end{aligned}$$
(30)

The constraints \(\tilde{\mathcal {U}}_j^v\left( x_{j \vert k}, v_{j \vert k} \right)\) are convex in this case, and it applies that:

$$\begin{aligned} \tilde{J}_j^{*}\left( x_{j \vert k} \right)&= \min _{v_{j \vert k}} \tilde{Q}_j^{*}\left( x_{j \vert k}, v_{j \vert k} \right) \text { subject to: } v_{j \vert k} \in \mathbb {M}, \end{aligned}$$
(31)
$$\begin{aligned} \mu _j^{\tilde{v}^{*}}\left( x_{j \vert k} \right)&\in \mathop {\mathrm {arg\,min}}\limits _{v_{j \vert k}} \tilde{Q}_j^{*}\left( x_{j \vert k}, v_{j \vert k} \right) \text { subject to: } v_{j \vert k} \in \mathbb {M}, \end{aligned}$$
(32)

and:

$$\begin{aligned} \mu _j^{\tilde{u}^{*}}\left( x_{j \vert k} \right) \in \mathop {\mathrm {arg\,min}}\limits _{u_{j \vert k}}&\quad g\left( x_{j \vert k}, u_{j \vert k}, \mu_j^{\tilde{v}^{\ast}}\left( x_{j \vert k} \right) \right) \nonumber \\ &\quad + \tilde{J}_{j+1}^{\ast}\left( f\left( x_{j \vert k}, u_{j \vert k}, \mu_j^{\tilde{v}^{\ast}}\left( x_{j \vert k} \right) \right) \right) \nonumber \\ \text {subject to:}&\quad u_{j \vert k} \in \tilde{\mathcal {U}}_j^{v}\left( x_{j \vert k}, \mu _j^{\tilde{v}^{*}}\left( x_{j \vert k} \right) \right) . \end{aligned}$$
(33)

Again, \(\tilde{J}_N^{*}\) is equal to the terminal cost (4) by definition, i.e. \(\tilde{J}_N^{*}\left( x_{N \vert k} \right) := g_N\left( x_{N \vert k} \right)\). The optimal cost-to-go \(\tilde{J}_j^{*}\) is still a pointwise minimum of functions that are convex and piecewise quadratic on polyhedra, such that the challenge of deriving an analytical expression remains.

Problem 2 can be transformed into the finite-horizon control problem considered in previous work of the authors [16]. There, the optimal finite-horizon control laws have been approximated by (deep) neural networks. These finite-horizon control laws are fundamental for the RHC approach presented in Section “Receding-Horizon Control with Parametric Function Approximators”; thus, the relevant results from [16] are summarized in Section “Finite-Horizon Control with Parametric Function Approximators” and tailored to the problem formulated before.

Finite-Horizon Control with Parametric Function Approximators

For the DSLS with finite horizon N:

$$\begin{aligned} x_{j+1} = A_{v_j} x_j + B_{v_j} u_j =: f\left( x_j, u_j, v_j\right) , \quad j \in \{0, \ldots , N-1\} \end{aligned}$$
(34)

subject to the constraints:

$$\begin{aligned} x_j \in \tilde{\mathcal {X}}_j, \quad x_N = 0, \quad u_j \in \mathcal {U}, \quad j \in \{0, \ldots , N-1\}, \end{aligned}$$
(35)

consider the following problem, into which the simplified RHC Problem 2 can be transformed with \(x_k\) as initial state \(x_0\).

Problem 3

(Finite-Horizon Control Problem) For a given initial state \(x_0\) at time \(j=0\), the DSLS (34) subject to (35), and a finite time horizon N, find input sequences \(\phi _{0}^{\tilde{u}^{*}} := \left( \tilde{u}_0^{*}, \ldots , \tilde{u}_{N-1}^{*} \right)\) and \(\phi _{0}^{\tilde{v}^{*}} := \left( \tilde{v}_0^{*}, \ldots , \tilde{v}_{N-1}^{*} \right)\) that steer \(x_0\) into the origin within N time steps, while minimizing (3):

$$\begin{aligned} \left( \phi _{0}^{\tilde{u}^{*}}, \phi _{0}^{\tilde{v}^{*}}\right) \in \mathop {\mathrm {arg\,min}}\limits _{\left( \phi _0^u, \phi _0^v \right) }&\quad J_0\left( x_0, \phi _0^u, \phi _0^v\right) \nonumber \\ \text {subject to:}&\quad x_{j+1} = f(x_j, u_j, v_j), \nonumber \\&\quad x_j \in \tilde{\mathcal {X}}_j, \quad u_j \in \mathcal {U}, \quad v_j \in \mathbb {M}, \nonumber \\&\quad x_N = 0, \quad j \in \{0, \ldots , N-1\}. \end{aligned}$$
(36)

The optimal finite-horizon control law:

$$\begin{aligned} \tilde{\pi }^{*} = \left\{ \left( \mu _0^{\tilde{u}^{*}}, \mu _0^{\tilde{v}^{*}} \right) , \ldots , \left( \mu _{N-1}^{\tilde{u}^{*}}, \mu _{N-1}^{\tilde{v}^{*}} \right) \right\} \end{aligned}$$
(37)

with \(v_j = \mu _j^{\tilde{v}^{*}}(x_j)\) and \(u_j = \mu _j^{\tilde{u}^{*}}(x_j)\) as defined in (32) and (33) produces the optimal sequences \(\phi _0^{\tilde{v}^{*}}\) and \(\phi _0^{\tilde{u}^{*}}\) not only for a single initial state, but for all initial states \(x_0 \in \tilde{\mathcal {X}}_0\). This control law is, however, not readily applicable, as discussed in Section “Problem Formulation and Preliminaries”.

In [16], the functions \(\mu _j^{\tilde{u}^{*}}\) and \(\mu _j^{\tilde{v}^{*}}\) are approximated with the help of neural networks. The main ideas required for the further method development in Section “Receding-Horizon Control with Parametric Function Approximators” are briefly repeated here.

The approximation of the cost-to-go functions \(\tilde{J}_j^{*}\) by parametric functions \(\tilde{J}_j\) with real-valued parameter vectors \(r_j^J\) constitutes a so-called approximation in value space [30]. This makes it possible to approximate the function \(\tilde{Q}_j^{*}\) defined in (30) by solving the following one-step look-ahead optimization problem with convex constraints:

$$\begin{aligned} \tilde{Q}_{\text {VS},j}\left( x_j, v_j\right) = \min _{u_j}&\quad g\left( x_j, u_j, v_j\right) + \tilde{J}_{j+1}\left( f(x_j, u_j, v_j); \,r_j^J\right) \nonumber \\ \text {subject to:}&\quad u_j \in \tilde{\mathcal {U}}_j^{v}\left( x_j, v_j\right) , \end{aligned}$$
(38)

where \(\tilde{J}_N(x_N) := g_N(x_N)\). Let \(\xi _{\text {VS}, j}^{\tilde{u}}(x_j, v_j)\) denote a solution of the optimization problem (38):

$$\begin{aligned} \xi _{\text {VS},j}^{\tilde{u}}\left( x_j, v_j\right) \in \mathop {\mathrm {arg\,min}}\limits _{u_j}&\quad g\left( x_j, u_j, v_j\right) + \tilde{J}_{j+1}\left( f(x_j, u_j, v_j); r_j^J\right) \nonumber \\ \text {subject to:}&\quad u_j \in \tilde{\mathcal {U}}_j^{v}\left( x_j, v_j\right) . \end{aligned}$$
(39)

The finite-horizon control law (37) can then be approximated by:

$$\begin{aligned} \tilde{\pi }_{\text {VS}} = \left\{ \left( \mu _{\text {VS},0}^{\tilde{u}}, \mu _{\text {VS},0}^{\tilde{v}} \right) , \ldots , \left( \mu _{\text {VS},N-1}^{\tilde{u}}, \mu _{\text {VS},N-1}^{\tilde{v}} \right) \right\} , \end{aligned}$$
(40)

with:

$$\begin{aligned} \mu _{\text {VS},j}^{\tilde{v}}(x_j)&\in \mathop {\mathrm {arg\,min}}\limits _{v_j} \tilde{Q}_{\text {VS},j}(x_j, v_j) \text { subject to: } v_j \in \mathbb {M}, \end{aligned}$$
(41)
$$\begin{aligned} \mu _{\text {VS},j}^{\tilde{u}}(x_j)&= \xi _{\text {VS},j}^{\tilde{u}}\left( x_j, \mu _{\text {VS},j}^{\tilde{v}}(x_j) \right) . \end{aligned}$$
(42)

If a closed-form expression for the partial derivative \([\partial \tilde{J}_{j+1} / \partial x_{j+1}]\) is available, well-established gradient methods can be used to solve (38). Such methods readily handle the convex constraints \(u_j \in \tilde{\mathcal {U}}_j^{v}(x_j, v_j)\), see e.g. [37, Chapter 3]. The approach of approximating the cost-to-go by neural networks in order to guarantee constraint satisfaction has been proposed in [38] for systems without switching. Note that satisfaction of the constraints (35) is guaranteed even for imperfect approximations of the optimal cost-to-go functions, and even if the iterative procedure of the gradient method is stopped before a local minimum is found.
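One standard instance of such a method is projected gradient descent; the following sketch uses illustrative step-size and iteration choices, where `grad_J` is assumed to return the network gradient according to (54), and `project` denotes the Euclidean projection onto \(\tilde{\mathcal {U}}_j^{v}(x_j, v_j)\), e.g. via the QP discussed around (45) below.

```python
def lookahead_input(x, v, A, B, R, grad_J, project, u0,
                    steps=50, alpha=0.05):
    """Projected-gradient sketch for the look-ahead problem (38):
    minimize g(x, u, v) + J~_{j+1}(f(x, u, v)) over u in U~_j^v(x, v).

    Since every iterate is projected onto the constraint polytope,
    the returned input is admissible even if the loop stops early.
    The x^T Q_v x part of the stage cost does not depend on u."""
    u = project(u0)
    for _ in range(steps):
        x_next = A[v] @ x + B[v] @ u       # predicted successor state
        # d/du [u^T R_v u + J~(A_v x + B_v u)] = 2 R_v u + B_v^T grad_J
        grad = 2 * R[v] @ u + B[v].T @ grad_J(x_next)
        u = project(u - alpha * grad)      # gradient step + projection
    return u
```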

The alternative approach of approximating the functions \(\left( \mu _j^{\tilde{u}^{*}}, \mu _j^{\tilde{v}^{*}} \right)\) directly by parametric functions \(\left( \mu _{\text {PS}, j}^{\tilde{u}}, \mu _{\text {PS}, j}^{\tilde{v}} \right)\) with real-valued parameter vectors \(\left( r_j^u, r_j^v\right)\) constitutes a so-called approximation in policy space [30]. In what follows, a possible realization of \(\mu _{\text {PS}, j}^{\tilde{v}}\) is presented.

Motivated by classification tasks, parametric functions:

$$\begin{aligned} p_j\left( x_j; r_j^v\right)&= \begin{bmatrix} p_{1, j}\left( x_j \right)&\ldots&p_{M, j}\left( x_j \right) \end{bmatrix}^\mathrm{T}, \\&\quad \sum _{i = 1}^M p_{i, j}\left( x_j \right) = 1, \quad p_{i, j}\left( x_j \right) \ge 0 \end{aligned}$$

are introduced, which are trained to predict the probability \(p_{v_j, j}\) of a discrete input \(v_j\) being optimal for state \(x_j\) at time j. Note that \(p_j(x_j)\) represents by definition a valid probability distribution. The function \(\mu _{\text {PS}, j}^{\tilde{v}}\) can be defined as the one that assigns to each state \(x_j\) at time j the discrete input \(v_j\) with the highest predicted probability of being optimal. The procedure of establishing \(p_j\) as a neural network is described in Section “Neural Networks as Parametric Approximators”.

The finite-horizon control law (37) can be approximated on the basis of approximation in policy space by:

$$\begin{aligned} \tilde{\pi }_{\text {PS}} = \left\{ \left( \mu _{\text {PS}, 0}^{\tilde{u}_{\text {Proj}}}, \mu _{\text {PS}, 0}^{\tilde{v}} \right) , \ldots , \left( \mu _{\text {PS}, N-1}^{\tilde{u}_{\text {Proj}}}, \mu _{\text {PS}, N-1}^{\tilde{v}} \right) \right\} , \end{aligned}$$
(43)

with:

$$\begin{aligned} \mu _{\text {PS}, j}^{\tilde{v}}(x_j)&\in \mathop {\mathrm {arg\,max}}\limits _{v_j} p_{v_j, j}\left( x_j\right) \text { subject to: } v_j \in \mathbb {M}, \end{aligned}$$
(44)
$$\begin{aligned} \mu _{\text {PS}, j}^{\tilde{u}_{\text {Proj}}}(x_j)&\in \mathop {\mathrm {arg\,min}}\limits _{u_j} \left\| u_j - \mu _{\text {PS}, j}^{\tilde{u}}(x_j) \right\| _2^2 \text { s.t.: } u_j \in \tilde{\mathcal {U}}_j^v\left( x_j, \mu _{\text {PS}, j}^{\tilde{v}}\left( x_j\right) \right) . \end{aligned}$$
(45)

The projection of an inadmissible input \(\mu _{\text {PS}, j}^{\tilde{u}}(x_j) \not \in \tilde{\mathcal {U}}_j^v\left( x_j, \mu _{\text {PS}, j}^{\tilde{v}}(x_j)\right)\) onto the polytope \(\tilde{\mathcal {U}}_j^v\left( x_j, \mu _{\text {PS}, j}^{\tilde{v}}(x_j)\right)\) in (45) can be carried out efficiently by solving a QP. This projection has been proposed in [31] to guarantee constraint satisfaction of neural network controllers; here, the approach guarantees the satisfaction of the constraints (35).
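Such a projection QP can be sketched in a few lines with cvxpy (the function name is illustrative):

```python
import cvxpy as cp

def project_input(u_nn, H, h):
    """Projection QP (45): the admissible input nearest to the network
    output u_nn in the Euclidean norm, over the polytope
    U~_j^v = {u : H u <= h} from (29)."""
    u = cp.Variable(u_nn.shape[0])
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nn)),
                      [H @ u <= h])
    prob.solve()
    return u.value
```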

Training and Training Data Generation

The prediction of the optimal cost-to-go \(\tilde{J}_j^{*}\) by \(\tilde{J}_j\) for a state \(x_j\) at time j can be cast as a regression task. Assume for the moment that a parametric function \(\tilde{J}_j\) and a data set consisting of state-cost pairs \(\left( x_j^s, J_j^s \right)\), \(s \in \left\{ 1, \ldots , q_j^J \right\}\), are available. Each value \(J_j^s\) denotes a regression target that represents a cost sample for the corresponding sample state \(x_j^s\). The parameter vector \(r_j^J\) can then be adapted with the aim of improving the performance on the considered regression task by learning from the data set. Of course, a performance measure is required for this, and the mean-squared error is a typical choice. The adaptation procedure, typically named training, is an instance of supervised learning, for which several established algorithms exist, see e.g. [39]. The parameter vectors \(r_j^u\) and \(r_j^v\) can be adapted by supervised learning, too, provided that data sets \(\left( x_j^s, u_j^s \right)\), \(s \in \left\{ 1, \ldots , q_j^u \right\}\), and \(\left( x_j^s, v_j^s \right)\), \(s \in \left\{ 1, \ldots , q_j^v \right\}\), are available.
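As an illustrative sketch (framework, architecture, and hyperparameters are exemplary choices, not those prescribed by the paper), the regression for \(\tilde{J}_j\) could be set up in PyTorch as follows:

```python
import torch
import torch.nn as nn

n_x, width = 2, 32                      # illustrative dimensions
J_net = nn.Sequential(                  # sigmoid hidden layers ...
    nn.Linear(n_x, width), nn.Sigmoid(),
    nn.Linear(width, width), nn.Sigmoid(),
    nn.Linear(width, 1))                # ... and a linear output unit
opt = torch.optim.Adam(J_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                  # mean-squared error criterion

def fit_cost_to_go(xs, Js, epochs=500):
    """xs: (q, n_x) tensor of sample states x_j^s,
    Js: (q, 1) tensor of cost targets J_j^s."""
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(J_net(xs), Js)   # regression loss on the data set
        loss.backward()
        opt.step()
```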

Algorithm 2: Offline generation of training data by approximate dynamic programming

The training data may originate from offline solutions of the considered MIQP problem. This approach, however, may take too much time due to the exponential growth of the number of possible discrete input sequences with N. An alternative is the use of approximate dynamic programming or reinforcement learning methods. The offline procedure in Algorithm 2 constitutes an approximate dynamic programming example that extends the sequential dynamic programming procedure from [30] to DSLS.

Neural Networks as Parametric Approximators

Figure 1 illustrates a feed-forward neural network that is characterized by a chain structure of the form:

$$\begin{aligned} h(x_j) = \left( h^{(L)} \circ \cdots \circ h^{(2)} \circ h^{(1)}\right) (x_j), \end{aligned}$$
(46)

where \(h^{(L)}\) denotes the final layer and \(h^{(l)}\) the hidden layer \(l\in \{1,\ldots ,L-1\}\).

Fig. 1: Architecture of a feed-forward neural network

Further, \(\eta ^{(l)}\) denotes the output of layer l, and \(\eta ^{(0)}\) constitutes the input of the overall network:

$$\begin{aligned} \eta ^{(0)}(x_j)&= x_j, \end{aligned}$$
(47)
$$\begin{aligned} \eta ^{(l)}(x_j)&= \left( h^{(l)} \circ \cdots \circ h^{(1)}\right) (x_j). \end{aligned}$$
(48)

The hidden layers in Fig. 1 are vector-to-vector functions of the form

$$\begin{aligned} h^{(l)}\left( \eta ^{(l-1)}\right) = \left( \phi ^{(l)} \circ \psi ^{(l)}\right) \left( \eta ^{(l-1)}\right) , \end{aligned}$$
(49)

with affine and nonlinear transformations \(\psi ^{(l)}\) and \(\phi ^{(l)}\), respectively. The affine transformation is determined by the weight matrix \(W^{(l)}\) and the bias vector \(b^{(l)}\):

$$\begin{aligned} \psi ^{(l)}\left( \eta ^{(l-1)}\right) = W^{(l)} \eta ^{(l-1)} + b^{(l)}. \end{aligned}$$
(50)

Each layer consists of parallel acting units, and a positive integer \(S^{(l)}\) describes the number of units in layer l. Each unit i in layer l defines a vector-to-scalar function, which is the i-th component of \(h^{(l)}\). For the hidden layers, \(h_i^{(l)}\left( \eta ^{(l-1)}\right) = \phi _{i}^{(l)}\left( W^{(l)} \eta ^{(l-1)} + b^{(l)}\right)\) with \(\phi _{i}^{(l)}\) denoting an activation function. Typical choices are rectified linear units or sigmoid functions. For the purposes of this work, linear and softmax output units are considered. For a neural network with linear output units, the function \(h^{(L)}\) is an affine transformation:

$$\begin{aligned} \psi ^{(L)}\left( \eta ^{(L-1)}\right) = W^{(L)} \eta ^{(L-1)} + b^{(L)}. \end{aligned}$$
(51)

Such an affine transformation arises also in softmax output units, in which \(h_i^{(L)}\) is set to:

$$\begin{aligned} \begin{aligned} \text {softmax}_{i}\left( \psi ^{(L)}\left( \eta ^{(L-1)}\right) \right) = \frac{\exp \left( \psi _i^{(L)}\left( \eta ^{(L-1)}\right) \right) }{\sum _{j=1}^{S^{(L)}}\exp \left( \psi _j^{(L)}\left( \eta ^{(L-1)}\right) \right) }. \end{aligned} \end{aligned}$$
(52)

The neural network (46) belongs to the family of parametric functions, whose shape is formed by the parameter vector that consists of the weights and biases:

$$\begin{aligned} r = \begin{bmatrix} W_{1,1}^{(1)}&\ldots&W_{S^{(L)},S^{(L-1)}}^{(L)}&b_1^{(1)}&\ldots&b_{S^{(L)}}^{(L)} \end{bmatrix}^\mathrm{T}. \end{aligned}$$
(53)

Subsequently, the neural network structure (46) is considered as basis for parametric approximators. For the cost-to-go function approximators \(\tilde{J}_j\), the use of continuous and continuously differentiable activation functions (such as sigmoid functions) and linear output units is proposed. As shown in [38], this allows one to derive closed-form expressions for the partial derivatives of h with respect to its arguments:

$$\begin{aligned} \frac{\partial h\left( x_j\right) }{\partial x_j}&= \prod _{i=0}^{L-1} \frac{\partial h^{(L-i)}\left( \eta ^{\left( L-(i+1)\right) }\left( x_j\right) \right) }{\partial \eta ^{\left( L-(i+1)\right) }}, \nonumber \\ \frac{\partial h^{(l)}\left( \eta ^{(l-1)}\left( x_j\right) \right) }{\partial \eta ^{(l-1)}}&= \frac{\partial \phi ^{(l)}\left( \psi ^{(l)}\left( \eta ^{(l-1)}\left( x_j\right) \right) \right) }{\partial \psi ^{(l)}}\cdot W^{(l)}, \nonumber \\ \frac{\partial h^{(L)}\left( \eta ^{(L-1)}\left( x_j\right) \right) }{\partial \eta ^{(L-1)}}&= W^{(L)}, \quad l \in \{ 1, \ldots , L-1\}. \end{aligned}$$
(54)
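A minimal numpy sketch of the forward pass (46)-(51) together with the closed-form Jacobian (54), for sigmoid hidden layers and a linear output unit (the softmax unit (52) is included for completeness; all names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())          # softmax output unit (52), stabilized
    return e / e.sum()

def forward_and_jacobian(x, Ws, bs):
    """Forward pass (46)-(51) for sigmoid hidden layers and a linear
    output layer, together with the closed-form Jacobian dh/dx of (54).
    Ws and bs hold the weight matrices W^(l) and bias vectors b^(l)."""
    eta, jac = x, np.eye(x.shape[0])
    for W, b in zip(Ws[:-1], bs[:-1]):
        eta = sigmoid(W @ eta + b)                     # h^(l), (49)-(50)
        jac = (eta * (1.0 - eta))[:, None] * W @ jac   # diag(phi') W jac
    out = Ws[-1] @ eta + bs[-1]                        # output unit (51)
    return out, Ws[-1] @ jac                           # factor W^(L), (54)
```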

Linear output units are further proposed for establishing \(\mu _{\text {PS},j}^{\tilde{u}}\). Here, it is not necessarily required that the activation functions are continuous and continuously differentiable. The softmax output unit, on the other hand, is proposed as output unit for \(p_j\). It is common to use softmax units as output units to represent probability distributions over different classes [39]. According to (52), each output of the neural network with softmax output units is in between 0 and 1, and all outputs sum up to 1, leading to a valid probability distribution.

Receding-Horizon Control with Parametric Function Approximators

The RHC strategy (23) can be computed by solving a QP for each possible discrete input sequence. As already mentioned, this procedure rapidly becomes computationally intractable for increasing N, due to the exponential growth of the number of possible discrete input sequences. The approach presented in this section aims at approximating the RHC strategy to make online application possible, and is based on the idea of solving a QP only for a small number of discrete input sequences. Of course, a procedure is desirable that selects those discrete input sequences which are promising candidates for being the true optimal one(s). The procedure proposed in this section is based on the ideas for approximating the finite-horizon control law by neural networks, as presented in the previous section.

Let \(\mathcal {V}(k)\) denote a small set of selected discrete input sequences at time k, and suppose for a moment that this set is available. For a given state \(x_k\), the approach computes the input sequences \(\phi _{0 \vert k}^{\tilde{v}} = \left( \tilde{v}_{0 \vert k}, \ldots , \tilde{v}_{N-1 \vert k} \right)\) and \(\phi _{0 \vert k}^{\tilde{u}} = \left( \tilde{u}_{0 \vert k}, \ldots , \tilde{u}_{N-1 \vert k} \right)\) by solving the QP defined in (25) for each discrete input sequence \(\phi _{0 \vert k}^v \in \mathcal {V}(k)\):

$$\begin{aligned} \phi _{0 \vert k}^{\tilde{v}}&= \Phi _0^{\tilde{v}}\left( x_k\right) \in \mathop {\mathrm {arg\,min}}\limits _{\phi _{0 \vert k}^v} \tilde{V}_0^{*}\left( x_k, \phi _{0 \vert k}^v \right) \text { subject to: } \phi _{0 \vert k}^v \in \mathcal {V}(k) \end{aligned}$$
(55)
$$\begin{aligned} \phi _{0 \vert k}^{\tilde{u}}&= \Phi _0^{\tilde{u}}(x_k) \in \mathop {\mathrm {arg\,min}}\limits _{\phi _{0 \vert k}^u} J_0\left( x_k, \phi _{0 \vert k}^u, \Phi _0^{\tilde{v}}\left( x_k\right) \right) \text { subject to: (25b)}. \end{aligned}$$
(56)

The approximated RHC strategy is obtained by applying the first element \(\tilde{u}_{0 \vert k}\) of the continuous input sequence \(\phi _{0 \vert k}^{\tilde{u}}\) and the first element \(\tilde{v}_{0 \vert k}\) of the discrete input sequence \(\phi _{0 \vert k}^{\tilde{v}}\) to the DSLS (1) at time k:

$$\begin{aligned} (u_k, v_k) = \left( \tilde{u}_{0 \vert k}, \tilde{v}_{0 \vert k} \right) =: \left( \mu _0^{\tilde{u}}\left( x_k\right) , \mu _0^{\tilde{v}}\left( x_k\right) \right) . \end{aligned}$$
(57)

The closed-loop dynamics for the DSLS (1) controlled by the approximated RHC strategy is then:

$$\begin{aligned} x_{k+1} = f\left( x_k, \mu _0^{\tilde{u}}\left( x_k\right) , \mu _0^{\tilde{v}}\left( x_k\right) \right) =: \tilde{f}_{\text {cl}}\left( x_k\right) , \quad k \in \mathbb {N}_0. \end{aligned}$$
(58)

Subsequently, \(\tilde{J}_{\text {RHC}, 0}\) is defined as:

$$\begin{aligned} \tilde{J}_{\text {RHC}, 0}\left( x_k\right) := \tilde{V}_0^{*}\left( x_k, \Phi _0^{\tilde{v}}\left( x_k\right) \right) = J_0\left( x_k, \Phi _0^{\tilde{u}}\left( x_k\right) , \Phi _0^{\tilde{v}}\left( x_k\right) \right) , \end{aligned}$$
(59)

and constitutes, obviously, an upper bound to the optimal cost-to-go \(\tilde{J}_0^{*}\).

For the determination of \(\mathcal {V}(k)\) at time k, \(M^{\ell }\) different discrete input sequences \(\left( v_{0 \vert k}^{[i]}, \ldots , v_{\ell \vert k}^{[i]}, \ldots , v_{N-1 \vert k}^{[i]}\right)\), \(i \in \left\{ 1, \ldots , M^{\ell } \right\}\) are generated by a combination of approximation in value space and approximation in policy space, as described next. First, for each possible subsequence \(\left( v_{0 \vert k}^{[i]}, \ldots , v_{\ell -1 \vert k}^{[i]}\right) \in \mathbb {M}^{\ell }\), the state \(x_{\ell \vert k}^{[i]}\) is determined recursively as illustrated in Fig. 2:

$$\begin{aligned} x_{j+1 \vert k}^{[i]} = f\left( x_{j \vert k}^{[i]}, \xi _{\text {VS}, j}^{\tilde{u}}\left( x_{j \vert k}^{[i]}, v_{j \vert k}^{[i]} \right) , v_{j \vert k}^{[i]}\right) , \quad j \in \{0, \ldots , \ell -1\}. \end{aligned}$$
(60)
Fig. 2: Generation of discrete input sequences for the determination of \(\mathcal {V}(k)\) with \(\ell = 1\)

Here, \(x_{0 \vert k}^{[i]}\) is the current state \(x_k\) at time k, i.e. \(x_{0 \vert k}^{[i]} = x_k\). Recall that the value of the function \(\xi _{\text {VS}, j}^{\tilde{u}}\) for state \(x_{j \vert k}^{[i]}\) and discrete input \(v_{j \vert k}^{[i]}\) results, according to (39), from the solution of the nonlinear program (38) with convex constraints. The application of well-established gradient methods is possible here due to the availability of the closed-form expression for the gradient of the neural network \(\tilde{J}_{j+1}\), as specified in (54). The remaining subsequence \(\left( v_{\ell \vert k}^{[i]}, \ldots , v_{N-1 \vert k}^{[i]}\right)\) follows from the approximated finite-horizon control law specified in (43):

$$\begin{aligned} x_{j+1 \vert k}^{[i]}&= f\left( x_{j \vert k}^{[i]}, \mu _{\text {PS},j}^{\tilde{u}_{\text {Proj}}}\left( x_{j \vert k}^{[i]} \right) , \mu _{\text {PS},j}^{\tilde{v}}\left( x_{j \vert k}^{[i]} \right) \right) , \nonumber \\&\quad j \in \{\ell , \ldots , N-1\}. \end{aligned}$$
(61)

In addition, one further discrete input sequence \(\left( v_{0 \vert k}^{[0]}, \ldots , v_{N-1 \vert k}^{[0]}\right)\) is selected, chosen such that asymptotic stability is guaranteed. Motivated by the proof of Proposition 3, \(\left( v_{0 \vert k}^{[0]}, \ldots , v_{N-2 \vert k}^{[0]}, v_{N-1 \vert k}^{[0]}\right)\) is set to \(\left( \tilde{v}_{1 \vert k-1}, \ldots , \tilde{v}_{N-1 \vert k-1}, 1\right)\) for \(k>0\). For \(k=0\), on the other hand, the discrete input sequence \(\left( v_{0 \vert k}^{[0]}, \ldots , v_{N-1 \vert k}^{[0]}\right)\) is arbitrarily selected from \(\mathbb {M}^{N}\). Algorithm 3 summarizes the procedure of determining the set \(\mathcal {V}(k)\).

Algorithm 3: Determination of the set \(\mathcal {V}(k)\)
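A hedged Python sketch of this procedure is given below; the helper functions are assumptions standing in for (60), (61), and the shifted stabilizing sequence, and modes are 0-indexed in the code.

```python
import itertools

def build_candidate_set(x_k, M, ell, f, xi_vs_u, rollout_ps, shifted_seq):
    """Hedged sketch of Algorithm 3: assemble V(k).

    xi_vs_u(j, x, v): continuous input from the look-ahead problem (39);
    rollout_ps(ell, x): tuple (v_ell, ..., v_{N-1}) produced by rolling
    out the policy-space law (43) via (61); shifted_seq: the stabilizing
    sequence (v~_{1|k-1}, ..., v~_{N-1|k-1}, 1) for k > 0, or an
    arbitrary sequence from M^N at k = 0."""
    V = []
    for prefix in itertools.product(range(M), repeat=ell):
        x = x_k
        for j, v in enumerate(prefix):        # (60): value-space rollout
            x = f(x, xi_vs_u(j, x, v), v)
        V.append(prefix + rollout_ps(ell, x)) # (61): policy-space tail
    V.append(shifted_seq)                     # sequence securing stability
    return V
```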

Theorem 1

For the DSLS (1) with constraints (2), let \(\mathcal {V}(k)\) be determined by Algorithm 3. Then, the approximated RHC strategy (57) is recursively feasible for \(x_0 \in \tilde{\mathcal {X}}_0\), and the origin of the closed-loop system (58) is asymptotically stable with domain of attraction \(\tilde{\mathcal {X}}_0\).

Proof of Theorem 1

Given \(x_k\) at time k, the approximated RHC strategy solves the QP defined in (25) for all \(\phi _{0 \vert k}^v \in \mathcal {V}(k)\). The definition of \(\tilde{\mathcal {X}}_0\) ensures that the QP has a feasible solution for any \(\phi _{0 \vert k}^v \in \mathbb {M}^N \supseteq \mathcal {V}(k)\) if \(x_k \in \tilde{\mathcal {X}}_0\), such that \(x_k \in \tilde{\mathcal {X}}_0\) ensures feasibility of the approximated RHC strategy, too. If feasible, the constraints of the QP enforce that \(\tilde{f}_{\text {cl}}\left( x_k \right) \in \tilde{\mathcal {X}}_1\). It follows immediately from Proposition 1 and (21) that \(\tilde{\mathcal {X}}_0 \supseteq \tilde{\mathcal {X}}_1\), such that recursive feasibility of the approximated RHC strategy is guaranteed for \(x_0 \in \tilde{\mathcal {X}}_0.\)

Now consider an \(x_0 \in \tilde{\mathcal {X}}_0\), and let \(\phi _{0 \vert 0}^{\tilde{x}} = \left( \tilde{x}_{0 \vert 0}, \tilde{x}_{1 \vert 0}, \ldots , \tilde{x}_{N \vert 0} \right)\) be the state sequence with \(\tilde{x}_{N \vert 0} = 0\) and \(\tilde{x}_{0 \vert 0} := x_0\) that results from the input sequences \(\phi _{0 \vert 0}^{\tilde{u}} = \Phi _0^{\tilde{u}}(x_0)\) and \(\phi _{0 \vert 0}^{\tilde{v}} = \Phi _0^{\tilde{v}}(x_0)\). Hence, one gets:

$$\begin{aligned} \tilde{J}_{\text {RHC}, 0}(x_0)&= g_N\left( \tilde{x}_{N \vert 0} \right) + \sum _{j = 0}^{N-1} g\left( \tilde{x}_{j \vert 0}, \tilde{u}_{j \vert 0}, \tilde{v}_{j \vert 0} \right) \\&= \sum _{j = 0}^{N-1} g\left( \tilde{x}_{j \vert 0}, \tilde{u}_{j \vert 0}, \tilde{v}_{j \vert 0} \right) . \end{aligned}$$

Due to \(u_0 = \tilde{u}_{0 \vert 0}\) and \(v_0 = \tilde{v}_{0 \vert 0}\), it follows from (1) that \(x_1 = f\left( x_0, \tilde{u}_{0 \vert 0}, \tilde{v}_{0 \vert 0} \right)\), such that \(x_1 = \tilde{x}_{1 \vert 0}\).

The state sequence \(\phi _{0 \vert 1}^{x} = \left( x_{0 \vert 1}, \ldots , x_{N-1 \vert 1}, x_{N \vert 1} \right) := \left( \tilde{x}_{1 \vert 0}, \ldots , \tilde{x}_{N \vert 0}, 0 \right)\) corresponds to the continuous input sequence \(\phi _{0 \vert 1}^{u} = \left( u_{0 \vert 1}, \ldots , u_{N-2 \vert 1}, u_{N-1 \vert 1} \right) := \left( \tilde{u}_{1 \vert 0}, \ldots , \tilde{u}_{N-1 \vert 0}, 0 \right)\) and the discrete input sequence \(\phi _{0 \vert 1}^{v} = \left( v_{0 \vert 1}, \ldots , v_{N-1 \vert 1} \right) := \left( v_{0 \vert 1}^{[0]}, \ldots , v_{N-2 \vert 1}^{[0]}, v_{N-1 \vert 1}^{[0]}\right) = \left( \tilde{v}_{1 \vert 0}, \ldots , \tilde{v}_{N-1 \vert 0}, 1\right)\); note that \(x_{N \vert 1} = 0\) is consistent with the dynamics, since \(\tilde{x}_{N \vert 0} = 0\) and \(f(0, 0, 1) = 0\). Since the sequences \(\phi _{0 \vert 1}^{x}\) and \(\phi _{0 \vert 1}^{u}\) satisfy \(x_{j \vert 1} \in \tilde{\mathcal {X}}_j\) and \(u_{j \vert 1} \in \mathcal {U}\), respectively, they are admissible, such that the cost:

$$\begin{aligned} J_0\left( x_1, \phi _{0 \vert 1}^{u}, \phi _{0 \vert 1}^{v} \right)&= g_N\left( x_{N \vert 1} \right) + \sum _{j = 0}^{N-1} g\left( x_{j \vert 1}, u_{j \vert 1}, v_{j \vert 1} \right) \\&= \sum _{j = 0}^{N-2} g\left( x_{j \vert 1}, u_{j \vert 1}, v_{j \vert 1} \right) \\&= \sum _{j = 1}^{N-1} g\left( \tilde{x}_{j \vert 0}, \tilde{u}_{j \vert 0}, \tilde{v}_{j \vert 0} \right) \\&= \tilde{J}_{\text {RHC}, 0}(x_0) - g\left( \tilde{x}_{0 \vert 0}, \tilde{u}_{0 \vert 0}, \tilde{v}_{0 \vert 0} \right) \end{aligned}$$

constitutes an upper bound on \(\tilde{V}^{*}\left( x_1, \phi _{0 \vert 1}^{v} \right)\). On the other hand, \(\phi _{0 \vert 1}^{v} \in \mathcal {V}(1)\), such that \(\tilde{V}^{*}\left( x_1, \phi _{0 \vert 1}^{v} \right)\) constitutes an upper bound on \(\tilde{J}_{\text {RHC}, 0}(x_1) = \tilde{V}^{*}\left( x_1, \phi _{0 \vert 1}^{\tilde{v}} \right)\). Hence:

$$\begin{aligned} \tilde{J}_{\text {RHC}, 0}(x_1) \le J_0\left( x_1, \phi _{0 \vert 1}^{u}, \phi _{0 \vert 1}^{v} \right)&= \tilde{J}_{\text {RHC}, 0}(x_0)\\&\quad - g\left( \tilde{x}_{0 \vert 0}, \tilde{u}_{0 \vert 0}, \tilde{v}_{0 \vert 0} \right) , \end{aligned}$$

and it follows by induction that:

$$\begin{aligned}&\tilde{J}_{\text {RHC}, 0}\left( \tilde{f}_{\text {cl}}(x_k) \right) - \tilde{J}_{\text {RHC}, 0}(x_k) \le - g\left( x_k, \mu _0^{\tilde{u}}(x_k), \mu _0^{\tilde{v}}(x_k) \right) ,\nonumber \\&\quad \forall x_k \in \tilde{\mathcal {X}}_0. \end{aligned}$$
(62)

Since \(x_k \in \tilde{\mathcal {X}}_0\) implies \(\tilde{f}_{\text {cl}}(x_k) \in \tilde{\mathcal {X}}_0\), the state sequence of the closed-loop system (58) remains within \(\tilde{\mathcal {X}}_0\) for any \(x_0 \in \tilde{\mathcal {X}}_0\). Note that the stage cost g and the terminal cost \(g_N\) are continuous and positive definite functions, and that \(\tilde{J}_{\text {RHC}, 0}(x_k)\) is bounded from below by zero. Hence, \(\tilde{J}_{\text {RHC}, 0}(x_k)\) decreases according to (62) along any state sequence starting in \(\tilde{\mathcal {X}}_0\), i.e. it acts as a Lyapunov function, and convergence to the origin without leaving \(\tilde{\mathcal {X}}_0\) is guaranteed for \(k \rightarrow \infty\). \(\square\)

Numerical Example

This section provides a numerical example to illustrate and evaluate the proposed approach, inspired by the numerical example considered in [16] for the finite-horizon case.

The switched system (1) is parameterized by the matrices:

$$\begin{aligned} A_1&= \begin{bmatrix} 0 & 1 \\ -0.8 & 2.4 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 1 \\ -1.8 & 3.6 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & 1 \\ -0.56 & 1.8 \end{bmatrix}, \\ A_4&= \begin{bmatrix} 0 & 1 \\ -8 & 6 \end{bmatrix}, \quad B_1 = B_2 = B_3 = B_4 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \end{aligned}$$

and is subject to polytopic constraints (2) with \(\mathcal {X} = \{x \in \mathbb {R}^2 \, \vert \, \vert x_i \vert \le 1, \, i \in \{1,2\}\}\) and \(\mathcal {U} = \{u \in \mathbb {R} \, \vert \, \vert u \vert \le 4\}\). Furthermore, a quadratic cost function of type (3) is chosen with prediction horizon \(N=6\) and:

$$\begin{aligned} P&= Q_1 = Q_2 = Q_3 = Q_4 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \\&\quad R_1 = R_2 = R_3 = R_4 = 1. \end{aligned}$$
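
For later reference, this parameterization transcribes directly into code; the following numpy setup merely restates the values above and introduces no assumptions beyond them:

```python
import numpy as np

# mode-dependent system matrices of the DSLS (1)
A = {1: np.array([[0.0, 1.0], [-0.8,  2.4]]),
     2: np.array([[0.0, 1.0], [-1.8,  3.6]]),
     3: np.array([[0.0, 1.0], [-0.56, 1.8]]),
     4: np.array([[0.0, 1.0], [-8.0,  6.0]])}
B = {v: np.array([[0.0], [1.0]]) for v in A}

x_max, u_max, N = 1.0, 4.0, 6           # |x_i| <= 1, |u| <= 4, horizon N = 6
P = np.eye(2)                           # terminal weight
Q = {v: np.eye(2) for v in A}           # state weights Q_1 = ... = Q_4
R = {v: np.array([[1.0]]) for v in A}   # input weights R_1 = ... = R_4
```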

As in [16], all neural networks required for the proposed approach consist of one hidden layer with 50 units (i.e. \(S^{(1)} = 50\)), with the hyperbolic tangent as activation function in each hidden unit. The neural networks have been trained offline according to Algorithm 2 with \(q_j = 1000\); a sketch of this step is given below.
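
As an illustration of this offline step, the following sketch fits one such network by plain regression, assuming that \(q_j = 1000\) sample states and cost-to-go targets have already been generated; PyTorch and the random placeholder data are our choices for illustration, not the authors':

```python
import torch
import torch.nn as nn

# one hidden layer, 50 tanh units, scalar cost-to-go output
net = nn.Sequential(nn.Linear(2, 50), nn.Tanh(), nn.Linear(50, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

xs = torch.rand(1000, 2) * 2 - 1   # placeholder sample states x^p in X
Js = torch.rand(1000, 1)           # placeholder cost-to-go targets J_j(x^p)

for epoch in range(2000):          # simple full-batch regression
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(xs), Js)
    loss.backward()
    opt.step()
```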

To evaluate the approximation quality of the approximated RHC strategy, 1000 states \(x^p\) have been generated by gridding the set \(\tilde{\mathcal {X}}_0\). The latter has been determined with Algorithm 1 and is marked by the shaded polytope in Fig. 3. For each \(x^p\), the optimal cost-to-go \(\tilde{J}_0^{*}\left( x^p \right)\) and its approximation \(\tilde{J}_{\text {RHC},0}\left( x^p \right)\) for \(\ell = 1\) have been computed. The distribution of the optimal costs \(\tilde{J}_0^{*}\left( x^p \right)\) over the different states is shown in Fig. 4. Comparing the optimal costs \(\tilde{J}_0^{*}\left( x^p \right)\) with their approximations \(\tilde{J}_{\text {RHC},0}\left( x^p \right)\) yields a mean-squared error of only \(6.99 \times 10^{-5}\).
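
A possible form of this evaluation, assuming a membership test for \(\tilde{\mathcal {X}}_0\) (as obtained from Algorithm 1) is available and leaving the computation of the two cost arrays to the procedures above:

```python
import numpy as np

def grid_states(n_per_axis, in_X0):
    """Grid the box [-1, 1]^2 and keep the points inside X0_tilde;
    in_X0 is a placeholder membership test for the polytope from
    Algorithm 1."""
    axis = np.linspace(-1.0, 1.0, n_per_axis)
    pts = np.array([[a, b] for a in axis for b in axis])
    return pts[[bool(in_X0(p)) for p in pts]]

def eval_mse(J_opt, J_rhc):
    """Mean-squared error between the optimal costs J0*(x^p) and their
    approximations J_RHC,0(x^p)."""
    return float(np.mean((np.asarray(J_opt) - np.asarray(J_rhc)) ** 2))
```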

Fig. 3 Example of a state sequence obtained from the approximated RHC strategy for \(x_0 = \begin{bmatrix} 0.125&1 \end{bmatrix}^\mathrm{T}\). The shaded polytope marks \(\tilde{\mathcal {X}}_0\)

Fig. 4 Box plot showing the distribution of the optimal costs \(\tilde{J}_0^{*}\left( x^p\right)\) for 1000 states \(x^p\) obtained by gridding \(\tilde{\mathcal {X}}_0\)

The average online computation time for the determination of the optimal costs by complete enumeration over all \(4^N\) possible discrete input sequences (solving one QP for each) was 19.8 s on a standard notebook (Intel® Core™ i5-7200U processor), where the CPLEXQP solver from the IBM® ILOG® CPLEX® Optimization Studio has been used for the solution of the QPs. In contrast, when applying the proposed scheme, the average online computation time for determining \(\tilde{J}_{\text {RHC},0}\left( x^p \right)\) (and thus the control inputs) was only 0.96 s.
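
The enumeration baseline can be sketched as follows, substituting cvxpy for the CPLEX solver used in the paper and reusing the setup above; the terminal constraint \(x_N = 0\) is our assumption about the QP (25), motivated by the construction in the proof of Theorem 1:

```python
import cvxpy as cp
from itertools import product

def qp_cost(x0, modes):
    """Optimal cost of the QP for one fixed discrete input sequence
    (tuple of N mode indices in 1..4); inf if infeasible."""
    xs = cp.Variable((N + 1, 2))
    us = cp.Variable((N, 1))
    cons = [xs[0] == x0, xs[N] == 0]          # assumed terminal constraint
    cost = cp.quad_form(xs[N], P)
    for j, v in enumerate(modes):
        cons += [xs[j + 1] == A[v] @ xs[j] + B[v] @ us[j],
                 cp.abs(xs[j]) <= x_max, cp.abs(us[j]) <= u_max]
        cost += cp.quad_form(xs[j], Q[v]) + cp.quad_form(us[j], R[v])
    prob = cp.Problem(cp.Minimize(cost), cons)
    prob.solve()
    return prob.value

def optimal_cost(x0):
    """Brute force over all 4^N discrete input sequences."""
    return min(qp_cost(x0, m) for m in product((1, 2, 3, 4), repeat=N))
```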

Figure 3 shows a state sequence obtained from the approximated RHC strategy for an exemplary initial state \(x_0\), and demonstrates the asymptotic stability of the origin as proven in Theorem 1.
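
The closed-loop behavior and the decrease condition (62) can be checked numerically along the following lines, where rhc_step is a hypothetical placeholder for the approximated RHC strategy (57), and the system data from the setup above is reused:

```python
import numpy as np

def simulate(rhc_step, x0=np.array([0.125, 1.0]), steps=30, tol=1e-6):
    """Roll out the closed loop (58) and assert the decrease (62);
    rhc_step(x) -> (u, v, J) is assumed to return the applied continuous
    and discrete inputs and the cost J_RHC,0(x)."""
    x, J_prev, g_prev = x0, None, None
    traj = [x0]
    for _ in range(steps):
        u, v, J = rhc_step(x)
        g_k = float(x @ Q[v] @ x + u @ R[v] @ u)  # stage cost g(x, u, v)
        if J_prev is not None:
            assert J <= J_prev - g_prev + tol      # decrease per (62)
        x = A[v] @ x + (B[v] @ u).ravel()          # closed-loop step (58)
        traj.append(x)
        J_prev, g_prev = J, g_k
    return np.array(traj)                          # should approach the origin
```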

Conclusion

This paper has considered the optimal control of discrete-time constrained switched linear systems (with externally forced switching) based on the principle of receding-horizon control. Building on previous work of the authors, an approach for approximating the optimal receding-horizon control strategy with neural networks as parametric function approximators has been developed. Important properties, such as guaranteed satisfaction of the polytopic constraints, recursive feasibility, and asymptotic stability of the origin of the closed-loop system under the approximated receding-horizon control strategy, have been proven. The numerical example has shown that the proposed approach computes approximate but close-to-optimal receding-horizon control strategies, and that it is considerably faster than solving the same problem by mixed-integer quadratic programming in every step.

The focus of this work has been on providing theoretical guarantees for the proposed approach, rather than on reducing the computation times through an efficient implementation. A streamlined implementation of the proposed algorithms will certainly allow for a further reduction of the computation times. Furthermore, an in-depth treatment of efficient training data generation has been out of the scope of this work. Investigating alternatives to sequential dynamic programming for the efficient generation of training data is a worthwhile direction of future work to improve the applicability of the proposed scheme.