Section 2 proposes the deployment of special multi-core ECUs in order to increase the availability of driving functions. Here, the deployment of multi-core ECUs is discussed as a means to increase computational performance and to open up new computational opportunities, especially in relation to automated driving. According to [1], parallel computing is defined as the concurrent use of a number of processors to solve one computational problem. To this end, the problem is decomposed into chunks of work, which are then allocated to several computing resources. The processor communication architecture, the partitioning, the communication schedule between processors and, finally, the modeling of such problems are described in this section.
Processing structure and essentials
A parallel computing system consists of several processors (cores) located at a small distance from each other, so that the exchange of data between them is reliable and predictable [4]. Different processor interconnection architectures exist. One possibility is to exchange data via a shared memory. Another option is a message-passing architecture, where the network consists either of directly connected links or of a shared bus. A combination of both architectures is called a hybrid architecture. For a first modeling approach, and for the sake of simplicity, we assume a simplified architecture of synchronized processor cores as shown in Fig. 2. Each core is directly connected to a buffer, which in turn is directly connected to the buffers of all other cores. Values of inputs \(\hat{u}\) and outputs \(\hat{y}\) are also provided to the cores via buffers.
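As a minimal sketch of this buffer interconnect (all names and the scalar value type are illustrative assumptions of ours, not part of the original architecture), each core owns one buffer that holds the value written at the latest update instant and can be read by every other core:

```python
from dataclasses import dataclass, field

@dataclass
class Buffer:
    """Holds the value written at the latest update instant."""
    value: float = 0.0

    def write(self, v: float) -> None:   # update at an exchange instant
        self.value = v

    def read(self) -> float:             # any core may read at any time
        return self.value

@dataclass
class Core:
    """A synchronized processor core with its own outgoing buffer."""
    name: str
    out: Buffer = field(default_factory=Buffer)

# Two cores plus buffers for the sampled plant input and output.
cores = [Core("core1"), Core("core2")]
u_hat, y_hat = Buffer(), Buffer()
```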
After the architecture has been identified, the control function can be decomposed. The decomposition method applied here is functional decomposition, where the computational workload is partitioned into sub-tasks that each consume fewer computing resources [1].
To ensure that a time interval for computation is allocated to every task, task scheduling at the multi-core level is needed. Static scheduling policies, where the execution order is determined off-line and remains unchanged during run-time (e.g., Round Robin), are used in real-time systems due to their predictability, whereas dynamic schedulers, where the execution order is determined on-line, are currently a subject of research [13]. Important task parameters for hard real-time systems, such as control systems, are the start time and the deadline of a task as well as its worst-case execution time. In such systems the tasks are executed periodically [5]. The continuous-time signals to and from the plant, such as the control input and the plant output, are sampled periodically and held constant between sampling instants [5].
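As a simple sketch of such a static schedule (the task names and the interval length are illustrative assumptions), a Round-Robin policy fixes the execution order off-line and repeats it unchanged at run-time:

```python
from itertools import cycle

T = 0.001                                # computational interval in s (illustrative)
tasks = ["task_xc1", "task_xc2"]         # placeholder task names
schedule = cycle(tasks)                  # fixed off-line order, repeated at run-time

for k in range(4):                       # four intervals t_k, ..., t_{k+3}
    task = next(schedule)
    # each task must finish before its deadline t_{k+1}
    print(f"t_{k} = {k * T:.3f} s: run {task}, deadline {(k + 1) * T:.3f} s")
```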
Modeling and system description
For the modeling of the parallel computation of controllers we use the idea of emulation design as in Networked Control Systems (NCS) [18], where the control system is a continuous-time dynamic system. First, a stabilizing controller is designed for the plant while the effects of parallel computation are ignored. The closed-loop system shown in Fig. 3(a) is defined by
$$\begin{aligned} \dot{x}_{p} &= f_{p}(x_{p},u), \qquad y = g_{p}(x_{p}), \end{aligned}$$
(1)
$$\begin{aligned} \dot{x}_{c} &= f_{c}(x_{c},y),\qquad u = g_{c}(x_{c}), \end{aligned}$$
(2)
where the plant state is \(x_{p} \in \mathbb{R}^{n_{p}}\), the controller state is \(x_{c} \in \mathbb{R}^{n_{c}}\), the control input is \(u \in \mathbb{R}^{n_{u}}\) and the plant output is \(y \in \mathbb{R}^{n_{y}}\), where the dimensions \(n_{c}\), \(n_{u}\), \(n_{y}\) and \(n_{p}\) are positive integers. The functions \(f_{p}\), \(g_{p}\), \(f_{c}\) and \(g_{c}\) are assumed to be continuous and sufficiently smooth. The time sequence \(t_{k}\) is defined by \(t_{0} < t_{1} < \ldots\) with \(T = t_{k+1} - t_{k}\) and satisfies \(T > 0\). The interval \(T\) is referred to as one computational interval and \(T_{s}\), with \(T_{s} \geq T\), as the computing cycle. \(T_{s}\) is divided into \(N_{stages}\) computational intervals with \(N_{stages} \in \mathbb{Z}_{>0}\) and \(T_{s} = t_{k + N_{stages}} - t_{k}\).
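To make the emulation step concrete, the following sketch simulates an illustrative scalar instance of (1)-(2); the dynamics and gains are our own assumptions, chosen so that the ideal (non-parallel) closed loop is stable:

```python
# Illustrative scalar instance of (1)-(2).
def f_p(x_p, u):  return x_p + u          # unstable plant dynamics
def g_p(x_p):     return x_p              # plant output
def f_c(x_c, y):  return -2.0 * x_c + y   # controller dynamics
def g_c(x_c):     return -4.0 * x_c       # control law

dt, t_end = 1e-4, 5.0                     # Euler step and simulation horizon
x_p, x_c = 1.0, 0.0                       # initial conditions
for _ in range(int(t_end / dt)):          # ideal closed loop, no parallelization
    y, u = g_p(x_p), g_c(x_c)
    x_p += dt * f_p(x_p, u)
    x_c += dt * f_c(x_c, y)
print(f"x_p(t_end) = {x_p:.2e}")          # decays toward zero: loop is stabilized
```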
Figure 3(b) shows a closed-loop system with a controller that is distributed over several processors. The controller is decomposed into \(n\) subsystems, \(x_{c} = [x_{c1}^{T}, \ldots, x_{cn}^{T}]^{T}\), which are allocated to \(n\) different processor cores. On the \(i\)-th core the differential equations of \(x_{ci}\) are solved sufficiently fast to be considered a continuous-time system. The variables \(\hat{u}\), \(\hat{y}\) and \(\hat{x}_{ci}\) denote buffer states containing the values of \(u\), \(y\) and \(x_{ci}\) at the latest update instants. These buffers operate as Zero-Order-Hold (ZOH) equivalents and satisfy \(\dot{\hat{y}} = 0\), \(\dot{\hat{u}} = 0\) and \(\dot{\hat{x}}_{ci} = 0\) between update instants. Inputs \(u\) and outputs \(y\) are sampled periodically every \(T_{s}\) and subsequently stored in \(\hat{u}\) and \(\hat{y}\); the \(\hat{x}_{ci}\) are the values of the states exchanged with the other processors at certain time instants. This leads to the following system for times \(t \in [t_{k}, t_{k+1}]\):
$$\begin{aligned} \dot{x}_{p} &= f_{p}(x_{p},\hat{u}) \\ y &= g_{p}(x_{p}) \\ \dot{x}_{c1} &= f_{c,1}(x_{c1},\hat{x}_{c2},\ldots ,\hat{x}_{cn} ,\hat{y}) \\ \vdots \\ \dot{x}_{cn} &= f_{c,n}(\hat{x}_{c1},\ldots ,\hat{x}_{c(n-1)},x_{cn},\hat{y}) \\ u &= g_{c}({x}_{c1},\ldots ,{x}_{cn}). \end{aligned}$$
(3)
For simplicity, \(g_{c}\) is assumed to have unrestricted access to \(x_{c}\).
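The following sketch illustrates one computational interval of (3) for \(n = 2\): each core integrates only its own controller state while all hat-quantities stay frozen (ZOH), and the buffers are refreshed only at the exchange instant \(t_{k+1}\). The subsystem dynamics and all values are placeholders of our own choosing:

```python
def f_c1(xc1, xc2_hat, y_hat):            # subsystem 1 (illustrative dynamics)
    return -2.0 * xc1 + xc2_hat + y_hat

def f_c2(xc2, xc1_hat, y_hat):            # subsystem 2 (illustrative dynamics)
    return xc1_hat - 3.0 * xc2 + y_hat

def integrate(xc, other_hat, y_hat, f, T, dt=1e-4):
    """Euler integration over [t_k, t_k + T] with buffered inputs held constant."""
    for _ in range(int(T / dt)):
        xc += dt * f(xc, other_hat, y_hat)
    return xc

T = 0.01
xc1, xc2 = 0.5, -0.2
xc1_hat, xc2_hat, y_hat = xc1, xc2, 1.0   # buffer contents at t_k

# Both cores integrate in parallel, each against frozen buffer values ...
xc1 = integrate(xc1, xc2_hat, y_hat, f_c1, T)
xc2 = integrate(xc2, xc1_hat, y_hat, f_c2, T)
# ... and at t_{k+1} the buffers are updated with the new states.
xc1_hat, xc2_hat = xc1, xc2
```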
Task and inter-processor communication scheduling
An important aspect of this approach is when and how the processors exchange data, which partly depends on the task scheduling. Here, a static task schedule such as Round Robin [18] is considered. The communication schedule policy itself is inspired by methods of parallel numerical computation of Ordinary Differential Equations (ODEs), especially by parallelism across the system [3]. It is a way of decoupling the system’s state equations to allow a parallel numerical integration [7]. Several methods have been introduced over the years. Important for our work is the Waveform Relaxation (WR) [12]. The system of ODEs is decomposed into coupled subsystems, and each subsystem is iteratively solved by its own numerical integration method over a given time interval. The solution of each subsystem for one iteration is called a wave. After one iteration the waves are exchanged and the solvers start again, until the numerical solution is close enough to the unique solution [3].
Two iterative methods can be distinguished: Jacobi WR and Gauss–Seidel WR [4]. In the Jacobi WR, the waves generated in the previous iteration are used for the current integration. Therefore, values of the waves at specified time instants are exchanged and need to be stored for the whole integration time. In the Gauss–Seidel WR, by contrast, the waves of the different subsystems are generated and exchanged sequentially. To implement a WR, multiple buffers for all controller states and specified exchange time instants would be necessary.
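A minimal sketch of both WR variants for an illustrative two-subsystem linear ODE (the dynamics, window length and iteration count are assumptions of ours): in the Jacobi sweep each subsystem integrates against the other's wave from the previous iteration, whereas the Gauss–Seidel sweep reuses the freshly computed wave immediately:

```python
import numpy as np

# Coupled test system: x1' = -x1 + 0.5*x2,  x2' = 0.5*x1 - x2.
def euler_wave(x0, coupling, dt):
    """Integrate x' = -x + 0.5*coupling(t) over the window; returns the wave."""
    x = np.empty(len(coupling))
    x[0] = x0
    for i in range(len(x) - 1):
        x[i + 1] = x[i] + dt * (-x[i] + 0.5 * coupling[i])
    return x

dt, n = 1e-3, 1000                            # one integration window of n steps
x1_0, x2_0 = 1.0, -1.0
x1, x2 = np.full(n, x1_0), np.full(n, x2_0)   # initial guess waves

for _ in range(5):    # Jacobi WR: both use the previous iteration's waves
    x1, x2 = euler_wave(x1_0, x2, dt), euler_wave(x2_0, x1, dt)

# Gauss-Seidel WR variant: the fresh wave of x1 is used for x2 right away.
#   x1 = euler_wave(x1_0, x2, dt)
#   x2 = euler_wave(x2_0, x1, dt)
```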
Hence, the scheduling policies used here are motivated by the Jacobi WR and the Gauss–Seidel WR. First, we assume that after one integration the waves are already close enough to the unique solutions, so that numerical integration effects can be ignored. Furthermore, the buffers \(\hat{x}_{ci}\) contain the values of \(x_{ci}\) at the latest exchange instant.
The resulting scheduling idea for a test system with controller states \(x_{c1}\) and \(x_{c2}\) is depicted in Fig. 4.
The parallel computing scheme shown in Fig. 4(a) is inspired by the Jacobi WR: \(x_{c1}\) and \(x_{c2}\) are exchanged simultaneously after every computational interval \(T\) and instantaneously stored in the buffers \(\hat{x}_{c1}\) and \(\hat{x}_{c2}\). The processor cores are thus computing at all times. It is assumed that read and write actions on the buffers are sufficiently fast that the resulting delays can be ignored. After every \(T_{s} = T\), \(u\) and \(y\) are sampled and stored in the buffers \(\hat{u}\) and \(\hat{y}\).
The sequential computing scheme given in Fig. 4(b) is inspired by the Gauss–Seidel WR. Within one computing cycle \(T_{s}\), \(\hat{x}_{c1}\) and \(\hat{x}_{c2}\) are updated sequentially. This leads to the definition of computational nodes: if the subsystems update their buffers in parallel, they are assigned to the same computational node and \(N_{stages} = 1\); otherwise, they are assigned to different computational nodes and \(N_{stages} = n\), where \(n\) is the number of sequentially updated subsystems.
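The following sketch contrasts the two buffer-update schedules for the two-state test system (the function and variable names, and the stage order of the sequential policy, are illustrative assumptions): the parallel policy refreshes all \(\hat{x}_{ci}\) at every \(t_{k}\) with \(N_{stages} = 1\), while the sequential policy refreshes one buffer per interval with \(N_{stages} = n\), sampling \(u\) and \(y\) every \(T_{s} = N_{stages}\,T\):

```python
def schedule(policy, n, n_intervals):
    """Yield (k, buffers updated at t_k, whether u and y are sampled at t_k)."""
    n_stages = 1 if policy == "parallel" else n      # N_stages
    for k in range(1, n_intervals + 1):
        if policy == "parallel":      # Jacobi-inspired: all buffers every T
            updated = list(range(1, n + 1))
        else:                         # Gauss-Seidel-inspired: one buffer per T
            updated = [(k - 1) % n + 1]
        yield k, updated, k % n_stages == 0          # sample every T_s

for k, updated, sample in schedule("sequential", n=2, n_intervals=4):
    io = ", sample u and y" if sample else ""
    print(f"t_{k}: update x_hat of subsystem(s) {updated}{io}")
```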
Consequently, each of the scheduling policies discussed above can be expressed similarly to the update function defined in [18]. Based on these considerations, an impulsive system similar to the one presented in [14] can be introduced. In a follow-up paper we will describe the model and the stability analysis of this approach.