Deep model predictive flow control with limited sensor data and online learning

The control of complex systems is of critical importance in many branches of science, engineering, and industry, many of which are governed by nonlinear partial differential equations. Controlling an unsteady fluid flow is particularly important, as flow control is a key enabler for technologies in energy (e.g., wind, tidal, and combustion), transportation (e.g., planes, trains, and automobiles), security (e.g., tracking airborne contamination), and health (e.g., artificial hearts and artificial respiration). However, the high-dimensional, nonlinear, and multi-scale dynamics make real-time feedback control infeasible. Fortunately, these high-dimensional systems exhibit dominant, low-dimensional patterns of activity that can be exploited for effective control, in the sense that knowledge of the entire state of the system is not required. Advances in machine learning have the potential to revolutionize flow control thanks to their ability to extract principled, low-rank feature spaces characterizing such complex systems. We present a novel deep learning model predictive control framework that exploits low-rank features of the flow in order to achieve considerable improvements in control performance. Instead of predicting the entire fluid state, we use a recurrent neural network (RNN) to accurately predict the control relevant quantities of the system, which is then embedded into an MPC framework to construct a feedback loop. In order to lower the data requirements and to improve the prediction accuracy, and thus the control performance, incoming sensor data are used to update the RNN online. The results are validated on several fluid flow examples of increasing complexity.


Introduction
The robust and high-performance control of fluid flows presents an engineering grand challenge, with the potential to enable advanced technologies in domains as diverse as transportation, energy, security, and medicine. In many of these areas, the flows, described by the three-dimensional Navier-Stokes equations, are turbulent or exhibit chaotic dynamical behavior in the relevant regimes. As a consequence, the control of fluid flows is particularly challenging. A standard approach is model predictive control (MPC), where an open-loop optimal control problem is solved over a finite horizon:

min_{u_0,...,u_{N-1}} Σ_{i=0}^{N-1} ||f(y_{i+1}) - z_ref,i+1||² + α||u_i||² + β||u_i - u_{i-1}||²
s.t. y_{i+1} = Φ(y_i, u_i),  y_0 = y(t_0).   (1)

Here, f(y) = z is the observation of the time (and potentially space) dependent system state y that has to follow a reference trajectory z_ref, and α and β are regularization parameters penalizing the control input as well as its variation. The time-T map Φ of the system dynamics describes how the system state evolves over one time step, given the current state and control input. Problem (1) is then solved repeatedly over a fixed prediction horizon N, and the first entry of the resulting control sequence is applied to the real system. As the initial condition in the next time step, the real system state is used, such that feedback behavior is achieved. Note that u_{-1} is the control input that was applied to the system in the previous time step. The scheme is visualized in Fig. 1, where the MPC controller based on the full system dynamics is shown in green.
MPC has successfully been applied to a very large number of problems. However, a major challenge is the real-time requirement, i.e., (1) has to be solved within the sample time Δt = t_{i+1} - t_i. In order to achieve this, linearizations are often used. Since even these can be too expensive to solve for large systems, we will here use a surrogate model which does not model the entire system state but only the control relevant quantities. In a flow control problem, these can be the lift and drag coefficients of a wing, for instance. Such an approach has successfully been used in combination with surrogate models based on dynamic mode decomposition [33] or clustering [31]. We thus aim at directly approximating the dynamics of the observable z = f(y) and at replacing the constraint in Problem (1) by a surrogate model. Following Takens' embedding theory [39], we will use delay coordinates, an approach which has been successfully applied to many systems [8]. Therefore, we define

ẑ_i = (z_{i-d}, ..., z_i),  û_i = (u_{i-d}, ..., u_i),

where d is the number of delays. Given a history of states z and controls u, the reduced dynamics Φ_r then yield the state at the next time instant, z_{i+1} = Φ_r(ẑ_i, û_i). This allows us to replace Problem (1) by the following surrogate problem:

min_{u_0,...,u_{N-1}} Σ_{i=0}^{N-1} ||z_{i+1} - z_ref,i+1||² + α||u_i||² + β||u_i - u_{i-1}||²
s.t. z_{i+1} = Φ_r(ẑ_i, û_i).   (2)

The resulting MPC controller is visualized in Fig. 1 in orange.

Remark 1 Note that another advantage of modeling only the quantities relevant to the control part is that we depend much less strongly on the scales of the flow field (i.e., grid size and time step), as integral quantities such as body forces may evolve on their own (and possibly somewhat slower) time scale.
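As a hedged illustration of the receding-horizon loop around problem (2), the following sketch solves the surrogate tracking problem for a scalar observable and applies only the first input. Here `surrogate_predict` is a hypothetical stand-in for the trained reduced dynamics Φ_r; the names, the solver choice (SciPy's L-BFGS-B) and the default values are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

def mpc_step(z_hist, u_hist, z_ref, surrogate_predict,
             N=5, beta=0.01, u_prev=0.0, u_bound=2.0):
    """Solve the surrogate MPC problem over horizon N; return the first input.

    surrogate_predict(z_hist, u_hist, u_seq) -> predicted observables, shape (N,).
    """
    def cost(u_seq):
        z_pred = surrogate_predict(z_hist, u_hist, u_seq)
        track = np.sum((z_pred - z_ref[:N]) ** 2)        # tracking term
        du = np.diff(np.concatenate(([u_prev], u_seq)))  # input variation term
        return track + beta * np.sum(du ** 2)

    res = minimize(cost, x0=np.full(N, u_prev), method="L-BFGS-B",
                   bounds=[(-u_bound, u_bound)] * N)
    return float(res.x[0])  # receding horizon: apply only the first entry
```

In closed-loop operation, the returned input would be applied to the plant for one time step, the new sensor reading appended to the history, and the problem re-solved.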

Related work
The main challenge in flow control, the construction of fast yet accurate models, has been addressed by many researchers in various ways. We here give a short overview of alternative methods (mostly related to the cylinder flow) and relate them to our approach. From a control-theoretical standpoint, the best way to compute a control law is via the exact model, i.e., the full Navier-Stokes equations. Using such a model in combination with an adjoint approach, a significant drag reduction could be achieved for the cylinder flow in [12] for Reynolds numbers up to 1000. However, this approach is too expensive for real-time control. To this end, several alternatives have been proposed, the most intuitive and well known being linearization around a desired operating point, cf. [18] for an overview. As a popular alternative, proper orthogonal decomposition (POD) [38] has emerged over the past decades, where the full state is projected onto a low-dimensional subspace spanned by orthogonal POD modes which are determined from snapshots of the full system. The resulting Galerkin models have successfully been used for control of the cylinder wake, see, e.g., [7,13]. Balanced truncation POD models can be obtained for linear [44] or linearized systems [36]. In order to ensure convergence to an optimal control input, the POD model can be updated regularly within a trust-region framework [6]. Alternative approaches that are similar in spirit are moment matching [1] and linear-quadratic-Gaussian (LQG) balanced truncation [5].
The above-mentioned methods have as their main drawback that they quickly become prohibitively expensive with increasing Reynolds number. This is due to the fact that linearizations become less efficient or that the dynamics no longer live in low-dimensional subspaces that can be spanned by a few POD modes. Furthermore, all approaches require knowledge of the entire velocity (and potentially pressure) field, at least for the model construction. Both issues can be avoided when not considering the entire velocity field but only sensor data, which results in purely data-driven models or feedback laws. Several machine learning-based approaches have been presented in this context, for instance cluster-based surrogate models (cf. [31], where the drag of an airplane wing was reduced), feedback control laws constructed by genetic programming [45], or reinforcement learning controllers [35]. These approaches are often significantly faster, rendering real-time control feasible. The approach presented in the following falls into this category as well.

DeepMPC: model predictive control with a deep recurrent neural network
In order to solve (2), a surrogate model for the control relevant system dynamics is required. For this purpose, we will use a deep RNN architecture implemented in TensorFlow [2]. Once the model is trained and can predict the dynamics of z (at least over the prediction horizon), it can be incorporated into the MPC loop.

Design of the RNN
As previously mentioned, the surrogate model is approximated using a deep neural network similar to [3]. Each cell of the RNN predicts the system state for one time step. In order to capture the system dynamics using only few observations, we use the delay coordinates introduced above. Consequently, each RNN cell takes as input a sequence of past observations ẑ_i as well as the corresponding control inputs û_i. The RNN consists of an encoder and a decoder (cf. Fig. 2a), where the decoder performs the actual prediction task and consists of N cells, one for each time step in the prediction horizon. This means that a single decoder cell computes z_{i+1} = Φ_r(ẑ_i, û_i). The state information ẑ_{i+1} is then forwarded to the next cell to compute the consecutive time step. In order to take long-term dynamics into account, an additional latent state l_{k+1} is computed based on past state information and forwarded from one cell to the next. To properly compute this state for the first decoder cell, an encoder with M cells, whose cells only predict this latent state, is prepended to the decoder. As the encoder cell only predicts the latent state, it is a reduced version of the decoder cell, which additionally contains elements for predicting the current and future dynamics, cf. Fig. 2a, b.
More precisely, the decoder cells are divided into three functional parts capturing different parts of the dynamics, i.e., the long-term dynamics (which is equivalent to an encoder cell), the current dynamics, and the influence of the control inputs (see Fig. 2b). Therefore, the input (ẑ_k, û_k) of each cell k is divided into three parts: two time series z_{k-2b+1,...,k-b} and z_{k-b+1,...,k} of the observable with the corresponding control inputs u_{k-2b,...,k-b-1} and u_{k-b,...,k-1}, and a separate sequence of control inputs u_{k-b+1,...,k}. The input length b is thus related to the delay via d = 2b, and in summary, the inputs (ẑ_k, û_k) = (z_{k-2b,...,k}, u_{k-2b,...,k}) are required. As shown in Fig. 2b, the encoder and decoder consist of smaller sub-units, represented by gray boxes, each of which is a fully connected neural network. The encoder cell consists of three parts, h_l,past, h_l,current and h_latent. In h_l,past and h_l,current, latent variables for the time steps k - 2b + 1, ..., k - b and k - b + 1, ..., k are computed, respectively. The current latent state l_{k+1} is then computed based on the information given by h_l,past and h_l,current and the latent variable l_k of the previous RNN cell. In a decoder cell, the future state z_{k+1} is additionally computed. Therefore, the latent state l_{k+1} is used as an input for h_past, and the results of h_past, h_current and h_future are used to calculate the predicted state z_{k+1} and thus ẑ_{k+1}. The corresponding equations can be found in Appendix A.
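A minimal NumPy sketch of the encoder-cell update described above (the corresponding equations appear in Appendix A). The dictionary-based weight layout and the layer sizes are illustrative assumptions; in the paper, the cells are fully connected layers trained in TensorFlow.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def encoder_cell(x_past, x_current, l_k, W, b):
    """One encoder cell: map past/current inputs and latent l_k to l_{k+1}."""
    h_l_past = relu(W["l_past"] @ x_past + b["l_past"])
    h_l_current = relu(W["l_current"] @ x_current + b["l_current"])
    # The latent state combines both feature vectors (element-wise product)
    # with the latent state handed over by the previous cell.
    l_next = relu(W["latent_h"] @ (h_l_past * h_l_current)
                  + W["latent_l"] @ l_k + b["latent"])
    return l_next
```

A decoder cell would additionally evaluate the h_past, h_current and h_future units and a linear output layer to produce z_{k+1}.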
The RNN-based MPC problem (2) is solved using a gradient-based optimization algorithm, in our case a BFGS method. The required gradient information with respect to the control inputs can be calculated using standard backpropagation through time; this is represented by the red arrows in Fig. 2b. Since the RNN model requires temporal information from at least M + 2b time steps (M encoder cells and an input sequence of length 2b) to predict future states, there is an initialization phase in the MPC framework during which the control input is fixed to 0.

Training of the RNN
The RNN is trained offline with time series data ((z_0, u_0), ..., (z_n, u_n)). For the data collection, the system is actuated with uniformly distributed random yet continuously varying inputs. In order to overcome difficulties with exploding and vanishing gradients as well as problems with the effect of nonlinearities when iterating from one time step to the next, we use the three-stage approach for learning proposed in [24] and used in [3]. First, a conditional restricted Boltzmann machine is used to compute good initial parameters for the RNN, following [40]. In the second stage, only the model for a single time step is trained, as this is faster and more stable than directly training the entire network, i.e., the model for the entire prediction horizon. In the final stage, another training phase is performed, this time for the complete RNN with N decoder cells, making the predictions of the system state over N time steps more accurate and robust. Both the individual RNN cell and the entire network were trained using the ADAM optimizer [19].
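The benefit of the second and third stages can be illustrated with a deliberately simple stand-in model: fit a one-step surrogate first, then evaluate it in the multi-step rollout regime that the full decoder is trained on. The scalar linear model and the least-squares fit below are assumptions for illustration only; the CRBM initialization of stage one is omitted.

```python
import numpy as np

def fit_one_step(z, u):
    """Stage-two analogue: fit z_{k+1} ~ a*z_k + c*u_k by least squares.

    z has length n+1, u has length n.
    """
    A = np.column_stack([z[:-1], u])
    coef, *_ = np.linalg.lstsq(A, z[1:], rcond=None)
    return coef  # (a, c)

def rollout_error(coef, z, u, N=5):
    """Stage-three analogue: mean squared error of N-step model rollouts."""
    a, c = coef
    S = len(u) - N + 1            # number of rollout start indices
    zk = z[:S].copy()
    errs = []
    for j in range(N):            # roll the fitted model forward N steps
        zk = a * zk + c * u[j:j + S]
        errs.append(np.mean((zk - z[j + 1:j + 1 + S]) ** 2))
    return float(np.mean(errs))
```

For the RNN, the third stage goes further and actually trains on the N-step loss, which makes the predictions used inside the MPC horizon more robust.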

Online training of the RNN
During system operation, we obtain incoming sensor data in each iteration, i.e., with a relatively high frequency. In order to improve the prediction accuracy, and thus the control performance of the model, these data are used to perform batch-wise online learning. To this end, we begin with a model which was trained in the offline phase as previously described. We then collect data over a fixed time interval such that we can update the RNN via batch-wise training using the ADAM optimizer.
To control the influence of the newly acquired data on the model, i.e., to avoid overfitting while still improving performance, it is important to select the training parameters accordingly, in particular the batch size and the learning rate. In our experiments, we have observed that the same initial learning rate and the same batch size as in the offline training phase are typically a good choice. However, the optimal choice of these parameters depends strongly on the initial training data and on the data collected during the control process.
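The batch-wise update strategy can be sketched as a simple buffering loop. `model_update` is a hypothetical stand-in for one ADAM training pass over the collected batch; the batch size corresponds to one fixed collection interval.

```python
from collections import deque

def online_learning_loop(stream, model, model_update, batch_size=500):
    """Collect incoming (z, u) samples; retrain once a full batch is available."""
    buffer = deque()
    for z, u in stream:                  # one sensor reading per control step
        buffer.append((z, u))
        if len(buffer) == batch_size:    # one collection interval completed
            model = model_update(model, list(buffer))
            buffer.clear()               # start the next interval
    return model
```

In the closed loop, this buffering runs alongside the MPC iterations, so the surrogate improves precisely in the region of the observation space that the controller visits.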

Results
In order to study the performance of the proposed MPC framework, four flow control problems of increasing complexity are considered. Instead of a real physical system, we here use a numerical simulation of the full model as our plant. In all four cases, the flow around one or multiple cylinders (cf. Fig. 3) is governed by the incompressible 2D Navier-Stokes equations, with fluid entering from the left at a constant velocity y_in. The Reynolds number Re = y_in D/ν (based on the kinematic viscosity ν and the cylinder diameter D) ranges from 100 to 200, i.e., we are in the laminar regime. The full system is solved using a finite volume discretization and the open-source solver OpenFOAM, cf. [14]. The control relevant quantities are the lift and drag forces (i.e., the forces in the x_2 and x_1 directions) acting on the cylinders. These consist of both friction and pressure forces, which can be computed from the system state (or easily measured in the case of a real system). The system can be controlled by rotating the cylinders, i.e., the control variables are the angular velocities.
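For reference, the dimensional quantities above combine as follows; the numerical values in the comment are illustrative and not taken from the simulations.

```python
def reynolds_number(u_in, D, nu):
    """Re = u_in * D / nu: inflow velocity, cylinder diameter, kinematic viscosity."""
    return u_in * D / nu

# e.g. u_in = 1.0 m/s, D = 0.1 m, nu = 1e-3 m^2/s gives Re on the order of 100
```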

One cylinder
The first example is the flow around a single cylinder, cf. Fig. 3a, which was also studied in [30]. At Re = 100, the uncontrolled system possesses a periodic solution, the so-called von Kármán vortex street. On the cylinder surface, the fluid velocity and the cylinder velocity are identical (no-slip condition), such that the flow can be steered by rotating the cylinder. The control relevant quantities are the forces acting on the cylinder, the lift C_l and the drag C_d. We thus set z = (C_l, C_d), and the aim is to control the cylinder such that the lift follows a given trajectory, e.g., a piecewise constant lift.
In order to create training data, a time series of the lift and the drag is computed from a time series of the full system state with a random control sequence. To avoid high input frequencies, a random rotation between u = -2 and u = 2 is selected from a uniform distribution every 0.5 s. The intermediate control inputs are then computed using a spline interpolation on the grid of the time-T map, where the step size is 0.1 s. For the RNN training, a time series with 110,000 data points is used, which corresponds to a duration of 11,000 s. The concrete parameters for the RNN can be found in Appendix B.

Remark 2 In these first experiments, we use abundant measurement data in order to rule out this source of uncertainty. The chosen amount significantly exceeds what is required for good performance, as we will see below. In the uncontrolled system, the fluidic time scale is in the range of seconds, and even in the actuated situation, a much smaller amount of data is sufficient, in particular in combination with the stochastic gradient descent optimization, which selects random subsets of the data set in each iteration.
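The excitation signal described above can be sketched as follows: new uniformly random amplitudes in [-2, 2] every 0.5 s, interpolated with a cubic spline onto the 0.1 s grid of the time-T map. The function layout is an assumption for illustration; note that a cubic spline may slightly overshoot the knot range.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def random_control_signal(duration, knot_dt=0.5, dt=0.1, u_max=2.0, seed=0):
    """Random piecewise amplitudes, smoothly interpolated onto the fine grid."""
    rng = np.random.default_rng(seed)
    t_knots = np.arange(0.0, duration + knot_dt, knot_dt)    # every 0.5 s
    u_knots = rng.uniform(-u_max, u_max, size=t_knots.size)  # random amplitudes
    t_grid = np.arange(0.0, duration + dt / 2, dt)           # time-T grid
    return t_grid, CubicSpline(t_knots, u_knots)(t_grid)
```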
In a first step, the quality of the RNN prediction is evaluated on the basis of an exemplary control input sequence. As one can see in Fig. 4a, the prediction is very accurate over several time steps for many combinations of observations z and control inputs u. There are only small regions where the predictions deviate more strongly from the true lift and drag.
The good prediction quality enables us to use the RNN in the MPC framework, where the aim is to force the lift to +1, 0 and -1 for 20 s each. This results in the following realization of (2):

min_{u_0,...,u_{N-1}} Σ_{i=0}^{N-1} (C_{l,i+1} - C_{l,ref,i+1})² + β (u_i - u_{i-1})².   (3)

The parameter β is set to 0.01 in order to avoid too rapid variations of the input. Furthermore, the control is bounded by the minimum and maximum control input of the training data (i.e., ±2). We solve the optimization problem (3) using a BFGS method with a prediction horizon of length N = 5, as our experiments have shown that this is a good compromise between accuracy and computational effort. As visualized in Fig. 4b, the DeepMPC scheme achieves a good tracking performance. Due to the periodic fluctuation of the uncontrolled system, a periodic control is expected to suppress this behavior, which is what we observe.

Fluidic pinball
In the second example, we control the flow around three cylinders in a triangular arrangement, as shown in Fig. 3b. This configuration is known as the fluidic pinball, see [9] for a detailed study. The control task is to make the lift of the three cylinders (C_l,1, C_l,2 and C_l,3) follow given trajectories by rotating the two rear cylinders while the cylinder in the front is fixed. We thus want to approximate the system dynamics of the forces acting on all three cylinders, i.e., z = (C_l,1, C_d,1, C_l,2, C_d,2, C_l,3, C_d,3). Similar to the single cylinder case, the system possesses a periodic solution at Re = 100. When increasing the Reynolds number, the system dynamics become chaotic [9] and the control task is much more challenging. We thus additionally study the chaotic cases Re = 140 and Re = 200. The behavior of the observables of the uncontrolled flow for the three cases is presented in Fig. 5. Interestingly, the flow is non-symmetric in all cases. At Re = 100, for instance, the amplitudes of the oscillations of C_l,2 and C_l,3 are different, and which one is larger depends on the initial condition. As we now have two inputs and three reference trajectories, we obtain the following realization of problem (2):

min_{u_0,...,u_{N-1}} Σ_{i=0}^{N-1} Σ_{j=1}^{3} (C_{l,j,i+1} - C_{l,j,ref,i+1})² + β ||u_i - u_{i-1}||²,   (4)

where the value of β is set to 0.1. For Re = 100, the prediction horizon is again N = 5. Since the dynamics are more complex for Re = 140 and Re = 200, the prediction horizon has to be larger, and we experimentally found N = 10 to be an appropriate choice. The average number of iterations as well as the number of function and gradient evaluations for the MPC optimization are shown in Table 1 for the three considered cases.
For all three Reynolds numbers, the training data are computed by simulating the system with random yet smoothly varying control inputs as before, i.e., uniformly distributed random values between u = -2 and u = 2 for each cylinder independently every 0.5 s. Due to the significantly smaller time step of the finite volume solver for the fluidic pinball, the control is interpolated on a finer grid with step size 0.005 s. Since the control input has to be fixed over one lag time due to the discrete-time mapping via the RNN, the mean over one lag time (i.e., over 20 data points) is taken for u. Time series with 150,000, 200,000 and 800,000 data points are used for Re = 100, Re = 140 and Re = 200, respectively. As already mentioned in Remark 2, the chosen amount of data exceeds what is required; we will address this below. The concrete parameters for the RNN can be found in Appendix B.
At Re = 100, where the dynamics are periodic, the control is quite effective, almost comparable to the single cylinder case, cf. Fig. 6a. In particular, the lift at the bottom cylinder is controlled quite well. The fact that the two lift coefficients cannot be controlled equally well is not surprising, as the uncontrolled case is also asymmetric, cf. Fig. 5.
In comparison, the error e_mean for the mildly chaotic case Re = 140 (Fig. 6b) is approximately one order of magnitude larger. The reference is still tracked, but larger deviations are observed. However, since the system is chaotic, this is to be expected. It is more difficult to obtain an accurate prediction and, more importantly, the system is more difficult to control, see also [32].
In order to improve the controller performance, we incorporate system knowledge, i.e., we exploit the symmetry along the horizontal axis. Numerical simulations suggest that this symmetry results in two metastable regions in the observation space and that the system changes only occasionally from one region to the other, analogous to the Lorenz attractor [8]. Therefore, we symmetrize (and double) the training data by adding, for every sample, its mirror image under reflection about the horizontal axis. This step is not necessary at Re = 100, since the collected data are already nearly symmetric. Nevertheless, the amount of training data can be doubled by exploiting the symmetry, and therefore, the simulation time required to generate the training data can be reduced.
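A hedged sketch of the mirroring step. The exact transformation is not spelled out in the text; the assumptions below are that reflection about the horizontal axis flips the sign of the lift coefficients, leaves the drag unchanged, and exchanges the two rear cylinders, so their observables and their sign-reversed angular velocities swap. The column layout is illustrative only.

```python
import numpy as np

def symmetrize(Z, U):
    """Z: (n, 6) columns [Cl1, Cl2, Cl3, Cd1, Cd2, Cd3]; U: (n, 2) columns [u1, u2]."""
    Z_m = Z.copy()
    Z_m[:, :3] *= -1.0               # lift flips sign under reflection
    Z_m[:, [1, 2]] = Z_m[:, [2, 1]]  # rear cylinders 2 and 3 swap (lift) ...
    Z_m[:, [4, 5]] = Z_m[:, [5, 4]]  # ... and drag
    U_m = -U[:, ::-1]                # controls swap and reverse sign
    return np.vstack([Z, Z_m]), np.vstack([U, U_m])
```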
In Fig. 6c, the results for Re = 140 with symmetrized training data are shown. In this example, the tracking error is reduced by nearly 50%. In particular, the second lift is now controlled well. This indicates that it is advisable to incorporate known physical features such as symmetries into the data assimilation process. However, we still observe that the existence of two metastable regions results in a better control performance for one of the cylinders, depending on the initial condition.
For the final example, the Reynolds number is increased to Re = 200 in order to further increase the complexity of the dynamics, and symmetric data are used again. Due to the higher Reynolds number, switching between the two metastable regions occurs much more frequently, and the use of symmetric data yields less improvement. The results are presented in Fig. 6d, and we see that even though tracking is achieved, the oscillations around the desired state are larger. In Fig. 7a, the mean and the maximal error for the three Reynolds numbers are shown; the errors grow as the system dynamics become more complex with increasing Reynolds number. In order to study the robustness of the training process as well as the influence of the amount of training data on the tracking error, five identical experiments for Re = 200 have been performed for different amounts of training data (10%, 50% and 100% of the symmetrized data points), see Fig. 7b. We observe no trend with respect to the amount of training data, in particular considering that the standard deviation is approximately 0.03 for the average and 0.15 for the maximal error. Figure 8 shows a comparison between two solutions using RNNs trained with 100% and 10% of the data, respectively. Although the solution trained with less data appears to behave less regularly, its performance is on average almost as good as that of the solution trained with 100% of the data. In conclusion, shorter time series already cover a sufficiently large part of the dynamics and are thus sufficient to train the model. In order to further improve the performance, the size of the RNN as well as the length of the training process would very likely have to be increased significantly, and much smaller lag times would be required.

Online learning
Since we want to avoid a further increase in computational effort and data collection, we instead use small amounts of data sampled in the relevant parts of the observation space, i.e., close to the desired state. To this end, we perform online updates using the incoming sensor data. In our final experiment, we study how the control performance can be improved by performing batch-wise online updates of the RNN using the incoming data, as described in Sect. 3.3. In the feedback loop, a new data point is collected from the real system at each time step. Our strategy is to collect new data over 25 s for each update. By exploiting the symmetry, we obtain a batch of 500 points within each interval that is used for further training of the RNN. In the right plot of Fig. 9, we compare the tracking error over several intervals, and we see that it can be decreased very efficiently within a few iterations of online learning (see also Fig. 6a for comparison). Besides reducing the tracking error, the control cost ||u||² decreases, which further demonstrates the importance of using the correct training data. Significant improvements of both the tracking performance and the controller efficiency are obtained very quickly with comparably few measurements.

Conclusion and further work
We present a deep learning MPC framework for feedback control of complex systems. Our proposed sensor-based, data-driven learning architecture achieves robust control performance in a complex fluid system without recourse to the governing equations and with access to only a few physically realizable sensors. In order to handle the real-time constraints, a surrogate model is built exclusively for control relevant and easily accessible quantities (i.e., sensor data). This way, the dimension of the RNN-based surrogate model is several orders of magnitude smaller than that of a model of the full system state. On the one hand, this enables applicability in a realistic setting, since we do not rely on knowledge of the entire state. On the other hand, it allows us to address systems of higher complexity, i.e., it is a sensor-based and scalable architecture. The approach shows very good performance for high-dimensional systems of varying complexity, including chaotic behavior. To avoid prohibitively large training data sets and long training phases, an online update strategy using sensor data is applied. This way, excellent performance can be achieved for Re = 100. For future work, it will be important to further improve and robustify the online updating process, in particular for chaotic systems. Furthermore, it is of great interest to further decrease the training data requirements by designing RNN structures specifically tailored to control problems. Deep learning MPC is a critically important architecture for real-world engineering applications where only limited sensors are available to enact control authority. Therefore, in the context of real-world applications, it would be of significant interest to study how the framework reacts to noisy sensor data. Since neural networks can in general cope well with noisy data, we expect DeepMPC to exhibit good noise robustness. Furthermore, it should be investigated what happens if the Reynolds number deviates slightly from the one used to train the RNN.
The equations for the encoder, resp. the long-term part of the decoder, are given by

h_l,past = relu(W_l,past x_past + b_l,past),
h_l,current = relu(W_l,current x_current + b_l,current),
l_{k+1} = h_latent = relu(W_latent,h (h_l,past ∘ h_l,current) + W_latent,l l_k + b_latent),

where ∘ is the element-wise multiplication of vectors and relu computes the rectified linear unit, i.e., max(0, x) for each element of x. The weights and biases are the variables which are optimized during the training process. The input dimension N_x of x_past and x_current is determined by the delay d = 2b, the number of observables n_o and the number of control inputs n_u as N_x = b(n_o + n_u). For the fluidic pinball, we observe the lift and drag at the three cylinders and adapt the angular velocity of the two rear cylinders; therefore, we have n_o = 6 and n_u = 2. The weight dimensions are

W_l,past, W_l,current ∈ R^{N_h × N_x},  b_l,past, b_l,current ∈ R^{N_h},
W_latent,h ∈ R^{N_l × N_h},  W_latent,l ∈ R^{N_l × N_l},  b_latent ∈ R^{N_l},

where N_l is the size of the latent state l_k and N_h the size of the hidden layers. N_l and N_h can be chosen appropriately depending on the problem, cf. Appendix B for the concrete values as used in our experiments.

In order to predict the future state, the decoder consists of three additional parts. The equations are given by

h_past = relu(W_past l_{k+1} + b_past),
h_current = relu(W_current x_current + b_current),
h_future = relu(W_future u_future + b_future),

where u_future ∈ R^{N_u} with N_u = b n_u. The final output z_{k+1} is computed from the results of h_past, h_current and h_future via a linear layer with W_out ∈ R^{N_h × N_out} and b_out ∈ R^{N_out}, where N_out = n_o.

B Parameter choice for the RNN
The structure of the RNN is defined by the number of neurons in the hidden layers, i.e., by N_h and N_l, the number of encoder cells M, and the chosen delay d = 2b. We performed several experiments to find appropriate values, and Table 2 summarizes our final choices.

Fig. 1 Structure of the control scheme, where a classical MPC controller based on a model for the full system state is shown in green and a controller using a surrogate model in orange (color figure online)

Fig. 2 a Unfolded RNN consisting of encoder (red) and decoder (yellow). b Layout of a single RNN cell. An encoder cell only consists of the blue area. A decoder cell, on the other hand, contains the entire green cell (color figure online)

Fig. 3 a Single cylinder setup. The system is controlled by setting the angular velocity u of the cylinder. b Setup for the fluidic pinball, where the forces on all cylinders are observed. The system is controlled by rotating cylinders one and two with the respective angular velocities u_1 and u_2

Fig. 7 a Mean (blue) and maximal (red) error for various Reynolds numbers with full data. b Mean and maximal error for different training data set sizes, both averaged over 5 training runs (Re = 200) (color figure online)

Fig. 8 DeepMPC reference tracking for Re = 200 and different amounts of training data

Fig. 9 Re = 100 with online learning. The RNN is updated every 25 s (denoted by black lines on the left). On the right, the mean error (blue) and the control cost (red) over each interval are shown (color figure online)

Table 1 Number of iterations, function evaluations and gradient evaluations, averaged over the control steps, for different Reynolds numbers