Aerodynamic optimization of a generic light truck under unsteady conditions using gradient-enriched machine learning control

We present the first machine-learned multiple-input multiple-output aerodynamic feedback control under varying operating conditions. Closed-loop control is relevant to many fluid dynamic applications ranging from gust mitigation to drag reduction. Existing machine learning control investigations have been mainly applied under steady conditions. The current study leverages gradient-enriched machine learning control (Cornejo Maceda et al. in J Fluid Mech 917:A42, 2021) to identify optimal control laws under unsteady conditions. The approach is exemplified on a coupled oscillator system with unsteady coupling and demonstrated for a generic truck model undergoing a yawing maneuver. Key enablers of the experiment are a rich set of pneumatic actuators and pressure sensors. The results demonstrate the method’s capabilities in identifying an efficient forcing for control under dynamically changing conditions. This automated and generalizable closed-loop control strategy complements and expands the machine learning control field and promises a new fast-track avenue to efficiently control a broader set of fluid flow problems.


Introduction
The growing constraints on energy saving in the transportation industry have spurred greater interest in vehicle aerodynamics. Improvement of the aerodynamic performance of road vehicles can increase their efficiency, in particular at highway speeds and under gusty driving conditions. At highway speeds, overcoming aerodynamic drag represents the largest portion of the total power expenditure, over 65% of the total power expense according to Hucho and Sovran (1993) and McCallen (2004). This explains why most aerodynamic developments on automobiles during the last decades have focused on drag reduction. Most existing approaches to minimize vehicle drag on commercial road vehicles are passive (Gilhaus and Renn 1986). They rely on aerodynamic shaping, such as well-rounded contours, rear slope angle and add-on devices (Cooper 2003;Krajnović 2014). Such passive approaches, however, are restricted by design and practical considerations and cannot be 'turned off' when not needed.
Recently, more research activities have focused on active flow control (AFC) solutions. Cattafesta III and Sheplak (2011) give an extensive overview of possible actuation mechanisms, whereas Choi et al. (2008) present the most common AFC approaches on bluff bodies. Of the many available approaches, a large subset is considered as an academic exercise, such as rotary (Dennis et al. 2000;Bergmann et al. 2005), streamwise (Cetiner and Rockwell 2001) and transverse (Carberry et al. 2003;Blackburn and Henderson 1999) oscillation of a bluff body. Of the practical and realistic AFC mechanisms, many were investigated on a generic bluff body and the Ahmed body (Ahmed et al. 1984), such as synthetic jets (Li et al. 2022), pulsed pneumatic actuation (Joseph et al. 2013;Rouméas et al. 2009;Pastoor et al. 2008), microjets with steady blowing (Aubrun et al. 2011), blowing and suction slots (Krajnović and Fernandes 2011) and Coanda blowing (Freund and Mungal 1994;Oswald et al. 2019;Herrmann et al. 2020). AFC on heavy truck vehicles were also researched using, for example, synthetic jet (El-Alti et al. 2010) and Coanda actuators (Englar 2001(Englar , 2004Haffner et al. 2020a, b).
Most of the aforementioned studies focused mainly on drag reduction as the aerodynamic property to optimize. However, drag reduction measures usually affect other aerodynamic characteristics, particularly side-wind stability. Such conflicts are even more likely to occur with rear-end treatment, which is where most AFC systems are usually installed.
The term crosswind stability describes the behavior of a vehicle under flow conditions with lateral velocity components. These flow conditions can be caused by atmospheric winds as well as other vehicles, for example during passing maneuvers, and are strongly dependent on roadside obstacles or the preceding vehicle. Side-wind stability is important for passenger comfort of all ground vehicles. It is however critical and a safety issue for vehicles with large projected side area, such as trucks and buses. Accidents linked to crosswind can happen as it is reported by several governmental agencies (Destatis 2010;Starnes 2006). Side-winds and side-gusts originate from different sources like weather, surrounding traffic or the topology of the terrain next to the road. Several numerical (Guilmineau et al. 2013;de Villiers et al. 2009) and experimental (Gilhaus and Renn 1986;Chadwick et al. 2000) studies were conducted to understand the dynamics of such flows. Both steady (Howell 1993;Baker 1991) and unsteady (Chadwick et al. 2000;Kitoh et al. 2009) crosswind investigations were researched. From these aforementioned studies and others, it was observed that a reduction of the lateral projected area in the back will reduce the rear side force and thereby the yaw moment. However, this relation is not valid for well-rounded rear-end configurations (C and D-pillars). Hence, unsteady effects are expected to play a major role for a heavy vehicle that has a boxy shape with many sharp edges, where the flow is dominated by a large time-averaged recirculation bubble behind the base.
Despite the numerous studies on side-wind effects and driving stabilities, very few have tackled this issue using AFC. To our knowledge, only Englar (2001Englar ( , 2004 and Pfeiffer (2016); Pfeiffer and King (2018) applied Coanda blowing on the four rear edges of a truck to reduce drag and to control the yaw moment. Whereas Englar (2001Englar ( , 2004 only implemented open-loop AFC, Pfeiffer (2016); Pfeiffer and King (2018) successfully applied it as a closed-loop control. The latter authors were able to achieve a maximum drag reduction of 15% and complete authority over the yaw moment. However, an automatic and optimized reduction in actuation power for both drag minimization and yaw control has yet to be attempted.
The above-mentioned studies have one theme in common, which is efficiency. Greener, more efficient and safer vehicles are now important objectives for the car industry, whose emission reduction targets are restricted by the European union (European Parliament, Council of the European Union 2007). This requires that the power gained through active drag reductions be as large as possible in comparison with the expanded power of the blown jet. This is typically quantified through the net power savings, where D is the drag force, D 0 is the reference drag without actuation, and A ref is a reference area. The subscripts 'jet' and ∞ denote blown jet and freestream properties, respectively.
An increase in the net power savings can be achieved through periodic blowing. Such observations were initially noted in studies investigating Coanda blowing on airfoils, where large aerodynamic gains were observed with the introduction of periodic actuation (Oyler and Palmer 1972;Jones et al. 2002;Semaan et al. 2021). Periodic Coanda actuation on ground vehicles (generic or otherwise) is less common. Barros et al. (2016) and Haffner et al. (2020bHaffner et al. ( , 2021 have shown reduction on a 3D Ahmed body, where drag reduction of up to 20% is reported. No measurement of the net power savings was provided, which renders transferability of the results difficult. An additional increase in the net power savings can be achieved through closed-loop flow control, which has been rapidly advanced over the past decades with the development of control theory, simulation methods and experimental techniques (Rowley and Williams 2006;Choi et al. 2008). Several control strategies can be found in the literature. A very simple strategy is a model-free approach with direct sensor feedback (Rapoport et al. 2003), or with sensor-based feedback, such as PID control (Roussopoulos 1993). This method is simple to implement and requires only parameter tuning (Killingsworth and Krstc 2006). The resulting control has been successfully applied to single-input single-output opposition and phasor control but comes with no guaranteed performance or optimality property. Furthermore, the large number of tunable parameters makes this approach impractical for multiple actuators, multiple sensors and complex dynamics. A well-investigated model-based control design for multiple sensors and actuators directly uses the input/output signals (black box model) under strong linearity assumptions (Rowley and Williams 2006;Pfeiffer 2016). For complex dynamics with unknown nonlinearities, this control strategy has severe challenges. In contrast, optimal control (Scott Collis et al. 2002;Gunzburger and Lee 2000) is based on a high-fidelity nonlinear Navier-Stokes discretization and minimizes a cost functional. The computational cost, however, excludes any real-time implementation in an experiment. In addition, sensor-based flow estimation may give rise to further technical difficulties. Control-oriented reduced-order models (ROM) resolving just the key nonlinear actuation mechanisms are a compromise between the online-capable black-box models geared toward linear dynamics and the accuracy of the computationally expensive nonlinear Navier-Stokes-based approaches. These ROMs are typically obtained by performing a Galerkin projection of the Navier-Stokes equations onto the proper orthogonal decomposition (POD) basis (Graham et al. 1999;Noack et al. 2003;Weller et al. 2009;Semaan et al. 2016;Semeraro et al. 2011). Further simplifications of such models can be accomplished, such as the generalized mean-field model by Luchtenburg et al. (2009). Due to truncation and other errors, such models require calibration, which is accomplished by either the addition of an eddyviscosity term or by direct identification of the model coefficients. While reduced-order Galerkin models can be pivotal in understanding the actuation mechanisms, they tend to have an intrinsically narrow dynamic bandwidth which limits the use for control design.
Furthermore, many complex nonlinear control tasks, like feedback turbulence control and robot missions (Wahde 2008), are beyond the capabilities of a control-oriented dynamical model. Data-driven methods of machine learning (Bishop 2006) have become increasingly powerful for estimation, prediction and control of many systems. In particular, a control logic may be learned directly from the plant. A well-known example in fluid mechanics is feedback control for skin friction reduction in a wall turbulence simulation using a learning neural network for the control law (Lee et al. 1997). Another approach relies on clusterbased feedback control, which has been successfully applied to control a stalled flow over an airfoil (Nair et al. 2019). The first demonstration of automated learning of a nonlinear closed-loop law taming experimental turbulence was presented by Duriez et al. (2017) and was shown to outperform all tested state-of-the-art control strategies. Moreover, late progress in reinforcement learning managed to learn feedback control laws (Rabault et al. 2019;Fan et al. 2020).
Recent successes of machine learning control (MLC) via genetic programming control (GPC) comprise drag reduction of the Ahmed body with and without yaw angle (Li et al. 2019(Li et al. , 2018, mitigation of vortex-induced vibration of a cylinder (Ren et al. 2019(Ren et al. , 2020, individual pitch control of a floating off-shore wind turbine (Kane 2020), mixing layer control (Parezanović et al. 2016;Li et al. 2020), separation control of a turbulent boundary layer (Debien et al. 2016), recirculation zone reduction behind a backward facing step (Gautier et al. 2015), jet mixing enhancement (Wu et al. 2018;Zhou et al. 2020) and control synthesis for the spatial stabilization of a robot (Diveev and Mendez Florez 2021). GPC has been successful in multiple experiments, where the learned solutions outperformed existing control strategies by exploiting unpredictable nonlinearities. Lately, MLC learning speed was greatly increased thanks to the introduction of intermediate gradient-based steps to exploit local gradient information , reducing the number of needed evaluations from O(1000) to O(100). The acceleration of the optimization process enables new testing opportunities such as learning under unsteady (operating) conditions to improve the control robustness. We refer to an unsteady condition as one that varies with time. This is particularly relevant, because in all aforementioned studies using MLC, the control laws were learned under steady experimental conditions. The benefits of learning under unsteady conditions include improved robustness and automatable identification of an optimal control law. This comes with the price of increased testing time as transient conditions are needed during training. Tang et al. (2020) bypassed such constraint in numerical simulations with parallelization of the operating conditions for reinforcement learning. The derived control policy was able to efficiently reduce the drag for the trained conditions even for the out-of-design operating conditions. Experimentally, Asai et al. (2019) identified a feedback control law for stall suppression on an airfoil model in unsteady conditions. The learning was carried out for both fixed and varying angles of attack to determine the actuation command for a discrete (on/off) actuator. However, both learning strategies identified similar controls independent of the operating conditions.
In this study, we leverage the gradient-enriched machine learning control (gMLC) algorithm to identify optimal control laws under unsteady conditions. We report on the first use of machine-learned multiple-input multiple-output control under changing operating conditions. In Sect. 2, we review the gMLC algorithm and illustrate its capabilities on a threedimensional generic truck model equipped with five pneumatic actuators. Details of the experimental setup and the acquisition techniques are provided in Sect. 3. We present the results of the optimally actuated vehicle under constant and time-varying yaw angle in Sect. 4. Also provided is a comparison between the dynamically identified gMLC performance and that of a linearly interpolated controller from three gMLC control laws at steady conditions. A summary and conclusions are presented in Sect. 5.

Methodology
In this section, we review the adopted control strategy. We first introduce the optimization control problem formulation and describe the gradient-enriched machine learning control algorithm employed to solve it.

The control problem as an optimization problem
The cornerstone of model-free approaches is the formulation of the control problem as an optimization problem, where a nonlinear mapping between the outputs of the system, i.e., the sensor information s and the inputs of the system, i.e., the actuation command b , is learned. This nonlinear mapping is a function referred to as the control law K . The control law lives in an infinite-dimensional Hilbert function space K also referred to as the control law space. In this study, we aim to learn control laws that can adapt to changes in external conditions, such as Reynolds number, temperature and yaw angle to name only a few examples. To this extent, we introduce a variable c that characterizes the external conditions. Hence, the control ansatz reads: The control problem is therefore cast as an optimization problem, where the control law K is to be optimized to minimize a cost function J, The cost function J is a problem-dependent user-defined functional. Equation (3) is in general a non-convex optimization problem that presents, a priori, several minima. In the next section, we present a gradient augmented machine learning control algorithm to identify new minima in the control law space and to quickly descend toward them.

Gradient-enriched machine learning control
In this section, we review the gradient-enriched machine learning control (gMLC) algorithm introduced by Cornejo Maceda et al. (2021) that we employ to solve optimization problem (3). Gradient-enriched MLC is an iterative optimization algorithm to derive control laws, also referred to as individuals following the genetic programming terminology. The method, based on GPC (Duriez et al. 2017), consists of selective recombination of the most performing control laws and relies on a systematic evaluation of new control laws. The learning is accelerated compared to GPC by the exploitation of local gradient with downhill simplex optimization steps.
The algorithm starts with a broad exploration phase of the control law space with a Monte Carlo sampling. The Monte Carlo phase generates N MC random control laws to initialize the first set of individuals. The individuals are then evaluated and added to the database of all individuals. Then, the algorithm alternates between exploration steps carried out by genetic programming and exploitation steps by the downhill simplex iterations until a stopping criterion is reached.
The role of exploration is to locate new and better minima in the space of control laws. This phase is carried out by GPC, where the idea is to recombine the best performing individuals through several generations thanks to genetic operators: mutation and crossover. Mutation consists of replacing parts of a control law with random expressions, whereas crossover exchanges parts of two control laws. Both genetic operators partake in creating new control law structures, i.e., exploring new regions of the control law space. Thus, both operators, mutation and crossover, are employed for the exploration phase. (Li et al. 2018) gives an example of implementation of mutation and crossover for control. In the gMLC approach, the concept of evolution through generations is no longer considered. Instead, new individuals are generated by recombining the best of all the previously evaluated individuals. This assures that no important information is irretrievably lost. N G new individuals are generated from the database based on their performance. The selection is carried out by the tournament method with a tournament size of 7 for 100 individuals following Duriez et al. (2017) recommendation. Both, the Monte Carlo and GPC steps are encoded in a matrix representation that is inherited from linear genetic programming, where each candidate solution is represented by an integer matrix. Each matrix resembles a computer program that codes unequivocally a control law. Each line of the matrix is an instruction pointing to basic operations ( + , −, × , ÷ , cos , sin , tanh , etc.) and registers containing constants and variables. The N inst lines of the matrix are read linearly yielding the control commands as outputs in the first registers (Li et al. 2018).
Each exploration phase is followed by an exploitation phase. This step exploits the local gradient information to slide down toward the best neighboring minimum. This is carried out by a variant of downhill simplex (Nelder and Mead 1965) for infinite-dimension spaces introduced by Rowan (1990) and referred to as downhill subplex. In the following, we do not differentiate between the downhill simplex and subplex as they are similar and only differ in their application spaces. The simplex step linearly combines a set of N sub best-performing control laws to identify a minimum. The N sub individuals describe a linear subspace, where the simplex evolves by geometric operations (reflection, expansion, contraction and shrink) using the local gradient information. Each geometric operation yields one or several new individuals that are linear combinations of the original N sub individuals.
After each downhill simplex iteration, the worst individuals are replaced by the best ones. The downhill simplex steps are iterated until at least N G individuals are generated. The newly generated individuals are then added to the database. It is worth noting that all the new individuals belong to the subspace defined by the original N sub individuals. If the stopping criterion is reached, the algorithm returns the best-performing control law. Otherwise, a new iteration of exploration and exploitation is carried out. The stopping criterion may be a given performance or a total number of evaluations.
Gradient-enriched MLC is a semi-stochastic algorithm which is able to locate the basin of attraction of some minima and slide-down toward the minima. Reaching the global minimum's basin of attraction is typically not guaranteed for infinite-dimensional function spaces with finite testing budget. Exploration and exploitation is alternated until the process is stopped. Cornejo Maceda et al. (2022) show an example of application of gMLC where exploration keeps discovering good performing control laws after 500 evaluations, i.e., 4 exploration-exploitation cycles.
For the following exploration steps, N G new individuals are generated by recombining the best individuals of the whole database, i.e., including the individuals from the previous downhill simplex and exploration steps. As the size of the database increases, the tournament size is scaled such as to maintain a 7 for 100 ratio. Among these new N G individuals, the best performing individuals replace the worstperforming individuals of the simplex if their cost value is better. Thus, the simplex includes all the best performing individuals at each step. This procedure allows to perform a gradient descent beyond the initial subspace.
We note the very important intermediate step of reconstruction between each exploitation and exploration iteration. The new individuals generated by the downhill simplex are linear combinations of individuals without a matrix representation, which is essential for genetic operations. Thus, a reconstruction is performed for each linearly combined individual by solving a secondary optimization problem. The goal is to derive a control law that has the same response as the linearly combined one. Such a problem is similar to a surface fitting problem, which we solve with linear genetic programming (Li et al. 2018) with the same metaparameters as gMLC so the matrices are compatible. Figure 1 schematically illustrates the gMLC algorithm. For more details, we refer the readers to Cornejo .
In this study, the gMLC algorithm is employed for the first time to learn control laws under varying conditions, i.e., each individual is evaluated under varying conditions. In Appendix 1, this approach has been demonstrated in the stabilization of time-varying system of coupled oscillators. Here, the optimized control law for transient conditions is compared with the interpolated laws from optimizations at multiple "frozen" steady conditions and found to require significantly less testing time.

Experimental setup
Building on the promising results of the coupled oscillator problem, we implement the automated control strategy experimentally on a generic truck model under yawing conditions. This section details the wind tunnel facility, the experimental model and the acquisition techniques used for the investigation. A schematic of the experimental setup is shown in Fig. 2.

Wind tunnel facility and experimental model
The experiment is conducted in the Modell-Unterschallkanal Braunschweig (MUB) wind tunnel at the Institute of Fluid Mechanics of the Technische Universität Braunschweig. The MUB is a low-speed closed-circuit wind tunnel with a 1.3 m × 1.3 m × 5.7 m test section. Flow speeds of up to 60m/s can be achieved in the test section. In the current experiment, the incident flow is controlled at velocity V ∞ ≈ 44m/s and temperature ≈ 35 • C to yield a constant Reynolds number. The experimental model is a three-dimensional generic truck that represents a small transport truck. The truck could be categorized as a typical N1 transport truck that suffers poor aerodynamic performance and handling. This is a direct result of the vehicle's boxy shape and dimensions, which are driven by the desire to maximize the payload volume. The geometry is a scaled-up version of the model used by Pfeiffer and King (2018), which in turn is a shortened version of the ground transportation system (GTS) investigated by many, such as Gutierrez et al. (1996) and Englar (2001). The model has a height of H = 0.224 m, width of W = 0.161 m and length of L = 0.569 m, yielding a Reynolds number of 1.5 ⋅ 10 6 with respect to the model length. The blockage ratio above the raised floor of 2.5 %. The influence of flow blockage is therefore neglected. The model is equipped with simplified half-wheels with a 0.031 m radius and a wheelbase of 0.364 m. Two vertical zigzag strips are taped along the imaginary A-pillars to trip the boundary layer and mitigate Reynolds number effects. The model is mounted on a sting and hovers over an elevated floor with less than one millimeter clearance between the surface and the wheels.
To simulate side wind, the model is yawed by a rotating mechanism situated underneath the wind tunnel test section. We acknowledge the nuances between a rotating vehicle and one exposed to a side wind gust. These differences are, however, of no relevance to the current study. The yawing mechanism is powered by an MPC TC-E 130 servo motor connected to a belt drive with a 4.4:1 reduction ratio that turns the balance, which is attached to the sting and subsequently the model. This yawing mechanism enables accurate and repeatable yaw-angle variations. A top view of the model is presented in Fig. 2b, where positive yaw-angles are defined clockwise with respect to the mean flow direction.
Actuation is performed by five pneumatic actuators illustrated in Fig. 3. Four are Coanda actuators located at the four trailing edges of the body. They are henceforth referred to as "top," "right," "bottom" and "left" following their position at the model base. The Coanda actuators blow a thin air jet through h ≈ 0.4 mm slots over Coanda surface quadrants Fig. 2 a A side view of the experimental setup illustrating the model, the elevated floor, the aerodynamic housing encompassing the sting, the balance and the yawing mechanism. b A top view of the experimental model including the coordinate system and the yaw angle

Fig. 3
The pressure sensor distribution over the experimental model. Blue markers denote the fast pressure sensor locations, whereas the black dots those for the steady pressure measurement taps. Also shown are the five pneumatic actuators at the model base with a close-up on one Coanda actuator with r = 8 mm radius. The actuators span 89% of the 'top' and 'bottom' and 92% of the 'left' and 'right' trailing edge sides. The four Coanda actuators around the model base create a cavity of relative depth of d∕H ≈ 0.036 . Excluding the boat-tailing effect from the Coanda surfaces, the cavity is not sufficiently deep to considerably alter the reflectional symmetry breaking (RSB) mode or significantly reduce the drag (Evrard et al. 2016;Fan et al. 2022). Jet uniformity across the slots is achieved by local slot height adjustment, which yields pressure deviations of less than 10 % from the mean (El Sayed M. et al. 2018;Semaan 2020). The fifth actuator is a vertical slot at the model base center, which can be described as a pneumatic splitter plate. The principle idea behind the proposed pneumatic splitter plate is to duplicate the benefits of a physical one combined with Coanda blowing, as demonstrated by Geropp and Odenthal (2000) and Odenthal (1996), where up to a 50% base pressure increase was documented. The mechanical splitter plate used in these studies was observed to enhance the Coanda actuators at the body aft by streamlining the jets away from the base centerline. Despite their benefits, such physical splitter plates are neither practical nor safe on real vehicles. In addition, they are passive devices that must be deployed at all times and most often under unfavorable flow conditions. On the other hand, the proposed pneumatic splitter plate is safe and only deployable when needed. Similar to the Coanda actuators, the pneumatic splitter plate is 0.4 mm thin and extends 180 mm along the base center. We shall refer to this actuator as "splitter." All jets are supplied through distributed plena inside the model that are connected to a constantly replenished 500 l tank to damp pressure fluctuations. Before reaching the tank, the air is cleaned and dried by a PoleStar PST120 refrigeration air dryer. Pressure to the tank is regulated by a Festo MPPE proportional pressure regulator. Unsteady actuation is achieved by five Matrix MX-893-Series multi-valve blocks with switching time below 1ms. Each multi-valve block is connected to an actuator and allows for independent and fast switching between three blowing intensity levels by the sequential opening and closing of internal valves.

Acquisition techniques
In this study, we are mainly interested in the identification of an optimized control law under unsteady yaw angle variations. Critical to these efforts are the measurements of the drag, the side force, the yaw angle, the individual blowing intensities, as well as state-representative feedback signals.
The aerodynamic forces are measured through a sting connected to a 6-component balance installed on the yawing mechanism (c.f. Fig. 2). Hence, the entire balance rotates with the yawing mechanism. The KME 4 balance has a 0.2% full-scale accuracy on the six force and moment components. The accuracy is additionally verified by in situ calibration using controlled weights and a pulley. Accurate force measurements are insured by, 1. Independent ducting of the pneumatic feed lines, such that no momentum is transferred to the sting and subsequently to the balance. 2. Mitigation of the aerodynamic influence of the sting, which is shielded from the flow by an aerodynamic housing across the gap between the test section floor and the elevated floor. This exposed small section between the truck model and the elevated floor has an elliptical shape to streamline the flow over a range of yaw angles and minimize the resultant acting forces. A detailed quantification of the sting influence on the aerodynamic forces has not been conducted. We assume it has a negligible contribution at small yaw angles that slightly increases with larger yaw angles. 3. Resetting (zeroing) the balance readings based on wind-off measurements: The small forces and moments measured during the model rotation with the wind tunnel turned off were subtracted from the measurements during the experiment. The zeroing helps to remove any non-aerodynamic force component from the measurements.
All six components are oversampled at a rate of 10kHz, which matches that of the surface time-resolved pressure sensors. In the current work, we are strictly concerned with the drag and side force coefficients, defined as and respectively, where D is the drag and S is the side force. In both definitions, A ref denotes the projected frontal area of the vehicle. We note that the thrust force generated from blowing is accounted for in open-loop tests by simply subtracting the force components that were measured using the same actuation but with the wind off. Such a correction was not feasible for the currently presented closed-loop control tests, since the control loop relies on sensor signals with the wind on. Thus, the reported drag and side force results include a thrust contribution from the blowing jets. The generated thrust is, however, one order of magnitude smaller than the drag force. Specifically, and as detailed in Sect. 4.3, the total momentum coefficient constitutes ≈ 2.5 − 6% of the (4) drag coefficient. We emphasize that this matter is of no concern for the reported net-power savings results, because the jets' force contribution to the expanded power is explicitly accounted for in Eq.
(1). The second type of measurement is the vehicle yaw angle, which is acquired by a Sick SKM36 differential position encoder with a 0.088 • resolution installed on the servo motor. Due to the gear ratio, the model yaw angle resolution is in effect even smaller.
The jet blowing intensity and actuation efficiency are characterized by the momentum coefficient (Poisson-Quinton and Lepage 1961) and the net power savings defined in Eq. (1). Both metrics require the measurement of the jet exit velocity (El Sayed M. et al. 2018;Semaan 2020), which is estimated from pressure measurements assuming an isentropic expansion and a sufficiently large plenum with a quasi-zero flow velocity as where is the nozzle efficiency factor, and P jet and P pl are the jet and plenum pressures, respectively. The instantaneous pressure signals for P jet and P pl are measured for each plenum using Honeywell SDX05D4-A pressure sensors distributed within the actuators plena and at the jet exits. The nozzles' efficiency factors are estimated by computing the individual mass flow rates using the velocity from Eq. (7) and equating them to the measured values from the flow meters (Semaan 2020). The mass flow rate of each individual actuator is directly measured using three Testo 6442 and two Testo 6441 flowmeters.
The final necessary component to conduct gMLC is state representative instantaneous feedback signals. This is achieved by 18 Honeywell SDX05D4-A pressure sensors distributed over the base, the leeward side and the top of the model. The connecting taps are marked by the blue dots in Fig. 3. The sensors are sampled at 10kHz. The 18 fast pressure sensors are supplemented with 151 static pressure measurement taps using temperature compensated DTC Initium pressure measurement system with ESP-64-HD and ESP-32-HD pressure scanners with ±7kPa and ±35kPa ranges and static accuracy of ±0.05 % and ±0.1 % of their respective full range. The 151 pressure taps over the model surface are marked by the black dots in Fig. 3. The static pressure signals are acquired at a rate of 160Hz.
All processes are centrally managed by a LabVIEW code on a host PC. The data acquisition and the gMLC fast-loop are operated through a National Instruments cRIO-9039 FPGA controller at a rate of 10kHz and 4kHz, respectively, which ensures that all data and tasks are processed expediently. The gMLC MATLAB program is interfaced with the LabVIEW code through ZeroMQ sockets and JSONformatted messages. The live sampled data are transmitted to the host PC for monitoring and recording.

Control objective and settings
The main objective of this study can be described as the determination of a multi-objective feedback control optimization for a slow yaw angle transient between 0 to 15 • . The slow transient is dictated by the mechanical limitations of the yawing mechanism that moves at an average rate of ≈ 6.7 • /s and by our desire at this stage to avoid fast broadband-exciting transients. Such rotation rates are considered quasi-transient for the current experiment, thus excluding realistic sudden gust scenarios. However, according to Sims-Williams (2011), quasi-transient yaw movement is sufficient to simulate sudden gust effects. In addition, real yaw moments on real vehicles do not only depend on the aerodynamics but also on the suspension and on the payload, conditions that cannot be simulated in wind tunnels. At 0 • the goal is drag reduction (green traffic), whereas at 15 • the primary concern is yaw stabilization (safe traffic). As such, we define the cost function as, is the reference unactuated side force, and w NPS and w S are yaw angledependent weights defined as, and Cost function (8) is specially designed based on the model aerodynamic response to side wind (Sect. 4.2) and our overall objectives for greener and safer vehicles. Specifically, the importance of maximizing the net power saving associated with drag reduction quickly gives way to minimizing the side force coefficient at a pre-chosen crossover set value of C cross S = 1.4 . The gradual and continuous transition is enabled by the tanh functions in Eq. (9). We note that the limiting side force coefficient C cross S = 1.4 is arbitrary. The other constants in (9) are selected to satisfy w S (C S = 0) = 0 , The factor 4 inside the tanh function controls the steepness of the function rise. Both weight distributions are presented in Fig. 4, where the equal weights at the crossover point C cross S = 1.4 can be observed. The employed gMLC settings are similar to those used for the modified coupled oscillator except for the number of forcing terms and the input sensors, which now stand at N b = 5 and N s = 60 , respectively. Unlike the coupled oscillator, a full-state control is not feasible. Instead, we rely on the 18 fast surface pressure sensors filtered signals and features thereof. The enriched features are designed to better reflect the instantaneous state of the flow. They include: moving averages, moving standard deviations and differences between chosen pairs of base pressure signals. In addition, we explicitly include the yaw angle as a sensor signal, i.e., b = K(s, ) . The optimization process is carried out over ≈ 1200 evaluations.

Control optimization at steady yaw angles
In this section, we minimize cost function (8) at three steady yaw angles ∈ [0 • , 11 • , 15 • ] . The results shall serve as a basis for the following section detailing the control optimization under unsteady yaw angle variations.
Before we delve into the gMLC results, it is important to first examine the unactuated reference aerodynamic response of the model. The investigated generic truck is considered a bluff body. As with all unactuated bluff bodies, the flow over the vehicle separates at the trailing edges and forms a large highly unsteady wake. This re-circulation wake region yields a low base pressure and a high drag coefficient of C D ≈ 0.38 at = 0 • . Yawing the model increases both the drag and the side force coefficient almost linearly, as observed in Fig. 5. The tested yaw angle range of 0 • < < 15 • is wider than the typical 0 • < < 10 • range of side wind disturbances encountered at highway conditions (Sims-Williams 2011). These variations in drag and side force coefficients can be elucidated from the pressure distributions over the model surface. Figure 6 presents the top view of the pressure coef- A special characteristic of the gMLC compared to MLC is its ability to converge faster toward the optimized control law (Cornejo Maceda 2021). Figure 7 presents the learning evolution over the 1200 evaluations, which include 5 exploration/exploitation cycles for the = 0 • case. The gray × 's denote the initial Monte Carlo sampled points. The gray + 's are the control laws built with the evolutionary processes, and the ⋅ 's are those from the subplex exploitation phases. The cost function evolution of the best control laws throughout the training phase is marked by the red line. The benefits of the subplex steps are observed by the sudden drops in the red line. Following the progression of the best control laws, we observe a plateau after ≈ 200 evaluations with only negligible improvements thereafter. The winning individual reduces the cost function to J = 0.8903 . This reduction is strictly achieved through net power savings related to drag reduction gains, which amounts to is the reference unactuated drag coefficient. We hypothesize that a partial contribution to the drag reduction is achieved through wake instability suppression (Lorite-Díez et al. 2020;Khan et al. 2022). To remain in-scope, we refrain from conducting a stability analysis. A similar learning rate behavior is observed for the two other yaw angles = 11 • and 15 • , which are presented in Appendix 2. The analytical expression of the winning control laws is reported in Appendix 3. Figure 8 presents the best control law actuation output quantified with the blowing intensity C for all five actuators at = 0 • . Surprisingly, the blowing intensities of all five actuators are practically steady. Despite gMLC's ability to actuate in an unsteady manner, it has identified an optimal actuation strategy that is quasi-steady; refer to Appendix 4 for the output signal distributions. The small fluctuations in the blowing intensity sequences stem from sensor noise used to estimate the blowing intensity and from small pressure fluctuations in the plena. Note that we refer to optimality with respect to cost function (8). The labeling excludes optimality of other typically desired characteristics, such as stability. These findings align with previous observations made by Haffner et al. (2020b) that reported larger aerodynamic gains for steady blowing over unsteady actuation when employing large relative Coanda radii r. The aerodynamic affinity toward a certain slot height-surface radius combination for steady Coanda blowing has been known for some time (Englar and Williams 1971). If such a favorable region of geometric combinations similarly exists for unsteady blowing, it is not known. Further investigations are required to shed light on this subject.
Besides their quasi-steady behavior, the five blowing intensity time traces in Fig. 8 show additional noteworthy characteristics; the top and the left/right actuators are blowing at full capacity, the bottom at mid-capacity and the splitter at low capacity. The favorable medium blowing intensity of the bottom actuator aligns with the results reported by Haffner et al. (2021) on an Ahmed body. We suspect the reason for the slightly reduced blowing intensity relates to With its weak blowing intensity, the pneumatic splitter plate appears to yield only a marginal benefit to the drag reduction and the net power savings. Indeed, the highest drag reduction from open-loop tests achieved at = 0 • with the splitter plate active is only ΔC D ≈ 1.5% larger than that without it. When accounting for the generated thrust for that test case, the drag reduction becomes insignificant. The current pneumatic splitter plate is not capable of duplicating the results of the mechanical one, as demonstrated by Geropp and Odenthal (2000) and Odenthal (1996). We suspect a relatively low flow penetration and thus authority behind the reduced performance of the current pneumatic splitter plate compared to a physical one.
Except for the splitter actuator at = 15 • , the superiority of steady blowing also extends to the two other tested yaw angles conditions. Figures 9 and 10 present the actuator output of the best gMLC-identified control law actuation quantified with the blowing intensity C at = 11 • and 15 • , respectively. At = 11 • , the objective function includes both drag and side force coefficient components. Unlike at = 0 • , the top, left and splitter actuators are blowing at full intensity, whereas the right and bottom ones are turned off. The separated right side at = 11 • is beyond the ability for reattachment. The algorithm correctly identifies this constraint and shuts the right actuator off. The turned-off bottom actuator is also the result of the now much larger separation region that extends from the front right side to the base left side. With its now relatively little authority over this larger recirculation region, the bottom actuator at the base is also shut down. Interestingly, the pneumatic splitter plate blows at full intensity. Considering it was barely active at = 0 • , the splitter actuator is now clearly acting against the side force. We suspect dual effects of the pneumatic splitter plate: (i) a torque balancing role from this simulated tail fin and (ii) At = 15 • , the cost objective becomes mostly side force reduction. Only the left and the splitter actuators remain active with both at high blowing intensity. The splitter actuator shows periodic switching between valve levels 2 and 3 at St = f a W∕U ∞ = 0.07 , where f a is the actuation frequency, and W is the model width. This frequency could be related to low-frequency dynamics in the natural shedding wake. This dynamics is usually associated with pumping of the recirculation region causing its shrinking and expansion, as shown by Berger et al. (1990) in the wake of a normal disk and as reported by Khalighi et al. (2001) and Dalla Longa et al. (2019) in three-dimensional blunt body wakes. The cost function is reduced to J = 0.8533 , which is mostly achieved by side force reductions ( NPS = 0.0852 , ΔC s = 0.153).
We note the gradual erosion of benefits in employing gMLC with increasing yaw angle. At small yaw angles, gMLC clearly improves upon the Monte Carlo samples, e.g., by more than 10% for the = 0 • case; refer to Fig. 7. On the other hand, at the highest tested yaw angle of = 15 • , gMLC does not yield any significant improvement over the Monte Carlo samples, as observed in Fig. 21. We believe this degradation is mainly caused by the gradually simpler actuation required at higher yaw angles, where, for example, only the left and the splitter actuators remain active at = 15 • . Specifically, the probability of stochastically identifying a simple controller significantly increases when the optimal actuation is simple. This degradation should not prejudice the gMLC algorithm performance: After all, it is not the algorithm's fault if the optimal control is simple. Moreover, the gMLC advantages over Monte Carlo are clearly demonstrated at low yaw angles and under unsteady conditions, which constitute the main implementation.

Control optimization under unsteady yaw angle variations
In this section, we present our control strategy under unsteady yaw angle variations. Cost function (8) is the same as used for steady yaw angle. However, since it implicitly depends on the yaw angle, it changes with , which is sinusoidally varied from 0 • to 15 • and back to 0 • in 5 seconds. Similar to the coupled oscillator system example, we investigate and compare both the direct gMLC optimization trained on the time-varied yaw angle and the parametric control law interpolated from the three steady yaw angles. Figure 11 presents the learning evolution over the ≈ 1200 evaluations, which include 9 exploration/exploitation cycles for the time-varied yaw angle motion. Compared to the steady yaw angle training, the progression of the best control laws is slower. The optima is observed after ≈ 400 evaluations (compared to, for example, ≈ 200 evaluations for the = 0 • steady angle case) with only negligible improvements thereafter. The slower learning rate is expected as the optimization problem contains much more complex flow dynamics. The analytical expression of the winning final control law is reported in Appendix 3.
The best gMLC-identified control law actuation output quantified with the five blowing intensity signals C during dynamic yaw angle variation over 5 seconds is presented in Fig. 12. Also shown is the yaw angle variation over one cycle. The time traces illustrate the impressive capabilities of gMLC to efficiently actuate this unsteady flow. Except for the right actuator, all blown jets remain quasi-steady with the top and the left at full intensity and the bottom at 2/3 intensity. The splitter actuator is neglected throughout the entire cycle. Similar to the steady yaw angle cases, all actuation signals are free from any periodic forcing. The fact that only the right actuator varies is sensible. It is after all the most affected actuator by the flow separation at the vehicle front. As reported in Sect. 4.2, with increased yaw angle and hence separation, the right actuator becomes superfluous and loses its authority. The gMLC algorithm recognized that limitation and gradually shuts the actuator down with increasing yaw angle and opens it again during the decreasing phase of the cycle. Moreover, the time variation of the right actuator is symmetric with respect to the angle variation. In other words, there is no sign of hysteresis. This is not surprising, because the yaw angle variation, and hence the aerodynamic response, is quasi-steady. We note that the sudden variation in C is simply caused by the discrete opening/closing of the three valve stages.
One beneficial by-product of gMLC is the input (i.e., sensor) optimization. When reaching convergence, the evolutionary process optimizes the sensor selection by retaining those beneficial to the winning control laws. This optimization has been reported by several sources in the literature, such as Parezanović et al. (2016) and Li et al. (2018). Analyzing the winning control laws, three observations are discerned:  Fig. 12 The best gMLC-identified control law actuation output quantified with the blowing intensity C for all five actuators during the dynamic yaw angle variation over 5 seconds. Also shown is the yaw angle variation over one cycle forward position on the leeward side. Sensor 6 yields a base pressure signal that correlates with the vehicle drag, whereas sensor 15 reflects the state of separation as the yaw angle is increased. The sensors at the model roof are mostly unused. 3. The yaw angle is excluded as a sensor input, despite its inclusion as a possible input variable 1 . As reported in Appendix 3, the control law only depends on surface pressure signals and features thereof. The exclusion is desired since the yaw angle on a real vehicle is not typically available. Contingent upon the installation of pressure sensors on vehicles, this finding opens the door for possible implementation in industrial applications.
Similar to the coupled oscillator system, we compare the gMLC performance trained under unsteady conditions to a parametric controller interpolated from the best individual control laws at = 0 • , 11 • , and 15 • . The concept is comparable to the linear parameter-varying control methodology, where multiple models are combined to improve the applicability range (Hemati et al. 2017;Shaqarin et al. 2021). The interpolated control is mathematically expressed as where w i are the interpolation weights, and b 0 , b 11 , and b 15 are the actuation commands from the corresponding best control laws identified with gMLC at steady = 0 • , 11 • and 15 • , respectively. Unlike the coupled oscillator, we interpolate the actuation signals b that are actually transmitted to the actuators instead of the corresponding analytical control laws K. Such an approach is more convenient for the current controller, where the interpolation of the complex control laws K 0 , K 11 , and K 15 listed in Appendix 3 is rather prohibitive. Figure 13 presents the controlled system using b param under unsteady yaw variation. As can be seen, the interpolated control law output is not similar to the optimized one trained under unsteady conditions. Here, the forcing signals are much more varied with time. The top, the right and the bottom actuator gradually shut down as approaches 15 • , whereas the splitter plate gradually opens. The left actuator remains open throughout the entire cycle, which emphasizes the important role of the left actuator in drag reduction as well as in side force mitigation (for positive yaw angles).
The cost function obtained with b param is J param = 0.8238 , which is only slightly better than that achieved under unsteady conditions J = 0.8311 . Similar to the coupled oscillator problem, training a control law directly under unsteady yaw angle variation is favorable when accounting for the overall cost of our parametric approach, which requires three separate runs and approximately three times the number of evaluations. In other words, learning under unsteady conditions allows almost the same performance as our parametric control but with a significant reduction in the number of evaluations.
A better understanding of the two control strategies is achieved when comparing the respective aerodynamic gains. Figures 14 and 15 present the absolute and relative drag and side force reductions for the directly identified and the interpolated control laws. We note that the presented force reductions in Figs. 14 and 15 include a thrust component from the blown jet, which we could not subtract. Therefore, the quantitative values should be regarded with caution. However, the overall qualitative trends and the drawn conclusions therefrom should hold.
The absolute drag and side force reductions ( ΔC D and ΔC S ) in the figures are defined with respect to the corresponding reference force coefficient at that yaw angle. For Fig. 13 Parametric actuation output interpolated from the three identified gMLC control law actuation signals at steady yaw angles. The actuation output signal is quantified with the blowing intensity C for all five actuators during the dynamic yaw angle variation over 5 seconds. Also shown is the yaw angle variation over one cycle is the reference unactuated drag coefficient. Normalizing ΔC D and ΔC S yields the relative drag and side force reductions, As the figures show, the parametric controller yields larger drag and side force reductions than the directly identified control law over most of the yaw angle motion range. We emphasize that ΔC D is not directly maximized by gMLC. The drag reductions are rather achieved through optimization of the net power savings. At the lower range 1 • < < 5 • , both control strategies yield similar absolute and relative drag reductions. For 5 • < < 13 • , the interpolated control law shows larger gains, whereas for > 13 • , the unsteady gMLC yields larger drag reductions. Interestingly, the drag reductions from both control strategies increase with yaw angle starting at ≈ 8 • , despite the quickly decreasing weight multiplying NPS. This counter-intuitive drag reduction is obtained indirectly through the side force reduction, which shrinks the re-circulation region size and the wake fluctuation level and subsequently also the drag. We stress that the overwhelming drag reduction is not achieved by the added thrust. Indeed, the total momentum coefficient C tot decreases with increasing yaw angle for both control strategies, as shown in Fig. 16. Also shown, is the relative contribution of the total momentum coefficient to the reference drag coefficient, which ranges from ≈ 2.5 − 6% along the yaw angle variation for the two control strategies. Thus, for both control strategies, side force minimization using active flow control at large yaw angles also reduces drag. Unlike the drag, the side force reductions remain consistent through most of the yaw angle range for both control strategies. Starting at = 1 • , ΔC S steadily increases ( C S steadily decreases) with . It is not entirely clear why no additional side force reductions are achieved at ≳ 11 • , where w S becomes significant. We suspect that the decreasing authority to reduce the side force is balanced by the sudden increase in the cost function weight.

Summary and conclusions
In this study, we successfully apply gradient-enriched machine learning control (gMLC) to identify an optimized control law under unsteady operating conditions. This marks a departure from previous MLC implementations on steady operating conditions. We report on the first use of machinelearned multiple-input multiple-output control under changing operating conditions. The approach is applied on a coupled oscillator system and a generic truck experiment with five pneumatic actuators. The results demonstrate the method's capabilities in identifying the most efficient forcing for control under dynamically changing conditions. The first gMLC implementation under unsteady conditions is performed on a modified coupled oscillator that includes a time-varying coupling parameter. The objective is to stabilize an unstable damped oscillator (a 1 , a 2 ) by exploiting the nonlinear coupling to two linearly stable oscillators (a 3 , a 4 ) and (a 5 , a 6 ) through a forcing term b that acts on both the second and third oscillators. The coupling is achieved through a parameter c ∈ [0;1] that defines the coupling level between oscillators. The coupling is exploited to generate unsteady conditions by sinusoidally varying c during the learning process. The results demonstrate the gMLC capability to dynamically adapt to the varying conditions by switching its forcing between the second and the third oscillator depending on the coupling. Moreover, the forcing is performed at the corresponding oscillator frequency without prior explicit inclusion of this information.
The second gMLC deployment is an experimental implementation on a generic truck model with multiple actuators and sensors under unsteady yawing conditions. Actuation is performed by five pneumatic actuators; four are Coanda actuators located at the trailing edges of the body, whereas the fifth is a vertical slot at the model base center, which can be described as a pneumatic splitter plate. The instantaneous feedback signals are provided by the yaw angle encoder, 18 fast pressure sensors distributed over the model and enriched features thereof. The control objective aims for efficient drag reductions at low yaw angles and side force reductions at high yaw angles. Again, gMLC automatically identifies a control law that minimizes this cost function and satisfies the aerodynamic objectives. Moreover, the resultant controller is independent of the explicit yaw angle and only relies on the surface pressure sensor signals and their features. This opens the possibility for commercial application as the wind direction is generally not known.
The control strategy highlights the coupling between the drag and the side-force on this model; side force minimization using active flow control inevitably reduces drag. This results from the control strategy similarly affecting the wake, shrinking the re-circulation region and the fluctuation level.
For both applications, the directly identified optimized control law under unsteady conditions is compared to a parametric control strategy that interpolates between individual gMLC optimized control laws trained at three steady conditions. The gMLC-based parametric control is marginally better than the directly identified one, albeit at approximately three times the training time. Moreover, the interpolated control for the generic truck experiment requires the explicit yaw angle input which is typically not available in a real-life Fig. 16 The total blowing intensity variation or the directly identified and the interpolated control laws over the dynamic range application. We note that problems with a more complex dynamic response would require more than three conditions to interpolate the controller, thus rendering the gMLC-based parametric control strategy even more expensive. In summary, training a control law directly under unsteady conditions remains clearly favorable when accounting for the training cost despite the slight superiority of gMLC-based parametric control. The current gMLC success opens a new avenue of automated control strategies for flow problems under unsteady conditions. Possible applications range from pitching airfoils to gust encounters, which the authors are currently pursuing.

Appendix A Stabilization of a coupled oscillator system under varying conditions
In this appendix, we illustrate the learning capability of gMLC under unsteady conditions by controlling a coupled oscillator system that includes a time-varying coupling parameter.

A.1 Problem formulation
The coupled oscillator system illustrates how nonlinearities can stabilize self-amplified natural instabilities through forcing. The objective is to stabilize an unstable damped oscillator [a 1 , a 2 ] by exploiting the nonlinear coupling with a linearly stable oscillator [a 3 , a 4 ] through a forcing term b. We note that when the forcing term is absent, the oscillator is free and simply performs harmonic motion. The control of such coupled oscillator system with machine learning control has been reported in Duriez et al. (2017) using MLC andCornejo Maceda (2021) using gMLC. In this study, we wish to demonstrate the learning capabilities of gMLC under varying conditions. For this purpose, we introduce a third oscillator [a 5 , a 6 ] that is also nonlinearly coupled to the first oscillator [a 1 , a 2 ] in the same way as the second oscillator [a 3 , a 4 ] . The same forcing term b is added to the third oscillator. The coupling is achieved through a parameter c ∈ [0;1] that defines the coupling level between oscillators 1 and 2 on one hand and 1 and 3 on the other hand. For c = 0 , the first oscillator is solely coupled to the second oscillator, whereas for c = 1 the first oscillator is only coupled to the third oscillator. Intermediate values of c allow partial coupling with both oscillators. The coupling is employed to generate varying conditions by changing the coupling parameter c during the learning process. The system of ordinary differential equation then reads: with where is the growth rate of the first oscillator and is the coupling term of the entire system. The forcing term b is a function of [a 1 , a 2 , a 3 , a 4 , a 5 , a 6 ] to enable full-state control and of c to incorporate explicit dependencies of the unsteady effects. The eigenfrequency of the third oscillator is set to 3 = 10 with ≈ 1.618 being the golden ratio, such as 3 is incommensurable with 1 and 2 . This implies that distinct controllers are necessary to make 2 and 3 resonate. We note that the state [a 1 , a 2 , a 3 , a 4 , a 5 , a 6 ] = [0, 0, 0, 0, 0, 0] is an unstable fixed point of the unforced system for any value of c. The second and third oscillators are linearly stable, as their growth rate is a negative constant. However, a slight displacement of the first oscillator in the phase space leads to a destabilization of the system. The growth rate of the first oscillator is positive as the radii r 1 , r 2 and r 3 are close to zero. Destabilized, the second and third oscillators converge toward the fixed point (0, 0) leading to r 2 = 0 and r 3 = 0 , whereas the first oscillator grows exponentially until the cubic damping term (−r 1 2 ) saturates the growth = 0 . The first oscillator has a globally stable limit cycle with a radius r 1 = √ 0.1 . To stabilize the first oscillator, the forcing b needs to destabilize the second or third oscillators so that < 0 . The system is initialized at t = 0 close to the unstable fixed point as Based on the system characteristics, the control objective is thus to stabilize the first oscillator and bring it to the fixed point [a 1 , a 2 ] = [0, 0] . We wish to achieve this objective using a control law that excites the second or the third oscillator adaptively based on the coupling parameter c. Hence, we seek to minimize the time-averaged fluctuation level of the first oscillator and, for an energy efficient control, the actuation power 1 , a 2 , a 3 , a 4 , a 5 , a 6 , c)  0  b(a 1 , a 2 , a 3 , a 4 , a 5 , a 6 , c) r 1 = √ a 1 2 + a 2 2 , 1 = 1, r 2 = √ a 3 2 + a 4 2 , 2 = 10, r 3 = √ a 5 2 + a 6 2 , 3 = 10 .

Thus, the cost function to minimize is
where is an arbitrary penalization parameter. In this study, we set = 0.01 so that more weight is given to the fluctuation level minimization. The value of is similar to that used in previous studies to allow for better comparability. The cost function is computed after T 0 = 20 time units which corresponds to 10 periods of the first oscillator. The control of the system relies on the nonlinear coupling between the cylinders. Indeed, a linearization of the equations around the fixed point [a 1 , a 2 , a 3 , a 4 , a 5 , a 6 ] = [0, 0, 0, 0, 0, 0] would make the growth rate = 0.1 rendering the system uncontrollable as the first cylinder would follow an unbounded exponential growth for any actuation command b. Thus, no linear control such as LQR can stabilize the coupled oscillators. Therefore, gMLC is employed to build a control law exploiting the nonlinearities of the system to stabilize it.

A.2 gMLC settings
The employed gMLC settings are similar to the previously used ones by Cornejo , which build on expert knowledge of the system and on an array of tests previously conducted by Cornejo Maceda (2021) and Duriez et al. (2017). For further details on the settings' selection, we refer the reader to the aforementioned two references. The stabilization of the coupled oscillator system is accomplished through a single forcing term, i.e., N b = 1 . We refrain from introducing any open-loop time-dependent functions to the feedback signal because we are seeking a feedback control that can automatically adapt to the varying state of the system. The control laws are constructed using the following operations: + , −, × , ÷ , sin , tanh , exp , log , √ , in addition to 3 constants chosen between -1 and 1. To avoid a complete and premature stabilization of the first oscillator, the control laws are limited between −0.07 and 0.07 thanks to a saturation function. For the Monte Carlo initialization step, N MC = 100 individuals are generated and evaluated. The exploitation subspace is described by N sub = 10 individuals, as in Cornejo , so that the subspace is large enough to comprise rich control laws by linear combinations, but not too large to slow the convergence down. The number of individuals generated at each exploration and exploitation phase is set to N G = 50 . The gMLC algorithm settings are summarized in Table 1. The optimization process is carried out over 3000 evaluations.
In the following, four different configurations with constant as well as time-varying coupling parameter c are studied: (i) a constant coupling parameter c = 0 , (ii) a constant coupling parameter c = 0.5 , (iii) a constant coupling parameter c = 1 and (iv) a time-varying coupling parameter with a sinusoidal profile c(t) ∼ sin(Ωt) . For configuration (iv), the period of the sinusoidal profile is chosen such that the first oscillator performs 150 periods during the transition from 0 to 1 or from 1 to 0, i.e., Ω = 1 300 = 1 300 . Hence, the system is solved up to T max = 2000 time units. The four configurations are illustrated in Fig. 17.

A.3 Stabilization of the coupled oscillator system
In the following, we present the gMLC results applied to the four configurations of the coupled oscillator system. For each configuration, 10 realizations are performed. Table 2 lists the cost function values achieved by the best control law of all the realizations. As the table shows, the achieved values exhibit a small scatter among the 10 realizations for each configuration. Except for 3 test runs for configuration (iv), the scatter is small, which indicates a robust convergence to the global minima. We note the weaker performance of configuration (ii) with a relatively larger J. In fact, none of the learned control laws managed to fully stabilize the first oscillator for configuration (ii). The closest distance reached to the fixed point is r 1 = 3 × 10 −2 . Including the forthcoming observations, this deficiency appears to be related to the unfavorable coupling parameter c = 0.5 , where simultaneous and equal forcing of both oscillators 2 and 3 is disadvantageous. This results in relatively low maximum fluctuation levels of the oscillators and fails to yield < 0 needed for full stabilization.
Of particular interest to this study is configuration (iv) with a sinusoidally varying coupling. Figure 18 presents the controlled dynamical system using the best control law achieved for configuration (iv). Figure 18a shows the time evolution of each oscillator radius alongside the coupling parameter c and the actuation command b. We reiterate that actuation is initiated at time t = 20 to allow the first oscillator to develop from the initial conditions and to stabilize on the limit cycle with radius r 1 = √ 0.1 . After T = 20 time units, the actuation command starts by exciting the second oscillator, which reaches a limit cycle of radius close to 0.5. Interestingly, during this phase, the third oscillator remains unperturbed. This forcing remains relatively unchanged until c approaches unity. Here, the third oscillator starts to resonate and becomes strongly coupled with the first oscillator; therefore the new forcing leads to stabilization. As c continues its cycle and approaches zero again, the control switches back to the second oscillator. The transitions between the oscillators are well illustrated in the time series of the actuation command b. As can be seen, this behavior is repeated periodically following the variations in c. We note the control asymmetry with respect to c. For example, at the same coupling parameter value c = 0.5 , the control acts on either the second or the third oscillator depending on whether c is ascending or descending. This leads to the hysteresis presented in Fig. 18b. This hysteresis is a result of the identified control law opting to continue exciting the already active oscillator until the coupling is sufficiently different. We remark the continuous actuation of either the second or the third oscillator, which is necessary to stabilize the first oscillator near the fixed point. These observations are mirrored in the spectrogram of the actuation command b in Fig. 18c for configuration (iv) over the three cycles of c in the post-transient regime. As expected, the actuation is mainly performed at angular frequencies 2 and 3 , corresponding to the active phases Fig. 17 Illustration of the interactions among the three oscillators for the four examined configurations with (i) c = 0 , (ii) c = 0.5 , (iii) c = 1 and (iv) sinusoidal variation with angular frequency Ω = 1 300 = 1 300 . The unforced dynamics of each oscillator is depicted with black spiral-shaped arrows. The blue arrows stand for the coupling between the oscillators. Thick (thin) arrows denote a total (partial) coupling between the oscillators. The blue dashed arrows denote a time-varying coupling. The actuation is illustrated by the red arrows, whereas the red dashed arrows represent the expected response of the oscillators: outward arrows signify destabilization and inward arrows stabilization Table 2 Cost value of the best individual for each configuration. Ten tests are conducted for each configuration. The best realization of each configuration is marked in bold. The cost function J is computed only over one period with T max = 2000 time units Sinusoidal c Test 1 7.3 × 10 −4 3.3 × 10 −3 7.5 × 10 −4 4.5 × 10 −3 Test 2 7.5 × 10 −4 2.8×10 −3 7.5 × 10 −4 8.5 × 10 −4 Test 3 7.4 × 10 −4 2.8 × 10 −3 7.5 × 10 −4 7.5 × 10 −4 Test 4 7.5 × 10 −4 2.8 × 10 −3 7.3×10 −4 7.5 × 10 −4 Test 5 7.5 × 10 −4 2.8 × 10 −3 7.4 × 10 −4 3.1 × 10 −3 Test 6 7.4 × 10 −4 2.8 × 10 −3 7.5 × 10 −4 7.5 × 10 −4 Test 7 7.3 × 10 −4 2.8 × 10 −3 7.4 × 10 −4 4.7 × 10 −3 Test 8 7.4 × 10 −4 2.8 × 10 −3 7.4 × 10 −4 7.5 × 10 −4 Test 9 7.2×10 −4 2.8 × 10 −3 7.3 × 10 −4 7.5×10 −4 Test 10 7.3 × 10 −4 2.8 × 10 −3 7.5 × 10 −4 8.4 × 10 −4 of the second and third oscillators. The spectrogram also shows the second (weakly) and third harmonics of 2 at = 20 , 30 as well as triadic interactions between them: 2 − Δ ≈ 3.82 , 3 + Δ ≈ 22.36.
To compare the gMLC performance trained under unsteady conditions to existing approaches, we construct a parametric control law from the control laws optimized in configurations (i)-(iii) with c = 0 , 0.5, and 1. The parametric law is built by linearly combining the optimized laws. The interpolated control is mathematically expressed as where w i are the interpolation weights, and K (i) , K (ii) , and K (iii) are the best control laws identified with gMLC for c = 0 , 0.5 and 1, respectively. The interpolation weights are piecewise linear functions of c such that K param (c ≤ 0) = K (i) , K param (c = 0.5) = K (ii) , K param (c ≥ 1) = K (iii) , and intermediate values are linearly interpolated. Figure 19 presents the controlled dynamical system using K param on configuration (iv). As can be seen, the interpolated control law K param behaves differently than the optimized one trained under (19) K param = w 1 (c)K (i) + w 2 (c)K (ii) + w 3 (c)K (iii) unsteady conditions. Here, forcing is mostly conducted through the second oscillator, except during short time-windows close to where c = 1 when the third oscillator is forced instead. Similarly, the parametric controller yields a hysteresis in actuation, albeit to a smaller extent (c.f. Fig. 19b). The spectrogram of the associated actuation command in Fig. 19c reflects the previous observations; forcing is mainly achieved through the second oscillator at a frequency of 2 , except during smaller intervals when the third oscillator is coupled the strongest (i.e., c ≈ 1 ) and forcing is conducted at a frequency of 3 .
The cost associated with the parametric law K param for configuration (iv) is J param = 4.8 × 10 −4 , which is only slightly better than that of the directly learned gMLC control law under unsteady coupling conditions with J = 7.4 × 10 −4 . Despite the slight superiority, training a control law directly under unsteady conditions remains clearly favorable when accounting for the overall cost of our parametric control approach, which requires three separate runs. In other words, learning under unsteady conditions allows almost the same performance as our parametric control but with a significant reduction in the number of evaluations.