Instant flow distribution network optimization in liquid composite molding using deep reinforcement learning

Carbon fibre reinforced plastic (CFRP) manufacturing cycle time is a major driver of production rate and cost for aerospace manufacturers. In vacuum assisted resin transfer molding (VARTM), where liquid thermoset resin is infused into dry carbon reinforcement under vacuum pressure, designing a resin distribution network that minimizes fill time while ensuring the preform is completely filled with resin is critical to achieving acceptable quality and cycle time. Complex resin distribution networks in aerospace composites increase the need for quick, optimized virtual design feedback. Framing the problem of flow media placement in terms of reinforcement learning, we train a deep neural network agent using a 3D Finite Element based process model of resin flow in dry carbon preforms. Our agent learns to place flow media on thin laminates in order to avoid resin starvation and reduce total infusion time. Because of the knowledge the agent has gained during training on a variety of thin laminate geometries, it can propose a good flow media layout for a new thin laminate geometry in less than a second. On a realistic aerospace part with a complex 12-dimensional flow media network, we demonstrate that our method reduces fill time by 32% compared to an expert-designed placement, while maintaining the same fill quality.


Introduction
Improved fuel efficiency of passenger flights is critical to the environmental sustainability and economic viability of air travel. In response, major aircraft manufacturers have moved to improve the efficiency of their products through new designs and new technology. The weight of an aircraft is of singular significance to fuel burn, and for this reason the industry has moved away from aluminium as the major structural material to lighter composite materials like Carbon Fibre Reinforced Plastic (CFRP). Over the past 30 years each generation of aircraft has included more CFRP to reduce weight. Modern passenger aircraft are over 50 percent CFRP by weight, including CFRP primary structure such as the fuselage and wings on the Boeing 787 and Airbus A350 (Roberts, 2007). CFRP cure cycle time is a major driver of production rate and cost for Original Equipment Manufacturers. In Liquid Composite Molding (LCM) techniques such as Vacuum Assisted Resin Transfer Molding (VARTM), liquid thermoset resin is infused into dry carbon reinforcement under vacuum pressure in an oven. Compared with traditional cure processes, where the carbon reinforcement is preimpregnated with resin (prepreg) and then cured under pressure in an autoclave, the comparatively lower cost of ovens, along with other advantages, has led to increased interest in the use of VARTM in aerospace structures (Soutis, 2005). However, the time required to fill the dry reinforcement with resin means that a part produced with VARTM will spend more time in an oven than a prepreg part spends in an autoclave. Designing processes to reduce this additional cycle time will be a critical factor as production rates increase. This challenge also exists in Resin Transfer Molding (RTM), a related family of LCM techniques that use rigid molding surfaces on all sides, allowing pressures higher than 1 atmosphere.
RTM is used to produce complex 3D parts, but is limited in scalability due to the increased cost of double-sided sealed tooling, and so is not applicable to larger structures.
LCM processes have two main stages (Sozer et al., 2012):
1. Placement of the dry preform on or inside the mold.
2. Mold filling, in which the resin is infused or injected into the preform to saturate all the empty space between the fibers.
During the mold filling step, resin is injected through inlets or 'gates', and air leaves through outlets or 'vents'. The first objective is to fully saturate the preform with resin, displacing all the air and volatile organic compounds, as not completely filling the preform leads to dry spots and defective parts that cannot be used. The second objective is to minimize the fill time. A successful LCM process will completely fill the preform quickly (Sozer et al., 2012).
There are two main approaches to reducing fill time. In an RTM setting, a high pressure can be used to increase the resin flow rate. In both RTM and VARTM settings, the design of the resin distribution system has a direct impact on how the resin flows through the preform (Hsiao and Heider, 2012) and on whether the preform is entirely filled; it is therefore of primary importance to LCM process design. A resin distribution system includes the design and location of resin injection sites, the location of air outlets, the design and location of channels or flow pipes that carry resin from one area of a mold to another, and the design and location of high-permeability material layers ('flow media' or 'distribution media') placed on top of the preform that distribute resin preferentially (Hsiao and Heider, 2012).
The design of these systems must be done carefully and is not always intuitive (George, 2011). In industry, process engineers depend on their experience and knowledge to propose a resin distribution system design, which is then refined through expensive trial-and-error fabrication loops rather than a systematic optimization approach. To reduce the number of physical experiments, Finite Element Methods (FEM) that solve Darcy's law for the flow of resin in a dry preform have been established in the literature (Bruschke and Advani, 1990); we briefly review the key points in Section Simulation and Process Model. However, even with accurate process models, optimization of the resin distribution network remains a challenge.
In RTM, where the high pressure helps achieve high resin flow rates, the design of gate and vent locations has been the subject of active research. Many optimization techniques have been studied using various models of resin flow; we briefly review some key contributions to the RTM design optimization literature below:
- Young (1994) uses a genetic algorithm to optimize 2 gate locations, evaluated using a mold filling FEM that also simulates temperature. The proposed objective function weights resin injection pressure, the maximum time difference between boundary nodes being filled, and the maximum temperature difference between nodes at the end of mold filling. The results are reported as encouraging, but computational complexity limits the effectiveness of the approach as the number of design variables increases.
- Mathur et al. (1999) use a genetic algorithm to optimize 2 vent locations for minimum fill time using a mold filling FEM. They propose an objective function that weights fill time and dry spot formation, and use a step function penalty to impose constraints. They report that their results make physical sense and could be used to seed a local search.
- Gokce et al. (2002) optimize the location of a single gate using a branch and bound approach and a mold filling FEM. They partition the possible gate locations geometrically into sets and exploit a locality assumption: that the lower bound of the objective function (total unfilled nodes) is of the same order of magnitude as the average over a given spatial partition. They perform separate optimizations for cycle time and for elimination of dry spots. The authors report that the branch and bound approach had an 86% lower computational cost than a genetic algorithm, and verified via exhaustive search that it found an optimal solution.
- Boccard et al. (1995) use a geometric, distance-based approach that does not take material properties into account to propose gate locations for thin molds that avoid trapping air bubbles. Their approach constructs geometric subdomains in the 2D plane such that each domain contains one inlet. They then use the distance from each inlet port to the two closest points on the perimeter to determine the gate locations algorithmically, without input from a mold filling simulator.
- Lin et al. (2000) use a quasi-Newton optimization method to refine the initial placement of 3 gates using a mold filling FEM. The authors find that using an FEM with gradient-based search requires either a fine mesh or adaptive meshing, so that small differences in evaluated gate locations provide a useful gradient to the optimizer. They recommend that if one or more of the parameters are discrete (such as the number of gates), gradient-based optimization should either not be used, or be used in conjunction with a search-based method such as a genetic algorithm.
- Wang et al. (2017) use a geometric approach (medial axis transform) to find initial symmetric designs for injection channels in RTM molds of shell-like complex parts. The initial designs are then evaluated using an FEM-based mold filling simulation to determine the flow pattern. This flow pattern is then geometrically partitioned to allow a binary search to move injection channels to correct for anisotropy in the preform.
The main challenge of VARTM is its reliance on vacuum pressure. As the resin follows the pressure gradient through the preform, the varying permeability caused by complex geometries makes the process especially prone to dry spots if the resin delivery system is not well designed. Compared with RTM, there is less literature concerning the optimization of VARTM resin distribution networks. We summarize some of the key approaches below:
- Hsiao et al. (2004) optimize the cross-sectional area of 2 flow runner channels and the number of plies of distribution media in 6 sections to determine an optimal flow distribution network for a co-cured stiffened panel. The authors define an optimization objective weighting dry spot volume fraction and fill time. They use a genetic algorithm (GA) and a finite element flow model, and report that their GA found a solution in the top 1.3% of all possibilities using only 15.5% of the computational time. Their results were verified by exhaustive search and experimentation.
- Kessels et al. (2007) optimize the location of 3 flow runner pipes on a complex glider shell. They use a 3D geometric mesh distance model based on the assumption that the resin first fills the nodes closest to the inlet, then the next closest, and so on. They remark that this assumption holds locally but not globally, and suggest modifying mesh distances to capture anisotropic permeability, although this is not demonstrated. The mesh-based distance model is then used in the loop of a genetic algorithm. They found generally good agreement between their model and an FEM filling model, but reported that when the flow is complex the results differ from the FEM filling model, and recommend a final calculation to refine the vent locations.
- Sánchez et al. (2015) propose a pre-design tool for resin channels on a large thin boat shell using a geometric model. The model is based on distance fields computed by level sets, which are then partitioned into independent regions. The partitioning lines, which resemble a medial axis, are the resin channel paths. The model assumes uniform and isotropic permeability in the preform, and that gates should be placed on the boundary, as far away from inlets as possible. The authors conclude that the resulting pre-design should then be evaluated by a full physics-based simulation.
- Sas et al. (2015) optimize 6 flow media sections on a panel and 14 sections on a complex geometry for robustness under multiple racetracking scenarios around an insert. Focusing on robustness, they explicitly enumerate the possible racetracking scenarios around the insert and use a tree-search-based method to perform discrete optimization over the number of distribution media plies in each region. Notably, their tree search can update the number of regions if the optimization does not converge. Evaluation in the tree search is done using an FEM fill model. They report a case study in which their proposed solution fills the mold in all scenarios.
Many of these approaches can be considered multi-objective, grid-based placement optimization problems. A more complex and constrained variant of this general problem, known as Global Placement, is heavily studied in integrated circuit design. While there are analytical approaches such as Integer Linear Programming and numerical approaches such as non-linear optimization, much like flow media placement it is often solved stochastically using GAs, tree/graph search, or Reinforcement Learning (Mirhoseini et al., 2020). We have focused our literature review on the optimization techniques previously applied to flow media placement and similar manufacturing problems in composites; interested readers may consult the review by Kahng (2021) of the history of work in this related field, which inspired ours.
In the RTM literature, optimizations that use realistic physics-based flow models all have very low-dimensional design spaces (typically only one or two gate locations are optimized) and often have an intuitive starting position, such as the center of the mold, to work from. Conversely, VARTM resin delivery systems are often more complex, and the set of design variables is larger. Therefore, most studies that consider flow pipes or channels have not used accurate physics-based models, but have instead relied on symmetry and on assumptions of homogeneous permeability in the parts, which is a significant limitation for application to VARTM aerospace structures. While the placement of distribution media has a very large effect on the flow front, natural through-thickness permeability variation in the preform can also affect the evenness of the flow front along the bottom surface. Yun et al. (2017) found that this variation meant that the percentage of voids in the final filled preform increased with the permeability of the distribution media, further emphasizing that physics-based simulations with appropriate material properties are important for resin distribution network optimization. Hsiao et al. (2004) and Sas et al. (2015) tackle the highest-dimensional design spaces in a problem setting similar to ours. In Hsiao et al. (2004), the authors optimize 6 sections of resin flow media and the cross-sectional area of 2 flow runner channels, allowing multiple layers of flow media in each section, resulting in a search space of 3^6 × 2^2 = 2916 combinations. The main limitation of the work in Hsiao et al. (2004) is scalability, as their proposed genetic algorithm explores a full 16% of this relatively small design space to yield the optimized result. In Sas et al. (2015), the authors expand the search space compared to Hsiao et al. (2004) by considering up to 14 flow media sections (allowing zero or one layer of flow media in each), giving a design space of 2^14 = 16,384.
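The design-space sizes quoted above follow from simple products of the number of levels per design variable. The short sketch below recomputes them and shows how the cost of exploring a fixed fraction of the space grows; the per-variable factorization for Hsiao et al. (2004) (3 levels for each of 6 media sections, 2 sizes for each of 2 runners) is inferred from the 2916 total, and the helper name `design_space_size` is hypothetical.

```python
from math import log10


def design_space_size(levels_per_variable):
    """Number of distinct designs when variable i can take levels_per_variable[i] values."""
    size = 1
    for levels in levels_per_variable:
        size *= levels
    return size


hsiao = design_space_size([3] * 6 + [2] * 2)  # 6 media sections x 3 levels, 2 runners x 2 sizes
sas = design_space_size([2] * 14)             # 14 sections, 0 or 1 layer each
ours = design_space_size([3] * 12)            # 12 grid cells, 0, 1 or 2 layers each

for name, size in [("Hsiao et al. (2004)", hsiao), ("Sas et al. (2015)", sas), ("4 by 3 grid", ours)]:
    print(f"{name}: {size:,} designs, about 10^{log10(size):.2f}")

# Exploring the same 16% fraction on the 4 by 3 grid would need roughly
# 0.16 * 531,441 = 85,030 process-model runs.
print(f"16% of the 4 by 3 grid: {int(0.16 * ours):,} runs")
```

Even a modest grid is therefore about 32 times larger than the largest space in this line of work, which is why per-design search becomes expensive.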
In aerospace VARTM designs with many flow media sections and multiple layers allowed in each, the design space is at least one and possibly three orders of magnitude larger than that considered by Sas et al. (2015), and exploring a comparable fraction of it is expensive in both computational time and real time. Designing aerospace geometry is an iterative process as requirements and interfaces are refined. Many distinct but related part designs will be proposed before the geometry is finalized. Hence, optimizations that take significant time before feedback is given to the engineers can become bottlenecks in the design process. Our proposed solution to this problem is an optimization technique that transfers knowledge from previous optimizations performed on similar designs to a new design, in the same way an expert would.
We achieve this by training a neural network-based agent to optimize the placement (location and thickness) of flow media through Deep Reinforcement Learning (DRL). Through experience, our agent learns the skill of placing flow media when presented with a part design. This experience, encoded in the neural network, can then be thought of as a skill that can be applied to other parts without requiring additional experience. This is a shift in how to think about process optimization. Instead of directly optimizing the scenario of interest by running a process model, we train an agent on a variety of (usually randomly generated or perturbed) scenarios in order to gather relevant experience for the agent. Then, when presented with a previously unseen scenario, the agent immediately proposes new parameters without any further training. This can create significant time savings, as it potentially eliminates entire optimization runs. The applicability of the agent's previous experience, and therefore its overall effectiveness when presented with a new scenario, will depend on the similarity between the scenarios it was trained on and the scenario of interest.
Applying DRL to production system optimization has recently been shown to outperform conventional methods in many manufacturing domains; we recommend the review of this area by Panzer and Bender (2021) for a comprehensive summary. DRL has been shown to work well in the manufacturing setting by building RL environments on offline process model simulators, in applications such as ozonation of textiles (He et al., 2021), material draping (Zimmerling et al., 2022), metal forging (Ma et al., 2022), and brine injection (Andersen et al., 2019). The need for knowledge transfer, combined with the high-dimensional decision problem and the demonstrated success of DRL in industrial process optimization, motivates our use of Deep Reinforcement Learning in this domain.
Our model is trained using realistic geometries and process model runs from a design family of thin laminates with complex "pad-ups" (an aerospace term for locally thicker sections on an aerostructure skin); the system is then able to instantly propose a good placement on previously unseen parts from that same family without running the process model at all. Additionally, through fine-tuning, the system can continually learn and improve from new experiences as it is used.
The main contributions of our work are:
- We introduce a novel instant optimization method using machine-learned knowledge transfer for VARTM flow media placement that is scalable to realistic aerospace parts, with the objective of avoiding dry spots and minimizing fill time.
- We detail our Deep Reinforcement Learning (DRL) based optimization methodology, which successfully learns a skill from experience in simulation to optimize flow media placement on designs that were not seen during training, without evaluating any process model on the design under optimization.
- We present a case study applying our method to a realistic aerospace laminate part with a 12-dimensional design space of size 3^12 = 531,441, where we reduce fill time by 32% compared to an expert-designed placement while maintaining a complete fill of the preform.

The remainder of this paper is organized as follows. In Section Flow distribution network optimization, we describe the overall resin flow optimization problem. In Section Simulation and Process Model, we describe the process model used in our optimization. In Section Flow media placement optimization, we describe our proposed approach to flow media placement optimization. The results and discussion are presented in Section Results and discussion. Finally, Section Limitations concludes the paper and discusses future work.

Flow distribution network optimization
In this study we consider a VARTM process occurring in an oven, with metallic tooling underneath the CFRP and soft tooling on top, as in Fig. 1. The tooling is Inner Mold Line (IML) controlled, as is common for complex structures in the aerospace industry (Hiken, 2017), meaning complex pad-ups are placed on the tool side to control assembly tolerances. A stack of carbon sheets (plies) is laid up between the top and bottom layers of tooling materials. In VARTM, the soft tooling materials are sealed to the mold surface, and the carbon plies are then held under vacuum to compact them and remove air. The resin enters through inlets and air is drawn out through outlets under 1 atmosphere of vacuum; this pressure differential pulls the resin through the dry preform, filling the empty space between fibers. Once saturated, the now filled preform is held under vacuum and the temperature is raised to cure the resin and create the final composite part. To ensure the preform is appropriately filled, a resin distribution network is used to direct the flow of the resin during infusion.
The resin distribution network consists of the inlets, outlets, flow channels or pipes, and high permeability distribution media (flow media) layers. As the impregnation of the resin into the dry reinforcement is achieved only through vacuum, the resin distribution network design significantly influences part quality and cycle time.
While inlets can be single points, it is typical to use a 'flow runner' (channel or leaky pipe) so that injected resin flows from a line instead of a point. Flow runners are commonly placed along one edge of the part for practicality. Default practice is to cover most of the top surface of the preform with a layer of high-permeability distribution media to encourage the resin to flow across the top of the preform (in-plane) first, and then through the thickness. Because the vacuum outlet is on the opposite edge from the inlet, the pressure differential pulls resin both in-plane and through the thickness. However, as the through-thickness permeability is much lower than the in-plane permeability, distribution media must be placed with care to ensure the resin does not flow across the top of the preform in-plane before filling through the thickness. An example of a resin distribution network containing a flow runner and flow media is displayed in Fig. 2. While filling, the resin flow front will be non-uniform through the thickness of the preform; this phenomenon is displayed in Fig. 3. In aerospace parts with large scale and/or complex geometries with locally varying thickness, this lag in the flow front between the top and bottom surfaces is exacerbated and can cause the resin to 'racetrack': the resin arrives at the outlet through a more permeable path around thick areas, instead of through them, before the whole preform is full, trapping off the vacuum and causing dry spots due to resin starvation (known as 'trapoff'). This phenomenon is illustrated in Fig. 4 for an example part with varying thickness.
Because a preform that is not almost completely filled leads to a defective part, a sub-optimal resin distribution network design causes both quality issues and poor cycle time.
Optimizing resin distribution is therefore an important problem in industry; however, current practice relies on manual optimization by trial and error, either through simulation or through physical experiments.

Simulation and Process Model
In the literature, flow optimization makes use of either a numerical process simulator (Gokce et al., 2002;Lin et al., 2000;Young, 1994;Boccard et al., 1995;Mathur et al., 1999;Hsiao et al., 2004) or a geometric surrogate model (Sánchez et al., 2015;Kessels et al., 2007), in order to evaluate an objective function without doing expensive physical experiments.
Mathematical models of the relevant physics of resin flow during processing of thermoset composites, and their solutions using Finite Element Methods, have been previously established in the literature (Bruschke and Advani, 1990); we briefly review the key points here. We do not propose a new process model in this work, but use existing work to feed our optimization.
When filling the dry preform, we can treat the resin as a fluid flowing through an anisotropic porous medium and describe its flow with Darcy's law (Eq. 1):
$$\vec{u} = -\frac{\mathbf{K}}{\eta}\nabla P \qquad (1)$$
Here η is the viscosity of the resin, $\mathbf{K}$ is the permeability tensor of the preform, $\vec{u}$ is the volume-averaged velocity, and ∇P is the pressure gradient in the flow field. Darcy's law states that the velocity of a fluid flowing through a porous medium is directly proportional to the driving pressure gradient.
If we assume that the resin is incompressible and the flow is quasi-steady in the saturated domain, conservation of mass gives Eq. 2, and substituting Eq. 1 yields the following PDE (Eq. 3):
$$\nabla \cdot \vec{u} = 0 \qquad (2)$$
$$\nabla \cdot \left(\frac{\mathbf{K}}{\eta}\nabla P\right) = 0 \qquad (3)$$
This second-order PDE can be solved once the boundary conditions are prescribed: no flow through the mold boundary, the process injection pressure at the inlet, and zero pressure at the flow front. This is a moving boundary problem, as the resin-saturated domain changes over time. However, once the velocity at the flow front is obtained, we can project the expansion of the resin-saturated domain for the next time step and update the pressure distribution. The problem is discretized using finite element-control volume (FE-CV) methods, which perform these updates.
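The moving-boundary update described above can be illustrated in one dimension. The sketch below is a simplified 1D analogue, not the authors' 3D FE-CV solver: it assumes a straight preform split into segments with their own permeability, a fixed inlet pressure, zero pressure at the front node, and one control volume per node. In 1D the quasi-steady solution of Eq. 3 gives a uniform flux equal to the inlet pressure divided by the accumulated flow resistance, and each explicit time step is sized so that the next control volume just fills, which mirrors the FE-CV update loop. The function `fill_times_1d` and all parameter values are hypothetical.

```python
def fill_times_1d(perm, dx, porosity, viscosity, p_inlet):
    """Fill time of each node along a 1D preform, node 0 being the inlet.

    perm[i] is the permeability (m^2) of the segment between nodes i and i+1,
    so a segment covered by flow media can simply be given a higher value.
    """
    times = [0.0]      # the inlet node is full at t = 0
    t = 0.0
    resistance = 0.0   # viscosity * sum(dx / K) over the saturated segments
    for k in range(1, len(perm) + 1):
        # The saturated path now runs from node 0 to front node k (held at P = 0).
        resistance += viscosity * dx / perm[k - 1]
        # Quasi-steady 1D solution of Eq. 3: uniform Darcy flux per unit area.
        flux = p_inlet / resistance
        # Explicit step sized so that control volume k (porosity * dx) just fills.
        t += porosity * dx / flux
        times.append(t)
    return times


# Uniform 1 m preform, 100 segments (illustrative values only).
length, n = 1.0, 100
K, phi, eta, p = 1e-10, 0.5, 0.1, 1e5
times = fill_times_1d([K] * n, length / n, phi, eta, p)
analytic = phi * eta * length**2 / (2 * K * p)   # classical 1D Darcy fill time: 2500 s
print(times[-1], analytic)                       # roughly 2525 s vs 2500 s

# Doubling the permeability of the first half mimics adding flow media there.
faster = fill_times_1d([2 * K] * 50 + [K] * 50, length / n, phi, eta, p)
```

The discrete result approaches the analytical value φηL²/(2KP) as the number of segments grows; the residual difference here is of order 1/n.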
As suggested by Hsiao et al. (2004), by assigning different (but fixed) 3D permeability values to the preform, flow runner, and flow media elements, we can model the resin flow during the VARTM process for the purposes of flow media layout optimization. We use the validated FE-CV model formulation of Bruschke and Advani (1990), which has shown good correlation with experiments in (Mathur et al., 1999; Gokce et al., 2002; Maier et al., 1996; Hsiao et al., 2004). While we use this previously validated approach in our study, we emphasize that our optimization method, detailed in the next section, makes no particular assumptions about the process model beyond access to the element fill volumes and times. With appropriate software engineering, it could be applied with other tools such as PAM-RTM™ (ESI Group) and LIMS (Maier et al., 1996).
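Because the optimizer only needs element fill volumes and times, the solver can sit behind a narrow interface. The sketch below shows one way such a decoupling might look using structural typing; `FillResult`, `ProcessModel`, `ToyProcessModel`, and `total_fill_time` are hypothetical names, the toy model is a placeholder rule rather than physics, and an adapter for a real tool such as PAM-RTM or LIMS would have to be written against that tool's own interface.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass
class FillResult:
    """Per-element process-model output; the optimizer only needs these two fields."""
    fill_times: List[float]      # time at which each element became full
    fill_fractions: List[float]  # final filled volume fraction of each element (1.0 = full)


class ProcessModel(Protocol):
    """Hypothetical solver-agnostic interface, not the API of any real tool."""
    def run(self, flow_media_map: Sequence[Sequence[int]]) -> FillResult: ...


class ToyProcessModel:
    """Placeholder solver: one element per grid cell, more layers means an earlier fill."""
    def run(self, flow_media_map: Sequence[Sequence[int]]) -> FillResult:
        times, fractions = [], []
        for row in flow_media_map:
            for layers in row:
                times.append(100.0 / (1 + layers))  # arbitrary toy rule, not physics
                fractions.append(1.0)
        return FillResult(times, fractions)


def total_fill_time(model: ProcessModel, flow_media_map) -> float:
    """Optimizer-side code depends only on the ProcessModel protocol."""
    return max(model.run(flow_media_map).fill_times)


print(total_fill_time(ToyProcessModel(), [[0, 1, 2], [1, 1, 1], [2, 2, 2], [1, 2, 1]]))
```

Swapping solvers then means providing another class with a compatible `run` method, leaving the optimization code unchanged.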

Flow media placement optimization
We consider the problem of determining the optimal placement of flow media for the fabrication of thermoset-matrix composites using a VARTM process. Our optimization target is the family of thin laminate parts with complex pad-ups typically found on aircraft control surfaces. The objective is to minimize the cycle time while satisfying the primary manufacturing constraint that the entire part is filled with resin.
We assume the part is held at a constant infusion temperature at all locations during filling. The resin delivery network under optimization consists of flow media layers placed on the top surface of the preform. In all our scenarios there is a flow runner channel running along one side of the tool and a vacuum outlet on the opposite side; the runner, inlet, and outlet locations are all held fixed. For the purposes of optimization, we discretize the flow media design space into a 4 by 3 grid, each position of which can accept 0, 1, or 2 layers of flow media. A visualization of the parameterization is presented in Fig. 5. The Flow Media Map is a 2D matrix containing the integer 0, 1, or 2 at each coordinate (i, j), corresponding to the number of flow media layers at that position in the grid. The 1:1 correspondence between the Flow Media Map matrix and the flow media arrangement can be seen by inspecting Fig. 5. This parameterization results in a design space of 3^12 = 531,441 combinations. The 4 by 3 grid discretization was chosen so that the design space is at least an order of magnitude larger than those considered in previous literature without exceeding reasonable manufacturing complexity; however, our optimization method does not depend on the exact discretization.
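The Flow Media Map described above is a small integer matrix, and the 3^12 count follows from reading its 12 cells as digits of a base-3 number. The sketch below makes that concrete; reading "4 by 3" as 4 rows and 3 columns, and indexing layouts by a base-3 integer, are assumptions for illustration, and the helpers `validate`, `map_to_index`, and `index_to_map` are hypothetical rather than part of the authors' method.

```python
ROWS, COLS = 4, 3   # assumed orientation of the "4 by 3" grid
N_LEVELS = 3        # 0, 1 or 2 layers of flow media per cell


def validate(flow_media_map):
    """Check the Flow Media Map shape and that every entry is 0, 1 or 2."""
    if len(flow_media_map) != ROWS or any(len(row) != COLS for row in flow_media_map):
        raise ValueError(f"expected a {ROWS} x {COLS} map")
    if any(cell not in range(N_LEVELS) for row in flow_media_map for cell in row):
        raise ValueError("each cell must hold 0, 1 or 2 layers")


def map_to_index(flow_media_map):
    """Read the cells in row-major order as the digits of a base-3 number."""
    validate(flow_media_map)
    index = 0
    for row in flow_media_map:
        for layers in row:
            index = index * N_LEVELS + layers
    return index


def index_to_map(index):
    """Inverse of map_to_index for 0 <= index < 3**12."""
    if not 0 <= index < N_LEVELS ** (ROWS * COLS):
        raise ValueError("index out of range")
    cells = []
    for _ in range(ROWS * COLS):
        index, layers = divmod(index, N_LEVELS)
        cells.append(layers)
    cells.reverse()  # the least significant digit was extracted first
    return [cells[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


example = [[0, 1, 2], [2, 1, 0], [1, 1, 1], [0, 0, 2]]
assert index_to_map(map_to_index(example)) == example
print(N_LEVELS ** (ROWS * COLS))  # size of the design space
```

Each of the 531,441 layouts maps to exactly one integer, which is convenient for caching process-model results or counting how much of the space a search has visited.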
We use two-dimensional elements with constant thickness, porosity, and permeability to represent the flow distribution media in the finite element model. To model the addition or removal of flow media, the finite element mesh used as input is modified by changing the thickness, and therefore the control volume, of the flow media elements modeled on the top of the preform. The part is meshed in 3D with tetrahedral elements, and the flow progression is solved as described in Section Simulation and Process Model. The process model stores the time at which each element becomes full, which allows computation of flow fronts at any time t. This output is used to produce the Fill Map (a heatmap whose values represent the time at which the bottom surface elements fill) and the Trapoff Map (a binary map whose values are zero except at element locations that lack a connection to the outlet boundary at any time). These maps can be interpreted as images produced by virtual cameras if the tooling surface were transparent, as is common in validation studies. This information gives the agent spatial context with which to make decisions on flow media placement. Section State describes how the process model information is encoded into the state consumed by the agent. A visualization of the process model elements used in optimization is presented in Fig. 6.
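The Trapoff Map definition above (cells that lose their connection to the outlet at any time) can be computed from per-cell fill times with a flood fill. The sketch below assumes a regular 2D grid of bottom-surface cells instead of FE elements, 4-neighbour connectivity, an outlet boundary given by one grid column, and connectivity checked just after each fill event; the grid of fill times plays the role of the Fill Map. The function `trapoff_map` is hypothetical and not the authors' implementation.

```python
import math
from collections import deque


def trapoff_map(fill_time, outlet_col=-1):
    """Mark grid cells that are cut off from the outlet by resin at any time.

    fill_time[r][c] is when cell (r, c) on the bottom surface fills
    (math.inf if it never fills). The outlet boundary is assumed to be one
    column of the grid; cells are 4-connected.
    """
    rows, cols = len(fill_time), len(fill_time[0])
    outlet_col %= cols
    trapped = [[0] * cols for _ in range(rows)]
    # Check connectivity at t = 0 and just after each fill event.
    event_times = sorted({0.0} | {t for row in fill_time for t in row if t < math.inf})
    for t in event_times:
        dry = [[fill_time[r][c] > t for c in range(cols)] for r in range(rows)]
        # Flood fill through dry cells, starting from dry cells on the outlet boundary.
        seen = [[False] * cols for _ in range(rows)]
        queue = deque((r, outlet_col) for r in range(rows) if dry[r][outlet_col])
        for r, c in queue:
            seen[r][c] = True
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and dry[nr][nc] and not seen[nr][nc]:
                    seen[nr][nc] = True
                    queue.append((nr, nc))
        # Dry cells the flood fill could not reach have lost their path to vacuum.
        for r in range(rows):
            for c in range(cols):
                if dry[r][c] and not seen[r][c]:
                    trapped[r][c] = 1
    return trapped


# Resin from the left surrounds the centre cell, which never fills.
fill_map = [[1, 2, 3], [1, math.inf, 3], [1, 2, 3]]
print(trapoff_map(fill_map))
```

In this example the centre cell stays connected to the outlet while the right column is still dry, and becomes trapped once the right column fills, so only that cell is flagged.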

Reinforcement learning
Deep Reinforcement Learning (DRL) approaches have recently enjoyed success in solving control problems with high-dimensional inputs from simulators where control policies are difficult to model formally. Relevant examples are playing Atari from pixels (Mnih et al., 2015), learning object manipulation or locomotion in robotics, and playing strategy games like Go (Silver et al., 2016) and StarCraft (Vinyals et al., 2019). In industry, these approaches have successfully been used to control HVAC systems in datacenters (Moriyama et al., 2018), allocate computing resources (Mirhoseini et al., 2017), and design semiconductors (Mirhoseini et al., 2020). In composites, DRL was used to optimize VARTM temperature profiles and tooling thicknesses in (Szarski and Chauhan, 2021).
We note that the placement of a resin distribution network has similarities to circuit placement problems in semiconductor design. We are inspired by the industrial success in the area of data-driven design of these complex systems using Reinforcement Learning. Of particular interest is the ability of neural network 'agents' trained via reinforcement learning to generalize from accrued experience and apply the knowledge gained to new scenarios immediately, without further training.
Fig. 6 The 3D, 2D, and 1D mesh elements model the dry preform and resin distribution network. Once the process model is run with the resin material properties, the output maps are generated for use by the optimizer.
Fig. 7 Reinforcement learning setting.
In the setting of Reinforcement Learning (RL), we have an agent interacting with an environment in discrete time steps. At each time step t, the agent receives the environment's current state S_t, and the agent must choose an appropriate action A_t in response. After the agent executes the action, it receives a reward R_t and a new state S_{t+1} (see Fig. 7). We refer to the sequence (S_t, A_t, R_t, S_{t+1}, A_{t+1}, R_{t+1}, ...) (i.e. the history of what action the agent took in each state and the subsequent reward it received) as the trajectory τ. Informally, we can think of this as the experience the agent is able to learn from.
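The agent-environment interaction above can be written as a short loop that records each transition (S_t, A_t, R_t, S_{t+1}) into a trajectory. The sketch below is generic and assumes the environment is exposed as two plain callables, `reset` and `step`; the function `rollout`, the toy step rule, and its reward are hypothetical stand-ins, and in the flow media setting each call to `step` would correspond to one full process-model run.

```python
from typing import Any, Callable, List, Tuple

Transition = Tuple[Any, Any, float, Any]  # (S_t, A_t, R_t, S_{t+1})


def rollout(reset: Callable[[], Any],
            step: Callable[[Any, Any], Tuple[Any, float]],
            policy: Callable[[Any], Any],
            n_steps: int) -> List[Transition]:
    """Run one episode and return its trajectory as a list of transitions."""
    state = reset()
    trajectory: List[Transition] = []
    for _ in range(n_steps):
        action = policy(state)                     # A_t chosen from S_t
        next_state, reward = step(state, action)   # environment returns S_{t+1} and R_t
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


# Toy usage: the state is a layer count and the reward is a made-up stand-in
# for a negated fill time; a real step() would run the process model.
def toy_step(state, action):
    next_state = state + action
    return next_state, -100.0 / (1 + next_state)


tau = rollout(lambda: 0, toy_step, lambda s: 1, n_steps=2)
print(tau)
```

Learning then amounts to using many such trajectories to adjust the policy so that actions leading to higher rewards become more likely.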
The goal of Reinforcement Learning is therefore to train an agent so that it "knows what to do". This is equivalent to learning the best action to take in a given state, so as to maximize the expected numerical reward over time. We refer to the map between states and actions as the policy.
In this setting, each time step t represents one full simulation with a set of input process parameters proposed by the agent. The reward function is calculated on the output, and over time the agent learns the relationships between the input parameters and the output reward in order to make better choices of process parameters based on its experience.
Markov Decision Processes are a mathematical framework for modelling reinforcement learning problems. For a thorough background, we recommend the treatments by Sigaud and Buffet (2013) and Sutton and Barto (2018); here we briefly review the definition.
Formally, we consider the flow media placement problem as an infinite-horizon discounted Markov Decision Process (MDP). An MDP $\mathcal{M}$ is a tuple $\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, P, R, \gamma \rangle$, where:
- $\mathcal{S}$ is a set of states (e.g. a spatial representation of the flow media placement, part geometry, and flow simulation output),
- $\mathcal{A}$ is a set of possible actions to control the system (e.g. adding or removing flow media from locations on the tool),
- $P : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0, 1]$ is the state-action-state transition probability distribution (e.g. the impact of a change to flow media placement on the flow simulation),
- $R : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}$ is the reward function for transitions (e.g. fill time and trapoff penalties),
- $\gamma \in [0, 1)$ is a discount factor for future rewards.

We seek a policy $\pi : \mathcal{S} \times \mathcal{A} \to [0, 1]$ that maximizes the discounted expected return $\eta$ (i.e. the reward expectation over multiple timesteps):

$$\eta(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t, s_{t+1})\right],$$

where $\tau = (s_0, a_0, s_1, a_1, \ldots)$. With flow media placement posed as an MDP, the optimization problem is equivalent to finding the policy $\pi$ that maximizes the expected return under a reward function designed to penalize trapoff and total fill time. A critical point is that our policy $\pi$ must be learned from experience with state-action pairs (i.e. the expected fill pattern of a given placement on a given geometry) that span varying geometries, so that the agent can learn a skill that generalizes to similar parts. In our flow media placement formulation we hold the flow media grid size, material properties, and inlet/outlet arrangement fixed. During training, we refer to a single trajectory $\tau$ as an episode. Every timestep t from the RL agent's point of view is therefore a complete run of the FEM, distinct from the time variable in the governing PDE. As the action at each timestep adds or removes a single layer in each grid cell, a single timestep is not enough for the flow media grid to reach our maximum thickness of two layers per cell. Due to this and the stochastic nature of many RL algorithms, we model the decision problem as an MDP with episodes of 2 timesteps, which allows grid cells to reach the maximum thickness in our problem setting. To enable more extreme designs in future work, the episode length could simply be increased with no change to the model. In every episode, a part geometry is randomly selected from the generated training panels, so that the agent receives experience during training across a variety of flat laminates with local pad-ups (see Section Random part mesh generator).
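As a small illustration of the return being maximized, the sketch below computes the discounted sum $\sum_t \gamma^t r_t$ for one finite trajectory; $\eta$ is the expectation of this quantity over trajectories generated by the policy. The reward values in the usage line are arbitrary placeholders, not results from this study.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a single finite trajectory."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    total = 0.0
    # Iterate backwards (Horner's scheme): each earlier step discounts the rest once more.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# A 2-step episode with placeholder rewards (illustrative values only).
print(discounted_return([-0.8, -0.5], gamma=0.99))
```

Accumulating from the last step backwards avoids computing explicit powers of $\gamma$ and gives the same value as the forward sum.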

Flow media placement reinforcement learning environment
We implement the environment following the OpenAI Gym interface (Brockman et al., 2016), which requires reset() and step() methods that reset the simulation or advance it by one timestep, respectively. Because the flow model must be run to completion with a given flow media placement in order to calculate a useful reward, each timestep of our environment corresponds to a full run of the flow model, with the agent modifying the placement at every timestep. A flowchart of this process is shown in Fig. 8.
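The reset()/step() contract described above can be sketched as follows. This is a minimal, dependency-free illustration rather than our implementation: `run_flow_model` is a hypothetical placeholder for the FEM solver (its toy fill-time formula has no physical meaning), starting each episode from one uniform layer of flow media is an assumption, and the reward is a stand-in for the one defined in Section Reward. It returns the classic four-tuple `(observation, reward, done, info)` of the original Gym API.

```python
import random

MAX_LAYERS = 2          # maximum flow media layers per grid cell (from the text)
STEPS_PER_EPISODE = 2   # episode length used in this work
N_CELLS = 12            # 3x4 flow media grid


def run_flow_model(panel, placement):
    """HYPOTHETICAL stand-in for the 3D FEM resin flow solver.

    A real implementation would run the full simulation and return fill
    times and trapoff information; this toy formula is not physical.
    """
    fill_time = panel["base_fill_time"] / (1.0 + 0.1 * sum(placement))
    return {"fill_time": fill_time}


class FlowMediaEnv:
    """Sketch of an environment following the Gym reset()/step() convention."""

    def __init__(self, panels, seed=None):
        self.panels = panels            # fixed set of training geometries
        self.rng = random.Random(seed)

    def reset(self):
        # ASSUMPTION: each episode starts from one uniform layer of flow media.
        self.panel = self.rng.choice(self.panels)
        self.placement = [1] * N_CELLS
        self.t = 0
        self.outputs = run_flow_model(self.panel, self.placement)
        return self._observe()

    def step(self, action):
        # action[k] in {0, 1, 2} means Remove, Leave, Add at grid cell k.
        self.placement = [
            min(MAX_LAYERS, max(0, layers + a - 1))
            for layers, a in zip(self.placement, action)
        ]
        # One environment timestep corresponds to one complete flow model run.
        self.outputs = run_flow_model(self.panel, self.placement)
        reward = -self.outputs["fill_time"]   # placeholder reward only
        self.t += 1
        done = self.t >= STEPS_PER_EPISODE
        return self._observe(), reward, done, {}

    def _observe(self):
        return {"placement": list(self.placement), **self.outputs}
```

Because every step() call triggers a full solver run, the cost of training is dominated by the number of environment steps rather than by the neural network updates.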
We defined and implemented a resin flow environment for reinforcement learning as follows:

State
- 64x64x1 Part Geometry Height Map
- 64x64x1 Part Fill Map
- 64x64x1 Flow Media Map
- 64x64x1 Trapoff Map
- 64x64x1 Absolute Fill Time Channel

As our policy neural network is designed to work with 2D data, our proposed state representation is image based. After each run of the process model we produce 5 images of resolution 64 x 64 as input to the policy network, representing the part geometry, flow media layout, and the resulting resin flow. All maps are normalized to the range [0.0, 1.0] by the environment before they are given to the DRL agent. The Part Geometry Height Map is a 2D view of the part thickness, normalized to 25.4 mm. The Part Fill Map is a 2D heatmap representing, at each point, the latest fill time of the preform elements on the tooling surface, normalized against the maximum fill time of any element in the map. The Flow Media Map represents the spatial flow media density as solid blocks as in Fig. 5 and is normalized against the maximum allowable flow media thickness. The Trapoff Map is calculated as follows:
1. Step through the flow progression in steps i of 1/100 of the maximum fill time, calculating the area $A_i$ of all tooling surface preform elements that are not connected to the boundary at step i.
2. Take the step i at which $A_i$ is at its maximum.
3. Create an image in which all tooling surface preform elements that are not connected to the boundary at that step are set to 1 and the background is set to 0.

The trapoff map can be seen as a spatial representation of the areas that have high trapoff risk (unfilled nodes with no connection to the boundary at any time). Both the trapoff map and the fill map are produced with a virtual surface camera looking at the tooling surface elements, as in Fig. 9. The entire state representation is displayed in Fig. 10.
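Steps 1 to 3 of the trapoff map calculation can be sketched on a regular grid as below. This is an illustrative approximation under stated assumptions: a 2D grid of per-cell fill times stands in for the tooling-surface elements seen by the virtual camera, "connected to the boundary" is taken to mean a 4-connected path of unfilled cells reaching the grid border (where vacuum is applied), every cell is assumed to fill eventually (finite fill times), and each cell counts as one unit of area.

```python
from collections import deque


def disconnected_unfilled(fill_time, t):
    """Cells still unfilled at time t with no 4-connected unfilled path to the border."""
    rows, cols = len(fill_time), len(fill_time[0])
    unfilled = [[fill_time[r][c] > t for c in range(cols)] for r in range(rows)]
    reached = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with unfilled cells on the border (vacuum at the edges).
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and unfilled[r][c]:
                reached[r][c] = True
                queue.append((r, c))
    # Breadth-first search through unfilled cells only.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and unfilled[nr][nc] and not reached[nr][nc]):
                reached[nr][nc] = True
                queue.append((nr, nc))
    return [[unfilled[r][c] and not reached[r][c] for c in range(cols)]
            for r in range(rows)]


def trapoff_map(fill_time, n_steps=100):
    """Scan the fill in 1/n_steps increments of the maximum fill time, keep the
    step with the largest disconnected area, and return it as a 0/1 map."""
    t_max = max(max(row) for row in fill_time)
    best_area, best_mask = -1, None
    for i in range(n_steps + 1):
        mask = disconnected_unfilled(fill_time, t_max * i / n_steps)
        area = sum(sum(row) for row in mask)
        if area > best_area:
            best_area, best_mask = area, mask
    return [[1 if cell else 0 for cell in row] for row in best_mask]
```

Taking the maximum over the whole fill history, rather than only the final state, is what lets the map flag regions that become isolated mid-fill even if the solver eventually fills them.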

Action
The flow media placement optimization problem is treated as discrete action control, with the agent choosing to add, remove, or leave flow media layers at all locations on the tool at every timestep through a multinomial action distribution of dimension n × 3, where n is the number of discrete locations where flow media may be placed on the preform:
- n commanded placement modifications for flow media thickness, each one of [Remove, Leave, Add]

The flow media actions are applied as in Fig. 11. We use the flow media map color convention for the remainder of the paper.
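The per-location choice between Remove, Leave, and Add can be sketched as below. For illustration only, we assume that actions are sampled independently per location from an n x 3 probability table (as with a multi-discrete action space), and that a Remove on an empty cell or an Add on a full cell is clipped to the allowed range of 0 to 2 layers; the clipping rule is our assumption, not a detail stated in the text.

```python
import random

ACTIONS = ("Remove", "Leave", "Add")   # action indices 0, 1, 2 at each location
MAX_LAYERS = 2                         # maximum flow media layers per cell


def apply_actions(layers, action_indices, max_layers=MAX_LAYERS):
    """Apply one Remove/Leave/Add choice per location, clipped to [0, max_layers]."""
    if len(layers) != len(action_indices):
        raise ValueError("need exactly one action per flow media location")
    delta = {0: -1, 1: 0, 2: +1}
    return [min(max_layers, max(0, n + delta[a]))
            for n, a in zip(layers, action_indices)]


def sample_actions(probs, rng):
    """Sample one action index per location from an n x 3 table of probabilities."""
    return [rng.choices(range(3), weights=p)[0] for p in probs]


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [[0.1, 0.2, 0.7]] * 12   # illustrative probabilities only
    print(apply_actions([1] * 12, sample_actions(probs, rng)))
```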

Reward
The reward at each step includes a fill time penalty based on the maximum fill time $\max_i t_i$ of any element in the laminate, normalized by a constant C, where C is set to the largest allowable fill time, $t_i$ is the fill time for element i, and i enumerates all mesh elements that belong to the preform. The time penalty is added to a penalty considering all preform elements not connected to the border (trapoff), evaluated over the steps of the flow simulation, where f is the step in the FEM flow simulation, F is the final step of the FEM flow simulation, $|E \setminus E_B|_f$ is the number of elements that are not path connected to the boundary at step f, n is the number of mesh elements that belong to the preform, and w is a weighting factor for the disconnected element count that is set to 4 in our experiments. The trapoff map considers the entire fill time, and not just the final state, to avoid sensitivity to extremely long fill times: as the pressure in the preform is unlikely to ever be exactly zero, these elements may eventually fill in the model given a large amount of time, but this may not be physically realistic and is almost certainly not useful. The total reward at each timestep is designed to balance the fill time and trapoff terms through a weighted average.
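The two penalties and their weighted combination are sketched below. The text defines the ingredients (C, $t_i$, n, w = 4, and the per-step disconnected element counts), but the exact formulas are not reproduced here, so aggregating the disconnected counts over the FEM steps with a maximum, the combination weight `alpha`, and the sign convention are all hypothetical choices made for illustration.

```python
def fill_time_penalty(fill_times, C):
    """Maximum element fill time, normalized by the largest allowable fill time C."""
    return max(fill_times) / C


def trapoff_penalty(disconnected_counts, n_elements, w=4.0):
    """Weighted fraction of preform elements not path connected to the boundary.

    disconnected_counts[f] is |E \\ E_B| at FEM step f = 0..F.
    ASSUMPTION: the steps are aggregated with max(); the exact form may differ.
    """
    return w * max(disconnected_counts) / n_elements


def total_reward(fill_times, disconnected_counts, n_elements, C, alpha=0.5, w=4.0):
    """ASSUMPTION: weighted average of the two penalties with a hypothetical
    weight alpha, negated so that a higher reward means a better placement."""
    p_time = fill_time_penalty(fill_times, C)
    p_trap = trapoff_penalty(disconnected_counts, n_elements, w)
    return -(alpha * p_time + (1.0 - alpha) * p_trap)
```

Normalizing by C and by n keeps both terms dimensionless and of comparable magnitude, which is what makes a simple weighted average meaningful.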

RL Interaction Example
At the beginning of training, the actions taken by the RL agent in each episode appear random as the agent explores, as in Fig. 12. Each time the process model is run, the reward is calculated, penalizing trapoff and fill time. During training, the agent learns a policy that takes in the 5 maps as the current state and proposes an action that is likely to lead to an increased reward (faster fill time and/or less trapoff). A schematic of an episode later in training is displayed in Fig. 13. In this episode, presented with a different random geometry, the agent discovers an 'L-shaped' flow media placement strategy for this pad-up geometry and then refines it. The agent can later draw on this experience when presented with further geometries.

Proximal policy optimization
Our policy $\pi_\theta$ mapping states to actions is a Convolutional Neural Network that consumes the five 2D heatmaps (see Section State) stacked into a single 64x64x5 state tensor, and outputs the mean μ and standard deviation σ vectors of a 12-dimensional (3x4) multivariate Gaussian distribution, which determines the next placement actions (remove, leave, or add, for each of the 12 grid locations). Informally, training the policy involves collecting samples of the reward given when the agent takes a certain action in a given state, and then, based on the reward, modifying the agent's policy to make it more or less likely that the agent takes that action when it is presented with the same state in the future. The power of the Convolutional Neural Network policy is that it can generalize between similar states, so that not every state needs to have been observed previously in order to take a good action.
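One way to assemble the five normalized maps into the stacked tensor consumed by the policy is sketched below with NumPy. The height, fill, and flow media normalizations follow the State section (25.4 mm, the per-map maximum fill time, and a maximum of two layers). The normalizer for the absolute fill time channel is not specified in the text, so `fill_time_scale` is a hypothetical parameter, and the final clip is a defensive assumption.

```python
import numpy as np

GRID = 64
MAX_THICKNESS_MM = 25.4      # height map normalization stated in the text
MAX_FLOW_MEDIA_LAYERS = 2    # maximum allowable flow media thickness (layers)


def build_state(height_mm, fill_time, flow_media_layers, trapoff,
                abs_fill_time, fill_time_scale):
    """Stack the five 64x64 maps into a (64, 64, 5) array with values in [0, 1].

    fill_time_scale is a HYPOTHETICAL normalizer for the absolute fill time channel.
    """
    channels = [
        np.asarray(height_mm, dtype=np.float32) / MAX_THICKNESS_MM,
        np.asarray(fill_time, dtype=np.float32) / max(float(np.max(fill_time)), 1e-12),
        np.asarray(flow_media_layers, dtype=np.float32) / MAX_FLOW_MEDIA_LAYERS,
        np.asarray(trapoff, dtype=np.float32),   # already binary (0 or 1)
        np.asarray(abs_fill_time, dtype=np.float32) / fill_time_scale,
    ]
    for ch in channels:
        if ch.shape != (GRID, GRID):
            raise ValueError(f"expected a {GRID}x{GRID} map, got {ch.shape}")
    # Channels-last stacking; clip as a guard so every value stays inside [0, 1].
    return np.clip(np.stack(channels, axis=-1), 0.0, 1.0)
```

A channels-last layout is used here only for readability; a PyTorch convolution would expect the channel axis first, so the array would be transposed before being passed to the network.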
In the context of flow media placement, during training the agent observes the results of its many different flow media placements on many different part geometries, and uses this experience of how part geometry, flow media, and fill patterns interact to make increasingly better placements (as measured by the reward function). Each simulation of (geometry, placement design, fill pattern) provides state-action-reward experience that is used to train the policy.
To train the policy neural network, we use the Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017). PPO is a policy gradient method that uses advantage estimates to reduce the variance of the policy gradient, and a clipped surrogate objective to limit the size of each policy update.
Formally, we denote a policy π, the state-action value function as $Q^\pi$, the value function as $V^\pi$, and the advantage function as $A^\pi$. PPO updates the parameters θ of the neural network policy $\pi_\theta$ by taking multiple steps of minibatch Stochastic Gradient Descent (SGD) over collected state-action-reward trajectories and solving Eq. 8, where the advantage is $A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)$.
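The text refers to Eq. 8 for the PPO objective. As a reference point, the sketch below evaluates the clipped surrogate objective from Schulman et al. (2017) on a batch of samples. Whether Eq. 8 also includes value-function or entropy terms is not shown here, and the clip range ε = 0.2 is an assumed default rather than a value reported in this work.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective of PPO (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the current
    policy and the policy that collected the data; advantages: estimates of A^pi.
    """
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                 # r_t(theta)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)   # clip(r_t, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)          # pessimistic bound
    return total / len(advantages)
```

Taking the minimum of the unclipped and clipped terms means the objective gives no extra credit for moving the probability ratio far outside $[1-\epsilon, 1+\epsilon]$, which is what keeps each update close to the data-collecting policy.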

Results and discussion
Experimental setup

RL policy network
The policy network consists of a Convolutional Neural Network (CNN) feature extractor based on Mnih et al. (2015), with 3 convolutional layers (8x8, 4x4, and 3x3 kernel sizes, with strides of 4, 2, and 1 respectively) and 2 linear layers, with ReLU nonlinearities between all layers. The architecture of the feature extractor is displayed in Fig. 14. At each environment step, the policy network outputs a mean and variance vector of a multivariate Gaussian representing the action to take at each location on the tool (add, remove, or leave flow media layers). Our training was purely model-free and based on abstract representations of states and actions; no knowledge of the placement problem was encoded into the policy network.
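The spatial sizes produced by the three convolutional layers can be checked with the usual output-size formula. The kernel sizes and strides below are those given in the text; the channel counts (32, 64, 64) are assumed from the Mnih et al. (2015) architecture the feature extractor is based on, and no padding is assumed.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution (floor division, as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride) from the text; channel counts ASSUMED from Mnih et al. (2015).
LAYERS = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]


def feature_sizes(input_size=64, layers=LAYERS):
    """Return per-layer (channels, height, width) and the flattened feature size."""
    size, shapes = input_size, []
    for kernel, stride, channels in layers:
        size = conv_out(size, kernel, stride)
        shapes.append((channels, size, size))
    c, h, w = shapes[-1]
    return shapes, c * h * w


shapes, flat = feature_sizes()
print(shapes)  # [(32, 15, 15), (64, 6, 6), (64, 4, 4)]
print(flat)    # 1024
```

Under these assumptions, the 64 x 64 input maps are reduced to a 4 x 4 spatial grid before the linear layers, which is comparable in resolution to the 3x4 flow media grid the policy acts on.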

Simulation environment
The environment was made up of a random part mesh generator and a 3D finite element model solving for the flow progression. The material properties used in the simulation were based on a T700 fiber-based woven fabric and RTM6 resin, and are displayed in Table 1. Each step of the environment runs the FEM with the new distribution of flow media until the part is either filled or an iteration limit is reached.
Total training time is about 12 hours on 72 cores of an Intel Xeon 2.0 GHz CPU.

Random part mesh generator
To learn a generalizable agent, we seek to train on realistic examples that could have been drawn from the distribution of real parts of interest. Given the iterative process involved in aerospace design, we restrict our random generator to a single part type, with the aim of training an agent that is able to generalize to modified designs within that part type. The geometry family of interest has complex pad-up regions in the tooling surface, with a flat surface at the top on which flow media is placed. The laminates are all 1125 mm x 525 mm. We create a simple generative model of a skin-like laminate with thick pad-ups similar to those used at interfaces on control surfaces. We sample 10 random panels at the start of training and use the same 10 thereafter. Examples of two random panels are displayed in Fig. 15. Thickness maps of 10 panels sampled during training are available in Appendix A.
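A generator of this kind could look like the hypothetical sketch below, which draws a thin flat skin with a few rectangular, thicker pad-up regions on a 64 x 64 thickness grid. The base thickness, number of pad-ups, pad-up footprint, and thickness range are illustrative assumptions (chosen to stay below the 25.4 mm height map normalization), not the parameters used in this work; only the fixed set of 10 panels reused throughout training follows the text.

```python
import random


def random_padup_panel(rng, nx=64, ny=64, base_mm=2.0, max_padups=3,
                       padup_mm=(5.0, 20.0)):
    """HYPOTHETICAL generator: a thin flat skin with random rectangular pad-ups.

    Returns an ny x nx thickness map in mm. All parameters are illustrative.
    """
    thickness = [[base_mm] * nx for _ in range(ny)]
    for _ in range(rng.randint(1, max_padups)):
        # Random footprint between 1/8 and 1/3 of the panel in each direction.
        w, h = rng.randint(nx // 8, nx // 3), rng.randint(ny // 8, ny // 3)
        x0, y0 = rng.randint(0, nx - w), rng.randint(0, ny - h)
        extra = rng.uniform(*padup_mm)
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                # Overlapping pad-ups keep the thicker of the two values.
                thickness[y][x] = max(thickness[y][x], base_mm + extra)
    return thickness


# Sample a fixed training set once and reuse it for the whole run.
rng = random.Random(42)
training_panels = [random_padup_panel(rng) for _ in range(10)]
```

Seeding the generator makes the fixed training set reproducible, which matters when the same 10 panels must be reused across every episode.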

RL algorithm implementation
We use the open-source Stable Baselines 3 (Raffin et al., 2019) implementation of the PPO algorithm in PyTorch, distributed across 72 processes using the Message Passing Interface (MPI) for Python (Dalcin et al., 2011) to allow efficient communication between processes. The environment is designed according to the OpenAI Gym interface (Brockman et al., 2016) and wrapped in the OpenAI Baselines VecNormalize wrapper, which maintains running estimates of the mean and variance of observations and returns and uses them to normalize observations and rewards during learning.
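The running statistics behind this kind of normalization can be sketched in scalar form as below. This is a simplified re-implementation for illustration, assuming the parallel mean/variance merge (Chan et al.) that we understand Stable Baselines 3's `RunningMeanStd` to use, with a clip of 10 standard deviations mirroring VecNormalize's default observation clipping; it is not the library code.

```python
import math


class RunningMeanStd:
    """Scalar running mean/variance, merged batch-by-batch (parallel algorithm)."""

    def __init__(self, epsilon=1e-4):
        # A tiny initial count avoids division by zero before the first update.
        self.mean, self.var, self.count = 0.0, 1.0, epsilon

    def update(self, batch):
        n = len(batch)
        b_mean = sum(batch) / n
        b_var = sum((x - b_mean) ** 2 for x in batch) / n
        delta = b_mean - self.mean
        total = self.count + n
        self.mean += delta * n / total
        # Combine the sums of squared deviations of the old data and the new batch.
        m2 = self.var * self.count + b_var * n + delta ** 2 * self.count * n / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x, clip=10.0, eps=1e-8):
        """Standardize x with the running statistics and clip to [-clip, clip]."""
        z = (x - self.mean) / math.sqrt(self.var + eps)
        return max(-clip, min(clip, z))
```

Merging batch statistics this way gives the same mean and (population) variance as recomputing them over all data seen so far, without storing past observations.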

Experiments
We based our training hyperparameters on known-good values from Schulman et al. (2017). As the environment is deterministic and two successive actions can maximize or minimize flow media in all locations, the Steps per Episode was reduced to 2 in order to increase the variety experienced by the agent. The following hyperparameters were used in our experiments:
- Learning Rate = 0.0003
- GAE λ = 0.95
- Batch Size = 48
- Minibatch Size = 48
- Steps per Episode = 2
- Episodes = 16000

We train our agent only on 10 randomly generated panels, as described in Section Random part mesh generator. As in Section Reinforcement learning, a panel is randomly selected every episode during training, so that the agent receives experience across a variety of panels. We evaluate the ability of our agent to use the knowledge it has gained during training by asking it to instantly optimize two panels that it has never seen before but that are from the same family (thin laminates with pad-ups of dimension 1125 mm x 525 mm). We use two geometries for this purpose: a randomly generated geometry with two large pad-ups, and a realistic human-designed geometry from the control surface family of interest. We reiterate that the case studies are performed without any additional training, and that the results for these panels are instant.
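The GAE λ = 0.95 listed among the hyperparameters controls how PPO's advantage estimates blend one-step temporal-difference errors with longer returns. A minimal sketch of Generalized Advantage Estimation for a single episode is below; the discount γ = 0.99 is an assumed default, as it is not listed among our hyperparameters, and any critic values passed in are placeholders.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95, done=True):
    """Generalized Advantage Estimation over one episode segment.

    values[t] is the critic's estimate V(s_t); last_value bootstraps the state
    after the final step and is ignored when the episode has terminated.
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0 if done else last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]   # TD residual
        running = delta + gamma * lam * running                # exponentially weighted sum
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]      # critic targets
    return advantages, returns
```

With only 2 steps per episode, λ has limited influence here: each advantage mixes at most two TD residuals. For scale, 16000 episodes of 2 steps each amount to 32000 agent steps, each of which is a full FEM run.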
The two geometries in our case study are displayed in Fig. 16.

Randomly generated test part
Figures 17 and 18 present the pre- and post-optimization states of the Randomly Generated Test Part, respectively. In each case the 'pre-optimization' state represents a uniform distribution of one layer of flow media placed across the laminate, which is the default used industrially for flat laminates but is unlikely to be optimal when applied to complex geometries. Before optimization, only 88% of the part was filled due to trapoff around the thick pad-up areas; after flow media layout optimization, 99.9% of the part is filled.

Realistic part
On the realistic part, we had an aerospace practitioner manually design a flow media distribution, which we refer to hereafter as 'manually optimized'. Compared to this manually optimized placement, our optimization reduced fill time by 32% while ensuring complete fill of the part. The unoptimized, manually optimized, and RL optimized flow media distributions and fill progressions are displayed in Figs. 19, 20 and 21.

Discussion
On the Randomly Generated Test Part, the RL agent successfully applies the knowledge gained during training to a new geometry that it has not seen before. In Fig. 17, we can see that a flow media layout with 1 layer across the top of the preform suffers from racetracking, causing trapoff. The instant optimization produced a flow media layout that eliminated trapoff. Inspecting the flow media layout, we can see that the agent has made two major changes to the default design of 1 layer of flow media across the top of the entire preform: adding a layer of flow media on top of the outer pad-up to encourage fast resin flow to the inner pad-up, and removing all flow media near the outlet edge to slow resin flow to the outlet and allow time for both pad-ups to fill completely (see Figs. 17 and 18). This result agrees with intuition and is especially encouraging, as the agent has shown the ability to transfer knowledge from laminates seen during training and apply it instantly to a new, previously unseen geometry.
Applying the agent instantly to a realistic geometry from the same thin laminate family is the fundamental test of the method. Figs. 19, 20, and 21 show the unoptimized, manually optimized, and RL optimized flow media placements and flow progressions on the Realistic Part. In Fig. 19, we can see that a naive flow media placement again suffers from racetracking, causing trapoff, and leaves significant sections of the part unfilled. The manually optimized placement (Fig. 20) has reduced flow media in the area between the flow runner and the thick area on the opposite edge of the part, retarding flow and avoiding racetracking along the top of the panel before the pad-up areas are filled through their thickness. This eliminates the trapoff; however, the fill time is significant for a part of this size. The RL flow media placement (Fig. 21) uses a similar tactic, reducing the flow to the pad-up regions by completely removing flow media on the edge of the part. There are two main differences between the RL and manual placements: asymmetry, and the number of layers used in less flow-critical sections of the part. By visual inspection, the asymmetry introduced by the RL policy mirrors the asymmetry in the part thickness: while the policy ensures that the resin does not flow too quickly over the center-middle pad-up, it has found that offsetting a permeable path with 2 flow media layers across the top of the panel (the 'L'-shaped highway) allows a faster fill time overall. We also note that the agent uses the same tactic to retain and increase flow media in the right-top and right-bottom corners, where the panel is thin and will easily fill through its thickness.
While not obviously parsimonious, the choice of 0 layers of flow media at the center-top position of the panel is technically a valid design: vacuum is applied at all edges of the panel, so this offset, slow-to-fill area does not cause trapoff and still results in a complete fill. This tailoring of the flow distribution network, increasing flow in less critical areas wherever possible while retaining a focus on retarding flow over the thickest areas, results in both trapoff avoidance and an improvement in fill time, and shows that the agent has captured the dynamics of the process and how they relate to local part thickness and flow media density.
As all optimizations were done by our trained agent 'blind' (without running the process model at all), our case studies demonstrate that the trained agent is able to apply its learned process knowledge to realistic part geometries that were not seen during the training process, and we have therefore achieved our objective of knowledge transfer. On the Realistic Part, the 32% improvement in fill time over a manually designed placement, achieved while maintaining complete fill, is industrially relevant. On both the Realistic Part and the Randomly Generated Test Part, our agent proposed the optimized flow media design in less than a second. In both cases, the agent reached the highest reward configuration in one timestep; however, this is not always the case (see Fig. 13 for an example where the agent takes two steps to arrive at the highest reward configuration).
When evaluated on 100 new randomly sampled panels generated with the training mesh generation parameters, the average entropy of the predicted multinomial action distribution was very close to zero (see Table 3), implying that the agent is very confident in its actions. This is likely because the agent has overfit to the training set, while the test and training sets are similar enough that this overfitting still yields meaningful generalization. Within a constrained scope, however, this kind of overfitting is the goal when learning an optimal placement skill. These results suggest that our approach could accommodate a more challenging and diverse set of geometries than those used in this study. We leave this to future work.
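The entropy measure can be made concrete as follows, assuming independent Remove/Leave/Add choices at each of the 12 locations so that the per-location entropies add. A fully deterministic policy scores 0 nats, while a uniform choice at every location gives 12 ln 3, about 13.2 nats; this upper bound is simple arithmetic, not a reported result.

```python
import math


def categorical_entropy(probs):
    """Entropy in nats of one categorical distribution (0 * log 0 taken as 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def action_entropy(prob_table):
    """Entropy of the joint action distribution over all locations,
    ASSUMING independent per-location choices (entropies then add)."""
    return sum(categorical_entropy(p) for p in prob_table)
```

Comparing a measured average entropy against the 12 ln 3 ceiling gives a scale for how close to deterministic the policy is.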
Comparison to Hsiao et al. (2004) and Sas et al. (2015) is not straightforward due to differences in focus and approach; for example, Hsiao et al. focus on optimization, while Sas et al. focus on robustness. However, both Hsiao et al. and Sas et al. propose direct optimization methods, i.e. they run the process model many times on the geometry of interest in a search procedure to explore the space of possible designs and arrive at a high-performing one. This process must be repeated for every new laminate geometry, and given that the process model takes significant time to run, the number of evaluated designs required to arrive at a good result (the search efficiency) is crucial to the performance of the overall method. Table 2 lists the relative search efficiencies of the methods as reported in the literature.
By contrast, our approach does not run the process model many times on the geometry of interest to explore the space. Instead, we initially evaluate a number of designs on a number of different geometries. This gives our trained agent an intuition about how to design, much like a person would have. Based on that knowledge gained upfront, our agent is able to propose a design without running the process model at all, transferring its learned knowledge to propose successful designs that lead to complete fill and reduced fill time. During training, our method explores less space than a single run of the method of Hsiao et al. or Sas et al., and when applied to similar geometries it does not need to do any exploration at all, effectively achieving close to 100% efficiency on geometries in the family it was trained on.

Limitations
We have demonstrated knowledge transfer within our domain of thin laminates with pad-ups for control surfaces. Within this constraint, the geometry can be varied while maintaining the instant optimization capability. Currently, changes to the flow runner location, gate location, material properties, or flow media grid (something other than 4x3) require retraining the agent. While material properties are unlikely to change during a design phase, a denser flow media grid or a modified flow runner location are common enough needs that the technique should be extended in the future to handle these changes without retraining. While instant design assistance for a single geometry family is industrially relevant, for design assistance of this type to be generally useful the agent would need to adapt to major changes in geometry, such as more complex 3D integrated structures. We have not attempted to train an agent for that level of complexity. Further, evaluating the robustness of training by repeating the process with many random seeds, collecting training statistics across many runs, and tuning hyperparameters would be valuable in building confidence in the method.

Conclusion
In this work, we introduced a novel instant optimization method for resin flow distribution networks based on learned knowledge transfer. We framed the flow media placement problem in terms of reinforcement learning, training an agent using a 3D Finite Element based process model of resin flow in carbon preforms. Trained on 10 panels from a thin laminate geometry family, our agent learns to place flow media in order to avoid resin starvation (trapoff) and reduce total infusion time. Using a case study on a realistic aerospace part with a complex 12-dimensional flow media network, we show that our method reduces fill time by 32% when compared to a manually designed placement, while maintaining the same fill quality. Due to the knowledge the agent has gained during training, it is able to optimize this previously unseen panel instantly (in less than a second). To our knowledge, this is the first example of knowledge transfer in this domain. Natural extensions of this work include studying the application of DRL-based optimization to very high dimensional liquid molding design spaces, such as arbitrary flow runner pipe paths on complex 3D parts.

Fig. 22 A random sample of 10 pad-up panels used in training