Towards a Machine Learning Pipeline in Reduced Order Modelling for Inverse Problems: Neural Networks for Boundary Parametrization, Dimensionality Reduction and Solution Manifold Approximation

In this work, we propose a model order reduction framework to deal with inverse problems in a non-intrusive setting. Inverse problems, especially in a partial differential equation context, require a huge computational load due to the iterative optimization process. To accelerate such a procedure, we apply a numerical pipeline that involves artificial neural networks to parametrize the boundary conditions of the problem in hand, compress the dimensionality of the (full-order) snapshots, and approximate the parametric solution manifold. It derives a general framework capable to provide an ad-hoc parametrization of the inlet boundary and quickly converges to the optimal solution thanks to model order reduction. We present in this contribution the results obtained by applying such methods to two different CFD test cases.


Introduction
Inverse problems is a wide family of problems that crosses many different sciences and engineering fields.Inverse problems aim to compute from the given observations the cause that has produced them, as also explained in [3,32].Formally, we can consider an input I and an output O, and suppose that there exists a map M : i → o that models a mathematical or physical law.The computation of the output as o = M(i) is called direct problem, whereas the problem of finding the input given the output is called inverse problem.
Given a certain output o t , the inverse problem consists of inverting the map M and finding the input i t which produces the output o t , i.e., which satisfies M(i t ) = o t .Inverse problems may be of interest for a lot of mathematical fields, from heat transfer problems to fluid dynamics.The case study which is here analysed is a Navier-Stokes flow in a circular cylinder, and the aim is to find the proper boundary condition in order to obtain the expected distribution within the domain.Such an application tries to solve a typical naval problem, numerically looking for the inlet setting which provides the right working condition during the optimization of the propulsion system.Propeller optimization is indeed very sensitive to modifications in the velocity distribution at the propeller plane: to obtain an optimized artifact it becomes very important to set up the numeric simulation such that the velocity distribution is as close as possible to the target distribution, usually collected by experimental tests.The problem is the identification of the inlet boundary condition, given the velocity distribution at the so-called wake, which is the plane (or a portion of it) orthogonal to the cylinder axis where the propeller operates.To achieve that, the inlet distribution is searched by parametrizing the target wake -by exploiting a neural network, as we will describe in the next paragraphs -and optimizing these parameters such that in the simulation the velocity is close to the target wake.It must be said that to produce meaningful results, we assume here the flow has a main direction that is perpendicular to the inlet and wake planes: in such a way, the distributions at these planes are similar to each other, allowing us to search for the optimal inlet among the parametrized wake distributions.Even if in this case the numerical experiments are conducted in a Computational Fluid Dynamics (CFD) setting, the methodology is in principle easily transferable to different contexts.Typically, such an operation is performed within an optimization procedure, in which the direct problem is iteratively solved by varying the input until the desired output is reached.This, of course, implies the necessity to numerically parametrize the input in a proper way, possibly allowing a large variety of admissible inputs and at the same time a limited number of parameters.Moreover, the necessity to solve the direct problem for many different instances makes the entire process computationally expensive, especially dealing with the numerical solution of Partial Differential Equations (PDEs).A possible solution to overcome such computational burden is offered by the Reduced Order Modelling (ROM) techniques.
ROM constitutes a constantly growing approach for model simplification, allowing for a realtime approximation of the numerical solution of the problem at hand.Among the methods already introduced in the literature, the Proper Orthogonal Decomposition (POD) has become in recent developments an important tool for dealing with PDEs, especially in parametric settings [37,29,13,35].Such a framework aims to efficiently combine the numerical solutions for different configurations of the problem, typically pre-computed using consolidated methods -e.g.finite volume, finite element -such that at any model inference all this information is combined for providing a fast approximation.Within iterative and many-query processes, like inverse problems, this methodology allows a huge computational gain.The many repetitions of the direct problem, needed to find the target input, can be performed at the reduced level, requiring then a finite and fixed set of numerical solutions only for building the ROM.The coupling between ROM and the inverse problem has been already investigated in literature for heat flux estimation in a data assimilation context [25], in aerodynamic application [2], in haemodynamic problems [19].An alternative way to efficiently deal with this kind of problem has been explored in a Bayesian framework [22].Moreover, among all the contributions in literature we cite [40,16] as an example of inverse problem with pointwise observations and inverse problem in a boundary element method context, respectively.This contribution introduces an entire and novel machine learning pipeline to deal with the inverse problems in a ROM setting.In specific, we combine three different uses of Artificial Neural Network (ANN), that are: i) parametrization of the boundary condition given a certain target distribution or pointwise observations, ii) dimensionality compression of the discrete space of the original -the so-called full-order -model and iii) approximation of the parametric solution manifold.It derives a data-driven pipeline (graphically represented in Figure 1) able to provide a parametrization of the original problem, which is successively exploited for the optimization in the reduced space.Finally, the optimization is carried out by involving a Genetic Algorithm (GA), but in principle can be substituted by any other optimization algorithm.
The contribution presents in Section 2 an algorithmic overview of the employed methods, whereas Section 3 illustrates the results of the numerical investigation pursued to the abovementioned test case.In particular, we present details for all the intermediate outcomes, comparing them to the results obtained by employing state-of-the-art techniques.Finally, Section 4 is dedicated to summarizing the entire content of the contribution, drawing future perspectives and highlighting the criticisms highlighted during the development of this contribution.

Methodology
We dedicate this section to providing an algorithmic overview of the numerical tools composing the computational pipeline.

Boundary parametrization using ANN
Neural networks are a class of regression techniques and the general category of Feed-forward neural networks has been the subject of considerable research in several fields in recent years.The capability of ANN to approximate any function [15] and the even greater computational availability allowed indeed the massive employment of such an approach to overcome many limitations.In the Figure 1: Flow diagram for the data-driven pipeline followed in the paper.field of PDEs, we cite [30,4,12,20,28,23,41,27] as some of main impacting frameworks recently explored.
A not yet investigated usage of ANN, to the best of authors' knowledge, is instead the parametrization of a given (scattered) function.We suppose that we have a target distribution v target = (v target i ) P i=1 , corresponding to the wake velocity distribution in our case, which has P degrees of freedom.We want to reproduce this distribution by exploiting a neural network technique.
A neural network is defined as a concatenation of an input layer, multiple hidden layers, and a final layer of output neurons.An output of the i-th neuron in the l-th layer of the network is generally defined as: where σ is the activation function (which provides non-linearity), b l i the bias and W refers to the so-called weights of the synapses of the network, N l is the number of neurons of the l-th hidden layer, H is the number of layers of the network.
The bias and the weights are the hyperparameters of the network, that are tuned during the training procedure to converge to the optimal values for the approximation in hand.We can think of these hyperparameters as the degree of freedom of our approximation, allowing us to manipulate such distribution by perturbating them.We define the optimal hyperparameters (computed with a generic optimization procedure [36,33]) as b * and W * ; the network exploiting these hyperparameters reproduces as output an approximation of our target wake distribution: The input x to the whole neural network in this paper corresponds to the polar coordinates of the points of the wake, so we have a two-dimensional input.We can then rearrange Eq. 1 to express the parametrized output of a single hidden layer as follows: where µ l is the vector of parameters in layer l, which applies only to the bias of the layers.We finally obtain the parametrized output N (x, µ).
The main advantage of this approach is the capability to parametrize any initial distribution, since the weights are initially learnt during the training and then manipulated to obtain its parametric version.On the other hand, the dimension of the weights is typically very large, especially in networks with more than one layer.In networks composed of a large number of layers, a high number of hyperparameters should be tuned.A possible solution to overcome such an issue could be to manipulate just a subset of all the hyperparameters.Indeed, in this paper, only the bias parameters of two hidden layers are perturbed.
Such a posteriori parametrization of any generic distribution is employed in this work to deal with the inverse problem.The main idea is to compute different inlet velocity distributions corresponding to different weights of the ANN.The weights perturbations are used as parameters to build the reduced order model, providing an ad-hoc parametrization of the boundary condition based on the target output.It must be said that such parametrization may lead to good results only if some correlation between the boundaries and the target output exists.

Model Order Reduction
ROMs are a class of techniques aimed to reduce the computational complexity of mathematical models.
The technique used in this paper is data-driven or non-intrusive ROM, which allows us to build a reduced model without requiring the knowledge of the governing equations of the observed phenomenon.For this reason, this technique is suitable to deal with experimental data and it has been widely used in numerical simulations for industrial, naval, biomedical, and environmental applications [38,5,39,10,8,42,11].
Non-intrusive ROMs are based on an offline-online procedure.Each stage of the procedure can be approached in different ways, which are analyzed in the following Sections.All the techniques presented in the next lines have been implemented in the Python package called EZyRB [7].

Dimensionality reduction
In the dimensionality reduction step, a given set of high-dimensional snapshots is represented by a few reduced coordinates, in order to fight the curse of dimensionality and make the pipeline more efficient.
We consider the following matrix of M snapshots, collecting our data: where v wake i ∈ R P , i = 1, . . ., M are the velocity distributions in our case, each one corresponding to a different set of parameters µ i ∈ R p for i = 1, . . ., M .The goal is to calculate the reduced snapshots {v wake where Ψ : R P → R L .We specify that the latent dimension L P has to be selected a priori.Such phase can be approached by making use of linear or non-linear techniques, such as the POD and the usage of an Autoencoder (AE), respectively.
Proper Orthogonal Decomposition In the first case, the offline part consists of the computation of a reduced basis, composed of a reduced number of vectors named modes.The main assumption on which it is based is that each snapshot can be approximated by a linear combination of modes: with ϕ i ∈ R P are the modes and a k i are the related modal coefficients.The computation of modes in the POD procedure can be done in different ways, such as via Singular Value Decomposition (SVD) or the velocity correlation matrix.In the first case, the matrix Y can be decomposed via singular value decomposition in the following way: where the matrix U ∈ R P ×L stores the POD modes in its columns and matrix Σ ∈ R L×L is a diagonal matrix including the singular values in descending order.The matrix of modal coefficients -so, the reduced coordinates -in this case can be computed by: where the Ŷ ∈ R L×M columns are the reduced snapshots.In the second case, the POD space i=1 } is found solving the following minimization problem: where Y n is the n-th column of the matrix Y.This problem is equivalent to computing the eigenvectors and the eigenvalues of the velocity correlation matrix: where Ω is the domain on which the velocity is defined (the propeller wake plane in this case).The POD modes are expressed as: where V y stores the eigenvectors of the correlation matrix in its columns and λ y i are its eigenvalues.
Autoencoder AE refers to a family of neural networks that, for its architectural features, has become a mathematical tool for dimensionality reduction [20].In general, an AE is a neural network that is composed of two main parts: • the encoder: a set of layers that takes as input the high-dimensionality vector(s) and returns the reduced vector(s).
• the decoder, on the other hand, computes the opposite operation, returning a high-dimensional vector by passing as input the low-dimensional one.
The layers composing the AE could be in principle of any type -e.g.convolutional [34], dense -but in this work both the encoder and the decoder are feed-forward neural networks.For sake of simplicity, we assume here that both the encoder and the decoder are built with only one hidden layer and we denote by D the decoder and with E the encoder.If v wake is the input of the AE, we denote by ṽwake = (D • E)(v wake ) = AE(v wake ) the output of the encoder, where formally: A generic structure of an autoencoder is schematized in Figure 2. Weights and activation functions can be different for encoder and decoder, and of course the case with more than one hidden layer for the encoder and the decoder can be easily derived from this formulation.The AE is trained by minimizing the following loss function: where v wake is the original (high-dimensional) vector to reduce, vwake represents the reduced coordinates and ṽwake is the predicted vector.In this way, the network weights are optimized such that the entire AE produces the approximation of the original vector, but compressing it onto an a priori reduced dimension.The learning step aims so to discover the latent dimension of the problem at hand.
For what concerns the test cases here considered, two different types of autoencoders are taken into account: (i) a linear autoencoder, i.e. without an activation function between the hidden layers, with a single hidden layer composed of a number of neurons equal to the reduced dimension: it should exactly reproduce the behavior of the POD; (ii) a non-linear autoencoder, i.e. with an activation function, and with multiple hidden layers, whose performance is compared to that of the POD.

Approximation techniques
The problem in the online part is to predict the (unknown) latent dimension ṽwake given a new parameter µ: where π : R p → R L is the mapping from parameter space to reduced space.We can approximate such mapping by means of interpolation techniques, such as Radial Basis Functions (RBF), or regression techniques, such as ANN, in order to predict the latent dimension for any new parameter.Finally, the approximation of the solution in the original (high-dimensional) space requires the expansion of the reduced coordinates, which relies on the inverse compression method computed during the dimensionality reduction.
Remark 1 Many approximation techniques can be employed to reconstruct the solution in reduced order models.For instance, two spread techniques are the RBF and the Gaussian Process Regression (GPR).However, we remark that many other approximants can be adopted, such as the Moving Least Squares approach (MLS), which is described in detail in [21,18].The reason for choosing the RBF approach is that it allows us to tune different parameters, such as the radius, and the kernel of the radial basis functions, in order to adapt our approximation to different settings of the training dataset.

Radial Basis Functions
The RBF is an approximation technique that reconstructs the original field in the following way: where (φ( µ − µ i )) M i=1 are called radial basis functions, each one associated with a different center µ i and weighted with a weight ω i .The radial functions can have different expressions, in our case we consider the multiquadric functions φ(r) = 1 + (εr) 2 , where r = µ − µ i .

Artificial Neural Networks
The other technique here investigated to approximate the parametric solution manifold is ANN.The basic structure of the method is already explained in Section 2.1, We consider a neural network composed of a unique hidden layer.Its structure is: The weight matrix and bias of the ANN are found by training the neural network with the set of parameters and snapshots (µ i , v wake i ) M i=1 .Then, the approximated solution ṽwake is computed from the related vector of parameters µ as ṽwake = ANN(µ).
The reduced order techniques presented in this Section both for dimensionality reduction and approximation are applied in this paper to two different inverse problems.In particular, the following cases are considered:

Wake optimization
The construction of the reduced order model is followed by the research of the vector of parameters which better reconstructs the velocity distribution we want to reproduce.This investigation is addressed by solving an optimization problem, in which the aim is to minimize the difference between the approximated wake distribution predicted by the ROM and the real wake distribution.The optimization problem can be addressed either by using a search-based, such as the genetic algorithm (GA), initially proposed in [14], or a gradient-based algorithm.In subsections 3.2.3 and 3.3.2we compare the results obtained employing both approaches.
Remark 2 However, it is worth remarking that the genetic algorithm allows us to reach the global theoretical minimum without getting stuck into a local minimum.Gradient-based methods, instead, require derivable objective functions and get trapped in local minima in non-convex optimization.The genetic algorithm requires a high number of evaluations, which is not a real issue since the employment of the reduced model, but it is able to converge to the global minimum.In a data-driven ROM framework, as the one proposed in this manuscript, the solution manifold is approximated with regression techniques, without any warranties on convexity.Thus, genetic methods offer a robust approach in this context, as demonstrated by its employment in similar frameworks [5,6].
We dedicated this section to providing a basic introduction to the genetic method, retaining a full discussion out of the topic of the present work.For a deeper focus on genetic optimization, we refer the reader to the original contribution.The first step of the algorithm is the definition of a population composed of N pop individuals, in our case vectors of parameters (µ j ) Npop j=1 , composed of p genes, µ j ∈ R p .Then, we proceed by defining the objective function which should be minimized.We indicate by ṽwake j ≡ ṽwake (µ j ) the approximation of the wake distribution computed with the reduced order model that we are taking into account.We call v target (µ) the real wake distribution.
The objective function defined for each individual in the population is: The GA consists in an iterative process composed of three main steps: selection, in which the best individuals are chosen; mate or crossover, where the genes of the best individuals are combined according to a certain mate probability; mutation, changing some of the genes of the individuals.This process is iterated a number of times which is named number of generations.Regarding the technical side, the GA has been performed using the open-source package DEAP [9].

Numerical results
In the present Section, the methods presented in Section 2 are applied to the test case of the flow in a circular cylinder.

The inverse problem
The computational domain is a circular cylinder with height 4.67 m, diameter 2.36 m and it is schematized in Figure 4, where the inlet is indicated as Γ i , the outlet as Γ o and the lateral surface of the cylinder as Γ side .The aim is to reconstruct the inlet velocity distribution given the velocity distribution at the so-called wake, which is a plane placed at a distance of 2.97 meters from the inlet plane, as showed in 4.This type of problem is known as inverse problem, which is in this case applied in a CFD setting.The physical problem at hand is modelled by the Navier-Stokes + initial conditions in (Ω, 0) .
In the computation of the full order solutions of the NSE in (5), the finite volume discretization is employed, by means of the open-source software OpenFOAM [1].The finite volume method [26] is a mathematical technique that converts the partial differential equations (the NSE in our case) defined on differential volumes in algebraic equations defined on finite volumes.The computational mesh considered in this test case has been generated by blockMesh and snappyHexMesh and it is composed by 1 × 10 6 cells.The mesh is a regular radial mesh, which presents 100 refinements both in the radial, the tangential and the axial direction, as can be seen from Figure 3.
The turbulence treatment at the full order level is characterized making use of the RANS (Reynolds Averaged Navier-Stokes) approach.This approach is based on the Reynolds decomposition, which was proposed for the first time by Osborne Reynolds [31], in which each flow field is expressed as the sum of its mean and its fluctuating part.The RANS equations are obtained by taking the time average of the NSE and adding a closure model for the well-known Reynolds stress tensor.The closure model considered in the full order model in OpenFOAM is the κ − ω model, proposed in its standard form in [17], and in the SST form in [24].This model is based on the Boussinesq hypothesis, which models the Reynolds stress tensor as proportional to the so-called eddy viscosity and it is based on the resolution of two additional transport equation for the kinetic energy κ and for the specific turbulent dissipation rate ω.In the full order model, we consider the SST κ − ω model.The CFD simulation is run making use of the PIMPLE algorithm until convergence to a steady state.
Figure 4: The computational domain.
In this paper, the inverse problem presented is applied for two different wake distributions.The first one is a smooth function f : W → R defined in all the wake plane W, written as a function of the radial coordinate: The second distribution is given as a set of pointwise observations only in a selected region of the wake plane.For the sake of simplicity only the distribution along the axis of the cylinder, which is the main direction of the flow, is taken into account.The contributions along the secondary directions are ignored.Thus, we denote with wake distribution or velocity distribution only the component of the velocity along the main direction of the flow.The observations are computed using the function f defined as where x and y are the cartesian coordinates in the wake plane.In total we collect 36 observations by sampling with equispaced cartesian grid the domain [−0.5, 0.5] 2 in the wake plane.

First test case: a smooth distribution
In this Section of the results, the problem considered is the reconstruction of the inlet velocity distribution when the wake velocity has the smooth distribution in (6).

Parametrization using ANN
The first step in the resolution of the inverse problem is the parametrization of the real velocity distribution at the wake through a fully-connected ANN.The ANN is represented in Figure 5 and it is composed by 3 hidden layers, with 10, 5 and 3 neurons, respectively.The inputs are the polar coordinates of the wake points and the output is the wake velocity distribution; the degree of freedom of the distribution is P = 10001, which corresponds to the number of points at the inlet.As explained in Section 2.1, the parameters coincide with a subset of weights of the ANN.In particular, our choice is to consider 4 parameters, given by the biases of the last two layers.
The main idea is to generate M different distributions from the ANN by randomly perturbating the weights considered as parameters.Those distributions are set as inlet distributions in the full order model.The perturbated parameters and the wake distributions obtained from the full order computation are considered as parameters and snapshots, respectively, for the formulation of the ROM.
In Figure 6, the following distributions are represented from the left to the right: the distribution predicted from the ANN considering the same parameters as in the training stage; the real target wake distribution; two among the M distributions obtained by perturbating the parameters.

Model order reduction
The distributions obtained by the parametrization described in Section 3.2.1 are then used as inlet initial conditions to run a set of M offline simulations, following the setting described in Section 3.1.The perturbed parameters of the ANN and the wake distributions found from high-fidelity simulations are then used to perform a model reduction.In particular, in this Section different types of techniques for the reduction and approximation stages of the ROM are evaluated and compared and the models considered are here listed: 1. POD-RBF; 2. POD-ANN;
The details related to the structure of the neural networks and the hyperparameters used for training, i.e. the learning rate, the stopping criteria, the number of hydden layers and of neurons for each layer, are defined in the discussion Section 3.4.
First of all, we consider a comparison regarding the dimensionality reduction step.The reduced dimension considered in this test case is 3 and we consider a linear AE, i.e. without any activation function, composed by a single hidden layer of 3 neurons for both the encoder and the decoder parts.The idea is indeed to reproduce the behaviour of the POD.A preliminary study is the graphical comparison between the POD and the AE modes.In particular, the AE modes are computed by taking as input to the decoder part of the network the identity matrix of size 3. Since the AE does not include any non linear computation, the operation is linear and ideally identical to the computation of the first 3 POD modes.From Figure 7, the shapes of the first 3 POD and AE modes appear similar.In Figure 8 we provide a representation of the error in the L 2 norm for: (a) a fixed test dataset; (b) the training dataset used to build each ROM.In this sensitivity analysis, the test dataset is fixed and composed by 10 snapshots and the number of snapshots used to train the reduced models varies between 10 and 90.
Considering a dataset composed by n element, we first define the relative error for each i-th element, say: where ṽwake i is the velocity distribution predicted by the reduced model starting from the test parameter, and v wake i is the full-order snapshot in the test dataset.Then, the mean over all the elements of the test dataset is computed: ( n i=1 ε i )/n.This definition applies to both the test and the training datasets and it is used to compute the errors represented in Figure 8.The neural networks appearing in ROMs are trained 3 times and Figure 8 provides the mean test and train errors among the 3 different runs, to ensure a more reliable analysis of the ROMs accuracy.
As can be seen from Figure 8, the results obtained for the test error of POD-RBF and AE-RBF are similar to each other and, as intuitively expected, the general trend is decreasing as the number of training snapshots increases.The reduced methods which include a regression technique for the approximation, i.e. the ANN, produce better results in terms of test error with respect to the interpolating technique.In particular, the regression technique outperforms the classical RBF if we consider a small number of training snapshots; however, the results obtained by the two approximating techniques are similar when a larger training dataset is used to train the ROM.

Remark 3
The behavior of the neural networks (both the AE used for reduction and the ANN used as approximant) strongly depends on the hyperparameters of the networks.This instability occurs also in the case of nonlinear AE.Indeed, in this case the training error outperforms the results obtained with the POD, but the test error remains close to the one obtained with a standard POD approach, leading to the overfitting phenomenon.A wider discussion on the stability issues related to neural networks will be undertaken in Section 3.4.

Wake optimization
In this Section, two different optimization methods are exploited to find the combination of parameters that provides the highest accuracy in the reconstruction of the original wake.In particular, the results obtained with a genetic algorithm are compared to those obtained with a gradientbased method.Both optimization processes exploit the ROM potentiality to predict the velocity distribution corresponding to each evaluation point in the parameter space.The reduced model taken into account is POD-ANN, which provided the highest precision in the sensitivity analysis in 3.2.2 when the number of snapshots used for training is 90.
The GA, composed by the selection, crossover and mutation steps, is evolved for 20 generations, considering an initial population of 200 individuals, where each individual is represented by a different combination of the four parameters used for the wake parametrization in Section 3.2.1.All the attributes of each individual are in the interval [−0.5, 0.5].The fitness of each individual of the population, i.e. the objective function minimized in the GA, is the relative error of the wake obtained from the ROM with respect to the reference wake we want to reproduce.We specify here more details about the implementation of the genetic evolution: • in the mate stage, a blend crossover is performed, that modifies in-place the input individuals; • in the mutation step, a Gaussian mutation of mean µ = 0 and standard deviation σ = 1 is applied to the input individual.The independent probability for each attribute to be mutated is set to 0.5; • the evolutionary algorithm employed in the genetic process is the (µ + λ) algorithm, which selects the µ = 50 best individuals for the next generation and produces λ = 100 children at each generation.Moreover, the probabilities that an offspring is produced by crossover and by mutation are set to 0.4 and 0.6, respectively.
On the other hand, for the gradient optimization we considered the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm.The starting point chosen in the function evaluation is [0.5, −0.5, −0.5, 0.5].The algorithm evaluated the derivative of the objective function via forward differences, setting the absolute step size to 1 × 10 −17 .
The optimal individual obtained from the genetic optimization is −0.157 −0.179 −0.157 0.0851 , whereas the optimal individual predicted by the gradient-based algorithm is 0.00611 −0.117 −0.0095 0.0143 .The final value of the objective function for the optimal individual is ∼ 3.89 % for the genetic algorithm, and ∼ 3.94 % for the gradient algorithm.The graphical representations of the optimal and of the real distributions are displayed in Figure 9.
Remark 4 Although the two methods reach similar precision, the genetic algorithm is preferred since the gradient method is significantly influenced by the choice of the initial guess and by the Figure 9: First test case: graphical representations of the exact benchmark wake (left), the optimal wakes obtained with a genetic (center) and gradient (right) optimization.
step size used to evaluate the approximated derivative.We will show the numerical evidence for this argumentation in Section 3.4.

Second test case: a set of pointwise observations
In practical applications, the available data is typically provided not in the form of a smooth distribution, but in a small number of pointwise observations, which are usually localized on a reduced part of the computational domain.In this section, the known data at the wake is composed of a set of 36 measurements placed inside a square of side 1 m centered on the wake plane.The reference distribution considered is expressed in (7).
The logical passages followed in Section 3.2 are reproduced in the present section with a different set of data.

Parametrization using ANN
The ANN used to parametrize the wake in this test case is composed of 3 hidden layers, the first one with 8 neurons and the other two with 2 neurons.The number of parameters is 6 in this application and coincides with the perturbations of the weight matrix and the bias of the last hidden layer.The ANN is trained for 50000 epochs with a learning rate of 3 × 10 −3 .Since in this numerical experiment there is no radial symmetry (contrary to the previous test case) we add an additional contribution to the loss term to force the continuity between 0 and 2π angles.Formally, the loss to minimize during the train is L(ṽ wake , v target ) = M SE(ṽ wake , v target ) + λ ANN(r i , 0) − ANN(r i , 2π) , where M SE(•, •) is the mean square error, λ is a weight of the additional contribution (here λ = 1) and r i for i = 1, . . ., 100 are the equispaced sample points we spanned along the radius at the angles 0 and 2π. Figure 10 shows indeed a continuous surface that keeps such a feature also after the perturbation of the weights.As already done in Section 3.2.1,M sets of parameters are generated as random perturbations in the interval [−1, 1], which will be used as parameters of the ROM.The ANN perturbed with the generated parameters is used to obtain M new velocity distributions, which are set as inlet distributions for the full-order model.Then, from the full-order simulations, the wake distributions corresponding to each set of parameters are obtained and considered as snapshots for the ROM.

Model Order Reduction and Optimization
From the sensitivity analysis in Section 3.2.2 the combination of the AE and the ANN proved to have the highest precision in the reconstruction of the wake.For this reason, the AE-ANN is the ROM which is chosen in this test case to train the optimization algorithm and to provide the best (approximated) wake.The GA, composed by the selection, mate, and mutation, is here iterated for 20 generations.The number of individuals considered is 200 for the first generation and 100 for all the other generations.The fitness is the relative error between the real wake we want to reproduce and the wake reconstructed by the ROM, where only the points in data are considered.In this experiment, the fitness decreases as the population evolves, until a percentage error of 49% is reached.Also in this case, the BFGS technique is not able to reach that fitness and gets blocked instead in some local minima at 61% of error with respect to the target wake.
Figure 12 shows the graphical representation of the optimum wake obtained from the GA.From a graphical point of view, the optimization is able to capture the trend exhibited by the pointwise measurements.However, the error remains high after the optimization since the data in this test case is not given by a target smooth distribution, but by a few amount of points.Another possible issue is that the points are given only in a limited region of the wake, leading to a more difficult reconstruction of the distribution at the inlet, especially in the external region.

Discussion
This Section is dedicated to a wider discussion of the numerical results obtained in Sections 3.2 and 3.3.
Table 1 provides a summary of the results obtained for the two test cases.The first thing that emerges from the table is that none of the methods performs better than the others for each different value of the training snapshots.In general, for the first test case, the methods that provide the highest accuracy on the test snapshots are the POD-ANN and the linear AE-ANN, as can be seen also from Figure 8.On the other hand, in the second test case, the reduced order methods employing the nonlinear autoencoder provide the best results in most of the M values, as can be seen in Figure 11.For what concerns the training error, from both Figures 8 and 11 we can deduce that the introduction of non-linearity in the reduction stage leads to a better reconstruction of the training snapshots, especially when M is small, i.e. with a limited amount of training data.Therefore, especially in the first test case, the autoencoder reaches a high accuracy on the train snapshots, but it is not able to generalize the information learned during training, leading to high errors on the test database.Indeed, the worst results in the first case are obtained with the nonlinear AE and are highlighted in red in Table 2.This behavior describes the overfitting phenomenon, which is here mitigated by the addition of a regularization term in the loss function, say: L(ṽ wake , v target ) = M SE(ṽ wake , v where W is the L 2 norm of the weight matrix, i.e. the sum over all the squared weight values, and α is named weight decay.
In addition, we provide here different arguments to explain the stability issues concerning the neural networks' performance: • The networks are highly influenced by the choice of the hyper-parameters, i.e. the learning rate, the structure of the hidden layers, the stopping criteria, and the weight decay.In order to have a good result for each value of M , the value of the hyperparameters should be tuned ad-hoc considering each case individually.In our analysis, instead, we consider the parameters fixed for each method for a fair comparison.These values are reported in Appendix 4.
• The different performance of the autoencoder in the two test cases depends on the input data.We obtained better results in the second test case, where the POD has poor accuracy and is not able to capture the system's behavior.To support this argumentation, we provide the results obtained in the first test case when L = 1.From Figure 13 we can notice that the autoencoder improves the results obtained with the POD when M is low.
An additional point that can be discussed is the type of optimization algorithm used.As pointed out in Section 3.2.3, the genetic algorithm is preferred to a gradient-based algorithm, because the gradient method can get stuck into local minima.The evidence of this statement is provided in Table 4, where we show the results obtained with the BFGS method starting from different initial  guesses.In the first three cases analyzed, the method converges to three different local minima; in the last two cases, it gets stuck in the first evaluation point.

Conclusions
We presented in this paper a data-driven computational pipeline that allows us to efficiently face inverse problems by means of the model order reduction technique.Neural networks are involved in the such framework at three different levels: • the parametrization of a given (generic) wake distribution by perturbating a subset of the weights of a fully-connected neural network.The parameters are exploited to obtain the inlet distributions of the full-order simulations.The wake distributions obtained from the fullorder simulations are considered snapshots associated with the input parameters, obtaining a parametrization of the original problem.
• the compression of the dimensionality of the full-order snapshots, which typically belong to high-dimensional spaces, obtained with a properly structured autoencoder; • the approximation of the solution manifold, allowing for predicting new solutions for any unseen parameters.This last task is obtained with a second fully-connected neural network.
The results obtained by both the test cases in this work are promising.The neural network parametrization shows the possibility to produce several different distributions playing with very few parameters (4 and 6), becoming a valuable tool also for such an objective.Of course, the benefits of such parametrization are strictly related to the problem at hand, since here we are assuming that the target distribution is somehow related to the boundary condition.The investigation of the parametric configuration shows also a large range of admissible distributions.However, we highlight there is no possibility to precisely select the region of interest for the parametrization and the weights of the neural network which have been perturbated have no physical significance linked to the wake velocity distribution.Dimensionality compression and manifold approximation are already investigated uses of neural networks, and also in this contribution, the results confirm their better performances with respect to the more consolidated techniques.Possible future developments of the presented pipeline may interest the parametrization through neural networks: at the current stage, it results in a sort of black-box procedure, since the parameters and their ranges are selected following a trial and error process.Such an issue can be

Figure 3 :
Figure 3: Representation of the mesh on a slice orthogonal to the cylinder axis (left), and on the walls (right).

Figure 5 :
Figure 5: Structure of the ANN used for parametrization.

Figure 6 :
Figure 6: First test case: parametrization of the given target sinusoidal function using a ANN: from left to right are sketched the distribution after the training phase, the original distribution, and two perturbed configurations.
(a) POD modes (b) AE modes without activation function

Figure 7 :
Figure 7: The POD and AE modes.Modes in (b) are obtained considering a linear autoencoder.

Figure 8 :
Figure 8: First test case: sensitivity analysis for all the ROMs with the reduced dimension L = 3. Error is shown for test (left) and training (right) datasets.

Figure 10 :
Figure 10: Second test case: parametrization of the given target using a ANN: from left to right are sketched the distribution after the training phase, the original observations, and two perturbed configurations.

Figure 11 :
Figure 11: Second test case: sensitivity analysis for all the ROMs with the reduced dimension L = 4. Error is shown for the testing (left) and training (right) datasets.

Figure 12 :
Figure 12: Second test case: the target output at wake plane (left) and the optimum representation, obtained with a genetic algorithm (right).

Figure 13 :
Figure 13: First test case: sensitivity analysis for all the ROMs with the reduced dimension L = 1.Error is displayed for test (left) and training (right) datasets.

Table 1 :
Summary of resultsFirst test case Second test case

Table 2 :
First test case: stability analysis on the percentage test errors of the ROMs when L = 3.

Table 4 :
First test case: results of gradient optimization for different initial guesses.

Table 5 :
Second test case: results of gradient optimization for different initial guesses.