A Time-Accurate Inflow Coupling for Zonal LES

Generating turbulent inflow data is a challenging task in zonal Large Eddy Simulation (zLES) and often relies on predefined DNS data to generate synthetic turbulence with the correct statistics. The more accurate, but more involved alternative is to use instantaneous data from a precursor simulation. Using instantaneous data as an inflow condition allows to conduct high fidelity simulations of subdomains of e.g. an aircraft including all non-stationary or rare events. In this paper we introduce a tool-chain that is capable of interchanging highly resolved spatial and temporal data between flow solvers with different discretization schemes. To accomplish this, we use interpolation algorithms suitable for scattered data in order to interpolate spatially. In time we use one-dimensional interpolation schemes for each degree of freedom. The results show that we can get stable simulations that map all flow features from the source data into a new target domain. Thus, the coupling is capable of mapping arbitrary data distributions and formats into a new domain while also recovering and conserving turbulent structures and scales. The necessary time and space resolution requirements can be defined knowing the resolution requirements of the used numerical scheme in the target domain.


Introduction
In modern Computational Fluid Dynamics (CFD) research Large Eddy Simulation (LES) is becoming more popular due to increased computation performance [1]. However, many practical applications still are out of range for a detailed investigation using this technique. Therefore, many simulations run today are based on so called zonal approaches that often depend on predefined DNS data to generate synthetic turbulence with the correct statistics. By zonal we mean that only a small subset of a domain is simulated with the LES method in order to decrease computational cost, while the majority of the domain is for example solved by a much cheaper Reynolds-Averaged Navier-Stokes (RANS) method, or the subset is equipped with suitable boundary conditions, in particular scale-resolving inflow data. This can be of special interest in the simulation of turbulent boundary layers, where we do not want to simulate the initial transition process, but are just interested in the fully developed boundary layer as a starting point. To achieve this, there have been developed many approaches, such as the synthetic eddy method [2,3] or the recycling-rescaling approach [4,5] which allow for significantly smaller domains. A practical example is the simulation of acoustic noise at the trailing edge of an airfoil where a detailed simulation of only a part of the airfoil is needed [6]. One similarity of the applications just described is their dependency on boundary layer properties and therefore reference data from literature.
Another slightly different example is the investigation of the interaction of an incoming turbulent wake with the boundary layer. For example this can be applied to the interaction of a turbulent wing wake with the horizontal stabilizer of an aircraft (cf. Fig. 1). The described scenario poses a challenge, since fully scale resolving codes often can not afford to compute the whole aircraft and codes that are capable of running a simulation of a whole aircraft are often not able to run high fidelity simulation of parts of it. Therefore, there is the need to map the results in a time-accurate manner from one simulation to an inflow/initial condition on a detailed simulation with a smaller domain and to impose them as inflow conditions. This approach allows for refined simulation of areas of special interest. Also such a coupling between these codes enables a way to further investigate the interaction between turbulence of different physical scales very efficiently. Therefore, zonal simulations of high Reynolds number flow become feasible.
When trying to couple different numerical codes we encounter several problems on how to approach this: 1. Is a two-way coupling necessary or is one-way sufficient? 2. Do we couple the codes during runtime? 3. Are the underlying numerics compatible?
The first question is -in context of the scenario described above -easy to answer. Since we only are interested in the effects of an incoming flow to the target domain, a one-way coupling is sufficient. We note that, depending on the equation system, a one-way coupling via a prescribed Dirichlet boundary is prone to errors, since information transport is limited to one direction. However, we can justify this simplification by assuming that we e.g. extract the flow in a wake region with no proximity to a wall in case of the compressible Navier-Stokes equations. Additionally, the simplification removes a lot of complexity and thus enables efficient coupling of different solvers and experiments which would not be possible in a two-way coupled way.
Thus, we can directly answer the second question. Having both codes run separately allows us to perform the mapping in a preprocessing step for the zonal simulation and thus removes a bottleneck during runtime. In addition, it avoids the complexities of having to solve the High Performance Computing (HPC) problem of having to run two possibly very heterogeneous codes synchronously and establish efficient parallel communication patterns. As mentioned before the coupling is designed to perform detailed simulations of a subdomain, meaning that the area of special interest is also contained in the full domain simulation and therefore is assumed to be represented in a sufficient way to capture all the necessary physics. Also we have to consider that the incoming physics can be truncated. We thus investigate the influence of different resolution combinations in order to quantify this error.
The third question is harder to answer since we not only have to take the spatial discretization like finite volume, finite difference and finite element methods into account, but also take care of the temporal discretization, which in most applications is either implicit or explicit. In general, two choices for mapping the solutions between two heterogeneous representations are possible: A projection approach, and an interpolation method. While the former is (approximately) conservative, ensuring this property on arbitrary meshes in space and time is cumbersome, expensive (as it requires non-local operations) and not very flexible. The interpolation approach relaxes this condition, making the mapping process very general and allows for extending the algorithms to work with arbitrary (x, y, z) data as an input and map it to a compatible data format. The mapping algorithms thus have to be able to capture resolution differences from both grid spacing and numerical efficiency per DOF, arbitrary points and inconsistent time steps. Hence, interpolation algorithms seem to be a good choice for achieving these properties in space and time.
Another problem we have to tackle is how to deal with large data sets. Although we opt for an offline coupling which avoids having to transfer the data in situ, time resolved surface or volume data is very memory intensive. Thus, memory management algorithms have to be taken into account, including parallelization, load balancing and data reduction in order to keep compute costs low.
In this paper we want to show that mapping instantaneous data from the DLR finite volume code TAU to the high-order spectral element code FLEXI is possible and allows for detailed simulation of subdomains. Thus, we want to answer the following research questions: 1. Is it possible to get a mapping for the data which allows for a numerically stable coupled simulation?
2. Which temporal resolution is needed? 3. How does the difference in spatial resolution (mesh and numerical efficiency per DOF) affect the mapping error?

Numerical Methods
An important aspect for the mapping algorithm is the knowledge of the underlying numerics and the associated scale resolving properties which we are going to assess in more detail in the following section.

Code Frameworks
The target numerical scheme for which the data has to be prepared is the opensource discontinuous Galerkin framework FLEXI, developed at the University of Stuttgart [7]. In this scheme the domain is partitioned into non-overlapping unstructured hexahedral elements. We can choose an arbitrary polynomial degree for the elements. This also means that the solution in each element is represented as a polynomial.
In contrast to the spectral element code, the source data is typically pointwise data resulting from an experiment or finite volume code. The presented mapping algorithms are designed and implemented to be generally applicable, but are optimized to work with the DLR finite volume code TAU. In Fig. 1 a typical application of TAU is visualized. TAU uses hybrid RANS/LES methods e.g. for cases with separated flows, where attached boundary layers are treated in RANS mode and detached wake regions are resolved in LES mode. Thus the effect of the wing wake on the boundary layer of the HTP is hard to investigate within TAU. FLEXI on the other hand resolves the boundary layer and thus is capable of quantifying the influence of the wing wake. Thus, the region of interest that can be simulated within FLEXI is marked in Fig. 1. TAU will also be used to validate the results in Sec. 4 later on.

TAU
The finite volume solver TAU is developed by the German Aerospace Center (DLR) and widely used among the aviation industry [8]. It solves the Euler or RANS equations both on structured and unstructured grids. Several oneand two-equation models and Reynolds stress models are implemented for turbulence modeling. Additionally, LES or hybrid RANS/LES simulations can also be performed. Hexahedra, tetrahedra, triangular prisms and pyramids are supported elements for the cells of the primary grid. For the computation of the numerical fluxes at the cell boundaries, different upwind schemes and central approximations are available. Both explicit and implicit schemes can be chosen for the integration in time. The resulting linear system is solved with SGS or LUSGS schemes. For convergence acceleration, local time stepping, residual smoothing and multigrid methods are used. Parallelization is achieved by domain decomposition, with communication through the message passing interface (MPI). Fig. 1: Tandem wing configuration with visualized wing wake interacting with the HTP simulated in TAU. Region of interest for detailed simulation in FLEXI is highlighted.

FLEXI
FLEXI is a high-performance open-source CFD solver based on the Discontinuous Galerkin Spectral Element Method (DGSEM). It utilizes hexahedral tensor product elements with an arbitrary polynomial degree in each element. Since DG methods are hybrid schemes combining finite element and finite volume methods, we use a Roe Riemann solver with minimum dissipation for the fluxes between the elements [9]. Additionally, we represent the polynomial solution on a non-equispaced Legendre-Gauss or Legendre-Gauss-Lobatto point set. Since FLEXI acts as a framework there are multiple equation systems implemented. In this paper we mainly use the compressible Navier-Stokes equations. For validation the linear scalar advection system is used. The results in the application section are created using the compressible Navier-Stokes equations, which are implemented as skew-symmetric split form approximations to minimize aliasing instabilities [10]. The boundary conditions generally are imposed weakly. This means, that we do not prescribe the state at the corresponding solution point in the element, but rather prescribe the numerical flux. FLEXI is parallelized using MPI and was successfully tested on up to O(10 5 ) cores [11].

Comparison of the Code Frameworks
Discontinuous Galerkin methods are commonly used high-order schemes. Finite volume methods in contrast are for unstructured meshes often limited to second order. It is well known that for the same number of degrees of freedom high-order methods can achieve lower error and need fewer solution points to resolve the same structures. This is known as numbers per wavelength n PPW criteria [1,12]. High-order methods can achieve n P P W ≈ 4, while second finite  volume method is often limited to n PPW ≈ 20. This means that for resolving a structure of a given wavelength, high-order methods need 5 times fewer resolution points. However, this property is heavily dependent on the used polynomial degree N as shown in Fig. 2. Thus, on the same mesh the simulation with FLEXI N = 5 not only has a faster decreasing error but also has the smaller error for a given amount of degrees of freedom. This becomes obvious since FLEXI N = 1 or TAU on the h = 10 2.4 grid correspond in terms of amount of degree of freedom with FLEXI N = 5 at h = 10 1.6 . The Shuvortex test case [13] is utilized to conduct this study. All simulations are run independently and thus without mapping.
The results denoted as "TAU RANS mode" are obtained with numerical settings commonly used for the RANS zones of hybrid RANS/LES simulations in TAU. A second order central flux approximation is used as Riemann solver stabilized by artificial dissipation, derived from the scheme after James, Schmidt and Turkel [14]. Applying a skew-symmetric scheme with matrix dissipation [15] already reduces the dissipation level compared to the TAU-default average of flux scheme with scalar dissipation. The simulations denoted with "TAU LES mode" additionally utilize a reconstruction of the convective fluxes using a linear gradient extrapolation at the cell faces, in a way to reduce the numerical dispersion of the skew-symmetric scheme [16]. Moreover, the coefficient of artificial dissipation is lowered by a factor of 16. These settings are suitable for the LES zones of a hybrid simulation. In the FLEXI runs a Roe Riemann solver is used [9]. The FLEXI simulation for N = 1 shows a result similar to the TAU runs with the same order of convergence. However N = 1 is typically not used in practical application, since the advantages of high-order schemes are not visible for such low polynomial degree. The runs with N = 5 represents a more realistic scenario and show the advantage of high-order polynomials.

Workflow
Before presenting the mapping routines, we first discuss the general workflow of how to run a simulation with time resolved input data. The general workflow is visualized in Fig. 3.
First, the source data has to be provided. Generally this can be in the form of point-wise scattered data. Since in this paper we focus on the procedure for mapping TAU data to FLEXI we assume to get either volume snapshots or surface data from TAU.
Second, we process the data by choosing an appropriate spatial mapping mechanism. Also we have to decide if we only want to map surface data for an instantaneous boundary condition, or if we also want to get the volume information to e.g. generate a restart file for FLEXI. The tool creates an interpolated file for each input file. The results are saved in a HDF5 ® format that uses a polynomial structure closely related to FLEXI. To ensure compatibility with FLEXI, the output polynomial degree is identical to the degree of the simulation we want to run afterwards.
Higher-order representations are prone to aliasing and oscillations in general and the quality of the results heavily relies on the used point set and polynomial degree we perform the interpolation on. Since the output degree and point set is defined by the simulation we want to perform eventually, we can not use these parameters for mitigating errors. Thus, we super-sample the target point representation which helps avoiding errors due to oscillations resulting from the non-polynomials character of the source points. We then map the source data to the super-sampled target data points. We found that using M ≈ [1.5, 2] · (N + 1) target points for super-sampling yields good agreement and mitigates oscillations significantly. This is in line with the literature values for overintegration of turbulent data, which is commonly used in the DG community for dealiasing [17]. After interpolating the source data to super-sampled target points, we project the solution to the original basis N < M which removes the high modal information that is especially affected by aliasing.
Third, we interpolate the results from the second step temporally. The mapped volume or surface files created in step two get converted into a dataset containing the temporal interpolator for each solution point. The interpolator consists of the coefficients of the polynomial, which are dependent on the evaluation time. Additionally, we partition the data into a user-defined number of subsets to limit the amount of data of each interpolator and avoid memory overflow during FLEXI runtime.
Fourth, we provide FLEXI with the resulting file. FLEXI then evaluates the interpolant in each time step and sets the according boundary condition to the interpolated values.

Some Remarks on Surface Data
The mapping algorithms we present can be applied to volume as well as surface data. Depending on the provided data the spatial mapping algorithms will either use the volume solution to extract the target boundary or use the provided surface plane directly. TAU is able to write 2D data from a user-defined plane, onto which the flow variables are interpolated internally using algorithms of the chimera technique [18]. This interpolation will also of course introduce an error that leads to a mismatch between the volume solution of TAU and the 2D data on the plane. The resulting chimera plane can be read in separately. Hence, there can be made significant simplifications in term of area reduction which reduces the overall cost of the mapping algorithms.

Spatial Interpolation
An important step to achieve the coupling of TAU with FLEXI is the spatial mapping. Since ultimately we want to create a instantaneous boundary condition we have to map surface data only. To keep the interpolation more general we implement a three-dimensional method to also allow volume interpolation and arbitrary oriented surface planes in the source domain.
A major challenge in creating a mapping between TAU and FLEXI are the differences in spatial discretization. Industrial finite volume codes rely often on tetrahedral meshes. Therefore, a direct interpolation is difficult, since mesh data structures for hexahedral only codes are dissimilar. In contrast to FLEXI, TAU stores its solution as low order point wise data. This results in arrays containing the coordinates (x, y, z) and the solution U [19]. FLEXI however stores its solution as polynomial data for each element [7,20]. The difficulty now is to consistently map the point-wise TAU data to the polynomial coefficients needed for the solution polynomials in FLEXI. Since FLEXI only uses unstructured hexahedral elements and TAU is in contrast also based on tetrahedrals we can not use the TAU mesh information natively. An example of this incompatibility including a schematic of the interpolated solution is visualized in Fig. 4. The interpolation algorithms thus have to work with scattered data. This means we have a cloud of points with a submerged target mesh. Therefore, interpolation from the TAU source data to the FLEXI target data has to be done using unstructured interpolation algorithms. There are several algorithms available that are suitable for such tasks.

TAU FLEXI
Scattered data interpolation generally can achieve good accuracy and performance but is highly dependent on the distribution of the source points [21]. On the other hand implementing scattered data interpolation routines enable us to gain a more universal interface, since other codes and solution formats can be implemented easily.
Since the solution data of the source solver is provided beforehand, we do not have to map the data during run time, which saves a lot of computation time. In addition, the meshes used by both codes are known a priori (and remain constant during the computation). Still, good performance is crucial. Therefore, we implement the interface framework with MPI parallelization. This is done by reading in the mesh and distributing the elements equally between each processor using MPI. The elements are sorted along a Hilbert curve [20] which minimizes the communication interfaces between the individual processors cf. red curve in Fig. 5a. This approach however can only be done for the target FLEXI mesh, because we need the mesh information to distribute the data directly. For the source data we only read in data points.
Hence, a simplification and distribution of the load between the processors is not possible in a first step. Thus, we read in the source data into shared memory [11,22]. Each processor then accesses the shared memory source data and simplifies it by only selecting the data which is necessary to spatially interpolate in its individual domain. With this method we can save a lot of computational time and decrease our memory footprint without sacrificing accuracy.
If we use surface data as an input to the spatial interpolation algorithms we have to consider load balancing in more detail. The distribution of volume elements that has just been described, does not take surfaces into account. Thus, for the surface mapping we can not ensure that every part of the decomposed domain contains surface elements that have to be mapped (cf. Fig. 5a). Therefore, especially for small domains with many processors, it is possible that not all processors are working on the task resulting in a inefficient mapping. Hence, we have to redistribute the load between the processors according to the number of surface elements (cf. Fig. 5b). This can be done by assigning each surface element a high weight for domain decomposition. This weight is chosen by counting the number of boundary sides that have to be mapped per element and it generally reduces the computational load on MPI ranks that contain such a coupling interface. To do so we first have to read in the mesh file normally, then apply the surface weighting and finally reinitialize the mesh reading process [23]. With this approach we can gain performance improvements while sacrificing a few seconds in the initialization process due to the necessary reinitialization of the mesh. The overall cost of surface interpolation will be lower than using volume data. The difference between volume and surface distribution is visualized in Fig. 5.
Additionally, we have to ensure to provide a buffer region around every individual MPI domain in order to establish the interpolation stencils for each point. The buffer area is estimated by taking the size of the largest element in the complete domain of the source points into account. Since the largest element is not known directly, we take the distances between the scattered points into account and use the largest distance for that matter.

Nearest neighbor interpolation
The nearest neighbor interpolation checks the source data coordinates and finds the closest point to the desired FLEXI target point by point-wise comparison. The values U of the source data are then directly stored as a nodal coefficient in FLEXI. This type of interpolation yields a piecewise constant solution. Another requirement is an evenly distributed source mesh. If these requirements can not be met, there is risk of bad results. This does not automatically mean the the results are not physical, but rather that the resulting interpolation polynomial inside a DG cell is ill conditioned and can, due to the massive jumps, result in an unnaturally oscillating mapped solution. Another phenomenon one can observe is the possibility to get jumps on the element boundaries of the target mesh, if the boundary nodes are not included. On the Volume Mapping (a) Target domain is fully submerged in the source data (gray). Domain decomposition does not have to take surface elements into account.

Surface Mapping
(b) Boundary is aligned with surface source data (gray). MPI distribution has to be adapted in order for all three processors to contain mapped surface elements.

Inverse Distance Weighting
A more general approach is available using inverse distance weighting [24]. The target solution is calculated using a weighted average of the the source value with N source denoting the number of source points in the whole domain. In contrast to the nearest neighbor approach we not only take one point into account, but all in the source area. The weights ω( x) are depending on the distance between the source points and the target solution point and a weighting exponent p. For p ⇒ ∞ the inverse distance weighting approach resembles the nearest neighbor method. A modification to the general inverse distance weighting was introduced by Shepard, who proposed to only take the source points into account that are within a predefined radius R around the target point [25]. This reads as with R denoting a predefined search radius.

Radial Basis Functions
A third option to consider for unstructured interpolation are radial basis functions ϕ [26,27]. These methods allow for high-order accurate interpolants s of unstructured data. The interpolant consists of the weighted sum of radial basis functions. In contrast to the other methods introduced earlier we have to solve a linear equation system to invert the Vandermonde and to determine the weights ω satisfying and therefore with r ki = x k − x i L2 . We rewrite the interpolation condition in matrix notation This can be rewritten in matrix form as Φ ij ω i = u j,source using index notation. Since we have to invert the matrix Φ for interpolation, the radial basis approach is the most expensive of the introduced methods. We evaluate the interpolant and get the value at an arbitrary point in the computational domain. Typical radial basis functions for interpolation are multiquadratic ϕ(r) = 1 + (εr) 2 , inverse multiquadratic ϕ(r) = 1 √ 1+(εr) 2 , Gaussian ϕ(r) = e −(εr) 2 and thin plate spline ϕ(r) = r 2 ln(r) functions with r = x j − x i L2 . The parameter ε defines the shape of the function and is used for scaling. The multiquadratic and the thin plate spline have shown to be the most reliable radial basis functions for this use case. Since the thin plate spline does not require any additional user parameter ε we use this function for all further investigations.
During implementation of the algorithms above some observations were made. First, none of the scattered interpolation method is designed in a way to be conservative. Thus, we interpolate the primitive variables and, for consistency reasons, convert to conservative variables after mapping.

Comparison of the Spatial Interpolation Methods
Before assessing the performance of the spatial interpolation routines in context of the mapping routines, we investigate the convergence behavior in an isolated test case. Thus, we calculate the L 2 -error of a simple one-dimensional interpolation of a sine function f (x) = sin(2πx).
We plot the error in Fig. 6 over the sampling resolution of the source data. We can see that radial basis function interpolation clearly yields the best results with lower errors and a better convergence rate EOC = 5 than nearest neighbor and inverse distance weighting interpolation with EOC = 2. We assess the accuracy and the differences between the spatial interpolation schemes in more detail in Sec. 3.

Temporal Interpolation
In addition to the spatial interpolation we also have to interpolate temporally in order to account for the different time stepping schemes in the source and target codes. FLEXI e.g. uses an explicit low storage Runge-Kutta method to advance the equation systems in time. TAU on the other hand uses an implicit time discretization to accomplish that. However, in addition to the different time stepping schemes, the time step and output rate of the simulation data can change between different simulations. For using the data as an instantaneous boundary condition we have to ensure that we can provide the target solver with the correct inflow data at an arbitrary point of time. Thus, it is crucial to interpolate the results of the spatial interpolation in time to get a continuous temporal interpolator.
In contrast to the volume and surface mapping the temporal interpolation consists of purely one-dimensional problems. For one-dimensional data there are vast numbers of different interpolation techniques. In this work we use polynomial interpolation in combination with a Lagrange basis and spline interpolation.
We use the Lagrange interpolation basis since coefficient and solution values coincide [28]. Also the evaluation can be easily done with the tools already built into FLEXI, since the solution in each element consists of the tensorproduct of three one-dimensional nodal Lagrange functions.
Furthermore two different variants of spline interpolation are implemented. A common open spline as well as the Akima spline [29]. In contrast to a typical spline an Akima spline does not take the second derivative into account. This leads to a more evenly distributed solution and fewer overshoots compared to the open spline. The problem of overshoots can also be found in polynomial interpolation of degree p ≥ 2. This becomes especially important if an implicit source method is paired with an explicit target solver. In this case the temporal interpolation has to come up for a huge number of time steps since the time step in an explicit scheme is typically much smaller than implicit time steps. Thus, overshoots can play a significant role for the overall mapping quality. If the time steps of source and target method are similar, the effect of the temporal interpolation becomes smaller. However, it should be noted that even in this case, overshoots can be generated. Generally, for these reasons it is recommended to either use linear interpolation or the Akima interpolation for the most reliable results. We will show this in Sec. 3.
Hence, the resulting quality of the interpolation depends on multiple factors. First, on the chosen interpolation method. Second, on the sample rate of the provided state or boundary source files. Thus, a general prediction of the error resulting from temporal interpolation is difficult.
The interpolation is done in a separate tool and is not only limited to surface data, but can also be done with restart files of any FLEXI simulation. The result is processed and saved in an HDF5 ® file which includes the coefficients for every polynomial at every temporal sample point.
The resulting files of the temporal interpolation routine can either be directly used in FLEXI for evaluation of the interpolant or even be used to generate a restart file to continue simulation at an arbitrary point of time.
The temporal interpolator generated contains the resulting polynomial/spline at each degree of freedom. Thus, the overall size of the interpolator array has more dimension (polynomial coefficients and time) than the solution array. With increasing dimensionality the memory requirements of the array also increase. Depending on the amount of source data available it might be necessary to partition the resulting temporal interpolant in order to avoid memory overflow during simulation of the target domain. Therefore, a maximal size for the interpolant array has to be provided by the user. The interpolation algorithm will then partition the data into equally sized datasets, each containing a period of time which results from the user parameter. FLEXI then only reads in the dataset that contains the temporal information of the current FLEXI time step. Thus, during runtime of FLEXI the saved interpolant is only evaluated, allowing for obtaining an interpolated solution at an arbitrary point of time.

Validation of the Interface
In this section we start validating the mapping algorithms. We chose a gradual approach and start by showing a proof of concept, followed by the temporal algorithms and in the end assess the convergence behavior of the spatial interpolation algorithms of the interface.
We start to evaluate the algorithms by applying them to very simple test cases. Thus, we chose the linear scalar advection equation u t + ∇ · (au) = 0 with a ∈ R (8) due to its simplicity and a priori known exact solution for given initial conditions. The transport velocity is set to a = 1. We vary the initial conditions between the tested scenarios and describe them in the corresponding sections in more detail. The source domain Ω ∈ [−1, 1] 3 and the target domain Ω ∈ [1, 3] × [−1, 1] 2 are designed to have a shared interface at x = 1. The source data as well as target data for these test cases are fully generated using FLEXI. Triple-periodic boundary conditions are used for the source mesh and the target mesh is designed to have periodic boundary conditions in y and z. In x-direction we have the instantaneous interface condition at x = 1 and a outflow at x = 3.

Proof of Concept
We start the validation of the interface by applying it to a very basic sine test case with u(x, t) = sin(π(x − at)). For this first test we match the surface elements at the interface and thus can use nearest neighbor interpolation without sacrificing accuracy (source and target points coincide). In this case the nearest neighbor algorithm will just copy the data from the source to the target domain. Source and target mesh are only offset in x-direction by the length of the domain. In Fig. 7 the initial condition is depicted in light green. One can also see the solution of (8) after t = 0.8 and t = 1.6. The vertical gray dashed grid lines depict the mesh of the simulation grid and the red dotted line visualizes the interface between source and target domain. The graphs are extracted from the center line in x-direction of the equispaced Cartesian cubes, which each have a resolution of 16 × 16 × 16 using N = 4 polynomials in each element. Legendre-Gauss-Lobatto points are used for the source and target simulation. Additionally, we avoid temporal interpolation by sampling the interface data at every physical time step dt.
In Fig. 7 we can see that the general workflow presented performs as expected and the information gets propagated over the interface with a = 1. Since we do not interpolate the data in any way in this test case we expect the overall errors between source sine and target sine wave to be comparable. In the source domain we have an L 2 -error of 9.6506E−7 after t = 2. The L 2error in the target domain after t = 2 is evaluated in the same way and is 9.6859E−7. This successfully proves that the workflow is capable of mapping the data without any information loss. We stress that the full framework is working as if we were coupling between two heterogeneous solvers, with the exception that the source and target points coincide.
Note that we used continuous initial conditions between source and target domain. We found that one should avoid having jumps or large discrepancies of the initial conditions between the source and target domain due to nonphysical disturbances created at the inlet of the target domain which are further propagated. This however, is not due to the interface mapping algorithms but rather due to the nature of the high-order scheme. In practical applications, especially for transient simulations, this does not pose a problem since all structures starting from free-stream, will be mapped into the target domain.

Assessing the Temporal Interpolation and Sampling
Next, we evaluate the effect of temporal interpolation/sampling on the interface mapping process. Thus, we investigate the effects of different temporal interpolation schemes and sampling rates on the incoming solution, which we map via the instantaneous boundary condition. For this test case we chose different exact solution and initial conditions for the linear advection equation 8.
To evaluate the effect of the sampling we chose a initial condition that includes a discontinuity in order to visualize the information loss. Thus, we use We use Legendre-Gauss-Lobatto nodes with N = 4 on a 256 × 1 × 1 source and target domain. The surface elements at the interface are again matched. Thus, we can use nearest neighbor interpolation to interpolate the surface data in space, without sacrificing accuracy (copy values from source to target). Fig. 8a depicts the simulation at two discrete points in time evaluated with different ∆t-interpolants. The interface is located at x = 1 and the discontinuities travel into the target domain on the right side of the red dotted interface with a = 1. Already in the overview graph in Fig. 8a we can see substantial differences between the two interpolated jumps. For this test we chose in total three sampling rates. A fine sampling rate at ∆t ≈ 10dt which is approximately ten times the explicit FLEXI time step dt and two coarser sampling rates at ∆t ≈ 50dt and ∆t ≈ 100dt. In Fig. 8a only the finest and the coarsest sampling rate are visualized. Fig. 8b shows the jumps of all three sampling rates at the same evaluation time t in more detail. Since the x-axis is scaled identically one can see that the influence of the temporal mapping on the target FLEXI simulation is very high. There are two main effects visible: 1. The temporal distance between two samples effects the slope of the jump and 2. the temporal interpolation method has an effect on the quality of the jump representation.
The first observation has to only result from the temporal interpolation since the slope of the shock has been steeper in the source domain. Additionally we see that lowering ∆t increases the slope again. Thus, ∆t has to be chosen in a way that the steepest gradient in data can be represented sufficiently. This however is very much problem depending and requires knowledge of the data.  In Sec. 4 we assess an approach on how to determine this in the context of turbulent eddies. For the second point a more general statement can be made, since this observation is nearly independent of ∆t and only becomes more prominent if ∆t becomes sufficiently large. Higher order polynomial approximations tend to oscillate, especially for equispaced point distribution which is the case for temporal sampling. Therefore, in Fig. 8 polynomial interpolation is only shown up to second degree. Also the well known Spline interpolation tends to oscillations for high ∆t. Most favorable thus is linear or Akima interpolation, which represent the vertical jump best and recover the steepest gradients. Higher order polynomial and spline interpolation in this case fail mainly because the physical time steps dt at which we sample are roughly equispaced. Thus, we see the so-called Runge's phenomenon for interpolation using an equispaced point set in time. This is a crucial point since the TAU output frequency is only determined by its implicit timestep. The target solver FLEXI thus has to recover the data in every explicit time step. However, having a fixed source sampling rate and decreasing the target time step, will not further increase the error since the interpolant is only determined by the sampling rate of the source data and only is evaluated during FLEXI runtime.
Since Akima interpolation yields slightly smoother results in combination with steeper gradients, we use Akima interpolation for all following test cases.

Convergence of the Spatial Mapping
Another important aspect one has to consider is the convergence behavior of the mapping process. To measure the effect of the interpolation routines we decided to calculate the error norm of the whole mapping and simulation process. Thus, the error includes spatial and temporal interpolation error as well as the error associated with imposing the instantaneous boundary condition in FLEXI (e.g. discretization error).
To run the convergence test, we modify the initial conditions from the onedimensional sine in Fig. 7. We add a y and z dependency to the exact solution u in order to have varying u values on the interface plane. Thus, we get u(x, y, z, t) = sin(π(x − at)) + sin(πy) + sin(πz).
For the sine wave and the linear transport we have seen earlier that we can recover the exact solution on the target domain and that the information is propagated correctly via the instantaneous boundary condition if there is no spatial and temporal interpolation involved (just copy the values from source to target). Thus, we want to investigate the effect of different non-matching interfaces (point sets and resolution) on the error of the simulation and therefore have to combine spatial and temporal interpolation techniques for the first time. We again use Legendre-Gauss-Lobatto points with N = 5 in the source and target domain. Additionally, we use super-sampling with N = 8. For the linear transport this is not necessary, since in contrast to the Navier-Stokes equations we do not see aliasing here. However, we want to assess the convergence as close to the later application as possible and additionally avoid matching all the degrees of freedom in any case (N src = 5 = 8 = N tar,super ).
In Fig. 9 we see two different testing scenarios. The first in Fig. 9a shows the L 2 -error for increasing target resolution and a fixed source mesh. The second scenario in Fig. 9b depicts the error for an increasing source resolution and a fixed target mesh. With "grid" we mean the number of elements in each spatial direction of the Cartesian cube. The sampling timestep is defined by the physical timestep dt of the source data. Thus, for e.g. the source grid "1" we extract the interface data at every physical time step and use Akima  Nearest Neighbor Modified Shepard RBF (thin plate) Fig. 9: Comparison of the convergence behavior of the entire mapping procedure including spatial, temporal mapping and the instantaneous boundary condition.
interpolation to interpolate it to the physical time step of the "32" target grid that is 32 times smaller. In Fig. 9a we assess the effect of varying target mesh resolutions on the overall error. The resolution of the source mesh is fixed at 8 × 8 × 8 and 32 × 32 × 32 respectively. We expect the overall error to converge, since the error can not be mitigated any further if it is dominated by the source data. Thus, we can see the influence of the source data on the target domain. For the 8 × 8 × 8 source mesh we can observe this behavior very well. Starting at grid tar ≈ 4 we see that for all interpolation algorithms there is no further improvement. For the finer source mesh we can observe a similar behavior, however the overall error is lower and the error is converged later. Due to the error introduced with the spatial interpolation we can not see a declining error until the source mesh resolution.
In Fig. 9b we investigate the effect of a varying source mesh on a fixed target mesh. This can be interpreted as increased input quality for the mapping for a given target domain. In this test case we again assessed the influence for two fixed target resolutions at 8×8×8 and 32×32×32. Nearest neighbor, Shepard as well as RBF interpolation show declining errors for increasing source grid resolution. This time nearly linear decaying errors can be seen up to the resolution of the target mesh. However, especially for the grid tar = 8 × 8 × 8 case, we can see that RBF interpolation is capable of recovering information from source grids with finer resolution than the target mesh. Shepard and nearest neighbor show clearly weaker performance here and have changing slopes of the error in this source grid regime.
Overall we can rank the performance of the three tested spatial interpolation techniques. Nearest neighbor interpolation shows as expected the weakest performance with an experimental order of convergence of EOC ≈ 1.2 in Fig. 9b. Shepard interpolation shows overall lower errors at roughly the same order of convergence EOC ≈ 1.5. However, Shepard interpolation is capable of retaining the error even for source resolutions higher than the target resolution in Fig. 9b where nearest neighbor interpolation show inconsistent results. Finally, radial basis function interpolation clearly yields the best results with an order of convergence of EOC ≈ 2.8. Thus, using RBF interpolation is recommended. Overall the results in Fig. 9b underline the observations made in Fig. 6. However, the convergence rates in Fig. 9b are lower for all interpolation methods. The qualitative observations however are identical and the losses in EOC are equivalent for all interpolation techniques. One should note, that the test case in Fig. 9b has an increased complexity, since it is two-dimensional and we evaluate the error over the whole mapping process compared to an isolated one-dimensioal interpolation test in Fig. 6.
For large source datasets one should keep in mind, that solving the equation system necessary to the get the interpolation coefficients for the radial basis functions gets very expensive and RBF interpolation even in this simple test case was noticeably (approx. up to an order of magnitude) slower than nearest neighbor and Shepard interpolation.

Results: Cylinder Flow
In this section we investigate the flow around a cylinder at a Reynolds number of Re c = 3900 [30,31]. The diameter of the cylinder is defined as c and is used as the characteristic length in this investigation. The domain has a spanwise extension of c. For the first time we now map actual TAU surface data into a FLEXI domain.
In Fig. 10 the simulation setup is depicted. Note that the size of the interface planes in Fig. 10 at x I does not match the size in the actual simulation. In the setup the interface planes are designed in a way that all the turbulent wake structures are captured by the planes and all vortical structures of the wake are fully contained in the interface planes.
The most important aspects for assessing the performance of the interface is to define the interface locations and to define record points (also known as probe points). We decide to place two interface planes at position x I in the wake as well as two record points x P .
We use the same number of degrees of freedom in the TAU source and the appended FLEXI target mesh. Thus, the target mesh resolution including the interface has to be divided by a factor of eight in order to accommodate for the higher polynomial degree of FLEXI N = 7. With this approach we minimize the resulting errors (cf. Fig. 9a). We will show later that this resolution is sufficient to map all physical structures occurring in this test case. The target mesh in this case is a simple box that has the same y and z dimensions as the interface and a sufficiently long x extension for the turbulent wake to develop and travel. u∞ x 0 x P1 x P2 x I1 x I2 x z Fig. 10: Cylinder at Re c = 3900 test case definition containing interface planes and probe points for evaluation.

Simulation Setup
Next, we discuss the simulation setup of the cylinder test case. We describe the setup for FLEXI as well as TAU.
The main reason we chose the cylinder flow as the main test case is the fact, that we can afford to run the whole domain in FLEXI and in TAU. Thus, we not only can compare the mapped results against the TAU solution but also against the reference DNS created with FLEXI. Additionally, the cylinder is a well known geometry in the fluid mechanics community and has been investigated in detail before.
We use the same base mesh setup for the TAU and FLEXI DNS reference simulation. The only difference is again that the FLEXI case is coarser by a factor of eight to consider the high-order polynomials that are used in each element. Thus, if FLEXI is run with N = 7 we have the same number of degrees of freedom as TAU. We also investigate the solution quality of lower order polynomials later. For the simulation in FLEXI we use N ∈ [3,5,7] polynomials.

Sensitivity on Resolution
First, we assess the sensitivity of the test case on the resolution. We conduct this study in FLEXI since we are interested if the chosen resolution is sufficient for a DNS. This test case was specifically chosen since it allows to conduct a fully resolved simulation in FLEXI and in TAU. For typical applications of the inflow condition this will not be possible.
In Fig. 11 the spectra of the turbulent kinetic energy are shown at three discrete points in the wake of the cylinder. Each figure contains the spectrum for three simulations. Each with different polynomial degree. The N = 3 spectrum in Fig. 11 shows a deviation from the N = 5 and N = 7 curves at all evaluation locations. Thus, we can assume that the resolution for N = 3 is not sufficient for a DNS and does not yield enough dissipation. Since the turbulent kinetic energy spectra for N = 5 and N = 7 coincide, we can assume that we are converged at this resolution and thus N = 5 is sufficient for running a DNS. The expected Strouhal frequency of the cylinder is clearly visible as a distinct peak in the spectrum [31] for all polynomial degrees.
The simulation with N = 7 (same amount of degrees of freedom as TAU mesh) is too fine for a typical LES/DNS since the mesh was originally created to be suitable for a hybrid RANS/DNS and a FV code. Evaluating the viscous wall spacing in FLEXI yields y + ≈ 0.01 which is more than sufficient, even for a DNS. This however underlines the benefits of a high-order scheme when resolving turbulent eddies.
As already shown in Fig. 2 using the same amount of DOFs in FLEXI and in TAU with a high polynomial degree in FLEXI shows the strength of the high-order scheme, since this resolution is sufficient in FLEXI to run a DNS.

Interpolation Error
Next, we assess the interpolation error resulting from interpolating a wake plane as defined in Fig. 10 onto the FLEXI boundary condition (grid tar = 32 × 8) using the modified Shepard method. We investigate the error for plane x I1 , which is the one that is located closest to the cylinder. The error is assessed by using the TAU source data as reference data. To get the same point set for TAU and FLEXI we evaluate the polynomials of the mapped FLEXI solution on the TAU solution points. The results are visualized in Fig. 12.
By eye norm the results in Fig. 12 look very convincing. The structures of the source data are all represented in the mapped solution. Taking a closer look one can see small overshoots of the mapped solution at the element boundaries. This effect has already been mitigated by using a super-sampling as dealiasing technique. To quantify the error we look at the difference between the source and the mapped data visualized in Fig. 13.
Thus, we evaluate the error of the density for three resolutions grid tar = 32 × 8, grid tar = 16 × 4 and grid tar = 8 × 2. The plots show the relative deviation of the interpolated target data based on the source data. The density plot confirms the observations we made in Fig. 12 and shows a very small error in   In Tab. 1 we can see the minimal and maximal values of the primitive variables for source and mapped data. Additionally, the integral mean value is listed. One can note that the mapping yields very good results for density and pressure. In contrast especially for the velocities we see small deviations from source data. This error is based on the fact that the interpolation is not conservative. This gets especially pronounced for the velocity components due to changing signs. By applying the conversion of primitive to conservative variables after interpolation we ensure that -despite the interpolation not being conservative -we get consistent conservative variables.
Hence, due to flexibility of the scheme and the generally very small effect on the mapped results we can neglect the effects of the non-conservativity (cf. Tab. 1) and directly use the mapped plane as an inflow condition.

Influence of the Sampling Rate
Now we assess the effect of the sampling rate of the source data on the quality of the solution in the target domain, which is a very important user parameter that has to be considered when creating a coupled simulation. We do so by investigating the effect on the contribution of the incoming turbulence on the turbulent kinetic energy.
In Fig. 14 the turbulent kinetic energy spectra at two distinct probe points are visualized. The different colors depict different temporal sampling. With N Skip we mean how many TAU snapshots are skipped in time. N Skip = 1 means that every temporal snapshot is used. The physical TAU sampling rate is ∼ 150 snapshots per characteristic time. The characteristic time is defined as the time it takes the fluid to cover the distance of the diameter of the cylinder. For N Skip = 2 we only use every second snapshot. The lighter the color gets the fewer snapshots are used to recover the TAU solution in FLEXI. Fig. 14 shows that the results are heavily dependent on the sampling rate. This seams reasonable since the sampling rate determines which structures are mapped via the instantaneous boundary condition. According to the Nyquist criterion there is a value for N Skip for which the solution is not represented anymore. In this case for N Skip ≥ 512 we no longer see agreement with the reference solution. For smaller N Skip there is better agreement with the black reference solution. Hence, two major observations can be made. First, for high N Skip the major flow structures can not be recovered and even the Strouhal frequency is not represented correctly. Additionally, after some development in the target domain at x = 5.25c, we can see the there is a lot of disagreement even for low k. Second, we can observe that the energy does not adapt and we loose energy in high modes for large N Skip .
From these observations we can conclude that the sampling frequency is dependent on the structures that have to be mapped to the new domain. Thus, we define a measure to quantify the "eddy size -sampling rate" relation which is closely related to the underlying spatial discretization scheme. From literature (e.g. [1,10]) we know that there is a similar criteria for spatial discretization, which uses the parameter numbers-per-wavelength n PPW to quantify the property of a spatial discretization scheme in resolving multi-scale structures. For DGSEM it is known that n PPW,DGSEM ≈ 4.
In this case we take two sizes as reference. First, according to [32] the large structures are of the size of the cylinder which corresponds to L = 1c. From the simulation setup and the properties of the DG scheme we estimate the smallest structures according to l = L domain #DOF · n PPW,DGSEM ≈ 0.06c.
with L domain denoting the size of the domain and #DOF the number of DOFs used to discretize the domain. Taking u ∞ into account, we get an approximation for how long it takes an eddy to be advected over the interface plane, assuming Taylor's hypothesis [33]. Taking the sampling frequency into account we can estimate that for the smallest structures we need N Skip ≈ 4 and for the large structures N Skip ≈ 64 is sufficient. This behavior for L = c is also underlined in Fig. 14. Only using every 64 th sample N Skip = 64 still provides us with the main structures and correct amplitudes, while N Skip > 64 shows signs of underresolution. Using these information we can approximate a criterion on how many points we need per structure/eddy that has to be transported over the interface. It turns out that for both large and small eddies we need approximately 2.3 samples per eddy. As one would expect, we can conclude that spatial and temporal discretization requirements are similar for the interface. We repeated this evaluation for both interface planes x I1 and x I2 . Both showed qualitatively identical results.

Summary
In this work we introduced a method to generate an instantaneous boundary condition relying on a precursor simulation. We presented the numerical methods necessary to handle differences in spatial and temporal discretization via interpolation as well as validated the scheme for simple test cases and a more complex cylinder wake.
We have shown how to generate numerically stable inflow and initial conditions with the methods described in this paper that are universally applicable also to other solvers than TAU and even experimental data.
The requirements regarding sampling rate are similar to those of the spatial discretization and thus need approximately four sampling points per wavelength, depending on the temporal interpolation scheme used.
We implemented several mapping techniques and showed the differences in interpolation quality and additionally demonstrated their capabilities of reconstructing scattered source data. In addition we utilized super-sampling of the interpolation to increase the overall accuracy and to mitigate the errors due to aliasing and numerical incompatibilities.
In terms of spatial resolution difference at the interface we observed that increasing the resolution of the source data never posed a problem. However, coarsening the data too much can produce large aliasing errors which cause trouble for the high-order scheme. Thus, we recommend using a similar resolution on both sides of the interface.
The introduced interface now has to be applied to more complex scenarios. The tool-chain introduced in this paper is already designed to handle these kind of challenging simulations.