Abstract
In this paper, we present results of the second phase of the project ExaFSA within the priority program SPP1648—Software for Exascale Computing. Our task was to establish a simulation environment consisting of specialized highly efficient and scalable solvers for the involved physical aspects with a particular focus on the computationally challenging simulation of turbulent flow and propagation of the induced acoustic perturbations. These solvers are then coupled in a modular, robust, numerically efficient and fully parallel way, via the open source coupling library preCICE. Whereas we made a first proof of concept for a threefield simulation (elastic structure, surrounding turbulent acoustic flow in the nearfield, and pure acoustic wave propagation in the farfield) in the first phase, we removed several scalability limits in the second phase. In particular, we present new contributions to (a) the initialization of communication between processes of the involved independent solvers, (b) optimization of the parallelization of data mapping, (c) solverspecific whitebox data mapping providing higher efficiency but less flexibility, (d) portability and scalability of the flow and acoustic solvers FASTEST and Ateles on vector architectures by means of code transformation, (e) physically correct information transfer between nearfield acoustic flow and farfield acoustic propagation.
Download conference paper PDF
1 Introduction
The simulation of fluidstructureacoustic interactions is a typical example for multiphysics simulations. Two fundamentally different physical sound sources can be distinguished: structural noise and flowinduced noise. As we are interested in accurate results for the resulting sound emissions induced from the turbulent flow, it is decisive to include not only the turbulent flow, but also the structure deformation and the interaction between both. High accuracy requires the use of highly resolved grids. As a consequence, the use of massively parallel supercomputers is inevitable. When we are interested in the sound effects far away from a flow induced fluttering structure, the simulation becomes too expensive, even for supercomputing architectures. Hence, we introduce an assumption, we call it the “farfield”. Far from the structure and, thus, the noise generation, we assume a homogeneous background flow and restrict the simulation in this part of the domain to the propagation of acoustic waves. This results in an overall setup with two coupling surfaces—between the elastic structure and the surrounding flow, and between the nearfield and the farfield in the flow domain (see Fig. 1 for an illustrative example). Such a complex simulation environment implies several new challenges compared to “singlephysics” simulations: (a) multiscale properties in space and time (smallscale processes around the structure, multiscale turbulent flow in the nearfield, and largescale processes in the acoustic farfield), (b) different optimal discretization and solver choices for the three fields, (c) highly illconditioned problem, if formulated and discretized as a single large system of equations, (d) challenging loadbalancing due to different computational load per grid unit depending on the local physics.
Application examples for fluidstructureacoustic simulations can be found in several technical systems: wind power plants, fans in air conditioning systems of buildings, cars or airplanes, car mirrors and other design details of a car frame, turbines, airfoil design, etc.
Fluidstructure interaction simulations as a subproblem of our target system have been in the focus of research in computational engineering for many years, mainly aiming at capturing stresses in the structure more realistically than with a pure flow simulation. A main point of discussion in this field is the question whether monolithic approaches—treating the coupled problem as a single large system of equations—or partitioned methods—glueing together separate simulation modules for structures and fluid flow by means of suitable coupling numerics and tools—are more appropriate and efficient. Monolithic approaches require a new implementation of the simulation code as well as the development of specialized iterative solvers for the illconditioned overall system of equations, but can achieve very high efficiency and accuracy [3, 12, 19, 23, 38]. Partitioned approaches, on the other hand, offer large flexibility in choosing optimal solvers for each field, adding additional fields, or exchanging solvers. The difficulty here lies in both a stable, accurate, and efficient coupling between independent solvers applying different numerical methods and in establishing efficient communication and load balancing between the used parallel codes. For numerical coupling, numerous efficient data mapping methods [5, 26, 27, 32] have been published along with efficient iterative solvers [2, 7, 13, 20, 29, 35, 39, 41]. In [6], various monolithic and partitioned approaches have been proposed and evaluated in terms of a common benchmark problem. Threefield fluidstructureacoustic interaction in the literature has so far been restricted to nearfield simulations due to the intense computational load [28, 33].
To realize a threefield fluidstructureacoustic interaction including the farfield, we use a partitioned approach and couple existing established “singlephysics” solvers in a blackbox fashion. We couple the finite volume solver FASTEST [18], the discontinuous Galerkin solver Ateles [42], and the finite element solver CalculiX [14] by means of the coupling library preCICE [8]. We compare this approach to a less flexible whitebox coupling implemented in APESmate [15] as part of the APES framework and make use of the common datastructure within APES [31]. The assumption which is confirmed in this paper is, that the whitebox approach is more efficient, but puts some strict requirements on the codes to be coupled, while the blackbox approach is a bit less efficient, but much more flexible with respect to the codes that can be used. Our contributions to the field of fluidstructureacoustic interaction, which we summarize in this paper, include:

1.
For the nearfield flow, we introduce a volume coupling between background flow and acoustic perturbations in FASTEST accounting for the multiscale properties in space and time by means of different spatial and time resolution.

2.
For both nearfield flow and farfield acoustics, we achieved portability and performance optimization of Ateles and FASTEST for vector machines by means of code transformation.

3.
In terms of interfield coupling, we

(a)
increased the efficiency of intercode communication by means of a new hierarchical implementation of communication initialization and a modified communicator concept,

(b)
we improved the robustness and efficiency of radial basis function mapping,

(c)
we identified correct interface conditions between nearfield and farfield, optimized the position of the interface, and ensured correct boundary conditions by overlapping nearfield and farfield,

(d)
we developed and implemented implicit quasiNewton coupling numerics that allow for a simultaneous execution of all involved solvers.

(a)

4.
For a substantially improved intercode load balancing, we use a regressionbased performance model for all involved solvers and perform an optimization of assigned cores.

5.
We present a comparison of our blackbox and to the whitebox approach for multiphysics coupling.
These contributions have been achieved as a result of the project ExaFSA—a cooperation between the Technische Universität Darmstadt, the University of Siegen, the University of Stuttgart, and the Tohoku University (Japan) in the Priority Program SPP 1648—Software for Exascale Computing of the German Research Foundation (DFG) in close collaboration with the Technical University of Munich. In the first funding phase (2013–2016), we showed that efficient yet robust coupled simulations are feasible and can be enhanced with an insitu visualization component as an additional software part, but we still reached limits in terms of scalability and load balancing [4, 9]. This paper focuses on results of the second funding phase (2016–2019) and demonstrates significant improvements in scalability and accuracy as well as robustness based on the abovementioned contributions.
In the following, we introduce the underlying model equations of our target scenarios in Sect. 2 and present our solvers and their optimization in Sect. 3 as well as the blackbox coupling approach and new contributions in terms of coupling in Sect. 4. In Sect. 5, we compare blackbox coupling to an alternative, efficient, but solverspecific and, thus, less flexible whitebox coupling for unidirectional flowacoustic coupling. Finally, results for a turbulent flow over a fence scenario are presented in Sect. 6.
2 Model
In this section, we shortly introduce the underlying flow, acoustic and structure models of our target application. We use the Einstein summation convention throughout this section.
2.1 Governing Equations
The multiphysics scenario we investigate describes an elastic structure embedded in a turbulent flow field. The latter is artificially decomposed into a nearfield and a farfield. See Fig. 1 for an example.
NearField Flow
In the nearfield region Ω^{F} = Ω^{NA}, the compressible fluid flow is modeled by means of the density \(\overline {\rho }\), the velocity \(\overline {u_i}\) and the pressure \(\overline {p}\). As we focus on a low Mach number regime, we can split these variables into an incompressible part ρ, u _{i}, p, and acoustic perturbations \(\rho ', u_i^{\prime }, p'\):
The incompressible flow is described by the NavierStokes equations^{Footnote 1}
where ρ is the density of the fluid, and f _{i} summarizes external force density terms. The incompressible stress tensor τ _{ij} for a Newtonian fluid is described by
with μ representing the dynamic viscosity and δ _{ij} the KroneckerDelta.
Acoustic Wave Propagation
The propagation of acoustic perturbations in both the nearfield and the farfield is modeled by the linearized Euler equations, where in the farfield a constant background state is assumed (which implies \(\frac {\partial p}{\partial t} = 0\)).
Here c is the speed of sound. In the nearfield, the background flow quantities u _{i} and p are calculated from (2), whereas they are assumed to be constant in the acoustic farfield. The respective constant value is read from the coupling interface with the nearfield, which implies that the interface has to be chosen such that the background flow values are (almost) constant at the coupling interface. In both cases, the coupling between backgroundflow and acoustic perturbations is unidirectional from the background flow to the acoustic equations (4) by means of p and u _{i}.
Elastic Structure
The structural subdomain Ω^{S} is governed by the equations of motion, here in Lagrangian description:
With \(x^{\text{S}}_i = X^{\text{S}}_i + \vartheta _i\) being the position of a particle in the current configuration, \(X^{\text{S}}_i\) is the position of a particle in the reference configuration, and 𝜗 _{i} the displacement. F _{ij} is the deformation gradient. S _{ij} is the second PiolaKirchhoff tensor, and ρ ^{S} describes the structural density. The Cauchy stress tensor \(\tau _{ij}^{\text{S}}\) relates to S _{ij} via
We assume linear elasticity to describe the stressstrain relation.
The coupling between fluid and structure is bidirectional by means of dynamic and kinematic conditions, i.e., equality of interface displacements/velocities and stresses, i.e.,
at Γ^{I} = Γ^{S} ∩ Γ^{F} with Γ^{F} = ∂ Ω^{F} and Γ^{S} = ∂ Ω^{S}.
3 Solvers and Their Optimization
Following a partitioned approach, the respective subdomains of the multiphysics model as described in Sect. 2 (elastic structure domain, nearfield, and farfield) are treated by different solvers. We employ the flow solver FASTEST presented in Sect. 3.1 to solve for the incompressible flow equations, Eq. (2), and nearfield acoustics equations, Eq. (4), the Ateles solver described in Sect. 3.2 for the farfield acoustics equations, Eq. (4), and finally the structural solver CalculiX introduced in Sect. 3.3 for the deformation of the obstacle, Eq. (5). For performance optimization of FASTEST and Ateles, we make use of the Xevolver framework, which has been developed to separate systemspecific performance concerns from application codes. We report on the optimization of both solvers further below.
3.1 FASTEST
FASTEST is used to solve both the incompressible NavierStokes equations (2) and the linearized Euler equations (4) in the nearfield.
Capabilities and Numerical Methods
The flow solver FASTEST [24] solves the threedimensional incompressible NavierStokes equations. The equations are discretized utilizing a secondorder finitevolume approach with implicit timestepping, which is also second order accurate. Field data are evaluated on a nonstaggered, bodyfitted, and blockstructured grid. The equations are solved according to the SIMPLE scheme [11], and the resulting linear equation system is solved by ILU factorization [36]. Geometrical multigrid is employed for convergence acceleration. The code generally follows a hybrid parallelization strategy employing MPI and OpenMP. FASTEST can account for different flow phenomena, and has the capability to model turbulent flow with different approaches. In our test case example, we employ a detachededdy simulation (DES) based on the ζ −f turbulence model [30].
In addition, FASTEST contains a module to solve the linearized Euler equations to describe low Mach number aeroacoustic scenarios, which are solved by a second order LaxWendroff scheme with various limiters.
Since all equation sets are discretized on the same numerical grid, advantage can be taken from the multigrid capabilities to account for the scale discrepancies of the fluid flow and the acoustics. Since the spatial scales of the acoustics are considerably larger than those of the flow, a coarser grid level can be used for them. In return, the finer temporal scales can be considered by subcycling a CFD time step with various CAA time steps. This way a very efficient implementation of the viscous/acoustic splitting approach can be realized.
Performance Optimization
Concerning performance optimization, one interesting point of FASTEST is that some of its kernels were once optimized for old vector machines, and thus important kernels have their vector versions in addition to the default ones. The main difference between the two versions is that nested loops in the default version are collapsed into one loop in the vector version. Since the loops skip accessing halo regions, the compiler is not able to automatically collapse the loops, resulting in short vector lengths even if the compiler can vectorize them. To efficiently run the solver on a vector system, performance engineers usually need to manually change the loop structures. In this project, Xevolver is used to express the differences between the vector and default versions as code transformation rules. In other words, vectorizationaware loop optimizations are expressed as code transformations. As a result, the default version can be transformed to its vector version, and the vector version does not need to be maintained any longer to achieve high performance on vector systems. That is, the FASTEST code can be simplified without reducing the vector performance by using the Xevolver approach. Ten rules are defined to transform the default kernels in FASTEST to their vector kernels. Those code transformations plus some systemindependent minor code modifications for removing vectorization obstacles can reduce the execution time on the NEC SXACE vector system by about 85%, when executing a simple test case that models a threedimensional Poiseuille flow through a channel based on the NavierStokes equations, in which the mesh contains two blocks with 426,000 cells each. The code execution on the SXACE vector processor works about 2.7 times faster than on the Xeon E52695v2 processor, since the kernel is memoryintensive and the memory bandwidth of SXACE is 4× higher than that of Xeon. Therefore, it is clearly demonstrated that the Xevolver approach is effective to achieve both high performance portability and high code maintainability for FASTEST.
3.2 Ateles
In our project, Ateles is used for the simulation of the acoustic farfield. Since acoustics scales need to be transported over a large distance, Ateles’ highorder DG scheme can show its particular advantages of low dissipation and dispersion error in this test case.
Capabilities and Numerical Methods
The solver Ateles is integrated in the simulation framework APES [31]. Ateles is based on the Discontinuous Galerkin (DG) discretization method, which can be seen as a hybrid method, combining the finitevolume and finiteelement methods. DG is well suited for parallelization and the simulation of aeroacoustic problems, due to its inherent dissipation and dispersion properties. This method has several outstanding advantages, that are among others the highorder accuracy, the faster convergence of the solution with increasing scheme order and fewer elements compared to a low order scheme with a higher number of elements, the local hp refinement as well as orthogonal hierarchical bases. The DG solver Ateles includes different equation systems, among others the compressible NavierStokes equations, the compressible inviscid Euler equations and the linearized Euler equations (used in this work for the acoustics farfield). For the time discretization, Ateles makes use of the explicit RungeKutta time stepping scheme, which can be either second or fourth order.
Performance Optimization
Analyzing the performance of Ateles, originally developed assuming x86 systems, we found out that four kinds of code optimization techniques are needed for a total of 18 locations of the code in order to migrate the code to the SXACE system. Those techniques are mostly for collapsing the kernel loop and also for directing the NEC compiler to vectorize the loop. In this project, all the techniques are expressed as one common code transformation rule. The rule can take the option to change its transformation behaviors appropriately for each code location. This means that, to achieve performance portability between SXACE and x86 systems, only one rule needs to be maintained in addition to the Ateles application code. We executed a small testcase solving Maxwell equations with an 8th order DG scheme on 64 grid cells. The code transformation leads to 7.5× higher performance. The significant performance improvement is attributed to loop collapse and insertion of appropriate compiler directives, which increases the vectorization length by a factor of 2 and the vectorization ratio from 71.35% to 96.72%. Finally, in terms of the execution time, the SXACE performance is 19% the performance of Xeon E52695v2. The code optimizations for SXACE reduce the performances of Xeon and Power8 by 14% and 6%, respectively. In this way, code optimizations for a specific system are often harmful to other systems. However, by using Xevolver, such a systemspecific code optimization is expressed separately from the application code. Therefore, the Xevolver approach is obviously useful for achieving high performance portability across various systems without complicating the application code.
3.3 CalculiX
As structure solver, we use the wellestablished finite element solver CalculiX[14], developed by Guido Dhont und Klaus Wittig.^{Footnote 2} While CalculiX also supports static and thermal analysis, we only use it for dynamic nonlinear structural mechanics. As our main research focus is not the structural computation per se, but the coupling within a fluidstructureacoustic framework, we merely regard CalculiX as a black box. The preCICE adapter of CalculiX has been developed in [40].
4 A BlackBox Partitioned Coupling Approach Using preCICE
Our first and general coupling approach for the threefield simulation comprising (a) the elastic structure, (b) the nearfield flow with acoustic equations, and (c) the farfield acoustic propagation follows a blackbox idea, i.e., we only use input and output data of dedicated solvers at the interfaces between the respective domains for numerical coupling. Such a blackbox coupling requires three main functional coupling components: intercodecommunication, datamapping between nonmatching grids of independent solvers, and iterative coupling in cases with strong bidirectional coupling. preCICE is an open source library^{Footnote 3} that provides software modules for all three components. In the first phase of the ExaFSA project, we ported preCICE from a serverbased to a fully peertopeer communication architecture [9, 39], increasing the scalability of the software from moderately to massively parallel. To this end, all coupling numerics needed to be parallelized on distributed data. During the second phase of the ExaFSA project, we focused on several costly initialization steps and further necessary algorithmic optimizations. In the following, we shortly sketch all components of preCICE with a particular focus on innovations introduced in the second phase of the ExaFSA project and on the actual realization of the fluidacoustic coupling between nearfield and farfield and the fluidstructure coupling.
4.1 (Iterative) Coupling
To simulate fluidstructureacoustic interactions such as in the scenario shown in Fig. 1, two coupling interfaces have to be considered with different numerical and physical properties: (a) the coupling between fluid flow and the elastic structure requires an implicit bidirectional coupling, i.e., we exchange data in both directions and iterate in each time step until convergence; (b) the coupling between fluid flow and the acoustic farfield is unidirectional (neglecting reflections back into the nearfield domain), i.e., results of the nearfield fluid flow simulation are propagated to the farfield solver as boundary values once per time step. In order to fulfil the coupling conditions at the fluidstructure interface as given in Sect. 2, we iteratively solve the fixedpoint equation
where f represents the stresses, u the velocities at the interface Γ_{FS}, S the effects of the structure solver on the interface (with stresses as an input and velocities as an output), F the effects of the fluid solver on the interface (with interface velocities as an input and stresses as an output). preCICE provides a choice of iterative methods accelerating the plain fixedpoint iteration on Eq. (8). The most efficient and robust schemes are our quasiNewton methods that are provided in a linear complexity (in terms of interface degrees of freedom) and fully parallel optimized versions [35]. As most of our achievements concerning iterative methods fall within the first phase of the ExaFSA project, we omit a more detailed description and refer to previous reports instead [9].
For the unidirectional coupling between the fluid flow in the nearfield and the acoustic farfield, we transfer perturbation in density, pressure, and velocity from the flow domain to the farfield as boundary conditions at the interface. We do this once per acoustic time step, which is chosen to be the same for nearfield and farfield acoustics, but which is much smaller than the fluid time step size (and the fluidstructure coupling), as described in Sect. 3.1.
Both domains are timedependent and subject to mutual influence.
In an aeroacoustic setting, the nearfield subdomain Ω^{NA} and farfield subdomain Ω^{FA}, with boundaries Γ^{NA} = ∂ Ω^{NA} and Γ^{FA} = ∂ Ω^{FA} are fixed, which means, all background information in the farfield are fixed to a certain value. Therefore there is only influence of Ω^{NA} onto Ω^{FA}, as backward propagation can be neglected. Then the continuity of shared state variables on the interface boundary Γ^{IA} = Γ^{NA} ∩ Γ^{FA} is
4.2 Data Mapping
Our three solvers use different meshes adapted to their specific problem domain. To map data between the meshes, preCICE offers three different interpolation algorithms: (a) Nearestneighbor interpolation is based on finding the geometrically nearest neighbor, i.e. the vertex with the shortest distance from the target or source vertex. It excels in its ease of implementation, perfect parallelizability, and low memory consumption. (b) Nearestprojection mapping can be regarded as an extension to the nearestneighbor interpolation, working on nearest mesh elements (such as edges, triangles or quads) instead of merely vertices and interpolating values to the projection points. The method requires a suitable triangulation to be provided by the solver. (c) Interpolation by radialbasis functions is provided. This method works purely on vertex data and is a flexible choice for arbitrary mesh combinations with overlaps and gaps alike.
In the second phase of the ExaFSA project, we improved the performance of the data mapping schemes in various ways. All three interpolation algorithms contain a lookupphase which searches for vertices or mesh elements near a given set of positions. As there is no guarantee regarding ordering of vertices, this resulted in \(\mathcal {O}\left (n \cdot m\right )\) lookup operations, being the size of the respective meshes. In the second phase, we introduced a treebased data structure to facilitate efficient spatial queries. The implementation utilizes the library Boost Geometry ^{Footnote 4} and uses an rtree in conjunction with the rstar insertion algorithm. The integration of the tree is designed to fit seamlessly into preCICE and avoids expensive copy operations for vertices and mesh elements of higher dimensionality. Consequently, the complexity of the lookupphase was reduced to \(\mathcal {O}\left (\log _a n\right ) \cdot m\) with a being a parameter of the tree, set to ≈5. The tree index is used by nearestneighbor, nearestprojection, and RBF interpolation as well as other parts in preCICE and provides a tremendous speedup in the initialization phase of the simulation.
In the course of integrating the index, the RBF interpolation profited from a second performance improvement. In contrast to the nearestneighbor and nearestprojection schemes it creates an explicit interpolation matrix. Setting values one by one results in a large number of small memory allocations with a relatively large percall overhead. To remedy this, a preallocation pattern is computed with the help of the tree index. This results in a single memory allocation, speeding up the process of filling the matrix. A comparison of the accuracy and runtime of the latter two interpolation methods is provided in Sect. 5.
4.3 Communication
Smart and efficient communication is paramount in a partitioned multiphysics scenario. As preCICE is targeted at HPC systems, a central communication instance would constitute a bottleneck and has to be avoided. At the end of phase one, we implemented a distributed application architecture. The main objective in its design is not a classical speedup (as it is for parallelism) but not to deteriorate the scalability of the solvers and rendering a central instance unnecessary. Still, a socalled master process exists, which has a special purpose mainly during the initialization phase.
At initialization time, each solver gives its local portion of the interface mesh to preCICE. By a process called repartitioning, the mesh is transferred to the coupling partner and partitioned there, i.e., the coupling partner’s processes select interface data portions that are relevant for their own calculations. The partitioning pattern is determined by the requirements of the selected mapping scheme. The outcome of this process is a sparse communication graph, where only links between participants exist that share a common portion of the interface. While this process was basically in place at the end of phase one, it was refined in several ways.
MPI connections are managed by means of a communicator which represents an ntom connection including an arbitrary number of participants. The first implementation used only one communication partner per communicator, essentially creating only 1to1 connections. To establish the connections, every connected pair of ranks had to exchange a connection token generated by the accepting side. This exchange is performed using the network file system, as the only apriori existing communication space common to both participants. However, network file systems tend to perform badly with many files written to a single directory. To reduce the load on the file system, a hashbased scheme was introduced as part of the optimizations in phase two. With that, writing of the files is distributed among several directories, as presented in [26]. This scheme features a uniform distribution of files over different directories and, thus, minimizes the files per directory.
However, this obviously resulted in a large number of communicators to be created. As a consequence, large runs hit system limits regarding the number of communicators. Therefore, a new MPI communication scheme was created as an alternative. It uses only one communicator for an alltoall communication, resulting in significant performance improvements for the generation of the connections. This approach also solves the problem of the high number of connection tokens to be published, though only for MPI. As MPI is not always available or the implementation is lacking, the hashbased scheme of publishing connection tokens is still required for TCP based connections.
4.4 Load Balancing
In a partitioned coupled simulation solvers need to exchange boundary data at the beginning of each iteration, which implies a synchronization point. If computational cores are not distributed in an optimal way among solvers, one solver will have to wait for the other one to finish its time step. Thus, the load imbalance reduces the computational performance. In addition, in a one way coupling scenario, if the data receiving solver is much slower than the other one, the sending partner has to wait until the other one is ready to receive (in synchronized communication) or store the data in a buffer (in asynchronous communication). In the first phase, the distribution of cores over solvers was adjusted manually and only synchronized communication was implemented, resulting in idle times.
Regression Based Load Balancing
We use the load balancing approach proposed in [37] to find the optimal core distribution among solvers: we first model the solver performance against the number of cores for each domain and then optimize the core distribution to minimize the waiting time. Since mathematical modeling of the solvers’ performance can be very complicated, we use an empirical approach as proposed in [37], first introduced in [10], to find an appropriate model.
Assuming we have a given set of m data points, consisting of pairs (p, f _{p}) mapping the number of ranks p to the runtime f _{p}, we want to find a function f(p) which predicts the runtime against p. Therefore, we use the Performance Model Normal Form (PMNF) [10] as a basis for our prediction model:
where the superscript i denotes the respective solver, n is a apriori chosen number of terms, and c _{k} is the coefficient for the kth regression term. The next step is to optimize the core distribution such that we achieve minimal overall run time which can be expressed by the following optimization problem:
This optimization problem is a nonlinear, possibly nonconvex integer program. It can be solved by the use of branch and bound techniques. But, if we assume that the f ^{i} are all monotonically decreasing, i.e., assigning more cores to a solver never increases the runtime, we can simplify the constraints to \(P = \sum _{i = 0}^{l}p_i\) and solve the problem by bruteforcing all possible choices for p _{i}. That is, we iterate over all possible combinations of core numbers and choose the pair that minimizes the total runtime. For more details, please refer to [37].
Asynchronous Communication and Buffering
For our fluidstructureacoustic scenario shown in Fig. 1, we perform an implicitly coupled simulation of the elastic structure interacting with the incompressible flow over a given discrete time step (marked simply as ‘Fluid’ in Fig. 2). This is followed by many small time steps for the acoustic wave propagation in the nearfield, which are coupled in a loose, unidirectional way to the farfield acoustic solver (executing the same small time steps). To avoid waiting times of the farfield solver while we compute the fluidstructure interactions in the nearfield, we would like to ‘stretch’ the farfield calculations such that they consume the same time as the sum of fluidstructure time steps and acoustic steps in the nearfield (see Fig. 2). To achieve this, we introduced a fully asynchronous buffer layer, by which the sending participant was decoupled from the receiving participant, as shown in Fig. 2. Special challenges to tackle were the preservation of the correct ordering of messages, especially for TCP communication which does not implement such guarantees in the protocol.
4.5 Isolated Performance of preCICE
In this section, we show numerical results for preCICE only. This isolated approach is used to show the efficiency of the communication initialization. In addition, we show standalone upscaling results. Other aspects are considered elsewhere: (a) the mapping accuracy is analyzed in Sect. 5, (b) the effectiveness of our load balancing approach as well as the buffering for unidirectional coupling are covered in Sect. 6. If not denoted otherwise, the following measurements are performed on the supercomputing systems SuperMUC^{Footnote 5} and HazelHen.^{Footnote 6}
Mapping Initialization: Preallocation and Matrix Filling
As described previously, one of the key components of mapping initialization is the spatial tree which allows for performance improvements by accelerating the interpolation matrix construction. Figure 3 compares different approaches to matrix filling and preallocation: (a) no preallocation: using no preallocation at all, i.e., allocating each entry separately, (b) explicitly computed: calculate matrix sparsity pattern in a first mesh traversal, allocate entries afterwards, and finally fill the matrix in a second mesh traversal, (c) computed and saved: additionally cache mesh element/data point relations from the first mesh traversal and use them in the second traversal to fill the matrix with less computation, (d) spatial tree: use the spatial tree instead of bruteforce pairwaise comparisons to determine mesh components relevant for the mapping. Each method can be considered as an enhancement of the previous one. As it becomes obvious from Fig. 3, the spatial tree was able to provide us a with an acceleration of more than two orders of magnitude.
Communication
For communication and its initialization, we only present results for the new singlecommunicator MPI based solution. For TCP socket communication that still requires the exchange of many connection tokens by means of the file system, we only give a rough factor of 2.5 that we observed in terms of acceleration of communication initialization. Note that this factor can be potentially higher as the number of processes and, thus, connections grows larger, and that the hashbased approach removed the hard limit of ranks per participant inherent to the old approach.
In Figs. 4, 5 and 6, we compare performance results for establishing an MPI connection among different ranks using manycommunicators for 1to1 connections with using a single communicator representing an ntom connection. In our academic setting, both Artificial Solver Testing Environment (ASTE) participants run on n cores. On SuperMUC, each rank connects to 0.4n ranks, on HazelHen, with a higher number of ranks per node, each rank connects to 0.3n ranks. The amount of data transferred between each connected pair of ranks is held constant with 1000 rounds of transfer of an array of 500, and 4000 double values from participant B to participant A. Each measurement is performed five times of which the fastest and the slowest runs are ignored and the remaining three are averaged. We present timings from rank zero, which is synchronized with all other ranks by a barrier, making the measurements from each rank identical. Note, that the measurements are not directly comparable between SuperMUC and HazelHen due to the different number of cores per node and that the test case is even more challenging than actual coupled simulations. In an actual simulation, the number of partner ranks per rank of a participant is constant with increasing number of cores on both sides.
Figure 4 shows the time to publish the connection token. The old approach requires to publish many tokens, which obviously becomes a performance bottleneck as the simulation setup moves to higher number of ranks. The new approach, on the other hand, only publishes one token. It is omitted in the plot, as the times are negligible (<2 ms). In Fig. 5, the time for the actual creation of the communicator is presented. The total number of communication partners per communicator is smaller with the old manycommunicator concept (as the communication topology is sparse). However, the creation of many 1to1 communicators is substantially slower than the creation of one alltoall communicator for both HPC systems. Finally, in Fig. 6 the performance for an exchange of data sets of two different sizes is presented. The results for single and manycommunicator approaches are mostly on par with the notable exception of the SuperMUC system. There, the new approach suffers a small but systematic slowdown for small message sizes. We argue that this is a result of vendor specific settings of the MPI implementation.
Data Mapping
As described above, we have further improved the mapping initialization, in particular by applying a treebased approach to identify data dependencies induced by the mapping between grid points of the nonmatching solver grids and to assemble the interpolation matrix for RBF mapping. Accordingly, we show both the reduction of the matrix assembly runtime (Fig. 3) and the scalability of the mapping, including setting up the interpolation system and the communication initialization.
These performance tests of preCICE are measured using a special testing application called ASTE.^{Footnote 7} This application behaves like a solver to preCICE but provides artificial data. It is used to quickly generate input data and decompose it for upscaling tests. ASTE generates uniform, rectangular, twodimensional meshes on [0, 1] × [0, 1] embedded in threedimensional space with the zdimension always set to zero. The mesh is then decomposed using a uniform approach, thus producing partitions of same size as far as possible. Since we mainly look at the mapping part which is only executed as one of the participants, we limit the upscaling to this participant. The other participant always uses one node (28 resp. 24 processors). The mesh size is kept constant, i.e., we perform a strong scaling. The upscaling of an RBF mapping with Gaussian basis functions is shown in Fig. 7.
5 BlackBox Coupling Versus WhiteBox Coupling with APESMate
In the above section, we have evaluated the performance of the blackbox coupling tool preCICE. In this section, we introduce an alternative approach that allows to couple different solvers provided within the framework APES [31]. Blackbox data mapping in preCICE only requires point values (nearest neighbor and RBF mapping), and in some cases (nearest projection) connectivity information on the coupling interface. The whitebox coupling approach of APESmate [25] has knowledge about the numerical schemes within the domain, since it is integrated in the APES suite, and has access to the common datastructure TreELM [22]. APESmate can directly evaluate the high order polynomials of the underlying Discontinuous Galerkin scheme. Thus, the mapping in preCICE is more generally applicable, while the approach in APESmate is more efficient in the context of high order scheme. Furthermore, APESmate allows the coupling of all solvers of the APES framework, both in terms of surface and in terms of volume coupling. The communication between solvers can be done in a straightforward way as all coupling participants can be compiled as modules into one single application. Each subdomain defines its own MPI subcommunicator, a global communicator is used for the communication between the subdomains. During the initialization process, coupling requests are locally gathered from all subdomains and exchanged in a roundrobin fashion. As all solvers in APES are based on an octree datastructure and a spacefilling curve for partitioning, it is rather easy to get information about the location of each coupling point on the involved MPI ranks. In the following, we compare both accuracy and runtime of the two coupling approaches for a simple academic test case that allows to control the ‘difficulty’ of the mapping by adjusting order and resolution of the two participants.
Test Case Setup
We consider the spreading of a Gaussian pressure pulse over a cubic domain of size 5 × 5 × 5 unit length, with an ambient pressure of 100,000 Pa and a density of 1.0 kg/m^{3}. The velocity vector is set to 0.0 for all spatial directions. To generate a reference solution, this test case is computed monolithically using the inviscid Euler equations.
For the coupled simulations, we decompose the monolithic test case domain into an inner and an outer domain. The resolution and the discretization order of the inner domain are kept unchanged. In the outer domain, we choose the resolution and the order such that the error is balanced with that of the inner domain. See [15] for the respective convergence study. To be able to determine the mapping error at the coupling interface between inner and outer domain, we choose the time horizon such that the pressure pulse reaches the outer domain, but is still away from the outer boundaries to avoid any influences from the outer boundaries. The test case is chosen in a way, that the differences between the meshes at the coupling interface increase, thus increasing the difficulty to maintain the overall accuracy in a blackbox coupling approach. Table 1 provides an overview of all combinations of resolution and order in the outer domain used for our numerical experiments, where the total number of elements per subdomain is given as nElements, the number of coupling points with nCoupling points and the scheme order by nScheme order, respectively. For time discretization, we consider the explicit two stage RungeKutta scheme with a time step size of 10^{−6} for all simulations.
Mapping Accuracy
In terms of mapping accuracy, it is expected, that the APESmate coupling is orderpreserving, and by that not (much) affected by the increasing differences between the nonmatching grids at the coupling interface, while preCICE should show an increasing accuracy drop when the points become less and less matching. This is the case for increasing order of the discretization in the outer domain. Figure 8 illustrates first results. As can be clearly seen, the whitebox coupling approach APESmate provides outstanding results by maintaining the overall accuracy of the monolithic solution for all different variations of the coupled simulations, independent of the degree to which the grids are nonmatching (increasing with increasing order used in the outer domain). For the interpolation methods provided by preCICE, the error increases considerably with increasing differences between the grids at the interface. As the error of the interpolation methods depends on the distances of the points (see Fig. 8), the error is dominated by the large distance of the integration points in the middle of the surface of an octree grid cell in the High Order Discontinuous Galerkin discretization.
Accuracy Improvement by Regular Subsampling
We can decrease the L2 error of NP and RBF and improve the solution of the coupled simulation by providing values at equidistant points on the Ateles side as interpolation support points. The number of equidistant points is equal to the number of coupling points, hence as high as the scheme order. With this new implementation, the error shown in Fig. 9a decreases considerably compared to the results in Fig. 8. We achieve an acceptable accuracy for all discretization order combinations. However, the regular subsampling of values in the Ateles solver increases the overall computational time substantially as can be seen in Fig. 9b.
To improve the NP interpolation, it turned out that in addition to providing equidistant points, oversampling was required to increase the accuracy. Our investigation showed, that an oversampling factor of 3 is needed to achieve almost the same accuracy as APESmate. In spite of the additional cost of many newly generated support points, the runtime does not increase as much as for RBF, since for the RBF a linear equation system has to be computed, while for NP a simple projection needs to be done.
Summary and Runtime Comparison
Figure 10 shows a summary of all tested methods for the interpolation/evaluation before and after improvements. The integrated coupling approach APESmate provides not just very accurate results, but also low runtimes. At this point, we want to recall that this is as expected—the whitebox approach makes use of all internal knowledge, which gives it advantages in terms of accuracy and efficiency. On the other hand, this internal knowledge binds it to the solvers available in the framework, while preCICE can be applied to almost all available solvers. Further details regarding this investigation can be found in [15, 16].
6 Results
This section presents a more realistic test case, the turbulent flow over a fence, to assess the overall performance of our approach. Analyses for accuracy and specific isolated aspects are integrated in the sections above.
6.1 Flow over a Fence Test Case Setup
As a test case to assess the overall scalability, we simulate the turbulent flow over a (flexible) fence and the induced acoustic farfield as already shown schematically in Fig. 1. The FSI functionality of FASTEST has been demonstrated earlier many times, e.g. in [34]. Thus we focus on the acoustic coupling.
As boundary conditions, we use a noslip wall at the bottom and the fence surface, an inflow on the left with u _{bulk}, outflow convective boundary conditions on the right, periodic boundary conditions in ydirections, and slip conditions at the upper boundary for the nearfield flow. For the acoustic perturbation, we apply reflection conditions at the bottom and the fence surface, zerogradient condition at all other boundaries. The acoustic farfield solver uses Dirichlet boundary conditions at its lower boundary (see also Eq. (9)). Therefore, the upper nearfield boundary is not the coupling interface, but we instead overlap nearfield and farfield as shown in Fig. 11. Figures 12 and 13 show a snapshot of the nearfield flow and the nearfield and farfield acoustic pressure, respectively.
6.2 FluidAcoustics Coupling with FASTEST and Ateles
To demonstrate the computational performance of our framework using FASTEST for the flow simulation in the nearfield, the highorder DG solver Ateles for the farfield acoustic wave propagation, and preCICE for coupling, we show weak scalability measurements for the interaction between nearfield flow simulation and farfield acoustics. We keep both the mesh and the number of MPI ranks in the nearfield flow simulation fixed. In the farfield computed with Ateles, we refine the mesh to better capture the acoustic wave propagation. We use a multilevel mesh with a fine mesh at the coupling interface to allow a smooth solution at the coupling interface between the nearfield and the farfield. We refine the farfield mesh in two main steps: in the first step, we only refine the mesh at the coupling interface. In the next step, we first refine the whole mesh, and again the mesh at the coupling interface in the third and fourth step. Due to the refinement at the coupling interface, the number of Ateles ranks participating in the interface increases such that this study also shows that the preCICE communication does not deteriorate scalability. Table 2 gives an overview of the configurations used for the weak scaling study.
To find the optimal core distribution for all setups, the load balancing approach proposed in Sect. 4 is used. This analysis shows that for the smallest mesh resolution with 24,864 elements in the farfield, the optimal core distribution is 424 cores for the nearfield domain and 196 cores for the farfield. For all other setups, we assume perfect scalability, i.e., we choose the number of cores proportional to the number of degrees of freedom in the weak scaling study and increase the number of cores simultaneously by a factor of two in both fields for strong scaling. The scalability measurements are shown in Fig. 14. The results show that the framework scales almost perfectly up to 6528 cores.
6.3 FluidAcoustics Coupling with Only Ateles
In Sect. 5 we investigated the suitability of different interpolation methods for our simulations. In this section, we present a strong scaling study for an AtelesAteles coupled simulation of the flow over a fence test case. The fence is modelled in Ateles using the newly implemented immersed boundary method, enabling highorder representation of complex geometries in Ateles [1]. We solve the compressible NavierStokes equations in the flow domain with a scheme order of 4 and a four step mixed implicitexplicit RungeKutta time stepping scheme, with a time step size of 10^{−7}. The total number of elements in the flow domain is 192,000. For the farfield, we use the same setup as for the FASTESTAteles coupling.
The linearized Euler equations in the farfield can be solved in a DG setting in the modal formulation, which makes the solver very cheap even for very high order. In the nearfield domain, the nonlinear NavierStokes equations are solved with a more expensive hybrid nodalmodal approach. Due to this, and the different spatial discretizations and scheme orders, both domains have different computational load, which requires load balancing. We use static load balancing, since neither the mesh nor the scheme order vary during runtime. As both solvers are instances of Ateles, we apply the SpartA algorithm [21], which allows repartitioning of the workload according to weights per elements, which are computed during runtime. Those weights are then used to redistribute the elements according to the workload among available processes (see [17] for more details). The total number of processes used for this test case are 14,336 processes, which is equal to one island on the system. As mentioned previously, the total workload per subdomain does not change, therefore we start our measurements by providing the lower subdomain 100 processes and the upper subdomain 12 processes, which is equal to 4 nodes on the system. This number per subdomain is then doubled for each run, the ratio is kept the same.
Figure 15 shows the strong scaling measurements for both coupling approaches (APESmate and preCICE) executed on the SuperMUC Phase2 system. As can be clearly seen, both coupling setups AtelesAPESmateAteles and AtelespreCICEAteles scale almost ideally, however with a lower absolute runtime for the APESmate coupling as expected.
7 Summary and Conclusion
We have presented a partitioned simulation environment for the massively parallel simulation of fluidstructureacoustic interactions. Our setup uses the flow and acoustic solvers in the finite volume software FASTEST, the acoustic solvers in the discontinuous Galerkin framework Ateles as well as the blackbox fully parallel coupling library preCICE. In particular, we could show that with a careful design of the coupling tool as well as of solver details, we can achieve a bottleneckfree numerically and technically highly scalable solution. It turned out that efficient initialization of pointtopoint communication relations and mapping matrices between the involved participants, sophisticated intercode load balancing and asynchronous communication using message buffering are crucial for largescale scenarios. With these improvements, we advanced the limits of scalability of partitioned multiphysics simulations from less than a hundred cores to more than 10,000 cores. Beyond that, we reach a problem size that is not required by the given problem as well as scalability limits of the solvers. The coupling itself is not the limiting factor for the given problem size and degree of parallelism. To be able to use also vector architectures in an efficient sustainable way, we adapted our solvers with a highly effective code transformation approach.
Notes
 1.
To capture the moving structure within the nearfield, we actually formulate all nearfield equations in an arbitrary LagrianEulerian perspective. For the relative mesh velocity, we use a blockwise elliptic mesh movement as described in [30]. As we do not show fluidstructure interaction in this contribution, however, we formulate all nearfield equations in a pure Eulerian perspective for the sake of simplicity.
 2.
 3.
 4.
 5.
28× IntelXeonE52697 cores, 64 GB memory per node.
 6.
24× Intel XeonE52680 cores, 128 GB memory per node.
 7.
References
Anand, N., Pour, N.E., Klimach, H., Roller, S.: Utilization of the brinkman penalization to represent geometries in a highorder discontinuous Galerkin scheme on octree meshes. Symmetry 11(9), 1126 (2019). https://doi.org/10.3390/sym11091126
Banks, J.W., Henshaw, W.D., Schwendeman, D.W.: An analysis of a new stable partitioned algorithm for FSI problems. Part I: incompressible flow and elastic solids. J. Comput. Phys. 269, 108–137 (2014)
Bazilevs, Y., Takizawa, K., Tezduyar, T.E.: Computational FluidStructure Interaction: Methods and Applications. Wiley, Hoboken (2012)
Blom, D., Ertl, T., Fernandes, O., Frey, S., Klimach, H., Krupp, V., Mehl, M., Roller, S., Sternel, D.C., Uekermann, B., Winter, T., van Zuijlen, A.: Partitioned fluidstructureacoustics interaction on distributed data – numerical results and visualization. In: Bungartz, H.J., Neumann, P., Nagel, E.W. (eds.) Software for Exascale Computing – SPPEXA 2013–2015. Springer, Berlin (2016)
Boer, A.D., van Zuijlen, A., Bijl, H.: Comparison of conservative and consistent approaches for the coupling of nonmatching meshes. Comput. Methods Appl. Mech. Eng. 197(49), 4284–4297 (2008)
Bungartz, H., Mehl, M., Schäfer, M.: Fluid Structure Interaction II: Modelling, Simulation, Optimization. Lecture Notes in Computational Science and Engineering. Springer, Berlin (2010)
Bungartz, H.J., Lindner, F., Mehl, M., Uekermann, B.: A plugandplay coupling approach for parallel multifield simulations. Comput. Mech. 55(6), 1119–1129 (2015)
Bungartz, H.J., Lindner, F., Gatzhammer, B., Mehl, M., Scheufele, K., Shukaev, A., Uekermann, B.: preCICE – a fully parallel library for multiphysics surface coupling. Comput. Fluids 141, 250–258 (2016). Advances in FluidStructure Interaction
Bungartz, H.J., Lindner, F., Mehl, M., Scheufele, K., Shukaev, A., Uekermann, B.: Partitioned fluidstructureacoustics interaction on distributed data – coupling via preCICE. In: Bungartz, H.J., Neumann, P., Nagel, E.W. (eds.) Software for Exascale Computing – SPPEXA 2013–2015. Springer, Berlin (2016)
Calotoiu, A., Beckinsale, D., Earl, C.W., Hoefler, T., Karlin, I., Schulz, M., Wolf, F.: Fast multiparameter performance modeling. In: 2016 IEEE International Conference on Cluster Computing (CLUSTER), pp. 172–181 (2016). https://doi.org/10.1109/CLUSTER.2016.57
Caretto, L.S., Gosman, A.D., Patankar, S.V., Spalding, D.B.: Two calculation procedures for steady, threedimensional flows with recirculation. In: Proceedings of the Third International Conference on Numerical Methods in Fluid Dynamics. Springer, Paris (1972)
Crosetto, P., Deparis, S., Fourestey, G., Quarteroni, A.: Parallel algorithms for fluidstructure interaction problems in haemodynamics. SIAM J. Sci. Comput. 33(4), 1598–1622 (2011)
Degroote, J., Bathe, K.J., Vierendeels, J.: Performance of a new partitioned procedure versus a monolithic procedure in fluidstructure interaction. Comput. Struct. 87(11–12), 793–801 (2009)
Dhondt, G.: The Finite Element Method for ThreeDimensional Thermomechanical Applications. Wiley, Chichester (2004)
Ebrahimi Pour, N., Roller, S.: Error investigation for coupled simulations using discontinuous galerkin method for discretisation. In: Proceedings of ECCM VI/ECFD VII, Glasgow, June 2018
Ebrahimi Pour, N., Krupp, V., Klimach, H., Roller, S.: Coupled simulation with two coupling approaches on parallel systems. In: Resch, M.M., Bez, W., Focht, E., Gienger, M., Kobayashi, H. (eds.) Sustained Simulation Performance 2017, pp. 151–164. Springer International Publishing (2017). https://doi.org/10.1007/9783319668963_10
Ebrahimi Pour, N., Krupp, V., Roller, S.: Load balancing for coupled simulations. In: Resch, M.M., Bez, W., Focht, E., Gienger, M., Kobayashi, H. (eds.) Sustained Simulation Performance 2019. Springer International Publishing, Cham (2019)
Fachgebiet für Numerische Berechnungsverfahren im Maschinenbau: FASTEST Manual (2005)
Gee, M.W., Küttler, U., Wall, W.A.: Truly monolithic algebraic multigrid for fluid–structure interaction. Int. J. Numer. Methods Eng. 85(8), 987–1016 (2011)
Haelterman, R., Degroote, J., van Heule, D., Vierendeels, J.: The quasiNewton least squares method: a new and fast secant method analyzed for linear systems. SIAM J. Numer. Anal. 47(3), 2347–2368 (2009)
Harlacher, D.F., Klimach, H., Roller, S., Siebert, C., Wolf, F.: Dynamic load balancing for unstructured meshes on spacefilling curves. In: 2012 IEEE 26th International on Parallel and Distributed Processing Symposium Workshops PhD Forum (IPDPSW), pp. 1661–1669 (2012)
Klimach, H.G., Hasert, M., Zudrop, J., Roller, S.P.: Distributed octree mesh infrastructure for flow simulations. In: Eberhardsteiner, J. (ed.) ECCOMAS 2012  European Congress on Computational Methods in Applied Sciences and Engineering, eBook Full Papers (2012)
Kong, F., Cai, X.C.: Scalability study of an implicit solver for coupled fluidstructure interaction problems on unstructured meshes in 3D. Int. J. High Perform. Comput. Appl. 32, 207–219 (2016)
Kornhaas, M., Schäfer, M., Sternel, D.: Efficient numerical simulation of aeroacoustics for low Mach number flows interacting with structures. Comput. Mech. 55, 1143–1154 (2015). https://doi.org/10.1007/s0046601411141
Krupp, V., Masilamani, K., Klimach, H., Roller, S.: Efficient coupling of fluid and acoustic interaction on massive parallel systems. In: Sustained Simulation Performance 2016, pp. 61–81 (2016)
Lindner, F.: Data transfer in partitioned multiphysics simulations: interpolation & communication. PhD thesis, University of Stuttgart (2019)
Lindner, F., Mehl, M., Uekermann, B.: Radial basis function interpolation for blackbox multiphysics simulations. In: Papadrakakis, M., Schrefler, B., Onate, E. (eds.) VII International Conference on Computational Methods for Coupled Problems in Science and Engineering, pp. 1–12 (2017)
Link, G., Kaltenbacher, M., Breuer, M., Döllinger, M.: A 2D finiteelement scheme for fluidsolidacoustic interactions and its application to human phonation. Comput. Methods Appl. Mech. Eng. 198(41–44), 3321–3334 (2009)
Mehl, M., Uekermann, B., Bijl, H., Blom, D., Gatzhammer, B., van Zuijlen, A.: Parallel coupling numerics for partitioned fluidstructure interaction simulations. Comput. Math. Appl. 71(4), 869–891 (2016)
Reimann, T.: Numerische simulation von fluidstrukturinteraktion in turbulenten strömungen. Ph.D. thesis, Technische Universität Darmstadt (2013)
Roller, S., Bernsdorf, J., Klimach, H., Hasert, M., Harlacher, D., Cakircali, M., Zimny, S., Masilamani, K., Didinger, L., Zudrop, J.: An adaptable simulation framework based on a linearized octree. In: Resch, M., Wang, X., Bez, W., Focht, E., Kobayashi, H., Roller, S. (eds.) High Performance Computing on Vector Systems 2011, pp. 93–105. Springer, Berlin (2012)
Ross, M.R., Felippa, C.A., Park, K., Sprague, M.a.: Treatment of acoustic fluidstructure interaction by localized Lagrange multipliers: formulation. Comput. Methods Appl. Mech. Eng. 197(33–40), 3057–3079 (2008)
Schäfer, F., Müller, S., Uffinger, T., Becker, S., Grabinger, J., Kaltenbacher, M.: Fluidstructureacoustic interaction of the flow past a thin flexible structure. AIAA J. 48(4), 738–748 (2010)
Schäfer, M., Sternel, D.C., Becker, G., Pironkov, P.: Efficient numerical simulation and optimization of fluidstructure interaction. In: Bungartz, H.J., Mehl, M., Schäfer, M. (eds.) Fluid Structure Interaction II. Modelling, Simulation, Optimization. Lecture Notes in Computational Science and Engineering, vol. 73, pp. 131–158. Springer, Berlin (2010)
Scheufele, K., Mehl, M.: Robust multisecant quasiNewton variants for parallel fluidstructure simulations – and other multiphysics applications. SIAM J. Sci. Comput. 39(5), 404–433 (2017)
Stone, H.L.: Iterative solution of implicit approximations of multidimensional partial differential equations. SIAM J. Numer. Anal. 5(3), 530–558 (1968)
Totounferoush, A., Ebrahimi Pour, N., Schroder, J., Roller, S., Mehl, M.: A new load balancing approach for coupled multiphysics simulations (2019). https://doi.org/10.1109/IPDPSW.2019.00115
Turek, S., Hron, J., Madlik, M., Razzaq, M., Wobker, H., Acker, J.: Numerical simulation and benchmarking of a monolithic multigrid solver for fluidstructure interaction problems with application to hemodynamics. In: Bungartz, H.J., Mehl, M., Schäfer, M. (eds.) Fluid Structure Interaction II: Modelling, Simulation, Optimization, p. 432. Springer, Berlin (2010)
Uekermann, B.: Partitioned fluidstructure interaction on massively parallel systems. Ph.D. thesis, Department of Informatics, Technical University of Munich (2019)
Uekermann, B., Bungartz, H.J., Cheung Yau, L., Chourdakis, G., Rusch, A.: Official preCICE adapters for standard opensource solvers. In: 7th GACM Colloquium on Computational Mechanics, pp. 1–4 (2017)
Vierendeels, J., Lanoye, L., Degroote, J., Verdonck, P.: Implicit coupling of partitioned fluidstructure interaction problems with reduced order models. Comput. Struct. 85(11–14), 970–976 (2007)
Zudrop, J., Klimach, H., Hasert, M., Masilamani, K., Roller, S.: A fully distributed CFD framework for massively parallel systems. In: Cray User Group Conference. Cray User Group, Stuttgart (2012)
Acknowledgements
We thankfully acknowledge funding from SPPEXA, the German Science Foundation Priority Program 1648—Software for Exascale Computing and the European Union’s Horizon 2020 research and innovation program under the Marie SklodowskaCurie grant agreement No 754462, and compute time granted on the LRZ SuperMUC system and the HLRS HazelHen system.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2020 The Author(s)
About this paper
Cite this paper
Lindner, F. et al. (2020). ExaFSA: Parallel FluidStructureAcoustic Simulation. In: Bungartz, HJ., Reiz, S., Uekermann, B., Neumann, P., Nagel, W. (eds) Software for Exascale Computing  SPPEXA 20162019. Lecture Notes in Computational Science and Engineering, vol 136. Springer, Cham. https://doi.org/10.1007/9783030479565_10
Download citation
DOI: https://doi.org/10.1007/9783030479565_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 9783030479558
Online ISBN: 9783030479565
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)