TerraNeo—Mantle Convection Beyond a Trillion Degrees of Freedom
Abstract
Simulation of mantle convection on planetary scales is considered a grand-challenge application even in the exascale era. The reasons are the enormous spatial and temporal scales that must be resolved in the computation, the complexity of realistic models, and the large parameter uncertainties that must be handled by advanced numerical methods. This contribution reports on the TerraNeo project, which delivered novel matrix-free geometric multigrid solvers for the Stokes system that forms the core of mantle convection models. In TerraNeo the hierarchical hybrid grids paradigm was employed to demonstrate that scalability can be achieved when solving the Stokes system with more than ten trillion (1.1 ⋅ 10^{13}) degrees of freedom even on present-day peta-scale supercomputers. Novel concepts were developed to ensure resilience of the algorithms even in the case of hard faults, and new scheduling algorithms were proposed for the ensemble runs arising in multilevel Monte Carlo algorithms for uncertainty quantification. The prototype framework was used to investigate geodynamic questions such as high-velocity asthenospheric channels and dynamic topography, and to perform adjoint inversions. We also describe the redesign of our software to support more advanced discretizations, adaptivity, and highly asynchronous execution while ensuring sustainability and flexibility for future extensions.
1 Introduction and Motivation
Geodynamics
Mantle convection is a critical component of the Earth system. Although composed of rock, the Earth’s mantle behaves like a highly viscous fluid on geological time-scales. Its motion is driven by internal heating due to radioactive decay and by substantial heating from below; the latter stems from the release of primordial heat stored in the core since the time of the planet’s accretion. Mantle convection currents are largely responsible for many of Earth’s surface tectonic features, forming mountain chains and oceanic trenches through plate tectonics and contributing significantly to the accumulation of stresses released in inter-plate earthquakes. Hence, a thorough quantitative understanding of mantle convection is indispensable for gaining further insight into these processes. Beyond such fundamental questions, mantle convection also has a direct influence on societal and commercial issues. Viscous stresses caused by up- and downwellings in the mantle induce dynamic topography, i.e. they lead to elevation or depression of parts of Earth’s surface. Reconstructing the latter and the associated sea levels back in time is crucial for the localization of oil reservoirs and for estimating future sea-level rise caused by climate change.
Although the basic equations describing mantle convection, which result from the force balance between viscous and buoyancy forces together with conservation of mass and energy, are not in question, key system parameters, such as the buoyancies and viscosities, remain poorly known. In particular the rheology of the mantle, which is a fundamental input parameter for geodynamic models, is not well constrained. Studies based on modeling the geoid, e.g. [87], the convective planform, e.g. [21], glacial isostatic adjustment, e.g. [72, 80], true polar wander, e.g. [82, 86, 89], and plate motion changes, e.g. [57], consistently point to the need for a significant viscosity increase between the upper and the lower mantle. But the precise form of the viscosity profile remains uncertain. Commonly the proposed viscosity profiles display a peak in the mid-lower mantle [81, 85], involve a rheology contrast located around 1000 km depth [93], or favor an asthenosphere with a strong viscosity reduction to achieve high flow velocities and stress amplification in the sublithospheric mantle [51, 102]. Geodynamic arguments on the uncertainties and trade-offs in the viscosity profile of the upper mantle have recently been summarized by Richards and Lenardic [88].
TerraNeo
TerraNeo^{1} aims to design and realize a new community software framework for extreme-scale Earth mantle simulations. The project has a special constellation in which groups from geophysics, numerical mathematics, and high-performance computing collaborate in a unique co-design effort. The first 3-year funding period has been summarized in [7]. Initially, the team focused on fundamental mathematical questions together with general scalability and performance issues, as documented in [38, 40, 41, 42, 43]. In particular, it could be shown that even on peta-scale class machines, computations are possible that resolve the global Earth mantle with about 1 km resolution, requiring the solution of indefinite sparse linear systems with more than 10^{12} degrees of freedom. These very large computations and scaling experiments were performed with a prototype software implemented using the pre-existing hierarchical hybrid grids (HHG) multigrid library [12, 13]. This step was necessary to gain experience with parallel multigrid solvers for indefinite problems and the specific difficulties arising in Earth mantle models. While this approach led to quick first results, the prototype character of HHG implied several fundamental limitations. Therefore, the second funding period is being used for a fundamental redesign of HHG under the new acronym HyTeG, for Hybrid Tetrahedral Grids. The goal is to leverage the lessons learned with HHG for the design of a new, extremely fast, but also flexible, extensible, and sustainable geophysics software framework.
Additionally, research on several exascale topics was conducted in both funding periods. These include resilience, inverse problems, and uncertainty quantification. Detailed results on these topics, as well as on the new HyTeG software architecture, will be reported in the following sections of this report.
Multigrid Methods
Multigrid methods are among the fastest solvers for sparse linear systems that arise from the discretization of PDEs and are thus of central importance for exascale research. Their invention and popularization, e.g. in [17], constitute one of the major breakthroughs in numerical mathematics and computational science. To illustrate their relevance for exascale computing, we summarize here a thought experiment from [90].
A uniform discretization of the mantle at, for instance, 1 km resolution results in meshes with on the order of a trillion (10^{12}) elements, far beyond what conventional solvers can handle on the largest available supercomputers.
It is important to realize that parallel computing alone does not solve the problem: a solver with even mildly suboptimal complexity, say quadratic in the number of unknowns, would require on the order of 10^{24} operations for such a mesh. Even the fastest German supercomputer, currently SuperMUC-NG^{2}, would need more than a year of compute time (if it did not run out of memory first) to execute 10^{24} operations.
In the TerraNeo project we have demonstrated, as reported below, that a well-designed massively parallel multigrid method is indeed capable of solving such large systems with a trillion unknowns in compute times on the order of seconds. This rests fundamentally on the fast convergence of multigrid for many types of problems, which, combined with nested iteration in the form of a so-called full multigrid method [17, 41], leads to solvers of asymptotically optimal complexity. It is thus little surprise that multigrid methods are employed with excellent success in several of the SPPEXA projects, e.g. [3, 4, 24, 69, 75], for a wide variety of applications.
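To make the algorithmic idea concrete, the following minimal Python sketch implements a geometric multigrid V-cycle for the 1D Poisson equation. It is not TerraNeo's solver (the production codes are C++ and operate on 3D hierarchical hybrid grids); the weighted-Jacobi smoother, full-weighting restriction, and linear interpolation are standard illustrative choices.

```python
import numpy as np

def smooth(u, f, h, iters=2, omega=0.8):
    # Weighted Jacobi relaxation for the 1D Poisson stencil (-1, 2, -1)/h^2.
    for _ in range(iters):
        u[1:-1] += omega * 0.5 * (h * h * f[1:-1] + u[:-2] + u[2:] - 2.0 * u[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def v_cycle(u, f, h):
    n = u.size - 1
    if n == 2:                                  # coarsest grid: solve exactly
        u[1] = 0.5 * h * h * f[1]
        return u
    u = smooth(u, f, h)                         # pre-smoothing
    r = residual(u, f, h)
    rc = np.zeros(n // 2 + 1)                   # full-weighting restriction
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h)
    e = np.zeros_like(u)                        # linear interpolation of correction
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return smooth(u + e, f, h)                  # post-smoothing
```

Iterating the cycle reduces the residual by a roughly constant factor per cycle, independently of the mesh size; nesting it inside a full multigrid sweep yields the asymptotically optimal complexity referred to above.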
In the literature on multigrid methods of the past two decades, much attention has been given to so-called algebraic multigrid (AMG) methods, which do not operate on a mesh structure but attempt to construct the necessary hierarchy by analyzing the sparse system matrix. These methods can exhibit excellent parallel scalability [3] and are often used as components within other parallel solvers, such as domain decomposition methods [62]. AMG methods have several advantages, most importantly that they can be interfaced to an application via classical sparse matrix technology. They can thus relieve the application developer from worrying about the solution process, and many numerical analysts who design novel discretizations have adopted this perspective. However, AMG methods only work well for certain types of systems, and consequently users of these new discretization techniques may find themselves confronted with linear systems for which no efficient parallel algorithms are available yet. Few methods devised in the applied mathematics community have been demonstrated to solve even systems with 10^{9} degrees of freedom, three to four orders of magnitude behind what we can demonstrate here.
Additionally, the convenience of algebraic multigrid comes at the price of a loss in performance. AMG methods have only a limited scope of applicability, and they often lose an order of magnitude in performance in their expensive setup phase. Moreover, they are inherently not matrix-free. As we will discuss below, matrix-free methods are essential for reaching maximal performance: geometric multigrid methods can be realized as matrix-free algorithms, potentially gaining another order of magnitude in performance. For this reason we have invested heavily in new matrix-free methods [8, 9, 11], similar to other SPPEXA projects [5, 67].
The price of using geometric multigrid lies in the more complex algorithms which often have to be tailored carefully to each specific application, and the significantly more complex interfaces that are needed between the other software components and the solver. This in turn creates the need for new software technologies as are being developed in the HyTeG framework.
2 Basic Ideas and Concepts
Physical quantities and their representing symbols
| Symbol | Quantity |
|---|---|
| R_{0} | Earth’s radius |
| ρ_{0} | Reference density |
| g | Gravitational acceleration |
| η | Dynamic viscosity |
| α | Thermal expansivity |
| κ | Heat conductivity |
| ΔT | Temperature difference between core-mantle boundary and surface |
The computational domain is distributed among the parallel processes by means of the unstructured coarse grid elements. To realize efficient communication, the mesh is enhanced by so-called interface primitives: for each shared face, edge, and vertex an additional macro-primitive is allocated in the data structure. Together with the macro-tetrahedra, each primitive is assigned to one parallel process. Starting from a twice-refined mesh, each macro-primitive has at least one inner vertex.
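As a minimal illustration of this data structure, the Python sketch below derives the face, edge, and vertex primitives of a coarse tetrahedral mesh and assigns each to a process. The min-rank ownership rule is a hypothetical stand-in, not the actual HHG/HyTeG load balancer.

```python
from collections import defaultdict
from itertools import combinations

def build_interface_primitives(tets):
    """For a coarse mesh given as 4-tuples of vertex ids, collect the
    face, edge, and vertex primitives with their adjacent macro-tetrahedra."""
    faces, edges, verts = defaultdict(list), defaultdict(list), defaultdict(list)
    for tid, tet in enumerate(tets):
        for face in combinations(sorted(tet), 3):
            faces[face].append(tid)
        for edge in combinations(sorted(tet), 2):
            edges[edge].append(tid)
        for v in tet:
            verts[v].append(tid)
    return faces, edges, verts

def assign_owners(primitives, tet_owner):
    # Hypothetical rule: a primitive lives on the lowest-ranked process
    # among the processes owning its adjacent macro-tetrahedra.
    return {p: min(tet_owner[t] for t in adj) for p, adj in primitives.items()}
```

For two macro-tetrahedra sharing a face, the shared face primitive records both adjacent tetrahedra, so ghost-layer exchange across the interface can be organized through it.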
A description of the general form of the temperature equation and our work on its discretization and time-stepping can be found in [7].
3 Summary of Project Results
3.1 Efficiency of Solvers and Software
A rigorous quantitative performance analysis lies at the heart of systematic research in large-scale scientific computing and is central to the research agenda of computational science [91]. In this sense, performance analysis means more than measuring the speedup of a parallel solver or studying its weak and strong scaling properties. Beyond this, modern scientific computing must attempt to quantify numerical cost in metrics that are independent of the run-time of an arbitrary reference implementation. Most importantly, the numerical cost must be set in relation to what is achieved numerically, i.e. the accuracy delivered by a specific computation.
As one step in this direction, we extend Achi Brandt’s notion of textbook multigrid efficiency (TME) to massively parallel algorithms in [41]. Using the finite element based prototype multigrid implementation of the HHG library, we have employed the TME paradigm for scalar linear equations with constant and varying coefficients as well as to linear systems with saddle-point structure. A core step is then to extend the idea of TME to the parallel setting in a way that is adapted to the parallel architecture under consideration. To this end, we develop a new characterization of a work unit (WU) in an architecture-aware fashion by taking into account modern performance modeling techniques, in particular the standard Roofline model and the more advanced ECM model. The newly introduced parallel TME measure is studied in [41] with large-scale computations and for solving problems with up to 200 billion unknowns.
3.2 Reducing Complexity in Models and Algorithms
When striving for optimal efficiency, it is essential to choose models and discretization schemes whose incurred computational cost is as low as possible. Within TerraNeo we have therefore analyzed and improved the discretization schemes under study. An initial restriction was that only simple conforming discretizations were feasible in the prototype HHG library; discontinuous Galerkin discretizations, as in [6, 69], for example, were not yet supported. Buoyancy-driven flow models, however, demand a careful treatment of the mass-balance equation to avoid spurious source and sink terms in the non-linear coupling between flow and transport. In the context of finite elements, it is therefore commonly proposed to employ sufficiently rich pressure spaces containing piecewise constant shape functions to obtain local or even strong mass conservation. In three-dimensional computations, this usually requires nonconforming approaches, special meshes, or higher-order velocities, which make these schemes prohibitively expensive for some applications and complicate their implementation in legacy codes. In [39], we propose and analyze a lean and conservatively coupled scheme based on standard stabilized linear equal-order finite elements for the Stokes part and vertex-centered finite volumes for the energy equation. We show that even with only a weak mass balance it is possible to recover exact conservation properties by a local flux correction which can be computed efficiently on the control volume boundaries of the transport mesh. Furthermore, we discuss implementation aspects and demonstrate the effectiveness of the flux correction by different two- and three-dimensional examples which are motivated by geophysical applications in [101].
In the special case of constant viscosity the Stokes system can be cast into different formulations by exploiting the incompressibility constraint. For instance the strain in the weak formulation can be replaced by the gradient to decouple the velocity components in the different coordinate directions. Thus the discretization of the simplified problem leads to fewer nonzero entries in the stiffness matrix. This is of particular interest in large scale simulations where a reduced memory footprint and accordingly reduced bandwidth requirement can help to significantly accelerate the computations. In the case of a piecewise constant viscosity, as it typically arises in multi-phase flows, or when the boundary conditions involve traction, the situation is more complex, and the cross derivatives in the original Stokes system must be treated with care. A naive application of the standard vectorial Laplacian results in a physically incorrect solution, while formulations based on the strain increase the computational effort everywhere, even when the inconsistencies arise only from an incorrect treatment in a small fraction of the computational domain. In [55], we present a new approach that is consistent with the strain-based formulation and preserves the decoupling advantages of the gradient-based formulation in iso-viscous subdomains. The modification is equivalent to locally changing the discretization stencils, hence the more expensive discretization is restricted to a lower dimensional interface, making the additional computational cost asymptotically negligible. We demonstrate the consistency and convergence properties of the new method and show that in a massively parallel setup, the multigrid solution of the resulting discrete systems is faster than for the classical strain-based formulation.
3.3 Stokes Solvers and Performance
3.3.1 Multigrid Approaches for the Stokes System
Characteristics of the different supercomputers used for simulations presented in this publication
| | SuperMUC Phase 1 (Thin nodes) | SuperMUC Phase 2 | Juqueen | Hazel Hen |
|---|---|---|---|---|
| Operation | 2012–2018 | 2015–present | 2012–2018 | 2015–present |
| # nodes | 9216 | 3072 | 28,672 | 7712 |
| CPU | Intel Sandy Bridge E5-2680, 8 cores | Intel Haswell E5-2697v3, 14 cores | IBM PowerPC A2, 16 cores | Intel Haswell E5-2680v3, 12 cores |
| CPU frequency (GHz) | 2.7 | 2.6 | 1.6 | 2.5 |
| # total cores | 147,456 | 86,016 | 458,752 | 185,088 |
| Interconnect | Infiniband FDR10 | Infiniband FDR14 | 5D Torus | Aries |
| Total memory (TByte) | 288 | 194 | 448 | 987 |
| Linpack (PFlop/s) | 2.897 | 2.814 | 5.0 | 7.42 |
Exploring the limits of the monolithic multigrid solver: weak scaling on JUQUEEN
| Nodes | Threads | DoFs | Iter | Time [s] | Time [s] w/o coarse grid solver |
|---|---|---|---|---|---|
| 5 | 80 | 2.7 × 10^{9} | 10 | 685.88 | 678.77 |
| 40 | 640 | 2.1 × 10^{10} | 10 | 703.69 | 686.24 |
| 320 | 5120 | 1.2 × 10^{11} | 10 | 741.86 | 709.88 |
| 2560 | 40,960 | 1.7 × 10^{12} | 9 | 720.24 | 671.63 |
| 20,480 | 327,680 | 1.1 × 10^{13} | 9 | 776.09 | 681.91 |
3.3.2 Smoothers for Indefinite Systems
Number of iterations and run-time to reduce the residual by a factor of 10^{8} for different numbers of smoothing steps ν (Intel Xeon)
| DoFs | Iter (ν = 4) | Time [s] | Iter (ν = 6) | Time [s] | Iter (ν = 8) | Time [s] |
|---|---|---|---|---|---|---|
| 1.4 × 10^{3} | 13 | 0.10 | 9 | 0.07 | 7 | 0.06 |
| 1.4 × 10^{4} | 12 | 0.21 | 9 | 0.18 | 7 | 0.15 |
| 1.2 × 10^{5} | 12 | 0.61 | 8 | 0.51 | 6 | 0.44 |
| 1.0 × 10^{6} | 11 | 2.44 | 8 | 2.33 | 6 | 2.16 |
| 8.2 × 10^{6} | 11 | 14.54 | 8 | 14.58 | 6 | 14.03 |
| 6.6 × 10^{7} | 11 | 102.66 | 7 | 92.09 | 6 | 101.90 |
| 5.3 × 10^{8} | 10 | 700.38 | 7 | 693.75 | 6 | 769.34 |
While the all-at-once multigrid solver requires a special smoother, the Schur-complement formulation allows for a natural application of multigrid to the velocity component as part of an inner iteration in a preconditioned conjugate gradient method for the pressure. In the case of isoviscous flow the Schur complement is spectrally equivalent to the pressure mass matrix, so the condition number does not deteriorate with an increasing number of degrees of freedom. Due to its simple structure this iterative solver can easily be implemented by reusing standard software components and is thus well suited for a rigorous performance analysis. We do not report further details here; an innovative performance analysis can be found in [40, 41], and good scalability on more than half a million parallel threads is demonstrated in [42]. Together with the excellent node-level performance, this is essential for achieving high performance levels.
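The structure of this solver can be sketched compactly. In the toy Python code below, a dense solve stands in for the inner multigrid iteration on the velocity block, and small random matrices stand in for a discrete Stokes system; names and sizes are illustrative only.

```python
import numpy as np

def stokes_via_schur_cg(A, B, f, tol=1e-10, maxit=200):
    """Solve  A u + B^T p = f,  B u = 0  by running unpreconditioned CG on
    the Schur complement S = B A^{-1} B^T for the pressure. Every
    application of S needs an inner solve with A (multigrid in TerraNeo;
    a dense solve here)."""
    solveA = lambda x: np.linalg.solve(A, x)
    apply_S = lambda q: B @ solveA(B.T @ q)
    g = B @ solveA(f)                      # right-hand side for the pressure
    p = np.zeros(B.shape[0])
    r = g - apply_S(p)
    d = r.copy()
    rr = r @ r
    for _ in range(maxit):
        Sd = apply_S(d)
        alpha = rr / (d @ Sd)
        p += alpha * d
        r -= alpha * Sd
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        d = r + (rr_new / rr) * d
        rr = rr_new
    u = solveA(f - B.T @ p)                # recover the velocity
    return u, p
```

In the isoviscous case described above, replacing the identity "preconditioner" implicit in this sketch by the pressure mass matrix keeps the CG iteration count bounded independently of the problem size.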
3.3.3 Multigrid Coarse Grid Solvers
Parallel efficiency for master-slave agglomeration technique and up to more than 50,000 parallel threads on JUQUEEN
| np | DoFs | Red. factor | Time [s] | Coarse grid [s] | Par. eff. |
|---|---|---|---|---|---|
| 30 | 8.3 × 10^{7} | 1 | 16.284 | 0.043 | 1.00 |
| 120 | 3.3 × 10^{8} | 1 | 16.426 | 0.050 | 0.99 |
| 960 | 2.6 × 10^{9} | 1 | 17.084 | 0.171 | 0.95 |
| 7680 | 2.4 × 10^{10} | 1 | 17.310 | 0.382 | 0.94 |
| 61,440 | 1.7 × 10^{11} | 8 | 17.704 | 0.877 | 0.92 |
Parallel efficiency in a weak scaling experiment for a geodynamically relevant setting on Hazel Hen
| Proc. | DoFs (fine) | Iter | Time total [s] | Time fine [s] | BLR ε | Coarse [s] | Ana. & fac. [s] | Par. eff. |
|---|---|---|---|---|---|---|---|---|
| 1920 | 2.10 × 10^{10} | 15 | 78.1 | 77.9 | 10^{−3} | 0.03 | 2.7 | 1.00 |
| 15,360 | 4.30 × 10^{10} | 13 | 88.9 | 86.8 | 10^{−3} | 0.22 | 25.0 | 0.93 |
| 43,200 | 1.70 × 10^{11} | 14 | 95.5 | 87.0 | 10^{−8} | 0.59 | 111.6 | 0.82 |
3.4 Multi-Level Monte Carlo
Due to errors in measurements, geophysical data are inevitably stochastic. Therefore, it is desirable to quantify these uncertainties in geophysical computations. A typical approach is to use ensemble simulations, i.e. running multiple computations with perturbed input data and evaluating computational quantities of interest via Monte Carlo sampling. A naive sampling-based uncertainty quantification for 3D partial differential equations results in an extremely large computational complexity. More sophisticated approaches, such as multilevel Monte Carlo (MLMC), can reduce this complexity significantly. The performance can be further enhanced when the Monte Carlo sampling over several levels of mesh refinement is combined with a fast multigrid solver. In a parallel environment, however, sophisticated scheduling strategies are needed to exploit MLMC based on multigrid solvers. In [28], we explored concurrent execution across the three layers of the MLMC method: parallelization across levels, across samples, and across the spatial grid.
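The MLMC principle can be illustrated on a deliberately simple problem: estimating E[exp(ξ)] for a Gaussian ξ, where the level-ℓ "forward solve" in the Python sketch below is an explicit Euler discretization of u′ = ξu. All parameters (level count, sample numbers, step sizes) are illustrative; the actual TerraNeo samples are full mantle convection solves.

```python
import numpy as np

rng = np.random.default_rng(7)

def qoi(xi, level):
    # Level-l "forward solve": explicit Euler for u' = xi*u, u(0) = 1,
    # with 2**(level + 2) steps; the exact value is exp(xi).
    n = 2 ** (level + 2)
    return (1.0 + xi / n) ** n

def mlmc_estimate(samples_per_level):
    """E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}]: many samples are drawn on
    the cheap coarse levels, only a few on the expensive fine ones."""
    est = 0.0
    for level, n_samp in enumerate(samples_per_level):
        xi = rng.normal(0.0, 0.5, size=n_samp)
        if level == 0:
            corr = qoi(xi, 0)
        else:
            # Same xi on both levels, so the correction has small variance.
            corr = qoi(xi, level) - qoi(xi, level - 1)
        est += corr.mean()
    return est
```

Because the level corrections have rapidly decaying variance, the sample counts can shrink geometrically with the level, which is exactly what makes scheduling the heterogeneous ensemble runs discussed next a nontrivial problem.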
The one-layer homogeneous strategy, as shown in Fig. 4 (left), offers no flexibility: the theoretical run-time is simply the sum of the times of all samples. Even though this method guarantees perfect load balancing, it does not lead to optimal efficiency, since on the coarser levels the scalability of the solver is typically significantly worse than on the finer levels. On the coarsest level we may even have fewer grid points than processors. Thus, we discard this approach from our list of possible options.
In case of large run-time variations of the samples, we additionally consider dynamic variants where samples can be assigned to processors dynamically. Figure 5 (right) illustrates the DyLeSyHom strategy. Here it is essential that not all processor blocks execute the same number of sequential steps.
Comparison of static and dynamic scheduling strategies
| Level | LeSyHom time [s] | DyLeSyHom time [s] | Ratio |
|---|---|---|---|
| 0 | 500 | 460 | 0.92 |
| 1 | 1512 | 1347 | 0.89 |
| 2 | 5885 | 5596 | 0.95 |
| Total | 7897 | 7403 | 0.94 |
The largest uncertainty quantification scenario we considered involved finest grids with almost 70 billion degrees of freedom and more than 10,000 samples in total, most of which were computed on coarser levels. This computation was executed on the JUQUEEN supercomputer using more than 132,000 processor cores in an overall compute time of about 1.5 h.
3.5 Inverse Problem and Adjoint Computations
The adjoint method is a powerful technique for computing sensitivities (Fréchet derivatives) with respect to model parameters. It enables the solution of inverse problems for which analytical solutions are not available and for which solving the associated forward problem many times would be prohibitively expensive. In geodynamics it has been applied to the inverse problem of mantle convection, i.e. to restore past mantle flow, e.g. [22], where it finds an optimal initial flow state that evolves into the present-day state of Earth’s mantle. In doing so, the adjoint method has the potential to link together diverse observations and theoretical expectations from seismology, geology, mineral physics, paleomagnetism and fluid dynamics, greatly enhancing our understanding of the solid Earth system.
Adjoint equations for mantle flow restoration have been derived for incompressible [22, 52, 59], compressible [35] and thermo-chemical mantle flow [36], and the uniqueness properties of the inverse problem have been related explicitly to the tangential component of the surface velocity field of a mantle convection model [25]. Knowledge of the latter is thus essential to ensure convergence [100] and to obtain a small null space for the restored flow history [52]. To reduce the computational cost, we use a two-scale time step size predictor-corrector strategy to couple the Stokes and temperature equations, similar to the one discussed in [46].
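The mechanics of an adjoint sensitivity computation can be sketched for a generic parameter-dependent linear system A(m)u = b with objective J = cᵀu: one adjoint solve yields dJ/dm regardless of the number of parameters. This toy Python example stands in for the time-dependent, nonlinear mantle flow problem; all matrices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
A0 = 2.0 * np.eye(n)                              # fixed part of the operator
A1 = rng.standard_normal((n, n))
A1 = 0.05 * (A1 + A1.T)                           # parameter-dependent part
b = rng.standard_normal(n)
c = rng.standard_normal(n)

def J(m):
    # Objective functional evaluated through a forward solve A(m) u = b.
    return c @ np.linalg.solve(A0 + m * A1, b)

def dJ_dm_adjoint(m):
    A = A0 + m * A1
    u = np.linalg.solve(A, b)                     # one forward solve
    lam = np.linalg.solve(A.T, c)                 # one adjoint solve
    return -lam @ (A1 @ u)                        # dJ/dm = -lambda^T (dA/dm) u
```

The adjoint gradient can be cross-checked against a central finite difference of J, which by contrast costs two extra forward solves per parameter.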
3.5.1 Twin Experiment
3.6 Matrix-Free Algorithms
Matrix-free approaches are an essential component of ultra-high resolution finite-element simulations, for two reasons. First, the classical finite-element workflow of assembling the global system matrix from the local element contributions and then solving the resulting linear system with algorithms implemented in matrix-vector fashion leads to enormous data traffic between main memory and the computational units (CPU cores), which constitutes a severe performance bottleneck. Second, the cost of holding the matrix in main memory becomes prohibitive. Note that for our simulations with 10^{13} DoFs (see [43] and Sect. 3.3), the solution vector alone requires 80 TByte of memory. Thus, matrix-free methods, which do not assemble the global matrix but only provide the means to evaluate the associated matrix-vector product, receive increasing attention, see e.g. [5, 20, 66, 67, 68, 78, 92]. An overview of the history and the different approaches can be found in [9].
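The contrast can be seen in a few lines of Python for the 1D model stencil (-1, 2, -1)/h: the matrix-free operator application touches only two vectors, whereas the assembled variant must stream the stored matrix (dense here for brevity; even a sparse format stores several values per row).

```python
import numpy as np

def apply_laplacian(u, h):
    """Matrix-free application of the 1D stencil (-1, 2, -1)/h with
    identity rows at the Dirichlet boundary; no matrix is ever stored."""
    v = np.empty_like(u)
    v[1:-1] = (2.0 * u[1:-1] - u[:-2] - u[2:]) / h
    v[0], v[-1] = u[0], u[-1]
    return v

def assembled_laplacian(n, h):
    # The same operator as an explicit matrix: O(n^2) memory if dense,
    # still ~3n stored entries in sparse format, versus none matrix-free.
    A = (2.0 * np.eye(n + 1) - np.eye(n + 1, k=1) - np.eye(n + 1, k=-1)) / h
    A[0, :] = 0.0
    A[0, 0] = 1.0
    A[-1, :] = 0.0
    A[-1, -1] = 1.0
    return A
```

Both variants produce identical results; only the memory traffic differs, and in 3D with variable coefficients the gap between streaming a stored matrix row and recomputing the stencil on the fly is what the approaches below exploit.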
We note that the concept of hierarchical hybrid grids was, from its inception, designed to be matrix-free. Based on its original premise that the macro mesh resolves both geometry and material parameters, only a single discretization stencil had to be computed and stored for each macro primitive, as opposed to one for every fine mesh node, which would correspond to one row of the global matrix [13]. For the case of locally varying material parameters, such as the viscosity in mantle convection, this could be extended using classical local element matrix based approaches, see [9, 14, 37]. However, this approach does not carry over to the case of non-polyhedral domains and/or higher-order elements. Thus, we developed alternative and also more efficient approaches.
3.6.1 Matrix-Free Approaches Based on Surrogate Polynomials
In our HHG and HyTeG frameworks the coupling between DoFs is expressed in the form of a stencil, as in classical finite differences. Due to the local regularity of the refined block-structured mesh, the stencil pattern is invariant across an individual macro primitive. For example, in the case of P_{1} elements and a scalar equation in 3D we obtain a 15-point stencil for all nodes within a volume primitive or on a face primitive. However, in the case of a curved domain and/or locally variable material parameters, the stencil weights are non-constant over a macro primitive. These weights can be computed on-the-fly by standard quadrature whenever they are needed to apply the stencil locally, e.g. during a smoothing step. Note that, at least for P_{1} elements, this is theoretically still faster than the more sophisticated approach of fusing quadrature and application of the local element matrix, see [66]. However, it is significantly slower than the original one-stencil-per-primitive scenario. Thus, in [8] we introduce a new two-scale approach that employs surrogate polynomials to approximate the stencil weights. The idea, in a nutshell, is to consider the individual stencil weights as functions on each primitive. These functions are sampled, in a setup phase, at a certain number of sampling points, typically all nodes of the primitive on a coarser level of the mesh hierarchy. An approximating polynomial is then constructed for each weight by determining the polynomial coefficients through a standard least-squares fit. Whenever a stencil weight is needed, the associated polynomial is evaluated. Performance-wise this raises two questions: how much memory is required for storing the polynomial coefficients, and what is the cost of evaluating the polynomial as opposed to on-the-fly quadrature? The answer to the first question is that the memory footprint, even in 3D, is modest for polynomials of moderate degree.
Consider as an example a trivariate polynomial of degree five: it has 56 coefficients. Representing a 15-point stencil by such polynomials for all nodes of a volume primitive thus requires only 56 × 15 × 8 = 6720 bytes. Note also that the number of polynomials needed can be reduced by symmetry arguments and the zero row-sum property of consistent weak differential operators, see [8, 30]. Thanks to the logical structure of the mesh inside our volume and face primitives, evaluation of a polynomial can be performed lexicographically along lines. It thus reduces to a 1D problem which can be executed efficiently using incremental updates based on the concept of divided differences, see [8, 30].
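Both ingredients, the least-squares surrogate and the incremental line-wise evaluation, fit into a short 1D Python sketch. The weight function, sampling count, and degree below are hypothetical; the actual method fits trivariate polynomials per stencil entry and primitive.

```python
import numpy as np

# Setup phase: sample a (hypothetical) smooth stencil-weight function on a
# coarse level and fit a degree-5 surrogate polynomial by least squares.
weight = lambda x: 1.0 / (1.0 + 0.3 * np.sin(2.0 * np.pi * x))
coarse = np.linspace(0.0, 1.0, 17)
coeffs = np.polyfit(coarse, weight(coarse), 5)

def eval_along_line(coeffs, x0, dx, n):
    """Evaluate the surrogate at x0 + k*dx, k = 0..n-1, using forward
    (divided) differences: after deg+1 exact evaluations, every further
    point costs only deg additions."""
    deg = len(coeffs) - 1
    d = np.array([np.polyval(coeffs, x0 + k * dx) for k in range(deg + 1)])
    for j in range(1, deg + 1):            # build the forward-difference table
        d[j:] = d[j:] - d[j - 1:-1]
    out = np.empty(n)
    for k in range(n):
        out[k] = d[0]
        d[:-1] = d[:-1] + d[1:]            # advance one grid point
    return out
```

The incremental sweep reproduces direct polynomial evaluation exactly (up to rounding), while the surrogate itself approximates the sampled weight to within the two-scale tolerance analyzed in [8].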
Discretization error in the L^{2}-norm for IFEM and LSQP (with different polynomial degree q) and ratio of the asymptotic multigrid convergence rates for curved pipe domain (level ℓ = 0 refers to an already twice refined mesh)
| Level | e_{IFEM} | e_{LSQP} (q = 2) | ρ_{IFEM}∕ρ_{LSQP} (q = 2) | e_{LSQP} (q = 3) | ρ_{IFEM}∕ρ_{LSQP} (q = 3) |
|---|---|---|---|---|---|
| ℓ = 1 | 7.50 × 10^{−5} | 7.50 × 10^{−5} | 1.00 | 7.50 × 10^{−5} | 1.00 |
| ℓ = 2 | 1.86 × 10^{−5} | 1.86 × 10^{−5} | 1.00 | 1.86 × 10^{−5} | 1.00 |
| ℓ = 3 | 4.64 × 10^{−6} | 4.67 × 10^{−6} | 1.00 | 4.64 × 10^{−6} | 1.00 |
| ℓ = 4 | 1.16 × 10^{−6} | 1.41 × 10^{−6} | 1.00 | 1.16 × 10^{−6} | 1.00 |
| ℓ = 5 | 2.89 × 10^{−7} | 9.87 × 10^{−7} | 1.00 | 3.10 × 10^{−7} | 1.00 |
Weak scaling of the LSQP_{e} approach using ca. 2.3 × 107 DoFs per core; shown are: average run-time w/ and w/o coarse grid solver (c.g.) for one UMG cycle and no. of UMG iterations; values in brackets give no. of c.g. iterations (preconditioner/MINRES); parallel efficiency w.r.t. one UMG cycle is given for timings w/ and w/o c.g.; additionally the average time for a single residual application on the finest level is given
| Islands | Cores | DoFs | Global resolution | # UMG V-cycles | UMG cycle w/ c.g. | UMG cycle w/o c.g. | Parallel efficiency (w/ / w/o c.g.) | Time residual |
|---|---|---|---|---|---|---|---|---|
| 1 | 5580 | 1.3 × 10^{11} | 3.4 km | 7 (50/150) | 192 s | 164 s | 1.00 / 1.00 | 11.9 s |
| 2 | 12,000 | 2.7 × 10^{11} | 2.8 km | 10 (100/150) | 213 s | 169 s | 0.90 / 0.97 | 12.1 s |
| 4 | 21,600 | 4.8 × 10^{11} | 2.3 km | 7 (50/250) | 210 s | 172 s | 0.92 / 0.96 | 12.7 s |
| 8 | 47,250 | 1.1 × 10^{12} | 1.7 km | 8 (50/350) | 230 s | 173 s | 0.83 / 0.95 | 12.8 s |
3.6.2 A Stencil Scaling Approach for Accelerating Matrix-Free Finite Element Implementations
In [9] we present a novel approach to fast on-the-fly low-order finite element assembly for scalar elliptic partial differential equations of Darcy type with variable coefficients, optimized for matrix-free implementations. In this approach, we introduce a new operator obtained by scaling the reference operator, i.e. the stencil obtained in the constant-coefficient case. Assuming sufficient regularity, an a priori analysis showed that solutions obtained by this approach are unique and converge with asymptotically optimal order in the H^{1}- and L^{2}-norms on hierarchical hybrid grids. These considerations motivate our novel approach of reducing the cost by computing the stencil entries for a matrix-free solver in this more efficient way.
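In 1D the idea can be sketched in a few lines of Python: the variable-coefficient stencil row is obtained by scaling the off-diagonal entries of the constant-coefficient reference stencil with nodal coefficient averages and closing the diagonal via the zero row-sum property. For the P1 discretization of -(k u')' with element-wise averaged coefficients the scaled row coincides with the assembled one; the 3D volume-and-face variant analyzed in [9] is considerably more involved.

```python
import numpy as np

def assembled_row(k, i, h):
    # P1 FEM row of -(k u')' at interior node i, with the element
    # coefficient taken as the mean of its two nodal values.
    kl = 0.5 * (k[i - 1] + k[i])
    kr = 0.5 * (k[i] + k[i + 1])
    return np.array([-kl, kl + kr, -kr]) / h

def scaled_row(k, i, h):
    # Stencil scaling: multiply each off-diagonal entry of the reference
    # stencil (-1, 2, -1)/h by the mean of the two coupled nodal
    # coefficients; the diagonal follows from the zero row-sum property.
    ref = np.array([-1.0, 2.0, -1.0]) / h
    left = 0.5 * (k[i - 1] + k[i]) * ref[0]
    right = 0.5 * (k[i] + k[i + 1]) * ref[2]
    return np.array([left, -(left + right), right])
```

The payoff is that only nodal coefficient values and one reference stencil per primitive must be held, instead of recomputing full quadrature per node.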
Results for large scale 3D application with errors measured in the discrete L^{2}-norm
| DoFs | Error (nodal int.) | eoc | ρ | tts [s] | Error (scaling Vol+Face) | eoc | ρ | tts [s] | Rel. tts |
|---|---|---|---|---|---|---|---|---|---|
| 4.7 × 10^{6} | 2.43 × 10^{−4} | – | 0.522 | 2.5 | 2.38 × 10^{−4} | – | 0.522 | 2.0 | 0.80 |
| 3.8 × 10^{7} | 6.00 × 10^{−5} | 2.02 | 0.536 | 4.2 | 5.86 × 10^{−5} | 2.02 | 0.536 | 2.6 | 0.61 |
| 3.1 × 10^{8} | 1.49 × 10^{−5} | 2.01 | 0.539 | 12.0 | 1.46 × 10^{−5} | 2.01 | 0.539 | 4.5 | 0.37 |
| 2.5 × 10^{9} | 3.72 × 10^{−6} | 2.00 | 0.538 | 53.9 | 3.63 × 10^{−6} | 2.00 | 0.538 | 15.3 | 0.28 |
| 2.0 × 10^{10} | 9.28 × 10^{−7} | 2.00 | 0.536 | 307.2 | 9.06 × 10^{−7} | 2.00 | 0.536 | 88.9 | 0.29 |
| 1.6 × 10^{11} | 2.32 × 10^{−7} | 2.00 | 0.534 | 1822.2 | 2.26 × 10^{−7} | 2.00 | 0.534 | 589.6 | 0.32 |
These results demonstrate that the new scaling approach reproduces the discretization error, as expected from our variational crime analysis [8, 9, 30]. Additionally, the multigrid convergence rate is not affected. For larger ℓ the scaling approach requires only about 30% of the run-time of the nodal integration approach.
3.6.3 Stencil Scaling for Vector-Valued PDEs with Applications to Generalized Newtonian Fluids
Our target problem is vector-valued; thus, we extended the scalar stencil scaling idea from Sect. 3.6.2 and developed a similar matrix-free approach for vector-valued PDEs [31]. The construction is again based on hierarchical hybrid grids, the conceptual basis of the HHG and HyTeG [63] frameworks. Vector-valued second-order elliptic PDEs play an important role in mathematical modeling and arise e.g. in problems from elastostatics and fluid dynamics. Numerical experiments indicated that the scalar stencil scaling (denoted below as unphysical scaling) cannot be applied directly to these equations, because the standard finite element solution cannot be reproduced, even in the case of linear coefficients. A modified stencil scaling method (denoted below as physical scaling) is therefore needed that is also suited for matrix-free finite element implementations on hierarchical hybrid grids. It turns out that this vector-valued scaling requires the computation of an additional correction term. While this makes it more complicated and expensive than the scalar stencil scaling, it reproduces the standard finite element solutions while requiring only a fraction of the time to obtain them. In the best scenario, we observed a speedup of about 122% compared to standard on-the-fly integration. Our largest example involved solving a Stokes problem on 12,288 compute cores.
Dimensionless parameters for the Carreau viscosity model
| Parameter | η_{0} | η_{∞} | κ | r |
|---|---|---|---|---|
| Value | 140.764 | 1.0 | 212.2 | −0.325 |
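For illustration, a shear-thinning Carreau-type viscosity with the tabulated dimensionless parameters can be evaluated as follows. This is a sketch assuming one common parameterization of the Carreau law; the exact form used in [31] may differ.

```python
def carreau_viscosity(gamma_dot, eta0=140.764, eta_inf=1.0,
                      kappa=212.2, r=-0.325):
    """Dimensionless Carreau-type viscosity (a common parameterization,
    assumed here for illustration):

        eta = eta_inf + (eta0 - eta_inf) * (1 + kappa * gd^2)^((r - 1) / 2)

    gamma_dot is the shear rate; defaults are the tabulated values.
    """
    return eta_inf + (eta0 - eta_inf) * \
        (1.0 + kappa * gamma_dot ** 2) ** ((r - 1.0) / 2.0)
```

At zero shear rate the viscosity equals η_{0}; for large shear rates it approaches η_{∞}, reflecting the shear-thinning behavior of the generalized Newtonian fluid.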
In order to solve the systems, we employ the inexact Uzawa solver presented in [29] with variable V(3,3) cycles, in which two additional smoothing steps are performed on each coarser refinement level to enforce convergence of the method.
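The smoothing schedule of such a variable cycle can be written down explicitly. A minimal sketch (level numbering and parameter names are illustrative, not from [29]):

```python
def variable_vcycle_smoothing(finest_level, coarsest_level=0,
                              base=3, increment=2):
    """Pre-/post-smoothing steps per level for a variable V(3,3) cycle:
    'base' steps on the finest level, 'increment' additional steps on
    each coarser refinement level.

    Returns a dict mapping level -> number of smoothing steps.
    """
    return {lvl: base + increment * (finest_level - lvl)
            for lvl in range(finest_level, coarsest_level - 1, -1)}
```

For a five-level hierarchy (finest level 4) this yields 3 smoothing steps on the finest grid and 11 on the coarsest, so the extra smoothing work is concentrated on the cheap coarse levels.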
Relative errors along the line θ in the supremum norm for different scaling approaches
| Quantity | Physical | Unphysical |
|---|---|---|
| Viscosity | 8.88 × 10^{-4} | 1.07 × 10^{-1} |
| Velocity | 5.65 × 10^{-5} | 3.19 × 10^{-2} |
Relative time-to-solution comparison of the nodal integration and physical scaling approach for the non-linear generalized Stokes problem
| DoFs | tts [s] (nodal integration) | tts [s] (physical scaling) | Rel. tts |
|---|---|---|---|
| 4.69 × 10^{6} | 309.10 | 364.97 | 1.18 |
| 3.82 × 10^{7} | 361.90 | 412.10 | 1.14 |
| 3.08 × 10^{8} | 895.18 | 719.55 | 0.80 |
| 2.47 × 10^{9} | 3227.45 | 2626.13 | 0.81 |
3.7 Resilience
Extremely concurrent simulations may in the future require a software design that is resilient to faults of individual software and hardware components. The probability of a fault in the underlying system increases with the number of parallel processes. Long-running simulations in particular suffer from a lower mean time between failures, and restarting applications with run-times of several hours or days consumes vast amounts of resources.
Therefore, fault tolerance techniques have become a research topic as preparation for possibly unreliable future exascale systems. In the TerraNeo project we have so far mainly focused on hard faults. To cope with the failure of a core or node, we mainly distinguish between two categories of methods: one class relies on checkpointing techniques, where snapshots of the simulation are stored in regular intervals so that the state can be loaded upon failure. A second class are algorithm-based fault tolerance (ABFT) techniques, where (partially) lost data can be re-computed on-the-fly. Regarding large-scale simulations, checkpointing techniques often suffer from bad serial and parallel performance if data is written to disk. However, there are approaches that solely rely on distributed, in-memory checkpointing, which could be combined with compression techniques [71] that were also explored in a student project [74]. This can lead to a flexible, fast, and scalable resilience method [64].
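The checkpointing idea can be sketched in a few lines. This is a minimal single-process illustration, not the distributed scheme of [64]: snapshots are kept in RAM at a fixed interval, and the last consistent state is restored when a fault is detected. Compression (cf. [71]) and replication of snapshots to neighboring nodes are omitted.

```python
import copy


class InMemoryCheckpointer:
    """Minimal sketch of in-memory checkpointing: store a deep copy of the
    simulation state every 'interval' steps; on failure, roll back to the
    most recent snapshot instead of restarting from scratch."""

    def __init__(self, interval):
        self.interval = interval
        self.snapshot = None
        self.snapshot_step = -1

    def maybe_checkpoint(self, step, state):
        """Take a snapshot if the step index hits the checkpoint interval."""
        if step % self.interval == 0:
            self.snapshot = copy.deepcopy(state)
            self.snapshot_step = step

    def restore(self):
        """Return (step, state) of the last snapshot; raises if none exists."""
        if self.snapshot is None:
            raise RuntimeError("no checkpoint available")
        return self.snapshot_step, copy.deepcopy(self.snapshot)
```

Keeping snapshots in memory avoids the poor serial and parallel performance of disk-based checkpointing noted above, at the price of additional memory consumption per process.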
In [53, 54] an alternative ABFT technique to provide resilience specifically for multigrid methods is developed. Given a faulty subdomain Ω_{F}, created when a processor (core or node) crashes, the global solution can be recovered by solving a recovery problem on Ω_{F} via a local multigrid iteration. After the recovery subproblem is solved, the global iteration continues.
Solving the recovery subproblem with a fixed number of iterations risks over- or under-solving it and thereby wasting computational resources. To address this issue, in [56] the recovery algorithm is enhanced by an adaptive control mechanism: a hierarchically weighted error estimator is introduced to define a stopping criterion for the iteration on the faulty subdomain. The estimator is designed to be well-suited to the extreme-scale parallel multigrid settings employed in applications.
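The control loop can be sketched as follows. This is a simplified stand-in: [56] uses a hierarchically weighted error estimator and multigrid in the faulty subdomain, whereas here a plain relative residual and a damped Jacobi smoother illustrate the adaptive stopping idea.

```python
import numpy as np


def recover_subdomain(A, b, x0, tol=1e-2, max_iter=100):
    """Adaptive local recovery sketch: smooth the local problem A x = b on
    the faulty subdomain until an error indicator (here: relative residual,
    a stand-in for the weighted estimator of [56]) drops below 'tol'."""
    x = x0.copy()
    r0 = np.linalg.norm(b - A @ x)
    D_inv = 1.0 / np.diag(A)
    for it in range(1, max_iter + 1):
        x += 0.6 * D_inv * (b - A @ x)       # damped Jacobi smoothing step
        if np.linalg.norm(b - A @ x) <= tol * r0:
            return x, it                     # stop early: avoid over-solving
    return x, max_iter
```

The returned iteration count varies with the difficulty of the local problem, which is exactly what a fixed-iteration recovery cannot provide.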
We have also conducted studies with asynchronous recovery strategies. Here, the local problem on Ω_{F} and the problem on the healthy domain \(\Omega \setminus \overline {\Omega }_F\) are solved concurrently. This means that during the recovery process, the multigrid iteration also continues in parallel in the healthy domain. Different strategies for the boundary conditions imposed on the interface to the healthy domain have been explored in [54]. After the stopping criterion in the faulty domain is fulfilled, the local recovery iteration is terminated and a recoupling procedure is initiated.
To emphasize the applicability to large-scale scenarios, the adaptive approach is scaled in [56] to a simulation with about 6.9 × 10^{11} DoFs on 245,766 processes of the JUQUEEN supercomputer. Additionally, the generality of the approach was demonstrated in a scenario where multiple failures were triggered in different regions of the domain.
3.8 General Performance Issues
Simulations in geophysics require a high spatial resolution, which leads to problems with many degrees of freedom. The success of a geophysics simulation framework thus depends on its scalability and on a minimal memory overhead. In the previous sections we already presented scalability results to highlight the achievements of the TerraNeo project.
However, message-passing scalability is not the only metric that is important in high-performance computing. Even more important is the absolute performance, which depends critically on the performance of a single core or a single node. In the extreme-scale computing community the relevance of systematic node-level performance engineering is increasingly being recognized, see e.g. [4, 45, 48, 60, 67]. Note that traditional scalability is easier to achieve when the node-level performance is low. Publishing only scalability results without absolute performance measures is a frequently observed parallel computing cheat, as exposed in [2]. We emphasize here that the TerraNeo project has in this sense profited from long-term research efforts in HPC performance engineering. These stem originally from the Dime project^{4} [65, 97] and have led to the excellent performance features of the HHG library [12, 13]. Thus, node-level performance in TerraNeo did not come as an afterthought imposed on existing code, but has been an a priori design goal with high priority.
Within TerraNeo, these techniques were continuously employed to analyze the performance and to guide the development of new matrix-free methods. This includes rather simple analyses, such as calculating the cost of a single stencil update in terms of floating point operations, which allows comparing the actual performance to the maximally achievable performance on a given hardware. This type of analysis has been performed in [10] and [11].
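Such a back-of-the-envelope comparison can be sketched as a simple roofline-style estimate. The function and the example machine parameters below are illustrative assumptions, not the figures from [10, 11]:

```python
def stencil_update_analysis(flops_per_update, bytes_per_update,
                            peak_gflops, mem_bw_gbs):
    """Roofline-style estimate of the maximally achievable rate of stencil
    updates: an update is limited either by the floating point peak or by
    the memory bandwidth, whichever gives the longer time."""
    t_compute = flops_per_update / (peak_gflops * 1e9)   # s, compute limit
    t_memory = bytes_per_update / (mem_bw_gbs * 1e9)     # s, bandwidth limit
    t = max(t_compute, t_memory)
    bound = "compute" if t_compute >= t_memory else "memory"
    return {"updates_per_s": 1.0 / t, "bound": bound}
```

Comparing measured update rates against this bound immediately reveals whether a kernel is memory- or compute-bound and how far it is from the hardware limit.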
In cooperation with the HPC group of the RRZE in Erlangen, a more sophisticated analysis was performed in [8, 40, 41] where the Execution-Cache-Memory (ECM) model [48] was used to examine the performance.
3.9 HyTeG
We successfully demonstrated that the combination of a fully unstructured grid with regular refinement and matrix-free algorithms can be used to solve large geodynamical problems. A substantial amount of work in the TerraNeo project was performed using the HHG prototype. Even though the latter shows excellent performance and scalability, certain issues need to be faced. Since the codebase was created over a decade ago, its coding standard cannot live up to current requirements. This makes it hard to maintain the framework and to familiarise new users with it, both of which are essential for a community code. Furthermore, the prototype was never designed for higher-order discretizations or for elements other than nodal ones.
Therefore a redesign was initiated under a new name: Hybrid Tetrahedral Grids (HyTeG) [63]. In addition to building on the knowledge gathered over the years from developing the HHG framework, we could also utilize ideas and infrastructure from the waLBerla framework [44]. One of the fundamental changes in HyTeG is the strict separation of the data structures that define the macro mesh geometry from the actual simulation data. As in HHG, the tetrahedra of the unstructured macro mesh are split into their geometric primitives, namely volumes, faces, edges and vertices. The lower dimensional primitives (faces, edges and vertices) are used to decouple the volume primitives in terms of communication. Contrary to HHG, the parallel partitioning is not based solely on volume primitives; instead, all primitives are partitioned between processes. This allows, for example, taking the computational cost and memory footprint of face primitives into account for load balancing, for which sophisticated tools such as ParMetis [61] can be employed. Note that in HyTeG the partitioning does not involve global data structures, an essential aspect when entering the exascale era. As with HHG, C++ was chosen as the primary programming language due to its high performance and widespread use in the HPC community. Up to this point, no compute data are associated with the primitives. In a second step, these data can be attached to the primitives: a single value for only some of the primitives, or a full hierarchy of grids for every single primitive. This separation also allows for efficient light-weight static and dynamic load balancing techniques similar to those described in [32, 96].
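The effect of partitioning all primitive types, not just volumes, can be illustrated with a toy balancer. This greedy sketch is an illustration only; production runs would use graph partitioners such as ParMetis [61], and all identifiers below are hypothetical:

```python
import heapq


def balance_primitives(primitives, n_procs):
    """Greedy load balancing sketch in which every primitive (volume, face,
    edge, vertex) carries a weight reflecting its compute/memory footprint.

    primitives : list of (primitive_id, weight)
    Returns a dict primitive_id -> process rank, assigning heavier
    primitives first to the currently least loaded process.
    """
    heap = [(0.0, p) for p in range(n_procs)]   # (current load, process)
    heapq.heapify(heap)
    assignment = {}
    for pid, w in sorted(primitives, key=lambda t: -t[1]):
        load, proc = heapq.heappop(heap)
        assignment[pid] = proc
        heapq.heappush(heap, (load + w, proc))
    return assignment
```

Because faces and edges enter with their own weights, processes owning many interface primitives are no longer systematically overloaded, which is the point of the HyTeG design change.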
Supporting different types of discretizations also means that many more compute-intensive kernels need to be taken care of. The solution chosen in HyTeG for this problem is code generation. Inspired by the work in ExaStencils [76, 94], our collaboration in this direction [15], and experience from pystencils^{6}, this can be realized efficiently. One difference to other projects that use whole-program generation, however, is that only the compute-intensive kernels are generated, not the surrounding data structures. In contrast to HyTeG itself, pystencils uses Python, which allows for much more flexibility and metaprogramming capability. The generated kernels, however, are C++, which eliminates the need for a Python environment when running HyTeG on supercomputers.
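The principle of generating only the hot kernels can be illustrated with a deliberately minimal generator: a Python description of a stencil is turned into a C++ kernel string that can be compiled into the framework. This is a toy sketch, not the pystencils API, which produces far more sophisticated, optimized kernels.

```python
def generate_stencil_kernel(name, stencil):
    """Emit a C++ kernel applying a 1D stencil, given as a list of
    (offset, weight) pairs with single-digit offsets. Illustrates the
    generate-the-kernel, keep-the-data-structures approach of HyTeG."""
    terms = " + ".join(
        f"{w} * src[i{offset:+d}]".replace("+0", "")
        for offset, w in stencil)
    return (f"void {name}(const double* src, double* dst, long n) {{\n"
            f"  for (long i = 1; i < n - 1; ++i)\n"
            f"    dst[i] = {terms};\n"
            f"}}\n")
```

Since only such kernels are generated, the surrounding C++ data structures and communication routines remain hand-written and stable, and no Python environment is needed on the target machine.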
3.10 Asthenosphere Velocities and Dynamic Topography
In order to demonstrate the flexibility of the different algorithmic components, we have studied selected application questions. In [102] we investigated the question of flow speeds in the asthenosphere. The latter is a mechanically weak layer in the upper mantle right below the lithosphere, the rigid outermost shell of our planet. It plays an important role in convection modelling, as it distinguishes itself from the lower parts of the mantle by a significantly lower viscosity. The precise details of the contrast are, however, unknown. A variety of geologic observations and geodynamic arguments indicates that velocities in the upper mantle may exceed those of tectonic plates by an order of magnitude [49, 51]. The framework was used to simulate high-resolution whole-earth convection models with asthenospheric channels of varying thickness. Reduction of the asthenospheric thickness is balanced by an associated reduction in its viscosity, following the Cathles parameter [88]. For the tested end-member case, this resulted in a set-up with an asthenosphere channel of only 100 km depth and a significant viscosity contrast of up to four orders of magnitude relative to the deeper mantle. We found a velocity increase by a factor of 10 between a reference case with an upper mantle of 1000 km depth and the very thin channel end-member case, translating into speeds of ≈ 20 cm/a within the narrow asthenosphere. Our suggested and numerically verified Poiseuille flow model predicts that the upper mantle velocity scales with the inverse of the asthenospheric thickness.
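The inverse scaling admits a simple flux argument: if the lateral volume flux carried by the channel is fixed by the deeper convection, the mean channel velocity is the flux divided by the channel thickness. The sketch below is a simplified reading of the Poiseuille model in [102]; the flux value is an illustrative assumption, not a published result.

```python
def mean_channel_velocity(flux_per_width, thickness_km):
    """Mean asthenosphere velocity u = Q / h for a fixed lateral volume
    flux per unit width Q (illustrative; units left dimensionless)."""
    return flux_per_width / thickness_km
```

Thinning the channel from 1000 km to 100 km at fixed flux thus raises the mean velocity tenfold, consistent with the factor-of-10 increase found in the simulations.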
4 Conclusions and Future Work
TerraNeo is a project in Computational Science and Engineering [91]. We successfully developed innovative scientific computing methods for the exascale era. To this end, new algorithms were devised for simulating Earth mantle convection. We analyzed the accuracy of these methods and their cost on massively parallel computers, showing that our methods exhibit excellent performance. A highly optimized prototype implementation was designed using the HHG multigrid library. Based on HHG, we demonstrated computations with an unprecedented resolution of the Earth's mantle based on real-world geophysical input data.
In TerraNeo many new methods for future exascale geodynamic simulations were invented: they include new solver components for monolithic multigrid methods for saddle point problems, specially designed smoothers and new strategies to solve the coarse grid problems. Two new classes of matrix-free methods were introduced and analyzed, one based on surrogate polynomials, the other one based on stencil scaling. New scheduling algorithms for parallel multilevel Monte Carlo methods and uncertainty quantification were studied. Methods for fault tolerance were investigated. This includes methods based on in-memory checkpointing as well as new methods for fast algorithmic reconstruction of lost data when hard faults occur. Inverse problems in mantle convection were studied using adjoint techniques.
Careful parallel performance studies complement our work and assess the suitability of the algorithmic components on future exascale computers. Many of the methods were tested on application oriented problems, such as for example a geophysics study on the relation of the thickness of the asthenosphere and upper mantle velocities and the influence of viscosity variations on dynamic topography.
We found that conventional C++-based implementation techniques lack expressiveness and flexibility for massively parallel and highly optimized parallel codes and that achieving performance portability is a major difficulty. Starting from this insight, we invented new programming techniques based on automatic program generation, learning from neighboring SPPEXA projects such as ExaStencils. These ideas have already been realized in HyTeG and waLBerla as new and innovative simulation software architectures for multiphysics exascale computing.
Thus, some aspects of our research in TerraNeo are of a mathematical nature, while others fall into the field of computer science. Our methods have also already been used to perform research in geophysics, the target discipline of TerraNeo. However, we point out that the research contribution of TerraNeo neither falls into the intersection of all these fields, nor can it be understood from the viewpoint of mathematics or computer science alone. Nor is it a project in the geosciences, since its primary goal is not the creation of new geophysical insight. The goal of TerraNeo is the construction and analysis of simulation methods that are the enabling technologies for future exascale research in geodynamics.
This result could not be reached by either of the classical disciplines alone. The synthesis of knowledge and methods from mathematics, computer science, and geophysics becomes more than the sum of its parts. We emphasize additionally that our research in computational science and engineering is not restricted to the application in geophysics. Many of the innovations developed in TerraNeo can be transferred to other target disciplines.
With our new computational methods, we have broken previously existing barriers to computational performance. This is demonstrated, for example, by our solution of indefinite linear systems with more than 10^{13} degrees of freedom. To our knowledge this constitutes the largest finite element computation published to date, even though the computation was still performed on JUQUEEN, a machine that is now outdated and has since been retired.
TerraNeo funding in the last period was reduced from the desired four to only three positions so that the new software design and its development could not be completed as originally proposed. A preliminary version of the TerraNeo software will be made public and will contain essential parts of the promised core functionality, but it still lacks most of the application oriented functionality that would make it fully usable as a Geophysics community code. Efforts will be made to continue the development so that the central research goal of TerraNeo can be reached.
Acknowledgements
The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. for funding this project by providing computing time on the GCS supercomputers SuperMUC at Leibniz Supercomputing Centre (LRZ), JUQUEEN at Jülich Supercomputing Centre (JSC) and Hazel Hen at High-Performance Computing Center Stuttgart (HLRS). Financial support by the German Research Foundation through the Priority Programme 1648 Software for Exascale Computing (SPPEXA), WO671/11-1 and BU 2012/15-1 is gratefully acknowledged. We also would like to thank our former project collaborators: D. Bartuschat, J. Eitzinger, B. Gmeiner, M. Hofmann, L. John, H. Stengel, Chr. Waluga, J. Weismüller, G. Wellein, and M. Wittmann.
References
- 1.Amestoy, P., Buttar, A., L’Excellent, J.Y., Mary, T.: On the complexity of the block low-rank multifrontal factorization. SIAM J. Sci. Comp. 39(4), A1710–A1740 (2017)MathSciNetzbMATHGoogle Scholar
- 2.Bailey, D.H.: Misleading performance claims in parallel computations. In: Proceedings of the 46th Annual Design Automation Conference, pp. 528–533. ACM, New York (2009)Google Scholar
- 3.Baker, A., Klawonn, A., Kolev, T., Lanser, M., Rheinbach, O., Yang, U.: Scalability of classical algebraic multigrid for elasticity to half a million parallel tasks. In: Software for Exascale Computing-SPPEXA 2013–2015, pp. 113–140. Springer, Berlin (2016)Google Scholar
- 4.Bastian, P., Engwer, C., Fahlke, J., Geveler, M., Göddeke, D., Iliev, O., Ippisch, O., Milk, R., Mohring, J., Müthing, S., Ohlberger, M., Ribbrock, D., Turek, S.: Hardware-based efficiency advances in the EXA-DUNE project. In: Software for Exascale Computing - SPPEXA 2013–2015. Lecture Notes in Computational Science and Engineering. Springer, Berlin (2016)Google Scholar
- 5.Bastian, P., Müller, E.H., Müthing, S., Piatkowski, M.: Matrix-free multigrid block-preconditioners for higher order discontinuous Galerkin discretisations. J. Comput. Phys. 394, 417–439 (2019). doi: https://doi.org/10.1016/j.jcp.2019.06.001. http://www.sciencedirect.com/science/article/pii/S0021999119303973
- 6.Bauer, A., Schaal, K., Springel, V., Chandrashekar, P., Pakmor, R., Klingenberg, C.: Simulating turbulence using the astrophysical discontinuous Galerkin code TENET. In: Software for Exascale Computing-SPPEXA 2013–2015, pp. 381–402. Springer, Berlin (2016)Google Scholar
- 7.Bauer, S., Bunge, H.P., Drzisga, D., Gmeiner, B., Huber, M., John, L., Mohr, M., Rüde, U., Stengel, H., Waluga, C., et al.: Hybrid parallel multigrid methods for geodynamical simulations. In: Software for Exascale Computing-SPPEXA 2013–2015, pp. 211–235. Springer, Berlin (2016). https://doi.org/10.1007/978-3-319-40528-5_10
- 8.Bauer, S., Mohr, M., Rüde, U., Weismüller, J., Wittmann, M., Wohlmuth, B.: A two-scale approach for efficient on-the-fly operator assembly in massively parallel high performance multigrid codes. Appl. Numer. Math. 122, 14–38 (2017). https://doi.org/10.1016/j.apnum.2017.07.006 MathSciNetzbMATHGoogle Scholar
- 9.Bauer, S., Drzisga, D., Mohr, M., Rüde, U., Waluga, C., Wohlmuth, B.: A stencil scaling approach for accelerating matrix-free finite element implementations. SIAM J. Sci. Comp. 40(6), C748–C778 (2018). https://doi.org/10.1137/17M1148384 MathSciNetzbMATHGoogle Scholar
- 10.Bauer, S., Huber, M., Mohr, M., Rüde, U., Wohlmuth, B.: A new matrix-free approach for large-scale geodynamic simulations and its performance. In: International Conference on Computational Science, pp. 17–30. Springer, Berlin (2018). https://doi.org/10.1007/978-3-319-93701-4_2
- 11.Bauer, S., Huber, M., Ghelichkhan, S., Mohr, M., Rüde, U., Wohlmuth, B.: Large-scale simulation of mantle convection based on a new matrix-free approach. J. Comp. Sci. 31, 60–76 (2019). https://doi.org/10.1016/j.jocs.2018.12.006 Google Scholar
- 12.Bergen, B., Hülsemann, F., Rüde, U.: Is 1.7 × 10^{10} unknowns the largest finite element system that can be solved today? In: SC’05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, pp. 5–5. IEEE, Piscataway (2005)Google Scholar
- 13.Bergen, B., Gradl, T., Hülsemann, F., Rüde, U.: A massively parallel multigrid method for finite elements. Comp. Sci. Eng. 8(6), 56–62 (2006)Google Scholar
- 14.Bielak, J., Ghattas, O., Kim, E.J.: Parallel octree-based finite element method for large-scale earthquake ground motion simulation. Comput. Model. Eng. Sci. 10(2), 99–112 (2005). https://doi.org/10.3970/cmes.2005.010.099 MathSciNetzbMATHGoogle Scholar
- 15.Bolten, M., Franchetti, F., Kelly, P., Lengauer, C., Mohr, M.: Algebraic description and automatic generation of multigrid methods in SPIRAL. Concurrency Comput. Pract. Exp. 29(17) (2017). Special Issue on Advanced Stencil-Code Engineering. https://doi.org/10.1002/cpe.4105
- 16.Braess, D., Sarazin, R.: An efficient smoother for the Stokes problem. Appl. Numer. Math. 23(1), 3–19 (1997). Multilevel methods (Oberwolfach, 1995)Google Scholar
- 17.Brandt, A., Livne, O.E.: Multigrid Techniques: 1984 Guide with Applications to Fluid Dynamics, vol. 67. SIAM, Philadelphia (2011)zbMATHGoogle Scholar
- 18.Braun, J.: The many surface expressions of mantle dynamics. Nat. Geosci. 3, 825–833 (2010). https://doi.org/10.1038/ngeo1020 Google Scholar
- 19.Brezzi, F., Douglas, J.: Stabilized mixed methods for the Stokes problem. Numer. Math. 53(1–2), 225–235 (1988)MathSciNetzbMATHGoogle Scholar
- 20.Brown, J.: Efficient nonlinear solvers for nodal high-order finite elements in 3D. J. Sci. Comp. 45(1–3), 48–63 (2010). https://doi.org/10.1007/s10915-010-9396-8 MathSciNetzbMATHGoogle Scholar
- 21.Bunge, H.P., Richards, M.A., Baumgardner, J.R.: Effect of depth-dependent viscosity on the planform of mantle convection. Nature 379(6564), 436–438 (1996). https://doi.org/10.1038/379436a0 Google Scholar
- 22.Bunge, H.P., Hagelberg, C.R., Travis, B.J.: Mantle circulation models with variational data assimilation: inferring past mantle flow and structure from plate motion histories and seismic tomography. Geophys. J. Int. 152(2), 280–301 (2003). https://doi.org/10.1046/j.1365-246X.2003.01823.x Google Scholar
- 23.Burstedde, C., Stadler, G., Alisic, L., Wilcox, L.C., Tan, E., Gurnis, M., Ghattas, O.: Large-scale adaptive mantle convection simulation. Geophys. J. Int. 192(3), 889–906 (2013). https://doi.org/10.1093/gji/ggs070 Google Scholar
- 24.Clevenger, T.C., Heister, T., Kanschat, G., Kronbichler, M.: A flexible, parallel, adaptive geometric multigrid method for FEM. Technical Report (2019). arXiv:1904.03317Google Scholar
- 25.Colli, L., Bunge, H.P., Schuberth, B.S.A.: On retrodictions of global mantle flow with assimilated surface velocities. Geophys. Res. Lett. 42(20), 8341i–8348 (2015). https://doi.org/10.1002/2015gl066001
- 26.Colli, L., Ghelichkhan, S., Bunge, H.P.: On the ratio of dynamic topography and gravity anomalies in a dynamic earth. Geophys. Res. Lett. 43(6), 2510–2516 (2016). https://doi.org/10.1002/2016gl067929 Google Scholar
- 27.Colli, L., Ghelichkhan, S., Bunge, H.P., Oeser, J.: Retrodictions of Mid Paleogene mantle flow and dynamic topography in the Atlantic region from compressible high resolution adjoint mantle convection models: sensitivity to deep mantle viscosity and tomographic input model. Gondwana Res. 53, 252–272 (2018). https://doi.org/10.1016/j.gr.2017.04.027 Google Scholar
- 28.Drzisga, D., Gmeiner, B., Rüde, U., Scheichl, R., Wohlmuth, B.: Scheduling massively parallel multigrid for multilevel Monte Carlo methods. SIAM J. Sci. Comp. 39(5), S873–S897 (2017). https://doi.org/10.1137/16M1083591 MathSciNetzbMATHGoogle Scholar
- 29.Drzisga, D., John, L., Rüde, U., Wohlmuth, B., Zulehner, W.: On the analysis of block smoothers for saddle point problems. SIAM J. Mat. Ana. Appl. 39(2), 932–960 (2018). https://doi.org/10.1137/16M1106304 MathSciNetzbMATHGoogle Scholar
- 30.Drzisga, D., Keith, B., Wohlmuth, B.: The surrogate matrix methodology: a priori error estimation (2019). Preprint arXiv:1902.07333Google Scholar
- 31.Drzisga, D., Rüde, U., Wohlmuth, B.: Stencil scaling for vector-valued PDEs on hybrid grids with applications to generalized Newtonian fluids (2019). Preprint arXiv:1908.08666Google Scholar
- 32.Eibl, S., Rüde, U.: A systematic comparison of runtime load balancing algorithms for massively parallel rigid particle dynamics. Comput. Phys. Commun. 244, 76–85 (2019). https://doi.org/10.1016/j.cpc.2019.06.020 Google Scholar
- 33.Elman, H.C., Silvester, D.J., Wathen, A.J.: Finite Elements and Fast Iterative Solvers: With Applications in Incompressible Fluid Dynamics. Oxford University Press, Oxford (2014)zbMATHGoogle Scholar
- 34.Galdi, G.P., Rannacher, R., Robertson, A.M., Turek, S.: Hemodynamical Flows. Delhi Book Store, New Delhi (2008)Google Scholar
- 35.Ghelichkhan, S., Bunge, H.P.: The compressible adjoint equations in geodynamics: derivation and numerical assessment. GEM Int. J. Geomath. 7(1), 1–30 (2016). https://doi.org/10.1007/s13137-016-0080-5 MathSciNetzbMATHGoogle Scholar
- 36.Ghelichkhan, S., Bunge, H.P.: The adjoint equations for thermochemical compressible mantle convection: derivation and verification by twin experiments. Proc. R. Soc. A Math. Phys. Eng. Sci. 474(2220), 20180329 (2018). https://doi.org/10.1098/rspa.2018.0329 MathSciNetzbMATHGoogle Scholar
- 37.Gmeiner, B.: Design and analysis of hierarchical hybrid multigrid methods for peta-scale systems and beyond. Ph.D. Thesis, Technische Fakultät der Friedrich-Alexander-Universität Erlangen-Nürnberg (2013)Google Scholar
- 38.Gmeiner, B., Köstler, H., Stürmer, M., Rüde, U.: Parallel multigrid on hierarchical hybrid grids: a performance study on current high performance computing clusters. Concurrency Comput. Pract. Exp. 26(1), 217–240 (2014)Google Scholar
- 39.Gmeiner, B., Waluga, C., Wohlmuth, B.: Local mass-corrections for continuous pressure approximations of incompressible flow. SIAM J. Numer. Ana. 52(6), 2931–2956 (2014). https://doi.org/10.1137/140959675 MathSciNetzbMATHGoogle Scholar
- 40.Gmeiner, B., Rüde, U., Stengel, H., Waluga, C., Wohlmuth, B.: Performance and scalability of hierarchical hybrid multigrid solvers for Stokes systems. SIAM J. Sci. Comp. 37(2), C143–C168 (2015). https://doi.org/10.1137/130941353 MathSciNetzbMATHGoogle Scholar
- 41.Gmeiner, B., Rüde, U., Stengel, H., Waluga, C., Wohlmuth, B.: Towards textbook efficiency for parallel multigrid. Numer. Math. Theory, Methods Appl. 8(1), 22–46 (2015). https://doi.org/10.4208/nmtma.2015.w10si
- 42.Gmeiner, B., Huber, M., John, L., Rüde, U., Waluga, C., Wohlmuth, B.: Massively parallel large scale Stokes flow simulation. In: NIC Symposium (2016)Google Scholar
- 43.Gmeiner, B., Huber, M., John, L., Rüde, U., Wohlmuth, B.: A quantitative performance study for Stokes solvers at the extreme scale. J. Comp. Sci. 17, 509–521 (2016). https://doi.org/10.1016/j.jocs.2016.06.006 MathSciNetGoogle Scholar
- 44.Godenschwager, C., Schornbaum, F., Bauer, M., Köstler, H., Rüde, U.: A framework for hybrid parallel flow simulations with a trillion cells in complex geometries. In: SC’13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–12. IEEE, Piscataway (2013)Google Scholar
- 45.Guillet, T., Pakmor, R., Springel, V., Chandrashekar, P., Klingenberg, C.: High-order magnetohydrodynamics for astrophysics with an adaptive mesh refinement discontinuous Galerkin scheme. Mon. Not. R. Astron. Soc. 485(3), 4209–4246 (2019)Google Scholar
- 46.Gupta, S., Wohlmuth, B., Helmig, R.: Multi-rate time stepping schemes for hydro-geomechanical model for subsurface methane hydrate reservoirs. Adv. Water Res. 91, 78–87 (2016). https://doi.org/10.1016/j.advwatres.2016.02.013 Google Scholar
- 47.Hager, B.H., Clayton, R.W., Richards, M.A., Comer, R.P., Dziewonski, A.M.: Lower mantle heterogeneity, dynamic topography and the geoid. Nature 313(6003), 541–545 (1985). https://doi.org/10.1038/313541a0 Google Scholar
- 48.Hager, G., Treibig, J., Habich, J., Wellein, G.: Exploring performance and power properties of modern multicore chips via simple machine models. Concurrency Comput. Pract. Exp. 28, 189–210 (2014)Google Scholar
- 49.Hartley, R.A., Roberts, G.G., White, N.J., Richardson, C.: Transient convective uplift of an ancient buried landscape. Nat. Geosci. 4(8), 562–565 (2011). https://doi.org/10.1038/ngeo1191 Google Scholar
- 50.Heister, T., Dannberg, J., Gassmöller, R., Bangerth, W.: High accuracy mantle convection simulation through modern numerical methods - II: realistic models and problems. Geophys. J. Int. 210(2), 833–851 (2017). https://doi.org/10.1093/gji/ggx195 Google Scholar
- 51.Höink, T., Lenardic, A., Richards, M.A.: Depth-dependent viscosity and mantle stress amplification: implications for the role of the asthenosphere in maintaining plate tectonics. Geophys. J. Int. 191(1), 30–41 (2012). https://doi.org/10.1111/j.1365-246X.2012.05621.x Google Scholar
- 52.Horbach, A., Bunge, H.P., Oeser, J.: The adjoint method in geodynamics: derivation from a general operator formulation and application to the initial condition problem in a high resolution mantle circulation model. GEM Int. J. Geomath. 5(2), 163–194 (2014). https://doi.org/10.1007/s13137-014-0061-5 MathSciNetzbMATHGoogle Scholar
- 53.Huber, M., John, L., Pustejovska, P., Rüde, U., Waluga, C., Wohlmuth, B.: Solution techniques for the Stokes system: A priori and a posteriori modifications, resilient algorithms. In: Proceedings of the ICIAM, Beijing. Higher Education Press, Beijing (2015)Google Scholar
- 54.Huber, M., Gmeiner, B., Rüde, U., Wohlmuth, B.: Resilience for massively parallel multigrid solvers. SIAM J. Sci. Comp. 38(5), S217–S239 (2016). https://doi.org/10.1137/15M1026122 MathSciNetzbMATHGoogle Scholar
- 55.Huber, M., Rüde, U., Waluga, C., Wohlmuth, B.: Surface couplings for subdomain-wise isoviscous gradient based Stokes finite element discretizations. J. Sci. Comp. 74(2), 895–919 (2018)MathSciNetzbMATHGoogle Scholar
- 56.Huber, M., Rüde, U., Wohlmuth, B.: Adaptive control in roll-forward recovery for extreme scale multigrid. Int. J. High Perf. Comp. Appl. pp. 1–21 (2018). https://doi.org/10.1177/1094342018817088
- 57.Iaffaldano, G., Lambeck, K.: Pacific plate-motion change at the time of the Hawaiian-Emperor bend constrains the viscosity of Earth’s asthenosphere. Geophys. Res. Lett. 41(10), 3398–3406 (2014). https://doi.org/10.1002/2014GL059763 Google Scholar
- 58.Ilic, A., Pratas, F., Sousa, L.: Cache-aware Roofline model: upgrading the loft. IEEE Comput. Archit. Lett. 13(1), 21–24 (2014). https://doi.org/10.1109/L-CA.2013.6
- 59.Ismail-Zadeh, A., Schubert, G., Tsepelev, I., Korotkii, A.: Inverse problem of thermal convection: numerical approach and application to mantle plume restoration. Phys. Earth Planet. Inter. 145(1–4), 99–114 (2004). https://doi.org/10.1016/j.pepi.2004.03.006
- 60.Jumah, N., Kunkel, J., Zängl, G., Yashiro, H., Dubos, T., Meurdesoif, Y.: GGDML: icosahedral models language extensions. J. Comput. Sci. Technol. Updat. 4, 1–10 (2017). https://doi.org/10.15379/2410-2938.2017.04.01.01
- 61.Karypis, G., Kumar, V.: A parallel algorithm for multilevel graph partitioning and sparse matrix ordering. J. Parallel Distrib. Comput. 48(1), 71–95 (1998). https://doi.org/10.1006/jpdc.1997.1403
- 62.Klawonn, A., Lanser, M., Rheinbach, O.: Toward extremely scalable nonlinear domain decomposition methods for elliptic partial differential equations. SIAM J. Sci. Comput. 37(6), C667–C696 (2015). https://doi.org/10.1137/140997907
- 63.Kohl, N., Thönnes, D., Drzisga, D., Bartuschat, D., Rüde, U.: The HyTeG finite-element software framework for scalable multigrid solvers. Int. J. Parallel Emergent Distrib. Syst. 34, 1–20 (2018). https://doi.org/10.1080/17445760.2018.1506453
- 64.Kohl, N., Hötzer, J., Schornbaum, F., Bauer, M., Godenschwager, C., Köstler, H., Nestler, B., Rüde, U.: A scalable and extensible checkpointing scheme for massively parallel simulations. Int. J. High Perform. Comput. Appl. 33(4), 571–589 (2019)
- 65.Kowarschik, M., Rüde, U., Weiss, C., Karl, W.: Cache-aware multigrid methods for solving Poisson’s equation in two dimensions. Computing 64(4), 381–399 (2000)
- 66.Kronbichler, M., Kormann, K.: A generic interface for parallel cell-based finite element operator application. Comput. Fluids 63, 135–147 (2012). https://doi.org/10.1016/j.compfluid.2012.04.012
- 67.Kronbichler, M., Kormann, K.: Fast matrix-free evaluation of discontinuous Galerkin finite element operators. ACM Trans. Math. Softw. 45(3), 29:1–29:40 (2019). https://doi.org/10.1145/3325864
- 68.Kronbichler, M., Ljungkvist, K.: Multigrid for matrix-free high-order finite element computations on graphics processors. ACM Trans. Parall. Comput. 6(1) (2019). https://doi.org/10.1145/3322813
- 69.Kronbichler, M., Wall, W.A.: A performance comparison of continuous and discontinuous Galerkin methods with fast multigrid solvers. SIAM J. Sci. Comput. 40(5), A3423–A3448 (2018). https://doi.org/10.1137/16M110455X
- 70.Kuckuk, S., Köstler, H.: Automatic generation of massively parallel codes from ExaSlang. Computation 4(3), 27:1–27:20 (2016). Special Issue on High Performance Computing (HPC) Software Design. https://doi.org/10.3390/computation4030027
- 71.Kunkel, J., Novikova, A., Betke, E.: Towards decoupling the selection of compression algorithms from quality constraints - an investigation of lossy compression efficiency. Supercomput. Front. Innov. 4, 17–33 (2017). https://doi.org/10.14529/jsfi170402
- 72.Lambeck, K., Smither, C., Johnston, P.: Sea-level change, glacial rebound and mantle viscosity for northern Europe. Geophys. J. Int. 134(1), 102–144 (1998). https://doi.org/10.1046/j.1365-246x.1998.00541.x
- 73.Larin, M., Reusken, A.: A comparative study of efficient iterative solvers for generalized Stokes equations. Numer. Linear Algebra Appl. 15(1), 13–34 (2008). https://doi.org/10.1002/nla.561
- 74.Leitenmaier, L.: Data compression for simulation data from Earth mantle convection. Bachelor Thesis, Lehrstuhl für Informatik 10 (Systemsimulation), Friedrich-Alexander-Universität Erlangen-Nürnberg (2014)
- 75.Lengauer, C., Apel, S., Bolten, M., Chiba, S., Rüde, U., Teich, J., Größlinger, A., Hannig, F., Köstler, H., Claus, L., Grebhahn, A., Groth, S., Kronawitter, S., Kuckuk, S., Rittich, H., Schmitt, C., Schmitt, J.: ExaStencils: Advanced Multigrid Solver Generation (In this volume)
- 76.Lengauer, C., Apel, S., Bolten, M., Größlinger, A., Hannig, F., Köstler, H., Rüde, U., Teich, J., Grebhahn, A., Kronawitter, S., Kuckuk, S., Rittich, H., Schmitt, C.: ExaStencils: Advanced stencil-code engineering. In: Euro-Par 2014: Parallel Processing Workshops. Lecture Notes in Computer Science, vol. 8806, pp. 553–564. Springer, Berlin (2014). https://doi.org/10.1007/978-3-n-2_47
- 77.Maitre, J.F., Musy, F., Nigon, P.: A fast solver for the Stokes equations using multigrid with a Uzawa smoother. In: Advances in Multigrid Methods (Oberwolfach, 1984). Notes on Numerical Fluid Mechanics, vol. 11, pp. 77–83. Vieweg, Braunschweig (1985)
- 78.May, D.A., Brown, J., Pourhiet, L.L.: A scalable, matrix-free multigrid preconditioner for finite element discretizations of heterogeneous Stokes flow. Comput. Methods Appl. Mech. Eng. 290, 496–523 (2015). https://doi.org/10.1016/j.cma.2015.03.014
- 79.May, D., Sanan, P., Rupp, K., Knepley, M., Smith, B.: Extreme scale multigrid components within PETSc. In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC (2016)
- 80.Mitrovica, J.X.: Haskell [1935] revisited. J. Geophys. Res. 101(B1), 555 (1996). https://doi.org/10.1029/95JB03208
- 81.Mitrovica, J.X., Forte, A.M.: A new inference of mantle viscosity based upon joint inversion of convection and glacial isostatic adjustment data. Earth Planet. Sci. Lett. 225(1–2), 177–189 (2004). https://doi.org/10.1016/j.epsl.2004.06.005
- 82.Mitrovica, J.X., Wahr, J.: Ice age earth rotation. Ann. Rev. Earth Planet. Sci. 39, 577–616 (2011). https://doi.org/10.1146/annurev-earth-040610-133404
- 83.Müller, R.D., Sdrolias, M., Gaina, C., Roest, W.R.: Age, spreading rates, and spreading asymmetry of the world’s ocean crust. Geochem. Geophys. Geosyst. 9(4), 1525–2027 (2008)
- 84.Price, M.G., Davies, J.H.: Profiling the robustness, efficiency and limits of the forward-adjoint method for 3-D mantle convection modelling. Geophys. J. Int. 212(2), 1450–1462 (2017). https://doi.org/10.1093/gji/ggx489
- 85.Ricard, Y., Wuming, B.: Inferring the viscosity and the 3-D density structure of the mantle from geoid, topography and plate velocities. Geophys. J. Int. 105(3), 561–571 (1991). https://doi.org/10.1111/j.1365-246X.1991.tb00796.x
- 86.Ricard, Y., Spada, G., Sabadini, R.: Polar wandering of a dynamic earth. Geophys. J. Int. 113(2), 284–298 (1993). https://doi.org/10.1111/j.1365-246X.1993.tb00888.x
- 87.Richards, M.A., Hager, B.H.: Geoid anomalies in a dynamic earth. J. Geophys. Res. 89(B7), 5987–6002 (1984). https://doi.org/10.1029/JB089iB07p05987
- 88.Richards, M.A., Lenardic, A.: The Cathles parameter (Ct): a geodynamic definition of the asthenosphere and implications for the nature of plate tectonics. Geochem. Geophys. Geosyst. 19(12), 4858–4875 (2018). https://doi.org/10.1029/2018GC007664
- 89.Richards, M., Bunge, H., Ricard, Y., Baumgardner, J.: Polar wandering in mantle convection models. Geophys. Res. Lett. 26(12), 1777–1780 (1999). https://doi.org/10.1029/1999GL900331
- 90.Rüde, U.: Mehrgittermethode—Grundlage der computergestützten Wissenschaften. Informatik Spektrum 42(2), 138–143 (2019)
- 91.Rüde, U., Willcox, K., McInnes, L.C., Sterck, H.D.: Research and education in computational science and engineering. SIAM Rev. 60(3), 707–754 (2018). https://doi.org/10.1137/16M1096840
- 92.Rudi, J., Malossi, A.C.I., Isaac, T., Stadler, G., Gurnis, M., Staar, P.W.J., Ineichen, Y., Bekas, C., Curioni, A., Ghattas, O.: An extreme-scale implicit solver for complex PDEs: Highly heterogeneous flow in Earth’s Mantle. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’15, pp. 5:1–5:12. ACM, New York (2015). https://doi.org/10.1145/2807591.2807675
- 93.Rudolph, M.L., Lekić, V., Lithgow-Bertelloni, C.: Viscosity jump in Earth’s mid-mantle. Science 350(6266), 1349–1352 (2015). https://doi.org/10.1126/science.aad1929
- 94.Schmitt, C., Kuckuk, S., Hannig, F., Köstler, H., Teich, J.: ExaSlang: A domain-specific language for highly scalable multigrid solvers. In: Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pp. 42–51. IEEE Computer Society, Washington (2014)
- 95.Schöberl, J., Zulehner, W.: On Schwarz-type smoothers for saddle point problems. Numer. Math. 95(2), 377–399 (2003). https://doi.org/10.1007/s00211-002-0448-3
- 96.Schornbaum, F., Rüde, U.: Extreme-scale block-structured adaptive mesh refinement. SIAM J. Sci. Comput. 40(3), C358–C387 (2018)
- 97.Stals, L., Rüde, U., Weiß, C., Hellwagner, H.: Data local iterative methods for the efficient solution of partial differential equations. In: John, N., Andrew, G., Michael, T. (eds.) Computational Techniques and Applications: CTAC97, Proceedings of the Eighth Biennial Conference. World Scientific, Singapore (1998)
- 98.Vanka, S.: Block-implicit multigrid solution of Navier-Stokes equations in primitive variables. J. Comput. Phys. 65, 138–158 (1986). https://doi.org/10.1016/0021-9991(86)90008-2
- 99.Verfürth, R.: A multilevel algorithm for mixed problems. SIAM J. Numer. Anal. 21, 264–271 (1984). https://doi.org/10.1137/0721019
- 100.Vynnytska, L., Bunge, H.: Restoring past mantle convection structure through fluid dynamic inverse theory: regularisation through surface velocity boundary conditions. GEM Int. J. Geomath. 6(1), 83–100 (2014). https://doi.org/10.1007/s13137-014-0060-6
- 101.Waluga, C., Wohlmuth, B., Rüde, U.: Mass-corrections for the conservative coupling of flow and transport on collocated meshes. J. Comput. Phys. 305, 319–332 (2016). https://doi.org/10.1016/j.jcp.2015.10.044
- 102.Weismüller, J., Gmeiner, B., Ghelichkhan, S., Huber, M., John, L., Wohlmuth, B., Rüde, U., Bunge, H.P.: Fast asthenosphere motion in high-resolution global mantle flow models. Geophys. Res. Lett. 42(18), 7429–7435 (2015). https://doi.org/10.1002/2015GL063727
- 103.Williams, S., Waterman, A., Patterson, D.: Roofline: An insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009). https://doi.org/10.1145/1498765.1498785
- 104.Wobker, H., Turek, S.: Numerical studies of Vanka-type smoothers in computational solid mechanics. Adv. Appl. Math. Mech. 1(1), 29–55 (2009)
- 105.Zhong, S., McNamara, A., Tan, E., Moresi, L., Gurnis, M.: A benchmark study on mantle convection in a 3-D spherical shell using CitcomS. Geochem. Geophys. Geosyst. 9, Q10017 (2008). https://doi.org/10.1029/2008GC002048
- 106.Zulehner, W.: A class of smoothers for saddle point problems. Computing 65, 227–246 (2000). https://doi.org/10.1007/s006070070008
- 107.Zulehner, W.: Analysis of iterative methods for saddle point problems: a unified approach. Math. Comp. 71(238), 479–505 (2002)
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.