FEMPAR: An Object-Oriented Parallel Finite Element Framework
Abstract
FEMPAR is an open source object-oriented Fortran200X scientific software library for the high-performance scalable simulation of complex multiphysics problems governed by partial differential equations at large scales, by exploiting state-of-the-art supercomputing resources. It is a highly modularized, flexible, and extensible library that provides a set of modules that can be combined to carry out the different steps of the simulation pipeline. FEMPAR includes a rich set of algorithms for the discretization step, namely (arbitrary-order) grad-, div-, and curl-conforming finite element methods, discontinuous Galerkin methods, B-splines, and unfitted finite element techniques on cut cells, combined with h-adaptivity. The linear solver module relies on state-of-the-art bulk-asynchronous implementations of multilevel domain decomposition solvers for the different discretization alternatives and block-preconditioning techniques for multiphysics problems. FEMPAR is a framework that provides users with out-of-the-box state-of-the-art discretization techniques and highly scalable solvers for the simulation of complex applications, hiding the dramatic complexity of the underlying algorithms. But it is also a framework for researchers who want to experiment with new algorithms and solvers, since it is highly extensible. In this work, the first one in a series of articles about FEMPAR, we provide a detailed introduction to the software abstractions used in the discretization module and the related geometrical module. We also provide some ingredients about the assembly of linear systems arising from finite element discretizations, but the software design of complex scalable multilevel solvers is postponed to a subsequent work.
1 Introduction
Even though the origins of the FE method trace back to the 1950s, the field has drastically evolved during the last six decades, leading to increasingly complex algorithms to improve accuracy, stability, and performance. The use of the p-version of the FE method and its exponential convergence makes high-order approximations an excellent option in many applications [1]. Adaptive mesh refinement driven by a posteriori error estimates, i.e., h-adaptivity, is an essential ingredient to reduce computational cost in an automatic way [2]. For smooth solutions, p-adaptivity or hybrid hp-adaptivity can further reduce computational cost for a target level of accuracy [3]. Originally, FE methods were restricted to nodal Lagrangian bases for structural problems. The extension of FE methods to other applications, like porous media flow or electromagnetism, motivated the design of more complex bases, which require different mappings from the reference to the physical space and complicate the implementation of these techniques in standard FE codes. Saddle-point problems also require particular mixed FE discretizations for stability purposes [4, 5]. More recently, novel FE formulations have been proposed within the frame of exterior calculus, e.g., for mixed linear elasticity problems [6]. Physics-compatible discretizations are also gaining attention, e.g., in the field of incompressible fluid mechanics. Divergence-free mixed FEs satisfy mass conservation up to machine precision, but their implementation is certainly challenging [7]. During the last decade, a huge part of the computational mechanics community has embraced isogeometric analysis techniques [8], in which the discretization spaces are defined in terms of NURBS (or simply splines), leading to smoother global spaces. In the opposite direction, discontinuous Galerkin (DG) methods have also been actively developed, and novel approaches, like hybridizable DG and Petrov-Galerkin DG methods, have been proposed [9, 10].
As the discretization methods become more and more complex, the efficient implementation of these techniques is more complicated. It also poses a challenge in the design of scientific software libraries, which should be extensible and provide a framework for the (easy) implementation of novel techniques, to be resilient to new algorithmic trends.
The hardware on which scientific codes run evolves even faster. For 40 years, core performance has been steadily increasing, as predicted by Moore’s law. Within a few years, supercomputers will reach 1 exaflop/s, a dramatic improvement in computational power that will not only affect the extreme scale machines but radically transform the whole range of platforms, from desktops to high performance computing (HPC) clouds. The ability to efficiently exploit the forthcoming 100x boost of computational performance will have a tremendous impact on scientific discoveries and economic benefits based on computational science, reaching almost every field of research. However, all the foreseen exascale growth in computational power will be delivered by increasing hardware parallelism (in distinct forms), and the efficient exploitation of these resources will not be a simple task. HPC architectures will combine general-purpose fat cores, fine-grain many-core accelerators (GPUs, DSPs, FPGAs, Intel MIC, etc.), and multiple-level disruptive-technology memories, with high non-uniformity as common denominator [11]. This (inevitable) trend challenges algorithm and software design. Traditional bulk-synchronous message passing interface (MPI) approaches are likely to face significant performance obstacles. Significant progress is already being made by MPI+X [12] (with X=OpenMP, CUDA, OpenCL, OmpSs, Kokkos, etc.) hybrid execution models. Going a step further, asynchronous many-task execution models (e.g., Charm++ [13], Legion [14], or HPX [15]) and their supporting runtime systems hold great promise [16].
Traditionally, researchers in the field of scientific computing developed codes with a very small number of developers, e.g., a university department, and a limited life span. The software engineering behind scientific codes was poor. Codes were rigid and non-extensible, and developed for a target application and a specific numerical method. However, the increasing levels of complexity both in terms of algorithms and hardware make the development of scientific software that can efficiently run state-of-the-art numerical algorithms on HPC resources a real challenge. Starting a project of this kind from scratch is increasingly complex. Furthermore, due to the huge resources required to carry out such a project, it is natural to develop a framework that will be resilient to new algorithmic and hardware trends, in order to maximize its lifetime, and that will be applicable to a broad range of applications. In this sense, object-oriented (OO) programming, which provides modularity of codes and data hiding, is the key for the software design of flexible and scalable (in terms of developers) projects.
There are a number of open source OO FE libraries available through the Internet, e.g., deal.II [17, 18], FEniCS [19], GRINS [20], Nektar++ [21], MOOSE [22], MFEM [23], FreeFem++ [24], and DUNE [25]. In general, these libraries aim to provide all the machinery required to simulate complex problems governed by partial differential equations (PDEs) using FE techniques. In any case, every library has its main goal and distinctive features. Some libraries, like FreeFem++ or FEniCS, have extremely simple user interfaces. FEniCS has its own domain specific language for weak forms to automatically generate the corresponding FE code (which prevents p-adaptivity) and includes a collection of Python wrappers to provide user-friendly access to the services of the library. Other sophisticated libraries like deal.II or DUNE have a slightly more demanding learning curve. In general, parallel adaptivity is at most partially supported; as far as we know, none of the libraries above supports parallel hp-adaptivity, unless DG methods are being used. Some libraries are restricted to a particular cell topology, e.g., deal.II is limited to hexahedral/quadrilateral (n-cube) meshes, while FEniCS only supports simulations on triangular/tetrahedral (n-simplex) meshes.
In general, these libraries provide modules for some of the different steps in the simulation pipeline, which involves the setup of the mesh, the construction of the FE space, the integration and assembly of the weak form, the solution of the resulting linear system, and the visualization of the computed solution. The solution of the linear system is clearly segregated from the discretization step in all the scientific software libraries described above (for parallel computations); the linear system is transferred to a general-purpose sparse linear algebra library, mainly PETSc [26, 27, 28], Hypre [29], and Trilinos [30, 31]. As a result, the coupling between the discretization step and the linear solver step is somewhat weak, since they rely on general-purpose solvers, which usually involve simple interfaces. The strong point of these general-purpose numerical linear algebra libraries is that they are problem-independent, but this also limits their performance for specific applications, since they cannot fully exploit the underlying properties of the PDE operator and the numerical discretization.^{1} This segregation has a clear impact on the type of methods to be used. This black-box approach to general-purpose linear solvers has favoured the use of algebraic multigrid methods, the de facto linear solver [29]. On the other hand, geometric multigrid methods and domain decomposition (DD) methods, which are very specific to mesh-based PDE solvers, are not common, even though they can be superior to algebraic methods in many cases. A geometric multigrid method that exploits the hp-adaptive structure of the FE space is included in deal.II, but it can only be used in the serial case. In parallel scenarios, DD methods have certainly evolved during the last decade. Modern DD methods do not (necessarily) rely on a static condensation of the internal variables, which requires sparse direct methods for the local subdomain problems.
Instead, inexact solvers can be used, e.g., multigrid methods, and linear complexity DD preconditioners can be defined (see [33, 34]). The definition of two-level DD methods resembles the one of FE methods, by exchanging the FE and subdomain concepts, and their definition is strongly related to the one of multiscale FEs [35]. Furthermore, multilevel extensions can be naturally defined. In short, state-of-the-art multilevel DD methods can be understood (in their inexact version) as a non-conforming multigrid method. Even though the mathematical theory of the DD methods is very sound, high performance implementations are quite recent (see [36, 37, 38]). On the other hand, we are not aware of any general-purpose FE code that integrates a DD algorithm in the solution workflow. DD methods require sub-assembled matrices to be used, and are not supported by the majority of the existing advanced OO FE libraries. Analogously, the use of block-preconditioning is in general poorly supported, because it involves the discretization of additional operators to define the approximated Schur complement, and the corresponding block-based assembly of matrices.
On the other hand, based on the supercomputing trends, the segregation between time discretization, linearization, space discretization, and linear system solve will progressively blur. As an example, nonlinear preconditioning and parallel-in-time solvers are two natural ways to attain the higher levels of concurrency of the forthcoming exascale supercomputers [36, 39]. These facts will complicate even more the rigid workflow of current advanced FE libraries. In this sense, current efforts in PETSc to provide nonlinear preconditioning interfaces can be found in [40], relying on callback functions, and the XBraid solver [41] aims to provide time-parallelism in a non-intrusive way.
2 The FEMPAR Project
In this work, we present FEMPAR, an OO FE framework for the solution of PDEs, designed from inception to be highly scalable on supercomputers and to easily handle complex multiphysics problems. The first public release of FEMPAR has almost 300K lines of code written in (mostly) OO Fortran and makes intensive use of the features defined in the 2003 and 2008 standards of the language. The source code that is complementary to this work corresponds to the first public release of FEMPAR, i.e., version 1.0.0. It is available at a git repository [42]. In particular, the first public release was assigned the git tag FEMPAR1.0.0, in accordance with the “Semantic Versioning” system.^{2}
FEMPAR is very rich in terms of FE technology. In particular, it includes not only Lagrangian FEs, but also curl- and div-conforming ones, e.g., Nédélec (edge) and Raviart-Thomas FEs. The library supports n-cube and n-simplex meshes, and arbitrary high-order bases for all the FEs included. Continuous and discontinuous spaces can be used, providing all the machinery for the integration of DG facet (i.e., edges in 2D and faces in 3D) terms. Recently, in a beta version of the code, B-splines have also been added, together with the support for cut cell methods (using XFEM-type techniques) and hp-adaptivity, but we will not discuss these developments for the sake of brevity.
Moreover, FEMPAR has been developed with the aim to provide a framework that will allow developers to implement complex techniques that are not well suited to the traditional segregated workflow commented above. FEMPAR also provides a highly scalable built-in numerical linear algebra module based on state-of-the-art domain decomposition solvers. FEMPAR can provide partially assembled matrices, required for DD solvers; the multilevel BDDC solver in FEMPAR has scaled up to almost half a million cores and 1.75 million MPI tasks (subdomains) in the JUQUEEN Supercomputer [34, 37]. It includes an abstract framework to construct applications and preconditioners based on multilevel non-overlapping partitions. Even though every block within the library preserves modularity, the interface between discretization and numerical linear algebra modules within FEMPAR is very rich and focused on PDE-based linear systems. In the path to the exascale, FEMPAR has been designed to permit an asynchronous implementation of multilevel methods, both in terms of multiphysics FEs and multilevel solvers, which has been exploited, e.g., in [37]. It is a unique feature that is not available in other similar libraries. The library also allows the user to define blocks in multiphysics applications, which can be used to easily implement complex block preconditioners [43, 44, 45]. All these blocks are very customizable, which has already been used to develop scalable DD solvers for electromagnetics problems and block preconditioners for multiphysics problems, e.g., magnetohydrodynamics [44]. These distinctive features of FEMPAR, however, are not discussed in this article but in a forthcoming one. A general discussion of the main ingredients of our implementation of the discretization step using FE-like approximations is first necessary, which is the purpose of this work.
FEMPAR has already been successfully used in a wide set of applications by the authors of the library: simulation of turbulent flows and stabilized FE methods [46, 47, 48, 49], magnetohydrodynamics [50, 51, 52, 53, 54], monotonic FEs [55, 56, 57, 58, 59], unfitted FEs and embedded boundary methods [60], and additive manufacturing simulations [61]. It has also been used for the highly efficient implementation of DD solvers [34, 37, 39, 62, 63, 64, 65, 66] and block preconditioning techniques [44].
This work is more than an overview article with the main features of the library. It is a detailed description of the software abstractions being used within FEMPAR to develop an efficient, modular, and extensible implementation of FE methods and supporting modules in a broad sense. To this end, we enrich the discussion with code snippets that describe data structures, bindings, and examples of use.^{3} This document is intended to be used as a guide for new FEMPAR developers who want to become familiar with its software abstractions. But it can also be a useful tool for developers of FE codes who want to learn how to implement FE methods in an advanced OO framework. In any case, due to the size of the library itself, many details cannot be exposed, to keep a reasonable article length. The article can be read in different ways, since it is not necessary to fully understand all the preceding sections to grasp the main ideas of a section. For instance, the section about the abstract implementation of polytopes in arbitrary dimensions and its related algorithms is quite technical, and a reader who is not particularly interested in the internal design of this type and its bindings implementations can skip it. Experienced FE researchers can skip the short section with the basics of FE methods, and only look at it (if needed) when it is referred to in subsequent sections.

The polytope, which describes a set of admissible geometries and permits the automatic, dimension-independent generation of reference cells and structured domains. The mathematics underlying the polytope are presented in Sect. 3.14, and its software implementation in Sect. 4.

The polynomial abstraction and related data types, which are presented in Sects. 3.4 and 5, respectively. These sections describe how shape function bases can be generated for arbitrary orders and for n-cube and n-simplex topologies.

The reference FE in Sect. 6, which describes the reference cell and defines a set of basis functions and degrees of freedom on each cell.

The triangulation in Sect. 7, which represents a discrete approximation of the physical domain \(\Omega \).

A set of tools required to perform numerical integration (e.g., quadratures and geometrical maps) produced by the reference FE and described in Sect. 8 for cell integrals and in Sect. 9 for facet integrals.

The FE space described in Sect. 10, built from a triangulation and a set of reference FEs, which represents a global space of FE functions.

The discrete integration, an abstract class to be extended by the user to define an affine FE operator, which describes the numerical integration of the weak form of the problem to be solved, described in Sect. 11.2.

The linear (affine) operator in Sect. 11, whose root is the solution of the problem at hand, constructed using the FE space and a discrete integration.

An example of a user driver in Sect. 12, in which the different ingredients previously described are used to simulate a problem governed by PDEs, the Stokes system.
3 The FE Framework
In this section, we briefly introduce all the mathematical abstractions behind the FE method for the discretization of problems governed by PDEs. For a more detailed exposition of the topics, we refer to [69, 70, 71, 72]. The FEs described below (and many others not covered herein) can be formulated and analyzed using the finite element exterior calculus framework [6], which makes use of exterior algebra and exterior calculus concepts. In this framework, one can define FEs, e.g., div- and curl-conforming ones, in arbitrary space dimensions, using the concept of differential k-forms. However, we have decided not to use such a presentation of FE methods, to simplify the exposition for readers not familiar with these abstractions.
3.1 The Boundary Value Problem in Weak Form
Example 3.1
Example 3.2
3.2 Space Discretization with FEs
In order to define FE spaces, we require a triangulation \(\mathcal {T}_h\) of the domain \(\Omega \) into a set \(\{ K\}\) of cells. This triangulation is assumed to be conforming, i.e., for two neighbour cells \(K^+, \, K^- \in \mathcal {T}_h\), their intersection \(K^+ \cap K^-\) is a whole k-face (\(k<d\)) of both cells (note that k-face refers to a geometrical entity, e.g., cells, faces, edges, and vertices for \(d=3\); see Sect. 3.14). In practice, the cells must be expressed as a particular type of mapping over a set of admissible geometries (polytopes, see Sect. 3.14). Thus, for every element \(K \in \mathcal {T}_h\), we assume that there is a reference cell \({\hat{K}}_K\) and a diffeomorphism \(\varvec{\Phi }_K: {\hat{K}}\rightarrow K\). In what follows, we usually use the notation \({\hat{\varvec{x}}} \, \doteq \, \varvec{\Phi }_K^{-1}(\varvec{x})\).
The definition of the functional space also relies on a reference functional space as follows: (1) we define a functional space in the reference cell \({\hat{K}}\); (2) we define a set of functions in the physical cell \(K\) via a function mapping; (3) we define the global space as the assembly of cell-based spaces plus continuity constraints between cells. In order to present this process, we introduce the concepts of reference FE, FE, and FE space, respectively.
3.3 The FE Concept in the Reference and Physical Spaces
In the reference space, we build reference FEs \(({\hat{K}},\hat{\mathcal {V}},{\hat{\Sigma }})\) as follows. First, we consider a bounded set of possible cell geometries, denoted by \({\hat{K}}\); see the definition of polytopes in Sect. 3.14. On \({\hat{K}}\), we build a functional space \(\hat{\mathcal {V}}\) and a set of DOFs \({\hat{\Sigma }}\). We consider some examples of reference FEs in Sects. 3.8, 3.9, and 3.10.
In the physical space, the FE triplet \((K,\mathcal {V},\Sigma )\) on a mesh cell \(K\in \mathcal {T}_h\) relies on: (1) a reference FE \(({\hat{K}},\hat{\mathcal {V}},{\hat{\Sigma }})\), (2) a geometrical mapping \(\varvec{\Phi }_K\) such that \(K\, \doteq \, \varvec{\Phi }_K({\hat{K}})\), and (3) a linear bijective function mapping \({\hat{\Psi }}_K: \hat{\mathcal {V}}\rightarrow \hat{\mathcal {V}}\). The functional space in the physical space is defined as \(\mathcal {V}\, \doteq \, \{ \hat{\Psi }_K({\hat{v}}) \circ \varvec{\Phi }^{-1}_K: \, {\hat{v}} \in \hat{\mathcal {V}}\}\); we will also use \({\Psi }_K: \hat{\mathcal {V}}\rightarrow \mathcal {V}\) defined as \({\Psi }_K({\hat{v}}) \, \doteq \, {\hat{\Psi }}_K({\hat{v}}) \circ \varvec{\Phi }^{-1}_K\). The set of DOFs in the physical space is defined as \(\Sigma \, \doteq \, \{ {\hat{\sigma }} \circ {\Psi }_K^{-1} \, : \, {\hat{\sigma }} \in {\hat{\Sigma }}\}\). Given the set of shape functions \(\{ \hat{\phi }^{a} : a \in {\mathcal {N}}_{\hat{\Sigma }}\}\) in the reference FE, it is easy to check that \(\{ \phi _K^{a} \, \doteq \, \Psi _K( \hat{\phi }^{a} ) : a \in {\mathcal {N}}_{\hat{\Sigma }}\}\) is the set of shape functions of the FE in the physical space.
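As a minimal illustration of the reference-to-physical construction, the following Python sketch builds a 1D linear Lagrangian FE on a physical cell \(K=[a,b]\) from the reference cell \([0,1]\) and checks that the physical DOFs and shape functions are dual to each other. The function and variable names are illustrative assumptions, not FEMPAR identifiers, and the function map is taken to be the identity, as is done for Lagrangian FEs.

```python
# Sketch: a 1D linear Lagrangian FE on the physical cell K = [a, b].
# All names here are illustrative; they are not FEMPAR identifiers.

def make_physical_fe(a, b):
    # Reference shape functions on the reference cell K_hat = [0, 1].
    phi_hat = [lambda xh: 1.0 - xh, lambda xh: xh]

    # Geometrical map Phi_K: K_hat -> K and its inverse (affine in 1D).
    Phi = lambda xh: a + (b - a) * xh
    Phi_inv = lambda x: (x - a) / (b - a)

    # With the function map Psi_hat_K equal to the identity,
    # Psi_K(v_hat) = v_hat o Phi_K^{-1}.
    phi = [lambda x, f=f: f(Phi_inv(x)) for f in phi_hat]

    # Reference DOFs evaluate at the reference nodes 0 and 1, so the physical
    # DOFs sigma = sigma_hat o Psi_K^{-1} evaluate v at Phi_K(node).
    sigma = [lambda v, xh=xh: v(Phi(xh)) for xh in (0.0, 1.0)]
    return phi, sigma


phi, sigma = make_physical_fe(2.0, 5.0)

# duality[i][j] = sigma_i(phi^j); it is the identity matrix when the
# physical shape functions are the dual basis of the physical DOFs.
duality = [[sigma[i](phi[j]) for j in range(2)] for i in range(2)]
```

For div- and curl-conforming FEs the function map is not the identity, so only the definition of `phi` and `sigma` would change in such a sketch.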
The reference FE space \(\hat{\mathcal {V}}\) is usually a polynomial space. Thus, the first ingredient is to define bases of polynomials; see Sect. 3.4. The analytical expression of the basis of shape functions is not straightforward for complicated definitions of moments; this topic is covered in Sect. 3.5. After that, we will consider how to build global (and conforming) FE spaces in Sect. 3.6, and how to integrate the bilinear forms in the corresponding weak formulation in Sect. 3.7. We finally provide three examples of FEs in Sects. 3.8, 3.9, and 3.10.
3.4 Construction of Polynomial Spaces
This Cartesian product construction leads to a basis for the local FE spaces usually used on n-cubes, i.e., the space of polynomials that are of degree less than or equal to k with respect to each variable \(x_1, \ldots , x_d\). We can define monomials by a d-tuple \(\varvec{\alpha }\) as \(p_{\varvec{\alpha }}(\varvec{x}) \, \doteq \, \prod _{i=1}^d x_i^{\alpha _i}\), and the polynomial space of order \({\varvec{k}}\) as \({\mathcal {Q}}_{\varvec{k}}= \mathrm{span} \{ p_{\varvec{\alpha }}(\varvec{x}) \, : 0 \le \alpha _i \le k_i, \, i = 1, \ldots , d \}\). We have \({\mathcal {Q}}_{\varvec{k}}= \mathrm{span} \{ \ell \, : \, \ell \in \mathcal {L}^{\varvec{k}}\}\).
The definition of polynomial spaces on n-simplices is slightly different. It requires the definition of the space of polynomials of degree less than or equal to k in the variables \(x_1,\ldots ,x_d\). It does not involve a full Cartesian product of 1D Lagrange polynomials (or monomials) but a truncated space, i.e., the corresponding polynomial space of order k is \({\mathcal {P}}_k = \mathrm{span} \{ p_{\varvec{\alpha }}(\varvec{x}) \, : \, |\varvec{\alpha }| \le k \}\), with \(|\varvec{\alpha }| \, \doteq \, \sum _{i=1}^d \alpha _i\). As for n-cubes, a basis for the dual space of \({\mathcal {P}}_k\) is given by the values at the set of nodes \(\tilde{{\mathcal {N}}}^{k} \, \doteq \, \{ {\varvec{s}}\in {\mathcal {N}}^{k\varvec{1}} \, : \, |{\varvec{s}}| \le k \}\). It generates the typical grad-conforming FEs on n-simplices.
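The two monomial index sets can be enumerated directly from their definitions. The sketch below, in Python with illustrative function names, lists the exponent tuples spanning \({\mathcal {Q}}_{\varvec{k}}\) and \({\mathcal {P}}_k\) and compares their sizes with the dimensions \(\prod _i (k_i+1)\) and \(\binom{k+d}{d}\).

```python
from itertools import product
from math import comb


def q_exponents(orders):
    """Exponent tuples alpha with 0 <= alpha_i <= k_i (monomials spanning Q_k)."""
    return list(product(*(range(k_i + 1) for k_i in orders)))


def p_exponents(k, d):
    """Exponent tuples alpha with |alpha| = sum(alpha) <= k (monomials spanning P_k)."""
    return [alpha for alpha in q_exponents((k,) * d) if sum(alpha) <= k]


def monomial(alpha, x):
    """Evaluate p_alpha(x) = prod_i x_i ** alpha_i."""
    value = 1.0
    for a_i, x_i in zip(alpha, x):
        value *= x_i ** a_i
    return value


k, d = 2, 3
dim_q = len(q_exponents((k,) * d))  # (k + 1) ** d
dim_p = len(p_exponents(k, d))      # comb(k + d, d)
```

Filtering the full tensor-product index set is the simplest way to express the truncation; a production code would more likely generate the truncated set directly.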
3.5 Construction of the Shape Functions Basis
The analytical expression of shape functions can become very complicated for high order FEs and nontrivial definitions of DOFs, e.g., for electromagnetic applications. Furthermore, to have a code that provides a basis for an arbitrary high order, an automatic generator of shape functions must be implemented. When the explicit construction of the shape functions is not obvious, we proceed as follows.
3.6 Global FE Space and Conformity
Finally, we must define the global FE space. Conforming FE spaces are defined as: \(\mathcal {X}_h\, \doteq \, \{ v \in \mathcal {X}\, : \, v|_K \in \mathcal {V}\}.\) The main complication in this definition is to enforce the conformity of the FE space, i.e., \(\mathcal {X}_h\subset \mathcal {X}\). In fact, the conformity constraint is the one that motivates the choice of \({\hat{\Sigma }}\) and \(\Psi \), and as a consequence, \(\Sigma \). In practice, the conformity constraint must be restated as a continuity constraint over FE DOFs. In general, these constraints are implicitly enforced via a global DOF numbering, even though this is not possible in general for adaptive schemes with non-conforming meshes and/or variable order cells, which require more involved constraints.
Let us define by \(\mathcal {M}_h \, \doteq \, \{ (b,K) : b \in {\mathcal {N}}_{\Sigma _K}, \, K\in \mathcal {T}_h\}\) the Cartesian product of local DOFs for all cells. We define the global DOFs as the quotient space of \(\mathcal {M}_h\) by an equivalence relation \(\sim \). Using standard notation, given \(\sim \), the equivalence class of \(a \in \mathcal {M}_h\) with respect to \(\sim \) is represented with \([a] \, \doteq \, \{ b \in \mathcal {M}_h \, : \, a \sim b \}\), and the corresponding quotient set is \({\mathcal {N}}_h \, \doteq \, \{ [a] \, : \, a \in \mathcal {M}_h \}\). The set \({\mathcal {N}}_h\) is the set of global DOFs and \([\cdot ]\) represents the local-to-global DOF map. We assume that the equivalence relation is such that if two elements \((b,K), \, (b',K') \in \mathcal {M}_h\) are such that \((b,K) \sim (b',K')\), then \(K \ne K'\).^{5} Using the one-to-one mapping between moments and shape functions, the same operator allows one to define global shape functions \(\phi ^{a} = \sum _{(b,K) \sim a} \phi ^{b}_K\). We assume that the choices above are such that they satisfy the conformity constraint, i.e., \(\mathcal {X}_h= \mathrm{span}\{ \phi ^{a} \}_{a \in {\mathcal {N}}_h} \subset \mathcal {X}\).
Below, we provide details about how to choose the local DOFs \({\hat{\Sigma }}\), the function map \(\Psi \), and the equivalence relation \(\sim \) such that the conformity property is satisfied for grad-, div-, and curl-conforming FE spaces. The case of non-conforming methods, e.g., DG methods, can readily be considered. In this case, the conformity constraint is not required, which leads to much more flexibility in the definition of DOFs. On the other hand, these schemes require numerical perturbations of the continuous bilinear and linear forms in (4) that involve integrals over the facets of FEs to weakly enforce the conformity. (Facets are \((d-1)\)-faces, e.g., faces in 3D and edges in 2D.)
3.7 Numerical Integration
Quadrature rules for \({\hat{K}}\) being an n-cube can readily be obtained as a tensor product of a 1D quadrature rule, e.g., the Gauss-Legendre quadrature. Symmetric quadrature rules on triangles and tetrahedra for different orders can be found, e.g., in [69]. In any case, to create arbitrarily large quadrature rules for n-simplices, one can consider the so-called Duffy transformation [73, 74].
As is well known, considering n-cube topologies for \({\hat{K}}\), Gauss quadratures with n points per direction can integrate exactly polynomials of order \(2n-1\). For example, for a Lagrangian reference FE of order p and an affine geometrical map, the integrand of a mass matrix is of order 2p in each variable, so we choose \(n=p+ \mathrm{ceiling}( 1/2 ) = p+1\) points per direction to integrate it exactly. For n-simplex meshes, we use either symmetric quadratures (if available) or tensor product rules plus the Duffy transformation [73, 74]. The latter case is based on introducing a change of variables that transforms our n-simplex integration domain into an n-cube, and integrating on the n-cube using tensor product quadratures. It is worth noting that this change of variables introduces a non-constant Jacobian. The determinant of the Jacobian is of order at most \(d-1\) with respect to each variable. To integrate a mass matrix exactly, we must be able to integrate exactly polynomials of order \(2p+d-1\). Therefore, the condition \(2n-1 \ge 2p+d-1\) leads to \(n=p+ \mathrm{ceiling}( d/2 )\) points per direction to exactly integrate mass matrices.
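The point counts above can be checked numerically. The following Python sketch, which is an illustration and not FEMPAR code, computes a 1D Gauss-Legendre rule by Newton iteration and uses the Duffy map \((u,v) \mapsto (u, v(1-u))\) to integrate over the reference triangle with vertices \((0,0)\), \((1,0)\), \((0,1)\). With \(p=2\) and \(d=2\), the \(n = p + \mathrm{ceiling}(d/2) = 3\) points per direction should reproduce the integral of every monomial of total order up to \(2p\), whose exact value is \(a!\,b!/(a+b+2)!\) for \(x^a y^b\). The choice of this particular Duffy map and of the reference triangle are assumptions of the sketch.

```python
import math


def points_per_direction(p, d, simplex):
    """Gauss points per direction that integrate a mass matrix exactly for an
    order-p Lagrangian FE with an affine map (rule quoted in the text)."""
    return p + math.ceil(d / 2) if simplex else p + 1


def gauss_legendre_01(n):
    """n-point Gauss-Legendre rule on [0, 1] as a list of (point, weight)."""
    rule = []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # initial guess
        for _ in range(100):
            p0, p1 = 1.0, x  # Legendre recurrence up to P_{n-1}, P_n
            for k in range(2, n + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = n * (x * p1 - p0) / (x * x - 1.0)  # derivative P_n'(x)
            dx = p1 / dp
            x -= dx  # Newton update towards a root of P_n
            if abs(dx) < 1e-15:
                break
        w = 2.0 / ((1.0 - x * x) * dp * dp)
        rule.append((0.5 * (x + 1.0), 0.5 * w))  # map [-1, 1] to [0, 1]
    return rule


def integrate_on_triangle(f, n):
    """Integrate f(x, y) over the reference triangle with a tensor-product
    rule on the unit square and the Duffy map, whose Jacobian is 1 - u."""
    rule = gauss_legendre_01(n)
    return sum(wu * wv * (1.0 - u) * f(u, v * (1.0 - u))
               for u, wu in rule for v, wv in rule)


p, d = 2, 2
n = points_per_direction(p, d, simplex=True)

# Compare against the exact integrals of all monomials of total order <= 2p.
errors = []
for a in range(2 * p + 1):
    for b in range(2 * p + 1 - a):
        exact = math.factorial(a) * math.factorial(b) / math.factorial(a + b + 2)
        approx = integrate_on_triangle(lambda x, y: x ** a * y ** b, n)
        errors.append(abs(approx - exact))
max_error = max(errors)
```

The factor `1.0 - u` is the non-constant Jacobian mentioned above; it raises the polynomial order in `u` by \(d-1 = 1\), which is why one extra point per direction is needed compared with the n-cube case.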
3.8 Grad-Conforming FEs: Lagrangian (Nodal) Elements
In this section, we consider one characterization of the abstract FE technology above. First, we are interested in the so-called nodal FEs, based on Lagrange polynomials and DOFs based on nodal values.
Let us consider the same order for all components, i.e., \(k \varvec{1}\, \doteq \, (k ,\ldots , k)\). When the reference geometry \({\hat{K}}\) is an n-cube, we define the reference FE space as \(\mathcal {V}_k \, \doteq \, {\mathcal {Q}}_{k \varvec{1}}\). The set of nodes \({\mathcal {N}}^{k\varvec{1}}\) can be generated, e.g., from the equidistant Lagrangian nodes. Let us define the bijective mapping \({\mathtt {i}}(\cdot )\) from the set of nodes \({\mathcal {N}}^{k\varvec{1}}\) to \(\{1, \ldots , |{\mathcal {N}}^{k\varvec{1}}| \} \equiv {\mathcal {N}}_\Sigma \), i.e., the local node numbering. The set of local DOFs \({\mathcal {N}}_{\Sigma _K}\) are the nodal values, i.e., \(\sigma _{{\mathtt {i}}(\varvec{s})} \, \doteq \, v(\varvec{x}_{\varvec{s}})\), for \(\varvec{s} \in {\mathcal {N}}^{\varvec{k}}\). Clearly, the reference FE shape functions related to these DOFs are \(\phi ^{{\mathtt {i}}(\varvec{s})} \, \doteq \, \ell _{\varvec{s}}^{k\varvec{1}}\). On the other hand, we simply take \({\hat{\Psi }}(v) \, \doteq \, v\).
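A small Python sketch of this construction on a 2D n-cube is given below. It assumes the reference cell \([0,1]^d\) with equidistant nodes and a 0-based local numbering (the text uses a 1-based one); all names are illustrative. It builds the tensor-product Lagrange shape functions and evaluates the nodal DOFs on them, which should give the identity matrix.

```python
from itertools import product


def lagrange_1d(nodes, i, x):
    """Value at x of the 1D Lagrange polynomial attached to nodes[i]."""
    value = 1.0
    for j, xj in enumerate(nodes):
        if j != i:
            value *= (x - xj) / (nodes[i] - xj)
    return value


def shape_function(nodes, s, x):
    """Tensor-product shape function for the node multi-index s at point x."""
    value = 1.0
    for s_i, x_i in zip(s, x):
        value *= lagrange_1d(nodes, s_i, x_i)
    return value


k, d = 2, 2
nodes = [i / k for i in range(k + 1)]                    # equidistant 1D nodes
multi_indices = list(product(range(k + 1), repeat=d))    # node set N^{k1}
numbering = {s: a for a, s in enumerate(multi_indices)}  # local numbering i(.)

# duality[i(s)][i(t)] = sigma_{i(s)}(phi^{i(t)}) = phi^{i(t)}(x_s).
duality = [[shape_function(nodes, t, [nodes[i] for i in s])
            for t in multi_indices] for s in multi_indices]
```

Evaluating the shape functions at any other point, e.g., `sum(shape_function(nodes, s, (0.3, 0.7)) for s in multi_indices)`, should return 1 up to rounding, since the Lagrange basis is a partition of unity.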
For n-simplices, we consider the reference FE space \({\mathcal {P}}_k\) spanned by the pre-basis \(\{ p_{\varvec{\alpha }}(\varvec{x}) \, : \, |\varvec{\alpha }| \le k \}\) and the set of nodes \(\tilde{{\mathcal {N}}}^{k}\) (see Sect. 3.4). The set of local DOFs \({\mathcal {N}}_{\Sigma _K}\) are the nodal values. Since the pre-basis elements are not shape functions, we proceed as in Sect. 3.5 to generate the expression of the shape functions basis for arbitrary order reference FEs on n-simplices.
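One common way to turn a pre-basis into shape functions, assumed here for illustration, is a change of basis: evaluate every DOF on every pre-basis function to form a matrix \(C_{ab} = \sigma ^a(p_b)\), invert it, and use the columns of \(C^{-1}\) as the coefficients of the shape functions in the pre-basis. The Python sketch below does this for \({\mathcal {P}}_2\) on the reference triangle with nodes \(\varvec{s}/k\), in exact rational arithmetic; the helper names and the reference triangle are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product


def simplex_multi_indices(k, d):
    """Multi-indices s with |s| <= k; used for both exponents and nodes."""
    return [s for s in product(range(k + 1), repeat=d) if sum(s) <= k]


def monomial(alpha, x):
    """Evaluate the pre-basis monomial p_alpha at the point x."""
    value = Fraction(1)
    for a_i, x_i in zip(alpha, x):
        value *= x_i ** a_i
    return value


def invert(matrix):
    """Gauss-Jordan inversion in exact rational arithmetic."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


k, d = 2, 2
exponents = simplex_multi_indices(k, d)
nodes = [tuple(Fraction(s_i, k) for s_i in s) for s in exponents]

# C[a][b] = sigma^a(p_b): nodal DOF a applied to pre-basis monomial b.
C = [[monomial(alpha, x) for alpha in exponents] for x in nodes]
C_inv = invert(C)


def shape_function(a, x):
    """phi^a = sum_b (C^{-1})_{ba} p_b, so that sigma^c(phi^a) = delta_ca."""
    return sum(C_inv[b][a] * monomial(exponents[b], x)
               for b in range(len(exponents)))


duality = [[shape_function(a, x) for a in range(len(nodes))] for x in nodes]
```

The same change of basis applies to DOFs that are not nodal values, e.g., moments; only the entries of `C` change.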
The global FE space is determined by the following equivalence relation. The set of local DOFs for n-cubes is \(\mathcal {M}_h \, \doteq \, \{ (\varvec{s},K) \, : \, \varvec{s}\in {\mathcal {N}}^{k \varvec{1}}, K \in \mathcal {T}_h\}\) due to the one-to-one mapping between DOFs and nodes; we replace the set of nodes by \(\tilde{{\mathcal {N}}}^{k}\) for n-simplices. Furthermore, we say that \((\varvec{s},K) \sim (\varvec{s}',K')\) iff \(\varvec{x}_{\varvec{s}} = \varvec{x}_{\varvec{s}'}\). The implementation of this equivalence relation, and thus, the global numbering, relies on the ownership relation between n-faces and DOFs (e.g., in 3D we can say whether a DOF belongs to a vertex, edge, or face) and a permutation between the local node numbering in \(K^+\) and the one in \(K^-\) for nodes on F. See Sect. 3.14 for more details. With such a global DOF definition, it is easy to check that the global FE space functions are \(\mathcal {C}^0\) and thus grad-conforming.
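The equivalence relation can be applied literally on a small mesh. The Python sketch below numbers the global DOFs of an order-k Lagrangian space on unit-square cells by identifying local nodes with the same physical coordinates. This is only an illustration of the definition: as stated above, an actual implementation relies on n-face ownership and node permutations rather than on coordinate comparison, and the function name and mesh description are hypothetical.

```python
from fractions import Fraction
from itertools import product


def number_global_dofs(cell_origins, k):
    """Assign one global DOF id per equivalence class of (s, K) pairs, where
    (s, K) ~ (s', K') iff both nodes have the same physical coordinates.
    Cells are unit squares given by their lower-left corner."""
    local_nodes = list(product(range(k + 1), repeat=2))
    class_of_point = {}   # physical coordinates -> global DOF id
    local_to_global = {}  # (s, K) -> global DOF id, i.e., the map [.]
    for K, (ox, oy) in enumerate(cell_origins):
        for s in local_nodes:
            # Exact rational coordinates avoid floating-point comparisons.
            x = (ox + Fraction(s[0], k), oy + Fraction(s[1], k))
            gid = class_of_point.setdefault(x, len(class_of_point))
            local_to_global[(s, K)] = gid
    return local_to_global, len(class_of_point)


# Two cells sharing the edge x = 1, with second-order Lagrangian FEs.
local_to_global, n_global = number_global_dofs([(0, 0), (1, 0)], k=2)
```

Each cell has 9 local DOFs and the shared edge carries 3 nodes, so the two cells contribute 9 + 9 - 3 = 15 global DOFs.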
Since Lagrangian moments involve pointwise evaluations of functions and \(H^1_0(\Omega ) \not \subset \mathcal {C}^0(\Omega )\) for \(d>1\), the interpolator (9) is not defined in such space. Instead, we consider that functions to be interpolated belong, e.g., to the space \(\tilde{\mathcal {X}} \, \doteq \, \mathcal {C}^0(\Omega )\).
When dealing with vector or tensor fields, we can generate the corresponding FE spaces as a Cartesian product of scalar spaces as follows. We define the local FE space \(\varvec{\mathcal {V}}_k \, \doteq \, [{\mathcal {Q}}_{k \varvec{1}}]^d\) and the function map \({\hat{\Psi }}(\varvec{v}) \, \doteq \, \varvec{v}\). In the vector case, the local DOFs set is represented with \(\mathcal {M}_h \, \doteq \, \{ (i,\varvec{s},K) \, : \, 1 \le i \le d, \, \varvec{s}\in {\mathcal {N}}^{k \varvec{1}}, K \in \mathcal {T}_h\}\), and \((i,\varvec{s},K) \sim (i',\varvec{s}',K')\) iff \(i = i'\) and \(\varvec{x}_{\varvec{s}} = \varvec{x}_{\varvec{s}'}\). Analogously, shape functions are computed as \({\phi }^a \, \doteq \, \sum _{(i,\varvec{s},K) \sim a} \ell _{\varvec{s}}^{k \varvec{1}} \mathbf {\varvec{e}}_i\), where \(\mathbf {\varvec{e}}_i\) represents the i-th canonical basis vector of \(\mathbb {R}^d\). We proceed analogously for n-simplices.
3.9 Div-Conforming FEs
3.10 Curl-Conforming FEs
3.11 Cartesian Product of FEs for Multifield Problems
Many problems governed by PDEs involve more than one field, e.g., the Navier-Stokes equations or any multiphysics problem. Let us consider a PDE that involves a set of unknown fields \((\varvec{u}_1 , \ldots , \varvec{u}_n) \in \mathcal {X}^1 \times \ldots \times \mathcal {X}^n\), defined as the Cartesian product of functional spaces. We can proceed as above, and define an FE space for every field space separately, leading to a global FE space \( \mathcal {X}^1_h \times \ldots \times \mathcal {X}^n_h\) defined by composition of FE spaces. To define the global numbering of DOFs in the multifield case, we consider that two DOFs are equivalent if they are related to the same field and satisfy the equivalence relation of the FE space of this field.
The Cartesian product of FE spaces is enough to define volume-coupled multiphysics problems posed on the same physical domain, i.e., the different physics are defined on the whole domain and coupled through volume terms in the formulation. However, many multiphysics problems are interface-based, i.e., different physics are defined on different subdomains and coupled through transmission conditions on the interface. This is the case, e.g., of fluid-structure problems (see, e.g., [76, 77, 78, 79]). In these cases, different FE spaces could be defined on different parts of the global mesh, i.e., one must describe the set of subdomains \(( \Omega _1, \ldots , \Omega _n )\) of the whole domain \(\Omega \) in which the corresponding FE spaces are defined.
3.12 Nonconforming Methods
Up to now, we have considered a global FE space that is conforming, i.e., \(\mathcal {X}_h \subset \mathcal {X}\). Alternatively, one can consider FE schemes that are not conforming. Since the original bilinear form is not well defined in general for a nonconforming FE space \(\mathcal {X}_h\), one must consider a stabilized bilinear form \(a_h\) that is well-posed (stable and continuous) in the discrete setting. In general, these schemes replace the inter-cell continuity required for conformity by a weak imposition of such continuity through penalty-like terms. DG methods are schemes of this type [71].
3.13 Facet Integration
The expressions of the shape functions and their gradients in the physical space are computed from the ones in the reference space by using the cell-wise maps. Thus, two mappings \(\varvec{\Phi }_{K^+}\) and \(\varvec{\Phi }_{K^-}\) between the reference cell \({\hat{K}}\) and the cells \(K^+\) and \(K^-\) in physical space, respectively, are involved in the numerical evaluation of interior facet integrals. We can also consider the reference facet \({\hat{F}}\) and a map \(\varvec{\Phi }_F\) from this reference facet to F (analogously to \(\varvec{\Phi }_K\) and K but in one dimension less in the reference space). We can define a quadrature rule \((\hat{\varvec{x}}_\mathrm{gp},\mathrm{w}_\mathrm{gp})\) in \({\hat{F}}\). We can also define the reference facet \({\hat{F}}^\pm \) of \({\hat{K}}\) such that \(\varvec{\Phi }_{K^\pm }( {\hat{F}}^\pm ) = F\), and the map \(\varvec{\Phi }_{\hat{F}^\pm }\) from \({\hat{F}}\) to \({\hat{F}}^\pm \). With this map, we can define the quadrature \((\hat{\varvec{x}}_\mathrm{gp}^\pm \, \doteq \, \varvec{\Phi }_{\hat{F}^\pm }(\hat{\varvec{x}}_\mathrm{gp}), \mathrm{w}_\mathrm{gp})\) with respect to the reference cell \({\hat{K}}\).
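As an illustration of this construction, the following Python sketch maps a two-point Gauss-Legendre rule defined on the reference facet [0, 1] to the facets of the reference square. The facet labels and the (anchor, direction) parameterization of the facet maps are assumptions made for this example only.

```python
import math

# Two-point Gauss-Legendre rule on the reference facet [0, 1].
GP = [0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0)]
W = [0.5, 0.5]

# Hypothetical affine maps from the reference facet to each facet of the
# reference square [0,1]^2, stored as (anchor, direction).
FACET_MAPS = {
    "y=0": ((0.0, 0.0), (1.0, 0.0)),
    "y=1": ((0.0, 1.0), (1.0, 0.0)),
    "x=0": ((0.0, 0.0), (0.0, 1.0)),
    "x=1": ((1.0, 0.0), (0.0, 1.0)),
}

def facet_quadrature_in_cell(facet):
    """Quadrature points in reference-cell coordinates; weights are unchanged."""
    anchor, direction = FACET_MAPS[facet]
    points = [tuple(a + t * e for a, e in zip(anchor, direction)) for t in GP]
    return points, W

# An interior facet seen as x=1 from K+ and as x=0 from K-.
pts_plus, w = facet_quadrature_in_cell("x=1")
pts_minus, _ = facet_quadrature_in_cell("x=0")
```

In this sketch both cells traverse the facet in the same direction, so the i-th point on each side corresponds to the same physical point. When the two cells orient the facet differently, the points on one side would have to be permuted accordingly.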
3.14 Polytopes
One of the motivations of FEMPAR is to develop a framework that can deal with arbitrary space dimensions. This makes it straightforward to implement space-time formulations, which are posed in 4D. Other higher-dimensional applications include systems of PDEs posed in phase space, e.g., the 7D (including time) Vlasov-Maxwell equations for the simulation of plasmas.
In this section, we provide the mathematical abstraction of cell topologies based on the concept of polytope. This abstract concept is of practical importance, because it allows us to develop algorithms and codes that can be applied to any topology that fits into the framework. The framework developed herein is very general and includes triangles and quadrilaterals in 2D, and tetrahedra, hexahedra, prisms, and pyramids in 3D. Furthermore, it can also be extended to arbitrary dimensions, to define not only n-cubes and n-simplices but many other topologies. A polytope is mathematically defined as the convex hull of a finite set of points. As a consequence, a polytope is a polyhedron. In the frame of FEMPAR, we consider polytopes that can be expressed as the image of the composition of two operators. The definition of topologies for reference FEs based on this idea can be found in [25].
The main topological information consumed by FE codes is the description of the boundary of a d-dim polytope as an assembly of \((d-1)\)-dim polytopes, proceeding recursively until 0-dim objects (vertices) are obtained; we use the contraction k-dim object for an object of dimension k. These lower-dimensional entities describing the polytope boundary are denoted herein as n-faces. Usually, the nomenclature used to describe n-faces in FEs is restricted to 3D problems. In FEMPAR and in the following exposition, we use a dimension-independent nomenclature in order to accommodate higher-dimensional problems. We consider the space dimension \(d \in \mathbb {N}^+\) and a d-dim polytope. We define the d-face as the polytope itself. The \((d-1)\)-dim polytopes that compose the boundary of the polytope are its \((d-1)\)-faces; \((d-1)\)-faces are usually denoted as facets. We can proceed recursively, i.e., defining the \((k-1)\)-faces of the polytope as the set of facets of its k-faces, until reaching 0-faces. In 3D, 3-faces are called cells, 2-faces are faces, 1-faces are edges, and 0-faces are vertices. Herein, we use the term n-faces to denote all these objects. In this work, we denote by vefs the set of n-faces of dimension lower than the space dimension, e.g., it only includes vertices, edges, and faces in 3D.
Let us introduce some notation. We represent the set of bitmaps of size m with \(\mathbb {B}^m\). The bitmaps \((1,1,\ldots ,1)\) and \((0,0,\ldots ,0)\) are represented with \({\mathtt {1}}\) and \({\mathtt {0}}\), respectively. Given a domain \(\square \subset \mathbb {R}^d\) we use the notation \(\alpha \square + \varvec{b}\), \(\alpha \in \mathbb {R}\), \(\varvec{b}\in \mathbb {R}^d\) to denote the domain \(\{ \alpha \varvec{x}+ \varvec{b}\, : \, \varvec{x}\in \square \}\). \(\mathbf {\varvec{e}}_j\) represents the jth canonical basis vector of \(\mathbb {R}^d\).
Let us first define the directional extrusion \({\square }_{(j;\alpha ,\beta )}\) of \(\square \) with respect to the direction \({\mathbf {\varvec{e}}_j}\) of type \((\alpha ,\beta )\). \(\alpha \) determines the topology of the extrusion, namely a prism-type extrusion (1) or a pyramid-type extrusion (0) (see also [25]). \(\beta \) determines whether we perform the \(\alpha \)-extrusion (1) or do nothing (0). Based on this, we have the following definition.
Definition 3.3
The directional extrusion can be used recursively to define polytopes and their n-faces. An n-face is determined by a topology \({\mathtt {t}}\in \mathbb {B}^d\), an extrusion \({\mathtt {e}}\in \mathbb {B}^d\), and an anchor vertex \(\varvec{v}\in \mathbb {R}^d\), using a recursive procedure as follows. The use of directional extrusions to obtain different polytopes and n-faces is illustrated in Figs. 3 and 4. One can observe how, for the different values of \(\alpha \), the directional extrusion turns every lower-dimensional n-face into an n-face of one dimension higher.
Definition 3.4
In codes like FEMPAR, the topology can be stored in the bitmap \({\mathtt {t}}\) (e.g., one 32-bit integer). FEMPAR can use any geometry that can be defined this way, for an arbitrary space dimension. This polytope definition leads to the following geometries. The 1-dim line segment topology is \({\mathtt {t}}= (0)\) or (1); this ambiguity in 1D is inherited by higher dimensions. In 2D, the triangle topology is \({\mathtt {t}}= (0,0)\) (or (0, 1)) and the quadrilateral topology is \({\mathtt {t}}= (1,0)\) (or (1, 1)). In 3D, cubes are represented by \({\mathtt {t}}= (1,1,0)\) (or (1, 1, 1)), tetrahedra by \({\mathtt {t}}= (0,0,0)\) (or (0, 0, 1)), prisms by \({\mathtt {t}}=(1,0,0)\) (or (1, 0, 1)), and pyramids by \({\mathtt {t}}=(0,1,0)\) (or (0, 1, 1)). Tesseracts in 4D are represented by \({\mathtt {t}}=(1,1,1,0)\) (or (1, 1, 1, 1)). In general, \(2^{k-1}\) types of k-dim topologies are possible. n-cubes are expressed by \({\mathtt {t}}= {\mathtt {1}}\) and n-simplices by \({\mathtt {t}}= {\mathtt {0}}\).
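A quick way to check such bitmaps is to count the vertices produced by the successive extrusions. The Python sketch below does so under an assumption inferred from the examples listed above (and not from the formal definitions): the bits are consumed from the last to the first, a 1 doubles the vertices (prism-type extrusion), and a 0 adds a single apex (pyramid-type extrusion). The function and dictionary names are illustrative.

```python
def num_vertices(t):
    """Vertex count of the polytope with topology bitmap t = (t_1, ..., t_d).

    Assumption: bits are consumed from last to first; 1 is a prism-type
    extrusion (vertices double), 0 a pyramid-type extrusion (one apex added).
    """
    n = 1  # start from a single point
    for bit in reversed(t):
        n = 2 * n if bit == 1 else n + 1
    return n

NAMES_3D = {8: "hexahedron", 4: "tetrahedron", 6: "prism", 5: "pyramid"}

for t in [(1, 1, 0), (0, 0, 0), (1, 0, 0), (0, 1, 0)]:
    print(t, NAMES_3D[num_vertices(t)])
```

The first extrusion acts on a single point, where both extrusion types produce a segment. This is why the last bit does not change the resulting geometry.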
For a given nface \(\square \equiv ({\mathtt {t}},{\mathtt {e}},{\mathtt {v}})\), we want to define the set \(\mathcal {S}_{\square }\) of all nfaces of \(\square \). In order to do so, we introduce the following concepts.
Definition 3.5
Definition 3.6
All the resulting nfaces can also be written with the \(({\mathtt {t}},{\mathtt {e}},{\mathtt {v}})\) notation commented above. In order to define this chain as in (28) (i.e., only based on the bitmap notation), we note the following. Given the nface \(\square \equiv ({\mathtt {t}},{\mathtt {e}},{\mathtt {v}})\), the nface \(\square + {\mathbf {\varvec{e}}_j}\equiv ({\mathtt {t}},{\mathtt {e}},{\mathtt {v}}.o_j(1))\). With this ingredient, we can implement the generator of all nfaces of an nface using the bitmap notation.
Using this definition of facets for the 3D cube, we get the following faces: ((1, 1, 0); (0, 0, 0)) and ((1, 1, 0); (0, 0, 1)) faces (\(x=0\) and \(x=1\) faces), ((1, 0, 1); (0, 0, 0)) and ((1, 0, 1); (0, 1, 0)) faces (\(y=0\) and \(y=1\) faces), ((0, 1, 1); (0, 0, 0)) and ((0, 1, 1); (1, 0, 0)) faces (\(z=0\) and \(z=1\) faces), having 6 faces in total. For every one of these faces, we can use the same definition above, to obtain the \((d2)\)faces that are in the boundary of every \((d1)\)face. All these ideas can be used for any polytope, not only ncubes. The only difference is the type of extrusion being used in every case.
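For the particular case of n-cubes, the full set of n-faces can be enumerated directly in the (extrusion, anchor) notation. The following Python sketch is a simplified illustration that assumes prism-type extrusions only; it is not the general recursive generator, and the function name is hypothetical.

```python
from itertools import product

def n_faces_of_cube(d):
    """All n-faces of the unit d-cube as (e, v) pairs, sorted by dimension.

    e marks the directions spanned by the n-face; v is its anchor vertex,
    which may only be nonzero in directions where e is 0.
    """
    faces = []
    for e in product((0, 1), repeat=d):
        free = [j for j in range(d) if e[j] == 0]
        for bits in product((0, 1), repeat=len(free)):
            v = [0] * d
            for j, b in zip(free, bits):
                v[j] = b
            faces.append((e, tuple(v)))
    # Order first by n-face dimension, then by (e, v).
    faces.sort(key=lambda ev: (sum(ev[0]), ev))
    return faces

faces = n_faces_of_cube(3)
facets = [(e, v) for e, v in faces if sum(e) == 2]
```

For d = 3 this produces 27 n-faces (8 vertices, 12 edges, 6 faces, and the cell itself), and the six facets coincide with the (extrusion; anchor) pairs listed above.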
3.15 Node Generation and Indexing
FE spaces are polynomial spaces, e.g., spaces of Lagrangian polynomials. (Let us note that div- and curl-conforming FEs also rely on Lagrangian polynomials for the definition of the prebases and the definition of the equivalence classes.) In order to express these polynomials, one must define a set of points (nodes). In the following, we define a node generator for a given order on an arbitrary polytope, using lexicographical notation.^{8}
Definition 3.7
Given a node \({\varvec{\alpha }}\in \mathbb {N}^d\) in lexicographical notation and assuming an equidistant distribution of nodes, its space coordinates \(x_{\varvec{\alpha }}\in \mathbb {R}^d\) can readily be obtained, \(x_{\varvec{\alpha }}\, \doteq \, {\varvec{\alpha }}/k\). We note that for ncubes we recover the typical tensor product definition of nodes and the corresponding truncated subset of nodes for nsimplices. Other node generators can also be considered, especially for very highorder elements (e.g., Fekete points).
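A minimal Python sketch of an equidistant node generator for the two extreme topologies is given below. It assumes that the truncated subset for n-simplices consists of the labels with component sum not larger than k; the traversal order of `itertools.product` (last index fastest) is an implementation choice of this example and may differ from the ordering used in FEMPAR.

```python
from fractions import Fraction
from itertools import product

def generate_nodes(k, d, simplex=False):
    """Node labels alpha and equidistant coordinates alpha / k (k >= 1)."""
    labels = [a for a in product(range(k + 1), repeat=d)
              if not simplex or sum(a) <= k]
    coords = [tuple(Fraction(a_i, k) for a_i in a) for a in labels]
    return labels, coords

cube_labels, cube_x = generate_nodes(k=2, d=2)
tri_labels, tri_x = generate_nodes(k=2, d=2, simplex=True)
```

Exact rational coordinates are used here only to make equality tests between node positions reliable in the example.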
An ownership relation between n-faces and nodes is a basic ingredient in FE analysis. In particular, continuity between FEs is enforced through continuity of nodal values. In order to generate the nodes of the polytope that belong to an n-face, we use the following construction.
3.16 Global DOF Numbering and Conformity
A basic ingredient in FE analysis is the imposition of continuity among FEs in order to build conforming global FE spaces. This process is mathematically defined with equivalence classes on DOFs (see Sect. 3.6). For example, functions in the Lagrangian FE space are related to geometrical nodes, and imposing continuity of a function among FEs is equivalent to imposing continuity of nodal values at the same spatial position (see Sect. 3.8). In the following, we provide a mechanism to identify nodes in two different cells that share the same position to implement the required equivalence class. The situation is slightly more involved for div-conforming and curl-conforming FE spaces. In these cases, one can still determine a DOF with a node plus n-face ownership (see Sects. 3.9 and 3.10, respectively). Thus, the equivalence class in these situations can be formulated as in Lagrangian FEs (determine nodes with the same position) at every n-face separately.
Following Sect. 3.6, a node within a cell of our triangulation can be represented as (b, K), where b is the local cellwise index of the node and K is the cell global index. Given an nface F of the cell, the same node can be represented with \((b',F,K)\), where \(b'\) is an nfacebased local index. For example, node 8 (cellwise local index) in the cell of Fig. 6 can also be determined as the node 1 (facetwise local index) of the nface 8 (see Fig. 5). This facetwise local index is determined by the coordinate system being used at the nface. For example, the nodes of nface 8 in Fig. 6 are ordered as (8, 12) (i.e., first 8 and then 12). On the other hand, node indices are represented with the coordinates in a lexicographical coordinate system, as presented in (31). For example, node with \(b=8\) (\(b'=1\) in nface 8) is represented with the coordinates \(\varvec{s}=(4,1)\) (\(\varvec{s}'=(1)\) in the nface).
Let us consider an n-face F in our triangulation, two cells \(K^{+}\) (source cell) and \(K^{-}\) (target cell) sharing the n-face, and nodes \((\varvec{s}_+',F,K^{+})\) and \((\varvec{s}_-',F,K^{-})\) (with n-face-wise local indices). The question that must be answered is: are nodes \((\varvec{s}_+',F,K^{+})\) and \((\varvec{s}_-',F,K^{-})\) in the same spatial position? This question can be answered with the map \(\mathtt {p}_F\) in (15) that, given the position of the node in the coordinate system of F in \(K^{+}\), provides the one in \(K^{-}\).
We note that this mapping is trivial when using structured (possibly locally adapted) ncube meshes, since the local ordering of nodes in an nface based on increasing local index leads to the same ordering for all cells containing that nface; we say that the mesh is properly oriented in this case. However, 2D or 3D unstructured mesh generators might not return properly oriented meshes, and thus the FE code has to deal with the explicit construction and application of permutations. We also note that one can always end up with oriented meshes for nsimplices by simple cellwise permutations (see, e.g., [72, Sect. 5.5] and [80]). After reading nsimplex meshes, these meshes are always properly oriented in FEMPAR before proceeding to any computation. While this is also true for 2D ncube meshes, 3D ncube meshes cannot be properly oriented in general [81].
Let us consider the reference polytope \({\hat{K}}\) associated to \(K^{+}\) and \(K^{-}\). In general, the n-face F has a different n-face local index with respect to the two cells; its corresponding reference n-face is represented with \({\hat{F}}^+\) and \({\hat{F}}^{-}\) for \(K^{+}\) and \(K^{-}\), respectively. In general, the map between nodes of these two n-faces can be defined by using (32), which is invertible (since it is linear and full rank). Using this approach, the map can be generated for arbitrary dimension and polytope topology. However, for the particular case of 2D/3D n-cube meshes, we have implemented this procedure in a more computationally efficient manner. In particular, the required permutations (mappings) are expressed in terms of a set of tables, which are stored and filled by the so-called reference_fe_t abstract data type in FEMPAR. We refer to Sect. 6.3 for implementation details. (Recall that n-simplex meshes do not actually require this procedure as they can always be properly (re)oriented.)

Rotation index: Provides the local index of the anchor vertex of \(F^{-}\) with respect to the coordinate system of \(F^{+}\). When two FEs share an edge, the edge can have the same anchor vertex seen from both elements, or not. For faces, the anchor vertex can be in 4 positions. It is called rotation because it represents a map that leaves the reference face \({\hat{F}}^{-}\) invariant and makes the anchor vertices of the source and target cells coincide.

Orientation index: Given two cells sharing an nface with the same anchor vertex, the orientation index codes the map from the coordinate system of the nface with respect to the first cell to the one with respect to the second one.^{9} For edges, this map is always the identity, because two cells sharing an edge with the same anchor node provide the same edgewise node coordinates to its nodes. For faces, the situation is more complex, because it involves 2 different possible situations. The orientation index is equal to 0 for the identity permutation and 1 when we have to swap indices. We denote the base face as the face with the lowest local index (face [011000] in Fig. 3). Next, we consider two cubes that share a face, restricted to the following scenario: (1) the face is the base face in at least one of the cubes; (2) the face has the same anchor vertex in the two cubes. It is trivial to compute the orientation index in these cases. The orientation index in the more general case of two cubes sharing a face only restricted to (2), i.e., two arbitrary faces with the same anchor vertex, can be obtained by composition as follows. If two faces have the same orientation index with the base face, they have an orientation index equal to 1, and 0 otherwise.

Permutation index: An index obtained by composition of the rotation and orientation indices (i.e., it ranges from 1 to 2 for edges and from 1 to 8 for faces) that encodes the final mapping between coordinates of two cells as the composition of a rotation and an orientation map. We note that the composition of all possible rotations and orientations covers all the possible relative positions of cells for a conforming mesh.
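The composition of the three indices can be sketched in Python for the nodes of a quadrilateral face. The quarter-turn convention, the lexicographical node index, and the 1-based encoding of the permutation index used here are hypothetical choices made for illustration; the text does not specify the encoding used in FEMPAR.

```python
def face_permutation(k, rotation, orientation):
    """Permutation of the (k+1)^2 face nodes for rotation in 0..3, orientation in 0..1."""
    def index(i, j):
        return i + (k + 1) * j
    perm = [None] * (k + 1) ** 2
    for j in range(k + 1):
        for i in range(k + 1):
            a, b = i, j
            for _ in range(rotation):
                a, b = b, k - a   # quarter turn of the face coordinates
            if orientation == 1:
                a, b = b, a       # swap the two face axes
            perm[index(i, j)] = index(a, b)
    return perm

def permutation_index(rotation, orientation, num_rotations):
    """Hypothetical 1-based encoding of the composed index."""
    return 1 + rotation + num_rotations * orientation

face_perms = {permutation_index(r, o, 4): tuple(face_permutation(2, r, o))
              for o in (0, 1) for r in range(4)}
```

The four rotations combined with the two orientations give eight distinct node permutations, which are the eight symmetries of the square.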
4 Implementation of polytope_t and Related Data Types
In FEMPAR, the reference FE cell geometry is defined by the polytope_t data type; see Listing 1. The input needed to define the polytope is the space dimension num_dimensions and the topology \({\mathtt {t}}\), stored in the 32-bit integer topology.
Using the ideas in (27), (28), and (29), we create the set of all n-faces of the polytope \(({\mathtt {t}},{\mathtt {e}},{\mathtt {v}})\) in the (private) fill_polytope_chain TBP, which is in turn invoked by the (public) create TBP. All n-faces of the polytope have the same topology, and can be uniquely determined by a 32-bit integer that represents the composition of \(({\mathtt {e}},{\mathtt {v}})\). We note that the ordering of n-faces based on \(({\mathtt {e}},{\mathtt {v}})\) mixes n-faces of different dimensions and is nonconsecutive in general. Thus, we consider an ordering based first on the n-face dimension, and next on \(({\mathtt {e}},{\mathtt {v}})\). The set of all n-faces generated by the recursion (29) is stored in n_face_array, an array of size number_n_faces. In particular, this array provides the \(({\mathtt {e}},{\mathtt {v}})\) associated to each n-face. The inverse mapping (from \(({\mathtt {e}},{\mathtt {v}})\) to the actual numbering) is stored in the ijk_to_index array.
It is also possible to iterate over the facets of an n-face, based on (30). The create_facet_iterator TBP of polytope_t creates a facet_iterator_t instance for a given n-face. facet_iterator_t is defined in Listing 2. The n-face \(({\mathtt {e}},{\mathtt {v}})\) is stored in root, and the topology can be extracted from its polytope pointer member variable. The iterator over facets is described by two integers, component and coordinate, using the ideas in (30). The complexity of the traversal over facets is encapsulated in facet_iterator_next and facet_iterator_has_finished.
With regard to the implementation of nodes within FEMPAR, we provide the node_array_t data type to represent the set of nodes defined in (31); see Listing 3. It is constructed from a polytope and the order. It provides a create TBP, where we perform (31) and fill all the resulting nodes in the node_array array member variable. We number the nodes using a consecutive numbering with increasing lexicographical index. The node array provides the lexicographical label in one integer. The inverse is stored in ijk_to_index. The total number of nodes is stored in num_nodes. Finally, the space coordinates of nodes are stored in coordinates.
We also provide the node_iterator_t object (see Listing 4), which iterates over the nodes of an nface (stored in n_face) using (31) and (32). It has a pointer to the node_array of the base polytope. Internally, it goes through the nodes of n_face (using (31)) (the current node being stored in displacement), which can be translated to the base polytope node numbering using (32) (stored in coordinate); the coordinate is computed on demand by calling the TBP node_iterator_current_ijk. The own_boundary logical allows one to iterate over the nodes considering the nface as an open or closed set. We note that the create TBP of node_array_t relies on node_iterator_t.
5 The polynomial_t Abstraction
In FEMPAR, the definition of shape functions is not hard-coded, as is usually done in most FE codes. Such an approach has severe limitations: (1) it is not practical for high-order discretizations, and the code cannot be written for an arbitrary order; (2) it involves a huge number of code lines with the analytical expression of shape functions for a given set of available orders (see the discussion in [82]); and (3) it does not allow for dimension-independent code. Instead, we consider a framework based on the concepts in Sect. 3.5, in which one considers a prebasis, defines the moments, and performs a change of basis. The prebasis is defined using the product of 1D functions (e.g., the Cartesian product), and the 1D function generator is written in terms of the (arbitrary) order. Our machinery for the generation of 1D functions has been restricted for the moment to polynomial functions in one variable, namely Lagrangian polynomials, monomials, and B-splines, but the implementation can be extended to other choices. The product of 1D functions can be a Cartesian product of 1D Lagrange polynomials (or monomials), to define \({\mathcal {Q}}_{\varvec{k}}\) spaces on n-cubes, or a reduced combination of monomials to define \({\mathcal {P}}_k\) spaces on n-simplices.
The definition of the reference FE functional space relies on the polynomial_t data type in Listing 5, which represents a polynomial in one variable, i.e., \(p(x) = \sum _{i=0}^k a_i x^i\). Thus, a 1D polynomial is defined in terms of its order k and a set of \(k+1\) coefficients \(\{a_i\}_{i=0}^k\), stored in order and the coefficients array, respectively. Different type extensions of polynomial_t have been considered so far, namely lagrange_polynomial_t and monomial_t. The first one generates a Lagrangian polynomial as in Sect. 3.4, in which the coefficients array has in its first order entries the coordinates of the nodes and in the last entry the coefficient \(\frac{1}{ \prod _{n \in {\mathcal {N}}_k \setminus \{m\} } (x_m - x_n) }\) in (7). The monomial_t extension represents \(x^k\), where k is its order. It is just a trivial case of polynomial_t, kept for optimization purposes, that is uniquely defined by the order (the coefficients array is not needed). We also consider the polynomial_basis_t data type, which is just a set (array) of (polymorphic) polynomials.
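The two storage schemes can be sketched in Python as follows. This is an illustration under stated assumptions rather than a transcription of the Fortran types: the Lagrangian coefficients are assumed to hold the k coordinates of the other nodes followed by the inverse of the product of node differences, and all function names are hypothetical.

```python
def eval_polynomial(coefficients, x):
    """Evaluate p(x) = sum_i a_i x^i with Horner's rule (order + 1 coefficients)."""
    result = 0.0
    for a in reversed(coefficients):
        result = result * x + a
    return result

def lagrange_coefficients(nodes, m):
    """Other node coordinates, then 1 / prod_{n != m} (x_m - x_n)."""
    others = [x for n, x in enumerate(nodes) if n != m]
    denom = 1.0
    for x in others:
        denom *= nodes[m] - x
    return others + [1.0 / denom]

def eval_lagrange(coefficients, x):
    """Evaluate the Lagrangian polynomial from the storage above."""
    value = coefficients[-1]
    for x_n in coefficients[:-1]:
        value *= x - x_n
    return value

nodes = [0.0, 0.5, 1.0]  # equidistant nodes for order k = 2
ell_1 = lagrange_coefficients(nodes, 1)
```

With this layout the evaluation of a Lagrangian polynomial needs k multiplications by the factors (x - x_n) and no expansion into monomial coefficients.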
Up to this point, we have defined Lagrange polynomials and monomials in one variable. lagrange_polynomial_t and monomial_t also provide the binding generate_basis that generates a Lagrangian and monomial basis of polynomials, for a given order k. The result of this subroutine is a polynomial_basis_t that includes as many polynomials as the polynomial space dimension. In the case of the Lagrangian basis, it implements the basis \(\mathcal {L}^k\) in Sect. 3.4, whereas the binding for monomials simply implements \(\{ x^i \}_{i=0}^k\).
The next step is to generate higher-dimensional spaces. We consider two types of spaces. The first one is a space that can be generated as the Cartesian product of 1D spaces, implemented in the data type tensor_product_polynomial_space_t. This data type is defined through the number of space dimensions and as many polynomial_basis_t as space dimensions. This data type can be applied to any combination of 1D spaces. For example, in the case of 1D Lagrange bases (possibly with different order and nodes per dimension), it leads to the multidimensional basis in (8). Thus, with this data type and Lagrangian 1D bases we generate the Lagrangian FE spaces on top of n-cube cells, i.e., the \({\mathcal {Q}}_{{\varvec{k}}}\) space of polynomials.^{10}
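The following Python sketch illustrates the tensor-product construction with one 1D Lagrangian basis per direction and a different order in each direction. The representation of basis functions as closures indexed by a multi-index is an assumption of this example, not the layout of tensor_product_polynomial_space_t.

```python
from itertools import product

def lagrange_basis_1d(nodes):
    """Return the 1D Lagrangian polynomials for the given nodes as callables."""
    def make(m):
        def ell(x):
            value = 1.0
            for n, x_n in enumerate(nodes):
                if n != m:
                    value *= (x - x_n) / (nodes[m] - x_n)
            return value
        return ell
    return [make(m) for m in range(len(nodes))]

def _prod(values):
    result = 1.0
    for v in values:
        result *= v
    return result

def tensor_product_basis(bases_1d):
    """One function per multi-index: the product of one 1D function per direction."""
    def make(alpha):
        return lambda x: _prod(bases_1d[i][a](x[i]) for i, a in enumerate(alpha))
    return {alpha: make(alpha)
            for alpha in product(*(range(len(b)) for b in bases_1d))}

# Order 2 in x and order 1 in y.
nodes_x, nodes_y = [0.0, 0.5, 1.0], [0.0, 1.0]
basis = tensor_product_basis([lagrange_basis_1d(nodes_x),
                              lagrange_basis_1d(nodes_y)])
```

Because every 1D factor is a Lagrangian polynomial, each product function equals one at its own node and zero at the remaining nodes, so no change of basis is needed in this case.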
Furthermore, we also consider the truncated_tensor_product_polynomial_space_t extension that generates Lagrangian FE spaces on n-simplices, i.e., the \({\mathcal {P}}_k\) space of polynomials. In this case, the generate_basis TBP of monomial_t should be used to create the monomial 1D bases per direction, and the order should be the same for all directions. Otherwise, the resulting multivariable function would make no sense. Next, the combination of 1D monomials only involves terms such that \(|\varvec{\alpha }| \le k\) (see Sect. 3.4), to generate a prebasis for FE spaces on n-simplices (e.g., tetrahedra).
We note that with these abstract representations of polynomial spaces one can define the reference FE local space. However, unless one considers 1D Lagrangian bases and tensor product polynomials on n-cubes, the resulting basis is not the shape function basis. Even in the case of Lagrangian n-simplices, a change of basis is needed, using the procedure in Sect. 3.5 and taking nodal values as moments. In Sect. 9.5, we show how we can define the shape function basis for the case of div-conforming FEs of arbitrary order. The same ideas apply for grad-conforming Lagrangian FEs on n-simplices and curl-conforming FEs in general, but are not included for the sake of brevity.
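For Lagrangian n-simplices with nodal values as moments, the change of basis amounts to inverting the matrix of moments applied to the prebasis. The Python sketch below assumes this standard dual-basis construction, a monomial prebasis with component sum not larger than k, and equidistant nodes; it uses exact rational arithmetic and a small Gauss-Jordan routine, and all names are illustrative.

```python
from fractions import Fraction
from itertools import product

def simplex_labels(k, d):
    return [a for a in product(range(k + 1), repeat=d) if sum(a) <= k]

def monomial(alpha, x):
    value = Fraction(1)
    for a_i, x_i in zip(alpha, x):
        value *= x_i ** a_i
    return value

def invert(matrix):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(matrix)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

k, d = 2, 2
labels = simplex_labels(k, d)
nodes = [tuple(Fraction(a_i, k) for a_i in a) for a in labels]
# C[i][j] = sigma_i(p_j): the i-th nodal moment applied to the j-th prebasis function.
C = [[monomial(alpha, x) for alpha in labels] for x in nodes]
C_inv = invert(C)

def shape_function(i, x):
    """phi^i = sum_j (C^{-1})_{ji} p_j, so that sigma_i(phi^j) = delta_ij."""
    return sum(C_inv[j][i] * monomial(labels[j], x) for j in range(len(labels)))
```

The same pattern applies to other moments: only the assembly of the matrix C changes, while the inversion and the linear combination of prebasis functions stay the same.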
6 The reference_fe_t Abstraction
In this section, we introduce the reference_fe_t data type. This data type is the OO representation of the standard mathematical definition of a reference FE presented in Sect. 3.3, namely, a reference cell geometry \({\hat{K}}\), a functional space \(\hat{\mathcal {V}}\), and a set of DOFs \({\hat{\Sigma }}\) defined on top of it. The reference_fe_t is a central abstraction in a FE library and must be judiciously designed to be extensible and reusable. In particular, it must not only accommodate Lagrangian FEs, but also other (more involved/general) spaces like RaviartThomas or edge FEs, DG methods, and Bspline patches. An extensible and reusable design of reference_fe_t should allow one to, e.g., easily incorporate new local functional spaces that were not originally considered, and to do so without having to rewrite (and thus recompile) any code that is grounded on the set of methods provided by reference_fe_t. To this end, in FEMPAR, reference_fe_t is an abstract data type that serves as a template equipped with a set of member variables and deferred bindings that subclasses have to set up and implement (i.e., override), respectively, in order to complete the description of the concrete FE space at hand. The definition of the reference_fe_t data type, a classification of its member variables into three different categories (corresponding to the three ingredients in Ciarlet’s definition), and an enumeration of its most relevant regular and deferred bindings, are shown in Listing 6.
This section is structured as follows. The member variables in each of the three aforementioned categories are covered in detail in Sects. 6.1–6.3, respectively. In Sect. 6.4, we discuss the OO design pattern chosen in FEMPAR for the creation of reference_fe_t polymorphic instances, and describe the arguments that uniquely define a subclass of this data type; these are in line with its mathematical definition. In Sect. 6.5, we enumerate and briefly describe the subclasses of reference_fe_t currently available in FEMPAR. We note that the section is not selfcontained as most of the deferred bindings of reference_fe_t are not covered here. These involve interactions with other data types in our OO design, and will be described in the sections in which these interactions are exposed. Code comments in Listing 6 serve as a table of contents with the article sections in which these deferred bindings are covered.
6.1 The Reference Cell Topology
The FEMPAR data type list_t stores a set of (variable-sized) lists of integer identifiers, one per entity; in this particular scenario, entities are n-faces. As shown in Fig. 5, the current implementation of this data type uses a compressed storage layout as, e.g., in compressed storage formats for sparse graphs. In order to preserve encapsulation and data hiding, list_t offers a rich set of TBPs that let users set up (step by step) a new list_t instance; this type also provides a list_iterator_t type that lets them sequentially read/write each of the integer identifiers of the list associated to an entity. The code snippet in Listing 7 illustrates how to iterate over and print the identifiers of those vertices belonging to the n-face with identifier n_face_lid.
The number of nfaces of any dimension can be easily computed from ptr_n_faces_x_dim. We note that ptr_n_faces_x_dim is not a list_t instance, since we adopt the convention that nfaces are numbered from the lowest to highest dimension, and thus only the \(\texttt {p}\) array of the list is actually needed (see Fig. 5). In the example in Fig. 5, the value of this array is \(\{1,5,9,10\}\), since we have 4 vertices (dimension 0), 4 facets or edges (dimension 1), and 1 cell (dimension 2).
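The compressed layout and the use of the pointer array alone can be sketched in Python as follows. The class below is a hypothetical stand-in for list_t with 1-based pointers as in Fortran; the vertex lists of the quadrilateral are made up for the example, and only the pointer array values are taken from the text.

```python
class CompressedList:
    """Compressed storage of variable-sized integer lists, with 1-based pointers."""

    def __init__(self, lists):
        self.p = [1]  # entries of entity i occupy positions p[i-1] .. p[i]-1 of l
        self.l = []
        for entries in lists:
            self.l.extend(entries)
            self.p.append(len(self.l) + 1)

    def entries(self, entity_id):
        """Iterate over the identifiers attached to a 1-based entity id."""
        start, end = self.p[entity_id - 1], self.p[entity_id]
        return iter(self.l[start - 1:end - 1])

def num_n_faces_of_dim(ptr, dim):
    """Number of n-faces of a given dimension from the pointer array alone."""
    return ptr[dim + 1] - ptr[dim]

# Hypothetical vertex lists for the 9 n-faces of a quadrilateral.
vertices_n_face = CompressedList(
    [[1], [2], [3], [4], [1, 2], [3, 4], [1, 3], [2, 4], [1, 2, 3, 4]])
ptr_n_faces_x_dim = [1, 5, 9, 10]

for vertex_id in vertices_n_face.entries(5):
    print(vertex_id)
```

Since n-faces are numbered by increasing dimension, the identifiers of the n-faces of a given dimension form a contiguous range, and no list of values is needed besides the pointers.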
6.2 The Reference FE Space
For a given cell topology, different definitions of functional spaces and sets of DOFs are possible, e.g., the ones of the nodal Lagrangian gradconforming reference FE in Sect. 3.8, the RaviartThomas divconforming reference FE in Sect. 3.9, or the curlconforming Nédélec reference FE in Sect. 3.10. The member variables of reference_fe_t required to describe the functional space \(\hat{\mathcal {V}}\) with support on \({\hat{K}}\) are encompassed within Lines 13–16 of Listing 6.
The local FE space \(\hat{\mathcal {V}}\) is determined by the member variables fe_type, (in some cases) field_type, and order. fe_type uniquely identifies the concrete FE space at hand. Possible values are provided by means of the public parameter constants fe_type_lagrangian, fe_type_raviart_thomas, and fe_type_nedelec, corresponding to the reference_fe_t implementors currently supported in FEMPAR; see Sect. 6.5 for additional details on those. field_type identifies the “type” of physical field being discretized, i.e., whether it is scalar, vector-valued, etc. There are FE spaces that are inherently vector-valued such as, e.g., Raviart-Thomas and edge FEs. However, Lagrangian FEs can be used to discretize scalar-, vector-, or tensor-valued fields, and thus field_type must be provided. We assume that \(\hat{\mathcal {V}}\) can be parameterized with respect to an order, which is stored in order. Out of these values, we can generate additional data, e.g., the number of shape functions, which is stored in num_shape_functions. For example, for (scalar-valued) biquadratic (2D) and triquadratic (3D) Lagrangian FEs, the field_type is scalar, num_components is equal to 1, order is equal to 2, and num_shape_functions is equal to 9 and 27, respectively.
6.3 The Set of Local DOFs
Additional data is required to describe the set of DOFs \({\hat{\Sigma }}\) for \(\hat{\mathcal {V}}\). In particular, the member variables encompassed within Lines 19–23 of Listing 6 serve this purpose.
The conformity member variable determines whether the global FE space \(\mathcal {X}_h\) is conforming with respect to the infinite-dimensional space \(\mathcal {X}\), i.e., whether \(\mathcal {X}_h\subset \mathcal {X}\) or not. It is used to describe the n-face that owns every DOF, which is required to enforce conformity of the global FE space through equivalence classes (see Sect. 3). For example, for Lagrangian FEs, setting it to .true. results in a grad-conforming global FE space, whereas setting it to .false. results in a discontinuous space for DG methods. It is conceptually possible to set it to .true. on some cells and to .false. on others, leading to the CDG method in [83]. On the other hand, the continuity member variable is only determined by \(\mathcal {X}\), and tells us whether \(\mathcal {X}\) admits a trace operator. Roughly speaking, it tells us whether we must enforce some type of continuity at the discrete level to preserve conformity, e.g., full, tangential, or normal traces for \(H^1(\Omega )\), \(H(\mathbf{curl},\Omega )\), and \(H(\mathrm{div},\Omega )\), respectively. The value of continuity is .false. when \(\mathcal {X}= L^2(\Omega )\), since no continuity is required. When continuity is .false., conformity must be set to .true. In practice, continuity is barely used (see the discussion in the next paragraph).
The reference_fe_t data type plays a crucial role in the algorithm in charge of assigning global DOF identifiers to node functionals distributed over the interior of the triangulation cells and their boundary n-faces. (This algorithm, which is covered in detail in Sect. 10, is grounded on the notion of equivalence classes introduced in Sect. 3.) In particular, the function-like (regular) binding referred to as permute_dof_lid_n_face (see Line 32 of Listing 6) implements the mapping \(\mathtt {p}_F\) in (15). This function takes as input the so-called permutation index in Sect. 3.16 and the local index of a node within an n-face of given dimension (e.g., in 3D, 0 for vertices, 1 for edges, and 2 for faces) from the perspective of a source cell, and returns the local index of that node within the n-face from the perspective of the target cell.^{12} This is in particular the transformation that we have to apply when global DOF identifiers have already been assigned to n-face nodes in the source cell, and we want to transfer them to n-face nodes in the target cell; see Sect. 10.3. This binding, implemented in reference_fe_t, ultimately relies on its own_dof_permutations(:) member variable; see Line 23 in Listing 6. This allocatable array is indexed with the n-face dimension (i.e., 1 for edges, and 2 for faces). For each n-face dimension larger than 0, it contains a rank-2 allocatable array (i.e., type(allocatable_array_ip2_t) is the base type of the array), which serves as a lookup table for the implementation of the aforementioned transformation. In particular, the rows are indexed with the local index of the node on top of the n-face from the perspective of the source cell, and the columns with the permutation index; see Sect. 3.16. The entry in the corresponding row and column of the table provides the local index of the node within the n-face from the perspective of the target cell.
These lookup tables are filled within the fill_own_dofs_permutations deferred binding of reference_fe_t. We note that this latter binding and permute_dof_lid_n_face are declared as overridable bindings in Listing 6 on purpose. This lets, e.g., subclasses of reference_fe_t intended to be used in conjunction with (properly oriented; see Sect. 3.16) n-simplex meshes implement the former such that the own_dof_permutations(:) member variable is neither allocated nor filled, and the latter such that it always returns the identity transformation.
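To make the lookup-table idea concrete, the following Python sketch builds such a table for the nodes on an edge. It is only an illustration under stated assumptions: indices are 0-based, an edge is assumed to admit exactly two permutation indices (0 for equal orientation in both cells, 1 for reversed orientation), and the function names are hypothetical. The actual numbering conventions are those of Sect. 3.16 and may differ.

```python
def fill_edge_permutation_table(num_nodes):
    """Rows: node index seen from the source cell; columns: permutation index.

    Column 0 (same orientation) is the identity, column 1 (reversed
    orientation) mirrors the node index along the edge.
    """
    return [[i, num_nodes - 1 - i] for i in range(num_nodes)]


def permute_dof_lid(table, permutation_index, source_lid):
    """Node index within the edge from the perspective of the target cell."""
    return table[source_lid][permutation_index]


table = fill_edge_permutation_table(3)
target_lid = permute_dof_lid(table, permutation_index=1, source_lid=0)
print(table, target_lid)
```

A subclass working only with properly oriented meshes would, in this picture, skip the table altogether and return source_lid unchanged.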
6.4 Creating reference_fe_t Polymorphic Instances
Central to any OO software system relying on abstract data types is the approach chosen to create polymorphic instances at runtime. For simplicity, FEMPAR follows the so-called simple factory design pattern [85]. It takes the form of a single stand-alone function, called make_reference_fe, which selects the dynamic type of the polymorphic instance to be returned at runtime based on the values of its dummy arguments topology and fe_type. (For example, given the topology of a hexahedron and fe_type_lagrangian, it selects the dynamic type hex_lagrangian_reference_fe_t, i.e., the concrete data type implementing Lagrangian-type FE spaces on top of n-cubes.) Before returning, it calls a deferred binding of reference_fe_t, called create, which is responsible for leaving the reference_fe_t instance in a fully functional state. The interface of this deferred binding is shown in Listing 8.
We remark that field_type is only a free parameter for Lagrangian FEs (i.e., for a particular reference_fe_t subclass). In other words, it must be field_type_vector for Raviart-Thomas and edge elements. We note that, despite its fixed set of dummy arguments, this interface has proven sufficient to fully describe all subclasses currently available in FEMPAR; see Sect. 6.5. However, should the need arise, and with extensibility in mind, a single parameter dictionary of \({<}key, value{>}\) pairs could be used instead; FEMPAR indeed relies on an implementation of this data type where key is a string (typically denoting the name of the parameter), and value is a scalar or arbitrary-rank array of intrinsic or even user-defined types.^{13}
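The simple factory pattern described here can be sketched in a few lines. The sketch is in Python and purely illustrative: the class names, the registry, and the keyword-argument dictionary standing in for the parameter dictionary are all hypothetical, and only two Lagrangian subclasses are registered.

```python
from abc import ABC, abstractmethod


class ReferenceFE(ABC):
    """Abstract reference FE; create() must leave the object fully functional."""

    @abstractmethod
    def create(self, **parameters):
        ...


class _LagrangianReferenceFE(ReferenceFE):
    def create(self, **parameters):
        # field_type is a free parameter only for Lagrangian FEs.
        self.order = parameters["order"]
        self.field_type = parameters.get("field_type", "scalar")


class HexLagrangianReferenceFE(_LagrangianReferenceFE):
    pass


class TetLagrangianReferenceFE(_LagrangianReferenceFE):
    pass


# Hypothetical registry mapping (topology, fe_type) to the dynamic type.
_REGISTRY = {
    ("hex", "lagrangian"): HexLagrangianReferenceFE,
    ("tet", "lagrangian"): TetLagrangianReferenceFE,
}


def make_reference_fe(topology, fe_type, **parameters):
    """Select the dynamic type from topology and fe_type, then call create()."""
    try:
        cls = _REGISTRY[(topology, fe_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {topology}, {fe_type}") from None
    reference_fe = cls()
    reference_fe.create(**parameters)
    return reference_fe


fe = make_reference_fe("hex", "lagrangian", order=2)
print(type(fe).__name__, fe.order, fe.field_type)
```

Client code only sees the abstract type and the factory function, so adding a new subclass amounts to one new registry entry.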
6.5 Enumeration of reference_fe_t Subclasses

hex_lagrangian_reference_fe_t and tet_lagrangian_reference_fe_t. Space of polynomials of arbitrary degree k on top of n-cubes (i.e., tensor-product like spaces \({\mathcal {Q}}_{\varvec{k}}\)) and n-simplices (i.e., \({\mathcal {P}}_{\varvec{k}}\)), respectively, for the discretization of either scalar-valued, vector-valued, or tensor-valued fields; see Sect. 3.8. By selecting the ownership relationship among node functionals and n-faces appropriately (see Sect. 6.3), this FE space can be either globally continuous, or entirely discontinuous across cell boundaries.

hex_raviart_thomas_reference_fe_t and tet_raviart_thomas_reference_fe_t. The vector-valued Raviart-Thomas FE of arbitrary degree k on top of n-cubes and n-simplices, respectively, suitable for the mixed Laplacian problem and some fluid flow problems. Global FE functions of this space (in its conformal variant) have continuous normal components across cell faces; see Sect. 3.9 for details.

hex_nedelec_reference_fe_t and tet_nedelec_reference_fe_t. The vector-valued curl-conforming Nédélec FE of arbitrary degree k on top of n-cubes and n-simplices, respectively, suitable for electromagnetic problems. Global FE functions of this space (in its conformal variant) have continuous tangential components across cell faces; see Sect. 3.10 for details.

void_reference_fe_t. A software artifact that represents a FE space with no DOFs at all, neither at the cell interiors, nor at their boundary n-faces. This sort of software resource has proven extremely useful for: (1) the numerical solution of a PDE on a subdomain of our original discretized domain (which thus has to be aligned with the cell boundaries); (2) the numerical solution of a PDE using XFEM-like discretization techniques (which are grounded on FE spaces that do not assign DOFs to cells exterior to the embedded domain); (3) simplifying the implementation of discretization methods for PDE problems that involve coupling at the interface level, e.g., fluid-structure interaction.
7 The Description of the Physical Domain: The triangulation_t Abstraction
A central abstraction in all FE numerical simulation codes is the one that describes the triangulation/mesh \(\mathcal {T}_h\) of the physical domain \(\Omega \subset \mathbb {R}^d\) in which our problem is posed. (In practice, the mesh generation for \(\Omega \) introduces a geometrical error, and the mesh is in fact over an approximated domain \(\Omega _h\).) In FEMPAR, this abstraction is called triangulation_t. With flexibility and code reuse in mind, this is an abstract data type. In Sect. 7.1, we introduce triangulation_t, and the mechanism that it provides to its subclasses in order to preserve encapsulation and data hiding, while still letting subclasses store and access data efficiently. For completeness, in Sect. 7.2, we introduce details underlying the implementation of a particular concrete subclass of triangulation_t.
7.1 An Abstract Triangulation Representation and Its Software Implementation
In this section, we present an abstract (conceptual) representation of a triangulation that FEMPAR exposes to user-level applications and other library software abstractions that are grounded on it (see, e.g., Sect. 10). This conceptual representation is provided by a set of abstract derived data types (and the methods bound to them) to which we have converged as a result of our experience in accommodating a wide range of state-of-the-art FE discretizations and solver techniques within a single framework, from desktops/laptops to high-end distributed-memory supercomputers (see Sect. 2).
For the sake of brevity, in this work we restrict ourselves to a subset of this representation that only provides support to the implementation of high-order conforming and non-conforming FE discretizations grounded on conforming meshes in a serial computing environment. We stress, however, that the actual (complete) representation also incorporates concepts to express the mesh in a distributed-memory environment (e.g., the set of cells of a subdomain is divided into local cells and a layer of cells owned by remote subdomains, which we denote as ghost cells). It also provides support to the implementation of high-order hp-adaptive (i.e., on locally refined, non-conforming meshes) conforming and non-conforming FEs (using hanging node constraints [82] and subface integration over a facet between cells of different refinement level, respectively) and to the implementation of XFEM-type techniques (see [60] and references therein): given an implicit representation of the geometry of the domain, a background mesh is able to know whether a cell is interior, exterior, or cut by the domain, and, in the latter case, to provide the coordinates of the intersection points. This extra expressivity comes in the form of additional data types, and of an extended set of methods for the data types that are covered in this section. Neither the former nor the latter are covered in this section.
Although our abstract representation of a triangulation has been proven to have high expressivity, we do not claim, however, that our triangulation representation is universally applicable to the implementation of arbitrary numerical discretization and solver techniques. It indeed has been designed such that extra extensions are foreseen to satisfy further requirements.
The triangulation representation encompasses both topological and geometric data. A triangulation is conceived as a partition of \(\Omega \) into a set of cells (d-faces). Each cell is uniquely identified by a global identifier in the range \(\texttt {cell\_gid}=1,\ldots ,\texttt {num\_cells}\).^{14} Apart from the cells, a triangulation is also composed of a set of lower dimensional objects, i.e., a set of k-faces, for \(k=0,\ldots ,d-1\). We will also refer to elements in this set as “vefs”, since in the \(d=3\) case it is composed of vertices, edges, and faces. Each of the objects in this set is uniquely identified by a global identifier in the range \(\texttt {vef\_gid}=1,\ldots ,\texttt {num\_vefs}\).^{15}
Apart from the cells and vefs, a triangulation also encompasses adjacency data. This sort of data describes how n-faces in a mesh are related to each other. We denote by F the set of all n-faces in the mesh, by \(F^k\) the set of all k-faces, and by \(F_i\) and \(F^k_i\) the i-th n-face (of arbitrary dimension) and the i-th k-face (of fixed dimension k), respectively. In conforming meshes, there are mainly two relevant types of adjacency relationships, namely composition (m-faces that are part of a k-face for \(m<k\)) and neighbourhood (m-faces around a given k-face for \(m>k\)). Following [87], the set of m-faces adjacent to \(F^k_i\) is denoted by \(F^k_i \langle F^m \rangle \) (i.e., the operator \(\langle \cdot \rangle \) selects from the set the m-faces adjacent to the one on the left). A triangulation conforming with the FEMPAR abstract representation should be able to provide the composition data \(F^3_i \langle F\rangle \), and the neighbourship data \(F_i \langle F^3 \rangle \), that is, the n-faces that compose each cell and the cells around n-faces.
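The two adjacency relationships are inverse to each other, which the following Python sketch illustrates on a hypothetical mesh of two quadrilaterals sharing one edge (the 2D analogue of the composition data of each cell and the cells around each vef). The global identifiers below are made up for the example and the function name is hypothetical.

```python
from collections import defaultdict


def cells_around_vefs(vefs_of_cell):
    """Invert the composition data (cell -> vefs) into neighbourship data (vef -> cells)."""
    around = defaultdict(list)
    for cell_gid in sorted(vefs_of_cell):
        for vef_gid in vefs_of_cell[cell_gid]:
            around[vef_gid].append(cell_gid)
    return dict(around)


# Two quadrilaterals side by side. Vertices have gids 1..6, edges 7..13.
# Vertices 2 and 5 and edge 10 are shared by both cells.
vefs_of_cell = {
    1: [1, 2, 4, 5, 7, 8, 9, 10],
    2: [2, 3, 5, 6, 11, 12, 10, 13],
}
around = cells_around_vefs(vefs_of_cell)
print(around[10], around[7])
```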
A triangulation also includes geometry data. Cell geometries are represented by a map \(\varvec{\Phi }_K\) of a polytope \({\hat{K}}\) in the reference space to the physical space (see Sect. 3). This map is represented as a function of a scalar FE space (e.g., grounded on high-order Lagrangian FEs or B-splines), with its DOF values being the vectors of node coordinates (i.e., point_t instances) in the physical space.
At the core of the software design in charge of providing the triangulation-related data covered so far is an abstract data type named triangulation_t. (The rationale behind this data type being abstract will be made clear in the course of this section.) This data type is defined as shown in Listing 9. triangulation_t is conceived as a template to which all subclasses have to conform. On the one hand, it is composed of a (minimal) set of member variables encompassing data common to any triangulation. In particular, any triangulation is embedded in a num_dimensions-dimensional space, and is composed of a total number of num_cells (num_dimensions-dimensional) cells and num_vefs vefs; see Lines 3–5 of Listing 9. On the other hand, triangulation_t is equipped with a set of deferred methods that the subclasses of triangulation_t must implement; see Lines 11–18. The rationale underlying these methods requires further elaboration, to be discussed in the sequel.
In order to construct a conceptual view of triangulation_t suitable for the user (and library) code needs, FEMPAR relies on the so-called iterator OO design pattern [88]. Iterators are data types that provide sequential traversals over the full sets of objects that all together (conceptually) comprise triangulation_t as a mesh-like container. There are several different iterators available, each one related to a different set of objects to be traversed. For example, cell_iterator_t provides traversals over the set composed of all cells, while vef_iterator_t provides traversals over the one composed of all vefs.^{16} In our software design, iterators are created and freed by a set of public TBPs provided by triangulation_t; see Lines 11–18 of Listing 9. Thus, for example, the expression call triangulation%create_cell_iterator(cell) creates an iterator on the cell client-space instance, while call triangulation%free_cell_iterator(cell) frees it. Iterators sequentially traverse objects in increasing order by their global identifiers. However, we note that triangulation_t subclasses are completely free to decide how to internally label these objects.^{17}
As the reader might have already noted from the minimal set of member variables in Listing 9 (among others), our software design aims to provide complete flexibility to concrete subclasses of triangulation_t with respect to how they internally lay out the (topology and geometry) data to be provided. To this end, triangulation_t is an abstract class that defers this decision to its subclasses. There is a clear separation between how the data is handled (i.e., stored and accessed) by the private data structures (member variables) underlying triangulation_t subclasses, and the conceptual/abstract view of triangulation_t exposed to FEMPAR users. This view renders triangulation_t easily accessible and understandable. Whereas the public interface of triangulation_t used by client codes is designed to be stable over time, the internals of triangulation_t subclasses are allowed to (and are subject to) change over time (e.g., in order to accommodate further optimizations, additional requirements, etc.). At the price of dynamic runtime polymorphism, triangulation_t subclasses can be designed such that they preserve encapsulation and data hiding while still storing and accessing data efficiently. Thus, e.g., a triangulation_t subclass in charge of handling structured/uniform meshes of simple domains may decide not to explicitly store the cell-wise global vef identifiers, nor the vertex coordinates of the mesh, but instead to provide them implicitly on demand as a function of the global cell identifier.
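The structured-mesh case mentioned last can be sketched as follows. This is a hypothetical Python class, not a FEMPAR data type: it describes a uniform mesh of nx × ny quadrilaterals on the unit square, assumes a row-major (lexicographic) numbering of cells and vertices, and stores nothing but the two mesh sizes.

```python
class UniformQuadMesh:
    """Uniform nx-by-ny quadrilateral mesh of the unit square with no stored arrays."""

    def __init__(self, nx, ny):
        self.nx, self.ny = nx, ny
        self.num_cells = nx * ny

    def _cell_ij(self, cell_gid):
        # 1-based global cell identifier -> 0-based (column, row) position.
        return (cell_gid - 1) % self.nx, (cell_gid - 1) // self.nx

    def vertex_gids(self, cell_gid):
        """Global vertex identifiers of a cell, computed on demand."""
        i, j = self._cell_ij(cell_gid)
        row = self.nx + 1  # number of vertices per mesh row
        return [j * row + i + 1, j * row + i + 2,
                (j + 1) * row + i + 1, (j + 1) * row + i + 2]

    def vertex_coordinates(self, cell_gid):
        """Physical coordinates of the cell vertices, in the same local order."""
        i, j = self._cell_ij(cell_gid)
        return [((i + di) / self.nx, (j + dj) / self.ny)
                for dj in (0, 1) for di in (0, 1)]


mesh = UniformQuadMesh(2, 2)
print(mesh.vertex_gids(4), mesh.vertex_coordinates(4))
```

Client code that only calls vertex_gids and vertex_coordinates cannot tell whether the data is stored or computed, which is the point of the separation described above.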
Apart from encompassing the logic underlying the actual traversal over objects of the set at hand, iterators also have the following crucial responsibility. Following the software concept of “accessors” presented in [17], they are able to tease out the data related to the current object on which they are seated from the global arrays and the rest of private data structures that comprise the internals of the corresponding triangulation_t subclass. They therefore do not explicitly store, e.g., the global vef identifiers of the current cell. Instead, they know how to fetch them from the corresponding triangulation_t subclass into data structures suitable for the user needs. Given that it is the responsibility of triangulation_t subclasses to decide how to internally lay out data, iterators are abstract data types as well, and most of their TBPs are deferred/virtual. This also justifies why the methods in Lines 11–18 of Listing 9 are deferred, and why the corresponding iterator dummy arguments are polymorphic allocatable. It is ultimately the responsibility of the concrete subclass of triangulation_t to decide at execution time the dynamic type of the polymorphic variable being created.
Let us next discuss the rationale underlying the design of iterators over cells and vefs. These data types are defined in Listing 10, where set must be actually replaced by the corresponding name uniquely identifying the set of objects to be traversed by the iterator at hand, i.e., either cell or vef. In Fig. 7, we illustrate the implementation of a partial (selected) subset of the bindings of these data types.
The actual set of (deferred) TBPs of a triangulation_t iterator highly depends on the type of object being pointed to. We now briefly discuss those TBPs in the set corresponding to cell and vef iterators that provide support to the subset of the triangulation conceptual representation we are focusing on. These are in particular enumerated in Listing 11.
The TBPs in Lines 8–12 of Listing 11 are in charge of providing data related to the composition relationship \(F^3_i \langle F\rangle \). In particular, the get_num_vefs binding returns the number of vefs on the boundary of the cell (i.e., the cardinality of the composition relationship). Given the local index of a vef in a cell (within the range \(1,\ldots ,\texttt {num\_vefs}\)), get_vef positions the vef_iterator_t instance on input such that it points to this vef, while get_vef_gid returns its global identifier; get_vef_lid performs the inverse translation to the one of get_vef_gid. Finally, get_vefs_gid lets the client obtain the global identifiers of all vefs of the current cell in one shot, given a user-space pointer to an integer array. The semantics of this last TBP are such that subclasses of cell_iterator_t are not allowed to allocate the provided pointer, but to associate it with existing (internal) memory (for increased performance and to avoid memory leaks).
The TBP in Line 15 of Listing 11 provides support to the implementation of the transformation procedure described in Sect. 3.16. In particular, this binding has to be invoked on a cell_iterator_t instance positioned on the source cell and, given a cell_iterator_t positioned on the target cell and the n-face local identifiers within the former and latter cells, returns the permutation index; see Sect. 3.16. We stress that both the rotation and orientation indices can always be computed using the TBPs in the previous paragraph. For example, in order to determine the rotation index, one can extract the global id of the anchor vertex of the n-face in the source cell (by calling get_vef_gid), and then search for this global id in the set of vertices that comprise the n-face in the target cell (using an iterator over the corresponding sublist in vertices_n_face; see Sect. 6.1). However, we preferred to provide a specialized deferred binding for such purpose in order to leave room for optimizations in triangulation_t subclasses. For example, in the case of a subclass that works with oriented meshes, get_permutation_index may be implemented such that it always returns the permutation index corresponding to the identity transformation. In the case of a subclass of triangulation_t that is intended to remain static (or to be adapted very infrequently) during the course of the simulation process (see, e.g., Sect. 7.2), it might be beneficial for performance to precalculate all possible permutation indices during set up into lookup tables, and reuse them all the way through without having to perform the aforementioned searches over and over again.
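The search-based computation of the rotation index can be sketched as follows. This is a Python illustration with hypothetical names and 0-based indices. It assumes that the anchor vertex of an n-face is the first vertex of that n-face as seen from the source cell, which is an assumption of this sketch; the precise definition is the one of Sect. 3.16.

```python
def rotation_index(source_vertex_gids, target_vertex_gids):
    """Position, within the target cell's view of an n-face, of the anchor vertex.

    Both arguments list the global vertex identifiers of the same n-face,
    ordered as seen from the source and target cells, respectively.
    """
    anchor_gid = source_vertex_gids[0]  # assumed anchor: first vertex in the source view
    return target_vertex_gids.index(anchor_gid)


# The same face seen from two cells, with made-up global vertex identifiers.
source_view = [10, 11, 13, 12]
target_view = [11, 13, 12, 10]
print(rotation_index(source_view, target_view))
```

A static triangulation could evaluate this search once per pair of cells sharing an n-face and cache the result, which is the optimization that a specialized binding leaves room for.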
The TBPs in Lines 18–20 are in charge of providing the cell geometry related data. In particular, get_reference_fe returns a polymorphic pointer to the reference_fe_t instance that describes the space of functions to which the mapping \(\varvec{\Phi }_K\) belongs. get_num_nodes and get_nodes_coordinates return the number of nodes describing the geometry of the cell, and their associated coordinates in physical space, respectively. Instead of a pointer to a user-space array to be associated with internal storage (as get_vef_gids), get_nodes_coordinates takes a user-space (preallocated) array of point_t instances, and fills it (for reasons made clear in Sect. 8.3). Assuming that reference_fe_t is a bilinear Lagrangian FE on a quadrilateral, get_num_nodes would return 4 (one node per cell vertex), while get_nodes_coordinates would return the coordinates in physical space of its vertices.
Any triangulation_t subclass should let its clients classify the cells into sets. Each set is globally identified by an integer number, named set_id. The methods get_set_id and set_set_id let the caller retrieve the set to which the current cell is associated, or associate a new set to it, respectively. Cell set identifiers are primarily (although not only) used by fe_space_t during its setup; see Sect. 10. In particular, they instruct the latter to determine which reference_fe_t instances to use on top of the cells belonging to the same set. For example, assuming that we want to solve a scalar, single-field PDE problem on a subdomain of our original domain (that we assume to be aligned with the cell boundaries), we would use two different sets: the first for the cells that are interior to the subdomain, and the second for those that are exterior. Then we could associate, e.g., a linear Lagrangian reference FE to cells in the first set, and void_reference_fe_t to cells in the second set; see Sect. 6.5.
Sitting on a given vef, the TBPs in Lines 36–37 are in charge of providing data related to the adjacency relationship \(F_i \langle F^3\rangle \). In particular, get_num_cells_around returns its cardinality, while get_cell_around returns a cell in this set. To be more precise, the latter TBP positions the instance of cell_iterator_t on input such that it points to a cell in this set identified with an index within the range \(1,\ldots ,\texttt {get\_num\_cells\_around()}\). The order in which the cells around a vef are listed can be arbitrary, so that codes relying on triangulation_t should not assume, e.g., that they are ordered increasingly by their global cell identifiers. On the other hand, get_num_nodes and get_nodes_coordinates return the number of points on top of the vef (including those on top of the lower-dimensional ones on its boundary), and their associated coordinates in physical space, respectively; see Lines 40–41. We adopt the convention that these nodes are (locally) labeled (within the input/output array of point coordinates to be filled) according to the reference coordinate system of the first cell around the vef, i.e., the cell obtained as vef%get_cell_around(1,cell).
The TBPs in Lines 44–48 let the client determine whether the vef is at the interior of the domain or on its boundary, obtain the vef dimension (e.g., in 3D, 0, 1, and 2 for vertices, edges, and faces, respectively), and retrieve the set to which the vef is currently associated, or associate a new set to it. Sets in the case of vefs are primarily used to codify the boundary conditions of the PDE problem at hand, as discussed in Sect. 10.4.
At this point we are already in a position to show user-level code that exploits the software design covered so far. In particular, Listing 12 splits the whole set of triangulation cells into two disjoint sets, those that are in contact with the boundary of the domain, and those that are in its interior.
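The logic of such a split can be mimicked with a small stand-alone sketch. It is written in Python and is not the Fortran code of Listing 12: the Vertex and Cell classes are hypothetical stand-ins for the iterators, the mesh is an n × n grid of quadrilaterals, and only the vertices of each cell are inspected, which suffices here since every boundary vef contains a boundary vertex.

```python
class Vertex:
    """Stand-in for a vef iterator seated on a vertex of an n-by-n grid."""

    def __init__(self, a, b, n):
        self._at_boundary = a in (0, n) or b in (0, n)

    def is_at_boundary(self):
        return self._at_boundary


class Cell:
    """Stand-in for a cell iterator; gids are 1-based and row-major."""

    def __init__(self, gid, n):
        self.gid = gid
        self._set_id = 0
        i, j = (gid - 1) % n, (gid - 1) // n
        self._vefs = [Vertex(a, b, n) for b in (j, j + 1) for a in (i, i + 1)]

    def vefs(self):
        return iter(self._vefs)

    def set_set_id(self, set_id):
        self._set_id = set_id

    def get_set_id(self):
        return self._set_id


def classify_cells(cells, boundary_set_id=1, interior_set_id=2):
    """Assign one set id to cells touching the boundary and another to the rest."""
    for cell in cells:
        touches = any(vef.is_at_boundary() for vef in cell.vefs())
        cell.set_set_id(boundary_set_id if touches else interior_set_id)


n = 3
cells = [Cell(gid, n) for gid in range(1, n * n + 1)]
classify_cells(cells)
interior = [cell.gid for cell in cells if cell.get_set_id() == 2]
print(interior)
```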
7.2 An Example triangulation_t Subclass and Rationale
In this section, we discuss how a particular subclass of triangulation_t is internally organized in order to efficiently provide triangulation-related data by means of the software abstractions presented in Sect. 7.1. This subclass is static_triangulation_t. A static_triangulation_t codifies a conforming mesh, which is set up from scratch at the beginning of the simulation, and remains unaltered during the whole process. On the other hand, static_cell_iterator_t and static_vef_iterator_t are two non-abstract data type extensions of cell_iterator_t and vef_iterator_t, respectively. By overriding the set of deferred methods of the former ones, the latter ones tease out the data related to the current object on which they are seated from the global arrays and the rest of private data structures that comprise the internals of static_triangulation_t.
There is no single approach to lay out the data within a given triangulation subclass. The search for an acceptable trade-off among memory consumption, the computational time required to set up, update (if applicable), and access triangulation data, and the frequency with which these operations are performed should guide its internal organization. For example, in [87], two storage layouts are presented, and their memory and computational costs for the computation of any possible adjacency relationship are evaluated in 3D. The first one, called one-level representation, is defined by \(F^1_i \langle F^0 \rangle \), \(F^2_i \langle F^1 \rangle \), and \(F^3_i \langle F^2 \rangle \) (composition information), and by \(F^0_i \langle F^1 \rangle \), \(F^1_i \langle F^2 \rangle \), and \(F^2_i \langle F^3 \rangle \) (neighbourhood information). In other words, it stores vertices of each edge, edges of each face, and faces of each cell, together with edges around vertices, faces around edges, and cells around faces. The second one, called circular representation, is defined by the composition information \(F^1_i \langle F^0 \rangle \), \(F^2_i \langle F^1 \rangle \), \(F^3_i \langle F^2 \rangle \) (as above), together with the neighbourhood information \(F^0_i \langle F^3 \rangle \) (cells around vertices). An important property of these two storage layouts is their completeness, i.e., the possibility to determine any adjacency without a loop over the entire mesh. The storage requirements for a uniform mesh of a cube domain with \(N_c\) cells are \(48 N_c\) (for hexahedra) and \(24 N_c\) (for tetrahedra) in the former, and \(32 N_c\) (for hexahedra) and \(16 N_c\) (for tetrahedra) in the latter. However, the operation count for determining some adjacencies, although independent of \(N_c\), is high.
For example, in the case of the one-level representation, obtaining the cells around a vertex requires 48 (for hexahedra) and 140 (for tetrahedra) operations, whereas only one operation is needed to obtain cells around facets. In the case of the circular representation, these queries involve one and 148 (for hexahedra) or 299 (for tetrahedra) operations, respectively [87]. (We recall that both kinds of adjacencies are required by FEMPAR as presented in Sect. 7.1.)
Another quite different storage data layout is the one followed by the triangulation in the deal.II library [17], essentially defined by the composition data \(F^1_i \langle F^0 \rangle \), \(F^2_i \langle F^1 \rangle \), and \(F^3_i \langle F^2 \rangle \) (referred to as hierarchical cell representation by the authors of the library), and the neighbourhood data \(F^3_i \langle F^2 \rangle \) stored cell-wise (i.e., a given cell stores the identifiers of its cell neighbours across each face within the cell). Besides, the (potentially non-conforming) triangulation in this library is conceived (and explicitly represented) as a collection of trees, where the cells of a coarsest conforming mesh (generated by deal.II itself for simple domains, or read from a file in one of several file formats) form the roots, and the children branch off their parent cells, thus forming binary trees, quadtrees, and octrees in \(d=1,2,\) and 3 spatial dimensions, respectively [17]. While both the ancestors (i.e., the so-called “inactive” cells) and leaf cells of the tree (i.e., the so-called “active” cells) are stored, only the latter ones actually form the partition of the domain. Apart from a hierarchy of cells, the deal.II triangulation also maintains a hierarchy of k-faces for \(k=1,\ldots ,d-1\). Such a rather complex data structure is justified by the authors for two reasons. First, it allows for an efficient implementation of mesh adaptation (including coarsening and refinement). The hierarchy of n-faces aids in the process of handling the so-called hanging node constraints required to build conforming FE spaces on top of non-conforming meshes. The second reason is the implementation of (geometric) multigrid preconditioners grounded on the adaptivity tree. In particular, such preconditioners require that DOFs are also associated to inactive cells. Thus, inactive n-faces also have to explicitly exist in the triangulation.
In any case, such a structure is hard to generate and maintain, and does not fit well when integrated with parallel octree libraries [89], like p4est [90]. The whole hierarchy must be generated from scratch on each mesh adaptivity step. However, based on our own experience, such a hierarchy is not really needed for an efficient implementation of adaptive refinement. The second reason, i.e., the implementation of a serial hierarchical multigrid solver in deal.II, would probably be more complicated without such a hierarchical representation of the mesh.
While the hierarchical cell representation in deal.II has proven successful in the implementation of highly complex hp-adaptive FE discretizations [82] and reduces memory consumption over \(F^3_i \langle F\rangle \), the restriction of the global vef identifiers to a cell (a very frequent operation in FE codes) becomes significantly more expensive in this storage layout, as this operation requires permutations from the reference coordinate system of the cell that owns the vef to the one of the cell to which we are restricting; the same applies to the restriction of global DOF identifiers to a cell when the DOFs are stored n-face-wise. Furthermore, it is a non-complete storage layout. In particular, neighbourship data \(F_i \langle F^3 \rangle \) has to be computed by the user by means of a loop over all cells. Besides, it prevents library support for loops over the facets of the mesh, and access to the neighbouring cells, a natural operation in the implementation of DG methods. In our experience, facet-loop based integration of DG terms (versus cell-loop based) leads to software that is significantly easier to use, as it can be designed such that most of the complexity underlying facet integration is hidden from the user (see Sect. 9). Finally, although it is very efficient for hierarchical and local mesh adaptation (within each subdomain), the most severe drawback is its costly set up (from scratch) for a given initial conforming coarse mesh (this can be mitigated by reducing the coarse mesh resolution, at the price of potentially losing geometry modelling accuracy), and, in a distributed-memory environment, the even more costly regeneration of an adapted non-conforming forest of trees after a redistribution step among processes for dynamic load-balancing [90]. Indeed, in [89], the latter is reported as the second most costly operation in the simulation pipeline, only below the linear solver step.
The static_triangulation_t data type explicitly stores the composition data \(F^3_i \langle F\rangle \) and the neighbourship data \(F_i \langle F^3 \rangle \) within its internal (private) member variables.^{18} The memory consumption of such a complete storage layout is \(52 N_c\) (hexahedra) and \(28 N_c\) (tetrahedra), which is less than twice that of the one-sided and circular representations [87]. At the price of this increased memory consumption, static_triangulation_t is able to provide the required adjacency data with \(\mathcal {O}(1)\) arithmetic complexity. Besides, the cell-based storage of the composition relationship is perfectly suited for its migration in parallel distributed-memory environments. On the other hand, the amount of permanent storage of this data layout can be reduced if one exploits the fact that neighbourship data is only required in very specific parts of the code. For example, unstructured mesh generators usually provide only the composition data \(F^3_i \langle F^0 \rangle \). In such a case, static_triangulation_t requires the neighbourship data \(F^0_i \langle F^3 \rangle \) (plus the reference cell topology data encompassed within the reference_fe_t instance mapped to each cell; see Sect. 6.1) in order to set up the composition data \(F^3_i \langle F^1 \rangle \) and \(F^3_i \langle F^2 \rangle \). It is also needed in triangulation_t subclasses suitable for distributed-memory computers, among others, to set up the data structures required to perform nearest-neighbour exchanges of DOF nodal values among subdomains. (We stress that this process requires globally identifying interface DOFs consistently among the subdomains sharing such DOFs.) In this latter scenario, this adjacency data is only required for n-faces that lie on the inter-subdomain interface (and not for those in the interior). The evaluation of facet integrals (as designed in FEMPAR, see Sect. 9) also requires at least \(F^2_i \langle F^3 \rangle \) and \(F^1_i \langle F^2 \rangle \), in 3D and 2D, respectively. The full adjacency data may be needed for the implementation of advanced numerical discretization schemes, e.g., for the implementation of nodal-based shock detectors for monotonic FEs [58, 59]. For the aforementioned reasons, we decided to design static_triangulation_t such that it permanently stores such data, but we stress that our software design is such that a triangulation subclass is always free to offer methods that set up and destroy these data on demand to reduce the amount of permanent data storage.
The static_triangulation_t data type, together with a selected set of its bindings, is defined as shown in Listing 13. Before going into more detail, there are two main points to remark with respect to how this type internally lays out its data. First, it relies all the way through on intrinsic Fortran allocatable arrays. This sort of data structure is perfectly suited for the particular case of static_triangulation_t, due to its static nature. We stress, however, that more efficient data structures (i.e., able to mitigate the effect of frequent/costly allocatable array reallocations) would be convenient if it also had to support mesh adaptation (e.g., a linked list, or even better for data locality, a data structure with semantics close to std::vector of the C++ standard template library, which in fact is already in FEMPAR but not included for brevity). Second, for increased data locality during cell and vef sequential traversals (and thus a more efficient use of the memory hierarchy of modern computer architectures), the data is not stored into cell-wise or vef-wise local arrays, but into global arrays that are indexed either by the global cell or vef identifiers.
A collection of reference_fe_t polymorphic instances is stored in the reference_fes(:) array (see Line 4 of Listing 13). These instances are uniquely identified (within the local scope of static_triangulation_t) by their position in this array. For a given cell with global identifier cell_gid, the FE space of functions to which the cell mapping \(\varvec{\Phi }_K\) belongs is described by the reference_fe_t instance with identifier cell_to_ref_fes(cell_gid) in the collection; see Line 7. The member variables used to store the composition data \(F^3_i \langle F\rangle \) are encompassed within Lines 10–11 of Listing 13. As stated above, the global vef identifiers are stored cell-wise, in the lst_vefs_gids(:) array, which is in turn (indirectly) addressed by the ptr_vefs_x_cell(:) array. In particular, the ones assigned to the vefs on cell cell_gid start and end in position ptr_vefs_x_cell(cell_id) and ptr_vefs_x_cell(cell_id+1)-1 of lst_vefs_gids(:), respectively. Thus, e.g., the implementation of the (overridden) get_num_vefs TBP in static_cell_accessor (see Listing 12) just determines the number of vefs on the boundary of the current cell as ptr_vefs_x_cell(cell_id+1)-ptr_vefs_x_cell(cell_id). On the other hand, the member variables used to store the adjacency data \(F_i \langle F^3\rangle \) are encompassed within Lines 14–15 of Listing 13. The global identifiers of the cells around a vef vef_gid start and end in position ptr_cells_around(vef_gid) and ptr_cells_around(vef_gid+1)-1 of lst_cells_around(:), respectively.
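As an illustration only, the pointer/list layout described above can be sketched in Python. The sketch uses 0-based indices (the Fortran arrays are 1-based), a toy mesh of two triangles that share an edge, and only vertices as vefs; the helper build_cells_around is a hypothetical name introduced here, not a FEMPAR binding.

```python
# Illustrative sketch, not FEMPAR source: 0-based compressed storage of
# cell -> vef composition data and its transpose (vef -> cells around it).
ptr_vefs_x_cell = [0, 3, 6]          # cell c owns lst_vefs_gids[ptr[c]:ptr[c+1]]
lst_vefs_gids = [0, 1, 2, 1, 3, 2]   # global vef ids, stored cell by cell


def get_num_vefs(cell_gid):
    """Number of vefs of a cell: difference of two consecutive pointers."""
    return ptr_vefs_x_cell[cell_gid + 1] - ptr_vefs_x_cell[cell_gid]


def build_cells_around(ptr_cells, lst_vefs, num_vefs):
    """Transpose cell -> vef data into vef -> cell data (hypothetical helper)."""
    num_cells = len(ptr_cells) - 1
    # Pass 1: count the cells around each vef, then prefix-sum into pointers.
    ptr_around = [0] * (num_vefs + 1)
    for vef in lst_vefs:
        ptr_around[vef + 1] += 1
    for vef in range(num_vefs):
        ptr_around[vef + 1] += ptr_around[vef]
    # Pass 2: fill the list, keeping one moving write position per vef.
    cursor = list(ptr_around[:-1])
    lst_around = [0] * len(lst_vefs)
    for cell in range(num_cells):
        for vef in lst_vefs[ptr_cells[cell]:ptr_cells[cell + 1]]:
            lst_around[cursor[vef]] = cell
            cursor[vef] += 1
    return ptr_around, lst_around


ptr_cells_around, lst_cells_around = build_cells_around(
    ptr_vefs_x_cell, lst_vefs_gids, num_vefs=4)
```

Both directions are then answered by slicing a global array, which is what makes the constant-cost adjacency queries mentioned earlier possible.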
The geometry-related data is handled by the member variables in Lines 18–20. In particular, during the set up of static_triangulation_t a global numbering of the nodes of the global FE space describing the geometry of the mesh is internally built. (The process that generates such numbering is identical to the one described in Sect. 10.3, so we omit it here to keep the presentation short.) The global node identifiers restricted to cell cell_gid start and end in position ptr_nodes_gids(cell_id) and ptr_nodes_gids(cell_id+1)-1 of lst_nodes_gids(:), respectively. These global node identifiers are used to (indirectly) address the global array of node coordinates in Line 20. The cells_set_ids(:) and vefs_set_ids(:) arrays are used to store the user-provided cell and vef set identifiers (see Sect. 7.1), respectively, while vefs_at_boundary(:) stores whether the corresponding vef lies on the boundary of the domain or not.
Finally, the static_triangulation_create binding sets up a new static_triangulation_t instance. There are two options for creating a static_triangulation_t in FEMPAR, depending on whether the mesh is structured or unstructured. In the first case, FEMPAR provides the machinery for the automatic generation of a triangulation on simple domains (e.g., a unit cube), currently of brick (quadrilateral or hexahedral) cells. This function is implemented exploiting a tensor product structure of the space, numbering cells and vefs using lexicographical order. The second way to create a static_triangulation_t instance is from a mesh data file, e.g., using the GiD mesh generator [91].
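The lexicographic numbering used for structured meshes can be illustrated with a short Python sketch for a 2D grid of quadrilaterals on the unit square. The function names and the local vertex ordering are assumptions made for this example; FEMPAR's actual generator also numbers edges and faces and is written in Fortran.

```python
# Illustrative sketch, not FEMPAR source: lexicographic numbering of the
# cells and vertices of an nx x ny grid of quadrilaterals on the unit square.
def structured_quad_mesh(nx, ny):
    """Return (ptr_vefs_x_cell, lst_vefs_gids) with vertices as the only vefs.

    Cells and vertices are numbered with the x index running fastest.
    Local vertices of a cell follow the same lexicographic order.
    """
    ptr_vefs_x_cell = [0]
    lst_vefs_gids = []
    for j in range(ny):
        for i in range(nx):
            v0 = j * (nx + 1) + i  # bottom-left vertex of cell (i, j)
            lst_vefs_gids += [v0, v0 + 1, v0 + nx + 1, v0 + nx + 2]
            ptr_vefs_x_cell.append(len(lst_vefs_gids))
    return ptr_vefs_x_cell, lst_vefs_gids


def vertex_coordinates(nx, ny):
    """Coordinates of the vertices, in the same lexicographic order."""
    return [(i / nx, j / ny) for j in range(ny + 1) for i in range(nx + 1)]


ptr, lst = structured_quad_mesh(2, 2)
coords = vertex_coordinates(2, 2)
```

Because identifiers follow directly from the (i, j) indices, no search is needed to build the connectivity.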
8 Evaluation of Cell Integrals
In this section, we describe the data structures required to perform the numerical integration of the local matrices. In order to compute cell integrals (12), one needs (among others) functionality to evaluate the shape functions and their derivatives at the quadrature points in the physical cell and the determinant of the Jacobian at the quadrature points in the reference cell. In turn, the evaluation of the shape functions and derivatives in the physical cell relies on their evaluation (and possibly the evaluation of the Jacobian) in the reference cell (see, e.g., (13) and (14)). We note that the evaluation of \({\hat{\Psi }}\) does not require any additional information; it is the identity for Lagrangian elements and only requires the Jacobian in the reference cell for vector-valued shape functions (see (17) and (18)). In the following, we present a set of data types that contain all this information.
The evaluation of cell integrals involves the data type quadrature_t, which represents the quadrature \(\mathrm{Q}\); interpolation_t, which stores the values of the shape functions and their first derivatives (either in the reference or physical space) at the quadrature points of \(\mathrm{Q}\); and cell_map_t, which describes the mapping \(\varvec{\Phi }_K\) from a reference to a physical cell (e.g., Jacobian-related data). Additionally, the data type cell_integrator_t provides the machinery to compute the interpolation_t corresponding to the physical space from the one at the reference space and the cell_map_t at every cell of the triangulation. In the following sections, we cover these software abstractions in detail.
8.1 Numerical Quadrature
The data type that in FEMPAR represents an arbitrary quadrature rule is called quadrature_t and is defined as shown in Listing 14.
In Listing 14, coordinates(:,gp) and weights(gp) store, respectively, \(\hat{\varvec{x}}_{gp} \in {\mathbb {R}}^{\mathtt{num\_dims}}\) and \(\mathrm {w}_{gp}\), for \(gp=1,\ldots ,\mathtt{num\_quadrature\_points}\). It might readily be observed from the interface of its create binding that quadrature_t is designed to be simply a placeholder for the quadrature point coordinates and their associated weights. Indeed, this binding essentially allocates coordinates(:,:) and weights(:). The code that ultimately decides how to distribute the quadrature points over \({\hat{K}}\) and set up their associated weights is actually bound to the reference_fe_t implementors through the deferred binding with interface shown in Listing 15.
All reference_fe_t subclasses currently available in FEMPAR select by default a Gaussian quadrature that exactly integrates mass matrix terms (within their implementation of the binding in Listing 15) by invoking fill_*_gauss_legendre methods at Lines 13 and 14 in Listing 14. This quadrature can be solely determined from the attributes of the reference_fe_t implementor at hand (its topology and order).^{19} However, in other more demanding situations, e.g., the integration of a trilinear weak form, the user can provide the desired quadrature degree through the degree optional dummy argument. If more general scenarios than the ones currently covered (e.g., a non-Gaussian quadrature) are to be addressed, then the interface might be modified such that an optional parameter dictionary is passed instead.
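The default-versus-user-degree logic can be illustrated with a minimal Python sketch of a tensor-product Gauss-Legendre rule. It assumes the reference cell \([-1,1]^d\), tabulates only rules with up to three points per direction, and takes the default degree as \(2p\) per direction for a FE of order \(p\) (the degree of a mass matrix integrand on an affine cell); the names create_quadrature and integrate are hypothetical and do not reproduce FEMPAR's interfaces.

```python
# Illustrative sketch, not FEMPAR source: tensor-product Gauss-Legendre rule
# on [-1, 1]^d with an optional user-provided polynomial degree.
import itertools
import math

# 1D Gauss-Legendre rules on [-1, 1], indexed by the number of points.
GAUSS_LEGENDRE_1D = {
    1: ([0.0], [2.0]),
    2: ([-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)], [1.0, 1.0]),
    3: ([-math.sqrt(0.6), 0.0, math.sqrt(0.6)],
        [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]),
}


def create_quadrature(num_dims, order, degree=None):
    """Return (coordinates, weights); `degree` overrides the default 2*order."""
    if degree is None:
        degree = 2 * order  # assumed default: mass matrix terms of an order-p FE
    # An n-point Gauss-Legendre rule is exact up to degree 2n - 1.
    num_points_1d = degree // 2 + 1
    points, weights_1d = GAUSS_LEGENDRE_1D[num_points_1d]
    coordinates, weights = [], []
    for idx in itertools.product(range(num_points_1d), repeat=num_dims):
        coordinates.append(tuple(points[k] for k in idx))
        weights.append(math.prod(weights_1d[k] for k in idx))
    return coordinates, weights


def integrate(f, quadrature):
    """Approximate the integral of f over the reference cell."""
    coordinates, weights = quadrature
    return sum(w * f(*x) for x, w in zip(coordinates, weights))


default_rule = create_quadrature(num_dims=2, order=1)
finer_rule = create_quadrature(num_dims=1, order=1, degree=4)
```

Here default_rule has four points and integrates \(x^2 y^2\) exactly, while finer_rule shows how a caller could request a higher degree for a more demanding integrand.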
8.2 Evaluation of Reference Cell Shape Functions
As commented in the introduction of this section, to compute cell integrals (12), one needs to evaluate shape functions and their derivatives in the physical cell, which in turn rely on their evaluation in the reference cell (see, e.g., (13) and (14)). The values of the shape functions and their first derivatives at a set of quadrature points provided by a quadrature_t instance are stored in the interpolation_t data type presented below. The same data type can be used to store this data in the reference or physical space.
Let us start with the evaluation of shape functions in the reference space. The local FE space on top of \({\hat{K}}\) actually depends on the particular reference_fe_t implementor at hand. Consequently, this functionality has to be offered through a deferred binding of this abstract type. The interface of this binding is declared in Listing 16. The subroutine overriding it in concrete subclasses is in charge of computing the shape function values and derivatives at quadrature points in the reference space and storing them in a raw-data container of type interpolation_t (to be discussed later in this section).
Let us remark several points related to this interface. First, this binding is typically called only once, and the data precomputed and stored within the passed interpolation_t dummy argument is repeatedly reused when transforming these values to an actual cell; see Sect. 8.4. Second, this binding is designed such that all functions are evaluated at all quadrature points within a single call, instead of following a (much) finer granularity approach in which only one function is evaluated at a quadrature point per call.^{20} Third, we stress that the actual implementation of this deferred binding in FEMPAR computes shape function values and first derivatives in the reference space, whereas it lets the caller selectively decide whether or not to compute the second derivatives of the shape functions, given that they are expensive to compute and only required in very particular scenarios; see Sect. 3.7. Indeed, the code implementation of this feature is of cross-cutting nature, being reflected in several interfaces and data types in which the cell (and face) integration functionality is split. We will nevertheless omit here (and in the remaining sections) details regarding second derivatives (and their optional computation) in order to keep the presentation simple.
Let us now discuss the rationale underlying interpolation_t. This data type is not exposed at all to the user of FEMPAR. It is instead used as an internal low-level container that lets the data types involved in the implementation of cell integrals exchange the sort of data under consideration. It is ultimately the responsibility of the concrete reference_fe_t subclass to decide how the data is actually laid out within the member variables of interpolation_t. Thus, reference_fe_t is the only data type that can access or modify interpolation_t. In its current flavour, interpolation_t is a concrete (i.e., non-abstract) data type with a fixed set of multi-rank allocatable array member variables for storing shape function values and derivatives. For example, the one storing shape function values is a 3-rank array, where a reference_fe_t implementor may choose its indices, from left to right, to refer to the component of the shape function, the shape function, and the quadrature point, respectively. The reference_fe_t subclass is, however, completely free to lay out the data in these arrays, and it is in this flexibility that the extensibility of the software design to accommodate several FE space realizations resides. This has indeed proven sufficient to (efficiently) implement all FE spaces currently available in FEMPAR, including scalar-, vector-, and tensor-valued Lagrangian FEs (where higher-rank spaces are determined as the tensor product of the scalar spaces, and shape functions have only one nonzero component), and genuinely vector-valued FE spaces (where more than one component of the shape function may be nonzero).
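To make the container idea concrete, the following Python sketch fills a raw container with the values and first derivatives of bilinear (Q1) Lagrangian shape functions on \([-1,1]^2\), using the index order (component, shape function, quadrature point) mentioned above. The class and function names, the derivative array layout, and the choice of reference cell are assumptions made for this example only.

```python
# Illustrative sketch, not FEMPAR source: a raw container filled by one
# particular reference FE (scalar Q1 on [-1, 1]^2).
class Interpolation:
    """Raw storage; the meaning of the indices is decided by the reference FE."""

    def __init__(self, num_components, num_shape_functions, num_dims, num_qpoints):
        # shape_functions[component][shape function][quadrature point]
        self.shape_functions = [
            [[0.0] * num_qpoints for _ in range(num_shape_functions)]
            for _ in range(num_components)]
        # shape_derivatives[component][direction][shape function][quadrature point]
        self.shape_derivatives = [
            [[[0.0] * num_qpoints for _ in range(num_shape_functions)]
             for _ in range(num_dims)]
            for _ in range(num_components)]


def create_interpolation_q1(quadrature_points):
    """Evaluate all Q1 shape functions at all quadrature points in one call."""
    signs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]  # lexicographic node order
    interp = Interpolation(1, 4, 2, len(quadrature_points))
    for gp, (x, y) in enumerate(quadrature_points):
        for a, (sx, sy) in enumerate(signs):
            interp.shape_functions[0][a][gp] = 0.25 * (1 + sx * x) * (1 + sy * y)
            interp.shape_derivatives[0][0][a][gp] = 0.25 * sx * (1 + sy * y)
            interp.shape_derivatives[0][1][a][gp] = 0.25 * sy * (1 + sx * x)
    return interp


g = 1.0 / 3.0 ** 0.5
interp = create_interpolation_q1([(-g, -g), (g, -g), (-g, g), (g, g)])
```

A different reference FE could reuse the same container with another index convention, as long as it is also the one reading the data back.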
8.3 Geometrical Mapping
A basic building block is the mapping \(\varvec{\Phi }_K\) between the reference cell \({\hat{K}}\) coordinate system and the one corresponding to an actual cell K of the triangulation in the physical space; see Sects. 3.2 and 3.3. For example, we are able to pull back the gradients of the shape functions from the reference to the physical space in (14) using the Jacobian of the transformation evaluated at quadrature points, or to evaluate the source term at quadrature points in real space. The Jacobian is also required to transform the integral from the physical to the reference space in (12) and to compute the Piola transformations in div- and curl-conforming FE spaces (see (17) and (18)). The derived type cell_map_t in FEMPAR is designed to be a placeholder for the data required to provide this sort of service. It is declared as shown in Listing 17. The rationale underlying the inheritance relationship between cell_map_t and base_map_t will be made clear in Sect. 9.
The create binding of cell_map_t takes as input a quadrature_t instance with a set of integration points where \(\varvec{J}_K(\hat{\varvec{x}}_{gp})\), \(\varvec{J}_K^{-1}(\hat{\varvec{x}}_{gp})\), and \(|\varvec{J}_K(\hat{\varvec{x}}_{gp})|\) are to be evaluated (see Listing 17). These geometry-related data are stored in the jacobian(:,:,gp), inv_jacobian(:,:,gp), and det_jacobian(gp) allocatable array member variables of cell_map_t, respectively, and allocated during a call to this binding. Apart from a quadrature_t instance, cell_map_t also requires a description of the (discrete) space of functions to which \(\varvec{\Phi }_K\) belongs. FEMPAR supports mappings \(\varvec{\Phi }_K\) belonging to abstract FE spaces (e.g., high-order polynomial FE spaces or spline-based spaces). The reference_fe dummy argument of polymorphic type reference_fe_t serves this purpose. (We note that dynamic run-time polymorphism in this particular context lets us reuse cell_map_t, e.g., with an arbitrary cell topology.) It turns out that the only information that reference_fe_t has to provide to cell_map_t are its shape functions, first derivatives, and (on demand) second-order derivatives at the quadrature points (in the reference space). The interpolation member variable (see Listing 17) is used by reference_fe to exchange this sort of data with cell_map_t via a call to the create_interpolation binding of the former (see Listing 16) during a call to the \(\texttt {create}\) binding of the latter.
While the create TBP of cell_map_t is designed to be called once, the update TBP of cell_map_t is, however, designed to be called multiple times, once per every cell K of the triangulation. A precondition of update is that the nodes_coordinates(:) scratch member variable (see Listing 17) has been loaded with the coordinates in real space of the nodes describing the geometry of K (stored into point_t instances). Once this precondition is fulfilled, \(\varvec{\Phi }_K\) can be expressed as a linear combination of the reference_fe_t shape functions with nodes_coordinates(:) being the corresponding coefficients in the expansion. At this stage, coordinates_quadrature_points(:), which stores the coordinates of quadrature points in real space, and jacobian(:,:,:), can be easily computed. Finally, inv_jacobian(:,:,:) and det_jacobian(:) can be computed from jacobian(:,:,:) using straightforward numerical algorithms.
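The create-once, update-per-cell pattern can be sketched in Python for a bilinear quadrilateral in 2D. The class below is a simplified stand-in, not FEMPAR's cell_map_t: it hard-codes Q1 shape functions on \([-1,1]^2\), stores coordinates as plain tuples, and inverts the \(2\times 2\) Jacobian in closed form.

```python
# Illustrative sketch, not FEMPAR source: a 2D cell map for Q1 geometry.
SIGNS = [(-1, -1), (1, -1), (-1, 1), (1, 1)]  # lexicographic node order


def q1_values(x, y):
    return [0.25 * (1 + sx * x) * (1 + sy * y) for sx, sy in SIGNS]


def q1_derivatives(x, y):
    return [(0.25 * sx * (1 + sy * y), 0.25 * sy * (1 + sx * x))
            for sx, sy in SIGNS]


class CellMap:
    def __init__(self, quadrature_points):
        # "create": reference-space data is evaluated once and reused.
        self.values = [q1_values(x, y) for x, y in quadrature_points]
        self.derivatives = [q1_derivatives(x, y) for x, y in quadrature_points]
        num_points = len(quadrature_points)
        self.nodes_coordinates = [(0.0, 0.0)] * 4  # scratch, loaded per cell
        self.coordinates_quadrature_points = [None] * num_points
        self.jacobian = [None] * num_points
        self.inv_jacobian = [None] * num_points
        self.det_jacobian = [0.0] * num_points

    def update(self):
        # "update": called once per cell, after nodes_coordinates is loaded.
        X = self.nodes_coordinates
        for gp, (N, dN) in enumerate(zip(self.values, self.derivatives)):
            # Phi_K(x_hat) = sum_a N_a(x_hat) X_a
            self.coordinates_quadrature_points[gp] = tuple(
                sum(N[a] * X[a][i] for a in range(4)) for i in range(2))
            # J[i][j] = sum_a X_a[i] * dN_a / dx_hat_j
            J = [[sum(X[a][i] * dN[a][j] for a in range(4)) for j in range(2)]
                 for i in range(2)]
            det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
            self.jacobian[gp] = J
            self.det_jacobian[gp] = det
            self.inv_jacobian[gp] = [[J[1][1] / det, -J[0][1] / det],
                                     [-J[1][0] / det, J[0][0] / det]]


cell_map = CellMap([(0.0, 0.0), (0.5, -0.5)])
cell_map.nodes_coordinates = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
cell_map.update()
```

For the rectangle \([0,2]\times [0,1]\) used above, the Jacobian is constant with determinant 0.5, and the reference centre maps to (1.0, 0.5).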
8.4 Evaluation of Shape Functions in the Physical Space
The user code that evaluates cell integrals in (12) may need the value, gradient, curl, and divergence of the shape functions at the integration points in the physical space, given that we want to unburden FEMPAR users from the complexity of having to explicitly apply mapping transformations. As commented in Sect. 3, the mapping that transforms a shape function \(\hat{\phi }^{a}(\hat{\varvec{x}})\) in the reference FE space into the one in the physical space, \(\phi ^{a}(\varvec{x}) = {\hat{\Psi }}_K(\hat{\phi }^{a}) \circ \varvec{\Phi }^{-1}_K\), depends on the particular FE space at hand; see Sects. 3.8, 3.9, and 3.10 for details. For this reason, the code that performs these transformations is not bound to cell_map_t, but to reference_fe_t, through the deferred binding with interface declared in Listing 18.
The interpolation_reference_cell input dummy argument of apply_cell_map (see Listing 18) must have been obtained from a call to the binding in Listing 16 invoked on the same reference_fe_t instance. The output dummy argument interpolation_real_cell holds the shape functions and their derivatives evaluated at quadrature points in physical space (see (13) and (14)). It is also assumed that, on input, interpolation_real_cell already contains the data that does not have to be recomputed on each mesh cell, e.g., the value of the shape functions on integration points for Lagrangian FEs; see the discussion related to the update binding below for the strategy that we follow in order to fulfill this requirement. This leaves room for optimization in the implementation of this deferred binding (on subclasses), since these quantities do not have to be recomputed on each cell. The reference_fe_t subclass uses the cell_map_t instance (passed to the apply_cell_map binding, see Listing 18) as a placeholder for the data required to provide the mapping transformations required.
We stress, however, that interpolation_t is a low-level structure that is not designed as a data type that FEMPAR users have to interact with, for reasons made clear in Sect. 8.2. Therefore, we need to introduce an additional data type in our software design, called cell_integrator_t, that, among other services, is able to fetch raw data from interpolation_t into field data types (i.e., scalars, vectors, and tensors) that the user can easily become familiar with. This data type is declared as shown in Listing 19.
An instance of cell_integrator_t is created from a quadrature rule (where the shape functions and their derivatives are to be evaluated) and a polymorphic reference_fe_t instance describing the reference FE space at hand; see the interface of the create binding in Listing 19. During this stage, reference_fe creates the interpolation_reference_cell member variable of cell_integrator_t via create_interpolation; see Listing 16. It also clones interpolation_reference_cell into interpolation_real_cell, and copies the contents of the former into the latter. This lets cell_integrator_t fulfill later on the precondition on the last dummy argument of apply_cell_map. The create binding also associates its polymorphic pointer reference_fe member variable to the reference_fe_t instance provided to it on input. This pointer is required later on by the update and get_* bindings (see discussion in the sequel).
The update binding of cell_integrator_t simply invokes apply_cell_map on its polymorphic reference_fe member variable, using the instance of cell_map_t provided on input to update, and the two interpolation_t member variables as actual arguments, respectively; see Listings 18 and 19. It leaves the cell_integrator_t instance on which it is invoked in a state such that it is able to provide the services it was primarily designed for. These are offered through the get_values, get_gradients, get_divergences, get_curls, etc., generic bindings. We note that cell_integrator_t is designed such that it can handle either scalar-, vector-, or tensor-valued reference_fe_t instances (see Sect. 6.2). With this purpose in mind, each of the aforementioned generic bindings is overloaded with subroutines that have appropriate interfaces for these three types of FEs. For example, the subroutine overloading get_gradients in the case of scalar-valued FEs is declared and implemented as shown in Listing 20, with vector_field_t representing a d-dimensional rank-1 tensor; the interface of the one corresponding to vector-valued FEs only differs from the one above in the base type of the gradients allocatable array dummy argument, which is of base type tensor_field_t (i.e., a data type representing a d-dimensional rank-2 tensor).
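The create, update, and get_* sequence can be illustrated with a small Python sketch for scalar-valued Lagrangian FEs in 2D. It assumes the standard transformation \(\nabla \phi = \varvec{J}_K^{-T} \hat{\nabla }\hat{\phi }\) for the gradients and leaves the shape function values untouched; the class is a simplified stand-in, and it takes the inverse Jacobians directly instead of a cell_map_t instance.

```python
# Illustrative sketch, not FEMPAR source: reference data is cloned at
# creation, and only the cell-dependent part is recomputed on update.
class CellIntegrator:
    def __init__(self, ref_values, ref_gradients):
        # ref_values[a][gp] and ref_gradients[a][gp] = (d/dx_hat, d/dy_hat)
        self.ref_values = ref_values
        self.ref_gradients = ref_gradients
        # Clone reference data: Lagrangian values do not depend on the cell,
        # so they are copied here once and never recomputed.
        self.real_values = [row[:] for row in ref_values]
        self.real_gradients = [row[:] for row in ref_gradients]

    def update(self, inv_jacobian):
        """inv_jacobian[gp] is the 2x2 inverse Jacobian at quadrature point gp."""
        for a, row in enumerate(self.ref_gradients):
            for gp, (gx, gy) in enumerate(row):
                Ji = inv_jacobian[gp]
                # grad(phi) = J^{-T} grad_hat(phi_hat)
                self.real_gradients[a][gp] = (Ji[0][0] * gx + Ji[1][0] * gy,
                                              Ji[0][1] * gx + Ji[1][1] * gy)

    def get_values(self):
        return self.real_values

    def get_gradients(self):
        """All shape functions at all quadrature points, in one call."""
        return self.real_gradients


# One shape function phi_hat = x_hat, sampled at one quadrature point, on a
# sheared cell x = x_hat + y_hat, y = y_hat, whose inverse Jacobian is:
integrator = CellIntegrator(ref_values=[[0.0]], ref_gradients=[[(1.0, 0.0)]])
integrator.update(inv_jacobian=[[[1.0, -1.0], [0.0, 1.0]]])
gradients = integrator.get_gradients()
```

In physical coordinates the function is \(x - y\), so the returned gradient is (1, -1); the non-symmetric Jacobian makes the transpose in the formula visible.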
Let us remark some important points with respect to the subroutines overloading the generic bindings of cell_integrator_t. First, we note that the actual argument passed in place of, e.g., the gradients(:,:) dummy argument in Listing 20, is intended to be declared in code written by the user of FEMPAR. Given that FEMPAR can support variable degree FEs on top of different triangulation cells (see Sect. 10), the allocatable attribute of the gradients(:,:) dummy argument not only unburdens the user from the complexity of having to (pre)allocate this array, but even from the one associated with variable degree FEs. For example, if on input the size of gradients(:,:) is not sufficient to hold the data to be provided by the cell_integrator_t instance corresponding to the reference_fe_t on top of the current triangulation cell, then it can be reallocated to the appropriate size. Second, this binding is designed such that all functions are evaluated at all quadrature points within a single call, justifying why the dummy argument has to be a rank-2 allocatable array.^{21} At this point, let us note that all subroutines under consideration ultimately rely on (deferred bindings of) reference_fe_t; see, e.g., line 5 in Listing 20. We recall that reference_fe_t must mediate in any process that requires retrieving data from interpolation_t; see Sect. 8.2.
8.5 Cell Integration User Code Example
At this point of the discussion, we are already in a position to show user code that evaluates the entries of the (current cell) local matrix for the Example 3.1 presented in Sect. 3.1. This code is sketched in Listing 21. It would be bound to a subclass of the discrete_integration_t abstract data type presented in Sect. 11.2 suitable for the Galerkin discretization of the Poisson problem.
The reader may note from Listing 21 that FEMPAR also offers an expression syntax that lets its users code weak forms in a way that resembles their mathematical expression. The user is in charge of explicitly writing the expression of the numerical integration in the reference cell, i.e., of explicitly implementing the quadrature point summation (loop) and handling the determinant of the Jacobian and the quadrature point weighting in (12). However, the evaluation of the shape functions and their gradients, curls, etc., at the quadrature points in the physical space (e.g., expressions (13) and (14)) is completely hidden from the user. This can be achieved using a feature of modern programming languages called operator overloading. (We refer to [67] for a detailed exposition of this mechanism in Fortran2003.) Common (contraction) operations among tensors are provided by means of overloaded intrinsic and library-defined operators. For example, the operator(*) generic interface (corresponding to the * intrinsic operator) has to be overloaded with the single contraction of rank-1 tensors, and the multiplication of a rank-1 tensor by a scalar, to let our code compile. A crucial design requirement in the pursuit of code efficiency is that no dynamic memory allocation/deallocation is involved as the partial evaluation of subexpressions proceeds (in the order dictated by operator associativity and priority rules in Fortran). In order to fulfill this requirement, the data types representing vectors and tensors are declared such that their entries are stored in an array member variable of size known at compilation time. This size is stored in the library-level parameter constant SPACE_DIM, defined as the maximum number of space dimensions of the physical space in which the physical problem is posed. 
By default, FEMPAR is prepared to deal with 3D simulations, but the code is written such that a 2D simulation might also be performed if SPACE_DIM is equal to 3, at the price of extra storage and computation.^{22} Higher dimensional problems could be considered by compiling FEMPAR with a larger value for SPACE_DIM. Apart from avoiding dynamic memory allocation/deallocation during the evaluation of weak forms, this solution has the following advantages: (1) there is no need to explicitly have the number of dimensions as a member variable of the data types representing vectors and tensors; (2) the limits of the loops implementing tensor contraction operations are known at compilation time, enabling compiler optimizations. We finally stress that we preferred this solution over the usage of Fortran2003 parameterized data types [67] due to the lack of support of this feature in some of the most popular compilers widely available on high-end computing environments.
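The expression syntax and the fixed-size storage can be illustrated with a Python analogue. The class VectorField and the function poisson_local_matrix are hypothetical names for this sketch; Python allocates objects dynamically, so the sketch reproduces only the syntax and the fixed SPACE_DIM padding, not the allocation-free evaluation that the Fortran design achieves.

```python
# Illustrative sketch, not FEMPAR source: operator overloading on vectors of
# fixed length SPACE_DIM, used to write a Poisson local matrix loop.
SPACE_DIM = 3


class VectorField:
    """Rank-1 tensor always holding SPACE_DIM entries (zero-padded in 1D/2D)."""
    __slots__ = ("value",)

    def __init__(self, *entries):
        if len(entries) > SPACE_DIM:
            raise ValueError("more entries than SPACE_DIM")
        self.value = list(entries) + [0.0] * (SPACE_DIM - len(entries))

    def __mul__(self, other):
        if isinstance(other, VectorField):
            # Single contraction of two rank-1 tensors.
            return sum(a * b for a, b in zip(self.value, other.value))
        return VectorField(*(a * other for a in self.value))  # vector * scalar

    def __rmul__(self, scalar):
        return VectorField(*(scalar * a for a in self.value))  # scalar * vector


def poisson_local_matrix(gradients, weights, det_jacobian):
    """gradients[a][gp]: physical-space gradient of shape function a."""
    n = len(gradients)
    elmat = [[0.0] * n for _ in range(n)]
    for gp, w in enumerate(weights):
        factor = det_jacobian[gp] * w
        for i in range(n):
            for j in range(n):
                elmat[i][j] += factor * (gradients[j][gp] * gradients[i][gp])
    return elmat


# 1D linear element of length h = 0.5, one quadrature point of weight 2.
grads = [[VectorField(-2.0)], [VectorField(2.0)]]
elmat = poisson_local_matrix(grads, weights=[2.0], det_jacobian=[0.25])
```

The 1D problem runs with SPACE_DIM equal to 3 because the unused entries are zero and do not contribute to the contraction, which mirrors the extra storage and computation mentioned above.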
9 Evaluation of Facet Integrals
This section covers the data types (and their interactions) on which the evaluation of integrals over the facets of the triangulation is grounded. The integration of facet-wise matrices and vectors (see, e.g., (23)) involves the evaluation of shape functions and gradients of the neighbouring cells at the quadrature points within the facet in the physical space and the Jacobian of the facet map at the reference space. As described in Sect. 8, the former quantities are computed at every neighbouring cell from their values at the reference space and the Jacobian of the cell mapping. The evaluation of interior facet integrals also requires the computation of the permutation \(\Pi (\mathrm{gp})\) (see (25)), given that the coordinate systems of the cells surrounding the facet might not be aligned in physical space.
In FEMPAR the assembly process of the global linear system underlying the discrete weak problem (20) involves two loops, over all cells and facets, respectively. In the former loop, a cell-wise matrix \(\mathbf {A}^{K}\) and vector \(\mathbf {f}^{K}\) are computed for each cell. These hold the partial contributions of the cell to the corresponding entries of the global coefficient matrix and right-hand side vector, respectively. The data structures involved in their efficient computation have already been covered in Sect. 8. In the latter loop, and assuming that we are sitting on an interior facet \(F\in \mathcal {F}^{\Omega }_{h}\), four facet-wise matrices, namely \(\mathbf {A}^{F}_{K^+ K^+}\), \(\mathbf {A}^{F}_{K^+ K^-}\), \(\mathbf {A}^{F}_{K^- K^+}\), and \(\mathbf {A}^{F}_{K^- K^-}\) are computed (see Sect. 3.12).
9.1 Numerical Quadrature
The data type quadrature_t is designed to be a placeholder for the facet quadrature points \(\hat{\varvec{x}}_q\) and their associated weights \(\mathrm{w}_{q}\). However, the code that ultimately decides how to distribute \(\hat{\varvec{x}}_q\) over the reference facet \({\hat{F}}\) coordinate system, and set up \(\mathrm{w}_{q}\), is bound to reference_fe_t, in particular through the deferred binding with interface shown in Listing 22. We refer to Sect. 8.1 for the rationale underlying the degree optional dummy argument of this deferred binding.
9.2 Geometrical Mappings
The facet_maps_t data type in Fig. 8 handles the geometrical facet mapping and the two geometrical cell mappings. The facet mapping is represented by facet_map_t, whereas the cell mappings by cell_map_t; see Sects. 9.2.1 and 9.2.2, respectively.
9.2.1 Facet Mapping
As illustrated in Fig. 8, facet_maps_t is composed, among others, of a single instance of type facet_map_t. The member variables (and associated code) that are common to facet_map_t and cell_map_t are factored into a superclass base_map_t (see Listing 17). facet_map_t handles all data related to the facet map \(\varvec{\Phi }_F\), including the facet outward unit normals (see Fig. 8). An extra 2-rank real allocatable array member variable, outward_unit_normals(:,:), stores the facet outward unit normals (with respect to \(K^+\) by convention) evaluated at facet quadrature points in real space, as required by (25); \(\varvec{n}^-(\varvec{x}_\mathrm{gp})\) can be simply obtained as \(\varvec{n}^-(\varvec{x}_{\mathrm{gp}})=-\varvec{n}^+(\varvec{x}_\mathrm{gp})\).
Let us now see how facet_maps_t controls the life cycle of its facet_map_t instance. The create binding of facet_map_t takes a quadrature_t instance with the facet quadrature points. \(\varvec{J}_F(\hat{\varvec{x}}_\mathrm{gp})\) and \(|\varvec{J}_F(\hat{\varvec{x}}_\mathrm{gp})|\) are evaluated at these quadrature points and stored in the jacobian and det_jacobian member variables, which are allocated during a call to this binding together with outward_unit_normals(:,:). Apart from a quadrature_t instance, facet_map_t also requires a description of the discrete, lower-dimensional space of functions on top of the reference facet \({\hat{F}}\) to which \(\varvec{\Phi }_F\) belongs. The ref_fe_geo dummy argument of create, of polymorphic type reference_fe_t, is provided for this purpose; in particular, facet_maps_t sends the reference_fe_t on top of \(K^+\) as an actual argument to the ref_fe_geo dummy argument in order to comply with the above described convention for the normals. The interpolation_t member variable of facet_map_t (see Listing 17) is used by ref_fe_geo to exchange with facet_map_t the shape function values and their derivatives. To this end, reference_fe_t is equipped with the create_facet_interpolation deferred binding (see its signature in Listing 23) that computes these quantities on top of the reference facet \({\hat{F}}\).
The update binding of facet_map_t is intended to be called once per facet loop iteration, i.e., once for each facet of the triangulation. A precondition of this binding is that the nodes_coordinates(:) scratch member array of facet_map_t (see Listing 17) has been loaded with the coordinates in real space of the nodes that lie on the facet.^{23} The update binding takes as input dummy arguments a quadrature_t instance and the real parameter reorientation_factor in order to adjust the sign of the facet normals (see (26)). Within update, quadrature_points_coordinates(:) and jacobian(:,:,:) can be easily computed from the basis shape functions and their first derivatives, respectively. On the other hand, det_jacobian(:) and outward_unit_normals(:,:) can be computed from jacobian(:,:,:), the former as stated in (24) and the latter as in (26).
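For a straight facet (edge) in 2D, the update step can be sketched in Python as follows. Equations (24) and (26) are not reproduced in this section, so the sketch assumes the standard choices: the facet measure is the Euclidean norm of the tangent vector, and the normal is the tangent rotated clockwise, scaled to unit length, and multiplied by reorientation_factor. The class is a simplified stand-in that omits the quadrature_t argument of the actual binding.

```python
# Illustrative sketch, not FEMPAR source: facet map of a linear edge in 2D,
# parameterized over the reference facet [-1, 1].
import math


class FacetMap2D:
    def __init__(self, quadrature_points):
        # "create": linear shape functions and derivatives on the reference facet.
        self.values = [((1.0 - s) / 2.0, (1.0 + s) / 2.0) for s in quadrature_points]
        self.derivatives = [(-0.5, 0.5) for _ in quadrature_points]
        num_points = len(quadrature_points)
        self.nodes_coordinates = [(0.0, 0.0), (0.0, 0.0)]  # scratch, per facet
        self.quadrature_points_coordinates = [None] * num_points
        self.jacobian = [None] * num_points
        self.det_jacobian = [0.0] * num_points
        self.outward_unit_normals = [None] * num_points

    def update(self, reorientation_factor):
        (x0, y0), (x1, y1) = self.nodes_coordinates
        for gp, (N, dN) in enumerate(zip(self.values, self.derivatives)):
            self.quadrature_points_coordinates[gp] = (
                N[0] * x0 + N[1] * x1, N[0] * y0 + N[1] * y1)
            # Facet Jacobian: tangent vector d(Phi_F)/ds.
            tx = dN[0] * x0 + dN[1] * x1
            ty = dN[0] * y0 + dN[1] * y1
            det = math.hypot(tx, ty)  # assumed form of the facet measure
            self.jacobian[gp] = (tx, ty)
            self.det_jacobian[gp] = det
            # Assumed convention: rotate the tangent clockwise, then reorient.
            self.outward_unit_normals[gp] = (reorientation_factor * ty / det,
                                             -reorientation_factor * tx / det)


facet_map = FacetMap2D([0.0])
facet_map.nodes_coordinates = [(1.0, 0.0), (1.0, 2.0)]  # right edge of [0,1]x[0,2]
facet_map.update(reorientation_factor=1.0)
```

With the edge traversed upwards and the cell on its left, the computed normal (1, 0) points out of the cell; a factor of -1 would flip it for the opposite traversal.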
9.2.2 Neighbouring Cells Mappings
The facet_maps_t data type is also composed of two instances of type cell_map_facet_restriction_t; see Fig. 8. These instances handle all data related to \(\varvec{\Phi }_{K^\alpha }\), with \(\alpha \) being either \(+\) or \(-\). Let us thus refer to these instances as cell_map_facet_restriction \(^\alpha \), and to the polymorphic reference_fe_t instances on top of \(K^\alpha \) as ref_fe_geo \(^\alpha \). In turn, each cell_map_facet_restriction \(^\alpha \) is composed of as many cell_map_t instances as facets in \(K^\alpha \). Since an actual facet \(F\) can potentially have any local identifier \(F^\alpha \) in \(K^\alpha \) within the range \(F^\alpha =1,\ldots ,\text {num}\_\text {facets}(K^\alpha )\), having as many cell_map_t instances as facets per surrounding cell lets us (pre)calculate and hold within these instances the result of evaluating the \({\hat{K}}^\alpha \) shape functions and their derivatives at the facet quadrature points for all facets in the reference system. To this end, the create binding of cell_map_facet_restriction \(^\alpha \) is invoked (from the one corresponding to facet_maps_t) with the facet quadrature \(\texttt {q}\) and ref_fe_geo \(^\alpha \) as input actual arguments. It then walks over all possible local facet identifiers in the corresponding cell and, for each local facet identifier, invokes a specialized version of the create binding of the corresponding cell_map_t instance, named create_restricted_to_facet (which additionally requires the local facet identifier); see Fig. 8. The reference_fe_t is ultimately responsible for exchanging this sort of data with cell_map_t. This service is in particular provided by the create_interpolation_restricted_to_facet deferred binding of reference_fe_t, with signature defined in Listing 24.
As seen so far, the create binding of facet_maps_t is designed to be called right before the actual loop over all triangulation facets, and it sets up all the scratch data. It does so by covering all possible scenarios corresponding to potential values of local facet identifiers within the two surrounding cells (even if some of these scenarios are not actually exposed in the triangulation). The update binding of facet_maps_t, however, is intended to be called sitting on a particular facet \(F\) of the triangulation, and it only has to update those two cell_map_t instances within cell_map_facet_restriction \(^\alpha \) corresponding to the particular scenario at hand, i.e., to the particular combination of local facet identifiers \(F^+\) and \(F^-\) of the facet on which it is being updated. To this end, the update binding of facet_maps_t receives these local identifiers in facet_lids (see Fig. 8) and then calls the update binding of cell_map_facet_restriction \(^+\) and cell_map_facet_restriction \(^-\) with facet_lid=facet_lids(1) and facet_lid=facet_lids(2), respectively. The update binding of cell_map_facet_restriction_t picks up the cell_map_t corresponding to facet_lid and invokes the update binding of the latter. We stress that no specialized version of this binding is required in the context of facet integration, i.e., the same version discussed in Sect. 8.4 for cell integration can be reused here.^{24} During the update process, cell_map_facet_restriction_t also registers in its current_facet_lid private member variable the value supplied to the facet_lid dummy argument. This lets facet_maps_t later extract the updated cell_map_t instances from cell_map_facet_restriction \(^\alpha \); see the discussion of facet_integrator_t in the sequel.
9.3 Evaluation of Shape Functions in the Physical Space
The last data type that remains to be covered is facet_integrator_t; see Fig. 8. This data type is the counterpart of cell_integrator_t (see Sect. 8.4) for the case of facet integrals. In particular, it stores and updates shape function values and derivatives, and provides the values, gradients, curls, and divergences of the respective fields for both \(K^+\) and \(K^-\) evaluated at facet quadrature points in real space. As can be observed from Fig. 8, its overall design is very close to that of facet_maps_t, with cell_integrator_facet_restriction_t and the cell_integrator_t instances it is composed of playing the role of their counterparts in the scope of facet_maps_t (i.e., cell_map_facet_restriction_t and cell_map_t, respectively). There are, however, two major differences between the two. First, facet_integrator_t deals with (e.g., it is created from) the two polymorphic reference_fe_t instances (see ref_fe \(^\alpha \) dummy arguments of its create binding in Fig. 8) on which the global FE spaces of functions \(\mathcal {X}_h\), \(\mathcal {Y}_h\) are grounded. For example, the create binding of cell_integrator_facet_restriction \(^+\) invokes the create_restricted_to_facet binding of the cell_integrator_t for all facets \(F^+\) within \(K^+\). The latter computes at a given facet \(\hat{\phi }^{a}_{K^+}(\hat{\varvec{x}}^{+}_\mathrm{gp})\), \({\varvec{\nabla }}\hat{\phi }^{a}_{K^+}(\hat{\varvec{x}}^{+}_\mathrm{gp})\) through the deferred binding create_interpolation...to_facet of reference_fe_t presented in Listing 24. Second, facet_integrator_t has to unburden the user from the complexity underlying the fact that the coordinate systems of \(K^+\) and \(K^-\) might not be aligned in real space. To this end, it is equipped with a private lookup permutation table, called qpoints_perm(:,:) in Fig. 8, that lets it translate facet quadrature point identifiers from the local numbering space of \(K^+\) into that of \(K^-\).
This table is allocated and filled during the create binding of facet_integrator_t, in particular by reference_fe_t through a deferred binding called fill_qpoints_permutations. Given the facet quadrature identifier \(\texttt {gp}\) and the facet permutation index \(\texttt {pi}\) (see Sect. 3.16), qpoints_perm(gp,pi) stores the value of \(\Pi (\mathrm{gp})\) (see (25)). The permutation index is stored within the current_permutation_index member variable of facet_integrator_t, extracted from the permutation_index dummy argument of the update binding. In turn, this parameter is extracted from the array facet_permutation_indices(:) of fe_space_t in Listing 27 (see Sect. 10). We note that for n-simplices, we consider a renumbering such that all facets have the same orientation on both cells that share them, as commented in Sect. 3.16. In this case, fill_qpoints_permutations fills the table with the identity permutation in all columns. We note that the reorientation of the n-simplices can lead to mappings \(\varvec{\Phi }_K\) such that \(|\varvec{J}_K| < 0\), but this is not a problem as long as one takes its absolute value, e.g., in (12).
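The role of the permutation table can be illustrated with a minimal Fortran sketch. It assumes the simplest nontrivial case, in which the facet is an edge shared by two cells and only two relative orientations exist: permutation index 1 for aligned cells (identity) and permutation index 2 for opposite orientation (reversal of the quadrature points). This index convention, as well as the routine and variable names, are assumptions made for illustration; they are not taken from FEMPAR.

```fortran
! Sketch only: hypothetical permutation table for an edge facet.
module qpoints_perm_sketch
  implicit none
contains
  ! Fill qpoints_perm(gp, pi) for an edge with num_qpoints quadrature points.
  subroutine fill_qpoints_permutations_edge(num_qpoints, qpoints_perm)
    integer, intent(in) :: num_qpoints
    integer, allocatable, intent(out) :: qpoints_perm(:, :)
    integer :: gp
    allocate(qpoints_perm(num_qpoints, 2))
    do gp = 1, num_qpoints
      qpoints_perm(gp, 1) = gp                     ! aligned: identity
      qpoints_perm(gp, 2) = num_qpoints + 1 - gp   ! opposite: reversal
    end do
  end subroutine fill_qpoints_permutations_edge
end module qpoints_perm_sketch

program qpoints_perm_demo
  use qpoints_perm_sketch
  implicit none
  integer, allocatable :: qpoints_perm(:, :)
  real :: values_minus(3), permuted(3)
  integer :: gp, perm_index
  call fill_qpoints_permutations_edge(3, qpoints_perm)
  ! Values of some function at the quadrature points, in the numbering of K^-
  values_minus = [10.0, 20.0, 30.0]
  perm_index = 2   ! the two cells see the edge with opposite orientation
  do gp = 1, 3
    ! Point gp in the numbering of K^+ is point qpoints_perm(gp, perm_index)
    ! in the numbering of K^-
    permuted(gp) = values_minus(qpoints_perm(gp, perm_index))
  end do
  if (any(permuted /= [30.0, 20.0, 10.0])) error stop 'unexpected permutation'
  print *, permuted
end program qpoints_perm_demo
```

Storing one column per permutation index means that the update step only has to record the current index; no table entries are recomputed inside the facet loop.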
9.4 Facet Integration User Code Example
In order to grasp how the data structures covered so far are actually used together in practice, the Fortran pseudocode snippet in Listing 25 shows user-space code in charge of evaluating the first integral in (22) for each interior facet in a loop over all facets. It would be bound to a subclass of the discrete_integration_t abstract data type presented in Sect. 11.2 suitable for the nonconforming DG discretization of the Poisson problem.
Two remarks about Listing 25 are worth noting. First, the call to the get_values() binding of facet_integrator_t in Line 14 already returns the permuted \(K^-\) shape function values, i.e., shape_values_\(K^-\)(b,gp) actually stores \(\phi ^{b}_{K^-}(\varvec{x}^{-}_{\Pi (\mathrm{gp})})\). Second, it is the so-called fe_space_t abstraction (to be covered in Sect. 10) that is in charge of creating the facet integration data structures on loop initialization and of updating them at each facet loop iteration (see Line 9). Therefore, the user does not deal directly with all the data types, bindings, and interactions illustrated in Fig. 8. This example shows that facet-loop-based integration is very convenient for the implementation of DG methods, since the code closely resembles the blackboard expressions (see, e.g., (20)).
9.5 Change-of-Basis Implementation in a reference_fe_t Subclass
In this section, we provide a detailed presentation of how the change-of-basis required to compute the shape functions basis is implemented in a reference_fe_t subclass. In particular, we show the implementation for the Raviart-Thomas div-conforming FE on n-cubes in Sect. 3.5 (see also Sect. 3.9 for details). The prebasis, e.g., \({\mathcal {Q}}_{(k+1,k,k)} \times {\mathcal {Q}}_{(k,k+1,k)} \times {\mathcal {Q}}_{(k,k,k+1)}\) in 3D, has to be generated before this subroutine is called; see, e.g., the evaluation of the prebasis in Line 31 of Listing 26.
We also present how to compute the boundary moments in (16) in Listing 26; interior moments are simpler and omitted for the sake of brevity. The implementation of the boundary moments requires: (1) the creation of the reference_fe_t that implements \([ {\mathcal {Q}}_{k \varvec{1}} ]^{d-1}\) in Line 16, (2) a facet quadrature on the reference facet in Line 24, and (3) the evaluation of the reference FE at the quadrature points, stored in the interpolation_t in Line 25. We also require a Lagrangian (first order) FE that represents the geometry in Line 20. Next, we loop over all the facets of the cell and compute the values of the shape functions of the cell at the facet quadrature points, stored in the interpolation_t instance in Line 31. With all these ingredients, we can compute the boundary moments for the prebasis functions (see Line 43) and assemble them in the change-of-basis matrix. After doing the same for interior moments, we just need to invert the change-of-basis matrix in Line 54. At this point, we have the shape functions basis as a linear combination of prebasis functions. Thus, when one calls the fill_interpolation binding of the corresponding reference FE, it creates the prebasis interpolation_t instance and next applies the change-of-basis matrix to compute the one for the shape functions basis, i.e., the placeholder where the evaluation of the shape functions and their derivatives (at the set of quadrature points for which the interpolation has been created) is stored. We note that the ownership of DOFs also changes in this process. The boundary moments (integrals of functions on facets) belong to the corresponding facet, whereas interior moments belong to the cell. Vertices and edges do not have DOFs in this case. The definition of the ownership is skipped for brevity.
10 Integration and Global DOF Handling: The fe_space_t Abstraction
In this section, we introduce a software abstraction, referred to as fe_space_t, which represents (in the most general scenario) the mathematical concept of a global FE space \(\mathcal {X}_h = \mathcal {X}^1_h \times \ldots \times \mathcal {X}^n_h\) obtained by means of the Cartesian product of global FE spaces \(\mathcal {X}^{i}_h\) corresponding to each of the \(i=1,\ldots ,n_\mathrm{field}\) field unknowns involved in a system of PDEs; see Sects. 3.6 and 3.11. Each \(\mathcal {X}^{i}_h\) is described as a combination of: (1) an approximation \(\Omega _h\) of the physical domain \(\Omega \) provided by triangulation_t, i.e., a mesh-like container for the cells into which \(\Omega _h\) is partitioned, their boundary lower-dimensional objects, and their adjacency relationships; see Sect. 7; (2) a description of the \(n_\mathrm{field}\) reference FEs associated to each triangulation cell grounded on reference_fe_t; see Sect. 6.
These two basic building blocks equip fe_space_t with the tools required to provide the following two crucial services.^{25} On the one hand, it is in charge of handling (i.e., generating, storing, fetching) a global enumeration of the DOFs corresponding to each \(\mathcal {X}^{i}_h\) taking into account the notion of conformity; see e.g., Sects. 3.6 and 6.2. On the other hand, it handles the data structures that are required to evaluate integrals over cells and facets (see Sects. 8 and 9, respectively). In particular, it judiciously sets them up and orchestrates their respective life cycles and interactions, while unburdening the user (to a large extent) from the complexity inherent (among others) to high-order FEs.
The OO design of fe_space_t (as the one of many other data types in FEMPAR, e.g., triangulation_t) strongly strives to preserve encapsulation and data hiding while still storing and accessing data efficiently (i.e., in a way that leverages data locality for the efficient exploitation of modern computer memory architectures). The user-friendly view of fe_space_t is implicitly (re)constructed by the data types (associated interfaces and interactions) that will be covered in Sect. 10.2. We now move on to the approach that we follow for the internals of fe_space_t.
10.1 The Internal Organization of fe_space_t
In this section, we sketch how the internals of fe_space_t are organized in order to efficiently deliver the two services outlined above. For simplicity, we restrict ourselves to a simplified version of fe_space_t that, to a large extent, captures the spirit of its actual counterpart in FEMPAR. The declaration of this simplified data type is shown in Listing 27.^{26}
A collection of reference_fe_t polymorphic instances is stored in the reference_fes(:) array. These instances are uniquely identified (within the local scope of fe_space_t) by their position in this array. The global FE space corresponding to a given field, with identifier f_id in the range \(1,\ldots ,{\texttt {num\_fields}}\) (with num_fields equal to \(n_\mathrm{field}\) above), is described by: (1) the triangulation member variable (the rationale underlying it being polymorphic is made clear in Sect. 10.2); (2) its restriction to each cell provided by the reference FE space defined by the reference_fe_t instance with identifier field_cell_to_ref_fes(f_id,c_id) in the collection; c_id is assumed to be a positive integer in \(1,\ldots ,\texttt {triangulation\%get\_num\_cells()}\) that uniquely identifies each cell.
The member variables used to handle the global DOF numbering are encompassed within Lines 18–27 of Listing 27. The global DOF identifiers are stored cell-wise, and field-wise within each cell, in the lst_dofs_gids(:) array, which is in turn (indirectly) addressed by the ptr_dofs_x_fe(:,:) array. In particular, the ones assigned to the local nodes related to field f_id on cell c_id start and end in position ptr_dofs_x_fe(f_id,c_id) and ptr_dofs_x_fe(f_id+1,c_id)-1 of lst_dofs_gids(:), respectively, if \(\texttt {f\_id} < \texttt {num\_fields}\), and in position ptr_dofs_x_fe(f_id,c_id) and ptr_dofs_x_fe(1,c_id+1)-1, respectively, if \(\texttt {f\_id} = \texttt {num\_fields}\). The number of DOFs of the global FE space corresponding to each field (excluding those that are subject to strong boundary conditions) is stored in the num_dofs_x_field(:) array.
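The addressing scheme can be made concrete with a small Fortran sketch for two cells and two fields. The end position of a range is taken to be one less than the start position of the next range, following the convention described above. The sketch assumes that ptr_dofs_x_fe(:,:) carries one extra trailing column acting as a sentinel, so that the last field of the last cell can be addressed in the same way as any other; this storage detail and the sample data are assumptions, not facts about FEMPAR.

```fortran
! Sketch only: hypothetical data for 2 cells and 2 fields.
program dof_addressing_sketch
  implicit none
  integer, parameter :: num_fields = 2, num_cells = 2
  integer :: ptr_dofs_x_fe(num_fields, num_cells + 1)
  integer :: lst_dofs_gids(10)
  integer :: f_id, c_id, first, last

  ! Each cell holds 3 DOFs for field 1 followed by 2 DOFs for field 2
  ptr_dofs_x_fe(:, 1) = [1, 4]
  ptr_dofs_x_fe(:, 2) = [6, 9]
  ptr_dofs_x_fe(:, 3) = [11, 11]   ! sentinel column (assumption)
  lst_dofs_gids = [1, 2, 3, 7, 8, 2, 3, 4, 8, 9]

  do c_id = 1, num_cells
    do f_id = 1, num_fields
      first = ptr_dofs_x_fe(f_id, c_id)
      if (f_id < num_fields) then
        last = ptr_dofs_x_fe(f_id + 1, c_id) - 1
      else
        last = ptr_dofs_x_fe(1, c_id + 1) - 1
      end if
      print '(a,i0,a,i0,a,*(1x,i0))', 'cell ', c_id, ', field ', f_id, &
            ':', lst_dofs_gids(first:last)
    end do
  end do
end program dof_addressing_sketch
```

With this layout, all identifiers of a cell are contiguous in memory, which is what allows a cell iterator to expose them without copying.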
The member variable in Line 15 stores a reference to a data type that describes the block layout currently selected (i.e., it can be changed on demand) for the global matrix and right-hand side vector of the linear system (or a sequence of them) required for the solution of the PDE system at hand. The role of block_layout_t in the global DOF numbering generation process will be illustrated in Sect. 10.3.
The data structures that let fe_space_t handle the evaluation of cell integrals are declared in Lines 23–29 of Listing 27. The set_up_cell_integration binding sets them up. The method is intended to be called by the user's program right before any cell integration loop. It ensures that any (scratch) data that can be computed in its final form in the reference cell is precomputed for all triangulation cells, while minimizing the number of integration data structures required for the particular scenario at hand. To this end, fe_space_t is equipped with three array containers of quadrature_t, cell_map_t and cell_integrator_t objects (see Lines 24, 26, and 28, respectively), which are indirectly addressed by the hash_table_t member variables with corresponding names.^{27} This is required because fe_space_t supports, e.g., nonconforming FE spaces with variable order per cell. A unique identifier (dynamically generated within the scope of fe_space_t) is assigned to each of the integration objects that must be created. The hash_table_t instances let fe_space_t transform these unique identifiers into container array positions from which the integration objects can be fetched.
The set_up_cell_integration method loops over all cells. Sitting on a cell, it determines an appropriate quadrature to be used on that cell and its associated unique identifier. (See discussion in the next paragraph for more details.) If this quadrature has not been generated yet (i.e., if the hash table lookup fails), then a new quadrature is created on the next free position of the cells_quadratures(:) array container, and a new identifier-position pair is inserted into the hash table. Otherwise, the quadrature is fetched from this array. The same process is repeated for the cell_map_t and cell_integrator_t instances. The former are uniquely determined by the combination of the unique identifier of the quadrature_t just created/fetched and that of the reference_fe_t instance on top of the current cell (see Sect. 7). On the other hand, a cell_integrator_t instance has to be associated to each field within the current cell; the cell_integrator_t instance corresponding to a field is uniquely determined by the unique identifier of the quadrature_t just created/fetched and the one of the reference_fe_t associated to that field (see Sect. 8.4). Therefore, the unique identifiers of the cell_map_t and cell_integrator_t instances required for the evaluation of cell integrals over the current cell can be easily determined by combining the ones corresponding to the instances from which they are created. We recall that the unique identifier of the reference_fe_t instance on top of the current cell, c_id, for a given field, f_id, can be retrieved from reference_fe_id=field_cell_to_ref_fes(f_id,c_id), while the reference_fe_t instance itself can be retrieved from reference_fes(reference_fe_id).
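The create-or-fetch logic can be sketched in a few lines of Fortran. In this sketch, a plain array of keys searched linearly stands in for hash_table_t, and the unique identifier of a quadrature is built from the reference FE identifier and the quadrature degree with a hypothetical formula. The type id_to_position_t, its get_or_insert binding, and the constants are all assumptions introduced for illustration.

```fortran
! Sketch only: a linear-search map stands in for hash_table_t.
module integration_cache_sketch
  implicit none
  integer, parameter :: max_objects = 16
  type :: id_to_position_t
    integer :: num_entries = 0
    integer :: keys(max_objects) = 0
  contains
    procedure :: get_or_insert
  end type id_to_position_t
contains
  ! Return the container position for key, inserting the key if absent.
  subroutine get_or_insert(this, key, position, is_new)
    class(id_to_position_t), intent(inout) :: this
    integer, intent(in)  :: key
    integer, intent(out) :: position
    logical, intent(out) :: is_new
    integer :: i
    is_new = .false.
    do i = 1, this%num_entries
      if (this%keys(i) == key) then
        position = i   ! lookup succeeded: reuse the existing object
        return
      end if
    end do
    if (this%num_entries == max_objects) error stop 'container full'
    this%num_entries = this%num_entries + 1
    this%keys(this%num_entries) = key
    position = this%num_entries   ! next free position in the container
    is_new = .true.
  end subroutine get_or_insert
end module integration_cache_sketch

program integration_cache_demo
  use integration_cache_sketch
  implicit none
  integer, parameter :: max_ref_fes = 100   ! hypothetical bound on FE ids
  type(id_to_position_t) :: quadratures_map
  integer :: ref_fe_id(4), degree(4), cell_to_quadrature(4)
  integer :: c_id, key
  logical :: is_new
  ref_fe_id = [1, 1, 2, 1]   ! reference FE identifier on each cell
  degree    = [2, 2, 4, 2]   ! quadrature degree chosen on each cell
  do c_id = 1, 4
    key = ref_fe_id(c_id) + max_ref_fes * degree(c_id)
    call quadratures_map%get_or_insert(key, cell_to_quadrature(c_id), is_new)
    ! A real implementation would create the quadrature here when is_new
  end do
  if (quadratures_map%num_entries /= 2) error stop 'expected two quadratures'
  print *, cell_to_quadrature
end program integration_cache_demo
```

Four cells end up sharing two quadratures, which is the effect sought: the number of integration objects scales with the number of distinct scenarios, not with the number of cells.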
The allocatable array member variable in Line 23 (with as many entries as triangulation cells) can be used by the user in order to (optionally) determine the degree of the quadrature to be used on each triangulation cell. This member variable is allocated and initialized (during fe_space_t creation) to a reserved flag that instructs set_up_cell_integration to use an automatic (default) strategy to decide the degree of the quadrature to be used on each cell. This default strategy relies on a deferred binding of reference_fe_t, named get_default_quadrature_degree, which typically returns the quadrature degree for which mass matrix terms are integrated exactly (see Sect. 8.1).^{28} The strategy, in particular, walks over all reference_fe_t instances on top of the cell, and the one whose (polynomial) reference cell functional space is of maximum order becomes ultimately responsible for creating the quadrature via an invocation of its create_quadrature deferred binding. Alternatively, the user may explicitly select the quadrature degree to be used on each cell. In such a case, create_quadrature is invoked to create a quadrature with the degree given by the corresponding entry in the cell_quadratures_degree(:) array; see Sect. 8.1. In any case (i.e., default or explicit quadrature degree), both the unique identifier of the reference_fe_t instance on top of the current cell and the quadrature degree are used to generate a unique identifier of the quadrature to be created/fetched.
On the other hand, Lines 32–38 of Listing 27 encompass those data structures required for the evaluation of (both boundary and interior) facet integrals; see Sect. 9. A very close rationale to the one underlying their cell counterparts is followed to set up these data structures. The set_up_facet_integration binding loops over all facets. Sitting on a facet, it determines an appropriate facet quadrature_t rule. The quadrature degree is either the default or a user-defined one (via the allocatable array member variable in Line 32). It also determines the unique identifier of the quadrature and of the rest of the facet-integration data structures, which are created as necessary, while handling their interactions. Both the topology of the two cells sharing the facet and the quadrature degree are used to generate a unique identifier of facet quadratures. The member variables in Lines 41–42 provide support for the implementation of the so-called fe_facet_iterator_t data type and will be covered in detail in Sect. 10.2. Finally, the member variable num_fixed_dofs in Listing 27 is used by fe_space_t to count how many DOFs are subject to strong boundary conditions; see Sect. 10.4.
10.2 A Conceptual View of fe_space_t
Following the ideas presented in Sect. 7.1, fe_space_t offers a number of iterators to provide traversals over its objects, and uniform data access to its internals. Apart from iterators over cells and vefs, fe_space_t also provides traversals over facets by means of the so-called fe_facet_iterator_t data type. This iterator is essentially required to implement the evaluation of jump terms in, e.g., error estimators or DG methods in a user-friendly manner. For reasons made clear in the course of this section, a design goal to be fulfilled by fe_space_t iterators is that they are able to provide access to the same data as their counterpart triangulation_t iterators (see Sect. 7.1), and that they are able to do so efficiently while avoiding duplication of code bound to the latter. For example, fe_cell_iterator_t should be designed such that it is also able to provide the coordinates (in physical space) of the nodes describing the geometry of the cell, apart from the global DOF identifiers on top of it.
Let us first discuss the design of iterators over cells and vefs (as the one of both follows the same lines). These data types are defined in Listing 28, where set must be actually replaced by either cell or vef.
As shown in Listing 28, fe_set_iterator_t holds a polymorphic pointer to the fe_space_t instance to which it has to provide data access. Dynamic polymorphism is exploited here with extensibility and code reuse in mind. Any type extension of fe_space_t (e.g., the one suitable for distributed-memory environments) can also become the target of this polymorphic pointer, thus enabling reuse of data and code bound to fe_set_iterator_t with these extensions. Of special relevance in Listing 28 is the composition relationship between the data type being defined and set_iterator_t, i.e., its triangulation_t iterator counterpart (see Sect. 7.1). This lets fe_set_iterator_t fulfill the aforementioned design goal, i.e., to provide a superset of data over the class it is composed of, while still being able to access any data stored within the triangulation scope. fe_set_iterator_t also reuses from set_iterator_t the code underlying the sequential traversal over all objects of the set. Indeed, like many other TBPs of fe_set_iterator_t, the init, next, and has_finished TBPs are simply implemented as wrappers of their counterparts in set_iterator_t. (We remark that this is possible provided that fe_space_t is deliberately set up such that it shares with triangulation_t a consistent global numbering for cells and lower-dimensional objects.)
At this point it is important to remark that the set_iterator_t instance that fe_set_iterator_t aggregates is also polymorphic (see Line 3 in Listing 28). As stated in Sect. 10.1 (in particular, see Line 12 of Listing 27), a fe_space_t instance is created from a polymorphic triangulation_t instance. The create binding of fe_set_iterator_t extracts the latter from fe_space_t, and then calls its create_cell_iterator binding (see Sect. 7.1), which becomes ultimately in charge of determining the dynamic type of the set_iterator_t member variable of fe_set_iterator_t (apart from leaving the iterator positioned in the first object of the set). This lets fe_space_t (and its associated iterators) be reused with any type extension of triangulation_t (e.g., the one suitable for distributed-memory computers and/or h-adaptivity). Likewise, the free binding of fe_set_iterator_t relies on the free_cell_iterator binding of triangulation_t in order to safely deallocate any dynamic memory allocated during creation. We stress that, as in the case of triangulation_t iterators, the create and free TBPs are not intended to be directly called by the user. Instead, fe_space_t provides a set of (public) TBPs (as many as different iterators) for this purpose. For example, the expression call fe_space%create_fe_cell_iterator(fe_cell_iterator) creates an iterator on the polymorphic fe_cell_iterator client-space instance, while call fe_space%free_fe_cell_iterator(fe...) is in charge of safely deallocating this polymorphic instance.
The implementation of fe_facet_iterator_t follows a very close rationale to the one of cell and vef iterators, with subtle differences though; see Listing 29. Given that fe_facet_iterator_t is a kind of fe_vef_iterator_t, it should provide the same set of data access methods as the latter (e.g., the cells sharing the facet). However, it should restrict the traversal to those vefs that are actually facets, and be able to provide all data required for the implementation of jump terms over facets. As shown in Listing 29, fe_facet_iterator_t extends fe_vef_iterator_t. This automatically equips the former with the data access methods of the latter. On the other hand, it overrides those methods controlling the sequential traversal over the items in the set such that it is restricted to facets, i.e., create/free/first/next/has_finished in Listing 29. The implementation of these methods relies on its member variable facet_gid, and the facet_gids(:) member variable of fe_space_t; see Line 41 of Listing 27. For a given facet with global identifier \(\texttt {facet\_gid}\), facet_gids(facet_gid) holds the global vef identifier corresponding to the facet.
The actual set of TBPs of a fe_space_t iterator highly depends on the type of object being pointed to. For completeness, we now briefly discuss those TBPs in the set corresponding to cell and facet iterators, which provide support for the implementation of the two services of fe_space_t we are focusing on. These are in particular shown in Listing 30. This listing also includes the generic TBPs in Lines 35 and 68, although they will be discussed in Sect. 11.1.
The TBPs in Lines 18–28 and 50–61 of Listing 30 let the user fetch from fe_space_t the integration data associated to the current cell and facet being pointed to, respectively. On the other hand, the update_integration bindings in Lines 6 and 47 perform those computations required to update these data structures such that they hold shape function values and derivatives evaluated at (current) cell and facet (quadrature points) in the physical space. The former binding is implemented as shown in Listing 31. Finally, the get_permutation_index TBP of fe_facet_iterator_t lets the caller obtain the permutation index (see Sects. 3.16 and 9.3 for further details). The implementation of this method relies on the facet_permutation_indices(:) member variable of fe_space_t; see Line 42 of Listing 27. For a given facet with global identifier \(\texttt {facet\_gid}\), facet_permutation_indices(facet_gid) holds the permutation index corresponding to the facet. We have decided to permanently store facet permutation indices for performance reasons. These can be reused over and over again (e.g., in a transient and/or nonlinear PDE problem) without the overhead associated to their computation on each traversal over the facets of the triangulation.
An update of the cell_map_t instance (associated to the cell pointed to by the fe_cell_iterator_t instance on which this subroutine is invoked) is performed in Line 12 of Listing 31. It is followed by a loop over the number of fields of the PDE system at hand in order to update the cell_integrator_t for every field in Line 17. The update of the former requires that its nodes_coordinates(:) scratch member variable has been loaded with the coordinates in the physical space of the nodes describing the geometry of the cell at hand (see Sect. 8.3). This is in particular fulfilled in Line 10. The coordinates fetched by this call are actually stored within the triangulation. However, fe_cell_iterator_t can satisfy this query because it is composed of a cell_iterator_t instance; see Listing 28 and accompanying discussion. At this point, the reader should already be able to grasp how the fe_facet_iterator_t counterpart of this subroutine is implemented, so it is omitted here in order to keep the presentation short.
Going back to Listing 30, the binding in Line 16 lets the user fetch the field-wise global DOF identifiers that fe_space_t has associated to the node functionals on the current cell interior and its vefs. (The bindings in Lines 9–13 of Listing 30, however, assist fe_space_t in the generation of the global DOF numbering, and their usage will be illustrated in Sect. 10.3.) This binding is implemented in Listing 32.
In Listing 32, p_1D_ip_array_t is assumed to be a data type with a single member variable, called p, declared as a pointer to a rank-1 integer(ip) array. For each field, the subroutine locates the region within the lst_dofs_gids(:) member variable corresponding to that field within the current cell, and then it associates to it the corresponding pointer in fe_dofs(:). At the expense of sacrificing type safety (in Fortran there is no mechanism to declare a pointer to be read-only), we avoid the costly reallocation of user-level allocatable arrays that would be needed in the case of nonconforming FE spaces with highly varying degree polynomial spaces among cells.
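The pointer-association idea, and the loss of write protection it implies, can be sketched as follows. The sketch is restricted to a single cell with two fields and uses a one-dimensional array of start positions, named ptr here, in place of ptr_dofs_x_fe(:,:). This simplification, the kind parameter ip, and the sample data are assumptions made to keep the example short.

```fortran
! Sketch only: one cell, two fields, hypothetical data.
program fe_dofs_pointer_sketch
  implicit none
  integer, parameter :: ip = kind(1)
  type :: p_1D_ip_array_t
    integer(ip), pointer :: p(:) => null()
  end type p_1D_ip_array_t
  integer, parameter :: num_fields = 2
  integer(ip), target :: lst_dofs_gids(5)
  integer(ip) :: ptr(num_fields + 1)
  type(p_1D_ip_array_t) :: fe_dofs(num_fields)
  integer :: f_id

  lst_dofs_gids = [1, 2, 3, 7, 8]
  ptr = [1, 4, 6]   ! field 1 owns positions 1:3, field 2 owns 4:5

  do f_id = 1, num_fields
    ! No copy is made: each pointer is associated with a section of the list
    fe_dofs(f_id)%p => lst_dofs_gids(ptr(f_id):ptr(f_id + 1) - 1)
  end do
  if (size(fe_dofs(1)%p) /= 3 .or. size(fe_dofs(2)%p) /= 2) error stop 'bad sizes'

  ! Nothing prevents client code from writing through the pointer
  fe_dofs(2)%p(1) = 70
  if (lst_dofs_gids(4) /= 70) error stop 'pointer does not alias the list'
  print *, lst_dofs_gids
end program fe_dofs_pointer_sketch
```

The final assignment shows the trade-off mentioned above: the caller obtains the identifiers at no allocation cost, but it can also overwrite the internal numbering.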
Finally, the get_vef binding in Listing 30 sets up a fe_vef_iterator_t instance to point to the corresponding vef within the cell. As a consequence, one may navigate over the cells, their vefs, the cells around these vefs, etc., using fe_space_t iterators throughout.
10.3 Global DOF Numbering Generation
In this section, we discuss how fe_space_t coordinates the building blocks covered so far in order to generate a global enumeration of the DOFs describing the global FE space \(\mathcal {X}_h \, \doteq \, \mathcal {X}^1_h \times \ldots \times \mathcal {X}^n_h\) for general multifield systems of PDEs. This process is encompassed within the generate_global_dof_numbering binding of fe_space_t (see Listing 27). The code of this method is shown in Listing 33. The block_layout dummy argument lets the caller customize the global DOF numbering to be generated.^{29} On the one hand, this data type specifies into how many blocks the user wants to split the (discrete) PDE system at hand. In particular, the user may select to generate a DOF numbering suitable for monolithic or blocked storage linear algebra data structures, with block_layout%get_num_blocks() returning one and a number larger than one, respectively. On the other hand, block_layout_t specifies the mapping of fields into blocks, with block_layout%get_block_id(field_id) returning the identifier of the block to which the field with identifier field_id is mapped. Since blocked linear algebra data structures in FEMPAR are addressed using row/column identifiers that are local to each block, block_layout equips the subroutine with the input necessary to generate a block-aware global DOF numbering, in which the DOFs belonging to fields of the first block are numbered first, followed by the ones of the second, and so on. We note that block_layout_t also stores the number of DOFs per block (see Sect. 11.3). These latter quantities are computed within generate_global_dof_numbering (see discussion in the sequel).
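A minimal Fortran sketch of the bookkeeping implied by a block layout is given next. It assumes three fields mapped to two blocks and, as one possible choice that is not confirmed by the text, that the DOFs of each field are numbered contiguously within its block. The arrays field_to_block and first_gid_x_field are hypothetical stand-ins for the information held by block_layout_t and fe_space_t.

```fortran
! Sketch only: hypothetical field-to-block mapping and DOF counts.
program block_layout_sketch
  implicit none
  integer, parameter :: num_fields = 3, num_blocks = 2
  integer :: field_to_block(num_fields), num_dofs_x_field(num_fields)
  integer :: num_dofs_x_block(num_blocks), first_gid_x_field(num_fields)
  integer :: f_id, b_id

  field_to_block   = [1, 1, 2]     ! e.g., two fields in block 1, one in block 2
  num_dofs_x_field = [10, 10, 4]

  num_dofs_x_block = 0
  do f_id = 1, num_fields
    b_id = field_to_block(f_id)
    ! Identifiers are local to the block: each block starts again from 1
    first_gid_x_field(f_id) = num_dofs_x_block(b_id) + 1
    num_dofs_x_block(b_id) = num_dofs_x_block(b_id) + num_dofs_x_field(f_id)
  end do

  if (any(num_dofs_x_block /= [20, 4])) error stop 'unexpected block sizes'
  if (any(first_gid_x_field /= [1, 11, 1])) error stop 'unexpected offsets'
  print *, 'DOFs per block:       ', num_dofs_x_block
  print *, 'first DOF id per field:', first_gid_x_field
end program block_layout_sketch
```

Setting field_to_block to all ones recovers the monolithic case, in which a single block holds every DOF.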
The subroutine in Listing 33 starts by checking whether it actually has to generate a global DOF numbering. It has to do so if there is no global DOF numbering available yet (see predicate in Line 9), or if the one available is not suitable for the input block_layout (see predicate in Line 10). The bulk of generate_global_dof_numbering is concentrated in the private helper TBPs of fe_space_t called fe_space_count_dofs and fe_space_list_dofs; see Lines 14 and 15 of Listing 33, respectively. The code of these bindings is shown in Listings 34 and 35, respectively. While the former computes the number of DOFs per field and block, the latter is in charge of the actual generation of the global DOF identifiers.
Lines 6–31 of Listing 34 compute the number of DOFs per field, while Lines 34–38 compute the number of DOFs per block. The latter lines just determine the number of DOFs per block by accumulating those corresponding to the fields mapped to the block (computed in the former lines). The former lines are grounded on the notion of owner cell of a vef; a cell is the owner of a vef if (1) the latter lies on the boundary of the former, (2) it is the first cell for which (1) holds in the order in which the iterator over all cells presents them, and (3) the vef owns at least one DOF of the global FE space under consideration.^{30} The (logical) work array owner_cell_per_vef_visited(:) keeps track of whether the owner cell of each vef has already been visited as the vefs are traversed in the nested loop over all cells (see outer loop in Line 12) and over all vefs within the current cell (see inner loop in Line 16). Sitting on a cell, the algorithm first counts those DOFs associated to node functionals logically placed in the interior of the current cell (see Line 14). It then loops over the vefs of the current cell. If the owner cell of the current vef has not been visited yet, and the current cell is its owner, then the current cell is registered as the owner of the vef, and the DOFs associated to node functionals logically placed on this vef within the current cell are counted in Line 22. Since nonconforming FE spaces do not have DOFs on vefs, we can skip the loop over the vefs of a cell and accelerate the process in this case (see the if clause in Line 15 of Listing 34).
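The owner-cell counting idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the code of Listing 34: a single field is considered, each cell is described by a hypothetical dictionary holding its number of interior DOFs and the number of DOFs it places on each of its vefs, and vefs are addressed by 0-based global identifiers.

```python
def count_dofs(cells, num_vefs):
    """Count global DOFs of one field by visiting each vef only from its owner cell."""
    owner_cell_visited = [False] * num_vefs
    num_dofs = 0
    for cell in cells:
        # DOFs logically placed in the cell interior are always owned by the cell.
        num_dofs += cell["interior_dofs"]
        for vef_gid, dofs_on_vef in cell["vef_dofs"].items():
            # The first cell that reaches a DOF-carrying vef becomes its owner.
            if dofs_on_vef > 0 and not owner_cell_visited[vef_gid]:
                owner_cell_visited[vef_gid] = True
                num_dofs += dofs_on_vef
    return num_dofs


# Two biquadratic quadrilaterals sharing one edge: vertices 0..5, edges 6..12.
# Every vertex and every edge carries one DOF; each cell has one interior DOF.
cell_0 = {"interior_dofs": 1,
          "vef_dofs": {0: 1, 1: 1, 3: 1, 4: 1, 6: 1, 8: 1, 10: 1, 11: 1}}
cell_1 = {"interior_dofs": 1,
          "vef_dofs": {1: 1, 2: 1, 4: 1, 5: 1, 7: 1, 9: 1, 11: 1, 12: 1}}
total = count_dofs([cell_0, cell_1], num_vefs=13)
```

The two cells share vertices 1 and 4 and edge 11, so the second cell contributes only its interior DOF and the DOFs on its five remaining vefs.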
The algorithm shown in Listing 35 is in charge of the actual generation of the global DOF identifiers. The work array owner_cell_gid_per_vef(:,:) is used to store the global identifier of the owner cell of each vef. The vef_lid_in_owner_cell(:,:) array, in turn, is used as an accelerator lookup table that stores the local identifiers of the vefs (i.e., vef_lid) within their corresponding owner cells if they have already been visited, and -1 otherwise. Both arrays are indexed using vef global identifiers (i.e., vef_gid). Sitting on a cell, the algorithm first allocates global DOF identifiers for all node functionals associated to the interior of the current cell starting from fields_current_dof(field_id), i.e., the next freely available global identifier; see Line 27. It then loops over the vefs of the current cell. If the current vef has not been visited yet, then the current cell becomes its owner, and both the cell and the local identifier of this vef within the cell are registered in the corresponding work arrays. The global DOF identifiers associated to node functionals on this vef within the owner cell are allocated in Line 32 (as above, starting from fields_current_dof(field_id)). On the other hand, if the current vef has already been visited, then the global DOF identifiers associated to node functionals on this vef within the current cell are fetched from the corresponding ones within the owner cell in Line 39. The binding called in this line encodes the permutations described in Sect. 3.16.
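A minimal Python sketch of this allocate-or-fetch logic follows. It is not the code of Listing 35: it handles a single field, uses hypothetical dictionaries keyed by vef global identifier in place of the vef_lid_in_owner_cell lookup table, and assumes that the DOFs on a shared vef are ordered identically in both cells, so that no permutation is needed.

```python
def list_dofs(cells, num_vefs):
    """Assign 1-based global DOF identifiers; shared vefs reuse the owner's ids."""
    owner_cell = [-1] * num_vefs  # -1 marks a vef that has not been visited yet
    next_gid = 1                  # next freely available global identifier
    numbering = []
    for cell_id, cell in enumerate(cells):
        n_int = cell["interior_dofs"]
        gids = {"interior": list(range(next_gid, next_gid + n_int)), "vefs": {}}
        next_gid += n_int
        for vef_gid, n_vef in cell["vef_dofs"].items():
            if owner_cell[vef_gid] == -1:
                # First visit: the current cell becomes the owner and allocates ids.
                owner_cell[vef_gid] = cell_id
                gids["vefs"][vef_gid] = list(range(next_gid, next_gid + n_vef))
                next_gid += n_vef
            else:
                # Already visited: fetch the ids from the owner cell.
                gids["vefs"][vef_gid] = numbering[owner_cell[vef_gid]]["vefs"][vef_gid]
        numbering.append(gids)
    return numbering, next_gid - 1


# Three quadratic cells on a line: vertices 0..3, one DOF per vertex,
# one interior DOF per cell.
line_cells = [{"interior_dofs": 1, "vef_dofs": {v: 1, v + 1: 1}} for v in range(3)]
numbering, num_dofs = list_dofs(line_cells, num_vefs=4)
```

In this example the vertex shared by the first two cells receives identifier 3 from the first cell, and the second cell simply reuses it.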
As the reader might observe, Listing 35 is grounded on several (private) helper bindings of fe_cell_iterator_t that, at the cell level, aid in the generation of a global DOF numbering; see Lines 9–13 of Listing 30. These bindings ultimately rely on the reference_fe_t instances mapped to the cells of the triangulation; see Sect. 10.1. In particular, sitting on a cell, reference_fe_t provides fe_cell_iterator_t with the association of its node functionals to the interior of the cell and to its lower-dimensional boundary objects, according to the notion of conformity underlying the FE space at hand; see Sects. 3.6 and 6.2. For example, the generate_own_dofs_vef_numbering binding is implemented as shown in Listing 36.
The code in Listing 36 extracts a list_iterator_t from the own_dofs_n_face member variable of the reference_fe_t instance used in the current cell for field_id. This iterator lets it traverse those node functionals owned by the vef with local identifier vef_lid (see Sect. 6.2), and thus determine the (relative) position in lst_dofs_gids(:) of the global DOF identifiers to be allocated for such node functionals. We note that the logical predicate in Line 16 evaluates to .true. if the DOF at hand is actually free, i.e., not subject to boundary conditions imposed in strong form; see Sect. 10.4.
Finally, we would like to stress that error checking statements and a major optimization that can be applied in the single-field single-block case are not shown in the code listings of this section in order to keep the presentation as simple as possible. Both are present in FEMPAR. In particular, for the aforementioned case, the global DOF numbering can be generated with a single loop over all cells (instead of two). The call in Line 14 of Listing 33 can be avoided, deferring the computation of the number of DOFs per field and block to the call in Line 15.
On the other hand, there is no need to generate a global DOF numbering from scratch when there is already one available: a permutation from the old to the new numbering could be computed and applied to lst_dofs_gids(:) by a single sweep over all cells. This optimization, however, is not present in FEMPAR, since we have rarely encountered applications that need to change the block layout of the system of PDEs at hand on the fly.
10.4 Strong Imposition of Boundary Conditions
In order to assemble (33), the process described in Sect. 8 has to be slightly modified. A sweep over all triangulation cells is still required. Sitting on a given cell K, the element matrix \(\mathbf {A}^{K}\) and vector \(\mathbf {f}^{K}\) are computed as usual. However, the rows/columns corresponding to fixed DOFs in \(\mathbf {A}^{K}\) are not assembled into the global matrix. The same applies to the corresponding entries of \(\mathbf {f}^{K}\). Moreover, \(\mathbf {f}^{K}\) has to be updated before assembly in order to reflect the contributions of strong boundary conditions (see the right-hand side of (33)). Fortunately, the users of FEMPAR are unburdened from these subtleties. These are hidden within the assembly generic binding of fe_cell_iterator_t; see Listings 30 and 39. Apart from adding the contributions of the current cell to the global coefficient matrix and right-hand side of the linear system, this binding is in charge of computing the contribution to \(\mathbf {f}^{K}\) from strong Dirichlet boundary conditions. This poses two additional requirements on fe_space_t. In particular, (1) it should handle a global enumeration of free and fixed DOFs, while being able to distinguish between both kinds of DOFs; and (2) it should offer a suitable set of bindings to project/interpolate \(u_{\mathrm{D}}(x)\) on the boundary to get \(E_h u_{\mathrm{D}}\).
In order to satisfy (1), fe_space_t splits the whole set of DOFs into free and fixed DOFs, and the DOFs within each subset are labeled separately from each other as \(\{1,2,\ldots ,\{\mathrm{free \ {DOFs} }\}\}\), and \(\{1,2,\ldots ,\{\mathrm{fixed \ {DOFs} }\}\}\), respectively. (This is nevertheless an implementation detail that is never exposed to FEMPAR users.) In turn, free and fixed DOF values are actually stored into different arrays, so that they can be addressed separately using the corresponding global identifiers in the former and latter set, respectively; see Sect. 10.5.
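The modified cell assembly, combined with the separate labeling of free and fixed DOFs, can be sketched in Python as follows. This is an illustration, not FEMPAR's assembly binding: the function name, the tagging of local DOFs as ("free", id) or ("fixed", id) pairs with 0-based identifiers, and the dense global matrix are all assumptions made for brevity, and the right-hand side update assumes the standard lifting in which the columns of fixed DOFs, multiplied by their prescribed values, are subtracted from \(\mathbf {f}^{K}\).

```python
def assemble_cell(A_K, f_K, cell_dofs, fixed_values, A, f):
    """Add one cell's contribution to the global system over free DOFs only."""
    for i, (kind_i, gid_i) in enumerate(cell_dofs):
        if kind_i != "free":
            continue  # rows of fixed DOFs are not assembled
        rhs = f_K[i]
        for j, (kind_j, gid_j) in enumerate(cell_dofs):
            if kind_j == "free":
                A[gid_i][gid_j] += A_K[i][j]
            else:
                # Column of a fixed DOF: move its contribution to the right-hand side.
                rhs -= A_K[i][j] * fixed_values[gid_j]
        f[gid_i] += rhs


# 1D Laplace problem on [0, 1] with two linear cells of size 0.5,
# zero source, u(0) = 1 and u(1) = 3. Only the middle node is free.
A_K = [[2.0, -2.0], [-2.0, 2.0]]  # element stiffness matrix, 1/h = 2
f_K = [0.0, 0.0]
fixed_values = [1.0, 3.0]         # values of the two fixed DOFs
A = [[0.0]]
f = [0.0]
assemble_cell(A_K, f_K, [("fixed", 0), ("free", 0)], fixed_values, A, f)
assemble_cell(A_K, f_K, [("free", 0), ("fixed", 1)], fixed_values, A, f)
u_mid = f[0] / A[0][0]
```

The resulting one-by-one system yields the value at the middle node, which is the average of the two boundary values, as expected for a linear solution.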
The process that associates global identifiers to free DOFs has already been covered in Sect. 10.3. The one corresponding to fixed DOFs closely resembles the one for free DOFs. It is, however, restricted to vertices, edges, and faces of the triangulation that lie on the boundary, and it requires input from the user to determine which DOFs sitting on the boundary are actually fixed. The global enumeration of fixed DOFs occurs during the initial setup of fe_space_t; see the create generic binding in Listing 27. This process is grounded on two different ingredients. On the one hand, the user can define subregions of \(\Gamma \) through the sets associated to vefs sitting on the boundary (see Sect. 7.1). For example, the user may decide to use set identifiers 1 and 2 to split the vefs in \(\Gamma \) into those which belong to \({\Gamma _\mathrm{D}}\) and \({\Gamma _\mathrm{N}}\), respectively. On the other hand, an abstract data type, called conditions_t, to be extended by FEMPAR users, lets users customize the strong imposition of boundary conditions. In particular, with regard to the global enumeration of fixed DOFs, this data type offers a deferred binding that, given a set identifier, provides a logical component mask. For each component of the PDE system, this mask states whether the DOFs associated to vefs marked with this set identifier are fixed or free. For those FE spaces for which there is no DOF-to-component association (e.g., Raviart-Thomas or Nédélec FEs), only the first component in the mask is taken into account, and the rest are neglected.
On the other hand, for (2), fe_space_t provides a set of methods that let the user interpolate/project \(u_{\mathrm{D}}(x)\) on the boundary to get \(E_h u_{\mathrm{D}}\) in a number of suitable ways. \(E_h u_{\mathrm{D}}\) is ultimately stored within an instance of the fe_function_t data type; see Sect. 10.5. Boundary projectors involve the solution of a boundary mass matrix problem where integrals over boundary facets have to be evaluated; see Sect. 9. Again, all these bindings rely on the conditions_t abstract data type. In particular, given a boundary vef set identifier, a deferred binding of this data type returns a user-defined (scalar-valued) function to be imposed for each component of the PDE system at hand. In the case of Raviart-Thomas or Nédélec FEs, the d scalar-valued functions corresponding to its components are used to reconstruct the vector-valued function whose normal or tangential component, respectively, is to be imposed.
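The two services expected from a conditions_t extension can be mimicked with a small Python abstract class. The class and method names below (Conditions, component_mask, boundary_function) are hypothetical and do not reproduce the actual deferred bindings of conditions_t; the example assumes a 2D Stokes-like system with components (u_x, u_y, p), where set identifier 1 marks a Dirichlet boundary for the velocity and any other identifier marks a boundary without strong conditions.

```python
from abc import ABC, abstractmethod


class Conditions(ABC):
    """Hypothetical counterpart of an abstract boundary-conditions type."""

    @abstractmethod
    def component_mask(self, set_id):
        """Return one logical per PDE component: True if fixed on this set."""

    @abstractmethod
    def boundary_function(self, set_id, component):
        """Return the scalar function imposed on a component for this set."""


class LidConditions(Conditions):
    def component_mask(self, set_id):
        if set_id == 1:
            return [True, True, False]  # velocity fixed, pressure free
        return [False, False, False]

    def boundary_function(self, set_id, component):
        if set_id == 1 and component == 0:
            return lambda x, y: 1.0     # unit horizontal velocity
        return lambda x, y: 0.0


def fixed_components(conditions, set_id):
    """List the components whose DOFs are fixed on vefs with this set id."""
    return [c for c, fixed in enumerate(conditions.component_mask(set_id)) if fixed]


lid = LidConditions()
on_dirichlet = fixed_components(lid, 1)
on_neumann = fixed_components(lid, 2)
```

A fixed-DOF enumeration would query the mask while sweeping boundary vefs, and an interpolator or projector would query the per-component functions.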
10.5 Global FE Functions and Their Restriction to Triangulation Cells/Facets
In this section, we introduce a convenient software abstraction in our OO design, referred to as fe_function_t, which represents a global FE function \(u_h \in \mathcal {X}_h \, \doteq \, \mathcal {X}^1_h \times \ldots \times \mathcal {X}^n_h\). This data type and a subset of its TBPs (in particular, those that are relevant for the present section) are presented in Listing 37.
In Listing 37, the free_dof_values and fixed_dof_values member variables are used to store \({\bar{u}}_h\) and \(E_h u_{\mathrm{D}}\), respectively; see Sect. 10.4. The former is a polymorphic member variable of type array_t; see Sect. 11.1. Relying on the set of deferred bindings offered by array_t, the code bound to fe_function_t can be written independently of how the entries within the concrete implementation of array_t are laid out in memory, enabling code reuse to a large extent. For example, scalar_array_t is a concrete realization of array_t that uses monolithic storage, while block_array_t stores the entries organized into blocks (see Sect. 11.1 for more details). On the other hand, fixed_dof_values is a member variable of static type scalar_array_t; see Sect. 11.1.^{33} Fixed DOFs belonging to different fields may indeed be assigned intermixed global identifiers, which significantly simplifies the enumeration process. In particular, a single sweep over all boundary objects suffices, in contrast to Listing 33, where two sweeps over all cells are required in order to generate a block-aware global numbering. In our experience, neither blocked storage nor a data structure suitable for distributed-memory environments is strictly required to store \(E_h u_{\mathrm{D}}\), so that we can avoid the overhead associated to runtime polymorphism when dealing with fixed_dof_values.^{34}
A fe_function_t instance is created from a fe_space_t instance (to which it belongs); see the signature of the create binding in Listing 37. This binding selects the dynamic type of free_dof_values, and therefore its storage layout, according to the one currently selected for the PDE system at hand; see the block_layout member variable in Listing 27. The entries of free_dof_values can be determined in a number of ways. They might be the unknowns of a problem to be solved (e.g., by a preconditioned iterative linear solver or a sparse direct solver), or be computed from an expression involving other fe_function_t instances, e.g., \(u_h=v_h\), or \(u_h=v_h+w_h\), with \(u_h,v_h,w_h \in \mathcal {X}_h\). (Indeed, FEMPAR offers an expression syntax for global FE functions grounded on overloaded operators.) Apart from these, fe_space_t offers a pair of generic bindings, referred to as interpolate and project, to compute the nodal DOF values of \({u}_h\) by either interpolation (using the expression in (9)) or projection (e.g., a global \(L^2\) projection) into the FE space of a user-defined function u(x).^{35} Each of these generic bindings is overloaded with three different regular bindings suitable for scalar-, vector-, and tensor-valued functions, respectively. The interpolate bindings in fe_space_t can be written independently of the reference FE by using a TBP associated to reference_fe_t that computes the local interpolator in (6).
There are three remarks worth noting about these two code snippets. First, the update binding of both data types relies on the gather_nodal_values binding of fe_function_t; see Listing 37. The latter equips cell/facet FE functions with the ability to restrict (gather) the nodal values of \(u^i_h\) from global to local arrays (stored as private scratch data within cell/facet FE function data types), while taking care of strong boundary conditions. Second, the update bindings require a procedure that, given the shape functions, first derivatives, etc., evaluated at quadrature points in physical space, and the nodal values of \(u^i_h\) restricted to the current cell, provides the values, gradients, curls, etc., of the FE function at these quadrature points. This service is provided by reference_fe_t through the set of evaluate_fe_function... deferred bindings in Lines 63–68 of Listing 6. We note that fe_function_t can extract the first set of data from the cell_integrator_t and facet_integrator_t instances accessible through fe_cell_iterator_t and fe_facet_iterator_t (provided on input to update), respectively. Third, facet FE functions provide \(u^i_h\) values, gradients, etc., at facet quadrature points from the perspective of each of the two surrounding cells. This makes sense for functions \(u^i_h\) belonging to nonconforming FE spaces, which might be discontinuous across cell boundaries. Facet FE functions should also cope with the fact that the coordinate systems of the surrounding cells might not be aligned in physical space, so that a different local numbering might be assigned to facet quadrature points from the perspective of either cell; see Sect. 9.3 for an exposition of the strategy followed to solve this issue.
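The gather-then-evaluate sequence described in the first two remarks can be sketched in Python. The function names echo the text, but the signatures are hypothetical: local DOFs are assumed to be tagged as ("free", id) or ("fixed", id) pairs with 0-based identifiers into two separate value arrays, and only function values (not gradients or curls) are evaluated.

```python
def gather_nodal_values(cell_dofs, free_dof_values, fixed_dof_values):
    """Restrict global nodal values to one cell, honoring strong conditions."""
    local_values = []
    for kind, gid in cell_dofs:
        source = free_dof_values if kind == "free" else fixed_dof_values
        local_values.append(source[gid])
    return local_values


def evaluate_at_points(shape_values, local_values):
    """Evaluate the FE function at quadrature points.

    shape_values[q][a] is the value of shape function a at quadrature point q.
    """
    return [sum(n_a * u_a for n_a, u_a in zip(row, local_values))
            for row in shape_values]


# Linear 1D cell whose first node is fixed (value 1.0) and second node is free.
local = gather_nodal_values([("fixed", 0), ("free", 0)],
                            free_dof_values=[2.0], fixed_dof_values=[1.0, 3.0])
# Both linear shape functions equal 0.5 at the cell midpoint.
values = evaluate_at_points([[0.5, 0.5]], local)
```

The local array mixes values coming from both global arrays, so that downstream evaluation code does not need to distinguish free from fixed DOFs.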
11 Building FE Affine Operators
In order to pursue the aforementioned goal, fe_affine_operator_t relies on an abstract data type, referred to as assembler_t (see Fig. 10). In a nutshell, assembler_t offers a set of deferred TBPs that are tailored to FE assembly and neutral with respect to the underlying data structures, e.g., to assemble the contributions of a cell or facet integral into the linear system coefficient matrix \(\mathbf {A}\) and/or right-hand side \(\mathbf {f}\). The subclasses of assembler_t are the ones ultimately responsible for dealing with the details underlying the particular linear algebra data structures at hand. The latter offer interfaces that are neutral with respect to FE assembly to inject new entries or add contributions to them, such that this software piece becomes reusable and separable, e.g., to be used in third-party software projects (not necessarily FE-oriented) as a standalone software subsystem. FEMPAR offers a rich set of linear algebra data structures, e.g., data structures organized by blocks, which enable the implementation of block preconditioners for multiphysics problems (see, e.g., [43, 44, 45]). Apart from those required to handle the coefficient matrix and right-hand side of the linear system, fe_affine_operator_t also interacts with other data types required to deliver its life cycle (i.e., its autogeneration). In particular, the entries of \(\mathbf {A}\) and \(\mathbf {f}\) are computed according to the expressions in (10). These expressions involve a FE space (fe_space_t) and the discrete (bi)linear forms of the problem at hand. To express this second ingredient in software, we introduce the discrete_integration_t abstraction; see Fig. 10.
We have structured this section as follows. In Sect. 11.1, we first present the assembler_t abstract data type, and the rationale underlying the design of the linear algebra structures it is grounded on. Next, in Sect. 11.2, we introduce the discrete_integration_t abstract data type that ultimately is in charge of performing the integration of the (bi)linear forms and assembly of the discrete affine operator. We show a particular implementation of this data type (i.e., a subclass) for the Galerkin approximation of the Stokes problem. Finally, the fe_affine_operator_t data type is described in Sect. 11.3.
11.1 Linear Algebra Data Structures and Associated Assemblers
Linear algebra in FEMPAR relies on a pair of data type hierarchies rooted at the mathematical abstractions of a linear algebra operator and a vector, and represented in software by means of the linear_operator_t and vector_t abstract data types, respectively. These abstract data types let a number of linear algebra algorithms within FEMPAR (e.g., iterative linear solvers and block preconditioners for PDE systems) be expressed independently of the actual implementation of the concrete matrix and vector data structures being used, such as block layout (if any), storage (e.g., dense or sparse storage format) or memory layout (e.g., local or distributed-memory), enabling code reuse and extensibility to a large extent. An abstract expression syntax that allows the construction of complex expressions involving operations among operators and/or vectors is also provided. This enables the implementation of new algorithms in a compact manner. However, because these linear algebra algorithms are not discussed herein but postponed to a future work, the description of the data types and associated methods in these hierarchies is restricted to what is necessary to describe the assembly of the FE affine operator.
The sparse_matrix_t data type can be found at an intermediate level in the hierarchy rooted at linear_operator_t. This is a crucial data type in FEMPAR, which represents a scalar, non-distributed, sparse matrix. Its design follows the ideas presented in [92]. This design (re)uses the state OO design pattern [88] to hide the actual sparse matrix storage format from the user. Following this pattern, sparse_matrix_t is composed of a polymorphic member variable of (declared) type base_sparse_matrix_t. Its dynamic type can thus be changed at runtime (via reallocation). This dynamic type represents the storage format currently in use. Current subclasses of base_sparse_matrix_t include coo_sparse_matrix_t, csr_sparse_matrix_t, and csc_sparse_matrix_t, corresponding to the coordinate list (COO), the compressed sparse row (CSR), and the compressed sparse column (CSC) sparse matrix storage formats [93], respectively.
The life cycle of a sparse_matrix_t instance is as follows. The user first invokes its create TBP, in which one solely specifies the size of the matrix, i.e., the number of rows and columns. This method, however, triggers a number of subsequent actions. In particular, it allocates its dynamic type to the one corresponding to the COO format, and leaves it ready for the injection or addition of contributions to the entries of the matrix. Although not compressed, this format is ideally suited for such operations. Contributions are simply pushed back into member arrays that can grow dynamically during the integration/assembly loop (via a judicious reallocation strategy that trades off cost and memory). Other sparse storage formats, such as the CSR storage implemented in the csr_sparse_matrix_t data type (also a type extension of base_sparse_matrix_t), although more memory efficient, require a predefined sparsity pattern, which has to be precomputed. They are thus not well suited for the dynamic build-up process of the matrix. At this point the reader should note that, for such inflexible storage formats, one typically needs an accurate estimate of the maximum number of nonzeros per row (or column) to be memory efficient. In complex scenarios (e.g., hp-adaptive methods in 3D, among others), however, this estimate can only be a rather loose upper bound.
Once the build-up process finishes, the user can call a method specially designed to leave the sparse_matrix_t instance ready for use (e.g., to perform operations with it). This involves a compression process, in which duplicated entries are either summed up or filtered out (as selected by the user), and a transformation of the COO storage format into the storage format that the user actually requires (e.g., CSR). For simplicity, we refer to this stage as the “compression” of the matrix. Once the sparse_matrix_t instance is in this final state, it is still possible to insert or add contributions to its entries, as long as they belong to the sparsity pattern resulting from the first build-up process. Thus, e.g., if a transient and/or nonlinear problem is to be solved and the triangulation of the domain does not change, the assembly in COO format will only be performed at the first nonlinear iteration of the first time step.
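The life cycle just described (build-up in COO, compression into CSR, further additions restricted to the resulting pattern) can be sketched in Python using the state pattern. The class names are hypothetical and the code is a simplified illustration, not FEMPAR's implementation: indices are 0-based, duplicated entries are always summed, and only CSR is offered as the final format.

```python
class CooState:
    """Build-up state: contributions are simply appended."""

    def __init__(self):
        self.rows, self.cols, self.vals = [], [], []

    def add(self, i, j, v):
        self.rows.append(i)
        self.cols.append(j)
        self.vals.append(v)


class CsrState:
    """Compressed state: fixed sparsity pattern, duplicates summed."""

    def __init__(self, num_rows, coo):
        summed = {}
        for i, j, v in zip(coo.rows, coo.cols, coo.vals):
            summed[(i, j)] = summed.get((i, j), 0.0) + v
        self.row_ptr = [0] * (num_rows + 1)
        self.col_idx, self.vals = [], []
        for (i, j) in sorted(summed):       # row-major order
            self.row_ptr[i + 1] += 1        # count entries per row
            self.col_idx.append(j)
            self.vals.append(summed[(i, j)])
        for i in range(num_rows):           # prefix sum gives row offsets
            self.row_ptr[i + 1] += self.row_ptr[i]

    def add(self, i, j, v):
        for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
            if self.col_idx[k] == j:
                self.vals[k] += v
                return
        raise KeyError("entry outside the sparsity pattern")


class SparseMatrix:
    """Client-facing type; its storage state is swapped at run time."""

    def __init__(self, num_rows, num_cols):
        self.num_rows, self.num_cols = num_rows, num_cols
        self.state = CooState()

    def add(self, i, j, v):
        self.state.add(i, j, v)

    def compress(self):
        self.state = CsrState(self.num_rows, self.state)


m = SparseMatrix(2, 2)
for i, j, v in [(0, 0, 1.0), (1, 1, 2.0), (0, 0, 3.0), (0, 1, 5.0)]:
    m.add(i, j, v)
m.compress()
m.add(1, 1, 1.0)  # allowed: (1, 1) belongs to the pattern
```

After compression the client keeps calling the same add method; only the object held in the state member has changed, and an addition outside the pattern is rejected.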
As shown so far, the software architecture of sparse_matrix_t is such that several (current and future) storage formats are possible within a single framework. This flexibility is convenient for two main reasons. First, no given storage format is likely to be uniformly better in performance across all possible operations and computer architectures. Second, the interoperability of FEMPAR with external software increases dramatically. If a new library that uses its own storage format is to be integrated, only a new extension of base_sparse_matrix_t has to be added, while leveraging tens of thousands of lines of code already written. Apart from sparse_matrix_t, there are other sparse matrix data types available, suitable to handle blocks and/or distributed-memory computers. All these data types are essentially composed in one way or another of sparse_matrix_t instances. For example, block_sparse_matrix_t is composed of \(\texttt {nblocks}^2\) sparse_matrix_t instances; see Fig. 10. It also provides a set of specialized TBPs that only apply in the blocked case, e.g., the get_block TBP that lets a client retrieve the sparse_matrix_t instance corresponding to a given block of the matrix.
The counterpart of sparse_matrix_t in the vector case is referred to as scalar_array_t. It represents a scalar, non-distributed, linear algebra vector, with its entries stored explicitly in a simple (Fortran intrinsic) allocatable array. Since it does not have to exploit sparsity, the code bound to this data type is significantly simpler than the one bound to sparse_matrix_t. It is equipped with a pair of generic bindings, with signatures coming in different flavours, in order to insert or add contributions to the vector. Likewise, there are other vector-like data types available, suitable to handle blocks and/or distributed-memory computers. For example, block_array_t is composed of \(\texttt {nblocks}\) scalar_array_t instances; see Fig. 10.
Apart from the linear algebra data structures presented so far, we need the additional data type assembler_t, which offers signatures tailored to FE assembly to fe_affine_operator_t. The interfaces of its deferred TBPs, which its extensions, e.g., scalar_assembler_t and block...assembler_t, implement, are shown in Listing 38. assembler_t has to be “general enough” to handle many storage layouts and it is in charge of isolating fe_affine_operator_t from implementation details. With that purpose in mind, it is composed of a (polymorphic) matrix_t and a (polymorphic) array_t instance. These are in turn abstract data types at the root of all the matrix and array data types seen so far, respectively. The set of deferred TBPs of these two abstract data types is designed (on purpose) not to be rich enough to handle the whole life cycle of the concrete matrix and array instances. The high heterogeneity of the concrete subclasses of matrix_t and array_t precludes it. This set of TBPs is, in particular, restricted to allocation of memory for its entries, initialization of its entries to a given value (e.g., initialization to zero), and deallocation of any internal memory. These three operations are required by fe_affine_operator_t during its life cycle. The bulk of the life cycle of the concrete subclasses of matrix_t and array_t is handled by the subclasses of assembler_t. This is how it should be, since assembler_t subclasses are the ones aware of the concrete details of the corresponding matrix_t and array_t subclasses. Besides, by doing this, we can avoid the overhead associated to dynamic runtime polymorphism, since the binding of fine-grain calls to those TBPs injecting or adding contributions to the matrix or the array can be determined at compilation time.
Going back to Listing 38, observe that assembly_array (resp., assembly_matrix) takes an intrinsic Fortran array (resp., rank-2 array) as dummy argument for the element vector (resp., matrix). Besides, it also gets the global DOF identifiers on top of a single cell, or those corresponding to the cells surrounding the facet (see Lines 23, 35 and 36 in Listing 38). In the case of scalar_assembler_t, the implementation relies on the TBPs provided by scalar_array_t and sparse_matrix_t in order to add contributions to their entries. In the case of block_assembler_t, the implementation loops through the blocks, obtains a reference to the current block with the get_block TBP, and uses the corresponding TBPs as in the previous case. The assembly_array and assembly_matrix TBPs are used by the fe_cell_iterator_t and fe_facet_iterator_t data types to implement their assembly TBPs (see Lines 35 and 68 in Listing 30 of Sect. 10.2). For completeness, in Listing 39 we show the signature of the latter TBPs. These are the ones actually used by the user in the type extension of discrete_integration_t, as described in Sect. 11.2.
Finally, the compress_storage deferred TBP of assembler_t lets fe_affine_operator_t signal that the build-up process of the linear algebra data structures has finished and that they can be “compressed” into their final state.
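The block-by-block scatter performed by a blocked assembler can be sketched in Python. This is a loose illustration rather than the code of block_assembler_t: the function name and argument layout are hypothetical, each local DOF is assumed to be described by a (field, block-local identifier) pair with 0-based indices, and each block is stored as a plain dictionary keyed by (row, column) instead of a sparse matrix.

```python
def assemble_blocked(A_K, cell_dofs, field_to_block, blocks):
    """Scatter an element matrix into an nblocks x nblocks array of blocks."""
    num_blocks = len(blocks)
    for I in range(num_blocks):
        # Local rows whose field is mapped to block row I.
        rows = [i for i, (fld, _) in enumerate(cell_dofs) if field_to_block[fld] == I]
        for J in range(num_blocks):
            cols = [j for j, (fld, _) in enumerate(cell_dofs) if field_to_block[fld] == J]
            block = blocks[I][J]  # plays the role of retrieving the current block
            for i in rows:
                for j in cols:
                    key = (cell_dofs[i][1], cell_dofs[j][1])
                    block[key] = block.get(key, 0.0) + A_K[i][j]


# Two fields mapped to two blocks; the cell has two DOFs of field 0 and one of field 1.
field_to_block = [0, 1]
cell_dofs = [(0, 0), (0, 1), (1, 0)]
A_K = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0],
       [7.0, 8.0, 9.0]]
blocks = [[{} for _ in range(2)] for _ in range(2)]
assemble_blocked(A_K, cell_dofs, field_to_block, blocks)
```

Each entry of the element matrix lands in exactly one block, addressed with row/column identifiers that are local to that block.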
We stress that the software architecture presented in this section provides uniform assembly interfaces to the client that are completely independent of the underlying implementation of linear algebra data structures. The subclasses of assembler_t are in charge of the management of blocks (if any), whereas sparse_matrix_t is in charge of the management of the storage schemes.
11.2 Discrete Integration of FE Operators
In this section, we introduce the abstract data type discrete_integration_t (see Listing 40). It defines the generic integrate binding, which is overloaded by the integrate_galerkin and integrate_petrov_galerkin deferred TBPs, depending on the number of fe_space_t instances being passed to them (see, e.g., Line 8 of Listing 40 for the interface corresponding to the Galerkin case). A user that wants to implement a FE problem must extend this data type and overwrite the TBP to be used (Galerkin or PetrovGalerkin) in the userdefined subclass. In the overridden method, the user must implement the evaluation of the entries of \(\mathbf {A}\) and \(\mathbf {f}\) as the numerical integration of the discrete bilinear and linear forms as in (10) (see Sect. 3).
Based on our experience, the integration part of a FE code must exhibit a huge level of flexibility. Every time one wants to consider a new set of PDEs or a new expression of the discrete bilinear form, this component must be modified. It must also have the ability to integrate general time integration schemes that can require functions in an arbitrary number of steps, deal with nonlinear problems that involve the need to evaluating FE functions in the integration of the discrete forms, or including variable physical coefficients of body force terms determined through analytical functions. As a result, any rigidity at this level must be eliminated. Indeed, the discrete_integration_t abstract data type only forces its subclasses to adhere to the signatures of the deferred TBPs overloading integrate, and has no member variables that subclasses are forced to handle. Using the design previously sketched, the user has absolute flexibility to design its own discrete_integration_t subclass, adding the attributes and methods that can be required to integrate and assemble the discrete forms, e.g., by adding an arbitrary number of fe_function_t and *_function_t instances (and corresponding setters to be used at the driver level) that can describe physical properties, previous time step values, the solution at the previous nonlinear iteration, etc.
The integration of cellwise terms of the (bi)linear forms is accomplished by traversing through all the cells using a fe_cell_iterator instance (see Sect. 10.2), which has access to (1) all the cell integration data (see Sect. 8) required to compute the local cell contributions in (11) and (2) the localtoglobal DOF numbering needed for the assembly in the global linear algebra data structures. Analogously, the integration of facet terms, e.g., the ones in (20) for DG formulations, requires the use of a fe_facet_iterator_t instance to traverse through the facets and integrate the corresponding facet terms. The method integrate is called during the execution of the numerical_setup TBP of fe_affine_operator_t. It is in fact the fe_affine_operator_t the one that decides whether to invoke the Galerkin or PetrovGalerkin integration, depending on whether one or two FE spaces have been passed as actual arguments (the second one being optional) in its create binding (see Line 15 of Listing 42). Analogously, the FE space(s) are also passed as actual argument(s) to the integrate_* bindings, since they will be needed at any integration step (see Line 8 of Listing 40 for the Galerkin case).
For illustration purposes, we present in Listing 41 an example extension of discrete_integration_t. It shows the implementation of the deferred procedure integrate_galerkin for the approximation of the Stokes problem using a Galerkin method. This data types will be used in the example driver presented in Sect. 12 for the infsup stable TaylorHood mixed FE method (see Listing 41).^{36}
As commented above, the integration of the (bi)linear forms requires the cell integration machinery, which is provided by fe_space_t through the creation of the fe_cell_iterator_t in Line 19 of Listing 41. Apart from controlling the loop over cells (Lines 24 and 62), fe_cell_iterator_t provides the numerical quadrature, which is in turn required to get the number of integration points (line 31), and its associated weights (line 32). It also provides the determinant of the Jacobian of the cell map (line 32), and the shape functions and gradients at Lines 28 to 30 (see (13) and (14)). The implementation of the (bi)linear forms is very close to the blackboard expression, making it compact, simple, and intuitive. This is possible through the definition of the vector_field_t, and tensor_field_t data types, together with their corresponding expression syntax available in FEMPAR. As it was carefully discussed in Sect. 8.5, it is achieved using operator overloading for different vector and tensor operations, e.g., the contraction and scaling operations. The symmetric_part (used at Lines 35 and 38), double_contract (used at line 40) and trace helper standalone functions (used at Lines 36 and 49) are also offered to make tensor operations easy. We also note that this implementation is also efficient, since all these operations are made without any dynamic memory allocation/deallocation.
Finally, the fe_cell_iterator_t also offers a TBP to assemble the element matrix and vector into the assembler and to impose strong Dirichlet conditions (Line 66) using the perturbation in (3) (see Sect. 10.4). The Dirichlet data is extracted from a fe_function_t that represents \(E_h u_{\mathrm{D}}\), which must be an attribute of the concrete discrete_integration_t. For non-conforming FE spaces, the formulation also requires a loop over the facets to integrate DG terms. It can be written in a similar fashion using the tools described in Sect. 9. In this example, the stokes_galerkin_integration_t extension has the attribute \(\texttt {force}\), which is used in Line 56 to integrate the right-hand side. It is a vector field described by an instance of the vector_function_t data type.
11.3 The FE Affine Operator Abstraction
A (simplified) declaration of the fe_affine_operator_t data type is shown in Listing 42. The fe_affine_operator_t is created from a single fe_space_t instance, or two for Petrov-Galerkin formulations; the second instance is optional and, when it is not passed, the Galerkin method is used, i.e., the same FE space is used for trial and test spaces. The user can (optionally) configure a desired block layout. Given a Cartesian product FE space \( \mathcal {X}^1_h \times \ldots \times \mathcal {X}^{n_\mathrm{field}}_h\) for a multi-field problem with \(n_\mathrm{field}\) fields (see Sect. 3.11), the block layout represents a partition of fields into subsets.^{37} It is described through the argument array field_blocks of size num_fields equal to \(n_\mathrm{field}\), which indicates the block to which each field is assigned; by default, the one-block case is used. For example, for the Stokes problem in Example 3.2, one can consider a monolithic block layout with only one block that includes both the velocity and pressure fields (field_blocks=[1,1]), or two one-field blocks (field_blocks=[1,2] or [2,1]). Additionally, the user must provide further information about the diagonal blocks, namely (1) whether the block is symmetric or not, (2) whether symmetric storage is to be used for the block or not, and (3) whether the block is positive definite, positive semidefinite, or indefinite. The user can optionally provide the array of logicals field_coupling (of size num_fields \(\times \) num_fields); the position \((\texttt {i},\texttt {j})\) determines whether the matrix entries related to trial/test functions of the FE space \(\texttt {i}\) and FE space \(\texttt {j}\) are always zero (in this case, the coupling is false) or not. For the Stokes problem and the Galerkin method, the only entry that is false (no coupling) is the pressure-pressure entry. When this array is not provided, all fields are assumed to be coupled by default. This only implies more memory consumption, e.g., to store the zero entries in the pressure-pressure block for the Stokes problem.
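To make the roles of field_blocks and field_coupling more concrete, the following self-contained sketch (plain Fortran, no FEMPAR types) sets up both arrays for the Stokes problem and derives which pairs of blocks may hold nonzero entries. The derived array block_coupling and the rule used to fill it are assumptions made for this illustration, not part of the interface described above.

```fortran
! Sketch only: field-to-block assignment and field coupling for Stokes.
! Field 1 is the velocity and field 2 the pressure.
program block_layout_sketch
  implicit none
  integer, parameter :: num_fields = 2
  integer :: field_blocks(num_fields)
  logical :: field_coupling(num_fields,num_fields)
  logical, allocatable :: block_coupling(:,:)   ! hypothetical helper array
  integer :: num_blocks, i, j

  ! Two one-field blocks; use [1, 1] for the monolithic layout
  field_blocks = [1, 2]

  ! Every pair of fields couples except pressure-pressure
  field_coupling      = .true.
  field_coupling(2,2) = .false.

  ! A pair of blocks is coupled if any pair of its fields is coupled
  num_blocks = maxval(field_blocks)
  allocate(block_coupling(num_blocks,num_blocks))
  block_coupling = .false.
  do j = 1, num_fields
     do i = 1, num_fields
        if (field_coupling(i,j)) then
           block_coupling(field_blocks(i),field_blocks(j)) = .true.
        end if
     end do
  end do

  do i = 1, num_blocks
     write(*,'(a,i0,a,*(1x,l1))') 'block row ', i, ':', block_coupling(i,:)
  end do
end program block_layout_sketch
```

With field_blocks=[1,2], only the pressure diagonal block is flagged as empty; with field_blocks=[1,1], the single block is coupled and the zero pressure-pressure entries live inside it.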
The block layout information is stored in the data type block_layout_t, sketched in Listing 43, which stores the arrays field_blocks and field_coupling. It is created in the binding that creates the fe_affine_operator_t. It also stores a block-wise DOF numbering generated by the fe_space_t instance, which is instructed to do so by passing the block_layout_t^{38} when calling its TBP generate_global_dof_numbering, described in Sect. 10.3.
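The effect of a block-wise numbering can be sketched in a few lines of plain Fortran. The example below assumes, purely for illustration, that the DOFs of each field are labeled contiguously within its block and that fields are processed in order; the per-field DOF counts are made-up numbers, and none of the variable names come from FEMPAR.

```fortran
! Sketch only: block-wise DOF label ranges for a two-field problem.
program blockwise_numbering_sketch
  implicit none
  integer, parameter :: num_fields = 2
  integer :: num_dofs_x_field(num_fields)   ! made-up sizes
  integer :: field_blocks(num_fields)
  integer :: first_dof(num_fields)
  integer, allocatable :: num_dofs_x_block(:)
  integer :: ifield, iblock

  num_dofs_x_field = [18, 4]
  field_blocks     = [1, 2]      ! try [1, 1] for the monolithic case

  allocate(num_dofs_x_block(maxval(field_blocks)))
  num_dofs_x_block = 0
  do ifield = 1, num_fields
     iblock = field_blocks(ifield)
     ! Labels restart from 1 in every block
     first_dof(ifield)        = num_dofs_x_block(iblock) + 1
     num_dofs_x_block(iblock) = num_dofs_x_block(iblock) + &
                                num_dofs_x_field(ifield)
  end do

  do ifield = 1, num_fields
     write(*,'(a,i0,a,i0,a,i0,a,i0)') 'field ', ifield, &
          ' -> block ', field_blocks(ifield), &
          ', labels ', first_dof(ifield), '..', &
          first_dof(ifield) + num_dofs_x_field(ifield) - 1
  end do
end program blockwise_numbering_sketch
```

With two blocks, both fields start at label 1, so a DOF is only identified unambiguously by the pair (block, label); with a single block, the second field continues where the first one ends.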
The fe_affine_operator_t also holds a polymorphic pointer to an assembler_t instance. Its dynamic type is selected during the creation phase depending on the number of blocks, the storage layout required, and the (parallel or serial) environment. Finally, a polymorphic pointer to an instance of declared type discrete_integration_t is also stored (see line 11 of Listing 42). After the creation phase, the fe_affine_operator_t is ready for its setup. Thanks to the design of the linear algebra data structures in FEMPAR, it does not require a symbolic setup, i.e., to precompute a (potential) sparsity pattern. The numerical_setup TBP at line 17 of Listing 42 calls the integrate_galerkin TBP of discrete_integration when the pointer to trial_fe_space is not associated or integrate_petrov_galerkin otherwise, as discussed in Sect. 11.2.
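The dispatch between the two integration variants relies on a standard Fortran idiom: an optional dummy argument at creation time and a pointer association test at setup time. The following toy sketch reproduces only this idiom; the types toy_fe_space_t and toy_affine_operator_t are hypothetical stand-ins, and their bindings merely print which branch would be taken.

```fortran
! Toy sketch (not FEMPAR code): Galerkin vs. Petrov-Galerkin dispatch
! driven by an optional second FE space.
module toy_operator_names
  implicit none
  private

  type, public :: toy_fe_space_t
     character(len=16) :: name = ''
  end type toy_fe_space_t

  type, public :: toy_affine_operator_t
     private
     type(toy_fe_space_t), pointer :: test_fe_space  => null()
     type(toy_fe_space_t), pointer :: trial_fe_space => null()
   contains
     procedure :: create
     procedure :: numerical_setup
  end type toy_affine_operator_t

contains

  subroutine create(this, fe_space, trial_fe_space)
    class(toy_affine_operator_t),           intent(inout) :: this
    type(toy_fe_space_t), target,           intent(in)    :: fe_space
    type(toy_fe_space_t), target, optional, intent(in)    :: trial_fe_space
    this%test_fe_space => fe_space
    nullify(this%trial_fe_space)
    ! The second space is only stored when the caller provides it
    if (present(trial_fe_space)) this%trial_fe_space => trial_fe_space
  end subroutine create

  subroutine numerical_setup(this)
    class(toy_affine_operator_t), intent(in) :: this
    if (associated(this%trial_fe_space)) then
       write(*,'(a)') 'Petrov-Galerkin: test = ' // &
            trim(this%test_fe_space%name) // ', trial = ' // &
            trim(this%trial_fe_space%name)
    else
       write(*,'(a)') 'Galerkin: trial = test = ' // &
            trim(this%test_fe_space%name)
    end if
  end subroutine numerical_setup

end module toy_operator_names

program dispatch_sketch
  use toy_operator_names
  implicit none
  type(toy_fe_space_t), target :: v_h, w_h
  type(toy_affine_operator_t)  :: op
  v_h%name = 'V_h'
  w_h%name = 'W_h'
  call op%create(v_h)          ! one space: Galerkin branch
  call op%numerical_setup()
  call op%create(v_h, w_h)     ! two spaces: Petrov-Galerkin branch
  call op%numerical_setup()
end program dispatch_sketch
```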
12 Driver Example for the Stokes Problem
In this section, we describe the software architecture of a driver program that approximates the solution of the Stokes problem. To this end, it implements a Galerkin FE method grounded on a “static” (i.e., non-adaptable) conforming mesh and inf-sup stable FE spaces. In particular, we consider a conforming FE space \(\varvec{\mathcal {V}}_h \times \mathcal {Q}_h \), where \(\varvec{\mathcal {V}}_h\) is a grad-conforming Lagrangian space of order \(k+1\), and \(\mathcal {Q}_h\) is a grad-conforming Lagrangian space of order k, i.e., the mixed Taylor-Hood FE [5].^{39}
It is up to FEMPAR users to decide how to design the software architecture of their main driver program. Any driver program nevertheless has to follow the typical stages needed in a simulation pipeline based on FEs. For the sake of uniformity, the architecture presented in Listings 44 and 45 is recommended to FEMPAR users. The main program unit relies on a number of driver-level module units, which are not part of the FEMPAR library but are developed by the user specifically for the problem at hand. Each of these modules defines a driver-level derived data type and its TBPs. A central derived data type, called stokes_driver_t in this example, is designed to drive all the necessary steps. In particular, it offers a public TBP, called run_simulation, on which the driver program relies to perform the actual simulation. The driver program is therefore as simple and concise as shown in Listing 44.
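The shape of this architecture can be sketched with a toy driver type whose only public binding is run_simulation. The private stage names below follow the TBP names used in this section, except for parse_command_line_parameters and report_stage, which are hypothetical names introduced for this illustration. Every stage is a stub that prints its own name, so the sketch shows the control flow and nothing else.

```fortran
! Toy sketch of the driver pattern; every stage is a stub.
module toy_stokes_driver_names
  implicit none
  private

  type, public :: toy_stokes_driver_t
     private
     integer :: num_stages_done = 0
   contains
     procedure          :: run_simulation
     procedure, private :: parse_command_line_parameters  ! hypothetical name
     procedure, private :: setup_triangulation
     procedure, private :: setup_fe_space
     procedure, private :: setup_fe_affine_operator
     procedure, private :: solve_system
     procedure, private :: write_solution
     procedure, private :: report_stage                   ! hypothetical helper
  end type toy_stokes_driver_t

contains

  subroutine run_simulation(this)
    class(toy_stokes_driver_t), intent(inout) :: this
    call this%parse_command_line_parameters()
    call this%setup_triangulation()
    call this%setup_fe_space()
    call this%setup_fe_affine_operator()
    call this%solve_system()
    call this%write_solution()
  end subroutine run_simulation

  subroutine report_stage(this, stage_name)
    class(toy_stokes_driver_t), intent(inout) :: this
    character(len=*),           intent(in)    :: stage_name
    this%num_stages_done = this%num_stages_done + 1
    write(*,'(i0,a,a)') this%num_stages_done, ': ', stage_name
  end subroutine report_stage

  subroutine parse_command_line_parameters(this)
    class(toy_stokes_driver_t), intent(inout) :: this
    call this%report_stage('parse_command_line_parameters')
  end subroutine parse_command_line_parameters

  subroutine setup_triangulation(this)
    class(toy_stokes_driver_t), intent(inout) :: this
    call this%report_stage('setup_triangulation')
  end subroutine setup_triangulation

  subroutine setup_fe_space(this)
    class(toy_stokes_driver_t), intent(inout) :: this
    call this%report_stage('setup_fe_space')
  end subroutine setup_fe_space

  subroutine setup_fe_affine_operator(this)
    class(toy_stokes_driver_t), intent(inout) :: this
    call this%report_stage('setup_fe_affine_operator')
  end subroutine setup_fe_affine_operator

  subroutine solve_system(this)
    class(toy_stokes_driver_t), intent(inout) :: this
    call this%report_stage('solve_system')
  end subroutine solve_system

  subroutine write_solution(this)
    class(toy_stokes_driver_t), intent(inout) :: this
    call this%report_stage('write_solution')
  end subroutine write_solution

end module toy_stokes_driver_names

program toy_stokes_driver
  use toy_stokes_driver_names
  implicit none
  type(toy_stokes_driver_t) :: driver
  call driver%run_simulation()
end program toy_stokes_driver
```

Keeping the stages private and exposing a single entry point means the main program does not change when a stage is reimplemented.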
The main data type of the driver, stokes_driver_t, is shown in Listing 45. It is equipped with a set of member variables of types already described in previous sections; see the comments on the right-hand side of each member variable. The data type solver_t in Line 11 does not exist in FEMPAR as such. There is actually a complete set of data types that provide interfaces to high-end third-party sparse direct solvers. Besides, we have developed our own abstract implementation of iterative linear solvers (including, e.g., the conjugate gradient or GMRES Krylov subspace solvers). The convergence of these solvers can be accelerated using advanced preconditioners grounded on the Multilevel Balancing Domain Decomposition by Constraints (MLBDDC) preconditioner [34, 37]. The description of the linear solvers software subsystem deserves considerable space and is postponed to a future work. In this example, solver_t has to be understood as a data type that provides the services required to implement the solve_system TBP at Line 20 of Listing 45. The data type stokes_conditions_t at Line 9 extends conditions_t in Sect. 10.4. It encodes the strong Dirichlet boundary conditions data for this particular operator. The member variable parameter_list (see Line 4) is a parameter dictionary of \(\langle key, value \rangle \) pairs. Its implementation is provided as a standalone external software library called FPL [86]. The member variable stokes_parameters (see Line 3) is an instance of a user-defined data type that encapsulates the interaction with a command-line parser provided by the FLAP software package [94]. Both of them are used to implement the TBP in Line 14, which parses the arguments given by the user in the command line and transfers them into the aforementioned parameter_list member variable.
The run_simulation TBP (called from the main program in Line 8 of Listing 44) is implemented with the help of the private TBPs in Lines 17–21 of Listing 45. The setup_triangulation TBP invokes the create TBP of static_triangulation_t. Depending on the command-line parameter values, the user may choose to automatically generate a structured/uniform triangulation for simple domains (e.g., a unit cube), currently of brick (quadrilateral or hexahedral) cells, or to read it from a mesh data file, e.g., one generated with the GiD unstructured mesh generator [91]. The FE space is built in the setup_fe_space TBP, sketched in Listing 46.
An array with base type p_reference_fe_t, a data type that wraps a polymorphic pointer to a reference_fe_t instance, is allocated in Line 8 of Listing 46. The reference_fe_t instances for the velocity and pressure fields are created by calling make_reference_fe in Lines 11 and 21, respectively; see Sect. 6.4. The interpolation order of the numerical scheme is read from the command line in Line 5. We select the order equal to \(k+1\) and k in Lines 11 and 21, respectively. The dummy argument continuity determines whether \(\mathcal {X}\) admits a trace operator. In this particular example, we could consider continuity=.false. if we wanted to use a discontinuous pressure space. The create TBP of fe_space_t (Line 35) performs the composition of the reference FEs to build the Cartesian product space \(\mathcal {X}_h\). Finally, we call the set_up_cell_integration TBP of fe_space_t in Line 38 to set up all the data structures required to evaluate cell integrals in Listing 40.
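Fortran has no arrays of pointers as such, which is why a wrapper type is needed to hold one polymorphic pointer per field. The toy sketch below shows the idiom with hypothetical types (toy_reference_fe_t, toy_lagrangian_fe_t, p_toy_reference_fe_t) and a direct allocate in place of a factory such as make_reference_fe; the orders \(k+1\) and k mimic the velocity and pressure choices described above.

```fortran
! Toy sketch (not FEMPAR code): an array of wrapped polymorphic pointers.
module toy_reference_fe_names
  implicit none
  private

  type, abstract, public :: toy_reference_fe_t
     integer :: order = 0
   contains
     procedure(describe_interface), deferred :: describe
  end type toy_reference_fe_t

  abstract interface
     subroutine describe_interface(this)
       import :: toy_reference_fe_t
       class(toy_reference_fe_t), intent(in) :: this
     end subroutine describe_interface
  end interface

  type, extends(toy_reference_fe_t), public :: toy_lagrangian_fe_t
   contains
     procedure :: describe => lagrangian_describe
  end type toy_lagrangian_fe_t

  ! Wrapper type: one polymorphic pointer per array entry
  type, public :: p_toy_reference_fe_t
     class(toy_reference_fe_t), pointer :: p => null()
  end type p_toy_reference_fe_t

contains

  subroutine lagrangian_describe(this)
    class(toy_lagrangian_fe_t), intent(in) :: this
    write(*,'(a,i0)') 'Lagrangian reference FE of order ', this%order
  end subroutine lagrangian_describe

end module toy_reference_fe_names

program reference_fes_sketch
  use toy_reference_fe_names
  implicit none
  type(p_toy_reference_fe_t), allocatable :: reference_fes(:)
  integer :: k, ifield

  k = 1
  allocate(reference_fes(2))

  ! Field 1 (velocity): order k+1; field 2 (pressure): order k
  allocate(toy_lagrangian_fe_t :: reference_fes(1)%p)
  reference_fes(1)%p%order = k + 1
  allocate(toy_lagrangian_fe_t :: reference_fes(2)%p)
  reference_fes(2)%p%order = k

  do ifield = 1, size(reference_fes)
     call reference_fes(ifield)%p%describe()
  end do

  do ifield = 1, size(reference_fes)
     deallocate(reference_fes(ifield)%p)
  end do
  deallocate(reference_fes)
end program reference_fes_sketch
```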
The implementation of the setup_fe_affine_operator binding is shown in Listing 47. It first invokes the create TBP of fe_affine_operator_t in Line 6. We request monolithic storage for the global coefficient matrix (Line 13) and declare that the matrix is symmetric (Line 9), that symmetric storage is to be used, i.e., only its upper triangle is stored (Line 8), and that it is indefinite (Line 10). The definition of field_coupling in Line 14 reflects that the pressure diagonal block is null. We also pass an instance of fe_space_t in Line 11 and an instance of the subclass stokes_integration_t in Line 12.
Before we set up the operator in Line 27, we create a fe_function_t instance in Line 18. In Line 19, by means of the services provided by fe_space_t, we interpolate the analytical function to be prescribed on the boundary for the velocity field (retrieved from stokes_conditions), fixing the strong Dirichlet DOFs of the fe_function_t instance at hand. As a result, this FE function represents \(E_h u_\mathrm{D}\), with the zero extension to free DOFs; see Sect. 10.4. This FE function is passed to the stokes_integration_t instance in Line 24. Finally, we trigger the operator autoconstruction in Line 27.
The solve_system TBP (see Line 20 of Listing 45) invokes either a direct or a preconditioned iterative solver to obtain the nodal values of our FE function at the free DOFs (see Sect. 10.5). Given that this%solution on input to solve_system vanishes on free DOFs (see the discussion in the previous paragraph), a common practice used in FEMPAR drivers to save space is to reuse the space devoted to free DOFs in this%solution to store the free DOF nodal values of the solution of the problem at hand. We stress that all solvers in FEMPAR only solve for free DOFs. In our experience, this decision dramatically simplifies the development of some preconditioners, since they can be developed without taking care of strong Dirichlet boundary conditions.
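The idea of assembling and solving for free DOFs only can be illustrated on a problem much simpler than Stokes. The self-contained sketch below solves \(-u''=0\) on (0,1) with \(u(0)=1\) and \(u(1)=2\) using three linear elements. It assumes, for this illustration only, that fixed DOFs carry negative identifiers that index an array of prescribed values; rows of fixed DOFs are never assembled, and columns of fixed DOFs are moved to the right-hand side.

```fortran
! Sketch only (no FEMPAR types): a linear system posed on free DOFs.
program free_dofs_sketch
  implicit none
  integer,  parameter :: rp = kind(1.0d0)
  integer,  parameter :: num_cells = 3, num_free = 2
  real(rp), parameter :: h = 1.0_rp/real(num_cells,rp)
  integer  :: cell_dofs(2,num_cells)   ! > 0 free, < 0 fixed (assumed convention)
  real(rp) :: fixed_values(2)
  real(rp) :: elmat(2,2), A(num_free,num_free), b(num_free), u_free(num_free)
  real(rp) :: det
  integer  :: icell, i, j, gi, gj

  cell_dofs    = reshape([-1, 1,  1, 2,  2, -2], [2,num_cells])
  fixed_values = [1.0_rp, 2.0_rp]      ! u(0) and u(1)
  elmat        = reshape([1.0_rp, -1.0_rp, -1.0_rp, 1.0_rp], [2,2])/h

  A = 0.0_rp
  b = 0.0_rp
  do icell = 1, num_cells
     do i = 1, 2
        gi = cell_dofs(i,icell)
        if (gi < 0) cycle              ! no equation for a fixed DOF
        do j = 1, 2
           gj = cell_dofs(j,icell)
           if (gj > 0) then
              A(gi,gj) = A(gi,gj) + elmat(i,j)
           else
              ! Known Dirichlet value: move its contribution to the rhs
              b(gi) = b(gi) - elmat(i,j)*fixed_values(-gj)
           end if
        end do
     end do
  end do

  ! The free system is 2x2, so Cramer's rule is enough here
  det       = A(1,1)*A(2,2) - A(1,2)*A(2,1)
  u_free(1) = (b(1)*A(2,2) - A(1,2)*b(2))/det
  u_free(2) = (A(1,1)*b(2) - A(2,1)*b(1))/det

  write(*,'(a,2(1x,f10.6))') 'free DOF values:', u_free
end program free_dofs_sketch
```

The exact solution is \(u(x)=1+x\), which linear elements reproduce, so the two free values are 4/3 and 5/3.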
Finally, the write_solution TBP (see Line 21 of Listing 45) is in charge of generating simulation results in data files for later visualization using, e.g., VisIt [95] or Paraview [96]. To this end, write_solution relies on a format-independent, extensible abstraction, referred to as output_handler_t. It lets the user register an arbitrary number of FE functions (together with the corresponding FE space these functions were generated from) and cell data arrays (e.g., material properties or error estimator indicators), to be output in the appropriate format for later visualization. Among its responsibilities, this (abstract) data type generates the data to be written to the (potentially parallel-distributed) file system in neutral, cell-oriented data structures, dealing with (potentially) non-conforming (discontinuous) and variable-degree FE spaces among cells. The user may also choose to apply a differential operator to the FE function, such as divergence, gradient, or curl, which involves further calculations on each cell, or to customize the cells to be output (e.g., only those that belong to the interior of the geometry in unfitted FE simulations) via their own implementation of cell iterators.
The generation of the actual data files in the appropriate format is the responsibility of the implementations (extensions) of output_handler_t. FEMPAR currently offers two implementations of output_handler_t (although many others could be implemented as well by the growing community of FEMPAR developers given the extensible software architecture designed). The first one, vtk_output_handler_t, lets the user generate their data in the standard, open VTK data model [97]. It currently relies on Lib_VTK_IO [98], which (for now) does not actually exploit parallel MPI I/O but instead uses a naive single file per MPI task scheme. vtk_output_handler_t is therefore the recommended option for serial computations or parallel computations on a moderate number of processors. The second one, xh5_output_handler_t, lets the user generate their data in XDMF [99]. XDMF separates the description of the raw data, referred to as “light data”, from the data itself, referred to as “heavy data”. The light data is expressed using a set of XML-based constructs that are suited to represent the distributed-memory data structures in FEMPAR. XDMF in turn supports the heavy data to be stored using HDF5 [100]. HDF5 is, among others, a data model and file format designed with the parallel I/O data challenge in mind. By means of a set of supporting open source libraries, referred to as parallel HDF5 libraries, FEMPAR takes advantage of the underlying distributed file system without having to deal with the high complexity of other lower-level implementations, such as raw MPI I/O. In particular, the latter service is provided by XH5For [101], a standalone software library that we developed from scratch. It lets the user read/write parallel partitioned FEM meshes taking advantage of the Collective/Independent MPI-IO provided by the PHDF5 library for the efficient generation of the vast amount of data typically resulting from a large-scale scientific computing simulation.
13 Conclusions
In this work, we have thoroughly described the approach that we have followed in FEMPAR in order to abstract in software the numerical approximation of problems governed by PDEs using FE methods. The mathematical framework of FEs has been split into a number of (mathematically motivated) derived data types and their interaction, resulting in a well-separated, robust, and stable set of customizable software abstractions for the development of widely applicable FE solvers. These tools equip FEMPAR users with the machinery needed to perform all the steps in the simulation pipeline, including mesh import/generation, DOF enumeration, evaluation/assembly of the algebraic system of linear equations via FE integration, solution of the linear system, and output of computational results in the appropriate format for later visualization. In order to achieve this goal, the software architecture of FEMPAR has been thoroughly designed by means of advanced OO software re-engineering techniques (including the recurrent application of OO design patterns [85, 88]) in order to increase its ease of use, extensibility, flexibility, and reusability. The FEMPAR software architecture has been implemented using the latest OO features of the Fortran03/08 standard, namely, information hiding and data encapsulation, inheritance via type extension, and dynamic run-time polymorphism. This version of the Fortran standard is already widely (and robustly) supported by most of the compilers typically available on high-end computing environments. A judiciously chosen set of programming techniques lets us achieve a reasonable trade-off between extensibility and performance, while avoiding in most cases the computational overheads frequently associated with abstract OO software libraries.

The definition of reference FEs, which relies on the concept of polytopes to define the cell topology in arbitrary dimensions, a machinery to define multidimensional polynomial functions of arbitrary order in an easy and automatic way, and a general procedure for the generation of the shape function bases and local DOFs.

The global FE space abstraction, which relies on reference FE(s) and a triangulation of the physical domain. It is responsible for defining the local-to-global DOF numbering, which must respect conformity (if needed). The FE space also provides tools for the numerical integration of (bi)linear forms, e.g., mappings from the reference to the physical space, etc., in cells and facets (for DG methods).

The FE affine operator generated after the discretization of the original problem (probably after a linearization step). The FE solution is the only root (as long as the problem is well-posed) of this operator. This operator, once the trial and test functions and the discrete (bi)linear forms of the problem at hand are defined, is represented through a matrix and a vector whose entries can be computed by numerical integration using the FE space.
The first public release of FEMPAR has almost 300K lines of (mostly) Fortran code. Thus, a document like this one, with a quite detailed description of the services provided by the library and the motivation underlying our software design, can be a very valuable resource to complement the source code, which can become overwhelming in itself. In this paper, we have restricted ourselves to the construction of FE operators for body-fitted FE spaces. However, a major (and unique compared to other FE scientific software packages available on the Internet) cornerstone of FEMPAR is an abstract OO framework for the implementation of widely applicable highly scalable multilevel DD solvers.^{40} By letting this framework be highly coupled with the numerical integration data structures of the application, on the one hand, and highly customizable, on the other, one can derive optimal preconditioners for the particular structure of the discrete operator at hand, and tackle new problems and challenges, while leveraging the distributed-memory implementation ideas [37] on which the framework is grounded. Customizable building blocks in the framework include the fine-grid to coarse-grid DOF aggregation, the constraint matrix underlying the imposition of continuity of coarse DOF functionals across coarse objects, the weighting operator underlying the injection between the continuous and discontinuous spaces, and the kind of solvers to be used for the Dirichlet and constrained Neumann local problems and the coarsest-grid global problem [103]. However, we postpone the discussion about solvers, preconditioners, data structures suitable for parallel distributed-memory computers, and other more exotic discretization techniques in FEMPAR, like B-splines and XFEM methods, to subsequent works.
Footnotes
 1.
A paradigmatic example is the design of scalable solvers for the discretization of the Maxwell equations using edge elements, which involve the discretization of additional operators (discrete gradients) and changes of basis at the reference FE level [32].
 2.
Available at http://semver.org/.
 3.
The code snippets are written in advanced OO Fortran 200X [67]. There is a close relationship between these language features and those available in the C++ language [68], and we established some code style rules to emphasize it. In particular, Fortran modules in FEMPAR are always named with the suffix _names, to indicate the analogy with namespaces in C++. Derived types, analogous to C structs or C++ classes, are always named with the suffix _t to distinguish them from instances. However, it should be kept in mind that, whereas structs are commonly used as passive data containers and classes are used to also carry methods, Fortran derived data types are used in both cases since the introduction in the 2003 standard of the so-called type-bound procedures (TBPs).
 4.
In this section, we do not distinguish between reference and physical spaces, e.g., using the \({\hat{\cdot }}\) symbol. In any case, all the following developments are usually performed at the reference FE level.
 5.
This assumption in fact applies for FEs of any kind, since the local functional spaces are already conforming and do not require an equivalence class at the cell level.
 6.
The test function spaces in the definition of the moments are always considered with respect to the corresponding domain of integration.
 7.
We note that we can take \(\varvec{J}_{\hat{F}}^T \varvec{v}\) instead of \(\varvec{J}_{\hat{F}}^T (\varvec{v}\times \varvec{n})\) in the definition of the face moments, since the rows of the Jacobian matrix are the transformation of the axes in the reference face \({\hat{F_0}}\) to the actual face \({\hat{F}}\) of the reference cell and the space of test functions is invariant to rotations.
 8.
We note that in fact the order k is not a scalar but a vector \(\varvec{k} \in \mathbb {R}^d\). In principle, the use of a vector-valued order only makes sense for n-cubes. The implementation in FEMPAR makes use of a vector-valued order, even though all entries should be the same for polytopes that are not n-cubes. We note that the use of different orders in different directions is essential to define high-order Raviart-Thomas and Nédélec elements on n-cubes. In the following presentation, we consider the scalar order case for simplicity.
 9.
In the following, one can consider two unit cubes sharing a face. Since all the concepts are logical one does not have to take into account the real shape of the cells in the physical space. On the other hand, we note that the orientation index is invariant to which of the two cells sharing the face we select as first and second cell.
 10.
Analogously, one could generate serendipity elements only by changing the generation of the multidimensional space in terms of 1D ones.
 11.
We can consider three levels of DOF numbering: the cell-wise DOF numbering (referred to as local DOFs), the subdomain-wise DOF numbering (referred to as global DOFs), and a full-domain global DOF numbering. The latter numbering is never created/required in FEMPAR. In serial environments, the latter two match.
 12.
We note that the responsibility of determining the permutation index does not lie with reference_fe_t, but with the abstraction of FEMPAR that represents the mesh of the computational domain; see Sect. 7.
 13.
This data type is implemented within the FPL software package [86].
 14.
We note that the actual conceptual representation of the triangulation in FEMPAR distinguishes between local (to a subdomain) cell identifiers and global cell identifiers (within the whole triangulation of the domain) in a distributed-memory context. The second sort of identifiers are coded as long precision integers, i.e., integer(igp), in order to accommodate simulations with more than \(2^{31}-1\) global cells.
 15.
As mentioned in the case of cells, the actual conceptual representation of the triangulation in FEMPAR distinguishes between local (to a subdomain) vef identifiers and global vef identifiers (within the whole triangulation of the domain) in a distributed-memory context. Again, the latter are long precision integers.
 16.
For completeness, let us mention that triangulation_t also offers traversals over subsets of objects conveniently selected for acceleration purposes. For example, triangulation_t provides an iterator over vertices, edges, and faces that lie on the interface among subdomains, called itfc_vef_iterator_t (i.e., a subset of the set of objects traversed by vef_iterator_t) for those subclasses suitable for parallel distributed-memory environments.
 17.
Thus, e.g., a triangulation_t subclass that internally labels the global identifiers of vefs by their dimension in increasing order would result in a traversal with such an order. This is, however, a low-level implementation detail that may change over time, and user programs relying on triangulation_t and its associated iterators should neither assume nor rely on it.
 18.
We note that \(F^{3}_i \langle F^3 \rangle \) is simply \(F^{3}_i\) and is not stored.
 19.
As is well known, considering n-cube topologies for \({\hat{K}}\), for a Lagrangian reference FE of order p and an affine geometrical map, we need a 1D Gaussian quadrature with \(p+1\) points. For tetrahedral meshes with the Duffy transformation, we need to take \(n=p+ \mathrm{ceiling}(d/2)\) to integrate exactly mass matrices (see Sect. 3.5 for more details).
 20.
Here (and in many other places) we try to maximize the granularity of each call to a deferred binding for efficiency reasons. The reader should be aware that calling to deferred bindings with the granularity of the latter approach would be very expensive, apart from preventing a number of potential compiler optimizations enabled by the former.
 21.
This represents another design decision aimed at maximizing the granularity of the calls to deferred bindings for code efficiency reasons.
 22.
In fact, 2D problems for PDEs that involve curl operators require SPACE_DIM to be equal to 3.
 23.
This can be easily fulfilled by calling the get_nodes_coordinates binding of vef_iterator_t in Listing 11.
 24.
We note that, as in Sect. 8.3, the nodes_coordinates(:) member variable of these two cell_map_t instances has to be loaded with the coordinates in physical space of the geometry nodes of the two cells surrounding the facet.
 25.
We stress, however, that the full set of services provided by fe_space_t is not actually restricted to only these two.
 26.
We note that fe_space_t is not actually in FEMPAR. It is a whole data type hierarchy rooted at base_fe_space_t, not included here for simplicity. Within this hierarchy, we have, e.g., FE space concretizations suitable for either serial or parallel distributed-memory environments. The one shown in the listing very much resembles serial_fe_space_t.
 27.
The term hash table here reflects its usual meaning, i.e., an associative array that maps keys to values.
 28.
We stress, however, that each particular reference_fe_t subclass at hand has the freedom to implement a different strategy if required.
 29.
 30.
The last requirement has been introduced to include the concept of void FEs for multifield problems in which some fields are not defined on the whole domain (see Sect. 6.5).
 31.
We stress, however, that the approach discussed in the sequel to handle the strong imposition of boundary conditions is applicable to more complex problems and discretizations, e.g., the Maxwell equations discretized with curl-conforming FE spaces.
 32.
It is assumed that the discrete Dirichlet data extension is zero on free DOFs, but other more general situations can also be accommodated.
 33.
In parallel environments, every processor only stores the fixed DOF values that belong to its associated subdomain.
 34.
Some of the algorithms in charge of computing \(E_h u_{\mathrm{D}}\) may require a different storage layout from the one of scalar_array_t (e.g., blocked storage and/or suitable for distributed-memory computers), and/or restrict themselves to those fixed DOFs of \(E_h u_{\mathrm{D}}\) corresponding to a given field (or set of fields). In such a case, \(E_h u_{\mathrm{D}}\) is scattered in place back and forth into temporary work space with the appropriate layout for the algorithm at hand in charge of computing its entries (e.g., a serial or parallel distributed-memory boundary mass problem iterative solver). It turns out that this is not such a high performance penalty, given that such algorithms already need to perform a sweep over boundary facets (e.g., in order to assemble a boundary mass matrix). During this sweep, the fixed DOFs in question can already be counted and identified.
 35.
Analytical scalar-, vector-, and tensor-valued functions are also supported in FEMPAR through the classes scalar_function_t, vector_function_t, and tensor_function_t, respectively. To implement an analytical scalar function \(f(\varvec{x})\) in FEMPAR, the user has to extend the scalar_function_t methods get_value, get_gradient (if used), etc., with the analytical expression, for a given \(\texttt {point\_t}\) that represents \(\varvec{x}\). We proceed analogously for vector and tensor fields. These data types are very simple and we omit their description here.
 36.
We note that the Stokes subclass of discrete_integration_t in Listing 41 implements the Galerkin approximation for this problem but it is independent of the FE space being used. It can be reused for any conforming inf-sup stable mixed FE method, e.g., Taylor-Hood, conformal Crouzeix-Raviart, MINI element, etc. The choice of the mixed FE space will be determined by the user in the driver, when building the Cartesian two-field FE space.
 37.
The actual ordering of the fields in the Cartesian FE space is determined by the user in the creation of the multi-field FE space, which must be consistent with the implementation of the discrete weak form. See, e.g., the creation of the mixed Taylor-Hood FE space in Lines 11–21 of Listing 46, where the first field is the velocity field and the second one is the pressure field, and the integration of the weak form, e.g., in Lines 34, 37, and 42 of Listing 41, where this numbering is respected.
 38.
The block-wise numbering creates the DOF numbering of every block independently. Thus, DOFs of different blocks can have the same block-wise DOF label.
 39.
The pressure field belongs to \(L^2(\Omega )\). Thus, a discontinuous pressure FE space could also have been considered. It would still be \(L^2(\Omega )\)-conforming. This is the case of, e.g., the conformal Crouzeix-Raviart mixed FE.
 40.
Indeed, the multilevel DD solvers within FEMPAR have been, since 2014, in the High-Q Club of the most scalable European codes, maintained by the Jülich Supercomputing Centre [102].
Notes
Acknowledgements
The authors want to thank Jesús Bonilla, Oriol Colomés, Eric Neiva, Hieu Nguyen, Marc Olm, Víctor Sande, and Francesc Verdugo (in alphabetical order) for their strong commitment to the FEMPAR project, the implementation of some of the software described in this work, and their thorough review of preliminary versions of this document. The development of a scientific library like FEMPAR would not have been feasible without excellent research funding. In this sense, SB sincerely thanks the support of the European Research Council through the Starting Grant No. 258443 (COMFUS: Computational Methods for Fusion Technology) under the FP7 Program and the two related Proof of Concept Grants No. 640957 (FEXFEM: On a free open source extreme scale finite element software) and No. 737439 (NuWaSim: On a Nuclear Waste Deep Repository Simulator) under the H2020 Program. SB gratefully acknowledges the support received from the Catalan Government through the ICREA Acadèmia Research Program.
Compliance with Ethical Standards
Conflict of interest
The authors declare that they have no conflict of interest.
References
1. Guo B, Babuska I (1986) The hp version of the finite element method. Comput Mech 1(1):21–41
2. Ainsworth M, Oden JT (2011) A posteriori error estimation in finite element analysis. Wiley, New York
3. Melenk JM, Wohlmuth BI (2001) On residual-based a posteriori error estimation in hp-FEM. Adv Comput Math 15(1–4):311–331
4. Nedelec JC (1980) Mixed finite elements in \(\cal{R}^3\). Numer Math 35(3):315–341
5. Brezzi F, Fortin M (1991) Mixed and hybrid finite element methods. Springer, Berlin
6. Arnold DN, Falk RS, Winther R (2006) Finite element exterior calculus, homological techniques, and applications. Acta Numer 15:1–155
7. Neilan M, Sap D (2016) Stokes elements on cubic meshes yielding divergence-free approximations. Calcolo 53(3):263–283
8. Hughes TJR, Cottrell JA, Bazilevs Y (2005) Isogeometric analysis: CAD, finite elements, NURBS, exact geometry and mesh refinement. Comput Methods Appl Mech Eng 194(39–41):4135–4195
9. Cockburn B, Gopalakrishnan J, Lazarov R (2009) Unified hybridization of discontinuous Galerkin, mixed, and continuous Galerkin methods for second order elliptic problems. SIAM J Numer Anal 47(2):1319–1365
10. Demkowicz L, Gopalakrishnan J (2010) A class of discontinuous Petrov-Galerkin methods. Part I: the transport equation. Comput Methods Appl Mech Eng 199(23–24):1558–1572
11. Ang J, Barrett R, Benner R, Burke D, Chan C, Cook J, Donofrio D, Hammond S, Hemmert K, Kelly S, Le H, Leung V, Resnick D, Rodrigues A, Shalf J, Stark D, Unat D, Wright N (2014) Abstract machine models and proxy architectures for exascale computing. In: Co-HPC’14 Proceedings of the 1st international workshop on hardware–software co-design for high performance computing. IEEE, pp 25–32
12. Gropp W (2015) Is MPI+X enough for exascale? Keynote for international high performance computing forum, Tianjin, China
13. Kale LV, Krishnan S (1993) CHARM++: a portable concurrent object oriented system based on C++. ACM SIGPLAN Not 28(10):91–108
14. Bauer M, Treichler S, Slaughter E, Aiken A (2012) Legion: expressing locality and independence with logical regions. In: Proceedings of the international conference on high performance computing, networking, storage and analysis, SC ’12, Los Alamitos, CA, USA, IEEE Computer Society Press, pp 66:1–66:11
15. Kaiser H, Heller T, Adelstein-Lelbach B, Serio A, Fey D (2014) HPX: a task based programming model in a global address space. In: Proceedings of the 8th international conference on partitioned global address space programming models, PGAS ’14, New York, NY, USA, ACM, pp 6:1–6:11
16. Bennett J (PI), Clay R (PM), Baker G, Gamell M, Hollman D, Knight S, Kolla H, Sjaardema G, Slattengren N, Teranishi K et al (2015) ASC ATDM level 2 milestone #5325: asynchronous many-task runtime system analysis and assessment for next generation platforms. Technical Report SAND2015-8312, Sandia National Laboratories
17. Bangerth W, Hartmann R, Kanschat G (2007) deal.II: a general-purpose object-oriented finite element library. ACM Trans Math Softw 33(4):24
18. Bangerth W, Davydov D, Heister T, Heltai L, Kanschat G, Kronbichler M, Maier M, Turcksin B, Wells D (2016) The deal.II library, version 8.4. J Numer Math 24:135–141
19. Alnæs M, Blechta J, Hake J, Johansson A, Kehlet B, Logg A, Richardson C, Ring J, Rognes ME, Wells GN (2015) The FEniCS project version 1.5. Arch Numer Softw 3(100):9–23
20. Bauman P, Stogner R (2016) GRINS: a multiphysics framework based on the libMesh finite element library. SIAM J Sci Comput 38(5):S78–S100
21. Cantwell CD, Moxey D, Comerford A, Bolis A, Rocco G, Mengaldo G, De Grazia D, Yakovlev S, Lombard JE, Ekelschot D, Jordi B, Xu H, Mohamied Y, Eskilsson C, Nelson B, Vos P, Biotto C, Kirby RM, Sherwin SJ (2015) Nektar++: an open-source spectral/hp element framework. Comput Phys Commun 192:205–219
22. MOOSE (Multiphysics Object-Oriented Simulation Environment) Framework. http://mooseframework.org/
23. MFEM: a free, lightweight, scalable C++ library for finite element methods. http://mfem.org/
24. Hecht F (2012) New development in FreeFem++. J Numer Math 20(3–4):251–265
25. Dedner A, Nolte M (2012) Construction of local finite element spaces using the generic reference elements. In: Dedner A, Flemisch B, Klöfkorn R (eds) Advances in DUNE. Springer, Berlin, pp 3–16
26. Balay S, Abhyankar S, Adams MF, Brown J, Brune P, Buschelman K, Dalcin L, Eijkhout V, Gropp WD, Kaushik D, Knepley MG, McInnes LC, Rupp K, Smith BF, Zampini S, Zhang H, Zhang H (2016a) PETSc web page
27. Balay S, Abhyankar S, Adams MF, Brown J, Brune P, Buschelman K, Dalcin L, Eijkhout V, Gropp WD, Kaushik D, Knepley MG, McInnes LC, Rupp K, Smith BF, Zampini S, Zhang H, Zhang H (2016b) PETSc users manual. Technical report ANL-95/11, Revision 3.7, Argonne National Laboratory
28. Balay S, Gropp WD, McInnes LC, Smith BF (1997) Efficient management of parallelism in object oriented numerical software libraries. In: Arge E, Bruaset AM, Langtangen HP (eds) Modern software tools in scientific computing. Birkhäuser Press, Berlin, pp 163–202
29. Falgout RD, Yang UM (2002) hypre: a library of high performance preconditioners. In: Computational science - ICCS 2002. Springer, Berlin, pp 632–641
30. Heroux MA, Bartlett RA, Howle VE, Hoekstra RJ, Hu JJ, Kolda TG, Lehoucq RB, Long KR, Pawlowski RP, Phipps ET, Salinger AG, Thornquist HK, Tuminaro RS, Willenbring JM, Williams A, Stanley KS (2005) An overview of the Trilinos project. ACM Trans Math Softw 31(3):397–423
31. The Trilinos Project. https://trilinos.org
32. Toselli A (2006) Dual-primal FETI algorithms for edge finite-element approximations in 3D. IMA J Numer Anal 26(1):96–130
33. Dohrmann CR (2007) An approximate BDDC preconditioner. Numer Linear Algebra Appl 14(2):149–168
34. Badia S, Martín AF, Principe J (2015) On the scalability of inexact balancing domain decomposition by constraints with overlapped coarse/fine corrections. Parallel Comput 50:1–24
35. Efendiev Y, Hou TY (2009) Multiscale finite element methods: theory and applications. Springer, New York
36. Klawonn A, Lanser M, Rheinbach O (2015) Toward extremely scalable nonlinear domain decomposition methods for elliptic partial differential equations. SIAM J Sci Comput 37(6):C667–C696
37. Badia S, Martín A, Principe J (2016) Multilevel balancing domain decomposition at extreme scales. SIAM J Sci Comput, pp C22–C52
38. Zampini S (2016) PCBDDC: a class of robust dual-primal methods in PETSc. SIAM J Sci Comput 38(5):S282–S306
39. Badia S, Olm M (2017) Space–time balancing domain decomposition. SIAM J Sci Comput 39(2):C194–C213
40. Brune P, Knepley M, Smith B, Tu X (2015) Composing scalable nonlinear algebraic solvers. SIAM Rev 57(4):535–565
41. Falgout R, Friedhoff S, Kolev T, MacLachlan S, Schroder J (2014) Parallel time integration with multigrid. SIAM J Sci Comput 36(6):C635–C661
42. FEMPAR: Finite Element Multiphysics PARallel solvers. https://gitlab.com/fempar/fempar
43. Elman HC, Silvester DJ, Wathen AJ (2005) Finite elements and fast iterative solvers: with applications in incompressible fluid dynamics. Oxford University Press, Oxford
44. Badia S, Martín AF, Planas R (2014) Block recursive LU preconditioners for the thermally coupled incompressible inductionless MHD problem. J Comput Phys 274:562–591
45. Cyr E, Shadid J, Tuminaro R (2016) Teko: a block preconditioning capability with concrete example applications in Navier–Stokes and MHD. SIAM J Sci Comput 38(5):S307–S331
46. Colomés O, Badia S, Codina R, Principe J (2015) Assessment of variational multiscale models for the large eddy simulation of turbulent incompressible flows. Comput Methods Appl Mech Eng 285:32–63
47. Colomés O, Badia S (2016) Segregated Runge–Kutta methods for the incompressible Navier–Stokes equations. Int J Numer Methods Eng 105(5):372–400
48. Colomés O, Badia S, Principe J (2016) Mixed finite element methods with convection stabilization for the large eddy simulation of incompressible turbulent flows. Comput Methods Appl Mech Eng 304:294–318
49. Colomés O, Badia S (2017) Segregated Runge–Kutta time integration of convection-stabilized mixed finite element schemes for wall-unresolved LES of incompressible flows. Comput Methods Appl Mech Eng 313:189–215
50. Badia S, Codina R, Planas R (2013a) On an unconditionally convergent stabilized finite element approximation of resistive magnetohydrodynamics. J Comput Phys 234:399–416
51. Badia S, Planas R, Gutiérrez-Santacreu JV (2013b) Unconditionally stable operator splitting algorithms for the incompressible magnetohydrodynamics system discretized by a stabilized finite element formulation based on projections. Int J Numer Methods Eng 93(3):302–328
52. Planas R, Badia S, Codina R (2011) Approximation of the inductionless MHD problem using a stabilized finite element method. J Comput Phys 230(8):2977–2996
53. Smolentsev S, Badia S, Bhattacharyay R, Bühler L, Chen L, Huang Q, Jin HG, Krasnov D, Lee DW, de les Valls EM, Mistrangelo C, Munipalli R, Ni MJ, Pashkevich D, Patel A, Pulugundla G, Satyamurthy P, Snegirev A, Sviridov V, Swain P, Zhou T, Zikanov O (2015) An approach to verification and validation of MHD codes for fusion applications. Fusion Eng Des 100:65–72
54. Badia S, Codina R, Planas R (2015) Analysis of an unconditionally convergent stabilized finite element formulation for incompressible magnetohydrodynamics. Arch Comput Methods Eng 22(4):621–636
55. Badia S, Hierro A (2015) On discrete maximum principles for discontinuous Galerkin methods. Comput Methods Appl Mech Eng 286:107–122
56. Badia S, Hierro A (2014) On monotonicity-preserving stabilized finite element approximations of transport problems. SIAM J Sci Comput 36(6):A2673–A2697
57. Hierro A, Badia S, Kus P (2016) Shock capturing techniques for adaptive finite elements. Comput Methods Appl Mech Eng 309:532–553
58. Badia S, Bonilla J (2017) Monotonicity-preserving finite element schemes based on differentiable nonlinear stabilization. Comput Methods Appl Mech Eng 313:133–158
59. Badia S, Bonilla J, Hierro A (2017) Differentiable monotonicity-preserving schemes for discontinuous Galerkin methods on arbitrary meshes. Comput Methods Appl Mech Eng 320:582–605
60. Badia S, Verdugo F (2017) Robust and scalable domain decomposition solvers for unfitted finite element methods. arXiv:1703.06323 [math]
61. Chiumenti M, Neiva E, Salsi E, Cervera M, Badia S, Davies C, Chen Z, Lee C (2017) Numerical modelling and experimental validation in selective laser melting (submitted)
62. Badia S, Martín AF, Principe J (2013) Implementation and scalability analysis of balancing domain decomposition methods. Arch Comput Methods Eng 20(3):239–262
63. Badia S, Martín AF, Príncipe J (2013) Enhanced balancing Neumann–Neumann preconditioning in computational fluid and solid mechanics. Int J Numer Methods Eng 96(4):203–230
64. Badia S, Nguyen H (2016) Balancing domain decomposition by constraints and perturbation. SIAM J Numer Anal 54(6):3436–3464
65. Badia S, Martín AF, Nguyen H (2016) Physics-based balancing domain decomposition by constraints for heterogeneous problems. Working paper or preprint
66. Badia S, Martín A, Principe J (2014) A highly scalable parallel implementation of balancing domain decomposition by constraints. SIAM J Sci Comput 36(2):C190–C218
67. Adams JC, Brainerd WS, Hendrickson RA, Maine RE, Martin JT, Smith BT (2009) The Fortran 2003 handbook. Springer, London
68. Rouson D, Xia J, Xu X (2011) Scientific software design: the object-oriented way, 1st edn. Cambridge University Press, New York
69. Ern A, Guermond JL (2004) Theory and practice of finite elements. Springer, Berlin
70. Brenner SC, Scott R (2010) The mathematical theory of finite element methods, 3rd edn. Springer (softcover reprint of the 2008 hardcover edition)
71. Quarteroni A (2014) Numerical models for differential problems. Springer Milan, Milano
72. Monk P (2003) Finite element methods for Maxwell’s equations. Clarendon Press, Oxford
73. Duffy M (1982) Quadrature over a pyramid or cube of integrands with a singularity at a vertex. SIAM J Numer Anal 19(6):1260–1262
74. Dunavant DA (1985) High degree efficient symmetrical Gaussian quadrature rules for the triangle. Int J Numer Methods Eng 21(6):1129–1148
75. Olm M, Badia S, Martín AF (2017) Simulation of high temperature superconductors and experimental validation. arXiv:1707.09783 [physics]
76. Badia S, Quaini A, Quarteroni A (2008a) Modular vs. non-modular preconditioners for fluid-structure systems with large added-mass effect. Comput Methods Appl Mech Eng 197(49–50):4216–4232
77. Badia S, Nobile F, Vergara C (2008b) Fluid-structure partitioned procedures based on Robin transmission conditions. J Comput Phys 227(14):7027–7051
78. Badia S, Quaini A, Quarteroni A (2008c) Splitting methods based on algebraic factorization for fluid–structure interaction. SIAM J Sci Comput 30(4):1778
79. Badia S, Nobile F, Vergara C (2009) Robin–Robin preconditioned Krylov methods for fluid-structure interaction problems. Comput Methods Appl Mech Eng 198(33–36):2768–2784
80. Rognes M, Kirby R, Logg A (2009) Efficient assembly of H(div) and H(curl) conforming finite elements. SIAM J Sci Comput 31(6):4130–4151
81. Agelek R, Anderson M, Bangerth W, Barth W (2017) On orienting edges of unstructured two- and three-dimensional meshes. ACM Trans Math Softw (to appear)
82. Bangerth W, Kayser-Herold O (2009) Data structures and requirements for hp finite element software. ACM Trans Math Softw 36(1):1–31
83. Badia S, Baiges J (2013) Adaptive finite element simulation of incompressible flows by hybrid continuous–discontinuous Galerkin formulations. SIAM J Sci Comput 35(1):A491–A516
84. Dryja M, Galvis J, Sarkis M (2007) BDDC methods for discontinuous Galerkin discretization of elliptic problems. J Complex 23(4–6):715–739
85. Freeman E, Robson E, Sierra K, Bates B (eds) (2004) Head first design patterns. O’Reilly, Sebastopol
86. FPL: Fortran parameter list. https://gitlab.com/fempar/FPL
87. Beall MW, Shephard MS (1997) A general topology-based mesh data structure. Int J Numer Methods Eng 40(9):1573–1596
88. Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: elements of reusable object-oriented software. Addison-Wesley, Boston
89. Bangerth W, Burstedde C, Heister T, Kronbichler M (2012) Algorithms and data structures for massively parallel generic adaptive finite element codes. ACM Trans Math Softw 38(2):14:1–14:28
90. Burstedde C, Wilcox LC, Ghattas O (2011) p4est: scalable algorithms for parallel adaptive mesh refinement on forests of octrees. SIAM J Sci Comput 33(3):1103–1133
91. GiD: the personal pre and post processor. http://www.gidhome.com
92. Filippone S, Buttari A (2012) Object-oriented techniques for sparse matrix computations in Fortran 2003. ACM Trans Math Softw 38(4):23:1–23:20
93. Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. Society for Industrial and Applied Mathematics
94. FLAP: Fortran command Line Arguments Parser for poor people. https://github.com/szaghi/FLAP
95. Childs H, Brugger E, Whitlock B, Meredith J, Ahern S, Pugmire D, Biagas K, Miller M, Harrison C, Weber GH, Krishnan H, Fogal T, Sanderson A, Garth C, Bethel EW, Camp D, Rübel O, Durant M, Favre JM, Navrátil P (2012) VisIt: an end-user tool for visualizing and analyzing very large data. In: High performance visualization: enabling extreme-scale scientific insight, pp 357–372
96. Ayachit U (2015) The ParaView guide: a parallel visualization application. Kitware Inc, Clifton Park
97. Schroeder W, Martin KM, Lorensen WE (1998) The visualization toolkit: an object-oriented approach to 3D graphics, 2nd edn. Prentice-Hall, Inc., Upper Saddle River
98. Lib_VTK_IO: Pure Fortran (2003+) library to write and read data conforming the VTK standard. https://gitlab.com/fempar/Lib_VTK_IO
99. XDMF: eXtensible data model and format. http://www.xdmf.org/index.php/Main_Page
100. The HDF Group. Hierarchical data format version 5. http://www.hdfgroup.org/HDF5, 2000–2017
101. XH5For: XDMF parallel partitioned mesh Input/Output on top of HDF5. https://gitlab.com/fempar/XH5For
102. Brömmel D, Wylie BJN, Frings W (2015) JUQUEEN extreme scaling workshop 2015. Technical Report FZJ-2015-01645, Jülich Supercomputing Center
103. Dohrmann CR (2003) A preconditioner for substructuring based on constrained energy minimization. SIAM J Sci Comput 25(1):246–258
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.