1 Introduction

The finite element method [19] is a widely used approach for finding approximate solutions of partial differential equations (PDEs) specified along with boundary conditions and a solution domain. A mesh with hexahedral elements is created to cover the domain and to approximate the solution over it. Then the weak form of the PDE is discretized using polynomial basis functions spread over the mesh. The hp-adaptive Finite Element Method (hp-FEM) is the most sophisticated version of FEM [9]. It generates a sequence of refined grids, providing exponential convergence of the numerical error with respect to the mesh size. In each iteration, the hp-FEM algorithm uses the coarse and the fine meshes to compute the relative error and to guide the adaptive refinement process. Selected finite elements are broken into smaller elements; this procedure is called h-refinement. The polynomial orders of approximation are also updated on selected edges, faces, and interiors; this procedure is called p-refinement. In selected cases, both h- and p-refinements are performed, and this process is called hp-refinement.

The hp-FEM is used to solve difficult PDEs, e.g., problems with local jumps in material data, boundary layers, or strong gradients, problems generating local singularities, and problems requiring elongated adaptive elements or elements whose dimensions differ by several orders of magnitude. On such meshes, iterative solvers often suffer from convergence problems.

This paper is devoted to the optimization of the element partition trees controlling the LU factorization of systems of linear equations resulting from the hp-FEM discretizations over three-dimensional meshes with hexahedral elements. We focus on a class of hp-adaptive grids that has many applications in different areas of computational science and several possible implementations [6,7,8,9, 21, 22, 26,27,28]. The LU factorization for the hp-adaptive finite element method is performed using multi-frontal direct solvers, such as the MUMPS solver [2,3,4]. This is because the matrices resulting from the discretization over the computational meshes are sparse, and a smart factorization will generate a low number of additional non-zero entries (so-called fill-in) [17, 18]. The problem of finding the optimal permutation of the sparse matrix that minimizes the fill-in (the number of new non-zero entries created during the factorization) is NP-complete [29]. In this paper, we propose a heuristic algorithm that works for arbitrary hp-adaptive grids, with finite elements of different sizes and with different distributions of polynomial orders of approximation spread over finite element edges, faces, and possibly interiors. The algorithm performs recursive weighted partitions of the graph representing the computational mesh and uses these partitions to generate an ordering that minimizes the fill-in in a quasi-optimal way. The partitions are defined by a so-called element partition tree, which can be transformed directly into the ordering.

In this paper we focus on the optimization of the sequential in-core multi-frontal solver [11,12,13], although the orderings obtained from our element partition trees could possibly also be used to speed up shared-memory [14,15,16] or distributed-memory [2,3,4] implementations. This will be the topic of our future work.

The heuristic algorithm proposed in this paper is based on the insights we gained in [1], where we proposed a dynamic programming algorithm to search for quasi-optimal element partition trees. The quasi-optimal trees obtained in [1] are too expensive to generate to be used in practice; instead, they guide our heuristic methods. From the insights garnered from this optimization process, we have proposed a heuristic algorithm that generates quasi-optimal element partition trees for arbitrary h-refined grids in 2D and 3D. In this paper, we generalize the idea presented in [1] to the class of hp-adaptive grids. The heuristic algorithm uses multilevel recursive bisections with weights assigned to element edges, faces, and interiors. Our heuristic algorithm has been implemented and tested in the three-dimensional case. It generates mesh partitions for arbitrary hp-refined meshes by issuing recursive calls to METIS_WPartGraphRecursive. That is, we use the multilevel recursive bisection implemented in METIS [20], available through the MUMPS interface [2,3,4], to find a balanced partition of a weighted graph. We construct the element partition tree by recursive calls of the graph bisection algorithm. Our algorithm for the construction of the element partition tree and the corresponding ordering differs from the orderings used by the METIS library (nested dissection) as follows. First, we use a smaller graph, built from the computational mesh, with vertices representing the finite elements and edges representing the adjacency between elements. Second, we weight the vertices of the graph by the volume of the finite elements multiplied by the polynomial orders of approximation in the center of the element. Third, we weight the edges of the graph by the polynomial orders of approximation over element faces.

Previously [23, 24], we have proposed bottom-up approaches for constructing element partition trees for h-adaptive grids. Herein, we propose an alternative algorithm, bisections-weighted-by-element-size-and-order, to construct element partition trees using a top-down approach, for hp-adaptive grids. The element size in our algorithm is a proxy for refinement level of the element. The order is related to the polynomial degrees used on finite element edges, faces and interiors.

The plan of the paper is the following. We first define the computational mesh and basis functions which illustrate how these computational grids are transformed into systems of linear equations using the finite element method. Then, we describe the idea of a new heuristic algorithm which uses bisections weighted by elements sizes and polynomial orders of approximation. We show how the ordering can be generated from our element partition tree. The next section includes numerical tests which compare the number of floating point operations and wall-clock time resulting from the execution of the multi-frontal direct solver algorithm on the alternative orderings under analysis.

2 Meshes, Matrices and Orderings for the hp-adaptive Finite Element Methods

We introduce a class of computational meshes that results from the application of an adaptive finite element method [9]. For our analysis, we start from a three-dimensional boundary-value elliptic partial differential equation problem in its weak (variational) form given by (1): Find \(u \in V\) such that

$$\begin{aligned} b\left( u,v\right) =l\left( v\right) \quad \forall v \in V \end{aligned}$$
(1)

where \(b\left( u,v\right) \) and \(l\left( v\right) \) are some problem-dependent bilinear and linear functionals, and

$$\begin{aligned} V=\{v : \int _{\varOmega } \Vert v\Vert ^2+\Vert \nabla v\Vert ^2 dx < \infty , tr\left( v\right) =0 \text { on } \varGamma _D \} \end{aligned}$$
(2)

is a Sobolev space over an open set \(\varOmega \) called the domain, and \(\varGamma _D\) is the part of the boundary of \(\varOmega \) where Dirichlet boundary conditions are defined.

For a given domain \(\varOmega \) the hp-FEM constructs a finite dimensional subspace \(V_{hp} \subset V\) with a finite dimensional polynomial basis given by \(\{ e^i_{hp}\}_{i=1,\ldots ,N_{hp}}\). The subspace \(V_{hp}\) is constructed by partitioning the domain \(\varOmega \) into three-dimensional finite elements, with vertices, edges, faces, and interiors, as well as shape functions defined over these objects.

Namely, we introduce one-dimensional shape-functions

$$\begin{aligned} \hat{\chi }_1(\xi ) = 1-\xi ; \quad \hat{\chi }_2(\xi ) = \xi ; \quad \hat{\chi }_l(\xi ) = (1-\xi )\xi (2\xi -1)^{l-3}, \quad l=3,\ldots ,{p+1} \end{aligned}$$
(3)

where p is the polynomial order of approximation, and we utilize them to define the three-dimensional hexahedral finite element \(\{\left( \xi _1,\xi _2,\xi _3\right) :\xi _i\in [0,1], i=1,2,3\}\). We define eight shape functions over the eight vertices of the element:

$$\begin{aligned} \hat{\phi }_1(\xi _1,\xi _2,\xi _3)=\hat{\chi }_1(\xi _1)\hat{\chi }_1(\xi _2)\hat{\chi }_1(\xi _3) \quad \hat{\phi }_2(\xi _1,\xi _2,\xi _3)=\hat{\chi }_2(\xi _1)\hat{\chi }_1(\xi _2)\hat{\chi }_1(\xi _3)\nonumber \\ \hat{\phi }_3(\xi _1,\xi _2,\xi _3)=\hat{\chi }_2(\xi _1)\hat{\chi }_2(\xi _2)\hat{\chi }_1(\xi _3)\quad \hat{\phi }_4(\xi _1,\xi _2,\xi _3)=\hat{\chi }_1(\xi _1)\hat{\chi }_2(\xi _2)\hat{\chi }_1(\xi _3)\nonumber \\ \hat{\phi }_5(\xi _1,\xi _2,\xi _3)=\hat{\chi }_1(\xi _1)\hat{\chi }_1(\xi _2)\hat{\chi }_2(\xi _3)\quad \hat{\phi }_6(\xi _1,\xi _2,\xi _3)=\hat{\chi }_2(\xi _1)\hat{\chi }_1(\xi _2)\hat{\chi }_2(\xi _3)\nonumber \\ \hat{\phi }_7(\xi _1,\xi _2,\xi _3)=\hat{\chi }_2(\xi _1)\hat{\chi }_2(\xi _2)\hat{\chi }_2(\xi _3)\quad \hat{\phi }_8(\xi _1,\xi _2,\xi _3)=\hat{\chi }_1(\xi _1)\hat{\chi }_2(\xi _2)\hat{\chi }_2(\xi _3) \end{aligned}$$
(4)

We also define \(p_i-1\) shape functions, indexed by \(j=1,\ldots ,p_i-1\), over each of the twelve edges of the element

$$\begin{aligned} \hat{\phi }_{9,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_{2+j}(\xi _1)\hat{\chi }_1(\xi _2)\hat{\chi }_1(\xi _3)\quad \hat{\phi }_{10,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_2(\xi _1)\hat{\chi }_{2+j}(\xi _2)\hat{\chi }_1(\xi _3)\nonumber \\ \hat{\phi }_{11,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_{2+j}(\xi _1)\hat{\chi }_2(\xi _2)\hat{\chi }_1(\xi _3)\quad \hat{\phi }_{12,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_1(\xi _1)\hat{\chi }_{2+j}(\xi _2)\hat{\chi }_1(\xi _3)\nonumber \\ \hat{\phi }_{13,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_{2+j}(\xi _1)\hat{\chi }_1(\xi _2)\hat{\chi }_2(\xi _3)\quad \hat{\phi }_{14,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_2(\xi _1)\hat{\chi }_{2+j}(\xi _2)\hat{\chi }_2(\xi _3)\nonumber \\ \hat{\phi }_{15,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_{2+j}(\xi _1)\hat{\chi }_2(\xi _2)\hat{\chi }_2(\xi _3)\quad \hat{\phi }_{16,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_1(\xi _1)\hat{\chi }_{2+j}(\xi _2)\hat{\chi }_2(\xi _3)\nonumber \\ \hat{\phi }_{17,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_1(\xi _1)\hat{\chi }_1(\xi _2)\hat{\chi }_{2+j}(\xi _3)\quad \hat{\phi }_{18,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_2(\xi _1)\hat{\chi }_1(\xi _2)\hat{\chi }_{2+j}(\xi _3)\nonumber \\ \hat{\phi }_{19,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_2(\xi _1)\hat{\chi }_2(\xi _2)\hat{\chi }_{2+j}(\xi _3)\quad \hat{\phi }_{20,j}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_1(\xi _1)\hat{\chi }_2(\xi _2)\hat{\chi }_{2+j}(\xi _3) \nonumber \\ \end{aligned}$$
(5)

where \(p_i\) is the polynomial order of approximation utilized over the i-th edge. We also define \((p_{ih}-1)\times (p_{iv}-1)\) shape functions for \(j=1,\ldots ,p_{ih}-1\) and \(k=1,\ldots ,p_{iv}-1\), over each of six faces of the element

$$\begin{aligned} \hat{\phi }_{21,jk}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_{2+j}(\xi _1)\hat{\chi }_{2+k}(\xi _2)\hat{\chi }_1(\xi _3)\quad \hat{\phi }_{22,jk}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_{2+j}(\xi _1)\hat{\chi }_{2+k}(\xi _2)\hat{\chi }_2(\xi _3)\nonumber \\ \hat{\phi }_{23,jk}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_{2+j}(\xi _1)\hat{\chi }_1(\xi _2)\hat{\chi }_{2+k}(\xi _3)\quad \hat{\phi }_{24,jk}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_2(\xi _1)\hat{\chi }_{2+j}(\xi _2)\hat{\chi }_{2+k}(\xi _3)\nonumber \\ \hat{\phi }_{25,jk}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_{2+j}(\xi _1)\hat{\chi }_2(\xi _2)\hat{\chi }_{2+k}(\xi _3)\quad \hat{\phi }_{26,jk}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_1(\xi _1)\hat{\chi }_{2+j}(\xi _2)\hat{\chi }_{2+k}(\xi _3) \nonumber \\ \end{aligned}$$
(6)

where \(p_{ih},p_{iv}\) are the polynomial orders of approximations in two directions in the i-th face local coordinates system. Finally, we define \((p_x-1)\times (p_y-1)\times (p_z-1)\) basis functions over an element interior

$$\begin{aligned} \hat{\phi }_{27,ijk}(\xi _1,\xi _2,\xi _3)=\hat{\chi }_{2+i}(\xi _1)\hat{\chi }_{2+j}(\xi _2)\hat{\chi }_{2+k}(\xi _3) \end{aligned}$$
(7)

where \((p_x,p_y,p_z)\) are the polynomial orders of approximation in the three directions, respectively, utilized over an element interior. The shape functions from adjacent elements that correspond to identical vertices, edges, or faces are merged to form global basis functions.
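The counts given above can be checked with a short script. The following is a minimal sketch in Python, not part of the original hp-FEM code; the function names `chi` and `count_shape_functions` are hypothetical, and the one-dimensional formula is assumed to hold from \(l=3\), so that \(\hat{\chi }_3(\xi )=(1-\xi )\xi \).

```python
def chi(l, xi):
    """One-dimensional shape function chi_l on [0, 1] (index l starts at 1).

    Assumption: the bubble formula is used from l = 3 onwards.
    """
    if l == 1:
        return 1 - xi
    if l == 2:
        return xi
    return (1 - xi) * xi * (2 * xi - 1) ** (l - 3)


def count_shape_functions(edge_orders, face_orders, interior_order):
    """Count the shape functions of one hexahedral hp element.

    edge_orders    -- 12 orders p_i, one per edge
    face_orders    -- 6 pairs (p_ih, p_iv), one per face
    interior_order -- triple (p_x, p_y, p_z)
    """
    if len(edge_orders) != 12 or len(face_orders) != 6:
        raise ValueError("a hexahedron has 12 edges and 6 faces")
    vertices = 8  # one function per vertex
    edges = sum(p - 1 for p in edge_orders)
    faces = sum((ph - 1) * (pv - 1) for ph, pv in face_orders)
    px, py, pz = interior_order
    interior = (px - 1) * (py - 1) * (pz - 1)
    return vertices + edges + faces + interior


# Uniform orders p = 1 and p = 2 on every edge, face, and the interior.
uniform_p1 = count_shape_functions([1] * 12, [(1, 1)] * 6, (1, 1, 1))
uniform_p2 = count_shape_functions([2] * 12, [(2, 2)] * 6, (2, 2, 2))
```

For a uniform order p the total equals \((p+1)^3\), which is a quick sanity check on the vertex, edge, face, and interior counts.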

The support interactions of the basis functions defined over the mesh determine the sparsity pattern for the global matrix.

We illustrate these concepts with a two-dimensional example. Figure 1 presents an exemplary two-dimensional mesh consisting of rectangular finite elements with vertices, edges, and interiors, as well as shape functions defined over the vertices, edges, and interiors of the rectangular finite elements of the mesh. In this example, there are first-order polynomial basis functions associated with element vertices, second-order polynomials associated with element edges, and polynomials of second order in both directions associated with element interiors. For more details we refer to [9].

Fig. 1. Exemplary four-element mesh and basis functions spread over the mesh.

Fig. 2. Matrix resulting from the four-element mesh with \(p=1\) vertex basis functions.

The interactions of the supports of the basis functions defined over the mesh define the sparsity pattern of the global matrix. In other words, the entry in the i-th row and j-th column of the matrix is non-zero if the supports of the i-th and j-th basis functions overlap. For example, for the \(p=1\) case the global matrix has the structure presented in Fig. 2. In this case, only vertex functions are present. For \(p=2\), vertex, edge, and interior basis functions all interact, and this corresponds to the case presented in Fig. 3.
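The overlap rule can be made concrete with a small script. The sketch below is in Python and assumes, for illustration only, that the four elements form a 2 x 2 arrangement with nine vertex basis functions numbered row by row; the numbering and the name `sparsity_pattern` are hypothetical and are not taken from the figures.

```python
def sparsity_pattern(element_dofs):
    """Return the set of (row, column) positions that can be non-zero.

    Two basis functions interact when they are both supported
    on at least one common element.
    """
    pattern = set()
    for dofs in element_dofs:
        for i in dofs:
            for j in dofs:
                pattern.add((i, j))
    return pattern


# Assumed 2 x 2 mesh, p = 1: vertices numbered 0..8 row by row,
# each element lists its four vertex basis functions.
elements = [
    [0, 1, 3, 4],
    [1, 2, 4, 5],
    [3, 4, 6, 7],
    [4, 5, 7, 8],
]
pattern = sparsity_pattern(elements)

# Print the pattern: 'x' marks a potentially non-zero entry.
for i in range(9):
    print("".join("x" if (i, j) in pattern else "." for j in range(9)))
```

In this assumed numbering the central vertex function interacts with all nine functions, while the functions at opposite corners of the mesh do not interact with each other.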

Traditional sparse matrix solvers construct the ordering based on the sparsity pattern of the global matrix. This is illustrated in the top path in Fig. 4. The sparse matrix is submitted to an ordering generator, e.g., the nested-dissection algorithm from the METIS library [20] or the AMD algorithm [5]. The ordering is later used to permute the sparse matrix, which results in fewer non-zero entries generated during the factorization and a lower computational cost of the factorization procedure. In the meantime, the sparse solver internally constructs the elimination tree, which guides the elimination procedure.

The alternative approach is discussed in this paper. We construct the element partition tree based on the structure of the computational mesh, using the weighted bisections algorithm. The element partition tree is then traversed in post-order to obtain the ordering, which defines how to permute the sparse matrix. This is illustrated in the bottom path in Fig. 4. For a detailed description of how to construct the ordering based on an element partition tree, we refer to Chap. 8 of the book [25].
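One plausible reading of the post-order construction is sketched below in Python. It is an illustration under stated assumptions rather than the procedure of [25]: the tree is encoded as nested pairs with element indices at the leaves, and a degree of freedom is appended to the ordering at the first visited tree node whose subtree contains every element supporting it. The name `ordering_from_tree` and the 2 x 2 example mesh are hypothetical.

```python
def ordering_from_tree(tree, element_dofs):
    """Post-order traversal of an element partition tree.

    tree         -- an element index (leaf) or a pair (left, right)
    element_dofs -- element_dofs[e] lists the basis functions of element e
    """
    # For every degree of freedom, record the elements that support it.
    owners = {}
    for e, dofs in enumerate(element_dofs):
        for d in dofs:
            owners.setdefault(d, set()).add(e)

    ordering = []
    eliminated = set()

    def visit(node):
        if isinstance(node, int):
            elems = {node}
        else:
            left, right = node
            elems = visit(left) | visit(right)
        # Eliminate dofs whose supporting elements all lie in this subtree.
        candidates = sorted({d for e in elems for d in element_dofs[e]})
        for d in candidates:
            if d not in eliminated and owners[d] <= elems:
                eliminated.add(d)
                ordering.append(d)
        return elems

    visit(tree)
    return ordering


# Assumed 2 x 2 mesh with vertex functions numbered 0..8 row by row.
elements = [[0, 1, 3, 4], [1, 2, 4, 5], [3, 4, 6, 7], [4, 5, 7, 8]]
tree = ((0, 1), (2, 3))
order = ordering_from_tree(tree, elements)
```

With this encoding, functions local to a single element come first, and functions shared across the top-level cut of the tree come last.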

The sparsity pattern of the matrix does not depend on the elliptic PDE being solved over the mesh. Rather, it depends strongly on the basis functions and the topology of the computational mesh.

Fig. 3. Matrix resulting from the four-element mesh with \(p=2\) basis functions related to element vertices, edges, faces and interiors.

Fig. 4. The construction of the ordering based on the sparsity pattern of the matrix, and based on the element partition tree.

Fig. 5. The exemplary three-dimensional mesh and its weighted graph representation.

3 Bisections-Weighted-by-Element-Size-and-Order

The bisections-weighted-by-element-size-and-order algorithm creates an initial undirected graph G for the finite element mesh. Each node of the graph corresponds to one finite element of the mesh. An edge in the graph G exists if the corresponding finite elements have a common face. Additionally, each node of the graph G has an attribute size, defined as follows. For regular meshes, the size of an element is defined as the volume of the element times the order of the element. For general three-dimensional grids, the volume attribute is defined as a function of the refinement level of an element:

$$\begin{aligned} volume = 2^{(3*(max\_refinement\_level - refinement\_level))}(p_x-1)(p_y-1)(p_z-1) \end{aligned}$$
(8)

Moreover, each edge of graph G has an attribute weight defined as the polynomial order of approximation of the face between two neighboring elements. The elements in the three-dimensional mesh may be neighbors through a vertex, an edge, or a face. In these cases, the weight of the graph edge corresponds to the vertex order (always equal to one), the edge order (defined as \(p_{edge}-1\)), or the face order (defined as \((p_{ih}-1)\times (p_{iv}-1)\)). This is illustrated in Fig. 5.
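The node and edge attributes described above can be computed as in the following Python sketch. It transcribes formula (8) and the three neighbor cases directly; the function names `element_size` and `neighbor_weight` and the string labels for the neighbor kind are assumptions made for this illustration.

```python
def element_size(refinement_level, max_refinement_level, interior_order):
    """Node attribute from formula (8): scaled volume times interior order."""
    px, py, pz = interior_order
    scale = 2 ** (3 * (max_refinement_level - refinement_level))
    return scale * (px - 1) * (py - 1) * (pz - 1)


def neighbor_weight(kind, orders=()):
    """Graph-edge attribute for two neighboring elements.

    kind   -- "vertex", "edge", or "face" (hypothetical labels)
    orders -- () for a vertex, (p_edge,) for an edge, (p_ih, p_iv) for a face
    """
    if kind == "vertex":
        return 1
    if kind == "edge":
        (p_edge,) = orders
        return p_edge - 1
    if kind == "face":
        p_ih, p_iv = orders
        return (p_ih - 1) * (p_iv - 1)
    raise ValueError(f"unknown neighbor kind: {kind}")


# An unrefined element of order (2, 2, 2) on a mesh refined up to level 2,
# and a finest-level element of order (3, 3, 3).
coarse = element_size(0, 2, (2, 2, 2))
fine = element_size(2, 2, (3, 3, 3))
shared_face = neighbor_weight("face", (3, 2))
```

The factor \(2^{3(\cdot )}\) reflects that each refinement level halves the element in all three directions, so one level of difference corresponds to a factor of eight in volume.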

The function named BisectionWeightedByElementSizeOrder() is called initially with the entire graph G, and later it is called recursively with sub-graphs of G. It generates the element partition tree. The BisectionWeightedByElementSizeOrder function is defined as follows:

figure a

Once the algorithm generates the element partition tree, we extract the ordering and call a sequential solver. Herein, we use the METIS_WPartGraphRecursive function [20] to find a balanced partition of a graph, where the weights on vertices are equal to the size values of the corresponding mesh elements. METIS_WPartGraphRecursive uses the Sorted Heavy-Edge Matching method during the coarsening phase, the Region Growing method during the partitioning phase, and the Early-Exit Boundary FM refinement method during the un-coarsening phase.
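The recursive structure of the tree construction can be sketched in Python as follows. This is only an illustration of the top-down recursion: the `bisect` function is a simple stand-in that grows a region by breadth-first search until it holds about half of the total size, and it is not the METIS multilevel algorithm. In particular, the stand-in ignores the graph-edge weights, which a real partitioner would use to reduce the weighted cut. All names are hypothetical.

```python
from collections import deque


def bisect(nodes, size, adjacency):
    """Stand-in bisection (NOT METIS): BFS region growing balanced by size."""
    inside = set(nodes)
    half = sum(size[n] for n in nodes) / 2
    part, grown = [], 0
    queue, seen = deque([nodes[0]]), {nodes[0]}
    # Keep at least one node on each side of the cut.
    while len(part) < len(nodes) - 1 and (not part or grown < half):
        if not queue:
            # Disconnected sub-graph: restart from any unseen node.
            restart = next(n for n in nodes if n not in seen)
            seen.add(restart)
            queue.append(restart)
        n = queue.popleft()
        part.append(n)
        grown += size[n]
        for m in adjacency.get(n, {}):
            if m in inside and m not in seen:
                seen.add(m)
                queue.append(m)
    chosen = set(part)
    return part, [n for n in nodes if n not in chosen]


def build_tree(nodes, size, adjacency):
    """Top-down element partition tree: leaves are element indices."""
    nodes = list(nodes)
    if len(nodes) == 1:
        return nodes[0]
    left, right = bisect(nodes, size, adjacency)
    return (build_tree(left, size, adjacency),
            build_tree(right, size, adjacency))


# Four elements in a row; element 0 is large, the others are small.
# adjacency[n][m] holds the graph-edge weight (unused by the stand-in).
size = {0: 8, 1: 1, 2: 1, 3: 1}
adjacency = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}
tree = build_tree(range(4), size, adjacency)
```

Because the partition is balanced by size rather than by element count, the single large element is separated from the three small ones at the root of the tree in this example.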

Fig. 6. Exponential convergence of the numerical error with respect to the mesh size for the model Fichera problem, obtained on the generated sequence of coarse grids. The corresponding fine grids are not presented here.

Fig. 7. Coarse and fine meshes of the hp-FEM code for the Fichera problem. Various polynomial orders of approximation on element edges, faces and interiors are denoted by different colors. (Color figure online)

4 Numerical Results

In this section, we compare the number of flops of the MUMPS multi-frontal direct solver [2,3,4] run with the ordering obtained from the element partition trees generated by the bisections-weighted-by-element-size-and-order algorithm against MUMPS with automatic selection of the ordering algorithm, configured with icntl(7) = 7. In the automatic mode, the MUMPS solver chooses either the nested-dissection [20] or the approximate minimum degree algorithm [5] for this kind of problem, depending on the properties of the sparse matrix. We focus on the model Fichera problem [9, 10]: Find the scalar temperature field u such that \(\Delta u=0\) in \(\varOmega \), where \(\varOmega \) is 7/8 of the cube, with zero Dirichlet b.c. on the internal 1/8 boundary and Neumann b.c. on the external boundary, computed from the manufactured solution. This model problem has strong singularities at the central point and along the three internal edges, and thus intensive refinements are required.

The hp-FEM code generates a sequence of hp-refined grids delivering exponential convergence of the numerical error with respect to the mesh size, as presented in Fig. 6. The comparison of flops and wall time concerns the last two grids, the coarse, and the corresponding fine grids, generated by the hp-FEM algorithm, with various polynomial orders of approximation, and element sizes, as presented in Fig. 7. It is summarized in Table 1.

Table 1. Comparison of flops and execution times between the bisections-weighted-by-element-size-and-order ordering and MUMPS with automatic generation of the ordering, on different three-dimensional adaptive grids.

To verify the flops and the wall-time performance of our algorithm against the alternative ordering provided by MUMPS, we use the PERM_IN input array of the library. The hp-FEM code generates a sequence of optimal grids. The decisions about the optimal mesh refinements are made using the reference solution on the fine grids, obtained by the global hp-refinement of the coarse grids. We compare the flops and the wall-time performance on the last two iterations performed by the adaptive algorithm, where the relative error, defined as the \(H^1\)-norm difference between the coarse and the fine mesh solutions, is less than 1.0%. In particular, on the last iteration for the Fichera problem (N = 139,425), MUMPS with its default orderings took 67.94 s, while with our ordering it took 33.06 s. The number of floating point operations required to perform the factorization was \(254\times 10^9\) as reported by MUMPS with automatic ordering, and \(111\times 10^9\) as reported by MUMPS with our ordering. We can conclude that bisections-weighted-by-element-size-and-order is an attractive alternative algorithm for the generation of the ordering based on element partition trees.
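The relative gains implied by these figures follow from simple arithmetic, shown here as a small Python check. Only the four numbers quoted above are used; the variable names are illustrative.

```python
# Figures quoted for the last iteration of the Fichera problem (N = 139,425).
time_default, time_ours = 67.94, 33.06      # seconds
flops_default, flops_ours = 254e9, 111e9    # floating point operations

# Relative reduction: 1 - (our cost / default cost).
time_reduction = 1 - time_ours / time_default
flops_reduction = 1 - flops_ours / flops_default

# Speedup factor: default cost / our cost.
time_speedup = time_default / time_ours
flops_speedup = flops_default / flops_ours

print(f"wall time reduced by {time_reduction:.1%}")
print(f"flops reduced by {flops_reduction:.1%}")
```

The wall time drops by roughly 51% and the operation count by roughly 56%, that is, speedup factors slightly above 2 in both measures.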

5 Conclusions

We introduce a heuristic algorithm called bisections-weighted-by-element-size-and-order that uses a top-down approach to construct element partition trees. We compare the orderings obtained from the trees generated by our algorithm against the alternative state-of-the-art ordering algorithms on three-dimensional hp-refined grids used to solve the model Fichera problem. We conclude that our ordering algorithm can deliver up to 50% improvement over the state-of-the-art orderings used by MUMPS, both in floating-point operation counts and in wall time.