1 Introduction

Differential equations yield solutions which necessarily possess a certain amount of regularity and are based on local interactions. There are many real-world phenomena where those inherent assumptions are violated. Therefore, over the last two decades, nonlocal models have attracted attention due to their capability of circumventing those limitations [2].

The models emerge from various applications such as anomalous diffusion [3, 4], peridynamics [5], or image processing [6, 7] and are diverse in their mathematical nature [8, 9]. That is, the integral kernels might exhibit strong singularities, have an infinite or finite interaction horizon, and can be scalar or tensor-valued. Their investigation is often accompanied by numerical experiments motivated by these applications.

In this work, we describe the discretization of nonlocal operators of the general form

$$-\mathcal {L}_\delta \textbf{u}(\textbf{x}):= ~ 2\int _{\widetilde{\Omega }}\left( \textbf{C}_\delta (\textbf{x}, \textbf{y}) \textbf{u}(\textbf{x}) - \textbf{C}_\delta (\textbf{y}, \textbf{x}) \textbf{u}(\textbf{y})\right) d\textbf{y},$$

where the kernel \(\textbf{C}_\delta\) vanishes for points farther apart than some horizon \(\delta >0\). The support of the kernel, also referred to as interaction neighborhood in the following, is often modeled by the Euclidean norm ball or suitable approximations thereof.

The purpose of our code nlfem [10] is to compute numerical solutions of related boundary value problems at a convenient speed for researchers. The discretization of these problems is achieved by a finite element approximation. The resulting variational framework comes at the price of a second integration compared to the operator in its strong form, which makes it more costly compared to classical differential operators. Further challenges in the implementation arise due to finite interaction horizons and singularities of the kernel [1].

The finite element method is one among several methods to approximate nonlocal operators. Mesh-free methods [11,12,13] are commonly applied to peridynamics problems, and there exist finite difference schemes [14] and kernel collocation methods [15] for nonlocal diffusion and mechanics.

As nlfem is a finite element implementation, we briefly review the available codes for nonlocal problems based on that method. The foundation for operators related to the fractional Laplacian (see Footnote 1) is given by boundary element methods [16]. Based on these fundamentals, a MATLAB implementation for a finite element approximation of the two-dimensional fractional Laplacian with infinite interaction is presented in [17]. Further advanced techniques to efficiently implement the fractional Laplacian are developed in [18] and incorporated into the finite element code PyNucleus [19]. The package is a recommendable alternative to our code, and we give a detailed comparison of PyNucleus and nlfem in Sect. 4.3. Apart from that, there exists a commercial finite element code as part of LS-DYNA [20] for peridynamics. For a general overview, we refer the reader to the comprehensive review paper [21] on numerical methods for nonlocal problems.

Our code nlfem assembles nonlocal operators on triangular meshes based on linear continuous Galerkin (CG) or discontinuous Galerkin (DG) ansatz spaces.

It allows the assembly of stiffness matrices for related problems with given Neumann and Dirichlet boundary data [22,23,24]. An important detail is that the null space of systems corresponding to pure Neumann boundary data is exact due to a careful approximation of kernel truncations which eliminates distortions rooted in geometric errors. The knowledge of the null space can be exploited to efficiently evaluate the pseudoinverse, for example, by rank corrections or Krylov subspace solvers.
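The rank-correction idea can be sketched as follows. This is only an illustration under stated assumptions: the matrix is a small path-graph Laplacian used as a stand-in for a pure Neumann stiffness matrix (it is not produced by nlfem), the null space is assumed to be spanned by the constant vector, and the helper name `solve_with_rank_one_correction` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def solve_with_rank_one_correction(A, b):
    """Apply the pseudoinverse of a symmetric matrix A whose null space
    is spanned by the constant vector (assumption of this sketch)."""
    n = A.shape[0]
    z = np.ones(n) / np.sqrt(n)        # normalized null-space vector
    b_proj = b - z * (z @ b)           # remove the incompatible part of b
    # A + z z^T is nonsingular and acts like A on the complement of span{z}.
    A_corr = A.toarray() + np.outer(z, z)
    u = np.linalg.solve(A_corr, b_proj)
    return u - z * (z @ u)             # enforce zero mean of the solution


# Stand-in matrix: path-graph Laplacian (symmetric, constants in the null space).
n = 6
main = np.full(n, 2.0)
main[0] = main[-1] = 1.0
A = sp.diags([-np.ones(n - 1), main, -np.ones(n - 1)], [-1, 0, 1], format="csr")
b = np.arange(n, dtype=float)

u = solve_with_rank_one_correction(A, b)
u_pinv = np.linalg.pinv(A.toarray()) @ b   # reference: dense pseudoinverse
```

For large systems one would avoid the dense correction and instead project inside a Krylov iteration; the dense solve is used here only to keep the sketch short.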

Concerning the domain, nlfem covers a variety of different configurations. It handles nonlocal interactions in nonconvex or even disconnected domains where the intersection between the interaction neighborhood \(B_\delta (\textbf{x})\) and the domain can be disconnected. For example, this is of particular interest in shape optimization with nonlocal operators [25, 26], where the domain is modified iteratively.

The kernel can be symmetric or nonsymmetric as well as scalar- or matrix-valued. For the symmetric case, our discretization of the weak form guarantees the symmetry of the stiffness matrix up to machine precision. The code can generically handle smooth kernels, and it comes with quadrature rules for fractional-type kernels as they are found in [16, 17]. In addition to that, the kernel can vary depending on the subdomain it is evaluated on. This opens the door to the assembly of interface problems determined by spatially variable kernels.

The nlfem code is most efficient for operators with interaction horizons which are comparable to the mesh size, i.e., \(h \;\leqslant\; \delta \;\leqslant\; C h\) for some \(C \;\geqslant\; 1\). This relation is often used in the nonlocal mechanics setting; for example, in [27, 28], choices such as \(\delta =3h\) or \(\delta =4h\) are used. A careful consideration of the quadrature and interpolation errors can allow smaller ratios \(\delta /h\) and increase the sparsity of the related systems.

Our implementation is based on the extensive discussions on the errors incurred by various approximations of the interaction neighborhood and quadrature rules [1, 26], all of which are implemented here. We only discuss a small selection of interaction neighborhoods in this paper, such as two approximations of the Euclidean norm ball, which provably do not deteriorate the finite element interpolation error [1]. Furthermore, the interaction neighborhood of a kernel is efficiently determined by a breadth-first traversal [29, Chapter 6] of finite elements throughout the assembly process, which avoids expensive preprocessing computations.
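The breadth-first idea can be illustrated with a short sketch. It is not the nlfem routine: the mesh is a one-dimensional chain of elements, the interaction test compares element midpoints against \(\delta + h\), and the names `bfs_interaction_set`, `adjacency`, and `interacts` are hypothetical. The traversal starts at an element and expands through neighbors only while they still interact with the start element; this simple version therefore finds only the connected part of the interaction set.

```python
from collections import deque


def bfs_interaction_set(start, adjacency, interacts):
    """Return all elements reachable from `start` through interacting elements."""
    visited = {start}
    queue = deque([start])
    found = []
    while queue:
        k = queue.popleft()
        if not interacts(start, k):
            continue                  # outside the horizon: do not expand further
        found.append(k)
        for nb in adjacency[k]:
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return found


# Toy mesh: 10 intervals of size h on [0, 1]; element i neighbors i-1 and i+1.
h, delta = 0.1, 0.25
centers = [(i + 0.5) * h for i in range(10)]
adjacency = [[j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)]


def interacts(k, l):
    # Conservative test (assumption): midpoints closer than delta + h.
    return abs(centers[k] - centers[l]) <= delta + h


result = sorted(bfs_interaction_set(4, adjacency, interacts))
```

No global neighbor search or preprocessing is needed; each start element touches only its own neighborhood plus one layer of rejected elements.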

A fundamental advantage of finite element methods is that they can be considered to be asymptotically compatible in the sense of [30]. Our code reproduces this property for the cases \(h \;\leqslant\; \delta\), \(h \sim \delta \rightarrow 0\) if the implemented interaction neighborhood does not induce geometric errors. For example, this is the case for the implemented infinity norm ball.

For convenience, the assembly is performed in multiple threads, and the main routine, which is written in C++, comes with a user-friendly Python interface. When it comes to solving, we note that the stiffness matrix is returned in compressed sparse row (CSR) format. Therefore, the user can apply any sparse solver accessible from Python and apply it to the stiffness matrix.
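As a sketch of the solving step, the following applies a direct and an iterative SciPy solver to a matrix in CSR format. The matrix here is a simple tridiagonal stand-in and not an nlfem stiffness matrix; the call that produces the actual matrix is deliberately omitted, since its signature is not given in this text.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve, cg

# Stand-in for an assembled stiffness matrix in CSR format (assumption).
n = 50
A = sp.diags(
    [-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
    [-1, 0, 1],
    format="csr",
)
f = np.ones(n)

u_direct = spsolve(A, f)     # sparse direct solver
u_cg, info = cg(A, f)        # conjugate gradients; info == 0 signals convergence

residual_direct = np.linalg.norm(A @ u_direct - f)
residual_cg = np.linalg.norm(A @ u_cg - f)
```

Conjugate gradients is only appropriate when the matrix is symmetric positive definite; for nonsymmetric kernels a solver such as GMRES would be the natural replacement.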

The remainder of this article is organized into two main sections. First, in Sect. 2, we give a precise formulation of the targeted problem class (Sect. 2.1) along with its finite element approximation (Sect. 2.2). In the following subsections, we then highlight discretization details that deserve special attention in a nonlocal framework. Second, in Sect. 3, we present various numerical examples, including diffusion and mechanics, and give a brief scaling study.

2 Finite Element Approximation

In this section, we review known results about the assembly of finite element approximations to nonlocal operators [1]. However, the current literature does not discuss in detail the influence of ball approximations on the symmetry of the system matrix. That is, a symmetric kernel might be approximated by a nonsymmetric stiffness matrix if the kernel truncation is not handled with sufficient care. The novel contribution in this section is a ball approximation which rules out geometric and quadrature-related distortions. Moreover, we present a novel assembly algorithm tailored to truncated kernels which identifies the interaction neighborhood on-the-fly without requiring a preprocessing step.

2.1 Problem Formulation

Let \(\widetilde{\Omega }\subset \mathbb {R}^{d}\) be a compact domain, and let \(\Omega \subset \widetilde{\Omega }\) be open in \(\widetilde{\Omega }\) (see Footnote 2). We note that the case \(\Omega = \widetilde{\Omega }\) is admissible, and \(\Omega\) is open in \(\mathbb {R}^{d}\) only if \(\Omega \subset {{\,\textrm{int}\,}}(\widetilde{\Omega })\). We refer to \(\Omega\) as the domain. The complement of \(\Omega\) in \(\widetilde{\Omega }\) is denoted by \(\Omega _D:= \widetilde{\Omega }\setminus \Omega\). It typically plays the role of a nonlocal Dirichlet boundary in suitable settings. The case \(\widetilde{\Omega }= \Omega\) implies \(\Omega _D=\emptyset\) and thus the absence of further constraints. In this case, the resulting nonlocal problem can be interpreted as a Neumann-type problem; see, e.g., [22, 31]. In Fig. 1, we present an exemplary configuration. Let \(\delta > 0\) be an interaction horizon. We make the following assumption about the kernel function.

Fig. 1. (a) The compact domain \(\widetilde{\Omega }\) contains \(\Omega\) (blue) and \(\Omega _D\) (red), where the latter is possibly empty. The blue boundary belongs to \(\Omega\), and the red boundary belongs to \(\Omega _D\). (b) The red elements belong to the nonlocal Dirichlet boundary \(\Omega _D\), while the blue elements belong to the domain \(\Omega\). Note that the vertices on the red lines do not belong to the interior of \(\Omega\).

Assumption 1

We assume that for the matrix-valued kernel function \(\varvec{\Psi }_\delta : \mathbb {R}^{d} \times \mathbb {R}^{d} \rightarrow \mathbb {R}^{{n}\times {n}}\), there exists an \(s\in (0, 1)\) such that

$$\begin{aligned} \varvec{\Psi }_\delta (\textbf{x}, \textbf{y})\, \mathbbm {1}_{B_\delta (\textbf{x})} (\textbf{y})\, |\textbf{x}- \textbf{y}|^{{d} + 2s} \end{aligned}$$
(1)

is bounded.

This allows singularities at the origin and also includes smooth kernels such as the constant kernel. Next, we introduce the interaction neighborhood \(B_\delta (\textbf{x})\), where

$$\begin{aligned} B_\delta (\textbf{x}) := \lbrace \textbf{y}\in \mathbb {R}^{d} ~|~ |\textbf{x}- \textbf{y}|_{\bullet } \,\leqslant\,\delta \rbrace , \end{aligned}$$
(2)

for some norm \(| \cdot |_{\bullet }\) in \(\mathbb {R}^d\). We denote the truncated kernel by

$$\begin{aligned} \textbf{C}_\delta (\textbf{x}, \textbf{y}) := \varvec{\Psi }_\delta (\textbf{x}, \textbf{y}) \mathbbm {1}_{B_\delta (\textbf{x})} (\textbf{y}). \end{aligned}$$
(3)
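Definitions (2) and (3) translate directly into code. The sketch below is illustrative only: the function names `indicator_ball` and `truncated_kernel` are hypothetical, the kernel is scalar (\(n=1\)), and the constant kernel serves as the example for \(\varvec{\Psi }_\delta\).

```python
import numpy as np


def indicator_ball(x, y, delta, ord=2):
    """Indicator of y in B_delta(x) for the vector norm selected by `ord`."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(diff, ord=ord) <= delta)


def truncated_kernel(psi, x, y, delta, ord=2):
    """C_delta(x, y) = Psi_delta(x, y) * 1_{B_delta(x)}(y), scalar case."""
    return psi(x, y) * indicator_ball(x, y, delta, ord)


def psi_const(x, y):
    # Constant kernel, an example covered by Assumption 1.
    return 1.0


x, y, delta = (0.0, 0.0), (0.3, 0.3), 0.4
c_euclid = truncated_kernel(psi_const, x, y, delta, ord=2)       # |x-y|_2 ~ 0.424 > delta
c_inf = truncated_kernel(psi_const, x, y, delta, ord=np.inf)     # |x-y|_inf = 0.3 <= delta
```

The same pair of points interacts under the infinity norm but not under the Euclidean norm, which shows how the choice of norm in (2) changes the support of the truncated kernel.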

The linear nonlocal operator under consideration acting on a function \(\textbf{u}:\mathbb {R}^{d}\rightarrow \mathbb {R}^{n}\) is then given by

$$\begin{aligned} -\mathcal {L}_\delta \textbf{u}(\textbf{x}) := ~ 2\int _{\widetilde{\Omega }} \left( \textbf{C}_\delta (\textbf{x}, \textbf{y}) \textbf{u}(\textbf{x}) - \textbf{C}_\delta (\textbf{y}, \textbf{x}) \textbf{u}(\textbf{y})\right) d\textbf{y}. \end{aligned}$$
(4)

Note that in general, the above integral does not exist, and we tacitly interpret the strong form of the operator in the Cauchy principal value sense (see Footnote 3) if necessary. By testing (4) with \(\textbf{v}:\widetilde{\Omega }\rightarrow \mathbb {R}^{n}\) where \(\textbf{v}=0\) on \(\Omega _D\), we obtain the bilinear form

$$\begin{aligned} \begin{aligned} A(\textbf{u}, \textbf{v})&:=-\int _\Omega \textbf{v}(\textbf{x})^{\!\top \!\!}\mathcal {L}_\delta \textbf{u}(\textbf{x}) d\textbf{x}\\&=2 \int _{\Omega } \textbf{v}(\textbf{x})^{\!\top \!\!}~\int _{\widetilde{\Omega }} \left( \textbf{C}_\delta (\textbf{x}, \textbf{y})\textbf{u}(\textbf{x}) - \textbf{C}_\delta (\textbf{y}, \textbf{x}) \textbf{u}(\textbf{y}) \right) d\textbf{y}d \textbf{x}. \end{aligned} \end{aligned}$$
(5)

With \(\textbf{v}=\mathbf{0}\) on \(\Omega _D\) and Fubini’s theorem (see, e.g., [22]), the bilinear form can be written as

$$\begin{aligned} \begin{aligned} A(&\textbf{u}, \textbf{v})\\ {}&= \int _{\widetilde{\Omega }} \int _{\widetilde{\Omega }} (\textbf{v}(\textbf{x})-\textbf{v}(\textbf{y}))^{\!\top \!\!}\left( \textbf{C}_\delta (\textbf{x}, \textbf{y})\textbf{u}(\textbf{x}) - \textbf{C}_\delta (\textbf{y}, \textbf{x}) \textbf{u}(\textbf{y}) \right) d\textbf{y}d \textbf{x}\\&=\int _{\widetilde{\Omega }} \int _{\widetilde{\Omega }}\mathbbm {1}_{B_\delta (\textbf{x})}(\textbf{y}) (\textbf{v}(\textbf{x})-\textbf{v}(\textbf{y}))^{\!\top \!\!}\left( \varvec{\Psi }_\delta (\textbf{x}, \textbf{y})\textbf{u}(\textbf{x}) - \varvec{\Psi }_\delta (\textbf{y}, \textbf{x}) \textbf{u}(\textbf{y}) \right) d\textbf{y}d \textbf{x}. \end{aligned} \end{aligned}$$
(6)

Note that we exploited the symmetry of the exact indicator function \(\mathbbm {1}_{B_\delta (\textbf{x})} (\textbf{y})\) to obtain this equality.

Remark 2.1

(Nonlocal convection-diffusion). The kernel introduced in (3) is not assumed to be symmetric and may therefore exhibit a nonzero anti-symmetric component. A splitting of this kernel into its symmetric and anti-symmetric parts results in an additive splitting of the nonlocal operator defined in (4), say \(\mathcal {L}_\delta = \mathcal {L}^d_\delta +\mathcal {L}^c_\delta\). Invoking the operators and terminology from the nonlocal vector calculus introduced in [8, 32], one can relate \(-\mathcal {L}^d_\delta\) and \(-\mathcal {L}^c_\delta\) to nonlocal diffusion and nonlocal convection, respectively (see, e.g., [26, Section 2.3]).

2.2 Finite Dimensional Approximation

For integers \(K,M \in \mathbb {N}\), let \(\mathcal {T}^h:= \lbrace \mathcal {E}_k\rbrace _{k=1}^{K}\) denote a subdivision of \(\widetilde{\Omega }= \Omega \cup \Omega _D\) into polyhedral finite elements with nodes \(\lbrace \textbf{x}_m \rbrace _{m=1}^M\).

Assumption 2

We assume that \(\Omega\) and \(\Omega _D\) can be exactly covered by the subdivisions \(\mathcal {T}^h_\Omega = \lbrace \mathcal {E}_k\rbrace _{k=1}^{K_\Omega }\) and \(\mathcal {T}^h_D = \lbrace \mathcal {E}_k \rbrace _{k=K_\Omega +1}^{K}\) with \(\mathcal {T}^h = \mathcal {T}^h_\Omega \cup \mathcal {T}^h_D\), respectively, where \(\mathcal {T}^h_D\) is possibly empty. Since we assume polyhedral elements, this implies that

$$\begin{aligned} \overline{\Omega } = \bigcup _{k=1}^{K_\Omega } \overline{\mathcal {E}}_k {~~~\text {and}~~~} \overline{\Omega }_D = \bigcup _{k=K_\Omega +1}^{K} \overline{\mathcal {E}}_k \end{aligned}$$
(7)

are polyhedral domains.

In the case of \(\Omega _D = \emptyset\), we have that \(K_\Omega = K\) and \(\mathcal {T}^h = \mathcal {T}^h_\Omega\). For convenience of notation, we assume an ordering of the nodes such that \(\lbrace \textbf{x}_m \rbrace _{m=1}^{M_\Omega } \subset \Omega\) and \(\lbrace \textbf{x}_m \rbrace _{m=M_\Omega +1}^{M} \subset \overline{\Omega }_D\). This assumption is not made in the implementation of nlfem.

We implement scalar and vector-valued piecewise-linear continuous and discontinuous basis functions \(\lbrace {\varvec{\phi }}_{j} \rbrace _{{j}=1}^{J}\), where \({J} ={M}\) in the case of continuous and \({J} = {({d}+1)K}\) in the case of discontinuous basis functions. Again, for convenience, we assume an ordering of the basis functions and define the corresponding finite-dimensional subspaces

$$\begin{aligned} V^h(\widetilde{\Omega }, \mathbb {R}^{n}) := {{\,\textrm{span}\,}}( \lbrace {\varvec{\phi }}_{j} \rbrace _{{j}=1}^{{J}} ), {~~\text {and}~~} V^h_c(\widetilde{\Omega }, \mathbb {R}^{n}) := {{\,\textrm{span}\,}}( \lbrace {\varvec{\phi }}_{j} \rbrace _{{j}=1}^{{J}_\Omega } ), \end{aligned}$$
(8)

where \({J}_\Omega \;\leqslant\; J\) denotes the number of basis functions which have support within \(\Omega\). In the case of continuous basis functions, the unknown coefficients correspond to the nodes lying in the interior of \(\Omega\) with respect to \(\widetilde{\Omega }\). See Fig. 1 for an illustration. Now, the evaluation of the bilinear form A on \(V^h(\widetilde{\Omega }, \mathbb {R}^{n}) \times V_c^h(\widetilde{\Omega }, \mathbb {R}^{n})\) can be written as a sum over the finite elements, i.e.,

$$\begin{aligned} \begin{aligned}&A({\varvec{\phi }}_{j}, {\varvec{\phi }}_{i})= \\&\sum _{k=1}^{K} \sum _{\ell =1}^{K} \left[ \int _{\mathcal {E}_k} \int _{\mathcal {E}_{\ell }} ({\varvec{\phi }}_{i}(\textbf{x})-{\varvec{\phi }}_{i}(\textbf{y}))^{\!\top \!\!}\left( \textbf{C}_\delta (\textbf{x}, \textbf{y}){\varvec{\phi }}_{j}(\textbf{x}) - \textbf{C}_\delta (\textbf{y}, \textbf{x}) {\varvec{\phi }}_{j}(\textbf{y})\right) d\textbf{y}d \textbf{x}\right] . \end{aligned} \end{aligned}$$
(9)

Since the kernel \(\textbf{C}_\delta\) may exhibit a truncation on some pairs \((\mathcal {E}_k, \mathcal {E}_{\ell })\), we need an appropriate approximation of its support. The number of elements in the interaction neighborhood of a point in 2d for a fixed horizon \(\delta\) is in \(\mathcal {O}(h^{-2})\), so a larger mesh size h increases the sparsity of the discrete system. Therefore, a precise approximation of the support improves efficiency, and it is desirable that the geometric error in the evaluation of (9) does not deteriorate the interpolation error of the finite element space. While the approximation of the infinity or \(\ell ^1\) norm balls does not introduce a geometric error, the one for curved neighborhoods, like the Euclidean norm ball, does.
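The \(\mathcal {O}(h^{-2})\) growth can be checked with a rough count. The sketch below is an assumption-laden illustration: it uses square cells of side h instead of triangles and counts a cell whenever its midpoint lies in the Euclidean ball; the function name `count_cells_in_ball` is hypothetical.

```python
import numpy as np


def count_cells_in_ball(delta, h):
    """Count square cells of side h whose midpoint lies within delta of the origin."""
    m = int(np.ceil(delta / h)) + 1
    c = (np.arange(-m, m) + 0.5) * h          # cell midpoints along one axis
    X, Y = np.meshgrid(c, c)
    return int(np.count_nonzero(X**2 + Y**2 <= delta**2))


delta = 1.0
n_coarse = count_cells_in_ball(delta, h=0.1)
n_fine = count_cells_in_ball(delta, h=0.05)
ratio = n_fine / n_coarse                      # close to 4 when h is halved
```

Halving h roughly quadruples the number of interacting cells per point, and hence the number of nonzero entries per row of the stiffness matrix.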

In the following, we describe two major examples of the implemented ball approximations for the Euclidean ball

$$B_{\delta }^2(\textbf{x}):= \lbrace \textbf{y}\in \mathbb {R}^d ~ | ~ |\textbf{x}- \textbf{y}|_2 \;\leqslant\; \delta \rbrace ,$$

which we call the approxcaps and the nocaps approximations. Both are based on the given finite element mesh \(\mathcal {T}^h\). By “cap,” we mean the circular segments that arise when a finite element triangle is only partially covered by the Euclidean ball; see Fig. 2. Among others, these ball approximations are investigated in [1].

Fig. 2. If a finite element triangle is only partially covered by a Euclidean ball, the intersection contains circular caps. The nocaps ball (a) omits this cap, whereas the approxcaps ball (b) retriangulates the whole intersection.

Definition 2.2

(nocaps Ball). For \(\textbf{x}\in \widetilde{\Omega }\), the nocaps ball approximation is defined as the convex hull of the intersection of the boundary \(\partial B^2_\delta (\textbf{x})\) of the Euclidean ball and the boundaries of the elements, i.e.,

$$B_{\delta }^{ncp}(\textbf{x}):= {{\,\textrm{conv}\,}}\left( \bigcup _{\mathcal {E}_\ell \in \mathcal {T}^h} \partial \mathcal {E}_\ell \cap \partial B_{\delta }^2(\textbf{x}) \right) ,$$

where the dependency on the mesh size h is omitted in the notation.

For \({d}=2\), one can show that the area of the symmetric difference of \(B^2_{\delta }(\textbf{x})\) and \(B_{\delta }^{ncp}(\textbf{x})\) is of order \(\mathcal {O}(h^2)\) as the mesh size h tends to zero. This error is commensurate with the interpolation error of a linear finite element ansatz space; details for both statements can be found in [1]. The convex hull omits circular caps which appear in the intersection \(\mathcal {E}_{\ell } \cap B_{\delta }^2(\textbf{x})\) of some elements with the neighborhood and have a significant size on coarse grids. Therefore, by adding additional points at the centers of possible caps, the geometric error can be reduced even further while still being in \(\mathcal {O}(h^2)\) for \(d=2\); see again Fig. 2. The maximum number of caps for a single intersection \(\mathcal {E}_{\ell } \cap B_{\delta }^2(\textbf{x})\) is three.

While the results in [1] are derived for a fixed horizon \(\delta\), related investigations for the local limit (\(\delta \rightarrow 0\)) with polygonal ball approximations can be found in [33].

Definition 2.3

(approxcaps Ball). Let \(\textbf{x}\in \widetilde{\Omega }\). We denote the points on the cap center of each nonempty intersection \(\mathcal {E}_{\ell } \cap \partial B_{\delta }^2(\textbf{x})\) by \(\textbf{y}_{\ell }\). The approxcaps ball is then defined by

$$\begin{aligned} B_{\delta }^{acp}&(\textbf{x}) := \\&{{\,\textrm{conv}\,}}\left( B_{\delta }^{ncp}(\textbf{x}) \cup \,\lbrace \textbf{y}_{\ell } \, | \, \textbf{y}_{\ell } \text { cap center of } \mathcal {E}_{\ell } \cap \partial B_{\delta }^2(\textbf{x}) \text { for some } \mathcal {E}_{\ell } \in \mathcal {T}^h \rbrace \right) . \end{aligned}$$

Exact quadrature rules for circular caps can be found in [34]. Here, however, the quadrature points have to be computed during run time, as the rules depend on the geometry of the cap for higher quadrature orders. Therefore, we do not consider these exact rules in nlfem.

In addition to these approximate balls, we next also introduce the infinity norm ball.

Definition 2.4

(Infinity Norm Ball). For \(\textbf{x}\in \widetilde{\Omega }\), the infinity norm ball is defined by

$$B_{\delta }^{\infty }(\textbf{x}):= \lbrace \textbf{y}\in \mathbb {R}^d~ | ~ |\textbf{x}- \textbf{y}|_\infty \;\leqslant\; \delta \rbrace .$$

Proofs for the convergence of the nonlocal Dirichlet-type problem to the classical Dirichlet problem with corresponding scaling constants for various kernel functions can be found, e.g., in [26]. Since the infinity norm ball is implemented exactly, it allows numerical tests of the expected asymptotic compatibility of the discretization scheme [30].

We finally note that the indicator function based on any implemented truncation may lack symmetry. More precisely, there might exist \(\textbf{x}\) and \(\textbf{y}\), for which

$$\begin{aligned} \mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) \ne \mathbbm {1}_{B_{\delta }^\#(\textbf{y})}(\textbf{x}), \end{aligned}$$

where \(B_{\delta }^\#(\textbf{x})\), \(\#\in \{ncp,acp,\infty \}\), represents one of the implemented truncations. This artifact stems from the ball approximation itself in the case of the nocaps and approxcaps ball, but can also be caused by the quadrature; see Remark 2.6 below.

We define the integrand

$$\begin{aligned} \Phi _{i{j} }(\textbf{x}, \textbf{y}) := ( {\varvec{\phi }}_{i}(\textbf{x}) - {\varvec{\phi }}_{i}(\textbf{y}))^{\!\top \!\!}( \varvec{\Psi }_\delta (\textbf{x}, \textbf{y}) {\varvec{\phi }}_{j}(\textbf{x}) - \varvec{\Psi }_\delta (\textbf{y}, \textbf{x}) {\varvec{\phi }}_{j}(\textbf{y}) ) \end{aligned}$$
(10)

and, based on the ball approximation and (6), the approximate bilinear form

$$\begin{aligned} \begin{aligned} A_h^{\#}({\varvec{\phi }}_{j}, {\varvec{\phi }}_{i})&:= \int _{\widetilde{\Omega }}\int _{\widetilde{\Omega }}\mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) \Phi _{ i{j}}(\textbf{x}, \textbf{y})d\textbf{y}d \textbf{x}\\&= \sum _{k=1}^{K} \sum _{\ell =1}^{K} \int _{\mathcal {E}_k} \int _{\mathcal {E}_{\ell }} \mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) \Phi _{i{j} }(\textbf{x}, \textbf{y}) d\textbf{y}d \textbf{x}. \end{aligned} \end{aligned}$$
(11)

By Fubini’s integration theorem, for sufficiently smooth basis functions \({\varvec{\phi }}_{j} \in V^h(\widetilde{\Omega }, \mathbb {R}^{n})\) and \({\varvec{\phi }}_{i} \in V_c^h(\widetilde{\Omega }, \mathbb {R}^{n})\), the approximate bilinear form defined in (11) can be written as

$$\begin{aligned}&A_h^{\#}({\varvec{\phi }}_{j}, {\varvec{\phi }}_{i}) \\&= \int _{\widetilde{\Omega }}\int _{\widetilde{\Omega }}\mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) \Phi _{ i{j}}(\textbf{x}, \textbf{y})d\textbf{y}d \textbf{x}\\&=\int _{\Omega } ~\int _{\widetilde{\Omega }}\mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) {\varvec{\phi }}_{i}(\textbf{x})^{\!\top \!\!}\left( \varvec{\Psi }_\delta (\textbf{x}, \textbf{y}) {\varvec{\phi }}_{j}(\textbf{x}) - \varvec{\Psi }_\delta (\textbf{y}, \textbf{x}) {\varvec{\phi }}_{j}(\textbf{y}) \right) d\textbf{y}~ d \textbf{x}\\&~~~-\int _{\widetilde{\Omega }} ~\int _{\Omega }\mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) {\varvec{\phi }}_{i}(\textbf{y})^{\!\top \!\!}\left( \varvec{\Psi }_\delta (\textbf{x}, \textbf{y}) {\varvec{\phi }}_{j}(\textbf{x}) - \varvec{\Psi }_\delta (\textbf{y}, \textbf{x}) {\varvec{\phi }}_{j}(\textbf{y}) \right) d\textbf{y}~ d \textbf{x}\\&= 2 \int _{\Omega } {\varvec{\phi }}_{i}(\textbf{x})^{\!\top \!\!}~\int _{\widetilde{\Omega }} \mathbbm {1}^S_{B_{\delta }^\#} (\textbf{x}, \textbf{y}) \left( \varvec{\Psi }_\delta (\textbf{x}, \textbf{y}) {\varvec{\phi }}_{j}(\textbf{x}) - \varvec{\Psi }_\delta (\textbf{y}, \textbf{x}) {\varvec{\phi }}_{j}(\textbf{y}) \right) d\textbf{y}~ d \textbf{x}, \end{aligned}$$

where

$$\begin{aligned} \mathbbm {1}^S_{B_{\delta }^\#}(\textbf{x}, \textbf{y}):=\frac{1}{2}\left( \mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) + \mathbbm {1}_{B_{\delta }^\#(\textbf{y})}(\textbf{x}) \right) . \end{aligned}$$
(12)

In view of (5), this shows that the approximate bilinear form \(A_h^{\#}\) can be interpreted as the discretization of the operator \(-\mathcal {L}_\delta\) based on the symmetrified approximate indicator function \(\mathbbm {1}^S_{B_{\delta }^\#}(\textbf{x}, \textbf{y})\) instead of \(\mathbbm {1}_{B_{\delta }(\textbf{x})}(\textbf{y})\) in the strong form. In particular, for symmetric kernels, defining the approximate bilinear form as in (11) guarantees the symmetry of the stiffness matrix \((A_h^{\#}({\varvec{\phi }}_{j}, {\varvec{\phi }}_{i}) )_{i,j}\); see also Remark 2.6 below.
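The effect of the symmetrification (12) can be seen on a toy truncation. The indicator below is a hypothetical mesh-dependent approximation in one dimension (a point \(y\) counts as inside the ball of \(x\) if the midpoint of the element containing \(y\) is within \(\delta\) of \(x\)); it is not one of the truncations implemented in nlfem and serves only to produce a nonsymmetric indicator.

```python
import numpy as np

h, delta = 0.25, 0.3   # toy mesh size and horizon


def element_midpoint(p):
    """Midpoint of the interval [k*h, (k+1)*h) that contains p."""
    return (np.floor(p / h) + 0.5) * h


def ind_approx(x, y):
    """Hypothetical mesh-based indicator; in general ind_approx(x, y) != ind_approx(y, x)."""
    return float(abs(x - element_midpoint(y)) <= delta)


def ind_sym(x, y):
    """Symmetrified indicator as in (12)."""
    return 0.5 * (ind_approx(x, y) + ind_approx(y, x))


x, y = 0.05, 0.40
a_xy = ind_approx(x, y)   # midpoint of y's element is 0.375, distance 0.325 > delta
a_yx = ind_approx(y, x)   # midpoint of x's element is 0.125, distance 0.275 <= delta
s_xy, s_yx = ind_sym(x, y), ind_sym(y, x)
```

The pair is counted from one side only, so the plain indicator takes the values 0 and 1, while the symmetrified version assigns the weight 1/2 in both directions.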

In the following, we discuss the evaluation of the local contributions

$$\begin{aligned} A^{\#}_{{k} {\ell }} ({\varvec{\phi }}_{j}, {\varvec{\phi }}_{i}) := \int _{\mathcal {E}_k}\int _{\mathcal {E}_{\ell }}\mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y})\Phi _{i{j}}(\textbf{x}, \textbf{y})d\textbf{y}~ d \textbf{x} \end{aligned}$$
(13)

to the \((i,j)\)-th entry of the stiffness matrix.

2.3 Population of the Stiffness Matrix

The assembly algorithm iterates over all pairs of elements and then adds the local contributions (13) to the stiffness matrix. We call a pair of elements \((\mathcal {E}_{k}, \mathcal {E}_{\ell })\) intersecting if

$$\overline{\mathcal {E}_k} \cap \overline{\mathcal {E}_{\ell }} \ne \emptyset .$$

If a pair is not intersecting, we call it disjoint. For the evaluation of the contributions, it is only important how the kernel behaves on a fixed pair of elements. If, for example, the kernel exhibits a singularity at the origin and the pair is disjoint, the singularity does not occur. Also, if a pair of elements \((\mathcal {E}_{k}, \mathcal {E}_{\ell })\) satisfies

$$\begin{aligned} \mathcal {E}_{\ell } \subset B_\delta (\textbf{x}) \text { for all } \textbf{x}\in \mathcal {E}_k, \end{aligned}$$

the truncation does not come into play. We therefore distinguish two cases. In the first case, the kernel has a singularity with \({s<0.5}\) or the pair of elements is disjoint. This allows us to derive the representation (14) of the bilinear form because the singularity either is not too strong or does not occur at all. In the opposite case, the kernel has a singularity with \({s \;\geqslant\; 0.5}\) and the elements are intersecting so that the expression (13) cannot be divided into smaller parts.

2.3.1 Disjoint Pairs or Kernels with \({s<0.5}\)

If \({s<0.5}\) in (1) or if the pair \((\mathcal {E}_{k}, \mathcal {E}_{\ell })\) is disjoint, the local contributions (13) can be split into four terms which are computed separately, i.e.,

$$\begin{aligned} \begin{aligned} A^{\#}_{{k} {\ell }} ({\varvec{\phi }}_{j}, {\varvec{\phi }}_{i})=&\int _{\mathcal {E}_{k}} \int _{\mathcal {E}_{\ell }}\mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) {\varvec{\phi }}_{i}(\textbf{x})^{\!\top \!\!}\varvec{\Psi }_\delta (\textbf{x}, \textbf{y}) {\varvec{\phi }}_{j}(\textbf{x}) d\textbf{y}~ d \textbf{x}\\&-\int _{\mathcal {E}_{k}} \int _{\mathcal {E}_{\ell }}\mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) {\varvec{\phi }}_{i}(\textbf{x})^{\!\top \!\!}\varvec{\Psi }_\delta (\textbf{y}, \textbf{x}) {\varvec{\phi }}_{j}(\textbf{y}) d\textbf{y}~ d \textbf{x}\\&+\int _{\mathcal {E}_{k}} \int _{\mathcal {E}_{\ell }} \mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) {\varvec{\phi }}_{i}(\textbf{y})^{\!\top \!\!}\varvec{\Psi }_\delta (\textbf{y}, \textbf{x}) {\varvec{\phi }}_{j}(\textbf{y}) d\textbf{y}~ d \textbf{x}\\&-\int _{\mathcal {E}_{k}} \int _{\mathcal {E}_{\ell }} \mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y}) {\varvec{\phi }}_{i}(\textbf{y})^{\!\top \!\!}\varvec{\Psi }_\delta (\textbf{x}, \textbf{y}) {\varvec{\phi }}_{j}(\textbf{x}) d\textbf{y}~ d \textbf{x} \\ =:&A^{\#_1}_{{k} {\ell }} ( {\varvec{\phi }}_{j}, {\varvec{\phi }}_{i}) + A^{\#_2}_{{k} {\ell }} ( {\varvec{\phi }}_{j}, {\varvec{\phi }}_{i}) \\&+ A^{\#_3}_{{k} {\ell }} ( {\varvec{\phi }}_{j}, {\varvec{\phi }}_{i}) + A^{\#_4}_{{k} {\ell }} ( {\varvec{\phi }}_{j}, {\varvec{\phi }}_{i}). \end{aligned} \end{aligned}$$
(14)

In (14), the basis functions, and not their differences, are integrated. Consequently, respective terms in (14) yield identical values for linear discontinuous and continuous elements, and it becomes apparent that, if \(s < 0.5\), the bilinear form can also be evaluated on discontinuous ansatz spaces. Therefore, nlfem allows continuous and discontinuous finite element spaces if \({s<0.5}\). In fact, the splitting (14) is not viable for \(s \;\geqslant\; 0.5\) where we require some regularity (see Footnote 4) in \({\varvec{\phi }}_{i}\) and \({\varvec{\phi }}_{j}\) for the integral to exist.
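The pointwise identity behind the splitting (14) can be verified numerically. The sketch below treats the scalar case \(n=1\) and evaluates the integrand (10) and its four expanded terms at random values; the function names are hypothetical, and the check says nothing about integrability, which is exactly what restricts the splitting to \(s<0.5\) or disjoint pairs.

```python
import numpy as np


def integrand_full(pi_x, pi_y, pj_x, pj_y, psi_xy, psi_yx):
    """Integrand (10) in the scalar case."""
    return (pi_x - pi_y) * (psi_xy * pj_x - psi_yx * pj_y)


def integrand_split(pi_x, pi_y, pj_x, pj_y, psi_xy, psi_yx):
    """The four integrands of (14), signs included, in the order #1 to #4."""
    t1 = pi_x * psi_xy * pj_x
    t2 = -pi_x * psi_yx * pj_y
    t3 = pi_y * psi_yx * pj_y
    t4 = -pi_y * psi_xy * pj_x
    return t1, t2, t3, t4


rng = np.random.default_rng(0)
values = rng.uniform(-1.0, 1.0, size=(100, 6))   # random stand-ins for phi and Psi values
max_gap = max(
    abs(integrand_full(*v) - sum(integrand_split(*v))) for v in values
)
```

Term 1 involves only values at \(\textbf{x}\) and term 3 only values at \(\textbf{y}\), which is why they are tied to \(\mathcal {E}_k\) and \(\mathcal {E}_\ell\), respectively.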

The expression \(A^{\#_1}_{{k} {\ell }} ( {\varvec{\phi }}_{j}, {\varvec{\phi }}_{i})\) is nonzero only if the element \(\mathcal {E}_k\) lies in the support of \({\varvec{\phi }}_{i}\) and \({\varvec{\phi }}_{j}\). Similarly, the contribution of \(A^{\#_3}_{{k} {\ell }} ( {\varvec{\phi }}_{j}, {\varvec{\phi }}_{i})\) is linked to the element \(\mathcal {E}_\ell\). The term \(A^{\#_2}_{{k} {\ell }} ( {\varvec{\phi }}_{j}, {\varvec{\phi }}_{i})\) is nonzero only if \({\varvec{\phi }}_{i}\) has its support on \(\mathcal {E}_k\) and \({\varvec{\phi }}_{j}\) on \(\mathcal {E}_\ell\), where the converse holds for \(A^{\#_4}_{{k} {\ell }} ( {\varvec{\phi }}_{j}, {\varvec{\phi }}_{i})\). That way, we derive the indices of the basis functions corresponding to the pair \((\mathcal {E}_k, \mathcal {E}_\ell )\) in the stiffness matrix. We note that the same indices of basis functions occur again for the pair \((\mathcal {E}_\ell , \mathcal {E}_k)\), but the contributions to the stiffness matrix are not identical as the truncation \(\mathbbm {1}_{B_{\delta }^\#(\textbf{x})}(\textbf{y})\) is not symmetric. Ultimately, the contributions of the two pairs \((\mathcal {E}_k,\mathcal {E}_\ell )\) and \((\mathcal {E}_\ell ,\mathcal {E}_k)\) together lead to the symmetrified truncation (12).

2.3.2 Kernels with \({s\;\geqslant\;0.5}\) on Intersecting Pairs

If a pair of elements is intersecting and \({s \;\geqslant\; 0.5}\), the separation of integrands in (14) is not admissible. Therefore, nlfem is restricted to continuous basis functions if \({s \;\geqslant\; 0.5}\). As the pair is intersecting, the elements are either vertex touching, edge touching, or identical. Therefore, the number of basis functions to be considered in these cases is 5, 4, or 3, respectively. If we denote the vertices by \(j_1, \dots , j_\ell\) for \(\ell = 5,4,3\), we obtain 25, 16, or 9 pairs of basis functions \(({\varvec{\phi }}_{j_\nu }, {\varvec{\phi }}_{j_{\nu '}})\) for \(\nu , \nu ' = 1, \dots , \ell\) which yield nonzero contributions to the local stiffness matrix \((A^{\#}_{{k} {\ell }} ({\varvec{\phi }}_{j_\nu }, {\varvec{\phi }}_{j_{\nu^{\prime}}}))_{1 \;\leqslant\; \nu ,\nu^{\prime} \;\leqslant\; \ell }\).
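The case distinction can be sketched by counting shared vertex indices. The snippet assumes a conforming triangulation with linear continuous basis functions, where each triangle is given as a tuple of three global vertex indices; the function name `classify_pair` is hypothetical.

```python
def classify_pair(tri_k, tri_l):
    """Classify a pair of triangles by the number of shared vertices and
    return the label together with the number of involved basis functions."""
    shared = len(set(tri_k) & set(tri_l))
    labels = {0: "disjoint", 1: "vertex touching", 2: "edge touching", 3: "identical"}
    n_basis = len(set(tri_k) | set(tri_l))   # equals 6 - shared
    return labels[shared], n_basis


cases = {
    "vertex": classify_pair((0, 1, 2), (2, 3, 4)),
    "edge": classify_pair((0, 1, 2), (1, 2, 3)),
    "identical": classify_pair((0, 1, 2), (0, 1, 2)),
    "disjoint": classify_pair((0, 1, 2), (3, 4, 5)),
}
# Size of the local stiffness matrix for each intersecting case.
local_sizes = {name: n * n for name, (_, n) in cases.items() if name != "disjoint"}
```

The squared counts reproduce the 25, 16, and 9 pairs of basis functions stated above.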

2.4 Quadrature

The quadrature rules need to work for kernels with truncations and singularities. However, the implemented quadrature rules in nlfem do not account for both at the same time on a fixed pair of elements. We therefore require the following assumption.

Assumption 3

The quadrature rules for singular kernels assume that for all intersecting pairs of elements \((\mathcal {E}_k, \mathcal {E}_\ell )\), it holds that \(\mathcal {E}_{\ell } \subset B_\delta (\textbf{x})\) for all \(\textbf{x}\in \mathcal {E}_k\).

If Assumption 3 is violated, the interaction neighborhood of the kernel is overestimated, which deteriorates the finite element error. However, for sufficiently small mesh sizes, say \(3h \;\leqslant\; \delta\), the problem does not occur.

Remark 2.5

Specific quadrature rules for singular kernels are required for sufficiently strong singularities only. Kernels like the peridynamics kernel (28) can be integrated by simply avoiding divisions by zero. More precisely, by choosing fixed quadrature rules for the inner and outer integral which do not have shared quadrature points, we make sure that \((\textbf{x}-\textbf{y}) \ne \textbf{0}\) in (28) for any kernel evaluation during the assembly process.
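The idea of Remark 2.5 can be illustrated with two standard three-point rules on the reference triangle which share no points. The sketch below is not the rule pair used in nlfem; the names `INTERIOR_POINTS`, `MIDPOINTS`, and `peridynamic_kernel` are hypothetical, and the truncation by the indicator function in (27) is omitted for brevity.

```python
import math

# Two degree-2 rules on the reference triangle (weights 1/6 each):
# one with interior points, one with the edge midpoints.
INTERIOR_POINTS = [(1 / 6, 1 / 6), (2 / 3, 1 / 6), (1 / 6, 2 / 3)]
MIDPOINTS = [(0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]


def peridynamic_kernel(x, y, delta):
    """Evaluate c_delta (x - y)(x - y)^T / |x - y|^3 from (28) as a 2x2 nested list."""
    z = (x[0] - y[0], x[1] - y[1])
    r = math.hypot(z[0], z[1])
    if r == 0.0:
        raise ZeroDivisionError("kernel evaluated at x == y")
    c = 3.0 / delta**3
    return [[c * z[a] * z[b] / r**3 for b in range(2)] for a in range(2)]


# On an identical pair, outer and inner points never coincide,
# so every kernel evaluation is finite.
values = [peridynamic_kernel(x, y, 0.1) for x in INTERIOR_POINTS for y in MIDPOINTS]
```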

Given Assumption 3 and the fact that disjoint pairs do not require a treatment with regularizing integral transforms [16], we can evaluate all contributions as given in Algorithm 1.

Algorithm 1
figure a

Evaluate \(A^{\#}_{{k} {\ell }}\) in (13)

In any case, the quadrature is performed by pulling back the domain of integration to a reference domain

$$\begin{aligned} \widehat{\mathcal {E}} \times \widehat{\mathcal {E}},~~ \text { where }~~ \widehat{\mathcal {E}} := \lbrace \widehat{\textbf{x}}\in \mathbb {R}^d ~|~ \widehat{\textbf{x}} \;\geqslant\; 0, \sum _{\iota =1}^{{d}} \widehat{\textbf{x}}_\iota \;\leqslant\; 1\rbrace . \end{aligned}$$

The affine linear mapping \(\chi _k: \widehat{\mathcal {E}} \rightarrow \mathcal {E}_k\) with Jacobian determinant \(|\chi _k|\) from the reference to a physical element allows us to define the pullback

$$\begin{aligned} \widehat{\Phi }_{k\ell , i{j}}(\widehat{\textbf{x}}, \widehat{\textbf{y}}) := \Phi _{ i{j}}(\chi _{k}(\widehat{\textbf{x}}), \chi _{\ell }(\widehat{\textbf{y}})) |\chi _k| |\chi _\ell |, \end{aligned}$$

the inner integral

$$\begin{aligned} \widehat{\mathcal {K}}_{k\ell , i{j}}^\#(\widehat{\textbf{x}}) := \int _{\widehat{\mathcal {E}}} \mathbbm {1}_{B_{\delta }^\#} (\chi _{k}(\widehat{\textbf{x}}), \chi _{\ell }(\widehat{\textbf{y}})) \widehat{\Phi }_{k\ell , i{j}}(\widehat{\textbf{x}},\widehat{\textbf{y}}) d\widehat{\textbf{y}} \end{aligned}$$
(15)

and the local contribution to the \(({i},j)\)-th entry of the stiffness matrix

$$\begin{aligned} \widehat{A}_{k\ell , i{j}}^\# := \int _{\widehat{\mathcal {E}}} \widehat{\mathcal {K}}_{k\ell , i{j}}^\#(\widehat{\textbf{x}}) d\widehat{\textbf{x}} \end{aligned}$$
(16)

on the reference element.
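A minimal sketch of the pullback for \(d=2\) may help to fix ideas. It assumes that the truncation is inactive, i.e., the indicator function in (15) equals one on the whole pair, and it uses the edge-midpoint rule for both integrals; the names `affine_map` and `integrate_pair` are hypothetical and do not refer to nlfem routines.

```python
def affine_map(vertices):
    """Return the map chi and |det D chi| for a triangle given by three vertices."""
    (a0, a1), (b0, b1), (c0, c1) = vertices
    det = abs((b0 - a0) * (c1 - a1) - (c0 - a0) * (b1 - a1))

    def chi(x_hat):
        return (a0 + (b0 - a0) * x_hat[0] + (c0 - a0) * x_hat[1],
                a1 + (b1 - a1) * x_hat[0] + (c1 - a1) * x_hat[1])

    return chi, det


# Edge-midpoint rule on the reference triangle (exact for quadratic polynomials).
POINTS = [(0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
WEIGHTS = [1 / 6, 1 / 6, 1 / 6]


def integrate_pair(phi, tri_k, tri_l):
    """Approximate the double integral of phi(x, y) over E_k x E_l, cf. (16)."""
    chi_k, det_k = affine_map(tri_k)
    chi_l, det_l = affine_map(tri_l)
    total = 0.0
    for x_hat, wx in zip(POINTS, WEIGHTS):
        inner = 0.0  # discrete counterpart of the inner integral (15)
        for y_hat, wy in zip(POINTS, WEIGHTS):
            inner += phi(chi_k(x_hat), chi_l(y_hat)) * det_k * det_l * wy
        total += inner * wx
    return total


tri_k = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
tri_l = [(2.0, 0.0), (4.0, 0.0), (2.0, 2.0)]
area_product = integrate_pair(lambda x, y: 1.0, tri_k, tri_l)
```

With the constant integrand, the result is the product of the two element areas, which serves as a simple sanity check.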

2.4.1 Quadrature for Kernel Truncations

For some pairs \((\mathcal {E}_k, \mathcal {E}_{\ell })\), we find that \(\mathcal {E}_{\ell }\) is only partially covered by the interaction neighborhood \(B_{\delta }^\# (\textbf{x})\) for some \(\textbf{x}\in \mathcal {E}_k\), so that \(\mathcal {E}_{\ell } \cap B_{\delta }^\# (\textbf{x}) \subsetneq \mathcal {E}_{\ell }\). In this case, the ball approximations \(B_{\delta }^{ncp}(\textbf{x})\) and \(B_{\delta }^{acp}(\textbf{x})\) as well as the ball \(B_{\delta }^{\infty }(\textbf{x})\) require a retriangulation of the integration domain \(\mathcal {E}_{\ell } \cap B_{\delta }^\# (\textbf{x})\). We denote the set of elements which result from such a retriangulation (see Footnote 5) by

$$\begin{aligned} \mathcal {T}_{h, \ell }^\#(\textbf{x}) := \lbrace \tilde{\mathcal {E}}_{\tilde{\ell }} \rbrace _{\tilde{\ell }=1}^{L_\ell }, \end{aligned}$$

so that \(\bigcup \mathcal {T}_{h, \ell }^\#(\textbf{x}) = \mathcal {E}_{\ell } \cap B_{\delta }^\# (\textbf{x})\). In view of Fig. 2, the set \(\mathcal {T}_{h, \ell }^\#(\textbf{x})\) collects the elements making up the shaded region \(\bigcup \mathcal {T}_{h, \ell }^\#(\textbf{x})\) for the nocaps ball (Fig. 2a) and the approxcaps ball (Fig. 2b), respectively. Let \(\lbrace \widehat{\textbf{y}}_{q}, d_{{\widehat{\textbf{y}}}_q}\rbrace _{q=1}^{Q}\) denote a quadrature rule on the reference element \(\widehat{\mathcal {E}}\), and let \(\widehat{\textbf{x}}\in \widehat{\mathcal {E}}\) be some reference point. Then, the fully discrete inner integral from (15) reads as

$$\begin{aligned} \widehat{\mathcal {K}}_{k\ell , i{j}}^\#(\widehat{\textbf{x}}) \approx \sum _{\tilde{\ell }= 1}^{L_{\ell }} \sum _{ q= 1}^{Q} \widehat{\Phi }_{k\tilde{\ell }, i{j}}(\widehat{\textbf{x}},\widehat{\textbf{y}}_{q}) d_{{\widehat{\textbf{y}}}_q}. \end{aligned}$$
(17)

In finite element implementations, the function values of basis functions are usually pre-computed at the quadrature points and stored. However, if a retriangulation is necessary, the physical coordinates of the quadrature points \(\chi _{\tilde{\ell }}(\widehat{\textbf{y}}_{q})\) for \(q= 1,\dots ,Q\) on some element \(\tilde{\mathcal {E}}_{\tilde{\ell }} \in \mathcal {T}_{h, \ell }^\#(\textbf{x})\) are known at runtime only and the basis functions are evaluated at the corresponding points.

Now, let \(\lbrace \widehat{\textbf{x}}_{ p}, d_{{\widehat{\textbf{x}}}_p}\rbrace _{p=1}^{P}\) be a quadrature rule with points \(\widehat{\textbf{x}}_{ p}\) and weights \(d_{{\widehat{\textbf{x}}}_p}\) for the reference element of the outer integral. Then, with (17), the discretized version of the local contribution to the \((i, j)\)–th entry of the stiffness matrix (16) is obtained by

$$\begin{aligned} \widehat{A}_{k\ell , i{j}}^\# \approx \sum _{p=1}^P\left( \sum _{\tilde{\ell } = 1}^{L_{\ell }} \sum _{ q= 1}^{Q} \widehat{\Phi }_{k\tilde{\ell }, i{j}}(\widehat{\textbf{x}}_{p},\widehat{\textbf{y}}_{q}) d_{{\widehat{\textbf{y}}}_q}\right) d_{{\widehat{\textbf{x}}}_p}. \end{aligned}$$
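For the infinity norm ball, the retriangulation can be sketched as a polygon clipping followed by a fan triangulation. The code below is an illustration under the assumption that \(\mathcal {E}_{\ell }\) is a triangle with counterclockwise or clockwise ordered vertices; it uses Sutherland-Hodgman clipping, which is a stand-in and not necessarily the routine implemented in nlfem, and all function names are hypothetical.

```python
def clip_to_infinity_ball(triangle, center, delta):
    """Clip a triangle against the box [c - delta, c + delta]^2 (Sutherland-Hodgman)."""

    def clip(points, axis, bound, keep_below):
        out = []
        for i in range(len(points)):
            p, q = points[i], points[(i + 1) % len(points)]
            p_in = p[axis] <= bound if keep_below else p[axis] >= bound
            q_in = q[axis] <= bound if keep_below else q[axis] >= bound
            if p_in:
                out.append(p)
            if p_in != q_in:  # the edge crosses the clipping line
                t = (bound - p[axis]) / (q[axis] - p[axis])
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
        return out

    polygon = list(triangle)
    for axis in (0, 1):
        polygon = clip(polygon, axis, center[axis] + delta, True)
        polygon = clip(polygon, axis, center[axis] - delta, False)
    return polygon


def triangle_area(a, b, c):
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def retriangulate(triangle, center, delta, tol=1e-14):
    """Return the sub-triangles covering the intersection of the triangle and the ball."""
    polygon = clip_to_infinity_ball(triangle, center, delta)
    fan = [(polygon[0], polygon[i], polygon[i + 1]) for i in range(1, len(polygon) - 1)]
    return [t for t in fan if triangle_area(*t) > tol]  # drop degenerate pieces


sub_triangles = retriangulate([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], (0.0, 0.0), 0.25)
```

Each sub-triangle then receives its own affine map and quadrature points, which is why the basis functions have to be evaluated at runtime as described above.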

2.4.2 Quadrature for Singularities

Some kernel functions exhibit a singularity at the origin, which lies in the integration domain of \(\widehat{\mathcal {K}}^\#_{k\ell , i{j} }(\widehat{\textbf{x}})\) whenever the pair \((\mathcal {E}_{k}, \mathcal {E}_{\ell })\) intersects. We therefore require regularizing integral transforms [16]. Assumption 3 allows us to ignore possible truncations in (16) and to simply evaluate

$$\begin{aligned} \widehat{A}_{k\ell , i{j}}^\# = \int _{\widehat{\mathcal {E}}} \int _{\widehat{\mathcal {E}}} \widehat{\Phi }_{k\ell , i{j}}(\widehat{\textbf{x}}, \widehat{\textbf{y}}) d\widehat{\textbf{y}}d\widehat{\textbf{x}}. \end{aligned}$$
(18)

Intersecting pairs \((\mathcal {E}_{\ell },\mathcal {E}_k)\) can be vertex touching, edge touching, or identical. For each of those cases, we apply integral transforms, which again pull back subsets of the integration domain \(\widehat{\mathcal {E}} \times \widehat{\mathcal {E}}\) to the unit cube \((0,1)^4\). The transformations are well established and applied, for example, in the field of boundary element methods. Details can be found in [16, 17]. Note again that in the case of singular kernels with \({s \;\geqslant\; 0.5}\), the implemented routines cannot evaluate the expression in (18) for discontinuous basis functions.

Remark 2.6

(Asymmetry due to quadrature). We have mentioned in Sect. 2.2 that the approximate indicator function based on the approxcaps and nocaps balls may generally lack symmetry and therefore would lead to nonsymmetric stiffness matrices if one used representation (5) of the bilinear form. Also, truncations invoked by the infinity norm ball \(B_\delta ^\infty (\textbf{x})\), which can be implemented exactly, can be nonsymmetric on sufficiently irregular grids. These two observations hold true independent of the symmetry of the kernel function \(\varvec{\Psi }_\delta (\textbf{x},\textbf{y})\). However, it is a desirable feature that the symmetry of the kernel, i.e., the self-adjointness of the operator, is transported through the discretization process. In other words, for a symmetric kernel, we expect a symmetric stiffness matrix. This behavior is more intricate and related to the approximation of the interaction domain of a finite element by the union of interaction neighborhoods of quadrature points; see Fig. 3. Thus, we use by default representation (11) instead, which as stated above corresponds in strong form to the operator \(-\mathcal {L}_\delta\) based on the approximate indicator function (12). Thereby, we guarantee the stiffness matrix to be symmetric up to machine precision for any symmetric kernel function. Also, an important consequence of the symmetrification is that the null space of the stiffness matrix related to pure Neumann-type problems contains the constant vectors up to machine precision. This allows the application of projected Krylov subspace methods or rank-1 corrections to efficiently evaluate a pseudoinverse of the stiffness matrix.

Fig. 3
figure 3

The figure shows the approximation of the interaction domain of a single triangular element by interaction neighborhoods of three quadrature points for the infinity normball \(B_\delta ^\infty\)

We summarize the different cases for the quadrature of kernels in Table 1. The table distinguishes between different degrees of singularity s in the case of \(d=2\) as well as intersecting and disjoint pairs of elements. When \(s > 0\), we apply regularizing integral transforms represented by \(\clubsuit\) in the table. This quadrature rule is applied for intersecting pairs and requires Assumption 3. Whenever \({s<0.5}\), we allow for discontinuous and continuous basis functions. When \({s \;\geqslant\; 0.5}\), only continuous basis functions are provided. For more details, we refer to [18]. If \(-1 < s \;\leqslant\; 0\), the singularity might be so weak that it suffices to simply avoid zero-divisions on intersecting pairs, which is represented by \(\diamondsuit\) in the table. However, the quadrature rule \(\clubsuit\) can also be applied. If \(s\;\leqslant\;-1\), there is no singularity at all, and the symbol \(\heartsuit\) represents a standard quadrature rule. The rules, \(\heartsuit\) and \(\diamondsuit\), respect the kernel truncation, i.e., they do not require Assumption 3.

Table 1 The table gives an overview of the different quadrature rules which can be applied for dimension \(d=2\) in nlfem
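The case distinction described above can be summarized in a few lines of code. The sketch reflects our reading of the preceding paragraph for \(d=2\) only; the function `select_quadrature` and its string labels are hypothetical and not part of the nlfem interface.

```python
def select_quadrature(s, intersecting, continuous_basis=True):
    """Return 'club', 'diamond', or 'heart' for a pair of elements (d = 2)."""
    if not intersecting or s <= -1:
        # Disjoint pairs, or no singularity at all: standard rule which
        # respects the kernel truncation.
        return "heart"
    if s > 0:
        if s >= 0.5 and not continuous_basis:
            raise ValueError("s >= 0.5 requires continuous basis functions")
        # Regularizing integral transforms; requires Assumption 3.
        return "club"
    # -1 < s <= 0: weak singularity, it suffices to avoid zero-divisions.
    return "diamond"
```

For \(-1 < s \;\leqslant\; 0\), the sketch returns only the cheaper option, although the rule \(\clubsuit\) would also be admissible.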

2.5 Traversal of the Interaction Neighborhood

The local contributions \(\widehat{A}_{k\ell , i{j}}^\#\) to the stiffness matrix, defined in (13), can be nonzero even for pairs of disjoint finite elements \((\mathcal {E}_{k}, \mathcal {E}_{\ell })\), which we then refer to as interacting elements in the following. Unstructured meshes and the various supports of kernel functions call for a flexible routine to identify interacting elements. The identification can be accomplished if the assembly follows a breadth-first search. A breadth-first search consecutively traverses the nodes in a graph starting in a root node. It first visits the nodes which are directly adjacent to the root node. Only then does it visit the (unvisited) neighbors of the successive nodes. The search stops when a certain criterion is met or no further successors can be found. We refer to the neighbors of the root node as the first layer of the search.

In order to describe the algorithm, we define the set of immediate neighbors of some element \(\mathcal {E}_{k}\) by

$$\begin{aligned} \mathcal {N}(\mathcal {E}_{k}) = \lbrace \mathcal {E}_{\ell } ~\vert ~ \overline{{\mathcal {E}}_k} \cap \overline{{\mathcal {E}}_{\ell }} \ne \emptyset \rbrace \end{aligned}$$
(19)

and the adjacency graph \(\mathbb {T}_{adj}:= \left( \mathcal {T}^h,E_{adj} \right)\) of the finite element mesh with vertices \(\mathcal {T}^h\) and edges

$$E_{adj}:= \lbrace (\mathcal {E}_k, \mathcal {E}_{\ell }) \in \mathcal {T}^h\times \mathcal {T}^h \, | \,{\mathcal {E}_k \in \mathcal {N}(\mathcal {E}_\ell ) }\rbrace .$$

The graph \(\mathbb {T}_{adj}\) can be understood as the dual graph of the finite element mesh, and its vertices are therefore given by the elements. We additionally define the interaction graph \(\mathbb {T}_S:= \left( \mathcal {T}^h, E_S\right)\) with vertices \(\mathcal {T}^h\) and edges

$$\begin{aligned} E_S := \left\{ \left. (\mathcal {E}_k, \mathcal {E}_{\ell }) \in \mathcal {T}^h\times \mathcal {T}^h \, \right| \, \widehat{A}_{k\ell , i{j}}^\# \ne 0 ~\text {for some}~ {\varvec{\phi }}_{j},{\varvec{\phi }}_{i} \in V^h(\widetilde{\Omega }, \mathbb {R}^n) \right\} . \end{aligned}$$
(20)

The set of edges \(E_{adj}\) is contained in \(E_{S}\) because touching elements interact. Hence, the sets of vertices of \(\mathbb {T}_{adj}\) and \(\mathbb {T}_S\) are identical, and all edges of \(\mathbb {T}_{adj}\) are contained in \(\mathbb {T}_S\). In that sense, \(\mathbb {T}_{adj}\) is a subgraph of \(\mathbb {T}_S\). Furthermore, for a fixed element \(\mathcal {E}_k\), let us denote the set of all interacting elements by \(\mathcal {T}^h_k:= \{\left. \mathcal {E}_\ell \in \mathcal {T}^h \right| (\mathcal {E}_k, \mathcal {E}_{\ell }) \in E_S\}\). We then define the subgraph \(\mathbb {T}_{S_k}:=\left( \mathcal {T}^h_k, E_{S_k} \right)\) of \(\mathbb {T}_S\) with vertices \(\mathcal {T}^h_k\) and edges given by

$$\begin{aligned} E_{S_k} := \left\{ \left. (\mathcal {E}_k, \mathcal {E}_{\ell }) \in \mathcal {T}^h \times \mathcal {T}^h \, \right| \, (\mathcal {E}_k, \mathcal {E}_{\ell }) \in E_S \right\} . \end{aligned}$$

The adjacency graph \(\mathbb {T}_{adj}\) can be computed and stored efficiently, while the interaction graph \(\mathbb {T}_S\) exhibits storage requirements which are comparable to the full stiffness matrix. We also note that the breadth-first search described below naturally identifies intersecting pairs \((\mathcal {E}_k, \mathcal {E}_\ell )\), as they only occur in the first layer of the traversal. This important built-in feature is used to identify the intersection cases for element pairs mentioned in Sect. 2.3.
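To illustrate why \(\mathbb {T}_{adj}\) is cheap to compute, the following sketch builds the neighbor sets (19) from a vertex-to-element map. It assumes a conforming mesh, so that two closed elements intersect exactly if they share a vertex; the function name `build_adjacency` is hypothetical.

```python
from collections import defaultdict


def build_adjacency(elements):
    """Map each element index k to the sorted indices of N(E_k), see (19).

    `elements` is a list of tuples of vertex indices. Note that k itself
    belongs to N(E_k) by definition.
    """
    vertex_to_elements = defaultdict(set)
    for k, vertices in enumerate(elements):
        for v in vertices:
            vertex_to_elements[v].add(k)

    adjacency = {}
    for k, vertices in enumerate(elements):
        neighbors = set()
        for v in vertices:
            neighbors |= vertex_to_elements[v]
        adjacency[k] = sorted(neighbors)
    return adjacency


adjacency = build_adjacency([(0, 1, 2), (1, 3, 2), (3, 4, 5), (6, 7, 8)])
```

The storage is proportional to the number of elements times the (bounded) number of touching neighbors, independent of the horizon \(\delta\).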

Assumption 4

We assume without loss of generality that each of the graphs \(\mathbb {T}_{S_k}\), \(\mathbb {T}_{adj}\), and \(\mathbb {T}_S\) is connected.

It is clear that Assumption 4 can be violated if \(\widetilde{\Omega }\) is not connected or even if it is connected as depicted in Fig. 4a. As a remedy, the implementation allows adding artificial vertices and elements to the mesh which connect certain parts of the mesh. These artificial vertices and elements change the mesh topology so that the interaction domain of an element, and hence \(\mathbb {T}_{S_k}\), is guaranteed to be connected. Of course, the artificial elements do not enter any integration routines. A straightforward option is to embed \(\widetilde{\Omega }\) into a bounded and convex hold-all domain \(\widehat{\Omega }\subset \mathbb {R}^{d}\), which guarantees that the graphs \(\mathbb {T}_{S_k}\), \(\mathbb {T}_{adj}\), and \(\mathbb {T}_S\) are always connected; see Fig. 4b.

Fig. 4
figure 4

Connected domain causing disconnected neighborhoods. In a, the interaction neighborhood of an element is disconnected even though the domain is connected. In b, a convex hold-all domain \(\widehat{\Omega }\) makes it possible to account for interactions within disconnected interaction neighborhoods

In that sense, Assumption 4 does not cause any loss of generality. If Assumption 4 holds, we can recover the subgraph \(\mathbb {T}_{S_k}\) with a truncated breadth-first traversal of \(\mathbb {T}_{adj}\) starting in the root node \(\mathcal {E}_k\) as given in Algorithm 2. To that end, we define an empty queue Q of elements to which we can append new elements and from which we read and remove them in a first-in first-out ordering. As a first step, the root node \(\mathcal {E}_k\) is appended, so that \(Q = [\mathcal {E}_k]\). While Q is not empty, an element \(\mathcal {E}_{\widetilde{\ell }}\) is read and removed from the queue. Its immediate neighbors \(\mathcal {E}_{\ell }\in \mathcal {N}(\mathcal {E}_{\widetilde{\ell }})\) are obtained from the adjacency graph, and the integrals \(\widehat{A}_{k\ell , i{j}}^\#\) are successively evaluated for all unvisited \(\mathcal {E}_{\ell } \in \mathcal {N}(\mathcal {E}_{\widetilde{\ell }})\). The element \(\mathcal {E}_{\ell }\) is added to the queue whenever the integral does not vanish, and it is finally marked as visited. This procedure automatically truncates the search to interacting elements for any connected interaction neighborhood.

Algorithm 2
figure b

Traversal of interaction neighborhood
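The traversal described above can be sketched in a few lines. The sketch is not the nlfem implementation: the callback `interacts(k, l)` is a hypothetical stand-in for the evaluation of the integrals \(\widehat{A}_{k\ell , i{j}}^\#\) and simply reports whether they vanish, and the identical pair is assumed to always interact.

```python
from collections import deque


def traverse_interaction_neighborhood(root, adjacency, interacts):
    """Truncated breadth-first search over the adjacency graph.

    adjacency: dict mapping an element index to its immediate neighbors.
    interacts: callable (k, l) -> bool, True if the integral does not vanish.
    Returns the interacting elements of `root` in order of discovery.
    """
    visited = {root}
    interacting = [root]  # assumption: the identical pair always interacts
    queue = deque([root])
    while queue:
        current = queue.popleft()  # first-in first-out
        for ell in adjacency[current]:
            if ell in visited:
                continue
            visited.add(ell)
            if interacts(root, ell):
                interacting.append(ell)
                queue.append(ell)  # only interacting elements are expanded
    return interacting


# Toy example: a chain of 10 unit intervals [l, l + 1] and horizon 2.5.
chain = {i: [j for j in (i - 1, i, i + 1) if 0 <= j < 10] for i in range(10)}
found = traverse_interaction_neighborhood(
    5, chain, lambda k, l: max(0, abs(k - l) - 1) < 2.5)
```

In the toy example, the elements 1 and 9 are evaluated but not expanded, which is where the search is truncated.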

3 Numerical Examples

We solve a truncated fractional-type steady-state diffusion problem, a linear bond-based peridynamics equation [5, 35] and a steady-state diffusion problem based on the infinity normball. In the first two examples, we demonstrate the convergence rate of the approximate solutions to a manufactured solution as the mesh size \(h\rightarrow 0\). In the latter example, we demonstrate the asymptotic compatibility of the discretization scheme for a vanishing horizon \(\delta\).

To this end, let \(\Omega := (0,0.5)^2 \subset \mathbb {R}^2\). We define the interaction domain of \(\Omega\) by \(\Omega _D:= [ -\delta ,~ 0.5 + \delta ]^2 ~ {\setminus } ~ \Omega ,\) so that \(\widetilde{\Omega }= [-\delta ,0.5+\delta ]^2\). We then want to solve the nonlocal Dirichlet-type problem

$$\begin{aligned} {\left\{ \begin{array}{ll} -\mathcal {L}_\delta \textbf{u}= \textbf{f}&{} \text {in } \Omega ,\\ \textbf{u}= \textbf{g}&{} \text {in } \Omega _D.\\ \end{array}\right. } \end{aligned}$$
(21)

We define the function spaces

$$\begin{aligned}&V(\widetilde{\Omega }, \mathbb {R}^n) = \lbrace \textbf{u}\in L^2(\widetilde{\Omega }, \mathbb {R}^n) : \Vert \textbf{u}\Vert _V < \infty \rbrace ,\\&V_c(\widetilde{\Omega }, \mathbb {R}^n) = \lbrace \textbf{u}\in V(\widetilde{\Omega }, \mathbb {R}^n) : \textbf{u}= 0 \text { in } \Omega _D\rbrace , \end{aligned}$$

where

$$\begin{aligned} \Vert \textbf{u}\Vert _V^2 = A(\textbf{u}, \textbf{u}) + \Vert \textbf{u}\Vert _{L^2(\widetilde{\Omega })}^2. \end{aligned}$$

For given data \(\textbf{f}\in L^2(\Omega , \mathbb {R}^n)\) and \(\textbf{g}:= \textbf{v}_{|\Omega _D}\), where \(\textbf{v}\in V(\widetilde{\Omega }, \mathbb {R}^n)\), we call \(\textbf{u}\in V(\widetilde{\Omega }, \mathbb {R}^n)\) the weak solution to problem (21), if

$$\begin{aligned} \begin{aligned} A(\textbf{u}, \textbf{v})&= (\textbf{f}, \textbf{v})~~ \text {for all}~~\textbf{v}\in V_c(\widetilde{\Omega }, \mathbb {R}^n),\\ \text {and}~~ \textbf{u}&= \textbf{g}\text { in }\Omega _D. \end{aligned} \end{aligned}$$
(22)

The well-posedness of problem (22) for various choices of kernel functions can be found, e.g., in [8, 21, 22, 36,37,38]. By exploiting the known values of \(\textbf{u}\) on the Dirichlet domain, we can rewrite the first line in (22) as

$$\begin{aligned} A_{\Omega \Omega }(\textbf{u}, \textbf{v}) = (\textbf{f}, \textbf{v}) - A_{\Omega \Omega _D}(\textbf{g}, \textbf{v}), \end{aligned}$$
(23)

where

$$\begin{aligned} A_ {\Omega \Omega }(\textbf{u}, \textbf{v}) :=&\int _{\Omega } \int _{\Omega } (\textbf{v}(\textbf{x})-\textbf{v}(\textbf{y}))^{\!\top \!\!}\left( \textbf{C}_\delta (\textbf{x}, \textbf{y})\textbf{u}(\textbf{x}) - \textbf{C}_\delta (\textbf{y}, \textbf{x}) \textbf{u}(\textbf{y}) \right) d\textbf{y}d \textbf{x}\\&+ 2\int _{\Omega } \int _{\Omega _D} \textbf{v}(\textbf{x})^{\!\top \!\!}\textbf{C}_\delta (\textbf{x}, \textbf{y})\textbf{u}(\textbf{x}) d\textbf{y}d \textbf{x}\end{aligned}$$

and

$$\begin{aligned} A_ {\Omega \Omega _D}(\textbf{g}, \textbf{v}):= - 2 \int _{\Omega } \int _{\Omega _D} \textbf{v}(\textbf{x})^{\!\top \!\!}\textbf{C}_\delta (\textbf{y}, \textbf{x}) \textbf{g}(\textbf{y}) d\textbf{y}d \textbf{x}. \end{aligned}$$

In the stiffness matrix, the splitting (23) can be naturally obtained by separating the columns corresponding to the degrees of freedom from the columns corresponding to the nodes on the boundary \(\Omega _D\).
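On the algebraic level, the splitting amounts to moving the columns which belong to \(\Omega _D\) to the right-hand side. The following sketch uses plain nested lists and hypothetical names (`split_dirichlet`, `interior`, `dirichlet`); it assumes that the rows of the assembled matrix correspond to the test functions in \(\Omega\) and the columns to all nodes in \(\widetilde{\Omega }\).

```python
def split_dirichlet(A, f, g, interior, dirichlet):
    """Return the reduced matrix and right-hand side of the splitting (23).

    A: list of rows, one per degree of freedom in Omega, with one column per node.
    f: load vector, g: Dirichlet values ordered like the column indices `dirichlet`.
    """
    A_interior = [[row[j] for j in interior] for row in A]
    rhs = [f_i - sum(row[j] * g_j for j, g_j in zip(dirichlet, g))
           for row, f_i in zip(A, f)]
    return A_interior, rhs


A = [[2.0, -1.0, -1.0, 0.0],
     [-1.0, 2.0, 0.0, -1.0]]
A_interior, rhs = split_dirichlet(A, f=[0.5, 1.0], g=[1.0, 2.0],
                                  interior=[0, 1], dirichlet=[2, 3])
```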

3.1 Truncated Fractional-Type Diffusion

We choose the scalar-valued translationally invariant and symmetric kernel function \(\gamma ^s_\delta : \mathbb {R}^2 \times \mathbb {R}^2 \rightarrow \mathbb {R}\) given by [22]

$$\begin{aligned} \gamma ^s_\delta (\textbf{x}, \textbf{y}) = c_{s,\delta } ~ \frac{1}{|\textbf{x}- \textbf{y}|^{2+2s}}, \end{aligned}$$
(24)

where

$$\begin{aligned}&c_{s, \delta } = \frac{2-2s}{\pi \delta ^{2-2s}} ~~\text { and }~~ s \in (0,1). \end{aligned}$$
(25)

Then, the scalar-valued truncated fractional-type diffusion operator reads as

$$\begin{aligned} -\mathcal {L}^s_\delta u(\textbf{x}) := \int _{B_\delta (\textbf{x})} \gamma ^s_\delta (\textbf{x}, \textbf{y}) (u(\textbf{x}) - u(\textbf{y})) d\textbf{y}. \end{aligned}$$
(26)

The well-posedness of problem (22) for this choice of kernel is studied in [22]. The constant \(c_{s,\delta }\) depends on \(\delta\) and s and is chosen such that the operator converges to the classical Laplacian as \(\delta \rightarrow 0\); see, e.g., [26, Lemma 7.4.1]. With another choice of the constant, convergence to the fractional Laplacian as \(\delta \rightarrow \infty\) can also be obtained [39].

In this example, we choose the manufactured solution \(u(\textbf{x}) = x_1^2 x_2 + x_2^ 2\) and set \(f(\textbf{x}):= -\Delta u(\textbf{x})= -2(x_2 + 1)\) in \(\Omega\) and \(g(\textbf{x}):= u(\textbf{x})\) on \(\Omega _D\). Since the correctly scaled nonlocal operator equals the classical Laplacian operator on polynomials of order up to three (see, e.g., [26]), we have that \(u(\textbf{x})\) is the solution of problem (22). Furthermore, we choose \(s:=0.5\), \(\delta = 0.2\), and various mesh sizes h as given in the tables below. In view of Table 1, for pairs of disjoint elements, we use as \(\heartsuit\) a 7-point quadrature rule (see Footnote 6) for each integral, i.e., for the outer and the inner one. Six of the seven points are located on the boundary of the triangle. This choice has proven to be advantageous in the case of truncated kernels because the resulting interaction neighborhoods centered at these points better approximate the interaction domain of the triangle; see also Fig. 3, and for more details, see [1]. For intersecting pairs, we need a quadrature rule on \((0,1)^4\) after the integral transformations are performed. For \(\clubsuit\), we choose a tensor product of a 5-point Gauss quadrature rule, which is sufficiently accurate to preserve the expected convergence rates.

The convergence rates on a continuous Galerkin ansatz space are shown in Table 2. For our choice of \(s = 0.5\), a discontinuous Galerkin ansatz space is also conforming, and the results are presented in Table 3. In both settings, we observe the expected second-order convergence as the mesh size \(h\rightarrow 0\); see, e.g., [22]. Note that the given examples violate Assumption 3 in the first stage of the experiments as \(2 h > \delta\). We see that the first rates in both tables are affected by this.
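Observed convergence rates of this kind are commonly computed from the errors on two consecutive meshes. The sketch below shows the usual formula; the function name is hypothetical, and the error values in the example are synthetic, not the entries of Tables 2 and 3.

```python
import math


def convergence_rates(mesh_sizes, errors):
    """Observed rates log(e_i / e_{i+1}) / log(h_i / h_{i+1}) for consecutive meshes."""
    return [math.log(errors[i] / errors[i + 1])
            / math.log(mesh_sizes[i] / mesh_sizes[i + 1])
            for i in range(len(errors) - 1)]


# Synthetic errors behaving exactly like 3 h^2 give rates equal to 2.
mesh_sizes = [0.1, 0.05, 0.025]
rates = convergence_rates(mesh_sizes, [3.0 * h**2 for h in mesh_sizes])
```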

Table 2 Convergence rates for the truncated fractional diffusion operator (26), \(\delta =0.2\), and \(h \rightarrow 0\) in a continuous Galerkin ansatz space
Table 3 Convergence rates for the truncated fractional diffusion operator (26), \(\delta =0.2\), and \(h \rightarrow 0\) in a discontinuous Galerkin ansatz space

3.2 Bond-Based Peridynamics

The translationally invariant and symmetric linear peridynamic kernel is given by [38]

$$\begin{aligned} \textbf{C}_\delta (\textbf{x}, \textbf{y}) := c_{\delta } ~ \textbf{C}(\textbf{x}- \textbf{y})~ \mathbbm {1}_{B_\delta (\textbf{x})}(\textbf{y}), \end{aligned}$$
(27)

where

$$\begin{aligned} \textbf{C}(\textbf{x}- \textbf{y}) := \frac{(\textbf{x}- \textbf{y}) (\textbf{x}- \textbf{y})^\top }{|\textbf{x}- \textbf{y}|^3} ~~\text {and}~~ c_\delta := \frac{3}{\delta ^3}. \end{aligned}$$
(28)

The corresponding linear peridynamics operator then reads as

$$\begin{aligned} -\mathcal {P}_\delta \textbf{u}(\textbf{x}) := c_{\delta }\int _{B_\delta (\textbf{x})}\textbf{C}(\textbf{x}- \textbf{y}) (\textbf{u}(\textbf{x}) - \textbf{u}(\textbf{y})) d\textbf{y}. \end{aligned}$$
(29)

In [38], the well-posedness is established for problem (22), where \(\textbf{f}\in L^2(\Omega , \mathbb {R}^{d})\) and \(\textbf{g}\in L^2(\Omega _D, \mathbb {R}^{d})\). For the given constant in (28), it is also shown there that the peridynamics operator \(-\mathcal {P}_\delta\) converges to the local Navier operator

$$\begin{aligned} -\mathcal {P}_0 \textbf{u}(\textbf{x}):= -\frac{\pi }{4} \Delta \textbf{u}(\textbf{x}) - \frac{\pi }{2} \nabla {{\,\textrm{div}\,}}\textbf{u}(\textbf{x}) \end{aligned}$$
(30)

as \(\delta \rightarrow 0\). A similar convergence result is also obtained for the corresponding weak solutions. Thus, in the given example, we choose the manufactured polynomial solution \(\textbf{u}(\textbf{x}):= (x_2^2, x_1^2x_2)\), set \(\textbf{f}(\textbf{x}):=-\mathcal {P}_0 \textbf{u}(\textbf{x})= -\frac{\pi }{2}\left( 1 + 2x_1, x_2 \right)\) in \(\Omega\) and as Dirichlet constraints choose \(\textbf{g}(\textbf{x}):= \textbf{u}(\textbf{x})\) on \(\Omega _D\). Similarly to the diffusion case above, we again obtain that \(\textbf{u}(\textbf{x})\) is the solution of (22) due to the correct scaling of the operator [38]. The results are presented in Tables 4 and 5 for a continuous and discontinuous Galerkin ansatz, respectively. In both settings we observe second-order convergence as the mesh size \(h\rightarrow 0\).
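For completeness, the forcing term can be verified by a direct computation. For \(\textbf{u}(\textbf{x}) = (x_2^2, x_1^2x_2)\), we find

$$\begin{aligned} \Delta \textbf{u}(\textbf{x}) = (2, 2x_2), \quad {{\,\textrm{div}\,}}\textbf{u}(\textbf{x}) = x_1^2, \quad \nabla {{\,\textrm{div}\,}}\textbf{u}(\textbf{x}) = (2x_1, 0), \end{aligned}$$

so that (30) gives

$$\begin{aligned} -\mathcal {P}_0 \textbf{u}(\textbf{x}) = -\frac{\pi }{4} (2, 2x_2) - \frac{\pi }{2} (2x_1, 0) = -\frac{\pi }{2}\left( 1 + 2x_1, x_2 \right) . \end{aligned}$$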

Table 4 Convergence rates for the peridynamics operator (29), \(\delta =0.1\), and \(h \rightarrow 0\) in a continuous Galerkin ansatz space
Table 5 Convergence rates for peridynamics operator (29), \(\delta =0.1\), and \(h \rightarrow 0\) in a discontinuous Galerkin ansatz space

3.3 Diffusion with Infinity Ball Truncation

Here, we consider a constant kernel truncated by the infinity normball. Specifically, we choose the constant to be \(c_\delta ^\infty := \frac{3}{4 \delta ^ 4}\), which ensures the convergence to the local Dirichlet problem for vanishing horizon \(\delta \rightarrow 0\); see, e.g., [26]. The nonlocal operator is then given by

$$\begin{aligned} -\mathcal {L}^\infty _\delta u (\textbf{x}) := c_\delta ^\infty \int _{B_\delta ^\infty (\textbf{x})} (u(\textbf{x}) - u(\textbf{y})) d \textbf{y}. \end{aligned}$$
(31)

The truncation by the infinity normball is implemented without geometric error, which allows numerical tests of the asymptotic compatibility of the finite element discretization [30]. For the numerical experiment, we choose the manufactured solution \(u(\textbf{x}) = \sin ( 4 \pi x_1) \sin (4 \pi x_2 )\), set \(f(\textbf{x}):= -\Delta u(\textbf{x}) = 32 \pi ^2 \sin (4 \pi x_1) \sin (4 \pi x_2 )\) in \(\Omega\) and \(g(\textbf{x}):= u(\textbf{x})\) on \(\Omega _D\). Note that, in contrast to the previous examples, the function \(-\mathcal {L}_\delta ^\infty u\) differs from \(-\Delta u\) for a fixed \(\delta >0\). Consequently, the solutions of the nonlocal and the local Dirichlet problem differ from each other for each \(\delta >0\) and coincide only in the limit \(\delta \rightarrow 0\). In fact, the solution to (31) changes as \(\delta\) changes, and we can observe the convergence in Table 6. We run tests for a fixed mesh size h and vanishing \(\delta\) (see Table 7), as well as for a horizon-dependent mesh size \(h= \sqrt{2}\delta\) and vanishing \(\delta\) (see Table 6). In both cases, we observe second-order convergence as \(\delta \rightarrow 0\).

Table 6 Convergence rates and timings for the infinity normball (31), \(\delta = \sqrt{2}h\), and \(h,\delta \rightarrow 0\) in a continuous Galerkin ansatz space
Table 7 Convergence rates for the infinity normball (31), fixed h, and \(\delta \rightarrow 0\) in a continuous Galerkin ansatz space

3.4 Nonlocal Convection-Diffusion

Nonsymmetric kernels can model convective effects, and nlfem can assemble the respective systems. In this example, we consider a convection-diffusion operator

$$\begin{aligned} -\mathcal {L}^{cd}_\delta u (\textbf{x}) = 2\int _{\widetilde{\Omega }} \left( \vartheta (\textbf{x},\textbf{y}) u(\textbf{x}) - \vartheta (\textbf{y},\textbf{x}) u(\textbf{y}) \right) d \textbf{y}, \end{aligned}$$
(32)

with \(\vartheta = \vartheta ^{d} + \vartheta ^{c}\) consisting of a constant diffusion term

$$\begin{aligned} \vartheta ^{d}(\textbf{x}, \textbf{y}) = \epsilon ~ \mathbbm {1}_{B_{\delta }^{\infty }(\textbf{x})}(\textbf{y}) \frac{3}{4 \delta ^4}, \end{aligned}$$
(33)

and an antisymmetric kernel

$$\begin{aligned} \vartheta ^{c}(\textbf{x}, \textbf{y}) = \mathbbm {1}_{B_{\delta }^{\infty }(\textbf{x})}(\textbf{y}) \frac{\textbf{b}^T(\textbf{y}-\textbf{x})}{2} \frac{3}{4 \delta ^4}, \end{aligned}$$
(34)

with \(\textbf{b}=(1,1)^T\), to account for the convective effects. The factor \(\epsilon >0\) controls the influence of the diffusion in the model, and it is set to \(10^{-3}\). In order to show the convergence of the approximation, we choose the solution \(u(\textbf{x}) = x_1^2 + x_2^2\) for which we obtain \(-\mathcal {L}^{cd}_\delta u (\textbf{x}) = -4\epsilon + 2x_1 + 2x_2=:f(\textbf{x})\). Note that the manufactured solution in fact fulfills

$$\begin{aligned} -\mathcal {L}^{cd}_\delta u (\textbf{x}) = -\epsilon ~\Delta u(\textbf{x}) + \textbf{b}^T \nabla u(\textbf{x}). \end{aligned}$$
(35)

The above differential operator is, in fact, a special case of a local convection-diffusion operator because the vector field \(\textbf{b}\) is constant in \(\textbf{x}\) and thus has zero divergence. The second-order convergence is shown in Table 8. Another example with \(\epsilon = 0.1\) is plotted in Fig. 5.
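The identity (35) can be checked numerically for the manufactured solution. The following sketch, which is independent of nlfem, evaluates (32) with the kernels (33) and (34) at a point whose infinity norm ball is assumed to lie inside \(\widetilde{\Omega }\). It uses a tensorized two-point Gauss rule, which is exact here because the integrand is a polynomial of degree at most three in each variable; the function name is hypothetical.

```python
import math


def convection_diffusion_operator(u, x, delta, eps, b=(1.0, 1.0)):
    """Evaluate (32) with kernels (33), (34) at x by two-point Gauss quadrature."""
    c = 3.0 / (4.0 * delta**4)
    nodes = (-delta / math.sqrt(3.0), delta / math.sqrt(3.0))
    weight = delta  # one-dimensional Gauss weight on [-delta, delta]
    total = 0.0
    for z1 in nodes:
        for z2 in nodes:
            y = (x[0] + z1, x[1] + z2)
            bz = b[0] * z1 + b[1] * z2        # b^T (y - x)
            theta_xy = c * (eps + 0.5 * bz)   # theta(x, y)
            theta_yx = c * (eps - 0.5 * bz)   # theta(y, x)
            total += weight * weight * (theta_xy * u(x) - theta_yx * u(y))
    return 2.0 * total


x = (0.3, 0.2)
value = convection_diffusion_operator(lambda p: p[0]**2 + p[1]**2, x,
                                      delta=0.1, eps=1e-3)
```

Up to rounding errors, the computed value agrees with \(-4\epsilon + 2x_1 + 2x_2\) at the chosen point.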

Fig. 5
figure 5

Contour plot of the solution to a nonlocal convection-diffusion problem with constant forcing term \(f(x) \equiv 1\), Dirichlet zero boundary conditions, horizon \(\delta = 0.01\), and kernel \(\vartheta\) with \(\epsilon = 0.1\)

Table 8 Convergence of the nonlocal convection-diffusion problem for \(\delta = 0.1\) and \(h \rightarrow 0\)

The code nlfem can only assemble stiffness matrices for kernels with a sufficiently large diffusive part \(\vartheta ^{d}\), and the convergence of the convection-diffusion problem becomes unstable for \(\epsilon =10^{-5}\). The modeling of stronger convective effects requires upwind integration schemes like the one presented in [40].

3.5 Parallel Complexity of the Assembly Process

The number of elements in each interaction neighborhood grows quadratically in 2d if the diameter of the elements h is decreased for fixed \(\delta\). Thus, the assembly of the system matrix with retriangulations as described in the beginning of Sect. 2.4.1 becomes a costly procedure. Therefore, a matrix-free approach is too expensive, and we store the stiffness matrix in a sparse format. Furthermore, it makes sense to share the work among multiple threads. The multithreading is implemented using OpenMP [41], and the work is shared by a partitioning of finite elements as given in Algorithm 2, line 1. Due to the nonlocality of the operator, several threads might need to access identical entries in the global stiffness matrix at the same time to store their contribution. OpenMP allows so-called critical sections to organize the manipulation of shared variables. This avoids write conflicts, but it would tremendously slow down the computation. In order to avoid a critical section during the assembly, each thread separately allocates its portion of the global stiffness matrix. The size of the overlap among the submatrices in the threads, i.e., the amount of additional memory requirements due to the parallelization, depends on the nonlocal overlap of the subdomains. We can therefore reduce the memory requirements by partitioning the domain with metis [42] instead of a scheduler of OpenMP. The submatrices are finally added together into a single sparse matrix.
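The data layout behind this strategy can be sketched as follows. The example is written in Python for illustration only, whereas nlfem relies on OpenMP; Python threads do not provide a comparable speedup, and all names (`assemble_part`, `assemble_parallel`, `local_contributions`) are hypothetical. Each worker accumulates into a private dictionary, so no locking is needed, and the partial results are summed at the end.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def assemble_part(element_ids, local_contributions):
    """Accumulate the contributions of one partition into a private container."""
    part = defaultdict(float)
    for k in element_ids:
        for (i, j), value in local_contributions(k):
            part[(i, j)] += value
    return part


def assemble_parallel(partitions, local_contributions):
    """Assemble every partition separately and merge the submatrices."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        parts = list(pool.map(lambda ids: assemble_part(ids, local_contributions),
                              partitions))
    merged = defaultdict(float)
    for part in parts:  # entries in the overlap of two partitions are added up
        for key, value in part.items():
            merged[key] += value
    return dict(merged)


def toy_contributions(k):
    """Toy local matrix of element k coupling the unknowns k and k + 1."""
    return [((k, k), 1.0), ((k, k + 1), -1.0), ((k + 1, k), -1.0), ((k + 1, k + 1), 1.0)]


matrix = assemble_parallel([[0, 1], [2, 3]], toy_contributions)
```

The entry `(2, 2)` receives contributions from both partitions, which corresponds to the overlap of the submatrices mentioned above.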

We depict the strong scaling on a computer with two 2.20GHz Intel Xeon CPUs with 22 cores per socket and two threads per core. The machine has 756 GB of RAM. Table 9 and Fig. 6a (black dots) show the run time of an assembly on a regular grid on a domain \(\widetilde{\Omega }= [-\delta , 0.5 + \delta ]^2\) with mesh size \(h=\) 7.1e\(-\)03 and \(\delta =\) 0.1. The related linear system has 24,336 degrees of freedom and 45,022,167 nonzero entries. The number of threads is increased by a factor of up to 32, while the run time drops by a factor of 20. The scaling is close to perfect for up to 16 parallel threads, and the effect diminishes from then on. Figure 6a also shows the scaling for smaller problems with 10,848, 6120, and 2736 degrees of freedom (gray dots), which show a similar behavior.

Fig. 6 a Strong scaling for 24,336 (black) and 10,848, 6120, and 2736 (gray) degrees of freedom. b Weak scaling for horizons 0.06, 0.05, and 0.04 (black)

Table 9 Assembly time for a system with 24,336 degrees of freedom in strong scaling study

The weak scaling experiment has been performed on a machine with four 2.1 GHz AMD Opteron 6272 processors with 8 cores per socket and 2 threads per core. The domain \(\widetilde{\Omega }= [-\delta , \sqrt{T} + \delta ]^2\) depends on the number of threads \(T=1,2,\dots ,16\). For the study, we choose horizons \(\delta = 0.06, 0.05\), and 0.04 and choose the mesh size \(h\) such that \(\delta = 3h\). The timings for the largest experiment with \(\delta =0.04\) (black dots, Fig. 6b) are given in Table 10. The system size grows linearly in the number of threads (dof, Table 10). Therefore, a perfect weak scaling should exhibit no increase in the computational time for a growing number of threads, so that the scaled speedup is equal to the number of threads. Figure 6b shows that the assembly does not scale perfectly. More precisely, Table 10 shows that 16 threads yield a scaled speedup of 11. This can be explained by a growing overhead in the merging of the submatrices into a single sparse system.
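The notion of scaled speedup used here can be made explicit. In the sketch below, the function names are hypothetical, and the two timings are arbitrary units chosen only to reproduce the reported speedup of 11 at 16 threads. They are not the values of Table 10.

```python
import math


def scaled_speedup(threads, time_one, time_many):
    """Scaled speedup when the problem size grows linearly with the threads."""
    return threads * time_one / time_many


def weak_efficiency(time_one, time_many):
    """Equals 1.0 for perfect weak scaling, i.e., constant run time."""
    return time_one / time_many


# The area of [0, sqrt(T)]^2 is T, so the work per thread stays constant.
areas = [math.sqrt(T) ** 2 for T in (1, 4, 16)]

# Arbitrary time units reproducing the reported scaled speedup (assumption).
t_1, t_16 = 11.0, 16.0
speedup_16 = scaled_speedup(16, t_1, t_16)
efficiency_16 = weak_efficiency(t_1, t_16)
```

A scaled speedup of 11 at 16 threads corresponds to a weak scaling efficiency of 11/16, that is, the run time grows by a factor of roughly 1.45 between 1 and 16 threads.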

Table 10 Assembly time and degrees of freedom for the system with horizon \(\delta =0.04\) in the weak scaling study

However, the increase in precision for vanishing mesh size is quadratic, which can amortize the costs. This is reported in Table 6, where we find that an increase of computation time by a factor of 253 leads to a 386 times smaller \(L^2\) error.

4 The Code nlfem

4.1 Usage

The package provides an assembly routine while clearly the construction of numerical examples also incorporates the definition of a finite element mesh, the settings of the integrators, kernels and forcing terms, and finally the solution of the resulting equation. We exemplify the usage of nlfem by solving a scalar, nonlocal Dirichlet-type problem (see (21)), where \(\Omega\) is given by a 2d-disk of radius 0.9 with a nonlocal boundary \(\Gamma ^D\) of width \(\delta =0.1\).

4.1.1 Finite Element Mesh

The construction requires a finite element mesh in a 1d, 2d, or 3d domain. The mesh is characterized by its elements and vertices. The elements are given by a numpy.ndarray of datatype numpy.int_ and shape (nE, d+1), where nE is the number of elements and d the dimension of the domain. The vertices are a numpy array of floats with shape (nV, d), where nV is the number of vertices.

The domain described by the above arrays needs to be divided into different parts according to their purpose. To that end, we define a numpy.ndarray called elementLabels of type numpy.int_ and length nE. It assigns a label to each element. Negative labels indicate the Dirichlet boundary, any positive label marks an element as part of the domain, and zero-labeled elements are ignored and used to manipulate the mesh topology only; see Assumption 4 and the related remarks.
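The shapes and data types described above can be checked on a toy mesh. The following sketch builds a hypothetical 2d mesh of four triangles by hand. The geometry and the labels are illustrative only and do not correspond to the example of this section.

```python
import numpy as np

# Corners of the unit square plus its centre: shape (nV, d) with d = 2.
vertices = np.array(
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]]
)

# Four triangles, each row lists d + 1 vertex indices: shape (nE, d + 1).
elements = np.array(
    [[0, 1, 4], [1, 2, 4], [2, 3, 4], [3, 0, 4]], dtype=np.int_
)

# One label per element: negative = Dirichlet part, positive = domain,
# zero = ignored (used to manipulate the mesh topology only).
elementLabels = np.array([1, 1, -1, 0], dtype=np.int_)

nV, d = vertices.shape
nE = elements.shape[0]

# Consistency checks on shapes and on the vertex indices used by the elements.
assert elements.shape == (nE, d + 1)
assert elementLabels.shape == (nE,)
assert elements.min() >= 0 and elements.max() < nV
```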

Example

The finite element mesh (that is, vertices and elements) of the required form can be obtained, for example, from gmsh. In the example presented in Fig. 7a, the elementLabels are set to \(-\)1 on \(\Gamma ^D\) (blue) and 1 on \(\Omega\) (yellow).

Fig. 7 a The domain \(\widetilde{\Omega }\) consisting of \(\Omega\) (yellow) and \(\Omega _D\) (blue). b The degrees of freedom with the corresponding labels which determine whether a degree of freedom is unknown (yellow) or given by the Dirichlet data (blue)

4.1.2 Defining the Settings

The kernel is a Python dictionary which contains the keys function, horizon, outputdim, and possibly fractional_s. A list of implemented kernel functions can be found via nlfem.show_options(). The horizon determines the interaction radius of the kernel, and outputdim tells whether the kernel is scalar or tensor-valued. If the kernel has a singularity which requires regularizing integral transformations, we need to specify the parameter s of the singularity in fractional_s. The quantity is required for the special integration routines fractional and weakFractional, which need to be applied to those kernels.

Example

We choose the scalar kernel (24) which is of truncated fractional type where \(d=2\) and \(s=0.4\). The corresponding dictionary reads as

figure c
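For orientation, a dictionary with the keys named above might look as follows. This is a sketch under assumptions: the kernel name is a placeholder that has to be replaced by one of the names printed by nlfem.show_options(), and the value 1 for outputdim is assumed to denote a scalar kernel. Only the horizon and the exponent are taken from the example.

```python
# Placeholder, not an actual nlfem kernel name (see nlfem.show_options()).
KERNEL_NAME = "name_listed_by_show_options"

kernel = {
    "function": KERNEL_NAME,
    "horizon": 0.1,        # interaction radius delta of the example
    "outputdim": 1,        # assumption: 1 marks a scalar-valued kernel
    "fractional_s": 0.4,   # exponent s of the singularity
}

# The first three keys are always expected; fractional_s only for singular kernels.
required = {"function", "horizon", "outputdim"}
missing = required - kernel.keys()
```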

As an optional parameter, it is possible to hand over varying kernel coefficients with the key "Theta". Of course, the implemented kernel has to support the usage of the given information. This is the case, for example, for the kernel functions "theta" and "sparsetheta", which evaluate

$$\frac{4}{\pi \delta ^{4}} \Theta (\textbf{x},\textbf{y}) \text { and } \frac{4}{\pi \delta ^{4}} (1 - \Theta (\textbf{x},\textbf{y})),$$

respectively. The coefficients are expected to be of type scipy.sparse.csr_matrix. However, there are no other restrictions, so that any use of the coefficients inside of the kernel function is possible. Note also that new kernels can be defined.
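The prefactor \(\frac{4}{\pi \delta ^{4}}\) can be checked numerically. For \(\Theta \equiv 1\) and \(d=2\), the second moment \(\int _{B_\delta (0)} \frac{4}{\pi \delta ^{4}} |\textbf{z}|^2 d\textbf{z}\) equals 2 for every horizon, since \(\int _{B_\delta (0)} |\textbf{z}|^2 d\textbf{z} = \pi \delta ^4 / 2\). The sketch below verifies this with a midpoint rule in polar coordinates. The function names are hypothetical, and the choice \(\Theta \equiv 1\) is an assumption made for the check.

```python
import math


def kernel_constant(delta):
    """Prefactor 4 / (pi * delta**4) of the kernels 'theta' and 'sparsetheta'."""
    return 4.0 / (math.pi * delta**4)


def second_moment(delta, n=1000):
    """Midpoint rule for the integral of c * |z|^2 over the disk of radius delta.

    In polar coordinates the integral reads c * 2 * pi * int_0^delta r^3 dr.
    """
    dr = delta / n
    radial = sum(((i + 0.5) * dr) ** 3 for i in range(n)) * dr
    return kernel_constant(delta) * 2.0 * math.pi * radial


moment_large = second_moment(0.1)
moment_small = second_moment(0.05)
```

Both values agree with 2 up to the quadrature error. Together with the factor 2 in front of the integral in the operator, this is consistent with a second-order Taylor expansion of a smooth \(u\) yielding \(-\Delta u\) as the leading term.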

The dictionary specifying the forcing term has a similar structure and contains a key function. Again, a list of implemented forcing functions can be found via nlfem.show_options(). Note that the assembly of the right-hand side is standard and nlfem offers this functionality for convenience only.

Example

We choose \(f(\textbf{x}) = -2(x_1+1)\) which can be specified by

figure d

All other settings are stored in the dictionary conf. The truncation routines are selected by the method given in approxBalls. The quadrature rule which is used in the truncation routine is indicated by numpy arrays of quadrature points and weights. For kernels which exhibit a singularity, it is possible to select another integration routine specifically for touching elements (e.g., fractional, weakFractional). However, the choice does not affect the quadrature for nonsingular kernels. The function nlfem.show_options() prints a list of the singular kernels.

Example

We discretize the problem with a continuous Galerkin ansatz space and use a retriangulation with caps (Definition 2.3) to approximate the truncation of the interaction neighborhood. We choose a 7-point quadrature rule with points Px and weights dx. The full example configuration is specified by the dictionary

figure e
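The text does not state which 7-point rule is used. One standard candidate is the symmetric rule of degree 5 on the reference triangle with vertices (0,0), (1,0), and (0,1). The sketch below sets up its points and weights and is an assumption in that sense. The names Px and dx follow the text, while integrate is a hypothetical helper.

```python
import math

s15 = math.sqrt(15.0)
a = (6.0 - s15) / 21.0
b = (6.0 + s15) / 21.0
# Weights are scaled such that they sum to the triangle area 1/2.
wa = (155.0 - s15) / 2400.0
wb = (155.0 + s15) / 2400.0

# Quadrature points: the centroid and two orbits of three points each.
Px = [
    (1.0 / 3.0, 1.0 / 3.0),
    (a, a), (1.0 - 2.0 * a, a), (a, 1.0 - 2.0 * a),
    (b, b), (1.0 - 2.0 * b, b), (b, 1.0 - 2.0 * b),
]
dx = [9.0 / 80.0, wa, wa, wa, wb, wb, wb]


def integrate(f):
    """Approximate the integral of f(x, y) over the reference triangle."""
    return sum(w * f(x, y) for (x, y), w in zip(Px, dx))


area = integrate(lambda x, y: 1.0)
```

The lists can be converted with numpy.array to obtain the arrays expected in the configuration. The rule integrates polynomials up to degree 5 exactly, for example \(\int x^5 = 1/42\) and \(\int x^3 y^2 = 1/420\) on the reference triangle.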

Note that all options for the settings in kernel, function, and conf can be printed by nlfem.show_options() and empty dictionaries in the required form can be obtained from nlfem.get_empty_settings().

4.1.3 Assembly

Based on the settings, the assembly of the nonlocal stiffness matrix is effected via

figure f

The dictionary mesh contains information about the labeling of elements, vertices, and dof. The labels of the degrees of freedom (dof) depend on the kernel (scalar or tensor-valued) and the ansatz space (CG or DG). Therefore, the function nlfem.stiffnessMatrix_fromArray() automatically deduces vertexLabels and dofLabels. A vertex is labeled according to the labels of the elements it belongs to, and it is given the smallest of those element labels. The dof labels are identical to the vertex labels for CG ansatz spaces and scalar-valued kernels. For DG ansatz spaces, the labels are directly obtained from the element labels. The labels of the degrees of freedom on the domain \(\Omega\) and the Dirichlet domain \(\Omega _D\) are derived from the element labels and stored in mesh as dofLabels so that the following lines isolate the corresponding entries of the discrete solution (Fig. 8).

figure g
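The labeling rule for a CG ansatz space and a scalar kernel can be reproduced with a few numpy calls. The sketch below is not the nlfem implementation: the toy mesh is hypothetical, zero-labeled elements are not considered, and the masks omega and dirichlet are illustrative names.

```python
import numpy as np

# Toy mesh: four triangles sharing the centre vertex 4.
elements = np.array(
    [[0, 1, 4], [1, 2, 4], [2, 3, 4], [3, 0, 4]], dtype=np.int_
)
elementLabels = np.array([1, 1, -1, 1], dtype=np.int_)
nV = 5

# Each vertex receives the smallest label among the elements it belongs to.
vertexLabels = np.full(nV, np.iinfo(np.int_).max, dtype=np.int_)
np.minimum.at(vertexLabels, elements, elementLabels[:, None])

# CG ansatz space and scalar kernel: dof labels coincide with vertex labels.
dofLabels = vertexLabels
omega = dofLabels > 0       # unknown degrees of freedom
dirichlet = dofLabels < 0   # degrees of freedom given by the Dirichlet data

# Isolate the corresponding entries of a discrete function.
u = np.arange(nV, dtype=float)
u_omega, u_dirichlet = u[omega], u[dirichlet]
```

Every vertex of the element labeled \(-1\) inherits this label, so the vertices 2, 3, and 4 belong to the Dirichlet part while the vertices 0 and 1 carry unknowns.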

Example

In the given example, we have a CG ansatz space and a scalar-valued kernel so that the dofLabels are identical to the vertexLabels; see Fig. 7b.

The solution of a Dirichlet-type problem requires the definition of Dirichlet data \(g\). The function below implements \(g(\textbf{x}) = x_1^2 x_2 + x_2^2\).

figure h
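A plain Python version of such a function could read as follows. This is a sketch which assumes that a point is passed as a sequence of its two coordinates. The signature expected by the original listing is not reproduced here.

```python
def g(x):
    """Dirichlet data g(x) = x_1^2 * x_2 + x_2^2 for a point x = (x_1, x_2)."""
    return x[0] ** 2 * x[1] + x[1] ** 2


# Evaluation at a sample point; any indexable pair of floats works.
value = g((2.0, 3.0))
```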

Ultimately, we can solve the discrete nonlocal Dirichlet problem via

figure i
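The elimination of the Dirichlet data behind this step can be shown on a dense toy system. The sketch below assumes a hypothetical 3 by 3 matrix with two unknowns and one Dirichlet value. For the scipy.sparse.csr_matrix returned by nlfem, one would use the corresponding sparse slicing and a sparse solver instead.

```python
import numpy as np

# Hypothetical stiffness matrix, right-hand side, and labels of three dofs.
A = np.array([[2.0, -1.0, -1.0],
              [-1.0, 2.0, -1.0],
              [-1.0, -1.0, 2.0]])
f = np.array([1.0, 0.0, 0.0])
dofLabels = np.array([1, 1, -1])
g_D = np.array([0.5])  # Dirichlet data on the dofs with negative label

omega = dofLabels > 0
dirichlet = dofLabels < 0

# Split the rows belonging to Omega into unknown and Dirichlet columns.
A_O = A[np.ix_(omega, omega)]
A_I = A[np.ix_(omega, dirichlet)]

# Move the known Dirichlet part to the right-hand side and solve.
rhs = f[omega] - A_I @ g_D
u = np.zeros(len(dofLabels))
u[dirichlet] = g_D
u[omega] = np.linalg.solve(A_O, rhs)
```

The rows of the full system which belong to \(\Omega\) are satisfied by u, while the rows belonging to the Dirichlet part are discarded.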
Fig. 8 Solution of the example problem

4.2 Structure of the Code

The nlfem code provides a Python interface which communicates all settings to a C++ function. The main functionality of nlfem is invoked by the function stiffnessMatrix_fromArray() in the file cython/nlfem.pyx. We therefore restrict the description of the code structure to the flow of function calls starting from the user input to the evaluation of a kernel function and the return of the discrete system. More details can be found in the C++ documentation of nlfem. The function stiffnessMatrix_fromArray() is written in Cython and can be called from Python. It passes the input to a C++ function and leads to a call of par_system() located in src/Cassemble.cpp. This function collects all settings and starts the assembly of the nonlocal stiffness matrix. It splits up the work by an OpenMP work-sharing construct and starts a double loop over the finite elements for each of the workers. In the center of the double loop, par_system() calls the integration function which has been chosen by the user. The function is called through the pointer integrate() and implemented in src/integration.cpp. This function in turn invokes a specific method to evaluate the interaction neighborhood, which finally evaluates the kernel model_kernel(). The kernels are implemented in src/model.cpp and listed in a C++ map in src/Cassemble.cpp, which allows them to be accessed from the Python interface. After the completion of the assembly, a sparse matrix is stored to disk by par_system() and read again by stiffnessMatrix_fromArray(), which returns it to the user as a scipy.sparse.csr_matrix object.
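The flow of calls can be mimicked in a strongly simplified form. The following Python sketch is not the nlfem source: the 1d midpoints, the one-point "integration", and all names are hypothetical. It only illustrates the lookup of a kernel by name and the double loop restricted to interacting elements.

```python
def constant_kernel(x, y):
    """Toy kernel which returns 1 for every pair of points."""
    return 1.0


# Hypothetical stand-in for the map from kernel names to kernel functions.
KERNELS = {"constant": constant_kernel}


def integrate_pair(a, b, kernel):
    """Toy integration routine: a single kernel evaluation per element pair."""
    return kernel(a, b)


def assemble(midpoints, horizon, kernel_name):
    """Double loop over elements, here represented by 1d midpoints."""
    kernel = KERNELS[kernel_name]  # selected by name, as from the interface
    entries = {}
    for i, a in enumerate(midpoints):       # outer loop, shared among workers
        for j, b in enumerate(midpoints):   # inner loop over the neighborhood
            if abs(a - b) <= horizon:       # skip pairs outside the horizon
                entries[(i, j)] = integrate_pair(a, b, kernel)
    return entries


entries = assemble([0.0, 0.1, 0.2, 0.3], 0.15, "constant")
```

Only pairs of elements within the horizon produce an entry, which is the source of the sparsity of the assembled matrix.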

4.3 Scope and Comparison

The code nlfem [10] provides similar functionality as PyNucleus [19], and we therefore combine the presentation of the scope of nlfem with a comparison of the functionality of PyNucleus and nlfem. First of all, the code PyNucleus points in a different direction than nlfem as it assembles operators of the type

$$\begin{aligned} -\widetilde{\mathcal {L}} u(\textbf{x}) = ~ \int _{\mathbb {R}^d}\left( u(\textbf{x}) - u(\textbf{y})\right) \gamma (\textbf{x}, \textbf{y})d\textbf{y}\end{aligned}$$
(36)

as opposed to (4). Here, the scalar kernel \(\gamma (\textbf{x}, \textbf{y})\) is explicitly allowed to have an infinite interaction horizon. The operators \(\mathcal {L}\) and \(\widetilde{\mathcal {L}}\) are identical for symmetric kernels.

We show an overview of the functionality of the two codes in Table 11. PyNucleus aims to provide efficient discretization and assembly routines with quasi-optimal complexity, in particular for problems with infinite interaction horizon \(\delta\), which nlfem does not cover. Both codes provide a Python interface and implement some kind of parallelization, where PyNucleus can directly be used on clusters (MPI), as opposed to nlfem, which offers multi-threading (OpenMP). PyNucleus contains mesh construction and solver functionality, and it can store the matrices resulting from the assembly in hierarchical, dense, and sparse format, whereas nlfem offers sparse matrices only. Both codes provide discontinuous and continuous finite element spaces on 1d, 2d, and 3d domains. PyNucleus allows discontinuous P0 and continuous P1, P2, and P3 elements, while nlfem provides discontinuous and continuous P1 elements. While PyNucleus is restricted to scalar kernels, nlfem allows scalar- and tensor-valued kernels.

Table 11 Comparison of functionality

Remark 4.1

While tensor-valued kernels cover systems related to linearized peridynamics models, nlfem does not support the approximation of nonlinear peridynamics operators.

In Table 12, we give a detailed overview of the truncations and kernel types which are implemented. While new kernels can be introduced in both codes, adding new truncation routines or special quadrature rules is a more complex endeavor. We therefore compare the scope of the codes with respect to the implemented interaction horizon and singularities. On 1d domains, PyNucleus can assemble fractional and integrable kernels with finite and infinite horizons, whereas nlfem supports truncated integrable kernels only. On 2d domains, both codes support fractional-type and integrable kernels.

Table 12 Comparison of truncation and kernels

Remark 4.2

The quadrature rules for the (truncated) fractional Laplacian, that is for \(\varvec{\Psi }_\delta (\textbf{x},\textbf{y}) \equiv const\) in (1), can be simplified significantly [18]. Identical elements require an evaluation of a 1d instead of a 4d integral. Similarly, edge-touching and vertex-touching elements require the computation of two- and three-dimensional integrals only [18]. As those rules have been incorporated into PyNucleus, it is to be preferred over nlfem for the truncated fractional Laplacian. Note however that the rules in nlfem are directly applicable to a larger range of kernels as, for example, the tensor-valued nonradial kernel (27).

Both codes offer error commensurate approximations of \(B_{\delta }^2\) truncations. The code nlfem moreover supports the \(B_{\delta }^{\infty }\) and barycenter ball approximations [1]. The main advantage of the latter is that it can be implemented independently of the dimension \(d\) of the domain. Therefore, nlfem can assemble nonlocal operators in 3d. On the other hand, PyNucleus contains assembly routines for classical, local operators, which nlfem does not support.

To conclude, we find that PyNucleus covers a larger scope than nlfem, while not all features of nlfem are contained in PyNucleus.

5 Conclusion

The code nlfem is a tool for researchers to set up numerical experiments. The documentation also describes the extension by user-defined kernels, which allows a large problem class to be considered. The code can assemble nonlocal problems in 2d with error-commensurate kernel truncations for Euclidean and infinity norm balls and in 1d and 3d using the generically implemented barycenter method [1]. We therefore hope to bridge efficiency and flexibility to obtain a convenient Python package which nourishes the current development in the field of finite element methods for nonlocal operators and enables easy validations of new theory without the effort of implementing code from scratch. In that sense, nlfem contributes to the ongoing effort to unlock the full potential of nonlocal models.