1 Introduction

In this work, we consider the problem of finding an optimal discretization for a linear elasticity problem for a given planar domain and boundary conditions. Previously, this problem has often been solved by adaptive refinement: the partial differential equation (PDE) is first solved on an initial mesh, the error is estimated using a local error estimator, and the mesh is then refined in the regions of the domain where the error is large. This process is repeated until the estimated error is uniformly below a given tolerance. Adaptive refinement is computationally costly, since at each step a new mesh is generated and progressively larger linear systems need to be assembled and solved. The objective of this study is therefore to find an optimal mesh a priori, without first solving the PDE, thereby avoiding the computational costs of iterative mesh refinement.

For this purpose, we train a neural network that takes as input the geometry and boundary conditions, and predicts a relative mesh size field for linear elasticity problems. It is well known that for a complex geometry, certain areas need to be refined more to obtain acceptable accuracy, e.g. around re-entrant corners or near the areas where fixed boundary conditions are prescribed. Using a machine learning approach for the prediction of mesh refinement has several advantages. First, one can avoid an iterative refinement scheme by obtaining a suitable initial guess directly from the initial model. Furthermore, neural networks (especially convolutional neural networks) allow training on several classes of simple configurations, while they can be evaluated on more complex domains. Based on the output from the neural network, we construct a quadrilateral mesh with local refinement properties. Quadrilateral meshes are adopted because of their superior performance in engineering applications. Moreover, we intend to extend the method to isogeometric discretizations based on (curvilinear) quadrilateral patches. There is a rich literature on this topic, for example the automatic generation of high quality quadrilateral meshes from given planar curves [1]. The method in [1] emphasizes matching interior and exterior boundaries and avoids distorted quads by introducing angle bounds.

Machine learning, and especially artificial neural networks (ANN), has recently been applied to geometric problems as well as to problems that arise when solving PDEs. For instance, machine learning techniques are employed in model order reduction [2, 3] and dynamic mode decomposition [4]. Specifically, convolutional neural networks (CNN) are often applied in computational problems [5,6,7]. The combination of techniques from computational mathematics and ANN has resulted in interesting contributions for both fields, since theoretical results from classical approximation theory can be used to derive results on the approximation properties of ANN [8,9,10,11,12].

A straightforward application of deep learning to the solution of PDEs consists of training a neural network to represent the solution of a single boundary value problem [13]. Especially for very high-dimensional problems, this approach results in efficient methods compared to classical Galerkin methods. But ANNs cannot only be used to describe the discretization spaces for solutions of a PDE; machine learning techniques have also been used in several ways to facilitate the efficient and accurate solution of PDEs with classical methods, such as FEM. Deep learning has been used for the system matrix assembly in Galerkin methods [14] as well as for accelerating the solution process of the resulting system [15].

Another very promising application of ANN in this context is mesh generation. To ensure an efficient approximation, a locally refined discretization space is often needed. So far, the efforts in this direction have focused on generating polygonal/polyhedral meshes for finite element discretizations [16,17,18,19]. In [20], an ANN is trained to predict the mesh quality of a given finite element mesh. In our approach, we train an ANN to predict the optimal local mesh density for a given polygonal geometry, which may have holes, with boundary conditions given in the form of one traction boundary and one fixed boundary. Moreover, we study how the output of our neural network depends on the complexity of the computational domain and on different training strategies. We represent all input data, i.e. the domain as well as the boundary conditions, as images, and we employ a convolutional neural network (CNN) that maps to a pixel-wise estimate of the optimal mesh density.

The use of images and convolutional neural networks has several advantages: CNNs are able to detect local features in the data, which is especially well-suited to the problem of local refinement. Moreover, it is possible to rescale the models to apply them to different input sizes. Finally, the use of images as our input data makes the method completely independent of the employed scheme for numerically approximating the solution to the PDE. This means that the training data for our model can be produced using finite elements based on different polygons as well as using isogeometric analysis [21]. Moreover, image-based engineering and science is a growing research area. Generating meshes from scanned images has wide applications in computational biology, medicine and materials science [22]. As an example, the construction of quadrilateral and hexahedral meshes from volumetric image data is discussed in [23].

In recent works, the concept of CNN has been generalized from planar image data to discrete manifolds and graphs [24, 25], resulting in geometric deep learning. This makes it possible to train and evaluate neural networks on data sets that consist of discrete surfaces. These techniques might allow a future generalization of our method to the problem of local refinement on surfaces. However, for the processing of planar domains, as in the present paper, as well as of volumetric domains, image data and CNNs are more suitable. In particular, the representation of the geometry as an image is more flexible and does not depend on a specific parameterization.

The approach we present in this paper is similar to [26, 27], where a neural network is used to facilitate the generation of locally refined finite element meshes. The network predicts the distribution of the a posteriori error for a given set of geometry, material properties and boundary conditions, which demonstrates the feasibility of neural networks for generating locally refined meshes. We want to highlight here that the network architecture and the structure of input and output are important factors when generalizing the approach. The approach developed in [26, 27] is based on a fully connected neural network with a fixed structure, which encodes the geometry. Thus, the possible geometries are taken from a parameterized family of possible geometries. In our approach, we consider the domain and input parameters to be images and employ CNNs. Using this setup, we can extend the network to other geometries by incorporating them in the training process, while keeping the network structure fixed. We believe that this facilitates the extension to more complex domains.

The remainder of this paper is organized as follows: In Sect. 2, we describe the model problem and objective which we consider throughout the paper. In Sect. 3, we analyze and classify different measures for geometric complexity that occur in the problems that we consider. We describe our methods for generating the training data in Sect. 4 and present the architecture of our neural network in Sect. 5. Finally, we present numerical experiments in Sect. 6, discuss directions for further research on the use of artificial neural networks for generating locally refined meshes in Sect. 7 and conclude the paper in Sect. 8.

2 Problem formulation

Given a computational domain and corresponding boundary conditions, we want to obtain a quadrilateral mesh over which we can define a finite element space to represent the solution of a linear elasticity problem. The goal is to find a mesh that yields a numerical solution of high quality without compromising the computational complexity, i.e., the uniform error should be below a given tolerance and the number of elements should be as small as possible. To this end, we train a neural network that takes as input the geometry and the Dirichlet and Neumann boundary conditions, and whose output predicts the relative local mesh resolution.

2.1 The PDE model problem

As a model problem, we consider the problem of linear elasticity on a planar domain \(\Omega \subset \mathbb {R}^2\) with polygonal boundary \(\partial \Omega\). The segments \(\Gamma _D \subset \partial \Omega\) and \(\Gamma _N\subset \partial \Omega\) are the Dirichlet and Neumann boundaries, respectively, where \(\Gamma _D\cap \Gamma _N =\emptyset\). The governing equations are the equilibrium equation (1a), strain-displacement relation (1b), and the constitutive law (1c):

$$\begin{aligned}&\varvec{-\nabla \cdot \sigma }=\varvec{f}{,} \end{aligned}$$
(1a)
$$\begin{aligned}&\varvec{\epsilon }=\frac{1}{2}(\varvec{\nabla u }+ \varvec{\nabla } \varvec{u}^T ){,} \end{aligned}$$
(1b)
$$\begin{aligned}&\varvec{\sigma }=2\mu \varvec{\epsilon }+\lambda (\varvec{\nabla } \cdot \varvec{u})\varvec{I}. \end{aligned}$$
(1c)

In the governing equations, \(\varvec{\sigma }\) and \(\varvec{\epsilon }\) denote the stress and the strain tensors respectively, while \(\varvec{u}:\Omega \rightarrow \mathbb {R}^2\) is the displacement field and \(\varvec{f}:\Omega \rightarrow \mathbb {R}^2\) is the body force. The constants \(\lambda\) and \(\mu\) are the Lamé parameters that satisfy

$$\begin{aligned} \lambda&=\frac{\nu E}{(1+\nu )(1-2\nu )},\\ \mu&=\frac{E}{2(1+\nu )}, \end{aligned}$$

where E is Young’s modulus and \(\nu\) is Poisson’s ratio. We moreover consider boundary conditions of the form

$$\begin{aligned} \varvec{u}&=\varvec{u}_0 \quad \hbox {on} \quad \Gamma _D \subset \partial \Omega ,\\ \varvec{\sigma }\varvec{n}&= \varvec{t} \quad \hbox {on} \quad \Gamma _N \subset \partial \Omega , \end{aligned}$$

where \(\varvec{u}_0\) are the prescribed displacements, \(\varvec{t}\) is the prescribed traction, and \(\varvec{n}\) is the outer unit normal vector to \(\Gamma _N\). Then, after combining the equations in (1), multiplying by a test function \(\varvec{v}\) and applying Green’s theorem, we can derive the weak form of (1) as: Find \(\varvec{u}\in \mathcal {U}\) such that

$$\begin{aligned}&\int _\Omega \frac{\mu }{2}(\varvec{\nabla u}+\varvec{\nabla u}^T)\cdot (\varvec{\nabla v}+\varvec{\nabla v}^T)\,d\Omega + \int _\Omega \lambda (\varvec{\nabla } \cdot \varvec{u})(\varvec{\nabla } \cdot \varvec{v})\, d\Omega \\&\quad = \int _\Omega \varvec{v}\cdot \varvec{f}\,d\Omega + \int _{\Gamma _N}\varvec{v}\cdot \varvec{t}\,d\Gamma \end{aligned}$$

for \(\varvec{v} \in \mathcal {V}\). Here,

$$\begin{aligned} \mathcal {U}&= \lbrace \varvec{u} \in (H^1)^2 : \varvec{u}=\varvec{u_0} \hbox { on } \Gamma _D \rbrace \hbox { and } \\ \mathcal {V}&= \lbrace \varvec{v} \in (H^1)^2 : \varvec{v}=\varvec{0} \hbox { on } \Gamma _D \rbrace , \end{aligned}$$

where \(H^1\) is the space of square integrable functions with square integrable first derivatives.

In this study, we consider two-dimensional problems with fixed isotropic materials under plane stress conditions, with \(E=10^{3}\) and \(\nu =0.3\). The body force is set to \(\varvec{f}=0\). Moreover, we consider fixed boundary conditions \(\varvec{u_0} = \varvec{0}\) and pressure loading where \(\varvec{t}=10\varvec{n}\). Due to the linearity of the problem, different material properties or load magnitudes will still result in similar refinement patterns. We believe that the approach presented in this paper has the potential to be extended to a larger class of PDEs.
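For reference, the material constants used throughout follow directly from the relations above; the following plain-Python snippet (ours, for illustration) evaluates them for the stated values.

```python
# Lamé parameters for the material values used in this study (E = 1e3, nu = 0.3)
E, nu = 1.0e3, 0.3

lam = nu * E / ((1 + nu) * (1 - 2 * nu))  # first Lamé parameter
mu = E / (2 * (1 + nu))                   # shear modulus (second Lamé parameter)

print(f"lambda = {lam:.2f}, mu = {mu:.2f}")  # lambda = 576.92, mu = 384.62
# Under plane stress, lambda is commonly replaced by 2*lam*mu / (lam + 2*mu).
```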

2.2 The optimization problem

The aim of our method is to find the optimal mesh density for a given geometry and given Dirichlet and Neumann boundaries. This means that the mesh density produced by our method should ideally result in a discretization that achieves optimal accuracy with a minimum number of degrees of freedom. Formulating this objective in a mathematically precise way is a difficult undertaking, as there is no well-defined notion of optimality for the mesh density. Our proposed method can be regarded as an instance of operator learning. We therefore opt to describe the operator that we want to approximate by referring to a standard adaptive algorithm.

As described in the previous section, the input of our method consists of a geometry \(\Omega\) as well as information on the fixed boundary \(\Gamma _D\subset \partial \Omega\) and the traction boundary \(\Gamma _N\subset \partial \Omega\). We denote the set of possible input data as X, where each element

$$x = (\Omega , \Gamma _D, \Gamma _N)\in X$$

consists of a geometry and the Dirichlet and Neumann boundaries.

The operator that we want to approximate is therefore of the form

$$\begin{aligned} \mathcal {F} : X \rightarrow D, \end{aligned}$$
(2)

where

$$D = \bigcup _{x\in X} D_x$$

and an element \(d_x\in D_x\) is a function \(d_x: \Omega \rightarrow [0,1]\) that represents a relative local mesh density.

The precise definition of the operator \(\mathcal F\) depends on the chosen method for generating the locally refined mesh. For a data point \(x\in X\), we denote by \(W_x\) the set of possible discretizations for solving the PDE using our chosen computational method. In our examples, \(W_x\) is the set of quadrilateral meshes that exactly represent the boundary \(\partial \Omega\).

We write

$$W = \bigcup _{x\in X}W_x$$

and we denote by

$$\begin{aligned} A: X \rightarrow W \end{aligned}$$
(3)

an algorithm for generating a discretization for the given input data. For example, A can be a classical adaptive method based on solving the PDE, marking, and refining the elements iteratively until a stopping criterion is reached. The operator \(\mathcal F\) is therefore defined as

$$\mathcal F (x) = \mathcal D\circ A(x),$$

where

$$\begin{aligned} \mathcal D: W\rightarrow D \end{aligned}$$

maps a discretization to its local relative mesh density.

Since it is fully deterministic, the operator \(\mathcal F\) can in principle be evaluated by applying the refinement algorithm A. However, since the evaluation of A typically consists of solving the PDE multiple times on meshes with increasing mesh density, this can be prohibitively expensive. Therefore, we want to find a way to evaluate (or approximate) \(\mathcal F\) efficiently. Since it is a highly non-linear operator that cannot be evaluated cheaply, we opt to approximate \(\mathcal F\) with an artificial neural network \(\mathcal F_{\text {NN}}\).

To be able to use neural networks based on convolution, we consider as a discretized input a discrete approximation of \(x = (\Omega , \Gamma _D, \Gamma _N)\) at a fixed grid of points \((\mathbf {p}_1, \ldots , \mathbf {p}_N) \in \Omega\), i.e., an image. Likewise, we regard the mesh density as a discrete function at the same set of points. Therefore, for an input \(x\in X\) and target mesh density \(d_x\), the loss function can be defined as:

$$\begin{aligned} \mathcal {L} = \sum _{i=1}^N \left( \mathcal {F}_{NN}(x)(\mathbf {p}_i) - d_x(\mathbf {p}_i)\right) ^2. \end{aligned}$$
(4)

During the training, we try to find the optimal approximation \(\mathcal {F}_{NN}\) by optimizing the network parameters. Since we select the data points \(\mathbf {p}_i\) as uniformly spaced, we investigate the use of image-based convolutional networks such as U-net, which are described in detail in Sect. 5.

3 Training strategy and data representation

As shown in Sect. 2.2, the problem of finding an optimal mesh density is complicated and largely depends on the domain \(\Omega\) as well as the boundary conditions. The basic idea of our training strategy is to train the network with simple geometries, for which one can more easily obtain target discretizations. The trained network can then be applied to more complex domains. In this section, we give an overview of several measures of geometric complexity, the training strategy and the data representation.

3.1 Measures of complexity

Since in our model problem we restrict ourselves to constant coefficients and zero body forces, the complexity of the mesh generation problem depends largely on the complexity of the domain. While there is no way to objectively quantify the complexity of a geometry in this context, we aim to categorize possible input geometries by a number of measures corresponding to different aspects of geometric complexity.

One important advantage of using neural networks for this problem is their potential to generalize knowledge obtained from training on data from a number of simple data classes to complex unseen data. We consider computational domains whose geometric complexity is measured using the following criteria:

  • Convexity: Domains with only convex boundaries and domains with non-convex boundaries.

  • Genus: Simply connected domains and domains with k holes.

  • Smoothness of Boundary: Domains with polygonal boundaries and domains whose boundaries are piece-wise polynomial curves of degree \(d\ge 2\).

For each of the geometric complexity criteria we consider a few simple cases, which are listed as follows:

$$\begin{aligned} \begin{array}{lrl} {\text{``Convexity''}} & = & \left\{ \begin{array}{ll} 0 & \hbox { if the outer boundary is convex,} \\ 1 & \hbox { if the outer boundary is non-convex,} \end{array} \right. \\ {\text{``Genus''}} & = & k \in \{0,1,2\}, \hbox { where } k=\hbox { number of holes}{,} \\ {\text{``Smoothness of boundary''}} & = & \left\{ \begin{array}{ll} 0 & \hbox { if all boundaries are polygonal,} \\ 1 & \hbox { if at least one boundary curve is a spline.} \end{array} \right. \end{array} \end{aligned}$$

We then obtain geometric complexity classes, which we can denote with triples

$$(\text{``Convexity'', ``Genus'', ``Smoothness of boundary''}),$$

where e.g., (0, 0, 0) represents convex, simply-connected, polygonal domains, (1, 2, 0) represents non-convex polygonal domains with two holes and (0, 0, 1) represents convex, simply-connected, spline domains. We randomly generated 10,000 sets of data each for the geometric complexity classes (0, 0, 0) and (1, 0, 0). For the classes (0, k, 0) and (1, k, 0), which have higher geometric complexity, we generated 20,000 sets of data each. We then train and validate the neural networks using a random 90%-10% split between the training and testing data.
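For illustration, the class labels can be encoded as plain triples; the following minimal sketch (the type and variable names are ours, not part of the paper's implementation) mirrors this convention.

```python
from typing import NamedTuple

class ComplexityClass(NamedTuple):
    convexity: int   # 0: convex outer boundary, 1: non-convex outer boundary
    genus: int       # number of holes k in {0, 1, 2}
    smoothness: int  # 0: all boundaries polygonal, 1: at least one spline boundary

# Examples used in the text:
convex_polygon = ComplexityClass(0, 0, 0)       # convex, simply connected, polygonal
nonconvex_two_holes = ComplexityClass(1, 2, 0)  # non-convex, two holes, polygonal
convex_spline = ComplexityClass(0, 0, 1)        # convex, simply connected, spline
```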

While we restrict ourselves to these criteria in this paper, the method can be extended arbitrarily by adding more criteria of complexity. Possible generalizations are discussed in more detail in Sect. 7. Furthermore, when applying the method in practice, using real-world training data, the choice of geometric complexity classes does not need to be a conscious decision, as the network can learn from any data in the available data set.

3.2 Training strategy

In this study, we aim to train the neural networks such that they recognize the following refinement rules:

  • The refinement should be concentrated at points where the stress values are expected to be high, such as re-entrant corners and the endpoints of the Dirichlet boundary.

  • The refinement level depends on the position of the corner relative to the Dirichlet and Neumann boundaries.

  • The strength of the refinement also depends on the length of the support and traction boundaries and on their relative position.

Instead of trying to implement these rules directly, we consider a more general data-driven approach. A data set that contains domains of all possible geometric complexity classes, i.e., combinations of the cases of the complexity criteria listed in Sect. 3.1, would need to be infeasibly large and therefore difficult to generate and process. In this study, we exploit the neural network's potential for generalization to unseen data to handle domains that are complex in more than one of these classes.

We train the neural network on a data set that consists of domains that are simple in all but one of the different classes of geometric complexity. When evaluating the network on a domain that is complex in more than one of the listed classes, the network is able to apply the knowledge learned from this training data also to this case. An example is given in Sect. 6.5, where the training data set contains convex polygonal domains with one hole as well as non-convex polygonal domains without holes (i.e., the classes (0, 1, 0) and (1, 0, 0), respectively). When evaluating the network on a domain that is non-convex and has several holes, e.g. from the class (1, 2, 0), the network can give a good prediction, although it has never seen such a domain.

3.3 Data representation

In our method, we represent the input as well as the output of the neural network as pixel-based images. This choice leads to two significant advantages: On the one hand, we can employ powerful deep learning techniques that have been developed for image processing in the recent years. In particular, we can use convolutional neural networks and related network structures. On the other hand, using only image data makes our method completely independent from the representation of the geometry and from the discretization spaces that are used for solving the partial differential equation numerically.

For example, training data for our method can be obtained using a finite element method, isogeometric analysis, or any other adaptive method for solving PDEs. Likewise, the trained neural network has the potential to guide the generation of locally refined meshes for any of these numerical methods. What has to be taken into account is the interpretation of the output data and translation into a locally refined mesh. Since we train our method with bilinear finite elements over quadrilaterals, the local mesh grading needs to be adapted for higher order or isogeometric elements.

The input of our neural network consists of three grayscale images of the same resolution: one image contains the geometry of the computational domain \(\Omega\); the other two images contain only the Dirichlet boundary \(\Gamma _D\) and the Neumann boundary \(\Gamma _N\), respectively. See Fig. 1a–c for example input data. The output of the network consists of a single grayscale image of the same size as the input data. The scalar value assigned to each pixel represents the network's prediction of the local mesh size at the corresponding point of the computational domain, see Fig. 1d.
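To make this data representation concrete, a possible rasterization routine is sketched below using NumPy and Matplotlib. This is a sketch under our own assumptions; the paper does not specify its implementation, and the resolution, edge width and helper names are ours.

```python
import numpy as np
from matplotlib.path import Path

def rasterize_inputs(polygon, dirichlet_edge, neumann_edge, n=60):
    """Rasterize a polygonal domain and its two marked boundary edges
    into three n-by-n grayscale channels (hypothetical helper).
    `polygon` is an (m, 2) vertex array; each edge argument is a pair
    of endpoint coordinates."""
    # pixel-center grid over the unit square
    xs = (np.arange(n) + 0.5) / n
    grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)

    # channel 1: domain interior
    domain = Path(polygon).contains_points(grid).reshape(n, n).astype(np.float32)

    def edge_channel(p0, p1, width=1.5 / n):
        # mark pixels within `width` of the segment p0-p1
        d = np.asarray(p1) - np.asarray(p0)
        t = np.clip((grid - p0) @ d / (d @ d), 0.0, 1.0)
        closest = p0 + t[:, None] * d
        dist = np.linalg.norm(grid - closest, axis=1)
        return (dist < width).reshape(n, n).astype(np.float32)

    return np.stack([domain,
                     edge_channel(*dirichlet_edge),
                     edge_channel(*neumann_edge)])
```

Stacking the three channels yields exactly the three-channel input expected by the convolutional network of Sect. 5.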

Fig. 1

An example for the input and output of our neural network: a Geometry, b Dirichlet boundary, c Neumann boundary and d predicted mesh density

4 Data generation and adaptive refinement

In this section we explain how the training data sets, that is, the input geometry and the target mesh density, are generated.

4.1 Generating the geometry

We have two generators which generate the data for geometry classes with varying Convexity and Genus (as discussed in Sect. 3.1), i.e., for classes \(\{(i_1,i_2,0),i_1\in \{0,1\},i_2\in \{0,1,2\}\}\). In the following, we describe the procedures that we use to generate geometries for different geometric complexity classes. More precisely, we produce random convex polygonal domains, non-convex domains as well as domains with voids.

To create a domain with a convex boundary, we first generate n sample points distributed randomly within the unit square. In our implementation we use \(n=30\) for all training data sets. The domain is then given by the convex hull of the point set, see Fig. 2a, where the blue dots are the sampled points and the black line represents the boundary of the convex hull. The blue dots that lie on the black line are the vertices of the convex hull. They are used to construct a quadrilateral mesh in Gmsh, as shown in Fig. 2b.
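A minimal sketch of this construction, using SciPy's convex hull routine (the parameter names are ours; the paper's generator may differ in details), could look as follows.

```python
import numpy as np
from scipy.spatial import ConvexHull

def random_convex_domain(n=30, seed=None):
    """Sample n points in the unit square and return the vertices of
    their convex hull (returned in counter-clockwise order for 2D input)."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, 2))   # n random sample points
    hull = ConvexHull(pts)
    return pts[hull.vertices]  # boundary polygon of the domain

boundary = random_convex_domain(n=30, seed=42)
```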

Fig. 2

Generating a convex domain (of class (0, 0, 0)) and initial mesh: a sample points and vertices of the convex hull and b resulting initial quad mesh obtained from Gmsh

In Fig. 3a, we illustrate the creation of a non-convex domain as a union of two convex domains. Two rectangles that intersect and lie within the unit square are first created. Then, as described above, a convex domain is created for each rectangle as the convex hull (represented by the red polygon) of random sample points. The non-convex domain is then given as the union of the two convex sub-domains; the corresponding mesh is shown in Fig. 3b. If the resulting domain is not both simply connected and non-convex, it is discarded.
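The construction can be sketched with Shapely's polygon operations. This is a sketch under our own assumptions; the rectangle sizes and the rejection test are illustrative, not taken from the paper's code.

```python
import numpy as np
from shapely.geometry import MultiPoint

def random_nonconvex_domain(n=30, seed=None):
    """Union of the convex hulls of points sampled in two overlapping
    rectangles inside the unit square; retry until the result is a
    single, simply connected, non-convex polygon."""
    rng = np.random.default_rng(seed)
    while True:
        hulls = []
        for _ in range(2):
            x0, y0 = rng.random(2) * 0.4      # rectangle origin
            w, h = 0.3 + rng.random(2) * 0.3  # extents, so rectangles stay in [0,1]^2
            pts = np.column_stack([x0 + rng.random(n) * w,
                                   y0 + rng.random(n) * h])
            hulls.append(MultiPoint(pts).convex_hull)
        union = hulls[0].union(hulls[1])
        # discard disconnected or still-convex results
        if union.geom_type == "Polygon" and not union.equals(union.convex_hull):
            return union
```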

Fig. 3

Example for generating a non-convex domain (of class (1, 0, 0))

Creating a domain of non-zero genus is illustrated in Fig. 4. First, a random convex or non-convex domain is generated, which is then divided into two or more sub-domains. In Fig. 4a, a convex domain (blue polygon) is divided into two sub-domains by the dashed lines. The division lines are created by randomly selecting points that lie on the boundary of the unit square (marked by stars) and connecting them to the center of the polygon (marked by the blue dot). Then, for each sub-domain, a small convex domain is created (represented by the red polygon), again as the convex hull of random sample points (with \(n=10\)). These small domains are then cut out from the main polygon, forming the voids. Figure 4b shows the resulting mesh.

Fig. 4

Example of generating a domain with voids (of class (0, 2, 0))

The boundary conditions are generated by randomly selecting two edges from the domain. Here we only consider the outer boundary of the domain. The restriction that we make is that the two boundaries should not be adjacent to each other. One of the selected boundaries is set as the Dirichlet boundary, while the other is set as the Neumann boundary. The domain as well as the edges selected to impose boundary conditions are converted to images which are used as input for the considered data set, cf. Figure 1.
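The edge-selection rule can be sketched as follows (a minimal sketch; we assume the edges are indexed consecutively along the outer boundary, so two edges are adjacent exactly when their indices differ by one modulo the edge count).

```python
import numpy as np

def pick_boundary_edges(n_edges, seed=None):
    """Randomly select two non-adjacent outer-boundary edges; the first
    is used as the Dirichlet boundary, the second as the Neumann one."""
    rng = np.random.default_rng(seed)
    while True:
        i, j = rng.choice(n_edges, size=2, replace=False)
        if (i - j) % n_edges not in (1, n_edges - 1):  # reject adjacent edges
            return i, j
```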

Once the domain is constructed and the boundary conditions are fixed, we create an initial mesh over the domain and perform a numerical simulation for the linear elasticity problem (see Sect. 2.1 for more details) using a finite element solver. This is explained in more detail in the following subsection.

4.2 Generating the target mesh density

For each domain, the sampled points that lie on the boundary are selected and used to construct an initial mesh using Gmsh. Gmsh is an open-source program written in C++ with an extensive Python interface. It produces unstructured triangular and quad (or mixed) meshes based on a given boundary representation. In our experiments, an all-quad mesh is constructed from a triangle mesh by subdivision.

For the numerical analysis we use SolidsPy, which is written in Python/NumPy. It is educational software which is easy to use and modify. SolidsPy supports triangle and quad meshes. In addition, it performs stress averaging for plotting continuous stress fields. For the given mesh and boundary conditions, information such as displacements, stresses and strains can be obtained. The von Mises stress can be computed from the \(\sigma _{xx}\), \(\sigma _{yy}\) and \(\sigma _{xy}\) stress fields.

To obtain an adaptively refined mesh, we use a uniformly refined fine mesh \(\mathcal {M}^{*}\) for a heuristic error estimation. Consider a mesh \(\mathcal {M}_{\ell }\) at level \(\ell\). The von Mises stress is computed on the mesh \(\mathcal {M}_{\ell }\) and on \(\mathcal {M}^{*}\). By using the error between the fine and coarse solution measured at every vertex of \(\mathcal {M}_{\ell }\), we generate a mesh size field \(\mathcal {B}_{\ell }\) as follows:

$$\begin{aligned} \mathcal {B}_{\ell }=\mathbf {h} \left( 1-\frac{\varvec{\varepsilon }_{avg}}{2\varepsilon _\text {max}} \right) , \end{aligned}$$
(5)

In (5), \(\mathbf {h}\) and \(\varvec{\varepsilon }_{avg}\) represent the maximum edge length and the average von Mises error on each element, while \(\varepsilon _{max}\) denotes the global maximum von Mises error in the mesh. We refer to the Gmsh tutorial for using the estimated mesh size field \(\mathcal {B}_{\ell }\) to create an updated mesh \(\mathcal {M}_{\ell +1}\). An example of adaptive refinement obtained using this method is shown in Fig. 5, where the meshes are refined progressively from Fig. 5a to d. In the figures, the Dirichlet and Neumann boundaries are highlighted in green and red, respectively, while the red arrows indicate the traction direction. We choose this heuristic refinement approach to reduce the influence of the initial mesh and of the refinement procedure that is usually present in adaptive schemes based on marking elements.
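A NumPy sketch of Eq. (5) is given below. The array shapes and names are our assumptions; in particular, we average vertex-wise von Mises errors per element and take the global maximum over these averages.

```python
import numpy as np

def mesh_size_field(h_max, vertex_err_per_elem):
    """Target mesh size per element, following Eq. (5).

    h_max               -- maximum edge length of each element, shape (n_elem,)
    vertex_err_per_elem -- von Mises error (fine minus coarse solution) at the
                           vertices of each element, shape (n_elem, 4) for quads
    """
    eps_avg = np.abs(vertex_err_per_elem).mean(axis=1)  # average error per element
    eps_max = eps_avg.max()                             # global maximum error
    # the factor lies in [1/2, 1), so one step at most halves the local size
    return h_max * (1.0 - eps_avg / (2.0 * eps_max))
```

The resulting per-element sizes are then passed to Gmsh as a size field to generate \(\mathcal {M}_{\ell +1}\), as described above.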

Fig. 5

Example of adaptive refinement

5 Network architecture

In the following we discuss the structure of the neural network architecture that we use in this paper. Usually, neural networks are built up of layers sequentially. The network receives an input which is transformed through a series of hidden layers followed by an output layer. The hidden layers and the output layer are made up of neurons associated with weights and biases. Each neuron takes as input the outputs of the neurons of the previous layer. The output of a neuron is given by an affine linear combination of the input values (taking the weights as coefficients and adding the bias), composed with an activation function. The composition of several such layers yields a non-linear function if the activation function is non-linear. In the case of a fully connected neural network, each neuron takes as input all neurons from the previous layer (with possibly non-zero weights).

The weights and biases of all the neurons are the degrees of freedom which are trained, i.e., for which one optimizes. The entire network is then trained based on a given set of input and target data, and the weights and biases are optimized with respect to a given objective. The objective is described in terms of a scalar-valued loss function. The aim of the training is to find the parameters (weights and biases) which minimize the loss function by using gradient descent type methods. Computing the gradients is accomplished by reverse-mode differentiation (back-propagation).

While fully connected neural networks are useful for solving many optimization problems, when handling higher dimensional data, they tend to result in a huge number of parameters and can lead to overfitting. Therefore, when the data possesses some local structure which is of relevance for the output of the network, one can use neural networks based on convolution.

5.1 Convolutional neural networks

Fig. 6

Convolutional neural networks

Convolutional neural networks (CNN) are designed for working with 2D or 3D data such as images. CNNs are commonly used for image classification: spatially coherent features such as corners and lines in the images are identified and combined to predict an associated label or category. Similarly to fully connected neural networks, they are formed by sequences of layers. An example of a CNN is illustrated in Fig. 6. It contains an input (green block), a convolution layer (blue block), and an output (red block). In the convolution layer, the filter is formed by neurons arranged in three dimensions. The filter “slides” over the input, performing element-wise multiplication with the current part of the input (highlighted with dark green in Fig. 6), and producing a local output (highlighted with dark red in Fig. 6).

The output size is determined by several factors including the filter size, input size, padding and stride. Padding refers to adding additional boundary data surrounding the image; it is commonly used for reducing border effects. The stride is the number of steps the filter moves in the convolution. Suppose the input has size \(W_I \times H_I \times D_I\); convolving it with \(D_F\) filters of size \(W_F \times H_F\), padding P, and stride S produces an output of size \([(W_I-W_F+2P)/S+1]\) \(\times\) \([(H_I-H_F+2P)/S+1]\) \(\times\) \(D_F\). In addition, techniques such as pooling and regularization can be added to the network to improve training.
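The size formula is easy to check programmatically; the following helper (ours, for illustration) applies it with the usual floor behaviour of integer division.

```python
def conv_output_size(w_in, h_in, w_f, h_f, padding, stride, n_filters):
    """Output size of a convolution layer according to the formula above;
    the output depth equals the number of filters."""
    w_out = (w_in - w_f + 2 * padding) // stride + 1
    h_out = (h_in - h_f + 2 * padding) // stride + 1
    return w_out, h_out, n_filters

# A 60x60x3 input with 32 filters of size 3x3, padding 1, stride 2:
print(conv_output_size(60, 60, 3, 3, padding=1, stride=2, n_filters=32))  # (30, 30, 32)
```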

5.2 The U-net architecture

In this study, we develop a network to learn the mesh density operator by using the U-net architecture [28]. The network consists of two parts: contracting and expanding. The first part is formed by encoder blocks which downsample the data, while the second part contains decoder blocks which upsample the data. Each block consists of convolution layers (transposed convolutions for the decoder blocks) and batch normalization. The two parts are connected by a bottleneck block which has a max pooling layer followed by convolution and batch normalization. The U-net for training the data is illustrated in Fig. 7. We adopt the U-net architecture because it is capable of capturing context while enabling precise localization. The contracting part of the U-net is the same as a general CNN, which extracts features such as re-entrant corners and the relative location of the Dirichlet and Neumann boundaries from the inputs. The network then upsamples its hidden layers to create a grayscale image. The grayscale image has intensity values ranging from 0 to 1, which indicate the relative mesh resolution. This architecture allows the network to combine features from different spatial regions of the images and to localize the mesh resolution more precisely at the regions of interest.

The main difference between the U-net architecture in [28] and the U-net architecture used in this study is that, in our case, there is only one max pooling layer, located at the bottleneck block. The low-level feature maps from the convolution layers and the precise position of the features in the inputs are important for meshing, which is sensitive to the geometry. For this reason, we limit the use of pooling, which has the effect of making the representation approximately invariant to small translations of the input [29].

We control the down-sampling through padding and striding in the convolution. We apply a stride of 2, which halves the size of the convolution output, thereby reducing the image size as we move down the contracting path. At the end of the contracting path, the size is reduced to \(4\times 4\). Max pooling is then used to further down-sample to \(3\times 3\), followed by transposed convolutions (de-convolutions), which reverse the dimension changes of the convolutions at each layer in the expanding path. For the decoder blocks, the filter size, padding and stride are selected such that the tensor sizes match those of the contracting path in reverse order. Concatenation is used to join the tensors of matching size from the contracting path to the expanding path. The final output is a \(60\times 60\) image with a single channel. We set the depth of the filter at the first encoder block to \(D_F=32\). It is doubled from one block to the next in the contracting part. At the max pool block, the depth is increased to \(D_F=512\). The depth is then halved until \(D_F=32\) at the last decoder block. Output samples from the U-net are given in Fig. 8. Each encoder, decoder and max pool block provides a group of outputs. The image resolution and the number of layers differ after each block; for example, the resolution decreases and the depth increases from one encoder block to the next. For each group, we show a representative layer (a slice through the depth) and label the actual size.
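The following PyTorch sketch reproduces the tensor sizes described above (three 60×60 input channels, one stride-2 convolution per encoder block, a single max pooling at the bottleneck, skip connections by concatenation, and a single-channel 60×60 output). It is a minimal illustration consistent with the stated dimensions, not the authors' implementation; the decoder kernel sizes are chosen only so that the shapes match.

```python
import torch
import torch.nn as nn

def enc(cin, cout):
    # encoder block: stride-2 convolution halves the spatial resolution
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

def dec(cin, cout, k, s, p):
    # decoder block: transposed convolution upsamples
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, k, stride=s, padding=p),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class MeshDensityUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.e1, self.e2 = enc(3, 32), enc(32, 64)      # 60 -> 30 -> 15
        self.e3, self.e4 = enc(64, 128), enc(128, 256)  # 15 -> 8 -> 4
        self.bottleneck = nn.Sequential(                # 4 -> 3, depth 512
            nn.MaxPool2d(2, stride=1),
            nn.Conv2d(256, 512, 3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True))
        self.d4 = dec(512, 256, 2, 1, 0)  # 3 -> 4
        self.d3 = dec(512, 128, 4, 2, 1)  # 4 -> 8   (256 skip + 256)
        self.d2 = dec(256, 64, 3, 2, 1)   # 8 -> 15  (128 skip + 128)
        self.d1 = dec(128, 32, 4, 2, 1)   # 15 -> 30 (64 skip + 64)
        self.head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 30 -> 60
            nn.Conv2d(32, 1, 1), nn.Sigmoid())  # intensities in (0, 1)

    def forward(self, x):        # x: (B, 3, 60, 60)
        s1 = self.e1(x)
        s2 = self.e2(s1)
        s3 = self.e3(s2)
        s4 = self.e4(s3)
        y = self.d4(self.bottleneck(s4))
        y = self.d3(torch.cat([y, s4], dim=1))
        y = self.d2(torch.cat([y, s3], dim=1))
        y = self.d1(torch.cat([y, s2], dim=1))
        return self.head(torch.cat([y, s1], dim=1))  # (B, 1, 60, 60)
```

A quick shape check, `MeshDensityUNet()(torch.rand(1, 3, 60, 60)).shape`, returns `torch.Size([1, 1, 60, 60])`, and the final sigmoid keeps the predicted intensities in (0, 1).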

Fig. 7

U-net architecture for mesh size prediction. It consists of contracting and expanding parts

Fig. 8

Representative outputs for the U-net structure shown in Fig. 7. The input of the network contains three layers which represent the geometry, Dirichlet boundary and Neumann boundary, respectively, as shown in Fig. 1a–c

5.3 Model evaluation

In this study, we seek the operator that maps a given geometry and boundary conditions to an optimal (adaptive) mesh density. The inputs and outputs are stored as grayscale images which can be converted to matrices of a fixed size corresponding to the image resolution. The mean square error is used as the regression loss function. It is defined by

$$\begin{aligned} MSE=\frac{\sum _{i=1}^n(x_i-x_i^p)^2}{n}, \end{aligned}$$
(6)

where \(x_i\) and \(x_i^p\) refer to the ground truth and predicted intensity values at pixel i, and n is the number of pixels.

6 Numerical examples

We conduct several experiments to test the proposed mesh density estimation method. Using the U-net discussed in Sect. 5.2, we train six models. The first four models are Model(0, 0, 0), Model(1, 0, 0), Model(0, k, 0), and Model(1, k, 0), \(k\in \{1,2\}\), where the associated triplets (introduced in Sect. 3.1) indicate the geometric complexity classes used for training. Since each of these models is trained with data restricted to simple geometries or groups of highly related geometries, they are expected to yield more accurate results. However, this choice requires that the underlying complexity class of the example be known beforehand. Thus, it is more reasonable to have models trained for all possible geometric complexity classes. Here we examine two extended models, referred to as the Model with Reduced Training (Model RT) and the Model with Complete Training (Model CT). They are trained using combinations of classes as listed below:

  • Model RT: (0, 0, 0), (1, 0, 0) and \((0,k,0), k\in \{1,2\}\)

  • Model CT: (0, 0, 0), (1, 0, 0), \((0,k,0), k\in \{1,2\}\) and \((1,k,0), k\in \{1,2\}\)

While Model CT covers all possible cases, Model RT is easier to train, as no training data of high complexity (1, k, 0), \(k\in \{1,2\}\), needs to be constructed. Nonetheless, it may still be applied to examples of high complexity.

A comparison of ground truth and output for all the models is given in Sect. 6.2. Sections 6.3 and 6.4 show the results of Model(0, 0, 0) and Model(1, 0, 0) on simple geometric domains. The different behavior of Model(1, k, 0), Model RT and Model CT on a domain with complex geometry is shown in Sect. 6.5. Meanwhile, an example with a smooth boundary is shown in Sect. 7.1.1, which demonstrates the generalization properties of the corresponding Model(0, k, 0). Finally, the overall performance is quantified using histograms in Sect. 6.6. We also show the plot of the loss function vs. training epochs for the models at the end of this section. In the following subsection we introduce the error measures that we consider.

6.1 Error measures

We use the relative error to evaluate the predicted output. Let \(I^G\) and \(I^P\) be the \(n_p \times n_p\) matrices representing the ground truth and predicted output images, respectively, where \(n_p\) is the number of pixels in each space direction. Here \(I^G_{ij} \in [0,1]\) and \(I^{P}_{ij} \in (0,1]\) denote the value of the mesh size for the subdomain covered by the pixel with index ij. We separate the image into two disjoint subdomains \(\Omega ^{I}\) and \(\Omega ^{O}\), where \(\Omega ^{I}\) denotes the pixels which are (at least partially) in the computational domain and \(\Omega ^{O}\) denotes the pixels which are completely outside the domain. Furthermore, we set \(I^G_{ij}=1\) for the pixels which belong to \(\Omega ^{O}\), whereas \(I^G_{ij}<1\) for pixels in \(\Omega ^{I}\). For a given domain and set of boundary conditions, the accuracy of the predicted mesh size is evaluated using the relative error (\(\varepsilon _{rel}\)) between the ground truth and the preprocessed output (\(\tilde{I}^{P}\)), defined by:

$$\begin{aligned} \varepsilon _{rel}=\frac{\left\| I^G- \tilde{I}^{P}\right\| }{\left\| I^G\right\| } \end{aligned}$$
(7)

where \(\tilde{I}^{P}\) is obtained by setting \(I^{P}_{ij}\) within \(\Omega ^O\) to be 1 and \(||\cdot ||\) denotes the \(L^2\) norm. \(\tilde{I}^{P}\) is adopted for calculating \(\varepsilon _{rel}\) such that only the computational domain of interest is considered when computing the error.
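A direct NumPy transcription of Eq. (7) could look as follows (the argument names are ours).

```python
import numpy as np

def relative_error(I_gt, I_pred, inside_mask):
    """Relative L2 error of Eq. (7). `inside_mask` is True for pixels in
    Omega^I; predicted values outside the domain are overwritten with 1
    before comparing, yielding the preprocessed output I~^P."""
    I_tilde = np.where(inside_mask, I_pred, 1.0)
    return np.linalg.norm(I_gt - I_tilde) / np.linalg.norm(I_gt)
```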

For validation of the linear elasticity solution, we calculate the error in the relative energy norm of the finite element solution [30]. It is defined in terms of exact stress \(\varvec{\sigma }\), computed stress \(\varvec{\sigma }^h\), and coefficient matrix \(\varvec{D}\) written in Voigt notation as:

$$\begin{aligned} e_{rel} =\frac{\sqrt{\frac{1}{2}\int _\Omega (\varvec{\sigma }-\varvec{\sigma }^h)^T \varvec{D}^{-1} (\varvec{\sigma }-\varvec{\sigma }^h) d\Omega }}{\sqrt{\frac{1}{2}\int _\Omega \varvec{\sigma }^T \varvec{D}^{-1} \varvec{\sigma } d\Omega }} \end{aligned}$$
(8)

and

$$\begin{aligned} \varvec{D} = \frac{E}{1-\nu ^2}\begin{bmatrix} 1 &{} \nu &{} 0 \\ \nu &{} 1 &{} 0 \\ 0 &{} 0 &{} 0.5(1-\nu ) \end{bmatrix}. \end{aligned}$$
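For elementwise-constant stresses, Eq. (8) can be approximated as below. This is a sketch under our own assumptions: a one-point quadrature rule per element and stresses in Voigt order; note that the factors 1/2 cancel in the ratio.

```python
import numpy as np

def energy_norm_error(sigma_ref, sigma_h, areas, E=1.0e3, nu=0.3):
    """Relative error in the energy norm, Eq. (8), with one stress value
    per element: sigma_ref and sigma_h have shape (n_elem, 3) in Voigt
    order (sxx, syy, sxy), and areas has shape (n_elem,)."""
    D = E / (1 - nu**2) * np.array([[1.0, nu, 0.0],
                                    [nu, 1.0, 0.0],
                                    [0.0, 0.0, 0.5 * (1 - nu)]])
    Dinv = np.linalg.inv(D)
    diff = sigma_ref - sigma_h
    num = np.sum(areas * np.einsum("ei,ij,ej->e", diff, Dinv, diff))
    den = np.sum(areas * np.einsum("ei,ij,ej->e", sigma_ref, Dinv, sigma_ref))
    return np.sqrt(num / den)
```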

6.2 Comparison of relative error

As mentioned previously, the idea of the proposed method is to provide the geometry and boundary conditions as input (in the form of images) to the network and receive a grayscale image (the predicted mesh size density) as output. Here, we compare the relative error between the predicted output and the ground truth for the different models. Figure 9 shows the average relative error computed on the test data set for the geometric complexity classes labeled on the left side of the figure.

The blue bars in the chart are the results from models trained with the respective data sets, i.e. Model (0, 0, 0), Model (1, 0, 0), Model (0, k, 0), and Model (1, k, 0). They always have the smallest average relative error. The green bars are the results from Model RT. Model RT results in higher relative error for test data sets (1, 0, 0) and (1, k, 0). This is because there are fewer data with non-convex boundaries used for training this model. The red bars are the results from Model CT. Although this model is trained with data from all possible geometry complexity classes, it results in higher relative error. The reason is that the network learns better when the training data is consistent. Nonetheless, Model CT could be useful for prediction of mesh size density when there is no information given on the geometry complexity class.

Fig. 9

Comparison of the average \(\varepsilon _{rel}\) of the test data sets for each geometry group on different models. The lengths of the blue bars correspond to the average \(\varepsilon _{rel}\) from Model (0, 0, 0), Model (1, 0, 0), Model (0, k, 0), and Model (1, k, 0), respectively. The green and red bars show the average \(\varepsilon _{rel}\) associated with Model RT and Model CT

6.3 Example for geometric complexity class (0,0,0)

In the following, we examine the mesh discretization constructed based on the predicted output. We start by showing the result from Model(0, 0, 0), which is trained on a simple geometric complexity class. The adaptive mesh, the mesh constructed from the predicted output, and the uniformly refined mesh are shown in the first row of Fig. 10. The meshes are constructed such that they have approximately the same number of elements. The boundary marked in green represents the Dirichlet boundary. The Neumann boundary is highlighted in red and the arrows denote the traction direction. The figures below each mesh show the corresponding von Mises stress. The relative error between the ground truth and the predicted output is \(\varepsilon _{rel}=0.019728\).

Figures 10a and 10b show that the predicted output has a mesh density distribution very similar to the adaptive mesh (finer mesh size around the Dirichlet boundary and at the end points of the Neumann boundary). However, it does not capture well the high stress around the end points of the Dirichlet boundary (see Fig. 10f). On the other hand, the uniform mesh in Fig. 10d shows a higher stress in the corresponding areas (see Fig. 10h). The reason is that for certain geometries, the uniform mesh contains smaller quads around the corner for better shape approximation; therefore it is able to capture the high stress if it occurs at those corners. Nonetheless, we can improve the predicted mesh by post-processing the network output. The output is scaled using a piecewise linear function such that intensity values lower than 0.02 are scaled by a factor of 0.25. The post-processed result is shown in Fig. 10c. The corresponding von Mises stress distribution (Fig. 10g) shows that the mesh better resolves the high stress region near the Dirichlet boundary. Meanwhile, this locally increased refinement also results in fewer and coarser elements in other areas, as a result of imposing a constraint on the number of elements. The comparison of the number of elements and the relative error in the energy norm (\(e_{rel}\)) is given in Table 1. Note that the error is mostly concentrated in the region with high stresses near the fixed boundary.
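The post-processing step amounts to a simple rescaling of the output image; a sketch of the stated rule (threshold 0.02, factor 0.25) is given below. Whether the scaling is applied as a sharp cut-off, as here, or blended continuously is an implementation detail the paper does not specify.

```python
import numpy as np

def postprocess(density, threshold=0.02, factor=0.25):
    """Scale small predicted mesh sizes further down, strengthening the
    refinement in regions where the network already predicts fine meshes."""
    return np.where(density < threshold, factor * density, density)
```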

Fig. 10

Example Model(0, 0, 0). First row: mesh from a adaptive refinement, b predicted output, c post-processed output, and d uniform refinement. Second row: von Mises stress for e adaptive mesh, f predicted output mesh, g post-processed output mesh, and h uniform refinement mesh

Table 1 Example Model(0, 0, 0) : comparison of number of elements, relative error in the energy norm and maximum value of von Mises stress

6.4 Example for geometric complexity class (1,0,0)

This example shows the result from Model(1, 0, 0). The comparison of the adaptive, predicted and fine uniform meshes and their von Mises stresses is given in Fig. 11. For this example, the relative error between the predicted output and the ground truth is \(\varepsilon _{rel}= 0.009530\). The predicted and adaptive meshes match fairly well in the domain, especially around the Dirichlet boundary. The relative errors in the energy norm for the adaptive, predicted, and uniform meshes listed in Table 2 also demonstrate the capability of the proposed method for mesh density prediction.

Fig. 11

Model(1, 0, 0) example. First row: mesh from a adaptive refinement, b predicted output, and c uniform refinement. Second row: von Mises stress for d adaptive mesh, e predicted output mesh, and f uniform refinement mesh

Table 2 Model(1, 0, 0) example: comparison of number of elements, relative error in the energy norm, and maximum value of von Mises stress

6.5 Example for geometric complexity class (1,2,0)

Here we show an example with complex geometry: a non-convex domain with two voids. The comparison of the adaptive mesh, the uniform mesh and the predicted output from Model(1, k, 0) is shown in the first row of Fig. 12. The figures below the meshes are the corresponding von Mises stresses. In this example, we also compare the predicted outputs from Model RT and Model CT. The resulting meshes and von Mises stresses are shown in Fig. 12g–j. The predicted output from Model(1, k, 0) has the relative error \(\varepsilon _{rel}= 0.014139\). Model CT results in a slightly lower relative error \(\varepsilon _{rel}= 0.012991\). An interesting observation is that the results from Model RT (\(\varepsilon _{rel}=0.014128\)) are very close to those of Model(1, k, 0). The geometric complexity class (1, 2, 0) is not included in the training data set for Model RT, but the U-net is able to approximately reproduce the higher mesh density around the holes. This demonstrates the generalization capability of the network, which is an advantage of the proposed data-driven mesh density prediction method.

The number of elements and the relative errors in the energy norm for all the meshes are listed in Table 3. We can observe that when the input has a complex geometry, the complete training model (Model CT) can indeed provide better discretization results.

Table 3 Example for geometric complexity class (1, 2, 0): number of elements, relative error in the energy norm, and maximum value of von Mises stress
Fig. 12

Comparison of results from trained networks. First row: a adaptive mesh, b output mesh, and c uniform refinement mesh. Second row: von Mises stress computed on d adaptive mesh, e output mesh and f uniform refinement mesh. Third and fourth rows: mesh constructed using predicted output from g Model RT and h Model CT and the corresponding von Mises stress

6.6 Evaluation of models

For a given predicted output, we construct a uniform mesh with approximately the same number of degrees of freedom, such that \(\frac{|DOFs^{U}-DOFs^{P}|}{DOFs^{P}}<0.05\), where \(DOFs^{U}\) and \(DOFs^{P}\) represent the degrees of freedom of the uniform and predicted meshes. We then compare the solution computed on the predicted mesh with that computed on the uniform mesh using the ratio of the errors in the energy norm:

$$\begin{aligned} R^{U}=\frac{ e_{rel}^{U}}{ e_{rel}^{P}} \end{aligned}$$
(9)

where \(e_{rel}^{U}\) and \(e_{rel}^{P}\) refer to the relative error in the energy norm (8) of the uniform and predicted mesh, respectively. The ratio \(R^{U}\) is interpreted as follows:

  • \(R^{U}\) \(\approx\) 1 : both meshes have similar analysis accuracy.

  • \(R^{U}\) < 1 : the uniform mesh possesses better analysis accuracy.

  • \(R^{U}\) > 1 : the predicted mesh has better analysis accuracy.

The distributions of \(R^{U}\) for Model(0, 0, 0), Model(1, 0, 0), Model(0, k, 0), and Model(1, k, 0) are shown in Fig. 13. Each histogram shows the values of \(R^{U}\) computed on 100 randomly selected test data sets from the corresponding geometric complexity classes. There are some cases where both meshes have similar accuracy, or the uniform mesh has better accuracy. However, the right-skewed distributions indicate that the predicted mesh performs better in most cases.

Fig. 13

Distribution of \(R^{U}\) for a Model(0, 0, 0) , b Model(1, 0, 0), c Model(0, k, 0), and d Model(1, k, 0), \(k\in \{1,2\}\)

Finally, the training log for all the models discussed above is given in Fig. 14. In the figures, the red line and the blue dashes represent the training log and the validation log, respectively. Note that we used the U-net architecture discussed in Sect. 5.2 to train the different groups of data (i.e. the different geometric complexity classes). To avoid over-fitting, the training process is terminated when the validation loss starts to increase. The figure shows that all the training and validation losses converge at very similar rates. A likely cause for this effect is the homogeneity of the data, in the sense that cases of equal difficulty are included in the testing and training sets. Although the data sets are randomly generated, they belong to the same or very similar geometric complexity classes.

Fig. 14

Training log for the six models trained

6.7 Comparison with MeshingNet on a test geometry

In the following, we demonstrate the proposed method’s capabilities for handling geometries obtained from image data. We compare our approach with the approach presented in [26]. We present the results of our method in Figs. 15 and 16, which are both obtained from the image of the domain given in Fig. 7 of [26]. Note that the linear elasticity problems in [26] are set in such a way that the Dirichlet boundary is formed by two edges and the Neumann boundary is formed by one edge. Since our models are trained with one edge for each of the Dirichlet and Neumann boundaries, we set the boundary conditions in the following examples to be compatible with the trained models.

Figures 15 and 16 show two examples, where each is assigned one of the Dirichlet edges given in [26], while the Neumann boundary is set at the same position. In the two figures, the Dirichlet and Neumann boundaries are highlighted in green and red, respectively. We can observe that both predicted meshes (Figs. 15b and 16b) are finer around the Dirichlet boundary. The locations where stress values are expected to be high are predicted accordingly, and the analysis results from the output meshes have lower relative errors in the energy norm compared to uniform meshes. Tables 4 and 5 show the comparison of the number of elements for each mesh and the relative error in the energy norm, computed using the adaptive mesh as the reference solution. We point out that the predicted mesh might not match perfectly with the adaptively refined mesh. This is because the latter is computed using the exact geometry, while the predicted local refinement is based on coarse resolution images.

This study and [26] both introduce machine learning approaches for meshing problems; however, we focus on different aspects. In [26], the domain is restricted to be a polygon with 6-8 edges. In addition, the Dirichlet boundaries are fixed at the 4th and 5th edges, while the 1st edge is always set as the Neumann boundary. However, the model has the option of setting different (homogeneous) material properties and tractions with random amplitudes up to 1000. Since the underlying problem is linear, we assume that these inputs do not significantly affect the relative mesh density. However, the optimal mesh grading may be affected. Thus, to improve the results we also discuss extensions using varying PDE parameters in Sect. 7. Our method emphasizes handling a larger variety of geometries, ranging from convex to non-convex and to domains with voids. We restrict our model to impose boundary conditions only on non-adjacent edges to avoid the singularity formed at the points where the Dirichlet and Neumann boundaries join. Nonetheless, the proposed method can also be extended to work with other boundary conditions by simply adding new data in the training process.

Table 4 Comparison between the number of elements and relative errors in the energy norm, for the example from Fig. 15, on a geometry from [26]
Table 5 Comparison between the number of elements and relative errors in the energy norm, for the example from Fig. 16, on a geometry from [26]
Fig. 15

Geometry from [26], with the lower left edge of the domain set as the Dirichlet boundary. First row: mesh from a adaptive refinement, b predicted output, and c uniform refinement. Second row: von Mises stress for d adaptive mesh, e predicted output mesh, and f uniform refinement mesh

Fig. 16

Geometry from [26], with the edge at the left side of the domain set as the Dirichlet boundary. First row: mesh from a adaptive refinement, b predicted output, and c uniform refinement. Second row: von Mises stress for d adaptive mesh, e predicted output mesh, and f uniform refinement mesh

7 Discussion and future work

A key strength of our method is that it can be extended in a vast number of possible directions. These include the application to more complicated model problems as well as the use of different discretization techniques for solving partial differential equations.

There are two possible ways of extending our approach: either we generate training data for more general cases, or we modify the network architecture itself; these are summarized in Sects. 7.1 and 7.2, respectively.

7.1 Extension by adding training data

The measures of geometric complexity that we consider in Sect. 3.1 can be extended in many ways by generating more training data sets. For example, in Sect. 4.1 we restricted our data set to contain only problems where the fixed and the traction boundary are not adjacent. However, this restriction may be dropped by properly extending the training data set. Similarly, one may train with several edges marked as fixed and/or several edges marked as traction boundary.

7.1.1 Extension to curved boundaries

Another possible extension is the application to domains with curved boundaries. To illustrate this, we present an example for the geometric complexity class (0, 1, 1). We use a well-known benchmark problem (the square plate with a circular hole) to demonstrate this generalization of the proposed data-driven method. As mentioned in Sect. 4.1, the data used for training are defined by polygons. Smooth boundaries such as circular voids are 'new' to the trained model. We observe that the model predicts a larger mesh size for such a type of geometry that it has not encountered before; this leads to a higher relative error \(\varepsilon _{rel}=0.08481937\). The mesh constructed using the predicted output has a mesh size distribution pattern very similar to the adaptive mesh (i.e. a finer mesh around the hole). While the relative errors in the energy norm from the predicted mesh and the uniform mesh are very close (see Table 6), Fig. 17h shows that the predicted output mesh yields a better von Mises stress approximation around the hole than the uniform mesh (see Fig. 17i).

Fig. 17 Square with circular hole example. First row: a adaptive mesh, b output mesh, and c uniform refinement mesh. Second row: von Mises stress computed on d adaptive mesh, e output mesh and f uniform refinement mesh. Third row: von Mises stress difference between reference solution with g adaptive mesh, h output mesh and i uniform refinement mesh

Table 6 Example for geometric complexity class (0, 1, 1): number of elements, relative error in the energy norm, and maximum value of von Mises stress

This example shows that the network can be applied to unseen data, such as curved boundaries, and that the resulting mesh density is suitable, although not necessarily as good as an adaptive fit. Visually, the output density distribution resembles the adaptive meshes used as training data, but is more uniform. This effect may be reduced by post-processing the output image. Moreover, including domains with curved boundaries in the training data will most likely improve the result.

7.1.2 Beyond finite element methods

Most importantly, we intend to extend our method to discretization techniques beyond finite element methods, such as isogeometric analysis, which was introduced in [21]. While the network can be trained on data generated from any kind of adaptive numerical method, it can then be applied to any space of locally refinable discretizations, such as linear finite elements, multi-patch splines, THB-splines [31] or (unstructured) T-splines [32, 33]. Since B-splines (and derived concepts) rely on a patch structure, where each patch is a tensor-product, refinement becomes more involved: each patch may be refined in a tensor-product fashion or using, e.g., hierarchical B-splines or T-splines. While the latter two are more flexible, a fully tensor-product refinement for each patch is significantly easier to generate, and the subsequent assembly and solution of the resulting linear system is also faster. Thus, the multi-patch segmentation should be selected a priori such that a tensor-product refinement is as efficient (with respect to the number of degrees of freedom) as a local refinement scheme using, e.g., THB-splines. Hence, it may be of interest to encode not only the local mesh size, but also information on the mesh anisotropy. If the local refinement direction is clear in a region of the domain, one may fit a patch such that its parameter lines correspond to that direction. Such a segmentation may be performed using classical methods [34, 35] or, again, using machine learning.
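
A minimal sketch of what such an anisotropic output encoding could look like; the function name, the channel layout, and the inputs are purely hypothetical and not part of the method presented above:

```python
import numpy as np

def encode_anisotropic_sizes(h_along, h_across, angle):
    """Encode an anisotropic mesh-size field as a multi-channel image.

    h_along, h_across : (H, W) arrays of mesh sizes along and across the
                        preferred refinement direction
    angle             : (H, W) array of that direction, in radians

    Storing (cos 2a, sin 2a) instead of the raw angle removes the
    ambiguity between directions that differ by pi.
    """
    return np.stack(
        [h_along, h_across, np.cos(2 * angle), np.sin(2 * angle)], axis=0
    )
```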

7.1.3 Improved training strategies

The framework of our approach is given in such a way that, once the family of PDE problems (including material parameters, possible boundary conditions etc.) is set up and the image dimensions are chosen, the neural network can be trained based on the requirements of the users. We envision a continuous training strategy, always improving the network by including additional model geometries (or families of geometries) based on user experience and user demand. This way, the network architecture does not need to change when new types of input data are considered.

Another way to continuously improve the output of the network is by generating training data obtained from improved adaptive refinement schemes. Currently, we use a heuristic scheme, presented in Sect. 4.2, which may be replaced by a (quasi-)optimal local refinement scheme.

7.2 Generalizations that require a modified network structure

In the following we discuss possible generalizations that require a new network structure, increasing the number of input/output layers or changing their dimensions. In some cases, the previously trained networks may be adapted; in other cases, a new network may need to be set up.

7.2.1 More general model problems

In this paper we restrict ourselves to linear elasticity on a planar domain without body forces and constant boundary conditions along the traction boundary and fixed boundary. While this is already a challenging problem, our method is by no means restricted to this case.

Since we focus on a representation of the domain as an image, it is possible to encode varying parameters of the PDE or of the boundary conditions as color (or grayscale) images. Similarly, body forces can be included. Besides the variations in geometric complexity, it is then also of interest to vary the complexity of the PDE. As with the geometric models, one can define suitable complexity classes by hand, such as vanishing or non-vanishing body forces, constant or varying parameters etc. Alternatively, one may use a library of problems with known (or desired) solutions as a guide.
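
As a minimal sketch of such an encoding, assuming NumPy arrays and a channel layout of our own choosing (the function name and the normalization are illustrative, not part of the method above):

```python
import numpy as np

def encode_problem(geometry_mask, youngs_modulus, body_force):
    """Stack problem data into a multi-channel image for the network.

    geometry_mask : (H, W) binary array, 1 inside the domain
    youngs_modulus: (H, W) array of the (possibly varying) material parameter
    body_force    : (H, W, 2) array with the x/y components of the body force
    """
    # Normalize each physical quantity to [0, 1] so that all channels
    # live on a comparable scale for the network.
    e_channel = youngs_modulus / youngs_modulus.max()
    f_norm = np.linalg.norm(body_force, axis=-1)
    f_channel = f_norm / (f_norm.max() + 1e-12)
    # Channel 0: geometry, channel 1: material, channel 2: body-force magnitude.
    return np.stack([geometry_mask, e_channel, f_channel], axis=0)
```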

7.2.2 Pre- and post-processing of input and output

In the following we also want to highlight how the algorithm may be improved by properly processing the input and output. Since the network is set up such that both input and output are images, pre- and post-processing steps from image processing may be used to improve the results. Note that the network allows any image of the correct size as an input. Thus, the geometry does not need to be from one of the considered classes. This property can be used to train the network also with segments/cutouts of larger images, which may be split into appropriately sized parts. Moreover, image segments of different resolution may be considered and appropriately merged.
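
A minimal sketch of such a splitting step; the tile size of 256 pixels and the padding strategy are illustrative placeholders, not values used in this work:

```python
import numpy as np

def split_into_tiles(image, tile=256, stride=256):
    """Cut a larger input image into fixed-size tiles the network accepts.

    Pads the image so that its sides become multiples of `tile`, then
    returns (top-left corner, tile) pairs for later merging of outputs.
    """
    h, w = image.shape
    ph, pw = (-h) % tile, (-w) % tile
    padded = np.pad(image, ((0, ph), (0, pw)), mode="edge")
    tiles = []
    for i in range(0, padded.shape[0] - tile + 1, stride):
        for j in range(0, padded.shape[1] - tile + 1, stride):
            tiles.append(((i, j), padded[i:i + tile, j:j + tile]))
    return tiles
```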

The output of the network may be smoothed to obtain meshes with improved mesh grading. Similarly, one may use edge/highlight detection algorithms to find features in the output image that need to be refined further. This approach is also proposed in Sect. 6.3 and tested, see e.g. Table 1. Since we consider images, it may also be feasible to train the network with noisy data to make the output more robust. However, such modifications must be performed carefully and studied more deeply, since small details in the geometry can have a significant effect on the expected local mesh size.
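
A minimal sketch of these two post-processing ideas, assuming standard SciPy filters; the edge-based weighting is our illustrative heuristic, not the scheme of Sect. 6.3:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def postprocess_size_field(size_image, sigma=2.0, edge_weight=0.5):
    """Post-process a predicted mesh-size image.

    Gaussian smoothing improves the mesh grading; a Sobel edge detector
    marks sharp features, where the mesh size is then reduced.
    """
    smoothed = gaussian_filter(size_image, sigma=sigma)
    edges = np.hypot(sobel(size_image, axis=0), sobel(size_image, axis=1))
    edges = edges / (edges.max() + 1e-12)
    # Shrink the mesh size where strong features are detected.
    return smoothed * (1.0 - edge_weight * edges)
```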

7.2.3 Higher dimensions

The proposed method can be generalized to higher-dimensional problems, such as solving PDEs on volumetric domains or solving time-dependent problems, e.g. parabolic PDEs, using space-time methods. While the method can be set up for problems on volumetric domains in a straightforward way, the extension to surface domains or space-time problems is more difficult. The extension to space-time domains leads to 3D or multi-channel images instead of plain grayscale images. Note that for time-dependent PDEs, other types of networks, such as recurrent neural networks or LSTMs, might be more suitable.

Multi-channel images may also be used to generalize the approach to surface data when the surface can be parameterized over a planar quadrilateral domain. In this case, the differential equation on the parameterized surface can be pulled back to the parametric domain, resulting in a PDE with varying coefficients that can be encoded as separate channels of the image data. A typical example for this case are single-patch B-spline surfaces that are employed in isogeometric analysis.
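
As a minimal illustration of this pullback, consider the scalar Laplace–Beltrami case (the elasticity case is analogous but more involved): for a surface patch parameterized by \(F\) over a planar domain, with first fundamental form \(g = (DF)^\top DF\), the surface equation \(-\Delta_S u = f\) becomes, on the parametric domain,
\[
  -\frac{1}{\sqrt{\det g}}\,\partial_i\Bigl(\sqrt{\det g}\; g^{ij}\, \partial_j \hat{u}\Bigr) = \hat{f},
\]
where the coefficient fields \(\sqrt{\det g}\) and \(g^{ij}\) are precisely the varying data that could be stored as separate image channels.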

To process more general surface data, methods from geometric deep learning [24] may be used. These methods generalize the notion of image CNNs to data that is represented by graphs or discrete manifolds. The challenge here is to define operators that act on these data types and that perform similarly to the convolution operator on image grids. CNNs for discrete surface data are an active field of research [25, 36], and it would be an interesting direction for future work to employ these techniques to generate locally refined surface parameterizations.

8 Conclusions

In this paper we developed a local refinement method for PDEs using machine learning. The geometry of the domain as well as the boundary data of the PDE are encoded as images. These images are then used as input data for a neural network based on the U-net architecture. The output of the network is another image which encodes the local mesh size of a finite element mesh. From this information we then construct a mesh over the domain using Gmsh. The quality of the locally refined mesh can then be measured by computing the discretization error of the resulting finite element solution.
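
As a minimal sketch of how a predicted mesh-size image could drive the mesh generation, the following snippet uses the mesh-size callback of the Gmsh Python API; the unit-square assumption, the nearest-neighbour pixel lookup, and the omitted geometry definition are simplifications, not the exact pipeline used in this work:

```python
import gmsh

def mesh_from_size_image(size_image, extent=1.0):
    """Generate a quad-dominant Gmsh mesh driven by a size image.

    Assumes the domain fits into the square [0, extent]^2; the geometry
    definition itself is omitted here.
    """
    h, w = size_image.shape

    def size_callback(dim, tag, x, y, z, lc):
        # Map physical coordinates to pixel indices (nearest neighbour).
        i = min(int(y / extent * (h - 1)), h - 1)
        j = min(int(x / extent * (w - 1)), w - 1)
        return float(size_image[i, j])

    gmsh.initialize()
    # ... define the domain geometry here ...
    gmsh.model.mesh.setSizeCallback(size_callback)
    gmsh.option.setNumber("Mesh.RecombineAll", 1)  # quadrilateral elements
    gmsh.model.mesh.generate(2)
```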

We compare the quality of the output for different training strategies. To do so, we categorize possible input data sets using different geometric complexity classes, for which one can then pursue different training strategies. The output is expected to be best if the network is trained exactly with training data sets from the same complexity class as the input. This may, however, be quite expensive, since it requires generating training data for a wide range of geometric complexities. We showed that it suffices to train the network with data sets that are complex with respect to only one dimension of complexity and simple with respect to the others. For instance, if the network is trained on L-shaped domains without holes as well as on domains with a convex boundary and a hole, it produces a suitable output for an L-shaped domain with a hole, even though it has never seen such a domain. This means that one does not necessarily need to train for all possible geometric complexity classes, but can restrict the training to a subset for which data is easier to produce.

In the future we want to extend the approach to other types of PDEs and to a larger class of possible domains and PDE data. Most importantly, the approach can, in principle, also be extended to 3D; however, the classification of possible domains and the generation of training data then become more involved. In addition, we want to extend the method to generating multi-patch isogeometric discretizations with curved boundaries that take the estimated mesh density into account.