Parallel graph-grammar-based algorithm for the longest-edge refinement of triangular meshes and the pollution simulations in Lesser Poland area

In this paper, we propose parallel graph-grammar-based algorithm for the longest-edge refinements and the pollution simulations in Lesser Poland area. We introduce graph-grammar productions for Rivara’s longest-edged algorithm for the local refinement of unstructured triangular meshes. We utilize the hyper-graph to represent the computational mesh and the graph-grammar productions to express the longest-edge mesh refinement algorithm. The parallelism in the original Rivara’s longest edge refinement algorithm is obtained by processing different longest edge refinement paths in different three ads. Our graph-grammar-based algorithm allows for additional parallelization within a single longest-edge refinement path. The graph-grammar-based algorithm automatically guarantees the validity and conformity of the generated mesh; it prevents the generation of duplicated nodes and edges, elongated elements with Jacobians converging to zero, and removes all the hanging nodes automatically from the mesh. We test the algorithm on generating a surface mesh based on a topographic data of Lesser Poland area. The graph-grammar productions also generate the layers of prismatic three-dimensional elements on top of the triangular mesh, and they break each prismatic element into three tetrahedral elements. Next, we propose graph-grammar productions generating element matrices and right-hand-side vectors for each tetrahedral element. We utilize the Streamline Upwind Petrov–Galerkin (SUPG) stabilization for the pollution propagation simulations in Lesser Poland area. We use the advection–diffusion-reaction model, the Crank–Nicolson time integration scheme, and the graph-grammar-based interface to the GMRES solver.


Introduction
Air pollution is receiving a lot of interest nowadays. It is visible, especially in Lesser Poland area, as this is one of the most polluted cities in Europe [1]. Air pollution depends on traffic, climate, heating of building in the winter, the city's architecture, etc. The air quality can vary significantly over a distance of even a few hundred meters. Air quality simulation is a multidisciplinary endeavor. It applies numerical methods for simulations of different meteorological and chemical models [25]. This paper proposes a parallel graphgrammar-based system for simulation and prediction of air pollution over prescribed terrain data.
Most computer-aided simulations start with mesh generation of the domain with a finite set of elements. For irregular geometries, the triangular elements in two dimensions or tetrahedral elements in three dimensions are probably the most used finite elements in engineering computations [3]. The construction of the computational mesh usually starts from an initial mesh. It refines the mesh iteratively towards a final mesh, where we can solve our engineering problem with the required accuracy.
The process of refinement can generate so-called hanging nodes [4,5]. In two-dimensions, they represent an edge with one triangular element subdivided, and on the other side, the large unbroken element. These nodes are difficult to handle since we have shape functions spread over the finite elements. In the hanging node case, we have to deal with the matching of approximation of "small" shape functions spread over the two broken elements with the approximation of "large" shape functions spread over the large unbroken element. In three-dimensions, the situation gets even more complicated since we may have an edge adjacent to two "small" broken tetrahedra and adjacent to many (even hundreds) of unbroken "big" elements.
We can eliminate a hanging node of a triangular element by breaking the element and connecting the hanging node with the opposite node. We must perform this algorithm in a smart way since we do not want to end up with elongated elements, where the Jacobians go to zero. One interesting example of such an algorithm, which is considered a reference for two-dimensional grids, is the Rivara longest-edge refinement algorithm [20,31,32].
In this paper, we express the two-dimensional Rivara algorithm using graph-grammar productions, we extend it to model the generation of the three-dimensional tetrahedral meshes. We also propose graph-grammar productions expressing the stabilized finite element method for the nonstationary advection-diffusion-reaction simulations. We incorporate the Crank-Nicolson time integration scheme and interface with GMRES solver. We utilize this parallel graph-grammar-based system for the pollution simulations in Lesser Poland area.
The authors [7] proposed the first attempt to model mesh transformations by applying the graph-grammar concept for the regular triangular two-dimensional meshes with the h adaptation. The authors used quasi-context sensitive graph grammar. This approach, however, generated hanging nodes, with all the difficulties related to managing these nodes.
Another attempt utilized the Composite Programmable graph grammars (CP-graph grammar) introduced originally by [11][12][13] as a tool for a formal description of various design processes. The authors [10,[27][28][29] applied the CP-graph grammars to model two-and three-dimensional adaptive grids with hanging nodes.
In this paper, we use the concept of a hypergraph, defined in Sect. 2. The hypergraphs and their grammars have been initially introduced by [15,16] for applications in computer graphics. There are special algorithms developed and optimized for the hypergraphs [19,22,26]. This paper utilizes the hypergraphs for modeling mesh refinement algorithm and interfacing with the GMRES iterative solver. We have implemented our graph-grammar-based system in GALOIS [2,8,14,24,30] framework, allowing for concurrent graph processing.
We use the topographical database Shuttle Radar Topography Mission [6] to generate in an adaptive way the twodimensional triangular mesh representing Lesser Poland area. We utilize the longest-edge refinements, and we remove the hanging nodes. We extend the topographic mesh to a three-dimensional tetrahedral mesh, representing the air above the terrain.
The resulting three-dimensional mesh is subject to the computer simulation with the advection-diffusion-reaction time-dependent solver modeling the air pollution propagation. We employ the Streamline Upwind Petrov-Galerkin stabilization [21] of the advection-diffusion-reaction problem. We use the graph-grammar productions, generating element matrices and right-hand-side vectors for each tetrahedral element. We incorporate the Crank-Nicolson time integration scheme and interface with GMRES iterative solver.
The motivations for developing the graph-grammar-based simulation system are the following. We will compare the computational costs of the classical longest-edge refinement algorithm [20], to our graph-grammar-based algorithm on a model example. We will count the number of basic operations, such as checking the status of a single triangle and breaking a single triangle. While it is impossible to derive a formula for the computational cost for a general mesh, we will compare the algorithms on a representative model example. Classical Rivara's algorithm allows for parallelization by assigning each longest edge path to a single core. In our graph-grammar-based algorithm, we move forward, and we allow for processing a single longest edge path with multiple cores.
We also claim that developing a graph-grammar-based system has some potential benefit for parallelization of the computations. The graph grammar productions are basic undividable tasks that can be executed concurrently. The graph-grammar model allows for implementation in the graph processing system like, e.g., GALOIS environment [30]. The parallelization is obtained for free since the GALOIS system automatically manages concurrent processing of graph-grammar productions. In this paper, we implemented a graph-grammar-based model of mesh generation and generation of elemental matrices. In future work, we plan to develop the graph-grammar model of the iterative solver. We will perform matrix-vector multiplications element wise, multiplying the elemental matrices by local portions of the right-hand side, without assembling the matrices but assembling the resulting vector.

3
The structure of the paper is the following. We start from the formal introduction of the hypergraph and graph grammar in Sect. 2. Next, in Sect. 3, the two-dimensional mesh refinement algorithm, initially proposed by Rivara, is described. In Sect. 4, we introduce the graph grammar model expressing the Rivara mesh refinement algorithm. Section 5 is devoted to comparing the graph-grammar-based algorithm with Rivara's longest edge refinement algorithm, including the discussion on parallelization. Section 6 presents the graph-grammar-based productions interfacing with the GMRES solver, using the Crank-Nicolson time integration scheme. Finally, Sect. 7 is devoted to the numerical results of the simulation of pollution in Lesser Poland area. We conclude the paper with Sect. 8.

Hypergraphs and graph grammar
In this section, the concept of hypergraph and graph grammar are summarized, which are later used to model the refinements of the two-dimensional unstructured mesh.
Definition 1 An undirected attributed labeled hypergraph over label alphabet C and attribute set A is defined as a system G = (V, HE, t, l, at, val) , where: -V is a finite set of nodes, -HE is a finite set of hyperedges, -t ∶ HE → V * is a mapping assigning sequences of target nodes to hyperedges, -l ∶ V ∪ HE → C is a node and hyperedge labeling function, -at ∶ V ∪ HE → 2 A is a node and hyperedge attributing function, where 2 A is a power set of A. Definition 2 A hypergraph G 2 = (V 2 , HE 2 , t, l, at, val 1 ) over C and A is a subgraph of a hypergraph G 1 = (V 1 , HE 1 , t, l, at, val 2 ) over C and A, (i.e.,  where ( Ext = (Ext 1 , Ext 2 , … , Ext k )).

Remark 1
Let G 2 be a subgraph of G 1 . If we need G 2 to be a production suitable hypergraph H 2 = (G 2 , Ext 2 ) then we need to define Ext 2 in the following way. Let G 3 = (V 3 , HE 3 , t, l, at, val 3 ) where HE 3 = HE 1 − HE 2 and V 3 are the nodes in V 1 that are connected to HE 3 . Then, the nodes in Ext 2 are V 3 ∩ V 2 ; that is, the nodes that connect H 2 and H 3 . See Fig. 2.

Definition 4
A hypergraph production is a pair p = (L, R) , where both L and R are production suitable hypergraphs of the same type k (both having the same number of external nodes k).
Two graphs are isomorphic, if both have the same number of nodes and edges, the corresponding nodes of both graphs have the same labels and attributes, and the corresponding Hypergraph G 1 (top) and hypergraph G 2 (bottom). Notice that G 2 is a subgraph of G 1 (grayed out) edges of both graphs have the same labels, attributes and sequences of nodes belonging to them.
The application of the graph-grammar production p = (L, R) , where L is isomorphic with the hypergraph H 2 = (G 2 , Ext 2 ) and R is isomorphic with the hypergraph H 4 = (G 4 , Ext 4 ) to the hypergraph G 1 consists in removing the production suitable hypergraph H 2 from G 1 , replacing it by the production suitable hypergraph H 4 , and connecting external nodes of H 4 with the hyperedges of the hypergraph G 1 ∖G 2 in such a way that each hyperedge which connected the node v of G 1 ∖G 2 with the external node Ext 2 i of H 2 before application of the production, where i = 1, … , k , now connects node v of G 1 ∖G 2 with the external node Ext 4 i of H 4 . As the result, the graph G 1 is transformed into G 5 , where the set of nodes is equal to V 1 �V 2 ∪ V 4 and the set of edges is equal to The definition of graph grammar production can be extended by adding a condition over the labels and values of the hypergraphs' attributes, named the applicability predicate, which determines whether a hypergraph production can be applied.

Definition 6
A hypergraph production with applicability predicate is a triple p = (L, R, r) , where both L and R are production suitable hypergraphs of the same type and r is applicability predicate defined as: r ∶ F → {TRUE, FALSE} , where F is a set of logical expressions defined over labels and values of attributes of the hypergraphs.
A production with applicability predicate can be applied to a graph only in such a case in which the applicability predicate is fulfilled.

Definition 7
A hypergraph grammar is a system GG = (P, G S ) , where: -P is a finite set of hypergraph productions of the form p = (L, R, r). -G S is an initial hypergraph.

Longest-edge refinement algorithm
In this section, we are giving a brief description of the longest-edge algorithm. The longest-edge algorithm was first introduced by Rivara in 1984 [32] as the generalized bisection of simplices; be a set q = {a 1 , a 2 , … , a n+1 } of independent points in ℝ n , Rivara defines the diameter of the simplex as (q) = max (⟨a i , a j ⟩)∀i, j ∈ [1, n + 1] where (⟨a i , a j ⟩) = ‖a i − a j ‖ 2 Therefore, there exist two points a k , a m such that (q) = (⟨a k , a m ⟩) . Then, the generalized bisection of the simplex q consists in adding a new point a = (a k + a m )∕2 , and splitting the simplex q in two new simplices q k and q m such that Once the theoretical framework of the general bisection is laid out, Rivara develops a practical algorithm to ensure the conformity of the mesh: the Longest-Edge Propagation Path ( LEPP ) [20]. The main idea is to bisect an edge only when it is the longest-edge of all its adjacent elements (this edge is known as the terminal edge). To this end, when one triangle 0 is marked to be refined, we need to traverse all the adjacent elements through its longest edge until we find a terminal edge. All the traversal elements are known as Longest-Edge Propagation Path, and they constitute the set LEPP ( 0 ). The algorithm bisects the last two elements of the set LEPP( 0 ) and reconstructs it again until LEPP( 0 ) is empty.
In 2009, Rivara published a review of the longest-edge bisection method [17] where she describes the man properties of the method: -The iterative and arbitrary use of this method produces triangles whose smallest interior angles are always greater than or equal to half the initial mesh's smallest internal angle. Furthermore, there exists a similarity between generated triangles. This property proves the non-degeneracy of the algorithm. -Longest-edge bisection always terminates in a finite number of steps. -The relationship between the two adjacent triangles' diameter is positive and greater than a constant (K) that depends on the initial triangulation. This property ensures the smoothness (no abrupt change of size) of the new mesh. -The global iterative application of the method in any triangulation generates most of the new triangles quasiequilateral (with smallest angles greater than 30 • ).
After executing the production, the original hypergraph In 2013, a parallel multi-threaded version of the LEPP algorithm was developed by Rivara [18], where each thread manages the LEPP of a single triangle marked to be refined.
The new contributions of our paper can be summarized as follows: -the parallelism in the original Rivara's longest edge refinement algorithm is achieved by processing different longest edge refinement paths in different cores, while our graph-grammar-based algorithm allows for additional parallelization within a single longest-edge refinement path, -we express the Rivara algorithm by graph-grammar productions, and use it for the topographic mesh generation, -we extend it to model the generation of the three-dimensional tetrahedral meshes span over the terrain mesh, -we express the stabilized finite element method of the non-stationary advection-diffusion-reaction simulator by graph grammar productions, working on top of the three-dimensional tetrahedral finite element mesh, -we incorporate the Crank-Nicolson time-integration scheme and interface our graph-grammar system with GMRES solver, -we perform the pollution simulations in Lesser Poland area.

Two-dimensional mesh refinement algorithm expressed by a hypergraph-grammar
In this section, we express the two-dimensional mesh refinement algorithm, initially proposed by Rivara [20] using graph-grammars. Instead of following the idea of the Longest-Edge Propagation Path algorithm, we will define a hypergraph that models an unstructured triangular mesh and a set of productions that modify the triangles locally.
Productions are set such that all of them bisect the triangle, so they have to be applied only at triangles marked for refinement or triangles that need to be bisected to conform to the mesh; i.e., triangles with one, two, or three hanging-nodes.

Remark 2
Criteria for longest-edge in equilateral or isosceles triangles. The criterion for choosing the longest-edge is to prevent propagation, therefore, the priority for edges is: 1. Edge with a hanging-node; 2. Edge on the boundary; 3. Rest of the edges.

Hypergraph definition
The hypergraph modelling an unstructured mesh with triangular elements is defined with the set of labels C = {N, E, T} and attributes A = {x, y, z, HN, B, L, R} , where -N is a hypergraph node label that represents a triangular element node. -E is a hyperedge label that denotes an edge of a triangular element. -T is a hyperedge label that denotes an interior of a triangular element. x is a hypergraph node attribute which denotes x coordinate of the node, where D x ⊂ ℝ. y is a hypergraph node attribute which denotes y coordinate of the node, where D y ⊂ ℝ.

Productions
We need six productions to perform the longest-edge bisection algorithm. They can be summarized as follows: (P1) Triangle has no hanging-node and is marked to be refined. Predicate prioritizes the longest-edge on the border. (P2) Triangle has one hanging-node; the longest-edge is the one that contains the hanging-node. (P3) Triangle has one hanging-node; the longest-edge is one that does not contain the hanging-node. Predicate prioritizes the longest-edge on the border. (P4) Triangle has two hanging-nodes; the longest-edge is one that contains a hanging-node. (P5) Triangle has two hanging-nodes; the longest-edge is the one that does not contain a hanging-node. Predicate prioritizes the longest-edge on the border. (P6) Triangle has three hanging-nodes.

Remark 3
The six productions assume that any triangle edge can only contain one hanging-node. To ensure this restriction, productions only allow to break one edge that is connected to two regular nodes (no hanging-nodes).
To improve readability in the productions, a new function ( NL ) has been introduced. This function computes the length of a new edge depending on nodes. There are two possibilities depending on the number of arguments. So the two functions are: Next, we describe in more detail the six productions and their predicates.

Production 1 (P1)
Production (P1), Fig. 5, expresses the bisection of a triangle with no hanging-nodes. This production will bisect the triangle by edge 1.
Analysis of the predicate: ). The first condition ensures that the triangle has to be marked for refinement ( R1 ), and that the edge 1 is one of the longest-edges ((L1 ≥ L2 ) AND ( L1 ≥ L3)). If these two conditions are met, there are two cases: -(B1 ) If edge 1 is on the boundary ( B1 ), then the triangle will be bisected; prioritizing the boundary. Note that in this case, we don't have to check if they are any hanging nodes at the end of the edge. This is because there are no hanging nodes on the boundary.
is not on the boundary ( NOT B1), we need to ensure that the two nodes of edge 1 are not hanging nodes ( NOT HN1 AND NOT HN2 ); if they are, we cannot break edge 1 yet. Finally, we should ensure that there is no other longestedge on the boundary ( NOT (( B2 AND L2 = L1 ) OR ( B3 AND L3 = L1))).
This production breaks the longest edge, generating a new node in its midpoint (x = (x1 + x2)∕2, y = (y1 + y2)∕2 , z = (z1 + z2)∕2) ; this new node will be hanging if the edge is not on the boundary or a regular node if it is on the boundary ( HN=!B1 ). The breaking of the edge also generates two new edges whose lengths are half the length of the broken edge (L=L1/2), and inherit the boundary flag (B=B1 ). Then, the triangle has to be bisected; to this end, it generates a new edge that connects the newly created node with the opposite node, computing the length of this new edge (L=NL(1,2,3)) and setting it as an interior edge (B=FALSE ). Finally, it generates the new two triangles that are not marked to be refined (R=FALSE).

Production 2 (P2)
Production (P2), Fig. 6, expresses the bisection of a triangle with one hanging-nodes by the edge that contains the hanging-node.
Analysis of the predicate: The only condition in this production ensures that the broken edge (edge 4 + edge 5) is one of the longest-edges (((L4+L5 ) ≥ L2 ) AND ((L4+L5 ) ≥ L3)). We don't need to check if the triangle is marked to be refined since this bisection is needed for conformity. No other conditions are required since an edge with a hanging-node has a higher priority.
This production does not break the longest edge since it's already broken. The production does bisect the triangle; to this end, it generates a new edge that connects the newly created node with the opposite node, computing the length of this new edge (L=NL(4,3)) and setting it as an interior edge

Production 3 (P3)
Production (P3), Fig. 7, expresses the bisection of a triangle with one hanging-nodes by one edge that does not contain the hanging-node. This production will bisect the triangle by edge 3.
Analysis of the predicate: ( ( L3 ≥ L2 ) AND ( L3 > ( L4+L5 )) ). The first condition ensures that edge 3 is longer or equal to edge 2 ( L3 ≥ L2 ). It also ensures that edge 3 is strictly longer than the edge that is broken ( L3 > ( L4+L5)); it should be strictly longer because if they have the same length, the broken edge has higher priority.
We don't need to check if the triangle is marked to be refined since this bisection is needed for conformity.
Once we know that edge 3 is suitable for being broken, there are two cases:  Finally, we should ensure that edge 2 is not a longestedge on the boundary ( NOT ( B2 AND L2 = L3 ) ).
This production breaks edge 3, generating a new node i n i t s m i d p o i n t (x = (x1 + x3)∕2, y = (y1 + y3)∕2, z = (z1 + z3)∕2) ; this new node will be hanging if the edge is not on the boundary or a regular node if it is on the boundary ( HN=!B3 ). The breaking of the edge also generates two new edges whose lengths are half the length of the broken edge (L=L3/2), and inherit the boundary flag (B=B3 ). Then, the triangle has to be bisected; to this end, it generates a new edge that connects the newly created node with the opposite node, computing the length of this new edge (L=NL(1,3,2)) and setting it as an interior edge (B=FALSE ). Finally, it generates the new two triangles that are not marked to be refined (R=FALSE).

Production 4 (P4)
Production (P4), Fig. 8, expresses the bisection of a triangle with two hanging-nodes by one of the edges that contain a hanging-node. This production will bisect the triangle by the node connected to edge 4 and edge 5. Analysis of the predicate: ( ((L4+L5 ) ≥ ( L6+L7 )) AND ((L4+L5 ) ≥ L3 ) ). The only condition in this production ensures that the broken edge (edge 4 + edge 5) is one of the longest-edges ( ((L4+L5 ) ≥ ( L6+L7 )) AND ((L4+L5 ) ≥ L3 ) ). We don't need to check if the triangle is marked to be refined since this bisection is needed for conformity. No other conditions are required since an edge with a hanging-node has a higher priority.
This production does not break the longest edge since it's already broken. The production does bisect the triangle; to this end, it generates a new edge that connects the newly created node with the opposite node, computing the length of this new edge (L=NL(4,3)) and setting it as an interior edge (B=FALSE ). And finally, it generates the new two triangles that are not marked to be refined (R=FALSE).

Production 5 (P5)
Production (P5), Fig. 9, expresses the bisection of a triangle with two hanging-nodes by the edge that does not contain a hanging-node. This production will bisect the triangle by edge 3.
Analysis of the predicate: ( ( ( L3 > ( L4+L5 )) AND ( L3 > ( L6+L7 )) ) AND ( NOT HN1 AND NOT HN3 ) ). The first condition in this production ensures that the edge 3 is strictly longer than the other two broken edges ( ( L3 > ( L4+L5 )) AND ( L3 > ( L6+L7 )) ); both comparisons are strictly greater because broken edges have priority. The second condition ensures that the two nodes of edge 3 are not hanging nodes ( NOT HN1 AND NOT HN3 ); if they are, we cannot break edge 3 yet. We don't need to check if the triangle is marked to be refined since this bisection is needed for conformity. This production breaks edge 3, generating a new node in its midpoint (x = (x1 + x3)∕2, y = (y1 + y3)∕2 , z = (z1 + z3)∕2) ; this new node will be hanging if the edge is not on the boundary, or a regular node if it is on the boundary ( HN=!B3 ). The breaking of the edge also generates two new edges whose lengths are half the length of the broken edge (L=L3/2), and inherit the boundary flag (B=B3 ). Then, the triangle has to be bisected; to this end, it generates a new edge that connects the newly created node with the opposite node, computing the length of this new edge (L=NL(1,3,2)) and setting it as an interior edge (B=FALSE ). Finally, it generates the new two triangles that are not marked to be refined (R=FALSE).

Production 6 (P6)
Production (P6), Fig. 10, expresses the bisection of a triangle with three hanging-nodes. This production will bisect the triangle by the node connected to edge 4 and edge 5.
This production does not break the longest edge, since it's already broken. The production does bisect the triangle; to this end, it generates a new edge that connects the newly created node with the opposite node, computing the length of   (L=NL(4,3)) and setting it as an interior edge (B=FALSE ). And finally, it generates the new two triangles that are not marked to be refined (R=FALSE).

Comparison of the longest-edge refinement algorithm and graph-grammar-based refinement algorithm
In this section, we will compare two algorithms, the classical Rivara's lonest-edge refinement algorithm, and our graphgrammar-based refinement algorithm.
To describe the Rivara algorithm, we recall the definition of LEPP(t) , the definition of a pair of terminal triangles, and the definition of a terminal boundary triangle.
For any triangle t 0 of any conforming triangulation T, the LEPP(t 0 ) consists of the ordered list of all triangles t 0 , t 1 , t 2 , ..., t n−1 , such that triangle t i is the neighbour triangle of t i−1 by the longest edge of t i−1 , for i = 1, 2, ..., n.
A pair of terminal triangles are two adjacent triangles (t, t * ) with a common longest edge.
A terminal boundary triangle is a triangle whose longestedge is a boundary edge.
We describe the pseudocode of the so-called Rivara Backward-Longest-Edge-Bisection algorithm in Algorithm 1.

Algorithm 1 Backward-Longest-Edge-Bisection
Require: t triangle to refine, T mesh of triangular elements 1: while t remains without being bisected do 2: Find theLEPP (t) 3: t * = the last triangle of LEPP (t) 4: if t * is a terminal boundary triangle then 5: bisect t * 6: else 7: bisect the last pair of terminal triangle of LEPP (t) 8: end while We summarize in Fig. 11 the Rivara algorithm. The green triangle is the triangle denoted to break ( t 0 ). The triangles belonging to the LEPP(t 0 ) are denoted by blue color (triangles "touched" by the algorithm), the red edges are the terminal edges, the new edges created during the refinement process.
We count the number of basic operations as performed by the algorithm, defined as checking a triangle ( CHECK ) (triangles denoted by blue color), or breaking a triangle ( BREAK ) (triangles broken at the end of the LEPP ). The number of CHECK s and BREAK s is summarized in Table 1. While the single longest edge path algorithm has no potential for parallelization, all these CHECK s are executed sequentially in each step. The longest-edge refinement algorithm for a single LEPP is sequential. Even breaking the two triangles located at the end of the path is sequential since the common edge has to be locked until the first break is finished. The parallel processing time is identical. The parallelism in the classical Rivara algorithm is obtained by processing multiple LEPP s in parallel, each LEPP in sequential [18].
The graph-grammar-based algorithm is summarized in the pseudo-code Algorithm 2. We summarize in Fig. 12 the graph-grammar-based algorithm. The initially broken triangle is denoted by green color. The checked neighbors are denoted by blue color. A red breaking line denotes the broken triangles. In Table 2 we count the number of triangles where we tried to execute the productions ( CHECK ) and the number of triangles modified by the execution of the productions ( BREAKs). The graph-grammar algorithm has potential for parallelization even when the Rivara algorithm uses a single LEPP . The number of sequential CHECK s for graph-grammar-based algorithm is 44, but when we utilize Fig. 11 The steps of the Rivara algorithm Table 1 The number of touched and split triangles in each step of the Rivara algorithm presented in Fig. 11 step CHECK BREAK   1  4  2  2  3  2  3  5  2  4  4  2  5  3  2  6  2  2  Total  21  12  Total parallel  21  12 eight cores, it is equal to 10 steps. The number of BREAK s can be reduced to 9, since we break in parallel triangles that do not share the broken edge.

Algorithm 2 Graph-grammar-based mesh refinement
Require: t 0 triangle to refine, List list of triangles to refine 1: Execute production (P1)(t 0) 2: Add to List neighbors broken edges of triangle t 0 3: while Non empty List do 4: Clear(List)) 6: Execute production (P2) on triangles from Ref List 7: if any triangle broken then 8: Add to List neighbors of broken edges 9: continue 10: Execute production (P3) on triangles from Ref List 11: if any triangle broken then 12: Add to List neighbors of broken edges 13: continue 14: (similarly for other productions (P4), (P5), (P6)) 15: end while While it is impossible to derive a formula for the computational cost for a general mesh broken by classical and graph-grammar-based algorithms, we estimated the costs on a representative model example. The classical longest-edge refinement algorithm processes a single LEPP in sequential. Our graph-grammar-based algorithm can check several triangles simultaneously, and it also performs multiple breaks at the same time.

Graph-grammar-based interface with advection-diffusion-reaction solver
At the end of the two-dimensional mesh generation, we execute production (Pmesh) in parallel over each triangular element. It generates the three-dimensional tetrahedral elements on top of the two-dimensional triangular element. It plots a vertical line over each of the vertices of the triangle, it partitions it into M equally distances intervals, and it constructs M prismatic elements, and it divides each prismatic element into three tetrahedral elements (see Fig. 13).
We focus on the pollution propagation equations where u(x, y, z, t) is the pollutant concentration field, (x, y, z, t) = x (x, y, z, t), y (x, y, z, t), z (x, y, z, t) is a given wind velocity, and is the diffusion coefficient, and cu is the reaction term, see [34] for more details.
We introduce time steps 0 = t 0 < t 1 < t 2 < ⋅ < t N = T and we approximate the time derivative in a finite difference manner, with Crank-Nicolson scheme applied for time discretization.
We introduce the finite element discretization. We seek where V h is span by the tetrahedral finite elements and basis functions obtained from glueing together the element basis functions. 13 Generation of three-dimensional mesh starting from 2D mesh representing the terrain, followed by the generation of element matrices

3
A commonly used stabilization technique is the Streamline-upwind Petrov-Galerkin (SUPG) method [21]. In this method, we modify the weak form as follows where stands for the diffusion term, and = ( x , y , z ) for the convection term, and h x K , h y K and h z K are three dimensions of an element K. Thus, we have These productions are executed in parallel over each element at the beginning of each time step. The results of these productions are some matrices and vectors, and they are used to construct the local system over each element, with matrices and right-hand-sides This is done by the production (Psystem) which constructs The resulting local systems are submitted to the GMRES iterative solver.
We propose the following space refinements -time progression algorithm. We start from an initial mesh approximating the topography of the terrain roughly. We solve the first time step of the advection-diffusion-reaction problem with initial conditions. Next, we run one (25) (PgenSUPG) iteration of the longest-edge refinement algorithm. Then, we solve the second time step of the advection-diffusionreaction problem using the solution obtained in the previous step on the coarser mesh. We continue with the space iterations of the longest-edge refinement algorithm, and, at the same time, we iterate with time step with the advection-diffusion-reaction simulations. The general idea of the algorithm can be summarized in the pseudo-code presented in Algorithm 3. Notice that we assume some accuracy of the terrain approximation, and after reaching this accuracy, we do not perform more refinements there.

Manufactured solution advection-diffusion problem
In this section, we verify our solver by testing on the manufactured advection-dominated diffusion solver. We select the advection vector = (1, 1) T , and Pe = 1∕ = 100 and solve the advection-diffusion equation with homogeneous Dirichlet boundary conditions. We utilize a manufactured solution enforced by the forcing term f. We set the reaction term to zero c = 0 . This analytic expression of the solution limits the Péclet number to Pe = 100 due to machine precision. We report in Figs. 14, 15, 16 the sequence of meshes generated by the adaptive algorithm. Figure 17 presents the final mesh and the final results. We also report the convergence in L 2 norm in Table 3.

Mesh generation for the pollution simulations in Lesser Poland area
We run four different experiments, starting from four different initial meshes, presented in Fig. 18. The first initial mesh is a regular triangular mesh. The second mesh is the Delaunay mesh obtained from GMSH mesh generator [9]. The third mesh is obtained from the MeshAdapt algorithm, also from the GMSH generator. The fourth one is the Frontal-Delaunay mesh from the GMSH generator. For the regular initial mesh case, the longest-edge refinement algorithm coincides with the Kossaczky refinement algorithm [23]; for other meshes, it is not equivalent to the Kosaczky algorithm. These initial meshes are not the exact rectangles since the Earth is not flat, and the input data are taken from the Earth database [6]. The initial mesh elements have vertices adjusted to the three-dimensional points located in the R 3 space, as read from the Earth's topography database.
We run 24 iterations of our graph-grammar based algorithm. Figures 19, 20 presents some snapshots from the refinement process. The triangulation of the terrain surface is presented in Fig. 21.

Computations of the wind vector field
We generate a 3D mesh on top of a 2D meshes with four layers of prisms, each divided into three tetrahedrons. We first focus on the computations of the wind distribution in the entire area, based o two fixed values of the velocity based on the real measurements from the station in Kasprowy Wierch mountain and the station in Zakopane city. We generalize these measurements into the entire domain by solving the divu = 0 equation. The results are presented in Fig. 22. We have the north-western wind slightly changing in time.

Computations of the pollution propagation
Having the advection field, representing the wind, we focus on the advection-diffusion-reaction problem.
where u i (x, y, z, t) are the unknown concentrations of the four components of the pollution in the area, namely the vector of four unknowns, representing the chemical components is a wind velocity vector computed above, representing the north-western wind, s(u) is the chemical reactions part, where we assume linear model s(u) = Au where A is the chemical reactions matrix, and K is the diagonal diffusion matrix where the horizontal diffusion coefficient is equal to 8 * 10 −6 m 2 ∕s , and the vertical diffusion coefficient 4 * 10 −6 m 2 ∕s . The diffusion matrix is assumed to be identical for all the four species of the concentration field u. Since we have four components of the pollution, our basis functions, and element matrices and right-hand-side vectors are "duplicated" four times, and they are coupled now through the chemical reactions matrix. We assume that the pollutant comes from the north boundary, and it is blown inside the domain from the north boundary by the north-western wind computed based on the real measurements.    Fig. 18 The initial coarse meshes. The first initial mesh is a regular triangular mesh, the second mesh is the Delunay mesh obtained from the GMSH mesh generator, the third mesh is obtained from the MeshAdapt algorithm, also from the GMSH generator at the wind outlet. Moreover, we have (35) u(x, y, z, t) = 0, at the terrain level, where V d = 1.3 * 10 −3 m/s (so-called diagonal term of the deposition matrix).
We run the entire simulation on a single Linux cluster node. In Figs. 23, 24, 25, 26, we present some snapshots of the simulation. They present the propagation from the north boundary over the terrain by the north-western wind.
We have implemented our graph-grammar-based system in GALOIS [8,30] framework, allowing for concurrent graph processing. The code is compiled on node13 on the Atari Linux cluster from the Adaptive Algorithms and Systems (a2s.agh.edu.pl) research group from the Department of Computer Science, AGH University. The   The wind vector field computed by solving divu = 0 based on two stations point measurements. The maximum wind speed varies between 33 and 49 km/s node13 has Intel(R) Xeon(R) CPU E5-2680 v4 processor with 2.40GHz clock and 28 cores. The code requires gcc/8.1, boost, and cmake. The code can be downloaded from https://github.com/Podsiadlo/TerrainMesh Generator/ lonestar/graphgrammar2. Here, we present the scalability results for the concurrent code for 24 iterations. The scalability of the code running from any of the four initial meshes is similar. We report here the timings for the regular initial mesh. It transforms the initial mesh of 24 triangles into the final mesh with over 10,000,000 triangles. In Table 4 and Fig. 27, we report the execution time on the last 15 iterations of the mesh generation algorithm, with the number of used cores increasing from 1 to 28. In Table 5 and Fig. 28, we report the speedup on the last 15 iterations of the mesh generation algorithm, with the number of used cores increasing from 1 to 28. We report the timings and speedup for iterations 15-24. The previous iterations took less than 100 milliseconds, so we neglect them from our analysis. Notice that after reaching the assumed accuracy for the terrain approximation in step 20, a single step's computational cost goes down since we do not perform massive refinements over the entire mesh.

Conclusions
This paper shows how to express by graph-grammar productions the longest-edge mesh refinement algorithm for a two-dimensional mesh with triangular elements. The graph-grammar-based algorithm allows for better parallelization than classical Rivara's algorithm. We also show how to extend it to the three-dimensional grids and interface with GMRES solver and Crank-Nicolson time integration scheme. The mesh generation algorithm removes all the hanging nodes automatically from the mesh. The stabilized advection-diffusion-reaction solver executed on the computational mesh based on topographic data of Lesser Poland area provides a tool for the pollution propagation simulations. The future work will include the simulation of the pollution resulting from point sources in the Kraków city area, e.g., from the factories' chimneys. We also would like to have the thermal inversion effects simulated with Navier-Stokes-Boussinesq equations [35].