We use the index convention for vertices and edges shown in Fig. 1. For instance, in the unit reference cell \([0,1]\times [0,1]\times [0,1]\) we have \(v_0=(0,0,0)\), and \(e_0=\{v_0,v_1\}\).
The restriction of the trilinear interpolant \(T\) to a unit reference cell has the form
$$\begin{aligned} F(u,v,w) &= (1-w)\,[f_0 (1-u)(1-v) + f_1 u(1-v) \\ &\qquad + f_2 (1-u)v + f_3 uv] \\ &\qquad + w\,[f_4 (1-u)(1-v) + f_5 u(1-v) \\ &\qquad + f_6 (1-u)v + f_7 uv], \end{aligned} \tag{2}$$
where \((u,v,w) \in [0,1]^3\) are local coordinates and \(f_i\) are the function values at the cell vertices \(v_i\). Up to four branches of the iso-surface obtained from \(F(u,v,w) = \iota _0\) might intersect the cell. In Fig. 2a, we show an iso-surface with only one component intersecting the cell at all twelve edges. In Fig. 2b, we see the corresponding hyperbolic arcs at the cell faces, and in Fig. 2c, the MC polygon used to approximate the hyperbolic arcs.
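As a minimal sketch, Eq. (2) can be evaluated as follows. Python is used here only as a serial stand-in for device code. The function name `trilinear` and the helper `vertex_position` are illustrative. The vertex numbering in `vertex_position` is the one implied by Eq. (2) together with \(v_0=(0,0,0)\).

```python
def vertex_position(i):
    """Local coordinates of cell vertex v_i, as implied by Eq. (2)."""
    return (i & 1, (i >> 1) & 1, (i >> 2) & 1)


def trilinear(f, u, v, w):
    """Evaluate the trilinear interpolant F(u, v, w) of Eq. (2).

    f holds the eight vertex values f_0..f_7 of one cell.
    """
    bottom = (f[0] * (1 - u) * (1 - v) + f[1] * u * (1 - v)
              + f[2] * (1 - u) * v + f[3] * u * v)
    top = (f[4] * (1 - u) * (1 - v) + f[5] * u * (1 - v)
           + f[6] * (1 - u) * v + f[7] * u * v)
    return (1 - w) * bottom + w * top
```

At the cell corners, the interpolant reproduces the vertex values, i.e., `trilinear(f, *vertex_position(i))` equals `f[i]`.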
For each branch of the iso-surface, DMC selects a single vertex within the cell which represents the surface. For the case shown in Fig. 2d, three vertex representatives have to be computed.
Every edge in a hexahedral grid is shared by four cells, excluding boundary edges. An iso-surface branch that intersects an edge also intersects all four cells sharing this edge. Therefore, connecting the representative vertices of this branch from all four incident cells generates a quadrilateral. If the MC polygons are constructed using the asymptotic decider [31], the generated mesh is topologically consistent across cell borders. The mesh is called dual because each vertex of an MC polygon has a corresponding quadrilateral in the dual mesh, and each vertex of the dual mesh represents an MC polygon.
For example, assume a planar iso-surface branch cutting through a uniform grid. Figure 3a shows the intersections of the branch with the cells, i.e., the MC polygons. The dual of this polygonal mesh is the quadrilateral DMC mesh (Fig. 3b).
DMC generates meshes which have fewer vertices and better-shaped elements than the meshes generated by the standard MC algorithm [30]. Nevertheless, the generated iso-surface might not be topologically correct. For certain configurations, which can typically be found in medical data, the iso-surface defined by Eq. (1) will not be homeomorphic to the reconstructed DMC mesh. The DMC algorithm as formulated above cannot reconstruct tunnels of sub-voxel size [16] (Fig. 4a). The standard MC algorithm cannot reconstruct these tunnels either. If the iso-surface forms a tunnel which extends across two cells, DMC generates a non-manifold edge connecting the vertex representatives of the MC polygons in each of the neighbor cells (Fig. 4b). This non-manifold edge is shared by four faces (Fig. 7). In Sect. 3.2, we show how to keep the halfedge data structure consistent in that case.
The parallel DMC algorithm we propose generates an indexed face set for the quadrilateral mesh, where all elements are consistently oriented. Optionally, a halfedge data structure carrying neighbor information can be computed. After the buffers are initialized, the proposed algorithm consists of two main steps: (1) compute the DMC mesh; (2) generate a halfedge data structure. In the initialization step, buffers are created and default values are set with the help of simple CUDA kernels. In the next subsections, we briefly describe how to compute the DMC quadrilateral mesh and subsequently generate the halfedge data structure.
DMC quadrilateral mesh
The DMC quadrilateral mesh is computed with two CUDA kernels. The first kernel proceeds cell-wise and computes the vertex representatives, assigning a unique index to each vertex. For each edge intersected by the MC polygon, the kernel stores the vertex index in a hash table where the key is the edge index in the volume grid. For each edge in the hash table, the indices of the vertices constituting the quadrilateral are stored by the threads processing the incident cells. A hash table is mandatory for processing volume data consisting of hundreds of millions of vertices (see Sect. 5), because only a few edges are intersected by the iso-surface compared to the total number of edges in a large volume grid. The second kernel collects the quadrilaterals from the hash table and stores each element into an index buffer. Each thread started by the first kernel has to carry out the following processing steps for its cell: (1) compute the MC polygons, (2) estimate a vertex representative for each MC polygon, (3) store the vertex index in the hash table for each edge intersected by the MC polygon, and (4) compute face colors. In order to improve performance, the kernel returns immediately if the iso-surface does not intersect the cell. In the following, these processing steps are explained in more detail.
Computation of MC polygons
A cell might be intersected by up to four disconnected branches of the iso-surface. Therefore, we expect to obtain up to four closed MC polygons. The method implemented for this purpose returns the number of polygons, the size of each polygon, and the indices of the intersected edges. These quantities can be computed in two steps. First, the cell is processed face-wise. The intersection of the iso-surface with a face is given by a segment, i.e., an edge of the MC polygon (Fig. 2). For each segment on a face, the indices of the start and end edge are computed. Segments are oriented such that vertices with function values larger than the iso-value are located to the left of the segments with respect to the face. Ambiguous cases are solved with the asymptotic decider [31]. In a second step, segments are connected to build closed polygons. MC polygons, i.e., their sizes and the indices of the intersected edges, can be encoded in a single 64-bit unsigned long long integer.
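The bit layout of the 64-bit code is not specified here. As a minimal sketch, one layout that fits is shown below, under the following assumption: the four lowest 4-bit fields hold the polygon sizes, and the remaining 48 bits hold up to twelve 4-bit local edge indices. This works because a cell has twelve edges and each intersected edge belongs to exactly one polygon. The function names are hypothetical.

```python
def encode_polygons(polygons):
    """Pack up to four MC polygons (lists of local edge indices 0..11).

    Assumed layout (not taken from the text): bits 0..15 hold four 4-bit
    polygon sizes, bits 16..63 hold the edge indices, polygon by polygon.
    """
    if len(polygons) > 4 or sum(len(p) for p in polygons) > 12:
        raise ValueError("a cell has at most 4 polygons and 12 edges")
    code = 0
    shift = 16
    for k, poly in enumerate(polygons):
        code |= len(poly) << (4 * k)      # size of polygon k
        for e in poly:
            code |= e << shift            # next 4-bit edge index
            shift += 4
    return code


def decode_polygons(code):
    """Inverse of encode_polygons; skips polygons of size zero."""
    polygons = []
    shift = 16
    for k in range(4):
        size = (code >> (4 * k)) & 0xF
        if size == 0:
            continue
        polygons.append([(code >> (shift + 4 * j)) & 0xF for j in range(size)])
        shift += 4 * size
    return polygons
```

With this layout, the extreme case of a single polygon intersecting all twelve edges uses exactly 64 bits.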
Positioning of vertex representatives
Within a cell, a vertex is computed for each iso-surface branch. The vertex representatives are placed exactly on the iso-surface \(S_{\iota _0}\) with the iso-value \(\iota _0\) defined by Eq. (1). It must be ensured that vertex representatives are positioned on the correct surface branch; otherwise, the resulting mesh will be non-manifold. We find vertex representatives by sampling the iso-surface branches. Two of the three parameters \((u,v,w)\) are sampled, whereas the remaining parameter is computed from the trilinear interpolant, Eq. (2). For example, if \((u,v)\) are sampled, parameter \(w\) is:
$$\begin{aligned} w = \frac{\iota _0 - g_1(u,v)}{g_2(u,v) - g_1(u,v)}\,, \quad \text {with} \end{aligned} \tag{3}$$
$$\begin{aligned} g_1(u,v) &= f_0 (1-u)(1-v) + f_1 u(1-v) + f_2 (1-u)v + f_3 uv\,, \quad \text {and} \\ g_2(u,v) &= f_4 (1-u)(1-v) + f_5 u(1-v) + f_6 (1-u)v + f_7 uv. \end{aligned} \tag{4}$$
Similar equations are obtained if \((u,w)\) or \((v,w)\) are sampled. Using the right parameters for sampling is of importance for stability. We choose as sampling space the two parameters where the MC polygon has the largest projection. Furthermore, we restrict the parameter space to the bounding box of the MC polygon, e.g., \([u_{\mathrm{min}},u_{\mathrm{max}}] \times [v_{\mathrm{min}},v_{\mathrm{max}}]\). If a sample lies outside of the cell, e.g., \(w < 0\) or \(w > 1\), it is discarded. Finally, the sample which is closest to the center of gravity of the MC polygon is chosen as vertex representative. Figure 5 shows the sampling process for a cell with three branches.
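For the \((u,v)\) case, Eqs. (3) and (4) together with the discard rule can be sketched as follows. The names `bilinear` and `solve_w` and the tolerance `eps` for a vanishing denominator are assumptions made for illustration.

```python
def bilinear(c, u, v):
    """Bilinear interpolation of four values c_0..c_3, as in Eq. (4)."""
    return (c[0] * (1 - u) * (1 - v) + c[1] * u * (1 - v)
            + c[2] * (1 - u) * v + c[3] * u * v)


def solve_w(f, iso, u, v, eps=1e-12):
    """Return w from Eq. (3), or None if the sample must be discarded."""
    g1 = bilinear(f[0:4], u, v)   # interpolant on the face w = 0
    g2 = bilinear(f[4:8], u, v)   # interpolant on the face w = 1
    denom = g2 - g1
    if abs(denom) < eps:          # F does not depend on w at (u, v)
        return None
    w = (iso - g1) / denom
    if w < 0 or w > 1:            # sample lies outside of the cell
        return None
    return w
```

Since \(F(u,v,w) = g_1 + w\,(g_2 - g_1)\), any returned `w` places the sample on the iso-surface up to rounding.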
The set of sampling points can be further restricted to ensure that the vertex representative is placed on the right branch. We use the bounding boxes of the branches to assure that samples are placed on the correct branch. If there are several branches within a cell, only one of them can intersect more than three edges [16]. There is only one exception, which is when two branches intersect four edges each. If a branch intersects three edges, the sample which is closest to the center of gravity of the MC polygon is on this branch (Fig. 5). For more complex configurations, we discard all samples within the bounding boxes of other branches. Thus, we ensure that samples are only considered when they are on the correct branch.
Theoretically, one could construct a case where two bounding boxes almost completely overlap. In that case, we would need a very fine sampling to find positions which are not within the bounding box of another branch. We use a set of \(7 \times 7\) samples. If we do not find any valid position, we try again with a set of \(25 \times 25\) samples. If this also does not deliver a valid position, we use the center of gravity of the MC polygon. However, this fall-back was never required in any of our tests.
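The sampling strategy with its two refinement levels and the fall-back can be sketched as follows for the \((u,v)\) case. This is an illustration under assumptions, not the implementation: the samples are assumed to lie on a regular grid that includes the boundary of the bounding box, and the function and parameter names are hypothetical.

```python
def _bilinear(c, u, v):
    return (c[0] * (1 - u) * (1 - v) + c[1] * u * (1 - v)
            + c[2] * (1 - u) * v + c[3] * u * v)


def _inside(p, box):
    """box = (lo, hi) with 3D corner points of another branch's bounding box."""
    lo, hi = box
    return all(lo[a] <= p[a] <= hi[a] for a in range(3))


def find_representative(f, iso, box, centroid, other_boxes=(), sizes=(7, 25)):
    """Sample (u, v) in box = ((umin, umax), (vmin, vmax)); w from Eq. (3)."""
    (u0, u1), (v0, v1) = box
    for n in sizes:                       # 7 x 7 first, then 25 x 25
        best, best_d = None, float("inf")
        for i in range(n):
            for j in range(n):
                u = u0 + (u1 - u0) * i / (n - 1)
                v = v0 + (v1 - v0) * j / (n - 1)
                g1, g2 = _bilinear(f[:4], u, v), _bilinear(f[4:], u, v)
                if g1 == g2:
                    continue              # Eq. (3) is undefined here
                w = (iso - g1) / (g2 - g1)
                if not 0 <= w <= 1:
                    continue              # outside of the cell
                p = (u, v, w)
                if any(_inside(p, b) for b in other_boxes):
                    continue              # might belong to another branch
                d = sum((a - b) ** 2 for a, b in zip(p, centroid))
                if d < best_d:
                    best, best_d = p, d
        if best is not None:
            return best
    return centroid                       # fall-back: center of gravity
```

The finer sampling is only evaluated if the coarse one yields no valid sample, so the common case costs 49 evaluations of Eq. (3).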
Normals are computed in two steps. First, the gradient of the scalar function is estimated at the cell vertices using central differences. Second, the gradient is interpolated trilinearly at the position of the vertex representative and then normalized. We use central differences because they have a smaller truncation error than forward or backward differences. Computing the gradient directly from the trilinear interpolant has the same approximation error as forward differences. Computing normals with central differences results in more appealing visualizations.
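The two steps can be sketched as follows for a cell whose vertices are all interior grid points. Boundary handling is omitted, and the volume layout `vol[i][j][k]`, the spacing `h`, and the function names are assumptions.

```python
import math


def central_gradient(vol, i, j, k, h=1.0):
    """Central-difference gradient at the interior grid vertex (i, j, k)."""
    return ((vol[i + 1][j][k] - vol[i - 1][j][k]) / (2 * h),
            (vol[i][j + 1][k] - vol[i][j - 1][k]) / (2 * h),
            (vol[i][j][k + 1] - vol[i][j][k - 1]) / (2 * h))


def normal_at(vol, cell, uvw, h=1.0):
    """Interpolate the vertex gradients trilinearly at (u, v, w), normalize."""
    (i, j, k), (u, v, w) = cell, uvw
    n = [0.0, 0.0, 0.0]
    for c in range(8):
        di, dj, dk = c & 1, (c >> 1) & 1, (c >> 2) & 1
        weight = ((u if di else 1 - u) * (v if dj else 1 - v)
                  * (w if dk else 1 - w))
        g = central_gradient(vol, i + di, j + dj, k + dk, h)
        for a in range(3):
            n[a] += weight * g[a]
    length = math.sqrt(sum(x * x for x in n))
    if length == 0.0:
        return (0.0, 0.0, 0.0)            # degenerate gradient
    return tuple(x / length for x in n)
```

For a linear scalar field, the central differences are exact, so the sketch returns the normalized true gradient.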
Computation of the quadrilaterals
For each edge that is intersected by the iso-surface, a quadrilateral is generated by connecting the vertex representatives of the corresponding MC polygons in the incident cells. Quadrilaterals are oriented such that their normals point toward function values larger than the iso-value.
Each quadrilateral is uniquely assigned to an edge of the volume mesh. Quadrilaterals are stored in a hash table where the key is the unique index of the edge in the grid. We use open addressing with quadratic probing to find an empty bucket in the hash table. Hash tables were chosen to be twice as large as the expected number of elements in the table. A quadrilateral is represented by an array of four integers. For each cell, the index of the vertex representative has to be stored at the correct position within this array to construct quadrilaterals which are consistently oriented. We use the naming convention presented in Fig. 1. Figure 6 demonstrates how to save vertex indices properly. Edge \(e_0 = \{v_0,v_1\}\) is shared by four neighbor cells. In the other three cells, it has the names \(e_4=\{v_4,v_5\}\), \(e_6=\{v_6,v_7\}\), and \(e_2=\{v_2,v_3\}\). Assume that \(f_0 \ge \iota _0\) and that the index \(B\) of the vertex representative is stored at the first position of the index array. The thread processing the cell where this edge has the name \(e_4\) has to store the index \(C\) of the vertex representative at the second position in the array. Similarly, the thread processing the cell where the edge has the name \(e_6\) stores the index \(D\) at the third position, and finally, the thread processing the cell where the edge has the name \(e_2\) stores the index \(A\) at the fourth position. All possible cases are summarized in Table 1. Each quadrilateral is computed by four threads; in the case above, it is stored in the hash table as {key, [B, C, D, A]}.
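The following serial sketch illustrates the storage scheme for the single case described above, i.e., \(f_0 \ge \iota _0\) for an edge that is named \(e_0\), \(e_4\), \(e_6\), and \(e_2\) in its four incident cells. The class `QuadTable`, the hash function `key % size`, and the probing sequence are assumptions. On the GPU, claiming a bucket would require an atomic compare-and-swap, which plain assignments stand in for here.

```python
EMPTY = -1

# Position in the index array by local edge name, for the case f_0 >= iso.
SLOT_BY_LOCAL_EDGE = {0: 0, 4: 1, 6: 2, 2: 3}


class QuadTable:
    """Hash table with quadratic probing, one quadrilateral per bucket."""

    def __init__(self, expected_elements):
        self.size = 2 * expected_elements          # twice the expected load
        self.keys = [EMPTY] * self.size
        self.quads = [[EMPTY] * 4 for _ in range(self.size)]

    def _bucket(self, key):
        h = key % self.size                        # assumed hash function
        for step in range(self.size):
            b = (h + step * step) % self.size      # quadratic probing
            if self.keys[b] in (EMPTY, key):
                self.keys[b] = key                 # atomicCAS on the GPU
                return b
        raise RuntimeError("no free bucket found")

    def store(self, key, local_edge, vertex_index):
        slot = SLOT_BY_LOCAL_EDGE[local_edge]
        self.quads[self._bucket(key)][slot] = vertex_index

    def get(self, key):
        return self.quads[self._bucket(key)]
```

The four threads may arrive in any order, since each one writes to its own position in the array.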
Table 1 How to build quadrilaterals depending on the edge configuration
A kernel is in charge of computing the vertex representatives for each cell. The kernel processes the input data cell-wise. It computes the vertex representatives and corresponding normals for all iso-surface branches in each cell. These vertices are interior to the cell; thus, the kernel can assign a unique global index to each vertex, which is required by the mesh data structure. The index corresponds to the position of the vertex and its normal within a buffer and is obtained using atomicAdd on an atomic counter. As indicated above, this unique index is stored in a hash table, where the key is the unique index of the edge in the volume grid.
Finally, a second kernel will collect the quadrilaterals from the hash table and save the elements into an index buffer. Boundaries are easily handled by this kernel. A bucket in the hash table contains a quadrilateral if all entries are valid indices. If an entry in a bucket is an invalid index, the cell was a boundary cell. In this case, no quadrilateral is generated.
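The validity test of the collecting kernel can be sketched in a few lines. The sentinel `INVALID` and the function name are assumptions. A GPU version would additionally obtain the output position from an atomic counter.

```python
INVALID = -1


def collect_quads(buckets):
    """Keep only buckets whose four entries are all valid vertex indices."""
    return [list(q) for q in buckets if all(v != INVALID for v in q)]
```

Empty buckets and buckets of boundary edges both contain at least one invalid entry, so the same test filters them out.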
Vertex and face coloring
A fast and simple coloring is applied in parallel to vertices and quadrilaterals by using the uniform structure of the volume data. Vertex and quadrilateral colors are derived from cell colors. The color of a cell is defined by applying a bit pattern to its three-dimensional index \((i,j,k)\),
This pattern results in 8 colors. We can transfer these colors onto the vertices of the quadrilateral mesh as they only have neighbors in adjacent cells.
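One bit pattern that yields 8 colors is sketched below. It is an assumption made for illustration, namely that the color combines the parity bits of the three cell indices. The function name is hypothetical.

```python
def cell_color(i, j, k):
    """Assumed pattern: one parity bit per index, giving colors 0..7."""
    return (i & 1) | ((j & 1) << 1) | ((k & 1) << 2)
```

With this choice, two different cells that share a face, an edge, or a corner differ in the parity of at least one index and therefore receive different colors.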
From the cell coloring, we derive edge colors. Each cell colors its edges \(e_0, e_3, e_8\). By giving unique colors to the local edges incident to \(v_0\), i.e., \(e_0 = 0, e_3 = 1, e_8 = 2\), we get an edge coloring with 24 values,
where cv is the cell color and \(\texttt {e} \in \{0,1,2\} \) are the local edge colors. As each cell edge corresponds to at most one quadrilateral, we can transfer the edge colors onto the quadrilaterals. In a second step, we reduce the number of quadrilateral colors to 5 by adding a second kernel which processes the mesh face-wise. It iterates through all quadrilaterals with colors greater than or equal to 5 and assigns a new color between 0 and 4 by checking which of these colors does not yet appear in its neighborhood.
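The two coloring steps can be sketched as follows. The combination `3 * cv + e` is an assumption. It is one simple way to obtain 24 distinct values from 8 cell colors and 3 local edge colors. The reduction is written as a serial loop over the quadrilaterals, whereas the kernel described above runs in parallel. All names are hypothetical.

```python
def edge_color(cv, e):
    """Assumed combination of cell color cv (0..7) and local edge color e."""
    return 3 * cv + e


def reduce_colors(colors, neighbors, n_colors=5):
    """Recolor quadrilaterals with color >= n_colors.

    neighbors[q] lists the quadrilaterals adjacent to quadrilateral q.
    """
    colors = list(colors)
    for q in range(len(colors)):
        if colors[q] < n_colors:
            continue
        used = {colors[n] for n in neighbors[q]}
        free = [c for c in range(n_colors) if c not in used]
        if free:                          # keep the old color otherwise
            colors[q] = free[0]
    return colors
```

A quadrilateral with at most four neighbors always finds a free color among five, which is why five colors suffice away from non-manifold edges.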
Halfedge data structure
The halfedge data structure is computed using two kernels. In the first kernel, each thread processes a quadrilateral and collects the local information required by the data structure. For each vertex, we store the halfedge which starts at the vertex, and for the face the first halfedge is stored. For the four halfedges in a quadrilateral, we store the index of the vertex at which the halfedge starts, the face, and the index of the next halfedge. This kernel saves the indices of halfedges which belong to the same edge in a hash table. The key is constructed using the indices of the incident vertices and saved in a 64-bit unsigned long long int. The smaller index is saved in the first 32 bits, the larger index in the second 32 bits. The hash table also contains a counter for the number of halfedges stored in each bucket. Distinct threads processing the same edge will compute the same key. Thus, each thread stores its halfedge in the same bucket of the hash table. The exact storage position within the bucket is computed using the halfedge counter which is increased with atomicAdd.
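The first kernel can be sketched serially as follows. A Python dictionary stands in for the hash table with its per-bucket counter. Placing the smaller vertex index in the low 32 bits of the key is an assumption about what "first 32 bits" means. The names are hypothetical.

```python
def edge_key(a, b):
    """64-bit key from two vertex indices; smaller index in the low bits."""
    lo, hi = min(a, b), max(a, b)
    return lo | (hi << 32)


def build_halfedges(quads, n_vertices):
    """Collect the local halfedge information of each quadrilateral."""
    he_origin, he_face, he_next = [], [], []
    vertex_he = [-1] * n_vertices         # one outgoing halfedge per vertex
    face_he = []                          # first halfedge of each face
    edge_table = {}                       # key -> halfedges on that edge
    for f, quad in enumerate(quads):
        base = 4 * f
        face_he.append(base)
        for c in range(4):
            a, b = quad[c], quad[(c + 1) % 4]
            he_origin.append(a)
            he_face.append(f)
            he_next.append(base + (c + 1) % 4)
            vertex_he[a] = base + c
            # On the GPU the slot within the bucket comes from atomicAdd.
            edge_table.setdefault(edge_key(a, b), []).append(base + c)
    return {"origin": he_origin, "face": he_face, "next": he_next,
            "vertex_he": vertex_he, "face_he": face_he,
            "edge_table": edge_table}
```

Because the key orders the two vertex indices, the halfedges \(a \rightarrow b\) and \(b \rightarrow a\) end up in the same bucket.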
Global information is collected by a second kernel which processes the entries of the hash table and connects twin halfedges. If an edge contains only one halfedge, it is a boundary edge. If an edge contains two halfedges, they are set as each other's twins. Due to tunnels between two cells, an edge might contain four halfedges, i.e., it is non-manifold (Fig. 7). In that case, we set the twins such that the data structure remains consistent. We achieve this by connecting halfedges which point in opposite directions such that faces have the same orientation.
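A serial sketch of the twin assignment is given below. It takes a dictionary that maps each edge key to the halfedges on that edge, and a list with the origin vertex of every halfedge. For a non-manifold edge, it simply pairs halfedges of opposite direction in bucket order. This is an assumption and does not reproduce the specific pairing of Fig. 7. The names are hypothetical.

```python
def connect_twins(edge_table, he_origin):
    """Return the twin of every halfedge, or -1 for boundary halfedges."""
    twin = [-1] * len(he_origin)
    for bucket in edge_table.values():
        unpaired = list(bucket)
        while unpaired:
            h = unpaired.pop(0)
            for g in unpaired:
                # Halfedges on the same edge point in opposite directions
                # exactly when they start at different vertices.
                if he_origin[g] != he_origin[h]:
                    twin[h], twin[g] = g, h
                    unpaired.remove(g)
                    break
    return twin
```

A bucket with a single halfedge leaves its twin at -1, which marks the boundary.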