1 Introduction

Implicitly defined surfaces from real-valued three-dimensional scalar functions are usually computed with the marching cubes (MC) algorithm [25]. This method requires as input the iso-value defining the surface and processes data cell-wise. In this work, cell refers to voxels in uniform grids and hexahedra in more general structured grids. The algorithm can be easily and efficiently parallelized on a GPU which makes this technique versatile and attractive for applications in different areas of science and engineering. Nevertheless, a MC surface shows two major drawbacks, triangles are poorly shaped and the mesh is not topologically correct, i.e., it might be inconsistent across cell borders and not homeomorphic to the underlying surface as defined in Eq. (1) [29].

An alternative strategy to generate consistent triangulations of the iso-surface was presented in [16, 32, 36] and is based on the following observations. Given an iso-value \(\iota _0\), the iso-surface is defined as:

$$\begin{aligned} S_{\iota _0} = \{(x,y,z) \in \mathbb {R}^3 | T(x,y,z) = \iota _0\}, \end{aligned}$$
(1)

where \(T: \mathcal {M} \rightarrow \mathbb {R}\) is the trilinear interpolant defined on the hexahedral mesh \(\mathcal {M}\). The intersection of the iso-surface with a cell is a set of closed curves, each being a \(C^0\)-continuous collection of hyperbolic arcs. Up to four different components or branches of the iso-surface can intersect a cell, resulting in up to four independent closed curves, hereafter referred to as MC polygons. They can be easily computed and are consistent across cell borders if the asymptotic decider [18, 31] is used to resolve ambiguous cases.

In a structured hexahedral mesh, an edge is shared by four cells. If an edge is intersected by the iso-surface, then all four cells contain a MC polygon incident to this edge. A quadrilateral is obtained by connecting the vertex representatives of the four MC polygons.

A novel dual marching cubes (DMC) algorithm was presented in [19], which does not require a lookup table, generates watertight meshes, and runs in parallel on GPU. The algorithm output is a high-quality quad only mesh which accurately represents the underlying geometry as defined by Eq. (1). In this work, we extend the results presented in [19]. A different back projection method for the vertex representatives was developed which positions the vertices exactly on the iso-surface of Eq. (1) almost everywhere. Only in extreme cases with numerical instability it might be that a fall-back is required which is not on the iso-surface, but in our tests such a case never appeared. A parallel vertex and face coloring scheme is introduced. Neighborhood and connectivity information is encoded using a halfedge data structure. A limitation of DMC was discovered concerning tunnels in the iso-surface which cover two neighbor cells, i.e., a piece of surface homeomorphic to a cylinder that extends across two neighbor cells. Such tunnels create non-manifold edges in the DMC mesh, which are shared by four faces. We show how to generate a valid halfedge data structure that handles these non-manifold edges correctly. Furthermore, we implemented parallel simplification methods to reduce vertices and elements with vertex valence pattern 3-X–3-Y, with \(\text {X,Y} \ge 5\) and 3-3-3-3 which commonly appear in DMC meshes computed from volume data, thus improving the distribution of vertex valences in the mesh. For the pattern 3-X–3-Y, we use the face colors to remove a much higher number of irregular elements in comparison to the method proposed in [19]. A side effect of simplification is that some vertices are not positioned exactly on the iso-surface anymore.

In the following, we first present a brief review of previous work emphasizing the problem of topological consistency and parallel reconstruction. In Sect. 3, we present the parallel DMC algorithm, and in Sect. 4, we describe our parallel implementation of two mesh simplification techniques. In Sect. 5, we show the performance of the methods presented, and in Sect. 6, we give some comments on the results. The source code is available at GitHub [17].

2 Related work

Research contributions in the area of iso-surface extraction from volume data can be classified into three main groups, standard marching cubes (MC) and its extensions to resolve topological correctness and consistency; dual marching cubes which extract quad meshes dual to the MC polygons; and primal contouring, which computes an iso-surface from the dual grid or the dual of an octree.

Methods for computing iso-surfaces based on the standard MC, which was presented in [25], have to deal with the problem of inconsistent meshes across cell borders. Furthermore, MC methods might generate meshes which are not homeomorphic to the iso-surface as defined by Eq. (1). Methods presented so far are either based on extended lookup tables, or they first compute the intersection of the iso-surface with a cell resulting in a MC polygon which is consistently triangulated.

After Dürst [11] observed that MC does not consistently triangulate the iso-surface across cell borders, Nielson and Hamann [31] introduced the asymptotic decider to resolve ambiguities at the cell faces. Natarajan [28] discovered interior ambiguities and introduced an extended lookup table. Chernyaev [5] modified the lookup table increasing the number of cases up to 33 in total, which is commonly called MC33. Different authors proposed new techniques to improve performance or to solve topological inconsistencies [6, 7, 12, 22, 24, 26, 27, 29]. For each ambiguous face, special subcases have to be considered which results in a large number of configurations. All these methods have the same issue; namely, they generate many triangles with poor quality.

Algorithms were presented which resolve ambiguities without using a lookup table. The first method to compute iso-surfaces based on the intersection of the surface with the cell faces was proposed by Pasko et al. [32]. Renbo et al. [36] developed a triangulation algorithm which does not use lookup tables. These methods process unambiguous and ambiguous cells in the same manner, which is much more computationally intensive than the MC algorithm. Grosso [16] developed a hybrid technique which processes unambiguous cells with the standard MC. Ambiguous cells are triangulated based on a set of rules applied to the MC polygons. This method has the advantage of not relying on lookup tables in ambiguous cases.

In order to improve performance and to overcome the problem of generating a large amount of triangles many parallel strategies were proposed in the literature. A parallel iso-surface algorithm which is combined with edge collapses was presented in [10, 42]. It is a modification of the tandem algorithm introduced in [2]. Parallel implementations become more complex if the output has to be a data structure with connectivity and neighborhood information. A GPU-based technique to reconstruct and smooth the iso-surface by repositioning the vertices without changing mesh topology was introduced in [4]. A method to compute standard MC on multiple GPUs was described in [9]. A major problem of these algorithms is that they use large buffers to compute a consistent numbering of the vertex indices. Usually, a unique vertex index is computed by counting cells or edges via a prefix sum. This is not viable in our case, as buffers become very large for volume data consisting of hundreds of millions of vertices. We opted for a different technique which results in a much simpler algorithm that delivers good performance results and allows to compute iso-surfaces from very large volume data.

The dual marching cubes algorithm presented by Nielson [30] is a different strategy to reconstruct an iso-surface from volume data. The intersection of the iso-surface with the cell can be approximated by a polygon on the cell faces. This is what we call the MC polygon in the previous section. The DMC algorithm computes the dual of these MC polygons. The dual to the MC polygons is a quad only mesh because each edge in the volume mesh is shared by four cells. This algorithm relies on the lookup table introduced in [29] which consists of 23 basis cases. Ambiguities are resolved by introducing sub-configurations. This method is a generalization of the SurfaceNets proposed in [8, 14]. A parallel implementation of the DMC algorithm is presented in [23]. The method generates a 1-ring neighborhood data structure and approximates the surface by using error quadrics. Nevertheless, it relies on large buffers whose size depends on the number of edges to compute unique vertex indices via prefix sums.

Primal contouring [38] is an alternative method for extracting iso-surfaces from volume data. It first approximates the data by computing an octree. For each cell of the octree, the method computes a single vertex representing the surface intersection with the cell. The vertex is placed within the cell by optimizing a quadratic functional based on error quadrics. The method just computes a single vertex for each cell in the octree, and therefore the resulting mesh might not be manifold. Subsequently different works were proposed that mainly deal with the problem of computing manifold meshes out of an octree [21, 34, 37]. All these methods generate hybrid meshes from a hierarchical data structure.

3 Dual marching cubes

We use the index convention for vertices and edges shown in Fig. 1. For instance, in the unit reference cell \([0,1]\times [0,1]\times [0,1]\) we have \(v_0=(0,0,0)\), and \(e_0=\{v_0,v_1\}\).

The restriction of the trilinear interpolant \(T\) to a unit reference cell has the form

$$\begin{aligned} F(u,v,w)&= (1-w)[f_0 (1-u)(1-v) + f_1 u(1-v) \nonumber \\&\qquad + f_2 (1-u)v + f_3 uv] \nonumber \\&\qquad + w [ f_4 (1-u)(1-v) + f_5 u(1-v) \nonumber \\&\qquad + f_6 (1-u)v + f_7 uv], \end{aligned}$$
(2)
Fig. 1
figure 1

Local index convention for vertices and edges in a cell

Fig. 2
figure 2

MC polygons approximating the intersection of the iso-surface with the cell faces

Fig. 3
figure 3

A plane cutting through a uniform grid. The intersections of cell faces with the plane give the MC polygons

where \((u,v,w) \in [0,1]^3\) are local coordinates and \(f_i\) are the function values at the cell vertices \(v_i\). Up to four branches of the iso-surface obtained from \(F(u,v,w) = \iota _0\) might intersect the cell. In Fig. 2a, we show an iso-surface with only one component intersecting the cell at all twelve edges. In Fig. 2b, we see the corresponding hyperbolic arcs at the cell faces, and in Fig. 2c, the MC polygon used to approximate the hyperbolic arcs.

For each branch of the iso-surface, DMC selects a single vertex within the cell which represents the surface. For the case shown in Fig. 2d, three vertex representatives have to be computed.

Every edge in a hexahedral grid is shared by four cells, excluding boundary edges. An iso-surface branch that intersects an edge, will also intersect all four cells sharing this edge. Therefore, connecting the representative vertices of this branch from all four incident cells will generate a quadrilateral. If the MC polygons are constructed using the asymptotic decider [31], the generated mesh is topologically consistent across cell borders. The mesh is called dual, because each vertex in the MC polygon has a corresponding quadrilateral in the dual mesh, and each vertex of the dual mesh represents a MC polygon.

Fig. 4
figure 4

Two configurations which have the topology of a cylinder (tunnel). The second configuration causes trouble for DMC because both voxels get only one vertex which eventually results in a non-manifold edge connecting those two vertices

For example, assume a plane iso-surface branch cutting through a uniform grid. Figure 3a shows the intersections of the branch with the cells, i.e., the MC polygons. The dual of this polygonal mesh is the quadrilateral DMC mesh (Fig. 3b).

DMC generates meshes which have less vertices and better shaped elements than the meshes generated by the standard MC algorithm [30]. Nevertheless, the generated iso-surface might not be topologically correct. For certain configurations which can typically be found in medical data, the iso-surface (1) will not be homeomorphic to the reconstructed DMC mesh. The DMC algorithm as it was formulated above cannot reconstruct tunnels of sub-voxel size [16] (Fig. 4a). The standard MC algorithm cannot reconstruct these tunnels either. If the iso-surface forms a tunnel which extends across two cells, DMC generates a non-manifold edge connecting the vertex representatives of the MC polygons in each of the neighbor cells (Fig. 4b). This non-manifold edge will be shared by four faces (Fig. 7). In Sect. 3.2, we show how to keep the halfedge data structure consistent in that case.

The parallel DMC algorithm we propose generates an indexed face set for the quadrilateral mesh, where the elements are all consistently oriented. Optionally, a halfedge data structure carrying neighbor information can be computed. After initializing buffers the global structure of the proposed algorithm consists of two main steps: (1) compute the DMC mesh; (2) generate a halfedge data structure. In the initialization step, buffers are created and default values are set with the help of simple CUDA kernels. In the next subsections, we briefly describe how to compute the DMC quadrilateral mesh and subsequently generate the halfedge data structure.

3.1 DMC quadrilateral mesh

The DMC quadrilateral mesh is computed with two CUDA kernels. The first kernel proceeds cell wise and computes the vertex representatives assigning to each vertex a unique index. For each edge intersected by the MC polygon, the vertex index is stored by the kernel in a hash table where the key is the edge index in the volume grid. For each edge in the hash table, the indices of the vertices constituting the quadrilateral are stored by the threads processing the incident cells. It is mandatory to use a hash table to enable processing of volume data consisting of hundreds of millions of vertices, see Sect. 5, only few edges are intersected by the iso-surface compared to the total number of edges in a large volume grid. The second kernel collects the quadrilaterals from the hash table and stores each element into an index buffer. Each thread started by the first kernel has to carry out the following processing steps for the cell: (1) compute MC polygons, (2) estimate vertex representatives for each MC polygon, and (3) store vertex index in the hash table for each edge being intersected by the MC polygon and (4) compute face colors. In order to improve performance, the kernel returns immediately if the iso-surface does not intersect the cell. In the following, these processing steps are explained in more detail.

3.1.1 Computation of MC polygons

A cell might be intersected by up to four disconnected branches of the iso-surface. Therefore, we expect to obtain up to four closed MC polygons. The method implemented for this purpose returns the number of polygons, the size of each polygon, and the indices of the intersected edges. This quantities can be computed in two steps. First, the cell is processed face wise. The intersection of the iso-surface with a face is given by a segment, i.e., an edge of the MC polygon (Fig. 2). For each segment on a face, the indices of the start and end edge are computed. Segments are oriented such that vertices with function values larger than the iso-value are located to the left of the segments with respect to the face. Ambiguous cases are solved with the asymptotic decider [31]. In a second step, segments are connected to build up closed polygons. MC polygons, i.e., their size and the indices of the edges being intersected, can be encoded in a single \(64\) bit unsigned long long integer.

Fig. 5
figure 5

Positioning of vertex representatives by sampling the iso-surface branches. Samples that lie outside the cell or on a different branch are removed. The sample which is closest to the center of gravity of the MC polygon is chosen as vertex representative

3.1.2 Positioning of vertex representatives

Within a cell, a vertex is computed for each iso-surface branch. The vertex representatives are placed exactly on the iso-surface \(S_{\iota _0}\) with the iso-value \(\iota _0\) defined by Eq. (1). It must be ensured that vertex representatives are positioned on the right surface branch, otherwise the resulting mesh will be non-manifold. We find vertex representatives by sampling the iso-surface branches. Two of the three parameters \((u,v,w)\) are sampled, whereas the remaining parameter is computed from the trilinear interpolant, Eq. (2). For example, if \((u,v)\) are sampled, parameter \(w\) is:

$$\begin{aligned}&\!\! w = \frac{\iota _0 - g_1(u,v)}{g_2(u,v) - g_1(u,v)}\,, \quad \text {with} \end{aligned}$$
(3)
$$\begin{aligned}&\!\! g_1(u,v) = f_0 (1-u)(1-v) + f_1 u(1-v) \nonumber \\&\qquad + f_2 (1-u)v + f_3 uv\,, \quad \text {and} \nonumber \\&\!\! g_2(u,v) = f_4 (1-u)(1-v) + f_5 u(1-v) \nonumber \\&\qquad + f_6 (1-u)v + f_7 uv. \end{aligned}$$
(4)

Similar equations are obtained if \((u,w)\) or \((v,w)\) are sampled. Using the right parameters for sampling is of importance for stability. We choose as sampling space the two parameters where the MC polygon has the largest projection. Furthermore, we restrict the parameter space to the bounding box of the MC polygon, e.g., \([u_{\mathrm{min}},u_{\mathrm{max}}] \times [v_{\mathrm{min}},v_{\mathrm{max}}]\). If a sample lies outside of the cell, e.g., \(w < 0\) or \(w > 1\), it is discarded. Finally, the sample which is closest to the center of gravity of the MC polygon is chosen as vertex representative. Figure 5 shows the sampling process for a cell with three branches.

The set of sampling points can be further restricted to ensure that the vertex representative is placed on the right branch. We use the bounding boxes of the branches to assure that samples are placed on the correct branch. If there are several branches within a cell, only one of them can intersect more than three edges [16]. There is only one exception, which is when two branches intersect four edges each. If a branch intersects three edges, the sample which is closest to the center of gravity of the MC polygon is on this branch (Fig. 5). For more complex configurations, we discard all samples within the bounding boxes of other branches. Thus, we ensure that samples are only considered when they are on the correct branch.

Theoretically, one could construct a case where two bounding boxes almost completely overlap. In that case, we would need a very fine sampling to find positions which are not within the bounding box of another branch. We use a size \(7 \times 7\) set of samples. If we do not find any valid position, we try again with a size \(25 \times 25\) set of samples. If this also does not deliver a valid position, we use the center of gravity of the MC polygon. However, this fall-back was never required in any of our tests.

Normals are computed in two steps. First, the gradient of the scalar function is estimated at the cell vertices by using central difference. Second, the gradient is interpolated trilinearly at the position of the vertex representative and then normalized. We use central differences because it has a better truncation error than forward or backward difference. The computation of the gradient using the trilinear interpolant has the same approximation error as forward difference. Computing normals using central difference results in more appealing visualizations.

Fig. 6
figure 6

Construction of a quadrilateral viewed from the four correspondent cells. The cell that contributes vertex B has the iso-surface intersection on edge \(e_0 = \{v_0,v1\}\). The very same edge is called \(e_2 = \{v_2,v_3\}\) in the cell that contributes vertex A. All possible edge cases are listed in Table 1

3.1.3 Computation of the quadrilaterals

For each edge that is intersected by the iso-surface, a quadrilateral is generated by connecting the vertex representatives of the corresponding MC polygons in the incident cells. Quadrilaterals are oriented such that their normals point toward function values larger than the iso-value.

Each quadrilateral is uniquely assigned to an edge of the volume mesh. Quadrilaterals are stored in a hash table where the key is the unique index of the edge in the grid. We use open hashing and quadratic probing to find an empty bucket in the hash table. Hash tables were chosen to be twice as large as the expected number of elements in table. A quadrilateral is represented by an array of four integers. For each cell, the index of the vertex representative has to be stored at the right position within this array to construct quadrilaterals which are consistently oriented. We are using the naming convention presented in Fig. 1. Figure 6 demonstrates how to save vertex indices properly. Edge \(e_0 = \{v_0,v_1\}\) is shared by four neighbor cells. In the other three cells, it will have the names \(e_4=\{v_4,v_5\}\), \(e_6=\{v_3,v_7\}\), and \(e_2=\{v_2,v_3\}\). Assume that \(f_0 \ge \iota _0\) and the index \(B\) of the vertex is stored at the first position of the index array. The thread processing the cell where this edge has the name \(e_4\) has to store the index \(C\) of the vertex representative at the second position in the array. Similarly, the thread processing the cell where the edge has the name \(e_6\) stores the index \(D\) at the third position and finally, the thread processing the cell where the edge has the name \(e_2\) stores the vertex \(A\) at the fourth position. All possible cases are summarized in Table 1. Each quadrilateral is computed by four threads and stored in a hash table as {key, [B, A, D, C]}.

Table 1 How to build quadrilaterals depending on the edge configuration

A kernel is in charge of computing the vertex representatives from each cell. The kernel processes the input data cellwise. It computes the vertex representatives and corresponding normals for all iso-surface branches in each cell. These vertices are interior to the cell; thus, the kernel can assign a unique global index to the vertices which is required by the mesh data structure. The index corresponds to the position of vertex and normal within a buffer and is obtained using atomicAdd on an atomic counter. As indicated above this unique address for vertex and normal is stored in a hash table, where the key is the unique index of the edge in the volume grid.

Finally, a second kernel will collect the quadrilaterals from the hash table and save the elements into an index buffer. Boundaries are easily handled by this kernel. A bucket in the hash table contains a quadrilateral if all entries are valid indices. If an entry in a bucket is an invalid index, the cell was a boundary cell. In this case, no quadrilateral is generated.

3.1.4 Vertex and face coloring

A fast and simple coloring is applied in parallel to vertices and quadrilaterals by using the uniform structure of the volume data. Vertex and quadrilateral colors are derived from cell colors. The color of a cell is defined by applying a bit pattern to its three dimensional index (i,j,k),

figure c

This pattern results in 8 colors. We can transfer these colors onto the vertices of the quadrilateral mesh as they only have neighbors in adjacent cells.

From the cell coloring, we derive edge colors. Each cell colors its edges \(e_0, e_3, e_8\). By giving unique colors to the local edges incident to \(v_0\), i.e., \(e_0 = 0, e_3 = 1, e_8 = 2\), we get an edge coloring with 24 values,

figure d

where cv is the cell color and \(\texttt {e} \in \{0,1,2\} \) are the local edge colors. As each cell edge corresponds to maximum one quadrilateral, we can transfer the edge colors onto the quadrilaterals. In a second step, we reduce the number of quadrilateral colors to 5 by adding a second kernel which processes face-wise. It iterates through all quadrilaterals with colors greater or equal 5 and assigns a new color between 0 and 4 by checking which of these colors does not yet appear in its neighborhood.

Fig. 7
figure 7

Non-manifold edge connecting four edges. Faces are oriented according to the normal direction. Thus, there are two possible ways to consistently connect faces with the same orientation

3.2 Halfedge data structure

The halfedge data structure is computed using two kernels. In the first kernel, each thread processes a quadrilateral and collects the local information required by the data structure. For each vertex, we store the halfedge which starts at the vertex, and for the face the first halfedge is stored. For the four halfedges in a quadrilateral, we store the index of the vertex at which the halfedge starts, the face, and the index of the next halfedge. This kernel saves the indices of halfedges which belong to the same edge in a hash table. The key is constructed using the indices of the incident vertices and saved in a 64-bit unsigned long long int. The smaller index is saved in the first 32 bits, the larger index in the second 32 bits. The hash table also contains a counter for the number of halfedges stored in each bucket. Distinct threads processing the same edge will compute the same key. Thus, each thread stores its halfedge in the same bucket of the hash table. The exact storage position within the bucket is computed using the halfedge counter which is increased with atomicAdd.

Fig. 8
figure 8

Valence pattern 3-X–3-Y, where \(\text {X,Y} \ge 5\) and 3-3-3-3. For the case a on the left vertices in red are collapsed into a new vertex, removing the red element. For the case b in the middle the edges in red are collapsed moving the vertices in red toward the vertices in blue. The red elements are removed. For the configuration on the right no element will be removed in order to keep the mesh manifold

Fig. 9
figure 9

DMC meshes of different CTs

Global information is collected by a second kernel which processes the entries of the hash table and connects twin edges. If an edge contains only one halfedge, it is a boundary edge. If an edge contains two halfedges, we set each other as twin. Due to tunnels between two cells, it might be that an edge contains four halfedges, i.e., it is non-manifold (Fig. 7). In that case, we set the twins in a way that the data structure is consistent. We achieve this by connecting halfedges which point in opposite directions such that faces have the same orientation.

4 Mesh simplification

Due to the transitions between layers in the volume data, the DMC mesh has numerous quadrilaterals with the valence pattern 3-X–3-Y, where \(\text {X,Y} \ge 5\), i.e., two non-consecutive vertices with valence 3 and the other two vertices with valence equal to or larger than 5 (Fig. 8a); and quadrilaterals with the valence pattern 3-3-3-3 (Fig. 8b). These elements can be easily removed from the mesh as follows. For the case 8a, vertices in red are merged into a new vertex and the red element is removed. For the case 8b, edges in red are collapsed moving vertices in red toward vertices in blue. The red elements are removed. For the configuration  8c, no element can be removed in order to keep the mesh manifold.

These patterns are well known in the subject of quad mesh post-processing [1, 3, 35, 41]. The novelty here is that we perform simplification efficiently in parallel using the face coloring which is implicit to the dual marching cubes method. One should keep in mind that simplification improves mesh quality and repositions vertices. Mesh simplification is optional and can be turned off.

4.1 Pattern 3-X–3-Y

We use the face coloring scheme presented in Sect. 3.1.4 to avoid race conditions. Elements are removed from the mesh with the following CUDA kernels:

  1. 1.

    Compute vertex valence. For this purpose, a kernel iterates through the halfedges and increases the valence of its vertex by using atomicAdd().

  2. 2.

    Mark elements with the valence pattern if no neighboring element has a lower color id than the element itself.

  3. 3.

    Remove marked elements by merging vertices with valence three into a new vertex, in Fig. 8a.

  4. 4.

    Find remaining elements with the valence pattern. There is a buffer with an int entry for each vertex, a counter, which counts how often quadrilaterals with this valence pattern share a vertex with valence three. This way neighbor elements with the same valence pattern are identified easily.

  5. 5.

    Remove elements if no neighboring element has the same valence pattern, i.e., none of the valence three vertices is shared.

  6. 6.

    Remove vertices. This kernel works vertex wise. It copies vertices to a new vertex buffer, if they are not marked for removal, and it maps old to new vertex indices.

  7. 7.

    Remove quadrilaterals. This kernel works elementwise and copies quadrilaterals to a new element buffer, if they are not marked for removal. It uses index mapping from the previous kernel to connect vertices which form a quadrilateral.

  8. 8.

    Re-build halfedge data structure.

4.2 Pattern 3-3-3-3

There are two configurations for which elements with this valence pattern cannot be removed from the mesh: two neighbor elements have the same valence pattern, e.g., they are the faces of a hexahedron; or the elements are faces of a configuration as shown in Fig. 8c. In the latter case, two quadrilaterals sharing the same four vertices would remain, i.e., the mesh would be non-manifold. The following processing steps were implemented to remove elements with the valence pattern 3-3-3-3:

  1. 1.

    Compute vertex valence. Similar to the kernel presented in Sect. 4.1.

  2. 2.

    Mark vertices and elements for removal. This kernel processes the mesh elementwise. For each element, it tests if all the vertices have valence three. In this case, the complete neighborhood is reconstructed. The kernel checks if the element has a neighbor with the same valence pattern or if it is the case shown in Fig. 8c. If not, the element and its four neighbors (Fig. 8b) are marked for removal.

  3. 3.

    Remove vertices. Similar to Sect. 4.1.

  4. 4.

    Remove elements. Similar to Sect. 4.1.

  5. 5.

    Re-compute halfedge data structure.

The algorithm requires four kernels to remove elements from the mesh. We remark that neighborhood is reconstructed using the halfedge data structure.

Fig. 10
figure 10

Synthetic data sets described in Eqs. (5) and (6)

5 Results

We evaluate performance and quality of the parallel DMC and simplification algorithms presented in this work and compare them with several other reconstruction methods. We compute the iso-surface from two CT data sets, a human skull of size \(512^2\times 641\) shown in Fig. 9a and a human torso of size \(512^2\times 743\) for which iso-surfaces were extracted with different iso-values. For the iso-value 700.1 (Fig. 9b), we call the data body, and for the iso-value 1200.1 skeleton (Fig. 9c). Furthermore, we generated synthetic data representing a triple periodic level set called iWP, which is the zero surface of the scalar function:

$$\begin{aligned} f(x,y,z) =&\cos x\, \cos y + \cos y\, \cos z + \cos z\, \cos x - \nonumber \\&\cos x\, \cos y\, \cos z \end{aligned}$$
(5)

and visualize five periods in all three directions. The scalar function was discretized on a grid of size \(512^3\) (Fig. 10a). Another synthetic data are gen2 [46], shown in Fig. 10b,

$$\begin{aligned} f(x,y,z) =&2y(y^2 - 3x^2)(1-z^2) + (x^2 + y^2)^2 - \nonumber \\&(9z^2 - 1)(1 - z^2). \end{aligned}$$
(6)

We use the iso-value \(\iota _0 = 0\) and evaluate it on grids of size \(64^3\), \(128^3\), and \(256^3\). Additionally, we use qualitatively different data sets to prove robustness. The experiments were carried out on a desktop computer with an Intel Core i7-8700 with 32 GB memory and a NVIDIA GeForce RTX 2080 Ti 11GB.

We compare DMC to three other methods: the original marching cubes (MC) [25], a topologically correct marching cubes (TMC) which uses the asymptotic decider and also reconstructs tunnels within cells [18], and dual contouring (DC) [20]. DC was designed for generating hybrid meshes containing quadrilaterals and triangles and works on octrees. By restricting DC to uniform voxel grids, it generates pure quadrilateral meshes similar to DMC but with different vertex positioning and handling of cells with multiple branches. The connectivity of vertices is determined by using the SurfaceNets method described in [15]. First, we analyze the performance of the parallel DMC algorithm and present the effects of mesh simplification. Afterward, we compare the surface reconstruction methods in terms of quality.

Table 2 Processing time in milliseconds to extract an iso-surface with the DMC and MC algorithms
Table 3 Number of elements generated with the DMC and MC algorithms

5.1 Performance of parallel DMC

The performance of the parallel DMC is evaluated by measuring computation time and number of vertices and quadrilaterals generated. We compare the results with the corresponding meshes obtained with the MC algorithm. TMC and DC were both implemented on CPU and therefore are not included in the performance comparison.

Performance of DMC in comparison with MC is given in Table 2. Times are given in milliseconds. Both algorithms run on GPU and for comparison both generate a shared vertex data structure. Table 3 gives the number of vertices and elements generated by DMC compared to MC, where DMC generates a pure quadrilateral mesh and MC a triangle mesh. DMC generates slightly less vertices and correspondingly less elements than MC. DMC is slower than MC, but it still only requires approximately 100 milliseconds to reconstruct surfaces from large data sets.

We demonstrate robustness by computing iso-surfaces from numerous data sets. Table 9 gives numbers of elements and times in milliseconds. DMC computed valid iso-surfaces for all examples. Furthermore, the vertex positioning worked in all cases shown in Fig. 16.

Fig. 11
figure 11

Distribution of the vertex valences in the DMC mesh for the human skull before simplification in blue, after simplifying elements with vertex valence pattern 3-X–3-Y in red, and after simplifying elements with valence pattern 3-3-3-3 in cyan

5.2 Mesh simplification

Simplification algorithms are evaluated in terms of computational performance and the number of vertices and elements being eliminated. The histograms presented in Fig. 11 show that the elimination of elements with valence patterns 3-X–3-Y, \(\text {X,Y} \ge 5\), and consecutively 3-3-3-3 considerably reduce the number of irregular vertices.

Fig. 12
figure 12

DMC mesh after removing elements with vertex valence pattern 3-X–3-Y

Fig. 13
figure 13

DMC mesh after removing elements with vertex valence pattern 3-3-3-3

Simplifying elements with pattern 3-X–3-Y eliminates one vertex and element. Pattern 3-3-3-3 simplification eliminates four vertices and elements. The effect of the pattern 3-X–3-Y simplification is shown in Fig. 12, and the result of the simplification of elements with the pattern 3-3-3-3 in Fig. 13. Table 4 gives runtime and total number of elements removed from the original DMC mesh after each step and the number of elements removed from the mesh in total after the two mesh simplification steps. The element simplification reduces the number of elements for the human skull data set in about 10%. A similar result is achieved for the skeleton data set. The human body and the iWP level set data are smoother and therefore do not have so many elements with the vertex valence pattern 3-3-3-3. In theses cases, the simplification reduces the total number of elements by 5%. The mesh simplification based on face coloring reduces much more elements than the method presented in [19]. For the skull surface, a total of 352,337 elements with the vertex valence pattern 3-X–3-Y were simplified. With the previous method, only 167,537 could be eliminated from the mesh. We remark that the halfedge data structure has to be recomputed after each mesh simplification step.

The times required for the computation of the halfedge data structure and the element coloring are given in Table 5. Quadrilaterals are colored using five different colors. The face coloring is used by the simplification of elements with the vertex valence pattern 3-X–3-Y. The computation of the halfedge data structure requires many global memory access on the GPU which is very time consuming.

Table 4 Simplification of elements with valence pattern 3-X–3-Y and 3-3-3-3. Times are given in milliseconds
Table 5 Computation times of the halfedge data structure and face coloring in milliseconds

5.3 Surface reconstruction quality

We evaluate the surface reconstruction methods using four different criteria: deviation from the iso-value, element quality, amount of irregular vertices, and amount of non-manifold vertices. In this section, DMC corresponds to dual marching cubes without simplification and DMCS to dual marching cubes with simplification. For comparing DMC and DC to MC and TMC, quadrilaterals were subdivided into two triangles using the MaxMin angle criterion. The deviation from the iso-value is measured at the vertices. We use Eq. (2) for computing the value at vertices of medical data sets. On synthetic data sets we use the input functions, i.e., Eqs. (5) and (6). The deviation for a vertex \(v\) is computed as

$$\begin{aligned} \Delta f(v) = \vert \iota _0 - f(v) \vert . \end{aligned}$$
(7)

Quality of triangles is computed using the mean ratio metric,

$$\begin{aligned} q_{\text {m}_\text {tri}} = 4 \sqrt{3} \frac{A}{\sum _{i = 1}^{3} l_i^2} \,, \end{aligned}$$
(8)

where A is the area of the triangle, and \(l_i\) is the length of their incident edges [13, 33]. For quadrilaterals we use the shape quality metric of Stimpson et al. [39],

Table 6 Average deviation from the iso-value for different surface reconstruction methods
Fig. 14
figure 14

Deviation from the iso-value for DC, DMC, and DMCS on iWP

Fig. 15
figure 15

Quality of quadrilaterals generated by DC and DMC with and without simplification on skull

Table 7 Quality comparison of surface reconstruction methods generating triangular meshes average
Table 8 Quality comparison of surface reconstruction methods generating quadrilateral meshes
Table 9 Set of qualitatively differing data sets for evaluation of DMC
$$\begin{aligned} q_{\text {m}_\text {quad}} = \min _{i \in [1,4]} \frac{2 A_i}{l_i^2 + l_{i-1}^2} \,, \end{aligned}$$
(9)

where \(A_i\) is the area of the parallelogram defined by the two incident edges of a vertex \(v_i\), with lengths \(l_i\) and \(l_{i-1}\). In a triangle mesh, vertices with a valence unequal to 6, and in a quad mesh, vertices with a valence unequal to 4 are considered as irregular.

Table 6 shows that DMC, TMC, and MC have no deviation from the iso-value for medical data sets. This behavior is expected as all these methods place their vertices exactly on the trilinear interpolant. DMCS has some deviation which is caused by the repositioning after the 3-X–3-Y pattern simplification. The deviation is still a magnitude lower than the one of DC. Figure 14 shows \(\Delta f(v)\) for the data set iWP where vertices are sorted according to their deviation from the iso-value. DMC has its vertices closest to the iso-surface. Its deviation originates from the approximation error of the trilinear interpolant as DMC positions vertices exactly on it. The 3-X–3-Y pattern simplification repositions vertices and thus increases the deviation from the iso-value for these vertices. MC and TMC place their vertices also on the trilinear interpolant but along the voxel edges not within the voxel cells. Therefore, the deviation is similar to DMC but not exactly the same. DC uses quadratic error functions to compute its vertex positions. Therefore, vertices are in general not positioned on the iso-surface. For more smooth surfaces like gen2 all methods behave very similar.

Element quality and number of irregular vertices are stated in Table 7 for triangular and in Table 8 for quadrilateral meshes. We observe an interesting effect for the element quality when simplifying DMC. While simplification improves the quality for quads, it is reduced for triangles. The reason is that the 3-X–3-Y pattern can often be triangulated better than its simplified version, especially in cases like in Fig. 12. The same holds for the number of irregular vertices. It is improved for quadrilaterals but not for triangular meshes. If a triangular mesh is preferred, the 3-X–3-Y pattern simplification should not be applied. In general, the dual methods deliver higher-quality triangles than TMC and MC. DMC has the best triangles most of the time and DMCS has the best quadrilaterals. It is expectable that the suggested simplifications are not suited for triangle meshes as they were explicitly designed to improve the quadrilateral structure. Figure 15 compares the shape quality of quadrilaterals generated by DC, DMC and DMCS for the data set skull. DC delivers slightly less quality than DMC. DMCS has higher-quality elements. Especially the 3-3-3-3 pattern contains low-quality elements which are eliminated by simplification.

In Sect. 3, it was mentioned that DMC generates non-manifold edges when there is a tunnel in between two cells. This also causes the generation of non-manifold vertices stated in Table 8. DMCS has a higher percentage of non-manifold vertices as it has overall less vertices. The absolute number of irregular vertices is the same for DMC and DMCS. DC has another topological issue. Whenever a cell is intersected by more than one branch, a non-manifold vertex is generated. Thus, the amount of non-manifold vertices is one order of magnitude higher than for DMC and DMCS. In Table 7, the amount of non-manifold vertices is not stated because for DMC, DMCS, and DC the numbers are exactly the same as in Table 8. The marching cubes methods MC and TMC do not generate non-manifold vertices.

Fig. 16
figure 16

Reconstructed surfaces from synthetic, CT, and MRI data sets listed in Table 9

6 Conclusions

We presented a parallel implementation of the DMC algorithm which efficiently processes volume data consisting of hundreds of millions of voxels. The DMC meshes are topologically consistent across cell borders. This is due to the fact that we compute the intersection of the iso-surface with the cells using the asymptotic decider to solve ambiguities. The output of the algorithm is a quadrilateral mesh stored in a halfedge data structure. We use the fact that the data are already on GPU and perform some mesh simplification by eliminating vertices and quadrilaterals with vertex valence pattern 3-X–3-Y, \(\text {X,Y} \ge 5\) and 3-3-3-3. Mesh simplification is based on a face coloring, using 5 colors which is also implemented on the GPU. This way, vertices in the DMC mesh have a better valence distribution than meshes generated by DC. The DMC algorithm generates meshes with better triangle quality than marching cubes and better quadrilateral quality than DC. Nevertheless, the DMC algorithm is not able to generate topologically correct surface as defined by Eq. (1). The case of a tunnel in a cell or a tunnel across two neighbor cells, shown in Fig. 4, cannot be solved by the DMC algorithm.