Introduction

The use of three-dimensional (3D) dental models has become increasingly popular in dentistry and orthodontics for diagnosis, treatment planning of tooth misalignments, and the fabrication of dental restorations. These 3D dental surface models are obtained by scanning physical impressions (i.e., plaster models) or, nowadays, by advanced intraoral scanners (IOSs) that directly reconstruct the digital surface model of the dentition [1]. Precise segmentation and labeling of teeth on such digitized meshes are important for accurate and reliable tooth measurement [2, 3]. Since manually labeling teeth from the dental model is tedious, the development of accurate and automatic 3D tooth segmentation methods for dental models is crucial. However, developing such methods is challenging: on the one hand, the shape and positioning of the teeth are highly patient-dependent and can deviate greatly from the norm due to tooth misalignments and crowding; on the other hand, the digital scans are influenced by noise and in certain cases do not completely capture the deep intraoral region.

Fig. 1: This work introduces a graph neural network for dental model segmentation. The architecture employs edge convolution layers to capture highly informative local features from the input data and introduces dilated edge convolution layers. These integrate more distant features by sampling from the feature space across multiple layers, gradually expanding the receptive field. This enables learning of meaningful local features while capturing broader context.

Early approaches often used classical segmentation algorithms such as the watershed approach to segment teeth and gum. However, they are generally not fully automated, as they require user input in the form of starting markers and labeling of segmented regions [4]. With advances in computer vision, researchers have started to utilize deep learning-based methods to address this challenge. These methods are usually applied by either projecting the 3D model into 2D space, which leads to transformation artifacts and loss of information, or by applying deep learning directly to native 3D data representations such as meshes or point clouds. Previous publications [5,6,7,8] have explored the application of these approaches for fully automated tooth segmentation. Despite continuous improvement within this domain, all approaches still exhibit certain limitations that hinder their practical applicability, mainly due to insufficient accuracy. Moreover, most of these approaches simplify the problem by assuming the standard case of 14 teeth per maxilla and mandible, which often does not correspond to the reality of dental anatomy: in practice, patients may deviate from this standard due to missing teeth or the presence of additional wisdom teeth. Additionally, the evaluation of these approaches is frequently performed on proprietary data sets that are not publicly available, making it impossible to reproduce the results. To address this issue, Teeth3DS [9] has been published. This data set provides a publicly available resource for benchmarking semantic segmentation of 3D dental models, serving as a solid foundation for the present work.

This work introduces a novel feature learning strategy, called dilated edge convolution, which leverages farthest point sampling to emulate an expanding receptive field. The concept involves sampling from an enlarging k-nearest-neighbor graph to incorporate more distant features while maintaining a relatively low number of points involved in the edge convolution operation. The objective of this operation is to enhance the accuracy of semantic tooth segmentation, addressing the limitations of existing methods in discriminating between visually similar tooth classes. By increasing the receptive field, the aim is to incorporate features of neighboring teeth, utilizing this additional information to deduce the specific tooth class. Evaluations demonstrate that this strategy substantially improves the accuracy of tooth segmentation.

This approach is implemented within a network architecture consisting of dynamic edge convolution and dilated edge convolution layers (see Fig. 1). The effectiveness of our proposed method is supported by multiple experiments and thorough comparisons with state-of-the-art techniques, using the Teeth3DS benchmark data set.

Background

Deep learning is commonly applied to 3D data analysis, but point clouds and meshes pose challenges due to their lack of a grid-like structure. This hinders the direct use of the conventional convolutional neural networks (CNNs) popular in computer vision. Earlier approaches aimed to address this challenge by transforming 3D data into a collection of multiple 2D views [10, 11], or by voxelizing data into 3D grids [12, 13]. However, these methods invariably caused the loss of spatial information and introduced transformation artifacts, consequently influencing the performance of the network. To address these limitations, the novel network PointNet emerged [14]. Inspired by spatial transformer networks [15], PointNet learns translation-invariant geometric features by employing a series of shared multilayer perceptrons (MLPs) across vertices or faces, along with a symmetric function such as global max-pooling. This approach demonstrated promising results using the raw surface modeled as a point cloud as input. However, it disregarded the local spatial relationships of 3D shapes, as the network learned features independently for each face. In recent years, various approaches have been proposed to overcome the limitations of PointNet. One prominent example is PointNet++ [16], which introduced a hierarchical network structure that groups points into increasingly larger sets and applies PointNets to learn group-wise geometric features. Although these extensions significantly enhance the results, they often struggle to capture detailed semantic information due to the coarse modeling of local dependencies. A more recent approach is PointNext [17], a further refinement of PointNet++ that introduces residual connections, an inverted bottleneck design, and separable MLPs to improve the efficiency of the network.
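As a concrete illustration of this design, the following minimal PyTorch sketch (layer widths and point counts are illustrative, not taken from [14]) applies a shared MLP to every point independently and aggregates with a symmetric global max-pooling, which makes the resulting feature invariant to point ordering.

```python
import torch
import torch.nn as nn

# Shared MLP: the same weights are applied to every point independently.
shared_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1024))

points = torch.randn(2, 4096, 3)            # (batch, num_points, xyz)
per_point = shared_mlp(points)              # (2, 4096, 1024) per-point features
global_feat = per_point.max(dim=1).values   # symmetric max-pool: (2, 1024),
                                            # invariant to point ordering
```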

In contrast, graph neural networks have also shown remarkable performance in learning from irregular structures, which has led to the development of several graph-based approaches. Here, 3D data is treated as a graph with points as nodes connected by edges denoting nearest neighbors. To determine the neighbors for each node, the k-nearest neighbor (kNN) algorithm is often employed, utilizing the Euclidean distance between the nodes. A commonly referenced approach in this field is the Dynamic Graph Convolutional Neural Network (DGCNN) [18], which applies edge convolution to the 3D data modeled as a graph. An interesting aspect of this approach is the application of edge convolution on a dynamic graph, which is recomputed after each layer based on the learned feature space. Using the dynamic graph, the network is capable of learning local features by considering not only points that are geometrically close, but also those that are close in the feature space. This enables the network to capture and exploit additional information beyond the input geometric space.

This work extends these ideas by using dynamic edge convolution to learn highly informative local features, which are enriched with more distant features by sampling in multiple layers from an enlarging but sparse k-nearest-neighbor graph. This mimics an increasing receptive field while preserving the ability of the network to learn semantically meaningful local features.

Related Work

In the field of dental model segmentation, early approaches relied on pre-selected geometric properties, such as mesh curvature [4, 19], or utilized harmonic field-based methods [20] to segment teeth and gum. However, these methods often involved manual intervention and provided only semi-automated segmentation.

To address these limitations, researchers used recent advances in deep learning and proposed fully automated segmentation methods for dental models. For example, Xu et al. [5] introduced a framework that converts dental models into geometric features, which are then transformed into 2D images. These images are then used to train CNNs to classify mesh faces. The network was integrated into a complex pipeline that involved multiple pre- and post-processing steps. Another approach presented by Tian et al. [6] utilized octree partitioning to voxelize the dental model. Subsequently, 3D CNNs were applied to segment the teeth and gum. Although these methods demonstrate promising results in dental model segmentation, they often require additional pre- and post-processing steps, such as feature extraction or conversion of the data into a grid structure. Unfortunately, these additional steps can result in the loss of spatial information and introduce quantization errors.

Recently, the focus on dental mesh segmentation has shifted toward deep learning methods that directly utilize the raw surface data obtained from IOSs. Lian et al. [7, 21] proposed MeshSegNet, an extension of PointNet, which can learn from raw surface data by taking the coordinates of the face vertices and normals as input. By employing multi-scale graph-constrained learning modules, MeshSegNet emulates the hierarchical modeling capability of CNNs to capture multi-scale local geometric context. Furthermore, Zhao et al. [8] presented TSGCNet, a method that also directly accepts face vertices and normals as input. This approach is based on the concept of edge convolution, introduced by DGCNN [18], for semantic surface segmentation. Notably, they introduced the idea of separating the coordinates and normals into two distinct feature streams, to account for their distinct geometric meanings. This separation enables the network to learn more discriminative geometric features by considering the unique characteristics of each feature stream.

Using raw surface data and incorporating advanced deep learning techniques, these methods aim to enhance dental mesh segmentation by exploiting the inherent geometric properties of the data and learning informative representations directly from the surface coordinates and normals.

Materials and Methods

Data and Preprocessing

In this work, we use the benchmark data set Teeth3DS [9] for teeth segmentation and labeling. This data set was recently published through the MICCAI 2022 conference and consists of 1800 unique raw maxillary and mandibular dental surfaces captured directly through an IOS from 900 different patients. For each patient, the IOS scans of the maxilla and mandible are contained as individual data points. An example of a labeled dental model and the corresponding color encoding, with the scientific description per tooth used in further analysis, is shown in Fig. 2. According to the publisher, the data set has been carefully validated by orthodontists and dental surgeons with more than five years of professional experience. Moreover, the data set offers two official train/test splits, each dividing the data into 1200 training instances and 600 test instances. From now on, these splits will be referred to as S1 and S2. Table 1 presents statistical insights into the data set: the relative distribution of the number of teeth per case, as well as the number of cases that contain an artificial socket.

In S1, given that the maxillary and mandibular data points are distinct entities within the data set, there are instances where a patient’s maxilla is present in the training split while their mandible is in the test split (or vice versa). In this case, it is debatable whether the training and testing data are independent, as the maxilla and mandible may develop independently over time but are inherently related as parts of the same patient’s anatomy. In S2, by contrast, the training and test data are independent of each other, as patients do not appear in both subsets. To preserve the fundamental concept of using the data set as a benchmark and to ensure reproducibility of the results, the decision was made to still use the given train/test splits S1 and S2. Notably, in both S1 and S2 the distribution of the number of teeth follows a similar pattern, with the standard case of 14 teeth making up more than half of all cases. Furthermore, in S2, the number of cases with sockets exhibits a notable imbalance between the training and test sets: although only 3.5% of the cases in the training set contain sockets, more than 81% of the cases in the test set do.

The surface of the dental models is represented as a triangular mesh, a type of polygon mesh in which the 3D surface is made up entirely of triangles; each face is a flat surface formed by connecting three vertices [22]. The original dental surfaces per data instance (maxillary or mandibular) vary in complexity and roughly contain between 26,000 and 520,000 faces, with an average of approximately 230,000 faces. To facilitate the segmentation of such high-resolution mesh surfaces, a common practice is to down-sample the meshes to a standardized size (e.g., as done in previous work [7, 8, 21]). In our case, we simplify the meshes to a uniform size of 16,000 faces using the quadric-based edge collapse simplification method [23]. As also done by [8, 21], we apply z-score normalization [24] to the feature vector along each dimension.
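As a minimal preprocessing sketch, the following uses Open3D's quadric decimation as a stand-in for the quadric-based edge collapse simplification [23] and applies per-dimension z-score normalization [24]; the file name is hypothetical.

```python
import numpy as np
import open3d as o3d

# Simplify a high-resolution IOS mesh to a uniform size of 16,000 faces.
mesh = o3d.io.read_triangle_mesh("scan_maxilla.obj")   # hypothetical path
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=16000)

def zscore(features):
    # Z-score normalization along each feature dimension: zero mean, unit std.
    return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-12)
```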

Fig. 2: An example of a labeled 3D tooth model captured by an IOS containing all 16 teeth in the corresponding colors (left). An image of a dental arch including the scientific description per tooth and the corresponding color for the left and right quadrants (right).

Table 1 Relative number of teeth over all cases and distribution of cases with and without a socket for splits S1 and S2

Network Architecture

In this work, the aim is to train a network that effectively classifies each face of a dental surface model with M faces into one of 17 classes. These classes correspond to whether the face belongs to the gum or one of the 16 teeth and are directly translatable to the FDI notation (dental tooth numbering system). To achieve this, we transform the triangular mesh representation of the dental model into an \(M \times 24\) matrix, which serves as input to the network. Each face is described by a combination of the 3D coordinates of its three vertices and the face center (12 values), as well as the normal vectors of the three vertices and the face normal itself (12 values).

The output of the network is an \(M \times 17\) matrix, where each row represents the probabilities that the corresponding face belongs to a specific class. This enables the network to provide a classification for each individual face in the mesh.
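The following NumPy sketch assembles this \(M \times 24\) input matrix from a simplified mesh; combined with the z-score normalization above, it yields the network input. The vertex-normal computation (unweighted averaging of adjacent face normals) is an assumption, as the paper does not specify it.

```python
import numpy as np

def face_features(vertices, faces):
    # vertices: (V, 3) float array, faces: (F, 3) int array of vertex indices.
    # Returns the (F, 24) input matrix: per face, the coordinates of its three
    # vertices and its center (12), plus the three vertex normals and the
    # face normal (12).
    tri = vertices[faces]                          # (F, 3, 3) triangle corners
    center = tri.mean(axis=1, keepdims=True)       # (F, 1, 3) face centers
    fn = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    fn /= np.linalg.norm(fn, axis=1, keepdims=True) + 1e-12   # face normals
    vn = np.zeros_like(vertices)                   # accumulate vertex normals
    np.add.at(vn, faces.ravel(), np.repeat(fn, 3, axis=0))
    vn /= np.linalg.norm(vn, axis=1, keepdims=True) + 1e-12
    coords = np.concatenate([tri, center], axis=1).reshape(-1, 12)
    normals = np.concatenate([vn[faces], fn[:, None]], axis=1).reshape(-1, 12)
    return np.concatenate([coords, normals], axis=1)   # (F, 24)
```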

Fig. 3: Architecture overview of DilatedToothSegNet: The network processes the coordinates and normals in a local feature learning block utilizing edge convolution layers to learn local features. Local features are used in the dilated feature learning block, which utilizes dilated edge convolution layers to enhance the receptive field and capture broader contextual information. The prediction head consists of point-wise MLPs that estimate the class probabilities per face.

As shown in Fig. 3, the network architecture consists of two main blocks: the local and the dilated feature learning block. The local feature learning block employs a stack of dynamic edge convolution layers. These layers are responsible for learning local features for each face of the mesh. Using dynamic edge convolution layers, the network can effectively capture intricate details and patterns at a local level. The dilated feature learning block operates on the learned local features and incorporates a set of dilated edge convolution layers. These layers enhance the features by considering more distant features. This allows the network to take into account information from neighboring faces in a hierarchical manner, enhancing the receptive field and capturing broader contextual information.

Finally, the prediction head is formed using a set of point-wise MLPs. These MLPs output the logits for each face, which are then used to compute the final class probabilities via the softmax activation function. Overall, this network architecture enables effective classification of individual faces in the tooth surface model by utilizing both local and more distant information.

Local Feature Learning

In order to capture local geometric features, we employ the dynamic edge convolution introduced by DGCNN [18], which involves the construction of a dynamic kNN graph and the successive application of edge convolution. The concept of edge convolution is visualized in Fig. 4.

Fig. 4: Edge convolution aggregates the edge features associated with all the edges emanating from each connected face (own representation based on [18]).

Consider a mesh with M faces, denoted by \(X = \{x_1, \ldots , x_M\}\). For a face \(x_i\), edge convolution learns edge features \(e_{ij} = h_\theta (x_i, x_j)\), where \(h_\theta\) denotes a parametric non-linear function characterized by a set of learnable parameters. These features describe the relationship between a point or, in this case, a face \(x_i\) and its neighbors \(x_j\). This is done by first constructing a kNN graph in each layer by finding the k-nearest neighbors for the M faces based on the Euclidean distance in the feature space. For each face \(x_i\), we denote the indices of its k-nearest neighbors as \(\mathcal {N}\left( i\right)\). Afterward, edge convolution followed by a pooling operation \(\boxed { p }\) is applied on each \(\mathcal {N}\left( i\right)\) to learn embedded features. The edge convolution is given by Eqs. 1–3 [18]. Here, Eq. 1 encodes global information as \(x^{g}_i\), Eq. 2 encodes local neighborhood information as \(x^{l}_i\), and Eq. 3 combines both as \(x^{c}_i\). \(h_\theta\) is implemented as a shared MLP.

$$\begin{aligned} x^{g}_i = \underset{ j \in \mathcal {N}(i) }{\boxed { p }}\, h_\theta (x_i) \end{aligned}$$
(1)
$$\begin{aligned} x^{l}_i = \underset{ j \in \mathcal {N}(i) }{\boxed { p }}\, h_\theta (x_j - x_i) \end{aligned}$$
(2)
$$\begin{aligned} x^{c}_i = \underset{ j \in \mathcal {N}(i) }{\boxed { p }}\, h_\theta (x_i \,\Vert \, x_j - x_i) \end{aligned}$$
(3)
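A minimal PyTorch sketch of this operation with max-pooling as \(\boxed{p}\) (Eq. 3) is shown below; the value of k and the layer width are illustrative rather than the paper's exact configuration. Because the kNN graph is rebuilt from the current features in every forward pass, stacking such layers yields the dynamic-graph behavior described above.

```python
import torch
import torch.nn as nn

def knn(x, k):
    # x: (B, N, C) features. Returns (B, N, k) indices of the k nearest
    # neighbors of every face in the current feature space.
    dist = torch.cdist(x, x)                       # (B, N, N) Euclidean distances
    return dist.topk(k + 1, largest=False).indices[..., 1:]   # drop self-match

class EdgeConv(nn.Module):
    """Dynamic edge convolution per Eq. 3: h_theta(x_i || x_j - x_i),
    max-pooled over the k-nearest neighbors (a sketch after DGCNN [18])."""
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        self.h_theta = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, x):                          # x: (B, N, C)
        idx = knn(x, self.k)                       # dynamic graph, rebuilt per layer
        x_j = torch.gather(                        # neighbor features: (B, N, k, C)
            x.unsqueeze(1).expand(-1, x.size(1), -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, x.size(-1)))
        x_i = x.unsqueeze(2).expand_as(x_j)        # center features, broadcast
        e = self.h_theta(torch.cat([x_i, x_j - x_i], dim=-1))   # edge features
        return e.max(dim=2).values                 # max-pool over the neighborhood
```

For example, `EdgeConv(24, 64)(torch.randn(2, 2048, 24))` yields a `(2, 2048, 64)` tensor of per-face features.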

The choice of pooling operation \(\boxed {p}\) is an important aspect to consider [25]. The pioneering works PointNet [14] and DGCNN [18] established max-pooling as a widely adopted choice within this domain; alternative pooling methods include average- or sum-pooling. Recent work has investigated more sophisticated pooling operations: [8, 26] introduce the concept of graph attention pooling, in which a weighted sum-pooling mechanism is employed. This approach involves learning attention weights, denoted as \(\alpha _i\) and described by Eq. 4, for the neighborhood features of a face \(x_i\). These are learned by a lightweight MLP, here denoted as \(h_\sigma\); its input is, as in Eq. 3, the concatenation of global and local neighborhood features \((x_i \,\Vert \, x_j - x_i)\).

$$\begin{aligned} \alpha _i = \underset{ j \in \mathcal {N}(i) }{h_\sigma }(x_i \,\Vert \, x_j - x_i) \end{aligned}$$
(4)

Following this step, edge convolution is applied in which \(\boxed { p }\) is substituted by the sum-pooling after the learned edge features are multiplied element-wise (\(\odot\)) with the learned attention weights \(\alpha _i\).

$$\begin{aligned} x^{\alpha }_i = \sum _{ j \in \mathcal {N}(i) } \alpha _i \odot h_\theta (x_i \,\Vert \, x_j) \end{aligned}$$
(5)
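A sketch of this attention pooling following Eqs. 4 and 5, reusing the `knn` helper from the previous block; the softmax normalization of the attention weights over the neighborhood is our assumption.

```python
class AttentionPoolEdgeConv(nn.Module):
    """Edge convolution with graph attention pooling (Eqs. 4-5); a sketch
    assuming softmax-normalized attention weights over the neighborhood."""
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        self.h_theta = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())
        self.h_sigma = nn.Linear(2 * in_dim, out_dim)    # lightweight MLP

    def forward(self, x):                                # x: (B, N, C)
        idx = knn(x, self.k)                             # (B, N, k)
        x_j = torch.gather(x.unsqueeze(1).expand(-1, x.size(1), -1, -1), 2,
                           idx.unsqueeze(-1).expand(-1, -1, -1, x.size(-1)))
        x_i = x.unsqueeze(2).expand_as(x_j)
        alpha = torch.softmax(self.h_sigma(              # attention weights,
            torch.cat([x_i, x_j - x_i], dim=-1)), dim=2) # Eq. 4
        e = self.h_theta(torch.cat([x_i, x_j], dim=-1))  # edge features, Eq. 5
        return (alpha * e).sum(dim=2)                    # weighted sum-pooling
```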

TSGCNet [8] also introduces the practice of splitting coordinates and normals into separate streams to allow the network to learn more discriminative geometric features. This involves applying max-pooling in the normals stream and attention pooling in the coordinate stream. However, various training experiments with the described setup using the Teeth3DS data set showed that separating the streams has no positive influence. Consequently, in pursuit of a simpler variant, we abandoned the notion of separating normals and coordinates into separate streams. Instead, we opted for a unified stream employing the edge convolution operation specified in Eq. 3, followed by max-pooling.

Dilated Feature Learning

Dynamic edge convolution has proven to be a very effective operation for learning local geometric features. In the context of semantic segmentation, it is important to consider not only the features of individual faces but also the surrounding face features. Edge convolution enables this by integrating the features of neighboring faces. Furthermore, the authors of DGCNN [18] claim that dynamic recalculation of the kNN graph in the feature space decouples the neighbor definition from the metric space, allowing the network to capture semantic characteristics over potentially long distances.

Fig. 5: The dynamic kNN graph visualized over three successive layers. The red sphere indicates the focused face and the blue area indicates the nearest neighbors in feature space.

For the semantic segmentation of dental models, the decoupling of the neighborhood relation from the metric space is not sufficient to learn features of surrounding teeth. Figure 5 illustrates the progressive detachment of neighboring faces from their Euclidean distance for an example case, as the data propagates through deeper layers of the network. However, even at the third layer, the emphasis remains on local features with no incorporation of features from adjacent teeth. Nevertheless, neighboring teeth from the same tooth group share similar geometric characteristics (e.g., 1st, 2nd and 3rd molars), and including these features may prove beneficial for the accurate classification of an individual tooth. One potential solution could be to expand the network depth by stacking additional layers of dynamic edge convolution; another could be to expand the number of neighbors k considered in the edge convolution process. However, both approaches noticeably increase complexity and demand more computational resources, since dynamic recalculation of the kNN graph is a computationally intensive operation. To address this challenge, we present an approach, called dilated edge convolution, inspired by dilated CNNs in 2D computer vision [27], which aims to improve the learning of more distant features. The primary objective is to expand the receptive field of the neural network without significantly increasing the number of neighbors involved in the edge convolution. To achieve this, we extend the edge convolution operation through iterative farthest point sampling (FPS) from a larger set of neighboring faces. The following steps outline the procedure, which is also depicted in Fig. 6 and formalized in Eq. 6:

  1. For a given face, denoted as \(x_i\), we identify a bigger set of k neighboring faces, denoted as \(\mathcal {N}_\text {b}\left( i\right)\).

  2. From this set of neighbors \(\mathcal {N}_\text {b}\left( i\right)\), we use FPS to sample f faces, denoted as \(\mathcal {N}_\text {s}\left( i\right)\). This sampling technique results in a dilated neighborhood graph that encompasses a larger range compared to the traditional k-nearest-neighbor graph while maintaining a fixed number of neighbors. Consequently, we can replace \(\mathcal {N}\left( i\right)\) with \(\mathcal {N}_\text {s}\left( i\right)\) in the existing edge convolution operation.

  3. Finally, we apply edge convolution followed by max-pooling on each \(\mathcal {N}_\text {s}\left( i\right)\) to acquire more distant features.

$$\begin{aligned} x^{d}_i = \max _{ j \in \mathcal {N}_\text {s}(i) } h_\theta (x_i \,\Vert \, x_j - x_i) \end{aligned}$$
(6)

By incorporating the FPS into the edge convolution operation, we can effectively expand the neural network’s receptive field without increasing the number of neighbors involved in the edge convolution. This idea is also demonstrated on a dental model in Fig. 7 for an increasingly larger sample area. In this particular example, the number of neighboring faces k in \(\mathcal {N}_\text {b}\left( i\right)\) increases, while the number of sampled faces f remains the same. Consequently, more distant features are taken into account during the edge convolution operation. However, since this leads to a sparse representation of the surface, it is advantageous to apply this to a feature space that already incorporates neighboring features per face, which is the case after applying edge convolution as described in Section 4.3.
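The sampling procedure can be sketched as follows; the helper names are illustrative, and the per-face Python loop is kept for clarity (a practical implementation would batch this on the GPU). Since the graph is built on metric-space coordinates, the returned indices are static and can be precomputed, as noted below.

```python
import torch

def farthest_point_sampling(pts, f):
    # pts: (k, 3) candidate coordinates. Greedily picks f indices so that
    # each new point maximizes the minimum distance to those already chosen.
    idx = torch.zeros(f, dtype=torch.long)          # start from candidate 0
    dist = torch.full((pts.size(0),), float('inf'))
    for s in range(1, f):
        dist = torch.minimum(dist, (pts - pts[idx[s - 1]]).pow(2).sum(-1))
        idx[s] = dist.argmax()
    return idx

def dilated_neighbors(coords, k_big, f):
    # coords: (N, 3) face centers in the metric space. For every face, take
    # its k_big nearest neighbors N_b(i) and keep f of them via FPS, giving
    # the sparse but wide neighborhood N_s(i) of shape (N, f). The result is
    # static and can therefore be precomputed.
    knn_idx = torch.cdist(coords, coords).topk(
        k_big + 1, largest=False).indices[:, 1:]    # (N, k_big), self dropped
    out = torch.empty(coords.size(0), f, dtype=torch.long)
    for i in range(coords.size(0)):
        cand = knn_idx[i]
        out[i] = cand[farthest_point_sampling(coords[cand], f)]
    return out
```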

Fig. 6: Dilated edge convolution aggregates the edge features associated with all the edges emanating from sampled faces (yellow nodes). The sampling strategy is given by the farthest point sampling from a bigger set of neighbors (blue area). The upper figure presents the concept in a schematic manner, whereas the lower figure demonstrates it on a triangular mesh.

Additionally, it is worth noting that the dilated neighborhood graph is computed on the face coordinates in the metric space and therefore remains static. Consequently, it can be precomputed, thereby saving computational resources.

This operation is integrated in the network as a middle layer after the local edge convolution and operates on the features extracted from the preceding layers.

Fig. 7: Dilated edge convolution visualized on a dental model for a given face (red sphere) for an increasingly larger sample size. The blue area indicates the larger set of neighbors \(\mathcal {N}_\text {b}\left( i\right)\); the yellow faces indicate the faces \(\mathcal {N}_\text {s}\left( i\right)\) sampled by FPS.

Experiment

In this section, we provide a brief description of the competing methods that we used for comparison against our own approach. Additionally, we outline the experimental setup.

Competing Methods

To demonstrate the efficacy of our proposed method, we compared it with three state-of-the-art techniques in 3D surface segmentation (PointNet++ [16], PointNext [17] and DGCNN [18]), as well as two state-of-the-art techniques specifically designed for 3D dental model segmentation (MeshSegNet [21] and TSGCNet [8]). The inputs for these methods are briefly described below:

  • PointNet++ and PointNext: Both networks take as input an \(M \times 6\) matrix, where each row contains the 3D coordinates of the face center as well as the normal vector of each face.

  • DGCNN: Similarly, DGCNN expects an \(M \times 6\) matrix as input, where each row contains the 3D coordinates of the face center along with the normal vector of each face.

  • MeshSegNet: For MeshSegNet, the input is represented by an \(M \times 15\) matrix. The first 12 values of each row denote the 3D coordinates of the three vertices of each face and the face center (\(3 \times 3 + 1 \times 3\)). The last 3 values contain the normal vector of the face. Additionally, small- and large-scale adjacency matrices (AS and AL) serve as input for the graph-constrained learning modules of MeshSegNet.

  • TSGCNet: TSGCNet expects an \(M \times 24\) matrix as input. The first 12 values of each row represent the 3D coordinates of the three vertices of each face and the face center (\(3 \times 3 + 1 \times 3\)). The last 12 values contain the normal vectors of the three vertices and the face (\(3 \times 3 + 1 \times 3\)).

For a fair comparison, all methods are trained using the same setup as described in the next section, except for MeshSegNet where the batch size was set to 10. This adjustment was made according to the details provided in the original paper [21], as smaller batch sizes resulted in inadequate convergence.

Experiment Setup

The network implementation and experiments were performed using the deep learning framework PyTorch [28]. Training was carried out on two NVIDIA RTX 3090 GPUs. Both the proposed network and the competing networks were trained by minimizing the face-wise cross-entropy loss for a total of 100 epochs, utilizing the Adam optimizer [29]. The distributed data parallelization technique was used for multi-GPU training, with a batch size of 2 per GPU, resulting in an effective batch size of 4. The initial learning rate was set to \(1e-3\) and was decayed by a factor of 0.5 every 30 epochs. To quantitatively evaluate the results, we employed the accuracy, the mean intersection over union (mIoU), also known as the Jaccard index, and the Dice score as metrics.
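For orientation, a minimal single-GPU sketch of this training setup is given below; the linear layer and the dummy batch stand in for DilatedToothSegNet and a Teeth3DS data loader, which are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(24, 17)                           # stand-in for DilatedToothSegNet
train_loader = [(torch.randn(4, 16000, 24),
                 torch.randint(0, 17, (4, 16000)))] # dummy batch as stand-in

optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)
criterion = nn.CrossEntropyLoss()                   # face-wise cross-entropy

for epoch in range(100):
    for feats, labels in train_loader:              # feats: (B, M, 24)
        logits = model(feats)                       # (B, M, 17) per-face logits
        loss = criterion(logits.permute(0, 2, 1), labels)  # CE expects (B, 17, M)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                                # halve the LR every 30 epochs
```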

Moreover, in the context of segmenting 3D dental models using neural networks, it is common to refine the results obtained by a post-processing step, typically employing the graph-cut method [30]. However, in this study, our primary focus lies on the outcomes generated solely by the neural networks. To allow a direct comparison, no additional post-processing steps were applied to refine the results. Thus, all the results presented in this work represent the direct output of the neural networks without any further post-processing applied.
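For reference, the three metrics can be computed from a confusion matrix as sketched below; macro-averaging the per-class IoU and Dice values is our assumption about the evaluation convention.

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes=17):
    # pred, gt: flat integer arrays of per-face labels.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (gt, pred), 1)                       # confusion matrix
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)             # per-class Jaccard index
    dice = 2 * tp / np.maximum(2 * tp + fp + fn, 1)    # per-class Dice score
    oa = tp.sum() / cm.sum()                           # overall accuracy
    return oa, iou.mean(), dice.mean()
```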

Results

In this section, we present the results of a comparison between our proposed method, DilatedToothSegNet, and other advanced techniques for 3D surface segmentation and 3D dental model segmentation. Furthermore, we performed several training experiments to evaluate the effects of the key components employed in our approach.

Quantitative Evaluation

Table 2 displays the quantitative evaluation results for all competing methods. The table includes the overall accuracy (OA), mean Intersection over Union (mIoU), and Dice score for both splits S1 and S2. Additionally, the table presents the per-class metrics, where the metrics for the same tooth types from the left and right sides are aggregated. Several notable observations can be derived from the results:

  • Our proposed method consistently outperforms all other methods in terms of overall accuracy, mIoU, and Dice score.

  • In terms of per-class metrics, our method achieves superior performance in most cases. However, for the 3rd molar class, MeshSegNet [21] performs slightly better in terms of accuracy, and for the gum class in S1, TSGCNet [8] shows slightly better accuracy than our method. In terms of mIoU and Dice, our method also performs best on train/test split S1. On train/test split S2, all methods achieve insufficient results for the 3rd molar, with DGCNN performing best among them.

  • Compared to TSGCNet [8], which employs dynamic edge convolution as the primary feature learning technique, our method shows better results. This emphasizes the effectiveness of our proposed dilated edge convolution for additional more distant feature learning.

  • Our method also outperforms PointNext [17] and PointNet++ [16], which rely on iterative grouping and the transformation of points at different scales as their learning strategy. This highlights the effectiveness of utilizing dynamic and dilated edge convolution as a learning strategy for dental models.

Furthermore, the accuracy, mIoU, and Dice score for both S1 and S2 are illustrated in Fig. 8 as box plots, representing the results for each individual data point in the data set. Once again, it is evident that our method outperforms the others and exhibits the least deviation. Notably, PointNext and MeshSegNet also achieve promising results, followed by TSGCNet; however, TSGCNet tends to exhibit a significant deviation toward low-quality individual results. DGCNN and PointNet++ yield the poorest results. In addition, there is a noticeable disparity in performance between S1 and S2: in particular, cases without a socket are segmented noticeably worse in S2. This can be explained by the imbalance of cases with and without sockets in S2, as described in Section 4.1. This imbalance leads to a strong bias during training toward cases with sockets, which is no longer adequately represented in the evaluation. As a result, models trained on S2 tend to perform better for cases with sockets, but poorly for cases without a socket.

Despite this challenge, our proposed model still produces noticeably better results, demonstrating its ability to generalize effectively and learn features mainly based on the geometric properties of the teeth. This aspect is particularly important in real-world dental scenarios, where different IOSs may be used, leading to variations in socket types and other artifacts.

Table 2 Segmentation results for five competing methods and our method in terms of accuracy, mIoU and Dice score for two different train/test splits per tooth type/gum and over all classes (OA). Relevant values in bold
Fig. 8: Segmentation results for five competing methods and our method in terms of accuracy, mIoU and Dice score for two different train/test splits as boxplots based on the individual data points in the specific train/test split.

Fig. 9: Visualization of five example segmentation results from five competing methods and our method, along with their respective ground truth annotations. From top to bottom: In the first example, the 2nd and 3rd molars are missing. In the second example, all teeth are present except for the wisdom teeth. In the third example, several teeth are missing and the left 1st premolar is barely developed. In the fourth example, all teeth are present, including wisdom teeth. The fifth example shows the front view. The second and third examples also contain a socket.

Qualitative Evaluation

For qualitative assessment, Fig. 9 visually presents the results achieved by the competing methodologies. It includes five distinct examples, each showing unique characteristics. Selected regions with poor segmentation accuracy are highlighted by red circles.

In the first example, the 2nd and 3rd molars are missing. Our method effectively segments each tooth without erroneously identifying any part of the surface as one of the missing teeth. In contrast, the other methods falsely segment certain areas of the gum or the 1st molar as the 2nd molar. This suggests that the other methods primarily focus on the geometry of the teeth, where the 1st and 2nd molars share similarities. However, considering the surrounding area, it becomes evident that it is the 1st molar. The second example represents a common case in which all teeth are present, except the wisdom teeth. In addition, a socket is part of the dental model. Also, the central incisors are severely misaligned. In this case, all methods, except our proposed method, produce misclassified patches in different areas. The third example presents a case where multiple teeth are absent. Additionally, the left 1st premolar exhibits minimal development. In this instance, both TSGCNet and our approach appear to deviate from the ground truth. However, it can be argued that the ground truth in this specific case might be incorrect as it fails to capture all of the surface of the tooth. In the fourth example, all teeth are present, including wisdom teeth. Here, TSGCNet, MeshSegNet, and DGCNN generate reasonable results with only small misclassified patches. However, PointNet++ generates larger misclassified patches and fails to produce clear segmentation boundaries. The fifth example visualizes a case from the front, providing a better view of the segmentation boundaries. In this case, in addition to our method, TSGCNet, MeshSegNet, and DGCNN achieve distinct segmentation boundaries. However, these methods also generate misclassified patches elsewhere. PointNet++ completely fails to provide clear boundaries and generates misclassified patches.

In general, this qualitative assessment further demonstrates that our proposed method excels in delivering clear segmentation boundaries even for cases involving missing or misaligned teeth regardless of whether a socket is present or not.

In addition, we conducted various training experiments that compare the effectiveness of the main components of the presented approach.

Local Feature Learning Strategy

The local feature learning layers employing dynamic edge convolution, as described in Section 4.3, play a crucial role in the proposed method, as they capture sensitive local information required for learning highly accurate segmentation boundaries. This work builds on concepts introduced by DGCNN [18] and TSGCNet [8], which utilized the following strategies:

  • Dynamic recalculation of the kNN graph after each layer

  • Splitting of coordinates and normals into two separate streams

  • Using attention pooling for the coordinate and max-pooling for the normals stream

Nevertheless, as our proposed method differs in terms of overall architecture from existing methods and utilizes a different data set, we conducted multiple training experiments with various configurations to assess the effectiveness of the mentioned strategies within the context of our proposed architecture using the Teeth3DS data set.

Table 3 Results when using a dynamic versus a static graph, when splitting the coordinates and normals into separate streams and for different pooling methods. Relevant values in bold

The results are listed in Table 3, based on which the following observations can be derived:

  • Max-pooling seems to perform marginally better compared to attention pooling across most configurations, exhibiting slightly higher accuracy, mIoU, and Dice scores for S1 and S2 in several instances.

  • Splitting the streams and using a static vs. a dynamic kNN graph show mixed impacts on the results. For the train/test split S1, splitting the streams and having a static graph seems to improve performance, while for S2 it seems that having a single stream with a dynamic graph is the better option. However, in most configurations, the differences are marginal.

It should be noted that the discrepancies found are mostly marginal and probably only applicable in the context of this specific data set, making it difficult to draw a general conclusion. Therefore, for this paper, the decision was made to use the configuration with the best average results across both train/test splits (S1 and S2): a single stream utilizing max-pooling on a dynamic graph.

Dilated Feature Learning Strategy

As outlined in Section 4.4, we propose the dilated edge convolution operator to enhance the local features acquired through the dynamic edge convolution layers with additional more distant features. To assess the efficacy of these layers, we conducted a set of training experiments using four different configurations and compared the segmentation metrics and visual results. The configurations encompass a spectrum that ranges from the absence of dilated edge convolution layers to the utilization of varying ranges. Additionally, we explored the stacking of multiple layers, progressively increasing in range, as proposed in the final architecture.

Table 4 Results when using different settings for the dilated edge convolution layers. Relevant values in bold

The configurations and results can be found in Table 4. From the results, the following observations can be derived:

  • Utilizing dilated convolution layers in conjunction with dynamic edge convolution layers improves the performance of the network compared to using dynamic edge convolution layers alone.

  • Larger sample sizes, leading to a wider receptive field, result in improved performance compared to smaller sample sizes.

  • Stacking multiple layers and continuously increasing the sample size further enhance the overall results.

The qualitative assessment, as visualized in Fig. 10, confirms these conclusions. The upper section of the figure visualizes the segmentation outcomes of a network that uses only dynamic edge convolution (left) and a network that incorporates dilated edge convolution layers (right). The bottom section illustrates the class saliency maps obtained by applying guided backpropagation [31] for the class 1st premolar (L). This example demonstrates that the network generates a more accurate segmentation mask in the latter setup, as opposed to the former. The improved performance can be explained by the additional contextual information provided by the dilated edge convolution layer, which expands the network’s receptive field. The class saliency map reveals that the network on the right-hand side utilizes geometric features from farther away, whereas on the left-hand side the network focuses primarily on the geometric features of the tooth itself. This discrepancy leads to misclassifications when the geometric properties of two teeth are quite similar. However, by incorporating features from surrounding areas, such misclassifications can be mitigated. In essence, these two scenarios highlight the distinction between predicting the segmentation mask of a particular tooth solely based on the tooth and its immediate surroundings versus considering neighboring teeth and gum tissue.
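As an illustration of how such maps can be obtained, the following sketch computes a per-face saliency value from plain input gradients, a simpler stand-in for the guided backpropagation of [31]; `model` denotes a hypothetical trained network and `class_id` the index of the 1st premolar (L).

```python
import torch

def face_saliency(model, feats, class_id):
    # feats: (1, M, 24). Returns one saliency value per face: the gradient
    # norm of the summed logit of class_id with respect to the input.
    feats = feats.clone().requires_grad_(True)
    logits = model(feats)                      # (1, M, 17) per-face logits
    logits[0, :, class_id].sum().backward()    # backprop the class logits
    return feats.grad[0].norm(dim=-1)          # (M,) per-face saliency
```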

Fig. 10: Comparison of using only dynamic edge convolution layers (left) versus also including dilated edge convolution layers (right). Inner left: the predicted labels. Inner right: the class saliency map obtained by applying guided backpropagation for the class 1st premolar.

Table 5 Timing analysis of dilated edge convolution with and without precomputed indices

The time required for a single forward pass through the network with a batch size of 2 is shown in Table 5 for different numbers of layers, with and without precomputed indices. As the number of layers increases, the computational time also increases. When the indices are precomputed, the time gradually increases from 0.0078 s for one layer to 0.026 s for three layers; without precomputed indices, it increases markedly, from 0.0391 s for one layer to 0.1291 s for three layers. These timings are relevant for both the training and inference phases. Although the computation of the sampling indices is time-consuming, the training time can be minimized by precomputing them as part of the data preprocessing step. During inference, however, the sampling indices are not available in advance, so the time needed to compute them must be considered when estimating the overall inference time. Nevertheless, as a forward pass including the calculation of the sampling indices still takes well below one second, inference can be performed relatively fast.
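The benefit of precomputation can be verified by timing the index construction separately from the forward pass, as in the following sketch, which reuses the `dilated_neighbors` helper from above with illustrative parameter values.

```python
import time
import torch

coords = torch.randn(16000, 3)                 # face centers of one mesh

t0 = time.perf_counter()
idx = dilated_neighbors(coords, k_big=256, f=32)   # sampling indices
t1 = time.perf_counter()
print(f"index construction: {t1 - t0:.4f}s")   # paid once if precomputed,
                                               # per forward pass otherwise
```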

Discussion

In summary, this study presents several contributions. First, we introduced a novel technique, termed dilated edge convolution, which involves sampling from the feature space with increasing distances through several layers of the network using farthest point sampling. This aims to simulate an expanding receptive field, which ultimately leads to improved tooth segmentation accuracy. Additionally, we proposed a network architecture that integrates dynamic edge convolution layers alongside the newly introduced dilated edge convolution layers. Furthermore, we conducted a comparative analysis of our method against various state-of-the-art approaches, using the recently published benchmark data set Teeth3DS [9], to evaluate its effectiveness. The following discusses the limitations of our approach and of the data set used; finally, future research directions are suggested and our work is positioned within the current research landscape.

Limitations

Extreme Cases and Unbalanced Labels

Despite achieving improved results in raw dental model surface segmentation, our method exhibits certain limitations that need consideration when it is implemented in practical applications. In Fig. 11, we present three cases in which common limitations are observed. In the first case (left), the 2nd premolar is missing. Unlike most other instances of missing teeth, there is no dental space, making it exceptionally challenging even for human observers to discern the absence of a specific tooth. In this particular case, our network also fails to detect this scenario accurately. On the left side, parts of the 1st premolar are misclassified as the 2nd premolar and on the right side parts of the 1st molar are misclassified as the 2nd premolar. This highlights the inherent difficulty of the task and the wide range of extreme cases that may be encountered.

The second (center) and third (right) cases are representative of the unbalanced labels within the data set. In the second scenario, several teeth are absent, which leads to only 10 teeth being present. On the contrary, in the third scenario, all 16 teeth are present, including the 3rd molars. In both scenarios, our methodology yields misclassified regions. However, this does not occur in all extreme cases. Qualitative analysis reveals instances where our approach adeptly manages these extreme cases. However, it should be noted that such exceptional cases are extremely rare in the data set, making it difficult for the network to learn them and generalize appropriately. Therefore, in future research, it should be considered to address this data imbalance by incorporating additional extreme cases.

Fig. 11: Left: Example where the 2nd premolar is missing on both sides without the presence of a large dental space; the red arrows indicate the position where the missing 2nd premolar should be. Center: A challenging case in which several teeth are missing. Right: A case where all 16 teeth are present, including both 3rd molars; in the data set the 3rd molar is underrepresented, resulting in unbalanced labels. In all these cases, our method fails by yielding some misclassified areas (see red circles).

Data Quality

It was observed that certain cases within the Teeth3DS data set are obviously mislabeled. Figure 12 shows two such examples: in one case, two teeth are labeled as belonging to the same class (left), while in the other, a large part of the gum is mislabeled as part of a tooth (right). Although these cases constitute a relatively small number within the data set, they can still have a negative impact on network training and evaluation. Despite these observations, the decision was made to retain the data set as is and not to manually correct mislabeled cases, in order to preserve the fundamental concept of using the data set as a benchmark and to ensure reproducibility of the results. Nevertheless, it is recommended that a revised version of the data set be published; this would improve the quality of the data set and its overall reliability as a benchmark for future research and evaluation purposes.

Fig. 12: Two examples of falsely labeled data. On the left, two teeth are assigned to the same class. On the right, a large part of the gum is labeled as tooth.

Future Work

The dilated edge convolution operation, when used in conjunction with dynamic edge convolution layers, demonstrated promising results. It is worth exploring whether this operation can be applied to a feature space produced by a different backbone network, such as PointNext [17], to potentially achieve even better results.

Furthermore, in the field of dental model segmentation, recent studies have introduced end-to-end segmentation frameworks [32]. These frameworks utilize a multistage segmentation approach, where the first stage involves predicting the location of each individual tooth. In the second stage, the teeth are segmented individually by extracting the region of interest identified in the first stage. Our proposed approach can seamlessly integrate with this multistage approach by serving as the network for the first or second stage.

Moreover, several methods have been proposed to adapt transformer architectures, which have achieved significant success in natural language processing (NLP) tasks, to the field of 3D deep learning [33, 34]. However, the resource-intensive nature of the attention mechanism poses a challenge when applying this architecture to high-resolution 3D models, such as dental models. Overcoming this limitation could make the transformer architecture a promising approach for dental model segmentation.

Conclusion

In this work, we introduced DilatedToothSegNet, a graph neural network designed to automatically segment the surfaces of 3D dental models obtained from IOSs. Building on the work of TSGCNet [8] and DGCNN [18], our approach incorporates the concept of utilizing dynamic edge convolution layers to learn discriminative local geometric features. Additionally, we introduced a dilated edge convolution network operator that effectively learns supplementary, more distant features, thereby mitigating misclassified patches and enabling successful segmentation of extreme cases involving missing or misaligned teeth. To assess the performance of DilatedToothSegNet, we performed evaluations on the public benchmark data set Teeth3DS and compared its results with other state-of-the-art methods in the field of 3D point cloud and dental model surface segmentation. The results demonstrate the superiority of our proposed method, highlighting its effectiveness in achieving more accurate and precise segmentation masks compared to existing approaches.

The proposed methodology can be incorporated into a CAD software for treatment planning purposes, automating the otherwise time-intensive segmentation task. Furthermore, the segmentation of dental models serves as the basis for subsequent analysis steps, such as the Bolton analysis [35] for tooth measurement, which therefore requires highly accurate results.