Automated and data-driven plate computation for presurgical cleft lip and palate treatment

Purpose: Presurgical orthopedic plates are widely used for the treatment of cleft lip and palate, the most common craniofacial birth defect. For traditional plate fabrication, an impression is taken under airway-endangering conditions, which recent digital alternatives overcome via intraoral scanners. However, these alternatives demand proficiency in 3D modeling software in addition to the generally required clinical knowledge of plate design.

Methods: We address these limitations with a data-driven and fully automated digital pipeline, endowed with a graphical user interface. The pipeline adopts a deep learning model to landmark raw intraoral scans of arbitrary mesh topology and orientation, which guides the nonrigid surface registration subsequently employed to segment the scans. The plates that are individually fit to these segmented scans are 3D-printable and offer optional customization.

Results: With the distance to the alveolar ridges closely centered around the targeted 0.1 mm, our pipeline computes tightly fitting plates in less than 3 min. The plates were approved in 12 out of 12 cases by two cleft care professionals in a printed-model-based evaluation. Moreover, since the pipeline was introduced into clinical routine at two hospitals, 19 patients have been undergoing treatment utilizing our automated designs.

Conclusion: The results demonstrate that our automated pipeline meets the high precision requirements of the medical setting employed in cleft lip and palate care while substantially reducing the design time and required clinical expertise, which could facilitate access to this presurgical treatment, especially in low-income countries.

Supplementary Information: The online version contains supplementary material available at 10.1007/s11548-023-02858-6.


Related Work
While classical landmarking approaches have shown promising results on constrained dental models by using fixed conditions, such as finding the peak points [1], and 2D-image-based landmark training has been thoroughly explored [2], robust automated landmarking on raw 3D scans has been lagging behind, as voxels require excessive memory for high-resolution inputs and default neural network architectures are not suited for topology or representation changes [3]. PointNet and its variations [4,5] have pioneered learning on unstructured 3D data and have already been successfully applied to landmarking on palate impressions [6], but their accuracy is generally lower than that of models considering the surface connectivity defined by meshes, and the amount of data required for training may be higher. DiffusionNet [7] is the first model that supports efficient deep learning on 3D surfaces of arbitrary representation, which makes it superior to other graph-based approaches like MeshCNN [8]. Since Sharp et al. [7] do not discuss the problem of landmarking, their work is extended in [9] by defining landmarking as a segmentation problem. The authors train a global DiffusionNet model in combination with an ensemble of localized models on predicting a full vertex segmentation map for each landmark, with only vertices close to the landmark marked as non-zero. However, they do not achieve competitive performance for inputs of arbitrary orientation and, thus, focus their results on pre-aligned meshes. Since the computation of such a pre-alignment generally requires some prior information about the input, we focus our results on inputs of arbitrary orientation. We employ an ensemble of two globally trained models, of which the first serves to align the input meshes, enabling the second model to predict landmarks on an aligned mesh with a precision similar to that of [9]. Hence, our two-stage model is capable of processing raw input scans without requiring the user to provide any prior information.

Training Details
As described in the main document, our dataset comprises 283 UCLP and 114 BCLP scans, each of which was manually annotated with a set of 10 landmarks L_g. Figure 1 illustrates the positions of these landmarks for a UCLP and a BCLP scan, with both alveolar segments landmarked in the same fashion: for BCLP meshes, landmark 3 is moved to position 4, whereas landmark 4 is moved inwards horizontally to mark the cleft region/lift-off start. This landmarking scheme was determined empirically to optimally guide the nonrigid registration. Representing the landmark positions L_g as continuous, sparsely activated segmentation maps s_{i,j} (cf. main document), we train all DiffusionNet models with random mesh flipping to support input meshes of varying cleft sides. Additionally, the first-stage model is trained with fully randomized rotations to cover the whole spherical space of possible input orientations. The second-stage model, on the other hand, is trained on the prealigned input meshes, either without any rotation randomization or with a limited azimuthal range of ±0.2 rad around the z-axis. The models use 3D vertex coordinates (XYZ) as input features instead of the rotation-invariant 16-component heat kernel signature (HKS) vector proposed by Sharp et al. [7], since we observed inferior performance of HKS compared to XYZ, even with randomized rotations. As the majority of our dataset comprises uniformly colored plaster cast scans, we did not observe any benefit from concatenating vertex color information to the input feature vector, as proposed by [9]. However, a larger set of colored intraoral scans might yield different results. As already outlined in the main document, our models use the default parameters proposed by Sharp et al. [7] for segmentation purposes, except for an added dropout rate of 0.2, since no significant benefit was observed when fine-tuning the hyperparameters.
Training with the mean squared error (MSE) loss using an 80/20 training-test split converged after approximately 500 epochs, which took less than a day on an NVIDIA RTX 2080 Ti. The number of mesh triangles had to be capped at 800k to stay within the GPU's 11 GB memory limit.
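The rotation augmentation scheme described above can be sketched as follows. This is an illustrative sketch rather than our training code: the function names, the QR-based uniform rotation sampling, and the choice of the x-axis as the flipping axis are our own assumptions.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Approximately uniform random 3D rotation via QR decomposition of a Gaussian matrix."""
    A = rng.normal(size=(3, 3))
    Q, R = np.linalg.qr(A)
    Q *= np.sign(np.diag(R))        # fix column signs to make the decomposition unique
    if np.linalg.det(Q) < 0:        # ensure a proper rotation (det = +1, no reflection)
        Q[:, 0] *= -1
    return Q

def augment(vertices, rng, stage="first"):
    """Rotation augmentation for XYZ input features.

    stage='first'  -> fully random orientation (whole spherical space)
    stage='second' -> small azimuthal perturbation of +/-0.2 rad about the z-axis
    A random flip (here: mirroring across the x-axis) emulates varying cleft sides;
    in practice the triangle orientation would have to be flipped as well.
    """
    if stage == "first":
        R = random_rotation_matrix(rng)
    else:
        a = rng.uniform(-0.2, 0.2)
        c, s = np.cos(a), np.sin(a)
        R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    out = vertices.copy()
    if rng.random() < 0.5:
        out[:, 0] *= -1
    return out @ R.T
```

Both the rotation and the optional mirroring are isometries, so the augmented mesh keeps all intrinsic distances while its embedding in space changes.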

Quantitative Evaluation
Our first landmark prediction tests concern the model's rotation variance. For this purpose, Table 1 compares the test accuracy of a model trained on pre-aligned meshes only (Fixed), i.e., without any rotation randomization, to one trained on completely random orientations (Random). The mean comparison suggests that a DiffusionNet model trained on fixed orientations is more accurate in its predictions, which is in line with the findings from [9]. Moreover, the nonzero prediction standard deviation of the Random model, which represents the prediction variation over different mesh orientations, shows that the model has not learned to be rotation-invariant despite having observed approximately 500 different orientations of each mesh during training. Possibly, the low number of training meshes (283 UCLP and 114 BCLP) does not cover the large space of 3D rotations sufficiently densely. Furthermore, we did not achieve a competitive setting using the HKS input features, which Sharp et al. [7] proposed to overcome this issue of rotation variance. Table 1 shows our best setting with HKS input features. This setting employs fixed thresholding of the input values, since we observed that the model predictions were prone to be disturbed by boundaries and spikes in the input mesh, possibly due to the high HKS feature values in these regions. Further investigations would be required to determine the optimal input representation for training on this specific dataset. Table 2 lists a short ablation study testing our two-stage model setting, which aims at improving the prediction accuracy for random input orientations. The results were again averaged over 10 random rotations to eliminate any orientation bias. The first setting only applies the single Random model from Table 1 trained on fully randomized rotations.
The other two rows report the results when utilizing this first model's landmark prediction L_p1 for a rough alignment via weighted Procrustes alignment [10] and then, in a second step, applying either the model trained on fixed orientations (Two-Stage-Fixed; this second model is the Fixed model from Table 1) or the model trained on meshes with only slight rotation variations (Two-Stage-Varied). The numbers suggest that the second-stage model improves the prediction accuracy of the first-stage model on average. Moreover, the second-stage model seems to benefit from still observing small orientation variations during training. We believe that this reduces the model's tendency to overfit and additionally compensates for the possibly inaccurate alignment given by the coarser landmark prediction of the first-stage model. The results of the Two-Stage-Varied model are comparable to those of the Fixed model from Table 1, which was trained on and applied to meshes of fixed orientation. This indicates that the performance decrease caused by random input orientations can be compensated to some extent by our proposed two-stage prediction, although complete rotation invariance is not achieved.
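The weighted Procrustes alignment used between the two stages admits a closed-form solution via the weighted Kabsch algorithm. The sketch below is our own minimal formulation, not the implementation from [10]; the function name and interface are assumptions.

```python
import numpy as np

def weighted_procrustes(src, dst, w):
    """Rigid alignment of landmark set src onto dst with per-landmark weights w.

    Returns (R, t) minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2:
    an SVD of the weighted cross-covariance yields the rotation,
    the weighted centroids yield the translation.
    """
    w = np.asarray(w, float) / np.sum(w)
    mu_s = w @ src                                    # weighted centroids
    mu_d = w @ dst
    H = (src - mu_s).T @ np.diag(w) @ (dst - mu_d)    # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

In a two-stage setting, `src` would be the first stage's predicted landmarks L_p1 and `dst` a reference landmark configuration; the recovered transform is then applied to all mesh vertices before the second-stage prediction.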
Finally, we test the intra- and inter-observer landmarking variance on the same 12 scans we printed for the model-based study in the main document. The scans were landmarked twice on separate days by the engineer who had already landmarked the whole dataset, and once by a cleft surgeon; Table 3 lists the resulting variances. The accuracy of the predicted landmarks approaches the intra-observer variance and is consistently better than that of the cleft surgeon, which indicates that the model has learned the engineer's specific landmarking scheme, which may not strictly follow the anatomical description provided in the previous subsection.

Mesh Offsetting
We discuss the problem of mesh offsetting that we face during surface volumization and describe our convex smoothing solution to avoid triangle self-intersections.

Background
First, we briefly explain the concept of Laplacian smoothing as a prerequisite to our convex smoothing algorithm. Then, we describe the general problem of mesh offsetting and previous related work.

Laplacian Smoothing
Laplacian smoothing [11,12] filters out noise and high-frequency details from a mesh while preserving the low-frequency structure, such as the general shape, as much as possible. In an iterative process, each vertex is adjusted towards the average of its neighbors by applying the forward Euler step

v_i^{(n+1)} = v_i^{(n)} + λ · (1/|N_i|) Σ_{j ∈ N_i} (v_j^{(n)} − v_i^{(n)}),   (1)

where v_i^{(n)} refers to the vertex with index i at iteration n, N_i denotes the set of neighbors of vertex v_i^{(n)}, and λ is a scale factor typically chosen in the range (0, 1). Applying more and more iterations not only removes high-frequency detail but can also lead to major mesh shrinkage if no additional constraints are applied. Selective Laplacian smoothing is a common technique in which only a subset of the vertices S ⊂ V is adjusted freely as in (1), while the boundary vertices surrounding S serve as a constraint that keeps the mesh from shrinking [13]. Likewise, we constrain the Laplacian adjustment vector per vertex to meet our conditions.
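The selective variant of the update in (1) can be sketched as follows, assuming hypothetical names and a precomputed one-ring neighbor list per vertex:

```python
import numpy as np

def selective_laplacian_smoothing(V, neighbors, free, lam=0.5, iterations=10):
    """Iterative Laplacian smoothing applied only to the vertex subset `free`.

    V:         (n, 3) array of vertex positions
    neighbors: list of index lists; neighbors[i] is the one-ring of vertex i
    free:      boolean mask; vertices outside it act as fixed boundary constraints
    lam:       scale factor in (0, 1) of the forward Euler step
    """
    V = V.copy()
    for _ in range(iterations):
        V_new = V.copy()
        for i in np.flatnonzero(free):
            centroid = V[neighbors[i]].mean(axis=0)
            # forward Euler step towards the neighbor average, cf. (1)
            V_new[i] = V[i] + lam * (centroid - V[i])
        V = V_new
    return V
```

Because the constrained vertices never move, the smoothed region is pinned at its boundary and the global shrinkage of unconstrained Laplacian smoothing is avoided.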

Problem
As illustrated in Figure 2, when offsetting a 1D function, the offset distance must not exceed the curvature radius r = 1/k at any point, i.e., the inverse of the curvature k; otherwise, the offset function overlaps itself around this region. On a surface, such overlapping may occur in multiple directions; therefore, the minimum principal curvature k_1 defines the limit of the offsetting distance. Mesh offsetting describes the problem of offsetting a general surface, discretized as a mesh, by an arbitrary desired distance without causing any overlaps, i.e., self-intersections, in convex regions.

Related Work
Early work treated this mesh offsetting problem as a Minkowski sum composed of offset faces, cylinders for edges, and spheres for vertices. Whereas point-based resampling [14] showed only limited applicability without properly discussing self-intersections, curve-based resampling [15] relied on analytic curve estimation, which requires 2D slicing and is computationally inefficient. Other work splits vertices on sharp edges using the multiple normal directions defined by their surrounding faces [16], which requires a closed 2-manifold surface. More recent works represent the surface implicitly via signed distance functions [17], which allows resampling the surface at an arbitrary offset distance without any self-intersections, potentially with the use of marching cubes [18]. However, none of these works preserve the mesh topology of the surface to be offset, which may complicate further automated mesh processing. Hence, we developed a novel offsetting algorithm that reduces the curvature selectively via customized Laplacian smoothing, thereby preserving the original mesh topology.

Convex Smoothing
Our convex smoothing algorithm globally raises k_1 above the required threshold k_t to enable offsetting a surface by a desired distance. We adopt libigl's [18] per-vertex principal curvature approximation [19] for the discretization of a surface via a triangle mesh M that consists of a set of vertices V and triangles T. The vertices with k_1 < k_t are adjusted via weighted Laplacian smoothing, where the vertex neighbors are weighted according to their parallelity with the direction k_1 of the minimum principal curvature. More precisely, the connection vector v_ij between the vertex v_i and each neighbor v_j ∈ N_i is projected onto the plane A_vi perpendicular to v_i's normal. Afterwards, the weight w_vj of neighbor v_j corresponds to the absolute dot product between this projected and normalized connection vector v̂_{j,p} and the minimum principal curvature direction k_1. Lastly, the weights of all neighbors w_vj are rescaled to sum to 1. Figure 2 summarizes this convex smoothing procedure, and Algorithm 1 specifies the corresponding algorithm. The set of boundary vertices B is excluded, as the curvature is not well-defined on the boundary, and a proper adjustment of the remaining vertices is sufficient to eliminate any self-intersections. Note that, when offsetting the vertices in the opposite direction, the algorithm works analogously by reducing the maximum curvature instead of increasing the minimum curvature to ensure that k_2 < k_t. Since this algorithm does not require resampling, it preserves the mesh surface topology and thus also the corresponding segmentation, which facilitates further automated processing during the plate computation.
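The per-vertex neighbor weighting described above can be sketched as follows; the function name and interface are our own, and the curvature direction k1_dir is assumed to be given, e.g., by libigl's principal curvature approximation:

```python
import numpy as np

def curvature_weights(v_i, normal_i, k1_dir, neighbors):
    """Neighbor weights for the convex-smoothing Laplacian step at vertex v_i.

    Each connection vector v_i -> v_j is projected into the tangent plane A_vi
    (perpendicular to the vertex normal); its weight is the absolute dot product
    of the normalized projection with the minimum-curvature direction k1_dir.
    The weights are rescaled to sum to 1.
    """
    n = normal_i / np.linalg.norm(normal_i)
    w = []
    for v_j in neighbors:
        d = v_j - v_i
        d_proj = d - (d @ n) * n                  # projection into tangent plane
        norm = np.linalg.norm(d_proj)
        d_hat = d_proj / norm if norm > 1e-12 else d_proj
        w.append(abs(d_hat @ k1_dir))             # parallelity with k_1 direction
    w = np.asarray(w)
    return w / w.sum()
```

Neighbors aligned with the minimum-curvature direction thus dominate the weighted Laplacian average, so the smoothing flattens the surface precisely along the direction that limits the offset distance.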

Palate Sphere Approximation
As discussed in the main document, the cleft palate area should be covered by a plate that imitates the round shape of a healthy palate [20].
In this section, we describe the sphere we predict on top of the cleft palate mesh data to approximate such a healthy palate. The four parameters of the sphere are derived via a Voronoi diagram, which can be used to find the locally largest empty spheres [21]. The vertices of the input mesh in the inner cleft palate area are offset by 2.5 mm to ensure that the required safety distance is kept everywhere. Afterwards, the Voronoi diagram is constructed on this larger set of constraining points, which yields a finite set of spheres, none of which penetrates the input mesh. We only consider those spheres that touch both the left and the right alveolar ridge and choose the optimal sphere that minimizes the overall distance to all points. Formally, we thus minimize the constrained least-squares objective

min_{c,r} Σ_{v ∈ S_o} (‖v − c‖ − r)²  subject to  ‖v − c‖ ≥ r ∀ v ∈ S_c,  min_{v ∈ S_l} ‖v − c‖ = min_{v ∈ S_r} ‖v − c‖ = r,   (2)

where the sphere parameters c, r denote the center and radius, respectively, v denotes a vertex of the input scan, S_c denotes the constraining set of all points the sphere must not penetrate, S_o ⊂ S_c the set of vertices in the cleft palate region and both alveolar ridges, and S_l, S_r ⊂ S_o the points on the left and right alveolar ridge, respectively. The approximation via largest empty spheres is crucial, as this problem is generally non-convex. Figure 3 visualizes the concept of this sphere prediction.

Fig. 3 We show our sphere prediction (gray) as the solution to our optimization problem defined in (2). The colored input scan, shown from the posterior, has the left and right ridge S_l, S_r marked in green, the area S_o we minimize our objective function on marked in blue and green, and the constraining area S_c marked by all added colors. The points marked in light blue are offset by 2.5 mm in order to maintain the required safety distance to the inner cleft palate tissue.
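The core of this search, i.e., enumerating the locally largest empty spheres via a Voronoi diagram and picking the one with the lowest least-squares cost over S_o, can be sketched as below. This simplified sketch omits the touch-both-ridges constraint on S_l and S_r; the function name is hypothetical.

```python
import numpy as np
from scipy.spatial import Voronoi

def best_empty_sphere(S_c, S_o):
    """Pick the locally largest empty sphere that best fits the target region.

    The Voronoi vertices of the constraint set S_c are centers of locally
    largest empty spheres; each sphere's radius is the distance from its
    center to the nearest constraint point, so no point of S_c lies inside.
    Among these candidates, return (center, radius) minimizing the objective
    sum_{v in S_o} (||v - c|| - r)^2.
    """
    vor = Voronoi(S_c)
    best, best_cost = None, np.inf
    for c in vor.vertices:
        r = np.min(np.linalg.norm(S_c - c, axis=1))       # empty-sphere radius
        cost = np.sum((np.linalg.norm(S_o - c, axis=1) - r) ** 2)
        if cost < best_cost:
            best, best_cost = (c, r), cost
    return best
```

Restricting the search to Voronoi vertices turns the non-convex continuous problem into a finite enumeration, at the cost of only finding spheres from this candidate set.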

Implementation Details
We provide details about the implementation of our pipeline and the corresponding GUI. The pipeline was implemented in Python 3.9 [22]. Among other packages, the code relies on Open3D [23] and trimesh [24] for geometry processing, PyTorch [25] for the landmark prediction, libigl [18] for the computation of the principal curvatures, and Menpo [26] for an efficient implementation of nonrigid iterative closest point (NICP) [27]. We adjusted the NICP implementation to ensure efficient Windows compatibility. Additionally, the whole pipeline only performs CPU computations to offer maximum flexibility towards different operating systems and end devices. The GUI, illustrated in Figure 4, is implemented in Blender 3.0 [28] and has been tested on several devices running Windows, Ubuntu, and macOS (Intel and Apple Silicon). All packages are installed automatically, keeping the user's installation effort minimal. The workflow with the GUI comprises importing a palatal scan, predicting the landmarks for this scan, optionally adjusting the default parameters if a different plate thickness is desired or if ventilation or stimulation elements should be added, and finally computing the plate. Online Resource 2 demonstrates this workflow in a video, where the pure computations, given a high-resolution intraoral scan as input, take less than 2 min on a 14-inch Apple MacBook Pro with an 8-core M1 chip and 16 GB RAM. The computation time for the landmark prediction and the subsequent surface registration may vary depending on the input size, and machines with a less powerful CPU may take approximately 3 min for the same computations. However, GPU utilization and further optimizations could be implemented to increase the performance.

Fig. 4 Captures of our Blender-based GUI. After automatically predicting the landmarks on a UCLP or BCLP mesh, the plate can be computed, with optional adjustments to the shape including the addition of stimulation elements and a ventilation hole.