Introduction

Point clouds have become easier to capture with laser scanners [1], LiDAR [2, 3], RGB-D scanners [4], stereo cameras [5], and similar devices. They have become the primary data format for representing the 3D world and can preserve the original geometric information of objects. They have garnered significant attention in various research fields, including virtual reality, robotics, autonomous driving, 3D games, and automatic plant phenotyping [6,7,8]. However, the raw point clouds obtained from these devices are predominantly sparse and only partially captured owing to occlusions, limited device resolution, transparency, shooting angles, and reflections. For objects with complicated morphology, characterized by significant variations in shape among their constituent parts, occlusions, and discontinuous surfaces, the issue of missing point cloud data is more pronounced than for regular objects. This is particularly noticeable in plant point cloud data [9]. Therefore, generating complete point clouds for 3D shapes with morphologically diverse structures is essential for addressing the challenge of missing point cloud data and advancing relevant research in the field [10,11,12,13].

Reconstructing an entire object from a partial or incomplete point cloud has recently become a popular research topic. Early approaches to point cloud completion used voxel localization and 3D convolution, adapting established 2D completion techniques to 3D point clouds. In the past few years, researchers have investigated deep learning methods for point cloud completion [14,15,16] that operate on different 3D representations, such as voxel grids [17], meshes [18, 19], and point clouds [20, 21]. The popularity of 3D analysis based on point clouds has surged following the success of PointNet++ [22], which enables direct processing of 3D coordinates. Encoder–decoder schemes, which have been employed in various pioneering works, have further advanced the field of point cloud completion.

By utilizing an encoder–decoder network, L-GAN [23] was the first to apply a deep learning framework to point cloud completion. Subsequently, PCN (Point Completion Network) [14], which focused on repairing incomplete point clouds, integrated the benefits of L-GAN [23] and FoldingNet [15]. In addition, RL-GAN-Net [42] presented a GAN with a reinforcement learning agent to shorten the prediction time of point cloud completion. These methods exhibit remarkable success in restoring the original shapes from incomplete point clouds.

Existing completion datasets are typically generated by sampling the shapes from 3D model datasets [25,26,27]. However, these datasets often assume that the input point cloud data consist of objects with regular shapes and continuous surfaces, such as cars, tables, and planes. This paper contends that this assumption may not always hold in real-world scenarios. Consider, for instance, the scanning of a plant in a scene, where the scanning devices inevitably capture a partial 3D shape characterized by diverse structures and discontinuous surfaces. In this context, point cloud completion confronts a practical challenge, as the partial point cloud to be completed inherently contains dissimilarly shaped structures and discontinuous surfaces, which severely impairs the completion performance. Consequently, the existing methods are generally ill-equipped to handle the completion of partial point clouds exhibiting such morphological diversity and discontinuities.

This paper presents SegCompletion, a novel deep neural network designed to address the challenge of completing point clouds from a partial 3D shape with diverse structures and discontinuous surfaces encountered in real-world scenarios. First, morphological segmentation is introduced before point cloud completion by deep hierarchical feature learning on point sets, so that the complex morphological structure is segmented into regular shapes and continuous surfaces. Second, HDBSCAN [28] is utilized to effectively cluster instances of point clouds belonging to the same feature type. Third, a multiscale generative network is employed to achieve sophisticated patching of missing point clouds based on feature points under the same geometric feature. To account for the variance in mean distances between patch centers and their closest neighbors, a simple yet effective uniform loss is utilized. Experimental results showed that SegCompletion successfully completed point clouds of 3D shapes with different morphologies, as seen in Fig. 1. In summary, the main contributions of this research are outlined as follows.

  • We propose SegCompletion, a deep neural network, to tackle point cloud completion from a partial 3D shape with differently shaped structures and discontinuous surfaces in real-world scenarios. The experimental results indicated that SegCompletion could deal with completion in actual circumstances, including diverse and challenging scenarios.

  • We suggest the incorporation of a uniform loss to alleviate the discrepancy in mean distances between the patch centers and their respective closest neighbors in the GAN (Generative Adversarial Network). This strategy effectively mitigates the problem of excessive concentration of generated points in the GAN framework by facilitating the separation of the generated points.

  • We construct a 3D point cloud dataset of cotton plants, the Cotton3D dataset, which comprises more than 724 high-quality partial and complete point clouds of cotton plants. Addressing the missing point cloud data of cotton plants can benefit research related to cotton plants.

Fig. 1 A partial illustration of SegCompletion results at a variety of different stages

This paper is structured as follows: “Related works” gives an introduction to the relevant work on point cloud segmentation and completion, “Method” presents the SegCompletion network and the loss function in more detail, “Experiments” describes the practical results of the implementation and the experiments on the public datasets, “Discussion” considers further applications of the network in cotton plants, and “Conclusions” concludes the research.

Related works

Deep learning techniques have been used extensively in 3D reconstruction and representation learning, promoting progress in 3D shape completion research, which can be classified into two main approaches [16, 29,30,31,32]. (1) Traditional 3D shape-completion methods [33,34,35,36] typically rely on hand-crafted features, such as surface flatness or symmetry axes, to estimate the absent parts of incomplete shapes. Furthermore, other methods [34, 37,38,39,40] utilize large, complete 3D shape datasets to search for similar patches and fill in the incomplete regions. (2) Leveraging the representation learning capacity of deep learning [24, 41,42,43], geometric features can be extracted from incomplete input shapes and then used to directly infer the complete shape. In contrast to traditional completion methods, these learnable approaches do not require predefined hand-crafted features, allowing them to effectively utilize the abundant shape information present in large-scale completion datasets [32]. Numerous experimental results [44] have shown that these methods perform well on point cloud datasets with regular shapes and continuous surfaces (e.g., cars, tables, and planes). However, these methods have not yet addressed the challenge of completing point clouds for objects with morphologically diverse structures and discontinuous surfaces. In this section, we primarily survey the research relevant to our work.

3D point cloud segmentation

3D point cloud segmentation is a basic problem in computer vision. Given data describing a 3D scene, such as 3D point clouds or color-depth (RGB-D) maps, its main task is to output a semantic label for each point in the scene. 3D point cloud semantic segmentation underpins higher-level artificial intelligence tasks, such as navigation planning for autonomous driving and grasping in industrial automation, and is a current research hotspot in 3D computer vision and deep learning [45]. Deep learning methods for 3D point cloud semantic segmentation can be classified into Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), Attention Networks, Transformers, and other networks [46].

DGCNN [47] was a Graph Neural Network (GNN)-based algorithm for 3D point cloud segmentation. It dynamically constructed a relationship graph between points to capture local features. GraNet then used a GNN to capture local and global features for semantic segmentation, although the computational complexity of the GNN could be high. KPConv [48] was an attention-based point cloud convolution method for 3D point cloud segmentation. It used a learnable convolution kernel to dynamically adjust the influence of each point to better capture local and global features, though it might have required more training data for optimal performance. TransPoint [49] was a Transformer-based method for 3D point cloud segmentation. Like Transformers in natural language processing, it used a multi-head self-attention mechanism to capture features in the point cloud. This approach performed well in point cloud segmentation, although it might also have required more computational resources. ShapeNet [50] was a deep learning-based approach for 3D point cloud segmentation. It utilized convolution and aggregation operations to process point cloud data for high-quality segmentation tasks, although it may have required high-quality, complete data.

PointNet++ [22], an improved version of PointNet [51], was able to better capture multiscale features, making the segmentation more accurate and comprehensive. Not only that, but it was also applicable to several types of point cloud data, including irregular, sparse, or dense point clouds. In addition, PointNet++ supported end-to-end training, simplifying the training process, and had high memory efficiency to manage large-scale point cloud data. PointNet++ performed well in semantic segmentation tasks, capturing fine-grained semantic information. Although other algorithms also had their merits, PointNet++, as a comprehensive and versatile approach, successfully solved many point cloud segmentation problems and demonstrated impressive performance in practical applications.

In this study, PointNet++ was employed to facilitate the distinction between different morphological features. Consequently, the complex morphological structure was effectively segmented into regular shapes and continuous surfaces.

3D point cloud clustering

Clustering algorithms are a category of unsupervised learning algorithms used to group samples in a dataset based on similarity. In clustering analysis, the measurement of similarity between samples is crucial. The objective of clustering algorithms is to maximize the similarity within the same cluster and minimize the similarity between different clusters. Over the past few decades, researchers have proposed a variety of clustering algorithms covering diverse methods and techniques [52].

The K-means clustering algorithm [53] assigned data points to the nearest cluster centroid and updated the centroids iteratively to minimize the within-cluster sum of squares. However, it was sensitive to the initial choice of centroids and required the number of clusters to be predefined. Hierarchical clustering [54] built a hierarchy of clusters by merging or splitting them based on a specified criterion. It offered flexibility, but it could be computationally expensive for large datasets and could produce unstable results. Spectral clustering [55] constructed a similarity graph and performed clustering on the eigenvectors of the graph Laplacian matrix. It captured complex structures but could be computationally demanding and required careful parameter tuning.

Density-based clustering algorithms were commonly used to group data points based on their density, with a minimum number of points within a specified distance considered a cluster. HDBSCAN [28], as a density-based clustering algorithm, offered several notable advantages. It was capable of effectively managing clusters of different shapes and sizes, making it suitable for a wide range of datasets. Additionally, HDBSCAN exhibited robustness in parameter selection, alleviating the need for manual tuning and providing reliable results. These characteristics collectively contributed to HDBSCAN's significance as a valuable clustering algorithm in various domains.

3D shape completion based on GAN

Taking inspiration from the achievements of GANs [56] in 2D tasks such as image repair and processing [57], researchers have made significant advancements in point cloud completion using the traditional GANs, leading to remarkable successes [41].

L-GAN [58] introduced a deep generative network for point cloud completion, pioneering its application in this field. However, its architecture was not tailored to shape-completion tasks, resulting in subpar performance. FoldingNet [15] proposed a unique decoding operation called folding, which enables mapping from 2D to 3D. Subsequently, PCN [14] developed a learning-based architecture specifically focused on addressing the challenge of shape completion. PCN employed folding to approximate a smooth surface that completed the shape. A more recent GAN-based network, RL-GAN-Net [42], utilized a reinforcement learning agent to enable real-time point cloud completion. The RL agent [59] was incorporated to simplify the optimization process and accelerate prediction, but it did not aim to improve the accuracy of the predicted points. PF-Net [24] was designed to handle partial point cloud input and generate the missing part of the point cloud as output, rather than the whole object. It employed a multiresolution encoder for extracting point cloud features, a PPD (Point Pyramid Decoder) for constructing point clouds, and a GAN discriminator for network refinement. However, the GAN faced the issue of producing point clouds that exhibited excessive concentration.

In this paper, PF-Net [24] was adopted to achieve accurate patching of missing point clouds based on geometric features, as it has demonstrated effective results for various missing rates and multiple missing locations. To ensure surface smoothness and encourage planar representations, a uniform loss [60] was incorporated in the discriminator within the GAN. Specifically, the average distance between the center of each patch and its nearest neighbors was evaluated, and discrepancies among patches were penalized. These modifications contributed to enhancing the quality of the completed point clouds generated by PF-Net.

Method

A graphical representation of the proposed method is presented in Fig. 2. SegCompletion comprises two primary modules: (1) the segmentation and clustering network (Fig. 2a), which is responsible for segmenting and extracting various parts of the morphology, and (2) the point cloud completion network (Fig. 2b), which focuses on filling in missing point clouds and enforcing shape uniformity. Each module is explained in detail in the following subsections.

Fig. 2 The detailed structure of SegCompletion. It consists of two parts: (a) the segmentation and clustering network, which uses PointNet++ to distinguish between the distinctive features of the morphology, after which each instance of a point cloud belonging to the same category of objects is identified using the clustering scheme based on HDBSCAN; (b) the point cloud completion network, in which a uniform loss efficiently penalizes the discrepancy between the average distances of patch centers and their closest neighbors in the PF-Net network

Segmentation and clustering network

The segmentation and clustering network was designed based on PointNet++ [22] in this paper. Given an input point cloud set \(P = \{ x_{1} ,x_{2} , \ldots ,x_{n} \}\) with \(x_{i} \in {\mathbb{R}}^{d}\) and a destination point cloud set \(P^{\prime} = \{ x^{\prime}_{1} ,x^{\prime}_{2} , \ldots ,x^{\prime}_{n} \}\), a set function \(f:\chi \to {\mathbb{R}}\) was defined as in Eq. (1), where \(h\) is a function applied to each point independently, \(\gamma\) is a continuous function, and MAX is a vector max operator that takes \(n\) vectors as input and returns a new vector of their element-wise maximum

$$ f(x_{1} ,x_{2} , \ldots x_{n} ) = \gamma \left( {\mathop {{\text{MAX}}}\limits_{i = 1, \ldots ,n} \left\{ {h\left( {x_{i} } \right)} \right\}} \right). $$
(1)
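As an illustration only, Eq. (1) can be sketched in plain Python. The functions `h` and `gamma` below are hypothetical stand-ins for the learned networks, not the trained PointNet++ layers; the point of the sketch is that the element-wise maximum makes the output independent of the order of the input points.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def set_function(points: Sequence[Vector],
                 h: Callable[[Vector], Vector],
                 gamma: Callable[[Vector], float]) -> float:
    """Evaluate f(x_1, ..., x_n) = gamma(MAX_i h(x_i)) as in Eq. (1)."""
    features = [h(x) for x in points]                     # per-point features
    pooled = [max(column) for column in zip(*features)]   # element-wise MAX
    return gamma(pooled)


# Hypothetical stand-ins for the learned functions (assumptions for this sketch).
def h(x: Vector) -> Vector:
    return [x[0] + x[1], x[1] * x[2], abs(x[0] - x[2])]


def gamma(v: Vector) -> float:
    return sum(v)


cloud = [[0.0, 1.0, 2.0], [1.0, 0.5, 0.0], [2.0, 2.0, 1.0]]
value = set_function(cloud, h, gamma)
value_reordered = set_function(list(reversed(cloud)), h, gamma)
```

Reordering the points leaves the value unchanged, which is the property that lets such a function operate on unordered point sets.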

FPS (farthest point sampling) was applied to select a subset of points \(\{ x_{i}^{1} ,x_{i}^{2} , \ldots x_{i}^{m} \}\) such that \(x_{i}^{j}\) was, among the remaining points, the point most distant in metric distance from the already selected set \(\{ x_{i}^{1} ,x_{i}^{2} , \ldots x_{i}^{j - 1} \}\). The input to the set abstraction layer \(l\) was a point set of size \(N_{l} \times \left( {d + C} \right)\), that is, \(N_{l}\) points with \(d\)-dim coordinates and \(C\)-dim point features. The output was a matrix of size \(N^{\prime}_{l} \times \left( {d + C^{\prime}} \right)\) containing \(N^{\prime}_{l}\) subsampled points, each with \(d\)-dim coordinates and a \(C^{\prime}\)-dim feature vector that summarized the local context, together with the coordinates of a set of centroids of size \(N^{\prime}_{l} \times d\). Within each layer \(l\), the points were organized into \(N^{\prime}_{l}\) localized groups with a data size of \(N^{\prime}_{l} \times K \times \left( {d + C} \right)\), where each group corresponded to a local region and \(K\) was the number of points in the vicinity of the centroid. Segmentation of the set was accomplished by propagating point features from one set to another, known as point feature propagation

$$ \begin{aligned} & f^{\left( j \right)} \left( x \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{k} w_{i} \left( x \right)f_{i}^{\left( j \right)} }}{{\mathop \sum \nolimits_{i = 1}^{k} w_{i} \left( x \right)}}\quad {\text{where}}\\ & w_{i} \left( x \right) = \frac{1}{{d\left( {x, x_{i} } \right)^{p} }}\quad j = 1, \ldots ,C. \end{aligned} $$
(2)
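The interpolation in Eq. (2) can be sketched as follows. This is a minimal illustration rather than the PointNet++ implementation: the function name, the default values of `k` and `p`, and the small constant `eps` (added to avoid division by zero when a query coincides with a known point) are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def interpolate_features(query: Sequence[float],
                         known_points: Sequence[Sequence[float]],
                         known_features: Sequence[Sequence[float]],
                         k: int = 3, p: int = 2,
                         eps: float = 1e-8) -> List[float]:
    """Inverse-distance-weighted average of the features of the k nearest points."""
    dists = [math.dist(query, x) for x in known_points]
    nearest = sorted(range(len(known_points)), key=lambda i: dists[i])[:k]
    # w_i(x) = 1 / d(x, x_i)^p; eps is an assumption that guards d = 0.
    weights = [1.0 / (dists[i] ** p + eps) for i in nearest]
    total = sum(weights)
    channels = len(known_features[0])
    return [
        sum(w * known_features[i][c] for w, i in zip(weights, nearest)) / total
        for c in range(channels)
    ]
```

A query point midway between two known points receives the average of their features, while a query that coincides with a known point receives (almost exactly) that point's feature.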

To avoid the problem of misclassification in morphologically diverse structures, the HDBSCAN method [28] was added to the PointNet++ model in this paper to find the dense regions of sample points. The target point cloud \(P^{\prime} = \{ x^{\prime}_{1} ,x^{\prime}_{2} , \ldots x^{\prime}_{n} \}\) was processed in the following series of steps.

Transform the space according to the density/sparsity.

As an inexpensive estimate of density, the distance to the kth nearest neighbor (KNN) was used, and \({\text{core}}_{k} \left( x^{\prime} \right)\) denoted the core distance from the current point \(x^{\prime}\) to its kth closest point \(N^{k} \left( {x^{\prime}} \right)\)

$$ {\text{core}}_{k} \left( {x^{\prime}} \right) = d\left( {x^{\prime}, N^{k} \left( {x^{\prime}} \right)} \right). $$
(3)

A new distance metric, the mutual reachability distance, was developed to disperse points that had a low density and a high core distance

$$ d_{{{\text{mreach}} - k}} \left( {x^{\prime}_{i} ,x^{\prime}_{j} } \right) = {\text{max}}\left\{ {{\text{core}}_{k} \left( {x^{\prime}_{i} } \right),{\text{core}}_{k} \left( {x^{\prime}_{j} } \right),d\left( {x^{\prime}_{i} ,x^{\prime}_{j} } \right)} \right\}, $$
(4)

where \(d\left( {x^{\prime}_{i} ,x^{\prime}_{j} } \right)\) was the original metric distance between \(x^{\prime}_{i}\) and \(x^{\prime}_{j}\). This metric ensured that dense points, characterized by a low core distance, maintained their relative proximity to each other. In contrast, it forced sparser points to be separated by at least their core distance from any other point, effectively pushing them further apart.
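Equations (3) and (4) can be illustrated with a brute-force sketch. The function names are hypothetical, and the convention that a point is not counted among its own neighbors is an assumption of this sketch (implementations differ on this detail).

```python
import math
from typing import Sequence

Point = Sequence[float]


def core_distance(points: Sequence[Point], index: int, k: int) -> float:
    """Distance from points[index] to its k-th nearest other point, as in Eq. (3)."""
    dists = sorted(
        math.dist(points[index], q) for j, q in enumerate(points) if j != index
    )
    return dists[k - 1]


def mutual_reachability(points: Sequence[Point], i: int, j: int, k: int) -> float:
    """Mutual reachability distance between points i and j, as in Eq. (4)."""
    return max(
        core_distance(points, i, k),
        core_distance(points, j, k),
        math.dist(points[i], points[j]),
    )


# Three nearby points and one isolated point on a line (illustrative values).
line = [(0.0,), (1.0,), (2.0,), (10.0,)]
dense_pair = mutual_reachability(line, 0, 1, k=2)   # both points lie in the dense part
sparse_pair = mutual_reachability(line, 2, 3, k=2)  # involves the isolated point
```

In this toy example the isolated point is 8 units from its nearest neighbor, but its mutual reachability distance to that neighbor is 9, its core distance for k = 2, which shows how sparse points are pushed further away.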

Build the minimum spanning tree of the distance-weighted graph

The Minimum Spanning Tree (MST) of the distance-weighted graph was then constructed. The dataset was treated as a weighted graph, where each data point represented a vertex, and the weights of the edges between data points signified their mutual reachability distance. To begin, a threshold value was established to govern edge selection. Edges with weights exceeding this threshold were excluded from the graph, ensuring that only the most relevant connections were retained for further analysis

$$ E\left( G \right) = \left\{ {\left( {x_{i} ,x_{j} } \right)|d\left( {x_{i} ,x_{j} } \right) \le {\text{threshold}}} \right\}. $$
(5)

Here, \(E\left( G \right)\) represented the set of edges, \(x_{i}\) and \(x_{j}\) were vertices in graph \(G\), \(d\left( {x_{i} ,x_{j} } \right)\) represented the distance between vertices \(x_{i}\) and \(x_{j}\), and threshold was the predefined threshold value.

Subsequently, the Prim algorithm, a classical method for finding the MST of a weighted graph, was applied. This algorithm efficiently and systematically identified the edges that constituted the Minimum Spanning Tree while avoiding cycles and redundancies.
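A compact sketch of Prim's algorithm on a dense weight matrix is given below for illustration. It assumes a symmetric matrix of mutual reachability distances; the function name and the example matrix are hypothetical, and the thresholding of Eq. (5) is not applied here.

```python
import math
from typing import List, Tuple


def prim_mst(weights: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Return the MST edges (u, v, w) of a symmetric weight matrix."""
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge connecting each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0           # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax the edges leaving the newly added vertex.
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges


# Illustrative mutual reachability matrix for four points.
W = [
    [0.0, 2.0, 2.0, 10.0],
    [2.0, 0.0, 2.0, 9.0],
    [2.0, 2.0, 0.0, 9.0],
    [10.0, 9.0, 9.0, 0.0],
]
mst = prim_mst(W)
```

Because a vertex is only ever connected through a vertex already in the tree, the result contains no cycles and has one edge fewer than the number of vertices.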

Build the cluster hierarchy

To convert the minimum spanning tree into a hierarchy of connected components, the edges of the tree needed to be sorted in ascending order based on their distances. As the edges were traversed, it was important to identify the two clusters that would be joined together by each edge. This was achieved by employing a union-find data structure, which allowed for the management of cluster connectivity. With this approach, the merging of clusters was accurately determined as the edges of the tree were traversed.

Let \(G\) be the weighted graph representing the data points, where \(V\) represents the set of vertices (data points) and E represents the set of edges with their corresponding weights. The union-find data structure UF is utilized to maintain the disjoint sets representing clusters.

For each edge \(\left( {u, v} \right)\) in ascending order of edge weights \(w\left( {u, v} \right)\):

  1. If \({\text{UF}}.{\text{Find}}\left( u \right) \ne {\text{UF}}.{\text{Find}}\left( v \right)\), merge the clusters \({\text{UF}}.{\text{Find}}\left( u \right)\) and \({\text{UF}}.{\text{Find}}\left( v \right)\).

  2. Update the cluster hierarchy to reflect the merging of clusters.
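The traversal above can be sketched with a small union-find structure. The class and function names are hypothetical, and recording each merge as a pair (edge weight, merged members) is only one possible way to represent the hierarchy.

```python
from typing import Dict, List, Tuple


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> int:
        """Merge the sets containing a and b and return the new root."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
        return ra


def build_hierarchy(n_points: int,
                    mst_edges: List[Tuple[int, int, float]]
                    ) -> List[Tuple[float, List[int]]]:
    """Traverse MST edges by ascending weight and record every merge."""
    uf = UnionFind(n_points)
    members: Dict[int, List[int]] = {i: [i] for i in range(n_points)}
    merges = []
    for u, v, w in sorted(mst_edges, key=lambda e: e[2]):
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:                       # step 1: the edge joins two clusters
            merged = members.pop(ru) + members.pop(rv)
            root = uf.union(ru, rv)
            members[root] = merged
            merges.append((w, sorted(merged)))   # step 2: update the hierarchy
    return merges


hierarchy = build_hierarchy(4, [(1, 3, 9.0), (0, 1, 2.0), (0, 2, 2.0)])
```

In this toy example the three close points merge at weight 2, and the isolated point joins only at weight 9, which is the behavior the hierarchy is meant to expose.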

Condense the cluster hierarchy based on the minimum cluster size

To determine which clusters should be split and which should remain intact, a minimum cluster size setting was introduced in this study. This method involves traversing the cluster hierarchy while providing precise control over cluster behavior. Specifically, a threshold for the minimum cluster size, denoted as \({\text{minSize}}\), was defined. During the traversal of the cluster hierarchy, the following process was implemented:

(1) If a split resulted in the creation of new clusters with a size smaller than \({\text{minSize}}\), the larger cluster retained its integrity, and the data points separated from it were marked, with their corresponding distance values recorded

$$ {\text{if}} \left( {\left| {C_{i} } \right| < {\text{minSize}}} \right), C_{j} = C_{j} \cup C_{i} , C_{i} = \emptyset , $$
(6)

where \(C_{i}\) represents a cluster with a size less than \({\text{minSize}}\), and \(C_{j}\) is a larger cluster.

(2) Conversely, if a split led to the formation of two new clusters, both of which were no smaller than \({\text{minSize}}\), the split was allowed to proceed.

Through this process, a more streamlined hierarchy was obtained, providing insights into how cluster sizes decrease with varying distances.

Extract the clusters

To consider the persistence of clusters, a measure different from distance, \(\lambda = \frac{1}{{{\text{distance}}}}\), was used. For each cluster, the stability was computed as

$$ S_{{{\text{cluster}}}} = \mathop \sum \limits_{{p \in {\text{cluster}}}} \left( {\lambda_{p} - \lambda_{{{\text{birth}}}} } \right). $$
(7)

The lambda values \(\lambda_{{{\text{birth}}}}\) and \(\lambda_{{{\text{death}}}}\) denoted the instances when a cluster had split off and formed its own separate cluster and when a cluster had divided into smaller clusters, respectively. Each point \(p\) within a cluster had an associated lambda value \(\lambda_{p}\), representing the moment when the point “exited the cluster”. This transition occurred between \(\lambda_{{{\text{birth}}}}\) and \(\lambda_{{{\text{death}}}}\), as the point either left the cluster during its existence or departed when the cluster underwent further subdivision into smaller clusters.

In the reverse topological order traversal of the tree, all leaf nodes were considered as individual clusters. The stability of each cluster was determined based on the sum of the stabilities of its child clusters. If the sum of the child stabilities surpassed the stability of the cluster, the cluster’s stability was updated to the sum of the child stabilities. Conversely, if the cluster's stability was higher than the sum of its children, the cluster was marked as selected, and all its descendant clusters were unselected. This process continued until the root node was reached. The resulting set of selected clusters at this stage represented the flat clustering, which was then returned as the final outcome.
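The stability of Eq. (7) and the bottom-up selection described above can be sketched as follows. The tree encoding (a dictionary from a cluster id to its child ids), the function names, and the example stabilities are hypothetical; the sketch also lets the root itself be selected, which is an assumption, since implementations often exclude the root.

```python
from typing import Dict, List, Sequence, Set


def cluster_stability(lambda_points: Sequence[float], lambda_birth: float) -> float:
    """Eq. (7): sum over points of (lambda_p - lambda_birth)."""
    return sum(lp - lambda_birth for lp in lambda_points)


def select_clusters(children: Dict[int, List[int]],
                    stability: Dict[int, float],
                    root: int) -> Set[int]:
    """Bottom-up selection of a flat clustering from the condensed tree."""
    selected: Set[int] = set()
    best = dict(stability)   # stability propagated upward from the children

    def visit(node: int) -> None:
        kids = children.get(node, [])
        if not kids:                      # leaves start as selected clusters
            selected.add(node)
            return
        for kid in kids:
            visit(kid)
        child_sum = sum(best[kid] for kid in kids)
        if child_sum > stability[node]:
            best[node] = child_sum        # keep the children, pass their sum up
        else:
            stack = list(kids)            # select the parent, unselect descendants
            while stack:
                d = stack.pop()
                selected.discard(d)
                stack.extend(children.get(d, []))
            selected.add(node)

    visit(root)
    return selected


tree = {0: [1, 2], 1: [3, 4]}
stab = {0: 1.0, 1: 5.0, 2: 2.0, 3: 1.5, 4: 2.0}
flat = select_clusters(tree, stab, root=0)
```

Here cluster 1 is more stable (5.0) than its two children combined (3.5), so it replaces them, while the root is less stable than the sum of its children and is therefore not selected.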

Point cloud completion network

This research utilized PF-Net [24] for point cloud completion, which was made up of two deep networks, namely a discriminator \(D\) and a generator \(G\). The generator \(G\) produced artificial examples, while the discriminator \(D\) attempted to differentiate between real and fake samples from the entire dataset. The first-order approximation of multilayer graph convolution with Chebyshev expansions was introduced as follows:

$$ p_{i}^{l + 1} = \sigma \left( {Y_{i}^{l + 1} + \mathop \sum \limits_{{q_{j} \in A\left( {p_{i}^{l} } \right)}} U^{l} q_{j}^{l} + b^{l} } \right), $$
(8)

where \(p_{i}^{l}\) was the \(i\)th node at the \(l\)th layer of the completion network, \(q_{j}^{l}\) was the \(j\)th neighbor of \(p_{i}^{l}\), and the set \(A\left( {p_{i}^{l} } \right)\) consisted of all ancestors of \(p_{i}^{l}\). In the training process, GCNs determined the most suitable weights \(U^{l}\) and bias \(b^{l}\) at each layer and used these parameters to generate 3D coordinates for point clouds, thus ensuring their resemblance to real point clouds. \(Y_{i}^{l + 1}\) was produced with \(K\) supports from \(F_{K}^{l}\), where \(F_{K}^{l}\) was a fully connected layer containing \(K\) nodes. \(N\left( {p_{i}^{l} } \right)\) was the set of all neighbors of \(p_{i}^{l}\). \(\sigma ( \cdot )\) was the activation unit. The loop term based on \(K\) supports was defined as follows:

$$ Y_{i}^{l + 1} = F_{K}^{l} \left( {p_{i}^{l} } \right). $$
(9)

Training loss

Regularization of the completion shape was achieved by the complete ground-truth point cloud, with the help of uniform loss and CD (Chamfer Distance) [61]. The CD, being the most widely implemented structural loss for shape completion, was comparatively insensitive to details and density distribution [17, 62]. A uniform loss could be employed to rectify the problem of point cloud generation being unevenly distributed. Equation (10) defined the loss function of a discriminator \({\mathcal{L}}\)

$$ {\mathcal{L}} = {\mathcal{L}}_{{{\text{com}}}} + {\mathcal{L}}_{{{\text{uniform}}}} . $$
(10)

Uniform loss: To resolve the problem of irregularly distributed point cloud creation, a uniform loss [63] that should improve the generator's generative ability was used. The uniform loss \({\mathcal{L}}_{{{\text{uniform}}}}\) was defined as

$$ {\mathcal{L}}_{{{\text{uniform}}}} = {\text{Var}}\left( {\left\{ {\rho_{j} } \right\}_{j = 1}^{n} } \right),\quad \rho_{j} = \frac{1}{k}\mathop \sum \limits_{i = 1}^{k} {\text{dist}}^{2} \left( {y_{i} , y_{j} } \right). $$
(11)

Specifically, the paper selected \(n\) seed positions on the object surface using the FPS method, and small patches were then formed by incorporating the \(k\)-nearest neighbors of every seed. These small patches exhibited similar scattering, regardless of whether the structure was fine or coarse. Thus, the average distance from each seed to its \(k\)-nearest neighbors was calculated, and the variance of the average distances of all patches was penalized as expressed in Eq. (11).
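A minimal sketch of this computation is given below, under stated assumptions: FPS starts from a fixed index rather than a random one, the seed itself is excluded from its own neighborhood, and the population variance is used. The function names are hypothetical and the code is not the training implementation.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def farthest_point_sampling(points: Sequence[Point], m: int, start: int = 0) -> List[int]:
    """Greedily pick m indices, each farthest from the already chosen set."""
    chosen = [start]
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def uniform_loss(points: Sequence[Point], n_seeds: int, k: int) -> float:
    """Variance of rho_j, the mean squared distance from seed j to its k neighbors."""
    seeds = farthest_point_sampling(points, n_seeds)
    rhos = []
    for s in seeds:
        sq = sorted(
            math.dist(points[s], q) ** 2 for j, q in enumerate(points) if j != s
        )
        rhos.append(sum(sq[:k]) / k)
    mean = sum(rhos) / len(rhos)
    return sum((r - mean) ** 2 for r in rhos) / len(rhos)


# Evenly spaced points on a circle versus a cloud with one crowded region.
ring = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8)) for i in range(8)]
clumped = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.1, 0.0)]
loss_ring = uniform_loss(ring, n_seeds=4, k=2)
loss_clumped = uniform_loss(clumped, n_seeds=2, k=1)
```

The evenly spaced ring yields a loss close to zero, whereas the cloud with a crowded region yields a clearly positive loss, which is the behavior the uniform loss is intended to penalize.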

Chamfer distance: The CD does not qualify as a distance function due to its deviation from the triangle inequality [61]. In spite of this, the phrase “distance” was utilized to signify any nonnegative function that is established on pairs of point sets. The CD looked for the nearest point \(y\) in the other set \(\hat{Y}\) for each point \(x\) in point set \(Y\), computed the squared distances, and repeated the process in the opposite direction. The CD was seen as a continuous and piecewise smooth function when viewed in terms of the point locations in \(Y\) and \(\hat{Y}\). Each point's range search was executed in parallel, as they were independent of each other. The \({\mathcal{L}}_{CD}\) between \(Y\) and \( \hat{Y}\) was a metric used to calculate the separation between them

$$ {\mathcal{L}}_{{{\text{CD}}}} = {\text{CD}}\left( {Y, \hat{Y}} \right) = \mathop \sum \limits_{x \in Y} \mathop {\min }\limits_{{y \in \hat{Y}}} \| x - y\|_{2}^{2} + \mathop \sum \limits_{{y \in \hat{Y}}} \mathop {\min }\limits_{x \in Y} \|x - y\|_{2}^{2} . $$
(12)
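For illustration, Eq. (12) can be written as a brute-force computation. This sketch uses the unnormalized sums exactly as in Eq. (12); the function name is hypothetical, and practical implementations usually vectorize the nearest-neighbor search.

```python
import math
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(Y: Sequence[Point], Y_hat: Sequence[Point]) -> float:
    """Sum of squared nearest-neighbor distances in both directions, as in Eq. (12)."""

    def one_way(A: Sequence[Point], B: Sequence[Point]) -> float:
        # For every point in A, squared distance to its nearest point in B.
        return sum(min(math.dist(a, b) ** 2 for b in B) for a in A)

    return one_way(Y, Y_hat) + one_way(Y_hat, Y)
```

For example, with \(Y = \{(0,0,0), (1,0,0)\}\) and \(\hat{Y} = \{(0,0,0)\}\), the first term contributes 0 + 1 and the second term contributes 0, so the CD equals 1.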

The multistage completion loss, which followed the point pyramid decoder's prediction of three layers at different resolutions, was stated in Eq. (13) through three terms, \(d_{{{\text{CD}}1}}\), \(d_{{{\text{CD}}2}}\), and \(d_{{{\text{CD}}3}}\), weighted by hyperparameter \(\alpha\). The first term \(d_{{{\text{CD}}1}}\) computed the squared distance between the points of the first layer \(Y_{1}\) and the ground-truth points of the missing region \(\hat{Y}_{1}\). The second term \(d_{{{\text{CD}}2}}\) computed the squared distance between the points of the second layer \(Y_{2}\) and the ground-truth points \(\hat{Y}_{2}\) of the missing region. The third term \(d_{{{\text{CD}}3}}\) computed the squared distance between the points of the third layer \(Y_{3}\) and the ground-truth points \(\hat{Y}_{3}\) of the missing region. We obtained \(\hat{Y}_{2}\) and \(\hat{Y}_{3}\) from \(\hat{Y}_{1}\) by applying FPS. The multistage completion loss increased the proportion of feature points, resulting in a more precise focus on them

$$ {\mathcal{L}}_{{{\text{com}}}} = d_{{{\text{CD}}1}} \left( {Y_{1} , \hat{Y}_{1} } \right) + \alpha d_{{{\text{CD}}2}} \left( {Y_{2} ,\hat{Y}_{2} } \right) + 2\alpha d_{{{\text{CD}}3}} \left( {Y_{3} , \hat{Y}_{3} } \right). $$
(13)
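The weighting in Eq. (13) can be sketched as follows, with each \(d_{\text{CD}}\) term computed as in Eq. (12). The function names and the toy single-point inputs are hypothetical, and the value of `alpha` used in the example is an arbitrary choice for illustration, not a setting reported in this paper.

```python
import math
from typing import Sequence

Point = Sequence[float]


def chamfer(Y: Sequence[Point], Y_hat: Sequence[Point]) -> float:
    """Bidirectional sum of squared nearest-neighbor distances, as in Eq. (12)."""
    forward = sum(min(math.dist(x, y) ** 2 for y in Y_hat) for x in Y)
    backward = sum(min(math.dist(x, y) ** 2 for x in Y) for y in Y_hat)
    return forward + backward


def completion_loss(preds: Sequence[Sequence[Point]],
                    targets: Sequence[Sequence[Point]],
                    alpha: float) -> float:
    """L_com = d_CD1 + alpha * d_CD2 + 2 * alpha * d_CD3, as in Eq. (13)."""
    d1, d2, d3 = (chamfer(p, t) for p, t in zip(preds, targets))
    return d1 + alpha * d2 + 2 * alpha * d3


# Toy single-point "layers" (illustrative values only).
preds = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
targets = [[(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], [(0.0, 2.0, 0.0)]]
loss = completion_loss(preds, targets, alpha=0.5)
```

With these inputs the three terms are 2, 0, and 8, so the loss is 2 + 0.5 · 0 + 2 · 0.5 · 8 = 10.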

Cross-entropy loss: During training, the point cloud segmentation and clustering network was optimized by minimizing the cross-entropy loss

$$ H\left( {g,p} \right) = - \mathop \sum \limits_{i = 1}^{n} g\left( {x_{i} } \right)\log \left( {p\left( {x_{i} } \right)} \right), $$
(14)

where \(H\left( {g,p} \right)\) was used to measure the discrepancy between the true probability distribution \(g\left( {x_{i} } \right)\) of the points and the probability distribution \(p\left( {x_{i} } \right)\) predicted by the model. A lower value of \(H\left( {g,p} \right)\) indicated better performance in model predictions. Here, \(n\) represented the number of points.
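Equation (14) can be sketched as follows, assuming that \(g(x_{i})\) and \(p(x_{i})\) are per-point distributions over the segmentation classes, so that the product in Eq. (14) is summed over classes for each point. This reading, the function name, and the constant `eps` (which guards against taking the logarithm of zero) are assumptions of the sketch.

```python
import math
from typing import Sequence


def cross_entropy(true_dists: Sequence[Sequence[float]],
                  pred_dists: Sequence[Sequence[float]],
                  eps: float = 1e-12) -> float:
    """H(g, p) = -sum_i g(x_i) log p(x_i), summed over the classes of each point."""
    total = 0.0
    for g, p in zip(true_dists, pred_dists):
        total -= sum(gc * math.log(pc + eps) for gc, pc in zip(g, p))
    return total
```

For a single point whose true class receives a predicted probability of 0.5, the loss is \(\log 2 \approx 0.693\), and it approaches zero as the predicted probability of the true class approaches 1.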

Experiments

To facilitate a comprehensive evaluation, this paper used the benchmark datasets ShapeNet and Pheno4D. (1) ShapeNet [26]: the CAD (Computer Aided Design) dataset obtained from PCN consists of 30,974 3D models across 8 categories. The ground-truth point clouds were composed of 16,384 points evenly spread over the surfaces. (2) Pheno4D [64]: the dataset comprised 7 tomato plants that were measured over a span of 20 days, resulting in a total of approximately 350 million points. This dataset consisted of 140 point clouds, out of which 77 point clouds (with 200 million points) were annotated with labels. It is important to note that temporally consistent labels were provided for each point within the point clouds.

This paper compared SegCompletion with several representative methods that directly operated on 3D point clouds, namely FoldingNet [15], GRNet [65], PF-Net [24], PMP-Net [20], PMP-Net++ [32], DeCo [66], and SnowflakeNet [67]. These existing methods were originally evaluated on different datasets, whereas we trained and tested all of them on the same dataset for quantitative evaluation. To align with previous studies, we used the per-point L1 Chamfer Distance [14] and the per-point L2 Chamfer Distance [67] on the testing set for evaluation.

Detailed settings

PointNet++ and HDBSCAN were employed as the foundational framework for the segmentation and clustering network. The segmentation results were notably influenced by the number of output channels of the multi-layer perceptrons (MLPs) in both the encoder and the decoder. A detailed description of the architecture for each component is provided in Table 1.

Table 1 Detailed structure of the encoder and decoder

HDBSCAN was employed to effectively segment each point cloud instance that was associated with the same object category. According to the shape of the object, we selected the same parameter value, \({\text{min}}\_{\text{cluster}}\_{\text{size}} = 400\), for the ShapeNet, Pheno4D, and Cotton3D datasets (Online Appendix Tables 1–4).

The point cloud completion network was implemented in PyTorch. The Adam optimizer was employed to alternately train all the building blocks with a batch size of 16, a learning rate of 0.001, and 100 epochs. Training was carried out on 4 NVIDIA GeForce RTX 2080 Ti GPUs with CUDA 11.6, Ubuntu 22.04.1 LTS, and Python 3.7.

Point cloud completion on the Pheno4D dataset

Point cloud preprocessing

Down-sampling and up-sampling: The experiments demonstrated the effective training of SegCompletion on sparse point clouds. As part of the preprocessing, the training of SegCompletion on sparse shapes with 1024 points was achieved by down-sampling and up-sampling techniques. To account for varying numbers of points, shapes with more than 1024 points were reduced to 1024 using the FPS method. Conversely, shapes with fewer than 1024 points were augmented by randomly replicating the points at different scales to reach the same number.
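The resampling step can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' implementation: the helper names (`sq_dist`, `farthest_point_sampling`, `resample`) are hypothetical, the starting point of FPS is chosen at random, and up-sampling is simplified to plain random duplication, whereas the text describes replication at different scales.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: repeatedly add the point farthest from those already kept."""
    rng = random.Random(seed)
    first = rng.randrange(len(points))  # arbitrary starting point
    chosen = [first]
    # Squared distance from each point to its nearest chosen point so far.
    nearest = [sq_dist(p, points[first]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        nearest = [min(d, sq_dist(p, points[idx])) for d, p in zip(nearest, points)]
    return [points[i] for i in chosen]


def resample(points, target=1024, seed=0):
    """Return exactly `target` points: FPS if too many, duplication if too few."""
    if len(points) >= target:
        return farthest_point_sampling(points, target, seed)
    rng = random.Random(seed)
    # Simplifying assumption: pad by duplicating randomly chosen points.
    extra = [rng.choice(points) for _ in range(target - len(points))]
    return list(points) + extra
```

Calling `resample(cloud)` on a list of coordinate tuples returns 1024 points in either case, so every shape enters the network at the same size.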

Point cloud labels: Supervised deep learning techniques necessitated labeled data for training the networks. In this study, the segmentation of tomato plant point clouds for annotation was conducted using CloudCompare software, a point cloud visualization tool. A binary classification approach was adopted, where one color was assigned to represent the tomato leaves, while another color represented the other organs of the tomato plant. A total of 220 plants at the tomato seedling stage were labeled, with 200 of them allocated for training, 12 for validation, and 8 for testing. The ratio between these three subsets was 50:3:2.

Incomplete point cloud: Since SegCompletion focused on point movement rather than generation, the incomplete and complete point clouds needed to have an equal number of points. The point cloud data were centered at the origin and normalized so that all coordinates lay in the range [−1, 1]. The ground-truth point cloud data were created by uniformly sampling 1024 points from each shape. To generate incomplete point cloud data, a central point was randomly chosen from multiple viewpoints, and the points within a certain distance of it were removed from the complete data.
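A minimal sketch of this preparation is given below, under stated assumptions. The viewpoint list `VIEWPOINTS`, the radius value, and the function names are hypothetical and chosen only for illustration; the paper does not list the viewpoints or the distance threshold in this section.

```python
import random

# Hypothetical candidate viewpoints; the actual set is not given in the text.
VIEWPOINTS = [(1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0), (-1, 1, 0)]


def normalize(points):
    """Center the cloud at the origin and scale all coordinates into [-1, 1]."""
    n = len(points)
    centroid = [sum(p[d] for p in points) / n for d in range(3)]
    centered = [[p[d] - centroid[d] for d in range(3)] for p in points]
    # Largest absolute coordinate; fall back to 1.0 for a degenerate cloud.
    scale = max(abs(c) for p in centered for c in p) or 1.0
    return [[c / scale for c in p] for p in centered]


def crop(points, center, radius):
    """Split a complete cloud into (partial, missing) by distance to `center`."""
    partial, missing = [], []
    for p in points:
        d2 = sum((a - b) ** 2 for a, b in zip(p, center))
        (missing if d2 <= radius ** 2 else partial).append(p)
    return partial, missing


def make_incomplete(points, radius, seed=0):
    """Pick a random viewpoint and remove every point within `radius` of it."""
    center = random.Random(seed).choice(VIEWPOINTS)
    return crop(normalize(points), center, radius)
```

The removed points serve as the ground truth of the missing region, while the remaining points form the partial input.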

Quantitative comparison

SegCompletion was competitive with the existing state-of-the-art techniques [15, 20, 24, 32, 65, 66, 67] and ranked first on the Pheno4D dataset under both the L1 and L2 metrics. Table 2 compares it with recent point cloud completion methods, namely FoldingNet, GRNet, PF-Net, PMP-Net, PMP-Net++, DeCo, and SnowflakeNet. Notably, SegCompletion achieved an average L1 of 0.397, which was far lower than FoldingNet’s average L1 of 12,391.682. Moreover, SegCompletion obtained the lowest L1 results in all categories, indicating strong generalization in completing shapes of multiple categories. The L2 metric supports the same conclusion.

Table 2 Point cloud completion on the Pheno4D dataset in terms of per-point L1 and L2 Chamfer distance \( \times 10^{ - 3}\) (lower is better)

The FoldingNet method excels in handling global features of point clouds, but it exhibits relatively weaker capabilities in capturing local geometric details. While GRNet improves the fusion of global and local features to some extent, it still faces challenges related to point cloud sampling and sparsity. PF-Net focuses on reconstructing partial point clouds but may be sensitive to complex structures and noise. PMP-Net and PMP-Net++ introduce graph context information but encounter challenges of excessive smoothing, leading to the loss of details in reconstructing point clouds. DeCo introduces a decoupling mechanism in the encoder–decoder structure but may have limitations in handling point clouds with multimodal features. SnowflakeNet proposes a multi-level point cloud pyramid but may face difficulties in adapting to diverse shapes. Comparatively, SegCompletion also applied a GAN network for multiscale feature extraction but yielded significantly better results when applied to the Pheno4D dataset. This improvement was attributed to the segmentation and clustering network adopted in SegCompletion, which was capable of distinguishing between the different features of the morphology.

Qualitative comparison

To further highlight the superiority of SegCompletion over other methods, Fig. 3 presents a visual comparison of the overall performance on the Pheno4D dataset. This study aimed to evaluate SegCompletion in comparison to other techniques in terms of their ability to accurately predict the complete shape of a partial point cloud. The results showed that the PMP-Net, PMP-Net++, SnowflakeNet, and GRNet methods generated insufficient complementary points, and the FoldingNet method produced many noise points, resulting in a deformed output. GRNet primarily focused on local features and often performed poorly when dealing with global features and structures. Additionally, its generalization performance on different datasets or point clouds with varying characteristics was limited (Fig. 3). In contrast, SegCompletion demonstrated superior visualization of point cloud completions across various object categories.

Fig. 3

Visualization of the Pheno4D dataset showing how point cloud completion compares to previous methods. The PMP-Net, PMP-Net++, SnowflakeNet, GRNet, and FoldingNet methods output the whole object, but our method only displays the part of the point cloud that is missing (yellow) rather than the whole object

SegCompletion stands out from the generative methods, showing its potential to generate a high-quality leaf. By incorporating a segmentation and clustering network into PF-Net, we observed a marked improvement in completion quality on the Pheno4D dataset. Fig. 4 reveals that SegCompletion was more successful than PF-Net and DeCo in completing the shape of all leaves. PF-Net focuses on the reconstruction of partial point clouds, but it may be sensitive to complex structures and noise, leading to some defects in the process of shape completion and visualization. DeCo introduces a decoupling mechanism but may face limitations in handling object shape completion, affecting the visualization results. These issues may include unnatural deformations or missing parts when completing object shapes and the presence of artifacts or detail loss in the visualization process.

Fig. 4

Visualization of how PF-Net, DeCo, and SegCompletion complete the missing region of the point cloud

Point cloud completion on the ShapeNet dataset

Point cloud preprocessing

In the ShapeNet dataset, point clouds with more than 2048 points were down-sampled to 2048 using the FPS method, while the point clouds with fewer than 2048 points were up-sampled to 2048 by replicating their neighboring points. For each object, the parts with missing data were manually labeled in one color, while the complete parts were labeled in another color. The annotation process was conducted using CloudCompare software, where experts carefully marked the missing parts and labeled the complete parts. The same approach as in the Pheno4D dataset was employed for generating the missing point clouds.

Quantitative comparison

In the ShapeNet dataset, the guitar exhibits the most intricate and diverse geometry, with 254 morphologies, while the notebook has the simplest morphology, with only one form. Table 3 shows that SegCompletion achieved an L1 value of 1.329 for guitar completion, lower than those of the state-of-the-art techniques [15, 20, 24, 65, 67]. Additionally, SegCompletion obtained an L1 value of 2.813 for the notebook, which was lower than the PF-Net value of 5.652. These results demonstrated that SegCompletion was more effective in completing both complex and simple objects in the public dataset. SegCompletion also demonstrated its robustness in completing shapes across multiple categories by achieving the lowest L1 values in all categories. PF-Net employed a feature-point-based multiscale generating network, enabling hierarchical estimation of the missing point cloud and showcasing its efficacy in various challenging point cloud completion tasks. By utilizing a GAN for multiscale feature extraction, SegCompletion significantly outperformed other methods on the ShapeNet dataset. This improvement could be attributed to the segmentation and clustering network used in SegCompletion, which effectively recognized distinct features of different morphologies.

Table 3 Point cloud completion on the ShapeNet dataset in terms of the per-point L1 Chamfer distance \( \times 10^{ - 3}\) (lower is better)

The results of SegCompletion under the per-point L2 Chamfer distance were consistent with those under the per-point L1 Chamfer distance on the ShapeNet dataset. The mean L2 distance was lower than that of the other methods, and it was lower for all five classes of objects. PMP-Net, PMP-Net++, GRNet, SnowflakeNet, and FoldingNet completed the whole object but introduced varying degrees of reconstruction error in regions other than the missing part. PF-Net and DeCo, in contrast, generated overly concentrated points in the missing region. Table 4 shows that SegCompletion produced point clouds with higher accuracy and less distortion, both in the complete point cloud and in the point cloud of the missing region.

Table 4 Point cloud completion on the ShapeNet dataset in terms of the per-point L2 Chamfer distance \( \times 10^{ - 3}\) (lower is better)

Qualitative comparison

To further highlight the advantage of SegCompletion over other approaches, Fig. 5 presents a visual comparison of its performance on the ShapeNet dataset. The objective of this study was to compare SegCompletion with alternative methods across various object categories. For instance, in Fig. 5, the second row illustrates the prediction of the complete shape of a partially visible laptop. Many of the evaluated methods encountered difficulties in accurately preserving the precise geometries of the laptop screen. The PMP-Net and SnowflakeNet methods generated an insufficient number of complementary points, while the GRNet method introduced excessive noise points. Additionally, the results obtained from the FoldingNet method exhibited significant distortions. In contrast, SegCompletion outperformed these methods by producing more accurate and visually appealing point cloud completions. Its ability to preserve precise geometries and avoid excessive noise points distinguished it from PMP-Net, SnowflakeNet, GRNet, and FoldingNet. Thus, SegCompletion demonstrated its superiority in visualizing point cloud completions across diverse object categories.

Fig. 5

Visualization of the ShapeNet dataset showing how point cloud completion compares to previous methods. The PMP-Net, PMP-Net++, SnowflakeNet, GRNet, and FoldingNet methods output the whole object, but our method only displays the part of the point cloud that is missing (yellow) rather than the whole object

PF-Net and DeCo excelled in preserving the spatial configuration of the partial point cloud and accurately inferring the intricate geometric structure of the missing areas during the prediction process. However, they tended to prioritize shape completion over other aspects. In contrast, SegCompletion stands out among the generative methods presented in Fig. 6, demonstrating its ability to deliver high-quality reconstructions of laptops. By incorporating a segmentation and clustering network into the PF-Net architecture, SegCompletion significantly improved the complementary effect on the ShapeNet dataset. Moreover, the results suggested that SegCompletion was more effective than PF-Net and the other methods in accurately predicting the shapes of objects across all categories.

Fig. 6

Visualization of how PF-Net, DeCo, and SegCompletion complete the missing region of the point cloud

Ablation studies

Effectiveness of each component

The impact of the segmentation and clustering module and the uniform loss in the SegCompletion method was evaluated by investigating their efficacy when they were removed. Four different variants were designed for comparison: (1) no-segmentation, where the segmentation and clustering unit was omitted from the network; (2) no-uniform loss, which excluded the uniform loss component from the network; (3) PF-Net, where both the morphological segments and the uniform loss were removed; and (4) the full model, representing the complete SegCompletion method.

Table 5 presents compelling evidence that the full SegCompletion model achieves superior results compared to other network variants. The comparison between the No-Segmentation model and the full model highlights the value of incorporating the segmentation and clustering module before network completion. Similarly, the comparison between the No-Uniform Loss model and the full model demonstrates the effectiveness of addressing the issue of nonuniformity in shape point clouds. Moreover, the comparison between PF-Net and the full model emphasizes the significant contribution of the combined modules to the overall performance.

Table 5 Point cloud completion performance of the no-segmentation, no-uniform loss, PF-Net, and full models on the ShapeNet dataset in terms of the per-point L2 Chamfer distance \( \times 10^{ - 3}\) (lower is better)

Robustness to the model

The robustness tests conducted in this study were focused on the "Guitar" class. To assess the reliability of SegCompletion, a robustness test was performed by controlling the number of output points and training the model to repair shapes with varying degrees of incompleteness. The experimental results are summarized in Table 6. In comparison to the ground truth, percentages of 25, 37.5, 50, and 75 indicated that four partial inputs were missing 512 points, 768 points, 1024 points, and 1536 points, respectively. The robustness of SegCompletion was evident in the similarity between the errors of the predicted (Pred) and ground truth (GT) for the four partial inputs. This indicated that SegCompletion was able to effectively manage inputs with different levels of incompleteness. The findings demonstrated that SegCompletion exhibited robustness in accurately completing point cloud shapes, even in the presence of significant missing data. This highlights its potential applicability in scenarios where incomplete or partially scanned point clouds are encountered.
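As a quick check of these counts, assuming each complete shape holds the 2048 points described in the ShapeNet preprocessing, the four missing ratios correspond to:

$$ 0.25 \times 2048 = 512,\quad 0.375 \times 2048 = 768,\quad 0.5 \times 2048 = 1024,\quad 0.75 \times 2048 = 1536. $$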

Table 6 The partial point cloud is reduced by 25%, 37.5%, 50%, and 75% of the initial point cloud

The effectiveness of SegCompletion on the test set is illustrated in Fig. 7, highlighting its ability to accurately distinguish between distinct types of guitars while preserving the intricate geometric details of the original point cloud. This holds even when dealing with substantial levels of incompleteness. To further evaluate the robustness of SegCompletion, a second test was conducted. In this test, the model was trained to complete partial shapes by filling in multiple missing points located at unusual positions. The purpose of this test was to assess SegCompletion's performance in handling incomplete inputs with varying degrees of incompleteness.

Fig. 7

Examples of repair results when the original input has various degrees of incompleteness. The original point cloud is reduced by 25%, 37.5%, 50%, and 75% for (1), (2), (3), and (4), respectively. Yellow indicates a prediction. Red indicates that the point cloud is not damaged. Purple represents the target ground truth

Discussion

Completion in cotton plant leaves

The Cotton3D dataset is distinct from the existing datasets due to its various data features and complexities. LiDAR, laser scanners, stereo cameras, RGB-D scanners, and other devices can be used to capture point cloud data of cotton plants, preserving the initial geometric information in three-dimensional space. However, due to the variations in shape, mutual occlusion, and discontinuous surfaces, there is a more severe issue of missing point cloud data when compared to the conventional objects. This study aims to ensure that the SegCompletion method can cover various data scenarios, allowing for a deeper exploration of its generalization capabilities.

The analysis of the data presented in Table 7 reveals significant findings. PF-Net [24] secures the second position with an L2 value of 0.146. In contrast, SegCompletion outperforms PF-Net with a significant improvement of 0.124 in the L2 metric. Notably, on the Cotton3D dataset, SegCompletion demonstrates a remarkable 84.93% reduction in the L2 metric compared to PF-Net. This noteworthy decrease highlights the effective utilization of the abundant geometric information embedded in point clouds by SegCompletion. Furthermore, the superior performance of SegCompletion over SnowflakeNet [67] provides compelling evidence of its effectiveness. It highlights exceptional generalization capabilities when dealing with morphologically diverse structures, surpassing other methods in performance across all cotton plants.
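The reported percentage follows directly from the two values quoted for PF-Net and the improvement. As a worked check, assuming the 0.124 improvement is an absolute difference in the L2 metric (in the units of Table 7):

$$ 0.146 - 0.124 = 0.022, \qquad \frac{0.124}{0.146} \approx 0.8493 = 84.93\% . $$

Under that assumption, the implied L2 value for SegCompletion is 0.022.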

Table 7 Point cloud completion on the Cotton3D dataset in terms of the per-point L1 and L2 Chamfer distance \( \times 10^{ - 3}\) (lower is better)

According to the results presented in Table 7, this study evaluates the performance of the two-stage scheme by calculating the mean values of the mentioned metrics on ten different plants. SegCompletion, which combines segmentation and completion within a single network, outperforms other advanced point cloud completion methods by producing higher-quality results. In contrast, the other methods, although successful in completing point clouds with regular shapes and continuous surfaces, still face challenges when dealing with objects that exhibit morphologically diverse structures and surface discontinuities, which are commonly encountered in real-world scenarios.

Generative methods, such as PF-Net and DeCo, have been capable of learning the complete structure of the input plant. However, they have often struggled to accurately place the generated points in the correct morphological positions, resulting in a mismatch with the remaining part of the input shape. In contrast, SegCompletion, which relies on segmentation and clustering networks, demonstrates the ability to differentiate between various morphological features. This enables SegCompletion to effectively segment the complex morphological structure into regular shapes and continuous surfaces. Referring to Fig. 8, it is evident that SegCompletion excels at accurately locating and completing missing regions within the point cloud of cotton plants. The completed point clouds display a uniform distribution, indicating the successful integration of the added data.

Fig. 8

Visualization of the Cotton3D dataset showing how point cloud completion compares to the PF-Net and DeCo methods

Point cloud segmentation and cluster network

The segmentation and clustering network was utilized to distinguish the different features of morphology. It successfully partitioned morphologically diverse structures or discontinuous surfaces into regular shapes and continuous surfaces. PointNet++ was employed as a network for segmenting the point cloud. The segmentation accuracy of PointNet++ gradually increased with the number of epochs. Once the number of epochs exceeded 10, the accuracy plateaued at approximately 98%, which satisfied the experimental requirements, as illustrated in Fig. 9. In this study, the network was trained for 100 epochs, and each iteration of the PointNet++ segmentation achieved an accuracy of over 98%. HDBSCAN was found to be effective for clustering the segmented components.

Fig. 9

Relationship between PointNet++ recognition accuracy and Epochs

This paper presented the results of point cloud segmentation across various object categories, including Pheno4D, ShapeNet, and cotton plants in the Cotton3D dataset. For the objects in the ShapeNet dataset, the missing and complete parts were bifurcated based on their morphological characteristics. The morphology of plants was complex and diverse, and the leaves and stems were not necessarily continuous in the Pheno4D and Cotton3D datasets. Therefore, segmentation and clustering techniques were utilized to complete the point cloud representation of cotton leaves. Figure 10 demonstrates the qualitative effects of the point cloud segmentation and clustering network on both the ShapeNet and Cotton3D datasets.

Fig. 10

Results of qualitative point cloud segmentation and clustering using point cloud incompleteness on the Pheno4D, ShapeNet, and Cotton3D datasets

The clustering algorithm’s results were significantly influenced by the initial parameter values. The clustering effect was measured using the silhouette coefficient [68], where a higher value indicated a better clustering result. Through experimental validation, the parameter values yielding the maximum silhouette coefficient (Online Appendix Tables 1–4) were selected as the initial parameters for each clustering algorithm and applied to the Pheno4D dataset.
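To make the selection criterion concrete, the sketch below computes the mean silhouette coefficient for a labeled point set. It is a plain-Python illustration with hypothetical names (`silhouette_score`, `euclid`); it assumes at least two clusters and scores singleton clusters as 0, a common convention that the paper does not specify.

```python
import math


def euclid(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point itself contributes a distance of 0 to the sum).
        a = sum(euclid(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(euclid(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two well-separated pairs of points, grouped correctly and incorrectly.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
```

A parameter search would run the clustering algorithm once per candidate value and keep the value whose labeling gives the highest score, as `good` versus `bad` illustrates for two candidate groupings.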

In different growth stages of tomatoes, HDBSCAN exhibited superior clustering performance, especially when considering varying leaf counts. In the first row of Fig. 11, representing the tomato seedling stage with only 2 leaflets, both the K-Means and K-Means++ algorithms produce erroneous clustering results. However, DBSCAN and HDBSCAN accurately identify the clusters. Transitioning to the second row, K-Means and K-Means++ still yield suboptimal clustering results, while DBSCAN fails to differentiate the intermediate leaflets. In contrast, HDBSCAN achieves better clustering performance. These results further emphasized the superiority of HDBSCAN in handling the complexity and variability of tomato plant structures throughout their growth stages. By accurately detecting and grouping the data points, HDBSCAN proved to be a robust and reliable clustering algorithm for analyzing tomato plant morphology and facilitating precise phenotypic characterization.

Fig. 11

Visualization of HDBSCAN in comparison to K-Means, K-Means++, and DBSCAN on the Pheno4D dataset. The number of clusters of K-Means is \(n\_{\text{clusters}} = 2\), the number of clusters of K-Means++ is \(n\_{\text{clusters}} = 4\), the radius of the \(\varepsilon\)-neighborhood of DBSCAN is \({\text{eps}} = 0.1\), the minimum number of points (MinPts) in the \(\varepsilon\)-neighborhood of DBSCAN is \({\text{min}}\_{\text{points}} = 400\), and the minimum number of points of HDBSCAN is \({\text{min}}\_{\text{points}} = 400\)

Limitations

The clustering results have a significant impact on the successful completion of leaves, especially when plants have multiple closely spaced leaves. In such cases, clustering algorithms, such as HDBSCAN, play a crucial role in distinguishing individual leaves from point cloud data. The effectiveness of the clustering algorithm depends on factors such as the distances between the leaves and the angles of the petioles. When the leaves are widely spread apart, algorithms such as HDBSCAN can accurately identify and complete each leaf. However, challenges arise when the leaves are closely spaced, as the clustering algorithm may struggle to differentiate them, leading to difficulties in leaf completion, as demonstrated in Fig. 12. Therefore, it is essential to consider the clustering results to achieve successful leaf completion, particularly in cases where plants have multiple closely spaced leaves.

Fig. 12

Clustering results of HDBSCAN on two tomato plants. (a) This tomato plant has 12 leaves, but HDBSCAN only gathers eight, seven of which are incorrectly identified. (b) This tomato plant has eleven leaves, but HDBSCAN only gathers five, ten of which are incorrectly identified

Conclusions

The proposed network, SegCompletion, was specifically designed for point cloud completion, which involved reconstructing complete 3D shapes from partial geometries with various shape structures and noncontinuous surfaces. PointNet++ and HDBSCAN were utilized to implement morphological segmentation before point cloud completion, allowing for effective segmentation of point cloud instances belonging to the same object category. The multiscale generative network facilitated advanced patching of missing point clouds based on shared geometric features derived from feature points. Additionally, a straightforward and efficient uniform loss function was introduced to minimize the variance in average distances between patch centers and their corresponding closest neighbors. Extensive experiments were conducted on ShapeNet, Pheno4D, and our self-collected Cotton3D dataset to evaluate the effectiveness of the proposed method. The results demonstrated the superiority of the SegCompletion method over other approaches discussed in the literature, establishing its prominence in point cloud completion tasks.

This paper serves as an initial step toward achieving high-quality completion of single-object point clouds, which is a typical challenge in the field. An important direction for future research is to extend the completion framework to handle multi-object scenes. Currently, SegCompletion focuses on generating the complete geometric representation of individual objects. The longer-term goal is to segment and complete individual objects, such as tables, chairs, or cars, within larger indoor and outdoor scenes such as those in the ScanNet and KITTI datasets. Additionally, we believe that incorporating appropriate joint optimization strategies and enabling end-to-end training across all hierarchical levels will further enhance the performance of our method. Despite these considerations, this research makes a significant contribution to the advancement of 3D shape completion and understanding. It provides valuable insights and methodologies with potential applications across various domains, benefiting industries, academia, and society.