Bin-scanning: Segmentation of X-ray CT volume of binned parts using Morse skeleton graph of distance transform

X-ray CT scanners, due to the transmissive nature of X-rays, have enabled the non-destructive evaluation of industrial products, even inside their bodies. In light of its effectiveness, this study introduces a new approach to accelerate the inspection of many mechanical parts with the same shape in a bin. The input to this problem is a volumetric image (i.e., CT volume) of many parts obtained by a single CT scan. We need to segment the parts in the volume to inspect each of them; however, random postures and dense contacts of the parts prohibit part segmentation using traditional template matching. To address this problem, we convert both the scanned volumetric images of the template and the binned parts to simpler graph structures and solve a subgraph matching problem to segment the parts. We perform a distance transform to convert the CT volume into a distance field. Then, we construct a graph based on Morse theory, in which graph nodes are located at the extremum points of the distance field. The experimental evaluation demonstrates that our fully automatic approach can detect target parts appropriately, even for a heap of 50 parts. Moreover, the overall computation can be performed in approximately 30 min for a large CT volume of approximately 2000×2000×1000 voxels.


Introduction
X-ray computed tomography (CT) is an effective tool for the non-destructive inspection of industrial products that often consists of many parts. X-ray CT scans, due to the capability of imaging objects' inside in a non-destructive manner, can detect defects in manufactured parts, such as cracks, cavities, and inclusions.
As X-ray CT devices have prevailed, the demand for inspecting many mass-produced parts during their production process has arisen. However, current off-the-shelf X-ray CT scanners require too much time to achieve the total inspection in a production line. A possible solution to this problem is to scan many parts simultaneously in a single CT scan by leveraging the capability of an X-ray CT scanner to visualize all the objects in its fields of view. To this end, identical parts (e.g., those manufactured by injection molding) are often arranged in a sorting tray. However, preparing the tray and arranging the parts can be costly and cumbersome.
In contrast, scanning a heap of identical parts randomly stored in a bin (see Fig. 1(a)) is a more efficient solution. We refer to this process as binscanning in this paper. As shown in Figs. 1(b) and 1(c), bin-scanning obtains an X-ray CT volume containing all the parts in the bin by a single CT scan. We can inspect each part individually once the regions of each part in the CT volume are segmented. This procedure is a template-matching problem because the segmented regions correspond to approximately identical objects. Traditional template matching uses chamfer distance [1] and zero-mean normalized crosscorrelation (ZNCC) [2] to measure the similarity against a template. However, these approaches for image-level template matching inherently suffer from the huge search space for the posture of each part. Thereby, many studies [3] have been conducted to speed up the template exploration but have not solved the problem thoroughly. Our paper discusses bin-scanning, which refers to the process of scanning an X-ray CT volume of randomly stored identical parts in a bin, as shown in (a). The CT volume of binned parts in (a) is visualized by volume rendering in (b), and its slice is shown in (c).
Matching two sets of sparse keypoints is a potential approach for fast template matching. Typically, a geometric feature around each keypoint is computed in advance, and these features are employed as guides for template matching. If each keypoint is associated with a spatial position, we can solve the matching problem using well-studied rigid and nonrigid registration algorithms [4][5][6]. Unfortunately, when some features of the keypoints are similar, approaches only using the spatial arrangement of points may induce unpredictable mismatches.
When a set of keypoints also has a graph structure, template matching can be formulated as a subgraph-matching problem. Early studies considered the problem of finding subgraph isomorphisms (i.e., subgraphs with exactly the same graph structure) [7,8]. Meanwhile, because matching subgraph isomorphisms is an NP-hard combinatorial optimization problem, later approaches solved approximated and relaxed graph matching problems [9,10]. Recently, several studies [11][12][13] have employed deep learning techniques for more robust template matching. However, subgraph matching has rarely been applied to problems in 3D objects because the graph structure will be intractably complicated, for example, when graphs are constructed by connecting vertices sampled on object surfaces.
In this study, we solve the template matching for bin-scanning using a subgraph matching algorithm because it is more efficient than simply matching CT subvolumes using traditional similarity measures. To apply the subgraph matching to bin-scanning, we need to obtain a graph from the input CT volumes of a template part and a heap of parts. To focus more on the global structure of the parts, we leverage Morse theory [14], which is a mathematical tool for analyzing the topological property of a manifold to convert a CT volume into a graph structure. In binscanning, we can assume that the graphs obtained for a template part and each included in the heap will be isomorphic because the shape of the template part is almost the same as those in a heap. Therefore, we apply a simple matching algorithm for subgraph isomorphisms [8] to match the graphs of the template and the heap. Figure 2 illustrates the proposed method consisting of two stages. The inputs for our system are two volumes of a template and a heap of parts. For brevity, we refer to these two volumes as the template and target, respectively. We convert each volume into a graph, which we refer to as a Morse skeleton graph (MSG). The MSG is constructed based on Morse theory, and the graph nodes are located on the skeleton of the object's geometry. Then, we apply subgraph matching to search the template in the target volume. In this way, we can simultaneously achieve both the segmentation and localization of the template parts in the target volume.
Our method allows for storing the target parts randomly in a bin during inspection, eliminating the need to arrange the parts in a sorting tray manually. Furthermore, the computation time required by our system is sufficiently shorter than that for a CT scan. Therefore, we can successively scan heaps of parts while processing the CT volumes obtained by the scans synchronously. Hence, our method can facilitate the total part inspection using the proposed bin-scanning system. An overview of our graph-based template matching system for bin-scanning. Our system processes two input volumes of a template part and a heap of parts identical to the template. The system identifies the region of each part in the volume of heaped parts and calculates the posture of the part as a rigid transformation matrix. To this end, each volume (i.e., those of a template part and its heap) is converted to a graph, which we refer to as a Morse skeleton graph. Then, using a subgraph matching technique, we match the graph of a template with subgraphs in the entire graph of heaped parts. Note that the graph structures in this figure are for illustration only.

Problem statement
We assume that our template-matching problem is defined in the 3D Euclidean space R 3 . We denote a position with the bolded small letter (e.g., x ∈ R 3 ), a transformation R 3 → R 3 with the bolded capital letter (e.g., T ∈ SO 3 ), and a region (i.e., a subspace of R 3 ) with the calligraphic letter (e.g., P ⊂ R 3 ).
An X-ray CT scan generates a volumetric image (i.e., CT volume) consisting of voxels with CT values that are roughly proportional to the physical density of the parts. When the parts are made of a single material, voxels of the parts have approximately the same values, as shown in Fig. 1(c). Therefore, we can extract the voxels of the parts by binarizing both template and target volumes. More formally, we obtain a template part P 0 and a target heap of N parts H = N i=1 P i using the X-ray CT scan, whereas the number of parts N is unknown. The posture of each part P i (i.e., its position and orientation) is represented by a rigid transformation T i , with which we write P i as The goal of our problem is to find T 1 , · · · , T N with a given P 0 and H. More precisely, each P i is a closed subset of R 3 , and any two parts do not intersect (i.e., ∀i, j : Int P i ∩ Int P j = φ, where Int P denotes the interior of P) because each part is a solid object. By contrast, the edges and corners of the two parts may be shared (i.e., ∃i, j : ∂P i ∩ ∂P j = φ, where ∂P denotes the boundary of P).
Our method converts both the template volume P 0 and target volume H into lightweight graph structures to accelerate the comparison of the two volumes (see Fig. 2). The conversion is based on Morse theory and persistent homology. Specifically, Morse theory is used to convert the volume into a graph, while persistent homology is used to control the number of graph nodes. Then, we match the graph structures between the template and target graphs using an efficient subgraph matching algorithm [8]. In this way, the entire template matching process can be completed in approximately 30 min, even for a large CT volume (e.g., the one with 2000 × 2000 × 1000 voxels).

Related work
Template matching for bin-scanning needs to solve two different problems simultaneously: segmentation to detect each part in a heap and localization to determine the positions and postures of the parts. This section introduces related studies on these topics, including recent studies based on deep learning. We will also introduce the applications of Morse theory to several geometric modeling tasks because the theory is leveraged in our graph construction.

Image and volume segmentation
The most straightforward approach for segmenting specific regions in a grayscale image is binarization, which often relies on conventional thresholding [15][16][17] and that combined with mathematical morphology (e.g., dilation and erosion) to detach the image domain into small connected regions [18]. Pixel clustering, such as the mean shift method [19], has also been employed for segmentation [20,21], although its high computational cost hinders its application to large volume data. Active contour models, such as Snakes [22], and level sets [23], are other popular choices for 3D CT volumes [24,25], which optimize the contour of an object using the gradients of gray voxel values. However, it is not easy to obtain the contours using active contour models when target objects touch one another. Graph cuts [26][27][28] can obtain a globally optimal segmentation result, thus avoiding the fragmentation of object regions owing to noise. Unfortunately, its high computational cost makes processing large 3D volumes impractical. The watershed method [21,29,30] is also a common approach for segmentation and is often used with distance transform when applied to binary images. Its main drawback is over-segmentation, which causes failure in the subsequent template-matching process.
Our method is similar to the watershed method based on the distance transform but has two noticeable differences. First, we control the persistence of the watershed to prevent over-segmentation. Second, our method applies subgraph matching to a graph that is built based on the results of the watershed method and may contain over-segmented regions.

Template matching based on geometric features
Template matching is often based on photometric and geometric features and simultaneously solves segmentation and localization problems. Based on the vicinity of the features, we can match a template to other data using the correspondences of elements, such as pixels, voxels, and 3D points. For volumetric images, feature vectors, such as the 3D scale-invariant feature transform (SIFT) [31], are computed by extending the features for 2D images [32]. However, when applied to large 3D volumes, these approaches can suffer from high computational costs when exploring the correspondences of too many feature points, even when accelerated, for example, by random sample consensus (RANSAC). In contrast, several recent approaches [33][34][35] have achieved efficient template matching of 3D sparse point clouds, which are obtained by handheld optical scanners and LiDARs, based on the point feature descriptors, such as point pair features (PPF) [36]. These approaches can be applied to 3D CT volumes by extracting an isosurface, e.g., by using the Marching Cubes method [37], and sampling points on the surface. However, the mean recall of the state-ofthe-art method [35] is still insufficient for industrial part inspection, where detection requires almost 100% accuracy.

Segmentation using deep neural networks
Currently, deep learning is one of the most standard approaches for image and volume segmentation, while various network architectures, such as fully convolutional network [38], U-net [39,40], and feature pyramidal network [41], have been applied. However, recent X-ray CT devices have already been equipped with 8k detectors (i.e., with 7680 × 4320 pixels), and thus, the large memory consumption of 3D convolutional neural networks (CNNs) hinders their straightforward application to large CT volumes.
Preceding the industrial applications, deep learning techniques for medical-purpose X-ray CT have already been common that aim to reduce the dose, namely, the patient's X-ray exposure. One previous method reduced the computational complexity by applying 2D CNNs to images obtained by slicing the target volume along three axes [42]. Other methods have concentrated the computational resource only on a specific part of the volume [43,44] and use region-specific priors [45,46]. Unfortunately, these approaches aim to extract the region belonging to a specific object (e.g., lung) and not to identify the regions for different substances of the same object. Furthermore, preparing a large training dataset for our purpose (i.e., segmenting heaped parts in a bin) is extremely cumbersome.
For industrial X-ray CT, a recent study [47] applied the flood filling network [48] to segment a large CT volume with 10,000 3 voxels. However, this method aims to segment the regions of parts made of different materials and cannot be used for the parts made of the same material.

Morse theory for geometric modeling
Morse theory is a powerful tool for topology analysis that can extract geometric and topological features of shapes. In geometric modeling, Forman [49] introduced the discretized version of Morse theory and related algorithms, and Edelsbrunner et al. [50] demonstrated an application of the theory to linear 2-manifolds (e.g., a triangular mesh). As well as the geometry processing on triangular meshes [51], Morse theory has also been applied to image processing [52,53], visualization of cosmic objects [54], molecular analysis [55], and mesh quadrangulation [56,57]. For further applications, we refer readers to the comprehensive surveys [58,59].
A segmentation method similar to that shown in this paper has been introduced by Nagai et al. [60]. Their method introduced semantic segmentation based on Morse theory, which processed 3D Xray CT volumes of industrial assemblies. However, their method requires careful human intervention to achieve good segmentation, even for an assembly comprised of a few parts. By contrast, our method is fully automatic.

Graph construction from a volumetric image
This section introduces several background theories and how the graph structure is obtained from an input 3D CT volume using Morse theory. We present an overview of the conversion from a volume to a graph in Fig. 3.

Distance transform and its properties
We assume that a 3D part is represented by a connected region P ⊂ R 3 . The distance transform obtains a scalar field for the region based on a metric in R 3 . For part P (i.e., a connected region), we can write the distance transform D : R 3 → R by introducing a simple Euclidean norm · as a metric in R 3 .
At a point x ∈ P, its value simply takes the minimum distance to the boundary ∂P and is zero for all x / ∈ P. Figure 3(a) shows a simple example of the distance transform. The distance transform for a region defined on a discrete 3D volume can be calculated by solving the eikonal equation ∇D(x) = 1/f (x), which is efficiently solved using the fast marching method [61] with the computational complexity ordered by O (N log N ).
A good property of the distance transform is its invariance under a rigid transformation. Let T (x) = Rx + t and T −1 (x) = R −1 x − R −1 t be a rigid transformation in R 3 and its inverse, respectively, where R ∈ R 3×3 is a rotation matrix, and t ∈ R 3 is a translation vector. Using the notation T P in Eq. (1) for a rigidly transformed region, we can write the invariance of the distance transform as D(x; P) = D(T (x); T P) Let us assume that a number of parts P 1 , P 2 , · · · , P N with the same shape are in the target volume and P 0 is the part of the template volume. Then, we can write each part P i = T i P 0 using a rigid transformation T i . Considering the invariance of the distance transform in Eq.
(2), we can write D(x; P i ) = D(x; T i P 0 ) = D(T −1 i (x); P 0 ) Furthermore, the distance transform of the heaped parts H = N i=1 P i is derived from the property of distance transformation of taking zero outside the region.
This property is important for template matching because it ensures that the distance transform of the template part is equivalent when a part with the same shape is in a heap.

Basic Morse theory
The Morse theory was originally devised for smooth function on manifolds [14], and in computer graphics, this theory is often used to determine a topological structure on 2-manifold [57,62]. In contrast to these studies, we extract a graph (i.e., the topological structure) on 3-manifold M, which corresponds to P i and H equipped with a simple Euclidean metric. Here, we assume that the 3-manifold M is parameterized by the three coordinates (u 0 , u 1 , u 2 ).
Let f : M → R be a real-valued function of the manifold. Then, a point is critical when its derivative with respect to coordinate [∂f /∂u i ] is zero; otherwise, it is regular. Furthermore, a critical point is Morse when the Hessian matrix [∂ 2 f /∂u i ∂u j ] at the critical point is nonsingular; otherwise, it is degenerate. If and only if the critical points on the manifold are Morse, we refer to the function f as a Morse function. We can then define an integral line of the Morse function f , which is the maximal path on M whose gradient direction agrees with the gradient of f . The ascending (or stable) manifold and descending (or unstable) manifold are defined as clusters of integral lines with a common origin and destination, respectively. The Morse complex is the partition of M into descending manifolds. By contrast, the Morse-Smale complex is a partition of M where each cell is composed of a set of integral lines with the same origin and destination. In our problem, each cell of the Morse complex computed for the distance transform corresponds to a graph node. Specifically, each node lies on a local maximum of the distance field, and two adjacent nodes lie on opposite sides of a saddle which corresponds to an edge connecting the nodes.

Simplification on Morse complex
An important property of the Morse complex is that the number of cells can be reduced using an operation called pair cancellation [58,63]. Pair cancellation is based on the importance of a pair of critical points, which is often defined by the gap between the Morse function values at two critical points. The importance is also known as persistence, and the persistence p of two critical points u and v is formally defined as Depending on the distribution of ascending and descending directions around a saddle, the saddles of a function in a 3D space can be classified into 1-saddle and 2-saddle. The cancellation on a 3D Morse-Smale complex can only be defined for either a pair of local minimum and 1-saddle or a pair of the local maximum and 2-saddle. This eliminates the critical points in the pair and redefines the graph structure with a revised topology (or a gradient field). Informally, on a 3D Morse complex (not Morse-Smale), the cancellation between a local minimum and a 1-saddle corresponds to the elimination of an edge, whereas the cancellation between a 2-saddle and a local maximum corresponds to the elimination of a face [63]. Thus, we can control the complexity of the Morse complex by increasing and decreasing the minimum persistence allowed. We show an example of the graph simplification in Fig. 3(c), where two Morse cells depicted with green and orange are merged into one, and the graph structure is updated accordingly.

Part segmentation by graph matching
To reduce the computational complexity of the template-matching problem between the target and template volumes, we solve the problem using a computationally simpler subgraph matching algorithm. Using the pair cancellation described above, we represent each template and target volume as a graph with a few nodes and edges. Once the graph structures are obtained, we search subgraphs of each part in the target heap by subgraph matching.

Morse skeleton graph
To define a graph structure on an input CT volume, we first extract an object region Ω and its boundary isosurface ∂Ω using the Marching Cubes method [37]. We assume that Ω is a close 3-manifold with a boundary. Second, we compute the distance transform D(x; Ω) from the boundary ∂Ω using the fast marching method [61]. The distance field on a discrete grid (e.g., pixels in an image and voxels in a volume) is not Morse in a strict sense because it is not C 2 -continuous at its local maximum. However, when we assume that this distance field is a discretization of a continuous height map, critical points can be determined by checking whether neighboring voxels have a smaller or larger function value than the voxel of interest.
In our problem, the critical points appear only on either the local maximum or the 2-saddle owing to the non-decreasing property of the distance transform. To partition Ω into descending manifolds, we trace an integral line starting from every voxel x ∈ Ω until it reaches a local maximum, thereby obtaining a Morse complex for Ω as shown in Fig. 3(b). We denote each cell of the complex by C(q i ) ⊂ M where q 1 , · · · , q M are M local maximum points of D(x; Ω). By contrast, 2-saddles exist between the two cells. We determine the 2-saddles by searching for a voxel over the boundary region of two cells (i.e., ∂C(q i )∩∂C(q j )) that satisfies The MSG is defined by the nodes corresponding to the Morse cells and the edges corresponding to the 2-saddles between the two cells (see Fig. 3(c), left). As these critical points are located on the medial axis (i.e., the skeleton of Ω), we refer to the graph structure as a Morse skeleton graph. However, in practice, the Morse complex computed for an X-ray CT volume can include a considerable number of cells because of noise. Therefore, the MSG, corresponding to the complex, also has many nodes. We simplify the MSG using a pair cancellation described previously to reduce the computational complexity in the subsequent subgraph matching. As discussed in Section 3.3, the cancellation of a pair of 2-saddle and a local maximum corresponds to the elimination of a face (i.e., a cell or a portion of voxels in our case). Therefore, we cancel a series of saddle points in ascending order of persistence until the minimum persistence of the saddles reaches a predefined threshold τ . For the Morse complex computed for a distance field, we employ persistence defined as p(s ij ) = min{D(q i ; Ω), D(q j ; Ω)} − D(s ij ; Ω) Considering the fundamental property of persistence, eliminating a saddle point to merge two adjacent local maxima into one does not affect the persistence values of other saddles. After a local maximum is removed, the persistence values of only the saddles next to the removed local maximum are updated. The pair cancellation, performed for the 2-saddles in the order of decreasing persistence, can be implemented using a priority queue. Hereafter, we denote the MSG for an object Ω as G Ω = G(V Ω , E Ω ) where V Ω and E Ω are the nodes and edges of the graph, respectively.
A simplification of the 3D MSG for a teapot example is shown in Fig. 4. To synthesize the data of the heaped teapots, we used a standard rigid body simulation to determine their postures. Subsequently, we obtained the volumes of a single template part and a heap of parts by an X-ray CT simulator. The size of the volume data was 800 × 800 × 600 voxels for both of them. Figure 4(a-0) shows the initial MSG of the template volume, including 22 cells. By increasing the persistence threshold τ , the MSG is simplified gradually to consist of fewer cells, as shown in Figs. 4(a-1), 4(a-2), and 4(a-3), respectively. The simplification of the MSG is also performed for the volume of the heaped parts, decreasing 395 cells in the original MSG to only 30 cells, as shown in Figs. 4(b-0) and 4(b-1). During simplification, the threshold τ should not be too large to obtain a few cells because at least three cells are required to obtain a rigid transformation matrix between the template and a part in the heap. Moreover, it should not be too small because a small τ can obtain many cells. The MSG before simplification is sensitive to noise signals in the original volume, and insufficient simplification makes the MSGs different for the template and heaped parts, even if they represent the same parts with different postures. We discuss the method to determine the threshold τ in Section 5.

Part separation and localization via subgraph matching
In a graph constructed for a set of heaped parts G H , subgraphs for each part (i.e., G P 1 , · · · , G P N ) may be connected, and some edges connect cells belonging to the subgraphs of different parts (see Fig. 2). Therefore, we employ subgraph matching to detach the subgraphs of the individual parts. In bin-scanning, we can assume that the graph for a template part has the same structure as that of each part included in the heap. Therefore, our method employs a simple algorithm to match subgraph isomorphisms [7,8]. Specifically, our subgraph matching is based on the VF2 algorithm [8] which combines a simple backtracking algorithm using depth-first search [7] with search pruning using node and edge attributes. Using the terminology in subgraph matching, the MSG for the template volume G P 0 corresponds to a query graph, and that for the target volume G H corresponds to a data graph. In the VF2 algorithm, we sequentially match the nodes of the query graph to those of the data graph in an order defined by the adjacency of the nodes. Because each node of the MSG is associated with a Morse cell that can be embedded into a 3D space, we employ three criteria for robust node matching. Assume a case where we attempt to match the mth node of the query graph to another node in the data graph after the nodes q i 0 ∈ V P 0 and q i ∈ V P for i = 1, · · · , m − 1 have already been matched. In this case, we first check that the size of cells C(q m 0 ) and C(q m ) are sufficiently similar, where C(q) is the cell associated with q. With a predefined threshold τ V , this criterion is determined by d r (|C(q m 0 )| , |C(q m )|) < τ V where d r (x, y) is the relative difference defined as d r (x, y) = |x − y|/ min(x, y) and |C| is the size of the cell C (i.e., the number of voxels). Second, we check that both q m 0 and q m are at the same distance from the surface position nearest to each of them. This criterion is defined using the distance field and another threshold τ D . d r (D(q; H), D(q 0 ; P 0 )) < τ D Third, we check whether the arrangement of positions q 1 0 , · · · , q m 0 , including the new one, geometrically matches in shape with that in the graph of a heap of parts, i.e., q 1 , · · · , q m . To this end, we compare the distances from a new node q m 0 to other nodes q 1 0 , · · · , q m−1 0 with the distances from q m to q 1 , · · · , q m−1 . The criterion is formally written as , q i − q m < τ R , ∀i = 1, · · · , m − 1 By constantly checking the distance from a new node to the other nodes that have already matched, we can check that the distances of all pairs of nodes in a graph are sufficiently close to those of the other graph. For the same purpose, we might alternatively use a similar criterion to check the geometric equivalence of the two graphs by aligning them. However, this is timeconsuming because a typical solution to obtain a rigid transformation to align two sets of points requires singular value decomposition (SVD) [64]. The third criterion is significantly faster than performing SVD every time a new pair of nodes is matched.
When all nodes in G P 0 are matched to the nodes in G H with the correct topological order, we remove the matched part from G H to avoid duplicate matching. This matching procedure is repeated until no more subgraph remains in G H that can be matched to After subgraph matching is completed, we can calculate the rigid transformation T i of the part P i in a bin using the positions of the nodes of the matched subgraphs. A rigid transformation is obtained as a solution for minimizing the sum of the squared distances between the matched graph nodes corresponding to the local maximum points of the distance transform. We employ an SVD-based solution [64] to solve the minimization problem and to obtain the rigid transformations. Although the SVD-based solution may be rather time-consuming to be used repeatedly, it is performed only once for each part in the end.
We validated the subgraph matching followed by computing the rigid transformation matrices using the previous example of ten teapots. Rigid transformations are computed for the teapots, and the template teapots are arranged to reproduce the heap using the transformations. The reproduced heap of teapots (colored in yellow) and input heap of teapots (colored gray) are compared in Fig. 5. As shown in the figure, the postures of the teapots appear almost equivalent to those in the original heap, and the amount of misalignment appears small. The correct rigid transformation matrices to reproduce the heap for this example are known because the postures of the teapots are calculated by rigid body simulation. After comparing the correct matrices with those computed by our method, the average misalignment is as small as 3.48 voxels in translation and 1.83 • in rotation.

Experiments
To evaluate the proposed algorithm, we conducted several experiments on the three sets of binned parts shown to the left of Fig. 6, which consist of (a) 50 plastic screws, (b) 10 plastic cases, and (c) 500 plastic connectors. We implemented our system using C++ and tested it on a computer equipped with an Intel Core i9-9980XE CPU (3.0 GHz, 18 cores) and 128 GB RAM. The computation time for these examples is shown at the bottom of the respective photos. The overall computations were completed within approximately 30 min, even for a large volume of 2000 × 2000 × 1000 pixels.

System parameters
Before elaborating on the results, we discuss how the parameters used in our method affect them. For the three parameters used in subgraph matching, we experimentally set τ V = 0.5, τ D = 0.5, and τ R = 1.0 and used the same values in all the experiments below because these values do not sensitively affect the results. However, τ (i.e., a threshold to determine the minimum persistence of an MSG graph) somewhat affects the results. Figure 7 shows a chart illustrating the relationship between the number of cells in a graph of the template part and the minimum persistence τ , which is computed for the example in Fig. 6(b). In this chart, the MSG simplification proceeds from right to left, where the minimum persistence monotonically increases from right to left while the number of cells decreases. Interestingly, the change in minimum persistence has a large gap between 0.2 and 0.6, where the number of cells decreases from 7 to 6. Such a significant gap appears when cells are overly merged and have uneven shapes. Therefore, we set the threshold τ = 0.4 in the middle of the gap. Table 1 shows the number of detected matches (i.e., true positive and false positive matches) and the number of correct matches (i.e., true positive matches) obtained by different τ 's. The numbers in this table are computed for the example in Fig. 6(b) of the plastic case, and the number of parts in the CT volume is ten. Although the choice of τ can affect the detection of parts, τ = 0.4 in the middle of the large gap correctly detects all the parts. Figure 6 shows the experimental results for three example parts. For each kind of part, the top row shows a template part and its corresponding MSG, and the bottom row shows a heap of parts, its corresponding MSG, and the parts detected by our system. The nodes of an MSG are colored differently Table 1 Change in the detection ratio when using different τ 's, the thresholding parameter to control the number of cells. The numbers shown in this table are computed with the example of plastic cases in Fig. 6(b), which consists of 10 cases in total in the figures of the MSGs, whereas the detected parts are colored differently in the figures of the detected parts.

Performance evaluation
As shown in Fig. 6(a) and 6(b), our system detected all 50 screws and 10 cases correctly despite the dense contacts. Although the shapes of MSG cells in Fig. 6(b) might be rather irregular, the shapes do not significantly affect our subgraph matching, which considers only the locations of the critical points and the size of the cell. Furthermore, the proposed system can obtain a rigid transformation matrix for each part. We applied an inverse transformation to each detected screw in Fig. 6(a) and aligned the screws in evenly spaced cuboids. This figure shows the successful detection of screws. Specifically, cells belonging to the same parts are merged as intended during the simplification of the MSG, and the shapes of the detected parts are equivalent to the shape of the template part. Furthermore, the orientations of the screws were equivalent, despite their thin and long shapes with a small head.
However, we also found that the positions of the screws were somewhat misaligned. This is because the positional misalignment of local maxima is inevitable when using a discrete distance field. The isosurface extracted from an X-ray CT volume can deviate because of various artifacts [65], and this deviation can displace the positions of the local maxima. Although an additional alignment process can overcome slight misalignments (e.g., using ICP), the standard deviations of the gravity centers of screws were only 4% of their length without such postprocessing.
For a more challenging example in Fig. 6(c), we could detect only 201 of the 500 connectors, which contradicts the previous results for screws and cases. We attribute the low detection ratio to the small cavities, as shown in Fig. 9(a), generated by the fragile manufacturing process of injection molding. Such cavities cause inconsistencies in the distance fields between the template part and each part of the heap, thus obtaining different extremum points for each. This is a limitation of our system, and cavities in the volume data must be removed in advance.
The results in Fig. 6(c) also suggest another limitation of our system. In these data, each connector has small flat sides, and the two connectors can contact each other with the flat sides. Such face contact violates our assumption to ensure the consistency of graph structures. Fig. 9 Limitations of the current graph-based template matching. When (a) small cavities are included in a part and (b) two parts are in contact with a face, the distance field of the part will be inconsistent with that of the template. The inconsistency propagates to the graph structure; thus, our system cannot match the parts appropriately.

Conclusions
This paper introduced an efficient template-matching method for bin-scanning, where identical parts stored in a bin are CT-scanned and segmented fully  Fig. 6(a) segmented by our system. The screws are shown by placing them in evenly arranged cuboids by applying the inverse of the rigid transformation matrices also calculated by our system. Although the positions are slightly misaligned due to the positional displacements of local maximum points on the distance fields, we can find the identification of the parts is successful in terms of the well-estimated postures.
automatically. Our method converts an input CT volume into a graph, which we refer to as the MSG, and leverages the well-studied subgraph matching algorithm to accelerate template matching. After the template is matched to the subgraphs in the volume of the heaped parts, we can calculate the rigid transformation from the template to each part. The experimental results showed that our system works well for heaps of parts with various shapes, such as teapots, screws, and cases.
However, the detection ratio can be lower when a part involves random cavities included during the manufacturing process, and the two parts are in contact with their faces. Similar to many previous methods for template matching [66] and point set registration [4,67], the symmetric shapes of the parts may hinder the acquisition of a unique rigid transformation. In future work, we will seek to alleviate these problems by developing a graph construction insensitive to small cavities and a more sophisticated graphmatching algorithm applicable to parts with face contacts and symmetric shapes.