Evaluation of geometric similarity metrics for structural clusters generated using topology optimization

In the early stages of engineering design, multitudes of feasible designs can be generated using structural optimization methods by varying the design requirements or user preferences for different performance objectives. Data mining such potentially large datasets is a challenging task. An unsupervised data-centric approach for exploring designs is to find clusters of similar designs and recommend only the cluster representatives for review. Design similarity can be defined not only on a purely functional level but also based on geometric properties, such as size, shape, and topology. While metrics such as chamfer distance measure the geometrical differences intuitively, it is more useful for design exploration to use metrics based on geometric features, which are extracted from high-dimensional 3D geometric data using dimensionality reduction techniques. If the Euclidean distance in the geometric features is meaningful, the features can be combined with performance attributes resulting in an aggregate feature vector that can potentially be useful in design exploration based on both geometry and performance. We propose a novel approach to evaluate such derived metrics by measuring their similarity with the metrics commonly used in 3D object classification. Furthermore, we measure clustering accuracy, which is a state-of-the-art unsupervised approach to evaluate metrics. For this purpose, we use a labeled, synthetic dataset with topologically complex designs. From our results, we conclude that Pointcloud Autoencoder is promising in encoding geometric features and developing a comprehensive design exploration method.


Introduction
Recent advances in high-performance computing and simulation tools enable numerical optimization techniques to support engineers by automatically generating a large set of concepts satisfying design requirements. Topology optimization (TO) [1][2][3][4][5][6] is the most flexible type of optimization to generate novel structural concepts; it optimizes material layout subject to a volume constraint in a given design domain for an objective, e.g., structural compliance under specific loads and supports. One of the popular gradient-based TO methods is a density-based approach using SIMP (Solid Isotropic Material with Penalization) with optimality criteria (OC) update schemes [1]. Figure 1 shows exemplary structures obtained using SIMP. Despite its origins in structural mechanics, TO has found applications in a wide range of physical disciplines such as fluid mechanics [7], electromagnetics [8], and acoustics [9]; it is widely used in the aerospace and automotive industry, civil engineering, materials science, and biomechanics.

This project has received funding from Honda Research Institute Europe GmbH, Germany.

Nivesh Dommaraju nivesh.dommaraju@tum.de
Using TO, multiple designs can be generated using different methods, which can be classified into the following three groups:
- Parameter sampling. Novel designs can be generated by varying the material properties, constraints, boundary conditions, and hyperparameters of the optimization algorithm. For example, in TO, changing the allowed mass in a prescribed design domain results in a new design (Fig. 1). Furthermore, the problem description, e.g., the boundary conditions, can also be varied without changing the final design objective. For example, while designing a component to support a fixed load using, say, three support legs, one can change the allowed fraction of load in the supports to obtain a new design [10].
- Multiobjective optimization. In practice, designs may need to be optimized for multiple objectives, e.g., energy absorption under crash loads, and structural stiffness under smaller static loads. Multiobjective TO [11] with conflicting objectives yields a set of Pareto-optimal designs, where choosing a design with better performance for one objective results in performance deterioration of another objective.
- Multimodal optimization. Highly complex and nonlinear objectives are normally multimodal, i.e., they have several local optima. Since not all the constraints are known in the early development phase, having a set of local optima is useful, in case some of them violate the unknown constraints in the future. Such designs can be identified using evolutionary algorithms [12] or by restarting gradient-based optimization algorithms, such as TO, from different initial configurations [13], converging to a different optimum in each of the runs.

Fig. 1 TO designs optimized for structural compliance and constrained to different volume fractions in the cubic design space. The optimization objective is structural compliance of the design under two fixed loads, shown by white arrows, with a fixed boundary. The allowed volume fraction ranges from 0.3 to 0.1, from left to right
The aforementioned methods can potentially generate multitudes of designs in the early product development phases. A challenging task is to explore the different concepts and identify a few interesting designs for further review. Designs can be selected based on experience, manufacturing cost, or design performance. For instance, the Dream Lens tool [10] is an interactive framework that guides a designer to a set of interesting designs using, for example, range constraints on performance. For Pareto-optimal designs obtained in multiobjective optimization, the decision-maker (DM) can select designs of interest based on the relative importance of different objectives [14,15]. However, in practice, it is difficult to define the intent of a DM objectively, especially with a large number of objectives [16]. Hagg et al. [17] propose an unsupervised data-centric approach for exploring designs by finding clusters of similar designs based on performance, e.g., the drag coefficient of an airplane. The medoids of different clusters are interpreted as representative designs, which can be recommended to an engineer for review and further development. In this study, we use this automated approach to explore designs. However, the engineer still needs to choose the features, metrics, and clustering methods, which determine the final recommended designs.
Along with the performance of a design, differences in geometric properties such as size, shape, and topology are also important, especially at the early stages of product development. Quantifying the geometrical differences allows the identification of similar designs and their representatives in a dataset [18]. Furthermore, metrics for geometrical differences allow the use of similarity-controlled optimization methods [19,20] to yield designs similar to a set of reference designs, which might be desirable due to economic reasons, manufacturing limitations, or ease of integration into the existing design process. In all of these applications, the metric used to measure geometrical differences is crucial and is the subject of our study.
Numerous metrics exist with varying degrees of accuracy and computational complexity. Due to the high dimensionality of 3D geometric data, each metric may only compare a few of the geometric properties depending on the complexity of designs. Hand-crafted geometric feature vectors based on surface curvature, material distribution statistics, or spectral descriptors [21] can be used to distinguish designs. In contrast, data-centric methods, which are more successful in practice, extract features relevant to a specific 3D geometric dataset, e.g., PCA (Principal Component Analysis) [22] can identify a reduced number of uncorrelated features that explain the variance in data. Other dimensionality reduction techniques popular in the machine learning field can also be used for geometric feature extraction, e.g., NMF (Non-negative Matrix Factorization) [23], t-SNE (t-distributed Stochastic Neighbor Embedding) [24], or UMAP (Uniform Manifold Approximation and Projection) [25]. More sophisticated methods based on deep learning networks exist too: Qi et al. [26] use PCAE (Pointcloud Autoencoder) to learn features from a pointcloud obtained by sampling points on the surface of the design. The reduced representation has the additional benefit of associating geometric features with a design like any other performance measure. However, it is not clear whether the Euclidean distance in the reduced representation is as meaningful as reference metrics such as voxel distance. So, we propose a novel method to validate the Euclidean distance by measuring its similarity with reference metrics commonly used in 3D object classification.
In this paper, we quantitatively compare different metrics of geometrical differences based on criteria that are important when analyzing topologically complex design datasets obtained using structural optimization. The criteria considered in this work are as follows:
• The metrics should be sensitive to geometrical differences in size, shape, topology, and orientation in a design space. Invariance to rotation, reflection, and translation operations is considered an advantage in 3D design classification, but not for TO designs, where the configuration relative to the boundary conditions is important.
• The metrics should allow the identification of clusters of similar designs even in topologically complex datasets. This allows the recommendation of diverse designs using clustering methods.
• Metrics that use vectors of geometric features associated with designs are preferred since the feature vectors enable 2D visualization of the complete dataset using manifold learning techniques [18,24], easing data exploration. Furthermore, such features can be combined with performance features directly. However, in this case, we need to ensure that the Euclidean distance in the space of geometric features is still meaningful.
In what follows, Section 2 discusses different geometric representations, such as voxel and pointcloud representations, of 3D geometric data, which are used by some intuitive reference distance metrics of geometrical differences (Section 3). Section 4 introduces a few dimensionality reduction techniques to extract geometric features which are used to derive new metrics. In Section 5, we present methods to evaluate metrics based on different properties. Section 6 describes the datasets used to evaluate the metrics. The results of the evaluation are shown in Section 7. Using simple datasets, we verify if the features capture our intuition on geometrical differences. With topologically complex design sets, the properties of different metrics are highlighted. Since our final goal is to identify geometrically similar design classes in a dataset, we compare different feature extraction methods based on the clustering performance with the use of a challenging dataset. Finally, in Section 8, we explore topologically optimized designs using geometric features. Section 9 concludes this study with the key results and an outlook for further research.

Geometric representation
The geometric representation of a design determines the available metrics and feature extraction methods. In this study, we consider two geometric representations: voxel and pointcloud representation (Fig. 2). The former is a natural representation in TO since the design domain is generally discretized into voxels [1], while the latter is a compact and expressive representation popular in 3D object recognition, classification, and segmentation [26,27].
Voxel representation A 3D domain containing the design is discretized into voxels using a regular grid. The voxel representation of the design is a vector of values $x \in \{0, 1\}^n$, where n is the number of voxels. Each component $x_i$ corresponds to a specific voxel in the design domain. If the design occupies the voxel i, $x_i = 1$; otherwise, $x_i = 0$. For voxels that are only partially occupied by the design, we assign $x_i = 1$ when the majority of the voxel volume is occupied. The vector x can be large since the design space may be finely discretized to resolve the complexity of the design. This representation is convenient for TO designs, which are generally optimized in a voxelized domain [1]. An interesting research question is how well the dimensionality reduction techniques currently popular in the machine learning field can identify underlying design patterns and extract relevant features from the voxel representations of TO designs.
Pointcloud representation Compared to the voxel representation, a pointcloud is a compact representation since it only samples the points on the surface of a design. Geometric learning methods using this representation [26,27] are interesting since they can identify different classes of shapes in publicly available datasets such as ShapeNet [28].
Other geometric representations include octrees, 3D meshes, and multi-view projections. The octree-based representation [29] alleviates the high memory usage of a voxel representation by using a higher resolution of voxels only where it is required, e.g., near the surface of a design. 3D meshes, similar to pointcloud representations, are compact. They represent the surface of a 3D geometry using a set of polygon faces. A graph representation can also be used to represent 3D meshes. Due to the high computational cost of these representations, there is an increasing interest among researchers in working with low-dimensional pointcloud representations [30]. For our initial analysis of TO designs, we consider metrics provided by voxel and pointcloud representations.

Reference distance metrics
In this section, we describe a few intuitive metrics used to measure geometrical differences between designs. These metrics serve as a reference to compare with the metrics that are derived from dimensionality reduction techniques as explained in the next section.
Voxel distance Voxel representations of two designs can be compared when the voxel arrays have the same size and correspond to the same regular grid in the 3D domain. The Euclidean distance in the voxel representation is equal to the square root of the number of non-overlapping voxels, i.e., voxels that are occupied by only one of the designs. A disadvantage of this metric is that it is insensitive to the position of non-overlapping voxels. For two designs that do not overlap, the metric is invariant to their relative position in the design domain, as long as the voxel grid in the domain does not change.
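The insensitivity to position can be illustrated with a minimal sketch. It assumes binary voxel vectors on a shared grid; the function name `voxel_distance` and the 1D toy strip standing in for a flattened 3D grid are illustrative choices, not part of the study.

```python
import math

def voxel_distance(x, y):
    """Euclidean distance between two binary voxel vectors on the same grid."""
    if len(x) != len(y):
        raise ValueError("voxel vectors must share the same grid")
    # For 0/1 entries, (xi - yi)**2 equals 1 exactly where only one design
    # occupies the voxel, so the squared distance is a mismatch count.
    mismatches = sum(1 for xi, yi in zip(x, y) if xi != yi)
    return math.sqrt(mismatches)

# Toy data (hypothetical): a strip of 8 voxels stands in for a flattened grid.
a    = [1, 1, 0, 0, 0, 0, 0, 0]
near = [0, 0, 1, 1, 0, 0, 0, 0]  # same block, moved right next to `a`
far  = [0, 0, 0, 0, 0, 0, 1, 1]  # same block, moved to the far end

d_near = voxel_distance(a, near)
d_far = voxel_distance(a, far)
```

Both shifted blocks have no overlap with `a`, so the two distances coincide even though one block is much farther away than the other.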

Chamfer distance (CD)
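A pointcloud is a compact representation obtained by sampling points on the surface of a 3D geometry. The CD [26,27] is a metric used to measure the difference between two pointclouds, say $S_1, S_2 \subset \mathbb{R}^3$ (Fig. 3). In the form used in [27], it is defined as follows:
$$d_{CD}(S_1, S_2) = \sum_{a \in S_1} \min_{b \in S_2} \lVert a - b \rVert_2^2 + \sum_{b \in S_2} \min_{a \in S_1} \lVert a - b \rVert_2^2,$$
where $a, b \in \mathbb{R}^3$.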
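As a minimal sketch, the CD can be computed by brute force for small pointclouds. The sketch assumes the common form in which squared Euclidean nearest-neighbor distances are summed over both directions, as in [27]; the function name and the toy points are hypothetical, and practical implementations would use a spatial index or a GPU kernel instead of the double loop.

```python
def chamfer_distance(s1, s2):
    """Sum of squared nearest-neighbor distances, taken in both directions."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    def one_sided(src, dst):
        # For each point in src, find the closest point in dst.
        return sum(min(sq_dist(a, b) for b in dst) for a in src)

    return one_sided(s1, s2) + one_sided(s2, s1)

# Toy pointclouds (hypothetical); the two sets may differ in size.
s1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
s2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

cd = chamfer_distance(s1, s2)
```

Note that no one-to-one correspondence between the points is required, which is what makes the CD cheaper than an assignment-based distance.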
Earth mover distance (EMD) Similar to CD, EMD [27] is calculated on a pointcloud representation. It solves an optimization problem to find a mapping $\Psi : S_1 \to S_2$ between the points of the pointclouds such that the objective $\sum_{a \in S_1} \lVert a - \Psi(a) \rVert_2$ is minimized. We use an approximate but fast algorithm for EMD calculation proposed by Achlioptas et al. [27]. Still, EMD is computationally more expensive than the CD.
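To make the optimization problem concrete, the following sketch evaluates the EMD exactly for tiny pointclouds by enumerating all one-to-one mappings. It assumes that the mapping is a bijection between equally sized pointclouds; this brute-force search is only an illustration and is not the approximate algorithm of [27]. The function name and the toy points are hypothetical.

```python
import itertools
import math

def emd_bruteforce(s1, s2):
    """Exact EMD for tiny, equally sized pointclouds (factorial cost)."""
    if len(s1) != len(s2):
        raise ValueError("this sketch assumes pointclouds of equal size")
    best = math.inf
    # Each permutation is one candidate bijection from s1 to s2.
    for perm in itertools.permutations(range(len(s2))):
        cost = sum(math.dist(a, s2[j]) for a, j in zip(s1, perm))
        best = min(best, cost)
    return best

# Toy pointclouds (hypothetical): s2 is s1 lifted by one unit and shuffled.
s1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
s2 = [(2.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

emd = emd_bruteforce(s1, s2)
```

The number of candidate mappings grows factorially with the number of points, which illustrates why an approximate solver is needed for realistic pointcloud sizes.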
The three metrics (voxel distance, CD, and EMD) are sensitive to any changes in configuration relative to the boundary conditions, e.g., due to rotation, translation, or reflection. The voxel distance compares very high-dimensional voxel representations. CD and EMD are functions defined with two designs as input. Other metrics such as Wasserstein distance [31] are not considered here since we only need a few reference metrics to demonstrate our metric evaluation method. As discussed previously, the emphasis of this study is to evaluate metrics based on dimensionality reduction techniques, the benefits of which will be discussed later.

Metrics based on dimensionality reduction techniques
Dimensionality reduction techniques identify the underlying patterns in a dataset, summarizing each data point with a lower-dimensional feature vector. These methods are applied to reduce the high-dimensional 3D geometric data to a feature vector, referred to as geometric features in this work. The Euclidean distance between these low-dimensional feature vectors, when meaningful, can be used as a metric of geometrical differences. In the later sections, we investigate whether such derived metrics are indeed meaningful by comparing them with the reference distance metrics.
In this study, we investigate the use of dimensionality reduction methods such as PCA [22], NMF [23], t-SNE [24], UMAP [25], and PCAE [27]. Although this is not an exhaustive study, these methods are representative and widely used in different fields. PCA extracts non-redundant features using linear transformations [22,32]. t-SNE and UMAP are nonlinear dimensionality reduction techniques used in machine learning to visualize high-dimensional data. PCAE is an effective feature extractor for pointcloud representation used in object classification [26,27].
Principal component analysis PCA projects a set of n data points, each with d features, onto a new basis. The new basis is constructed such that the new coordinates $p_k$ are uncorrelated. The dataset $X_{d \times n}$ has the highest variance in the first principal component, and the variance in the components $p_i$ decreases as the order i increases [33]. It is often sufficient to consider only a few of the principal components to explain the variance in data, resulting in dimensionality reduction. For a voxel representation as input, d is the number of voxels in the design domain and n is the number of designs.
Non-negative matrix factorization NMF [34] factorizes the input data $X_{d \times n} \approx W_{d \times p} H_{p \times n}$, where W and H have non-negative entries. In general, $p \ll \min(n, d)$, which means that each data point $x_i$ (column i of X) can be expressed as a linear combination of the columns of W, i.e., $x_i \approx W h_i$, where $h_i$ (column i of H) is the reduced-dimensional representation. Since W and H have non-negative entries, data such as images or voxels are decomposed into interpretable components.

t-distributed stochastic neighbor embedding t-SNE [24] is a method to embed high-dimensional data in 2D or 3D. The method is especially useful when visualizing clusters in data because similar data points are kept close in the reduced coordinates with high probability.
Uniform manifold approximation and projection UMAP, similar to t-SNE, is a method to visualize high-dimensional data. McInnes et al. [25] argue that UMAP preserves the inter-cluster distance better than t-SNE. Empirical studies using t-SNE [24] and UMAP [25] show how high-dimensional data can be embedded into 2D without losing the cluster structure. For example, 2D image data of handwritten digits can be reduced to 2D clusters, where images of different digits are separated clearly.
Pointcloud autoencoder An autoencoder is an unsupervised learning method used to reduce the dimensions of an input representation. Umetani et al. [35] parameterize the surface of a shape, which is assumed to be approximately convex. The parameters defined to generate the surface mesh are used as the input vector to an autoencoder. Recent studies on 3D datasets [26,27,36] use autoencoders to extract features from pointcloud representations of designs and identify everyday objects such as chairs, cars, and airplanes. In this study, we use pointcloud representations as input to build a pointcloud autoencoder (PCAE). Pointclouds are simpler to compute and have no additional assumptions on the shape.
For PCAE, we use the neural network architecture proposed by Achlioptas et al. [27]. Schematically, the architecture comprises two stacks of neural network layers: encoding and decoding layers. Encoding layers reduce the dimension of the input pointcloud to a latent code that is used by the decoding layers to reconstruct an output pointcloud similar to the input. At the start of the training process, the weights used by the network are randomly initialized and the reconstruction is inaccurate. By measuring the difference between output and input pointclouds using a loss function, the autoencoder learns to adjust the weights of the network to reconstruct the input accurately. Since the latent code has fewer dimensions than the input representation, the PCAE achieves dimensionality reduction. Rios et al. [37] demonstrate that a pointcloud autoencoder allows the identification of nonlinear subregions in the design space, each preferentially occupied by a subclass of designs. This explains the usefulness of the latent code in object classification. In this study, we use the CD as the loss function instead of EMD to train the PCAE since CD is computationally cheaper than EMD and is found to be sufficient for our application. It provides meaningful geometric features and clusters similar designs in the test datasets (Section 7).
Each dimensionality reduction technique yields a geometric feature vector for a design. If the Euclidean distance in geometric features captures the geometrical differences, they can be treated as any other performance feature. An aggregate vector with geometric and performance features can be used to find similar designs. For example, this would help to highlight designs with a similar geometrical structure but with different performance values and vice versa.
To verify whether the Euclidean distance in geometric features is meaningful, it should be compared with the reference metrics (Section 3), which are designed to measure geometrical differences. We propose a method to do this in the next section.

Methods for evaluating metrics
In this section, we propose two methods to evaluate the different target metrics. The first method compares a given target metric (TM) with a reference metric (RM) which is known to capture at least some aspects of geometrical differences. The second method evaluates the metrics by measuring their clustering performance.

Metric correlations
It is difficult to define geometrical differences between any two 3D geometries objectively. The problem is simplified if the two designs differ only in a simple geometric property. For example, consider a simple set of designs that are obtained by rotating a template design. The angle between any two designs can then serve as an RM. Although such metrics are not generally applicable, we can use them to evaluate more general TMs. If RM is a reasonable metric for a specific dataset, the distances measured by a more general TM should be at least similar to that of RM for the given dataset. We discuss here how to measure this similarity between any two metrics (e.g., TM and RM).
Consider a set of N geometries $G = \{G_e \mid e = 1, \dots, N\}$ and a collection of its geometry pairs: $P = \{p_i = (G_m, G_n) \mid m \neq n\}$. A metric, M, measures the distance between the geometries of a pair ($p_i$). The collection of distances measured by the metric is given by $D_M = \{M(p_i) \mid p_i \in P\}$. To measure the similarity between the metrics RM and TM, we find the correlation between $D_{RM}$ and $D_{TM}$. A high correlation between the measured values indicates that the metrics are similar. Figure 4 shows the proposed workflow to compare a TM with an RM.
Correlation measure Pearson correlation ($\rho_p$) [38] measures the linear correlation between any two input variables. It ranges between -1 and +1. If the variables have a positive linear correlation, we expect the correlation to be near +1. In our case, the two variables are $D_{RM}$ and $D_{TM}$. In general, the values of $D_{RM}$ and $D_{TM}$ may not be linearly related. But if they have a monotonic relation, the metrics can distinguish geometries equally well. So, from here on, we rely on Spearman correlation ($\rho_s$) [38], rather than Pearson correlation ($\rho_p$), to compare metrics. When the correlation between the distances measured by two metrics is high, i.e., $\rho_s$ is close to +1, we say that the two metrics correlate well or the metric correlation is high.
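The workflow can be sketched in a few lines under simplifying assumptions: both metrics are symmetric, so each unordered pair of geometries is evaluated once, and the Spearman correlation is computed as the Pearson correlation of average ranks. The helper names and the toy "geometries" (plain rotation angles) are hypothetical and only serve to show why a monotonic but nonlinear relation favors $\rho_s$ over $\rho_p$.

```python
import itertools
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(u, v):
    # Assumes neither input is constant (non-zero variance).
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def spearman(u, v):
    return pearson(ranks(u), ranks(v))

def metric_correlation(geometries, rm, tm):
    """Correlate the distances measured by RM and TM over all pairs."""
    pairs = list(itertools.combinations(geometries, 2))
    d_rm = [rm(g, h) for g, h in pairs]
    d_tm = [tm(g, h) for g, h in pairs]
    return spearman(d_rm, d_tm), pearson(d_rm, d_tm)

# Toy setup (hypothetical): each geometry is represented by its angle.
angles = [0, 10, 25, 45, 70, 90]
rm = lambda a, b: abs(a - b)        # reference metric: angular difference
tm = lambda a, b: abs(a - b) ** 3   # target metric: monotonic in RM, nonlinear

rho_s, rho_p = metric_correlation(angles, rm, tm)
```

In this toy case the target metric orders all pairs exactly as the reference metric does, so $\rho_s$ reaches its maximum while $\rho_p$ stays below it.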
The proposed method can empirically determine whether a TM can quantify geometrical differences at least as well as the RM. We complement this evaluation by investigating whether the metrics are also good at clustering similar designs in a topologically complex dataset.

Comparison based on clustering performance
One of the objectives of this study is to identify groups of designs that look geometrically similar. Since geometries in a TO dataset have no pre-determined class labels, we concentrate on unsupervised object classification methods. In addition to the clustering method chosen, the metric used for clustering leads to different clusters of designs. It is difficult to verify the performance of the different clusterings using unlabeled datasets, even though this is the target application. So, we use labeled test datasets for evaluating our method.

Fig. 4 Workflow to measure similarity between two metrics RM and TM. Note that for each pair of geometries, we obtain two different distance measures by RM and TM. The correlation between the distances measured indicates the degree of similarity between the metrics
Relabeling using the majority label method Each design $D_i$ in the labeled dataset has a ground-truth label $g_i$ according to its class and a cluster label $c_i$ obtained from clustering. Each cluster $C_k$ in the dataset is a set of designs with a common cluster label k: $C_k = \{D_i \mid c_i = k\}$. If the clustering is successful in identifying the subclasses, then all or most of the designs in a cluster have a common ground-truth label. For example, Fig. 5 shows how a dataset with three subclasses can be assigned arbitrary cluster labels, even if all the subclasses are accurately identified by the clustering algorithm. In practice, when the clustering is not perfect, a cluster can contain designs with different ground-truth labels, but a majority of them may share a single ground-truth label that can be used to remap the cluster label (Fig. 5). This allows for the use of standard measures of classification performance where predicted labels and ground-truth labels are compared. One of the measures, called precision, finds, for each label, the proportion of designs predicted with that label whose ground-truth label agrees. We also use the weighted average of the F1-score [39], which takes into account both the precision and recall scores for each label and then weighs the score by the number of samples in each cluster. In addition to F1-score and precision, we report the adjusted mutual information score (AMI) [40], one of the state-of-the-art methods to measure multi-label classification accuracy. AMI is invariant to permutations of the labels and does not need the relabeling step discussed previously. It is adjusted for chance, which ensures that a random labeling gets a score close to zero. However, AMI is not as intuitive as the classification measures discussed previously. Furthermore, by analyzing the F1-score and precision of individual cluster labels, one can identify which design classes are mislabeled.
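The relabeling step can be sketched as follows. The helper names and the toy labels are hypothetical; ties in the majority vote are resolved by whichever label is encountered first, which is an assumption of this sketch and not something the method prescribes. Precision is computed per predicted label as the fraction of designs carrying that label whose ground-truth label agrees.

```python
from collections import Counter, defaultdict

def relabel_by_majority(truth, clusters):
    """Replace each cluster label by the majority ground-truth label in it."""
    members = defaultdict(list)
    for g, c in zip(truth, clusters):
        members[c].append(g)
    mapping = {c: Counter(gs).most_common(1)[0][0] for c, gs in members.items()}
    return [mapping[c] for c in clusters]

def precision_per_label(truth, predicted):
    """Fraction of designs predicted as a label that truly carry it."""
    scores = {}
    for label in set(predicted):
        assigned = [g for g, p in zip(truth, predicted) if p == label]
        scores[label] = sum(1 for g in assigned if g == label) / len(assigned)
    return scores

# Toy example (hypothetical): cluster ids are arbitrary integers, and one
# design of class "B" has been placed in the cluster dominated by class "A".
truth    = ["A", "A", "A", "B", "B", "B", "C", "C"]
clusters = [2, 2, 2, 0, 0, 2, 1, 1]

predicted = relabel_by_majority(truth, clusters)
precision = precision_per_label(truth, predicted)
```

After relabeling, the predicted labels live in the same label space as the ground truth, so per-label scores directly show which design class absorbed the misplaced design.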
For evaluating the metrics, the designs available in the public domain such as ShapeNet [28] are topologically not as complex as the designs obtained in TO. So, we generate complex topologies with well-defined subclasses, which are described in the next section.

Design generation
In this section, we present the datasets used to evaluate different metrics. The first part of the section covers simple datasets where the geometrical difference is easier to quantify. This is followed by more complex truss-like designs, which resemble structures created using TO.

Ellipsoidal designs
We generate three datasets where geometrical differences between the designs in a dataset can be easily quantified. Within a dataset, designs are obtained by translation, rotation, or elongation of a reference design. Although these datasets are simple, we can evaluate if our target metrics capture geometrical differences that arise from these simple transformations.
The reference geometry for these datasets is an ellipsoid, which can be represented using a Moving Morphable Component (MMC) with a form factor m = 2 [41]. We use these datasets to illustrate our method and draw initial observations, while more sophisticated datasets are presented later. The reference MMC ellipsoid, also called beam here, can be transformed by varying the position of the center of mass ($C \in \mathbb{R}^3$), lengths along the three principal axes ($L \in \mathbb{R}^3$), and Euler angles ($E \in \mathbb{R}^3$) representing the orientation.
The resulting geometry is defined using a level set function $\Phi : \mathbb{R}^3 \to \mathbb{R}$: the surface of the object is given by $\{x \in \mathbb{R}^3 \mid \Phi(x) = 0\}$. The interior of the object is given by $\{x \in \mathbb{R}^3 \mid \Phi(x) > 0\}$. The level set formulation is convenient to construct the voxel and pointcloud representations used in this study. As described in Section 2, to construct a voxel representation, a common domain containing the designs is split into voxels. To construct the voxel representation of a design, occupied voxels can be found by evaluating the condition $\Phi(x_i) \geq 0$ at the center $x_i$ of the i-th voxel. Using the marching cubes algorithm [42,43], the volumetric representation is converted to a triangulated surface mesh. For the pointcloud representation, points are uniformly sampled on the surface mesh, while ensuring uniform distribution in each triangular face [44,45].

Fig. 6 Beam-rotation dataset: An ellipsoidal beam is rotated by different angles to create different designs
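The voxelization step can be sketched for a single axis-aligned ellipsoid. The exact MMC level set function of [41] is not reproduced in the text, so the sketch assumes the simple form $\Phi(x) = 1 - \sum_k ((x_k - C_k)/L_k)^2$, treats L as semi-axis lengths, and omits the rotation by the Euler angles E. The function names, the domain $[-1, 1]^3$, and the grid resolution are illustrative choices.

```python
def ellipsoid_phi(x, center, semi_axes):
    """Assumed level set: positive inside, zero on the surface, negative outside."""
    return 1.0 - sum(((xi - ci) / li) ** 2
                     for xi, ci, li in zip(x, center, semi_axes))

def voxelize(phi, n, lo=-1.0, hi=1.0):
    """Evaluate phi >= 0 at the centers of an n x n x n grid on [lo, hi]^3."""
    h = (hi - lo) / n
    centers = [lo + (i + 0.5) * h for i in range(n)]
    return [1 if phi((x, y, z)) >= 0 else 0
            for x in centers for y in centers for z in centers]

# A slender beam along the x-axis (hypothetical dimensions).
beam = lambda x: ellipsoid_phi(x, center=(0.0, 0.0, 0.0),
                               semi_axes=(0.8, 0.3, 0.3))

voxels = voxelize(beam, n=16)
fill = sum(voxels) / len(voxels)  # occupied volume fraction of the domain
```

The resulting flat 0/1 list has the same layout for every design on this grid, which is the precondition for comparing designs with the voxel distance.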
Beam-rotation dataset For this dataset, an ellipsoidal beam is rotated by different angles about a fixed axis to get new designs (Fig. 6). The difference in the rotated angle then serves as the reference metric. The rotation angle is kept below 90° due to the polar symmetry of the object. For rotations above 90°, the difference in angle is not a good metric. For example, consider a beam $B_1$ rotated by angles θ and 180° − θ to give beams $B_2$ and $B_3$, respectively. Due to polar symmetry, $B_2$ and $B_3$ fully overlap, but the angular difference indicates that $B_1$ is more similar to $B_2$ than to $B_3$. This dataset contains 20 designs.
Beam-elongation dataset For this dataset, an ellipsoidal beam is elongated by different lengths along a fixed principal axis to get new designs (Fig. 7). The difference in lengths along this axis then serves as the reference metric. This dataset contains 20 designs.
Beam-translation dataset For this dataset, an ellipsoidal beam is translated to different locations along a fixed axis to get new designs (Fig. 8). The difference in the position of the center of mass C then serves as the reference metric. This dataset contains 20 designs.

Topologically-complex designs
A more complex set of truss-like designs can be generated by combining multiple MMCs. Furthermore, we can use MMCs as a basis to generate labeled test datasets with well-defined subclasses.
MMC framework and similar feature mapping techniques are increasingly used in TO [46][47][48][49] since they can be used to construct complex geometries using a few design variables. Zhang et al. demonstrate the generation of arbitrarily curved beams by overlapping ellipsoids [46]. With a relatively small number of MMCs, MMC-based feature mapping techniques are able to generate complex topologies that are comparable to designs obtained by state-of-the-art density-based TO [50,51]. Even for highly nonlinear crash TO problems, it is found that the optimal structures are usually composed of interconnected beams [3]. So, it is reasonable to assume that a topologically optimized design can be represented using MMCs.
The interior of a geometry comprising multiple MMCs, say n MMCs, is defined by the level set function $\max_{i=1,\dots,n} \Phi_i$. As discussed previously, this allows for a conversion to voxel, surface   [18] to demonstrate a method for design exploration. Since we investigate the effect of topology rather than the effect of the shape, which is extensively studied in the literature [26,27,36], we use m = 2 to generate ellipsoidal MMCs.
Since we need complex topologies with well-defined subclasses as a test dataset, we propose to generate a connected truss-like design using a 3D geometric graph as a template. The nodes and edges of the graph are points and line segments in a 3D Euclidean space respectively. A design containing multiple beams is generated by aligning each beam, using one of its principal axes, along a distinct edge in the graph (Figs. 9, 10). We define a basegraph to generate a labeled test dataset, i.e., a set of designs with a subset of geometrically similar designs assigned a common subclass label. The basegraph is used to generate connected, distinct subgraphs which are used as a template to generate a subclass of similar designs. A subgraph can lead to multiple designs by varying the thickness of beams positioned along the edges of the graph.
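A minimal sketch of this construction is given below. It assumes that each beam is an ellipsoid of revolution whose long axis spans one edge of the graph, with a level set of the form $1 - (t/a)^2 - (r/b)^2$ (t: coordinate along the edge, r: distance from the edge axis), and that the truss is the union of its beams, i.e., the maximum over the individual level set functions. The function names, the toy graph, and the thickness values are hypothetical.

```python
import math

def beam_phi(x, p, q, radius):
    """Assumed level set of an ellipsoidal beam aligned with the edge p-q."""
    center = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    half_len = math.dist(p, q) / 2
    axis = [(qi - pi) / (2 * half_len) for pi, qi in zip(p, q)]
    d = [xi - ci for xi, ci in zip(x, center)]
    t = sum(di * ui for di, ui in zip(d, axis))   # coordinate along the edge
    r2 = sum(di * di for di in d) - t * t         # squared distance from axis
    return 1.0 - (t / half_len) ** 2 - r2 / radius ** 2

def truss_phi(x, nodes, edges, radii):
    """Union of beams: the maximum over the individual level set functions."""
    return max(beam_phi(x, nodes[i], nodes[j], r)
               for (i, j), r in zip(edges, radii))

# Toy template graph (hypothetical): two edges meeting at node 1.
nodes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
edges = [(0, 1), (1, 2)]
radii = [0.1, 0.2]  # varying these yields different designs of one subclass

phi_mid_first = truss_phi((0.5, 0.0, 0.0), nodes, edges, radii)
phi_in_second = truss_phi((1.0, 0.5, 0.1), nodes, edges, radii)
phi_outside = truss_phi((0.0, 1.0, 0.0), nodes, edges, radii)
```

In this simplified form, the tips of neighboring beams touch only at the shared node; a practical generator would probably let the beams extend slightly past the nodes to obtain a solid connection.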
In this study, we use three datasets using two different basegraphs with an increasing amount of complexity: a three cube (three back-to-back cubes) and a single cube, as shown in Figs. 9 and 11 respectively. For variations within a subclass, we change the thickness of an MMC using a uniform random distribution.
Three cube trusses This set of 150 designs is based on the basegraph shown in Fig. 9. Using different subgraphs, six subclasses are generated, of which two are based on subgraphs as shown in Fig. 9. Samples from six of the subclasses are shown in Fig. 10. Note that the center of mass changes significantly from subclass to subclass.

Single cube trusses This set of 275 designs is based on the basegraph shown in Fig. 11. As discussed, each subclass in the dataset is restricted to a connected subgraph. Eleven different subgraphs generate the subclasses. Samples from six of the subclasses are shown in Fig. 12. Note that the designs from different subclasses differ in orientation if they have the same topology.
Randomized topologies This topologically more complex dataset challenges the classification performance (Section 5.2) of the clusters obtained by using different metrics. The dataset consists of 50 subclasses with 20 designs per class. Having so few designs per class challenges the deep learning method, which works better with more data. Each subclass of designs is based on a subgraph of the three-cube basegraph discussed previously. A subclass of designs is constructed using the following steps, which change not only the thickness of the beams in a design but also the length along the edge:
1. Define a subgraph for each subclass. The subgraph forms a template to construct different designs in the subclass.
(a) Randomly pick a subset of between 5 and 10 of the three-cube vertices (Fig. 9).
(b) Construct an edge for each possible combination of vertices.
2. Mutate the thickness and the length of the beams which are placed along the edges of the subgraph to get a new design in the subclass. Care is taken such that the
Variations in a subclass are controlled in a conservative way such that the designs still belong to the same subclass. All random values are picked using a uniform distribution in the specified ranges. When not specified, the range is [0, 1]. Three samples of three of the subclasses are shown in Fig. 13.
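The two construction steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-cube basegraph is taken to have its vertices on a 4 x 2 x 2 integer grid, the function names are hypothetical, and the thickness and length ranges default to [0, 1] because the actual ranges are not given here. The conservative checks that keep a design inside its subclass are not modeled.

```python
import itertools
import random

def three_cube_vertices():
    """Vertices of three back-to-back unit cubes: a 4 x 2 x 2 grid (assumed layout)."""
    return [(x, y, z) for x in range(4) for y in range(2) for z in range(2)]

def random_subgraph(rng, min_nodes=5, max_nodes=10):
    """Step 1: pick 5 to 10 vertices and connect every pair of them with an edge."""
    vertices = three_cube_vertices()
    nodes = rng.sample(vertices, rng.randint(min_nodes, max_nodes))
    edges = list(itertools.combinations(nodes, 2))
    return nodes, edges

def mutate_design(rng, edges, thickness_range=(0.0, 1.0), length_range=(0.0, 1.0)):
    """Step 2: draw a thickness and a length fraction per beam from uniform ranges."""
    return [
        {"edge": edge,
         "thickness": rng.uniform(*thickness_range),
         "length_fraction": rng.uniform(*length_range)}
        for edge in edges
    ]

rng = random.Random(0)
nodes, edges = random_subgraph(rng)                         # template of one subclass
subclass = [mutate_design(rng, edges) for _ in range(20)]   # 20 designs per subclass
```

Each entry of `subclass` lists the beams of one design; turning these parameters into MMC geometry is left out of the sketch.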

Topologically optimized designs
This dataset comprises 1500 designs obtained using TO [1]. The design domain is restricted to a unit cubic domain with an optimization objective to minimize structural compliance. At the early stages of design, the exact boundary conditions to be applied to the design may not be known. With the advances in high-performance computing, it is now possible to generate a large set of feasible design layouts for an engineering component by considering different design constraints. In our dataset (Fig. 14), we arbitrarily vary the position of the fixed boundary and the two loads, simulating an extreme use case where the boundary conditions are not known. In practice, only a fixed set of configurations for boundary conditions may be perturbed slightly for a given design task [10]. Nevertheless, we use this dataset to demonstrate our design exploration approach.
Fig. 13 Randomized topologies: Each row shows designs from a subclass. With low probability, beams are cut or removed from the underlying basegraph

Different boundary conditions are generated as follows:
- We prescribe zero displacement for the nodes in an arbitrary rectangular boundary patch B in the fixed boundary face x = 0 (green area in Fig. 14). The rectangular patch is defined by its bounds along the y- and z-axes parallel to the face. The bound [b min, b max] along an axis is randomized such that 0 ≤ b min ≤ b max ≤ 1 and b min, b max ∈ U(0, 1).
- Two radial loads of unit magnitude are applied at randomly chosen centers within the cube. The load is distributed in a radius of 0.1 units. Each coordinate of the center is sampled from U(0, 1) such that the center is within the domain. The load vector is similarly calculated but its magnitude is normalized.
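The sampling of boundary conditions described above can be sketched as follows. The function names and the dictionary layout are hypothetical. Two points are assumptions: the ordering 0 ≤ b min ≤ b max ≤ 1 is enforced by sorting two uniform draws, and the load direction is drawn component-wise from U(0, 1) before normalization, which is one literal reading of "similarly calculated".

```python
import numpy as np

def sample_patch_bounds(rng):
    """Sorted pair of U(0, 1) draws, so that 0 <= b_min <= b_max <= 1."""
    b_min, b_max = np.sort(rng.uniform(0.0, 1.0, size=2))
    return float(b_min), float(b_max)

def sample_load(rng, radius=0.1):
    """One radial load: center inside the unit cube, direction of unit length."""
    center = rng.uniform(0.0, 1.0, size=3)
    direction = rng.uniform(0.0, 1.0, size=3)  # assumption: drawn like the center
    direction /= np.linalg.norm(direction)     # normalize to unit magnitude
    return {"center": center, "direction": direction, "radius": radius}

def sample_boundary_conditions(rng):
    """Fixed rectangular patch on the face x = 0 plus two unit loads."""
    return {
        "patch_y": sample_patch_bounds(rng),
        "patch_z": sample_patch_bounds(rng),
        "loads": [sample_load(rng) for _ in range(2)],
    }

rng = np.random.default_rng(0)
bc = sample_boundary_conditions(rng)
```

Calling `sample_boundary_conditions` once per design would give one configuration per TO run; mapping the patch and loads onto a finite element mesh is outside this sketch.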
The density-based TO method, SIMP [1], is used to optimize designs under static load for structural compliance, measured using internal energy stored in each element. A linear elastic material with the following properties is used: density (7.83 · 10^−9 ton/mm^3), Young's modulus (2.07 · 10^5 MPa), and Poisson's ratio (0.33).
The dataset has no prescribed subclasses or simple parameters that can capture geometrical differences, unlike the previous datasets. However, the dataset is representative of TO designs obtained in practice. Since the analysis of TO datasets is our intended application, we use this dataset to qualitatively evaluate a metric.

Evaluation on design datasets
In this section, we evaluate different metrics using the methods described in Section 5 on the datasets from Section 6.

Naming convention
A metric can be composed of other metrics starting with geometric representation and dimensionality reduction techniques

Hyperparameters
For the voxel representation of each design, we use a common resolution of 25 × 25 × 25. PCA and NMF use 10 components to reduce the voxel representation, which is found to be satisfactory for our datasets. As expected, t-SNE and UMAP use 2 components which enables the visualization of the dataset in 2D. Each design is represented with a pointcloud of 2048 points, which is used as the input to train the autoencoder. By default, the dimension of PCAE code is 128, except for the last dataset with TO designs, where 500 dimensions are needed to capture the increased complexity.
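As an illustration of how a composed metric such as (Voxel, PCA, ED) can be assembled with these hyperparameters, the following sketch reduces flattened 25 × 25 × 25 voxel grids to 10 components and measures Euclidean distances in the reduced space. It is a minimal sketch, not the implementation used in this study: PCA is computed through a plain SVD, the voxel data are random placeholders, and the helper names are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components=10):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def pairwise_euclidean(F):
    """Matrix of Euclidean distances between all rows of F."""
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
# Placeholder data: 30 random binary voxel grids, flattened to 15625 values each.
voxels = (rng.random((30, 25, 25, 25)) > 0.5).astype(float).reshape(30, -1)
features = pca_reduce(voxels, n_components=10)    # (Voxel, PCA)
distances = pairwise_euclidean(features)          # (Voxel, PCA, ED)
```

The same `pairwise_euclidean` helper would apply unchanged to any other feature vector, for example a 128-dimensional AE code.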

Metric correlations
For different datasets, we compare different metrics using the correlation coefficients (ρ) as discussed in Section 5. Recall that values measured by a metric M are denoted by the variable D M . We measure the correlation between distances measured by a given target metric (TM) with a reference metric (RM) which is known to be useful in capturing geometrical differences. Using the metric correlations, we discuss the deficiencies of a TM, if any.
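A minimal sketch of this comparison is given below, assuming that ρ s and ρ p denote the Spearman and Pearson coefficients computed over the distances of all unordered design pairs. The helper names are hypothetical and the toy data (a reference metric given by an angle difference and a target metric that is a monotone, nonlinear function of it) are placeholders, not results from this study.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def upper_triangle(D):
    """Distances for all unordered pairs (i < j) of a square distance matrix."""
    i, j = np.triu_indices(D.shape[0], k=1)
    return D[i, j]

# Toy example: RM is an angle difference, TM a monotone nonlinear function of it.
angles = np.linspace(0.0, 1.5, 16)
D_rm = np.abs(angles[:, None] - angles[None, :])
D_tm = np.sin(D_rm)
rho_s = spearman(upper_triangle(D_tm), upper_triangle(D_rm))
rho_p = pearson(upper_triangle(D_tm), upper_triangle(D_rm))
```

In this toy case the rank-based coefficient stays close to 1 while the linear coefficient is lower, which is the behavior expected for a monotone but nonlinear relation between two metrics.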

Beam-rotation dataset
The metric correlations for this dataset are shown in Fig. 15. RM is the Euclidean distance in MMC parameters which for any two designs in this dataset is the angular difference in their orientations. Figure 15a shows that the voxel distance correlates well with the RM. So, voxel distance agrees with our intuition of geometrical difference. Since the relation between the measured distances is nonlinear, the ρ p is worse, as expected.
Similarly, the TMs CD and EMD correlate well with the RM (Fig. 15b, c). However, for certain values of D RM, D TM takes multiple values, which results in a vertical segment of dotted points (for both D CD and D EMD). The reason for this is that the designs for this dataset are obtained by rotating a reference ellipsoid in steps of a constant angle, i.e., the orientations of designs are equispaced. As a result, multiple pairs of designs have the same value for D RM since within such pairs the relative location of geometries is the same. Yet, D TM can take different values for the same pairs due to the deficiency of a TM. For the pointcloud-based metrics such as CD and EMD, this effect occurs because the surface points are randomly sampled, which may result in some variation in the relative location of points. The effect is observed to a lesser extent with the voxel distance as well (Fig. 15a). This is due to the discretization error of the voxel representation. Even if the relative angle between the two designs is held constant, the actual voxel representation depends on the absolute orientation in the design domain.
Figure 15d-h show TMs based on different dimensionality reduction techniques. The metric based on PCAE, the TM in Fig. 15d, correlates very well with the RM (ρ s = 0.96). The slight deterioration is also apparent from the spread of points along the y-axis. PCA reduction (Fig. 15e) with just 10 components is equally good, even though the original dimension of the voxel representation (≈ 1.5 · 10^4) is large. t-SNE, the TM in Fig. 15g, has the worst correlation. This is expected since it is designed only for the visualization of clusters, if any, in the high-dimensional data. However, UMAP in Fig. 15h, although designed for the same application as t-SNE, shows a much better correlation. Figure 15f shows NMF with slightly worse performance than PCA, possibly due to the additional non-negativity constraints on the reduced components.
In summary, PCA and autoencoder-based metrics have a good correlation with the RM.
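The effect of random surface sampling on a pointcloud metric can be illustrated with a small chamfer distance sketch. Definitions of CD vary in the literature (squared or plain distances, sums or means); the version below, the sum of the two mean nearest-neighbor distances, is an assumption and not necessarily the variant used in this study. The ellipsoid sampler is a hypothetical helper and does not sample uniformly by area.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric chamfer distance: mean nearest-neighbor distance in both directions."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # all pairwise distances
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def sample_ellipsoid_surface(rng, n, semi_axes):
    """Random (not area-uniform) points on an axis-aligned ellipsoid surface."""
    u = rng.normal(size=(n, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # points on the unit sphere
    return u * np.asarray(semi_axes)                # stretch to the ellipsoid

rng = np.random.default_rng(0)
axes = (0.4, 0.1, 0.1)
A = sample_ellipsoid_surface(rng, 512, axes)
B = sample_ellipsoid_surface(rng, 512, axes)   # same shape, different random sample
C = B + np.array([0.5, 0.0, 0.0])              # that sample translated along x

cd_same_cloud = chamfer_distance(A, A)    # identical clouds give zero
cd_resampled = chamfer_distance(A, B)     # nonzero only because of sampling noise
cd_translated = chamfer_distance(A, C)    # dominated by the actual shift
```

The nonzero value of `cd_resampled` for two samplings of the same geometry is the kind of variation that spreads D TM for pairs with identical D RM.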

Beam-elongation dataset
For this dataset, the results are shown in Fig. 16. The RM measures the difference in lengths along the axis of elongation. The results are very similar to those with the beam-rotation dataset (Fig. 15).

Beam-translation dataset
For this dataset, the results are shown in Fig. 17. The RM, Euclidean distance in MMC parameters, measures the difference in the location of the design for this dataset. Figure 17a, with voxel distance as TM, shows a difference in the behavior compared to the beam-rotation and beam-elongation datasets. The voxel distance does not change when the positional difference is more than a threshold (x = 4). This is because the designs stop overlapping for this range and the sum of voxel differences (D TM) is constant (= total number of voxels in the two designs). So, voxel distance is disadvantageous as a metric for TO designs since the position of the non-overlapping material in the design domain of TO is relevant. Other metric comparisons are very similar to those obtained with the beam-rotation dataset (Fig. 15).
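The saturation of the voxel distance can be reproduced with a one-dimensional toy example. The sketch assumes that the voxel distance counts the voxels in which two occupancy grids differ; the beam length of 4 cells and the grid size are arbitrary illustrative values, and the function names are hypothetical.

```python
import numpy as np

def beam_voxels(offset, length=4, size=16):
    """1D stand-in for a voxelized beam: `length` filled cells starting at `offset`."""
    grid = np.zeros(size, dtype=bool)
    grid[offset:offset + length] = True
    return grid

def voxel_distance(a, b):
    """Number of voxels in which the two occupancy grids differ."""
    return int(np.count_nonzero(a != b))

reference = beam_voxels(0)
shifts = list(range(9))
distances = [voxel_distance(reference, beam_voxels(s)) for s in shifts]
# distances == [0, 2, 4, 6, 8, 8, 8, 8, 8]
```

Once the shift reaches the beam length, the two beams no longer overlap and the distance stays at the total number of filled voxels in both designs, regardless of how far apart they are.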
The three datasets discussed here use an MMC beam with a form factor m = 2. It is interesting to see if using other MMC shapes similar to bipyramids (m = 1) or cuboids (m = 6) affects the results. So, we repeated each of the experiments above with m ranging from 1 to 6. For a given m value, the results are reasonably similar for a given transformation type. The correlation between CD and EMD is very high (ρ s ≥ 0.99). The autoencoder-based metric also agrees well with CD (ρ s ∈ [0.96, 1]). As expected, voxel distance is similar to CD (ρ s ∈ [0.98, 1]) for these datasets. Other than t-SNE, using dimensionality reduction techniques on the voxel representation does not significantly reduce the metric correlation with CD (ρ s ∈ [0.95, 1]). With t-SNE, the metric correlation is quite low (ρ s ∈ [0, 0.3]) even for the simple datasets described here. Given the consistency of results with different MMC form factors, we expect similar results with any other shapes.

Single cube truss dataset
For the datasets based on single cube, three cubes, and random topologies, the geometrical differences cannot be captured by differences in simple shape parameters. So, we compare metrics with chamfer distance (Pointcloud, CD) as reference (RM), which is popular in 3D object recognition [27]. The metric correlation study indicates that the voxel distance has almost no correlation (ρ s = 0.14) with the RM while EMD is very similar to CD even for complex datasets. Since CD is cheaper to compute, one may prefer it to EMD. The latent code of PCAE (AE code) used in this study is trained with chamfer distance (CD) as the loss function and hence shows a very high correlation with it (ρ s = 0.96). Other metrics based on dimensionality reduction of voxel data show an improved correlation (ρ s ∈ [0.4, 0.6]) compared to voxel distance (ρ s = 0.14). So, for complex designs, it is beneficial to reduce the dimensions by removing redundant features.

Three cube truss dataset
The metric correlations for this dataset are similar to the results obtained for the single cube truss dataset. Voxel distance has a low correlation with CD (ρ s = 0.55), which is improved by using dimensionality reduction techniques (ρ s ∈ [0.6, 0.8]). The highest improvement is obtained with the autoencoder (ρ s = 0.93) using the AE code. The correlations are higher than for the single cube dataset, where the design changes are more complex.

Random topologies
This dataset has the most diverse topologies among the synthetic datasets. Figure 18 shows the correlations between the metrics. As expected, voxel distance does not correlate well with CD. EMD is similar to CD (ρ s = 0.94) even for the complex topologies. It is interesting to see that the dimensionality reduction of voxels does not result in any improvement in correlation with CD whereas the dimensionality reduction of point clouds using PCAE (AE code) is very useful (ρ s = 0.83). The results indicate that the AE code has successfully learned the loss function, CD, it is trained on.

Summary
The metric comparisons on datasets obtained by rotation, translation, and scaling show the deficiencies of the voxel distance and the metrics based on dimensionality reduction techniques. CD and EMD outperform these metrics in capturing simple geometric differences. Of the dimensionality reduction techniques, Euclidean distance in the AE code shows the best correlation with RM. From the results on metric correlations, CD, EMD, and Euclidean distance in AE code are the top choices for the metrics on geometry. Since the test datasets have complex topologies with strong variations in size and orientation, we expect similar results with TO designs.

Using clustering performance
To evaluate the classification accuracy of different metrics, we consider the randomized topologies dataset of 50 subclasses with strong variations in topology, where each subclass has 20 designs. Metrics mentioned in Section 3 are used with the naming convention from Table 1. For example, the metric (Voxel, PCA, ED) means that the voxel representation of the geometries is transformed using PCA into a low-dimensional vector, and the Euclidean distance (ED) in the reduced space is the resultant metric. Different metrics are used to cluster the topologies and the corresponding accuracy measures are reported.
Table 2 shows the classification accuracy of different metrics using k-means clustering with a prescribed number of clusters (k = 50), the same as the number of class labels. Note that since this is a labeled test set, the class labels are known beforehand. PCA (with 10 components) performs better than NMF (with 10 components) for this dataset. The pointcloud metric (AE code, ED), UMAP, and t-SNE based metrics can identify all the subclasses accurately. The high precision of UMAP and t-SNE indicates that they can be used to visualize the clusters in the dataset (Fig. 19). Note that CD and EMD metrics are not used since k-means requires an input feature vector and cannot handle pairwise distance matrices provided by CD and EMD.
Table 3 shows the classification accuracy of different metrics using OPTICS, which is more general than the k-means algorithm and can handle pairwise distance matrices. OPTICS identifies the appropriate number of clusters automatically. However, for this dataset, the clustering accuracy is lower compared to the k-means method. The order of the derived metrics using NMF, Voxels (Voxel, ED), PCA, UMAP, AE code is the same as in Table 2. In contrast, t-SNE performs worse than the PCA-based metric, which is outperformed by UMAP. CD and EMD perform better than UMAP, but the autoencoder latent code results in the best metric for classification and is better than the loss function (CD) it is trained on.
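How a feature-based metric can be scored on a labeled test set is sketched below. The accuracy measures of Section 5.2 are not reproduced here; cluster purity is used as a stand-in and is an assumption. The k-means routine is a plain Lloyd iteration with a deterministic farthest-point initialization rather than a library implementation, and the features are synthetic placeholders with 5 well-separated classes instead of the 50 subclasses of the actual dataset.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])          # farthest point from current centers
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)             # assign to the nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def purity(true_labels, cluster_labels):
    """Fraction of samples that carry the majority class label of their cluster."""
    correct = 0
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        correct += np.bincount(members).max()
    return correct / len(true_labels)

# Placeholder features: 5 well-separated classes with 20 samples each.
rng = np.random.default_rng(0)
class_centers = 10.0 * np.eye(5, 8)
true_labels = np.repeat(np.arange(5), 20)
features = class_centers[true_labels] + 0.1 * rng.normal(size=(100, 8))
accuracy = purity(true_labels, kmeans(features, k=5))
```

Because this routine needs feature vectors, it cannot be applied to CD or EMD directly; those would require a clustering method that accepts a pairwise distance matrix.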

Engineering application
From our experiments discussed in Section 7, we find that the AE code obtained by PCAE reduction can identify similar designs in a topologically optimized, complex dataset. It also extracts geometric features, which are easy to cluster since they are low-dimensional. Figure 20 shows the clusters in designs as visualized in UMAP with clusters identified using the k-means algorithm with AE code as the input data. Since UMAP preserves the intra-cluster distances, the relative location in the 2D cluster plot is meaningful. Figure 21 compares each of the cluster representatives with three different designs: two designs closest to it and one design farthest from it in terms of Euclidean distance in AE code. The most similar designs seem to share similar material distribution while the load conditions seem to be different. The most dissimilar designs (column d) seem to have a completely different topology compared to the representative. The figure also shows the relative distances in terms of AE code and CD from each design (columns b-d) to the representative (column a). For a given metric, a measured distance is normalized using the farthest distance measured from the representative to yield the relative distance. According to these metrics, the closest designs to each representative are only dissimilar by 0.2 to 0.3 units compared to the farthest design, whose distance is normalized to 1 unit. As expected, AE code mimics the
We recommend using the PCAE followed by a UMAP reduction for intuitive and accurate visualization of design clusters. This is an improvement to the design exploration method proposed by us for Pareto-optimal designs using voxel representation and PCA [52]. Furthermore, the geometric features obtained by PCAE can be combined with performance measures. The aggregate vector can be used directly for design exploration based on geometry and performance.
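The relative distance used for comparing designs with a cluster representative can be sketched as follows. The AE codes are random placeholders of dimension 128, the representative is simply taken to be the design at index 0, and the function name is hypothetical.

```python
import numpy as np

def relative_distances(features, representative_index):
    """Euclidean distances to the representative, scaled so the farthest design is 1."""
    d = np.linalg.norm(features - features[representative_index], axis=1)
    return d / d.max()

rng = np.random.default_rng(0)
codes = rng.normal(size=(50, 128))        # placeholder for 128-dimensional AE codes
rel = relative_distances(codes, representative_index=0)
order = np.argsort(rel)                   # order[0] is the representative itself
closest_two = order[1:3]                  # the two most similar designs
farthest = order[-1]                      # the most dissimilar design
```

The same normalization would apply to a pairwise metric such as CD by replacing the Euclidean distances with the corresponding row of the distance matrix.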
Since the Euclidean distance in the geometric features is meaningful, the similarity control in TO [20] can be easily implemented as well. Table 4 summarizes the advantages and disadvantages of using different metrics. In terms of clustering accuracy, the pointcloud-based metrics perform the best. t-SNE and UMAP perform well even with raw voxel data, although it is better to use them with the reduced data. By the high quality of geometric features, we mean that the Euclidean distance in the extracted features, if any, is meaningful. The latent code of the PCAE, which gives the most meaningful geometric features, can be thought of as a means to improve the pairwise distance metrics (CD in our case) it is trained on, which do not extract any features. In terms of computational effort, voxel distance is simple to calculate relative to the other metrics based on dimensionality reduction such as PCA or AE code since there are no additional data processing steps. CD and EMD are moderately expensive since they involve some analysis of pointclouds before comparing. This analysis is only valid if the metrics are not used often for a specific dataset. With repeated calculations, the reduced dimensional representation significantly speeds up the metric calculation, once the initial data processing is complete. In terms of clustering effort, the metrics based on simple feature vectors are easier to compute. CD and EMD, which require the construction of pairwise distance matrices, can be very expensive. For exploring TO designs, the AE code might be a good compromise. It has high clustering accuracy with the extraction of meaningful geometric features. The moderate effort needed for training a PCAE model might be worth the benefits.

Fig. 20 Clusters in TO designs are visualized in 2D using UMAP. The k-means clustering algorithm identifies the 4 clusters, labeled 0-3, based on the AE code. The representative design for each cluster is shown as well. k-means clustering identifies the subclasses of designs even if the clusters are not separated, which is often the case with engineering data

Conclusion
In this work, we address the problem of finding geometric features that can be used to explore topologically optimized structures and identify diverse designs based on geometrical properties such as topology, size, shape, and orientation in a design space. Extracting features in an automated fashion using popular unsupervised machine learning techniques improves the usability of topology optimization (TO) and other design generation methods in the engineering design process.
The challenge is to choose a feature extraction method from several methods available: PCA (Principal Component Analysis), NMF (Non-negative Matrix Factorization), t-SNE (t-distributed Stochastic Neighbor Embedding), UMAP (Uniform Manifold Approximation and Projection), or PCAE (Pointcloud Autoencoder), a geometric deep learning method for pointcloud representation. We propose to choose the geometric features based on two properties: the ability to capture geometrical differences and to identify similar designs. The former property is evaluated using a novel method to compare with reference distance metrics, while the latter is measured using state-of-the-art methods for evaluating clustering performance. The proposed test datasets comprise topologically more complex designs than publicly available datasets such as ShapeNet [28].
For clustering geometrically similar designs, PCA can extract better features compared to the voxel representation. t-SNE and UMAP, which are visualization techniques for high-dimensional data, as expected, perform even better. The performance improvement of t-SNE and UMAP comes at a cost that the distances in the reduced representation may not reflect the geometric differences, as shown in our experiments with even simple datasets. PCAE, in contrast, performs well in both scenarios: measuring similar distances as reference metrics and identifying geometrically similar designs. Empirically, we observe that the PCAE creates a feature vector with distances correlated to the loss function it is trained on. This allows conversion of any reasonable metric, which is defined as a pairwise function of geometries, to a Euclidean distance in a feature vector (latent code). Since the latent code is better suited for integrating geometric features with performance features, one can simultaneously explore designs based on geometry and performance. This analysis shows the need for our proposed method, along with the clustering performance measures, to validate if the geometric features are as meaningful as the reference metrics.

Fig. 21 TO designs similar to cluster representatives found using AE code (Fig. 20). The first column (a) contains the four design representatives. For each representative, three designs are shown in a row with decreasing similarity along the columns b-d. For each design, we show the corresponding loads as brown arrows. In each row, we measure the relative distance from the representative (a) to each design in columns b-d using AE code (AD) and chamfer distance (CD)
With an effective geometric feature vector such as the latent code of a PCAE, we can now combine geometric features with performance features for design exploration. It would be interesting to see if this can discover design clusters with unique properties, either geometric- or performance-wise, in real-world engineering applications. Furthermore, one can identify distinct concepts in a design database. This would considerably improve the applicability of topology optimization in the product development process.

Availability of Data and Material Data and material are owned by Technical University of Munich and Honda Research Institute Europe GmbH. Most of the software used can be replicated by using the open-source libraries on clustering and deep learning. The datasets discussed in this paper are available at Zenodo [53].

Competing interests No competing interest exists.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.