2S-ML: A simulation-based classification and regression approach for drawability assessment in deep drawing

New structural sheet metal parts are developed in an iterative, time-consuming manner. To improve reproducibility and speed up the iterative drawability assessment, we propose a novel low-dimensional, multi-fidelity-inspired machine learning architecture. The approach utilizes the results of low-fidelity and high-fidelity finite element deep drawing simulation schemes. It relies not only on parameters but also on additional features to improve the generalization ability and applicability of the drawability assessment compared to classical approaches. Using the machine learning approach on a generated data set covering a wide range of cross-die drawing configurations, a classifier is trained to distinguish between drawable and non-drawable setups. Furthermore, two regression models, one for drawable and one for non-drawable designs, are developed that rank designs by drawability. With near-instantaneous evaluation times, high classification accuracy as well as high-quality regression scores for both regressors are achieved. The presented models can substitute low-fidelity finite element models due to their low evaluation times while providing predictive quality close to that of high-fidelity models. This approach may enable fast and efficient assessment of designs in early development phases at the accuracy of a later design phase.


Introduction
The development of new deep-drawn sheet metal parts is a complex engineering task. Low costs, structural durability, improved crash properties, aesthetics, and sustainability are competing requirements. Another constraint in part development is ensuring manufacturability, which has to be guaranteed from the beginning of the early design phase until a part is produced in a press. It needs to be re-assessed for design changes, uncertainties in material and sheet metal properties, and changes in the drawing configuration. To reduce the complexity in development, engineers define the shape and drawing configuration for a part iteratively. This experience-driven process leads to a compromise between the aforementioned requirements and lacks reproducibility. On the other hand, existing methods like optimization are often impractical for manufacturability assessment. This is due to the limitations of the corresponding large-scale simulation models, which are computationally expensive [5].
As we focus on the deep drawing production step, we investigate drawability rather than full manufacturability. There have been ongoing scientific contributions to improve drawability assessment. Baseline formulas to estimate drawability are given in [12]. The method is based on a cylindrical cup drawing setup. It aims to give feedback on whether a part should be deep-drawn at all rather than reliable feedback during part development. A simplified two-dimensional approach to stamping analysis is presented in [14]. We employ the finite element method (FEM), a widely used numerical technique for simulating the behavior of complex systems, such as the deep drawing of sheet metals. The method has become an essential tool in this field, as it allows engineers and scientists to predict the behavior of a system under different loading conditions, material properties, and addendum geometries. The main advantages of FEM are its flexibility and low cost compared to real-world experiments.
Machine learning (ML) has shown promising results for hybrid modeling in engineering environments. Ambrogio et al. [1] proposed a kriging meta-model on incremental sheet forming to predict sheet thicknesses. A surrogate model for textile forming applications was presented in [25].
Here, simulation results are evaluated with an image-based data representation to investigate the influence of different base geometries on the surrogate model extrapolation quality. Morand et al. [16] presented a surrogate model that predicts the final geometry and field variables of a deep-drawn cup, focusing on selected process and geometry parameters. Slimani et al. [20] developed an artificial neural network surrogate for the flat rolling process based on a combination of the slab method and FEM to predict the rolling force. A dimension-reduced neural network based on features is used to predict the springback behavior in [11].
In our hybrid approach, we utilize the concept of multi-fidelity (MF) modeling to create surrogate models. As illustrated by Kennedy et al. [13], the goal is to achieve high-fidelity (HF) model accuracy at the computational cost of a low-fidelity (LF) model. Accordingly, a combined MF surrogate with decreased evaluation time is set up. Song et al. [22] mixed an LF and an HF model using radial basis functions to decrease the computational cost of surrogate setup. It was tested on numerical and engineering problems. For a comprehensive overview of MF modeling, readers are referred to [24].
To decrease evaluation times and save expensive simulation runs while preserving the prediction quality of sophisticated simulation models, we propose a novel approach named 2S-ML here, an ML-based MF-inspired architecture. It incorporates information from both LF and HF simulation schemes as well as non-simulated information to train low-dimensional surrogate models based on a data set. Our approach differs from existing MF methods in three ways. First, our surrogate models do not depend solely on simulation parameters but also on additional ML features that contain complementary drawability information. These features improve applicability to a more diverse set of shapes.
Second, unlike MF scaling functions, space mappings, and difference mappings, 2S-ML does not explicitly use mixing or tuning factors, as the mixing is performed implicitly during the training of the ML surrogate models. Third, as our method is intended to work on an established historic data set, we focus on surrogate performance, not computational efficiency in their setup.
Using the 2S-ML architecture, we build a regression surrogate that allows us to continuously assess drawability and rank drawable designs in a large, industry-relevant design space. Subsequently, the same architecture is used to train a classification model that can exclude non-drawable designs. In that regard, we introduce an extension to an existing drawability measure to be able to distinguish different drawable states of drawable designs, which enables continuous ranking of designs. Furthermore, this approach assesses drawability without specific knowledge of tooling. Therefore, it can map the drawability assessment of a later design stage to the early design stage.
The paper is structured in the following manner. Section 2 presents the two FEM simulation schemes involved in the generation of the data set. Their material model is introduced in Sect. 3. The measure used for drawability assessment is outlined in Sect. 4. Section 5 describes the Design of Experiments (DOE) used for data set generation. It serves as the basis for the development of the regression and classification models presented in Sect. 6. Afterward, the results of the ML models are presented and discussed in Sect. 7. Finally, conclusions are drawn in Sect. 8.

Finite element simulation models
The cross-die geometry used in the simulation schemes is similar to the ones presented in [4,10]. Its shape contains convex, concave, and planar faces that yield a variety of representative multiaxial stress and strain states.

Low-fidelity one-step scheme
The one-step scheme, also known as the inverse approach [7], is based on the principle of virtual work, Hencky deformation, and kinematic-based geometric mapping [6]. It flattens the final workpiece to the initial flat blank. One-step simulations allow for a fast calculation of the sheet thicknesses, thinning, and plastic strains of a sheet metal. The inverse approach lacks precision, as it does not recreate a deep drawing process itself. Important effects such as incremental plasticity and the influence of tooling contacts cannot be modeled. Therefore, it is considered the LF model here. A simulation configuration for an exemplary LF model is shown in Fig. 1.
One-step simulations are suitable for early design stage evaluation since they do not require tooling. We use implicit time integration to calculate the flattened state of equilibrium. The short simulation times are beneficial for practical part-to-part comparisons and for iterative methods like optimization, for example, to define the initial blank size, as investigated in [18].

High-fidelity incremental deep drawing scheme
The incremental deep drawing simulation scheme returns high-precision results by calculating the deep drawing process time-dependently, including tooling, contacts, and manufacturing boundary conditions. On the downside, the simulation scheme is computationally expensive. Tooling, including the drawbead design, has to be created and embedded before the simulation can be run. Therefore, it can only be employed at the later design stage. As incremental deep drawing simulations provide the most sophisticated results, they are the last virtual step before the experimental tryout process. The incremental deep drawing scheme is defined here as the HF simulation. A sample configuration is shown in Fig. 2.
Fig. 1 The one-step simulation setup contains the geometry in its final state. To take the influence of drawbeads into account, a subset of nodes on the free edge of the convex hull of the model is selected to apply restraining forces. We enforce a maximum edge length of 2 mm for each element, resulting in roughly 5 elements over the radius. It provides interpretable results like thinning and a preliminary blank shape suggestion.

Fig. 2 Four geometries are used in the incremental deep drawing simulation. The tooling, composed of the die, punch, and blankholder, forms the blank into its final geometry state. The initial maximum edge length of blank elements is 4 mm. The minimum edge length of re-meshed elements is set to 0.3 mm, which results in about 15 elements per radius for the formed blank. Drawbeads are considered analytically. Field quantities are evaluated at the last time step when the maximum drawing depth is reached.

We use reduced integrated bilinear Belytschko-Tsay shell elements and limit the hourglass energy to 5% of the internal energy. For the HF scheme, we use explicit time integration [1]. This has several reasons. Nonlinearities induced by the contacts or the material model require small time steps, which counteracts the advantage of unlimited time step sizes in implicit time integration. Also, the explicit scheme benefits the convergence behavior for discontinuous boundary conditions. Deviations from the exact solution are accepted, as the overall process time is small (considering time scaling). In addition, we employ four techniques to reduce computational costs. First, an initially coarse mesh for the blank is refined adaptively over the simulation time based on two heuristics: we re-mesh regions where neighboring elements are deformed beyond an angle of 25° and where tooling surfaces are approaching, to cope with contact interaction. Second, we apply time scaling to reduce the overall simulation time. Third, the tooling is considered rigid. Fourth, we utilize element-wise mass scaling to limit the minimum time step size to Δt_min = 1.0 · 10⁻⁷ s. For a thorough summary of deep drawing FEM models, readers are directed to [2].
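The interaction of the CFL stability limit and element-wise mass scaling can be illustrated with a short sketch. The plane-stress wave-speed formula and the quadratic dependence of the critical time step on density are standard; the concrete material values below are only illustrative, not taken from the paper.

```python
import math

def mass_scaling_factor(char_length, density, youngs_modulus, poisson_ratio,
                        dt_min=1.0e-7):
    """Density scaling factor so that the element's critical time step
    (CFL condition) reaches at least dt_min; 1.0 means no scaling needed."""
    # Dilatational wave speed for a plane-stress shell element
    c = math.sqrt(youngs_modulus / (density * (1.0 - poisson_ratio ** 2)))
    dt_critical = char_length / c
    if dt_critical >= dt_min:
        return 1.0
    # dt scales with sqrt(density), so the density is scaled quadratically
    return (dt_min / dt_critical) ** 2

# A re-meshed 0.3 mm steel element falls below dt_min and gets scaled,
# while a 4 mm blank element does not (illustrative steel values).
small = mass_scaling_factor(0.3e-3, 7850.0, 210.0e9, 0.3)
large = mass_scaling_factor(4.0e-3, 7850.0, 210.0e9, 0.3)
```

This is why mass induction is uneven across geometries: only elements below a certain size receive added mass, a noise source discussed later in the results.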

Hockett-Sherby hardening model
We use an elasto-plastic material model with Hill48 [8] as yield criterion, assuming transverse anisotropy and a plane stress condition in the thin-walled sheet. To embed a variety of different hardening properties in our upcoming surrogates, we need a tunable and representative hardening law for deep drawing steels. The isotropic hardening is calculated based on the Hockett-Sherby saturation law [9],

σ(ε_pl) = σ_s − (σ_s − σ_y) · exp(−N · ε_pl^p),

with σ_s as saturation stress, σ_y as yield stress, N and p as material constants, and ε_pl as the plastic strain. Strain rate-dependent effects are considered negligible. To generate only monotonically increasing hardening curves and to represent different hardening trends, a stress ratio i_c between the true stress equivalent of the tensile strength R_m, converted with the elongation without necking A_g, and the yield stress σ_y is introduced. By setting σ_y and i_c, the hardening curve can be computed.
By adjusting the parameter ranges (see Table 4), we generate representative material properties. Figure 3 shows a comparison of our sampling range to standardized cold rolled (CR) forming steels. All flow curves used afterward are drawn from the sampling range.
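As a concrete illustration, the saturation law and the monotonicity requirement can be checked numerically. The parameter values below are illustrative stand-ins for a mild forming steel, not entries of Table 4.

```python
import numpy as np

def hockett_sherby(eps_pl, sigma_y, sigma_s, N, p):
    """Hockett-Sherby saturation law:
    sigma(eps_pl) = sigma_s - (sigma_s - sigma_y) * exp(-N * eps_pl**p)."""
    return sigma_s - (sigma_s - sigma_y) * np.exp(-N * np.power(eps_pl, p))

# Illustrative values: the curve starts at sigma_y for eps_pl = 0 and
# saturates towards sigma_s for large plastic strains.
eps = np.linspace(0.0, 0.5, 101)
flow = hockett_sherby(eps, sigma_y=180.0, sigma_s=420.0, N=8.0, p=0.9)
```

Fixing σ_y and the stress ratio i_c then determines the curve level and hardening trend, as described above.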

Drawability measure
The HF simulation model itself calculates nodal deformations but no direct measure of drawability. Therefore, the FEM model needs to be evaluated further to describe and calculate drawability. A commonly used method for this is the forming limit diagram (FLD). It allows for an element-wise subdivision of the minor and major true principal strains of a formed sheet metal into regions of cracks, wrinkles, and good (drawable) points. Sun et al. [23] calculate a weighted sum of element-wise vertical distances from crack points to the forming limit curve (FLC) and from wrinkle points to the wrinkling limit curve (WLC). In [15], a strain-path-dependent distance between the FLC and each point in the crack region is presented. We select the function proposed by Sun et al. [23] as an outset because of its sophisticated drawability representation. It is originally defined in a positive semidefinite manner, where non-drawable configurations can be distinguished from one another. Drawable setups cannot, because their distance is inherently 0. Therefore, we extend the drawability measure with f_{n-d} for non-drawable configurations and f_d for drawable ones. This allows ranking of different levels of drawable designs. As most drawing setups tend to have some minor wrinkling, especially near the blankholder region where the mesh is relatively coarse, we introduce a setup-specific, user-adjustable drawability threshold of f_dt = 1.395 · 10⁻⁶. Setups with values below the threshold are considered drawable. The calculation of the two domains is conducted with A_e as the surface area of an element normalized with the total surface area of the formed blank, d_e as an element's minimum Euclidean distance to the FLC or WLC, respectively, n as the number of elements, and w as weights.

Fig. 3 The parameter ranges are chosen so that sampled stress-strain curves (within the green sampling range) represent typically used forming steels. This is exemplified by the two depicted standardized CR steels.
The indices c, w, and g denote the affiliation with the crack, wrinkle, or good (drawable) regions in the FLD. We set the weight for wrinkling to w_w = 0.1 and for good points to w_g = 0.2.
We make four modifications to the original function to improve robustness and to account for simulation-specific mechanics. First, we use the minimum Euclidean distance because it more accurately represents the deformation process, especially for wrinkles. Second, by element-wise area weighting, we mitigate the effect of different element sizes in the mesh to minimize overall mesh dependence. Third, as we do not delete cracked elements during simulation, each cracked element's surface area is set to the surface area at the time step at which the crack occurred. Otherwise, the weighted distance calculation would be affected by non-realistically deformed crack elements with a high surface area. Fourth, a maximum element-wise distance for crack elements d_e,c-FLC,max = 2.35 is introduced. This limits the effects of heavily distorted or failed elements on the distance calculation. The cumulative distance approach adopted here is preferred over a perhaps more intuitive maximum distance approach for two reasons. First, it mitigates the impact of single heavily distorted elements on the drawability measure. Second, cumulative distances provide more consistent feedback, which benefits surrogate training and optimization tasks conducted with them.
The resulting weighted distribution in the drawable domain is depicted in Fig. 4.
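The extended measure can be sketched as follows. The aggregation mirrors the description above (area weighting, minimum Euclidean distances, capped crack distances, piecewise f_{n-d}/f_d definition), but the crack weight (assumed 1.0 here) and the exact functional form of the authors' implementation are assumptions.

```python
import numpy as np

# Weights and thresholds as given in the text; the crack weight and the
# aggregation form are an illustrative sketch, not the authors' exact code.
W_CRACK, W_WRINKLE, W_GOOD = 1.0, 0.1, 0.2
F_DT = 1.395e-6        # drawability threshold
D_CRACK_MAX = 2.35     # cap on element-wise crack distances

def drawability(areas, dists, regions):
    """areas: element areas normalized by total blank area (A_e);
    dists: minimum Euclidean distance to the FLC or WLC (d_e);
    regions: per-element label 'crack' | 'wrinkle' | 'good'."""
    areas, dists = np.asarray(areas, float), np.asarray(dists, float)
    regions = np.asarray(regions)
    crack, wrinkle, good = (regions == r for r in ("crack", "wrinkle", "good"))
    # f_{n-d}: area-weighted, capped crack distances plus weighted wrinkle distances
    f_nd = W_CRACK * np.sum(areas[crack] * np.minimum(dists[crack], D_CRACK_MAX)) \
         + W_WRINKLE * np.sum(areas[wrinkle] * dists[wrinkle])
    if f_nd > F_DT:
        return f_nd, False           # non-drawable setup, ranked by f_{n-d}
    # f_d: weighted distances of the good points, ranking drawable setups
    f_d = W_GOOD * np.sum(areas[good] * dists[good])
    return f_d, True
```

The cumulative form makes clear why single distorted elements have limited influence: each contribution is bounded by its normalized area and the distance cap.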

Design of experiments
To embed a learnable drawability threshold, a data set containing drawable and non-drawable parts is needed. To avoid imbalanced learning, the ratio between drawable and non-drawable setups should not exceed about 85% instances of one label. We use a Sobol sequence, originally published in [21], to sample the material parameters and the remaining parameters separately. The parameter sets are then combined randomly. The parameters and their ranges are shown in Table 4. The resulting parameter combinations are depicted in Fig. 5.
The mixing of lines to the left of the r axis shows the random combinations of material parameters with respect to the remaining parameters. Otherwise, the uniformity of the Sobol sequence can be seen from evenly intersecting parallel lines. Regions of thicker blue lines indicate that there are more feasible combinations of geometric parameters (e.g. without undercuts). Regions with fewer lines are prone to less accurate predictions. As a result of feasible parameter combinations and error-free simulation runs, we end up generating a data set of 2541 samples. This relatively large design space represents a variety of setups, making it harder to obtain high-quality results with a limited number of samples.
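The sampling scheme can be sketched with SciPy's quasi-Monte Carlo module. The parameter names, dimensions, and bounds below are placeholders; the actual 12 parameters and ranges are those of Table 4.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
n = 256  # a power of two preserves the balance of the Sobol sequence

# Placeholder bounds; the real parameters and ranges are given in Table 4.
mat_lo, mat_hi = [150.0, 1.5], [450.0, 2.5]           # e.g. yield stress, i_c
geo_lo, geo_hi = [20.0, 0.8, 5.0], [80.0, 2.0, 25.0]  # e.g. depth, thickness, radius

# Sample material and remaining parameters separately ...
mat = qmc.scale(qmc.Sobol(d=len(mat_lo), seed=1).random(n), mat_lo, mat_hi)
geo = qmc.scale(qmc.Sobol(d=len(geo_lo), seed=2).random(n), geo_lo, geo_hi)

# ... then combine the two groups randomly, as described in the text
doe = np.hstack([mat[rng.permutation(n)], geo])
```

The random permutation of one group reproduces the "mixing of lines" visible to the left of the material-parameter axes in Fig. 5.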

2S-ML Machine learning architecture
Noteworthy industrial occurrence of complex deep-drawn parts with multiple development iterations is mostly found in car structures. A chassis contains approximately 200-500 sheet metal parts, of which only a varying fraction is deep drawn due to cost implications [3]. Same-part strategies, symmetric structures, inconsistent documentation, and changes in the state of the art of the deep drawing process further narrow the number of sheet metal parts usable for ML. In this limited data environment, deep learning is unlikely to provide the best model performance. To promote applicability, we use domain knowledge to set up the 2S-ML architecture with application-dependent feature and label engineering.

Fig. 4 The drawable domain is defined within the area bounded by the FLC (red), the WLC (blue), and the line of undefined deformation state (black). The color represents the influence of the failure mode (red for cracks and blue for wrinkles). The predominant red filling is caused by the implementation of the wrinkling weight w_w. The more cracks influence a region, the larger the black dots.
We interpret drawability assessment in two ways:
• The ML model distinguishes whether a setup is drawable or not using binary classification.
• The ML model assigns a target value representing drawability using regression to rank the drawability of different configurations.
We use supervised learning to ensure the best possible model performance. An overview of the proposed 2S-ML architecture is given in Fig. 6.
In the training phase, the first step involves setting parameter ranges. Next, the DOE is employed using the Sobol sequence to sample parameter combinations. For each parameter set representing a drawing configuration, all cross-die-related geometry is created using the geometric parameters. The punch geometry is derived by offsetting the die geometry. Analytical drawbead sets are used directly in the incremental simulation to apply restraining forces to nearby blank nodes. The material parameters are used to specify the Hockett-Sherby hardening and the anisotropic elasto-plastic material model. There are 12 independent parameters for cross-die geometry, process simulation conditions, and material models. A comprehensive overview of their meaning can be obtained from Table 4. An HF simulation is then set up using the die, drawbead, and blank geometry, along with the material model and process parameters. The drawability measure evaluates the incremental simulation to generate a label for classification or a target value for regression. Additionally, an LF simulation is set up and run using the cross, the material model, and the process parameters. Here, restraining forces are mapped from the drawbeads to the boundary nodes of the cross. Features are generated based on the material model, the one-step simulation, and the parameter combinations. These features are then used as input for a classification or regression ML model, while the label or target value is used for the supervision of the training.
In the inference phase, the first step involves defining the parameter set of interest. Next, a representative feature set is generated as described above. Finally, the label or target value is predicted using the feature set on the trained ML model. It should be noted that no incremental forming simulation needs to be run during the inference phase.
We establish a set of 40 initial features that contain baseline simulation parameters and supplementary features. An in-depth overview of their calculation is given in Appendix 2. The goal of the features is to capture all information necessary for drawability assessment.
Features associated with the shape of the part are inspired by differential geometry. Most material features are intended to embed the hardening behavior. The simulation-based features are application-dependent and should contain relevant drawability information. We do not propose features to characterize tooling geometry, as it is generated parametrically here.
Using this combination of information, a surrogate model captures the representative underlying drawability problem, which helps to generalize to new configurations. Additionally, the supplementary features are independent of the cross-die geometry. They can therefore be applied to a variety of shapes. The assignment of all computed features to their source domain is shown in Fig. 7.
Computing the feature set for every configuration in the whole data set reveals five aberrations. These are excluded from the subsequent training and testing process. The features of the resulting data set are shown in Fig. 17.
Further investigation of the data set is done by calculating the Pearson correlation coefficient

ρ(X, Y) = cov(X, Y) / (σ_X · σ_Y)

between two features X and Y, with cov(X, Y) as the covariance between the features and σ_X and σ_Y as their standard deviations. A high correlation of features can lead to worse prediction quality. The features h_min, h, h, h_max, K_max, and ps_m1 are neglected due to one or more Pearson correlation coefficients greater than 0.9. Despite this feature deletion, redundant information (collinearities) remains in the feature set, which can limit surrogate quality. Still, further deletion would remove relevant information and likewise decrease surrogate quality. To address this issue, principal component analysis (PCA) is utilized. While losing the interpretability of a feature's influence, PCA enhances the model's generalization ability and here guarantees the applicability of the proposed approach to non-parametric data sets. To find the best compromise between small dimensionality and variance loss, we make the number of principal components available for hyperparameter optimization.
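A minimal sketch of this preprocessing (correlation pruning followed by PCA) is shown below. The feature names are hypothetical, and the explained-variance setting stands in for the hyperparameter-optimized component count.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def prune_correlated(X, names, threshold=0.9):
    """Drop the second feature of every pair with |rho| > threshold,
    where rho(X, Y) = cov(X, Y) / (sigma_X * sigma_Y)."""
    corr = np.corrcoef(X, rowvar=False)
    drop = set()
    for i in range(corr.shape[0]):
        if i in drop:
            continue
        for j in range(i + 1, corr.shape[1]):
            if j not in drop and abs(corr[i, j]) > threshold:
                drop.add(j)
    keep = [k for k in range(X.shape[1]) if k not in drop]
    return X[:, keep], [names[k] for k in keep]

# Remaining collinearity is absorbed by PCA; the number of components is a
# hyperparameter in the paper, fixed here to 95% explained variance.
preprocess = make_pipeline(StandardScaler(), PCA(n_components=0.95))
```

Standardizing before PCA keeps features with large numeric ranges from dominating the principal components.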
For regression, the two drawability domains are each min-max-normalized. The resulting distribution is shown in Fig. 8.
Due to the piecewise definition of the drawability measure, there exist two separable histograms. The drawable setups are quite evenly distributed, while the non-drawable setups show an agglomeration near the drawability threshold. The data set is sparsely filled for values greater than 0.75. The resulting label ratio for classification is within the limits of imbalanced learning.
Because there is no prior knowledge of which ML algorithm will build the most suitable surrogate for drawability assessment, we heuristically select a subset of training algorithms (compare Figs. 9 and 12) published in [19]. For an in-depth overview of low-dimensional ML algorithms, interested readers are referred to [17]. As ensemble methods usually provide gains in prediction quality, we train bagging meta-models that use the selected algorithms as base estimators. Also, a stacking and a voting ensemble model are built based on the aforementioned meta-models. We tune the hyperparameters of the PCA, the base estimators, the bagging meta-model, and of the voting and stacking models via cross-validated grid search optimization. The same ensemble approach is applied to both regressors and classifiers.
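In scikit-learn terms, the ensemble construction might look like the sketch below. The base-estimator subset, the grid, and all hyperparameters are illustrative; the paper tunes the PCA and all estimators jointly.

```python
from sklearn.ensemble import BaggingRegressor, StackingRegressor, VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Bagging meta-models around a heuristic subset of base estimators
bagged = [
    ("ridge", BaggingRegressor(Ridge(), n_estimators=10, random_state=0)),
    ("svr", BaggingRegressor(SVR(), n_estimators=10, random_state=0)),
    ("tree", BaggingRegressor(DecisionTreeRegressor(), n_estimators=10,
                              random_state=0)),
]
# Stacking and voting ensembles built on top of the bagging meta-models
stack = StackingRegressor(estimators=bagged, final_estimator=Ridge())
vote = VotingRegressor(estimators=bagged)

# Cross-validated grid search over one illustrative hyperparameter
grid = GridSearchCV(BaggingRegressor(SVR(), n_estimators=10),
                    param_grid={"estimator__C": [0.1, 1.0, 10.0]},
                    cv=5, scoring="r2")
```

The classifier side follows the same pattern with the corresponding `*Classifier` ensemble classes.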
To measure surrogate performance, appropriate metrics must be defined. For classification, the most costly misjudgments are false positive predictions, as this means continuing to develop a part that is not drawable. Therefore, we use

F_β = (1 + β²) · TP / ((1 + β²) · TP + β² · FN + FP)

with the weight factor β = 0.5, TP for true positive, TN for true negative, FP for false positive, and FN for false negative predictions as our binary classification metric for training. The accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

is used subsidiarily to measure overall performance. For regression, we utilize the coefficient of determination

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

where n is the number of data points, yᵢ is the true value, ŷᵢ is the predicted value, and ȳ is the observed mean value, to investigate the overall regression quality. The relative maximum absolute error

RMAE = maxᵢ |yᵢ − ŷᵢ| / maxᵢ |yᵢ|

is used to indicate local accuracy.
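The four metrics can be written out directly from the confusion counts and the predictions. The F_β and accuracy formulas are standard; the normalization of the RMAE by max |yᵢ| is reconstructed from context and should be read as an assumption.

```python
import numpy as np

def f_beta(tp, tn, fp, fn, beta=0.5):
    """F_beta from confusion counts; beta = 0.5 weights precision
    (few false positives) higher than recall."""
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmae(y_true, y_pred):
    """Relative maximum absolute error; normalization by max |y_true|
    is an assumption here."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.max(np.abs(y_true - y_pred)) / np.max(np.abs(y_true))
```

With β = 0.5, a false positive lowers F_β more strongly than a false negative, matching the stated cost asymmetry.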

Results and discussion
For regression and classification, the data is randomly divided into 80% training data and 20% test data. Cross-validation with five folds eliminates the need for a separate validation set.

Regression
Independent of the regression training algorithm, we initially evaluate two different modeling approaches. The first approach sets up a single regressor to predict drawability for both drawable and non-drawable setups.
Fig. 8 The distribution of the non-drawable (red) and the drawable (green) domains is separated by the abscissa. There are 955 (37.7%) non-drawable configurations and 1579 (62.3%) drawable ones.

In the second approach, two separate regressors are trained, one for the drawable and one for the non-drawable domain. Results of the best-performing surrogates are listed in Table 1.
It can be noted that the overall high prediction scores illustrate the applicability of our method. For regression, splitting the data set into drawable and non-drawable subsets significantly improves prediction quality. This is caused by the piecewise definition of the drawability measure. Yet, using two surrogates in parallel raises the necessity for a model that decides which regressor should be used for prediction. This can be done by the classifier introduced in Sect. 7.2.
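The required decision logic is a small wrapper: the classifier routes each sample to the domain-specific regressor. The function below is a hypothetical sketch of that routing, assuming scikit-learn-style models where class 1 denotes "drawable".

```python
import numpy as np

def predict_drawability(features, classifier, reg_drawable, reg_non_drawable):
    """Route each feature vector to the regressor of its predicted domain."""
    features = np.atleast_2d(features)
    drawable = classifier.predict(features).astype(bool)
    scores = np.empty(len(features))
    if drawable.any():
        scores[drawable] = reg_drawable.predict(features[drawable])
    if (~drawable).any():
        scores[~drawable] = reg_non_drawable.predict(features[~drawable])
    return scores, drawable
```

Because the two domains are normalized separately, the returned mask is needed to interpret the score relative to its own domain.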
We further concentrate on the separated approach because of its higher prediction quality. A comparison between different models for the regression is shown in Fig. 9.
It can be observed that all models achieve higher R²-scores for the drawable domain. The authors attribute this to the fact that the failure, composed of cracks and wrinkles, is mechanically more complex than the good points in the drawability measure calculation. The stacking regressor shows the best R²-scores of 0.918 for the drawable (green) and 0.859 for the non-drawable domain (red). It also has the lowest RMAEs of 21.8% and 23.6%. The voting regressor's prediction quality suffers from being based on all depicted models, including worse-performing ones.

Fig. 9 The prediction quality of all investigated regressors, measured by R² and RMAE.
In Fig. 10, predicted target values are compared with the true ones for the best-performing regressor.
The overall true-predicted comparison is homogeneous. There are no severe outliers affecting the prediction scores and errors, which can also be deduced from the low RMAEs. This allows for a drawability ranking of different drawing setups across the whole design space. Drawable configurations are predicted very accurately overall. In the non-drawable domain, there is a slight trend towards predicting worse drawability for nearly drawable designs. This agglomeration of slightly too pessimistic drawability estimates near the threshold could be influenced by the chosen value of the drawability threshold.
Furthermore, there are sources of noise in the features and labels. As mass scaling is only applied to shell elements below a certain size, there is uneven mass induction for different cross-die geometries, which influences the calculation of the drawability measure. Also, the coarse blank mesh in the region of the addendum affects the regression. Those elements are often wrinkled and, due to their size, weighted relatively strongly in the drawability computation. A finer mesh produces more distinctive wrinkled geometry; more realistic element surface areas are then combined with more precise strain results, which yields more accurate drawability values. Also, low-dimensional feature-based approaches work on the basis of feature sets that condense the information necessary to describe the problem at hand. Adding features that contain supplementary information benefits the training process.
Another possibility for improved surrogate performance is to increase the number of training instances. To get insights into how many data points are needed for what model quality, training is conducted for different data set sizes for the stacking regressor. The results are shown in Fig. 11.
It can be noted that the R²-scores rise with additional training data. Below 240 training samples, the drawable and non-drawable curves fluctuate significantly. A notable convergence in R²-score for the stacking regressors starts at about 240 training samples. Therefore, a minimum ratio of about 20·d, with d as the number of parameters, is required for reliable surrogate performance. Above 240 training samples, there is still a slight trend of increasing R²-scores; to achieve maximum R²-scores, a higher ratio than 20·d is necessary for our 12-parameter problem (compare Fig. 5). From the convergence trend of the drawable surrogate (green line in Fig. 11), we conclude that additional features would be necessary to achieve even higher R²-scores. Based on the convergence trend, we expect a lasting gap between the drawable and non-drawable curves, at least for an industrial stock of training samples. Besides the more complex non-drawable state, more training samples are likely to further decrease the gap between the R²-scores of the domains.
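The sample-size study can be reproduced in outline with scikit-learn's learning-curve utility. The estimator and the size grid below are stand-ins for the stacking regressor and the paper's increments.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import learning_curve

def r2_over_training_sizes(X, y, fractions=np.linspace(0.1, 1.0, 8)):
    """Mean cross-validated R^2 for increasing training-set sizes."""
    sizes, _, test_scores = learning_curve(
        GradientBoostingRegressor(random_state=0), X, y,
        train_sizes=fractions, cv=5, scoring="r2",
    )
    return sizes, test_scores.mean(axis=1)
```

Plotting the returned scores over the sizes gives the convergence view of Fig. 11, from which a rule of thumb like 20·d can be read off.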
None of the models presented in Fig. 9 shows a noteworthily different convergence trend, including in the limited data environment with fewer than 240 training samples. Still, it is possible that other surrogate techniques or ML algorithms (e.g. naive Bayes, cokriging) could achieve higher R²-scores in the region below 20·d. If there is no data set or trained model for drawability assessment, a compromise between the computational investment to set up a surrogate and running the original model has to be considered.

Fig. 10 The joint predictions for the drawable (green stars) and non-drawable (red circles) domains of the stacking regressor on the test data. The dashed black line represents ideal prediction, where true and predicted values would coincide.

Fig. 11 The R²-scores for the drawable (green) and non-drawable (red) domain for the stacking regressor over the number of training samples. The drawable domain contains 1581 data points (1264 for training), the non-drawable domain 955 (764 for training). The black dashed line marks the threshold of around 10% relative error to the maximum R² for both regressors, which is at 240 training samples each.

When 2S-ML regression surrogates are used to rank multiple drawing configurations in an optimization scheme, there is an increased requirement for short evaluation times. As part of 2S-ML, a one-step simulation is run to retrieve the full feature set required for prediction (compare Fig. 6). To determine whether the loss in prediction quality is justified by the decrease in evaluation time, we investigate the surrogate model performance by omitting one-step features in the training of a new regression surrogate. Furthermore, in early-stage part design, the sheet metal geometry is often the only existing decision basis for the drawability assessment. Therefore, we challenge our 2S-ML surrogate to be trained solely on geometry-based features. The results of the reduced feature set investigations are listed in Table 2.
Considering the benchmark values shown in Fig. 10, it is apparent that relying only on geometric features has a substantial negative impact on the R²-scores. The non-drawable domain cannot be assessed correctly, while the prediction quality in the drawable domain is only moderate. Consequently, the approach with geometry features alone is not a feasible way to assess and rank drawability. On the other hand, neglecting simulation features leads to practically no difference in prediction quality, but results in a significant reduction in evaluation time (instantaneous feedback compared to several seconds to minutes). Forgoing the one-step simulation in the architecture eliminates the need for a commercial license, as there is currently no open-source one-step code available. Additionally, it reduces the risk of unstable simulation runs and subsequent infeasible predictions. If new samples need to be generated, this can be done without conducting a one-step simulation. For existing data sets, simulation-based features can be used to slightly improve prediction quality. We emphasize, though, that one-step results have the potential to improve prediction quality for other data sets that fill the feature space less uniformly and contain less similar setups.

Table 2 The best regressor both without simulation features and with only geometry features is the stacking regressor. The omitted features are affiliated to certain domains (see Fig. 7).

Fig. 12 The prediction quality, measured by F0.5-score and Accuracy, on the test data set of the investigated classifiers.

Classification
For classification, the model needs to distinguish between drawable and non-drawable configurations. We use the same feature set as for regression. Figure 12 shows the test scores for each trained model. Overall, most models perform similarly in the drawability assessment. The best predictor is the bagging linear support vector classification (LSVC) model with an F0.5-score of 0.950 and an Accuracy of 95.7%. Although the bagging LSVC model shows the best test scores on this data set, the stacking model can also be considered for further evaluation. Due to its inherently robust predictions, it might perform better on data sets with less similar geometries.
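The F0.5-score used to rank the classifiers can be stated compactly. The following is a hedged sketch of the standard F-beta formula (beta = 0.5); with beta below 1, precision is weighted higher than recall, which matches the goal of keeping false positives low.

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta score from true positives, false positives, false negatives.

    beta < 1 emphasizes precision, beta > 1 emphasizes recall.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 1 this reduces to the familiar F1-score, the harmonic mean of precision and recall.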
The test scores for the bagging LSVC model are based on the predictions displayed in Fig. 13.
The bagging LSVC classifier's hyperparameters are optimized with respect to the F0.5-score, since a low number of false positive (FP) predictions is desirable in addition to good overall prediction scores. There are 11 FP predictions (2.2%), which represent exactly half of the false predictions. False predictions occur in both the drawable and non-drawable domains, demonstrating balanced learning.
To check how many data points are needed to build a sophisticated binary classifier, we train and test the bagging LSVC model for different data set sizes. The results are shown in Fig. 14.
It is evident that the prediction quality of the bagging LSVC is nearly constant over the different data set sizes; adding more samples has no effect on the scores. Therefore, the classifier is well suited to limited-data environments. However, less uniform data sets are likely to start with lower prediction scores and to gain prediction quality with larger data set sizes. As with the regression model, we expect further potential in adding features.
As described in Sect. 7.1, there are scenarios with a lack of features. To investigate the performance of classifiers in this environment, we train equivalent classification models. Table 3 lists the most accurate models trained only on geometry features and without LF features.
There is a noticeable decline in the prediction scores for the geometry-only model, considering the reference values of F0.5-score = 0.950 and Accuracy = 95.7%: key information is missing for a proper assessment. Still, if no other reproducible methods exist, this approach can be further investigated to address early-stage drawability assessment from the perspective of a worst-case assumption. The prediction quality of the classifier without simulation features is slightly lower than that of the full model, but still comparable; neglecting simulation features only leads to a small reduction in prediction quality. Furthermore, there are no noteworthy differences in convergence behavior.

Table 3 Scores for models trained solely on geometry features (bagging adaptive boosting classifier) and without LF features (bagging LSVC classifier). The feature affiliation as the basis for neglection can be seen in Fig. 7.

Metrics      | Only geometry | Without simulation
F0.5-score   | 0.823         | 0.943
Accuracy     | 75.5%         | 94.5%

Conclusion
In this paper, an ML approach to developing a regression and a classification model for drawability assessment in the early design stage in deep drawing is presented.
To generate the data sets for their training, a DOE with a Sobol sequence is followed by a one-step and an incremental forming simulation. Feature computation is performed to obtain a drawability representation. An evaluation using an enhanced area-weighted minimum distance drawability measure is applied to the HF model to obtain supervision for the training. The purpose of the classifier is to distinguish between drawable and non-drawable configurations to detect scrap configurations. It is shown that high prediction scores (F0.5-score of 0.943 and Accuracy of 94.5%) can be achieved using LSVC. A major finding is that almost identical prediction quality can be achieved for the classifier even with a much smaller data set. Small potential lies in adding one-step-based features at the cost of running the simulation. Very accurate, late-design-phase FEM models will remain irreplaceable in a limited-data environment. However, the calculation of many small to medium-sized simulations during the early design process can be significantly reduced. Another benefit of our method is that it can be used by component developers who are not simulation engineers. Reproducible evaluations by the classifier allow decision pathways to be traced for process improvements. Saving calculation time and licensing costs is another advantage.
The regressor can be used to rank the drawing setups at hand. The goal of accelerating the assessment while retaining prediction quality is achieved by splitting the regressor into one model for the drawable and one for the non-drawable domain, which yields higher prediction quality. The stacking regressors reach R²-scores of 0.916 for drawable and 0.843 for non-drawable configurations for a data set of 2536 samples. If the data set size is limited, R²-scores of 0.835 and 0.744 can be achieved with about 240 samples (20 ⋅ d) each. The classification model can be used to decide which regressor to choose for prediction.
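The two-stage prediction scheme described above can be sketched as a small routing function. All names here are hypothetical stand-ins, not the authors' implementation: the classifier routes a configuration to the drawable or non-drawable regressor, each trained only on its own domain.

```python
def assess(features, classifier, reg_drawable, reg_non_drawable):
    """Return (is_drawable, drawability_score) for one configuration.

    The classifier decides the domain; the matching domain-specific
    regressor then provides the drawability ranking score.
    """
    if classifier(features):  # True -> predicted drawable
        return True, reg_drawable(features)
    return False, reg_non_drawable(features)


# Toy stand-ins for the trained models (thresholds and scores invented):
clf = lambda x: x["blankholder_force"] < 0.5
drawable_reg = lambda x: 0.9
non_drawable_reg = lambda x: 0.2

ok, score = assess({"blankholder_force": 0.3}, clf, drawable_reg, non_drawable_reg)
```

Splitting the regressor this way means each model only has to fit the smoother response surface of its own domain, which is the stated reason for the higher prediction quality.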
The use of classification and regression for drawability assessment saves and speeds up design iteration loops, decoupling part design and simulation to some extent. We can conclude that our cross-die showcase illustrates the applicability of the method to sheet metal deep drawing. Moreover, the 2S-ML architecture is not limited to deep drawing: the application-dependent features and labels can be adapted to assess other manufacturing processes if a hierarchy between different sources of information (e.g. simulation, measurement) can be defined. Other manufacturing process surrogates trained with this architecture, especially forming applications, can benefit from the presented features.
To apply the proposed method, a data set of previously investigated drawing configurations has to exist. Given that these setups have diverse shapes, the cross-die-specific features z_slant, Ψ_slant, r_die, x_die, and y_die (12.5% of the proposed features) cannot be utilized. Also, there is typically more than one feasible tooling configuration per part; to account for this influence, further features describing the tooling geometry have to be added to the feature set to ensure the applicability of our proposed method. As most documented configurations are likely to be drawable, there is a risk of imbalanced learning. Another limitation of our method is the loss of locally resolved feedback: there is no way to determine where, and to what extent, drawability is given on a part, as the surrogate models only allow for a global assessment. Since no measured press data is included in the architecture, the prediction can only be as accurate as the incremental deep drawing simulation. Further influences such as tailor-rolled or welded blanks, simultaneous multiple-part drawing, edge crack sensitivity, and springback are not considered in the presented workflow. Furthermore, additional drawability criteria such as dimensional accuracy and surface irregularities have to be taken into account once automotive body parts are assessed. The 2S-ML architecture has the flexibility in its feature set and labeling process to consider these influences.
Future investigations could focus on overcoming the current limitations. Furthermore, we identified the following promising aspects:
• Investigating the sensitivities of the surrogate models will give insight into which models rely on which features. This has the potential to detect further features, to estimate the risk of overfitting through reliance on one or a few features, and to reveal nonsensical correlations of features to a prediction that will not generalize well to unseen data.
• Assuming an adequate number of industry-standard geometries is available, deep learning approaches are likely to further improve the drawability assessment.
• If not directly applicable to optimization, the regression surrogate could serve as the source domain for transfer learning to a fine-tuned part-specific model. This approach is likely to decrease the number of samples necessary for part-specific surrogates. It could also be used in an MF scheme.
• Alternative training methods such as semi-supervised learning can be applied to decrease the number of computationally intensive HF simulation runs.
are calculated with r_die as the die radius, z_slant as the slant depth of the cross-die, and the punch radius r_punch. The punch radius is calculated here from the geometry parameters as r_punch = r_die − t ⋅ (1 + t_s) with a safety gap of t_s = 0.05. The blankholder force is f_blkh, and I_x, I_y are the geometrical moments of inertia of the cross geometry about the x- and y-axis (compare Fig. 16). Additionally, A denotes the surface area and V the volume of either the cross, punch, or blank geometry. l_cross,convexhull denotes the arc length of the boundary edge of the convex hull of the cross. An exemplary bounding box and convex hull edge curve involved in the feature computation are shown in Fig. 16.
A_sphere(V_cross,boundingbox) refers to the surface area of a sphere whose volume is equivalent to the volume of the cross's bounding box V_cross,boundingbox. n_e represents the number of elements, with H_e and A_e the element-wise mean curvature and surface area, respectively. The mean and standard deviation of a probability density function of all elements' Gaussian curvatures are used as two further features. The features s_par, s_hyp, s_ell, and s_pla are the percentages of the cross's nodes that meet the differential geometry criteria parabolic, hyperbolic, elliptic, and planar, respectively.

Fig. 16 The bounding box and convex hull edge curve (red) of the cross geometry used in the calculation of features c_2 and c_3.

Fig. 17 The proposed 40 initial features to train the regression and classification models. After removing aberrations, the feature values fill the feature space, except for ps_m1 and h_min; the values of these features have no variability and are therefore not useful for training surrogates. To be suitable for training with all algorithms, all features are min-max normalized.
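The sphere-equivalent surface feature can be reconstructed directly from the definition above. This is a hedged sketch (the function name is ours): from V = (4/3)πr³ follows r = (3V / 4π)^(1/3), and the surface area is A = 4πr².

```python
import math


def sphere_surface_from_volume(volume):
    """Surface area of a sphere whose volume equals the given volume.

    Used here for A_sphere(V), with V the cross's bounding-box volume.
    """
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 4.0 * math.pi * radius ** 2
```

Among all bodies of a given volume the sphere has the smallest surface area, so this feature provides a scale-aware lower reference for the surface-to-volume relation of the bounding box.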
Material features are derived from the hardening curve, i.e. from its true stress and true strain. The manufacturing source contains, with v_punch as the velocity of the punch, a feature that is relevant if the strain-rate influence is considered. f_dwb is the drawbead force proposed by the one-step simulation scheme. Its features are defined with t_e,0 as the initial and t_e,f as the final sheet thickness. E is the Young's modulus. d_e,0 is an element's Euclidean distance from its final state f to its unformed state 0. A_e,c, A_e,w, and A_e,g denote a cracked, wrinkled, or good element's surface area, respectively. os_drawability is the drawability measure (see Eq. 3) applied to the LF simulation. Features ps_m2 and ps_t are coefficients of a fit with a polynomial of degree nine, and st_m1, st_t, and st_h are coefficients of a fit using a polynomial of degree three.
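Extracting polynomial-fit coefficients as scalar features, as done for ps_m2/ps_t (degree nine) and st_m1/st_t/st_h (degree three), can be sketched with a standard least-squares fit. The interface below is an assumption for illustration, not the authors' code.

```python
import numpy as np


def poly_coeff_features(x, y, degree):
    """Least-squares polynomial fit of y over x.

    Returns the coefficients, highest power first, for use as
    scalar curve features.
    """
    return np.polyfit(np.asarray(x), np.asarray(y), degree)


# Example: recover a known cubic 2x^3 - x + 5 from noiseless samples.
x = np.linspace(-1.0, 1.0, 20)
coeffs = poly_coeff_features(x, 2.0 * x**3 - x + 5.0, degree=3)
```

Compressing a sampled curve into a handful of fit coefficients keeps the feature vector low-dimensional while retaining the curve's shape information.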
Authors' contributions TL: Conceptualization, Methodology, Software, Formal analysis and investigation, Writing-original draft, Writingreview and editing; AK: Software, Writing-review and editing; IL: Software, Writing-review and editing; FD: Writing-review and editing, Supervision; MW: Writing-review and editing, Supervision.
Funding Open Access funding enabled and organized by Projekt DEAL. This project is supported by the Federal Ministry for Economic Affairs and Climate Action (BMWK) on the basis of a decision by the German Bundestag.
Data availability Not applicable.
Code availability Not applicable.

Declarations
Ethics approval Not applicable.

Competing interests
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.