Single and ensemble classifiers for defect prediction in sheet metal forming under variability

This paper presents an approach, based on machine learning (ML) techniques, to predict the occurrence of defects in sheet metal forming processes exposed to sources of scatter in the material properties and process parameters. An empirical analysis of the performance of ML techniques is presented, considering both single learning and ensemble models. These are trained using data sets populated with numerical simulation results of two sheet metal forming processes: U-Channel and Square Cup. Data sets were built for three distinct steel sheets. A total of eleven input features, related to the mechanical properties, sheet thickness and process parameters, were considered; also, two types of defects (outputs) were analysed for each process. The sampling data were generated assuming that the variability of each input feature is described by a normal distribution. For a given type of defect, most single classifiers show similar performances, regardless of the material. When comparing single learning and ensemble models, the latter can provide an efficient alternative. The fact that ensemble predictive models present relatively high performances, combined with the possibility of reconciling model bias and variance, offers a promising direction for their application in industrial environments.


Introduction
Sheet metal forming is a manufacturing process that is commonly used for producing high-volume and low-cost components in the automotive, aircraft and home appliance industries. In this process, forces are applied to the metallic sheet to modify its geometry, enabling the production of complex shapes. The forces are applied by tools whose geometry dictates the shape of the component. The process design is complex because only the final shape of the component is known. Moreover, the process is highly nonlinear due to the large deformations imposed to the metal sheet, which presents plastic behaviour, but also as a result of the evolutionary boundary conditions imposed by the contact between the tools and the sheet. The conventional process design is based on empirical knowledge and an experimental ''trial-and-error'' approach. In this context, the virtual tryout of sheet metal forming components, based on the finite element method (FEM), has become an indispensable industrial tool to save design effort, money and time during the process set-up and production. The rationale of FEM comes from the optimization of the process parameters, such as the tools geometry. FEM is a deterministic numerical tool since it enables the prediction of forming defects, such as localized necking, fracture and springback, for a predefined set of process and material parameters [4]. Nevertheless, it should be noticed that there are numerous variables involved in sheet metal forming process, which are related to the material properties, the tools geometry and process parameters. This makes the optimization of process conditions quite complex, particularly in the production of components which require several stages, and thus more than one set of tools. Therefore, the virtual tryout of sheet metal forming components with FEM is normally performed considering predefined material properties and values for some process parameters, such as the friction coefficient among others. 
In fact, the virtual tryout is still reliant on human expertise used to make key decisions at different stages of the design process. Still, even when resorting to FEM, unpredicted defects can occur in the experimental tryout or during production, which can be associated with the scatter observed in material properties, tools geometry and process parameters. The increasing competitiveness and relevance of sustainability issues in the industries lead to growing demands for high-quality components and reducing the costs generated by the production of defective components (scrap).
In this work, an approach to extract information from sheet metal forming processes, exposed to sources of scatter in the material properties and process parameters, is proposed in order to enable the prediction of defects. The motivation is to reduce the costs and the time spent in the production of defective sheet metal components, thereby contributing to improving the industry's efficiency. Machine Learning (ML) techniques are used, assuming that they can build models able to generalize well to unseen data. In this context, an empirical analysis of the performance of ML techniques is conducted, considering single and ensemble classifiers. These are trained using data sets populated with numerical simulation results of two sheet metal forming processes: U-Channel and Square Cup. These processes were chosen for two main reasons: (i) they are benchmark tests commonly used to investigate the influence of the material as well as the process parameters on the occurrence of forming defects; (ii) they allow fast numerical simulations, which is suitable for performing a large number of simulations. Since these processes present distinct features, different types of defects were considered for each one. Each type of defect is studied separately using a binary classification. Moreover, the data sets are generated for each forming process, for three steels with distinct mechanical properties.
The paper is organized as follows: Section 2 presents the details of the sheet metal forming processes and a review about ML applications in this context. Section 3 describes the proposed approach for evaluating the performance of both single and ensemble ML classifiers in predicting defects in sheet metal forming processes. The selected ML classifiers and ensemble methods are also discussed. Section 4 introduces the FEM models for the two forming processes under analysis. The procedure for generating and pre-processing the data sets as well as the evaluation metrics is also indicated. In Sect. 5, the results are presented and discussed. Firstly, the ML classifiers are analysed under a monolithic approach, considering also the influence of the size of sampling data. Afterwards, the analysis of the performance is conducted for the ensemble approach. Finally, the performance of both single and ensemble classifiers is compared. Section 6 presents the conclusions and future perspectives.

Sheet metal forming
Sheet metal forming includes simple processes, such as bending, stretch forming and spinning, and more complex processes, like roll forming and deep drawing [16]. Each type of process has its specifications and parameters, including the tools geometry. Process design becomes even more complex when it is required to combine several processes and/or steps to produce the component. The main driver for the development of numerical tools, enabling the virtual tryout of sheet metal forming components, is the industry, in particular the automotive industry, due to the enormous amount of components involved in car production. The outer panels are usually the largest components, and their production involves the most complex operations, including deep drawing. As shown in Fig. 1, in deep drawing the metallic sheet is plastically deformed into the desired shape by the action of forming tools, which typically consist of a punch, a die and a blank holder. The blank (i.e. non-deformed metal sheet) is placed over the die, and it is forced to flow into the die cavity by the movement of the punch; the flow of the sheet is typically controlled with a blank holder, i.e. a tool that imposes a constant force on the flange region of the sheet. Thus, even for a simple forming process and assuming that the mechanical behaviour of the metallic sheet is known, there are many design variables to be considered, which are related to the blank and tools geometry and with the process control, such as the blank holder force. As previously mentioned, the FEM-based virtual tryout of sheet metal forming components enables a feasible process design to be achieved through the repetitive adjustment of process parameters based on the personal experience of the designer. However, to fully explore the finite element analysis, it has been combined with optimization algorithms in order to determine the process parameters automatically (e.g. [29,37,42]). 
This approach is more or less computationally intensive, depending on the number of design variables and the type of optimization algorithm selected. The number of trial experiments (i.e. numerical simulations) can be reduced resorting to a surrogate model (also called meta-model), used to guide the search for optimized parameter combinations. Different meta-modeling methods have been applied to the optimization of manufacturing process parameters (e.g. [18]), including artificial neural networks (ANNs) (e.g. [26,34]). In the particular case of forming processes, researchers are also trying to explore the large amount of data generated (both experimental and numerical) while designing new products, to guide the process design from its early stage with the help of ANN meta-models to predict product feasibility (e.g. [40,45]). In the context of early design stages, neural network classifiers have been applied in automating the sheet forming selection process, as an alternative to rule-based programmes [16]. Moreover, the robustness of the process design is questionable when neglecting the sources of scatter inherent to the process. The process design can be optimum for a specific combination of parameters but can easily lead to defective components due to slight variations introduced by scatter. In this context, a robust process window should be evaluated, in order to minimize the production of defective components (e.g. [46]). Therefore, much of the recent work has focused on statistical descriptions of variability within FEM, for assessing the sensitivity of defect predictions to the scatter of the parameters under analysis [19,35,47]. In FEM, the material properties are commonly described using physics-based constitutive models. ML-based models have been pursued as an alternative to this type of models (e.g. [21,23]), since the neural network does not require any prior assumption on the mathematical formulation between the input and output variables.
The prime value added by ML is the ability to unveil the intrinsic response of a material in case of convoluted experimental data [24]. Nevertheless, some authors point out that physics-based constitutive models continue to provide useful insights to interpret the phenomena taking place, pursuing a different approach that uses machine learning to construct automatic corrections to existing models, based on data [14].

Machine learning applications to sheet metal forming
This subsection provides an overview of the literature on ML applications in sheet metal forming. Table 1 shows a comparative outline of ML applications in the prediction of forming defects [9,13,15,20,22,25,28,30,32,39,41], which is the focus of the current work. Additional applications include: (i) material parameters' identification (e.g. [1,2,3,6]); (ii) bend angles' prediction in laser forming processes (e.g. [7,10]); (iii) die roll height prediction in fine blanking (e.g. [43,48]); (iv) optimization of incremental sheet metal forming processes (e.g. [12,17,44]). The summary presented in Table 1 highlights that ML applications have been focused on regression. In this regard, back-propagation-based artificial neural network (BP-ANN) is the primary option for the development of prediction models, some of them coupled with FEM analysis [9,22,28,30,32,41]; ANN models trained with genetic algorithms (GA-ANN) were also developed [25,39]. Most ML strategies in Table 1 are used to predict and account for springback in steel and aluminium parts obtained by sheet bending. This may be connected with the fact that springback (related to the elastic recovery of the material after tool release) is one of the main sources of geometrical and dimensional inaccuracy in sheet metal formed components, but also because of the simple geometries used. Nevertheless, models were also built to predict wrinkling and necking defects [9,32]. In general, the features for training the ML models are material parameters (namely elastic and/or plastic properties) and the initial sheet thickness. This can be related to the fact that the standards for commercial metal sheets specify only a minimum allowable value or a relatively large range of values for the mechanical properties. Nevertheless, there are models that also consider process parameters. Globally, the literature review reveals that ML techniques to predict defects in sheet metal forming take into account different set-ups.
Although promising results were reported, techniques to predict more than one type of defect for different types of materials and forming processes have rarely been considered. To the best of the authors' knowledge, there are currently no studies available in the literature regarding ML classification focused on defect prediction in sheet metal forming processes under variability, which is the main subject of the current work.

Proposed approach for building defect predictive models
This work focuses on the building of models able to predict the occurrence of defects for different types of materials and sheet metal forming processes under variability. Figure 2 presents the schematic diagram of the proposed approach, considering the branches for the monolithic and the ensemble classifiers. The first phase of both approaches consists in training the selected classifier. When resorting to an ensemble model, either stacking or majority voting is used in the learning phase. Once the training phase is concluded, the predictive model is tested and the performance analysis is accomplished. To simplify Fig. 2, the predictive model is represented by only one box, although distinct types are built depending on the approach (monolithic or ensemble). To guarantee a proper comparison of performance, each model uses the same training and testing data, obtained from the same scaled sampling data. In addition, the same configuration with random weights was used.

Single learning classifier models
To identify the best predictor of each sheet metal forming defect, seven ML classifiers were selected:
- Multilayer perceptron (MLP)
- Decision tree (DT)
- Random forest (RF)
- Naive Bayes (NB)
- Support vector machine (SVM)
- K-nearest neighbours (KNN)
- Logistic regression (LR)
Seven ML models were created for each of the two types of defects considered in each of the two forming processes under analysis, for three different materials. The models were built using Python v3.6.2 and related libraries, such as the SciPy ecosystem and scikit-learn, using default values for the parameters of each classifier [5,33]. The following sections provide a theoretical background concerning each of the studied classifiers.
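As an illustration of the set-up described above, the seven classifiers could be instantiated with library defaults as follows. This is a minimal sketch: the text names only the classifier families, so the specific scikit-learn classes chosen here (for instance `GaussianNB` for Naive Bayes and `SVC` for the support vector machine) and the helper name `build_single_classifiers` are assumptions.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression


def build_single_classifiers():
    """Return one instance of each classifier family, all with default parameters.

    The mapping from family name to scikit-learn class is an assumption.
    """
    return {
        "MLP": MLPClassifier(),
        "DT": DecisionTreeClassifier(),
        "RF": RandomForestClassifier(),
        "NB": GaussianNB(),
        "SVM": SVC(),
        "KNN": KNeighborsClassifier(),
        "LR": LogisticRegression(),
    }


classifiers = build_single_classifiers()
```

Each entry exposes the usual `fit` and `predict` methods, so one model per defect, process and material can be trained by looping over this dictionary.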

Multilayer perceptron (MLP)
The multilayer perceptron (MLP) is a class of feed-forward neural networks that consists of one input layer with $n$ neurons, $X_n = (x_1, x_2, \ldots, x_n)$, at least one hidden layer (the number of hidden layers is arbitrary) and one output layer. Each neuron is connected to the neurons of the next layer, but neurons within the same layer are not interconnected. The MLP learning process adapts the connection weights in order to obtain a minimal difference between the network output and the desired output, resorting to learning algorithms such as back-propagation, which is based on gradient descent techniques. Computing the MLP output requires the output of each unit in each layer, considering the set of hidden layers $H = (h_1, h_2, \ldots, h_n)$ and $n_i$ neurons in each hidden layer $h_i$. The output of neuron $j$ in hidden layer $l$ is calculated as:
$$h_l^j = \phi\left(\sum_{i=1}^{n_{l-1}} w_{ij}^{l-1}\, h_{l-1}^i\right)$$
where $l$ is the layer position in the MLP architecture, $\phi$ is the (nonlinear) activation function and $w_{ij}^{l-1}$ are the weights between neuron $i$ in layer $l-1$ and neuron $j$ in hidden layer $l$; for the first hidden layer, the previous-layer outputs $h_0^i$ are the inputs $x_i$. Finally, the network output is computed by:
$$y_j = \phi\left(\sum_{i=1}^{n_l} w_{ij}^{l}\, h_l^i\right)$$
where $w_{ij}^{l}$ is the weight between neuron $i$ in the last hidden layer ($l = n$) and neuron $j$ in the output layer.
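The layer-by-layer computation described above can be sketched in a few lines. This is an illustrative forward pass only, assuming a hyperbolic tangent activation and no bias terms (neither is specified in the text); the function names and the toy weights are hypothetical.

```python
import math


def layer_output(inputs, weights, activation=math.tanh):
    """Output of one layer; weights[i][j] links neuron i of the previous
    layer to neuron j of the current layer."""
    n_out = len(weights[0])
    return [
        activation(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
        for j in range(n_out)
    ]


def mlp_forward(x, weight_matrices, activation=math.tanh):
    """Propagate the input vector x through every layer in turn."""
    h = list(x)
    for weights in weight_matrices:
        h = layer_output(h, weights, activation)
    return h


# Toy network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
x = [1.0, 2.0]
w_hidden = [[0.5, -0.5], [0.25, 0.25]]
w_output = [[1.0], [1.0]]
y = mlp_forward(x, [w_hidden, w_output])
```

Training (i.e. adapting the weights by back-propagation) is not shown; the sketch covers only how an output is obtained once the weights are fixed.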

Decision tree (DT)
The decision tree (DT) is a nonparametric classifier that splits data successively, based on simple decision rules. The choice of which feature to consider when splitting the data on each node is made in order to maximize information gain, which means minimizing:
$$G = \sum_{i} \frac{n_i}{N}\, H(D_i)$$
where $n_i$ is the number of examples in the resulting node $i$, $N$ is the total number of examples in the node being split, $D_i$ represents the data in resulting node $i$, and $H$ is an impurity function, such as entropy:
$$H(D) = -\sum_{k} p_k \log p_k$$
where $p_k$ is the probability that an example in the data set $D$ corresponds to label $k$. This splitting process is repeated until each of the final nodes (leaves) only has samples with the same label. Alternatively, a stopping criterion can be defined in order to avoid overfitting.

Random forest (RF)
Random forest (RF) is an ensemble learning method that combines several randomized decision trees and aggregates their predictions by averaging, in order to solve classification and regression problems. In the binary supervised classification problem, the random response $Y$ takes values in $\{0, 1\}$ and, given an input $X$, the goal is to guess the value of $Y$. A classification rule $m_n$ is a measurable function of $x$ and the training sample $T_n$ that attempts to estimate the label $Y$ from $x$ and $T_n$, where $T_n = ((X_1, Y_1), \ldots, (X_n, Y_n))$ is a sample of independent random variables distributed the same as the independent prototype pair $(X, Y)$, and $X = \{X_1, X_2, \ldots, X_n\}$.
A random forest is a predictor consisting of a collection of $M$ randomized regression trees. For the $k$th tree in the family, the predicted value at the query point $x$ is denoted by $m_n(x; \theta_k, T_n)$, where $\theta_k$ is a random vector generated with independent random variables of the $k$th tree, independent of the past random vectors $\theta_1, \ldots, \theta_{k-1}$ but with the same distribution.
In the classification situation, the random forest classifier is obtained via majority voting among the classification trees, that is:
$$m_{M,n}(x; \{\theta_1, \ldots, \theta_M\}, T_n) = \begin{cases} 1 & \text{if } \dfrac{1}{M}\sum_{k=1}^{M} m_n(x; \theta_k, T_n) > \dfrac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$
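Both a single decision tree and the individual trees of a forest choose their splits with an impurity criterion such as entropy. The following minimal sketch shows how a candidate split could be scored as the size-weighted entropy of the resulting nodes; the base-2 logarithm and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def weighted_impurity(children):
    """Size-weighted entropy of the nodes produced by a candidate split.

    Lower values correspond to a higher information gain.
    """
    total = sum(len(child) for child in children)
    return sum(len(child) / total * entropy(child) for child in children)


# A split that separates the two labels perfectly versus one that does not.
perfect_split = weighted_impurity([[0, 0], [1, 1]])
useless_split = weighted_impurity([[0, 1], [0, 1]])
```

A tree-growing procedure would evaluate this quantity for every candidate feature and threshold and keep the split with the lowest value.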

Naive Bayes (NB)
Naive Bayes is a classifier based on the application of Bayes' theorem, with the (naive) assumption that every pair of features is conditionally independent given the class label. Bayes' theorem states that:
$$P(y \mid x_1, \ldots, x_n) = \frac{P(y)\, P(x_1, \ldots, x_n \mid y)}{P(x_1, \ldots, x_n)}$$
After applying the naive assumption, this expression is simplified to:
$$P(y \mid x_1, \ldots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)}$$
For a given input, the denominator is the same for every label, so a proportionality is considered:
$$P(y \mid x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$
The chosen label is the one that presents the maximum probability:
$$\hat{y} = \arg\max_{y}\; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$
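The decision rule above can be sketched directly. The text does not state which likelihood model is used for $P(x_i \mid y)$, so a Gaussian likelihood per feature is assumed here; the function names, priors and per-class parameters are hypothetical toy values.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as P(x_i | y)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def naive_bayes_predict(x, priors, params):
    """Return the label maximizing log P(y) + sum_i log P(x_i | y).

    params[label] is a list of (mean, std) pairs, one per feature.
    """
    scores = {}
    for label, prior in priors.items():
        log_score = math.log(prior)
        for xi, (mean, std) in zip(x, params[label]):
            log_score += math.log(gaussian_pdf(xi, mean, std))
        scores[label] = log_score
    return max(scores, key=scores.get)


priors = {0: 0.5, 1: 0.5}
params = {0: [(0.0, 1.0), (0.0, 1.0)], 1: [(2.0, 1.0), (2.0, 1.0)]}
label = naive_bayes_predict([1.8, 2.1], priors, params)
```

Working with logarithms replaces the product by a sum, which leaves the arg max unchanged while avoiding numerical underflow when many features are involved.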

Support vector machine (SVM)
A support vector machine (SVM) is a supervised learning model used to solve classification or regression problems. It is characterized as a discriminative classifier that finds the optimal hyperplane separating the data points of the two classes. The method consists in the binary classification of the training examples with features $x$ and labels $y$, where $y \in \{-1, 1\}$, and uses the following function for classification:
$$h_{w,b}(x) = g(w^T x + b)$$
The SVM classifier directly predicts $1$ or $-1$, instead of first estimating the probability of $y$ being $1$, where
$$g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
The separating hyperplane is completely defined by $(w, b)$. Given a training sample $(x_n, y_n)$, the functional margin can be defined as:
$$\hat{\gamma}_n = y_n \left(w^T x_n + b\right)$$
Given a training set of $N$ samples that is linearly separable, the optimization problem described by the following equation should be solved:
$$\min_{w,b}\; \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_n \left(w^T x_n + b\right) \ge 1, \quad n = 1, \ldots, N$$
The above is an optimization problem with a convex quadratic objective and only linear constraints, providing the optimal margin classifier.
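For a hyperplane $(w, b)$ that is already known, the classification function and the functional margin amount to a few arithmetic operations, as the sketch below illustrates. The values of `w` and `b` are hypothetical; solving the optimization problem that yields them is not shown.

```python
def decision_value(w, b, x):
    """Signed score w^T x + b of a sample with respect to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_predict(w, b, x):
    """Predict +1 or -1 directly from the sign of the decision value."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def functional_margin(w, b, x, y):
    """Functional margin y * (w^T x + b); positive when the sample is
    classified correctly."""
    return y * decision_value(w, b, x)


w, b = [1.0, -1.0], -0.5
prediction = svm_predict(w, b, [2.0, 0.5])
margin = functional_margin(w, b, [0.0, 1.0], -1)
```

A sample satisfies the constraint of the optimization problem when its functional margin is at least 1, which is the case for the second toy sample.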

K-nearest neighbours (KNN)
The k-nearest neighbours classifier does not create a model with the training data. Instead, each time it performs a prediction for a certain point, it starts by calculating the distance between each of the training data points and the test point. Then, the k training points that are nearest to the test point are selected, and these are used to make the prediction. The result of the prediction can be obtained by a simple majority vote from the selected training points. The KNN classifier is often known as a lazy learner, since there is no training procedure but rather an assignment of the labels to the training instances in the first phase. In the second phase, the computation of the distance is performed as explained above.
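The two phases described above can be sketched as follows, assuming the Euclidean distance (the text does not name the metric) and a hypothetical toy training set.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    distances = sorted(
        (euclidean(point, query), label) for point, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
label = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
```

Every prediction scans the whole training set, which is why the cost of this classifier is paid at prediction time rather than at training time.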

Logistic regression (LR)
Logistic regression (LR) studies the association between a categorical dependent variable $y$ and a set of independent (explanatory) variables $x$, where $y$ consists of a binary code (0 or 1; true or false) and $x$ is numerical. When these requirements are satisfied, this method fits a logistic curve, i.e. a sigmoid curve, to the relationship between $x$ and $y$. The sigmoid curve starts with slow growth, followed by a phase of rapid growth, which then slows again as the curve approaches its upper asymptote. The simple logistic function is defined as follows:
$$f(x) = \frac{1}{1 + e^{-x}}$$
With the aim of providing more flexibility to the function, the logistic regression formula can be extended to the form:
$$f(x) = \frac{1}{1 + e^{-(a + b x)}}$$
where $a$ and $b$ are, respectively, the intercept and the regression coefficient.
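The effect of the intercept and the regression coefficient on the logistic curve can be checked numerically with the short sketch below; the parameter values and the 0.5 decision threshold are illustrative assumptions.

```python
import math


def logistic(x, a=0.0, b=1.0):
    """Extended logistic function 1 / (1 + exp(-(a + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def predict_label(x, a, b, threshold=0.5):
    """Binary decision obtained by thresholding the logistic output."""
    return 1 if logistic(x, a, b) >= threshold else 0


midpoint = logistic(0.0)                    # simple logistic function at x = 0
shifted_midpoint = logistic(2.0, a=-2.0)    # intercept moves the midpoint to x = 2
```

With default parameters the function reduces to the simple logistic function, whose value at the origin is exactly one half.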

Ensemble models
Ensemble methods combine single classifiers, called base learners in this context, and tend to be more stable and to predict better than single classifiers. The rationale is to reduce the bias and variance of the model to improve predictions. The goal is to build a model that is less noisy, more stable and less prone to overfitting. Since various base learners are used, each one can lead to a different prediction, and diversity among the base learners is a key aspect of ensemble performance. In this work, the following ensemble methods were used:
- Majority voting: in the initial phase, each base learner is trained. Afterwards, each base learner is fed with the testing data in order to obtain a prediction. The final predicted label is the one that has more than half of the votes;
- Stacking: similarly, in the initial phase, each base learner is trained. Afterwards, their outputs are used as features to train another ML classifier, called the meta-learner, which makes the final prediction.
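The majority voting rule described above can be sketched as follows. The function name is hypothetical, and returning `None` when no label exceeds half of the votes is a choice made here for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that received more than half of the votes.

    Returns None when no label reaches a strict majority, which can only
    happen with an even number of base learners in a binary problem.
    """
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else None


# Predictions of three and five base learners for one test sample.
vote_3 = majority_vote([1, 0, 1])
vote_5 = majority_vote([0, 0, 1, 0, 1])
tie = majority_vote([1, 0])
```

The tie case illustrates why an odd number of base learners is convenient for binary classification.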
Taking into account that majority voting favours the use of odd numbers of base learners, both methods were tested using combinations of 3 and 5 base learners. All possible combinations of the single classifiers described in Sect. 3.1 were tested. These classifiers were also tested as meta-learners. With seven classifiers, there are 35 combinations of 3 base learners and 21 combinations of 5, which leads to a total of 56 combinations for majority voting; pairing each of them with each of the seven possible meta-learners gives 392 combinations for stacking.
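The number of ensembles quoted above can be reproduced by enumerating the combinations explicitly. This is a sketch of the counting only; the variable and function names are hypothetical.

```python
from itertools import combinations

CLASSIFIERS = ["MLP", "DT", "RF", "NB", "SVM", "KNN", "LR"]


def base_learner_sets(names, sizes=(3, 5)):
    """All combinations of base learners for the given ensemble sizes."""
    return [combo for size in sizes for combo in combinations(names, size)]


# Majority voting: one ensemble per set of base learners.
voting_ensembles = base_learner_sets(CLASSIFIERS)

# Stacking: every set of base learners paired with every possible meta-learner.
stacking_ensembles = [
    (combo, meta) for combo in voting_ensembles for meta in CLASSIFIERS
]

print(len(voting_ensembles))    # 56
print(len(stacking_ensembles))  # 392
```

Note that this enumeration allows the meta-learner to be of the same type as one of the base learners, which is what the total of 392 implies.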

Simulated data sets
The sampling data were generated using numerical simulation results, obtained with DD3IMP in-house FEM code [27,31]. The numerical models for the U-Channel and the Square Cup processes are shown in Fig. 3. In both cases the total punch displacement is 30 mm. The U-Channel corresponds to a bending process, and thus, the sheet is prone to significant springback. This type of defect is negligible in the Square Cup. In both cases, the occurrence of excessive thinning is an indicator of necking, which can also be controlled using the maximum equivalent plastic strain (EPS). Thus, two types of defects were considered: (i) springback and maximum thinning, for the U-Channel, and (ii) maximum EPS and maximum thinning, for the Square Cup. The tool geometry and the initial in-plane shape of the sheet are assumed fixed. Each process was simulated considering three steels commonly used in the automotive industry that cover a wide range of mechanical properties and applications: DC06 (mild steel), HSLA340 (high-strength low-alloy steel) and DP600 (dual-phase steel). The constitutive model considers: (i) elastic behaviour, Young's modulus, $E$, and Poisson ratio, $\nu$; (ii) plastic behaviour, yield stress, $Y_0$, strength and hardening coefficients, $C$ and $n$, and anisotropy coefficients $r_0$, $r_{45}$ and $r_{90}$. The initial sheet thickness $t_0$ is also considered. The variability in the input features related to the material parameters is typified by a normal distribution, with mean ($\mu$) and standard deviation ($\sigma$) values shown in Table 2. Two input features related to process parameters were also considered: the friction coefficient and the blank holder force (BHF). The mean value of the friction coefficient is 0.144 for all materials, with $\sigma/\mu = 20\%$ [19]. For the BHF, two mean values were considered, which correspond to a lower and an upper level of the process window. For the U-Channel, the mean values used were 4.9 and 19.6 kN, while for the Square Cup, they were 2.45 and 9.8 kN.
For each BHF value, the variability is $\sigma/\mu = 5\%$. Thus, the variability of a total of 11 features was considered in the analysis of both forming processes, for the three materials. In this context, random numerical simulations were performed within the range of variation of the input features (see Table 2). The numerical simulations using the mean values of the input features presented in Table 2 lead to a non-defective solution, which is considered as a reference solution. A defect occurs when the output value obtained from the random simulations is greater than that of the reference solution, whose values are presented in Table 3.
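The sampling and labelling logic described above can be sketched for the two process parameters whose statistics are stated in the text (friction coefficient and BHF). The function names and the dictionary layout are hypothetical, and the FEM simulation that maps the inputs to an output value is not represented.

```python
import random


def sample_feature(mean, rel_std, rng):
    """Draw one value from a normal distribution with sigma = rel_std * mean."""
    return rng.gauss(mean, rel_std * mean)


def sample_process_inputs(n, bhf_mean, rng):
    """Generate n random (friction, BHF) input sets.

    Friction: mean 0.144, sigma/mu = 20%. BHF: given mean, sigma/mu = 5%.
    """
    return [
        {
            "friction": sample_feature(0.144, 0.20, rng),
            "bhf_kN": sample_feature(bhf_mean, 0.05, rng),
        }
        for _ in range(n)
    ]


def label_defect(output_value, reference_value):
    """Binary label: 1 (defect) when the output exceeds the reference solution."""
    return 1 if output_value > reference_value else 0


samples = sample_process_inputs(5, bhf_mean=4.9, rng=random.Random(0))
```

In the actual workflow each sampled input set would be passed to the FEM code, and `label_defect` would be applied to the simulated springback, thinning or EPS value.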

Data set pre-processing
The data sets were split into training (70%) and testing data (30%). Data scaling was performed in all the data sets. A maximum of 2000 experiments (i.e. numerical simulation results) was considered for each material and forming process. The data were randomly shuffled in order to repeat the process 30 times (runs).

Note to Table 2: For each material and feature, the first row is the mean value and the second row is the standard deviation. The Poisson ratio feature ($\mu = 0.3$, $\sigma = 0.015$) is identical for all materials.
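The shuffling, splitting and scaling steps of this section can be sketched as follows. The text does not state which scaling method was applied, nor whether the scaling statistics were computed before or after the split; standardization based on the training data is assumed here, and all function names are hypothetical.

```python
import random
import statistics


def shuffle_split(rows, train_fraction, rng):
    """Shuffle a copy of the rows and split it into training and testing parts."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def standardize(train, test):
    """Scale each column to zero mean and unit standard deviation,
    using statistics computed on the training rows only (an assumption)."""
    columns = list(zip(*train))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    return scale(train), scale(test)


# Toy data set with two features; 30 runs with a different shuffle each time.
rows = [[float(i), float(2 * i)] for i in range(10)]
runs = []
for seed in range(30):
    train, test = shuffle_split(rows, 0.7, random.Random(seed))
    runs.append(standardize(train, test))
```

Repeating the shuffle with a different seed per run is what allows the mean and standard deviation of a performance metric to be reported over the 30 runs.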

Performance measures
The F-score measure is used to evaluate the performance of both single and ensemble classifiers. This performance metric combines both precision and recall metrics and provides a break-even between them. The F-score is calculated as the harmonic mean between precision and recall, as follows:
$$F\text{-score} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
where precision is the proportion of correctly classified positive instances (true positives, TP) among all the instances classified as positive (true positives, TP, and false positives, FP), and recall is the proportion of correctly identified instances of a class (TP) among all the instances of that class (true positives, TP, and false negatives, FN):
$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$

Results and discussion

Single classifiers
Figures 4 and 5 show the evolution of the F-score values with the sampling data size, respectively, for the U-Channel and Square Cup forming predictive models, using 200, 500, 1000, 1500 and 2000 samples. In general, for both the U-Channel and Square Cup, the values of F-score increase with the increase in the sampling data size; exceptions include the cases "HSLA340-springback" and "DP600-springback" with the LR classifier (see Fig. 4c, e), where the F-score is nearly constant. Accordingly, the highest values of F-score are generally obtained for 2000 samples, with few exceptions. Adding more training data would reduce variance but increase bias. Therefore, the performance analysis will focus on the results with 2000 samples, which is considered the critical sampling size for this problem [38]. Figure 6 shows the values of F-score, for the critical sampling size, obtained by the U-Channel predictive models in the cases of springback (Fig. 6b) and maximum thinning (Fig. 6c). The dissimilarity between the performances of the classifiers is more noticeable for the springback (Fig. 6b) than for the maximum thinning (Fig. 6c), in which all the classifiers except KNN are competitive. Figure 7 shows the values of F-score, for the critical sampling size, obtained by the Square Cup predictive models of maximum EPS (Fig. 7b) and maximum thinning (Fig. 7c). The mean values of F-score range from 74.65% (maximum thinning prediction with DP600, using DT; see Fig. 7c) to 90.50% (maximum EPS prediction with HSLA340, using MLP; see Fig. 7b), with relatively low standard deviation values. The MLP is the highest performing classifier for predicting both the maximum EPS and maximum thinning in all materials, with mean values of F-score ranging from 84.37% (maximum thinning, HSLA340) to 90.50% (maximum EPS, HSLA340); also, the SVM classifiers show a relatively good performance. The NB, KNN and DT classifiers are the lowest performing classifiers.
The results show that, for a given type of defect, most classifiers show similar performances among the materials. For a given material, the difference in the performance of the classifiers between the two types of defects is more noticeable in the U-Channel than in the Square Cup. It is further noticed that MLP and KNN are, respectively, the highest and the lowest performing classifiers. This indicates that learning is an important step for finding the nonlinear decision boundaries. In fact, KNN is a "lazy learner" and it is harder to discriminate between the sought classes; thus, as a predictor, it becomes less useful. Finally, a Friedman test was conducted on the respective F-score values of each classifier, to check whether the performances of the single classifiers are significantly different; this nonparametric statistical test allows for performance comparison when dealing with several classifiers over multiple data sets [8,11]. In this test, the null hypothesis states that all single classifiers performed equally. The rejection of this hypothesis means that differences between the performances of single classifiers are statistically significant. The obtained Friedman statistic, equal to 59.82 (i.e. corresponding to a p value equal to $4.89 \times 10^{-11}$), is greater than its critical values at significance levels of 5% (12.59) and 1% (16.81), which leads us to reject the null hypothesis.
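The figures quoted for the Friedman test can be cross-checked with a short calculation. The sketch below assumes the usual chi-square approximation with $k - 1 = 6$ degrees of freedom for $k = 7$ classifiers, uses the closed-form survival function valid for an even number of degrees of freedom, and omits any tie correction; the function names and the toy score table are hypothetical.

```python
import math


def rank_row(values):
    """Ranks within one data set (1 = highest score); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2.0 + 1.0
        for idx in order[pos:end + 1]:
            ranks[idx] = average_rank
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square; scores[d][c] is the score of classifier c on data set d."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for c, rank in enumerate(rank_row(row)):
            rank_sums[c] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_even_df(x, df):
    """Chi-square survival function (p value) for an even number of degrees of freedom."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


p_value = chi2_sf_even_df(59.82, 6)           # p value for the reported statistic
toy_statistic = friedman_statistic([[0.9, 0.8, 0.7]] * 3)
```

Under these assumptions the reported statistic of 59.82 gives a p value of about $4.9 \times 10^{-11}$, and the critical values 12.59 and 16.81 correspond to tail probabilities of 5% and 1%, in agreement with the text.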

Ensemble classifiers
Tables 4 and 5 present the best combinations of the classifiers obtained for majority voting and stacking ensembles, respectively. The mean and standard deviation values of the F-score were obtained from 30 runs of each ensemble. When comparing both tables, the use of stacking ensembles generally leads to an increase in performance relative to majority voting; the increase in performance corresponds to more than 1.5% in the case of the Square Cup (see cases "HSLA340-maximum EPS" and "DC06-maximum thinning" in Table 5). The opposite occurs only for the maximum thinning prediction in the U-Channel (see cases "U-Channel-Maximum Thinning" in Tables 4 and 5), where a maximum performance reduction of 0.4% is obtained for DP600. In the case of stacking ensembles, the area under the curve (AUC) metric was determined (see Table 5), which depicts the trade-off between the true-positive rate and the false-positive rate. The values of AUC are relatively high (generally above 90% on average). The smallest standard deviation values were obtained in both springback and maximum thinning U-Channel ensembles, respectively, stacking for DP600 (see Table 5) and majority voting for DC06 (see Table 4). The low values of standard deviation over the 30 runs in all the experiments reveal a great deal of stability in the procedure as well as relatively good significance of the results.
The performance comparison between single and ensemble classifiers shows that the latter generally provide better defect predictors. In particular, in stacking ensembles it is expected that the meta-learner is prone to fewer errors by reducing error variance and thus generalizing well in the test set. Ensemble methods combine several machine learning techniques into one predictive model in order to decrease variance and bias, or improve predictions. Since the single classifiers already provide relatively high performances (ca. 85%, on average), and thus are stable learners, the best combinations of the classifiers obtained for the majority voting and stacking ensembles (ca. 90% average performance) do not significantly outperform the single classifiers. On the other hand, the overall increase in the ensembles' performance is coupled with a lower variance, which promotes robustness and stability of the procedure and allows a better bias-variance trade-off.

Conclusion
In this study, machine learning techniques were used for predicting defects of sheet metal forming processes. The same sampling data were applied to generate single learning and ensemble models. These data were obtained using numerical simulations of two forming processes, U-Channel and Square Cup, with three different materials. In general, the performance of single classifiers increases with the increase in the sampling data size, showing a stabilization that enables the definition of a critical sampling size. Considering the critical sampling size, the results show that, for a given type of defect, most single classifiers show similar performances among the materials. The best combinations of the classifiers obtained for the majority voting and stacking ensembles can provide better predictors than single classifiers (particularly when using stacking ensembles); however, the performance differences are small. Ensemble models allow a better trade-off between bias and variance, and they are expected to perform well on real data from the sheet metal forming industry. In fact, the relatively high F-score values coupled with their low variance motivate the application of the proposed approach in an industrial environment, in order to assess its feasibility as a decision support tool for predicting defects in sheet metal forming. Further studies will focus on the development of ML regression models for predicting sheet metal forming defects, and the subsequent performance comparison with response surface methodology and kriging regression models.

Portuguese Foundation for Science and Technology, by FEDER, through the programme Portugal-2020 (PT2020), and by POCI, with reference POCI-01-0145-FEDER-031243; EZ-SHEET, co-funded by Portuguese Foundation for Science and Technology, by FEDER, through the programme Portugal-2020 (PT2020), and by POCI, with reference POCI-01-0145-FEDER-031216. All supports are gratefully acknowledged.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.