Incremental Sheet Forming (ISF) is a sheet metal forming technology that produces parts through a series of localized incremental plastic deformations (Jackson & Allwood, 2009; Kumar & Kumar, 2015). ISF is an innovative sheet-forming technology that does not employ a classic punch and die set (Najm et al., 2030). Components are formed along a predefined strategy, called the tool path, which guides a simple tool performing incremental movements over the clamped sheet until the desired final shape is obtained (Azaouzi & Lebaal, 2012; M. Amala Justus Selvam, R. Velu, & T. Dheerankumar, 1050; Szpunar et al., 2021). The first dieless incremental forming process can be traced back to Leszak (Leszak, 1967). The two main types of ISF are Single Point Incremental Forming (SPIF) and Two Point Incremental Forming (TPIF). The process described in the aforementioned patent can be considered SPIF, while TPIF was first presented by Matsubara (Matsubara, 1994). SPIF is better suited than conventional sheet-forming methods to the manufacture of prototypes, small production batches, and customized components (Mezher et al., 2018; Paniti et al., 2020; Trzepieciński et al., 2022a). SPIF is flexible, and complex geometries can be produced simply by following a programmed path strategy on a CNC machine with at least three controlled axes (Trzepieciński et al., 2022b). A recent paper briefly overviewed developments in SPIF of lightweight materials (Trzepieciński et al., 2021a), and SPIF is considered one of the promising sheet-forming technologies for future aerospace applications (Trzepieciński et al., 2021b). However, the quality, surface finish, and geometric accuracy of products formed by SPIF are affected by several process parameters, and improper parameter selection can cause deviation from the desired shape. ISF in general, and SPIF in particular, still have disadvantages in terms of geometric accuracy and the pillow effect.
Due to the elastic–plastic deformation throughout the forming process, which is prone to instability, the formed sheet is susceptible to springback (Khan et al., 2015; Zhang et al., 2016) and the pillow effect (Bai et al., 2019; Najm & Paniti, 2020). Matching the wall profile of the product to the CAD model on the basis of the path strategy is one of the challenges of this process. Springback is one of the main factors affecting the geometric accuracy of components (Mezher et al., 2021a, 2021b). Wall displacement due to springback is the difference between the actual wall angle or diameter and that of the CAD model. The springback of component walls is induced mainly by two mechanisms: local springback and global springback. Local springback occurs as the locally deformed region of the sheet partially returns toward its initial position once the forming tool has passed. Global springback, on the other hand, results from residual stresses in the unconstrained material of the formed sheet once it is released from clamping (Edwards et al., 2017). In addition to these two types, Gates et al. (2016) noted that another type of springback ensues during the displacement of the forming tool, which they named continuous local springback. Kiridena et al. (2016) invented innovative tools to increase the geometric accuracy of incrementally formed products. They reported that dimensional accuracy is significantly affected by lengthening the tool shank and decreasing the tool diameter, while the flat underside of a donut-shaped tool increases the formed part's accuracy. Najm and Paniti (2018, 2020) stated that, compared to hemispherical tools, flat tools led to better formability, more homogeneous thickness distribution, and a smaller pillow effect in components formed by SPIF.

The pillow effect, also called bulging, is a concave surface developing on the unformed bottom area in the center of the part (Nasulea & Oancea, 2021); it is a major forming error that negatively impacts the geometric accuracy of SPIF components and limits the formability of the formed part (Wei et al., 2020). The pillow effect arises in conjunction with the springback of the wall relative to the CAD model: the forming tool generates tension on the unformed surface from the corner, while the middle of the base remains free to bulge. Ideally, the unformed surface, on which the pillow appears as a bulge, should stay flat during and after the forming process. Researchers have attempted to find suitable methods to prevent the formation of the pillow effect in products formed using SPIF. Ambrogio et al. (2007) suggested an ANOVA-based quadratic model for predicting the geometrical error between the ideal and real surfaces of truncated pyramids formed from aluminum alloy AA1050-O sheets; they state that the pillow effect is strongly influenced by tool diameter and product height. Micari et al. (2007) indicated that two typologies of error result in inaccurately formed ISF parts: springback and the pillow effect. The pillow effect increases the forming force, which results in part inaccuracy; accordingly, work hardening during multi-point forming decreases the pillow effect in that process variant (Zhang et al., 2017). Al-Ghamdi and Hussain (2015) studied the influence of the mechanical properties of formed parts on the pillow effect. They state that the tensile fracture behavior that affects and controls formability has an insignificant impact on the bulge, whereas a decrease in the hardening exponent, which controls stretchability, decreases the bulge and can therefore be considered a significant property affecting the pillow effect.
Furthermore, greater forming depth leads to a larger pillow, but not in a linear way: at certain depths the pillow effect is relieved owing to the hardening exponent. Afzal (2021) sees this differently: he claims that the pillow effect sets in because of two different states in the formed sheet, with the unformed base in an elastic state while the formed wall is in a plastic state. Isidore et al. (Isidore, 2014; Isidore et al., 2016) found that parts formed with a hemispherical tool exhibited larger pillows because compressive stresses and strains were generated as the material was compressed, whereas smaller pillows resulted when a flat tool was used because tensile stresses and strains acted in the transverse directions. Essa and Hartley (2011) explored various ways to improve geometric accuracy by means of finite element (FE) analysis of SPIF: they reduced the bending of the component flange using a support plate, minimized springback by stationing a supporting tool, and eliminated the pillow effect by modifying the last stage of the tool path.

Due to their excellent formability and corrosion resistance, general-purpose 3xxx-series alloys are used in architectural applications and in the manufacture of various products. 3xxx-series alloys are non-heat-treatable but exhibit about 20% more strength than 1xxx-series alloys. Manganese is the principal alloying element of 3xxx alloys and is added either on its own or together with magnesium. However, magnesium is considerably more effective than manganese as a hardener: about 0.8% Mg is equivalent to 1.25% Mn (Davis, 2001). In the single point incremental forming experiments conducted in the scope of this study, AlMn1Mg1 aluminum alloy blank sheets with an initial thickness of 0.22 mm were used; based on its composition, this alloy belongs to the 3xxx series. Common applications of the AlMn1Mg1 aluminum alloy include beer and beverage cans (Hirsch, 2006), where Mg provides good formability and Mn a strengthening effect. The alloy is also used in automotive radiator heat exchangers and as tubing in commercial power plant heat exchangers (Kaufman, 2000), as well as in sheet metal work, storage tanks, agricultural applications, building products, containers, electronics, furniture, kitchen equipment, recreational vehicles, and trucks and trailers (Davis, 2001).

Recently, various artificial intelligence techniques have been used in many industries, including metal forming. In the last decade, machine learning has been applied using various artificial neural network (ANN) techniques in a number of applications and industries, and it has come to dominate manufacturing as a means of designing practical, sufficient, and adequate predictive models (Hussaini et al., 2014; Kondayya & Gopala Krishna, 2013; Lela et al., 2009; Li, 2013; Marouani & Aguir, 2012). A recent state-of-the-art review surveyed analytical and numerical models of incremental forming (IF) and the AI-based computational approaches used to solve IF-related issues: artificial neural networks, support vector regression, decision trees, fuzzy logic, evolutionary algorithms, and particle swarm optimization, as well as hybrid approaches that integrate several of these strategies (Nagargoje et al., 2021). Predictive models, with or without controlled manufacturing, have also been developed for end-milling machining, high-speed machining, and powder metallurgy (Amirjan et al., 2013; Ezugwu et al., 2005; Zain et al., 2010). An artificial neural network architecture has been used to generate tool paths for ISF components directly from a digital model of the component: multiple training techniques, network topologies, and training sets were examined in a feedforward network with backpropagation, demonstrating that neural networks can generate tool paths for sheet metal freeforming (Hartmann et al., 2019).
Notably with respect to SPIF, different studies have developed ANN, Support Vector Regression (SVR), and Gradient Boosting Regression (GBR) models for the prediction of formability (Najm & Paniti, 2021a), surface roughness (Najm & Paniti, 2021b; Trzepieciński et al., 2021c), and hardness (Najm et al., 2021) of components formed using SPIF under various forming conditions. Low et al. (2022) predicted the error distribution of SPIF components from the input CAD geometry using a Convolutional Neural Network-Forming Prediction (CNN-FP) model. For an untrained wall angle, the CNN-FP model had a root mean squared error (RMSE) of 0.381 mm at 50 mm depth. For untrained complicated geometry, the RMSE was 0.391 mm at 30 mm depth; however, performance degraded considerably at 50 mm depth of the complicated geometry, where the model's prediction had an RMSE of 0.903 mm.

In light of the literature, the concerns detailed above in connection with SPIF, together with the lack of standardized SPIF process parameters and the scarcity of reference mathematical models, motivated the authors of this paper to investigate and predict the pillow effect and wall diameter of truncated frustums processed by SPIF. To the authors' knowledge, such an experimental process has not been tested or described in the literature so far. The authors consider the influence of SPIF process parameters on geometric accuracy (pillow effect and wall diameter) to be one of the most significant drawbacks of the SPIF process. Geometric accuracy, in terms of the component wall and the pillow effect, can be influenced by various factors, including tool material, tool shape and size, and the surface finish of the tooltip. Therefore, in the present paper, geometric accuracy with respect to wall accuracy and the pillow effect is investigated experimentally in the scope of the above-mentioned parameters. Furthermore, as the aim and novelty of this paper, various models have been built using different combinations of parameters for both the pillow effect and wall diameter datasets, and prediction equations based on the resulting weights and biases have been derived. The combined partitioning weight of the NN was adopted to estimate the relative importance (RI) of SPIF parameters on model output. In addition, for the first time in ISF, process parameters have been interpreted via SHapley Additive exPlanations (SHAP), which were utilized to establish the parameters' relevance to the pillow effect and wall diameter.

Material properties

In this section, the material properties of the investigated sheet and of the applied tools are presented. In the scope of this study, AlMn1Mg1 aluminum alloy square blank sheets of 150 mm × 150 mm with an initial thickness of 0.22 mm were used. Tensile tests of specimens cut from the blank sheet were conducted according to the EN ISO 6892-1:2010 standard on an INSTRON 5582 universal testing machine at room temperature. The specimens were cut at three orientations to the rolling direction: 0°, 45°, and 90°, with three samples produced for each direction. The relative standard deviation did not exceed 3% for the test procedure. Furthermore, the planar anisotropy values (r10) were established using an Advanced Video Extensometer (AVE). The average values of the mechanical properties are listed in Table 1. The chemical composition of the sheet material was analyzed with a WAS FOUNDRY-MASTER Optical Emission Spectrometer (OES); the pertaining data are found in Table 2.

For forming the sheets in the SPIF experiments of this research design, forming tools of various shapes, materials, tip radii (R), and corner radii (r) were used. Two different tool designs were selected: hemispherical tools with varying tip radius (R) (see Fig. 1a) and flat tools with varying corner radius (r) (see Fig. 1b). The (R) values are 1 mm, 2 mm, and 3 mm, while the (r) values are 0.1 mm, 0.3 mm, and 0.5 mm. Each set of tools was made of six different materials: Table 3 details the metal tools and their properties, and Table 4 lists the tools produced from a polymer provided by STRATASYS. Hardness was tested experimentally with a Wolpert Diatronic 2RC S hardness tester according to ISO 6506-1:2014, and the hardness value measured for each tool was matched with the ISO code of the material in question. Thus, the mechanical properties were determined based on the ISO code of each forming tool's material; for the determination of the ISO code, each tool material was measured with a FOUNDRY-MASTER Pro2 Optical Emission Spectrometer.

Fig. 1
figure 1

Forming tools: a hemispherical tool, b flat tool

Table 1 Mechanical properties of AlMn1Mg1 aluminum alloy sheet
Table 2 Chemical composition of AlMn1Mg1 aluminum alloy sheet (in wt%)
Table 3 Mechanical properties of metal tools
Table 4 Polymer properties


A frustum geometry with the dimensions shown in Fig. 2 was formed experimentally. The end of the forming process was defined by the failure criterion; an example of a failed part is shown in Fig. 3. Two tool shapes (hemispherical and flat), three tooltip sizes per shape (three tip radii (R) for the hemispherical tool and three corner radii (r) for the flat tool), and six materials (Steel (C45), Brass (CuZn39Pb3), Bronze (CuSn8), Copper (E-Cu57), Aluminum (AlMgSi 0.5), and the polymer VeroWhitePlus (RGD835); see Fig. 4) yield altogether 36 different forming process conditions. For reliability and correct measurement assurance, three parts were formed for each process condition, giving a total of 108 formed components. Design of Experiments (DOE) was not applied to minimize the number of experiments, because the resulting data were collected and used as the actual dataset (inputs and outputs) for the predictive models. A Topper TMV-510T 4-axis CNC milling machine with a SIEMENS SINUMERIK 840D controller was used in the forming process (see Fig. 5 for the machine and clamping design). The fixed forming process parameters are shown in Table 5. A constant step size of 0.05 mm in the Z direction was used for the whole forming path; applying a smaller step size results in more accurate geometry and a better surface finish of the final parts formed by SPIF (Lu et al., 2014; Mulay et al., 2018). A spiral path strategy moving inward toward the sheet center was adopted to form the frustum parts incrementally, as shown in Fig. 6. This strategy was developed by Skjoedt et al. (2007) to avoid the peak axial loads associated with discrete step-downs and to achieve a better inner surface where the forming tool contacts the sheet. This strategy also aids the more successful and precise formation of components (Kumar & Gulati, 2020).
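An inward, continuously descending spiral path of the kind described above can be sketched as follows. This is a minimal illustration only: the `spiral_path` helper, the geometry values, and the points-per-revolution resolution are assumptions, not the exact CAM strategy used in the experiments.

```python
import math

def spiral_path(top_radius, depth, wall_angle_deg, step_z=0.05, points_per_rev=360):
    """Generate an inward spiral tool path (x, y, z) for a truncated cone.

    The tool descends continuously by step_z per revolution (no discrete
    step-downs), while the radius shrinks with depth according to the
    wall angle, so the path spirals inward toward the sheet center.
    """
    path = []
    revolutions = int(depth / step_z)
    tan_wall = math.tan(math.radians(wall_angle_deg))
    for i in range(revolutions * points_per_rev):
        theta = 2 * math.pi * i / points_per_rev
        z = -step_z * i / points_per_rev        # continuous descent per revolution
        r = top_radius - abs(z) / tan_wall      # frustum radius at current depth
        if r <= 0:
            break
        path.append((r * math.cos(theta), r * math.sin(theta), z))
    return path

# Example: hypothetical 30 mm top radius, 10 mm depth, 60 degree wall angle
pts = spiral_path(30.0, 10.0, 60.0)
```

Because the descent is spread over each revolution, the axial load is applied gradually instead of at one step-down point per contour.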

Fig. 2
figure 2

CAD geometry and dimensions of experimental product

Fig. 3
figure 3

Failed specimen

Fig. 4
figure 4

Tool materials

Fig. 5
figure 5

Topper CNC machine with rapid clamping rig on the CNC milling table

Table 5 Fixed process parameters
Fig. 6
figure 6

CAD geometry of experimental product with view of an inward spiral path

The surface roughness of the forming tool was measured before and after forming each part. The roughness measured before a given forming process was adopted as an input value for that formed part, and the value measured afterwards served as the input for the next forming process, and so on. This scenario was applied to all forming tools used in the scope of this study. However, a new polymer tool was employed in each forming process because of the wear these tools develop, and the surface roughness of each polymer tool was measured before the start of its process.

The profiles of all formed parts and their unformed bases (wall diameter and pillow effect) were measured using a Mitutoyo CV-1000 coordinate machine (MCM), see Fig. 7. For each SPIF condition, the average deviation of the three experimentally formed parts from the CAD model, in terms of wall diameter and unformed base, was adopted as the value of the geometric accuracy and pillow effect studied in this research.

Fig. 7
figure 7

Mitutoyo coordinate machine

The MCM instrument measured along 49 mm at a speed of 0.5 mm per second with a pitch of 0.005 mm. The MCM automatically creates a component profile in which each point can be measured, as shown in Fig. 8; for clarity, the figure shows only 20 points. For each part, more than 9000 points were measured automatically along the X and Z coordinate axes. The measured data points were used to build the actual profile, against which the CAD model, the wall diameter, and the pillow effect were compared. The pillow effect is the difference between the highest peak and the lowest valley measured; in an ideal forming case, the bottom unformed surface should remain flat between the peak and the lowest valley.
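The pillow-effect measure described above (highest peak minus lowest valley on the unformed base) can be sketched as a small helper. The `pillow_effect` function and the example profile points are hypothetical illustrations, not the measured dataset:

```python
def pillow_effect(profile, base_limits):
    """Pillow effect: highest peak minus lowest valley among the z-values
    measured inside the unformed base region of the part.

    profile     -- list of (x, z) measurement points, e.g. from an MCM scan
    base_limits -- (x_min, x_max) bounds of the unformed base region
    """
    x_min, x_max = base_limits
    zs = [z for x, z in profile if x_min <= x <= x_max]
    return max(zs) - min(zs)

# Hypothetical (x, z) points in mm over the unformed base
profile = [(10.0, 0.000), (11.0, 0.032), (12.0, 0.048), (13.0, 0.015)]
bulge = pillow_effect(profile, (10.0, 13.0))   # 0.048 mm for this toy profile
```

For a perfectly flat base all z-values coincide and the function returns 0.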

Fig. 8
figure 8

Component profile; pillow effect and wall diameter as measured by a Mitutoyo coordinate machine

Artificial neural networks

The notion of the neural network is often traced back to Warren McCulloch and Walter Pitts's 1943 study. Their basic idea was that neural networks could compute any logical function or mathematical formulation. Furthermore, the invention of the Perceptron network at the end of the 1950s can be considered the first practical application of ANNs (Hagan et al., 2014). Recently, neural networks have become the main interest of thousands of researchers in different fields of science; as a whole, it can be stated that hardly any scientific field lacks links to neural networks. Scientists in areas including healthcare, aerospace, defense, the arts, filmmaking, music, and various industrial technologies have adopted ANNs.

Neuron by Neuron (NBN)

Using Visual Studio 6.0 and the C++ language, Yu and Wilamowski (2009a, 2009b) developed the Neuron by Neuron (NBN) trainer. NBN can work with Fully Connected Neurons (FCN) and needs fewer neurons than a Multilayer Perceptron (MLP). Figure 9 shows a scheme of five inputs fully connected with three neurons, one output, and the corresponding topology. The tool supports three different types of neurons: bipolar (“mbip”), unipolar (“mu”), and linear (“mlin”). Both mbip and mu have outputs not exceeding 1, with negative and positive values for mbip and only positive values for mu. The equations of the three neuron types are presented in Eqs. 1, 2, and 3 (Yu & Wilamowski, 2009a, 2009b), respectively. In this study, the running time was 100 and the number of iterations 500, with a maximum error of 0.001. The training tool displays the sum squared error (SSE) directly in a plotting area on the interface, which indicates the accuracy of the prediction. All other parameters were left at their defaults because these gave the best results. All neuron types were tried with different numbers of connected neurons. It is worth mentioning that categorical encoding was used because the data collected from the 108 samples contained both numerical and categorical values (see Categorical encoding).

Fig. 9
figure 9

Scheme of neuron by neuron (NBN) of five inputs fully connected with three neurons and one output

$${\text{Bipolar}}\;\left({\text{mbip}}\right)\quad {f}_{b}(net)=\tanh(gain\times net)+der\times net$$
$${\text{Unipolar}}\;\left({\text{mu}}\right)\quad {f}_{u}(net)=\frac{1}{1+{e}^{-gain\times net}}+der\times net$$
$${\text{Linear}}\;\left({\text{mlin}}\right)\quad {f}_{l}(net)=gain\times net$$

where “gain” and “der” are parameters of activation functions.
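Under the definitions in Eqs. 1–3, the three NBN neuron types can be sketched as plain Python functions; the default `gain` and `der` values here are assumptions for illustration, not the trainer's defaults:

```python
import math

def mbip(net, gain=1.0, der=0.0):
    """Bipolar neuron (Eq. 1): tanh(gain*net) + der*net, output in (-1, 1)."""
    return math.tanh(gain * net) + der * net

def mu(net, gain=1.0, der=0.0):
    """Unipolar neuron (Eq. 2): logistic sigmoid plus linear term, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-gain * net)) + der * net

def mlin(net, gain=1.0):
    """Linear neuron (Eq. 3): gain*net."""
    return gain * net
```

The small `der` term adds a linear slope to the saturating activations, which keeps the gradient from vanishing for large `net` values.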

Gradient boosting regression (GBR)

Gradient boosting is one of the most common and powerful tree-based algorithms (Pedregosa, 2011). It is an ML method used for both classification and regression: an ensemble prediction model in which each stage attempts to correct the errors of the previous stage in a formation of decision trees. A gradient boosting model (Chen et al., 2021) with least-squares loss and 1000 regression trees with a depth of 6 and a minimum samples split of 2 was run to predict the pillow effect and wall diameter, as shown in Fig. 10. The least-squares loss fits each new tree to the residual errors of the current ensemble so as to minimize the squared prediction error. The regression tree depth determines how many splits a tree may make before a prediction is produced. Boosting consecutively trains learners on the basis of the previous ones by fitting the dataset and analyzing the resulting errors; in other words, it works as a cycle of training learners, fitting the results to the dataset, analyzing the errors between actual and predicted values, and training new learners on those errors. This cycle is repeated until the preset number of iterations is reached.

Fig. 10
figure 10

Regression trees of depth 6 and minimum samples split 2


CatBoost is a high-performance open-source library for gradient boosting on decision trees (Prokhorenkova et al., 2018), developed by Yandex researchers and engineers. Among its many features, one of the most important is its native handling of categorical data (non-numeric factors), for which no pre-processing is needed; turning such data into numbers by encoding is not necessary, either (Dorogush et al., 2018). Furthermore, CatBoost predicts well with its default parameters, so parameter tuning is often unnecessary (Ibragimov & Gusev, 2019). By default, CatBoost builds 1000 trees as fully symmetric (oblivious) binary trees with a depth of six. The learning rate is determined automatically from the properties of the trained dataset and the number of iterations, and the automatically selected learning rate should be close to the optimal one. The number of iterations can be lowered for faster training, but in that case the learning rate must be increased.

Multilayer Perceptron MLP

Network elements' organization or structure, interconnections, inputs, and outputs constitute the net topology. The topology of an ANN is defined by the number of input and output layers, the transfer functions between these layers, and the number of neurons in each layer (Nabipour & Keshavarz, 2017). An ANN structure consists of input and output layers with a minimum of one hidden layer, and each layer contains a number of neurons: the input neurons equal the number of input variables, and the output neurons equal the number of outputs associated with each input. Based on the transfer function, also called the activation function (Beale et al., 2013), these neurons allow the transfer of weights between the layers backward and forward. The current study adopted the multilayer perceptron (MLP) structure for its ANN model using a backpropagation learning algorithm. The idea of the MLP was initiated by Werbos in 1974, and by Rumelhart, McClelland and Hinton in 1986 (Riedmiller). Equation 4 defines the output of a perceptron unit as follows:

$$y=f\left(\sum_{i=1}^{n}{w}_{i}{x}_{i}+b\right)$$
where \(y\) is the output, \(x\) is the input vector, \({w}_{i}\) are the weights, and \(b\) is the bias (Principe et al., 1997).
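The perceptron relation above, stacked into one hidden layer with a linear output layer, can be sketched directly. The `tanh` activation and the toy weights below are illustrative assumptions, not the trained network's parameters:

```python
import math

def neuron_output(x, w, b, f=math.tanh):
    """Single perceptron unit: y = f(sum_i w_i * x_i + b)."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)

def mlp_forward(x, hidden, output):
    """One-hidden-layer MLP forward pass.

    hidden, output -- lists of (weights, bias) pairs, one per neuron.
    The output layer is linear (purelin), as used for the output layer
    in this study.
    """
    h = [neuron_output(x, w, b) for w, b in hidden]
    return [sum(wi * hi for wi, hi in zip(w, h)) + b for w, b in output]

# Toy example: 1 input, 1 hidden neuron, 1 linear output neuron
y = mlp_forward([0.0], hidden=[([1.0], 0.0)], output=[([2.0], 1.0)])
```

A network with two output neurons (wall diameter and pillow effect together) simply has two (weights, bias) pairs in `output`.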

Using MATLAB R2020a (Beale et al., 2019), two different MLP structures were created to predict wall diameter and pillow effect. The input and target datasets were obtained from the actual measured data of the parts formed by SPIF. The main difference between the two trained structures is the number of prediction outputs: one network deals with a single target (wall diameter or pillow effect), while the second deals with two targets (wall diameter together with pillow effect). The input layer has 5 neurons: tool material, tool shape, tool radius, and the tool surface roughness values Ra and Rz. In the scope of this research, each net structure had one hidden layer with ten neurons connected to the input and output layers, as shown in the scheme in Fig. 11a and b. The other main training parameters selected in this study were a learning rate of 0.01, a performance goal of 0.001, and 1000 epochs. Notably, different training and transfer functions were tried and trained in order to find the best model and structure (see Training function and Transfer function).

Fig. 11
figure 11

Different multilayer perceptron (MLP) structures: a two outputs; b one output

The training flowchart of the developed model and the checking process using the test data are presented in Fig. 12. Two main conditions drive decisions during the run. The first loop (light blue) saves the model and all variables when the error falls below a set limit. The second loop (light red) is activated once the first condition has been met and has stopped: it runs further trainings, compares their variables with those saved from the previous best training, and continues to do so until 1000 iterations are reached. Steps shared between the two loops are displayed as light green arrows.

Fig. 12
figure 12

Flowchart of developed multilayer perceptron (MLP) model

Training function

In neural networks, training on a dataset utilizes an optimization technique to tune and find a set of network weights that builds a good prediction map. There are various optimization algorithms, also called training functions. A training function is an algorithm that trains the network to identify a specific input and map it to an output; it depends on many characteristics, including the trained dataset, the weights and biases, and the performance goal. One challenge in building good, fast, and accurate predictions lies in selecting a training function fitted to the network. With this in mind, in the scope of this paper, ten different training functions (learning algorithms) were executed in both MLP nets for the purpose of mapping outputs to inputs. Levenberg–Marquardt (Trainlm) is one of these training functions and is considered the fastest of those compared; similarly, the BFGS Quasi-Newton algorithm is quite fast (Beale et al., 2019). The training functions used are listed in Table 6.

Table 6 Details of training functions used in multilayer perceptron (MLP)

Transfer function

In machine learning, the inputs of each node are weighted and summed, and the sum is passed through a function known as the activation function or transfer function, which computes the output of each layer from the summed weights entering it. Setting proper transfer functions is a challenging task that depends on many factors, but mainly on the network structure. In a multilayer perceptron (MLP), the log-sigmoid (Logsig) function is usually used, with alternatives such as the hyperbolic tangent sigmoid (Tansig), which is generally used for pattern recognition (Beale et al., 2019). In this study, besides the two aforementioned functions, thirteen further transfer functions were evaluated to improve prediction accuracy. Eventually, the linear (Purelin) transfer function was chosen for the output layer in all cases. Table 7 lists all the transfer function algorithms, with the related Eqs. 5–18, used in this study (Demuth, 2000).
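The three transfer functions named above can be sketched as follows; the formulas follow the standard MATLAB definitions of `logsig`, `tansig`, and `purelin`, ported to Python for illustration:

```python
import math

def logsig(n):
    """Log-sigmoid transfer function: 1 / (1 + e^{-n}), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def tansig(n):
    """Hyperbolic tangent sigmoid, output in (-1, 1).

    MATLAB computes it as 2 / (1 + exp(-2n)) - 1, which is mathematically
    equivalent to tanh(n) but faster to evaluate.
    """
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def purelin(n):
    """Linear transfer function, used here for the output layer."""
    return n
```

Using `purelin` on the output layer leaves the predicted wall diameter and pillow effect unbounded, while the sigmoids squash hidden-layer activations into a fixed range.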

Table 7 Details of transfer functions used in multilayer perceptron (MLP)

Dataset distribution

The actual data of the SPIF components were used as input data for all structures and models built and trained in the scope of this study to predict the pillow effect and wall diameter as network outputs. Predicting results for new parts without executing any new forming process is both economical and practical. However, the existing data must be separated into subsets: training, validation, and testing datasets. Prediction accuracy and training performance are significantly influenced by how the dataset is divided into training and testing subsets (Zhang et al., 1998); inappropriate subsets negatively affect benchmark performance. Shahin et al. (2000) claimed that there is no apparent relationship between the splitting ratio and model performance, whereas Zhang et al. (1998) regard the splitting ratio as one of the main problems, for which no general setting is available as a solution. Based on these surveys, most researchers split their datasets using differing subset ratios, the most broadly adopted being 90% vs. 10%, 80% vs. 20%, or 70% vs. 30% for training and testing. In the training runs of this paper, optimal prediction was obtained by dividing the actual data (108 samples) in a ratio of 80% vs. 20% into training and testing datasets, respectively. The actual dataset comprised 108 rows taken from the experimentally formed SPIF components, and these recorded rows were used as the training and testing datasets.
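The adopted 80% vs. 20% division can be sketched as a simple shuffled split; the fixed seed and the `split_dataset` helper are illustrative assumptions rather than the exact procedure used:

```python
import random

def split_dataset(rows, train_ratio=0.8, seed=42):
    """Shuffle the rows and split them into training and testing subsets."""
    rows = rows[:]                        # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)     # seeded shuffle for reproducibility
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

# 108 samples, as in this study: 86 rows for training, 22 for testing
train, test = split_dataset(list(range(108)))
```

Shuffling before the split avoids training and testing on systematically different process conditions when the rows were recorded in experiment order.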

Categorical encoding

There are three typical methods for converting categorical variables to numerical values. One of them is One-Hot Encoding (Guido, 1997); the other two are Ordinal Encoding and Dummy Encoding. The encoded categorical variables form sparse binary matrices and can be integrated into the training of different machine learning models. The idea of one-hot encoding is to replace categorical data with one or more new features: each category receives a new binary feature, and exactly one of these features is active (1, with the rest 0) for each row. In ordinal encoding, each category is given an integer value, numbered up to the number of actual categories. Dummy encoding is a slightly modified version of one-hot encoding: in one-hot encoding the number of binary features equals the number of categories, whereas in dummy encoding it equals the number of categories minus 1. Table 8 shows the different ways of encoding the categorical data of the current study. In the scope of this research, two categorical sets were encoded: tool materials and tool shapes. Ordinal encoding was adopted for tool materials because the other methods conflicted with the feature importance calculation (see Contribution analysis of input variables), while one-hot encoding was used for the tool shape, in which the flat tool is represented by 0 and the hemispherical tool by 1.
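The two schemes used here (ordinal encoding for tool materials, one-hot encoding for tool shape) can be sketched by hand; the helper functions and the example category values are illustrative:

```python
def ordinal_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """One binary column per category; exactly one 1 in each encoded row."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values], cats

shapes = ["flat", "hemispherical", "flat"]
codes, mapping = ordinal_encode(shapes)   # flat -> 0, hemispherical -> 1
rows, cats = one_hot_encode(shapes)       # flat -> [1, 0], hemispherical -> [0, 1]
```

Dummy encoding would simply drop one of the one-hot columns, since with two categories a single binary column already carries the full information.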

Table 8 Details of different encoding methods
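The three encodings can be sketched without any library dependencies; the helper functions below are illustrative, using the flat/hemispherical tool-shape example from this study:

```python
def ordinal_encode(values):
    """Map each category to an integer in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values]

def one_hot_encode(values):
    """One binary column per category (sparse 0/1 rows)."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values]

def dummy_encode(values):
    """One-hot with the first category dropped: k categories -> k-1 columns."""
    return [row[1:] for row in one_hot_encode(values)]

shapes = ["flat", "hemispherical", "flat"]
print(ordinal_encode(shapes))   # [0, 1, 0]
print(one_hot_encode(shapes))   # [[1, 0], [0, 1], [1, 0]]
print(dummy_encode(shapes))     # [[0], [1], [0]]
```

For a two-category feature such as tool shape, dummy encoding collapses to the single 0/1 column described in the text.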


A neural network is a practical tool used in various applications, but it has several drawbacks, among them underfitting and overfitting. Underfitting happens when a model is too simple for the selected training dataset; overfitting occurs when the network produces a larger error on new data than on the set it was trained on. In other words, the trained net memorizes the learned dataset but fails to generalize to data it has not seen. Boosting algorithms are generally supplied with regularization methods to avert overfitting (Ibragimov & Gusev, 2019), because increasing the number of models in an ensemble does not always improve accuracy and can reduce generalization ability (Mease & Wyner, 2008). There are various ways to improve network generalization, such as using a network large enough to provide a good fit, training several nets to ensure that good generalization is found, averaging the outputs of multiple trained neural networks, separating the data randomly, and tuning the complexity of the net through regularization (Bishop, 1995).

In the MLP network created for prediction in this study, generalization was improved by the so-called early stopping method. Early stopping is the default method automatically provided for all supervised network creation functions, including backpropagation networks. This method splits the dataset into three subsets: training, validation, and testing (see Fig. 13). During network training, the training subset is used to calculate the gradient and to update the weights and biases to fit the model. The validation subset estimates the prediction error during the training process, and the test subset is used to test the learned network, to assess generalization errors, and to plot them while training is running. If the data start to overfit during training, the error on the validation subset grows. When the validation error rises above its minimum for a specified number of iterations and, at the same time, becomes larger than the error of the test subset, training stops, and the network weights and biases are returned to those at the smallest validation error (Beale et al., 2020).
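The early stopping logic can be sketched for a simple gradient-descent model; this is an illustrative toy version of the mechanism, not the toolbox implementation described above:

```python
import numpy as np

def train_with_early_stopping(X_tr, y_tr, X_val, y_val,
                              lr=0.01, max_epochs=2000, patience=6):
    """Gradient-descent training of a linear model that keeps the weights
    with the smallest validation error and stops once that error has not
    improved for `patience` consecutive epochs (early stopping)."""
    w = np.zeros(X_tr.shape[1])
    best_w, best_err, fails = w.copy(), np.inf, 0
    for _ in range(max_epochs):
        grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)  # training subset: gradient
        w -= lr * grad
        val_err = np.mean((X_val @ w - y_val) ** 2)        # validation subset: error
        if val_err < best_err:
            best_w, best_err, fails = w.copy(), val_err, 0
        else:
            fails += 1
            if fails >= patience:   # validation error stopped improving
                break
    return best_w, best_err         # weights restored to smallest validation error

# Synthetic example: 80 samples, 60/20 training/validation split
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=80)
w, err = train_with_early_stopping(X[:60], y[:60], X[60:], y[60:])
```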

Fig. 13
figure 13

Regression process; training, validation, and testing

Investigation of accuracy

There are numerous validation metrics, but choosing the appropriate one is essential for evaluating a predictive model, and physical observations are also necessary for improving model performance. This study compared and validated the different structures and the various training and transfer algorithms by measuring the agreement between actual and predicted values. All structures and models trained and tested in this study were compared based on appropriate metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R2), together with the primary quantities needed to derive them. An R2 value close to 1 indicates good performance, while RMSE and MAE values near 0 mean low error; in that case reliable performance can be assumed. Conversely, a large difference between the RMSE and MAE values points to significant variation in the error distribution. The limitations of R2 are listed in Misra and He (2020); RMSE and MAE complement each other because MAE is more stable, whereas RMSE is more sensitive to large errors. For validating the predicted values, the Standard Error of the Mean (SEM) was also used, which is the Standard Deviation (SD) of the sample divided by the square root of the sample size. Other quantities, such as the Error (E), Mean Error (ME), and Mean Square Error (MSE), were involved in deriving the validation equations, and the Total Sum of Squares (SStot) and the Sum of Squared Residuals (SSres) were adopted for the derivation of R2 and adjusted R2. The pertaining validation equations are as follows:

$$E= \left({y}_{i}^{target}-{y}_{i}^{predict}\right)$$
$$ME= \frac{1}{n}\sum_{i=1}^{n}\left({y}_{i}^{target}-{y}_{i}^{predict}\right) \;{\text{or}}\; ME= \frac{1}{n}\sum_{i=1}^{n}\left(E\right)$$
$$MAE= \frac{1}{n}\sum_{i=1}^{n}\left(\left|\left.{y}_{i}^{target}-{y}_{i}^{predict}\right|\right.\right) \;{\text{or}}\; MAE= \frac{1}{n}\sum_{i=1}^{n}\left(\left|\left.E\right|\right.\right)$$
$$MSE= \frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}^{target}-{y}_{i}^{predict}\right)}^{2} \;{\text{or}}\; MSE= \frac{1}{n}\sum_{i=1}^{n}{\left(E\right)}^{2}$$
$$RMSE= \sqrt{MSE}$$
$$\begin{aligned}MRE=& \frac{1}{n}\sum_{i=1}^{n}\left(\left|\frac{{y}_{i}^{target}-{y}_{i}^{predict}}{{y}_{i}^{target}}\right|\right) \;{\text{or}}\\ MRE=& \frac{1}{n}\sum_{i=1}^{n}\left(\left|\frac{E}{{y}_{i}^{target}}\right|\right)\end{aligned}$$
$$\overline{E }= \left({y}_{i}^{predict}-{y}_{i}^{target}\right)$$
$$\overline{ME }= \frac{1}{n}\sum_{i=1}^{n}\left({y}_{i}^{predict}-{y}_{i}^{target}\right)$$
$$SD= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}{\left(\overline{E }-\overline{ME }\right)}^{2}}$$
$$SEM= \frac{SD}{\sqrt{n}}$$
$$\overline{y }= \frac{1}{n}\sum_{i=1}^{n}\left({y}_{i}^{target}\right)$$
$${SS}_{tot}= \sum_{i=1}^{n}{\left({y}_{i}^{target}-\overline{y }\right)}^{2}$$
$${SS}_{res}= \sum_{i=1}^{n}{\left({y}_{i}^{target}-{y}_{i}^{predict}\right)}^{2} \;{\text{or}}\; {SS}_{res}= \sum_{i=1}^{n}{\left(E\right)}^{2}$$
$${R}^{2}= \frac{{SS}_{tot}-{SS}_{res}}{{SS}_{tot}}$$


$${R}^{2}= \frac{\sum_{i=1}^{n}{\left({y}_{i}^{target}-\overline{y }\right)}^{2}- \sum_{i=1}^{n}{\left({y}_{i}^{target}-{y}_{i}^{predict}\right)}^{2}}{\sum_{i=1}^{n}{\left({y}_{i}^{target}-\overline{y }\right)}^{2}}$$
$$ {\overline{\overline{y}}} = \frac{1}{n}\sum_{i=1}^{n}\left({y}_{i}^{predict}\right)$$
$$\begin{aligned}{adj. R}^{2}=&1-\left(\frac{\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}^{target}-{y}_{i}^{predict}\right)}^{2}}{\left(\left.\frac{\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}^{predict}-\overline{\overline{y}}\right)}^{2}}{n-1}\right)\right.}\right) \;{\text{or}}\\ {adj. R}^{2}=&1-\left(\frac{MSE}{\left(\left.\frac{\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}^{predict}-\overline{\overline{y}}\right)}^{2}}{n-1}\right)\right.}\right)\end{aligned}$$
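The core metrics above (RMSE, MAE, R2, and SEM) can be computed directly from their definitions; a minimal sketch:

```python
import numpy as np

def validation_metrics(y_target, y_predict):
    """RMSE, MAE, R^2 and SEM computed from the definitions above."""
    y_target = np.asarray(y_target, dtype=float)
    y_predict = np.asarray(y_predict, dtype=float)
    n = len(y_target)
    e = y_target - y_predict                       # E
    mae = np.mean(np.abs(e))                       # MAE
    rmse = np.sqrt(np.mean(e ** 2))                # RMSE = sqrt(MSE)
    ss_res = np.sum(e ** 2)                        # SSres
    ss_tot = np.sum((y_target - y_target.mean()) ** 2)  # SStot
    r2 = (ss_tot - ss_res) / ss_tot                # R^2
    sd = np.std(y_predict - y_target, ddof=1)      # SD of errors (n-1 denominator)
    sem = sd / np.sqrt(n)                          # SEM = SD / sqrt(n)
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "SEM": sem}

print(validation_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```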

Contribution analysis of input variables

The contribution analysis of input variables on the associated outputs is called feature importance, variable importance, or relative importance. This analysis indicates each feature’s relative importance in driving a model’s prediction by showing how the average prediction changes when the feature value changes. Substituting input variables with high relative importance (RI) values affects the results significantly more than substituting variables with lower RI values (Nabipour & Keshavarz, 2017; Rezakazemi et al., 2011; Vatankhah et al., 2014). Various techniques exist to calculate feature importance, including Garson (Garson, 1991), Most Squares (Ibrahim, 2013), and Connection Weights (Olden & Jackson, 2002). These methods are based on the connection weights of the neurons and are given in Eqs. (36)–(38), respectively. In addition, Gradient Boosting and CatBoost regression provide built-in feature importance. Different studies in the literature adopted feature importance to calculate and evaluate the impacts of variables (Ding, 2019; Rezakazemi et al., 2011; Shabanzadeh et al., 2015; Vatankhah et al., 2014; Zarei et al., 2020; Zhou et al., 2015), but feature importance was first adopted to find relative importance in ISF in Najm and Paniti (2021b) and later in Najm and Paniti (2021a) and Najm et al. (2021). Equations (36)–(38) are as follows:

$$RI(\mathrm{\%})= \frac{\left[\sum_{j=1}^{{n}_{h}} \left({y}_{vj }/\right. \sum_{k=1}^{{n}_{v}} \left.{y}_{kj}\right){ hO}_{j}\right]}{{\sum }_{y=1}^{{n}_{v}} \left[\sum_{j=1}^{{n}_{h}} \left({y}_{vj }/\right. \sum_{k=1}^{{n}_{v}} \left.{y}_{kj}\right){ hO}_{j}\right]}$$
$$RI\left(\mathrm{\%}\right)= \frac{\sum_{j=1}^{{n}_{v}}{\left({y}_{vj}^{i}-{y}_{vj}^{f}\right)}^{2}}{\sum_{j=1}^{{n}_{v}}\sum_{v=1}^{n}{\left({y}_{vj}^{i}-{y}_{vj}^{f}\right)}^{2}}$$
$$RI\left(\mathrm{\%}\right)= \sum_{j=1}^{{n}_{v}}{y}_{vj}{y}_{jo}$$

where nv is the number of neurons in the input layer, nh is the number of neurons in the hidden layer, yvj is the absolute value of the connection weight between the input and hidden layers, hOj is the absolute value of the connection weight between the hidden and output layers, \(\sum_{j=1}^{{n}_{v}}{\left({y}_{vj}^{i}-{y}_{vj}^{f}\right)}^{2}\) is the sum of squared differences between the initial and final connection weights from the input layer to the hidden layer, \(\sum_{j=1}^{{n}_{v}}\sum_{v=1}^{n}{\left({y}_{vj}^{i}-{y}_{vj}^{f}\right)}^{2}\) is the total of the sums of squared differences over all inputs, \(\sum_{j=1}^{{n}_{v}}{y}_{vj}{y}_{jo}\) is the sum of the products of the final connection weights from the input neuron to the hidden neurons and the connection weights from the hidden neurons to the output neurons, \(j\) indexes the hidden neurons, and o denotes the output neuron.
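The Garson and Connection Weights calculations (Eqs. 36 and 38) can be sketched directly from the weight matrices; a minimal implementation under the assumption that `iw` holds the input-to-hidden weights and `lw` the hidden-to-output weights:

```python
import numpy as np

def garson(iw, lw):
    """Garson relative importance (Eq. 36) from absolute connection weights.
    iw: (n_inputs, n_hidden) input->hidden weights, lw: (n_hidden,) hidden->output."""
    iw, lw = np.abs(iw), np.abs(lw)
    # share of each hidden neuron attributed to each input, scaled by |hO_j|
    contrib = (iw / iw.sum(axis=0)) * lw
    q = contrib.sum(axis=1)
    return 100 * q / q.sum()          # percentages over all inputs

def connection_weights(iw, lw):
    """Olden & Jackson connection-weights importance (Eq. 38): signed
    sums of products of input->hidden and hidden->output weights."""
    return iw @ lw
```

Unlike Garson, the connection-weights method keeps the signs, so it can distinguish positive from negative contributions.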

Another way to assess the input variables is the SHAP value, which shows a feature’s importance by quantifying the contribution of each feature and identifying the features that contributed most to the prediction. SHAP (SHapley Additive exPlanations), presented by Lundberg and Lee (2017), is a unified framework for interpreting predictions, and its interpretation was inspired by several earlier methods (Bach et al., 2015; Ribeiro et al., 2016; Štrumbelj & Kononenko, 2014). The principal SHAP calculation (Lundberg et al., 2018) is shown in Eq. 39.

$$\begin{aligned}{ShapValues}_{i}=&\sum_{\mathrm{S}\subseteq \mathrm{N}\backslash \left\{i\right\}}\frac{\left|S\right|!\left(M-\left|S\right|-1\right)!}{M!}\\ &\left[{f}_{x}\left(S\cup \left\{i\right\}\right)-{f}_{x}\left(S\right)\right]\end{aligned}$$

where M is the number of input features, N is the set of all input features, and S is the set of non-zero feature indices (features that are observed rather than unknown). \({f}_{x}\left(S\right)=E\left[f\left(X\right)\left|{X}_{s}\right.\right]\) is the model’s prediction for input x, where \(E\left[f\left(X\right)\left|{X}_{s}\right.\right]\) is the expected value of the function conditioned on the subset S of input features.
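For a small number of features, Eq. 39 can be evaluated exactly by enumerating all subsets S; the sketch below approximates \(f_x(S)\) by replacing the missing features with baseline values, which is one common simplification rather than the full conditional expectation:

```python
from itertools import combinations
from math import factorial

def shap_values(f, x, baseline):
    """Exact Shapley values (Eq. 39) by brute force; feasible only for a
    handful of features, since the subsets grow as 2^M."""
    M = len(x)
    phi = [0.0] * M
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            for S in combinations(others, size):
                # |S|! (M - |S| - 1)! / M!
                weight = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(M)]
                without = [x[j] if j in S else baseline[j] for j in range(M)]
                phi[i] += weight * (f(with_i) - f(without))
    return phi

# Linear toy model: contributions recover the coefficients exactly
f = lambda z: 2 * z[0] + 3 * z[1]
print(shap_values(f, [1.0, 1.0], [0.0, 0.0]))  # [2.0, 3.0]
```

A useful sanity check is the additivity property: the Shapley values sum to \(f(x)\) minus the baseline prediction.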

Results and discussion

Regarding the prediction of pillow effect and wall diameter by way of applying different models and structures, Table 9 depicts values of different validation metrics used for checking performance.

Table 9 Validation metrics for checking ANN structure and methods used for predicting pillow effect and wall diameter

It is imperative to distinguish between training and test errors. Training errors are calculated on the same data used to train the model, whereas test errors are calculated on a held-out dataset unknown to the model. The R2 value of the training dataset therefore reflects the variance explained within the trained samples, whereas the R2 value of the testing dataset indicates the predictive quality of the model. From Table 9 it is clear that there is a significant disparity between the different techniques, in favor of the developed model, as far as the prediction of the pillow effect is concerned. Using the features to predict the pillow effect and using the same features to predict the wall diameter are two different problems. The features can readily be used to predict the wall diameter; however, not all the prediction models can learn the relationships in the provided data and use them to predict the pillow effect. Possible reasons include the following: the problem has a stochastic nature, the dataset lacks some critical features, the data are insufficient, the model is too simple for the problem, or a combination of any of the preceding causes. All the above-mentioned issues would distort the estimation of the model’s real predictive capability on unseen data. Comparing the testing R2 across all models and algorithms shows that the developed MLP model offered the best performance in predicting the pillow effect. Its best performance was achieved by using BFGS Quasi-Newton (BFG)—Trainbfg as the training function and Symmetric Sigmoid (Tansig) as the transfer function.
Regarding the prediction of wall diameter, Gradient Boosting Regression (GBR) has the largest R2 value, and the developed one-output MLP model comes second in terms of R2 (see Fig. 14, which shows the ANN techniques used for predicting the pillow effect and wall diameter values of SPIF components). Because the testing R2 value of the NBN technique is negative, it was set to zero for illustration purposes in Fig. 14. Since R2 is defined as the fraction of variance explained by the fit, a fit can be worse than simply fitting a horizontal line at the mean, in which case R2 is negative. Nevertheless, although GBR’s R2 value is slightly larger than that of the developed model, all the other validation metrics indicate that the developed model performs better than GBR. The best performance of the developed MLP model in predicting wall diameter was obtained with the Levenberg–Marquardt (LM)—Trainlm training function and the Softmax transfer function. Notably, the two-output model could not match the performance of the other techniques. The results of all the training and transfer functions are given in Tables 11, 12, 13 and 14 in the Appendix.
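A quick numeric illustration of how R2 becomes negative when a fit is worse than a horizontal line at the mean:

```python
import numpy as np

def r_squared(y_target, y_predict):
    """R^2 = 1 - SSres / SStot; negative when SSres exceeds SStot."""
    y_target = np.asarray(y_target, dtype=float)
    ss_res = np.sum((y_target - np.asarray(y_predict, dtype=float)) ** 2)
    ss_tot = np.sum((y_target - y_target.mean()) ** 2)
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0]
print(r_squared(y, [1.0, 2.0, 3.0]))   # 1.0  (perfect fit)
print(r_squared(y, [2.0, 2.0, 2.0]))   # 0.0  (horizontal line at the mean)
print(r_squared(y, [3.0, 2.0, 1.0]))   # -3.0 (worse than the mean)
```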

Fig. 14
figure 14

R2 values for predicted results (pillow effect and wall diameter) in the case of different ANN models

The predicted testing data for both pillow effect and wall diameter obtained with the different ANN techniques are plotted against the actual data in Figs. 15a–d and 16a–d. The solid line displays an exact theoretical fit between actual and predicted values, with the data superimposed over it. The spread and deviation of points away from the solid line reflect each model’s ability to predict the pillow effect or wall diameter values with the lowest errors.

Fig. 15
figure 15

Actual and calculated values of pillow effect obtained with ANN models and algorithms in the case of a Neuron by neuron (NBN), b Gradient boosting regression (GBR), c CatBoost, and d Multilayer perceptron (MLP)

Fig. 16
figure 16

Actual and calculated values of wall diameter obtained with ANN models and algorithms in the case of a Neuron by neuron (NBN), b Gradient boosting regression (GBR), c CatBoost, and d multilayer perceptron (MLP)

To provide an alternative way of predicting the pillow effect and wall diameter that is simple, practical, and fast, without repeatedly building, running, and evaluating a new ANN model, analytical equations for predicting the pillow effect and wall diameter of parts formed by SPIF were extracted from the best model. This gives rise to a new method: by substituting only the process parameters, the obtained equations can be used directly to predict either the pillow effect or the wall diameter. Equations 42 and 45 were therefore formed, which require the constant weights and biases imported from the ANN network with the best performance. The extracted weights and biases function as one set of input weights (IW) and layer weights (LW): IW connects the inputs to the hidden layer, and LW connects the hidden layer to the output layer; the biases of the two layers are b1 and b2. In the Appendix, Table 15 (pillow effect) and Table 16 (wall diameter) provide b1, b2, IW, and LW obtained from the best trained ANN model.

$$f\left(x\right)=tansig\left(x\right)=\frac{2}{1+{exp}^{\left(-2\times x\right)}}-1$$
$${Pillow}\; {Effect}_{i}^{predict}={b}_{2}+LW\times tansig\left({b}_{1}+IW\times x\right)$$
$$\begin{aligned}&{Pillow}\; {Effect}_{i}^{predict}\\ &\quad={b}_{2}+LW\times \left(\frac{2}{1+{exp}^{\left(-2\times \left({b}_{1}+IW\times x\right)\right)}}-1\right)\end{aligned}$$
$$\begin{aligned}&{Pillow}\; {Effect}_{i}^{predict}=\begin{array}{c}{b}_{2}\\ \left[-4.4799\right]\end{array}\begin{array}{c}LW\\ + [4.8997\; -2.9949\; -1.8258\; -4.4692\; 2.3491\; -0.0459\; 0.9892\; -10.9224\; -11.2295\; -15.6383]\end{array} \\ &\quad \times \left(\frac{2}{1+{exp}^{\left(-2\times \left(\begin{array}{c}{b}_{1}\\ \left[\begin{array}{c}-1.0689\\ 2.5502\\ 4.8340\\ 5.3937\\ 0.5004\\ 6.4855\\ 1.7566\\ 2.0245\\ -1.1407\\ 6.9462\end{array}\right]\end{array}+\begin{array}{c}IW\\ \left[\begin{array}{cccccccccc}-0.3664& 3.4493& -0.3515& 2.7239& 0.7225& -1.4367& 9.6762& -3.1508& 3.2435& -14.1984\\ -2.9456& 1.8466& -1.7678& -0.8303& 4.7506& 6.6591& -4.1411& 6.9449& -7.4108& 8.0983\\ -0.0801& 2.2851& 0.6648& 4.9030& 0.1742& -1.2504& 11.9422& -0.5326& 0.4992& 3.6904\\ -0.0311& 3.3176& -2.8060& 2.1082& -1.2817& -3.6789& 2.6040& -8.7422& -3.7858& 1.1147\\ 0.1396& 0.4012& 2.3888& -0.2148& -0.1876& -0.2164& 4.8982& -5.5040& 6.1299& -9.5227\end{array}\right]\end{array}\times \begin{array}{c}x\\ \left[\begin{array}{c}Tool Material\\ Tool Shape\\ Tool End Radius\\ Tool Roughness (Ra)\\ Tool Roughness (Rz)\end{array}\right]\end{array}\right)\right)}}-1\right)\end{aligned}$$
$$f\left(x\right)=softmax\left(x\right)=\frac{{exp}^{\left(x\right)}}{\sum \left({exp}^{\left(x\right)}\right)}$$
$${Wall Diameter}_{i}^{predict}={b}_{2}+LW\times softmax\left({b}_{1}+IW\times x\right)$$
$${Wall Diameter}_{i}^{predict}={b}_{2}+LW\times \frac{{exp}^{\left({b}_{1}+IW\times x\right)}}{\sum \left({exp}^{\left({b}_{1}+IW\times x\right)}\right)}$$
$$\begin{aligned} &{Wall Diameter}_{i}^{predict}=\begin{array}{c}{b}_{2}\\ \left[19.2151\right]\end{array} \begin{array}{c}LW\\ + [\begin{array}{cccccccccc}2.5135& 22.9679& 24.9085& 23.8584& -135.9051& 29.3786& 25.9349& 16.5500& 29.1113& -19.6351\end{array}]\end{array}\\ &\quad \times \frac{{exp}^{\left(\begin{array}{c}{b}_{1}\\ \left[\begin{array}{c}-2.1118\\ 1.0873\\ -5.0721\\ -4.6389\\ -3.1617\\ 2.2753\\ 58.4812\\ -22.5044\\ -11.0040\\ -13.1130\end{array}\right]\end{array}+ \begin{array}{c}IW\\ \left[\begin{array}{cccccccccc}-0.2097& 11.7171& 1.5851& -0.3582& 0.8478& 0.5419& -21.3230& -0.3622& 5.2457& 4.8888\\ 0.2117& 2.9757& -8.5864& 3.2977& -5.0426& -5.9637& 73.9425& -17.8875& -7.9919& -35.0915\\ -1.9420& -77.4397& 24.2144& 25.8808& 23.6070& 22.0928& -85.3340& 32.3866& 18.9395& 20.2970\\ -0.6388& 144.0557& 44.0436& 112.4779& 72.4671& 73.4602& -254.2707& 49.4518& 32.7593& 13.5966\\ -1.9055& 7.5593& 6.3161& -1.9865& 2.9167& 3.1807& -34.9560& 7.7291& 5.1248& 7.2609\end{array}\right]\end{array}\times \begin{array}{c}x\\ \left[\begin{array}{c}Tool Material\\ Tool Shape\\ Tool End Radius\\ Tool Roughness (Ra)\\ Tool Roughness (Rz)\end{array}\right]\end{array}\right)}}{\sum \left({exp}^{\left(\begin{array}{c}{b}_{1}\\ \left[\begin{array}{c}-2.1118\\ 1.0873\\ -5.0721\\ -4.6389\\ -3.1617\\ 2.2753\\ 58.4812\\ -22.5044\\ -11.0040\\ -13.1130\end{array}\right]\end{array}+\begin{array}{c}IW\\ \left[\begin{array}{cccccccccc}-0.2097& 11.7171& 1.5851& -0.3582& 0.8478& 0.5419& -21.3230& -0.3622& 5.2457& 4.8888\\ 0.2117& 2.9757& -8.5864& 3.2977& -5.0426& -5.9637& 73.9425& -17.8875& -7.9919& -35.0915\\ -1.9420& -77.4397& 24.2144& 25.8808& 23.6070& 22.0928& -85.3340& 32.3866& 18.9395& 20.2970\\ -0.6388& 144.0557& 44.0436& 112.4779& 72.4671& 73.4602& -254.2707& 49.4518& 32.7593& 13.5966\\ -1.9055& 7.5593& 6.3161& -1.9865& 2.9167& 3.1807& -34.9560& 7.7291& 5.1248& 7.2609\end{array}\right]\end{array}\times \begin{array}{c}x\\ \left[\begin{array}{c}Tool Material\\ Tool Shape\\ Tool End Radius\\ Tool Roughness (Ra)\\ Tool Roughness (Rz)\end{array}\right]\end{array}\right)}\right)} \end{aligned}$$
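Both extracted predictors reduce to a couple of matrix operations of the form b2 + LW · transfer(b1 + IW · x). A sketch with small illustrative weights (not the values from Tables 15 and 16, which would be substituted in practice):

```python
import numpy as np

def tansig(x):
    """Symmetric sigmoid: 2 / (1 + exp(-2x)) - 1, identical to tanh(x)."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def softmax(x):
    e = np.exp(x - np.max(x))       # shifted for numerical stability
    return e / e.sum()

def mlp_predict(x, IW, b1, LW, b2, transfer):
    """y = b2 + LW . transfer(b1 + IW . x), as in the extracted equations."""
    return b2 + LW @ transfer(b1 + IW @ x)

# Illustrative 3-hidden-neuron network over the five process inputs
rng = np.random.default_rng(1)
IW, b1 = rng.normal(size=(3, 5)), rng.normal(size=3)
LW, b2 = rng.normal(size=3), 0.5
x = np.array([1.0, 0.0, 5.0, 0.4, 2.1])   # e.g. material, shape, radius, Ra, Rz
print(mlp_predict(x, IW, b1, LW, b2, tansig))
```

The same `mlp_predict` evaluates the wall-diameter equation by passing `softmax` as the transfer function.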

With respect to relative importance and weight analysis, the significant factors affecting the pillow effect and wall diameter are shown in Figs. 17a–d and 18a–d, which compare the different weight- and bias-based methods for finding the contributions of the input variables to the pillow effect and wall diameter. Regarding the pillow effect (see Fig. 17a–d), all of the methods show that changes in tool material and tool shape have a significant effect, with only insignificant variance for the Garson method (see Fig. 17a). Tooltip roughness (Rz) is at the end of the importance list, as it has the lowest impact in all cases except for the connection weights method (see Fig. 17b). Concerning the wall diameter (see Fig. 18a–d), tooltip roughness (Ra) has the most significant impact; the CatBoost method, by contrast, indicates that the tool end radius has the most significant impact on wall diameter (see Fig. 18d). Changes in tool shape always rank second in contribution, followed by tool material and tool end radius, with the exception of the CatBoost method.

Fig. 17
figure 17

Relative importance of different input variables on pillow effect according to a Garson, b connection weights, c most-squares and d CatBoost algorithms

Fig. 18
figure 18

Relative importance of different input variables on wall diameter according to a Garson, b connection weights, c most-squares and d CatBoost algorithms

To resolve the disagreement in the relative importance of the input parameters yielded by the different calculation methods, Fig. 19 was created, which shows the average relative importance over the four previously mentioned methods. It can be concluded that tool material and shape have the most significant influence on the pillow effect, while the tool end radius recorded the lowest value. As for the wall diameter, the surface roughness of the tool (Ra) has the highest effect, followed by the change in tool shape; the least influential parameter was the change of tool material.

Fig. 19
figure 19

Average relative importance of different input variables on a pillow effect, and b wall diameter

SHAP is a game-theoretic method that explains the predictions of models. It can estimate and explain how each feature contributes to and influences the model, estimating each feature’s contribution for each row of the dataset. The summary plots in Fig. 20 for the pillow effect and Fig. 21 for the wall diameter illustrate the importance of the individual features together with their effects. Each point on the summary plot represents a Shapley value for a feature: the Y-axis lists the features, and the X-axis shows the Shapley values. The color of the points denotes the value of the feature, from low to high, and the features are listed in order of importance. Shapley values represent the relative distribution of the prediction among the features. It is worth noting that the same value of a certain feature can contribute differently to the output, depending on the other feature values in the same row.

Fig. 20
figure 20

Summary plot of SHAP value impact on pillow effect

Fig. 21
figure 21

Summary plot of SHAP value impact on wall diameter

Figures 20 and 21 plot every data point of each dataset feature as a single SHAP value on the X-axis, with the features on the Y-axis. The color bar indicates the feature’s value: red means high values and blue low values, while grey points represent categorical inputs. Values to the right have a “positive” effect on the output and values to the left a “negative” effect. Positive and negative are merely directional terms related to the direction in which the model’s output is shifted; they do not indicate the model’s performance. For example, the leftmost point for the tool radius corresponds to a high value of the tool radius feature in the first row and therefore appears in red; this high tool radius value shifted the predicted wall diameter by approximately − 4, so the predictive model without that feature would have predicted a value about 4 higher. Similarly, the rightmost red point of the surface roughness feature (Ra), with a SHAP value of 2, means that the absence of this value would lead to a predicted wall diameter about 2 lower. Accordingly, a wider spread of the data points indicates a more influential feature.

To understand how the SHAP values change during the prediction process, three different examples, shown in Table 10, illustrate three different behaviors: one on the negative (left) side, one on the positive (right) side, and one balanced between the negative and positive sides. The colors are used for demonstration purposes only.

Table 10 Three different examples of pillow prediction

The prediction models in Table 10 were visualized using a SHAP decision plot, which uses cumulative SHAP values. Each plotted line explains a single model prediction. Figure 22a plots all the pillow effect predictions, and Fig. 22b plots three prediction values as mentioned earlier.

Fig. 22
figure 22

SHAP decision plot a all prediction values, and b three prediction values in Table 10

Each value is represented individually: Fig. 23 shows totally positive feature values, Fig. 24 totally negative feature values, and Fig. 25 both positive and negative feature values. The three figures are presented as (a) a SHAP decision plot, (b) a SHAP bar plot, and (c and d) SHAP force plots. The SHAP force plot shows exactly which features had the most extensive influence on the model’s prediction for an individual observation. The difference between (c) and (d) in all the figures is that (c) presents the values of the features, whereas (d) presents the SHAP value of each feature.

Fig. 23
figure 23

Total positive SHAP values: a SHAP decision plot, b SHAP bar plot, and c and d SHAP force plot

Fig. 24
figure 24

Total negative SHAP values: a SHAP decision plot, b SHAP bar plot, and c and d SHAP force plot

Fig. 25
figure 25

Positive and negative SHAP values: a SHAP decision plot, b SHAP bar plot, and c and d SHAP force plot

As attested by the waterfall plots (Figs. 23, 24, 25) of various SPIF components selected under different conditions, the order of the parameter impacts varies among the components. This signifies that the estimation results from multiple factors: no single parameter sufficiently determines the output (pillow effect or wall diameter) on its own, given that other parameters also affect the outcome of the model. Furthermore, the complexity of forming parts by SPIF, which involves stretching, bending, and shearing with cyclic effects, varies with the conditions and thereby yields different orders of impacting parameters in the individual components. Hence, this analysis reveals that each process condition is unique and that multiple factors interact and impact the outcomes of the individual parts differently. Forming tools of various materials also have different hardness, which generates different surface strains; the same holds for different tool geometries. It is essential to point out that two tools made from different materials but with identical geometry will affect the components differently: this phenomenon is caused by the different surface strains, since differing hardness values result in varying tool-tip surface roughness, which in turn affects part accuracy in terms of pillow effect and wall diameter. Furthermore, elastic deformation at the tooltip produces dimensional inaccuracies in the formed part, and plastic deformation permanently damages the forming tool, thereby excessively impacting the accuracy of the components, as explained by Kiridena et al. (2016).


Conclusions

This paper presented and described different machine learning algorithms and model structures to predict the pillow effect and wall diameter, which determine the geometric accuracy of SPIF components produced from aluminum alloy foil sheets. The study’s primary goal was to analyze the feature importance of the SPIF parameters involved in the forming process and to determine the best model and architecture. The findings were then used to derive two analytical equations for theoretically calculating the pillow effect and wall diameter. The most significant findings of the study are as follows:

  1. 1.

    Based on the R2 values of the training and testing datasets, CatBoost predicted the wall diameter with R2 values of 0.9714 and 0.8947, respectively, and the pillow effect with R2 values between 0.6062 and 0.6406. With R2 values between 0.9645 and 0.9082 for the wall diameter and between 0.7506 and 0.7129 for the pillow effect, the Levenberg–Marquardt training algorithm yielded the best performance as a prediction model, based on the different validation metrics. NBN offered no notable results, while GBR provided a reliable prediction of the wall diameter.

  2. 2.

    A one-output multilayer perceptron (MLP) solution network showed better results than a network with two outputs.

  3. 3.

    The most promising performance of predicting pillow effect was achieved by way of using BFGS Quasi-Newton (BFG)—Trainbfg as a training function, and Symmetric sigmoid (Tansig) as a transfer function.

  4. 4.

    The best performance of the developed MLP model to predict wall diameter was achieved by way of the Levenberg–Marquardt (LM)—Trainlm training function and softmax transfer function.

  5. 5.

    This research project marks the first time the relative importance (RI) method using SHapley Additive exPlanations (SHAP) was used to assess SPIF factors on outputs.

  6. 6.

    Relative importance (RI) revealed that tool material and shape are the most influential factors impacting the pillow effect, while the surface roughness of the tool (Ra), followed by changes in tool shape, had the highest effect on the wall diameter.

  7. 7.

    The least influential parameter on the pillow effect was the tool end radius, and the least influential parameter for the wall diameter was the change in tool material.

  8. 8.

    The computed parameter contributions result from the accumulation of many factors; in reality, no individual parameter is sufficient on its own to determine the output (pillow effect or wall diameter). In other words, identical values of one parameter may contribute to the outcome in different ways, depending on the values of the other parameters in the same row.