Introduction

Cold spray (CS) is a solid-state additive manufacturing and coating process in which powder particles are accelerated to supersonic speeds and deposited on a surface without melting. The underlying bonding mechanisms are local metallurgical bonding and mechanical bonding, both caused by localized plastic deformation between the particles and/or the substrate. The plastic deformation, and therefore the formation of a cold spray deposit, relies largely on the particle kinetic energy before impact (Ref 1).

More than 30 process parameters determine the characteristics of the manufactured parts, including the temperature and pressure of the accelerating gas, the spray nozzle geometry, the distance of the spray nozzle to the sprayed surface, and the motion path of the spray nozzle. Further, the characteristics and conditions of the spray powder material and the substrate material also affect the properties of the final manufactured parts (Ref 2, 3). Due to this complexity, the optimization of the CS process poses significant challenges.

The resulting level of porosity is a critical property of a cold-sprayed deposit as it is directly related to coating or deposit quality and performance. For some applications, it can be desirable and intentional to achieve a high level of porosity, whereas in other cases a highly dense deposit is required. Figure 1 shows two cross-sectional microscope images of CP titanium samples with a low porosity of 0.7%, achieved with a gas pressure of 49 bar and a gas temperature of 1000 °C (a) and a higher porosity of 5.0% achieved with a gas pressure of 50 bar and a gas temperature of 880 °C (b).

Fig. 1

Cross-sectional microscope images of CP titanium samples with low porosity (a) and high porosity (b), manufactured at Titomic Ltd. (Melbourne, Australia)

An example of an application for porous cold-sprayed deposits is orthopedic implants made of titanium. A porous structure permits bone tissue ingrowth and provides a reliable fixation at the bone implant interface. Sun et al. (Ref 4) were able to cold spray a titanium coating with a porosity of 48.6% on a titanium substrate suitable for the aforementioned application.

For other applications, a porous coating microstructure is detrimental. For example, when applying a protective surface coating, a porous coating may inadequately protect the underlying material if it is exposed to a corrosive environment, leading to corrosion damage of the structure (Ref 5). Further, the mechanical properties of metal parts are adversely affected by porosity. Therefore, both the total porosity and the pore size of load-bearing components must be kept to a minimum (Ref 6).

The porosity of a cold-sprayed coating or deposit is highly sensitive to both material characteristics and processing parameters (Ref 7). Table 1 provides examples of relevant cold spray process parameters and their reported effects on the resulting porosity. It should be noted that the following identified relationships are specific to their particular study conditions and parameters, and as such, these relationships may not have been verified to hold more generally.

Table 1 Parameters affecting the porosity in cold spray

While changing individual CS process parameters has observable effects, many of these parameters have complex interdependencies. For instance, the contradictory trends reported by Zahiri et al. (Ref 15) and Marrocco et al. (Ref 19), where one observes increased porosity with larger powder particle size while the other reports the opposite, can be attributed to variations in experimental conditions. Factors such as different cold spray systems, accelerating gases, and the potentially non-monotonic nature of the relationships between spray and material characteristics and porosity contribute to the observed discrepancies.

In another example, Magarò et al. (Ref 24) observed that the effect of the nozzle traverse speed on the mechanical properties of stellite-6 coatings is more prominent when low values of gas temperature and pressure are imposed, whereas at the highest values smaller differences are noticeable.

These nuanced influences suggest that authors may indeed measure different trends under distinct experimental setups, making it challenging to control or optimize the process to achieve specific characteristics within manufactured parts or coatings.

Porosity in materials can be measured through various methods, including image analysis, x-ray microtomography (XMT), and the Archimedes principle. The image analysis technique involves the binarization of microscope images of sample cross-sections with software like ImageJ and determining porosity as the ratio between black and white pixels. XMT employs x-ray imaging to capture detailed, cross-sectional views of materials, providing insights into their internal structures. The Archimedes principle involves measuring the displacement of fluid when a sample is submerged, offering a way to determine the volume and, consequently, the porosity of the material (Ref 12, 13, 25).

Computational models offer an alternative approach to determine the porosity without the need to manufacture physical samples. Terrone et al. (Ref 26), Song et al. (Ref 13), and Weiller and Delloro (Ref 3) employed coupled Eulerian–Lagrangian (CEL) frameworks to simulate multiple particle impact in CS and quantify the resulting porosity in the material. Terrone et al. (Ref 26) focused on multi-material titanium-copper and titanium-aluminum combinations and on the prediction of coating porosity after removal of a sacrificial aluminum or copper phase, varying volume fractions of these sacrificial materials. The porosity was assessed through a simulated chemical etching process, selectively removing sacrificial materials connected to the deposit surface and transforming those in direct contact with the external environment into pores.

Song et al. (Ref 13) analyzed Ti-6Al-4V powder particle impacts on Ti-6Al-4V substrates and found that the substrate temperature had a negligible impact on deposit porosity, which was mainly influenced by particle temperature and velocity. Porosity, represented by the average Eulerian volume fraction (EVF) void percentage, was determined by analyzing a cubic volume within the deposit and counting void elements.

Weiller and Delloro (Ref 25) simulated the impact of a combination of aluminum and alumina particles on an aluminum alloy substrate. To assess the porosity level in the simulation results, a method based on the calculation of successive convex hulls was employed to avoid user-dependent sampling volume choice that might result in non-representative or too small volumes for the porosity evaluation in the deposit. Comparing the porosity predictions with x-ray microtomography (XMT) results revealed notable differences, which were suspected to be due to different volume fraction sizes analyzed in simulation and experiments. An increase in the fraction volumes generated in the simulation to match the sizes of those in the experiments was expected to improve the prediction accuracy, but it would lead to prohibitive computational costs.

The mentioned models share common limitations, as they are tailored to specific use cases, restricting their applicability to a broad range of spray scenarios. Further, the accuracy of the predicted porosity is influenced by the mesh size and the analyzed volume fraction: smaller mesh sizes and larger volume fractions achieve higher accuracy but increase computational time, limiting the practical usability of the models (Ref 13, 25).

Machine learning (ML) models offer an alternative data-driven approach for predicting outcomes by learning from examples, capable of handling multi-dimensional and heterogeneous data to detect both linear and nonlinear relationships (Ref 27). Multidimensionality refers to the number of input and output parameters, and heterogeneity refers to a high variability of data types of the parameters in the dataset (Ref 28). ML models can deliver fast prediction results within seconds (Ref 29, 30) and have been successfully implemented in CS to predict process outcomes (Ref 1, 2, 3, 4, 5).

Valente et al. (Ref 31) used a decision tree classifier ML algorithm to classify the flowability of powders used in CS from powder features such as area, perimeter, ellipse ratio, and circularity. Testing the algorithm with unseen data led to an accuracy of 77.31%. Their dataset only comprised 21 datapoints with 29 features.

Ikeuchi et al. (Ref 32) used a neural network (NN) model to predict the shape of single-track profiles for cold spraying titanium from the input variables spray angle, traverse speed, and standoff distance. The model’s mediocre mean absolute percentage error (MAPE) of 8.342% on the test data was attributed to an insufficient data quantity. Therefore, in their following study they added virtual input–output subsets generated by a Gaussian function model to the measured profile shape data. This technique enabled them to improve the prediction accuracy compared to the previous model, and they achieved a MAPE of 1.23% on the test data set (Ref 33).

Liu et al. (Ref 34) utilized a NN to predict multi-layer profiles of cold-sprayed deposits on diverse substrate morphologies. Inputs included stand-off distance, gun traverse speed, deposition cycle count, and 206 points from the previous layer’s profile, while coating profile data served as targets. The NN model achieved a relative error (RE) within 10% when compared to experimental profiles. Discrepancies were attributed to assumptions in the model, not accounting for substrate abrasion and measurement inaccuracies of the 3D coating profiler used to generate the data (Ref 34).

Wang et al. (Ref 35) developed a NN model to predict critical velocity using 10 material features as model input. The 8 datapoints for training their models were taken from the literature. While the developed NN model demonstrated a high prediction accuracy of 96.45% when tested with the property data of two new materials, it is important to note that assessing the model’s performance with a limited dataset of only two datapoints does not provide a comprehensive validation of its predictive capabilities.

These examples show that the prediction accuracy of the models depends largely on the size and quality of the dataset. Further, these dataset characteristics determine the application range of the models.

This paper aims to develop, for the first time, an ML model that can predict the porosity in cold spray metal deposits using process parameters and spray powder and substrate properties as inputs. The dataset necessary to train ML algorithms for this task was developed from experiments reported in published academic papers. With this strategy, a larger pool of data can be generated and a broader applicability of the models can be achieved, helping future users apply them to a wide range of CS process scenarios. While this approach offers the benefit of models with a wide application range, there is a potential drawback associated with using literature data. The utilization of such data introduces a higher level of uncertainty regarding its accuracy and quality, which in turn may negatively impact the performance of the model. For instance, as different authors reported different sets of parameters, the resulting combined dataset contained numerous parameters with incomplete data.

To exploit the advantages of a dataset generated from the literature and to overcome its drawbacks, several data preparation techniques were applied to pinpoint the most effective ways to prepare a high-quality dataset. In the “Machine Learning Algorithms” section of this paper, the applied ML algorithms are described. The “Data Preparation” section covers the extensive data preparation strategies that were applied, including a parameter selection strategy and strategies for handling missing values and categorical features. In the “Method” section, methods for measuring the feature importance, the null value replacement accuracy, and the prediction accuracy are introduced. Finally, in the “Results and Discussion” section, the results of the applied methods are presented, including a description of limitations and suggestions for future work.

Machine Learning Algorithms

Machine learning is an artificial intelligence (AI) technique that allows a computer to automatically learn from data and make decisions or predictions without being explicitly programmed (Ref 36). Once an ML algorithm has been trained, it is presented with a test data set to determine how well it can identify samples that it has never seen before (Ref 27).

In this work, the bagging ensemble method random forest (RF) and the boosting ensemble methods extreme gradient boosting (XGB) and categorical boosting (CatB) were deployed on a dataset to make predictions of the porosity in cold spray. A RF is an ensemble learning method where multiple decision trees are trained on random subsets of the training data with replacement. Each decision tree makes predictions independently, and the final prediction is determined by averaging the predictions of all trees, resulting in improved accuracy and robustness compared to individual decision trees (Ref 37).
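As an illustration only (not the authors' implementation), the bagging idea behind a random forest regressor can be sketched with scikit-learn; the feature columns and all values below are hypothetical placeholders:

```python
# Illustrative sketch: a random forest regressor for porosity prediction.
# Synthetic stand-in data; features might be gas pressure, gas temperature, etc.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 4))         # hypothetical process parameters
y = rng.uniform(0.0, 10.0, size=100)   # hypothetical porosity in %

# Each tree is trained on a bootstrap sample (random subset with replacement);
# the forest averages the individual tree predictions.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
pred = model.predict(X[:5])
print(pred.shape)  # (5,)
```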

XGB, introduced by Chen (Ref 38), is an advanced boosting ensemble method. Specifically, XGB uses a more regularized model formalization which can reduce the sensitivity of predictions to individual observations. Thereby, the risk of overfitting to the training data can be reduced, and the overall performance of the model can be improved. Further XGB focuses on the computational power, by parallelizing the tree formation leading to speed gains in the training of the model. Another advantage of the XGB algorithm is that it can handle missing values in the dataset and therefore does not necessitate the application of missing value replacement methods (Ref 38).

CatB is a boosting variant developed by researchers at Yandex (Ref 40) that stands out for its ability to handle categorical values. This is achieved through an ordered target encoding (OTE) technique. In OTE, each instance’s category is replaced with the mean target value of instances that come before it in the dataset. By avoiding reliance on information from the entire dataset, the model is less likely to overfit the training data, enhancing its generalization to new, unseen data. CatB also applies an ordering principle to the development of the tree structure and the calculation of the residuals, known as ordered boosting, which leads to a further mitigation of the risk of overfitting (Ref 39).
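The OTE idea can be sketched in a few lines. This simplified version (CatBoost's actual implementation additionally uses a prior and averages over random permutations) encodes each category with the running mean of the targets of the instances seen before it; the material and porosity values are hypothetical:

```python
# Simplified sketch of ordered target encoding (OTE).
def ordered_target_encode(categories, targets, prior=0.0):
    sums, counts, encoded = {}, {}, []
    for cat, t in zip(categories, targets):
        n = counts.get(cat, 0)
        # Encode using only the instances that come *before* this one.
        encoded.append(sums.get(cat, 0.0) / n if n > 0 else prior)
        sums[cat] = sums.get(cat, 0.0) + t
        counts[cat] = n + 1
    return encoded

cats = ["Ti", "Cu", "Ti", "Ti", "Cu"]   # hypothetical powder materials
por  = [1.0,  4.0,  3.0,  2.0,  6.0]    # hypothetical porosity targets
print(ordered_target_encode(cats, por))  # [0.0, 0.0, 1.0, 2.0, 4.0]
```

Because each encoded value only depends on earlier rows, no instance's own target leaks into its encoding.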

The mathematical details of the RF, XGB and CatB algorithms can be found in Breiman (Ref 37), Chen (Ref 38), and Prokhorenkova (Ref 39), respectively.

Data Preparation

Here, the data for training and testing the ML models were generated from the information provided in cold spray academic papers about process parameters and the associated part properties. The inclusion of instances in the dataset was constrained to non-heat-treated, as-sprayed samples that were produced with a single-component metal powder. The information provided by each paper varied significantly, leading to a non-negligible number of missing values in the dataset. Due to the high number of missing values for some of the parameters, a minimum of 35% completeness was set as a threshold for inclusion in the dataset. The completeness of a parameter was derived as the ratio of the non-null count of the respective parameter to the total number of experiments. In total, 242 porosity measurements from 35 papers, together with data on 14 relevant process parameters used to produce the respective samples, were included in the dataset. The percentage of completeness for the entire dataset was 84%. Figure 2 illustrates the structure of the dataset. The vertical axis represents the parameters of the dataset (such as gas temperature, gas pressure, etc.), and the horizontal axis represents the experiments reported in an academic paper. Each field in this matrix therefore represents a specific parameter value that was used in the respective experiment reported in the paper. In some cases, the information about certain parameters was not shared in the article, leading to a missing value in the data matrix. The difference between fields with information content and fields with missing data is indicated in Fig. 2: black fields represent parameters with information content, white fields represent missing data. Table 2 gives an overview of the 14 parameters and the extent of missing data.

Fig. 2

Structure of dataset developed from academic papers. Black fields represent available information, whereas white fields represent missing data

Table 2 Overview of the parameters of the dataset developed from experiments reported in academic papers

Parameter Selection

The success of a machine learning project is highly dependent on the selection of parameters. Considering the available dataset with n = 14 parameters and the option to choose from k = {1,..., 14}, there are a total of 16,383 possible combinations of parameters.

$${\text{Number}}\;{\text{of}}\;{\text{combinations}} = \sum\limits_{k = 1}^{14} {\frac{n!}{{\left( {n - k} \right)! \cdot k!}}}$$
(1)

where n is the number of elements to choose from and k is the number of elements chosen.
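Equation 1 can be checked numerically: the sum over all non-empty subsets of 14 parameters equals 2^14 - 1 = 16,383.

```python
# Verifying Eq 1: the number of non-empty parameter subsets of n = 14.
from math import comb

n = 14
total = sum(comb(n, k) for k in range(1, n + 1))
print(total)  # 16383
assert total == 2**n - 1
```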

To reduce the number of models to be tested, only selected parameter combinations were evaluated. A floating dataset selection was constructed, in which the set of selected variables corresponds to a varying range of data completeness. Table 3 shows the resulting configurations (0-10) of the dataset.

Table 3 Definition of dataset configuration 0-10 via floating selection of cold spray parameters

The initial configuration 0 is a dataset including the parameters powder material, substrate shape, process gas type, and gas pressure. For each following configuration (1-10), one additional new parameter was added (see parameters in Table 2). The order in which the new parameters were added depended on the non-null count of the respective parameter (see Table 2, column ‘non-null count’). Parameters with a high non-null count were added first. With this method, a range of dataset configurations was created. Starting with configurations without missing values (configurations 0, 1, 2), each following configuration (configurations 3-10) exhibits an increasing number of missing values.

Adding more parameters can contribute to the accuracy of the predictions of a machine learning algorithm if there is a strong correlation between the respective parameter and the target parameter (Ref 40).

On the other hand, in this case the inclusion of a parameter increases the number of null values in the dataset, and all gaps in the dataset later had to be filled using a null-value replacement strategy. Depending on how close the null-value replacements are to the unknown true values, this can result in a lower prediction accuracy of the model (Ref 41).

Handling of Missing Values

The validity of applying missing data handling strategies to an incomplete dataset requires meeting certain assumptions about the reason for the missingness. These reasons are referred to as the missing data mechanisms (Ref 42). A value of a variable X1 is said to be missing completely at random (MCAR) if the probability that X1 is missing is related neither to a specific value of X1 nor to the observed values of other variables Xn (Ref 42). That means the missing observations are just a random subset of all observations, and there is no systematic difference between the missing and observed data. The missing observations of an explanatory variable X1 are called missing at random (MAR) if the likelihood that the values of X1 are missing depends on the observed values of other variables Xn but not on the values of X1 itself (Ref 42). In other words, the missingness of a field can be explained by the values in other columns, but not by that column. Missing data are said to be missing not at random (MNAR) if the likelihood that a value of a variable X1 is missing is related to the missing values of X1 themselves. When data are MNAR, almost all standard statistical techniques are no longer applicable (Ref 42). For the available dataset, none of the above-mentioned missing data mechanisms can be ruled out with certainty. However, MCAR is estimated as the most likely missing data mechanism, due to the apparently random disclosure of information about spray parameters by the authors of the academic papers included in the dataset.

Configurations 0-2 do not contain any missing values and thus do not require the application of a null-value replacement strategy. For configuration 3-10, the addition of parameters comes along with the inclusion of null values in the dataset. To handle the missing values in the datasets (configuration 3-10), different null-value replacement strategies were applied.

The complete case analysis handles the missing values by deleting the rows having null values. That means instances with missing values are omitted from the dataset. The advantage of this method is that there is no data replacement required, and therefore, inaccurate data replacements are avoided. On the other hand, it leads to a loss of information in the dataset. This strategy works best if the data are MCAR, as in this case an analysis using only complete cases will not be biased (Ref 42).

Table 4 shows that from configuration 0 to 10 more parameters are included in the dataset; however, the range of each numerical parameter and the number of categories of each categorical parameter decrease, leading to a model with a smaller application range.

Table 4 Parameter range for dataset configuration 0-10 for numerical parameters (top) and categorical parameters (bottom)

The mean/mode strategy is a univariate imputation technique, which means that it only uses the column with missing values for imputing values in that column. For numeric data, the mean of each parameter was calculated, and for categorical data, the mode of each parameter was determined on the training data set. Missing values in each row of the training data set were then replaced by the mean or mode of the respective parameter. This is an approximation which can increase the model bias. However, the loss of data is avoided, which can lead to better results compared to removing instances with missing values (Ref 42).
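A minimal pandas sketch of the mean/mode strategy (the column names and values are hypothetical):

```python
# Mean/mode imputation: numeric columns filled with the mean,
# categorical columns with the mode, both computed on the training data.
import numpy as np
import pandas as pd

train = pd.DataFrame({
    "gas_pressure":    [30.0, 40.0, np.nan, 50.0],   # bar (hypothetical)
    "powder_material": ["Ti", None, "Ti", "Cu"],
})

train["gas_pressure"] = train["gas_pressure"].fillna(train["gas_pressure"].mean())
train["powder_material"] = train["powder_material"].fillna(
    train["powder_material"].mode()[0])

print(train["gas_pressure"].tolist())     # [30.0, 40.0, 40.0, 50.0]
print(train["powder_material"].tolist())  # ['Ti', 'Ti', 'Ti', 'Cu']
```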

K-nearest neighbor (k-NN) imputation algorithms are efficient methods to fill in missing data, where each missing value in a data row is replaced by a value obtained from related cases in the whole data set (Ref 43). They belong to the multivariate imputation techniques. In contrast to the univariate imputation techniques, other variables in the data are factored in to make better predictions of the potential true value of a missing value. K-nearest neighbor imputation algorithms are applicable under any of the three missing data mechanisms, as long as there is a relationship between the variable with the missing value and the other variables (Ref 6).

A k-NN-imputed value is either a value that was actually measured for another record in a database (1-NN) or the average of measured values from k records (k-NN) (Ref 43). The algorithm works by calculating the Euclidean distance from the row with the missing value to all the other corresponding row elements (Ref 6). Based on the calculated Euclidean distance, the k-NN finds a set of k nearest neighbors and then replaces the missing observations for a given parameter by the average of its neighbors’ observed values.

$${\text{Euclidean}}\;{\text{distance}} = \sqrt {{\text{weight}} \cdot \left( {{\text{distance}}\;{\text{from}}\;{\text{present}}\;{\text{coordinates}}} \right)^{2} }$$
(2)

where weight = total number of coordinates/number of present coordinates.
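One concrete implementation of this scheme is scikit-learn's KNNImputer, which uses the weighted Euclidean distance over present coordinates as in Eq 2; the array below is a hypothetical example:

```python
# k-NN imputation sketch: the missing value in row 0 is replaced by the
# mean of its two nearest neighbours' values (rows 1 and 2 here, since
# they match row 0 exactly on the present coordinates).
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0, np.nan],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
    [9.0, 9.0, 9.0],
])

imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled[0, 2])  # 4.0, the mean of 3.0 and 5.0
```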

Multivariate imputation by chained equations (MICE) is a multivariate imputation technique that is used under the assumption that the missing data are MCAR or MAR. Implementing MICE when data are MNAR could result in biased results (Ref 44).

This technique focuses on one missing variable at a time. Once the focus is placed on one variable, MICE uses all the other variables in the data set to predict missing values in that variable. The prediction is based on a regression model that uses the currently focused missing variable as the dependent variable and all other variables as independent variables (Ref 44).
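A MICE-style imputation can be sketched with scikit-learn's IterativeImputer, one implementation of the chained-equations idea. The two-column array below is hypothetical and deliberately linear (column 1 is twice column 0), so the regression step has a relationship to exploit:

```python
# MICE-style imputation: the missing entry in column 1 is predicted by
# a regression model that uses column 0 as the independent variable.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]])

imputer = IterativeImputer(random_state=0, max_iter=10)
X_filled = imputer.fit_transform(X)
print(X_filled[2, 1])  # close to 6.0, following the linear trend
```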

Handling of Categorical Features

A categorical feature is one that has two or more categories, but with no intrinsic ordering to the categories. To use categorical features in a dataset to train a model, they first must be transformed into numerical features. Converting categorical data into numerical data can be done by integer encoding. In this procedure, each category of a categorical variable is assigned an integer value; for example, for the categorical variable ‘powder material’, the category CP titanium is 1, the category copper is 2, etc. The problem with integer encoding is that the assigned numbers imply a natural ordered relationship, and machine learning algorithms could assume that higher numbers are more important than lower numbers. Therefore, for categorical variables where no such ordinal relationship exists, integer encoding is not sufficient (Ref 45).

In one hot encoding (OHE), one binary attribute (which can have the value 0 or 1) is created for each category of a categorical feature. This avoids the machine learning model assuming a natural ordering between the categories. Table 5 gives an example of the one hot encoded categorical variable ‘powder material’. In this case, instance 1 is a CP titanium sample, instance 2 is a copper sample, and instance 3 is a tantalum sample (Ref 45).

Table 5 Example for the application of the OHE strategy for the transformation of the categorical features ‘powder material’ into numerical features

The downside of this method is that with every category the feature space expands, increasing the complexity of the model. Here, the OHE feature handling method was applied. Further, the CatB machine learning algorithm uses the integrated categorical feature handling method OTE (see “Machine Learning Algorithms” section), which was also used in this study.
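For illustration, a transformation like the Table 5 example can be reproduced with pandas (a sketch, not the authors' tooling):

```python
# One hot encoding the 'powder material' feature: one binary column
# per category, so no ordering between materials is implied.
import pandas as pd

df = pd.DataFrame({"powder_material": ["CP titanium", "copper", "tantalum"]})
ohe = pd.get_dummies(df, columns=["powder_material"], dtype=int)

print(list(ohe.columns))
# ['powder_material_CP titanium', 'powder_material_copper', 'powder_material_tantalum']
print(ohe.iloc[0].tolist())  # [1, 0, 0]  (instance 1: CP titanium)
```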

Training Data and Testing Data

To separate the existing datasets (configurations 0-10) for the training and testing procedure, the datasets were split into a training and a testing portion. For this, two different strategies were applied. For the complete case analysis, the datasets were split into training and testing sets in an 80:20 ratio. For the missing data replacement strategies, the data without null values were used as the testing dataset and the data with missing values were used as the training dataset. The latter strategy was selected to allow for a comparison of the applied null value replacement strategies.
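The 80:20 split used for the complete case analysis can be sketched with scikit-learn (synthetic stand-in data):

```python
# 80:20 train/test split, matching the ratio used for the complete
# case analysis; the data here are synthetic placeholders.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # 50 hypothetical instances
y = np.arange(50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(len(X_train), len(X_test))  # 40 10
```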

Method

The goal of this study was to find the optimal data set configuration, null value replacement strategy, categorical feature handling method, and machine learning algorithm with optimal hyperparameters to achieve the model with the best prediction capacity. Figure 3 shows the examined combinations of these impact factors.

Fig. 3

Overview of the dataset configurations, applied data preparation strategies and machine learning algorithms tested

Prediction Accuracy

The root mean squared error (RMSE) is a common performance metric that gives a good idea of the accuracy of a model given the range of the output values. A small RMSE means the model can make predictions with a high accuracy (Ref 46).

It is defined as:

$${\text{RMSE}} = \sqrt {\frac{1}{m}\sum\limits_{i = 1}^{m} {\left( {y_{{\text{p}}} - y} \right)^{2} } }$$
(3)

where yp is the predicted output value for the test set input values, y is the corresponding output value (target) from the test set, and m is the number of samples in the test set.
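Equation 3 translates directly into code; the toy values below are hypothetical:

```python
# Direct implementation of Eq 3 (RMSE).
import numpy as np

def rmse(y_pred, y_true):
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

# Errors are 0, 0 and 2, so RMSE = sqrt(4/3) ≈ 1.155.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```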

Several authors reported measurement ranges for the percentage of porosity for each examined cold spray sample (Ref 12, 15, 19, 47). The average of those reported measurement ranges is 1.72% porosity. Therefore, the target accuracy of a machine learning model aiming to predict the porosity in cold spray was chosen to be in a similar range of 2% porosity.

For each setup described in Table 3, a hyperparameter optimization for the respective machine learning algorithm was conducted. A “one at a time” approach and a randomized search were applied to find the best hyperparameter combinations for each setup. In both cases, a 10-fold cross-validation based on the RMSE as a performance indicator was performed on the training data set of the respective configuration. The “one at a time” approach is a systematic approach, where every hyperparameter is tackled one at a time, using the best results for subsequent iterations (Ref 48). For the randomized search, which is based on inputting all the hyperparameter ranges at once, the number of iterations was set to 60. With 60 iterations, there is a 95% probability that at least one of the sampled hyperparameter sets lies within the best 5%.
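The 60-iteration figure follows from a simple probability argument, assuming independent uniform draws over the hyperparameter space:

```python
# Each random draw misses the top 5% of hyperparameter sets with
# probability 0.95, so the chance that at least one of 60 independent
# draws lands in the top 5% is 1 - 0.95**60.
p_hit = 1 - 0.95 ** 60
print(round(p_hit, 3))  # 0.954, i.e. roughly 95%
```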

With this approach, more than 6000 models with different setups and hyperparameter combinations were evaluated on the training data. From this pool, the hyperparameter-optimized models for each setup were evaluated using the test data set of the respective data set configuration.

Feature Importance

The dataset configurations 0-10 differ in the parameters they include (see Table 3), which can affect the models’ performance. To find out which features have the most prominent effect on the porosity, a feature importance study was conducted. Further, it is assumed that the applied data preparation strategies alter the structure of the dataset; therefore, the effect of the impact factors mentioned in Fig. 3 on the feature importance was also analyzed. The Shapley value approach was selected for this analysis.

The concept of Shapley values was originally developed in Game Theory and is applied to estimate the contribution of each player to the output of a game. The Shapley value for a player is determined by calculating the expected marginal contribution of the player to the game’s output (i.e., value). The expected marginal contribution is defined as the average contribution of a player across all possible permutations or coalitions of players. All the marginal contributions of a player are weighted by the probabilities that they make the contribution. The weighted contributions are summed over all the coalitions that the player can join. Therefore, a player’s individual contribution and the interactions between players are considered (Ref 49, 50).

In ML Shapley values can be used to gain an understanding of the relationship between the prediction of the model and the components of the data instance that the model used to generate that prediction. Therefore, the value of the game is the model prediction and the feature values are the players (Ref 51).

The Shapley value Φi for the local importance (i.e., influence of a feature on the prediction made by a model for a specific instance) of the ith feature can be calculated as follows (Ref 51):

$$\Phi_{i} = \sum\limits_{{S \subseteq F\backslash \left\{ i \right\}}} {\frac{{\left| S \right|!\left( {\left| F \right| - \left| S \right| - 1} \right)!}}{{\left| F \right|!}}\left[ {f_{{S \cup \left\{ i \right\}}} \left( {x_{{S \cup \left\{ i \right\}}} } \right) - f_{S} \left( {x_{S} } \right)} \right]}$$
(4)

where |S|: size of the subset before the ith feature is added, |F|: number of features, \(f_{{S \cup \left\{ i \right\}}} \left( {x_{{S \cup \left\{ i \right\}}} } \right)\): prediction made by the model with the ith feature included, \(f_{S} \left( {x_{S} } \right)\): prediction made by the model with the ith feature NOT included, \(\frac{{\left| S \right|!\left( {\left| F \right| - \left| S \right| - 1} \right)!}}{{\left| F \right|!}}\): the weight of the combination for this occurrence, \(S \subseteq F\backslash \left\{ i \right\}\): all possible subsets without the ith feature, \(S \cup \left\{ i \right\}\): a subset with the ith feature added.

To calculate the global feature importance over the entire model, the average of the absolute Shapley values for all the instances (i.e., the local feature importance for each instance) is calculated (Ref 51).
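A minimal sketch of this aggregation, assuming a precomputed matrix of local Shapley values (one row per instance, one column per feature; the numbers are illustrative):

```python
import numpy as np

# Hypothetical local Shapley values: rows = instances, columns = features
shap_matrix = np.array([
    [ 0.8, -1.2,  0.1],
    [-0.4,  2.0, -0.3],
    [ 1.0, -0.5,  0.2],
])

# Global feature importance: mean of the absolute local Shapley values
global_importance = np.abs(shap_matrix).mean(axis=0)
```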

When interpreting the Shapley values, it must be considered that they are not a measure of how important a given feature is in the real world. It is a measure of how important a feature is to the model. A model is not necessarily a good representation of reality, as predictions can be incorrect. The Shapley values essentially give the contributions of the model features to a prediction that usually deviates from the target variable. Therefore, no conclusion that goes beyond the model should be made (Ref 51).

Null Value Replacement Accuracy

Univariate and multivariate null value replacement techniques have been applied to handle missing values in the dataset. Subsets consisting of only complete instances of the data configurations 3-10 were used to assess the different null value replacement strategies. To this end, null values were randomly inserted into these datasets. The percentage of null values was chosen to be equivalent to the percentage of null values in the dataset configurations that were used to train the machine learning models. The univariate replacement strategies did not necessitate the conversion of the categorical variables into numerical variables. In contrast, for the application of the multivariate null value replacement strategies, a conversion of the categorical features into numerical features was required. For this conversion, the OHE strategy was used. Consequently, the percentage of data completeness varies depending on which type of null value replacement strategy was applied. Table 6 shows the percentage of completeness of the datasets for both the univariate and the multivariate replacement strategies.

Table 6 Null value percentage of dataset configurations 3-10 for univariate and multivariate replacement strategies

The random insertion of null values was repeated ten times for each dataset to allow for a statistically relevant assessment of the replacement strategies. Subsequently, the univariate (mean, mode) and multivariate (KNN and MICE) replacement strategies were applied on each dataset to replace the missing values. Due to the different scales of the parameters of the datasets, a standard scaler was applied to each value of the datasets (original complete datasets and datasets with replaced missing values) scaling each parameter to unit variance. The standard score of a sample x is calculated as (Ref 52):

$$z = \left( {x - u} \right)/s$$
(5)

where u: feature mean, s: feature standard deviation.

With standardized values in the dataset, a comparison of the different applied null value replacement strategies for each dataset configuration 3-10 using the RMSE was possible. In this case, the RMSE was calculated as:

$${\text{RMSE}} = \sqrt {\frac{1}{m}\mathop \sum \limits_{i = 1}^{m} \left( {{\text{inserted}}\;{\text{value}} - {\text{true}}\;{\text{value}}} \right)^{2} }$$
(6)

where m: number of missing values.
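The evaluation described above (randomly blanking out values, imputing them, standardizing both datasets, and computing the RMSE of Eq 6 over the masked entries) can be sketched with scikit-learn on a synthetic numeric dataset. The column scales loosely mimic spray parameters but are otherwise arbitrary assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic complete dataset: columns loosely mimic gas pressure (bar),
# gas temperature (deg C), and standoff distance (mm)
X_true = rng.normal(loc=[40.0, 900.0, 30.0], scale=[10.0, 150.0, 5.0], size=(100, 3))

# Randomly insert ~10% null values, mimicking the missingness of the dataset
X_missing = X_true.copy()
mask = rng.random(X_true.shape) < 0.10
X_missing[mask] = np.nan

# Standard scaler fitted on the complete data, so the RMSE of Eq 6 is
# comparable across parameters with different scales
scaler = StandardScaler().fit(X_true)

def imputation_rmse(imputer):
    X_imp = imputer.fit_transform(X_missing)
    z_imp, z_true = scaler.transform(X_imp), scaler.transform(X_true)
    return float(np.sqrt(np.mean((z_imp[mask] - z_true[mask]) ** 2)))  # Eq 6

results = {
    "mean": imputation_rmse(SimpleImputer(strategy="mean")),
    "KNN": imputation_rmse(KNNImputer(n_neighbors=5)),
    "MICE": imputation_rmse(IterativeImputer(random_state=0)),
}
```

In the study this procedure was repeated ten times per dataset configuration; a single run is shown here for brevity.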

Results and Discussion

In this section, the results of applying the methods introduced in the “Method” section are discussed. An evaluation of the hyperparameter-optimized models of each setup was performed on the test dataset. The features with the biggest effect on the porosity in cold spray were identified with the help of the Shapley value approach. The accuracies of the applied null value replacement strategies were determined. Finally, limitations of the approach and suggestions for future work are discussed.

Evaluation of Hyperparameter-Optimized Machine Learning Models

For each setup in Fig. 3, a hyperparameter optimization was performed as described in section 4.1, followed by an evaluation of the hyperparameter-optimized models using the test dataset. Figure 4 gives an overview of the performance of all hyperparameter-optimized models on the test dataset using the RMSE as a performance indicator.

Fig. 4
figure 4

Performance of hyperparameter-optimized models for each setup using the RMSE as a performance indicator

Figure 4 shows that most of the models’ predictions deviate by 3-4% porosity on average (denoted by the RMSE class (3,4]) from the true porosity values. However, 12 of the developed models achieve an RMSE of 0-2% porosity and are therefore within the specified target prediction accuracy. This means that these models can predict the porosity of a cold-sprayed part with an average deviation of 0-2% porosity.

Table 7 shows the setup of these models and their performance on the testing data and the training data in detail.

Table 7 Overview of all hyperparameter-optimized models with a RMSE of < 2%

In most of the cases, the test error is less than the training error, indicating a possible sampling bias in the test set. This means that the respective test dataset may not reflect the conditions under which the model was trained. Conversely, a much better performance on the training set compared to the testing set would indicate an overfitting of the model to the training data. When comparing the results of the complete case analyses with those obtained using null value replacement strategies, it must be considered that for the former the training and testing set sizes decreased as parameters were added to the dataset (Table 8). This dataset shrinkage only occurred for the complete case analysis, as in this case instances with missing values were removed from the dataset.

Table 8 Change of training and testing set size depending on the dataset configuration

This dataset shrinkage must be considered when evaluating the prediction accuracy of the models on the testing dataset. Evaluations based on testing datasets with a higher number of samples provide more reliable results, whereas evaluations based on very small test datasets are less conclusive. Therefore, in the case of similar evaluation results, a model that was tested with a larger test dataset is preferred. All models for which the complete case strategy was applied were trained with a small dataset (19-52 samples) and therefore also tested with a small dataset (5-13 samples). On the other hand, it is noticeable that the models trained with a dataset prepared using a missing value replacement strategy are based on dataset configurations with fewer parameters. As can be seen from Table 6, these datasets have lower percentages of missing values. Therefore, there is reason to believe that the missing value replacement strategies were effective only when dealing with a low percentage of missing values.

Figure 5 presents a comparative analysis of predicted and actual porosity values derived from the respective test datasets for two models, one employing a complete case strategy (Table 7 Nr. 3) and the other utilizing a null value replacement strategy (Table 7 Nr. 7). The actual porosity values are the real porosity measurements taken from the test samples, serving as the true reference in this analysis. The closer the points are to the red 45° line in each plot, the more accurate the prediction of the respective model. The predictions of both models come close to the actual porosity values of the test samples. However, in Fig. 5(a) a large evaluation gap in the test dataset can be observed between ca. 4% and 12.5% porosity. This may be due to the small test set size used for models trained with a many-parameter dataset prepared using the complete case strategy. Figure 5(b) does not exhibit as large an evaluation gap, due to the larger test set that was used to evaluate the model trained with the dataset for which a missing value replacement strategy was applied.

Fig. 5
figure 5

Comparison of predicted target values versus actual target values. a Conf. 8—complete case—OHE—CatB, b Conf. 2—Univariate—OHE—CatB

Table 9 shows the average deviation of predicted and real porosity values of the test data samples in different porosity ranges. Model #7 performs best in the porosity region from 1 to 10%, whereas the predictions for parts with a high porosity deviate more significantly from the real porosity values. Model #3 performs best in the porosity region from 1 to 3%. Due to the small test set size, there were no samples in the porosity region from 3 to 10%; therefore, the performance of model #3 in this region could not be evaluated.

Table 9 Differences between predicted target values and actual target values for testing portion per bin
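The binned deviations of Table 9 can be reproduced with a few lines of Python; the predicted and actual porosity values below are illustrative placeholders, and the bin edges are assumptions rather than the exact bins of the table.

```python
import numpy as np

# Illustrative predicted vs. actual porosity values (%) for a test split
actual    = np.array([0.5, 1.2, 2.8, 5.1, 9.7, 14.2, 22.0])
predicted = np.array([0.9, 1.0, 3.5, 4.2, 11.0, 17.1, 19.5])

# Average absolute deviation per porosity bin, analogous to Table 9
bins = [(0, 1), (1, 3), (3, 10), (10, 30)]
per_bin = {}
for lo, hi in bins:
    in_bin = (actual >= lo) & (actual < hi)
    # None marks bins without test samples, as for model #3 in the 3-10% range
    per_bin[(lo, hi)] = (float(np.mean(np.abs(predicted[in_bin] - actual[in_bin])))
                        if in_bin.any() else None)
```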

Feature Importance Analysis Using the Shapley Value Approach

The Shapley value approach was applied to test the effect of different dataset configurations on the feature importance. Figure 6 shows the Shapley values of every feature for every sample for a CatB algorithm trained with a configuration 10 dataset. It uses the Shapley values to show the distribution of the impacts each feature has on the model output. The color represents the feature value (red: high, blue: low, gray: categorical). For example, low values of the feature ‘Gas pressure’ demonstrate a strong positive (increasing) effect on the resulting porosity, whereas high values of this feature have a negative (decreasing) effect on the porosity.

Fig. 6
figure 6

Shapley value distribution of a CatB algorithm trained with a configuration 10 dataset

Calculating the Shapley value of a specific feature for every sample in the dataset and averaging the absolute values leads to the feature importance of the entire model. Figure 7 shows the mean absolute Shapley value calculated over every sample for every feature of a CatB algorithm trained with the configuration 10 dataset.

Fig. 7
figure 7

Mean absolute Shapley values of all features of a CatB algorithm trained with a configuration 10 dataset

Therefore, for this model it can be concluded that the most important features are the gas pressure, the gas temperature, and the cold spray system, whereas less important features are the nozzle standoff distance, the spray angle, and the substrate shape. This approach was taken a step further by calculating the average of the mean absolute Shapley values of all tested models. Table 10 shows the average of the mean absolute Shapley values for every feature.

Table 10 Average mean absolute Shapley values across all models

A comparison of the average mean absolute Shapley values from Table 10 with the mean absolute Shapley values of the CatB model trained with the configuration 10 dataset (Fig. 7) shows that there are differences in the importance of individual features across the different models. For instance, the feature cold spray system is the third most important feature for the CatB model trained on the configuration 10 dataset, whereas it is the third least important feature across all tested models. Table 11 summarizes the average mean absolute Shapley values depending on the applied machine learning algorithm. The lower part of the table shows a ranking of the features according to the respective average mean absolute Shapley value. In conclusion, the choice of machine learning algorithm has only a minor effect on the importance of individual features.

Table 11 Comparison of the average mean absolute Shapley values for each applied machine learning algorithm

Table 12 summarizes the differences in the average mean absolute Shapley values for each applied null value replacement strategy. When considering the feature rankings in the bottom half of the table, it appears that there are only minor differences between the univariate (mean, mode) and the multivariate (KNN, MICE) replacement strategies. However, there is a major difference between those strategies, the complete case analysis, and the XGB strategy for the features spray angle, powder feeder rate, and particle size. Therefore, the selection of the null value replacement strategy significantly affects the resulting feature importance of the models.

Table 12 Comparison of the average mean absolute Shapley values for each applied null value replacement strategy

Table 13 compares the two different applied categorical feature handling strategies—OHE and OTE—for a complete case analysis and a univariate replacement strategy. In both cases, the complete case analysis and the univariate replacement strategy, it is apparent that the categorical feature strategy also has a major impact on the determined feature importance of the respective models. Major differences can be observed for the features powder material, spray angle, standoff distance, traverse speed, powder feeder rate, cold spray system, nozzle type, and average particle size.

Table 13 Comparison of the average mean absolute Shapley values for each applied feature handling strategy

Altogether, it is apparent that drawing a conclusion from the Shapley values to the effect of a feature on the porosity is not advisable, as the applied data processing steps (null value replacement strategy and categorical feature handling) have a major impact on the structure of the data and the Shapley values.

However, for the features substrate shape and gas temperature a similar ranking was observed across all analyses. The substrate shape is consistently ranked as a feature with a low impact on the porosity, whereas the gas temperature is consistently ranked as a highly important feature. The feature substrate shape is a two-category feature for which the category “flat” (240 instances) was highly overrepresented compared to the category “round” (two instances), which explains the low impact of this feature on the porosity. Therefore, the developed models should in any case only be applied for the porosity prediction of “flat” sprays. The high significance of the gas temperature for the porosity is in line with experimental studies (Ref 17) and can, due to the consistent results across all models, be assumed to be valid.

Evaluation of Different Null Value Replacement Strategies

Following the description of the method in “Null Value Replacement Accuracy” section, an evaluation of the null value replacement strategies was conducted. Table 14 gives an overview of the accuracy of the null value replacements of each applied strategy for each dataset configuration expressed through the RMSE.

Table 14 Comparison of the accuracy of the null value replacement strategies applied on standard scaled datasets using the RMSE

From those results, it can be concluded that the univariate replacement strategy and the KNN replacement strategy led to comparable results in terms of accurately replacing missing values. By a clear margin, the best result was achieved using the MICE replacement strategy on dataset configurations with a lower number of parameters (dataset configurations 3 and 4).

Table 15 gives an overview of the RMSE for the MICE replacement strategy applied on dataset configuration 3 without the application of the standard scaler for different parameters. Without the standard scaler, an interpretation of the deviation between the true values and the imputed values for each parameter is possible.

Table 15 Application of the MICE replacement strategy for a configuration 3 dataset—without the application of a standard scaler

On the categorical parameters and the numerical parameter ‘spray angle,’ the MICE strategy performed well. However, its performance on the numerical parameters ‘gas temperature’ and ‘gas pressure’ is unsatisfactory.

Limitations of the Approach and Future Works

A major downside of the reported approach is the limited control over the available data. For instance, the input parameters particle velocity and particle temperature are two main factors that affect the porosity level of a cold spray deposit (Ref 21, 53). These parameters were not included in the development of the models reported here, because they were not measured and reported together with the porosity in a sufficient number of reports. A possible solution could be the development of a model for the prediction of the particle velocity and/or particle temperature, which could serve as an intermediate model for the existing porosity prediction models. This, however, is out of the scope of this work.

This study explores various dataset configurations with different input parameters. Configurations with fewer parameters, such as configuration 4, have the drawback of not considering variations in parameters excluded from the dataset, such as powder morphology, traverse speed, powder feeder rate, cold spray system, nozzle type, and average particle size (refer to Table 3). This may lead the model to learn incorrect relationships between input and output data.

On the contrary, certain input parameters exert a more pronounced impact on porosity than others. Removing irrelevant or redundant features from the dataset, thereby reducing model complexity, serves as a strategy to counteract overfitting. Hence, in this study, different dataset configurations were tested and evaluated to find a good balance between model complexity and overfitting.

The most effective solution would be to generate a first-hand dataset with comprehensive knowledge of the experimental procedure. Generating such a dataset and employing it for the development of a predictive ML model could be viable if a smaller set of parameters were varied. While this would decrease the effort required for data generation, it would also limit the model’s applicability to diverse spray scenarios.

Another point for discussion involves the porosity levels reported in the papers, which were measured using two different techniques. Ninety percent of the porosity samples were measured using image analysis following metallographic polishing of cross sections of the samples. With this method, factors such as inconsistent sample preparation, inconsistent image quality, the choice of thresholding and filtering method, and subjectivity in selecting regions of interest can contribute to inconsistent results. The remaining 10% of the samples were measured using the Archimedes method, where the results can be affected by inconsistent surface cleaning of the samples, incomplete penetration of the liquid into the pores, and insufficient drying of the samples. Both methods are therefore highly susceptible to operator bias. Altogether, this is expected to reduce the consistency of the porosity measurements in the dataset and therefore negatively affects the prediction accuracy of the models trained on it. This disadvantage can only be truly overcome by developing a model trained on a dataset generated by one person or a small group of people using the exact same method. One drawback of this practice, however, is that generating an equivalent volume of data would be significantly more challenging.

Due to the presence of categorical features, such as process gas type, powder, and substrate material type, different categorical feature handling strategies have been applied in this study. Another way of transforming the categorical values into numerical values would have been to use material properties such as density, yield strength, or melting point. These parameters have been shown to indirectly affect the porosity through their influence on the critical velocity and the actual particle velocity (Ref 54, 55, 56). However, the values of those parameters were rarely measured or reported together with the porosity. Therefore, this approach was rejected.

Finally, it is crucial to ensure that any input data used with the trained models fall within the ranges of the training data to achieve a prediction accuracy similar to that demonstrated in the manuscript. This requirement applies to both directly included parameters in the model and those indirectly influencing it. For example, a model trained with dataset configuration 10 includes particle size as an input parameter and will adjust its output based on changes in the particle size input parameter, among others. To achieve an accuracy comparable to that reported in the manuscript, the model should only be applied to input parameters within the range of the training dataset.
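A simple guard of this kind can be implemented by storing the minimum and maximum of each input parameter in the training data and flagging out-of-range inputs before prediction. The parameter names and ranges below are hypothetical examples, not the actual ranges of the training dataset.

```python
# Hypothetical training ranges (min, max) per input parameter
train_ranges = {
    "gas_pressure_bar": (10.0, 75.0),
    "gas_temperature_C": (200.0, 1100.0),
    "standoff_mm": (10.0, 60.0),
}

def out_of_range(sample):
    """Return the parameters of a candidate input that fall outside the
    training ranges; a prediction is only trusted if this list is empty."""
    return [name for name, (lo, hi) in train_ranges.items()
            if not (lo <= sample[name] <= hi)]

flagged = out_of_range({"gas_pressure_bar": 49.0,
                        "gas_temperature_C": 1250.0,
                        "standoff_mm": 30.0})
```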

For a model trained with a configuration 9 dataset, the particle size is not directly included as a parameter. Therefore, it will not respond to a change of the particle size in the input parameters. Nevertheless, the particle size should still only be varied within the value range of the training set: selecting an average particle size outside this range may result in a prediction accuracy different from that stated in the manuscript.

Conclusion

In this study, a dataset was generated from the information provided in cold spray academic papers regarding process parameters and the associated part properties. Due to the structure of the available data, several data pre-processing methods were applied to prepare the data for the training and testing of machine learning algorithms. The goal was to develop a machine learning model that can accurately predict the porosity of a cold-sprayed part with an average prediction error of less than 2% porosity when applied within the input parameter ranges of the training data. The assessment of multiple model setups led to the following conclusions:

  1. (1)

    The application of null value replacement strategies and categorical feature handling strategies presumably affects the structure of the dataset and the importance of single features for the predictions of the models.

  2. (2)

    In all tested models, the feature gas temperature was ranked as a highly influential feature for the porosity prediction of a cold-sprayed part.

  3. (3)

The tested null value replacement strategies fail to accurately replace all the missing values in the datasets. In the best case, with the MICE strategy, the average deviation of the replaced missing values from the true values was 0.93 MPa for ‘Gas pressure’ and 130.58 °C for ‘Gas temperature.’ These are still high values, given that the ranges of the two parameters in the given dataset are 6.5 MPa and 1080 °C, respectively.

  4. (4)

For 72 out of 180 models evaluated with the testing dataset, an average deviation of the predicted porosity from the true porosity of 3-4% was achieved. This accuracy is sufficient if the model is used to “classify” parts into dense and porous parts. However, it would be insufficient to resolve small porosity differences between nearly fully dense parts.

  5. (5)

Twelve of the models evaluated with a testing dataset achieved the targeted prediction accuracy of less than 2% porosity prediction error. However, 8 of those 12 models were developed using only the complete instances in the dataset. It must be considered that the validation of these models against the testing dataset is less conclusive due to the resulting shrinkage of the testing datasets.

  6. (6)

4 out of the 12 models that achieved an RMSE of less than 2% were trained with a dataset prepared using missing value replacement strategies. The fact that those models were also trained with datasets with a low number of parameters, and therefore a low number of missing values, leads to the conclusion that the applied missing value replacement strategies are only effective in the case of a low percentage of missing values in the dataset.