Introduction

The coating process in Thermal Spraying (TS) is associated with many complex physical phenomena. Due to the large number of parameters involved in this coating technology as well as the nonlinear relationships between these parameters, precise control and optimization of TS processes is a lengthy and expensive undertaking. Not all of the influencing parameters can be controlled, because on the one hand the effect of many variables on the coating process is not quantitatively measurable, and on the other hand the technical possibilities for adequate process monitoring are still lacking. Hence, simulation and modeling approaches such as Computational Fluid Dynamics (CFD) are often employed to capture the complex physical phenomena involved. Although CFD offers high potential for understanding the sub-processes of the TS coating technology, the tradeoff between the accuracy of the model and the computational cost has always been a challenge in CFD problems. The simulation of the particle free-jet in a multi-arc plasma spraying process, which is the focus of this study, requires a high computational effort if the accuracy of the model is not to be sacrificed (Ref 1).

A promising possibility for substituting the computationally expensive CFD simulations in plasma spraying is to create a Digital Twin of the process using Machine Learning (ML) algorithms. A Digital Twin is a virtual, computerized representation of a physical system in real space, including the data and information that tie the virtual and real systems together (Ref 2). This digital replication is achieved mainly by integrating artificial intelligence methods, with the aim of system optimization, monitoring and prognostics (Ref 3), which subsequently leads to greater efficiency, accuracy and economic benefits for the considered system (Ref 4). The majority of prior research works have used experimental data sets to create Digital Twins for the TS process variants with the objective of predicting the particle properties or controlling the process parameters (Ref 5, 6). Only a few studies in the literature have used simulation data sets for training ML models in TS (Ref 7). The motivation for using simulation results is the opportunity to cover a broad range of process parameters, whereas providing that much experimental data is barely possible. This not only enhances the prediction accuracy of the model, but also speeds up the computations dramatically.

The goal of the present study is to take the first steps toward building a fast Digital Twin of the plasma spraying process that predicts the in-flight particle properties from various input process parameters using ML methods. To this end, several sets of process parameters and particle properties are acquired from CFD simulations of the plasma jet. The data preparation is carried out using two different design of experiments (DOE) methods, namely Central Composite Design (CCD) and Latin Hypercube Sampling (LHS). Finally, the prepared data are fed into a Residual Neural Network (ResNet) and a Support Vector Machine (SVM) to predict the particle properties. The results of the different ML models and DOE methods are then compared with each other in terms of the calculated prediction accuracy. Due to the randomness of the particle behavior caused by particle collisions and the turbulence of the plasma flow, a precise prediction of the properties of each single particle cannot be expected with the ML methods at hand. However, the accurate prediction of average particle properties serves as a key performance indicator in plasma spraying and can significantly help, for example, in investigating the interrelationships between process parameters and coating properties. Hence, the objective in this work is to accurately predict the average particle behavior depending on different sets of process parameters.

Numerical Modeling

The simulation data sets of this study are obtained from a previously developed numerical model of the plasma spraying process of a three-cathode plasma generator, created at the Surface Engineering Institute. To resolve the different physical phenomena and reduce the complexity of the model of the entire system, the plasma spraying process is divided into two sub-processes that are modeled separately: the plasma generator model and the plasma jet model. In the plasma generator model, the flow characteristics at the plasma generator outlet, including the temperature and velocity profiles as well as the profiles of turbulent kinetic energy and turbulent eddy dissipation, were determined. The two sub-models are coupled by using these profiles as a boundary condition at the inlet of the plasma jet model. A two-equation Shear Stress Transport (SST) turbulence model was used to simulate the turbulence inside the plasma generator as well as in the plasma jet. A detailed description of the numerical modeling used in this study can be found in (Ref 8, 9). For an accurate description of the plasma-particle interaction in plasma spraying, the influences of the plasma on the particles and vice versa were considered in the plasma jet model in a two-way coupled manner (Ref 10). Furthermore, the plasma generator and plasma jet models were validated by comparing numerical results to experimental data (Ref 11).

Figure 1 shows the simulated particle trajectories and their temperatures inside the plasma jet, exemplarily for one simulation. For each simulation, a virtual clipping plane is defined to export the particle properties at specific stand-off distances. The particle properties include the in-flight particle coordinates on the clipping plane, the velocities and the temperatures. The simulation models were created in ANSYS CFX version 20.2 (ANSYS, Inc., Canonsburg, USA). For each simulation, the calculated number of particle trajectories was set to 2000. Aluminium oxide was used as the feedstock material for the simulations. Further details regarding the procedure of preparing the simulation data are described in the next section.

Fig. 1

Exemplary simulated particle trajectories and their temperatures in plasma jet

Data Preparation

Simulations often involve larger numbers of variables compared to physical experiments. It is necessary to find a set of input parameters, namely the design matrix, so that the best possible predictive model can be constructed on the resulting data sets formed by the design matrix (Ref 12). Furthermore, this allows understanding the cause-and-effect relationships in the system by changing the designed input variables and observing the resulting changes in the system output (Ref 13). Therefore, two different DOE methods, CCD and LHS, were employed in this study to cover a set of representative input process parameters for the simulations. The parameter setup for the CCD and LHS methods is given in Table 1. In total, six different process parameters were considered for the DOE approach: primary gas flow (argon), electric current, carrier gas flow, powder feed rate, particle size distribution at the injection point and stand-off distance. The particle sizes were divided into three different fractions to cover the broad spectrum of possible particle size distributions in plasma spraying. The DOE methods were implemented in the MATLAB environment and were linked with the batch job scheduler of the simulation runs to create an automated data preparation pipeline. Overall, 45 simulations were carried out for the CCD data sets and another 45 simulations for the LHS data sets. In the following, both DOE methods and the structure of the data for the simulations are briefly described.

Table 1 Parameter setup for the DOE methods

Central Composite Design (CCD)

CCD is based on a two-level full or fractional factorial design, augmented by 2k axial (star) points along the factor axes, where k is the number of independent variables, and a set of repeated points at the centroid (N0) (Ref 14). Figure 2 shows a geometric view of a CCD for a two-factor full factorial design. CCD is widely used in constructing second-order response surface models (Ref 15).

Fig. 2

Geometric view of central composite design for k = 2 factors

Random errors are inevitable in physical experiments, and the output may differ even with the same experimental settings. In contrast, computer experiments are deterministic and multiple trials result in identical outputs. Hence, carrying out several runs at the centroid is only meaningful in physical experiments (Ref 12). In this study, the number of computational experiments was set to 45, which corresponds to a CCD with a 6-factor fractional design (\(2^{k-1} + 2k + N_{0}\), i.e., \(32 + 12 + 1 = 45\) with \(N_{0} = 1\)).
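As an illustration, such a design can be generated in MATLAB with the Statistics and Machine Learning Toolbox function ccdesign. The following is a minimal sketch, in which the face-centered variant and the parameter bounds lb and ub are illustrative assumptions rather than the settings used in this study:

```matlab
% Minimal sketch: 6-factor fractional CCD with one center point, yielding
% 2^(k-1) + 2k + N0 = 32 + 12 + 1 = 45 runs in coded units.
k = 6;
D = ccdesign(k, 'fraction', 1, ...   % half-fraction factorial core (2^(k-1) runs)
             'center', 1, ...        % single center point (deterministic CFD)
             'type', 'faced');       % axial points at +/-1 (assumption)

% Map the coded design from [-1, 1] onto physical parameter ranges; the
% bounds below are placeholders standing in for the ranges of Table 1.
lb = [40 450 4 12 15 100];           % hypothetical lower bounds
ub = [60 550 8 30 75 170];           % hypothetical upper bounds
X  = lb + (D + 1)/2 .* (ub - lb);    % 45-by-6 design matrix in physical units
```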

Latin Hypercube Sampling (LHS)

LHS is one of the most popular space-filling designs and aims at reducing the variance of the sample mean (Ref 16). It is a stratified sampling technique that divides each dimension of the multidimensional experimental domain into N strata of equal marginal probability, where N is the number of sample points, such that each stratum contains exactly one sample point along each dimension, and then samples once from each stratum (Ref 12).

The maximin distance criterion can be imposed as an optimality criterion for the construction of an LHS to further decrease the variance of the sample mean. A maximin LHS maximizes the minimum distance between each pair of experimental points within the experimental domain, see Fig. 3. This optimality criterion ensures that the experimental points are spread out uniformly over the domain and therefore no region lies too far away from a design point (Ref 17). This results in an enhancement of the prediction accuracy of the constructed model. LHS is a very suitable and powerful DOE technique for computer experimentation, as it can accommodate arbitrary numbers of runs and input variables. In this study, the same number of runs as in the CCD method was used for the LHS method to ensure the comparability of the results.
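A maximin LHS of the same size can be drawn, for instance, with the toolbox function lhsdesign; the bounds are again illustrative placeholders:

```matlab
% Minimal sketch: maximin Latin hypercube with 45 runs and 6 factors.
% 'maximin' iteratively maximizes the minimum pairwise point distance.
N = 45;  k = 6;
U = lhsdesign(N, k, 'Criterion', 'maximin', 'Iterations', 100);  % in [0,1]^k

lb = [40 450 4 12 15 100];           % hypothetical lower bounds (as above)
ub = [60 550 8 30 75 170];           % hypothetical upper bounds
X  = lb + U .* (ub - lb);            % 45-by-6 design matrix in physical units
```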

Fig. 3

Transformation of a 2D LHS (left) to a maximin LHS (right)

Structure of Simulation Data

As mentioned earlier, for each of the DOE methods introduced in the above "Central Composite Design (CCD)" and "Latin Hypercube Sampling (LHS)" sections, 45 simulations were performed with different input process parameters, see Table 1. For instance, the simulation data sets gathered from the LHS method, listing the parameters primary gas flow, electric current, carrier gas flow, powder feed rate, particle size distribution and stand-off distance in this order, are:

  1. 40.36 SLPM, 461.6 A, 6.39 SLPM, 28.8 g/min, −35 +15 µm, 126 mm

  2. 40.36 SLPM, 532.9 A, 5.72 SLPM, 15.6 g/min, −35 +15 µm, 153 mm

  3. 41.37 SLPM, 473.8 A, 4.04 SLPM, 12.0 g/min, −35 +15 µm, 169 mm

     ⋮

  45. 59.87 SLPM, 470.3 A, 4.04 SLPM, 18.0 g/min, −75 +55 µm, 144 mm

The CCD simulation data are also structured into 45 simulations. As is evident from the above structure, within each of the 45 CCD or LHS simulations only the particle size can vary within the specified range, while the five other process parameters are kept constant. The outputs of each simulation are the in-flight particle properties of the 2000 simulated particle trajectories. However, depending on the process parameters of a simulation, not all of the 2000 simulated particle trajectories reach the specified stand-off distance. Hence, the exact number of output data per simulation is not the same for the 45 CCD or LHS simulations and varies between 1500 and 2000 particle trajectories. The inputs and outputs of each simulation are provided with indices so that the particles of each simulation can be assigned for the ML models.

Machine Learning Algorithms

The DOE methods provide the representative simulation data sets for training the ML models, namely SVM and ResNet. The inputs of the prediction models are the process parameters listed in Table 1. The outputs are the particle properties, including the in-flight particle temperatures \(T_{p}\) [K] and velocities \(v_{p}\) [m/s] as well as the in-flight particle x-coordinates \(x_{p}\) [m] and z-coordinates \(z_{p}\) [m] at specific stand-off distances on the virtual substrate (clipping plane).

Due to the collisions of the particles and the turbulence of the plasma flow, even particles of nearly the same size can have different coordinates in the plasma jet and thus vary greatly in temperature and velocity. Hence, it can hardly be expected that the ML models predict single particle properties with high accuracy, but the average particle properties should be reproducible with a sufficiently small error.

The results from the LHS and CCD methods were each partitioned into one training data set and one test data set. From each of the respective 45 simulations, 75% of the data are used as training data and the remaining 25% as test data. As described in the previous section, the number of particles per simulation may differ and thus the overall number of particles in the training and test data sets for the CCD and LHS methods is different. The training data for the CCD contain 64,858 particles and the test data 21,612 particles, while these numbers amount to 64,728 and 21,566 for the LHS, respectively. The training and test data used for the two ML models were kept identical. Even though both the SVM and the ResNet are trained and tested with the entire training and test data from the 45 simulations, the allocation of the particles to each simulation is still known through the indices mentioned in "Structure of Simulation Data" section, which serve as data labels. This is utilized in the evaluation of the results in "Results and Discussion" section. In the following, the SVM and ResNet algorithms used in this study are described.
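The per-simulation 75/25 partition could be realized as sketched below; the variable names (X, Y, simID) are hypothetical placeholders for the assembled particle data:

```matlab
% Sketch: 75/25 split within each simulation, so that every simulation
% contributes to both training and test data. X: per-particle process
% parameters (6 columns), Y: particle outputs (4 columns), simID: labels.
rng(0);                                    % fixed seed for reproducibility (assumption)
trainMask = false(numel(simID), 1);
for i = 1:45
    idx = find(simID == i);                % particles of simulation i
    idx = idx(randperm(numel(idx)));       % shuffle within the simulation
    nTr = round(0.75 * numel(idx));        % 75% for training
    trainMask(idx(1:nTr)) = true;
end
Xtr = X(trainMask, :);   Ytr = Y(trainMask, :);     % training data
Xte = X(~trainMask, :);  Yte = Y(~trainMask, :);    % test data
```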

Support Vector Machine (SVM)

The SVM theory introduced by Vapnik (Ref 18) has attracted considerable attention in statistical learning theory and has been increasingly applied by researchers in various fields, and TS is no exception (Ref 19, 20). SVM is a supervised-learning algorithm that uses structural risk minimization and a symmetrical loss function, which penalizes high and low errors equally. An important property of SVM regression is that its computational complexity does not depend on the dimension of the input space. Furthermore, it has great generalization capability with high prediction accuracy (Ref 21).

The goal of linear SVM regression is to find an approximated hyperplane for the true model \(f\) in the form of:

$$g\left( x \right) = \left\langle {w,\phi \left( x \right)} \right\rangle + b$$
(1)

where \(w\) is the normal vector of \(g\), \(\phi\) is a mapping function, which can initially be considered the identity function, and \(b\) is a bias parameter. The predicted values from \(g\) should have a bounded deviation of no more than \(\varepsilon\) from the true values \(f\left( x \right)\), i.e.,

$$\left| {g\left( x \right) - f\left( x \right)} \right| \le \varepsilon$$
(2)

The distance between the hyperplane \(g\) and the point farthest away from it is called the margin, and it is proportional to \(\frac{1}{||w||}\). The points lying on the boundary of a maximal margin are called support vectors, see Fig. 4. In addition, \(g\) should be maximally flat, i.e., \(||w||\) should be as small as possible and the margin as large as possible (Ref 22).

Fig. 4

Illustration of the support vectors, margins and slack variables in SVM regression

In practical cases, this kind of hyperplane is not guaranteed to exist. In order to cope with otherwise infeasible constraints, the slack variables \(\xi\) and \(\xi^{*}\) are introduced to construct a soft margin hyperplane. Consequently, the constrained optimization problem could be formulated as (Ref 23):

$${\text{Minimize}}:\,\,\,\,\,\,\frac{1}{2}||w||^{2} + C\mathop \sum \limits_{i} \left( {\xi_{i} + \xi_{i}^{*} } \right)$$
(3)
$${\text{Subject to}}:\,\,\,\,\,g\left( x \right) - f\left( x \right) \le \varepsilon + \xi_{i}^{*}$$
(3.1)
$$f\left( x \right) - g\left( x \right) \le \varepsilon + \xi_{i}$$
(3.2)
$$\xi_{i} ,\xi_{i}^{*} \ge 0 \quad \forall \, i = 1, \ldots ,\left| \Omega \right|$$
(3.3)

where \(\Omega\) denotes the input variable space and \(C > 0\) is a constant that determines the penalties for training errors. A closed form representation of the regression hyperplane \(g\) could be derived from the dual form of the optimization problem above:

$$g\left( x \right) = \mathop \sum \limits_{i} \left( {\alpha_{i}^{*} - \alpha_{i} } \right) \left\langle x_{i} ,x \right\rangle + b$$
(4)

where \(\alpha_{i}\), \(\alpha_{i}^{*}\) are Lagrange multipliers (Ref 23).

The linear form of SVM regression introduced above can be transformed into a nonlinear feature space via a nonlinear mapping \(\phi : \Omega \to \tilde{\Omega }\). The dot product in \(\tilde{\Omega }\) can be expressed by the kernel function \(k\left( {x_{i} ,x_{j} } \right) = \left\langle {\phi (x_{i} ),\phi (x_{j} )} \right\rangle\). With the implicit mapping of the kernel function \(k\), it is possible to directly compute the hyperplane \(g\) in the nonlinear feature space. With this so-called kernel trick, the final form of the approximated hyperplane can be expressed as:

$$g\left( x \right) = \mathop \sum \limits_{i} \left( {\alpha_{i}^{*} - \alpha_{i} } \right)k\left( {x_{i} ,x} \right) + b$$
(5)

where the corresponding constrained optimization problem is now formulated in the transformed feature space \(\tilde{\Omega }\) instead of in the original input variable space \(\Omega\), thanks to the implicit mapping \(\phi\) and the kernel function \(k\) (Ref 24). One advantage of SVM is that although the training involves nonlinear optimization, the corresponding objective function is convex, and therefore, any local solution represents also a global optimum (Ref 25).

In this study, the SVM regression algorithm was implemented using the Statistics and Machine Learning Toolbox of MATLAB. To make the inputs and targets insensitive to the scales and magnitudes on which they are measured, a preprocessing step was carried out to standardize the training data sets. The standardization was based on the so-called z-score method, in which the standardized data have a mean value of zero and a standard deviation of one. Hence, the shape of the original data set is retained.

For each of the two DOE methods (CCD and LHS), four single-output SVM models were developed, one for each of the four outputs. For training the regression models, Gaussian kernels according to Eq 6 were employed, where \(\gamma\) represents the kernel scale.

$$k\left( {x_{i} ,x_{j} } \right) = {\text{exp}}\left( { - \frac{\left\|x_{i} - x_{j}\right\|^{2} }{{2\gamma^{2} }}} \right)$$
(6)

The training of the SVM models was conducted with the different kernel scales given in Table 2 in order to select the configuration with the best prediction accuracy. The term \(P\) in Table 2 denotes the number of predictors, which is equal to \(P = 6\) in this study. Furthermore, a 10-fold cross-validation was used to analyze the level of generalization and prevent possible overfitting.
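Assuming MATLAB's fitrsvm, one such single-output model could be trained as in the following sketch; the kernel-scale grid merely stands in for Table 2. Note that MATLAB's built-in Gaussian kernel is \(\exp (-\|x_{i} - x_{j}\|^{2} /s^{2})\), so its kernel scale \(s\) corresponds to \(\sqrt{2}\,\gamma\) in Eq 6.

```matlab
% Sketch: Gaussian-kernel SVM regression for one output (e.g., particle
% temperature), with kernel-scale selection via 10-fold cross-validation.
% Xtr, ytr, Xte are placeholders from the data split above.
P      = 6;                                        % number of predictors
scales = sqrt(P) * [0.25 0.5 1 2 4];               % hypothetical grid around sqrt(P)
best   = inf;
for s = scales
    cvm = fitrsvm(Xtr, ytr, 'KernelFunction', 'gaussian', ...
                  'KernelScale', s, 'Standardize', true, ...  % z-score preprocessing
                  'KFold', 10);                    % 10-fold cross-validation
    mse = kfoldLoss(cvm);                          % mean squared error over folds
    if mse < best, best = mse; sBest = s; end
end
mdl  = fitrsvm(Xtr, ytr, 'KernelFunction', 'gaussian', ...
               'KernelScale', sBest, 'Standardize', true);
yHat = predict(mdl, Xte);                          % predictions on the test data
```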

Table 2 Kernel scales of different Gaussian kernels applied for training the SVM models

Residual Neural Network (ResNet)

The classical Artificial Neural Network (ANN) is a multilayer perceptron represented by a mathematical function which maps input values to output values. For an ANN with \(L\) layers and a vector \(x^{\left( 0 \right)}\) containing input values, the output vector \(x^{\left( L \right)}\) representing the prediction of the ANN is determined by:

$$x^{\left( L \right)} = \sigma_{2} \left( {W^{{\left( L \right)^{T} }} \sigma_{1} \left( {W^{{\left( {L - 1} \right)^{T} }} \sigma_{1} \left( { \cdots \sigma_{1} \left( {W^{{\left( 1 \right)^{T} }} x^{\left( 0 \right)} + b^{\left( 1 \right)} } \right) + \cdots } \right) + b^{{\left( {L - 1} \right)}} } \right) + b^{\left( L \right)} } \right)$$
(7)

where \(W^{\left( l \right)}\) and \(b^{\left( l \right)}\), \(l = 1, \ldots , L\), are weight matrices and bias vectors, respectively, \(\sigma_{1}\) is a nonlinear activation function, e.g., hyperbolic tangent or ReLU, and \(\sigma_{2}\) is an activation function which may differ from \(\sigma_{1}\) and which may be linear. For a given target vector \(y\), the goal is to minimize the deviation of the output vector \(x^{\left( L \right)}\) from \(y\). This deviation is often measured by a loss function, where for regression problems the square error

$$e = \left\| {y - x^{(L)} } \right\|_{2}^{2}$$
(8)

is commonly used. Note that Eq 8 states the error for a single training example, i.e., for one target vector \(y\). For the computation of the mean square error of a training set with \(N\) entries, all errors are summed up and divided by \(N\). To minimize the error, suitable weight matrices \(W^{\left( l \right)}\) and bias vectors \(b^{\left( l \right)}\), \(l = 1, \ldots , L\), have to be determined. This is done by applying an iterative training process using backpropagation as described below.

In practice, the prediction \(x^{\left( L \right)}\) of an ANN in Eq 7 is computed by forward propagation, which successively predicts the output vector \(x^{\left( l \right)}\) of each layer \(l = 1, \ldots ,L\) of the network by:

$$ x^{\left( l \right)} = \left\{ {\begin{array}{*{20}l} {\sigma_{1} \left( {W^{{\left( l \right)^{T} }} x^{{\left( {l - 1} \right)}} + b^{\left( l \right)} } \right),\, l = 1, \ldots ,L - 1,} \\ {\sigma_{2} \left( {W^{{\left( l \right)^{T} }} x^{{\left( {l - 1} \right)}} + b^{\left( l \right)} } \right),\, l = L.} \\ \end{array} } \right. $$
(9)

ResNets are a particular class of ANNs designed to improve the training of deep networks (Ref 26). The ResNet used in this work is fully connected, with identity skip connections that each span a single layer. The only difference in the computation of the ResNet output compared with a standard ANN is thus the addition of the output \(x^{{\left( {l - 1} \right)}}\) of the previous layer to the right-hand side of the forward propagation formula in Eq 9 for \(l = 1, \ldots ,L - 1\). Here, a ResNet is used where the number of neurons per hidden layer is set equal to the number of features (six in this setting, see Table 1). This is denoted as simplified ResNet (SimResNet). Its properties have been discussed, for instance, in (Ref 27, 28). For the SimResNet, the prediction or forward propagation formula reads:

$$ x^{\left( l \right)} = \left\{ {\begin{array}{*{20}c} {x^{{\left( {l - 1} \right)}} + \sigma_{1} \left( {W^{{\left( l \right)^{T} }} x^{{\left( {l - 1} \right)}} + b^{\left( l \right)} } \right),\, l = 1, \ldots ,L - 1,} \\ {\sigma_{2} \left( {W^{{\left( l \right)^{T} }} x^{{\left( {l - 1} \right)}} + b^{\left( l \right)} } \right), \, l = L.} \\ \end{array} } \right. $$
(10)

The forward propagation of Eq 10 is the first step in one iteration of the training algorithm. Subsequently, the weights \(w_{ij}^{\left( l \right)}\) and the biases \(b_{i}^{\left( l \right)}\) are updated for the next iteration by backpropagation, i.e., by adding

$$\Delta w_{ij}^{\left( l \right)} = - \eta x_{i}^{{\left( {l - 1} \right)}} \delta_{j}^{\left( l \right)} \,\,\,{\text{and}}\,\,\,\,\Delta b_{i}^{\left( l \right)} = - \eta \delta_{i}^{\left( l \right)}$$
(11)

respectively, where \(\eta\) is the learning rate,

$$\delta_{j}^{\left( L \right)} = 2\left( {x_{j}^{\left( L \right)} - y_{j} } \right) \cdot \sigma_{2}^{^{\prime}} \left( {\mathop \sum \limits_{i = 1}^{{n_{N} }} \left( {w_{ij}^{\left( L \right)} x_{i}^{{\left( {L - 1} \right)}} } \right) + b_{j}^{\left( L \right)} } \right)$$
(12)

and

$$\delta_{j}^{\left( l \right)} = \left( {\mathop \sum \limits_{k = 1}^{{n_{O} }} \delta_{k}^{\left( L \right)} w_{jk}^{\left( L \right)} + \mathop \sum \limits_{m = l + 1}^{L - 1} \mathop \sum \limits_{k = 1}^{{n_{N} }} \delta_{k}^{\left( m \right)} w_{jk}^{\left( m \right)} } \right) \cdot \sigma_{1}^{^{\prime}} \left( {\mathop \sum \limits_{i = 1}^{{n_{N} }} \left( {w_{ij}^{\left( l \right)} x_{i}^{{\left( {l - 1} \right)}} } \right) + b_{j}^{\left( l \right)} } \right)$$
(13)

for \(l = 1, \ldots , L - 1\). Here, \(n_{O}\) and \(n_{N}\) denote the number of outputs (predictions) and the number of neurons per hidden layer, respectively. The formulas of Eq 11-13 are derived using the optimality condition of the minimization problem of the error given in Eq 8. With the updated weight matrices and bias vectors, the next training iteration starts with the forward propagation of Eq 10.

The iterative process of forward- and backpropagation described above is applied to a set of training data. For each input value of this set, Eq 10-13 are computed iteratively to update the weights and biases until the mean error over the whole training set is sufficiently small.
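For illustration, the following minimal sketch performs one training iteration according to Eq 10-13 for a single training example, assuming the hyperbolic tangent for \(\sigma_{1}\), the identity for \(\sigma_{2}\) and pre-initialized cell arrays W and b of weight matrices and bias vectors; in this study, the full batch is propagated at once instead:

```matlab
% Sketch: one SimResNet training iteration for a single example x0 (6-by-1
% standardized inputs) with target y (4-by-1). W{1..L-1}: 6-by-6, W{L}: 6-by-4.
L = 11;  eta = 0.01;
x = cell(L+1, 1);  z = cell(L, 1);  x{1} = x0;      % x{l+1} stores x^(l)
for l = 1:L-1                                       % forward propagation, Eq 10
    z{l}   = W{l}' * x{l} + b{l};
    x{l+1} = x{l} + tanh(z{l});                     % identity shortcut
end
z{L} = W{L}' * x{L} + b{L};  x{L+1} = z{L};         % linear output layer (sigma2 = id)

delta    = cell(L, 1);
delta{L} = 2 * (x{L+1} - y);                        % Eq 12 with sigma2' = 1
g = W{L} * delta{L};                                % gradient w.r.t. x^(L-1)
for l = L-1:-1:1                                    % Eq 13, accumulated backwards
    delta{l} = g .* (1 - tanh(z{l}).^2);            % tanh'(z) = 1 - tanh(z)^2
    g = g + W{l} * delta{l};                        % identity path accumulates gradients
end
for l = 1:L                                         % Eq 11: gradient-descent update
    W{l} = W{l} - eta * x{l} * delta{l}';           % Delta w_ij = -eta x_i^(l-1) delta_j^(l)
    b{l} = b{l} - eta * delta{l};
end
```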

For this study, a MATLAB code developed at the Institute for Geometry and Applied Mathematics is used for training and testing ResNets for the CCD and LHS data sets. The hyperparameters, which have to be fixed prior to the training, are \(\eta = 0.01\) (learning rate), \(L = 11\) (ten hidden layers), hyperbolic tangent as activation function \(\sigma_{1}\) and the identity function as activation function \(\sigma_{2}\). The weights and biases of the ResNet are initialized by Glorot (also called Xavier) initialization (Ref 29). Analogously to the SVM model, the input and target data are standardized for each physical quantity individually by the z-score method. In each iteration of the subsequent training, all input data are forward propagated at once (full batching). The final ResNet outputs for the test data are scaled back to their particular physical range. Two multi-output ResNets are trained: one for the CCD data and one for the LHS data. The structure of the applied ResNets is visualized in Fig. 5, which in addition illustrates the forward propagation procedure of the ResNet (Eq 10) compared to a standard ANN (Eq 9).

Fig. 5

Structure of the applied ResNet and its forward propagation procedure to compute the output vector \(x^{{\left( {11} \right)}}\) with comparison to a standard ANN

Results and Discussion

In this section, the results of the ML models are presented and discussed. For each data set produced by the different experimental designs, separate prediction models are trained. The trained models are then evaluated on the corresponding predefined test data sets with respect to the target values on the virtual substrate, i.e., the particle temperatures, velocities and positions (x- and z-coordinates). Hence, in the following only the results for the test data, and not for the training data, are presented and discussed for the different ML models and DOE methods.

Due to the data labeling described in "Machine Learning Algorithms" section, the assignment of the particles to their particular simulation is known. Hence, for a qualitative comparison of ML and simulation results, the average particle behavior per simulation can be investigated. Exemplarily, the mean particle temperatures \(\overline{T}_{p,i}\) per simulation \(i \in \left[ {1,45} \right]\) are computed by

$$\overline{T}_{p,i} = \frac{1}{{n_{i} }}\mathop \sum \limits_{j = 1}^{{n_{i} }} T_{p,i,j}$$
(14)

where \(n_{i}\) denotes the number of test particles of simulation \(i\) and \(T_{p,i,j}\) the particle temperature of particle \(j\) of simulation \(i\). The mean value over all 45 simulations is then computed by

$$\overline{T}_{p} = \frac{1}{45}\mathop \sum \limits_{i = 1}^{45} \overline{T}_{p,i}$$
(15)

and denoted by “grandmean” in the following. The means and grandmeans of the particle velocities and positions are computed analogously.
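In MATLAB, these grouped means can be computed compactly, e.g., with accumarray; here, simID and Tp_pred are hypothetical placeholders for the simulation labels and the predicted test-set temperatures:

```matlab
% Sketch: per-simulation means (Eq 14) and the grandmean (Eq 15).
Tp_mean  = accumarray(simID, Tp_pred, [45 1], @mean);  % mean T_p per simulation i
Tp_grand = mean(Tp_mean);                              % grandmean over all 45 simulations
```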

For a quantitative evaluation of the ML results, two statistical measures are considered. To evaluate the prediction accuracy of the single particle properties, the mean absolute percentage error (MAPE) is calculated. Given \(N\) data points, the MAPE is defined by

$$MAPE = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left| {\frac{{t_{i} - p_{i} }}{{t_{i} }}} \right|$$
(16)

with true values \(t_{i}\) and predictions \(p_{i}\). Furthermore, the R-squared value, defined for \(N\) data points with true values \(t_{i}\), mean \(\overline{t}\) and predictions \(p_{i}\) by

$$R_{sq} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{N} \left( {t_{i} - p_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{N} \left( {t_{i} - \overline{t}} \right)^{2} }}$$
(17)

is calculated to evaluate the prediction accuracy of the average particle properties.
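Both measures translate directly into code; for column vectors t (true values) and p (predictions) of equal length, a minimal sketch reads:

```matlab
% Sketch of the two evaluation measures for true values t and predictions p.
mape = mean(abs((t - p) ./ t));                       % Eq 16 (fractional MAPE)
rsq  = 1 - sum((t - p).^2) / sum((t - mean(t)).^2);   % Eq 17 (R-squared)
```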

SVM Results

Figure 6 shows the results of the mean particle temperatures \(\overline{T}_{p,i}\) per simulation \(i \in \left[ {1,45} \right]\), see Eq 14, from the (a) CCD and (b) LHS data sets. The mean values predicted by the SVM model, shown in red, are denoted with “Mean SVM”, while the corresponding true values from the simulation model, displayed in blue, are labeled with “Mean Sim.”. The grandmeans according to Eq 15 are also plotted in Fig. 6.

Fig. 6

Results of the mean particle temperatures per simulation for SVM model from (a) CCD and (b) LHS data sets

In the same way, the results of the mean predicted particle velocities from the (a) CCD and (b) LHS data sets are depicted in Fig. 7. Figures 6 and 7 demonstrate that the developed metamodels predict the mean in-flight particle temperatures and velocities from the input process parameters with high accuracy. Furthermore, it is observed that the developed SVM models perform slightly better at predicting particle properties in the higher temperature and velocity ranges than in the lower ones. In other words, in cases where the particles penetrated deeply into the plasma jet, thus resulting in higher temperatures and velocities, the models could find better relationships between the input process parameters and the particle properties. This has been observed for both the CCD and LHS data sets in case of the SVM metamodels.

Fig. 7

Results of the mean particle velocities per simulation for SVM model from (a) CCD and (b) LHS data sets

The predicted and true values of the single particle velocities, exemplarily from the LHS data sets, are shown in Fig. 8. For a clear presentation, only 250 data points from the total of 45 simulations are randomly selected. It is evident that the metamodel can replicate the trend of the particle velocities in the plasma jet. The prediction of the mean particle velocities and temperatures is more accurate than the prediction of the single particle properties. As mentioned earlier, this can be explained by the stochastic nature of the plasma spraying process and the turbulence of the plasma flow, which make it difficult to predict the behavior of each single particle, as it depends on many mutually interacting factors.

Fig. 8

Exemplary trend of the predicted particle velocities of SVM model from LHS data sets

The statistical values MAPE (Eq 16) and R-squared (Eq 17) for prediction of single and average particle properties by SVM model from different DOE methods are given in Table 3. While the performance of the SVM model in terms of prediction accuracy of average particle properties is the same for CCD and LHS data sets, the results of the single particle properties shown in Table 3 indicate a slight improvement in prediction accuracy in case of the LHS experimental design in comparison with the CCD method. This confirms the suitability of the LHS for computational experiments.

Table 3 Statistical values for prediction of single and average particle properties by SVM model from different DOE methods

Figure 9 shows the distribution of the predicted particle coordinates of the SVM model from the LHS data sets, exemplarily for one simulation. It is clear that the predictions of the single particle coordinates are much less accurate than those of the particle velocities and temperatures. As previously mentioned, this is due to the fact that the behavior of single particles in a plasma spraying process is to some extent random, while the essence of ML is to learn and predict regular patterns. In contrast, the SVM model predicts the mean particle coordinates per simulation more accurately, with R-squared values of 0.86 and 0.88 for the x- and z-coordinates, respectively. The accurate prediction of the mean particle coordinates can be used as a tool to find the position of the maximum particle intensity in the free-jet and consequently, e.g., to adjust the injection settings or to position a particle diagnostic device (Ref 30).

Fig. 9

Exemplary distribution of the particle coordinates of SVM model from LHS data sets for one simulation

The average prediction time of the SVM metamodels for the predefined test data sets was calculated to be about 4.2 s, which is dramatically faster than one CFD simulation of the plasma jet with an average calculation time of 3 hours.

Neural Network Results

The results of the ResNet model for the mean particle temperatures from the (a) CCD and (b) LHS data sets are illustrated in Fig. 10. Likewise, the predicted mean particle velocities per simulation from the (a) CCD and (b) LHS data sets are depicted in Fig. 11. It is evident that the ResNet model replicates the mean particle temperatures and velocities with high accuracy. Furthermore, it is observed that the ResNet model, in contrast to the SVM model, can predict the lower range of the particle properties as well as the upper range. Hence, the model grandmeans show a better agreement with the grandmeans of the simulation than in the SVM case.

Fig. 10

Results of the mean particle temperatures per simulation for ResNet model from (a) CCD and (b) LHS data sets

Fig. 11

Results of the mean particle velocities per simulation for ResNet model from (a) CCD and (b) LHS data sets

The prediction accuracy of the ResNet models in terms of MAPE and R-squared for single and average particle properties from both experimental designs is given in Table 4. In agreement with the SVM results, the ResNet model also shows higher accuracy for the LHS experimental design. Furthermore, the comparison of the model accuracies given in Tables 3 and 4 demonstrates that, overall, the ResNet model improves on the SVM model in predicting both single and average particle properties.

Table 4 Statistical values for prediction of single and average particle properties by ResNet model from different DOE methods

Figure 12 illustrates the results of the mean particle x-coordinates per simulation from the (a) CCD and (b) LHS data sets. The ResNet model shows high accuracy in the prediction of the mean particle coordinates, with R-squared values of 0.99 for both the predicted x- and z-coordinates.

Fig. 12

Results of the mean particle x-coordinates per simulation for ResNet model from (a) CCD and (b) LHS data sets

Figure 13 depicts the training error of the ResNet model for both the CCD and LHS data sets over 1,000 iterations, computed for each iteration by the mean square error

$$e_{tr} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left[ {\left( {t_{i} - p_{i} } \right)_{{T_{p} }}^{2} + \left( {t_{i} - p_{i} } \right)_{{v_{p} }}^{2} + \left( {t_{i} - p_{i} } \right)_{{x_{p} }}^{2} + \left( {t_{i} - p_{i} } \right)_{{z_{p} }}^{2} } \right],$$
(18)

cf. Eq 8, where \(t_{i}\) and \(p_{i}\) denote standardized true and predicted values, respectively, \(N\) is the number of particles in the training data set and the indices \(T_{p} , v_{p} , x_{p} , z_{p}\) denote the quantity for which the particular squared difference is computed. It is evident that the training error for the LHS data sets is slightly lower than for the CCD data sets, again demonstrating the suitability of LHS for computational experiments.

Fig. 13

Iterative error of the ResNet model during the training process

The computation time for the ResNet prediction, i.e., the forward propagation, for the predefined test data sets is about 0.01 s, which again is a significant decrease compared with the average simulation time of 3 hours.

Conclusions and Outlook

The aim of this study was to take the first steps towards creating a fast Digital Twin of the plasma spraying process to predict the in-flight particle properties based on input process parameters. The data sets for training the ML models were acquired from a CFD model of the plasma jet. Central Composite Design (CCD) and Latin Hypercube Sampling (LHS) were employed to cover a set of representative process parameters while reducing the number of runs and selecting the most valuable sample data. The developed metamodels, namely a Residual Neural Network (ResNet) and a Support Vector Machine (SVM), are able to replicate the average particle properties with high accuracy, while reducing the computational cost dramatically. The average computational time of one plasma jet simulation is about three hours, while the average prediction time of the metamodels for the predefined test data sets is between 0.01 and 4.2 seconds. The following conclusions can be drawn from the presented results:

  • Demonstrating the suitability of the SVM and ResNet metamodels in combination with the CCD and LHS methods for the prediction of particle properties in plasma spraying

  • Substituting ML metamodels for computationally expensive CFD simulations, with a dramatic decrease in calculation time

  • Accurate prediction of the mean particle temperatures, velocities and coordinates by SVM and ResNet based on various input process parameters

  • Minor increase in the prediction accuracy of single particle properties when using the LHS method for data preparation compared to CCD

  • Enhancement in accuracy regarding the prediction of single and average particle properties by ResNet compared to SVM

The results showed that the average particle properties could be predicted by the metamodels much more accurately than the behavior of single particles. This is expected, since plasma spraying is a stochastic process that involves many influencing factors. Thus, the behavior of single particles is much more random in comparison to the average particle behavior. The results of the metamodels from the LHS data sets showed a minor enhancement in terms of the prediction accuracy, which confirms the suitability of space-filling designs for computer experiments.

For a more accurate prediction of the behavior of single particles, the concept of physics-informed neural networks (PINNs) (Ref 31) could be applied. This incorporates the outputs of the ResNet into the system of partial differential equations (PDEs) underlying the simulations. In the spirit of discovering "hidden fluid mechanics" (Ref 32), it could be possible to significantly improve the prediction of single particles even if only a selection of the corresponding PDEs is incorporated. This would finally lead to a compromise in computational cost between the fast ML predictions of this work and the time-consuming simulations.

Future studies could additionally validate the results of the metamodels by carrying out experimental in-flight particle diagnostics. Moreover, the developed models in the context of this study can provide a good starting point for creating the complementary concept of Digital Shadow for plasma spraying by combining further reduced models and experimental data analytics of the process chain.