1 Introduction

In friction stir welding (FSW), the mechanical properties [1] as well as the surface topography [2] are strongly affected by process parameters such as the welding speed vs and the tool rotational speed n (in r/min). These parameters are typically determined by trial and error, from handbook values, and from manufacturers’ recommendations [3]. Such a selection may yield neither optimal nor near-optimal welding performance. Furthermore, it may cause additional energy and material consumption and may also result in low-quality welds [3]. For this reason, several algorithms have already been developed to optimize the process parameters in friction stir welding. Some of these are presented in the following section.

1.1 State of the art—Use of optimization algorithms in the field of FSW

Various statistical and mathematical methods have been used to investigate the influence of process parameters on mechanical properties, in particular the ultimate tensile strength, and subsequently optimize the mechanical properties [4]. In many of these investigations, either the robust parameter design (RPD) method [5] or the response surface methodology (RSM) [6] was applied:

The RPD method focuses on choosing levels of parameters in a process to ensure that the mean of the output response is at a desired target and to ensure that the variability around the target value is as small as possible [5]. Taguchi [7] proposed an approach to solve the RPD problem based on designed experiments and novel methods for analyzing the resulting data [5]. He also simplified the use of orthogonal arrays [8]. An approach that has already been applied to FSW several times is the L9 orthogonal array. This design aims at understanding the influence of four independent factors with three levels each. With the L9 method, only nine experiments have to be performed in order to study four variables at three levels. This design thus reduces the \( 3^4 = 81 \) possible configurations to nine experimental evaluations [8].
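The reduction from 81 configurations to nine runs can be checked directly. The following Python sketch writes out the standard L9 array with levels coded 1 to 3 and verifies the property that makes it orthogonal: every pair of columns contains each of the nine level combinations exactly once. The variable and function names are illustrative and do not come from the cited works.

```python
from itertools import combinations, product

# Standard Taguchi L9 array: nine runs, four factors, three levels (coded 1..3).
L9 = [
    (1, 1, 1, 1),
    (1, 2, 2, 2),
    (1, 3, 3, 3),
    (2, 1, 2, 3),
    (2, 2, 3, 1),
    (2, 3, 1, 2),
    (3, 1, 3, 2),
    (3, 2, 1, 3),
    (3, 3, 2, 1),
]

def is_orthogonal(array, levels=(1, 2, 3)):
    """Return True if every pair of columns shows each level pair exactly once."""
    all_pairs = sorted(product(levels, repeat=2))
    n_cols = len(array[0])
    for a, b in combinations(range(n_cols), 2):
        if sorted((row[a], row[b]) for row in array) != all_pairs:
            return False
    return True

full_factorial_runs = 3 ** 4   # all combinations of four factors at three levels
l9_runs = len(L9)              # runs actually required by the L9 design
```

When only three factors are studied, as in several of the works cited below, three of the four columns are used and the remaining column stays unassigned.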

Lakshminarayanan et al. [9] determined the optimum settings for the rotational speed n, the welding speed vs, and the axial force Fz in FSW by adapting the Taguchi L9 orthogonal array method and maximizing the signal-to-noise (S/N) ratio. In the Taguchi method, the S/N ratio is used to determine the deviation of the quality characteristics from the desired value [9]. In order to investigate nonlinearities, each of the three process parameters was varied at three levels. Welding experiments were conducted for only nine out of the 27 possible parameter combinations. For each of the nine applied parameter combinations, three tensile tests were performed, and the mean of the ultimate tensile strength was calculated. Based on the mean values for the S/N ratio and the ultimate tensile strength, an ideal parameter set was determined. The expected ultimate tensile strength UTSexp, when using this ideal parameter set, was calculated with the following formula [9]:

$$ {\mathrm{UTS}}_{\mathrm{exp}}={\overline{\mathrm{UTS}}}_{n,L}+{\overline{\mathrm{UTS}}}_{v_{\mathrm{s}},L}+{\overline{\mathrm{UTS}}}_{F_{\mathrm{z}},L}-2\cdotp \overline{\mathrm{UTS}} $$
(1)

whereby \( {\overline{\mathrm{UTS}}}_{n,L} \), \( {\overline{\mathrm{UTS}}}_{v_{\mathrm{s}},L} \), and \( {\overline{\mathrm{UTS}}}_{F_{\mathrm{z}},L} \) are the mean ultimate tensile strengths at level L of the corresponding process parameters n, vs, and Fz, and \( \overline{\mathrm{UTS}} \) is the overall mean of all 27 determined ultimate tensile strengths. Subsequently, the expected maximum ultimate tensile strength UTSexp was compared with the actual ultimate tensile strength obtained by adjusting the previously determined ideal parameter set, and the deviation was 2.6%. It was also determined that the rotational speed n had an influence on the tensile strength of 41%, the welding speed vs of 33%, and the axial force Fz of 21%. The remaining 5% were referred to as errors. Ugender et al. [10] also used the Taguchi technique and the S/N ratio to find an optimum setting for the ratio of the diameter of the shoulder Ds to the diameter of the probe dp, the tilt angle, and the welding speed. The results showed that the Ds/dp ratio and the welding speed are the most important factors, followed by the tilt angle, when deciding on the mechanical properties of friction stir welds of aluminum alloys. Ganapathy et al. [11], Abbas et al. [12], and Ma et al. [13] also adopted Taguchi’s L9 orthogonal array design and maximized the S/N ratio to optimize FSW process parameters. Vijayan et al. [14] investigated an approach using the Taguchi-based grey relational analysis (GRA) [15] instead of the S/N ratio.
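As a worked illustration of Eq. 1, the sketch below computes the expected response from the level means. All numerical values are hypothetical placeholders and not data from Lakshminarayanan et al. [9]; the function name is likewise an assumption made for this example.

```python
def expected_uts(level_means, overall_mean):
    """Additive prediction of Eq. 1.

    Sums the mean responses at the chosen levels and subtracts the overall
    mean (k - 1) times; for the k = 3 parameters of Eq. 1 this is 2 * overall_mean.
    """
    k = len(level_means)
    return sum(level_means) - (k - 1) * overall_mean

# Hypothetical mean UTS values in MPa at the chosen levels of n, v_s and F_z.
uts_n, uts_vs, uts_fz = 210.0, 205.0, 202.0
uts_overall = 195.0  # hypothetical overall mean of all tensile tests

uts_exp = expected_uts([uts_n, uts_vs, uts_fz], uts_overall)
```

With these placeholder values, the three level means add up to 617 MPa and twice the overall mean is 390 MPa, so the predicted value is 227 MPa.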

The RSM is an approach to solve the RPD problem that not only allows the use of Taguchi’s robust design concept but also provides a more sound and more efficient approach to experiment design and analysis [5]. Furthermore, the RSM is a collection of mathematical and statistical techniques for analyzing problems in which several independent variables influence a dependent variable and the goal is to optimize the dependent variable [16]. Rajakumar et al. [3] applied the RSM and established an empirical relationship between the independent variables (tool rotational speed, welding speed, axial force, shoulder diameter, probe diameter, and tool material hardness) and the dependent variable, which was the ultimate tensile strength of the joint. For this purpose, a multiple regression model was developed for the ultimate tensile strength of the weld. The model was able to predict the ultimate tensile strength of FSW joints within the 95% confidence level. Khansare et al. [17] proposed a hybrid optimization methodology based on the combination of the RSM and a genetic algorithm (GA) [18] to approximate the optimal welding speed and tool rotational speed in which a maximum ultimate tensile strength could be achieved.

Tansel et al. [19] developed a genetically optimized neural network system (GONNS) for modeling and optimizing the FSW process. The GONNS was introduced by Tansel et al. [20] by using artificial neural networks (ANNs) in combination with a GA. The GONNS models the system by using the ANNs trained with the experimental data or observations. The optimal operating conditions are estimated by using a GA [19]. Tansel et al. [19] used one GA for searching the optimal tool rotational speed and welding speed by using five ANNs representing the FSW operation. The five separate neural networks with two identical inputs (welding speed and tool rotational speed) estimated the mechanical and metallurgical properties of the friction stir welds.

1.2 State of the art—Evaluation of the surface of friction stir welds

Trueba et al. [21] performed an optimization experiment using a factorial design to evaluate the effect of process parameters on the weld temperature, surface and internal quality, and mechanical properties during bobbin-tool friction stir welding. To evaluate the surface appearance, a semi-quantitative visual appearance rating (VAR) was developed based on the presence and severity of visually observable defects. The rating scale ranged from nine (poorest surface quality) to zero (best surface quality), and the criteria wormhole, galling, flash, and narrow bead were included. The wormhole was defined as an internal void extending to the surface. It was found that high levels of tool rotational speeds and welding speeds resulted in high welding temperatures and insufficient weld metal constraint. This in turn led to galling and the formation of wormholes with a corresponding decrease in surface quality. It was taken into account that there is a relationship between rotational speed, weld temperature, surface appearance, and void formation.

According to Zuo et al. [2], the surface topography of friction stir welds plays an important role in the performance of the joints. A larger surface roughness leads to a more serious stress concentration, which will cause the occurrence of fatigue damage and the reduction of fatigue strength of the parts [2]. Important process parameters to control the surface topography of friction stir welds are the welding speed and the rotational speed of the tool [22]. Hartl et al. [23] presented key indicators for quantifying the surface topography of friction stir welds and showed that some of these can be predicted by evaluating process variables such as the process forces or temperatures [24].

To date, there have not been any investigations regarding the very promising algorithm-based optimization of the surface topography of friction stir welds or on the application of reinforcement learning (RL) [25] and Bayesian optimization (BO) in the field of FSW, which is why these modern learning-based algorithms are used in this work. The fundamentals regarding these algorithms are contained in Appendix I.

2 Methodology

2.1 Approach

Previous investigations have shown that the surface quality of friction stir welds significantly depends on the welding speed vs and the rotational speed n of the tool. The optimal setting of these parameters depends on factors such as the sheet thickness, the aluminum alloy, and the tool geometry used, for instance. Due to the complex interrelations, the ideal welding speed vs and tool rotational speed n can often only be found through experience and trial and error. In this work, a learning-based system was developed that helps the FSW user to find optimal settings for these two parameters. Since the production of friction stir welds is time-consuming, as few parameter combinations as possible should be sampled to find suitable parameters.

The evaluation of the surface quality was conducted on the basis of surface topography indicators for friction stir welds, which were presented in Hartl et al. [23]. The task was addressed as an optimization problem:

$$ \underset{\left({v}_{\mathrm{s}},n\right)\in \mathcal{X}\subseteq {\mathrm{\mathbb{R}}}^2\ }{\mathrm{argmin}} def\ \left({v}_{\mathrm{s}},n\right) $$
(2)

Here, def is a function that indicates how defective the surface of the friction stir weld is for the given parameters. The function value def(vs, n) is smaller than the function value def(vs′, n′) if the parameter combination (vs, n) leads to fewer surface defects than the parameter combination (vs′, n′). The evaluation of def(vs, n) can be equated with the explicit testing of the parameter combination (vs, n), i.e., the production of the friction stir weld with the given parameters, the recording of the surface topography, and the calculation of the topography key indicators based on Hartl et al. [23]. Since this process is associated with considerable effort, the number of evaluations of def should be kept as low as possible. When implementing the algorithms, it had to be taken into account that there was no information about the gradient of def. Additionally, def contains an error: even if the process parameters are identical for two experiments, the surface topography of these two welds will not be completely identical. Small measurement inaccuracies may also occur when recording the surface topography with the three-dimensional profilometer. However, for simplification purposes, it was assumed that def has no error. To solve the optimization problem, three different approaches were considered:

  I. For the first approach, the optimization problem was modeled as a Markov decision process (MDP) and solved using the RL-based value iteration algorithm.

  II. For the second approach, the optimization problem was solved with BO. In the further discussion, the second approach will be called single-task.

  III. For the third approach, the optimization problem was also solved using Bayesian optimization. In contrast to the single-task approach, here the Gaussian process (GP) was provided with additional data that it could use to find the optimum. The GP was provided with information about the type of aluminum alloy, the sheet thickness, and the shoulder geometry used. In the further discussion, the third approach will be called multi-task.

2.2 Welding experiments

The welding experiments were conducted on a four-axis milling machining center MCH 250 from Gebr. Heller Maschinenfabrik GmbH, which was adapted for friction stir welding. The maximum axial force of the system was 30 kN. In the experiments, the sheets were joined in the butt joint configuration and a rigid clamping device avoided gaps between the two joining partners. All tests were performed in position-controlled operation with a 2° tilt angle of the tool. Two-piece tools consisting of a shoulder and a conical welding probe with a thread and three flats were used. A total of 262 welding experiments were conducted within the scope of 10 studies. In the 10 studies, the type of aluminum alloy, the tool shoulder geometry, and the sheet thickness were varied. Table 1 provides an overview of the different studies. Some of the studies have already been described in more detail in previous research conducted by Hartl et al. [23, 26]. The evaluated weld seam length varied in the ten studies, but was always between 70 and 170 mm. The evaluated weld seam area started 10 mm after the plunge point and ended approximately 20 mm before the exit hole.

Table 1 Welding experiments used for this work

The welding speed vs and the tool rotational speed n were varied in a large parameter window. As high welding speeds vs are becoming increasingly important for industrial applications, especially in the context of electromobility [27], welding speeds of up to 1500 mm/min were employed. In order to protect the welding equipment, the minimum n/vs ratio was limited to 1 mm−1. In studies no. 1 to 8, the welding speed vs and the tool rotational speed n were varied in a full factorial manner in four steps, respectively. Thereby, the welding speeds vs ranged from 500 to 1500 mm/min and the tool rotational speeds n from 1500 to 3500 min−1. In study no. 9, a total of 13 different rotational speeds n from 1500 to 3500 min−1 were set at a welding speed vs of 833 mm/min. In study no. 10, the welding speed vs was varied in eleven steps from 500 to 1500 mm/min and the rotational speed n was varied in eleven steps from 1500 to 3500 min−1 in a full factorial design.
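The experimental plan described above can be tallied with a few lines of Python. The sketch assumes equally spaced levels between the stated limits, which reproduces the rounded values 500, 833, 1167, and 1500 mm/min used later in Eq. 5; the helper name `levels` is hypothetical.

```python
from itertools import product

def levels(lo, hi, steps):
    """Equally spaced levels from lo to hi (both inclusive); spacing is an assumption."""
    return [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]

# Studies 1 to 8: 4 x 4 full factorial grid of (v_s in mm/min, n in 1/min).
grid_4x4 = list(product(levels(500, 1500, 4), levels(1500, 3500, 4)))
# Study 10: 11 x 11 full factorial grid over the same ranges.
grid_11x11 = list(product(levels(500, 1500, 11), levels(1500, 3500, 11)))

# Smallest n / v_s ratio on either grid, in 1/mm.
min_ratio = min(n / v_s for v_s, n in grid_4x4 + grid_11x11)

# 8 studies x 16 welds + 13 welds (study 9) + 121 welds (study 10).
total_welds = 8 * len(grid_4x4) + 13 + len(grid_11x11)
```

Under this assumption, the smallest ratio occurs at the highest welding speed combined with the lowest rotational speed and equals the stated limit, and the tally agrees with the 262 welding experiments reported in Section 2.2.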

2.3 Data preprocessing

The topography of the friction stir welds was recorded using a three-dimensional profilometer VR-3100 from Keyence Deutschland GmbH which was based on phase-coded structured light projection. Thereby, white LEDs projected light from two places onto the welds and the reflected light was measured by a CMOS sensor. The smallest measurable difference in the height direction normal to the sheet surface was 1 μm. The sheet surface was defined as the zero height. The distance between the individual topography points in the plane of the sheet surface was approximately 24 μm. A total of about 250,000 height information points per 10 mm weld seam length were generated. The point cloud was processed to determine the key indicators listed in Table 2 for each weld. A more detailed description of the key indicators is given in Hartl et al. [23].

Table 2 Key indicators derived from the three-dimensional point cloud to quantify the features of the weld surface

Table 3 shows the ideal value for each of these eight key indicators as well as the best and the worst values obtained for the 262 welding experiments performed. The value of −2.80 mm for the largest seam underfill was notably high. This value was caused by a lack of fill occurring in some experiments in study no. 2 (see also Hartl et al. [23]). The maximum value for the peak material volume of 37.36 ml/m2 was also remarkably high. This high value could be explained by flash that reached into the weld. The values displayed in Table 3 were therefore all considered plausible.

Table 3 Ideal values for each of the eight key indicators as well as the best and worst values obtained in the 262 welding experiments performed

The eight topography indicators obtained for the 262 welds were then scaled to values between 0 and 1. The ideal value for each topography key indicator was scaled to a value of 0 and the worst occurring value for each topography key indicator was scaled to a value of 1. The ideal value for the ratio rarc is 1 [23]. The largest deviation from this ideal value was 0.95 at an rarc of 1.95, which is why that deviation was scaled as 1. The eight scaled values N for the eight topography indicators were then averaged for each weld according to:

$$ def=\frac{N_{f_{\mathrm{m}}}+{N}_{u_{\mathrm{m}}}+{N}_{S_{\mathrm{f}}}+{N}_{S_{\mathrm{u}}}+{N}_{S_{\mathrm{d}}}+{N}_{r_{\mathrm{arc}}}+{N}_{V_{\mathrm{m}\mathrm{p}}}+{N}_{S_{\mathrm{w}}}}{8} $$
(3)

and a scaled and averaged key indicator def was obtained that took into account all eight topography indicators defined before (see Table 2). In Eq. 3, all defined topography key indicators were weighted equally. If a quality characteristic were particularly relevant in the application, for example the flash height, it could be weighted more heavily in Eq. 3. The perfect friction stir weld surface would therefore have the value def of 0. The best actual weld of all 262 conducted experiments was experiment no. 53, which had the value def of 0.021. The worst obtained value for def was 0.603 for experiment no. 123. Figure 1 shows the evaluated areas of these two welds as color and topography images. The images were generated using the three-dimensional profilometer. Experiment no. 53, on the one hand, contained no surface defects. Neither pronounced flash formation, nor surface galling, nor cracks were visible on the surface of the weld. The topography image in Fig. 1 shows that the seam underfill was also low and regular. Experiment no. 123, on the other hand, showed a very strong flash formation and pronounced surface galling. The defined key figure def was therefore assessed as suitable for documenting the surface quality in a scalar quantity.
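The scaling and averaging behind Eq. 3 can be sketched as follows. The text fixes only the two anchor points (ideal value mapped to 0, worst observed value mapped to 1); the linear interpolation in between, the optional weights, and all function names are assumptions made for this illustration, and the eight example values are hypothetical.

```python
def scale_indicator(value, ideal, worst):
    """Map an indicator to [0, 1]: 0 at the ideal value, 1 at the worst observed value."""
    return abs(value - ideal) / abs(worst - ideal)

def defect_score(scaled_values, weights=None):
    """Weighted mean of the scaled indicators; equal weights reproduce Eq. 3."""
    if weights is None:
        weights = [1.0] * len(scaled_values)
    return sum(w * v for w, v in zip(weights, scaled_values)) / sum(weights)

# r_arc: ideal value 1, worst observed value 1.95 (largest deviation 0.95).
n_rarc_worst = scale_indicator(1.95, ideal=1.0, worst=1.95)
n_rarc_ideal = scale_indicator(1.0, ideal=1.0, worst=1.95)

# Hypothetical scaled values for the eight indicators of a single weld.
example = [0.10, 0.05, 0.20, 0.00, 0.15, 0.30, 0.10, 0.10]
def_value = defect_score(example)
```

Passing a `weights` list with a larger entry for one indicator, for example the flash height, shifts the score toward that indicator without changing its range of 0 to 1.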

Fig. 1 Color images and topography images of the evaluated weld surfaces with the lowest (def = 0.021) and highest (def = 0.603) value for the scaled and averaged surface topography indicator def

3 Results

3.1 Reinforcement learning

In order to solve the optimization problem using RL, it first had to be formulated as an MDP. Two formulations, labeled as formulation 1 and formulation 2, were implemented, which differed in the state transition function and the possible actions. In both formulations, the state transition function was deterministic. For a clearer presentation, the state transition function p was represented deterministically with the two functions T(s, a): S × A → S and r(s, a): S × A → ℝ. These two functions described in which state s′ the environment resulted and which reward r the agent received when the agent executed the action a in the state s. The state transition function could be derived from both functions as follows:

$$ p\left({s}^{\prime },r\mid s,a\right)=\begin{cases}1 & \mathrm{if}\ {s}^{\prime }=T\left(s,a\right)\ \mathrm{and}\ r=r\left(s,a\right)\\ 0 & \mathrm{else}\end{cases} $$
(4)

Both formulations were solved using the value iteration algorithm. The parameter δ (see Algorithm A2) was set to \( 10^{-8} \).

In both formulations, the states S were the parameter combinations of welding speed vs and tool rotational speed n. For example, from study nos. 1 to 8, each of which included 16 different parameter combinations, S was assigned:

$$ S=\left({v}_{\mathrm{s}},n\right)=\left\{500,833,1167,1500\right\}\times \left\{1500,2167,2833,3500\right\} $$
(5)

The reward function r should lead the policy π (see Appendix I) into a minimum as fast as possible, which is why the following reward function was chosen for both formulations:

$$ r\left(s,a\right)=- def\ \left(T\left(s,a\right)\right)-{b}_{\mathrm{s}} $$
(6)

where def was the function to be optimized and bs ∊ ℝ+ was a constant. This reward function gave the agent a higher reward r, the smaller the value of def was, i.e., the less defective the topography of the friction stir weld was. In addition, the agent received a penalty of bs for each step, because each step was coupled with an explicit evaluation of def and thus connected with effort. The value for bs was set to 1. Since the value of def was between 0 and 1, the reward r for each step was between −2 and −1.

In formulation 1, only the following four different actions were allowed:

$$ A=\left\{\mathrm{increase}\ \mathrm{welding}\ \mathrm{speed}\ {v}_{\mathrm{s}},\mathrm{reduce}\ \mathrm{welding}\ \mathrm{speed}\ {v}_{\mathrm{s}},\mathrm{increase}\ \mathrm{rotational}\ \mathrm{speed}\ n,\mathrm{reduce}\ \mathrm{rotational}\ \mathrm{speed}\ n\right\}\kern0.50em $$
(7)

For example, for study nos. 1 to 8, the state transition function T for formulation 1 was defined as:

$$ {\displaystyle \begin{array}{c}T\left(s,\mathrm{increase}\ \mathrm{welding}\ \mathrm{speed}\ {v}_{\mathrm{s}}\right)\kern2em =\kern0.5em \left\{\kern1.75em \begin{array}{c}\left({v}_{\mathrm{s}}+333,n\right)\kern4.5em \mathrm{if}\ {v}_{\mathrm{s}}<1500\\ {}\ s\kern9.25em \mathrm{else}\end{array}\right.\\ {}T\left(s,\mathrm{reduce}\ \mathrm{welding}\ \mathrm{speed}\ {v}_{\mathrm{s}}\right)\kern2.5em =\kern0.75em \left\{\kern2em \begin{array}{c}\left({v}_{\mathrm{s}}-333,n\right)\kern4em \mathrm{if}\ {v}_{\mathrm{s}}>500\\ {}s\kern9.25em \mathrm{else}\end{array}\right.\\ {}\begin{array}{c}T\left(s,\mathrm{increase}\ \mathrm{rotational}\ \mathrm{speed}\ n\right)\kern1.5em =\kern0.75em \left\{\kern1.75em \begin{array}{c}\left({v}_{\mathrm{s}},n+667\right)\kern4.25em \mathrm{if}\ n<3500\\ {}\ s\kern8.75em \mathrm{else}\end{array}\right.\\ {}T\left(s,\mathrm{reduce}\ \mathrm{rotational}\ \mathrm{speed}\ n\right)\kern2.25em =\kern0.75em \left\{\kern1.75em \begin{array}{c}\left({v}_{\mathrm{s}},n-667\right)\kern4em \mathrm{if}\ n>1500\\ {}\ s\kern9em \mathrm{else}\end{array}\right.\end{array}\end{array}} $$
(8)

Figure 2 shows the 16 different states with the respective possible actions for formulation 1 for study nos. 1 to 8. The states and possible actions for study nos. 9 and 10 were analogous.
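The transition function of Eq. 8 and the reward of Eq. 6 for formulation 1 can be sketched in Python as shown below. The sketch moves by grid index rather than by the fixed increments 333 and 667, so that the rounded grid values of Eq. 5 are hit exactly; the action labels, the function names, and the dictionary `defect` holding def values are assumptions of this example.

```python
V_S = [500, 833, 1167, 1500]      # welding speeds in mm/min (Eq. 5)
N = [1500, 2167, 2833, 3500]      # tool rotational speeds in 1/min (Eq. 5)

# Index shifts on the 4 x 4 grid for the four actions of Eq. 7.
ACTIONS = {
    "increase v_s": (1, 0),
    "reduce v_s": (-1, 0),
    "increase n": (0, 1),
    "reduce n": (0, -1),
}

def transition(state, action):
    """Deterministic T(s, a) as in Eq. 8; a move off the grid leaves s unchanged."""
    i, j = V_S.index(state[0]), N.index(state[1])
    di, dj = ACTIONS[action]
    ni, nj = i + di, j + dj
    if 0 <= ni < len(V_S) and 0 <= nj < len(N):
        return (V_S[ni], N[nj])
    return state

def reward(state, action, defect, b_s=1.0):
    """Eq. 6: negative def value of the successor state minus the step penalty b_s."""
    return -defect[transition(state, action)] - b_s
```

For instance, `transition((500, 1500), "increase v_s")` yields the state (833, 1500), while `transition((1500, 3500), "increase n")` returns the state unchanged because the upper limit of n has been reached.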

Fig. 2 a Possible states s and b possible actions a in the different states s for study nos. 1 to 8 when using formulation 1

In formulation 2, in contrast to formulation 1, the agent was able to change from any state to any other state. It follows that:

$$ A=S $$
(9)

The state transition function was thus simplified to:

$$ T\left(s,a\right)=a $$
(10)

In both formulations, the discount factor γ was set to 1 (see Appendix I). This expressed the fact that both immediate and future evaluations of def were equally unwanted [25]. Additionally, the number of iterations no longer depended on the choice of γ.

To ensure that the algorithm still terminated, the minimum of def was regarded as the terminal state [25]. This meant that the value of the value function in the terminal state was always zero. The algorithm terminated because the agent could get from any state to the terminal state in finitely many steps and the agent received a negative reward in every state except the terminal state. So, in order to get the least negative reward possible, the agent had to get to the terminal state as quickly as possible. In Algorithm A1, the slightly modified value iteration algorithm is described. The changes, compared with Algorithm A2, are underlined. The changes ensured that the value functions of the terminal states were always zero. In addition, the algorithm was adapted for the deterministic formulation.

Algorithm A1 Adapted value iteration algorithm
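A minimal Python sketch of such a deterministic value iteration with a discount factor of 1 and a terminal state is given below. It is an illustration of the procedure described in the text, not a transcription of Algorithm A1. The def values are hypothetical, except that their minimum of 0.041 is placed at (500, 1500) as reported for study no. 1, and all function names are assumptions.

```python
from itertools import product

def value_iteration(defect, successors, b_s=1.0, delta=1e-8):
    """Deterministic value iteration with discount factor 1 and a terminal state.

    defect:     dict mapping each state to its def value
    successors: function returning the states reachable from a state in one step
    """
    terminal = min(defect, key=defect.get)      # minimum of def is the terminal state
    values = {s: 0.0 for s in defect}           # initialize V(s) with zeros
    while True:
        largest_change = 0.0
        for s in defect:
            if s == terminal:
                continue                        # V(terminal) stays zero
            best = max(-defect[t] - b_s + values[t] for t in successors(s))
            largest_change = max(largest_change, abs(best - values[s]))
            values[s] = best
        if largest_change < delta:
            break
    policy = {
        s: max(successors(s), key=lambda t: -defect[t] - b_s + values[t])
        for s in defect if s != terminal
    }
    return values, policy, terminal

V_S = [500, 833, 1167, 1500]
N = [1500, 2167, 2833, 3500]

# Hypothetical def values; only the minimum of 0.041 at (500, 1500) is from the text.
defect = {
    (v, n): 0.041 + 0.1 * i + 0.05 * j
    for (i, v), (j, n) in product(enumerate(V_S), enumerate(N))
}

def grid_neighbours(s):
    """Formulation 1: one grid step in v_s or n, staying in place at the border."""
    i, j = V_S.index(s[0]), N.index(s[1])
    steps = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return [(V_S[a], N[b]) if 0 <= a < 4 and 0 <= b < 4 else s for a, b in steps]

def any_state(s):
    """Formulation 2: every state is reachable in one step (A = S)."""
    return list(defect)

values_1, policy_1, terminal = value_iteration(defect, grid_neighbours)
values_2, policy_2, _ = value_iteration(defect, any_state)
```

In this sketch, the policy of formulation 2 points every state directly at the terminal state, whereas formulation 1 has to approach it one grid step at a time.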

Figure 3a shows the values for the scaled and averaged surface topography indicator def (see Eq. 3) for the 16 different states in study no. 1. It is evident that the best value for def, 0.041, in study no. 1 was obtained in the state (500, 1500). This was in good agreement with the result from Hartl et al. [23], wherein this parameter combination led to the best-rated result in study no. 1 according to the visual inspection. Figure 3b shows the initialization of the value function Vπ(s) with zeros (compare line 1 in Algorithm A1).

Fig. 3 a Values for the scaled and averaged surface topography indicator def for study no. 1. b Initialization of the value function Vπ(s)

In both formulations, the algorithm evaluated the function def for all 16 parameter combinations. Figure 4 illustrates the results for the value iteration algorithm for study no. 1 when using formulation 1. Thereby, the strategy for each state was the direction in which the sum of future rewards was maximized (see Eq. 19 in Appendix I). The algorithm terminated after five iterations. Figure 4a shows the values of the value function Vπ(s) for the first and the fifth (last) iteration. Figure 4b shows the values of the value function Vπ(s) plus the reward, if the agent changes from another state to this state for these two iterations. Figure 4c demonstrates the direction to the neighboring state that yields the highest improvement according to Fig. 4b. For example, for state (1500, 3500) in iteration 1, that would be the reduction of the welding speed vs. The terminal state was reached for (500, 1500), so the value function Vπ(s) for this state is zero from the beginning. The results for the other studies (see Table 1) led to the same findings.

Fig. 4 Visualization of the algorithm for the 16 different states in study no. 1 using formulation 1 for the five iterations. a Values of the value function Vπ(s). b Values of the value function Vπ(s) plus the reward, if the agent changes from another state to this state. c Optimal action according to b

Figure 5 displays the results for the value iteration algorithm for study no. 1 when using formulation 2 analogous to Fig. 4. For formulation 2, the strategy in each state s was the action (500, 1500). The labeling of the boxes in Figs. 4 and 5 is analogous to the labeling in Fig. 2a.

Fig. 5 Values of the value function Vπ(s), which is the sum of rewards if the agent follows the policy in this state, for the 16 different states in study no. 1 using formulation 2 for the two iterations. a Sum of rewards if the agent follows the policy in this state. b Sum of rewards if the agent changes from another state to this state and then follows the policy

The results showed that the optimization problem (see Eq. 2) can be solved using RL. However, the value iteration algorithm was not efficient. Since all states s ∊ S and all actions a ∊ A were iterated over, the function def had to be evaluated for each process parameter combination. There are algorithms for RL that are more efficient in this respect. For example, the value function could be approximated using a Gaussian process to reduce the number of evaluations [29].

3.2 Bayesian optimization

Optimization of the surface quality

In the second approach (single-task), the optimization problem was solved using Bayesian optimization (BO). Thereby, only the data from the respective study were used. Additional information from other studies or information such as the type of aluminum alloy was not utilized. Since no information was available at the beginning of the optimization for the first selection of a parameter set, a random parameter set had to be selected in the first trial. A parameter set consisted of the welding speed vs and the tool rotational speed n. To ensure that the results were independent of the selected starting point, all parameter combinations were used once as the starting point, and the means and standard deviations of the required number of steps to find good parameter sets were then calculated.

In order to avoid overfitting the hyperparameters, it was assumed that they follow certain distributions. For the hyperparameter length-scale l of the Matérn 5/2 kernel, a uniform distribution was assumed, since the Gaussian process degenerated strongly at values below 0.1 and those were thus excluded. A logarithmic normal distribution was assumed for the variance σ to determine the approximate interval in which σ should be located. The exact choice of the distribution parameters was not significant. The parameters in Table 4 in Appendix II were found by trial and error using the data sets from study nos. 1 to 10, so that the distributions cover approximately the range of the hyperparameters that do not degenerate the GP. Figure 6 shows a degenerate GP for study no. 9 that had to be avoided. The cause of the degeneration was that the hyperparameter l was selected to be 0.01 and therefore too small. Consequently, the mean in Fig. 6 is always zero and only spikes at known points. In addition, the hyperparameter variance σ of 0.001 was too small.

Fig. 6 Example of a degenerated GP due to unsuitable hyperparameters
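Why a very small length-scale produces a mean that is zero almost everywhere can be seen from the kernel alone. The sketch below evaluates the standard Matérn 5/2 covariance and the posterior mean of a zero-mean GP that has seen a single observation. The input distance of 0.1 on an axis assumed to be scaled to [0, 1], the observed value 0.3, the noise term, and the function names are hypothetical choices for illustration.

```python
import math

def matern52(r, length_scale, variance=1.0):
    """Matérn 5/2 covariance for two inputs a distance r apart."""
    z = math.sqrt(5.0) * abs(r) / length_scale
    return variance * (1.0 + z + z * z / 3.0) * math.exp(-z)

def posterior_mean_single(y_obs, r, length_scale, noise=1e-6):
    """Posterior mean of a zero-mean GP at distance r from its only observation."""
    return matern52(r, length_scale) / (matern52(0.0, length_scale) + noise) * y_obs

y_obs = 0.3   # hypothetical def value observed at one parameter setting
r = 0.1       # hypothetical distance to the query point on a [0, 1] scaled axis

mean_degenerate = posterior_mean_single(y_obs, r, length_scale=0.01)
mean_reasonable = posterior_mean_single(y_obs, r, length_scale=0.5)
```

With a length-scale of 0.01, the covariance has decayed to practically zero at this distance, so the prediction falls back to the prior mean of zero. With a length-scale of 0.5, the prediction stays close to the observed value.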

In the third approach (multi-task), the optimization problem was also solved using the BO, but in this case, the GP received the data sets from the nine other studies as additional information, respectively. It was suspected that this would allow the GP to better estimate the function to be minimized and to find the optimum more quickly. In order for the GP to be able to estimate which other studies were similar, it was given additional features as input variables which influence the setting of the process parameters. These features were the type of aluminum alloy, the sheet thickness, and the shoulder geometry. Since they were identical in study nos. 3 ↔ 9, 4 ↔ 8, and 5 ↔ 6 (see Table 1), the study number was also used as an additional input variable. This resulted in the following kernel for the GP:

$$ {\displaystyle \begin{array}{c}g\left({v}_{\mathrm{s}},n,{m}_{\mathrm{f}},{t}_{\mathrm{f}},{s}_{\mathrm{f}},i\right):\mathrm{\mathbb{R}}\times \mathrm{\mathbb{R}}\times {M}_{\mathrm{f}}\times {T}_{\mathrm{f}}\times {S}_{\mathrm{f}}\times \left\{1,2,\dots, 10\right\}\to \mathrm{\mathbb{R}}\\ {}{M}_{\mathrm{f}}=\left\{\mathrm{EN}\ \mathrm{AW}-5754-\mathrm{H}111,\kern0.5em \mathrm{EN}\ \mathrm{AW}-6082-\mathrm{T}6\right\},\\ {}\begin{array}{c}{T}_{\mathrm{f}}=\left\{2\ \mathrm{mm},3\ \mathrm{mm},4\ \mathrm{mm}\right\},\\ {}{S}_{\mathrm{f}}=\left\{\mathrm{concave},\mathrm{spiral},\mathrm{rings}\right\},\end{array}\end{array}} $$
(11)

where Mf, Tf, and Sf represent the sets of aluminum alloys mf, sheet thicknesses tf, and shoulder geometries sf used. Additionally, i is the number of the study and {1, 2, …, 10} is the set of natural numbers less than or equal to 10, since data from 10 studies were used in total. For the welding speed vs, the tool rotational speed n, and the sheet thickness tf, the Matérn 5/2 kernel (see Appendix I) was used. Since the type of aluminum alloy, the shoulder geometry, and the study number were categorical values, the coregionalization kernel (see Appendix I) had to be used for those. In that way, their covariance could be learned via hyperparameter optimization. The covariance function k was defined as follows:

$$ {\displaystyle \begin{array}{c}k\left(\left({v}_{\mathrm{s}},n,{m}_{\mathrm{f}},{t}_{\mathrm{f}},{s}_{\mathrm{f}},i\right),\left({v_{\mathrm{s}}}^{\prime },{n}^{\prime },{m_{\mathrm{f}}}^{\prime },{t_{\mathrm{f}}}^{\prime },{s_{\mathrm{f}}}^{\prime },{i}^{\prime}\right)\right)\\ {}={k}_{\mathrm{E}}\left(\left({v}_{\mathrm{s}},n,{t}_{\mathrm{f}}\right),\left({v_{\mathrm{s}}}^{\prime },{n}^{\prime },{t_{\mathrm{f}}}^{\prime}\right)\right)\cdot {k}_{\mathrm{M}}\left({m}_{\mathrm{f}},{m_{\mathrm{f}}}^{\prime}\right)\cdot {k}_{\mathrm{S}}\left({s}_{\mathrm{f}},{s_{\mathrm{f}}}^{\prime}\right)\cdot {k}_{\mathrm{I}}\left(i,{i}^{\prime}\right)\end{array}} $$
(12)

where kE is the Matérn 5/2 kernel, kM and kS are coregionalization kernels for two or three categories, and kI is a coregionalization kernel for ten categories. Since the GP was given information from the nine other studies, the first parameter set to be tested no longer had to be chosen randomly as in the single-task approach.
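The product structure of Eq. 12 can be sketched in plain Python as follows. The sketch assumes a single length-scale for the Matérn 5/2 kernel, continuous inputs that have already been scaled, categories encoded as zero-based integer indices, and rank-1 coregionalization matrices of the common form B = w wᵀ + diag(κ). All hyperparameter values are hypothetical; in the work described here they were learned via hyperparameter optimization.

```python
import math

def matern52(x, y, length_scale, variance=1.0):
    """Matérn 5/2 kernel on the continuous inputs (v_s, n, t_f), assumed scaled."""
    z = math.sqrt(5.0) * math.dist(x, y) / length_scale
    return variance * (1.0 + z + z * z / 3.0) * math.exp(-z)

def coregionalization_matrix(w, kappa):
    """B = w w^T + diag(kappa); entry B[a][b] is the covariance of categories a and b."""
    size = len(w)
    return [[w[a] * w[b] + (kappa[a] if a == b else 0.0) for b in range(size)]
            for a in range(size)]

def product_kernel(p, q, length_scale, b_m, b_s, b_i):
    """Eq. 12: k = k_E * k_M * k_S * k_I for inputs (v_s, n, m_f, t_f, s_f, i)."""
    vs1, n1, m1, t1, s1, i1 = p
    vs2, n2, m2, t2, s2, i2 = q
    k_e = matern52((vs1, n1, t1), (vs2, n2, t2), length_scale)
    return k_e * b_m[m1][m2] * b_s[s1][s2] * b_i[i1][i2]

# Hypothetical hyperparameters for 2 alloys, 3 shoulder geometries, 10 studies.
b_m = coregionalization_matrix([1.0, 0.8], [0.1, 0.1])
b_s = coregionalization_matrix([1.0, 0.5, 0.7], [0.1, 0.1, 0.1])
b_i = coregionalization_matrix([1.0] * 10, [0.2] * 10)

# Two hypothetical inputs: (v_s, n, alloy index, t_f, shoulder index, study index).
a = (0.0, 0.0, 0, 0.5, 1, 2)
b = (0.5, 0.5, 1, 0.5, 1, 3)

k_ab = product_kernel(a, b, 1.0, b_m, b_s, b_i)
k_ba = product_kernel(b, a, 1.0, b_m, b_s, b_i)
k_aa = product_kernel(a, a, 1.0, b_m, b_s, b_i)
```

Because the factors are multiplied, two welds are treated as strongly correlated only if they are close in the continuous inputs and belong to categories whose learned covariance is large.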

In order to prevent overfitting of the hyperparameters, it was also assumed for the multi-task approach that the hyperparameters follow certain distributions. A uniform distribution was assumed for the length-scale hyperparameter l of the Matérn 5/2 kernel. A log-normal distribution was assumed for hyperparameters that can only take positive values, and a normal distribution for all remaining hyperparameters. Table 5 in Appendix II lists the hyperparameters for the multi-task approach and their distributions, analogous to Table 4.

The expected number of steps until the random search algorithm finds a suitable parameter setting was calculated as follows. Let Z be a random variable that indicates after how many steps a random search finds one of the o optima for the first time. Then pz(Z = z + 1) is the probability that a random search has not found an optimum in the previous z steps and finds an optimum in the (z + 1)-th step:

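$$ {p}_z\left(Z=z+1\right)=\left(\prod \limits_{j=0}^{z-1}\frac{q-o-j}{q-j}\right)\cdot \frac{o}{q-z} $$
(13)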

where q is the number of possible parameter combinations and Z ∊ {1, 2, …, q − o + 1}. Here, {1, 2, …, q − o + 1} is the set of natural numbers less than or equal to (q − o + 1), since there are at most (q − o) parameter sets that are not an optimum and one of the optima is found in the (q − o + 1)-th step at the latest. Thus, the expected number of necessary steps to find an optimum using random search was:

$$ \mathbb{E}\left[Z\right]=\sum \limits_{z=1}^{q-o+1}z\cdot {p}_z\left(Z=z\right)=\sum \limits_{z=0}^{q-o}\left(z+1\right)\cdot {p}_z\left(Z=z+1\right) $$
(14)
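The expectation in Eq. 14 can be evaluated numerically with a short sketch. It assumes that the random search draws parameter sets uniformly and without repetition, which is consistent with an optimum being found after at most q − o + 1 steps. The function names and the example values q = 16 and o = 3 are illustrative and are not taken from Table 6.

```python
from fractions import Fraction

def first_hit_probability(z, q, o):
    """P(Z = z + 1): z non-optimal draws in a row, then one of the o optima."""
    p = Fraction(1)
    for j in range(z):
        p *= Fraction(q - o - j, q - j)  # (j + 1)-th draw misses all optima
    return p * Fraction(o, q - z)        # (z + 1)-th draw hits an optimum

def expected_steps(q, o):
    """E[Z] as in Eq. 14, summed over z = 0, ..., q - o."""
    return sum((z + 1) * first_hit_probability(z, q, o) for z in range(q - o + 1))

# Illustrative example: 16 parameter sets, 3 of which count as suitable.
e_z = expected_steps(16, 3)
print(float(e_z))
```

Under this assumption, the sum simplifies to the closed form (q + 1)/(o + 1), which offers a quick plausibility check for the computed values.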

Table 6 in Appendix II shows the number of steps required to find a suitable parameter set for the ten different studies. Initially, the best 20% of the parameter sets of each study were defined as suitable. This resulted in a number o20 of suitable parameter sets for each study. If the random search (RS) algorithm was used, the expected number of steps until one of the o20 good parameter sets was found was 4.3 on average. With the single-task approach and the probability of improvement (PI) acquisition function, an average of 3.3 steps was required to find a parameter set among the best 20%; with the multi-task approach, an average of 2.4 steps was required. The average results for the expected improvement (EI) acquisition function were almost identical to those for the PI acquisition function. Since the outcome of the single-task approach also depended on which parameter set was tested first, the BO was started once with each of the parameter sets, and the mean value and standard deviation are given for each study in Table 6. Since both the single-task and the multi-task approaches determined suitable settings for the welding speed vs and the tool rotational speed n in fewer steps than expected for RS, the use of the BO was considered suitable. It was expected that the BO would lead to suitable parameters faster than RS, since the BO models the function to be optimized. This made it possible to estimate which parameter setting should be tested next. With RS, the parameter settings were evaluated in random order. In addition, the information given in Table 6 shows that, on average, the multi-task approach led to better results than the single-task approach. The information from the other data sets could thus be successfully used to determine suitable welding parameters faster. Since the similarity to the other data sets was modeled, it was also possible to derive from which of the other studies the information could be better transferred in the multi-task approach.

Figure 7 illustrates the number of steps required for the different studies and approaches. It again shows that the multi-task approach generally led to the best results. Only in study nos. 7 and 9 did the multi-task approach yield the worst results. The reason was assumed to be that study no. 7 was the only study in which a sheet thickness of 2 mm was used (see Table 1), which is why the information from the other studies in this case even had a negative effect on the result compared with the single-task approach. Study no. 9 also differed from the other studies. In this study, the welding speed vs was not changed and only the rotational speed n was varied. Due to this greater difference from the other studies, their information could not be used advantageously.

Fig. 7

Number of steps required to achieve a parameter set that was among the best 20% by using the acquisition functions a probability of improvement and b expected improvement

Table 7 in Appendix II shows the number of steps required to find suitable welding parameters for the various approaches, analogous to Table 6. Now, only the best 5% of the parameter sets of each study were defined as suitable. This resulted in a number o5 of good welding parameter sets for each study. This specification could be used for applications that require a very high surface quality, for example for visible welding seams. Since the number of optimal parameter sets was now lower, the number of steps necessary to find good parameter sets was higher on average. It once again became clear that the use of the BO reduced the number of necessary steps compared with RS. The average number of necessary steps was again lower with the multi-task approach than with the single-task approach. This time, the differences between the two acquisition functions were slightly larger: the PI acquisition function was on average better for the single-task approach, whereas the EI acquisition function performed marginally better for the multi-task approach.

Figure 8 illustrates the steps necessary to find a parameter set that was among the best 5%. Compared with Fig. 7, it is particularly noticeable that in study no. 2, the multi-task approach required the most steps to find a good parameter set. This was assumed to be due to the fact that study no. 2 was the only study in which a spiral shoulder geometry was used (see Table 1), so that the information from the other data sets was more confusing than useful for the multi-task BO algorithm.

Fig. 8

Number of steps required to achieve a parameter set that is among the best 5% by using the acquisition functions a probability of improvement and b expected improvement

Figure 9 a shows the sequence of the tested parameter sets in study no. 1 based on the multi-task approach. The sequence was identical for both acquisition functions, PI and EI. Figure 9 b shows the achieved values for the surface topography def in each step. From the two figures, it becomes clear that the optimum parameter set (500, 1500) was already found in the first step. This was also in agreement with the result from Hartl et al. [23], where for study no. 1 the parameter set (500, 1500) led to the best result regarding the visual inspection. Figure 10 shows a color image and a topography image of the evaluated welding surface from this experiment. Since the approach tested all 16 parameter settings and tested each parameter set only once (see Algorithm A3), the quality of the surface topography decreased in the subsequent steps (see Fig. 9b).

Fig. 9

a Sequence of the selected parameter sets for study no. 1 when using the multi-task approach for the two acquisition functions tested. b Corresponding values for the scaled surface defect def

Fig. 10

Color image and topography image of the best weld surface achieved in study no. 1 in which the parameter set was found in the first step using the Bayesian optimization multi-task approach

Figure 11 visualizes the GP for study no. 1 when using the BO multi-task approach with the mean function and the 68% confidence interval. The first three steps have already been performed, so the points (500, 1500), (500, 2167), and (833, 1500) in Fig. 11 are already known. Since the acquisition function for the point (833, 2167) was the highest compared with the other points not yet evaluated, this point was evaluated next for both the PI and the EI acquisition function.
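The selection step described above can be sketched for a discrete set of candidates. The sketch uses the standard textbook forms of the PI and EI acquisition functions for a minimization problem; the exact variants and hyperparameters used in this work are not restated here, so the formulas, the function names, and the exploration parameter xi are assumptions. The posterior means and standard deviations in the example are made up for illustration and are not results from study no. 1.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, best, xi=0.0):
    """P(f < best - xi) under the GP posterior N(mu, sigma^2) (minimization)."""
    if sigma <= 0.0:
        return 1.0 if mu + xi < best else 0.0
    return norm_cdf((best - mu - xi) / sigma)

def expected_improvement(mu, sigma, best, xi=0.0):
    """E[max(best - xi - f, 0)] under the GP posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu - xi, 0.0)
    z = (best - mu - xi) / sigma
    return (best - mu - xi) * norm_cdf(z) + sigma * norm_pdf(z)

def next_candidate(posterior, evaluated, best, acquisition):
    """Return the not yet evaluated parameter set with the highest acquisition value."""
    open_sets = [p for p in posterior if p not in evaluated]
    return max(open_sets, key=lambda p: acquisition(*posterior[p], best))

# Made-up posterior (mean, std) per parameter set (v_s, n); NOT data from the study.
posterior = {
    (500, 1500): (0.05, 0.0),
    (500, 2167): (0.20, 0.0),
    (833, 1500): (0.15, 0.0),
    (833, 2167): (0.10, 0.05),
    (1167, 1500): (0.30, 0.05),
    (1167, 2167): (0.25, 0.10),
}
evaluated = {(500, 1500), (500, 2167), (833, 1500)}
best = 0.05  # lowest def value observed so far in this made-up example

pick_pi = next_candidate(posterior, evaluated, best, probability_of_improvement)
pick_ei = next_candidate(posterior, evaluated, best, expected_improvement)
```

PI only rates how likely a candidate is to beat the best value so far, whereas EI also weights by how large the improvement would be. The two functions can therefore rank candidates differently, although they agree in this small example.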

Fig. 11

Visualization of the Gaussian process and the acquisition functions after three tested parameter sets in study no. 1 after applying the multi-task approach

Figure 12 is analogous to Fig. 11, but for the BO single-task approach, again after the first three steps had been performed. Here, it was specified that the parameter set (500, 1500) was tested in the first step. It became clear that the GP mean function and the associated 68% confidence interval estimated the real data considerably less accurately than in the multi-task approach illustrated in Fig. 11.

Fig. 12

Visualization of the Gaussian process and the acquisition functions after three tested parameter sets in study no. 1 after applying the single-task approach

In the investigations on the multi-task approach presented so far, all nine other studies shown in Table 1 were used to find suitable parameters. In study nos. 3 ↔ 9, 4 ↔ 8, and 5 ↔ 6, the same aluminum alloy, the same sheet thickness, and the same shoulder geometry were used. The disadvantage of these studies, hereinafter referred to as duplicates, was that the GP needed additional hyperparameters (see Eq. 11), which made learning them more complex. Furthermore, it was not possible to investigate how well the other data sets, which had no duplicates, could be used for the data sets that had duplicates. In further investigations using the multi-task approach, the duplicates were therefore not used and the study number i (see Eq. 11) was no longer required for the differentiation, which also simplified the covariance function. Table 8 in Appendix II shows which data sets were used to calculate suitable parameter sets, whereas Table 9 displays a comparison of the results with and without the duplicates. On average, the results improved without the duplicates. The assumed reason for this was that fewer hyperparameters had to be learned if the duplicates were omitted. The results improved especially in study nos. 2 and 7, which had no duplicates. In the studies that had duplicates, the results were worse if the duplicates were not used. Only in study no. 9 did the results improve, even though there was a duplicate. This was probably due to the fact that study no. 3 showed a poor welding result for the welding speed of 833 mm/min and a rotational speed of 2167 min−1, whereas study no. 9 showed the best welding results in this parameter range.

Overall, it could be shown that Bayesian optimization can be used very efficiently to find suitable process parameters for friction stir welding. The Bayesian optimization was better suited than reinforcement learning for the aim of this project to optimize the surface quality of FSW seams as efficiently as possible. For the optimization using reinforcement learning, a strategy had to be found that maximizes the expected value of an infinite sum of random variables and the function def had to be evaluated for each process parameter combination. Using the Bayesian optimization, the function def could be directly optimized by searching for values for the welding speed and the tool rotational speed that minimize the function def.

In this work, only the welding speed and the tool rotational speed were varied. This can be extended to other process parameters in further work. For example, the tilt angle or the immersion depth of the tool could be implemented additionally to optimize the surface quality. The Bayesian optimization method can also be transferred to force-controlled processes in order to find a suitable setting for the axial force of the tool.

Optimization of the surface quality with consideration of the welding speed

Since the welding speed vs has a direct influence on the process productivity [30] in any welding operation in an industrial context, the objective behind the selection of suitable welding parameters is to maximize the welding speed vs while ensuring an acceptable welding quality [31]. The growing market for the use of friction stir welding in the electromobility sector also requires sufficiently high welding speeds in order to enable economic production [27]. Richter [27] recommends aiming to attain a welding speed vs of at least 1000 mm/min. Therefore, in the investigations described in this section, slower welding speeds were penalized with plvp according to the following formula:

$$ {p}_{lvp}= lvp\cdot \frac{{v_{\mathrm{s}}}_{\mathrm{max}}-{v}_{\mathrm{s}}}{{v_{\mathrm{s}}}_{\mathrm{max}}-{v_{\mathrm{s}}}_{\mathrm{min}}} $$
(15)

with lvp being a factor indicating the magnitude of the low welding speed penalty, vsmax being the maximum welding speed in a study, and vsmin being the minimum welding speed in a study. The higher the selected lvp factor, the more strongly higher welding speeds are preferred. The value for plvp was then added to the values for def, which were calculated according to Eq. 16 to generate the new variable defvs, which took into account both the surface quality and the welding speed vs:

$$ {def}_{vs}= def+{p}_{lvp} $$
(16)

In this project, the maximum welding speed in all ten studies was 1500 mm/min and the minimum welding speed was 500 mm/min. Additionally, lvp was chosen to be 0.15. For a welding speed vs of 833 mm/min, for example, this resulted in a plvp of 0.10, which was then added to all values for def where a welding speed vs of 833 mm/min was applied. Figure 13 a shows the sequence of tested parameter sets when using the BO multi-task approach with all ten studies (including the duplicates) and penalizing low welding speeds vs with plvp. It becomes clear that parameter settings with a welding speed vs of 1500 mm/min were then preferred. Figure 13 b shows the corresponding values for defvs. The optimum defvs of 0.12 for study no. 1 was again already reached in the first step, in which the parameter set (1500, 2167) was used.
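The penalty of Eq. 15 and the combined measure of Eq. 16 can be reproduced with a few lines. The function names are hypothetical; the default values are the speed range of 500 to 1500 mm/min and the factor lvp = 0.15 stated above.

```python
def low_speed_penalty(v_s, v_min=500.0, v_max=1500.0, lvp=0.15):
    """Eq. 15: penalty that grows linearly as v_s drops from v_max to v_min."""
    return lvp * (v_max - v_s) / (v_max - v_min)

def penalized_defect(defect, v_s, v_min=500.0, v_max=1500.0, lvp=0.15):
    """Eq. 16: def_vs = def + p_lvp."""
    return defect + low_speed_penalty(v_s, v_min, v_max, lvp)

# Worked example from the text: v_s = 833 mm/min gives a penalty of about 0.10.
p_833 = low_speed_penalty(833.0)
print(round(p_833, 2))

# The penalty vanishes at the maximum speed and equals lvp at the minimum speed.
p_max = low_speed_penalty(1500.0)
p_min = low_speed_penalty(500.0)
```

Since the penalty is zero at 1500 mm/min, defvs equals def for all parameter sets at the maximum welding speed. A parameter set at 500 mm/min must therefore have a def value that is more than lvp lower to remain preferable.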

Fig. 13

a Sequence of the selected parameter sets using the multi-task approach for the two acquisition functions tested with penalization of low welding speeds. b Corresponding values for defvs

Figure 14 shows a color image and a topography image of the evaluated welding surface for the parameter set (1500, 2167). It becomes clear that the surface has a small irregular flash formation and slight surface galling compared with the result shown in Fig. 10. However, the welding speed vs was three times as high.

Fig. 14

Color image and topography image with the best weld in study no. 1, taking into account the weld surface quality and the welding speed; the parameter setting was achieved in the first step using the Bayesian optimization multi-task approach

Thus, a means was found that allows the FSW user to weight the two criteria, surface quality and welding speed, according to individual requirements by setting the parameter lvp, and to take this weighting into account in the learning-based automated search for suitable process parameters.

4 Conclusions and future research

A total of 262 friction stir welds were performed within 10 studies. Subsequently, with reinforcement learning and Bayesian optimization, two learning-based algorithms were tested for their applicability in optimizing the surface topography by adjusting the welding speed and tool rotational speed. The following conclusions were drawn:

  • The optimization problem could be solved by means of reinforcement learning, but not efficiently. Furthermore, it was complicated to solve the problem with reinforcement learning, because a policy had to be found that maximizes the expected value of an infinite sum of random variables. Instead, it was better to solve the optimization problem directly by using the Bayesian optimization.

  • The Bayesian optimization found suitable settings for the process parameters significantly faster than random search, both without (single-task approach) and with (multi-task approach) the aid of the data sets from the other studies.

  • In the multi-task approach, the information from the other studies could be successfully used to find suitable welding parameters even faster compared with the single-task approach.

  • By penalizing low welding speeds, both the surface topography and the welding speed could be considered for the optimization.

In a future project, the aim will be to show that the optimization of the surface topography in general leads to an increase in the ultimate tensile strength of the friction stir welded joint. The transfer of the algorithms developed in this work for the inline optimization of the weld seam surface is also the subject of future investigations.