A comparison of heuristic, statistical, and machine learning methods for heated tool butt welding of two different materials

Heated tool butt welding is a method often used for joining thermoplastics, especially when the components are made of different materials. The quality of the connection between the components crucially depends on a suitable choice of the welding process parameters, such as heating time, temperature, and the precise way in which the parts are then welded. Moreover, when different materials are to be joined, the parameter values need to be tailored to the specifics of the respective materials. To this end, this paper compares three approaches for tailoring the parameter values to optimize the quality of the connection: a heuristic by Potente, statistical experimental design, and Bayesian optimization. With suitability for practice in mind, a series of experiments is carried out with these approaches, and their capability of proposing well-performing parameter values is investigated. As a result, Bayesian optimization is found to yield peak performance, but the costs of optimization are substantial. In contrast, the Potente heuristic does not require any experimentation and recommends parameter values of competitive quality.


Introduction
One of the standard methods for joining thermoplastics is heated tool butt welding [1]. This method is used to weld together two plastic components that cannot be manufactured jointly for technical reasons, such as their geometric shape or different materials. In this process, the contact surfaces of the individual parts are first melted and then joined by pressing them together. The contact surfaces can each be heated for a different duration and at a different temperature, and the pressing step can also be carried out in different ways, namely either force-regulated or path-regulated. Depending on the selected process, different parameters need to be specified. Choosing the parameter values is challenging, especially when welding different materials, because they strongly influence the quality of the result: the joint can be of higher or lower quality, resulting in a better or worse longitudinal weld strength.
The literature is rich in methods for finding suitable parameter values. On the one hand, Potente and Tappe [2] propose a heuristic (referred to as the Potente heuristic in the remainder of this work) that derives suitable parameter values from the characteristics of the materials to be joined. On the other hand, statistical methods such as statistical experimental design recommend parameter values to test and thereby find suitable parameter values empirically [3]. Furthermore, in artificial intelligence, and in machine learning in particular, finding parameter values that maximize an unknown function is a well-studied problem referred to as black-box optimization [4,5]. A renowned approach to black-box optimization is Bayesian optimization (BO) [6,7], which leverages a machine learning model to predict which parameter values are most promising for improving on the highest function value observed so far.
In this paper, the three aforementioned methods, i.e., the Potente heuristic, statistical experimental design, and Bayesian optimization, are investigated regarding their applicability, practicality, and the quality of the welding results. To this end, an extensive experimental study is conducted in which tensile bars of polymethylmethacrylate (PMMA) and acrylonitrile-butadiene-styrene (ABS) are joined and their weld strength is measured.
The results of this study show the Potente heuristic to be a surprisingly strong baseline for both automated methods. The latter require several trials to find well-performing parameter values, which renders them significantly more expensive than the Potente heuristic. Although Bayesian optimization indeed yields peak performance in these experiments, the Potente heuristic performs reasonably well and only requires a small set of preliminary experiments to derive the parameter values.

Heated tool butt welding
The heated tool butt welding process is one of the common methods for joining thermoplastics. It is used to join two plastic components that cannot be manufactured in a single operation due to their geometry or because they consist of different materials. No additional joining elements such as screws or rivets are required. The process is used, for example, in the automotive industry to produce taillights, where the black back part of the taillight, made of ABS, is welded to the transparent PMMA. For this mixed welding, grades of ABS and PMMA with almost the same viscosity are used so that the process parameters can be adjusted as easily as possible.
The heated tool butt welding process is divided into four successive phases:
1. Adjustment phase
2. Heating phase
3. Changeover phase
4. Joining phase

First, the two components are placed in front of the heating tool. In the adjustment phase, the components come into contact with the heating tool without any pressure, and the uneven parts of the two surfaces are melted within a very short time (< 1 s). Afterward, in the heating phase, the material is melted at the heating tool for a predefined time. In the changeover phase, the two components are then removed from the heating tool, which subsequently moves out of the joining zone.
Then, the joining phase can be realized with two different regulation modes: force and path. In the force-regulated process, the two components are joined with a defined force and velocity until the defined welding time has elapsed. In the path-regulated process, the components are joined with a defined velocity until the requested joining path is reached. Figure 1 shows a schematic illustration of the welding process for two different materials. Since the path-regulated process stops at a fixed joining path, the joined component always has a fixed, predefined length, which is very important if the component has to be installed in other products. Moreover, quite slow welding velocities can be realized with this regulation to avoid unnecessary residual stresses in the seam. Force regulation has the advantage that the same force is applied during the entire joining process, so that no stress peaks occur during the joining phase and the process remains more constant.
Overall, this results in five crucial parameters for the mixed-material heated tool butt welding process, which must be defined in advance for each welding process: […] Other parameters, such as the velocities or accelerations in each phase, are adjustable, but they can be assumed to be rather unimportant for the quality of the weld seam and only need to be set within a plausible range. Therefore, this work is limited to the optimization of the five parameters mentioned above. The quality of the weld seam is evaluated through the weld strength, which is examined in a short-time tensile test. When heated tool butt welding is used in industrial applications, general heuristic rules are used to set the parameters. These rules do not necessarily guarantee an optimal weld strength, but in most cases provide a good first indication. They are based on, for example, component geometries [8]. If further experimental points are to be investigated, usually either trial and error combined with expert knowledge is used or statistical experimental designs are set up.

Related work
From the research work of Potente [8], the relationships presented in Table 1 are known for heated tool butt welding; they provide a general guide for configuring the welding machine. They are derived from the material properties of the plastic and the geometric dimensions of the component. However, these calculation bases only provide a possible setting window and do not guarantee an optimum. The Potente heuristic forms the basis for the first set of parameters for a heated tool butt welding process of thermoplastics for which no optimal setting window is known yet.
In a recent work, Mathiyazhagan [9] modeled the relationships between 16 parameters of heated tool butt welding and their influence on weld strength. The expert-based DEMATEL method [10] was used to identify and prioritize the different key factors. The study finds that the heating temperature as well as the welding and heating times are the key factors that most influence the weld strength, whereas the influence of the other parameters can be considered minor. These results support the focus of this investigation on the mentioned parameters when it comes to maximizing weld strength. Other research groups address very specific issues and thus only highlight partial aspects of heated tool butt welding or specific thermoplastics or thermoplastic pairs. For example, Ülker et al. [11] optimized the setting parameters using the Taguchi method to maximize both weld strength and experimental efficiency. Their results show that the joint strength of the semi-finished products increased by 70% compared to the initial settings. In that study, however, the parameters were optimized for mono-material welding of a polycarbonate-ABS blend.
Framing the search for well-suited parameter values as an optimization problem, various approaches tackle it by employing black-box optimization techniques [12] such as Bayesian optimization [13][14][15]. By leveraging machine learning models, for example Gaussian processes, the parameter values of the welding process can be tailored to the specifics of the material or of the concrete welding machine [14]. In contrast to previous works, however, this work focuses on joining different thermoplastics via heated tool butt welding, which is generally considered more challenging and requires more parameters to be configured; these are listed in detail in Section 2.

Methods for configuring heated tool butt welding
Before detailing the optimization methods considered in this work, the problem is briefly formalized as follows.
For the configuration of heated tool butt welding, one is interested in finding a configuration $\vec{\theta}^* \in \Theta = \Theta_1 \times \dots \times \Theta_d$ featuring parameter values for controlling the welding machine.
Here, $\Theta_i$ denotes the domain of parameter $i$, such that $\Theta$ defines the so-called (joint) configuration space over the parameters. In particular, the goal is to find a configuration $\vec{\theta}^* \in \Theta$ that optimizes a certain objective function $g\colon \Theta \to \mathbb{R}$, in this case the weld strength of the welded thermoplastic composition, i.e.,

$$\vec{\theta}^* \in \arg\max_{\vec{\theta} \in \Theta} g(\vec{\theta}). \qquad (1)$$

Often, finding the optimal configuration $\vec{\theta}^*$ is deemed impossible from a practical perspective due to limited resources such as optimization time and cost, so that one resorts to finding a configuration that is as good as possible given the constraints. In the following, three approaches for doing so are discussed: a heuristic method ("Potente heuristic"), statistical experimental design, and a machine-learning-based method called Bayesian optimization.
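To make the formalization concrete, the following Python sketch enumerates a small, purely hypothetical configuration space $\Theta = \Theta_1 \times \Theta_2 \times \Theta_3$ and returns the maximizer of a made-up objective standing in for $g$. Parameter names, domains, and the toy objective `toy_g` are illustrative assumptions, not values from this study. The point is that exhaustive search needs one evaluation per element of $\Theta$, which is impractical when every evaluation of $g$ means welding and tensile-testing real specimens.

```python
import itertools
import math
from typing import Callable, Dict, Sequence, Tuple

Config = Dict[str, object]


def exhaustive_argmax(
    domains: Dict[str, Sequence[object]], g: Callable[[Config], float]
) -> Tuple[Config, float]:
    """Evaluate g on every element of Theta_1 x ... x Theta_d and keep the best."""
    names = list(domains)
    best_cfg: Config = {}
    best_val = float("-inf")
    for values in itertools.product(*(domains[n] for n in names)):
        cfg = dict(zip(names, values))
        val = g(cfg)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val


# Hypothetical domains and toy objective (NOT measured weld strengths).
domains = {
    "heating_time_s": range(10, 61, 10),
    "temperature_C": range(200, 301, 25),
    "regulation": ("path", "force"),
}


def toy_g(cfg: Config) -> float:
    return (
        45.0
        - (cfg["heating_time_s"] - 40) ** 2 / 100
        - (cfg["temperature_C"] - 250) ** 2 / 1000
        + (1.0 if cfg["regulation"] == "path" else 0.0)
    )


print("|Theta| =", math.prod(len(d) for d in domains.values()))
print(exhaustive_argmax(domains, toy_g))
```

Even this toy space has 60 elements; with five welded specimens per configuration, exhaustive search would already require 300 welds.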

Potente heuristic
The heuristic was established by Potente at Kunststofftechnik Paderborn [8]. It comprises a few essential formulas that specify the relevant parameters of heated tool butt welding. These rules are based on the respective thermoplastic type (amorphous or semi-crystalline), its characteristics, such as the melting temperature $T_M$ or the glass transition temperature $T_G$, and geometric properties such as the thickness $d$ of the components. Other material characteristics, such as viscosity, are not included in the simple equations of the Potente heuristic. The individual equations are shown in Table 1.
If these rules result in values that are not achievable with the individual machine, such as a too high $T_{\text{Heating Element}}$, an alternative heating element temperature can be taken from tables in the literature. These table values are the result of extensive research and were also established by Potente [8]. To determine the melt layer thickness $L_0$, two components made of the same material are welded, using the heating temperature calculated from Table 1. The heating time is initially set to an arbitrary value, and the process is force-regulated at a pressure of 1 MPa. At this pressure, according to Potente, 95% of the previously generated melt layer flows into the weld seam. The melt layer produced can then be determined from the positions of the tools in the welding system.
The heating times are adjusted until the melt layer thickness calculated from Table 1 is reached. This fully specifies the parameters required for the process. The Potente heuristic only suggests path-regulated configurations, since only the welding path $s_F$, but no welding force, can be calculated.
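The iterative adjustment of the heating time can be sketched as follows. This is a minimal illustration under stated assumptions, not Potente's procedure verbatim: the Table 1 formulas are not reproduced, so the target $L_0$ is a placeholder; the measured joining displacement is assumed to correspond to 95% of the melt layer; and bisection is our own choice of search strategy. `weld_trial` is a hypothetical callback standing in for a real same-material weld, simulated here with a toy square-root growth model.

```python
import math
from typing import Callable

# Per the text (citing Potente): at 1 MPa about 95% of the melt layer flows into the seam.
SQUEEZE_FRACTION = 0.95


def melt_layer_from_displacement(displacement_mm: float) -> float:
    """Estimate the melt layer thickness from the measured joining displacement.

    Assumption: the displacement read from the tool positions equals 95% of the melt layer.
    """
    return displacement_mm / SQUEEZE_FRACTION


def tune_heating_time(
    weld_trial: Callable[[float], float],
    target_l0_mm: float,
    t_low_s: float,
    t_high_s: float,
    tol_mm: float = 0.02,
    max_trials: int = 10,
) -> float:
    """Bisect the heating time until the estimated melt layer matches the target L0.

    weld_trial(t) is hypothetical: weld a same-material pair with heating time t
    (force-regulated, 1 MPa) and return the measured displacement in mm.
    Assumes the melt layer grows monotonically with the heating time.
    """
    t_mid = 0.5 * (t_low_s + t_high_s)
    for _ in range(max_trials):
        t_mid = 0.5 * (t_low_s + t_high_s)
        l0 = melt_layer_from_displacement(weld_trial(t_mid))
        if abs(l0 - target_l0_mm) <= tol_mm:
            break
        if l0 < target_l0_mm:
            t_low_s = t_mid  # too little melt: heat longer
        else:
            t_high_s = t_mid  # too much melt: heat shorter
    return t_mid


# Usage with a simulated trial (toy model: melt layer = 0.1 mm * sqrt(t in s)).
def simulated_trial(t_s: float) -> float:
    return SQUEEZE_FRACTION * 0.1 * math.sqrt(t_s)


heating_time = tune_heating_time(simulated_trial, target_l0_mm=0.5, t_low_s=5.0, t_high_s=60.0)
print(f"heating time: {heating_time:.1f} s")
```

With the toy model, three simulated welds suffice to land within the tolerance, which mirrors the observation that this adjustment usually needs only a few samples.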

Statistical experimental design
Building on the Potente heuristic, partial factorial test plans are often used in industrial applications, because statistical design of experiments allows several factors to be optimized simultaneously, which is not possible with more intuitive methods such as one-factor-at-a-time. The configuration space is defined by the minimum and maximum values of each parameter. During the setup, it must be decided how the lower and upper bounds of the parameters are selected, depending on the application. Since the partial factorial test plan always combines minimum values of some parameters with maximum values of others, the selected limits must also be weldable in combination. For example, it must be possible to produce samples with the lowest temperatures and heating times and the largest welding path. In some cases, machine-related limits must also be considered. Finding appropriate limits under all of these conditions requires preliminary tests or extensive expert knowledge. It is also recommended to set the values of each parameter symmetrically around the value obtained from the Potente heuristic. In heated tool butt welding, a separate partial factorial test plan must be set up for each machine regulation strategy (force and path). These partial factorial test plans were generated with the software Design-Expert, which ultimately suggests a final configuration that presumably yields optimal performance according to the welded experiments. This configuration is then welded and tested as well.
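A partial (fractional) factorial plan of the kind described above can be sketched in a few lines. The example assumes a two-level $2^{5-1}$ half fraction (resolution V) in which the last factor is generated as the product of the other four, levels placed symmetrically around hypothetical Potente-heuristic center values, and a run order randomized once up front. Factor names, centers, and half-widths are placeholders, not the values of Tables 3 to 6; the plans in this study were generated with Design-Expert, whose design choice may differ.

```python
import itertools
import math
import random
from typing import Dict, List


def half_fraction_design(
    center: Dict[str, float], half_width: Dict[str, float], seed: int = 0
) -> List[Dict[str, float]]:
    """2^(k-1) half-fraction design; the last factor is the product of the others.

    Levels are center -/+ half_width, i.e., symmetric around the center values.
    The run order is randomized once, up front, with a fixed seed.
    """
    names = list(center)
    base, generated = names[:-1], names[-1]
    runs = []
    for signs in itertools.product((-1, 1), repeat=len(base)):
        coded = dict(zip(base, signs))
        coded[generated] = math.prod(signs)  # defining relation I = ABCDE
        runs.append({n: center[n] + coded[n] * half_width[n] for n in names})
    random.Random(seed).shuffle(runs)
    return runs


# Placeholder centers and half-widths (NOT the values from this study).
center = {"temp_pmma_C": 260.0, "temp_abs_C": 250.0, "time_pmma_s": 40.0,
          "time_abs_s": 40.0, "joining_path_mm": 1.0}
width = {"temp_pmma_C": 20.0, "temp_abs_C": 20.0, "time_pmma_s": 10.0,
         "time_abs_s": 10.0, "joining_path_mm": 0.5}

design = half_fraction_design(center, width)
for i, run in enumerate(design, start=1):
    print(i, run)
```

A half fraction halves the 32 runs of the full $2^5$ design while still separating all main effects and two-factor interactions, which is why such plans are attractive when every run means welding and testing several specimens.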

Bayesian optimization
Bayesian optimization (BO) [6] is one of the most important techniques for tackling black-box function optimization. Given a costly-to-evaluate function $f\colon \mathbb{R}^d \to \mathbb{R}$, the goal of black-box function optimization is to find the input that maximizes (or minimizes) $f$, i.e.,

$$\vec{x}^* \in \arg\max_{\vec{x} \in \mathbb{R}^d} f(\vec{x}).$$

The problem is called black-box because the function to be optimized is not given in explicit form, for example as an analytical expression. Instead, the only way to "access" the function is via point-wise evaluations: when a query point $\vec{x}$ is submitted, the black box returns the value $f(\vec{x})$.
Parameter optimization can be posed as a black-box function optimization problem by defining the function $f = g$ as the objective function to be optimized, i.e., the weld strength.
BO tackles the optimization problem by alternating between creating or updating a cheap-to-evaluate, probabilistic surrogate model $\hat{f}\colon \Theta \to \mathcal{P}(\mathbb{R})$ based on the evaluations performed on $f$, and exploiting this surrogate model through a so-called acquisition function to decide on the next point at which to evaluate $f$. Here, $\mathcal{P}(\mathbb{R})$ denotes the set of probability distributions over $\mathbb{R}$. This process is repeated until a stopping criterion is met, and the best configuration found is returned. The underlying idea is to quickly find well-performing configurations while evaluating $f$ on as few points as possible.
As the description above suggests, the two core elements of BO are the surrogate model and the acquisition function. Although the former can in principle be instantiated with any probabilistic machine learning model, the most common choices in practice are Gaussian processes, tree Parzen estimators [16], and random forests [17], as all of these yield reasonably good models when trained on only a few data points.
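As an illustration of a probabilistic surrogate, the following sketch implements Gaussian-process regression for a single hypothetical input (for example, one heating time) in plain Python. It returns a predictive mean and standard deviation rather than a point estimate. The squared-exponential kernel, its fixed (not fitted) hyperparameters, the empirical constant mean, and the data are assumptions for illustration; the surrogates used with SMAC in this study (cf. Table 7) are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: float, b: float, length: float, signal: float) -> float:
    """Squared-exponential covariance between two inputs."""
    return signal**2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with m = L L^T (m must be symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve low @ x = b for a lower-triangular matrix low."""
    x: List[float] = []
    for i, bi in enumerate(b):
        x.append((bi - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def gp_predict(xs: Sequence[float], ys: Sequence[float], x_new: float,
               length: float = 10.0, signal: float = 5.0,
               noise: float = 0.5) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a GP with a constant (empirical) mean."""
    prior_mean = sum(ys) / len(ys)
    k = [[rbf(a, b, length, signal) + (noise**2 if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    low = cholesky(k)
    z = forward_solve(low, [y - prior_mean for y in ys])  # L z = y - m
    v = forward_solve(low, [rbf(x_new, a, length, signal) for a in xs])  # L v = k*
    mean = prior_mean + sum(vi * zi for vi, zi in zip(v, z))
    var = signal**2 - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 1e-12))


# Hypothetical observations: heating time in s -> weld strength in MPa.
xs, ys = [100.0, 120.0, 140.0], [40.0, 48.0, 44.0]
print(gp_predict(xs, ys, 120.0))  # near an observation: small standard deviation
print(gp_predict(xs, ys, 300.0))  # far away: falls back to the prior
```

The uncertainty grows away from observed configurations, which is exactly the information an acquisition function needs to trade off exploitation against exploration.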
The surrogate model has to be probabilistic, i.e., return a probability distribution instead of a point estimate. This is required because the acquisition function, which essentially rates configurations according to how valuable it is to sample them, has to balance between sampling points that will most likely yield very good performance (exploitation) and points that could yield even better, but also much worse, performance (exploration). To balance these two criteria, it needs access not only to the expected value of a configuration according to the surrogate model but also to a quantification of the uncertainty about this expectation. The most common acquisition function is expected improvement (EI) [18,19], defined as

$$\mathrm{EI}(\vec{x}) = \mathbb{E}\left[\max\{f(\vec{x}) - f(\hat{\vec{x}}), 0\}\right],$$

where $\hat{\vec{x}}$ is the best configuration seen so far. The expectation is required because $f(\vec{x})$ is unknown at the time $\mathrm{EI}(\vec{x})$ is computed and is thus a random variable. For the same reason, $f$ is replaced by the surrogate $\hat{f}$ for the actual computation, which means that the expectation is taken with respect to the probability distribution returned by $\hat{f}$. EI thus selects the configuration that maximizes the expected improvement over the current incumbent $\hat{\vec{x}}$. Depending on the assumptions underlying the surrogate model, the EI criterion can be computed or approximated efficiently.
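When the surrogate predicts a Gaussian $\mathcal{N}(\mu(\vec{x}), \sigma^2(\vec{x}))$, as a Gaussian process does, EI has the well-known closed form $\mathrm{EI}(\vec{x}) = (\mu - f^*)\,\Phi(z) + \sigma\,\varphi(z)$ with $z = (\mu - f^*)/\sigma$, where $f^* = f(\hat{\vec{x}})$ and $\Phi$, $\varphi$ are the standard normal CDF and density. The sketch below implements this for maximization; the candidate predictions are hypothetical, and the incumbent of 49.5 MPa merely reuses the Potente result reported in the Results section for illustration.

```python
import math
from typing import List, Tuple


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """EI for maximization when the surrogate predicts N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)  # no uncertainty: the improvement is deterministic
    z = (mu - best) / sigma
    return (mu - best) * norm_cdf(z) + sigma * norm_pdf(z)


def pick_next(candidates: List[Tuple[float, float]], best: float) -> int:
    """Index of the candidate (mu, sigma) with the highest EI."""
    scores = [expected_improvement(mu, sd, best) for mu, sd in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical surrogate predictions (mean, std) in MPa for three candidate configurations.
candidates = [(49.0, 0.5), (47.0, 4.0), (45.0, 1.0)]
print(pick_next(candidates, best=49.5))
```

The second candidate is selected despite its lower predicted mean, because its large uncertainty leaves a realistic chance of beating the incumbent, which is the exploration behavior described above.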

Methodological comparison of methods
The aforementioned methods mainly differ in the way they traverse the solution space underlying the optimization problem (1), in the way they incorporate experimental data, and in the degree of interaction with the practitioner.
The Potente heuristic is based on formulas that have been manually derived by experts from experience and experiments. Based on these rules, all parameters except for the heating times can be fixed, and the traversal of the solution space mainly consists of iteratively adjusting the heating times as explained earlier, so that very little data is injected during the optimization. As such, the method not only requires a domain expert to compute the configuration from the rules, but also an iterative interaction with the domain expert to adjust the heating times. However, this interaction is rather limited, as the adjustment can usually be done with a very small number of samples.
In contrast, the partial factorial experimental design is a classical statistical method that traverses the solution space in a controlled manner by investigating combinations of extreme parameter values, generated upfront, in order to arrive at a final configuration to recommend. The finally recommended configuration is based almost exclusively on the generated experimental data, except for the configuration resulting from the Potente heuristic, which is taken as the center for defining the extreme values. The interaction with the practitioner is on a higher level than for the Potente heuristic, but still quite limited: the approach essentially generates a set of configurations upfront, which have to be welded by the practitioner before it suggests the final configuration, which has to be welded as well.
Lastly, Bayesian optimization is a purely data-driven approach, which can be initialized with the configuration resulting from the Potente heuristic. It traverses the solution space based on its current knowledge, i.e., the acquired data and the surrogate model built from it, with the goal of sampling as few points as possible. Due to this traversal strategy, it requires a high degree of iterative interaction with the domain expert, who has to feed the result for one configuration back to the method before being provided with the next one.

Experiment setup
Similar to the production of automotive taillights, PMMA is welded to ABS. These two amorphous thermoplastics have different viscosities, which makes it difficult to achieve a stable and optimal operating point for both materials in such a combination. This challenging setting allows for a better comparison of the methods. Details of the two materials are listed in Table 2.
The different methods are used to optimize the weld strength and are compared with each other. Figure 2 shows the machine used to perform the welding process, together with its components. The components are half tensile bars with a thickness of 4 mm, which are welded to produce a complete tensile bar; Figure 3 shows a welded sample. For each test point, five samples are welded, cooled overnight in standard climate (23 °C, 50% relative humidity), and then subjected to a short-time tensile test on a Z010 universal testing machine from Zwick Roell. The average values and standard deviations of the different test points are then compared. To ensure that even small differences in weld strength can be detected, the materials are tested in the brittle state at a very low temperature of −40 °C. The short-time tensile tests are carried out at a crosshead speed of 20 mm/min.
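The per-test-point statistics amount to a mean and a standard deviation over the five specimens. The sketch assumes the sample standard deviation (divisor $n-1$); the text does not state which estimator was used, and the strengths below are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def summarize_test_point(strengths_mpa: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the specimens welded for one test point."""
    if len(strengths_mpa) < 2:
        raise ValueError("need at least two specimens for a standard deviation")
    return statistics.mean(strengths_mpa), statistics.stdev(strengths_mpa)


# Five hypothetical tensile results (MPa) for one test point.
mean, sd = summarize_test_point([48.0, 50.0, 49.0, 52.0, 51.0])
print(f"{mean:.2f} MPa +/- {sd:.2f} MPa")
```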

Potente heuristic
Since the materials PMMA and ABS are amorphous thermoplastics, as mentioned earlier, the heuristic from Table 1 delivers the values presented in Table 3.
Here, the heating element temperatures from the tables in the literature were used, since a calculated value would have exceeded the maximum possible temperature of the machine.

Statistical experimental design
The values for the partial factorial test plan are bounded from above by the machine, especially the temperature of the heating elements. For the limits of the heating time and the parameters joining pressure and joining […] Table 4 for the limits of the test plan, from which the two test plans, one for path and one for force regulation, are generated (Tables 5 and 6). For the partial factorial experimental design, only the minimum and maximum values are used for welding. As is common in statistical experimental design, the order of the individual test points is randomized; the randomized order is set at the beginning and not changed thereafter [20].
Table 2 (excerpt), melt volume-flow rates: … cm³/10 min (230 °C, 3.8 kg); 34 cm³/10 min (220 °C, 10 kg)

Bayesian optimization
The experiment is modeled using the parameter optimization software SMAC [21]. Four SMAC processes are run in parallel with varying settings, differing in the kind of surrogate model and the strategy used to initialize it (cf. Table 7). In particular, two processes are initialized with the configuration computed using the Potente heuristic as a means of injecting domain knowledge into the process. The decision to run four processes in parallel was motivated by practical considerations, as welding several configurations adds little overhead compared with welding a single one once the machine is blocked for the experiments anyway. Our BO system is designed such that whenever two processes request the same configuration, it has to be welded only once, and the resulting weld strength is fed back into both processes. Interestingly, the four processes quickly converged to behaving like only two processes, as the kind of surrogate model used turned out to be unimportant.
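The deduplication of requests from parallel SMAC processes can be sketched as a small shared queue: identical configurations are welded once, and the measured strength is handed to every process waiting for it. The class, its methods, and the process names are hypothetical and reflect neither the actual implementation nor SMAC's API.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[Tuple[str, object], ...]


class SharedWeldQueue:
    """Hypothetical helper: weld each requested configuration only once."""

    def __init__(self) -> None:
        self._waiting: Dict[Key, List[str]] = {}  # configuration -> waiting process ids
        self._results: Dict[Key, float] = {}  # configuration -> measured weld strength

    @staticmethod
    def _key(config: Dict[str, object]) -> Key:
        return tuple(sorted(config.items()))

    def request(self, process_id: str, config: Dict[str, object]) -> Optional[float]:
        """Return a known strength immediately, or None if the process has to wait."""
        key = self._key(config)
        if key in self._results:
            return self._results[key]
        self._waiting.setdefault(key, []).append(process_id)
        return None

    def to_weld(self) -> List[Dict[str, object]]:
        """Distinct configurations that still have to be welded (each exactly once)."""
        return [dict(key) for key in self._waiting]

    def report(self, config: Dict[str, object], strength_mpa: float) -> List[str]:
        """Store a measured strength and return the processes that receive it."""
        key = self._key(config)
        self._results[key] = strength_mpa
        return self._waiting.pop(key, [])


queue = SharedWeldQueue()
cfg = {"regulation": "path", "heating_time_1": 40}
queue.request("process_a", cfg)
queue.request("process_b", cfg)
print(len(queue.to_weld()), queue.report(cfg, 50.1))
```

Keying on the sorted parameter items makes the lookup independent of the order in which a process lists its parameters.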
The underlying configuration space consists of 7 parameters, 6 of which are integer-valued (one for each of those mentioned in Section 2), while one features a binary categorical domain indicating whether the welding process is path- or force-regulated. The lower and upper bounds of each parameter can be found in Table 8. Since the configuration space boundaries were intentionally set wider than usual to allow the optimization to explore uncommon configurations, the welded specimens were visually inspected before testing the weld strength, so that obviously failed attempts, as shown in Fig. 4, could be discarded early. The weld strength was set to 0 for those cases.
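A sketch of such a mixed configuration space and of the objective handling described above follows. The parameter names and bounds are placeholders (the real bounds are in Table 8), uniform random sampling only illustrates the domain, and the wrappers assume that visually failed welds are scored as 0 MPa and that the cost passed to SMAC, which minimizes by default, is the negated weld strength.

```python
import random
import statistics
from typing import Dict, List, Union

Config = Dict[str, Union[int, str]]

# Placeholder names and bounds for the six integer parameters (NOT the values of Table 8).
INTEGER_BOUNDS = {
    "heating_temp_1": (200, 400),
    "heating_temp_2": (200, 400),
    "heating_time_1": (5, 150),
    "heating_time_2": (5, 150),
    "joining_param_1": (1, 20),
    "joining_param_2": (1, 20),
}
REGULATION = ("path", "force")  # binary categorical parameter


def sample_config(rng: random.Random) -> Config:
    """Draw one configuration uniformly at random from the mixed space."""
    cfg: Config = {name: rng.randint(lo, hi) for name, (lo, hi) in INTEGER_BOUNDS.items()}
    cfg["regulation"] = rng.choice(REGULATION)
    return cfg


def weld_strength_objective(strengths_mpa: List[float], visually_failed: bool) -> float:
    """Mean weld strength of a test point; visually failed (or untested) welds count as 0 MPa."""
    if visually_failed or not strengths_mpa:
        return 0.0
    return statistics.mean(strengths_mpa)


def smac_cost(strengths_mpa: List[float], visually_failed: bool) -> float:
    """SMAC minimizes a cost, so the weld strength to be maximized is negated."""
    return -weld_strength_objective(strengths_mpa, visually_failed)


print(sample_config(random.Random(42)))
print(smac_cost([49.0, 50.5, 48.2], visually_failed=False))
```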
To simplify interaction with the BO processes, a web frontend (cf. Figs. 8 and 9) was implemented, which presented the configurations to be welded next and allowed the results to be entered in terms of weld strength. When the result for a configuration was entered, it was automatically fed to the SMAC processes waiting for it, and the next configurations were generated and added to the frontend.

Results
To obtain a visual representation of the difficulty of the heated tool butt welding parameter optimization problem, a principal component analysis (PCA) [22] was performed over the configurations obtained from the experiments of all optimization methods. We note that this was done for illustrative purposes only. Each configuration was then plotted with respect to its first and second principal components, and the corresponding weld strength was visualized by color in Fig. 5.
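The projection onto the first two principal components can be sketched without external libraries via power iteration with deflation on the covariance matrix. Standardizing each parameter first is our assumption (the parameters have different units, and the text does not say whether this was done), and the example configurations are hypothetical.

```python
import math
import random
from typing import List


def pca_2d(rows: List[List[float]], iters: int = 500, seed: int = 0) -> List[List[float]]:
    """Scores of each row on the first two principal components of standardized data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # Standardize each parameter (assumption); constant columns get a divisor of 1.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
            for j in range(d)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    cov = [[sum(row[a] * row[b] for row in z) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.uniform(0.1, 1.0) for _ in range(d)]
        for _ in range(iters):  # power iteration converges to the dominant eigenvector
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflation: remove the found component before searching for the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(row[j] * c[j] for j in range(d)) for c in components] for row in z]


# Hypothetical configurations: heating time (s), temperature (°C), regulation (1 = path).
configs = [[10, 200, 1], [20, 210, 0], [30, 235, 1], [40, 240, 0], [50, 260, 1]]
for scores in pca_2d(configs):
    print([round(s, 3) for s in scores])
```

In a plot like Fig. 5, each pair of scores would be drawn as one point, colored by the measured weld strength.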
The visualization illustrates that the problem is rather hard, because configurations that are very close together in the plot can yield drastically different weld strengths.
In the following, the results achieved by the different optimization methods are presented.

Potente heuristic
The evaluation of the parameters from the Potente heuristic results in a weld strength of 49.5 MPa with a standard deviation of 2.33 MPa. Since this configuration was also used as a starting point for some of the BO processes, these results also appear as test point 2 in Fig. 6.

Statistical experimental design
The results of the two statistical experimental designs are shown in Table 10. Based on the results of each partial factorial experimental design, a final configuration with presumably optimal parameters is suggested for the respective machine regulation. The weld strength of the final configurations suggested by the corresponding fractional design is also shown in the table, and the weld parameters of the final configurations are listed in Table 9.
It can be seen in Table 10 that the average weld strength varies significantly among the individual test points. In addition, the standard deviations are sometimes very large. In the statistical test design used, this is due to the fact that only configurations featuring extreme values were welded. The final configuration suggested by the factorial design for the path-regulated process shows the highest weld strength, at 47.41 MPa. For the force-regulated welds, the weld strengths are lower overall. In addition, the final configuration does not show the best properties here: test point 15 achieves a higher strength of 42.06 MPa for the force-regulated weld. Since the aim of the project was to configure the best weld properties with the different methods, this value of test point 15 is used below for the force-regulated process.

Bayesian optimization
Figure 6 indicates the weld strength associated with the configurations sampled by the Bayesian optimization processes. Configurations featuring a path-regulated welding process are marked with blue dots, and those featuring a force-regulated process are marked with green crosses. The orange line represents a linear regression fitted to the observed weld strengths.
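The trend line in Fig. 6 can in principle be reproduced by an ordinary least-squares fit of weld strength against the experiment index. The sketch assumes the x-axis is the running test-point number $1, \dots, n$; the strengths used below are hypothetical.

```python
from typing import List, Tuple


def fit_line(strengths_mpa: List[float]) -> Tuple[float, float]:
    """Least-squares line y = slope * i + intercept over test-point indices i = 1..n."""
    n = len(strengths_mpa)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = sum(strengths_mpa) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, strengths_mpa))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical weld strengths in sampling order (0 MPa = visually failed weld).
slope, intercept = fit_line([0.0, 49.5, 31.2, 44.0, 0.0, 46.8, 50.3])
print(f"slope = {slope:.2f} MPa per experiment, intercept = {intercept:.2f} MPa")
```

A positive but small slope corresponds to the slow average improvement discussed below.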
Firstly, it becomes evident that the BO processes sampled many more path-regulated configurations than force-regulated ones, which is consistent with the lower weld strength of the latter found in the partial factorial design studies. However, the plot also shows that, although the force-regulated configurations often achieve a lower weld strength than the path-regulated ones, the dispersion of the achieved strengths, i.e., the variance, is lower.
Secondly, the linear regression line indicates that the average weld strength of the sampled configurations increases with the number of experiments performed, as one would expect if the optimization makes progress. Unfortunately, the increase is rather slow. Table 11 shows the top 5 configurations with respect to the achieved weld strength. Test point 2 corresponds to the configuration computed using the Potente heuristic, whereas test point 40 is the best configuration, which was found by one of the two processes initialized with the aforementioned Potente heuristic configuration. Thus, injecting domain knowledge into the optimization process seems to be beneficial. These top 5 results show that even unusual parameter configurations can achieve a high weld strength. The highest weld strength was reached with the parameters of test point 40, which features very long heating times of more than 2 min. Such long heating times result in thick melt layers and thus in strongly pronounced weld seams, which can explain the high weld strength at this point. Test points 55 and 10 are very similar to each other: the heating times are very similar, and the welding path is even identical. Since their performance is also very similar, the standard deviation, in this case that of test point 55, should be used to identify the better configuration. Figure 10 shows a comparison of the average weld strength achieved with the best configuration found by the different methods. The standard deviation across the different experiment repetitions is shown as a black line.

Comparison of results
As one can see, both the Potente heuristic and the Bayesian optimization approach achieve a similar weld strength of around 50 MPa with a rather low standard deviation, although the BO approach achieves a slightly better average weld strength. The two factorial design methods yield both a worse average weld strength and a larger standard deviation. In industrial applications, not only the weld strength itself is of interest but also the effort required to find weld parameters leading to the respective weld strength. To this end, Fig. 7 presents the number of working days needed to run each process until the final weld parameters can be determined. It becomes evident that, although the BO approach was able to generate the configuration with one of the best average weld strengths, it took by far the most effort in terms of working days spent welding. Considering this and the rather negligible

Conclusion
This paper considered the problem of determining suitable parameters for heated tool butt welding when joining different thermoplastics. In an extensive experimental study, a simple heuristic (Potente) was compared to more sophisticated methods, i.e., statistical experimental design and Bayesian optimization. Although Bayesian optimization yields peak performance in the experiments, finding these parameter values also comes at a high cost, rendering the approach less practical. Surprisingly, applying a simple heuristic that solely takes into account known characteristics of the materials results in competitive performance without incurring any additional costs for trying out different parameter values.
Despite the number of trials Bayesian optimization requires to reach a certain performance, its peak performance and its ability to pick parameter values that would most likely not be considered by practitioners still make future work in this direction appealing. First, the Bayesian optimization method incorporates only little domain knowledge, in the form of the starting configuration derived from the Potente heuristic. Further enriching the optimization method with such domain knowledge could help to sample parameter values more efficiently and thus reduce the costs of experimentation [23]. Another promising direction is multi-fidelity optimization, where the number of repetitions per configuration is budgeted in order to reduce the overall number of weld operations or to distribute the overall budget more cleverly [24,25]. Furthermore, such approaches typically allow for better parallelization, which would also make the optimization approach more practical. Lastly, one could try to optimize over all possible welding parameters, instead of a limited set, in order to investigate whether new knowledge regarding the importance of parameters or the connections between them can be discovered.

Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Fig. 10 Comparison of the weld strength of the best configuration found by the different optimization methods