Critical joint identification for efficient sequencing

Identifying the optimal sequence of joining is an exhaustive combinatorial optimization problem. On each assembly, there is a specific number of weld points that determine the geometrical deviation of the assembly after joining. The number and sequence of such weld points play a crucial role both for sequencing and assembly planning. While there are studies on identifying the complete sequence of welding, identifying these critical joints has not been addressed. In this paper, based on the principles of machine intelligence, black-box models of the assembly sequences are built using support vector machines (SVM). To identify the number of critical weld points, principal component analysis is performed on a proposed data set, evaluated using the SVM models. The approach has been applied to three assemblies of different sizes and has successfully identified the corresponding critical weld points. It is shown that a small fraction of the weld points of the assembly can account for more than 60% of the variability in the assembly deviation after joining.


Introduction
In the manufacturing industry, nominal geometries do not exist. Every component to be assembled is subject to geometrical variation. Identifying the sources of variation and controlling them is a common challenge. Robust design perspectives [14] and geometry assurance activities [16,28] have been developed to control the effect of geometrical variation in the final product. The joining operation is one of the most critical processes causing the final geometrical variation. For sheet metal assemblies, where the sheets are compliant, the joining operation is mainly spot welding or riveting. With the recent development within digitalization and availability of data, a self-compensating assembly line for sheet metal assemblies has been developed based on digital twins [18]. In this concept, a Computer-Aided Tolerancing (CAT) tool takes advantage of local optimizers to improve the geometrical quality of the assemblies in different disciplines. For the joining operation, the optimal sequence of the spot welding process is proposed using the digital twin for individualized assemblies [22]. For each assembly, a number of the most important welds need to be selected and set in a sequence to secure the geometry in each assembly station. Therefore, first, the number of welds that are critical for the assembly needs to be determined. Second, the sequence of such welds, considering the rest of the weld points, should be decided on. The question of the sequence of welding is categorized as a classical Operations Research (OR) problem. It is an NP-hard combinatorial optimization problem with an expensive evaluation function. In this paper, the above-mentioned problems are tackled with a state-of-the-art clustering approach for weld identification and a novel machine learning perspective on the sequence optimization.
Correspondence: Roham Sadeghi Tabar, rohams@chalmers.se, Department of Industrial and Materials Science, Chalmers University of Technology, Gothenburg, Sweden

Assembly evaluation
To evaluate the geometrical deformation of the sheet metal assemblies, given the part tolerances and the material properties, CAT-tools based on the Finite Element Method (FEM) are used. With the Method of Influence Coefficients (MIC), a linear relationship between the part deviation and the assembly deviation can be achieved by constructing a sensitivity matrix as a medium. This method has been shown to perform faster than other direct methods for this purpose [11]. The accuracy of the method is increased by combining the MIC with a contact modeling algorithm. The contact points in this method prevent the parts from penetrating each other in adjacent areas during the assembly [9]. The calculation time of the MIC and contact modeling has been reduced with a new method to calculate the sensitivity matrix of the spot welded assemblies in a sequence [12,13].
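The linear relationship that the MIC provides can be illustrated with a minimal sketch. The sensitivity matrix and the part deviations below are hypothetical numbers chosen only for illustration; they are not output from a CAT-tool, and the function name is an assumption of this sketch.

```python
def assembly_deviation(sensitivity, part_deviation):
    """Linear MIC-style map: each assembly deviation is a weighted sum of part deviations."""
    return [
        sum(s_ij * d_j for s_ij, d_j in zip(row, part_deviation))
        for row in sensitivity
    ]

# Hypothetical sensitivity matrix: 2 assembly nodes x 3 part nodes.
S = [
    [0.5, 0.0, 0.2],
    [0.1, 0.8, 0.0],
]
# Hypothetical part deviations in mm.
part_dev = [1.0, -0.5, 0.25]

assembly_dev = assembly_deviation(S, part_dev)
print(assembly_dev)
```

Once the sensitivity matrix is available, evaluating a new set of part deviations is a single matrix-vector product, which is why the approach is faster than rerunning a full FEM analysis for every case.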

Joining sequence optimization
Finding the optimal joining parameters has been studied over time. The usage of neural networks was addressed in the early 1990s [3,6]. In more recent studies, a deformation prediction approach for continuous welding applications has been introduced using support vector machines (SVM) and evolutionary optimization [25]. Principal component analysis (PCA) has been studied together with optimization approaches for the optimal design of joining parameters, such as welding time, current, and electrode force [31].
Using the state-of-the-art MIC and contact modeling, the sheet metal assemblies are evaluated for a specific joining sequence [12]. Identifying which sequence of spot welding results in the optimal geometrical outcome of the assembly is an NP-hard combinatorial optimization problem. The assembly deformation after welding, calculated using MIC and contact modeling, is the objective function. Evolutionary algorithms, specifically the Genetic Algorithm (GA), have been studied for this problem [5,8,17]. The performance of other evolutionary algorithms has been compared using the same approach [19]. Rule-based approaches have been introduced and have been shown to improve the performance of the evolutionary algorithms on the problem [21,24]. Using neural networks, a surrogate modeling approach has been introduced, representing the sequence-deformation relationship of the assembly [22]. A new stepwise algorithm has also been introduced to identify the optimal welding sequence with the minimum number of evaluations with the method presented above [23]. In most of the studies, the aim is to present a complete sequence that results in the minimum assembly deformation after joining. It has been shown in [20] that some of the weld points are more important in the sequence than others. A sensitivity analysis approach with regard to the weld gaps has been shown to be effective. However, the number of critical welds in the sequence has not been discussed. Identifying such welds accurately helps reduce the problem size for sequence optimization. For industrial applications, the weld points that are set initially on the assembly to lock the geometry for the downstream processes are referred to as geometry weld points or geo-spots [29]. These weld points are often identified based on physical experimentation over time, and the sequence of such weld points is critical for the geometrical outcome.
The rest of the weld points on the assembly are referred to as re-spots, where the sequence of these weld points does not impact the final geometrical outcome. The challenge is to identify the number of the geometry points, the location of these points among a set of alternatives, and their sequence to minimize the geometrical deviation of the assembly. The identification of the critical weld points enables efficient assembly process planning while improving the geometrical quality [27].
To identify the most critical weld point in a sequence, with respect to geometrical deviation, a method where joining sequences can be evaluated time-efficiently, and an approach to distinguish between the sequence elements are required. While previous studies have mainly focused on introducing an optimal complete sequence, where the problem is finding the optimal sequence for an assembly with a fixed number of joints, the problem of identifying the most critical joints, their number and their location from a set of joints has not been addressed. Therefore, in this paper, an approach to evaluate the sequences time-efficiently is introduced, and a method to identify the number and location of the critical joining elements in a sequence is proposed.
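The growth of the search space can be made concrete with a short sketch. A complete sequence of n weld points is a permutation, so there are n! candidates; the weld counts 7, 9, and 10 are those of the reference assemblies used later in the paper. The function name is an assumption of this sketch.

```python
import math

def sequence_count(n_welds):
    """Number of complete welding sequences for n_welds weld points (n!)."""
    return math.factorial(n_welds)

for n in (7, 9, 10):
    print(f"{n} weld points: {sequence_count(n)} possible sequences")
```

Since each evaluation requires a contact-aware variation simulation, even the ten-weld case, with more than three million sequences, is impractical to enumerate, which motivates reducing the problem to a few critical welds.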

Scope of the paper
In a sheet metal assembly, a number of weld points, known as geometry weld points, play an important role in the final geometrical quality. Identifying the number of these weld points, their location among a number of candidates, and their sequence leads to a reduced problem size for sequencing and to efficient assembly process planning [29]. Previous studies focus on presenting an optimal sequence considering all the joints, while there is no generic approach for the identification of critical weld points. In this paper, the number, location, and sequence of these critical joints are identified with an input-output SVM model for sequence evaluation, and PCA to define the number of the critical joints. The first section provided an introduction to the problem. The rest of the paper is structured as follows. In Sect. 2, the proposed method for identification and sequence determination of the critical weld points is introduced. Section 3 introduces the reference sheet metal assemblies. The method is evaluated on the sheet metal assemblies, and the results are presented in Sect. 4. Finally, the conclusions and future work plans are presented in Sect. 5.

Proposed identification method
In this section, the proposed method, including the sequence prediction method and the critical weld point identification, is described. The prediction method is built on the surrogate modeling approach studied in [22]. In that study, to predict the geometrical outcome of the spot welded assemblies, a radial basis function (RBF) network is designed and trained with a specified sample. That approach serves the purpose of complete sequence optimization, considering all the weld points in the assembly. In this paper, accurate evaluation of the sequences needed for critical weld point identification is essential. Therefore, the above approach is further expanded with SVM networks tuned to predict the outcome of the sequences accurately, and PCA for critical weld point identification. The list of the variables that are introduced throughout the paper and their description are presented in Table 1.

Sequence data processing
Evaluating the initial sequence elements, specifically the initial triplet, has been shown to give an overview of the changes between the sequences with respect to the geometrical deviation of the assembly after welding [22]. Consider an assembly with seven weld points, $[w_1, w_2, w_3, w_4, w_5, w_6, w_7]$. All the combinations of the first three elements, $w_1, w_2, w_3$, with any combination from the other elements, provide a clear understanding of the major differences between the sequences. Figure 1 shows all the geometrical deviations in an assembly with seven weld points and the selected sample, as stated above. To represent the geometrical deviation of the assembly, the root mean square of the deviations of all the mesh nodes in a specific direction, here the normal direction, $y$, is calculated using the state-of-the-art MIC and contact modeling approach presented by [12]. Any other preferred measure of deviation can be used in the approach. This information can be used to divide the sequences into different intervals, based on the differences between the sequence intervals. Within each interval, the observations can be refined to capture the variability among all the sequence elements.
Fig. 1 Comparison between all the permutations and sample of first triplets
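A minimal sketch of the first-triplet sample and the deviation measure is given below. The text does not specify how the elements after the leading triplet are ordered, so this sketch assumes they are appended in index order. The function names are hypothetical, and the deviation values passed to `rms` are placeholders rather than simulation output.

```python
import math
from itertools import permutations

def first_triplet_sample(welds):
    """One sequence per ordered leading triplet.

    Assumption: the welds not in the triplet follow in their original index order.
    """
    sample = []
    for triplet in permutations(welds, 3):
        rest = [w for w in welds if w not in triplet]
        sample.append(list(triplet) + rest)
    return sample

def rms(deviations):
    """Root mean square of nodal deviations in one direction."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

welds = [1, 2, 3, 4, 5, 6, 7]
sample = first_triplet_sample(welds)
print(len(sample), "sampled sequences out of", math.factorial(len(welds)))
print(rms([0.3, -0.4]))  # placeholder nodal deviations in mm
```

For seven welds this samples one sequence per ordered triplet instead of all 5040 permutations, which keeps the number of expensive evaluations small while still varying the most influential positions.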
To identify such intervals of permutations, the initial elements of the sequence are evaluated, and a surrogate model is built with the approach previously introduced in [22]. The outliers in the data can then be found by comparing the differences between the surrogate model responses in each interval. Consider the sorted $n! \times n$ matrix of all the permutations of $n$, with a unique sequence of welds in each row. The corresponding input-output matrix has size $n! \times (n+1)$, where the output $\pi$, the assembly response to each sequence retrieved from the surrogate model, is in the last column of the matrix. The vector of the differences of the simulated responses is gathered in $D$. For the assembly with seven weld points in Fig. 1, the vector $D$ reveals the interval indices between the sequences. The limit for the changes in $D$ can be set to three standard deviations from the mean. Let $\mu_D$ and $\sigma_D$ be the mean and standard deviation of the vector $D$, and let $d$ denote an element of $D$. If $d \geq \mu_D \pm 3\sigma_D$, then the index of $d$ in the vector $D$ is a data division point in the input-output matrix. The outcome of this process is the divided data, where in each interval the response of the assembly to the sequences is within the specified limit. Figure 3 shows the generated intervals on the data set presented in Fig. 2. On this data set, 49 sequence intervals are generated where, for each interval, the assembly response lies within the defined threshold.
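The interval division can be sketched as follows. This sketch makes several assumptions that the text leaves open: $D$ is taken as the differences between consecutive responses in the sorted matrix, the threshold is applied two-sided around $\mu_D$, and the population standard deviation is used. The response values are hypothetical.

```python
import statistics

def split_into_intervals(responses, n_sigma=3.0):
    """Split sorted sequence responses into intervals at large response jumps.

    Returns half-open (start, end) row-index pairs.
    """
    # Assumption: D holds differences between consecutive rows.
    diffs = [b - a for a, b in zip(responses, responses[1:])]
    mu = statistics.mean(diffs)
    sigma = statistics.pstdev(diffs)

    intervals, start = [], 0
    for j, d in enumerate(diffs):
        # Assumption: a two-sided test, i.e. d outside mu +/- n_sigma * sigma.
        if abs(d - mu) > n_sigma * sigma:
            intervals.append((start, j + 1))
            start = j + 1
    intervals.append((start, len(responses)))
    return intervals

# Hypothetical surrogate responses (mm): two plateaus separated by one jump.
responses = [0.30] * 20 + [0.50] * 20
print(split_into_intervals(responses))
```

Each returned pair marks the rows of one interval, within which the surrogate response stays inside the chosen limit.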
To increase the accuracy of the variability representation in each arbitrary interval, the sequences are analyzed to identify the common elements. In each interval $i$, the sequences are represented by an interval matrix. The size of the matrix is the [...] $\ldots, n\}$; the $n$th sample set is identical to the interval matrix. With a brute force approach, a suitable sample set $s$ is selected for interval $i$. One approach to achieve this is to evaluate how many elements of the sequence in each interval can describe the variability between the sequences, within the specified error from a reference. The initial surrogate response is considered as the reference response for each sequence. To clarify this point further, consider the first interval of the sequences, whose response from the initial surrogate is visualized in Fig. 3. This interval is made of 96 sequence vectors of length $n = 7$. Finding the common elements in each position $x_1$ to $x_7$ between the 96 sequences gives seven sets of samples, one per position. The responses of each of these samples are shown in Fig. 4. In this interval, the first two sample sets did not have any members. The rest of the samples are presented. It can be seen that a model fitted to the fourth sample set could describe the variability in this interval. To identify which sample set is suitable for each interval, a model is fitted to each sample in each interval. The sample whose model results in a root mean square error, $e$, below the specified limit, $u$, is chosen to be evaluated using the CAT-tool.
The difference between the response of the initial surrogate to the sequences in interval $i$ and the response of the model built from sample $s$ of that interval is used to calculate $e$ as the root mean square error over the interval, where $L$ is the length of each interval. The error limit, $u$, can be set based on the computation time window. After evaluating the sample of the interval using the CAT-tool, the interval response set is updated with the observations of the chosen sample. Depending on the size of the problem, the necessary intervals can be selected. For larger problems, the intervals with the minimum assembly deviation and the intervals necessary for PCA analysis are updated.
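The sample selection by error limit can be sketched as below. The error $e$ is computed as a standard root mean square error over the $L$ sequences of an interval. The text does not state which sample is preferred when several satisfy the limit, so picking the first one in order is an assumption of this sketch. All response values and function names are hypothetical.

```python
import math

def rmse(reference, predicted):
    """Root mean square error e over an interval of length L."""
    L = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / L)

def pick_sample(reference, candidate_predictions, u):
    """Index of the first candidate sample model with e < u, or None if none qualifies."""
    for s, predicted in enumerate(candidate_predictions):
        if rmse(reference, predicted) < u:
            return s
    return None

# Hypothetical initial-surrogate responses for one interval (mm).
reference = [0.30, 0.32, 0.31, 0.35]
# Hypothetical responses of models fitted to two candidate sample sets.
candidates = [
    [0.40, 0.40, 0.40, 0.40],
    [0.30, 0.33, 0.31, 0.34],
]
chosen = pick_sample(reference, candidates, u=0.02)
print(chosen)
```

A tighter limit $u$ forces larger samples to be sent to the CAT-tool, so the limit acts as a trade-off between accuracy and computation time.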

Support vector machines
With the data processed, non-linear models need to be built for the intervals. There are several approaches available for this purpose, and it has been shown that radial basis functions can provide an accurate estimation for modeling the different deviations among the sequences [22]. SVM is a method that incorporates RBF kernels to transform the sequence input to the required assembly deviation form. It has also been shown that the performance of such models can be improved by hyper-parameter optimization [10], which can be performed in parallel for time-efficiency. Having a set of observations in each arbitrary interval of the sequences and the initial surrogate model, an SVM network is built for the interval to represent the variability between the sequences. SVM was introduced by [26] for data classification [7] and has been extensively used for non-linear modeling [4,30]. Considering the welding sequence as the multiple input and the assembly response as the single output, SVM can be used for non-linear function estimation [26].
The function $f(x)$ is the estimation of the assembly response to the weld sequence input, $x$:
$$f(x) = \sum_{l} w_l \, \phi_l(x) + b$$
where $w_l$ are the support vector weights, $\phi_l$ is the non-linear transfer function of the input $x$, and $b$ is the bias. With a Lagrange dual formulation, the kernel function $G(x_1, x_2)$ expresses the dot product of the transfer functions $\phi_1$ and $\phi_2$. The Lagrangian function is formed with the multipliers $\alpha_l$ and $\alpha_l^*$ for each weld sequence assembly response $x_l$. Through regularization, the weight factors are determined. Ultimately, the estimation function can be expressed as:
$$f(x) = \sum_{l} (\alpha_l - \alpha_l^*) \, G(x_l, x) + b$$
where the values of $(\alpha_l - \alpha_l^*)$ can be determined through quadratic programming. In this study, Gaussian RBF kernels are used in the SVM model, as previous studies have established the applicability of the RBF to such problems [22]. The RBF kernel function is parameterized by $\sigma$, the spread of the kernel function.
With the above formulation, for each interval $i$, $\hat{y}_i = f_i(x)$ is trained using the generated sample to estimate the response to the sequences in interval $i$. As a result, a non-linear SVM model represents the response of the sequences in the interval. To evaluate a given sequence, $x_l$, the interval corresponding to the sequence is first identified, and $f_i(x)$ is then used to estimate the response.
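How a trained interval model produces an estimate can be sketched as below. The Gaussian kernel is written in the common form $\exp(-\lVert x_1 - x_2 \rVert^2 / (2\sigma^2))$, which is an assumption about the exact convention used. The support vectors, coefficients, and bias are hypothetical values, not the result of quadratic programming or of the MATLAB setup used in the study.

```python
import math

def rbf_kernel(x1, x2, sigma):
    """Gaussian RBF kernel G(x1, x2) with spread sigma (assumed 2*sigma^2 scaling)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, coeffs, bias, sigma):
    """Estimate f(x) = sum_l coeff_l * G(x_l, x) + b, with coeff_l = alpha_l - alpha_l^*."""
    return bias + sum(
        c * rbf_kernel(sv, x, sigma) for sv, c in zip(support_vectors, coeffs)
    )

# Hypothetical model for one interval: two support sequences of three welds.
support_vectors = [[1, 2, 3], [3, 2, 1]]
coeffs = [0.05, -0.02]   # hypothetical (alpha_l - alpha_l^*) values
bias = 0.30              # hypothetical bias in mm
sigma = 2.0              # hypothetical kernel spread

estimate = svr_predict([1, 2, 3], support_vectors, coeffs, bias, sigma)
print(estimate)
```

In the full method one such model exists per interval, so evaluating a sequence first requires looking up its interval and then calling that interval's model.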
For problems of a larger size, the SVM models can be built for the intervals necessary for PCA analysis and intervals resulting in the minimum mean assembly deviation.

Principle component analysis
The accurate number of sequence elements contributing to the majority of the variation among the sequences needs to be determined. This is in line with the definition of PCA as used in exploratory data analysis, to summarize the main characteristics of the data in lower dimensions [2]. Being able to estimate the response of the assembly to the sequences using the SVM models, PCA is used to determine the number of features describing the variability in the sequence of the weld points. The PCA method intends to map the explanatory data in a higher-dimensional space to lower dimensions. The approach is based on the eigenvalue decomposition of a data covariance matrix [1]. To evaluate the number of features describing the behavior between the sequences, a meaningful data set is needed. As shown in Fig. 1, the triplets of the sequences can describe the major effects between the sequences. Generating a data set that con[...] $x_2, x_3, q, \ldots, x_n]$, $q_{n/3} = [\ldots, x_{n-3}, q]$. To clarify this point further, consider an assembly with six weld points; the matrix $q$ is expressed as follows: [...] The matrix $[q_1; q_2; \ldots; q_{n/3}]_{m \times n}$ is evaluated using the SVM models created, and the response feature is added as a column to the matrix. The creation of the cyclic matrices for non-divisible dimensions can be performed by considering all the combinations of the last three elements. Defining the resulting data matrix as $Q$, the PCA is performed on this matrix.
After normalization of the $Q$ matrix, and ensuring zero mean for each feature $x_1$ to $x_{n+1}$, the covariance matrix, $\Sigma_Q$, of the matrix $Q$, with $m$ input vectors of $n + 1$ features, where $x_1$ to $x_n$ are the sequence of welds and $x_{n+1}$ is $\hat{y}$, is established:
$$\Sigma_Q = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{\top}$$
The eigenvectors of the covariance matrix $\Sigma_Q$ represent the singular value decomposition of the data set. Defining $U_{n \times n}$ as the matrix of the eigenvectors for each feature, the approximation of the data, $Z$, from $x \in \mathbb{R}^n$ to $x \in \mathbb{R}^k$ is:
$$Z = U_k^{\top} x$$
where $U_k$ contains the first $k$ columns of $U$. To identify which number $k$ of features represents the variability in the data, a limit $J$ can be set for the ratio of the average squared projection error from $n$ dimensions to $k$ to the total variation in the data:
$$\frac{\frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - x_{\mathrm{approx}}^{(i)} \rVert^2}{\frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} \rVert^2} \leq J \qquad (9)$$
where $x_{\mathrm{approx}}^{(i)}$ is the reconstruction of $x^{(i)}$ from its $k$-dimensional approximation.
The minimum k that satisfies (9), to retain the variability in the data corresponds to the number of the critical weld points in the assembly.
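The selection of the minimum $k$ can be sketched using the eigenvalues of the covariance matrix. For zero-mean data, the ratio of the average squared projection error to the total variation equals one minus the share of the eigenvalue sum retained by the first $k$ components, which is a standard PCA property. The eigenvalues below are hypothetical and are not taken from the reference assemblies; the function name is an assumption of this sketch.

```python
def critical_component_count(eigenvalues, J):
    """Smallest k for which the discarded share of the variance is at most J."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    retained = 0.0
    for k, value in enumerate(ordered, start=1):
        retained += value
        # Projection-error ratio for k components: 1 - retained / total.
        if 1.0 - retained / total <= J:
            return k
    return len(ordered)

# Hypothetical eigenvalues of a covariance matrix with six features.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]

k_loose = critical_component_count(eigenvalues, J=0.4)
k_tight = critical_component_count(eigenvalues, J=0.1)
print(k_loose, k_tight)
```

A looser limit $J$ yields fewer critical weld points, so $J$ can be tied to the geometrical requirement of the assembly.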

Reference assemblies
The proposed method is evaluated on three automotive sheet metal assemblies. The assemblies have been modeled in the CAT-tool RD&T [15] according to the assembly process, for compliant variation simulation. The positioning system for the individual components and the assembly inspection is defined. Scanned data of the components are used as part deviation input. Contact modeling is performed in all the assemblies. The welds are introduced to the models. The material properties of the assemblies are also included. All the assemblies are steel with different sheet thicknesses. The elastic modulus for all the parts is 210 GPa. The details of the assemblies are provided below.

Assembly I
This assembly is constructed with two parts. Seven spot welds join the parts together. The sheet thickness of Part 1 is 1.6 mm and that of Part 2 is 1.2 mm. Contact modeling is performed using 154 contact points. The positioning system of the assembly is shown in Fig. 5 with the arrows. The spot weld locations and their numbering are also visualized.

Assembly II
In this assembly, three parts are joined together with nine weld points. The contact modeling is performed using 194 contact points to avoid the penetration of the parts. The sheet thickness of all the parts is 0.8 mm. The positioning system and the weld points are visualized in Fig. 6.

Assembly III
This assembly is a larger model, with three parts. Ten weld points join these three parts together. The thickness for Part 1 is 2.3 mm, Part 2, 1.6 mm, and Part 3, 1.5 mm. The model is prepared with 813 contact points. The positioning system and the weld points are shown in Fig. 7.

Method evaluation
The proposed method is applied to the three assemblies, and the critical weld points for each assembly are identified. The results are presented in this section. The summary of the proposed method for critical weld point identification is as follows:
- Build an initial surrogate model, evaluate the sequences, and divide them into intervals (Sect. 2.1).
- Generate a sample for the necessary intervals, and build SVM models for those intervals (Sect. 2.2).
- Generate the sequence sample set, $Q$, for PCA. Estimate the assembly deviation of the sample using the SVM models, and apply PCA to identify the critical weld points (Sect. 2.3).

SVM
Applying the method to the assemblies results in models of sequence intervals for each assembly. The proposed SVM approach is set up in MATLAB, and parallel Bayesian optimization is performed for each interval model for optimal SVM parameter settings. With the generated SVM models, the assembly deviation for each sequence is estimated. The estimates of the assembly deviation for Assembly I are presented in Fig. 8. For this assembly, since the problem is smaller compared to the other assemblies, all the intervals are modeled for better visualization. The generated SVM captures higher variability due to the sample updates in each interval, see Sect. 2.2. These models are now used to estimate the required sequences for the PCA analysis.
For Assembly II, the sequencing problem is much larger, with 362,880 alternatives. The initial model for all the sequences relies on the mean of the observations achieved in each interval. In total, 421 intervals are identified. Since covering all the intervals is time-consuming for problems of larger sizes, models are built only for the intervals with the minimum mean assembly response from the initial model and the intervals necessary for PCA analysis. Figure 9a shows the mean assembly deviation estimation for each interval with the initial model. The interval with the minimum mean assembly deviation is shown as well. Figure 9b shows the SVM model estimates for this interval and compares them to the initial model response. As can be seen, the initial model is dependent on the mean value achieved from the initial sample evaluations, while the interval-specific SVM model captures the variability between the sequences more clearly. The SVM models help to capture the variability between the sequences in the interval with the minimum mean assembly deviation.
Fig. 8 Assembly I: the SVM models for each interval compared to the initial model
Assembly III includes 10 weld points, with more than three million possible permutations. The initial model divides the data into 8761 intervals. Among these intervals, 690 intervals have the lowest mean assembly deviation. By building the SVM models for these candidates, the interval resulting in the minimum mean assembly deviation is identified. Figure 10a shows the mean assembly deviation for each interval. The interval resulting in the minimum mean is updated with the SVM model. Figure 10b shows the updated model compared to the initial model estimate. The initial model underfits the responses in the interval, while the updated SVM model captures the variability between the sequences in the interval.
With this information, the PCA analysis is performed and the critical weld points are identified.

PCA
The sequences required for PCA analysis for each assembly are generated and estimated using the SVM models, Sect. 2.3. The PCA analysis is set up in MATLAB. The cumulative sum of the explained variances by each component for Assembly I is shown in Fig. 11a.
Fig. 9 Assembly II: (a) the initial surrogate model mean assembly deviation for each interval; (b) the SVM model for the interval with minimum mean assembly deviation compared to the initial model
In Assembly I, the first two sequence elements describe approximately 60% of the variability between the sequences. Considering the range of the differences between the sequences, which is 0.22 mm, optimizing the first two elements of the sequence reduces the assembly geometrical deviation by 0.13 mm. The percentage of the geometrical deviation that needs to be improved depends on the requirements of the assembly. For these types of assemblies, the requirements are within one-tenth of a millimeter. Therefore, the $J$ value, Sect. 2.3, can be set to secure the needed requirements. For this assembly, sequence elements three to seven reduce only 0.088 mm of the total variability range.
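The figures reported for Assembly I can be cross-checked with simple arithmetic. The three values below are taken from the text; the variable names are assumptions of this sketch.

```python
total_range_mm = 0.22    # range of the differences between the sequences
first_two_mm = 0.13      # reduction attributed to the first two sequence elements
remaining_mm = 0.088     # reduction attributed to sequence elements three to seven

share_first_two = first_two_mm / total_range_mm
accounted_mm = first_two_mm + remaining_mm

print(f"share of the range from the first two elements: {share_first_two:.1%}")
print(f"sum of both contributions: {accounted_mm:.3f} mm of {total_range_mm} mm")
```

The share comes out at about 59%, in line with the stated approximately 60%, and the two contributions sum to 0.218 mm, which matches the 0.22 mm range up to rounding.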

With this analysis, the number of the critical weld points, referred to as the principal weld component, $P_{wc}$, in this assembly, is set to two.
Fig. 10 Assembly III: (a) the initial surrogate model mean assembly deviation for each interval; (b) the SVM model for the interval with minimum mean assembly deviation compared to the initial model
To identify the two welds, the interval with the minimum mean assembly deviation is analyzed. In this assembly, interval 48 results in the minimum average assembly deviation. This interval contains 240 sequences, all starting with the pair $(w_1, w_6)$ or $(w_1, w_7)$. The optimum sequence for this assembly is $[w_1, w_6, w_5, w_2, w_3, w_4, w_7]$. The critical weld points are therefore chosen to be $(w_1, w_6)$. The estimate of the above sequence is $\hat{y} = 0.3236$ mm, which matches the actual value, $y = 0.3236$ mm. The summary of the results is presented in Table 2.
With the same analysis in Assembly II, elements 4 to 9 of the sequence cover 0.02 mm of the assembly deviation. Considering that there are three parts included in the assembly, three weld points cover 71.29% of the assembly deviation. The cumulative sum of the explained variances by each sequence element, for this assembly, is shown in Fig. 11b. The interval with the minimum mean assembly deviation is composed of 720 sequences with $(w_9, w_1, w_4)$ and $(w_9, w_1, w_5)$ as the initial three elements. The minimum complete sequence in this interval starts with $(w_9, w_1, w_4)$. Therefore, this triplet is chosen as the critical weld points for this assembly.
Assembly III is composed of three parts. Considering this aspect, it can be observed that the initial three weld points in the sequence retain 78.22% of the variability between the sequences, Fig. 11c. The interval with the minimum assembly deviation contains 26,368 sequences in total, where the sequences start with the pair $(w_3, w_7)$ or $(w_3, w_6)$. The complete sequence results in a minimum assembly deviation estimate of 1.0322 mm. The observed value of this sequence is 1.0294 mm. The summary of the results is presented in Table 2.

Accuracy comparison
The proposed method intends to evaluate the outcome of each sequence accurately for PCA analysis and critical joint identification. Previous studies have focused on complete sequence identification using meta-heuristic algorithms. The advantages of non-linear modeling over meta-heuristic algorithms, such as GA for complete sequence optimization, are explained and discussed in previous research [22]. Other integer programming methods for joining sequence optimization require recursive cost matrices (assembly deviation with FEM) for each element of the sequence. Based on this aspect, a stepwise algorithm is introduced for complete sequence optimization [23]. Since the focus of this paper is on sequence evaluation and identification of the critical weld points, the previous critical weld point identification method, presented in [20], is compared to the proposed approach. Rules and guidelines are used to determine the location of a predetermined number of critical weld points in the mentioned study. These rules are: the distance of the weld point to the positioning system, the weld gap before welding, and the weld gap sensitivity. The sequence of the selected critical weld points is determined by the meta-heuristic algorithm GA. The results have shown that weld gap sensitivity is more accurate compared to the rest of the rules. This approach is compared to the presented critical joint identification and sequence determination approach. Table 3 presents the comparison between the proposed non-linear modeling approach with SVM and critical weld point identification using PCA, and the previously introduced weld gap sensitivity analysis for critical weld point identification. One advantage of the proposed approach is identifying the number of critical weld points in a sequence. In contrast, the number of critical weld points in the weld gap sensitivity approach is predefined.
In all the assemblies the proposed method outperforms the weld gap sensitivity approach with respect to proposing an accurate number of joints, location and their sequence which results in minimum assembly geometrical deviation.

Conclusion
Joining sequence determination is a time-consuming combinatorial problem to solve. While some of the elements of the sequence have a dominant role in the geometrical outcome of the assembly, there has not been a prescriptive study identifying the number, the location among a set of alternatives, and the sequence of such weld points, which can be related to the geometry weld points in industrial applications. In this paper, based on the principles of machine intelligence in operations research, with a time-efficient approach, SVM models are built for estimating the outcome of the assembly with specific sequences in combination with the CAT-tools. To identify the number of the critical weld points, a PCA analysis approach is proposed to evaluate the variability of each element of the sequence. The SVM model is then used to determine the sequence of such critical weld points, acting as the initial weld points set in the sequence. Three reference sheet metal assemblies of different sizes are evaluated using the proposed approach. The critical weld points of the assemblies are successfully identified and are shown to account for more than 60% of the variation between the sequences. The sequence of the critical weld points is identified using the applied SVM model, and a suggestion for the complete sequence of joining is also proposed. Future research includes expanding the approach with permutation learning for environments with independent part input models, or batch analysis, where for each assembly there are multiple sets of sequence intervals. Within a batch of incoming parts, the part deviations are expected to be correlated to each other. Among the batches, these part deviations may not be correlated, which means iterations of the proposed approach are required. Through these iterations, permutation learning can be deployed to analyze the trend of the changes in the joining sequences among the batches.
The proposed approach can potentially be applied to other sequenced operations in the assembly process such as clamping.