1 Introduction

A fixture is a workholding device used to securely locate and orientate a workpiece with respect to another workpiece, machine, or measurement tool. Fixtures provide stability and minimum deflection for the workpiece during the manufacturing process. Fixtures also ensure all produced parts maintain conformity and interchangeability. Finally, well-designed fixtures do not interfere with the operating tool. Although fixtures may seem simple products in view of their functionality, their design requires broad expertise and engineering intent. Up to 40% of defective parts derive from faulty fixture design or manufacture [1].

Fixture design and automation have attracted attention in the production research community, since fixture design takes 80 to 90% of the time spent in process planning [2] and accounts for between 10 and 20% of the overall cost of manufacturing [3]. Several reviews have covered the topic, gathering the latest advances in fixtures over the last few decades [3,4,5].

A common way to automate fixture design is through optimization. One example is combining modular fixtures, genetic optimization, and finite element method (FEM) software to reduce time and costs [4]. This paper introduces supervised learning as an alternative to optimization frameworks. The main issue with an optimization approach is re-usability: every distinct part needs its own optimization run. Supervised learning looks at the problem differently, capturing knowledge from similar designs to return an immediate solution. In short, the presented framework couples design automation with state-of-the-art supervised learning to automate fixturing in sheet metal workpieces.

The paper presents a framework to lay out locators according to the so-called N-2-1 (or 3-2-1) locating principle. Central to this principle are well-spread locating points, which prevent a workpiece from moving and rotating during a manufacturing process [6].

The data used for training consists of inner and outer b-pillar panels from different vehicle models. In the automotive industry, the b-pillar is the vertical support post that connects the body to the vehicle’s roof at the rear of the front door (see Fig. 1). The framework is coupled with the computer-aided design (CAD) software through Python-Excel VBA integration. The user gets the proposed locators automatically when interacting with the developed add-in.

Fig. 1

(a) Inner b-pillar panel attached to roof and body side sill. (b) Inner and outer panels

While most artificial neural networks (ANNs) aim to build classification models in computer vision, regression applications are often limited to surrogate models that replace complex simulations or other computationally expensive tasks. The framework exploits the potential of regression models introducing custom-made convolutional neural networks (CNNs). The CNN has been optimized in a two-stage tuning to improve performance.

The structure of the paper mimics the framework. CAD data processing and its augmentation open Section 3, followed by the classification part, the regression problem, and the hyperparameter tuning. Section 4 analyzes the metrics obtained for the classification and regression models. Sections 6 and 7 discuss and summarize the findings and contributions, including future work analysis.

2 Background

2.1 Fixture layout principle

Fixture design and layout require expertise and intuition, although some authors have described some generic principles [7]. In general, three fundamental elements make up the fixture: clamps, locators, and supports. The clamps are those elements that apply the necessary force to keep the workpiece in place. A regular fixture uses two clamps. The locators are elements used to position and orientate the workpiece. A typical fixture contains six locators. In the case of excessive deformation of the workpiece, the support points increase the stability and reduce elastic deformations [5].

The 3-2-1 locating principle is a general rule for placing the locators on the workpiece. Three points define the primary plane, so best practice recommends placing these three locators as spread out as possible, i.e., on the largest face of the workpiece. With only these three locators, the part can still translate along the plane and rotate within it. Two more points prevent this rotation and the translation along one axis. These two points lie on a plane perpendicular to the primary plane, the secondary plane. The last locator cancels the translation along the intersection of these two planes. Figure 2 illustrates the locating principle.

Fig. 2

The 3-2-1 locating principle applied to a prismatic geometry

Even though the 3-2-1 principle is general, other alternatives are more suitable for certain workpieces. For instance, the 4-1-1 principle is used for fixing revolution geometries. Another example is the 4-2-1 principle in large cast parts, where the workpiece is over-constrained to ensure stability. In cases where the workpiece is slender relative to the fixture size, supports may be required to avoid deflection in the primary plane. Some authors call this variant the N-2-1 locating principle, including the supports as part of the locating system [8].

The N-2-1 locating principle is the combination of the 3-2-1 principle and the support points. It would be the appropriate principle to define the fixture layout of b-pillars. This study focuses on the 3-2-1 principle since, based on engineering experience, it is the first stage of the layout process. Other methods are a better fit for identifying the support locations. Calculating the support points with physics-informed neural networks would be the immediate continuation of this work.

2.2 Previous work on fixture layout

There has been significant research on fixture layout, and three trends characterize this topic. The first category consists of publications on rule formulation, where the authors lay out the general principles in equations. They usually use proofs of concept as work objects. The trend with the most research activity is fixture layout optimization, where a genetic algorithm (GA) is usually the tool used to optimize the fixture locations. The last category is knowledge reuse in fixture layout design. Most papers in this category use case-based reasoning (CBR) to retrieve and reuse information from previous fixture layouts, but ML algorithms are growing in popularity within this trend. The current paper belongs to the third category.

A scoring system is proposed to identify the research gaps and challenges in the fixture layout problem. The scoring system evaluates every paper in four challenges. These challenges (C*) are:

  1. C1. Application object. How complex and close to industrial parts the test cases are. 2D examples receive a lower score than 3D models. The score follows a 1-to-4 scale, where 4 denotes the most complex object.

  2. C2. Setup and re-usability. This challenge measures the effort required to set up the developed tool in an industrial environment. It also considers how often the tool has to be reset or rebuilt for a new case. Since it is a subjective challenge, each publication is evaluated as weak, mild, or strong in this aspect, according to the authors' criteria.

  3. C3. Integration. How deeply the tool is embedded in the CAD/CAM framework. Optimization typically has an advantage over CBR in this challenge, since CBR typically does not present a final solution. This challenge is also evaluated as weak, mild, or strong.

Table 1 gives an overview of publications on the topic and the different approaches taken historically to formulate the fixture layout problem. The publications are classified according to their category, the main framework of the developed tool, and the challenges mentioned.

Table 1 Summary of contributions to the fixture layout problem since the 1990s

2.2.1 Rule formulation

The papers within this category use mathematical logic to find the best fixture layout. The main difficulty is the problem definition that governs the fixture design. Having few rules will make the system too vague, while covering more fixturing aspects adds too much complexity. These rules tend to generalize to similar topologies, but it is hard to extrapolate to other geometries.

Rule propositions for plane workpieces have been studied for more than 30 years. For instance, Pham and de Sam Lazaro [9] developed a CAD-based expert system to be used in conjunction with their FEM analysis package to analyze prismatic workparts, while Lin and Yang [10] implemented a rule-based modular fixture location system for simple blocks. Qin et al. [11] proposed a novel approach by breaking down the locating principle into a mathematical formulation. They developed a framework that proposes locating points one by one until the theoretical principles are satisfied. Parvaz and Nategh [12] built three basic rules that suggest the fixture layout, plus two validation conditions on stability and jamming on free surfaces. Mihaylov [13] implemented an add-in in Solidworks to automate the 3-2-1 locating principle using if-else statements, focusing on simple prismatic shapes. Manafi and Nategh [14] accounted for setup planning in prismatic shapes with multiple work faces.

Even though the principles found in papers from this category set the basis for the most deductive approach to fixture layout, their industrial application is limited, since they are not part-centered approaches. Therefore, the application object challenge generally has a low score.

2.2.2 Fixture layout optimization

Genetic algorithms (GAs) propose the most efficient fixture design. One known limitation of this approach is the need to set up and start a different optimization run per part. Layout optimization is the most time-consuming methodology.

Modeling the force of clamps and positional fixtures can be challenging. Some publications focus on developing higher fidelity models by using response surface methodology (RSM) [15, 19], surrogate modeling with ANN [18, 19], or strain energy analysis [20]. Other publications choose a more complex workpiece and use simple static analysis to retrieve displacements and forces. Once the clamping system has been determined, Nambiar et al. (2022) suggest utilizing an optimization algorithm to determine the optimal position of automated fixture designs considering robot path clearance.

The score for the optimization framework is very similar in all publications since they share a methodology with a high penalty in the challenge score system proposed. Re-usability and setup are the major weaknesses of this approach, but they are fully integrated with CAD/CAM software and return the best fixture layout per workpiece of the three categories.

2.2.3 Knowledge reuse on fixture layout

CBR is a discipline within the field of artificial intelligence. Based on the characteristics of the object, it proposes a solution similar to the solution of a past case. It is a valuable tool for retrieving knowledge and experience from previous cases, but the inputs of the CBR framework are often challenging to define.

After optimization and mathematical approaches, CBR has been the third preferred approach to fixture layout automation. One limitation of this approach is the inflexibility to adapt to new cases that are substantially different from those found in the library; a knowledge limitation that is also shared by machine learning approaches since they rely on training data. Previous work on CBR has been carried out by Sun and Chen [22] and Hashemi et al. [24]. An early machine learning approach based on a decision tree was introduced by Kumar et al. [23], while more recent research uses reinforcement learning as an alternative [25]. Both publications are conceptual and without a clear connection to industrial applications.

2.3 Artificial neural networks

Making use of the latest advances in supervised learning, the approach presented in this paper takes a hyperparameter-tuned CNN to solve the layout problem. The Keras team defines Keras as “a deep learning API written in Python, running on top of TensorFlow.” It enables fast experimentation and includes state-of-the-art built-in models such as EfficientNet [26]. EfficientNet [27] is a scalable CNN focused on getting maximum performance according to computing resources. It had more than 5500 citations by 2022, reflecting its wide adoption. The first step in the proposed framework is to use EfficientNet to classify different b-pillar structures. Most built-in networks are intended and recommended for classification. The regression problem stated in this project cannot benefit from pre-built networks, and a custom-made network can reach higher performance after tuning. A neural network is highly dependent on multiple hyperparameters. Hyperparameter tuning applies search algorithms, such as grid search, random search, or Bayesian optimization, to find the optimal configuration of the network given a predefined search space [28]. From the number of layers to initializers, every hyperparameter can be tuned, so this could become an extremely computationally expensive task. The user must therefore design the search space carefully. In this case, the API supporting hyperparameter optimization is KerasTuner [29]. For more information about hyperparameter tuning, see Yu and Zhu’s [28] comprehensive review covering the most essential topics in hyperparameter optimization.

3 Proposed methodology

3.1 A comprehensive flowchart of the framework

Figure 3 presents a dual flowchart. The flowchart on top outputs the models that the bottom flowchart needs to do inference. Both processes require data processing. The data processing step converts the 3D CAD data into depth maps, to be used as training data for CNN. First, the classification model determines whether the sample is an inner or outer b-pillar. Then, the appropriate regression model creates the locator information, which ends in the CAD software. These regression models have been hyperparameter-tuned to enhance their performance.

Fig. 3

General flowchart for the methodology and the resulting inference framework

3.2 Data processing

3.2.1 Gray-scale topographic map

Sheet metal fabrication is the generation of parts from a metal sheet by cutting, stamping, bending, or punching. Since these parts come from a planar metal sheet, their thickness is uniform, in practice.

While most research in automating fixture layout focuses on prismatic study cases, sheet metal design will not benefit from these methodologies. This paper proposes an alternative that is suitable for sheet-metal workpieces.

Other authors have used projections of the geometric parts to scale down the fixture layout problem to 2D. Trappey et al. [30] apply heuristics to the three views to find the fixture configuration, while Low et al. [25] implemented reinforcement learning over the projection in the primary plane.

This framework takes the projection over the grid plane to create a 2D low-resolution image. The grid plane is parallel to the global coordinate plane that allows the three first locators to be positioned as far as possible from each other. The grid plane is the most representative projection of the part in global coordinates. Note the difference with the primary plane, which could be independent of global coordinates.

The local Z coordinate, perpendicular to the primary plane, is stored and converted to a color scale, generating an irregular grayscale drawing from the original sheet metal design, similar to the topographic maps used in cartography. Because of the uniform thickness, the projection contains all the design information.

Automation in CATIAv5-Excel VBA enables the generation of the grayscale topographic map. The X-Y coordinates of a semi-infinite line normal to the primary plane are modified step by step so that the line intersects the shell at each grid position, and the intersection coordinates are stored. The X-Y step determines both the resolution of the topographic image and the time needed to generate it. The generation time is also highly dependent on the file size.

The same rules apply to locators. The layout is predefined in the training data set. The output of this rule-based framework consists of the gray-scale image and the coordinates of six points. These coordinates comprise the label for training. For the image and the label, the X and Y values are the coordinates in pixels measured from the top left corner of the image, while the Z coordinate is converted to an integer gray value in the range [50, 255]. The minimum threshold of 50 gives a clear difference from the background, which has a value of 0.
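The encoding described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the linear scaling, the function names `z_to_gray` and `xy_to_pixel`, the `mm_per_px` parameter, and the omission of any image margin are hypothetical; Algorithm 1 defines the actual transformation.

```python
import numpy as np

GRAY_MIN, GRAY_MAX = 50, 255  # 0 is reserved for the background


def z_to_gray(z_mm, z_min, z_max):
    """Linearly map a height in mm to an integer gray value in [50, 255]."""
    z = np.asarray(z_mm, dtype=float)
    t = (z - z_min) / (z_max - z_min)  # normalized height in [0, 1]
    return np.rint(GRAY_MIN + t * (GRAY_MAX - GRAY_MIN)).astype(int)


def xy_to_pixel(x_mm, y_mm, x_min, y_max, mm_per_px):
    """Map CAD X-Y (mm) to (column, row) pixels from the top-left corner.

    Assumes the image Y axis points down, so the largest Y lands on row 0.
    """
    col = int(round((x_mm - x_min) / mm_per_px))
    row = int(round((y_max - y_mm) / mm_per_px))
    return col, row
```

With these assumptions, the lowest point of the part maps to a gray value of 50 and the highest to 255, so every sampled point stays distinguishable from the background.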

Table 2 details the input parameters needed by the framework. Algorithm 1 comprises the rule-based framework. The framework implements error handling since the grid contains points where the intersection of the semi-infinite line with the shell fails. The outputs are the image, the labels, and six Image-to-CAD parameters required to revert the conversion, i.e., to send information from images back to CAD. These parameters are the maximum and minimum X, Y, and Z coordinates found in the grid, in millimeters.

Table 2 Inputs required by the framework before running
Algorithm 1

Topographic image generation. Simplified for gy > gx.

The calculations in Algorithm 1 account for a 10% margin. Since all parts are oriented vertically, this margin is at the top and bottom of the image. Once transformed, all parts have the same height. However, the height/width ratio preserves the difference in size between parts: smaller parts appear wider.

Modifying only the exact pixel (Xk,Yk) with the color value does not produce the topographic map shown in Fig. 4c. Squares with a size of half the step Spx (in pixels), centered on every (Xk,Yk), fill the gaps and form a solid and continuous topographic map. Finally, the label follows the same operations. Figure 4 illustrates the whole process: obtaining the coordinates in mm first, transforming them according to the image size, and filling the gaps to get the final image. Figure 4c also shows the transformed position of the locators in the new gray-scale image.
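The gap-filling step can be illustrated with a short NumPy sketch. The function name `paint_squares` and the choice of a half-width equal to half the step (so that neighboring squares touch) are assumptions made for this example, not the exact rule used in Algorithm 1.

```python
import numpy as np


def paint_squares(shape, points_px, gray_values, step_px):
    """Paint a gray square around every sampled pixel on a black background.

    shape       -- (rows, cols) of the output image
    points_px   -- iterable of (col, row) sample positions in pixels
    gray_values -- gray value of each sample
    step_px     -- sampling step in pixels (S_px in the text)
    """
    image = np.zeros(shape, dtype=np.uint8)
    half = max(1, step_px // 2)  # assumed half-width of each square
    for (col, row), gray in zip(points_px, gray_values):
        # Clip the square to the image borders.
        r0, r1 = max(row - half, 0), min(row + half + 1, shape[0])
        c0, c1 = max(col - half, 0), min(col + half + 1, shape[1])
        image[r0:r1, c0:c1] = gray
    return image
```

Where two squares overlap, the later sample simply overwrites the earlier one, which is sufficient for a sketch but may differ from the original implementation.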

Fig. 4

(a.) Monochromatic representation of coordinates in mm. (b.) Monochromatic representation of transformed coordinates to pixels. (c.) Final gray-scale image with the locators in black. Note that this case only shows five points, since the fourth and sixth points share coordinates

3.2.2 Data augmentation

Data augmentation is the technique of increasing the number of data samples by introducing slight modifications to the existing data. It has become very popular in image classification problems, and some operations are rotation, zoom-in or zoom-out, mirroring, modifying saturation, tone or color, erosion, kernel filtering, etc. Data augmentation is known to regularize the network and avoid overfitting while, at the same time, providing training samples where data samples are scarce [31].

Data augmentation does not affect the label in classification problems, i.e., a dog is a dog even though the image is rotated. This fact simplifies the augmentation problem, which is only centered on the image itself. The data augmentation carried out in this paper involves operations that affect the label, and mathematical formulations replicate the image transformation. The techniques proposed are mirroring over principal axes and rotation.

The mirror operation follows an if-else Boolean approach, and the calculations are simple. The transformation offers four possible combinations: the image and labels can be mirrored over both the central X and Y axes, flipped across only one of the axes, or left untransformed. The NumPy library is used for the image operation.
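A minimal sketch of the mirror operation, assuming labels are stored as (x, y) pixel coordinates with the origin at the top-left pixel. The function name `mirror_sample` and its flags are hypothetical.

```python
import numpy as np


def mirror_sample(image, points, flip_x=False, flip_y=False):
    """Mirror an image and its (x, y) pixel labels about the central axes.

    flip_x mirrors left-right, flip_y mirrors top-bottom. Setting both,
    one, or none of the flags gives the four possible combinations.
    """
    image = np.asarray(image)
    points = np.array(points, dtype=float)  # copy with shape (n, 2)
    height, width = image.shape[:2]
    if flip_x:
        image = np.fliplr(image)
        points[:, 0] = (width - 1) - points[:, 0]
    if flip_y:
        image = np.flipud(image)
        points[:, 1] = (height - 1) - points[:, 1]
    return image, points
```

The label update uses the same reflection as the image, so a locator keeps pointing at the same feature of the part after the flip.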

The rotation operation adopts complex notation to tilt the image at a desired angle, ϕ. The image rotation uses the SciPy library. The X and Y coordinates represent the real and imaginary parts in the complex plane, respectively. The point rotation is calculated as:

$$ \textbf{L}_{r}=(\textbf{L}-\textbf{O}) \cdot e^{i\phi}+\textbf{O} $$

where Lr is the rotated coordinates in complex notation that are converted back to real X and Y coordinates. L and O are the original position and the center of rotation in complex format, respectively. Figure 5 presents an example of these transformations.
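The point rotation can be written directly with NumPy complex arithmetic. The sketch below covers only the labels; the function name `rotate_points` is hypothetical, and the sign of ϕ that matches a given image-rotation routine depends on the axis convention of that routine (image rows usually grow downward), so it should be checked against the rotated image.

```python
import numpy as np


def rotate_points(points, center, phi_deg):
    """Rotate (x, y) points about `center` using L_r = (L - O) e^{i phi} + O."""
    points = np.asarray(points, dtype=float)
    locs = points[:, 0] + 1j * points[:, 1]   # L: x is real, y is imaginary
    origin = complex(center[0], center[1])    # O: center of rotation
    rotated = (locs - origin) * np.exp(1j * np.deg2rad(phi_deg)) + origin
    # Convert back to real X and Y coordinates.
    return np.column_stack([rotated.real, rotated.imag])
```

For instance, rotating the point (2, 1) by 90° about (1, 1) gives (1, 2) in this convention.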

Fig. 5

Transformations of image and label. Data augmentation operations

Three hundred and fifty images are generated per data point, increasing the training data from seven samples to 2450 data points. The seven samples of inner and outer b-pillars come from cars of the same manufacturer; however, they belong to different car segments and sizes, mainly SUV and hatchback. The augmented data is used to train both classification and regression models. Even though the ratio of augmented to original samples is very large, each rotated and flipped data point carries a different label, which makes it a unique problem for the network and valuable for training.

3.3 Classification problem: EfficientNet

Traditional 3D CNNs have several shortcomings regarding memory, computation power, overfitting, and overall performance [32]. As an alternative, this paper proposes pre-processing to enable classification. The strongest virtue of the proposed methodology is that it reduces the complexity of the network, so a conventional 2D CNN can be used for classification. As a limitation, the methodology has been envisaged for sheet metal designs, where one projection is predominant over the others.

The first goal of the classification problem is to enable recognition of 3D parts through the presented pre-processing. The second application is to differentiate different types of parts in order to send them to the regression model that will have different weights for different types of parts. In the studied case, the inner and outer b-pillar sheet metal panels test the classification model.

3.3.1 Experimental setup

The training runs on Google Colab, an online Jupyter notebook environment that does not require setup. Free GPU access is granted for a limited time; in this case, Colab provided a Tesla T4 with 16 GB of memory.

The authors presented EfficientNet with seven different architectures. These architectures are optimized targeting accuracy and floating-point operations (FLOPS). EfficientNet-B0 is the smallest architecture, and the authors test it against popular architectures, ResNet-50 [33] and DenseNet-169 [34], getting similar accuracy results while reducing the number of parameters in a ratio of 1-to-4.9 and 1-to-2.6, respectively [27]. Due to the binary nature of this classification problem, EfficientNet-B0 was used. More complex architectures would increase computational costs to give a similar result.

EfficientNet-B0 has been downloaded with weights pre-trained on ImageNet. The input size is locked by the framework to RGB images with 224×224 pixels, so the images are scaled down accordingly. The network’s top is modified according to Table 3 to adapt it to this particular problem. The labels are one-hot encoded: outer b-pillar panels are labeled with the one-hot vector [1 0] and inner b-pillar panels with [0 1].
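As a rough illustration of the input preparation, the following NumPy sketch one-hot encodes the two classes and turns a grayscale map into a 224×224 three-channel array. The nearest-neighbor resizing, the channel repetition, and the names `one_hot` and `to_network_input` are assumptions for this example; the paper does not state how the scaling is performed.

```python
import numpy as np

# Class order taken from the text: outer -> [1 0], inner -> [0 1].
CLASS_INDEX = {"outer": 0, "inner": 1}


def one_hot(label):
    """Return the one-hot vector of a b-pillar class name."""
    return np.eye(2, dtype=int)[CLASS_INDEX[label]]


def to_network_input(gray, size=224):
    """Nearest-neighbor resize to size x size and repeat gray into 3 channels."""
    gray = np.asarray(gray)
    rows = np.arange(size) * gray.shape[0] // size  # source row per output row
    cols = np.arange(size) * gray.shape[1] // size  # source col per output col
    resized = gray[rows][:, cols]
    return np.stack([resized] * 3, axis=-1)
```

Note that this simple resize does not preserve the aspect ratio; padding the image to a square first would be an alternative design choice.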

Table 3 Modified EfficientNet-B0 used for training

3.3.2 Evaluation metrics

The metrics that estimate the goodness of a model during training in a classification problem are the loss and the accuracy. These metrics are obtained for both training itself and validation.

Accuracy is the ratio of the number of correct predictions to the total number of predictions. Therefore, accuracy always falls in the interval 0 to 1, and the closer to 1, the better the model is. In the studied case, only two classes are provided, so it can be approached as a binary classification problem. For binary classification, accuracy can also be calculated as:

$$ \text{Accuracy}=\frac{T_{P}+T_{N}}{T_{P}+T_{N}+F_{P}+F_{N}} $$

where TP and TN stand for true positive and true negative, and FP and FN mean false positive and false negative.
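As a small worked example of the accuracy formula, the following Python function counts the four outcomes explicitly. The function name and the convention that class 1 is the positive class are choices made for this sketch.

```python
def binary_accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return (tp + tn) / (tp + tn + fp + fn)
```

With true labels [1, 0, 1, 0] and predictions [1, 0, 0, 0], there is one true positive, two true negatives, and one false negative, so the accuracy is 3/4.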

The loss is an indicator of the “badness” of the prediction of a model. It consists of a function that obtains the difference between the predicted value and the true value. The algorithm looks for the model that minimizes the loss. In this case, binary cross-entropy loss function is implemented.

While the simplest loss function would consist of a difference operation, it is not the most effective depending on the type of problem. This loss would look like:

$$ \text{Loss}(y,\hat{y})=\frac{1}{N}\sum\limits_{i=1}^{N} (y_{i}-\hat{y}_{i}) $$

The loss would be the average of the difference between the true y and the predicted value \(\hat {y}\) for N data points. To enhance the penalty to the model, mean squared error (MSE) is often used in regression training.

$$ \text{MSE}(y,\hat{y})=\frac{1}{N}\sum\limits_{i=1}^{N} (y_{i}-\hat{y}_{i})^{2} $$
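A short NumPy sketch shows why the plain difference is a weak loss: errors of opposite sign cancel out, whereas the squared error penalizes both. The function names are illustrative.

```python
import numpy as np


def mean_difference(y_true, y_pred):
    """Average signed difference between true and predicted values."""
    return float(np.mean(np.asarray(y_true, float) - np.asarray(y_pred, float)))


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return float(np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2))
```

For true values [0, 0] and predictions [1, -1], the mean difference is 0 even though both predictions are wrong, while the MSE is 1.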

However, the binary cross-entropy loss function has its foundation in the Kullback-Leibler divergence DKL [35], which measures the dissimilarity between two distributions.

$$ D_{KL}(a,b)=H_{p}(a)-H(a)=\sum\limits_{c=1}^{C} a(y_{c}) \cdot [\log(a(y_{c}))-\log(b(y_{c}))] $$

where a and b are the distributions and C is the number of classes. The DKL is the difference between the cross-entropy \(H_{p}(a)=-{\sum }_{c=1}^{C} a(y_{c}) \cdot \log (b(y_{c}))\) and the entropy of the distribution \(H(a)=-{\sum }_{c=1}^{C} a(y_{c}) \cdot \log (a(y_{c}))\).

Given a binary problem, where the true value is either y = 0 or y = 1, there are only two classes and \(a(y_{2})=1-a(y_{1})\), so the cross-entropy can be expressed as \(H_{p}(a)=-[a(y_{1}) \cdot \log (b(y_{1}))+(1-a(y_{1})) \cdot \log (1-b(y_{1}))]\). Since the entropy of a one-hot label is zero, the divergence reduces to the cross-entropy, and the binary cross-entropy loss function BCE is the mean of this quantity over the N data points.

$$ \text{BCE}(y,\hat{y})=-\frac{1}{N}\sum\limits_{i=1}^{N} [y_{i} \cdot \log(\hat{y}_{i}) + (1-y_{i})\cdot \log(1-\hat{y}_{i})] $$
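The binary cross-entropy can be evaluated numerically with a few lines of NumPy. This is a minimal sketch: the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero, and deep learning libraries handle this detail in their own way.

```python
import numpy as np


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Keep predictions strictly inside (0, 1) so both logarithms are finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    terms = y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)
    return float(-np.mean(terms))
```

An uninformative prediction of 0.5 for every sample gives a loss of log 2 ≈ 0.693, and the loss approaches 0 as the predicted probabilities approach the true labels.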

3.4 Regression problem: tuned CNN

A regression neural network is built at this point to obtain all the locators at once. The label contains the information about the locators transformed to pixels (X and Y) and grayscale (Z), forming a 6×3 matrix. In practice, feeding the neural network with the Z coordinate (grayscale data) makes the model more complex and brings no benefit, since the X-Y coordinates can be projected back onto the sheet metal panel. Erasing the last column leaves a 6×2 matrix. The fourth and sixth locators always share coordinates in the inner and outer panels, so the sixth element is deleted from the label and is always assumed to be in the position of the fourth locator. The final label is a 5×2 matrix that is flattened into an array of ten elements to become the final output of the regression neural network.
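The label simplification can be sketched as follows. The function name `simplify_label` and the row-major flattening order (x1, y1, ..., x5, y5) are assumptions for this example.

```python
import numpy as np


def simplify_label(label_6x3):
    """Reduce a 6x3 locator label (X, Y, Z per row) to a flat array of 10 values."""
    label = np.asarray(label_6x3, dtype=float)
    if label.shape != (6, 3):
        raise ValueError("expected a 6x3 label matrix")
    xy = label[:, :2]      # drop the Z (grayscale) column -> 6x2
    xy = xy[:5]            # drop the sixth locator (same position as the fourth) -> 5x2
    return xy.reshape(-1)  # flatten to x1, y1, ..., x5, y5
```

The output has ten elements, matching the number of neurons assumed for the last layer of the regression network.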

The points are reordered consistently across samples, since there were cases in which the engineer's criteria placed the first three locators in a different order. This would not make any difference from a fixturing point of view, but without a consistent order the network would not find a pattern, because none exists.

Once the labels have been simplified and standardized, the training starts in a custom-made CNN. After some manual tuning, it was decided that hyperparameter tuning would be the correct approach to the problem. The last manual iteration is called the baseline model in this paper (Fig. 6).

Fig. 6

Detailed view of the baseline model architecture and compiler. Initializer information is missing in the diagram. The initializer is random normal in the two first layers. The rest of the layers have Glorot uniform by default

The baseline model is trained for inner b-pillar samples. Introducing inner and outer b-pillars simultaneously confuses the network as it is unable to converge. Different regression models are built for each part, even though the architecture of these models will be the same. The idea is that individual training for outer b-pillars using the pre-trained weights for the inner b-pillar will suffice because of the similarity between data samples.

3.4.1 Hyper-parameter design space

The possibilities offered by KerasTuner are immense, and the design space is arranged according to experience from the baseline model manual tuning, attending to computational resources and training times. The tuning is performed in a two-stage optimization.

The first optimization centers on the general architecture of the network. It includes filters of each layer, whether to include average pooling or maximum pooling after every convolution, batch normalization functions, and the last activation function.

The second tuning searches for fine tuning in the optimized network, and is centered on drop-out functions and learning rate. The search algorithm used in both tunings is Bayesian optimization. Table 4 details the whole design space.

Table 4 Hyperparameter tuning design space

Even though the mathematical principles of Bayesian hyperparameter optimization are out of the scope of this paper, the difference between this algorithm and random or grid search is that the latter two are uninformed by past trials. Bayesian optimization builds a surrogate model of the objective function (loss, in this case), finds the hyperparameters that perform best in the surrogate model, applies the proposal in the following trial, and updates the surrogate model with the result [36].
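The loop described above can be illustrated with a toy one-dimensional example. This sketch is not the KerasTuner implementation: it assumes a Gaussian-process surrogate with a fixed RBF kernel, a lower-confidence-bound rule to pick the next trial, and a hypothetical `toy_loss` that stands in for the validation loss of a network as a function of a single hyperparameter.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_no = rbf_kernel(x_new, x_obs)
    mean_y = y_obs.mean()
    mean = mean_y + k_no @ np.linalg.solve(k_oo, y_obs - mean_y)
    var = 1.0 - np.sum(k_no * np.linalg.solve(k_oo, k_no.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def toy_loss(x):
    """Hypothetical objective with its minimum at x = 0.3."""
    return (x - 0.3) ** 2


def bayesian_search(objective, candidates, n_trials=12, kappa=2.0):
    """Run the surrogate loop and return the best (x, loss) observed."""
    # Start from the two ends of the search space.
    x_obs = np.array([candidates[0], candidates[-1]])
    y_obs = objective(x_obs)
    for _ in range(n_trials):
        # 1. Fit the surrogate to all past trials.
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        # 2. Pick the candidate that looks best in the surrogate
        #    (low predicted loss, or high uncertainty).
        x_next = candidates[np.argmin(mean - kappa * std)]
        # 3. Run the trial and update the observations.
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmin(y_obs)
    return x_obs[best], y_obs[best]


best_x, best_loss = bayesian_search(toy_loss, np.linspace(0.0, 1.0, 101))
```

Unlike grid or random search, every new trial here depends on all previous results through the surrogate, which is the property the paragraph above refers to.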

It is expected that the first optimization will introduce the main improvements. The scope of the second optimization is to find convergence earlier or improve the results slightly.

The loss of the model is the only objective for the optimization run. During the first optimization, 25 trials are run, with one execution per trial and three epochs per execution. The validation split and batch size are the same as in the baseline model, 20% and 32, respectively. The second optimization has almost the same setup but only runs 15 trials. A comparison between the final architecture and the baseline model is described in Section 4.

A third optimization was tested without success. The scope was to tune the initializers, but the results did not improve the model. Therefore, the initializers are the same as in the baseline model.

The final optimized model trains for 20 epochs. Once the training is complete, the model is saved and recycled to be trained with the outer b-pillar data to update the weights, ending with two regression models.

3.5 Exporting fixture layout to CAD software

At this point, there are three trained models. The classification model identifies the sample as inner or outer b-pillar. Two regression models then predict the locators. They have the same architecture but different weights, trained for the two possible classes.

Depending on the output of the classification model, the sample is fed into the correct regression model, retrieving the label data. The label data is post-processed to represent the six locators. First, the sixth point gets the coordinate from the fourth one, inverting the pre-processing step. Next, the X and Y coordinates are transformed from the image coordinates to 3D global coordinates. The transformation leaves the predicted coordinates on the grid plane. A line, perpendicular to the grid plane, intersects the shell to determine the 3D locator. Figure 7 illustrates this step.
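The post-processing of the prediction can be sketched as below. The linear pixel-to-millimeter mapping, the parameter names (`x_min`, `y_max`, `mm_per_px`), and both function names are assumptions for illustration; the actual conversion inverts Algorithm 1 using the stored Image-to-CAD parameters, and the final intersection with the shell is performed in the CAD software.

```python
import numpy as np


def restore_sixth_locator(flat_label):
    """Turn the 10-element prediction into six (x, y) points.

    The sixth locator takes the coordinates of the fourth one.
    """
    xy = np.asarray(flat_label, dtype=float).reshape(5, 2)
    return np.vstack([xy, xy[3]])


def pixels_to_grid_plane(points_px, x_min, y_max, mm_per_px):
    """Map (col, row) pixel coordinates to X-Y in mm on the grid plane.

    Assumes the image origin is the top-left corner, with rows growing
    downward, and a uniform scale in both directions.
    """
    pts = np.asarray(points_px, dtype=float)
    x_mm = x_min + pts[:, 0] * mm_per_px
    y_mm = y_max - pts[:, 1] * mm_per_px
    return np.column_stack([x_mm, y_mm])
```

The returned points lie on the grid plane; each one still has to be projected along the plane normal onto the shell to obtain the 3D locator.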

Fig. 7

Illustration of how the coordinates are exported to the CAD software. If the points are outside of the boundaries, they are projected over the edge of the part

The locators are very close to the edges, and the models may place the points outside the workpiece perimeter, producing errors and liabilities. In these cases, a rule redirects the predicted location to the closest point inside the workpiece. The procedure is automated in the CATIA API. First, it detects the outer edges of the workpiece and projects the point to the closest edge. Finally, the six locators are placed automatically in the workpiece.
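The edge-projection rule is implemented in the CATIA API in the paper. A simplified 2D analogue is sketched below in NumPy, assuming the outer edge is approximated by a closed polygon with distinct consecutive vertices. The function name is hypothetical, and the check of whether the point actually lies outside the part is omitted.

```python
import numpy as np


def closest_point_on_outline(point, outline):
    """Return the point of a closed polygonal outline closest to `point`."""
    p = np.asarray(point, dtype=float)
    verts = np.asarray(outline, dtype=float)
    best, best_dist = None, np.inf
    # Walk over every edge (a, b), including the closing edge.
    for a, b in zip(verts, np.roll(verts, -1, axis=0)):
        ab = b - a
        # Parameter of the orthogonal projection, clamped to the segment.
        t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        candidate = a + t * ab
        dist = np.linalg.norm(p - candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best
```

For a 4×4 square outline and a predicted point at (5, 2), the closest point on the outline is (4, 2), which is where the locator would be moved in this simplified setting.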

4 Analysis of results

4.1 Classification model

Tan and Le [27] built the EfficientNet architecture to classify popular image data sets such as ImageNet, CIFAR, and Flowers. These data sets are all relatively large, and they consist of real RGB pictures and multiple classes. For a binary grayscale classification problem, the modified architecture shows outstanding results from the first epoch. Within three epochs, the model reaches an accuracy of 100% while the loss is minimal. The validation metrics follow the same trend. Table 5 summarizes the training results per epoch.

Table 5 Modified EfficientNet-B0 Training metrics per epoch. Showing loss, accuracy, validation loss, and validation accuracy. Validation split: 30%, batch size: 32, training time: 95s

4.2 Hyperparameter tuning: inner b-pillar regression model

Figure 8 shows a representation of the architecture after both optimizations. In the first optimization, the major differences are the inclusion of average pooling after every convolution except the first, and the Sigmoid activation function. The optimized number of filters deviates from best practice, which suggests increasing the filters progressively after each convolution [37].

Fig. 8 Model architecture after first and second optimizations

Although dropout has proven effective in fully connected layers, there is still some controversy about whether it is beneficial after convolutions [38]. In this case, the best model only includes dropout in the first convolution. The difference in loss between trials regarding dropout position and quantity is minimal. For the best trials, the dropout value is at the minimum of its range and the learning rate is at the maximum of its range.

The loss is the only objective in both optimizations. Figure 9 shows the loss per epoch of the three models, for 20 epochs. The first and second optimized models reach convergence faster than the baseline model, and the loss improves significantly from the first epoch. The detailed view of the plot shows an improvement of more than 60% in the first optimization and about 80% in the second tuning. It is atypical that the validation loss in both optimized models is lower than the loss of the model.

Fig. 9 Regression training for 20 epochs. Comparison between baseline model (gray) against first optimization (brown) and second optimization (green). Validation split: 20%, batch size: 32. Avg. training time: 240s

Although the loss is the main indicator of model performance, predictions on test samples give a more direct picture. Fig. 10 compares the prediction for a test sample from the optimized model against that of the baseline model. The baseline model predictions are off, while the optimized model gives an almost perfect match. Because test samples are not used in training, they show the honest performance of the model.

Fig. 10 Test sample. Comparison between original label, baseline prediction and tuned model-predicted values

4.3 Outer b-pillar regression model

The optimized model is tuned for inner b-pillar samples. Despite the similarities in samples (same data processing) and geometry, the model needs further training on the new parts. The same architecture is expected to return comparably good results once the weights are updated by training on the new data set.

Loss and validation loss show a drop of almost 50%, reaching lower values than in the inner b-pillar data set (see Fig. 11). The plot does not show a clear convergence plateau. However, the range of loss values is so narrow that the model has effectively converged in the first epoch.

Fig. 11 Left, loss per epoch: training of the tuned model in outer b-pillar samples. Right, test sample: comparison between original label and model predicted values

This model is also evaluated on a test sample in Fig. 11. The results show little deviation from the original setup, indicating that the architecture can also predict locators for different sheet metal parts. The points are colored for ease of reading.

5 Managerial implications

The delivered framework provides a quick estimate of the locators under the 3-2-1 locating principle for any recurrent sheet metal design. Training times and data processing imply an initial effort and investment in resources, but faster process planning for new parts will compensate for it.

As long as the parts have similar characteristics to those manufactured in the past, the models will not become obsolete, but will require maintenance. Fine-tuning the models with later designs will increase their versatility and overall performance.

From a human resources point of view, the algorithms presented throughout this paper can by no means replace the process planning team members. Their role will remain the same: to ensure manufacturability of the parts and have close contact with the design department for this purpose.

5.1 About neural networks in automatic fixture layout

Using CNNs can offer a substantial advantage over current methodologies in fixture layout design. The strengths of CNNs are reviewed here against the four challenges stated in the background study.

C1. Application object. The main advantage in this area is that the complexity of the product is not a limiting factor, using either 3D-CNN or the proposed methodology. The limitation lies instead in the availability of previous data. This is a considerable advantage over CBR [22, 24] and rule-based formulation [10, 11, 13, 14].

C2. Setup and re-usability. Once the algorithm has been trained, no further action is required to use it in industrial applications. CNN shares this advantage with rule-based formulation over optimization algorithms, which require a setup for every different part [18,19,20]. The need to train on every different type of part and update the models is the main weakness of the proposed method. This weakness is shared with any CBR approach. In this sense, rule formulation presents a clear advantage over the other categories.

C3. Integration. The proposed methodology is fully integrated into the CAD software as an add-on. Integration is a factor that CBR usually lacks, leaving the engineer to decide on the fixture layout.

In summary, the review of the literature (see Table 1) revealed that there is limited research utilizing methods similar to the one presented in this paper. CNN offers a good compromise between the advantages of the three categories. CNN and reinforcement learning [25] provide new approaches that have not been explored in fixture layout design, and further work on this topic could overshadow traditional methodologies.

6 Discussion

The design of the b-pillar, like that of any other automotive part, requires several iterations between the process planning team (which is in charge of fixture design and layout) and the design team. With the introduction of this framework, the designers will, in the best-case scenario, have the final fixture layout. In the worst-case scenario, they will have an approximation. The information can help engineers to make the necessary design decisions beforehand, reducing iterations between teams.

With the assistance of supervised learning, the industry can benefit from previously solved problems. However, engineering insight is needed to decide how to feed the networks. This framework presents another example of how machine learning applies to industrial parts.

The problem of standardizing the layout is addressed in the methodology, since it is necessary before training. If no standard applies to the fixture layout, the network will not find a correlation between input and output. On the other hand, standardization is always beneficial as the fixture becomes more complicated, e.g., in terms of support points, clamping points, etc.

The data processing methodology is chosen instead of .stl generation or other mesh-related alternatives because it would be highly complex to translate the locators’ coordinates (in mm) into image coordinates (in pixels). The presented framework assures total control of coordinates.

As in every supervised learning task, the final results are highly dependent on the data quality and quantity. An extensive archive of CAD parts would make the network more reliable. In cases where data is limited, data augmentation can tackle the problem. Here, only seven samples are available for training: seven parts belonging to different car models.

The classification network has two main roles in this work. First, it is used to determine if a simple architecture can work with the topographic images of very similar parts (inner and outer b-pillars). Second, it is used to present an end-to-end automated framework. While the classification network may complicate the framework against using CAD metadata for this specific case study, the method is designed to generalize and be applied to cases with multiple part types. In these cases, neural networks may be a better choice than a complex rule-based system for filtering CAD metadata, assuming that this data is available and accessible across different teams, software, and companies, which may not always be the case.

Machine learning always involves some degree of uncertainty. At the end of the day, it is only a non-linear problem solver. Therefore, the results need to be supervised, as no perfect model exists, and fully independent models need “firewalls.”

Optimization of neural networks has demonstrated its utility in this study. However, the number of hyperparameters is so extensive that it is impossible to capture the best configuration for the model with limited computer resources. A local minimum for a good design space can improve the model performance by 80%, making the difference between a useless model and a reliable one.

Short-term future work will apply physics-informed neural networks to find the necessary support points in sheet-metal designs. Re-usability of the network would present a clear advantage over existing optimization frameworks that run one optimization per part.

With the potential of CNNs, expanding the implementation scope could provide more benefits for real-life applications in the automotive industry. The case study of the proposed method is the 3-2-1 locating principle, even though these locations may not be used for the b-pillar assembly in this case. In the current state of the framework, these points are known a priori, but they are not automated. More importantly, our case study opens the way to automating the fixturing of assemblies with different parts and unique configurations. Long-term studies will tackle the problem of fixture layout in the assembly of different parts, e.g., when the b-pillar is welded to the roof and the body side sill. Once the locating layout has been produced for every part, it should be possible for a neural network to decide on the place and number of fixtures, disregarding redundant supports. The challenge of this application lies in the joint sections, which are critical areas for assembly support, where fixtures need to leave enough space for the welding tool.

7 Conclusions

This paper proposes a new solution to the fixture layout problem, which has been addressed multiple times during the last few decades. The framework uses supervised learning in the form of a tuned CNN to lay out fixtures in sheet metal designs. Instead of implementing a traditional 3D-CNN, the design data is processed into a 2D projection consisting of a topographic grayscale map. Data augmentation and a modified EfficientNet-B0 architecture were used to test the suitability of the topographic map for fixture layout. A custom and tuned CNN was then trained to generate the final layout, which was automatically sent to CAD software. Three contributions are the main pillars of this paper.

  1. The topographic grayscale map effectively solved the fixture layout problem for sheet metal designs. The training times and results endorse the methodology as an alternative to 3D-CNN.

  2. The framework automates the 3-2-1 locating principle for b-pillars using supervised learning, and can also be applied to other sheet metal designs. The method’s versatility was demonstrated by analyzing two main components in b-pillars.

  3. The use of a tuned CNN in a regression problem resulted in a solution to the fixture layout problem, rather than just a probable answer as in classification. The logic behind an engineer’s fixturing decisions is similar to regression, where the solution is based on previous experience rather than an absolute truth.

This paper suggests that supervised learning can be a promising approach for automating the complex fixture layout problem.