1 Introduction

When designing shafts for drivetrains, it is of great importance to be able to correctly assess the influence of notches on the strength. Notches can be, for example, shaft shoulders or round grooves. At these locations, there is an increase in stress compared to the unnotched component. For component design according to nominal stress approaches, this stress concentration is expressed by a stress concentration factor \(K_{\mathrm{t}}\), which is defined as follows [1]:

$$K_{\mathrm{t}}=\frac{\sigma_{\mathrm{max}}}{\sigma_{\mathrm{n}}}\quad\text{or}\quad K_{\mathrm{t}}=\frac{\tau_{\mathrm{max}}}{\tau_{\mathrm{n}}}.$$
(1)

It indicates the ratio of the maximum stress at the notched cross-section to the nominal stress at the unnotched cross-section. Its value depends on the geometry of the component and the load type. There are several methods to determine the stress concentration factor.

Approximate equations for simple geometries like shaft shoulders are given by the calculation specifications of DIN 743 [2]. These are based on the fundamental work on notch stress theory by Neuber, who gives the stress concentration factors for real notches by averaging the theoretically determined stress concentration factors for the limiting cases of an “infinitely shallow” and an “infinitely deep” notch. Further work by Petersen and Peterson extended and simplified Neuber’s equation and adjusted the constants to empirical data [1]. This process resulted in Eq. (2) with the coefficients given in Table 1. The meaning of the symbols \(D\), \(d\), \(t\) and \(r\) is given in Fig. 1.

$$K_{\mathrm{t}}=1+\frac{1}{\sqrt{A\cdot\frac{r}{t}+2\cdot B\cdot\frac{r}{d}\cdot\left(1+2\cdot\frac{r}{d}\right)^{2}+C\cdot\left(\frac{r}{t}\right)^{z}\cdot\frac{d}{D}}}$$
(2)
Table 1 Coefficients for the determination of the stress concentration factor for shaft shoulders according to [2]
Fig. 1 Dimensions at the shaft shoulder
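Eq. (2) translates directly into a short function. The sketch below is a minimal Python implementation of the formula; the load-case-dependent coefficients \(A\), \(B\), \(C\) and \(z\) from Table 1 are not reproduced here and must be supplied by the caller.

```python
import math

def stress_concentration_factor(r, t, d, D, A, B, C, z):
    """Evaluate Eq. (2) for a shaft shoulder.

    r, t, d, D are the notch dimensions from Fig. 1; A, B, C and z are the
    load-case-dependent coefficients from Table 1 (not reproduced here).
    """
    denominator = math.sqrt(
        A * r / t
        + 2 * B * (r / d) * (1 + 2 * r / d) ** 2
        + C * (r / t) ** z * (d / D)
    )
    return 1 + 1 / denominator
```

Since every term under the root grows with the notch radius \(r\), the predicted \(K_{\mathrm{t}}\) decreases for milder notches, as expected.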

Alternatively, the stress concentration factor can be determined numerically using the finite element method (FEM) [3, 4]. In this case, the geometry to be investigated is simulated in a calculation program. By applying an external load, the maximum stress at the notch can be found. Using Eq. (1), the stress concentration factor is obtained.

The application of the approximate equations can be done very quickly and with low resource requirements. However, the accuracy of this method is limited. In contrast, the determination of the stress concentration factors by means of the FEM can be very accurate, depending on the mesh quality. This is offset by the effort required to create the model and the significantly longer calculation time.

A determination method that combines the accuracy of FEM simulations with the speed of applying approximate equations is desirable. To some extent, this has been investigated by Wendler [5], who developed approximate equations for selected spline shafts from extensive FEM studies using traditional regression methods. However, these equations are of limited precision, with errors of up to 10%. In the recent past, machine learning (ML) methods have proven to be a promising technique for precisely representing complex relationships [6, 7]. In the present work, their suitability for the regression of stress concentration factors is therefore investigated. From the results of a parameter variation study performed by FEM, a regression analysis is carried out using machine learning methods. Thereby, the accuracy advantages of the FEM are combined with the speed of applying approximate equations.

2 Basics of machine learning

Supervised learning constitutes one category of machine learning methods and can be used for regression. In supervised learning, a function \(f\) of the type \(\hat{y}=f(\vec{x})\) is optimized in such a way that the deviation between the predictions \(\hat{y}\) and the true labels \(y\) becomes minimal. [8] For this purpose, the prediction of the model is compared with the actual value for each sample of the training data. Depending on the deviation, the model parameters are adjusted so that the prediction error is minimized. This process is repeated until a termination criterion is reached, which can be a prediction error limit or a maximum number of iterations. [9] In this particular case, \(\vec{x}\) comprises the geometric quantities of the notch and \(\vec{y}\) the stress concentration factors for the three load cases tension/compression, bending and torsion. After training, the model should be able to correctly estimate the stress concentration factors even for geometries that were not part of the training data.
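The iterative adjustment described above can be illustrated with a minimal training loop. The sketch below fits a linear model by gradient descent on the mean squared error; both termination criteria mentioned in the text (an error limit and a maximum number of iterations) are included. It is a generic illustration, not the training procedure of the Keras models used later.

```python
import numpy as np

def train(X, y, lr=0.1, max_epochs=1000, tol=1e-6):
    """Minimal supervised learning loop: adjust the parameters w so that the
    squared error between the predictions y_hat = X @ w and the labels y
    becomes minimal, stopping at an error limit or a maximum number of
    iterations."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        error = X @ w - y                  # deviation of prediction from label
        if np.mean(error ** 2) < tol:      # termination: prediction error limit
            break
        w -= lr * X.T @ error / len(y)     # gradient step on the MSE
    return w
```

For data generated by an exactly linear relationship, the loop recovers the underlying parameters.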

The machine learning process can vary in detail depending on the application, but generally proceeds as depicted in Fig. 2. An extensive overview of this process can be found in [10].

Fig. 2 Machine learning workflow

In this work, the holdout approach and cross-validation are used to assess the performance of the trained models. These techniques are also employed to optimise the adjustment options of the models in a process called hyperparameter tuning. Both procedures are described in more detail in [11].

The function of artificial neural networks (ANN) as described in [12,13,14] is based on the structure of the human brain, where a large number of nerve cells (neurons) are interconnected. This structure enables information to be passed on and processed. Similarly, an ANN is composed of several layers, which are called input layer, hidden layer or output layer according to their position. Each layer consists of several neurons that are connected to those of the adjacent layers, see Fig. 3.

Fig. 3 Structure of an ANN

These neurons each accept a number of input values \(x_{i}\) with \(i=1\dots n\), where \(n\) is the number of inputs. A certain weighting factor \(w_{i}\) is assigned to each input. In the neuron, each input is multiplied by the associated weighting factor and the result is summed up. Additionally, a bias input value \(w_{0}\) is added. The neuron applies an activation function \(\varphi\) to this network input. [15]

The output of the artificial neuron is thus calculated as follows:

$$\hat{y}=\varphi\left(\sum_{i=1}^{n}{x_{i}\cdot w_{i}}+w_{0}\right).$$
(3)

\(\varphi\) is usually a nonlinear function, for example the sigmoid function [16].
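As an illustration of Eq. (3), a single artificial neuron can be written in a few lines of NumPy; the sigmoid serves as the activation function \(\varphi\):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(x, w, w0, activation=sigmoid):
    """Eq. (3): weighted sum of the inputs plus the bias w0, passed through
    the activation function phi."""
    return activation(np.dot(x, w) + w0)

# For inputs (0.5, -1.0), weights (2, 1) and zero bias the network input is 0,
# so the sigmoid returns 0.5
out = neuron_output(np.array([0.5, -1.0]), np.array([2.0, 1.0]), 0.0)
```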

Depending on the position of the neuron in the ANN, \(\hat{y}\) could either be the model’s output or one input to the next layer of neurons. During the training process, the weights of the neurons are repeatedly adjusted according to the magnitude and direction of the prediction error using a method called backpropagation [14]. The number of layers and neurons, the learning rate, the activation function, and the number of epochs are hyperparameters of an ANN.

3 Model development process

To implement the workflow described in Sect. 2, an automated process chain for creating and evaluating ML models using Python is developed. The objective is to automate the entire procedure of model creation and assessment to the extent that it runs without further intervention and can be applied to new geometries without major changes. This also includes the process of model optimization through hyperparameter tuning. The goal is to be able to examine a wide variety of model architectures without making significant changes to the source code.

For this purpose, widely used software libraries are utilized. To prepare the data, perform the splits and evaluate the models, scikit-learn [17] is used. The ANNs are created using Tensorflow with the Keras interface [18, 19].

The process chain is divided into three Python scripts, as shown in Fig. 4. The main script is the central interface that initializes the process chain. It defines the input and output directories and creates a specific folder structure to manage the extensive data that accumulates in the model creation process. The experiment’s main folder stores general information, such as data set analyses, log files of the training process, and graphical comparisons of the model results. The training and testing data sets are stored in a subfolder for reproducible evaluation. Each model architecture to be tested has a separate subfolder that contains the fully trained and stored model, as well as diagrams and data for assessing the model quality.

Fig. 4 Process chain and developed Python scripts

The training data are collected in a CSV file. Through the main script, a visualization of the distributions and relationships in the dataset is performed using the pairplot function of the seaborn Python library [20]. In the next step, the data is split into a training and a test dataset. For supervised learning, the data is then divided into features (input variables) and labels (output variables).
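The split into training and test data and into features and labels can be sketched as follows; the column names are hypothetical stand-ins for those in the actual CSV file, and a small in-memory DataFrame replaces the file for a self-contained example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical column names; in the process chain these values are read from
# the CSV file produced by the FEM workflow.
df = pd.DataFrame({
    "sharpness":  [0.1, 0.4, 1.3, 0.8],
    "depth":      [0.6, 0.9, 0.5, 0.7],
    "kt_tension": [3.1, 1.9, 1.5, 1.7],
    "kt_bending": [2.8, 1.8, 1.4, 1.6],
    "kt_torsion": [2.0, 1.5, 1.3, 1.4],
})

features = df[["sharpness", "depth"]]                        # input variables
labels = df[["kt_tension", "kt_bending", "kt_torsion"]]      # output variables
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0)
```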

This is followed by the actual creation of the models. For this purpose, the main script calls functions implemented in another Python file (the model script). These are used to create different model architectures with their respective hyperparameters. The ANNs are created using the Keras Functional Application Programming Interface (API). The size of the input layer is determined automatically from the structure of the input data, the size and number of hidden layers are user-defined, and the size of the output layer is given by the number of output quantities. Training is performed with cross-validation, using the scikit-learn function “KFold” to split the training data into parts. Subsequently, one model instance is trained on each sub-dataset to determine error metrics using the evaluation function described below. The metrics are averaged and their standard deviation is calculated. To prevent overfitting, the early stopping mechanism provided by the Keras API is used. It stops the training procedure if the validation error does not improve over several epochs. The validation dataset is split off from the training data of the current cross-validation run.
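A condensed sketch of this training loop is given below. For a self-contained example, scikit-learn's MLPRegressor stands in for the Keras model (with its built-in early stopping on a validation split), and a simple analytic function stands in for the FEM data; the cross-validation structure with “KFold” follows the description above.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))              # toy notch sharpness and depth
y = 1.0 + 1.0 / np.sqrt(X[:, 0] + 0.1)      # analytic stand-in for a Kt curve

fold_errors = []
for train_idx, val_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    model = MLPRegressor(hidden_layer_sizes=(5, 5),  # two hidden layers, five neurons each
                         learning_rate_init=0.001,
                         max_iter=500,
                         early_stopping=True,        # stop when the validation error stalls
                         random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

# Metrics are averaged over the folds and the standard deviation is reported
mean_mse, std_mse = np.mean(fold_errors), np.std(fold_errors)
```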

The evaluation is performed using the functions of a third Python file. It compares the trained model against the separate test data and determines standard error metrics such as mean squared error (MSE) or mean absolute percentage error (MAPE) [21]. In addition, plots comparing predicted to true stress concentration factor and residuals are automatically generated. Furthermore, graphs of the progression of the prediction error during training are generated. These allow an assessment of the learning progress. Finally, the trained model is stored in serialized form for later use.
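The two standard metrics can be computed in a few lines; the \(K_{\mathrm{t}}\) values below are purely illustrative.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([1.5, 2.0, 3.2])   # stress concentration factors from FEM (illustrative)
y_pred = np.array([1.6, 1.9, 3.3])   # model predictions (illustrative)

mse = mean_squared_error(y_true, y_pred)
mape = np.mean(np.abs((y_true - y_pred) / y_true))  # MAPE as a fraction, not percent
```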

4 Application on shaft shoulders

4.1 Data generation

To illustrate the developed method, it is applied to determine the stress concentration factors for shaft shoulders under tension/compression, bending and torsion. The training data is generated by FEM simulation.

However, the developed workflow, including the insights described in the following sections, can be adapted to all types of notch geometries. The agnostic approach to the nature of the input data is a key advantage of the process chain. The only limitation lies in the ability to generate a large number of differently parameterized notch geometries. In general, this can be done with any FEM software. Alternatively, automated FEM workflows can be applied, reducing the amount of manual effort. This approach is used in this work with the program Kerbert [22]. Based on the definition of the notch geometry, the program automatically generates a model, meshes it and performs the FE calculation using the Ansys solver. Stress concentration factors are automatically calculated and returned by the program in the Reusable Engineering Exchange Standard (REXS) format [23]. To automate this process, a calculation workflow is developed. It first creates the geometry files based on a given parameter field. Afterwards, it starts the sequential calculation for all input files. It extracts the stress concentration factors from the resulting files and stores them together with the corresponding geometry data in a CSV file. This file constitutes the training data. The parameter field for the geometries to be examined is defined in accordance with DIN 743, which gives the following limits for the validity of the approximate equations [2]:

$$r/t\geq 0.03,\quad d/D\leq 0.98,\quad K_{\mathrm{t}}\leq 6.$$
(4)

In the following, the geometry parameter \(r/t\) is referred to as the (notch) sharpness and \(d/D\) as the (notch) depth.

Within these limits, training data needs to be generated covering the range of valid stress concentration factors as homogeneously distributed as possible. For this purpose, the geometry parameters have to be chosen appropriately. Therefore, different distributions of notch sharpness and notch depth and their effects on the resulting stress concentration factors are investigated. The actual geometry parameters of each shaft shoulder are determined by randomly drawing from the distribution. With these values, the estimation of the expected stress concentration factors is carried out according to Eq. (2).

A normal distribution with mean 0.9 and standard deviation 0.3 is selected for the notch depth. According to Eq. (4), it is trimmed to a range of values from 0.05 to 0.98.

For the notch sharpness, an exponential distribution is chosen, which produces predominantly small values. These are necessary to provide a sufficient number of large stress concentration factors for the training process. To ensure that larger values of the notch sharpness are also present in the training data, a normal distribution with a mean value of 1 is superimposed. The value range of this composite distribution is limited to 0.05–2. The deviation from the validity range of Eq. (4) is due to computational constraints: preliminary tests showed that for very sharp notches with a notch sharpness \(<0.05\), no stable meshing and calculation is possible.
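The sampling of both geometry parameters can be sketched as follows. The mixture weight and scale parameters are assumptions for illustration (the text only fixes the mean of the superimposed normal distribution at 1); the trimming to the value ranges is done by drawing a surplus of candidates and discarding those outside the limits.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Notch depth d/D: normal(0.9, 0.3), trimmed to [0.05, 0.98]
depth = rng.normal(0.9, 0.3, size=4 * n)
depth = depth[(depth >= 0.05) & (depth <= 0.98)][:n]

# Notch sharpness r/t: exponential distribution (mostly sharp notches)
# superimposed with a normal distribution around 1, limited to [0.05, 2].
# Mixture weight 0.5, exponential scale 0.3 and normal std 0.4 are assumed.
sharp = np.where(rng.random(4 * n) < 0.5,
                 rng.exponential(scale=0.3, size=4 * n),
                 rng.normal(1.0, 0.4, size=4 * n))
sharp = sharp[(sharp >= 0.05) & (sharp <= 2.0)][:n]
```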

A total of 10,000 shaft shoulder geometries are created. With the described settings, the resulting distributions of the stress concentration factors for each load case are shown in Fig. 5. A largely uniform distribution is obtained, which covers the range of practically relevant stress concentration factors very well.

Fig. 5 Distributions of the expected stress concentration factors (calculated via the approximate equations of DIN 743) for tension/compression, bending and torsion

The scatter plot in Fig. 6 shows the realised combinations of notch sharpness and depth. The advantage of random sampling can be recognized clearly. In contrast to a full factorial combination, there are no discrete steps. Instead, the quantities can be considered as continuous.

Fig. 6 Scatter plot of notch sharpness and notch depth

4.2 Initial model training

In the following, three different regression models will be investigated. For reference, linear regression and a third-degree polynomial regression are considered. Their performance is compared to that of an ANN.

As explained in Sect. 3, the ANN is created using the Keras library. Initially, a small network shall be investigated. It consists of two hidden layers with five neurons each. Since two input parameters (notch sharpness and depth) and three output values (stress concentration factor for each load case) are given, the overall architecture of the network is 2‑5-5‑3. In the first layer, an additional normalization of the input data is performed. The learning rate is chosen to be 0.001 and 500 epochs of training are performed with a batch size of 32.

For the reference models the scikit-learn library is used. Normalization is carried out similarly to the ANN.
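The two reference models can be set up as scikit-learn pipelines, with the scaler taking the role of the ANN's normalization layer. A possible sketch:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Normalization (here via StandardScaler) precedes the regression step,
# analogously to the normalization layer of the ANN.
linear_model = make_pipeline(StandardScaler(), LinearRegression())
poly_model = make_pipeline(StandardScaler(),
                           PolynomialFeatures(degree=3),
                           LinearRegression())
```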

With these settings, the results shown in Table 2 and Fig. 7 are obtained. The models are assessed using threefold cross-validation.

Table 2 Model metrics
Fig. 7 MSE of different models for shaft shoulder stress concentration factor regression

The ANN shows a significantly better fit than linear or polynomial regression. While the MAPE of the classical regression techniques does not fall below 4.6%, it is reduced to 1.1% by the ANN.

To better assess the performance of each model, the prediction behavior on the test data is examined. For this purpose, the residuals are investigated in relation to the output values, see Fig. 8. They shall be homoscedastic, i.e. the residual scatter should be approximately equal and symmetrical across all stress concentration factors. For linear regression, this is obviously not the case. The residuals show a parabolic pattern. This is an indicator that the model cannot capture the complexity of the data sufficiently well. A similar pattern emerges with polynomial regression. Despite the quantitatively much better fit, the ANN also shows heteroscedastic behavior with slightly larger residuals in the range of higher stress concentration factors.

Fig. 8 Residuals of the stress concentration factors predicted by the models

4.3 Model tuning

The pattern in the residuals plot of the ANN indicates that the model is not complex enough to represent the relationship. For this reason, its architecture shall be optimized. The other hyperparameters (learning rate, batch size) will also be tuned.

Keras provides different tuners for this process. In this work the Hyperband tuner is used. It is a multi-fidelity approach that implements a successive halving algorithm [24]. Given a range of possible hyperparameters, a large number of model instances with different hyperparameter sets are trained over a few epochs. The best parameter combinations are then trained further, while the weaker half are discarded. This process continues until only one model instance with the optimal hyperparameters remains.
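Stripped of all Keras specifics, the successive halving principle can be sketched in plain Python: evaluate all candidates on a small budget, keep the better half, and double the budget for the survivors.

```python
def successive_halving(configs, evaluate, budget=1, rounds=3):
    """Keep the better half of the candidate hyperparameter sets in each
    round, doubling the training budget (e.g. number of epochs) for the
    survivors. evaluate(config, budget) must return a validation error
    (lower is better)."""
    candidates = list(configs)
    for _ in range(rounds):
        scored = sorted(candidates, key=lambda c: evaluate(c, budget))
        candidates = scored[:max(1, len(scored) // 2)]  # discard the weaker half
        budget *= 2                                     # survivors train longer
    return candidates[0]
```

As a toy example, the validation error can be taken as the distance of a candidate learning rate from a hypothetical optimum of 0.01; starting from eight candidates, three halving rounds leave exactly that value.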

The selected parameter ranges and the settings found by the tuning process are listed in Table 3.

Table 3 ANN hyperparameter ranges and optimized values

Fig. 9 shows the mean errors of the tuned ANN compared to those with default settings. A significant improvement was achieved. The MAPE is minimized to 0.37%, which is a further reduction of 65% compared to the initial ANN. To rule out the possibility that this is a case of overfitting, the prediction results are examined in more detail. For this purpose, the predicted stress concentration factors are visualized as a function of the geometry parameters notch sharpness and depth, see Fig. 10. The resulting solution surface should preferably be as smooth as possible. Major unevenness is equivalent to overfitting individual training points, for which there is no justification in terms of the actual physical context.

Fig. 9 MSE of the tuned models compared to initial results

Fig. 10 Stress concentration factors for tension/compression as functions of notch sharpness and depth for different ANNs

The optimized ANN shows individual convolutions that indicate overfitting. This behavior is particularly evident in the plots in Figs. 11 and 12, which show the predicted stress concentration factors across all notch depths for notch sharpnesses of \(r/t=0.4\) and \(r/t=1.3\), respectively. While the stress concentration factors show a smooth trend for the notch with sharpness 0.4, a discontinuity can be seen for the smoother notch with sharpness 1.3. At a notch depth of around \(d/D=0.5\), the slope changes abruptly, which does not seem physically justified and might be an imperfection in the simulated data.

Fig. 11 Regression result of different ANNs for notch sharpness 0.4

Fig. 12 Regression result of different ANNs for notch sharpness 1.3

Regularization methods are used to correct this behavior. These limit the maximum complexity of the model already in the training process and thus reduce the risk of overfitting. For the ANN, this is achieved by dropout regularization. In this approach, a random fraction of neurons is deactivated in each run during the training process. The network is forced to form alternative connections and excessively large weights leading to overfitting are prevented. [25]
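The core of dropout regularization is a random mask applied to the neuron outputs during training. The sketch below uses the "inverted dropout" convention (rescaling the surviving activations so that their expected value is unchanged, as Keras also does internally):

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of the neuron outputs during training
    and rescale the survivors by 1/(1 - rate) so that the expected activation
    is unchanged. At inference time the outputs pass through unmodified."""
    if not training or rate == 0.0:
        return activations
    mask = rng.random(activations.shape) >= rate   # True = neuron stays active
    return activations * mask / (1.0 - rate)
```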

Table 3 lists the settings of the regularized model. In particular, a dropout rate of 2% is applied after every hidden layer. The result surface of the regularized ANN in Fig. 10 no longer shows convolutions and is smooth. This observation is confirmed by the plots in Figs. 11 and 12. The regularized neural network compensates for the fluctuations in the data better than the network with the hyperparameters from tuning and is not influenced by single outlier data points.

Fig. 9 shows a compilation of the errors of the initial, optimized, and regularized ANN compared to those of the reference models. The improved generalization capability of the regularized ANN simultaneously causes individual diverging data points to be captured less accurately. As a result, the prediction error is increased, with an MAPE of 0.68%. This is referred to as the bias-variance tradeoff [26]. A stable model capable of generalization shows a low variance, but has a larger bias (a larger systematic error). Conversely, overfitted models are able to represent the relationship between input and output very well and consequently have a low bias. At the same time, however, they are very sensitive to small variations in their training data and thus show a larger variance.

It can be stated that the tuning algorithm used prefers hyperparameter combinations with low bias, which results in a tendency to overfit. This is somewhat unexpected, since the Hyperband tuning algorithm uses a separate validation dataset and multiple iterations to evaluate the achieved goodness of fit. However, since the numerical deviations in the FEM results are present in the entire data set, the overfitting can only be detected to a limited extent with this method. A manual check of the optimization results is therefore recommended.

Furthermore, it can be seen in Fig. 12 that the regularized ANN predicts stress concentration factors \(K_{\mathrm{t}}<1\), which are not physically reasonable for a real shaft shoulder. This is due to the characteristics of the data creation process, which has the objective of an even stress concentration factor distribution. As a result, only few data points cover the range of small values of notch depth. However, since this area of extreme changes in shaft diameter is of little relevance in a technical context, no extension of the training data was carried out for this work. Nevertheless, this emphasises the importance of the data collection process in machine learning. Since the extrapolation capability of regression models is always limited, it is essential to cover a wide range of data points during the training process.

4.4 Comparison to other methods

In order to finally assess the quality of the models, a comparison with other calculation methods is performed. The first reference is the set of approximate equations according to DIN 743 [2]. In addition, the modified equations according to Melzer [27] are taken into account. They are of particular interest because these approximation equations were fitted by regression to FEM results obtained by Melzer. This approach is therefore essentially comparable to the method presented here, although different regression techniques were used.

The error metrics for the different calculation methods are shown in Table 4. By using machine learning, the mean absolute percentage error on the test data set is reduced to 0.79%. This is an improvement of 57% compared to the Melzer equations. This corresponds to a very accurate representation of the stress concentration factors determined by the FEM simulation while maintaining the high calculation speed of using approximate equations.

Table 4 Model metrics

In the residuals plot in Fig. 13, it can be clearly seen that the ANN provides more accurate results than the calculation according to DIN 743 across the entire stress concentration factor range investigated. Even in comparison to the calculation according to Melzer, the ANN shows better results. Nevertheless, it should be considered that the precision of the stress concentration factor prediction depends on the quality of the training data, in this case the FEM simulations. The determined error metrics focus solely on the models’ capability to capture the rules inherent to the simulation results. Therefore, the results can be further improved by refined simulations.

Fig. 13 Comparison of the residuals of different models

5 Conclusion

Machine learning models possess great potential for quick and accurate solutions to complex problems. This work demonstrates this potential for new approaches to strength calculation. Since stress concentration factors describe the complex relationship between notch geometry and stress concentration, their calculation is used as a test case for machine learning regression. The model development workflow allows for the automatic calculation of FE models for a given range of geometric parameters and notch geometries using the software Kerbert and Ansys. Afterwards, machine learning models can be automatically trained and optimized for the specific problem.

This process can be adapted to a multitude of application scenarios, allowing users with different backgrounds to benefit from the precision of FEM calculations without having to create the models manually. Since the entire process from FEM modeling to training to output of usable models is fully automated, the workflow can be used with little prior knowledge of these topics, making the benefits of machine learning available to a wider range of users.

Results for the shaft shoulder investigated in this work show a good accuracy of the stress concentration factor prediction by the machine learning models, even surpassing common approximation equations such as those of DIN 743 or Melzer. Once the training process is completed, the machine learning models combine a short computation time with high accuracy.

This proves most beneficial for complex geometries such as splined shafts. For complex spline geometries, universally applicable approximation equations either do not exist or are not as precise as desired. This creates the need for detailed FE models, which require high modelling effort and computational power and lead to long computation times. This hindrance may be overcome by applying machine learning models trained on precise finite element results for these complex geometries. This study presents the necessary tools to develop such models and enable fast yet accurate calculation of stress concentration factors for complex notch geometries.

Due to the model and data agnostic approach and the automatic adaption to the given structure of the input and output data, the developed workflow may even be applied to regression problems beyond the scope of stress concentration factors. With a sufficient amount of high quality training data, the potential use cases are almost unlimited.