1 Introduction

Standard cold rolling is a long and energy-consuming process, and therefore economically unattractive. This is the main reason why attempts to simplify the process by eliminating some technological operations are justified. In practice, such simplification can accelerate the process and reduce manufacturing costs. Reduced energy consumption, a smaller impact on the natural environment and lower pollutant emissions are also significant benefits.

In recent years, an intensive development of integrated technological processes has been observed. In these processes, casting is combined with rolling the strand in a semi-solid state into the shape and dimensions close to the shape and dimensions of the final product. Steel solidifies over a range of temperature and not at one fixed point, and therefore strands in the integrated casting and rolling process are characterized by the presence of a semi-solid zone. Very often during this production cycle, problems arise with the quality of the end product, especially when new grades of steel are cast. If during strand straightening its surface or edge temperature is in the reduced range of ductility, there is an immediate risk of cracks forming in the cast strand. Figure 1 shows a schematic diagram of the integrated casting and rolling process, indicating also the critical process steps (C1–C5). The area of the cast strand leaving the mould (C1), as well as the strand bending (C2–C3) and straightening operations (C4) are classified as critical steps, while the area of high tensile stresses (C5) is critical for the installation of pressure rolls in the soft-reduction process as discussed in the manuscript [1].

Fig. 1 Schematic diagram of the integrated casting and rolling process showing the critical process steps (C1–C5)

To model the phenomena which occur during soft-reduction, it is necessary to carefully examine numerous complex problems related to this process. The conditions under which the solidification of the strand takes place and the method of its casting affect the final structure of the product, its condition and physical properties as presented in the manuscript [1].

The determination of the dynamic response of a given material under specific deformation conditions is of key importance for the optimization of its final microstructure and mechanical properties. This subject was discussed in the manuscript [2]. The main parameter describing this response at extra-high temperatures (close to the solidus line) is stress, which depends, among other factors, on the strain rate, temperature and fraction of the liquid phase, as presented in the manuscript [3].

The flow behaviour of steel during hot rolling is difficult to model as the internal structure evolves over time through the interaction of mechanisms such as strain hardening, dynamic recovery and dynamic recrystallization—the phenomena which are presented in the manuscript [4]. In principle, this type of behaviour can be described with constitutive models, which provide a mathematical description of the correlation between stress and strain, strain rate, and temperature.

There are basically three types of constitutive models currently in use: phenomenological, empirical (physical) and those based on artificial intelligence. The first two are referred to in the literature as classical constitutive models, as presented in the manuscript [5]. Phenomenological models are the most commonly used, as discussed in the manuscript [6]. They are based on fitting experimental data to appropriate mathematical equations or functions and do not require a detailed understanding of the physical phenomena involved in the deformation process. Empirical (physical) models, on the other hand, take into account the physical aspects of the deformation of materials. Based on the theory of thermodynamics, thermally activated dislocation motion and slip kinetics, they use a large number of material constants and functional coefficients, the determination of which is often laborious and requires high-precision tools. In classical constitutive models, the functional coefficients are usually determined by regression.

Unfortunately, some of the mechanisms by which various factors control the flow stress cannot be described sufficiently well with classical models. Scattered data are difficult to handle by regression, and whenever new experimental data are added, the regression constants have to be recalculated. This topic was discussed, e.g., in manuscripts [6] and [7]. Additionally, in the case of deformation at extra-high temperatures (close to the solidus line), the relationship between stress and strain, strain rate, and temperature is often very complicated and difficult to describe with traditional models. Recently, however, the technical literature has offered a few examples of research in which artificial neural networks are used as a tool to describe, e.g., the hot deformation process of an aluminium hybrid nanocomposite [8] or to predict the yield stress in magnesium alloy [9]. Several publications compare constitutive models with network models, e.g. for TC18 titanium alloy [10], Pb-Mg-10Al-0.5B alloy [11] and an Al–Zn–Mg–Cu alloy [12].

Artificial neural networks (ANNs) are essentially semi-parametric regression estimators and are well suited for this purpose because they can approximate virtually any function with any degree of accuracy as presented in the manuscript [6]. An important advantage of the ANN approach is the possibility of algorithmic conversion of input data into output data without the need to have a well-defined process. In the ANN-based models, only a set of representative examples is needed, which makes ANN ideally suited to the problem of estimating the flow stress based on the available experimental data.

As part of this study, an attempt was made to use artificial neural networks to predict the high-temperature mechanical properties of S355 steel. The aim of the study was to develop a model of changes in stress as a function of strain, strain rate and temperature, necessary for the development of soft-reduction process simulation software.

2 Research methodology

To develop a model of changes in stress as a function of strain, strain rate and temperature, needed for the soft-reduction process simulation software, an experiment was carried out. The source data it produced served at later stages of the work as training data for the artificial neural networks.

The main goal of the design work was the preliminary experimental verification of the soft-reduction model based on the smoothed particle hydrodynamics (SPH) method. It was decided to carry out the initial verification of the developed solutions on S355 steel. This steel grade was selected for its universality and the wide availability of literature data. Tests of the developed software will also be carried out on other steel grades supplied by the steelworks.

2.1 Test materials and conditions

Studies were made on S355 steel with the chemical composition presented in Table 1.

Table 1 Chemical composition of the tested S355 steel, wt%

A Gleeble 3800 thermo-mechanical simulator was used in the tests. Figure 2 shows the simulator setup with the sample mounted before starting the experiment.

Fig. 2 Gleeble 3800 thermo-mechanical simulator with the sample mounted before starting the experiment

The tests were carried out in vacuum with a small addition of argon. Cylindrical Ø10 × 124 mm samples were used in the experiment (Fig. 3).

Fig. 3 The sample used for testing the S355 steel

Based on the determined solidus and liquidus temperatures of the tested steel (1465 °C and 1513 °C, respectively), a plan of experiments was designed.

The samples were heated to 1400 °C at a rate of 20 °C/s. In the temperature range up to 1450 °C, slow heating at a rate of 1 °C/s and holding for 10–30 s at 1450 °C were used to stabilize the temperature in the sample volume. Then controlled cooling was performed at a rate of 10 °C/s to the nominal deformation temperature. The samples were deformed in tension at stroke rates of 1 mm/s and 20 mm/s. The tensile tests were carried out until sample failure. During the tests, the following parameters were measured: current intensity as a function of time, force as a function of tool stroke, and temperature on the sample surface (TC4 thermocouple) and in the sample core (TC3 thermocouple), as shown in Fig. 4.

Fig. 4 Schematic diagram of the sample with mounted thermocouples

It was assumed in the studies that the nominal strain \(\varepsilon_{n}\) (1) is the quotient of the elongation \(\Delta L\) (the stroke of the moving tool in the simulator system) and the effective working zone \(L_{0}\), the nominal strain rate \(\dot{\varepsilon }_{n}\) (2) is the quotient of the tool stroke rate \({s}_{\mathrm{rate}}\) and the effective working zone \(L_{0}\), and the nominal stress \({\sigma }_{n}^{\mathrm{exp}}\) (3) is the quotient of the tensile force \(F\) and the initial cross-sectional area of the tested sample \(S_{0}\). An estimated value of \(L_{0}\) equal to 20 mm was adopted in the computations, and the nominal temperature \(T_{n}\) was taken as the sample surface temperature, \({T}_{n}={T}_{\mathrm{surf}}^{\mathrm{exp}}\) (the indications of the TC4 control thermocouple in individual tensile tests).

$${\varepsilon }_{n}=\frac{\Delta L}{{L}_{0}}$$
(1)

where \(\Delta L\) is the sample elongation, \({L}_{0}\) is the effective working zone.

$${\dot{\varepsilon }}_{n}=\frac{{s}_{\mathrm{rate}}}{{L}_{0}}$$
(2)

where \({s}_{\mathrm{rate}}\) is the stroke rate.

$${\sigma }_{n}^{\mathrm{exp}}=\frac{F}{{S}_{0}}$$
(3)

where \({S}_{0}\) is the initial cross-sectional area, \(F\) is the tensile force.
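Definitions (1)–(3) can be illustrated with a minimal Python sketch. It assumes the values stated above (\(L_{0}\) = 20 mm, a Ø10 mm sample); the function names and the example force and stroke readings are hypothetical and are not taken from the study.

```python
import math

L0_MM = 20.0        # effective working zone adopted in the study, mm
DIAMETER_MM = 10.0  # sample diameter, mm (Ø10 × 124 mm samples)
S0_MM2 = math.pi * DIAMETER_MM ** 2 / 4.0  # initial cross-sectional area, mm^2


def nominal_strain(delta_l_mm: float, l0_mm: float = L0_MM) -> float:
    """Eq. (1): elongation divided by the effective working zone."""
    return delta_l_mm / l0_mm


def nominal_strain_rate(stroke_rate_mm_s: float, l0_mm: float = L0_MM) -> float:
    """Eq. (2): stroke rate divided by the effective working zone, in 1/s."""
    return stroke_rate_mm_s / l0_mm


def nominal_stress(force_n: float, s0_mm2: float = S0_MM2) -> float:
    """Eq. (3): force divided by the initial cross-section (N/mm^2 = MPa)."""
    return force_n / s0_mm2


if __name__ == "__main__":
    # The two stroke rates used in the tensile tests.
    for rate in (1.0, 20.0):
        print(f"{rate} mm/s -> {nominal_strain_rate(rate)} 1/s")
    # Hypothetical single reading: 2 mm stroke, 1000 N force.
    print(nominal_strain(2.0), nominal_stress(1000.0))
```

With \(L_{0}\) = 20 mm, the stroke rates of 1 mm/s and 20 mm/s correspond to nominal strain rates of 0.05 s−1 and 1 s−1, the two values used throughout the data set.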

2.2 Test results—source data

As a result of the studies, a data set containing 5852 records and 4 variables was developed. The variables were temperature \(T\) (℃), strain rate \(\dot{\varepsilon }\) (1/s), strain \(\varepsilon\) and stress \(\sigma\) (MPa). The first three variables (\(T\), \(\dot{\varepsilon }\), \(\varepsilon\)) were independent variables, while the dependent variable in the examined problem was stress, \(\sigma\). A fragment of the data set is presented in Table 2, and the ranges of the values of the studied variables are presented in Table 3.

Table 2 Fragment of the obtained experimental data
Table 3 List and ranges of the analysed input parameters

The characteristics of the data set (its size) are shown graphically, for individual temperatures and strain rates, in Fig. 5.

Fig. 5 The characteristics (size) of individual data sets

As follows from Fig. 5, the data set contains a different number of records for different temperatures and strain rates, \(\dot{\varepsilon }\). The largest number of records was collected for the temperature of 1450 ℃ and \(\dot{\varepsilon }=0.05 {\mathrm{s}}^{-1}\), while far fewer records were obtained for the strain rate of 1 s−1: for each temperature the number is below 100.

2.3 Test results—the stress–strain relationship

The experimental stress–strain diagrams of S355 steel developed for 2 strain rates, i.e. 0.05 s−1 (blue curve) and 1 s−1 (red curve), and 8 temperatures of 400, 700, 1100, 1200, 1250, 1300, 1350, 1450 ℃ are plotted in Figs. 6a–h, respectively.

Fig. 6 The stress–strain curves of S355 steel at: a 400 °C, b 700 °C, c 1100 °C, d 1200 °C, e 1250 °C, f 1300 °C, g 1350 °C, h 1450 °C, plotted for two different strain rates of 0.05 s−1 and 1 s−1

The test results illustrated in the graphs (Fig. 6) are raw experimental data taken directly from the Gleeble simulator (blue colour: strain rate of 0.05 s−1, red colour: strain rate of 1 s−1) and do not constitute a set of curves. As a rule, in the literature, such results are presented as an average (e.g. using a regression line).

From Fig. 6 it follows that both the strain rate and the temperature affect the stress value. At all temperatures tested, higher maximum stress values were noted for the strain rate of 1 s−1. The diagrams show that stress decreases as temperature increases, but from 1200 °C upwards the data points (stress) show significant dispersion, especially at the lower strain rate. Plotting the stress–strain curves at lower temperatures is not a problem, as shown in Fig. 6a–c. The main problem is the determination of these curves at extra-high temperatures. In experiments carried out at temperatures close to the solidus, even small changes in temperature cause significant differences in the values of yield stress. The simulator tries to maintain the nominal temperature during the experiment by increasing or decreasing the intensity of the current flowing through the sample. This causes local temperature jumps and a significant scatter of the measurement data, which in turn results in a pronounced heterogeneity of deformation, clearly visible in the experimental results shown in Fig. 6d–h.

The capabilities of thermo-mechanical simulators have become sufficient to conduct such an experiment, although it is not a simple task. One of the major problems is still the interpretation of test results, since it is not possible to carry out experimental tests at a constant temperature and with a homogeneous strain field maintained throughout the whole sample. A characteristic feature of the simulators is the resistance heating of samples, while the deformation system is operated by servo-hydraulic units. This allows heating rates of up to 10,000 °C/s and testing of the sample deformation in a combined compression/tension/torsion process. In such experiments, even small local changes in temperature within the sample volume may cause rapid changes in the mechanical properties of the tested steel, especially in the extra-high temperature range (close to the solidus line). This is due to the extremely strong dependence of flow stress on temperature in this range. The results of such experiments can be interpreted only in combination with the computer simulation of mechanical testing [1].

2.4 Modelling of mechanical properties—artificial neural networks

Applied as an analytical tool, Artificial Neural Networks (ANNs) are useful mainly when it is necessary to model highly non-linear phenomena. Their advantage is the ability to independently discover relationships between variables by repeated presentation of the network with training cases. Another indisputable advantage of this method is the ability to handle uncertain data, as was the case with the source data obtained in this study.

As part of the present study, an attempt was made to use artificial neural networks (ANNs) to develop models for predicting the mechanical properties of S355 steel at extra-high temperatures. The collected experimental results were used to train the artificial neural networks and to verify the developed predictive models. The most popular type of artificial neural network, i.e. the multi-layer perceptron (MLP), was used. A network of this type is usually composed of one input layer, which in this particular case was a set of neurons where each neuron corresponded to one explanatory variable (temperature, strain rate, strain), a hidden layer consisting of McCulloch-Pitts neurons as presented in the manuscript [13], and one output layer containing one neuron corresponding to the explained variable, i.e. stress. The adopted general MLP network diagram is shown in Fig. 7.

Fig. 7 General MLP network diagram
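The structure described above (three input neurons, one hidden layer, one output neuron) can be sketched as a NumPy forward pass. This is only an illustration under stated assumptions: the hidden layer is given ten neurons, matching the MLP 3-10-1 architecture selected later in the paper, the tanh hidden activation and identity output are assumed rather than taken from Table 4, and the weights are random placeholders, not the trained values.

```python
import numpy as np


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP.

    x has shape (n_samples, 3): scaled temperature, strain rate and strain.
    The tanh hidden activation and identity output are assumptions.
    """
    hidden = np.tanh(x @ w_hidden + b_hidden)  # shape (n_samples, n_hidden)
    return (hidden @ w_out + b_out).ravel()    # shape (n_samples,)


N_INPUTS, N_HIDDEN, N_OUTPUTS = 3, 10, 1

rng = np.random.default_rng(0)
# Placeholder weights; a trained network would supply these values.
w_hidden = rng.normal(size=(N_INPUTS, N_HIDDEN))
b_hidden = rng.normal(size=N_HIDDEN)
w_out = rng.normal(size=(N_HIDDEN, N_OUTPUTS))
b_out = rng.normal(size=N_OUTPUTS)

# Two hypothetical input records, assumed to be already normalized.
x = np.array([[0.2, 0.0, 0.1],
              [0.9, 1.0, 0.5]])
stress_scaled = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(stress_scaled.shape)  # (2,)
```

A network of this size has 3·10 + 10 + 10·1 + 1 = 51 adjustable parameters, so evaluating it for a given strain, strain rate and temperature costs only two small matrix products.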

It is difficult to determine the correct number of hidden layers and the number of neurons in each layer. To determine the optimal network architecture, the STATISTICA programme and its Automatic Neural Networks module were used. The module includes an automatic normalization procedure which transforms and prepares the primary data for building and training the models. The STATISTICA package does not allow its user to select the number of hidden layers. By default, MLP networks in STATISTICA are built with one hidden layer. The models can therefore be optimized only by modifying the number of neurons in this layer and the type of activation functions. Several hundred MLP network architectures with different numbers of neurons and different activation functions (linear, sigmoidal, hyperbolic tangent, logistic and exponential) in the hidden and output layers were tested. Out of all networks generated by the programme, the five networks with the lowest validation error were finally selected (Table 4). As an additional measure of the quality of the models, Pearson's linear correlation coefficients (R) between the network response and the preset values were calculated for each set (training, validation and test). Additionally, the following measures of model fit were calculated: MAE, RMSE and \(R^{2}\), a summary of which is given in Table 5.

Table 4 Summary of the network training process
Table 5 Values of the fit metrics

The set of experimental data containing 5852 records was divided into 3 sets, i.e. a training set, a validation set and a test set. The training set contained 4097 records, and the validation and test sets 877 records each. The validation set was used to control the training process by checking how well the network had been trained. In practice, training involved two phases, i.e. selecting weights on the training set and testing these weights on samples from the validation set. The modification of the weight values continued until the minimum approximation error was reached or the error in the validation set began to grow. The test set was used to verify the developed models.
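The division of the data and the stopping rule described above can be sketched as follows. This is a minimal illustration, not the STATISTICA procedure: the random shuffle, the function names, the `patience` parameter and the validation-error history are all assumptions.

```python
import numpy as np


def split_indices(n_records, n_train, n_val, seed=0):
    """Shuffle record indices and cut them into training, validation and test parts.

    Whatever remains after the training and validation parts goes to the test part.
    """
    order = np.random.default_rng(seed).permutation(n_records)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


def early_stopping_epoch(val_errors, patience=1):
    """Return the index of the epoch whose weights would be kept.

    Training is treated as finished once the validation error has failed to
    improve for `patience` consecutive epochs; the best epoch so far is returned.
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error, waited = epoch, error, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


train_idx, val_idx, test_idx = split_indices(5852, 4097, 877)
print(len(train_idx), len(val_idx), len(test_idx))

# Hypothetical validation-error history: it falls, then starts to grow.
history = [0.90, 0.55, 0.40, 0.38, 0.41, 0.45]
print(early_stopping_epoch(history))  # 3
```

Because the reported set sizes add up to 4097 + 877 + 877 = 5851, this sketch leaves 878 rather than 877 indices in the test part when it is applied to all 5852 records.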

3 Test results

3.1 Analysis of predictive models developed as artificial neural networks

Ultimately, the five networks with the best fit parameters were selected. Table 4 gives the network designations and their detailed parameters, including the number of neurons in the hidden layer, the type of activation functions used in the hidden and output layers, the adopted training method and the number of learning epochs.

To estimate the predictive ability of the developed models, three metrics were used, i.e. mean absolute error (MAE), root mean squared error (RMSE) and the coefficient of determination (\(R^{2}\)). They are defined by formulas (4)–(6):

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|{y}_{i}-{y}_{i}^{\mathrm{^{\prime}}}\right|$$
(4)
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{\left({y}_{i}-{y}_{i}^{\mathrm{^{\prime}}}\right)}^{2}}$$
(5)
$${R}^{2}={\left(\frac{{\sum }_{i=1}^{n}\left({y}_{i}-\overline{y }\right)\left({y}_{i}^{\mathrm{^{\prime}}}-\overline{{y }^{\mathrm{^{\prime}}}}\right)}{\sqrt{{\sum }_{i=1}^{n}{\left({y}_{i}-\overline{y }\right)}^{2}{\sum }_{i=1}^{n}{\left({y}_{i}^{\mathrm{^{\prime}}}-\overline{{y }^{\mathrm{^{\prime}}}}\right)}^{2} }}\right)}^{2}$$
(6)

where \({y}_{i}\) is the ith stress value from the experiment, \(\overline{y }\) is the mean stress value from the experiment, \({y}_{i}^{\prime}\) is the ith stress value from the model, \(\overline{{y }^{\prime}}\) is the mean stress value from the model, and \(n\) is the number of records.

Mean absolute error (MAE) and root mean squared error (RMSE) are the two basic metrics used to test the predictive accuracy for continuous variables. Both MAE and RMSE express the model's mean prediction error in the units of the variable. Both metrics can take values from 0 to ∞ and are indifferent to the direction of the error. They are negatively oriented scores, i.e. lower values are better. Since errors are squared in RMSE before they are averaged, this metric places a relatively high weight on large errors, which means that RMSE is more useful when large errors are particularly undesirable. In contrast, the value of the R-squared metric indicates the percentage of the variance in the dependent variable that is collectively explained by the independent variables. In other words, it measures the strength of the relationship between the target variable and the model on a 0 to 100% scale. A better model should therefore have a high R-squared value and low MAE and RMSE values. The values of the calculated metrics for all five networks and the training, validation and test sets are presented in Table 5.
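Formulas (4)–(6) can be written as a short NumPy sketch. The function names and the four-point example arrays are hypothetical; \(R^{2}\) is computed exactly as in formula (6), i.e. as the squared Pearson correlation between experimental and model values.

```python
import numpy as np


def mae(y, y_pred):
    """Eq. (4): mean absolute error."""
    return float(np.mean(np.abs(y - y_pred)))


def rmse(y, y_pred):
    """Eq. (5): root mean squared error."""
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))


def r_squared(y, y_pred):
    """Eq. (6): squared Pearson correlation between y and y_pred."""
    dy = y - y.mean()
    dp = y_pred - y_pred.mean()
    r = np.sum(dy * dp) / np.sqrt(np.sum(dy ** 2) * np.sum(dp ** 2))
    return float(r ** 2)


# Hypothetical stress values in MPa (experiment vs. model).
y_exp = np.array([10.0, 20.0, 30.0, 40.0])
y_mod = np.array([12.0, 18.0, 33.0, 39.0])

print(mae(y_exp, y_mod))                  # 2.0
print(round(rmse(y_exp, y_mod), 4))       # 2.1213
print(round(r_squared(y_exp, y_mod), 4))  # 0.966
```

In this example the single 3 MPa error raises RMSE above MAE, which illustrates the heavier weight RMSE places on large deviations.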

As shown in Table 5, all developed networks gave a high degree of agreement between the model calculations and the experimental data, as evidenced by the low values of MAE and RMSE and the value of \(R^{2}\) approaching 1. Nevertheless, a comparison of the metric values obtained for the individual models shows that the best results for the training, validation and test sets were provided by the ANN-5 network with an MLP 3-10-1 architecture, and therefore this model was selected for further analysis. Figure 8 shows a correlation chart based on the results of observations and the results generated by the developed MLP 3-10-1 network model.

Fig. 8 Correlation chart based on the results of observations and results generated from the developed MLP network model

3.2 Modelling results

To verify the predictive abilities of the selected network (MLP 3-10-1), the results of experimental studies were compared with the results of model calculations.

Figure 9 compares the experimental (blue colour) and model results (orange colour) for 8 temperatures and the strain rate of 0.05 s−1.

Fig. 9 Comparison of experimental and model results for the strain rate of 0.05 s−1 and the following temperatures: a 400 °C, b 700 °C, c 1100 °C, d 1200 °C, e 1250 °C, f 1300 °C, g 1350 °C, h 1450 °C

Figure 10 compares the experimental (red colour) and model results (orange colour) for 8 temperatures and the strain rate of 1 s−1.

Fig. 10 Comparison of experimental and model results for the strain rate of 1 s−1 and the following temperatures: a 400 °C, b 700 °C, c 1100 °C, d 1200 °C, e 1250 °C, f 1300 °C, g 1350 °C, h 1450 °C

The results show that the selected network enables predicting the stress value as a function of the preset strain conditions, temperature and strain rate. Even at high temperatures, when the experimental data shows a high degree of variation, the network can "solve" the problem with a satisfactory accuracy.

An advantage of using a neural network is that the developed model works globally for all parameter sets (strain, strain rate, temperature). The model results agree well with the experimental data and are computed very quickly, which will allow the developed solution to be used as a tool in numerical calculations of the soft-reduction process.

4 Summary and conclusions

As part of the studies, high-temperature characteristics of the S355 steel were determined. The collected set of experimental data contained almost 6000 records describing the dependence of stress on strain, strain rate and temperature of the tested steel. Based on the collected experimental data, models using artificial neural networks were developed and examined, enabling analysis of the stress dependence on strain, strain rate and temperature. The STATISTICA software was used. The optimization of the network architecture consisted in testing several hundred MLP network models with different numbers of neurons and different activation functions in the hidden and output layers. As a result of the analysis, the network MLP 3-10-1 with the best match was selected.

The following conclusions can be drawn:

  • The STATISTICA package can be successfully used to model non-linear phenomena with a small number of explanatory variables and one explained variable.

  • Model calculations based on this network and compared with the experimental results fully justify the use of IT tools based on ANNs in the development of a rheological model of the tested steel.

  • The conducted studies show that the selected models offer satisfactory accuracy in predicting the flow behaviour of S355 steel.

  • The developed ANN model can be used in numerical modelling of the soft-reduction process.