With the significant improvement in process and equipment technology across the iron and steel industry, rolling equipment technology has become a bottleneck limiting the overall level of the industry. Stable operation of the strip rolling mill is key to guaranteeing both a stable production process and strip quality. Many strip rolling mills, whether imported or domestically developed, cannot reach their design capacity because of frequent equipment vibration and unstable operation. The first main reason is the low level of parameter optimization in the rolling process. Because of the strong dynamic coupling between the components of the rolling mill system and the large fluctuations in equipment state, it is difficult to build an accurate model of the coupling mechanism between equipment vibration and the rolling process, and therefore difficult to optimize process parameters in depth. The second main reason is the lack of means for monitoring mill operating conditions. Field investigations at different enterprises found that most strip rolling mills have no vibration detection, so their running status is hard to obtain. As a result, the equipment operating state can only be judged from fluctuations in process parameter signals, and the vibration trend of the mill cannot be accurately predicted.

Field vibration tests show that poor matching between rolling mill equipment and process parameters is the main cause of unstable mill operation. However, traditional rolling process control systems rarely include a model relating process parameters to the vibration state of the mill, which leads to a mismatch between equipment status and process parameters during long-term operation of strip rolling mills. The vibration of a strip rolling mill is a superposition of multi-source vibration signals caused by the coupling between the system's complex internal structure and fluctuations in external load, as shown in Fig. 1. To analyze the mechanism behind unstable mill operation and reveal how the complex dynamic responses in the rolling mill system are coupled, it is necessary to study the influence of rolling process parameters on the stability of the strip rolling mill system.

Fig. 1

The dynamic properties of strip rolling mill

To address rolling mill vibration, researchers in China and abroad have studied its mechanism [1,2,3], vibration signal analysis [4, 5], and vibration suppression [6, 7] through theoretical analysis, numerical calculation, experimental simulation, and engineering verification. Gao et al. [8] review research on rolling mill vibration in detail. However, a strip rolling mill is a complex multi-body mechanical system composed of the stand, the depress hydraulic device, the roll system, and other subsystems, which makes it difficult to establish an accurate mechanism model linking equipment dynamic characteristics and process parameters that could predict the trend of mill vibration. With the rapid development of information technology, data mining is widely used for predictive maintenance of rolling equipment, product quality improvement, in-depth optimization of process parameters, and so on. Therefore, to quantify the influence of rolling process parameters on the operating stability of strip rolling mills, it is necessary to study how fluctuations in process parameters affect the operating state of the mill and to mine the correlation between mill vibration data and rolling process parameters.

Bagheripoor et al. [9] applied an artificial neural network to a hot strip rolling mill to improve the prediction of rolling force and rolling torque. Ma et al. [10] proposed a data-based, quality-related fault diagnosis scheme for equipment fault diagnosis and fault cause analysis in the hot strip mill process. Liu et al. [11] constructed an intelligent prediction model for rolling mill chatter based on a long short-term memory (LSTM) recurrent neural network, using historical data on rolled piece specification, roll condition, rolling process, and mill vibration state. Lu et al. [12] developed a data-driven vibration prediction method for cold rolling mills based on the XGBoost model. Chen et al. [13] and Pan et al. [14] designed data-driven condition monitoring systems to detect mechanical faults of bearings in the main drive system of hot tandem rolling mills. Dong et al. [15] used DBN and GA-BP algorithms to build a rolling mill vibration prediction model. Deng et al. [16] established a data-based neural network model to predict strip crown in hot strip rolling mills. Song et al. [17] built a steel property optimization model based on XGBoost and an improved particle swarm optimization algorithm to optimize the mechanical properties of steel. Shi et al. [18] established a train arrival delay prediction model based on XGBoost and Bayesian optimization (BO) and analyzed its prediction efficiency and accuracy at different stations. Zhou et al. [19] combined XGBoost with BO to estimate the advance rate of a tunnel boring machine in hard rock conditions. Liang et al. [20] used GBDT, XGBoost, and LightGBM to predict the stability of hard rock columns. Mongan et al. [21] used an artificial neural network optimized by particle swarm optimization to predict the quality of ultrasonically welded joints.

Prediction models that combine industrial data with machine learning algorithms have been widely used in industry [22,23,24,25,26]. This paper proposes a vibration prediction model for hot strip mills based on XGBoost and BO. The main contributions are as follows.

(1) Based on a self-built dataset, an XGBoost prediction model was developed with process parameters as input variables and rolling mill vibration as the output variable, accurately capturing the complex nonlinear relationship between them.

(2) The XGBoost hyperparameters and parameters were optimized with the BO algorithm to address the slow computation, limited stability, and limited prediction accuracy of existing models tuned with the grid search (GS) algorithm.

(3) The prediction results were interpreted with the SHAP method, filling the gap left by the lack of interpretability of machine learning models in rolling mill vibration prediction.

This paper is organized as follows. “Methodology” briefly describes the XGBoost, BO, and SHAP algorithms and the performance metrics used in the experiments. “Data set and data pre-processing” describes the self-built dataset, including data collection, data pre-processing, and feature engineering. “Results and discussion” evaluates the different optimization algorithms and compares the proposed model with other machine learning models. “Model interpretation” explains the prediction model globally and locally and analyzes the influence of individual process parameters on rolling mill vibration. “Conclusion” summarizes the findings.



Methodology

XGBoost

XGBoost, proposed by Chen and Guestrin [27] in 2016, is a scalable, end-to-end tree boosting system. It offers efficient tree pruning, regularization, and parallel processing, and has been used to solve industrial problems in many engineering fields [28,29,30]. XGBoost builds the model additively: each iteration adds a new tree that corrects the residuals of the current prediction. To prevent overfitting, a regularization term is added to the objective function:

$$\mathcal{I}\left(\theta \right)=L\left(\theta \right)+\Omega \left(\theta \right),$$

where θ denotes the parameters learned from the given data, Ω is the regularization term, and L is the training loss function (LOF), which measures how well the model fits the training data. Equation (2) is the prediction function: the model output \(\widehat{{y}_{i}}\) is the sum of the outputs of K regression trees \({f}_{k}\), each belonging to the space F of regression trees:

$$\widehat{{y}_{i}}=\sum_{k=1}^{K}{f}_{k}({x}_{i}),\quad {f}_{k}\in F.$$
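To make the additive structure of the prediction function concrete, the short Python sketch below treats each fitted tree as an arbitrary function from a feature vector to a score. The three stumps, their thresholds, and the feature vector are hypothetical placeholders, not trees from the paper's model. Summing all tree outputs at once gives the same result as adding one tree per boosting iteration, which is the running prediction that the iteration-t objective is built on.

```python
from typing import Callable, List, Sequence

# A "tree" here is any function from a feature vector to a leaf score (hypothetical stand-in).
Tree = Callable[[Sequence[float]], float]


def predict(trees: List[Tree], x: Sequence[float]) -> float:
    """Ensemble output: the sum of the K tree outputs for sample x."""
    return sum(f(x) for f in trees)


def predict_incremental(trees: List[Tree], x: Sequence[float]) -> List[float]:
    """Running prediction after each iteration: y^(t) = y^(t-1) + f_t(x), with y^(0) = 0."""
    history = [0.0]
    for f in trees:
        history.append(history[-1] + f(x))
    return history


# Hypothetical depth-1 trees (stumps) on two features.
trees: List[Tree] = [
    lambda x: 0.5 if x[0] < 1.0 else 0.2,
    lambda x: 0.25 if x[1] > 2.0 else -0.25,
    lambda x: 0.125,
]
x = [0.8, 3.0]
print(predict(trees, x))              # 0.875
print(predict_incremental(trees, x))  # [0.0, 0.5, 0.75, 0.875]
```

The last element of the running prediction always equals the full sum, which is why training can add one tree at a time without changing the form of the final model.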

At iteration t, the objective function is

$$\mathcal{I}(t)=\sum_{i=1}^{n}L\left({y}_{i},{\widehat{{y}_{i}}}^{(t)}\right)+\sum_{k=1}^{t}\Omega \left({f}_{k}\right),$$

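where n is the number of training samples, and \({\widehat{{y}_{i}}}^{(t)}\), the prediction for sample i after t iterations, is obtained by adding the t-th tree to the prediction of the previous iteration:

$${\widehat{{y}_{i}}}^{(t)}={\widehat{{y}_{i}}}^{(t-1)}+{f}_{t}({x}_{i}).$$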

As shown by Chen and Guestrin [27], the regularization term \(\Omega ({f}_{k})\) of each tree is defined as

$$\Omega ({f}_{k})=\gamma T+\frac{1}{2}\lambda \sum_{j=1}^{T}{\omega }_{j}^{2}.$$

where \(\lambda \) is the penalty factor on the leaf weights, T is the number of leaves in the tree, γ is the complexity penalty for each leaf, and ω is the vector of leaf scores. Whereas general gradient boosting uses a first-order approximation of the loss, XGBoost applies a second-order Taylor expansion to the LOF (Chen and Guestrin [27]). Taking the mean square error (MSE) as the LOF and dropping constant terms, the objective function becomes

$$\mathcal{I}\left(t\right)\approx \sum_{i=1}^{n}\left[{g}_{i}{\omega }_{q({x}_{i})}+\frac{1}{2}{h}_{i}{\omega }_{q({x}_{i})}^{2}\right]+\gamma T+\frac{1}{2}\lambda \sum_{j=1}^{T}{\omega }_{j}^{2},$$

where \({g}_{i}\) and \({h}_{i}\) denote the first-order and second-order derivatives of the MSE loss function, respectively, and q is the function that assigns a data point to the corresponding leaf.
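As a concrete case of these derivatives, the sketch below computes \({g}_{i}\) and \({h}_{i}\) for the squared-error loss with respect to the previous round's prediction. Whether the loss is written as \({(y-\widehat{y})}^{2}\) or \(\frac{1}{2}{(y-\widehat{y})}^{2}\) only rescales both derivatives; the ½-scaled form, assumed here by default, gives \({g}_{i}={\widehat{y}}_{i}-{y}_{i}\) and \({h}_{i}=1\). The sample values are made up for illustration.

```python
from typing import List, Tuple


def squared_error_grad_hess(
    y: List[float], y_prev: List[float], half: bool = True
) -> Tuple[List[float], List[float]]:
    """Per-sample g_i and h_i of the squared-error loss w.r.t. the previous prediction.

    half=True  : l = 0.5 * (y - y_hat)^2  ->  g = y_hat - y,        h = 1
    half=False : l = (y - y_hat)^2        ->  g = 2 * (y_hat - y),  h = 2
    """
    scale = 1.0 if half else 2.0
    g = [scale * (p - t) for t, p in zip(y, y_prev)]
    h = [scale for _ in y]  # constant curvature for squared error
    return g, h


g, h = squared_error_grad_hess([1.0, 2.0, 0.5], [0.5, 2.5, 0.5])
print(g, h)  # [-0.5, 0.5, 0.0] [1.0, 1.0, 1.0]
```

Because \({h}_{i}\) is constant for MSE, the second-order step reduces to fitting the residuals, which matches the intuition that each new tree corrects what the previous trees got wrong.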

The LOF in Eq. (6) is a sum of loss values over individual data samples. Since each sample falls into exactly one leaf node, the same sum can be regrouped leaf by leaf, that is,

$$\mathcal{I}\left(t\right)\approx \gamma T+\sum_{j=1}^{T}\left[\left(\sum_{i\in {I}_{j}}{g}_{i}\right){\omega }_{j}+\frac{1}{2}\left(\sum_{i\in {I}_{j}}{h}_{i}+\lambda \right){\omega }_{j}^{2}\right].$$

Here \({G}_{j}\) and \({H}_{j}\) are defined as

$${G}_{j}=\sum_{i\in {I}_{j}}{g}_{i},\quad {H}_{j}=\sum_{i\in {I}_{j}}{h}_{i},$$

where \({I}_{j}\) denotes all data samples in leaf node j.
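The regrouping by leaves can be checked numerically. In this hypothetical sketch, `leaf_of[i]` plays the role of \(q({x}_{i})\), the per-sample \({g}_{i}\) and \({h}_{i}\) are made-up values, and the function evaluates the leaf-wise approximate objective for given leaf weights \({\omega }_{j}\), λ, and γ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def leaf_sums(g: List[float], h: List[float], leaf_of: List[int]) -> Tuple[Dict[int, float], Dict[int, float]]:
    """G_j and H_j: sums of g_i and h_i over the samples I_j assigned to leaf j."""
    G: Dict[int, float] = defaultdict(float)
    H: Dict[int, float] = defaultdict(float)
    for gi, hi, j in zip(g, h, leaf_of):  # leaf_of[i] plays the role of q(x_i)
        G[j] += gi
        H[j] += hi
    return dict(G), dict(H)


def leafwise_objective(G: Dict[int, float], H: Dict[int, float], w: Dict[int, float],
                       lam: float, gamma: float) -> float:
    """gamma*T + sum_j [G_j*w_j + 0.5*(H_j + lam)*w_j**2], with T = number of leaves."""
    T = len(w)
    return gamma * T + sum(G[j] * w[j] + 0.5 * (H[j] + lam) * w[j] ** 2 for j in w)


g = [-0.5, 0.5, -1.0, 0.0]  # made-up first-order derivatives
h = [1.0, 1.0, 1.0, 1.0]    # second-order derivatives (1 for the 1/2-scaled squared error)
leaf_of = [0, 1, 0, 1]      # hypothetical leaf assignment q(x_i)
G, H = leaf_sums(g, h, leaf_of)
print(G, H)  # {0: -1.5, 1: 0.5} {0: 2.0, 1: 2.0}
print(leafwise_objective(G, H, {0: 0.5, 1: -0.25}, lam=1.0, gamma=0.125))  # -0.15625
```

Only the per-leaf totals \({G}_{j}\) and \({H}_{j}\) enter the objective, so the individual samples can be forgotten once they are assigned to leaves.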

The objective function is therefore a sum of independent quadratic functions of the leaf weights \({\omega }_{j}\), so optimizing it reduces to finding the minimum of each quadratic. Figure 2 shows the XGBoost training flowchart.
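The minimization can be written out explicitly, following Chen and Guestrin [27]. Each leaf contributes the term \({G}_{j}{\omega }_{j}+\frac{1}{2}({H}_{j}+\lambda ){\omega }_{j}^{2}\), a convex quadratic in \({\omega }_{j}\) whenever \({H}_{j}+\lambda >0\) (which holds for the MSE loss, whose \({h}_{i}\) are positive, with \(\lambda \ge 0\)). Setting its derivative \({G}_{j}+({H}_{j}+\lambda ){\omega }_{j}\) to zero gives the optimal leaf weight and, after substitution, the minimum of the objective for a fixed tree structure. The same quantity scores a candidate split of a leaf into left (L) and right (R) children:

$${\omega }_{j}^{*}=-\frac{{G}_{j}}{{H}_{j}+\lambda },$$
$${\mathcal{I}}^{*}\left(t\right)=-\frac{1}{2}\sum_{j=1}^{T}\frac{{G}_{j}^{2}}{{H}_{j}+\lambda }+\gamma T,$$
$$\mathrm{Gain}=\frac{1}{2}\left[\frac{{G}_{L}^{2}}{{H}_{L}+\lambda }+\frac{{G}_{R}^{2}}{{H}_{R}+\lambda }-\frac{{({G}_{L}+{G}_{R})}^{2}}{{H}_{L}+{H}_{R}+\lambda }\right]-\gamma .$$

As a worked example with illustrative numbers, a leaf with \({G}_{j}=-1.5\), \({H}_{j}=2\), and \(\lambda =1\) gets \({\omega }_{j}^{*}=1.5/3=0.5\) and contributes \(-\frac{1}{2}\cdot {1.5}^{2}/3=-0.375\) to the objective. A split is worth keeping only if its gain is positive, which is how γ acts as a pruning threshold.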

Fig. 2

XGBoost training flowchart

Bayesian optimization (BO)

The choice of hyperparameters is critical to model performance [31,32,33], and BO has proven to be a very effective algorithm for hyperparameter optimization in machine learning. According to Bayes' theorem, given the observed data E, the posterior probability P(M|E) of model M is proportional to the likelihood P(E|M) of the observations multiplied by the prior probability P(M) of the model, that is,

$$P\left(M|E\right)\propto P\left(E|M\right)P\left(M\right).$$

The BO algorithm builds a surrogate (proxy) model of the objective function from the history of its evaluations and uses all previous evaluation results when choosing the next set of hyperparameters, which reduces the number of hyperparameter evaluations required. As a result, the hyperparameters obtained are likely to be close to optimal, improving the prediction accuracy and generalization ability of the model [34,35,36,37,38,39].
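The surrogate-then-acquire loop described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it tunes a single hypothetical hyperparameter scaled to [0, 1], uses a Gaussian-process surrogate with a squared-exponential kernel (length scale 0.2) and the expected-improvement acquisition function maximized over a fixed grid, and the quadratic objective stands in for an expensive score such as the validation R2 of XGBoost. The paper does not state which surrogate or acquisition function it used.

```python
import math
import random
from typing import Callable, List, Tuple


def rbf(a: float, b: float, length: float = 0.2) -> float:
    """Squared-exponential kernel with unit variance (length scale is an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K: List[List[float]]) -> List[List[float]]:
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(K[i][i] - s, 1e-12))
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L: List[List[float]], b: List[float]) -> List[float]:
    """Forward substitution for L z = b."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L: List[List[float]], z: List[float]) -> List[float]:
    """Back substitution for L^T a = z."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a


def gp_fit(X: List[float], y: List[float], noise: float = 1e-6):
    """Fit a GP surrogate (with a small jitter term) to the evaluation history."""
    m = sum(y) / len(y)  # constant prior mean = mean of the observations
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [v - m for v in y]))
    return X, L, alpha, m


def gp_predict(model, x: float) -> Tuple[float, float]:
    """Posterior mean and standard deviation of the surrogate at x."""
    X, L, alpha, m = model
    k = [rbf(x, a) for a in X]
    mean = m + sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean: float, std: float, best: float) -> float:
    """EI for maximization under a Gaussian posterior N(mean, std^2)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(f: Callable[[float], float], n_init: int = 3, n_iter: int = 15, seed: int = 0):
    rng = random.Random(seed)
    X = [rng.random() for _ in range(n_init)]  # random initial design on [0, 1]
    y = [f(x) for x in X]
    grid = [i / 200 for i in range(201)]       # candidate points for the acquisition search
    for _ in range(n_iter):
        model, best = gp_fit(X, y), max(y)
        x_next = max(grid, key=lambda c: expected_improvement(*gp_predict(model, c), best))
        X.append(x_next)
        y.append(f(x_next))                    # one (expensive) objective evaluation
    i_best = max(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best]


x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2)
print(round(x_best, 3), round(y_best, 5))
```

Each iteration refits the surrogate on every past evaluation and spends one objective call where expected improvement is largest, trading off high predicted score against high uncertainty; this reuse of history is what lets BO use fewer evaluations than exhaustive search.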

SHapley Additive exPlanations (SHAP)

Interpretability of machine learning models is important because it provides a mechanistic explanation of the model's behavior that supports decision making. SHAP is a method for explaining “black box” machine learning models. It is based on Shapley values from cooperative game theory and was first proposed by Lundberg and Lee [40]. SHAP evaluates the contribution of each input feature to the model output and indicates whether that contribution is positive or negative. It can also compute the contribution of each feature to every individual prediction.
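To illustrate the Shapley-value idea behind SHAP, the sketch below computes exact Shapley values by enumerating every feature coalition. It makes a simplifying assumption: features outside a coalition are set to a single baseline vector, whereas SHAP's explainers integrate over a background distribution and use dedicated algorithms (such as TreeExplainer for tree ensembles) for speed. The toy model \(f(z)=2{z}_{0}+{z}_{1}{z}_{2}\) and the inputs are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to the baseline."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take their value from x, the others from the baseline.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):  # coalition sizes 0..n-1
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for S in combinations(others, r):
                s = set(S)
                phi[i] += weight * (value(s | {i}) - value(s))  # marginal contribution of i
    return phi


def toy_model(z: Sequence[float]) -> float:
    return 2 * z[0] + z[1] * z[2]  # additive term plus an interaction


x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)
print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
```

The additive term is credited entirely to the first feature, the interaction is split evenly between the other two, and the values sum to \(f(x)-f(\mathrm{baseline})\) (the efficiency property that makes SHAP plots additive). Exact enumeration needs on the order of \(n{2}^{n}\) model calls, so it is only practical for a handful of features.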

Model evaluation indicators

To evaluate the reliability of the vibration prediction model and to compare different algorithms, the agreement between the predicted and true values is measured with the coefficient of determination (R2), the mean square error (MSE), and the mean absolute error (MAE), calculated as follows:

$${R}^{2}=1-\frac{{\sum }_{i=1}^{N}{({y}_{i}-\widehat{{y}_{i}})}^{2}}{{\sum }_{i=1}^{N}{({y}_{i}-\overline{y })}^{2}},$$
$$\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}{\left({y}_{i}-\widehat{{y}_{i}}\right)}^{2},$$
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|{y}_{i}-\widehat{{y }_{i}}\right|,$$

where \({y}_{i}\) denotes the true value, \(\widehat{{y}_{i}}\) denotes the predicted value, \(\overline{y }\) denotes the sample mean, and N denotes the data sample size.
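The three metrics translate directly into code. The sketch below implements them in plain Python on a made-up four-sample example; the values are illustrative only and do not come from the paper's test set.

```python
from typing import Sequence


def r2_score(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y, y_hat)) / len(y)


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y, y_hat)) / len(y)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(r2_score(y_true, y_pred), mse(y_true, y_pred), mae(y_true, y_pred))  # 0.9 0.125 0.25
```

R2 is scale-free and compares the model with always predicting the mean, while MSE and MAE are in squared and original units of vibration acceleration respectively, with MSE penalizing large errors more heavily.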

Data set and data pre-processing

Data collection

Figure 3 shows the 1580 mm hot tandem strip rolling mill production line, which mainly consists of seven four-high mills. Figure 4 shows the vibration and process data acquisition system.

Fig. 3

1580 mm hot strip rolling mill

Fig. 4

Vibration and process data acquisition system

Sensors were placed at the mill stand, the depress hydraulic (DH) cylinder, the backup roll (BUR) bearing seat, and the work roll (WR) bearing seat, locations that are prone to severe vibration during rolling. Position A is the DH cylinder, position B is the BUR bearing seat, position C is the WR bearing seat, and position D is the mill stand. Domain knowledge indicates that process parameters such as back tension, entrance thickness, outlet thickness, rolling force, and rolling speed are closely related to rolling mill vibration, and field vibration tests show that the horizontal vibration of the upper work roll has a significant impact on product quality. Therefore, back tension, entrance thickness, outlet thickness, rolling force, and rolling speed were selected as the model input variables, and the horizontal vibration of the upper work roll was selected as the model output variable.

After excluding anomalous data from the moments when the strip enters and leaves the roll gap, a total of 14,016 valid samples were collected. The dataset was randomly divided into a training set (80%), used to train the prediction model, and a test set (20%), used to verify it. A fixed random seed was used for the split so that all machine learning models were trained and tested on exactly the same data. With the training set held fixed, different optimization algorithms were used to tune the important hyperparameters and parameters of the model, so that the comparison between algorithms is as fair as possible. Table 1 shows a sample of the raw data, and Table 2 gives the statistics of the original dataset.
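A seeded 80/20 split of the kind described can be sketched as follows. The seed value 42 and the function name `seeded_split` are hypothetical (the paper does not report its seed); scikit-learn's `train_test_split` with a fixed `random_state` serves the same purpose. With this sketch, rounding the test size up gives 2804 test samples and 11,212 training samples for 14,016 rows.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def seeded_split(rows: Sequence[Row], test_frac: float = 0.2, seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle indices with a fixed seed, then cut off the test fraction (rounded up)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)  # same seed -> same permutation on every run
    n_test = math.ceil(test_frac * len(rows))
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


samples = list(range(14016))  # stand-ins for the 14,016 valid samples
train, test = seeded_split(samples)
print(len(train), len(test))  # 11212 2804
```

Using a private `random.Random` instance rather than the global generator keeps the split reproducible even if other code draws random numbers in between.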

Table 1 Raw data table
Table 2 The statistics of the original data set

Data description and pre-processing

The violin plots in Fig. 5 show the distribution and outliers of the five input variables and the one output variable. Fig. 5a shows that the rolling speed is relatively stable throughout the rolling process, mostly in the range 2.20–2.35 m/s; some abnormal values in the early stage of rolling should be removed. Fig. 5b shows that the entrance thickness fluctuates noticeably within 0.0206–0.0210 m, which may be a main cause of mill vibration. As Fig. 5c illustrates, the rolling force is mainly below 4000 kN during mill start-up and between 14,000 and 20,000 kN during stable rolling. Fig. 5d shows that the outlet thickness is concentrated near 0.012 m. As shown in Fig. 5e, the back tension is concentrated near 130 kN and 160 kN. Fig. 5f shows the distribution of vibration acceleration, which has a wide range and many abnormal values. Because the dataset has no missing values (Table 2) and contains only a few outliers, the outliers were simply deleted, which is not expected to affect model accuracy.
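The paper deletes outliers but does not state the detection rule. A common choice that matches what violin and box plots display is Tukey's fences, which flag values more than 1.5 interquartile ranges outside the quartiles. The sketch below assumes that rule, drops a whole sample if any variable is flagged, and uses made-up values (the 0.5 m/s entry mimics a start-up speed).

```python
import statistics
from typing import Dict, List


def iqr_mask(values: List[float], k: float = 1.5) -> List[bool]:
    """True for values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [lo <= v <= hi for v in values]


def drop_outliers(columns: Dict[str, List[float]], k: float = 1.5) -> Dict[str, List[float]]:
    """Delete every sample (row) in which any variable falls outside its fences."""
    n_rows = len(next(iter(columns.values())))
    keep = [True] * n_rows
    for values in columns.values():
        keep = [a and b for a, b in zip(keep, iqr_mask(values, k))]
    return {name: [v for v, ok in zip(values, keep) if ok] for name, values in columns.items()}


data = {
    "rolling_speed": [2.25, 2.26, 2.24, 2.27, 2.25, 0.5],
    "rolling_force": [15000.0, 15100.0, 14900.0, 15050.0, 14950.0, 15000.0],
}
clean = drop_outliers(data)
print(clean["rolling_speed"])  # [2.25, 2.26, 2.24, 2.27, 2.25]
```

Deleting whole rows keeps inputs and output aligned, which matters because each row is one (process parameters, vibration) training pair.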

Fig. 5

Violin diagram of input and output data distribution

Feature engineering

Figure 6 shows the correlation between the input variables and the output variable. Based on domain knowledge, a correlation threshold of 0.15 was set for selecting input variables. Fig. 6 shows that the correlations between the five input variables and the output variable are weak, which is consistent with the complex nonlinear dynamic coupling of the strip rolling mill system. Nevertheless, the correlation coefficient between each of the five process parameters and vibration exceeds the 0.15 threshold, so all five process parameters are selected as input variables.
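A threshold-based selection of this kind can be sketched as below. The sketch assumes the Pearson coefficient and compares its absolute value with 0.15, so strong negative correlations also pass; the paper does not specify either choice, and the columns are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient (undefined for a constant column)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_features(X: Dict[str, List[float]], y: List[float], threshold: float = 0.15) -> Dict[str, float]:
    """Keep the inputs whose |r| with the output exceeds the threshold."""
    corr = {name: pearson(col, y) for name, col in X.items()}
    return {name: r for name, r in corr.items() if abs(r) > threshold}


vibration = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical output
inputs = {
    "x1": [2.0, 4.0, 6.0, 8.0, 10.0],     # r = +1
    "x2": [5.0, 4.0, 3.0, 2.0, 1.0],      # r = -1
    "x3": [1.0, -1.0, 1.0, -1.0, 1.0],    # r close to 0
}
print(sorted(select_features(inputs, vibration)))  # ['x1', 'x2']
```

Pearson's r only measures linear association, so a weak r, as observed in Fig. 6, does not rule out the strong nonlinear dependence that the tree model is meant to capture.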

Fig. 6

Input–output correlation analysis

Results and discussion

Using the production data collected by the strip rolling mill vibration and process data acquisition system (Fig. 4), a strip rolling mill vibration prediction model based on XGBoost and BO was built from the preprocessed data, as shown in Fig. 7.

Fig. 7

Flowchart of data-driven vibration prediction method for strip rolling mill

Hyperparameters and parameters setting

To improve prediction accuracy, grid search (GS), random search (RS), and BO were used to optimize the important hyperparameters and parameters of the XGBoost model, yielding hyperparameter configurations that predict more accurately than the default XGBoost settings. The number of estimators (denoted α) is an important hyperparameter: too small a value leads to underfitting, and too large a value leads to overfitting. The maximum depth (denoted β) is another important hyperparameter, which controls model complexity; setting it to an appropriate value limits tree growth and helps avoid overfitting. The learning rate (denoted γ here, distinct from the leaf penalty γ in the regularization term Ω) is also critical, as it controls how fast the model learns. To ensure a fair comparison, the search range of each hyperparameter was the same for all optimization algorithms. Table 3 shows the search space and the optimal values found by each algorithm.
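The difference in cost between GS and RS can be seen from how each generates candidates. The sketch below is illustrative only: the search-space values are hypothetical (Table 3 holds the real ones), the keys follow the `n_estimators`, `max_depth`, and `learning_rate` parameter names of XGBoost's scikit-learn interface, and `toy_score` stands in for fitting XGBoost and computing a validation R2.

```python
import itertools
import random
from typing import Callable, Dict, List

Params = Dict[str, float]

# Hypothetical search space; the ranges actually used are those in Table 3.
SPACE: Dict[str, List[float]] = {
    "n_estimators": [100, 200, 300, 400, 500],  # alpha
    "max_depth": [3, 4, 5, 6, 7, 8],            # beta
    "learning_rate": [0.01, 0.05, 0.1, 0.2],    # learning rate (gamma in the text)
}


def grid_candidates(space: Dict[str, List[float]]) -> List[Params]:
    """Grid search: every combination, so the cost is the product of the list lengths."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*(space[k] for k in keys))]


def random_candidates(space: Dict[str, List[float]], n: int, seed: int = 0) -> List[Params]:
    """Random search: n independent draws, so the cost is fixed by n regardless of grid size."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]


def best_of(candidates: List[Params], score: Callable[[Params], float]) -> Params:
    """Return the highest-scoring candidate (each score call = one model fit in practice)."""
    return max(candidates, key=score)


def toy_score(p: Params) -> float:
    # Hypothetical stand-in for validation R2; peaks at (300, 5, 0.1).
    return -(abs(p["n_estimators"] - 300) / 100 + abs(p["max_depth"] - 5)
             + 10 * abs(p["learning_rate"] - 0.1))


grid = grid_candidates(SPACE)
print(len(grid), best_of(grid, toy_score))  # 120 {'n_estimators': 300, 'max_depth': 5, 'learning_rate': 0.1}
print(best_of(random_candidates(SPACE, 20), toy_score))
```

GS is guaranteed to hit the best grid point but needs 120 fits even for this small space, and the count multiplies with every added value; RS caps the number of fits but may miss the optimum, which mirrors the speed and stability trade-off discussed for Table 4.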

Table 3 Searching space and optimal hyperparameters of XGBoost model

Comparison and analysis of results before and after optimization

Figure 8 compares the results of the three optimization algorithms, GS, RS, and BO. All optimized models predict significantly better than the XGBoost model with default hyperparameters. The XGBoost model optimized by BO achieved the best metrics, with an R2 of 0.9131, outperforming the other two optimization algorithms. These results show that the vibration prediction model based on XGBoost and BO better captures the complex relationship between the input process parameters and the output vibration and yields better predictions, so the proposed model is suitable for vibration prediction in hot strip rolling mill systems.

Fig. 8

Comparison of results before and after optimization

To evaluate the stability of the three optimization algorithms, GS used cross-validation, while RS and BO optimized the model hyperparameters over a fixed number of iterations. Figure 9 shows the distribution of R2 during hyperparameter optimization. The best prediction performance of the three algorithms is very close, but BO-XGB has the most stable prediction performance. The vibration prediction model based on XGBoost and BO is therefore the most stable of the three.
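A k-fold cross-validation score of the kind GS relies on can be sketched as follows. The number of folds (k = 5 here), the contiguous fold layout, and the `fit_and_score` callback are assumptions, since the paper does not report them. Contiguous folds are only reasonable because the training set has already been shuffled by the random split; `fit_and_score` is a hypothetical hook where an XGBoost model would be trained on one index set and scored on the other.

```python
from typing import Callable, List, Tuple

Fold = Tuple[List[int], List[int]]


def kfold_indices(n: int, k: int = 5) -> List[Fold]:
    """Split indices 0..n-1 into k contiguous folds; each fold serves once as validation."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds: List[Fold] = []
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, val))
        start += size
    return folds


def cv_score(n: int, k: int, fit_and_score: Callable[[List[int], List[int]], float]) -> float:
    """Average validation score over the k folds."""
    scores = [fit_and_score(train, val) for train, val in kfold_indices(n, k)]
    return sum(scores) / len(scores)


folds = kfold_indices(11, 5)
print([len(val) for _, val in folds])  # [3, 2, 2, 2, 2]
```

Every sample is validated exactly once, so the averaged score is less sensitive to a lucky or unlucky single split, at the price of k model fits per hyperparameter candidate, which multiplies the already large cost of GS.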

Fig. 9

R2 distribution in the optimization process of three optimization algorithms

The running time of the three optimization algorithms was measured on different proportions of the original dataset (10%, 30%, 50%, 70%, and 100%), as shown in Table 4. Because it enumerates every combination, GS has the longest running time, and this drawback becomes more pronounced as the sample size grows. RS is the fastest, but because it is less stable it can easily miss the optimal values. Considering both prediction performance and computation speed (Table 4), the BO algorithm was selected to optimize the hyperparameters and parameters of the model.

Table 4 Model performance at different scales

Model comparison of prediction performance

To verify the prediction performance of the proposed BO-XGBoost vibration prediction model for hot strip rolling mills, four machine learning models were selected for comparison: K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), and Gradient Boosting (GBoost). These models have been applied successfully in a variety of industrial fields. For a fair comparison, all models were trained on the same training set and tested on the same test set. As shown in Fig. 10, KNN performs worst, because KNN is not well suited to predicting datasets with a large sample size. RF is more accurate than DT because it is a parallel ensemble method that combines the predictions of many trees. GBoost is also a DT-based ensemble algorithm, which turns weak learners into a strong learner; because it focuses on reducing bias, it performs better still. The BO-XGBoost model proposed in this paper goes further by optimizing a second-order Taylor expansion of the regularized objective function (with the constant terms removed), which improves prediction accuracy. Therefore, BO-XGBoost predicts vibration better than GBoost.

Fig. 10

Comparison of BO-XGBoost with other machine learning models

Model interpretation

Figure 11 shows the feature importance of the BO-XGBoost model, ranking the features by the magnitude of their contribution to the model's computation. It was obtained from the “feature_importances_” attribute of the trained XGBoost model. Fig. 11 shows that entrance thickness is the most important feature affecting the prediction results, while outlet thickness, rolling force, rolling speed, and back tension contribute relatively little.

Fig. 11

Sensitivity analysis of process parameters based on BO-XGBoost

The conventional XGBoost model cannot explain how each feature influences the prediction results, nor can it evaluate each feature's contribution to them. SHAP assigns each input variable a SHAP value that indicates its contribution to the result, capturing both the global and the local behavior of the model. Global interpretation provides an overview of the SHAP values of the input features over all samples. Figure 12 is the global SHAP summary plot for the entire dataset: the input features are listed on the y-axis from top to bottom in order of their contribution to the rolling mill vibration prediction, and the SHAP values are on the x-axis. Feature values are encoded by color, from blue (low) to pink (high). Fig. 12 shows that entrance thickness has the greatest impact on rolling mill vibration: increasing the entrance thickness lowers the SHAP value, indicating that a larger entrance thickness reduces mill vibration. The contributions of outlet thickness, rolling speed, rolling force, and back tension decrease in that order. Notably, a smaller rolling force corresponds to a smaller SHAP value, indicating that reducing the rolling force can reduce mill vibration.

Fig. 12

SHAP summary plot

Figure 13 shows how the SHAP value varies with each input feature. In Fig. 13a, the SHAP value drops rapidly once the entrance thickness increases to 0.0205 m, so choosing an appropriate entrance thickness when formulating the rolling schedule helps reduce mill vibration. In Fig. 13b, d, and e, when the rolling speed, outlet thickness, and back tension are small, the corresponding SHAP values stay around 0, meaning these features barely affect the model's prediction in that range. As the feature values increase, the SHAP values begin to fluctuate strongly, indicating that the model's predictions fluctuate strongly and that the stability of mill operation is affected.

Fig. 13

SHAP feature dependence plots

Local interpretation explains the prediction for each individual sample. Two samples were selected for local interpretation of the BO-XGBoost model: the first sample in the dataset (Fig. 14a) and the 621st sample (Fig. 14b). In Fig. 14, red arrows indicate features with positive SHAP values, which increase the model's predicted value, and blue arrows indicate features with negative SHAP values, which decrease it. In Fig. 14a, the predicted rolling mill vibration of the first sample is 0.779 m/s². The SHAP values of back tension, rolling speed, outlet thickness, and entrance thickness are positive, so these features increase the predicted vibration, while rolling force decreases it. In Fig. 14b, the predicted rolling mill vibration of the 621st sample is 0.653 m/s². Here the SHAP values of rolling force, rolling speed, outlet thickness, and back tension are positive and increase the predicted vibration, while entrance thickness decreases it. Figure 15 gives the specific SHAP values of each feature.

Fig. 14

SHAP feature force plot

Fig. 15

SHAP feature waterfall plot


Conclusion

To address the mismatch between process parameters and the operating state of strip rolling mills, a vibration prediction model was proposed. The conclusions of this paper are summarized as follows.

(1) The BO-XGBoost prediction model accurately captured the complex relationship between process parameters and rolling mill vibration.

(2) Compared with GS and RS, the model optimized by the BO algorithm achieved higher prediction accuracy and better stability, and it ran faster than GS.

(3) The prediction results of the model were explained from a global perspective by introducing the SHAP method. The interpretation shows that entrance thickness contributes most to the output of the BO-XGBoost prediction model.

(4) Based on the collected data, to suppress vibration of the hot tandem strip rolling mill system, the rolling speed of the stand 2 mill should not exceed 2.3 m/s, the outlet thickness should not exceed 0.01120 m, and the rolling force should not exceed 15,000 kN.

By introducing the Bayesian optimization algorithm and the SHAP method, the problems of slow computation, low prediction accuracy, and poor stability of the model were addressed. In addition, the proposed model helps fill the gap in interpretable machine learning models for rolling mill vibration prediction.