Abstract
Balanced industrial loading mainly relies on accurate multi-adjustment values, including the truck speed and chute flow. However, the existing models are weak in real-time loading prediction because the single-objective regression may ignore the correlation of multi-adjustment parameters. To solve the problem, we propose a joint model that fuses the composited-residual-block temporal convolutional network and the light gradient boosting machine (i.e., called CTCN-LightGBM). First, the instance selection deviations and abnormal supplement methods are used for data preprocessing and normalization. Second, we propose a side-road dimensionality reduction convolutional branch in the composited-residual-block temporal convolutional network to extract collaborative features effectively. Third, the feature re-enlargement method reconstructs extracted features with the original features to improve extraction accuracy. Fourth, the reconstructed feature matrix is utilized as the input of the light gradient boosting machine to predict multi-adjustment values parallelly. Finally, we compare the CTCN-LightGBM with other related models, and the experimental results show that our model can obtain superior effects for multi-adjustment value prediction.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Industrial loading that aims to achieve precise and quantitative loading for materials is widely used in mining, transportation, etc. However, when we need to achieve target loading by the conventional manual-programmable logic controller system, the real-time loading parameters that include the truck speed and the chute flow (i.e., shown in Fig. 1) are usually predicted and adjusted by the manual experience. Further, operators stop the truck and replenish underloading values according to the actual-target errors. This process often leads to unbalanced loading problems, economic losses (e.g., about 10% cost of coal mining enterprises per year in China), and even railway accidents. Thus, breaking down the barrier that predicts multi-adjustment values by manual interference has been challenging in the industrial loading field. How to precisely obtain multi-adjustment values for balanced loading based on historical experience has become the critical exploration issue in this paper.
Nowadays, the prediction of industrial time-series targets has been promoted with many innovative learning methods (e.g., deep learning models [1]). Especially hybrid deep learning, which aims to integrate the advantages of individual learners, has become a significant focus on improving model-generalization effects in industrial fields [2,3,4]. Ensemble-based learning models have been proposed to achieve single-target prediction [5,6,7]. For example, Li et al. [5] proposed a long short-term memory recurrent neural network to predict the short-term power load. Zhou et al. [6] and Ren et al. [7] provide industrial prediction methods based on the convolutional neural network (CNN) and the long short-term memory network (LSTM). However, these models have limitations in learning hidden and temporal correlations for collaborative features. Namely, they have a weak extracting or forecasting ability in the industrial loading field due to the loss of prior-historical knowledge and low-receptive fields for time-series data. To pursue high extraction capabilities and efficient prediction performance, some researchers have explored the combination of neural networks and machine learning methods [8,9,10]. Significantly, the convolutional neural network and light gradient boosting decision trees have provided sound effects for feature extraction and linear regression. The convolutional neural network relies on the layer-by-layer processing mechanism to learn sequential features from the raw data [11, 12]. In addition, the expansive decision tree methods adopt gradient descent to accelerate convergence [13]. However, the hybrid model cannot efficiently capture long-distance features because the weight-feedback adjusting process will be slow with many deep network layers. In a word, its application to collaborative feature extraction is relatively limited.
Based on the above analysis, the temporal convolutional network (TCN), an expansive convolutional neural network with dilated causal convolution layers, has been proposed for achieving a wide receptive field [14]. The method integrates the advantages of parallel distributed extraction for the convolutional neural network and temporal regression for the recurrent neural network [15]. It is suitable for parallel and dynamic nonlinear feature extraction. However, due to positive and negative multi-adjustment values of industrial loading, its application to collaborative feature predicting is relatively weak. In addition, the gradient-boosting decision tree (GBDT) algorithms have become popular because of their distributed and fast processing capacity for massive data [16, 17]. Among them, the gradient boosting machine (GBM) adopts the local low-gradient data to reduce the time and space overhead, which has an advantage in predicting single targets with positive and negative values while having a shortcoming for multi-objective tasks. Thus, it is not easy to accurately predict multi-adjustment values for industrial balanced loading.
To accurately predict multi-adjustment values for balanced loading, this paper proposes a joint learning model (CTCN-LightGBM) based on the composited-residual-block TCN and the Light-GBM. The novelty of the work is that the CTCN-LightGBM integrates a wide receptive field and dimensionality reduction convolution for the CTCN and negative-gradient ensemble learners for the parallel Light-GBM. The model can improve the predictive accuracy by auxiliary branches and optimize the data-regression performance of the expansive GBDT. Also, we provide a feature re-enlargement (FR) method that reconstructs the collaborative feature matrix with original features to improve the extraction ability of the CTCN. Experimental results show that the CTCN-LightGBM model achieves significant and reasonable improvement compared to other contrast models in the industrial loading field. The main contributions of the paper are as follows:
-
1.
We extract collaborative features through a composited residual block in the TCN, replacing the 1 × 1 convolutional shortcut with a side-road dimensionality reduction convolutional branch. The branch can acquire auxiliary features to improve the generalization ability and preserve the sign characteristics of multi-adjustment values.
-
2.
The feature re-enlargement method (FR method) is proposed to enlarge the extraction accuracy of the CTCN. We process the original features with the extracted speed-flow element ratios and integrate them with the collaborative feature matrix extracted by the CTCN. Further, the reconstructed feature matrix will be used as the input of the Light-GBM for predicting accurate multi-adjustment values (i.e., the truck speed and chute flow).
-
3.
This paper is an academic research based on actual industrial demands. We need to adjust their loading parameters in real engineering scenarios to achieve target loading. Absolutely, only by accurately predicting the multi-adjustment values can industrial loading make a balanced plan. The CTCN-LightGBM model effectively solves practical industrial demands and brings essential significance.
The remainder of the paper is organized as follows: Sect. 2 overviews the related work of hybrid learning models for industrial target prediction. Section 3 proposes the structure of the CTCN-LightGBM model. Section 4 gives some experimental results and theoretical analysis. Finally, the conclusion and future work are given in Sect. 5.
2 Related Work
We review the related research work in two main areas in this paper, including the industrial hybrid model via neural networks and the optimized gradient method via decision trees.
2.1 The Industrial Hybrid Model via Neural Networks
The industrial hybrid model via neural networks has proven successful for forecasting parameters [6, 7, 18, 19] and target detection [20,21,22] in related industrial fields. For example, Li et al. [18] propose a deep learning algorithm composed of long short-term memory and fully connected layers to predict photovoltaic power generation. Because of the simple structure of the FC layer, the hidden distribution of features cannot be efficiently exploited for data prediction. Geng et al. [19] propose a novel gated-convolutional neural network-based transformer for dynamic soft sensor modeling of industrial processes. The model can adaptively filter the essential features. Further, Zhou et al. [6] provide a hybrid model to improve electrical equipment's load decomposition accuracy. Ding et al. [7] propose a model based on convolutional neural networks and a gate recurrent unit model to identify rough-stored express deliveries intelligently. Xia et al. [20] and Qiang [21] propose depth neural networks for industrial control. Siegel [22] proposes an anomaly detection mechanism based on the convolutional neural network and the generative adversarial network for industrial equipment. However, these heterogeneous neural networks are weak for extracting and predicting multi-adjustment values in industrial loading.
2.2 The Optimized Gradient Method via the Decision Trees
The light gradient boosting decision tree and expansive models are adopted to achieve precise regression/classification [23, 24]. Zhang et al. [23] propose a gradient-boosting decision tree-based fault prediction tool for cyber-physical production systems. The online test results prove that the model has high prediction accuracy. Yan and Wen [24] propose a light gradient boosting machine to detect power theft from power companies. However, the learning ability of these single decision tree models is insufficient to process the multi-distribution features. Nakamura et al. [25] use a hybrid model based on the bidirectional long short-term memory and the gradient-boosted decision tree for the binary classification of radiology reports. Lu et al. [26] integrate the long short-term memory with the gradient boosting machine to predict end-to-end inferences. Dan et al. [27] combine a convolutional neural network with the gradient-boosting decision tree for temperature prediction. Also, Ju et al. [28] propose a convolutional neural network and light-GBM model to predict wind power. Due to the limitation of the receptive field, these models have a poor learning effect on temporal feature relationships. Y. Wang et al. [29] propose a short-term load forecasting model based on the temporal convolutional network and the gradient boosting machine for industrial customers. Experiments show that the TCN-LightGBM model can predict electrical loads in multiple industrial scenarios.
However, the existing hybrid models are less mentioned and unsuitable for collaborative feature extraction in industrial loading fields. Thus, this paper explores the CTCN-LightGBM model to achieve accurate multi-adjustment value prediction.
3 Structure of the CTCN-LightGBM Model
The CTCN-LightGBM prediction model consists of three parts: the data preprocessing and normalization, the feature extraction based on the CTCN, and the Light-GBM prediction. The detailed process of the CTCN-LightGBM model is designed in Fig. 2.
3.1 The Data Preprocessing and Normalization
The dataset features consist of speed-related features (i.e., Feature_1), flow-related features (i.e., Feature_2), and labels in this paper. The raw dataset usually has some missing/abnormal instances because of the manual experience inference and recording accuracy errors. We propose data processing methods to deal with this problem, as listed in Table 1. We adopt the unit-adjustment values (i.e., \(\Delta V,\Delta Q = 0.0001\)) to replace the zero-value in actual instances, improving the data accuracy while conforming to industrial conditions. In addition, we set data selection deviations according to actual industrial requirements in Table 2, which can ensure the prediction effect and uniformly regulate loading target standards.
Further, we use the MinMaxScaler function to preprocess raw inputs \(X = [L,T,V,M,C,H,Q]\) (e.g.,\(L:0.118\,{\text{m}},T:2.03\,{\text{s}},V:0.061\,{\text{m/s}},M:0.579\,{\text{t}},C:0.279\,{\text{t/s}},H: - 0.021\,{\text{m}},Q:0.186\,{\text{t/s}}\)). Also, the MaxAbsScaler function is utilized to normalize the labels \([\Delta V,\Delta Q]\) (e.g., \(\Delta V: \, - 0.002\,{\text{m/s}}\) and \(\Delta Q: - 0.015\,{\text{t/s}}\)). These methods can improve the accuracy of feature extraction and accelerate the speed of gradient descent. Formula (1) and (2) describe normalized operations of features and labels.
where \(X_i\) is the ith column vector of the raw feature input \(X\), \(X^{\prime}_i\) is the ith normalized column vector of the \(X_i\), \(X_i .\max ( \cdot )\) and \(X_i .\min ( \cdot )\) are the maximum and minimum values of the ith column vector and \(\hat{X}_i\) is the elements of the ith column vector to be normalized. \(O_i\) is the ith column vector of the raw-label input \(O\), \(O_{Zi}\) is the ith normalized column vector of \(O_i\), and \(O_i .{\text{abs}}(\hat{O}_i )\) represents the absolute value of the ith column vector.
3.2 The Feature Extraction Based on the CTCN
In this section, we propose the feature extraction module based on the CTCN, and the details are as follows.
3.2.1 Dilated causal convolution
The dilated causal convolution of the CTCN is proposed to solve the problem of limited receptive fields in the temporal domain convolution. The interval sampling can be achieved based on multiple dilated convolutional layers by changing the convolution kernel’s size or the expansion factor’s value. For the one-dimensional features \(X^{\prime} = (x^{\prime}_0 ,x^{\prime}_1 , \ldots ,x^{\prime}_t , \ldots ,x^{\prime}_T )\) and the kernels \(f = \{ 0,1, \ldots ,n - 1\}\), the dilated convolution operation \(H( \cdot )\) of each element \(T\) is defined in Formula (3). Further, the final output \(F(X^{\prime})\) of the transformation branch is described in Formula (4).
where \(n\) is the kernel size, d is the dilated factor, and \(T - d \cdot i\) is the past direction. \(f( \cdot )\) denotes the convolutional operation of the ith kernel. \(\psi [ \cdot ]\) is a series of transformation operations, including the dilated convolution, the weight normalization, the Relu activation, and dropout layers.
3.2.2 Composited dimensionality reduction convolution
The simple shortcut (i.e., the 1 × 1 convolutional layer) in residual blocks may lead to the TCN model that does not generalize well enough to collaborative features. However, the hidden bottleneck layer supports the network of existing autoencoders to reconstruct the raw data by reducing feature dimensions. It preserves necessary features to improve the accuracy of feature extraction or prediction. Inspired by this method and reducing the time consumption, we provide a side-road dimensionality reduction convolutional branch to replace the 1 × 1 convolutional shortcut in the residual block. The branch can easily extract auxiliary features and preserve labels’ positive or negative characteristics. First, the one-dimensional features \(X^{\prime} = (x^{\prime}_0 ,x^{\prime}_1 , \ldots ,x^{\prime}_t , \ldots ,x^{\prime}_T )\) are processed by the initial 1 × 1 convolutional layer, reducing the number of parameters to increase the computing power effectively. Second, some valuable features can be extracted through a one-dimensional convolutional layer (e.g., kernel size is \(1 \times k\)) that reduces the feature dimension by a \(b\)-ratio, as described in Formula (5). This convolutional layer records additional features of multi-adjustment values while eliminating some low-relevant features. Finally, we perform a linear projection (i.e., 1 × 1 convolution) at the end of the branch to preserve the characteristics of the convolution branch. The batch normalization function can improve the convergence speed during the training process. Also, the LeakyReLU function is adopted to solve the existing negative value problem in the activation layer. Further, the final output \(F^{\prime}(X^{\prime})\) of the side-road dimensionality reduction convolutional branch can be described in Formula (6)
where \(N\) is the batch size, \(c_{{\text{out}}}\) is the convolution kernel number or the output dimension, and \(c_{{\text{in}}}\) is the input dimension. \({\text{bias}}( \cdot )\) is the bias vector (e.g., \({\text{bias}} = 1\)), and k is the filter number. The c is the input dimension of the next layer.
where \(\psi^{\prime}[ \cdot ]\) is a series of transformation operations, including the dimensionality reduction convolution, the batch normalization, and the activation layer.
3.2.3 Composited residual block
The conventional residual block [30] concludes with a transformation branch and a shortcut, as shown in Fig. 3a. Further, the output \(X^{(l)}\) of the l-th residual block can be expressed as Formula (7). This paper proposes a composited residual block consisting of a dilated causal convolution branch and a composited dimensionality reduction convolution branch, as shown in Fig. 3b. The output \(X^{{\prime}(l)}\) of the l-th residual block can be expressed as Formula (8). When the residual connection operations are completed, we can get a two-dimensional collaborative feature matrix as the output of the extract module, as described in Formula (9).
where \(\delta\) represents the activation operation. \(F( \cdot )\) represents a series of convolutional exchange operations (e.g., dilated convolution, dropout, weight normalization). \(H_{{\text{map}}} [ \cdot ]\) is the feature map produced in residual blocks \(1,2, \ldots ,{\text{final}}\). \(Y_V = (v_0 ,v_1 ,\ldots,v_T )^T\) is the speed-related element of the output matrix Y. \(Y_Q = (q_0 ,q_1 , \ldots ,q_T )^T\) is the flow-related element of the output matrix Y.
Notably, some associated characteristics (e.g., the displacement L is a constant feature as 0.118 m. V and T are associated with L. Also, Q and T are associated with L) should play an important role in collaborative feature extraction. Namely, the collaborative extraction process by the CTCN module may ignore some inherent properties and emphasize the multi-adjustment relationship by changing the feature distribution. Thus, the feature re-enlargement method is proposed to reconstruct and enlarge the relational properties of collaborative features from original relations. The FR method can improve the extraction ability of the CTCN and the accuracy of the parallel Light-GBM prediction module. Also, it further helps to make the sign features consistent between the collaborative and original features. We process original features according to speed-flow element ratios of the two-dimensional collaborative feature matrix. Further, Formula (10) and (11) can express speed-related and flow-related features from the original features. The final reconstructed feature matrix \(Z = [V^{\prime},Q^{\prime}]/2\) can be described in Formula (12).
3.3 The Light-GBM Optimized Prediction
The gradient boosting machine is an upgraded gradient boosting framework based on a decision tree, which is widely applied in classification or regression tasks [31, 32]. First, the Light-GBM adopts a gradient-based one-side sampling method to exclude the lowest gradient samples and calculate the information gain of the large gradient samples. Second, we use a histogram algorithm to obtain optimal splitting points, and the leaf-wise strategy reduces unnecessary splitting overhead for the lower-gain leaf nodes. Third, we set the max depth of all decision trees to prevent overfitting problems. Notably, the Light-GBM adopts a gradient descent function (i.e., \(- g_t (x)\)) to optimize the new function increment, as described in Formula (13). So, we can utilize the classical least-squares minimization task to simplify the objective function, as denoted in Formula (14).
where \(f\) is the functional model between input features and response outputs, \(\{ g_t (x_i )\}_{i = 1}^M\) is the negative gradient, \(\Psi (y^{\prime},f(x))\) is the specific loss function, and \(f(x) = \hat{f}^{t - 1} (x)\) is the (t-1)th function estimation, also called boosts. \(h(x,\theta )\) denotes a custom base-learner function, p denotes the boundary expansion, and \((\rho ,\theta )\) denotes the optimization parameters (i.e., the step size and the functional dependence parameters).
Finally, we suppose that the input sequence of the samples via the gradient-based one-side sampling method is \([Z,O] = [(Z_1 ,O_1 ),(Z_2 ,O_2 ),\ldots,(Z_N ,O_N )]\). \(O = (O_1 ,O_2 ,\ldots,O_N )\) denotes the actual values of the multi-adjustment instances. The final output functions of the parallel Light-GBM prediction module are shown in Formula (15). The multiple prediction values of the multi-adjustment parameters (\(\mathop{Z}\limits^{\frown}_v ,\mathop{Z}\limits^{\frown}_q\)) are described in Formula (16) and (17). Further, the algorithm of the CTCN-LightGBM is proposed as follows.
where \(L_0 (z_g )\) is the value of the initial weak learner. \(F_m ( \cdot )\) denotes the output of the m-th decision trees. \((v_1 ,v_2 , \, \ldots ,v_M )\) is the weight of each tree, \(M\) is the number of the trees, \(\Re_{m,j} ,j = 1,2, \ldots ,J\) denotes the leaf node area of the m-th decision tree, and \(c_{m,j}\) is a leaf node. Further, if \(z_g \in {\text{leaf}}_{m,j}\), \(I = 1\); else \(I = 0\). \(F_{{\text{final}}} (o_g )\) is the final output of the Light-GBM module.
In Algorithm 1, steps 1–6 denotes the process of the initial dataset preparation, the feature matrix extraction, and the feature reconstruction process. Further, Step 7–12 denotes the predicting process of multi-adjustment values based on the parallel Light-GBM module.
4 Experiments
4.1 The Experimental Settings and Performance Metrics
This paper collects the real loading datasets from different coal mines (i.e., Huaibei Mining Co., Ltd and Linhuan Mining Co., Ltd) in Anhui Province, China. We take the whole carriage as a single research object (i.e., concluding 117-loading point instances) and collect 50 carriages’ instances from each coal mine (e.g., each carriage is loaded on average four times a month). The historical loading data and corresponding multi-adjustment values from Apr 1st, 2021, to Nov 1st, 2021 are applied to carry out the experiments. The dataset is split into the training set and the testing set according to the proportion of 8:2. The experimental programming environment is Python 3.9, the Keras library, the NVIDIA RTX 3090, the AMD R7-5800 × CPU, and the 32 GB of memory. Further, the CTCN-LightGBM model and other contrast models are studied in this paper. These include two kinds of models, the classical learning models (i.e., the Light-GBDT [23], the Light-GBM [24], the TCN) and the hybrid learning models (i.e., the LSTM-CNN, the CNN-LSTM [6], the LSTM-LightGBM [26], the CNN-LightGBM [28], the TCN-LightGBM [29], and the CTCN-LightGBDT).
Because of the zero-extreme values in training data, the mean absolute percentage error is not suitable as an evaluation criterion. Usually, we adopt the mean absolute error (MAE), the root mean square error (RMSE), and the determination coefficient (\(R^2\)) as evaluation metrics for model prediction.
where \(W\) denotes the number of testing instances. \(Z_w\), \(\overline{Z}_w\), and \(\hat{Z}_w\) represent the actual, the average actual, and predicted multi-adjustment values in the w-th instance, respectively.
Further, since the range of the actual normalization instances is [− 1, 1], the predicted and actual values may have different signs. We select the measurement coefficient (F1-Score) and the area under the curve (AUC) value as evaluation metrics for model classification.
where \({\text{precision}}\) is the precision value, and \({\text{recall}}\) is the recall value. \({\text{TP}}\) is the number of true-positive values classified by the model, \({\text{FP}}\) is the number of false-positive values, \({\text{FN}}\) is the number of false-negative values, and \({\text{TN}}\) is the number of true-negative values. \(W^+\) and \(W^-\) are the number of positive and negative examples. \({\text{rank}}_i\) is the serial number of the ith sample (e.g., the probability scores are arranged from the small to the large instances).
4.2 Comparison Results for the Feature Dimensionality Reduction Ratio of the CRB
In this experiment, we compare the CRB with different dimensionality reduction ratios to explore the feature extraction ability of the CTCN-LightGBM model. The dimensionality reduction ratios are [0.5, 0.25, 0.125, 2], and the dataset collected in the Huaibei Mining Co., Ltd from Apr 1st, 2021, to July 1st, 2021 is utilized to run simulations. We randomly select the completed loading data as testing instances, and the data concludes the 117-loading point instances. The loss function is MSE, the evaluation metric is RMSE, and the parameter settings of contrast modules are as follows:
-
1.
CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 64/64/16/16, and the convolutional kernel size is 2. The LeakyReLU value is 0.3, and the batch normalization value is 0.9. Additionally, the chosen model optimizer is Adam, the gradient calculation function is MSE, the hidden layers are 32/16, the dropout value is 0.25, and the training epoch is 800.
-
2.
Light-GBM: The number of trees is 500, the maximum depth is 6, the model learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.
Figures 4a and 5a show the training loss of the extraction module with different dimensionality reduction ratios for multi-adjustment parameters. Figures 4b and 5b are the fitting curves of the training loss curves. Table 3 shows the detailed results for the extraction module with different dimensionality reduction ratios. Among them, the CTCN-LightGBM model with a dimensionality reduction ratio b = 0.25 can acquire the best extraction performance (e.g., the RMSE metrics are about \(0.107 \times 10^{ - 2} {\text{m/s}}\) and \(0.286 \times 10^{ - 2} {\text{t/s}}\), where \({\text{m/s}}\) denotes the speed unit, and \({\text{t/s}}\) denotes the flow unit.).
4.3 Ablation Experiments for the Extraction Ability of the CTCN
TO explore the performance of the composited dimensionality reduction convolution branch in the CTCN, we compare the CTCN+-LightGBM (i.e., the CTCN-LightGBM without the composited dimensionality reduction convolution branch, but with a 1 × 1 shortcut) and CTCN++-LightGBM (i.e., the CTCN-LightGBM without the dimensionality reduction convolutional layer, as shown in Fig. 6a), and the CTCN+++-LightGBM (i.e., the CTCN-LightGBM without the one-dimensional convolutional layer, Fig. 6b) with the CTCN-LightGBM. The dataset collected in the Huaibei Mining Co., Ltd from Apr 1st, 2021, to July 1st, 2021 is utilized to run simulations. Further, the completed data (including 117-loading points) is adopted to perform the predictive ability of contrast models intuitively. We test three randomly completed data of multi-adjustment values in the testing dataset to eliminate model contingency. The parameter settings of contrast modules are as follows:
-
1.
CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 64/64/16/16, and the convolutional kernel size is 2. The LeakyReLU value is 0.3, and the batch normalization value is 0.9. Additionally, the feature dimensionality reduction ratio is 0.25, the hidden layers are 32/16, and the training epoch is 800.
-
2.
Light-GBM: The number of trees is 500, the maximum depth is 6, the model learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.
Tables 4 and 5 show the ablation experimental results for the prediction of speed-adjustment values. Also, Tables 6 and 7 show the ablation experimental results for the prediction of flow-adjustment values. Among them, the modified CTCN modules (i.e., CTCN++ and CTCN+++) obtain better performance than the CTCN+-LightGBM for predicting the multi-adjustment values but are weaker than the CTCN-LightGBM model. Namely, the model generalization and sign-preserving ability could be improved (i.e., the R2: 0.909/0.895 vs. 0.925/0.924) by adding a side-road dimensionality reduction convolutional branch in CRBs. Further, without the one-dimensional convolutional layer, the performance of the CTCN+++-LightGBM is slightly poorer than the CTCN-LightGBM (i.e., the R2: 0.918/0.915 vs. 0.925/0.924). It means that both the one-dimensional convolutional layer and the dimensionality reduction convolutional layer can emphasize related useful features. However, the dimensionality reduction convolutional layer significantly improves generalization and retaining sign features ability. Namely, it emphasizes the model performances by reducing the dimension filters. Further, comparing the time consumption of the CTCN++-LightGBM with the CTCN+-LightGBM (i.e., 0.228 s vs. 0.226 s), we find that the added 1 × 1 convolutional layer can reduce parameter computation to make the auxiliary features effective while takes a little more time. In a word, with the help of features extracted by CRBs, the generalization and retaining sign features ability of the CTCN-LightGBM can be improved.
4.4 Comparison Results for the Extraction Ability of the FR method
This experiment compares the equal-parameter TCN-LightGBM with the dimensionality reduction layer (i.e., called TCN*-LightGBM) and the CTCN-LightGBM without the FR method (i.e., called CTCN*-LightGBM) with the CTCN-LightGBM to verify the extraction ability of the FR method in feature extraction module. Namely, the conventional residual block requires \((2k + 1) \cdot c^2\), whereas the CRB needs \([(2k + 1) + (k + 1) \times b] \cdot c^2\) parameters, which \(c\) denotes the feature dimension of the input and intermediate layers. Further, Fig. 7 shows the structure of the composited residual block for the TCN*, and each dilated casual convolution layer’s filter parameters are described in Table 8. Further, the related dataset is the same as in Sect. 4.3. We test three randomly completed data of multi-adjustment values in the testing dataset to eliminate model contingency. The parameter settings of contrast modules are as follows:
-
1.
TCN*: The temporal convolutional network is built by the Keras library. The dilated convolution factors of the temporal convolution network are [1, 2, 4, 8], the referenced filters are 64/64/16/16, and the convolutional kernel size is 2. The neuron number of the hidden layer is 16, the dropout value is 0.25, and the training epoch is 800.
-
2.
CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 64/64/16/16, and the convolutional kernel size is 2. The LeakyReLU value is 0.3, and the batch normalization value is 0.9. Additionally, the feature dimensionality reduction ratio is 0.25, the hidden layers are 32/16, and the training epoch is 800.
-
3.
Light-GBM: The number of trees is 500, the maximum depth is 6, the model learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.
Tables 9 and 10 indicate the prediction and classification results of speed-adjustment values. Tables 11 and 12 indicate the prediction and classification results of flow-adjustment values. Among them, it is evident that the CTCN-LightGBM model can achieve the best prediction performance (e.g., R2: 0.917/0.924, RMSE: 0.107/0.285, MAE:0.077/0.208, F1-Score: 0.918/0.946, AUC: 0.906/0.935). Further, the CTCN-LightGBM model with the CRBs requires more computational time than other models (e.g., about 0.23 s), but it is acceptable considering the improvement of the model accuracy.
Figure 8a and b presents the actual prediction results of the truck speed-adjustment values, and Fig. 9a and b presents those of the chute flow-adjustment values. Among them, the CTCN-LightGBM obtains the best evaluation metrics (e.g., RMSE: 0.106/0.283, MAE: 0.071/0.203). Further, the TCN*-LightGBM is worse than the CTCN-LightGBM (e.g., R2: 0.908/0.912 vs. 0.921/0.929), and the CTCN*-LightGBM likewise falls short (e.g., R2: 0.899/0.909 vs. 0.921/0.929). Namely, the FR method improves the extraction ability of the CTCN module and helps the Light-GBM module acquire superior performance in predicting sign characteristics (e.g., F1-Score: 0.945/0.961, AUC: 0.918/0.917) and values (e.g., R2: 0.921/0.929).
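The regression metrics reported throughout these experiments (R2, RMSE, MAE) follow their standard definitions; a minimal self-contained sketch, not the authors' evaluation code, is:

```python
import math

def rmse(y_true, y_pred):
    # Root-mean-square error over paired observations.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect predictor yields RMSE = MAE = 0 and R2 = 1, while predicting the mean of the targets yields R2 = 0, which is why values above 0.9 in the tables indicate a strong fit.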
4.5 Prediction for Multi-Adjustment Values
This experiment compares the CTCN-LightGBM model with the listed models for multi-adjustment value prediction (i.e., truck speed and chute flow). We randomly select the completed temporal loading data (i.e., 117 loading points) as the testing instances to evaluate the predictive effects of each contrast model.
Experiment-1: In this experiment, the historical loading data collected at Huaibei Mining Co., Ltd from Apr 1st, 2021, to Jul 1st, 2021 are utilized to run simulations. The experimental environments and models are listed in Sect. 4.1. The parameter settings of the contrast modules are as follows:
(1) TCN: The temporal convolutional network is built with the Keras library. The dilated convolution factors are [1, 2, 4, 8], the filters are 128/64/32/16, and the convolutional kernel size is 2. The number of hidden-layer neurons is 16, and the dropout value is 0.25.
(2) CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 128/64/32/16, and the convolutional kernel size is 2. The LeakyReLU slope is 0.3, and the batch normalization momentum is 0.9. The dimensionality reduction ratio is 0.25, and the hidden layers are 32/16.
(3) CNN: The convolutional neural network consists of convolutional layers and fully connected layers. There are 4 convolutional layers, the filters of each convolutional layer are 128/64/32/16, and the kernel size is 2. The number of fully connected layers is 2, and the numbers of neurons are 16/1.
(4) LSTM: The number of hidden layers is 4, and the hidden neurons are designed as 256/128/64/16. The number of fully connected layers is 2, and the numbers of neurons are 16/1.
(5) Light-GBDT: The number of trees is 500, the maximum depth is 6, and the learning rate is 0.01. The minimum sample split is 2, and the minimum sample leaf is 1.
(6) Light-GBM: The number of trees is 500, the maximum depth is 6, the learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.
Tables 13 and 14 indicate that the CTCN-LightGBM model, with the help of the CTCN and the FR method, outperforms the other classical and hybrid models. For example, the R2 is 0.917/0.926, the RMSE is 0.109/0.285, the MAE is 0.082/0.206, the F1-Score is 0.946/0.964, and the AUC is 0.964/0.964. Among the hybrid models, the extraction ability of the expansive LSTM models is relatively poorer than that of the expansive CNN models (i.e., R2: 0.867/0.864 for the LSTM-LightGBM, 0.882/0.876 for the CNN-LightGBM, and 0.904/0.904 for the TCN-LightGBM). Further, the time consumption of the CTCN-LightGBM model is longer than that of the other hybrid convolutional models (i.e., TCN-LightGBM, TCN-LightGBDT, CNN-LightGBM) by about 0.26 s.
Figure 10a and b shows the absolute prediction errors of the expansive Light-GBM models. Among them, the CTCN-LightGBM model obtains lower errors than the other models, and its fluctuation trend is relatively stable. These results indicate that the CTCN-LightGBM model, based on the FR method and the CRBs, achieves high accuracy and is suitable for multi-adjustment value prediction in actual industrial loading.
Experiment-2: In this experiment, the historical loading data collected at Linhuan Mining Co., Ltd from Aug 1st, 2021, to Nov 1st, 2021 are utilized to run simulations. The experimental environments and models are listed in Sect. 4.1. The parameter settings of the contrast modules are as follows:
(1) TCN: The temporal convolutional network is built with the Keras library. The dilated convolution factors are [1, 2, 4, 8], the filters are 256/64/32/16, and the convolutional kernel size is 3. The hidden layers are 16/1, and the dropout value is 0.25.
(2) CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 256/64/32/16, and the convolutional kernel size is 3. The LeakyReLU slope is 0.3, and the batch normalization momentum is 0.9. The feature dimensionality reduction ratio is 0.25, and the hidden layers are 16/1.
(3) CNN: The convolutional neural network consists of convolutional layers and fully connected layers. There are 4 convolutional layers, the filters of each convolutional layer are 256/64/32/16, and the kernel size is 3. The number of fully connected layers is 2, and the numbers of neurons are 16/1.
(4) LSTM: The number of hidden layers is 4, and the hidden neurons are designed as 128/64/64/32. The number of fully connected layers is 2, and the numbers of neurons are 16/1.
(5) Light-GBDT: The number of trees is 500, the maximum depth is 6, and the learning rate is 0.01. The minimum sample split is 2, and the minimum sample leaf is 1.
(6) Light-GBM: The number of trees is 500, the maximum depth is 6, the learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.
Similar to Experiment-1, Tables 15 and 16 indicate that the CTCN-LightGBM model outperforms the other contrast models. It can distinguish and predict the positive and negative adjustment values with the highest evaluation scores (i.e., R2: 0.921/0.929, RMSE: 0.107/0.284, MAE: 0.077/0.200, F1-Score: 0.947/0.967, and AUC: 0.952/0.958). The CTCN module accurately extracts collaborative features and fits them well with the machine learning models, and the expansive GBDT prediction module further improves the prediction performance for multi-adjustment values. Because each module of the CTCN-LightGBM model carries more parameters, the complexity and computational time increase slightly (i.e., by about 0.27 s), but this is acceptable for industrial loading applications.
Figure 11a and b shows the expansive Light-GBM models’ absolute prediction errors and related trend distributions. Among them, the CTCN-LightGBM model precisely matches the truck speed and chute flow values, and the fluctuation of absolute prediction errors is relatively stable. Thus, it can be well applied to forecasting multi-adjustment values in industrial loading.
4.6 Discussion and Analysis
The paper compares the proposed CTCN-LightGBM with other models to illustrate its better prediction effects. Some intuitive results and theoretical analyses follow:
1. Among the classical learning models, the Light-GBDT and Light-GBM better fit the actual prediction targets, whether positive or negative values. The computational times of the expansive Light-GBDT models are significantly less than that of the TCN. Theoretically, the reason is that ensemble learners (i.e., decision trees) with negative-gradient fitting can decrease the loss along the gradient direction. Further, the gradient-based one-side sampling method and the histogram algorithm of the Light-GBM reduce the data size, preserve the essential features, and accelerate convergence. Also, the leaf-wise growth strategy with depth limitation plays an essential role in avoiding overfitting.
2. Among the hybrid learning models, the extraction performance of the LSTM-based models is worse than that of the expansive CNN-based models. Because of the long time span and nonlinear feature distribution, the LSTM is not suitable for extracting hidden collaborative relationships. Further, the dilated convolutions and residual blocks of the TCN obtain a wider receptive field than the CNN, as shown in Table 17, and the limitations of the CNN lead to poor performance in capturing temporal information. Formulas (25) and (26) give the receptive field sizes of convolutional layers and dilated residual blocks.
$$r_c = r_{c-1} + \left[ (k_c - 1) \cdot \prod_{l=1}^{c-1} s_l \right] \tag{25}$$

$$\omega_l = 1 + \sum_{i=0}^{l-1} (k_l - 1) \cdot \gamma^i = 1 + (k_l - 1) \cdot \frac{\gamma^l - 1}{\gamma - 1} \tag{26}$$

where \(r_c\) is the receptive field size of the c-th convolutional layer, \(k_c\) is the kernel size of the c-th convolutional or pooling layer, and \(\prod s_l\) is the product of the convolutional strides of the previous (c-1) layers. Also, \(\omega_l\) denotes the receptive field size of the l-th dilated residual layer, \(k_l\) denotes the kernel size of the l-th layer, and \(\gamma\) denotes the dilation factor (i.e., \(\gamma = 2\)).
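Formula (26) can be checked numerically for the dilation schedule used in the experiments ([1, 2, 4, 8], i.e., γ = 2). The function names below are illustrative; the layer-by-layer sum and the closed form must agree:

```python
def tcn_receptive_field(kernel: int, dilations) -> int:
    # Receptive field of stacked dilated causal convolutions: each layer
    # with dilation d widens the field by (kernel - 1) * d samples.
    field = 1
    for d in dilations:
        field += (kernel - 1) * d
    return field

def tcn_receptive_field_closed(kernel: int, gamma: int, layers: int) -> int:
    # Closed form of Formula (26) for geometric dilations 1, γ, γ², ...
    return 1 + (kernel - 1) * (gamma ** layers - 1) // (gamma - 1)

print(tcn_receptive_field(2, [1, 2, 4, 8]))   # 16
print(tcn_receptive_field_closed(2, 2, 4))    # 16
```

With kernel size 2 the four dilated layers cover 16 time steps, and raising the kernel size to 3 (as in Experiment-2) nearly doubles the field to 31, which illustrates why the TCN sees a wider context than a plain CNN of equal depth.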
3. The CTCN-LightGBM model, which integrates the strengths of the CTCN and the Light-GBM, achieves the best forecasting effect among all models. The receptive fields of the proposed CTCN-LightGBM are significantly wider than those of the TCN-LightGBM, which improves feature extraction ability. Namely, the CRB adopts a side-road dimensionality reduction convolutional branch to replace the 1 × 1 convolutional shortcut in the conventional residual block. The CTCN-LightGBM model requires more time due to the auxiliary branch parameters, but predictive performance should take priority in industrial loading. In addition, the FR method reduces the abnormal loss and improves the Light-GBM module's prediction effect by reconstructing and enlarging the hidden relationships of the collaborative feature matrix.
4. Training time complexity of all contrast models. Based on the above models (e.g., LSTM, CNN, and Light-GBM), the training time complexities of the hybrid learning models are given in Tables 18 and 19, where B is the number of training instances, D is the feature dimension, k is the kernel size, n is the number of branch points, and N is the number of convolutional layers. \(T_1\), \(T_2\), and \(T\) are the time complexities of the dimension reshaping between two single models; \(a\%\) is the selected top \(a \times 100\%\) of the data and \(b\%\) is the randomly selected \(b \times 100\%\) of the data in different data subsets; \(N_1\) is the number of cells in the LSTM layer, \(N_2\) is the number of convolutional layers in the CRBs, and \(depth\_max\) is the maximum depth of the decision trees. Among them, the CNN-LSTM and LSTM-CNN models cost more time than the other hybrid learning models because these two neural networks need to reshape and match the feature sizes by increasing the channels, whereas the hybrid models based on the Light-GBM reduce a feature channel with decision-tree algorithms and thus obtain a lower training time complexity.
5 Conclusion
The paper proposes the CTCN-LightGBM model, which combines the CTCN with a parallel Light-GBM, to accurately predict real-time loading values for balanced industrial loading. The composited residual blocks in the CTCN effectively extract the collaborative features of the multi-adjustment values. We utilize the FR method to reconstruct the collaborative features extracted by the CTCN and enlarge their related properties for better multi-target prediction, and we adopt the reconstructed feature matrix as the input of the parallel Light-GBM to predict multi-adjustment values in parallel. Experiments show that the CTCN-LightGBM significantly outperforms other contrast models in predicting industrial loading parameters. However, some problems remain unsolved; for example, the computational complexity increases with the number of composited residual blocks. In the future, we will explore optimizing the structure of the composited residual blocks in the CTCN to reduce time consumption and apply the model to more related industrial fields.
Availability of data and materials
Data sharing is not applicable to this article, as the datasets generated and analyzed during the current study are not publicly available.
Acknowledgements
This work was mainly supported by the National Natural Science Foundation of China (Grant Nos. 51874010 and 51675003).
Funding
Research grants from the National Natural Science Foundation of China (Grant Nos. 51874010 and 51675003), the Natural Science Research Projects of Colleges and Universities in Anhui Province (KJ2020A0309), and the National Key R&D Project "Intelligent dispatching technology for all mine personnel and materials" (2020YFB1314103).
Author information
Contributions
ZC participated in the industrial loading studies, carried out the hybrid learning model design, made related experiments, and drafted the manuscript. CW led the research direction, participated in the study's design, and participated in drafting the manuscript. HJ participated in the model design and helped to draft the manuscript. JL and SZ participated in the experiments and polished this manuscript's language expression and grammar. QO provided the corresponding data of industrial loading for experiments and the research background. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
All authors declare there are no other relationships or activities that could appear to have influenced the submitted work. To the best of our knowledge, the named authors have no conflict of interest, financial or otherwise.
Ethics approval and consent to participate
This article is for non-life science journals.
Consent for publication
Not applicable.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Chen, Z., Wang, C., Jin, H. et al. The CTCN-LightGBM Joint Model for Industrial Balanced Loading Prediction. Int J Comput Intell Syst 16, 1 (2023). https://doi.org/10.1007/s44196-022-00175-5