1 Introduction

Industrial loading, which aims to load materials precisely and quantitatively, is widely used in mining, transportation, and other fields. However, when target loading is achieved with a conventional manual programmable-logic-controller system, the real-time loading parameters, including the truck speed and the chute flow (shown in Fig. 1), are usually predicted and adjusted from manual experience. Operators then stop the truck and replenish the underloaded amount according to the actual-target error. This process often leads to unbalanced loading, economic losses (e.g., about 10% of the annual cost of coal mining enterprises in China), and even railway accidents. Thus, removing the manual interference from the prediction of multi-adjustment values has long been a challenge in the industrial loading field. How to precisely obtain multi-adjustment values for balanced loading from historical experience is the critical issue explored in this paper.

Fig. 1
figure 1

The diagram of multi-adjustment parameters

Nowadays, the prediction of industrial time-series targets has been advanced by many innovative learning methods (e.g., deep learning models [1]). In particular, hybrid deep learning, which integrates the advantages of individual learners, has become a significant focus for improving model generalization in industrial fields [2,3,4]. Ensemble-based learning models have been proposed for single-target prediction [5,6,7]. For example, Li et al. [5] proposed a long short-term memory recurrent neural network to predict short-term power load. Zhou et al. [6] and Ren et al. [7] provide industrial prediction methods based on the convolutional neural network (CNN) and the long short-term memory network (LSTM). However, these models are limited in learning the hidden and temporal correlations of collaborative features. Namely, their extraction and forecasting abilities are weak in the industrial loading field because of the loss of prior historical knowledge and their small receptive fields for time-series data. To pursue high extraction capability and efficient prediction performance, some researchers have explored combinations of neural networks and machine learning methods [8,9,10]. In particular, the convolutional neural network and light gradient-boosting decision trees have provided sound feature extraction and regression effects. The convolutional neural network relies on a layer-by-layer processing mechanism to learn sequential features from raw data [11, 12], and the expansive decision-tree methods adopt gradient descent to accelerate convergence [13]. However, such hybrid models cannot efficiently capture long-distance features because weight-feedback adjustment becomes slow with many deep network layers. In a word, their application to collaborative feature extraction is relatively limited.

Based on the above analysis, the temporal convolutional network (TCN), an expansive convolutional neural network with dilated causal convolution layers, has been proposed to achieve a wide receptive field [14]. The method integrates the advantages of the parallel distributed extraction of the convolutional neural network and the temporal regression of the recurrent neural network [15], making it suitable for parallel and dynamic nonlinear feature extraction. However, because the multi-adjustment values of industrial loading can be positive or negative, its application to collaborative feature prediction is relatively weak. In addition, gradient-boosting decision tree (GBDT) algorithms have become popular because of their distributed and fast processing of massive data [16, 17]. Among them, the gradient boosting machine (GBM) subsamples the low-gradient data to reduce time and space overhead, which is an advantage when predicting single targets with positive and negative values but a shortcoming for multi-objective tasks. Thus, it is not easy to accurately predict multi-adjustment values for industrial balanced loading.

To accurately predict multi-adjustment values for balanced loading, this paper proposes a joint learning model (CTCN-LightGBM) based on the composited-residual-block TCN (CTCN) and the Light-GBM. The novelty of the work is that the CTCN-LightGBM integrates a wide receptive field and dimensionality reduction convolution in the CTCN with negative-gradient ensemble learners in the parallel Light-GBM. The model improves predictive accuracy through auxiliary branches and optimizes the data-regression performance of the expansive GBDT. We also provide a feature re-enlargement (FR) method that reconstructs the collaborative feature matrix with the original features to improve the extraction ability of the CTCN. Experimental results show that the CTCN-LightGBM model achieves significant and reasonable improvements over other contrast models in the industrial loading field. The main contributions of the paper are as follows:

  1.

    We extract collaborative features through a composited residual block in the TCN, replacing the 1 × 1 convolutional shortcut with a side-road dimensionality reduction convolutional branch. The branch can acquire auxiliary features to improve the generalization ability and preserve the sign characteristics of multi-adjustment values.

  2.

    The feature re-enlargement method (FR method) is proposed to enlarge the extraction accuracy of the CTCN. We process the original features with the extracted speed-flow element ratios and integrate them with the collaborative feature matrix extracted by the CTCN. Further, the reconstructed feature matrix will be used as the input of the Light-GBM for predicting accurate multi-adjustment values (i.e., the truck speed and chute flow).

  3.

    This paper is academic research driven by actual industrial demands: loading parameters must be adjusted in real engineering scenarios to achieve target loading, and only by accurately predicting the multi-adjustment values can a balanced loading plan be made. The CTCN-LightGBM model effectively meets these practical industrial demands and is of essential significance.

The remainder of the paper is organized as follows: Sect. 2 overviews the related work of hybrid learning models for industrial target prediction. Section 3 proposes the structure of the CTCN-LightGBM model. Section 4 gives some experimental results and theoretical analysis. Finally, the conclusion and future work are given in Sect. 5.

2 Related Work

We review the related research work in two main areas in this paper, including the industrial hybrid model via neural networks and the optimized gradient method via decision trees.

2.1 The Industrial Hybrid Model via Neural Networks

The industrial hybrid model via neural networks has proven successful for parameter forecasting [6, 7, 18, 19] and target detection [20,21,22] in related industrial fields. For example, Li et al. [18] propose a deep learning algorithm composed of long short-term memory and fully connected (FC) layers to predict photovoltaic power generation; because of the simple structure of the FC layer, however, the hidden distribution of features cannot be efficiently exploited for data prediction. Geng et al. [19] propose a novel gated-convolutional-neural-network-based transformer for dynamic soft-sensor modeling of industrial processes, which can adaptively filter the essential features. Further, Zhou et al. [6] provide a hybrid model to improve the load-decomposition accuracy of electrical equipment. Ding et al. [7] propose a model based on convolutional neural networks and a gate recurrent unit to identify rough-stored express deliveries intelligently. Xia et al. [20] and Qiang [21] propose deep neural networks for industrial control. Siegel [22] proposes an anomaly detection mechanism based on the convolutional neural network and the generative adversarial network for industrial equipment. However, these heterogeneous neural networks are weak at extracting and predicting multi-adjustment values in industrial loading.

2.2 The Optimized Gradient Method via the Decision Trees

The light gradient boosting decision tree and expansive models are adopted to achieve precise regression/classification [23, 24]. Zhang et al. [23] propose a gradient-boosting decision tree-based fault prediction tool for cyber-physical production systems. The online test results prove that the model has high prediction accuracy. Yan and Wen [24] propose a light gradient boosting machine to detect power theft from power companies. However, the learning ability of these single decision tree models is insufficient to process the multi-distribution features. Nakamura et al. [25] use a hybrid model based on the bidirectional long short-term memory and the gradient-boosted decision tree for the binary classification of radiology reports. Lu et al. [26] integrate the long short-term memory with the gradient boosting machine to predict end-to-end inferences. Dan et al. [27] combine a convolutional neural network with the gradient-boosting decision tree for temperature prediction. Also, Ju et al. [28] propose a convolutional neural network and light-GBM model to predict wind power. Due to the limitation of the receptive field, these models have a poor learning effect on temporal feature relationships. Y. Wang et al. [29] propose a short-term load forecasting model based on the temporal convolutional network and the gradient boosting machine for industrial customers. Experiments show that the TCN-LightGBM model can predict electrical loads in multiple industrial scenarios.

However, existing hybrid models rarely address, and are ill-suited to, collaborative feature extraction in the industrial loading field. Thus, this paper explores the CTCN-LightGBM model to achieve accurate multi-adjustment value prediction.

3 Structure of the CTCN-LightGBM Model

The CTCN-LightGBM prediction model consists of three parts: data preprocessing and normalization, feature extraction based on the CTCN, and the Light-GBM prediction. The detailed process of the CTCN-LightGBM model is shown in Fig. 2.

Fig. 2
figure 2

The detailed process of the proposed CTCN-LightGBM

3.1 The Data Preprocessing and Normalization

The dataset features consist of speed-related features (i.e., Feature_1), flow-related features (i.e., Feature_2), and labels. The raw dataset usually contains missing/abnormal instances because of manual experience inference and recording accuracy errors. We adopt the data processing methods listed in Table 1 to deal with this problem. In particular, we use the unit-adjustment value (i.e., \(\Delta V,\Delta Q = 0.0001\)) to replace zero values in actual instances, which improves the data accuracy while conforming to industrial conditions. In addition, we set the data selection deviations according to actual industrial requirements in Table 2, which ensures the prediction effect and uniformly regulates the loading target standards.

Table 1 The raw dataset processing methods
Table 2 The data selection deviations of the actual loading requirements
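As a concrete illustration, the zero-replacement and deviation-selection steps might be sketched as below in numpy. The threshold-based filter is a hypothetical stand-in: the exact selection rules live in Tables 1 and 2, so treat the function names and thresholds as assumptions.

```python
import numpy as np

UNIT_ADJ = 1e-4  # unit-adjustment value for zero Delta V / Delta Q (Sect. 3.1)

def replace_zero_adjustments(labels):
    """Replace exact-zero adjustment values with the unit-adjustment value,
    keeping the labels consistent with actual industrial conditions."""
    labels = np.asarray(labels, dtype=float).copy()
    labels[labels == 0.0] = UNIT_ADJ
    return labels

def filter_by_deviation(values, target, max_dev):
    """Drop instances whose actual-target deviation exceeds a selection
    threshold (a hypothetical stand-in for the Table 2 rules)."""
    values = np.asarray(values, dtype=float)
    return values[np.abs(values - target) <= max_dev]
```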

Further, we use the MinMaxScaler function to preprocess raw inputs \(X = [L,T,V,M,C,H,Q]\) (e.g.,\(L:0.118\,{\text{m}},T:2.03\,{\text{s}},V:0.061\,{\text{m/s}},M:0.579\,{\text{t}},C:0.279\,{\text{t/s}},H: - 0.021\,{\text{m}},Q:0.186\,{\text{t/s}}\)). Also, the MaxAbsScaler function is utilized to normalize the labels \([\Delta V,\Delta Q]\) (e.g., \(\Delta V: \, - 0.002\,{\text{m/s}}\) and \(\Delta Q: - 0.015\,{\text{t/s}}\)). These methods can improve the accuracy of feature extraction and accelerate the speed of gradient descent. Formula (1) and (2) describe normalized operations of features and labels.

$$X^{\prime}_i = \frac{{\hat{X}_i - X_i .\min ({\text{axis}} = 0)}}{{X_i .\max ({\text{axis}} = 0) - X_i .\min ({\text{axis}} = 0)}}$$
(1)
$$O_{Zi} = \frac{{\hat{O}_i }}{{\max ({\text{abs}}(O_i ),{\text{axis}} = 0)}}$$
(2)

where \(X_i\) is the ith column vector of the raw feature input \(X\), \(X^{\prime}_i\) is the normalized version of \(X_i\), \(X_i .\max ( \cdot )\) and \(X_i .\min ( \cdot )\) are the maximum and minimum values of the ith column vector, and \(\hat{X}_i\) is an element of the ith column vector to be normalized. \(O_i\) is the ith column vector of the raw label input \(O\), \(O_{Zi}\) is the normalized version of \(O_i\), and \({\text{abs}}( \cdot )\) denotes the element-wise absolute value.
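Formulas (1) and (2) can be written directly in numpy (they correspond to sklearn's MinMaxScaler and MaxAbsScaler, which the paper uses); a minimal sketch:

```python
import numpy as np

def min_max_scale(X):
    """Formula (1): column-wise min-max normalization of the feature matrix X."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

def max_abs_scale(O):
    """Formula (2): column-wise max-abs normalization of the label matrix O;
    dividing by max(|O_i|) maps labels into [-1, 1] and preserves their signs."""
    return O / np.abs(O).max(axis=0)
```

Max-abs scaling is chosen for the labels precisely because the multi-adjustment values can be negative; min-max scaling would destroy the sign information.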

3.2 The Feature Extraction Based on the CTCN

In this section, we propose the feature extraction module based on the CTCN, and the details are as follows.

3.2.1 Dilated causal convolution

The dilated causal convolution of the CTCN is proposed to solve the problem of limited receptive fields in temporal-domain convolution. Interval sampling can be achieved with multiple dilated convolutional layers by changing the convolution kernel’s size or the expansion factor’s value. For the one-dimensional features \(X^{\prime} = (x^{\prime}_0 ,x^{\prime}_1 , \ldots ,x^{\prime}_t , \ldots ,x^{\prime}_T )\) and the kernel \(f = \{ 0,1, \ldots ,n - 1\}\), the dilated convolution operation \(H( \cdot )\) at each time step \(T\) is defined in Formula (3). Further, the final output \(F(X^{\prime})\) of the transformation branch is described in Formula (4).

$$H(T) = (X^{\prime} \ast_d f)(T) = \sum_{i = 0}^{n - 1} {f(i) \cdot x^{\prime}_{T - d \cdot i} } ,\quad i = 0,1,\ldots,n - 1$$
(3)
$$F(X^{\prime}) = \psi [H_1 (T),H_2 (T)]$$
(4)

where \(n\) is the kernel size, \(d\) is the dilation factor, and \(T - d \cdot i\) indexes the past direction. \(f(i)\) denotes the ith kernel weight. \(\psi [ \cdot ]\) is a series of transformation operations, including the dilated convolution, weight normalization, ReLU activation, and dropout layers.
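Formula (3) can be checked with a direct numpy sketch; zero-padding in the past direction is an assumption, since the paper does not state its padding scheme:

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """H(T) = sum_{i=0}^{n-1} f(i) * x[T - d*i]; indices before t = 0 contribute 0."""
    out = np.zeros(len(x))
    for T in range(len(x)):
        for i in range(len(f)):
            if T - d * i >= 0:          # causal: only look into the past
                out[T] += f[i] * x[T - d * i]
    return out
```

With kernel size n = 2 and dilation d = 2, each output sees inputs two steps apart, so stacking layers with d = 1, 2, 4, 8 (the factors used in Sect. 4) grows the receptive field exponentially.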

3.2.2 Composited dimensionality reduction convolution

The simple shortcut (i.e., the 1 × 1 convolutional layer) in residual blocks may prevent the TCN from generalizing well to collaborative features. In contrast, the hidden bottleneck layer of existing autoencoders reconstructs the raw data through reduced feature dimensions, preserving the necessary features and thereby improving the accuracy of feature extraction or prediction. Inspired by this method, and to reduce time consumption, we provide a side-road dimensionality reduction convolutional branch to replace the 1 × 1 convolutional shortcut in the residual block. The branch can easily extract auxiliary features and preserve the positive or negative characteristics of the labels. First, the one-dimensional features \(X^{\prime} = (x^{\prime}_0 ,x^{\prime}_1 , \ldots ,x^{\prime}_t , \ldots ,x^{\prime}_T )\) are processed by an initial 1 × 1 convolutional layer, which reduces the number of parameters and effectively lowers the computational load. Second, valuable features are extracted through a one-dimensional convolutional layer (e.g., with kernel size \(1 \times k\)) that reduces the feature dimension by a ratio \(b\), as described in Formula (5). This convolutional layer records additional features of the multi-adjustment values while eliminating some low-relevance features. Finally, we perform a linear projection (i.e., a 1 × 1 convolution) at the end of the branch to preserve the characteristics of the convolution branch. The batch normalization function improves the convergence speed during training, and the LeakyReLU function handles the negative values in the activation layer. The final output \(F^{\prime}(X^{\prime})\) of the side-road dimensionality reduction convolutional branch is described in Formula (6).

$$\begin{aligned} {\text{out}}(N,c_{{\text{out}}} ) = & {\mkern 1mu} {\text{bias}}(c_{{\text{out}}} ) + \sum_{k = 0}^{c_{{\text{in}}} - 1} {{\text{weight}}(c_{{\text{out}}} ,k)} \\ & \quad \times {\text{input}}(N,k),\quad c_{{\text{out}}} = b \cdot c \\ \end{aligned}$$
(5)

where \(N\) is the batch size, \(c_{{\text{out}}}\) is the number of convolution kernels (the output dimension), and \(c_{{\text{in}}}\) is the input dimension. \({\text{bias}}( \cdot )\) is the bias vector (e.g., \({\text{bias}} = 1\)), \(k\) is the channel index, and \(c\) is the input dimension of the next layer.

$$F^{\prime}(X^{\prime}) = \psi^{\prime}[{\text{out}}(N,c_{{\text{out}}} ),(X^{\prime})]$$
(6)

where \(\psi^{\prime}[ \cdot ]\) is a series of transformation operations, including the dimensionality reduction convolution, the batch normalization, and the activation layer.
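The side-road branch can be sketched in numpy as below (Formulas (5)–(6)). The weight shapes are illustrative assumptions, and batch normalization is omitted for brevity; this is a sketch of the data flow, not the authors' exact implementation:

```python
import numpy as np

def leaky_relu(x, alpha=0.3):           # alpha = 0.3 as in the experimental settings
    return np.where(x >= 0, x, alpha * x)

def conv1x1(X, W):                      # pointwise convolution: mixes channels only
    return X @ W                        # X: (T, c_in), W: (c_in, c_out)

def causal_conv1d(X, W):                # W: (k, c_in, c_out); zero-padded past
    T, c_in = X.shape
    k, _, c_out = W.shape
    Xp = np.vstack([np.zeros((k - 1, c_in)), X])
    return np.stack([sum(Xp[t + k - 1 - i] @ W[i] for i in range(k))
                     for t in range(T)])

def side_branch(X, W_in, W_red, W_out):
    """1x1 conv -> 1xk conv reducing channels by ratio b (Formula (5)) ->
    1x1 linear projection -> LeakyReLU (Formula (6))."""
    h = conv1x1(X, W_in)         # cut parameters before the reduction layer
    h = causal_conv1d(h, W_red)  # c_out = b * c_in: keep useful features, drop low-relevance ones
    h = conv1x1(h, W_out)        # project back so the branch can join the residual sum
    return leaky_relu(h)         # LeakyReLU keeps negative adjustment information
```

The LeakyReLU at the end is what lets the branch preserve the sign of negative multi-adjustment values, which a plain ReLU would clip to zero.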

3.2.3 Composited residual block

The conventional residual block [30] consists of a transformation branch and a shortcut, as shown in Fig. 3a, and the output \(X^{(l)}\) of the l-th residual block can be expressed as Formula (7). This paper proposes a composited residual block consisting of a dilated causal convolution branch and a composited dimensionality reduction convolution branch, as shown in Fig. 3b. The output \(X^{{\prime}(l)}\) of the l-th composited residual block can be expressed as Formula (8). When the residual connection operations are completed, we obtain a two-dimensional collaborative feature matrix as the output of the extraction module, as described in Formula (9).

$$X^{(l)} = \delta (F(X^{(l - 1)} ) + X^{(l - 1)} )$$
(7)
$$X^{{\prime}(l)} = \delta (F(X^{{\prime}(l - 1)} ) + F^{\prime}(X^{{\prime}(l - 1)} ))$$
(8)
$$\begin{aligned} Y & = H_{{\text{map}}} [X^{{\prime}(1)} ,X^{{\prime}(2)} , \ldots ,X^{{\prime}({\text{final)}}} ] \\ & \Rightarrow Y_V + Y_Q = (v_0 ,v_1 , \ldots ,v_T )^T + (q_0 ,q_1 , \ldots ,q_T )^T \\ \end{aligned}$$
(9)

where \(\delta\) represents the activation operation. \(F( \cdot )\) represents a series of convolutional exchange operations (e.g., dilated convolution, dropout, weight normalization). \(H_{{\text{map}}} [ \cdot ]\) is the feature map produced in residual blocks \(1,2, \ldots ,{\text{final}}\). \(Y_V = (v_0 ,v_1 ,\ldots,v_T )^T\) is the speed-related element of the output matrix Y. \(Y_Q = (q_0 ,q_1 , \ldots ,q_T )^T\) is the flow-related element of the output matrix Y.

Fig. 3
figure 3

The structure diagram of two residual blocks. a is the conventional residual block, b is the composited residual block (CRB) of the CTCN

Notably, some associated characteristics (e.g., the displacement \(L\) is a constant feature of 0.118 m, and both the \(V\)–\(T\) and \(Q\)–\(T\) pairs are associated with \(L\)) should play an important role in collaborative feature extraction. However, the collaborative extraction process of the CTCN module may ignore some of these inherent properties and emphasize the multi-adjustment relationship by changing the feature distribution. Thus, the feature re-enlargement method is proposed to reconstruct and enlarge the relational properties of the collaborative features from the original relations. The FR method improves the extraction ability of the CTCN and the accuracy of the parallel Light-GBM prediction module, and it further helps keep the sign features consistent between the collaborative and original features. We process the original features according to the speed-flow element ratios of the two-dimensional collaborative feature matrix. Formulas (10) and (11) express the speed-related and flow-related features derived from the original features, and the final reconstructed feature matrix \(Z = [V^{\prime},Q^{\prime}]/2\) is described in Formula (12).

$$Y^{\prime}_V = (x_0^v = \frac{v_0 }{{\sqrt {v_0^2 + q_0^2 } }} \times x_0 , \ldots ,x_T^v )^T$$
(10)
$$Y^{\prime}_Q = (x_0^q = \frac{q_0 }{{\sqrt {v_0^2 + q_0^2 } }} \times x_0 , \ldots ,x_T^q )^T$$
(11)
$$[V^{\prime},Q^{\prime}] = \left[ {\begin{array}{*{20}c} {(v^{\prime}_0 ,v^{\prime}_1 , \ldots ,v^{\prime}_T )} \\ {(q^{\prime}_0 ,q^{\prime}_1 , \ldots ,q^{\prime}_T )} \\ \end{array} } \right]^T = \left[ {\begin{array}{*{20}c} {Y_V + Y^{\prime}_V } \\ {Y_Q + Y^{\prime}_Q } \\ \end{array} } \right]^T$$
(12)
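The FR reconstruction (Formulas (10)–(12)) reduces to a few array operations. A numpy sketch, assuming x holds the original feature values aligned with the T + 1 time steps:

```python
import numpy as np

def feature_re_enlargement(Y_V, Y_Q, x):
    """Reconstruct the feature matrix Z from the collaborative features
    (Y_V, Y_Q) and the original features x via speed-flow element ratios."""
    norm = np.sqrt(Y_V**2 + Y_Q**2)
    Yp_V = (Y_V / norm) * x          # Formula (10): speed-related share of x
    Yp_Q = (Y_Q / norm) * x          # Formula (11): flow-related share of x
    return np.stack([Y_V + Yp_V, Y_Q + Yp_Q], axis=1) / 2   # Formula (12)
```

Because the projection coefficients carry the signs of Y_V and Y_Q, the re-enlarged features keep the same sign pattern as the collaborative features, which is the consistency property the FR method targets.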

3.3 The Light-GBM Optimized Prediction

The light gradient boosting machine (Light-GBM) is an upgraded gradient-boosting framework based on decision trees, which is widely applied to classification and regression tasks [31, 32]. First, the Light-GBM adopts a gradient-based one-side sampling method to exclude the lowest-gradient samples and calculate the information gain on the large-gradient samples. Second, we use a histogram algorithm to obtain optimal splitting points, and the leaf-wise strategy reduces unnecessary splitting overhead for lower-gain leaf nodes. Third, we set the maximum depth of all decision trees to prevent overfitting. Notably, the Light-GBM adopts the negative gradient (i.e., \(- g_t (x)\)) to determine the new function increment, as described in Formula (13), so the classical least-squares minimization can be used to simplify the objective function, as denoted in Formula (14).

$$g_t (x) = E_y \left[ {\frac{{\partial \Psi (y^{\prime},f(x))}}{\partial f(x)}|x} \right]_{f(x) = \hat{f}^{t - 1} (x)}$$
(13)
$$(\rho_t ,\theta_t ) = \mathop {\arg \min }\limits_{\rho ,\theta } \sum_{i = 1}^N {[ - g_t (x_i ) + \rho h(x_i ,\theta )]^2 }$$
(14)

where \(f\) is the functional model between input features and response outputs, \(\{ g_t (x_i )\}_{i = 1}^N\) is the gradient, \(\Psi (y^{\prime},f(x))\) is the specific loss function, and \(f(x) = \hat{f}^{t - 1} (x)\) is the (t-1)th function estimate, also called the boost. \(h(x,\theta )\) denotes a custom base-learner function, and \((\rho ,\theta )\) denotes the optimization parameters (i.e., the step size and the functional dependence parameters).
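For the squared loss, the negative gradient \(-g_t(x)\) is simply the residual, so each boosting round fits a base learner to the current residuals (Formula (14)). A minimal sketch with one-split stumps as hypothetical base learners (Light-GBM's trees, sampling, and histograms are omitted):

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares one-split stump: a stand-in for the base learner h(x, theta)."""
    best_err, best = np.inf, None
    for s in np.unique(x)[:-1]:
        lv, rv = r[x <= s].mean(), r[x > s].mean()
        err = ((r - np.where(x <= s, lv, rv)) ** 2).sum()
        if err < best_err:
            best_err, best = err, (s, lv, rv)
    return best

def gbm_fit_predict(x, y, rounds=50, lr=0.1):
    """Gradient boosting for squared loss: -g_t(x) = y - f(x), fitted per round."""
    f = np.full(len(y), y.mean())           # initial weak learner L_0
    for _ in range(rounds):
        s, lv, rv = fit_stump(x, y - f)     # fit the negative gradient (residual)
        f = f + lr * np.where(x <= s, lv, rv)   # additive update with step size lr
    return f
```

Light-GBM adds the gradient-based one-side sampling and histogram splitting described above on top of exactly this additive scheme.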

Finally, we suppose that the input sequence of the samples after the gradient-based one-side sampling method is \([Z,O] = [(Z_1 ,O_1 ),(Z_2 ,O_2 ),\ldots,(Z_N ,O_N )]\), where \(O = (O_1 ,O_2 ,\ldots,O_N )\) denotes the actual values of the multi-adjustment instances. The final output function of the parallel Light-GBM prediction module is shown in Formula (15), and the prediction values of the multi-adjustment parameters (\(\hat{Z}_v ,\hat{Z}_q\)) are described in Formulas (16) and (17). Further, the algorithm of the CTCN-LightGBM is given as follows.

$$F_{{\text{final}}} (o_g ) = L_0 (z_g ) + \sum_{m = 1}^M {\left[ {\nu _m \cdot \sum_{j = 1}^J {c_{m,j} I(z_g \in \Re _{m,j} )} } \right]} ,(z_g ,o_g ) \in [Z,O]$$
(15)
$$\Rightarrow \hat{Z}_v = F_{{\text{tree}}0} (v) + F_{{\text{tree}}1} (v) + \cdots + F_{{\text{tree}}M} (v)$$
(16)
$$\Rightarrow \hat{Z}_q = F_{{\text{tree}}0} (q) + F_{{\text{tree}}1} (q) + \cdots + F_{{\text{tree}}M} (q)$$
(17)
figure a

where \(L_0 (z_g )\) is the value of the initial weak learner. \(F_m ( \cdot )\) denotes the output of the m-th decision tree. \((\nu_1 ,\nu_2 , \ldots ,\nu_M )\) are the weights of the trees, \(M\) is the number of trees, \(\Re_{m,j} ,j = 1,2, \ldots ,J\) denotes the leaf-node area of the m-th decision tree, and \(c_{m,j}\) is the output value of leaf node \(j\). Further, \(I = 1\) if \(z_g \in {\text{leaf}}_{m,j}\); otherwise \(I = 0\). \(F_{{\text{final}}} (o_g )\) is the final output of the Light-GBM module.

In Algorithm 1, steps 1–6 denote the initial dataset preparation, the feature matrix extraction, and the feature reconstruction process. Steps 7–12 denote the prediction of multi-adjustment values based on the parallel Light-GBM module.

4 Experiments

4.1 The Experimental Settings and Performance Metrics

This paper collects real loading datasets from two coal mines (i.e., Huaibei Mining Co., Ltd and Linhuan Mining Co., Ltd) in Anhui Province, China. We take the whole carriage as a single research object (i.e., comprising 117 loading-point instances) and collect 50 carriages’ instances from each coal mine (e.g., each carriage is loaded on average four times a month). The historical loading data and corresponding multi-adjustment values from Apr 1st, 2021, to Nov 1st, 2021 are used to carry out the experiments. The dataset is split into a training set and a testing set in the proportion 8:2. The experimental environment is Python 3.9 with the Keras library, an NVIDIA RTX 3090 GPU, an AMD R7-5800X CPU, and 32 GB of memory. Further, the CTCN-LightGBM model and other contrast models are studied in this paper. These include two kinds of models: the classical learning models (i.e., the Light-GBDT [23], the Light-GBM [24], and the TCN) and the hybrid learning models (i.e., the LSTM-CNN, the CNN-LSTM [6], the LSTM-LightGBM [26], the CNN-LightGBM [28], the TCN-LightGBM [29], and the CTCN-LightGBDT).

Because of the zero and extreme values in the training data, the mean absolute percentage error is not suitable as an evaluation criterion. Instead, we adopt the mean absolute error (MAE), the root mean square error (RMSE), and the determination coefficient (\(R^2\)) as evaluation metrics for model prediction.

$${\text{MAE}} = \frac{1}{W}\sum_{w = 1}^W {{|}Z_w - \hat{Z}_w {|}}$$
(18)
$${\text{RMSE}} = \sqrt {\frac{1}{W}\sum_{w = 1}^W {(Z_w - \hat{Z}_w )^2 } }$$
(19)
$$R^2 = 1 - \frac{{{\text{SSE}}}}{{{\text{SST}}}} = 1 - \frac{{\sum_{w = 1}^W {(\hat{Z}_w - Z_w )^2 } }}{{\sum_{w = 1}^W {(\overline{Z}_w - Z_w )^2 } }}$$
(20)

where \(W\) denotes the number of testing instances. \(Z_w\), \(\overline{Z}_w\), and \(\hat{Z}_w\) represent the actual, the average actual, and predicted multi-adjustment values in the w-th instance, respectively.
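Formulas (18)–(20) in numpy form, as a quick reference sketch:

```python
import numpy as np

def regression_metrics(z, z_hat):
    """MAE (18), RMSE (19), and R^2 (20) for actual z and predicted z_hat."""
    mae = np.abs(z - z_hat).mean()
    rmse = np.sqrt(((z - z_hat) ** 2).mean())
    r2 = 1.0 - ((z_hat - z) ** 2).sum() / ((z.mean() - z) ** 2).sum()
    return mae, rmse, r2
```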

Further, since the range of the normalized instances is [− 1, 1], the predicted and actual values may have different signs. We select the F1-Score and the area under the curve (AUC) as evaluation metrics for model classification.

$$F1 - {\text{Score}} = \frac{{2 \times {\text{precision}} \times {\text{recall}}}}{{{\text{precision}} + {\text{recall}}}}$$
(21)
$${\text{precision}} = \frac{{{\text{TP}}}}{{\text{TP + FP}}}$$
(22)
$${\text{recall}} = \frac{{{\text{TP}}}}{{\text{TP + FN}}}$$
(23)
$${\text{AUC}} = \frac{{\sum_{i \in {\text{positive}}\;{\text{Class}}} {{\text{rank}}_i - W^+ (1 + W^+ )/2} }}{W^+ \times W^- }$$
(24)

where \({\text{precision}}\) is the precision value and \({\text{recall}}\) is the recall value. \({\text{TP}}\) is the number of true-positive values classified by the model, \({\text{FP}}\) is the number of false-positive values, \({\text{FN}}\) is the number of false-negative values, and \({\text{TN}}\) is the number of true-negative values. \(W^+\) and \(W^-\) are the numbers of positive and negative examples. \({\text{rank}}_i\) is the serial number of the ith sample when the samples are arranged by ascending probability score.
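The rank-based AUC of Formula (24) can be computed directly; tie handling is omitted in this sketch for simplicity:

```python
import numpy as np

def auc_rank(scores, labels):
    """Formula (24): ranks are 1-based positions after sorting scores ascending;
    labels hold 1 for positive and 0 for negative examples."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # rank_i, ascending scores
    pos = labels == 1
    w_pos, w_neg = pos.sum(), (~pos).sum()         # W+ and W-
    return (ranks[pos].sum() - w_pos * (w_pos + 1) / 2) / (w_pos * w_neg)
```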

4.2 Comparison Results for the Feature Dimensionality Reduction Ratio of the CRB

In this experiment, we compare the CRB with different dimensionality reduction ratios to explore the feature extraction ability of the CTCN-LightGBM model. The dimensionality reduction ratios are [0.5, 0.25, 0.125, 2], and the dataset collected at Huaibei Mining Co., Ltd from Apr 1st, 2021, to July 1st, 2021 is used to run the simulations. We randomly select completed loading data as testing instances, and the data include 117 loading-point instances. The loss function is MSE, the evaluation metric is RMSE, and the parameter settings of the contrast modules are as follows:

  1.

    CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 64/64/16/16, and the convolutional kernel size is 2. The LeakyReLU value is 0.3, and the batch normalization value is 0.9. Additionally, the chosen model optimizer is Adam, the gradient calculation function is MSE, the hidden layers are 32/16, the dropout value is 0.25, and the training epoch is 800.

  2.

    Light-GBM: The number of trees is 500, the maximum depth is 6, the model learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.
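The Light-GBM settings above might be expressed as the following parameter dictionary; the names follow the lightgbm Python API's native naming, and the regression objective is an assumption based on Sect. 3.3, so treat this as a sketch rather than the authors' exact configuration.

```python
# Hypothetical lightgbm parameter dictionary matching the settings above.
lgbm_params = {
    "boosting_type": "gbdt",      # the boosting method
    "num_iterations": 500,        # the number of trees
    "max_depth": 6,               # maximum tree depth, limits overfitting
    "learning_rate": 0.01,        # the model learning rate
    "bagging_fraction": 0.5,
    "feature_fraction": 0.9,
    "objective": "regression",    # assumption: squared-loss regression (Sect. 3.3)
}
```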

Figures 4a and 5a show the training loss of the extraction module with different dimensionality reduction ratios for the multi-adjustment parameters, and Figs. 4b and 5b show the corresponding fitted loss curves. Table 3 lists the detailed results for the extraction module with different dimensionality reduction ratios. Among them, the CTCN-LightGBM model with a dimensionality reduction ratio of b = 0.25 acquires the best extraction performance (e.g., the RMSE metrics are about \(0.107 \times 10^{ - 2} \,{\text{m/s}}\) and \(0.286 \times 10^{ - 2} \,{\text{t/s}}\), where \({\text{m/s}}\) is the speed unit and \({\text{t/s}}\) is the flow unit).

Fig. 4
figure 4

The training loss and related fitting curve of the extraction module for the truck speed

Fig. 5
figure 5

The training loss and related fitting curve of the extraction module for the chute flow

Table 3 The extraction results of different ratios

4.3 Ablation Experiments for the Extraction Ability of the CTCN

To explore the performance of the composited dimensionality reduction convolution branch in the CTCN, we compare the CTCN-LightGBM with the CTCN+-LightGBM (i.e., the CTCN-LightGBM without the composited dimensionality reduction convolution branch, but with a 1 × 1 shortcut), the CTCN++-LightGBM (i.e., the CTCN-LightGBM without the dimensionality reduction convolutional layer, as shown in Fig. 6a), and the CTCN+++-LightGBM (i.e., the CTCN-LightGBM without the one-dimensional convolutional layer, Fig. 6b). The dataset collected at Huaibei Mining Co., Ltd from Apr 1st, 2021, to July 1st, 2021 is used to run the simulations. Further, the completed data (including 117 loading points) are adopted to demonstrate the predictive ability of the contrast models intuitively. We test three randomly selected complete multi-adjustment records in the testing dataset to eliminate model contingency. The parameter settings of the contrast modules are as follows:

  1.

    CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 64/64/16/16, and the convolutional kernel size is 2. The LeakyReLU value is 0.3, and the batch normalization value is 0.9. Additionally, the feature dimensionality reduction ratio is 0.25, the hidden layers are 32/16, and the training epoch is 800.

  2.

    Light-GBM: The number of trees is 500, the maximum depth is 6, the model learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.

Fig. 6
figure 6

The residual block structure of the CTCN++-LightGBM (a) and the CTCN+++-LightGBM (b)

Tables 4 and 5 show the ablation results for the prediction of speed-adjustment values, and Tables 6 and 7 show the ablation results for the prediction of flow-adjustment values. Among them, the modified CTCN modules (i.e., CTCN++ and CTCN+++) obtain better performance than the CTCN+-LightGBM for predicting the multi-adjustment values but are weaker than the CTCN-LightGBM model. Namely, the generalization and sign-preserving ability are improved (i.e., R2: 0.909/0.895 vs. 0.925/0.924) by adding the side-road dimensionality reduction convolutional branch to the CRBs. Further, without the one-dimensional convolutional layer, the performance of the CTCN+++-LightGBM is slightly poorer than that of the CTCN-LightGBM (i.e., R2: 0.918/0.915 vs. 0.925/0.924). This means that both the one-dimensional convolutional layer and the dimensionality reduction convolutional layer can emphasize related useful features, but the dimensionality reduction convolutional layer contributes more to generalization and sign retention; namely, it improves model performance through its dimension-reducing filters. Further, comparing the time consumption of the CTCN++-LightGBM with that of the CTCN+-LightGBM (i.e., 0.228 s vs. 0.226 s), we find that the added 1 × 1 convolutional layer reduces parameter computation and makes the auxiliary features effective while taking only slightly more time. In a word, with the help of the features extracted by the CRBs, the generalization and sign-retention ability of the CTCN-LightGBM is improved.

Table 4 Ablation results of each model for the truck speed adjustment
Table 5 Ablation classification results of each model for the truck speed adjustment
Table 6 Ablation results of each model for the chute flow adjustment
Table 7 Ablation classification results of each model for the chute flow adjustment

4.4 Comparison Results for the Extraction Ability of the FR Method

This experiment compares the equal-parameter TCN-LightGBM with a dimensionality reduction layer (denoted TCN*-LightGBM) and the CTCN-LightGBM without the FR method (denoted CTCN*-LightGBM) against the CTCN-LightGBM to verify the extraction ability of the FR method in the feature extraction module. In terms of parameters, the conventional residual block requires \((2k + 1) \cdot c^2\) parameters, whereas the CRB needs \([(2k + 1) + (k + 1) \times b] \cdot c^2\), where \(c\) denotes the feature dimension of the input and intermediate layers. Further, Fig. 7 shows the structure of the composited residual block for the TCN*, and the filter parameters of each dilated causal convolution layer are described in Table 8. The related dataset is the same as in Sect. 4.3. We test three randomly selected complete loading sequences of multi-adjustment values in the testing dataset to eliminate model contingency. The parameter settings of contrast modules are as follows:

  1. 1.

    TCN*: The temporal convolutional network is built by the Keras library. The dilated convolution factors of the temporal convolution network are [1, 2, 4, 8], the referenced filters are 64/64/16/16, and the convolutional kernel size is 2. The neuron number of the hidden layer is 16, the dropout value is 0.25, and the training epoch is 800.

  2. 2.

    CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 64/64/16/16, and the convolutional kernel size is 2. The LeakyReLU value is 0.3, and the batch normalization value is 0.9. Additionally, the feature dimensionality reduction ratio is 0.25, the hidden layers are 32/16, and the training epoch is 800.

  3. 3.

    Light-GBM: The number of trees is 500, the maximum depth is 6, the model learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.
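The parameter counts quoted above can be made concrete with a short sketch. The formulas \((2k + 1) \cdot c^2\) and \([(2k + 1) + (k + 1) \times b] \cdot c^2\) are taken directly from the text; the reading of \(b\) as the number of side-branch layers is our assumption, since the text does not define it at this point, and the function names are ours.

```python
def conventional_block_params(k, c):
    """Parameters of a conventional residual block: (2k + 1) * c^2."""
    return (2 * k + 1) * c * c


def crb_params(k, c, b):
    """Parameters of a composited residual block: [(2k + 1) + (k + 1) * b] * c^2.

    b is assumed to count the side-road branch layers (not defined in the text).
    """
    return ((2 * k + 1) + (k + 1) * b) * c * c
```

For example, with the paper's kernel size k = 2 and a feature dimension c = 64, a conventional block costs 20,480 parameters while a CRB with b = 1 costs 32,768, i.e., the auxiliary branch adds \((k + 1) \cdot b \cdot c^2 = 12{,}288\) parameters.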

Fig. 7
figure 7

The structure of the composited residual block for the TCN*

Table 8 The filter parameters of dilated causal convolution layers

Tables 9 and 10 indicate the prediction and classification results of speed-adjustment values, and Tables 11 and 12 indicate those of flow-adjustment values. Among them, it is evident that the CTCN-LightGBM model achieves the best prediction performance (e.g., R2: 0.917/0.924, RMSE: 0.107/0.285, MAE: 0.077/0.208, F1-Score: 0.918/0.946, AUC: 0.906/0.935). Further, the CTCN-LightGBM model with the CRBs requires more computational time than the other models (e.g., about 0.23 s), but this is acceptable considering the improvement in model accuracy.

Table 9 Prediction results of each model for the truck speed adjustment
Table 10 Classification results of each model for the truck speed adjustment
Table 11 Prediction results of each model for the chute flow adjustment
Table 12 Classification results of each model for the chute flow adjustment

Figure 8a and b presents the actual prediction results of truck speed-adjustment values, and Fig. 9a and b presents the actual prediction results of chute flow-adjustment values. Among them, the CTCN-LightGBM obtains the best evaluation metrics (e.g., RMSE: 0.106/0.283, MAE: 0.071/0.203). Further, the TCN*-LightGBM is worse than the CTCN-LightGBM (e.g., R2: 0.908/0.912 vs. 0.921/0.929), and the CTCN-LightGBM likewise outperforms the CTCN*-LightGBM (e.g., R2: 0.921/0.929 vs. 0.899/0.909). Namely, the FR method improves the extraction ability of the CTCN module and helps the Light-GBM module acquire superior performance in predicting both sign characteristics (e.g., F1-Score: 0.945/0.961, AUC: 0.918/0.917) and values (e.g., R2: 0.921/0.929).
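The evaluation throughout this section pairs value metrics (R2, RMSE, MAE) with sign-classification metrics, since the sign of an adjustment value decides whether to speed up or slow down. A self-contained sketch of how such scores might be computed follows; this is pure Python for illustration, not the authors' evaluation code, and treats the positive sign as the positive class.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for predicted adjustment values."""
    n = len(y_true)
    mean = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    r2 = 1.0 - sse / sst
    return rmse, mae, r2


def sign_f1(y_true, y_pred):
    """F1-score for recovering the sign (positive class) of each adjustment."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t > 0 and p > 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t <= 0 and p > 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t > 0 and p <= 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A model can score well on RMSE yet flip signs near zero, which is why the paper reports F1-Score and AUC alongside the regression metrics.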

Fig. 8
figure 8

Actual prediction results for truck speed adjustment

Fig. 9
figure 9

Actual prediction results for chute flow adjustment

4.5 Prediction for Multi-Adjustment Values

This experiment compares the CTCN-LightGBM model with the listed models for multi-adjustment value prediction (i.e., truck speed and chute flow). We randomly select complete temporal loading data (i.e., 117 loading points) as the testing instances to evaluate the predictive effects of each contrast model.

  1. (1)

    Experiment-1: In this experiment, the historical loading data collected at Huaibei Mining Co., Ltd. from Apr 1st, 2021, to Jul 1st, 2021, are utilized to run simulations. The experimental environments and models are listed in Sect. 4.1. The parameter settings of contrast modules are as follows:

    1. (1)

      TCN: The temporal convolutional network is built by the Keras library. The dilated convolution factors of the temporal convolution network are [1, 2, 4, 8], the filters are 128/64/32/16, and the convolutional kernel size is 2. The neuron number of the hidden layer is 16, and the dropout value is 0.25.

    2. (2)

      CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 128/64/32/16, and the convolutional kernel size is 2. The LeakyReLU value is 0.3, and the batch normalization value is 0.9. The dimensionality reduction ratio is 0.25, and the hidden layers are 32/16.

    3. (3)

      CNN: The convolutional neural network consists of convolutional layers and fully connected layers. The number of convolutional layers is 4, the filters of each convolutional layer are 128/64/32/16, and the kernel size of the filters is 2. The number of fully connected layers is 2, and the numbers of neurons are 16/1.

    4. (4)

      LSTM: The number of hidden layers is 4, and the hidden neurons are designed as 256/128/64/16. The fully connected layers are 2, and the number of neurons is 16/1.

    5. (5)

      Light-GBDT: The number of trees is 500, the maximum depth is 6, and the model learning rate is 0.01. The minimum sample split is 2, and the minimum sample leaf is 1.

    6. (6)

      Light-GBM: The number of trees is 500, the maximum depth is 6, the learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.
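The Light-GBM settings listed above map directly onto LightGBM's native parameter names (`bagging_fraction`, `feature_fraction`, and `boosting` are the library's own names; `num_iterations` corresponds to the number of trees). A configuration sketch, with the dict name ours:

```python
# Hyperparameters of the Light-GBM module as listed above, keyed by
# LightGBM's native parameter names.
lightgbm_params = {
    "boosting": "gbdt",        # boosting method
    "num_iterations": 500,     # number of trees
    "max_depth": 6,
    "learning_rate": 0.01,
    "bagging_fraction": 0.5,   # row subsampling per iteration
    "feature_fraction": 0.9,   # column subsampling per tree
}

# With the lightgbm package installed, a regressor could then be trained via
# lightgbm.train(lightgbm_params, lightgbm.Dataset(X, y)) or the equivalent
# sklearn-style wrapper.
```

The aggressive bagging fraction (0.5) paired with a small learning rate (0.01) over 500 trees is a common regularization trade-off for noisy industrial data.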

Tables 13 and 14 indicate that the CTCN-LightGBM model, with the help of the CTCN and the FR method, outperforms the other classical and hybrid models. For example, the R2 is 0.917/0.926, the RMSE is 0.109/0.285, the MAE is 0.082/0.206, the F1-Score is 0.946/0.964, and the AUC is 0.964/0.964. Among the hybrid models, the extraction ability of the expansive LSTM models is relatively poorer than that of the expansive CNN models (i.e., R2: 0.867/0.864 for the LSTM-LightGBM, 0.882/0.876 for the CNN-LightGBM, and 0.904/0.904 for the TCN-LightGBM). Further, the time consumption of the CTCN-LightGBM model, about 0.26 s, is longer than that of the other hybrid convolutional models (i.e., TCN-LightGBM, TCN-LightGBDT, CNN-LightGBM).

Table 13 The prediction and classification results of all models for the truck speed adjustment
Table 14 The prediction and classification results of all models for the chute flow adjustment

Figure 10a and b shows the absolute prediction errors of the expansive Light-GBM models. Among them, the CTCN-LightGBM model obtains lower errors than the other models, and its fluctuation trend is relatively stable. These results show that the CTCN-LightGBM model, based on the FR method and the CRBs, achieves high accuracy and is thus suitable for multi-adjustment value prediction in actual industrial loading.

  1. (2)

    Experiment-2: In this experiment, the historical loading data collected at Linhuan Mining Co., Ltd. from Aug 1st, 2021, to Nov 1st, 2021, are utilized to run simulations. The experimental environments and models are listed in Sect. 4.1. The parameter settings of contrast modules are as follows:

    1. (1)

      TCN: The temporal convolutional network is built by the Keras library. The dilated convolution factors of the temporal convolution network are [1, 2, 4, 8], the filters are 256/64/32/16, and the convolutional kernel size is 3. The hidden layers are 16/1, and the dropout value is 0.25.

    2. (2)

      CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 256/64/32/16, and the convolutional kernel size is 3. The LeakyReLU value is 0.3, and the batch normalization value is 0.9. The feature dimensionality reduction ratio is 0.25, and the hidden layers are 16/1.

    3. (3)

      CNN: The convolutional neural network consists of convolutional layers and fully connected layers. The number of convolutional layers is 4, the filters of each convolutional layer are 256/64/32/16, and the kernel size of the filters is 3. The number of fully connected layers is 2, and the numbers of neurons are 16/1.

    4. (4)

      LSTM: The number of hidden layers is 4, and the hidden neurons are designed as 128/64/64/32. The fully connected layers are 2, and the number of neurons is 16/1.

    5. (5)

      Light-GBDT: The number of trees is 500, the maximum depth is 6, and the model learning rate is 0.01. The minimum sample split is 2, and the minimum sample leaf is 1.

    6. (6)

      Light-GBM: The number of trees is 500, the maximum depth is 6, the model learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.

Fig. 10
figure 10

a Absolute error values of actual prediction results for truck speed adjustment. b Absolute error values of actual prediction results for chute flow adjustment

Similar to Experiment 1, Tables 15 and 16 indicate that the CTCN-LightGBM model outperforms the other contrast models. The CTCN-LightGBM model can distinguish and predict the positive and negative adjustment values with the highest evaluation scores (i.e., R2: 0.921/0.929, RMSE: 0.107/0.284, MAE: 0.077/0.200, F1-Score: 0.947/0.967, and AUC: 0.952/0.958). Further, the CTCN module of the CTCN-LightGBM model can accurately extract collaborative features and fit them well with the machine learning module, and the expansive GBDT prediction module can improve the prediction performance for multi-adjustment values. Because each module of the CTCN-LightGBM model has more parameters, the complexity and computational time are slightly increased (i.e., about 0.27 s), but this is acceptable for industrial loading applications.

Table 15 The prediction and classification results of all models for the truck speed adjustment
Table 16 The prediction and classification results of all models for the chute flow adjustment

Figure 11a and b shows the expansive Light-GBM models' absolute prediction errors and related trend distributions. Among them, the CTCN-LightGBM model precisely matches the truck speed and chute flow values, and the fluctuation of its absolute prediction errors is relatively stable. Thus, it can be well applied to forecasting multi-adjustment values in industrial loading.

Fig. 11
figure 11

a Absolute error values of actual prediction results for truck speed adjustment. b Absolute error values of actual prediction results for chute flow adjustment

4.6 Discussion and Analysis

This paper compares the proposed CTCN-LightGBM with other models to demonstrate its superior prediction effects. Some intuitive results and theoretical analyses follow:

  1. 1.

    In classical learning models, the Light-GBDT and Light-GBM better fit the actual prediction targets, whether positive or negative values. The computational times of the expansive Light-GBDT models are significantly less than that of the TCN. Theoretically, the reason is that ensemble learners (i.e., decision trees) with negative-gradient fitting can decrease the loss along the gradient direction. Further, the gradient-based one-side sampling method and the histogram algorithm of the Light-GBM reduce the data size, preserve the basic features, and accelerate convergence. Also, the leaf-wise strategy with depth limitation plays an essential role in avoiding overfitting.

  2. 2.

    In hybrid learning models, the extraction performances of models via the LSTM are worse than those of models via the expansive CNN. Because of the long time span and nonlinear feature distribution, the LSTM is not suitable for extracting hidden collaborative relationships. Further, the dilated convolution and residual blocks of the TCN obtain a wider receptive field than the CNN, as shown in Table 17, while the limitations of the CNN lead to poor performance when capturing temporal information. Formulas (25) and (26) give the receptive field sizes of convolutional layers and dilated residual blocks, respectively.

    $$r_c = r_{c - 1} + \left[ (k_c - 1) \cdot \prod_{l = 1}^{c - 1} s_l \right]$$
    (25)
    $$\omega_l = 1 + \sum_{i = 0}^{l - 1} (k_l - 1) \cdot \gamma^i = 1 + (k_l - 1) \cdot \frac{\gamma^l - 1}{\gamma - 1}$$
    (26)

    where \(r_c\) is the receptive field size of the c-th convolutional layer, \(k_c\) is the kernel size of the c-th convolutional or pooling layer, and \(\prod s_l\) is the product of the convolutional strides of the previous (c-1) layers. Also, \(\omega_l\) denotes the receptive field size of the l-th dilated residual layer, \(k_l\) denotes the kernel size of the l-th layer, and \(\gamma\) denotes the dilation factor (i.e., \(\gamma = 2\)).

  3. 3.

    The CTCN-LightGBM model, which integrates the strengths of the CTCN and the Light-GBM, achieves the best forecasting effect among all models. The receptive fields of the proposed CTCN-LightGBM model are significantly wider than those of the TCN-LightGBM, which improves feature extraction ability. Namely, the CRB adopts a side-road dimensionality reduction convolutional branch to replace the 1 × 1 convolutional shortcut in the conventional residual block. Further, the CTCN-LightGBM model requires more time due to the auxiliary branch parameters, but predictive performance should come first in industrial loading. In addition, the FR method reduces the abnormal loss and improves the Light-GBM module's prediction effect by reconstructing and enlarging the hidden relationships of the collaborative feature matrix.

  4. 4.

    Training time complexity of all contrast models: based on the above models (e.g., LSTM, CNN, and Light-GBM), the training time complexities of the hybrid learning models are given in Tables 18 and 19, where B is the number of training dataset instances, D is the feature dimension, k is the kernel size, n is the number of branch points, and N is the number of convolutional layers. \(T_1\), \(T_2\), and \(T\) are the time complexities of the dimension reshaping between two single models. \(a\%\) is the selected top \(a \times 100\%\) of the data and \(b\%\) is the randomly selected \(b \times 100\%\) of the data in different data subsets. \(N_1\) is the number of cells in the LSTM layer, and \(N_2\) is the number of convolutional layers in the CRBs. \(depth\_max\) is the maximum depth of the decision trees. Among them, the CNN-LSTM and LSTM-CNN models cost more time than the other hybrid learning models because these two neural networks need to reshape and match the feature sizes by increasing the channels, whereas the other hybrid models based on the Light-GBM reduce a feature channel with decision tree algorithms to obtain a low training time complexity.
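Formulas (25) and (26) above can be checked numerically. The sketch below is a direct transcription of the two formulas (pure Python; the function names are ours), with formula (26) using the closed form of the geometric sum:

```python
def conv_receptive_field(kernels, strides):
    """Formula (25): r_c = r_{c-1} + (k_c - 1) * prod(s_1 .. s_{c-1}).

    kernels/strides list the kernel size and stride of each layer in order;
    the receptive field of the input itself is 1.
    """
    r, stride_prod = 1, 1
    for k, s in zip(kernels, strides):
        r += (k - 1) * stride_prod
        stride_prod *= s
    return r


def dilated_receptive_field(k, gamma, layers):
    """Formula (26): w_l = 1 + (k_l - 1) * (gamma^l - 1) / (gamma - 1)."""
    return 1 + (k - 1) * (gamma ** layers - 1) // (gamma - 1)
```

With the paper's kernel size k = 2 and dilation base γ = 2, four dilated residual layers reach a receptive field of 16 samples, whereas four plain stride-1 convolutions of the same kernel size reach only 5, which is the quantitative basis of the Table 17 comparison.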

Table 17 Receptive field size of the CNN, the TCN, and the STRB
Table 18 Comparison of the model’s time complexity
Table 19 Abbreviations and meanings

5 Conclusion

This paper proposes the CTCN-LightGBM model, which combines the CTCN and a parallel Light-GBM, to accurately predict real-time loading values for balanced industrial loading. The composited residual blocks in the CTCN effectively extract the collaborative features of multi-adjustment values. We then utilize the FR method to reconstruct the collaborative features extracted by the CTCN and enlarge the related properties for better multi-target prediction, and we adopt the reconstructed feature matrix as the input of the parallel Light-GBM to accurately predict multi-adjustment values. Experiments show that the CTCN-LightGBM model significantly outperforms the contrast models in predicting industrial loading parameters. However, some problems remain unsolved; for example, the computational complexity of the proposed method increases with the number of composited residual blocks. In the future, we will explore optimizing the structure of the composited residual blocks in the CTCN to reduce time consumption and apply the model to more related industrial fields.