The CTCN-LightGBM Joint Model for Industrial Balanced Loading Prediction

Balanced industrial loading relies on accurate multi-adjustment values, including the truck speed and chute flow. However, existing models are weak in real-time loading prediction because single-objective regression may ignore the correlation among multi-adjustment parameters. To solve this problem, we propose a joint model that fuses the composited-residual-block temporal convolutional network and the light gradient boosting machine (called CTCN-LightGBM). First, instance selection deviations and abnormal-supplement methods are used for data preprocessing and normalization. Second, we propose a side-road dimensionality reduction convolutional branch in the composited-residual-block temporal convolutional network to extract collaborative features effectively. Third, the feature re-enlargement method reconstructs the extracted features with the original features to improve extraction accuracy. Fourth, the reconstructed feature matrix is utilized as the input of the light gradient boosting machine to predict multi-adjustment values in parallel. Finally, we compare the CTCN-LightGBM with other related models, and the experimental results show that our model obtains superior performance for multi-adjustment value prediction.


Introduction
Industrial loading, which aims to achieve precise and quantitative loading of materials, is widely used in mining, transportation, etc. However, when target loading is achieved with the conventional manual-programmable logic controller system, the real-time loading parameters, including the truck speed and the chute flow (shown in Fig. 1), are usually predicted and adjusted from manual experience. Operators then stop the truck and replenish underloading values according to the actual-target errors. This process often leads to unbalanced loading problems, economic losses (e.g., about 10% of the cost of coal mining enterprises per year in China), and even railway accidents. Thus, removing the reliance on manual interference when predicting multi-adjustment values has been a challenge in the industrial loading field. How to precisely obtain multi-adjustment values for balanced loading based on historical experience is the critical issue explored in this paper.
Nowadays, the prediction of industrial time-series targets has been promoted by many innovative learning methods (e.g., deep learning models [1]). In particular, hybrid deep learning, which aims to integrate the advantages of individual learners, has become a significant focus for improving model generalization in industrial fields [2][3][4]. Ensemble-based learning models have been proposed to achieve single-target prediction [5][6][7]. For example, Li et al. [5] proposed a long short-term memory recurrent neural network to predict the short-term power load. Zhou et al. [6] and Ren et al. [7] provide industrial prediction methods based on the convolutional neural network (CNN) and the long short-term memory network (LSTM). However, these models have limitations in learning the hidden and temporal correlations of collaborative features; namely, they have weak extraction or forecasting ability in the industrial loading field due to these limitations.
Based on the above analysis, the temporal convolutional network (TCN), an expansive convolutional neural network with dilated causal convolution layers, has been proposed to achieve a wide receptive field [14]. The method integrates the advantages of parallel distributed extraction from the convolutional neural network and temporal regression from the recurrent neural network [15], so it is suitable for parallel and dynamic nonlinear feature extraction. However, due to the positive and negative multi-adjustment values of industrial loading, its application to collaborative feature prediction is relatively weak. In addition, gradient-boosting decision tree (GBDT) algorithms have become popular because of their distributed and fast processing capacity for massive data [16,17]. Among them, the gradient boosting machine (GBM) adopts the local low-gradient data to reduce the time and space overhead, which is an advantage for predicting single targets with positive and negative values but a shortcoming for multi-objective tasks.
Thus, it is not easy to accurately predict multi-adjustment values for industrial balanced loading.
To accurately predict multi-adjustment values for balanced loading, this paper proposes a joint learning model (CTCN-LightGBM) based on the composited-residual-block TCN and the Light-GBM. The novelty of the work is that the CTCN-LightGBM integrates a wide receptive field and dimensionality reduction convolution in the CTCN with negative-gradient ensemble learners in the parallel Light-GBM. The model can improve predictive accuracy through auxiliary branches and optimize the data-regression performance of the expansive GBDT. Also, we provide a feature re-enlargement (FR) method that reconstructs the collaborative feature matrix with the original features to improve the extraction ability of the CTCN. Experimental results show that the CTCN-LightGBM model achieves significant and reasonable improvement compared with other contrast models in the industrial loading field. The main contributions of the paper are as follows:
1. We extract collaborative features through a composited residual block in the TCN, replacing the 1 × 1 convolutional shortcut with a side-road dimensionality reduction convolutional branch. The branch can acquire auxiliary features to improve the generalization ability and preserve the sign characteristics of multi-adjustment values.
2. The feature re-enlargement (FR) method is proposed to improve the extraction accuracy of the CTCN. We process the original features with the extracted speed-flow element ratios and integrate them with the collaborative feature matrix extracted by the CTCN. The reconstructed feature matrix is then used as the input of the Light-GBM for predicting accurate multi-adjustment values (i.e., the truck speed and chute flow).
3. This paper is academic research grounded in actual industrial demands: real engineering scenarios require adjusting the loading parameters to achieve target loading, and only by accurately predicting the multi-adjustment values can industrial loading be planned in a balanced way.
The CTCN-LightGBM model effectively solves practical industrial demands and brings essential significance.
The remainder of the paper is organized as follows: Sect. 2 overviews the related work on hybrid learning models for industrial target prediction. Section 3 proposes the structure of the CTCN-LightGBM model. Section 4 presents the experimental settings, comparison results, and discussion.

The Industrial Hybrid Model via Neural Networks
The industrial hybrid model via neural networks has proven successful for parameter forecasting [6,7,18,19] and target detection [20][21][22] in related industrial fields. For example, Li et al. [18] propose a deep learning algorithm composed of long short-term memory and fully connected (FC) layers to predict photovoltaic power generation; because of the simple structure of the FC layer, however, the hidden distribution of features cannot be efficiently exploited for data prediction. Geng et al. [19] propose a novel gated-convolutional-neural-network-based transformer for dynamic soft-sensor modeling of industrial processes, which can adaptively filter the essential features. Further, Zhou et al. [6] provide a hybrid model to improve the load decomposition accuracy of electrical equipment. Ding et al. [7] propose a model based on convolutional neural networks and a gated recurrent unit to intelligently identify roughly stored express deliveries. Xia et al. [20] and Qiang [21] propose deep neural networks for industrial control, and Siegel [22] proposes an anomaly detection mechanism based on the convolutional neural network and the generative adversarial network for industrial equipment. However, these heterogeneous neural networks are weak at extracting and predicting multi-adjustment values in industrial loading.

The Optimized Gradient Method via the Decision Trees
The light gradient boosting decision tree and expansive models are adopted to achieve precise regression/classification [23,24]. Zhang et al. [23] propose a gradient-boosting decision-tree-based fault prediction tool for cyber-physical production systems, and the online test results prove that the model has high prediction accuracy. Yan and Wen [24] propose a light gradient boosting machine to detect power theft for power companies. However, the learning ability of these single decision-tree models is insufficient to process multi-distribution features. Nakamura et al. [25] use a hybrid model based on the bidirectional long short-term memory and the gradient-boosted decision tree for the binary classification of radiology reports. Lu et al. [26] integrate the long short-term memory with the gradient boosting machine to predict end-to-end inferences. Dan et al. [27] combine a convolutional neural network with the gradient-boosting decision tree for temperature prediction. Also, Ju et al. [28] propose a convolutional neural network and Light-GBM model to predict wind power. Due to the limitation of the receptive field, these models have a poor learning effect on temporal feature relationships. Wang et al. [29] propose a short-term load forecasting model based on the temporal convolutional network and the gradient boosting machine for industrial customers, and experiments show that the TCN-LightGBM model can predict electrical loads in multiple industrial scenarios. However, the existing hybrid models rarely address, and are unsuitable for, collaborative feature extraction in the industrial loading field. Thus, this paper explores the CTCN-LightGBM model to achieve accurate multi-adjustment value prediction.

Structure of the CTCN-LightGBM Model
The CTCN-LightGBM prediction model consists of three parts: the data preprocessing and normalization, the feature extraction based on the CTCN, and the Light-GBM prediction. The detailed process of the CTCN-LightGBM model is designed in Fig. 2.

The Data Preprocessing and Normalization
The dataset features consist of speed-related features (i.e., Feature_1), flow-related features (i.e., Feature_2), and labels. The raw dataset usually has some missing/abnormal instances because of manual experience inference and recording accuracy errors. We propose data processing methods to deal with this problem, as listed in Table 1. We adopt the unit-adjustment values (i.e., ΔV, ΔQ = 0.0001) to replace the zero values in actual instances, improving the data accuracy while conforming to industrial conditions. In addition, we set data selection deviations according to actual industrial requirements in Table 2, which can ensure the prediction effect and uniformly regulate the loading target standards. Each feature column is then normalized as described in Formula (1):

X′_i = 2 ⋅ (X_i − min(X_i)) / (max(X_i) − min(X_i)) − 1, (1)

where X_i is the ith column vector of the raw feature input X, X′_i is the normalized column vector of X_i, and max(⋅) and min(⋅) return the maximum and minimum elements of X_i.
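The preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming the unit-adjustment substitution replaces exact zeros and the normalization maps each column to [−1, 1] (the range stated in the evaluation section); the actual Table 1/Table 2 selection rules are not reproduced here.

```python
import numpy as np

def preprocess(X, unit_adjust=1e-4):
    """Replace exact zeros with the unit-adjustment value (0.0001) and
    min-max normalize each feature column to [-1, 1]."""
    X = np.asarray(X, dtype=float).copy()
    X[X == 0.0] = unit_adjust                 # unit-adjustment substitution
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return 2.0 * (X - col_min) / span - 1.0   # Formula (1)-style scaling

X = np.array([[0.0,  5.0],
              [2.0, 10.0],
              [4.0, 15.0]])
Xn = preprocess(X)
```

Constant columns are guarded with a unit span so the division never fails on degenerate features.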

The Feature Extraction Based on the CTCN
In this section, we propose the feature extraction module based on the CTCN, and the details are as follows.

Dilated causal convolution
The dilated causal convolution of the CTCN is proposed to solve the problem of limited receptive fields in the temporal domain convolution. The interval sampling can be achieved based on multiple dilated convolutional layers by changing the convolution kernel's size or the expansion factor's value.
For the one-dimensional features X′, the dilated causal convolution at time step T is defined as

F(X′_T) = Σ_{i=0}^{n−1} f(i) ⋅ X′_{T−d⋅i},

where n is the kernel size, d is the dilation factor, and T − d⋅i indexes the past direction; f(⋅) denotes the convolutional operation of the ith kernel. Further, the final output F(X′) of the transformation branch is described in Formula (4):

F(X′) = Ψ[X′],

where Ψ[⋅] is a series of transformation operations, including the dilated convolution, the weight normalization, the ReLU activation, and dropout layers.
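The sampling pattern of the dilated causal convolution can be illustrated with a small numpy sketch (a single filter, zero-padded on the past; weight normalization, activation, and dropout are omitted for clarity):

```python
import numpy as np

def dilated_causal_conv(x, kernel, d):
    """One dilated causal convolution pass:
    y[t] = sum_i kernel[i] * x[t - d*i], reading only past positions."""
    n, T = len(kernel), len(x)
    y = np.zeros(T)
    for t in range(T):
        for i in range(n):
            j = t - d * i          # causal: index never looks into the future
            if j >= 0:
                y[t] += kernel[i] * x[j]
    return y

x = np.arange(8, dtype=float)               # [0, 1, ..., 7]
y = dilated_causal_conv(x, kernel=[1.0, 1.0], d=2)
# y[t] = x[t] + x[t-2]  ->  [0, 1, 2, 4, 6, 8, 10, 12]
```

Stacking such layers with dilation factors [1, 2, 4, 8], as in the experiments, multiplies the receptive field without increasing the kernel size.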

Composited dimensionality reduction convolution
The simple shortcut (i.e., the 1 × 1 convolutional layer) in residual blocks may prevent the TCN model from generalizing well enough to collaborative features. However, the hidden bottleneck layer of existing autoencoders supports reconstructing the raw data by reducing feature dimensions; it preserves the necessary features to improve the accuracy of feature extraction or prediction. Inspired by this method, and to reduce time consumption, we provide a side-road dimensionality reduction convolutional branch to replace the 1 × 1 convolutional shortcut in the residual block. The branch can easily extract auxiliary features and preserve the labels' positive or negative characteristics. First, the one-dimensional features are processed by an initial 1 × 1 convolutional layer, reducing the number of parameters to improve computational efficiency. Second, some valuable features can be extracted through a one-dimensional convolutional layer (e.g., with kernel size 1 × k) that reduces the feature dimension by a b-ratio, as described in Formula (5). This convolutional layer records additional features of the multi-adjustment values while eliminating some low-relevance features. Finally, we perform a linear projection (i.e., 1 × 1 convolution) at the end of the branch to preserve the characteristics of the convolution branch. The batch normalization function improves the convergence speed during training, and the LeakyReLU function is adopted to handle negative values in the activation layer. Further, the final output F′(X′) of the side-road dimensionality reduction convolutional branch can be described in Formula (6), where N is the batch size, c_out is the number of convolution kernels (i.e., the output dimension), c_in is the input dimension, bias(⋅) is the bias vector (e.g., bias = 1), k is the filter number, and c is the input dimension of the next layer.
where Ψ′[⋅] is a series of transformation operations, including the dimensionality reduction convolution, the batch normalization, and the activation layer.
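The shape flow of the side-road branch can be sketched with random weights in numpy; this is a structural illustration only (actual filter counts, kernel sizes, and the b-ratio follow the experimental settings, and the weights here are untrained):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # a 1x1 convolution is a per-timestep linear projection
    return x @ w

def conv1d_same(x, w):
    # w: (k, c_in, c_out); 'same' zero padding along the time axis
    k, c_in, _ = w.shape
    T = x.shape[0]
    pad = k // 2
    xp = np.vstack([np.zeros((pad, c_in)), x, np.zeros((k - 1 - pad, c_in))])
    return np.stack([np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1]))
                     for t in range(T)])

def leaky_relu(x, alpha=0.3):
    # negative inputs are scaled, not clipped, preserving sign information
    return np.where(x > 0, x, alpha * x)

def side_branch(x, k=3, b=0.25):
    T, c = x.shape
    c_red = max(1, int(c * b))                              # b-ratio reduction
    h = conv1x1(x, rng.standard_normal((c, c)))             # initial 1x1 conv
    h = conv1d_same(h, rng.standard_normal((k, c, c_red)))  # 1xk reducing conv
    h = leaky_relu(h)                                       # LeakyReLU (0.3)
    return conv1x1(h, rng.standard_normal((c_red, c)))      # final 1x1 projection

x = rng.standard_normal((10, 16))
out = side_branch(x)      # shape preserved: (10, 16)
```

The branch output keeps the input's time length and channel count, so it can be summed with the transformation branch in the residual connection.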

Composited residual block
The conventional residual block [30] consists of a transformation branch and a shortcut, as shown in Fig. 3a, and the output X^(l) of the l-th residual block can be expressed as Formula (7). This paper proposes a composited residual block consisting of a dilated causal convolution branch and a composited dimensionality reduction convolution branch, as shown in Fig. 3b. The output X′^(l) of the l-th residual block can be expressed as Formula (8). When the residual connection operations are completed, we obtain a two-dimensional collaborative feature matrix as the output of the extraction module, as described in Formula (9).
where σ represents the activation operation, F(⋅) represents a series of convolutional exchange operations (e.g., dilated convolution, dropout, weight normalization), and H_map[⋅] is the feature map produced in residual blocks 1, 2, …, final.
Notably, some associated characteristics (e.g., the displacement L is a constant feature of 0.118 m, and both V and T, and Q and T, are associated with L) should play an important role in collaborative feature extraction. Namely, the collaborative extraction process of the CTCN module may ignore some inherent properties and over-emphasize the multi-adjustment relationship by changing the feature distribution. Thus, the feature re-enlargement method is proposed to reconstruct and enlarge the relational properties of collaborative features from the original relations. The FR method can improve the extraction ability of the CTCN and the accuracy of the parallel Light-GBM prediction module. It also helps keep the sign features consistent between the collaborative and original features. We process the original features according to the speed-flow element ratios of the two-dimensional collaborative feature matrix. Formulas (10) and (11) express the speed-related and flow-related features derived from the original features, and the final reconstructed feature matrix Z = [V′, Q′]∕2 is described in Formula (12).
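Since Formulas (10)–(12) are not reproduced here, the FR idea can only be sketched loosely: scale the original speed/flow features by element ratios taken from the collaborative matrix, then average the two re-enlarged halves into Z = [V′, Q′]∕2. The ratio definitions and column indices below are hypothetical placeholders, not the paper's exact formulas.

```python
import numpy as np

def feature_reenlargement(C, V, Q, v_idx=0, q_idx=1):
    """Loose FR sketch: re-enlarge original speed features V and flow
    features Q with speed-flow element ratios from the collaborative
    matrix C, then form Z = [V', Q'] / 2.
    v_idx / q_idx are assumed column positions of the speed/flow elements."""
    eps = 1e-8                                          # avoid division by zero
    ratio_v = C[:, v_idx] / (np.abs(C[:, q_idx]) + eps) # assumed speed-flow ratio
    ratio_q = C[:, q_idx] / (np.abs(C[:, v_idx]) + eps) # assumed flow-speed ratio
    V_re = V * ratio_v[:, None]                         # speed-related half (Formula (10)-style)
    Q_re = Q * ratio_q[:, None]                         # flow-related half (Formula (11)-style)
    return np.hstack([V_re, Q_re]) / 2.0                # Z = [V', Q'] / 2 (Formula (12))

C = np.array([[1.0, 2.0], [3.0, 4.0]])
V = np.ones((2, 3))
Q = np.ones((2, 2))
Z = feature_reenlargement(C, V, Q)    # reconstructed matrix, shape (2, 5)
```

The reconstructed matrix Z then serves as the input of the parallel Light-GBM prediction module.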

The Light-GBM Optimized Prediction
The gradient boosting machine is an upgraded gradient boosting framework based on decision trees, which is widely applied in classification and regression tasks [31,32]. First, the Light-GBM adopts a gradient-based one-side sampling method to exclude the lowest-gradient samples and calculate the information gain of the large-gradient samples. Second, we use a histogram algorithm to obtain the optimal splitting points, and the leaf-wise strategy reduces unnecessary splitting overhead for lower-gain leaf nodes. Third, we set the maximum depth of all decision trees to prevent overfitting. Notably, the Light-GBM adopts a gradient descent function (i.e., −g_t(x)) to optimize the new function increment, as described in Formula (13), so we can utilize the classical least-squares minimization task to simplify the objective function, as denoted in Formula (14).
(Fig. 3: The structure diagram of two residual blocks. (a) The conventional residual block; (b) the composited residual block (CRB) of the CTCN.)
where f is the functional model between the input features and response outputs, f̂_{t−1}(x) is the (t−1)th function estimation (also called boosts), h(x, θ) denotes a custom base-learner function, p denotes the boundary expansion, and (ρ, θ) denotes the optimization parameters (i.e., the step size and the functional-dependence parameters).
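The negative-gradient, least-squares idea behind Formulas (13) and (14) can be shown with a minimal boosting loop. This toy implementation with regression stumps is a sketch of the principle only, not the Light-GBM algorithm itself (no one-side sampling, histogram splitting, or leaf-wise growth); for squared loss, the negative gradient is simply the current residual.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on 1-D input x for residuals r."""
    best = (np.inf, x[0], r.mean(), r.mean())
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, s, left.mean(), right.mean())
    return best[1:]

def gbm_fit_predict(x, y, x_test, n_trees=50, lr=0.1):
    """Least-squares boosting: each stage fits a stump to the negative
    gradient of the squared loss (the current residuals, -g_t(x))."""
    pred = np.full_like(y, y.mean(), dtype=float)
    test_pred = np.full(len(x_test), y.mean())
    for _ in range(n_trees):
        g = y - pred                              # negative gradient for L2 loss
        s, lv, rv = fit_stump(x, g)
        pred += lr * np.where(x <= s, lv, rv)     # boosted function increment
        test_pred += lr * np.where(x_test <= s, lv, rv)
    return test_pred

x = np.linspace(-1, 1, 40)
y = np.sign(x) * 0.5          # targets with both positive and negative values
yhat = gbm_fit_predict(x, y, x)
```

Note that the boosted model fits positive and negative targets equally well, which is the property the paper relies on for signed multi-adjustment values.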
Finally, we suppose that the input sequence of the samples obtained via the gradient-based one-side sampling method consists of the actual values of the multi-adjustment instances. The final output functions of the parallel Light-GBM prediction module are shown in Formula (15), and the multiple prediction values of the multi-adjustment parameters are given in Formulas (16) and (17). Further, the algorithm of the CTCN-LightGBM is proposed as follows.
where L_0(z_g) is the value of the initial weak learner, F_m(⋅) denotes the output of the m-th decision tree, γ_m is the weight of each tree, M is the number of trees, ℜ_{m,j}, j = 1, 2, …, J denotes the leaf-node area of the m-th decision tree, and c_{m,j} is a leaf node; if z_g ∈ ℜ_{m,j}, the m-th tree outputs the corresponding leaf value c_{m,j}.
In Algorithm 1, steps 1–6 denote the initial dataset preparation, the feature-matrix extraction, and the feature reconstruction process. Steps 7–12 denote the prediction of multi-adjustment values based on the parallel Light-GBM module.

The Experimental Settings and Performance Metrics
This paper collects real loading datasets from different coal mines (i.e., Huaibei Mining Co., Ltd and Linhuan Mining Co., Ltd) in Anhui Province, China. We take the whole carriage as a single research object (comprising 117 loading-point instances) and collect 50 carriages' instances from each coal mine (each carriage is loaded on average four times a month). The historical loading data and corresponding multi-adjustment values from Apr 1st, 2021 to Nov 1st, 2021 are used to carry out the experiments. The dataset is split into the training set and the testing set in the proportion 8:2. The experimental environment is Python 3.9 with the Keras library, an NVIDIA RTX 3090 GPU, an AMD R7-5800X CPU, and 32 GB of memory. Further, the CTCN-LightGBM model and other contrast models are studied in this paper. These include two kinds of models: the classical learning models (i.e., the Light-GBDT [23], the Light-GBM [24], and the TCN) and the hybrid learning models (i.e., the LSTM-CNN, the CNN-LSTM [6], the LSTM-LightGBM [26], the CNN-LightGBM [28], the TCN-LightGBM [29], and the CTCN-LightGBDT).
Because the training data contain zero and extreme values, the mean absolute percentage error is not suitable as an evaluation criterion. Instead, we adopt the mean absolute error (MAE), the root mean square error (RMSE), and the determination coefficient (R²) as the evaluation metrics for model prediction.
where W denotes the number of testing instances; Z_w, Z̄_w, and Ẑ_w represent the actual, the average actual, and the predicted multi-adjustment values of the w-th instance, respectively.
Further, since the range of the actual normalization instances is [− 1, 1], the predicted and actual values may have different signs. We select the measurement coefficient (F1-Score) and the area under the curve (AUC ) value as evaluation metrics for model classification.
where precision is the precision value, and recall is the recall value.
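The evaluation metrics can be written out directly; the sign-based F1 below assumes "positive adjustment" (z > 0) as the positive class, which is our reading of the classification setup rather than a definition given in the text:

```python
import numpy as np

def mae(z, zhat):
    return float(np.mean(np.abs(z - zhat)))

def rmse(z, zhat):
    return float(np.sqrt(np.mean((z - zhat) ** 2)))

def r2(z, zhat):
    # 1 - SS_res / SS_tot, with the mean of the actual values as baseline
    return float(1 - np.sum((z - zhat) ** 2) / np.sum((z - z.mean()) ** 2))

def f1_sign(z, zhat):
    """F1 on the sign of the adjustment (positive class: z > 0)."""
    tp = np.sum((z > 0) & (zhat > 0))
    fp = np.sum((z <= 0) & (zhat > 0))
    fn = np.sum((z > 0) & (zhat <= 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

z    = np.array([0.4, -0.2, 0.1, -0.5])
zhat = np.array([0.5, -0.1, 0.2, -0.4])
```

On this toy example, MAE and RMSE are both 0.1 and every predicted sign matches, so the F1-Score is 1.0.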

Comparison Results for the Feature Dimensionality Reduction Ratio of the CRB
In this experiment, we compare the CRB with different dimensionality reduction ratios to explore the feature extraction ability of the CTCN-LightGBM model. The dimensionality reduction ratios are [0.5, 0.25, 0.125, 2], and the dataset collected at Huaibei Mining Co., Ltd from Apr 1st, 2021 to July 1st, 2021 is utilized to run the simulations. We randomly select the completed loading data as testing instances, and the data comprise the 117 loading-point instances. The loss function is MSE, the evaluation metric is RMSE, and the parameter settings of the contrast modules are as follows:
1. CTCN: The factors of the dilated causal convolution branch are [1, 2, 4, 8], the filters are 64/64/16/16, and the convolutional kernel size is 2. The LeakyReLU value is 0.3, and the batch normalization value is 0.9. Additionally, the chosen model optimizer is Adam, the gradient calculation function is MSE, the hidden layers are 32/16, the dropout value is 0.25, and the training epoch is 800.
2. Light-GBM: The number of trees is 500, the maximum depth is 6, the model learning rate is 0.01, the bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT.
Figures 4a and 5a show the training loss of the extraction module with different dimensionality reduction ratios for the multi-adjustment parameters, and Figs. 4b and 5b are the corresponding fitting curves. Table 3 shows the detailed results for the extraction module with different dimensionality reduction ratios. Among them, the CTCN-LightGBM model with a dimensionality reduction ratio b = 0.25 acquires the best extraction performance (e.g., the RMSE metrics are about 0.107 × 10⁻² m/s and 0.286 × 10⁻² t/s, where m/s denotes the speed unit and t/s denotes the flow unit).

Ablation Experiments for the Extraction Ability of the CTCN
To explore the performance of the composited dimensionality reduction convolution branch in the CTCN, we compare the CTCN+-LightGBM (i.e., the CTCN-LightGBM without the composited dimensionality reduction convolution branch but with a 1 × 1 shortcut), the CTCN++-LightGBM (i.e., the CTCN-LightGBM without the dimensionality reduction convolutional layer, as shown in Fig. 6a), and the CTCN+++-LightGBM (i.e., the CTCN-LightGBM without the one-dimensional convolutional layer, Fig. 6b) with the CTCN-LightGBM. The dataset collected at Huaibei Mining Co., Ltd from Apr 1st, 2021 to July 1st, 2021 is utilized to run the simulations. Further, the completed data (including 117 loading points) are adopted to intuitively evaluate the predictive ability of the contrast models. We test three randomly completed data of multi-adjustment values in the testing dataset to eliminate model contingency. For the CTCN, the factors of the dilated causal convolution branch are [1, 2, 4, 8] and the filters are 64/64/16/16. Tables 6 and 7 show the ablation experimental results for the prediction of flow-adjustment values. Among them, the modified CTCN modules (i.e., CTCN++ and CTCN+++) obtain better performance than the CTCN+-LightGBM for predicting the multi-adjustment values but are weaker than the CTCN-LightGBM model. Namely, the model generalization and sign-preserving ability can be improved (i.e., R²: 0.909/0.895 vs. 0.925/0.924) by adding a side-road dimensionality reduction convolutional branch in the CRBs. Further, without the one-dimensional convolutional layer, the performance of the CTCN+++-LightGBM is slightly poorer than that of the CTCN-LightGBM (i.e., R²: 0.918/0.915 vs. 0.925/0.924). This means that both the one-dimensional convolutional layer and the dimensionality reduction convolutional layer can emphasize related useful features.
However, the dimensionality reduction convolutional layer significantly improves the generalization and sign-retention ability; namely, it enhances the model performance by reducing the dimension filters. Further, comparing the time consumption of the CTCN++-LightGBM with the CTCN+-LightGBM (i.e., 0.228 s vs. 0.226 s), we find that the added 1 × 1 convolutional layer can reduce the parameter count with only a negligible increase in time consumption.

Comparison Results for the Extraction Ability of the FR method
This experiment compares the equal-parameter TCN-LightGBM with the dimensionality reduction layer (called TCN*-LightGBM) and the CTCN-LightGBM without the FR method (called CTCN*-LightGBM) against the CTCN-LightGBM to verify the extraction ability of the FR method in the feature extraction module. In terms of parameters, the conventional residual block requires (2k + 1) ⋅ c² parameters, whereas the CRB needs [(2k + 1) + (k + 1) × b] ⋅ c², where c denotes the feature dimension of the input and intermediate layers.
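The two parameter-count expressions above can be evaluated directly; the values below use kernel size k = 2 and c = 64, chosen to match the first experiment's settings (illustrative only):

```python
def rb_params(k, c):
    """Conventional residual block: (2k + 1) * c^2 parameters."""
    return (2 * k + 1) * c * c

def crb_params(k, c, b):
    """Composited residual block: adds the (k + 1) * b * c^2 side-branch term."""
    return int(((2 * k + 1) + (k + 1) * b) * c * c)

k, c = 2, 64
conventional = rb_params(k, c)            # 5 * 4096  = 20480
composited   = crb_params(k, c, b=0.25)   # 5.75 * 4096 = 23552
```

With b = 0.25 the CRB adds only about 15% more parameters than the conventional block, which is consistent with the modest extra time cost reported below.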
Further, Fig. 7 shows the structure of the composited residual block of the TCN*, and each dilated causal convolution layer's filter parameters are described in Table 8. The bagging fraction is 0.5, the feature fraction is 0.9, and the boosting method is GBDT. The CTCN with the CRBs requires more computational time than the other models (about 0.23 s), but this is acceptable considering the improvement in model accuracy. Figure 8a and b presents the actual prediction results of the truck speed-adjustment values, and Fig. 9a and b presents those of the chute flow-adjustment values. Among them, the CTCN-LightGBM obtains the best evaluation metrics (e.g., RMSE: 0.106/0.283, MAE: 0.071/0.203). Further, the TCN*-LightGBM is worse than the CTCN-LightGBM (e.g., R² is 0.908/0.912 vs. 0.921/0.929), and the TCN-LightGBM also performs better than the CTCN*-LightGBM (e.g., R² is 0.899/0.909 vs. 0.921/0.929). Namely, the FR method can improve the extraction ability of the CTCN module and help the Light-GBM module acquire superior performance for the prediction of sign characteristics (e.g., F1-Score: 0.945/0.961, AUC: 0.918/0.917) and values (e.g., R²: 0.921/0.929).

Prediction for Multi-Adjustment Values
This experiment compares the CTCN-LightGBM model with the listed models for multi-adjustment value prediction (i.e., truck speed and chute flow). We randomly select the completed temporal loading data (i.e., 117 loading points) as the testing instances to evaluate the predictive effects of each contrast model. Figure 10a and b shows the absolute prediction errors of the expansive Light-GBM models. Among them, the CTCN-LightGBM model obtains lower errors than the other models, and its fluctuation trend is relatively stable. The results indicate that the CTCN-LightGBM model, based on the FR method and the CRBs, can obtain high accuracy and is suitable for multi-adjustment value prediction in actual industrial loading.
(2) Experiment 2: In this experiment, the historical loading data collected at Linhuan Mining Co., Ltd from Aug 1st, 2021 to Nov 1st, 2021 are utilized to run the simulations. The experimental environments and models are listed in Sect. 4.1. For the TCN contrast model, the temporal convolutional network is built with the Keras library; the dilated convolution factors are [1, 2, 4, 8] and the filters are 256/64/32/16. Because of the larger number of parameters of each module in the CTCN-LightGBM model, the complexity and computational time are slightly increased (about 0.27 s), but this is acceptable for industrial loading applications. Figure 11a and b shows the expansive Light-GBM models' absolute prediction errors and related trend distributions. Among them, the CTCN-LightGBM model precisely matches the truck speed and chute flow values, and the fluctuation of the absolute prediction errors is relatively stable. Thus, it can be well applied to forecasting multi-adjustment values in industrial loading.

Discussion and Analysis
The paper compares the proposed CTCN-LightGBM with other models to illustrate its better prediction effects. There are some intuitive results and theoretical analyses as follows:
1. Among the classical learning models, the Light-GBDT and Light-GBM can better fit the actual prediction targets, whether positive or negative, and the computational times of the expansive Light-GBDT models are significantly less than that of the TCN. Theoretically, the reason is that ensemble learners (i.e., decision trees) with negative-gradient fitting can decrease the loss along the gradient direction. Further, the gradient-based one-side sampling method and the histogram algorithm of the Light-GBM reduce the data size, guarantee the basic features, and accelerate convergence. Also, the leaf-wise strategy with a depth limitation plays an essential role in avoiding overfitting.
2. Among the hybrid learning models, the extraction performances of the models via the LSTM are worse than those of the models via the expansive CNN. Because of the long time span and nonlinear feature distribution, the LSTM is not suitable for extracting hidden collaborative relationships. Further, the dilated convolution and residual blocks of the TCN obtain a wider receptive field than the CNN, as shown in Table 17, while the limitations of the CNN lead to poor capture of temporal information. Formulas (25) and (26) give the receptive field sizes of convolutional layers and dilated residual blocks:

r_c = r_{c−1} + (k_c − 1) ⋅ ∏ s_l, (25)
r_l = r_{l−1} + 2 (k_l − 1) ⋅ b^{l−1}, (26)

where r_c is the receptive field size of the c-th convolutional layer, k_c is the kernel size of the c-th layer or the pooling-layer size, and ∏ s_l is the product of the convolutional strides of the previous (c − 1) layers. Also, r_l denotes the receptive field size of the l-th dilated residual layer, k_l denotes the kernel size of the l-th layer, and b denotes the dilated factor (i.e., b = 2).
3. The CTCN-LightGBM model, which integrates the superiority of the CTCN and the Light-GBM, achieves the best forecasting effect among all models. The receptive fields of the proposed CTCN-LightGBM model are significantly more expansive than those of the TCN-LightGBM, which improves the feature extraction ability. Namely, the CRB adopts a side-road dimensionality reduction convolutional branch to replace the 1 × 1 convolutional shortcut in the conventional residual block. The CTCN-LightGBM model requires more time due to the auxiliary branch parameters, but the predictive performance in industrial loading should take priority. In addition, the FR method reduces the abnormal loss and improves the Light-GBM module's prediction effect by reconstructing and enlarging the hidden relationships of the collaborative feature matrix.
4. The training time complexity of all contrast models.
Based on the above models (e.g., LSTM, CNN, and Light-GBM), the training time complexity of hybrid learning models can be calculated in Tables 18 and 19.
where B is the number of training dataset instances, D is the feature dimension, k is the kernel size, n is the number of branch points, and N is the number of convolutional layers. T₁, T₂, and T are the time complexities of the dimension reshaping between two single models. a% is the selected top a × 100% data, and b% is the randomly selected b × 100% data in different data subsets. N₁ is the number of cells in the LSTM layer, N₂ is the number of convolutional layers in the CRBs, and depth_max is the maximum depth of the decision trees. Among them, the CNN-LSTM and LSTM-CNN models cost more time than the other hybrid learning models because these two neural networks need to reshape and match the feature sizes by increasing the channels, whereas the other hybrid models based on the Light-GBM reduce a feature channel with decision tree algorithms and thus obtain a low training time complexity.
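The receptive-field growth discussed in point 2 can be checked numerically. The helper below assumes the standard TCN layout of two dilated causal convolutions per residual block with dilations bˡ (b = 2), which matches the experimental dilation factors [1, 2, 4, 8]; it is a sketch of that common formula, not necessarily the paper's exact Formula (26).

```python
def tcn_receptive_field(kernel_size, n_blocks, base=2):
    """Receptive field of stacked dilated residual blocks, assuming two
    dilated causal convolutions per block and dilations base**l."""
    r = 1
    for l in range(n_blocks):
        r += 2 * (kernel_size - 1) * base ** l   # each block adds 2(k-1)*b^l
    return r

# dilation factors [1, 2, 4, 8] as in the experiments, kernel size 2
rf = tcn_receptive_field(kernel_size=2, n_blocks=4)   # -> 31 time steps
```

With only four blocks and kernel size 2, the receptive field already spans 31 time steps, whereas a plain CNN stack of the same depth would cover far fewer, which is the advantage point 2 argues for.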

Conclusion
The paper proposes a CTCN-LightGBM model via the CTCN and the parallel Light-GBM to accurately predict real-time loading values for balanced industrial loading. The composited residual blocks in the CTCN are used to extract collaborative features of multi-adjustment values effectively. Also, we utilize the FR method to reconstruct the collaborative features extracted by the CTCN and enlarge related properties for better multi-target prediction. In addition, we adopt the reconstructed feature matrix as the input of the parallel Light-GBM to accurately predict multi-adjustment values. Experiments show that our CTCN-LightGBM model significantly outperforms other contrast models in predicting industrial loading parameters. However, there are still some problems that have not been solved. For example, the proposed method has limitations in that the computational complexity will increase with the number of composited residual blocks.
In the future, we will explore optimizing the structure of composited residual blocks in the CTCN to reduce time consumption and apply it to more related industrial fields.