1 Introduction

Industrial loading aims to load materials in accurately measured quantities and is widely applied in agriculture, mining, and other fields. The conventional loading process is shown in Fig. 1 and consists of a rough loading stage and a precise loading stage. Existing systems for this task mainly rely on conventional, manually operated programmable logic controllers [1, 2]. In the rough loading stage, the loading capacity is judged from the accumulation height of the material. Once both the front and rear wheels of the truck are on the track scale, the precise loading stage uses the indicated weight of the track scale to complete the loading. This process has to stop midway and reload repeatedly to reach the target quantity. Moreover, inaccurate and dangerous unbalanced loading often occurs in practice because the multiple adjustment values are predicted from fuzzy, manual experience. How to obtain accurate adjustment parameter values in actual applications is therefore the key issue explored in this paper.

Fig. 1 The dynamic loading process of the conventional method

Nowadays, collaborative control based on multi-agent systems (MAS) has been studied in many fields, and some authors integrate multi-agent systems with machine learning models [3]. However, the existing prediction models mainly address the communication consensus problem [4,5,6] and are not well suited to advanced multi-parameter prediction. In short, the challenge is twofold: which collaborative control multi-parameter prediction model should be provided in a MAS, and how the relevant loading parameter standards can be made to fit an accurate prediction model.

In recent years, hybrid machine learning models have been shown to dynamically simulate artificial experience and achieve accurate target prediction [7, 8]. Deep learning, particularly the convolutional neural network, provides a new receptive field for feature extraction and learning through the parallel convolution of multi-layer filters [9,10,11,12]. Further, the temporal convolutional network (TCN), an alternative model for sequence modeling, is widely used in fields such as pattern recognition [13] and signal prediction [14]. The TCN combines the feature convolution of convolutional neural networks [9] with the time-series information mining capability of recurrent neural networks [15]. It supports parallel and distributed computing over massive amounts of highly nonlinear dynamic process data, which makes it popular for data feature extraction. In addition, the gradient boosting decision tree (GBDT), often adopted as the prediction layer in hybrid learning models, has received much attention. The algorithm is an optimized form of gradient boosting with high accuracy, fast convergence, and easy cache optimization. Moreover, given a time-series threshold, the GBDT can build a nonlinear relationship model for multi-output prediction [16], which makes it suitable for predicting the multi-parameter adjustment values in an industrial loading process.

To achieve collaborative control multi-parameter prediction, this paper proposes a parallel TCN-LightGBDT model applied in a collaborative control parameter prediction MAS (MACP), shown in Fig. 2. The novelty of this work is that the proposed TCN-LightGBDT model integrates the wide receptive field and cross-layer information transmission of the TCN with the negative-gradient learner of the GBDT. We also propose a new theoretical parameter supplementing method and parameter selected deviations for dataset construction. The experimental results show that the proposed model achieves a significant and reasonable improvement over the baseline models. The main contributions of this paper are as follows:

(1) The parameter-selected deviations are formulated to solve the low-precision prediction problem of the multi-parameter in irregular loading. We propose a theoretical parameter supplementing method to complete the deviated data in the processed dataset and improve the extraction capability for the features' fluctuation trends.

(2) We adopt the TCN to extract the deep time-domain features in parallel. With the help of the feature crossover (FC) method, the two-dimensional feature matrix of the parameters is reconstructed and used as a key input of the Light-GBDT model. Further, the Light-GBDT model [17, 18] is applied to predict the multi-parameter adjustment values.

Fig. 2 The structure of the MACP system (the system transmits the parameter data and adjusts agents according to the predicted adjustment values in industrial loading; the agents in the MACP system comprise two parts: the truck control agent and the material control agent)

The rest of this paper is organized as follows. Section 2 reviews related work on multi-agent systems and deep learning models for dynamic target prediction. Section 3 presents the principle of the TCN-LightGBDT and the relevant data processing methods. Section 4 gives the experimental results and the theoretical analysis. Finally, the conclusion and future work are given in Section 5.

2 Related work

We review the related research work in three main areas: (1) MAS control systems in related industrial fields; (2) target predictive models using neural networks; (3) optimization methods using expansive decision tree algorithms.

2.1 The MAS systems in related industrial fields

The existing MASs mainly focus on the effective state consensus of agents [19, 20], low multi-layer continuous communication costs [21, 22], and adaptive collaborative control methods [12, 23, 24]. For example, Z. Xu, et al. [19] propose an edge event-triggering technique to eliminate Zeno behavior and reduce the burden on the event detector. Y. Han, et al. [20] design an encoding-decoding impulsive protocol to handle energy constraints in MAS. L. Lindemann, et al. [21] provide a hybrid feedback control strategy based on the time-varying control barrier function. F. Lian, et al. [22] provide sparsity-constrained distributed social optimization and non-cooperative game algorithms to save the cost of the underlying communication network. Y. Qian, et al. [23] adopt a distributed event-triggered adaptive output feedback control strategy to solve the control problem of linear multi-agent systems. S. Luo, et al. [12] propose a distributed event-triggered adaptive feedback control strategy to handle the consensus problem under external disturbances in MAS. H. Tan, et al. [24] solve the coordination of cloud-based model-free multi-agent systems with communication constraints by a distributed predictive control method.

2.2 The target predictive model using neural networks

The target predictive model based on neural networks has proved successful in target parameter tracking [7, 8, 25, 26] and nonlinear mechanical collaborative system control [27,28,29,30]. A. Agga, et al. [7] and T. Bao, et al. [8] present convolutional neural network and long short-term memory models to predict time-series data. A visual object tracking collaborative architecture based on the convolutional neural network is provided by W. Tian, et al. [25]. Additionally, J. Song, et al. [26] propose a heat load prediction model based on a temporal convolutional neural network. W. He, et al. [27] propose a disturbance observer-based radial basis function neural network control scheme, and Z. Wang, et al. [28] propose a similar radial basis function neural network control scheme based on a disturbance observer. S. Gehrmann, et al. [29] provide a visual-interface framework for collaborative semantic inference in decision processes. H. Wang, et al. [30] present an intelligent coordinated control system for the dynamic monitoring of the heavy scraper conveyor.

2.3 The optimization methods using expansive decision tree algorithm

The decision tree algorithm and its extensions belong to the machine learning methods widely applied in data classification, regression, and prediction. T. Wang, et al. [31] and L. Wang, et al. [32] integrate the random forest to solve prediction or classification problems accurately. R. Sun, et al. [33] propose a GBDT-based method to predict pseudo-range errors by considering the relevant signal strength and satellite elevation angle. D. Thai, et al. [34] propose an approach based on a gradient boosting machine to predict the local damage data of reinforced concrete panels under impact loading. L. Lu, et al. [35] propose an LSTM-Light Gradient Boosting Machine model that predicts latency quickly and accurately based on the collected dataset. Y. Dan, et al. [36] combine a deep CNN model with the GBDT for the accurate prediction of a superconductor's critical temperature. H. Kong, et al. [37] propose a risk prediction model based on the combination of logistic regression and the GBDT. J. Bi, et al. [38] propose a new hybrid prediction method, which combines the capabilities of the temporal convolutional neural network and the LSTM to predict network traffic.

To the best of our knowledge, multi-parameter parallel prediction has been less studied in industrial loading fields. Thus, this paper explores a hybrid model to achieve accurate multi-parameter prediction. The detailed structure of the proposed TCN-LightGBDT model is introduced in Section 3.

3 The structure of the TCN-LightGBDT prediction model

The TCN-LightGBDT prediction model consists of two parts: the multi-parameter feature extraction based on the TCN and the Light-GBDT optimized prediction. The framework of the proposed model is shown in Fig. 3.

Fig. 3 The framework of the TCN-LightGBDT model

3.1 Multi-parameter feature extraction based on the TCN

The feature extraction model based on the TCN includes exception data processing, theoretical parameter preparation, and multi-parameter matrix extraction.

(1) Exception data processing

Because of accuracy errors and manual operation interference, the raw dataset usually contains much low-quality data (i.e., missing and over-precision data). The processing methods (as well as their acquisition accuracy) used to deal with this problem are described in Table 1. The labels in Table 1 comprise three parts: the speed adjustment value, the flow adjustment value, and the inclination adjustment value.

(2) Theoretical parameter specification

In this section, we propose a method to calculate the theoretical parameter values; the details are as follows (a numerical sketch is given after Fig. 4):

a) Let LT, LB, MT, and m0 represent the truck length, the truck wheelbase, the standard load, and the empty truck weight, respectively. n denotes the number of loading areas of the truck, and r denotes the number of loading areas within the horizontal distance from the truck rear baffle to the front wheel. Additionally, V = {v1, v2,…, vr, …, vn} denotes the truck target speed in each virtual loading area. Q = {q1, q2,…, qr, …, qn} is the belt conveyor flow in each loading area. C = {c1, c2,…, cr, …, cn}, c ∈ (0, 90], is the chute inclination angle in each virtual loading area. QF = {qF1, qF2,…, qFr, …, qFn} is the material flow at the outlet-chute of each virtual loading area. The i-th loading displacement and loading capacity are denoted by xi and Δmi, respectively. The horizontal distance between the center of gravity of the material and the front wheel is denoted by Li. In addition, FNR and FNF are the pressures exerted by the rear and front wheels on the track scale. The material loading schematic diagram is shown in Fig. 4.

b) Suppose the truck's rear wheels are passing the track scale (i < r); we select a parameter combination (i.e., the truck speed and the belt conveyor flow) to adjust the loading capacity of each area. The first loading area is loaded under the truck's standard initial speed (v1) and the standard belt conveyor flow (q1). If the loaded material shape in each loading area is approximately the same, we can obtain the horizontal distance shown in Formula (1).

      $$L_{i} = \frac{\lambda }{2} \left[ {(1 - \frac{i}{2r}) \bullet (L_{B} + L_{T} )} \right],i = 1, 2,..., r$$
      (1)

where λ stands for the coefficient of the horizontal gravity center and mi represents the total material amount after the loading of the i-th virtual interval area.

c) If the target time consumption of the i-th area is ti, the actual loading capacity (Δmi) under the target speed (vi) and the belt conveyor flow (qi) in the i-th loading process is described in Formula (2). The formulas for vi and the flow at the outlet-chute (qFi) are given in Formula (3).

      $${\frac{1}{4}} {m_{0}} g \bullet {(L_{T} + L_{B} )} + \triangle {m_{i}} \bullet {g} \bullet {L_{i}} + \sum\limits_{k = 1}^{i-1} {( {m_{k}} - {m_{k - 1}} ) {g} \bullet {L_{k}} } = \sum\limits_{k = 1}^{i} {({F_{NR}^{k}} - {F_{NR}^{k - 1}} ) \bullet {L_{B}} }$$
      (2)
      $$v_{i} = \frac{{x_{i} }}{{t_{i} }},q_{Fi} = q_{i} { = }\frac{{\triangle m_{i} }}{{t_{i} }}$$
      (3)
d) When the truck's front wheels pass the track scale (i > r), the truck is entirely above the scale. If the belt conveyor flow keeps its maximum value Qmax, the truck speed (vi) and the chute target inclination (ci) can be calculated by Formulas (4) and (5).

      $$v_{i} = \frac{{L{}_{i} - L_{i - 1} }}{{t_{i} }},q_{Fi} = \frac{{(F_{NR}^{i} + F_{NF}^{i} ) - (F_{NR}^{i - 1} + F_{NF}^{i - 1} )}}{{t_{i} }}$$
      (4)
      $$c_{i} = \sigma \frac{{(q_{Fi} )}}{{Q_{max} }},\sigma = 90$$
      (5)

where \(F_{NR}^{i}\) and \(F_{NF}^{i}\) are the pressure values of the rear and front wheels in the i-th loading area, respectively.

e) The theoretical loading capacity of each loading area is \(\overline{m}_{i} = M_{T} /n\), and the material loading error of the i-th actual loading is denoted as \(m_{error}^{i} = \Delta m_{i} - (\overline{m}_{i} - m_{error}^{i - 1} )\). The loading height error is defined in Formula (6), the target loading capacity (\(m_{target}^{i}\)) of the (i + 1)-th loading area is calculated in Formula (7), and the i-th parameter adjustment values are calculated in Formula (8).

      $$H_{error} = H_{i} - H_{target} ,(i = 1,2,...,n)$$
      (6)
      $$m_{target}^{i} { = }\overline{m}_{i - 1} - m_{error}^{i}$$
      (7)
      $$\left\{ {\begin{array}{*{20}c} {\triangle v_{i} = v_{i} - v_{i - 1} } \\ {\triangle q_{i} = q_{i} - q_{i - 1} } \\ {\triangle c_{i} = c_{i} - c_{i - 1} } \\ \end{array} } \right.$$
      (8)

where Hi is the actual loading height of each loading point, Htarget is the target loading height, and Δvi, Δqi, and Δci respectively represent the i-th speed, belt conveyor flow, and chute inclination adjustment values.

(3) Multi-parameter matrix extraction based on the TCN

We denote the pre-input of the parallel TCN neural network as \(\tilde{X} = [V,Q,C,D,T,M_{T} ,M_{A} ,H]\). The max–min range normalization method is then described as follows.

    $$MaxRange = {|}\tilde{X}_{\max } - \tilde{X}_{\min } {|}$$
    (9)
    $$X_{ti} = (\tilde{X}_{ti} - \tilde{X}_{\min } )/MaxRange,X_{ti} \in X$$
    (10)

where \(X = [V^{\prime},Q^{\prime},C^{\prime},D^{\prime},T^{\prime},M^{\prime}_{A} ,H^{\prime}]\) is the standard dataset of the input layer and Xti is an element of the standard dataset. \(\tilde{X}_{ti}\) is an element of the dataset \(\tilde{X}\), and \(\tilde{X}_{\min }\) and \(\tilde{X}_{\max }\) are the minimum and maximum values in the dataset \(\tilde{X}\).
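As a minimal NumPy sketch of Formulas (9)–(10), the snippet below normalizes each raw parameter column independently (the function name and the random placeholder data are ours, not part of the paper's implementation):

```python
import numpy as np

def max_min_normalize(col):
    """Formulas (9)-(10): max-min range normalization of one raw column."""
    max_range = np.abs(col.max() - col.min())       # Formula (9)
    return (col - col.min()) / max_range            # Formula (10)

# Hypothetical pre-input [V, Q, C, D, T, M_T, M_A, H]: 100 samples, 8 columns.
raw = np.random.rand(100, 8)
X = np.apply_along_axis(max_min_normalize, 0, raw)  # standard dataset X
```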

Table 1 The low-quality data processing methods
Fig. 4 The material loading schematic diagram
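The numerical sketch below illustrates Formulas (3)–(5) and (8); the argument names are ours, and in the real system these quantities would come from the track scale and conveyor sensors:

```python
def theoretical_parameters(i, r, t_i, x_i=None, dm_i=None,
                           L=None, F_NR=None, F_NF=None, Q_max=None):
    """Theoretical speed/flow/inclination for the i-th loading area.

    While the rear wheels are on the track scale (i < r) Formula (3)
    applies; once the truck is entirely above the scale (i > r),
    Formulas (4)-(5) apply.
    """
    if i < r:
        v_i = x_i / t_i                        # Formula (3)
        q_Fi = dm_i / t_i
        c_i = None                             # chute angle not yet adjusted
    else:
        v_i = (L[i] - L[i - 1]) / t_i          # Formula (4)
        q_Fi = ((F_NR[i] + F_NF[i])
                - (F_NR[i - 1] + F_NF[i - 1])) / t_i
        c_i = 90.0 * q_Fi / Q_max              # Formula (5), sigma = 90
    return v_i, q_Fi, c_i

def adjustments(v, q, c, i):
    """Formula (8): the adjustment values are first differences."""
    return v[i] - v[i - 1], q[i] - q[i - 1], c[i] - c[i - 1]
```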

The dilated causal convolution of the TCN can perform convolutional expansion on the input and solve the problem of limited receptive fields, as shown in Fig. 5. For the one-dimensional features \(X = (x^{\prime}_{0} ,x^{\prime}_{1} ,x^{\prime}_{2} ,...,x^{\prime}_{t} ,...,x^{\prime}_{T} )\) and the filters Df = {f1, f2, …, fD}, the dilated convolution operation F(•) on each element B is defined in Formula (11).

$$F(B) = (X \ast_{d} f)(B) = \sum\limits_{i = 0}^{n - 1} {f(i) \cdot x^{\prime}_{B - d \cdot i} }$$
(11)
$$\omega = 1{ + }\sum\limits_{i = 0}^{m - 1} {(k - 1) \ast d^{i} } = 1 + (k - 1) \bullet \frac{{d^{m} - 1}}{d - 1}$$
(12)

where n denotes the filter size, d represents the dilation factor, B - d·i points in the direction of the past, ω indicates the width of the receptive field, k is the kernel size, and m is the number of network layers.
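To make Formulas (11)–(12) concrete, the sketch below computes the receptive field width and builds one dilated causal convolution with Keras ("causal" padding restricts the convolution to past elements); the layer settings are illustrative only:

```python
from tensorflow.keras import layers

def receptive_field(k, d, m):
    """Formula (12): receptive field width for kernel size k, dilation
    base d, and m stacked layers with dilations d^0, d^1, ..., d^(m-1)."""
    if d == 1:
        return 1 + (k - 1) * m
    return 1 + (k - 1) * (d ** m - 1) // (d - 1)

print(receptive_field(k=3, d=2, m=4))   # -> 31, the value quoted in Sect. 4.4
print(receptive_field(k=2, d=2, m=4))   # -> 16, the value in parentheses there

# One dilated causal convolution layer of the stack shown in Fig. 5.
conv = layers.Conv1D(filters=32, kernel_size=3, dilation_rate=4,
                     padding="causal", activation="relu")
```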

Fig. 5 The dilated causal convolution stack diagram

In addition, an increasing number of hidden layers affects the stability and complexity of the deep network. We use the multi-residual block connection [39] with different dilation factors to solve this problem. The detailed structure of the multi-residual blocks is shown in Fig. 6, and the output Xm is given in Formula (13). When the residual connection operations are completed, we obtain a two-dimensional matrix as the convolutional feature output (\(\hat{Y}_{v} ,\hat{Y}_{q} /\hat{Y}_{c}\)) in Formula (14). The related multiplication factors are given in Formula (15).

$$X^{m} = \psi_{{{\text{Re}} lu}} (F(X^{m - 1} ) + X^{m - 1} )$$
(13)
$$\begin{array}{l} \hat{Y} = (\hat{y}_{0,T} ,\hat{y}_{1,T} ,\hat{y}_{2,T} , \ldots ,\hat{y}_{M,T} ) \\ \quad = \hat{Y}_{v} + \hat{Y}_{q/c} \\ \quad = \delta_{1} \cdot (\hat{y}_{0,T}^{v} ,\hat{y}_{1,T}^{v} ,\hat{y}_{2,T}^{v} , \ldots ,\hat{y}_{M,T}^{v} ) + \delta_{2} \cdot (\hat{y}_{0,T}^{q/c} ,\hat{y}_{1,T}^{q/c} ,\hat{y}_{2,T}^{q/c} , \ldots ,\hat{y}_{M,T}^{q/c} ) \end{array}$$
(14)
$$\delta_{1} = \left[ {\begin{array}{*{20}c} 1 & 1 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \\ \end{array} } \right]^{T} ,\delta_{2} = \left[ {\begin{array}{*{20}c} 0 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \\ \end{array} } \right]^{T}$$
(15)

where ψRelu(⋅) is an activation operation, Xm−1 is the (m − 1)-th input of the residual block connection, R is the number of filters, and both δ1 and δ2 represent matrix multiplication factors.
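A minimal Keras sketch of the residual block of Formula (13) and Fig. 6 follows; the weight normalization and dropout used in [39] are omitted, and the 1×1 convolution on the skip path is our assumption for matching channel counts:

```python
from tensorflow.keras import layers

def residual_block(x, filters, kernel_size, dilation_rate):
    """Formula (13): X^m = ReLU(F(X^{m-1}) + X^{m-1})."""
    y = layers.Conv1D(filters, kernel_size, dilation_rate=dilation_rate,
                      padding="causal", activation="relu")(x)
    y = layers.Conv1D(filters, kernel_size, dilation_rate=dilation_rate,
                      padding="causal")(y)
    if x.shape[-1] != filters:             # match the channel dimension
        x = layers.Conv1D(filters, 1)(x)   # 1x1 convolution on the skip path
    return layers.Activation("relu")(layers.Add()([x, y]))
```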

Fig. 6 The detailed structure of the multi-residual blocks

Notably, the FC method is adopted to further explore and synthesize the extracted features of the convolutional output matrix \(\hat{Y}\). The FC method averages the hidden relationships and quantifies the characteristics among different parameters, as shown in Fig. 7. First, the FC method swaps the elements with the same subscript between the column vectors of the two-dimensional matrix. Second, we calculate the average value of each new column vector as the relative element and reshape the result into a restructured multi-parameter extraction matrix (\(\hat{Y}^{\prime}\)). The process is described in Formula (16). Finally, the output matrix O = (O0,T, O1,T, …, OR,T) shown in Formula (17) is used as the input of the Light-GBDT model to predict the suitable parameter adjustment values.

$$\hat{y}_{i,T}^{v} \Leftrightarrow \hat{y}_{i,T}^{q/c} \Rightarrow \left\{ {\begin{array}{*{20}c} {\hat{y}_{i,T}^{{v^{\prime}}} = average(\hat{Y}_{v} (\hat{y}_{i,T}^{q/c} ))} \\ {\hat{y}_{i,T}^{{q^{\prime}/c^{\prime}}} = average(\hat{Y}_{q/c} (\hat{y}_{i,T}^{v} ))} \\ \end{array} } \right.,i \in [1,M]$$
(16)
$$O = (\hat{Y}^{\prime}_{v,q/c} ) = \left[ {\begin{array}{*{20}c} {\hat{y}_{0,T}^{{v^{\prime}}} } & {\hat{y}_{1,T}^{{v^{\prime}}} } & \cdots & {\hat{y}_{M,T}^{{v^{\prime}}} } \\ {\hat{y}_{0,T}^{{q^{\prime}/c^{\prime}}} } & {\hat{y}_{1,T}^{{q^{\prime}/c^{\prime}}} } & \cdots & {\hat{y}_{M,T}^{{q^{\prime}/c^{\prime}}} } \\ \end{array} } \right]$$
(17)
Fig. 7 The FC method schematic diagram

where \(\hat{y}_{i,T}^{{v^{\prime}}}\) is a speed element of the output matrix O, \(\hat{y}_{i,T}^{{q^{\prime}/c^{\prime}}}\) is a flow or inclination element of the output matrix O, and average(•) denotes the average value function.
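Formula (16) is stated compactly, so the sketch below is only one possible reading of the FC method (an element-wise swap of the two parameter rows followed by column averaging); the exact averaging used in the paper may differ:

```python
import numpy as np

def feature_crossover(y_v, y_qc):
    """A hypothetical reading of Formulas (16)-(17).

    y_v, y_qc: 1-D convolutional feature vectors of the speed and the
    flow/inclination parameter, respectively, with equal length M+1.
    """
    crossed = np.stack([y_qc, y_v])                # swap same-subscript elements
    col_avg = crossed.mean(axis=0, keepdims=True)  # average each new column
    # Pull each row toward the cross-parameter column mean and return the
    # restructured 2 x (M+1) matrix O.
    return (crossed + col_avg) / 2.0
```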

3.2 The Light-GBDT optimized prediction

The GBDT is a gradient boosting framework based on CART regression trees, which performs feature regression by selecting the best split points. Normally, the final output of the GBDT is the sum of the outputs of all regression trees. The detailed structure of the GBDT is shown in Fig. 8. This paper utilizes the Light-GBDT to deal effectively with nonlinear and low-dimensional features and to regressively predict the multi-parameter adjustment values.

Fig. 8 The structure of the Light-GBDT model

The multi-parameter extraction matrix and the basic historical features of industrial loading are combined into the input dataset (\(Input_{gbdt}\)), as described in Formula (18).

$$Input_{gbdt} = X_{g} = [T^{\prime},V^{\prime},Q^{\prime},C^{\prime},M^{\prime}_{A} ,O]$$
(18)

Suppose the sample input sequence is [Xg, Og] = [(xg1, og1), (xg2, og2), …, (xgN, ogN)], where N is the number of collected samples and ogi (i = 1, 2, …, N) denotes the actual value of the adjustment elements in the data samples. The initial weak learner f0(xg) in Formula (19) minimizes the initial loss function L(ogi, c). We choose the information gain as the index to evaluate the candidate split points over all feature values. In addition, the gradient descent method is adopted for approximate calculation because the greedy algorithm cannot select the optimal basis function accurately. For the training sample i of the m-th iteration, the negative gradient γm,i is calculated by Formula (20), and the gain after splitting each leaf node is described in Formula (21).

$$f_{0} (x_{g} ) = \mathop {\arg \min }\limits_{c} \sum\limits_{i = 1}^{N} {L(o_{gi} ,c)}$$
(19)
$$\gamma_{m,i} = - \left[ {\frac{{\partial L(o_{gi} ,f(x_{gi} ))}}{{\partial f(x_{gi} )}}} \right]_{{f(x) = f_{m - 1} (x)}}$$
(20)
$$Gain = \frac{1}{2}[(\frac{{G_{L}^{2} }}{{H_{L} + \lambda }}) + (\frac{{G_{R}^{2} }}{{H_{R} + \lambda }}) - (\frac{{(G_{L} + G_{R} )^{2} }}{{H_{L} + H_{R} + \lambda }})] - \gamma$$
(21)

When we adopt the squared loss function, the loss expression L(ogi, f(x)) is (ogi − f(x))²/2; if the absolute loss function is adopted, the loss expression L(ogi, f(x)) is |ogi − f(x)|, where m = (1, 2, …) denotes the number of iterations. GL,R = ∑i∈|leaf|j qi, where qi denotes the first derivative of the loss function for the i-th sample of the j-th leaf node, and HL,R = ∑i∈|leaf|j hi denotes the sum of the second derivatives hi of the loss function. γ represents the penalty for the increased complexity of the trees.
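A direct transcription of the split gain of Formula (21) in plain Python (the argument names are ours):

```python
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Formula (21): gain of splitting a node into left/right leaves.

    G_*: sums of first derivatives in each candidate leaf; H_*: sums of
    second derivatives; lam is the regularization term and gamma the
    penalty for the increased tree complexity.
    """
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma
```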

Furthermore, by fitting the residual value with a regression tree, the leaf node areas of the m-th decision tree can be represented as \(\Re_{m,j} ,j = 1,2,...,J\). The minimal residual loss value of a leaf node, cm,j (j = 1, 2, …, J), is described in Formula (22). The value of the whole decision tree is shown in Formula (23), and the update of the learner is shown in Formula (24).

$$c_{m,j} = \mathop {\arg \min }\limits_{c} \sum\nolimits_{{x_{gi} \in \Re_{m,j} }} {L(o_{gi} ,f_{m - 1} (x_{gi} ) + c)}$$
(22)
$$h_{m} (x) = \sum\limits_{j = 1}^{{|leaf|_{m} { = }J}} {c_{m,j} } I(x \in leaf_{m,j} )$$
(23)
$$f_{m} (x_{g} ) = f_{m - 1} (x_{g} ) + \nu \bullet \sum\nolimits_{j = 1}^{J} {c_{m,j} } I(x_{g} \in \Re_{m,j} )$$
(24)

where hm(⋅) denotes the value of the m-th decision tree and I(⋅) is an indicator function: I = 1 if x ∈ leafm,j and I = 0 otherwise. In addition, fm(⋅) denotes the updated learner of the m-th iteration, ν represents the scaling factor, and xg is a vector element of the input dataset Xg.

The Light-GBDT prediction can be expressed as a combination of multiple decision trees, and the final output functions are shown in Formulas (25) and (26).

$$F_{M} (x_{g} ) = F_{0} + \nu_{1} F_{1} (x_{g} ) + \nu_{2} F_{2} (x_{g} ) + \cdots + \nu_{T} F_{T} (x_{g} )$$
(25)
$$\Rightarrow \hat{F}_{output} (O_{g} ) = f_{0} (x_{g} ) + \sum\nolimits_{m = 1}^{T} {\left[ {\nu_{m} \cdot \sum\nolimits_{j = 1}^{J} {c_{m,j} } I(x_{g} \in \Re_{m,j} )} \right]}$$
(26)

where FM(⋅) denotes the combined output of all decision trees, ν1, ν2, …, νT are the weights of the trees, and T is the number of trees. Fi(⋅) denotes the weighted sum of the optimal bases fm(⋅), and \(\hat{F}_{output} (O_{g} )\) is the final output of the Light-GBDT.
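To connect Formulas (19)–(26), the following sketch hand-rolls the boosting loop for the squared loss, under which the negative gradient of Formula (20) reduces to the residual; it is a didactic stand-in, not the Light-GBDT library implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X_g, o_g, n_trees=500, max_depth=6, nu=0.1):
    f_0 = o_g.mean()                        # Formula (19): best constant learner
    f_m = np.full(len(o_g), f_0)
    trees = []
    for _ in range(n_trees):
        residual = o_g - f_m                # Formula (20): negative gradient
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X_g, residual)             # Formulas (22)-(23): fit leaf values
        f_m = f_m + nu * tree.predict(X_g)  # Formula (24): update the learner
        trees.append(tree)
    return f_0, trees

def predict_gbdt(f_0, trees, X, nu=0.1):
    """Formulas (25)-(26): weighted sum over all decision trees."""
    return f_0 + nu * sum(t.predict(X) for t in trees)
```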

Here, the time complexity of the TCN process is O(KB), where K is the number of training epochs and B is the number of samples in the training dataset. If the Light-GBDT has N initial samples, J leaf nodes, and M iterations, the time complexity of training on the parameter dataset is O(NJM). In a word, the time complexity of the proposed model is O(KB + NJM).

4 Experiments

4.1 The experimental settings and performance metrics

This paper uses a real loading dataset collected from a coal mine in Anhui Province, China. In the collaborative control parameter prediction process for industrial coal loading, we take the whole carriage as a single research object, and the historical data collected from Mar 1st, 2020 to Nov 1st, 2020 are used to carry out the experiments. In addition, the selected deviations of the dataset are presented in Table 2.

Table 2 The dataset selected deviations of parameters

The proposed model and the other baseline models studied in this paper include two kinds of models: the classical learning models (i.e., the Light-GBDT [33], the Light-GBM [34], and the TCN [26]) and the hybrid learning models (i.e., the CNN-LSTM [7], the TCN-LSTM [38], the LSTM-LightGBM [35], the TCN-CNN, the CNN-LightGBDT [36], and the TCN-LightGBDT). The experimental environment is Python 3.8 with an Intel Core i7-9700K CPU and 16 GB of memory.

The mean absolute error (MAE) represents the average absolute error between the actual and predicted values. The root mean square error (RMSE) is based on the squared error, so its gradient changes with the loss value and large deviations are penalized more heavily. The mean absolute percentage error (MAPE) expresses the prediction accuracy as a percentage. R², the coefficient of determination, represents how well the independent variable explains the dependent variable in the regression analysis; its value range is (0, 1], and the larger the coefficient is, the closer the predicted value is to the real value. The evaluation metrics are defined in Formulas (27), (28), (29), and (30).

$$RMSE = \sqrt {\frac{1}{N}\sum\limits_{g = 1}^{N} {(F_{g} - \hat{F}_{g} )^{2} } }$$
(27)
$$MAE = \frac{1}{N}\sum\limits_{g = 1}^{N} {|F_{g} - \hat{F}_{g} |}$$
(28)
$$MAPE = \frac{1}{N}\sum\limits_{g = 1}^{N} {\left| {\frac{{F_{g} - \hat{F}_{g} }}{{F_{g} }}} \right|} \times 100\%$$
(29)
$$R^{2} = 1 - \frac{SSE}{{SST}} = 1 - \frac{{\sum\nolimits_{\omega = 1}^{N} {(\hat{Y}_{\omega } - Y_{\omega } )^{2} } }}{{\sum\nolimits_{\omega = 1}^{N} {(\overline{Y} - Y_{\omega } )^{2} } }}$$
(30)

where N denotes the number of testing instances, and Fg and \(\hat{F}_{g}\) represent the actual and predicted adjustment values of the parameters in the g-th instance, respectively.
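The four metrics of Formulas (27)–(30) are a straightforward NumPy transcription (F and F_hat stand for the actual and predicted adjustment values):

```python
import numpy as np

def evaluate(F, F_hat):
    """RMSE, MAE, MAPE, and R^2 of Formulas (27)-(30)."""
    err = F - F_hat
    rmse = np.sqrt(np.mean(err ** 2))                          # Formula (27)
    mae = np.mean(np.abs(err))                                 # Formula (28)
    mape = np.mean(np.abs(err / F)) * 100.0                    # Formula (29)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((F.mean() - F) ** 2)  # Formula (30)
    return rmse, mae, mape, r2
```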

4.2 Prediction for loading collaborative control adjustment parameters

In this experiment, the historical loading data collected from Mar 1st, 2020 to Oct 1st, 2020 are employed to train the models. To better predict the dynamic collaborative control parameters, randomly selected completed loading data are adopted as the testing instances. The prediction experiments for the parameter adjustment values are as follows:

1) Experiment 1: Adjustment value prediction of truck speed and belt conveyor flow

Based on the experimental environment and data splitting rules listed above, we select the data of 97 consecutive front-loading areas to verify the prediction effect of each model. The parameters of each model are summarized as follows (a configuration sketch follows the list).

(1) TCN: The temporal convolutional network is built with the Keras library. The dilated convolution factors are [1, 2, 4, 8], the filters are 128/64/32/16, and the convolutional kernel size is 3.

(2) Light-GBDT: The number of trees is 500, the maximum depth is 6, the learning rate is 0.1, and the split criterion is MSE. The minimum samples split is 2, and the minimum samples leaf is 1.

(3) Light-GBM: The number of trees is 500, the maximum depth is 6, the learning rate is 0.1, the number of leaves is 40, the split metric is L1_mse, the minimum samples split is 2, the bagging fraction is 0.45, the feature fraction is 0.6, and the boosting method is GBDT.

(4) CNN: The convolutional neural network is built with the Keras library and consists of convolutional layers and fully connected layers. There are 2 convolutional layers, the filters of each convolutional layer are 128/64, and the kernel size of the filters is 3. The number of fully connected layers is 2, with 16/2 neurons, respectively.

(5) LSTM: The number of hidden layers is 4, and the hidden neurons are 128/64/32/16. There are 2 fully connected layers with 16/2 neurons.
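A configuration sketch under the Experiment 1 settings is given below; sklearn's GradientBoostingRegressor stands in for the Light-GBDT, and the residual connections and FC step are omitted for brevity, so treat it as a sketch of the listed hyperparameters rather than the authors' code:

```python
from tensorflow.keras import layers, models
from sklearn.ensemble import GradientBoostingRegressor

# TCN feature extractor: dilations [1, 2, 4, 8], filters 128/64/32/16,
# kernel size 3, causal padding (Sect. 3.1).
inp = layers.Input(shape=(None, 8))          # 8 normalized input parameters
x = inp
for filters, d in zip([128, 64, 32, 16], [1, 2, 4, 8]):
    x = layers.Conv1D(filters, 3, dilation_rate=d, padding="causal",
                      activation="relu")(x)
tcn = models.Model(inp, x)

# Prediction layer: 500 trees, depth 6, learning rate 0.1, MSE criterion,
# minimum samples split 2, minimum samples leaf 1.
gbdt = GradientBoostingRegressor(n_estimators=500, max_depth=6,
                                 learning_rate=0.1,
                                 min_samples_split=2, min_samples_leaf=1)
```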

The evaluation results of all models on the testing data are presented in Tables 3 and 4, and the adjustment prediction results are presented in Fig. 9a and b, respectively. The related absolute errors (ABS_error) are shown in Fig. 10a and b. For severe peaks and valleys, the performance of the single learning models is relatively weak, while the proposed TCN-LightGBDT model outperforms the other listed models.

Table 3 The speed-prediction evaluation metrics results of all models
Table 4 The flow-prediction evaluation metrics results of all models
Fig. 9 a The speed adjustment prediction results. b The flow adjustment prediction results

Fig. 10 a The ABS error of the speed prediction results. b The ABS error of the flow prediction results

Figure 11a and b show that the scatter distribution of the proposed TCN-LightGBDT model is more compact than that of the other baseline models, which suggests that the predicted values of the proposed model are closer to the actual adjustment values. Further, the fitting curve of the TCN-LightGBDT fits the real instances better than the other models, meaning that the unexplained variation of the dependent variable is small. In addition, the R² values of all hybrid models indicate that the proposed TCN-LightGBDT model has the highest interpretation of the predicted values (i.e., TCN-LightGBDT vs LSTM-LightGBM: 1.009, TCN-LightGBDT vs CNN-LightGBDT: 1.018, TCN-LightGBDT vs TCN-CNN: 1.025, TCN-LightGBDT vs TCN-LSTM: 1.029, TCN-LightGBDT vs CNN-LSTM: 1.031). Namely, the R² value obtained by linear regression fitting reinforces the fact that the linear correlation between the true and predicted values is strongest for the proposed model. In summary, the TCN-LightGBDT model performs better than the other models.

2) Experiment 2: Adjustment value prediction of truck speed and chute inclination

Based on the experimental environment and data splitting rules discussed above, we select the data of 20 rear-loading areas to verify the prediction performance of the proposed model. The parameters of each model are summarized as follows.

(1) TCN: The temporal convolutional network is built with the Keras library. The dilated convolution factors are [1, 2, 4, 8], the filters are 64/32/16/16, and the convolutional kernel size is 2.

(2) Light-GBDT: The number of trees is 500, the maximum depth is 4, the learning rate is 0.1, the minimum samples leaf is 1, and the split criterion is MSE.

(3) Light-GBM: The number of trees is 500, the maximum depth is 4, the learning rate is 0.1, the number of leaves is 40, the split metric is L1_mse, the minimum samples split is 2, the bagging fraction is 0.4, the feature fraction is 0.5, and the boosting method is GBDT.

(4) CNN: The convolutional neural network is built with the Keras library and consists of convolutional layers and fully connected layers. There are 2 convolutional layers, the filters of each convolutional layer are 64/32, and the kernel size of the filters is 2. The number of fully connected layers is 2, with 16/2 neurons, respectively.

(5) LSTM: The number of hidden layers is 3, and the hidden neurons are 64/32/16. There are 2 fully connected layers with 16/2 neurons.

Fig. 11 a Scatter diagrams of the contrast models for the speed prediction. b Scatter diagrams of the contrast models for the flow prediction

Similar to Experiment 1, Fig. 12a and b show the prediction results for the speed and inclination adjustment values, and Fig. 13a and b show the trend distribution of the ABS_error for each model. It is observed that the prediction of the proposed model precisely matches the actual loading adjustment data. In addition, the TCN-LightGBDT model captures the continuous stable trend well, while the other hybrid and non-hybrid models show significant fluctuation errors.

Fig. 12 a The speed adjustment prediction results. b The inclination adjustment prediction results

Fig. 13 a The ABS error of the speed prediction. b The ABS error of the inclination prediction

To evaluate the effectiveness of our model, we list the evaluation metrics of all models in Tables 5 and 6. Because of the lower complexity of the reconstructed hidden layers in the proposed model, the time cost of the speed prediction is slightly reduced; the time cost of the proposed model is no more than 2 s. Figure 14a and b show that the instances predicted by our model fit the regression curve well, which means that the TCN-LightGBDT model predicts the adjustment parameters in industrial loading better than the other listed models.

Table 5 The speed-prediction evaluation metrics results of all models
Table 6 The inclination-prediction evaluation metrics results of all models
Fig. 14 a Scatter diagrams of the contrast models for the speed prediction. b Scatter diagrams of the contrast models for the inclination prediction

4.3 Comparison results between TCN-LightGBDT and TCN-LightGBDT(non-FC)

To verify the effectiveness of the FC method, we compare the TCN-LightGBDT model with the TCN-LightGBDT (non-FC) model. The dataset was collected from Jun 1st, 2020 to Nov 1st, 2020. Further, we normalize the prediction labels to show the difference in performance more clearly, and complete loading data are randomly adopted to present the testing results of the compared models intuitively. The detailed experimental settings are as follows.

(1) TCN: The temporal convolutional network is built with the Keras library; the dilated convolution factors are [1, 2, 4], the filters are 64/32/16, and the convolutional kernel size is 2.

(2) Light-GBDT: The number of trees is 200, the maximum depth is 4, the learning rate is 0.1, and the split criterion is MSE. Additionally, the minimum samples split is 1, and the minimum samples leaf is 1.

According to Figs. 15 and 16, the predictions of the TCN-LightGBDT without the FC method fluctuate significantly. With the FC method, the feature convolution and the regression prediction are well connected, which improves the prediction accuracy. Table 7 shows the evaluation metrics of the compared models and indicates that the TCN-LightGBDT has lower errors than the model without the FC method. The R² score of the proposed model with the FC method is higher than that of the TCN-LightGBDT (non-FC) model (TCN-LightGBDT vs. TCN-LightGBDT (non-FC): 1.013/1.015). The computational times of the contrast models are similar and acceptable.

Fig. 15 a The speed prediction results of 97 points. b The flow prediction results of 97 points

Fig. 16 a The speed prediction of 20 points. b The inclination prediction of 20 points

Table 7 The evaluation metrics results of compared models

4.4 Discussion and analysis

In this paper, the proposed TCN-LightGBDT model, which combines the TCN and the Light-GBDT, is compared with the above baseline models to illustrate its better prediction accuracy. Further, some insightful conclusions and theoretical analysis are presented as follows:

(1) The TCN is superior to the CNN and the LSTM in dynamic feature extraction. The receptive field size of each residual layer, calculated by Formula (12), is listed in Table 8. Compared with the TCN, the receptive field size of each convolutional layer of the CNN depends entirely on the convolution kernel. The final receptive field size of the TCN is 31 (16), which reduces the unnecessary coverage of the time series and improves the feature extraction accuracy. Namely, due to the dilated convolution and the residual block connection, the TCN obtains a wider receptive field to capture long-term historical relationships. Thus, the TCN-CNN outperforms the CNN-LSTM and the TCN-LSTM. In addition, the predictions of the CNN-LSTM and the CNN-LightGBDT are both worse than that of the LSTM-LightGBM because of the limitations of the extraction object or techniques. Namely, the one-dimensional convolution has a relatively poor ability to capture long-time-range features, and the LSTM relatively lacks the ability to solve the time-series concurrency problem. This is also the main reason why the prediction results of the CNN-LSTM model and the TCN-LSTM model are not as good.

(2) The Light-GBDT and the Light-GBM have better prediction performance than the TCN. Because of the strong learner with negative-gradient fitting, these models can accurately predict nonlinear and low-dimensional data. In addition, with the help of gradient-based one-side sampling and the histogram algorithm, the Light-GBM outperforms the Light-GBDT based on CART regression trees. The decrease of the loss function along the gradient direction accelerates convergence, so the time consumption of the Light-GBDT and the Light-GBM is significantly less than that of the TCN.

(3) Among the hybrid learning models, the proposed model outperforms the LSTM-LightGBM and the CNN-LightGBDT. Since the LSTM relies on the historical time series, the prediction error and computational time of the LSTM-LightGBM are higher than those of the proposed model. In addition, due to the coordinated changes among the adjustment parameters, we provide the FC method to reconstruct the extracted features. The method averages the feature values to reduce the loss caused by abnormal extraction by the TCN, which improves the prediction accuracy of the proposed model. Also, Fig. 17(a) to (d) show the importance values of the features extracted by different models. Among them, Fig. 17(a) and (b) indicate that the CNN-LightGBDT and the TCN-LightGBDT without the FC method depend excessively on certain features, which causes a large prediction error. Figure 17(c) shows that the GBDT prediction relies on too many features extracted by the LSTM, which decreases the prediction accuracy. In Fig. 17(d), the FC method associates the GBDT prediction process with the appropriate features, improving the prediction effect of the proposed model.

Table 8 The receptive field size of the TCN and CNN
Fig. 17 The importance values of the extracted features

5 Conclusions

In this paper, we propose a TCN-LightGBDT model to achieve accurate prediction of the adjustment values of multi-agent collaborative control parameters in industrial loading. The loading parameter deviations and the theoretical parameter supplementing method are used to optimize the dataset, and the FC method is provided for the matrix reconstruction of the temporal features extracted by the parallel TCN. Further, we utilize the reconstructed matrix as the feature training set and accurately predict the adjustment values of different parameter combinations using the Light-GBDT in the 117 virtual loading regions. The experiments show that the model significantly outperforms the compared models. However, some problems remain unresolved. In the future, we will explore how to accelerate gradient convergence to further reduce time consumption through weight optimization algorithms. Furthermore, we will apply our proposed model to more related fields (e.g., image target prediction).