1 Introduction

It is well known that the electrolytic aluminium industry holds an important strategic position worldwide [1]. However, the industry is characterized by high power consumption and high pollution. Investigations show that producing one ton of electrolytic aluminium emits nearly 1500 m³ of polluting gas into the atmosphere and consumes approximately 500 kg of carbon anodes [2]. Research on energy-saving and emission-reduction technology in the electrolytic aluminium process (EAP) therefore has significant engineering application value.

Currently, on the premise of ensuring the stable production of electrolytic aluminium cells, improving current efficiency and reducing energy consumption have become important goals for electrolytic aluminium enterprises. To achieve the above goals, the main research in this field has been focused on the following two aspects: (1) Improving process equipment, such as shaped cathodes and perforated anodes. For example, Peng et al. [3] analyzed the method of improving the current efficiency of a Hall-Heroult cell by using a novel rectangular protruding cathode and process parameters in the EAP. (2) Establishing system models based on data mining technology and using a reasonable and effective filtering method to improve the model accuracy. For instance, Yao et al. [4] used the Kalman filter to establish the dynamic evolution model of the EAP in a Gaussian noise environment. Because the existing industrial aluminium electrolysis cell superheat identification mainly depends on manual experience, the accuracy is far from satisfactory, and a deep soft sensor method for superheat degree detection was proposed by Lei et al. [5].

The above two aspects are helpful for realizing the energy savings and emission reduction of the EAP. However, it should be noted that the first method is meant to improve equipment that is closely related to EAP data, such as measurement and transmission equipment. Implementing such improvements is difficult and often consumes substantial financial and material resources, so this approach is more applicable to new production systems. The second method predicts the state of process parameters based on the system model and filtering algorithm, which does not require changes to the existing production equipment. Moreover, the internal information in real data that has been masked by complex noise can be further explored. Furthermore, since neural networks (NNs) have an excellent non-linear mapping ability and can fit a large amount of data even when the system modeling mechanism is unknown [6], the application of NN modeling is an effective method. It does not need to understand the internal mechanism of the EAP, and the mapping relationship between decision variables and industrial indicators can be obtained by learning and training on a series of process data. However, once the traditional NN is trained, its model parameters cannot be further dynamically updated. The combination of an NN and new filtering algorithms is expected to enhance the ability to optimize the process model online. Yi et al. [7] proposed a dynamic prediction model based on false nearest neighbors and an unscented Kalman filter neural network (UKFNN) to determine the alumina concentration. Li et al. [8] presented a method that uses an improved UKFNN and NSGA-II algorithm to obtain the optimal output of stable operating variables in the EAP. A modular integrated fuzzy neural network was developed for predicting multiple fault diagnoses of the EAP by Li et al. [9]. The above studies have established a single-objective prediction model in the EAP. However, current efficiency and DC power consumption are two main technical and economic indicators in a real electrolytic aluminium equipment process system. The NN modeling of two objectives provides a model foundation for realizing collaborative optimization.

Based on the above analysis, one of the main directions of energy savings and emission reduction is to establish a multiobjective prediction model that not only has high prediction accuracy but also can minimize unit power consumption and maximize current efficiency in the EAP. However, the EAP contains a series of physical and chemical reactions, and there are various internal and external parameters that present a complex coupling interaction, so that the EAP involves some complex noise characteristics (i.e., unknown types, redundant distributions and variable forms). The above problems make it difficult to establish a multiobjective prediction model for the EAP. Moreover, the EAP is extremely susceptible to interference from uncertain factors such as Gaussian noise or non-Gaussian noise while collecting a series of decision parameters such as series current, cell voltage, cell temperature, etc., which seriously affects the accuracy of the prediction model. Therefore, minimizing the noise interference in the model algorithm has become a feasible breakthrough for further improving model prediction accuracy [2].

However, the aforementioned studies did not consider the characteristics of mixed noise in the EAP, and usually only used a single filtering method to estimate the parameters of the NN’s weights and thresholds. These investigations lack a discussion on the filtering prediction problem with complex and unknown system noise, and thus are not conducive to mining the model’s prediction potential in depth. For instance, previous research studies have shown that combining the Kalman filter (or improved Kalman filter) and an NN may cause modeling failure because the noise is not limited to linear or Gaussian characteristics [10, 11]. Additionally, it has been demonstrated that a combination of a particle filter (PF) [12, 13] and NN can solve model problems with non-linear and non-Gaussian noise [14,15,16]. However, particle degradation in the PF may lead to algorithm divergence after several iterations. Therefore, the required state estimation cannot be obtained. To solve the above problems, in this paper, a “multi-source filter” technique is proposed, which uses the mean and variance in the UKF to adjust the PF’s importance function based on the density kernel estimation method. Then, the NN’s model parameters (i.e., weights) can be viewed as state variables of the filtering algorithm, and its outputs can be viewed as measurement variables, which give the above strategy a significant advantage of adaptively adjusting the state estimation under various mixed noise interferences. Finally, a multiobjective incremental learning prediction model that meets the production requirements of the EAP is established, which helps to significantly reduce power consumption and improve current efficiency in the EAP.

Through the aforementioned comprehensive analysis, some important contributions of this study can be summarized as follows:

(1) To solve the interference problem of mixed noise on the model accuracy, a “multi-source filter” technique that can be applied to the model’s parameter estimation under various noise characteristics is proposed.

(2) Considering the dynamic performance of the model, this paper combines the “multi-source filter” and a NN to establish an incremental learning prediction model.

(3) To reduce particle degradation in the algorithm, this paper adopts the mean and variance of the UKF to optimize the PF’s importance function based on a density kernel estimation.

(4) On the basis of the above research findings, in this article, a multi-source filter neural network (MSFNN) framework is developed, and its corresponding construction process is provided.

(5) The new framework is applied to the modeling of the EAP. The experimental results show that the MSFNN can accurately predict the current efficiency and power consumption data in real time.

The remainder of this paper is organized as follows: Section 2 gives a clear problem description encountered in the modeling process of electrolytic aluminium. Based on the NN’s state-space model, Section 3 presents the “multi-source filter” technique and states the process design, theoretical analysis and implementation steps of the new framework (MSFNN) in detail. In Section 4, the framework developed in this paper is applied and verified in the EAP. Section 5 provides a summary.

2 Problem description

In the process manufacturing industry [17], system models are often required to demonstrate accurate prediction performance and an excellent incremental learning capability. However, an industrial process system usually has many characteristics, such as complex and changing environments, multiple alternating processes and strong coupling among parameters. In addition, the production mechanism is often vague and difficult to quantify. Under these complex conditions, although supervised machine learning algorithms are popular for establishing process models to predict technical indicators of a real technological process [18, 19], the established process system models still leave considerable room for improvement.

For ease of description, an industrial process system is defined as follows:

$$ \left\{ \begin{aligned} x_{k} &= f(x_{k-1},u_{k})+\theta_{k}\\ y_{k} &= h(x_{k},u_{k})+\nu_{k} \end{aligned} \right. $$
(1)

where \(x_{k}\) represents the variables (decision variables) of the industrial process system to be estimated at moment k; \(u_{k}\) and \(y_{k}\) represent the input variables and output variables of the process system at moment k, respectively; \(\theta_{k}\) and \(\nu_{k}\) represent the process noise and measurement noise (not necessarily Gaussian), respectively. The functions f and h describe the state transition and the measurement relationship, respectively.
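To make the system in (1) concrete, the following minimal sketch rolls it forward under mixed noise. The toy functions `f_toy` and `h_toy`, the Gaussian-plus-Laplace noise model and all numerical values are illustrative assumptions, not quantities taken from the EAP.

```python
import numpy as np

def mixed_noise(rng, size, sigma=0.1, outlier_prob=0.1, outlier_scale=1.0):
    """Gaussian noise contaminated by occasional heavy-tailed (Laplace) draws.

    This mixture is only an assumed stand-in for "mixed noise".
    """
    gaussian = rng.normal(0.0, sigma, size)
    laplace = rng.laplace(0.0, outlier_scale, size)
    is_outlier = rng.random(size) < outlier_prob
    return np.where(is_outlier, laplace, gaussian)

def simulate(f, h, x0, inputs, rng):
    """Roll Eq. (1) forward: x_k = f(x_{k-1}, u_k) + theta_k, y_k = h(x_k, u_k) + nu_k."""
    x = np.asarray(x0, dtype=float)
    states, outputs = [], []
    for u in inputs:
        x = f(x, u) + mixed_noise(rng, x.shape)   # state equation
        y = h(x, u)
        y = y + mixed_noise(rng, np.shape(y))     # measurement equation
        states.append(x)
        outputs.append(y)
    return np.array(states), np.array(outputs)

# Hypothetical toy dynamics, chosen only so that the example runs.
def f_toy(x, u):
    return 0.9 * x + 0.1 * np.tanh(u)

def h_toy(x, u):
    return x ** 2 + u

rng = np.random.default_rng(0)
inputs = np.linspace(0.0, 1.0, 50)
states, outputs = simulate(f_toy, h_toy, x0=[0.5], inputs=inputs, rng=rng)
```

Because the noise is not purely Gaussian, a filter that summarizes the state by a single mean and variance can be misled by the occasional large disturbance, which is the situation the rest of this section addresses.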

Because the process noise \(\theta_{k}\) and measurement noise \(\nu_{k}\) in the above-mentioned process system often have characteristics such as unknown types and redundant distributions, the Kalman filter (KF) and particle filter (PF) [20] are mostly used to estimate the state of decision variables directly in order to establish an accurate prediction model.

Since the traditional KF algorithm can only be applied to linear systems, researchers have expanded its application scope and proposed two improved filtering technologies, namely the extended Kalman filter (EKF) [21, 22] and the unscented Kalman filter (UKF) [23, 24]. However, these two strategies are restricted to non-linear systems with Gaussian noise. It can be seen from the non-Gaussian distribution system model in Fig. 1 that the probability distribution is represented as a complex curve, which is composed of multiple Gaussian curve segments with multiple peaks and valleys. In terms of properties, it is not just a fusion of several similar Gaussian distributions, and it cannot be characterized by a simple mean and variance. According to related studies [12, 13] on the above issues, the effect of the PF algorithm depends on the establishment of the importance function and the choice of the resampling method. Because the PF algorithm has the advantage of not requiring mandatory constraints on system state variables, it is an “approximately optimal” tool used to solve the problem of state parameter estimation for non-linear non-Gaussian manufacturing systems. This shows that manufacturing systems with different characteristics need to adopt different filtering methods. If the industrial process system is linear with Gaussian white noise, then the KF algorithm is preferred. If the industrial process system is non-linear with Gaussian noise, then a method should be chosen by weighing the computational cost of the candidate filtering algorithms against the accuracy of their state estimates. If the process system is non-linear and non-Gaussian, then the particle filter technique is preferred.

Fig. 1 Process design of the MSF technology

Table 1 lists the applicable range of various filtering technologies, from which we can see that the PF algorithm has the widest range of applicability. However, with the gradual development of the PF field, researchers have found that the PF may not be the best filtering technique when using it to approximately estimate all state parameters in industrial manufacturing systems. As the particles degenerate, the weights of most particles will decrease during the process of particle updating. The above phenomenon indicates that if the iterative update is continued, the weight updating exhibits no obvious improvement in the final filtering accuracy. Instead, the filter resources are exhausted to deal with the negligible particle calculation update. There are two methods to solve particle degradation: one is to increase the number of sampling particles, which may lead to the divergence of the PF algorithm itself. The other is to optimize the importance function to make it closer to the real distribution function.

Table 1 Applicability range of various filtering technologies

To solve the above problems, inspired by previous studies [25,26,27], this paper proposes the “multi-source filter” (MSF) technique, which utilizes the UKF’s mean and variance to guide the PF’s importance function based on the density kernel estimation method. This method not only inherits the characteristics and application range of the two filtering methods, but also can solve the problem of particle degradation in the PF. Therefore, the MSF technique ignores the influence of noise properties and overcomes the disadvantages of the PF algorithm, so it can be widely used in the estimation of state parameters with various single noise or mixed noise interferences in process manufacturing systems.

To clearly describe this method, Fig. 1 shows the basic process of MSF parameter estimation. This process includes the UKF segment and the PF segment, which interact with each other through the adjustment of particles. As shown in Fig. 1, the particles are processed by the unscented transformation (UT) method, and then the sampling distribution in step (b) is obtained by the density kernel estimation method after UKF optimization. Next, the PF method is used to update the particles on this basis. Figure 1(a) represents the initial sampling distribution. Figure 1(b) represents the sampling distribution after the UT method. Figure 1(c) represents the probability distribution after the particle weight is updated, and Fig. 1(d) is the probability distribution after the particle position is updated.

The process from Fig. 1(a) to Fig. 1(b) is mainly based on the UKF segment. First, a set of sample points (represented by the circle in the figure) are randomly generated from the prior distribution of the initial state space, and then “Sigma points” are calculated from the above sample points by using the UT method in the UKF. Finally, the mean and variance of these “Sigma points” are substituted for the real mean and variance to continuously adjust the sampling distribution. The following process from Fig. 1(b) to Fig. 1(c) shows that the PF’s importance function is adjusted by the mean and variance. The particles are sampled from the optimized importance function, and the weights of particles are constantly adjusted (shown as the change in the circle size in the figure) by using the measured data to modify the distribution. Finally, the weights are mapped to the probability distribution curve. Fig. 1(c) and 1(d) show that the particles in Fig. 1(c) are split to change the particles’ position (indicated as a circle from one to two in Fig. 1(d)), so as to obtain the final modified probability distribution.

Considering that the real-time internal and external data in the EAP are frequently exchanged and constantly changed [28], in order to ensure that the system exhibits good dynamic performance, the above theory is combined with an NN to predict the current efficiency and unit power consumption. The NN’s model parameters (i.e., weights) can be viewed as the state variables of the MSF, and its outputs can be viewed as measurement variables of the MSF. Then, in turn, the performance of the incremental learning model can be tested by the above NN. Finally, an incremental learning prediction model for the multiobjective problem (MOP) [29, 30] (i.e., unit power consumption and current efficiency in an EAP) is established.

In Fig. 2, we present the dynamic evolutionary process of the probability density distribution for two objectives in the incremental learning prediction model. Fig. 2(a) shows the process of updating the probability density distribution of unit power consumption with time in the EAP; Fig. 2(b) shows the process of updating the probability density distribution of current efficiency with time in the EAP. The above two figures reflect that the MOP prediction model established by MSF theory has an incremental learning ability, so that the model can evolve dynamically to predict the unit power consumption and current efficiency as time and samples change in the EAP.

Fig. 2 The dynamic evolutionary process of the MSF-MOP prediction model

Therefore, in order to fully tap the complementary advantages of the MSF and NN in the MOP, in this paper, a MSFNN is designed that can typically solve adaptive modeling problems with unknown mixed noise by deeply merging the MSF and NN. The process design, theoretical analysis and implementation steps of the new framework (MSFNN) will be presented in the next section.

3 Design of the MSFNN algorithm

3.1 State-space model of the neural network

The state-space representation based on the NN describes the updating process of the back-propagation neural network’s (BPNN’s) weights and thresholds with time [31]. The above process includes using both a state equation to describe the change in the weights and thresholds and a measurement equation to describe the non-linear relationship between the inputs and outputs of the model. The specific equation is as follows.

$$ \left\{ \begin{aligned} \omega_{k} &= \omega_{k-1}+\theta_{k}\\ y_{k} &= h(\omega_{k},u_{k})+\nu_{k} \end{aligned} \right. $$
(2)

where \(\omega_{k}\) represents the state variables at moment k (i.e., the BPNN’s weights and thresholds to be estimated); \(u_{k}\) represents the input variables of the EAP at moment k; \(y_{k}\) represents the measurement variables at moment k (i.e., the output variables used to evaluate the performance of the industrial process system). Assume that the system measurement noise \(\nu_{k}\) is Gaussian noise with mean 0 and variance R; the system process noise \(\theta_{k}\) is Gaussian noise with mean 0 and variance Q. The NN’s weights at moment k depend on the NN’s weights at moment k-1 and the random system process noise \(\theta_{k}\), and the measurement noise \(\nu_{k}\) mainly describes the modeling error caused by sensors and other devices in the system.

The non-linear measurement function h(⋅) is approximated by using a multilayer perceptron.

$$ h_{l}(\omega,u)=\sum\limits_{j=1}^{9}\frac{\omega_{jk}^{\ast\ast}}{1+exp\left[-\left( \sum\limits_{i=1}^{9}\omega_{ij}^{\ast}u_{i}+a_{j}\right)\right]}+b_{l} $$
(3)

where \(\omega _{ij}^{\ast }\) represents the connection weight between the i-th input-layer neuron and the j-th hidden-layer neuron; \(a_{j}\) represents the thresholds of the hidden-layer neurons; \(\omega _{jk}^{\ast \ast }\) represents the connection weight between the j-th hidden-layer neuron and the k-th output-layer neuron; \(b_{l}\) represents the thresholds of the output-layer neurons; \(u_{i}\) is the i-th input variable.
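The state-space view in (2) and (3) can be sketched in a few lines: all weights and thresholds are packed into one flat state vector, and the measurement function is the forward pass of a single-hidden-layer perceptron. The layer sizes of 9 inputs and 9 hidden neurons follow the summation limits in (3); the choice of two outputs and the helper names `n_params`, `unpack` and `h` are assumptions made for illustration.

```python
import numpy as np

N_IN, N_HID = 9, 9   # sizes read off the summation limits in Eq. (3)
N_OUT = 2            # assumption: one output per objective

def n_params(n_in=N_IN, n_hid=N_HID, n_out=N_OUT):
    """Length of the flat state vector omega (weights plus thresholds)."""
    return n_in * n_hid + n_hid + n_hid * n_out + n_out

def unpack(omega, n_in=N_IN, n_hid=N_HID, n_out=N_OUT):
    """Split the flat state vector into (W1, a, W2, b)."""
    omega = np.asarray(omega, dtype=float)
    i = 0
    W1 = omega[i:i + n_in * n_hid].reshape(n_in, n_hid)
    i += n_in * n_hid
    a = omega[i:i + n_hid]
    i += n_hid
    W2 = omega[i:i + n_hid * n_out].reshape(n_hid, n_out)
    i += n_hid * n_out
    b = omega[i:i + n_out]
    return W1, a, W2, b

def h(omega, u):
    """Measurement function of Eq. (3): sigmoid hidden layer, linear output layer."""
    W1, a, W2, b = unpack(omega)
    hidden = 1.0 / (1.0 + np.exp(-(u @ W1 + a)))
    return hidden @ W2 + b

rng = np.random.default_rng(0)
omega = rng.normal(0.0, 0.1, n_params())   # hypothetical initial weights
u = rng.random(N_IN)                       # hypothetical input sample
y = h(omega, u)

# State equation of Eq. (2): a random walk on the weights.
omega_next = omega + rng.normal(0.0, 1e-3, omega.shape)
```

Treating the weights as a random walk is what lets a filter keep adjusting them as new samples arrive, instead of freezing them after training.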

3.2 Multi-source filter technique

The existing filtering theory takes the state-space model of any system as the research object. Given known measurement values, the state variables are estimated through rigorous mathematical derivation so that the error between the estimated state and the true state of the system falls within an allowable range [32]. However, because the noise types are unknown, the distributions are complicated, and the forms are variable in the actual process, the existing single filtering methods [33] have limited applicability and reduced accuracy, and they cannot solve the problem of mixed noise.

To solve the modeling problem of process manufacturing systems in an environment with mixed noise, this paper proposes the MSF technique. The MSF utilizes the mean and variance in the UKF to guide the importance function of the PF based on the density kernel estimation method [34, 35], and it can be embedded in any state model to perform a probability estimation of state variables. Then, the “Sigma points” in the UKF are employed to update the model at every moment. Finally, the weights are calculated and normalized by the modified importance function, and whether to perform sampling importance resampling (SIR) is judged by the number of effective particles, so as to achieve an accurate estimation of the state parameters (decision parameters) of the process system.

The above theory can be applied to the estimation of state parameters under various noise interference conditions and improve the filtering accuracy. The main advantages are as follows:

(1) The method extracts particles from the probability distribution established by the initial values, so it is applicable to the different characteristics of initial states.

$$ \begin{aligned} x \sim q(x_{k}|x_{0:k-1}) \end{aligned} $$
(4)

where x represents particles; \(q(x_{k}|x_{0:k-1})\) represents the probability of the state variable at moment k under the premise that the state data are known from moment 0 to k-1.

(2) The method of constructing a distribution function based on density kernel estimation is used to expand the application range of the filtering algorithm. This method gets rid of the previous filtering problem that the importance function is almost always represented by the Gaussian distribution \(N(\bar {x},\sigma )\) established by the mean \(\bar {x}\) and variance σ of samples. In the proposed MSF, it is only necessary to simulate the desired distribution as the optimal importance function through a set of random particles carrying weights.

$$ x \sim q(x_{k}|x_{0:k-1},z_{0:k}) = N(\bar{x},\sigma) $$
(5)
$$ x \sim q(x_{k}|x_{0:k-1},z_{0:k}) = F(\boldsymbol{x}) $$
(6)

where x represents the particles after sampling; \(\boldsymbol{x}\) represents the particle set before sampling; \(q(x_{k}|x_{0:k-1},z_{0:k})\) represents the posterior probability of the state variable at moment k under the premise that the state data are known from moment 0 to k-1 and observation data are known from moment 0 to k. F is the distribution function constructed based on density kernel estimation.

Equation (5) shows that the importance function \(q(x_{k}|x_{0:k-1},z_{0:k})\) was previously approximated by a Gaussian distribution built from the mean \(\bar {x}\) and variance σ, but the posterior distribution often cannot be represented by a single Gaussian curve. In this paper, the distribution function F in (6) is used to replace the normal distribution N in (5) as the importance function.

(3) To improve the algorithm accuracy, this method optimizes the importance function by utilizing the mean and variance obtained in the UKF, as shown in Fig. 3.

Fig. 3 The optimization process of the importance function in the MSF algorithm

It is assumed that the curve’s expression in Fig. 3(a) is \(f(x)=\frac {1}{\sqrt {2\pi }\sigma }exp(-\frac {(x-\mu )^{2}}{2\sigma ^{2}})\), where μ is the mean and \(\sigma^{2}\) is the variance; x is the sampled particle and \(f_{max}(x)\) is the maximum probability density. The Gaussian model in Fig. 3(a) is established based on the mean μ and variance \(\sigma^{2}\), which are updated by the UKF. The red vertical dotted line in Fig. 3 represents the symmetry axis of the Gaussian model. The green curve in Fig. 3(b) is the importance function established by the density kernel estimation method, from which we can see that it is a non-linear non-Gaussian curve. The green vertical dotted line, which represents the expectation of the importance function, divides the area enclosed by the green curve and the black line into two equal parts. In Fig. 3(c), the red vertical dotted line is on the left side of the green vertical dotted line, so the green curve should move toward the red curve to meet the requirements of the average expected value. Fig. 3(d) shows the importance function updated by the mean. Since the variance \(\sigma^{2}\) represents the degree of dispersion of all sampled particles, the importance function in Fig. 3(d) can be further optimized. The smaller the variance is, the more concentrated the distribution is, which makes the original importance function (red curve) move to the position of the yellow curve.
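The idea of building the importance function from a kernel density estimate and then steering it with the UKF statistics can be sketched as follows. This is only one possible reading of the procedure: the Gaussian kernel, the normal-reference bandwidth rule, and the affine shift-and-rescale in `match_moments` are assumptions introduced here, and the target mean and standard deviation stand in for hypothetical UKF outputs.

```python
import numpy as np

def kde_pdf(x, particles, weights, bandwidth):
    """Weighted Gaussian-kernel density estimate F evaluated at the points x."""
    x = np.atleast_1d(x)[:, None]
    z = (x - particles[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return kernels @ weights

def kde_sample(rng, particles, weights, bandwidth, size):
    """Draw from F: pick a kernel centre by weight, then add kernel noise."""
    centres = rng.choice(particles, size=size, p=weights)
    return centres + rng.normal(0.0, bandwidth, size)

def match_moments(samples, target_mean, target_std):
    """Shift and rescale samples so their mean and spread equal the targets.

    The multimodal shape of the density is kept; only location and scale move.
    """
    return target_mean + (samples - samples.mean()) * (target_std / samples.std())

rng = np.random.default_rng(0)
# A bimodal particle set that a single Gaussian would describe poorly.
particles = np.concatenate([rng.normal(-1.0, 0.3, 200), rng.normal(1.5, 0.5, 300)])
weights = np.full(particles.size, 1.0 / particles.size)
# Normal-reference rule of thumb for the kernel bandwidth (an assumed choice).
bandwidth = 1.06 * particles.std() * particles.size ** (-0.2)

proposal = kde_sample(rng, particles, weights, bandwidth, size=1000)
# Hypothetical UKF mean and standard deviation used to steer the proposal.
adjusted = match_moments(proposal, target_mean=0.8, target_std=0.6)
```

The shift corresponds to the move from Fig. 3(c) to Fig. 3(d), and the rescaling to the subsequent narrowing of the curve; other ways of imposing the UKF statistics on F are equally conceivable.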

3.3 Design and analysis of the MSFNN algorithm

While modeling a process system with an unknown mechanism, the NN still has the ability to fit a large amount of non-linear process data, and further approximate the operation mode of a real process system. However, when a conventional NN constructs the process operation model of the industrial system, it is often assumed that the internal states of the process system and the interference of the external production environment are stable. In fact, the process system continuously exchanges materials, energy, and information with the external environment, making it difficult for the static NN to adapt to environmental change when modeling the process system.

To make full use of the complementarity between the MSF and NN, this study proposes a MSFNN framework. This MSFNN integrates the powerful non-linear fitting ability of the NN while using MSF theory to forecast the NN’s model parameters. Specifically, the model parameters (i.e., weights) act as state variables of the MSF. Furthermore, the predicted outputs of the process model act as the measurement variables of the MSF.

Taking the neural network state-space model established by (2) as the research object, the main steps of the MSFNN algorithm are as follows:

(1) Initialization.

Extract N particles \(\omega ^{i(a)}_{0} \sim p(\omega _{0})\), i = 1,2,⋯ ,N from the prior distribution p(ω0) established by the NN’s weights and thresholds.

$$ \bar{\omega}^{i(a)}_{0} = E(\omega^{i(a)}_{0}) $$
(7)
$$ P^{i(a)}_{0} = E[(\omega^{i(a)}_{0}-\bar{\omega}^{i(a)}_{0})(\omega^{i(a)}_{0}-\bar{\omega}^{i(a)}_{0})^{T}] $$
(8)

where \(\bar {\omega }^{i(a)}_{0}\) represents the mathematical expectation (mean) of particles; \(P^{i(a)}_{0}\) represents the variance matrix of particles. The superscript number represents the particle sequence, and the subscript number represents the time sequence.

(2) Update each particle with the UKF at each moment as follows:

a. Calculate the Sigma points of each particle.

$$ \begin{aligned} \chi^{i(a)}_{k-1} = [\bar{\omega}^{i(a)}_{k-1},\bar{\omega}^{i(a)}_{k-1}\pm\sqrt{(n_{a}+\lambda)P^{i(a)}_{k-1}}] \end{aligned} $$
(9)

where \(\lambda = \alpha^{2}(n_{x}+\kappa)-n_{x}\) is the proportional coefficient, and the size of α determines the distribution of the selected sample points around the mean \(\bar {\omega }\). In particular, lowering α can reduce higher-order effects to a greater extent; κ, \(n_{x}\) and \(n_{a}\) are the setting parameters in the UKF.

b. Introduce particle recursion (time update).

$$ \begin{aligned} \chi^{i(x)}_{k|k-1} = f(\chi^{i(a)}_{k-1}) \end{aligned} $$
(10)
$$ \begin{aligned} \bar{\omega}^{i(x)}_{k|k-1} = \sum\limits_{j=1}^{2n_{a}} {w_{j}^{m}}\chi^{i(x)}_{j,k|k-1} \end{aligned} $$
(11)
$$ \begin{array}{@{}rcl@{}} P^{i}_{k|k-1} &=& \sum\limits_{j=1}^{2n_{a}} {w_{j}^{c}}(\chi^{i(x)}_{j,k|k-1}-\bar{\omega}^{i(x)}_{k|k-1}) \\&&\times (\chi^{i(x)}_{j,k|k-1}-\bar{\omega}^{i(x)}_{k|k-1})^{T} \end{array} $$
(12)
$$ \begin{aligned} y^{i}_{j,k|k-1} = h(\chi^{i(x)}_{j,k|k-1}) \end{aligned} $$
(13)

where χ is the sampling point obtained by the UT method; \(\chi ^{i(a)}_{k-1}\) is the original sampling point; \(\chi ^{i(x)}_{k|k-1}\) is the sampling point obtained by symmetrically distributed sampling; \({w_{j}^{m}}\) and \({w_{j}^{c}}\) are the mean and covariance weights corresponding to the j-th sampling point, respectively. According to (2) and (3), it can be known that f(ω) = ω and \(h(\omega ) = \sum \limits _{j=1}^{9}\frac {\omega _{jk}^{\ast \ast }}{1+exp\left [-\left (\sum \limits _{i=1}^{9}\omega _{ij}^{\ast }u_{i}+a_{j}\right )\right ]}+b_{l}\).

c. Calculate new measurement values (measurement update).

The predicted measurement mean \(\bar {y}^{i}_{k|k-1}\), the covariances of the statistics y, and the updated mean \(\bar {\omega }^{i}_{k}\) and variance \(\hat {P}^{i}_{k}\) are calculated as follows.

$$ \begin{aligned} \bar{y}^{i}_{k|k-1} = \sum\limits_{j=1}^{2n_{a}} {w_{j}^{m}}y^{i}_{j,k|k-1} \end{aligned} $$
(14)
$$ \begin{array}{@{}rcl@{}} P_{z_{k}z_{k}} &=& \sum\limits_{j=1}^{2n_{a}} {w_{j}^{c}}(y^{i}_{j,k|k-1}-\bar{y}^{i}_{k|k-1}) \\&&\times (y^{i}_{j,k|k-1}-\bar{y}^{i}_{k|k-1})^{T} \end{array} $$
(15)
$$ \begin{array}{@{}rcl@{}} P_{x_{k}z_{k}} &=& \sum\limits_{j=1}^{2n_{a}} {w_{j}^{c}}(\chi^{i}_{j,k|k-1}-\bar{\omega}^{i}_{k|k-1}) \\&&\times (y^{i}_{j,k|k-1}-\bar{y}^{i}_{k|k-1})^{T} \end{array} $$
(16)
$$ \begin{aligned} K_{k} = P_{x_{k}z_{k}}P_{z_{k}z_{k}}^{-1} \end{aligned} $$
(17)
$$ \begin{aligned} \bar{\omega}^{i}_{k} = \bar{\omega}^{i}_{k|k-1} + K_{k}(y_{k}-\bar{y}^{i}_{k|k-1}) \end{aligned} $$
(18)
$$ \begin{aligned} \hat{P}^{i}_{k} = P^{i}_{k|k-1} - K_{k}P_{z_{k}z_{k}}{K_{k}^{T}} \end{aligned} $$
(19)

d. Use a method based on density kernel estimation to construct the importance function \(q({\omega _{k}^{i}}|x_{0:k-1}^{i},y_{1:k}) = F(\cdot )\), and then utilize the mean \(\bar {\omega }^{i}_{k}\) and variance \(\hat {P}^{i}_{k}\) in the UKF to optimize the importance function of the PF, \(q({\omega _{k}^{i}}|x_{0:k-1}^{i},y_{1:k}) = \hat F(\cdot )\).

e. Calculate weights and normalize.

$$ \begin{aligned} {w^{i}_{k}} = w^{i}_{k-1}\frac{p(y_{k}|\bar{\omega}^{i}_{k})p(\bar{\omega}^{i}_{k}|\omega^{i}_{k-1})}{q(\bar{\omega}^{i}_{k}|\omega^{i}_{0:k-1},y_{1:k})} \end{aligned} $$
(20)
$$ \begin{aligned} \tilde{w}^{i}_{k} = {w^{i}_{k}} / \sum\limits_{j=1}^{N} {w_{k}^{j}} \end{aligned} $$
(21)

(3) SIR secondary resampling.

$$ \begin{aligned} N_{eff} = 1 / \sum\limits_{j=1}^{N} (\tilde{w}_{k}^{j})^{2} \end{aligned} $$
(22)

If \(N_{eff} < N_{th}\) (\(N_{th}\) is a set threshold, generally taken as N/3), it means that the weights of the particles have been seriously degraded, so residual resampling [36, 37] is needed; otherwise, the algorithm goes directly to the next step.

(4) k = k + 1, go to step (2).
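Steps (2)a to (2)c for a single particle can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: it uses the common 2n+1 sigma-point set with a centre point, treats Q and R as additive noise covariances, and picks α = 1, κ = 0 and β = 2 purely as illustrative defaults. The names `sigma_points`, `ukf_step` and `h_toy` are hypothetical.

```python
import numpy as np

def sigma_points(mean, cov, alpha=1.0, kappa=0.0, beta=2.0):
    """Sigma points and weights of the unscented transform, cf. Eq. (9)."""
    n = mean.size
    lam = alpha ** 2 * (n + kappa) - n
    root = np.linalg.cholesky((n + lam) * cov)               # columns are the offsets
    chi = np.vstack([mean, mean + root.T, mean - root.T])    # shape (2n+1, n)
    wm = np.full(2 * n + 1, 0.5 / (n + lam))                 # mean weights
    wc = wm.copy()                                           # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1.0 - alpha ** 2 + beta)
    return chi, wm, wc

def ukf_step(mean, cov, y_obs, h, Q, R):
    """One UKF update of a weight estimate, cf. Eqs. (10)-(19)."""
    # Time update: f is the identity, so the unscented transform leaves the
    # mean unchanged and the covariance only grows by the process noise Q.
    m_pred = mean
    P_pred = cov + Q
    chi, wm, wc = sigma_points(m_pred, P_pred)
    # Measurement update.
    Y = np.array([h(c) for c in chi])            # propagate each sigma point
    y_pred = wm @ Y                              # predicted measurement mean
    dx = chi - m_pred
    dy = Y - y_pred
    P_yy = dy.T @ (wc[:, None] * dy) + R         # measurement covariance
    P_xy = dx.T @ (wc[:, None] * dy)             # cross covariance
    K = np.linalg.solve(P_yy, P_xy.T).T          # gain K = P_xy P_yy^{-1}
    m_new = m_pred + K @ (y_obs - y_pred)
    P_new = P_pred - K @ P_yy @ K.T
    return m_new, P_new

# Toy usage: three "weights" and a hypothetical two-output measurement.
def h_toy(w):
    return np.array([np.tanh(w[0] + w[1]), w[2] ** 2])

mean, cov = np.zeros(3), np.eye(3)
Q, R = 1e-3 * np.eye(3), 1e-2 * np.eye(2)
mean, cov = ukf_step(mean, cov, np.array([0.3, 0.5]), h_toy, Q, R)
```

For a linear measurement function this update coincides with the ordinary Kalman filter, which gives a convenient sanity check before swapping in the perceptron of (3).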

The above steps are the process of the MSFNN algorithm. (The pseudocode of the MSFNN is given in Appendix A.) It can be seen that the MSFNN establishes an importance function F(⋅), which gets rid of the limitation of the Gaussian model and reduces the algorithm sensitivity to mixed noise. The accurate establishment of the importance function F(⋅) is a critical part of the incremental learning model applied to non-linear non-Gaussian systems.
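The weight update, normalization and resampling test of steps (2)e and (3) can be illustrated with the short sketch below. It works in log space for numerical stability and computes the effective particle number in its standard form, one over the sum of squared normalized weights. The function names and the randomly generated log-likelihoods are hypothetical; in the actual algorithm the three log-densities would come from the measurement model, the transition model and the importance function.

```python
import numpy as np

def update_log_weights(log_w_prev, log_lik, log_trans, log_q):
    """Eq. (20) in log space: previous weight times likelihood times transition over proposal."""
    return log_w_prev + log_lik + log_trans - log_q

def normalize_log_weights(log_w):
    """Eq. (21), computed stably by subtracting the largest log-weight first."""
    w = np.exp(log_w - np.max(log_w))
    return w / w.sum()

def effective_sample_size(w_norm):
    """N_eff = 1 / sum_j (w_j)^2 for normalized weights."""
    return 1.0 / np.sum(w_norm ** 2)

def residual_resample(rng, w_norm):
    """Indices of the particles kept by residual resampling."""
    n = w_norm.size
    counts = np.floor(n * w_norm).astype(int)    # deterministic copies
    n_left = n - counts.sum()
    if n_left > 0:
        residual = n * w_norm - counts           # leftover fractional mass
        residual = residual / residual.sum()
        counts = counts + rng.multinomial(n_left, residual)
    return np.repeat(np.arange(n), counts)

rng = np.random.default_rng(0)
N = 300
log_w_prev = np.full(N, -np.log(N))
log_lik = rng.normal(0.0, 3.0, N)                # hypothetical log-likelihoods
log_w = update_log_weights(log_w_prev, log_lik, np.zeros(N), np.zeros(N))
w = normalize_log_weights(log_w)

if effective_sample_size(w) < N / 3:             # threshold N_th = N/3
    idx = residual_resample(rng, w)              # indices of surviving particles
    w = np.full(N, 1.0 / N)                      # weights reset after resampling
```

With uniform weights the effective particle number equals N, and it falls toward 1 as the weight concentrates on a single particle, which is why it serves as the degeneracy indicator.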

On the convergence of the research framework, the MSFNN takes a NN as the basic model and adopts the UKF’s mean and variance to optimize the PF’s importance function based on the density kernel estimation method. Therefore, the NN’s convergence performance is not changed [38]. Moreover, the convergence characteristic of the MSF algorithm depends on the UKF and PF. To simplify and clarify the discussion, the convergence analysis of the MSFNN can be found in Appendix B.

This section systematically presents the framework of the MSFNN incremental learning model through an in-depth analysis of important links in the model construction process and integrates MSF theory, a NN model, and density kernel estimation.

Figure 4 graphically shows the flow of the MSFNN incremental learning algorithm. First, the MSFNN algorithm needs to initialize the model parameters. Second, the UT method is performed near the estimated points, and the Sigma point sets are calculated. Third, these Sigma points are updated with time and measurement values. Then, the density function F is obtained through the density kernel estimation method, which is modified and optimized by means of the mean and variance in the UKF. Finally, the parameter estimation value is imported into the NN to test the performance of the model. If the system’s sample increases or decreases (i.e., the inputs or outputs are changed), the model can adaptively update the NN’s model parameters to achieve a new dynamic balance.

Fig. 4 Algorithm flow of the MSFNN incremental learning model

In Fig. 4, the red text marks the main contribution and innovation of this paper, and the blue dashed boxes mark the important modules of the proposed method. Modules i, ii, iii and iv respectively represent updating the Sigma points by the UKF, constructing the importance function F, updating the model parameters by the PF, and testing the performance of the incremental learning model. The MSFNN algorithm proposed in this paper substantially optimizes the traditional BPNN model construction algorithm, mainly in the following respects:

(1) The traditional BPNN belongs to the category of static modeling. In contrast, the MSFNN uses a dynamic modeling mechanism, which can adjust the model parameters adaptively as the external or internal environment changes.

(2) To make the model suitable for parameter estimation under mixed noise, the MSF technique is proposed and, for the first time, combined with the NN.

(3) To reduce the influence of noise and improve the estimation accuracy of the NN’s parameters, this paper uses the UKF’s mean and variance to guide the PF’s importance function based on the density kernel estimation method, so that the probability density function obtained from the state estimation better approximates the real density function.
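To make the idea of a kernel-density-based importance function more concrete, the following one-dimensional sketch draws samples around a mean and standard deviation of the kind a UKF would supply, and then builds a Gaussian kernel density estimate over them. It only illustrates generic kernel density estimation and is not the construction of F used in this paper: the Gaussian kernel, the Silverman bandwidth rule, the numerical values and all function names are assumptions.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth h = 1.06 * std * n^(-1/5) (an assumed choice)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return 1.06 * math.sqrt(var) * n ** (-0.2)


def kde_density(x, samples, bandwidth):
    """Evaluate F(x) = 1/(n*h) * sum_i K((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / bandwidth) for s in samples) / (n * bandwidth)


def kde_sample(samples, bandwidth, rng):
    """Draw from the KDE: pick one stored sample and add kernel noise."""
    return rng.choice(samples) + rng.gauss(0.0, bandwidth)


rng = random.Random(0)
ukf_mean, ukf_std = 0.5, 0.2  # hypothetical UKF estimate of one NN parameter
samples = [rng.gauss(ukf_mean, ukf_std) for _ in range(200)]
h = silverman_bandwidth(samples)
new_particle = kde_sample(samples, h, rng)
```

Because the density is a sum of kernels centred on the samples, it can follow skewed or multimodal shapes that a single Gaussian cannot represent.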

4 Multiobjective incremental learning model based on the MSFNN in the electrolytic aluminium equipment process system

To ensure that the industrial process system has an accurate prediction performance and a good incremental learning ability, the above algorithm can be applied to an EAP system [39]. The main steps in establishing a multiobjective incremental learning model of the EAP based on the MSFNN algorithm are as follows:

Step 1: Import the input and output data of the EAP into the BPNN model to obtain the initial model parameters;

Step 2: Build a basic process model, as described in (2), based on the NN principle;

Step 3: Take the NN’s model parameters (i.e., weights and thresholds) as particles, and then perform an iterative loop according to the MSFNN algorithm ((7) to (22)) to obtain a new round of model parameters;

Step 4: Import the newly obtained model parameters into the BPNN model to test whether it meets the expected prediction result. If not, the above weights and thresholds are regarded as the particles in a new round of the MSFNN algorithm, and the iterative update continues until the expectation is met.

This paper uses the MSFNN as the theoretical framework to establish a multiobjective incremental learning prediction model for the unit power consumption and current efficiency of the EAP. The BPNN’s weights and thresholds are estimated by the MSF, which enables the production model of the EAP to have good adaptability, accurate predictive ability and a wide application range.

4.1 Experiment object and model parameters

In this paper, industrial experiments based on an electrolytic aluminium cell combining a shaped cathode and perforated anode are carried out [4], as shown in Fig. 5. In the figure, f1 represents the current efficiency, and f2 represents the power consumption of electrolytic aluminium. Ideally, the power consumption should be as low as possible, and the current efficiency should be as high as possible.

Fig. 5 The core components of the electrolytic aluminium cell

However, the process system of electrolytic aluminium equipment is complex and is characterized by nonlinearity, multiple parameters, strong coupling and noise redundancy. Moreover, it involves several operations, such as anode changing, bus lifting, shell punching and aluminium discharging [40, 41]. It is therefore difficult to obtain an accurate multiobjective incremental learning model of the EAP using traditional modeling methods. The proposed MSFNN algorithm can be applied to an environment with various kinds of complex noise, and it can update and track the real-time status of multiple targets in the EAP, which gives it the potential to obtain a high-precision process model.

By analyzing the operating variables related to the two goals (i.e., current efficiency and power consumption) in the EAP with the help of expert knowledge and a data acquisition system, nine main operating variables and two predicted objectives are identified; they are listed in Table 2. All 780 groups of sample data were collected from device No. 160 in a 170 kA series of electrolytic aluminium equipment. To verify the effectiveness of the presented framework with k-fold cross-validation, we divided the 780 samples into 10 disjoint subsets of equal size. In each round, the 78 samples of one subset were used as the testing set, and the other nine subsets (702 samples) were used as the training set.
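The data split described above can be sketched in a few lines. The sketch assumes that the samples are shuffled before being split, which the text does not state, and the helper names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k disjoint folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)  # assumption: random assignment of samples to folds
    fold_size = n_samples // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Spread any leftover samples over the first folds (none for 780 / 10).
    for j, idx in enumerate(indices[k * fold_size:]):
        folds[j].append(idx)
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


folds = k_fold_indices(780, 10, random.Random(0))
for train_idx, test_idx in train_test_splits(folds):
    pass  # train on the 702 training samples, evaluate on the 78 test samples
```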

Table 2 The sample data from No. 160 electrolytic aluminium equipment

The MSFNN presented in this study was used to build a 3-layer feedforward NN with 9 decision parameters as inputs and 2 outputs. The transfer functions of the second and third layers are Sigmoid and Purelin, respectively. The number of training iterations of the NN is set to 100. To balance precision and calculation speed, the second layer uses 9 neurons, which gives a 9-9-2 neural network structure in the experiment. With the same experimental object and environment, the PFNN, EPFNN, UPFNN and MSFNN models are used to predict the technical indicators of the real-world EAP.
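For orientation, the forward pass of such a 9-9-2 network can be sketched as below, with all weights and thresholds stored in one flat vector so that a filter can treat the vector as a single particle. The logistic form of the Sigmoid function, the ordering of the parameters inside the vector and the names `unpack` and `forward` are assumptions made for this illustration.

```python
import math

N_IN, N_HIDDEN, N_OUT = 9, 9, 2
# 9*9 hidden weights + 9 hidden thresholds + 9*2 output weights + 2 output thresholds
N_PARAMS = N_IN * N_HIDDEN + N_HIDDEN + N_HIDDEN * N_OUT + N_OUT


def sigmoid(z):
    """Logistic transfer function of the hidden layer (assumed form)."""
    return 1.0 / (1.0 + math.exp(-z))


def unpack(theta):
    """Split one flat parameter vector into weights and thresholds."""
    assert len(theta) == N_PARAMS
    pos = 0
    w1 = [theta[pos + j * N_IN:pos + (j + 1) * N_IN] for j in range(N_HIDDEN)]
    pos += N_IN * N_HIDDEN
    b1 = theta[pos:pos + N_HIDDEN]
    pos += N_HIDDEN
    w2 = [theta[pos + k * N_HIDDEN:pos + (k + 1) * N_HIDDEN] for k in range(N_OUT)]
    pos += N_HIDDEN * N_OUT
    b2 = theta[pos:pos + N_OUT]
    return w1, b1, w2, b2


def forward(theta, x):
    """Sigmoid hidden layer followed by a linear (Purelin) output layer."""
    w1, b1, w2, b2 = unpack(theta)
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]
```

With this layout, each particle is a vector of 110 numbers, and evaluating a particle means running `forward` on the training inputs and comparing the two outputs with the measured current efficiency and power consumption.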

4.2 Analysis and discussion of experimental results

The following experimental results of the multiobjective prediction model based on the EAP all come from the PFNN, EPFNN, UPFNN and MSFNN algorithms. All process samples use the daily data shown in Table 2, and the experiment platform uses MATLAB R2014b (CPU: i7-9750H; RAM: 8.00GB; GPU: GTX 1660 Ti).

In Fig. 6, we select some weights and thresholds (such as \(\omega _{11}^{\ast }\), a1, \(\omega _{11}^{\ast \ast }\) and b1) represented in (3) to graphically show the evolution during the learning process. Figs. 7 and 8 show the fitting effects of DC power consumption and current efficiency by establishing the multiobjective prediction model for the EAP based on the above four algorithms. Figure 9 shows the comprehensive comparison effects of using the above four algorithms to predict the performance indicators (DC power consumption and current efficiency) of the EAP system.

Fig. 6 The evolution of some weights and thresholds during the learning process

Fig. 7 Prediction output of DC power consumption from the PFNN, EPFNN, UPFNN and MSFNN

Fig. 8 Prediction output of current efficiency from the PFNN, EPFNN, UPFNN and MSFNN

Fig. 9 Comparison of multi-objective prediction effects from the PFNN, EPFNN, UPFNN and MSFNN

Figure 10(a) intuitively shows the relative error percentage of the DC power consumption when using the four algorithms to predict the EAP model; Fig. 10(b) depicts the relative error percentage of the current efficiency when predicting the EAP system model based on the 4 algorithms. It can be seen that the relative error of the MSFNN algorithm is smaller than that of the other models, which demonstrates that the fitting effect of the MSFNN is better than that of the other three models. According to the experimental results, it has been verified that it is feasible to combine the MSF and the NN. Overall, the performance of the MSFNN model is more in line with the true characteristics of the EAP.

Fig. 10 The relative error percentage of different multi-objective prediction models

Table 3 compares the multiobjective prediction models established by the above four algorithms using six evaluation criteria [42]. Table 3 shows that the prediction error of the PFNN model is larger than that of the other models and that the prediction error of the MSFNN process model is the smallest, indicating that the MSFNN has the highest prediction accuracy of the four. This also indirectly confirms that the MSF technique plays a significant role in exploring the optimal model, which helps to further improve model performance and obtain the best parameter estimates.

Table 3 The comparison of related performance indicators from different models

Nonparametric significance tests [47, 48] (i.e., the Wilcoxon rank-sum test, Friedman test and Nemenyi test) are adopted to analyze whether the differences among the algorithms are significant. The test results, with the MSFNN model as the comparison object, are shown in Table 4. They show that the developed framework differs significantly from the other algorithms. Furthermore, the time and space complexity of the algorithms are also analyzed and shown in Table 4. Although the MSFNN is obtained by successively optimizing the PFNN, its complexity does not grow as a result, which again reflects the advantage of the proposed algorithm.

$$ SSE=\sum\limits_{i=1}^{T} (y_{i}-y)^{2} $$
(23)
$$ MSE=\frac{1}{T} \sum\limits_{i=1}^{T} (y_{i}-y)^{2} $$
(24)
$$ RMSE=\sqrt{\frac{1}{T} \sum\limits_{i=1}^{T} (y_{i}-y)^{2}} $$
(25)

where yi is the predicted value of testing samples; y is the true value of testing samples; and T is the number of testing sample groups.
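Equations (23) to (25) translate directly into code. The sketch below assumes that the true value is taken sample by sample, that is, each prediction is compared with its own measurement; the function names are illustrative.

```python
import math


def sse(predicted, actual):
    """Sum of squared errors, Eq. (23)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def mse(predicted, actual):
    """Mean squared error, Eq. (24)."""
    return sse(predicted, actual) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error, Eq. (25)."""
    return math.sqrt(mse(predicted, actual))
```

The RMSE has the same unit as the predicted quantity, which makes it easier to compare with the measured values than the SSE or MSE.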

Table 4 The comparison of related performance indicators from different models

To better evaluate the prediction performance of different models and reduce the influence of overfitting on the proposed model, Tables 5 and 6 respectively give the statistical results of relevant performance indexes from DC power consumption and current efficiency based on k-fold cross-validation [49] with k = 10. The evaluation criteria include the mean absolute error (MAE), the mean relative error (MRE), and the correlation coefficient (R) [50]. Table 7 shows the statistical analysis results based on Tables 5 and 6, which better analyzes the 10 independent cross-validation tests of different algorithms. In Table 7, the comparison indicators include seven different levels. Based on the results, it can be seen that each indicator of the MSFNN algorithm is superior to other algorithms. The effectiveness of the proposed algorithm is proven again.

$$ MAE=\frac{1}{n} \sum\limits_{i=1}^{n} |y_{i}-\hat{y}_{i}| $$
(26)
$$ MRE=\frac{1}{n} \sum\limits_{i=1}^{n} |\frac{\hat{y}_{i}-y_{i}}{y_{i}}| $$
(27)
$$ R=\frac{{\sum}_{i=1}^{n} (\hat{y}_{i}-\bar{\hat{y}})(y_{i}-\bar{y})}{\sqrt{{\sum}_{i=1}^{n} (\hat{y}_{i}-\bar{\hat{y}})^{2}{\sum}_{i=1}^{n} (y_{i}-\bar{y})^{2}}} $$
(28)

where yi is the true value of testing samples; \(\hat{y}_{i}\) is the predicted value of testing samples; \(\bar{y}\) and \(\bar{\hat{y}}\) are the average values of yi and \(\hat{y}_{i}\), respectively; and n is the group number of testing samples.
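The three criteria can be sketched in code as follows. The MAE and MRE follow (26) and (27); for R, the sketch assumes the standard Pearson correlation coefficient between the true and predicted values. The function names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, Eq. (26)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mre(actual, predicted):
    """Mean relative error, Eq. (27); every true value must be non-zero."""
    return sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def correlation(actual, predicted):
    """Pearson correlation coefficient between true and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((p - mean_p) * (a - mean_a) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)
```

Under k-fold cross-validation with k = 10, each criterion is computed once per fold on the 78 test samples, which yields ten values per model and per objective.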

Table 5 The statistical results of related performance indexes based on 10 cross-validation tests from PFNN, EPFNN, UPFNN and MSFNN models (DC aluminium power consumption)
Table 6 The statistical results of related performance indexes based on 10 cross-validation tests from PFNN, EPFNN, UPFNN and MSFNN models (Current efficiency)
Table 7 The statistical analysis of performance indexes from 10 independent cross-validation tests

Due to the complex production process of aluminium electrolysis, the cell condition information is dynamic and time-varying. To further verify the compensation ability of the proposed method for parameter variations and disturbance signals, disturbances of 5%, 10% and 15% are artificially imposed on each model [4], as shown in Fig. 11(a)–(c). Fig. 11 shows that the DC energy consumption and current efficiency predicted by the MSFNN can still resist, to a certain extent, the influence of interference signals in the electrolytic cell, which indicates the effectiveness of this method.

Fig. 11 The compensation ability of different methods under parameter variations and interference signals

By discussing the above experimental results, the fundamental reasons why the proposed method has better results than other methods are analyzed as follows:

(1) Although the PFNN can handle nonlinear and non-Gaussian parameter estimation problems, its accuracy gradually decreases with an increasing number of sampling particles. The main reason is particle degradation, that is, the weights of most particles decrease during the iterative particle update. The MSFNN instead uses Sigma points obtained by the UT method to guide the importance function, thus reducing the number of sampling particles and weakening the influence of particle degradation.

(2) The EPFNN uses the EKF to obtain sampling points for updating the importance function of the PF, while the MSFNN uses a “multi-source filter” that updates the importance function of the PF with Sigma points obtained by the UT method. Because the EKF linearizes the model by discarding higher-order terms, its accuracy is poorer than that of the UKF based on the UT method.

(3) Both the UPFNN and MSFNN use the mean and variance of Sigma points obtained by the UT method instead of the real mean and variance to achieve a continuous sampling distribution. However, the MSFNN constructs an importance function F(x) based on density kernel estimation, which removes the limitation of previous filters, where the importance function is almost always represented by the Gaussian distribution \(N(\bar {x},\sigma )\) established by the mean \(\bar {x}\) and variance σ of samples. This expands the application range of the filtering algorithm and reduces its sensitivity to mixed noise.
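The accuracy argument in point (2) can be illustrated with a small worked example that is not taken from the experiments: a scalar Gaussian variable is passed through f(x) = x², whose exact mean is μ² + σ². A first-order linearization of the kind used by the EKF predicts f(μ), whereas the basic unscented transform with three Sigma points recovers the exact mean. The parameter choice κ = 2 and the function names are assumptions of this sketch.

```python
import math


def linearized_mean(f, mean):
    """First-order (EKF-style) prediction of E[f(x)]: evaluate f at the mean."""
    return f(mean)


def unscented_mean(f, mean, std, kappa=2.0):
    """Basic unscented transform for a scalar state (n = 1)."""
    n = 1
    spread = math.sqrt(n + kappa) * std
    sigma_points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    return sum(w * f(p) for w, p in zip(weights, sigma_points))


def square(x):
    return x * x


mu, sigma = 1.0, 0.5
exact = mu ** 2 + sigma ** 2              # exact E[x^2] for a Gaussian
ekf_like = linearized_mean(square, mu)    # misses the sigma^2 term
ut = unscented_mean(square, mu, sigma)    # matches the exact mean for x^2
```

The gap between `ekf_like` and `exact` grows with the variance, which is why linearization becomes less reliable as the uncertainty of the estimated parameters increases.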

5 Conclusion

A multi-source filter neural network (MSFNN) algorithm is developed for exploring the system model’s predictive potential. To apply the algorithm to an environment with mixed noise, the MSF technique is presented first. The MSF utilizes the mean and variance in the UKF to optimize the PF’s importance function based on the density kernel estimation method. Then, the MSF employs the particles to evaluate the weights and thresholds of the NN. Finally, a multiobjective incremental learning prediction model based on the MSFNN for EAP systems is established. The performance comparison between the MSFNN and other electrolytic aluminium models established by the PFNN, EPFNN, and UPFNN algorithms shows that the multiobjective incremental learning model established by the MSFNN has high prediction accuracy and low sensitivity to noise interference, which greatly improves the adaptability of the EAP model.

However, this method is only applicable when production data are available and the operation parameters are controllable. Moreover, although the MSFNN algorithm alleviates particle degradation, the problem still exists. In the future, the clustering kernel function smoothing method will be explored to overcome particle shortage in the MSFNN algorithm, and the construction of deep filtering networks will be investigated.