1 Introduction

An ARX model is an AutoRegressive model with eXogenous terms [31]. Because of its simplicity and easy parameterization, the ARX model has been widely used to model many real systems, such as micro-turbines, data improving, fault detection, biomedical signals, COVID-19 case forecasting, and communication systems [1, 3, 7, 28, 34, 44].

Much research on the identification of ARX models has been performed over the last five decades. A piecewise autoregressive exogenous structure was adopted to forecast river floods [18]. A novel automated framework based on generalized spectral decomposition was proposed to estimate the parameters of an ARX model [33]. A method based on the expectation–maximization (EM) algorithm was utilized for the identification of ARX models subject to missing data [24]. A recursive EM algorithm based on Student’s t-distribution was used for robust identification of ARX models [8]. A modified momentum gradient descent algorithm was investigated to identify ARX models [50]. A three-stage algorithm was studied for the identification of fractional differencing ARX models with errors in variables [25].

However, most of the noises considered in the aforementioned papers are white or Gaussian. In practice, random impulse noise can often be found in industrial signals, such as image, audio, and communication signals [2, 9, 46], and algorithms designed under the Gaussian assumption can perform poorly in its presence.

Identification criteria play an important role in system identification. Classical criteria include the least-squares criterion, the maximum likelihood criterion, and so on. These criteria have the advantages of simple calculation and easy theoretical analysis. However, the performance of the least-squares algorithm is poor in the non-Gaussian case, and the maximum likelihood algorithm needs to know the conditional probability density of the samples. Because of these problems, researchers have put forward many other criteria, such as the \(p\)-norm and mixed-norm error criteria [39, 53]. In recent years, information criteria have become more widespread in signal processing and system identification [14, 23, 32]. Compared with the mean square error (MSE) criterion, which focuses on second-order statistics, information-theoretic criteria (e.g., minimum error entropy (MEE) [4], Renyi’s entropy [15, 41], fixed-point maximum correntropy [21]) are related to the full statistical behavior of the probability density function (pdf) of the error. Algorithms based on information-theoretic criteria may therefore outperform MSE-based algorithms [6, 15, 16].

In the last decade, entropy has found significant applications in system identification. A maximum correntropy criterion (MCC) algorithm was proposed for sparse system identification based on normalized adaptive filter theory [30]. An extended version of correntropy whose center can be located at any position, together with a new optimization criterion based on MCC with a variable center, was proposed [5]. A blocked proportionate normalized maximum correntropy algorithm and a separable maximum correntropy adaptive algorithm were presented to identify dynamic systems [29, 45].

To decrease the complexity of entropy estimators, a stochastic information gradient (SIG) algorithm was proposed and its performance was investigated [14]. To improve the estimates, a joint stochastic gradient algorithm based on MSE and MEE was proposed and applied to identify an RBF network [4]. Although it has low complexity, the SIG converges slowly. To speed it up, a multi-error strategy similar to the multi-innovation approach of [12] is adopted in this paper, and a simple formula for calculating the step size is introduced.

Since its introduction in 2003, the SIG algorithm has been widely used in system identification and machine learning. For example, a kernel-based gradient descent algorithm based on MEE was proposed to find nonlinear structures in data, and its convergence rate was derived [22]. A kernel adaptive filter for quaternion data was developed, and a new algorithm based on the SIG approach was applied to this filter [37]. To avoid unstable training or poor performance in deep learning, a strategy of directly estimating the gradients of information measures with respect to model parameters was explored, and a general gradient estimation method for information measures was proposed [52]. To avoid potentially sub-optimal solutions with respect to class separability, a dimensionality reduction network training procedure based on the stochastic estimate of the mutual information gradient was presented [38].

For several decades, data-driven techniques have been used in modeling and fault detection. For example, a surrogate model was developed based on a data-driven approach to facilitate the design and optimization of permanent magnet systems [36]. A Matlab toolbox for data-based fault detection was developed in a unified data-driven framework [27]. A new recursive total principal component regression-based design and implementation approach was proposed for efficient data-driven fault detection and applied to vehicular cyber-physical systems [26].

In this paper, the problem of parameter identification of an ARX model disturbed by random impulse noise is studied. We assume that the model structure is known, the type of noise is known, and the identification data are normal measurement data with no outliers other than the impulse noise. Possible outliers, modeling errors, and other uncertainties encountered in practice are not considered; interested readers can refer to the recent literature [13, 47–49]. The main contributions of this work are as follows:

(1) For the SIG algorithm, a simple step-size calculation method is proposed.

(2) To make the algorithm converge faster, a multi-error method that uses a stacked error vector instead of the instantaneous scalar error is applied.

(3) Since the stack length can only take integer values, a forgetting factor is used to further accelerate the algorithm.

(4) The proposed algorithm is utilized to estimate the parameters of an ARX model with random impulse noise. Several numerical simulations and a case study show the effectiveness of the algorithm.

The rest of this work is organized as follows. In the next section, we describe the ARX model to be estimated. The SIG algorithm is introduced in Sect. 3, and a multi-error SIG with a forgetting factor is presented in Sect. 4. The convergence and the computational cost are analyzed in Sect. 5. Then, parameter estimation of an ARX model with random impulse noise and a gas furnace dataset from the literature [42] are used to validate the proposed algorithm in Sect. 6. Finally, conclusions are presented in Sect. 7.

2 Problem Description

Consider the ARX model depicted in Fig. 1, where u(k) is the input and y(k) is the output. \(A({z^{ - 1}})\) and \(B({z^{ - 1}})\) are two polynomials in \(z^{-1}\) with degrees \(n_a\) and \(n_b\), respectively. The model is corrupted by random impulse noise v(k).

Fig. 1 Block diagram of an ARX model

It is easy to find that

$$\begin{aligned} y(k)=\frac{B({{z}^{-1}})}{A({{z}^{-1}})}u(k)+\frac{1}{A({{z}^{-1}})}v(k). \end{aligned}$$
(1)

Multiplying both sides of Eq. 1 by \({A({{z}^{-1}})}\) gives

$$\begin{aligned} A({{z}^{-1}})y(k)=B({{z}^{-1}})u(k)+v(k). \end{aligned}$$
(2)

Suppose \(A({z^{ - 1}})=1+a_1 z^{-1}+a_2 z^{-2}+\cdots +a_{n_a}z^{-{n_a}}\) and \(B({z^{ - 1}})=b_1 z^{-1}+b_2 z^{-2}+\cdots +b_{n_b}z^{-{n_b}}\), then we can directly parameterize the model as follows,

$$\begin{aligned} \begin{aligned} y(k)&=\left( 1-A({{z}^{-1}}) \right) y(k)+B({{z}^{-1}})u(k)+v(k), \\&={{{\varphi }}^T}(k){{\theta }}+v(k), \end{aligned} \end{aligned}$$
(3)

with

$$\begin{aligned} \left\{ \begin{array}{l} {{\theta }} = {\left[ a_1,\cdots , a_{n_a}, b_1,\cdots , b_{n_b} \right] ^T} \in {{\mathbb {R}}^{n \times 1}},\\ {{{\varphi }}}(k) = {\left[ -y(k-1),\cdots , -y(k-n_a), u(k-1),\cdots , u(k-n_b) \right] ^T} \in {{\mathbb {R}}{^{{n} \times 1}}},\\ n = {n_a} + {n_b } . \end{array} \right. \end{aligned}$$
(4)

Then, the identification of the ARX model shown in Fig. 1 can be transformed into the estimation of the parameters \({{{\theta }}}\) based on the observations \(\left\{ {u(k),y(k)} \right\} _{k = 1}^N\) , where N is the data length.
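To make the parameterization concrete, the following Python sketch (using NumPy; the function name and variable layout are ours, not from the paper) builds the regressor \(\varphi (k)\) of Eq. 4 from recorded input–output samples.

```python
import numpy as np

def regressor(y, u, k, na, nb):
    """Regressor phi(k) of Eq. 4, built from past outputs and inputs.

    y, u : 1-D arrays of output/input samples, indexed from 0.
    k    : current time index, with k >= max(na, nb).
    """
    past_y = [-y[k - j] for j in range(1, na + 1)]  # -y(k-1), ..., -y(k-na)
    past_u = [u[k - j] for j in range(1, nb + 1)]   # u(k-1), ..., u(k-nb)
    return np.array(past_y + past_u)                # phi(k) in R^{na+nb}
```

With this helper, the model of Eq. 3 reads \(y(k)={\varphi }^T(k){\theta }+v(k)\) with \({\theta }=[a_1,\ldots ,a_{n_a},b_1,\ldots ,b_{n_b}]^T\).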

However, traditional identification algorithms, such as the least-squares algorithm and algorithms based on the mean square error criterion, consider only the second moment of the error, so in some cases (such as in the presence of random impulse noise) the identification results deteriorate severely. An information-criterion algorithm based on the probability density function (pdf) exploits the statistical information of all orders of the error and is expected to achieve better estimates.

Next, we introduce the SIG algorithm and then describe our algorithm based on the information gradient, which integrates a multi-error strategy and a forgetting factor.

3 SIG of Shannon’s Error Entropy

Consider the parameterized system in Eq. 3, and denote a random error e(k) as

$$\begin{aligned} e(k)=y^*(k)-{{{\varphi }}^{T}}(k){\theta }, \end{aligned}$$
(5)

where \(y^*(k)\) is the system output without noise.

Shannon’s entropy for e with pdf f(e) is [43]

$$\begin{aligned} H(e)=-\int _{-\infty }^{\infty } f(e) \log f(e) d e=E\left[ -\log f(e)\right] . \end{aligned}$$
(6)

In practice, the pdf of e, i.e., f(e), is unknown. Thus, Eq. 6 cannot be used to calculate the entropy of e directly. One way is to utilize a Parzen window to approximate the unknown pdf underlying the N observations by [41]

$$\begin{aligned} {\hat{f}}(e)=\frac{1}{N} \sum _{i=1}^{N} \kappa _{\sigma }\left( e-e(i)\right) , \end{aligned}$$
(7)

where \(\kappa _{\sigma }(\cdot )\) is the kernel function with size \(\sigma \) [40].
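As an illustration, the Parzen estimate of Eq. 7 with the (unnormalized) Gaussian kernel that will be adopted in Eq. 12 takes only a few lines; the kernel size \(\sigma \) is a design parameter, and the function names are ours.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    # kappa_sigma(x) = exp(-x^2 / (2 sigma^2)); the 1/(sqrt(2 pi) sigma)
    # normalization cancels in the SIG gradient (Eq. 13) and is omitted,
    # matching Eq. 12.
    return np.exp(-x**2 / (2.0 * sigma**2))

def parzen_pdf(e, samples, sigma):
    """Parzen window estimate f_hat(e) over the error samples (Eq. 7)."""
    return np.mean(gaussian_kernel(e - np.asarray(samples), sigma))
```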

At time k, a Parzen window estimate of the pdf of e with window length L is

$$\begin{aligned} {\hat{f}}\left( e(k)\right) =\frac{1}{L} \sum _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) , \end{aligned}$$
(8)

where \(\varDelta _{ki}=e(k)-e(i)\), and e(k) and e(i) denote the errors at times k and i, respectively.

Thus, the stochastic entropy estimate at time k becomes

$$\begin{aligned} {\hat{H}} (e(k))=E\left[ -\log \left( \frac{1}{L} \sum _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) \right) \right] . \end{aligned}$$
(9)

Dropping the expectation in Eq. 9 [14], we obtain

$$\begin{aligned} {\hat{H}} (e(k))\approx -\log \left( \frac{1}{L} \sum _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) \right) . \end{aligned}$$
(10)

The stochastic gradient of Shannon’s entropy with respect to \({\theta }\) at time k, \({\mathbf {g}}\), is

$$\begin{aligned} {\mathbf {g}}=-\frac{\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }^{\prime }\left( \varDelta _{ki}\right) \left( \frac{\partial e(k)}{\partial {\theta }}-\frac{\partial e(i)}{\partial {\theta }}\right) }{\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) }, \end{aligned}$$
(11)

where \(\kappa _{\sigma }^{\prime }(\cdot )\) is the derivative of the kernel function.

Using the following Gaussian kernel with variance \(\sigma ^2\), i.e.,

$$\begin{aligned} \kappa _{\sigma }\left( {\varDelta _{ki}}\right) =\exp \left( -\frac{\left\| {\varDelta _{ki}}\right\| ^{2}}{2 \sigma ^{2}}\right) , \end{aligned}$$
(12)

Noting that \(\kappa _{\sigma }^{\prime }\left( \varDelta _{ki}\right) =-\frac{\varDelta _{ki}}{\sigma ^{2}}\kappa _{\sigma }\left( \varDelta _{ki}\right) \) and that \(\frac{\partial e(k)}{\partial {\theta }}=-{\varphi }(k)\) from Eq. 5 (so that \({\mathbf {g}}\) in Eq. 15 is a descent direction for the entropy estimate), Equation 11 becomes

$$\begin{aligned} {\mathbf {g}}=\frac{\sum \limits _{i=k-L}^{k-1}{{{\kappa }_{\sigma }}\left( {{\varDelta }_{ki}} \right) {{\epsilon }_{ki}}{{\varDelta }_{ki}}}}{\sigma ^2\sum \limits _{i=k-L}^{k-1}{{{\kappa }_{\sigma }}\left( {{\varDelta }_{ki}} \right) }}, \end{aligned}$$
(13)

with

$$\begin{aligned} \epsilon _{ki}={\varphi }(k)-{\varphi }(i). \end{aligned}$$
(14)

The SIG for estimating the parameter vector \({\theta }\) is obtained as follows:

$$\begin{aligned} {\hat{\theta }}(k)={\hat{\theta }}(k-1)+ {\eta (k)} {\mathbf {g}}, \end{aligned}$$
(15)

where \( \eta (k)\) is the step size, which is critical for the convergence speed. However, the step-size formulas in [4] and [14] are too complicated for online operation. Here, we utilize the rule used in stochastic gradient algorithms [10]:

$$\begin{aligned} \left\{ \begin{aligned}&r(k)=r(k-1)+\left\| {\varphi }(k) \right\| ^2, \quad r(0)=1,\\&\eta (k)=\frac{1}{r(k)}. \end{aligned} \right. \end{aligned}$$
(16)

In practice, \({\theta }\) and the noise-free output \(y^{*}(k)\) in Eq. 5 are unknown. A feasible way is to replace them with \(\hat{{\theta }}(k-1)\) and y(k), respectively. Thus, Eq. 5 becomes

$$\begin{aligned} e(k)=y(k)-{{{\varphi }}^{T}}(k)\hat{{\theta }}(k-1). \end{aligned}$$
(17)
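Putting Eqs. 13 and 15–17 together, a minimal sketch of the plain SIG recursion is given below. It reuses the illustrative regressor() helper from Sect. 2; the small constant added to the denominator is our addition and only guards against numerical underflow of the kernel weights.

```python
import numpy as np

def sig_identify(y, u, na, nb, L=3, sigma=1.0):
    """Plain SIG estimation of theta via Eqs. 13 and 15-17 (a sketch)."""
    n = na + nb
    theta = np.zeros(n)
    r = 1.0                                    # r(0) = 1 in Eq. 16
    for k in range(max(na, nb) + L, len(y)):
        phi_k = regressor(y, u, k, na, nb)
        e_k = y[k] - phi_k @ theta             # a-priori error, Eq. 17
        num, den = np.zeros(n), 0.0
        for i in range(k - L, k):              # window of L past errors
            phi_i = regressor(y, u, i, na, nb)
            d = e_k - (y[i] - phi_i @ theta)   # Delta_{ki} = e(k) - e(i)
            w = np.exp(-d**2 / (2 * sigma**2)) # Gaussian kernel, Eq. 12
            num += w * (phi_k - phi_i) * d     # numerator of Eq. 13
            den += w
        g = num / (sigma**2 * den + 1e-12)     # Eq. 13; eps avoids underflow
        r += phi_k @ phi_k                     # step-size rule, Eq. 16
        theta += g / r                         # update, Eq. 15
    return theta
```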

4 Forgetting Factor Multi-error SIG Algorithm

One drawback of the SIG algorithm is its slow convergence. To make the algorithm converge faster, a multi-error strategy is adopted, and Eq. 13 is rewritten as follows:

$$\begin{aligned} {\mathbf {g}}=\frac{\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) \varDelta {\varvec{\Phi } }(p;k,i)\varDelta {\mathbf {E}}(p;k,i)}{\sigma ^2\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) } \end{aligned}$$
(18)

with

$$\begin{aligned} \left\{ \begin{aligned}&\varDelta {\varvec{\Phi } }(p;k,i)={\varvec{\Phi } }(p,k)-{\varvec{\Phi } }(p,i), \\&\varDelta {\mathbf {E}}(p;k,i)={\mathbf {E}}(p,k)-{\mathbf {E}}(p,i), \\ \end{aligned} \right. \end{aligned}$$
(19)

where p is the stack length and \({\mathbf {E}}(p,k)\) and \({\varvec{\Phi } }(p,k)\) are the stacked error vector and stacked information matrix, respectively,

$$\begin{aligned} {\mathbf {E}}(p, k)=\left[ \begin{array}{c} e(k) \\ e(k-1) \\ \vdots \\ e(k-p+1) \end{array}\right] \in {\mathbb {R}}^{p \times 1}, \end{aligned}$$
(20)

and

$$\begin{aligned} \varvec{\Phi }(p, k)= [{\varphi }(k), {\varphi }(k-1), \cdots , {\varphi }(k-p+1)] \in {\mathbb {R}}^{n \times p}. \end{aligned}$$
(21)

Note that the scalar error \(\varDelta _{ki}\) in Eq. 13 is replaced by the vector error \(\varDelta {\mathbf {E}}(p;k,i)\) in Eq. 18. In other words, a multi-error takes the place of a single error. Thus, the algorithm is named the multi-error SIG (ME-SIG) algorithm.

The stack length p can only be a positive integer. To make the ME-SIG converge still faster, a forgetting factor (FF) \(\lambda \) is introduced, and the first equation of Eq. 16 becomes

$$\begin{aligned} r(k)=\lambda r(k-1)+\left\| {{\varphi }(k)} \right\| ^2, \quad r(0)=1. \end{aligned}$$
(22)

Equations 15 and 17–22 constitute the FF-ME-SIG algorithm.
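A minimal sketch of the resulting FF-ME-SIG recursion (Eqs. 15 and 17–22) follows. It again assumes the regressor() helper from Sect. 2; following Eq. 13, the kernel weight is computed from the scalar error difference \(\varDelta _{ki}\), and all names and default values are illustrative.

```python
import numpy as np

def ff_me_sig(y, u, na, nb, L=3, p=5, lam=0.99, sigma=1.0):
    """FF-ME-SIG estimation of theta via Eqs. 15 and 17-22 (a sketch)."""
    n = na + nb
    theta = np.zeros(n)
    r = 1.0

    def phi(k):                                 # regressor() from Sect. 2
        return regressor(y, u, k, na, nb)

    def E(k):                                   # stacked errors, Eq. 20
        return np.array([y[j] - phi(j) @ theta for j in range(k, k - p, -1)])

    def Phi(k):                                 # stacked regressors, Eq. 21
        return np.column_stack([phi(j) for j in range(k, k - p, -1)])

    for k in range(max(na, nb) + L + p, len(y)):
        e_k = y[k] - phi(k) @ theta             # Eq. 17
        E_k, Phi_k = E(k), Phi(k)
        num, den = np.zeros(n), 0.0
        for i in range(k - L, k):
            d = e_k - (y[i] - phi(i) @ theta)   # scalar Delta_{ki}
            w = np.exp(-d**2 / (2 * sigma**2))  # kernel weight, Eq. 12
            num += w * (Phi_k - Phi(i)) @ (E_k - E(i))  # Eq. 18 numerator
            den += w
        g = num / (sigma**2 * den + 1e-12)
        r = lam * r + phi(k) @ phi(k)           # forgetting factor, Eq. 22
        theta += g / r                          # Eq. 15
    return theta
```

Note that \((\varvec{\Phi }(p,k)-\varvec{\Phi }(p,i))\) is \(n\times p\) and \(({\mathbf {E}}(p,k)-{\mathbf {E}}(p,i))\) is \(p\times 1\), so each term of the numerator of Eq. 18 is an \(n\times 1\) vector, matching the dimension of \({\theta }\).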

5 Performance Analysis

5.1 Convergence Analysis

The approximate linearization approach [17] is used to analyze the convergence of the proposed ME-SIG algorithm in Eq. (15) with Eqs. (18) and (19).

Subtracting \({\theta }_0\) from both sides of Eq. (15), we obtain

$$\begin{aligned} {{\tilde{\theta }}}(k)={{\tilde{\theta }}}(k-1)+ {\eta (k)} {\mathbf {g}}, \end{aligned}$$
(23)

where \({{\tilde{\theta }}}(k)={\hat{\theta }}(k)-{{\theta }}_{0}\) is the estimation error vector of the parameter.

Approximating the gradient \({\mathbf {g}}\) in Eq. (18) by a first-order Taylor expansion around \(\theta _0\), and noting that the gradient vanishes at the true parameter vector, i.e., \({\mathbf {g}}(\theta _0)={\mathbf {0}}\):

$$\begin{aligned} \begin{aligned} {\mathbf {g}}&\approx {\mathbf {g}}(\theta _0)+{H}_1 \left( {\hat{\theta }}(k-1)-{{\theta }_0}\right) \\&={H}_1 \left( {\hat{\theta }}(k-1)-{{\theta }_0}\right) \\&={H}_1 {{\tilde{\theta }}}(k-1), \end{aligned} \end{aligned}$$
(24)

where \(H_1=\frac{\partial {\mathbf {g}}^{T}(\theta _0)}{\partial {\theta }}\) is the Hessian matrix and is expressed as

$$\begin{aligned} \begin{aligned} {H}_1=&\frac{\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) {\left( \varDelta {\mathbf {E}}^T(p;k,i)\right) }'\varDelta {\varvec{\Phi }}^T(k,i) }{\sigma ^2\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) }\\&+\frac{\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) {\varDelta {\mathbf {E}}^T(p;k,i)\left( \varDelta {\varvec{\Phi }}^T(k,i) \right) }'}{\sigma ^2\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) }\\&-\frac{\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }\left( \varDelta _{ki}\right) {\varDelta {\mathbf {E}}^T(p;k,i) \varDelta {\varvec{\Phi }}^T(k,i) } \sum \limits _{i=k-L}^{k-1} \kappa '_{\sigma }\left( \varDelta _{ki}\right) }{\sigma ^2\sum \limits _{i=k-L}^{k-1} \kappa _{\sigma }^2\left( \varDelta _{ki}\right) }.\\ \end{aligned} \end{aligned}$$
(25)

Substituting Eq. (24) into Eq. (23), we obtain

$$\begin{aligned} {{\tilde{\theta }}}(k)={{\tilde{\theta }}}(k-1)+ {\eta (k)} {{H}_1 }{{{\tilde{\theta }}}(k-1)}, \end{aligned}$$
(26)

We analyze the convergence of Eq. 26 by borrowing results from LMS convergence theory [19, 20]. Assume that the Hessian matrix \(H_1\) is a normal matrix that can be decomposed into the following form:

$$\begin{aligned} {H}_1=Q_1 \varLambda _1 Q_1^{-1}, \end{aligned}$$
(27)

where \(Q_1\) is an \( m \times m \) orthogonal matrix and \( \varLambda _1={\text {diag}}[\gamma _1, \gamma _2, \cdots , \gamma _m]\), with \(\gamma _j\) the eigenvalues of \(H_1\). Then, the recursion in Eq. 26 can be expressed as

$$\begin{aligned} \begin{aligned} {{{\tilde{\theta }}}(k)}&=Q_1\left[ I+\eta (k) \varLambda _1\right] Q_1^{-1} {{{\tilde{\theta }}}(k-1)}\\&=Q_1\left[ \prod \limits _{i=1}^{k}{\left( I+\eta (i)\varLambda _1 \right) }\right] Q_1^{-1} {{{\tilde{\theta }}}(0)}.\\ \end{aligned} \end{aligned}$$
(28)

Clearly, if the following conditions are satisfied, \({{{\tilde{\theta }}}(k)}\rightarrow {\mathbf {0}}\) , i.e., \({\hat{\theta }}(k)\rightarrow {\theta }_0\):

$$\begin{aligned} \left| 1+\eta (i) \gamma _{j}\right| <1, \quad i=1,2,\cdots ,k, \quad j=1, 2,\ldots , m. \end{aligned}$$
(29)

Thus, a sufficient condition that ensures the convergence of the algorithm is as follows:

$$\begin{aligned} \left\{ \begin{aligned}&\gamma _j<0, j=1,2,\cdots , m,\\&0<\eta (i)<\frac{2}{\underset{j}{\mathop {\max }}\,\vert {{\gamma }_{j}} \vert }. \end{aligned} \right. \end{aligned}$$
(30)
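Numerically, the condition of Eq. 30 is easy to check for a given Hessian. The sketch below assumes a symmetric \(H_1\) (so that its eigenvalues are real) and returns the admissible step-size bound; it is an illustration, not part of the algorithm itself.

```python
import numpy as np

def step_size_bound(H1):
    """Sufficient condition of Eq. 30 for a symmetric Hessian H1:
    all eigenvalues must be negative, and the admissible step sizes
    satisfy 0 < eta(i) < 2 / max_j |gamma_j|."""
    gamma = np.linalg.eigvalsh(H1)   # real eigenvalues of symmetric H1
    if not np.all(gamma < 0):
        raise ValueError("Eq. 30 requires all eigenvalues to be negative")
    return 2.0 / np.max(np.abs(gamma))
```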

5.2 Computational Analysis

According to the calculation method of [11], the per-iteration computational cost of the three algorithms is shown in Table 1, where \(e^x\) is evaluated using the first three terms of its Taylor expansion. ‘Time’ denotes the time consumed by the numerical example.

Table 1 Computational cost of SIG, ME-SIG and FF-ME-SIG

From the complexity counts and running times, the SIG algorithm has the lowest cost, while the ME-SIG and FF-ME-SIG differ little from each other. The two algorithms proposed in this paper are computationally more expensive than the SIG because they require the multi-error calculation. In terms of running time, they take approximately 4% longer than the SIG algorithm, which is not a significant difference.

6 Experimental Results

Consider the ARX model depicted in Fig. 1 with

$$\begin{aligned} \left\{ \begin{array}{l} A(z^{-1})=1.0-1.5z^{-1}+0.7z^{-2},\\ B(z^{-1})=1.0z^{-1}+0.5z^{-2},\\ \end{array} \right. \end{aligned}$$
(31)

where the input u(k) is an M-sequence and v(k) is random impulse noise. 5% of the output samples (30 outputs) are randomly selected, and impulses with random amplitudes between 0 and 1 are added to them. The curves of the input u(k) and output y(k) are shown in Fig. 2. All simulation experiments use this model.
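The paper does not list the M-sequence generator or the noise seed, so the following data-generation sketch uses a 9-bit maximal-length LFSR and NumPy’s random generator as stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 600

# +/-1 excitation from a 9-bit maximal-length LFSR (taps 9 and 5),
# standing in for the M-sequence used in the paper.
state = [1] * 9
u = np.empty(N)
for k in range(N):
    bit = state[8] ^ state[4]
    state = [bit] + state[:-1]
    u[k] = 2.0 * bit - 1.0

# Random impulse noise: 5% of the samples (30 of 600) receive an
# impulse with amplitude drawn uniformly from [0, 1].
v = np.zeros(N)
hit = rng.choice(N, size=int(0.05 * N), replace=False)
v[hit] = rng.uniform(0.0, 1.0, size=hit.size)

# Simulate A(z^-1) y(k) = B(z^-1) u(k) + v(k) with the coefficients of Eq. 31.
y = np.zeros(N)
for k in range(2, N):
    y[k] = 1.5 * y[k-1] - 0.7 * y[k-2] + u[k-1] + 0.5 * u[k-2] + v[k]
```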

Fig. 2 Curves of input–output data

6.1 Numerical Simulation

(1) Results using SIG algorithm

The parameter estimates using the SIG with window length \(L=3\) are shown in Table 2, where the estimation error \(\delta \) is defined as \(\delta =\frac{\left\| {\hat{\theta }}(k)-{{{\theta }}_{0}} \right\| }{\left\| {{{\theta }}_{0}} \right\| }\times 100\%\). For a given L, the estimation error decreases as the data length k increases. However, the error remains very large (38.9319%) at the end of the run.
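For reference, \(\delta \) can be computed as follows (theta_hat and theta0 denoting the estimated and true parameter vectors):

```python
import numpy as np

def estimation_error(theta_hat, theta0):
    """Relative estimation error delta (in percent), as defined above."""
    return 100.0 * np.linalg.norm(theta_hat - theta0) / np.linalg.norm(theta0)
```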

Table 2 Results using SIG algorithm (\(L=3\))

(2) Results using ME-SIG algorithm

The parameter estimates using the proposed ME-SIG with stack length \(p=5\) and \(L=3\) are shown in Table 3. The estimation errors for different p are depicted in Fig. 3 (\(L=3\)). It can be seen that:

(1) For a given p, the estimation error decreases as the data length k increases;

(2) As the stack length p increases, the estimation error decreases quickly.

Table 3 Results using ME-SIG algorithm (\(p=5, L=3\))

Fig. 3 Estimation errors using ME-SIG with different stack lengths p

(3) Results using FF-ME-SIG algorithm

The parameter estimates using the proposed FF-ME-SIG with forgetting factor \(\lambda =0.99\) are shown in Table 4, where \(L=3\) and \(p=5\). It can be seen that:

(1) The estimation error decreases as the data length k increases;

(2) Compared with Table 3, the estimation error of the FF-ME-SIG is smaller at the same data length k.

Table 4 Results using FF-ME-SIG algorithm (\(p=5, L=3, \lambda =0.99\))

(4) Comparison of the results of SIG, ME-SIG and FF-ME-SIG algorithms

The estimation errors using SIG, ME-SIG and FF-ME-SIG are depicted in Fig. 4. It can be seen that:

Fig. 4 Estimation errors using SIG, ME-SIG and FF-ME-SIG

(1) All curves decrease as the data length k increases;

(2) The estimation error of the SIG algorithm is larger than that of the ME-SIG, which means that the multi-error strategy improves the accuracy of the SIG estimate;

(3) The FF-ME-SIG estimate is the most accurate of the three. In other words, the introduction of the forgetting factor further improves the estimation accuracy.

(5) Results using FF-ME-SIG algorithm under different noise additions

To test the performance of the algorithm under different noise levels, we add impulse noise to 5%, 10%, 30%, and 50% of the samples. The mean of the noise is 0.5, and the amplitudes lie between 0 and 1. The estimation results at \(k=600\) are shown in Table 5, where \(p=5, L=3, \lambda =0.99\); the estimation error curves are shown in Fig. 5. It can be seen from Table 5 and Fig. 5 that as the proportion of added noise increases, the estimation error tends to increase, but the change is small, which indicates that the proposed algorithm is robust to impulse noise.

Table 5 Results using FF-ME-SIG algorithm under different noise additions (\(p=5, L=3, \lambda =0.99\))

Fig. 5 Estimates of FF-ME-SIG using samples with different noise additions

(6) Results using the FF-ME-SIG for the identification of the Narendra difference equations

To further support this paper’s argument, we add a simulation example involving synthetic input–output relations, namely the Narendra difference equations proposed in [35, 51]:

$$\begin{aligned} y(n+1)=0.3 y(n)+0.6 y(n-1)+f[e(n)], \end{aligned}$$
(32)

where

$$\begin{aligned} \left\{ \begin{aligned}&f(e)=0.6 \sin (\pi e)+0.3 \sin (3 \pi e)+0.1 \sin (5 \pi e), \\&e(n)=\sin \left[ (1+a) \omega _{0} n\right] . \end{aligned} \right. \end{aligned}$$
(33)

The following structure is used to model the above equations:

$$\begin{aligned} y(k)=a_1y(k-1)+a_2y(k-2)+a_3y(k-3)+a_4y^2(k-1)+a_5y^2(k-2). \end{aligned}$$
(34)

Let the data length be 240. The estimate given by the proposed algorithm at \(k=240\) is \( \left[ 0.3084, 0.3255, 0.3641, -0.0264, 0.0372\right] \). The predicted and observed outputs y(k) are depicted in Fig. 6.
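The values of \(\omega _0\) and a in Eq. 33 are not specified in the paper, so the sketch below generates the data with illustrative placeholders and builds the regressor for the structure of Eq. 34.

```python
import numpy as np

# Data from Eqs. 32-33; omega_0 and a are illustrative placeholders.
N, omega0, a = 240, 0.05, 0.1
e = np.sin((1.0 + a) * omega0 * np.arange(N))
f = 0.6 * np.sin(np.pi * e) + 0.3 * np.sin(3 * np.pi * e) \
    + 0.1 * np.sin(5 * np.pi * e)
y = np.zeros(N)
for n in range(1, N - 1):
    y[n + 1] = 0.3 * y[n] + 0.6 * y[n - 1] + f[n]

def phi_narendra(y, k):
    """Regressor for the model structure of Eq. 34."""
    return np.array([y[k-1], y[k-2], y[k-3], y[k-1]**2, y[k-2]**2])
```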

Fig. 6 Predicted and observed output using FF-ME-SIG for Eqs. 32–33

(7) Comparison of the FF-ME-SIG algorithm with the RLS and SG algorithms

To demonstrate the superiority of the proposed algorithm, the identification results of the stochastic gradient (SG) algorithm and the recursive least squares (RLS) algorithm are compared with those of the proposed algorithm. Figure 7 shows the estimation error curves of the three algorithms. When \(k=600\), the estimation errors of the SG, RLS and FF-ME-SIG are 55.5395%, 5.2045% and 4.8337%, respectively. The estimation error of the SG is very large, and that of the RLS algorithm is slightly larger than that of the proposed algorithm. Moreover, due to the impulse noise, the estimation error of the RLS algorithm fluctuates dramatically, indicating that its estimates are sensitive to the impulses.

Fig. 7 Estimation errors using SG, RLS and FF-ME-SIG

Fig. 8 Curves of the input and output data of a gas furnace

Table 6 Results using SIG, ME-SIG and FF-ME-SIG algorithms

Fig. 9 Curves of the outputs and prediction errors of a gas furnace

6.2 Case Study

The gas furnace dataset from the literature [42] is used to validate the proposed algorithm. The data were collected continuously from a gas furnace and sampled every 9 s. The air feed of the furnace was kept constant, while the methane feed rate was varied, and the resulting \(\text {CO}_2\) concentration in the off-gases was measured. The set contains 296 input–output pairs, of which the first 200 are used to estimate the parameters. The curves of the input and output are shown in Fig. 8. The estimation results and the prediction errors using SIG, ME-SIG and FF-ME-SIG are listed in Table 6. The outputs y(k) and the prediction errors pe(k) using the proposed algorithm are shown in Fig. 9.

(1) It can be seen from Table 6 that, among the three algorithms, the proposed FF-ME-SIG has the smallest RMSE, which means that it gives the most accurate estimate.

(2) As shown in Fig. 9, the outputs of the model obtained by the proposed algorithm predict the observations well.
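A minimal sketch of how the prediction errors pe(k) and the RMSE of Table 6 can be computed from an estimated parameter vector, again assuming the regressor() helper from Sect. 2 and that the gas furnace series is available as arrays u and y:

```python
import numpy as np

def one_step_rmse(y, u, theta_hat, na, nb, start, stop):
    """One-step-ahead prediction errors pe(k) and their RMSE,
    using the regressor() helper from Sect. 2."""
    pe = np.array([y[k] - regressor(y, u, k, na, nb) @ theta_hat
                   for k in range(start, stop)])
    return pe, np.sqrt(np.mean(pe**2))

# e.g., evaluate on the held-out samples 200..295:
# pe, rmse = one_step_rmse(y, u, theta_hat, na, nb, 200, 296)
```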

7 Conclusions

In this paper, a novel SIG algorithm based on the minimum error entropy criterion is presented. The traditional SIG algorithm needs less computation than MSE-based algorithms but converges slowly, so a multi-error strategy and a forgetting factor are introduced to speed it up. We compared the SIG, ME-SIG and FF-ME-SIG algorithms by estimating the parameters of an ARX model with random impulse noise and through a case study. The SIG with multi-error and forgetting factor obtains accurate estimates and converges quickly.