
Adaptive online sequential extreme learning machine for dynamic modeling

  • Methodologies and Application

Abstract

Extreme learning machine (ELM) is an emerging machine learning algorithm for training single-hidden-layer feedforward networks (SLFNs). Its salient features are that the hidden-layer parameters can be generated randomly and only the corresponding output weights need to be determined analytically in a least-squares manner, so ELM is easy to implement and offers fast learning speed and good generalization performance. As the online version of ELM, online sequential ELM (OS-ELM) can process sequentially arriving data one by one or chunk by chunk with fixed or varying chunk size. However, OS-ELM does not perform well on dynamic modeling problems because of the data saturation problem. To tackle this issue, this paper proposes a novel OS-ELM, named adaptive OS-ELM (AOS-ELM), to enhance the generalization performance and dynamic tracking capability of OS-ELM for modeling problems in nonstationary environments. AOS-ELM efficiently reduces the negative effects of data saturation by adopting approximate linear dependence (ALD) to filter out uninformative new data and a modified hybrid forgetting mechanism (HFM) to alleviate the impact of outdated data. The performance of AOS-ELM is verified on selected benchmark datasets and a real-world application, device-free localization (DFL), by comparing it with the classic ELM, OS-ELM, FOS-ELM, and DU-OS-ELM. Experimental results demonstrate that AOS-ELM achieves better performance.
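For reference, a minimal NumPy sketch of the batch ELM training step summarized above (random hidden-layer parameters, output weights solved analytically by least squares) might look as follows; the function names `elm_fit` and `elm_predict` and the sigmoid activation are illustrative choices, not the authors' implementation:

```python
import numpy as np

def elm_fit(X, Y, n_hidden, seed=0):
    """Batch ELM: random hidden-layer parameters, least-squares output weights."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], n_hidden))  # random input weights a_j
    b = rng.standard_normal(n_hidden)                # random biases b_j
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))           # hidden-layer outputs G(a_j, b_j, x_i)
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)     # output weights via least squares
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta
```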




Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions.

Author information


Corresponding author

Correspondence to Wendong Xiao.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Proof of Theorem 1

Assume that from time \(q\) to time \(q+r-1\) there are \(r\) samples with input matrix \({X_{q + r - 1}}\) and output matrix \({Y_{q + r - 1}}\), where

$$\begin{aligned} {X_{q + r - 1}}= & {} [{x_q},{x_{q + 1}},\ldots ,{x_{q + r - 1}}] \end{aligned}$$
(29)
$$\begin{aligned} {Y_{q + r - 1}}= & {} [{y_q},{y_{q + 1}},\ldots ,{y_{q + r - 1}}] \end{aligned}$$
(30)

The corresponding output weights \({\beta _{q + r - 1}}\) can be obtained by

$$\begin{aligned} Min:\left\| {{H_{q + r - 1}}{\beta _{q + r - 1}} - {Y_{q + r - 1}}} \right\| \end{aligned}$$
(31)

where \({H_{q + r - 1}}\) is the hidden layer output matrix of the \(r\) samples from time \(q\) to time \(q+r-1\):

$$\begin{aligned} {H_{q + r - 1}}= & {} \left[ {\begin{array}{*{20}{c}} {G({a_1},{b_1},{x_q})}&{} \cdots &{}{G({a_N},{b_N},{x_q})}\\ \vdots &{} \ddots &{} \vdots \\ {G({a_1},{b_1},{x_{q + r - 1}})}&{} \cdots &{}{G({a_N},{b_N},{x_{q + r - 1}})} \end{array}} \right] \end{aligned}$$
(32)

Letting \({P_{q + r - 1}} = K_{q + r - 1}^{ - 1}\), the solution for \({\beta _{q + r - 1}}\) is

$$\begin{aligned} {\beta _{q + r - 1}} = {K_{q + r - 1}^{ - 1}}H_{q + r - 1}^T{Y_{q + r - 1}} \end{aligned}$$
(33)

where \({K_{q + r - 1}} = H_{q + r - 1}^T{H_{q + r - 1}}\).

At time \(q+r\), when a new data pair \(({x_{q + r}},{y_{q + r}})\) arrives, \({H_{q + r - 1}}\) becomes

$$\begin{aligned} {H_{q + r}} = \left[ {\begin{array}{*{20}{c}} {G({a_1},{b_1},{x_q})}&{} \cdots &{}{G({a_N},{b_N},{x_q})}\\ \vdots &{} \ddots &{} \vdots \\ {G({a_1},{b_1},{x_{q + r - 1}})}&{} \cdots &{}{G({a_N},{b_N},{x_{q + r - 1}})}\\ {G({a_1},{b_1},{x_{q + r}})}&{} \cdots &{}{G({a_N},{b_N},{x_{q + r}})} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {{H_{q + r - 1}}}\\ {{h^T}(q + r)} \end{array}} \right] \end{aligned}$$
(34)

The corresponding output weights \({\beta _{q + r}}\) can be obtained by

$$\begin{aligned} Min:\left\| {{H_{q + r}}{\beta _{q + r}} - {Y_{q + r}}} \right\| \end{aligned}$$
(35)

where \({Y_{q + r}} = [{y_q},{y_{q + 1}},\ldots ,{y_{q + r - 1}},{y_{q + r}}]\).

The solution of \({\beta _{q + r}}\) is

$$\begin{aligned} {\beta _{q + r}} = K_{q + r}^{ - 1}H_{q + r}^T{Y_{q + r}} \end{aligned}$$
(36)

where

$$\begin{aligned} \begin{aligned} {K_{q + r}}&= H_{q + r}^T{H_{q + r}}\\&= [\begin{array}{*{20}{c}} {H_{q + r - 1}^T}&{h(q + r)} \end{array}]\left[ {\begin{array}{*{20}{c}} {{H_{q + r - 1}}}\\ {{h^T}(q + r)} \end{array}} \right] \\&= H_{q + r - 1}^T{H_{q + r - 1}} + h(q + r){h^T}(q + r)\\&= {K_{q + r - 1}} + h(q + r){h^T}(q + r) \end{aligned} \end{aligned}$$
(37)

Then,

$$\begin{aligned} \begin{aligned} {\beta _{q + r}}&= K_{q + r}^{ - 1}H_{q + r}^T{Y_{q + r}}\\&= K_{q + r}^{ - 1}\sum \limits _{i = q}^{q + r} {h(i)y(i)} \\&= K_{q + r}^{ - 1}[H_{q + r - 1}^T{Y_{q + r - 1}} + h(q + r)y(q + r)]\\&= K_{q + r}^{ - 1}[{K_{q + r - 1}}{\beta _{q + r - 1}} + h(q + r)y(q + r)]\\&= K_{q + r}^{ - 1}\left\{ {[{K_{q + r}} - h(q + r){h^T}(q + r)]{\beta _{q + r - 1}}}\right. \\&\quad \left. { + h(q + r)y(q + r)} \right\} \\&= {\beta _{q + r - 1}} + K_{q + r}^{ - 1}h(q + r)[y(q + r) - {h^T}(q + r){\beta _{q + r - 1}}] \end{aligned} \end{aligned}$$
(38)

Accordingly, applying the Sherman-Morrison formula to (37), we have

$$\begin{aligned} \begin{aligned} {P_{q + r}}&= {P_{q + r - 1}} - {P_{q + r - 1}}h(q + r){[I + {h^T}(q + r){P_{q + r - 1}}h(q + r)]^{ - 1}}{h^T}(q + r){P_{q + r - 1}}\\&= \left[ I - \frac{{{P_{q + r - 1}}h(q + r){h^T}(q + r)}}{{1 + {h^T}(q + r){P_{q + r - 1}}h(q + r)}}\right] {P_{q + r - 1}} \end{aligned} \end{aligned}$$
(39)
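For illustration, the recursions (38) and (39) translate directly into a few lines of NumPy. The following is only a sketch under the single-new-sample assumption (the chunk case replaces the scalar denominator by a small matrix inverse), with illustrative names rather than the authors' implementation:

```python
import numpy as np

def os_elm_step(P, beta, h_new, y_new):
    """One recursive OS-ELM update for a single new sample, following (38)-(39).

    P     : current (H^T H)^{-1}, shape (N, N)
    beta  : current output weights, shape (N, n_outputs)
    h_new : hidden-layer output vector h(q+r), shape (N, 1)
    y_new : target y(q+r), shape (1, n_outputs)
    """
    Ph = P @ h_new                                   # P_{q+r-1} h(q+r)
    denom = 1.0 + (h_new.T @ Ph).item()              # 1 + h^T P h (a scalar here)
    P_new = P - (Ph @ Ph.T) / denom                  # Sherman-Morrison update, Eq. (39)
    innovation = y_new - h_new.T @ beta              # y(q+r) - h^T(q+r) beta_{q+r-1}
    beta_new = beta + P_new @ h_new @ innovation     # Eq. (38)
    return P_new, beta_new
```

This is the same structure as a recursive least-squares step, which is why the saturation argument below carries over from the RLS setting.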

Because \({P_{q + r - 1}} = {(H_{q + r - 1}^T{H_{q + r - 1}})^{ - 1}}\) and \(P_{q + r}^{ - 1} = P_{q + r - 1}^{ - 1} + h(q + r){h^T}(q + r)\), both \({P_{q + r - 1}}\) and \({P_{q + r}}\) are positive definite matrices. Thus, we have

$$\begin{aligned} {P_{q + r}} < {P_{q + r - 1}} \end{aligned}$$
(40)

According to (40), we have the following relationship:

$$\begin{aligned} \mathop {\lim }\limits _{(q + r) \rightarrow \infty } {P_{q + r}} = 0 \end{aligned}$$
(41)

To sum up, in the limit \({P_{q + r}}\) tends to zero, so OS-ELM eventually loses its capability to correct the output weights with newly arriving data. \(\square \)
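As a simple added illustration of (40) and (41) (not part of the original proof): if the hidden layer reduces to the scalar output \(h(i) \equiv 1\), then \(K_n = n\) and \(P_n = 1/n\), so the correction term in (38) becomes

$$\begin{aligned} K_n^{ - 1}h(n)[y(n) - {h^T}(n){\beta _{n - 1}}] = \frac{1}{n}[y(n) - {\beta _{n - 1}}] \rightarrow 0 \quad \mathrm{as}\;\; n \rightarrow \infty \end{aligned}$$

so newly arriving samples have a vanishing influence on \(\beta \); this data saturation effect is what the ALD filtering and the forgetting mechanism of AOS-ELM are designed to counteract.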

Appendix B

Detailed derivation of ALD:

Let \({X_{{N_0}}} = [{x_1},{x_2},\ldots ,{x_{{N_0}}}]\), and the mean of each variable of the initial training data \({\aleph _0} = \{ ({x_i},{y_i})\} _{i = 1}^{{N_0}}\) is

$$\begin{aligned} {u_{{\aleph _0}}}= & {} \frac{1}{{{N_0}}}{({X_{{N_0}}})^T} \cdot {\mathbf{{1}}_{{N_0}}} \end{aligned}$$
(42)

where \({\mathbf{{1}}_{{N_0}}} = {[1,1,\ldots ,1]^T}\). The data scaled to zero mean and unit variance can be represented as

$$\begin{aligned} {\tilde{X}_{{N_0}}}= & {} ({X_{{N_0}}} - {\mathbf{{1}}_{{N_0}}}u_{{\aleph _0}}^T) \cdot \varSigma _{{N_0}}^{ - 1} \end{aligned}$$
(43)

where \(\varSigma _{{N_0}} = \mathrm {diag}({\sigma _{{N_0}1}},{\sigma _{{N_0}2}},\ldots ,{\sigma _{{N_0}w}})\), and \({\sigma _{{N_0}w}}\) stands for the standard deviation of the \(w\)th variable.

When the new datum \({x_{{N_0} + 1}}\) arrives, the corresponding mean and standard deviation can be updated by

$$\begin{aligned} {u_{{N_0} + 1}}= & {} \frac{{{N_0}}}{{{N_0} + 1}}{u_{{N_0}}} + \frac{1}{{{N_0} + 1}}{({x_{{N_0} + 1}})^T} \end{aligned}$$
(44)
$$\begin{aligned} \sigma _{\left( {{N_0} + 1} \right) i}^2= & {} \frac{{{N_0} - 1}}{{{N_0}}}\sigma _{{N_0}i}^2 + \varDelta u_{{N_0} + 1}^2(i) + \frac{1}{{{N_0}}}{\left\| {{x_{{N_0} + 1}}(i) - {u_{{N_0} + 1}}(i)} \right\| ^2} \end{aligned}$$
(45)
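Equations (44) and (45) give a running update of the scaling statistics. A minimal NumPy sketch is shown below, under the assumption (not stated explicitly in this appendix) that \(\varDelta u_{{N_0} + 1}(i)\) denotes the mean shift \(u_{{N_0} + 1}(i) - u_{{N_0}}(i)\); the function name is illustrative:

```python
import numpy as np

def update_mean_var(mean, var, n, x_new):
    """Incremental per-feature mean/variance update, following (44) and (45).

    mean, var : mean and variance (sigma^2) of the first n samples, shape (n_features,)
    x_new     : newly arrived sample, shape (n_features,)
    """
    new_mean = (n * mean + x_new) / (n + 1)          # Eq. (44)
    delta_u = new_mean - mean                        # assumed meaning of Delta u_{n+1}
    new_var = (n - 1) / n * var + delta_u ** 2 \
              + (x_new - new_mean) ** 2 / n          # Eq. (45)
    return new_mean, new_var, n + 1
```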

Let \({x_{{N_0} + 1}}\) be scaled with

$$\begin{aligned} {\tilde{x}_{{N_0} + 1}}= & {} ({x_{{N_0} + 1}} - \mathbf{{1}} \cdot u_{{N_0} + 1}^T)\varSigma _{{N_0} + 1}^{ - 1} \end{aligned}$$
(46)

where \(\varSigma _{{N_0} + 1} = \mathrm {diag}({\sigma _{({N_0} + 1)1}},{\sigma _{({N_0} + 1)2}},\ldots ,{\sigma _{({N_0} + 1)w}})\).

Expanding the ALD shown in (18), we have

$$\begin{aligned} \begin{aligned} {\delta _{k + 1}}&= \min \left\{ {\sum \limits _{l,m = 1}^{{N_0}} {{\alpha _l}{\alpha _m}\left\langle {{{\tilde{x}}_l},{{\tilde{x}}_m}} \right\rangle } - 2\sum \limits _{m = 1}^{{N_0}} {{\alpha _m}\left\langle {{{\tilde{x}}_m},{{\tilde{x}}_{{N_0} + 1}}} \right\rangle } + \left\langle {{{\tilde{x}}_{{N_0} + 1}},{{\tilde{x}}_{{N_0} + 1}}} \right\rangle } \right\} \\&= \min \left\{ {\alpha _{{N_0} + 1}^T{J_{{N_0}}}{\alpha _{{N_0} + 1}} - 2\alpha _{{N_0} + 1}^T{j_{{N_0}}} + {{\tilde{j}}_{{N_0} + 1}}} \right\} \end{aligned} \end{aligned}$$
(47)

where \({J_{{N_0}}} = {\tilde{X}_{{N_0}}} \cdot \tilde{X}_{{N_0}}^T\), \({j_{{N_0}}} = {\tilde{X}_{{N_0}}} \cdot \tilde{x}_{{N_0} + 1}^T\), \({\tilde{j}_{{N_0} + 1}} = {\tilde{x}_{{N_0} + 1}} \cdot \tilde{x}_{{N_0} + 1}^T\).

Minimizing \({\delta _{k + 1}}\) with respect to \({\alpha _{{N_0} + 1}}\) (setting its gradient to zero) yields

$$\begin{aligned} {\alpha _{{N_0} + 1}} = J_{{N_0}}^{ - 1} \cdot {j_{{N_0}}} \end{aligned}$$
(48)

Substituting (48) into (47), we can obtain the recursive ALD:

$$\begin{aligned} {\delta _{k + 1}}= & {} {\tilde{j}_{{N_0} + 1}} - j_{{N_0}}^T{\alpha _{{N_0} + 1}} = {\tilde{j}_{{N_0} + 1}} - j_{{N_0}}^TJ_{{N_0}}^{ - 1} \cdot {j_{{N_0}}} \end{aligned}$$
(49)

Thus, we have

$$\begin{aligned} {\tilde{x}_{k + 1}}= & {} \sum \limits _{l = 1}^{{N_0}} {{\alpha _l}} {\tilde{x}_l} + \varepsilon \end{aligned}$$
(50)

where \(\varepsilon \) is the approximation error.
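Putting (47)-(49) together, the ALD value can be computed with a few matrix products. The following NumPy sketch uses a least-squares solve in place of an explicit \(J_{{N_0}}^{-1}\) (the two coincide when \(J_{{N_0}}\) is invertible); the acceptance threshold mentioned in the final comment is a hypothetical parameter standing in for the one used in the main text:

```python
import numpy as np

def ald_value(X_tilde, x_tilde_new):
    """ALD criterion delta for a standardized new sample, following (47)-(49).

    X_tilde     : standardized retained data, shape (N0, n_features)
    x_tilde_new : standardized new sample, shape (n_features,)
    """
    J = X_tilde @ X_tilde.T                         # J_{N0} = X~ X~^T
    j = X_tilde @ x_tilde_new                       # j_{N0} = X~ x~_{N0+1}^T
    j_tilde = x_tilde_new @ x_tilde_new             # <x~_{N0+1}, x~_{N0+1}>
    alpha, *_ = np.linalg.lstsq(J, j, rcond=None)   # Eq. (48): solve J alpha = j
    return j_tilde - j @ alpha                      # Eq. (49)

# Hypothetical usage: keep the new sample only if it is sufficiently novel,
# i.e. if ald_value(X_tilde, x_tilde_new) exceeds a user-chosen threshold nu.
```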


About this article


Cite this article

Zhang, J., Li, Y. & Xiao, W. Adaptive online sequential extreme learning machine for dynamic modeling. Soft Comput 25, 2177–2189 (2021). https://doi.org/10.1007/s00500-020-05289-6

