Abstract
Extreme learning machine (ELM) has recently emerged as a computing paradigm that enables neural network (NN) based learning with fast training speed and good generalization performance. However, a single-hidden-layer NN trained by ELM may be ineffective on large-scale problems and can demand considerable computational effort. To overcome this limitation, we adopt a multilayer ELM architecture in this article, which reduces the computational complexity without running into physical memory limitations. Moreover, practical applications often contain substantial noise, under which the traditional ELM may not perform well. Considering the possible presence of noise or outliers in the training dataset, we develop a more practical approach by incorporating the kernel risk-sensitive loss (KRSL) criterion into ELM, exploiting the efficient performance surface of KRSL, which attains high accuracy while maintaining robustness to outliers. Accordingly, a robust multilayer ELM, i.e., the stacked ELM using the minimum KRSL criterion (SELM-MKRSL), is proposed in this article to enhance outlier robustness on large-scale and complicated datasets. Simulation results on several synthetic datasets indicate that the proposed SELM-MKRSL achieves higher classification accuracy and is more robust to noise than other state-of-the-art multilayer-ELM-related algorithms.
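To make the robustness argument concrete, the following is a minimal sketch of the KRSL criterion as defined in the adaptive-filtering literature the abstract draws on (Chen et al., "Kernel risk-sensitive loss", 2017): a Gaussian kernel maps each prediction error into (0, 1], and a risk-sensitive exponential amplifies moderate errors while keeping the contribution of extreme outliers bounded. The function and parameter names here are illustrative, not taken from the article itself.

```python
import numpy as np

def krsl_loss(errors, sigma=1.0, lam=1.0):
    """Empirical kernel risk-sensitive loss:
    (1/lam) * mean( exp( lam * (1 - kappa_sigma(e)) ) ),
    where kappa_sigma is the Gaussian kernel. Small errors give
    kappa close to 1 (loss near 1/lam); a huge outlier drives kappa
    to 0, so its contribution saturates at exp(lam)/lam instead of
    growing without bound as it would under mean-square error."""
    kernel = np.exp(-np.asarray(errors) ** 2 / (2.0 * sigma ** 2))
    return np.mean(np.exp(lam * (1.0 - kernel))) / lam

# A single extreme outlier raises the loss only by a bounded amount,
# which is the property the article exploits for outlier robustness.
clean = krsl_loss([0.1, -0.2, 0.05])
noisy = krsl_loss([0.1, -0.2, 50.0])  # one gross outlier
```

Under mean-square error the outlier term 50.0 would dominate the average by orders of magnitude; under KRSL its contribution is capped at exp(lam)/lam, so the loss surface stays informative about the clean samples.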
Acknowledgements
This work is funded by the National Key Research and Development Program of China under Grant 2016YFC0600510, the National Natural Science Foundation of China under Grants U1836106 and U1736117, the Key Laboratory of Geological Information Technology of Ministry of Land and Resources under Grant 2017320, and the University of Science and Technology Beijing—National Taipei University of Technology Joint Research Program under Grant TW201705.
This manuscript is recommended by the 8th International Conference on Extreme Learning Machines (ELM2017).
Cite this article
Luo, X., Li, Y., Wang, W. et al. A robust multilayer extreme learning machine using kernel risk-sensitive loss criterion. Int. J. Mach. Learn. & Cyber. 11, 197–216 (2020). https://doi.org/10.1007/s13042-019-00967-w