Online Sequential Extreme Learning Machine with Under-Sampling and Over-Sampling for Imbalanced Big Data Classification

Du, Jie; Vong, Chi-Man; Chang, Yajie; Jiao, Yang

doi:10.1007/978-3-319-57421-9_19

Part of the book series: Proceedings in Adaptation, Learning and Optimization (PALO, volume 9)

Abstract

This paper proposes the online sequential extreme learning machine with under-sampling and over-sampling (OSELM-UO), a novel method for imbalanced big data classification. OSELM-UO combines the structures of under-sampling and over-sampling and uses the online sequential extreme learning machine as its base model. This structure enables OSELM-UO to perform well on both minority and majority classes while overcoming the issues of information loss and overfitting. Moreover, when the dataset keeps growing, OSELM-UO can be updated without retraining on all previous data. Experiments compare OSELM-UO with several imbalance learning methods on real-world datasets with a high imbalance ratio (IR) and with large numbers of samples and features. The experimental results show that OSELM-UO gives the best results in various aspects.
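
The abstract's claim that the model can grow with the data without retraining rests on the base learner. As a rough illustration only, the sketch below shows a generic online sequential ELM whose output weights are updated chunk by chunk with a recursive least-squares step. It is not the authors' OSELM-UO: the class name `OSELM`, the sigmoid hidden layer, the ridge term `reg`, and the zero initialisation of the output weights are assumptions made for this example, and the under-sampling and over-sampling structure is not shown because the abstract does not specify it.

```python
import numpy as np


class OSELM:
    """Generic OS-ELM sketch (hypothetical helper, not the paper's code)."""

    def __init__(self, n_features, n_hidden, n_outputs, reg=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer weights and biases are random and never retrained.
        self.W = rng.normal(size=(n_features, n_hidden))
        self.b = rng.normal(size=n_hidden)
        # P tracks (H^T H + reg * I)^-1 over all data seen so far.
        # Assumption: start from the ridge prior so no initial batch is needed.
        self.P = np.eye(n_hidden) / reg
        self.beta = np.zeros((n_hidden, n_outputs))

    def _hidden(self, X):
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def partial_fit(self, X, T):
        """Update the output weights with one new chunk (X, T)."""
        H = self._hidden(X)
        # Woodbury identity: only a chunk-sized matrix is inverted.
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        # Correct beta using the prediction error on the new chunk only.
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    T = rng.normal(size=(200, 2))
    model = OSELM(n_features=5, n_hidden=10, n_outputs=2)
    for start in range(0, 200, 50):  # data arrives in four chunks
        model.partial_fit(X[start:start + 50], T[start:start + 50])
    print(model.predict(X[:3]).shape)
```

Under these assumptions, the weights obtained after the last chunk match, up to floating-point error, a single ridge-regression fit on all the data, which is why earlier chunks never need to be revisited.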

Acknowledgements

The work is financially supported by funding from the University of Macau, project number MYRG2014-00083-FST, and from FDCT Macau, project number 050/2015/A.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Macau, Macau, China
Jie Du, Chi-Man Vong, Yajie Chang & Yang Jiao

Corresponding author

Correspondence to Chi-Man Vong.

Editor information

Editors and Affiliations

Institute of Information and Control, Hangzhou Dianzi University, Zhejiang, China
Jiuwen Cao
School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
Erik Cambria
Department of Mechanical and Industrial Engineering, University of Iowa, Iowa City, Iowa, USA
Amaury Lendasse
Department of Information and Computer Science, School of Science, Aalto University, Aalto, Finland
Yoan Miche
Department of Computer and Information Science, University of Macau, Macau, China
Chi Man Vong

Cite this paper

Du, J., Vong, CM., Chang, Y., Jiao, Y. (2018). Online Sequential Extreme Learning Machine with Under-Sampling and Over-Sampling for Imbalanced Big Data Classification. In: Cao, J., Cambria, E., Lendasse, A., Miche, Y., Vong, C. (eds) Proceedings of ELM-2016. Proceedings in Adaptation, Learning and Optimization, vol 9. Springer, Cham. https://doi.org/10.1007/978-3-319-57421-9_19

DOI: https://doi.org/10.1007/978-3-319-57421-9_19
Published: 26 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57420-2
Online ISBN: 978-3-319-57421-9
eBook Packages: Engineering, Engineering (R0)