Abstract
In this paper, a novel method called online sequential extreme learning machine with under-sampling and over-sampling (OSELM-UO) is proposed for imbalanced big data classification. It combines the structures of under-sampling and over-sampling and uses the online sequential extreme learning machine as its base model. This structure enables OSELM-UO to perform well on both minority and majority classes while overcoming the issues of information loss and overfitting. Moreover, when the dataset keeps growing, OSELM-UO can be updated without retraining on all previous data. Experiments were conducted for OSELM-UO and several imbalance learning methods on real-world datasets with a high imbalance ratio (IR) and large numbers of samples and features. Analysis of the experimental results shows that OSELM-UO gives the best results in various aspects.
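The claim that the model can be updated without retraining on previous data rests on the recursive least-squares update of the standard online sequential extreme learning machine (OS-ELM) base model. The following is a minimal sketch of that generic update only, not the authors' OSELM-UO: it omits the under-sampling and over-sampling structure entirely. The class name `OSELM`, the sigmoid activation, the small ridge term `reg`, and all parameter names are assumptions made for illustration.

```python
import numpy as np


class OSELM:
    """Illustrative OS-ELM: fixed random hidden layer, output weights
    updated chunk by chunk with recursive least squares (hypothetical API)."""

    def __init__(self, n_inputs, n_hidden, n_outputs, reg=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer parameters are drawn once and never trained.
        self.W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.reg = reg  # assumed ridge term, keeps the first inverse well posed
        self.P = None   # inverse of (H^T H + reg * I) over all data seen so far
        self.beta = np.zeros((n_hidden, n_outputs))

    def hidden(self, X):
        # Sigmoid hidden-layer output matrix H.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_initial(self, X0, T0):
        # Initialization phase: solve the regularized least-squares problem once.
        H = self.hidden(X0)
        self.P = np.linalg.inv(H.T @ H + self.reg * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def partial_fit(self, X, T):
        # Sequential phase: update P and beta from the new chunk alone.
        H = self.hidden(X)
        S = np.eye(H.shape[0]) + H @ self.P @ H.T
        self.P = self.P - self.P @ H.T @ np.linalg.solve(S, H @ self.P)
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta
```

After `fit_initial` on a first chunk, each call to `partial_fit` touches only the new chunk, and the resulting output weights match, up to floating-point error, the regularized least-squares solution computed on all data seen so far. Earlier chunks therefore do not need to be stored or revisited.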
Acknowledgements
This work is financially supported by the University of Macau, project number MYRG2014-00083-FST, and by FDCT Macau, project number 050/2015/A.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Du, J., Vong, CM., Chang, Y., Jiao, Y. (2018). Online Sequential Extreme Learning Machine with Under-Sampling and Over-Sampling for Imbalanced Big Data Classification. In: Cao, J., Cambria, E., Lendasse, A., Miche, Y., Vong, C. (eds) Proceedings of ELM-2016. Proceedings in Adaptation, Learning and Optimization, vol 9. Springer, Cham. https://doi.org/10.1007/978-3-319-57421-9_19
DOI: https://doi.org/10.1007/978-3-319-57421-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57420-2
Online ISBN: 978-3-319-57421-9
eBook Packages: Engineering, Engineering (R0)