
Online Sequential Extreme Learning Machine with Under-Sampling and Over-Sampling for Imbalanced Big Data Classification

  • Conference paper
Proceedings of ELM-2016

Part of the book series: Proceedings in Adaptation, Learning and Optimization (PALO, volume 9)

Abstract

In this paper, a novel method called online sequential extreme learning machine with under-sampling and over-sampling (OSELM-UO) is proposed for imbalanced big data classification. It combines the structures of under-sampling and over-sampling and uses the online sequential extreme learning machine (OS-ELM) as its base model. This structure enables OSELM-UO to perform well on both the minority and majority classes while overcoming the information loss typical of under-sampling and the overfitting typical of over-sampling. Moreover, when the dataset keeps growing, OSELM-UO can be updated with the new data without retraining on all previous data. Experiments compare OSELM-UO with several imbalance learning methods on real-world datasets with high imbalance ratios (IR) and large numbers of samples and features. The experimental results show that OSELM-UO gives the best results across the evaluated aspects.
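The abstract names two ingredients: an OS-ELM base model that is updated chunk by chunk, and a resampling step that combines under-sampling of the majority class with over-sampling of the minority class. The exact OSELM-UO structure is not given on this page, so the following is only a minimal sketch under stated assumptions, not the authors' method. The OS-ELM update follows the recursive least-squares form of Liang et al. (2006). The hypothetical helper `balance_chunk` meets both classes at their mean size using random under-sampling and random duplication (a simple stand-in, not SMOTE or the paper's scheme). The ridge term, sigmoid activation, class `OSELM`, and synthetic two-class stream from `make_chunk` are illustrative assumptions.

```python
import numpy as np


class OSELM:
    """Minimal OS-ELM sketch (after Liang et al., 2006) with a sigmoid hidden layer."""

    def __init__(self, n_features, n_hidden, n_outputs, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Input weights and biases are random and stay fixed; only beta is learned.
        self.W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.ridge = ridge  # assumption: small ridge term keeps the first inverse well posed
        self.P = None
        self.beta = np.zeros((n_hidden, n_outputs))

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def partial_fit(self, X, T):
        H = self._hidden(X)
        if self.P is None:
            # Initialisation phase: regularised least squares on the first chunk.
            self.P = np.linalg.inv(H.T @ H + self.ridge * np.eye(H.shape[1]))
            self.beta = self.P @ H.T @ T
        else:
            # Sequential phase: recursive least-squares update using only the new chunk.
            K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
            self.P = self.P - self.P @ H.T @ K @ H @ self.P
            self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def balance_chunk(X, y, rng):
    """Hypothetical helper: under-sample large classes and over-sample small ones
    (random duplication, not SMOTE) so every class ends at the mean class size."""
    classes, counts = np.unique(y, return_counts=True)
    target = int(np.mean(counts))
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # Sample with replacement only when the class is smaller than the target.
        idx.append(rng.choice(members, size=target, replace=bool(n < target)))
    idx = rng.permutation(np.concatenate(idx))
    return X[idx], y[idx]


def make_chunk(rng, n_major=475, n_minor=25):
    """Hypothetical synthetic chunk: class 0 is the majority, class 1 the minority."""
    X0 = rng.normal(-2.0, 0.5, size=(n_major, 2))
    X1 = rng.normal(2.0, 0.5, size=(n_minor, 2))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n_major, dtype=int), np.ones(n_minor, dtype=int)])
    return X, y


rng = np.random.default_rng(42)
model = OSELM(n_features=2, n_hidden=20, n_outputs=2)
for _ in range(20):  # data arrives chunk by chunk; earlier chunks are never revisited
    Xc, yc = make_chunk(rng)
    Xb, yb = balance_chunk(Xc, yc, rng)
    model.partial_fit(Xb, np.eye(2)[yb])  # one-hot targets

X_test, y_test = make_chunk(rng)
pred = model.predict(X_test)
minority_recall = np.mean(pred[y_test == 1] == 1)
majority_recall = np.mean(pred[y_test == 0] == 0)
print(f"minority recall={minority_recall:.2f}, majority recall={majority_recall:.2f}")
```

Because `partial_fit` only touches the new chunk's hidden-layer matrix and the running `P` and `beta`, the sketch absorbs new data without retraining on earlier chunks, which is the property the abstract highlights. Reporting per-class recall rather than overall accuracy matters here: with a 95:5 split, always predicting the majority class would already score 95% accuracy.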

Acknowledgements

The work is financially supported by funding from University of Macau, project number MYRG2014-00083-FST, and from FDCT Macau, project number 050/2015/A.

Author information

Corresponding author

Correspondence to Chi-Man Vong.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Du, J., Vong, CM., Chang, Y., Jiao, Y. (2018). Online Sequential Extreme Learning Machine with Under-Sampling and Over-Sampling for Imbalanced Big Data Classification. In: Cao, J., Cambria, E., Lendasse, A., Miche, Y., Vong, C. (eds) Proceedings of ELM-2016. Proceedings in Adaptation, Learning and Optimization, vol 9. Springer, Cham. https://doi.org/10.1007/978-3-319-57421-9_19

  • DOI: https://doi.org/10.1007/978-3-319-57421-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57420-2

  • Online ISBN: 978-3-319-57421-9

  • eBook Packages: Engineering, Engineering (R0)
