An Ensemble Learning-Based Undersampling Technique for Handling Class-Imbalance Problem

  • Conference paper
Proceedings of ICETIT 2019

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 605)

Abstract

Real-world data commonly suffer from class imbalance, which poses a major challenge in pattern recognition and machine learning tasks. To handle this issue, we propose an ensemble learning-based undersampling technique using Extreme Gradient Boosting (XGBoost) and Support Vector Machine (SVM). The technique has been validated on an accident dataset obtained from a steel plant. The results show that the proposed technique resolves the class-imbalance issue effectively. It outperforms the traditional undersampling technique in terms of the performance metrics considered, i.e., geometric mean (G-mean), recall, and precision.
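
The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels with the minority class encoded as 1, and it assumes the common definition of G-mean as the square root of recall (sensitivity) times specificity; the paper may define or compute these differently. The function name `imbalance_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return recall, precision, specificity and G-mean for binary labels.

    Assumption: G-mean = sqrt(recall * specificity), a common definition
    in the class-imbalance literature. Not necessarily the paper's formula.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # minority case correctly detected
        elif truth != positive and pred == positive:
            fp += 1  # majority case wrongly flagged
        elif truth != positive and pred != positive:
            tn += 1  # majority case correctly rejected
        else:
            fn += 1  # minority case missed

    # Guard against empty denominators by falling back to 0.0.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical toy data: 2 minority cases among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = imbalance_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.8 even though only half of the minority cases are found. That gap is why imbalance studies tend to report recall, precision, and G-mean rather than accuracy alone.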

Author information

Corresponding author

Correspondence to Sobhan Sarkar.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sarkar, S., Khatedi, N., Pramanik, A., Maiti, J. (2020). An Ensemble Learning-Based Undersampling Technique for Handling Class-Imbalance Problem. In: Singh, P., Panigrahi, B., Suryadevara, N., Sharma, S., Singh, A. (eds) Proceedings of ICETIT 2019. Lecture Notes in Electrical Engineering, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-030-30577-2_51
