Abstract
Real-world data are often class-imbalanced, which poses a major challenge for pattern recognition and machine learning tasks. To address this issue, we propose an ensemble learning-based undersampling technique that uses Extreme Gradient Boosting (XGBoost) and Support Vector Machine (SVM). The technique is validated on an accident dataset obtained from a steel plant. The results show that the proposed technique resolves the class-imbalance issue effectively: it outperforms the traditional undersampling technique in terms of geometric mean (G-mean), recall, and precision.
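The three metrics named in the abstract can all be computed from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code. It assumes the common definition of G-mean as the square root of sensitivity (recall on the minority class) times specificity (recall on the majority class). The abstract does not state which definition is used, and the function name `imbalance_metrics` and the label convention (minority class = 1) are illustrative assumptions.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return recall, precision, and G-mean for a binary problem.

    Assumption: G-mean = sqrt(sensitivity * specificity), where the
    `positive` label marks the minority class.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth != positive and pred != positive:
            tn += 1
        else:
            fn += 1

    # Guard against empty denominators (e.g. no positive predictions).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    return {
        "recall": recall,
        "precision": precision,
        "g_mean": math.sqrt(recall * specificity),
    }


if __name__ == "__main__":
    # Toy labels: 2 minority (1) and 6 majority (0) samples.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 1, 1]
    print(imbalance_metrics(y_true, y_pred))
```

G-mean is useful on imbalanced data because it drops to zero whenever either class is entirely misclassified, whereas plain accuracy can stay high when a classifier ignores the minority class.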
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sarkar, S., Khatedi, N., Pramanik, A., Maiti, J. (2020). An Ensemble Learning-Based Undersampling Technique for Handling Class-Imbalance Problem. In: Singh, P., Panigrahi, B., Suryadevara, N., Sharma, S., Singh, A. (eds) Proceedings of ICETIT 2019. Lecture Notes in Electrical Engineering, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-030-30577-2_51
DOI: https://doi.org/10.1007/978-3-030-30577-2_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30576-5
Online ISBN: 978-3-030-30577-2
eBook Packages: Intelligent Technologies and Robotics (R0)