Abstract
A large number of incident narratives are collected across all industries worldwide. These narratives are usually written manually and may therefore contain a significant number of errors. Because of the size and complexity of such narrative data, advanced text mining and natural language processing (NLP) techniques are needed to extract useful information, and automatic document classification is one such important NLP task. To address the high dimensionality of the data, this study proposes a genetic stacking-based ensemble learning (GSEL) method based on the Skip-gram model and the Doc2vec framework. Five classifiers, namely logistic regression, random forest, k-nearest neighbor, multi-layer perceptron (MLP), and support vector machine, are trained, and their outputs are combined in a stacking ensemble to improve prediction accuracy. A real-coded genetic algorithm (GA) is used to tune the parameters of the ensemble method. The results show that the proposed approach can handle large volumes of text data and predicts with higher accuracy than other state-of-the-art algorithms.
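The abstract does not say which ensemble parameters the GA tunes. As a minimal sketch, assuming the GA searches the non-negative weights that a stacking combiner assigns to each base classifier's class-probability outputs, the Python below evolves those weights with a real-coded GA (BLX-alpha crossover, Gaussian mutation, truncation selection with elitism) to maximize validation accuracy. The operators, the weighted soft-vote combiner, every function name (`combine`, `fitness`, `blx_alpha`, `mutate`, `tune_weights`), and the toy probabilities are hypothetical illustrations, not the GSEL authors' implementation. In practice the probability lists could come from scikit-learn's `predict_proba` on held-out Doc2vec vectors for each base classifier (`SVC` exposes it only when built with `probability=True`).

```python
import random

# Sketch under stated assumptions: the GA tunes the weights of a linear
# (soft-vote) stacking combiner. This is NOT confirmed by the source paper.
# probs[m][i][c]: hypothetical probability from base model m that narrative i
# belongs to incident class c; labels[i]: true class index of narrative i.


def combine(weights, probs):
    """Weighted soft vote: score each class by sum_m w_m * p_m(c), take the argmax."""
    n_items, n_classes = len(probs[0]), len(probs[0][0])
    preds = []
    for i in range(n_items):
        scores = [sum(w * p[i][c] for w, p in zip(weights, probs))
                  for c in range(n_classes)]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds


def fitness(weights, probs, labels):
    """GA objective: accuracy of the weighted combination on validation data."""
    preds = combine(weights, probs)
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def blx_alpha(rng, a, b, alpha=0.5, low=0.0, high=1.0):
    """BLX-alpha crossover: draw each gene from the parents' interval widened by alpha."""
    child = []
    for x, y in zip(a, b):
        lo, hi = min(x, y), max(x, y)
        spread = alpha * (hi - lo)
        child.append(min(high, max(low, rng.uniform(lo - spread, hi + spread))))
    return child


def mutate(rng, genes, rate=0.2, sigma=0.1, low=0.0, high=1.0):
    """Gaussian mutation applied per gene with probability `rate`, clipped to bounds."""
    return [min(high, max(low, g + rng.gauss(0.0, sigma)))
            if rng.random() < rate else g
            for g in genes]


def tune_weights(probs, labels, pop_size=30, generations=50, seed=0):
    """Real-coded GA over combiner weights; returns (best_weights, best_accuracy)."""
    rng = random.Random(seed)
    n_models = len(probs)
    # Seed with uniform soft voting and with each single model; elitism then
    # guarantees the result is never worse than these baselines.
    pop = [[1.0 / n_models] * n_models]
    pop += [[1.0 if j == m else 0.0 for j in range(n_models)] for m in range(n_models)]
    pop += [[rng.random() for _ in range(n_models)] for _ in range(pop_size - len(pop))]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, probs, labels), reverse=True)
        elite = pop[: pop_size // 2]  # top half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(rng, blx_alpha(rng, a, b)))
        pop = elite + children
    best = max(pop, key=lambda w: fitness(w, probs, labels))
    return best, fitness(best, probs, labels)


if __name__ == "__main__":
    # Hypothetical toy validation outputs: three base models, four narratives,
    # two classes; each model alone misclassifies exactly one narrative.
    toy_probs = [
        [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]],
        [[0.4, 0.6], [0.3, 0.7], [0.2, 0.8], [0.8, 0.2]],
        [[0.7, 0.3], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1]],
    ]
    toy_labels = [0, 1, 1, 0]
    weights, acc = tune_weights(toy_probs, toy_labels)
    print([round(w, 2) for w in weights], acc)  # acc is 1.0 on this toy data
```

Seeding the initial population with uniform soft voting and with each single model is a design choice: because the top half of every generation is carried over unchanged, the returned weights never score below those baselines on the validation data. Accuracy measured on the same data used for tuning is optimistic, so a separate test split would be needed to report results.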
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sarkar, S., Pramanik, A., Khatedi, N., Balu, A.S.M., Maiti, J. (2020). GSEL: A Genetic Stacking-Based Ensemble Learning Approach for Incident Classification. In: Singh, P., Panigrahi, B., Suryadevara, N., Sharma, S., Singh, A. (eds) Proceedings of ICETIT 2019. Lecture Notes in Electrical Engineering, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-030-30577-2_64
DOI: https://doi.org/10.1007/978-3-030-30577-2_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30576-5
Online ISBN: 978-3-030-30577-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)