Abstract
Seismic hazard is the likelihood that an earthquake will occur at a specific location, within a specific time frame, and with ground-motion intensity exceeding a specific threshold. Predicting such hazards is crucial because it enables early warnings, which can lessen their negative effects. Machine learning research is currently under way to predict seismic events from previously recorded incidents. However, because these events occur so infrequently, the data pose a class imbalance problem for machine learning and deep learning models. This study therefore compares the performance of popular over-sampling techniques that seek to even out class imbalance in seismic events data. Specifically, this work applied SMOTE, SMOTENC, SMOTEN, BorderlineSMOTE, SVMSMOTE, and ADASYN to the open-source Seismic Bumps dataset and then trained several machine learning classifiers with stratified K-fold cross-validation for seismic hazard detection. SVMSMOTE was the best over-sampling method: it produced classifiers with the highest overall accuracy, F1 score, recall, and precision, each reaching 100%, whereas ADASYN showed the lowest performance on all reported metrics across all models. To our knowledge, no prior research has compared the effectiveness of these over-sampling techniques for tasks involving seismic events.
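To illustrate what the over-sampling step does, the following is a minimal sketch of the core SMOTE idea (Chawla et al., 2002): a synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its nearest minority neighbours. It is not the authors' pipeline. The function name `smote_sketch`, the parameter values, and the toy data are assumptions made for illustration only, and the variants compared in the study (BorderlineSMOTE, SVMSMOTE, ADASYN, and so on) differ in how they choose which samples to interpolate from.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy two-feature data (made up; not taken from the Seismic Bumps dataset).
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.6)]

# Generate enough synthetic minority points to match the majority class size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice one would more likely rely on a maintained implementation such as `imblearn.over_sampling.SMOTE` and its `fit_resample` method. When over-sampling is combined with cross-validation, it is usually applied to the training folds only, so that synthetic samples do not leak into the held-out fold.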
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mokoatle, M., Coleman, T., Mokilane, P. (2023). A Comparative Study of Over-Sampling Techniques as Applied to Seismic Events. In: Pillay, A., Jembere, E., J. Gerber, A. (eds) Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science, vol 1976. Springer, Cham. https://doi.org/10.1007/978-3-031-49002-6_22
DOI: https://doi.org/10.1007/978-3-031-49002-6_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49001-9
Online ISBN: 978-3-031-49002-6
eBook Packages: Computer Science, Computer Science (R0)