A Comparative Study of Over-Sampling Techniques as Applied to Seismic Events

  • Conference paper
Artificial Intelligence Research (SACAIR 2023)

Abstract

A seismic hazard is the likelihood that an earthquake will occur in a specific location, within a specific time frame, and with ground motion intensity greater than a specific threshold. Predicting such hazards is crucial because it can enable early warnings, which can lessen their negative effects. Machine learning research is currently under way to predict seismic events from previously recorded incidents. However, because these events occur so infrequently, they present a class imbalance problem to machine learning and deep learning models. This study therefore compares the performance of popular over-sampling techniques that seek to even out class imbalance in seismic event data. Specifically, this work applied SMOTE, SMOTENC, SMOTEN, BorderlineSMOTE, SVMSMOTE, and ADASYN to an open-source Seismic Bumps dataset and then trained several machine learning classifiers with stratified K-fold cross-validation for seismic hazard detection. SVMSMOTE was the best over-sampling method: it produced classifiers with the highest overall accuracy, F1 score, recall, and precision, each reaching 100%, whereas ADASYN showed the lowest performance on all reported metrics across all models. To our knowledge, no prior research has compared the effectiveness of these over-sampling techniques for tasks involving seismic events.
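To make the core idea behind the SMOTE family concrete, the following is a minimal sketch of basic SMOTE interpolation using only the Python standard library. It is not the authors' pipeline and not a library implementation: the function names `smote` and `oversample`, the binary labels (1 for the rare hazardous class), and the toy data are all assumptions made for illustration. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors (the rare class).
    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # new point lies at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample(X, y, minority_label=1, k=5, seed=0):
    """Append synthetic minority rows until both classes are equally frequent.

    Assumes a binary problem: every row not labelled minority_label is majority.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    new_rows = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy data (made up): six non-hazardous rows and two hazardous rows.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3],
     [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = oversample(X, y)
```

The variants named in the abstract differ mainly in which minority samples are chosen as bases and how categorical features are handled; this sketch covers only the plain, all-numeric case. In a cross-validated setting such as the one described, a function like this would typically be applied to the training portion of each fold only, though the abstract does not state how the study ordered these steps.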

Author information

Corresponding author

Correspondence to Mpho Mokoatle.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mokoatle, M., Coleman, T., Mokilane, P. (2023). A Comparative Study of Over-Sampling Techniques as Applied to Seismic Events. In: Pillay, A., Jembere, E., J. Gerber, A. (eds) Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science, vol 1976. Springer, Cham. https://doi.org/10.1007/978-3-031-49002-6_22

  • DOI: https://doi.org/10.1007/978-3-031-49002-6_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49001-9

  • Online ISBN: 978-3-031-49002-6

  • eBook Packages: Computer Science, Computer Science (R0)
