
A hybrid cluster-borderline SMOTE method for imbalanced data of rock groutability classification

Original Paper
Bulletin of Engineering Geology and the Environment

Abstract

Groutability classification is essential for ensuring the safety and quality of grouting projects. However, its precision is inevitably affected by imbalanced data, because most fractured rock masses are groutable. Existing studies that do not account for this imbalance cannot achieve high-precision classification of the minority class. Although the synthetic minority oversampling technique (SMOTE) is the most influential oversampling method, it produces redundant samples and noisy labels. To address these issues, this paper proposes a hybrid cluster-borderline SMOTE (HCBS) method. The weights of samples near the minority class center and border were increased so that the samples represent the class features, which solves the redundant-sample problem of SMOTE. To suppress noise, the full dataset was partitioned into clusters by k-means, and clusters with an imbalance ratio greater than one were selected to generate new samples. A variant of borderline SMOTE then removed the noisy majority (negative-class) samples near the minority class border, and the original and synthetic minority samples were reduced to a set ratio relative to the majority samples. Finally, the classification precision of HCBS was verified on grouting engineering data with a random forest (RF) classifier, whose number of trees and number of split variables at the tree nodes were optimized by the grey wolf optimization (GWO) algorithm. The resulting HCBS-GRF method outperformed RF, GWO-optimized RF (GRF), SMOTE-GRF, density SMOTE-GRF, borderline density SMOTE-GRF, and other competitive methods, providing the highest groutability classification accuracy with the fewest generated instances.
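The abstract describes the HCBS pipeline only in words. The Python sketch below illustrates the general cluster, oversample, clean, and reduce sequence under loudly stated assumptions; it is not the authors' implementation. Everything specific here is hypothetical: label 1 marks the minority class; a cluster's imbalance ratio is taken as its minority count divided by its majority count; synthetic samples are split across the selected clusters in proportion to their minority counts; a majority sample counts as borderline noise when more than half of its k nearest neighbors are minority; and the final minority count is capped at `ratio` times the majority count by random subsampling. The center/border seed weighting and the GWO-tuned random forest stage are omitted. The function names `kmeans_labels`, `smote_in_cluster`, `drop_borderline_majority`, and `hcbs_sketch` are illustrative only.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def smote_in_cluster(X_min, n_new, k_nn, rng):
    """Classic SMOTE interpolation among the minority samples of one cluster."""
    k = min(k_nn, len(X_min) - 1)  # needs at least two minority samples
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]
    seeds = rng.integers(0, len(X_min), size=n_new)
    partners = nn[seeds, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # uniform in [0, 1)
    return X_min[seeds] + gap * (X_min[partners] - X_min[seeds])


def drop_borderline_majority(X, y, k_nn):
    """ASSUMPTION: a majority sample (y == 0) is noise if most of its k nearest neighbors are minority."""
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :k_nn]
    minority_share = (y[nn] == 1).mean(axis=1)
    keep = (y == 1) | (minority_share <= 0.5)
    return X[keep], y[keep]


def hcbs_sketch(X, y, n_clusters=4, k_nn=5, ratio=1.0, seed=0):
    """Hypothetical pipeline: cluster -> SMOTE in chosen clusters -> borderline cleaning -> ratio cap."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    labels = kmeans_labels(X, n_clusters, rng)

    # ASSUMPTION: cluster imbalance ratio = minority count / majority count.
    chosen = []
    for j in range(n_clusters):
        n_min = int(np.sum((labels == j) & (y == 1)))
        n_maj = int(np.sum((labels == j) & (y == 0)))
        if n_min >= 2 and n_min / max(n_maj, 1) > 1:
            chosen.append((j, n_min))

    # Generate enough synthetic minority samples to match the majority count,
    # split across the chosen clusters in proportion to their minority counts.
    n_needed = int(np.sum(y == 0) - np.sum(y == 1))
    total_min = sum(n for _, n in chosen)
    synthetic = []
    for j, n_min in chosen:
        n_new = int(round(n_needed * n_min / total_min))
        if n_new > 0:
            X_min = X[(labels == j) & (y == 1)]
            synthetic.append(smote_in_cluster(X_min, n_new, k_nn, rng))
    if synthetic:
        S = np.vstack(synthetic)
        X = np.vstack([X, S])
        y = np.concatenate([y, np.ones(len(S), dtype=int)])

    X, y = drop_borderline_majority(X, y, k_nn)

    # ASSUMPTION: cap original + synthetic minority at ratio * majority by random subsampling.
    min_idx = np.flatnonzero(y == 1)
    cap = int(ratio * np.sum(y == 0))
    if len(min_idx) > cap:
        drop = rng.choice(min_idx, size=len(min_idx) - cap, replace=False)
        mask = np.ones(len(y), dtype=bool)
        mask[drop] = False
        X, y = X[mask], y[mask]
    return X, y


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    X = np.vstack([demo_rng.normal(0.0, 1.0, (90, 2)), demo_rng.normal(3.0, 0.5, (10, 2))])
    y = np.array([0] * 90 + [1] * 10)
    Xb, yb = hcbs_sketch(X, y, n_clusters=3)
    print("class counts after resampling:", np.bincount(yb, minlength=2))
```

Distances are computed densely (O(n²) memory), which keeps the sketch dependency-free but suits only small datasets. For real use, scikit-learn's `KMeans` and `NearestNeighbors`, or imbalanced-learn's `KMeansSMOTE` and `BorderlineSMOTE`, are the usual building blocks for the individual steps.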






Funding

This study was funded by the National Natural Science Foundation of China (Grant No. 51839007) and China Three Gorges Projects Development Co., Ltd.

Author information

Correspondence to Bingyu Ren.


About this article

Cite this article

Li, K., Ren, B., Guan, T. et al. A hybrid cluster-borderline SMOTE method for imbalanced data of rock groutability classification. Bull Eng Geol Environ 81, 39 (2022). https://doi.org/10.1007/s10064-021-02523-9


  • DOI: https://doi.org/10.1007/s10064-021-02523-9
