Abstract
Groutability classification is essential for ensuring the safety and quality of grouting projects. However, its precision is inevitably degraded by imbalanced data, because most fractured rock masses are groutable. Existing studies do not account for this imbalance and therefore cannot classify the minority class with high precision. Although the synthetic minority oversampling technique (SMOTE) is the most influential oversampling method, it produces redundant samples and noisy labels. To address these issues, this paper proposes a hybrid cluster-borderline SMOTE (HCBS) method. The weights of samples near the minority class center and border were increased so that they better represent the class features, which addresses the redundant-sample problem of SMOTE. To restrain noise, all samples were divided into clusters by k-means, and clusters with an imbalance ratio greater than one were selected to generate new samples. The negative majority samples near the minority class border were removed by a variant of borderline SMOTE to clean noisy instances from the dataset, and the original and generated minority samples were reduced to a certain ratio relative to the majority samples. Finally, the classification precision of the HCBS method was verified in grouting engineering using random forest (RF), whose number of trees and number of split variables at the tree nodes were optimized with the grey wolf optimization (GWO) algorithm. The proposed HCBS-GRF method outperformed RF, GWO-optimized RF (GRF), SMOTE-GRF, density SMOTE-GRF, borderline density SMOTE-GRF, and other competing methods, achieving the highest groutability classification accuracy with the fewest generated instances.
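The cluster-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that cluster assignments are already available (for example from k-means), that the imbalance ratio of a cluster is its minority count divided by its majority count, and that new samples are formed by interpolating between two random minority samples of the same cluster rather than between nearest neighbors. The sample weighting and the borderline cleaning step are omitted, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def cluster_imbalance_ratios(labels, clusters, minority=1):
    """Return {cluster_id: minority_count / majority_count}.

    Assumption: the ratio is minority over majority; the abstract does not
    spell out the definition.
    """
    counts = defaultdict(lambda: [0, 0])  # [minority count, majority count]
    for y, c in zip(labels, clusters):
        counts[c][0 if y == minority else 1] += 1
    # A cluster without majority samples gets an infinite ratio.
    return {c: (m / M if M else float("inf")) for c, (m, M) in counts.items()}


def select_clusters(ratios, threshold=1.0):
    """Keep clusters whose imbalance ratio exceeds the threshold."""
    return sorted(c for c, r in ratios.items() if r > threshold)


def interpolate(a, b, rng):
    """SMOTE-style synthetic point on the segment between a and b."""
    gap = rng.random()
    return [ai + gap * (bi - ai) for ai, bi in zip(a, b)]


def oversample_in_clusters(X, labels, clusters, n_new, minority=1, seed=0):
    """Generate n_new synthetic minority samples inside selected clusters."""
    rng = random.Random(seed)
    ratios = cluster_imbalance_ratios(labels, clusters, minority)
    pools = []
    for cid in select_clusters(ratios):
        pool = [x for x, y, c in zip(X, labels, clusters)
                if c == cid and y == minority]
        if len(pool) >= 2:  # interpolation needs two distinct samples
            pools.append(pool)
    if not pools:
        return []
    new_samples = []
    for i in range(n_new):
        pool = pools[i % len(pools)]  # cycle through the selected clusters
        a, b = rng.sample(pool, 2)
        new_samples.append(interpolate(a, b, rng))
    return new_samples


# Toy data: label 1 is the minority class; cluster ids are given.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [6, 6]]
labels = [1, 1, 0, 0, 0, 0, 1]
clusters = [0, 0, 0, 1, 1, 1, 1]

ratios = cluster_imbalance_ratios(labels, clusters)
selected = select_clusters(ratios)
synthetic = oversample_in_clusters(X, labels, clusters, n_new=4)
```

In this toy data only cluster 0 is minority-dominated, so all synthetic points fall on the segment between its two minority samples, and the lone minority sample inside the majority-dominated cluster 1 produces no new samples. Restricting generation this way is what keeps an isolated, possibly noisy minority sample from being amplified.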
Funding
This study was funded by the National Natural Science Foundation of China (Grant No. 51839007) and China Three Gorges Projects Development Co., Ltd.
Cite this article
Li, K., Ren, B., Guan, T. et al. A hybrid cluster-borderline SMOTE method for imbalanced data of rock groutability classification. Bull Eng Geol Environ 81, 39 (2022). https://doi.org/10.1007/s10064-021-02523-9
DOI: https://doi.org/10.1007/s10064-021-02523-9